CPU Performance

While Denver’s architecture is fascinating to study, it’s important to see how well it translates to the real world. Denver on paper is a beast, but in the real world there are a number of factors to consider, not the least of which is the effectiveness of NVIDIA’s DCO. We’ve laid out that Denver’s best and worst case scenarios ride heavily on the DCO, and for NVIDIA to achieve their best-case performance they need to be able to generate and feed Denver with lots of well-optimized code. If Denver spends too much time working directly off of ARM code, or can’t do a good job optimizing the recurring code it finds, then Denver will struggle. Meanwhile other important factors are in play as well, including the benefits and drawbacks of Denver’s two cores versus competing SoCs’ quad A15/A57 configurations and, in thermally constrained scenarios, Denver’s ability to deliver good performance while keeping its power consumption in check.

In order to test this and general system performance, we turn to our suite of benchmarks, which includes browser performance tests, general system tests, and game-type benchmarks. As Denver relies on code morphing to enable out-of-order execution and speculative execution, most of these benchmarks should be able to show ideal performance, as loop performance in Denver is basically second to none. While most of these benchmarks are showing their age, they should be usable for valid comparisons until we move to our new test suite.

SunSpider 1.0.2 Benchmark (Chrome/Safari/IE)

Kraken 1.1 (Chrome/Safari/IE)

Google Octane v2 (Chrome/Safari/IE)

WebXPRT (Chrome/Safari/IE)

Basemark OS II 2.0 - Overall

Basemark OS II 2.0 - System

Basemark OS II 2.0 - Memory

The Basemark System test seems to contribute quite strongly to how the Nexus 9 performs in the overall score. Given that this is a storage performance benchmark, it's likely that Basemark OS II has issues similar to Androbench on Android 5.0 Lollipop, or that random I/O is heavily prioritized in this test.

Basemark OS II 2.0 - Graphics

There's a noticeable performance uplift in the graphics test. Although the GPU is not exactly part of the CPU, this does seem at least somewhat plausible, as GPU driver updates can improve performance over time.

Basemark OS II 2.0 - Web

Overall, performance seems to be quite checkered, although improved from our initial evaluation of the Nexus 9. Unfortunately, even in benchmarks where the DCO should be able to easily unroll loops to achieve massive amounts of performance, we see inconsistent performance in Denver. This may come down to an issue with the DCO, or even more simply to the fact that Denver is spending more time than it would like directly executing ARM code as opposed to going through the DCO.
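For readers unfamiliar with the term, here is a minimal sketch of what loop unrolling looks like as a source-level transformation. It is purely illustrative: the function names are hypothetical, it is not Denver's actual DCO output, and Python itself will not run the unrolled version faster. The point is only the shape of the code. The unrolled loop does four independent additions per iteration, which is the kind of work a wide in-order core can issue side by side, and it pays the loop-control overhead once per four elements instead of once per element.

```python
def sum_rolled(values):
    """Straightforward loop: one add and one loop check per element."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled_by_4(values):
    """Same sum, unrolled by four with separate accumulators.

    The four adds in the main loop do not depend on each other, so a wide
    machine could in principle schedule them together.
    """
    a0 = a1 = a2 = a3 = 0
    n = len(values)
    limit = n - (n % 4)  # largest multiple of 4 that fits
    i = 0
    while i < limit:
        a0 += values[i]
        a1 += values[i + 1]
        a2 += values[i + 2]
        a3 += values[i + 3]
        i += 4
    total = a0 + a1 + a2 + a3
    # Remainder loop handles the last 0-3 elements.
    while i < n:
        total += values[i]
        i += 1
    return total
```

Both functions return the same result for any list of integers; only the structure of the loop differs.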

In this case, looking at the SunSpider and Kraken JavaScript benchmarks offers an interesting proxy for exactly that scenario. SunSpider on modern CPUs executes extremely quickly, so quickly that the individual tests are often over in only a couple dozen milliseconds. This is a particularly rough scenario for Denver, as it doesn’t provide Denver with much time to optimize, even if the code is run multiple times. Meanwhile Kraken pushes many similar buttons, but its tests are longer, and that gives Denver more time to optimize. Consequently we find that Denver’s SunSpider performance is quite poor, underperforming even the A15-based Tegra K1-32, while Denver passes even the iPad Air 2 in Kraken.
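The SunSpider versus Kraken argument is essentially an amortization argument, and a toy model makes it concrete. The sketch below is an assumption-laden illustration rather than a description of how the DCO actually behaves: the function names and every number (per-unit costs, warm-up length, one-off optimization cost) are hypothetical and chosen only to show the shape of the trade-off.

```python
def run_time_ms(work_units, slow_cost_ms, fast_cost_ms,
                warmup_units, optimize_cost_ms):
    """Toy model of a dynamically optimized workload.

    The first `warmup_units` run at the slow (unoptimized) rate, then a
    one-off optimization cost is paid, then the rest runs at the fast rate.
    If the workload ends before warm-up completes, nothing is optimized.
    """
    if work_units <= warmup_units:
        return work_units * slow_cost_ms
    remaining = work_units - warmup_units
    return (warmup_units * slow_cost_ms
            + optimize_cost_ms
            + remaining * fast_cost_ms)


def effective_speedup(work_units, **params):
    """Speedup relative to never optimizing at all (1.0 means break-even)."""
    baseline = work_units * params["slow_cost_ms"]
    return baseline / run_time_ms(work_units, **params)


# Hypothetical parameters: optimized code is twice as fast per unit, but
# optimizing costs as much as ten units of unoptimized work.
PARAMS = dict(slow_cost_ms=1.0, fast_cost_ms=0.5,
              warmup_units=10, optimize_cost_ms=10.0)

short_test = effective_speedup(20, **PARAMS)    # SunSpider-like: very short
long_test = effective_speedup(2000, **PARAMS)   # Kraken-like: much longer
```

Under these made-up numbers the short test ends up slower than if no optimization had been attempted, while the long test approaches the full 2x gain, because the fixed optimization cost is spread over far more work.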

Ultimately this kind of inconsistent performance is a risk and a challenge for Denver. While no single SoC tops every last CPU benchmark, we also don’t typically see the kind of large variations that are occurring with Denver. If Denver’s lows are too low, then it definitely impacts the suitability of the SoC for high-end devices, as users have come to expect peppy performance at all times.

In practice, I didn't really notice any issues with the Nexus 9's performance, although there were odd moments during intense multitasking where I experienced extended pauses and freezes. These were likely due to the DCO getting stuck somewhere in execution, as the DCO can have unexpected bugs, such as repeated FP64 multiplication causing crashes. I also noticed that the device tended to get hot even on relatively simple tasks, which doesn't bode well for battery life. The heat is localized to the top of the tablet, which should help with user comfort, although this comes at the cost of worse sustained performance.

169 Comments

  • lucam - Thursday, February 5, 2015

    Next time you will write the article for Anand.
  • tuxRoller - Thursday, February 5, 2015

    Just tested on my N7 2013. Results were far higher than shown in the chart.
    SR:64.2->76.1
    SW:18.4->30.1
    RR:11.2->13.4
    RW:0.7->3.1
  • mpokwsths - Thursday, February 5, 2015

    Well, your results are far far more improved than 10% Andrei says.
    3 devices by 2 different users, all showed vast improvements (10-500%).
    Only they refuse to acknowledge it.
    Who knows, it seems Anandtech guys are on Apple's payroll...
  • eiriklf - Thursday, February 5, 2015

    Just wanted to note that on the NAND performance front, I believe the android devices which beat the nexus 9 in sequential speed use emmc 5.0 while the nexus uses a high quality emmc 4.5. I think this is because the tegra K1 SoC does not support emmc 5.0.
  • tviceman - Wednesday, February 4, 2015

    Better late than never, although being this late is indeed a big letdown.

    Onto the hardware, looks like Denver is an interesting first custom SoC from Nvidia. Solid in some respects, lacking in others. I think it's a solid building block from which to work on and improve. I hope Nvidia continues the custom ARM core path and gets more design wins (if warranted) moving forward.
  • kepstin - Wednesday, February 4, 2015

    The Denver chip design is pretty interesting, but it reminds me very strongly of another mobile-targeted chip that didn't do well in the marketplace; the Transmeta Crusoe.

    Both are VLIW designs with in-order execution, both rely on software code translation that runs on the CPU itself. Both even used a partitioned section of system ram as a translated ops cache.

    The most significant difference that I see between them is the addition of a native ARM decoder to the Denver CPU; the Crusoe didn't have a native X86 decoder and relied on the dynamic translation for all code that it executed.

    I had a Crusoe for a while in a Sony Vaio; it was used in some of the very small/lightweight ultraportable laptops by Japanese manufacturers for a while.
  • phoenix_rizzen - Wednesday, February 4, 2015

    Didn't a large group of Transmeta devs get hired by Nvidia?
  • ABR - Thursday, February 5, 2015

    Crusoe lost because Transmeta woke the sleeping giant Intel to the value of low-power, and then a group of 100 people couldn't keep up the resulting engineering race. The x86 world would be a pretty different place today if that hadn't occurred. But I'd say the jury is still out on the overall capability of the VLIW + morphing approach.
  • frenchy_2001 - Thursday, February 5, 2015

    I would second that. A quick search returned a licensing agreement where nvidia licensed Transmeta's technology.
    This could be a good part of Denver.

    About in order execution, the biggest experiment was from intel: itanium.
  • kgh00007 - Wednesday, February 4, 2015

    It's 3 months late, the nexus 9 was released on the 3rd of November!!

    No excuses, but it's just too late to help people make an informed decision!! Just like dog years, one year for a tablet is like 7 technology(dog) years!!
