CPU Performance

While Denver’s architecture is fascinating to study, it’s important to see how well it translates to the real world. On paper Denver is a beast, but in practice a number of factors come into play, not least of which is the effectiveness of NVIDIA’s DCO. As we’ve laid out, Denver’s best- and worst-case scenarios ride heavily on the DCO, and for NVIDIA to achieve best-case performance they need to be able to generate and feed Denver lots and lots of well-optimized code. If Denver spends too much time executing ARM code directly, or if the DCO can’t do a good job optimizing the recurring code it finds, then Denver will struggle. Other important factors are in play as well, including the benefits and drawbacks of Denver’s two cores versus the quad-core A15/A57 configurations of competing SoCs, and, in thermally constrained scenarios, Denver’s ability to deliver good performance while keeping its power consumption in check.

To test this and general system performance, we turn to our suite of benchmarks, which includes browser performance tests, general system tests, and game-type benchmarks. As Denver is an in-order core that relies on code morphing to reorder and speculatively execute instructions in software, most of these benchmarks should be able to show ideal performance, since loop performance on Denver is basically second to none. Although most of these benchmarks are showing their age, they should still allow valid comparisons until we move to our new test suite.

SunSpider 1.0.2 Benchmark (Chrome/Safari/IE)

Kraken 1.1 (Chrome/Safari/IE)

Google Octane v2 (Chrome/Safari/IE)

WebXPRT (Chrome/Safari/IE)

Basemark OS II 2.0 - Overall

Basemark OS II 2.0 - System

Basemark OS II 2.0 - Memory

The Basemark System test seems to contribute quite strongly to the Nexus 9's overall score. Given that this is a storage performance benchmark, it's likely that Basemark OS II suffers from issues similar to those Androbench has on Android 5.0 Lollipop, or that random I/O is heavily weighted in this test.

Basemark OS II 2.0 - Graphics

There's a noticeable performance uplift in the graphics test compared to our initial results. Although the GPU isn't part of the CPU, this seems at least somewhat plausible, as GPU driver updates can improve performance over time.

Basemark OS II 2.0 - Web

Overall, performance seems to be quite checkered, although it has improved since our initial evaluation of the Nexus 9. Unfortunately, even in benchmarks where the DCO should be able to easily unroll loops and deliver large performance gains, Denver's performance is inconsistent. This may come down to an issue with the DCO itself, or, more simply, to Denver spending more time than it would like executing ARM code directly rather than running code that has gone through the DCO.
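To make that trade-off concrete, here is a minimal Python sketch of a hot-region optimizer in the spirit of the DCO. It is a toy cost model, not NVIDIA's implementation: the cost constants, the `HOT_THRESHOLD` execution count, and the `DcoModel` class are all hypothetical. Code regions start out running directly as ARM code at a higher per-execution cost; once a region has executed enough times, it pays a one-time optimization cost and from then on runs from a cache of optimized code.

```python
from dataclasses import dataclass, field

# Hypothetical costs in arbitrary units; NOT measured Denver figures.
DECODE_COST = 3.0      # one execution of ARM code run directly (unoptimized)
OPTIMIZED_COST = 1.0   # one execution of cached, DCO-style optimized code
OPTIMIZE_COST = 200.0  # one-time cost to profile and optimize a region
HOT_THRESHOLD = 50     # executions before a region counts as "hot"


@dataclass
class DcoModel:
    """Toy model of a dynamic optimizer that only optimizes recurring code."""
    counts: dict = field(default_factory=dict)  # region -> executions so far
    cache: set = field(default_factory=set)     # regions already optimized
    total_cost: float = 0.0
    decoded_runs: int = 0
    optimized_runs: int = 0

    def execute(self, region: int) -> None:
        if region in self.cache:
            # Hot path: reuse the optimized translation.
            self.total_cost += OPTIMIZED_COST
            self.optimized_runs += 1
            return
        # Cold path: run the ARM code directly and count how often we see it.
        self.total_cost += DECODE_COST
        self.decoded_runs += 1
        self.counts[region] = self.counts.get(region, 0) + 1
        if self.counts[region] >= HOT_THRESHOLD:
            # Region just became hot: pay the optimization cost once, then cache it.
            self.total_cost += OPTIMIZE_COST
            self.cache.add(region)


def run(trace: list) -> DcoModel:
    model = DcoModel()
    for region in trace:
        model.execute(region)
    return model


loopy = run([0, 1, 2, 3] * 10_000)  # a few loop bodies, executed many times
flat = run(list(range(40_000)))     # 40,000 regions, each executed once
for name, model in (("loopy", loopy), ("flat", flat)):
    print(name, model.total_cost / 40_000)
```

With these made-up costs, the trace that keeps revisiting a handful of loop regions averages 1.03 units per execution, while the trace where nothing repeats never reaches the threshold and stays at 3.0 units, which corresponds to Denver running mostly off of ARM code.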

In this case, the SunSpider and Kraken JavaScript benchmarks offer an interesting proxy for exactly that scenario. SunSpider executes extremely quickly on modern CPUs, so quickly that the individual tests are often over in just a couple dozen milliseconds. This is a particularly rough scenario for Denver, as it doesn’t give Denver much time to optimize, even if the code is run multiple times. Kraken stresses many of the same areas, but its tests run longer, which gives Denver more time to optimize. Consequently, Denver’s SunSpider performance is quite poor, underperforming even the A15-based Tegra K1-32, while in Kraken Denver passes even the iPad Air 2.
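The SunSpider/Kraken contrast is essentially an amortization problem. The Python sketch below models it under loudly hypothetical assumptions: a fixed warm-up period (here 20 ms) during which code runs unoptimized, after which the rest of the test runs a fixed factor (here 2x) faster. It ignores the optimizer's own overhead and is not derived from NVIDIA data; `effective_speedup` is a name made up for this illustration.

```python
def effective_speedup(work_ms: float, warmup_ms: float, opt_speedup: float) -> float:
    """Speedup versus never optimizing, under a deliberately simple model.

    work_ms:     how long the test would take if it were never optimized
    warmup_ms:   unoptimized time before optimized code takes over (hypothetical)
    opt_speedup: how much faster the optimized code runs (hypothetical)
    """
    if work_ms <= warmup_ms:
        # The test ends before the optimizer contributes anything.
        return 1.0
    # Warm-up runs at normal speed; the remaining work runs opt_speedup times faster.
    actual_ms = warmup_ms + (work_ms - warmup_ms) / opt_speedup
    return work_ms / actual_ms


# Short, SunSpider-like tests versus longer, Kraken-like tests.
for work_ms in (20, 200, 2000):
    print(f"{work_ms:>5} ms test: {effective_speedup(work_ms, 20, 2.0):.2f}x")
```

With these made-up parameters, the 20 ms test gets no benefit at all, the 200 ms test reaches about 1.82x, and the 2000 ms test gets within a few percent of the full 2x. That matches the shape of the SunSpider versus Kraken results, though Denver's real warm-up time and optimized speedup are unknown here.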

Ultimately this kind of inconsistent performance is a risk and a challenge for Denver. While no single SoC tops every last CPU benchmark, we also don’t typically see the kind of large variations that are occurring with Denver. If Denver’s lows are too low, then it definitely impacts the suitability of the SoC for high-end devices, as users have come to expect peppy performance at all times.

In practice, I didn't really notice any issues with the Nexus 9's performance, although during intense multitasking there were odd moments where I experienced extended pauses or freezes. These were likely due to the DCO getting stuck somewhere in execution, seeing as the DCO can have unexpected bugs, such as repeated FP64 multiplication causing crashes. In general, the device also tended to get hot even on relatively simple tasks, which doesn't bode well for battery life. The heat is localized to the top of the tablet, which should help with user comfort, although this comes at the cost of worse sustained performance.

169 Comments

  • PC Perv - Wednesday, February 4, 2015

    It is clear, even though you did not say, why no one other than NV and Google will use Denver in their products. Thank you for the coherent review, Ryan.

    P.S. I can't wait for the day SunSpider, Basemark, and WebXPRT disappear from your benchmark suite.
  • jjj - Wednesday, February 4, 2015

    You always make those kinds of claims about dual core vs. more cores, but you have never attempted to back them up with real-world perf and power testing.
    In real use there are alerts and chats and maybe music playing and so on. While your hypothesis could be valid or partially valid, you absolutely need to verify it before insisting on it heavily and accepting it as true. Subjective conclusions are just not your style, are they? You test things to get objective results.
    And it would be easy: you already have "clean" numbers, and you would just need to run the same benchmarks for perf and power with some simulated background activity to compare the differences in gains/losses.
  • PC Perv - Wednesday, February 4, 2015

    Where would you put the performance of the "backup" ARM-only part of Denver? Cortex-A7? Is it measurable at all?

    Also, why doesn't Samsung use F2FS for its devices? I thought it was developed by them.
  • abufrejoval - Wednesday, February 4, 2015

    While the principal designer seems to be a Korean, I'm not sure he works for Samsung, who typically used Yet Another Flash File System (YAFFS).
  • Ryan Smith - Wednesday, February 4, 2015

    It's not measurable in a traditional sense, as the DCO will kick in at some point. However, I'd say it's somewhere along the lines of an A53, though overall a bit better.
  • Shadowmaster625 - Wednesday, February 4, 2015

    The design philosophy of the DCO does make a lot of sense. When your mobile device starts to bog down and you start cursing at it, what is it usually doing? It is usually looping or iterating through something. The DCO won't help with small blocks of code that execute in 500 µs, but you don't need help with that sort of code anyway. What you want to improve is exactly the type of code the DCO can improve: the kind of code that takes several dozen milliseconds (or more) to execute. That is when you begin to notice the lag in your CPU.
  • mpokwsths - Wednesday, February 4, 2015

    Joshua & Ryan,

    Please update the charts with the bench results of the newer version of Androbench 4: https://play.google.com/store/apps/details?id=com....
    (I had previously commented on the fact that you can't safely compare the I/O results of different OSes AND different bench apps.)

    Androbench 4 has been redesigned to use multiple I/O threads (as a proper I/O bench app should) and produces vastly improved results on both Lollipop and earlier Android devices.

    You will not be able to compare the newer results with older ones, but at least it will put an end to this ridiculous I/O performance difference between iOS and Android, the one you persistently, but falsely, keep projecting.
  • Andrei Frumusanu - Wednesday, February 4, 2015

    I tested this out on several of my devices and could see only minor improvements, all within 10%. The performance difference to iOS devices does not seem to be a dupe at all.
  • mpokwsths - Wednesday, February 4, 2015

    My results strongly disagree with you:
    Nexus 5: Seq Write: 19 MB/s --> 55 MB/s
    Rand Write: 0.9 MB/s --> 2.9 MB/s

    Sony Z3 Tablet: Seq Write: 21 MB/s --> 53 MB/s
    Rand Write: 1.6 MB/s --> 8 MB/s
    Seq Read: 135 MB/s --> 200 MB/s

    I can upload pics showing my findings.
  • mpokwsths - Wednesday, February 4, 2015

    Meet the fastest Nexus 5 in the world: https://www.dropbox.com/s/zkhn073xy8l28ry/Screensh...

    ;)
