SPECing Denver's Performance

Finally, before diving into our look at Denver in the real world on the Nexus 9, let’s take a look at a few performance considerations.

With so much of Denver’s performance riding on the DCO, the DCO is where we’ll start: NVIDIA has provided a slide profiling the execution of SPECint2000 on Denver. In it NVIDIA showcases how much time Denver spends on each type of code execution – ARM code, the optimizer itself, and finally optimized native code – along with an idea of the IPC achieved over the course of the benchmark.

What we find is that, as expected, it takes a bit of time for Denver’s DCO to kick in and produce optimized native code. At the start of the benchmark, with little optimized code to work with, Denver executes ARM code via its ARM decoder while looking for recurring code. Once that recurring code is found, Denver’s DCO kicks in – taking up CPU time itself – as it begins replacing recurring code segments with optimized native code.

In this case the amount of CPU time spent on the DCO is never an especially large percentage of overall time, however in NVIDIA’s example the DCO is noticeably active for quite some time before it finally settles down to an imperceptible fraction of time. Initially a much larger fraction of time is spent executing ARM code, due to the time it takes for the optimizer to find recurring code and optimize it. Similarly, another spike in ARM code execution occurs roughly mid-run, when Denver encounters new code segments that it must execute as ARM code before optimizing them and replacing them with native code.
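To make the mechanism a bit more concrete, below is a minimal, hypothetical sketch in C of how profile-directed translation of this sort works in general: code runs through the slow path while a counter tracks how often each block executes, and once a block crosses a hotness threshold the optimizer produces a translation that is cached and reused on later executions. The function names, threshold, and structure here are all illustrative assumptions, not a description of NVIDIA's actual DCO.

```c
/* hot_path_sketch.c -- conceptual illustration only, not NVIDIA's DCO. */
#include <stdint.h>
#include <stdio.h>

#define HOT_THRESHOLD 50      /* assumed: slow-path runs before a block counts as "hot" */
#define MAX_BLOCKS    1024

typedef void (*native_block_fn)(uint64_t pc);

struct block_profile {
    uint64_t        pc;          /* guest (ARM) address of the basic block */
    uint32_t        exec_count;  /* slow-path executions so far */
    native_block_fn translated;  /* cached optimized translation, if any */
};

static struct block_profile table[MAX_BLOCKS];

/* Slow path: decode and execute the original ARM code (stubbed here). */
static void execute_arm_block(uint64_t pc)
{
    (void)pc;   /* real hardware would run the block via the ARM decoder */
}

/* Fast path produced by the optimizer (stubbed here). */
static void run_optimized_block(uint64_t pc)
{
    (void)pc;   /* real hardware would run wide, pre-scheduled native code */
}

/* "Optimizer": in a real DCO this is where translation time is spent. */
static native_block_fn optimize_block(uint64_t pc)
{
    printf("optimizing block at 0x%llx\n", (unsigned long long)pc);
    return run_optimized_block;
}

static struct block_profile *lookup(uint64_t pc)
{
    struct block_profile *p = &table[pc % MAX_BLOCKS];  /* toy direct-mapped lookup */
    if (p->pc != pc) { p->pc = pc; p->exec_count = 0; p->translated = NULL; }
    return p;
}

/* Dispatch one block: reuse the translation if it exists, otherwise take
 * the slow path and count executions until the block proves recurring. */
static void dispatch_block(uint64_t pc)
{
    struct block_profile *p = lookup(pc);
    if (p->translated) {
        p->translated(pc);
        return;
    }
    execute_arm_block(pc);
    if (++p->exec_count >= HOT_THRESHOLD)
        p->translated = optimize_block(pc);
}

int main(void)
{
    /* A tight "loop" re-running the same block eventually triggers optimization. */
    for (int i = 0; i < 200; i++)
        dispatch_block(0x1000);
    return 0;
}
```

The key property this sketch shares with Denver’s approach is that translation time is an up-front cost, one that only pays off if the same code recurs often enough afterwards.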

Meanwhile there’s a clear hit to IPC whenever Denver is executing ARM code, with Denver’s IPC dropping below 1.0 whenever it’s executing large amounts of such code. This, in a nutshell, is why Denver’s DCO is so important and why Denver needs recurring code: it’s going to achieve its best results with code it can optimize once and then frequently reuse.

Also of note, Denver’s IPC for any given slice of time never gets above 2.0, even with full optimization and significant code recurrence in effect. The specific IPC of any program is going to depend on the nature of its code, but this serves as a good example of the fact that even with a bag full of tricks in the DCO, Denver is not going to sustain anything near its theoretical maximum IPC of 7. Individual VLIW instructions may hit 7, but over any period of time, if a lack of ILP in the code itself doesn’t become the bottleneck, then other issues such as VLIW density limits, cache flushes, and unavoidable memory stalls will. The important question is ultimately whether Denver’s IPC is enough of an improvement over Cortex A15/A57 to justify both the power consumption and die space costs of its very wide design.
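As a point of reference, IPC over a profiling window is simply instructions retired divided by core cycles elapsed over that window. A minimal sketch of the arithmetic in C, with counter values that are purely illustrative placeholders:

```c
#include <stdio.h>
#include <stdint.h>

/* IPC for a profiling window: instructions retired / core cycles elapsed.
 * The counter values below are placeholders purely for illustration. */
int main(void)
{
    uint64_t instructions_retired = 1500000000ULL;  /* hypothetical window */
    uint64_t core_cycles          = 1100000000ULL;  /* hypothetical window */

    double ipc = (double)instructions_retired / (double)core_cycles;
    printf("IPC over this window: %.2f\n", ipc);    /* ~1.36 here, well below the 7-wide peak */
    return 0;
}
```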

NVIDIA's example also neatly highlights the fact that due to Denver’s favoritism for code reuse, it is in a position to do very well in certain types of benchmarks. CPU benchmarks in particular are known for their extended runs of similar code to let the CPU settle and get a better sustained measurement of CPU performance, all of which plays into Denver’s hands. Which is not to say that it can’t also do well in real-world code, but in these specific situations Denver is well set to be a benchmark behemoth.

To that end, we have also run our standard copy of SPECint2000 to profile Denver’s performance.

SPECint2000 - Estimated Scores

Benchmark     K1-32 (A15)   K1-64 (Denver)   % Advantage
164.gzip      869           1269             46%
175.vpr       909           1312             44%
176.gcc       1617          1884             17%
181.mcf       1304          1746             34%
186.crafty    1030          1470             43%
197.parser    909           1192             31%
252.eon       1940          2342             20%
253.perlbmk   1395          1818             30%
254.gap       1486          1844             24%
255.vortex    1535          2567             67%
256.bzip2     1119          1468             31%
300.twolf     1339          1785             33%

Given Denver’s obvious affinity for benchmarks such as SPEC we won’t dwell on the results too much here. But the results do show that Denver is a very strong CPU under SPEC, and by extension under conditions where it can take advantage of significant code reuse. Similarly, because these benchmarks aren’t heavily threaded, they’re all the happier with any improvements in single-threaded performance that Denver can offer.

Coming from the K1-32 and its Cortex-A15 CPU to the K1-64 and its Denver CPU, the actual gains are unsurprisingly dependent on the benchmark. The worst case of 176.gcc still has Denver ahead by 17%, while the best case of 255.vortex finds Denver besting the A15 by 67%, coming surprisingly close to outright doubling the A15's performance. The best case scenario is of course unlikely to occur in real code, though I’m not sure the same can be said for the worst case. At the same time we find that there aren’t any performance regressions, which is a good start for Denver.
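For clarity, the per-benchmark advantage in the table is just the ratio of the two estimated scores expressed as a percentage gain. A quick sketch of that arithmetic, using the 176.gcc and 255.vortex scores from the table above:

```c
#include <stdio.h>

/* Percent advantage of K1-64 (Denver) over K1-32 (A15) for a given
 * SPECint2000 sub-test, using scores taken from the table above. */
static double percent_advantage(double a15_score, double denver_score)
{
    return (denver_score / a15_score - 1.0) * 100.0;
}

int main(void)
{
    printf("176.gcc:    %.0f%%\n", percent_advantage(1617.0, 1884.0)); /* ~17% */
    printf("255.vortex: %.0f%%\n", percent_advantage(1535.0, 2567.0)); /* ~67% */
    return 0;
}
```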

If nothing else it's clear that Denver is a benchmark monster. Now let's see what it can do in the real world.

Comments

  • AbRASiON - Thursday, February 5, 2015 - link

    LCD, not OLED? Blacks being grey? Nope :/
  • blzd - Friday, February 6, 2015 - link

    I'd actually rather have grey blacks than the loss of detail in black areas. Pure black is nice, but not when it comes at the expense of shadow details.
  • techn0mage - Thursday, February 5, 2015 - link

    I agree that late is better than never. Rather than discuss things that can't be changed, I felt the following points were worth raising:

    Is there any Nexus 6 data in the benchmark charts? I didn't see any. The N6 and N9 were released roughly around the same point in time, and like the N5 and N7 they are high-profile devices in the Android landscape, so it would have been nice to have them in the charts to make comparisons. Please correct me if I've overlooked anything.

    The Denver deep dive, while certainly relevant to Nexus 9 and good AT content on any day, was probably a good candidate for having its own article. I believe it is fair to say the Denver content is -less- time sensitive than the overall review. Hopefully the review was not held back by the decision to include the "DDD" content - and to be clear right now I have no reason to believe it was.
  • WndlB - Thursday, February 5, 2015 - link

    Particularly in this kind of full-dress review of high-end devices, could you start covering the delivered sound, the DAC chips and headphone jack?

    Via A-B comparisons, I'm finding some real differences and, as people go to more high-quality audio streams (plus video sound), this is becoming a differentiator of significance. Thanks.
  • JoshHo - Tuesday, February 10, 2015 - link

    We could do subjective opinion, but properly testing 3.5mm output requires significant investment in test equipment.
  • name99 - Thursday, February 5, 2015 - link

    I know this isn't exactly a Nexus 9 question, but how can your battery life results for the iPad Air 2 be so inconsistent?
    We are given 10.18 hrs for "display a white image" and 13.63 hrs for "display video". For an OLED display this is possible, but not for an LED-backlit display unless you are running the video at a "base-level" brightness much lower than the 200 nits of the "display a white image" test, and what's the point of that? Surely the relevance of the "display a white image" test is to show how long the display+battery lasts under normal usage conditions, not when being used as a flashlight?

    My point is --- I am guessing that the "display a white image" test utilizes some app that prevents the screen from going black. Do you have confidence that that app (and in particular whatever tickling of the OS that is done to prevent sleep) is doing this in the energy optimal way, on both iOS and Android?
  • JoshHo - Tuesday, February 10, 2015 - link

    I don't believe there was any real background CPU usage. To my knowledge the difference is that Apple enables dynamic contrast in movies.
  • easp - Thursday, February 5, 2015 - link

    "The successor to the Nexus 7 was even more incredible, as it pushed hardware that was equal to or better than most tablets on the market at a lower price. However, as with most of these low cost Nexus devices not everything was perfect as corners still had to be cut in order to hit these low price points."

    So, hardware that was equal or better, except it wasn't? This is a situation where being more specific would help. My guess is that when you said equal or better, you were referring to certain obvious specifications like core count, RAM, and maybe screen resolution?
  • mkygod - Friday, February 6, 2015 - link

    Owned a Nexus 9 for almost 3 months. I purchased three actually to see if backlight bleed was any better, but nope; so I ended up returning them a couple weeks ago. The bleeding was pretty bad; worse than any LCD device I've ever used and definitely worse than the Nexus 5 and Nexus 7. And it would've been okay if it had uniform bleeding like the Nexus 5, but it had blotches of bright spots all along the edges, which is even more distracting. I found the screen's reflectivity a non-factor in my exclusively indoor use. It's a shame because the Nexus 9 is an otherwise damn good tablet. What's also disappointing, as the review points out, is that if you want a high-end tablet around this size, your only options are the 9 and the Tab S. It seems like a lot of really good Android tablets are in the 8" size, such as the Shield and new Dell Venue, with more manufacturers on the horizon making tablets in this size.
  • MartinT - Friday, February 6, 2015 - link

    I wonder what level of load penalty is incurred by having to ship in optimized code from main memory. Is there any prefetching going on to preposition code segments in lower level caches ahead of being called?
