SPECing Denver's Performance

Finally, before diving into our look at Denver in the real world on the Nexus 9, let’s examine a few performance considerations.

With so much of Denver’s performance riding on the DCO, we’ll start there: NVIDIA has provided a slide profiling the execution of SPECint2000 on Denver. In it NVIDIA showcases how much time Denver spends on each type of code execution – native ARM code, the optimizer, and finally optimized code – along with an idea of the IPC it achieves on this benchmark.

What we find is that, as expected, it takes some time for Denver’s DCO to kick in and produce optimized native code. At the start of the benchmark, with little optimized code to work with, Denver executes ARM code via its ARM decoder while it hunts for recurring code. Once that recurring code is found, Denver’s DCO kicks in – taking up CPU time itself – as it begins replacing recurring code segments with optimized native code.
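Conceptually, the DCO’s behavior boils down to a profile-guided translation loop: count how often a block of code executes, and once it crosses a hotness threshold, translate it once and cache the result for reuse. Below is a minimal sketch of that general technique; the threshold, function names, and placeholder bodies are all hypothetical illustrations, not NVIDIA’s actual implementation.

```python
# A minimal conceptual sketch of profile-guided translation, the general
# technique behind a dynamic code optimizer: execute unoptimized code
# while counting recurrences, then translate hot blocks once and reuse
# the result. Everything here (threshold, function names, placeholder
# bodies) is hypothetical; it is not NVIDIA's implementation.

HOT_THRESHOLD = 50  # hypothetical recurrence count before optimizing

translation_cache = {}  # block address -> optimized native translation
execution_counts = {}   # block address -> times executed as ARM code

def run_arm_decoder(arm_block):
    # Placeholder for executing ARM instructions via the hardware decoder.
    return sum(arm_block)

def optimize_to_native(arm_block):
    # Placeholder for the DCO producing an optimized native translation.
    return list(arm_block)

def run_native(translation):
    # Placeholder for executing the cached, optimized translation.
    return sum(translation)

def execute_block(address, arm_block):
    """Execute one code block, optimizing it once it proves to recur."""
    # Fast path: a cached translation exists, so run optimized code.
    if address in translation_cache:
        return run_native(translation_cache[address])

    # Slow path: execute as ARM code while tracking recurrence.
    execution_counts[address] = execution_counts.get(address, 0) + 1
    result = run_arm_decoder(arm_block)

    # Once the block is hot, spend CPU time optimizing it so every
    # future execution takes the fast path instead.
    if execution_counts[address] >= HOT_THRESHOLD:
        translation_cache[address] = optimize_to_native(arm_block)
    return result
```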

In this case the amount of CPU time spent on the DCO is never a particularly large percentage of the total. However, in NVIDIA’s example the DCO noticeably runs for quite some time before it finally settles down to an imperceptible fraction of time. Initially a much larger fraction of the time is spent executing ARM code on Denver, due to the time it takes the optimizer to find recurring code and optimize it. Similarly, another spike in ARM code occurs roughly mid-run, when Denver encounters new code segments that it needs to execute as ARM code before optimizing them and replacing them with native code.

Meanwhile there’s a clear hit to IPC whenever Denver is executing ARM code, with Denver’s IPC dropping below 1.0 whenever it’s executing large amounts of such code. This, in a nutshell, is why Denver’s DCO is so important and why Denver needs recurring code: it achieves its best results with code it can optimize once and then re-use frequently.

Also of note, Denver’s IPC per slice of time never gets above 2.0, even with full optimization and significant code recurrence in effect. The specific IPC of any program will depend on the nature of its code, but this serves as a good example of the fact that even with a bag full of tricks in the DCO, Denver is not going to sustain anything near its theoretical maximum IPC of 7. Individual VLIW instructions may hit 7, but over any period of time, if a lack of ILP in the code itself doesn’t become the bottleneck, then other issues such as VLIW density limits, cache flushes, and unavoidable memory stalls will. The important question is ultimately whether Denver’s IPC is enough of an improvement over Cortex A15/A57 to justify both the power consumption costs and the die space costs of its very wide design.
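To put some concrete numbers on this, here is a small sketch of how the per-slice IPC metric works and why the sustained, time-weighted average lands so far below the theoretical peak of 7. The phase fractions and counter values below are made-up assumptions for illustration, not figures from NVIDIA’s profile.

```python
# Illustration of per-slice IPC (instructions retired / cycles elapsed)
# and the sustained, time-weighted average it implies. All numbers are
# made up for the sake of example; none come from NVIDIA's profile.

# (mode, fraction of run time, instructions retired, cycles elapsed)
phases = [
    ("ARM decoder",    0.10, 45_000_000, 60_000_000),  # IPC ~0.75
    ("optimizer",      0.05, 30_000_000, 55_000_000),  # IPC ~0.55
    ("optimized code", 0.85, 95_000_000, 50_000_000),  # IPC ~1.90
]

sustained_ipc = 0.0
for mode, fraction, instructions, cycles in phases:
    ipc = instructions / cycles       # the per-slice metric
    sustained_ipc += fraction * ipc   # time-weighted contribution
    print(f"{mode:>14}: IPC {ipc:.2f}")

print(f"Sustained IPC: {sustained_ipc:.2f}")  # ~1.72, nowhere near 7
```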

NVIDIA's example also neatly highlights the fact that, due to Denver’s affinity for code reuse, it is in a position to do very well in certain types of benchmarks. CPU benchmarks in particular are known for their extended runs of similar code, designed to let the CPU settle and produce a better sustained measurement of performance, all of which plays into Denver’s hands. This is not to say that it can’t also do well on real-world code, but in these specific situations Denver is well set to be a benchmark behemoth.

To that end, we have also run our standard copy of SPECint2000 to profile Denver’s performance.

SPECint2000 - Estimated Scores

Benchmark       K1-32 (A15)    K1-64 (Denver)    % Advantage
164.gzip                869              1269            46%
175.vpr                 909              1312            44%
176.gcc                1617              1884            17%
181.mcf                1304              1746            34%
186.crafty             1030              1470            43%
197.parser              909              1192            31%
252.eon                1940              2342            20%
253.perlbmk            1395              1818            30%
254.gap                1486              1844            24%
255.vortex             1535              2567            67%
256.bzip2              1119              1468            31%
300.twolf              1339              1785            33%

Given Denver’s obvious affinity for benchmarks such as SPEC we won’t dwell on the results too much here. But the results do show that Denver is a very strong CPU under SPEC, and by extension under conditions where it can take advantage of significant code reuse. Similarly, because these benchmarks aren’t heavily threaded, they’re all the happier with any improvements in single-threaded performance that Denver can offer.

Coming from the K1-32 and its Cortex-A15 CPU to the K1-64 and its Denver CPU, the actual gains are unsurprisingly dependent on the benchmark. The worst case scenario of 176.gcc still has Denver ahead by 17%, while the best case scenario of 255.vortex finds that Denver bests the A15 by 67%, coming closer than one would expect to doubling the A15's performance outright. The best case scenario is of course unlikely to occur in real code, though I’m not sure the same can be said for the worst case scenario. At the same time we find that there aren’t any performance regressions, which is a good start for Denver.
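As a quick sanity check, SPEC results are conventionally summarized with a geometric mean; a short sketch over the per-test scores from the table above puts Denver’s overall advantage at roughly 1.35x, or about a 35% average uplift.

```python
# Geometric mean of Denver's per-test speedups over the A15, using the
# (K1-32, K1-64) score pairs from the SPECint2000 table above.
from math import prod

scores = [
    (869, 1269), (909, 1312), (1617, 1884), (1304, 1746),
    (1030, 1470), (909, 1192), (1940, 2342), (1395, 1818),
    (1486, 1844), (1535, 2567), (1119, 1468), (1339, 1785),
]

ratios = [denver / a15 for a15, denver in scores]
geomean = prod(ratios) ** (1 / len(ratios))
print(f"Geometric mean speedup: {geomean:.2f}x")  # ~1.35x
```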

If nothing else it's clear that Denver is a benchmark monster. Now let's see what it can do in the real world.

Comments

  • dtgoodwin - Wednesday, February 4, 2015 - link

    I really appreciate the depth that this article has; however, I wonder if it would have been better to separate the in-depth CPU analysis into its own article. I will probably never remember to come back to the Nexus 9 review if I want to remember a specific detail about that CPU.
  • nevertell - Wednesday, February 4, 2015 - link

    Has nVidia indicated that they would provide a static version of the DCO so that app developers would be able to optimize their binaries at compile time? Or do these optimizations rely on the program state while it is being executed? From a purely academic point of view, it would be interesting to see the overhead introduced by the DCO by comparing previously optimized code running without the DCO against running the SoC as it was intended.
  • Impulses - Wednesday, February 4, 2015 - link

    Nice in depth review as always, came a little late for me (I purchased one to gift it, which I ironically haven't done since the birthday is this month) but didn't really change much as far as my decision so it's all good...

    I think the last remark nails it; had the price point been just a little lower, most of the minor QC issues wouldn't have been blown up...

    I don't know if $300 for 16GB was feasible (pretty much the price point of the smaller Shield), but $350 certainly was and Amazon was selling it for that much all thru Nov-Dec which is bizarre since Google never discounted it themselves.

    I think they should've just done a single $350-400 32GB SKU, saved themselves a lot of trouble and people would've applauded the move (and probably whined for a 64GB but you can't please everyone). Or a combo deal with the keyboard, which HTC was selling at 50% at one point anyway.
  • Impulses - Wednesday, February 4, 2015 - link

    No keyboard review btw?
  • JoshHo - Thursday, February 5, 2015 - link

    We did not receive the keyboard folio for review.
  • treecats - Wednesday, February 4, 2015 - link

    Where is the comparison to NEXUS 10????

    Maybe because Nexus 10's battery life is crap after 1 year of use!!!

    Please come back and review it again after you've used it for a year.
  • treecats - Wednesday, February 4, 2015 - link

    My previous comment holds true for the entire Nexus device line I own.

    I had a Nexus 4, and currently have a Nexus 5 and a Nexus 10. All the Nexus devices I own have had bad battery life after 1 year of use.

    Google, fix the battery problem.
  • blzd - Friday, February 6, 2015 - link

    That tells me you are mistreating your batteries. You think it's a coincidence that it's happening to all your devices? Do you know how easy it is for batteries to degrade when overheating? Do you know every battery is rated for only a certain number of charges?

    Mostly you want to avoid heat, especially while charging. Gaming while charging? That's killing the battery. GPS navigation while charging? Again, degrading the battery.

    Each time you discharge and charge the battery you are using one of its charge cycles. So if you use the device a lot and charge it multiple times a day, you will notice degradation after a year. This is not unique to Google devices.
  • grave00 - Sunday, February 8, 2015 - link

    I don't think you have the latest info on how battery charging vs battery life works.
  • hstewartanand - Wednesday, February 4, 2015 - link

    Even though I personally have 6 tablets (2 iPads, 2 Windows 8.1, and 2 Android), as a developer I find them technically inferior to an actual PC – except for the Windows 8.1 Surface Pro.

    I recently purchased a Lenovo Y50 with an i7 4700 because I wanted AVX2 for video processing. To me, ARM based platforms will never replace PC devices for certain applications, like video processing and 3D graphics work.

    I am a big fan of Nvidia GPUs but don't care much for ARM CPUs – though I do like the competition it has given Intel to produce low power CPUs for this market.

    What I would really like to see is a true technical benchmark that compares the true power of CPUs from ARM and Intel and ranks them. This includes using extended instructions like AVX2 on Intel CPUs.

    Compare this with an equivalently configured Nvidia GPU on an Intel CPU, and I would say ARM has a very long way to go.

    But a lot depends on what you're doing with the device. I am currently typing this on a 4+ year old MacBook Air, because it's easy and convenient. My other Windows 8.1 device (Lenovo Miix 2 8 – Intel Atom Bay Trail) has roughly the same speed, but the MacBook Air is more convenient. My primary tablet is the iPad Mini with Retina display; it is also convenient for email, Amazon, and small stuff.

    The problem with some of these benchmarks is that they may be optimized for one platform more than another, and depend on OS components which may vary between OS environments. So ideally the tests need to be natively compiled for each CPU/GPU combination and take advantage of the hardware. I don't believe such a benchmark exists. Probably the best way to do this is to get developers interested in the platforms to come up with a contest for the best score, with the code open source – so no cheating. It would be interesting to see a ranking of machines from tablets, phones, and laptops up to high performance Xeon machines. I also have an 8+ year old dual Xeon 5160 with an Nvidia GTX 640 (the best I can get in this old machine), and I would bet it will blow away any of these ARM based tablets. Performance wise it is a little less than, but close to, my Lenovo Y50 – if not doing video processing, because AVX2 is such a significant improvement.

    In summary, it's really hard to compare the performance of ARM vs Intel machines. But this review had some technical information that brought me back to my older days writing assembly code on an OS – PC-MOS/386.
