SPECing Denver's Performance

Finally, before diving into our look at Denver in the real world on the Nexus 9, let’s take a look at a few performance considerations.

With so much of Denver’s performance riding on the DCO, we'll start there, with a slide from NVIDIA profiling the execution of SPECint2000 on Denver. In it NVIDIA showcases how much time Denver spends on each type of code execution (native ARM code, the optimizer, and finally optimized code) along with an idea of the IPC they achieve on this benchmark.

What we find is that as expected, it takes a bit of time for Denver’s DCO to kick in and produce optimized native code. At the start of the benchmark execution with little optimized code to work with, Denver initially executes ARM code via its ARM decoder, taking a bit of time to find recurring code. Once it finds that recurring code Denver’s DCO kicks in – taking up CPU time itself – as the DCO begins replacing recurring code segments with optimized, native code.
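The warm-up behavior described above can be sketched with a toy model. This is only an illustration, not NVIDIA's actual heuristics: the `HOT_THRESHOLD` value, the `run` function, and the idea of counting executions per code block are all assumptions made for this example.

```python
from collections import Counter

# Assumed value: how many times a block must run as ARM code before the
# toy optimizer treats it as "recurring". Denver's real criteria are not
# described in this article.
HOT_THRESHOLD = 3

def run(trace, hot_threshold=HOT_THRESHOLD):
    """Label each step of a block trace as 'arm', 'optimizer', or 'optimized'."""
    counts = Counter()   # executions seen per block via the ARM decoder path
    optimized = set()    # blocks that already have an optimized native version
    timeline = []
    for block in trace:
        if block in optimized:
            timeline.append("optimized")
            continue
        counts[block] += 1
        timeline.append("arm")
        if counts[block] >= hot_threshold:
            # The optimizer itself takes up a slot of CPU time.
            timeline.append("optimizer")
            optimized.add(block)
    return timeline

# Block "B" first appears mid-run, so ARM execution spikes again there.
trace = ["A"] * 5 + ["B"] * 5
timeline = run(trace)
print(timeline)
```

In this sketch each new block costs a few slow ARM executions plus one optimizer step before the optimized version takes over, which is the same general shape as the profile NVIDIA shows.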

In this case the DCO never consumes a large percentage of CPU time; however, in NVIDIA’s example the DCO runs noticeably for quite some time before it finally settles down to an imperceptible fraction of time. Initially a much larger fraction of the time is spent executing ARM code on Denver due to the time it takes for the optimizer to find recurring code and optimize it. Similarly, another spike in ARM code is found roughly mid-run, when Denver encounters new code segments that it needs to execute as ARM code before optimizing them and replacing them with native code.

Meanwhile there’s a clear hit to IPC whenever Denver is executing ARM code, with Denver’s IPC dropping below 1.0 whenever it’s executing large amounts of such code. This in a nutshell is why Denver’s DCO is so important and why Denver needs recurring code, as it’s going to achieve its best results with code it can optimize and then frequently re-use those results.

Also of note though, Denver’s IPC per slice of time never gets above 2.0, even with full optimization and significant code recurrence in effect. The specific IPC of any program is going to depend on the nature of the code, but this serves as a good example of the fact that even with a bag full of tricks in the DCO, Denver is not going to sustain anything near its theoretical maximum IPC of 7. Individual VLIW instructions may hit 7, but over any period of time if a lack of ILP in the code itself doesn’t become the bottleneck, then other issues such as VLIW density limits, cache flushes, and unavoidable memory stalls will. The important question is ultimately whether Denver’s IPC is enough of an improvement over Cortex A15/A57 to justify both the power consumption costs and the die space costs of its very wide design.
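To put those IPC figures in proportion, here is a small arithmetic sketch. The peak of 7 and the ceiling of 2.0 come from the discussion above; the per-slice instruction and cycle counts are made-up placeholders, and the function names are ours, chosen only for illustration.

```python
PEAK_IPC = 7  # theoretical maximum IPC cited above

def average_ipc(slices):
    """slices: list of (instructions, cycles) pairs.
    Returns total instructions divided by total cycles."""
    total_instructions = sum(instr for instr, _ in slices)
    total_cycles = sum(cycles for _, cycles in slices)
    return total_instructions / total_cycles

def fraction_of_peak(ipc, peak=PEAK_IPC):
    """How much of the theoretical peak a sustained IPC represents."""
    return ipc / peak

# Placeholder slices: one ARM-heavy slice below 1.0 IPC, two optimized slices.
slices = [(800, 1000), (1800, 1000), (1800, 1000)]
print(f"average IPC: {average_ipc(slices):.2f}")
print(f"2.0 IPC as a share of peak: {fraction_of_peak(2.0):.0%}")
```

Even a sustained IPC of 2.0 works out to under a third of the theoretical peak, and any slice spent executing ARM code pulls the run-wide average down further.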

NVIDIA's example also neatly highlights the fact that due to Denver’s favoritism for code reuse, it is in a position to do very well in certain types of benchmarks. CPU benchmarks in particular are known for their extended runs of similar code to let the CPU settle and get a better sustained measurement of CPU performance, all of which plays into Denver’s hands. Which is not to say that it can’t also do well in real-world code, but in these specific situations Denver is well set to be a benchmark behemoth.

To that end, we have also run our standard copy of SPECint2000 to profile Denver’s performance.

SPECint2000 - Estimated Scores
  K1-32 (A15) K1-64 (Denver) % Advantage

Given Denver’s obvious affinity for benchmarks such as SPEC we won’t dwell on the results too much here. But the results do show that Denver is a very strong CPU under SPEC, and by extension under conditions where it can take advantage of significant code reuse. Similarly, because these benchmarks aren’t heavily threaded, they’re all the happier with any improvements in single-threaded performance that Denver can offer.

Coming from the K1-32 and its Cortex-A15 CPU to K1-64 and its Denver CPU, the actual gains are unsurprisingly dependent on the benchmark. The worst case scenario of 176.gcc still has Denver ahead by 17%, while the best case scenario of 255.vortex finds that Denver bests A15 by 67%, coming closer than one would expect to doubling A15's performance outright. The best case scenario is of course unlikely to occur in real code, though I’m not sure the same can be said for the worst case scenario. At the same time we find that there aren’t any performance regressions, which is a good start for Denver.
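For reference, the relationship between a percentage advantage and a speedup ratio can be sketched as follows. The function names are ours, and the baseline of 100 is a placeholder normalization rather than a measured score; only the 17% and 67% figures come from the results above.

```python
def percent_advantage(baseline, contender):
    """Percentage by which `contender` exceeds `baseline`."""
    return (contender / baseline - 1.0) * 100.0

def speedup(advantage_pct):
    """Convert a percentage advantage into a speedup ratio."""
    return 1.0 + advantage_pct / 100.0

# With the A15 result normalized to 100 (placeholder), a Denver result of 167
# corresponds to the 67% best case quoted above.
print(round(percent_advantage(100, 167)))
print(speedup(17), speedup(67), speedup(100))
```

Doubling performance would be a 100% advantage, so the 67% best case sits about two thirds of the way there.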

If nothing else it's clear that Denver is a benchmark monster. Now let's see what it can do in the real world.

Comments Locked



  • Mondozai - Wednesday, February 4, 2015 - link

    No offence but how relevant is this review so many months after release?
    You guys dropped the ball on this one. We're also still waiting for the GTX 960 review.

    What has happened to Anandtech...
  • LocutusEstBorg - Wednesday, February 4, 2015 - link

    There's no Anand.
  • nathanddrews - Wednesday, February 4, 2015 - link

    That's the only change I've noticed.
  • Morawka - Wednesday, February 4, 2015 - link

    and no Brian Klug
  • nathanddrews - Wednesday, February 4, 2015 - link

    Yeah, but that was earlier.
  • Ryan Smith - Wednesday, February 4, 2015 - link

    "What has happened to Anandtech..."

    Nothing has happened to AnandTech. We're still here and working away at new articles. =)

    However this article fell victim to bad timing. The short story is that I was out sick for almost 2 weeks in December, which meant this got backed up into the mess that is the holidays and CES.

    As for how relevant it is, it is still Google's premier large format tablet and the only shipping Denver device, both of which make it a very interesting product.
  • Jon Tseng - Wednesday, February 4, 2015 - link

    It's fine to be late (although maybe not as late as the Razer Blade 2014 review!). Better to have late, differentiated content than early, commoditised content. Whether the review likes the colour of a tablet's trim is of limited interest for me; the details of Denver code-morphing are.

    Actually my worry is that under new ownership Anandtech might be pushed to go down the publish early/get click views route vs. the publish late/actually deliver something useful. Hopefully it won't come to this, but this is what historically happens... :-(
  • Operandi - Thursday, February 5, 2015 - link

    Being there on day one is not a huge deal but it's certainly not OK to be as late as this review is or the still MIA 960 review. If you are going to be late you better be bringing something new to the table to justify not being there in the same time frame as your peers. This is so laughably late it's almost embarrassing to release it at all at this point.

    Tech journalism like most other markets is competitive and there are lots of other very competent publications out there competing for the same readers. Personally I've already gotten all the Nexus 9 information elsewhere so this review is of no value to me whatsoever. The same goes for the 960 review when/if that review ever shows up.
  • akdj - Wednesday, February 11, 2015 - link

    Not sure where you've seen such an extensive write up and dissection of Denver, but I certainly haven't. Nor were the N9/6 widely available until the holidays were over. Like a month ago
    For every 10,000,000 iPads produced, HTC is probably knocking out 10,000
    Excellent review, write up and information about the 'other 64bit' option.
  • Taneli - Wednesday, February 4, 2015 - link

    Timing is secondary for a deeply technical article like this here. You guys did exactly the right thing, reporting when the device was announced and waited for the review to be done before publishing. Also, having people out sick in a small team is something you really can't do that much about. I hope you're well now.

    The article itself was superb. Thanks for the read and keep up the good work.
