SoC Architecture: NVIDIA's Denver CPU

It admittedly does a bit of a disservice to the rest of the Nexus 9 both in terms of hardware and as a complete product, but there’s really no getting around the fact that the highlight of the tablet is its NVIDIA-developed SoC. Or to be more specific, the NVIDIA-developed Denver CPUs within the SoC, and the fact that the Nexus 9 is the first product to ship with a Denver CPU.

NVIDIA for their part is no stranger to the SoC game, having now shipped 5 generations of Tegra SoCs (with more on the way). Since the beginning NVIDIA has been developing their own GPUs and integrating them into their Tegra SoCs, pairing them with licensed ARM CPU cores and other 1st and 3rd party IP blocks to fully flesh out Tegra. However even though NVIDIA already designs some of their own IP, there’s still a big leap from using licensed ARM cores to using your own, and with Denver NVIDIA has become just the second company to release their own ARMv8 design for consumer SoCs.

For long-time readers Denver may feel like a long time coming, and that perception is not wrong. NVIDIA announced Denver almost 4 years ago, back at CES 2011, where at the time they made a broad announcement about developing their own 64-bit ARM core for use in a wide range of devices, ranging from mobile to servers. A lot has happened in the SoC space since 2011, and given NVIDIA’s current situation Denver likely won’t be quite as broad a product as first pitched. But as an A15 replacement for the same tablet and high-performance embedded markets that the TK1-32 has found a home in, the Denver-based TK1-64 should fit right in.


K1-64 Die Shot Mock-up (NVIDIA)

Denver comes at an interesting time for NVIDIA and for the ARM SoC industry as a whole. Apple’s unexpected launch of the ARMv8 capable Cyclone core in 2013 beat competing high-performance ARMv8 designs by nearly a year. And overall Apple set a very high bar for performance and power efficiency that is not easily matched and has greatly impacted the development and deployment schedules of other ARMv8 SoCs. At the same time because Cyclone and its derivatives are limited to iOS devices, the high-performance Android market is currently served by a mix of ARMv7 designs (A15, Krait, etc) and the just recently arrived A57 and Denver CPUs.

Showcasing the full scope of the ARM architecture license and how many different designs can execute the same instruction set, none of these ARMv8 CPUs are all that much alike. Thanks to its wide and conservatively clocked design, Apple’s Cyclone ends up looking a lot like what a recent Intel Core processor would look like if it were executing ARM instead of x86. Meanwhile ARM’s A57 design is (for lack of a better term) very ARMy, following ARM’s own power efficient design traditions and further iterating on ARM’s big.LITTLE philosophy to pair up high performance A57 and moderate performance A53 cores to allow a SoC to cover a wide power/performance curve. And finally we have Denver, perhaps the most interesting and certainly least conventional design, forgoing the established norms of Out of Order Execution (OoOE) in favor of a very wide in-order design backed by an ambitious binary translation and optimization scheme.
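As an illustration only (Denver’s actual Dynamic Code Optimization software is proprietary firmware targeting native microcode, so every name, data structure, and threshold below is hypothetical), the interpret-then-translate flow behind such a scheme can be sketched as:

```python
# Toy model of a binary translation scheme: cold code is interpreted, and
# blocks that run often enough get "translated" into an optimized routine
# kept in a translation cache. This models the general technique, not
# NVIDIA's implementation.

HOT_THRESHOLD = 3  # hypothetical: translate a block after 3 interpretations

class BinaryTranslator:
    def __init__(self):
        self.exec_counts = {}        # block address -> times interpreted
        self.translation_cache = {}  # block address -> optimized routine

    def run_block(self, addr, block, state):
        # Fast path: a previously optimized translation exists.
        if addr in self.translation_cache:
            return self.translation_cache[addr](state)
        # Slow path: interpret the block instruction by instruction.
        for op, arg in block:
            if op == "add":
                state["acc"] += arg
            elif op == "mul":
                state["acc"] *= arg
        self.exec_counts[addr] = self.exec_counts.get(addr, 0) + 1
        if self.exec_counts[addr] >= HOT_THRESHOLD:
            self.translation_cache[addr] = self.translate(block)
        return state["acc"]

    def translate(self, block):
        # Stand-in for the optimizer: fold the whole block into one routine,
        # the way a real translator would reschedule ops into wide bundles.
        def optimized(state):
            for op, arg in block:
                state["acc"] = state["acc"] + arg if op == "add" else state["acc"] * arg
            return state["acc"]
        return optimized
```

Running the same block repeatedly, the first few passes interpret it and later passes hit the translation cache, loosely analogous to how Denver interprets cold ARM code and services hot code from its optimization cache.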

Counting Cores: Why Denver?

To understand Denver it’s best to start with the state of the ARM device market, and NVIDIA’s goals in designing their own CPU core. In the ARM SoC space, much has been made of core counts, both as a marketing vehicle and as a genuine contributor to overall performance. Much like the PC space a decade prior, when multi-core processors became viable they were of an almost immediate benefit. Even if individual applications couldn’t yet make use of multiple cores, having a second core meant that applications and OSes were no longer time-sharing a single core, which came with its own performance benefits. The OS could do its work in the background without interrupting applications as much, and greedy applications didn’t need to fight with the OS or other applications for basic resources.

However also like the PC space, the benefits of additional cores began to taper off with each new core. One could still benefit from 4 cores over 2, but unless software was capable of putting 3-4 cores to very good use, performance generally didn’t scale well with core count. Compounding matters in the mobile ecosystem, the vast majority of devices run apps in a “monolithic” fashion, with only one app active and interacting with the user at any given time. This meant that in the absence of apps that could use 3-4 cores, there weren’t nearly as many situations in which multitasking could find work for the additional cores. The end result is that it has been difficult for mobile devices to consistently saturate an SoC with more than a couple of cores.
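The tapering described above is essentially Amdahl’s law in action. As a rough illustration (the 60% parallel fraction below is an arbitrary stand-in, not a measured figure for any real workload):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work parallelizes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# Hypothetical workload where 60% of the work can use extra cores:
for cores in (1, 2, 4, 8):
    print(cores, round(amdahl_speedup(0.6, cores), 2))
```

Doubling from 1 to 2 cores buys about a 1.43x speedup here, but doubling again to 4 cores only reaches about 1.82x; the per-core payoff shrinks exactly as described above, which is why a phone or tablet rarely justifies many cores unless the software is unusually parallel.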

Meanwhile the Cortex family of designs coming from ARM has generally allowed high core counts. Cortex-A7 is absolutely tiny, and even the more comparable Cortex-A15 isn’t all that big on the 28nm process. Quad core A15 designs quickly came along, setting the stage for the high core count situations we previously discussed.

This brings us to NVIDIA’s goals with Denver. In part due to the difficulty of keeping 4 cores fed, NVIDIA has opted for a greater focus on single-threaded performance than the ARM Cortex designs they used previously. Believing that fewer, faster cores will deliver better real-world performance and better power consumption, NVIDIA set out to build a bigger, wider CPU that would do just that. The result of this project was what NVIDIA awkwardly calls their first “super core,” Denver.

Though NVIDIA wouldn’t know it at the time it was announced in 2011, Denver in 2015 is in good company that helps to prove that NVIDIA was right to focus on single-threaded performance over additional cores. Apple’s Cyclone designs have followed a very similar philosophy, and the SoCs utilizing them remain the SoCs to beat, delivering chart-topping performance even with only 2 or 3 CPU cores. If NVIDIA could deliver something similar to Cyclone’s performance in the Android market and prove the performance and power benefits of 2 larger cores over 4 weaker cores, they would be well set in the high-end SoC marketplace.

Performance considerations aside, for NVIDIA there are additional benefits to rolling their own CPU core. First and foremost is that it reduces their royalty rate to ARM; ARM still gets a cut as part of their ISA license, but that cut is less than if you are also using ARM licensed cores. The catch of course is that NVIDIA needs to sell enough SoCs in the long run to pay for the substantial costs of developing a CPU, which means that along with the usual technical risks, there are some financial risks as well for developing your own CPU.

The second benefit to NVIDIA then is differentiation in a crowded SoC market. The SoC market has continued to shed players over the years, with the likes of Texas Instruments and ST-Ericsson getting squeezed out of the market. With so many vendors using the same Cortex CPU designs, from a performance perspective their SoCs are similarly replaceable, making the risk of being the next TI all the greater. Developing your own CPU carries its own risks – especially if it ends up underperforming the competition – but played right it means being able to offer a product with a unique feature that helps the SoC stand out from the crowd.

Finally, at the time NVIDIA announced Denver, NVIDIA also had plans to use Denver to break into the server space. With their Tesla HPC products traditionally paired with x86 CPUs, NVIDIA could never have complete control over the platform, or the greater share of revenue that would entail. Denver in turn would allow NVIDIA to offer their own CPU, capturing that market and being able to play off of the synergy of providing both the CPU and GPU. Since then however the OpenPOWER consortium happened, opening up IBM’s POWER CPU lineup to companies such as NVIDIA and allowing them to add features such as NVLink to POWER CPUs. In light of that, while NVIDIA has never officially written off Denver’s server ambitions, it seems likely that POWER has supplanted Denver as NVIDIA’s server CPU of choice.

169 Comments

  • melgross - Wednesday, February 4, 2015 - link

    So, people only buy devices during the first three months?
  • Impulses - Wednesday, February 4, 2015 - link

    Apparently... Although getting the review in before February would've shut all these people up, cheapest place to get the Nexus 9 all thru the holidays was Amazon ($350 for 16GB) and they gave you until January 31 to return it regardless of when you bought it.

    Only reason I'm so keenly aware is I bought one as a February birthday gift, opened it last weekend just to check it was fine before the return window closed... Not much backlight bleed at all even tho it was manufactured in October (bought in late December), some back flex but it's going in a case anyway.
  • blzd - Friday, February 6, 2015 - link

    What does the month of manufacture have to do with the back light bleed? You don't actually believe those "revision" rumors, do you?

    If you do, consider how practical it is for a hardware revision to come out 1 month after release. Then consider how one set of pictures on a Reddit post proves anything other than that their RMA worked as intended.
  • ToTTenTranz - Wednesday, February 4, 2015 - link

    I wish more smartphone/tablet makers put as much thought into their external speakers as HTC does.

    Having owned an HTC One M7, I simply can't go back to mono speakers on the back of devices.
  • Dribble - Wednesday, February 4, 2015 - link

    Glad the review is here at last, next one a little bit quicker please :)
  • UpSpin - Wednesday, February 4, 2015 - link

    I have the following issues with your review:
    1. You run webbrowser tests and derive CPU performance from it. That's nonsense! It's a web-browser test, and it won't be a CPU test whatever you do. If you want to test raw CPU performance you have to run native CPU test applications.

    2. Your battery life analysis is based on false assumptions and you derive doubtful claims from it.
    The error is quite evident in the iPad Air test. In your newly introduced white display test, with airplane mode on, CPU/GPU idling, etc., the iPad Air 2 has a battery life of 10.18 hours. Now in your web-browsing battery test with WiFi on and the CPU busy, the iPad Air 2 has a battery life of 9.76 hours. That's a difference of 4%. The Nexus 9 has a difference of 30%, the Note 4 15%, the Shield Tablet 25%.
    You conclude: The Tegra K1 is inefficient. But I could also conclude that the A8 is inefficient and the Tegra K1 very efficient. The Tegra K1 needs significantly less power while idling, compared to the A8, which always draws about the same, largely independent of load. So finally, the A8 lacks any kind of power saving mode.
    That's absurd, but it's the consequence of your test. Or maybe your test is flawed from the beginning.

    3. " I suspect we’re looking at the direct result of the large battery, combined with an efficient display as the Nexus 9 can last as long as 15 hours in this test compared to the iPad Air 2’s 10 hours."
    Sorry, but I don't get this either. The Nexus 9 has a 25.46 WHr battery, the iPad Air 2 a 27.3 WHr battery (+7%). The Nexus 9 has a 8.9" display, the iPad Air 2 a 9.7" (+19% area). The resolution is the same, thus the DPI on the Nexus 9 is higher. The display technology is the same, as you said in your analysis. So the difference must be related to something else, like a highly efficient idle SoC in the Nexus 9.
  • Andrei Frumusanu - Wednesday, February 4, 2015 - link

    The battery life analysis is based on the technical workings of the SoC and its idle power states, and we are confident in the resulting conclusions.
  • JarredWalton - Wednesday, February 4, 2015 - link

    Going along with what Andrei said, an SoC isn't "efficient" if it's doing no work -- the A8 may not have idle power as low as the K1-64, but when you're actually doing anything more with the tablet in question is when efficiency matters. It's clear that the Air 2 wins out over the Nexus 9 in some of those tests (GFX in particular). Doing more (or equivalent) work while using less power is efficient.

    Imagine this as an example of why idle power only matters so far: if you were to start comparing cars on how long they could idle instead of actual gas mileage, would anyone care? "Car XYZ can run for 20 hours off a tank while idle while Car ZYX only lasts 15 hours!" Except, neither car is actually doing what a car is supposed to do, which is take you from point A to point B.

    The white screen test is merely a way to look at the idle power draw for a device, and by that we can get an idea of how much additional power is needed when the device is actually in use. Also note that it's possible due to the difference in OS that Android simply better disables certain services in the test scenario and iOS might be wasting power -- the fact that the battery life hardly changes in our Internet WiFi test even suggests that's the case.

    To that end, the battery life of the N9 is still quite good. Get rid of the smartphones in the charts and it's actually pretty much class leading. But it's still odd that the NVIDIA SHIELD Tablet and iPad Air 2 only show a small drop between idle and Internet, while N9 loses 33% of its battery life.
  • ABR - Thursday, February 5, 2015 - link

    Idle power is pretty important for real world use for tablets, for example where you are reading something and the system is just sitting there. Those "load web page then pause for xx time" tests would probably be really good for measuring that.
  • JarredWalton - Thursday, February 5, 2015 - link

    That's exactly what our Internet test does, which is why the 33% drop in battery life is so alarming. What exactly is going on that N9 loading a generally not too complex web page every 15 seconds or so kills battery life?
