CPU Option 2: Dual-Core 64-bit NVIDIA Denver

Three years ago, also at CES, NVIDIA announced that it was working on its own custom ARM-based microprocessor, codenamed Denver. Denver was teased back in 2011 as a solution for everything from PCs to servers, with no direct mention of going into phones or tablets. In the second half of 2014, NVIDIA expects to offer a second version of Tegra K1 based on two Denver cores instead of the 4+1 ARM Cortex A15s. Details are light, but here's what I'm expecting and have been able to piece together.

Given the 28nm HPM process for Tegra K1, I’d expect that the Denver version is also a 28nm HPM design. NVIDIA claims the two SoCs are pin-compatible, which tells me that both feature the same 64-bit wide LPDDR3 memory interface.

The companion core is gone in the Denver version of K1, as is the quad-core silliness. Instead you get two presumably larger cores with much higher IPC; in other words, the right way to design a CPU for mobile. Ironically, it's NVIDIA, the company that drove the rest of the ARM market into the core-count race, that is the first (excluding Apple/Intel) to come to the realization that four cores may not be the best use of die area in pursuit of good performance per watt in a phone/tablet design.

It's long been rumored that Denver is a reincarnation of NVIDIA's original design for an x86 CPU. The rumor held that NVIDIA used binary translation to convert x86 assembly into some internal format (optimizing the assembly in the process for proper scheduling/dispatch/execution) before it hit the CPU core itself. The obvious change: instead of being x86 compatible, NVIDIA built something compatible with ARMv8.

I believe Denver still works the same way. My guess is there's some form of software abstraction layer that intercepts ARMv8 machine code, translates and optimizes/morphs it into a friendlier internal format, and then dispatches it to the underlying hardware. We've seen code morphing + binary translation done in the past, most famously in Transmeta's offerings in the early 2000s, but it's never been done all that well at the consumer client level.
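To make the concept concrete, here's a trivial Python sketch of what dynamic binary translation with a translation cache looks like. To be clear, this is my illustration and not NVIDIA's actual design: the instruction format, the "micro-op" naming and the cache structure are all invented for the example.

# A pretend ARMv8 basic block: (opcode, dest, src1, src2)
guest_block = [
    ("add", "x0", "x1", "x2"),
    ("mul", "x3", "x0", "x4"),
    ("add", "x5", "x6", "x7"),
]

translation_cache = {}  # guest block address -> translated micro-ops

def translate(block_addr, block):
    """Translate a guest block once, then serve repeat executions from the cache."""
    if block_addr in translation_cache:
        return translation_cache[block_addr]  # hot path: no re-translation cost
    # A real morphing engine would optimize/reschedule here, not just rename.
    micro_ops = [("uop_" + op, dst, s1, s2) for (op, dst, s1, s2) in block]
    translation_cache[block_addr] = micro_ops
    return micro_ops

print(translate(0x1000, guest_block))  # translated on first execution
print(translate(0x1000, guest_block))  # served from the cache thereafter

The important property is that hot code pays the translation cost once and then runs out of the cache, which is what makes the whole approach viable.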

Mobile SoC vendors are caught in a tough position. Each generation presents opportunities to increase performance, but at some point you need to move to a larger out-of-order design in order to efficiently scale performance. Once you make that jump, there's a corresponding increase in power consumption that you simply can't avoid. Furthermore, subsequent performance increases usually depend on leveraging more speculative execution, which also comes with substantial power costs.

ARM's solution to this problem (big.LITTLE) is to have your cake and eat it too. Ship a design with some big, speculative, out-of-order cores, but also include some in-order cores for when you don't absolutely need the added performance. Include some logic to switch between the cores and you're golden.
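The switching logic itself can be simple in concept. Below is a minimal, purely hypothetical governor sketch in Python; the thresholds, the load metric and the hysteresis policy are placeholders I made up, not anything ARM or NVIDIA has published.

BIG_UP_THRESHOLD = 0.80       # migrate to the big cores above 80% utilization
LITTLE_DOWN_THRESHOLD = 0.30  # fall back to the little cores below 30%

def pick_cluster(current, utilization):
    """Two thresholds (hysteresis) keep the governor from ping-ponging."""
    if current == "little" and utilization > BIG_UP_THRESHOLD:
        return "big"
    if current == "big" and utilization < LITTLE_DOWN_THRESHOLD:
        return "little"
    return current

cluster = "little"
for load in (0.10, 0.50, 0.90, 0.60, 0.20):
    cluster = pick_cluster(cluster, load)
    print("load=%.2f -> %s cores" % (load, cluster))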

If Denver indeed follows this path of binary translation + code optimization/morphing, it offers another option for saving power while increasing performance in mobile. You can build a relatively wide machine (NVIDIA claims Denver is a 7-issue design, though it's important to note that we're talking about the CPU's internal instruction format, and it's not clear what types of instructions can be co-issued) but move a lot of the scheduling/ILP complexity into software. With a good code morphing engine, the CPU could regularly receive nice bundles of instructions that are already optimized for peak parallelism. Removing the scheduling/OoO complexity from the CPU could save power.
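Here's a rough illustration of what packing pre-scheduled work into wide issue bundles could look like. The 7-slot width comes from NVIDIA's claim; everything else (the op format, the dependence test, the greedy packing policy) is my own invention for the sake of the example.

BUNDLE_WIDTH = 7  # NVIDIA's claimed issue width for Denver

def independent(op, bundle):
    """An op can join a bundle only if it reads nothing the bundle writes."""
    written = {dst for (_, dst, _, _) in bundle}
    _, _, src1, src2 = op
    return src1 not in written and src2 not in written

def bundle_ops(ops):
    """Greedily pack ops, in program order, into up-to-7-wide bundles."""
    bundles = []
    for op in ops:
        if bundles and len(bundles[-1]) < BUNDLE_WIDTH and independent(op, bundles[-1]):
            bundles[-1].append(op)
        else:
            bundles.append([op])
    return bundles

ops = [
    ("add", "x0", "x1", "x2"),
    ("mul", "x3", "x0", "x4"),  # depends on x0 -> forced into a new bundle
    ("add", "x5", "x6", "x7"),  # independent of the mul -> co-issues with it
]
for i, b in enumerate(bundle_ops(ops)):
    print("bundle %d: %s" % (i, b))

A real optimizer would do far more (register renaming, loop unrolling, profiling-driven re-optimization), but the payoff is the same: the hardware receives bundles whose parallelism has already been found, so it doesn't need power-hungry scheduling logic of its own.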

Granted, all of this funky code translation and optimization is done in software, which ultimately has to run on the same underlying CPU hardware, so some power is expended doing that. The point is that if you do it efficiently, any power/time you spend there will still cost less than if you had built a conventional OoO machine.
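A quick back-of-envelope illustration of that amortization argument, with completely made-up numbers (these are not Denver measurements):

translate_cost = 500  # one-time cost to translate a hot block (arbitrary energy units)
native_exec = 10      # per-execution cost of the translated, pre-scheduled code
ooo_exec = 14         # per-execution cost on a hypothetical conventional OoO core

for runs in (10, 100, 10_000):
    translated = translate_cost + runs * native_exec
    conventional = runs * ooo_exec
    print("%6d runs: translated=%7d  conventional=%7d" % (runs, translated, conventional))

# The one-time translation cost dominates for cold code, but hot code
# (loops, the common case) amortizes it quickly; with these placeholder
# numbers the break-even point is at 125 executions of the block.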

I have to say that if this does end up being the case, I’ve got to give Charlie credit. He called it all back in late 2011, a few months after NVIDIA announced Denver.

NVIDIA announced that Denver would have a 128KB L1 instruction cache and a 64KB L1 data cache. It's fairly unusual to see imbalanced L1 I/D caches like that in a client machine, which I can only assume has something to do with Denver's unique architecture. Curiously enough, Transmeta's Efficeon processor (its 2nd-generation code morphing CPU) had the exact same L1 cache sizes (it also worked on 8-wide VLIW instructions, for what it's worth). NVIDIA also gave us a clock target of 2.5GHz. For an insanely wide machine, 2.5GHz sounds pretty high, especially if we're talking about 28nm HPM, so I'm betting Charlie is right in that we need to put machine width in perspective.

NVIDIA showed a Denver Tegra K1 running Android 4.4 at CES. The design came back from the fab sometime in the past couple of weeks and is already up and running Android. NVIDIA hopes to ship the Denver version of Tegra K1 in the second half of the year.

The Denver option is the more interesting of the two, as it not only gives us another (very different) solution to the power problem in mobile, but also embraces a much saner idea of the right balance of core size vs. core count in mobile.

Comments

  • easp - Wednesday, January 8, 2014 - link

    So, it seems to me that 8 of these Denver cores would offer similar general purpose compute performance to a dual socket server from ~5-6 years ago, and yet, would make up a minuscule % of die area on a Tesla-class GPU die...
  • Krysto - Saturday, January 11, 2014 - link

    Some also say a Denver core should equal a Sandy Bridge core in performance, which would be quite impressive. That's what I have in my laptop, and it was a pretty high-end one 2 years ago.
  • OreoCookie - Sunday, January 12, 2014 - link

    Who wrote that? Can you provide a link? I haven't seen any such claims, and I'm fairly sure nVidia would have mentioned that during the press event. Apple's A7 packs about the same punch as a Core 2 Duo, so it wouldn't be out of the question, but I'd be more cautious, especially seeing how high Intel's CPUs turbo these days.
  • PC Perv - Saturday, January 11, 2014 - link

    How can you make so many definitive statements about what was essentially a PR pitch? It's too bad there are no critics or ombudsmen to hold these bloggers accountable over time. (Granted, that is also why these bloggers will never garner respect from the mainstream media.) These bloggers seemingly get away with anything they say as long as they keep their industry friends happy.

    If anyone wants to know what I am talking about, go back 2~3 years and check these clowns' articles. And check if they ever, I mean EVER, acknowledge their misjudgments or stupidity.
  • PC Perv - Saturday, January 11, 2014 - link

    For instance, do you guys have any follow up on Tegra 4i?

    http://www.anandtech.com/show/6787/nvidia-tegra-4-...

    Or is it just the way it is with you guys? Just blow fanfare whenever an OEM does a press conference, and completely forget about it in less than a year?

    Have you no shame?
  • TheJian - Tuesday, January 14, 2014 - link

    What fanfare? T4i is a Q1 product and the modem just got certified on ATT last month or so. The whole point of the T4i is the modem and phones, so what is the problem? NV already showed it doing 150mbps (an update from the 100mbps preview info) and this hasn't even been rolled out yet (anybody else running this besides Hong Kong/Singapore?). What do you want them to report? This product has been PULLED IN along with K1 at the cost of some T4 delay and sales. This is not news and everyone (even this NV-hating site) has reported it :) If T4i is late at all, it's only because it was awaiting the modem certification, which, after checking, happened in early November.

    Not sure this new modem speed is even interesting with the caps we have today. At 50mbps on cable I can clock ~325GB or so pegged all day (that's north of 10TB/month). Even Hong Kong has a 10GB cap, which is what, like 5x USA caps at 2GB usually? Even in HK that's only ONE 1080p flick and some browsing. I hope we start seeing cell phone bill lawsuits soon that tie up these CAPPED companies so badly they are forced to stop this crap just due to litigation cost fears. But I think this is probably a pipe dream until Google or someone else can offer unlimited mobile.

    For instance, Google mentions rolling out gigabit internet in Austin, and ATT goes on immediate defense, announcing huge speed upgrades (20x faster, to 300mbps) and a further upgrade past that on the books not long after. So it was terribly expensive and not doable before Google, but the same week Google announces their roll-out, ATT can suddenly roll out a huge upgrade and BEAT Google's roll-out dates...LOL. But to match Google's price ($70) you have to OK them spying on you...ROFL. At least Google forced the updates.
    http://www.theverge.com/2013/12/11/5200250/at-t-be...
    Then claims they can deny google access to poles a few days later:
    http://arstechnica.com/tech-policy/2013/12/why-att...
    We can only hope the city votes on the 23rd (next week) to allow low pole-access pricing. Hard to say Google would lose after offering free internet to 100 libraries and public joints in the city, chosen by the CITY itself, but they already delayed, so maybe they're stupid or bribed heavily. :)

    Maybe Google just needs to announce everywhere, get ATT etc. to announce matching $70 pricing, then just say "just kidding". :) At worst they seem to easily force monopolies to respond, as shown here. I hope they do the same in phones; Amazon and Apple too (heck, MS also). We need all these big tech dogs to bark at cell providers big time and threaten their business models in any way they can. Competition from outsiders is sorely needed in cell, or we'll be stuck with Verizon/ATT caps forever.
  • phoenix_rizzen - Thursday, January 16, 2014 - link

    Rogers in Canada has 150 Mbps LTE using their 2600 MHz spectrum. It's been live for about a year now.

    They ran a speedtest competition around the time they lit up the first 2600 MHz towers in Ontario, and there were a *lot* of entries showing over 90 Mbps. It's listed somewhere on their Redboard blog.

    My phone only does 100 Mbps LTE, and our town doesn't yet officially have LTE (there are 2 towers with it enabled out of the dozen or so towers in town), but I can get a consistent 40 Mbps on speedtests, with the occasional jump over 70.

    So, if backward old Canada can get 150 Mbps LTE working, anywhere should be able to. :)

    Oh, and 6 GB data plans are very common up here.
  • tipoo - Thursday, November 6, 2014 - link

    I wonder if the code morphing has anything to do with the Nexus 9's performance inconsistency? It does amazingly in most singular benchmarks, but it chokes when thrown multitasking or unpredictable code.
