Final Words

NVIDIA’s challenge with Tegra has always been getting design wins. In the past, NVIDIA offered quirky alternatives to Qualcomm, most of the time at a more attractive price point. With Tegra K1, NVIDIA offers a substantial feature and performance advantage thanks to its mobile Kepler GPU. I still don’t anticipate broad adoption in the phone space. If NVIDIA sees even some traction among Android tablets, that’s enough to get to the next phase: trying to get some previous generation console titles ported over to the platform.

NVIDIA finally has the hardware necessary to give me what I’ve wanted ever since SoC vendors first started focusing on improving GPU performance: the ability to run Xbox 360 class titles in mobile. With Tegra K1 the problem goes from being a user interface, hardware and business problem to mostly a business problem. Android support for game controllers is reasonable enough, and K1 more or less fixes the hardware limitations, leaving only the question of how game developers make enough money to justify the effort of porting. I suspect that if we’re talking about moving over a library of existing titles that have already been substantially monetized, there doesn’t need to be all that much convincing. NVIDIA claims it’s already engaged with many game developers on this front, but I do believe it’ll still be an uphill battle.

If I were in Microsoft’s shoes, I’d view Tegra K1 as an opportunity to revolutionize my mobile strategy. Give users the ability to run games like Grand Theft Auto V on a mobile device in the not too distant future and you’ve now made your devices more interesting to a large group of users. I don’t anticipate many wanting to struggle to play console games on a 5-inch touchscreen, but with a good controller dock (or tablet with a kickstand + wireless controller) the interface problem goes away.

For the first time I’m really excited about an NVIDIA SoC. It took the company five generations to get here, but we finally have an example of NVIDIA doing what it’s really good at (making high performance GPUs) in mobile. NVIDIA will surely update its Tegra Note 7 to a Tegra K1 version (most of its demos were run in a Tegra Note 7 chassis), but even if that and Shield are the best we get, the impact on the rest of the market will be huge. With Tegra K1, NVIDIA has really raised the bar for GPU performance.

The CPU side, at least for the Cortex A15 version, is less interesting to me. ARM’s Cortex A15, particularly at high clocks, has proven to be a decent fit for tablets. I am curious to see how the Denver version of Tegra K1 turns out. If the rumors are true, Denver could very well be one of the biggest risks we’ve seen taken in pursuit of building a low power mobile CPU. I am eager to see how this one plays out.


Finally: two big cores instead of a silly number of tiny cores

NVIDIA hasn’t had the best track record of meeting shipping goals on previous Tegra designs. I really hope we see Tegra K1, particularly the Denver version, ship on time (although I'm highly doubtful this will happen - new custom CPU core, GPU and process all at the same time?). I’m very eager not only for an install base of mobile devices with console-class GPUs to start building, but also to see what Denver can do. I suspect we’ll find out more at GTC about the latter. It’s things like Tegra K1 that really make covering the mobile space exciting.

Comments

  • jerrylzy - Tuesday, January 7, 2014 - link

    Exactly. I don't see Loki726's point about gamers paying extra for double precision. AMD cards are generally much cheaper at the same performance level, though at the cost of power consumption.
  • Loki726 - Tuesday, January 7, 2014 - link

    I mean compared to a world where AMD decided to rip out the double precision units. There are obviously many (thousands of) other factors that go into the efficiency of a GPU.
  • jerrylzy - Tuesday, January 7, 2014 - link

    Unfortunately, instead of using VLIW5, Qualcomm implemented a new scalar architecture way back in the Adreno 320.
  • Loki726 - Tuesday, January 7, 2014 - link

    Yep, they have improved on it, but they started with the AMD design. My point was that the Qualcomm GPU is a better comparison point for a Tegra SoC than an AMD desktop part.
  • ddriver - Wednesday, January 8, 2014 - link

    The decision to choose Qualcomm in favor of Tegra would be based entirely on the absence of OpenCL support in Tegra. Exclusive CUDA? Come on, who would want to invest in writing a parallel, accelerated, high performance routine that only works on maybe 5% of the devices capable of running it? Not me anyway.

    The mention of the Radeon was regarding a completely different point: that NVIDIA cuts DP performance even where it makes no sense to, which is IMO criminal. The "gain" of such a terrible DP implementation is completely outweighed by the loss of potential performance and the possibility of accelerating a lot of professional workstation software. And for what, so that the only spared parts, the "professional" products, can have their ridiculous prices better "justified"? Because it is such a sweet deal to make a product 10% more expensive to build and ask 5000% more money for it.

    Which is the reason AMD offers so much more value. While limp and non-competitive in CPU performance, AMD's much cheaper desktop enthusiast product actually delivers more raw computational power than the identical but more conservatively clocked FireGL analog, and professional workstation software, the place where computation is really needed, can greatly benefit from that parallelism (see the rough peak-throughput sketch after this thread). Sure, FireGL still has its perks, such as ECC and double the memory, but those advantages shine in very rare circumstances; in most computation-heavy professional software the desktop part is still an incredibly lucrative investment, something you just don't get with NVIDIA because of what they decided to do over the last few years. Coincidentally, the move to cripple DP performance to 1/24 coincided with the re-pimping of the Quadros into the Tesla line. I think it is rather obvious that NVIDIA decided to shamelessly milk the parallel supercomputing professional market, something that will backfire in their face, especially stacked with the downplaying of OpenCL in favor of a vendor exclusive API to use the hardware.
  • Loki726 - Wednesday, January 8, 2014 - link

    Agreed with the point about code portability, but that's an entirely different issue. I'd actually take the point further and say that OpenCL is too vendor specific: it only runs on a few GPUs and has shaky support on mobile. Parallel code should be written against a library like pthreads, C++ (or pick your favorite language) standard library threads, or MPI (see the std::thread sketch after this thread). Why program in a new language that is effectively C/C++, except that it isn't?

    I personally think that if a company artificially inflates the price of specific features like double precision, then they leave themselves open to being undercut by a competitor and they will either be forced to change it or go out of business. As I said, AMD's design choice penalizes gamers, but helps users who want compute features, and NVIDIA's choice benefits gamers, but penalizes desktop users who want the best value for some compute features like double precision.

    I have a good understanding of circuit design and VLSI implementation of floating point units and I can say that the area and power overheads of adding in 768 extra double precision units to a Kepler GPU or 896 double precision units to a GCN GPU would be noticeable, even if you merged pairs of single precision units together and shared common logic (which would create scheduling hazards at the uArch level that could further eat into perf, and increase timing pressure during layout).

    Take a look at this paper from Mark Horowitz (an expert) that explores power and area tradeoffs in floating point unit design if you don't believe me. It should be easy to verify. http://www.cpe.virginia.edu/grads/pdfs/August2012/... . Look at the area and power comparisons in Table 1, scale them to 28nm, and multiply them by ~1000x (to get up to 1/2 or 1/4 of single precision throughput).

    Double precision units are big, and adding a lot of them adds a lot of power and area.
  • Krysto - Saturday, January 11, 2014 - link

    I want to believe OpenCL was left out because they've been trying to squeeze so much in this time-frame already. But since they fully ported everything in one swoop, I still find it hard to believe they didn't omit it on purpose. Hopefully, they'll support OpenCL 2.0 in Maxwell, because OpenCL 2.0 also offers some great parallelism features, which Maxwell could take advantage of.
  • Andromeduck - Wednesday, January 8, 2014 - link

    Isn't that what the GTX Titan is for?
  • Jon Tseng - Monday, January 6, 2014 - link

    Sounds very interesting. The Q for me, though, as you allude to at the end, is whether they can recruit devs to utilise this. Especially when mobile games are a freemium-dominated world, the temptation is to code for the lowest common denominator/max audience, probably with a Samsung label on it (I'm not complaining - it's what's enabled me to run World of Tanks happily on my Bay Trail T100!).

    World-beating GPU tech is no use unless people are utilising it. Interesting thought about getting MSFT on board, though - I guess the downer is that Windows Phone is still a minority sport, and tablet-wise it would have to be Windows RT... :-x
  • nicolapeluchetti - Monday, January 6, 2014 - link

    The processing power might be the same as the Xbox 360/PS3, but doesn't using DirectX incur a performance hit?
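
To put rough numbers on ddriver's double precision argument above, here is a minimal back-of-the-envelope sketch in C++. The specs are approximate, publicly quoted figures for two 2012-era consumer cards (a GTX 680-class Kepler part and an HD 7970-class GCN part) and are used purely for illustration, not as measurements.

```cpp
#include <cstdio>

// Peak throughput ~= shaders * 2 ops per FMA * clock (GHz), scaled by the FP64 rate.
// Approximate, publicly quoted specs for two 2012-era consumer cards; illustrative only.
struct Gpu {
    const char* name;
    int shaders;      // single precision lanes
    double clock_ghz; // approximate clock speed
    double fp64_rate; // fraction of the FP32 rate available for FP64
};

int main() {
    const Gpu cards[] = {
        {"Kepler consumer (GTX 680-class)", 1536, 1.006, 1.0 / 24.0},
        {"GCN consumer (HD 7970-class)",    2048, 0.925, 1.0 / 4.0},
    };
    for (const Gpu& g : cards) {
        const double fp32 = g.shaders * 2.0 * g.clock_ghz; // peak GFLOPS, counting FMA as 2 ops
        const double fp64 = fp32 * g.fp64_rate;            // peak GFLOPS at the advertised DP rate
        std::printf("%-32s  FP32 ~%5.0f GFLOPS  FP64 ~%4.0f GFLOPS\n", g.name, fp32, fp64);
    }
    return 0;
}
```

Run as written, this works out to roughly 3.1 TFLOPS FP32 and 0.13 TFLOPS FP64 for the Kepler card versus roughly 3.8 TFLOPS FP32 and 0.95 TFLOPS FP64 for the GCN card, which is the gap the comment is describing.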
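
And as a footnote to Loki726's preference for standard-library parallelism over vendor-specific languages, below is a minimal sketch of SAXPY split across C++11 std::thread workers. It assumes nothing beyond a C++11 compiler and the standard library; it is meant only to show the portable baseline the comment describes, not tuned code.

```cpp
#include <algorithm>
#include <cstdio>
#include <thread>
#include <vector>

// Each worker applies y[i] = a * x[i] + y[i] to its own contiguous slice of the arrays.
void saxpy_slice(float a, const float* x, float* y, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        y[i] = a * x[i] + y[i];
}

int main() {
    const std::size_t n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);

    // One worker per hardware thread, falling back to 1 if the count is unknown.
    const unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t chunk = (n + workers - 1) / workers;

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < workers; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;
        pool.emplace_back(saxpy_slice, 2.0f, x.data(), y.data(), begin, end);
    }
    for (auto& th : pool) th.join();

    std::printf("y[0] = %.1f (expected 4.0)\n", y[0]);
    return 0;
}
```

Built with something like g++ -std=c++11 -pthread, the same source compiles unchanged on any platform with a conforming toolchain, which is the portability point being made.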
