NVIDIA Tegra 4 Architecture Deep Dive, Plus Tegra 4i, Icera i500 & Phoenix Hands On

Name: NVIDIA Tegra 4 Architecture Deep Dive, Plus Tegra 4i, Icera i500 & Phoenix Hands On
Item: NVIDIA Tegra 4 Architecture Deep Dive, Plus Tegra 4i, Icera i500 & Phoenix Hands On

by Anand Lal Shimpi & Brian Klug on February 24, 2013 3:00 PM EST

75 Comments | Add A Comment

75 Comments

Ever since NVIDIA arrived on the SoC scene, it has done a great job of introducing its ultra mobile SoCs. Tegra 2 and 3 were both introduced with a healthy amount of detail and the sort of collateral we expect to see from any PC silicon vendor. While the rest of the mobile space is slowly playing catchup, NVIDIA continued the trend with its Tegra 4 and Tegra 4i architecture disclosure.

Since Tegra 4i is a bit further out, much of NVIDIA’s focus for today’s disclosure focused on its flagship Tegra 4 SoC due to begin shipping in Q2 of this year along with the NVIDIA i500 baseband. At a high level you’re looking at a quad-core ARM Cortex A15 (plus fifth A15 companion core) and a 72-core GeForce GPU. To understand Tegra 4 at a lower level, we’ll dive into the individual blocks beginning, as usual with the CPU.

ARM’s Cortex A15 and Power Consumption

Tegra 4’s CPU complex sees a significant improvement over Tegra 3. Despite being an ARM architecture licensee, NVIDIA once again licensed a complete processor from ARM rather than designing its own core. I do fundamentally believe that NVIDIA will go the full custom route eventually (see: Project Denver), but that’s a goal that will take time to come to fruition.

In the case of Tegra 4, NVIDIA chose to license ARM’s Cortex A15 - the only vanilla ARM core presently offered that can deliver higher performance than a Cortex A9.

Samsung recently disclosed details about its Cortex A15 implementation compared to the Cortex A7, a similarly performing but more power efficient alternative to the A9. In its ISSCC paper on the topic Samsung noted that the Cortex A15 offered up to 3x the performance of the Cortex A7, at 4x the area and 6x the power consumption. It’s a tremendous performance advantage for sure, but it comes at a great cost to area and power consumption. The area side isn’t as important as NVIDIA has to eat that cost, but power consumption is a valid concern.

To ease fears about power consumption, NVIDIA provided the following data:

The table above is a bit confusing so let me explain. In the first row NVIDIA is showing that it has configured the Tegra 3 and 4 platforms to deliver the same SPECint_base 2000 performance. SPECint is a well respected CPU benchmark that stresses everything from the CPU core to the memory interface. The int at the end of the name implies that we’re looking at purely single threaded integer performance.

The second row shows us the SPECint per watt of the Tegra 3/4 CPU subsystem, when running at the frequencies required to deliver a SPECint score of 520. By itself this doesn’t tell us a whole lot, but we can use this data to get some actual power numbers.

At the same performance level, Tegra 4 operates at 40% lower power than Tegra 3. The comparison is unfortunately not quite apples to apples as we’re artificially limiting Tegra 4’s peak clock speed, while running Tegra 3 at its highest, most power hungry state. The clocks in question are 1.6GHz for Tegra 3 and 825MHz for Tegra 4. Running at lower clocks allows you to run at a lower voltage, which results in much lower power consumption. In other words, NVIDIA’s comparison is useful but skewed in favor of Tegra 4.

What this data does tell us however is exactly how NVIDIA plans on getting Tegra 4 into a phone: by aggressively limiting frequency. If a Cortex A15 at 825MHz delivers identical performance at a lower power compared to a 40nm Cortex A9 at 1.6GHz, it’s likely possible to deliver a marginal performance boost without breaking the power bank.

That 825MHz mark ends up being an important number, because that’s where the fifth companion Cortex A15 tops out at. I suspect that in a phone configuration NVIDIA might keep everything running on the companion core for as long as possible, which would address my fears about typical power consumption in a phone. Peak power consumption is still going to be a problem I think.

ARM's Cortex A15 Architecture

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

75 Comments

View All Comments

Krysto - Monday, February 25, 2013 - link
S600 is just a slightly overclocked S4 Pro with the same GPU.

The real competitor of Tegra 4 will be S800. We'll see if it wins in CPU performance (it might not), and I think there's a high chance it will lose in GPU performance, as Adreno 330 is only 50% faster than Adreno 320 I think, and Tegra 4 is about twice as fast.

Qualcomm has always had slower graphics performance than Nvidia actually. The only "gap" they found in the market was last fall with the Adreno 320, when Nvidia didn't have anything good to show. But Tegra 3 beat S4 with its Adreno 225.
watersb - Monday, February 25, 2013 - link
I'm amazed at the depth of this NVIDIA data-dump. Brilliant work.

Anand's observation re: die size, cost strategy, position in the market and how this buys them time to consolidate... Wow.

Clearly, Nvidia is in this game for the long haul.
djgandy - Monday, February 25, 2013 - link
So OpenGL ES 3.0 doesn't matter, but quad core A15 does? Why do people suck up to Nvidia and their marketing BS so much?

T4i still single channel memory? What a joke configuration.
djgandy - Monday, February 25, 2013 - link
Also a 9 page article about a mobile SoC without a single reference to the word "battery".
varad - Monday, February 25, 2013 - link
Read the article before you write such comments. The very first page is "Introduction & Power" where they do mention some numbers and their thoughts.
djgandy - Tuesday, February 26, 2013 - link
Yeah its all smoke and mirrors under lab test conditions. Where is the real battery life? Is this not for battery powered devices?
Krysto - Monday, February 25, 2013 - link
Personally, I think all 2013 GPU's should have support for OpenGL ES 3.0 and OpenCL. I was stunned to find out Tegra 4 was not going to support it as they haven't even switched to a unified shader architecture.

That being said, Anand is probably right that it was the right move for Nvidia, and they are just going to wait for the Maxwell architecture to streamline the same custom ARMv8 CPU from Tegra 5 to Project Denver across product line-ups, and also the same Maxwell GPU cores.

If that's indeed their plan, then switching Tegra 4 to Kepler this year, only to switch again to Maxwell next year wouldn't have made any sense. GPU architectures barely change even every 2-3 years, let alone 1 year. It wouldn't have been cost effective for them.

I do hope they aren't going to delay the transition again with Tegra 5 though, and I also do hope they follow Qualcomm's strategy with S4 last year of switching IMEMDIATELY to the 20nm process, instead of continuing on 28nm with Tegra 5, like they did with Tegra 3 on 40nm. But I fear Nvidia will repeat the same mistake.

If they put Tegra 5 on 20nm, and make it 120mm2 in size, with Maxwell GPU core, I don't think even Apple's A8X will stand against it next year in terms of GPU performance (and of course it will get beaten easily in CPU performance, just like this year).
djgandy - Tuesday, February 26, 2013 - link
Tegra is smaller because it lacks features and also memory bandwidth. The comparison is not really fair to assume you can just throw more shaders at the problem. You'll need wider memory bus for a start. You'll need more TMU's and in the future it's probably smart to have a dedicate ROP unit. Then also are you seriously going to just stick with FP20 and not support ES 3.0 and OpenCL? OEMs see OpenCL as a de facto feature these days, not because it is widely used but because it opens up future possibilities. Nvidia has simply designed an SoC for gaming here.

Your post focuses on performance, but these are battery powered devices. The primary design goal is efficiency, and it would appear that is why apple went swift and not A15. A15 is just too damn power hungry, even for a tablet.
metafor - Tuesday, February 26, 2013 - link
If the silicon division of Apple were its own business, they'd be in the red. Very few silicon providers can afford to make 120mm^2 chips and still make a profit; let alone one with as little bargaining clout in the mobile space as nVidia.

Numbers are great but at the end of the day, making money is what matters.
milli - Monday, February 25, 2013 - link
nVidia is trying hard but Tegra still isn't making them any money ...

NVIDIA Tegra 4 Architecture Deep Dive, Plus Tegra 4i, Icera i500 & Phoenix Hands On

ARM’s Cortex A15 and Power Consumption

Post Your Comment

75 Comments

View All Comments

Krysto - Monday, February 25, 2013 - link

watersb - Monday, February 25, 2013 - link

djgandy - Monday, February 25, 2013 - link

djgandy - Monday, February 25, 2013 - link

varad - Monday, February 25, 2013 - link

djgandy - Tuesday, February 26, 2013 - link

Krysto - Monday, February 25, 2013 - link

djgandy - Tuesday, February 26, 2013 - link

metafor - Tuesday, February 26, 2013 - link

milli - Monday, February 25, 2013 - link

Log in

Don't have an account? Sign up now