Originally announced in February of this year at MWC, NVIDIA is finally officially launching its next-generation SoC. Previously known under the code name Kal-El, the official name is Tegra 3 and we'll see it in at least one product before the end of the year.

Like Tegra 2 before it, NVIDIA's Tegra 3 is an SoC aimed at both smartphones and tablets built on TSMC's 40nm LPG process. Die size has almost doubled from 49mm^2 to somewhere in the 80mm^2 range.

The Tegra 3 design is unique in the industry as it is the first to implement four ARM Cortex A9s onto a chip aimed at the bulk of the high end Android market. NVIDIA's competitors have focused on ramping up the performance of their dual-core solutions either through higher clocks (Samsung Exynos) or through higher performing microarchitectures (Qualcomm Krait, ARM Cortex A15). While other companies have announced quad-core ARM based solutions, Tegra 3 will likely be the first (and only) to ship in an Android tablet and smartphone in 2011 - 2012.

NVIDIA will eventually focus on improving per-core performance with subsequent iterations of the Tegra family (perhaps starting with Wayne in 2013), but until then Tegra 3 attempts to increase performance by exploiting thread level parallelism in Android.

GPU performance also sees a boon thanks to a larger and more efficient GPU in Tegra 3, but first let's talk about the CPU.

Tegra 3's Four Five Cores

The Cortex A9 implementation in Tegra 3 is an improvement over Tegra 2; each core now includes full NEON support via an ARM MPE (Media Processing Engine). Tegra 2 lacked any support for NEON instructions in order to keep die size small.


NVIDIA's Tegra 2 die


NVIDIA's Tegra 3 die, A9 cores highlighted in yellow

L1 and L2 cache sizes remain unchanged. Each core has a 32KB/32KB L1 and all four share a 1MB L2 cache. Doubling core count over Tegra 2 without a corresponding increase in L2 cache size is a bit troubling, but it does indicate that NVIDIA doesn't expect the majority of use cases to saturate all four cores. L2 cache latency is 2 cycles faster on Tegra 3 than 2, while L1 cache latencies haven't changed. NVIDIA isn't commenting on L2 frequencies at this point.

The A9s in Tegra 3 can run at a higher max frequency than those in Tegra 2. With 1 core active, the max clock is 1.4GHz (up from 1.0GHz in the original Tegra 2 SoC). With more than one core active however the max clock is 1.3GHz. Each core can be power gated in Tegra 3, which wasn't the case in Tegra 2. This should allow for lightly threaded workloads to execute on Tegra 3 in the same power envelope as Tegra 2. It's only in those applications that fully utilize more than two cores that you'll see Tegra 3 drawing more power than its predecessor.

The increase in clock speed and the integration of MPE should improve performance a bit over Tegra 2 based designs, but obviously the real hope for performance improvement comes from using four of Tegra 3's cores. Android is already well threaded so we should see gains in portions of things like web page rendering.

It's an interesting situation that NVIDIA finds itself in. Tegra 3 will show its biggest performance advantage in applications that can utilize all four cores, yet it will be most power efficient in applications that use as few cores as possible.

There's of course a fifth Cortex A9 on Tegra 3, limited to a maximum clock speed of 500MHz and built using LP transistors like the rest of the chip (and unlike the four-core A9 cluster). NVIDIA intends for this companion core to be used for the processing of background tasks, for example when your phone is locked and in your pocket. In light use cases where the companion core is active, the four high performance A9s will be power gated and overall power consumption should be tangibly lower than Tegra 2.

Despite Tegra 3 featuring a total of five Cortex A9 cores, only four can be active at one time. Furthermore, the companion core cannot be active alongside any of the high performance A9s. Either the companion core is enabled and the quad-core cluster disabled or the opposite.

NVIDIA handles all of the core juggling through its own firmware. Depending on the level of performance Android requests, NVIDIA will either enable the companion core or one or more of the four remaining A9s. The transition should be seamless to the OS and as all of the cores are equally capable, any apps you're running shouldn't know the difference between them.

The Tegra 3 GPU: 2x Pixel Shader Hardware of Tegra 2
POST A COMMENT

94 Comments

View All Comments

  • dagamer34 - Wednesday, November 09, 2011 - link

    Using 40nm isn't an excuse when both Apple and Samsung use 45nm and have GPUs that trounce the Tegra 2 in real life and Tegra 3 on paper. Reply
  • eddman - Wednesday, November 09, 2011 - link

    Yeah, and A5 is about 42%-43% bigger than tegra 3, an seems to be consuming more power and run hotter. I'd rather have less GPU power than that.

    Don't know anything about exynos' size and other characteristics.

    Anand, do you have any such information on exynos?
    Reply
  • MySchizoBuddy - Wednesday, November 09, 2011 - link

    what's your source of A5 die size? Reply
  • eddman - Wednesday, November 09, 2011 - link

    At first this: http://www.anandtech.com/show/4840/kalel-has-five-...

    Anand says tegra 3 is 30% smaller than A5, which means A5 is 42-43% bigger.

    After your above comment, I searched a little bit, and noticed in the IT pro portal article linked in my other comment, it says 120 mm^2.

    I also found these:

    http://www.eetimes.com/electronics-news/4215094/A5...

    http://www.notebookcheck.net/Analyst-explains-grap...

    Here, it's 122 mm^2.

    Now with the exact size known, it puts the A5 in an even worse situation, 50-52% bigger.
    Reply
  • eddman - Wednesday, November 09, 2011 - link

    Ok, it seems exynos' size is about 118 mm^2.

    http://www.itproportal.com/2011/06/07/exynos-soc-s...

    http://www.businesswire.com/news/home/201107070061...

    Considering that tegra 3 has 5 cores and yet is still much smaller, I might say nvidia has actually done some nice engineering here.

    Wonder how much of that difference is because of 40 nm process vs. 45 nm. Probably not much, but what do I know. Can anyone do some calculations?
    Reply
  • metafor - Wednesday, November 09, 2011 - link

    It's really difficult to judge because they're from two different foundries. The minimum etch (e.g. 45nm, 40nm) isn't the only thing that affects die area. Some processes require stricter design rules that end up bloating the size of logic.

    Samsung uses Samsung semi's foundries while nVidia uses TSMC. It's difficult to say how they compare without two identical designs that have gone to fab on both.
    Reply
  • Klinky1984 - Wednesday, November 09, 2011 - link

    I think the 500Mhz companion core & proper power gating alleviates most of the concerns about power consumption. Reply
  • metafor - Wednesday, November 09, 2011 - link

    Not really. It alleviates the concern of power consumption on light loads. While that is a big part of common usage and it's definitely a benefit to have great idle/light power, I still would like to have better battery life while I'm heavily using the device. For instance, while playing a resource-heavy game or going to pretty complex websites.

    One thing I do like is that they've improved the efficiency of the video decoder. This makes one of the most common use-cases (watching movies) less power-intensive.
    Reply
  • SniperWulf - Wednesday, November 09, 2011 - link

    I agree. I would rather they had made a strong dual-core and dedicate the rest of the die space to a second memory channel and a stronger GPU Reply
  • a5cent - Wednesday, November 09, 2011 - link

    Qualcom is the only SoC manufacturer making the transition to 28nm anytime soon. Everyone else is shifting at the very end of 2012 (at the earliest). Reply

Log in

Don't have an account? Sign up now