The Tegra 3 GPU: 2x Pixel Shader Hardware of Tegra 2

Tegra 3's GPU is very much an evolution of what we saw in Tegra 2. The GeForce in Tegra 2 featured four pixel shader units and four vertex shader units; in Tegra 3 the number of pixel shader units doubles while the vertex processors remain unchanged. This brings Tegra 3's GPU core count up to 12. NVIDIA still hasn't embraced a unified architecture, but given how closely it's mimicking the evolution of its PC GPUs I wouldn't expect such a move until the next-gen architecture - possibly in Wayne.

Mobile SoC GPU Comparison
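| | Adreno 225 | PowerVR SGX 540 | PowerVR SGX 543 | PowerVR SGX 543MP2 | Mali-400 MP4 | GeForce ULP | Kal-El GeForce |
|---|---|---|---|---|---|---|---|
| SIMD Name | - | USSE | USSE2 | USSE2 | Core | Core | Core |
| # of SIMDs | 8 | 4 | 4 | 8 | 4 + 1 | 8 | 12 |
| MADs per SIMD | 4 | 2 | 4 | 4 | 4 / 2 | 1 | 1 |
| Total MADs | 32 | 8 | 16 | 32 | 18 | 8 | 12 |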
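
The Total MADs row is simply the number of SIMDs multiplied by the MADs per SIMD. A minimal Python sketch of that arithmetic follows, using only the figures from the table; the `GPUS` dictionary and `total_mads` helper are illustrative names, and treating the Mali-400 MP4 entry ("4 + 1" units at "4 / 2" MADs) as two separate groups is an assumption about how to read that column.

```python
# (number of SIMDs/cores, MADs per SIMD) groups for each GPU, copied from the table.
# Assumption: Mali-400 MP4's "4 + 1" and "4 / 2" mean four units with 4 MADs
# plus one unit with 2 MADs.
GPUS = {
    "Adreno 225": [(8, 4)],
    "PowerVR SGX 540": [(4, 2)],
    "PowerVR SGX 543": [(4, 4)],
    "PowerVR SGX 543MP2": [(8, 4)],
    "Mali-400 MP4": [(4, 4), (1, 2)],
    "GeForce ULP": [(8, 1)],
    "Kal-El GeForce": [(12, 1)],
}


def total_mads(groups):
    """Sum units * MADs-per-unit over every group of execution units."""
    return sum(units * mads for units, mads in groups)


for name, groups in GPUS.items():
    print(f"{name}: {total_mads(groups)} total MADs")
```

Read this way, every entry reproduces the Total MADs row, including the 18 listed for Mali-400 MP4.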

Per core performance has improved a bit. NVIDIA worked on timing of critical paths through the GPU's execution units to help it run at higher clock speeds. NVIDIA wouldn't confirm the target clock for Tegra 3's GPU other than to say it was higher than Tegra 2's 300MHz. Peak floating point throughput per core is unchanged (one MAD per clock), but each core should be more efficient thanks to larger caches in the design.

A combination of these improvements and newer drivers is what gives Tegra 3's GPU its 2x to 3x performance advantage over Tegra 2, despite only a 50% increase in overall execution resources (8 cores to 12). In pixel shader bound scenarios, execution horsepower effectively doubles, so the 2x gains are more believable there. I don't expect many games to be vertex processing bound, so the lack of significant improvement there shouldn't be a big issue for Tegra 3.
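
To make the scaling argument concrete, here is a rough Python sketch of the resource ratios and peak MAD throughput. The unit counts and Tegra 2's 300MHz clock come from the text above; the Tegra 3 clock is not confirmed, so `HYPOTHETICAL_TEGRA3_CLOCK_MHZ` is a placeholder assumption rather than a real specification, and the helper names are illustrative.

```python
# Shader unit counts stated in the article.
TEGRA2 = {"pixel": 4, "vertex": 4}
TEGRA3 = {"pixel": 8, "vertex": 4}

# Ratio of total execution resources: 12 cores vs 8 cores.
total_ratio = sum(TEGRA3.values()) / sum(TEGRA2.values())
# Ratio of pixel shader units only: 8 vs 4.
pixel_ratio = TEGRA3["pixel"] / TEGRA2["pixel"]


def peak_mads_per_second(cores, clock_mhz, mads_per_core_per_clock=1):
    """Peak MAD throughput, assuming one MAD per core per clock."""
    return cores * mads_per_core_per_clock * clock_mhz * 1e6


TEGRA2_CLOCK_MHZ = 300  # stated in the article
# ASSUMPTION: NVIDIA only said "higher than 300MHz"; 400 is a made-up placeholder.
HYPOTHETICAL_TEGRA3_CLOCK_MHZ = 400

tegra2_peak = peak_mads_per_second(sum(TEGRA2.values()), TEGRA2_CLOCK_MHZ)
tegra3_peak = peak_mads_per_second(sum(TEGRA3.values()), HYPOTHETICAL_TEGRA3_CLOCK_MHZ)

print(f"Total resources: {total_ratio:.1f}x, pixel shaders: {pixel_ratio:.1f}x")
print(f"Peak throughput at the placeholder clock: {tegra3_peak / tegra2_peak:.2f}x")
```

The point of the sketch is that core count alone yields 1.5x overall; any gain beyond that has to come from clock speed, cache efficiency and drivers.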

Ready for Gaming: Stereoscopic 3D and Expanded Controller Support

Tegra 3 now supports stereoscopic 3D for displaying content from YouTube, NVIDIA's own 3D Vision Live website and some Tegra Zone games. In its port of Android, NVIDIA has also added expanded controller support for PS3, Xbox 360 and Wii controllers among others.

Tegra 3 Video Encoding/Decoding and ISP

There's unfortunately not too much to go on here, especially not until we have some testable hardware in hand, but NVIDIA is claiming a much improved video decoder and more efficient video encoder in Tegra 3.

Tegra 3's video decoder can accelerate 1080p H.264 High Profile content at up to 40Mbps, although device vendors can impose their own bitrate caps and file limitations on the silicon. NVIDIA wouldn't go into greater detail as to what's changed since Tegra 2, other than to say that the video decoder is more efficient. The video encoder is capable of 1080p H.264 Baseline Profile encode at 30 fps.
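
For a sense of scale, the 40Mbps ceiling can be converted into a storage rate. This is a small Python sketch of the unit conversion only; the function name is illustrative and it uses decimal megabytes (1 MB = 8 megabits).

```python
def megabytes_per_minute(bitrate_mbps):
    """Convert a video bitrate in megabits per second to megabytes per minute."""
    return bitrate_mbps / 8 * 60


# A stream at the decoder's stated 40Mbps maximum.
print(megabytes_per_minute(40))  # 300.0
```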

The Image Signal Processor (ISP) in Tegra 3 is twice as fast as what was in Tegra 2 and NVIDIA promised more details would be forthcoming (likely alongside the first Tegra 3 smartphone announcements).

Memory Interface: Still Single Channel, DDR3L-1500 Supported

Tegra 3 supports higher frequency memories than Tegra 2 did, but the memory controller itself is mostly unchanged from the previous design. While Tegra 2 supported LPDDR2 at data rates of up to 600MHz, Tegra 3 increases that to LPDDR2-1066, and DDR3L is supported at data rates of up to 1500MHz. The memory interface is still only 32 bits wide, resulting in far less theoretical bandwidth than Apple's A5, Samsung's Exynos 4210, TI's OMAP 4, or Qualcomm's upcoming MSM8960. This is particularly concerning given the increase in core count as well as GPU execution resources. NVIDIA doesn't expect memory bandwidth to be a limitation, but I can't see how that wouldn't be the case in 3D games. Perhaps it's a good thing that Infinity Blade doesn't yet exist for Android.
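
Theoretical peak bandwidth is just bus width times data rate, which a short Python sketch can illustrate. It assumes the quoted data rates are in mega-transfers per second, uses decimal gigabytes, and includes a 64-bit interface purely as a hypothetical comparison point rather than the specification of any competing SoC; the function name is illustrative.

```python
def peak_bandwidth_gbs(bus_width_bits, data_rate_mtps):
    """Theoretical peak bandwidth in GB/s: bytes per transfer * transfers per second."""
    return bus_width_bits / 8 * data_rate_mtps * 1e6 / 1e9


configs = [
    ("Tegra 2, LPDDR2-600, 32-bit", 32, 600),
    ("Tegra 3, LPDDR2-1066, 32-bit", 32, 1066),
    ("Tegra 3, DDR3L-1500, 32-bit", 32, 1500),
    # HYPOTHETICAL: same LPDDR2-1066 memory on a 64-bit interface, for comparison only.
    ("Hypothetical LPDDR2-1066, 64-bit", 64, 1066),
]

for label, width, rate in configs:
    print(f"{label}: {peak_bandwidth_gbs(width, rate):.2f} GB/s")
```

Doubling the interface width doubles the peak figure at the same data rate, which is why a 32-bit interface is at a disadvantage against wider designs even when it uses faster memory.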

SATA II Controller: On Die

Given Tegra 3 will find itself in convertible Windows 8 tablets, this next feature makes a lot of sense. NVIDIA's latest SoC includes an on-die SATA II controller, a feature that wasn't present on Tegra 2.



Comments

  • jcompagner - Thursday, November 10, 2011

    Does the OS not do the scheduling?

    I think there are loads of things built into the OS that schedule the processor's threads. For example, the OS must be NUMA-aware on NUMA systems so that it keeps processes/threads on the cores that share the same CPU/memory banks.

    If I look at Windows, it schedules everything all over the place, but it does know about Hyper-Threading, because those cores are skipped when I don't use more than 4 cores at the same time.
  • DesktopMan - Wednesday, November 9, 2011

    Seems risky to launch with a GPU that's weaker than existing SOCs. Compared to the Apple A5 performance it looks more like a 2009 product... Exynos also has it beat. The main competitor it beats is Qualcomm, who isn't far from launching new SOCs themselves.
  • 3DoubleD - Wednesday, November 9, 2011

    At least it looks more powerful than the SGX540 which is in the Galaxy Nexus. I'll wait and see what the real world performance is before writing it off. I suspect it will have "good enough" performance. I doubt we will see much improvement in Android devices until 28 nm as die sizes seem to be the limiting factor. Fortunately Nvidia has their name on the line here and they seem to be viciously optimizing their drivers to get every ounce of performance out of this thing.
  • DesktopMan - Wednesday, November 9, 2011

    Totally agree on the Galaxy Nexus. That GPU is dinosaur old though. Very weird to use it in a phone with that display resolution. Any native 3d rendering will be very painful.
  • eddman - Wednesday, November 9, 2011

    "Exynos also has it beat"

    We don't know that. On paper kal-el's geforce should be at least as fast as exynos. Better wait for benchmarks.
  • mythun.chandra - Wednesday, November 9, 2011

    It's all about the content. While it would be great to win GLBench and push out competition-winning benchmark scores, what we've focused on is high quality content that fully exploits everything Tegra 3 has to offer.
  • psychobriggsy - Friday, November 11, 2011

    I guess it depends on the clock speed the GPU is running at, and the efficiency it achieves when running. Whilst not as powerful per-clock (looking at the table in the article), a faster clock could make up a lot of the difference. Hopefully NVIDIA's experience with GPUs also means it is very efficient. Certainly the demos look impressive.

    But they're going to have to up their game soon considering the PowerVR Series 6, the ARM Mali 6xx series, and so on, as these are far more capable.
  • AmdInside - Wednesday, November 9, 2011

    Anyone else getting an error when opening the Asus Transformer Prime gallery?
  • skydrome1 - Wednesday, November 9, 2011

    I am still quite underwhelmed by its GPU. I mean, come on NVIDIA. A company with roots in GPU development having the lowest GPU performance?

    They need to up their game. Or everyone's just going to license others' IP and develop their own SoCs. LG got an ARM license. Sony got an Imagination license. Samsung's even got their own SoCs shipping. Apple is sticking to in-house design. HTC acquired S3.

    After telling the whole world that by the end of next year, there will be phones that will beat consoles in raw graphical performance, I feel like an idiot.

    Please prove me right, NVIDIA.
  • EmaNymton - Wednesday, November 9, 2011

    REALLY getting tired of all the Anandtech articles being overly focused on performance and ignoring battery life or making statements about the technologies that will theoretically increase battery life. Total ACTUAL battery life matters and increases in perf shouldn't come to the detriment of total ACTUAL battery life.

    This over-emphasis on perf and refusing to hold MFGRs to account for battery life is bordering on irresponsible and is driving this behavior in the hardware MFGRs.

