The GPU

Tegra 4 features an evolved version of the GPU core used in Tegra 3. The architecture retains a fixed division between pixel and vertex shader hardware, making it one of the last modern mobile GPU architectures not to adopt a unified shader model.

I already described a lot of what makes the Tegra 4 GPU different in our original article on the topic. The diagram below gives you an idea of how the pixel and vertex shader hardware grew over the past three generations:

[Diagram: pixel and vertex shader hardware across the past three Tegra generations]

We finally have a competitive GPU architecture from NVIDIA. It's hardly industry leading in terms of specs, but a good portion of the 80mm² die is dedicated to pixel and vertex shading hardware. There's also a new L2 texture cache that helps improve overall bandwidth efficiency.

The big omission here is the lack of full OpenGL ES 3.0 support. NVIDIA's pixel shader hardware remains FP20, while the ES 3.0 spec requires full FP32 support for both pixel and vertex shaders. NVIDIA also lacks support for ETC texture compression and floating point textures, although some ES 3.0 features are implemented (e.g. Multiple Render Targets).
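To make the precision shortfall concrete, here's a rough sketch of how rounding error accumulates when every intermediate result is quantized to a short mantissa. The 13 fraction bits used for the low-precision format below are an assumption for illustration (NVIDIA hasn't published the exact bit layout of its pixel ALUs); FP32 carries 23 fraction bits:

```python
import math

def quantize(value, mantissa_bits):
    """Round a float to the given number of fraction bits."""
    if value == 0.0:
        return 0.0
    m, e = math.frexp(value)                 # value = m * 2**e, 0.5 <= |m| < 1
    scale = 2.0 ** mantissa_bits
    return math.ldexp(round(m * scale) / scale, e)

def accumulate(step, n, mantissa_bits):
    """Sum `step` n times, rounding to the format after each add."""
    acc = 0.0
    for _ in range(n):
        acc = quantize(acc + step, mantissa_bits)
    return acc

exact = 0.01 * 10000
err_fp32 = abs(accumulate(0.01, 10000, 23) - exact)
err_short = abs(accumulate(0.01, 10000, 13) - exact)
print(err_fp32, err_short)   # the short-mantissa accumulator drifts far more
```

The short-mantissa accumulator ends up far further from the true sum than the FP32-style one, which is the sort of error ES 3.0's precision requirements are meant to rule out.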

Mobile SoC GPU Comparison

  GPU                  Used In         SIMD Name   # of SIMDs   MADs per SIMD   Total MADs   GFLOPS @ Shipping Frequency
  GeForce ULP (2012)   Tegra 3         core        3            4               12           12.4
  PowerVR SGX 543MP2   A5              USSE2       8            4               32           16.0
  PowerVR SGX 543MP4   A5X             USSE2       16           4               64           32.0
  PowerVR SGX 544MP3   Exynos 5 Octa   USSE2       12           4               48           51.1
  PowerVR SGX 554MP4   A6X             USSE2       32           4               128          71.6
  GeForce ULP (2013)   Tegra 4         core        18           4               72           74.8
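The GFLOPS figures follow directly from the MAD counts: each MAD (multiply-add) counts as two floating-point operations per clock, so peak GFLOPS = total MADs × 2 × clock. A quick sketch; the ~520MHz clock below is inferred from the table's figures, not a confirmed shipping spec:

```python
def peak_gflops(total_mads, clock_mhz):
    """Peak throughput: each MAD is 2 FLOPs (multiply + add) per clock."""
    return total_mads * 2 * clock_mhz / 1000.0

# Tegra 4's 72 MADs at an assumed ~520MHz shipping clock:
print(peak_gflops(72, 520))   # ~74.9, in line with the table's 74.8 GFLOPS
```

The same arithmetic holds across the table; for example, the SGX 554MP4's 128 MADs at roughly 280MHz land at about its 71.6 GFLOPS figure.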

For users today, the lack of OpenGL ES 3.0 support likely doesn't matter - but it will matter more in a year or two, once game developers start targeting OpenGL ES 3.0. NVIDIA is fully capable of building an OpenGL ES 3.0 enabled GPU, and I suspect the resistance here boils down to wanting to win performance comparisons today without making the die any larger than it needs to be. Thinking back to the earlier discussion of NVIDIA's cost position in the market, the decision makes sense from NVIDIA's standpoint, although it's not great for the industry as a whole.

Tegra 4i retains the same basic GPU architecture as Tegra 4, but with dramatically cut-down hardware. NVIDIA goes from four vertex units down to three, and moves to two larger pixel shader units (increasing the ratio of compute to texture hardware in the T4i GPU). The maximum T4i GPU clock drops a bit, to 660MHz, but that still gives it substantially more GPU performance than Tegra 3.

Memory Interface

The first three generations of Tegra SoCs had an embarrassingly small amount of memory bandwidth, at least compared to contemporary Apple, Samsung and Qualcomm silicon. Admittedly, Samsung and Qualcomm were late adopters of dual-channel memory interfaces, but they still got there much sooner than NVIDIA did.

With Tegra 4, complaints about memory bandwidth can finally be put to rest. The SoC features two 32-bit LPDDR3 memory interfaces, bringing it up to par with the competition. The maximum data rate currently supported by Tegra 4's memory interfaces is 1866MHz, but that may go up in the future.
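Peak theoretical bandwidth follows directly from bus width and data rate: bytes per transfer times transfers per second. A back-of-the-envelope sketch:

```python
def peak_bandwidth_gbps(bus_width_bits, data_rate_mtps):
    """Peak bandwidth in GB/s: bus width in bytes x transfers per second."""
    return (bus_width_bits / 8) * data_rate_mtps / 1000.0

# Tegra 4: two 32-bit LPDDR3-1866 channels = 64-bit effective bus
print(peak_bandwidth_gbps(64, 1866))   # ~14.9 GB/s
```

A single-channel configuration at the same data rate halves that figure, which is why channel count matters as much as DRAM speed here.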

Tegra 4 won't ship in a package-on-package (PoP) configuration, so it will have to be paired with external DRAM. That pushes Tegra 4 toward larger devices, although it should still be able to fit in a phone.

Tegra 4i, unfortunately, has only a single-channel LPDDR3 memory interface, but it will be available in both PoP and discrete configurations. The PoP configuration may top out at LPDDR3-1600, while the discrete version can scale to 1866MHz and beyond.

75 Comments

  • TheJian - Monday, February 25, 2013 - link

    http://www.anandtech.com/show/6472/ipad-4-late-201...
    ipad4 scored 47 vs. 57 for T4 in egypt hd offscreen 1080p. I'd say it's more than competitive with ipad4. T4 scores 2.5x iphone5 in geekbench (4148 vs. 1640). So it's looking like it trumps A6 handily.

    http://www.pcmag.com/article2/0,2817,2415809,00.as...
    T4 should beat 600 in Antutu and browsermark. IF S800 is just an upclocked cpu and adreno 330, this is going to be a tight race in browsermark and a total killing for NV in antutu. 400mhz won't make up the 22678 for HTC ONE vs. T4's 36489. It will fall far short in antutu unless the gpu means a lot more than I think in that benchmark. I don't think S600 will beat T4 in anything. HTC ONE only runs at 1.7ghz; the spec sheet at QCOM says it can go up to 1.9ghz, but that won't help given the beating it took according to pcmag. They said this:
    "The first hint we've seen of Qualcomm's new generation comes in some benchmarks done on the HTC One, which uses Qualcomm's new 1.7-GHz Snapdragon 600 chipset - not the 800, but the next notch down. The Tegra 4 still destroys it."

    Iphone5 got destroyed too. Geekbench on T4=4148 vs. iphone5=1640. OUCH.

    Note samsung/qualcomm haven't let PC mag run their own benchmarks on Octa or S800. Nvidia is showing no signs of fear here. Does anyone have data on the cpu in Snapdragon800? Is the 400cpu in it just a 300cpu clocked up 400mhz because of the process, or is it actually a different core? It kind of looks like this is just 400mhz more on the cpu with an adreno330 on top, instead of the 320 of S600.
    http://www.qualcomm.com/snapdragon/processors/800-...

    https://www.linleygroup.com/newsletters/newsletter...
    "The Krait 300 provides new microarchitecture improvements that increase per-clock performance by 10–15% while pushing CPU speed from 1.5GHz to 1.7GHz. The Krait 400 extends the new microarchitecture to 2.3GHz by switching to TSMC's high-k metal gate (HKMG) process."

    Anyone have anything showing the cpu is MORE than just 400mhz more on a new process? This sounds like no change in the chip itself. That article was Jan23 and Gwennap is pretty knowledgeable. Admittedly I didn't do a lot of digging yet (can't find much on 800 cpu specs, did most of my homework on S600 since it comes first).

    We need some Rogue 6 data now too :) Lots of posts on the G6100 in the last 18hrs...Still reading it all... ROFL (MWC is causing me to do a lot of reading today...). About halfway through, and most of it seems to just brag about opengl es3.0 and DX11.1 (not seeing much about perf). I'm guessing because NV doesn't have it on T4 :) It's not used yet, so I don't care, but that's how I'd attack T4 in the news ;) Try running something from DX11.1 on a soc and I think we'll see a slide show (think crysis3 on a soc...LOL). I'd almost say the same for all of es3.0 being on. NV was wise to save die space here and do a simpler chip that can undercut prices of others. They're working on DX9_3 features in WinRT (hopefully MS will allow it). OpenGL ES3.0 & DX11.1 will be more important next xmas. Game devs won't be aiming at $600 phones for their games this xmas, they'll aim at mass market for the most part, just like on a pc (where they aim at consoles' DX9, then we get ports...LOL). It's a rare game that's aimed at GTX680/7970GHz and up. Crysis3? Most devs shoot far lower.
    http://www.imgtec.com/corporate/newsdetail.asp?New...
    No perf bragging just features...Odd while everyone else brags vs. their own old versions or other chips.

    Qcom CMO goes all out:
    http://www.phonearena.com/news/Qualcomm-CMO-disses...
    "Nvidia just launched their Tegra 4, not sure when those will be in the market on a commercial basis, but we believe our Snapdragon 600 outperforms Nvidia’s Tegra 4. And we believe our Snapdragon 800 completely outstrips it and puts a new benchmark in place.

    So, we clean Tegra 4′s clock. There’s nothing in Tegra 4 that we looked at and that looks interesting. Tegra 4 frankly, looks a lot like what we already have in S4 Pro..."

    OOPS...I guess he needs to check the perf of tegra4 again. PCmag shows his 600 chip got "DESTROYED" and all other competition "crushed". Why is Imagination not bragging about perf of G6100? Is it all about features/api's without much more power? Note that page from phonearena is having issues (their server is), as I had to get it out of google cache just now. He's a marketing guy from Intel so you know, a "blue crystals" kind of guy :) The CTO would be bragging about perf, I think, if he had it. Anand C is a fluff marketing guy from Intel (he has a masters in engineering, but he appears to be pure marketing now, NOT throwing around data, just "I believe" comments). One last note: Exynos octa got kicked out of Galaxy S4 because it overheated the phone, according to the same site. So Octa is tablet only, I guess? Galaxy S4 is a superphone and octa didn't work in it, if what they said is true (rumored 1.9ghz rather than the 1.7ghz HTC ONE version).
    Reply
  • fteoath64 - Wednesday, February 27, 2013 - link

    @TheJian: "ipad4 scored 47 vs. 57 for T4 in egypt hd offscreen 1080p. I'd say it's more than competitive with ipad4. T4 scores 2.5x iphone5 in geekbench (4148 vs. 1640). So it's looking like it trumps A6 handily."

    Good reference! This shows T4 doing what it ought to in the tablet space, as Apple's CPU release cycle tends to be 12 to 18 months, giving Nvidia lots of breathing room. Besides, since Qualcomm launched all their new ranges, the next cycle is going to be a while. However, Qualcomm has so many design wins on their Snapdragons, it leaves little room for Nvidia and others to play. Is this why TI went out of this market? So could Amazon be a candidate for T4i on their next tablet update?

    PS: The issue with Apple putting the quad PVR544 into the iPad was to ensure overall performance with retina is up to par with the non-retina version. Especially the Mini, which is among the fastest tablets out there considering it needs to push less than a million pixels yet delivers a good 10 hours of use.
    Reply
  • mayankleoboy1 - Tuesday, February 26, 2013 - link

    Hey AnandTech, you never told us what the "Project Thor" is that JHH let slip at CES.. Reply
  • CeriseCogburn - Thursday, February 28, 2013 - link

    This is how it goes for nVidia from - well, we know whom at this point; meaning, it appears, everyone here.

    " I have to give NVIDIA credit, back when it introduced Tegra 3 I assumed its 4+1 architecture was surely a gimmick and to be very short lived. I remember asking NVIDIA’s Phil Carmack point blank at MWC 2012 whether or not NVIDIA would standardize on four cores for future SoCs. While I expected a typical PR response, Phil surprised me with an astounding yes. NVIDIA was committed to quad-core designs going forward. I still didn’t believe it, but here we are in 2013 with NVIDIA’s high-end and mainstream roadmaps both exclusively featuring quad-core SoCs. NVIDIA remained true to its word, and the more I think about it, the more the approach makes sense."

    paraphrased: "They're lying to me, they lie, lie, lie, lie, lie. (pass a year or two or three) Oh my, it wasn't a lie."
    Rinse and repeat often and in overlapping fashion.

    Love this place, and no one learns.
    Here's a clue: It's AMD that has been lying its yapper off to you for years on end.
    Reply
  • Origin64 - Tuesday, March 12, 2013 - link

    Wow. 120Mbps LTE? I get a fifth of that through a cable at home. Reply
