NVIDIA Tegra 4 Architecture Deep Dive, Plus Tegra 4i, Icera i500 & Phoenix Hands On

by Anand Lal Shimpi & Brian Klug on February 24, 2013 3:00 PM EST

It basically goes without saying, but Tegra 4i, formerly Grey, is primarily a story about the modem. About halfway through 2011, NVIDIA acquired Icera, a player in the handset cellular baseband market that had managed to fit a software-defined radio into the power budget of a mobile device. NVIDIA clearly recognized the importance of being able to deliver an SoC with an integrated modem, and after the Icera acquisition, and later Grey’s appearance on the roadmap, it just became a matter of when that would happen. NVIDIA executed pretty quickly, it turns out, and the fruits of that work manifest themselves in two products: Icera’s standalone i500, and Tegra 4i, which contains the same i500 IP block minus the USB interfaces and with a different memory subsystem.

Icera was an interesting player in the baseband space because their architecture was that of a software-defined radio on the digital baseband side. The reality is that nobody is either fully software defined or burned out to ASIC (hardware), but rather somewhere in between. For example, Qualcomm uses a combination of software and hardware. It has never been entirely clear which functional blocks are ASIC and which are software, though I’ve been told this is often a matter of whatever is most advantageous for power and what gets re-used most. That said, Icera’s implementation is the furthest towards being pure software defined of anyone, with the entire digital baseband being just one big platform to run their own software atop. There’s an external transceiver which does downconversion, but after that it’s pure software. The question has always been how Icera could afford to build a power-competitive platform with an entirely software-defined stack, and the clue lies in their choice of 28nm HPM instead of LP or HPL silicon for i500 and 4i. Icera designs to a high performance process, then turns off blocks when they’re not in use, rather than making a larger SoC that’s lower leakage. The result is that NVIDIA claims a 40% smaller die for i500 than MDM9x15.

Icera has maintained the same software platform across its modem portfolio, each time taking advantage of increased processing performance afforded by faster silicon to include more air interfaces and capabilities. The result is an interesting advantage when it comes to operator certification — they can make a compelling case for each new version being close to the previous one, since they run the same PHY and protocol stack entirely in software.

In the case of i500, it runs the exact same multimode modem software as the i4xx series, just with increased DXP support for 15 and 20 MHz channel bandwidths on LTE.

The entire stack ends up being 1.2 million lines of C and DXP code, with a total size of 7.7 MB compiled. NVIDIA gave a great breakdown of the protocol stack as well.

The real name for Icera i500 is ICE9045, and it is paired with an ICE9245 transceiver. ICE9045 is built on 28nm HPM as I mentioned before, and ICE9245 remains on TSMC’s 65nm LP CMOS process, which is RF friendly. ICE9045 supports basically all the 3GPP air interfaces, as mentioned in the earlier announcement piece. There’s up to Category 3 LTE on the baseband at launch, with Category 4 in the future. For WCDMA, there’s up to Category 24 (42 Mbps, dual carrier with 64QAM), and interestingly enough the same Category 18 (28 Mbps, 16QAM with 2x2 MIMO) as earlier implemented in Icera 450, with an optional future upgrade to Category 28 (84 Mbps, 64QAM with 2x2 MIMO). Of course there’s also TD-SCDMA, GSM/EDGE, and full support for voice including AMR-WB and VoLTE/IMS. In addition to 2x2 MIMO, the ICE9045 can also do 4x4 MIMO on LTE with a second ICE9245 transceiver.

NVIDIA broke down the ICE9045 functionally: it consists of two large DXP units and one smaller DXP unit, the latter running the rest of the software and management stack. The two larger DXP units run at up to 1.3 GHz. The Icera instruction set consists of two different fundamental sets. Icera refers to these as the “C” and “D” side, with C being rather obvious. The C side is unsurprisingly the C-compiler target: it is a scalar machine that runs the 3GPP protocol stack and manages the higher-level functions of the modem above the physical interface. In the block diagram, the C side runs on DXP1, while the D side runs on the larger, beefier DXP0 and DXP2 machines.

The D side is a proprietary assembly language vector instruction set that runs the physical layer of the modem. It is a combination of specific libraries that really make up the magic and give the Icera platform its reconfigurability. NVIDIA gave an excellent breakdown of the data paths inside both sides. It turns out that in an LTE configuration one core does all the inverse FFTs and MIMO matrix math, while the second core does rate matching and decoding. The ultimate goal is to have each of the cores processing a roughly equal workload, and since it’s software these tasks can be shuffled between the cores to get the i500 running each core at the lowest possible frequency and voltage. Each of the cores can also be individually power collapsed.
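To make the load-balancing idea concrete, here is a minimal Python sketch of splitting physical-layer tasks across two cores so that the busier core, and therefore the clock it needs, is as lightly loaded as possible. Everything numeric here is an assumption for illustration: the per-task costs are made-up values, not NVIDIA figures, and the greedy heuristic is a textbook scheduling technique rather than anything Icera has described.

```python
# Hypothetical per-task compute demand, expressed as the clock (MHz) a core
# would need to run that task alone. These numbers are invented for
# illustration; only the task names come from the article.
HYPOTHETICAL_TASKS_MHZ = {
    "inverse_fft": 400,
    "mimo_matrix_math": 500,
    "rate_matching": 250,
    "decoding": 650,
}


def balance(tasks, n_cores=2):
    """Greedy longest-processing-time assignment of tasks to cores.

    Tasks are placed heaviest first, each onto whichever core currently
    has the lightest load. This is a simple heuristic, not an optimal solver.
    """
    cores = [{"load": 0, "tasks": []} for _ in range(n_cores)]
    for name, cost in sorted(tasks.items(), key=lambda kv: kv[1], reverse=True):
        target = min(cores, key=lambda core: core["load"])
        target["tasks"].append(name)
        target["load"] += cost
    return cores


def required_clock_mhz(cores):
    """In this toy model the busiest core sets the minimum clock needed."""
    return max(core["load"] for core in cores)


if __name__ == "__main__":
    assignment = balance(HYPOTHETICAL_TASKS_MHZ)
    for index, core in enumerate(assignment):
        print(f"core {index}: {core['tasks']} needs {core['load']} MHz")
    print(f"minimum clock for the busiest core: {required_clock_mhz(assignment)} MHz")
```

With these invented costs the heuristic happens to pair the FFT and MIMO work on one core and rate matching and decoding on the other, which matches the split NVIDIA described, but a different set of costs would produce a different split.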

On the transceiver side we have ICE9245 which is again TSMC 65nm LP CMOS. This talks over a digital interface to the baseband, and increases the number of primary RX ports from 6 to 8. All filtering and channelization is onboard the ISP. The goal is to keep all the A2D and D2A on the transceiver and away from the baseband. NVIDIA hasn’t quite given me the frequency breakdown for their low band, high band, and super high band, but you can figure it out just by looking. There are also diversity ports for the same set of bands. All these ports are created equal and can handle 20 MHz LTE, in addition to inter-band and intra-band CA.

The big question is how well Icera’s i500 will stack up against the MDM9x15s and MDM9x25s of the world. Icera believes its ability to switch between four fundamentally different receiver types on the fly and dedicate processing to different functions based on the cellular geometry at the moment offers it an advantage in the long run.

NVIDIA demonstrated a Phoenix with Tegra 4i cabled up to a base station emulator doing full Category 3 LTE at 100 Mbps, nice and steady. I asked whether they could go any higher to demonstrate the headroom in the current i500 architecture for upgrading to Category 4, and was shown just short of 120 Mbps on the two-week-old Tegra 4i silicon.
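For a rough sense of what that headroom means, the arithmetic can be sketched in a few lines of Python. The 100 Mbps and 120 Mbps figures come from the demo above (120 is treated as an upper bound, since the actual number was just short of it). The 150 Mbps Category 4 downlink peak is the nominal 3GPP figure and is not stated in this article.

```python
# Figures from the demo described in the article.
LTE_CAT3_PEAK_MBPS = 100   # Category 3 rate shown on the base station emulator
DEMO_HEADROOM_MBPS = 120   # "just short of 120 Mbps", used here as an upper bound

# Assumption: nominal 3GPP Category 4 downlink peak, not quoted in the article.
LTE_CAT4_PEAK_MBPS = 150


def fraction_of_peak(measured_mbps, peak_mbps):
    """Return the measured throughput as a fraction of a nominal peak rate."""
    return measured_mbps / peak_mbps


if __name__ == "__main__":
    over_cat3 = fraction_of_peak(DEMO_HEADROOM_MBPS, LTE_CAT3_PEAK_MBPS)
    of_cat4 = fraction_of_peak(DEMO_HEADROOM_MBPS, LTE_CAT4_PEAK_MBPS)
    print(f"relative to Category 3 peak: {over_cat3:.2f}x")
    print(f"share of Category 4 peak:    {of_cat4:.0%}")
```

Under those assumptions the early silicon was running at up to about 1.2x the Category 3 rate, or at most about 80% of the Category 4 peak.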

75 Comments

Krysto - Monday, February 25, 2013
S600 is just a slightly overclocked S4 Pro with the same GPU.

The real competitor of Tegra 4 will be S800. We'll see if it wins in CPU performance (it might not), and I think there's a high chance it will lose in GPU performance, as Adreno 330 is only 50% faster than Adreno 320 I think, and Tegra 4 is about twice as fast.

Qualcomm has always had slower graphics performance than Nvidia actually. The only "gap" they found in the market was last fall with the Adreno 320, when Nvidia didn't have anything good to show. But Tegra 3 beat S4 with its Adreno 225.
watersb - Monday, February 25, 2013
I'm amazed at the depth of this NVIDIA data-dump. Brilliant work.

Anand's observation re: die size, cost strategy, position in the market and how this buys them time to consolidate... Wow.

Clearly, Nvidia is in this game for the long haul.
djgandy - Monday, February 25, 2013
So OpenGL ES 3.0 doesn't matter, but quad core A15 does? Why do people suck up to Nvidia and their marketing BS so much?

T4i still single channel memory? What a joke configuration.
djgandy - Monday, February 25, 2013
Also a 9 page article about a mobile SoC without a single reference to the word "battery".
varad - Monday, February 25, 2013
Read the article before you write such comments. The very first page is "Introduction & Power" where they do mention some numbers and their thoughts.
djgandy - Tuesday, February 26, 2013
Yeah, it's all smoke and mirrors under lab test conditions. Where is the real battery life? Is this not for battery powered devices?
Krysto - Monday, February 25, 2013
Personally, I think all 2013 GPUs should have support for OpenGL ES 3.0 and OpenCL. I was stunned to find out Tegra 4 was not going to support it as they haven't even switched to a unified shader architecture.

That being said, Anand is probably right that it was the right move for Nvidia, and they are just going to wait for the Maxwell architecture to streamline the same custom ARMv8 CPU from Tegra 5 to Project Denver across product line-ups, and also the same Maxwell GPU cores.

If that's indeed their plan, then switching Tegra 4 to Kepler this year, only to switch again to Maxwell next year wouldn't have made any sense. GPU architectures barely change even every 2-3 years, let alone 1 year. It wouldn't have been cost effective for them.

I do hope they aren't going to delay the transition again with Tegra 5 though, and I also do hope they follow Qualcomm's strategy with S4 last year of switching IMMEDIATELY to the 20nm process, instead of continuing on 28nm with Tegra 5, like they did with Tegra 3 on 40nm. But I fear Nvidia will repeat the same mistake.

If they put Tegra 5 on 20nm, and make it 120mm2 in size, with Maxwell GPU core, I don't think even Apple's A8X will stand against it next year in terms of GPU performance (and of course it will get beaten easily in CPU performance, just like this year).
djgandy - Tuesday, February 26, 2013
Tegra is smaller because it lacks features and also memory bandwidth. The comparison is not really fair to assume you can just throw more shaders at the problem. You'll need a wider memory bus for a start. You'll need more TMUs and in the future it's probably smart to have a dedicated ROP unit. Then also are you seriously going to just stick with FP20 and not support ES 3.0 and OpenCL? OEMs see OpenCL as a de facto feature these days, not because it is widely used but because it opens up future possibilities. Nvidia has simply designed an SoC for gaming here.

Your post focuses on performance, but these are battery powered devices. The primary design goal is efficiency, and it would appear that is why apple went swift and not A15. A15 is just too damn power hungry, even for a tablet.
metafor - Tuesday, February 26, 2013
If the silicon division of Apple were its own business, they'd be in the red. Very few silicon providers can afford to make 120mm^2 chips and still make a profit; let alone one with as little bargaining clout in the mobile space as nVidia.

Numbers are great but at the end of the day, making money is what matters.
milli - Monday, February 25, 2013
nVidia is trying hard but Tegra still isn't making them any money ...
