The Architecture

Kal-El looks a lot like NVIDIA's Tegra 2, just with more cores and some pinpointed redesigns. The architecture will first ship in a quad-core, 40nm version. These aren't NVIDIA-designed CPU cores, but rather four ARM Cortex A9s running at a presently unannounced clock speed. I asked NVIDIA if both the tablet and smartphone versions of Kal-El will feature four cores. The plan is for that to be the case, at least initially. NVIDIA expects high end smartphone manufacturers to want to integrate four cores this year and going into 2012.

The CPU cores themselves have changed a little bit. Today NVIDIA's Tegra 2 features two Cortex A9s behind a shared 1MB L2 cache. Kal-El will use four Cortex A9s behind the same shared 1MB L2 cache.

NVIDIA chose not to implement ARM's Media Processing Engine (MPE) with NEON support in Tegra 2. It has since added MPE to each of the cores in Kal-El. You may remember that MPE/NEON support is one of the primary differences between TI's OMAP 4 and NVIDIA's Tegra 2; as of Kal-El, it's no longer a difference.

Surprisingly enough, the memory controller is still a single 32-bit wide LPDDR2 controller. NVIDIA believes that even a pair of Cortex A9s cannot fully saturate a single 32-bit LPDDR2 channel, and that anything wider is a waste of power at this point. NVIDIA also said that effective/usable memory bandwidth will nearly double with Kal-El vs. Tegra 2. Some of this doubling will come from faster LPDDR2 (perhaps up to 1066?) while the rest will come from changes NVIDIA made to the memory controller itself.
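As a back-of-the-envelope check on that near-doubling claim, peak single-channel bandwidth is just data rate times bus width. The LPDDR2-1066 figure is only hinted at above, and the LPDDR2-600 rate for Tegra 2 is an assumption for illustration, not an NVIDIA-confirmed number:

```python
# Rough sketch: theoretical peak bandwidth of one 32-bit LPDDR2 channel.
# Data rates (in MT/s) are assumptions for illustration, not official specs.

def lpddr2_bandwidth_gbs(data_rate_mts, bus_width_bits=32):
    """Peak bandwidth in GB/s: transfers/sec * bytes per transfer."""
    return data_rate_mts * (bus_width_bits / 8) / 1000.0

tegra2 = lpddr2_bandwidth_gbs(600)    # assumed LPDDR2-600: 2.4 GB/s
kal_el = lpddr2_bandwidth_gbs(1066)   # rumored LPDDR2-1066: ~4.26 GB/s

print(tegra2, kal_el, kal_el / tegra2)
```

Under those assumed data rates, the faster memory alone gets you a ~1.8x increase in theoretical peak, which is consistent with the rest of the claimed doubling coming from memory controller changes rather than a wider bus.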

Power consumption is an important aspect of Kal-El: given the same workload, it is expected to require no more power than Tegra 2. Whether it's two fully loaded cores, or one fully loaded and one partially loaded core, NVIDIA believes there isn't a single case where equal work is being done and Kal-El isn't lower power than Tegra 2. Obviously if you tax all four cores you'll likely see worse battery life than with a dual-core Tegra 2 platform, but given equal work you should see battery life that's equal to, if not better than, a Tegra 2 device of similar specs. Given that we're still talking about a 40nm chip, this is a pretty big claim. NVIDIA told me that some of the power savings in Kal-El are simply due to lessons learned from the design of Tegra 2, while some are due to pretty significant architectural discoveries. I couldn't get any more information than that.


Kal-El vs. Tegra 2 running 3D game content today at 2 - 2.5x the frame rate

On the GPU side, Kal-El implements a larger/faster version of the ULP GeForce GPU used in Tegra 2. It's still not a unified shader architecture, but NVIDIA has upped the core count from 8 to 12. Note that in Tegra 2 the 8 cores refer to 4 vertex shaders and 4 pixel shaders. It's not clear how the 12 will be divided in Kal-El; it may not be an even 6+6 split.

The GPU clock will also be increased, although it's unclear to what level.

The combination of the larger GPU and the four larger A9 cores (MPE has a non-trivial impact on die area) results in an obviously larger SoC. NVIDIA measures the package of the AP30 (the smartphone version of Kal-El) at 14mm x 14mm. The die size is somewhere around 80mm^2, up from ~49mm^2 with Tegra 2.



76 Comments


  • theagentsmith - Wednesday, February 16, 2011 - link

    The Mobile World Congress is happening right now in the nice city of Barcelona... almost every company in the mobile electronics sector is showing off new products, that's why you only see news about smartphones!
  • R3MF - Wednesday, February 16, 2011 - link

    nvidia, you have not lost the magic!
  • Dribble - Wednesday, February 16, 2011 - link

    @40nm the power draw would be too high for a phone so I don't suppose there's much point having this processor in one until 28nm arrives.

    However for the new tablet market you have larger batteries, so you can target them with a higher power draw SoC (it's still going to be much, much smaller than any x86 chip, and I expect the big screen will still be sucking most of the power).

    Impressive they got it working first time, puts a lot of pressure on competitors who are still struggling to catch up with tegra 2 let alone compete with this.
  • SOC_speculation - Wednesday, February 16, 2011 - link

    Very cool chip, lots of great technology. But it will not be successful in the market.
    A 1080p high profile decode onto a tablet's SXGA display can easily jump into the 1.2GB/s range. If you drive it over HDMI to a TV and then run a small game or even a nice 3D game on the tablet's main screen, you can easily get into the 1.7 to 2GB/s range.

    Why is this important? A 533MHz LPDDR2 channel has a max theoretical bandwidth of 4.3GB/s. Sounds like enough, right? Well, as you increase the frequency of DDR, your _actual_ bandwidth drops due to latency issues. In addition, across workloads, the actual bandwidth you can get from any DDR interface is between 40 and 60% of the theoretical max.

    So that means the single channel will get between 2.5GB/s (60%) down to 1.7GB/s (40%). Trust me, ask anyone who designs SoCs, they will confirm the 40 to 60% bandwidth number.
    So the part will be restricted to use cases that current single core/single channel chips can do.

    So this huge chip with 4 cores, 1440p capable, probably 150MT/s 3D, has an Achilles heel the size of Manhattan. Don't believe what NVIDIA is saying (that dual channel isn't required). They know it's required but for some reason couldn't get it into this chip.

  • overzealot - Monday, February 21, 2011 - link

    Actually, as memory frequency increases, both bandwidth and latency improve.
  • araczynski - Wednesday, February 16, 2011 - link

    So if I know that what I'm about to buy will be outdated by a factor of two to five not even a year later, I'm not very likely to bother buying at all.
  • kenyee - Wednesday, February 16, 2011 - link

    Crazy how fast stuff is progressing. I want one... at least this might justify the crazy price of a Moto Xoom tablet.... :-)
  • OBLAMA2009 - Wednesday, February 16, 2011 - link

    It makes a lot of sense to differentiate tablets from phones by giving them much faster CPUs, higher resolutions and longer battery life. Otherwise why get a tablet if you have a cell phone?
  • yvizel - Wednesday, February 16, 2011 - link

    " NVIDIA also expects Kal-El to be somewhere in the realm of the performance of a Core 2 Duo processor (more on this later)."

    I don't think that you referred to this statement anywhere in the article.

    Can you elaborate?
  • Quindor - Wednesday, February 16, 2011 - link

    Seems to me NVidia might be pulling a Qualcomm, meaning they're going with what they have and trying to stretch it out longer and wider before giving us the complete redesign/refresh. You can see this quite clearly at MWC right now.

    Not a bad strategy as far as I can tell right now. The only threat I see is that Qualcomm is actually scheduled to release their new core design around the time NVidia will be releasing Kal-El.

    So who's going to win that bet? ;) More IPC vs. raw GHz/cores. Quite a reversed world too if you ask me, because Qualcomm was never big on IPC and went for the 1GHz hype.

    Hopefully NVidia doesn't make the same mistake as in the GPU market, building such revolutionary designs that they actually design "sideways" from the market, making their GPUs fantastic in certain areas which might not take off at all.

    Mind you, I'm an NVidia fan... but it won't be the first time NVidia releases a revolutionary architecture that isn't as efficient as they thought it would be. ;)
