Original Link: http://www.anandtech.com/show/4181/nvidias-project-kalel-quadcore-a9s-coming-to-smartphonestablets-this-year

If there's any one takeaway from both CES and Mobile World Congress this year it's that NVIDIA is unequivocally a player in the SoC space. With design wins from LG, Motorola and Samsung, NVIDIA may not have the entire market but it has enough of it to be taken seriously.

In our Optimus 2X Review I mentioned that it looked like NVIDIA was going to be moving to a 6-month product cycle in the SoC space. The intention is to out-execute its competitors frequently enough that they are either forced out of the market or into making a mistake trying to keep up. It's the same strategy that NVIDIA used to compete with 3dfx almost fifteen years ago.

I wrote that in 2011 NVIDIA would release Tegra 2 followed by the Tegra 2 3D (a higher clocked version of the Tegra 2 with support for 3D content) and finally the Tegra 3 before the end of the year. While it wasn't too long ago that NVIDIA was telling people about its 6-month product cycle, things have changed.

The Tegra 2 3D looks like it's not going to happen. The higher clocked SoC is not currently in any designs that are in the pipeline. There are Tegra 2 based smartphones and tablets that are due out this year, but nothing based on T25/AP25 as far as I can tell.

Although the middle of the roadmap changed, it's the end of 2011 that's sort of amazing. Internally NVIDIA referred to this chip as Tegra 3, and externally we expected it at the tail end of 2011 with devices launching in Q1 2012.

NVIDIA got the first silicon back from the fab 12 days ago. While the chip may end up being called Tegra 3 or some variation of that, for now NVIDIA refers to it as Project Kal-El. Named after the young Superman (or Nicolas Cage's son), Kal-El will be sampling this year and shipping in devices as early as August 2011.

The Roadmap

I must say that this is highly unusual behavior for an SoC manufacturer. Qualcomm recently announced its dual-core MSM8960 would be sampling in Q2 2011 and shipping in devices starting next year. NVIDIA is announcing sampling starting sometime very soon (the chip is only 12 days old after all) and device availability before the end of the year.

NVIDIA went on to be even more specific. Tablets based on Kal-El will be available starting August 2011, while smartphones will be available this Christmas and into the first half of next year. This is either NVIDIA overcommitting to an unrealistic future or the most aggressive schedule we've seen from an SoC vendor yet. NVIDIA won some points by actually pulling off the coup with Tegra 2 this year; however, it's still too early to tell whether we'll see the whole thing repeated again just 9 months from now. I'm willing to at least give NVIDIA the benefit of the doubt here.

It doesn't stop with Kal-El either. NVIDIA is committing to a yearly refresh of its architecture, and it quantifies the move from Tegra 2 to Kal-El as a 5x increase in performance. By 2012 we'll have Wayne, which doubles performance over Kal-El. Then we've got another 5x increase over Wayne with Logan in 2013. The furthest NVIDIA is willing to go out is 2014 with Stark, at roughly a doubling of the performance offered by Logan.

The baseline reference point is Tegra 2, which NVIDIA expects Stark to outperform by a factor of 100x. NVIDIA also expects Kal-El to be somewhere in the realm of the performance of a Core 2 Duo processor (more on this later).

Based on the cadence that NVIDIA presented, it looks like every year we'll either get a doubling or 5x increase in performance over the previous year. Kal-El is one of those 5x years, followed by a doubling with Wayne, 5x again with Logan and a doubling with Stark. Now the performance axis in the chart above is really vague, so end users will likely not see 5x Tegra 2 with Kal-El, but they will see something tangible at least.
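To make the roadmap math concrete, here is a quick sketch that compounds the year-over-year multipliers NVIDIA quoted. The multipliers and years come from NVIDIA's presentation as described above; the function and variable names are mine, and "performance" here is whatever vague metric NVIDIA's chart uses, not a measured result.

```python
# Year-over-year multipliers as NVIDIA presented them (Tegra 2 = 1x baseline).
# "Performance" is NVIDIA's own loosely defined metric, not a benchmark score.
ROADMAP = [
    ("Kal-El", 2011, 5),
    ("Wayne", 2012, 2),
    ("Logan", 2013, 5),
    ("Stark", 2014, 2),
]


def cumulative_performance(roadmap):
    """Compound each generation's multiplier into a total relative to Tegra 2."""
    totals = {}
    running = 1
    for name, _year, step in roadmap:
        running *= step
        totals[name] = running
    return totals


if __name__ == "__main__":
    for name, perf in cumulative_performance(ROADMAP).items():
        print(f"{name}: {perf}x Tegra 2")
```

Compounding 5x, 2x, 5x and 2x lands on the 100x figure NVIDIA quotes for Stark, so the roadmap is at least internally consistent.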

The Architecture

Kal-El looks a lot like NVIDIA's Tegra 2, just with more cores and some pinpointed redesigns. The architecture will first ship in a quad-core, 40nm version. These aren't NVIDIA designed CPU cores, but rather four ARM Cortex A9s running at some presently unannounced clock speed. I asked NVIDIA if both the tablet and smartphone versions of Kal-El will feature four cores. The plan is for that to be the case, at least initially. NVIDIA expects high-end smartphone manufacturers to want to integrate four cores this year and going into 2012.

The CPU cores themselves have changed a little bit. Today NVIDIA's Tegra 2 features two Cortex A9s behind a shared 1MB L2 cache. Kal-El will use four Cortex A9s behind the same shared 1MB L2 cache.

NVIDIA also chose not to implement ARM's Media Processing Engine (MPE) with NEON support in Tegra 2. It has since added in MPE to each of the cores in Kal-El. You may remember that MPE/NEON support is one of the primary differences between TI's OMAP 4 and NVIDIA's Tegra 2. As of Kal-El, it's no longer a difference.

Surprisingly enough, the memory controller is still a single 32-bit wide LPDDR2 controller. NVIDIA believes that even a pair of Cortex A9s cannot fully saturate a single 32-bit LPDDR2 channel and anything wider is a waste of power at this point. NVIDIA also said that effective/usable memory bandwidth will nearly double with Kal-El vs. Tegra 2. Some of this doubling in bandwidth will come from faster LPDDR2 (perhaps up to 1066?) while the rest will come as a result of some changes NVIDIA made to the memory controller itself.
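For a sense of scale, the theoretical peak of a single 32-bit channel is easy to work out. The sketch below assumes the speculative LPDDR2-1066 data rate mentioned above (NVIDIA hasn't confirmed it), and it computes peak bandwidth only; the effective/usable bandwidth NVIDIA is talking about will be lower and depends on the memory controller. The function name is hypothetical.

```python
def peak_bandwidth_gb_s(bus_width_bits, mega_transfers_per_sec):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes) for one memory channel."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_sec * 1e6 / 1e9


if __name__ == "__main__":
    # Assumption: 1066 MT/s is a guess at Kal-El's memory speed, not a confirmed spec.
    kal_el_peak = peak_bandwidth_gb_s(32, 1066)
    print(f"32-bit LPDDR2 at 1066 MT/s: {kal_el_peak:.3f} GB/s peak")
```

Four bytes per transfer at 1066 million transfers per second works out to a bit over 4GB/s of peak bandwidth to feed four A9s and the GPU.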

Power consumption is an important aspect of Kal-El and Kal-El is expected to require, given the same workload, no more power than Tegra 2. Whether it's two fully loaded cores or one fully loaded and one partially loaded core, NVIDIA believes there isn't a single example of a situation where equal work is being done and Kal-El isn't lower power than Tegra 2. Obviously if you tax all four cores you'll likely have worse battery life than with a dual-core Tegra 2 platform, but given equal work you should see battery life that's equal if not better than a Tegra 2 device of similar specs. Given that we're still talking about a 40nm chip, this is a pretty big claim. NVIDIA told me that some of the power savings in Kal-El are simply due to learnings it had in the design of Tegra 2, while some of it is due to some pretty significant architectural discoveries. I couldn't get any more information than that.

Kal-El vs. Tegra 2 running 3D game content today at 2 - 2.5x the frame rate

On the GPU side, Kal-El implements a larger/faster version of the ULP GeForce GPU used in Tegra 2. It's still not a unified shader architecture, but NVIDIA has upped the core count from 8 to 12. Note that in Tegra 2 the 8 cores refer to 4 vertex shaders and 4 pixel shaders. It's not clear how the 12 will be divided in Kal-El but it may not be an equal scaling to 6+6.

The GPU clock will also be increased, although it's unclear to what level.

The combination of the larger GPU and the four, larger A9 cores (MPE is not an insignificant impact on die area) results in an obviously larger SoC. NVIDIA measures the package of the AP30 (the smartphone version of Kal-El) at 14mm x 14mm. The die size is somewhere around 80mm^2, up from ~49mm^2 with Tegra 2.
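Putting the size numbers side by side helps show how much more is packed into the die. This is simple arithmetic on the approximate figures quoted above (both die sizes are rough estimates), with names of my own choosing.

```python
def growth(new, old):
    """Return how many times larger `new` is than `old`."""
    return new / old


# Figures from the article; both die sizes are approximate.
die_ratio = growth(80.0, 49.0)   # Kal-El vs. Tegra 2 die area in mm^2
gpu_ratio = growth(12, 8)        # GPU "cores"
cpu_ratio = growth(4, 2)         # Cortex A9 cores
package_area_mm2 = 14 * 14       # AP30 package footprint

if __name__ == "__main__":
    print(f"Die area: {die_ratio:.2f}x")
    print(f"GPU cores: {gpu_ratio:.2f}x, CPU cores: {cpu_ratio:.2f}x")
    print(f"AP30 package: {package_area_mm2} mm^2")
```

In other words, the die grows by roughly 1.6x while the CPU core count doubles and the GPU core count grows by 1.5x, all on the same 40nm process.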


Video Decode

One of the stones we've thrown at NVIDIA is the lack of high profile H.264 decode support. Tegra 2 can decode main profile H.264 at up to 20Mbps, but throw any high profile 1080p content at the chip and it can't do it. This is a problem because a lot of video content out there today is high profile, high bitrate 1080p H.264. Today, even on Tegra 2, you'll have to transcode a lot of your 1080p video content to get it to play on the phone.

With Kal-El, that could change.

NVIDIA's video decoder gets an upgrade in Kal-El to support H.264 at 40Mbps sustained (60Mbps peak) at a resolution of 2560 x 1440. This meets the bandwidth requirements for full Blu-ray disc playback. NVIDIA didn't just make the claim however, it showed us a 50Mbps 1440p H.264 stream decoded and output to two screens simultaneously: a 2560 x 1600 30" desktop PC monitor and a 1366 x 768 tablet display.
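To put those decode numbers in context, here's a rough sketch comparing the pixel count of the new maximum resolution against 1080p and checking where a given stream bitrate falls relative to the quoted sustained and peak limits. The limits are NVIDIA's stated figures; the helper names and the three-way classification are mine, and I don't know how NVIDIA defines the window over which "sustained" is measured.

```python
def pixels(width, height):
    """Pixels per frame."""
    return width * height


def classify_bitrate(mbps, sustained=40, peak=60):
    """Place a stream bitrate (Mbps) against Kal-El's quoted decode limits."""
    if mbps <= sustained:
        return "within sustained rate"
    if mbps <= peak:
        return "above sustained, within peak"
    return "above peak"


if __name__ == "__main__":
    ratio = pixels(2560, 1440) / pixels(1920, 1080)
    print(f"2560 x 1440 has {ratio:.2f}x the pixels of 1080p")
    print("50Mbps demo stream:", classify_bitrate(50))
    print("20Mbps (Tegra 2's main profile ceiling):", classify_bitrate(20))
```

Note that the 50Mbps demo stream sits above the quoted sustained rate but within the peak, so the demo was pushing the decoder past what NVIDIA is promising for sustained playback.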

Did I mention that this is 12-day-old A0 silicon?

Kal-El also supports stereoscopic 3D video playback, although it's unclear to me what the SoC's capabilities are for 3D capture.

I asked NVIDIA if other parts of the SoC have changed, particularly the ISP as we've seen in both the Optimus 2X and Atrix 4G articles that camera quality is pretty poor on the initial Tegra 2 phones. NVIDIA stated that both ISP performance and quality will go up in Kal-El although we don't know any more than that. NVIDIA did insist that its own development Tegra 2 platforms have good still capture quality, so what we've seen from LG and Motorola may just be limited to those implementations.


Final Words

The first thing everyone at NVIDIA asked me after I saw Kal-El running was an eager and expected: "well, what did you think?"

On the one hand, we have a clear underdog in the SoC space demonstrating a brand new chip just 12 days after getting it back from the fab. It's functional, it can render 3D games, it can decode high bitrate video and it runs Android today. The word impressive is insufficient to convey the magnitude of what I just described, particularly in the SoC space.

On the other hand, it's still just an announcement. It wasn't too long ago that NVIDIA was struggling to name a single design win. The recent success with LG, Motorola and Samsung is awesome, but it isn't a guarantee of what's to come. That being said, the handset vendors and carriers clearly take NVIDIA seriously today and they would be foolish not to consider Kal-El as it'll be the quickest way to get to quad-core in an Android phone.

Architecturally, Kal-El isn't a huge departure from what we currently have today with Tegra 2. NVIDIA claims a 5x performance improvement over Tegra 2; however, that seems a bit optimistic. The 5x gains appear to be from combining the 2x theoretical gain from 2 to 4 cores plus a 3x gain from the new GPU. NVIDIA claims that this is enough to put Kal-El above a Core 2 Duo clocked at 2GHz (see the test results below); however, the NVIDIA generated scores seem suspect, not to mention that CoreMark isn't representative of the sort of workload you'd see on a smartphone/tablet.

If NVIDIA can increase clock speeds a bit we'll see better performance than Tegra 2 on lightly threaded workloads, but I'm not convinced of the gains to be had in single-tasking workloads from four cores in a smartphone/tablet. The bigger gains over Tegra 2 will likely come from any improvements to the memory controller as well as the faster GPU. This being said, NVIDIA does believe that even web page rendering can benefit significantly from a quad-core CPU so I could very well be proven wrong once devices are out in the wild.
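My skepticism about four cores can be illustrated with Amdahl's law, which is my framing and not something NVIDIA presented. The sketch below estimates the speedup from moving from two cores to four at the same clock for a workload where only a fraction of the work can run in parallel. The parallel fractions are purely illustrative, not measurements of any real smartphone workload.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup over a single core for a given parallel fraction."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def dual_to_quad_gain(parallel_fraction):
    """Speedup from going from 2 to 4 identical cores at the same clock speed."""
    return amdahl_speedup(parallel_fraction, 4) / amdahl_speedup(parallel_fraction, 2)


if __name__ == "__main__":
    # Illustrative fractions only; real workloads would need to be profiled.
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"parallel fraction {p:.2f}: {dual_to_quad_gain(p):.2f}x over dual-core")
```

Only a perfectly parallel workload gets the full 2x from doubling the core count. At a 50% parallel fraction the gain is 1.2x, which is why clock speed, the memory controller and the GPU matter so much for the experience end users will actually see.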

If NVIDIA can secure significant design wins with Kal-El based tablets in August of this year and smartphones in Q4 I will be beyond impressed. NVIDIA gets major points for putting on good demos of working silicon today but in this business you need to have devices. For now we play the waiting game. I suspect if you're not taking NVIDIA seriously at this point, you really should be.
