Kal-El Has Five Cores, Not Four: NVIDIA Reveals the Companion Core

Author: Anand Lal Shimpi

by Anand Lal Shimpi on September 20, 2011 11:46 AM EST

Last week NVIDIA provided an update on its Tegra SoC roadmap. Kal-El, its third-generation SoC (likely to launch as Tegra 3), has been delayed by a couple of months. NVIDIA originally expected the first Kal-El tablets to arrive in August, but now it's looking like sometime in Q4. Kal-El's successor, Wayne, has also been pushed back until late 2012/early 2013. In between these two SoCs is a new part dubbed Kal-El+. It's unclear if Kal-El+ will be a process shrink or just higher clocks/larger die on 40nm.

In the smartphone spirit, NVIDIA is letting small tidbits of information out about Kal-El as it gets closer to launch. In February we learned Kal-El would be NVIDIA's first quad-core SoC design, featuring four ARM Cortex A9s (with MPE) behind a 1MB shared L2 cache. Kal-El's GPU would also see a boost to 12 "cores" (up from 8 in Tegra 2), but through architectural improvements would deliver up to 3x the GPU performance of T2. Unfortunately the increase in GPU size and CPU core count doesn't come with a wider memory bus. Kal-El is still stuck with a single 32-bit LPDDR2 memory interface, although max supported data rate increases to 800MHz.
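To put the memory interface in perspective, here is a quick back-of-the-envelope sketch of peak theoretical bandwidth. It assumes the 800MHz figure is the effective data rate (800 million transfers per second) and ignores refresh and protocol overhead, so real-world throughput would be lower. The function name is purely illustrative.

```python
def peak_bandwidth_gb_per_s(bus_width_bits: int, data_rate_mtps: float) -> float:
    """Peak theoretical bandwidth in GB/s (decimal gigabytes).

    bus_width_bits: width of the memory interface in bits
    data_rate_mtps: effective data rate in millions of transfers per second
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mtps * 1e6 / 1e9


# Kal-El: single 32-bit LPDDR2 channel at an 800 MT/s data rate (assumed reading of "800MHz")
print(f"{peak_bandwidth_gb_per_s(32, 800):.1f} GB/s")  # prints "3.2 GB/s"
```

Under those assumptions, four CPU cores and a 12-core GPU would all be sharing roughly 3.2GB/s of peak bandwidth.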

We also learned that NVIDIA was targeting somewhere around an 80mm^2 die, more than 60% bigger than Tegra 2 but over 30% smaller than the A5 in Apple's iPad 2. NVIDIA told us that although the iPad 2 made it easier for it to sell a big SoC to OEMs, it's still not all that easy to convince manufacturers to spend more on a big SoC.

Clock speeds are up in the air but NVIDIA is expecting Kal-El to run faster than Tegra 2. Based on competing A9 designs, I'd expect Kal-El to launch somewhere around 1.3 - 1.4GHz.

Now for the new information. Power consumption was a major concern with the move to Kal-El but NVIDIA addressed that by allowing each A9 in the SoC to be power gated when idle. When a core is power gated it is effectively off, burning no dynamic power and leaking very little. Tegra 2 by comparison couldn't power gate individual cores, only the entire CPU island itself.

In lightly threaded situations where you aren't using all of Kal-El's cores, the idle ones should simply shut off (if NVIDIA has done its power management properly of course). Kal-El is built on the same 40nm process as Tegra 2, so when doing the same amount of work the quad-core chip shouldn't consume any more power.

Power gating idle cores allows Kal-El to raise the frequency of the remaining active cores, resulting in turbo boost-like operation (e.g. four cores active at 1.2GHz or two cores at 1.5GHz; these are hypothetical numbers of course). Again, NVIDIA isn't talking about final clocks for Kal-El or dynamic frequency ranges.
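As a rough illustration of the idea, the sketch below maps the number of active cores to a clock ceiling. The two table entries are the hypothetical figures quoted above, not NVIDIA's real clocks, and the rule for filling in the one-core and three-core cases (use the nearest listed entry that covers at least that many cores) is my own conservative assumption.

```python
# Hypothetical clock ceilings in GHz, keyed by the number of active cores.
# Only the 2-core and 4-core points come from the illustrative example in the text.
MAX_CLOCK_GHZ = {2: 1.5, 4: 1.2}


def clock_ceiling_ghz(active_cores: int) -> float:
    """Return the clock ceiling for the given number of active cores.

    Picks the smallest table entry that covers at least `active_cores`,
    so unlisted counts fall back to the next, more conservative entry.
    """
    if not 1 <= active_cores <= max(MAX_CLOCK_GHZ):
        raise ValueError("active_cores must be between 1 and 4")
    for cores in sorted(MAX_CLOCK_GHZ):
        if active_cores <= cores:
            return MAX_CLOCK_GHZ[cores]
    raise AssertionError("unreachable: range check above guarantees a match")


for n in range(1, 5):
    print(n, "active core(s):", clock_ceiling_ghz(n), "GHz")
```

A real implementation would presumably also factor in temperature and voltage, none of which NVIDIA has detailed.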

Five Cores, Not Four

Courtesy NVIDIA

Finally we get to the big news. There are actually five ARM Cortex A9s with MPE on a single Kal-El die: four built using TSMC's 40nm general purpose (G) process and one on 40nm low power (LP). If you remember back to our Tegra 2 review you'll know that T2 was built using a similar combination of transistors: G for the CPU cores and LP for the GPU and everything else. LP transistors have very low leakage but can't run at very high frequencies; G transistors, on the other hand, are leaky but can switch very fast. Update: To clarify, TSMC offers a 40nm LPG process that allows for an island of G transistors in a sea of LP transistors. This is what NVIDIA appears to be using in Kal-El, and what NVIDIA used in Tegra 2 before it.

The five A9s can't all be active at once: you get either one to four of the GP (general purpose) cores or the lone LP core. The GP cores and the LP core are on separate power planes.

NVIDIA tells us that the sole point of the LP Cortex A9 is to provide lower power operation when your device is in active standby (e.g. screen is off but the device is actively downloading new emails, tweets, FB updates, etc... as they come in). The LP core runs at a lower voltage than the GP cores and can only clock at up to 500MHz. As long as the performance state requested by the OS/apps isn't higher than a predetermined threshold, the LP core will service those needs. Even with your display on it's possible for the LP core to be active, so long as the performance state requested by the OS/apps isn't too high.

Courtesy NVIDIA

Once it crosses that threshold however, the LP core is power gated and state is moved over to the array of GP cores. As I mentioned earlier, both CPU islands can't be active at the same time - you only get one or the other. All five cores share the same 1MB L2 cache so memory coherency shouldn't be difficult to work out.

Android isn't aware of the fifth core; it only sees up to four at any given time. NVIDIA accomplishes this by hotplugging the cores into the scheduler. The core OS doesn't have to be modified or aware of NVIDIA's 4+1 arrangement (which it calls vSMP). NVIDIA's CPU governor code defines the specific conditions that trigger activating cores. For example, under a certain level of CPU demand the scheduler will be told there's only a single core available (the companion core). As the workload increases, the governor will put the companion core to sleep and enable the first GP core. If the workload continues to increase, subsequent cores will be made available to the scheduler. Similarly, if the workload decreases, the cores will be removed from the scheduling pool one by one.
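To make the governor behavior described above more concrete, here is a minimal sketch of how such a policy could pick a target configuration from CPU demand. This is not NVIDIA's governor code: the threshold value, the demand metric (measured here in "GP cores' worth of work"), and all names are assumptions made purely for illustration.

```python
import math
from typing import NamedTuple


class CoreConfig(NamedTuple):
    island: str        # "companion" (the LP core) or "main" (the GP cores)
    online_cores: int  # how many cores the scheduler is allowed to see


COMPANION_LIMIT = 0.3  # hypothetical demand level the companion core can service
MAX_MAIN_CORES = 4


def select_config(demand: float) -> CoreConfig:
    """Pick a target configuration for a given demand.

    `demand` is expressed as the number of GP cores needed to service the
    current workload (e.g. 1.5 means one and a half cores' worth of work).
    """
    if demand < 0:
        raise ValueError("demand must be non-negative")
    if demand <= COMPANION_LIMIT:
        # Light load: expose only the companion core to the scheduler.
        return CoreConfig("companion", 1)
    # Heavier load: the companion core sleeps and enough GP cores come online.
    cores = min(MAX_MAIN_CORES, max(1, math.ceil(demand)))
    return CoreConfig("main", cores)


for d in (0.1, 0.5, 1.5, 3.2):
    print(d, select_config(d))
```

The sketch only computes a target. Stepping toward it one core at a time, as the cores are hotplugged in or out of the scheduling pool, would be layered on top.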

Courtesy NVIDIA

NVIDIA can switch between the companion and main cores in under 2ms. There's also logic to prevent wasting time flip-flopping between the LP and GP cores for workloads that sit right at the trigger threshold.
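NVIDIA hasn't said how that anti-flip-flop logic works. One common way to get this kind of behavior is hysteresis: use a higher threshold for moving up to the GP cores than for dropping back to the LP core. The sketch below shows that general technique only; the class name and both threshold values are hypothetical.

```python
class IslandSwitcher:
    """Two-threshold (hysteresis) switch between the LP and GP islands."""

    def __init__(self, up_threshold: float = 0.35, down_threshold: float = 0.25):
        if down_threshold >= up_threshold:
            raise ValueError("down_threshold must be below up_threshold")
        self.up_threshold = up_threshold
        self.down_threshold = down_threshold
        self.island = "companion"  # start on the LP core
        self.switches = 0          # how many island switches have occurred

    def update(self, demand: float) -> str:
        """Feed one demand sample and return the island that should be active."""
        if self.island == "companion" and demand > self.up_threshold:
            self.island = "main"
            self.switches += 1
        elif self.island == "main" and demand < self.down_threshold:
            self.island = "companion"
            self.switches += 1
        # Demand between the two thresholds leaves the current island unchanged.
        return self.island


switcher = IslandSwitcher()
for sample in (0.28, 0.32, 0.28, 0.32, 0.40, 0.30, 0.20):
    print(sample, switcher.update(sample))
print("switches:", switcher.switches)
```

With a single threshold at 0.3, the first four samples would each trigger a switch. With the gap between thresholds, the load has to move decisively before the cores change over.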

NVIDIA expects pretty much all active work to be done on the quad-core GP array; it's really only when your phone is idle and dealing with background tasks that the LP core will be in use. As a result of this process dichotomy NVIDIA is claiming significant power improvements over Tegra 2, despite an increase in transistor count:

Courtesy NVIDIA

NVIDIA isn't talking about GPU performance today but it did reveal a few numbers in a new white paper:

Courtesy NVIDIA

We don't have access to the benchmarks here but everything was run on Android 3.2 at 1366 x 768 with identical game settings. The performance gains are what NVIDIA has been promising, in the 2 - 3x range. Obviously we didn't run any of these tests ourselves so approach with caution.

Final Words

What sold NVIDIA's Tegra 2 wasn't necessarily its architecture, but timing and the fact that it was Google's launch platform for Honeycomb. If the rumors are correct, NVIDIA isn't the launch partner for Ice Cream Sandwich, which means Kal-El has to stand on its own as a convincing platform.

Courtesy NVIDIA

The vSMP/companion core architecture is a unique solution to the problem of increasing SoC performance while improving battery life. This is a step towards heterogeneous multiprocessing, despite the homogeneous implementation in Kal-El. It remains to be seen how tangible the companion core's impact on real-world battery life will be.

74 Comments

z0mb13n3d - Tuesday, September 20, 2011 - link
Makes sense. Without content to make use of the underlying hardware, all the "MY PH0NE PWNZ URZ" comments are just silly and good enough to draw pretty charts and graphs.

If nvidia can push out compelling enough games/apps that make actual use of all 4 cores/better GPU at launch, the Exynos might still look like the champion, on paper. Samsung doesn't seem to care much about software and content (aside from their UI), TI is busy doing god alone knows what and only Qualcomm is beginning to understand that content is just as important as the silicon they throw out, although they still have quite some way to go.

This is quite similar to what Intel is facing (as Anand pointed out) with the QuickSync technology. Excellent tech on paper, but with little to no freely available apps that make actual use of the tech, it's all just a big pile of useless.
Draiko - Tuesday, September 20, 2011 - link
Unless devs are either excited or incentivized, they're going to build apps that run on the largest number of devices.

nVidia is incentivizing. Other companies should do the same. There will always be a common library of Android apps that run on all devices.

I'm sure that once the dev tools get more advanced and the platform matures, we'll see general apps and games that work on all devices but have abilities that are enabled only on certain hardware.
Death666Angel - Tuesday, September 20, 2011 - link
Since most other competitors will be using 28nm technology and Cortex A15 for their quad cores (afair), it stands to reason that a quad core built on the 40nm technology with A9 innards will be quite the power hog. :-)

I'm very interested to see how the next round of ARM refreshes goes.
Draiko - Tuesday, September 20, 2011 - link
Ummm... those 28nm SOCs like Krait and OMAP5 won't be in products for a while. They're also pretty expensive to make so OEMs will shy away from using them at first.

Tegra Kal-el products are going to be on store shelves as early as next month and after using a Tegra 2 (40nm dual-A9), I'm pretty sure the Kal-el won't be a power hog.
jjj - Tuesday, September 20, 2011 - link
Dual core Krait at 1.5-1.7GHz is supposed to show up in devices early next year (according to Qualcomm anyway).
Draiko - Tuesday, September 20, 2011 - link
Last I heard, they were scheduled to start sampling Krait in Q2 2011 and release devices around a year+ later. That puts Krait devices almost another year out at best. Tegra Wayne devices will be shipping by then.
jjj - Tuesday, September 20, 2011 - link
your info is outdated
Draiko - Tuesday, September 20, 2011 - link
No it isn't, they were sampling in volume back in June. That was on-schedule (June is part of Q2 last time I checked).

A few hopeful bloggers were saying that Krait might hit early. We'll see.
jjj - Tuesday, September 20, 2011 - link
28nm parts started sampling in Q2, that part is true.
In the last month Qualcomm said multiple times that phones will show up early next year, most recently at Qualcomm IQ in Istanbul (watch out, some sites wrongly reported that they'll have 2.5GHz quads). Now ofc this is what they expect and as always things can go somewhat differently.
As for Wayne, I wouldn't expect it in 2012.
Draiko - Tuesday, September 20, 2011 - link
If nVidia doesn't show Wayne at CES 2012, I wouldn't expect it in 2012. Until we see or don't see Wayne, we can only make assumptions based on nVidia's roadmap in which they've clearly committed to a new Tegra every year.

Early next year could mean anything before June and most likely points to a MWC showcase. Qualcomm is pushing Krait up because of increased competition. They even cancelled the MSM8672.

I'll also remind you that Qualcomm's roadmap for the MSM8660 stated a Q3 2010 release but the first product was the Pantech Vega Racer which didn't hit until May, 2011. The US launched the first MSM8660 equipped device in June (Evo 3D).

Using that schedule history, the Dual-core MSM8690 equipped products won't hit shelves until Q3 2012 and the Quad-core Kraits (Q1 2013) won't hit stores until Q4 2013.
