After AMD cashed Intel’s check and proved more competitive than expected against Clarkdale, 2010 looks like a fresh start for the company. The news gets better.

Late last year AMD said that it would be sampling its first APU (Accelerated Processing Unit), codenamed Llano, before the end of 2010. Today AMD is announcing that the first Llano samples, built on Global Foundries' 32nm high-k + metal gate SOI process, will be sampling to partners in the first half of this year.

GF's 32nm SOI High-K + MG process will be used with Llano

For those not in the know, Llano is AMD’s first hybrid CPU-GPU with on-die graphics. The graphics core is a derivative of AMD’s DirectX 11 Evergreen lineup (the same lineage as the Radeon HD 5970, 5870, 5850, 5670, 5570, 5450, etc.).

Llano will go up against Sandy Bridge, which, according to Intel’s internal roadmaps, seems to have been pushed back to 2011 for volume availability. While Sandy Bridge will have graphics on-die, it will still only be DX10-class, so AMD will have the feature-set advantage as far as graphics is concerned.

Llano's Features

Today we learn a bit more about the CPU side of Llano. The first chip will be a quad-core processor plus on-die graphics. Each core is Phenom II derived, but there’s no shared L3 cache. So Llano cores look a lot like Athlon II cores. I’m hearing that they may have some architectural tweaks, so performance could be better than present-day Athlon IIs.

At 32nm each core (minus L2 cache) is only 9.69 mm² and is made up of over 35M transistors. Each core is paired with its own 1MB L2 cache, meaning the quad-core processor will have a total of 4MB of L2 on-die. AMD expects Llano to run at above 3GHz, which should be more than possible at 32nm given that we’re already at close to 3GHz with the 45nm Athlon II X4.
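AMD only quoted per-core numbers, but they scale to the quad-core die by simple multiplication. The short Python sketch below assumes four identical cores and covers only the CPU cores. The graphics core, memory controller and other uncore logic are left out because AMD hasn't given their size, and `quad_core_totals` is just an illustrative helper name.

```python
# Quad-core Llano totals derived from AMD's quoted per-core figures.
# Assumption: four identical cores; the GPU and uncore are excluded because
# AMD has not published their area or transistor counts.
CORES = 4
CORE_AREA_MM2 = 9.69        # per core, excluding L2 cache
CORE_TRANSISTORS_M = 35     # "over 35M" per core, so the total is a lower bound
L2_PER_CORE_MB = 1

def quad_core_totals(cores=CORES):
    """Return (core area in mm^2, minimum transistors in millions, total L2 in MB)."""
    area = round(cores * CORE_AREA_MM2, 2)
    transistors = cores * CORE_TRANSISTORS_M
    l2 = cores * L2_PER_CORE_MB
    return area, transistors, l2

area, transistors, l2 = quad_core_totals()
print(f"CPU core area (excl. L2): {area} mm^2")
print(f"CPU core transistors:     > {transistors}M")
print(f"Total L2 cache:           {l2} MB")
```

Because AMD said "over 35M" transistors per core, the 140M result is a floor for the four cores, not a count for the whole APU.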

AMD’s First Power Gated CPU

With Nehalem, Intel introduced power gating, a technique that allows an inactive core to be almost completely powered down, minimizing its leakage current. This not only reduces idle power but also lets Intel spend the freed-up TDP on turboing the active cores.
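To see why gating idle cores matters for turbo, here's a toy Python model of a package power budget. Every wattage in it is a hypothetical round number picked for illustration, not an Intel or AMD specification, and a gated core's leakage is treated as zero even though in practice it is only close to zero.

```python
# Toy model of how power gating frees TDP headroom for turbo.
# All wattages are hypothetical, chosen only to illustrate the mechanism;
# they are not Intel or AMD specifications.
TDP_W = 95.0            # package power budget
UNCORE_W = 15.0         # memory controller, caches, I/O (assumed constant)
ACTIVE_CORE_W = 20.0    # one busy core at base clock
IDLE_LEAKAGE_W = 4.0    # idle core that is clock-stopped but still powered
GATED_W = 0.0           # power-gated core; leakage treated as ~zero here

def turbo_headroom(active_cores, total_cores=4, power_gating=True):
    """Watts left under TDP that could be spent clocking up the active cores."""
    idle_cores = total_cores - active_cores
    idle_w = GATED_W if power_gating else IDLE_LEAKAGE_W
    used = UNCORE_W + active_cores * ACTIVE_CORE_W + idle_cores * idle_w
    return TDP_W - used

for gating in (False, True):
    print(f"power gating={gating}: {turbo_headroom(1, power_gating=gating):.0f} W spare")
```

With one busy core in this model, gating the three idle cores raises the spare budget from 48 W to 60 W, which a turbo controller could spend on higher clocks for the active core. Real implementations also juggle temperature and current limits, which the sketch ignores.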

Llano uses power gating as well as a Digital APM Module. AMD doesn’t go into much detail on the Digital APM Module, but I’m guessing we’ll see the same sort of turbo-like functionality out of Llano, including graphics turbo.

AMD also pointed out that Llano uses a “power aware clock grid design”. I couldn’t get much more information out of AMD on this one, other than that it is expecting a ~2x reduction in clock switching power. Simply distributing the clock to all parts of a modern-day microprocessor can take up quite a bit of power, so any improvement in efficiency there is very important.
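Why clock distribution costs so much follows from the standard dynamic-power relation P = α·C·V²·f. A clock net charges and discharges every cycle (activity factor α ≈ 1), so its power scales with the total capacitance it drives, the square of the supply voltage, and the frequency. AMD hasn't said how its clock grid gets to ~2x; the Python sketch below simply assumes the effective switched capacitance is halved, and the capacitance and voltage values are made up for illustration.

```python
# Dynamic switching power: P = alpha * C * V^2 * f
# Capacitance and voltage values are hypothetical; only the scaling matters.
def dynamic_power_w(capacitance_f, voltage_v, freq_hz, activity=1.0):
    """Average switching power in watts; a clock net toggles every cycle, so activity ~ 1."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz

baseline = dynamic_power_w(capacitance_f=2e-9, voltage_v=1.2, freq_hz=3e9)
# Assumption: the "power aware" grid halves the effective capacitance switched.
improved = dynamic_power_w(capacitance_f=1e-9, voltage_v=1.2, freq_hz=3e9)

print(f"baseline clock power: {baseline:.2f} W")
print(f"improved clock power: {improved:.2f} W")
print(f"reduction: {baseline / improved:.1f}x")
```

Cutting the activity factor instead (for example, by gating the clock to idle blocks) would have the same proportional effect in this formula; which lever AMD actually pulls is not something it has disclosed.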

I’ll keep digging to see if I can get any more details on this aspect of Llano.

Final Words

Llano will obviously require a new socket. All AMD is saying is that OEMs will be shipping systems in 2011. It’s unclear if we’ll see anything in the channel before then, but with sampling in the coming months it appears that AMD could be ready for Sandy Bridge when it arrives next year.

AMD isn’t qualifying its 2011 statement with an indication of which quarter to expect systems in. Given that the first samples are going out now, I’d expect to see Llano sometime in the first half of 2011, but that’s purely conjecture on my part. Sandy Bridge is scheduled to ship in volume in the first quarter of 2011.

The big questions going forward are 1) how much AMD and Intel are going to scale up the graphics performance of these chips, and 2) how important DX11 support will be in the upcoming APU race.

Comments

  • knowom - Tuesday, February 9, 2010

    I hate new-socket boards; hopefully AMD will make some 32nm CPUs for their current AM2/AM3 motherboards.
  • Kiijibari - Tuesday, February 9, 2010

    Sure they will. Ever heard of the Bulldozer core? Take 8 of them, put them on a die together with a memory controller, caches and HyperTransport, and say hello to Zambezi, fitting into AM3.

    Much better than an old K10 @32nm :)
  • GaMEChld - Wednesday, February 10, 2010

    I suspect that the Zambezi will be 4 Bulldozer modules, each showing up to the OS as 2 cores. I think they will market it as 8, but it will be 4 Bulldozer modules. Not that I'd mind that or anything (My current 4 Cores / 4 Threads aren't exactly being strained). Just want to make sure they aren't trying to slip one past us.

    The main thing I want is for Bulldozer to be much faster clock for clock than STARS.
  • DigitalFreak - Monday, February 8, 2010

    Essentially a quad core Athlon II vs Intel's next gen technology? Gee, I wonder how that will work out.
  • JKflipflop98 - Wednesday, February 10, 2010

    :) I don't think it's going to work out so well for the little guy.
  • Calin - Tuesday, February 9, 2010

    They will hit a lower price bracket than Intel is doing right now.
    Same computing power at lower cost? Bring it on!
  • GaiaHunter - Monday, February 8, 2010

    This is just their test run on APUs.

    They will pit (or hope to) Bulldozer against higher-end Intel.
  • Blessedman - Monday, February 8, 2010

    So with the integration of CPU/GPU, who cuts through the muck and produces a CPU that is more like a GPU as far as parallel processing goes? To clarify, it seems GPUs are innovating faster and becoming quicker at doing the all-purpose functions of mainstream CPUs. So when do they ditch the idea of 2-6 thick-core CISC chips in favor of thin, massively multi-core RISC chips that can emulate CISC commands quicker than native CISC CPUs? Or am I just way, way off base?
  • tcube - Saturday, March 20, 2010

    You see, basically you're right ... practically you're not. To create an emulator you need extra power. Creating generic wiring for a jack-of-all-trades CPU takes technology that's, in my opinion, at least 10-20 years out. Besides the fact that you need to manage the entire wiring of small unified processing units, you need to translate the input, and then you need to make this entire thing happen quicker than command->dedicated wiring->output (per cycle). That means your jack-of-all-trades needs to be so insanely efficiently engineered that it does not waste extra energy, cost you extra clock cycles, and so on and so forth. And this, my friend, is not on anybody's drawing board at this moment. Even AMD sensed that this was the future back when they decided to go ahead with Fusion (and only now has Intel gotten the idea and is going with Larrabee)... but they just didn't realize how complex the task is... Luckily for us Intel did, and gave us some good CPUs while AMD was in the s(t)inkhole... Now they seem to be coming back, so let's welcome this advancement and hope for the best competition in 2010+.
