The Llano A-Series APU

Although Llano is targeted solely at the mainstream, it is home to a number of firsts for AMD. It is AMD's first chip built on a 32nm SOI process at GlobalFoundries, its first microprocessor to feature more than a billion transistors and, as you'll soon see, the first platform with integrated graphics that's actually worth a damn.

AMD is building two distinct versions of Llano, although only one will be available at launch. There's the quad-core, or big Llano, with four 32nm CPU cores and a 400-core GPU. This chip weighs in at 1.45 billion transistors, nearly 50% more than quad-core Sandy Bridge. Around half of the chip is dedicated to the GPU, however, and those are tightly packed transistors, so the die is only about 5% larger than Sandy Bridge's.

CPU Specification Comparison

CPU                          Manufacturing Process  Cores  Transistor Count  Die Size
AMD Llano 4C                 32nm                   4      1.45B             228mm²
AMD Llano 2C                 32nm                   2      758M              ?
AMD Thuban 6C                45nm                   6      904M              346mm²
AMD Deneb 4C                 45nm                   4      758M              258mm²
Intel Gulftown 6C            32nm                   6      1.17B             240mm²
Intel Nehalem/Bloomfield 4C  45nm                   4      731M              263mm²
Intel Sandy Bridge 4C        32nm                   4      995M              216mm²
Intel Lynnfield 4C           45nm                   4      774M              296mm²
Intel Clarkdale 2C           32nm                   2      384M              81mm²
Intel Sandy Bridge 2C (GT1)  32nm                   2      504M              131mm²
Intel Sandy Bridge 2C (GT2)  32nm                   2      624M              149mm²
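
To put the transistor count and die size figures in perspective, here is a quick Python sketch that derives the ratios and the transistor density from the table. The dictionary layout and variable names are purely illustrative; the numbers are copied straight from the table above.

```python
# Transistor count (millions) and die size (mm^2), copied from the table.
chips = {
    "AMD Llano 4C": (1450, 228),
    "Intel Sandy Bridge 4C": (995, 216),
}

llano_transistors, llano_area = chips["AMD Llano 4C"]
snb_transistors, snb_area = chips["Intel Sandy Bridge 4C"]

# How much bigger is Llano in transistor count and in die area?
transistor_ratio = llano_transistors / snb_transistors
area_ratio = llano_area / snb_area

# Transistor density in millions of transistors per mm^2.
density = {name: count / area for name, (count, area) in chips.items()}

print(f"Transistors: {100 * (transistor_ratio - 1):.1f}% more than Sandy Bridge")
print(f"Die area:    {100 * (area_ratio - 1):.1f}% larger than Sandy Bridge")
for name, value in density.items():
    print(f"{name}: {value:.2f}M transistors/mm^2")
```

The gap between the two ratios is the point: Llano packs far more transistors into each square millimeter than Sandy Bridge does, which is consistent with a large share of the die being GPU logic.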

Given the transistor count, big Llano has a deceptively small amount of cache for the CPU cores. There is no large catch-all L3 and definitely no shared SRAM between the CPU and GPU, just a 1MB private L2 cache per core. That's more L2 cache than either the 45nm quad-core Athlon II or Phenom II parts.


Intel's Sandy Bridge die is only ~20% GPU

The little Llano is a 758 million transistor dual-core version with only 240 GPU cores. Cache sizes are unchanged; little Llano is just a smaller version for lower price points. Initially both quad- and dual-core parts will be serviced by the same 1.45B transistor die. Defective chips will have unused cores fused off and will be sold as dual-core parts. This isn't anything unusual: AMD, Intel and NVIDIA all use die harvesting as part of their overall silicon strategy. The key here is that in the coming months AMD will introduce a dedicated little Llano die to avoid wasting fully functional big Llano parts on the dual-core market. This distinction is important as it indicates that AMD isn't relying on die harvesting in the long run but rather has a targeted strategy for separate market segments.

Architecturally AMD has made some minor updates to each Llano core. AMD is promising more than a 6% increase in instructions executed per clock (IPC) for the Llano cores vs. their 45nm Athlon II/Phenom II predecessors. The increase in IPC is due to the larger L2 cache, larger reorder and load/store buffers, new divide hardware, and improved hardware prefetchers.

On average I measured around a 3% performance improvement at the same clock speed as AMD's 45nm parts. Peak performance improved by up to 14%; however, most of the gains were down in the 3-5% range. This is arguably the biggest problem that faces Llano. AMD's Phenom architecture debuted in 2007 and was updated in 2009. Llano's cores have been sitting around for the past 3-4 years with only a mild update, while Intel has been through two tocks in the same timeframe. A ~6% increase in IPC isn't anywhere near enough to bridge the gap left by Nehalem and Sandy Bridge.

Note that this comparison is without AMD's Turbo Core enabled, but more on that later.


177 Comments


  • JarredWalton - Tuesday, June 14, 2011

    The only way to make sure that Intel's current processors aren't at the top of most charts is to leave them out, particularly on notebooks. If we only look at IGP/fGPU, AMD comes out on top of graphics charts, but is that fair to NVIDIA's Optimus technology that allows dynamic switching between IGP and dGPU in a fraction of a second? The overall tone of this article (apart from the CrossFire section) is positive, but still people look at the charts and freak out because we didn't manipulate data to make Llano look even better. It's not bad, but it's certainly not without flaws.
  • kevith - Tuesday, June 14, 2011

    Oh too bad.

    I would like to use a laptop for music production with Nuendo and Win 7.

    It actually requires a little more graphics muscle than you might think to run an app like Nuendo.

    And, up to now, it has not been possible to get both a powerful CPU and GPU in the same machine for the money I have.

    So now the fGPU is powerful enough, that's great. But it seems that the CPU part of these APUs is too weak.

    Øv...
  • krumme - Tuesday, June 14, 2011

    Øhhh

    Just make sure your computer has 1GB RAM and Win XP SP2; Nuendo even runs on single-core 2GHz whatever old shit.
    I would save the money and buy an e350.
    Heck, you could even buy an Atom 510; according to Anandtech, it's just as fast as the e350 on the CPU side.

    When i think about it. Just do that.
  • ET - Tuesday, June 14, 2011

    As madseven7 commented correctly, this isn't the fastest Llano CPU. There are 45W parts which perform better. They will have less battery life, but a significant increase in core speed. If you're interested in Llano you might want to wait until they get reviewed.
  • JarredWalton - Tuesday, June 14, 2011

    I suspect the 45W Llano parts will only have less battery life if you're specifically doing CPU/GPU intensive tasks. At idle, SNB and Llano should both bottom out at similar levels. For example, if you have a 2630QM and a 2820QM doing nothing, they both run at a very low clock and voltage. We'll test any other Llano chips we can get and report our findings, but other factors (BIOS and firmware optimizations) will generally be more important than whether the TDP is 35W or 45W, at least for our particular battery life tests.
  • Shadowmaster625 - Tuesday, June 14, 2011

    I don't get the Cinebench single-threaded results. An N660 is about the same as a desktop X2 250/255 on that benchmark. Yet this A8-3500M scores only 61% of what an X2 250 does. That would seem to indicate that it is only running at 1.8GHz during that single-threaded test. Why so low with 3 idle cores? It should be running at 2.5GHz and scoring 2500, or just neck and neck with a P520. Turbo is clearly not working anywhere near as well as it should be.
  • krumme - Tuesday, June 14, 2011

    Well this is AMD business at work. They are in a constant learning process and have been for the last 40 years.

    Next time they might consider the following:

    1. Don't send some half-baked prototypes to the reviewers. Wait, e.g., 3 more weeks. This is just old Jerry Sanders style.

    2. Consider not sending stuff to Anandtech. As Anandtech also lives from backlinking, the site needs the new product. And AMD, and consumers trying to make the right decisions, can live without 3 similar i7 plus high-end discrete gfx systems, at 1,200 USD, at the top of each chart. If AMD doesn't understand that it has other interests than Anandtech (it's business for all), it cannot serve its own interest. And it's about time they start to earn their own money. They are competing against Otellini, not some stupid schoolboy.
  • JarredWalton - Tuesday, June 14, 2011

    Thanks, krumme; always a helpful response. Lenovo has taken this to heart, I'm sure you'll be happy to know, and is not sending any review samples our way. Amazingly, we're still able to survive. And FWIW, if AMD hadn't sent us anything, we'd have had more content earlier through other sources. The only way they can get us to abide by NDAs is by actually working with us.
  • krumme - Tuesday, June 14, 2011

    Well, thank you Jarred. That was a helpful answer! That explains a lot.

    I hope AMD gives you attention and works with you in the future; it's in all your readers' interest.

    That means AMD giving you priority, broad access to the right people and more time to do the reviews.
  • JarredWalton - Tuesday, June 14, 2011

    This is something I discussed with AMD numerous times, and it's one of the reasons we want a utility that will show us true CPU clock speeds in real time. Unfortunately, they don't have anything they're willing to share with us right now. They said they have test units where they can monitor this stuff, but it requires special BIOS hooks and those are not present in our preview samples. In theory, Turbo Core should allow the single-threaded Cinebench result to run up to 60% faster than non-Turbo. Of course, we can't even disable Turbo Core either, so we don't know how much TC is actually helping.

    P920 is clocked 6.7% higher than A8-3500M, but 3500M has twice the L2 cache and some other enhancements. With 3500M coming in 17% faster than P920, that would suggest that 3500M averages around 1900MHz, but that could mean it runs at 2.4GHz for a bit and then 1.5GHz for a bit, or somewhere in between.

    Given the way AMD does Turbo Core (monitoring instruction workloads and their relative power requirements), I think that at least right now, it's not being as aggressive as Intel's Turbo Boost. It feels more like Bloomfield and Arrandale turbo, where you got an extra 2-4 bins, rather than Sandy Bridge where you can get an extra 5-10 bins. Hopefully we'll see refinements with Turbo Core over the coming months and years.
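
The rough average-clock estimate in the comment above can be reproduced in a few lines of Python. This is only a sketch under stated assumptions: it takes 1.5GHz as the A8-3500M's base clock (implied by the comment, not stated outright), derives the P920's clock from the quoted 6.7% advantage, and assumes both chips do the same work per clock, which the comment itself notes is not strictly true given the larger L2.

```python
# Back-of-the-envelope estimate of the A8-3500M's average clock during the
# single-threaded Cinebench run, using only figures quoted in the comment.
A8_BASE_GHZ = 1.5             # assumption: base clock implied by "1.5GHz for a bit"
P920_CLOCK_ADVANTAGE = 0.067  # P920 is clocked 6.7% higher than the A8-3500M
A8_SPEEDUP_OVER_P920 = 0.17   # A8-3500M comes in 17% faster than the P920

# P920 clock derived from the stated clock advantage.
p920_ghz = A8_BASE_GHZ * (1 + P920_CLOCK_ADVANTAGE)

# Assumption: equal performance per clock, so the score scales with clock speed.
estimated_avg_ghz = p920_ghz * (1 + A8_SPEEDUP_OVER_P920)

print(f"P920 clock: {p920_ghz:.2f} GHz")
print(f"Estimated A8-3500M average clock: {estimated_avg_ghz:.2f} GHz")
```

If the larger L2 and other enhancements do raise per-clock performance, the real average clock would be somewhat lower than this figure, since part of the 17% would come from those changes rather than from Turbo Core.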
