It’s an Out of Order Atom

Ever since the Pentium Pro (P6), we have been blessed with out of order microprocessor architectures: designs that can execute instructions out of program order to improve performance. Out of order architectures let the processor schedule independent instructions ahead of others that are either waiting for data from main memory or waiting for specific execution resources to free up. The resulting performance boost comes at the expense of power and die size. All of the tracking logic needed to make sure that instructions executed out of order still retire in order eats up die area and burns additional power.
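To make the stall behavior concrete, here is a toy sketch in Python. It is not a model of any real core: it assumes a single-issue machine, a hypothetical 100-cycle load miss, and unlimited tracking resources, and the names Instr, run_in_order and run_out_of_order are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    name: str
    latency: int                               # cycles until the result is available
    deps: list = field(default_factory=list)   # indices of producer instructions


def run_in_order(program):
    """Issue one instruction per cycle, strictly in program order."""
    done = []    # cycle at which each result becomes available
    cycle = 0
    for ins in program:
        # Stall until every operand is ready; younger instructions wait behind it.
        ready = max((done[d] for d in ins.deps), default=0)
        issue = max(cycle, ready)
        done.append(issue + ins.latency)
        cycle = issue + 1
    return max(done)


def run_out_of_order(program):
    """Issue at most one instruction per cycle, oldest ready instruction first."""
    done = [None] * len(program)
    pending = list(range(len(program)))
    cycle = 0
    while pending:
        for i in pending:
            deps = program[i].deps
            if all(done[d] is not None and done[d] <= cycle for d in deps):
                done[i] = cycle + program[i].latency
                pending.remove(i)
                break   # only one instruction may issue this cycle
        cycle += 1
    return max(done)


# A load that misses (hypothetical 100-cycle latency), one add that needs the
# loaded value, then eight adds that depend on nothing.
program = [Instr("load r1", 100), Instr("add r2, r1", 1, [0])]
program += [Instr(f"add r{n}", 1) for n in range(3, 11)]

print("in-order cycles:    ", run_in_order(program))
print("out-of-order cycles:", run_out_of_order(program))
```

In this toy case the in-order machine finishes in 109 cycles because the eight independent adds queue up behind the stalled one, while the out-of-order machine hides them under the miss and finishes in 101. The gap depends entirely on how much independent work is available, and a real design also pays for the tracking logic that this sketch gets for free.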

When Intel designed the Atom processor it went back to an in-order design as a way of reducing power. Intel has committed to using in-order architectures in Atom for 4 - 5 years post introduction (that would end sometime in the 2012 - 2013 time frame).

For smartphones, Intel’s commitment to in-order makes sense. Average power consumption under load needs to remain at less than 1W and you simply can’t hit that with an out-of-order Atom at 45nm.

For netbooks and notebooks however, the tradeoff makes less sense. Jarred has often argued that a CULV notebook is a far better performer than a netbook at very similar price/battery life metrics. No one is pleased with Atom’s performance in a netbook, but there’s clearly demand for the form factor and price point. Where there’s an architectural opportunity like this, AMD is usually there to act.

Over the past decade AMD has refrained from copying an Intel design. Instead, AMD usually looks to leapfrog Intel by implementing forward looking technologies earlier than its competitor. We saw this with the 64-bit K8 and the cache hierarchy of the original Phenom and Phenom II processors. Both featured design decisions that Intel would later adopt; they were simply ahead of their time.

With Atom stuck in an in-order world for the near future, AMD’s opportunity to innovate is clear.

The Architecture

Admittedly I was caught off guard by Bobcat’s architecture: it’s a dual-issue design, the first AMD has introduced since the K6 and the same issue width Intel chose for Atom. Where AMD and Intel diverge, however, is on the execution side: Bobcat is a fully out of order architecture.

The move to out of order should provide a healthy single threaded performance boost over Atom, assuming AMD can ramp clocks up. Bobcat has a 15 stage integer pipeline, very close to Atom's 16 stage pipe. The two pipeline diagrams are below:




Intel's Atom pipeline

You’ll note that there are technically six fetch stages, although only the first three are included in the 15 stage number I mentioned above. AMD mentioned that the remaining three stages are used for branch prediction, but in a manner it is unwilling to disclose at this time due to competitive concerns.

Bobcat has two independent, dual ported integer schedulers. One feeds two ALUs (one of which can perform integer multiplies) while the other feeds two AGUs (one for loads and one for stores).

The FPU has a single dual ported scheduler that feeds two independent FPUs. Similar to the Atom processor, only one of the ports can handle floating point multiplies. The FP mul and add units can perform two single precision (32-bit) multiplies/adds per cycle. Like the integer side, the FPU uses a physical register file to reduce power.
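As a back-of-the-envelope check on what that means for peak throughput, the Python sketch below reads the figures above as two single precision multiplies on the multiply pipe plus two single precision adds on the add pipe each cycle, which is my interpretation rather than a confirmed spec. No clock speed is given here, so the 1.6GHz used in the example is purely a placeholder, and the function names are hypothetical.

```python
def peak_sp_flops_per_cycle(mul_ops_per_cycle=2, add_ops_per_cycle=2):
    """Peak single precision FLOPs per cycle for one core.

    Assumes the multiply and add pipes can both be kept busy every cycle,
    which real code rarely manages.
    """
    return mul_ops_per_cycle + add_ops_per_cycle


def peak_gflops(clock_ghz, cores=1):
    """Theoretical peak in GFLOPS: FLOPs per cycle x clock (GHz) x core count."""
    return peak_sp_flops_per_cycle() * clock_ghz * cores


# Placeholder clock of 1.6GHz and two cores, as in a dual-core part.
print(peak_sp_flops_per_cycle())      # FLOPs per cycle per core
print(peak_gflops(1.6, cores=2))      # theoretical peak for two cores
```

Treat the result as a ceiling, not a prediction: it ignores the scheduler, memory bandwidth and the instruction mix.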

Bobcat supports SSE1-3, with future versions adding more instructions as necessary.

Bobcat also supports out of order loads and stores, similar to Intel’s Core architecture.

The Bobcat core has a 3-cycle 64KB L1 (32KB instruction + 32KB data cache) that’s 8-way set associative. The L2 cache is a 17-cycle, 512KB 16-way set associative cache. I originally measured Atom’s L1 and L2 at 3 and 18 cycles respectively (I’ve heard numbers as low as 15 for Atom’s L2) so AMD is definitely in the right ballpark here.
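For a sense of how those sizes and associativities translate into cache organization, here is a small Python sketch. The 64-byte line size is my assumption (it isn't stated above), and cache_geometry is a hypothetical helper, not anything from AMD.

```python
def cache_geometry(size_bytes, ways, line_bytes=64):
    """Return the number of sets and the address bits used for offset and index.

    sets = capacity / (associativity x line size). The line size defaults to
    64 bytes, which is an assumption and not a figure from the article.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2 of the line size
    index_bits = sets.bit_length() - 1          # log2 of the set count
    return {"sets": sets, "offset_bits": offset_bits, "index_bits": index_bits}


KB = 1024
l1_data = cache_geometry(32 * KB, ways=8)    # 32KB, 8-way L1 data cache
l2 = cache_geometry(512 * KB, ways=16)       # 512KB, 16-way L2

print("L1D:", l1_data)
print("L2: ", l2)
```

Under that assumption the 32KB data cache works out to 64 sets and the 512KB L2 to 512 sets. A different line size would change both numbers.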


Intel's Atom Microarchitecture

Unlike the original Atom, Bobcat will never ship as a standalone microprocessor. Instead it will be integrated with other cores and a GPU and sold as a single SoC. The first incarnation of Bobcat will be Ontario, a processor due out in early 2011 for netbooks and thin and light notebooks. Ontario will integrate two Bobcat cores with an AMD GPU and will be manufactured on TSMC’s 40nm process (Bobcat will be the first x86 core made at TSMC). This will be the first Fusion product to hit the market.

Note that there's an on-die memory controller but it's actually housed in between the CPU and GPU in order to equally serve both masters.


76 Comments


  • Dustin Sklavos - Tuesday, August 24, 2010 - link

    Comments like this really bother me. You may not care about netbooks, but a lot of people do. Current ones don't pass the grandma test (can your grandmother do whatever task she needs to on them, like check e-mail, browse the internet, or watch HD video?), so any advance here is welcome.

    Generally speaking a netbook is not supposed to be your main machine, but something you can chuck into your bag and take with you and do a little work on here and there. I write a lot, and have to work on other people's computers from time to time, so a netbook that doesn't completely suck is invaluable to me. Netbook performance is dismal right now, but Bobcat could successfully fix this market segment.

    So no, you're not interested in netbooks and you'd rather be raked through hot coals than purchase one. But that just means they're not useful - TO YOU. There are a lot of people here interested in what Bobcat can do for these portables, and I count myself among them.
  • Lonbjerg - Wednesday, August 25, 2010 - link

    I don't care that many people care for mediocre performance in a crappy format.
    No matter what you do with a netbook, it will always be lacking.

    I don't care what grandma wants (she will buy Intel BTW, due to Intel's brand recognition)

    I don't care for Atom either.
    Or i3
    Or i5
    Or Phenom
    I do care about a replacement for my i7 @ 3.5GHz...
  • Dustin Sklavos - Wednesday, August 25, 2010 - link

    I'm trying to figure out why you're commenting on any of this at all.
  • flipmode - Tuesday, August 24, 2010 - link

    Seriously Anand, it is crummy that I cannot find a whole section of your website. I hate to spam an entirely separate article, but how completely lame it is to have to spend 15 minutes doing a Google advanced search to find the Anandtech article I'm looking for.

    One of the very, very few truly Class A+ hardware sites on the internet - you can count all the members of that class on one hand - and you make it seriously hard to find past articles and you completely OMIT a link to an entire category of your reviews. Insane.

    Please put a link to the "System" section somewhere. Please!
  • JarredWalton - Tuesday, August 24, 2010 - link

    Our system section hasn't had a lot of updates, but you can get there via:
    http://www.anandtech.com/tag/systems

    In fact, most common tags can be put there (i.e. /AMD, /Intel, /NVIDIA, /HP, /ASUS, etc.) The only catch is that many of the tags will only bring up articles since the site redesign, so you'll want to stick with the older main topics for some areas. Hope that helps.
  • mino - Tuesday, August 24, 2010 - link

    "so I’m wondering if we’ll see Bulldozer adopt a 3 - 4 channel DDR3 memory controller"

    Bulldozer will use the current G34 platform. Hope that answers your wonder :)
  • VirtualLarry - Tuesday, August 24, 2010 - link

    Bulldozer sounds like amazing stuff. I wonder if the way they have arranged the int units into modules means that we will be getting more cores for our dollars, compared to Intel. More REAL cores, I mean. I'm just a little disappointed that the int pipelines went from 3 ALUs to 2 ALUs; I hope that doesn't affect performance too much.
  • gruffi - Thursday, August 26, 2010 - link

    Integer instruction pipelines are increased from 3 to 4. That's 33% more peak throughput. The number of ALUs/AGUs to keep these pipelines busy is meaningless without knowing details. K10 has 3 ALUs and 3 AGUs, but they are bottlenecked and partially idling most of the time. Bulldozer can do more operations per cycle while drawing less power, even with only 2 ALUs and 2 AGUs. How can that be disappointing?
  • ezodagrom - Tuesday, August 24, 2010 - link

    I think Bulldozer has the potential to be really competitive, mainly because Sandy Bridge looks quite unimpressive.
    In a recent leaked powerpoint from Intel, apparently until Q3 2011 the best Intel CPU is still going to be Gulftown based, possibly Core i7 990X. According to Intel benchmarks on the leaked powerpoint, the best Sandy Bridge, that is, Core i7 2600, apparently will be around 15% to 25% better than the i7 870, with the i7 980X being 25% to 35% better than the i7 2600.
  • Mat3 - Tuesday, August 24, 2010 - link

    I have a question.. it was earlier speculated that BD would have four ALU pipelines per integer core. It was thought that one way they could make use of them was to send a branch down two pipes and take the correct result. Obviously this isn't the case, but my question is, why not? Wouldn't it be better to do that and just discard the branch predictors entirely? Why isn't that better?
