It’s an Out of Order Atom

Ever since the Pentium Pro (P6), we have been blessed with out of order microprocessor architectures: designs that can execute instructions out of program order to improve performance. Out of order architectures let you schedule independent instructions ahead of others that are either waiting for data from main memory or waiting for specific execution resources to free up. The resulting performance boost comes at the expense of power and die size. All of the tracking logic needed to make sure that instructions executed out of order still retire in order eats up die area and burns more power.
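To make the scheduling difference concrete, here is a minimal toy model, not a description of any real core. It assumes a dual-issue machine, a made-up 10-cycle load, and a made-up chain of six single-cycle instructions that do not depend on that load. The names `Instr`, `simulate` and `program` are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    latency: int      # cycles until the result is available (illustrative values)
    deps: tuple = ()  # names of instructions whose results are needed


def simulate(program, width=2, in_order=True):
    """Return the cycle on which the last result becomes available."""
    done_at = {}             # name -> cycle on which its result is ready
    pending = list(program)  # not yet issued, kept in program order
    cycle = 0
    while pending:
        issued = []
        for ins in pending:
            if len(issued) == width:
                break
            ready = all(d in done_at and done_at[d] <= cycle for d in ins.deps)
            if ready:
                issued.append(ins)
            elif in_order:
                break  # in-order: nothing younger may pass a blocked instruction
        for ins in issued:
            done_at[ins.name] = cycle + ins.latency
            pending.remove(ins)
        cycle += 1
    return max(done_at.values())


# A slow load, one consumer of that load, then an independent dependency chain.
program = [Instr("load", 10), Instr("use", 1, ("load",)), Instr("c0", 1)]
program += [Instr(f"c{i}", 1, (f"c{i - 1}",)) for i in range(1, 6)]

in_order_cycles = simulate(program, in_order=True)
out_of_order_cycles = simulate(program, in_order=False)
print(in_order_cycles, out_of_order_cycles)
```

In this toy case the in-order machine sits idle behind the instruction waiting on the load, while the out of order machine works through the independent chain in the load's shadow. The model ignores retirement, register renaming and finite scheduler sizes, which is exactly the tracking logic that costs area and power in real hardware.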

When Intel designed the Atom processor it went back to an in-order design as a way of reducing power. Intel has committed to using in-order architectures in Atom for 4 - 5 years post introduction (that would end sometime in the 2012 - 2013 time frame).

For smartphones, Intel’s commitment to in-order makes sense. Average power consumption under load needs to remain at less than 1W and you simply can’t hit that with an out-of-order Atom at 45nm.

For netbooks and notebooks however, the tradeoff makes less sense. Jarred has often argued that a CULV notebook is a far better performer than a netbook at very similar price/battery life metrics. No one is pleased with Atom’s performance in a netbook, but there’s clearly demand for the form factor and price point. Where there’s an architectural opportunity like this, AMD is usually there to act.

Over the past decade AMD has refrained from copying Intel's designs. Instead, AMD usually looks to leapfrog Intel by implementing forward-looking technologies earlier than its competitor. We saw this with the 64-bit K8 and the cache hierarchy of the original Phenom and Phenom II processors. Both featured design decisions that Intel would later adopt; they were simply ahead of their time.

With Atom stuck in an in-order world for the near future, AMD’s opportunity to innovate is clear.

The Architecture

Admittedly I was caught off guard by Bobcat’s architecture: it’s a dual-issue design, the first AMD has introduced since the K6 and also the same issue width Intel chose for Atom. Where AMD and Intel diverge, however, is on the execution side: Bobcat is a fully out of order architecture.

The move to out of order should provide a healthy single threaded performance boost over Atom, assuming AMD can ramp clocks up. Bobcat has a 15 stage integer pipeline, very close to Atom's 16 stage pipe. The two pipeline diagrams are below:




Intel's Atom pipeline

You’ll note that there are technically six fetch stages, although only the first three are included in the 15 stage number I mentioned above. AMD mentioned that the remaining three stages are used for branch prediction, but in a manner it is unwilling to disclose at this time due to competitive concerns.

Bobcat has two independent, dual ported integer schedulers. One feeds two ALUs (one of which can perform integer multiplies) while the other feeds two AGUs (one for loads and one for stores).

The FPU has a single dual ported scheduler that feeds two independent FPUs. Similar to the Atom processor, only one of the ports can handle floating point multiplies. The FP mul and add units can perform two single precision (32-bit) multiplies/adds per cycle. Like the integer side, the FPU uses a physical register file to reduce power.

Bobcat supports SSE1-3, with future versions adding more instructions as necessary.

Bobcat also supports out of order loads and stores, similar to Intel’s Core architecture.

The Bobcat core has a 3-cycle 64KB L1 (32KB instruction + 32KB data cache) that’s 8-way set associative. The L2 cache is a 17-cycle, 512KB 16-way set associative cache. I originally measured Atom’s L1 and L2 at 3 and 18 cycles respectively (I’ve heard numbers as low as 15 for Atom’s L2) so AMD is definitely in the right ballpark here.
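As a quick back-of-the-envelope check on what those sizes and associativities imply, the sketch below works out the number of sets in each cache. The 64-byte line size is my assumption (it isn't stated above), and I'm treating each 32KB half of the L1 as 8-way on its own. The `cache_sets` helper is hypothetical, written just for this example.

```python
def cache_sets(size_bytes, ways, line_bytes=64):
    """Number of sets in a set-associative cache.

    line_bytes=64 is an assumption; the line size is not given in the article.
    """
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("size must be a multiple of ways * line_bytes")
    return sets


KB = 1024
l1_sets = cache_sets(32 * KB, ways=8)    # one 32KB L1 half, 8-way
l2_sets = cache_sets(512 * KB, ways=16)  # 512KB L2, 16-way
print(l1_sets, l2_sets)
```

Under those assumptions each L1 half works out to 64 sets and the L2 to 512 sets. A different line size would scale both numbers proportionally.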


Intel's Atom Microarchitecture

Unlike the original Atom, Bobcat will never ship as a standalone microprocessor. Instead it will be integrated with other cores and a GPU and sold as a single SoC. The first incarnation of Bobcat will be a processor called Ontario, due out in early 2011 for netbooks and thin and light notebooks. Ontario will integrate two Bobcat cores with an AMD GPU and will be manufactured on TSMC’s 40nm process (Bobcat will be the first x86 core made at TSMC). This will be the first Fusion product to hit the market.

Note that there's an on-die memory controller but it's actually housed in between the CPU and GPU in order to equally serve both masters.


76 Comments


  • mino - Tuesday, August 24, 2010 - link

    From the HW design POV, those pipes are "MMX/3Dnow" class stuff.
    They run SSE3, but they are still MMX-class.

    There is a reason Bulldozer has "FMAC" written there ...
  • Kiijibari - Tuesday, August 24, 2010 - link

    ... it is stupid to name a circuit after a deprecated ISA extension and not after its function.
    If its doing stuff like 3dnow and mmx then call it Shuffel / permutation pipeline but not MMX ...

    The FMAC is the best example .. why is it written FMAC in that case and not SSE5/AVX/XOP ?
  • KonradK - Thursday, August 26, 2010 - link

    Deprecated does not mean prohibited. Also, there are existing MMX programs, as well as 64-bit operating systems other than Windows and compilers other than MSVC.

    MMX and x87 is prohibited in 64bit kernel code.

    http://msdn.microsoft.com/en-us/library/ff545910%2...
  • iwod - Tuesday, August 24, 2010 - link

    From the design of Bulldozer's FPU it is clear that AMD wants Multi Threaded FPU to run on OpenCL. While the dual Integer looks interesting now, it is up against Sandy Bridge, the architecture that is supposed to leap again like Pentium 4 to C2D. And if Bulldozer comes any later, it will be up against the die shrink of Sandy Bridge, Ivy Bridge. Things don't look so good here.

    It is mainstream / low end that looks very interesting. I am currently using a Pentium M 1.8Ghz Dothan with 2GB DDR Ram. With a Radeon 1600 Graphics. I dont get hardware acceleration from GPU, 720P is just barely playable with some very fast software decoder. It is fast enough to watch some 460p youtube and most of my day web serving.

    Now if Bobcat has similar or higher IPC than Dothan, a Quad Core Bobcat with Radeon 5000 64 SP will still be within reasonable die size on 40nm. It will be cheap when it drops to 32nm or lower. Most of us don't need a SUPER FAST computer. Bobcat with Radeon 5 Series or higher plus a fast SSD is all we need.
  • aegisofrime - Tuesday, August 24, 2010 - link

    I don't recall Sandy Bridge being a revolutionary leap. Everyone has been saying that it's more of evolutionary, the main difference being the addition of AVX.

    I REALLY REALLY REALLY hope that AMD announces later today what socket Bulldozer will be on... I desperately need more video encoding performance. I have an AM2+ motherboard and that bloody 1055T is singing its siren song to me every night. If Bulldozer is on AM3 I can get an AM3 board and the 1055T and do a quick upgrade to Bulldozer.

    Come on AMD. Your customers need more information to make an informed decision!
  • mino - Tuesday, August 24, 2010 - link

    Bulldozer gen1 == primarily servers
    => 16/12-core (MCM) Socket G34 (current platform)
    => 8/6/4-core Socket G32 (current platform)

    Bulldozer Desktop (hopefully before X-mas 2011)
    => 8?/6/4-core Socket AM3R2(or AM3+, whatever they call it)
  • Pirks - Tuesday, August 24, 2010 - link

    Huh? You want more video encoding perfomance and you think about upgrading CPU? What kind of idiocy is that? Use 480GTX with Badaboom and your video encoding speed won't be matched by CPUs of year 2020 or maybe even 2030 :P
  • aegisofrime - Tuesday, August 24, 2010 - link

    Don't talk if you don't know what you are talking about. No GPU encoder out there is able to match x264 quality or SPEED wise. And the huge flaw in your statement is that Badaboom doesn't even support Fermi GPUs right now.

    Have you done any serious video encoding before, or are you just trolling as usual?
  • ChronoReverse - Tuesday, August 24, 2010 - link

    Indeed. I would try out CUDA encoders every once in a while in hopes that I could at least get the quality of x264 at MINIMUM quality but they can't even match that.

    Since x264 at minimum quality encodes slightly quicker (on my quad core) than a CUDA encoder does (on my GTX260) and still yields better quality, I really appreciate faster CPUs.
  • mapesdhs - Tuesday, August 24, 2010 - link


    Hate to say it but unless GPU acceleration is available, the i7 is a far better
    choice for video encoding. I still use a 6000+ for most tasks, but numerous
    article reviews made it quite clear that AMD was not the best choice for
    video encoding, so I went with an i7 860 4GHz. Pricing was surprisingly good,
    speed is excellent.

    Ian.
