It’s an Out of Order Atom

Ever since the Pentium Pro (P6), we have been blessed with out of order microprocessor architectures: designs that can execute instructions out of program order to improve performance. Out of order architectures let you schedule independent instructions ahead of others that are either waiting for data from main memory or waiting for specific execution resources to free up. The resulting performance boost comes at the expense of power and die size. All of the tracking logic needed to make sure that instructions executed out of order still retire in order eats up die area and burns extra power.
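To make the scheduling difference concrete, here is a toy Python sketch of a dual-issue machine that can run either in order or out of order. Everything in it is an assumption made for illustration: the `Instr` and `run` names, the five-instruction program, and the latencies (a 10-cycle load standing in for a cache miss) are hypothetical and do not model Atom, Bobcat, or any real core. It also ignores execution-port limits and in-order retirement, so it only shows the effect of letting independent instructions slip ahead of a stalled one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    deps: tuple = ()   # indices of earlier instructions whose results this one reads
    latency: int = 1   # cycles from issue until the result is available (assumed)


def run(program, width=2, out_of_order=False):
    """Return the cycle in which the last result becomes available."""
    done_at = {}                          # instruction index -> cycle its result is ready
    pending = list(range(len(program)))   # not yet issued, kept in program order
    cycle = 0
    while pending:
        issued = []
        for idx in pending:
            if len(issued) == width:
                break
            ready = all(d in done_at and done_at[d] <= cycle
                        for d in program[idx].deps)
            if ready:
                issued.append(idx)
            elif not out_of_order:
                break   # in order: a stalled instruction blocks everything behind it
        for idx in issued:
            done_at[idx] = cycle + program[idx].latency
            pending.remove(idx)
        cycle += 1
    return max(done_at.values())


# Hypothetical program: a slow load, one dependent add, then independent work.
program = [
    Instr("load r1", latency=10),
    Instr("add r2 = r1 + 1", deps=(0,)),
    Instr("mul r3 = r4 * r5", latency=3),
    Instr("add r6 = r3 + 1", deps=(2,)),
    Instr("add r7 = r8 + 1"),
]

in_order_cycles = run(program, out_of_order=False)
out_of_order_cycles = run(program, out_of_order=True)
print(in_order_cycles, out_of_order_cycles)
```

Under these made-up latencies the in-order run finishes in 14 cycles and the out of order run in 11, because the independent multiply and add issue while the first add is still waiting on the load.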

When Intel designed the Atom processor it went back to an in-order design as a way of reducing power. Intel has committed to using in-order architectures in Atom for 4 - 5 years post introduction (that would end sometime in the 2012 - 2013 time frame).

For smartphones, Intel’s commitment to in-order makes sense. Average power consumption under load needs to remain at less than 1W and you simply can’t hit that with an out-of-order Atom at 45nm.

For netbooks and notebooks however, the tradeoff makes less sense. Jarred has often argued that a CULV notebook is a far better performer than a netbook at very similar price/battery life metrics. No one is pleased with Atom’s performance in a netbook, but there’s clearly demand for the form factor and price point. Where there’s an architectural opportunity like this, AMD is usually there to act.

Over the past decade AMD has refrained from copying Intel's designs. Instead, AMD usually looks to leapfrog Intel by implementing forward looking technologies earlier than its competitor. We saw this with the 64-bit K8 and the cache hierarchy of the original Phenom and Phenom II processors. Both featured design decisions that Intel would later adopt; they were simply ahead of their time.

With Atom stuck in an in-order world for the near future, AMD’s opportunity to innovate is clear.

The Architecture

Admittedly I was caught off guard by Bobcat’s architecture: it’s a dual-issue design, the first AMD has introduced since the K6 and also the same issue width Intel chose for Atom. Where AMD and Intel diverge, however, is on the execution side: Bobcat is a fully out of order architecture.

The move to out of order should provide a healthy single threaded performance boost over Atom, assuming AMD can ramp clocks up. Bobcat has a 15 stage integer pipeline, very close to Atom's 16 stage pipe. The two pipeline diagrams are below:


Intel's Atom pipeline

You’ll note that there are technically six fetch stages, although only the first three are included in the 15 stage number I mentioned above. AMD mentioned that the remaining three stages are used for branch prediction, but in a manner it is unwilling to disclose at this time due to competitive concerns.

Bobcat has two independent, dual ported integer schedulers. One feeds two ALUs (one of which can perform integer multiplies) while the other feeds two AGUs (one for loads and one for stores).

The FPU has a single dual ported scheduler that feeds two independent FPUs. Similar to the Atom processor, only one of the ports can handle floating point multiplies. The FP mul and add units can perform two single precision (32-bit) multiplies/adds per cycle. Like the integer side, the FPU uses a physical register file to reduce power.

Bobcat supports SSE1-3, with future versions adding more instructions as necessary.

Bobcat also supports out of order loads and stores, similar to Intel’s Core architecture.

The Bobcat core has a 3-cycle 64KB L1 (32KB instruction + 32KB data cache) that’s 8-way set associative. The L2 cache is a 17-cycle, 512KB 16-way set associative cache. I originally measured Atom’s L1 and L2 at 3 and 18 cycles respectively (I’ve heard numbers as low as 15 for Atom’s L2) so AMD is definitely in the right ballpark here.
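For a rough feel of how much a one-cycle difference in L2 latency matters, the Python sketch below computes an average load latency from the cycle counts quoted above. The hit rates and the 150-cycle memory latency are placeholder assumptions, not measurements of either chip, and the cache latencies are treated as total load-to-use figures.

```python
def average_load_latency(l1_cycles, l2_cycles, mem_cycles, l1_hit, l2_hit):
    """Expected load-to-use latency in cycles for a two-level cache hierarchy."""
    # Loads that miss L1 are served by L2 or, failing that, by main memory.
    l2_or_memory = l2_hit * l2_cycles + (1.0 - l2_hit) * mem_cycles
    return l1_hit * l1_cycles + (1.0 - l1_hit) * l2_or_memory


# Assumed workload behaviour (hypothetical, identical for both chips).
L1_HIT, L2_HIT, MEM_CYCLES = 0.95, 0.80, 150

# Cache latencies from the article: 3/17 cycles for Bobcat, 3/18 measured on Atom.
bobcat = average_load_latency(3, 17, MEM_CYCLES, L1_HIT, L2_HIT)
atom = average_load_latency(3, 18, MEM_CYCLES, L1_HIT, L2_HIT)
print(f"Bobcat: {bobcat:.2f} cycles, Atom: {atom:.2f} cycles")
```

With these assumptions the two land within a few hundredths of a cycle of each other (about 5.03 vs 5.07), which is why a 17 vs 18 cycle L2 is best read as parity rather than an advantage for either side.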


Intel's Atom Microarchitecture

Unlike the original Atom, Bobcat will never ship as a standalone microprocessor. Instead it will be integrated with other cores and a GPU and sold as a single SoC. The first incarnation of Bobcat will be a processor called Ontario, due out in early 2011 for netbooks and thin and light notebooks. Ontario will integrate two Bobcat cores with an AMD GPU, manufactured on TSMC’s 40nm process (Bobcat will be the first x86 core made at TSMC). This will be the first Fusion product to hit the market.

Note that there's an on-die memory controller but it's actually housed in between the CPU and GPU in order to equally serve both masters.

76 Comments


  • Zoomer - Wednesday, August 25, 2010 - link

    Basically you'll need 2x the power for much less than 2x performance increase. Modern branch predictors can have very good hit rates ~90%+. It simply made more sense to use the second int unit for another thread.

    However, if you need the absolutely best single threaded int performance at all costs, imho, what you suggest wouldn't be bad. In fact,
  • Edison5do - Tuesday, August 24, 2010 - link

    Finally, besides the price competition, we will be able to see some tech competition. We have to praise AMD for not rejecting the ATI brand, because new, high-tech CPUs should be paired with high quality, nicely priced Radeon GPUs.

    I really don't think people are ready to see the "AMD" brand as a head-to-head competitor to the "INTEL" brand. By this I mean that they should rely on ATI, which is well accepted by the public, for more time before they even start thinking about that.
  • angrysand - Tuesday, August 24, 2010 - link

    They may have had the on-die memory controller, but Atom basically created the netbook market. AMD is just improving on what Intel helped create (and that remains to be seen).

    I'd hate to see AMD go because I like having reasonable performance for a reasonable price. But they had better get their act together and put out faster CPUs.
  • ABR - Wednesday, August 25, 2010 - link

    Atom did not create the netbook market, some convergence of wireless data and increasing use of the web by non-computer folk did. The first "netbook" products were the Crusoe-based mini-notebooks starting in 2001. Unfortunately for Transmeta, interest in the high-portability / long battery life model was low, only a couple of models even came out, and they ended up having to compete with Intel for scraps of the low-end laptop market. They lost, and Intel only finally caught up with their technology later with the Atom, when, coincidentally or not, the market was finally ready.
  • Nehemoth - Tuesday, August 24, 2010 - link

    Why will Bobcat be manufactured on the 40nm process instead of 32nm? Is it because of the GPU?

    Why will it be manufactured at TSMC instead of GlobalFoundries?

    I suppose this could be a problem with GF not being ready at 32nm, but could we see a switch from TSMC to GlobalFoundries after Bulldozer begins to be manufactured?
  • iwod - Wednesday, August 25, 2010 - link

    TSMC has much higher 40nm capacity than GF's 32nm. Bobcat is going to be a low end product which will hopefully generate a high volume of sales. TSMC in this case will be a much better fit than GF.
  • moozoo - Wednesday, August 25, 2010 - link

    I wonder how hard it would be to make a version that has two floating point cores and one integer core.

    Will AMD have a product to match Intel's MIC (Larrabee)?
    (http://www.anandtech.com/show/3749/intel-mic-22nm-...
  • YuryMalich - Wednesday, August 25, 2010 - link

    Hi,
    There is a mistake on page 5 on this picture http://images.anandtech.com/reviews/cpu/amd/hotchi...
    Two 128-bit FMAC units are drawn in the Phenom II microarchitecture.
    But the K10 processor doesn't have FMAC units at all! It has one FMUL, one FADD and one FMISC (FLOAD) unit.
    The FMAC (multiply-add) units are new in the Bulldozer microarchitecture.
  • Jack Sparow - Wednesday, August 25, 2010 - link

    "Ivo August 25, 2010
    How many threads everyone processor (“Interlagos”, “Valencia” and “Zambezi”) can do simultaneously per core compare with Phenom II processor?

    Reply
    John Fruehe August 25, 2010
    One thread per core."

    This quote is from AMD blogs home. :)
  • silverblue - Wednesday, August 25, 2010 - link

    I think I touched on this before once on a THQ news article - John Fruehe is being confusing. The correct definition of a complete Bulldozer core is a module, which is a monolithic dual-integer core package also consisting of other shared resources - the top image on page 4 of this article is a great guide. So, a four module (or quad core as we currently term them) Bulldozer will handle eight threads concurrently as those four cores possess eight integer cores.

    As such, I don't see non-SMT Bulldozer cores ever coming out.