Physical Register Files to Save Power

The original x86 instruction set defines only eight general purpose registers. To maintain backwards compatibility with legacy x86 code, the ISA and its registers were preserved. Scaling performance with wide out of order architectures, however, required larger register files. The solution was register renaming: the hardware provides additional registers not defined in the x86 spec and maps the architectural registers onto them on the fly.

Register renaming is done in all modern day x86 processors. There are two approaches to register renaming. The current Phenom II/Opteron approach actually carries the data from renamed registers along with the instruction as it moves through queues before it gets executed. You effectively create very wide instructions, which is horribly power inefficient (moving data on a chip takes a lot of power) although it gets the job done from a performance standpoint.

The alternative is something that we don’t see used in any current generation microprocessors. Instead of carrying data along with the instructions, you simply carry pointers to the data with those instructions. There’s added management complexity but you don’t have to worry about moving lots of data around, and therefore avoid much of the power penalty.
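To make the pointer-based approach concrete, here is a minimal sketch of renaming against a physical register file. It is an illustration under stated assumptions, not a description of Bobcat's actual hardware: the class and field names (`RenameUnit`, `RenamedOp`, `rename_map`, `free_list`) and the 16-entry register file are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical sizes for illustration only; not Bobcat's real configuration.
ARCH_REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
NUM_PHYSICAL = 16


@dataclass
class RenamedOp:
    dest_tag: int       # physical register that will receive the result
    src_tags: list      # pointers to the source operands, not the data itself
    prev_dest_tag: int  # older mapping of the destination, recycled at retire


class RenameUnit:
    def __init__(self, arch_regs, num_physical):
        # One physical register file holds every value.
        self.prf = [0] * num_physical
        # Architectural register i starts out mapped to physical register i.
        self.rename_map = {name: i for i, name in enumerate(arch_regs)}
        # The remaining physical registers are free for renaming.
        self.free_list = list(range(len(arch_regs), num_physical))

    def rename(self, dest, srcs):
        # Look up the sources first so an op that reads and writes the
        # same register sees the older mapping.
        src_tags = [self.rename_map[s] for s in srcs]
        if not self.free_list:
            raise RuntimeError("no free physical register; rename would stall")
        dest_tag = self.free_list.pop(0)
        prev_dest_tag = self.rename_map[dest]
        self.rename_map[dest] = dest_tag
        return RenamedOp(dest_tag, src_tags, prev_dest_tag)

    def retire(self, op):
        # Once the op retires, the older mapping can never be read again.
        self.free_list.append(op.prev_dest_tag)


unit = RenameUnit(ARCH_REGS, NUM_PHYSICAL)
first = unit.rename("eax", ["ebx", "ecx"])   # eax = f(ebx, ecx)
second = unit.rename("eax", ["esi", "edi"])  # unrelated write to eax
third = unit.rename("edx", ["eax"])          # reads the second eax
print(first, second, third)
```

The two writes to eax land in different physical registers, so they no longer have to wait on each other, and the third operation carries only a small tag pointing at the value it needs. The queues move a few bits of pointer per operand instead of the full operand data, which is where the power saving comes from.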

Bobcat (as well as Bulldozer) uses physical register files to save power. Intel actually did this in the Pentium 4 but hasn’t used PRFs since. AMD argues that with power as a major driver of design, PRFs will be necessary in future architectures.

Bobcat’s Performance Expectations

With nearly the same pipeline depth as Atom (15 vs. 16 stages), nearly the same cache latencies, the same instruction issue width and presumably competitive clock speeds (~1.5GHz), Bobcat based microprocessors should inherently outperform Atom thanks to their out of order architecture.

Atom does hold an advantage in that each core is multithreaded, so heavily threaded apps may have an advantage on Intel’s architecture. That being said, by far the biggest issue we have with Atom based netbooks is their single threaded performance that contributes to an overall slow user experience. Bobcat should hopefully address that.

On the threaded side, AMD does have another solution. As I mentioned before, Bobcat won’t be used in a microprocessor by itself - Ontario will feature two of them. AMD said that future designs are expected to integrate 2 or 4 Bobcat cores. There are no plans to produce a single core version, but it’s always possible.

I believe a dual core Ontario based on Bobcat, if clocked high enough, could deliver a good enough balance of single and multithreaded performance to really challenge Atom in the netbook space. The assumption is that graphics performance will be much better than Atom with Ontario integrating an AMD GPU.

AMD’s official line is that Ontario will be able to deliver 90% of the performance of a mainstream notebook in less than half the die area. AMD isn’t just looking to compete with Atom, but to go after even the CULV market with Ontario. Only time will tell if the latter is overzealous.

Power Concerns

AMD calls Bobcat sub-1W capable, which seems to imply that short of a smartphone Bobcat could go anywhere Atom could go. Technically, if AMD wanted to, even getting one into a smartphone wouldn’t be impossible - it would just require a healthy investment in chipsets.

It remains to be seen how good TSMC’s 40nm process will be compared to Atom’s Intel-manufactured 45nm transistors in terms of power consumption. Presumably the out of order aspect of the design will guarantee higher power consumption than Atom, but for the netbook/CULV notebook market the added performance may be worth the added power consumption.


76 Comments


  • Zoomer - Wednesday, August 25, 2010 - link

    Basically you'll need 2x the power for much less than 2x performance increase. Modern branch predictors can have very good hit rates ~90%+. It simply made more sense to use the second int unit for another thread.

    However, if you need the absolutely best single threaded int performance at all costs, imho, what you suggest wouldn't be bad. In fact,
  • Edison5do - Tuesday, August 24, 2010 - link

    Finally, besides the price competition, we will be able to see some tech competition. We have to praise AMD for not rejecting the ATI brand, because new, hi-tech CPUs should be paired with high quality, nicely priced Radeon GPUs.

    I really don't think people are ready to see the "AMD" brand as a head-to-head competitor to the "INTEL" brand. By this I mean that they should rely on ATI, which is well accepted by the public, for more time before they even start thinking about that.
  • angrysand - Tuesday, August 24, 2010 - link

    They may have had the on-die memory controller, but Atom basically created the netbook market. AMD is just improving on what Intel helped create (and that remains to be seen).

    I'd hate to see AMD go because I like having reasonable performance for a reasonable price. But they had better get their act together and put out faster CPUs.
  • ABR - Wednesday, August 25, 2010 - link

    Atom did not create the netbook market, some convergence of wireless data and increasing use of the web by non-computer folk did. The first "netbook" products were the Crusoe-based mini-notebooks starting in 2001. Unfortunately for Transmeta, interest in the high-portability / long battery life model was low, only a couple of models even came out, and they ended up having to compete with Intel for scraps of the low-end laptop market. They lost, and Intel only finally caught up with their technology later with the Atom, when, coincidentally or not, the market was finally ready.
  • Nehemoth - Tuesday, August 24, 2010 - link

    Why will Bobcat be manufactured on the 40nm process instead of 32nm? Is it because of the GPU?

    Why will it be manufactured at TSMC instead of GlobalFoundries?

    I suppose this could be a problem with GF not being ready at 32nm, but could we see a switch from TSMC to GlobalFoundries after Bulldozer begins to be manufactured?
  • iwod - Wednesday, August 25, 2010 - link

    TSMC has much higher 40nm capacity than GF has at 32nm. Bobcat is going to be a low end product which will hopefully generate a high volume of sales. TSMC in this case will be a much better fit than GF.
  • moozoo - Wednesday, August 25, 2010 - link

    I wonder how hard it would be to make a version that has two floating point cores and one integer core.

    Will AMD have a product to match Intel's MIC (Larrabee)?
    (http://www.anandtech.com/show/3749/intel-mic-22nm-...
  • YuryMalich - Wednesday, August 25, 2010 - link

    Hi,
    There is a mistake on page 5 on this picture http://images.anandtech.com/reviews/cpu/amd/hotchi...
    Two 128-bit FMAC units are drawn in the Phenom II microarchitecture.
    But the K10 processor doesn't have FMAC units at all! It has one FMUL, one FADD and one FMISC (FLOAD) unit.
    The FMAC (multiply-add) units are new in the Bulldozer microarchitecture.
  • Jack Sparow - Wednesday, August 25, 2010 - link

    "Ivo August 25, 2010
    How many threads everyone processor (“Interlagos”, “Valencia” and “Zambezi”) can do simultaneously per core compare with Phenom II processor?

    Reply
    John Fruehe August 25, 2010
    One thread per core."

    This quote is from AMD blogs home. :)
  • silverblue - Wednesday, August 25, 2010 - link

    I think I touched on this before once on a THQ news article - John Fruehe is being confusing. The correct definition of a complete Bulldozer core is a module, which is a monolithic dual-integer core package also consisting of other shared resources - the top image on page 4 of this article is a great guide. So, a four module (or quad core as we currently term them) Bulldozer will handle eight threads concurrently as those four cores possess eight integer cores.

    As such, I don't see non-SMT Bulldozer cores ever coming out.
