Physical Register Files to Save Power

The original x86 instruction set defines only eight general purpose registers. To maintain backwards compatibility with legacy x86 code, the ISA and its registers were preserved. To scale performance with wide out of order architectures, however, we needed larger register files. The solution was register renaming: the hardware provides additional registers not defined in the x86 spec and maps the architectural registers onto them on the fly.

Register renaming is done in all modern day x86 processors. There are two approaches to register renaming. The current Phenom II/Opteron approach actually carries the data from renamed registers along with the instruction as it moves through queues before it gets executed. You effectively create very wide instructions, which is horribly power inefficient (moving data on a chip takes a lot of power) although it gets the job done from a performance standpoint.

The alternative is something that we don’t see used in any current generation microprocessors. Instead of carrying data along with the instructions, you simply carry pointers to the data with those instructions. There’s added management complexity but you don’t have to worry about moving lots of data around, and therefore avoid much of the power penalty.

Bobcat (as well as Bulldozer) uses physical register files to save power. Intel actually did this in the Pentium 4 but hasn’t used PRFs since. AMD argues that with power as a major driver of design, PRFs will be necessary in future architectures.
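
To make the pointer-based approach concrete, here is a minimal sketch of renaming against a physical register file. It is a toy model, not AMD’s design: the class and function names, the 64-entry register file and the free-list policy are all assumptions chosen for illustration. The point is only that the renamer hands out small register indices, and those indices (not the data) are what would travel with an instruction through the queues.

```python
from collections import deque


class RenameTable:
    """Toy renamer backed by a physical register file (PRF).

    Assumption: names, sizes, and policies here are illustrative only and do
    not describe Bobcat's or Bulldozer's actual implementation.
    """

    def __init__(self, arch_regs, num_phys):
        if num_phys <= len(arch_regs):
            raise ValueError("need more physical than architectural registers")
        # The data lives in one place: the physical register file.
        self.prf = [0] * num_phys
        # Each architectural register starts out mapped to its own physical one.
        self.mapping = {name: tag for tag, name in enumerate(arch_regs)}
        # Every remaining physical register is free to hold a new result.
        self.free = deque(range(len(arch_regs), num_phys))

    def rename(self, dst, srcs):
        """Rename one instruction; returns (dst_tag, src_tags, old_dst_tag)."""
        # Sources are looked up before the destination is remapped, so
        # "add eax, ebx" still reads the previous eax.
        src_tags = tuple(self.mapping[s] for s in srcs)
        old_tag = self.mapping[dst]
        if not self.free:
            raise RuntimeError("no free physical register (hardware would stall)")
        new_tag = self.free.popleft()
        self.mapping[dst] = new_tag
        return new_tag, src_tags, old_tag

    def retire(self, old_tag):
        """Recycle the previous mapping once the instruction retires."""
        self.free.append(old_tag)


def tag_bits(num_phys):
    """Bits needed for a pointer into a PRF with num_phys entries."""
    return max(1, (num_phys - 1).bit_length())


X86_REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
table = RenameTable(X86_REGS, num_phys=64)  # 64 entries is a made-up size

first = table.rename("eax", ["eax", "ebx"])   # add eax, ebx
second = table.rename("eax", ["ecx"])         # mov eax, ecx
print(first, second, tag_bits(64))
```

Both instructions write eax, yet each gets its own physical register, so the second doesn’t have to wait on the first. With this hypothetical 64-entry file each operand is identified by a 6-bit tag, where the data-carrying approach would move the full 32-bit (or wider) value through every queue.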

Bobcat’s Performance Expectations

With nearly the same pipeline depth as Atom (15 vs. 16 stages), nearly the same cache latencies, the same instruction issue width and presumably competitive clock speeds (~1.5GHz), Bobcat based microprocessors should inherently outperform Atom thanks to their out of order architecture.

Atom does hold an advantage in that each core is multithreaded, so heavily threaded apps may have an advantage on Intel’s architecture. That being said, by far the biggest issue we have with Atom based netbooks is their single threaded performance that contributes to an overall slow user experience. Bobcat should hopefully address that.

On the threaded side, AMD does have another solution. As I mentioned before, Bobcat won’t be used in a microprocessor by itself - Ontario will feature two of them. AMD said that future designs are expected to integrate 2 or 4 Bobcat cores. There are no plans to produce a single core version, but it’s always possible.

I believe a dual core Ontario based on Bobcat, if clocked high enough, could deliver a good enough balance of single and multithreaded performance to really challenge Atom in the netbook space. The assumption is that graphics performance will be much better than Atom’s, since Ontario integrates an AMD GPU.

AMD’s official line is that Ontario will be able to deliver 90% of the performance of a mainstream notebook in less than half the die area. AMD isn’t just looking to compete with Atom, but to go after even the CULV market with Ontario. Only time will tell if the latter is overzealous.

Power Concerns

AMD calls Bobcat sub-1W capable, which seems to imply that short of a smartphone Bobcat could go anywhere Atom could go. Technically, if AMD wanted to, even getting one into a smartphone wouldn’t be impossible - it would just require a healthy investment in chipsets.

It remains to be seen how good TSMC’s 40nm process will be compared to Atom’s Intel-manufactured 45nm transistors in terms of power consumption. Presumably the out of order aspect of the design will guarantee higher power consumption than Atom, but for the netbook/CULV notebook market the added performance may be worth the added power consumption.


  • mino - Tuesday, August 24, 2010 - link

    From the HW design POV, those pipes are "MMX/3Dnow" class stuff.
    They run SSE3, but they are still MMX-class.

    There is a reason Bulldozer has "FMAC" written there ...
  • Kiijibari - Tuesday, August 24, 2010 - link

    ... it is stupid to name a circuit after a deprecated ISA extension and not after its function.
    If it's doing stuff like 3dnow and mmx then call it Shuffle / permutation pipeline but not MMX ...

    The FMAC is the best example .. why is it written FMAC in that case and not SSE5/AVX/XOP ?
  • KonradK - Thursday, August 26, 2010 - link

    Deprecated does not mean prohibited. Also, there are existing MMX programs, operating systems other than 64bit Windows, and compilers other than MSVSC.

    MMX and x87 is prohibited in 64bit kernel code.

    http://msdn.microsoft.com/en-us/library/ff545910%2...
  • iwod - Tuesday, August 24, 2010 - link

    From the design of Bulldozer's FPU it is clear that AMD wants Multi Threaded FPU to run on OpenCL. While the dual Integer looks interesting now, it is up against SandyBridge, the architecture that is supposed to leap again like Pentium 4 to C2D. And if Bulldozer comes any later, it will be up against the die shrink of SandyBridge, Ivy Bridge. Things don't look so good here.

    It is mainstream / low end that looks very interesting. I am currently using a Pentium M 1.8Ghz Dothan with 2GB DDR Ram. With a Radeon 1600 Graphics. I dont get hardware acceleration from GPU, 720P is just barely playable with some very fast software decoder. It is fast enough to watch some 460p youtube and most of my day web serving.

    Now if Bobcat has similar or higher IPC than Dothan, a Quad Core Bobcat with Radeon 5000 64 SP will still be within reasonable die size on 40nm. It will be cheap when it drops to 32nm or lower. Most of us don't need a SUPER FAST computer. Bobcat with Radeon 5 Series or higher plus a fast SSD is all we need.
  • aegisofrime - Tuesday, August 24, 2010 - link

    I don't recall Sandy Bridge being a revolutionary leap. Everyone has been saying that it's more of evolutionary, the main difference being the addition of AVX.

    I REALLY REALLY REALLY hope that AMD announces later today what socket Bulldozer will be on... I desperately need more video encoding performance. I have an AM2+ motherboard and that bloody 1055T is singing its siren song to me every night. If Bulldozer is on AM3 I can get an AM3 board and the 1055T and do a quick upgrade to Bulldozer.

    Come on AMD. Your customers need more information to make an informed decision!
  • mino - Tuesday, August 24, 2010 - link

    Bulldozer gen1 == primarily servers
    => 16/12-core (MCM) Socket G34 (current platform)
    => 8/6/4-core Socket G32 (current platform)

    Bulldozer Desktop (hopefully before X-mas 2011)
    => 8?/6/4-core Socket AM3R2(or AM3+, whatever they call it)
  • Pirks - Tuesday, August 24, 2010 - link

    Huh? You want more video encoding performance and you think about upgrading CPU? What kind of idiocy is that? Use 480GTX with Badaboom and your video encoding speed won't be matched by CPUs of year 2020 or maybe even 2030 :P
  • aegisofrime - Tuesday, August 24, 2010 - link

    Don't talk if you don't know what you are talking about. No GPU encoder out there is able to match x264 quality or SPEED wise. And the huge flaw in your statement is that Badaboom doesn't even support Fermi GPUs right now.

    Have you done any serious video encoding before, or are you just trolling as usual?
  • ChronoReverse - Tuesday, August 24, 2010 - link

    Indeed. I would try out CUDA encoders every once in a while in hopes that I could at least get the quality of x264 at MINIMUM quality but they can't even match that.

    Since x264 at minimum quality encodes slightly quicker (on my quad core) than a CUDA encoder does (on my GTX260) and still yields better quality, I really appreciate faster CPUs.
  • mapesdhs - Tuesday, August 24, 2010 - link


    Hate to say it but unless GPU acceleration is available, the i7 is a far better choice for video encoding. I still use a 6000+ for most tasks, but numerous article reviews made it quite clear that AMD was not the best choice for video encoding, so I went with an i7 860 4GHz. Pricing was surprisingly good, speed is excellent.

    Ian.
