A Real Redesign

When we first met Phenom, we were disappointed that it didn't introduce the major architectural changes AMD needed to keep up with Intel. The front end and execution hardware remained largely unchanged from the K8, and as a result Intel pulled significantly ahead in performance per clock over the past few years. With Bulldozer, we finally got the redesign that we've been asking for.

If we look at Westmere, Intel has a 4-issue architecture that's shared between two threads. At the front end, a single Bulldozer module is essentially the same: its fetch logic can grab instructions from two threads and send them to the decoder. Note that either thread can occupy the full width of the front end if necessary.
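To make the sharing concrete, here is a toy Python model of one module's front end. The 4-wide limit comes from the text; the arbitration policy (alternate between the two threads when both have work, hand the full width to one thread when the other has nothing queued) is our own assumption for illustration, not AMD's documented scheme, and `share_front_end` is a hypothetical name.

```python
from collections import deque

WIDTH = 4  # instructions per cycle through the shared 4-wide front end


def share_front_end(queues, cycles):
    """Toy arbitration for two threads sharing one front end.

    Hypothetical policy: each cycle one thread gets all WIDTH slots.
    Threads alternate while both have work; if the thread whose turn it
    is has nothing queued, the other thread takes the full width.
    Returns a list of (thread_id or None, instructions) per cycle.
    """
    turn = 0
    schedule = []
    for _ in range(cycles):
        # Prefer the thread whose turn it is, fall back to the other one.
        chosen = next((t for t in (turn, 1 - turn) if queues[t]), None)
        if chosen is None:
            schedule.append((None, []))  # both threads idle this cycle
            continue
        count = min(WIDTH, len(queues[chosen]))
        schedule.append((chosen, [queues[chosen].popleft() for _ in range(count)]))
        turn = 1 - chosen  # offer the next cycle to the other thread
    return schedule


# Thread 1 runs out of work after its first turn, so thread 0 then gets
# the full width every cycle.
threads = [deque(f"a{i}" for i in range(12)), deque(["b0", "b1"])]
for cycle, (tid, insns) in enumerate(share_front_end(threads, 4)):
    print(cycle, tid, insns)
```

In this model the "either thread can occupy the full width" behavior falls out of the fallback in the `next(...)` call: an idle thread never wastes slots.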

The instruction fetcher pulls from a 64KB 2-way instruction cache, unchanged from the Phenom II.

The decoder is now 4-wide, an increase from the 3-wide front end that AMD has used from the K7 all the way up to Phenom II. AMD can now also fuse x86 branch instructions, similar to Intel's macro-op fusion, to further increase the effective width of the machine. At a high level, AMD's front end has finally caught up to Intel, but here's where AMD moves into the passing lane.
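A simplified sketch of how fusion frees up decode slots. It assumes the classic fusible pair, a compare or test immediately followed by a conditional jump; the article only says branch instructions can be fused, so the exact pairs Bulldozer handles are an assumption here. Instructions are plain strings, and `fuse` and `mnemonic` are hypothetical helpers.

```python
# Assumed fusible pair: cmp/test followed directly by a conditional jump,
# the classic macro-op fusion case. Bulldozer's exact rules are not given here.
FUSE_HEADS = {"cmp", "test"}
COND_JUMPS = {"je", "jne", "jl", "jle", "jg", "jge", "jb", "jbe", "ja", "jae"}


def mnemonic(insn):
    return insn.split()[0]


def fuse(insns):
    """Return the op stream after fusing eligible compare+branch pairs."""
    ops, i = [], 0
    while i < len(insns):
        pair_ok = (
            i + 1 < len(insns)
            and mnemonic(insns[i]) in FUSE_HEADS
            and mnemonic(insns[i + 1]) in COND_JUMPS
        )
        if pair_ok:
            ops.append(f"{insns[i]} + {insns[i + 1]}")  # the pair travels as one op
            i += 2
        else:
            ops.append(insns[i])
            i += 1
    return ops


loop_body = ["add eax, [rsi]", "add rsi, 4", "cmp rsi, rdi", "jne loop"]
print(len(loop_body), "instructions ->", len(fuse(loop_body)), "ops")
```

In this model the four-instruction loop body becomes three ops, so a 4-wide decoder would have a slot to spare in the same cycle.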

The 4-wide decode engine feeds three independent schedulers: two for the integer cores and one for the shared floating point hardware.
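The routing described above can be sketched as follows. The `Op` and `Module` classes are hypothetical, and real dispatch involves far more than a coarse int/FP split, but the shape is the point: integer work stays on its own thread's integer core, while FP work from both threads lands in one shared queue.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    thread: int  # 0 or 1: which of the module's two threads issued it
    kind: str    # "int" or "fp" (a deliberately coarse split)
    name: str


@dataclass
class Module:
    # One private scheduler per integer core, plus one shared FP scheduler.
    int_sched: list = field(default_factory=lambda: [[], []])
    fp_sched: list = field(default_factory=list)

    def dispatch(self, op: Op) -> None:
        if op.kind == "fp":
            self.fp_sched.append(op)  # both threads feed this one queue
        else:
            self.int_sched[op.thread].append(op)  # stays on the thread's own core


m = Module()
for op in [Op(0, "int", "add"), Op(1, "int", "sub"),
           Op(0, "fp", "mulps"), Op(1, "fp", "addps")]:
    m.dispatch(op)

print([o.name for o in m.int_sched[0]], [o.name for o in m.int_sched[1]])
print([(o.thread, o.name) for o in m.fp_sched])
```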


Bulldozer, 2 threads per module

Each integer scheduler is now unified. In the Phenom II and previous architectures AMD had individual schedulers for math and address operations, but with Bulldozer it’s all treated as one.


Phenom II, 1 thread per core

Each scheduler has four ports that feed a pair of ALUs and a pair of AGUs. This is one ALU and one AGU fewer than Phenom II, which had three ALUs and three AGUs and could do any mix of three. AMD insists that the third address generation unit wasn't necessary in Phenom II and was only kept around for symmetry with the ALUs and to avoid redesigning that part of the chip; the integer execution core is something AMD has kept around since the K8. The third ALU does have some performance benefit, but AMD cut it to reduce die size, arguing that the 4-wide front end, fusion and other enhancements more than make up for the loss. In other words, while there are fewer single-thread integer execution resources in Bulldozer than in Phenom II, single-threaded integer performance should still be higher.
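The per-thread arithmetic behind AMD's claim, using only the unit counts given above:

```python
# Integer execution units per thread, as described above.
phenom_ii_core = {"ALU": 3, "AGU": 3}      # one thread per Phenom II core
bulldozer_int_core = {"ALU": 2, "AGU": 2}  # one thread per Bulldozer integer core
INT_CORES_PER_MODULE = 2

per_module = {unit: n * INT_CORES_PER_MODULE for unit, n in bulldozer_int_core.items()}
per_thread_cut = {
    unit: 1 - bulldozer_int_core[unit] / phenom_ii_core[unit] for unit in phenom_ii_core
}

for unit in ("ALU", "AGU"):
    print(f"{unit}: Phenom II thread {phenom_ii_core[unit]}, "
          f"Bulldozer thread {bulldozer_int_core[unit]} "
          f"({per_thread_cut[unit]:.0%} fewer), "
          f"Bulldozer module {per_module[unit]}")
```

A single thread drops from three ALUs to two, a one-third cut that the wider decode and fusion have to make up, while a full module running two threads has four of each.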

Each integer core has its own 16KB L1 data cache. The L1 caches are segmented by thread, so the shared FP core chooses which L1 cache to pull from depending on which thread it's working on.
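A toy model of that arrangement, assuming nothing beyond what's stated: each integer core owns an L1, and the shared FP hardware picks the L1 by thread. The class names are hypothetical, and each "cache" is just a dictionary with no lines, associativity, capacity limit or latency.

```python
class L1DataCache:
    """Stand-in for one integer core's private 16KB L1D.

    Only ownership is modeled: a dict of address -> value, with no
    lines, associativity, capacity limit or latency.
    """

    def __init__(self, core_id):
        self.core_id = core_id
        self.data = {}

    def store(self, addr, value):
        self.data[addr] = value

    def load(self, addr):
        return self.data.get(addr)  # None stands in for a miss to L2


class SharedFP:
    """The module's FP hardware, which has no L1D of its own in this model."""

    def __init__(self, l1s):
        self.l1s = l1s  # indexed by thread id, which equals integer core id

    def load_operand(self, thread, addr):
        # Pick the L1 belonging to the thread whose instruction is executing.
        return self.l1s[thread].load(addr)


l1s = [L1DataCache(0), L1DataCache(1)]
l1s[0].store(0x1000, 1.5)
l1s[1].store(0x2000, 2.5)
fpu = SharedFP(l1s)
print(fpu.load_operand(0, 0x1000), fpu.load_operand(1, 0x2000))
print(fpu.load_operand(1, 0x1000))  # thread 1's L1 doesn't hold thread 0's data
```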

I asked AMD whether the small L1 data cache would be a problem for performance, and AMD responded that in modern out-of-order machines it's quite easy to hide the latency to L2, so this isn't as big of an issue as you'd think. Given how aggressive AMD has been in the past about ramping up L1 cache sizes, this is a definite change of pace, and further evidence of how significant a departure Bulldozer is from the norm at AMD.

While there are two integer schedulers in a single Bulldozer module (one for each thread), there's only one FP scheduler. Some hardware is duplicated at the FP scheduler so that two threads can share the execution resources behind it. While each integer core behaves like an independent core, the FP resources work as they would in an SMT (Hyper-Threading) system.
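A minimal sketch of SMT-style sharing, assuming each queued FP op carries a thread tag and the scheduler issues oldest-first to whichever port of the right type is free (two FMAC pipes and two packed-integer pipes). Dependencies and latencies are ignored, the issue policy is our assumption rather than AMD's, and the op names are placeholders.

```python
from collections import deque

# Port layout: two 128-bit FMAC pipes, two packed-integer pipes.
PORTS = {"fmac": 2, "pint": 2}


def issue_cycle(queue):
    """Issue one cycle's worth of ops from a single shared FP queue.

    Each entry is (thread_id, kind, name). Oldest ops go first, regardless
    of thread, as long as a port of the right kind is free. Hypothetical
    policy; dependencies and latencies are ignored.
    Returns the issued (thread_id, name) pairs and leaves the rest queued.
    """
    free = dict(PORTS)
    issued, waiting = [], []
    while queue:
        thread, kind, name = queue.popleft()
        if free[kind]:
            free[kind] -= 1
            issued.append((thread, name))  # the thread tag stays with the op
        else:
            waiting.append((thread, kind, name))
    queue.extend(waiting)  # unissued ops keep their original order
    return issued


fp_queue = deque([(0, "fmac", "fma_a"), (1, "fmac", "fma_b"),
                  (0, "fmac", "fma_c"), (1, "pint", "padd_d")])
print(issue_cycle(fp_queue))  # ops from both threads issue in the same cycle
print(list(fp_queue))         # the third FMAC op waits for a free pipe
```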

The FP scheduler has four ports to its FPUs. There are two 128-bit FMAC pipes and two 128-bit packed integer pipes. Like Sandy Bridge, AMD’s Bulldozer will support SSE all the way up to 4.2 as well as Intel’s new AVX instructions. The 256-bit AVX ops will be handled by the two 128-bit FMAC units in each Bulldozer module.
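The idea of running a 256-bit op on 128-bit hardware can be sketched as splitting an 8-lane single-precision FMA into two 4-lane halves. This is a functional model only: lanes are Python floats, so it reproduces neither single-precision rounding nor FMA's single rounding step, and the assumption that the two halves occupy both FMAC pipes is ours. Function names are hypothetical.

```python
def fmac_128(a, b, c):
    """One 128-bit FMAC operation: four single-precision lanes of a*b + c."""
    if not len(a) == len(b) == len(c) == 4:
        raise ValueError("a 128-bit op holds exactly four 32-bit lanes")
    return [x * y + z for x, y, z in zip(a, b, c)]


def fma_256(a, b, c):
    """An 8-lane (256-bit) FMA executed as two 128-bit halves.

    Assumption: the low and high halves go to the module's two FMAC pipes.
    Lanes are Python floats, so single-precision rounding is not modeled.
    """
    low = fmac_128(a[:4], b[:4], c[:4])
    high = fmac_128(a[4:], b[4:], c[4:])
    return low + high


a = [1.0] * 8
b = [float(i) for i in range(8)]
c = [0.5] * 8
print(fma_256(a, b, c))
```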

Each Bulldozer module has its own private L2 cache shared by both integer cores and the FP execution hardware.


76 Comments


  • stalker27 - Wednesday, August 25, 2010

    Those "some of you" don't make them R&D money... which, silly boy that you are... got you those fast chips in the first place.

    Oh boy, how important some people think they are.
  • iwod - Tuesday, August 24, 2010

    I am not a Professional Engineer, but I do have a degree in Electrical Engineering. I fully understand how models and simulations, MATLAB, video encoding, or CG rendering require as much performance as they can get.

    But to the world at large, I am sorry, you are right: you are precisely the people not counted in "the rest of the world". You are inside a professional niche whether you like it or not. That even includes hardcore PC gamers, a group that is shrinking day by day due to competition from consoles. No, this is not to say PC gaming is going to die. It just means it is getting smaller, and this trend is not going to change until something dramatic happens.

    The rest of the world, counted in BILLIONS, is moving, or looking to move, to iPads, netbooks, cheap notebooks, or anything that just gets things done as cheaply as possible. That is why netbooks took off, and why Atom-based all-in-one PCs took off. No one in the marketing department knew such a gigantic market existed.

    Lastly, I want to emphasize that by the "world" I really mean the World, not an American view of the world, which would literally be just America by itself. High-end PCs are a hard sell in China, India, Brazil, and even countries like Japan.
  • jabber - Wednesday, August 25, 2010

    Yep, all you computer engineer folks and render farms etc. account for a very small minority in the "world of computer users". You are not mainstream users.

    In general terms the world isn't really that interested anymore in CPU performance improvements.

    Most folks out there just want smaller and lower power so they can carry a computer around with them. They don't give a damn what the CPU architecture is.

    The leviathan CPU approach by AMD and Intel could go the way of the dinosaur for mainstream computing. ARM could well be the new mainstream CPU leader in just five years.

    Just think outside your own little box.
  • B3an - Wednesday, August 25, 2010

    Ridiculous, small-minded comment.

    Render farms and the like may not be mainstream, but gaming is, and then there are things like video encoding, servers, workstations, databases, all very popular mainstream stuff that millions of people use and that the internet also relies on.

    A very large percentage of computer users will always want faster CPUs.

    If Intel or AMD did what you think most people want, then nothing would progress either. No 3D interfaces, no artificial intelligence, no anything, as the power needed for it would never be there.
  • BitJunkie - Wednesday, August 25, 2010

    It all comes down to usage models right? The point is that AMD and Intel are trying to capture as many usage models as possible within a given architecture.

    This is why modular design is kind of appealing - you can bolt stuff together to hit the desired spot in the thermal-computational envelope.

    The thing that "engineers" fall foul of is that there is a divergence going on. On the one hand general computing is dominating, with a desire to drive down power usage. On the other hand there is the same appetite for improved computational performance as we get smarter and more ambitious in the way we tackle engineering problems.

    The issue is that both camps are looking to the same architecture for answers.

    The reason why that doesn't work for me is that some computations just don't benefit from parallelism - more cores don't mean more productivity. Therefore I want to see the few cores that I do use become super efficient at flipping bits on a floating point calculation.

    Right now there's no clear answer to that problem - but it will probably come with Fusion and the point at which the GPU takes on the role that math co-processors played before being swallowed into the CPU. For this to work we need Microsoft to handle the GPU compute stuff natively within Windows so that existing code can execute without worrying about which part of the hardware is lifting the load.

    Therefore my sincere hope is that GPUs will become the new math co-processors and Windows 8 will make that happen.

    Oh, and there's no need for any tribalism here wrt usage models. It's all good.
  • jabber - Wednesday, August 25, 2010

    No, it's not small-minded. It's looking at the big picture.

    The big picture is that for most users their CPU power needs were reached and surpassed some time ago.

    CPUs are not the bottleneck in modern PCs. Storage systems are.

    We need better, cheaper and faster storage.

    I've been pushing out 1.6GHz dual core Atoms to 95% of my small business customers and a good chunk of domestics for the past year.

    I haven't had one moan or complaint that the PCs were not fast enough. Very few customers are hardcore gamers. Gamers are still a small subsection of the computing world.

    I'm not asking AMD/Intel to stop research in new and faster CPU designs. Keep going, boys, it's all good.

    I'm just saying that the majority of mainstream computing lies along a very different path going forward from those that require power at all costs.

    Not all of us need octa-cores at 4GHz+. A lot of us can get by with a 2GHz dual core and a half decent 7200rpm HDD.

    Most of the PCs I see are still single core. Folks are managing just fine right now.

    Plenty of folks are now managing with just 1GHz or less on a mobile device. That's why Intel is taking ARM more seriously: they see the future mainstream as more low-power and mobile-based than leviathan mega-core, mega-wattage beasts.

    Things will change rapidly over the next three or four years.
  • Aries1470 - Friday, August 27, 2010

    Quote:

    "We need better, cheaper and faster storage.

    I've been pushing out 1.6GHz dual core Atoms to 95% of my small business customers and a good chunk of domestics for the past year.

    I haven't had one moan or complaint that the PCs were not fast enough. Very few customers are hardcore gamers. Gamers are still a small subsection of the computing world."

    Well, I for one totally agree. Last year I purchased an Atom 330 dual core, and it does more than enough.
    I already had a much more powerful system, which I use about... once or twice a month, if that! It is a quad core with 4 gigs of RAM and a 2GB 9600 GPU.

    I have moved away from gaming and encoding and all that stuff.

    The motherboard I have is:
    ATOM-GM1-330, which I imported to Australia from the U.S.A., since the distributor here does not carry this model.
    I have paired it with 4GB of memory, but only 3GB is usable, since I am running 32-bit systems (XP & Win 7)
    A Blu-ray writer....
    and a LP 5450 used with an adapter from 16x -> 1x

    It plays Blu-ray great while browsing at the same time!
    I browse the internet at the same time as my wife and kid watch a movie on the 50" plasma, with NO stutter.

    Needless to say, I got it as a secondary PC... and it has become my main PC. It is left on basically 24/7, with NO fan on the CPU! Low power consumption too.

    It performs great for the functions I want, and it can even play Civ IV... but not much else. If I want to do real gaming, I use my other PC.

    So for what it is, it works great for my needs! No useless power consumption, and it does its BOINC work too, albeit slowly, but still better than my older P2-550... which was still alive a few years ago.

    Most people I know don't use their PC for gaming anymore, mostly for Facebook/Twitter and video calling; they have their Wiis & Xboxes, and one has a PS3...

    OK, end of rant. To conclude, I concur: your average Joe has his gaming machine, and his PC is an HTPC, or at any rate not a gaming powerhouse.
  • gruffi - Thursday, August 26, 2010

    Sandy Bridge, the architecture that is supposed to leap again like Pentium 4 to C2D? Thanks for the joke of the day.

    Sandy Bridge looks more like a minor update of Nehalem/Westmere. More load/store bandwidth, improved cache, AVX and maybe a few other little tweaks. Nothing special. I think it will be less of an improvement than Core 2 (Core successor) and Nehalem (Core 2 successor) were.

    In many ways Bulldozer looks superior to what Intel has to offer in 2011.
  • Lonbjerg - Tuesday, August 24, 2010

    I don't care for "Bobcat"... mediocre performance in a cramped form factor (netbooks) holds as much interest for me as being dragged naked across a field filled with broken glass.

    The "Bulldozer" looks fine on paper...problem is that so did Phenom.
    I look forward to the real reviews, and not PR slices :)