Bulldozer

AMD already gave us a good amount of detail on Bulldozer earlier this year. We’ll start with a quick refresher.

With Nehalem, Intel moved to a more modular design process that lets it quickly configure different versions of the chip for different markets. With Bulldozer, AMD is doing the same.

The basic building block is the Bulldozer module. AMD calls this a dual-core module because it has two independent integer cores and a single shared floating point core that can service instructions from two independent threads. The two-thread machine is larger than a single core but smaller than two cores built by straight duplication of resources.

All else being the same, it should give you more threaded performance than a single SMT (Hyper Threaded) core but less than two dedicated cores. The savings are in die area. AMD tells us that the second integer core increases the die area of a Bulldozer module by around 12%, despite significantly increasing performance in threaded integer applications.

Processors may implement anywhere from one to four Bulldozer modules and will be referred to as 2 to 8 core CPUs. Each core appears to the OS as a logical processor similar to what you get with Hyper Threading. A CPU with four Bulldozer modules would appear as an 8-threaded processor under Task Manager in Windows.
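The module-to-core arithmetic above can be sketched in a few lines of Python. This is an illustration only: the function name and dictionary keys are hypothetical, and the only inputs are the ratios AMD has described (two integer cores and one shared floating point unit per module, with each core showing up as one logical processor).

```python
# Illustrative sketch only: the function and field names are hypothetical,
# not an AMD API. The ratios come from the module description above.
INTEGER_CORES_PER_MODULE = 2   # two independent integer cores per module
FP_UNITS_PER_MODULE = 1        # one floating point unit shared by both threads


def describe_chip(modules: int) -> dict:
    """Return the counts implied by a given number of Bulldozer modules."""
    if modules < 1:
        raise ValueError("a chip needs at least one module")
    cores = modules * INTEGER_CORES_PER_MODULE
    return {
        "modules": modules,
        "marketed_cores": cores,
        "logical_processors": cores,  # each core is one thread to the OS
        "shared_fp_units": modules * FP_UNITS_PER_MODULE,
    }


if __name__ == "__main__":
    for m in range(1, 5):  # the article describes one to four modules
        print(describe_chip(m))
```

For a four-module part this gives eight marketed cores and eight logical processors backed by four shared floating point units, which matches the Task Manager example.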

AMD argues that the Bulldozer module is the ideal provisioning of hardware. With SMT (Hyper Threading) you force too much into a single core, while with traditional multicore you often waste hardware, since resources that sit idle are duplicated across the chip.

Bulldozer CPUs will be AMD’s first 32nm processors manufactured at GlobalFoundries.

The new details today are about everything inside of the Bulldozer module.


76 Comments


  • Mr Perfect - Wednesday, August 25, 2010 - link

    It sounds like AMD will be selling by the integer core though, not by module. There's this from Page 4:

    "Processors may implement anywhere from one to four Bulldozer modules and will be referred to as 2 to 8 core CPUs."

    So they will be referring to four-module APUs as having eight cores, rather than a quad core with HyperThreading.
  • silverblue - Wednesday, August 25, 2010 - link

    Sorry, I did mean to tackle the part of your thread dealing with different versions of Bulldozer. Valencia is a server version of Zambezi, i.e. 4 modules/8 threads. Interlagos is 8 modules/16 threads.

    From AMD's own figures, each module is 1.8 times the speed of a current K10.5 core at the same clock speed. It is a little unfair to compare "core" to core due to the way they're designed and implemented. Considering each K10.5 core has three ALUs and Bulldozer has two per integer core, 90% of that integer performance is very good - for a quad core CPU in the current sense, Bulldozer would theoretically outpace Phenom II by 80% in integer work by only having 33% more integer resources, assuming the chip is well fed. If the rumours about a quad-channel memory bus are correct, you'd hope it would be.
  • jeremyshaw - Wednesday, August 25, 2010 - link

    I believe Intel also delegated some Atom production to TSMC, unless I am wrong?
  • Penti - Thursday, August 26, 2010 - link

    TSMC also manufactures VIA / Centaur Tech x86 processors.

    Probably a few others too. There's some x86 SoCs for embedded stuff from other vendors.
  • Perisphetic - Wednesday, August 25, 2010 - link

    It's time to kick ass and chew bubble gum... and AMD is all outta gum.
  • NaN42 - Wednesday, August 25, 2010 - link

    First of all: I think AMD has made huge progress with Bulldozer.
    But I'm wondering how exactly the FPU will work. A look at the latencies (especially of FMA instructions) would be interesting too. Another question is whether it is possible to start one independent multiply and one addition at the same time in an FMAC unit. Furthermore, the throughput is of interest. Is it one mul and one add instruction per cycle? Is there any advantage to using 256-bit AVX instructions, besides shorter code?
    I appreciate that AMD will drop most 3DNow! instructions because these are just outdated. Perhaps they could also drop MMX instructions but maintain x87 instructions because these are sometimes useful and needed.

    Besides the FPU (compared to Sandy Bridge), I expect the decoder to be another bottleneck, because the 4-wide decoder has to feed two nearly independent cores and today's 3-wide decoders (except those in Nehalem/Westmere) are sometimes a bottleneck in a single-core design.

    @Ontario: I expect this platform to be much more powerful than the Atom platforms. Perhaps it will even be much more efficient than Atom. A direct comparison between Ontario and VIA Nano 3000 might be interesting especially when VIA releases dual core chips.
  • GourdFreeMan - Thursday, August 26, 2010 - link

    It seems that AMD is ceding the traditional laptop and desktop market to Intel and chasing the server market and Atom/ARM's market with Bulldozer and Bobcat respectively. Lower theoretical peak IPC and greater parallelism suit the high level of data- and transaction-level parallelism in the server market, but existing consumer software, excepting video encoding and a handful of games, still tends to favor single-threaded performance over parallelism. I suppose we should wait for benchmarks in actual applications to see how well architectural improvements have impacted the performance of AMD's new designs, but I imagine some people are already disappointed. Too bad the resources in both integer cores in a module can't work on a single thread, otherwise we could have had a very serious contender on the desktop...
  • silverblue - Thursday, August 26, 2010 - link

    He sure seemed confusing on the comments page of his blog a few weeks back. Understandably evasive considering he's a server tech guy, not consumer tech, plus AMD were yet to reveal these details, but he was comparing 16 Bulldozer cores to 12 Magny Cours cores, which is technically incorrect as they're not comparable UNLESS you're talking about integer cores. At least, that's my interpretation.

    AMD will probably market Zambezi as an 8-core CPU in order to woo the more-is-better crowd, but regardless of how it handles multi-threading, I still view a module as an actual core by virtue of the fact that the "cores" are not independent of the module they belong to. I know I'm wrong and that's fine, but it helps in understanding the technology better - eight cores that exist in pairs and share additional resources might serve to confuse.
  • gruffi - Thursday, August 26, 2010 - link

    A 12-core Magny-Cours has 12 "integer cores" and 12 128-bit FPUs. A 16-core Interlagos has 16 "integer cores" and 16 128-bit FMACs. Why is it technically not comparable? At least you know you are wrong. ;)
  • silverblue - Friday, August 27, 2010 - link

    The implementation is very different to what AMD have done before, that's what I'm trying to get at. Everyone knew that despite Intel and AMD having different types of quad core processor prior to Nehalem, they were still classed the same so I suppose it doesn't matter in the grand scheme of things. There's nothing to stop AMD from releasing a 24-"core" Bulldozer; it shouldn't be any larger than Magny-Cours - perhaps slightly smaller in the end - yet its integer performance would be through the roof.

    However, people are bemoaning the fact that for 33% more "cores", AMD are only getting 50% extra performance - it's worth bearing in mind that AMD does this with 4 fewer, albeit better utilised, ALUs than Magny-Cours (32 compared to 36). Make no mistake, Bulldozer is far more efficient and capable in this scenario, but I can't help wondering how strong Phenom II might have been if it'd had a slightly more elegant design.
