Last week Johan posted his thoughts on AMD's roadmap from a server/HPC standpoint. Much of my analysis was limited to desktop/mobile, so if you're making million-dollar server decisions, his article is better suited to your needs.

He also unveiled a couple of details about AMD's Bulldozer architecture that I thought I'd call out in greater detail. Johan has been working on a CMP vs. SMT article so I'll try to not step on his toes too much here.

It all started about two weeks ago when I got a request from AMD to have a quick conference call about Bulldozer. I get these sorts of calls for one of two reasons. Either:

1) I did something wrong, or
2) Intel did something wrong.

This time it was the former. I hate when it's the former.

It's called a Module

This is the Bulldozer building block, what AMD is calling a Bulldozer Module:

AMD refers to the module as being two tightly coupled cores, which is where the confusing terminology starts. A few of you wondered how AMD was going to count cores in the Bulldozer era, so I put your question to AMD via email:

Also, just to confirm, when your roadmap refers to 4 bulldozer cores that is four of these cores:

http://images.anandtech.com/reviews/cpu/amd/FAD2009/2/bulldozer.jpg

Or does each one of those cores count as two? I think it's the former but I just wanted to confirm.

AMD responded:

Anand,

Think of each twin Integer core Bulldozer module as a single unit, so correct.

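I took that to mean that my assumption was correct and 4 Bulldozer cores meant 4 Bulldozer modules. It turns out there was a miscommunication and I was wrong: AMD counts each integer core as a core, so a module is two cores and 4 Bulldozer cores means 2 modules. Sorry about that :)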
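To make the corrected counting concrete, here is a minimal Python sketch. The function names are hypothetical and exist only for illustration; the two-cores-per-module ratio comes from AMD's description of the module, and one thread per core is an assumption based on how AMD describes threads being dispatched.

```python
# Assumptions (not an AMD tool or API): each Bulldozer module holds two
# integer cores, and each core runs a single thread.
CORES_PER_MODULE = 2
THREADS_PER_CORE = 1


def modules_for_cores(cores: int) -> int:
    """Return how many modules a part advertised with `cores` cores contains."""
    if cores <= 0 or cores % CORES_PER_MODULE != 0:
        raise ValueError("core count must be a positive multiple of 2")
    return cores // CORES_PER_MODULE


def describe(cores: int) -> str:
    """Spell out the module/core/thread breakdown for an advertised core count."""
    modules = modules_for_cores(cores)
    threads = cores * THREADS_PER_CORE
    return f"{modules} modules / {cores} cores / {threads} threads"


print(describe(4))  # 2 modules / 4 cores / 4 threads
print(describe(8))  # 4 modules / 8 cores / 8 threads
```

In other words, a roadmap entry that says "4 cores" describes two of the modules pictured above, not four.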

Inside the Bulldozer Module

There are two independent integer cores on a single Bulldozer module. Each one has its own L1 instruction and data cache (thanks Johan), as well as scheduling/reordering logic. AMD is also careful to mention that the integer throughput of one of these integer cores is greater than that of the Phenom II's integer units.

Intel's Core architecture uses a unified scheduler fielding all instructions, whether integer or floating point. AMD's architecture uses independent integer and floating point schedulers. While Bulldozer doubles up on the integer schedulers, there's only a single floating point scheduler in the design.

Behind the FP scheduler are two 128-bit wide FMACs. AMD says that each thread dispatched to the core can take one of the 128-bit FMACs or, if one thread is purely integer, the other can have all of the FP execution resources to itself.
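The sharing rule AMD describes can be sketched as a simple static model. This is only an illustration under stated assumptions: the function name is hypothetical, and the real FP scheduler presumably arbitrates dynamically per instruction rather than splitting the hardware once per thread.

```python
from typing import Tuple

# Figures from AMD's description: two FMACs per module, each 128 bits wide.
FMAC_WIDTH_BITS = 128
FMACS_PER_MODULE = 2


def fp_bits_per_thread(a_uses_fp: bool, b_uses_fp: bool) -> Tuple[int, int]:
    """Return the FP execution width (in bits) available to each of a module's two threads.

    Hypothetical static model: threads that issue FP work split the two
    FMACs evenly, and a purely integer thread gets none.
    """
    total_bits = FMAC_WIDTH_BITS * FMACS_PER_MODULE
    fp_threads = int(a_uses_fp) + int(b_uses_fp)
    if fp_threads == 0:
        return (0, 0)
    share = total_bits // fp_threads
    return (share if a_uses_fp else 0, share if b_uses_fp else 0)


print(fp_bits_per_thread(True, True))   # (128, 128): one FMAC each
print(fp_bits_per_thread(True, False))  # (256, 0): the FP thread gets both FMACs
```

The second case is the one AMD is betting on: if most server threads are purely integer, a lone FP thread rarely has to share.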

AMD believes that 80%+ of all normal server workloads are purely integer operations. On top of that, the additional integer core on each Bulldozer module doesn't cost much die area. If you took a four module (eight core) Bulldozer CPU and stripped out the additional integer core from each module you would end up with a die that was 95% of the size of the original CPU. The combination of the two made AMD's design decision simple.

AMD has come back to us with a clarification: the 5% figure was incorrect. AMD is now stating that the additional core in Bulldozer requires approximately an additional 50% die area. That's less than a complete doubling of die size for two cores, but still much more than something like Hyper-Threading.

94 Comments

  • Lifted - Monday, November 30, 2009 - link

    This. Maybe this takes into account the moving of FP off the die to an add-on module or GPU.

    I never could have imagined we'd be going back to the add on FP modules.
  • GodisanAtheist - Monday, November 30, 2009 - link

    I believe that's because Interlagos gets the integrated GPU core, which in terms of theoretical performance will send FP performance through the roof.
  • DominionSeraph - Monday, November 30, 2009 - link

    But then the performance is underwhelming. Current-gen GPUs would be off the chart.
  • medi01 - Monday, November 30, 2009 - link

    Could someone decrypt the following text for me please:

    [quote]It all started about two weeks ago when I got a request from AMD to have a quick conference call about Bulldozer. I get these sorts of calls for one of two reasons. Either:

    1) I did something wrong, or
    2) Intel did something wrong.

    This time it was the former. I hate when it's the former.[/quote]
  • GaiaHunter - Monday, November 30, 2009 - link

    It means Anand gets these calls ("short conference calls") requested by AMD when:

    1) Anand makes a mistake

    or

    2) Intel is being naughty (like telling OEM to not sell AMD).

    I asked Anand in the Bulldozer article if a quad-core zambezi meant 4cores/8 threads or 4cores/4 threads.

    He said (and I was convinced at that time it was the correct answer too) that a zambezi quad-core meant 4 cores/8 threads and an octo-core would be 8cores/16 threads. Or if you prefer 4Modules/8cores/8threads and 8Modules/16cores/16threads.

    But it seems it is 2Modules/4cores/4threads and 4modules/8cores/8threads.

    Sincerely, I can't really blame Anand - this shit is confusing.
  • Kiijibari - Monday, November 30, 2009 - link

    Yes .. for desktops. However for Servers there will be again an MCM with two dies, i.e. 8 modules, 16 cores, 16 threads, called Interlagos.
  • piesquared - Monday, November 30, 2009 - link

    Which makes a person wonder, if AMD has a 16 core Interlagos in the server space, how nice and cool and efficient will an 8 core Zambezi be?
  • GaiaHunter - Monday, November 30, 2009 - link

    U can see it in there

    http://www.anandtech.com/cpuchipsets/showdoc.aspx?...

    And also we can see that this new designation was causing quite a confusion in the forums.

    http://forums.anandtech.com/showthread.php?t=20230...
  • pcfxer - Monday, November 30, 2009 - link

    The first with L3 cache was Intel Pentium 4 EXTREME EDITION.

    Phenom just had the most logical use of L3 in the sense that it served as a "community" buffer.
  • JimmiG - Monday, November 30, 2009 - link

    The first with L3 cache was actually the AMD K6-III released in 1999. Of course, the L3 was actually on the mobo, while the L2 was on-die. But it did use a tri-level cache, making it outperform the Pentium III Katmai on integer workloads.
