Last week Johan posted his thoughts from a server/HPC standpoint on AMD's roadmap. Much of my analysis was limited to desktop/mobile, so if you're making million-dollar server decisions then his article is better suited to your needs.

He also unveiled a couple of details about AMD's Bulldozer architecture that I thought I'd call out in greater detail. Johan has been working on a CMP vs. SMT article so I'll try not to step on his toes too much here.

It all started about two weeks ago when I got a request from AMD to have a quick conference call about Bulldozer. I get these sorts of calls for one of two reasons. Either:

1) I did something wrong, or
2) Intel did something wrong.

This time it was the former. I hate when it's the former.

It's called a Module

This is the Bulldozer building block, what AMD is calling a Bulldozer Module:

AMD refers to the module as being two tightly coupled cores, which starts the path of confusing terminology. A few of you wondered how AMD was going to be counting cores in the Bulldozer era; I took your question to AMD via email:

Also, just to confirm, when your roadmap refers to 4 bulldozer cores that is four of these cores:

http://images.anandtech.com/reviews/cpu/amd/FAD2009/2/bulldozer.jpg

Or does each one of those cores count as two? I think it's the former but I just wanted to confirm.

AMD responded:

Anand,

Think of each twin Integer core Bulldozer module as a single unit, so correct.

I took that to mean that my assumption was correct and 4 Bulldozer cores meant 4 Bulldozer modules. It turns out there was a miscommunication and I was wrong. Sorry about that :)

Inside the Bulldozer Module

There are two independent integer cores on a single Bulldozer module. Each one has its own L1 instruction and data cache (thanks Johan), as well as scheduling/reordering logic. AMD is also careful to mention that the integer throughput of one of these integer cores is greater than that of the Phenom II's integer units.

Intel's Core architecture uses a unified scheduler fielding all instructions, whether integer or floating point. AMD's architecture uses independent integer and floating point schedulers. While Bulldozer doubles up on the integer schedulers, there's only a single floating point scheduler in the design.

Behind the FP scheduler are two 128-bit wide FMACs. AMD says that each thread dispatched to the core can take one of the 128-bit FMACs or, if one thread is purely integer, the other can use all of the FP execution resources to itself.
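As a rough illustration of the sharing rule AMD describes, here is a minimal Python sketch. It is a static toy model, not a description of how the hardware actually arbitrates; the function name and the idea of reporting "bits of FMAC width per thread" are assumptions made purely for illustration.

```python
# Toy model of the FP sharing rule described above (illustrative only).
FMAC_WIDTH_BITS = 128   # each FMAC is 128 bits wide, per AMD
FMACS_PER_MODULE = 2    # two FMACs sit behind the single FP scheduler


def fp_bits_per_thread(a_uses_fp, b_uses_fp):
    """Return (bits for thread A, bits for thread B) of FMAC width.

    Hypothetical helper: assumes a fixed split rather than the
    cycle-by-cycle scheduling real hardware would perform.
    """
    total = FMAC_WIDTH_BITS * FMACS_PER_MODULE
    if a_uses_fp and b_uses_fp:
        # Both threads issue FP work: one 128-bit FMAC each.
        return (FMAC_WIDTH_BITS, FMAC_WIDTH_BITS)
    if a_uses_fp:
        # Thread B is purely integer: A gets all FP resources.
        return (total, 0)
    if b_uses_fp:
        return (0, total)
    # Neither thread uses FP: the FP units sit idle.
    return (0, 0)


print(fp_bits_per_thread(True, True))
print(fp_bits_per_thread(True, False))
```

The point of the model is simply that a lone FP thread sees twice the FMAC width that it would when its sibling thread is also doing floating point work.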

AMD believes that 80%+ of all normal server workloads are purely integer operations. On top of that, the additional integer core on each Bulldozer module doesn't cost much die area. If you took a four module (eight core) Bulldozer CPU and stripped out the additional integer core from each module you would end up with a die that was 95% of the size of the original CPU. The combination of the two made AMD's design decision simple.

Update: AMD has come back to us with a clarification: the 5% figure was incorrect. AMD is now stating that the additional core in Bulldozer requires approximately an additional 50% die area. That's less than a complete doubling of die size for two cores, but still much more than something like Hyper-Threading.
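To see what the original (since corrected) 5% figure would have implied per core, here is a back-of-the-envelope sketch in Python. The 200 mm^2 die size is a made-up number for illustration only, not an AMD figure, and the function name is hypothetical. The revised 50% figure is not modeled because AMD did not say what baseline it is measured against.

```python
def area_per_extra_core(die_area_mm2, stripped_die_fraction, modules):
    """Area of one extra integer core implied by a 'stripped die' claim.

    stripped_die_fraction is the size of the die with the second integer
    core removed from every module, relative to the full die (0.95 in the
    original claim). One core is removed per module.
    """
    saved_fraction = 1.0 - stripped_die_fraction
    return die_area_mm2 * saved_fraction / modules


# Assumption: a hypothetical 200 mm^2 four-module (eight-core) die.
HYPOTHETICAL_DIE_MM2 = 200.0

per_core = area_per_extra_core(HYPOTHETICAL_DIE_MM2, 0.95, 4)
print(f"{per_core:.2f} mm^2 per extra integer core")
```

Under those assumptions each additional integer core would account for only 1.25% of the die, which is why the original number looked so attractive.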


94 Comments


  • GaiaHunter - Monday, November 30, 2009 - link

    You start assuming the BM is 30mm^2.

    But both integer cores are exactly the same size. So if the resources you add are 10mm^2, that means that 2 int cores take 20mm^2 and 8 is 80 mm^2.

    Now each integer core is 10mm^2 and represents 5% total die size - bam 200mm^2 die as you said.

    You add 50% resources to the BM and end with each int core at 5% of the die size and even get your 200mm^2 die.
  • GaiaHunter - Tuesday, December 01, 2009 - link

    Now let's go the other way.

    Let's assume JF is right and Moore is also right.

    Grab an 8 core Bulldozer CPU, shave 4 cores and save 5% die space.

    CPU die size is 200mm^2.

    5% is 10mm^2, so each int core is 2.5mm^2 and 8 of these will take 20mm^2.

    Now, Deneb is 260mm^2.

    If 8 core Bulldozer is 300mm^2, you end with 3.75mm^2 int cores.

    Small?

    Maybe.

    Around half of the die will be L3$.

    Northbridge circuits stay. Memory controller and the HT PHY also stay. L2$, fetch, decode and FPU are also shared.

    So basically you are just removing a very small portion.

    The question would be if you would need as much of those resources in the first place.
  • GaiaHunter - Monday, November 30, 2009 - link

    Moore's affirmation is only related to the INTEGER AREA of the BULLDOZER Module.

    Fruehe's claim is about total die size.

    If Moore's claim is about Integer Area of the Bulldozer module and Fruehe's claim is about die size, then, these claims don't have to be mutually exclusive.

    Additionally, there is no BULLDOZER MODULE. Forget about it. It isn't a unit by itself.

    2 (or more) of those bulldozer modules will share L2$ and L3$ for example, so you can't even define a damn size for a frigging module to start with.


  • ThaHeretic - Monday, November 30, 2009 - link

    I really hate the move to call each "module" 2 "cores." AMD is shooting themselves in the foot when it comes to software licensing, in particular, Oracle DB licensing where they charge .5 CPU license for each x86 "multi-core." AMD's decision will double the cost of software running on Bulldozer.

    Bad move AMD, bad move.
  • cfaalm - Monday, November 30, 2009 - link

    Really? This issue popped up for OS licenses when x86 dual cores were first introduced. Microsoft decided to go on a "per socket" basis, not counting cores.
  • Calin - Monday, November 30, 2009 - link

    Microsoft requests licensing per mainboard socket. Oracle requests a decreased licensing cost per core, if the core is a part of a socket. Meanwhile, some other companies request licensing costs per core.
    Everyone has their own way.
  • cfaalm - Tuesday, December 01, 2009 - link

    Fair enough. So indeed this is a concern when buying this new stuff. I'd rather AMD not call this module 2 cores, for the simple reason that it is a sort of Siamese twin core, not a true dual core. Though that is just the naming game. It looks promising nonetheless. Hopefully AMD/Oracle can enlighten the big system buyers by the time the decisions need to be made.
  • ThaHeretic - Tuesday, December 01, 2009 - link

    A lot of licenses and MRCs are based on socket count. Unfortunately, many of the most expensive software packages' licensing arrangements are derived from core count. Since Oracle changed their multi-core licensing near the end of 2006, quad-core x86 processors have counted as two licenses, six-cores as three licenses, etc. A Bulldozer quad-module die will therefore need 4 licenses for OracleDB.

    Does this suck? Yes. Is SQLServer's licensing model better for end-users? Yes. Is SQLServer anywhere near as awesome as Oracle Database? Hell no, not even close.
  • DominionSeraph - Monday, November 30, 2009 - link

    "AMD claims that the performance benefit from the second integer core on a single Bulldozer module is up to 80% on threaded code."

    Yet their performance graph has FP gains outrunning integer.
