Last week Johan posted his thoughts from a server/HPC standpoint on AMD's roadmap. Much of my analysis was limited to desktop/mobile, so if you're making million-dollar server decisions then his article is better suited to your needs.

He also unveiled a couple of details about AMD's Bulldozer architecture that I thought I'd call out in greater detail. Johan has been working on a CMP vs. SMT article so I'll try to not step on his toes too much here.

It all started about two weeks ago when I got a request from AMD to have a quick conference call about Bulldozer. I get these sorts of calls for one of two reasons. Either:

1) I did something wrong, or
2) Intel did something wrong.

This time it was the former. I hate when it's the former.

It's called a Module

This is the Bulldozer building block, what AMD is calling a Bulldozer Module:

AMD refers to the module as being two tightly coupled cores, which starts the path of confusing terminology. A few of you wondered how AMD was going to be counting cores in the Bulldozer era; I took your question to AMD via email:

Also, just to confirm, when your roadmap refers to 4 bulldozer cores that is four of these cores:

http://images.anandtech.com/reviews/cpu/amd/FAD2009/2/bulldozer.jpg

Or does each one of those cores count as two? I think it's the former but I just wanted to confirm.

AMD responded:

Anand,

Think of each twin Integer core Bulldozer module as a single unit, so correct.

I took that to mean that my assumption was correct and 4 Bulldozer cores meant 4 Bulldozer modules. It turns out there was a miscommunication and I was wrong. Sorry about that :)

Inside the Bulldozer Module

There are two independent integer cores on a single Bulldozer module. Each one has its own L1 instruction and data cache (thanks Johan), as well as scheduling/reordering logic. AMD is also careful to mention that the integer throughput of one of these integer cores is greater than that of the Phenom II's integer units.

Intel's Core architecture uses a unified scheduler fielding all instructions, whether integer or floating point. AMD's architecture uses independent integer and floating point schedulers. While Bulldozer doubles up on the integer schedulers, there's only a single floating point scheduler in the design.

Behind the FP scheduler are two 128-bit wide FMACs. AMD says that each thread dispatched to the core can take one of the 128-bit FMACs or, if one thread is purely integer, the other can use all of the FP execution resources to itself.
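The sharing rule AMD describes can be written down as a toy model. This is only a sketch of the description above, not of how the hardware actually arbitrates between threads; the function name and the all-or-nothing notion of "uses FP" are assumptions made for illustration.

```python
from typing import Tuple

FMACS_PER_MODULE = 2    # from the article: two FMACs behind the FP scheduler
FMAC_WIDTH_BITS = 128   # from the article: each FMAC is 128 bits wide


def fp_bits_per_thread(a_uses_fp: bool, b_uses_fp: bool) -> Tuple[int, int]:
    """Hypothetical helper: FP width (in bits) available to each of the
    two threads running on one Bulldozer module."""
    total = FMACS_PER_MODULE * FMAC_WIDTH_BITS
    if a_uses_fp and b_uses_fp:
        # Both threads issue FP work: each takes one 128-bit FMAC.
        return (FMAC_WIDTH_BITS, FMAC_WIDTH_BITS)
    if a_uses_fp:
        # Thread B is purely integer: thread A gets all FP resources.
        return (total, 0)
    if b_uses_fp:
        # Thread A is purely integer: thread B gets all FP resources.
        return (0, total)
    # Neither thread uses the FP unit.
    return (0, 0)


if __name__ == "__main__":
    for a, b in [(True, True), (True, False), (False, True), (False, False)]:
        print(a, b, fp_bits_per_thread(a, b))
```

The point of the model is that the FP hardware is sized per module rather than per integer core, so a thread's peak FP width depends on what its neighbor is doing.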

AMD believes that 80%+ of all normal server workloads are purely integer operations. On top of that, the additional integer core on each Bulldozer module doesn't cost much die area. If you took a four-module (eight-core) Bulldozer CPU and stripped out the additional integer core from each module you would end up with a die that was 95% of the size of the original CPU. The combination of the two made AMD's design decision simple.

AMD has come back to us with a clarification: the 5% figure was incorrect. AMD is now stating that the additional core in Bulldozer requires approximately an additional 50% die area. That's less than a complete doubling of die size for two cores, but still much more than something like Hyper-Threading.
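To put the core count and the two die-area figures side by side, here is a small Python sketch of the arithmetic. The function names are hypothetical. It also assumes the corrected 50% figure is measured against a module with only one integer core, which AMD's statement does not spell out, so treat the numbers as rough.

```python
# Back-of-the-envelope arithmetic for the figures quoted above.
# Assumption (not confirmed by AMD): the 50% figure is relative to a
# module that contains only one integer core.

INT_CORES_PER_MODULE = 2  # from the article: two integer cores per module


def cores_from_modules(modules: int) -> int:
    """Core count as AMD counts it: every integer core is one 'core'."""
    return modules * INT_CORES_PER_MODULE


def module_area(extra_core_fraction: float) -> float:
    """Two-core module area, with a one-core module normalized to 1.0."""
    return 1.0 + extra_core_fraction


def area_per_core(extra_core_fraction: float) -> float:
    """Module area divided across its two integer cores."""
    return module_area(extra_core_fraction) / INT_CORES_PER_MODULE


# Original claim (die level): stripping the second integer cores leaves
# 95% of the die, so they add 1/0.95 - 1 (roughly 5%) on top of that.
original_claim = 1 / 0.95 - 1
# Corrected claim: the second core adds roughly 50%.
corrected_claim = 0.50

if __name__ == "__main__":
    print("cores in a 4-module CPU:", cores_from_modules(4))
    print("extra area, original claim:", round(original_claim, 3))
    print("module area, corrected claim:", module_area(corrected_claim))
    print("area per core, corrected claim:", area_per_core(corrected_claim))
```

Under these assumptions a two-core module costs about 1.5x a single core instead of the 2x of two fully separate cores, which is the trade-off the clarification describes.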


  • Calin - Monday, November 30, 2009

    This was started by Sun's Niagara (I think) processor - 32 "int cores" and only one FP unit. A physical integer core ran four threads at a time (one instruction from each, with instant context switching between them), so one would have had eight physical integer cores with only one FP unit.
    The Niagara 2 would have had one FP unit for each of those integer cores, so one FP for each four int cores.
  • defter - Monday, November 30, 2009

    This confusion has nothing to do with int or fp cores. By a common definition, a core is a standalone unit, which can function on its own if necessary.

    For example, each of Niagara's 8 cores had its own fetch and decode unit. In AMD's case, the "module" is the unit with its own fetch and decode units, and the integer ALU clusters only have their own scheduler. Therefore, it's very confusing to call these clusters "cores".

    AMD seems to have learned confusing marketing from ATI and its 320/800/1600 shader GPUs (which actually have 64/160/320 shader units) :)
  • Spoelie - Monday, November 30, 2009

    You're abusing the term pipelines and cores as well.

    Trying to describe an unconventional design with conventional terms might not always be very clear, but there is no right or wrong on this issue, just different viewpoints.
  • kobblestown - Monday, November 30, 2009

    Fair enough. The use of the term "core" is still confusing though. There seems to be only one complete pipeline which branches after instruction decode. It's interesting whether the Icache is a trace cache, i.e. contains decoded instructions, or is a regular cache that needs to be fed back via the fetch/decode bottleneck. In the latter case I see no merit in calling the two integer (for lack of a better term) pipelines separate cores.
  • Penti - Monday, November 30, 2009

    Uhm, the Sun UltraSPARC T1 was an 8-core CPU with eight integer units and one floating point unit, and with CMT (four threads per core) it ran 32 threads. It was still called an 8-core CPU, but the T2 included an FP unit for every integer unit, so I doubt AMD will use this config for long. But who knows. It's impressive if it's up to 30% faster than a PII with twice the number of FPUs.

    Also one integer unit already includes three ALUs. They (the core/scheduler) seem independent enough.
  • blyndy - Monday, November 30, 2009

    I also don't like their mixing of definitions. Let's just skirt that issue for now by calling them by their thread count, i.e. 4-thread Bulldozer, 8-thread Bulldozer, etc.
  • blyndy - Monday, November 30, 2009

    From a marketing standpoint, '8 cores' would be more desirable to the lay buyer than '4 cores', so that's one reason why they might have chosen to do it. Indeed I think Intel will quickly be following AMD's definition so as not to have 'fewer cores'.
  • lyeoh - Wednesday, December 02, 2009

    8 cores is not automatically more desirable than 4 cores to someone who buys stuff like Oracle. They get charged per _core_.
  • heulenwolf - Monday, November 30, 2009

    It's a poor choice of words. Unfortunately, since the term "core" was never fully defined, everyone gets to have their own understanding of what it means. I took core to mean a complete processor that could, if packaged alone, perform all the functions of a processor (both integer and floating point). I think AMD should have taken the high road and called "modules" "dual-integer cores" instead of splitting them into two "cores" with this extraneous, shared FPU tacked on like an afterthought. It makes the term core meaningless. I would guess that making the term "Core" meaningless would be advantageous to AMD, however, since that is the term Intel uses for their entire architecture.
  • Nocturnal - Monday, November 30, 2009

    Very interesting. I hope that AMD will one day regain the edge they once held. Oh how I miss those days. I embrace Intel nonetheless.
