The New Way to Count Cores

Henceforth AMD is referring to the number of integer cores on a processor when it counts cores. So a quad-core Zambezi is made up of four integer cores, or two Bulldozer modules. An eight-core would be four Bulldozer modules.


A hypothetical quad-core Bulldozer. Presumably the L3 cache would be shared by both modules.


A hypothetical eight-core Bulldozer. Presumably the L3 cache would be shared by all four modules.

It's a distinct shift from AMD's (and Intel's) current method of counting cores. A quad-core Phenom II X4 is literally four Phenom II cores on a single die; if you disabled three, you would be left with a single-core Phenom II. The same can't be said about a quad-core Bulldozer. The smallest functional block there is a module, which is two cores according to AMD.
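The counting rule can be written down as a small sketch. The function and constant names here are illustrative, not AMD terminology; the only fact taken from AMD is that each module contributes two integer cores, and that a module is the smallest block you can have.

```python
# Illustrative sketch of AMD's core-counting convention for Bulldozer.
# Names are hypothetical; the 2-cores-per-module ratio is AMD's stated rule.

INTEGER_CORES_PER_MODULE = 2


def cores_from_modules(modules: int) -> int:
    """Return the marketed core count for a given number of modules."""
    if modules < 0:
        raise ValueError("module count cannot be negative")
    return modules * INTEGER_CORES_PER_MODULE


def modules_from_cores(cores: int) -> int:
    """Return the module count behind a marketed core count.

    A module is the smallest functional block, so an odd core count
    cannot be built from whole modules.
    """
    if cores < 0:
        raise ValueError("core count cannot be negative")
    if cores % INTEGER_CORES_PER_MODULE != 0:
        raise ValueError("cores must come in whole modules")
    return cores // INTEGER_CORES_PER_MODULE


if __name__ == "__main__":
    for modules in (2, 4):
        print(f"{modules} modules are counted as {cores_from_modules(modules)} cores")
```

Unlike a Phenom II, where any count from one to four cores describes a real piece of silicon, only even counts make sense here.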

Better than Hyper Threading?

Intel doesn't take, at least today, quite as aggressive a step towards multithreading. Nehalem uses SMT to send two threads to a single core, resulting in as much as a 30% increase in performance.

The added die area to enable HT on Nehalem is very small, far less than 5%.

AMD claims that the performance benefit from the second integer core on a single Bulldozer module is up to 80% on threaded code. That's more than what AMD could get through something like Hyper Threading, but as we've recently found out the impact to die size is not negligible. It really boils down to the sorts of workloads AMD will be running on Bulldozer. If they are indeed mostly integer, then the performance per die area will be quite good and the tradeoff worth it. Part of the integer/FP balance does depend on how quickly the world embraces computing on the GPU however...
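To put the two "up to" figures side by side, here is a rough sketch of the best-case arithmetic. It assumes each design is normalized to its own single-thread speed, so the results say nothing about how a Bulldozer integer core compares to a Nehalem core in absolute terms. The function and constant names are made up for illustration, and both gains are vendor best-case claims, not measurements.

```python
# Best-case throughput scaling from running a second thread per unit.
# Each result is in multiples of that design's OWN single-thread speed;
# the two designs are not directly comparable on this scale.

BULLDOZER_SECOND_CORE_GAIN = 0.80  # AMD's "up to 80%" claim per module
NEHALEM_SMT_GAIN = 0.30            # "as much as 30%" from Hyper Threading


def peak_relative_throughput(units: int, second_thread_gain: float) -> float:
    """Upper bound on throughput for `units` modules or SMT cores.

    Every unit runs one thread at 1.0 and gains `second_thread_gain`
    from the second thread. Perfect scaling across units is assumed.
    """
    if units < 0 or second_thread_gain < 0:
        raise ValueError("inputs must be non-negative")
    return units * (1.0 + second_thread_gain)


if __name__ == "__main__":
    bulldozer = peak_relative_throughput(4, BULLDOZER_SECOND_CORE_GAIN)
    nehalem = peak_relative_throughput(4, NEHALEM_SMT_GAIN)
    print(f"4 Bulldozer modules, 8 threads: {bulldozer:.1f}x one Bulldozer thread")
    print(f"4 Nehalem cores, 8 threads: {nehalem:.1f}x one Nehalem thread")
```

The missing input is die area: the extra integer core buys more scaling than SMT, but it costs more silicon, which is why the workload mix decides whether the tradeoff pays off.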

According to AMD's roadmaps, Zambezi will use either 4 or 8 Bulldozer cores (that's 2 or 4 modules). The quad-core Zambezi should have roughly 10 - 35% better integer performance than a similarly clocked quad-core Phenom II. An eight-core Zambezi will be a threaded monster.

No GPU, for Now

AMD's first APU will be Llano, but it will be based on existing Phenom II cores. Moving to a new manufacturing process while building the first monolithic CPU/GPU is enough to take on at once; there's no need to toss in a brand new microarchitecture at the same time.

AMD did add that eventually, in a matter of 3 - 5 years, most floating point workloads would be moved off of the CPU and onto the GPU. At that point you could even argue against including any sort of FP logic on the "CPU" at all. It's clear that AMD's design direction with Bulldozer is to prepare for that future.

In recent history AMD's architectural decisions have predicted, earlier than Intel, where the microprocessor industry was headed. The K8 embraced 64-bit computing, a move that Intel eventually echoed some years later. Phenom was first to migrate to the 3 level cache hierarchy that we have today, with private L2 caches. Nehalem mimicked and improved on that philosophy. Bulldozer appears to be similarly ahead of its time, ready for a world where heterogeneous CPU/GPU computing is commonplace. I wonder if we'll see a similar architecture from Intel in a few years.

94 Comments

  • GaiaHunter - Monday, November 30, 2009 - link

    You start assuming the BM is 30mm^2.

    But both integer cores are exactly the same size. So if the resources you add are 10mm^2, that means that 2 int cores take 20mm^2 and 8 is 80 mm^2.

    Now each integer core is 10mm^2 and represents 5% total die size - bam 200mm^2 die as you said.

    You add 50% resources to the BM and end up with each int core at 5% of the die size, and you even get your 200mm^2 die.
    Reply
  • GaiaHunter - Tuesday, December 01, 2009 - link

    Now let's go the other way.

    Let's assume JF is right and Moore is also right.

    Grab a 8 core bulldozer CPU, shave 4 cores and save 5% die space.

    CPU die size is 200mm^2.

    5% is 10mm^2, so each int core is 2.5mm^2 and 8 of these will take 20mm^2.

    Now, Deneb is 260mm^2.

    If 8 core Bulldozer is 300mm^2, you end with 3.75mm^2 int cores.

    Small?

    Maybe.

    Around half of the die will be L3$.

    Northbridge circuits stay. Memory controller and the HT PHY also stay. L2$, fetch, decode and FPU are also shared.

    So basically you are just removing a very small portion.

    The question would be if you would need as much of those resources in the first place.
    Reply
  • GaiaHunter - Monday, November 30, 2009 - link

    Moore's affirmation is only related to the INTEGER AREA of the BULLDOZER Module.

    Fruehe's claim is about total die size.

    If Moore's claim is about Integer Area of the Bulldozer module and Fruehe's claim is about die size, then, these claims don't have to be mutually exclusive.

    Additionally, there is no BULLDOZER MODULE. Forget about it. It isn't a unit by itself.

    2 (or more) of those bulldozer modules will share L2$ and L3$ for example, so you can't even define a damn size for a frigging module to start with.


    Reply
  • ThaHeretic - Monday, November 30, 2009 - link

    I really hate the move to call each "module" 2 "cores." AMD is shooting themselves in the foot when it comes to software licensing, in particular, Oracle DB licensing where they charge .5 CPU license for each x86 "multi-core." AMD's decision will double the cost of software running on Bulldozer.

    Bad move AMD, bad move.
    Reply
  • cfaalm - Monday, November 30, 2009 - link

    Really? This issue popped up for OS licenses when x86 dual cores were first introduced. Microsoft decided to go on a "per socket" basis, not counting cores.
    Reply
  • Calin - Monday, November 30, 2009 - link

    Microsoft requests licensing per mainboard socket. Oracle requests a decreased licensing cost per core, if the core is a part of a socket. Meanwhile, some other companies request licensing costs per core.
    Everyone has their own ways.
    Reply
  • cfaalm - Tuesday, December 01, 2009 - link

    Fair enough. So indeed this is a concern when buying this new stuff. I'd rather have AMD not call this module 2 cores, for the simple reason that it is a sort of siamese twincore, not a true dual core. Though that is just the naming game. It looks promising nonetheless. Hopefully AMD/Oracle can enlighten the big system buyers by the time the decisions need to be made.
    Reply
  • ThaHeretic - Tuesday, December 01, 2009 - link

    A lot of licenses and MRCs are based on socket count. Unfortunately, many of the most expensive software packages' licensing arrangements are derived from core count. Since Oracle changed their multi-core licensing near the end of 2006, quad-core x86 processors have counted as two licenses, six-cores as three licenses, etc. A Bulldozer quad-module die will therefore need 4 licenses for OracleDB.

    Does this suck? Yes. Is SQLServer's licensing model better for end-users? Yes. Is SQLServer anywhere near as awesome as Oracle Database? Hell no, not even close.
    Reply
  • DominionSeraph - Monday, November 30, 2009 - link

    "AMD claims that the performance benefit from the second integer core on a single Bulldozer module is up to 80% on threaded code."

    Yet their performance graph has FP gains outrunning integer.
    Reply
