The New Way to Count Cores

Henceforth AMD is referring to the number of integer cores on a processor when it counts cores. So a quad-core Zambezi is made up of four integer cores, or two Bulldozer modules. An eight-core would be four Bulldozer modules.


A hypothetical quad-core Bulldozer. Presumably the L3 cache would be shared by both modules.


A hypothetical eight-core Bulldozer. Presumably the L3 cache would be shared by all four modules.

It's a distinct shift from AMD's (and Intel's) current method of counting cores. A quad-core Phenom II X4 is literally four Phenom II cores on a single die; if you disabled three, you would be left with a single-core Phenom II. The same can't be said about a quad-core Bulldozer. The smallest functional block there is a module, which is two cores according to AMD.
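To make the new counting scheme concrete, here is a minimal sketch of the arithmetic. The function and constant names are our own, chosen purely for illustration; the only thing taken from AMD is the rule that one module counts as two integer cores.

```python
# Illustrative sketch only: these names are hypothetical, not AMD terminology
# (apart from the words "module" and "core").
INT_CORES_PER_MODULE = 2  # AMD's counting rule: one Bulldozer module = two integer cores


def modules_for(marketed_cores: int) -> int:
    """Return how many Bulldozer modules sit behind a marketed core count."""
    if marketed_cores <= 0 or marketed_cores % INT_CORES_PER_MODULE:
        raise ValueError("core counts come in whole modules (multiples of 2)")
    return marketed_cores // INT_CORES_PER_MODULE


def smallest_block_cores() -> int:
    """The smallest functional block is a module, so the floor is two cores."""
    return INT_CORES_PER_MODULE


if __name__ == "__main__":
    for cores in (4, 8):
        print(f"{cores}-core Zambezi -> {modules_for(cores)} modules")
```

Under this rule an odd core count has no meaning, which is exactly why you can't pare a Bulldozer down to a single core the way you could a Phenom II.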

Better than Hyper Threading?

Intel doesn't take, at least today, quite as aggressive a step towards multithreading. Nehalem uses SMT to send two threads to a single core, resulting in as much as a 30% increase in performance.

The added die area to enable HT on Nehalem is very small, far less than 5%.

AMD claims that the performance benefit from the second integer core on a single Bulldozer module is up to 80% on threaded code. That's more than what AMD could get through something like Hyper Threading, but as we've recently found out, the impact on die size is not negligible. It really boils down to the sorts of workloads AMD will be running on Bulldozer. If they are indeed mostly integer, then the performance per die area will be quite good and the tradeoff worth it. Part of the integer/FP balance does depend on how quickly the world embraces computing on the GPU, however...
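The performance-per-area tradeoff can be sketched with a few lines of arithmetic. This is a rough model under stated assumptions, not a measurement: it takes the best-case figures quoted above (30% from SMT, 80% from the second integer core), treats 5% as a pessimistic ceiling for Hyper Threading's die cost, and leaves the second integer core's die cost as an unknown to solve for. The function names are ours.

```python
def perf_per_area(speedup: float, area_overhead: float) -> float:
    """Relative throughput divided by relative area, versus a baseline core (1.0)."""
    return (1.0 + speedup) / (1.0 + area_overhead)


def break_even_area(speedup: float, rival_speedup: float, rival_area: float) -> float:
    """Largest area overhead at which `speedup` still matches the rival's perf per area."""
    return (1.0 + speedup) / perf_per_area(rival_speedup, rival_area) - 1.0


# Best-case vendor figures quoted in the text; real workloads will land lower.
SMT_SPEEDUP = 0.30          # "as much as a 30% increase"
SMT_AREA = 0.05             # "far less than 5%", so 5% is an assumed ceiling
SECOND_CORE_SPEEDUP = 0.80  # AMD's "up to 80%" claim

if __name__ == "__main__":
    print(f"SMT perf/area: {perf_per_area(SMT_SPEEDUP, SMT_AREA):.2f}")
    limit = break_even_area(SECOND_CORE_SPEEDUP, SMT_SPEEDUP, SMT_AREA)
    print(f"Second integer core breaks even at {limit:.0%} extra area")
```

With these inputs the second integer core could cost up to roughly 45% extra area and still match SMT on throughput per unit of die. Since Hyper Threading's real cost is well under 5%, the true break-even point is lower, and both speedups are best-case numbers measured against different baselines, so treat this as a way to frame the question rather than an answer.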

According to AMD's roadmaps, Zambezi will use either 4 or 8 Bulldozer cores (that's 2 or 4 modules). The quad-core Zambezi should have roughly 10 - 35% better integer performance than a similarly clocked quad-core Phenom II. An eight-core Zambezi will be a threaded monster.

No GPU, for Now

The first APU from AMD will be Llano, but it will be based on existing Phenom II cores. The move to a new manufacturing process combined with the first monolithic CPU/GPU is enough to do at once; there's no need to toss in a brand new microarchitecture at the same time.

AMD did add that eventually, in a matter of 3 - 5 years, most floating point workloads would be moved off of the CPU and onto the GPU. At that point you could even argue against including any sort of FP logic on the "CPU" at all. It's clear that AMD's design direction with Bulldozer is to prepare for that future.

In recent history AMD's architectural decisions have predicted, earlier than Intel, where the microprocessor industry was headed. The K8 embraced 64-bit computing, a move that Intel eventually echoed some years later. Phenom was first to migrate to the 3 level cache hierarchy that we have today, with private L2 caches. Nehalem mimicked and improved on that philosophy. Bulldozer appears to be similarly ahead of its time, ready for a world where heterogeneous CPU/GPU computing is commonplace. I wonder if we'll see a similar architecture from Intel in a few years.


94 Comments


  • Calin - Monday, November 30, 2009

    This was started by Sun's Niagara (I think) processor - 32 "int cores" and only one FP unit. A physical integer core ran four threads at a time (one instruction from each, with instant context switching between them), so one would have had eight physical integer cores with only one FP unit.
    The Niagara 2 would have had one FP unit for each of those integer cores, so one FP for each four int cores.
  • defter - Monday, November 30, 2009

    This confusion has nothing to do with int or fp cores. By a common definition, a core is a standalone unit, which can function on its own if necessary.

    For example, each of Niagara's 8 cores had its own fetch and decode unit. In AMD's case, the "module" is the unit with its own fetch and decode units, and integer ALU clusters only have their own scheduler. Therefore, it's very confusing to call these clusters "cores".

    AMD seems to have learned confusing marketing from ATI and its 320/800/1600 shader GPUs (which actually have 64/160/320 shader units) :)
  • Spoelie - Monday, November 30, 2009

    You're abusing the terms pipelines and cores as well.

    Trying to describe an unconventional design with conventional terms might not always be very clear, but there is no right or wrong on this issue, just different viewpoints.
  • kobblestown - Monday, November 30, 2009

    Fair enough. The use of the term "core" is still confusing though. There seems to be only one complete pipeline which branches after instruction decode. It would be interesting to know whether the Icache is a trace cache, i.e. contains decoded instructions, or is a regular cache that needs to be fed back via the fetch/decode bottleneck. In the latter case I see no merit in calling the two integer (for lack of a better term) pipelines separate cores.
  • Penti - Monday, November 30, 2009

    Uhm, the Sun UltraSPARC T1 was an 8-core chip with eight integer units and one floating point unit, and with CMT (four threads per core) it ran 32 threads. It was still called an 8-core CPU, but the T2 included an FP unit for every integer unit, so I doubt AMD will use this config for long. But who knows. It's impressive if it's up to 30% faster than a PII with twice the number of FPUs.

    Also one integer unit already includes three ALUs. They (the core/scheduler) seem independent enough.
  • blyndy - Monday, November 30, 2009

    I also don't like their mixing of definitions. Let's just skirt that issue for now by calling them by their thread number, i.e. 4-thread Bulldozer, 8-thread Bulldozer, etc.
  • blyndy - Monday, November 30, 2009

    From a marketing standpoint, '8 cores' would be more desirable to the layman buyer than '4 cores', so that's one reason why they might have chosen to do it. Indeed I think Intel will quickly be following AMD's definition so as not to have 'fewer cores'.
  • lyeoh - Wednesday, December 2, 2009

    8 cores is not automatically more desirable than 4 cores to someone who buys stuff like Oracle. They get charged per _core_.
  • heulenwolf - Monday, November 30, 2009

    It's a poor choice of words. Unfortunately, since the term "core" was never fully defined, everyone gets to have their own understanding of what it means. I took core to mean a complete processor that could, if packaged alone, perform all the functions of a processor (both integer and floating point). I think AMD should have taken the high road and called "modules" "dual-integer cores" instead of splitting them into two "cores" with this extraneous, shared FPU tacked on like an afterthought. It makes the term core meaningless. I would guess that making the term "Core" meaningless would be advantageous to AMD, however, since that is the term Intel uses for their entire architecture.
  • Nocturnal - Monday, November 30, 2009

    Very interesting. I hope that AMD will one day regain the edge they once held against Intel. Oh how I miss those days. I embrace Intel nonetheless.
