The New Way to Count Cores

Henceforth, when AMD counts cores it is referring to the number of integer cores on a processor. So a quad-core Zambezi is made up of four integer cores, or two Bulldozer modules, and an eight-core Zambezi would be four Bulldozer modules.
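
The counting rule reduces to simple arithmetic: the advertised core count is the number of integer cores, and each Bulldozer module contains two of them. A minimal Python sketch of that rule follows. The function and constant names are hypothetical, chosen only for illustration.

```python
# Per AMD's new counting rule, each Bulldozer module holds two integer cores.
INTEGER_CORES_PER_MODULE = 2


def advertised_cores(modules: int) -> int:
    """Core count as AMD now advertises it: integer cores, not modules."""
    return modules * INTEGER_CORES_PER_MODULE


def modules_for(cores: int) -> int:
    """Number of Bulldozer modules behind an advertised core count."""
    if cores % INTEGER_CORES_PER_MODULE:
        # Under this rule every module contributes exactly two cores,
        # so this sketch rejects odd core counts.
        raise ValueError("core count must be a multiple of two")
    return cores // INTEGER_CORES_PER_MODULE


for cores in (4, 8):
    print(f"{cores}-core Zambezi -> {modules_for(cores)} modules")
```

Run as-is, it reports two modules for the quad-core part and four for the eight-core part, matching the roadmap description.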


A hypothetical quad-core Bulldozer. Presumably the L3 cache would be shared by both modules.


A hypothetical eight-core Bulldozer. Presumably the L3 cache would be shared by all four modules.

It's a distinct shift from AMD's (and Intel's) current method of counting cores. A quad-core Phenom II X4 is literally four Phenom II cores on a single die; if you disabled three, you would be left with a single-core Phenom II. The same can't be said about a quad-core Bulldozer, where the smallest functional block is a module, which AMD counts as two cores.

Better than Hyper Threading?

Intel, at least today, doesn't take quite as aggressive a step towards multithreading. Nehalem uses SMT (Hyper Threading) to send two threads to a single core, resulting in as much as a 30% increase in performance.

The added die area to enable HT on Nehalem is very small, far less than 5%.

AMD claims that the second integer core in a Bulldozer module delivers up to 80% more performance on threaded code. That's more than AMD could get through something like Hyper Threading, but as we've recently found out, the impact on die size is not negligible. It really boils down to the sorts of workloads AMD expects Bulldozer to run. If they are indeed mostly integer, then performance per die area will be quite good and the tradeoff worthwhile. Part of the integer/FP balance, however, depends on how quickly the world embraces computing on the GPU.
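
One way to put the two approaches on the same footing is throughput gained per unit of die area. The sketch below takes the best-case figures quoted here at face value (up to 30% from Hyper Threading at under 5% extra area; up to 80% from Bulldozer's second integer core). Because AMD hasn't disclosed the module's area cost, it solves for the area overhead at which Bulldozer's second core would merely match Hyper Threading. The simple speedup-over-area model and the function names are illustrative assumptions, not AMD's or Intel's methodology.

```python
def perf_per_area(speedup: float, area_overhead: float) -> float:
    """Relative throughput divided by relative die area (1.0 = baseline core)."""
    return (1.0 + speedup) / (1.0 + area_overhead)


def break_even_overhead(speedup: float, target_perf_per_area: float) -> float:
    """Area overhead at which `speedup` yields exactly `target_perf_per_area`."""
    return (1.0 + speedup) / target_perf_per_area - 1.0


# Hyper Threading on Nehalem: best case +30% throughput for < 5% extra area.
# Using the 5% upper bound on area gives a floor for HT's efficiency.
ht = perf_per_area(speedup=0.30, area_overhead=0.05)

# Bulldozer: up to +80% from the second integer core; area cost not disclosed.
limit = break_even_overhead(speedup=0.80, target_perf_per_area=ht)

print(f"Hyper Threading perf/area: {ht:.3f}")
print(f"Bulldozer break-even area overhead: {limit:.1%}")
```

Under these assumptions the break-even point is roughly 45%. If the second integer core and its share of the module add less area than that, the 80% figure wins on throughput per area. Above that, Hyper Threading's approach is more area-efficient. Both inputs are "up to" numbers measured against different baseline cores, so real workloads will land elsewhere.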

According to AMD's roadmaps, Zambezi will use either 4 or 8 Bulldozer cores (that's 2 or 4 modules). The quad-core Zambezi should have roughly 10 - 35% better integer performance than a similarly clocked quad-core Phenom II. An eight-core Zambezi will be a threaded monster.

No GPU, for Now

AMD's first APU will be Llano, but it will be based on existing Phenom II cores rather than Bulldozer. The move to a new manufacturing process combined with the first monolithic CPU/GPU is enough to take on at once; there's no need to toss in a brand new microarchitecture at the same time.

AMD did add that eventually, within 3 to 5 years, most floating point workloads would move off the CPU and onto the GPU. At that point you could even argue against including any sort of FP logic on the "CPU" at all. AMD's design direction with Bulldozer is clearly to prepare for that future.

In recent history, AMD's architectural decisions have anticipated, earlier than Intel, where the microprocessor industry was headed. The K8 embraced 64-bit computing, a move that Intel later echoed. Phenom was the first to migrate to the three-level cache hierarchy that we have today, with private L2 caches. Nehalem mimicked and improved on that philosophy. Bulldozer appears to be similarly ahead of its time, ready for a world where heterogeneous CPU/GPU computing is commonplace. I wonder if we'll see a similar architecture from Intel in a few years.

94 Comments

  • mattclary - Monday, December 14, 2009

    [quote]

    Anand,

    Think of each twin Integer core Bulldozer module as a single unit, so correct.

    [/quote]

    It's no wonder you misinterpreted what he said. This is vague at best! "Is it either or? - Correct!"
  • aj28 - Thursday, December 03, 2009

    Alright, so here's the quote from the article. Take note of the parts in bold...

    [QUOTE]Also, just to confirm, when your roadmap refers to 4 bulldozer cores that is four of these cores:

    http://images.anandtech.com/reviews/cpu/amd/FAD200...

    Or does each one of those cores count as two? I think it's the former but I just wanted to confirm.[/QUOTE]

    And AMD's response...

    [QUOTE]Think of each twin Integer core Bulldozer module as a single unit, so correct.[/QUOTE]

    So to me this reads, "Correct, the former, meaning..."

    [QUOTE]...when your roadmap refers to 4 bulldozer cores that is four of these cores:

    http://images.anandtech.com/reviews/cpu/amd/FAD200...[/QUOTE]

    There's a good chance that the majority is correct and I am in fact wrong, but... Well, that's just how I read their response. I feel there is a good chance of some more confusion afoot, much like the percentages being thrown around in the original article.
  • aj28 - Thursday, December 03, 2009

    I think it's also worth noting that I fail at quoting... Evidently... Sorry!
  • swindelljd - Wednesday, December 02, 2009

    I bet Oracle is salivating over the new core count technique since it is sure to create a huge surge in their revenue because they charge per core on the x86 platform.
  • Sivar - Tuesday, December 01, 2009

    If FP performance takes a back seat, it could hurt performance in well multi-threaded games.
  • JumpingJack - Wednesday, December 02, 2009

    Depends on how effectively the designers are able to share the FP in this arrangement, but yeah -- gaming will be a question mark. I am pretty confident it will be better, not worse.
  • nirmv - Tuesday, December 01, 2009

    From what I understand, AMD figured out how to reduce core size by 25% without impacting performance.
    Each pair of cores will now share the same fetch/decode units (using SMT like Intel) and the same FP unit (doubled to 256 bits, so it's actually two 128-bit units), but keep separate integer units like before. So the two cores share half of their logic, meaning they now use 150% of the die area of one core, or in other words each core saves 25% (75% * 2 = 150%).
    But it will still have 1/2 the throughput of Sandy Bridge in FP, and 1/2 the fetch/decode bandwidth, because it uses one front end for two cores instead of one per core.

    Nevertheless it looks like a wise decision in terms of power/performance. So nice, but it won't give AMD the performance crown.

  • Seramics - Tuesday, December 01, 2009

    From the way it seems, I'm afraid the badly delayed, highly anticipated, much-hyped Bulldozer, AMD's only hope to retake the performance crown from Intel, will fall short of expectations. Unless they come up with a really competitive and powerful processor, I'm afraid the AMD we knew from the A64 days will remain history until the next major architecture after Bulldozer, which could well be 5 years or so after 2011. AMD will be a budget player till then.
  • Alberto - Tuesday, December 01, 2009

    Bulldozer seems too late against Intel's upcoming offerings.
    An eight-core Bulldozer will clearly be slower than an eight-core Sandy Bridge, in both integer and FP.
    This CPU implementation seems designed to fight Nehalem (two 128-bit units, both possibly usable by one core only).
    Sandy Bridge will have twice the FP power and threads per die, assuming the article is right.
    The only way to be competitive is to treat a single "block" like a monolithic core. Intel can answer with 50% more cores per die, delivering better overall integer and FP performance.

    Still, we don't know what the integer performance of the new Sandy Bridge integer unit will be. I believe it will be higher than Nehalem's.
  • epobirs - Tuesday, December 01, 2009

    I don't buy this claim that FP will be eliminated from CPUs in favor of doing it all on a GPU. There are too many situations where FP is still needed on a per core basis with a primarily integer load. About two minutes after the first systems ship with no integrated FP in the CPU (Bulldozer SX?) there will be engineers thinking themselves clever by proposing to boost FP performance by integrating it into the CPU die!

    What will happen instead is the FP and onboard low-end graphics solution will merge. The monster GPUs will be there for high-end FP as needed and the die area consumed by the FP and IGA minimized so as to be beneath concern. FP may be external to the cores but they won't be sold without at least one FP/IGA module in the mix. That way you have a chip that is versatile for a wide range of different boxes but also cost competitive.