The New Way to Count Cores

Henceforth AMD is referring to the number of integer cores on a processor when it counts cores. So a quad-core Zambezi is made up of four integer cores, or two Bulldozer modules. An eight-core would be four Bulldozer modules.


A hypothetical quad-core Bulldozer. Presumably the L3 cache would be shared by both modules.


A hypothetical eight-core Bulldozer. Presumably the L3 cache would be shared by all four modules.

It's a distinct shift from AMD's (and Intel's) current method of counting cores. A quad-core Phenom II X4 is literally four Phenom II cores on a single die; if you disabled three, you would be left with a single-core Phenom II. The same can't be said about a quad-core Bulldozer. The smallest functional block there is a module, which is two cores according to AMD.
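
To make the new arithmetic concrete, here is a minimal sketch in Python of the counting scheme described above. The class and function names are ours, invented for illustration, and don't correspond to anything AMD has published. The only facts taken from AMD are that a module contains two integer cores and that the module is the smallest functional block. Treating every core count as a whole number of modules is a simplification of this sketch.

```python
from dataclasses import dataclass

# From the article: one Bulldozer module contains two integer cores.
CORES_PER_MODULE = 2


@dataclass(frozen=True)
class BulldozerChip:
    """Hypothetical model of a Bulldozer-based chip, counted AMD's new way."""
    modules: int

    @property
    def cores(self) -> int:
        # AMD now counts integer cores, not modules.
        return self.modules * CORES_PER_MODULE


def modules_for(cores: int) -> int:
    """Return how many modules a chip marketed with `cores` cores needs."""
    if cores <= 0 or cores % CORES_PER_MODULE != 0:
        # Simplifying assumption: the module is the smallest functional
        # block, so core counts come in multiples of two.
        raise ValueError(f"{cores} cores is not a whole number of modules")
    return cores // CORES_PER_MODULE


if __name__ == "__main__":
    for marketed in (4, 8):
        chip = BulldozerChip(modules_for(marketed))
        print(f"{marketed}-core Zambezi: {chip.modules} modules, "
              f"{chip.cores} integer cores")
```

Asking this model for a three-core part raises an error, which is the point: unlike Phenom II, there is no single Bulldozer core to fall back to.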

Better than Hyper Threading?

Intel doesn't take, at least today, quite as aggressive a step towards multithreading. Nehalem uses SMT to send two threads to a single core, resulting in as much as a 30% increase in performance.

The added die area to enable HT on Nehalem is very small, far less than 5%.

AMD claims that the performance benefit from the second integer core on a single Bulldozer module is up to 80% on threaded code. That's more than what AMD could get through something like Hyper Threading, but as we've recently found out, the impact on die size is not negligible. It really boils down to the sorts of workloads AMD will be running on Bulldozer. If they are indeed mostly integer, then the performance per die area will be quite good and the tradeoff worth it. Part of the integer/FP balance does depend on how quickly the world embraces computing on the GPU, however...
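
A quick back-of-the-envelope calculation shows what "worth it" means here. The sketch below, in Python, takes the best-case figures quoted above (up to 30% from Hyper Threading for under 5% more die area, up to 80% from Bulldozer's second integer core) and asks how much extra area that second core could cost before it falls behind Hyper Threading in performance per unit of area. The function names are ours, the 5% is an upper bound and not a measured value, both speedups are vendor best cases, and we assume both area figures are measured against the same single-threaded baseline. Treat the result as a rough bound, not a prediction.

```python
def perf_per_area(speedup: float, area_overhead: float) -> float:
    """Relative throughput per unit area versus a baseline core (baseline = 1.0)."""
    return (1.0 + speedup) / (1.0 + area_overhead)


def break_even_area(speedup: float, rival_speedup: float, rival_area: float) -> float:
    """Largest area overhead at which `speedup` still matches the rival's perf/area."""
    return (1.0 + speedup) / perf_per_area(rival_speedup, rival_area) - 1.0


# Figures quoted in the article; all are "up to" or upper-bound values.
HT_SPEEDUP = 0.30  # Hyper Threading on Nehalem: as much as 30%
HT_AREA = 0.05     # stated as "far less than 5%", so 5% is a ceiling
BD_SPEEDUP = 0.80  # AMD's claim for the second integer core: up to 80%

if __name__ == "__main__":
    limit = break_even_area(BD_SPEEDUP, HT_SPEEDUP, HT_AREA)
    print(f"Break-even area overhead: {limit:.1%}")
```

With these inputs the break-even lands at roughly 45% extra area. Since the real cost of Hyper Threading is well under 5%, the true threshold sits somewhat lower, at about 38% if Hyper Threading were free. Either way, the second integer core has a fair amount of die area to spend before it loses to SMT on integer-heavy threaded code.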

According to AMD's roadmaps, Zambezi will use either 4 or 8 Bulldozer cores (that's 2 or 4 modules). The quad-core Zambezi should have roughly 10 - 35% better integer performance than a similarly clocked quad-core Phenom II. An eight-core Zambezi will be a threaded monster.

No GPU, for Now

The first APU from AMD will be Llano, but it will be based on existing Phenom II cores. The move to a new manufacturing process combined with the first monolithic CPU/GPU is enough to do at once; there's no need to toss in a brand new microarchitecture at the same time.

AMD did add that eventually, in a matter of 3 - 5 years, most floating point workloads would be moved off of the CPU and onto the GPU. At that point you could even argue against including any sort of FP logic on the "CPU" at all. It's clear that AMD's design direction with Bulldozer is to prepare for that future.

In recent history AMD's architectural decisions have predicted, earlier than Intel, where the microprocessor industry was headed. The K8 embraced 64-bit computing, a move that Intel eventually echoed some years later. Phenom was first to migrate to the three-level cache hierarchy that we have today, with private L2 caches. Nehalem mimicked and improved on that philosophy. Bulldozer appears to be similarly ahead of its time, ready for a world where heterogeneous CPU/GPU computing is commonplace. I wonder if we'll see a similar architecture from Intel in a few years.

94 Comments

  • Lifted - Monday, November 30, 2009

    This. Maybe this takes into account the moving of FP off the die to an add-on module or GPU.

    I never could have imagined we'd be going back to the add on FP modules.
  • GodisanAtheist - Monday, November 30, 2009

    I believe that's because Interlagos gets the integrated GPU core, which in terms of theoretical performance will send FP performance through the roof.
  • DominionSeraph - Monday, November 30, 2009

    But then the performance is underwhelming. Current-gen GPUs would be off the chart.
  • medi01 - Monday, November 30, 2009

    Could someone decrypt the following text for me please:

    [quote]It all started about two weeks ago when I got a request from AMD to have a quick conference call about Bulldozer. I get these sorts of calls for one of two reasons. Either:

    1) I did something wrong, or
    2) Intel did something wrong.

    This time it was the former. I hate when it's the former.[/quote]
  • GaiaHunter - Monday, November 30, 2009

    It means Anand gets these calls ("short conference calls") requested by AMD when:

    1) Anand makes a mistake

    or

    2) Intel is being naughty (like telling OEM to not sell AMD).

    I asked Anand in the Bulldozer article if a quad-core zambezi meant 4cores/8 threads or 4cores/4 threads.

    He said (and I was convinced at that time it was the correct answer too) that a Zambezi quad-core meant 4 cores/8 threads and an octo-core would be 8 cores/16 threads. Or if you prefer, 4 modules/8 cores/8 threads and 8 modules/16 cores/16 threads.

    But it seems it is 2 modules/4 cores/4 threads and 4 modules/8 cores/8 threads.

    Sincerely, I can't really blame Anand - this shit is confusing.
  • Kiijibari - Monday, November 30, 2009

    Yes .. for desktops. However for Servers there will be again an MCM with two dies, i.e. 8 modules, 16 cores, 16 threads, called Interlagos.
  • piesquared - Monday, November 30, 2009

    Which makes a person wonder, if AMD has a 16 core Interlagos in the server space, how nice and cool and efficient will an 8 core Zambezi be.
  • GaiaHunter - Monday, November 30, 2009

    U can see it in there

    http://www.anandtech.com/cpuchipsets/showdoc.aspx?...

    And also we can see that this new designation was causing quite a confusion in the forums.

    http://forums.anandtech.com/showthread.php?t=20230...
  • pcfxer - Monday, November 30, 2009

    The first with L3 cache was Intel Pentium 4 EXTREME EDITION.

    Phenom just had the most logical use of L3 in the sense that it served as a "community" buffer.
  • JimmiG - Monday, November 30, 2009

    The first with L3 cache was actually the AMD K6-III released in 1999. Of course, the L3 was actually on the mobo, while the L2 was on-die. But it did use a tri-level cache, making it outperform the Pentium III Katmai on integer workloads.