Zooming in on SPEC CPU2006: the Good

We filtered out the benchmarks that showed at least a 30% improvement over Magny-Cours (based on the K10 core). Remember that the Bulldozer architecture was designed to deliver 33% more cores in the same power envelope while keeping IPC at roughly 95% of the K10's. The rest of the performance gain was supposed to come from higher clock speeds. Those clock speed increases did not materialize in the real world, and in any case we kept the clock speed the same to focus on the architecture. A 30-35% performance increase is good; anything over 35% indicates that the Bulldozer architecture handles that particular sort of software better than Magny-Cours.
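To see where the 30% threshold sits relative to the design targets, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that a fully threaded run scales linearly with core count and that IPC is the only other factor at equal clocks; the function name is hypothetical.

```python
def relative_throughput(core_ratio: float, ipc_ratio: float, clock_ratio: float = 1.0) -> float:
    """Throughput of a new design relative to an old one.

    Naive model (assumption): perfect scaling with core count, multiplied by
    the per-core IPC ratio and the clock speed ratio.
    """
    return core_ratio * ipc_ratio * clock_ratio


# Design targets quoted in the text: 33% more cores, IPC at ~95% of K10,
# and (as in our tests) the same clock speed.
gain = relative_throughput(core_ratio=1.33, ipc_ratio=0.95) - 1.0
print(f"expected gain at equal clock: {gain:.1%}")
```

Under these assumptions the design targets alone yield a gain of roughly 26% at equal clocks, so a benchmark that clears 30% is already doing somewhat better per core than the 95% IPC target implies.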

SPEC Int CPU2006: the Bulldozer-Friendly Benchmarks

The Libquantum score is the most spectacular. Bulldozer performs more than twice as fast as Magny-Cours, and its score of 2750 is not that far from the almighty Xeon E5-2660 at 2.2GHz (3310): Bulldozer is only 17% slower here.
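The "17% slower" figure is measured relative to the Xeon's score. A quick Python check of that arithmetic, using only the two scores quoted above (the helper name is illustrative):

```python
def percent_slower(score: float, reference: float) -> float:
    """Fraction by which `score` falls short of `reference`."""
    return (reference - score) / reference


bulldozer = 2750  # Libquantum score quoted in the text
xeon = 3310       # Xeon E5-2660 at 2.2GHz, quoted in the text

print(f"Bulldozer is {percent_slower(bulldozer, xeon):.1%} slower")  # Bulldozer is 16.9% slower
print(f"Xeon is {xeon / bulldozer - 1:.1%} faster")                  # Xeon is 20.4% faster
```

Which baseline you pick matters: the same gap reads as about 17% when measured against the Xeon, or about 20% when measured against Bulldozer.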

At first sight, nothing should make Libquantum run especially fast on Bulldozer. Libquantum contains a high proportion of branches (27%), and we have seen before that although Bulldozer has a somewhat improved branch predictor, its deeper pipeline and higher branch misprediction penalty can cause a lot of trouble. In fact, Perlbench (23%), Sjeng Chess (21%), and Gobmk (Go AI, 21%) are branchy software and are among the worst-performing tests on Bulldozer. Luckily, Libquantum's branches are much easier to predict: libquantum is among the programs with the lowest branch misprediction rates (fewer than six per 1000 instructions).
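The two numbers in this paragraph can be combined to estimate how often an individual libquantum branch is mispredicted. A small Python sketch (hypothetical helper name), treating "fewer than six per 1000 instructions" as an upper bound:

```python
def mispredict_rate_per_branch(mpki: float, branch_fraction: float) -> float:
    """Convert mispredictions per 1000 instructions (MPKI) into a per-branch rate."""
    branches_per_1000_instructions = branch_fraction * 1000
    return mpki / branches_per_1000_instructions


# Libquantum: 27% of instructions are branches, fewer than 6 mispredictions
# per 1000 instructions (both figures from the text).
upper_bound = mispredict_rate_per_branch(mpki=6, branch_fraction=0.27)
print(f"at most {upper_bound:.1%} of branches are mispredicted")  # at most 2.2% of branches are mispredicted
```

In other words, roughly 270 of every 1000 instructions are branches and fewer than 6 of those go the wrong way, so more than 97% of libquantum's branches are predicted correctly.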

We all know that Bulldozer deals with loads and stores much better than Magny-Cours. However, libquantum has the lowest (!) proportion of loads and stores (19%: 14% loads, 5% stores), so Bulldozer's improved memory-level parallelism is not the answer. The table below gives an idea of the instruction mix of SPEC CPU2006 Int.

SPEC Int 2006 application IPC*  Branches (%)  Stores (%)  Loads (%)  Loads+stores (%)
perlbench                 1.67  23            12          24         36
Bzip compression          1.43  15            9           26         35
Gcc                       0.83  22            13          26         39
mcf                       0.28  19            9           31         40
Go AI                     1.00  21            14          28         42
hmmer                     1.67  8             16          41         57
Chess                     1.25  21            8           21         29
libquantum                0.43  27            5           14         19
h264 encoding             2.00  8             12          35         47
omnetpp                   0.38  21            18          34         52
astar                     0.56  17            5           27         32
XML processing            0.66  26            9           32         41

* IPC as measured on Core 2 Duo.
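The claims about libquantum can be read straight off the table. As an illustration, a short Python script that encodes the instruction mix above (the dictionary layout is our own) and picks out the extremes:

```python
# (branches %, stores %, loads %) per benchmark, copied from the table above.
MIX = {
    "perlbench": (23, 12, 24),
    "Bzip compression": (15, 9, 26),
    "Gcc": (22, 13, 26),
    "mcf": (19, 9, 31),
    "Go AI": (21, 14, 28),
    "hmmer": (8, 16, 41),
    "Chess": (21, 8, 21),
    "libquantum": (27, 5, 14),
    "h264 encoding": (8, 12, 35),
    "omnetpp": (21, 18, 34),
    "astar": (17, 5, 27),
    "XML processing": (26, 9, 32),
}


def memory_ops(name: str) -> int:
    """Loads plus stores, as a percentage of all instructions."""
    _branches, stores, loads = MIX[name]
    return stores + loads


fewest_memory_ops = min(MIX, key=memory_ops)
most_branches = max(MIX, key=lambda name: MIX[name][0])

print(fewest_memory_ops, memory_ops(fewest_memory_ops))  # libquantum 19
print(most_branches, MIX[most_branches][0])              # libquantum 27
```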

Libquantum has a relatively high number of cache misses on most CPUs because it works with a 32MB data set, so it benefits from a larger cache. Bulldozer's 8MB L3 (vs. 6MB on Magny-Cours) might have boosted performance a bit, but the main reason is Bulldozer's vastly improved prefetching. According to researchers at the University of Texas at Austin and Microsoft, the prefetch requests in libquantum are very accurate. If you check AMD's own publications, you'll notice that there were two major changes aimed at improving the single-threaded performance of the Bulldozer architecture compared to its predecessors: an improved Turbo Core and vastly improved prefetching.
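Why would prefetching help libquantum so much? Prefetch accuracy depends on how regular the access pattern is, and libquantum's main loops are generally described as streaming through a large array. The Python sketch below is a generic, textbook-style per-instruction stride prefetcher, not AMD's actual design; the class, its `degree` parameter, and the two synthetic access traces are all hypothetical. It only shows that a regular stride lets even a simple prefetcher reach near-perfect accuracy, while random addresses give it nothing to work with.

```python
import random


class StridePrefetcher:
    """Toy per-PC stride prefetcher (illustrative, not AMD's implementation).

    For each load instruction (identified by its PC) it remembers the last
    address and stride. When the same non-zero stride is seen twice in a row,
    it prefetches the next `degree` addresses along that stride.
    """

    def __init__(self, degree: int = 2) -> None:
        self.degree = degree
        self.table = {}       # pc -> (last_addr, last_stride)
        self.pending = set()  # prefetched addresses not yet used
        self.issued = 0       # total prefetches issued
        self.useful = 0       # prefetches later hit by a demand access

    def access(self, pc: int, addr: int) -> None:
        if addr in self.pending:          # demand access hits a prefetched line
            self.pending.discard(addr)
            self.useful += 1
        last_addr, last_stride = self.table.get(pc, (None, 0))
        stride = 0 if last_addr is None else addr - last_addr
        self.table[pc] = (addr, stride)
        if stride != 0 and stride == last_stride:  # stride confirmed twice
            for k in range(1, self.degree + 1):
                target = addr + k * stride
                if target not in self.pending:
                    self.pending.add(target)
                    self.issued += 1

    def accuracy(self) -> float:
        """Fraction of issued prefetches that were later used."""
        return self.useful / self.issued if self.issued else 0.0


def run(addresses, pc: int = 0x400) -> float:
    pf = StridePrefetcher()
    for addr in addresses:
        pf.access(pc, addr)
    return pf.accuracy()


LINE = 64
streaming = [i * LINE for i in range(10_000)]  # sequential sweep over an array
rng = random.Random(42)
scattered = [rng.randrange(0, 1 << 30, LINE) for _ in range(10_000)]

print(f"streaming accuracy: {run(streaming):.3f}")
print(f"random accuracy:    {run(scattered):.3f}")
```

Only the last couple of prefetches on the streaming trace are wasted (they point past the end of the array), whereas on the random trace the stride is almost never confirmed, so the prefetcher stays mostly idle.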

Next, let's look at the excellent mcf result. mcf is by far the most memory-intensive SPEC CPU2006 Int benchmark. It misses the L1 data cache about five times more often than the average of all the other benchmarks, with a hit rate below 70%! mcf also misses the last-level cache up to eight times more often than the other benchmarks. Clearly mcf is a prime candidate to benefit from Bulldozer's vastly improved load/store units.
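To see why a sub-70% L1 hit rate hurts so much, the standard average-memory-access-time (AMAT) formula, AMAT = hit time + miss rate × miss penalty, is enough. In the Python sketch below the latencies and the 95% "typical" hit rate are hypothetical round numbers chosen for illustration; only the "below 70%" figure for mcf comes from the text.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time in cycles: hit_time + miss_rate * miss_penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical round numbers, for illustration only (not measured values).
L1_HIT_CYCLES = 4
L1_MISS_PENALTY_CYCLES = 40   # assumed average cost of going beyond L1

typical = amat(L1_HIT_CYCLES, miss_rate=0.05, miss_penalty=L1_MISS_PENALTY_CYCLES)
mcf_like = amat(L1_HIT_CYCLES, miss_rate=0.30, miss_penalty=L1_MISS_PENALTY_CYCLES)

print(f"95% hit rate: {typical:.1f} cycles per access")
print(f"70% hit rate: {mcf_like:.1f} cycles per access")
print(f"slowdown:     {mcf_like / typical:.1f}x")
```

With these assumed numbers the average load costs more than 2.5 times as much at a 70% hit rate, which illustrates why mcf stands to gain so much from better load/store handling.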

Omnetpp is not that extreme, but its instruction mix is 52% loads and stores, and its L2 and last-level cache misses are twice as high as those of the rest of the pack. Unlike mcf, omnetpp has far fewer branch mispredictions, despite a similarly high percentage of branches (21% vs. 19% for mcf, per the table above). To be more precise: it suffers about three times fewer branch mispredictions! So the somewhat smaller benefit from Bulldozer's memory subsystem is more than offset by the much lower number of branch mispredictions. This most likely explains why Bulldozer takes a slightly larger step forward over the previous AMD architecture in omnetpp than it does in mcf.



  • Homeles - Wednesday, May 30, 2012

    This. Read 3rd party reviews (like AnandTech!) -- several of them -- and draw your conclusions from there. That's pretty much the point of reviews; if marketing teams could provide honest, reliable benchmarks over a wide range of applications, we'd have little need for 3rd party reviews.
  • Mugur - Thursday, May 31, 2012

    Well... they actually did!
  • moravista - Wednesday, May 30, 2012

    Great article Johan! I have been reading your articles since the Pentium III / K6-2 days and have really enjoyed them! Thanks for sharing your insight! Keep 'em coming!
  • JohanAnandtech - Friday, June 1, 2012

    Great to hear from you. Did you use to participate in the different forums under a different callsign?
  • muy - Wednesday, May 30, 2012

    i want a phenom II x4 980+ on 32 nm. this whole idea of "let's put as many crippled dual cores on a die as we can, smack a level 3 cache on top and call it our next cpu" is utter crap for stuff that doesn't multi-thread well (95 % of all stuff).

    6 core bulldozer i bought to replace my amd x3 450 is slower than the chip i wanted to replace at the same clock speed. now i have a shiny asus rog mb, a x3 450 powering it, and a 6 core bulldozer gathering dust. what a waste of money that was.

    shame i can't find any x4 970+'s anymore, and amd is too foolhardy to keep manufacturing their best gaming cpu's, let alone do a shrink on them to 32 nm.

    i can only imagine how much better a phenom 2 x4 9xx, default clocked at 4.2 ghz+ would be than any bulldozer. (and how much cheaper to manufacture considering the die size compared to the die size of bulldozer).

    i just don't understand amd.
  • Roland00Address - Wednesday, May 30, 2012

    Microcenter has the following processors:
    1045t six core for $99
    965 quad core black edition for $99
    960t quad core black edition for $89 (this model is a disabled six core and may unlock to a 6 core). The 960t is a clearance processor, so it is only available while supplies last.
  • fic2 - Thursday, May 31, 2012

    Those are all 45 nm. He is wanting a tick - a die shrunk Phenom II.
    Would have to agree with him. If AMD would do a die shrink they would have a killer product - assuming GloFo didn't f*ck it up.
  • muy - Wednesday, May 30, 2012

    bulldozer doesn't do single threaded, highly branching (cough games cough) stuff well.

    and before you say "some games use multiple cores", i'll say that 1 core running on 100 % and 7 cores at 5 % is not a good use of multi threading.

    (1 * 100%) + (7 * 5%) = 100% + 35% = 135%, i.e. 1.35 cores used. this means that a DUAL core running at a 10 % higher clock speed than the example 8 core would be 10 % faster than the 8 core 'using' its 8 cores.

    clock speed + ipc are the only things that matter 90% + of the time for games.
  • wolfman3k5 - Wednesday, May 30, 2012

    People don't buy CPUs based on theoretical performance, ideology or brand loyalty (OK, some fanboys do). Most of us are not computer engineers, and even if we were, it wouldn't matter, because at the end of the day only the end result matters: performance, efficiency and price. Just like I didn't buy Intel because it looked good on paper back in the glory days of AMD (circa 2005). So no matter how deep and involved these articles are, AMD still trails Intel when it comes to performance, and it will do so until its lazy and incompetent CPU engineers get off their lazy butts and start working. The sole reason Bulldozer was such a massive fail was that most of the design process was highly automated. So, stop slacking and start working, lazy AMD engineers!
  • Homeles - Wednesday, May 30, 2012

    Being a "lazy" electrical engineer is practically impossible. The amount of work that goes into simply making these processors function is massive. These guys work hard to get where they are in their careers and work even harder to keep those careers. The margin for error is also tiny... a small flaw can create enormous performance penalties.

    I'd be willing to bet that many, if not most of Bulldozer's shortcomings could be blamed on management. Saying it was "lazy engineers" is callous and ignorant.

