Zooming in on SPEC CPU2006: the Bad

The optimized SPEC CPU2006 int binaries allow gains in the range of 30% to 117%. Unfortunately, the complete benchmark suite only shows a gain of 21% when we compare the Opteron 6276 with the 6176. Closer inspection shows that four benchmarks regress. The regression appears to be small in most of them (7 to 14%), but remember that we have 33% more cores. Spread over that many more cores, even a small regression of 7% means that each core delivers up to 30% less than a core of the previous architecture!
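The "up to 30%" figure follows from simple arithmetic, sketched below under two assumptions that are ours rather than measured: the Opteron 6176 is counted as 12 cores and the 6276 as 16 (the "33% more cores" above), and throughput is treated as scaling linearly with core count. The helper name `per_core_ratio` is purely illustrative.

```python
# Rough per-core estimate. Assumes 12 cores (Opteron 6176) vs 16 cores
# (Opteron 6276) and perfectly linear scaling with core count.
OLD_CORES = 12
NEW_CORES = 16


def per_core_ratio(throughput_ratio, old_cores=OLD_CORES, new_cores=NEW_CORES):
    """Return new per-core throughput relative to old per-core throughput.

    throughput_ratio is the new chip's total score divided by the old chip's
    total score, e.g. 0.93 for a 7% regression.
    """
    return throughput_ratio * old_cores / new_cores


if __name__ == "__main__":
    for regression in (0.07, 0.14):
        ratio = per_core_ratio(1.0 - regression)
        print(f"{regression:.0%} regression -> each core delivers "
              f"{ratio:.2f}x the throughput of an old core")
```

A 7% regression in total throughput works out to each new core delivering roughly 0.70x of what an old core did, which is where the "up to 30%" loss comes from. Real workloads rarely scale perfectly linearly, so treat this as an estimate rather than a measurement.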

SPEC Int CPU2006: the Bulldozer unfriendly

Perlbench has high locality in the L1 and L2 caches and rarely accesses the Last Level Cache, let alone the memory. The result is a benchmark that delivers high IPC: 1.67 on a five-year-old Core 2 Duo ("Merom"), and roughly 1.9 on the latest Intel CPUs. The interesting thing to note is that h264ref and Perlbench are among the top IPC performers in the SPEC CPU2006 suite.

Sjeng (chess) and Gobmk (Go) are both artificial intelligence workloads. Again, the IPC is relatively high (>1), but their most important performance characteristic is that they contain a very high percentage of hard-to-predict branches: twice the average of the SPEC CPU integer suite.

Granted, the evidence we've presented is still circumstantial. It would take an extremely long and intensive profiling session on all new processors to really determine what is going on, and that is beyond our time budget: one SPEC CPU run alone consumes a whole day. However, we did get our hands dirty. A short profiling session on three different benchmarks gives us some very interesting results that we want to discuss next.


84 Comments

  • Aone - Monday, June 4, 2012 - link

    Bulldozer's concept was wrong from scratch.
    I've said it a few times; let me explain it here again.

    I'm sure every one of you remembers AMD's own words: "one BD module has 80% of throughput of two independent cores".
    What does this mean in figures?
    Let's take the performance of one core as 1.0 point. Then two BD modules would score 3.2 points (2 modules x 2 cores x 0.8), or in other words less than 10% more than the 3.0 points of three independent cores.
    Need I remind you that by developing independent cores AMD wouldn't have wasted resources (engineering, transistors, money and time) on designing and debugging the shared logic? The chip could have been much smaller, since it would have had only 1MB L2 and 2MB L3 per core and no shared logic. And all of those freed-up resources could have been allocated to developing a more advanced core.

    You see that packing two cores inside one module was wrong even on the conceptual level. I'm very curious who was the main supporter and decision maker of this approach at AMD.

    AMD must throw away the BD concept and return to standard practice. The only question that remains: does AMD have a long enough TTL to do it?

    BTW, I recommend looking through the Spec results again. The comparison of 12c Opteron 62xx w/ 12c Opteron 61xx is of special interest. And let's not forget that the Opteron 62xx submissions have higher freq, faster memory, as well as a more advanced compiler version and an extended instruction set.
  • TC2 - Monday, June 18, 2012 - link

    I agree 100%!!!
    The BD uA is an "unsuccessful" port from a graphics uA. There are many major drawbacks! Take one example: to write optimal software you must adapt the application at the algorithmic level (in the sense of thread specialization)! This is because the two BD cores are not the same! They also share the L1 IC, the number of elements is high, ... and there are many other uA weaknesses.
  • evolucion8 - Tuesday, June 17, 2014 - link

    Northwood had a 20-stage pipeline and Prescott had 31 stages, not 39...
  • tipoo - Wednesday, October 8, 2014 - link

    Where is the aftermath?

    "But what about the fourth show stopper? That is probably one of the most interesting ones because it seems to show up (in a lesser degree) in Sandy Bridge too. However, we're not quite ready with our final investigations into this area, so you'll have to wait a bit longer. To be continued...."
