The Current Situation

It's not hard to explain why an 8-thread processor with slightly lower single-threaded performance does not do well in many desktop applications. If you compare for example the hex-core Core i7-3960X with a quad-core i7-3820, four games did not benefit from the extra two cores: Civilization V, Crysis, Dirt 3 and Metro 2033. In Starcraft 2, World of Warcraft, and Dawn of War 2, the 50% higher core count was good for a 10% performance boost at best. In other words, the situation has improved, but most games don't scale well beyond four cores. There are also other factors at play, though, as it's already known that StarCraft II doesn't use more than two cores; instead, it's likely the 15MB (vs. 10MB in i7-3820) L3 cache that helps improve performance.

The situation in the server space is a lot harder to explain. The Opteron 6100 was able to keep up—more or less—with the Xeon 5600 performancewise. However, the Xeon 5600 was equipped with much better power management and the Xeon won the performance/watt race in most applications, with the exception of HPC applications.

The Opteron 6200 added a bit of performance but sips much less power at low and medium load, so it was capable of offering a better performance per Watt ratio than its older brother. However, since the Xeon E5 came out, the situation became pretty dramatic for the Opteron. One telling example is the fact that only one VMmark 2.0 result on the Opteron 6200 exists, but it has been withdrawn. Even if the reported 12.77 score is close to truth, we need four AMD Opteron 6726 (2.3GHz) to beat the best dual Xeon E5 (2690 at 2.9GHz) by 15%.

We have shown already quite a few benchmarks in two Opteron 6276 articles and one Xeon E5 review. We summarized the relevant numbers of both articles in the table below. The benchmarks below are real world and very relevant to the professional in our opinion.

Software: Importance in the market Opteron 6276 vs.
Opteron 6174
Xeon E5-2660
vs. Opteron 6276

Virtualization: 20-50%

   
ESXi + Linux (vApusMark FOS)

+1%

+40%

OLAP Databases: 10-15%

 

 
MS SQL Server 2008 R2 (OLAP throughput)

-9%

+34%

HPC: 5-7%

 

 

LS-Dyna (Neon-Refined)

+21%

+26%

Rendering software: 2-3%

 

 

Cinebench

+2%

+37%

ERP

 

 

SAP

+18%

+13%

Now consider that all these applications are highly-threaded and scale well. Despite the 33% higher integer core count, the Opteron 6276 is not able to outperform the older Magny-Cours in the OLAP, virtualization and rendering benchmarks. However, the architecture is showing its promise by offering about 20% better performance in SAP and HPC applications.

What makes the Bulldozer cores fail in the OLAP benchmark and succeed in SAP? We now have some interesting profiling details on SAP as well as our OLAP benchmark, so we can delve deeper.

Setting Expectations: the Back End SAP S&D Benchmark in Depth
Comments Locked

84 Comments

View All Comments

  • Aone - Monday, June 4, 2012 - link

    Bulldozer's conception was wrong from the scratch.
    I told it a few time, let's me explain it here again.

    I'm sure everyone of you do remember AMD's own words "one BD module has 80% of throughput of two independent cores".
    What does this mean in figures?
    Let's take the performance of one core as 1.0 point. Therefore two BD modules would have 3.2 points or in other words less than 10% than 3.0 (performance of three independent cores).
    Should I remind that with development of independent cores AMD wouldn't had wasted resources (engineering, transistors, money and time) on design and debugging the shared logic. The chip could have been much smaller due to the fact that the chip would have had only 1MB L2 and 2MB L3 per each core and no shared logic. And all of those released resources could have been allocated for development of a more advanced core.

    You see that packing two cores inside a one module was wrong even on the conceptional level. I'm very curious who was the main supporter and decision maker of this approach in AMD.

    AMD must through away BD conception and return to standard practice. The only question remains: Does AMD have long enough TTL to do it?

    BTW, I recommend to look through Spec results again. The comparison of 12c Opteron 62xx w/ 12c Opteron 61xx is of special interest. And let's not forget that Opteron 62xx submissions have higher freq, faster memory and as well as more advanced compiler version and extended instruction set.
  • TC2 - Monday, June 18, 2012 - link

    I'm agree in 100%!!!
    The BD uA is "unsuccessful" port from graphics uA. There is many and major drawbacks! Note for example one - to write an optimal software you must adopt an application at algorithmic level (in sense of thread specialization)! This is because the both BD-cores are not the same! Also they shares L1 IC, the number of elements is high, ... and many others uA weaknesses.
  • evolucion8 - Tuesday, June 17, 2014 - link

    Northwood was 20 stage pipelines and Prescott was 31, not 39...
  • tipoo - Wednesday, October 8, 2014 - link

    Where is the aftermath?

    "But what about the fourth show stopper? That is probably one of the most interesting ones because it seems to show up (in a lesser degree) in Sandy Bridge too. However, we're not quite ready with our final investigations into this area, so you'll have to wait a bit longer. To be continued...."

Log in

Don't have an account? Sign up now