The Bulldozer Aftermath: Delving Even Deeper

Name: The Bulldozer Aftermath: Delving Even Deeper
Item: The Bulldozer Aftermath: Delving Even Deeper
Author: Johan De Gelas

by Johan De Gelas on May 30, 2012 1:15 AM EST

84 Comments | Add A Comment

84 Comments

Next stop: SPEC CPU2006 Int Rate

There is no denying that SPEC CPU2006 was never one of our favorite benchmarks in the Professional IT section of AnandTech. Although it is the standard benchmark of most CPU designers and academic researchers, it is far from a real world benchmark for most professional IT users.

For starters, a typical SPEC CPU2006 benchmark consists of running as many SPEC CPU2006 instances as there are cores available in the machine. The SPEC CPU2006 instances run completely independently from each other, so there are much fewer locks or other synchronization mechanisms at work: the benchmark scales almost perfectly as long as there is enough bandwidth available. Unfortunately, that is not how the majority of business software behaves: databases have high locking overhead and most applications need some synchronization.

Secondly, most of the subtests are related to gaming and simulations (HPC). Typically these applications are much more processing intensive and achieve a higher IPC than your average business application.

Lastly, the source code of the SPEC CPU2006 tests is compiled with extremely aggressively tuned compiler settings and compilers that are less used in the rest of the IT world. Few SPEC CPU2006 results are compiled with gcc and Microsoft's Visual Studio, for example.

However, it would be a step too far to call SPEC CPU2006 useless. From a high level perspective, the scores of SPEC CPU2006 show a strong correlation with L2/L3 cache misses, cache latency, and to a lesser degree branch prediction, just like many business applications. Given similar platforms (like Intel Nehalem and AMD's Shanghai), the CPU SPEC2006 Int score gives a vague idea of which CPU has the most raw integer crunching power, although it overemphasizes memory bandwidth and core count.

To understand the weaknesses and strengths of a certain CPU architecture, even in server workloads, there is no better test than SPEC CPU2006. The first reason is that it has been profiled by so many different people from academia to engineers. If we zoom in on the subtests we can derive a lot of information as we know exactly how these applications behave: there have been lots of performance characterization papers going into great detail.

The second reason is that SPEC CPU2006 tests are compiled with the most optimal compilers and compiler options available at a certain point in time. This gives us some insight into the "real" (e.g. future) potential of a processor. We can exclude the possibility that a processor performs badly because some legacy piece of code is detrimental to the performance. If the CPU cannot score well with these kinds of binaries, it never will!

Auto-parallelization made the normal single-threaded SPEC CPU benchmarks very hard to read. We turn to the rate version instead. Since it scales almost perfectly, it is relatively easy to deduce single-threaded performance from the SPEC rate numbers--on the condition that cache interference and bandwidth bottlenecks do not blur the picture too much, so we have to be careful with those benchmarks that miss the L2 cache a lot. The current CPU2006 int scores are as follows:

SPEC CPU2006 int rate base

The Xeon E5 is the most efficient clock for clock, core for core. But let us compare the Opteron 6276 (2.3GHz, 16-core Bulldozer) and the Opteron 6176 (2.3GHz, 12-core Magny-Cours) in the subtests.

SPEC Int CPU2006

You can immediately derive from these numbers that the "Bulldozer" architecture has a very different architecture profile than Magny-Cours (which was based on the improved Barcelona architecture, Istanbul). Libquantum, omnetpp and mcf show larger performance boosts than you might expect from the 33% higher corecount. These benchmarks show that in some scenarios, Bulldozer can even increase the IPC compared to its predecessor.

We also notice that Bulldozer has some serious weaknesses compared to its predecessor, as performance decreases in the Perlbench, the game AI (gobmk), the chess (Sjeng), and the x264 encoding subtests. And although it is not uncommon that a new architecture fails to beat the previous architecture in every benchmark, it is not a good sign that even a 33% core count cannot overcome the IPC decrease in a very good scaling benchmark. If we try to understand what makes these subtests different from the others, we can get an idea of what kind of software makes Bulldozer choke. This in turn can help us to understand if relatively small tweaks can help future Opterons.

SAP S&D Benchmark in Depth Zooming in on SPEC CPU 2006: the Good

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

84 Comments

View All Comments

Homeles - Wednesday, May 30, 2012 - link
This. Read 3rd party reviews (like AnandTech!) -- several of them -- and draw your conclusions from there. That's pretty much the point of reviews; if marketing teams could provide honest, reliable benchmarks over a wide range of applications, we'd have little need for 3rd party reviews.
Mugur - Thursday, May 31, 2012 - link
Well... they actually did!
moravista - Wednesday, May 30, 2012 - link
Great article Johan! I have been reading your articles since the Pentium III / K6-2 days and have really enjoyed them! Thanks for sharing your insight! Keep 'em coming!
JohanAnandtech - Friday, June 1, 2012 - link
Great to hear from you. Did you used to participate at the different forums on a different callsign?
muy - Wednesday, May 30, 2012 - link
i want a phenom II x4 980+ on 32 nm. this whole idea of "lets put as many crippled dual cores on a die and smack a level 3 cache on top and call it out next cpu" is utter crap stuff that doesn't multi thread well (95 % of all stuff).

6 core bulldozer i bought to replace my amd x3 450 is slower than the chip i wanted to replace at the same clock speed. now i have a shiny asus rog mb, a x3 450 powering it, and a 6 core bulldozer gathering dust. what a waste of money that was.

shame i can't find any x4 970+'s anymore and amd is to foolhardy to keep manufacturing their best gaming cpu's, let alone do a shrink on them to 32 nm.

i can only imagine how much better a phenom 2 x4 9xx, default clocked at 4.2 ghz+ would be than any bulldozer. (and how much cheaper to manufacture considering the die size compared to the die size of bulldozer).

i just don't understand amd.
Roland00Address - Wednesday, May 30, 2012 - link
Microcenter has these following processors
1045t six core for $99
965 quad core black edition for $99
960t quad core black edition for $89 (this model is a disabled six core and has a possibility of unlocking to a 6 core. The 960t is a clearance processor so it is while supplies last.
fic2 - Thursday, May 31, 2012 - link
Those are all 45 nm. He is wanting a tick - a die shrunk Phenom II.
Would have to agree with him. If AMD would do a die shrink they would have a killer product - assuming GloFo didn't f*ck it up.
muy - Wednesday, May 30, 2012 - link
bulldozer doesn't do single threaded, highly branching (cough games cough) stuff well.

and before you say "some games use multiple cores", i'll say that 1 core running on 100 % and 7 cores at 5 % is not a good use of multi threading.

(1 * 100) + (7 * 5) = (1 * 100) + (1 * 35) - 1.35 cores used. this means that a DUAL core going at 10 % higher speed than the exampled 8 core would be 10 % faster than the 8 core 'using' it's 8 cores.

clock speed + ipc are the only things that matter 90% + of the time for games.
wolfman3k5 - Wednesday, May 30, 2012 - link
People don't buy CPUs based on theoretical performance, ideology or brand loyalty (OK, some fan-boys do). Most of us are not computer engineers, and even if we where, it wouldn't matter, because at the end of the day only the end result would matter: performance, efficiency and price. Just like I didn't buy Intel because it looked good on paper back in the glory days of AMD (cca. 2005). So no matter how deep and involved these articles are, AMD still trails Intel when it comes to performance, and it will do so until their lazy and incompetent CPU engineers will get off their lazy buts and start working. The sole reason why Bulldozer was such a massive fail was because most of the design process was highly automated. So, stop slacking and start working lazy AMD engineers!
Homeles - Wednesday, May 30, 2012 - link
Being a "lazy" electrical engineer is practically impossible. The amount of work that has to go into making these processors simply function is quite massive. These guys work hard to get to where they are with their careers and work even harder to keep those careers. The margin of error here is also quite huge... a small flaw can create enormous performance penalties.

I'd be willing to bet that many, if not most of Bulldozer's shortcomings could be blamed on management. Saying it was "lazy engineers" is callous and ignorant.

The Bulldozer Aftermath: Delving Even Deeper

Post Your Comment

84 Comments

View All Comments

Homeles - Wednesday, May 30, 2012 - link

Mugur - Thursday, May 31, 2012 - link

moravista - Wednesday, May 30, 2012 - link

JohanAnandtech - Friday, June 1, 2012 - link

muy - Wednesday, May 30, 2012 - link

Roland00Address - Wednesday, May 30, 2012 - link

fic2 - Thursday, May 31, 2012 - link

muy - Wednesday, May 30, 2012 - link

wolfman3k5 - Wednesday, May 30, 2012 - link

Homeles - Wednesday, May 30, 2012 - link

Log in

Don't have an account? Sign up now