Bulldozer for Servers: Testing AMD's "Interlagos" Opteron 6200 Series

Author: Johan De Gelas

by Johan De Gelas on November 15, 2011 5:09 PM EST

Conclusions

To help summarize the current situation in the server CPU market, we have drawn up a comparison table of the performance we have measured so far. We'll compare the new Interlagos Opteron 6276 against the outgoing Opteron 6174 as well as the Xeon X5650.

Benchmark	Opteron 6276 vs. Opteron 6174	Opteron 6276 vs. Xeon X5650
ESXi + Linux	-1%	-2%
ESXi + Windows	=	+3%
Cinebench	+2%	+9%
3DS Max 2012 (iRay)	-9% to +4%	-10% to +3%
Maxwell Render	+4%	+6%
Blender	-4%	-24%
Encryption/Decryption AES	+265% / +275%	+2% / +7%
Encryption/Decryption Twofish/Serpent	+25% / +25%	31% / 46%
Compression/Decompression	+10% / +10%	-33% / +22%

Let us first discuss the virtualization scene, the most important market. Unfortunately, with the current power management in ESXi, we are not satisfied with the performance/watt ratio of the Opteron 6276. The Xeon needs up to 25% less energy and performs slightly better. So if performance/watt is your first priority, we think the current Xeons are your best option.
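As a rough illustration of how the two figures in the paragraph above combine, here is a minimal sketch. It assumes the worst-case 25% energy gap and takes "slightly better" to mean the 2% deficit from the ESXi + Linux row of the table. The function name is ours, and using energy over a fixed benchmark run as a stand-in for average power is also an assumption.

```python
def relative_perf_per_watt(perf_ratio: float, energy_ratio: float) -> float:
    """Performance per watt of system A relative to system B.

    perf_ratio:   A's performance divided by B's performance.
    energy_ratio: A's energy use divided by B's energy use.
    """
    if perf_ratio <= 0 or energy_ratio <= 0:
        raise ValueError("ratios must be positive")
    return perf_ratio / energy_ratio


# Assumed inputs: the Xeon uses 25% less energy (the "up to" figure above)
# and the Opteron 6276 is 2% slower (ESXi + Linux row of the table).
xeon_vs_opteron = relative_perf_per_watt(perf_ratio=1 / 0.98, energy_ratio=0.75)
print(f"Xeon performance/watt advantage: {xeon_vs_opteron - 1:.0%}")
```

With a smaller energy gap the advantage shrinks accordingly, so this is best read as an upper bound under the stated assumptions.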

The Opteron 6276 offers a better performance-per-dollar ratio. It delivers the performance of a $1000 Xeon (X5650) at $800. Add to this that G34-based servers are typically less expensive than their Intel LGA 1366 counterparts, and the price advantage of the new Opteron grows. If performance/dollar is your first priority, we think the Opteron 6276 is an attractive alternative.
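To put a number on that claim, the sketch below works through the arithmetic using only the two prices quoted above. Performance is treated as equal, which is a simplification since the table shows small differences either way. The function name is hypothetical.

```python
def perf_per_dollar_advantage(price_a: float, price_b: float,
                              relative_perf_a: float = 1.0) -> float:
    """Return how much more performance per dollar chip A offers than chip B.

    relative_perf_a is A's performance relative to B (1.0 means equal).
    The result is a fraction: 0.25 means 25% more performance per dollar.
    """
    if price_a <= 0 or price_b <= 0:
        raise ValueError("prices must be positive")
    return (relative_perf_a / price_a) / (1.0 / price_b) - 1.0


# Opteron 6276 at about $800 versus Xeon X5650 at about $1000,
# with equal performance assumed.
advantage = perf_per_dollar_advantage(price_a=800, price_b=1000)
print(f"{advantage:.0%} more performance per dollar")
```

Passing a `relative_perf_a` below 1.0 shows how quickly the advantage erodes in workloads where the Opteron trails, such as Blender in the table.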

And then there is Windows Server 2008 R2. Typically we found that under heavy load (benchmarking at 85-100% CPU load) the power consumption was between 3% (integer) and 7% (FP) higher on the Opteron 6276 than on the Xeons and Opteron 6100, a lot better than under ESXi. Add to this the fact that the new Opteron's energy usage at low load is excellent, and you will understand why we feel that there is no reason to go for the Opteron 6100 anymore. Again, AMD still understands that it should price its CPUs more attractively than the competition, so from the price/performance/watt point of view, the Opteron 6276 is a good, cost-effective alternative to the Xeon... on the condition that you enable the "high performance" policy and that AMD keeps the price delta the same in the coming months.

That is the good news. We cannot help but feel a bit disappointed too. AMD promised us (in 2009/2010) that the Opteron 6200 would be significantly faster than the 6100: "unprecedented server performance gains". That is somewhat the case if you recompile your software with the latest and greatest optimized compiler, as AMD's own SPEC CINT (+19%), CFP 2006 (+11%) and Linpack benchmarks (+32%) show.

One of the real advantages of a new processor architecture (prime examples were the K7 and K8) is that it performs well in older software too, without requiring a recompile. For some people in the HPC world, recompiling is acceptable and common, but for everybody else (that is probably >95% of the market!), it's best if existing binaries run faster. Administrators generally are not going to upgrade and recompile their software just to make better use of a new server CPU. Hopefully AMD's engineers have been looking into improving the legacy software performance of their latest chip over the last few months, because it could use some help.

On the other side of the coin, it is clear that some of the excellent features of the new Opteron are not leveraged by the current software base. The deeper sleep and more advanced core gating are not working to their full potential, and the current operating systems frequently don't appear to know how to get the best from Turbo Core. The clock can be boosted by 39% when half of the cores are active, but an 18% boost was the best we saw (in a single-threaded app!). Simply turning the right knobs gave some tangible power savings (see ESXi) and some impressive performance improvements (see Windows Server 2008).
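For readers who want to relate those boost percentages to clock speeds, here is a small sketch. The 2.3 GHz base and 3.2 GHz maximum Turbo Core clocks are assumptions taken from AMD's published specification for the Opteron 6276, since neither figure is stated in this conclusion. The helper names are ours.

```python
def boost_percent(base_ghz: float, boosted_ghz: float) -> float:
    """Clock increase of boosted_ghz over base_ghz, in percent."""
    if base_ghz <= 0:
        raise ValueError("base clock must be positive")
    return (boosted_ghz / base_ghz - 1.0) * 100.0


def boosted_clock(base_ghz: float, percent: float) -> float:
    """Clock speed reached when base_ghz is raised by the given percentage."""
    return base_ghz * (1.0 + percent / 100.0)


# Assumed clocks for the Opteron 6276 (not stated in this conclusion).
BASE_GHZ = 2.3
MAX_TURBO_GHZ = 3.2

print(f"Maximum boost: {boost_percent(BASE_GHZ, MAX_TURBO_GHZ):.0f}%")
print(f"Clock at the observed 18% boost: {boosted_clock(BASE_GHZ, 18):.2f} GHz")
```

Under these assumed clocks, the 18% we observed corresponds to roughly 2.7 GHz, well short of the maximum turbo state.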

In short, we're going to need to do some additional testing and take this server out for another test drive, and we will. Stay tuned for a follow-up article as we investigate other options for improving performance.

106 Comments

veri745 - Tuesday, November 15, 2011 - link
Shouldn't there be 8 x 2MB L2 for Interlagos instead of just 4x?
ClagMaster - Tuesday, November 15, 2011 - link
A core this complex in my opinion has not been optimized to its fullest potential.

Expect better performance when AMD introduces later steppings of this core with regard to power consumption and higher clock frequencies.

I have seen this in earlier AMD and Intel Cores, this new core will be the same.
C300fans - Tuesday, November 15, 2011 - link
1x i7 3960x or 2x Interlagos 6272? It is up to you. Money cow.
tech6 - Tuesday, November 15, 2011 - link
We have a bunch of 6100 in our data center and the performance has been disappointing. They do no better in single thread performance than old 73xx series Xeons. While this is OK for non-interactive stuff, it really isn't good enough for much else. These results just seem to confirm that the Bulldozer series of processors is over-hyped and that AMD is in danger of becoming irrelevant in the server, mobile and desktop market.
mino - Wednesday, November 16, 2011 - link
Actually, for interactive stuff (read VDI/Citrix/containers) core counts rule the roost.
duploxxx - Thursday, November 17, 2011 - link
This is exactly what should be fixed now with the turbo when set correctly. Btw, the 73xx series were not that bad on single-thread performance; it was wide-scale virtualization and IO throughput which were awful on these systems.
alpha754293 - Tuesday, November 15, 2011 - link
"Let us first discuss the virtualization scene, the most important market." Yea, I don't know about that.

Considering that they've already shipped like some half-a-million cores to the leading supercomputers of the world; where some of them are doing major processor upgrades with this new release; I wouldn't necessarily say that it's the most IMPORTANT market. Important, yes. But MOST important...I dunno.

Looking forward to more HPC benchmark results.

Also, you might have to play with thread schedule/process affinity (masks) to make it work right.

See the Techreport article.
JohanAnandtech - Thursday, November 17, 2011 - link
Are you talking about the Euler3D benchmark?

And yes, by any metric (revenue, servers sold) the virtualization market is the most important one for servers. Depending on the report, 60 to 80% of the servers are bought to be virtualized.
alpha754293 - Tuesday, November 15, 2011 - link
Folks: chip-multithreading (CMT) is nothing new.

I would explain it this way: it is the physical, hardware manifestation of simultaneous multi-threading (SMT). Intel's HTT is SMT.

IBM's POWER (since I think as early as POWER4), Sun/Oracle/UltraDense's Niagara (UltraSPARC T-series), maybe even some of the older Crays were all CMT. (Don't quote me on the Crays though. MIPS died before CMT came out. API WOULD have had it probably IF there had been an EV8).

But the way I see it - remember what a CPU IS: it's a glorified calculator. Nothing else/more.

So, if it can't calculate, then it doesn't really do much good. (And I've yet to see an entirely integer-only program).

Doing integer math is fairly easy and straightforward. Doing floating-point math is a LOT harder. If you check the power consumption while solving a linear algebra equation using Gauss elimination (parallelized or using multiple instances of the solver); I can guarantee you that you will consume more power than if you were trying to run VMs.

So the way I see it, if a CPU is a glorified calculator, then a "core" is where/whatever the FPU is. Everything else is just ancillary at that point.
mino - Wednesday, November 16, 2011 - link
1) Power is NOT CMT, it always was a VERY (even by RISC standards) wide SMT design.

2) Niagara is NOT a CMT. It is interleaved multithreading with SMT on top.

Bulldozer indeed IS a first of its kind. With all the associated advantages (future scaling) and disadvantages (alpha version).

There is a nice debate somewhere on cpu.arch groups from the original author(think 1990's) of the CMT concept.
