Bulldozer for Servers: Testing AMD's "Interlagos" Opteron 6200 Series
by Johan De Gelas on November 15, 2011 5:09 PM EST
Maxwell Render Suite
The developers of Maxwell Render Suite, Next Limit, aim to deliver a renderer that is physically correct, simulating light exactly as it behaves in the real world. As a result, their software has developed a reputation for being powerful but slow. "Powerful but slow" always attracts our interest, as such software can make for interesting benchmarks of the latest CPU platforms. Maxwell Render 2.6 was released less than two weeks ago, on November 2, and that is the version we used.
We used the "Benchwell" benchmark, a scene with HDRI (high dynamic range imaging) lighting developed by the user community. Note that we used the 30-day trial version of Maxwell. We converted the reported render time into images rendered per hour to make the numbers easier to interpret.
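The conversion is simple arithmetic; a minimal sketch (the function name and the sample timing are ours, purely for illustration):

```python
def images_per_hour(render_seconds: float) -> float:
    """Convert a reported per-image render time (in seconds) into
    images rendered per hour, so that higher is better."""
    return 3600.0 / render_seconds

# Example: a scene that takes 150 seconds to render
print(images_per_hour(150))  # 24.0
```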
Since Magny-Cours made its entrance, AMD has done rather well in rendering benchmarks, and Maxwell is no different. The Bulldozer-based Opteron 6276 delivers decent but hardly stunning performance: about 4% faster than its predecessor. Interestingly, the Maxwell renderer is not limited by SSE (floating point) performance. When we disabled CMT, the Opteron 6276 delivered only 17 images per hour; in other words, the extra integer cluster delivers 44% higher performance. It is quite possible that part of that gain comes from the second load/store unit, which is also disabled when CMT is turned off.
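The 44% figure follows directly from the two scores; a quick sanity check (the ~24.5 images/hour score with CMT enabled is implied by the stated numbers, not quoted from the article):

```python
def percent_gain(baseline: float, improved: float) -> float:
    """Relative speedup of `improved` over `baseline`, in percent."""
    return (improved / baseline - 1.0) * 100.0

# CMT off: 17 images/hour (from the article); CMT on: ~24.5 (implied)
print(round(percent_gain(17.0, 24.5)))  # 44
```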
Rendering: Blender 2.6.0
Blender is a very popular open source renderer with a large community. We tested with the 64-bit Windows version 2.6.0a; if you like, you can easily run this benchmark yourself. We used the metallic robot, a scene with rather complex lighting (reflections) and raytracing. To make the benchmark more repeatable, we changed the following parameters:
- The resolution was set to 2560x1600
- Antialias was set to 16
- We disabled compositing in post processing
- Tiles were set to 8x8 (X=8, Y=8)
- Threads was set to auto (one thread per CPU)
To make the results easier to read, we again converted the reported render time into images rendered per hour, so higher is better.
Last time we checked (Blender 2.5a2) under Windows, the Xeon X5670 was capable of 136 images per hour, while the Opteron 6174 managed 113, making the Xeon about 20% faster. Now the gap widens: the Xeon is 36% faster. Interestingly, we discovered that the Opteron is quite a bit faster when benchmarked under Linux; we will follow up with Linux numbers in the next article. In this benchmark the Opteron 6276 is 4% slower than its older brother, again likely due in part to the newness of its architecture.
106 Comments
veri745 - Tuesday, November 15, 2011 - link
Shouldn't there be 8 x 2MB L2 for Interlagos instead of just 4x?
ClagMaster - Tuesday, November 15, 2011 - link
A core this complex has, in my opinion, not been optimized to its fullest potential. Expect better performance from later steppings of this core, with regard to both power consumption and clock frequency.
I have seen this with earlier AMD and Intel cores; this new core will be the same.
C300fans - Tuesday, November 15, 2011 - link
1x i7 3960x or 2x Interlagos 6272? It is up to you. Money cow.
tech6 - Tuesday, November 15, 2011 - link
We have a bunch of 6100s in our data center and the performance has been disappointing. They do no better in single-threaded performance than the old 73xx series Xeons. While this is OK for non-interactive stuff, it really isn't good enough for much else. These results just seem to confirm that the Bulldozer series of processors is over-hyped and that AMD is in danger of becoming irrelevant in the server, mobile, and desktop markets.
mino - Wednesday, November 16, 2011 - link
Actually, for interactive stuff (read VDI/Citrix/containers) core counts rule the roost.duploxxx - Thursday, November 17, 2011 - link
this is exactly what should be fixed now with the turbo, when set correctly. btw, the 73xx series were not that bad at single-threaded performance; it was wide-scale virtualization and I/O throughput that were awful on these systems.
alpha754293 - Tuesday, November 15, 2011 - link
"Let us first discuss the virtualization scene, the most important market." Yeah, I don't know about that. Considering that they've already shipped something like half a million cores to the leading supercomputers of the world, some of which are doing major processor upgrades with this new release, I wouldn't necessarily say that it's the most IMPORTANT market. Important, yes. But MOST important... I dunno.
Looking forward to more HPC benchmark results.
Also, you might have to play with thread schedule/process affinity (masks) to make it work right.
See the Techreport article.
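On Linux, this kind of pinning can be done with `taskset` or, from Python, with `os.sched_setaffinity`. A hedged sketch (Linux only; the even/odd core split is an ASSUMPTION about how the OS enumerates the two cores of each Bulldozer module, which varies by kernel and BIOS):

```python
import os

# Query the CPUs the current process is allowed to run on (Linux only).
allowed = os.sched_getaffinity(0)
print(f"current affinity mask covers {len(allowed)} CPUs")

# Pin the process to one core per module, e.g. the even-numbered cores.
# ASSUMPTION: the two cores of a module are enumerated adjacently.
even_cores = {c for c in allowed if c % 2 == 0}
if even_cores:
    os.sched_setaffinity(0, even_cores)
    print(f"pinned to cores {sorted(even_cores)}")
```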
JohanAnandtech - Thursday, November 17, 2011 - link
Are you talking about the Euler3D benchmark? And yes, by any metric (revenue, servers sold), the virtualization market is the most important one for servers. Depending on the report, 60 to 80% of servers are bought to be virtualized.
alpha754293 - Tuesday, November 15, 2011 - link
Folks: chip-multithreading (CMT) is nothing new. I would explain it this way: it is the physical, hardware manifestation of simultaneous multithreading (SMT). Intel's HTT is SMT.
IBM's POWER (since as early as POWER4, I think), Sun/Oracle/UltraDense's Niagara (UltraSPARC T-series), and maybe even some of the older Crays were all CMT. (Don't quote me on the Crays, though. MIPS died before CMT came out. API WOULD probably have had it IF there had been an EV8.)
But the way I see it, remember what a CPU IS: a glorified calculator. Nothing else/more.
So if it can't calculate, it doesn't really do much good. (And I have yet to see an entirely integer-only program.)
Doing integer math is fairly easy and straightforward. Doing floating-point math is a LOT harder. If you check the power consumption while solving a linear algebra problem with Gaussian elimination (parallelized, or running multiple instances of the solver), I can guarantee you will consume more power than if you were running VMs.
So the way I see it, if a CPU is a glorified calculator, then a "core" is wherever the FPU is. Everything else is just ancillary at that point.
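As an illustration of the FP-heavy workload the comment describes, here is a minimal Gaussian elimination sketch in pure Python. It is a toy for showing where the floating-point work lives (the O(n^3) multiply-subtract loop), not a power-measurement harness:

```python
def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    A is a list of row lists, b a list; both are modified in place."""
    n = len(A)
    for k in range(n):
        # Partial pivoting: swap in the row with the largest |pivot|.
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]       # floating-point divide
            for j in range(k, n):
                A[i][j] -= m * A[k][j]  # FP multiply-subtract, O(n^3) in total
            b[i] -= m * b[k]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

print(gauss_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))  # [0.8, 1.4]
```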
mino - Wednesday, November 16, 2011 - link
1) POWER is NOT CMT; it always was a VERY wide (even by RISC standards) SMT design. 2) Niagara is NOT CMT; it is interleaved multithreading with SMT on top.
Bulldozer indeed IS the first of its kind, with all the associated advantages (future scaling) and disadvantages (alpha version).
There is a nice debate somewhere in the cpu.arch groups from the original author of the CMT concept (I think from the 1990s).