Conclusions

To help summarize the current situation in the server CPU market, we have drawn up a comparison table of the performance we have measured so far. We'll compare the new Interlagos Opteron 6276 against the outgoing Opteron 6174 as well as teh Xeon X5650.

  Opteron 6276 vs.
Opteron 6174
Opteron 6276 vs.
Xeon X5650
ESXi + Linux -1% -2%
ESXi + Windows = +3%
Cinebench +2% +9%
3DS Max 2012 (iRay) -9% to + 4% -10% to +3%
Maxwell Render +4% +6%
Blender -4% -24%
Encryption/Decryption AES +265% / +275% +2% / +7%
Encryption/Decryption Twofish/Serpent +25% / +25% 31% / 46%
Compression/decompression +10% / +10% -33%/ +22%

Let us first discuss the virtualization scene, the most important market. Unfortunately, with the current power management in ESXi, we are not satisfied with the Performance/watt ratio of the Opteron 6276. The Xeon needs up to 25% less energy and performs slightly better. So if performance/watt is your first priority, we think the current Xeons are your best option.

The Opteron 6276 offers a better performance per dollar ratio. It delivers the performance of $1000 Xeon (X5650) at $800. Add to this that the G34 based servers are typically less expensive than their Intel LGA 1366 counterparts and the price bonus for the new Opteron grows. If performance/dollar is your first priority, we think the Opteron 6276 is an attractive alternative.

And then there is Windows Server 2008 R2. Typically we found that under heavy load (benchmarking at 85-100% CPU load) the power consumption was between 3% (integer) to 7% (FP) higher on the Opteron 6276 than on the Xeons and Opteron 6100, a lot better than under ESXi. Add to this the fact that the new Opteron energy usage at low load is excellent and you understand that we feel that there is no reason to go for the Opteron 6100 anymore. Again, AMD still understands that it should price its CPUs more attractive than the competition, so from the price/performance/watt point of view, the Opteron 6276 is a good cost effective alternative to the Xeon...on the condition that you enable the "high performance" policy and that AMD keeps the price delta the same in the coming months.

That is the good news. We cannot help but to feel a bit disappointed too. AMD promised us (in 2009/2010) that the Opteron 6200 would be significantly faster than the 6100: "unprecedented server performance gains". That is somewhat the case if you recompile your software with the latest and greatest optimized compiler as AMD's own SPEC CINT (+19%), CFP 2006 (+11%) and Linpack benchmarks (+32%) show.

One of the real advantages of a new processor architecture (prime examples where the K7 and K8) is if it performs well in older software too, without requiring a recompile. For some people of the HPC world, recompiling is acceptable and common, but for everybody else (that is probably >95% of the market!), it's best if existing binaries run faster. Administrators generally are not going to upgrade and recompile their software just to make better use of a new server CPU. Hopefully AMD's engineers have been looking into improving the legacy software performance of their latest chip the last few months, because it could use some help.

On the other side of the coin, it is clear that some of the excellent features of the new Opteron are not leveraged by the current software base. The deeper sleep and more advanced core gating is not working to its full potential, and the current operating systems frequently don't appear to know how to get the best from Turbo Core. The clock can be boosted by 39% when half of the cores are active, but an 18% boost was the best we saw (in a single-threaded app!). Simply turning the right knobs gave some tangible power savings (see ESXi) and some impressive performance improvements (see Windows Server 2008).

In short, we're going to need to do some additional testing and take this server out for another test drive, and we will. Stay tuned for a follow-up article as we investigate other options for improving performance.

Other Tests: TrueCrypt and 7-Zip
POST A COMMENT

106 Comments

View All Comments

  • JohanAnandtech - Thursday, November 17, 2011 - link


    1) Niagara is NOT a CMT. It is interleaved multipthreading with SMT on top.

    I haven't studied the latest Niagaras but the T1 was a fine grained mult-threaded CPU. It switched like a gatling gun between threads, and could not execute two threads at the same time.
    Reply
  • Penti - Thursday, November 17, 2011 - link

    SPARC T2 and onwards has additional ALU/AGU resources for a half physical two thread (four logically) solution per core with shared scheduler/pipeline if I remember correctly. That's not when CMT entered the picture according to SUN and Sun engineers any way. They regard the T1 as CMT as it's chip level. It's not just a CMP-chip any how. SMT is just running multiple threads on the cpus, CMP is working the same as SMP on separate sockets. It is not the same as AMDs solution however. Reply
  • Phylyp - Tuesday, November 15, 2011 - link

    Firstly, this was a very good article, with a lot of information, especially the bits about the differences between server and desktop workloads.

    Secondly, it does seem that you need to tune either the software (power management settings) or the chip (CMT) to get the best results from the processor. So, what advise is AMD offering its customers in terms of this tuning? I wouldn't want to pony up hundreds of dollars to have to then search the web for little titbits like switching off CMT in certain cases, or enabling High-performance power management.

    Thirdly, why is the BIOS reporting 32 MB of L2 cache instead of 8 MB?
    Reply
  • mino - Wednesday, November 16, 2011 - link

    No need for tuning - turbo is OS-independent (unless OS power management explicitly disables it aka Windows).
    Just disable the power management on the OS level (= high performance fro Windows) and you are good to go.
    Reply
  • JohanAnandtech - Thursday, November 17, 2011 - link

    The BIOS is simply wrong. It should have read 16 MB (2 orochi dies of 8 MB L3) Reply
  • gamoniac - Tuesday, November 15, 2011 - link

    Thanks, Johan. I run HyperV on Windows Server 2008 R2 SP1 on Phonem II X6 (my workstation) and have noticed the same CPU issue. I previously fixed it by disabling AMD's Cool'n'Quiet BIOS setting. After switching to high performance increase my overall power usage by 9 watts but corrected the CPU capping issue you mentioned.

    Yet another excellent article from AnandTech. Well done. This is how I don't mind spending 1 hour of my precious evening time.
    Reply
  • mczak - Tuesday, November 15, 2011 - link

    L1 data and instruction cache are swapped (instruction is 8x64kB 2-way data is 16x16kB 4-way)
    L2 is 8x2MB 16-way
    Reply
  • JohanAnandtech - Thursday, November 17, 2011 - link

    fixed. My apologies. Reply
  • hechacker1 - Tuesday, November 15, 2011 - link

    Curious if those syscalls for virtualization were improved at all. I remember Intel touting they improved the latency each generation.

    http://www.anandtech.com/show/2480/9

    I'm guessing it's worse considering the increased general cache latency? I'm not sure how the latency, or syscall, is related if at all.

    Just curious as when I do lots of compiling in a guest VM (Gentoo doing lots of checking of packages and hardware capabilities each compile) it tends to spend the majority of time in the kernel context.
    Reply
  • hechacker1 - Tuesday, November 15, 2011 - link

    Just also wanted to add: Before I had a VT-x enabled chip, it was unbearably slow to compile software in a guest VM. I remember measuring latencies of seconds for some operations.

    After getting an i7 920 with VT-x, it considerably improved, and most operations are in the hundred or so millisecond range (measured with latencytop).

    I'm not sure how the latests chips fare.
    Reply

Log in

Don't have an account? Sign up now