The Opteron 6276: a closer look

Author: Johan De Gelas

by Johan De Gelas on February 9, 2012 6:00 AM EST

Making Sense of the New Interlagos Opteron

This second look at the current Xeon and Opteron platforms added OLAP, ERP, and OLTP power and performance data. Combine this with our first review and the other publicly available benchmark and power data and we should be able to evaluate the new Opteron 6200 more accurately. So in which situations does the Opteron 6200 make sense? We'll start with the perspective of the server buyer.

Positioning the Opteron 6276

First, let's look at the pricing. The Opteron 6276 is priced similarly to the E5649, which is clocked 5% lower than the X5650 we tested. If you price a Dell R710 with the Xeon E5649 and compare it with a similarly configured Dell R715 with the Opteron 6276, you end up with more or less the same acquisition cost. However, the E5649 has an 80W TDP and should thus consume a bit less power. That is why we argued that the Opteron 6276 should at least offer a price/performance bonus and perform like an X5650. The X5650 is roughly $220 more expensive per CPU, so the dual socket Xeon system ends up costing about $440 more. On a fully specced server, that is about a 10% price difference.
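The arithmetic behind that comparison can be sketched in a few lines of Python. The $220 per-CPU premium, the two sockets, and the "about 10%" figure come from the paragraph above; the total server price is not stated in the article and is only back-calculated here, so treat it as an implied ballpark rather than a quoted price. The function names are made up for illustration.

```python
# Figures from the text: the X5650 costs roughly $220 more per CPU than the
# E5649 / Opteron 6276, and the systems compared are dual socket.
CPU_PRICE_GAP_USD = 220
SOCKETS = 2
QUOTED_GAP_PERCENT = 10  # "about a 10% price difference"


def system_price_gap(cpu_gap_usd, sockets):
    """Extra acquisition cost of the system with the more expensive CPUs."""
    return cpu_gap_usd * sockets


def implied_server_price(gap_usd, gap_percent):
    """Server price at which gap_usd amounts to gap_percent of the total.

    This is an inference: the article never states the full server price.
    """
    return gap_usd * 100 / gap_percent


gap = system_price_gap(CPU_PRICE_GAP_USD, SOCKETS)
print(f"Price gap for the dual socket system: ${gap}")
print(f"Implied fully specced server price: ${implied_server_price(gap, QUOTED_GAP_PERCENT):.0f}")
```

In other words, the 10% remark only holds for a server costing somewhere around $4,400; on a more expensive configuration the same $440 gap shrinks in relative terms.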

The Opteron 6276 offered similar performance to the Xeon in our MySQL OLTP benchmarks. If we take into account the hard to quantify TPC-C benchmarks, the Opteron 6276 offers equal to slightly better OLTP performance. So for midrange OLTP systems, the Opteron 6276 makes sense if its higher core count does not increase your software licensing costs. The same is true for low end ERP systems.
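To make the licensing caveat concrete, here is a minimal sketch comparing per-socket and per-core licensing for two dual socket servers. The core counts (16 per Opteron 6276, 6 per Xeon X5650) are the real specifications of those CPUs; the license price is purely hypothetical, and real vendors often apply per-core multipliers or other rules that this sketch ignores.

```python
# Hypothetical price per licensed unit (socket or core); NOT a real vendor price.
HYPOTHETICAL_UNIT_PRICE_USD = 1000

# Cores per CPU: 16 for the Opteron 6276, 6 for the Xeon X5650.
CORES_PER_CPU = {"Opteron 6276": 16, "Xeon X5650": 6}


def license_cost(sockets, cores_per_cpu, unit_price, per_core):
    """License cost of one server under per-core or per-socket licensing."""
    units = sockets * cores_per_cpu if per_core else sockets
    return units * unit_price


for cpu, cores in CORES_PER_CPU.items():
    per_socket = license_cost(2, cores, HYPOTHETICAL_UNIT_PRICE_USD, per_core=False)
    per_core = license_cost(2, cores, HYPOTHETICAL_UNIT_PRICE_USD, per_core=True)
    print(f"{cpu}: per-socket ${per_socket}, per-core ${per_core}")
```

Under per-socket licensing the two servers cost the same to license, so the hardware comparison above stands. Under naive per-core licensing the 32-core Opteron server needs far more licensed units than the 12-core Xeon server, which can easily outweigh any hardware savings.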

When we look at the higher end OLTP market and the ERP market above the low end, the cost of buying server hardware is lost in the noise. The Westmere-EX will be the top choice in that case, thanks to its higher thread count and performance, better RAS, and larger number of DIMM slots.

AMD also lost the low end OLAP market: the Xeon offers a (far) superior performance/watt ratio on MySQL. In the midrange and high end OLAP market, software costs (those of SQL Server, for example) increase the importance of performance and performance/watt and make server hardware costs a minor issue. The "performance first" OLAP market in particular will be dominated by the Xeon, which is available in SKUs of up to 3.06GHz without an increase in TDP.

The strong HPC performance and the low price continue to make the Opteron a very attractive platform for HPC applications. While we haven't tested this ourselves, even Intel admits that they are "challenged in that area".

The Xeon E5, aka Sandy Bridge EP

There is little doubt that the Xeon E5 will be a serious threat to the new Opteron. The Xeon E5 offers, for example, twice the peak AVX throughput. Add to this the fact that the Xeon will get a quad channel DDR3-1600 memory interface and you know that the Opteron's leadership in HPC applications is going to be challenged. Luckily for AMD, the 8-core top models of the Xeon E5 will not be cheap according to leaked price tables. Much will depend on how the 6-core midrange models fare against the Opteron.

The Hardware Enthusiast Point of View

The disappointing results in non-server applications are easy to explain, as the architecture is clearly targeted more at server workloads. However, the server workloads show a very blurry picture as well. The server performance results of the new Opteron are nothing less than confusing: it can be very capable in some applications (OLTP, ERP, HPC) but disappointing in others (OLAP, rendering). The same is true for the performance/watt results. And of course, if you name a new architecture Bulldozer and you target it at the server space, you expect something better than "similar to a midrange Xeon".

It is clear to us that quite a few things are suboptimal in the first implementation of this new AMD architecture. The second integer cluster (CMT) itself is doing an excellent job: as long as the front end is working at full speed, we measured a solid 70 to 90% increase in performance from enabling CMT (we will give more detail in our next article). CMT works superbly and always gives better results than SMT... until you end up with heavy locking contention issues. That indicates that something goes wrong in the front end. Software that does not scale well could be served well by low core count "Valencia" Opteron 4200s, but as we write this, the best AMD can offer is a 3.3GHz 6-core. The architecture is clearly capable of reaching very high clockspeeds, but we saw very little performance increase from Turbo Core.

What we end up with then is more questions. That means it's time for us to do some deep profiling and see if we can get some more answers. Until then, we hope you've enjoyed our second round of Interlagos benchmarking, and as always, comments and feedback on our testing methods are welcome.

46 Comments

Jaguar36 - Thursday, February 9, 2012 - link
I too would love to see more HPC related benchmarks. Finite Element Analysis (FEA) or Computational Fluid Dynamic (CFD) programs scale very well with increased core count, and are something that is highly CPU dependent. I've found it very difficult to find good performance information for CPUs under this load.

I'd be happy to help out developing some benchmark problems if need be.
dcollins - Thursday, February 9, 2012 - link
These would indeed be interesting benchmarks to see. These workloads are very floating point heavy so I imagine that the new Opterons will perform poorly. 16 modules won't matter when they only have 8 FPUs. Of course, I am speculating here.

Going forward, these types of workloads should be moving toward GPUs rather than CPUs, but I understand the burden of legacy software.
silverblue - Friday, February 10, 2012 - link
They have 8 FPUs capable of 16x 128-bit or 8x 256-bit instructions per clock. On that level, it shouldn't be at a disadvantage.
bnolsen - Sunday, February 12, 2012 - link
GPUs are pretty poor for general purpose HPC. If someone wants to fork out tons of $$$ to hack their problem onto a gpu (or they get lucky and somehow their problem fits a gpu well) that's fine but not really smart considering how short release cycles are, etc.

I have access to a quad socket magny cours built mid last year. In december I put together a sandy-e 3930k portable demo system. Needless to say the 3930k had at least 10% more throughput on heavy processing tasks (enabling all intel sse dropped in another 15%). It also handily beat our dual xeon nehalem development system as well. With mixed IO and cpu heavy loads the advantage dropped but was still there.

I'd love to be able to test these new amds just to see but its been much easier telling customers to stick with intel, especially with this new amd cpu.
MySchizoBuddy - Friday, March 9, 2012 - link
"GPUs are pretty poor for general purpose HPC."
tell that to the #2, #4 and #5 most powerful supercomputers in the world. I'm sure no one told them.
hooflung - Thursday, February 9, 2012 - link
I think I'd rather see some benchmarks based around Java EE6 and an appropriate container such as Jboss AS 7. I'd also like to see some Java 7 application benchmarks ( server oriented ).

I'd also like to see some custom Java benchmarks using Akka library so we can see some Software transactional memory benchmarks. Possibly a node.js benchmark as well to see if these new technologies can scale.

What I've seen here is that the enterprise circa 2006 has a love hate relationship with AMD. I'd also like to see some benchmarks of the Intel vs AMD vs SPARC T4 in both virtualized and non virtualized J2EE environments. But this article does have some really interesting data.
jibberegg - Thursday, February 9, 2012 - link
Thanks for the great and informative article! Minor typo for you...

"Using a PDU for accurate power measurements might same pretty insane"
should be
"Using a PDU for accurate power measurements might seem pretty insane"
phoenix_rizzen - Thursday, February 9, 2012 - link
MySQL has to be the absolute worst possible choice for testing multi-core CPUs (as evidenced in this review). It just doesn't scale beyond 4-8 cores, depending on CPU choice and MySQL version.

A much better choice for "alternative SQL database" would be PostgreSQL. That at least scales to 32 cores (possibly more, but I've never seen a benchmark beyond 32). Not to mention it's a much better RDBMS than MySQL.

MySQL really is only a toy. The fact that many large websites run on top of MySQL doesn't change that fact.
PixyMisa - Friday, February 10, 2012 - link
This is a very good point. While it can be done, it's very fiddly to get MySQL to scale to many CPUs, much simpler to just shard the database and run multiple instances of MySQL. (And replication is single-threaded anyway, so if you manage to get one MySQL instance running with very high inserts/updates, you'll find replication can't keep up.)

Same goes for MongoDB and, of course, Redis, which is single-threaded.

We have ten large Opteron servers running CentOS 6, five 32-core and five 48-core, and all our applications are sharded and virtualised at a point where the individual nodes still have room to scale. Since our applications are too large to run un-sharded anyway, and the e7 Xeons cost an absolute fortune, the Opteron was the way to go.

The only back-end software we've found that scales smoothly to large numbers of CPUs is written in Erlang - RabbitMQ, CouchDB, and Riak. We love RabbitMQ and use it everywhere; unfortunately, while CouchDB and Riak scale very nicely, they start out pretty darn slow.

We actually ran a couple of 40-core e7 Xeon systems for a few months, and they had some pretty serious performance problems for certain workloads too - where the same workload worked fine on either a dual X5670 or a quad Opteron. Working out why things don't scale is often more work than just fixing them so that they do; sometimes the only practical thing to do is know what platform works for what workload, and use the right hardware for the task at hand.

Having said all that, the MySQL results are still disappointing.
JohanAnandtech - Friday, February 10, 2012 - link
"It just doesn't scale beyond 4-8 cores, depending on CPU choice and MySQL version."

You missed something: it does scale beyond 12 Xeon cores, and I estimate that scaling won't be bad until you go beyond 24 cores. I don't see why the current implementation of MySQL should be called a toy.

PostgreSQL: interesting, several readers have told me this too. I hope it is true, because the last time we tested it, PostgreSQL was worse than the current MySQL.
