The Opteron 6276: a closer look

Author: Johan De Gelas

by Johan De Gelas on February 9, 2012 6:00 AM EST

Threading Tricks or Not?

AMD claimed more than once that Clustered Multi Threading (CMT) is a much more efficient way to crunch through server applications than Simultaneous Multi Threading (SMT), aka Hyper-Threading (HTT). We wanted to check this, so for our next tests we disabled and enabled CMT and HTT. Below you can see how we disabled CMT in the Supermicro BIOS Setup:

First, we look at raw throughput (TP in the table). All measurements were done with the "High Performance" power policy.

Concurrency	CMT (q/s)	No CMT (q/s)	TP Increase CMT vs. No CMT	HTT (q/s)	No HTT (q/s)	TP Increase HTT vs. No HTT
25	24	24	100%	24	25	100%
40	39	39	100%	39	39	100%
80	77	77	100%	78	78	100%
100	96	96	100%	97	98	100%
125	120	118	101%	122	122	100%
200	189	183	103%	193	192	100%
300	275	252	109%	282	278	102%
350	312	269	116%	321	315	102%
400	344	276	124%	350	339	103%
500	380	281	135%	392	367	107%
600	390	286	136%	402	372	108%
800	389	285	137%	405	379	107%

Only at 300 concurrent users (or queries per second) do the CPUs start to get close to their maximum throughput (around 400 q/s). That is also the point where the multi-threading technologies start to pay off.
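The ratio columns in the throughput table can be recomputed from the raw numbers. The following is a minimal sketch, not the tooling used for the article; the names `THROUGHPUT`, `ratio_percent` and `throughput_ratios` are made up for illustration. Because the published throughput figures are already rounded, a recomputed ratio can differ from the published one by a percentage point.

```python
# Throughput in queries per second, copied from the table above.
# Tuple layout: (concurrency, CMT, no CMT, HTT, no HTT)
THROUGHPUT = [
    (25, 24, 24, 24, 25),
    (40, 39, 39, 39, 39),
    (80, 77, 77, 78, 78),
    (100, 96, 96, 97, 98),
    (125, 120, 118, 122, 122),
    (200, 189, 183, 193, 192),
    (300, 275, 252, 282, 278),
    (350, 312, 269, 321, 315),
    (400, 344, 276, 350, 339),
    (500, 380, 281, 392, 367),
    (600, 390, 286, 402, 372),
    (800, 389, 285, 405, 379),
]


def ratio_percent(enabled, disabled):
    """Throughput with the feature on, as a percentage of throughput with it off."""
    return 100.0 * enabled / disabled


def throughput_ratios(rows):
    """Map each concurrency level to its (CMT ratio, HTT ratio) pair."""
    return {
        concurrency: (ratio_percent(cmt, no_cmt), ratio_percent(htt, no_htt))
        for concurrency, cmt, no_cmt, htt, no_htt in rows
    }


# Print one line per concurrency level.
for concurrency, (cmt, htt) in throughput_ratios(THROUGHPUT).items():
    print(f"{concurrency:>4}  CMT {cmt:6.1f}%  HTT {htt:6.1f}%")
```

Reading the output from top to bottom shows the same pattern as the table: both ratios stay near 100% until the CPUs approach saturation, after which the CMT ratio climbs much faster than the HTT ratio.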

It is interesting to note that the average IPC of one MS SQL Server thread is about 0.95-1.0 (measured with Intel VTune). That is low enough to leave quite a few unused execution slots in the Xeon, which is ideal for Hyper-Threading. However, Hyper-Threading is only capable of delivering a 3-8% performance boost.

On the AMD Opteron we measured an IPC of 0.72-0.8 (measured with AMD CodeAnalyst). That should also be more than low enough to allow two threads to pass through the shared front-end without obstructing each other. While it is not earth-shattering, CMT does not disappoint: we measured a very solid 24-37% increase in throughput. Now let's look at the response times (RT in the table).

Concurrency	CMT (ms)	No CMT (ms)	RT Increase (CMT vs. No CMT)	HTT (ms)	No HTT (ms)	RT Increase (HTT vs. No HTT)
25	29	28.5	2%*	20.4	18.9	8%*
40	31.1	32.1	-3%*	21.7	20.3	7%*
80	36	39	-9%*	24	23	2%*
100	39	46	-14%	28	25	13%
125	46	57	-20%	28	28	0%
200	59	92	-35%	38	40	-4%
300	92	189	-51%	62	79	-21%
350	121	303	-60%	91	112	-19%
400	164	452	-64%	143	182	-21%
500	320	788	-59%	278	335	-17%
600	545	1111	-51%	498	621	-20%
800	1003	1825	-45%	989	1120	-12%

* Difference between results is within error margin and thus unreliable.

The SQL Server engine shows excellent scaling and is ideal for CMT and Hyper-Threading. CMT seems to reduce the response time even at low loads. This is not the case for Hyper-Threading, but we must be careful when interpreting the results. At the lower concurrencies, the measured response times are so small that the differences fall within the error margin. A 21.7 ms response time is indeed 7% more than a 20.3 ms response time, but the error margin of these measurements is much higher at these very low concurrencies than at the higher ones, so take these percentages with a grain of salt.
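The 7% figure and the caveat about the error margin can be illustrated with a few lines of Python. This is only a sketch: the article does not state how large its error margin is, so `ASSUMED_MARGIN_MS` is a purely hypothetical value, and the function names are invented for this example. It is not expected to reproduce the asterisks in the table exactly.

```python
# Hypothetical value: the article does not publish its measurement error margin.
ASSUMED_MARGIN_MS = 2.0


def relative_change_percent(enabled_ms, disabled_ms):
    """Response time change when the feature is enabled, relative to disabled."""
    return 100.0 * (enabled_ms - disabled_ms) / disabled_ms


def within_error_margin(enabled_ms, disabled_ms, margin_ms=ASSUMED_MARGIN_MS):
    """True when the two measurements are too close to tell apart."""
    return abs(enabled_ms - disabled_ms) <= margin_ms


# HTT vs. no HTT at 40 concurrent users: a 7% increase, but only 1.4 ms apart.
print(round(relative_change_percent(21.7, 20.3)), within_error_margin(21.7, 20.3))

# HTT vs. no HTT at 300 concurrent users: 17 ms apart, well outside the margin.
print(round(relative_change_percent(62, 79)), within_error_margin(62, 79))
```

The point of the second check is that a percentage on its own says little: a 1.4 ms gap looks large in relative terms only because both response times are so short.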

What we can say is that Hyper-Threading only starts to reduce the response times when the CPU goes beyond 50% load. CMT reduces the response times much more than HTT, but the non-CMT response times are already twice (and more) as high as the non-HTT response times.
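The comparison between the two baselines (Opteron without CMT versus Xeon without HTT) can be checked against the response time table. The sketch below simply divides one column by the other; `BASELINE_RT_MS` and `baseline_ratios` are names chosen for this example. With the rounded values from the table, the ratio is above 2 between 125 and 500 concurrent users and below 2 at the lowest and highest concurrencies.

```python
# Response times in ms with multi-threading disabled, copied from the table above.
# Tuple layout: (concurrency, Opteron without CMT, Xeon without HTT)
BASELINE_RT_MS = [
    (25, 28.5, 18.9),
    (40, 32.1, 20.3),
    (80, 39, 23),
    (100, 46, 25),
    (125, 57, 28),
    (200, 92, 40),
    (300, 189, 79),
    (350, 303, 112),
    (400, 452, 182),
    (500, 788, 335),
    (600, 1111, 621),
    (800, 1825, 1120),
]


def baseline_ratios(rows):
    """Map each concurrency level to the non-CMT / non-HTT response time ratio."""
    return {concurrency: opteron / xeon for concurrency, opteron, xeon in rows}


# Print the ratio for every concurrency level.
for concurrency, ratio in baseline_ratios(BASELINE_RT_MS).items():
    print(f"{concurrency:>4}  non-CMT is {ratio:.2f}x the non-HTT response time")
```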

In the end, both multi-threading technologies improve performance. CMT seems to be quite a bit more efficient than SMT; however, it must be said that the Xeon with HTT disabled already offers response times that are much lower than the Opteron without CMT. So you could also argue the other way around: the Xeon already does a very good job of filling its pipelines (IPC of 1 versus 0.72), and there is less headroom available.

46 Comments

Scali - Friday, February 10, 2012 - link
No, because if you read the ENTIRE benchmark configuration page, you'd see that all the AMD systems had 2 CPUs as well.
Scali - Saturday, February 11, 2012 - link
Oh, and while we're at it... the Intel system had only 48 GB of 1333 memory, where the AMDs had 64 GB of 1600 memory.
(Yes, Bulldozer is THAT bad)
PixyMisa - Saturday, February 11, 2012 - link
Or rather, MySQL scales that poorly.

What we can tell from this article is that if you want to run a single instance of MySQL as fast as possible and don't want to get involved with subtle performance tuning options, the Opteron 6276 is not the way to go.

For other workloads, the result can be very different.
JohanAnandtech - Saturday, February 11, 2012 - link
Feel free to send me a suggestion on how to set up another workload. We know how to tune MySQL. So far none of these settings helped. The issue discussed (spinlocks) cannot be easily solved.
Scali - Saturday, February 11, 2012 - link
I'm not sure if you bothered to read the entire article, because MySQL was not the only database that was tested.
There were also various tests with MS SQL, and again, Interlagos failed to impress compared to both Magny Cours-based Opterons and the Xeon system.
JohanAnandtech - Saturday, February 11, 2012 - link
The clockspeed of the RAM has a small impact here. 64 vs 48 GB does not matter.
Scali - Saturday, February 11, 2012 - link
Not saying it does... Just pointing out that the AMD system had more impressive specs on paper, yet failed to deliver the performance.
JohanAnandtech - Saturday, February 11, 2012 - link
Again, it is not CMT that makes AMD's transistor count explode but the combination of 2x L3 caches and 4x 2M L2-caches. You can argue that AMD made poor choices concerning caches, but again it is not CMT that made the transistor count grow.

I am not arguing that AMD's performance/billion transistors is great.
Scali - Saturday, February 11, 2012 - link
I think you are looking at it from the wrong direction.
You are trying to compare SMT and CMT, but contrary to what AMD wants to make everyone believe, they are not very similar technologies.
You see, SMT enables two threads to run on one physical core, without adding any kind of execution units, cache or anything. It is little more than some extra logic so that the OoOE buffers can handle two thread contexts at the same time, rather than one.

So the thing with SMT is that it REDUCES the transistorcount required for running two threads. By nearly 100%.
CMT on the other hand does not reduce the transistorcount nearly as much. So if you are merely looking at an 'explosion of transistor count', you are missing the point of what SMT really does.

Other than that, your argument is still flawed. Even an 8-thread Bulldozer has a higher transistor count than the 12-thread Xeon here. It's not just cache. CMT just doesn't pack as many threads per transistor as SMT does... and to make matters worse, CMT also has a negative impact on single-threaded performance (which again, if you are looking at it from the wrong direction, may look like better scaling in threadcount... but effectively, both with low and high threadcounts, the Xeon is the better option... and this is just a midrange Xeon compared to a high-end Interlagos. The Xeon can scale to higher clockspeeds, improving both single-threaded and multithreaded performance for the same transistorcount).

So what your article says is basically this:
CMT, which is nearly the same as having full cores, especially in integer-only tasks such as databases, since you have two actual integer cores, has nearly the same scaling in threadcount as conventional multicore CPUs.
Which has a very high 'duh'-factor, since it pretty much *is* conventional multicore.
It does not reduce transistorcount, nor does it improve performance, so what's the point?
JohanAnandtech - Friday, February 10, 2012 - link
Semantics :-). I can call it a core with CMT, or a module with 2 cores. Both are valid.
