Threading Tricks or Not?
AMD has claimed more than once that Clustered Multi Threading (CMT) is a much more efficient way to crunch through server applications than Simultaneous Multi Threading (SMT), aka Hyper-Threading (HTT). We wanted to check this, so for our next tests we ran with CMT and HTT both enabled and disabled. CMT was disabled in the Supermicro BIOS Setup.
First, we look at raw throughput (TP in the table). All measurements were done with the "High Performance" power policy.
| Concurrency | CMT (q/s) | No CMT (q/s) | TP Increase CMT vs. No CMT | HTT (q/s) | No HTT (q/s) | TP Increase HTT vs. No HTT |
| 25 | 24 | 24 | 100% | 24 | 25 | 100% |
| 40 | 39 | 39 | 100% | 39 | 39 | 100% |
| 80 | 77 | 77 | 100% | 78 | 78 | 100% |
| 100 | 96 | 96 | 100% | 97 | 98 | 100% |
| 125 | 120 | 118 | 101% | 122 | 122 | 100% |
| 200 | 189 | 183 | 103% | 193 | 192 | 100% |
| 300 | 275 | 252 | 109% | 282 | 278 | 102% |
| 350 | 312 | 269 | 116% | 321 | 315 | 102% |
| 400 | 344 | 276 | 124% | 350 | 339 | 103% |
| 500 | 380 | 281 | 135% | 392 | 367 | 107% |
| 600 | 390 | 286 | 136% | 402 | 372 | 108% |
| 800 | 389 | 285 | 137% | 405 | 379 | 107% |
Only at 300 concurrent users (or queries per second) do the CPUs start to get close to their maximum throughput (around 400 q/s). That is also the point where the multi-threading technologies start to pay off.
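To make the "TP Increase" columns concrete, here is a minimal Python sketch that recomputes them as throughput with the feature enabled divided by throughput with it disabled. The function name `tp_increase` and the choice of rows are ours, purely for illustration. Note that the table values are rounded, so recomputing other rows this way can land a percentage point away from the published column.

```python
# Throughput in queries/s copied from the table above:
# concurrency -> (feature enabled, feature disabled).
CMT = {350: (312, 269), 500: (380, 281), 600: (390, 286)}
HTT = {350: (321, 315), 500: (392, 367), 600: (402, 372)}


def tp_increase(enabled, disabled):
    """Throughput with the feature on, as a whole-number percentage of throughput with it off."""
    return round(100 * enabled / disabled)


for users in sorted(CMT):
    cmt = tp_increase(*CMT[users])
    htt = tp_increase(*HTT[users])
    print(f"{users:>4} users: CMT {cmt}%  HTT {htt}%")
```

For these three rows the sketch reproduces the published percentages (116%, 135%, 136% for CMT and 102%, 107%, 108% for HTT).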
It is interesting to note that the average IPC of one MS SQL Server thread is about 0.95-1.0 (measured with Intel VTune). That is low enough to leave quite a few unused execution slots in the Xeon, which is ideal for Hyper-Threading. However, Hyper-Threading is only capable of delivering a 3-8% performance boost.
On the AMD Opteron we measured an IPC of 0.72-0.8 (measured with AMD CodeAnalyst). That should also be more than low enough to allow two threads to pass through the shared front-end without obstructing each other. While it is not earth-shattering, CMT does not disappoint: we measured a very solid 24-37% increase in throughput. Now let's look at the response times (RT in the table).
| Concurrency | CMT (ms) | No CMT (ms) | RT Increase CMT vs. No CMT | HTT (ms) | No HTT (ms) | RT Increase HTT vs. No HTT |
| 25 | 29 | 28.5 | 2%* | 20.4 | 18.9 | 8%* |
| 40 | 31.1 | 32.1 | -3%* | 21.7 | 20.3 | 7%* |
| 80 | 36 | 39 | -9%* | 24 | 23 | 2%* |
| 100 | 39 | 46 | -14% | 28 | 25 | 13% |
| 125 | 46 | 57 | -20% | 28 | 28 | 0% |
| 200 | 59 | 92 | -35% | 38 | 40 | -4% |
| 300 | 92 | 189 | -51% | 62 | 79 | -21% |
| 350 | 121 | 303 | -60% | 91 | 112 | -19% |
| 400 | 164 | 452 | -64% | 143 | 182 | -21% |
| 500 | 320 | 788 | -59% | 278 | 335 | -17% |
| 600 | 545 | 1111 | -51% | 498 | 621 | -20% |
| 800 | 1003 | 1825 | -45% | 989 | 1120 | -12% |
* Difference between results is within error margin and thus unreliable.
The SQL Server engine shows excellent scaling and is ideal for CMT and Hyper-Threading. CMT seems to reduce the response time even at low loads. This is not the case for Hyper-Threading, but we must be careful when interpreting the results. At the lower concurrencies, the measured response times are so small that the differences fall within the error margin. A 21.7 ms response time is indeed 7% more than a 20.3 ms response time, but the error margin of these measurements is much higher at these very low concurrencies than at the higher concurrencies, so take these percentages with a grain of salt.
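The relative response time change discussed above can be sketched in a few lines of Python. This is only an illustration: the function `rt_change` and the cutoff `NOISY_UP_TO` are our own names, and the cutoff simply mirrors the rows marked with an asterisk in the table, since no numeric error margin is given. Because the table values are rounded, the recomputed percentages can differ slightly from the published column.

```python
# Response times in ms from the HTT columns of the table above:
# concurrency -> (HTT enabled, HTT disabled).
HTT_RT = {25: (20.4, 18.9), 40: (21.7, 20.3), 300: (62, 79), 400: (143, 182)}

# Assumption: rows marked with an asterisk in the table (concurrency 80 and
# below) are treated as within the error margin. No numeric margin is given.
NOISY_UP_TO = 80


def rt_change(enabled, disabled):
    """Relative response time change in percent; negative means faster with the feature on."""
    return 100 * (enabled - disabled) / disabled


for users, (on, off) in sorted(HTT_RT.items()):
    flag = " (within error margin)" if users <= NOISY_UP_TO else ""
    print(f"{users:>4} users: {rt_change(on, off):+.1f}%{flag}")
```

The 1.4 ms gap between 21.7 ms and 20.3 ms comes out at roughly +7%, which looks significant as a percentage but is tiny in absolute terms.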
What we can say is that Hyper-Threading only starts to reduce the response times when the CPU goes beyond 50% load. CMT reduces the response times much more than HTT, but the non-CMT response times are already much higher than the non-HTT response times: more than twice as high between 200 and 500 concurrent users.
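As a quick sanity check on that comparison, the following Python sketch divides the Opteron's non-CMT response times by the Xeon's non-HTT response times for a few rows of the table. The helper name `rt_ratio` is ours and only serves the illustration.

```python
# Response times in ms with multi-threading disabled, from the RT table above.
NO_CMT = {25: 28.5, 200: 92, 300: 189, 400: 452, 500: 788, 800: 1825}
NO_HTT = {25: 18.9, 200: 40, 300: 79, 400: 182, 500: 335, 800: 1120}


def rt_ratio(users):
    """How many times slower the Opteron without CMT is than the Xeon without HTT."""
    return NO_CMT[users] / NO_HTT[users]


for users in sorted(NO_CMT):
    print(f"{users:>4} users: {rt_ratio(users):.2f}x")
```

The ratio stays above 2 in the 200 to 500 range and drops below 2 at both the lowest and the highest concurrency in this selection.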
In the end, both multi-threading technologies improve performance. CMT seems to be quite a bit more efficient than SMT; however, it must be said that the Xeon with HTT disabled already offers response times that are much lower than the Opteron without CMT. So you could also argue the other way around: the Xeon already does a very good job of filling its pipelines (IPC of 1 versus 0.72), and there is less headroom available.