Conclusions: SMT On

I wasn’t too sure what we were going to see when I started this testing. I know the theory behind implementing SMT, and what it means for the instruction streams having access to core resources, and how cores that have SMT in mind from the start are built differently to cores that are just one thread per core. But theory only gets you so far. Aside from all the forum messages over the years talking about performance gains/losses when a product has SMT enabled, and the few demonstrations of server processors running focused workloads with SMT disabled, it is actually worth testing on real workloads to find if there is a difference at all.

Results Overview

In our testing, we covered three areas: Single Thread, Multi-Thread, and Gaming Performance.

In single threaded workloads, where each thread has access to all of the resources in a single core, we saw no change in performance when SMT is enabled – all of our workloads were within 1% either side.

In multi-threaded workloads, we saw an average uplift in performance of +22% when SMT was enabled. Most of our tests scored a +5% to a +35% gain in performance. A couple of workloads scored worse, mostly due to resource contention having so many threads in play – the limit here is memory bandwidth per thread. One workload scored +60%, a computational workload with little-to-no memory requirements; this workload scored even better in AVX2 mode, showing that there is still some bottleneck that gets alleviated with fewer instructions.

In gaming, there was no overall difference between SMT On and SMT Off, although some games showed differences in CPU-limited scenarios. Deus Ex was down almost 10% when CPU limited, while Borderlands 3 was up almost 10%. As we moved to more GPU-limited scenarios, those discrepancies were neutralized, with a few games still gaining a single-digit percentage improvement with SMT enabled.

For power and performance, we tested two examples where running two threads per core gave either no improvement (Agisoft) or a significant improvement (3DPMavx). In both cases, SMT Off mode (1 thread/core) ran at higher temperatures and higher frequencies. For the benchmark where performance was about equal, the power consumed was a couple of percentage points lower when running one thread per core. For the benchmark where running two threads per core gave a big performance increase, the power in that mode was also lower, and there was a significant +91% performance per watt improvement from enabling SMT.
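As a rough illustration of how a performance per watt figure like the +91% above is derived, the sketch below divides a benchmark score by average power for each mode and compares the two ratios. The function name and the input numbers are hypothetical placeholders chosen for this example, not measurements from our testing.

```python
def perf_per_watt_gain(perf_off, power_off, perf_on, power_on):
    """Fractional change in performance per watt going from SMT Off to SMT On.

    perf_*  : benchmark score (higher is better), any consistent unit
    power_* : average package power in watts during the run
    """
    efficiency_off = perf_off / power_off
    efficiency_on = perf_on / power_on
    return efficiency_on / efficiency_off - 1.0


# Placeholder inputs (assumptions for illustration only):
# SMT On scores 50% higher while drawing 10% less power.
gain = perf_per_watt_gain(perf_off=100.0, power_off=100.0,
                          perf_on=150.0, power_on=90.0)
print(f"Performance per watt change with SMT On: {gain:+.0%}")
```

Because the two effects multiply, a performance increase combined with a small power reduction yields an efficiency gain larger than either number on its own.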

What Does This Mean?

I mentioned at the beginning of the article that SMT performance gains can be seen from two different viewpoints.

The first is that if SMT enables more performance, then it’s an easy switch to use, and some users consider that if you can get perfect scaling, then SMT is an effective design.

The second is that if SMT enables too much performance, then it’s indicative of a bad core design. If you can get perfect scaling with SMT2, then perhaps something is wrong about the design of the core and the bottleneck is quite bad.

Having poor SMT scaling doesn’t always mean that the SMT is badly implemented – it can also imply that the core design is very good. If an effective SMT design can be interpreted as a poor core design, then it’s quite easy to see that vendors can’t have it both ways. Every core design has deficiencies (that much is true), and both Intel and AMD will tell their users that SMT enables the system to pick up extra bits of performance where workloads can take advantage of it, and for real-world use cases, there are very few downsides.

We’ve known for many years that having two threads per core is not the same as having two cores – in a worst case scenario, there is some performance regression as more threads try to fight for cache space, but those use cases seem to be highly specialized for HPC and supercomputer-like tasks. SMT in the real world fills in the gaps where gaps are available, and this occurs mostly in heavily multi-threaded applications with no cache contention. In the best case, SMT offers a sizeable performance per watt increase. But on average, there are modest (+22% on MT) gains to be had, and gaming performance isn’t disturbed, so it is worth keeping enabled on Zen 3.

 

126 Comments


  • abufrejoval - Thursday, December 3, 2020 - link

    It's hard to imagine a transistor defect that would break *only* SMT. As you say all non-SMT chips are really SMT chips internally and the decision to disable SMT doesn't really result in huge chunks of transistors going dark (the potential target area for physical defects).

    I'd say most of the SMT vs. no-SMT decisions on individual CPUs are binning related: SMT can create significantly more heat because there is less idle which allows the chip to cool. So if you have a chip with higher resistance in critical vias and require higher voltage to function, you need to sacrifice clocks, TDP or utilization (and permutations).
  • leexgx - Saturday, December 5, 2020 - link

    With HT off I have definitely noticed less smoothness in Windows, as with HT on it can keep the CPU active when a thread is slightly stuck.
  • iranterres - Thursday, December 3, 2020 - link

    Why are people still testing SMT in 2020? Cache coherency and hierarchy design is mature enough to offset the possible instruction bottleneck issues. I don't even know the purpose of this article at all... Anyways, perhaps falling back to 2008? Come on...
  • quadibloc - Friday, December 4, 2020 - link

    Well, instead of testing the concept of SMT, which has been around for a while, perhaps one could think of it as testing the implementation of SMT found on the chips we can get in 2020.
  • eastcoast_pete - Friday, December 4, 2020 - link

    Thanks Ian! I always thought of SMT as a way of using whatever compute capacity a core has, but isn't being used in the moment. Hence it's efficient if many tasks need doing that each don't take a full core most of the time. However, that hits a snag if the cores get really busy. Hence (for desktop or laptop), 6 or 8 real cores are usually better than 4 cores that pretend to be 8.
  • AntonErtl - Friday, December 4, 2020 - link

    I found the "Is SMT a good thing" discussion (and later discussion of the same topics) strange, because it seemed to take the POV of someone who wants to optimize some efficiency or utilization metric of someone who can choose the number of resources in the core. If you are in that situation, then the take of the EV8 designers was: we build a wide machine so that single-threaded applications can run fast, even though we know the wideness leads to low utilization; we also add SMT so that multi-threaded applications can increase utilization. Now, 20 years later, such wide cores become reality, although interestingly Apple and ARM do not add SMT.

    Anyway, buyers and users of off-the-shelf CPUs are not in that situation, and for them the questions are: For buyers: How much benefit does the SMT capability provide, and is it worth the extra money? For users: Does disabling SMT on this SMT-capable CPU increase the performance or the efficiency?

    The article shows that the answers to these questions depend on the application (although for the Zen3 CPUs available now the buyer's question does not pose itself).

    It would be interesting to see whether the wider Zen3 design gives significantly better SMT performance than Zen or Zen2 (and maybe also a comparison with Intel), but that would require also testing these CPUs.

    I did not find it surprising that the 5950X runs into the power limit with and without SMT. The resulting clock rates are mentioned in the text, but might be more interesting graphically than the temperature. What might also be interesting is the power consumed at the same clock frequency (maybe with fewer active cores and/or the clock locked at some lower clock rate).

    If SMT is so efficient (+91%) for 3DPMavx, why do the graphs only show a small difference?
  • Bensam123 - Friday, December 4, 2020 - link

    Anand, while I value your in-depth articles, you guys really need to drop the 95th percentile frame times and get on board with 1% and .1% lows. What disrupts gaming the most is the hiccups, not looking at a statistically smooth chart. SMT/HT affects these THE most, especially in heavily single threaded games. If you aren't testing what it influences, why test it at all? YouTube reviews are also having problems with tests that don't reflect real world scenarios. Sometimes it's a lot more disagreeable than others.

    Completely invalid testing methodology at this point.

    My advice based on my own testing. You turn off SMT/HT except in scenarios in which you become CPU bound, across all cores, not one. This improved .1 and 1% frame time... IE stutters. You turn it on when you reach a point of 90%+ utilization as it helps and a lot when your CPU is maxed out. Generally speaking <6 and soon to be 8 cores should always have it on.

    You didn't even test where this helps the most and that's low end CPUs vs high end CPUs where you find the Windows scheduler messes things up.

    Also if you're testing this on your own, always turn it off in the BIOS. If you use something like Process Lasso or manually change affinity, Windows will still put protected services and processes onto those extra virtual cores, causing contention issues that lead to the stuttering.

    The most obvious games that get a benefit from SMT/HT off are heavily single threaded games, such as MOBAs.
  • Gloryholle - Friday, December 4, 2020 - link

    Testing Zen3 with 3200CL16?
  • peevee - Friday, December 4, 2020 - link

    "Most modern processors, when in SMT-enabled mode, if they are running a single instruction stream, will operate as if in SMT-off mode and have full access to resources."

    Which would have access to the whole microinstruction cache (L0I) in SMT mode?
  • Arbie - Friday, December 4, 2020 - link

    Another excellent AT article, which happens to hit my level of knowledge and interest; thanks!
