Conclusions: SMT On

I wasn’t too sure what we were going to see when I started this testing. I know the theory behind implementing SMT, and what it means for the instruction streams having access to core resources, and how cores that have SMT in mind from the start are built differently to cores that are just one thread per core. But theory only gets you so far. Aside from all the forum messages over the years talking about performance gains/losses when a product has SMT enabled, and the few demonstrations of server processors running focused workloads with SMT disabled, it is actually worth testing on real workloads to find if there is a difference at all.

Results Overview

In our testing, we covered three areas: Single Thread, Multi-Thread, and Gaming Performance.

In single threaded workloads, where each thread has access to all of the resources in a single core, we saw no change in performance when SMT was enabled – all of our workloads were within 1% either way.

In multi-threaded workloads, we saw an average uplift in performance of +22% when SMT was enabled. Most of our tests scored a +5% to a +35% gain in performance. A couple of workloads scored worse, mostly due to resource contention having so many threads in play – the limit here is memory bandwidth per thread. One workload scored +60%, a computational workload with little-to-no memory requirements; this workload scored even better in AVX2 mode, showing that there is still some bottleneck that gets alleviated with fewer instructions.
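For reference, the uplift figures quoted here are simple ratios of the SMT-on score to the SMT-off score. A minimal sketch of the arithmetic (the scores below are hypothetical, chosen only to illustrate the calculation, not our measured data):

```python
# Hypothetical scores for illustration only; not the article's measured data.
def smt_uplift_pct(score_smt_off: float, score_smt_on: float) -> float:
    """Percentage performance change from enabling SMT."""
    return (score_smt_on / score_smt_off - 1.0) * 100.0

# A workload scoring 100 with SMT off and 122 with SMT on shows a +22% uplift,
# matching the average quoted above.
print(f"{smt_uplift_pct(100.0, 122.0):+.0f}%")  # +22%
```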

On gaming, overall there was no difference between SMT On and SMT Off; however, some games may show differences in CPU-limited scenarios. Deus Ex was down almost 10% when CPU limited, while Borderlands 3 was up almost 10%. As we moved to a more GPU-limited scenario, those discrepancies were neutralized, with a few games still gaining single-digit percentage improvements with SMT enabled.

For power and performance, we tested two examples where running two threads per core gave either no improvement (Agisoft) or a significant improvement (3DPMavx). In both cases, SMT Off mode (1 thread/core) ran at higher temperatures and higher frequencies. For the benchmark where performance was about equal, the power consumed was a couple of percentage points lower when running one thread per core. For the benchmark where running two threads per core gave a big performance increase, the power in that mode was also lower, and there was a significant +91% performance per watt improvement from enabling SMT.
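Performance per watt is simply the benchmark score divided by the power drawn. A quick sketch with made-up numbers in the same ballpark as the 3DPMavx case (these figures are illustrative assumptions, not our measured data):

```python
# Made-up figures chosen to mirror the 3DPMavx result; not measured data.
def perf_per_watt(score: float, watts: float) -> float:
    """Benchmark score per watt of package power."""
    return score / watts

ppw_smt_off = perf_per_watt(score=100.0, watts=142.0)  # 1 thread/core
ppw_smt_on = perf_per_watt(score=178.0, watts=132.0)   # 2 threads/core

gain_pct = (ppw_smt_on / ppw_smt_off - 1.0) * 100.0
print(f"{gain_pct:+.0f}% perf/W from enabling SMT")  # +91% with these figures
```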

What Does This Mean?

I mentioned at the beginning of the article that SMT performance gains can be seen from two different viewpoints.

The first is that if SMT enables more performance, then it’s an easy switch to use, and some users consider that if you can get perfect scaling, then SMT is an effective design.

The second is that if SMT enables too much performance, then it’s indicative of a bad core design. If you can get perfect scaling with SMT2, then perhaps something is wrong about the design of the core and the bottleneck is quite bad.

Having poor SMT scaling doesn’t always mean that the SMT is badly implemented – it can also imply that the core design is very good. If an effective SMT design can be interpreted as a poor core design, then it’s quite easy to see that vendors can’t have it both ways. Every core design has deficiencies (that much is true), and both Intel and AMD will tell their users that SMT enables the system to pick up extra bits of performance where workloads can take advantage of it, and for real-world use cases, there are very few downsides.

We’ve known for many years that having two threads per core is not the same as having two cores – in a worst case scenario, there is some performance regression as more threads try and fight for cache space, but those use cases seem to be highly specialized for HPC and Supercomputer-like tasks. SMT in the real world fills in the gaps where gaps are available, and this occurs mostly in heavily multi-threaded applications with no cache contention. In the best case, SMT offers a sizeable performance per watt increase. But on average, there are small (+22% on MT) gains to be had, and gaming performance isn’t disturbed, so it is worth keeping enabled on Zen 3.

 

126 Comments


  • warpuck - Friday, December 25, 2020 - link

    With an R5 1600 it makes about a 5-6% difference in usable clock speed (200-250 MHz), and also in temperature. With an R7 3800X it is not as noticeable. That assumes you reduce the background operations while gaming with either CPU.
    I don't know about recent game releases, but older ones only use 2-4 cores (threads), so clocking the R5 1600 at 3750 MHz (SMT on) vs 3975 MHz (SMT off) does make a difference in frame rates.
  • whatthe123 - Saturday, December 5, 2020 - link

    it doesn't make much of a difference unless you go way past the TDP and have exotic cooling.

    these CPUs are already boosting close to their limits at stock settings to maintain high gaming performance.
  • 29a - Saturday, December 5, 2020 - link

    There are a lot of different scenarios that would be interesting to see. I would like to see some testing with a dual core chip, 2c/4t.
  • Netmsm - Thursday, December 3, 2020 - link

    good point
  • Wilco1 - Friday, December 4, 2020 - link

    I think that 5% area cost for SMT is marketing. If you only count the logic that is essential for SMT, then it might be 5%. However, many resources need to be increased or doubled. Even if that helps single-threaded performance, it still adds a lot of area that you wouldn't need without SMT.

    Graviton 2 proves that 2 small non-SMT cores will beat one big SMT core on multithreaded workloads using a fraction of the silicon and power.
  • peevee - Monday, December 7, 2020 - link

    Except they are not faster, but whatever.
  • RickITA - Thursday, December 3, 2020 - link

    Several compute applications do not need hyper-threading. A couple of official references:
    1. Wolfram Mathematica: "Mathematica’s Parallel Computing suite does not necessarily benefit from hyper-threading, although certain kernel functionality will take advantage of it when it provides a speedup." [source: https://support.wolfram.com/39353?src=mathematica]. Indeed, Mathematica automatically sets the number of threads equal to the number of physical cores of the CPU.
    2. Intel MKL library: "Hyper-Threading Technology (HT Technology) is especially effective when each thread is performing different types of operations and when there are under-utilized resources on the processor. Intel MKL fits neither of these criteria as the threaded portions of the library execute at high efficiencies using most of the available resources and perform identical operations on each thread. You may obtain higher performance when using Intel MKL without HT Technology enabled." [source: https://software.intel.com/content/www/us/en/devel...].

    BTW Ian: Wolfram Mathematica has a benchmark mode [source: https://reference.wolfram.com/language/Benchmarkin...], please consider adding it to your test suite. Or something with Matlab.
  • realbabilu - Thursday, December 3, 2020 - link

    Apparently Intel MKL, and Matlab which uses Intel MKL, only allow AMD CPUs to use the non-AVX2 code path. Only on Linux, with a preloaded library faking the CPU vendor, can you work around this.
    https://www.google.com/amp/s/simon-martin.net/2020...
  • RickITA - Thursday, December 3, 2020 - link

    Not a Matlab user, but this is no longer true as of version 2020a. Source: https://www.extremetech.com/computing/308501-cripp...
  • leexgx - Saturday, December 5, 2020 - link

    The "if not Intel genuine CPU" check disabled all optimisations (this rubbish has been going on for years; only in 2019 or 2020 did they actually fix their code to detect whether AVX is available. Even BTRFS had this problem: it wouldn't use hardware acceleration if it wasn't on an Intel CPU. Again, lazy coding.)
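On the thread-count point raised in the comments above: the usual way to hold MKL/OpenMP-backed libraries to one thread per physical core is to set their standard environment variables before the library loads. A minimal sketch; the core count of 8 is an assumption for an 8c/16t chip, since Python's os.cpu_count() reports logical CPUs:

```python
import os

# Assumed machine: 8 physical cores, 16 logical CPUs with SMT enabled.
# The value 8 is an assumption; psutil.cpu_count(logical=False) can detect it.
physical_cores = 8

# Honoured by MKL- and OpenMP-backed libraries (NumPy, MATLAB, etc.),
# but only if set before the library initialises its thread pool.
os.environ["MKL_NUM_THREADS"] = str(physical_cores)
os.environ["OMP_NUM_THREADS"] = str(physical_cores)

print(os.environ["MKL_NUM_THREADS"])  # 8
```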
