Conclusions: SMT On

I wasn’t too sure what we were going to see when I started this testing. I know the theory behind implementing SMT, and what it means for the instruction streams having access to core resources, and how cores that have SMT in mind from the start are built differently to cores that are just one thread per core. But theory only gets you so far. Aside from all the forum messages over the years talking about performance gains/losses when a product has SMT enabled, and the few demonstrations of server processors running focused workloads with SMT disabled, it is actually worth testing on real workloads to find if there is a difference at all.

Results Overview

In our testing, we covered three areas: Single Thread, Multi-Thread, and Gaming Performance.

In single threaded workloads, where each thread has access to all of the resources in a single core, we saw no change in performance when SMT is enabled – all of our workloads were within 1% either side.

In multi-threaded workloads, we saw an average uplift in performance of +22% when SMT was enabled. Most of our tests scored a +5% to a +35% gain in performance. A couple of workloads scored worse, mostly due to resource contention having so many threads in play – the limit here is memory bandwidth per thread. One workload scored +60%, a computational workload with little-to-no memory requirements; this workload scored even better in AVX2 mode, showing that there is still some bottleneck that gets alleviated with fewer instructions.
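As a rough illustration of how a per-test uplift and an average like the one above can be derived, here is a minimal sketch. It assumes a plain arithmetic mean of per-test percentage changes, which may not be the averaging used for the +22% figure. The function names and the scores in the usage note are hypothetical and are not taken from our results.

```python
from statistics import mean


def smt_uplift_percent(score_off, score_on, higher_is_better=True):
    """Percentage change from SMT Off to SMT On for one benchmark.

    For time-based results (lower is better), the ratio is inverted so
    that a positive number always means SMT On was faster.
    """
    if higher_is_better:
        ratio = score_on / score_off
    else:
        ratio = score_off / score_on
    return (ratio - 1.0) * 100.0


def average_uplift_percent(results):
    """Arithmetic mean of per-test uplifts (an assumed averaging method).

    `results` is a list of (score_off, score_on) pairs where a higher
    score is better.
    """
    return mean(smt_uplift_percent(off, on) for off, on in results)
```

For example, with hypothetical scores, `average_uplift_percent([(100, 110), (100, 130)])` averages a +10% and a +30% result to roughly +20%. A geometric mean of the ratios would be an equally reasonable choice and gives slightly different numbers when the spread between tests is wide.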

On gaming, overall there was no difference between SMT On and SMT Off, however some games may show differences in CPU limited scenarios. Deus Ex was down almost 10% when CPU limited, however Borderlands 3 was up almost 10%. As we moved to a more GPU limited scenario, those discrepancies were neutralized, with a few games still gaining single-digit percentage points improvement with SMT enabled.

For power and performance, we tested two examples where performance at two threads per core either saw no improvement (Agisoft) or saw significant improvement (3DPMavx). In both cases, SMT Off mode (1 thread/core) ran at higher temperatures and higher frequencies. For the benchmark where performance was about equal, the power consumed was a couple of percentage points lower when running one thread per core. For the benchmark where running two threads per core gives a big performance increase, the power in that mode was also lower, and there was a significant +91% performance per watt improvement from enabling SMT.
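The performance per watt comparison comes down to a ratio of ratios, which can be sketched as follows. This is a minimal example under stated assumptions: the function name is hypothetical, and the inputs in the usage note are made-up values, not the measured Agisoft or 3DPMavx data.

```python
def perf_per_watt_gain_percent(perf_off, power_off, perf_on, power_on):
    """Percentage change in performance per watt from SMT Off to SMT On.

    perf_* are benchmark scores (higher is better) and power_* are
    average package power in watts during the same runs.
    """
    efficiency_off = perf_off / power_off
    efficiency_on = perf_on / power_on
    return (efficiency_on / efficiency_off - 1.0) * 100.0
```

With hypothetical inputs, a workload that doubles its score at the same power, `perf_per_watt_gain_percent(100, 100, 200, 100)`, comes out at roughly +100%. If power also drops with SMT enabled, as it did in our second example, the efficiency gain ends up larger than the raw performance gain.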

What Does This Mean?

I mentioned at the beginning of the article that SMT performance gains can be seen from two different viewpoints.

The first is that if SMT enables more performance, then it’s an easy switch to use, and some users consider that if you can get perfect scaling, then SMT is an effective design.

The second is that if SMT enables too much performance, then it’s indicative of a bad core design. If you can get perfect scaling with SMT2, then perhaps something is wrong about the design of the core and the bottleneck is quite bad.

Having poor SMT scaling doesn’t always mean that the SMT is badly implemented – it can also imply that the core design is very good. If an effective SMT design can be interpreted as a poor core design, then it’s quite easy to see that vendors can’t have it both ways. Every core design has deficiencies (that much is true), and both Intel and AMD will tell their users that SMT enables the system to pick up extra bits of performance where workloads can take advantage of it, and that for real-world use cases, there are very few downsides.

We’ve known for many years that having two threads per core is not the same as having two cores – in a worst case scenario, there is some performance regression as more threads try and fight for cache space, but those use cases seem to be highly specialized for HPC and Supercomputer-like tasks. SMT in the real world fills in the gaps where gaps are available, and this occurs mostly in heavily multi-threaded applications with no cache contention. In the best case, SMT offers a sizeable performance per watt increase. But on average, there are small (+22% on MT) gains to be had, and gaming performance isn’t disturbed, so it is worth keeping enabled on Zen 3.

 

126 Comments


  • Holliday75 - Thursday, December 3, 2020 - link

    As usage for modern users changes I wonder how this could be better tested/visualized.

    I am not looking at a 5900x to run any advanced tools. I am looking to game, run multiple browsers with a few dozen tabs open, stream, download, run Plex (transcoding), security tools, VPN, and the million other applications a normal user would have running at any given point in time. While no two users will have the same workload at any given time, how could we quantify SMT versus no SMT for the average user?

    In the not too distant future we could be seeing the average PC running 32 cores. I am talking your run of the mill office machine from Dell that costs $800. Or will we? Is there a point where it does not matter anymore?
  • realbabilu - Thursday, December 3, 2020 - link

    Simple. For the average user, a 4 core 8th gen U series chip has more cores than the generation before. It has more strength, but it rarely gets to 100 percent CPU utilization for the normal things you're doing.
    To get 8 threads or 4 cores working at 100 percent, you need killer applications programmed by someone who knows how to extract every bit of juice from the processor, knows how to program multithreaded code, or uses an optimized math kernel library / optimized compiler switches, like FEM, rendering, or applied math and science.
    Other than those app, maybe you could expense it to gpu for gaming.
  • schujj07 - Thursday, December 3, 2020 - link

    Or you just have multiple tabs open. I regularly hit 100% usage on my work i5-6400 with 4c/4t having 10-12 tabs open. It gets quite annoying as on a normal day I might need up to double that open at any given time. That means that 20 tabs would peg a 4c/8t CPU pretty easily.
  • evilpaul666 - Friday, December 4, 2020 - link

    You need an ad blocker unless those tabs are all very busy doing something. I mean, it sounds like they're mining Monero for somebody else, I mean what they're *supposed* to be doing for you.
  • schujj07 - Friday, December 4, 2020 - link

    I use an ad blocker and nothing is being mined. However, ads are an example of things that will destroy your performance in web browsing quite quickly and suck up a lot of CPU cycles. While right now 4c/8t is enough for an office machine, it will not be long before 6c/12t is the standard.
  • marrakech - Tuesday, December 15, 2020 - link

    15 cores are the futureeeeee
  • Hulk - Thursday, December 3, 2020 - link

    Wouldn't high SMT performance be an indication of bad software design rather than bad core design?
    While SMT performance is changing in these tests the core is not. Only the software is changing. It seems as though an Intel CPU in this comparison would have provided additional insights to these questions.
  • BillyONeal - Thursday, December 3, 2020 - link

    The situations that create high SMT performance are generally outside the software in question's control. For example, a program might have 1 thread that's doing all divides and another that's doing all multiplies. The threads that only have multiplies or divisions aren't poorly designed, they just aren't using units on the chip that don't help their respective workloads.

    There are also cache effects. If you have 2 threads working on data bigger than the CPU's caches, while one is waiting for that data to come back from memory the other can make unrelated progress, and vice versa, but the data being big isn't necessarily an indicator of poor software design. Some problem domains just have big data sets and there's no way around that.
  • WaltC - Thursday, December 3, 2020 - link

    Exactly. Some software is written to utilize a lot of threads simultaneously, some is not. Running software that does not make use of a lot of simultaneous threads tells us really nothing much about SMT CPU hardware, imo, other than "this software doesn't support it very well."
  • Elstar - Thursday, December 3, 2020 - link

    SMT24? Ha. Try SMT128: https://en.wikipedia.org/wiki/Cray_XMT#Threadstorm...
