Investigating Performance of Multi-Threading on Zen 3 and AMD Ryzen 5000
by Dr. Ian Cutress on December 3, 2020 10:00 AM EST - Posted in
- CPUs
- AMD
- Zen 3
- X570
- Ryzen 5000
- Ryzen 9 5950X
- SMT
- Multi-Threading
Power Consumption, Temperature
Two other arguments for having SMT enabled or disabled come down to power consumption and temperature.
With SMT enabled, core utilization is expected to be higher, with more instructions flowing through and being processed per cycle. This naturally increases the power requirements of the core, but it might also reduce the core's frequency. The intended trade-off is that the extra work going through the core more than makes up for the extra power used or any drop in frequency. The lower frequency should also allow more efficient throughput, assuming the voltage is lowered accordingly.
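The efficiency argument rests on the standard first-order model of CMOS dynamic power, P ≈ C·V²·f, where C is the effective switched capacitance, V the voltage and f the frequency. The sketch below is a minimal illustration, assuming two made-up operating points (they are not measured Zen 3 voltages) and ignoring static/leakage power. It shows that a modest frequency drop paired with a voltage drop cuts power by far more than it cuts clock speed, so each cycle of work costs less energy.

```python
def dynamic_power(c_eff: float, voltage: float, freq_ghz: float) -> float:
    """First-order CMOS dynamic power model: P = C * V^2 * f (arbitrary units)."""
    return c_eff * voltage ** 2 * freq_ghz


def energy_per_cycle(c_eff: float, voltage: float) -> float:
    """Energy per clock cycle is P / f = C * V^2, independent of frequency."""
    return c_eff * voltage ** 2


# Hypothetical operating points, for illustration only (not measured values).
high = {"voltage": 1.35, "freq_ghz": 4.35}
low = {"voltage": 1.20, "freq_ghz": 4.00}
C_EFF = 1.0  # normalized effective switched capacitance

p_high = dynamic_power(C_EFF, **high)
p_low = dynamic_power(C_EFF, **low)
freq_ratio = low["freq_ghz"] / high["freq_ghz"]
energy_ratio = energy_per_cycle(C_EFF, low["voltage"]) / energy_per_cycle(C_EFF, high["voltage"])

print(f"power ratio (low/high):     {p_low / p_high:.3f}")
print(f"frequency ratio (low/high): {freq_ratio:.3f}")
print(f"energy per cycle ratio:     {energy_ratio:.3f}")
```

In this model, SMT's higher utilization shows up as a larger effective C (more switching activity per cycle), which is why package power can still rise even while frequency falls.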
This is perhaps where AMD and Intel differ slightly. Intel’s turbo frequencies are hard-bound to specific values based on how many cores are loaded, regardless of how many threads are active in total or per core. Behavior becomes a little more opportunistic once the processor drops to its steady-state power level, although exactly when that happens depends on the turbo time window the motherboard has set. AMD’s frequency is opportunistic from the moment load is applied: it naturally scales down as more cores are loaded, but it balances up and down based on core load at all times. Thermals depend on the heat density generated in each core, and temperature also acts as a feedback loop into the turbo algorithm whenever the power limit has not been reached.
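To make the contrast concrete, here is a deliberately simplified toy model. The turbo table, the cubic per-core power model, the 142 W budget, the 25 MHz step and the 1.15 "busier core" load factor standing in for SMT are all hypothetical choices; neither Intel's turbo logic nor AMD's Precision Boost is implemented like this, and the real algorithms also weigh temperature, current and per-core characteristics. The table-based function depends only on the active core count, while the opportunistic one is a feedback loop that climbs while the next step fits the power budget and backs off when over it, so its result changes with how heavily each core is loaded.

```python
# Hypothetical Intel-style turbo table: max frequency (MHz) by active core count.
TURBO_TABLE_MHZ = {1: 5300, 2: 5300, 3: 5100, 4: 5100, 5: 4900, 6: 4900, 7: 4800, 8: 4800}

K_WATTS_PER_GHZ3 = 0.12  # hypothetical per-core power coefficient


def table_turbo(active_cores: int) -> int:
    """Fixed bin: depends only on how many cores are active, not on threads per core.
    Core counts beyond the table reuse the last bin."""
    return TURBO_TABLE_MHZ[min(active_cores, max(TURBO_TABLE_MHZ))]


def core_power_w(freq_mhz: float, load: float) -> float:
    """Toy per-core power: roughly cubic in frequency (voltage rises with clocks),
    scaled by how busy the core is (load > 1.0 models a busier, e.g. SMT-loaded, core)."""
    ghz = freq_mhz / 1000.0
    return load * K_WATTS_PER_GHZ3 * ghz ** 3


def opportunistic_boost(active_cores: int, load: float, power_limit_w: float = 142.0,
                        f_min: int = 3400, f_max: int = 5050, step: int = 25,
                        iterations: int = 200) -> int:
    """Feedback loop: climb while the next step fits the budget, back off when over it."""
    freq = f_min
    for _ in range(iterations):
        package_w = active_cores * core_power_w(freq, load)
        next_w = active_cores * core_power_w(freq + step, load)
        if package_w > power_limit_w and freq - step >= f_min:
            freq -= step  # over budget: back off one step
        elif next_w <= power_limit_w and freq + step <= f_max:
            freq += step  # headroom left: climb one step
    return freq


for cores, load in [(1, 1.0), (16, 1.0), (16, 1.15)]:
    print(cores, load, table_turbo(cores), opportunistic_boost(cores, load))
```

The numbers this prints are illustrative only; the point is that the opportunistic loop settles lower when each core is made busier at the same core count, whereas the table-based bin does not move.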
For our analysis here, we’ve picked two benchmarks: Agisoft, a variably threaded test that performs practically the same with SMT on or off, and 3DPMavx, a purely multi-threaded test that gets the biggest gain from SMT.
Agisoft
Photoscan from Agisoft is a 2D-image-to-3D-model creator, using dozens of high-quality 2D images to generate related point maps that form a 3D model, before finally texturing the model using the images provided. It is used in archiving artefacts, as well as converting 2D images of sculpture into 3D scenes. Our test analyses a standardized set of 85 photos at 18 megapixels each, with the result measured in time to complete.
Simply looking at CPU temperatures while running our real-world Agisoft test, our current setup (MSI X570 Godlike with Noctua NH-U12S) shows both modes fluttering around 74°C sustained. Perhaps the most interesting element is at the beginning of the test, where CPU temperatures are higher in SMT Off mode. The data shows that with SMT off the processor runs at 4300 MHz, compared to 4150 MHz with SMT enabled, which would account for the difference.
Looking at power, for the bulk of the test both modes have similar package power consumption, around 130 W. SMT Off draws more power during the first couple of minutes of the test because of its higher frequency. With only one thread per core, the lower thermal density in this part of the test is clearly allowing a higher turbo.
If we measure the total energy consumed over the test, the two modes are basically identical by any metric that matters. Nearer the end of the test, where the workload is more variably threaded, SMT Off mode draws less power. Completion time is essentially the same due to the nature of the test, but SMT Off comes in at 2% lower power overall.
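Total energy for a run is the area under the package-power curve. A minimal sketch of that calculation, assuming power is logged as evenly or unevenly spaced (time, watts) samples and integrated with the trapezoidal rule; the two traces below are invented for illustration and are not the measured Agisoft data.

```python
def trapezoid_energy_j(times_s, power_w):
    """Integrate sampled package power (W) over time (s) with the trapezoidal rule, in joules."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need at least two matching (time, power) samples")
    energy_j = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        energy_j += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy_j


# Invented example traces, sampled once a minute (not measured data).
times = [0, 60, 120, 180, 240, 300]
smt_on_w = [130, 130, 130, 130, 120, 110]
smt_off_w = [140, 135, 130, 128, 112, 100]  # hotter start, lighter finish

e_on = trapezoid_energy_j(times, smt_on_w)
e_off = trapezoid_energy_j(times, smt_off_w)
print(f"SMT On: {e_on / 1000:.1f} kJ, SMT Off: {e_off / 1000:.1f} kJ")
print(f"SMT Off vs On: {(e_off / e_on - 1) * 100:+.1f}%")
```

Because both runs finish in essentially the same time, comparing total energy is equivalent to comparing average power; when run lengths differ, energy (or performance per watt) is the fairer comparison.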
3DPMavx (3D Particle Movement)
Our 3DPM test is an algorithmic sequence of non-interactive random three-dimensional movement, designed to simulate molecular diffusive movement inside a gas or a fluid. The simulation is non-interactive (i.e. no two molecules collide) because each particle's average movement was originally derived with collisions taken into account. Our test cycles through six movement algorithms at ten seconds apiece, followed by ten seconds of idle, with the whole loop repeated six times, taking about 20 minutes regardless of how fast or slow the processor is. The performance figure is millions of particle movements per second. Each algorithm has been accelerated for AVX2.
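For readers unfamiliar with this kind of workload, here is a minimal, single-threaded sketch of one possible non-interactive movement step: every particle takes a unit-length step in a uniformly random 3D direction, with no collision checks, and throughput is reported in millions of particle movements per second. This is not the 3DPM benchmark's code; the direction-sampling method (normalizing a 3D Gaussian sample) and the particle and step counts are assumptions chosen for illustration.

```python
import math
import random
import time


def random_unit_vector(rng: random.Random) -> tuple:
    """Uniform direction on the unit sphere via a normalized 3D Gaussian sample."""
    while True:
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 1e-12:  # guard against a (vanishingly unlikely) zero vector
            return x / norm, y / norm, z / norm


def move_particles(positions, steps, rng):
    """Move every particle `steps` times; particles never interact (no collision checks)."""
    for _ in range(steps):
        for i, (px, py, pz) in enumerate(positions):
            dx, dy, dz = random_unit_vector(rng)
            positions[i] = (px + dx, py + dy, pz + dz)
    return len(positions) * steps  # number of particle movements performed


def run(n_particles=10_000, steps=20, seed=1):
    """Return final positions and throughput in millions of movements per second."""
    rng = random.Random(seed)
    positions = [(0.0, 0.0, 0.0)] * n_particles
    start = time.perf_counter()
    moves = move_particles(positions, steps, rng)
    elapsed = time.perf_counter() - start
    return positions, moves / elapsed / 1e6


if __name__ == "__main__":
    _, rate = run()
    print(f"{rate:.2f} million particle movements per second")
```

Because each particle is independent, the work splits cleanly across any number of threads; a real benchmark would also vectorize the inner loop (the version tested here uses AVX2) rather than stepping one particle at a time in Python.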
On the temperature side of things, SMT Off mode again shows a higher thermal profile. Temperatures this time peak at 66°C, but the difference between the two modes is clear.
On the power side, we can see why SMT Off mode is warmer: the cores are drawing more power. The data shows SMT Off mode running at ~4350 MHz, compared to closer to 4000 MHz with SMT On.
With its higher frequency, SMT Off mode's estimated total power consumption is 6.8% higher. The gap appears very consistent throughout the benchmark, which lasts about 20 minutes in total.
But let us add in the performance numbers. Because 3DPMavx can take advantage of SMT, SMT On mode scores 77.5% higher by running two threads per core rather than one (a score of 10245 vs 5773). Combined with its lower power consumption, this makes SMT On mode 91% better in performance per watt on this benchmark.
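Performance per watt for SMT On relative to SMT Off is (score_on / P_on) / (score_off / P_off), which equals the score ratio multiplied by the power ratio P_off / P_on. A quick check of that arithmetic using only the rounded figures quoted above (scores of 10245 and 5773, and SMT Off drawing 6.8% more power), assuming both runs last the same time so the energy ratio equals the average power ratio:

```python
score_smt_on = 10245
score_smt_off = 5773
power_off_vs_on = 1.068  # SMT Off draws 6.8% more power than SMT On (rounded, from the text)

perf_ratio = score_smt_on / score_smt_off
# perf/W ratio = (score_on / P_on) / (score_off / P_off) = perf_ratio * (P_off / P_on)
perf_per_watt_ratio = perf_ratio * power_off_vs_on

print(f"SMT On performance: {(perf_ratio - 1) * 100:+.1f}%")
print(f"SMT On perf/W:      {(perf_per_watt_ratio - 1) * 100:+.1f}%")
```

With these rounded inputs the result is roughly +89.5%; the +91% quoted above was presumably derived from the unrounded power measurements.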
126 Comments
dotjaz - Thursday, December 3, 2020
Do you understand what "S(imultaneous)" in SMT means? Barrel processors are by definition NOT simultaneous. They switch between threads.
quadibloc - Friday, December 4, 2020
That all depends. There could be a unit that switches between threads to dispatch instructions into the pipeline, but instructions from all the threads are simultaneously working on calculations in the pipeline. I'd call that a way to implement SMT.
Elstar - Friday, December 4, 2020
Guys, I've got bad news for you. The difference between a barrel processor ("temporal multithreading") and SMT is all about the backend, not the frontend. I.e. whether the processor is superscalar or not. Otherwise there is no difference. They duplicate hardware resources and switch between them. And the frontend (a.k.a. the decoder) switches temporally between hardware threads. There are NOT multiple frontends/decoders simultaneously feeding one backend pipeline.
Elstar - Friday, December 4, 2020
For example the "SMT4" Intel Xeon Phi has a design weakness where three running threads per core get decoded as if four threads were running. (And yes, just one or two running threads per core get decoded efficiently.)
dotjaz - Thursday, December 3, 2020
You nailed 2 letters out of 3, gj.
Luminar - Thursday, December 3, 2020
Talk about being uninformed.
MenhirMike - Thursday, December 3, 2020
Will be interesting to see if this looks different with Quad-Channel Threadripper or Octo-Channel EPYC/TR Pro CPUs, since 16 Cores/32 Threads with 2 channels of memory doesn't seem very compute-friendly. Though it's good to see that "SMT On" is still the reasonable default it pretty much always has been, except in very specific circumstances.
schujj07 - Thursday, December 3, 2020
Also would be interesting to see this on a 6c/12t or 8c/16t CPU.
CityBlue - Thursday, December 3, 2020
In your list of "Systems that do not use SMT" you forgot:
* All x86 from Intel with CPU design vulnerabilities used in security conscious environments
MenhirMike - Thursday, December 3, 2020
To be fair, "x86" and "security conscious" are already incompatible on anything newer than a Pentium 1/MMX. Spectre affects everything starting with the Pentium Pro, and newer processors have black boxes in the form of Intel ME or AMD PSP. You can reduce the security risk by turning off some performance features (and get CPUs without Intel ME if you're the US government), but this is still just making an inherently insecure product slightly less insecure.