Power Consumption, Temperature

Two other arguments for having SMT enabled or disabled comes down to power consumption and temperature.

With SMT enabled, the core utilization is expected to be higher, with more instructions flowing through and being processed per cycle. This naturally increases the power requirements on the core, but might also reduce the frequency of the core. The trade-off is meant to be that the work going through the core should be more than enough to make up for extra power used, or any lower frequency. The lower frequency should enable a more efficient throughput, assuming the voltage is adjusted accordingly.

This is perhaps where AMD and Intel differ slightly. Intel’s turbo frequency range is hard-bound to specific frequency values based on core loading, regardless of how many threads are active or how many threads per core are active. The activity is a little more opportunistic when we reach steady state power, although exactly how far down the line that is will depend on what the motherboard has set the power length to. AMD’s frequency is continually opportunistic from the moment load is applied: it obviously scales down as more cores are loaded, but it will balance up and down based on core load at all times. On the side of thermals, this will depend on the heat density being generated in each core, but this also acts as a feedback loop into the turbo algorithm if the power limit has not been reached.

For our analysis here, we’ve picked two benchmarks. Agisoft, which is a variable threaded test performs practically the same with SMT On/Off, and 3DPMavx, a pure MT test which gets the biggest gain from SMT.

Agisoft

Photoscan from Agisoft is a 2D image to 3D model creator, using dozens of high-quality 2D images to generate related point maps to form a 3D model, before finally texturing the model using the images provided. It is used in archiving artefacts, as well as converting 2D sculpture into 3D scenes. Our test analyses a standardized set of 85 x 18 megapixel photos, with a result measured in time to complete.

Simply looking at CPU temperatures while running our real-world Agisoft test, our current setup (MSI X570 Godlike with Noctua NH12S) shows that both CPUs will flutter around 74ºC sustained. Perhaps the interesting element is at the beginning of the test, where the CPU temperatures are higher in SMT Off mode. Looking into the data, and during SMT Off, the processor is at 4300 MHz, compared to 4150 MHz when SMT is enabled. This would account for the difference.

Looking at power, we can follow that for the bulk of the test, both processors have similar package power consumption, around 130 W. The SMT Off is drawing more power during the first couple of minutes of the test, due to the higher frequency. Clearly the thermal density in this part of the test by only having one thread per core is allowing for a higher turbo.

If we measure the total power of the test, it’s basically identical in any metric that matters. Nearer the end of the test, where the workload is more variably threaded, this is where the SMT Off mode seems to come under power. This benchmark completion time is essentially the same due to the nature of the test, but SMT Off comes in at 2% lower power overall.

3DPMavx (3D Particle Movement)

Our 3DPM test is an algorithmic sequence of non-interactive random three-dimensional movement, designed to simulate molecular diffusive movement inside a gas or a fluid. The simulation is made non-interactive (i.e. no two molecules will collide) due to the original average movement of each particle taking collisions into account. Our test cycles through six movement algorithms at ten seconds apiece, followed by ten seconds of idle, with the whole loop being repeated six times, taking about 20 minutes, regardless of how fast or slow the processor is. The related performance figure is millions of particle movements per second. Each algorithm has been accelerated for AVX2.

On the temperature side of things, it is clear that the SMT Off mode again puts up a higher thermal profile. Temperatures this time peak at 66ºC, but it is clear the difference between the two modes.

On the power side, we can see why SMT Off mode is warmer – the cores are drawing more power. Looking at the data, SMT Off mode is running ~4350 MHz, compared to SMT On which is running closer to 4000 MHz.

With the higher frequency with SMT Off, the estimated total power consumption is 6.8% higher. This appears to be very constant throughout the benchmark, which lasts about 20 minutes total.

But, let us add in the performance numbers. Because 3DPMavx can take advantage of SMT On, that mode scores +77.5% by having two threads per core rather than one (a score of 10245 vs 5773). Combined this makes SMT On mode +91% better in performance per watt on this benchmark.

Gaming Performance (Discrete GPU) Conclusions: SMT On
Comments Locked

126 Comments

View All Comments

  • Bomiman - Saturday, December 5, 2020 - link

    That common knowledge is a few years old now. It was once common knowledge that games only used one thread.

    Consoles now have 3 times as many threads as before, and that's in a situation where 4t Cpus are barely usable and 4c 8t Cpus are obsolete.
  • MrPotatoeHead - Tuesday, December 15, 2020 - link

    Xbox360 came out in 2005. 3C/6T. Even the PS3 had a 1C/2T PowerPC PPE and 6 SPEs, so a total of 8T. PS4/XO is 8C/8T. Though I guess we could blame lack of CPU utilization still on this last generation using pretty weak cores from the get go. IIRC 8 core Jaguar would be on par with an Intel i3 at the time of these console releases.

    Though, the only other option AMD had was Piledriver. Piledriver still poor performer, a power hog, and it would likely only been worth it over 8 Jaguar cores if they went with a 3 or 4 module chip.

    It is nice that this generation MS and Sony both went all out on the CPU. Just too bad they aren't Zen 3 based. :(
  • Dolda2000 - Friday, December 4, 2020 - link

    It should be kept in mind that, at the time when AMD criticized Intel for that, that was when AMD had actual dual-cores (A64x2) and Intel still had single-cores with HT, which makes the criticism rather fair.
  • Xajel - Sunday, December 6, 2020 - link

    "Intel's HT approach proved superior".

    Intel's approach wasn't that much superior. In fact, in the early days of Intel's HTT processors, many Applications, even ones which supposed to be optimised for MC code path was getting lower scores with HTT enabled than when HTT was disabled.

    The main culprit was that Applications were designed to handle each thread in a real core, not two threads in a single core, the threads were fighting for resources that weren't there.

    Intel knew this and worked hard with developers to make them know the difference and apply this change to the code path. This actually took sometime till Multi-Core applications were SMT aware and had a code path for this.

    For AMD's case, AMD's couldn't work hard enough like Intel with developers to make them have a new code path just for AMD CPU's. Not to mention that intel was playing it dirty starting with their famous compiler which was -and still- used by most developers to compile applications, the compilers will optimise the code for intel's CPU's and have an optimised code path for every CPU and CPU feature intel have, but when the application detect a non-Intel CPU, including AMD's it will select the slowest code path, and will not try to test the feature and choose a code path.

    This applied also to AMD's CPU's, while sure the CPU's lacked FPU performance, and was not competitive enough (even when the software was optimised), but the whole optimisation thing made AMD's CPU inefficient, the idea should work better than Intel, because there's an actual real hardware there (at least for Integer), but developers didn't work harder, and the intel compiler played a major role for smaller developers also.

    TL'DR, the main issue was the intel compiler and lack of developers interest, then the actual cores were also not that much stronger than intel's (IPC side), AMD's idea should have worked, but things weren't in their side.

    And by the time AMD came with their design, they were already late, applications were already optimised for Intel HTT which became very good as almost all applications became SMT aware. AMD acknowledged this and knew that they must take what developers already have and work on it, they also worked hard on their SMT implementation that it is touted now that their SMT is better intel's own SMT implementation (HTT).
  • Keljian - Sunday, January 10, 2021 - link

    Urm no, intel’s compiler isn’t used often these days unless you’re doing really heavy maths. Microsoft’s compiler is used much more often, though clang is taking off
  • pogsnet - Tuesday, December 29, 2020 - link

    During P4, HT gives no difference in performance compared to AMD64 but on Core2Duo there it shows better performance. Probably because we have only 2-4 cores and not enough for our multi tasking needs, Now we have 4-32 cores plus much powerful and efficient cores, hence, SMT maybe not that significant already that is why on most test it shows no big performance lift.
  • willis936 - Thursday, December 3, 2020 - link

    5%? I think more than 5% is needed for a whole second set of registers plus the logic needed to properly handle context switching. Everything in between the cache and pipeline needs to be doubled.
  • tygrus - Thursday, December 3, 2020 - link

    Register rename means they already have more registers that don't need to be copied. The register renaming means they have more physical registers than logical registers exposed to programmer. Say you have: 16 logical registers exposed to coder per thread; 128 rename registers in HW; SMT 2tgreads/core = same 16 logical but each thread has 64 rename registers instead of 128.
    Compare mixing the workloads eg. 8 int/branch heavy with 8 FP heavy on 8 core; or OS background tasks like indexing/search/AntiVirus.
  • MrSpadge - Thursday, December 3, 2020 - link

    The 5% is from Intel for the original Pentium 4. At some point in the last 10 years I think I read a comparable number, probably here on AT, regarding a more modern chip.
  • Wilco1 - Friday, December 4, 2020 - link

    There is little accurate info about it, but the fact is that x86 cores are many times larger than Arm cores with similar performance, so it must be a lot more than 5%. Graviton 2 gives 75-80% of the performance of the fastest Rome at less than a third of the area (and half the power).

Log in

Don't have an account? Sign up now