Power Consumption

The nature of reporting processor power consumption has become, in part, a dystopian nightmare. Historically, the peak power consumption of a processor, as purchased, has been given by its Thermal Design Power (TDP, or PL1). For many markets, such as embedded processors, that TDP value still signifies the peak power consumption. For the processors we test at AnandTech, whether desktop, notebook, or enterprise, this is not always the case.

Modern high-performance processors implement a feature called Turbo. This allows a processor, usually for a limited time, to go beyond its rated frequency. Exactly how far the processor goes depends on a few factors, such as the Turbo Power Limit (PL2), whether the peak frequency is hard coded, the thermals, and the power delivery. Turbo can sometimes be very aggressive, allowing power values 2.5x above the rated TDP.
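
As a rough illustration of how a turbo budget plays out over time, below is a minimal sketch of an Intel-style PL1/PL2 scheme: the package is allowed to draw up to PL2 until a moving average of power (with a time constant, tau) catches up to PL1, after which it settles back to PL1. The limits, the time constant, and the averaging rule are illustrative assumptions, not any vendor's actual firmware behavior.

    # Illustrative sketch only: an Intel-style PL1/PL2 turbo budget, where power
    # above PL1 (the TDP) is allowed until an exponentially weighted moving
    # average of package power reaches PL1. Real firmware layers on many more
    # constraints (thermals, current limits, motherboard overrides).

    PL1 = 125.0   # sustained power limit in watts (the "TDP")
    PL2 = 250.0   # short-term turbo power limit in watts
    TAU = 28.0    # turbo time constant in seconds (assumed)
    DT  = 0.1     # simulation time step in seconds

    def simulate(requested_watts, seconds):
        """Return a list of (time, granted watts) for a constant requested load."""
        history = []
        ewma = 0.0                  # moving average of granted package power
        alpha = DT / TAU
        for step in range(int(seconds / DT)):
            # Turbo (up to PL2) is only granted while the average sits within PL1.
            granted = min(requested_watts, PL2 if ewma <= PL1 else PL1)
            ewma = (1.0 - alpha) * ewma + alpha * granted
            history.append((step * DT, granted))
        return history

    if __name__ == "__main__":
        for t, w in simulate(requested_watts=240.0, seconds=60.0)[::100]:
            print(f"t = {t:4.0f} s   package power ~ {w:5.1f} W")

With these made-up numbers the simulated package holds 240 W for roughly 20 seconds before dropping back to the 125 W PL1, which is the general shape of the behavior described above.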

AMD and Intel have different definitions for TDP, but broadly speaking they are applied in the same way. The differences come down to turbo modes, turbo limits, turbo budgets, and how the processors manage that power balance. These topics are 10,000-12,000 word articles in their own right, and we have a few articles worth reading on the topic.

In simple terms, processor manufacturers only ever guarantee two values, which are tied together: when all cores are running at base frequency, the processor should be running at or below the TDP rating. All turbo modes and power modes above that are not covered by warranty. Intel kind of screwed this up with the Tiger Lake launch in September 2020 by refusing to define a TDP rating for its new processors, instead going for a range. Obfuscation like this is frustrating for press and end-users alike.

However, for our tests in this review, we measure the power consumption of the processor in a variety of different scenarios. These include full AVX2/AVX-512 workflows (where supported), real-world image-model construction, and others as appropriate. These tests are intended as comparative data points between processors. We also note the peak power recorded in any of our tests.
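
For readers who want to watch package power on their own systems, the sketch below shows one way to turn an energy counter into an average power figure using the Linux RAPL powercap interface. This is not the instrumentation behind the numbers in this review; the sysfs path is an assumption that only holds where the kernel exposes RAPL through powercap (Intel parts, and recent AMD parts on newer kernels), and counter wraparound is ignored for brevity.

    # Minimal sketch: sample package energy via the Linux RAPL powercap interface
    # and convert it into an average power reading. Assumes the kernel exposes
    # /sys/class/powercap/intel-rapl:0 (not available on every platform) and
    # ignores counter wraparound for brevity. Not the tooling used in this review.

    import time

    ENERGY_FILE = "/sys/class/powercap/intel-rapl:0/energy_uj"  # package 0, microjoules

    def read_energy_uj():
        with open(ENERGY_FILE) as f:
            return int(f.read())

    def average_package_power(interval_s=1.0):
        """Average package power in watts over the given interval."""
        e0, t0 = read_energy_uj(), time.monotonic()
        time.sleep(interval_s)
        e1, t1 = read_energy_uj(), time.monotonic()
        return (e1 - e0) / 1e6 / (t1 - t0)   # microjoules -> joules -> watts

    if __name__ == "__main__":
        while True:
            print(f"package ~ {average_package_power(1.0):6.1f} W")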

AMD Ryzen Threadripper Pro 3995WX

The specifications for this processor list 64 cores running at a TDP of 280 W. In our testing, we never saw any power consumption over 280 W:

[Chart: (0-0) Peak Power]

Going through our POV-Ray scaling power test for per-core consumption, we see a trend whereby around 40% of the package power goes to the non-core parts of the system, which likely also includes the L3 cache.


[Chart: POV-Ray per-core power scaling. Red = Full Package, Blue = CPU Core only (minus the L3, we think)]
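
The non-core share described above can be estimated by fitting the scaling data to a straight line, package power = non-core + (active cores x watts per core), where the intercept is the non-core estimate. The sketch below shows the idea; the sample points are rough placeholders with the right general shape, not our measured data.

    # Illustrative sketch: estimate a core vs. non-core power split from a
    # per-core scaling sweep by least-squares fitting package_watts as
    # (non_core + active_cores * watts_per_core). The points below are
    # placeholders, NOT the review's measurements.

    def fit_core_noncore(samples):
        """samples: list of (active_cores, package_watts); returns (W/core, non-core W)."""
        n = len(samples)
        sx = sum(c for c, _ in samples)
        sy = sum(w for _, w in samples)
        sxx = sum(c * c for c, _ in samples)
        sxy = sum(c * w for c, w in samples)
        per_core = (n * sxy - sx * sy) / (n * sxx - sx * sx)
        non_core = (sy - per_core * sx) / n
        return per_core, non_core

    if __name__ == "__main__":
        placeholder = [(8, 130.0), (16, 165.0), (32, 220.0), (64, 280.0)]
        per_core, non_core = fit_core_noncore(placeholder)
        print(f"~{per_core:.2f} W per core, ~{non_core:.0f} W non-core")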

We only hit the peak 280 W at 56-core loading; below that, total package power is a steady climb, while per-core power falls from 7 W/core at light loading to about 3 W/core when fully loaded. What this does for core frequencies is relatively interesting.

Our system starts around 4200 MHz, which is the rated turbo frequency, settling down to 4000-4050 MHz across the 8-core to 20-core loading. After 20 cores, it is a slow decline of roughly 25 MHz per extra core loaded, until at full CPU load we observe 3100 MHz on all cores. This is above the 2700 MHz base frequency, and comes out to 2.86 W per core in CPU-only power, or 4.37 W per core if we also include non-CPU power. Note that non-CPU power in this case might also include the L3.
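
As a quick sanity check on those per-core figures, the arithmetic below uses the rounded numbers quoted above; the small difference from the 4.37 W/core figure presumably comes from the measured package power sitting fractionally under the 280 W cap.

    # Quick arithmetic check on the fully loaded per-core figures quoted above,
    # using rounded values from the text rather than the raw measurements.
    cores = 64
    package_w = 280.0          # package power at full load
    core_only_per_core = 2.86  # CPU-core-only power per core from the chart

    core_total = core_only_per_core * cores   # ~183 W spent in the cores
    non_core = package_w - core_total         # ~97 W left for non-core (likely including L3)
    print(f"core total ~ {core_total:.0f} W, non-core ~ {non_core:.0f} W "
          f"({non_core / package_w:.0%} of package), "
          f"{package_w / cores:.2f} W/core including non-core power")

By this arithmetic the non-core share at full load works out to roughly 35%, a little below the ~40% trend seen across the lighter-loaded part of the sweep.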

For an actual workload, our 3DPMavx test is a bit more aggressive than POV-Ray, cycling to full load for ten seconds on each of its six algorithms and then idling for a short time between them. In this test we saw idle frequencies of 2700 MHz, but all-core loading ran from at least 2900 MHz up to 3200 MHz. Power was again firmly limited to 280 W.

Comments

  • avb122 - Tuesday, February 9, 2021

    Those cases do not matter unless you are checking that the result is the same as a golden reference. Otherwise the image it creates is just as if the object being rendered moved 10 micrometers. To our brain it doesn't matter.

    Being off by one bit with FP32 for geometry is about the same magnitude as modeling light as a particle instead of a wave. For color intensity, one bit of FP32 is less than one photon in real-world cases.

    But CPUs and GPUs all get the same answer when doing the same FP32 arithmetic. The programmer can choose to do something else, like using lossy texture compression or goofy rounding modes.
  • avb122 - Tuesday, February 9, 2021

    It's not because of the hardware. AMD's and NVIDIA's GPUs have IEEE-compliant FPUs, so they get the same answer as the CPU when using the same algorithm.

    With CUDA, the same C or C++ code doing computations can run on the CPU and GPU and get the same answer.

    The REAL reasons not to use a GPU are that the non-compute parts (threading, memory management, synchronization, etc.) are different on the GPU, and that not all GPUs support CUDA. Those are very good reasons. But it is not about the hardware. It is about the software ecosystem.

    Also, GPUs do not have a tiny amount of cache. They have more total cache than a CPU; the ratio of cache to "threads" is just lower, which requires changing the size of the block that each "thread" operates on. Ultimately, GPUs have so much more internal and external bandwidth than a CPU that only in extreme cases, where everything fits in the CPUs' L1 caches but not in the GPU's register file, can a CPU have more bandwidth.

    Ian's statement about wanting 36 bits so that it can do 12-bit color is way off. I only know CUDA and NVIDIA's OpenGL. For those, each color channel is represented by a non-SIMD register. Each color channel is then either an FP16 or FP32 value (before neural networks, GPUs were not faster at FP16; it was just for memory capacity and bandwidth). Both cover 12-bit color space. Remember, games have had HDR for almost two decades.
  • Dug - Tuesday, February 9, 2021

    It's software.

    But sometimes you don't want perfect. It can work to your benefit depending on what end results you view and interpret.
  • Smell This - Tuesday, February 9, 2021


    Page 4
    Cinebench R20
    Paragraph below the first image
    **Results for Cinebench R20 are not comparable to R15 or older, because both the scene being used is different, but also the updates in the code bath. **

    I do like my code clean ...
  • alpha754293 - Tuesday, February 9, 2021

    It's a pity for the processor, and the platform, that you can buy a used dual EPYC 7702 server and still reap the multithreaded performance of 128 cores/256 threads, more than you would be able to get out of this processor.

    I wish this review had actually included results from a dual EPYC 7702/7742 system for the purposes of comparing the two, as I think the dual EPYC 7702/7742 would still outperform this Threadripper Pro 3995WX.
  • Duncan Macdonald - Tuesday, February 9, 2021

    Given the benchmarks and the prices, the main reason for using the Threadripper Pro rather than the plain Threadripper is likely to be the higher memory capacity (2TB vs 256GB).
    Even a small overclock on a standard Threadripper would allow it to be faster than a non-overclocked Threadripper Pro for any application that fits into 256GB.
  • twtech - Tuesday, February 9, 2021

    There are a couple of other pretty significant differences that matter perf-wise in some scenarios: the Pro has 8-channel memory support and more PCIe lanes.

    Significant differences not directly tied to performance include registered ECC support, and management tools for corporate security, which actually matters quite a bit with everyone working remotely.
  • WaltC - Tuesday, February 9, 2021

    On the whole, a nice review...;)

    Yes, it's fairly obvious that one CPU core does not equal one GPU core, as comparatively, the latter is wide and shallow and handles fewer instructions, IPC, etc. GPU cores are designed for a specific, narrow use case, whereas CPU cores are much deeper (in several ways) and designed for a much wider use case. It's nice that companies are designing programming languages to utilize GPUs as untapped computing resources, but the bottom line is that GPUs are designed primarily to accelerate 3d graphics and CPUs are designed for heavy, multi-use, multithreaded computation with a much deeper pipeline, etc. While it might make sense to use both GPUs and CPUs together in a more general computing case once the specific-case programming goals for each kind of processing hardware are reached, it makes no sense to use GPUs in place of CPUs or CPUs in place of GPUs. AMD has recently made no secret that it is dividing its GPU line into one branch that provides more 3d-acceleration circuitry and less compute circuitry for gaming, and another branch that will include more CU circuitry and less gaming-use 3d-acceleration circuitry. 'bout time.

    The software rendering of Crysis is a great example--an old, relatively slow 3d GPU accelerator with a CPU can bust the chops of even a 3995WX *if* the 3995WX is tasked with rendering Crysis sans a 3d accelerator. When the Crysis engine talks about how many cores and so on it will support, it's talking about using a 3d accelerator *with* a general-purpose CPU. That's what the engine is designed to do, actually. Take the CPU out and the engine won't run at all--try to use the CPU as the API renderer and it's a crawl that no one wants...;) Most of all, using the CPU to "render" Crysis in software has no comparison to a CPU rendering a ray-traced scene, for instance. Whereas the CPU is rendering to a software D3d API in Crysis, ray-tracing is done by far more complex programming that will not be found in the Crysis engine (of course.)

    I was surprised to read that Ian didn't think that 8-channel memory would add much of anything to performance beyond 4-channel support...;) Eh? It's the same principle as expecting 4-channel to outperform 2-channel, everything else being equal. Of course it makes a difference--if it didn't, there would be no sense in having the 3995WX support 8 channels. No point at all...;)
  • Oxford Guy - Tuesday, February 9, 2021

    Yes, the same principle of expecting a dual core to outperform a single core — which is why single-core CPUs are still dominant.

    (Or, we could recognize that diminishing returns only begin to matter at a certain point.)
  • tyger11 - Tuesday, February 9, 2021

    Definitely waiting for the Zen 3 version of the 3955X. I'm fine with 16 cores.
