Conclusions: Faster Than Expected

When I started testing for this review, looking purely at the specification sheet, I was expecting AMD’s Threadripper Pro 3995WX to come in just behind the 3990X in most of our testing. The same number of cores and the same TDP, but slightly lower frequencies in exchange for double the memory channels and 8x the memory capacity (plus the Pro feature set). Our processor comparisons usually test systems with identical memory configurations, or we treat the memory difference as a minor factor in most of our testing. After going through the end data for this review, it would appear that it makes more of a difference than we had initially thought.

In the tests that matter, most notably the 3D rendering tests, we’re seeing a 3% speed-up on the Threadripper Pro compared to the regular Threadripper at the same memory frequency and sub-timings. The core frequencies favour the 3990X, but the extra memory bandwidth of the 3995WX is clearly helping to a small degree, enough to pull ahead in our testing, along with the benefit of 8x the memory capacity as well as Pro features for proper enterprise-level administration.
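For a sense of scale, here is a quick back-of-the-envelope sketch (illustrative only, assuming DDR4-3200 on every populated channel and the standard 64-bit bus per channel) of the theoretical peak DRAM bandwidth of the two platforms:

    def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bytes_per_transfer=8):
        # Peak bandwidth = channels x transfer rate x bytes per transfer
        return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9

    # Threadripper 3990X: quad-channel DDR4-3200
    print(peak_bandwidth_gb_s(4, 3200))   # ~102.4 GB/s
    # Threadripper Pro 3995WX: eight-channel DDR4-3200
    print(peak_bandwidth_gb_s(8, 3200))   # ~204.8 GB/s

Sustained bandwidth in real workloads will sit well below these peaks, but the doubled headroom is consistent with the small advantage we measured in bandwidth-sensitive rendering workloads.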

The downside of this comparison is the cost: the SEP difference is +$1500, or another 50%, for the Threadripper Pro 3995WX over the regular Threadripper 3990X. With this price increase, you’re not really paying +50% for the performance difference, but for the feature set (ECC memory also costs a good amount). Threadripper Pro is aimed at the visual effects and rendering market, where holding 3D models in main memory is a key aspect of workflow speed as well as full-scene production. Alongside the memory capacity difference, having double the PCIe 4.0 lanes means more access to offload hardware or additional fast storage, also important tools in the visual effects space. Threadripper Pro falls very much into the bucket of 'if you need it, this is the option to go for'.

For our testing, we used the Lenovo ThinkStation P620, the first Threadripper Pro system available in the market, and we’ll have a full review of it shortly. Lenovo’s ThinkStation systems are always well-designed workstations built with longevity and professional workloads in mind, offering 280 W cooling with an interesting heatsink plus additional custom DRAM fans, a unique motherboard with an easily removable power supply, and support and space for a number of add-in cards. Lenovo’s units, if you buy them individually from the website, are eye-wateringly expensive (+$12200 for the 64-core CPU, a +120% markup), and any design studio that wants to test or order these units is advised to work through a local distributor.

AMD is set to push Threadripper Pro into the consumer and commercial markets beyond Lenovo later this quarter. We have been in touch with regional system integrators who are already examining their options based on the three Threadripper Pro motherboards set to be available from ASUS, GIGABYTE, and Supermicro. We are expecting a range of options to be available, and most design studios are likely to order pre-built systems with a variety of air and liquid cooling.

What might confuse a few users is that AMD is launching Threadripper Pro into the wider market now, right on the cusp of its next-generation EPYC launch in the next eight weeks. Those new EPYC processors should afford a sizeable raw compute upgrade in moving to Zen 3 cores, all while Threadripper Pro stays on Zen 2. As we saw when comparing TR Pro to EPYC in this review, both on Zen 2, in some circumstances it is the push up to 280 W that gives TR Pro its best performance, and a 280 W version of next-generation EPYC might seem more appealing to users looking at TR Pro today. What exactly AMD will launch for EPYC is unknown, whereas TR Pro on this generation is now a known performance quantity that system integrators are building on for the workstation market. EPYC never really fit into the workstation market that easily, which is why TR Pro exists today.

We have heard some conflicting dates as to when exactly Threadripper Pro will come to the mass market beyond Lenovo, but they all fall within Q1. We have reached out to AMD in order to source the other processors for our testing.

118 Comments

  • avb122 - Tuesday, February 9, 2021 - link

    Those cases do not matter unless you are checking that the result is the same as a golden reference. Otherwise the image it creates is just as if the object it was rendering moved 10 micrometers. To our brain it doesn't matter.

    Being off by one bit with FP32 for geometry is about the same magnitude as modeling light as a particle instead of a wave. For color intensity, one bit of FP32 is less than one photon in real world cases.

    But, CPUs and GPUs all get the same answer when doing the same FP32 arithmetic. The programmer can choose to do something else like use lossy texture compression or goofy rounding modes.
  • avb122 - Tuesday, February 9, 2021 - link

    It's not because of the hardware. AMD's and NVIDIA's GPUs have IEEE-compliant FPUs, so they get the same answer as the CPU when using the same algorithm.

    With CUDA, the same C or C++ code doing computations can run on the CPU and GPU and get the same answer.

    The REAL reasons not to use a GPU are that the non-compute parts (threading, memory management, synchronization, etc.) are different on the GPU and not all GPUs support CUDA. Those are very good reasons. But it is not about the hardware. It is about the software ecosystem.

    Also, GPUs do not have a tiny amount of cache. They have more total cache than a CPU. The ratio of "threads" to cache is lower. That requires changing the size of the block that each "thread" operates on. Ultimately, GPUs have so much more internal and external bandwidth than a CPU that only in extreme cases where everything fits in the CPU's L1 caches but not in the GPU's register file can a CPU have more bandwidth.

    Ian's statement about wanting 36 bits so that it can do 12-bit color is way off. I only know CUDA and NVIDIA's OpenGL. For those, each color channel is represented by a non-SIMD register. Each color channel is then either an FP16 or FP32 value (before neural networks, GPUs were not faster at FP16; it was just for memory capacity and bandwidth). Both cover 12-bit color space. Remember, games have had HDR for almost two decades.
  • Dug - Tuesday, February 9, 2021 - link

    It's software.

    But sometimes you don't want perfect. It can work in your benefit depending on what end results you view and interpret.
  • Smell This - Tuesday, February 9, 2021 - link


    Page 4
    Cinebench R20
    Paragraph below the first image
    **Results for Cinebench R20 are not comparable to R15 or older, because both the scene being used is different, but also the updates in the code bath. **

    I do like my code clean ...
  • alpha754293 - Tuesday, February 9, 2021 - link

    It's a pity that, for the price of this processor and platform, you can buy a used dual EPYC 7702 server and reap the multithreaded performance of 128 cores/256 threads, more than you would be able to get out of this processor.

    I'd wished that this review actually included the results of a dual EPYC 7702/7742 system for the purposes of comparing the two, as I think that the dual EPYC 7702/7742 would still outperform this Threadripper Pro 3995WX.
  • Duncan Macdonald - Tuesday, February 9, 2021 - link

    Given the benchmarks and the prices, the main reason for using the Threadripper Pro rather than the plain Threadripper is likely to be the higher memory capacity (2TB vs 256GB) .
    Even a small overclock on a standard Threadripper would allow it to be faster than a non-overclocked Threadripper Pro for any application that fits into 256GB.
  • twtech - Tuesday, February 9, 2021 - link

    There are a couple other pretty significant differences that matter perf-wise in some scenarios - the Pro has 8-channel memory support, and more PCIE lanes.

    Significant differences not directly tied to performance include registered ECC support, and management tools for corporate security, which actually matters quite a bit with everyone working remotely.
  • WaltC - Tuesday, February 9, 2021 - link

    On the whole, a nice review...;)

    Yes, it's fairly obvious that one CPU core does not equal one GPU core, as comparatively, the latter is wide and shallow and handles fewer instructions, IPC, etc. GPU cores are designed for a specific, narrow use case, whereas CPU cores are much deeper (in several ways) and designed for a much wider use case. It's nice that companies are designing programming languages to utilize GPUs as untapped computing resources, but the bottom line is that GPUs are designed primarily to accelerate 3d graphics and CPUs are designed for heavy, multi-use, multithreaded computation with a much deeper pipeline, etc. While it might make sense to use both GPUs and CPUs together in a more general computing case once the specific-case programming goals for each kind of processing hardware are reached, it makes no sense to use GPUs in place of CPUs or CPUs in place of GPUs. AMD has recently made no secret it is divulging its GPU line to provide more 3d-acceleration circuitry and less compute circuitry for gaming, and another branch that will include more CU circuitry and less gaming-use 3d-acceleration circuitry. 'bout time.

    The software rendering of Crysis is a great example--an old, relatively slow 3d GPU accelerator with a CPU can bust the chops of even a 3995WX *if* the 3995WX is tasked with rendering Crysis sans a 3d accelerator. When the Crysis engine talks about how many cores and so on it will support, it's talking about using a 3d accelerator *with* a general-purpose CPU. That's what the engine is designed to do, actually. Take the CPU out and the engine won't run at all--trying to use the CPU as the API renderer and it's a crawl that no one wants...;) Most of all, using the CPU to "render" Crysis in software has no comparison to a CPU rendering a ray-traced scene, for instance. Whereas the CPU is rendering to a software D3D API in Crysis, ray-tracing is done by far more complex programming that will not be found in the Crysis engine (of course.)

    I was surprised to read that Ian didn't think that 8-channel memory would add much of anything to performance beyond 4-channel support....;) Eh? It's the same principle as expecting 4-channel to outperform 2 channel, everything else being equal. Of course, it makes a difference--if it didn't there would be no sense in having 3995WX support 8 channels. No point at all...;)
  • Oxford Guy - Tuesday, February 9, 2021 - link

    Yes, the same principle of expecting a dual core to outperform a single core — which is why single-core CPUs are still dominant.

    (Or, we could recognize that diminishing returns only begin to matter at a certain point.)
  • tyger11 - Tuesday, February 9, 2021 - link

    Definitely waiting for the Zen 3 version of the 3955WX. I'm fine with 16 cores.
