Retesting AMD Ryzen Threadripper’s Game Mode: Halving Cores for More Performance
by Ian Cutress on August 17, 2017 12:01 PM EST
CPU Legacy Tests
Our legacy tests represent benchmarks that were once at the height of their time. Some of these are industry standard synthetics, and we have data going back over 10 years. All of the data here has been rerun on Windows 10, and we plan to go back several generations of components to see how performance has evolved.
All of our benchmark results can also be found in our benchmark engine, Bench.
3D Particle Movement v1
3DPM is a self-penned benchmark, taking basic 3D movement algorithms used in Brownian Motion simulations and testing them for speed. High floating-point performance, MHz, and IPC win in the single-threaded version, whereas the multithreaded version has to handle the threads and loves more cores. This is the original version, written in the style of a typical non-computer-science student coding up an algorithm for their theoretical problem, and it comes without any non-obvious optimizations beyond those already performed by the compiler, such as avoiding false sharing.
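To give a rough idea of the kind of workload described above, here is a minimal single-threaded sketch of a 3D random-walk particle simulation. It is not the 3DPM source: the function names (`random_unit_vector`, `simulate`, `mean_squared_displacement`), the unit step length, and the direction-sampling method are all assumptions chosen for illustration.

```python
import math
import random


def random_unit_vector(rng):
    """Pick a direction uniformly on the unit sphere (illustrative method)."""
    z = rng.uniform(-1.0, 1.0)               # height along the z axis
    theta = rng.uniform(0.0, 2.0 * math.pi)  # angle around the z axis
    r = math.sqrt(max(0.0, 1.0 - z * z))     # radius of the circle at height z
    return (r * math.cos(theta), r * math.sin(theta), z)


def simulate(num_particles, num_steps, seed=0):
    """Move each particle one unit step in a random direction per time step."""
    rng = random.Random(seed)  # seeded so repeated runs give identical results
    positions = [[0.0, 0.0, 0.0] for _ in range(num_particles)]
    for _ in range(num_steps):
        for p in positions:
            dx, dy, dz = random_unit_vector(rng)
            p[0] += dx
            p[1] += dy
            p[2] += dz
    return positions


def mean_squared_displacement(positions):
    """Average squared distance of the particles from the origin."""
    total = sum(x * x + y * y + z * z for x, y, z in positions)
    return total / len(positions)


if __name__ == "__main__":
    final = simulate(num_particles=1000, num_steps=100)
    print(mean_squared_displacement(final))
```

The inner loop is almost entirely floating-point math on independent particles, which is why a workload of this shape rewards frequency and IPC when run on one thread and scales with core count when the particles are split across threads.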
CineBench 11.5 and 10
Cinebench is a widely known benchmarking tool for measuring performance relative to MAXON's animation software Cinema 4D. Cinebench has been optimized over a decade and focuses purely on CPU horsepower, meaning that if there is a discrepancy in pure throughput characteristics, Cinebench is likely to show it. Arguably other software doesn't make use of all the tools available, so the real-world relevance might be purely academic, but given our large database of Cinebench results it seems difficult to ignore a small five-minute test. We run the modern version 15 in this test, as well as the older 11.5 and 10 due to our back data.
x264 HD 3.0
Similarly, the x264 HD 3.0 package we use here is also kept for historic regression data. The latest version is 5.0.1, and encodes a 1080p video clip into a high-quality x264 file. Version 3.0 only performs the same test on a 720p file, and in most circumstances the software hits its performance limit on high-end processors, but it still works well for mainstream and low-end parts. Also, this version only takes a few minutes, whereas the latest can take over 90 minutes to run.
The 1950X: the first CPU to score higher on the 2nd pass of this test than it does on the first pass.
104 Comments
peevee - Friday, August 18, 2017 - link
Compilation scales even on multi-CPU machines. With much higher communication latencies.
In general, compilers running in parallel on MSVC (with MSBuild) run in different processes, they don't write into each other's address spaces and so do not need to communicate at all.
Quit making excuses. You are doing something wrong. I am doing development for multi-CPU machines and ON multi-CPU machines for a very long time. YOU are doing something wrong.
peevee - Friday, August 18, 2017 - link
BTW, when you enable NUMA on TR, does Windows 10 recognize it as one CPU group or 2?
gzunk - Saturday, August 19, 2017 - link
It recognizes it as two NUMA nodes.
Alexey291 - Saturday, September 2, 2017 - link
They aren't going to do anything.
All their 'scientific benchmarking' is running the same macro again and again on different hardware setups.
What you are suggesting requires actual work and thought.
Arbie - Thursday, August 17, 2017 - link
As noted by edzieba, the correct phrase (and I'm sure it has a very British heritage) is "The proof of the pudding is in the eating".
Another phrase needing repair: "multithreaded tests were almost halved to the 1950X". Was this meant to be something like "multithreaded tests were almost half of those in Creator mode" (?).
Technically, of course, your articles are really well-done; thanks for all of them.
fanofanand - Thursday, August 17, 2017 - link
Thank you for listening to the readers and re-testing this, Ian!
ddriver - Thursday, August 17, 2017 - link
To sum it up - "game mode" is moronic. It is moronic for amd to push it, and to push TR as a gaming platform, which is clearly neither its peak, nor even its strong point. It is even more moronic for people to spend more than double the money just to have half of the CPU disabled, and still get worse performance than a ryzen chip.
TR is great for prosumers, and represents a tremendous value and performance at a whole new level of affordability. It will do for games if you are a prosumer who occasionally games, but if you are a gamer it makes zero sense. Having AMD push it as a gaming platform only gives "people" the excuse to whine how bad it is at it.
Also, I cannot shake the feeling there should be a better way to limit scheduling to half the chip for games without having to disable the rest, so it is still usable to the rest of the system.
Gothmoth - Thursday, August 17, 2017 - link
first coders should do their job.. that is the main problem today. lazy and uncompetent coders.
eriohl - Thursday, August 17, 2017 - link
Of course you could limit thread scheduling on software level. But it seems to me that there is a perfectly reasonable explanation why Microsoft and the game developers haven't been spending much time optimizing for running games on systems with NUMA.
HomeworldFound - Thursday, August 17, 2017 - link
You can't call a coder that doesn't anticipate a 16 core 32 thread CPU lazy. The word is incompetent btw. I'd like to see you make a game worth millions of dollars and account for this processor, heck any processor with more than six cores.