The CPU Overload 2020 Suite

Our new CPU tests cover a number of main areas: web tests using our un-updateable version of Chromium, opening tricky PDFs, emulation, brain simulation, AI, 2D image to 3D model conversion, rendering (ray tracing, modeling), encoding (compression, AES, video and HEVC), office-based tests, and our legacy tests (throwbacks from another generation of code, but interesting to compare). Over the next few pages we’ll give a high-level overview of each test.

However, as mentioned in passing on the previous page, we run a number of registry edit commands again at the start of the benchmark suite to ensure that various system features stay turned off. This includes disabling Cortana, disabling GameDVR functionality, disabling Windows Error Reporting, disabling Windows Defender as much as possible (again), disabling Windows Update, re-applying our power options, and removing OneDrive, in case it sprouted wings again.
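
For the curious, the commands are roughly of the following form. This is a minimal sketch in Python rather than our actual script, and the keys shown are only the commonly documented ones; Windows Defender in particular needs more than one key to stay quiet.

    # Minimal sketch of the pre-run registry tweaks (requires elevation).
    # The real script sets many more keys than shown here.
    import winreg

    TWEAKS = [
        # (hive, key path, value name, DWORD data)
        (winreg.HKEY_LOCAL_MACHINE,
         r"SOFTWARE\Policies\Microsoft\Windows\Windows Search",
         "AllowCortana", 0),                 # disable Cortana
        (winreg.HKEY_LOCAL_MACHINE,
         r"SOFTWARE\Policies\Microsoft\Windows\GameDVR",
         "AllowGameDVR", 0),                 # disable GameDVR
        (winreg.HKEY_LOCAL_MACHINE,
         r"SOFTWARE\Microsoft\Windows\Windows Error Reporting",
         "Disabled", 1),                     # disable error reporting
        (winreg.HKEY_LOCAL_MACHINE,
         r"SOFTWARE\Policies\Microsoft\Windows Defender",
         "DisableAntiSpyware", 1),           # tame Defender, partially
    ]

    for hive, path, name, data in TWEAKS:
        key = winreg.CreateKeyEx(hive, path, 0, winreg.KEY_SET_VALUE)
        winreg.SetValueEx(key, name, 0, winreg.REG_DWORD, data)
        winreg.CloseKey(key)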

A number of these tests have been requested by our readers, and we’ve split the suite into a few more categories than normal because readers have been asking for tests that focus on their specific workloads. A recent run on a Core i5-10600K, for the CPU tests alone, took around 20 hours to complete.

Power

  • Peak Power (y-Cruncher using latest AVX)
  • Per-Core Loading Power using POV-Ray

Office

  • Agisoft Photoscan 1.3: 2D to 3D Conversion
  • Application Loading Time: GIMP 2.10.18 from a fresh install
  • Compile Testing (WIP)

Science

  • 3D Particle Movement v2.1 (Non-AVX + AVX2/AVX512)
  • y-Cruncher 0.78.9506 (Optimized Binary Splitting Compute for mathematical constants)
  • NAMD 2.13: Nanoscale Molecular Dynamics on ApoA1 protein
  • AI Benchmark 0.1.2 using TensorFlow (unoptimized for Windows)

Simulation

  • DigiCortex 1.35: Brain (neuron and synapse) simulation
  • Dwarf Fortress 0.44.12: Fantasy world creation and time passage
  • Dolphin 5.0: Ray tracing render test inside the Wii emulator

Rendering

  • Blender 2.83 LTS: Popular rendering program, using PartyTug frame render
  • Corona 1.3: Ray Tracing Benchmark
  • Crysis CPU-Only: Can it run Crysis? What, on just the CPU at 1080p? Sure
  • POV-Ray 3.7.1: Another Ray Tracing Test
  • V-Ray: Another popular renderer
  • CineBench R20: Cinema4D Rendering engine

Encoding

  • HandBrake 1.3.2: Popular transcoding tool
  • 7-Zip: Open source compression software
  • AES Encoding: Instruction-accelerated encryption
  • WinRAR 5.90: Popular compression tool

Legacy

  • CineBench R10
  • CineBench R11.5
  • CineBench R15
  • 3DPM v1: Naïve version of 3DPM v2.1 with no acceleration
  • x264 HD 3.0: Vintage transcoding benchmark

Web

  • Kraken 1.1: Deprecated web test with no successor
  • Octane 2.0: More comprehensive test (but also deprecated with no successor)
  • Speedometer 2: List-based web test using different frameworks

Synthetic

  • Geekbench 4
  • AIDA Memory Bandwidth
  • Linux OpenSSL Speed (rsa2048 sign/verify, sha256, md5; invocation sketched after this list)
  • LinX 0.9.5 LINPACK
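
The OpenSSL numbers come straight from the standard command-line tool; a sketch of the invocation, wrapped in Python for capture, looks something like this (parsing of the output is left out):

    # Sketch: run OpenSSL's built-in speed test and capture its output.
    # 'openssl speed' prints sign/s, verify/s, and throughput tables.
    import subprocess

    result = subprocess.run(
        ["openssl", "speed", "rsa2048", "sha256", "md5"],
        capture_output=True, text=True)
    print(result.stdout)   # parse the summary lines from here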

SPEC (Estimated)

  • SPEC2006 rate-1T
  • SPEC2017 rate-1T
  • SPEC2017 rate-nT

It should be noted that, under the terms of the SPEC license, results that have not been vetted directly by the SPEC consortium must be labeled ‘estimated’. The benchmarks still run in full and produce results as normal; those results simply have to carry the ‘estimated’ label.

Others

  • A full x86 instruction throughput/latency analysis
  • Core-to-Core Latency
  • Cache-to-DRAM Latency
  • Frequency Ramping (the method is sketched after this list)
  • A y-Cruncher ‘sprint’ to see how 0.78.9506 scales with increasing digit counts
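
For the frequency ramp test, the core idea is simple: let a core idle down, then time identical chunks of work back-to-back as load is applied, and watch the chunk times shrink as the OS and firmware raise the frequency. A minimal sketch of the shape of the method (our actual harness measures at much finer granularity than interpreted Python allows):

    # Sketch of a frequency ramp measurement: idle first, then time
    # fixed work chunks back-to-back as the core ramps from idle
    # clocks up to turbo.
    import time

    def chunk():                  # fixed amount of work per sample
        x = 0
        for i in range(200_000):
            x += i * i
        return x

    time.sleep(2)                 # let the core drop to idle clocks
    t0 = time.perf_counter()
    samples = []
    while time.perf_counter() - t0 < 0.5:     # sample for ~500 ms
        s = time.perf_counter()
        chunk()
        samples.append((s - t0, time.perf_counter() - s))

    for offset, dt in samples[:20]:
        print(f"{offset*1000:7.2f} ms in: chunk took {dt*1000:.3f} ms")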

Some of these tests also have AIDA power wrappers around them in order to provide insight into how power consumption behaves through the test.
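
The wrapper concept itself is straightforward: sample a power reading on a background thread while the benchmark runs in the foreground. A minimal sketch, where read_package_power() is a hypothetical stand-in for however the logging tool exposes its sensor data:

    # Sketch of a power wrapper: sample power on a background thread
    # while the benchmark runs. read_package_power() is a hypothetical
    # stand-in for the real sensor hook (AIDA in our case).
    import subprocess
    import threading
    import time

    def read_package_power():
        return 0.0                # placeholder: swap in a real sensor

    samples, stop = [], threading.Event()

    def sampler(interval=0.1):
        while not stop.is_set():
            samples.append((time.time(), read_package_power()))
            time.sleep(interval)

    t = threading.Thread(target=sampler, daemon=True)
    t.start()
    subprocess.run(["benchmark.exe"])   # placeholder benchmark command
    stop.set()
    t.join()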

2020 CPU Gaming (GPU) Benchmarks

For our new set of CPU Gaming tests, we wanted to think big. There are a lot of users in the ecosystem that prioritize gaming above all else, especially when it comes to choosing the correct CPU. If there is a chance to save $50 and get a better graphics card for no loss in performance from the CPU, then this is the route that gamers would prefer to tread. The angle here though is tough - lots of games have different requirements and cause different stresses on a system, with various graphics cards having different reactions to the code flow of a game. Then users also have different resolutions and different perceptions of what feels 'normal'. This all amounts to more degrees of freedom than we could hope to test in a lifetime, only for the data to become irrelevant in a few months when a new game or new GPU comes into the mix. Just for good measure, let us add in DirectX 12 titles that make it easier to use more CPU cores in a game to enhance fidelity.

When it comes down to gaming tests, some of the same rules apply as for the CPU tests. If we can get standalone versions of tests, then perfect – even better if they will never update, because that gives us a consistent codebase to work with. However, given the nature of Steam or Origin or the Epic Games Store, having a consistent code base is not always possible. So for our gaming tests, where we could find offline DRM-free variants (such as those from GOG), we used those instead. Otherwise we rely on Steam for the most part, because it is the only storefront that offers an external API to let us check whether an account is online – and thus allows a single account to be used across multiple systems. When scaling out automation, multiple accounts can be difficult to deal with, so as we aim for fewer than 10 systems running simultaneously, one account is enough.
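
For reference, that online check can be done through Steam’s public Web API. A minimal sketch, assuming a valid API key and the account’s 64-bit SteamID (both placeholders below):

    # Sketch: ask Steam's public Web API whether the shared benchmark
    # account is currently online before a test bed tries to log in.
    # STEAM_API_KEY and STEAM_ID are placeholders.
    import requests

    STEAM_API_KEY = "..."    # from steamcommunity.com/dev/apikey
    STEAM_ID = "..."         # 64-bit SteamID of the benchmark account

    def account_is_online():
        r = requests.get(
            "https://api.steampowered.com/ISteamUser/GetPlayerSummaries/v2/",
            params={"key": STEAM_API_KEY, "steamids": STEAM_ID},
            timeout=10)
        players = r.json()["response"]["players"]
        # personastate 0 means offline; anything else means in use
        return bool(players) and players[0]["personastate"] != 0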

I could speak for a few days about the gripes of automating gaming benchmarks – the titles that do it well compared to the ones that give no consideration to anyone who wants to run an in-game benchmark repeatedly. There’s also the discussion of in-game benchmarks vs native benchmarks, which I’ve had many times with colleagues and peers, and which I might go into depth on sometime. But I have thrown benchmark titles out for the stupidest things – updates that cause *new* splash screens are why I’ve cut games like AoTS and Civ6 in the past. Or Ubisoft games that offer benchmark modes but do not output benchmark results files. Or titles that create HTML files that need to be pruned for the correct data, rather than a simple text file. Or shall we go into games that keep their settings not in simple ini files, but embedded in the registry!? Total War gets thrown out for not allowing key presses in its menus, and then having cheat detection trip when you try to emulate mouse movements. I have, on multiple occasions, spent a day of work trying to code for a game that just doesn’t want to work – as a result, it gets thrown out of our benchmark suite.
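
Pruning an HTML results file, for instance, ends up as a little scraping routine like the sketch below. The pattern here is hypothetical; every title needs its own.

    # Sketch: prune a game's HTML results report for the one number we
    # actually want. The regex is hypothetical; each title differs.
    import re

    def extract_avg_fps(html_path):
        text = open(html_path, encoding="utf-8", errors="ignore").read()
        m = re.search(r"Average\s*(?:FPS|Frame\s*Rate)[^0-9]*([0-9.]+)",
                      text, re.IGNORECASE)
        return float(m.group(1)) if m else None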

In the past, we’ve tackled the GPU benchmark set in several different ways. We’ve had one GPU run multiple games at one resolution, then multiple GPUs each take a few games at one resolution, and then, as the automation progressed into something better, multiple GPUs take a few games at several resolutions. However, based on feedback, running the best GPU we can get hold of through over a dozen games at several resolutions seems to be the best bet.

Normally securing GPUs for this testing is difficult, as we need several identical models for concurrent testing, and very rarely is a GPU manufacturer, or one of its OEM partners, happy to hand me 3-4+ of the latest and greatest. In that aspect, over the years, I have to thank ECS for sending us four GTX 580s in 2012, MSI for sending us three GTX 770 Lightnings in 2014, Sapphire for sending us multiple RX 480s and R9 Fury X cards in 2016, and in our last test suite, MSI for sending us three GTX 1080 Gaming cards in 2018.

For our testing on the 2020 suite, we have secured three RTX 2080 Ti GPUs direct from NVIDIA. These are the GPUs that drivers and gaming titles are most heavily optimized for, and given how rare our suite updates are, we are thankful to have the high-end hardware. (It’s worth noting we won’t be updating to whatever RTX 3080 variant comes out at some point for a while yet.)

On the topic of resolutions, this is something that has been hit and miss for us in the past. Some users state that they want to see the lowest resolution and lowest fidelity options, such as 480p Ultra Low, because this puts the most strain on the CPU. In the past we have found this unrealistic for most use cases, and even if it does give the best shot at a difference in results, the actual point where you become GPU limited might be at a higher resolution. In our last test suite, we went from 720p Ultra Low up to 1080p Medium, 1440p High, and 4K Ultra settings. However, our most vocal readers hated it, because even by 1080p Medium we were GPU limited for the most part.

So to that end, the benchmarks this time around attempt to follow this basic pattern where possible:

  1. Lowest Resolution with lowest scaling, Lowest Settings
  2. 2560x1440 with the lowest settings (1080p where not possible)
  3. 3840x2160 with the lowest settings
  4. 1920x1080 at the maximum settings

Point (1) should give the ultimate CPU-limited scenario. We should see that limitation lift as we move up through (2) 1440p and (3) 4K, with 4K Low still being quite strenuous in some titles.

Point (4) is essentially our ‘real world’ test. The RTX 2080 Ti is overkill for 1080p Maximum, and we’ll see that most modern CPUs pull well over 60 FPS average in this scenario.

What will be interesting is that for some titles, 4K Low is more compute heavy than 1080p Maximum, and for other titles that relationship is reversed.

So we have the following benchmarks as part of our script, automated to the point of a one-button run, with the results popping out approximately 10 hours later per GPU. Also listed are the resolutions and settings used.

Offline Games

  1. Chernobylite, 360p Low, 1440p Low, 4K Low, 1080p Max
  2. Civilization 6, 480p Low, 1440p Low, 4K Low, 1080p Max
  3. Deus Ex: Mankind Divided, 600p Low, 1440p Low, 4K Low, 1080p Max
  4. Final Fantasy XIV, 768p Min, 1440p Min, 4K Min, 1080p Max
  5. Final Fantasy XV, 720p Standard, 1080p Standard, 4K Standard, 8K Standard
  6. World of Tanks enCore, 768p Min, 1080p Standard, 1080p Max, 4K Max

Online Games

  1. Borderlands 3, 360p VLow, 1440p VLow, 4K VLow, 1080p Badass
  2. F1 2019, 768p ULow, 1440p ULow, 4K ULow, 1080p Ultra
  3. Far Cry 5, 720p Low, 1440p Low, 4K Low, 1080p Ultra*
  4. Gears Tactics, 720p Low, 4K Low, 8K Low, 1080p Ultra
  5. Grand Theft Auto 5, 720p Low, 1440p Low, 4K Low, 1080p Max
  6. Red Dead Redemption 2, 384p Min, 1440p Min, 4K Min, 1080p Max
  7. Strange Brigade DX12, 720p Low, 1440p Low, 4K Low, 1080p Ultra
  8. Strange Brigade Vulkan, 720p Low, 1440p Low, 4K Low, 1080p Ultra

For each of the games in our testing, we take the frame times where we can (the two where we cannot are Chernobylite and FFXIV). For these games, at each resolution/setting combination, we run as many loops as fit within a given time limit (often 10 minutes per resolution). Results are then reported as average frame rates and 95th percentiles.
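
The reduction from raw frame times to those two numbers is simple arithmetic; a sketch:

    # Sketch: reduce a list of frame times (in ms) to the two figures
    # we report - average frame rate and a 95th percentile frame rate.
    def summarize(frametimes_ms):
        avg_fps = 1000.0 * len(frametimes_ms) / sum(frametimes_ms)
        ordered = sorted(frametimes_ms)
        # the 95th percentile frame time maps to the frame rate the
        # game stays above for 95% of rendered frames
        p95_ms = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
        return avg_fps, 1000.0 / p95_ms

    print(summarize([16.7, 16.9, 17.1, 33.3, 16.5]))   # toy data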

Some of the games are ultimately still being evaluated for usefulness, and may eventually be dropped – Far Cry 5 has taken more time than I care to admit to get working. Some of these titles require the exact CPU/GPU combination to be written into the settings file, otherwise the file is discarded, which gets increasingly frustrating.

*Update 7/20: I recently found that Far Cry 5 has additional requirements regarding monitor resolution support. If the settings file requests a resolution that the game cannot detect from the monitor on the test bed, then it defaults to 1080p. My test beds contain two brands of 4K monitor – Dell UP2415Qs and cheap 27-inch TN displays, in a 50:50 split. For whatever reason, FC5 doesn’t really like any resolution changes on the Dell monitors. I can adjust the resolution scale (0.5x-2.0x) and quality for this game, but I only found this out on 7/20, which means we have to rerun chips for this data.

If there are any game developers out there involved with any of the benchmarks above, please get in touch at ian@anandtech.com. I have a list of requests to make benchmarking your title easier!

The other angle is DRM, and some titles have limits of 5 systems per day. This may limit our testing in some cases; in other cases it is solvable.

Comments

  • Arbie - Monday, July 20, 2020

    I didn't realize how much work was being done. Thank you for maintaining this great resource.
  • Arbie - Monday, July 20, 2020

    And maybe consider the technically excellent and easily benchmarked Ashes of the Singularity instead of the problematic Far Cry 5. Not as popular but modern and multi-core (and a great game).
  • BushLin - Monday, July 20, 2020

    I suspect it's due to Far Cry 5's need for 8 threads, which manifests as stutter on 6c6t CPUs, in contrast to smooth gameplay on lower-clocked 4c8t CPUs.
  • Tilmitt - Tuesday, July 21, 2020

    Has anyone ever played Ashes as a game though?
  • Arbie - Tuesday, July 21, 2020

    1000+ hours so far. Glorious in all respects including phenomenal AI. But when Ashes is mentioned someone always pops a comment like yours, which they probably just read somewhere else since it certainly isn't based on actual experience. Still hurts the game, though.
  • driscoll42 - Monday, July 20, 2020

    This is awesome and amazing, I can't wait to see the results. And I hate to say "But what about", but maybe, if possible, go back to some of the popular older ones? No need to retest *everything*, but the most popular CPUs pre-2010 like the i7-920, Core 2 Quad Q6600, Core 2 Duo E8600, Core i7-870, etc...
  • ltcommanderdata - Monday, July 20, 2020

    If they are going to test a few LGA 775 CPUs, I'd vote to also include NetBurst's last gasp, aka the Pentium Extreme Edition 965, as a really old gen reference. It'd be interesting to include its then-competitor, the Socket 939 AMD Athlon 64 FX-60, as well. I've always been curious whether Hyper-Threading support allowed the Pentium EE 965 to age better than expected as multithreading became mainstream, possibly reducing the gap against the FX-60 and even early Core 2 Duo Conroe CPUs in modern software compared to the gap seen at launch.
  • mganai - Thursday, July 23, 2020

    How about the dual socket LGA 771 with two Core 2 Extreme QX9650s?

    https://www.youtube.com/watch?v=wNo7qoLRtkQ
  • aryonoco - Monday, July 20, 2020

    Epic work Ian. Epic!

    Now if only your publisher implemented a subscription model (a la Ars Technica) so I could still support your work without being bombarded by ads and tracked, I would feel a lot less guilty enjoying the fruits of your amazing work.
  • lmcd - Tuesday, July 21, 2020

    This 100%
