OS Preparation and Benchmark Installation

Windows 10 Pro

Since we started using Windows 10 Pro in our last update, there has been a large opportunity for something to come in and disrupt our testing. Windows 10 is known to kick in and check for updates at any hour of the day (and we’re testing 24 hours a day), so anything that can interrupt or take CPU time away from benchmarking is a bit of a hassle. There’s also the added element of Windows silently adjusting the update schedule and moving settings around in the registry without warning.

While we were building this latest suite, Microsoft launched Windows 10 version 2004. There is always a question as to what we should do in this regard: move to the absolute latest, or take a step back to something more stable with fewer bugs, even though it might not be as relevant. In order to not create any level of programming debt, in which lots of work is needed to fix the smallest issues that might arise, we often choose the latter. In this case, we are using Windows 10 version 1909 (18363.900). It has since transpired, from talking to peers, that 2004 has a number of issues that would affect benchmarking consistency, which validates our concerns.

Naturally, the first thing an OS wants to do when it starts up is connect to the internet and update. We install the OS without the internet connected, and our install image automatically sets the update period to the maximum period possible. The scripts we run are continuously updated to ensure that when the benchmark starts, the ‘don’t restart’ period for the OS is resynchronized to the latest possible time. There’s nothing worse than waking up in the morning to find that the system rebooted at 1am in the middle of a scripted run.
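As a rough illustration of that resynchronization, the sketch below computes a ‘don’t restart’ window that starts at the current hour and builds the matching registry commands. It is not our actual script. The 18-hour maximum, the registry key, and the value names ActiveHoursStart and ActiveHoursEnd are assumptions about how Windows 10 stores its active hours, and should be verified on the build being used.

```python
# Assumption: Windows 10 accepts an active-hours window of at most 18 hours.
MAX_ACTIVE_HOURS = 18

# Assumption: the key and value names under which active hours are stored.
ACTIVE_HOURS_KEY = r"HKLM\SOFTWARE\Microsoft\WindowsUpdate\UX\Settings"


def active_hours_window(now_hour, span=MAX_ACTIVE_HOURS):
    """Return (start, end) hours for a window that begins at now_hour."""
    if not 0 <= now_hour <= 23:
        raise ValueError("hour must be in 0..23")
    return now_hour, (now_hour + span) % 24


def active_hours_commands(now_hour):
    """Build the reg.exe commands that would apply the window."""
    start, end = active_hours_window(now_hour)
    values = (("ActiveHoursStart", start), ("ActiveHoursEnd", end))
    return [
        ["reg", "add", ACTIVE_HOURS_KEY, "/v", name,
         "/t", "REG_DWORD", "/d", str(value), "/f"]
        for name, value in values
    ]
```

Calling active_hours_commands with the current hour at the start of every run keeps pushing the earliest permitted restart as far into the future as the OS allows.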

The OS is installed manually with most of the default settings, disabling all the extra monitoring features offered on install. On entering the OS, our default strategy has multiple parts: disable the ability to update as much as possible in the registry, disable Windows Defender, uninstall OneDrive, disable Cortana as much as possible, enable the high performance mode in the power options, and stop the platform from turning off the display. We also pull the latest version of CPU-Z from network storage, in case we are testing a very new system. Another script runs when the OS loads to check that the CPU and GPU are what we expect, and that the GPU drivers we need are in place, as Windows has a habit of updating those without saying anything. Windows Defender is disabled because, in my experience, it has historically seemed to eat CPU time for no apparent reason whenever the network changes, even when the system is in use.
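A minimal sketch of such a startup check is shown below. It is an illustration rather than the script we use: it assumes PowerShell is available and reads the Name and DriverVersion properties of the Win32_Processor and Win32_VideoController classes, and the labels in the expected dictionary are hypothetical.

```python
import subprocess


def query_property(cim_class, prop="Name"):
    """Ask PowerShell for one property of every instance of a CIM class."""
    command = [
        "powershell", "-NoProfile", "-Command",
        f"(Get-CimInstance {cim_class}).{prop}",
    ]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def find_mismatches(expected, detected):
    """Return one message for each expected item that was not detected."""
    problems = []
    for label, wanted in expected.items():
        found = detected.get(label, [])
        if not any(wanted.lower() in item.lower() for item in found):
            problems.append(f"{label}: expected '{wanted}', found {found}")
    return problems


# Example use on the test system (the expected strings are placeholders):
#   detected = {
#       "cpu": query_property("Win32_Processor"),
#       "gpu": query_property("Win32_VideoController"),
#       "gpu_driver": query_property("Win32_VideoController", "DriverVersion"),
#   }
#   expected = {"cpu": "<cpu name>", "gpu": "<gpu name>", "gpu_driver": "<version>"}
#   for problem in find_mismatches(expected, detected):
#       print(problem)
```

Keeping the comparison separate from the query means the check can be run against saved output from an earlier boot as well as against the live system.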

Some of these strategies are designed to be redundant. The goal here is to attack the option needed in as many different ways as possible. There’s nothing lost by being thorough at this point and hammering the point home. This means executing registry files that adjust settings, executing batch files that do the same while installing files, and reiterating these commands before every benchmark run in order to be crystal clear. Simply put, do not implicitly trust Windows to leave the settings alone. Something invariably changes (or moves somewhere else) if it is not monitored. Some of the commands in place are also old/legacy, but are kept as they don’t otherwise adjust the system (and can take effect if options that are continually moved around suddenly move back).
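The ‘reiterate before every run’ idea can be sketched as a list of commands that is replayed in full each time, with failures recorded rather than treated as fatal so that legacy entries do no harm. The three commands below are examples chosen for illustration, not our full list: the GUID is the built-in High performance power scheme, and the NoAutoUpdate policy value is an assumption about one way to restrict automatic updates.

```python
import subprocess

# GUID of the built-in "High performance" power scheme.
HIGH_PERFORMANCE_GUID = "8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c"

# Illustrative settings only; a real list would be longer and partly redundant.
SETTINGS_COMMANDS = [
    ["powercfg", "/setactive", HIGH_PERFORMANCE_GUID],
    ["powercfg", "/change", "monitor-timeout-ac", "0"],  # never turn off display
    ["reg", "add", r"HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU",
     "/v", "NoAutoUpdate", "/t", "REG_DWORD", "/d", "1", "/f"],
]


def apply_settings(commands=SETTINGS_COMMANDS, runner=subprocess.run):
    """Run every command and return the ones that failed.

    A failure does not stop the loop, so an outdated command cannot
    prevent the remaining settings from being applied.
    """
    failures = []
    for command in commands:
        try:
            if runner(command).returncode != 0:
                failures.append(command)
        except OSError:
            # The tool itself is missing on this system.
            failures.append(command)
    return failures
```

Calling apply_settings() at the top of each benchmark script, from an elevated prompt, replays the whole list; the returned failures can be logged for later inspection.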

It is worth noting that some of the options, when run through a batch file, require the file to be run as Administrator. Windows 10 makes this a frustrating task to do manually, as there is no simple way to have the file request user access elevation by itself. The best way to ensure that the batch file always runs in admin mode seems to be to create a shortcut to the batch file and adjust the properties of the shortcut to always enable the ‘run as admin’ mode. It is an interesting kludge for that to work, and it is frustrating that I cannot just adjust the batch file properties directly to run as admin every time.
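The shortcut approach does not tell a script whether it actually ended up elevated. As a complementary sketch (a different technique from the shortcut, and not part of our batch files), a script can check for administrator rights at startup and stop early. IsUserAnAdmin is a shell32 function exposed through ctypes on Windows; the POSIX fallback is only there so the sketch runs elsewhere.

```python
import ctypes
import os
import sys


def is_elevated():
    """Return True if the current process has administrator (or root) rights."""
    try:
        return bool(ctypes.windll.shell32.IsUserAnAdmin())
    except AttributeError:
        # ctypes.windll only exists on Windows; use the POSIX check instead.
        return os.geteuid() == 0


def require_elevation():
    """Exit with a clear message instead of failing halfway through a run."""
    if not is_elevated():
        sys.exit("This script must be run as Administrator.")
```

Calling require_elevation() as the first step avoids discovering a silently rejected registry change several hours into a run.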

Benchmark Installs

When choosing a benchmark, it usually falls under one of two headers: standalone, such that it can be run as is, or requiring installation. Those requiring installation are subdivided further into those with silent installers and those that have to be installed manually.

Installing benchmarks can either be done before running the main script, or be integrated directly into the main testing script. As time has progressed, we have moved from the former to the latter, so we can wrap uninstall commands into the script if we only get limited access to a system. For the manually installed benchmarks this isn’t possible, and technically calling an install/uninstall from the script does make total testing time longer, but it also reduces requirements for SSD capacity by not having everything installed at once. Experience of doing this scripting over the past few years, and of making the benchmark scripts as portable as possible, has pointed to making the install/uninstall part of the benchmark run.
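One way to picture ‘install/uninstall as part of the benchmark run’ is a wrapper that installs, runs the test, and always uninstalls afterwards, even if the test fails. This is a sketch of the pattern only; the installer and uninstaller commands in the usage note are hypothetical.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def installed(install_cmd, uninstall_cmd, runner=subprocess.run):
    """Install a benchmark, hand control to the caller, then uninstall it."""
    runner(install_cmd, check=True)  # blocks until the installer exits
    try:
        yield
    finally:
        # Runs even if the benchmark raised, so disk space is reclaimed.
        runner(uninstall_cmd, check=True)
```

With hypothetical paths, a run would look like `with installed(["setup.exe", "/S"], ["uninstall.exe", "/S"]): run_benchmark()`, where run_benchmark stands in for whatever launches the test.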

Benchmarks that can be run without installing, known as ‘standalone’ benchmarks, are the holy grail. Cinebench and others are great for this. The rest are probed for silent install methods. Certain benchmarks in the past, such as PCMark8, also have additional command line options for online registration to enable DRM. Other installers, such as .msi files, seem unable to install without the right commands if they are not in the directory from which the batch file was called. When scripting successive installs, it becomes important to check that the previous one has finished before another one starts, otherwise the script might jump straight to the next installer while earlier ones are still running, which makes things tricky as well.

For .msi files, our install code relies heavily on the following command to ensure that an install has finished before tackling the next one:

cmd /c start /wait msiexec /qb /i <file>
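The same wait-for-completion behavior can be sketched outside of a batch file. The example below is an assumption-laden illustration rather than our install code: it passes each package as an absolute path, to sidestep the working-directory problem mentioned earlier, and relies on subprocess.run returning only after the msiexec process has exited.

```python
import subprocess
from pathlib import Path


def msi_install_command(msi_path):
    """Build an msiexec call that uses the absolute path of the package."""
    return ["msiexec", "/i", str(Path(msi_path).resolve()), "/qb"]


def install_all(msi_paths, runner=subprocess.run):
    """Install packages strictly one after another."""
    for msi_path in msi_paths:
        # check=True raises if msiexec reports a non-zero exit code, so a
        # failed install stops the sequence instead of being skipped over.
        runner(msi_install_command(msi_path), check=True)
```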

Most .msi files have the same flags for silent installs, however install executables can vary significantly and require probing the vendor documentation. For the most part, ‘/S’ is the silent install flag, while others require /norestart to ensure the system doesn’t restart immediately, or /quiet to get going in a silent fashion. Some installations use none of these and rely on their own definitions of what constitutes a silent install flag. I’m looking at you, Adobe. Ultimately, however, most software packages can be installed silently, sometimes with additional commands to enable licenses, and are then ready to be called for their respective tests.

One benchmark is a special case: Chrome. Chrome has the amazing ability to update itself as soon as it is installed, even without being opened and even just after the system has booted. Stopping this takes more than a simple software adjustment, purely because Google no longer offers an option to delay updates. We initially found an undocumented way to stop it from updating, which requires the install script to gut some of the files after installing the software. However, the quick update cycle of Chrome means that our v56 version from last year is now out of date. To get around this, we are using a standalone version of Chromium.

The final benchmark in our install is Steam, which is a fully manual install. Valve has given Steam a really odd interface interaction mechanism, even for AHK scripting, which makes installing Steam a bit of a hassle. Valve does not offer a complete standalone installer here, so the base program opens after installation to download ~200MB of updates on a fresh system. We install the software over the Steam directory already present on the benchmark partition from a previous OS install, so the games do not need to be re-downloaded. (When an OS is installed, it’s installed on a specific OS partition, and all benchmarks are kept on a second partition.)

One other point to be aware of is when software checks for updates. Loading AIDA, for example, means that it will probe online for the latest version and leave a hanging message box to be answered before a script can continue. There are often two ways to deal with this, and the best is if the program allows the user to set ‘no updates’ in its configuration files. The fallback tactic that works is to disable internet connectivity (often by disabling all network adaptors through PowerShell) while the application is running.
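The fallback tactic can be sketched as a wrapper that takes the adaptors down, runs the application, and always brings them back up. This is an illustration under assumptions, not our script: it uses the Get-NetAdapter, Disable-NetAdapter and Enable-NetAdapter PowerShell cmdlets, needs an elevated session, and re-enables every adaptor, including any that were disabled beforehand.

```python
import subprocess
from contextlib import contextmanager


def powershell(script):
    """Wrap a PowerShell one-liner as an argument list."""
    return ["powershell", "-NoProfile", "-Command", script]


DISABLE_ADAPTERS = powershell("Get-NetAdapter | Disable-NetAdapter -Confirm:$false")
ENABLE_ADAPTERS = powershell("Get-NetAdapter | Enable-NetAdapter -Confirm:$false")


@contextmanager
def network_disabled(runner=subprocess.run):
    """Keep all network adaptors disabled for the duration of the block."""
    runner(DISABLE_ADAPTERS, check=True)
    try:
        yield
    finally:
        # Restore connectivity even if the application crashed.
        runner(ENABLE_ADAPTERS, check=True)
```

Anything launched inside `with network_disabled():` cannot reach an update server, so no update prompt is left waiting for an answer.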


110 Comments


  • Smell This - Monday, July 20, 2020


    ;- )
  • Oxford Guy - Monday, July 20, 2020

    "If there’s a CPU, old or new, you want to see tested, then please drop a comment below."

    • i7-3820. This one is especially interesting because it had roughly the same number of transistors as Piledriver on roughly the same node (Intel 32nm vs. GF 32 nm).

    • 5775C

    • 5675C (which outperformed and matched the 5775C in some games due to thermal throttling)

    • 5775C with TDP bypassed or increased if this is possible, to avoid the aforementioned throttling

    • I would really really like you to add Deserts of Kharak to your games test suite. It is the only game I know of that showed Piledriver beating Intel's chips. That unusual performance suggests that it was possible to get more performance out of Piledriver if developers targeted that CPU for optimization and/or the game's engine somehow simply suited it particularly.

    • 8320E or 8370E at 4.7 GHz (non-turbo) with 2133 CAS 9-11-10 RAM, the most optimal Piledriver setup. The 9590 was not the most performant of the FX line, likely because of the turbo. A straight overclock coupled with tuned RAM (not 1600 CAS 10 nonsense) makes a difference. 4.7 GHz is a realistic speed achievable by a large AIO or small loop. If you want air cooling only then drop to 4.5 Ghz but keep the fast RAM. The point of testing this is to see what people were able to get in the real world from the AMD alternative for all the years they had to wait for Zen. Since we were stuck with Piledriver as the most performant Intel alternative for so so many years it's worth including for historical context. The "E" models don't have to be used but their lower leakage makes higher clocks less stressful on cooling than a 9000 series. 4.7 GHz was obtainable on a cheap motherboard like the Gigabyte UD3P, with strong airflow to the VRM sink.

    • VIA's highest-performance model. If it won't work with Windows 10 then run the tests on it with 8.1. The thing is, though... VIA released an update fairly recently that should make it compatible with Windows 10. I saw Youtube footage of it gaming, in fact, with a discrete card. It really would be a refreshing thing to see VIA included, even though it's such a bit player.

    • Lynnfield at 3 GHz.

    • i7-9700K, of course.
  • Oxford Guy - Monday, July 20, 2020

    Regarding Deserts of Kharak... It may be that it took advantage of the extra cores. That would make it noteworthy also as an early example of a game that scaled to 8 threads.
  • Oxford Guy - Monday, July 20, 2020

    Also, the Chinese X86 CPU, the one based on Zen 1, would be very nice to have included.
  • Oxford Guy - Monday, July 20, 2020

    VIA CPUs tested with games as recently as 2019 (there was another video of the quad core but I didn't find it today with a quick search):

    https://www.youtube.com/watch?v=JPvKwqSMo-k
    https://www.youtube.com/watch?v=Da0BkEW459E

    The Zhaoxin KaiXian KX-U6880A would be nice to see included, not just the Chinese Zen 1 derivative.
  • Oxford Guy - Monday, July 20, 2020

    "due to thermal throttling"

    TDP throttling, to be more accurate. I suppose it could throttle due to current demand rather than temp.
  • axer1234 - Monday, July 20, 2020

    honestly i would love to know how different generation processor perform today especially higher core count. like prescott series pentium 4 athlon II phenomX6 core2 duo core2quad nehlam sandy bridge bulldozer etc with todays generation work loads and offering

    in many scenario like word excel ppt photoshop it all works very well still in many offices
    its just the new generation of application slowing it down for almost the same work etc
  • herefortheflops - Monday, July 20, 2020

    @Dr. Cutress.,

    As someone that has been dealing with similar or greater product testing challenges and configuration complexity for the better part of a decade or so, I would like to commend you for your ambitious goals and efforts so far. Additionally, I could be of high value to your effort if you are willing to discuss. I have reviewed in-depth the bench database (as well as competing websites) and I have come to the conclusion the Anandtech bench data is of very limited usefulness at present--and would require some significant changes to the data being collected/reported and the way things have been done to this point. I do understand where the industry is going, the questions the readers are going to be asking of the data, and the major comparisons that will be attempted with the data. Unfortunately, much of your effort may easily become irrelevant unless you proceed with some extreme caution to provide data with more utility. I also know methods to accomplish the desired result while reducing the size and cost of the task at hand. Reply by e-mail if you are interested in talking.

    Best,
    -A potential contributor to your effort.
  • Bensam123 - Tuesday, July 21, 2020

    Despite how impressive this is, one thing that hasn't been tackled is still multiplayer performance and it vastly changes recommendations for CPUs (doesn't effect GPUs as much).

    It goes from recommending a 6 core chip hands down to trying to make a case for 4 core chips still in this day and age. I own a 3900x and 2800 and I can tell you hands down Modern Warfare will gobble 70% of that 12 core chip, sometimes a bit more, that's equivalent to maxing out a 8 core of the same series. That vastly changes recommendations and data points. It's not just Modern Warfare. Overwatch, Black Ops 3(same engine as MW), and recently Hyper Scape will will make use of those extra cores. I have a widget to monitor CPU utilization in the background and I can check Task Manager. If I had a better video card I'm positive it would've sucked down even more of those 12 cores (my GPU is running at 100% load according to MSI AB).

    This is a huge deal and while I understand, I get it, it's hard to reliably reproduce the same results in a multiplayer environment because it changes so much and generally seen as taboo from a hardware benchmarking standpoint, it is vastly different then singleplayer workloads to the point at which it requires completely different recommendations. Given how many people are making expensive hardware choices specifically because they play multiplayer games, I would even say most tech reviews in this day and age are irrelevant for CPU recommendations outside of the casual single player gamer. GPU recommendations are still very much on par, CPU is not remotely.

    I talk about this frequently on my stream and why I still recommended the 1600 AF even when it was sitting at $105-125, it's a steal if you play multiplayer games, while most people that either read benchmarking websites or run benchmarks themselves will start making a case for a 4c Intel. 6 core is a must at the very least in this day and age.

    Anandtech it's time to tread new ground and go into the uncharted area. Singleplayer results and multiplayer results are too different, you can't keep spinning the wheel and expect things to remain the same. You can verify this yourself just by running task manager in the background while playing one of the games I mentioned at the lowest settings regardless of being able to repeat those results exactly you'll see it's definitely a multi-core landscape for newer multiplayer games.

    Not even touched on in the article.
  • Bensam123 - Tuesday, July 21, 2020

    70%, I have SMT off for clarification.
