Launching the #CPUOverload Project: Testing Every x86 Desktop Processor since 2010

by Dr. Ian Cutress on July 20, 2020 1:30 PM EST

CPU Tests: Synthetic

Most people in our industry have a love/hate relationship with synthetic tests. On the one hand, they’re often good for quick summaries of performance and are easy to use; on the other, most of the time the tests aren’t related to any real software. Synthetic tests are often very good at burrowing down to a specific set of instructions and extracting the maximum performance from them. Due to requests from a number of our readers, we have included the following synthetic tests.

Linux OpenSSL Speed

In our last review, and on my Twitter, I opined about potential new benchmarks for our suite. One of our readers reached out to me and stated that he was interested in looking at OpenSSL hashing rates in Linux. Luckily, OpenSSL in Linux has a command called ‘speed’ that lets the user measure how fast the system is for any given hashing algorithm, as well as for signing and verifying messages.

OpenSSL offers a lot of algorithms to choose from, and based on a quick Twitter poll, we narrowed it down to the following:

rsa2048 sign and rsa2048 verify
sha256 at 8K block size
md5 at 8K block size

We run each of these tests in both single-threaded and multithreaded mode.

To automate this test, Windows Subsystem for Linux is needed. For our last benchmark suite I scripted up enabling WSL with Ubuntu 18.04 on Windows in order to run SPEC, so that stays part of the suite (and actually now becomes the biggest pre-install of the suite).

OpenSSL speed has options to adjust the duration of the test; however, the way our script was managing them meant that they never seemed to work properly. The ability to adjust how many threads are in play does work, though, which is important for multithreaded testing.
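As a rough illustration of the runs described above, the following Python sketch builds the command lines for the chosen algorithms in single-threaded and multithreaded mode. It is not our actual script: the helper name `build_speed_command` is hypothetical, and it assumes the `-multi` option of `openssl speed` is what sets the number of parallel runs. By default the hashing tests report several block sizes, so the 8K figure would be read from the 8192-byte column of the output.

```python
import os
from typing import List

# Algorithms from the list above; rsa2048 reports both sign and verify rates.
ALGORITHMS = ["rsa2048", "sha256", "md5"]


def build_speed_command(algorithm: str, threads: int = 1) -> List[str]:
    """Return an argv list for one `openssl speed` run (hypothetical helper)."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    cmd = ["openssl", "speed"]
    if threads > 1:
        # -multi runs the given number of benchmarks in parallel.
        cmd += ["-multi", str(threads)]
    cmd.append(algorithm)
    return cmd


if __name__ == "__main__":
    all_threads = os.cpu_count() or 1
    for algo in ALGORITHMS:
        for threads in (1, all_threads):
            # Only print the commands; pass the list to subprocess.run() to execute.
            print(" ".join(build_speed_command(algo, threads)))
```

Printing the commands rather than running them keeps the sketch usable on machines without OpenSSL installed.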

(8-3a) Linux OpenSSL Speed rsa2048 Sign (1T)
(8-3b) Linux OpenSSL Speed rsa2048 Verify (1T)
(8-3c) Linux OpenSSL Speed sha256 8K Block (1T)
(8-3d) Linux OpenSSL Speed md5 8K Block (1T)

This test produces a lot of graphs, so for full reviews I might keep the rsa2048 ones and just leave the sha256/md5 data in Bench.

(8-4a) Linux OpenSSL Speed rsa2048 Sign (nT)
(8-4b) Linux OpenSSL Speed rsa2048 Verify (nT)
(8-4c) Linux OpenSSL Speed sha256 8K Block (nT)
(8-4d) Linux OpenSSL Speed md5 8K Block (nT)

The AMD CPUs do really well in the sha256 test due to native support for SHA256 instructions.

GeekBench 4

As a common tool for cross-platform testing between mobile, PC, and Mac, GeekBench is an ultimate exercise in synthetic testing across a range of algorithms looking for peak throughput. Tests include encryption, compression, fast Fourier transform, memory operations, n-body physics, matrix operations, histogram manipulation, and HTML parsing.

I’m including this test due to popular demand, although the results do come across as overly synthetic. Many users put a lot of weight behind the test because it is compiled across different platforms (although with different compilers). Technically GeekBench 5 exists; however, we do not have a key for the pro version that allows for command line processing.

For reviews we are posting the overall single and multi-threaded results.

(8-1a) Geekbench 4.0 ST
(8-1b) Geekbench 4.0 MT

I have noticed that Geekbench 4, more so than Geekbench 5, relies a lot on its memory subtests, which could be a factor if we have to test limited-access CPUs in different systems.

AIDA64 Memory Bandwidth

Speaking of memory, one of the requests we have had is to showcase memory bandwidth. Lately AIDA64 has been doing some good work in providing automation access, so for this test I used the command line and some regex to extract the data from the JSON output. AIDA also provides screenshots of its testing windows as required.

For the most part, we expect CPUs of the same family with the same memory support to not differ that much – there will be minor differences based on the exact frequency at the time, how the power budget gets moved around, or how many cores are being fed by the memory at one time.

LinX 0.9.5 LINPACK

One of the benchmarks I’ve been after for a while is just something that outputs a very simple GFLOPs FP64 number, or in the case of AI I’d like to get a value for TOPs at a given level of quantization (FP32/FP16/INT8 etc). The most popular tool for doing this on supercomputers is a form of LINPACK, however for consumer systems it’s a case of making sure that the software is optimized for each CPU.

LinX has been a popular interface for LINPACK on Windows for a number of years. However, the last official version was 0.6.5, launched in 2015, before the latest Ryzen hardware came into being. HWTips in Korea has been updating LinX and has split it into two versions, one for Intel and one for AMD, both of which have reached version 0.9.5. Unfortunately the AMD version is still a work in progress, as it doesn’t work on Zen 2.

There is also a program called Linpack Extreme 1.1.3, which claims to be updated to use the latest version of the Intel Math Kernel Library. It works great; however, the way the interface has been designed means that it can’t be automated, so we can’t use it.

For LinX 0.9.5, there is also the difficulty of choosing which parameters to feed into LINPACK. The two main parameters are problem size and time: choose a problem size too small, and you won’t get peak performance; choose it too large, and the calculation can go on for hours. To that end, we use the following formulas as a compromise:

Memory Use = Floor(1000 + 20*sqrt(threads)) MB
Time = Floor(10+sqrt(threads)) minutes

For a 4 thread system, we use 1040 MB and run for 12 minutes.
For a 128 thread system, we use 1226 MB and run for 21 minutes.
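
The two formulas above can be written as a short Python function. This is only a sketch of the arithmetic, not the script we run; the function name `linx_parameters` is made up for illustration.

```python
import math
from typing import Tuple


def linx_parameters(threads: int) -> Tuple[int, int]:
    """Return (memory in MB, run time in minutes) for a given thread count."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    root = math.sqrt(threads)
    memory_mb = math.floor(1000 + 20 * root)  # Memory Use formula
    minutes = math.floor(10 + root)           # Time formula
    return memory_mb, minutes


if __name__ == "__main__":
    for t in (4, 128):
        mem, mins = linx_parameters(t)
        print(f"{t} threads: {mem} MB for {mins} minutes")
```

For 4 and 128 threads this reproduces the 1040 MB / 12 minute and 1226 MB / 21 minute figures quoted above.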

We take the peak GFLOPs value reported in the output as the result. Unfortunately the program doesn’t emit clean, regular UTF-8 output, which means this is one result we have to read directly from the results file.

(8-5) LinX 0.9.5 LINPACK

As we add in more CPUs, this graph should look more interesting. If a Zen 2 version is deployed, we will adjust our script accordingly.

110 Comments

ruthan - Monday, July 27, 2020 - link
Well, lots of bla, bla, bla. I checked the graphs in the article and they are the classic kind, with just a few entries. There is a link to your benchmark database, but there I see some Crysis benchmark preselected, which is not part of the article, and it doesn't lead to some ultimate graph with lots of CPUs. So it needs much more streamlining.

I usually use the old Geekbench for CPU tests, and there I can usually compare what I want. Well, not with real applications and games, but it's quick too. Otherwise I usually have enough knowledge to know whether some CPU is good enough for some games or not, so I don't need very old versus very new comparisons. Something can be found at Phoronix.
These benchmarks will always lose relevancy with new updates, unless all CPUs were kept in their own machines, updated, and retested constantly, which could be quite a waste of power and money.
Maybe the golden path is some simple multithreaded testing utility with two benchmarks, one for integers and one for floats.
Ian Cutress - Wednesday, August 5, 2020 - link
When you're in Bench, check the drop-down menu on your left for the individual tests.
hnlog - Wednesday, July 29, 2020 - link
> For our testing on the 2020 suite, we have secured three RTX 2080 Ti GPUs direct from NVIDIA.
Congrats!
Koenig168 - Saturday, August 1, 2020 - link
It would be more efficient to focus on the more popular CPUs. Some of the less popular SKUs which differ only by clock speed can have their performance extrapolated. Testing 900 CPUs sounds nice but quickly hits diminishing returns in terms of usefulness after the first few hundred.

You might also wish to set some minimum performance standards using just a few tests. Any CPU which fails to meet those standards should be marked as "obsolete, upgrade already dude!" and be done with, rather than spending the full 30 to 40 hours testing each of them.

Finally, you need to ask yourself "How often do I wish to redo this project and how many resources will I be able to devote to it?" Bear in mind that with new drivers, games etc, the database needs to be updated periodically to stay relevant. This will provide a realistic estimate of how many CPUs to include in the database.
Meteor2 - Monday, August 3, 2020 - link
I think it's a labour of love...
TrevorX - Thursday, September 3, 2020 - link
My suggestion would be to bench the highest performing Xeons that supported DDR3 RAM. Why? Because the cost of DDR3 RDIMMs is so amazingly cheap (as in, less than 10%) compared with DDR4. I personally have a Xeon E5-1660v2 @4.1GHz with 128GB DDR3 1866MHz RDIMMs that's the most rock stable PC I've ever had. Moving up to a DDR4 system with similar memory capacity would be eye-wateringly expensive. I currently have 466 tabs open in Chrome, Outlook, Photoshop, Word, several Excel spreadsheets, and I'm only using 31.3% of physical RAM. I don't game, so I would be genuinely interested in what actual benefit would be derived from an upgrade to Ryzen / Threadripper.

Also very keen to see server/hypervisor testing of something like Xeon E5-2667v2 vs Xeon W-1270P or Xeon Silver 4215R for evaluation of on-prem virtualisation hosts. A lot of server workloads are being shifted to the cloud for very good reasons, but for smaller businesses it might be difficult to justify the monthly expense of cloud hosting (and Azure licensing) when they still have a perfectly serviceable 5yo server with plenty of legs left on it. It would be great to be able to see what performance and efficiency improvements can be had jumping between generations.
Tilmitt - Thursday, October 8, 2020 - link
When is this going to be done?
Mil0 - Friday, October 16, 2020 - link
Well they launched with 12 results if I count correctly, and currently there are 38 listed, that's close to 10/month. With the goal of 900, that would mean over 7 years (in which ofc more CPUs would be released)
Mil0 - Friday, October 16, 2020 - link
Well they launched with 12 results if I count correctly, and currently there are 44 listed, that's about a dozen a month. With the goal of 900, that would mean 6 years (in which ofc more CPUs would be released)
Mil0 - Friday, October 16, 2020 - link
Caching hid my previous comment from me, so instead of a follow-up there are now 2 pretty similar ones. However, in the meantime I found Ian is actually posting updates on Twitter, which you can find here: https://twitter.com/IanCutress/status/131350328982...

He actually did 36 CPUs in 2.5 months, so it should only take 5 years! :D
