CPU Tests: SPEC2006 1T, SPEC2017 1T, SPEC2017 nT

SPEC2017 and SPEC2006 are industry-standard benchmark suites used to compare overall performance across different systems, architectures, microarchitectures, and setups. The code has to be compiled from source, and the results can then be submitted to an online database for comparison. The suites cover a range of integer and floating-point workloads, and the binaries can be heavily optimized for each CPU, so it is important to check how the benchmarks are being compiled and run.

We run the tests in a harness built through Windows Subsystem for Linux, developed by our own Andrei Frumusanu. WSL has some odd quirks, with one test not running due to WSL's fixed stack size, but for like-for-like testing it is good enough. SPEC2006 is deprecated in favor of SPEC2017, but it remains an interesting comparison point in our data. Because our scores are not official submissions, per SPEC guidelines we have to declare them as internal estimates on our part.

For compilers, we use LLVM for both the C/C++ and Fortran tests, with the Flang compiler handling Fortran. The rationale for using LLVM over GCC is better cross-platform comparisons against platforms that only have LLVM support, as well as future articles where we'll investigate this aspect further. We're not considering closed-source compilers such as MSVC or ICC.

clang version 10.0.0
clang version 7.0.1 (ssh://git@github.com/flang-compiler/flang-driver.git 24bd54da5c41af04838bbe7b68f830840d47fc03)

-Ofast -fomit-frame-pointer
-march=x86-64
-mtune=core-avx2
-mfma -mavx -mavx2

Our compiler flags are straightforward, with a basic -Ofast and the relevant ISA switches to allow for AVX2 instructions. We decided to build our SPEC binaries with AVX2, which sets Haswell as the oldest generation we can test before the binaries fail to run. This also means we don't have AVX-512 binaries, primarily because getting the best performance out of AVX-512 intrinsics requires hand-tuning by a proper expert, as with our dedicated AVX-512 benchmark.

To note, the requirements of the SPEC licence state that any benchmark results have to be labelled 'estimated' until they are verified on the SPEC website as a meaningful representation of the expected performance. This is most often done by the big companies and OEMs to showcase performance to customers; however, it is quite over the top for what we do as reviewers.

For each of the SPEC targets we run, SPEC2006 rate-1, SPEC2017 speed-1, and SPEC2017 speed-N, rather than publish all the separate test data in our reviews, we are going to condense it down into individual data points. The main three will be the geometric means of each of the three suites.

(9-0a) SPEC2006 1T Geomean Total

(9-0b) SPEC2017 1T Geomean Total

(9-0c) SPEC2017 nT Geomean Total

A fourth metric will be a scaling metric: the ratio of the nT result to the 1T result for SPEC2017, divided by the number of cores on the chip.

(9-0d) SPEC2017 MP Scaling
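As a minimal sketch of how these condensed metrics can be computed (using hypothetical per-test scores and a hypothetical core count for illustration, not our actual data), the geometric mean and the MP scaling metric look like this:

```python
from math import prod  # prod requires Python 3.8+

def geomean(scores):
    """Geometric mean of per-test scores; tests without a score are skipped."""
    vals = [s for s in scores.values() if s is not None]
    return prod(vals) ** (1.0 / len(vals))

# Hypothetical per-test scores, for illustration only.
spec2017_1t = {"500.perlbench_r": 5.1, "502.gcc_r": 6.3, "505.mcf_r": 4.8}
spec2017_nt = {"500.perlbench_r": 38.2, "502.gcc_r": 41.0, "505.mcf_r": 29.5}

g1 = geomean(spec2017_1t)   # SPEC2017 1T Geomean Total
gn = geomean(spec2017_nt)   # SPEC2017 nT Geomean Total

cores = 8  # hypothetical core count of the chip under test
mp_scaling = (gn / g1) / cores  # SPEC2017 MP Scaling
```

Because a geometric mean multiplies the scores together, skipping a test (as we do with wrf_r below) removes one factor rather than shifting the total by that test's full weight.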

The per-test data will be a part of Bench.

Experienced users should be aware that 521.wrf_r, part of the SPEC2017 suite, does not work in WSL due to the fixed stack size. It is expected to work under WSL2; however, we will cross that bridge when we get to it. For now, we're not giving wrf_r a score, which, because we are taking the geometric mean rather than the arithmetic mean, should not affect the results too much.


  • Sootie - Tuesday, July 21, 2020 - link

    Any chance of a crowd sourced version of the bench? People with unusual CPUs could run a cut down version of the bench with only software that does not require a license, heavily disclaimed that it was not an official run, just to add a few more data points for rare devices. I have a whole museum of old servers I can run some tests on, but it's not practical to send them elsewhere.

    I'm a big fan of all the work you have done and are doing on the bench, though. I use it constantly for work and home.
  • Tilmitt - Tuesday, July 21, 2020 - link

    Phenom II X6 and X4 would be cool to see if the "more cores make future proof" narrative actually holds up.
  • lmcd - Tuesday, July 21, 2020 - link

    X6 outperformed early Bulldozer 8 cores by a notable bit if that's of any interest.
  • loads2compute - Tuesday, July 21, 2020 - link

    Dear Ian,

    Wow! What a nice idea to test all these legacy processors on modern benchmarks. I think it is a great idea!

    But also, wow, what an enormous effort you are taking on in automating all that stuff, starting from scratch and using AutoHotkey as your main tool. It seems like going to an uninhabited island to start civilization from scratch, taking a tin opener as your main tool.

    In my line of work (bioinformatics) we have to automate a load of consecutive tasks. Luckily there are frameworks for this, which make the work a lot easier.

    Luckily there is already a framework for automated testing and benchmarking which happens to work on Linux, Mac and Windows (and even BSD). It is called the Phoronix Test Suite (http://phoronix-test-suite.com/). It can be extended with modules, so you could integrate all your desired tests in there. There is even paid support available, and since the guy who runs this (Michael Larabel) is working on a fellow tech outlet (phoronix.com), I am sure you can work something out to your mutual benefit. No doubt he is interested in all these old processor benchmarks too!

    The phoronix test suite also comes with phoromatic, which according to the website : "allows the automatic scheduling of tests, remote installation of new tests, and the management of multiple test systems all through an intuitive, easy-to-use web interface."

    So please do not start from scratch and do this yourself! Use this great open-source tool that is already available and consequently you will be able to get a lot more work done on the stuff that actually interests you! (I take it AHK scripting is not your hobby).
  • Ian Cutress - Tuesday, July 21, 2020 - link

    Scripts are already done :)
    The issue is that a lot of tests have a lot of different entry points; with AHK I can customize for each. I've been using it for 5 years now, so coding isn't an issue any more.

    Fwiw, I speak with Michael on occasion. We go to the same industry events etc
  • eek2121 - Tuesday, July 21, 2020 - link

    Was procuring a new GPU really that hard? I am going to blame your owner on this one. If you were an independent website I honestly would have purchased a 2080ti and donated it to you. It honestly seems like not being independent is hurting you more than it is helping. Without going into specifics, I know of websites smaller than AT that can afford at least 3 good full time writers and a bunch of awesome hardware.

    I have toyed with the idea of starting an alternative site where all hardware is procured in the retail channel. I know what advertising rates are like and I know that using affiliates, sponsorships, and advertising more than cover the cost of a few models per generation. Maybe it’s time AT staff strike out on their own. Just a thought.

    Outside of that, I look forward to future endeavors.
  • Ian Cutress - Tuesday, July 21, 2020 - link

    Procuring a GPU is always difficult, as we don't have the bandwidth to test AIB cards any more.

    Fwiw AT only has 2/3 FT writers.
    If we were to spin back out, we'd need investors and a strategy.
  • Igor_Kavinski - Tuesday, July 21, 2020 - link

    Request: Core i7-7700K DDR3 benchmarks (There are Asus and Gigabyte mobos that allow DDR3 to be used) to compare with Core i7-7700K DDR4 benchmarks. Thanks!
  • Xex360 - Tuesday, July 21, 2020 - link

    Very fascinating.
  • dad_at - Tuesday, July 21, 2020 - link

    Pls include HEDT Sandy Bridge E: one of Core i7 3960X, 3970X, 3930K, etc. It was once present in the CPU bench, but you removed it in 2017...
