CPU Tests: SPEC

SPEC2017 and SPEC2006 are series of standardized tests used to compare overall performance across different systems, architectures, microarchitectures, and setups. The code has to be compiled, and the results can then be submitted to an online database for comparison. The suites cover a range of integer and floating point workloads, and can be heavily optimized for each CPU, so it is important to check how the benchmarks are being compiled and run.

We run the tests in a harness built through Windows Subsystem for Linux, developed by our own Andrei Frumusanu. WSL has some odd quirks, with one test not running due to a WSL fixed stack size, but for like-for-like testing it is good enough. SPEC2006 is deprecated in favor of 2017, but remains an interesting comparison point in our data. Because our scores aren’t official submissions, as per SPEC guidelines we have to declare them as internal estimates on our part.

For compilers, we use LLVM for both the C/C++ and Fortran tests; for Fortran we’re using the Flang compiler. The rationale for using LLVM over GCC is better cross-platform comparisons with platforms that only have LLVM support, and future articles where we’ll investigate this aspect more. We’re not considering closed-source compilers such as MSVC or ICC.

clang version 10.0.0-svn350067-1~exp1+0~20181226174230.701~1.gbp6019f2 (trunk)

-Ofast -fomit-frame-pointer
-march=x86-64
-mtune=core-avx2
-mfma -mavx -mavx2

Our compiler flags are straightforward, with basic -Ofast and relevant ISA switches to allow for AVX2 instructions. We decided to build our SPEC binaries on AVX2, which sets Haswell as the limit for how old we can go before the testing will fall over. This also means we don’t have AVX-512 binaries, primarily because in order to get the best performance, the AVX-512 intrinsics should be packed by a proper expert, as with our AVX-512 benchmark. All of the major vendors, AMD, Intel, and Arm, support the way in which we are testing SPEC.
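As a rough illustration of why Haswell is the cut-off, the sketch below checks whether a machine advertises the ISA extensions that the switches above enable. It assumes Linux-style `/proc/cpuinfo` text (which WSL also exposes); the function name `missing_isa_flags` and the sample strings are hypothetical and are not part of our harness.

```python
# ISA extensions enabled by -mfma -mavx -mavx2, as named in /proc/cpuinfo.
REQUIRED_FLAGS = {"avx", "avx2", "fma"}


def missing_isa_flags(cpuinfo_text, required=REQUIRED_FLAGS):
    """Return the required flags absent from the first 'flags' line.

    If no 'flags' line is found at all, every required flag is reported
    as missing.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(required) - set(value.split())
    return set(required)


# Hypothetical, shortened cpuinfo excerpts (not real dumps).
avx2_capable = "model name : example\nflags\t\t: fpu sse2 avx avx2 fma"
avx_only = "model name : example\nflags\t\t: fpu sse2 avx"

print(sorted(missing_isa_flags(avx2_capable)))
print(sorted(missing_isa_flags(avx_only)))
```

On a real system, the contents of `/proc/cpuinfo` would be passed in place of the sample strings. A non-empty result suggests that binaries built with these flags would hit illegal instructions on that CPU.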

To note, the requirements for the SPEC licence state that any benchmark results from SPEC have to be labelled ‘estimated’ until they are verified on the SPEC website as a meaningful representation of the expected performance. This is most often done by the big companies and OEMs to showcase performance to customers, but it is quite over the top for what we do as reviewers.

For each of the SPEC targets we are doing, SPEC2006 rate-1, SPEC2017 rate-1, and SPEC2017 rate-N, rather than publish all the separate test data in our reviews, we are going to condense it down into a few interesting data points. The full per-test values are in our benchmark database.
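For readers unfamiliar with the "Geomean Total" label on our charts, here is a minimal sketch of how a set of per-test scores can be condensed into a single geometric mean. The test names and values are made-up placeholders, not measured results, and the helper `geomean` is our own illustration rather than anything from the SPEC tooling.

```python
import math


def geomean(scores):
    """Geometric mean of positive scores, computed in log space."""
    scores = list(scores)
    if not scores:
        raise ValueError("need at least one score")
    if any(s <= 0 for s in scores):
        raise ValueError("scores must be positive")
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# Hypothetical per-test estimated scores (placeholders, not real data).
example_scores = {"test_a": 4.0, "test_b": 9.0, "test_c": 6.0}

total = geomean(example_scores.values())
print(f"Geomean total: {total:.2f}")
```

A geometric mean is used rather than an arithmetic one so that no single test with a large absolute score dominates the total: doubling any one test's score moves the total by the same factor, whichever test it is.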

(9-0a) SPEC2006 1T Geomean Total

(9-0b) SPEC2017 1T Geomean Total

Single thread is very much what we expected, with the consumer processors out in the lead and no real major differences between TR and TR Pro.

(9-0c) SPEC2017 nT Geomean Total

That changes when we move into full thread mode. The extra bandwidth of TR Pro is clear to see, even in the 32C/64T model. In this test we're using 128 GB of memory for all TR and TR Pro processors, and we're seeing a small bump when in 64C/64T mode, perhaps due to the increased memory capacity per thread and memory bandwidth per thread. The 3990X 64C/128T run kept failing for an odd reason, so we do not have a score for that test.
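To put rough numbers on the per-thread argument, the sketch below divides memory capacity and theoretical peak bandwidth by thread count. The 128 GB figure is from our test setup; the DDR4-3200 speed and the 8 bytes per transfer per channel are assumptions used only to get a theoretical peak, and the function name is hypothetical. Real sustained bandwidth will be lower.

```python
def per_thread_resources(memory_gb, channels, threads,
                         mt_per_s=3200, bytes_per_transfer=8):
    """Return (GB of memory per thread, peak GB/s per thread).

    mt_per_s and bytes_per_transfer are assumed DDR4-3200 values,
    giving a theoretical peak of 25.6 GB/s per channel.
    """
    peak_gbps = channels * mt_per_s * bytes_per_transfer / 1000
    return memory_gb / threads, peak_gbps / threads


# (label, memory channels, threads); 128 GB installed in every case.
configs = [
    ("4-channel, 64 threads", 4, 64),
    ("8-channel, 64 threads", 8, 64),
    ("8-channel, 128 threads", 8, 128),
]

for label, channels, threads in configs:
    gb, gbps = per_thread_resources(128, channels, threads)
    print(f"{label}: {gb:.1f} GB/thread, {gbps:.1f} GB/s/thread")
```

Under these assumptions, halving the thread count on an eight-channel part doubles both the capacity and the peak bandwidth available to each thread, which is consistent with the small bump we see in 64C/64T mode.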

98 Comments

  • Spunjji - Friday, July 16, 2021 - link

    Having seen how modern processors behave with insufficient cooling, Threska's right that it won't get "fried", but you're correct to infer that it would result in unpredictably sub-optimal performance.

    Anecdotally, I had a friend with a Sandy Bridge system with a cooling issue that he only noticed when he bought a new GPU and ran 3DMark and got unexpectedly low results. The "cooling issue" was that the stock heatsink wasn't even making contact with the CPU heat-spreader; he'd been gaming with the system for 3 years by that point. 😬
  • serpretetsky - Friday, July 16, 2021 - link

    I had to do some thermal shutdown testing on some consumer intel cpu. I forgot which one. Maybe i5/i7 8000 series?

    With server CPUs this was usually pretty easy, remove fan, and wait for shutdown. With the consumer CPU it kept running. So i completely removed the heatsink, the thing simply downclocked to 800 MHz, and continued running happily with no heatsink. Booted to linux, ran everything great, and no heatsink (actually once it booted to linux I think it even started clocking back up once in a while). I had to get a hot-air soldering gun to heat it up till shutdown.
  • mode_13h - Saturday, July 17, 2021 - link

    5-10 years ago, there was a heatsink gasket where you have to get near 100 degrees C to melt the material so it fuses with the heatsink and CPU. I forget the name, but I'm wondering if it's even possible to do that any more.
  • skaurus - Wednesday, July 14, 2021 - link

    That's great analysis.
  • Threska - Wednesday, July 14, 2021 - link

    It would be nice to see how these MBs do with VFIO since that has considerations most users don't.
  • mode_13h - Wednesday, July 14, 2021 - link

    Ian, is the source code for your 3DPM benchmark published anywhere? If not, it would be nice if we could see it and compare the AVX2 path with the AVX-512 one. Also, maybe someone could add support for ARM NEON or SVE.
  • techguymaxc - Wednesday, July 14, 2021 - link

    I'm slightly confused by the concluding remarks.

    "Performance between Threadripper Pro and Threadripper came in three stages. Either (a) the results between similar processors was practically identical, (b) Threadripper beat TR Pro by a small margin due to slightly higher frequencies, or (c) TR Pro thrashed Threadripper due to memory bandwidth availability. That last point, (c), only really kicks in for the 32c and 64c processors it should be noted. Our 16c TR Pro had the same memory bandwidth results as TR, most likely due to only having two chiplets in its design."

    A and B are observable, but C only proves true in synthetic benchmarks (and Pi calculation). Is there a real-world use-case for the additional memory bandwidth, outside of calculating Pi?
  • Blastdoor - Wednesday, July 14, 2021 - link

    The advantage shows up with multi-threaded SPEC. SPEC is essentially a composite of a suite of real-world tasks. I guess you could call it 'synthetic' due to it being a composite, but the individual tasks don't strike me as 'synthetic.' For example, here's a description of namd: https://www.spec.org/cpu2017/Docs/benchmarks/508.n...
  • techguymaxc - Wednesday, July 14, 2021 - link

    Thanks for that info. It would be nice to see the breakdown of individual test results from the SPEC suite.
  • arashi - Saturday, July 17, 2021 - link

    Bench