Comparing Skylake-S and Skylake-X/SP Performance Clock-for-Clock

If you’ve read through the full review up to this point (and kudos), there should be three things that stick in the back of your mind about the new Skylake-SP cores: Cache, Mesh and AVX512. These are the three main features that separate the consumer grade Skylake-S core from this new core, and all three can have an impact on clock-for-clock performance. Even though the Skylake-S and the Skylake-SP are not competing in the same markets, it is still pertinent to gauge how much the changes affect the regular benchmark suite.

For this test, we took the Skylake-S based Core i5-6600 and the Skylake-SP based Core i9-7900X and ran them both with only 4 cores, no hyperthreading, and 3 GHz on all cores with no Turbo active. Both CPUs were run in high performance modes in the OS to restrict any time-to-idle, so it is worth noting here that we are not measuring power. This is just raw throughput.

The two parts support different DRAM frequencies, however: the i5-6600 lists DDR4-2133 as its maximum supported frequency, whereas the i9-7900X will run at DDR4-2400 at 2DPC. I queried a few colleagues as to what I should do here: technically the memory support is an extended element of the microarchitecture, and the caches/uncore/untile will be running at different frequencies, so how much of the system support should be chipped away for parity? The general consensus was to test with the supported frequencies, given this is how the parts ship.

For this analysis, each test was broken down in two ways: what sort of benchmark (single thread, multi-thread, mixed) and what category of benchmark (web, office, encode).

 

For the single threaded tests, results were generally positive. Kraken enjoyed the larger L2, and Dolphin emulation had a good gain as well. The legacy tests did not fare as well: 3DPM v1 has false sharing, which is likely taking a hit due to the increased L2 latency.

On the multithreaded tests, the big winner here was Corona. Corona is a high-performance renderer for Autodesk 3ds Max, and its result suggests that the larger L2 does a good job with its code base. The step back was in Handbrake: our testing does not implement any AVX512 code, but the move to an L3 victim cache, rather than the inclusive L3 in SKL-S, might be at play here.

The mixed results are surprising: these tests combine ST and MT parts in their computation, and some are cache sensitive as well. The big outlier here is the compile test, indicating that the Skylake-SP might not be (clock for clock) a great compilation core. This is a result we can trace back to the L3 again, which is now a smaller non-inclusive cache. We can see similar behavior in our results database: a Ryzen 7 1700X, an 8-core 95W CPU with 16MB of L3 victim cache, is easily beaten by a Core i7-7700T, which has 4 cores at 35W but 8MB of inclusive L3 cache.

If we treat each of these tests with equal weighting, the overall result will offer a +0.5% gain to the new Skylake-SP core, which is within the margin of error. Nothing too much to be concerned about for most users (except perhaps people who compile all day), although again, these two cores are not in chips that directly compete. The 10-core SKL-SP chip still does the business on compiling:

Office: Chromium Compile (v56)
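
As a rough illustration of what equal weighting means for the overall figure, the sketch below averages per-test percentage changes so that no single test counts for more than another. The function names and the three sample values are placeholders invented for this example and are not results from our testing.

```python
from statistics import mean


def percent_delta(baseline: float, candidate: float,
                  higher_is_better: bool = True) -> float:
    """Percentage change of candidate vs. baseline; positive means candidate wins.

    For time-based tests (lower is better) the ratio is inverted so that a
    shorter run time still shows up as a positive gain.
    """
    if higher_is_better:
        return (candidate / baseline - 1.0) * 100.0
    return (baseline / candidate - 1.0) * 100.0


def equal_weight_summary(deltas: dict[str, float]) -> float:
    """Plain arithmetic mean: every test contributes equally."""
    return mean(list(deltas.values()))


# Placeholder per-test deltas in percent (hypothetical, NOT measured data).
example_deltas = {"test_a": 4.0, "test_b": -3.0, "test_c": 0.5}
print(f"{equal_weight_summary(example_deltas):+.1f}%")
```

With an equal-weight mean, one large regression (such as a compile test) can cancel out several small gains, which is why an overall number close to zero can hide sizeable per-test swings.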

If all these changes (minus AVX512) offer a +0.5% gain over the standard Skylake-S core, then one question worth asking is what was the point? The answer is usually simple, and I suspect it involves scaling (moving to chips with more cores), but it is also customer related. Intel’s big money comes from the enterprise, and no doubt some of Intel’s internal metrics (as well as customer requests) point to a sizeable chunk of enterprise compute being L2 size limited. I’ll be looking forward to Johan’s review on the enterprise side when the time comes.


264 Comments


  • halcyon - Monday, June 19, 2017 - link

    Thank you for the review. The AVX512 situation was a bit unclear. Which models have which AVX512 support of the now released chips (and the future, yet to be released chips). It would be interesting to see AVX512 specific article once the chips are out and we have (hopefully) some useful AVX512-optimized software (like encoding).

    For myself, decision is easy now to postpone, until ThreadRipper is out. The thermals are just out of whack and for my workloads, the price is not justified. Here's hoping Threadripper can deliver more for same price or less.
  • Bulat Ziganshin - Monday, June 19, 2017 - link

    all skl-x has avx-512 support. i9 cpus has double fma512 engines, but as i guess - that's only difference, i.e. remaining 512-bit commands (such as integer operations) will have the same throughput on i7 and i9
    but i may be wrong and it will be really very interesting to check throughput of all other operations. probably Anand can't do that and we will need to wait until Agner Fog will reach these cpus
  • halcyon - Monday, June 19, 2017 - link

    Thanks. Many questions remain: AVX512F vs AVX512BW, which are supported now, which in the future? What is the difference? How does it compare to Knights Landing? Are number of AVX units tied to number of cores? What is the speed differential in AVX512 loads? What is the oc limitation from AVX512 support? etc.
  • nevcairiel - Monday, June 19, 2017 - link

    Those Knights Landing/Knights Mill specific AVX512 instructions are actually very specific to the work-loads you would see on such a specialized CPU. The instructions chosen for Skylake-X and future Cannon Lake more closely match what we already know from AVX/AVX2.

    For the "types" of AVX512, basically it's split into several sub-instruction sets, all containing different instructions. AVX512 will always include F, because that's the basis for everything (instruction encoding, 512-bit registers, etc). BW/DQ include the basic instructions we know from AVX/AVX2, just for 512-bit registers. SKL-X supports all of F, CD, BW, DQ, VL.

    Wikipedia on AVX-512 has some more info on the different feature sets of AVX-512:
    https://en.wikipedia.org/wiki/AVX-512

    It's generally safe to just ignore the Knights Landing specific instructions. They are very specific to the workloads on those systems. The AVX-512 subset used for Xeon "Purley" and SKL-X is more in line with the AVX/AVX2 instructions we had before - just bigger.

    For software, x264 for example already got some AVX512 optimizations over the recent weeks, it might be interesting to test how much that helps once all the launch dust settles.
  • halcyon - Tuesday, June 20, 2017 - link

    Thank you very much!
  • satai - Monday, June 19, 2017 - link

    Any idea, why is Intel so much better at Chrome compilation?
  • IanHagen - Monday, June 19, 2017 - link

    I'd like to know that as well. Ryzen does particularly fine compiling the Linux Kernel, for example, as seen in: http://www.phoronix.com/scan.php?page=article&...
  • tamalero - Monday, June 19, 2017 - link

    Optimizations?
    I still remember when Intel actively paid some software developers to block multicore and threading on AMD chips to boast about "more performance" during the athlon X2 days.
  • johnp_ - Tuesday, June 20, 2017 - link

    It's just the enterprise parts. Kaby Lake S is behind Ryzen:
    http://www.anandtech.com/show/11244/the-amd-ryzen-...
  • Gothmoth - Monday, June 19, 2017 - link

    if powerdraw and heat would not be so crazy i would buy the 8 core today.

    but more than the price these heat issues are a concern for me.. i like my systems to be quiet....
