SPEC2006 & 2017: Industry Standard - ST Performance

One big talking point around the new Ryzen 3000 series is the improved single-threaded performance of the Zen 2 core. In order to investigate the topic in a more controlled manner with better documented workloads, we’ve fallen back to the industry standard SPEC benchmark suite.

We’ll be investigating the previous generation SPEC CPU2006 test suite, which gives us better context against past platforms, as well as introducing the new SPEC CPU2017 suite. We have to note that SPEC2006 has been deprecated in favour of 2017, and we must also mention that the scores posted today are estimates, as they have not been officially submitted to the SPEC organisation.

For SPEC2006, we’re still using the same setup as on our mobile suite, meaning all the C/C++ benchmarks, while for SPEC2017 I’ve also gone ahead and prepared all the Fortran tests for a near complete suite for desktop systems. I say near complete because, due to time constraints, we’re running the suite via WSL on Windows. I’ve checked that there are no noticeable performance differences to native Linux (we’re also compiling statically); however, one bug on WSL is that it has a fixed stack size, so we’ll be missing 521.wrf_r from the SPECfp2017 collection.

In terms of compilers, I’ve opted to use LLVM for both the C/C++ and Fortran tests. For Fortran, we’re using the Flang compiler. The rationale for using LLVM over GCC is better cross-platform comparisons with platforms that only have LLVM support, as well as future articles where we’ll investigate this aspect more. We’re not considering closed-source compilers such as MSVC or ICC.

clang version 8.0.0-svn350067-1~exp1+0~20181226174230.701~1.gbp6019f2 (trunk)
clang version 7.0.1 (ssh://git@github.com/flang-compiler/flang-driver.git 
  24bd54da5c41af04838bbe7b68f830840d47fc03)

-Ofast -fomit-frame-pointer
-march=x86-64
-mtune=core-avx2 
-mfma -mavx -mavx2

Our compiler flags are straightforward, with basic -Ofast and relevant ISA switches to allow for AVX2 instructions.
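To make the flag list above concrete, here is a minimal Python sketch that assembles those switches into a single compiler invocation. The flags themselves are the ones listed above; the `-static` switch is an assumption based on the remark that the binaries are compiled statically, and the compiler, source and output names are placeholders. SPEC's own tooling is driven by a configuration file, which is not shown here, so treat this purely as an illustration.

```python
import shlex

# Flags copied from the list above. "-static" is an assumption based on the
# article's remark that the binaries are compiled statically.
OPT_FLAGS = ["-Ofast", "-fomit-frame-pointer"]
ISA_FLAGS = ["-march=x86-64", "-mtune=core-avx2", "-mfma", "-mavx", "-mavx2"]
LINK_FLAGS = ["-static"]


def build_command(compiler, source, output):
    """Return the full compiler command line as a list of arguments."""
    return [compiler, *OPT_FLAGS, *ISA_FLAGS, *LINK_FLAGS, source, "-o", output]


if __name__ == "__main__":
    # "clang", "hello.c" and "hello" are placeholder names for illustration.
    print(shlex.join(build_command("clang", "hello.c", "hello")))
```

Note that -march=x86-64 keeps the baseline generic, while the explicit -mfma, -mavx and -mavx2 switches are what actually permit AVX2 code generation.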

The Ryzen 3900X system was run in the same way as in the rest of our article, with DDR4-3200 CL16, the same as the i9-9900K, whilst the Ryzen 2700X had DDR4-2933 with similar CL16 16-16-16-38 timings.

SPECint2006 Speed Estimated Scores

In terms of the int2006 benchmarks, the improvements of the new Zen2 based Ryzen 3900X are quite even across the board when compared to the Zen+ based Ryzen 2700X. We do note, however, somewhat larger performance increases in 403.gcc and 483.xalancbmk. It’s not immediately clear why, as the benchmarks don’t have one particular characteristic that would fit Zen2’s design improvements, but I suspect it’s linked to the larger L3 cache.

445.gobmk in particular is a branch-heavy workload, and the 35% increase in performance here would be better explained by Zen2’s new additional TAGE branch predictor which is able to reduce overall branch misses.

It’s also interesting that although the Ryzen 3900X posted worse memory latency results than the 2700X, it’s still able to outperform the latter in memory sensitive workloads such as 429.mcf, although the increase for 471.omnetpp is amongst the smallest in the suite.

However, we still see that AMD has an overall larger disadvantage to Intel in these memory sensitive tests: the 9900K has a large advantage in 429.mcf and posts a large lead in the very memory bandwidth intensive 462.libquantum, the two tests that put the most pressure on the caches and memory subsystem.

SPECfp2006(C/C++) Speed Estimated Scores

In the fp2006 benchmarks, we again see some larger jumps on the part of the Ryzen 3900X, particularly in 482.sphinx3. These two tests along with 450.soplex are characterized by higher data cache misses, so Zen2’s 16MB L3 cache should definitely be part of the reason we see such large jumps.

I found it interesting that we’re not seeing much improvement in 470.lbm even though this is a test that is data store heavy, so I would have expected Zen2’s additional store AGU to greatly benefit this workload. There must be some higher level memory limitation that is bottlenecking the test.

453.povray is neither data heavy nor branch heavy, as it’s one of the simpler workloads in the suite. Here the bottlenecks are mostly the throughput of the execution backend and the ability of the front-end to feed it fast enough. So while the Ryzen 3900X provides a big boost over the 2700X, it’s still lagging well behind the 9900K, a characteristic we’re also seeing in the similarly execution-bottlenecked 456.hmmer of the integer suite.

SPEC2006 Speed Estimated Total

Overall, the 3900X is 25% faster than the 2700X in the integer and floating point tests of the SPEC2006 suite, which corresponds to a 17% IPC increase, above AMD's officially published figures for IPC increases.
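As a rough sanity check on how a performance uplift translates into an IPC figure, single-threaded performance can be treated as IPC multiplied by clock frequency. The sketch below divides the performance ratio by the frequency ratio. The clock speeds used (4.6GHz for the 3900X and 4.3GHz for the 2700X) are assumed peak boost clocks rather than frequencies measured during the runs, so the result is only an approximation.

```python
def ipc_uplift(perf_ratio, new_ghz, old_ghz):
    """Estimate the IPC gain implied by a performance ratio and two clock speeds.

    Assumes performance = IPC * frequency, so the IPC ratio is the
    performance ratio divided by the frequency ratio.
    """
    freq_ratio = new_ghz / old_ghz
    return perf_ratio / freq_ratio - 1.0


if __name__ == "__main__":
    # 1.25 is the 25% uplift quoted above; 4.6 and 4.3 GHz are assumed
    # peak boost clocks for the 3900X and 2700X, not measured values.
    print(f"{ipc_uplift(1.25, 4.6, 4.3):.1%}")
```

Under those assumptions the estimate works out to roughly 17%, in line with the figure quoted above.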

Moving on to the 2017 suite, we have to clarify that we’re using the Rate benchmark variations. The 2017 suite’s speed and rate benchmarks differ from each other in terms of workloads. The speed tests were designed for single-threaded testing and have large memory demands of up to 11GB, while the rate tests were meant for multi-process tests. We’re using the rate variations of the benchmarks because we don’t see any large differentiation between the two variations in terms of their characterisation, and thus the performance scaling between the two should be extremely similar. On top of that, the rate benchmarks take up to 5x less time (+1 hour vs +6 hours), and we're able to run them on more memory limited platforms (which we plan to do in the future).

SPECint2017 Rate-1 Estimated Scores

In the int2017 suite, we’re seeing similar performance differences and improvements, although this time around there are a few workloads that are a bit more limited in terms of their performance boosts on the new Ryzen 3900X.

Unfortunately I’m not quite as familiar with the exact characteristics of these tests as I am with the 2006 suite, so a more detailed analysis should follow in the next few months as we delve deeper into microarchitectural counters.

SPECfp2017 Rate-1 Estimated Scores

In the fp2017 suite, things are also quite even. Interestingly enough, here in particular AMD is able to leapfrog Intel’s 9900K in a lot more workloads, sometimes winning in terms of absolute performance and sometimes losing.

SPEC2017 Rate-1 Estimated Total

As for the overall performance scores, the new Ryzen 3900X improves by 23% over the 2700X. Although it closes the gap greatly, it remains just a hair's width shy of actually beating the 9900K’s absolute single-threaded performance.
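For readers wondering how a set of per-benchmark results collapses into one overall figure: SPEC defines its overall metrics as the geometric mean of the individual benchmark ratios. The sketch below shows that aggregation and the uplift between two chips; that the totals here follow exactly this method is an assumption, and the function names are purely illustrative.

```python
import math


def geometric_mean(scores):
    """Geometric mean of a non-empty list of positive per-benchmark scores."""
    if not scores or any(s <= 0 for s in scores):
        raise ValueError("scores must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


def overall_uplift(new_scores, old_scores):
    """Relative gain of one chip's overall score over another's."""
    return geometric_mean(new_scores) / geometric_mean(old_scores) - 1.0
```

The geometric mean is used rather than the arithmetic mean so that a single benchmark with a very large score cannot dominate the total.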

SPEC2017 Rate-1 Estimated Performance Per GHz

Normalising the scores for frequency, we see that AMD has achieved something that the company hasn’t been able to claim in over 15 years: it has beaten Intel in terms of overall IPC. Overall here, the IPC improvements over Zen+ are 15%, which is a bit lower than the 17% figure for SPEC2006.
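The normalisation itself is simple: divide each chip's overall score by its clock frequency and compare the results. The sketch below shows the idea. The scores in it are placeholders scaled so that the 2700X equals 100 (the 3900X value reflects the 23% uplift quoted earlier, and the 9900K value is invented purely for illustration), and the clock speeds are assumed peak boost clocks rather than measured frequencies.

```python
def per_ghz(score, peak_ghz):
    """Score normalised by clock frequency, a rough proxy for IPC."""
    return score / peak_ghz


def rank_per_ghz(results):
    """results maps chip name -> (score, peak_ghz); returns names, best first."""
    return sorted(results, key=lambda name: per_ghz(*results[name]), reverse=True)


if __name__ == "__main__":
    # PLACEHOLDER scores, not measured results; clocks are assumed boost clocks.
    results = {
        "Ryzen 2700X": (100.0, 4.3),
        "Ryzen 3900X": (123.0, 4.6),
        "i9-9900K": (124.0, 5.0),
    }
    for name in rank_per_ghz(results):
        print(f"{name}: {per_ghz(*results[name]):.2f} per GHz")
```

A chip can therefore trail slightly in absolute score yet lead once frequency is factored out, which is the situation described above.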

We already know about Intel’s new upcoming Sunny Cove microarchitecture which should undoubtedly be able to regain the IPC crown with relative ease, but the question for Intel is if they’ll be able to still maintain the single-thread absolute performance crown and continue to see 5GHz or similar clock speeds with the new core design.

447 Comments

  • Korguz - Tuesday, July 9, 2019 - link

    what ever phynaz... anandtech did a write up on the power intel uses for their chips : https://www.anandtech.com/show/13544/why-intel-pro...
    that link you posted, looks like a mistake was made with communication between several different parties, as the poster said, considering this is a new cpu, and accompanying mobo/chipset, things like this do happen, even intel has had its own issues with a new platform, and we will have to see how it levels off in the coming few weeks. amd does stay within its TDP limits better than intel.. at least when amd says their cpus use XXX watts, it uses around that number, unlike intel, where a 95watt cpu, can use up to 200 watts, as the link i posted shows...

    you sure like to throw insults around dont you ? does it make you feel better about your self ? in the end.. maybe its YOU that cant handle the truth about your beloved intel ? face it, compared to zen2/ryzen 3, intels cpus use more power, and zen2 costs LESS than intels equivalent cpu, and amd has IPC parity with intel.
  • Xyler94 - Wednesday, July 10, 2019 - link

    Phynaz... you may wanna rethink your TDP argument there...

    Intel's i9 9900k's TDP is 95W, however regularly hits over 200W without an overclock.
  • just4U - Monday, July 8, 2019 - link

    The silver award seems apt. Since it certainly lived up to expectations and in some instances surpasses them. gold if it's the clear winner in everything, platinum if it beats out all expectations..
  • Meteor2 - Monday, July 15, 2019 - link

    Ryzen 3000 beats Intel ST and MT per Watt or per dollar, which are the only metrics which matter. Otherwise how are you comparing like with like?
  • patmanRR - Sunday, July 7, 2019 - link

    Using llvm for c/c++/Fortran codes is most likely to result in slower performance than gcc (and even more likely than Intel compilers). I do not know if the performance impact is more/less/the same among Intel and AMD CPUs but I do not really trust these numbers in the first pages of the review.
  • Dragonsteel - Sunday, July 7, 2019 - link

    I'm excited by the 3800X, which based on this article, may showcase a much higher performance (and power) output at higher multi threaded applications.

    I'm very much looking forward to the inclusion of the 3800X numbers. Would also like to see some game updates with the 2080 and such at 1440p as most of the tests skipped that resolution and went to 4K. The 4K results mostly showed the GPU bottleneck.
  • danjw - Sunday, July 7, 2019 - link

    I was really looking forward to reading this review. I look forward to finding out what is going on with your PCMark numbers. I appreciate that you guys are willing to go the extra mile when you see something not looking right. Thank you and keep up the great work guys!
  • AshlayW - Sunday, July 7, 2019 - link

    Great review, thanks. Gains are good but I'm more than happy with my 2700X for now so I'll likely be waiting for Ryzen 4000. Seems like Intel CPUs are more or less obsolete at their current prices now unless you absolutely need the best possible gaming performance at any cost. (more money than sense).

    One nitpick, though. I completely disagree with this statement:

    "Ultimately, while AMD still lags behind Intel in gaming performance, the gap has narrowed immensely, to the point that Ryzen CPUs are no longer something to be dismissed if you want to have a high-end gaming machine."

    Specifically, about "dismissing" AMD Ryzen CPUs for high end gaming machines, I mean the 2nd and 1st gen ones. I have built many "high end" gaming machines, with Ryzen 1800X and 2700X and they are excellent. Anyone that "dismisses" Ryzen 1 or 2 for a high end gaming machine is a tool. (I'm gaming at 144Hz on a 2700X, lol).

    But I understand the point trying to be made. Gaming was the last bastion for Ryzen in absolute performance and now they have sort of cracked it. 9900K for 480+ bucks is going to be a hard sell with these new chips on the market. Where are these rumoured Intel Price cuts? or is chipzilla really that arrogant?
  • GlossGhost - Monday, July 8, 2019 - link

    I think he said that because most people want to see AMD close to/or beat Intel in order to finally look at the processors as a proper alternative. I am playing on an R5 2600 daily and in what I need it to perform, it does great. People like us who have long researched and dug into those Ryzens will probably have already switched. Now it's time for those like my colleagues who, when I showed the performances, went into deep thoughts as to how to plan their next Ryzen builds.
  • ballsystemlord - Sunday, July 7, 2019 - link

    Spelling, grammar, and 2 technical corrections (thus far):

    "...meaning for the very vast majority of workloads, you're better off staying at or under DDR4-3600 with a 1:1 MC:IF ratio."
    Actually, AMD's graph shows DDR4-3733, not DDR4-3600 before the 2:1 IF ratio sets in.
    "...meaning for the very vast majority of workloads, you're better off staying at or under DDR4-3733 with a 1:1 MC:IF ratio."

    "...this put a lot more pressure on the L2 cache capacity, ..."
    Missing "s":
    "...this puts a lot more pressure on the L2 cache capacity, ..."

    "AMD here has essentially as 60% advantage in bandwidth as the CCX's L3 is much faster than Intel's L3"
    "a" not "as". Maybe get rid of the "essentially"?
    "AMD here has essentially a 60% advantage in bandwidth as the CCX's L3 is much faster than Intel's L3"

    "The X570 chipset is the first chipset its manufactured in-house using ASMedia's IP, whereas previously with the X470 and X370 chipsets, ASMedia developed and produced it based on its 55nm architecture."
    This sentence makes absolutely no sense. Have another cup of coffee? :)

    "...on top of being able to run them on more memory limited platforms which we plan on to do in the future."
    Excess "on".
    "...on top of being able to run them on more memory limited platforms which we plan to do in the future."

    "We're seeing quite an interesting match-up against Intel's 9700K here which is leading the all the benchmarks."
    Extra "the":
    "We're seeing quite an interesting match-up against Intel's 9700K here which is leading all the benchmarks."

    "In our test, we take v1.3.3 of the software with a good sized data set of 84 x 18 megapixel photos and push it through a reasonably fast variant of the algorithms, but is still more stringent than our 2017 test."
    Replace "is" for "they are" as the word algorithms is plural.
    "In our test, we take v1.3.3 of the software with a good sized data set of 84 x 18 megapixel photos and push it through a reasonably fast variant of the algorithms, but they are still more stringent than our 2017 test."

    "Please note, if you plan to share out the Compression graph, please include the Decompression one. Otherwise you're only presenting half a picture."
    Excess words, try:
    "Please note, if you plan to share our Compression graph, please include the Decompression one. Otherwise you're only presenting half a picture."

    "but actually also raising the clock frequency at the same time, bringing for some impressive power efficiency benefits."
    Excess "for" or bringing:
    "but actually also raising the clock frequency at the same time, bringing some impressive power efficiency benefits."
    OR
    "but actually also raising the clock frequency at the same time, for some impressive power efficiency benefits."

    "Not that Zen 2 is soley about memory performance, either."
    Missing "l":
    "Not that Zen 2 is solely about memory performance, either."

    "We've also seen the core's new 256-bit (AVX2) vector datapaths to work very well."
    Excess "to":
    "We've also seen the core's new 256-bit (AVX2) vector datapaths work very well."

    "Intel's higher achieved frequencies as well as continued larger lead in memory sensitive workloads are still goals that AMD has to work towards to"
    Excess "to":
    "Intel's higher achieved frequencies as well as continued larger lead in memory sensitive workloads are still goals that AMD has to work towards"

    "The new design did seemingly make some compromises, and we saw that the DRAM memory latency of this new system architecture is slower than the previous monolithic implementation. However, here is also where things get interesting. Even though this is a theoretical regression on paper, when it comes to actual performance in workloads the regression is essentially non-existent, and AMD is able to showcase improvements even in the most memory-sensitive workloads."
    Not strictly accurate. AMD is showing a regression in performance compared to themselves in the "3DMark Physics - Ice Storm Unlimited" and "AppTimer: GIMP" benchmarks. GIMP is single threaded and the 3900X is losing to the 2700X. Again, the same with "Ice Storm Unlimited", but I suspect that we're hitting a performance ceiling here.
    I suspect if you deep dive into the regression in GIMP you'll find something more interesting than just a memory bottle-neck.
