Conclusion

The march towards Ryzen has been a long road for AMD. Anyone creating a new CPU microarchitecture deserves credit, as designing something this complex requires millions of hours of hard graft. Nonetheless, hard graft doesn’t always guarantee success, and AMD’s targets of low power, small area, and high performance while using x86 instructions set a spectacularly high bar, especially against a large blue incumbent that regularly outspends them in R&D many times over.

Through the initial disclosures on the Zen microarchitecture, one thing was clear when speaking to the senior staff, such as Dr. Lisa Su, Mark Papermaster, Jim Anderson, Mike Clark, Sam Naffziger and others: that quiet confidence an engineer gets when they know their product is not only good but competitive. The PR campaign up until this point of launch has been managed (we assume) such that the trickle of information comes down the pipe and keeps people on the edge of their seat. Given the interest in Ryzen, it has worked.

Ryzen is AMD’s first foray into CPUs on 14nm and with FinFETs, combined with a new microarchitecture and optimization methods pulled in from previous products such as Excavator/Carrizo and the GPU line. If we were talking tick-tock strategy, such as Intel’s process over the last decade, this is both a tick and a tock in one. Ryzen is the first part of that strategy, arriving on desktop processors first, with server parts coming out in Q2 and notebook APUs in 2H17. One of the main concerns with AMD is typically the ability to execute – get enough good parts out on time and with sufficient performance to merit their launch. As Dr. Su said in our interview, it’s one big hurdle but there are many to come.

AMD Ryzen 7 SKUs
               Cores/Threads  Base/Turbo (GHz)  L3     TDP   Cost  Launch Date
Ryzen 7 1800X  8/16           3.6/4.0           16 MB  95 W  $499  3/2/2017
Ryzen 7 1700X  8/16           3.4/3.8           16 MB  95 W  $399  3/2/2017
Ryzen 7 1700   8/16           3.0/3.7           16 MB  65 W  $329  3/2/2017

Today’s launch of three CPUs, part of the Ryzen 7 family, will be followed by Ryzen 5 in Q2 and Ryzen 3 later in the year. Ryzen 7 uses a single eight-core die, and uses simultaneous multi-threading (SMT) to provide sixteen threads altogether, up to 4.0 GHz on the top Ryzen 7 1800X chip for $499. Officially AMD is positioning the 1800X as a direct competitor to Intel’s i7-6900K, an 8-core processor with hyperthreading that costs over $1000. In our benchmarks, it’s been clear that this battle goes toe-to-toe.
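
To put the pricing in concrete terms, here is a minimal sketch of the cost per hardware thread for the three launch SKUs. The prices and thread counts come from the table above; the $1000 figure for the i7-6900K is only the lower bound implied by "over $1000", and the names used are our own.

```python
# Launch price (USD) and hardware thread count, taken from the Ryzen 7 SKU table.
SKUS = {
    "Ryzen 7 1800X": (499, 16),
    "Ryzen 7 1700X": (399, 16),
    "Ryzen 7 1700": (329, 16),
}

# The text only says the i7-6900K costs "over $1000", so treat this as a floor.
INTEL_I7_6900K_FLOOR_USD = 1000


def price_per_thread(price_usd, threads):
    """Return the launch price divided by the number of hardware threads."""
    return price_usd / threads


for name, (price_usd, threads) in SKUS.items():
    print(f"{name}: ${price_per_thread(price_usd, threads):.2f} per thread")

# Price of the 1800X relative to the floor price of its stated competitor.
ratio_1800x = SKUS["Ryzen 7 1800X"][0] / INTEL_I7_6900K_FLOOR_USD
print(f"1800X costs at most {ratio_1800x:.1%} of the i7-6900K")
```

Because the Intel price is a floor, the ratio is an upper bound: the 1800X comes in at under half the cost of the part it is positioned against.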

Analyzing the Results

In the brief time we had between getting a sample and publishing this review, we were able to run our new benchmark suite on twelve different Intel CPUs, as well as AMD’s former APU crown holder, the A10-7890K. Throughout the discussion about Ryzen, AMD was advertising a 40%+ gain in raw performance per clock over their previous generations, which was then upped to 52% when the CPUs were actually announced. Back of the envelope calculations put Ryzen at or just about the level of the high-end desktop Broadwell CPUs, which means the comparison would come down to pure frequency. That being said, Intel is already running CPUs two generations ahead on its mainstream platform, such as the Kaby Lake based i7-7700K, so it’s going to be an interesting analysis.
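
The back of the envelope reasoning here is the usual first-order model in which single-thread performance scales as performance per clock multiplied by frequency. A minimal sketch of that model follows; the 4.0 GHz figure is the 1800X turbo clock from the table, while the 4.5 GHz competitor clock is a hypothetical value chosen purely for illustration.

```python
def relative_performance(ipc_ratio, freq_a_ghz, freq_b_ghz):
    """First-order estimate of chip A's performance relative to chip B.

    ipc_ratio is A's performance per clock divided by B's. The model ignores
    memory, turbo behaviour, and everything else that makes real results differ.
    """
    return ipc_ratio * (freq_a_ghz / freq_b_ghz)


def ipc_ratio_to_match(freq_a_ghz, freq_b_ghz):
    """Per-clock advantage chip A needs over chip B to tie at the given clocks."""
    return freq_b_ghz / freq_a_ghz


# Equal per-clock performance: the result is decided by frequency alone.
print(relative_performance(1.0, 4.0, 4.5))

# Per-clock advantage a 4.0 GHz part needs to tie a hypothetical 4.5 GHz part.
print(ipc_ratio_to_match(4.0, 4.5))
```

Under this model a part with equal performance per clock but a lower frequency loses in direct proportion to the clock deficit, which is what "a case of pure frequency" amounts to.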

Multi-Threaded Tests

First up, AMD’s strength in our testing was clearly in the multithreaded benchmarks. Given the microarchitectures of AMD’s and Intel’s high-performance x86 cores are somewhat similar, for the most part there’s a similar performance window. Intel’s design is slightly wider and has a more integrated cache hierarchy, meaning it manages to win out on ‘edge cases’ that might be a result of bad code. As one might expect, it takes a lot of R&D to cater for particular edge cases.

But as a workstation-based core design, the Zen microarchitecture pulls no punches. As long as the software doesn’t need strenuous AVX code, and manages its memory carefully (avoiding false sharing, for example), the performance of Ryzen in conventional multi-threaded CPU environments means AMD is back in the game. This is going to be beneficial for the Zen microarchitecture when we see it applied to server environments, especially those that require virtualization.

Single Threaded Tests

Since the launch of Bulldozer, AMD has always been a step behind on single-threaded performance. ST performance is a holy grail of x86 core design, but often requires significantly advanced features that can potentially burn extra power to get there. AMD’s message on this, even in our interview with Dr. Su, is that AMD has the ability to innovate. This is why AMD promotes features such as SenseMI for their new advanced pre-fetch algorithms, why implementing a micro-op cache into the core was a big thing, and why having double the L2 cache of Intel’s comparable parts was important to the story. Nonetheless, AMD hammered a 40%+ IPC gain into its narrative, which then became a 52% gain at launch. We’ll take a deeper look into this in a separate review, but our single threaded results show that AMD is back in the fight.

Most of the data points in this graph come from Intel Kaby Lake processors at various frequencies, but the important thing here is that the Ryzen parts are at least in the mix, sitting above the other eight core parts on the far right of the graph. Also, the jump from the A10 up to the 1700 is a big generational jump, let alone considering the 1800X.

Overall Performance

Putting these two elements into the same graph gives the following:

By this measure, for overall performance, it’s clear the Core i7-7700K is still better in price/performance. However, AMD would argue that the competition for the Ryzen 7 parts is on the right, with the i7-6900K and i7-5960X. In our CPU-based results, AMD wins this performance-per-dollar comparison hands down.

Some Caveats

Since testing for this review, and waiting a few days to even write the conclusion, there has been much discussion about ways in which AMD’s performance is perhaps being neutered. The list includes:

  • Windows 10 RTC disliking 0.25x multipliers, causing timing issues,
  • Software not reading L3 properly (thinking each core has 8MB of L3, rather than 2MB/core),
  • Latency within a CCX being regular, but across CCX boundaries having limited bandwidth,
  • Static partitioning methods being used, with performance gains showing when SMT is disabled,
  • Ryzen showing performance gains with faster memory, more so than expected,
  • Gaming Performance, particularly towards 240 Hz gaming, is being questioned,
  • Microsoft’s scheduler not understanding the different CCX core-to-core latencies,
  • Windows not scheduling threads within a CCX before moving onto the next CCX,
  • Some motherboards having difficulty with DRAM compatibility,
  • Performance-related EFI updates being released very regularly, near weekly, since two weeks before launch.
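
To make the scheduling points concrete, here is a rough sketch of what a CCX-aware placement policy could look like. It assumes two CCXs of four cores each (consistent with 8 MB of L3 per CCX at 2 MB per core), two SMT threads per core, and a logical CPU numbering in which SMT siblings are adjacent. The numbering and the function names are our own assumptions for illustration, not a description of how Windows or AMD implement it.

```python
NUM_CCX = 2           # Ryzen 7: eight cores split across two CCXs
CORES_PER_CCX = 4     # assumption: four cores share one CCX and its L3 slice
THREADS_PER_CORE = 2  # SMT


def ccx_of(logical_cpu):
    """Map a logical CPU index to its CCX (assumes adjacent SMT siblings)."""
    core = logical_cpu // THREADS_PER_CORE
    return core // CORES_PER_CCX


def ccx_first_placement(num_threads):
    """Choose logical CPUs so one CCX is filled before the next is touched.

    Within a CCX, the first hardware thread of every core is used before
    any SMT sibling, so threads get a full core each for as long as possible.
    """
    order = []
    for ccx in range(NUM_CCX):
        first_core = ccx * CORES_PER_CCX
        cores = range(first_core, first_core + CORES_PER_CCX)
        order += [core * THREADS_PER_CORE for core in cores]
        order += [core * THREADS_PER_CORE + 1 for core in cores]
    if num_threads > len(order):
        raise ValueError("more threads requested than logical CPUs")
    return order[:num_threads]


# Four threads land on four different cores of the same CCX.
print(ccx_first_placement(4))
print({ccx_of(cpu) for cpu in ccx_first_placement(4)})
```

Whether filling SMT siblings before spilling onto the second CCX is the better trade-off depends on the workload: it keeps communicating threads behind one L3, but at the cost of sharing cores sooner.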

A number of these we are already taking steps to measure. Some of the fixes for these issues will come from Microsoft’s corner, and we have already been told that both AMD and Microsoft are working to implement scheduling routines to address them. Other elements will be AMD focused: working with software companies to ensure that, after a decade of programming for the other main x86 microarchitecture, it’s a small step to also consider the Zen platform.

At this point, we’re unsure at what level some of these might be default design issues with Zen. The issue of single-thread performance increasing when SMT is disabled (we’ve done some pre-testing, up to 6% in ST) is clearly related to the design of the core, with static partitioning vs competitive partitioning of certain parts of the design. The CCX latency and detection is one that certainly needs further investigation.
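
The static versus competitive distinction can be illustrated with a deliberately simplified model. It assumes that a statically partitioned structure is split in half whenever SMT is enabled, even if the sibling thread is idle, while a competitively shared structure lets a lone thread claim every entry. The 64-entry size is a hypothetical number, not a documented Zen structure.

```python
def single_thread_entries(total_entries, smt_enabled, policy):
    """Entries one thread can use when its SMT sibling is idle.

    policy "static": with SMT enabled each thread owns a fixed half.
    policy "competitive": threads share on demand, so a lone thread
    can claim the whole structure.
    """
    if policy not in ("static", "competitive"):
        raise ValueError(f"unknown policy: {policy}")
    if not smt_enabled or policy == "competitive":
        return total_entries
    return total_entries // 2


QUEUE_ENTRIES = 64  # hypothetical size, for illustration only

for policy in ("static", "competitive"):
    smt_on = single_thread_entries(QUEUE_ENTRIES, True, policy)
    smt_off = single_thread_entries(QUEUE_ENTRIES, False, policy)
    print(f"{policy}: {smt_on} entries with SMT on, {smt_off} with SMT off")
```

In this model only the statically partitioned structure gives a single thread more resources once SMT is switched off, which is consistent with the single-thread gains we saw in pre-testing.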

The future according to Senior Fellow Mike Clark, one of the principal engineers on the Zen microarchitecture, is that AMD knows where the easy gains are for their next generation product (codenamed Zen 2), and they're already working through the list. The question is then: if Intel continues at 5% performance gains clock for clock each year, can AMD make 5-15% and close the gap?
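
How quickly that gap could close is a matter of compounding. The sketch below assumes constant yearly clock-for-clock gains on both sides. The 5% and 5-15% rates are the ones quoted above, while the 10% starting gap is a hypothetical figure, since we have not yet measured the actual per-clock difference.

```python
import math


def years_to_close(gap, rival_gain, own_gain):
    """Years of compounding until the trailing design catches the leader.

    gap is the leader's current advantage as a ratio (1.10 means 10% ahead).
    Gains are yearly fractions (0.05 means 5% per year). Returns math.inf
    when the trailing design does not improve faster than the leader.
    """
    if gap <= 1.0:
        return 0.0
    if own_gain <= rival_gain:
        return math.inf
    return math.log(gap) / math.log((1 + own_gain) / (1 + rival_gain))


HYPOTHETICAL_GAP = 1.10  # assumed starting deficit, for illustration only

for own_gain in (0.05, 0.10, 0.15):
    years = years_to_close(HYPOTHETICAL_GAP, 0.05, own_gain)
    print(f"AMD at {own_gain:.0%} per year: {years:.2f} years")
```

At the bottom of the 5-15% range the gap never closes, because both sides improve at the same rate; anything above 5% closes it eventually, and faster the larger the difference in yearly gains.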

The Silver Lining

It is relevant to point out that Intel is on its 7th Generation of Core microarchitecture. Sure, it looks significantly different from when it was first designed, but software vendors have had seven generations to optimize for it. Coming in and breaking the incumbent’s stranglehold is difficult when every software vendor knows that design inside and out. While AMD’s design looks similar to Intel’s, there are nuances which programmers might not have expected, and so it might be a good couple of years before the programming guides have made their way through into production software.

But,

When we look at the CPU benchmarks as part of this review, which have a strong range in algorithm difficulty and dependency, AMD still does well. There’s no getting around that AMD has a strong workstation core design on their hands. This bodes well for users who need compute at a lower cost, especially when you can pick up an eight-core Ryzen 7 at half the cost of the competition.

This makes the server story for AMD, under Naples, much more interesting.

AnandTech Recommended Award
For Performance/Price on a Workstation CPU

Bulldozer just got Steamrolled. Game on.


574 Comments

  • nt300 - Saturday, March 11, 2017

    If AMD hadn't gone with GF's 14nm process, ZEN would probably have been delayed. I think as soon as Ryzen Optimizations come out, these chips will further outperform.
  • MongGrel - Thursday, March 9, 2017

    For some reason making a casual comment about anything bad about the chip will get you banned at the drop of a hat on the tech forums, and then if you call him out they will ban you more.

    https://arstechnica.com/gadgets/2017/03/amds-momen...

  • MongGrel - Thursday, March 9, 2017

    For some reason, MarkFW seems to think he is the reincarnation of Kyle Bennet, and whines a lot before retreating to his safe space.
  • nt300 - Saturday, March 11, 2017

    I've noticed in the past that AMD has an issue with increasing L3 cache speed and/or latencies. Hopefully they start tightening the L3 as much as possible. Can Anandtech do a comparison between Ryzen before optimizations and after optimizations? Ty
  • alpha754293 - Friday, March 17, 2017

    Looks like for a lot of the compute-intensive benchmarks, the new Ryzen isn't that much better than, say, a Core i7-7700K.

    That's quite a bit disappointing.

    AMD needs to up their FLOPS/cycle game in order to be able to compete in that space.

    Such a pity because the original Opterons were a great value proposition vs. the Intels. Now, it doesn't even come close.
  • deltaFx2 - Saturday, March 25, 2017

    @Ian Cutress: When you do test gaming, if you can, I'd love to have the hypothesis behind the 'generally accepted methodology' tested out. The methodology being, to test it at lowest resolution. The hypothesis is that this stresses the CPU, and that a future, higher performance GPU will be bottlenecked by the slower CPU. Sounds logical, but is it?

    Here's the thing: Typically, when given more computing resources, people scale up their problem to utilize those resources. In other words, if I give you a more powerful GPU, games will scale up their perf requirements to match it, by doing stuff that were not possible/practical in earlier GPUs. Today's games are far more 'realistic' and are played at much higher resolutions than say 5 years ago. In which case, the GPU is always the limiting factor no matter what (unless one insists on playing 5 year old games on the biggest, baddest GPU). And I fully expect that the games of today are built to max out current GPUs, so hardware lags software.

    This has parallels with what happens in HPC: when you get more compute nodes for HPC problems, people scale up the complexity of their simulations rather than running the old, simplified simulations. Amdahl's law is still not a limiting factor for HPC, and we seem to be talking about Exascale machines now. Clearly, there's life in HPC beyond what a myopic view through the Amdahl law lens would indicate.

    Just a thought :) Clearly, core count requirements have gone up over the last decade, but is it true that a 4c/8t sandy bridge paired up with Nvidia's latest and greatest is CPU-bottlenecked at likely resolutions?
  • wavelength - Friday, March 31, 2017

    I would love to see Anand test against AdoredTV's most recent findings on Ryzen https://www.youtube.com/watch?v=0tfTZjugDeg
  • LawJikal - Friday, April 21, 2017

    What I'm surprised to see missing... in virtually all reviews across the web... is any discussion (by a publication or its readers) on the AM4 platform's longevity and upgradability (in addition to its cost, which is readily discussed).

    Any Intel Platform - is almost guaranteed to not accommodate a new or significantly revised microarchitecture... beyond the mere "tick". In order to enjoy a "tock", one MUST purchase a new motherboard (if historical precedent is maintained).

    AMD AM4 Platform - is almost guaranteed to, AT LEAST, accommodate Ryzen "II" and quite possibly Ryzen "III" processors. And, in such cases, only a new processor and BIOS update will be necessary to do so.

    This is not an insignificant point of differentiation.
  • PeterCordes - Monday, June 5, 2017

    The uArch comparison table has some errors for the Intel columns. Dispatch/cycle: Skylake can read 6 uops per clock from the uop cache into the issue queue, but the issue stage itself is still only 4 uops wide. Even running from the loop buffer (LSD), it can only sustain a throughput of 4 uops per clock, the same 4-wide pipeline width it has had since Core2. (Pre-Haswell it has to be a mix of ALU and some store or load to sustain that throughput without bottlenecking on the execution ports.) Skylake's improved decode and uop-cache bandwidth lets it refill the uop queue (IDQ) after bubbles in earlier stages, keeping the issue stage fed (since the back-end is often able to actually keep up).

    Ryzen is 6-wide, but I think I've read that it can only issue 6 uops per clock if some of them are from "double instructions". e.g. 256-bit AVX like VADDPS ymm0, ymm1, ymm2 that decodes to two separate 128-bit uops. Running code with only single-uop instructions, the Ryzen's front-end throughput is 5 uops per clock.

    In Intel terminology, "dispatch" is when the scheduler (aka Reservation Station) sends uops to the execution units. The row you've labelled "dispatch / cycle" is clearly the throughput for issuing uops from the front-end into the out-of-order core, though. (Putting them into the ROB and Reservation Station). Some computer-architecture people call that "dispatch", but it's probably not a good idea in an x86 context. (Unless AMD uses that terminology; I'm mostly familiar with Intel).

    ----

    You list the uop queue size at 128 for Skylake. This is bogus. It's always 64 per thread, with or without hyperthreading. Intel has alternated in SnB/IvB/HSW/SKL between this and letting one thread use both queues as a single big queue. HSW/BDW statically partition their 56-entry queue into two 28-entry halves when two threads are active, otherwise it's a 56-entry queue. (Not 64). Agner Fog's microarch pdf and Intel's optimization manual both confirm this (in Section 2.1.1 about Skylake's front-end improvements over previous generations).

    Also, the 4-uop per clock issue width is 4 fused-domain uops, so I was able to construct a loop that runs 7 unfused-domain uops per clock (http://www.agner.org/optimize/blog/read.php?i=415#...) with 2 micro-fused ALU+load, one micro-fused store, and a dec/branch. AMD doesn't talk about "unfused" uops because it doesn't use a unified scheduler, IIRC, so memory source operands always stay with the ALU uop.

    Also, you mentioned it in the text, but the L1d change from write-through to write-back is worth a table row. IIRC, Bulldozer's write-through L1d has a small buffer or something to absorb repeated writes of the same lines, so it's not quite as bad as a classic write-through cache would be for L2 speed/power requirements, but Ryzen is still a big improvement.
