The AMD Zen and Ryzen 7 Review: A Deep Dive on 1800X, 1700X and 1700
by Ian Cutress on March 2, 2017 9:00 AM EST
Chipsets and Motherboards: 300-series and AM4
Users keeping tabs on the developments of CPUs will have seen the shift over the last ten years to moving the traditional ‘northbridge’ onto the main CPU die. The northbridge was typically the connectivity hub, allowing the CPU to communicate to the PCIe, DRAM and the Chipset (or Southbridge), and moving this onto the CPU silicon gave better latency, better power characteristics, and reduced the complexity of the motherboard, all for a little extra die area. Typically when we say ‘CPU’ in the context of a modern PC build, this is the image we have, with the CPU containing cores and possibly graphics (which AMD calls an APU).
Typically the CPU/APU has limited connectivity: video outputs (if an integrated GPU is present), a PCIe root complex for the main PCIe lanes, and an additional connectivity pathway to the chipset to enable additional input/output functionality. The chipset uses a one-to-many philosophy, whereby the total bandwidth between the CPU and Chipset may be lower than the total bandwidth of all the functionality coming out of the chipset. Using FIFO buffers, this is typically managed as required. The best analogy for this is that a motorway is not 50 million lanes wide, because not all cars use it at the same time. You only need a few lanes to cater for all but the busiest circumstances.
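The one-to-many idea can be put into rough numbers. The sketch below is illustrative only: the per-port figures come from the standard line rates and encodings, but the port counts of the example chipset are made up, and `oversubscription` is a hypothetical helper, not anything AMD provides.

```python
# Approximate usable bandwidth per lane/port in GB/s, after line encoding.
PCIE3_LANE = 8 * (128 / 130) / 8  # PCIe 3.0: 8 GT/s, 128b/130b encoding
SATA3_PORT = 6 * (8 / 10) / 8     # SATA 6 Gbps: 8b/10b encoding
USB3_PORT = 5 * (8 / 10) / 8      # USB 3.0 (5 Gbps): 8b/10b encoding


def oversubscription(uplink_gbs, downstream):
    """Return (total downstream GB/s, ratio of downstream to uplink).

    downstream is a list of (per-port GB/s, port count) pairs.
    """
    total = sum(per_port * count for per_port, count in downstream)
    return total, total / uplink_gbs


# Hypothetical chipset: x4 PCIe 3.0 uplink feeding 4 SATA ports,
# 6 USB 3.0 ports and 4 downstream PCIe 3.0 lanes (made-up port counts).
uplink = 4 * PCIE3_LANE
total, ratio = oversubscription(
    uplink, [(SATA3_PORT, 4), (USB3_PORT, 6), (PCIE3_LANE, 4)]
)
print(f"uplink {uplink:.2f} GB/s, downstream {total:.2f} GB/s, ratio {ratio:.2f}x")
```

With these assumed port counts the downstream side could in theory ask for a bit more than twice what the uplink can carry, which only matters if everything is busy at once.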
If the CPU also has the chipset/southbridge built in, either in the silicon or as a multi-chip package, we typically call this an ‘SoC’, or system on chip, as the one unit has all the connectivity needed to fully enable its use. Add on some slots, some power delivery and firmware, then away you go.
Platform = SoC + Chipset (Optional)
The AM4 platform will cater for both Ryzen CPUs (Summit Ridge, SR) and the second generation of Excavator APUs (Bristol Ridge, BR). As a result, the capabilities of the two have to share some commonality in order to be interoperable with the same motherboard products, and there are only a few minor differences:
Here is Bristol Ridge, with eight PCIe 3.0 lanes for add-in cards, two SATA 6 Gbps, four USB 3.0 ports, two PCIe x1 lanes, and a PCIe x4 lane for the chipset. The chipset is optional, as those four lanes could be put to use elsewhere (or bifurcated into x1/x1/x2 as required) when extra IO is not needed.
What differs with Ryzen and Summit Ridge is the numbers: sixteen lanes for add-in cards and four SATA 6 Gbps ports plus an x2 NVMe (or two SATA plus an x4 NVMe). What AMD is doing with AM4 is a half-way house between an SoC and having a fully external chipset. Some of the connectivity, such as SATA ports, PCIe storage, or PCIe lanes beyond the standard GPU lanes, is built into the processor. These fall under the features of the processor, and for the current launch they are a fixed set of features. The CPU also has additional connectivity to a chipset that can provide more features, although using the chipset is optional.
PCIe is Fun with Switches: PLX, Thunderbolt, 10GigE, the Kitchen Sink
Another thing about the x16 link or x8/x8 links, rather than say a total of 28/40 lanes, is that it can be combined with an external PCIe switch. In my discussions with AMD, they suggested a switch that bifurcates a x8 to dual x4 interfaces, which could leverage fast PCIe storage while maintaining the onboard graphics for any GPU duties. There’s the other side, in using an x16 or x8 to x32 PCIe switch and affording two large x16 links.
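The lane arithmetic behind splitting a link is simple enough to sketch. The helper below is a hypothetical illustration that only checks lane counts; it assumes the common link widths of x1 to x16, and real hardware supports only the specific bifurcation modes the CPU and firmware expose. A true PCIe switch (such as x16 up to two x16 down) is a different case, since it shares upstream bandwidth rather than dividing lanes.

```python
# Common PCIe link widths (an assumption for this sketch; the spec allows others).
VALID_WIDTHS = {1, 2, 4, 8, 16}


def can_split(total_lanes, widths):
    """True if every link in widths is a common width and they fit in total_lanes."""
    return all(w in VALID_WIDTHS for w in widths) and sum(widths) <= total_lanes


print(can_split(16, [8, 8]))    # x16 as x8/x8
print(can_split(8, [4, 4]))     # x8 as dual x4
print(can_split(4, [1, 1, 2]))  # x4 as x1/x1/x2
print(can_split(8, [4, 4, 4]))  # three x4 links do not fit in x8
```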
Here’s a crazy mockup I thought of, using a $100 PCIe switch. I doubt this would come to market.
Ian plays a crazy game of PCIe Lego
The joy of PCIe and switches is that it becomes a mix and match game - there’s also the PCIe 3.0 x4 to the chipset. This can be used for non-chipset duties, such as anything that takes PCIe 3.0 x4 like a fast SSD, or potentially Thunderbolt 3. We discussed TB3 support, via Intel’s Alpine Ridge controller, and we were told that the AM4 platform is currently being validated for systems supporting AMD XConnect, which will require Thunderbolt support. AMD did state that they are not willing to speculate on TB3 use, and from my perspective this is because the external GPU feature is what AMD is counting on as being the primary draw for TB3 enabled systems (particularly for OEMs). I suspect the traditional motherboard manufacturers will offer wilder designs, and ASRock likes to throw some spaghetti at the wall, to see what sticks.
The AM4 Socket
One of the common perceptions of AMD is that they like to keep a socket for many, many generations. Both AM3/3+ and FM2/2+ have existed for at least three generations of CPUs apiece, and the main thinking here is that backwards compatibility was important. Because Ryzen requires DDR4 memory, enables PCIe, and perhaps some other magic, more pins and a new socket are needed, breaking that trend. The new CPU uses 1331 pins, up from the 939-941 we’ve had before, but in essentially the same dimensions, meaning that the pins are smaller and now easier to break, or to accidentally smear thermal grease into without being able to get it out.
The socket then changes to accommodate the pins, but AMD is still using a zero insertion force (ZIF) method for contact, which means that every so often, if I have a good seal between the CPU and the heatsink, removing the heatsink also removes the CPU – not a critical issue, but not the preferred state of things.
The socket hole mounting does change, however, from AM3 to AM4. On AM3 the dimensions were, center hole to center hole, 96mm x 48mm, which led to a very cumbersome rectangular design. AM4 makes the design more square, at 90mm x 54mm, but it is still rectangular. With the new mounting holes, users will need new brackets for existing CPU coolers, or will need to buy new coolers with the new brackets. More on that in a second.
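As a quick check on how much more square the new layout is, the two hole spacings quoted above can be compared by aspect ratio. This is plain arithmetic on the figures in the text; `aspect_ratio` is just an illustrative helper.

```python
def aspect_ratio(long_mm, short_mm):
    """Long side divided by short side; 1.0 would be a perfect square."""
    return long_mm / short_mm


am3 = aspect_ratio(96, 48)  # AM3 hole spacing
am4 = aspect_ratio(90, 54)  # AM4 hole spacing
print(f"AM3 {am3:.2f}:1, AM4 {am4:.2f}:1")
```

AM3 works out to exactly 2:1 and AM4 to roughly 1.67:1, so the new pattern is closer to square without actually being square.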
It is worth noting that AMD has still kept the same mounting mechanism from AM3 for coolers that use a spring loaded mount on AMD’s plastic clips, so users with a spring loaded mount cooler will still be able to use it, as the plastic clip on the new boards is in the right place. Some motherboard manufacturers are also taking the initiative and implementing both the AM3 and AM4 sets of mounting holes, so that older coolers can still be used. However, using an older cooler might result in a few issues depending on the screw height, as reviewers have already been complaining about bad contact. In our review today, we used a Noctua NH-U12S SE-AM4 cooler that implements rails and then a secondary screw system, and it has none of the issues we just described. Ultimately all motherboards will end up with just AM4 holes, which should facilitate smaller form factor motherboards where AM3 was perhaps too rectangular.
Coolers Supporting AMD Ryzen Processors in AM4 Form-Factor
Manufacturer | Already Compatible | Requires New Mounting Kit | Upcoming
ARCTIC | Alpine 64 Plus Alpine 64 Pro Alpine 64 GT Freezer 7 Pro Freezer 13 Freezer 13 Limited Freezer 13 CO Freezer Extreme | Liquid Freezer 120/240/360 (kit to be available in April) | Freezer 12 Freezer 33
Be Quiet! | Pure Rock Pure Rock Slim Shadow Rock LP | Dark Rock 3 Dark Rock Pro 3 Dark Rock TF Shadow Rock 2 Shadow Rock Slim Silent Loop |
Corsair | H60 H110i H100i | All the remaining Hydro coolers. Customers can claim their retention bracket for free; no proof of purchase is required. |
Cryorig | - | C1 R1 Universal/Ultimate H5 Universal/Ultimate H7 H7 Quad Lumi M9a C7 A40 A40 Ultimate A80 |
Cooler Master | MasterLiquid 240 MasterLiquid 120 MasterLiquid Lite 120 Hyper 212 LED Turbo Hyper T4 Hyper TX3 (Plastics) Hyper TX3 EVO (EU Ver.) Hyper TX3 EVO (JP Ver.) Hyper TX3 EVO Hyper T2 Hyper 101 PWM U BLIZZARD T2 BLIZZARD T2 MINI | MasterLiquid Pro Series Nepton Series Seidon Series MasterAir Maker 8 MasterAir Pro 4 MasterAir Pro 3 Hyper 612 Ver.2 Hyper 412 Series Hyper 212 LED Hyper 212 EVO Hyper 212 X Hyper D92 |
DeepCool | Beta 10 Beta 200ST CK0AM209 Gammaxx Gamma Archer Ice Blade 100 Ice Blade 200M Ice Edge Mini FS 2.0 | Assassin II Beta 11 Beta 40 Captain Series Frostwin Series Gabriel HTPC-200 Ice Blade Pro 2.0 Lucifer Series Maelstrom Series Neptwin Series |
Enermax | ETS-T50A-BVT ETS-T50A-WVS ETS-T40F-TB ETS-T40F-BK ETS-T40F-W ETS-T40F-RF ETS-N30R-HE ETS-N30R-TAA ELC-LMR240-BS ELC-LMR120S-BS | |
MSI | Core Frozr L | |
Noctua | NH-D15 SE-AM4 NH-U12S SE-AM4 NH-L9x65 SE-AM4 | NH-C12P NH-C12P SE14 NH-C14 NH-C14S NH-D14 NH-D14 SE2011 NH-D15 NH-D15S NH-D9L NH-L12 NH-L9x65 NH-U12 NH-U12F NH-U12P NH-U12P SE1366 NH-U12P SE2 NH-U9 NH-U9B NH-U9B SE2 NH-U9F NH-U12DX NH-U12DX 1366 NH-U12DX i4 NH-U9DX i4 NH-U9DX 1366 (NM-AM4 upgrade-kit) |
Noctua | | NH-U14S NH-U12S NH-U9S (NM-AM4-UxS upgrade-kit) |
Anton helpfully put this table together of the recent public statements from the various cooler companies regarding support of users who currently own their products. The statements range from nothing yet to offering a free bracket with proof of purchase of CPU, to proof of purchase with cooler. We suspect more information will come out in due course.
Motherboards
There’s going to be a wide ranging mix of AM4 motherboards available, and AMD has already been promoting them as early as January. Recent reports put the number at around 80-85 boards (some for specific regions/customers), using either the X370, B350 or A320 chipsets. In the review samples that the tech press were given, it was a random allocation of either the ASUS Crosshair VI Hero, the MSI X370 XPower Titanium, the ASRock X370 Taichi or the GIGABYTE AX370 Gaming 5. Other vendors such as ECS and Biostar will also be joining the fray on the shelves.
Pricing for AM4 motherboards is expected to start from as low as $50 USD, all the way up to $350 and perhaps beyond. We’ll do a bigger motherboard analysis piece later.
574 Comments
nt300 - Saturday, March 11, 2017 - link
If AMD hadn't gone with GF's 14nm process, ZEN would probably have been delayed. I think as soon as Ryzen Optimizations come out, these chips will further outperform.
MongGrel - Thursday, March 9, 2017 - link
For some reason making a casual comment about anything bad about the chip will get you banned at the drop of a hat on the tech forums, and then if you call him out they will ban you more.
https://arstechnica.com/gadgets/2017/03/amds-momen...
MongGrel - Thursday, March 9, 2017 - link
For some reason, MarkFW seems to thinks he is the reincarnation of Kyle Bennet, and whines a lot before retreating to his safe space.
nt300 - Saturday, March 11, 2017 - link
I've noticed in the past that AMD has an issue with increasing L3 cache speed and/or Latencies. Hopefully they start tightening the L3 as much as possible. Can Anandtech do a comparison between Ryzen before Optimizations and after Optimizations. Ty
alpha754293 - Friday, March 17, 2017 - link
Looks like that for a lot of the compute-intensive benchmarks, the new Ryzen isn't that much better than say a Core i5-7700K.
That's quite a bit disappointing.
AMD needs to up their FLOPS/cycle game in order to be able to compete in that space.
Such a pity because the original Opterons were a great value proposition vs. the Intels. Now, it doesn't even come close.
deltaFx2 - Saturday, March 25, 2017 - link
@Ian Cutress: When you do test gaming, if you can, I'd love to have the hypothesis behind the 'generally accepted methodology' tested out. The methodology being, to test it at lowest resolution. The hypothesis is that this stresses the CPU, and that a future, higher performance GPU will be bottlenecked by the slower CPU. Sounds logical, but is it?
Here's the thing: Typically, when given more computing resources, people scale up their problem to utilize those resources. In other words, if I give you a more powerful GPU, games will scale up their perf requirements to match it, by doing stuff that were not possible/practical in earlier GPUs. Today's games are far more 'realistic' and are played at much higher resolutions than say 5 years ago. In which case, the GPU is always the limiting factor no matter what (unless one insists on playing 5 year old games on the biggest, baddest GPU). And I fully expect that the games of today are built to max out current GPUs, so hardware lags software.
This has parallels with what happens in HPC: when you get more compute nodes for HPC problems, people scale up the complexity of their simulations rather than running the old, simplified simulations. Amdahl's law is still not a limiting factor for HPC, and we seem to be talking about Exascale machines now. Clearly, there's life in HPC beyond what a myopic view through the Amdahl law lens would indicate.
Just a thought :) Clearly, core count requirements have gone up over the last decade, but is it true that a 4c/8t sandy bridge paired up with Nvidia's latest and greatest is CPU-bottlenecked at likely resolutions?
wavelength - Friday, March 31, 2017 - link
I would love to see Anand test against AdoredTV's most recent findings on Ryzen https://www.youtube.com/watch?v=0tfTZjugDeg
LawJikal - Friday, April 21, 2017 - link
What I'm surprised to see missing... in virtually all reviews across the web... is any discussion (by a publication or its readers) on the AM4 platform's longevity and upgradability (in addition to its cost, which is readily discussed).
Any Intel Platform - is almost guaranteed to not accommodate a new or significantly revised microarchitecture... beyond the mere "tick". In order to enjoy a "tock", one MUST purchase a new motherboard (if historical precedent is maintained).
AMD AM4 Platform - is almost guaranteed to, AT LEAST, accommodate Ryzen "II" and quite possibly Ryzen "III" processors. And, in such cases, only a new processor and BIOS update will be necessary to do so.
This is not an insignificant point of differentiation.
PeterCordes - Monday, June 5, 2017 - link
The uArch comparison table has some errors for the Intel columns. Dispatch/cycle: Skylake can read 6 uops per clock from the uop cache into the issue queue, but the issue stage itself is still only 4 uops wide. You've labelled Even running from the loop buffer (LSD), it can only sustain a throughput of 4 uops per clock, same 4-wide pipeline width it has been since Core2. (pre-Haswell it has to be a mix of ALU and some store or load to sustain that throughput without bottlenecking on the execution ports.) Skylake's improved decode and uop-cache bandwidth lets it refill the uop queue (IDQ) after bubbles in earlier stages, keeping the issue stage fed (since the back-end is often able to actually keep up).
Ryzen is 6-wide, but I think I've read that it can only issue 6 uops per clock if some of them are from "double instructions". e.g. 256-bit AVX like VADDPS ymm0, ymm1, ymm2 that decodes to two separate 128-bit uops. Running code with only single-uop instructions, the Ryzen's front-end throughput is 5 uops per clock.
In Intel terminology, "dispatch" is when the scheduler (aka Reservation Station) sends uops to the execution units. The row you've labelled "dispatch / cycle" is clearly the throughput for issuing uops from the front-end into the out-of-order core, though. (Putting them into the ROB and Reservation Station). Some computer-architecture people call that "dispatch", but it's probably not a good idea in an x86 context. (Unless AMD uses that terminology; I'm mostly familiar with Intel).
----
You list the uop queue size at 128 for Skylake. This is bogus. It's always 64 per thread, with or without hyperthreading. Intel has alternated in SnB/IvB/HSW/SKL between this and letting one thread use both queues as a single big queue. HSW/BDW statically partition their 56-entry queue into two 28-entry halves when two threads are active, otherwise it's a 56-entry queue. (Not 64). Agner Fog's microarch pdf and Intel's optimization manual both confirm this (in Section 2.1.1 about Skylake's front-end improvements over previous generations).
Also, the 4-uop per clock issue width is 4 fused-domain uops, so I was able to construct a loop that runs 7 unfused-domain uops per clock (http://www.agner.org/optimize/blog/read.php?i=415#...) with 2 micro-fused ALU+load, one micro-fused store, and a dec/branch. AMD doesn't talk about "unfused" uops because it doesn't use a unified scheduler, IIRC, so memory source operands always stay with the ALU uop.
Also, you mentioned it in the text, but the L1d change from write-through to write-back is worth a table row. IIRC, Bulldozer's L1d write-back has a small buffer or something to absorb repeated writes of the same lines, so it's not quite as bad as a classic write-through cache would be for L2 speed/power requirements, but Ryzen is still a big improvement.