The High-Level Zen Overview

AMD is keen to stress that the Zen project had three main goals: core, cache and power. The power aspect of the design is one that was very aggressive – not in the sense of aiming for a mobile-first design, but efficiency at the higher performance levels was key in order to be competitive again. It is worth noting that AMD did not mention ‘die size’ in any of the three main goals, which is usually a requirement as well. Arguably you can make a massive core design to run at high performance and low latency, but it comes at the expense of die size which makes the cost of such a design from a product standpoint less economical (if AMD had to rely on 500mm2 die designs in consumer at 14nm, they would be priced way too high). Nevertheless, power was the main concern rather than pure performance or function, which have been typical AMD targets in the past. The shifting of the goal posts was part of the process to creating Zen.

This slide contains a number of features we will hit on later in this piece, but covers a number of main topics which come under those main three goals of core, cache and power.

For the core, having bigger and wider everything was to be expected, however maintaining a low latency can be difficult. Features such as the micro-op cache help most instruction streams improve in performance and bypass parts of potentially long-cycle repetitive operations, but also the larger dispatch, larger retire, larger schedulers and better branch prediction means that higher throughput can be maintained longer and in the fastest order possible. Add in dual threads and the applicability of keeping the functional units occupied with full queues also improves multi-threaded performance.

For the caches, having a faster prefetch and better algorithms ensures the data is ready when each of the caches when a thread needs it. Aiming for faster caches was AMD’s target, and while they are not disclosing latencies or bandwidth at this time, we are being told that L1/L2 bandwidth is doubled with L3 up to 5x.

For the power, AMD has taken what it learned with Carrizo and moved it forward. This involves more aggressive monitoring of critical paths around the core, and better control of the frequency and power in various regions of the silicon. Zen will have more clock regions (it seems various parts of the back-end and front-end can be gated as needed) with features that help improve power efficiency, such as the micro-op cache, the Stack Engine (dedicated low power address manipulation unit) and Move elimination (low-power method for register adjustment - pointers to registers are adjusted rather than going through the high-power scheduler).

The Big Core Diagram

We saw this diagram last year, showing some of the bigger features AMD wants to promote:

The improved branch predictor allows for 2 branches per Branch Target Buffer (BTB), but in the event of tagged instructions will filter through the micro-op cache. On the other side, the decoder can dispatch 4 instructions per cycle however some of those instructions can be fused into the micro-op queue. Fused instructions still come out of the queue as two micro-ops, but take up less buffer space as a result.

As mentioned earlier, the INT and FP pipes and schedulers are separated, however the INT rename space is 168 registers wide, which feeds into 6x14 scheduling queues. The FP employs as 160 entry register file, and both the FP and INT sections feed into a 192-entry retire queue. The retire queue can operate at 8 instructions per cycle, moving up from 4/cycle in previous AMD microarchitectures.

The load/store units are improved, supporting a 72 out-of-order loads, similar to Skylake. We’ll discuss this a bit later. On the FP side there are four pipes (compared to three in previous designs) which support combined 128-bit FMAC instructions. These can be combined for one 256-bit AVX, but beyond that it has to be scheduled over multiple instructions.

The Ryzen Die Fetch and Decode
Comments Locked

574 Comments

View All Comments

  • EchoWars - Thursday, March 2, 2017 - link

    No, apparently the failure was in your education, since it's obvious you did not read the article.
  • Notmyusualid - Friday, March 3, 2017 - link

    Ha...
  • sharath.naik - Thursday, March 2, 2017 - link

    I think you missed the biggest news in this information dump. The TDP is the biggest advantage amd has. Which means that for 150watt server cpu. they should be able to cram a lot more cores than intel will be able to.
  • Meteor2 - Friday, March 3, 2017 - link

    ^^^This. I think AMD's strength with Zen is going to be in servers.
  • Sttm - Friday, March 3, 2017 - link

    Yeah I can see that.
  • UpSpin - Thursday, March 2, 2017 - link

    According to a german site, in games, Ryzen is equal (sometimes higher, sometimes lower) to the Intel i7-6900K in high resolution games (WQHD). Once the resolution is set very low (720p) the Ryzen gets beaten by the Intel processor, but honestly, who cares about low resolution? For games, the probably best bet would be the i7-7700K, mainly because of the higher clock rate, for now. Once the games get better optimized for 8 cores, the 4-core i7-7700K will be beaten for sure, because in multi-threaded applications Ryzen is on par with the twice expensive Intel processor.

    I doubt it makes sense to buy the Core i7-6850K, it has the same low turbo boost frequency the 6900K has, thus low single threaded performance, but at only 6 cores. So I expect that it's the worst from both worlds. Poor multi-threaded performance compared to Ryzen, poor single threaded performance compared to i7-7700K.

    We also have to see how well Ryzen can get overclocked, thus improving single core performance.
  • fanofanand - Thursday, March 2, 2017 - link

    That is a well reasoned comment. Kudos!
  • ShieTar - Thursday, March 2, 2017 - link

    Well, the point of low-resolution testing is, that at normal resolutions you will always be GPU-restricted. So not only Ryzen and the i7-6900K are equal in this test, but so are all other modern and half-modern CPUs including any old FX-8...

    The most interesting question will be how Ryzen performs on those few modern games which manage to be CPU-restricted even in relevant resolutions, e.g. Battlefield 1 Multiplayer. But I think it will be a few more days, if not weeks, until we get that kind of in-depth review.
  • FriendlyUser - Thursday, March 2, 2017 - link

    This is true, but at the same time this artificially magnifies the differences one is going to notice in a real-world scenario. I saw reviews with a Titan X at 1080p, while many will be playing 1440p with a 1060 or RX480.

    The test case must also approximate real life.
  • khanikun - Friday, March 3, 2017 - link

    They aren't testing to show what it's like in real life though. The point of testing is to show the difference between the CPUs. Hence why they are gearing their benchmarking to stress the CPU, not other portions of the system.

Log in

Don't have an account? Sign up now