The High-Level Zen Overview

AMD is keen to stress that the Zen project had three main goals: core, cache and power. The power goal was very aggressive, not in the sense of aiming for a mobile-first design, but because efficiency at the higher performance levels was key to being competitive again. It is worth noting that AMD did not mention ‘die size’ among the three main goals, although it is usually a requirement as well. Arguably you can make a massive core design to run at high performance and low latency, but it comes at the expense of die size, which makes such a design less economical as a product (if AMD had to rely on 500 mm² dies in consumer parts at 14nm, they would be priced far too high). Nevertheless, power was the main concern rather than pure performance or function, which have been typical AMD targets in the past. The shifting of the goal posts was part of the process of creating Zen.

This slide contains a number of features we will hit on later in this piece, and covers the main topics that come under those three goals of core, cache and power.

For the core, having bigger and wider everything was to be expected; however, maintaining a low latency can be difficult. Features such as the micro-op cache help most instruction streams improve in performance and bypass parts of potentially long-cycle repetitive operations, while the larger dispatch, larger retire, larger schedulers and better branch prediction mean that higher throughput can be maintained for longer and in the fastest order possible. Add in dual threads, and keeping the functional units occupied with full queues also improves multi-threaded performance.

For the caches, having a faster prefetch and better algorithms ensures the data is ready in each of the caches when a thread needs it. Aiming for faster caches was AMD’s target, and while they are not disclosing latencies or bandwidth at this time, we are being told that L1/L2 bandwidth is doubled, with L3 up to 5x.

For the power, AMD has taken what it learned with Carrizo and moved it forward. This involves more aggressive monitoring of critical paths around the core, and better control of the frequency and power in various regions of the silicon. Zen will have more clock regions (it seems various parts of the back-end and front-end can be gated as needed) with features that help improve power efficiency, such as the micro-op cache, the Stack Engine (dedicated low power address manipulation unit) and Move elimination (low-power method for register adjustment - pointers to registers are adjusted rather than going through the high-power scheduler).
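As a rough illustration of the move elimination idea described above, the following Python sketch models a toy rename table in which a register-to-register move only updates a pointer instead of being sent to the scheduler. The class name `RenameTable`, its methods and the register counts are all assumptions made for illustration; this is not AMD's implementation, and it leaves out details such as reference counting of physical registers.

```python
class RenameTable:
    """Toy register rename table (illustrative only, not AMD's design)."""

    def __init__(self, arch_regs, num_phys):
        # Each architectural register starts mapped to its own physical register.
        self.mapping = {reg: i for i, reg in enumerate(arch_regs)}
        # Remaining physical registers are free for newly produced values.
        self.free = list(range(len(arch_regs), num_phys))
        # Count of operations that had to pass through the scheduler.
        self.scheduled_ops = 0

    def write(self, dst):
        """A normal instruction producing a new value in dst."""
        phys = self.free.pop(0)
        self.mapping[dst] = phys
        self.scheduled_ops += 1
        return phys

    def move(self, dst, src):
        """Register-to-register move handled at rename time:
        dst simply points at the physical register src already uses."""
        self.mapping[dst] = self.mapping[src]
        return self.mapping[dst]


table = RenameTable(["rax", "rbx", "rcx"], num_phys=8)
table.write("rax")        # a value-producing op: new physical register, scheduled
table.move("rbx", "rax")  # a move: only the mapping changes, nothing scheduled
print(table.mapping["rbx"] == table.mapping["rax"], table.scheduled_ops)
```

In this toy model the move costs no scheduler slot and no execution unit time, which is the source of the power saving the text refers to.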

The Big Core Diagram

We saw this diagram last year, showing some of the bigger features AMD wants to promote:

The improved branch predictor allows for 2 branches per Branch Target Buffer (BTB) entry, but in the event of tagged instructions will filter through the micro-op cache. On the other side, the decoder can dispatch 4 instructions per cycle, although some of those instructions can be fused into the micro-op queue. Fused instructions still come out of the queue as two micro-ops, but take up less buffer space while they wait.
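To make the buffer-space point concrete, here is a minimal Python sketch of a queue in which a fused instruction occupies one entry but leaves as two micro-ops. The function names and the micro-op labels (`uop_a1` and so on) are hypothetical placeholders, not real Zen micro-ops, and the model ignores per-cycle dispatch limits.

```python
from collections import deque


def enqueue(instructions):
    """Build a toy micro-op queue; each entry holds one or two micro-ops."""
    queue = deque()
    for inst in instructions:
        queue.append(tuple(inst))  # a fused pair still takes a single entry
    return queue


def dispatch_all(queue):
    """Drain the queue, expanding every entry into its individual micro-ops."""
    dispatched = []
    while queue:
        dispatched.extend(queue.popleft())
    return dispatched


# Two fused instructions (two micro-ops each) and one plain instruction.
stream = [("uop_a1", "uop_a2"), ("uop_b",), ("uop_c1", "uop_c2")]
queue = enqueue(stream)
entries_used = len(queue)
micro_ops = dispatch_all(queue)
print(entries_used, len(micro_ops))
```

Here five micro-ops are dispatched while only three queue entries were occupied, which is the saving the paragraph above describes.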

As mentioned earlier, the INT and FP pipes and schedulers are separated. The INT rename space is 168 registers wide, which feeds into 6x14 scheduling queues. The FP side employs a 160-entry register file, and both the FP and INT sections feed into a 192-entry retire queue. The retire queue can operate at 8 instructions per cycle, moving up from 4/cycle in previous AMD microarchitectures.
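A quick back-of-the-envelope calculation shows what the wider retire rate means for the figures quoted above. The sketch below assumes an idealized case where the retire queue is full and retires at its peak rate every cycle, which real workloads will rarely sustain; the function name `cycles_to_drain` is made up for this example.

```python
import math


def cycles_to_drain(entries, retire_per_cycle):
    """Cycles needed to retire a full queue at a sustained peak rate (idealized)."""
    return math.ceil(entries / retire_per_cycle)


RETIRE_QUEUE_ENTRIES = 192  # figure quoted in the text

zen_cycles = cycles_to_drain(RETIRE_QUEUE_ENTRIES, 8)       # Zen: 8 per cycle
previous_cycles = cycles_to_drain(RETIRE_QUEUE_ENTRIES, 4)  # previous: 4 per cycle
print(zen_cycles, previous_cycles)
```

Under these assumptions a full 192-entry queue drains in 24 cycles at 8 per cycle, against 48 cycles at 4 per cycle.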

The load/store units are improved, supporting 72 out-of-order loads, similar to Skylake. We’ll discuss this a bit later. On the FP side there are four pipes (compared to three in previous designs) which support combined 128-bit FMAC instructions. These can be combined for one 256-bit AVX instruction, but anything beyond that has to be scheduled over multiple instructions.


574 Comments


  • Cooe - Sunday, February 28, 2021 - link

    Find me these so-called people buying Intel HEDT CPU's (aka OG Ryzen 7's direct competition) for gaming & never for HPC uses.... Oh wait. They don't exist.
  • Haawser - Thursday, March 2, 2017 - link

    Yeah, but if you're a gamer who streams, Ryzen is waaaay better than anything Inter offer for $499. Especially if you're gaming at 4K, or going to be. Different people have different needs, even gamers.
  • Jimster480 - Thursday, March 2, 2017 - link

    Yes but no,
    Because Broadwell-E and Haswell-E HEDT platforms are in the same boat as Ryzen.

    But this is what this Ryzen 7 release is meant to do.
    Compete with the HEDT platforms, not against the "APU" chips.
    Those chips will come later, albeit with much higher clockspeeds to compete with intel.
    For now you have Intel with 10-20% clockspeed advantages in clockspeed dependent applications.
  • Meteor2 - Saturday, March 4, 2017 - link

    I hope you're right but there's no indication they will be clocked higher. AMD has access to processes which are generation behind Intel's, at least for a couple of years. We can't expect miracles.
  • nos024 - Thursday, March 2, 2017 - link

    Lol, butt hurt? Why even bother running gaming benchmarks? You even said it yourself that ryzen wont make it to your so called grown-up workstation because if low pcie count.

    So tell me who is this $500 Ryzen chip designed for? Not grown ups running workstation, or pathetic kiddies gamers...so theyre for Wannabes?
  • Tunnah - Thursday, March 2, 2017 - link

    He literally said it is ideal to replace his aging 3770k, he gave an example of how it will be used. Try more reading and less being a turd
  • ddriver - Thursday, March 2, 2017 - link

    Ryzen is that much more affordable that with the price difference I could have built another whole system, dedicated to running the 2 HBA adapters, thus saving on the need of 16 lanes. 40 - 16 is exactly 24, which is what ryzen has. If it was available a year ago I would have simply built two systems, offering a good 50-60% more CPU performance, double the GPU performance, with enough need to accommodate my IO needs, even if between two systems, that wouldn't have been much of an issue.

    The pci lane count is lower than intel E series chips, however it is still 50% higher than what you can get from intel outside the E series. It will actually suffice in most workstation scenarios, even if you end up running graphics at x8, which is not really a big deal.
  • ddriver - Thursday, March 2, 2017 - link

    "you even said it yourself that ryzen wont make it to your so called grown-up workstation because if low pcie count"

    I did not say that. Not all workstations require 40 pcie lanes. Most could do with 24. I was talking about my workstation in particular, which has plenty of pcie hardware. For the vast majority of HPC scenarios that would not be necessary, furthermore as already mentioned, with the saved money you can build additional systems dedicated to specific tasks, offloading both the need of more pcie lanes and the cpu time the attached hardware consumes.

    It remains to be seen how much IO will the server zen parts have. Ryzen is not particularly a workstation grade chip, it just happens to be GOOD ENOUGH to do the job. AMD give you 50% more performance and 50% more IO at the same or better price point, and I think they will do the same for the chips they actually design for workstation.

    It looks like the 16 core workstation chip will have 64 pcie lanes, and the 32 core - a whooping 128 lanes. So intel E series looks like a sad little orphan with its modest 40 lanes... And no, xeons aren't much better, they are in fact worse, the 24 core E7-8894 v4 only has a modest 32 lanes.

    So no, while I will not be replacing my main 10 core workstation with a ryzen, because that would win me nothing, I am definitely looking forward to replacing it next year with a Naples system, and I definitely wished ryzen was available last year as I could have spent my money much better than buying intel.
  • Intel999 - Thursday, March 2, 2017 - link

    "So tell me who is this $500 Ryzen chip designed for?"

    Logic would imply it is aimed at anyone that works in an environment where they need superior multithreading performance. For instance, anyone that has bought a 6900k or 6950k, but more importantly it is for those individuals that "wanted" to buy either of Intel's multi core champs but couldn't due to ridiculous prices.

    I'd dare to make a bet there are more people that wanted to buy a 6900k than there are people that actually did. Now they can buy one and still put food on the table this month.
  • FriendlyUser - Thursday, March 2, 2017 - link

    Exactly right. I was always tempted by the 6850K, but the price of the CPU+platform was simply ridiculous. For much less I got a faster CPU and a high-end MB. I won't miss the 40PCIe lanes.
