Section by Andrei Frumusanu

The New Zen 3 Core: High-Level

As we dive into the Zen3 microarchitecture, AMD made a note of its journey over the last couple of years, a success story that started in 2017 with the revolutionary Zen architecture, which helped bring AMD back into the competitive landscape after several sombre years of ailing products.

The original Zen architecture brought a massive 52% IPC uplift thanks to a new clean-sheet microarchitecture that brought a lot of new features to the table for AMD, introducing a µOP cache and SMT into the company’s designs for the first time, as well as the notion of CPU core-complexes with large (8MB at the time) L3 caches. Built on a 14nm FinFET process node, it was both a culmination and the starting point of a new roadmap of microarchitectures that leads into today’s Zen3 design.

Following a minor refresh in the form of Zen+, last year’s Zen2 microarchitecture was deployed in the Ryzen 3000 products, which furthered AMD’s success in the competitive landscape. Zen2 was what AMD calls a derivative of the original Zen designs; however, it contained more changes than you’d expect from such a design, and brought larger IPC increases than you’d typically see. AMD saw Zen2 as a follow-up to what they had learned with the original Zen microarchitecture, fixing issues and rolling out design goals that they had initially intended for the first design but weren’t able to deploy in time for the planned product launch window. AMD also stated that it provided an opportunity to move some of the future Zen3-specific changes forward into the Zen2 design.

This was also the point at which AMD moved to the new chiplet design, leveraging the transition to TSMC’s new 7nm process node to increase the transistor budget for things like doubling the L3 cache size, increasing clock speeds, and vastly reducing the power consumption of the product to enable an aggressive ramp in total core counts, both in the consumer space (16-core Ryzen 9 3950X) and in the enterprise space (64-core EPYC2 Rome).

Tying a cutting-edge, high-performance 7nm core-complex-die (CCD) to a lower-cost 12/14nm I/O die (IOD) in such a heterogeneous package allowed AMD to maximise the advantages and minimise the disadvantages of both technologies, all whilst AMD’s main competitor, Intel, was, and still is, struggling to bring 10nm products to market. It was a technological gamble that AMD has said many times was made years in advance, and it has since paid off handsomely.

Zen 3 At A Glance

This brings us to today’s Zen3 microarchitecture and the new Ryzen 5000 series. As noted earlier, Mark Papermaster had mentioned that if you were to look at the new design from a 100,000-foot level, you’d notice that it looks extremely similar to previous-generation Zen microarchitectures. In truth, while Zen3 does share similarities with its predecessors, AMD’s architects started off with a clean-sheet design, or as they call it, “a ground-up redesign”. This is quite a large claim, as it is an enormous endeavour for any company to undertake. Arm’s Cortex-A76 is the most recent other industry design that is said to have been designed from scratch, leveraging years of learning from the different design teams and solving inherent issues that require more invasive and larger changes to the design.

Because the new Zen3 core still exhibits quite a few defining characteristics of the previous generation designs, I think that AMD’s take on a “complete redesign” is more akin to a deconstruction and reconstruction of the core’s building blocks, much like you’d dismantle a LEGO set and rebuild it anew. In this case, Zen3 seems to be a set built both from new building blocks and from pieces and RTL that AMD has used before in Zen2.

Whatever the interpretation of a “clean-sheet” or “complete redesign” might be, the important takeaway is that Zen3 is a major overhaul of the complete microarchitecture, with AMD paying attention to every piece of the puzzle and trying to bring balance to the resulting end-design. This stands in contrast to a more traditional “derivative design”, which might only see changes in a couple of the microarchitecture’s building blocks.

AMD’s design goals for Zen3 centred on three main points:

- Delivering another significant generational single-threaded performance increase. AMD did not want to be relegated to top performance only in scenarios where workloads would be spread across all the cores. The company wanted to catch up and be an undisputed leader in this area to be able to claim an uncontested position in the market.

- Latency improvements, both in terms of memory latency and core-to-core latency. Effective memory latency is reduced through more cache hits, thanks to the doubled 32MB L3 that an individual core can take advantage of, while core-to-core latency benefits from the consolidated single L3 cache on the die, which avoids long travel times across the dies.

- Continuing power efficiency leadership: Although the new Zen3 cores still use the same base N7 process node from TSMC (albeit with incremental design improvements), AMD had a constraint of not increasing power consumption for the platform. This means that any new performance increases would have to come through simultaneous power efficiency improvements of the microarchitecture.
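
The cache-hit argument in the latency goal above can be illustrated with the textbook average-memory-access-time relation (hit time plus miss rate times miss penalty). The following is a minimal sketch only: the function name is hypothetical, and the latencies and miss rates are made-up values chosen to show the trend, not measurements of any Zen part.

```python
def effective_latency_ns(l3_hit_ns: float, dram_penalty_ns: float, l3_miss_rate: float) -> float:
    """Average latency of an access that reaches the L3.

    Textbook form: hit time + miss rate * miss penalty.
    """
    if not 0.0 <= l3_miss_rate <= 1.0:
        raise ValueError("l3_miss_rate must be between 0 and 1")
    return l3_hit_ns + l3_miss_rate * dram_penalty_ns


# Hypothetical numbers, for illustration only (not AMD or review data).
L3_HIT_NS = 10.0        # assumed L3 hit latency
DRAM_PENALTY_NS = 80.0  # assumed extra cost of going out to DRAM

# A larger L3 visible to one core is modelled here simply as a lower miss rate.
smaller_l3 = effective_latency_ns(L3_HIT_NS, DRAM_PENALTY_NS, l3_miss_rate=0.30)
larger_l3 = effective_latency_ns(L3_HIT_NS, DRAM_PENALTY_NS, l3_miss_rate=0.20)

print(f"smaller L3: {smaller_l3:.1f} ns, larger L3: {larger_l3:.1f} ns")
```

Even with identical hit latency and DRAM latency, the lower miss rate alone pulls the average down, which is the sense in which a bigger L3 reduces "effective" memory latency.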

The culmination of all the design changes AMD has made with the Zen3 microarchitecture results in what the company claims is a 19% average performance uplift over a variety of workloads. We’ll be breaking down this number further into the review, but our internal figures show we are matching the 19% average uplift across all SPEC workloads, with a median figure of 21%. That is indeed a tremendous achievement, considering the fact that the new Ryzen 5000 chips also clock slightly higher than their predecessors, further amplifying the total performance increase of the new design.
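
To see why a higher clock "amplifies" the figure, it can help to model single-threaded performance as IPC multiplied by frequency, so the two gains compound rather than add. The sketch below assumes the 19% figure is a per-clock uplift; the 4% clock increase is a hypothetical value picked for illustration, not a specification of any Ryzen 5000 part, and the function name is likewise made up.

```python
def combined_uplift(ipc_uplift: float, clock_uplift: float) -> float:
    """Total uplift when per-clock and frequency gains compound.

    Performance is modelled as IPC * frequency, so the gains multiply.
    """
    return (1.0 + ipc_uplift) * (1.0 + clock_uplift) - 1.0


IPC_UPLIFT = 0.19    # the 19% average figure quoted in the text
CLOCK_UPLIFT = 0.04  # hypothetical clock increase, for illustration only

total = combined_uplift(IPC_UPLIFT, CLOCK_UPLIFT)
print(f"combined uplift: {total:.1%}")
```

Under these assumptions the combined gain comes out at roughly 24%, slightly more than the 23% a simple sum would suggest.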

339 Comments

  • Luminar - Thursday, November 5, 2020 - link

    Cache Rules Everything Around Me
  • SIDtech - Thursday, November 5, 2020 - link

    Hi Andrei,

    Excellent work. Do you know how this performance shapes up against the Cortex A77 ?
  • t.s - Friday, November 6, 2020 - link

    Seconded. Want to know how the likes of ryzen 4 4350G or 5600 versus Cortex A77 or A78.
  • Kangal - Saturday, November 7, 2020 - link

    It's hard to say, because it is very situational and really depends on the instructions/software. It also depends on the type of device it is powering: you can move up from Phones, to Thin Tablets, to Thick Laptops, to Large Desktops, and up to a Server. Each device offers different thermal constraints.

    The lower-thermal devices will favour the ARM chip, the mid-level will favour AMD, and the higher-thermal devices will favour Intel. That WAS the rule of thumb. In general, you could say Intel's SkyLake has the single-threaded performance crown, then AMD's Zen+ loses to it by a notable margin but beats it in multi-threaded tasks, and then going to an ARM Cortex A76 will have the lowest single-thread but the highest multi-threaded performance.

    Now?
    Well, there's the newly launched 2021 AMD Zen3 processor. And the upcoming 2021 ARM Cortex-X Overclocked Big-core using the new A78 microarchitecture. Lastly there's the 2022 Intel Rocket Lake yet to debut. So it's too early to tell, we can only make inferences.
  • Kangal - Saturday, November 7, 2020 - link

    Here is my personal (yet amateur) take on the future 2020-2022 standpoints between the three racers. Firstly I'll explain what the different keywords and attributes mean
    (from most technical to most real-world implication)

    Total efficiency: (think Full Server / Tractor) how much total calculations versus total power draw
    Multi-threaded: (think Large Desktop / Truck) how much total calculations
    Single-threaded: (think Thick Laptop / Car) how much priority calculations
    IPC performance: (think Thin Tablet / Motorbike) how much priority calculations at desirable frequency/voltage/power-draw

    *Emulating:
    Having a "simple" ARM chip running "complex" x86 instructions. Such as running 32bit or 64bit OS X or Windows programs, via new techniques of emulation using a partial-hardware and hybrid-software solutions. I think the hit to efficiency will be around x3, instead of the expected x12 degradation.

    So here are the lists (from most technical to most real-world implication)
    Simple Code > Mixed code > Recommended Solution

    Here's how they stack up when running identical new code (ie Modern Apps):
    Total efficiency: ARM >>>> AMD >> Intel
    Multi-threaded: ARM > AMD > Intel
    Single-threaded: Intel = AMD > ARM
    IPC performance: ARM >>> AMD > Intel

    Now what about them running legacy code (ie x86 Program):
    Efficiency + *emulating: AMD > Intel >> ARM
    Multi + *emulating: AMD > Intel >> ARM
    1n + *emulating: Intel = AMD >>> ARM
    IPC + *emulating: AMD > Intel > ARM

    My recommendation?
    Full Server: 60% legacy 40% new code. This makes ARM the best option by a small margin.
    Large Desktop: 80% legacy 20% new code. AMD is the best option with modest margin.
    Thick Laptop: 70% legacy 30% new code. Intel is the best. AMD is very close (tied?) second.
    Thin Tablet: 10% legacy 90% new code. ARM is the best option by huge margin.
  • Tomatotech - Monday, November 9, 2020 - link

    Excellent post, but worth pointing out that *all* modern chips now emulate x86 and x64 code. They run a front end that takes x86 / x64 machine code and converts it into RISC code, which then goes through various microcode and translation layers before being processed by the back end. That black box structure has allowed swapping out and optimising the back end for decades while maintaining code compatibility on the front end.

    So it’s not as simple to differentiate between the various chips as you make it out to be.
  • Gondalf - Sunday, November 8, 2020 - link

    I don't know. Looking Spec results, we can say Anandtech is absolutely unable to set a Spec session correctly. From the review Zen 2 is slower per Ghz than old Skylake in integer, that is absolutely wrong in consumer cores (in server cores yes), even worse Ice Lake core is around fast as old Skylake per GHz.
    Basically this review is rushed and very likely they have set all AMD compiler flags on "fast" to do more contacts and a lot of hipe.
    My God, for Anandtech Zen 3 is 35% faster in the global Spec values than Zen 2. Not even AMD worst marketing slide say this. We have Zen 4 here not Zen 3. Wait wait please.
    A really crap review, the author need to go back to school about Spec.

    Obviously the article does not say that 28W Tiger Lake is unable to run at 4.8GHz for more than a couple of seconds; after this it throttles down, so the same Willow Cove core in a desktop CPU could destroy Zen 3 without mercy in a CB session. Not to mention the far slower memory subsystem of a mobile CPU.

    Basically looking at games results, Rocket Lake will eclipse this core forever. AMD have nothing of new in its hands, they need to wait Zen 4
  • Qasar - Sunday, November 8, 2020 - link

    yea ok gondalf, trying to find ways that your beloved intel doesnt lose at everything now ??
    accept it, amd is faster then intel across the board.
  • Spunjji - Monday, November 9, 2020 - link

    That's a strange claim about Tiger Lake performance, Gondalf, because I seem to recall Intel seeding all the reviewers with a laptop that could run TGL at 4.8Ghz boost 'til the cows come home - and that's what Anandtech used to get that number. It's literally the best they can do right now. You're right of course - in actual shipping ultrabooks, TGL is a hot PoS that cannot maintain its boost clocks. Maybe by 2022 they'll finally put Willow Cove into a shipping desktop CPU.

    "Basically looking at games results, Rocket Lake will eclipse this core forever"
    If by "eclipse" you mean gain a maximum 5% advantage at higher clock speeds and nearly double the power draw then sure, "eclipse", yeah. 🤭

    I love your posts here. Please, never stop stepping on rakes like Sideshow Bob.
  • macroboy - Saturday, December 12, 2020 - link

    LOL look at AMD's Efficiency and sustained core clocks, Intel runs too hot to stay at 5ghz for very long. meanwhile Zen3 plows along at 55C no problem, *you're the one who needs to check your facts.
