Zen 4 Execution Pipeline: Familiar Pipes With More Caching

Finally, let’s take an in-depth look at the Zen 4 microarchitecture’s execution flow. As we noted before, AMD is seeing a 13% IPC improvement over Zen 3. So how did they do it?

Throughout the Zen 4 architecture, there is no single radical change. Zen 4 does make a few notable changes, but the basics of the instruction flow are unchanged, especially on the back-end execution pipelines. Rather, many (if not most) of the IPC improvements in Zen 4 come from enlarging caches and buffers in some respect.

Starting with the front end, AMD has made a few important improvements here. The branch predictor, a common target for improvements given the payoffs of correct predictions, has been further iterated upon for Zen 4. While it still predicts 2 branches per cycle (the same as Zen 3), AMD has increased the size of the L1 Branch Target Buffer (BTB) by 50%, to 2 x 1.5k entries. Similarly, the L2 BTB has been increased to 2 x 7k entries (though this is just an ~8% capacity increase). The net result is that the branch predictor’s accuracy improves, since it can look over a longer history of branch targets.

Meanwhile the front end’s op cache has been more significantly improved. The op cache is not only 68% larger than before (now storing 6.75k ops), but it can now deliver up to 9 macro-ops per cycle, up from 6 on Zen 3. So in scenarios where the branch predictor is doing especially well at its job and the micro-op queue can consume additional instructions, it’s possible to get up to 50% more ops out of the op cache. Besides the performance improvement, this also benefits power efficiency, since tapping cached ops requires a lot less power than decoding new ones.

With that said, the output of the micro-op queue itself has not changed. The final stage of the front-end can still only spit out 6 micro-ops per clock, so the improved op cache transfer rate is only particularly useful in scenarios where the micro-op queue would otherwise be running low on ops to dispatch.
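To make that caveat concrete, here is a minimal Python sketch of a toy micro-op queue. The 6-wide dispatch limit and the 6 versus 9 op fill rates come from the figures above; the periodic stall, the queue capacity, and the function name `dispatched_ops` are hypothetical choices for illustration, not AMD figures or a model of the real hardware.

```python
def dispatched_ops(fill_rate, cycles=1000, dispatch_width=6,
                   stall_period=0, queue_capacity=64):
    """Total ops dispatched by a toy micro-op queue.

    fill_rate      -- ops the op cache can deliver per cycle (6 or 9 in the text)
    dispatch_width -- ops the queue can send onward per cycle (6 in the text)
    stall_period   -- hypothetical: every Nth cycle the op cache delivers nothing
    queue_capacity -- hypothetical queue size, not an AMD figure
    """
    queued = 0
    total = 0
    for cycle in range(cycles):
        stalled = stall_period and cycle % stall_period == 0
        if not stalled:
            # The op cache tops up the queue, limited by the queue's capacity.
            queued = min(queue_capacity, queued + fill_rate)
        # The queue can never send more than dispatch_width ops onward.
        sent = min(queued, dispatch_width)
        queued -= sent
        total += sent
    return total


# With a steady supply, the wider op cache changes nothing: dispatch is the limit.
steady_old = dispatched_ops(fill_rate=6)
steady_new = dispatched_ops(fill_rate=9)

# With periodic supply gaps, the wider op cache banks extra ops to cover them.
gappy_old = dispatched_ops(fill_rate=6, stall_period=3)
gappy_new = dispatched_ops(fill_rate=9, stall_period=3)
```

In this toy model the two fill rates dispatch the same number of ops when supply is steady, and the 9-wide fill only pulls ahead when the supply has gaps, which is the scenario described above.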

Switching to the back-end of the Zen 4 execution pipeline, things are once again relatively unchanged. There are no pipeline or port changes to speak of; Zen 4 still can (only) schedule up to 10 Integer and 6 Floating Point operations per clock. Similarly, the fundamental floating point op latencies remain unchanged at 3 cycles for FADD and FMUL, and 4 cycles for FMA.
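As a quick worked example of what those latencies mean for dependent code, the Python sketch below compares computing a*b+c as a separate multiply and add against a single fused multiply-add. It uses only the three latencies quoted above and assumes a strictly serial dependency chain; it ignores scheduling, bypass networks, and throughput, so treat it as arithmetic rather than a performance model.

```python
# Latencies in cycles, taken from the figures quoted in the text.
LATENCY = {"FADD": 3, "FMUL": 3, "FMA": 4}


def chain_latency(ops):
    """Latency of a chain in which each op waits on the previous result."""
    return sum(LATENCY[op] for op in ops)


separate = chain_latency(["FMUL", "FADD"])  # a*b, then + c
fused = chain_latency(["FMA"])              # a*b + c in one op

print(f"separate: {separate} cycles, fused: {fused} cycles")
```

Under those assumptions the fused form saves 2 cycles per link in a dependent chain.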

Instead, AMD’s improvements to the back-end of Zen 4 have, here too, focused on larger caches and buffers. Of note, the retire queue/reorder buffer is 25% larger, and is now 320 instructions deep, giving the CPU a wider window of instructions to look through to extract performance via out-of-order execution. Similarly, the Integer and FP register files have been increased in size by about 20% each, to 224 registers and 192 registers respectively, in order to accommodate the larger number of instructions that are now in flight.
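For readers who want the implied starting points, the short Python sketch below back-computes the previous sizes from the new sizes and percentage increases stated above. The helper name `previous_size` is made up for this example, and the results are only as precise as the rounded percentages in the text.

```python
def previous_size(new_size, percent_increase):
    """Size before an increase of percent_increase percent."""
    return new_size / (1 + percent_increase / 100)


# (new size, stated percentage increase), both taken from the text.
stated = {
    "reorder buffer": (320, 25),
    "integer register file": (224, 20),
    "FP register file": (192, 20),
}

for name, (new_size, pct) in stated.items():
    print(f"{name}: {new_size} now, roughly {previous_size(new_size, pct):.1f} before")
```

The integer register file does not land on a whole number, which is consistent with the "about 20%" figure being a rounded one.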

The only other notable change here is AVX-512 support, which we touched upon earlier. AVX execution takes place in AMD’s floating point ports, and as such, those have been beefed up to support the new instructions.

Moving on, the load/store units within each CPU core have also been given a buffer enlargement. The load queue is 22% deeper, now storing 88 loads. And according to AMD, they’ve made some unspecified changes to reduce port conflicts with their L1 data cache. Otherwise the load/store throughput remains unchanged at 3 loads and 2 stores per cycle.

Finally, let’s talk about AMD’s L2 cache. As previously disclosed by the company, the Zen 4 architecture is doubling the size of the L2 cache on each CPU core, taking it from 512KB to a full 1MB. As with AMD’s lower-level buffer improvements, the larger L2 cache is designed to further improve performance/IPC by keeping more relevant data closer to the CPU cores, as opposed to ending up in the L3 cache, or worse, main memory. Beyond that, the L3 cache remains unchanged at 32MB for an 8 core CCX, functioning as a victim cache for each CPU core’s L2 cache.
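To illustrate why a larger L2 keeps more traffic out of a victim L3, here is a toy Python sketch of the arrangement. It assumes a strictly exclusive victim policy with LRU replacement, neither of which is specified above, and the class name, helper name, and tiny line counts are all hypothetical and bear no relation to the real 1MB and 32MB capacities.

```python
from collections import OrderedDict


class VictimHierarchy:
    """Toy L2 plus victim L3; capacities are in cache lines and illustrative."""

    def __init__(self, l2_lines, l3_lines):
        self.l2 = OrderedDict()
        self.l3 = OrderedDict()
        self.l2_lines = l2_lines
        self.l3_lines = l3_lines

    def access(self, line):
        """Return where the line was found: 'L2', 'L3', or 'memory'."""
        if line in self.l2:
            self.l2.move_to_end(line)  # refresh LRU position
            return "L2"
        source = "L3" if line in self.l3 else "memory"
        self.l3.pop(line, None)  # assumed: a line promoted to L2 leaves L3
        self._fill_l2(line)
        return source

    def _fill_l2(self, line):
        self.l2[line] = True
        if len(self.l2) > self.l2_lines:
            victim, _ = self.l2.popitem(last=False)  # evict least recently used
            self.l3[victim] = True  # L3 is filled only by L2 victims
            if len(self.l3) > self.l3_lines:
                self.l3.popitem(last=False)


def served_from(l2_lines, working_set=6, passes=3, l3_lines=16):
    """Loop over a small working set and count where each access is served."""
    hierarchy = VictimHierarchy(l2_lines, l3_lines)
    counts = {"L2": 0, "L3": 0, "memory": 0}
    for _ in range(passes):
        for line in range(working_set):
            counts[hierarchy.access(line)] += 1
    return counts


small_l2 = served_from(l2_lines=4)  # working set does not fit in L2
large_l2 = served_from(l2_lines=8)  # doubled L2 holds the whole working set
```

In this sketch, a working set slightly larger than the small L2 is served from L3 on every repeat pass, while the doubled L2 serves all repeat accesses itself.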

All told, we aren’t seeing very many major changes in the Zen 4 execution pipeline, and that’s okay. Increasing cache and buffer sizes is another tried and true way to improve the performance of an architecture by keeping an existing design filled and working more often, and that’s what AMD has opted to do for Zen 4. Especially coming in conjunction with the jump from TSMC 7nm to 5nm and the resulting increase in transistor budget, this is a good way to put those additional transistors to use while AMD works on a more significant overhaul to the Zen architecture for Zen 5.

205 Comments

  • phoenix_rizzen - Monday, September 26, 2022 - link

    The Spec graphs are hard to read as you don't have the CPUs listed in the correct order. You should switch dark blue to be 5950X and light blue to be 3950X. Right now you have the CPUs (graphs) listed as:

    Intel
    7950X
    3950X
    5950X

    It really should be:
    Intel
    7950X
    5950X
    3950X

    That would make it a lot easier to see the generational improvements. Sort things logically, numerically. :)
    Reply
  • Otritus - Monday, September 26, 2022 - link

    @Ryan Smith please do this. I was also having difficulty reading the Spec graph. Reply
  • Gavin Bonshor - Monday, September 26, 2022 - link

    I apologize for doing it this way. I promise I'll sort it in the morning (UK based) Reply
  • yeeeeman - Monday, September 26, 2022 - link

    Retaking the high end for 1 month. Reply
  • yeeeeman - Monday, September 26, 2022 - link

    TBH, what I am most excited about is the zen 4 laptop parts, like the phoenix apu, with 8 zen 4 cores, rdna 3 igpu, lpddr5, 4nm cpu, 5nm gpu, that should bring some clear improvements over the 4000 series ryzen which are still amazingly good. 5000 and 6000 series haven't brought much improvements over the 4000 series, like my 4800H, so I am curious to see what the 7000 series will bring. Already dreaming about a fully metal body, slim laptop, 14-16 inch, OLED, 90Hz minimum, laminated screen, preferably touch and 360 hinge, 1.5kg top. that will be nice. Reply
  • abufrejoval - Wednesday, September 28, 2022 - link

    Since you're hinting that Intel will change things, there is much less of a chance for Intel to catch up in the mobile sector on 10nm.

    For the laptops I see a different story at almost every five Watts of permissible power for the CPU side of things. But much less change between the 4000-7000 Zen generations at the same energy settings.

    Any hopes for a more-than-casual gaming iGPU can't but fail, because AMD can't overcome the DRAM bandwidth limitations, unless they were to start with stuff like extra channels of RAM on the die carrier like Apple (or HBM).

    And that basically leaves 13% of IPC improvements, some efficiency gains but much less clock gains, because that's mostly additional Wattage on the desktop parts, not available on battery.

    I haven't tried the 6800U yet, but even if it were to be 100% better than my 5800U, that's still too slow a GPU to drive my Lenovo Yoga Slim 7 13ACN notebook's 2560x1600 display full throttle. Even 4x speed won't change that; it just takes a 250 Watt GPU to drive that resolution, and more like 350 Watts for 4k.

    I just bought a nice 3k 90Hz OLED 5825U based 14" notebook (Asus Zenbook 14) for one of my sons, full metal slim but without touch for less than €1000 including taxes and he's completely stunned by the combination of display brightness (he tends to use it outside) and battery life.

    As long as you think of it as a 2D machine that will do fine displaying Google Earth in 3D, you'll be happy. If you try to turn it into a gaming laptop it's outright grief or severe compromises.

    And I just don't see how a dGPU on an APU makes much sense, because you just purchase capabilities twice without the ability to combine them in something that actually works. Those hybrid approaches were only ever good in theory.
    Reply
  • Makaveli - Monday, September 26, 2022 - link

    "I have a 1440p 144Hz monitor and I play at 1080p just because that's what I'm used to."

    *Insert ryan reynolds meme

    But why?
    Reply
  • Gavin Bonshor - Monday, September 26, 2022 - link

    Because I fear that if I drop below 144 Hz in any title, that my life wouldn't be able to cope. Maybe I just need to upgrade from an RX 5700 XT? Reply
  • Makaveli - Tuesday, September 27, 2022 - link

    Ah yes, it's time.

    Go RDNA3
    Reply
  • kryn5 - Monday, September 26, 2022 - link

    "Despite modern-day graphics cards, especially the flagships, now at the level where 1440p and 4K gaming is viable, 1080p is still a very popular resolution to play games at; I have a 1440p 144Hz monitor and I play at 1080p just because that's what I'm used to."

    I... what?
    Reply
