Zen 4 Execution Pipeline: Familiar Pipes With More Caching

Finally, let’s take an in-depth look at the Zen 4 microarchitecture’s execution flow. As we noted before, AMD is claiming a 13% IPC improvement over Zen 3. So how did they do it?

Across the Zen 4 architecture, there is no single radical change. Zen 4 does make a few notable changes, but the basics of the instruction flow are unchanged, especially in the back-end execution pipelines. Instead, many (if not most) of Zen 4’s IPC gains come from larger caches and buffers.

Starting with the front end, AMD has made a few important improvements. The branch predictor, a common target for improvements given the payoff of correct predictions, has been refined again for Zen 4. It still predicts 2 branches per cycle (the same as Zen 3), but AMD has increased the L1 Branch Target Buffer (BTB) size by 50%, to 2 x 1.5k entries. Similarly, the L2 BTB has grown to 2 x 7k entries (though this is only an ~8% capacity increase). The net result is that the branch predictor can keep track of more branch targets, which improves its accuracy.
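
To make the capacity argument concrete, here is a minimal sketch of a branch target buffer modeled as a fully associative LRU table. It is a toy, not AMD’s design: real BTBs are set-associative and indexed by hashed address bits, and the function name `btb_hit_rate`, the 1,200-branch loop, and the placeholder target addresses are all hypothetical. It only shows the capacity cliff: a hot loop with more taken branches than BTB entries keeps evicting targets before they are reused, while one that fits hits on every pass after the first.

```python
from collections import OrderedDict

def btb_hit_rate(capacity: int, branch_pcs: list[int], passes: int = 4) -> float:
    """Fraction of lookups that hit in a toy fully associative LRU BTB.

    Hypothetical model: maps a branch address to its last-seen target and
    ignores set associativity, indexing, and aliasing.
    """
    btb = OrderedDict()  # branch address -> target, least recently used first
    hits = lookups = 0
    for _ in range(passes):
        for pc in branch_pcs:
            lookups += 1
            if pc in btb:
                hits += 1
                btb.move_to_end(pc)          # mark as most recently used
            else:
                if len(btb) >= capacity:
                    btb.popitem(last=False)  # evict the least recently used entry
                btb[pc] = pc + 0x40          # placeholder target address
    return hits / lookups

branches = list(range(0x1000, 0x1000 + 1200 * 4, 4))  # 1,200 distinct branch addresses
print(btb_hit_rate(1024, branches))  # 0.0
print(btb_hit_rate(1536, branches))  # 0.75
```

With 1,200 hot branches cycling in order, the 1k-entry table implied by a 50% increase to 1.5k misses on every lookup, while the 1.5k-entry table hits on all lookups after the first pass (75% over four passes).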

Meanwhile, the op cache has been improved more significantly. It is not only 68% larger than before (now storing 6.75k ops), but it can now deliver up to 9 macro-ops per cycle, up from 6 on Zen 3. So in scenarios where the branch predictor is doing especially well at its job and the micro-op queue can consume additional ops, it’s possible to get up to 50% more ops out of the op cache. Beyond the performance gain, this also helps power efficiency, since fetching cached ops takes far less power than decoding instructions from scratch.

That said, the output of the micro-op queue itself has not changed. The final stage of the front end can still only dispatch 6 micro-ops per clock, so the faster op cache is mainly useful when the micro-op queue would otherwise run low on ops to dispatch.
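
The interaction between a 9-op/cycle op cache and a 6-op/cycle micro-op queue output can be sketched with a small cycle loop. This is a hypothetical model, not AMD’s pipeline: the 24-entry queue, the two-cycle fetch bubble every eight cycles (standing in for events such as taken-branch redirects), and the function name are assumptions chosen purely for illustration.

```python
def avg_dispatch(fetch_width: int, cycles: int = 1000, queue_cap: int = 24,
                 dispatch_width: int = 6, period: int = 8, bubble: int = 2) -> float:
    """Average ops dispatched per cycle in a toy op cache -> micro-op queue model.

    All parameters are illustrative assumptions, not Zen 4 specifications.
    """
    queue = 0       # ops currently waiting in the micro-op queue
    dispatched = 0  # total ops sent to the back end
    for cycle in range(cycles):
        # Front end: the op cache delivers nothing during the first `bubble`
        # cycles of each `period`; otherwise up to fetch_width ops, limited
        # by free queue space.
        if cycle % period >= bubble:
            queue += min(fetch_width, queue_cap - queue)
        # Back end: the queue can emit at most dispatch_width ops per cycle.
        n = min(dispatch_width, queue)
        queue -= n
        dispatched += n
    return dispatched / cycles

for width in (6, 9):
    print(width, avg_dispatch(width, bubble=0), avg_dispatch(width))
# 6 6.0 4.5
# 9 6.0 5.988
```

With no bubbles, both widths sustain exactly 6 ops per cycle because dispatch is the bottleneck. With bubbles, a 6-wide supply can never get ahead of dispatch and averages 4.5 ops per cycle, while a 9-wide supply banks surplus ops in the queue and covers the gaps, averaging just under 6.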

Switching to the back end of the Zen 4 execution pipeline, things are once again relatively unchanged. There are no pipeline or port changes to speak of; Zen 4 can still (only) schedule up to 10 integer and 6 floating point operations per clock. Likewise, the fundamental floating point op latencies remain unchanged at 3 cycles for FADD and FMUL, and 4 cycles for FMA.
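
Those latencies matter most in loop-carried dependency chains, where each operation must wait for the previous result. As a worked example using the figures above, Horner’s rule (`acc = acc * x + c`) needs either an FMUL followed by a dependent FADD, or a single FMA, per step. The helper below is a hypothetical back-of-the-envelope estimate that ignores throughput, scheduling, and everything else except latency.

```python
# FP latencies in cycles, as given for Zen 4 in the text
LATENCY = {"fadd": 3, "fmul": 3, "fma": 4}

def horner_latency(degree: int, fused: bool) -> int:
    """Latency-bound cycle estimate for evaluating a polynomial with Horner's rule.

    Every step reads the previous accumulator, so steps cannot overlap and
    the total is simply degree * per-step latency.
    """
    step = LATENCY["fma"] if fused else LATENCY["fmul"] + LATENCY["fadd"]
    return degree * step

print(horner_latency(8, fused=False))  # 48 (8 steps x 6 cycles)
print(horner_latency(8, fused=True))   # 32 (8 steps x 4 cycles)
```

Even though an FMA takes a cycle longer than an FADD or FMUL alone, fusing the pair shortens each dependent step from 6 to 4 cycles under these assumptions.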

Instead, AMD’s back-end improvements for Zen 4 have again focused on larger caches and buffers. Of note, the retire queue/reorder buffer is 25% larger and is now 320 instructions deep, giving the CPU a wider window of instructions to look through to extract performance via out-of-order execution. Similarly, the integer and FP register files have each grown by about 20%, to 224 and 192 registers respectively, to accommodate the larger number of instructions now in flight.
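
One rough way to see what a deeper reorder buffer buys: if the oldest instruction stalls (say, a load that misses in cache) and cannot retire, the core can keep dispatching younger instructions only until the ROB fills. The sketch below assumes the 6-per-clock dispatch rate is sustained and ignores other limits such as register file and scheduler capacity; the 256-entry previous size is simply derived from the stated 25% increase to 320, and the function names are hypothetical.

```python
def implied_previous(new_size: float, increase: float) -> float:
    """Back out the prior size from a new size and a fractional increase."""
    return new_size / (1 + increase)

def rob_fill_cycles(rob_entries: int, dispatch_width: int = 6) -> float:
    """Cycles of full-rate dispatch before a stalled oldest op fills the ROB."""
    return rob_entries / dispatch_width

zen3_rob = implied_previous(320, 0.25)
print(zen3_rob)                                   # 256.0
print(round(rob_fill_cycles(int(zen3_rob)), 1))   # 42.7
print(round(rob_fill_cycles(320), 1))             # 53.3
```

Under these assumptions, the window grows from roughly 43 to 53 cycles of work that can overlap a long-latency stall, and the larger register files are what give those extra in-flight instructions somewhere to hold their results.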

The only other notable change here is AVX-512 support, which we touched upon earlier. AVX-512 instructions execute on AMD’s floating point ports, so those have been beefed up to support the new instructions.

Moving on, the load/store units within each CPU core have also been given a buffer enlargement. The load queue is 22% deeper, now holding 88 loads. AMD also says it has made some unspecified changes to reduce port conflicts with the L1 data cache. Otherwise, load/store throughput remains unchanged at 3 loads and 2 stores per cycle.
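
The unchanged 3-load/2-store limit puts a floor under how fast a memory-heavy loop can run. A simple bound, assuming scalar accesses, a loop limited by nothing else, and that any load or store can use any of the respective ports, is the larger of loads/3 and stores/2 cycles per iteration. The function below is a hypothetical illustration of that arithmetic.

```python
from fractions import Fraction

def ls_bound(loads: int, stores: int, load_ports: int = 3, store_ports: int = 2) -> Fraction:
    """Minimum cycles per iteration imposed by load/store throughput alone."""
    return max(Fraction(loads, load_ports), Fraction(stores, store_ports))

# a[i] = b[i] + c[i]: two loads, one store -> load-bound
print(ls_bound(2, 1))  # 2/3
# a[i] = b[i]: one load, one store -> store-bound
print(ls_bound(1, 1))  # 1/2
```

Using `Fraction` keeps the bound exact, which makes it easy to see which side of the load/store unit a given loop body saturates first.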

Finally, let’s talk about AMD’s L2 cache. As previously disclosed by the company, the Zen 4 architecture doubles the size of the L2 cache on each CPU core, taking it from 512KB to a full 1MB. As with AMD’s lower-level buffer improvements, the larger L2 cache is designed to further improve performance/IPC by keeping more relevant data closer to the CPU cores, as opposed to ending up in the L3 cache, or worse, main memory. Beyond that, the L3 cache remains unchanged at 32MB for an 8-core CCX, functioning as a victim cache for each CPU core’s L2 cache.
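
The benefit of the larger L2, and the role of the L3 as a victim cache, can be illustrated with a toy two-level model. It is a deliberate simplification, with assumptions throughout: both levels are fully associative LRU, the L3 is treated as strictly exclusive (lines enter it only when evicted from L2 and move back up on a hit), only one core is modeled, and the 64-byte line size and 768KB working set are hypothetical choices.

```python
from collections import OrderedDict

LINE = 64  # bytes per cache line (assumed for this sketch)

class TwoLevel:
    """Toy private L2 backed by an exclusive victim L3 (both fully associative LRU)."""

    def __init__(self, l2_bytes: int, l3_bytes: int) -> None:
        self.l2_cap = l2_bytes // LINE
        self.l3_cap = l3_bytes // LINE
        self.l2 = OrderedDict()  # line number -> None, least recently used first
        self.l3 = OrderedDict()
        self.served_by = {"l2": 0, "l3": 0, "mem": 0}

    def _fill_l2(self, line: int) -> None:
        if len(self.l2) >= self.l2_cap:
            victim, _ = self.l2.popitem(last=False)  # LRU line leaves L2...
            self.l3[victim] = None                   # ...and becomes an L3 victim
            if len(self.l3) > self.l3_cap:
                self.l3.popitem(last=False)          # L3 drops its own LRU line
        self.l2[line] = None

    def access(self, addr: int) -> None:
        line = addr // LINE
        if line in self.l2:
            self.served_by["l2"] += 1
            self.l2.move_to_end(line)
        elif line in self.l3:
            self.served_by["l3"] += 1
            del self.l3[line]    # exclusive: the line moves back up into L2
            self._fill_l2(line)
        else:
            self.served_by["mem"] += 1
            self._fill_l2(line)  # memory fills go to L2; L3 only receives victims

def sweep(l2_bytes: int, working_set: int, passes: int = 4) -> dict:
    """Stream through a working set several times and report where accesses hit."""
    cache = TwoLevel(l2_bytes, 32 * 1024 * 1024)  # 32MB L3, as for an 8-core CCX
    for _ in range(passes):
        for addr in range(0, working_set, LINE):
            cache.access(addr)
    return cache.served_by

print(sweep(512 * 1024, 768 * 1024))   # {'l2': 0, 'l3': 36864, 'mem': 12288}
print(sweep(1024 * 1024, 768 * 1024))  # {'l2': 36864, 'l3': 0, 'mem': 12288}
```

With a 512KB L2, the hypothetical 768KB loop misses the L2 on every pass but finds its evicted lines waiting in the victim L3; with a 1MB L2, every access after the first pass is served from the L2, which is the kind of shift the doubled L2 is meant to produce.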

All told, we aren’t seeing many major changes in the Zen 4 execution pipeline, and that’s okay. Increasing cache and buffer sizes is another tried-and-true way to improve an architecture’s performance by keeping an existing design fed and working more often, and that’s what AMD has opted to do for Zen 4. Especially coming in conjunction with the jump from TSMC 7nm to 5nm and the resulting increase in transistor budget, this is a good way to put those additional transistors to use while AMD works on a more significant overhaul of the Zen architecture for Zen 5.

205 Comments

  • Tom Sunday - Friday, September 30, 2022

    Just today received a special sales notice from Micro Center giving away FREE 32GB DDR5 with any purchase of a Ryzen 7000 series CPU. I wonder if AMD is sponsoring such a sales push, and this early in the game? Giving away a $190 value is a big deal in the trying times of today!
  • Castillan - Sunday, October 2, 2022

    I suspect that's a Microcenter specific deal only. The RAM is 5600 at a fairly high latency (I think it was CAS40?). DDR5 prices have plummeted as well. The memory I picked up from Microcenter was 6600/CAS34 and marked down to 279 from 499.

    I'd guess that they have a surplus of a certain stock item that wasn't selling, and decided to use this promo to offload unwanted stock and still look good.
  • imaskar - Friday, September 30, 2022

    It would be really great to add code compilation tests: Java, Go, C++ (linux kernel), Rust.
  • dizzynosed - Saturday, October 1, 2022

    So what shall I buy? Intel, AMD, ??? Which CPU?? I only game.
  • rocky12345 - Saturday, October 1, 2022

    What's wrong with the gaming scores on the 7000 series? There is no way a 5000 series should be able to match or beat a 7000 AMD CPU. I know this because I have an AMD Ryzen 5900X properly set up and tweaked. AMD is said to have sent DDR5-6000 with the test CPUs and asked the reviewers to test with that. Let's face it, 97% of the people buying a new AMD Zen 4 or Intel 12th gen setup are not going to be using bargain-basement low-speed RAM, and if they do happen to buy cheaper RAM, most are likely to try to run it at the highest speed possible. Did I read that right, you used CL44 DDR5-5200? Talk about dead heading performance.

    Also, maybe I missed it, but what was the Intel test system setup? Other than that, it was a decent review. I have never seen Ryzen 5000 that close in gaming; I guess using slow DDR5 knee jerks Ryzen 7000. My own RAM is running at CL16 4000MHz with a 2000 IF, and going by the numbers reported in the review, if I had the same video card I would be either faster or only slightly slower than the test results here for games, and that would give me false hope that my Zen 3 was faster than it really is lol.
  • Oxford Guy - Sunday, October 2, 2022

    The only way you're going to see movement on this is if you lobby AMD to abandon JEDEC.

    This site sees JEDEC as all there is.
  • GeoffreyA - Monday, October 3, 2022

    I think it's about keeping a common baseline of memory speed, especially since Anandtech's database is about having parts directly comparable.
  • Oxford Guy - Monday, October 10, 2022

    That’s not the reason that has been given again and again and it’s a terrible one anyway. The parts are different. The memory that goes best with those parts differs.
  • GeoffreyA - Tuesday, October 11, 2022

    They should have set all the systems to DDR4 3200 and called it a day.
  • byte99 - Sunday, October 2, 2022

    I'm a bit confused. When Anandtech was doing their efficiency analysis, it seemed they were taking the 65W Eco mode label as the actual package power, instead of actually measuring it (as they usually do). When Ars Technica measured the package power of the 7950X and 7600X in 65W Eco Mode, they found it was 90W for both.

    [ https://arstechnica.com/gadgets/2022/09/ryzen-7600... ]

    Did Anandtech miss something obvious, or am I missing something?
