Zen 4 Execution Pipeline: Familiar Pipes With More Caching

Finally, let’s take an in-depth look at the Zen 4 microarchitecture’s execution flow. As we noted before, AMD is seeing a 13% IPC improvement over Zen 3. So how did they do it?

There is no single radical change anywhere in the Zen 4 architecture. Zen 4 does make a few notable changes, but the basics of the instruction flow are unchanged, especially in the back-end execution pipelines. Rather, many (if not most) of the IPC improvements in Zen 4 come from enlarging caches and buffers in some respect.

Starting with the front end, AMD has made a few important improvements here. The branch predictor, a common target for improvements given the payoffs of correct predictions, has been further iterated upon for Zen 4. While still predicting 2 branches per cycle (the same as Zen 3), AMD has increased the L1 Branch Target Buffer (BTB) cache size by 50%, to 2 x 1.5k entries. And similarly, the L2 BTB has been increased to 2 x 7k entries (though this is just an ~8% capacity increase). The net result is that the branch predictor’s accuracy is improved by being able to look over a longer history of branch targets.
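To make the two-level BTB arrangement concrete, here is a toy model in Python. Only the capacities (2 x 1.5k and 2 x 7k entries) come from the article; the class name, the dictionary-based tables, and the FIFO eviction are illustrative simplifications, not AMD’s actual design.

```python
class ToyBTB:
    """Toy two-level branch target buffer. Capacities mirror Zen 4's
    figures; the lookup/eviction logic is a deliberate simplification."""

    def __init__(self, l1_entries=2 * 1536, l2_entries=2 * 7168):
        self.l1_cap, self.l2_cap = l1_entries, l2_entries
        self.l1, self.l2 = {}, {}  # branch PC -> predicted target

    def predict(self, pc):
        if pc in self.l1:                      # small, fast first level
            return self.l1[pc], "L1 hit"
        if pc in self.l2:                      # larger, slower second level
            self._fill(self.l1, self.l1_cap, pc, self.l2[pc])
            return self.l1[pc], "L2 hit"
        return None, "miss"                    # fall back to static prediction

    def learn(self, pc, target):
        # Record a resolved branch in both levels.
        self._fill(self.l1, self.l1_cap, pc, target)
        self._fill(self.l2, self.l2_cap, pc, target)

    @staticmethod
    def _fill(table, cap, pc, target):
        if len(table) >= cap and pc not in table:
            table.pop(next(iter(table)))       # FIFO eviction stand-in, not true LRU
        table[pc] = target

btb = ToyBTB()
btb.learn(0x401000, 0x402000)
print(btb.predict(0x401000))   # hits in L1 with the learned target
print(btb.predict(0x999999))   # unseen branch -> miss
```

The larger the tables, the more distinct branches can be tracked before eviction; that is the mechanism behind the longer history the article describes.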

Meanwhile, the op cache has seen a more significant improvement. The op cache is not only 68% larger than before (now storing 6.75k ops), but it can now spit out up to 9 macro-ops per cycle, up from 6 on Zen 3. So in scenarios where the branch predictor is doing especially well at its job and the micro-op queue can consume additional instructions, it’s possible to get up to 50% more ops out of the op cache. Besides the performance improvement, this also benefits power efficiency, since tapping cached ops requires far less power than decoding new ones.

With that said, the output of the micro-op queue itself has not changed. The final stage of the front-end can still only spit out 6 micro-ops per clock, so the improved op cache transfer rate is only particularly useful in scenarios where the micro-op queue would otherwise be running low on ops to dispatch.
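A small simulation helps show why a 9-wide op-cache fetch still pays off behind a 6-wide dispatch: when fetch occasionally stalls, the wider path refills the micro-op queue faster, so the dispatcher starves less. Only the 6- and 9-wide figures come from the article; the queue capacity, stall pattern, and cycle counts are illustrative assumptions.

```python
def dispatched(fetch_width, cycles=1000, queue_cap=72, stall_every=5):
    """Count micro-ops dispatched when fetch stalls every Nth cycle.
    Dispatch is capped at 6/cycle regardless of fetch width."""
    queue = 0
    total = 0
    for cycle in range(cycles):
        if cycle % stall_every != 0:          # pretend fetch stalls every 5th cycle
            queue = min(queue_cap, queue + fetch_width)
        issued = min(6, queue)                # dispatch is still only 6-wide
        queue -= issued
        total += issued
    return total

print(dispatched(6))   # Zen 3-like fetch: dispatch starves on every stall
print(dispatched(9))   # Zen 4-like fetch: queue buffers through the stalls
```

With a 6-wide fetch the queue never builds up a reserve, so every fetch stall becomes a dispatch bubble; with a 9-wide fetch the queue absorbs the stalls and dispatch runs at nearly its full 6 ops/cycle.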

Switching to the back-end of the Zen 4 execution pipeline, things are once again relatively unchanged. There are no pipeline or port changes to speak of; Zen 4 can still (only) schedule up to 10 integer and 6 floating point operations per clock. Similarly, the fundamental floating point op latencies remain unchanged: 3 cycles for FADD and FMUL, and 4 cycles for FMA.
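Those latency figures matter most for dependent operations. A quick back-of-the-envelope model: a chain of FMAs where each consumes the previous result is bound by the 4-cycle latency, while independent FMAs are bound by pipe throughput. The latencies are the article’s figures; the 2-pipe FMA throughput is an assumption for this sketch.

```python
import math

FMA_LATENCY = 4   # cycles, per the article
FMA_PIPES = 2     # assumed pipe count, for illustration only

def dependent_chain_cycles(n_ops):
    # Each FMA waits on the previous result: latency-bound.
    return n_ops * FMA_LATENCY

def independent_cycles(n_ops):
    # Fully independent FMAs: one result per pipe per cycle,
    # plus the latency of the final operations draining the pipeline.
    return math.ceil(n_ops / FMA_PIPES) + (FMA_LATENCY - 1)

print(dependent_chain_cycles(100))   # 400 cycles
print(independent_cycles(100))       # 53 cycles
```

This roughly 8x gap is why compilers and hand-tuned kernels break long reductions into several independent accumulators.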

Instead, AMD’s improvements to the back-end of Zen 4 have likewise focused on larger caches and buffers. Of note, the retire queue/reorder buffer is 25% larger, now 320 instructions deep, giving the CPU a wider window of instructions to look through to extract performance via out-of-order execution. Similarly, the integer and FP register files have each been increased in size by about 20%, to 224 and 192 registers respectively, in order to accommodate the larger number of instructions now in flight.
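A Little’s-law sketch shows why a deeper reorder buffer helps: to keep the core busy across a long-latency event, the in-flight window must cover roughly (issue rate) x (latency). The 320-entry figure is from the article (with 256 implied by the 25% increase); the IPC and latency numbers below are illustrative assumptions, not AMD data.

```python
def inflight_needed(ipc, latency_cycles):
    # Little's law: instructions in flight = rate x time covered.
    return ipc * latency_cycles

ROB_ZEN3, ROB_ZEN4 = 256, 320   # 320 per the article; 256 implied by "+25%"

# Hypothetical case: sustaining 6 IPC across a 50-cycle cache access.
need = inflight_needed(6, 50)
print(need, need <= ROB_ZEN3, need <= ROB_ZEN4)   # 300 False True
```

In this hypothetical, 300 in-flight instructions overflow a 256-entry window but fit in 320, which is exactly the kind of headroom a deeper ROB buys.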

The only other notable change here is AVX-512 support, which we touched upon earlier. AVX execution takes place in AMD’s floating point ports, and as such, those have been beefed up to support the new instructions.

Moving on, the load/store units within each CPU core have also had their buffers enlarged. The load queue is 22% deeper, now storing 88 loads. And according to AMD, they’ve made some unspecified changes to reduce port conflicts with their L1 data cache. Otherwise, load/store throughput remains unchanged at 3 loads and 2 stores per cycle.
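Some quick arithmetic on those figures. The 3-load/2-store rates and the 88-entry load queue come from the article; the 256-bit access width is an assumption for illustration (e.g. AVX2-sized loads), not a stated Zen 4 figure.

```python
LOADS_PER_CYCLE, STORES_PER_CYCLE = 3, 2
LOAD_QUEUE = 88                        # Zen 4, ~22% over the prior 72 entries

width_bytes = 32                       # assumed 256-bit accesses
print(LOADS_PER_CYCLE * width_bytes)   # peak L1D read bytes/cycle under that assumption: 96
print(LOAD_QUEUE / LOADS_PER_CYCLE)    # cycles of peak-rate loads the queue can hold: ~29.3
```

The second number is the useful one: a deeper load queue lets roughly 29 cycles’ worth of peak-rate loads stay in flight while waiting on the cache hierarchy.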

Finally, let’s talk about AMD’s L2 cache. As previously disclosed by the company, the Zen 4 architecture doubles the size of the L2 cache on each CPU core, taking it from 512KB to a full 1MB. As with the core’s other buffer improvements, the larger L2 cache is designed to further improve performance/IPC by keeping more relevant data closer to the CPU cores, as opposed to it ending up in the L3 cache, or worse, main memory. Beyond that, the L3 cache remains unchanged at 32MB for an 8-core CCX, functioning as a victim cache for each CPU core’s L2 cache.
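As a rough illustration of what the doubled L2 means, here is a sketch of which cache level a given working set lands in. The 1MB L2 and 32MB L3 are the article’s figures; the 32KB L1D size and the fit-by-capacity logic are assumptions for illustration — real placement also depends on associativity, sharing, and access patterns.

```python
KB, MB = 1024, 1024 * 1024
L1D, L2, L3 = 32 * KB, 1 * MB, 32 * MB   # L1D assumed; L2/L3 per the article

def likely_level(working_set_bytes):
    # Naive capacity check: report the first level the working set fits in.
    for name, size in (("L1D", L1D), ("L2", L2), ("L3", L3)):
        if working_set_bytes <= size:
            return name
    return "DRAM"

print(likely_level(700 * KB))   # fits the doubled 1MB L2; spilled to L3 at 512KB
print(likely_level(8 * MB))     # too big for L2, lands in the 32MB L3
```

Working sets between 512KB and 1MB are the direct beneficiaries: on Zen 3 they spilled to the L3, while on Zen 4 they stay a level closer to the core.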

All told, we aren’t seeing very many major changes in the Zen 4 execution pipeline, and that’s okay. Increasing cache and buffer sizes is another tried and true way to improve the performance of an architecture, by keeping an existing design filled and working more often, and that’s what AMD has opted to do for Zen 4. Especially coming in conjunction with the jump from TSMC 7nm to 5nm and the resulting increase in transistor budget, this is a good way to put those additional transistors to use while AMD works on a more significant overhaul to the Zen architecture for Zen 5.

  • Silver5urfer - Tuesday, September 27, 2022 - link

    Intel won't sell new mobos. They already have Z690 saturation. Barely anyone will get Z790. AMD on the other hand will continue to sell new boards, the quarter is not based on the Client only. It will include the HPC. Intel lost money there, and AMD won't be losing because Genoa is on track and SPR XEON is delayed.

    AMD AM5 is not just hey this thing is fast and just for gaming. It will be a socket that is going to last until Intel Nova Lake launches that is next 2 Intel sockets. That is a huge advantage for a small price for paying customers now.

    Also why is everyone chanting same BS that GN Steve did with AMD boards are too expensive, did you see how Z690 was at when it launched same thing it was expensive ? And DDR4 boards are worse quality and features than the premium cut DDR5. Then Intel launched B660 and AMD's B650/E is also coming. So nope that BS argument about Mobo pricing is too much thrown around. Once the B650 launches by that time 13th gen will hit Retail market and new GPUs as well. And it's November season and in America the Black Friday sales will kick in and see price cut for all products we are seeing now.

    So ultimately AMD is not going to lose money.

The biggest BS from a smart customer pov is with Intel LGA1700 EOL and the whole socket bending crap, it's like AM4's unreliable IMC and poor IODie with its issues. AM5 needs to prove itself but given how they removed the IF from memory clocks I can bet it won't have the issues from AM4.

X3D is a niche market, it won't be a chart topper for sales at least if it's again a 7800X3D single SKU. Same for the KS bin. It depends on how AMD will execute, idk why every single AMD fan says X3D is going to do something. If AMD can clock it this high and also allow tuning then it will be a true gen refresh to compete vs Meteor Lake, else it will be just a Gaming Juggernaut.
    Reply
  • nandnandnand - Tuesday, September 27, 2022 - link

    @Silver5urfer rumored to be 3 SKUs, including a 7900X3D, and +30% average performance instead of 15%. I guess that would be a result of improved latency, bandwidth, no voltage/clock decreases, etc. Reply
  • Silver5urfer - Wednesday, September 28, 2022 - link

A 7950X3D means it will have extremely high heat because you are adding not just a single cache stack but 2 stacks atop the CCDs, how will AMD be able to remove that? Unless the CoWoS TSMC stacking is technically changed OR they have to lap the IHS internally to reduce the thickness and compensate for the high heat transfer. The current IHS is thick due to many reasons one can assume - the LGA1718 stability, chiplet integrity with high heat and pressure of the HS and cooler compat - and it causes the heat density increase, which is why 95C.

    I really think a 7800X3D is the only way for AMD even though rumors mention 3SKUs because a total SKU refresh totally cannibalize the entire 7000 lineup, because a 7600X is to get best gaming out of AM5 with cheaper option almost at more than 1/2 the price reduction vs a top end R9. And R7 7900X is basically an all rounder like 5900X best for gaming and production now you add the Cache block it would have to fight with 7900X.

    Voltage reduction was done on Zen 3 because AMD shoved 1.4v through all Ryzen 5000 processors, insanely high and IODie was also on high voltage, causing all that instability add the 1.3v bin silicon, everything gets better including the heat density. Zen 4 TSMC 5N is much better because it's just 1.2v now at high clock rate. The voltage is not an issue anymore, the design of the Zen 4 itself is like this, how AMD intended to breathe fire at 95C even for 7600X is the hint.
    Reply
  • nandnandnand - Wednesday, September 28, 2022 - link

    Heat was never the problem for the 5800X3D. It was only voltage, due to using an immature 3D (2.5D) chiplet technology that could not be run at the higher voltages. So I don't think the 7950X3D can't happen. If they have to drop voltages and clocks again, then hopefully the cache has improved.

    I think AMD should do at least a 7950X3D and 7800X3D. They can prevent cannibalization by giving it a healthy price bump. Probably +$100 to the 7950X3D, +$50 to the 7800X3D, and let the 7700X price drift lower. 7900X3D doesn't make sense, and people would love a 7600X3D but AMD would not.
    Reply
  • nandnandnand - Tuesday, September 27, 2022 - link

    @Hifihedgehog OP compared 7000X3D to the 13900KS, that's what I addressed. Reply
  • Hifihedgehog - Tuesday, September 27, 2022 - link

    Wrong: the i9-13900K is less than $600. The 7950X is going to have to have its price lowered, especially with the price of DDR5 and the motherboards simply off the charts. And good too: Lisa Su needs to be running a price war and not pretend that her company has more market share. Reply
  • The Von Matrices - Tuesday, September 27, 2022 - link

    A price war doesn't benefit AMD when they are supply constrained by TSMC and selling every chip they can manufacture. There's a reason that AMD doesn't offer any products in the <=$100 CPU market right now and it isn't because they don't want to make money. Reply
  • Hifihedgehog - Tuesday, September 27, 2022 - link

    https://download.intel.com/newsroom/2022/2022innov... Reply
  • dwade123 - Tuesday, September 27, 2022 - link

    Overheated and overpriced. Don't let those scumbags tell you that "95C is normal" because it's not. Avoid at all cost! Reply
  • Thanny - Tuesday, September 27, 2022 - link

Running the memory at JEDEC speeds is definitely the wrong choice for a review. While it may be true that most people don't set the memory profile in the BIOS, none of those people read CPU reviews. Essentially every person who would read this review will be setting memory to the XMP/EXPO settings.

    So you're essentially invalidating your test results for the only people who see them.
    Reply
