Cache and Infinity Fabric

If it hasn’t been hammered in already,  the big change in the cache is the L1 instruction cache which has been reduced from 64 KB to 32 KB, but the associativity has increased from 4-way to 8-way. This change enabled AMD to increase the size of the micro-op cache from 2K entry to 4K entry, and AMD felt that this gave a better performance balance with how modern workloads are evolving.

The L1-D cache is still 32KB 8-way, while the L2 cache is still 512KB 8-way. The L3 cache, which is a non-inclusive cache (compared to the L2 inclusive cache), has now doubled in size to 16 MB per core complex, up from 8 MB. AMD manages its L3 by sharing a 16MB block per CCX, rather than enabling access to any L3 from any core.

Because of the increase in size of the L3, latency has increased slightly. L1 is still 4-cycle, L2 is still 12-cycle, but L3 has increased from ~35 cycle to ~40 cycle (this is a characteristic of larger caches, they end up being slightly slower latency; it’s an interesting trade off to measure). AMD has stated that it has increased the size of the queues handling L1 and L2 misses, although hasn’t elaborated as to how big they now are.

Infinity Fabric

With the move to Zen 2, we also move to the second generation of Infinity Fabric. One of the major updates with IF2 is the support of PCIe 4.0, and thus the increase of the bus width from 256-bit to 512-bit.

Overall efficiency of IF2 has improved 27% according to AMD, leading to a lower power per bit. As we move to more IF links in EPYC, this will become very important as data is transferred from chiplet to IO die.

One of the features of IF2 is that the clock has been decoupled from the main DRAM clock. In Zen and Zen+, the IF frequency was coupled to the DRAM frequency, which led to some interesting scenarios where the memory could go a lot faster but the limitations in the IF meant that they were both limited by the lock-step nature of the clock. For Zen 2, AMD has introduced ratios to the IF2, enabling a 1:1 normal ratio or a 2:1 ratio that reduces the IF2 clock in half.

This ratio should automatically come into play around DDR4-3600 or DDR4-3800, but it does mean that IF2 clock does reduce in half, which has a knock on effect with respect to bandwidth. It should be noted that even if the DRAM frequency is high, having a slower IF frequency will likely limit the raw performance gain from that faster memory. AMD recommends keeping the ratio at a 1:1 around DDR4-3600, and instead optimizing sub-timings at that speed.

Integer Units, Load and Store Conclusions: Platform, SoC, Core
Comments Locked

216 Comments

View All Comments

  • Korguz - Monday, June 17, 2019 - link

    im glad im not the only one that sees this...
  • Qasar - Monday, June 17, 2019 - link

    korguz, you aren't the only one that sees it.

    Xyler94, i dont hate intel.. but i am sick of what they have done so far to the cpu industry, sticking the mainstream with quad cores for how many years ? i would of loved to get a 6 or 8 core intel chip, but the cost of the platform, made it out of my reach. the little performance gains year over year, come on, thats the best intel can do with all the money they have ?? and the constant lies about 10nm.... then Zen is released and what was it, less then 2 months later, intel all of a sudden has more then 4 cores for the mainstream, and even more cores for the HEDT ? my next upgrade at this point, looks to be zen 2.. but i am waiting till the 7th, to read the reviews. hstewart does glorify intel any chance he can, and it just looks so stupid, cause some one calls him out on it.. and he seems to pretty much vanish from that convo
  • HStewart - Thursday, June 13, 2019 - link

    Notice that I mention unless they change it from dual 128 bit.
  • Targon - Thursday, June 13, 2019 - link

    Socket AM4 is limited to a dual-channel memory controller, because you need more pins to add more memory channels. The same applies to the number of PCI Express lanes as well. The only way around this would be to use one of the abilities of Gen-Z where the CPU would just talk to the Gen-Z bus, at which point, dedicated pins for memory and PCI Express could be replaced by a very wide and fast connection to the system bus/fabric. Since that would require a new motherboard and for the CPU to be designed around it, why bother with socket AM4 at that point?
  • Korguz - Thursday, June 13, 2019 - link

    why bother?? um upgrade ability ? maybe not quite needed ? the things you suggest, sound like they would be a little expensive to implement. if you need more memory bandwidth and pcie lanes.. grab a TR board and a lower end cpu....
  • austinsguitar - Monday, June 10, 2019 - link

    Thank you Ian for this write up. :)
  • megapleb - Monday, June 10, 2019 - link

    Why does the 3600X have power consumption of 95W, and the 3700X, with two more cores, four more threads, and the same frequency max, consume only 65W? I'm guessing those two got switched around?
  • anonomouse - Monday, June 10, 2019 - link

    higher sustained base clock drives up the tdp
  • megapleb - Monday, June 10, 2019 - link

    200Mhz extra base increases power consumption by 46%? I would have though max power consumption would be all cores operating at maximum frequency so the base would have nothing to do with it?
  • scineram - Tuesday, June 11, 2019 - link

    Nobody said anything about power consumption.

Log in

Don't have an account? Sign up now