The Core Complex, Caches, and Fabric

Many-core designs often start with a low-core-count building block that is repeated across a coherent fabric to reach a high core count on a large die. In this case, AMD is using a CPU Complex (CCX) as that building block, which consists of four cores and their associated caches.

Each core will have direct access to its private L2 cache. The 8 MB of L3 cache is split into blocks per core, but it is accessible by every core on the CCX with ‘an average latency’. L3 hits nearer to the core will also have a lower latency due to the low-order address interleave method of address generation.
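As a rough illustration of what a low-order address interleave means, the sketch below maps an address to one of four L3 blocks using the bits just above the cache-line offset. The 64-byte line size, the plain modulo mapping, and the name `l3_slice` are assumptions made for the example; AMD has not described the actual mapping here.

```python
LINE_BYTES = 64   # assumed cache-line size, not stated in the article
NUM_SLICES = 4    # one L3 block per core in a four-core CCX

def l3_slice(addr: int) -> int:
    """Pick an L3 block from the low-order line-address bits (hypothetical mapping)."""
    line_index = addr // LINE_BYTES   # drop the byte offset within the line
    return line_index % NUM_SLICES    # consecutive lines rotate across the blocks

# Four consecutive cache lines land in four different blocks.
for addr in (0x0000, 0x0040, 0x0080, 0x00C0):
    print(hex(addr), "-> block", l3_slice(addr))
```

With a mapping like this, a stream of sequential lines is spread evenly over all four blocks, so a core sees a mix of near and far hits rather than always hitting its nearest block.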

The L3 cache is actually a victim cache, taking data from L1 and L2 evictions rather than collecting data from prefetch/demand instructions. Victim caches tend to be less effective than inclusive caches, but Zen counters this by having a sufficiently large L2 to compensate. The use of a victim cache means that the L3 does not have to hold a copy of the L2 data, effectively increasing its potential capacity with less data redundancy.
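The victim-cache behavior described above can be sketched with a toy model. This is a simplification under stated assumptions, not AMD's implementation: both levels use plain LRU, a line that hits in L3 is moved back to L2, and the class name `VictimHierarchy` and its sizes are hypothetical.

```python
from collections import OrderedDict

class VictimHierarchy:
    """Toy model: a small LRU 'L2' backed by an LRU victim 'L3'."""

    def __init__(self, l2_lines: int, l3_lines: int):
        self.l2 = OrderedDict()
        self.l3 = OrderedDict()
        self.l2_lines = l2_lines
        self.l3_lines = l3_lines

    def _fill_l2(self, line: int) -> None:
        self.l2[line] = True
        self.l2.move_to_end(line)
        if len(self.l2) > self.l2_lines:
            victim, _ = self.l2.popitem(last=False)  # evict least recently used
            self.l3[victim] = True                   # the victim drops into L3
            self.l3.move_to_end(victim)
            if len(self.l3) > self.l3_lines:
                self.l3.popitem(last=False)          # L3 overflow leaves the chip

    def access(self, line: int) -> str:
        if line in self.l2:
            self.l2.move_to_end(line)
            return "L2 hit"
        if line in self.l3:
            del self.l3[line]    # move the line back up; no duplicate is kept
            self._fill_l2(line)
            return "L3 hit"
        self._fill_l2(line)      # a demand miss fills L2 only, never L3 directly
        return "miss"

h = VictimHierarchy(l2_lines=2, l3_lines=4)
print([h.access(n) for n in (0, 1, 2, 0)])
```

Because a line lives in either L2 or L3 but never both in this model, the combined unique capacity is the sum of the two levels, which is the redundancy saving the paragraph refers to.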

It is worth noting that a single CCX has 8 MB of L3 cache, and as a result the 8-core Zen being displayed by AMD at current events involves two CPU Complexes. This affords a total of 16 MB of L3 cache, albeit in two distinct parts. This means that the true LLC for the entire chip is actually DRAM, although AMD states that the two CCXes can communicate with each other through the custom fabric, which connects both complexes, the memory controller, the IO, the PCIe lanes and so on.
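The arithmetic and the split can be sketched as follows. The contiguous core numbering (cores 0 to 3 on the first CCX, 4 to 7 on the second) and the helper names are assumptions for illustration only.

```python
CORES_PER_CCX = 4   # from the text: four cores per CCX
L3_MB_PER_CCX = 8   # from the text: 8 MB of L3 per CCX

def ccx_of(core: int) -> int:
    """Return the CCX index of a core, assuming contiguous core numbering."""
    return core // CORES_PER_CCX

def shares_l3(core_a: int, core_b: int) -> bool:
    """Two cores share an L3 only when they sit on the same CCX."""
    return ccx_of(core_a) == ccx_of(core_b)

def total_l3_mb(num_cores: int) -> int:
    """Total L3 across the chip, summed over its separate CCX-local parts."""
    return (num_cores // CORES_PER_CCX) * L3_MB_PER_CCX

print(total_l3_mb(8))    # two CCXes worth of L3
print(shares_l3(0, 3))   # same CCX
print(shares_l3(3, 4))   # different CCXes
```

The point of the sketch is that the 16 MB figure is a sum of two separate 8 MB pools: any single core sees only the 8 MB on its own CCX as local L3.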

 

The cache representation shows L1 and L2 being local to each core, followed by 8 MB of L3 split over several cores. AMD states that the L1 and L2 bandwidth is nearly double that of Excavator, with L3 bandwidth now up to 5x higher, and that this bandwidth will help drive the improvements made on the prefetch side. AMD also states that there are large queues in play for L1/L2 cache misses.

One interesting story is going to be how AMD’s coherent fabric works. For those that follow mobile phone SoCs, we know fabrics and interconnects such as CCI-400 or the CCN family are optimized to take advantage of core clusters along with the rest of the chip. A number of people have speculated that the fabric used in AMD’s new design is based on HyperTransport, and AMD has confirmed that they are using a superset of HyperTransport here for Zen. The Infinity Fabric design is meant to be high bandwidth and low latency, and to be used in both Zen and Vega as well as future products. Much like the CPU and GPU lines, the fabric has its own roadmap as well.

Ultimately the new fabric involves a series of control and data passing structures, with the data passing enabling third-party IP in custom designs, a high-performance common bus for large multi-unit (CPU/GPU) structures, and socket to socket communication. The control elements are an extension of power management, enabling parts of the fabric to duty cycle when not in use, security by way of memory management and detection, and test/initialization for activities such as data prefetch.

574 Comments

  • FriendlyUser - Thursday, March 2, 2017 - link

    True. The 1600X will be competitive with the i5 at gaming and probably much faster in anything multithreaded. The crucial point is the price... $200 would be great.
  • MrSpadge - Thursday, March 2, 2017 - link

    "Ryzen will need to drop in price. $500 1800x is still too expensive. According to this even a 7700k @ $300 -$350 is still a good choice for gamers."

    That's what the 1700X is for.
  • lilmoe - Thursday, March 2, 2017 - link

    +1
    And for that, I'd say the 1700 (non-x) is the best consumer CPU available ATM. BUT, if someone just wants to game, I'd say get the Core i5... For me though, screw Intel. Never going them again.
  • fanofanand - Thursday, March 2, 2017 - link

    The 1700 is the sweet spot for anyone not trying to eke out a few more fps or drop their encode/decode times by a couple of seconds. To save $170 and lose a couple hundred MHz, I know which chip seems like the best all-around for price/performance and that's the 1700.
  • lilmoe - Thursday, March 2, 2017 - link

    Yep. You get both efficiency and performance when needed. This should allow for super quiet and very performant builds. Just take a look at the idle system power draw of these chips. Super nice.

    Everything is going either multi-threaded or GPU accelerated, even compiling code. What I'm really waiting for is Raven Ridge. I've got lots of stock $$ and high hopes for a low power 4-6 core Zen APU, with HBM and some bonus blocks for video encode (akin to Quicksync). I have a feeling they'll be much better for idling power and have better support for Microsoft's connected standby.
  • khanikun - Friday, March 3, 2017 - link

    i5 is a good gamer and all around cpu for majority of users. If all you plan to do is game and a tight budget, the i3 7350k is a great cpu for just that. Once the workload goes a bit more multithreaded, that's where you'll want to move to an i5.
  • Valis - Friday, March 3, 2017 - link

    I game now and then, but I do a lot of other things too. Video rendering, Crypto coins, Folding @ home, VM, etc. So any Zen, perhaps even 4 Core later this year with a good GPU will suit me fine. :)
  • nos024 - Thursday, March 2, 2017 - link

    So the 1800x is pointless?
  • lilmoe - Thursday, March 2, 2017 - link

    I don't think pointless is the right word. I'd say it's the worst value for dollar of the three.
  • tacitust - Thursday, March 2, 2017 - link

    Not at all pointless if you do a lot of video transcoding or other CPU intensive tasks well suited to multiple cores. The price premium for the 1800x is still way lower than the price premium for the Intel processors.
