The Core Complex, Caches, and Fabric

Many-core designs often start with a low-core-count building block that is repeated across a coherent fabric to build up a large number of cores and a large die. In this case, AMD is using a CPU Complex (CCX) as that building block, which consists of four cores and the associated caches.

Each core has direct access to its private L2 cache. The 8 MB of L3 cache, despite being split into blocks per core, is accessible by every core on the CCX with ‘an average latency’. L3 hits nearer to the core will also have a lower latency, a consequence of the low-order address interleave method of address generation.
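As a rough illustration of what low-order address interleaving means, the sketch below spreads consecutive cache lines across four L3 slices by using the lowest bits of the line index. The 64-byte line size, the choice of exactly which bits select a slice, and the names `l3_slice` and `slice_histogram` are assumptions made for this example; AMD has not detailed the actual mapping.

```python
LINE_BYTES = 64   # assumed cache-line size (not stated in the article)
NUM_SLICES = 4    # one L3 slice per core in a four-core CCX


def l3_slice(addr: int) -> int:
    """Pick an L3 slice from the low-order bits of the cache-line index."""
    line_index = addr // LINE_BYTES
    return line_index % NUM_SLICES


def slice_histogram(start: int, n_lines: int) -> list[int]:
    """Count how many of n_lines consecutive lines land in each slice."""
    counts = [0] * NUM_SLICES
    for i in range(n_lines):
        counts[l3_slice(start + i * LINE_BYTES)] += 1
    return counts


if __name__ == "__main__":
    # Consecutive lines rotate through the slices, so a linear access
    # pattern is spread evenly rather than hammering a single slice.
    print([l3_slice(i * LINE_BYTES) for i in range(8)])
    print(slice_histogram(0, 1024))
```

Under this kind of mapping any given core finds some of its lines in its own nearby slice and the rest in the other slices, which is why a single average latency figure is quoted for the L3 as a whole.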

The L3 cache is actually a victim cache, taking data from L1 and L2 evictions rather than collecting data from prefetch/demand instructions. Victim caches tend to be less effective than inclusive caches; however, Zen counters this by having a sufficiently large L2 to compensate. The use of a victim cache means that it does not have to hold L2 data inside, effectively increasing its potential capacity with less data redundancy.
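The capacity argument can be seen in a toy model. The sketch below is a deliberately simplified, strictly exclusive two-level hierarchy with LRU replacement: the L3 is filled only by lines evicted from the L2, and a line promoted back to the L2 leaves the L3. The class name `VictimHierarchy`, the LRU policy, and the tiny sizes are assumptions for illustration only, not a description of Zen's actual replacement or coherence behavior.

```python
from collections import OrderedDict


class VictimHierarchy:
    """Toy L2 + victim L3 model; sizes are counted in cache lines."""

    def __init__(self, l2_lines: int, l3_lines: int):
        self.l2 = OrderedDict()   # oldest entry first (LRU order)
        self.l3 = OrderedDict()
        self.l2_lines = l2_lines
        self.l3_lines = l3_lines

    def access(self, line: int) -> str:
        """Touch a line and report which level served it."""
        if line in self.l2:
            self.l2.move_to_end(line)      # refresh LRU position
            return "L2"
        if line in self.l3:
            del self.l3[line]              # promoted line leaves the L3
            source = "L3"
        else:
            source = "DRAM"
        self._fill_l2(line)
        return source

    def _fill_l2(self, line: int) -> None:
        self.l2[line] = True
        if len(self.l2) > self.l2_lines:
            victim, _ = self.l2.popitem(last=False)   # evict LRU line
            self.l3[victim] = True                    # L3 fills from evictions only
            if len(self.l3) > self.l3_lines:
                self.l3.popitem(last=False)           # oldest victim drops to DRAM


if __name__ == "__main__":
    h = VictimHierarchy(l2_lines=2, l3_lines=4)
    print([h.access(i) for i in range(6)])   # first touches all come from DRAM
    print(sorted(h.l2), sorted(h.l3))        # no line is held in both levels
    print(h.access(0))                       # an earlier victim is served by the L3
```

Because no line is duplicated between the two levels in this model, the hierarchy holds L2 + L3 distinct lines, whereas an inclusive L3 of the same size would spend part of its capacity mirroring the L2.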

It is worth noting that a single CCX has 8 MB of cache, and as a result the 8-core Zen being displayed by AMD at the current events involves two CPU Complexes. This affords a total of 16 MB of L3 cache, albeit in two distinct parts. This means that the true LLC for the entire chip is actually DRAM, although AMD states that the two CCXes can communicate with each other through the custom fabric which connects both the complexes, the memory controller, the IO, the PCIe lanes etc.

One interesting story is going to be how AMD’s coherent fabric works. For those that follow mobile phone SoCs, we know fabrics and interconnects such as CCI-400 or the CCN family are optimized to take advantage of core clusters along with the rest of the chip. A number of people have speculated that the fabric used in AMD’s new design is based on HyperTransport; however, AMD has confirmed that they are not using HyperTransport here for Zen. More information on the fabric may come out as we near the launch, although this remains one of the more mysterious elements of the design at this stage.

The cache representation in the new presentation at Hot Chips is almost identical to that in midweek, showing L1 and L2 in the core with 8MB of L3 split over several cores. AMD states that the L1 and L2 bandwidth is nearly double that of Excavator, with L3 now up to 5x for bandwidth, and that this bandwidth will help drive the improvements made on the prefetch side. AMD also states that there are large queues in play for L1/L2 cache misses.

106 Comments

  • Krysto - Wednesday, August 24, 2016 - link

    I think PCs in general run better on four cores than on two, even if most apps themselves can't take advantage of them, although I think in the next 5 years most new games will take advantage of 8 threads. But otherwise, it's just good for multitasking.
  • tarqsharq - Wednesday, August 24, 2016 - link

    I had an argument with one fellow on the internet regarding i7 being plenty for whatever I was doing in terms of core count. But streaming a show on one monitor while playing Overwatch was hitting 70%+ CPU usage, with all logical cores being 60-70% utilized consistently, with spikes up to 90%+.

    That was on my i7-4770K to be specific, running 1080P on a 144hz monitor for Overwatch, and Crunchyroll for 1080P anime stream on the second monitor.

    So some games combined with slight multitasking is already taxing the 4C/8T environment.
  • galta - Wednesday, August 24, 2016 - link

    And how much multitasking are we really using? If I had to guess, I would say not much, on average.
    You might have some folks here and there using it, but regular users need something between two and four cores, just as you said.
    You have the OS, the software you're using, be it a game or not, plus everything that's running behind the scenes, including Windows inefficiencies, and that's it. But for some weird guy that spends his day on 7zip, more than 4 cores brings no extra power.
    This is the reason why, no matter how excited we might get with 10 cores (I would love one, even if for bragging rights only), our i5s are enough for what we do.
    Maybe in 5 years from now games will be multithreaded, but I'm not holding my breath: something similar was said 5 years ago, and here we are.
    At the end of the day, we still need improvement in per core performance.
  • looncraz - Wednesday, August 24, 2016 - link

    Browsers are becoming better and better at using more cores... and we're all running tens of processes in the background, some of which fire interrupts on a CPU. More cores allows for more going on at the same time without interruptions. You can actually feel this moving to an eight-core FX-8350 from a quad core i5... those eight cores provide a somewhat smoother multi-tasking environment, despite each core being slower and the overall performance being lower.

    Humans are simply sensitive to changes in timing - more cores and more threads reduces the variability in timing, which improves perceived performance.
  • galta - Thursday, August 25, 2016 - link

    Hum....
    I don't know many people who share your opinion about FX-8350 vs i5.
    Anyway, we have been multitasking for a while, at least to some extent: OS, Word, anti-virus, browser. The question is: for this light multitasking, are we better off with several cores with poor performance/core, or with fewer cores but with great performance/core?
    Reviews and actual people generally prefer the latter.
    As for browsers, great news that they are improving, but download/upload speed is by far the most important factor in user experience.
  • Alexvrb - Sunday, August 28, 2016 - link

    Download speed is fine for web browsing if you've got something faster than DSL. How much data exactly do you think you're consuming while browsing the web? Outside of streaming videos you won't use up a ton of bandwidth.
  • Cooe - Thursday, May 6, 2021 - link

    I know this is ANCIENT, but how the hell did you not realize that multi-core optimization was so bad only because nobody could afford greater than >4 core CPU's pre-Zen??? Modern games run freaking TERRIBLE now on 4c/4t i5's.
  • Notmyusualid - Wednesday, August 24, 2016 - link

    No, nope, nej, and nein.

    I see (FEEL) tangible improvements in my computing ever since I dropped 2 cores for 4.

    And it looks like others below agree....
  • galta - Thursday, August 25, 2016 - link

    I believe you do, for the sweet spot is now around 4 cores, as I said before.
    The question is: do you believe that your experience will improve significantly if you move to 6 or 8 cores?
    Probably not, unless you spend your day zipping files or rendering images.
  • Alexvrb - Sunday, August 28, 2016 - link

    They said the same thing about quad cores, and dual cores before that. AMD has to get on top of the curve, not behind it. They'll offer quad cores for more mainstream systems, and 8 for performance rigs. More for servers, and potentially less for low-power and/or low-cost.
