AMD's EPYC Server CPU

If you have read Ian's articles about Zen and EPYC in detail, you can skip this page. For those of you who need a refresher, let us quickly review what AMD is offering. 

The basic building block of EPYC and Ryzen is the CPU Complex (CCX), which consists of four vastly improved "Zen" cores connected to an L3-cache. In a full configuration each core technically has its own 2 MB slice of L3, but access to the other 6 MB is still rather speedy. Within a CCX we measured 13 ns to access the first 2 MB, and 15 to 19 ns for the remaining 6 MB of the 8 MB L3-cache, a difference that is hardly noticeable in the grand scheme of things. The L3-cache acts as a mostly exclusive victim cache. 
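(If you want to reproduce these kinds of numbers yourself, a pointer-chasing microbenchmark is the usual tool: chasing a randomly shuffled cyclic list defeats the prefetchers, so the raw load-to-use latency of each cache level is exposed. The sketch below is only a minimal illustration of the technique, not our exact test harness; the working-set sizes and iteration counts are arbitrary choices.)

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Pointer-chasing latency sketch: each element stores the index of the
 * next element to visit, in a random cyclic order, so the dependent
 * loads cannot be prefetched or overlapped. */
static double chase(size_t n_elems, size_t iters)
{
    size_t *next = malloc(n_elems * sizeof *next);
    size_t *perm = malloc(n_elems * sizeof *perm);
    for (size_t i = 0; i < n_elems; i++) perm[i] = i;

    /* Fisher-Yates shuffle to build a random permutation. */
    for (size_t i = n_elems - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    /* Link the permutation into one big cycle. */
    for (size_t i = 0; i < n_elems; i++)
        next[perm[i]] = perm[(i + 1) % n_elems];

    struct timespec t0, t1;
    size_t pos = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; i++)
        pos = next[pos];              /* serialized dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    if (pos == (size_t)-1) printf("impossible\n");  /* keep "pos" live */
    free(next); free(perm);
    return ns / (double)iters;        /* average latency per load */
}

int main(void)
{
    /* Working sets chosen to land in L1/L2, inside one CCX's 8 MB L3,
     * and beyond it in DRAM (sizes are illustrative). */
    size_t kib[] = { 16, 512, 4096, 65536 };
    for (int i = 0; i < 4; i++) {
        size_t n = kib[i] * 1024 / sizeof(size_t);
        printf("%6zu KiB: %.1f ns per load\n", kib[i], chase(n, 20 * 1000 * 1000));
    }
    return 0;
}
```

Run with working sets of 16 KB, 512 KB, 4 MB, and 64 MB, you should see the latency step up as the data set moves from L1/L2 into the CCX's 8 MB L3, and finally out to DRAM.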

Two CCXes make up one Zeppelin die. A custom fabric – AMD's Infinity Fabric – ties together the two CCXes, the two 8 MB L3-caches, two DDR4 channels, and the integrated PCIe lanes. That topology is not without some drawbacks though: it means that there are two separate 8 MB L3 caches instead of one single 16 MB LLC. This has all kinds of consequences. For example, the prefetchers of each core make sure that data from the L3 is brought into the L1 when it is needed. Meanwhile each CCX has its own dedicated SRAM snoop directory (separate from the L3, so no capacity hit) that keeps track of seven possible states. In other words, the local L3-cache communicates very quickly with everything inside the same CCX, but every data exchange between two CCXes comes with a tangible latency penalty. 
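The practical consequence is that thread placement matters. As a rough illustration, the sketch below pins two threads to specific cores and bounces a single cache line between them; run it once with both cores in the same CCX and once with the cores in different CCXes, and the round-trip time tells the story. The core numbers used here are assumptions about how Linux enumerates the cores, so verify the mapping with lscpu or the sysfs cache topology first.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define ROUNDS 1000000

/* One cache line bounced between two pinned threads. */
static _Atomic int turn = 0;

static void pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof set, &set);
}

static void *pong(void *arg)
{
    pin_to_cpu(*(int *)arg);
    for (int i = 0; i < ROUNDS; i++) {
        while (atomic_load(&turn) != 1) ;   /* wait for ping */
        atomic_store(&turn, 0);             /* hand back */
    }
    return NULL;
}

int main(void)
{
    /* Assumed mapping: cpu 0 and 1 share a CCX, cpu 4 sits in the other
     * CCX of the same die -- verify with lscpu before trusting numbers. */
    int ping_cpu = 0, pong_cpu = 4;
    pin_to_cpu(ping_cpu);

    pthread_t t;
    pthread_create(&t, NULL, pong, &pong_cpu);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ROUNDS; i++) {
        atomic_store(&turn, 1);             /* ping */
        while (atomic_load(&turn) != 0) ;   /* wait for pong */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    pthread_join(t, NULL);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("cpu %d <-> cpu %d: %.0f ns per round trip\n",
           ping_cpu, pong_cpu, ns / ROUNDS);
    return 0;
}
```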

Moving further up the chain, the complete EPYC chip is a Multi Chip Module (MCM) containing 4 Zeppelin dies.

AMD made sure that each die is only one hop away from any other, ensuring that the off-die latency is as low as reasonably possible.

Meanwhile, scaling things up to their logical conclusion, we have 2P configurations. A dual-socket EPYC setup is in fact a "virtual octal socket" NUMA system. 

AMD gave this "virtual octal socket" topology ample bandwidth to communicate. The two physical sockets are connected by four bidirectional interconnects, each consisting of 16 PCIe lanes. Each of these links offers roughly 38 GB/s of bandwidth (19 GB/s in each direction). 
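On Linux this topology is visible directly: with the default BIOS settings every Zeppelin die is exposed as its own NUMA node, so a 2P EPYC system reports eight nodes, and the kernel's node distance table distinguishes same-die, same-socket, and cross-socket accesses. A small libnuma sketch (compile with -lnuma) that dumps the node count and distance matrix:

```c
#include <numa.h>
#include <stdio.h>

/* Print the NUMA node count and the kernel's node-to-node distance
 * table; on a dual-socket EPYC with one node per die this is an
 * 8x8 matrix, with larger distances for cross-socket pairs. */
int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available on this system\n");
        return 1;
    }

    int nodes = numa_max_node() + 1;
    printf("%d NUMA nodes\n\n     ", nodes);
    for (int j = 0; j < nodes; j++) printf("%5d", j);
    printf("\n");

    for (int i = 0; i < nodes; i++) {
        printf("%5d", i);
        for (int j = 0; j < nodes; j++)
            printf("%5d", numa_distance(i, j));   /* 10 = local node */
        printf("\n");
    }
    return 0;
}
```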

So basically, AMD's topology is ideal for applications with many independently working threads, such as small VMs, HPC applications, and so on. It is less suited for applications that require a lot of data synchronization, such as transactional databases. In the latter case, the extra latency of exchanging data between dies, and even between CCXes, is going to have an impact relative to a traditional monolithic design.
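For software that does need shared state, the usual mitigation is NUMA-aware placement: keep each worker thread and its hot data on the same die, so the Infinity Fabric is only crossed for traffic that genuinely has to be shared. A hedged sketch of that pattern with libnuma is below; the 64 MB buffer size is purely illustrative.

```c
#define _GNU_SOURCE
#include <numa.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

#define BUF_SIZE (64UL * 1024 * 1024)   /* 64 MB working set, illustrative */

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available\n");
        return 1;
    }

    /* Restrict this thread to the NUMA node it is currently running on,
     * then allocate its working set from that node's local DDR4 channels,
     * so the hot path never has to cross the Infinity Fabric to another die. */
    int node = numa_node_of_cpu(sched_getcpu());
    numa_run_on_node(node);

    void *buf = numa_alloc_onnode(BUF_SIZE, node);
    if (!buf) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    memset(buf, 0, BUF_SIZE);           /* touch pages so they are placed */

    printf("thread and %lu MB of data both bound to node %d\n",
           BUF_SIZE >> 20, node);

    numa_free(buf, BUF_SIZE);
    return 0;
}
```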

Comments

  • msroadkill612 - Wednesday, July 12, 2017 - link

    It looks interesting. Do you have a point?

    Are you saying they have a place in this EPYC debate? Using cheaper DDR3 RAM on EPYC?
  • yuhong - Friday, July 14, 2017 - link

    "We were told from Intel that ‘only 0.5% of the market actually uses those quad ranked and LR DRAMs’, "
  • intelemployee2012 - Wednesday, July 12, 2017 - link

    what kind of a forum and website is this? we can't delete the account, cannot edit a comment for fixing typos, cannot edit username, cannot contact an admin if we need to report something. Will never use these websites from now on.
  • Ryan Smith - Wednesday, July 12, 2017 - link

    "what kind of a forum and website is this?"

    The basic kind. It's not meant to be a replacement for forums, but rather a way to comment on the article. Deleting/editing comments is specifically not supported to prevent people from pulling Reddit-style shenanigans. The idea is that you post once, and you post something meaningful.

    As for any other issues you may have, you are welcome to contact me directly.
  • Ranger1065 - Thursday, July 13, 2017 - link

    That's a relief :)
  • iwod - Wednesday, July 12, 2017 - link

    I can't believe what I just read. While I knew Zen was good for the desktop, I expected the battle to be in Intel's favour on the server, since Intel has had years to tune and work on those workloads. But instead, we have a much CHEAPER AMD CPU that performs better, the same, or slightly worse in several cases, using much LOWER energy during the workload, while using a less advanced 14nm node compared to Intel!

    And NO word on stability problems from running these tests on AMD. This is like Athlon 64 all over again!
  • pSupaNova - Wednesday, July 12, 2017 - link

    Yes it is.

    But this time much worse for Intel with their manufacturing lead shrinking along with their workforce.
  • Shankar1962 - Wednesday, July 12, 2017 - link

    Competition has spoiled the naming convention: Intel's 14 == the competition's 7 or 10.
    Intel publicly challenged everyone to revisit the metrics and no one responded.
    Can we discuss the yield, density, and scaling metrics? Intel used to maintain a 2-year lead and has now grown that to a 3-4 year lead.
    Because it's a vertically integrated company, it looks like Intel vs the rest of the world, and yet their revenue and profits grow year over year.
  • iwod - Thursday, July 13, 2017 - link

    Grew to 3-4 years? Intel is shipping 10nm early next year in some laptop segments; TSMC is shipping 7nm Apple SoCs in 200M yearly unit quantities starting next September.

    If anything, the gap has shrunk from 2-3 years to 1-1.5 years.
  • Shankar1962 - Thursday, July 13, 2017 - link

    Yeah, 1-1.5 years if we cheat the metrics when comparing;
    2-3 years if we look at the metrics accurately.
    A process node shrink is compared by metrics like yield, cost, scaling, density, etc.
    7nm, 10nm, etc. is just a name.
