Sizing Up Servers: Intel's Skylake-SP Xeon versus AMD's EPYC 7000 - The Server CPU Battle of the Decade?
by Johan De Gelas & Ian Cutress on July 11, 2017 12:15 PM EST
Intel's Optimized Turbo Profiles
Also new to Skylake-SP: Intel has further enhanced turbo boosting.
There are also some security and virtualization enhancements (MBE, PPK, MPX), but these are beyond the scope of this article as we don't test them.
Summing It All Up: How Skylake-SP and Zen Compare
The table below shows you the differences in a nutshell.
| | AMD EPYC 7000 | Intel Skylake-SP | Intel Broadwell-EP |
|---|---|---|---|
| Package & Dies | Four dies in one MCM | Monolithic | Monolithic |
| Die size | 4x 195 mm² | 677 mm² | 456 mm² |
| On-Chip Topology | Infinity Fabric | Mesh | |
| Socket configuration | 1-2S | 1-8S ("Platinum") | 1-2S |
| Interconnect, bandwidth (*) | 4x16 (64) PCIe lanes, 4x 37.9 GB/s | 3x UPI 20 lanes, 3x 41.6 GB/s | 2x QPI 20 lanes, 2x 38.4 GB/s |
| LLC (max.) | 64MB (8x8 MB) | 38.5 MB | 55 MB |
| Max. Memory | 2 TB | 1.5 TB | 1.5 TB |
| Fastest sup. DRAM | | | |
| PCIe Per CPU in a 2P | 64 PCIe (available) | 48 PCIe 3.0 | 40 PCIe 3.0 |

(*) total bandwidth (bidirectional)
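
To make the table's multipliers easier to compare, the short sketch below totals the silicon area and the socket interconnect bandwidth for each platform. The numbers are copied from the table; the dictionary keys and the `totals` helper are illustrative names, and multiplying the link count by the per-link figure assumes all links are used at the quoted bidirectional rate.

```python
# Figures copied from the comparison table; field names are illustrative only.
platforms = {
    "AMD EPYC 7000": {"dies": 4, "die_mm2": 195, "links": 4, "link_gbs": 37.9},
    "Intel Skylake-SP": {"dies": 1, "die_mm2": 677, "links": 3, "link_gbs": 41.6},
    "Intel Broadwell-EP": {"dies": 1, "die_mm2": 456, "links": 2, "link_gbs": 38.4},
}


def totals(spec):
    """Return (total silicon area in mm², aggregate link bandwidth in GB/s)."""
    area = spec["dies"] * spec["die_mm2"]
    # Assumption: every link runs at the quoted bidirectional rate.
    bandwidth = round(spec["links"] * spec["link_gbs"], 1)
    return area, bandwidth


summary = {name: totals(spec) for name, spec in platforms.items()}

for name, (area, bandwidth) in summary.items():
    print(f"{name}: {area} mm² of silicon, {bandwidth} GB/s aggregate interconnect")
```

Under these assumptions the four EPYC dies add up to more silicon than the single Skylake-SP die, but each individual die is far smaller.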
At a high level, I would argue that Intel has the most advanced multi-core topology, as they're capable of integrating up to 28 cores in a mesh. The mesh topology will allow Intel to add more cores in future generations while scaling consistently in most applications. The last level cache has a decent latency and can accommodate applications with a massive memory footprint. The latency difference between accessing a local L3-cache chunk and one further away is negligible on average, allowing the L3-cache to be a central storage for fast data synchronization between the L2-caches. However, the highest performing Xeons are huge, and thus expensive to manufacture.
AMD's MCM approach is much cheaper to manufacture. Peak memory bandwidth and capacity are quite a bit higher, with 4 dies and 2 memory channels per die for 8 channels in total. However, there is no central last level cache that can perform low latency data coordination between the L2-caches of the different cores (except inside one CCX). The eight 8 MB L3-caches act like relatively low latency spill-over caches for the 32 L2-caches on one chip.
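
The counts in the paragraph above can be turned into a small model of which cores share an L3 cache. This is only a sketch: the constants come from the text, but the linear core numbering and the `locate` and `shares_l3` helpers are hypothetical, and a real operating system may enumerate cores differently.

```python
# Counts taken from the text: 4 dies, 2 memory channels per die,
# eight 8 MB L3 caches and 32 L2 caches (one per core) per package.
DIES = 4
CHANNELS_PER_DIE = 2
L3_SLICES = 8
L3_SLICE_MB = 8
CORES = 32

CORES_PER_CCX = CORES // L3_SLICES  # cores sharing one L3 cache
CCX_PER_DIE = L3_SLICES // DIES     # core complexes on each die


def locate(core):
    """Map a core index to (die, ccx), assuming hypothetical linear numbering."""
    ccx = core // CORES_PER_CCX
    return ccx // CCX_PER_DIE, ccx


def shares_l3(core_a, core_b):
    """True if both cores sit in the same CCX and so share one 8 MB L3."""
    return locate(core_a)[1] == locate(core_b)[1]


memory_channels = DIES * CHANNELS_PER_DIE
total_l3_mb = L3_SLICES * L3_SLICE_MB

print(f"{memory_channels} memory channels, {total_l3_mb} MB L3 in {L3_SLICES} slices")
print(f"cores 0 and 3 share an L3: {shares_l3(0, 3)}")
print(f"cores 3 and 4 share an L3: {shares_l3(3, 4)}")
```

In this model, any single core has only its own 8 MB slice as a local L3, even though the package total is 64 MB.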