Power

As with the Ryzen parts, EPYC will support 0.25x multipliers, giving P-state jumps of 25 MHz. With sufficient cooling, workloads will be able to move between the base frequency and the maximum boost frequency in these increments. AMD states that the smaller jumps allow for smoother transitions, rather than locking PLLs to move straight up and down, and so provide more predictable performance. This links into AMD’s new strategy of performance determinism vs power determinism.
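To make the step size concrete, the following is a minimal sketch of the P-state ladder. It assumes a 100 MHz reference clock (so a 0.25x multiplier step works out to 25 MHz), and the 2.2 GHz base and 3.2 GHz boost clocks are illustrative inputs rather than figures from this page. The function name is hypothetical.

```python
# Assumption: a 100 MHz reference clock, so a 0.25x multiplier step equals 25 MHz.
REF_CLOCK_MHZ = 100
MULTIPLIER_STEP = 0.25


def pstate_frequencies(base_mhz, boost_mhz):
    """List every frequency from base to boost in 0.25x multiplier steps."""
    step_mhz = int(REF_CLOCK_MHZ * MULTIPLIER_STEP)  # 25 MHz per step
    return list(range(base_mhz, boost_mhz + 1, step_mhz))


# Illustrative clocks only (2.2 GHz base, 3.2 GHz boost).
steps = pstate_frequencies(2200, 3200)
print(len(steps), steps[:4], steps[-1])
```

With these inputs the ladder has 41 rungs, compared with 11 if the chip could only move in whole 100 MHz multipliers.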

Each EPYC CPU includes two new modes, one based on power and one based on performance. When a system is configured at boot time for a specific maximum power, performance may vary with the environment, but power is ultimately capped at the high end. In the performance mode, the frequency is guaranteed, but the power is not. This lets AMD customers either plan power budgets in advance without worrying about how individual processors differ in voltage/frequency/leakage, or get deterministic performance in all environments. The mode is set at the system level at boot time, so all VMs/containers on a system will be affected by it.
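The difference between the two modes can be illustrated with a toy model. This is only a sketch: the linear power model, the constant `WATTS_PER_GHZ`, the leakage values, and the function names are all invented for illustration and are not AMD figures or AMD's actual control algorithm.

```python
# Toy model only: power = leakage + WATTS_PER_GHZ * frequency.
# Every constant here is invented for illustration, not an AMD figure.
WATTS_PER_GHZ = 50.0


def power_deterministic(leakage_w, power_cap_w, max_ghz):
    """Fixed power cap: frequency is whatever fits under the cap."""
    freq = min(max_ghz, (power_cap_w - leakage_w) / WATTS_PER_GHZ)
    return freq, leakage_w + WATTS_PER_GHZ * freq


def performance_deterministic(leakage_w, guaranteed_ghz):
    """Fixed frequency: power is whatever the chip needs to hold it."""
    return guaranteed_ghz, leakage_w + WATTS_PER_GHZ * guaranteed_ghz


for name, leakage in [("low-leakage chip", 40.0), ("high-leakage chip", 60.0)]:
    f_pow, p_pow = power_deterministic(leakage, power_cap_w=180.0, max_ghz=3.2)
    f_perf, p_perf = performance_deterministic(leakage, guaranteed_ghz=2.2)
    print(f"{name}: power mode {f_pow:.2f} GHz @ {p_pow:.0f} W, "
          f"performance mode {f_perf:.2f} GHz @ {p_perf:.0f} W")
```

In this sketch the two chips draw the same power but clock differently in the power mode, and clock the same but draw different power in the performance mode, which is the trade-off the two modes expose.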

This extends into selectable power limits. For EPYC, AMD is offering the ability to run processors at a lower or higher TDP than out of the box. Most users are likely familiar with Intel’s cTDP Up and cTDP Down modes on the mobile processors, and this feature by AMD is somewhat similar. As a result, the TDP limits given at the start of this piece can go down by 15W to 25W or up by 20W, depending on the part:

EPYC TDP Modes
| Low TDP | Regular TDP | High TDP |
|---------|-------------|----------|
| 155W    | 180W        | 200W     |
| 140W    | 155W        | 175W     |
| 105W    | 120W        | -        |

The sole 120W processor at this point is the 8-core EPYC 7251, which is geared towards memory-limited workloads that are licensed per core, hence it does not get a higher power band to work towards.
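The TDP bands in the table can be expressed as a small lookup, sketched below. The values come straight from the table; the dictionary layout and the `tdp_range` helper are hypothetical names for illustration.

```python
# Configurable TDP bands from the table above, keyed by regular TDP (watts).
# None marks a mode the table lists as unavailable.
TDP_MODES = {
    180: {"low": 155, "high": 200},
    155: {"low": 140, "high": 175},
    120: {"low": 105, "high": None},
}


def tdp_range(regular_tdp):
    """Return (lowest, highest) selectable TDP for a given regular TDP."""
    modes = TDP_MODES[regular_tdp]
    high = modes["high"] if modes["high"] is not None else regular_tdp
    return modes["low"], high


for regular in TDP_MODES:
    low, high = tdp_range(regular)
    print(f"{regular}W part: {low}W to {high}W "
          f"(-{regular - low}W / +{high - regular}W)")
```

Printing the deltas makes it easy to see that the 180W parts have the widest downward range, while the 120W part has no upward range at all.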

Workload-Aware Power Management

One of AMD’s points about the sort of workloads that might be run on EPYC is that sporadic tasks are sometimes hard to judge, or are not latency sensitive. In a non-latency-sensitive environment, the CPU could conserve power by spreading the workload out across more cores at a lower frequency. We’ve seen this sort of policy before on Intel’s Skylake and newer processors, which go so far as duty cycling at the efficiency point to conserve power, as well as in the mobile space. AMD is bringing this to the EPYC line as well.

Rather than staying at the high frequency and continually powering up and down, the cores run at a reduced frequency and stay active for longer, trading latency for power efficiency. AMD is claiming up to a 10% perf-per-Watt improvement with this feature.
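The latency-for-efficiency trade can be sketched with the textbook CMOS approximation that dynamic power scales with C·V²·f. The sketch below assumes voltage scales roughly linearly with frequency and ignores static/leakage power entirely, so it is a deliberately simplified model and not AMD's implementation; the function name, units, and inputs are illustrative.

```python
# Textbook approximation: dynamic power ~ C * V^2 * f. Assuming voltage scales
# roughly linearly with frequency, power ~ f^3 and energy per task ~ f^2.
# Units are arbitrary and static/leakage power is ignored.


def run_task(cycles, freq_ghz):
    """Return (latency, relative dynamic energy) for a fixed amount of work."""
    latency = cycles / freq_ghz      # lower frequency -> cores active longer
    power = freq_ghz ** 3            # relative dynamic power
    return latency, power * latency  # energy = power * time


fast = run_task(cycles=100.0, freq_ghz=3.0)
slow = run_task(cycles=100.0, freq_ghz=2.0)
print(f"3.0 GHz: latency {fast[0]:.1f}, energy {fast[1]:.0f}")
print(f"2.0 GHz: latency {slow[0]:.1f}, energy {slow[1]:.0f}")
```

The idealized model overstates the benefit compared with AMD's claimed figure of up to 10%, since real chips have leakage, voltage floors, and uncore power that do not scale this way; it is only meant to show the direction of the trade.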

Frequency and voltage can be adjusted for each core independently, helping drive this feature. The silicon implements per-core linear regulators that work with the onboard sensor control to tune the adaptive voltage and frequency scaling (AVFS) for the workload and the environment. We are told that this helps reduce the variability from core-to-core and chip-to-chip, with regulation supported to 2mV accuracy. We’ve seen some of this in Carrizo and Bristol Ridge already, although we are told that the goal for per-core LDO regulation was always meant to be EPYC.

This can happen not only on the cores, but also on the Infinity Fabric links between the CPU dies or between the sockets. By modulating the link width and analyzing traffic patterns, AMD claims another 8% perf-per-Watt for socket-to-socket communications.

Performance-Per-Watt Claims

For the EPYC system, AMD is claiming power efficiency results in terms of SPEC, compiled with GCC 6.2:

AMD Claims: 2P EPYC 7601 vs 2P E5-2699A v4
| Metric                    | SPECint | SPECfp |
|---------------------------|---------|--------|
| Performance               | 1.47x   | 1.75x  |
| Average Power             | 0.96x   | 0.99x  |
| Total System Level Energy | 0.88x   | 0.78x  |
| Overall Perf/Watt         | 1.54x   | 1.76x  |

Comparing a 2P high-end EPYC 7601 server against Intel’s current best 2P E5-2699A v4 arrangement, AMD is claiming 1.54x the perf/watt for integer performance and 1.76x the perf/watt for floating point performance: more performance at a lower average power, resulting in overall efficiency gains. Again, we cannot confirm these numbers, so we look forward to testing.
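As a quick sanity check on AMD's table, the perf/watt ratios can be derived by dividing the performance ratio by the average power ratio. The sketch below uses only the numbers AMD published; the dictionary layout is a hypothetical convenience.

```python
# Ratios from AMD's table (2P EPYC 7601 relative to 2P E5-2699A v4).
claims = {
    "SPECint": {"performance": 1.47, "average_power": 0.96, "perf_per_watt": 1.54},
    "SPECfp": {"performance": 1.75, "average_power": 0.99, "perf_per_watt": 1.76},
}

for suite, c in claims.items():
    derived = c["performance"] / c["average_power"]
    print(f"{suite}: derived {derived:.2f}x vs claimed {c['perf_per_watt']:.2f}x")
```

The derived values land within 0.01 of AMD's stated figures, with the small gap presumably down to rounding in the published inputs.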


131 Comments


  • deltaFx2 - Wednesday, June 21, 2017 - link

    That's because Intel cheats on SPEC in icc by doing transformations that are specifically targeted at making SPEC faster and nothing else. Libquantum is a particularly egregious example where you nearly double the performance by doing tricks that help nothing else. But this is generally true across the suite. It's not unlike VW's emission defeat devices: do something special when you're being tested.

    As for Tom's hardware, they're not authorities on anything server. What they know something about is gaming benchmarking, and that's pretty much it. I don't expect he'd know a thing about it, and whether 20% is correct vs 40%. It's a feeble attempt at sounding clever. The people buying this stuff know what they're doing, and aren't going to be influenced by some online reviewer.
  • Ryan Smith - Tuesday, June 20, 2017 - link

    I've posted galleries of the full slide decks. The slide you're interested in is: http://images.anandtech.com/galleries/5699/epyc_te...

    "Scores for these E5 processors extrapolated from test results published at www.spec.org, applying a conversion multiplier to each published score"
  • patrickjp93 - Wednesday, June 21, 2017 - link

    No, it vastly underestimates and undermines Intel's real-world performance.
  • lefty2 - Tuesday, June 20, 2017 - link

    I think you missed the point of the eight-core processor. That's for GPU compute servers, where you want the cheapest processor possible with the most PCIe lanes. It's probably going to be the one that sells the most, because Intel has nothing comparable.
  • Luckz - Tuesday, June 20, 2017 - link

    Is this useful for the mining craze?
  • LurkingSince97 - Tuesday, June 20, 2017 - link

    probably not. Miners want the most GPU (hashes) per Watt (combined with total price). If they can do that with 5 smaller, cheaper machines vs 1 larger one, they will. Mining does not need coordination across multiple GPUs.

    The enterprisey compute stuff (machine learning being a huge one) often _does_ need to coordinate across GPUs on one big data set, and will run in datacenters where consolidating for performance/$ and performance/Watt will often favor servers with few CPUs and many GPUs, with a ton of I/O and connectivity to other servers.

    Mining doesn't care about I/O, just total # of ports. People even use tools to split up a x16 bus of a normal consumer motherboard into many smaller PCIe ports, each with a GPU on it. The GPU will compute hashes just as well with a x2 port as a x16 one.
  • KAlmquist - Wednesday, June 21, 2017 - link

    EPYC has 128 PCI-e lanes on both 1 socket and 2 socket systems, so if AMD had intended the EPYC 7251 to be used for GPU compute servers, they would have made it part of the single socket lineup. That doesn't mean that the chip won't be used in GPU compute servers; it just means that GPU compute servers are not the market that AMD intended to target with the chip.
  • Zizy - Thursday, June 22, 2017 - link

    All these 2P Epyc CPUs should be just fine in 1P as well. Obviously nobody will buy most of them to run in 1P, as 1P are cheaper. It is just 1P that is limited - it can be *only* used in 1P.
    And given 500-ish price of the chip, I can see why AMD didn't bother to give additional 400 chip for 1P - it wouldn't change anything. And limiting this chip to just 1P would be pointless, as the other segments such as "need tons of memory" would be hurting for no good reason.
  • armtec - Tuesday, June 20, 2017 - link

    NUMA NUMA IEI... I hadn't previously made this connection but now I will every time I need to think about server configs...
  • Barilla - Tuesday, June 20, 2017 - link

    I read that header and now that stupid song will be stuck in my head for days...
