The Ampere Altra Max Review: Pushing it to 128 Cores per Socketby Andrei Frumusanu on October 7, 2021 8:00 AM EST
SPEC - Per-Core Performance under Load
A metric that is actually more interesting than isolated single-thread performance, is actually per-thread performance in a fully loaded system. This actually is a measurement and benchmark figure that would greatly interest enterprises and customers which are running software or workloads that are possibly licensed on a per-core basis, or simply workloads that require a certain level of per-thread service level agreement in terms of performance.
The Altra Max here is inevitably expected to post worse metrics than the Altra – first of all due to the 10% lower core frequencies, and second of all due to lower shared resources that need to be shared amongst more cores.
As expected, taking view of the aggregate socket performance figures here, the M128-80 doesn’t fare well here in the metric, as it takes the much controversial “flock of chickens” approach to core performance. It’s also dragged down by the score regressions in the memory bound workload in SPEC.
In terms of total throughput vs per-thread performance, the M128-30 barely differs to the EPYC 7763 with SMT and half the physical cores.
Again, I have to reiterate that these figures are all very much not in favour of the workloads that the Altra Max was designed for – cloud and hyperscaler workloads, which don’t tend to have the same harsher memory demands as some of the workloads in the SPEC suite. However, extracting those workloads in a different subset would also be a questionable practise and actually frowned upon.
The Altra Max here really does need to have huge footnotes in the way that it presents itself.