Comparing With the Other ARMs

We did not have access to any recent Cortex-A57 or X-Gene platform to run the full SPEC CPU2006 suite, but we can still combine our previous findings with the results published on 7-cpu.com. The first X-Gene 1 result is our own measurement; the second is the best one we could find.

| SKU | Clock (GHz) | Compress | Decompress |
|---|---|---|---|
| Atom C2720 | 2.4 | 1687 | 2114 |
| X-Gene 1 (AT bench) | 2.4 | 1580 | 1864 |
| X-Gene 1 (best) | 2.4 | 1770 | 1980 |
| Cortex-A57 | 1.9 | 1500 | 2330 |
| ThunderX | 2.0 | 1547 | 2042 |
| Xeon D-1557 | 1.5-2.1 | 3079 | 2320 |
| Xeon E5-2640 v4 | 2.4-2.6 | 3755 | 2943 |
| Xeon E5-2690 v3 | 2.6-3.5 | 4599 | 3811 |

Let's translate this to percentages, comparing the ThunderX to the Xeon D and the Cortex-A57, the two architectures it must try to beat. Beating the first would open up a broader market; beating the second would justify the development of a homegrown ARMv8 microarchitecture.

| SKU | Clock (GHz) | Compress (Xeon D = 100%) | Decompress (Xeon D = 100%) | Compress (A57 = 100%) | Decompress (A57 = 100%) |
|---|---|---|---|---|---|
| Atom C2720 | 2.4 | 55% | 91% | 112% | 91% |
| X-Gene 1 (AT bench) | 2.4 | 51% | 80% | 105% | 80% |
| X-Gene 1 (best) | 2.4 | 57% | 85% | 118% | 85% |
| Cortex-A57 | 1.9 | 49% | 100% | 100% | 100% |
| ThunderX | 2.0 | 50% | 88% | 103% | 88% |
| Xeon D-1557 | 2.1 | 100% | 100% | 205% | 100% |
| Xeon E5-2640 v4 | 2.4 | 122% | 127% | 250% | 126% |
| Xeon E5-2690 v3 | 3.5 | 149% | 164% | 307% | 164% |
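
For readers who want to check the normalization, here is a minimal Python sketch of how the percentages can be derived from the raw scores in the first table. The `relative` helper and the dictionary layout are our own illustration, not part of any benchmark tool, and we assume simple rounding to whole percentages.

```python
# Raw scores copied from the first table: (compress, decompress).
scores = {
    "Atom C2720": (1687, 2114),
    "X-Gene 1 (AT bench)": (1580, 1864),
    "X-Gene 1 (best)": (1770, 1980),
    "Cortex-A57": (1500, 2330),
    "ThunderX": (1547, 2042),
    "Xeon D-1557": (3079, 2320),
    "Xeon E5-2640 v4": (3755, 2943),
    "Xeon E5-2690 v3": (4599, 3811),
}


def relative(scores, baseline):
    """Express every SKU's scores as whole percentages of the baseline SKU.

    Hypothetical helper for illustration; rounding to the nearest whole
    percent is an assumption about how the table was produced.
    """
    base_compress, base_decompress = scores[baseline]
    return {
        sku: (
            round(100 * compress / base_compress),
            round(100 * decompress / base_decompress),
        )
        for sku, (compress, decompress) in scores.items()
    }


vs_xeon_d = relative(scores, "Xeon D-1557")
vs_a57 = relative(scores, "Cortex-A57")

for sku in scores:
    print(sku, vs_xeon_d[sku], vs_a57[sku])
```

Swapping the `baseline` argument is all it takes to normalize against a different chip, which makes it easy to rerun the comparison if better raw scores turn up.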

First of all, these benchmarks should be placed in perspective: they tend to have a different profile than most server applications. For example, compression relies heavily on memory latency and TLB efficiency, while decompression relies on integer instructions (shift, multiply). Since the decompression test has unpredictable branches, the ThunderX has an advantage there.

The ThunderX at 2 GHz performs more or less like an A57 core at the same speed. Considering that AMD only got eight A57 cores inside a power envelope of 32W using similar process technology, you could imagine that an A57 chip would be able to fit 32 cores at the most in a 120W TDP envelope. So Cavium did quite well fitting about 50% more cores inside the same power envelope using an old 28 nm high-k metal gate process.
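
The back-of-the-envelope estimate above can be written out as a short Python sketch. It assumes power scales linearly with core count and ignores uncore and I/O power, which is a simplification. The 48-core figure for the ThunderX is not stated in this section; we use it here as an assumption consistent with the "about 50% more cores" remark.

```python
# Figures taken from the text: AMD's eight A57 cores in a 32W envelope.
amd_cores = 8
amd_tdp_w = 32
target_tdp_w = 120

# Assumption: power scales linearly with core count (uncore and I/O ignored).
watts_per_core = amd_tdp_w / amd_cores
linear_estimate = target_tdp_w / watts_per_core

# The text rounds this up to an optimistic ceiling of 32 cores.
a57_core_ceiling = 32

# Assumption: a 48-core ThunderX (not stated in this section).
thunderx_cores = 48
extra_cores = thunderx_cores / a57_core_ceiling - 1

print(f"Power per A57 core: {watts_per_core:.1f} W")
print(f"Linear estimate for {target_tdp_w} W: {linear_estimate:.0f} cores")
print(f"ThunderX core advantage over the ceiling: {extra_cores:.0%}")
```

Against the strict linear estimate rather than the rounded-up ceiling, the ThunderX core advantage would come out somewhat larger.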

Nevertheless, a 120W Xeon E5 offers about 2.5-3 times higher compression performance. The gap is indeed much smaller in decompression, where the wide Broadwell core is only 13% (!) faster than the narrow ThunderX core (compare the Xeon D-1557 with the ThunderX).
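
The ratios quoted above can be recomputed from the raw scores in the first table. The snippet below is only a sanity check with variable names of our own choosing; small differences from the figures in the text come down to rounding.

```python
# Raw scores from the first table.
thunderx_compress = 1547
thunderx_decompress = 2042
xeon_d_decompress = 2320
xeon_e5_compress = {
    "Xeon E5-2640 v4": 3755,
    "Xeon E5-2690 v3": 4599,
}

# Compression: how many times faster each Xeon E5 is than the ThunderX.
e5_ratios = {
    sku: score / thunderx_compress for sku, score in xeon_e5_compress.items()
}

# Decompression: Xeon D-1557 relative to the ThunderX.
decompress_ratio = xeon_d_decompress / thunderx_decompress

for sku, ratio in e5_ratios.items():
    print(f"{sku}: {ratio:.2f}x the ThunderX compression score")
print(f"Xeon D-1557: {decompress_ratio:.2f}x the ThunderX decompression score")
```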
