Investigating Cavium's ThunderX: The First ARM Server SoC With Ambition
by Johan De Gelas on June 15, 2016 8:00 AM EST
Comparing With the Other ARMs
We did not have access to any recent Cortex-A57 or X-Gene platform to run the full SPEC CPU2006 suite. We can, however, combine our earlier findings with the results published on 7-cpu.com. The first X-Gene 1 result below is our own measurement; the second is the best result we could find.
|SKU|Clock (GHz)|Compression|Decompression|
|---|---|---|---|
|X-Gene 1 (AT bench)|2.4|1580|1864|
|X-Gene 1 (best)|2.4|1770|1980|
|Xeon E5-2640 v4|2.4-2.6|3755|2943|
|Xeon E5-2690 v3|2.6-3.5|4599|3811|
Let's translate this to percentages, using the Xeon D and the Cortex-A57 as baselines: they are the two architectures the ThunderX must beat. Beating the Xeon D would open up a broader market; beating the Cortex-A57 would justify developing a homegrown ARMv8 microarchitecture.
|SKU|Clock (GHz)|Compress (Xeon D = 100%)|Decompress (Xeon D = 100%)|Compress (A57 = 100%)|Decompress (A57 = 100%)|
|---|---|---|---|---|---|
|X-Gene 1 (AT bench)|2.4|51%|80%|105%|80%|
|Xeon E5-2640 v4|2.4|122%|127%|250%|126%|
|Xeon E5-2690 v3|3.5|149%|164%|307%|164%|
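Because every cell in a percentage column is divided by the same baseline score, that baseline cancels when two SKUs in the same column are compared: the ratio of their percentages should match the ratio of their raw scores in the first table, up to rounding. The Python sketch below performs that cross-check. The names `RAW`, `PCT`, `relative_percent` and `max_ratio_error` are our own, and picking the Xeon E5-2640 v4 as the reference row is an arbitrary choice.

```python
"""Cross-check the percentage table against the raw scores.

percent = 100 * score / baseline, so within one column the baseline
cancels: pct_a / pct_b == score_a / score_b (up to rounding).
"""

# Raw (compress, decompress) scores from the first table.
RAW = {
    "X-Gene 1 (AT bench)": (1580, 1864),
    "Xeon E5-2640 v4": (3755, 2943),
    "Xeon E5-2690 v3": (4599, 3811),
}

# Rounded percentages from the second table, column order:
# (Xeon D compress, Xeon D decompress, A57 compress, A57 decompress)
PCT = {
    "X-Gene 1 (AT bench)": (51, 80, 105, 80),
    "Xeon E5-2640 v4": (122, 127, 250, 126),
    "Xeon E5-2690 v3": (149, 164, 307, 164),
}


def relative_percent(score: float, baseline: float) -> float:
    """Express a score as a percentage of a baseline score (100 = equal)."""
    return 100.0 * score / baseline


def max_ratio_error(ref: str = "Xeon E5-2640 v4") -> float:
    """Largest relative mismatch between raw-score and percentage ratios.

    Every SKU is compared with the reference SKU in all four percentage
    columns; even columns are compression, odd columns decompression.
    """
    worst = 0.0
    for sku, pcts in PCT.items():
        for col, pct in enumerate(pcts):
            test = col % 2  # 0 = compress, 1 = decompress
            raw_ratio = RAW[sku][test] / RAW[ref][test]
            pct_ratio = pct / PCT[ref][col]
            worst = max(worst, abs(raw_ratio - pct_ratio) / raw_ratio)
    return worst


print(f"Worst mismatch: {max_ratio_error():.2%}")
```

The worst mismatch stays below 1%, which is what rounding the percentages to whole numbers would produce, so the two tables are consistent with each other.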
First of all, these benchmarks should be put in perspective: they tend to have a different profile than most server applications. Compression, for example, relies heavily on memory latency and TLB efficiency, while decompression relies on integer instructions (shift, multiply). Decompression also has many unpredictable branches, which gives the ThunderX an advantage: its short pipeline pays a smaller penalty for each mispredicted branch.
The ThunderX at 2 GHz performs more or less like an A57 core at the same clock speed. AMD could fit only eight A57 cores inside a 32W power envelope using similar process technology, so an A57 chip would fit 32 cores at most in a 120W TDP envelope. Cavium thus did quite well by fitting about 50% more cores inside the same power envelope, using an older 28 nm high-k metal gate process.
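The core-count estimate above is simple proportional scaling. The sketch below reproduces it under one assumption of ours: power grows linearly with core count, so the whole 32W of the eight-core A57 chip is charged to its cores. Real chips also spend power on the uncore, memory controllers and I/O, so treat the result as a rough estimate. The 48-core figure is our assumption for the ThunderX, consistent with "about 50% more" than 32 cores; the constant and function names are hypothetical.

```python
"""Rough core budget for a 120 W Cortex-A57 chip.

Assumption (ours, not the article's): power scales linearly with core
count, so all 32 W of the eight-core A57 chip is treated as core power.
"""

A57_CORES = 8            # AMD's eight-core Cortex-A57 chip
A57_POWER_W = 32.0       # its power envelope
TARGET_POWER_W = 120.0   # ThunderX-class TDP envelope
A57_UPPER_BOUND = 32     # the article's "32 cores at the most"
THUNDERX_CORES = 48      # assumed 48-core ThunderX


def cores_in_budget(cores: int, power_w: float, budget_w: float) -> float:
    """Scale a known core count linearly to a different power budget."""
    return cores * budget_w / power_w


estimate = cores_in_budget(A57_CORES, A57_POWER_W, TARGET_POWER_W)
advantage = THUNDERX_CORES / A57_UPPER_BOUND

print(f"Linear A57 estimate for {TARGET_POWER_W:.0f} W: {estimate:.0f} cores")  # 30 cores
print(f"ThunderX core-count advantage: {advantage:.2f}x")  # 1.50x
```

Pure linear scaling lands at 30 cores, just under the "32 at the most" bound, and 48 cores over 32 gives the roughly 50% advantage.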
Nevertheless, a 120W Xeon E5 offers about 2.5-3 times higher compression performance. The gap is indeed much smaller in decompression, where the wide Broadwell core is only 13% (!) faster than the narrow ThunderX core (compare the Xeon D-1557 with the ThunderX).