Investigating Cavium's ThunderX: The First ARM Server SoC With Ambition
by Johan De Gelas on June 15, 2016 8:00 AM EST
Posted in: SoCs, IT Computing, Enterprise, Enterprise CPUs, Microserver, Cavium
Comparing With the Other ARMs
We did not have access to any recent Cortex-A57 or X-Gene platform to run the full SPEC CPU2006 suite, but we can still combine our previous findings with the results published on 7-cpu.com. The first X-Gene 1 result is our own measurement; the second is the best we could find.
SKU | Clock (GHz) | Compress | Decompress |
Atom C2720 | 2.4 | 1687 | 2114 |
X-Gene 1 (AT bench) | 2.4 | 1580 | 1864 |
X-Gene 1 (best) | 2.4 | 1770 | 1980 |
Cortex-A57 | 1.9 | 1500 | 2330 |
ThunderX | 2.0 | 1547 | 2042 |
Xeon D-1557 | 1.5-2.1 | 3079 | 2320 |
Xeon E5-2640 v4 | 2.4-2.6 | 3755 | 2943 |
Xeon E5-2690 v3 | 2.6-3.5 | 4599 | 3811 |
Let's translate this to percentages, comparing the ThunderX's performance to that of the Xeon D and the Cortex-A57, the two architectures it must try to beat. Beating the first would open up a broader market; beating the second would justify the development of a homegrown ARMv8 microarchitecture.
SKU | Clock (GHz) | Baseline Xeon D Compress | Baseline Xeon D Decompress | Baseline A57 Compress | Baseline A57 Decompress |
Atom C2720 | 2.4 | 55% | 91% | 112% | 91% |
X-Gene 1 (AT bench) | 2.4 | 51% | 80% | 105% | 80% |
X-Gene 1 (best) | 2.4 | 57% | 85% | 118% | 85% |
Cortex-A57 | 1.9 | 49% | 100% | 100% | 100% |
ThunderX | 2.0 | 50% | 88% | 103% | 88% |
Xeon D-1557 | 2.1 | 100% | 100% | 205% | 100% |
Xeon E5-2640 v4 | 2.4 | 122% | 127% | 250% | 126% |
Xeon E5-2690 v3 | 3.5 | 149% | 164% | 307% | 164% |
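The percentages in the table above can be reproduced from the raw scores by dividing each score by the baseline chip's score. The snippet below is a minimal sketch of that normalization; the dictionary layout and the `relative_to` helper are illustrative names, not part of the original benchmark tooling.

```python
# Raw scores copied from the first table: (compress, decompress).
SCORES = {
    "Atom C2720": (1687, 2114),
    "X-Gene 1 (AT bench)": (1580, 1864),
    "X-Gene 1 (best)": (1770, 1980),
    "Cortex-A57": (1500, 2330),
    "ThunderX": (1547, 2042),
    "Xeon D-1557": (3079, 2320),
    "Xeon E5-2640 v4": (3755, 2943),
    "Xeon E5-2690 v3": (4599, 3811),
}


def relative_to(baseline):
    """Return every SKU's (compress, decompress) score as a percentage of the baseline SKU."""
    base_compress, base_decompress = SCORES[baseline]
    return {
        sku: (100.0 * compress / base_compress, 100.0 * decompress / base_decompress)
        for sku, (compress, decompress) in SCORES.items()
    }


if __name__ == "__main__":
    vs_xeon_d = relative_to("Xeon D-1557")
    vs_a57 = relative_to("Cortex-A57")
    for sku in SCORES:
        xd_c, xd_d = vs_xeon_d[sku]
        a57_c, a57_d = vs_a57[sku]
        # Same column order as the percentage table, rounded to whole percent.
        print(f"{sku:20} | {xd_c:.0f}% | {xd_d:.0f}% | {a57_c:.0f}% | {a57_d:.0f}% |")
```

Because both baselines are derived from the same raw scores, adding another chip only requires one new entry in `SCORES`.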
First of all, these benchmarks should be placed in perspective: they tend to have a different profile than most server applications. For example, compression relies heavily on memory latency and TLB efficiency, while decompression relies on integer instructions (shift, multiply). Since the decompression test has unpredictable branches, the ThunderX has an advantage there.
The ThunderX at 2 GHz performs more or less like an A57 core at the same speed. Considering that AMD only managed to fit eight A57 cores inside a 32W power envelope using similar process technology, you could imagine that an A57 chip would fit 32 cores at the most in a 120W TDP envelope. So Cavium did quite well by fitting about 50% more cores inside the same power envelope using an old 28 nm high-k metal gate process.
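The core-count estimate above is a simple linear scaling of cores per watt. The sketch below walks through that arithmetic. It assumes power scales linearly with core count, which ignores uncore, memory controller, and I/O power. The 48-core figure for the ThunderX is not stated in this section and is taken here as an assumption about the top ThunderX SKU; `cores_in_budget` is a hypothetical helper name.

```python
def cores_in_budget(ref_cores, ref_watts, budget_watts):
    """Linearly scale a core count from a reference power envelope to a new budget."""
    watts_per_core = ref_watts / ref_cores
    return budget_watts / watts_per_core


# AMD's A57 chip as described in the text: eight cores in 32 W.
a57_estimate = cores_in_budget(ref_cores=8, ref_watts=32, budget_watts=120)

A57_UPPER_BOUND = 32  # the text's "32 cores at the most" figure
THUNDERX_CORES = 48   # assumption: core count of the top ThunderX SKU, not stated in this section

advantage = THUNDERX_CORES / A57_UPPER_BOUND - 1

if __name__ == "__main__":
    print(f"Linear A57 estimate at 120 W: {a57_estimate:.0f} cores")
    print(f"ThunderX core advantage over a 32-core A57 chip: {advantage:.0%}")
```

The strict linear estimate lands slightly below the 32-core upper bound used in the text, so the "about 50% more cores" figure is, if anything, conservative under these assumptions.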
Nevertheless, a 120W Xeon E5 offers about 2.5 to 3 times higher compression performance. The gap is much smaller in decompression, where the wide Broadwell core is only about 14% (!) faster than the narrow ThunderX core (compare the Xeon D-1557 with the ThunderX).
82 Comments
BlueBlazer - Friday, June 17, 2016
Cavium is quite aware of their ThunderX single thread weakness, and directly from Cavium themselves https://www.youtube.com/watch?v=ei9uVskwPNE thanks to ARMdevices.net.
TiffanyTown - Thursday, July 28, 2016
Hi, the JDK version you used is OpenJDK 1.8.0_91. Did you build it yourself?