Investigating Cavium's ThunderX: The First ARM Server SoC With Ambitionby Johan De Gelas on June 15, 2016 8:00 AM EST
Floating Point: C-ray
Shifting over from integer to floating point benchmarks we have C-ray. C-ray is an extremely simple ray-tracer which is not representative of any real world raytracing application. In fact, it is essentially a floating point benchmark that runs out the L1-cache. That said, it is not as synthetic and meaningless as Whetstone, as you can actually use the software to do simple raytracing. We use this benchmark because it allows us to isolate the FP performance and the energy consumption from other factors such as L2/L3 cache/memory subsystem.
We compiled the C-ray multi-threaded version with -O3 -ffast-math. Real floating point intensive applications tend to put the memory subsystem under pressure, and running a second thread makes it only worse. So we are used to seeing that many HPC applications perform worse with multi-threading on. But since C-ray runs mostly out of the L1-cache, we get different behavior.
This is the most favorable floating point benchmark that we could run on the ThunderX: it does not use the high latency blocking L2-cache, nor does it needs to access the DRAMs. Also, the Xeons cannot really fully flex their AVX muscles. So take this with a large grain of salt.
In these situations, the ThunderX performs like the midrange Xeon E5-2640 v4.