Single-Threaded Integer Performance

I admit, the following two benchmarks are almost irrelevant for anyone buying a Xeon E5 based machine. But still, we have to satisfy our curiosity: how much have the new cores improved? There is a lot that can be said about the sophisticated "uncore" improvements (cache coherency policies, low latency rings, and so on) that allow this multi-core monster to scale, but at the end of the day, good performance starts with a good core. And since we have listed the many subtle core improvements, we could not resist the opportunity to see how each core compares.

The results aren't totally meaningless either, as the profile of a compression algorithm is somewhat similar to many server workloads: it can be hard to extract instruction level parallelism (ILP), and it's sensitive to memory parallelism and latency. The instruction mix is a bit different, but the overall behavior is close enough to be informative. As one more reason to test performance in this manner, the 7-zip source code is available under the GNU LGPL license. That allows us to recompile the source code on every machine with gcc 4.8.1 at the -O2 optimization level.
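To get a feel for what a single-threaded LZMA run measures, here is a rough sketch using the lzma module from Python's standard library. It is not the 7-zip benchmark behind the charts on this page; the function name time_lzma, the sample payload, and the preset value are assumptions chosen purely for illustration.

```python
import lzma
import os
import time


def time_lzma(payload: bytes, preset: int = 6):
    """Time one single-threaded compress/decompress round trip (illustrative only)."""
    start = time.perf_counter()
    packed = lzma.compress(payload, preset=preset)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    unpacked = lzma.decompress(packed)
    decompress_s = time.perf_counter() - start

    # Sanity check: the round trip must reproduce the input exactly.
    if unpacked != payload:
        raise ValueError("round trip did not reproduce the input")
    return compress_s, decompress_s, len(packed)


if __name__ == "__main__":
    # Hypothetical payload: repetitive text followed by incompressible random bytes.
    payload = (b"server workload sample " * 4096) + os.urandom(64 * 1024)
    c, d, size = time_lzma(payload)
    print(f"compress {c:.4f}s, decompress {d:.4f}s, {len(payload)} -> {size} bytes")
```

Both calls run on a single thread, so the timings mostly track per-core speed, which is the property this page is trying to isolate. Absolute numbers from an interpreter-driven run like this are not comparable with the 7-zip results.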

Single-Threaded LZMA Compression

It looks more boring than it is. First of all, judging by the reactions on forums, many people expected that an 18-core E5-2699 v3 at 2.3GHz would be slower than a 3.2GHz Xeon E5-2667 v3. However, you actually can have it all. The Xeon E5-2699 v3 and 2695 v3 boost their clock speed all the way to 3.6GHz when only one or two cores are active. The Xeon E5-2667 v3's maximum Turbo Boost is the same 3.6GHz, so when only a few threads are active, it has no clock advantage over the "mega/expensive SKUs". Its only edge is that its clock speed will not drop below 3.2GHz when all cores are running at full bore.

Despite its lower IPC, the Xeon E5-2690 core is able to keep up, as it can boost from its standard 2.9GHz clock to 3.8GHz. As it is very hard to extract more IPC out of this kind of code, the extra 200MHz over the newer chips' 3.6GHz is enough to close the gap.
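The trade-off can be put in back-of-the-envelope form: under a simple model where single-threaded throughput is IPC times clock speed, a 3.8GHz core matches a 3.6GHz core as long as the newer core's IPC advantage stays below roughly 5.6%. The sketch below only does that arithmetic. The two clock speeds come from the text; the model itself and the function names are simplifying assumptions, and no IPC value here is a measurement.

```python
def relative_throughput(ipc: float, clock_ghz: float) -> float:
    """Billions of instructions per second under a simple IPC x clock model."""
    return ipc * clock_ghz


def breakeven_ipc_gain(higher_clock_ghz: float, lower_clock_ghz: float) -> float:
    """Fractional IPC advantage the lower-clocked core needs to draw level."""
    return higher_clock_ghz / lower_clock_ghz - 1.0


if __name__ == "__main__":
    # Turbo clocks quoted in the article: 3.8GHz (E5-2690) vs 3.6GHz (E5 v3 parts).
    gain = breakeven_ipc_gain(3.8, 3.6)
    print(f"IPC gain needed to match the 3.8GHz core: {gain:.1%}")
```

This ignores cache and memory latency entirely, which the decompression results suggest matter as well, so treat it as a first-order estimate only.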

Let's see how the chips compare in decompression. Decompression is an even lower IPC (Instructions Per Clock) workload, as it is very branch intensive and depends on the latencies of the multiply and shift instructions.

Single-Threaded LZMA Decompression

The older Xeon E5 takes the lead, as decompression runs at very low IPC and mostly depends on clock speed and low latency accesses. The new Xeon E5 v3 has slightly higher latency in both L3 cache and memory, so it falls behind.

What makes this benchmark interesting is that it proves that Turbo Boost works very well, even on an 18-core chip with a massive die. This is a big bonus: when you are setting up or preparing a system for production, it is very likely that you will be waiting for some single-threaded application to finish. It also means that if one heavy request hits the server while it is running at very low load, the response time of the request will be low, keeping impatient users happy.

85 Comments

  • SuperVeloce - Tuesday, September 9, 2014 - link

    Oh, nevermind... I unknowingly caught an error.
  • JohanAnandtech - Tuesday, September 9, 2014 - link

    thx! Fixed. Sorry for the late reaction, jetlagged and trying to get to the hectic pace of IDF :-)
  • hescominsoon - Tuesday, September 9, 2014 - link

    As long as AMD continues its idiotic two integer units sharing an fpu design they will be an afterthought in the cpu department.
  • nils_ - Sunday, September 14, 2014 - link

    Serious competition for Intel will not come from AMD any time soon, but possibly IBM with the POWER8. Tyan even came out with a single socket board for that CPU, so it might make its way into the same market soon.
  • ScarletEagle - Tuesday, September 16, 2014 - link

    Any feel for the relative HPC performance of the E5-2680v3 with respect to the E5-2650Lv3? I am looking at purchasing a PowerEdge 730 with two of these and the 2133MHz RAM. My guess is that the higher base clock speed should make somewhat of an improvement?