Single-Threaded Integer Performance

I admit, the following two benchmarks are almost irrelevant for anyone buying a Xeon E5 based machine. But still, we have to quench our curiosity: how much have the new cores been improved? There is a lot that can be said about the sophisticated "uncore" improvements (cache coherency policies, low latency rings, and so on) that allow this multi-core monster to scale, but at the end of the day, good performance starts with a good core. And since we have listed the many subtle core improvements, we could not resist the opportunity to see how each core compares.

The results aren't totally meaningless either, as the profile of a compression algorithm is somewhat similar to many server workloads: it can be hard to extract instruction level parallelism (ILP) and it's sensitive to memory parallelism and latency. The instruction mix is a bit different, but it's still somewhat similar to many server workloads. And as one more reason to test performance in this manner, the 7-zip source code is available under the GNU LGPL license. That allows us to recompile the source code on every machine with the -O2 optimization with gcc 4.8.1.

Single Threaded LZMA Compression

It looks more boring than it is. First of all, judging by the reactions on forums, many people expected that an 18-core E5-2699 v3 at 2.3GHz would be slower than a 3.2GHz Xeon E5-2667 v3. However you actually can have it all. The Xeon E5-2699 v3 and 2695 v3 boost their clock speed to no less than 3.6GHz when only one or two cores are active. The Xeon E5-2667 v3's maximum Turbo Boost is also the same 3.6GHz, so when only a few threads are active, the Xeon E5-2667 v3 has no clock advantage over the "mega/expensive SKUs" other than the fact that the clock speed will not drop lower than 3.2GHz if all cores are running at full bore.

Despite the fact that the Xeon E5-2690 core has lower IPC, it is able to keep up as it can boost the standard clock speed from 2.9 to 3.8GHz. As it is very hard to extract more IPC out of this kind of code, the extra 200MHz is enough to keep up.

Let's see how the chips compare in decompression. Decompression is an even lower IPC (Instructions Per Clock) workload, as it is very branch intensive and depends on the latencies of the multiply and shift instructions.

Single threaded LZMA decompression

The older Xeon E5 takes the lead as decompression runs at very low IPC and is mostly depended on clock speed and low latency accesses. The new Xeon E5 v3 has slightly higher latency in both L3 cache and memory, so it falls behind.

What makes this benchmark interesting is that it proves that Turbo Boost works very well, even on an 18-core chip with a massive die. This is a big bonus, as especially in situations where you are setting up/preparing a system to be productive, it is very likely that you will be waiting for some single-threaded application to end. It also means that if one heavy request hits the server while it is running at very low load, the response time of the request will be low, keeping the impatient users happy.

Memory Subsystem: Latency Multi-Threaded Integer Performance
Comments Locked

85 Comments

View All Comments

  • bsd228 - Friday, September 12, 2014 - link

    Now go price memory for M class Sun servers...even small upgrades are 5 figures and going 4 years back, a mid sized M4000 type server was going to cost you around 100k with moderate amounts of memory.

    And take up a large portion of the rack. Whereas you can stick two of these 18 core guys in a 1U server and have 10 of them (180 cores) for around the same sort of money.

    Big iron still has its place, but the economics will always be lousy.
  • platinumjsi - Tuesday, September 9, 2014 - link

    ASRock are selling boards with DDR3 support, any idea how that works?

    http://www.asrockrack.com/general/productdetail.as...
  • TiGr1982 - Tuesday, September 9, 2014 - link

    Well... ASRock is generally famous "marrying" different gen hardware.
    But here, since this is about DDR RAM, governed by the CPU itself (because memory controller is inside the CPU), then my only guess is Xeon E5 v3 may have dual-mode memory controller (supporting either DDR4 or DDR3), similarly as Phenom II had back in 2009-2011, which supported either DDR2 or DDR3, depending on where you plugged it in.

    If so, then probably just the performance of E5 v3 with DDR3 may be somewhat inferior in comparison with DDR4.
  • alpha754293 - Tuesday, September 9, 2014 - link

    No LS-DYNA runs? And yes, for HPC applications, you actually CAN have too many cores (because you can't keep the working cores pegged with work/something to do, so you end up with a lot of data migration between cores, which is bad, since moving data means that you're not doing any useful work ON the data).

    And how you decompose the domain (for both LS-DYNA and CFD makes a HUGE difference on total runtime performance).
  • JohanAnandtech - Tuesday, September 9, 2014 - link

    No, I hope to get that one done in the more Windows/ESXi oriented review.
  • Klimax - Tuesday, September 9, 2014 - link

    Nice review. Next stop: Windows Server. (And MS-SQL..)
  • JohanAnandtech - Tuesday, September 9, 2014 - link

    Agreed. PCIe Flash and SQL server look like a nice combination to test this new Xeons.
  • TiGr1982 - Tuesday, September 9, 2014 - link

    Xeon 5500 series (Nehalem-EP): up to 4 cores (45 nm)
    Xeon 5600 series (Westmere-EP): up to 6 cores (32 nm)
    Xeon E5 v1 (Sandy Bridge-EP): up to 8 cores (32 nm)
    Xeon E5 v2 (Ivy Bridge-EP): up to 12 cores (22 nm)
    Xeon E5 v3 (Haswell-EP): up to 18 cores (22 nm)

    So, in this progression, core count increases by 50% (1.5 times) almost each generation.

    So, what's gonna be next:

    Xeon E5 v4 (Broadwell-EP): up to 27 cores (14 nm) ?

    Maybe four rows with 5 cores and one row with 7 cores (4 x 5 + 7 = 27) ?
  • wallysb01 - Wednesday, September 10, 2014 - link

    My money is on 24 cores.
  • SuperVeloce - Tuesday, September 9, 2014 - link

    What's the story with 2637v3? Only 4 cores and the same freqency and $1k price as 6core 2637v2? By far the most pointless cpu on the list.

Log in

Don't have an account? Sign up now