Single-Threaded Performance

I admit, the following two benchmarks are almost irrelevant for anyone buying a Xeon E7 based machine. But still, we have to quench our curiosity: how much have the new cores been improved? There is a lot that can be said about all the sophisticated "uncore" improvements (cache coherency policies, low latency rings, and so on) that allow this multi-core monster to scale, but at the end of the day, good performance starts with a good core. And since we have listed the many subtle core improvements, we could not resist the opportunity to check how each core compares.

The results aren't totally meaningless either, as the profile of a compression algorithm is somewhat similar to many server workloads: hard to extract instruction level parallelism (ILP) and sensitive to memory parallelism and latency. The instruction mix is a bit different, but it's still somewhat similar to many server workloads. And as one more reason to test performance in this manner, the 7-zip source code is available under the GNU LGPL license. That allows us to recompile the source code on every machine with the -O2 optimization with gcc 4.8.1.

We've run an additional data point for this particular set of tests. The new Ivy Bridge EX was tested at 2.8GHz and downclocked to 2.4GHz, so that we can do a clock-for-clock comparison with Westmere EX. Since we're only testing single-threaded performance here, other than perhaps slight differences due to having more total L3 cache, it doesn't matter which particular E7 v2 chip we use.

LZMA single threaded performance: compression

The latest Xeon E7 v2 "Ivy Bridge EX" is capable of extracting 33% more ILP out of the complex compression code than the older Xeon E7 "Westmere-EX" at the same clock speed. That is pretty amazing and shows how all the small micro-architecture improvements have accumulated into a large performance increase. The Opteron core is also better than most people think: at 2.4GHz it would deliver about 2481 MIPs. That is about 80% of Intel's best server core at the moment—not enough, but nothing to be ashamed about.

Also interesting to note is that the Westmere core was indeed a "tick": any performance increase over the Xeon X7560 (Codename "Beckton", 45nm Nehalem core) is simply the result of the higher clockspeed of the 32nm chip.

Let us see how the chips compare in decompression. Decompression is an even lower IPC (Instructions Per Clock) workload, as it is pretty branch intensive and depends on the latencies of the multiply and shift instructions.

LZMA single threaded performance: decompression

Again, we note a 30% improvement in integer performance going from the Xeon E7 "Westmere" (Xeon E7-4870 at 2.4GHz) to the Xeon E7 v2 "Ivy Bridge EX" (Xeon E7-4890 v2 clocked down to 2.4GHz).

To summarize: the new 15-core Xeon E7 v2 is built upon a strong core architecture that has improved significantly compared to the predecessor.

Our Benchmarks and Configuration Multi-Threaded Integer Performance
Comments Locked

125 Comments

View All Comments

  • JohanAnandtech - Saturday, February 22, 2014 - link

    I meant, I have never seen an independent review of high-end IBM or SUN systems. We did one back in the T1 days, but the product performed only well in a very small niche.
  • Phil_Oracle - Monday, February 24, 2014 - link

    Contact your Oracle rep and I am sure we'd be glad to loan you a SPARC T5 server, which we have in our loaner pool for analysts and press. Would be nice if you had a more objective view on comparisons.
  • Phil_Oracle - Monday, February 24, 2014 - link

    If you look at Oracles Performance/Benchmark blog, we have comparisons between Xeon, Power and SPARC based on all publicly available benchmarks. As Oracle sells both x86 as well as SPARC, we sometimes have benchmarks available on both platforms to compare.

    https://blogs.oracle.com/BestPerf/
  • Will Robinson - Saturday, February 22, 2014 - link

    Intel and their CPU technology continues to impress.
    Those kind of performance increase numbers must leave their competitors gasping on the mat.
    Props for the smart new chip. +1
  • Nogoodnms - Saturday, February 22, 2014 - link

    But can it run Crysis?
  • errorr - Saturday, February 22, 2014 - link

    My wife would now the answer to this considering she works for ibm but considering software costs far exceed hardware costs on a life cycle basis does anyone know what the licensing costs are between the different platforms.

    She once had me sit down to explain to her how CPU upgrades would effect db2 licenses. The system is more arcane and I'm not sure what the cost of each core is.

    For an ERP each chip type has a rated pvu metric from IBM which determines the cost of the license. Are RISC cores priced differently than x86 cores enough to partially make up the hardware costs?
  • JohanAnandtech - Sunday, February 23, 2014 - link

    I know Oracle does that (risc core <> x86 core when it comes to licensing), but I must admit, Licensing is extremely boring for a technical motivated person :-).
  • Phil_Oracle - Monday, February 24, 2014 - link

    In total cost of ownership calculations, where both HW and SW as well as maintenance costs are calculated, the majority of the costs (upwards of 90%) are associated with software licensing and maintenance/administration- so although HW costs matter, it’s the performance of the HW that drives the TCO. For Oracle, both Xeon and SPARC have a per core license factor of .5x, meaning 1 x license for every two cores, while Itanium and Power have a 1x multiplier, so therefore Itanium/Power must have a 2x performance/core advantage to have equivalent SW licensing costs. IBM has a PVU scale for SW licensing, which essentially is similar to Oracle but more granular in details. Microsofts latest SQL licensing follows similarly. So clearly, performance/CPU and especially per core matters in driving down licensing costs.
  • Michael REMY - Sunday, February 23, 2014 - link

    that would have be very good to test this cpu on 3D rendering benchmark.
    i can imagine the gain of time in a workstation...even the cost will be nearest a renderfarm...
    but comparing this xeon to other one in that situation should have bring a view point.
  • JohanAnandtech - Sunday, February 23, 2014 - link

    What rendering engine are you thinking about? Most engines scale badly beyond 16-32 threads

Log in

Don't have an account? Sign up now