Single-Threaded Performance

I admit, the following two benchmarks are almost irrelevant for anyone buying a Xeon E7 based machine. But still, we have to satisfy our curiosity: how much have the new cores been improved? There is a lot that can be said about all the sophisticated "uncore" improvements (cache coherency policies, low latency rings, and so on) that allow this multi-core monster to scale, but at the end of the day, good performance starts with a good core. And since we have listed the many subtle core improvements, we could not resist the opportunity to check how each core compares.

The results aren't totally meaningless either, as the profile of a compression algorithm is somewhat similar to many server workloads: it is hard to extract instruction level parallelism (ILP) from the code, and performance is sensitive to memory parallelism and latency. The instruction mix is a bit different, but the overall behavior is comparable. And as one more reason to test performance in this manner, the 7-zip source code is available under the GNU LGPL. That allows us to recompile the source code on every machine with gcc 4.8.1 at the -O2 optimization level.

We've run an additional data point for this particular set of tests. The new Ivy Bridge EX was tested at 2.8GHz and downclocked to 2.4GHz, so that we can do a clock-for-clock comparison with Westmere EX. Since we're only testing single-threaded performance here, other than perhaps slight differences due to having more total L3 cache, it doesn't matter which particular E7 v2 chip we use.

LZMA single threaded performance: compression

The latest Xeon E7 v2 "Ivy Bridge EX" is capable of extracting 33% more ILP out of the complex compression code than the older Xeon E7 "Westmere-EX" at the same clock speed. That is pretty amazing and shows how all the small microarchitecture improvements have accumulated into a large performance increase. The Opteron core is also better than most people think: at 2.4GHz it would deliver about 2481 MIPS. That is about 80% of Intel's best server core at the moment: not enough, but nothing to be ashamed of.
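The 2.4GHz Opteron estimate above is a clock-normalized figure rather than a direct measurement. As a minimal sketch of that kind of arithmetic, assuming performance scales linearly with clock speed (an optimistic assumption, since memory latency does not shrink with the core clock), the calculation might look like the following. The function names and the sample inputs are hypothetical and are not taken from our results.

```python
def scale_to_clock(mips: float, measured_ghz: float, target_ghz: float) -> float:
    """Estimate a MIPS score at target_ghz, assuming linear scaling with clock."""
    if measured_ghz <= 0 or target_ghz <= 0:
        raise ValueError("clock speeds must be positive")
    return mips * target_ghz / measured_ghz


def per_clock_gain(new_mips: float, old_mips: float) -> float:
    """Relative gain of one core over another when both run at the same clock."""
    if old_mips <= 0:
        raise ValueError("baseline score must be positive")
    return new_mips / old_mips - 1.0


if __name__ == "__main__":
    # Hypothetical inputs for illustration only, not measured scores.
    estimate = scale_to_clock(2000.0, measured_ghz=2.0, target_ghz=2.4)
    gain = per_clock_gain(new_mips=1330.0, old_mips=1000.0)
    print(f"estimated score at 2.4GHz: {estimate:.0f} MIPS")
    print(f"per-clock gain: {gain:.0%}")
```

Downclocking the actual chip, as we did with the Ivy Bridge EX, avoids the linear-scaling assumption entirely, which is why a measured clock-for-clock result is preferable whenever the hardware allows it.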

Also interesting to note is that the Westmere core was indeed a "tick": any performance increase over the Xeon X7560 (codename "Beckton", 45nm Nehalem core) is simply the result of the higher clock speed of the 32nm chip.

Let us see how the chips compare in decompression. Decompression is an even lower IPC (Instructions Per Clock) workload, as it is pretty branch intensive and depends on the latencies of the multiply and shift instructions.

LZMA single threaded performance: decompression

Again, we note a 30% improvement in integer performance going from the Xeon E7 "Westmere" (Xeon E7-4870 at 2.4GHz) to the Xeon E7 v2 "Ivy Bridge EX" (Xeon E7-4890 v2 clocked down to 2.4GHz).

To summarize: the new 15-core Xeon E7 v2 is built upon a strong core architecture that has improved significantly compared to the predecessor.


125 Comments


  • Kevin G - Saturday, February 22, 2014 - link

    I'm not 100% sure since I'm not an IEEE member and can't view it, but this paper may be the source for the POWER7+ figures:
    http://ieeexplore.ieee.org/xpl/articleDetails.jsp?...
  • Phil_Oracle - Monday, February 24, 2014 - link

    TDP is great for comparing chip to chip, but what really matters is system performance/watt. And although Intel's latest Xeon E7 v2 may have better TDP specs than either Power7+ or SPARC T5, when you look at the total system performance/watt, SPARC T5 actually leads today due to its higher throughput, core count, 4 x more threads, built-in encryption engines and higher optimization with the Oracle SW stack.
  • Flunk - Friday, February 21, 2014 - link

    8 core consumer chips now please. If you have to take the GPU off go for it.
  • DanNeely - Friday, February 21, 2014 - link

    Assuming you mean 8 identical cores, it's not going to happen until mainstream consumer apps become common that can use more CPU resources than the 4 HT cores in Intel's high end consumer chips but can't benefit from GPU acceleration.

    I suppose Intel could at some point do a big.LITTLE type implementation with either Core and Atom, or Atom and the super low power 486ish architecture they announced a few months ago. But in addition to thinking it was worthwhile for the power savings, they'd also need to license or work around ARM's patents. I suppose a mobile version might happen someday, but I don't really see a plausible benefit for laptop/desktop systems that don't need continuous connected standby like phones do.
  • Kevin G - Friday, February 21, 2014 - link

    Although Intel hasn't announced any distinct plans to go this route, they're at least exploring the idea at some level. SkyLake and Knights Landing are to support the same ISA extensions, and in principle a program could migrate between the two types of cores.
  • StevoLincolnite - Saturday, February 22, 2014 - link

    Er. You don't need apps to use more than 4 threads to make use of an 8 core processor.
    Whatever happened to running several demanding applications at once? Surely I am not the only one who does this...
    My Sandy-Bridge-E processor, being a few years old, is starting to show its age in such instances. I would cry tears of blood for an 8-core Haswell based processor to replace my current 6-core chip.
  • psyq321 - Monday, March 10, 2014 - link

    Well, you can buy a bigger Ivy Bridge EP Xeon CPU and fit it in your LGA2011 system.

    This way you can go up to 12 cores and not have to wait for 8-core Haswell E.
  • SirKnobsworth - Friday, February 21, 2014 - link

    8 core Haswell-E chips are due out later this year. You can already buy 6 core Ivy Bridge-E chips with no integrated graphics.
  • TiGr1982 - Friday, February 21, 2014 - link

    Did you know:
    Haswell-E is supposed to be released in Q3 this year, to have up to 8 Haswell cores with HT, fit in the new revision of Socket LGA2011 (incompatible with the current desktop LGA2011), and work with DDR4 and X99 chipset. No GPU there, since it's a byproduct of server Haswell-EP.
  • Harry Lloyd - Friday, February 21, 2014 - link

    That will not help much, unless they release a 6-core chip for around $300, replacing the lowest LGA2011 4-core chips. It is about time.
