Single-Threaded Performance

I admit, the following two benchmarks are almost irrelevant for anyone buying a Xeon E7 based machine. But still, we have to satisfy our curiosity: how much have the new cores been improved? There is a lot that can be said about all the sophisticated "uncore" improvements (cache coherency policies, low latency rings, and so on) that allow this multi-core monster to scale, but at the end of the day, good performance starts with a good core. And since we have listed the many subtle core improvements, we could not resist the opportunity to check how each core compares.

The results aren't totally meaningless either, as the profile of a compression algorithm is somewhat similar to many server workloads: it is hard to extract instruction level parallelism (ILP) from the code, and performance is sensitive to memory parallelism and latency. The instruction mix is a bit different, but the overall behavior is close enough to be informative. One more reason to test performance in this manner is that the 7-zip source code is available under the GNU LGPL. That allows us to recompile the source code on every machine with gcc 4.8.1 at the -O2 optimization level.

We've run an additional data point for this particular set of tests. The new Ivy Bridge EX was tested at 2.8GHz and downclocked to 2.4GHz, so that we can do a clock-for-clock comparison with Westmere EX. Since we're only testing single-threaded performance here, other than perhaps slight differences due to having more total L3 cache, it doesn't matter which particular E7 v2 chip we use.

LZMA single threaded performance: compression

The latest Xeon E7 v2 "Ivy Bridge EX" is capable of extracting 33% more ILP out of the complex compression code than the older Xeon E7 "Westmere-EX" at the same clock speed. That is pretty amazing and shows how all the small micro-architecture improvements have accumulated into a large performance increase. The Opteron core is also better than most people think: at 2.4GHz it would deliver about 2481 MIPS. That is about 80% of Intel's best server core at the moment: not enough, but nothing to be ashamed of.
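For readers who want to reproduce this kind of clock-for-clock arithmetic, here is a minimal sketch in Python. It assumes that single-threaded scores scale linearly with clock speed, which is only an approximation since memory latency does not scale with the core clock. The function names and the input numbers are hypothetical placeholders for illustration, not our measured results.

```python
def scale_to_clock(score_mips: float, measured_ghz: float, target_ghz: float) -> float:
    """Rescale a benchmark score to another clock speed.

    Assumption: the score scales linearly with the core clock.
    """
    if measured_ghz <= 0 or target_ghz <= 0:
        raise ValueError("clock speeds must be positive")
    return score_mips * target_ghz / measured_ghz


def relative_gain_percent(baseline: float, candidate: float) -> float:
    """Return how much faster the candidate is than the baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (candidate / baseline - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical inputs: a chip measured at 2.3GHz, normalized to 2.4GHz.
    normalized = scale_to_clock(2300.0, 2.3, 2.4)
    print(f"Normalized score: {normalized:.0f} MIPS")

    # Hypothetical same-clock scores for an older and a newer core.
    gain = relative_gain_percent(1000.0, 1330.0)
    print(f"Per-clock gain: {gain:.1f}%")
```

Normalizing both chips to the same clock before comparing is what lets the remaining difference be attributed to the core itself rather than to frequency.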

Also interesting to note is that the Westmere core was indeed a "tick": any performance increase over the Xeon X7560 (codename "Beckton", 45nm Nehalem core) is simply the result of the higher clock speed of the 32nm chip.

Let us see how the chips compare in decompression. Decompression is an even lower IPC (Instructions Per Clock) workload, as it is pretty branch intensive and depends on the latencies of the multiply and shift instructions.

LZMA single threaded performance: decompression

Again, we note a 30% improvement in integer performance going from the Xeon E7 "Westmere" (Xeon E7-4870 at 2.4GHz) to the Xeon E7 v2 "Ivy Bridge EX" (Xeon E7-4890 v2 clocked down to 2.4GHz).

To summarize: the new 15-core Xeon E7 v2 is built upon a strong core architecture that has improved significantly compared to the predecessor.


125 Comments


  • snoopy1710 - Friday, February 21, 2014 - link

    Minor correction on the Dell E7-4890 SAP benchmark, which was done on SUSE Linux Enterprise Server for SAP Applications:

    http://download.sap.com/download.epd?context=40E2D...

    Snoopy
  • FunBunny2 - Friday, February 21, 2014 - link

    you should opt for ubuntu 12.04. "real" databases are approved only for LTS versions, and 12.04 is the latest.
  • bji - Friday, February 21, 2014 - link

    Page 10 does not contain the Linux Kernel Compile time benchmarks.
  • JohanAnandtech - Saturday, February 22, 2014 - link

    The web engine did something weird...I restored the page
  • JawsOfLife - Friday, February 21, 2014 - link

    Very thorough review, which is what I've come to expect from Anandtech! I am interested but not very knowledgeable about the server side of computing, so this definitely filled me in on a lot of the facets of that area. Thanks for the writeup.

    By the way, the "Linux Kernel Compile" page is blank, as bji noted.
  • JohanAnandtech - Saturday, February 22, 2014 - link

    Thx. A glitch in the engine made it delete a page. Restored.
  • iwod - Friday, February 21, 2014 - link

    While the revenue are high, just how many unit are shipped?
    I have been thinking if Intel would move Mobile First, meaning Atom, Tablet and Laptop Chips all gets the latest node first, which are low power design. While Desktop and Server will be a Architecture and Node behind. Which will align the Desktop and Xeon E3 - E5 Series.

    But it seems the volume of Chips dont quite measure out, since the top end volume are far too small? Anyone have any idea on this.
  • dealcorn - Saturday, February 22, 2014 - link

    I believe the statement "Still, that tiny amount of RISC servers represents about 50% of the server market revenues." should read "Still, that tiny amount of RISC servers represents about 50% of the high end server market revenues." Stated differently, from a revenue perspective Intel is #1 vendor in the high end segment even though it has less than a 50% market share. Server orders are placed with vendors, not architectures. Intel has fought an uphill battle to access the high end market and it is costly. However, if Intel can amortize its development costs over a larger revenue base than any competitor, it is well positioned to maintain its share acquisition momentum.
  • NikosD - Saturday, February 22, 2014 - link

    @Johan

    Very nice review, I would like to see more benchmarks between E7 v2 vs RISC processors because I think the real competition is there.

    Older Intel and AMD servers are not real competition for IvyBridge-EX.

    It would be interesting when POWER8 is out, to give us the new figures of 8 socket benchmarks and if there is any progress on more 8+ sockets for Intel E7 v2 (built by Cray and other vendors)

    I think that E7 v2 (I don't know about older vendors) can be placed in up to 32-socket systems - not natively of course.
  • JohanAnandtech - Saturday, February 22, 2014 - link

    Older Intel systems are competition, because these kind of servers are not replaced quickly. If a new generation does not deliver substantial gains, some companies will postpone replacement. In fact, very few people that already have a quad intel consider the move to RISC platforms.

    But you have a point. Still, it is almost impossible for us to do an independent review of other vendors. I have never seen an independent review, and the systems are too scarce, so there is little chance that we can ask a friendly company to lend us one.
