Intel's Benchmarks

Since time constraints meant that we were not able to run a ton of benchmarks ourselves, it's useful to check out Intel's own benchmarks as well. In our experience Intel's own benchmarking has a good track record for producing accurate numbers and documenting configuration details. Of course, you have to read all the benchmarking information carefully to make sure you understand just what is being tested.

The OLTP and virtualization benchmarks show that the new Xeon E7 v3 is about 25 to 39% faster than the previous Xeon E7 (v2). In some of those benchmarks, the new Xeon had twice as much memory, but it is safe to say that this will make only a small difference. We think it's reasonable to conclude that the Xeon E7 is 25 to 30% faster, which is also what we found in our integer benchmarks.

The increase in legacy FP application is much lower. For example Cinebench was 14% faster, SPECFP 9% and our own OpenFOAM was about 4% faster. Meanwhile linpack benchmarks are pretty useless to most of the HPC world, so we have more faith in our own benchmarking. Intel's own realistic HPC benchmarking showed at best a 19% increase, which is nothing to write home about.

The exciting part about this new Xeon E7 is that data analytics/mining happens a lot faster on the new Xeon E7 v3. The 72% faster SAS analytics number is not really accurate as part of the speedup was due to using P3700 SSDs instead of the S3700 SSD. Still, Intel claims that the replacing the E7 v2 with the v3 is good for a 55-58% speedup.

The most spectacular benchmark is of course SAP HANA. It is not 6x faster as Intel claims, but rather 3.3x (see our comments about TSX). That is still spectacular and the result of excellent software and hardware engineering.

Final Words: Comparing Xeon E7 v3 vs V2

For those of us running scale-up, reasonably priced HPC or database applications, it is hard to get excited about the Xeon E7 v3. The performance increases are small-but-tangible, however at the same time the new Xeon E7 costs a bit more. Meanwhile as far as our (HPC) energy measurements go, there is no tangible increase in performance per watt.

The Xeon E7 in its natural habitat: heavy heatsinks, hotpluggable memory

However organizations running SAP HANA will welcome the new Xeon E7 with open arms, they get massive speedups for a 0.1% or less budget increase. The rest of the data mining community with expensive software will benefit too, as the new Xeon E7 is at least 50% faster in those applications thanks to TSX.

Ultimately we wonder how the rest of us will fare. Will SAP/SAS speedups also be visible in open source Big Data software such as Hadoop and Elastic Search? Currently we are still struggling to get the full potential out of the 144 threads. Some of these tests run for a few days only to end with a very vague error message: big data benchmarking is hard.

Comparing Xeon E7 v3 and POWER8

Although the POWER8 is still a power gobbling monster, just like its older brother the POWER7, there is no denying that IBM has made enormous progress. Few people will be surprised that IBM's much more expensive enterprise systems beat Intel based offerings in the some high-end benchmarks like SAP's. But the fact that 24 POWER8 cores in a relatively reasonably priced IBM POWER8 server can beat 36 Intel Haswell cores by a considerable margin is new.

It is also interesting that our own integer benchmarking shows that the POWER8 core is capable of keeping up with Intel's best core at the same clockspeed (3.3-3.4 GHz). Well, at least as long as you feed it enough threads in IPC unfriendly code. But that last sentence is the exact description of many server workloads. It also means that the SAP benchmark is not an exception: the IBM POWER8 is definitely not the best CPU to run Crysis (not enough threads) but it is without a doubt a dangerous competitor for Xeon E7 when given enough threads to fill up the CPU.

Right now the threat to Intel is not dire, IBM still asks way too much for its best POWER8 systems and the Xeons have a much better performance-per-watt ratio. But once the OpenPOWER fondation partners start offering server solutions, there is a good chance that Intel will receive some very significant performance-per-dollar competition in the server market.

HPC Watts per Job
Comments Locked

146 Comments

View All Comments

  • thunng8 - Monday, May 11, 2015 - link

    Sorry about the language

    A couple points that are wrong on benchmarks:
    - The Power7 p270 is a 2 socket system with 2 processors in one socket (4 processors). It was designed to get more cores into 1 socket and not outright performance per processor. If you want to show the best quad processor on 4 socket system, then it would be this result:
    http://download.sap.com/download.epd?context=40E2D...

    - Your comment about Power7 needing more sockets to match Intel is not based on reality. IBM held the 8 socket lead in SAP SD from March 2010 with this result:
    http://download.sap.com/download.epd?context=40E2D...

    It wasn't surpassed by Intel until June 2014 with this result:
    http://download.sap.com/download.epd?context=40E2D...

    Note: Even the Power7 result from 2010 shows higher throughput per core than the just released Haswell server chips.

    And then 4 months later Power8 overtook it again. BTW, IBM recently announced the 12 core 4.02Ghz cpu in the E880..that should get an extra ~15% throughput per socket.

    - Power8 L2 cache runs at full speed clock speed

    A point completely overlooked and what makes Power systems really excel is the efficiency of the Power hypervisor. IMO it is the biggest selling point of the Power ecosystem.
  • thunng8 - Monday, May 11, 2015 - link

    Another datapoint (not on spec site yet, but listed on the IBM e880 performance site):

    http://www-03.ibm.com/systems/power/hardware/e880/...

    SpecIntRate: 14400
    SpecfpRate: 11400

    Which makes it (per processor):
    SpecIntrate: 900
    SepcfpRate: 713
  • thunng8 - Monday, May 11, 2015 - link

    Also, the ibm power 760 is the same deal with the p270.

    It is actually a 4 socket system with 2 processors per socket.

    Technical overview here:

    http://www.redbooks.ibm.com/redpapers/pdfs/redp498...
  • thunng8 - Wednesday, May 13, 2015 - link

    Well, it has been a few days since I've listed a quite few of your misrepresentations of the data in comparison to POWER, and nothing has changed and no reply at all.

    I find it hilarious that you can put this text in the article:
    "the new POWER8 has made the Enterprise line of IBM more competitive than ever. Gone are the days that IBM needed more CPU sockets than Intel to get the top spot."

    and still have it there when I've pointed out over the last 5 years (or maybe longer, I couldn't be bothered looking further), Intel has only overtaken POWER system for only 4 months. i.e. 4 months out of 60+ months
  • JlHADJOE - Sunday, May 10, 2015 - link

    "No less than 98% of the server shipments have been 'Intel inside'... From the revenue side, the RISC based systems are still good for slightly less than 20% of the $49 Billion (per year) server market*."

    Wow! So RISC has 2% market share and 20% revenue.
  • FunBunny2 - Monday, May 11, 2015 - link

    Gee. Sounds kinda like the Apple approach to production.
  • akula2 - Sunday, May 10, 2015 - link

    POWER8 is far better than Intel's counterpart.
    IBM is way ahead of Intel for the next generation computing with their Brain Chip.

    I hope Intel's share slips with the emerging ARM 64 bit CPU (A-72) in the Server space.
  • ats - Tuesday, May 12, 2015 - link

    Whoa, there is wrong then there is Brutalizer WRONG!

    First of all many IMDBs support full locking at multiple granularity including both TimesTen and SAP HANA. IMDBs are not read only and are used in the most critical performance transaction processing scenarios (because disk based DBs simply can't keep up!)

    Second, IMDBs are used for a variety of DB workloads from transaction processing to analytic workloads.

    Third, if your queries are taking hours, you are doing analytic workloads, not transaction processing. Transaction processing is the DB workload most dependent on locking functionality and requires real time responses. Analytic workloads are the least dependent on locking performance.

    Fourth, many IMDBs are designed and deployed as the sole DB layer, including SAP HANA and TimesTen. Both fully support shadowing to disk.

    Fifth, you can run businesses on **SCALE UP** severs like UV2K. Unless you now want to claim you can run businesses on mainframes, Sun's large scale servers, Fujitsu's large scale servers, IBMs large scale servers, or HP's large scale servers.

    Sixth, if you think an UV2K is a cluster, you don't have enough knowledge to even post about this topic. UV2k is a SSI cache coherent SMP, no different than Oracle Sparc M6 or and IBM P795.

    Seventh, you don't need a direct channel between sockets. You have never needed a direct channel between sockets. In fact the system that put Sun on the map, UE10K, did not have direct connections between each socket. In fact MANY MANY large scale sun systems have not had direct connections between sockets. If you actually knew anything about the history of big servers you would know that direct connections can be slower, using switches can be slower, and using torii and hypercubes can be slower, or they all can be faster. Looking at an interconnection network topology doesn't tell you jack. What matters is latency and latency vs load.

    Eighth, people who fail at math should probably not try to make math based arguments. To directly connect N sockets, each socket needs N-1 links, no n^2 links. And you should probably learn something about how bandwidth and latency works. The more you directly connect, the less bandwidth you have between each node and the highly the latency hot spotting becomes. Using min channel widths isn't necessarily the best solution. And actually, you can have throughput and low latency, it just impacts cost.

    Ninth, ScaleMP has no relation to SGI's UV2k. None.

    10th, more business software runs on X86 than anything else in the world. More DBs run on x86 than anything else in the world. And neither ScaleMP nor UV2k are scale out solutions. UV2K is a pure scale-up system. You might know that if you only had a clue.

    1) There are 8, 16, 32, 64, 128, AND 256 processor **SCALE-UP** x86 systems. And the x86 Superdome delivers higher performance than any previous HP scale-up system. And no, you don't need socket counts, you need performance. Socket counts are quite immaterial, and shrinking by the by.

    2) SGI UV2K is not a scale out system. Its a SSI Scale Up system. When you finally admit this, you'll be one step closer to not riding the short bus.

    And fyi, plenty of people use x86 for large sap installations. In fact, x86 runs more sap installations than anyone else combined.

    Oh and: http://global.sap.com/solutions/benchmark/bweml-re...

    And just for fun, WE LAUGH AT YOUR PUNY ORACLE SAPS: http://global.sap.com/solutions/benchmark/sd3tier.... still under a million? You are being beaten by 8 socket servers, ouch that's gotta hurt!
  • ats - Tuesday, May 12, 2015 - link

    Whoa, there is wrong then there is Brutalizer WRONG!

    First of all many IMDBs support full locking at multiple granularity including both TimesTen and SAP HANA. IMDBs are not read only and are used in the most critical performance transaction processing scenarios (because disk based DBs simply can't keep up!)

    Second, IMDBs are used for a variety of DB workloads from transaction processing to analytic workloads.

    Third, if your queries are taking hours, you are doing analytic workloads, not transaction processing. Transaction processing is the DB workload most dependent on locking functionality and requires real time responses. Analytic workloads are the least dependent on locking performance.

    Fourth, many IMDBs are designed and deployed as the sole DB layer, including SAP HANA and TimesTen. Both fully support shadowing to disk.

    Fifth, you can run businesses on **SCALE UP** severs like UV2K. Unless you now want to claim you can run businesses on mainframes, Sun's large scale servers, Fujitsu's large scale servers, IBMs large scale servers, or HP's large scale servers.

    Sixth, if you think an UV2K is a cluster, you don't have enough knowledge to even post about this topic. UV2k is a SSI cache coherent SMP, no different than Oracle Sparc M6 or and IBM P795.

    Seventh, you don't need a direct channel between sockets. You have never needed a direct channel between sockets. In fact the system that put Sun on the map, UE10K, did not have direct connections between each socket. In fact MANY MANY large scale sun systems have not had direct connections between sockets. If you actually knew anything about the history of big servers you would know that direct connections can be slower, using switches can be slower, and using torii and hypercubes can be slower, or they all can be faster. Looking at an interconnection network topology doesn't tell you jack. What matters is latency and latency vs load.

    Eighth, people who fail at math should probably not try to make math based arguments. To directly connect N sockets, each socket needs N-1 links, no n^2 links. And you should probably learn something about how bandwidth and latency works. The more you directly connect, the less bandwidth you have between each node and the highly the latency hot spotting becomes. Using min channel widths isn't necessarily the best solution. And actually, you can have throughput and low latency, it just impacts cost.

    Ninth, ScaleMP has no relation to SGI's UV2k. None.

    10th, more business software runs on X86 than anything else in the world. More DBs run on x86 than anything else in the world. And neither ScaleMP nor UV2k are scale out solutions. UV2K is a pure scale-up system. You might know that if you only had a clue.

    1) There are 8, 16, 32, 64, 128, AND 256 processor **SCALE-UP** x86 systems. And the x86 Superdome delivers higher performance than any previous HP scale-up system. And no, you don't need socket counts, you need performance. Socket counts are quite immaterial, and shrinking by the by.

    2) SGI UV2K is not a scale out system. Its a SSI Scale Up system. When you finally admit this, you'll be one step closer to not riding the short bus.

    And fyi, plenty of people use x86 for large sap installations. In fact, x86 runs more sap installations than anyone else combined.

    Oh and: http://global.sap.com/solutions/benchmark/bweml-re...

    And just for fun, WE LAUGH AT YOUR PUNY ORACLE SAPS: http://global.sap.com/solutions/benchmark/sd3tier.... still under a million? You are being beaten by 8 socket servers, ouch that's gotta hurt!
  • MyNuts - Tuesday, May 12, 2015 - link

    Great, i guess. Wheres the holograms and teleporters. I see just another calculator :(

Log in

Don't have an account? Sign up now