Intel's Benchmarks

Since time constraints meant we were not able to run a ton of benchmarks ourselves, it's useful to check out Intel's own numbers as well. In our experience, Intel's benchmarking has a good track record of producing accurate numbers and documenting configuration details. Of course, you still have to read all the benchmarking information carefully to make sure you understand exactly what is being tested.

The OLTP and virtualization benchmarks show that the new Xeon E7 v3 is about 25 to 39% faster than the previous Xeon E7 v2. In some of those benchmarks the new Xeon had twice as much memory, but it is safe to say that this makes only a small difference. We think it's reasonable to conclude that the Xeon E7 v3 is 25 to 30% faster, which is also what we found in our integer benchmarks.

The increase in legacy FP applications is much lower. For example, Cinebench was 14% faster, SPECfp 9%, and our own OpenFOAM test about 4% faster. Meanwhile, Linpack benchmarks are pretty useless to most of the HPC world, so we have more faith in our own benchmarking; Intel's own more realistic HPC benchmarking showed at best a 19% increase, which is nothing to write home about.

The exciting part about the new Xeon E7 v3 is that data analytics/mining runs a lot faster on it. The 72% faster SAS analytics number is not entirely accurate, as part of the speedup comes from using P3700 SSDs instead of S3700 SSDs. Still, Intel claims that replacing the E7 v2 with the v3 alone is good for a 55-58% speedup.

The most spectacular benchmark is of course SAP HANA. It is not 6x faster as Intel claims, but rather 3.3x (see our comments about TSX). That is still spectacular and the result of excellent software and hardware engineering.
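
For readers who haven't looked at TSX before, here is a minimal sketch of why it matters so much for databases and analytics engines: instead of serializing every update on a lock, threads execute the critical section speculatively as a hardware transaction and only fall back to the real lock when a conflict occurs. This is our own simplified illustration using Intel's RTM intrinsics; the shared counter and spinlock fallback are hypothetical and not taken from SAP or SAS code.

```c
/* Minimal lock-elision sketch using Intel TSX/RTM intrinsics.
 * Illustration only: build with gcc -mrtm on a TSX-capable CPU. */
#include <immintrin.h>

static volatile int fallback_lock = 0;   /* plain spinlock, used only after an abort */
static long shared_counter = 0;          /* stand-in for any contended structure     */

void increment(void)
{
    unsigned status = _xbegin();              /* try to start a hardware transaction */
    if (status == _XBEGIN_STARTED) {
        if (fallback_lock)                    /* add the lock to the read-set...     */
            _xabort(0xff);                    /* ...and bail out if someone holds it */
        shared_counter++;                     /* speculative update, no lock taken   */
        _xend();                              /* commit atomically                   */
    } else {
        /* Transaction aborted (conflict, capacity, interrupt): take the real lock. */
        while (__sync_lock_test_and_set(&fallback_lock, 1))
            while (fallback_lock) ;           /* spin until the lock looks free      */
        shared_counter++;
        __sync_lock_release(&fallback_lock);
    }
}
```

When threads grab the same lock but rarely touch the same data, nearly every transaction commits and the lock is never actually taken, which is where the big HANA-style gains come from; when conflicts are frequent, everything degrades back to the ordinary locked path.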

Final Words: Comparing Xeon E7 v3 vs. v2

For those of us running scale-up, reasonably priced HPC or database applications, it is hard to get excited about the Xeon E7 v3. The performance increases are small but tangible; at the same time, however, the new Xeon E7 costs a bit more. Meanwhile, as far as our (HPC) energy measurements go, there is no tangible increase in performance per watt.

The Xeon E7 in its natural habitat: heavy heatsinks, hot-pluggable memory

Organizations running SAP HANA, however, will welcome the new Xeon E7 with open arms: they get massive speedups for a budget increase of 0.1% or less. The rest of the data mining community with expensive software will benefit too, as the new Xeon E7 is at least 50% faster in those applications thanks to TSX.

Ultimately we wonder how the rest of us will fare. Will SAP/SAS speedups also be visible in open source Big Data software such as Hadoop and Elasticsearch? Currently we are still struggling to extract the full potential of the 144 threads. Some of these tests run for a few days only to end with a very vague error message: big data benchmarking is hard.

Comparing Xeon E7 v3 and POWER8

Although the POWER8 is still a power-gobbling monster, just like its older brother the POWER7, there is no denying that IBM has made enormous progress. Few people will be surprised that IBM's much more expensive enterprise systems beat Intel-based offerings in some high-end benchmarks like SAP's. But the fact that 24 POWER8 cores in a relatively reasonably priced IBM POWER8 server can beat 36 Intel Haswell cores by a considerable margin is new.

It is also interesting that our own integer benchmarking shows that the POWER8 core is capable of keeping up with Intel's best core at the same clock speed (3.3-3.4 GHz). Well, at least as long as you feed it enough threads in IPC-unfriendly code. But that last sentence is an exact description of many server workloads. It also means that the SAP benchmark is not an exception: the IBM POWER8 is definitely not the best CPU to run Crysis (not enough threads), but it is without a doubt a dangerous competitor for the Xeon E7 when given enough threads to fill up the CPU.
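
To make the "enough threads in IPC-unfriendly code" point concrete, below is a toy pointer-chasing workload of our own (not one of the benchmarks used in this review). Every load depends on the previous one and misses the caches, so a single thread leaves the core mostly idle; a wide SMT design can fill those stalled cycles with other threads. The sizes and thread counts are arbitrary.

```c
/* Toy "IPC-unfriendly" workload: dependent loads through one big random cycle.
 * Illustration only. Build with: gcc -O2 -pthread chase.c -o chase */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NODES (1u << 23)          /* 8M entries (64 MB): larger than even a big L3 */
#define STEPS (1u << 24)          /* dependent loads per thread                    */

static size_t ring[NODES];

static void *chase(void *arg)
{
    size_t i = (size_t)arg % NODES;
    for (size_t s = 0; s < STEPS; s++)
        i = ring[i];              /* every load depends on the previous one */
    return (void *)i;             /* return the result so it isn't optimized away */
}

int main(int argc, char **argv)
{
    int nthreads = (argc > 1) ? atoi(argv[1]) : 4;
    if (nthreads < 1 || nthreads > 256)
        nthreads = 4;

    /* Sattolo's algorithm: one big random cycle, so prefetchers can't help. */
    for (size_t i = 0; i < NODES; i++)
        ring[i] = i;
    for (size_t i = NODES - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = ring[i]; ring[i] = ring[j]; ring[j] = t;
    }

    pthread_t tid[256];
    for (int t = 0; t < nthreads; t++)
        pthread_create(&tid[t], NULL, chase, (void *)(size_t)(t * 100003u));
    for (int t = 0; t < nthreads; t++)
        pthread_join(&tid[t], NULL);

    printf("chased %u pointers on %d threads\n", (unsigned)STEPS, nthreads);
    return 0;
}
```

Timed with increasing thread counts (time ./chase 1, time ./chase 8, and so on), total runtime on an SMT-heavy core grows far more slowly than the amount of work, because the memory stalls of one thread are hidden behind the loads of another. That throughput-over-latency trade, not single-thread IPC, is what benchmarks like SAP's reward.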

Right now the threat to Intel is not dire: IBM still asks way too much for its best POWER8 systems, and the Xeons have a much better performance-per-watt ratio. But once the OpenPOWER Foundation partners start offering server solutions, there is a good chance Intel will face some very significant performance-per-dollar competition in the server market.

[Graph: HPC Watts per Job]
Comments

  • DanNeely - Saturday, May 9, 2015 - link

    The workloads that you'd be buying racks of servers for are better handled with individually less expensive systems. These 4/8-way leviathans are for the one or two core business functions that only scale up, not out; so the typical customer would only be buying a handful of these at most.

    The other half is that even a thousand or two per year in increased operating costs is dwarfed not only by the price of the server, but by the price of software that makes the server look cheap. The best server for those applications isn't the server that costs the least to run. It's not the server with the cheapest hardware price either. It's the one that lets you get away with the cheapest licensing fee for the application you're running.

    One extreme example from the better part of a decade ago: prior to being acquired by Oracle, Sun was making extremely wide processors that were very competitive on a per-socket basis but used a huge number of really slow cores/threads to get their throughput. At that time Oracle licensed its DB on a per-core (per-thread?) basis, not per socket. As a result, an $80-100k HP/IBM server was a cheaper way to run a massive Oracle database than a $30k Sun box even if your workload was such that the cheap Sun hardware performed equally well, because Oracle's licensing ate several times the difference in hardware prices.
  • KateH - Saturday, May 9, 2015 - link

    I think the Intel transition was almost entirely dictated by the lack of mobile options for PowerPC. 125W each for the 970MPs sounds like a lot, but keep in mind that the Mac Pro has been using a pair of 100-130W Xeons since the beginning in 2008. Workstations and HPC are much, much less constrained by TDP. The direction that POWER and SPARC have been taking for the past decade of cramming loads of SMT-enabled, high-clocked cores into a single chip somewhat negates the power concerns: if a POWER8 is pulling a couple hundred watts for a 12C/96T chip, that's probably going to be worth it for the users that need that much grunt. Even Intel's E7-8890 v3 is a 165W chip!
  • melgross - Saturday, May 9, 2015 - link

    Actually, the G5 was moving faster than NetBurst was. In a bit over a year it would have caught up, then moved past. Intel's unexpected move back to the older Pentium M-derived design for Yonah surprised everyone (particularly AMD), and allowed Apple to make that move. It never would have happened with NetBurst.

    Apple switched for two reasons. One was that IBM failed to deliver a mobile G5 chip right at the time when laptop sales were increasing faster than desktop sales, and Apple was forced into using two G4s instead, which wasn't a good alternative. IBM delivered the chip after Apple switched over, but it was too late.

    The second reason was that Apple wanted better Windows compatibility, which could only occur using x86 chips.
  • Kevin G - Saturday, May 9, 2015 - link

    IBM did fail to make a G5 chip for laptops which significantly hurt Apple. Though Apple did have a plan B: PowerPC chips from PA-Semi. Also Apple never shipped a laptop with two G4 chips.

    And Apple didn't care about Windows software compatibility. Apple did care about hardware support, as many chips couldn't be used in big-endian mode, or made writing firmware for them complicated.

    And the real second reason why Apple ditched PowerPC was the chipsets. The PCIe-based G5s actually had a chipset that was more expensive than the CPUs that were used. It was composed of a DDR2/HyperTransport northbridge, two memory buffers, a HyperTransport-to-PCIe bridge chip from Broadcom/ServerWorks, a southbridge chip to handle SATA/USB IO, a FireWire 800 chip, and a pair of Broadcom Ethernet chips. The dual-core 2.5 GHz PowerPC 970MPs at the time were going for between $200 and $250 apiece. Not only was the hardware complex for the motherboards, but so was the software side. PowerPC 970s cannot boot themselves, as they need a service processor to initialize the FSB. The PowerPC 970 chipsets Apple used have an embedded PowerPC 400-series core in them that initializes and calibrates the PowerPC's high-speed FSB before handing off the rest of the boot process.
  • SnowCat00 - Friday, May 8, 2015 - link

    I would question how accurate that chart is...
    Mainframe sales are up: http://www.businessinsider.com/mainframe-saves-ibm...

    Also, as someone who works with mainframes: if one wanted to, they could consolidate an entire data center onto one big z13.
  • ats - Friday, May 8, 2015 - link

    Um, I'm not sure you quite comprehend the scale of some of the datacenters out here. While the z13 is very nice, it's hardly a replacement for 10 racks of 8-socket Xeons.
  • usernametaken76 - Friday, May 8, 2015 - link

    That depends entirely on what those 10 racks' worth of systems are doing, what type of applications they are running, and at what utilization.

    Mainframes are built to run at up to 100% utilization. Real-world x86 systems at or above 80% are either rendering video, doing HPC, or have process control issues.

    Real-world enterprise applications running in a virtualized environment are a more appropriate comparison. Everywhere I look it's VMware at the moment.

    Compare a PowerVM DLPAR to a VMware VM running x64 Linux for a fairer, real-world comparison.
  • melgross - Saturday, May 9, 2015 - link

    It isn't the same thing. Mainframes excel at I/O, which often trumps pure processing power. It's a very different environment.
  • ats - Saturday, May 9, 2015 - link

    Um, the days of mainframes having any real advantage in I/O are long gone, fyi.
  • Kevin G - Saturday, May 9, 2015 - link

    Sort of. Mainframes still farm out most I/O commands to dedicated coprocessors so that they don't eat CPU cycles that should be running actual applications.

    Mainframes also have dedicated hardware for encryption and compression. This is becoming more common in the x86 world on a per-drive basis, but the mainframe implements it at the system level so that any drive's data can be encrypted and compressed.

    It is also because of these coprocessors that IBM's mainframe virtualization is so robust: even the hypervisor itself can be virtualized on top of another hypervisor without any slowdown in I/O or reduction in functionality.
