Closing Thoughts

Testing both the IBM POWER8 and the Intel Xeon E5 v4 with an unbiased compiler answered many of the questions we had. The bandwidth advantage of POWER8's memory subsystem has been quantified: IBM's most affordable core can offer twice as much bandwidth as Intel's, at least if your application is not (perfectly) vectorized.

Despite the fact that POWER8 can sustain 8 instructions per clock versus 4 to 5 for modern Intel microarchitectures, chips based on Intel's Broadwell architecture deliver the highest instructions per clock cycle rate in most single threaded situations. The larger OoO buffers (available to a single thread!) and somewhat lower branch misprediction penalty seem to be the most likely causes.

However, the difference is not large: the POWER8 CPU inside the S812LC delivers about 87% of the Xeon's single threaded performance at the same clock. That the POWER8 would excel in memory intensive workloads is not a surprise. However, the fact that the large L2 and eDRAM-based L3 caches offer very low latency (at up to 8 MB) was a surprise to us. That the POWER8 won when using GCC to compile was the logical result but not something we expected.

The POWER8 microarchitecture is clearly built to run at least two threads. On average, running two threads gives a massive 43% performance boost, with peaks of up to 84%. This is in sharp contrast with Intel's SMT, which delivers an 18% performance boost with peaks of up to 32%. Taken further, SMT-4 on the POWER8 chip outright doubles its performance compared to single threaded situations in many of the SPEC CPU subtests.
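To put the SMT figures side by side, here is a minimal Python sketch that normalizes single-threaded throughput to 1.0 and applies the uplifts quoted above. Treating "outright doubles" as exactly 2.0x is an assumption made for illustration, and the SMT-2 average and the SMT-4 result come from different sets of subtests, so the step between them is only a rough indication.

```python
# Per-core throughput relative to one thread (= 1.0).
# Uplift values are taken from the paragraph above; the 2.00 entry assumes
# "doubles" means exactly 2.0x, which is an illustrative simplification.
SMT_UPLIFT = {
    "POWER8 SMT-2 (average)": 1.43,
    "POWER8 SMT-2 (peak)": 1.84,
    "POWER8 SMT-4 (many subtests)": 2.00,
    "Xeon SMT-2 (average)": 1.18,
    "Xeon SMT-2 (peak)": 1.32,
}


def extra_gain(from_factor, to_factor):
    """Additional speedup, as a fraction, when moving between two SMT modes."""
    return to_factor / from_factor - 1.0


if __name__ == "__main__":
    for mode, factor in SMT_UPLIFT.items():
        print(f"{mode}: {factor:.2f}x single-threaded throughput")
    step = extra_gain(
        SMT_UPLIFT["POWER8 SMT-2 (average)"],
        SMT_UPLIFT["POWER8 SMT-4 (many subtests)"],
    )
    print(f"POWER8 SMT-2 -> SMT-4: roughly {step:.0%} more throughput")
```

Under these assumptions, the second pair of threads adds noticeably less than the first, which is what one would expect as the core's execution resources fill up.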

All in all, the maximum throughput of one POWER8 core is about 43% faster than a similar Broadwell-based Xeon E5 v4. Considering that using more cores hardly ever results in perfect scaling, a POWER8 CPU should be able to keep up with a Xeon with 40 to 60% more cores.
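The "40 to 60% more cores" estimate can be reproduced with a small Python sketch. It assumes a simple power-law scaling model, in which throughput grows as cores raised to an exponent at or below 1.0. That model and the exponent values are illustrative assumptions, not something we measured; only the 1.43x per-core figure comes from our results.

```python
def extra_cores_needed(per_core_advantage, scaling_exponent):
    """Fraction of extra cores the slower chip needs to match total throughput.

    Assumes throughput ~ per_core_speed * cores ** scaling_exponent, a
    hypothetical power-law model where an exponent of 1.0 is perfect scaling.
    Setting both chips' throughput equal and solving for the core ratio gives
    per_core_advantage ** (1 / scaling_exponent).
    """
    if not 0.0 < scaling_exponent <= 1.0:
        raise ValueError("scaling_exponent must be in (0, 1]")
    return per_core_advantage ** (1.0 / scaling_exponent) - 1.0


if __name__ == "__main__":
    POWER8_PER_CORE_ADVANTAGE = 1.43  # from the text: 43% faster per core
    for exponent in (1.0, 0.9, 0.8):  # hypothetical scaling exponents
        extra = extra_cores_needed(POWER8_PER_CORE_ADVANTAGE, exponent)
        print(f"scaling exponent {exponent:.1f}: Xeon needs about {extra:.0%} more cores")
```

With perfect scaling the Xeon needs 43% more cores. As the assumed scaling gets worse, the requirement climbs into the mid-50% range, which is consistent with the 40 to 60% band given above.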

To be fair, we have noticed that the Xeon E5 v4 (Broadwell) consumes less power than its formal TDP specification, in notable contrast to its v3 (Haswell) predecessor. So it must be said that the power consumption of the 10 core POWER8 CPU used here is much higher. On paper this is 190W + 64W Centaur chips, versus 145W for the Intel CPU. Put in practice, we measured 221W at idle on our S812LC, while a similarly equipped Xeon system idled at around 90-100W. So POWER8 should be considered in situations where performance is a higher priority than power consumption, such as databases and (big) data mining. It is not suited for applications that run close to idle much of the time and experience only brief peaks of activity. In those markets, Intel has a large performance-per-watt advantage. But there are definitely opportunities for a more power hungry chip if it can deliver significantly greater performance.
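The power figures above translate into a simple break-even calculation, sketched here in Python. The wattages are the ones quoted in the previous paragraph. Adding the Centaur chips to the POWER8 TDP while counting only the CPU on the Xeon side follows the on-paper comparison in the text and is not a strict like-for-like measurement.

```python
# Figures quoted in the text; constant names are illustrative.
POWER8_CPU_TDP_W = 190
CENTAUR_CHIPS_W = 64
XEON_TDP_W = 145
POWER8_IDLE_W = 221            # measured S812LC system idle
XEON_IDLE_RANGE_W = (90, 100)  # similarly equipped Xeon system idle


def power_ratio(power_a_w, power_b_w):
    """How many times more power A draws than B.

    This is also the speedup A needs over B to match B's performance per watt.
    """
    return power_a_w / power_b_w


if __name__ == "__main__":
    power8_paper_w = POWER8_CPU_TDP_W + CENTAUR_CHIPS_W
    print(f"On paper: {power8_paper_w} W vs {XEON_TDP_W} W, "
          f"break-even speedup {power_ratio(power8_paper_w, XEON_TDP_W):.2f}x")
    low, high = XEON_IDLE_RANGE_W
    print(f"At idle: {power_ratio(POWER8_IDLE_W, high):.2f}x to "
          f"{power_ratio(POWER8_IDLE_W, low):.2f}x the Xeon system's draw")
```

On these numbers the POWER8 socket would have to be about 1.75x as fast as the Xeon to match it on paper performance per watt, and the S812LC draws more than twice as much as the Xeon system at idle.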

Ultimately the launch of IBM's LC servers deserves our attention: it is a monumental step forward for IBM to compete with Intel in a much larger part of the market. Those servers seem to be competitively priced with similar Xeon systems and can access the same Little Endian data as an x86 server. But can POWER8 based systems really deliver a significant performance advantage in real server applications? In the next article we will explore the S812LC and its performance in real server situations, so stay tuned.


124 Comments


  • close - Thursday, July 21, 2016 - link

    This right here is why I keep coming back to Anandtech. Thumbsup!
  • jardows2 - Thursday, July 21, 2016 - link

    Agreed. There are plenty of places you can go to find out how pretty your games will look, but this sort of stuff is much more interesting to me!

    Looking forward to the application numbers. Power8 may shape up to be a nice server alternative. I would like to see about virtualization. With the threaded capabilities, it might just be a good platform for that.
  • Brutalizer - Friday, July 22, 2016 - link

    Regarding virtualization, SPARC M7 is more than 4x faster than POWER8 on SPECvirt_sc2013, and more than 2x faster than x86
    https://blogs.oracle.com/BestPerf/entry/20151025_s...
    Regarding SPECcpu2006, SPARC M7 is 1.9x and 1.8x faster than POWER8, and is faster than x86 as well:
    https://blogs.oracle.com/BestPerf/entry/201510_spe...
    Regarding memory bandwidth, SPARC M7 is 2.2x and 1.7x faster than POWER8 and 2.4x faster than x86 on STREAM benchmarks:
    https://blogs.oracle.com/BestPerf/entry/20151025_s...
    If you dig a bit on that web site, you will find 30ish world records, where SPARC M7 is 2-3x faster than POWER8 and x86, all the way up to 11x faster.

    It is interesting to delve into the technology behind POWER8 and x86, but in the end, what really matters is how fast the cpu performs in real life workloads and benchmarks. SPARC has lower IPC than x86, but as real life server workloads have an IPC of 0.8, SPARC which is a server cpu, is much faster than x86 in practice. In theory, x86 and POWER8 are fast, but in practice, they are much slower than SPARC. So, you can theorize all you want, but in the end - which cpu is fastest in real workloads and in real benchmarks? SPARC. Just look at all the benchmarks above, where SPARC M7 is faster in number crunching, Big data, neural networks, Hadoop, virtualization, memory bandwidth, etc etc. And if you also factor in the business benchmarks, such as SAP, Peoplesoft, databases etc - there is no contest. You get twice the performance, or more, with a SPARC M7 server than the competitors.

    SPARC M7 can also turn on encryption on everything, and lose 2-3% performance. Whereas encryption on POWER8 and x86 typically reduces performance down to 33% or lower. So, if you benchmark encrypted workloads, then SPARC M7 is not typically 2-3x faster, but another 3 times faster again - i.e. typically 6-9x faster.
  • Kevin G - Friday, July 22, 2016 - link

    Oracle marketing at its finest.

    The virtualization score is good vs. POWER8 mainly based on the radical difference in core count: 32 vs. 6. Yeah, even with lower IPC, I'd expect the higher core count system to fare better. Also note that IBM offers such higher core count systems and at higher clock speeds which would close that gap.

    Same for the claims of being twice as fast in raw benchmarks: Oracle isn't comparing their best against IBM's best POWER8. Their choice of comparison point was simply arbitrary to make SPARC look good, as is the job of their marketing department. Real performance comparisons come from independent reports.

    To get the memory bandwidth advantage Oracle proclaims, they have to use twice as many sockets.
  • SarahKerrigan - Friday, July 22, 2016 - link

    These supposed Oracle "wins" are all based on worst-case scenarios for Power8 - ie, testing a DCM based system and counting each DCM as two processors. This isn't very useful for comparison to Power8 overall, as the entry-level machines like the one in this article, and the S822LC positioned above it, all use SCM's (with as many as twelve cores.)

    M7 is a first-rate CPU, but it's also in a totally different cost class; the cheapest M7 config listed on Oracle's website costs over US$40k, for a one-processor machine. Considering you can get a pair of 10-core Power8's with 256GB of RAM in an S822LC for US$14,300 list, this is an exceptionally tough sell for those not wedded to Solaris (and by the way, there's no RHEL, SLES, or Ubuntu for SPARC - Solaris is pretty much the only game in town.)

    My company is currently deploying an S812LC and intends to deploy an S822LC in the future; we briefly considered SPARC but found the style of marketing that Oracle and its proxies seem to favor to be deeply offputting, as is the relatively poor perf/$ compared to both Power and Intel. Our loads (mainly a large PostgreSQL application) scale well with memory bandwidth and cache sizes, and we've found S812LC perf/$ to be first-rate. The main downsides have just been related to the relative immaturity of the ppc64le platform (occasional lack of available packages, etc.)
  • Brutalizer - Sunday, July 31, 2016 - link

    These oracle sparc m7 benchmarks vs IBM power8 are not worst case. The DCM Power8 module, actually consists of two power8 CPUs, in one socket. So there is nothing wrong with these benchmarks. It is up to IBM to release benchmarks with two power8 CPUs in one socket, not oracle choice. IBM has for decades promoted few strong cores instead of many weaker cores. For instance, IBM claimed "dual core power6 @ 5 ghz was superior to 8core sparc niagara2 @ 1.6 ghz because databases runs best on few but strong cores" and IBM talked about future super strong single/dual core 6-7 ghz power CPUs and mocked sparc many but weaker cores because databases are worthless on sparc. Back then sparc were first with 8 cores, and it was very controversial having that many cores. Later IBM realized laws of physics prohibit highly clocked CPUs, so IBM abandoned that path and followed sparc with many lower clocked cores. Just like Intel abandoned Prescott with high clocks. Today everybody have many lower clocked cores, just like sparc decades ago.

    Of course, if IBM released benchmarks with other configurations of power8, oracle would be happy to use them, but IBM has not. Oracle has no choice than to use those benchmarks that IBM has released. It is not oracles choice what benchmarks IBM release.

    We also know that power8 is slower than the latest Intel xeons, and we know that sparc m7 is typically 2-3x faster than Intel Xeon, so probably these benchmarks from IBM vs sparc m7 benchmarks are true. If you find other IBM power8 benchmarks I am sure oracle will compare to them instead. But you can only bench against ibm's own results, right?

    Regarding my credibility, yes, I am an sparc supporter. What is the problem with being an supporter? I know there are IBM supporters here, and there are nvidia, Amd, Intel etc supporters. What is wrong with that? Does the fact that I consider sparc to be superior, invalidate the official oracle vs IBM vs Intel benchmarks? I have not created those benchmarks, IBM has. And oracle. And Intel. Instead of you, IBM supporters, linking to official superior IBM power8 benchmarks you claim that because I am an sparc supporter, those official vendor benchmarks can not be trusted. Instead of proving that power8 is faster with benchmarks, you resort to attacking me. That does not win you any discussions. Show us facts and benchmarks if you want invalidate my linked benchmarks, instead of attacking me. Fact is, you have not proven anything regarding power8 inferiority.

    And why do I keep talking about sparc m7? Well, it seems people believe that Intel and power8 is so fast, but in fact there are another cpu out there, 2-3x faster, up to 11x faster. People just don't know that sparc is the worlds fastest CPU. I would like anandtech to talk about the best CPU in the world instead of slow IBM power or Intel Xeon CPUs. But anandtech don't.

    Regarding myself, yes I have been interviewed in Swedish media, and it is evident that I have always worked finance. I have never worked at Sun nor Oracle. Just read the interview. The last years I am an quantitative analyst concocting trading strategies. I have never worked in IT. i just happen to be a nerd and geek, and i only support the best tech, and it is sparc and Solaris. IBM and Intel sucks. Just compare their lousy performance to sparc m7
  • SarahKerrigan - Sunday, July 31, 2016 - link

    "The DCM Power8 module, actually consists of two power8 CPUs, in one socket."

    Dude, nobody outside of Oracle marketing cares, just like they didn't care when Xeon and Opteron used MCM's. IBM has SCM's going all the way up to 12 cores and 8 Centaur links, they just use DCM's for cost reasons on some (but not all) smaller machines. These have the same number of Centaur links per socket as the big SCM's, and they're priced as one would expect of one or two socket enterprise systems. Realistically, the 8-Centaur SCM has roughly equivalent memory bandwidth to the 8-Centaur DCM.

    "Later IBM realized laws of physics prohibit highly clocked CPUs, so IBM abandoned that path and followed sparc with many lower clocked cores. Just like Intel abandoned Prescott with high clocks. Today everybody have many lower clocked cores, just like sparc decades ago."

    You mean like when Oracle replaced 16-core 1.65GHz T3 with 8-core 3GHz T4? Which, by the way, had very similar throughput performance (which you say is all that matters) to the T3, but had far higher single-thread and single-core performance? If only throughput matters, why would Oracle do such a thing? It's quite a thing for you to imply Oracle doesn't know what they're doing!

    They also have been publishing benchmarks for their shiny new S7 chip where they lose per-chip to the Xeon - but they win per-core, which you've said on many occasions doesn't matter. Here are some examples:

    https://blogs.oracle.com/BestPerf/entry/20160629_n...
    https://blogs.oracle.com/BestPerf/entry/20160629_r...

    Comparisons to IBM are conspicuously absent, I suspect because Power perf/core is rather impressive.

    "For instance, IBM claimed "dual core power6 @ 5 ghz was superior to 8core sparc niagara2 @ 1.6 ghz because databases runs best on few but strong cores" and IBM talked about future super strong single/dual core 6-7 ghz power CPUs and mocked sparc many but weaker cores because databases are worthless on sparc."

    IBM has never reduced per-core or single-thread performance generation to generation. P7 and P8 were both massive improvements in both categories. IBM has not historically shown interest in "weaker" cores for Power.

    "Well, it seems people believe that Intel and power8 is so fast, but in fact there are another cpu out there"

    Yes. For the low, low price of over forty thousand dollars for the lowest-end, one-processor M7 system with public prices on Oracle's website.

    "2-3x faster"

    Consulting officially published results on an industry-standard benchmark:

    Xeon E7-8890v4, 2.2GHz: SPECint rate result of 927/chip, 24 cores (38/core)
    Power8 SCM, 4GHz: SPECint rate result of 900/chip, 12 cores (75/core)
    SPARC M7, 4.13GHz: SPECint rate result of 1200/chip, 32 cores (37/core)

    Not that impressive - especially given M7's price. And certainly not 2-3x of anything (or even 1.9x). It's 1.3x... while having 2.5x as many cores. Additionally, for a large range of applications, single-thread performance matters.

    "up to 11x faster."

    When running in-memory queries inside Oracle DB using accelerator instructions added to SPARC M7 specifically for Oracle DB, yes.

    By the way, since you mentioned memory bandwidth... how does it feel to have two-processor SPARC S7 losing on STREAM Triad to entry-level, one-processor Power8 machines that cost significantly less? Compare https://blogs.oracle.com/BestPerf/entry/20160629_s... to the entry-level Power8 results in the article we're commenting on!

    Oracle proponents need to do better than this. At least Phil Dunn resorts less to copypasta...

    SPEC references:
    https://spec.org/cpu2006/results/res2016q2/cpu2006...
    https://spec.org/cpu2006/results/res2015q4/cpu2006...
    https://spec.org/cpu2006/results/res2015q2/cpu2006...
  • close - Tuesday, August 02, 2016 - link

    "What is the problem with being an supporter? What is wrong with that?"
    Lying, deceiving, etc.

    This is what Oracle does because simply put ever since they acquired Sun those products went to sh*t. Oracle are reverse-alchemists. Whether it's software (like Java) or hardware (like the Sparc) Oracle managed to turn those gold nuggets into lead weights. Java was buried by Google, Sparc was buried by Intel and IBM.

    Oracle always resorts to this kind of piss-poor advertising and it's not for the customers themselves. They try to save face with numbers on that site with one reason only: to have something to show during their conferences. Because companies don't rely on numbers in a benchmark when committing to multi-year contracts and getting tied into a specific ecosystem.
    Right now only a handful of government institutions and some in regulated industries still rely on Sparc and only in corner cases. Most times it's just until they manage to migrate off them.
  • close - Tuesday, August 02, 2016 - link

    The EXA products might be the only ones with some solid popularity because it's the full package but they do come with plenty of caveats. Having worked in the defense and financial sectors for a long time I've seen plenty of consolidation being done on newer Oracle/Sparc systems but not so many new deployments (a handful). And the proof is in the numbers. Oracle can't seem to make any headway into this.
    This isn't the kind of runaway success you'd expect for such an "overpowering" system.

    P.S. Google for "For the sake of full disclosure, I work at Oracle. The opinions and views expressed in this post are my own, and do not necessarily reflect the opinions or views of my employer" and see the army of posters Oracle is employing and the kind of tactics they resort to. And that's just the official posters.
  • close - Tuesday, August 02, 2016 - link

    Their engineered systems for integrated infrastructure and platforms (the latter being their driver) are great but not because of the hardware or the CPU in particular. It's because of the value of the whole package that includes the software layer. Nobody actually cares about the CPU in those particular products and if the CPU were being sold they would have a tough time.
    And not least, they almost always HAVE to heavily discount the price in order to make the sale. From personal and recent experience Oracle was eager enough to undercut competitors like Cisco, VCE or HP (HP has 3 digit growth in this segment YoY for 2-3 years now) and discounted so aggressively that we ended up with 50% savings...
