Comparing with Intel's Best

Comparing CPUs in tables is always a very risky game: those simple numbers hide a lot of nuances and trade-offs. But if we approach them with caution, we can still extract quite a bit of information from them.

| Feature | IBM POWER8 | Intel Broadwell (Xeon E5 v4) | Intel Skylake |
|---|---|---|---|
| L1-I cache, associativity | 32 KB, 8-way | 32 KB, 8-way | 32 KB, 8-way |
| L1-D cache, associativity | 64 KB, 8-way | 32 KB, 8-way | 32 KB, 8-way |
| Outstanding L1-cache misses | 16 | 10 | 10 |
| Fetch width | 8 instructions | 16 bytes (+/- 4-5 x86) | 16 bytes (+/- 4-5 x86) |
| Decode width | 8 | 4 µops | 5-6* µops (*µop cache hit) |
| Issue queue | 64 + 15 branch + 8 CR = 87 | 60 unified | 97 unified |
| Issue width/cycle | 10 | 8 | 8 |
| Instructions in flight | 224 (GCT, SMT-8 mode) | 192 (ROB) | 224 (ROB) |
| Architectural registers | 32 (ST), 2x32 (SMT-2) | 16 | 16 |
| Rename registers | 92 (ST), 2x92 (SMT-2) | 168 | 180 |
| Loads | 4 per cycle | 2 per cycle | 2 per cycle |
| Load bandwidth (per unit) | 16 B/cycle | 32 B/cycle | 32 B/cycle |
| Load queue size | 44 entries | 72 entries | 72 entries |
| Stores | 2 per cycle | 1 per cycle | 1 per cycle |
| Store bandwidth | 16 B/cycle | 32 B/cycle | 32 B/cycle |
| Store queue size | 40 entries | 42 entries | 56 entries |
| Int. pipeline length | 18 stages | 19 stages, 14 stages from µop cache | 19 stages, 14 stages from µop cache |
| TLB | 2048, 4-way | 128I + 64D L1; 1024, 8-way | 128I + 64D L1; 1536, 8-way |
| Page support | 4 KB, 64 KB, 16 MB, 16 GB | 4 KB, 2/4 MB, 1 GB | 4 KB, 2/4 MB, 1 GB |

Both the IBM and the Intel CPUs are very wide, brawny out-of-order (OoO) designs, especially compared to the ARM server SoCs.

Despite the lower decode and issue width, Intel has gone a little bit further than IBM in optimizing single-threaded performance. Notice that the IBM design has neither a loop stream detector nor a µop cache to reduce the branch misprediction penalty. Furthermore, the load buffers of the Intel microarchitecture are deeper and the total number of instructions in flight for one thread is higher. The TLB of the IBM POWER8 has more entries, while Intel favors speedy address translation by offering a small level-one TLB backed by an L2 TLB. Such a small TLB is less effective if many threads are working on huge amounts of data, but it favors a single thread that needs fast virtual-to-physical address translation.
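To put the TLB entry counts in perspective, a rough "TLB reach" (entries multiplied by page size) can be computed. The snippet below is only a back-of-the-envelope sketch: it uses the largest entry count listed per CPU in the table, and it assumes every entry can map a page of the chosen size, which real TLBs do not necessarily allow. The names `TLB_ENTRIES`, `tlb_reach_bytes` and `reach_table` are made up for this illustration.

```python
KIB = 1024
MIB = 1024 * KIB

# Largest TLB entry count listed per CPU in the table above.
# Assumption: every entry can hold a page of the requested size.
TLB_ENTRIES = {
    "IBM POWER8": 2048,
    "Intel Broadwell": 1024,
    "Intel Skylake": 1536,
}


def tlb_reach_bytes(entries, page_size_bytes):
    """Memory covered when every TLB entry maps one page of the given size."""
    return entries * page_size_bytes


def reach_table(page_size_bytes):
    """TLB reach in bytes for each CPU at one page size."""
    return {
        cpu: tlb_reach_bytes(entries, page_size_bytes)
        for cpu, entries in TLB_ENTRIES.items()
    }


if __name__ == "__main__":
    for cpu, reach in reach_table(4 * KIB).items():
        print(f"{cpu}: {reach // MIB} MiB reach with 4 KiB pages")
```

With 4 KB pages this works out to only a few megabytes per core on any of the three chips, which is why the larger page sizes in the last table row matter for big data sets.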

On the flip side of the coin, IBM has done its homework to make sure that 2-4 threads can really boost the performance of the chip, while Intel's choices may still lead to relatively small SMT-related performance gains in quite a few applications. For example, the instruction TLB, µop cache (Decode Stream Buffer) and instruction issue queues are divided in two when two threads are active. This will reduce the hit rate in the µop cache, and the 16-byte fetch looks a little bit on the small side. Let us see what IBM did to make sure a second thread can result in a more significant performance boost.
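As a quick illustration of what "divided in two" means for each thread, the sketch below splits a few entry counts from the table evenly across the active threads. It takes the description above at face value: which structures are really partitioned this way, and whether the split is exactly even, is an assumption rather than something verified here, and the names `SHARED_STRUCTURES`, `per_thread_entries` and `partition_all` are hypothetical.

```python
# Entry counts copied from the table above. Which structures are actually
# split per thread is taken from the text, not verified independently.
SHARED_STRUCTURES = {
    "L1 instruction TLB (Intel)": 128,
    "Issue queue (Broadwell)": 60,
    "Issue queue (Skylake)": 97,
}


def per_thread_entries(total_entries, active_threads):
    """Entries each thread gets if a structure is split evenly among threads."""
    if active_threads < 1:
        raise ValueError("active_threads must be at least 1")
    return total_entries // active_threads


def partition_all(active_threads):
    """Per-thread share of every listed structure for a given thread count."""
    return {
        name: per_thread_entries(total, active_threads)
        for name, total in SHARED_STRUCTURES.items()
    }


if __name__ == "__main__":
    for name, share in partition_all(2).items():
        print(f"{name}: {share} entries per thread with 2 threads active")
```

Under that assumption, a thread that could use all 60 issue queue entries on Broadwell when running alone would be left with 30 once a second thread becomes active.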


124 Comments


  • Kevin G - Tuesday, August 2, 2016 - link

    "These oracle sparc m7 benchmarks vs IBM power8 are not worst case."
    >Eh? Did Oracle release the complete system configuration of the POWER8 for their testing? From your stream link you can find this PDF ( https://blogs.oracle.com/BestPerf/resource/stream/... ) where Oracle only tests with 24 threads out of 96 possible in the environment and out of 192 possible with the hardware. This document does not detail how many cDIMMs were installed in the system, which has a direct impact on available bandwidth. Case in point, the 512 GB of memory on the POWER8 system can be configured with the bare minimum number of cDIMMs in a system. That is a worst case scenario for POWER8 and we don't know if Oracle used it.

    Oracle also made a source code change for STREAM for reverse allocation. The thing that is missing here is a comparison to the original code. This could impact how well prefetchers work and favor a particular architecture and thus impact performance. Thus we don't know if this change is a best or worst case scenario for comparison purposes.

    "If you find other IBM power8 benchmarks I am sure oracle will compare to them instead. But you can only bench against ibm's own results, right?"
    >I find it perfectly fair to use submitted benchmarks from IBM to compare against similarly configured systems submitted by Oracle. POWER8 systems are available with higher clocks and more cores than what is generally used in the open benchmarks IBM has submitted. Thus it is deceptive to claim that SPARC is decisively faster when there is beefier IBM hardware available.

    "I am an sparc supporter. What is the problem with being an supporter?"
    >Nothing inherently wrong with that but you are incredibly closed-minded to any other alternative. You are blind to the idea that anything could be better or competitive in any metric. The reality of IT is that there is no one tool that best fits every job. Anyone claiming otherwise is trying to sell you something.

    "I would like anandtech to talk about the best CPU in the world instead of slow IBM power or Intel Xeon CPUs. But anandtech don't."
    >How about you use your contacts at Oracle to get Anandtech a test system for some real independent analysis?

    "I have never worked in IT."
    >This explains a lot.
  • wingar - Saturday, July 23, 2016 - link

    So, no one in the entire comment section mentioned SPARC at all. You come along, start ragging on POWER8, how SPARC is so much better, and then link to benchmarks on Oracle's website, with results provided by Oracle, with the conclusion of Oracle being so much better. Not only that, but the benchmarks you link require Oracle to use much higher end and incredibly higher cost hardware to beat low and mid-range POWER8 with.

    On top of all that you make dubious and unsubstantiated claims about server workloads and claims of performance of POWER8 and x86.

    And finally to top it off, your comment is barely even related to the comment you replied to. It seems you picked the comment most visible in the thread to reply to.

    So, to everyone else I think it's quite clear this is just an Oracle shill, please just ignore him.
  • wingar - Saturday, July 23, 2016 - link

    So, adding on to this, I was curious. I decided to make a simple Google search, "site:anandtech.com brutalizer". What did I find? Comments on anything x86 and POWER8, every single one talking about how Oracle and SPARC are so much better than whatever the review is talking about. Consistently linking to Oracle-run benchmarks on Oracle's own site with the conclusion that Oracle is better. Consistently making dubious claims about the non-Oracle hardware. Every single comment I found shilling for SPARC, and every single one as close to the top of the comments list as possible. You seem to want to be as visible as possible.

    Have some links.
    http://www.anandtech.com/comments/10158/the-intel-...
    http://www.anandtech.com/comments/9193/the-xeon-e7...
    http://www.anandtech.com/comments/10230/ibm-nvidia...
    http://www.anandtech.com/comments/9567/the-power-8...
    http://www.anandtech.com/comments/7757/quad-ivy-br...
    http://www.anandtech.com/comments/7852/intel-xeon-...
    http://www.anandtech.com/comments/7285/intel-xeon-...

    In fact I found a couple of comments you left that *weren't* shilling. Have some links.
    http://www.anandtech.com/comments/7334/a-look-at-a...
    http://www.anandtech.com/comments/7371/understandi...
    http://www.anandtech.com/comments/5831/amd-trinity...

    It's hard to draw a conclusion from those links but I'll point out a few things. All of the non-shilling comments you made were in 2013. Every single pro-Oracle comment you made was from 2014 at the earliest. Sounds to me like you were either bought out at that time, or you bought someone else's account, or perhaps this was the time you were put on Oracle's pay-cheque. It's quite possible that there are more comments that aren't shilling that I've missed here.

    So, please. Try again.
  • Zetbo - Saturday, July 23, 2016 - link

    He is a known Oracle troll/shill, Kebbabert, who is probably paid by Oracle to post crap all over the internet. If he is not paid then that's just sad...
  • wingar - Saturday, July 23, 2016 - link

    Ohhh yes I am well aware, I encounter him on El Reg and other places all the time. But hey, I hate shills, so I'm quite happy to destroy any sense of credibility he may have for those not in the know.
  • tipoo - Tuesday, July 26, 2016 - link

    Ooh, good callout. It would almost be weirder if Oracle *didn't* pay him after all those links, lol.
  • Kevin G - Wednesday, July 27, 2016 - link

    Here is an interview with him (in Swedish) about how he was invited by then Sun to a party for his efforts:
    http://it24.idg.se/2.2275/1.202161/staende-ovation...
  • wingar - Thursday, July 28, 2016 - link

    I'd call it sad, really. Very sad.
  • alpha754293 - Wednesday, July 27, 2016 - link

    Dude, SPARC sucks.

    Look at SWaP and TCO. Re-run your "analysis", it's obvious that SPARC sucks.

    Can you even RUN Ubuntu on SPARC anymore?

    Their FP performance sucks and it always has. That's why the Niagara T2 had to have FPUs ADDED to ALL of the cores, because sharing a single FPU among 8 cores was a really bad/dumb idea.

    I've looked at SPARC before. Had a couple of them and had a SunFire server before as well, and POWER/Intels can easily beat SPARC, especially once you consider TCO.

    The company that I work for now (a Fortune 10 company) dumped all of the SPARC workstations for Intel.
  • RISC is RISKY! - Tuesday, August 2, 2016 - link

    I would support "Brutalizer". Every processor has its strengths and weaknesses. If memory architecture is considered, for the same capacity, Intel's memory is congested, IBM's is very distributed and Oracle-Sun's is something in between. So Intel will always have a memory B/W problem. IBM has a memory efficiency problem. Oracle in theory doesn't have a problem, but with 2 DIMMs per channel it looks like it does. Oracle-Sun is for highly branched workloads in the real world. Intel is for 1T/core, more single-threaded workloads, and IBM is for mixed workloads with 2T-4T/core priority. So supercomputing workloads will run fast on IBM now, compared to Intel and SPARC, while analytics, graph and other distributed workloads will run faster on SPARC M7 and S7 (although the S7 is resource limited). For Intel, a soft mix of applications and a highly customized OS is better. Leave aside the business decisions and the sales price. List prices are twice as much as sales prices in the real world. These three processors (Xeon E5 v4, POWER8-9, SPARC M7-S7) are thoroughly tuned for different work spaces with very little overlap. So there's no point in comparing them other than on their specs. It's like comparing a falcon, a lion and a swordfish. Their environments are different even though all of them hunt. That's the real world. So benchmarks are not the real proof. We at the university of IITD have lots and lots of Intel Xeon E5 v4, some P8 (10-15 single and dual sockets), and very few SPARC systems (1-2 two-socket M7 and 2 two-socket S7). We run anything and everything on any of these that we get our hands on. And this is the real world conclusion. So don't fight. It's a context-centric supply of processors!
