Future Visions: POWER8 with NVLink

Digging a bit deeper, the shiny new S822LC is a different beast. It offers the "NVIDIA improved" POWER8. The core remains the same, but the CPU now comes with NVIDIA's NVLink technology. Four of these NVLink ports allow the S822LC to make a very fast (80 GB/s full duplex) and direct link with the latest and greatest of NVIDIA's GPUs: the Tesla P100. Ryan discussed NVLink and the 16 nm P100 in more detail a few months ago. I quote:

NVLink will allow GPUs to connect to either each other or to supporting CPUs (OpenPOWER), offering a higher bandwidth cache coherent link than what PCIe 3 offers. This link will be important for NVIDIA for a number of reasons, as their scalability and unified memory plans are built around its functionality.

Each P100 has 720 GB/s of memory bandwidth, powered by 16 GB of HBM2 stacked memory. Combine that with the fact that the P100 has more than twice the processing power of its predecessor in half-precision and double-precision floating point (important for machine learning algorithms), and it is easy to understand why data transfers from the CPU to the GPU can easily become a bottleneck in some applications.

This means that the "OpenPOWER way of working" has enabled the IBM POWER8 to be the first platform to fully leverage the best of NVIDIA's technology. It is almost certain that Intel will not add NVLink to its products, as Intel went a totally different route with the Xeon and Xeon Phi. NVLink offers 80 GB/s of full-duplex connectivity per GPU, provided in the form of four 20 GB/s connections that can be routed between GPUs and CPUs as needed. By comparison, a P100 that plugs into an x16 PCIe 3.0 slot only gets 16 GB/s full duplex to communicate with both the CPU and the other GPUs. So theoretically, a quad NVLink setup from GPU to CPU offers at least 2.5 times more bandwidth. However, IBM claims that in reality the advantage is 2.8x, as NVLink is more efficient than PCIe (83% of theoretical bandwidth vs. 74%).
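The arithmetic behind these figures can be checked in a few lines. The sketch below is ours, not IBM's or NVIDIA's. It assumes that the 2.5x figure compares 40 GB/s (two of the four 20 GB/s links routed to the CPU, which is our assumption) against the 16 GB/s of a PCIe 3.0 x16 slot, and then applies the efficiency percentages quoted above. All variable names are illustrative.

```python
# Back-of-the-envelope check of the bandwidth figures quoted in the text.
# Every input comes from the article except LINKS_TO_CPU, which is an
# ASSUMPTION made to reproduce the 2.5x theoretical ratio.

NVLINK_GBPS_PER_LINK = 20.0   # GB/s per NVLink connection
LINKS_PER_GPU = 4             # NVLink connections per P100
LINKS_TO_CPU = 2              # ASSUMPTION: links routed from one GPU to the CPU
PCIE3_X16_GBPS = 16.0         # GB/s for an x16 PCIe 3.0 slot
NVLINK_EFFICIENCY = 0.83      # share of theoretical bandwidth (IBM's claim)
PCIE_EFFICIENCY = 0.74        # share of theoretical bandwidth (IBM's claim)

# Total NVLink bandwidth available to one GPU.
total_nvlink = NVLINK_GBPS_PER_LINK * LINKS_PER_GPU

# Theoretical GPU-to-CPU bandwidth under the two-link assumption.
gpu_to_cpu = NVLINK_GBPS_PER_LINK * LINKS_TO_CPU
theoretical_ratio = gpu_to_cpu / PCIE3_X16_GBPS

# Effective ratio once each interconnect's efficiency is applied.
effective_ratio = (gpu_to_cpu * NVLINK_EFFICIENCY) / (PCIE3_X16_GBPS * PCIE_EFFICIENCY)

print(f"Total NVLink bandwidth per GPU: {total_nvlink:.0f} GB/s")
print(f"Theoretical advantage over PCIe: {theoretical_ratio:.1f}x")
print(f"Effective advantage over PCIe:   {effective_ratio:.1f}x")
```

Under that assumption the script reproduces both numbers: 2.5x on paper and roughly 2.8x after efficiency is taken into account. If all four links were counted against a single PCIe slot, the theoretical ratio would be 5x, which is still consistent with "at least 2.5 times."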

The NVLink-equipped P100 cards make use of the SXM2 form factor and come with a bonus: they deliver 13% more raw compute performance than the "classic" PCIe card thanks to the higher TDP (300 W vs. 250 W). By the numbers, this amounts to 5.3 TFLOPS double precision for the SXM2 version, versus 4.7 TFLOPS for the PCIe version.
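As a quick sanity check on the 13% figure, the ratio of the two quoted throughput numbers can be computed directly. The sketch below also derives a rough throughput-per-watt comparison; that part treats TDP as a stand-in for actual power draw, which is our simplifying assumption and not something the vendors state.

```python
# Sanity check of the SXM2 vs. PCIe P100 figures quoted in the text.

SXM2_TFLOPS_FP64 = 5.3   # double-precision TFLOPS, SXM2 version
PCIE_TFLOPS_FP64 = 4.7   # double-precision TFLOPS, PCIe version
SXM2_TDP_W = 300.0       # watts
PCIE_TDP_W = 250.0       # watts

# Relative gain in raw compute and relative increase in TDP.
compute_gain = SXM2_TFLOPS_FP64 / PCIE_TFLOPS_FP64 - 1.0
tdp_increase = SXM2_TDP_W / PCIE_TDP_W - 1.0

# ASSUMPTION: TDP is used as a proxy for real power consumption.
sxm2_gflops_per_watt = SXM2_TFLOPS_FP64 * 1000.0 / SXM2_TDP_W
pcie_gflops_per_watt = PCIE_TFLOPS_FP64 * 1000.0 / PCIE_TDP_W

print(f"Compute gain of SXM2 over PCIe: {compute_gain * 100:.0f}%")
print(f"TDP increase of SXM2 over PCIe: {tdp_increase * 100:.0f}%")
print(f"SXM2: {sxm2_gflops_per_watt:.1f} GFLOPS/W, PCIe: {pcie_gflops_per_watt:.1f} GFLOPS/W")
```

The compute gain rounds to 13%, matching the text, while the TDP rises by 20%. On paper, then, the PCIe card is slightly more efficient per watt, and the SXM2 card trades some of that efficiency for higher absolute throughput.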


49 Comments


  • Eden-K121D - Thursday, September 15, 2016

    Can't wait for Power9
  • Kevin G - Thursday, September 15, 2016

    Same here. I'm really curious about the differences between the four different dies IBM will be offering. Certainly the mix of two core types and IO types should fill the assorted niches found in the server market.
  • rahvin - Thursday, September 15, 2016

    I can wait. It will be a market share failure like every other Power because IBM will price it out of any sensible price range. Going by previous attempts, it will cost anywhere from 5-10x as much as an equivalent amount of x86 processing power. Something like $10K for the processor and another $2-5K for the case, memory and motherboard, and it will be equivalent to a quad x86 Xeon server that costs $5K for the same hardware.

    No one that doesn't need some special sauce it provides will buy them, particularly because you'd have to recompile all your software to use it. IBM has screwed up power so many times at this point that you'd have to be a fool to bet on it.
  • Eden-K121D - Friday, September 16, 2016

    Tell that to Google
  • Brutalizer - Friday, September 16, 2016

    Power9 will be 50% - 125% faster than power8, according to IBM.
    http://www.nextplatform.com/wp-content/uploads/201...
    On average it will be 75% faster.

    The specjbb2013 benchmark is broken; SPEC discovered the benchmark can be vendor-optimized to provide false results, so they fixed it in specjbb2015. IBM have released specjbb2015 numbers for their S812LC server, achieving 44,900 max-jops and 13,000 critical-jops. That is almost as good as the Intel Xeon E5-2699v4 result. However, what is interesting is the critical-jops, which measures critical throughput under SLAs. IBM have said they will compete with Intel with their Power9.

    (Of course, one SPARC M7 CPU achieves 120,600 max-jops and 60,300 critical-jops; that is 2.7x faster max-jops and 4.6x faster critical-jops. This is not using the built-in hardware accelerators in SPARC. Next year the SPARC M8 arrives, which is 2x faster than M7. To date, Oracle have released six CPUs in six years, each doubling performance (except the low-cost S7, which is a crippled M7).)
  • wingar - Friday, September 16, 2016

    I do like how you come with a comment that's incendiary towards POWER8 and POWER9, doing what you can to make it look worse... and then start touting how magical and wonderful SPARC M7 is. Using the same old Oracle-supplied performance claims without substantiating it. Funny, that. I think it stands out a little bit...

    But that's not what matters. If you run a simple Google search, "site:anandtech.com brutalizer", you'll find comments with not a lot of variety. Usually commenting on anything x86 and POWER8, and in every single one (Except this one, actually! You actually reference an IBM-supplied SPEC result. However, you should link to it next time.) you tout the wonder of the latest SPARC of the time. Linking to Oracle-supplied benchmarks on Oracle's own site, consistently concluding that Oracle outperforms their competitors. And every time you do this, the comment seems to be as close to the top of the comment list as possible, for visibility.

    Have some links.
    http://www.anandtech.com/comments/10158/the-intel-...
    http://www.anandtech.com/comments/9193/the-xeon-e7...
    http://www.anandtech.com/comments/10230/ibm-nvidia...
    http://www.anandtech.com/comments/9567/the-power-8...
    http://www.anandtech.com/comments/7757/quad-ivy-br...
    http://www.anandtech.com/comments/7852/intel-xeon-...
    http://www.anandtech.com/comments/7285/intel-xeon-...

    But I found a couple of comments you left that aren't anti-everyone-not-Oracle. Have some links.
    http://www.anandtech.com/comments/7334/a-look-at-a...
    http://www.anandtech.com/comments/7371/understandi...
    http://www.anandtech.com/comments/5831/amd-trinity...

    I'm sure there are more comments like this where you're actually adding to the conversation, but those are the few I found, and they're always unrelated to CPUs and the server market. They seem to perhaps reflect your own interests? But there is one thing to point out here, and that is that the first religiously-pro-Oracle comment you made seems to have been in 2014. What happened then? Did you buy the account? Did someone start paying you? I don't know.

    And hey, for fun I've actually posted this comment before to you, here's a link:
    http://www.anandtech.com/comments/10435/assessing-...
  • Brutalizer - Friday, September 16, 2016

    I am not doing anything to make Power look worse; I put it in perspective and post other benchmark numbers from Intel and Oracle so people can compare. Yes, I am posting hard facts that can be independently verified, or are you rejecting the benchmarks I post? Why? Why do you think it is a bad thing that I post benchmarks from vendors other than IBM? You don't want people to be able to form their own opinion about Power by comparing with other vendors? Why not? Why is it dangerous when someone quotes benchmarks from other vendors? What's the problem with that?

    If you insist, here are the SPARC M7 specjbb2015 results.
    https://blogs.oracle.com/BestPerf/entry/201511_spe...
  • PowerOfFacts - Friday, September 16, 2016

    troll
  • Brutalizer - Friday, September 16, 2016

    "...Using the same old Oracle-supplied performance claims without substantiating it..."

    Now this is the same old FUD from the IBM supporters. As I have explained, mathematicians can always prove their claims with links to benchmarks, white papers, research papers, or point to common comp sci knowledge, etc. So you are in deep sh-t now. I can always post links to the numbers I claim. You claim I cannot, and that I spread unsubstantiated information. Now you are lying about me.

    Quote me on any number in any post, and I will post links to prove my numbers. If you ever find any post (you will not find any) where I make up numbers out of the blue to discredit IBM or Intel, you are correct that I post unsubstantiated claims. If you cannot find any such posts by me, you are spreading FUD about me, and you lie about me. Now go ahead and quote me on any number where I make things up. I am waiting.

    It is not really smart to claim that a mathematician is not able to prove his figures. I am now able to prove you are a liar and FUDer.

    I think it is funny how the IBM supporters always FUD and try to discredit people instead of countering the benchmark numbers. I post benchmark numbers, and instead of trying to discuss the numbers, you always attack me. That is not the scientific way, to avoid the hard facts and instead try to discredit the opponent. You should try to dissect my numbers and links instead of attacking me. But always, always, the IBM crowd does that "oh, he is an Oracle supporter" thing. So what? You are an IBM supporter! The difference is that I post numbers, and the IBM crowd attacks me instead of countering with other numbers.

    If you want to disprove my claims about SPARC, post numbers that disprove my benchmarks. Do not attack me; that does not win you any discussions.
  • SarahKerrigan - Friday, September 16, 2016

    Sure, it's true that on SPECjbb2015 a T7-1 beats a low-end IBM Turismo machine, an S812LC (with an entry price under $5000 list, compared to over $30000 entry price for the T7-1), by a factor of 2.7x on max-jops. It's also true that M7 came out almost a year and a half after P8 did, and that you can get a dual-CPU P8 server with that same processor, and 256GB RAM, for well under half of the list price of a single-CPU T7-1 with 128GB.

    Starting to see why IBM has over 70% of the non-x86 server market?
