Future Visions, Cont: POWERed by NVIDIA

We will have to check for ourselves, of course, but IBM claims that compared to a dual K80 setup, a dual P100 setup delivers a 2.07x speedup on the S822LC HPC, while the same dual P100 configuration on a fast Xeon with PCIe 3.0 only saw a 1.5x speedup. The benchmark used was the rather exotic Lattice QCD, a numerical approach to solving quantum chromodynamics.

However, IBM reports that NVLink removes performance bottlenecks in:

  1. FFT (signal processing)
  2. STAC-A2 (risk analysis)
  3. CPMD (computational chemistry)
  4. Hash tables (used in many algorithms, security, and big data)
  5. Spark (big data analytics)

Those got our attention, as they are not exotic niche HPC applications, but widespread software components/frameworks used in both the HPC and data analytics worlds.

NVIDIA also claims that thanks to NVLink and the improved page migration engine, a new breed of GPU-accelerated applications will be possible. The unified memory space introduced with CUDA 6 on Kepler was a huge step forward for CUDA programmers: they no longer had to explicitly copy data between the CPU and the GPU. The Page Migration Engine would do that for them.
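
To make the difference concrete, here is a minimal sketch (our illustration, not code from IBM or NVIDIA) contrasting the explicit-copy style with the unified memory style, where cudaMallocManaged returns one pointer valid on both processors:

```
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // The old way: separate host and device buffers, explicit copies.
    float *host = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) host[i] = 1.0f;
    float *dev;
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    // With unified memory (CUDA 6+): one pointer, usable on CPU and GPU.
    // The Page Migration Engine moves the pages; no explicit copies.
    float *um;
    cudaMallocManaged(&um, bytes);
    for (int i = 0; i < n; i++) um[i] = 1.0f;      // written on the CPU
    scale<<<(n + 255) / 256, 256>>>(um, 2.0f, n);  // read/written on the GPU
    cudaDeviceSynchronize();
    printf("%f\n", um[0]);                          // read back on the CPU

    cudaFree(dev);
    cudaFree(um);
    free(host);
    return 0;
}
```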

But the implementation on Kepler and Maxwell also had quite a few limitations. For example, the memory space where the CPU and GPU share data was limited to the size of the GPU memory (typically 8-16 GB). The P100 now gets 49-bit virtual addressing, which means CUDA programs can treat every available byte of RAM as one big virtual space. In the case of the newly launched S822LC, this means up to 1 TB of DRAM, and consequently a 1 TB memory space. Secondly, the whole virtual address space is coherent thanks to the new page fault mechanism: the CPU and the GPU can both access DRAM. This requires OS support, and NVIDIA cooperated with the Linux community to make it happen.
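
A hedged sketch of what this enables (sizes are illustrative, and this assumes enough system DRAM to back the allocation): a managed allocation larger than the GPU's own memory, something that would simply fail on Kepler or Maxwell, with pages migrating on demand via hardware page faults:

```
#include <cuda_runtime.h>
#include <cstdio>

__global__ void touch(char *buf, size_t n, size_t stride) {
    size_t i = (blockIdx.x * (size_t)blockDim.x + threadIdx.x) * stride;
    if (i < n) buf[i] += 1;   // each first touch can page-fault and migrate
}

int main() {
    // 32 GB: deliberately larger than the 16 GB on a P100. On Pascal this
    // allocation can succeed (backed by system DRAM); on Kepler/Maxwell,
    // managed allocations were capped by the GPU memory size.
    const size_t n = 32ULL << 30;
    char *buf;
    if (cudaMallocManaged(&buf, n) != cudaSuccess) {
        printf("allocation failed\n");
        return 1;
    }

    buf[0] = 42;                             // touched first on the CPU
    touch<<<64, 256>>>(buf, n, 2ULL << 20);  // GPU touches one byte per 2 MB
    cudaDeviceSynchronize();                 // faults migrate only the pages used
    printf("%d\n", buf[0]);                  // CPU sees the updated value

    cudaFree(buf);
    return 0;
}
```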

Of course, as the unified memory space gets larger, the amount of data to transfer back and forth gets larger too, and that is where NVLink and the extra memory bandwidth of the POWER8 have a large advantage. Remember that even the POWER8 with only 4 buffer chips delivered twice as much memory bandwidth as the best Xeons. The higher-end POWER8s have 8 buffer chips, and as a result offer almost twice as much memory bandwidth again.

NVLink, together with the beefy memory subsystem of the POWER8, ensures that CUDA applications using such a unified 1 TB memory space can actually work well.
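
CUDA 8 also lets the programmer steer the migration engine explicitly: cudaMemPrefetchAsync moves managed pages to a given processor ahead of use, so a large working set streams across the interconnect (NVLink on the S822LC) in bulk rather than trickling in through page faults. A minimal sketch, with the size and device ID being illustrative:

```
#include <cuda_runtime.h>

__global__ void process(float *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 0.5f + 1.0f;
}

int main() {
    const size_t n = 256ULL << 20;                   // 1 GB worth of floats
    const size_t bytes = n * sizeof(float);
    const int gpu = 0;                               // illustrative device ID

    float *data;
    cudaMallocManaged(&data, bytes);
    for (size_t i = 0; i < n; i++) data[i] = 1.0f;   // populated on the CPU

    // Bulk-migrate the whole range to the GPU before launching: one big
    // streaming transfer instead of thousands of on-demand page faults
    // during the kernel.
    cudaMemPrefetchAsync(data, bytes, gpu, 0);
    process<<<(unsigned)((n + 255) / 256), 256>>>(data, n);

    // And back to the CPU (cudaCpuDeviceId) before host code reads results.
    cudaMemPrefetchAsync(data, bytes, cudaCpuDeviceId, 0);
    cudaDeviceSynchronize();

    cudaFree(data);
    return 0;
}
```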


The POWER8 - all heatsinks - looks less hot-headed now that it has the company of 4 Tesla P100 GPUs...

The S822LC will cost less than $50,000, and if you ask us, it offers a lot of FLOPS per dollar. Consider that a single Tesla P100 SXM2 alone costs around $9,500; the S822LC integrates four of them, two 10-core POWER8s, and 256 GB of RAM. More than 21 TFLOPS of FP64 (four P100s at roughly 5.3 TFLOPS each) connected by the latest and greatest interconnects in a 2U box: the S822LC HPC is going to turn some heads.

Last but not least, note that once you add two or more GPUs that consume 300W each, the biggest disadvantage of the POWER8 almost literally melts away. The fact that each POWER8 CPU may consume 45-100W more than a high-performance Xeon suddenly becomes relative, and no longer such a deal breaker, especially in the HPC world, where performance matters more than watts.

Comments

  • PowerOfFacts - Friday, September 16, 2016 - link

    troll
  • BOMBOVA - Friday, October 7, 2016 - link

    Rich info, good scout
  • PowerOfFacts - Friday, September 16, 2016 - link

    Sigh ....
  • PowerOfFacts - Friday, September 16, 2016 - link

    That's strange, this site says you can buy a POWER8 server for $4800. https://www.ibm.com/marketplace/cloud/big-data-inf...

    Screwed up Power (so many times)? Please explain? Compared to what....SPARC? Itanium? If you are talking about those platforms, POWER has 70% of that market share. Do you mean against "Good Enough" Intel? Absolutely Intel is the market leader, but only in share, as it isn't in innovation. Power still delivers enterprise features for AIX and IBM i customers that Intel could only dream about. Where the future of the data center is going with Linux, well, it did take IBM a while to figure out they couldn't do it their way. Now, they are committed 100% (from my perspective as a non-IBMer, while also being committed to AIX & IBM i, as there is a solid install base there), which we all see in the form of IBM & even non-IBM solutions built by OpenPOWER partners and ISV solutions using little endian Linux. Yes, there are some workloads that require extra work to optimize, but for those already optimized or those which can be optimized, those customers can now buy a server for less money that has the potential to outperform Intel by up to 2X, in a system using innovative technology (CAPI & NVLink) that is more reliable. I don't know, IBM may be late and Power has some work to do, but I really don't think you can back up your statement that "IBM has screwed up power so many times". The latest OpenPOWER Summit was a huge success. Here is a Google interview https://www.youtube.com/watch?v=f0qTLlvUB-s&fe...

    Oh, but you were probably just trying to be clever and take a few competitive shots.
  • CajunArson - Saturday, September 17, 2016 - link

    Yeah, that $4800 Power server wasn't nearly equivalent to what was benchmarked in this review with the "midrange" server that costs over $11K on the same web page you cited.

    I could build an 8 or 12 core Xeon that would put the hurt on that low-end Power box for less money and continue to save money during every minute of operation.
  • JohanAnandtech - Saturday, September 17, 2016 - link

    " it will cost anywhere from 5-10X" . What do you base this on? Several SKUs of IBM are in the $1500 range. "Something like $10K for the processor". This seems to be about the high-end. The E7s are in the $4.6-7k range. Even if IBM would charge $10k for the high end CPUs, it is nowhere near being 5x more expensive. Unless I am missing something, you seem to have missed that IBM has a scale out range and is offering much more affordable OpenPOWER CPUs.
  • jesperfrimann - Wednesday, September 21, 2016 - link

    IMHO, the place where POWER servers make sense right now is for use with IBM software. So if you are using something like DB2 or WebSphere, where the real cost is the software licenses,
    then it's really a no-brainer. Not that your local IBM sales guy will like that you'll do a switch to a Linux@Power solution :)

    // Jesper
  • YukaKun - Thursday, September 15, 2016 - link

    For the Java tests, did you change the GC collector settings? Also, why only 24GB for the JVM? I run JBoss with 32GB across our servers. I'd use more, but they still have issues with going to higher levels.

    Cheers!
  • madwolfa - Thursday, September 15, 2016 - link

    Unless working with huge datasets you want to keep your JVM heap size as reasonably low as possible... otherwise there would be a penalty on GC performance. Granted, with this sort of hardware it would be pretty minuscule, but the general rule of thumb still applies...
  • JohanAnandtech - Thursday, September 15, 2016 - link

    No changes to the GC collector settings. 24 GB per JVM: 4x 24 GB heap + 4x 3 GB for the Transaction Injectors + 2 GB for the controller = +/- 110 GB of memory. We wanted to stay within 128 GB, as most of our DIMMs are 16 GB DDR4-2400/2133.
