Floating Point: C-ray

Shifting over from integer to floating point benchmarks, we have C-ray. C-ray is an extremely simple ray-tracer which is not representative of any real world raytracing application. In fact, it is essentially a floating point benchmark that runs out of the L1-cache. Luckily it is not as synthetic and meaningless as Whetstone, as you can actually use the software to do simple raytracing. That is not the kind of benchmark we like to use to evaluate server CPUs, but since our first efforts to port some of our favorite applications to OpenPOWER failed, we settled for something easier. We knew we would have the POWER8 system for only a few weeks, so we had to play it safe.

First we compiled the C-ray multi-threaded version with -O3 -ffast-math. To understand the CPU performance better, we limited C-ray with taskset to one or two threads (CPU 0 and 18) on the Haswell-based Xeon and one to eight threads on the POWER8. We also kept the output resolution at 768x432 to keep the render times in check. The "sphfract" file was used as input.
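As a rough illustration of the pinning described above, the following Python sketch assembles a taskset command line for the two-thread Xeon run. The binary name `./c-ray-mt`, its `-t`, `-s`, `-i` and `-o` flags, and the output file name are assumptions made for this example rather than details taken from our test scripts; only the CPU list (0 and 18), the resolution and the "sphfract" input come from the text.

```python
import shlex


def pinned_cray_command(cpus, width, height, scene="sphfract", binary="./c-ray-mt"):
    """Build (but do not run) a taskset-pinned C-ray command line.

    The C-ray flags used here are assumptions for illustration.
    """
    cpu_list = ",".join(str(c) for c in cpus)
    return [
        "taskset", "-c", cpu_list,   # restrict the process to these logical CPUs
        binary,
        "-t", str(len(cpus)),        # assumed flag: one render thread per pinned CPU
        "-s", f"{width}x{height}",   # assumed flag: output resolution
        "-i", scene,                 # assumed flag: input scene file
        "-o", "out.ppm",             # assumed flag: hypothetical output file
    ]


# Two threads on one Haswell core: logical CPUs 0 and 18, as in the text.
xeon_two_threads = pinned_cray_command([0, 18], 768, 432)
print(shlex.join(xeon_two_threads))
```

Keeping the thread count equal to the number of pinned CPUs is a design choice of this sketch: it ensures each render thread gets its own hardware thread and none of them compete for the same one.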

C-ray 768x432 on one core

Real floating point intensive applications tend to put the memory subsystem under pressure, and running a second thread only makes it worse. So we are used to seeing many HPC applications perform worse with multi-threading on. But since C-ray runs mostly out of the L1-cache, we get different behavior. Still, 8 threads of floating point action seem to be too much: the POWER8 delivers the best FP performance at 4 threads. At this point, the POWER8 core is able to deliver 20% higher floating point performance than the Haswell Xeon.

Next we used all available threads, 160 on the POWER8 (20 cores x 8 SMT threads) or 72 on the Xeon (36 cores x 2 SMT threads), and increased the resolution to 3840x2160.

C-ray rendering at 3840x2160

With a core count that is 80% higher, there is nothing stopping the Xeon E5-2699 v3 from taking the top spot. Still, the POWER8 delivers solid performance and outperforms the slower Xeon E5-2695 v3 by 5%. Although the real world relevance of this benchmark is small, we now have an idea of how good the "basic FP" performance is. In real world applications, however, the use of AVX2/VSX and the available bandwidth will also play a role.
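The thread counts and the 80% core advantage quoted for the full-system run follow from simple arithmetic. A minimal sketch, assuming the 20-core and 36-core totals implied by the thread counts given in the text:

```python
# Core and SMT figures as given in the text (20 x 8 and 36 x 2).
power8_cores, power8_smt = 20, 8
xeon_cores, xeon_smt = 36, 2

power8_threads = power8_cores * power8_smt   # hardware threads on the POWER8 system
xeon_threads = xeon_cores * xeon_smt         # hardware threads on the Xeon E5-2699 v3 system

# Relative core-count advantage of the Xeon system over the POWER8 system.
core_advantage = (xeon_cores - power8_cores) / power8_cores

print(power8_threads, xeon_threads, f"{core_advantage:.0%}")
```

Note that the Xeon system has more cores but fewer hardware threads, which is why the comparison in the text is made on core count rather than thread count.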


146 Comments


  • hissatsu - Friday, November 6, 2015

    You might want to look more closely. Though it's a bit blurry, I'm almost certain that's the 80+ Platinum logo, which has no color.
  • DanNeely - Friday, November 6, 2015

    That's possible; it looks like there's something at the bottom of the logo. Google image search shows 80+ platinum as a lighter silver/gray than 80+ silver; white is only the original standard.
  • Shezal - Friday, November 6, 2015

    Just look up the part number. It's a Platinum :)
  • The12pAc - Thursday, November 19, 2015

    I have a S814, it's Platinum.
  • johnnycanadian - Friday, November 6, 2015

    Oh yum! THIS is what I still love about AT: non-mainstream previews / reviews. REALLY looking forward to more like this. I only wish SGI still built workstation-level machines. :-(
  • mapesdhs - Tuesday, November 10, 2015

    Indeed, but it'd need a hefty change in direction at SGI to get back into workstations again, so very unlikely for the foreseeable future. They certainly have the required base tech (NUMALink6, MPI offload, etc.), namely lots of sockets/cores/RAM coupled with GPUs for really heavy tasks (big data, GIS, medical, etc.), ie. a theoretical scalable, shared-memory workstation. But the market isn't interested in advanced performance solutions like this atm, and the margin on standard 2/4-socket systems isn't worthwhile, it'd be much cheaper to buy a generic Dell or HP (plus, it's only above this no. of sockets that their own unique tech comes into play). Pity, as the equivalent of a UV 30/300 workstation would be sweet (if expensive), though for virtually all of the tasks discussed in this article, shared memory tech isn't relevant anyway. The notion of connectable, scalable, shared memory workstations based on NV gfx, PCIe and newer multi-core MIPS CPUs was apparently brought up at SGI way back before the Rackable merger, but didn't go anywhere (not viable given the financial situation at the time). It's a neat concept, eg. imagine being able to connect two or more separate ordinary 2/4-socket XEON workstations together (each fitted with, say, a couple of M6000s) to form a single combined system with one OS instance and resources pool, allowing users to combine & split setups as required to match workloads, but it's a notion whose time has not yet come.

    Of course, what's missing entirely is the notion of advanced but costly custom gfx, but again there's no market for that atm either, at least not publicly. Maybe behind the scenes NV makes custom stuff the way SGI used to for relevant customers (DoD, Lockheed, etc.), but SGI's products always had some kind of commercially available equivalent from which the custom builds were derived (IRx gfx), whereas atm there's no such thing as a Quadro with 30000 cores and 100GB RAM that costs $50K and slides into more than one PCIe slot which anyone can buy if they have the moolah. :D

    Most of all though, even if the demand existed and the tech could be built, it'd never work unless SGI stopped using its pricing-is-secret reseller sales model. They should have adopted a direct sales setup long ago, order on the site, pricing configurator, etc., but that never happened, even though the lack of such an option killed a lot of sales. Less of an issue with the sort of products they sell atm, but a better sales model would be essential if they were to ever try to sell workstations again, and that'd need a huge PR/sales management clearout to be viable.

    Pity IBM couldn't pay NV to make custom gfx, that'd be interesting, but then IBM quit the workstation market as well.

    Ian.
  • mostlyharmless - Friday, November 6, 2015

    "There is definitely a market for such hugely expensive and robust server systems as high end RISC machines are good for about 50.000 servers. "

    Rounding error?
  • DanNeely - Friday, November 6, 2015

    50k clients would be my guess.
  • FunBunny2 - Friday, November 6, 2015

    (dot) versus (comma) most likely. Euro centric versus 'Murcan centric.
  • DanNeely - Friday, November 6, 2015

    If that was the case, a plain 50 would be much more appropriate.
