Energy and Pricing

Unfortunately, we were unable to compare system-level energy consumption between the S822L and the other systems accurately and fairly, as the hardware configurations differed in quite a few ways. For example, the IBM S822L had two SAS controllers, and we had no idea how power hungry the chip under the copper heatsink was. Still, there is no doubt that the two CPUs are by far the largest power consumers when the server is under load. In the case of the IBM system, the Centaur chips take their fair share too, but those chips are not optional. So we can only get a very rough idea of how the power consumption compares.

Xeon E5-2699 v3/POWER8 Comparison (System)
Feature                    2x Xeon E5-2699v3    2x IBM POWER8 3.4 10c (IBM S822L)
Idle                       110-120W             360-380W
Running NAMD (FP)          540-560W             700-740W
Running 7-zip (Integer)    300-350W             780-800W

The Haswell core was engineered for mobile use, and there is no denying that Intel's engineers are masters at saving power at low load.


The mighty POWER8 is cooled by a huge heatsink

IBM's POWER8 has pretty advanced power management: besides p-states, power gating of cores and the associated L3 cache should be possible. However, it seems that these features were not enabled out of the box, as idle power was quite high. To be fair, we spent much more time on getting our software ported and tuned than on finding the optimal power settings. In the limited time we had with the machine, producing some decent benchmarking numbers was our top priority.

Also, the Centaur chips consume about 16W per chip (Typical, 20W TDP) and as we had 8 of them inside our S822L, those chips could easily be responsible for consuming around 100W.
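As a quick sanity check on that estimate, the per-chip figures can simply be multiplied out. This is only a rough sketch: the wattages and chip count come from the paragraph above, while the assumption that all eight chips draw the same amount at the same time is a simplification, and the function name is made up for illustration.

```python
# Figures quoted in the text above.
CENTAUR_TYPICAL_W = 16  # typical draw per Centaur chip, in watts
CENTAUR_TDP_W = 20      # TDP per Centaur chip, in watts
CENTAUR_COUNT = 8       # Centaur chips in the tested S822L


def centaur_budget(count, watts_per_chip):
    """Total draw if every chip consumes the same number of watts (an assumption)."""
    return count * watts_per_chip


typical_total = centaur_budget(CENTAUR_COUNT, CENTAUR_TYPICAL_W)
tdp_total = centaur_budget(CENTAUR_COUNT, CENTAUR_TDP_W)

print(f"Typical: {typical_total} W, TDP ceiling: {tdp_total} W")
```

So the memory buffers alone plausibly account for something on the order of a hundred watts, in line with the rough figure above.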

Interestingly, the IBM POWER8 consumes more energy processing integers than floating point numbers, which is the exact opposite of the Xeon: it consumes vastly more when crunching AVX/FP code.
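To put rough numbers on these observations, the measured ranges from the system comparison table can be reduced to midpoints and compared. This is a back-of-the-envelope sketch, not part of our test methodology: taking the midpoint of each range is an assumption, and the dictionary keys and function names are made up for illustration.

```python
# Measured system power ranges in watts, copied from the comparison table.
POWER_W = {
    "xeon": {"idle": (110, 120), "namd_fp": (540, 560), "7zip_int": (300, 350)},
    "power8": {"idle": (360, 380), "namd_fp": (700, 740), "7zip_int": (780, 800)},
}


def midpoint(watt_range):
    """Midpoint of a (low, high) range; a simplification, not a measurement."""
    low, high = watt_range
    return (low + high) / 2


def ratio(workload):
    """How many times more power the S822L drew than the Xeon system."""
    return midpoint(POWER_W["power8"][workload]) / midpoint(POWER_W["xeon"][workload])


def int_minus_fp(system):
    """Positive if the integer run (7-zip) drew more than the FP run (NAMD)."""
    loads = POWER_W[system]
    return midpoint(loads["7zip_int"]) - midpoint(loads["namd_fp"])


for workload in ("idle", "namd_fp", "7zip_int"):
    print(f"{workload}: S822L draws {ratio(workload):.1f}x the Xeon system")
for system in POWER_W:
    print(f"{system}: integer minus FP = {int_minus_fp(system):+.0f} W")
```

With these midpoints the gap is largest at idle and smallest under the FP load, and the sign of the integer-minus-FP difference flips between the two systems.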

Pricing

Though the cost of buying a system might be only "a drop in the bucket" in the TCO picture in traditional IT departments running expensive ERP applications, it is an important factor for almost everybody else who buys Xeon systems. It is important to note that the list prices on IBM's website are too high, which is a bad habit typical of tier-one OEMs.

Thankfully, we managed to get some "real street prices", which are between 30% (one server) and 50% (many servers) lower. To that end, we compared the price of the S822L with a discounted Dell R730 system. The list below is not complete, as we only show the cost of the most important components. The idea is to focus on the total system price and show which components contribute the most to the total system cost.
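The discount range translates into street prices as follows. This is a minimal sketch: the 30% and 50% figures come from the paragraph above, while the $20,000 list price is purely hypothetical and not an actual IBM quote. Percentages are kept as integers to avoid floating-point rounding surprises.

```python
def street_price_range(list_price, min_discount_pct=30, max_discount_pct=50):
    """Return (lowest, highest) street price for a given list price.

    The default discounts are the 30% (one server) and 50% (many servers)
    figures mentioned in the text.
    """
    lowest = list_price * (100 - max_discount_pct) / 100
    highest = list_price * (100 - min_discount_pct) / 100
    return lowest, highest


# Hypothetical list price, for illustration only.
HYPOTHETICAL_LIST_PRICE = 20_000

low, high = street_price_range(HYPOTHETICAL_LIST_PRICE)
print(f"Street price between ${low:,.0f} and ${high:,.0f}")
```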

Xeon E5 v3/POWER8 Price Comparison
Feature                         Dell R730                               IBM S822L
                                Type                 Price              Type                     Price
Chassis                         R730                 N/A                S822L                    N/A
Processor                       2x E5-2697           $5000              2x POWER8 3.42           $3000
RAM                             8x 16GB DDR4 DIMM    $2150              8x 16 GB CDIMM (DDR3)    $8000
PSU                             2x 1100W             $500               2x 1400W                 $1000
Disks                           SATA or SSD          Starting at $200   SAS HD/SSD               +/- $450
Total system price (approx.)                         $10k                                        $15k

With more or less comparable specs, the S822L was about 50% more expensive. However, it was almost impossible to make an apples-to-apples comparison. The biggest "price issue" is the CDIMMs, which are almost 4 times as expensive as "normal" RDIMMs. CDIMMs offer more, as they include an L4 cache and some extra features (such as a redundant memory chip for every 9 chips). For most typical current Xeon E5 customers, the cost issue will be important. For a few, the extra redundancy and higher bandwidth will be interesting. Less important, but still significant, is the fact that IBM uses SAS disks, which increase the cost of the storage system, especially if you want lots of them.
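The ratios quoted here follow directly from the price table. The snippet below is just the arithmetic written out: the component prices are copied from the table, the variable names are made up for illustration, and the listed components deliberately do not add up to the approximate totals because the chassis is N/A and the disk prices are only starting points.

```python
# Component prices in US dollars, copied from the price table.
# The chassis is listed as N/A and the disk prices are "starting at" or
# approximate figures, so these sums are only partial.
DELL_R730 = {"processor": 5000, "ram": 2150, "psu": 500, "disks": 200}
IBM_S822L = {"processor": 3000, "ram": 8000, "psu": 1000, "disks": 450}
APPROX_TOTAL = {"dell": 10_000, "ibm": 15_000}

dell_listed = sum(DELL_R730.values())
ibm_listed = sum(IBM_S822L.values())

# CDIMM versus RDIMM cost for the same 8x 16 GB capacity.
ram_ratio = IBM_S822L["ram"] / DELL_R730["ram"]
# Overall premium of the S822L, based on the approximate totals.
total_ratio = APPROX_TOTAL["ibm"] / APPROX_TOTAL["dell"]
# Share of the approximate S822L price that goes to memory.
ibm_ram_share = IBM_S822L["ram"] / APPROX_TOTAL["ibm"]

print(f"Listed components: Dell ${dell_listed}, IBM ${ibm_listed}")
print(f"CDIMMs cost {ram_ratio:.1f}x the RDIMMs")
print(f"S822L total is {total_ratio:.1f}x the R730 total")
print(f"Memory is {ibm_ram_share:.0%} of the approximate S822L price")
```

Seen this way, the memory is the single largest line item on the IBM side, which is why the CDIMMs dominate the price discussion.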

This cost issue will be much less important on most third-party POWER8 systems. Tyan's "Habanero" system, for example, integrates the Centaur chips on the motherboard. That makes the motherboard more expensive, but you can use standard registered DDR3L RDIMMs, which are much cheaper. Meanwhile, the POWER8 processor tends to be very reasonably priced, at around $1500. That is what Dell would charge for an Intel Xeon E5-2670 (12 cores at 2.3-2.6 GHz, 120W). So while Intel's Xeons are much more power efficient than the POWER8 chips, the latter tend to be quite a bit cheaper.

146 Comments

  • jesperfrimann - Monday, November 9, 2015 - link

    Well, I think you should kick Franz Bourlet, for not hooking you up with an IBM technical advocate who actually knew the technology. Such a person could have shown you the ropes and helped you understand the kit better. Again, Franz is a sales guy.

    IMHO selecting Ubuntu as the Linux distro did not help you. It's new to the POWER platform and does not have the same robustness as, for example, SLES, which has been around for 10+ years on POWER.

    The fact that you are getting better results using gcc generated code rather than xLC, shows me that something is not right.
    And that the IBM JDK isn't working is, well, also an indicator that something is not right.
    IMHO selecting Ubuntu did not make things easier for you guys.

    And for really optimized code you need to install and use High performance math libraries for POWER (MASS), which is an addon math library.

    And AFAIR having 8 memory modules, only enables half the memory bandwidth of the system.

    So IMHO IBM didn't help you make their system look good.

    But again that is what you get when you get rid of all the clever people :)

    // Jesper
  • nils_ - Wednesday, November 11, 2015 - link

    You can always rent a box at OVH, they offer a huge chunk of an OpenPower System, albeit virtualized through Runlabs.
  • stefstef - Sunday, November 8, 2015 - link

    compared to the pentium 4 the mips r16k with loads of l3 cache was a bzip2 beast, outperforming the pentium 4 which ran at twice the clock speed and more. despite that the usage of zip programs is what these server processors are built.
  • mapesdhs - Tuesday, November 10, 2015 - link

    Just curious, do you know of any comparative results anywhere for bzip2 on old MIPS vs. other CPUs? It's not something I've seen mentioned before, at least not with respect to SGIs, but perhaps I can run some tests the next time I obtain a quad-R16K/1GHz (16MB L2) Tezro. Best I have is only an R16K/900MHz (8MB L2) single-CPU Fuel and various configs of Tezro and Onyx350 from 4 to 16x 700MHz with 8MB L2. Just a pity SGI never got to employ multi-core MIPS (it was planned, but alas never happened).

    Oddly, back when current, MIPS' real strength was fp. Over time it fell behind badly for general int, though for SGI's core markets that didn't really matter ("It's the bandwidth, stupid!" - famous quote from Mashey IIRC). MIPS could have caught up with MDMX and MIPS V ISA, especially with the initially intended merged Cray vector stuff, but again that all fell away once the design talent moved to Intel in 1996/7.

    Ian.
  • Freen the merciless - Sunday, November 8, 2015 - link

    Heh! Sparc T5 eats Xeon and power for breakfast.
  • kgardas - Monday, November 9, 2015 - link

    I guess you mean T7 with SPARC M7 inside and not T5. If so, then yes, M7 looks quite capable, but unfortunately provides a horrible price/performance ratio. A POWER8 box starts at ~$6.5k while the T7-1 starts at ~$40k. So on the SPARC front we'll need to see if Oracle is going to change that with the Sonoma chip.
  • Michael Bay - Monday, November 9, 2015 - link

    In parallel only.
  • aryonoco - Tuesday, November 10, 2015 - link

    Thank you Johan for this amazingly well written and well researched article.

    I have to agree with a few people here that question your choice of using LE Ubuntu to test. Traditionally people who use Linux on POWER use SUSE, and some use RHEL, but Ubuntu? Nothing against them, and I love apt, but it's just not a mature platform.

    Try with something more representative such as BE SLES and you will find a vastly different level of ecosystem maturity.

    But thanks again, and also thanks to AT for caring about such subjects and publishing these tests.
  • JohanAnandtech - Wednesday, November 11, 2015 - link

    Thank you for taking the time to write up some constructive feedback. I have years of experience with Ubuntu and Linux and I wanted to play it safe. Running benchmarks on "new" hardware with a new ISA (from my perspective) is pretty complex. C-ray and 7-zip are the only exceptions, but most real server apps (NAMD, ElasticSearch, Spark) depend on many layers of software.

    In theory the OS/distro is more important to get applications working than the ISA. In practice, it might have been better to bet on the distro with the most maturity and adapt our scripts and installation procedures to SUSE.

    But as soon as I get the chance, I'll try out BE suse or redhat on a POWER system.
  • mapesdhs - Tuesday, November 10, 2015 - link

    Johan,

    A minor point, please note my home page for C-ray is here:

    http://www.sgidepot.co.uk/c-ray.html

    Blinkenlights is just a mirror, and not the primary mirror either (that would be the vintagecomputers site).

    Btw, it's a pity you didn't use the same image sizes & settings as used on the main c-ray site, because then I could have included the results on my page (ie. 'sphfract' at 800x600, 1024x768 with 8X oversampling, and 7500x3500), or did you just use the same settings that Phoronix employs?

    Also, John Tsiombikas, the guy who wrote C-ray, told me some interesting things about the test and how it works (info included on the page), most especially that it is highly vulnerable to compiler optimisations which can produce results that are even less realistic than real life workloads. I'm glad though that you did at least use the sphfract test, since at a sensible resolution or with oversampling it easily pushes the test out of just L1 (the 'scene' test is much smaller). But yeah, overall, c-ray was never intended to be used as a benchmark, it's just taken off somehow, perhaps because the scanline method of threading makes it scale very well.

    Hmm, I really must sort out the page formatting one of these days, and move the most complex test tables to the top. Never seem to find the time...

    Thanks!!

    Ian.

    PS. I always obtained the best results by having more threads than the no. of cores/CPUs, or is this something which doesn't work with non-MIPS systems?
