Scale-Out Big Data Benchmark: ElasticSearch

Elasticsearch is an open source, full-text search engine that can be run on a cluster relatively easily. It is basically an open source version of Google Search that can be deployed in an enterprise. It should be one of the poster children of scale-out software and is one of the representatives of the so-called "Big Data" technologies. Thanks to Kirth Lammens, one of the talented researchers at my lab, we have developed a benchmark that searches through all of the Wikipedia content (roughly 40GB). Elasticsearch is, like many Big Data technologies, built on Java.
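As a rough illustration of what a benchmark of this kind does, the sketch below sends full-text `match` queries to Elasticsearch's `_search` REST endpoint and measures how many complete per second. It is not the benchmark used in this article: the node address, the index name `wikipedia`, the field name `text`, and all helper names are assumptions made for the example.

```python
import json
import time
import urllib.request

ES_URL = "http://localhost:9200"   # assumed address of one cluster node
INDEX = "wikipedia"                # hypothetical index name


def build_query(terms, field="text", size=10):
    """Build the JSON body for a full-text match query (field name is assumed)."""
    return {"query": {"match": {field: terms}}, "size": size}


def search(terms):
    """POST one query to the _search endpoint and return the parsed reply."""
    body = json.dumps(build_query(terms)).encode("utf-8")
    req = urllib.request.Request(
        f"{ES_URL}/{INDEX}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_benchmark(queries, search_fn=search):
    """Run every query once and return (completed, queries_per_second)."""
    start = time.perf_counter()
    completed = 0
    for terms in queries:
        search_fn(terms)
        completed += 1
    elapsed = time.perf_counter() - start
    qps = completed / elapsed if elapsed > 0 else float("inf")
    return completed, qps
```

A realistic load test would issue queries from many concurrent clients and track response times as well as throughput; this single-threaded loop only shows the shape of one request.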

We are not sure why, but installing IBM's JDK caused a lot of headaches. For some reason the JVM stopped working in the middle of our tests. We got the same behavior running Apache Spark. This could be a result of our lack of experience with the IBM JDK, or of the fact that the Linux LE (little-endian) ecosystem is still young. To cut a long story short, we ended up using OpenJDK 8, which is part of the Ubuntu 15.04 distribution. OpenJDK is very similar to, and based upon the same code as, Oracle's HotSpot JDK.

We limited the systems to one socket to avoid garbage collection pauses and other scaling issues. There is a reason why many Java benchmarks on these massive machines use multiple JVMs.
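The article does not say how the one-socket limit was applied. One common approach on Linux is to pin the JVM process to the CPUs of the NUMA node (or nodes) that make up a single socket. The sketch below shows that idea under stated assumptions: the sysfs path is the usual Linux location, the mapping of node numbers to sockets depends on the machine, and the function names are hypothetical.

```python
import os
import subprocess


def parse_cpulist(text):
    """Expand a sysfs CPU list such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_of_node(node):
    """Read the CPUs that belong to one NUMA node (Linux sysfs)."""
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as f:
        return parse_cpulist(f.read())


def launch_pinned(cmd, nodes=(0,)):
    """Start cmd restricted to the CPUs of the given NUMA nodes.

    Assumption: the listed nodes together correspond to one socket.
    """
    cpus = set()
    for node in nodes:
        cpus |= cpus_of_node(node)
    # Linux-only call; the affinity is inherited by the child process.
    os.sched_setaffinity(0, cpus)
    return subprocess.Popen(cmd)
```

For example, `launch_pinned(["java", "-jar", "app.jar"])` would start a placeholder Java program on node 0 only. This restricts CPU placement but not memory placement; a tool such as `numactl` with `--cpunodebind` and `--membind` covers both.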

Elastic Search

Although the POWER8 can probably perform a bit better with the IBM JDK, performance is in the same league as the best Xeons. Meanwhile, as a further point of comparison, we also included the score of the Xeon D from our previous article.


146 Comments


  • JohanAnandtech - Saturday, November 7, 2015 - link

    Suggestions on how to do this? OpenSSL 1.0.2 will support the built-in crypto accelerator, but I am not sure how I would be able to see if the crypto code uses VMX.
  • SarahKerrigan - Monday, November 9, 2015 - link

    Compile with -qreport in XL C/C++.
  • Oxford Guy - Saturday, November 7, 2015 - link

    Typo on page 2:

    The resuls are that Google is supporting the efforts and Rackspace has even build their own OpenPOWER server called "Barreleye".
  • Ryan Smith - Saturday, November 7, 2015 - link

    Thanks.
  • iwod - Saturday, November 7, 2015 - link

    On a scale of 100, the POWER software ecosystem managed to go from 10 to 20, so that is a 100% increase but still very, very low. Will we see a POWER CPU / server that is cheap enough to compete with the Xeon E3 / E5, where most of the volume is? Comparing to the E7 is like comparing server CPUs for 10% of the market.

    Intel will be moving to 14nm E7, I don't see anyone making POWER CPU at 14nm anytime soon.

    Intel's DC business is growing, and it desperately needs a competitor, such as POWER to combat the E7 and AMD Zen from the bottom.
  • Frenetic Pony - Saturday, November 7, 2015 - link

    Nice review! It just confirms my question however of "What does IBM do?" Seriously, what do they do anymore? All I see are headlines for things that never come out as actual products. Their servers suck up too much power per watt, they don't have their own semiconductor foundries, their semiconductor research seems like a bunch of useless paper tiger stuff, their much vaunted AI is better at playing Jeopardy than seemingly any real world use.

    Countdown to complete IBM bankruptcy/spinoff/selloff is closer than ever.
  • ws3 - Saturday, November 7, 2015 - link

    Since the dawn of computing, IBM has been in the business of providing solutions, rather than merely hardware. When you buy IBM you pay a huge amount of money, and what you get for that is support, with some hardware thrown in.

    Obviously this only appeals to wealthy customers who don't have or don't want to have an internal support organization that can duplicate what IBM offers. It seems to me that the number of such customers is decreasing over time, but as long as the US government is around, IBM will have at least one customer.
  • xype - Sunday, November 8, 2015 - link

    They make 2-5 Billion dollars of profit per quarter. "Countdown to complete IBM bankruptcy/spinoff/selloff is closer than ever." my ass.
  • PowerTrumps - Sunday, November 8, 2015 - link

    Pretty fair and even handed review; don't agree with it all and definitely feel there is room to learn and improve. Btw, full disclosure, I am a System Architect focusing on Power technology for a Business Partner.

    With regard to compilers I would suggest IBM's SDK for Linux on Power & Advanced Tool Chain (ATC) provide development tools and open source optimized dev stack (ie gcc) for POWER8. Details at: https://www-304.ibm.com/webapp/set2/sas/f/lopdiags... and https://www.ibm.com/developerworks/community/wikis...

    MySQL is definitely relevant, but with the new Linux distros packaging MariaDB in place of MySQL I would have liked to see an Intel vs Power comparison with this MySQL alternative. MariaDB just announced v10.1 is delivering over 1M queries per second on POWER8. https://blog.mariadb.org/10-1-mio-qps/

    A commenter asked about Spark with POWER8. This blog discusses how it performs vs Intel. https://www.ibm.com/developerworks/community/blogs...

    In addition to the commercial benchmarks often quoted, such as SPEC, SAP and TPC (like this SAP HANA result with SUSE on POWER8: SAP BW-EML (i.e. HANA) shows tremendous scaling with POWER8. http://www.smartercomputingblog.com/power-systems/...), many of the ISVs have produced their own. I have seen results for PostgreSQL, STAC (http://financial.mcobject.com/press-release-novemb...), Redis Labs, etc.

    Benchmarks are great, all vendors do them and most people realize you should take them with a grain of salt. One benefit of Power servers when using PowerVM, its native firmware-based hypervisor, is that it delivers tremendous compute efficiency to VMs. On paper things like TDP seem higher for Power vs Intel (especially E5_v3 chips), but when Power servers deliver consolidation ratios of 2-4X (and greater) more VMs per core, the TCA & TCO get real interesting. One person commented how SAP on Power would blow out a budget. It does just the opposite because of how you can run in a Tier-2 architecture, obtaining intra-server VM to VM efficiencies and compute efficiencies with fewer cores & servers, which impacts everything in the datacenter. Add in increased reliability & serviceability features and you touch the servers less, which means your business is running longer.

    And for more details on the open platform, or on servers based on the OpenPOWER derivative that use the "LC" designator (such as the S822LC, in contrast to the S822L used as the focus in this article), see http://www.smartercomputingblog.com/power-systems/... and http://businesspartnervoices.com/ibm-power-systems...
  • JohanAnandtech - Sunday, November 8, 2015 - link

    Great feedback. We hope to get access to another POWER8(+) server and build further upon our existing knowledge. We have real world experience with Spark, so it is definitely on the list. The blog you linked seems to have used specific Spark optimizations for POWER, but the x86 reference system looks a bit "neglected". A real independent test would be very valuable there. The interesting part of Spark is that a good benchmark would also be very relevant for the real world, as peak performance is one of the most important aspects of Spark, in contrast with databases where maximum performance is only a very small part of the experience.

    About MySQL, people have pointed out that the 5.7 version seems to scale a lot better, so that is, together with MariaDB, also on my "to test" list. Redis does not seem relevant for this kind of machine: it is single-threaded, and it is almost impossible to test 160 instances.

    The virtualization part is indeed one of the most interesting parts, but it is a benchmarking nightmare. You have to keep response times at more or less the same levels while loading the machine with more and more VMs. We did that kind of testing until 2 years ago on x86, but it was very time consuming and we had a deep understanding of how vSphere worked. Building that kind of knowledge on PowerVM might be beyond our manpower and time :-).
