Scale-Out Big Data Benchmark: ElasticSearch

Elasticsearch is an open source, full-text search engine that can be run on a cluster relatively easily. It is basically an open source version of Google Search that can be deployed in an enterprise. It should be one of the poster children of scale-out software, and it is one of the representatives of the so-called "Big Data" technologies. Thanks to Kirth Lammens, one of the talented researchers at my lab, we have developed a benchmark that searches through all the Wikipedia content (roughly 40GB). Elasticsearch is, like many Big Data technologies, built on Java.
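
To give an idea of what a single search in such a benchmark could look like, here is a minimal sketch of a full-text query sent to Elasticsearch's `_search` endpoint. It is not the benchmark code itself: the field name `text`, the helper names, and any index name or URL passed in are assumptions made for illustration.

```python
import json
import urllib.request


def build_search_body(terms, size=10):
    """Build a full-text match query in Elasticsearch's JSON query DSL."""
    return {
        "size": size,  # number of hits to return
        "query": {"match": {"text": terms}},  # "text" is an assumed field name
    }


def search(base_url, index, terms, size=10):
    """POST the query to the index's _search endpoint and return the parsed reply."""
    body = json.dumps(build_search_body(terms, size)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Against a local node this could be called as `search("http://localhost:9200", "wikipedia", "scale-out server")`, where both the URL and the index name are hypothetical. A benchmark would fire many such queries in parallel and measure the sustained throughput.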

We are not sure why, but installing IBM's JDK caused a lot of headaches. For some reason the JVM stopped working in the middle of our tests. We got the same behavior running Apache Spark. This could be a result of our lack of experience with the IBM JDK, or of the fact that the Linux LE ecosystem is still young. To cut a long story short, we ended up using OpenJDK 8, which is part of the Ubuntu 15.04 distribution. OpenJDK is very similar to, and based upon the same code as, Oracle's HotSpot JDK.

We limited the systems to one socket to avoid garbage collection pauses and other scaling issues. There is a reason why many Java benchmarks on these massive machines use multiple JVMs.
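
One common way to confine a JVM to a single socket on Linux is to bind it to one NUMA node with `numactl`. The sketch below only illustrates that idea and is not necessarily how our test systems were configured. It assumes one NUMA node per socket, and the helper names are hypothetical.

```python
def parse_cpulist(cpulist):
    """Expand a Linux cpulist string such as "0-3,8-9" into a list of CPU ids."""
    cpus = []
    for part in cpulist.strip().split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        elif part:
            cpus.append(int(part))
    return cpus


def single_node_command(node, command):
    """Prefix a command so numactl binds its CPUs and memory to one NUMA node."""
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", *command]
```

The cpulist format is what Linux exposes in `/sys/devices/system/node/node0/cpulist`, so `parse_cpulist` shows which logical CPUs the JVM would be limited to. The list returned by `single_node_command(0, ["java", "-version"])` can be handed to `subprocess.run`.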

Although the POWER8 can probably perform a bit better with the IBM JDK, its performance is in the same league as the best Xeons. As a further point of comparison, we also included the score of the Xeon D from our previous article.

146 Comments

  • extide - Friday, November 6, 2015 - link

    No he meant that in a lot of the european countries they use the dot as a comma, so it would be 50.000 to mean 50 thousand.
  • Murloc - Sunday, November 8, 2015 - link

    the international system dictates that , and . are the same thing, and as a separator you should use a space.
    In many countries in Europe, ' is also used. That's fine too as there is no ambiguity.
    Using . and , for anything that is not the decimal separator in international websites just creates confusion imho.
    I guess AT doesn't have a style book though.
  • duploxxx - Friday, November 6, 2015 - link

    nice review.
    but Xeon is not 95% of the market. AMD is still just a bit above 5% on its own. so it deserves a bit salt :) not to mention the fact that competition is good for all of us. if reviewers continue like this all narrowed readers will think there is no competition.
  • silverblue - Friday, November 6, 2015 - link

    I'm left wondering what a Steamroller-based 16+ core CPU would do here, considering multithreading is better than with previous models. Yes, the Xeons have a large single-threading lead, but more cores = good in the server world, not to mention that such a CPU would severely undercut the price of the competition.

    Shame it isn't ever going to happen!
  • lmcd - Friday, November 6, 2015 - link

    Or even an Excavator! It's a shame AMD didn't just keep Bulldozer developing internally until at least Piledriver, and iterate on Thuban.
  • Kevin G - Saturday, November 7, 2015 - link

    AMD killed off both Steamroller and Excavator chips early on as the Bulldozer and Piledriver chips weren't as competitive. More importantly, OEMs simply were not interested even if those parts were upgrades based upon existing designs. Thus the great AMD server drought began as they effectively have left that market and are hoping for a return with Zen.

    Also I should point out that Seattle, AMD's first ARM-based Opteron, has yet to arrive. This was supposed to be out a year ago and keep AMD's server business going throughout 2015 during the wait for Zen and K12 in 2016. Well K12 has already been delayed into 2017 and Seattle is nowhere to be found in commercial systems (there are a handful of Seattle developer boards).
  • JoeMonco - Saturday, November 7, 2015 - link

    When you account for only 5% of the market while the other side commands 95%, you aren't really much of a credible competitor.
  • xype - Sunday, November 8, 2015 - link

    That’s not always correct, though. You can have 5% of the market and 20% of the profits, for example, which would put you in a way better position than your competitors (because only a small increase in market share would pay big time).
  • Murloc - Sunday, November 8, 2015 - link

    that applies more to consumer products, e.g. apple.
  • dgingeri - Friday, November 6, 2015 - link

    I've been dealing with IBM Power based machines for 5 years now. Such experience has only given me a major disdain for AIX.

    I do NOT advise it for anyone. It sucks to work on. There is a certain consistent, spartan logic to it, but it is difficult to learn, and learning materials are EXTREMELY expensive. I never liked the idea of paying $12,000 for a one week class that taught me barely a tenth of what I needed to know to run an AIX network. (My company paid for the class, but I could not get them to pay for the rest of them, for some reason.) This makes people who can support AIX extremely expensive to employ. Figure on paying twice the rate of a Windows admin in order to employ an AIX admin. Then there is the massive expense of maintenance agreements. Even the software only maintenance agreement, just to get patches for AIX, is $4000 per year per system. They may be competitive in cost up front, but they drain money like vampires to maintain.

    Even the most modern IBM Power based machine takes 20-30 minutes to reboot or power up due to POST diagnostics. That alone is annoying enough to make me avoid AIX as much as I can.