Web Server Performance

Writing about micro and entry-level servers without a website benchmark would be unforgivable. Most websites are based on the LAMP stack: Linux, Apache, MySQL, and PHP. Few people write HTML/PHP code from scratch these days, so we turned to a Drupal 7.21 based site. The web server is Apache 2.4.7 and the database is MySQL 5.5.38, running on top of Ubuntu 14.04 LTS.

Drupal powers massive sites (e.g. The Economist and MTV Europe) and has a reputation for being a hardware resource hog. That is a price more and more developers happily pay to shorten the time to market of their work. We tested the Drupal website with our vApus stress testing framework and increased the number of connections from 5 to 300.

We report the maximum throughput achievable with 95% of requests being handled faster than 1000 ms. Note that these numbers are not comparable to the ones in the last Xeon E5 server review, where we measured throughput at 100 ms. We assume that if you deploy a full LAMP stack on micro servers, your first requirement is cost efficiency and not the lowest response time at all times. If you do require the lowest response time, it is a best practice to deploy only the front-end of your web server on such a server. We are looking into developing such a real-world benchmark for a later review.
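To make the reporting criterion concrete, here is a minimal sketch of how such a number could be derived from raw measurements. It is not the vApus implementation, which is not described here. The function names, the nearest-rank percentile method, and the sample data are all assumptions made for illustration only.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def max_throughput_within_limit(runs, pct=95, limit_ms=1000):
    """runs maps a connection count to (throughput in req/s, response times in ms).
    Returns (connections, throughput) for the highest-throughput run whose
    pct-th percentile response time is below limit_ms, or None if no run qualifies."""
    best = None
    for connections, (throughput, times_ms) in runs.items():
        if percentile(times_ms, pct) < limit_ms:
            if best is None or throughput > best[1]:
                best = (connections, throughput)
    return best

# Hypothetical measurements, invented for this example (not benchmark results).
runs = {
    5:   (40.0,  [120] * 19 + [300]),
    100: (310.0, [400] * 19 + [1500]),
    300: (330.0, [900] * 18 + [1400, 1600]),
}

print(max_throughput_within_limit(runs))
```

In this made-up data set, the 300-connection run has the highest raw throughput but is rejected because its 95th percentile response time exceeds 1000 ms, so the 100-connection run is the one that would be reported.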

Drupal Website

As the website load is a very bumpy curve with very short peaks of high CPU load and lots of lows, the Xeon E3-1200s operate at relatively high frequencies. Website workloads work well with Hyper-Threading, as the low instruction-level parallelism in one thread leaves a lot of headroom for another thread. Hyper-Threading delivers in this environment: the 8-thread Xeon E3-1265L v2 at 2.5-3.4GHz is quite a bit faster than the Xeon E3-1220 v2 at 3.1-3.3GHz.

We really wonder if anyone ever bought an Atom Saltwell based server from SeaMicro or HP to run web workloads. Those customers were either very brave or very naive; notice how the Xeon E3 is roughly 10 times faster (and as much as 17X faster)!

The Atom C2750 still performs rather poorly and sustains only about 42% of the requests of the Xeon E3-1230L. We suspect that the lack of an L3 cache that allows cores to sync threads quickly is one of the culprits. The MySQL back-end is included in this web benchmark, and this is one of the reasons that our benchmark prefers the Xeon E3. The X-Gene does not benefit much from the rather slow L3 cache and performs more or less like the Atom C2750.

Do not overestimate the effect of including the MySQL back-end in our benchmark, however: MySQL consumes only about 20% of the CPU cycles. There is no denying that high clock speeds and simultaneous multi-threading are a very powerful mix for handling web requests.

According to some academic studies, the Atom C2750 should do better in typical scale-out software such as web search, web front-ends, and media streaming, where no syncing between threads is necessary.


47 Comments


  • JohanAnandtech - Tuesday, March 10, 2015 - link

    Thanks! It has been a long journey to get all the necessary tests done on different pieces of hardware and it is definitely not complete, but at least we were able to quantify a lot of paper specs. (25 W TDP of Xeon E3, 20W Atom, X-Gene performance etc.)
  • enzotiger - Tuesday, March 10, 2015 - link

    "SeaMicro focused on density, capacity, and bandwidth."

    How did you come to that statement? Have you ever benchmarked (or even played with) any SeaMicro server? What capacity or bandwidth are you referring to? Are you aware of their plans down the road? Did you read AMD's Q4 earnings report?

    BTW, AMD doesn't call their servers micro-servers anymore. They use the term dense server.
  • Peculiar - Tuesday, March 10, 2015 - link

    Johan, I would also like to congratulate you on a well written and thorough examination of subject matter that is not widely evaluated.

    That being said, I do have some questions concerning the performance/watt calculations. Mainly, I'm concerned as to why you are adding the idle power of the CPUs in order to obtain the "Power SoC" value. The Power Delta should take into account the difference between the load power and the idle power and therefore you should end up with the power consumed by the CPU in isolation. I can see why you would add in the chipset power since some of the devices are SoCs and do not require a chipset and some are not. However, I do not understand the methodology in adding the idle power back into the Delta value. It seems that you are adding the load power of the CPU to the idle power of the CPU and that is partially why you have the conclusion that they are exceeding their TDPs (not to mention the fact that the chipset should have its own TDP separate from the CPU).

    Also, if one were to get nitpicky on the power measurements, it is unclear if the load power measurement is peak, average, or both. I would assume that the power consumed by the CPUs may not be constant since you state that "the website load is a very bumpy curve with very short peaks of high CPU load and lots of lows." If possible, it may be more beneficial to measure the energy consumed over the duration of the test.
  • JohanAnandtech - Wednesday, March 11, 2015 - link

    Thanks for the encouragement. Regarding your concerns about the perf/watt calculations: power delta = average power (high web load measured at the 95th percentile = 1 s, averaged over about 2 minutes) - idle power. Since idle power = total idle power of the node, it also contains the idle power of the SoC. So you must add it back to get the power of the SoC. If you still have doubts, feel free to mail me.
  • jdvorak - Friday, March 13, 2015 - link

    The approach looks absolutely sound to me. The idle power will be drawn in any case, so it makes sense to add it in the calculation. Perhaps it would also be interesting to compare the power consumed by the different systems at the same load levels, such as 100 req/s, 200 req/s, ... (clearly, some higher loads will not be achievable by all of them).

    Johan, thanks a lot for this excellent, very informative article! I can imagine how much work has gone into it.
  • nafhan - Wednesday, March 11, 2015 - link

    If these had 10gbit - instead of gbit - NICs, these things could do some interesting stuff with virtual SANs. I'd feel hesitant shuttling storage data over my primary network connection without some additional speed, though.

    Looking at that moonshot machine, for instance: 45 x 480 SSD's is a decent sized little SAN in a box if you could share most of that storage amongst the whole moonshot cluster.

    Anyway, with all the stuff happening in the virtual SAN space, I'm sure someone is working on that.
  • Casper42 - Wednesday, April 15, 2015 - link

    Johan, do you have a full Moonshot 1500 chassis for your testing? Or are you using a PONK?
