Web Server Performance

Writing about micro and entry-level servers without a website benchmark would be unforgivable. Most websites are based on the LAMP stack: Linux, Apache, MySQL, and PHP. Few people write HTML/PHP code from scratch these days, so we turned to a Drupal 7.21-based site. The web server is Apache 2.4.7 and the database is MySQL 5.5.38, running on top of Ubuntu 14.04 LTS.

Drupal powers massive sites (e.g. The Economist and MTV Europe) and has a reputation for being a hardware resource hog. That is a price more and more developers happily pay to shorten the time to market of their work. We tested the Drupal website with our vApus stress testing framework and increased the number of connections from 5 to 300.

We report the maximum throughput achievable with 95% of requests being handled faster than 1000 ms. Note that these numbers are not comparable to the ones in the last Xeon E5 server review, where we measured throughput at 100 ms. We assume that if you deploy a full LAMP stack on micro servers, your first requirement is cost efficiency and not the lowest response time at all times. If you do require the lowest response time, it is a best practice to deploy only the front-end of your web server on such a server. We are looking into developing such a real-world benchmark for a later review.
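The metric described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the vApus implementation: the function names, the nearest-rank percentile method, and the sample data are all assumptions made for the example.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def max_throughput_within_limit(runs, pct=95, limit_ms=1000):
    """Return (connections, throughput) of the best run whose pct-th
    percentile response time stays below limit_ms, or None if no run passes.

    runs: iterable of (connections, requests_per_second, response_times_ms).
    """
    passing = [
        (rps, conns)
        for conns, rps, times in runs
        if percentile(times, pct) < limit_ms
    ]
    if not passing:
        return None
    rps, conns = max(passing)
    return conns, rps


# Hypothetical measurements, invented purely for illustration.
runs = [
    (5, 40.0, [120] * 19 + [300]),
    (100, 310.0, [400] * 19 + [950]),
    (300, 330.0, [700] * 18 + [1500, 2200]),  # too many slow requests
]

best = max_throughput_within_limit(runs)
print(best)
```

In this made-up data set the 300-connection run delivers the highest raw throughput, but more than 5% of its requests take longer than 1000 ms, so it is excluded and the 100-connection run is the one reported.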

Drupal Website

As the website load is a very bumpy curve with very short peaks of high CPU load and lots of lows, the Xeon E3-1200s operate at relatively high frequencies. Website workloads work well with Hyper-Threading as the low instruction level parallelism in one thread leaves a lot of headroom for another thread. Hyper-Threading delivers in this environment: the 8-thread Xeon E3-1265L v2 at 2.5-3.4GHz is quite a bit faster than the Xeon E3-1220 v2 at 3.1-3.3GHz.

We really wonder if anyone ever bought an Atom Saltwell-based server from SeaMicro or HP to run web workloads. Those customers were either very brave or very naive; notice how the Xeon E3 is roughly 10 times faster (and as much as 17X faster)!

The Atom C2750 still performs rather poorly and sustains only about 42% of the requests that the Xeon E3-1230L handles. We suspect that the lack of an L3 cache that allows cores to sync threads quickly is one of the culprits. The MySQL back-end is included in this web benchmark, and this is one of the reasons that our benchmark prefers the Xeon E3. The X-Gene does not benefit much from the rather slow L3 cache and performs more or less like the Atom C2750.

Do not overestimate the effect of including the MySQL back-end in our benchmark, however. MySQL consumes about 20% of the CPU cycles. There is no denying that high clock speeds and simultaneous multi-threading are a very powerful mix to handle web requests.
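A rough back-of-the-envelope check shows why the 20% figure limits how much the database can explain. The sketch below uses a simple Amdahl-style bound; it assumes, generously for the Atom, that the 20% share applies to the Atom C2750 and that the MySQL work could be removed there entirely at no cost while the Xeon result stays unchanged. The function name is made up for the example.

```python
def best_case_gain(fraction_removed):
    """Upper bound on the throughput gain if a given fraction of CPU time
    disappeared entirely (Amdahl-style bound: 1 / (1 - fraction))."""
    if not 0 <= fraction_removed < 1:
        raise ValueError("fraction must be in [0, 1)")
    return 1.0 / (1.0 - fraction_removed)


mysql_cpu_share = 0.20   # from the text: MySQL uses about 20% of CPU cycles
atom_vs_xeon = 0.42      # from the text: Atom C2750 sustains about 42%

gain = best_case_gain(mysql_cpu_share)
atom_best_case = atom_vs_xeon * gain

print(f"best-case gain without MySQL: {gain:.2f}x")
print(f"Atom C2750 relative to Xeon E3 in that case: {atom_best_case:.1%}")
```

Even under these favorable assumptions the Atom would only climb from about 42% to a little over half of the Xeon E3's throughput, so the database alone cannot account for the gap.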

According to some academic studies, the Atom C2750 should do better in typical scale-out software such as web search, web front-ends, and media streaming, where no syncing between threads is necessary.


47 Comments


  • IBleedOrange - Monday, March 9, 2015 - link

    EETimes is wrong.
    Google "Intel Denverton"
  • beginner99 - Monday, March 9, 2015 - link

    Maybe it would be good to mention the X-Gene is made on a 40nm process at the start of the article. I read the article and think for myself that the X-Gene is crap and in the end you get the explanation. It's on 40 nm vs Atoms on Intel 22 nm. It's a huge difference and currently the article is a bit misleading eg. shining a bad light on X-Gene and ARM. (And I say this even though I always was a proponent of Intel Big cores in almost all server applications).
  • Stephen Barrett - Monday, March 9, 2015 - link

    If APM had a newer part to test then we would have tested it. XG2 is simply not out yet. So the fact that APM has their flagship SoC on an older process is not misleading... It's the facts. The currently available Intel parts have a process advantage.
  • warreo - Monday, March 9, 2015 - link

    Mentioning it at the start would be good from a technical disclosure standpoint, but I'm not sure for the purposes of this article it truly matters. The article is comparing what is currently available now from APM and Intel. Reality is Intel will likely have a significant process advantage for the foreseeable future, and if you wanted to see a like for like comparison on a process basis, then you'll probably need to wait 2-3 years for X-Gene to get on 22nm, meanwhile Intel will have moved on to 10nm.
  • CajunArson - Monday, March 9, 2015 - link

    The 40nm process is only really relevant when it comes to the power-consumption comparisons.
    A 28nm.. or 20nm or 16nm... part with the same cores at the same clockspeeds will register the exact same level of performance. The only difference will be that the smaller lithographic processes should provide that level of performance in a smaller power envelope.
  • JohanAnandtech - Monday, March 9, 2015 - link

    well, with so much time invested in an article, I always hope people will read the pages between page 1 and 18 too :-p. It is mentioned in the overview of the SoCs on page 5 and quite a few times at other pages too.
  • colinstu - Monday, March 9, 2015 - link

    what server is on the bottom of the first page?
  • JohanAnandtech - Monday, March 9, 2015 - link

    A very old MSI server :-). Just to show people what webfarms used before the micro server era.
  • Samus - Monday, March 9, 2015 - link

    I use the Xeon E3-1230v3 in desktop applications all the time. It's basically an i7 for the price of an i5.

    And a lot of IT depts dump them on eBay cheap when they upgrade their servers. They can be had well under $200 lightly used. The 80w TDP could theoretically have some drawbacks for boost time, but the real-world performance according to passmark elongated tests doesn't seem to show any difference between its boost potential and that of an 88w i7-k

    Great CPU's.
  • Alone-in-the-net - Monday, March 9, 2015 - link

    In both your compilers, you need to specify -march=native so that the compiler can optimize for the architecture you are running on; -O3 is not enough. This enables the compiler to use CPU-specific instructions.
