Web Server Performance

Websites based on the LAMP stack - Linux, Apache, MySQL, and PHP - are very popular. Few people write HTML/PHP code from scratch these days, so we turned to a Drupal 7.21-based site. The web server is Apache 2.4.7 and the database is MySQL 5.5.38, running on top of Ubuntu 14.04 LTS.

Drupal powers massive sites (e.g. The Economist and MTV Europe) and has a reputation for being a hardware resource hog. That is a price more and more developers happily pay to lower the time to market of their work. We tested the Drupal website with our vApus stress testing framework and increased the number of connections from 5 to 300.

We report the maximum throughput achievable with 95% of requests being handled faster than 1000 ms.
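To make that metric concrete, here is a minimal Python sketch of how such a figure could be derived from raw response times. It is not the vApus implementation: the function names, the nearest-rank percentile method, and the sample data are all assumptions made for illustration.

```python
def percentile_95(samples_ms):
    """Nearest-rank 95th percentile of a list of response times in ms."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Integer form of ceil(0.95 * n), giving a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def max_throughput_within_limit(runs, limit_ms=1000):
    """runs: list of (throughput in req/s, list of response times in ms).

    Returns the highest throughput whose 95th percentile response time
    stays below limit_ms, or None if no run qualifies.
    """
    passing = [tput for tput, samples in runs
               if percentile_95(samples) < limit_ms]
    return max(passing) if passing else None


# Made-up measurements, one entry per load level (not data from this review).
runs = [
    (50.0, [80] * 95 + [300] * 5),      # p95 = 80 ms
    (120.0, [200] * 95 + [900] * 5),    # p95 = 200 ms
    (140.0, [400] * 90 + [1500] * 10),  # p95 = 1500 ms, over the limit
]

best = max_throughput_within_limit(runs)
```

With this invented data, only the first two load levels keep the 95th percentile under 1000 ms, so the reported figure would be the higher of the two.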

Drupal Website

Let us be honest: the graph above is not telling you everything. The truth is that, on the Xeon D and Xeon E5, we ran into several other bottlenecks (OS and database related) before we could ever measure a 1000 ms 95th percentile response time. So the actual throughput at a 1 second response time is higher.

Basically, the performance of the Xeon D and Xeon E5 was too high for our current benchmark setup. Let us zoom in a bit to get a more accurate picture. The picture below shows you the 95th percentile of the response time (Y-axis) versus the number of concurrent requests/users (X-axis). We did not show the results of the Atom C2750 beyond 200 req/s to keep the graph readable.

We warm up the machine with 5 concurrent requests, but that is not enough for some servers. Notice that the response time of the Xeon D between 50 and 200 requests per second is lower than at 25 requests per second. So let us start our analysis at 50 requests per second.

The Xeon E3-1230L's clock speed fluctuates between 1.8, 2.3 and 2.8 GHz. It is an amazing low-power chip, but you pay a price: the 95th percentile never goes below 100 ms. The highly clocked Xeon E3s like the 1240 keep the response time below 100 ms unless your website is hit more than 100 times per second.

The Xeon D once again delivers astonishing performance. Unless the load is more than 200 concurrent requests per second, the server responds within 100 ms. There is more. Imagine that you want to keep your 95th percentile response time below half a second. With a previous-generation Xeon E3, even the 80W chip will hit that limit at around 200-250 requests per second. The Xeon D sustains about 800 (!) requests per second (not shown on graph) before a small percentage of the users will experience that response time. In other words, you can sustain up to 4 times as many hits with the Xeon D-1540 compared to the E3.
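The "up to 4 times" figure follows directly from the numbers quoted above. A quick Python check, using only the figures from the text (the variable names are ours):

```python
# Requests per second at which the 95th percentile reaches 500 ms,
# as quoted in the text above.
xeon_d_rps = 800
e3_rps_low, e3_rps_high = 200, 250

best_case = xeon_d_rps / e3_rps_low    # E3 hits the limit at 200 req/s
worst_case = xeon_d_rps / e3_rps_high  # E3 hits the limit at 250 req/s

print(f"Xeon D advantage: {worst_case:.1f}x to {best_case:.1f}x")
```

So the advantage ranges from roughly 3.2x to 4x, depending on where in the 200-250 range the E3 reaches the limit.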

90 Comments

  • Flunk - Tuesday, June 23, 2015 - link

    Yes, but it's still bad marketing. -D is associated with inferior, overly hot, bad performing Intel chips.
  • IanHagen - Tuesday, June 23, 2015 - link

    Certainly. From a marketing standpoint it's a pretty poor choice. I agree with wussupi, E4 would have been a far better name.
  • karpodiem - Tuesday, June 23, 2015 - link

    does anyone know where to buy these online? I'm looking for just the board/processor, model # 'X10SDV-TLN4F'

    All these random/small Supermicro resellers are selling it now, based on some Google searches. They're marking it up in price by at least a hundred bucks, because availability is limited. Anyone know when Newegg might get it in stock?

    Looking to do a FreeNAS build - this board + IBM M1015 card in an ATX motherboard (6x4TB drives in RAIDZ2).
  • ats - Tuesday, June 23, 2015 - link

    The TLN4F is the one in most demand and almost no place is able to keep it in stock. There are multiple places that will order it for you for ~1K but wait times can be anywhere from 1 week to 1 month.
  • Jon Tseng - Tuesday, June 23, 2015 - link

    > And the reality is that the current SoCs with an ARM ISA do not deliver the necessary per core
    > performance: they are still micro server SoCs, at best competing with the Atom C2750. So
    > currently, there is no ARM SoC competition in the scale out market until something better than
    > the A57 hits the market for these big players.

    Dude... You really want to have a look at the latest ThunderX parts or the X-Gene 16nm shrinks before you start making unwise statements like that. These aren't waiting around for A57 they are custom ARM architecture designs. Per core performance might not be as hot as Xeon but once you start to throw 48 cores on a die I wouldn't quite call that "at best competing with Avaton".
  • smoohta - Tuesday, June 23, 2015 - link

    Link to reviews?
  • ats - Tuesday, June 23, 2015 - link

    X-Gene is in the article, any further shrinks are still entirely vapor. ThunderX isn't currently available and is likely to have significantly worse per core performance than Atom C2k series and worse than A57. All the cores in the world don't do jack if the ST isn't there. And ST performance IS a barrier even in scale out. For general scale out, C2750 was found fairly wanting because of the ST performance, and neither X-Gene nor ThunderX even compete with C2750 in ST performance... QED.
  • mczak - Tuesday, June 23, 2015 - link

    He said "currently". The X-Gene 16nm cores might offer some competition who knows - but those are X-Gene 3 whereas you can't even buy anything with X-Gene 2 28nm ones right now... Likewise, ThunderX servers have been announced, but I haven't seen any reviews yet.
  • name99 - Tuesday, June 23, 2015 - link

    Look at the ThunderX parts HOW? Cavium releases fsck-all information about them. No-one knows if they are even OoO, how wide they are, etc.
    Yes, there are 48 cores on a SoC; and presumably they will do well for tasks like memcached that like lots of low-performance parallelism. But right now, we have ZERO evidence that a ThunderX part is a better single-threaded core than A57, let alone that it's comparable to Broadwell.
  • der - Tuesday, June 23, 2015 - link

    NOICE FAM!
