Web Server Performance

Websites based on the LAMP stack - Linux, Apache, MySQL, and PHP - are very popular. Few people write HTML/PHP code from scratch these days, so we turned to a Drupal 7.21 based site. The web server is Apache 2.4.7 and the database is MySQL 5.5.38, running on top of Ubuntu 14.04 LTS.

Drupal powers massive sites (e.g. The Economist and MTV Europe) and has a reputation for being a hardware resource hog. That is a price more and more developers happily pay for lowering the time to market of their work. We tested the Drupal website with our vApus stress testing framework and increased the number of connections from 5 to 300.

We report the maximum throughput achievable with 95% of requests being handled faster than 1000 ms.
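As a rough illustration of that metric (not the vApus implementation, which is not described here), the sketch below computes a nearest-rank 95th percentile per load level and picks the highest throughput that stays under the limit. The function names and the sample latencies are hypothetical and exist only for the example.

```python
def percentile_95(latencies_ms):
    """Nearest-rank 95th percentile of response times in milliseconds."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    # 1-based nearest rank, ceil(0.95 * n), done in integer arithmetic
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def max_throughput_within_limit(runs, limit_ms=1000):
    """runs: list of (throughput_rps, latencies_ms), one entry per load level.

    Returns the highest throughput whose 95th percentile response time
    is below limit_ms, or None if no load level qualifies.
    """
    passing = [rps for rps, lat in runs if percentile_95(lat) < limit_ms]
    return max(passing) if passing else None


# Hypothetical measurements, made up for illustration only
runs = [
    (50, [40, 60, 80, 90, 120]),
    (100, [200, 300, 400, 500, 900]),
    (150, [600, 800, 950, 1200, 1500]),
]
print(max_throughput_within_limit(runs))
```

With these made-up samples, the 150 req/s level is rejected because its 95th percentile exceeds 1000 ms, so the reported figure is the 100 req/s level.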

Drupal Website

Let us be honest: the graph above is not telling you everything. The truth is that, on the Xeon D and Xeon E5, we ran into several other bottlenecks (OS and database related) before we could ever measure a 1000 ms 95th percentile response time. So the actual throughput at 1 second response time is higher.

Basically, the performance of the Xeon D and Xeon E5 was too high for our current benchmark setup. Let us zoom in a bit to get a more accurate picture. The picture below shows you the 95th percentile of the response time (Y-axis) versus the number of concurrent requests/users (X-axis). We did not show the results of the Atom C2750 beyond 200 req/s to keep the graph readable.

We warm up the machine with 5 concurrent requests, but that is not enough for some servers. Notice that the response time of the Xeon D between 50 and 200 requests per second is lower than at 25 requests per second. So let us start our analysis at 50 requests per second.

The Xeon E3-1230L clock speed fluctuates between 1.8, 2.3 and 2.8 GHz. It is an amazing low power chip, but you pay a price: the 95th percentile never goes below 100 ms. The highly clocked Xeon E3s like the 1240 keep the response time below 100 ms unless your website is hit more than 100 times per second.

The Xeon D once again delivers astonishing performance. Unless the load is more than 200 concurrent requests per second, the server responds within 100 ms. There is more. Imagine that you want to keep your 95th percentile response time below half a second. With a previous generation Xeon E3, even the 80W chip will hit that limit at around 200-250 requests per second. The Xeon D sustains about 800 (!) requests per second (not shown on graph) before a small percentage of the users will experience that response time. In other words, you can sustain up to 4 times as many hits with the Xeon D-1540 compared to the E3.
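The "up to 4 times" figure follows directly from the numbers quoted above. A minimal sketch of the arithmetic, using only the request rates stated in the text (the variable names are ours):

```python
# Request rates at the 500 ms 95th percentile limit, as quoted in the text
xeon_d_rps = 800                 # Xeon D: "about 800" requests per second
xeon_e3_rps_range = (200, 250)   # previous generation Xeon E3: "around 200-250"

# Ratio of Xeon D to Xeon E3 at each end of the quoted E3 range
ratios = [xeon_d_rps / e3 for e3 in xeon_e3_rps_range]

print(f"{min(ratios):.1f}x to {max(ratios):.1f}x")
```

The 4x figure corresponds to the lower end of the E3 range (200 requests per second); at 250 requests per second the advantage is closer to 3.2x.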


90 Comments


  • extide - Tuesday, June 23, 2015 - link

    That's ECC Registered -- not sure if it will take that, but probably, although you don't need registered, or ECC.
  • nils_ - Wednesday, June 24, 2015 - link

    If you want transcoding, you might want to look at the Xeon E3 v4 series instead, which come with Iris Pro graphics. Should be a lot more efficient. Reply
  • bernstein - Thursday, June 25, 2015 - link

    for using ECC UDIMMs, a cheaper option would be an i3 in a xeon e3 board. Reply
  • psurge - Tuesday, June 23, 2015 - link

    Has Intel discussed their Xeon-D roadmap at all? I'm wondering in particular if 2x25GbE is coming, whether we can expect a SOC with higher clock-speed or more cores (at a higher TDP), and what the timeframe is for Skylake based cores. Reply
  • nils_ - Tuesday, June 23, 2015 - link

    Is 25GbE even a standard? I've heard about 40GbE and even 56GbE (matching infiniband), but not 25. Reply
  • psurge - Tuesday, June 23, 2015 - link

    It's supposed to be a more cost effective speed upgrade to 10GbE than 40GbE (it uses a single 25Gb/s serdes lane, as used in 100GbE, vs 4 10Gb/s lanes), and IIRC is being pushed by large datacenter shops like Google and Microsoft. There's more info at http://25gethernet.org/. I'm not sure where things are in the standardization process.
  • nils_ - Wednesday, June 24, 2015 - link

    It also has an interesting property when it comes to using a breakout cable of sorts: you could connect 4 servers to 1 100GbE port (this is already possible with 40GbE, which can be split into 4x10GbE).
  • JohanAnandtech - Wednesday, June 24, 2015 - link

    Considering that the Xeon D must find a home in low power high density servers, I think dual 10 Gbit will be standard for a while. Any idea what 25/40 Gbit PHY would consume? Those 10 Gbit PHYs already need 3 Watt in idle, probably around 6-8W at full speed. That is a large chunk of the power budget in a micro/scale out server. Reply
  • psurge - Wednesday, June 24, 2015 - link

    No I don't, sorry. But, I thought SFP+ with SR optics (10GBASE-SR) was < 1W per port, and that SFP+ direct attach (10GBASE-CR) was not far behind? 10GBASE-T is a power hog... Reply
  • pjkenned - Tuesday, June 23, 2015 - link

    Hey Johan - just re-read. A few quick thoughts:
    First off - great piece. You do awesome work. (This is Patrick @ ServeTheHome.com btw)

    Second - one thing should probably be a bit clearer - you were not using a Xeon D-1540. It was an ES Broadwell-DE version at 2.0GHz. The shipping product has 100MHz higher clocks on both base and max turbo. I did see a 5% or so performance bump from the first ES version we tested to the shipping parts. The 2.0GHz parts are really close to shipping spec though. One both of my pre-release Xeon D and all of the post-release Xeon D systems was nearly identical.

    That will not change your conclusions, but it does make the actual Intel Xeon D-1540 a bit better than the one you tested. LMK if you want me to set aside some time on a full speed Xeon D-1540 system for you.