Web Server Performance

Websites based on the LAMP stack - Linux, Apache, MySQL, and PHP - are very popular. Few people write html/PHP code from scratch these days, so we turned to a Drupal 7.21 based site. The web server is Apache 2.4.7 and the database is MySQL 5.5.38 on top of Ubuntu 14.04 LTS.

Drupal powers massive sites (e.g. The Economist and MTV Europe) and has a reputation of being a hardware resource hog. That is a price more and more developers happily pay for lowering the time to market of their work. We tested the Drupal website with our vApus stress testing framework and increased the number of connections from 5 to 300.

We report the maximum throughput achievable with 95% percent of request being handled faster than 1000 ms. 

Drupal Website

Let us be honest: the graph above is not telling you everything. The truth is that, on the Xeon D and Xeon E5, we ran into several other bottlenecks (OS and Database related) before we ever could measure a 1000 ms 95th percentile response time. So the actual throughput at 1 second response time is higher.

Basically, the performance of the Xeon D and Xeon E5 was too high for our current benchmark setup. Let us zoom in a bit to get a more accurate picture. The picture below shows you the 95th percentile of the response time (Y-axis) versus the amount of concurrent requests/users (X-axis). We did not show the results of the Atom C2750 beyond 200 req/s to keep the graph readable.  

We warm up the machine with 5 concurrent requests, but that is not enough for some servers. Notice that the response time of the Xeon D between 50 and 200 requests per second is lower than at 25 request per second. So let us start our analyses at 50 request per second. 

The Xeon E3-1230L clock speed fluctuates between 1.8, 2.3 and 2.8 GHz. It is amazing low power chip, but you pay a price: the 95th percentile never goes below 100 ms. The highly clocked Xeon E3s like the 1240 keeps the response time below 100 ms unless your website is hit more than 100 times per second. 

The Xeon D once again delivers astonishing performance. Unless the load is more than 200 concurrent requests per second, the server responds within 100 ms. There is more. Imagine that you want to keep your 95th percentile. response time below half a second. With a previous generation Xeon E3, even the 80W chip will hit that limit at around 200-250 requests per second. The Xeon D sustains about 800 (!) requests per second (not shown on graph) before a small percentage of the users will experience that response time.  In other words, you can sustain up to 4 times as manyhits with the Xeon D-1540 compared to the E3.   

Java Server Performance ElasticSearch
POST A COMMENT

90 Comments

View All Comments

  • JohanAnandtech - Wednesday, June 24, 2015 - link

    Hi Patrick, the base clock of our chip is 2 GHz, not 1.9 GHz as the one pre-production version that we got from Intel. I have to check the turboclocks though, but I do believe we have measured 2.6 GHz. I'll doublecheck. Reply
  • pjkenned - Wednesday, June 24, 2015 - link

    Awesome! Our ES ones were 1.9GHz. Reply
  • Chrisrodinis1 - Tuesday, June 23, 2015 - link

    For comparison, this server uses Xeon's. It is the HP Proliant BL460c G9 blade server: https://www.youtube.com/watch?v=0s_w8JVmvf0 Reply
  • MrDiSante - Wednesday, June 24, 2015 - link

    Why use only -O2 when compiling the benchmarks? I would imagine that in order to squeeze out every last bit of performance, all production software is compiled with all optimizations turned up to 11. I noticed that their github uses -O2 as an example - is it that TinyMemBenchmark just doesn't play nice with -O3? Reply
  • JohanAnandtech - Wednesday, June 24, 2015 - link

    The standard makefile had no optimization whatsoever. If you want to measure latency, you do not want maximum performance but rather accuracy, so I played it safe and used -O2. I am not convinced that all production software is optimized with all optimization turned on. Reply
  • diediealldie - Wednesday, June 24, 2015 - link

    Intel seems disARMing them... X-Gene 2 doesn't look so promising, as they'll have to fight mighty Skylake-based Xeons, not Broadwell ones.

    Thanks for great article again.
    Reply
  • jfallen - Wednesday, June 24, 2015 - link

    Thanks Johan for the great article. I'm a tech enthusiast, and will never buy or use one of these. But it makes great reading and I appreciate the time you take to research and write the article.

    Regards
    Jordan
    Reply
  • JohanAnandtech - Wednesday, June 24, 2015 - link

    Happy to read this! :-) Reply
  • TomWomack - Wednesday, June 24, 2015 - link

    This looks very much consistent with my experience; the disconcertingly high idle power (I looked at the board with a thermal camera; the hot chips were the gigabit PHY, the inductors for the power supply, and the AST2400 management chip), the surprisingly good memory performance, the fairly hot SoC (running sixteen threads of number-crunching I get a power draw of 83W at the plug) and the generally pretty good computation.

    I'm not entirely sure it was a better buy for my use case than a significantly cheaper 6-core Haswell E - Haswell E is not that hot, electricity not that expensive, and from my supplier the X10SDV-F board and memory were £929 whilst Scan get me an i7-5820K board, CPU and memory for £702. And four-channel DDR4 probably is usefully faster than two-channel for what I do.

    I quite strongly don't believe in server mystique - the outbuilding is big enough that I run out of power before I run out of space for micro-ATX cases, and I am lucky enough to be doing calculations which are self-checking to the point that ECC is a waste of money.
    Reply
  • JohanAnandtech - Wednesday, June 24, 2015 - link

    Hi Tom, I believe we saw up to 90 Watt at the wall when running OpenFOAM (10 Gbit enabled). It is however less relevant for such a chip which is not meant to be a HPC chip as we have shown in the article. HPC really screams for an E5. Reply

Log in

Don't have an account? Sign up now