Drupal Website: Performance per Watt

When we reviewed the Xeon E5-2600 v2, a performance per watt comparison was much more straightforward. This time we are faced with two different 2U servers that share many similarities but also have some noteworthy differences. For example, the E5-2600 v3-based server is outfitted with six fans that can pull 1.6A, while our Xeon E5-2600 v2/v1-based server is outfitted with three fans that can pull 0.6A. With all fans at their maximum RPM, the fans of the first server could easily pull 90W more. Also, our "Wildcat Pass" server still has to mature a bit, as we are using a beta BIOS that has quite a few issues. Still, at idle, both servers are in the same ballpark.
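As a rough sanity check on that 90W figure, the fan arithmetic can be sketched as below. Two things here are assumptions rather than measurements: that the quoted current is the per-fan maximum, and that the fans run from a 12V rail (typical for server fans, but not stated in the specifications above). The function name is purely illustrative.

```python
# Back-of-the-envelope estimate of worst-case fan power.
# ASSUMPTION: the quoted amperage is per fan, at full speed.
# ASSUMPTION: the fans are fed from a 12 V rail; only current ratings are given.
FAN_VOLTAGE_V = 12.0


def max_fan_power_w(fan_count, max_current_a, voltage_v=FAN_VOLTAGE_V):
    """Worst-case electrical power of all fans combined: P = n * V * I."""
    return fan_count * voltage_v * max_current_a


v3_fans_w = max_fan_power_w(6, 1.6)  # E5-2600 v3 server: six fans, 1.6 A
v2_fans_w = max_fan_power_w(3, 0.6)  # E5-2600 v2/v1 server: three fans, 0.6 A
difference_w = v3_fans_w - v2_fans_w

print(f"v3 server fans: {v3_fans_w:.1f} W")
print(f"v2 server fans: {v2_fans_w:.1f} W")
print(f"difference:     {difference_w:.1f} W")
```

Under those assumptions the gap works out to roughly 94W, which is in line with the "90W more" estimate. Real fans rarely sit at their rated maximum, so this is an upper bound rather than a typical value.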

Idle Power Consumption, On Demand

The Haswell-EP reveals its mobile roots. The idle power of the Xeon E5-2699 v3 is lower even though it is a much larger chip than the Xeon E5-2697 v2 and both are built on the same process technology.

Next we measure the power consumed while keeping the response time at 100 ms. These are averages measured over a period of time: in effect we measure energy consumption, but we report the average power drawn over that same period.
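The relationship between measured energy and reported average power can be illustrated with a short sketch. This is not our actual measurement tooling; the sample readings and function names are hypothetical, and the trapezoidal rule is simply one reasonable way to integrate timestamped power readings.

```python
def energy_joules(samples):
    """Integrate power over time with the trapezoidal rule.

    samples: list of (time_s, power_w) tuples, sorted by time.
    Returns the energy consumed in joules (watt-seconds).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def average_power_w(samples):
    """Average power = energy consumed / elapsed time."""
    duration_s = samples[-1][0] - samples[0][0]
    if duration_s <= 0:
        raise ValueError("samples must span a positive duration")
    return energy_joules(samples) / duration_s


# HYPOTHETICAL readings: (seconds since start, watts at the wall)
readings = [(0, 200.0), (10, 220.0), (20, 240.0), (30, 220.0)]

print(f"energy:        {energy_joules(readings):.0f} J")
print(f"average power: {average_power_w(readings):.1f} W")
```

Reporting the average in watts rather than the total in joules makes runs of different lengths directly comparable.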

Power at 100 ms response time

It would be wrong to simply compare the numbers above, as the Xeon E5-2695 v3 and E5-2699 v3 do considerably more work. However, it cannot be denied that the Xeon E5-2699 v3 and Xeon E5-2667 v3 are a lot more power hungry than the rest of the pack. Remember also that, as noted above, the fans of the server that hosted the Xeon E5 v3 consume quite a bit more power, but at the moment we have not been able to determine how much.

Let's calculate performance per watt. Take the following graph with a grain of salt: the benchmark is not the most accurate (results tend to vary by 5-8%), but it still gives a rough idea of what you can expect.
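For readers who want to reproduce this kind of figure, a minimal sketch of the calculation follows. It assumes performance is expressed as throughput (responses per second) at the 100 ms response time target and divides that by average power. The input numbers are hypothetical placeholders, not our results, and the noise check simply applies the upper end of the 5-8% variation mentioned above.

```python
def performance_per_watt(throughput, average_power_w):
    """Throughput (e.g. responses per second) divided by average power in watts."""
    if average_power_w <= 0:
        raise ValueError("average power must be positive")
    return throughput / average_power_w


def within_noise(a, b, tolerance=0.08):
    """True if two results differ by no more than the run-to-run variation."""
    return abs(a - b) / max(a, b) <= tolerance


# HYPOTHETICAL inputs, for illustration only
server_a = performance_per_watt(1000.0, 250.0)
server_b = performance_per_watt(1200.0, 320.0)

print(f"server A: {server_a:.2f} responses/s per watt")
print(f"server B: {server_b:.2f} responses/s per watt")
print(f"difference within benchmark noise: {within_noise(server_a, server_b)}")
```

In this made-up case the two results differ by about 6%, so with a benchmark that varies by up to 8% they should be treated as a tie rather than a win for either system.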

Drupal 7.21 web performance per watt

The Xeon E5-2695 v3 is able to Turbo Boost to high clock speeds, which keeps the response time low, while its power consumption stays limited. The Xeon E5-2699 v3 probably drives the fans to much higher speeds, which pushes power consumption up, as the fans in our server can consume quite a bit.

What this means is that TDP is once again a relatively decent predictor of actual power consumption. The lower TDP of the Xeon E5-2695 v3 (120W TDP) materializes in real world power savings compared to the Xeon E5-2667 v3 (135W TDP) and Xeon E5-2699 v3 (145W TDP).


85 Comments

  • LostAlone - Saturday, September 20, 2014

    Given the difference in size between the two companies it's not really all that surprising though. Intel are ten times AMD's size, and I have to imagine that Intel's chip R&D department budget alone is bigger than the whole of AMD. And that is sad really, because I'm sure most of us were learning our computer science when AMD were setting the world on fire, so it's tough to see our young loves go off the rails. But Intel have the money to spend, and can pursue so many more potential avenues for improvement than AMD and that's what makes the difference.
  • Kevin G - Monday, September 8, 2014

    I'm actually surprised they released the 18 core chip for the EP line. In the Ivy Bridge generation, it was the 15 core EX die that was harvested for the 12 core models. I was expecting the same thing here with the 14 core models, though more to do with power binning than raw yields.

    I guess with the recent TSX errata, Intel is just dumping all of the existing EX dies into the EP socket. That is a good means of clearing inventory of a notably buggy chip. When Haswell-EX formally launches, it'll be of a stepping with the TSX bug resolved.
  • SanX - Monday, September 8, 2014

    You have teased us with the claim that the added FMA instructions double floating-point performance. Wow! Is it still possible to do that with FP operations, which are already close to the limit of just one clock cycle? This was a good review of integer-related performance, but please team up with Ian to continue with an FP one.
  • JohanAnandtech - Monday, September 8, 2014

    Ian is working on his workstation-oriented review of the latest Xeon.
  • Kevin G - Monday, September 8, 2014

    FMA is commonplace in many RISC architectures. The reason why we're just seeing it now on x86 is that until recently, the ISA only permitted two operands per instruction.

    Improvements in this area may be coming down the line even for legacy code. Intel's micro-op fusion has the potential to take an ordinary multiply and add and fuse them into one FMA operation internally. This type of optimization is something I'd like to see in a future architecture (Sky Lake?).
  • valarauca - Monday, September 8, 2014

    I believe the Intel compiler suite already converts

    x *= y;
    x += z;

    into an FMA operation when confronted with them.
  • Kevin G - Monday, September 8, 2014

    That's with source that is going to be compiled. (And don't get me wrong, that's what a compiler should do!)

    Micro-op fusion works on existing binaries that are years old, so no recompile is necessary. However, micro-op fusion may not work in all situations, depending on the actual instruction stream. (Hypothetically, a multiply and an add may have to be adjacent in the instruction stream to be fused, but an ancient compiler could have slipped some other instructions in between them to hide execution latencies as an optimization, so it would never work in that binary.)
  • DIYEyal - Monday, September 8, 2014

    Very interesting read.
    And I think I found a typo: page 5 (power optimization). It is well known that THE (not needed) Haswell HAS (is/ has been) optimized for low idle power.
  • vLsL2VnDmWjoTByaVLxb - Monday, September 8, 2014

    Colors or labeling for your HPC Power Consumption graph don't seem right.
  • JohanAnandtech - Monday, September 8, 2014

    Fixed, thanks for pointing it out.