The War of the SoCs: Performance/Watt

As we mentioned earlier, it is not that easy to determine the performance per watt of the different SoCs. Depending on the motherboard feature richness, performance per watt can vary a lot. We tested the Xeon E3-1240 v3 only on the feature-rich ASUS P9D, while the Atom C2750 is on a very efficient and simple HP m300 cartridge. Given the discrepancies, we cannot simply divide the performance by the power consumption and called it a day. No, we have to do a few calculations to get a good estimate of the performance/watt.

The current idle power of modern Intel CPUs is so low that it is almost irrelevant. All cores but one are put in a deep sleep (power gating), and the one that is still active runs at a very low clock and voltage. We have found that a Xeon E3-1200 v2's (Ivy Bridge) idle power is around 3W, perhaps even less... it is very hard and time consuming to measure correctly. We know from the mobile device reviews that the Haswell idle power is even lower. The Atom core is simpler, but the sleep states are slightly less advanced. Regardless, whether a CPU consumes 1.7W or 2.2W idling is not relevant for our calculation.

If we take the delta between idle power of a system and full load, and add about 3W idle, we're probably very close to the real power consumption of an Intel CPU. The only noise is the loss of the power supply (low because these are highly efficient ones) and the fact that the voltage regulators and DRAM consume a little more at higher load. Again, we are talking about very low numbers.

In the case of the Xeon E3, we also add about 3W for the Intel C224 chipset (0.7W idle, 4.1W TDP). For the X-Gene, we may assume that the idle power is a lot higher. When we calculated the power of the different components (8 DIMMs, disabled 10 GbE, etc.), we estimate that it is about 10W.

For the total system power, the power consumption of one node, we take the m300 numbers as measured. We subtract 7W from the m400 numbers as the m400 has four extra DIMM slots and a 10 GbE NIC. We add 9W to the SoC power of the Xeon E3 as we have found out that 12W is more or less the power that a Xeon E3 node consumes without the SoC.

Power Consumption Calculations
SoC Power Delta =
Power Web -
Idle (W)
Power SoC =
Power Delta +
Idle SoC +
Chipset (W)
Total System Power =
Power SoC +
Mobo (W)
Xeon E3- 1240 v3 3.4 95-42 = 53 53+3+3 = 59 53+3+12 = 68
Xeon E3-1230L v2 1.8 68-41 = 27
(45-18 = 27)
27+3+3 = 33 27+3+12 = 42
Xeon E3-1265L v2 2.5 65-26 = 39 39+3+3 = 45 39+3+12 = 54
Atom C2750 2.4 25-11 = 13 13+3+0 = 16 25
X-Gene 67-37 = 30 30+10 = 40 67-7 = 60

Let's discuss our findings. The Xeon E3-1240 v3 consumes probably about 50W with a high web load and is nowhere near its TDP (80W). The Xeon E3-1265L v2 (45W TDP) and Xeon E3-1230 (25W TDP) consume probably slightly more than their advertised TDP. That is slightly worrying as an integer workload that raises the CPU load to about 85-90% is not the worst situation you can imagine.... a 100% FPU load will go far beyond the TDP numbers then. The Atom C2750 requires the least power.

Performance per Watt
SoC Total Power
(SoC + Chipset)
Total
System
Power
Throughput
at 1000ms
Throughput per
Watt (SoC)
Throughput per
Watt (System)
Xeon E3- 1240 v3 3.4 59 68 1221 20.7 18
Xeon E3-1230L v3 1.8 33 42 739 22.4 17.6
Xeon E3-1265L v2 2.5 45 54 759 16.9 14.1
Atom C2750 2.4 16 25 312 19.5 12.5
X-Gene 1 2.4 40 60 322 8 5.4

We are not pretending that our calculations are 100% accurate, but they should be close enough. At the end of the day, a couple Watts more or less is not going to change our conclusion that the Xeon E3-1230L v3 and Xeon E3-1240 v3 are the most efficient processors for these workloads. The Xeon E3-1230L v3 wins because it will require less cooling and less electricity distribution infrastructure using the same dense servers.

The Atom wins if you are power limited but the power efficiency is a bit lower when it comes to serving up a web infrastructure. Lastly, the X-Gene 1 has some catching up to do. The X-Gene 2 promises to be 50% more efficient. The software optimization efforts could bridge the rest of the gap, but we don't have a crystal ball.

Web infrastructure Power consumption Conclusion
POST A COMMENT

47 Comments

View All Comments

  • IBleedOrange - Monday, March 09, 2015 - link

    EETimes is wrong.
    Google "Intel Denverton"
    Reply
  • beginner99 - Monday, March 09, 2015 - link

    Maybe it would be good to mention the X-Gene is made on a 40nm process at the start of the article. I read the article and think for myself that the X-Gene is crap and in the end you get the explanation. It's on 40 nm vs Atoms on Intel 22 nm. It's a huge difference and currently the article is a bit misleading eg. shining a bad light on X-Gene and ARM. (And I say this even though I always was a proponent of Intel Big cores in almost all server applications). Reply
  • Stephen Barrett - Monday, March 09, 2015 - link

    If APM had a newer part to test then we would have tested it. XG2 is simply not out yet. So the fact that APM has their flagship SoC on an older process is not misleading... Its the facts. The currently available Intel parts have a process advantage. Reply
  • warreo - Monday, March 09, 2015 - link

    Mentioning it at the start would be good from a technical disclosure standpoint, but I'm not sure for the purposes of this article it truly matters. The article is comparing what is currently available now from APM and Intel. Reality is Intel will likely have a significant process advantage for the foreseeable future, and if you wanted to see a like for like comparison on a process basis, then you'll probably need to wait 2-3 years for X-Gene to get on 22nm, meanwhile Intel will have moved on to 10nm. Reply
  • CajunArson - Monday, March 09, 2015 - link

    The 40nm process is only really relevant when it comes to the power-consumption comparisons.
    A 28nm.. or 20nm or 16nm... part with the same cores at the same clockspeeds will register the exact same level of performance. The only difference will be that the smaller lithographic processes should provide that level of performance in a smaller power envelope.
    Reply
  • JohanAnandtech - Monday, March 09, 2015 - link

    well, with so much time invested in an article, I always hope people will read the pages between page 1 and 18 too :-p. It is mentioned in the overview of the SoCs on page 5 and quite a few times at other pages too. Reply
  • colinstu - Monday, March 09, 2015 - link

    what server is on the bottom of the first page? Reply
  • JohanAnandtech - Monday, March 09, 2015 - link

    A very old MSI server :-). Just to show people what webfarms used before the micro server era. Reply
  • Samus - Monday, March 09, 2015 - link

    I use the Xeon E3-1230v3 in desktop applications all the time. It's basically an i7 for the price of an i5.

    And a lot of IT dept dump them on eBay cheap when they upgrade their servers. They can be had well under $200 lightly used. The 80w TDP could theoretically have some drawbacks for boost time, but the real-world performance according to passmark elongated tests doesn't seem to show any difference between it's boost potential and that of an 88w i7-k

    Great CPU's.
    Reply
  • Alone-in-the-net - Monday, March 09, 2015 - link

    In both your compilers, you need to specify the -march=native so the the compiler can optimize for the architecture you are running on, -o3 is not enough. This enables the compiler to use cpu specific commands. Reply

Log in

Don't have an account? Sign up now