The War of the SoCs: Performance/Watt

As we mentioned earlier, it is not that easy to determine the performance per watt of the different SoCs. Depending on how feature-rich the motherboard is, performance per watt can vary a lot. We tested the Xeon E3-1240 v3 only on the feature-rich ASUS P9D, while the Atom C2750 is on a very efficient and simple HP m300 cartridge. Given the discrepancies, we cannot simply divide the performance by the power consumption and call it a day. Instead, we have to do a few calculations to get a good estimate of the performance/watt.

The current idle power of modern Intel CPUs is so low that it is almost irrelevant. All cores but one are put in a deep sleep (power gating), and the one that is still active runs at a very low clock and voltage. We have found that a Xeon E3-1200 v2's (Ivy Bridge) idle power is around 3W, perhaps even less... it is very hard and time consuming to measure correctly. We know from the mobile device reviews that the Haswell idle power is even lower. The Atom core is simpler, but the sleep states are slightly less advanced. Regardless, whether a CPU consumes 1.7W or 2.2W idling is not relevant for our calculation.

If we take the delta between idle power of a system and full load, and add about 3W idle, we're probably very close to the real power consumption of an Intel CPU. The only noise is the loss of the power supply (low because these are highly efficient ones) and the fact that the voltage regulators and DRAM consume a little more at higher load. Again, we are talking about very low numbers.

In the case of the Xeon E3, we also add about 3W for the Intel C224 chipset (0.7W idle, 4.1W TDP). For the X-Gene, we may assume that the idle power is a lot higher. Adding up the power of the different components (8 DIMMs, disabled 10 GbE, etc.), we estimate that it is about 10W.

For the total system power, that is, the power consumption of one node, we take the m300 numbers as measured. We subtract 7W from the m400 numbers as the m400 has four extra DIMM slots and a 10 GbE NIC. We add 9W to the SoC power of the Xeon E3 (which already includes 3W for the chipset), as we have found that 12W is more or less the power that a Xeon E3 node consumes without the SoC.

Power Consumption Calculations

| SoC | Power Delta = Power Web - Idle (W) | Power SoC = Power Delta + Idle SoC + Chipset (W) | Total System Power = Power SoC + Mobo (W) |
|---|---|---|---|
| Xeon E3-1240 v3 (3.4 GHz) | 95-42 = 53 | 53+3+3 = 59 | 53+3+12 = 68 |
| Xeon E3-1230L v3 (1.8 GHz) | 68-41 = 27 (45-18 = 27) | 27+3+3 = 33 | 27+3+12 = 42 |
| Xeon E3-1265L v2 (2.5 GHz) | 65-26 = 39 | 39+3+3 = 45 | 39+3+12 = 54 |
| Atom C2750 (2.4 GHz) | 25-11 = 13 | 13+3+0 = 16 | 25 |
| X-Gene | 67-37 = 30 | 30+10 = 40 | 67-7 = 60 |
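
The arithmetic behind the Xeon E3 rows can be sketched in a few lines of Python. This is only an illustration of the estimate described above, not a measurement tool: the function and parameter names are made up for this example, and the inputs are the wattages quoted in the text (3W SoC idle, 3W chipset, 12W for the rest of a Xeon E3 node).

```python
# Minimal sketch of the power estimate described in the text.
# All names here are hypothetical; the wattages come from the article.

def power_delta(load_w, idle_w):
    """Difference between node power under web load and node idle power."""
    return load_w - idle_w

def soc_power(delta_w, idle_soc_w, chipset_w):
    """Estimated SoC power: load delta + SoC idle power + chipset power."""
    return delta_w + idle_soc_w + chipset_w

def system_power(delta_w, idle_soc_w, rest_of_node_w):
    """Estimated node power: load delta + SoC idle power + rest of the node."""
    return delta_w + idle_soc_w + rest_of_node_w

# Xeon E3-1240 v3: 95 W under web load, 42 W idle
delta_1240 = power_delta(95, 42)                                  # 53
soc_1240 = soc_power(delta_1240, idle_soc_w=3, chipset_w=3)       # 59
node_1240 = system_power(delta_1240, idle_soc_w=3, rest_of_node_w=12)  # 68

# Xeon E3-1265L v2: 65 W under web load, 26 W idle
delta_1265 = power_delta(65, 26)                                  # 39
soc_1265 = soc_power(delta_1265, idle_soc_w=3, chipset_w=3)       # 45
node_1265 = system_power(delta_1265, idle_soc_w=3, rest_of_node_w=12)  # 54

print(soc_1240, node_1240, soc_1265, node_1265)
```

The Atom and X-Gene rows do not follow the same pattern for the system total: there the measured cartridge power is used directly (m300) or reduced by 7W (m400), as explained above.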

Let's discuss our findings. The Xeon E3-1240 v3 probably consumes about 50W under a high web load and is nowhere near its TDP (80W). The Xeon E3-1265L v2 (45W TDP) and Xeon E3-1230L v3 (25W TDP) probably consume slightly more than their advertised TDP. That is slightly worrying, as an integer workload that raises the CPU load to about 85-90% is not the worst situation you can imagine: a 100% FPU load will go far beyond the TDP numbers. The Atom C2750 requires the least power.

Performance per Watt

| SoC | SoC Total Power (SoC + Chipset) (W) | Total System Power (W) | Throughput at 1000ms | Throughput per Watt (SoC) | Throughput per Watt (System) |
|---|---|---|---|---|---|
| Xeon E3-1240 v3 (3.4 GHz) | 59 | 68 | 1221 | 20.7 | 18 |
| Xeon E3-1230L v3 (1.8 GHz) | 33 | 42 | 739 | 22.4 | 17.6 |
| Xeon E3-1265L v2 (2.5 GHz) | 45 | 54 | 759 | 16.9 | 14.1 |
| Atom C2750 (2.4 GHz) | 16 | 25 | 312 | 19.5 | 12.5 |
| X-Gene 1 (2.4 GHz) | 40 | 60 | 322 | 8 | 5.4 |
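
The efficiency columns are simply throughput divided by the estimated power. As a rough sketch, the table can be recomputed and ranked as follows; the variable and function names are illustrative, and the numbers are copied from the table above.

```python
# Minimal sketch: throughput per watt from the figures in the table.
# Tuple layout (hypothetical): name, SoC power incl. chipset (W),
# total system power (W), throughput at 1000 ms.
MEASUREMENTS = [
    ("Xeon E3-1240 v3", 59, 68, 1221),
    ("Xeon E3-1230L v3", 33, 42, 739),
    ("Xeon E3-1265L v2", 45, 54, 759),
    ("Atom C2750", 16, 25, 312),
    ("X-Gene 1", 40, 60, 322),
]

def throughput_per_watt(throughput, power_w):
    """Throughput divided by power in watts."""
    return throughput / power_w

# name -> (throughput per SoC watt, throughput per system watt)
efficiency = {
    name: (throughput_per_watt(tput, soc_w), throughput_per_watt(tput, sys_w))
    for name, soc_w, sys_w, tput in MEASUREMENTS
}

best_soc = max(efficiency, key=lambda name: efficiency[name][0])
best_system = max(efficiency, key=lambda name: efficiency[name][1])

for name, (per_soc_w, per_sys_w) in efficiency.items():
    print(f"{name:18s} {per_soc_w:6.1f} {per_sys_w:6.1f}")
print("Best per SoC watt:   ", best_soc)
print("Best per system watt:", best_system)
```

Ranked this way, the Xeon E3-1230L v3 leads per SoC watt and the Xeon E3-1240 v3 leads per system watt, which matches the table.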

We are not pretending that our calculations are 100% accurate, but they should be close enough. At the end of the day, a couple of watts more or less is not going to change our conclusion that the Xeon E3-1230L v3 and Xeon E3-1240 v3 are the most efficient processors for these workloads. The Xeon E3-1230L v3 wins because it will require less cooling and less electricity distribution infrastructure using the same dense servers.

The Atom wins if you are power limited, but its power efficiency is a bit lower when it comes to serving up a web infrastructure. Lastly, the X-Gene 1 has some catching up to do. The X-Gene 2 promises to be 50% more efficient. Software optimization efforts could bridge the rest of the gap, but we don't have a crystal ball.

47 Comments

  • JohanAnandtech - Tuesday, March 10, 2015 - link

    Thanks! It has been a long journey to get all the necessary tests done on different pieces of hardware and it is definitely not complete, but at least we were able to quantify a lot of paper specs. (25 W TDP of Xeon E3, 20W Atom, X-Gene performance etc.)
  • enzotiger - Tuesday, March 10, 2015 - link

    SeaMicro focused on density, capacity, and bandwidth.

    How did you come to that statement? Have you ever benchmarked (or even played with) any SeaMicro server? What capacity or bandwidth are you referring to? Are you aware of their plan down the road? Did you read AMD's Q4 earnings report?

    BTW, AMD doesn't call their servers micro-servers anymore. They use the term dense server.
  • Peculiar - Tuesday, March 10, 2015 - link

    Johan, I would also like to congratulate you on a well written and thorough examination of subject matter that is not widely evaluated.

    That being said, I do have some questions concerning the performance/watt calculations. Mainly, I'm concerned as to why you are adding the idle power of the CPUs in order to obtain the "Power SoC" value. The Power Delta should take into account the difference between the load power and the idle power and therefore you should end up with the power consumed by the CPU in isolation. I can see why you would add in the chipset power since some of the devices are SoCs and do not require a chipset and some are not. However, I do not understand the methodology in adding the idle power back into the Delta value. It seems that you are adding the load power of the CPU to the idle power of the CPU and that is partially why you have the conclusion that they are exceeding their TDPs (not to mention the fact that the chipset should have its own TDP separate from the CPU).

    Also, if one were to get nit picky on the power measurements, it is unclear if the load power measurement is peak, average, or both. I would assume that the power consumed by the CPUs may not be constant since you state that "the website load is a very bumpy curve with very short peaks of high CPU load and lots of lows." If possible, it may be more beneficial to measure the energy consumed over the duration of the test.
  • JohanAnandtech - Wednesday, March 11, 2015 - link

    Thanks for the encouragement. Regarding your concerns about the perf/watt calculations: power delta = average power (high web load, measured at 95th percentile = 1 s, averaged over about 2 minutes) - idle power. Since idle power = total idle power of the node, it also contains the idle power of the SoC, so you must add it back to get the power of the SoC. If you still have doubts, feel free to mail me.
  • jdvorak - Friday, March 13, 2015 - link

    The approach looks absolutely sound to me. The idle power will be drawn in any case, so it makes sense to add it in the calculation. Perhaps it would also be interesting to compare the power consumed by the different systems at the same load levels, such as 100 req/s, 200 req/s, ... (clearly, some higher loads will not be achievable by all of them).

    Johan, thanks a lot for this excellent, very informative article! I can imagine how much work has gone into it.
  • nafhan - Wednesday, March 11, 2015 - link

    If these had 10gbit - instead of gbit - NICs, these things could do some interesting stuff with virtual SANs. I'd feel hesitant shuttling storage data over my primary network connection without some additional speed, though.

    Looking at that moonshot machine, for instance: 45 x 480 SSD's is a decent sized little SAN in a box if you could share most of that storage amongst the whole moonshot cluster.

    Anyway, with all the stuff happening in the virtual SAN space, I'm sure someone is working on that.
  • Casper42 - Wednesday, April 15, 2015 - link

    Johan, do you have a full Moonshot 1500 chassis for your testing? Or are you using a PONK?
