Krait: Idle Power

We'll start our power investigation by looking at behavior at idle. Although battery life while you're actually using your device is very important, a fast SoC that can quickly complete tasks and race to sleep only pays off if the platform can then drop down to very low idle power levels. Here we're looking at power consumption at the Start Screen in Windows RT/8. You'll notice that there are two distinct periods during the benchmark, with the latter part of the graph showing lower power consumption thanks to the live tiles going to sleep. In this test WiFi is enabled but there's no background syncing of anything; WiFi being on is why we continue to see power spikes even after the live tiles have gone to sleep:

The W510 does a great job of drawing very little power at idle. Its silly WiFi implementation results in peak idle power consumption that's very similar to the Dell XPS 10's, but the lowest the platform hits is appreciably lower than anything else. Surface RT remains the most power hungry of the three, while the XPS 10 falls somewhere in between Microsoft and Acer.
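For anyone who wants to crunch similar traces themselves, below is a minimal sketch of how a sampled power log can be split into the two phases described above and summarized. The CSV layout, the 'time_s' and 'power_mw' column names, and the 60-second split point are assumptions for illustration only, not the actual format our instrumentation produces.

    # Minimal sketch: summarize a sampled idle power trace into the
    # "live tiles active" and "live tiles asleep" phases.
    # Assumed, not the real capture format: a CSV with 'time_s' and 'power_mw'
    # columns and a 60-second mark where the tiles have gone to sleep.
    import csv

    def summarize(path="power.csv", split_s=60.0):
        active, asleep = [], []
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                t, p = float(row["time_s"]), float(row["power_mw"])
                (active if t < split_s else asleep).append(p)
        for name, samples in (("live tiles active", active),
                              ("live tiles asleep", asleep)):
            if samples:
                avg = sum(samples) / len(samples)
                print(f"{name}: avg {avg:.1f} mW, min {min(samples):.1f} mW, "
                      f"max {max(samples):.1f} mW")

    if __name__ == "__main__":
        summarize()

The average over the asleep phase is what matters most for standby battery life, while the WiFi-driven spikes show up in the max column.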

If we isolate CPU core power alone, things look a bit different. Keep in mind that we don't have the L2 power island instrumented, so the XPS 10 looks a little better than it should here, but minimum CPU power consumption is still very good on Krait. Although the Atom Z2760 is built on a special SoC derivative of Intel's 32nm process, I suspect it's not quite as low power as TSMC's 28nm LP; things may change once 22nm rolls around, however. All meaningful compute transistors here should be power gated, so what we end up looking at is the best case leakage for each SoC. The Krait/28nm LP combination is awesome. I'm not sure why Tegra 3 is so much more active towards the very end of the curve by comparison.
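To make the L2 caveat concrete, here's a rough, purely illustrative sketch; every rail name and milliwatt figure below is made up, the only point being that power flowing through an uninstrumented rail simply vanishes from the reported CPU number.

    # Illustration only: hypothetical per-rail idle power figures in milliwatts.
    # The CPU core rail is instrumented on our Krait platform, but the separate
    # L2 rail is not, so its contribution never shows up in the reported figure.
    rails_mw = {
        "cpu_cores": 8.0,    # instrumented
        "l2_cache": 3.0,     # not instrumented (hypothetical value)
        "gpu": 5.0,
        "rest_of_soc": 20.0,
    }

    reported_cpu = rails_mw["cpu_cores"]                        # what our graph shows
    cpu_plus_l2 = rails_mw["cpu_cores"] + rails_mw["l2_cache"]  # apples-to-apples vs. Atom

    print(f"reported CPU idle power: {reported_cpu:.1f} mW")
    print(f"CPU + L2 idle power:     {cpu_plus_l2:.1f} mW")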

Adreno 225, or at least whatever Qualcomm drives off of the GPU power rail, is extremely power efficient at idle. The PowerVR SGX 545 curve looks flatter at the end, but Qualcomm is able to hit lower minimum power levels. It's not clear to me how much of this is architecture vs. process technology. On the GPU side there is still some activity, as the display is being refreshed even though the system is idle, so we're not looking at purely power gated consumption here.

To take the WiFi controller out of the equation, I tossed all tablets into Airplane mode and re-ran the same tests as above. You'll notice much less fluctuation in power consumption once the live tiles go to sleep.

Take WiFi out of the equation and Acer's W510 looks really good. Intel worked very hard with Acer to ensure power consumption was as low as possible on this device. The XPS 10 does a bit better than Surface RT here, but not tremendously so. Acer/Intel hold the clear advantage.

Looking at the CPU power island alone (excluding the L2 cache for Krait), we continue to see lower idle power consumption from APQ8060A vs. Atom Z2760. Once again I believe this is a TSMC 28nm LP advantage more than an architectural thing.

140 Comments

  • A5 - Friday, January 4, 2013

    Even if you just look at the SunSpider power draw (which draws nothing on the screen), it's pretty clear that the A15 draws more power. There have been a ton of OEMs complaining about the A15's power draw, too.
  • madmilk - Friday, January 4, 2013

    Since when did screen resolution matter for CPU power consumption on CPU benchmarks? Platform power might change, yes, but this doesn't invalidate many facts like Cortex-A15 using twice as much power on average compared to Krait, Atom or Cortex-A9.
  • Wolfpup - Friday, January 4, 2013

    Good lord. Do you have some evidence for any of this? If neither Windows nor Android is the "right platform" for ARM, then...are you waiting for Blackberry benchmarks? That's a whole lot of spin you're doing, presumably to fit the data to your preconceived "ARM IS BETTER!" faith.
  • Veteranv2 - Friday, January 4, 2013

    Hahaha, the Nexus 10 has almost 4 times the pixels of the Atom tablets.
    And the conclusion is that it draws more power in benchmarks? Of course, those pixels aren't going to fill themselves. Way to draw a conclusion.

    How big was that Intel PR cheque?
  • iwod - Saturday, January 5, 2013

    While I wouldn't say it was Intel PR, I think they should definitely have left the system-level power usage out of the equation. There is no point telling me that a 100" screen with ARM is using X amount of power compared to a 1" screen with Haswell.

    It is confusing.

    But they did include CPU and GPU benchmarks. So saying it is Intel PR is just trolling.
  • AlB80 - Friday, January 4, 2013

    Architectures with variable-length instructions are doomed. Actually, only one remains: x86.
    Intel made a step back into the happy past when CISC had an advantage over RISC, when superscalarity was just a theory.
    The Cortex-A57 is coming. ARM cores will easily outperform Atom in effective instruction rate with minimal overhead.
  • Wolfpup - Friday, January 4, 2013

    How is x86 doomed when it has an absolute stranglehold on real PCs, and is now competitive on ultramobile platforms?

    The only disadvantage it holds is the need for a larger decoder on the front end, which has been proportionally shrinking since 1995.
  • djgandy - Friday, January 4, 2013

    plus effing one!

    I think some people heard their uni lecturers say something once in 1999 and just keep repeating it as if it is still true!
  • AlB80 - Friday, January 4, 2013

    Shrinking decoder... nice myth. Of course the complicated scheduler and the dozen ALUs have an impact on performance, but don't forget how the decoded instruction queues are filled. The decoder is the only real difference.
    1. There are fundamental limits on how many variable-length instructions can be decoded per clock. CISC suffers from instruction cross-interference at the decode stage: one logical block has to determine the total length of the decoded instructions.
    2. There is a trick where the CISC decoder is split into 2-3 parts with dedicated inputs, so it looks like a few independent decoders, but each part cannot decode every instruction.

    Now compare that with RISC.
    And as I said, what happens when a Cortex can decode 4, 5, 6, 7, 8 instructions?
  • Kogies - Friday, January 4, 2013

    Don't be so quick to prophesy the death of a' that. What happens when a Cortex decodes 8 instructions... I don't know, it uses 8W?

    Also, didn't Apple choose CISC (Intel) chips over RISC (PowerPC)? Interestingly, I believe Apple made the switch to Intel because the PowerPC chips had too high a power premium for mobile computers.
