Power Measurements using Trepn

Measuring power draw is a uniquely interesting capability of Qualcomm's MDPs. Using the Trepn Profiler software and measurement hardware integrated into the MDP, we can monitor a number of different power rails on the device, including power draw from each CPU core and the digital core (which includes the video decoder and modem), among other rails.

Measuring and keeping track of how different SoCs consume power is something we've wanted to do for a while, and at least under the Qualcomm MDP umbrella it's now possible to measure right on the device.

The original goal was to compare power draw on the 45nm MSM8660 versus the 28nm MSM8960; however, we encountered stability issues with Trepn Profiler on the older platform that are still being resolved. Thankfully it is possible to take measurements on the MSM8960, and for this we turned to a very CPU-intensive task that would last long enough to get a good measurement and also load both cores so we could see how things behave. That test is the Moonbat benchmark, which is a web-worker wrapper around the SunSpider 0.9 test suite. We fired up a test consisting of 4 workers and 50 runs inside Chrome Beta (which is web-worker enabled), and profiled using Trepn.
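Once a trace like this is captured, turning sampled power readings into energy consumed is a simple integration. The sketch below shows that post-processing step on a hypothetical set of evenly spaced samples; the values and sampling interval are illustrative, not actual Trepn output.

```python
# Sketch: converting evenly spaced power samples (in mW) from a profiler
# trace into total energy consumed in joules. The sample data below is
# hypothetical, standing in for an exported trace.

def energy_joules(samples_mw, interval_s):
    """Total energy in joules: sum of (power in W) x (sample interval in s)."""
    return sum(samples_mw) / 1000.0 * interval_s

# Ten hypothetical samples taken 0.1 s apart while a core sits near full load
trace = [750, 748, 752, 745, 760, 755, 749, 751, 747, 753]
print(round(energy_joules(trace, 0.1), 3))  # prints 0.751 (joules over ~1 s)
```

At roughly 750 mW sustained, a core burning through a long benchmark run adds up quickly, which is why sustained load tests are a better basis for energy comparisons than short bursts.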

If you squint at the graph, you can see that one Krait core can use around 750 mW at maximum load. I didn't enable the CPU frequency graph (just to keep things simple above), but that 750 mW figure occurs right at 1.5 GHz. The green spikes in battery power appear when we're drawing more current than USB can supply - this is also why you sometimes see devices discharge even when plugged in. There's an idle period at the end that I also left visible - you can see how quickly Qualcomm's governor suspends the second core completely after our Moonbat test finishes running.
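The battery-spike behavior comes down to simple arithmetic: whenever total platform draw exceeds what the USB port can deliver, the battery makes up the difference. Here's a minimal sketch of that budget calculation, assuming a standard USB 2.0 port at 5 V / 500 mA; the MDP's actual input current limit is an assumption here.

```python
# Sketch: why a device can discharge while plugged in. Assumes a USB 2.0
# port supplying 5 V at 500 mA (a 2500 mW budget) - the MDP's real input
# limit may differ.

USB_BUDGET_MW = 5.0 * 500  # 2500 mW from an assumed USB 2.0 port

def battery_draw_mw(total_draw_mw):
    """Power (mW) the battery must supply once the USB budget is exhausted."""
    return max(0.0, total_draw_mw - USB_BUDGET_MW)

print(battery_draw_mw(3100))  # heavy load: prints 600.0 (mW from battery)
print(battery_draw_mw(1800))  # light load: prints 0.0, USB covers it
```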

Here's another run of Moonbat on Chrome Beta where we can see the same behavior, zoomed in a bit better - each Krait core consumes anywhere between 450 mW and 750 mW depending on the workload, which changes during our run as V8 does its JIT compilation and Chrome dispatches work to each CPU core.

The next big question is obviously - how much does the GPU contribute to power drain? The red "Digital Core Rail Power" lines above include the Adreno 225 GPU, video decode, and "modem digital" blocks. Cellular is disabled on the MDP MSM8960, and we're not decoding any video, so in the right circumstances we can somewhat isolate the GPU. To find out, I profiled a run of GLBenchmark Egypt on High settings (which is an almost entirely GPU-bound test) and let it run to completion. You can see how the digital rail bounces between 800 mW and 1.2 W while the test is running. Egypt's CPU portions are pretty much single-threaded as well, as shown by the yellow and green lines above.
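The isolation logic is just a baseline subtraction: with the modem disabled and no video decode active, the loaded digital-rail reading minus an idle baseline approximates the Adreno 225's contribution. A sketch, where the idle baseline figure is an assumed number rather than a measured one:

```python
# Sketch of roughly isolating GPU power from the digital-core rail.
# With cellular disabled and no video decoding, (loaded reading - idle
# baseline) approximates GPU draw. The baseline below is hypothetical.

IDLE_DIGITAL_MW = 150  # assumed idle digital-rail draw, not measured

def gpu_estimate_mw(digital_rail_mw, idle_mw=IDLE_DIGITAL_MW):
    """Rough GPU power estimate (mW) from a digital-rail reading."""
    return digital_rail_mw - idle_mw

# The digital rail bounced between 800 mW and 1.2 W during Egypt
print(gpu_estimate_mw(800))   # prints 650 (mW estimate at the low end)
print(gpu_estimate_mw(1200))  # prints 1050 (mW estimate at the high end)
```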

Another interesting case is what this looks like when browsing the web. I fired up the analyzer, loaded the AnandTech homepage followed by an article, and scrolled the page in the trace above. Chrome and "Browser" on Android now use the GPU for composition and rendering the page, and you can see the red line in the plot spike up when I'm actively panning and translating around on the page. In addition, the second CPU core only really wakes up while the page is loading and HTML is being parsed.

One thing we unfortunately can't measure is how much power having the baseband lit up on each different air interface (CDMA2000 1x, EV-DO, WCDMA, LTE, etc.) consumes, as the MDP MSM8960 we were sampled with doesn't have cellular connectivity enabled. This is something that we understand in theory (at least for the respective WCDMA and LTE radio resource states), but it remains to be empirically explored. It's unfortunate that we also can't compare to the MDP MSM8660 quite yet, but that might become possible pretty quickly.


  • bhspencer - Tuesday, February 21, 2012 - link

    Does anyone know if Linpack is using hardware or software floating point calculations for the MFLOPS number?
  • metafor - Wednesday, February 22, 2012 - link

    Hardware. But it's run on the JIT instead of native code. According to CF-Bench, Java FP performance is around 1/3 of native. Neither actually uses NEON; both use the older VFP instructions.
  • vision33r - Tuesday, February 21, 2012 - link

    The Tegra 3 is actually a big disappointment from a performance standpoint. It actually has 5 CPU cores and the GPU performance isn't much better than the Tegra 2. The Adreno 225 is a much bigger upgrade but I'm afraid that it's another marginal upgrade.

    The A5 in the iPad2/iPhone 4S are over 1 year old by March. In that time, Nvidia's Tegra 2/3 has not dominated and the MSM8960 is finally a true contender for the fastest SOC on the market. By the time this thing is out in volume, Apple has the A6 ready and most likely another 4-8x performance increase over the A5.

    This SOC will probably be forgotten when the A6 is out.
  • LetsGo - Wednesday, February 22, 2012 - link

    Yeah, you're right, just look at my Asus Transformer Prime running GTA 3. /S

    A lot of graphical optimisations can be done on the CPU cores before data is offloaded to the GPU.

    The moral of the story is that Benchmarks are only a rough guide at best.
  • tipoo - Wednesday, February 22, 2012 - link

    Unless the rumors are true and it's the A5X, not an A6, with just faster dual cores rather than quads on a newer architecture. I wouldn't be surprised; it's like how the 3G to 3GS was an architecture change, then the 4 was just a faster chip on a similar architecture. The iPad 2 was an architecture change, the 3 might just be a faster version of the same thing, hopefully with improvements in the GPU. I'd be fine with that, as long as the GPU kept up with the new resolution.
  • Stormkroe - Tuesday, February 21, 2012 - link

    I was just plotting out what little resolution scaling info there is here and noticed something very odd. Both the iPhone 4S and Galaxy S2 actually score MUCH higher when the resolution is raised to 720p offscreen. I can see that in the 4S' case it could be explained with fps caps, but the S2 is definitely not hitting a cap at 34.6 fps @ 800x480, yet it hits 42.5 fps @ 1280x720. All other phones predictably step down in speed. Anyone else notice this?
  • Alexstarfire - Tuesday, February 21, 2012 - link

    Yes I did. It was actually the reason I was going to post. I was curious to know if the iPhone had VSync or not because it made no sense that it would get better performance at a higher resolution. Neither of the results make any sense to me since they are counter-intuitive.

    If the "offscreen" tests force VSync off then that could explain it for the iPhone but not really for the SGSII unless some parts of the test go way past the 60FPS cap with VSync turned on.
  • alter.eg00 - Wednesday, February 22, 2012 - link

    Shut up & take my money
  • Denithor - Wednesday, February 22, 2012 - link

    Seconded!!

    I'm still carrying a first generation HTC Incredible (yep, one of the original ones!), been out of contract for a few months, was waiting to hear more about the 28nm SoC update. These look really, really good, seriously looking forward to them hitting the market now!
  • tipoo - Wednesday, February 22, 2012 - link

    I wonder how many apps scale beyond two cores. For the time being, I doubt it's many, and since you're still not doing any true multitasking, I think a faster dual core like this will trump a slower quad like the Tegra 3 most of the time.
