Modifying a Krait Platform: More Complicated

Modifying the Dell XPS 10 is a little more difficult than Acer's W510 and Surface RT. In both of those products there was only a single inductor in the path from the battery to the CPU block of the SoC. The XPS 10 uses a dual-core Qualcomm solution however. Ever since Qualcomm started doing multi-core designs it has opted to use independent frequency and voltage planes for each core. While all of the A9s in Tegra 3 and both of the Atom cores used in the Z2760 run at the same frequency/voltage, each Krait core in the APQ8060A can run at its own voltage and frequency. As a result, there are two power delivery circuits that are needed to feed the CPU cores. I've highlighted the two inductors Intel lifted in orange:

Each inductor was lifted and wired with a 20 mΩ resistor in series. The voltage drop across the 20 mΩ resistor was measured and used to calculate CPU core power consumption in real time. Unless otherwise stated, the graphs here represent the total power drawn by both CPU cores.

Unfortunately, that's not all that's necessary to accurately measure Qualcomm CPU power. If you remember back to our original Krait architecture article you'll know that Qualcomm puts its L2 cache on a separate voltage and frequency plane. While the CPU cores in this case can run at up to 1.5GHz, the L2 cache tops out at 1.3GHz. I remembered this little fact late in the testing process, and we haven't yet found the power delivery circuit responsible for Krait's L2 cache. As a result, the CPU specific numbers for Qualcomm exclude any power consumed by the L2 cache. The total platform power numbers do include it however as they are measured at the battery.

The larger inductor in yellow feeds the GPU and it's instrumented using another 20 mΩ resistor.

Visualizing Krait's Multiple Power/Frequency Domains

Qualcomm remains adament about its asynchronous clocking with multiple voltage planes. The graph below shows power draw broken down by each core while running SunSpider:

SunSpider is a great benchmark to showcase exactly why Qualcomm has each core running on its own power/frequency plane. For a mixed workload like this, the second core isn't totally idle/power gated but it isn't exactly super active either. If both cores were tied to the same voltage/frequency, the second core would have higher leakage current than in this case. The counter argument would be that if you ran the second core at its max frequency as well it would be able to complete its task quicker and go to sleep, drawing little to no power. The second approach would require a very fast microcontroller to switch between v/f modes and it's unclear which of the two would offer better power savings. It's just nice to be able to visualize exactly why Qualcomm does what it does here.

On the other end of the spectrum however is a benchmark like Kraken, where both cores are fairly active and the workload is balanced across both cores:

 

Here there's no real benefit to having two independent voltage/frequency planes, both cores would be served fine by running at the same voltage and frequency. Qualcomm would argue that the Kraken case is rare (single threaded performance still dominates most user experience), and the power savings in situations like SunSpider are what make asynchronous clocking worth it. This is a much bigger philosophical debate that would require far more than a couple of graphs to support and it's not one that I want to get into here. I suspect that given its current power management architecture, Qualcomm likely picked the best solution possible for delivering the best possible power consumption. It's more effort to manage multiple power/frequency domains, effort that I doubt Qualcomm would put in without seeing some benefit over the alternative. That being said, what works best for a Qualcomm SoC isn't necessarily what's best for a different architecture.

Introduction Krait: Idle Power
Comments Locked

140 Comments

View All Comments

  • kumar0us - Friday, January 4, 2013 - link

    My point was that for a CPU benchmark say Sunspider, the code generated by x86 compilers would be better than ARM compilers.

    Could better compilers available for x86 platform be a (partial) reason for faster performance of intel. Or compilers for ARM platform are mature and fast enough that this angle could be discarded?
  • iwod - Friday, January 4, 2013 - link

    Yes, not just compiler but general optimization in software on x86. Which is giving some advantage on Intel's side. However with the recent surge of ARM platform and software running on it my ( wild ) guess is that this is less then 5% in the best case scenario. And it is only the worst case, or individual cases like SunSpider not running fully well.
  • jwcalla - Friday, January 4, 2013 - link

    Yes. And it was a breath of fresh air to see Anand mention that in the article.

    Look at, e.g., the difference in SunSpider benchmarks between the iPad and Nexus 10. Completely different compilers and completely different software. As the SunSpider website indicates, the benchmark is designed to compare browsers on the same system, not across different systems.
  • monstercameron - Friday, January 4, 2013 - link

    it would be interesting to throw an amd system into the benchmarking, maybe the current z-01 or the upcoming z-60...
  • silverblue - Friday, January 4, 2013 - link

    AMD has thrown a hefty GPU on die, which, coupled with the 40nm process, isn't going to help with power consumption whatsoever. The FCH is also separate as opposed to being on-die, and AMD tablets seem to be thicker than the competition.

    AMD really needs Jaguar and its derivatives and now. A dual core model with a simple 40-shader GPU might be a competitive part, though I'm always hearing about the top-end models which really aren't aimed at this market. Perhaps AMD will use some common sense and go for small, volume parts over the larger, higher performance offerings, and actually get themselves into this market.
  • BenSkywalker - Friday, January 4, 2013 - link

    There is an AMD design in their, Qualcomm's part.

    A D R E N O
    R A D E O N

    Not a coincidence, Qualcomm bought AMD's ultra portable division off from them for $65 million a few years back.

    Anand- If this is supposed to be a CPU comparison, why go overboard with the terrible browser benchmarks? Based on numbers you have provided, Tegra 3 as a generic example is 100% faster under Android then WinRT depending on the bench you are running. If this was an article about how the OSs handle power tasks I would say that is reasonable, but given that you are presenting this as a processor architecture article I would think that you would want to use the OS that works best with each platform.
  • powerarmour - Friday, January 4, 2013 - link

    Agreed, those browser benchmarks seem a pretty poor way to test general CPU performance, in fact browser benchmarks in general just test how optimized a particular browser is on a particular OS mainly.

    In fact I can beat most of those results with a lowly dual-A9 Galaxy Nexus smartphone running Android 4.2.1!
  • Pino - Friday, January 4, 2013 - link

    I remember AMD having a dual core APU (Ontario) with a 9W TDP, on a 40nm process, back in 2010.

    They should invest on a SOC
  • kyuu - Friday, January 4, 2013 - link

    That's what Temash is going to be. They just need to get it on the market and into products sooner rather than later.
  • jemima puddle-duck - Friday, January 4, 2013 - link

    Impressive though all this engineering is, in the real world what is the unique selling point for this? Normal people (not solipsistic geeks) don't care what's inside their phone, and the promise of their new phone being slighty faster than another phone is irrelevant. And for manufacturers, why ditch decades of ARM knowledge to lock yourself into one supplier. The only differentiator is cost, and I don't see Intel undercutting ARM any time soon.

    The only metric that matters is whether normal human beings get any value from it. This just seems like (indirect) marketing by Intel for a chip that has no raison d'etre. I'm hearing lots of "What" here, but no "Why". This is the analysis I'm interested in.

    All that said, great article :)

Log in

Don't have an account? Sign up now