Original Link: http://www.anandtech.com/show/6536/arm-vs-x86-the-real-showdown



Late last month, Intel dropped by my office with a power engineer for a rare demonstration of its competitive position versus NVIDIA's Tegra 3 when it came to power consumption. Like most companies in the mobile space, Intel doesn't just rely on device level power testing to determine battery life. In order to ensure that its CPU, GPU, memory controller and even NAND are all as power efficient as possible, most companies will measure power consumption directly on a tablet or smartphone motherboard.

The process would be a piece of cake if you had measurement points already prepared on the board, but in most cases Intel (and its competitors) are taking apart a retail device and hunting for a way to measure CPU or GPU power. I described how it's done in the original article:

Measuring power at the battery gives you an idea of total platform power consumption including display, SoC, memory, network stack and everything else on the motherboard. This approach is useful for understanding how long a device will last on a single charge, but if you're a component vendor you typically care a little more about the specific power consumption of your competitors' components.

What follows is a good mixture of art and science. Intel's power engineers will take apart a competing device and probe whatever looks to be a power delivery or filtering circuit while running various workloads on the device itself. By correlating the type of workload to spikes in voltage in these circuits, you can figure out what components on a smartphone or tablet motherboard are likely responsible for delivering power to individual blocks of an SoC. Despite the high level of integration in modern mobile SoCs, the major players on the chip (e.g. CPU and GPU) tend to operate on their own independent voltage planes.


A basic LC filter

What usually happens is you'll find a standard LC filter (inductor + capacitor) supplying power to a block on the SoC. Once the right LC filter has been identified, all you need to do is lift the inductor, insert a very small resistor (2 - 20 mΩ) and measure the voltage drop across the resistor. With voltage and resistance values known, you can determine current and power. Using good external instruments (NI USB-6289) you can plot power over time and now get a good idea of the power consumption of individual IP blocks within an SoC.


Basic LC filter modified with an inline resistor
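As a rough illustration of the arithmetic behind this technique, here is a minimal Python sketch. It assumes you also know the rail voltage on the load side of the resistor (the text only says that voltage and resistance are known); the function names and sample values are hypothetical and are not taken from Intel's tooling.

```python
def shunt_current_a(v_drop_v: float, r_shunt_ohm: float) -> float:
    """Ohm's law: current through the shunt is the voltage drop over its resistance."""
    if r_shunt_ohm <= 0:
        raise ValueError("shunt resistance must be positive")
    return v_drop_v / r_shunt_ohm


def rail_power_w(v_drop_v: float, r_shunt_ohm: float, v_rail_v: float) -> float:
    """Power delivered to the block: rail voltage times the current through the shunt."""
    return v_rail_v * shunt_current_a(v_drop_v, r_shunt_ohm)


# Hypothetical sample: a 10 mV drop across a 20 mOhm shunt on a 1.0 V rail.
current = shunt_current_a(0.010, 0.020)
power = rail_power_w(0.010, 0.020, 1.0)
print(f"{current:.3f} A, {power:.3f} W")
```

A very small resistor is used so that the drop it introduces (millivolts) doesn't meaningfully disturb the rail being measured.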

The previous article focused on an admittedly not too interesting comparison: Intel's Atom Z2760 (Clover Trail) versus NVIDIA's Tegra 3. After much pleading, Intel returned with two more tablets: a Dell XPS 10 using Qualcomm's APQ8060A SoC (dual-core 28nm Krait) and a Nexus 10 using Samsung's Exynos 5 Dual (dual-core 32nm Cortex A15). What was a walk in the park for Atom all of a sudden became much more challenging. Both of these SoCs are built on very modern, low power manufacturing processes and Intel no longer has a performance advantage compared to Exynos 5.

Just like last time, I ensured all displays were calibrated to our usual 200 nits setting and ensured the software and configurations were as close to equal as possible. Both tablets were purchased at retail by Intel, but I verified their performance against our own samples/data and noticed no meaningful deviation. Since I don't have a Dell XPS 10 of my own, I compared performance to the Samsung ATIV Tab and confirmed that things were at least performing as they should.

We'll start with the Qualcomm based Dell XPS 10...



Modifying a Krait Platform: More Complicated

Modifying the Dell XPS 10 is a little more difficult than Acer's W510 and Surface RT. In both of those products there was only a single inductor in the path from the battery to the CPU block of the SoC. The XPS 10 uses a dual-core Qualcomm solution however. Ever since Qualcomm started doing multi-core designs it has opted to use independent frequency and voltage planes for each core. While all of the A9s in Tegra 3 and both of the Atom cores used in the Z2760 run at the same frequency/voltage, each Krait core in the APQ8060A can run at its own voltage and frequency. As a result, there are two power delivery circuits that are needed to feed the CPU cores. I've highlighted the two inductors Intel lifted in orange:

Each inductor was lifted and wired with a 20 mΩ resistor in series. The voltage drop across the 20 mΩ resistor was measured and used to calculate CPU core power consumption in real time. Unless otherwise stated, the graphs here represent the total power drawn by both CPU cores.
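To make the "total power drawn by both CPU cores" figure concrete, here is a small Python sketch of how two independently instrumented rails could be combined for a single sample. The per-core rail voltages and voltage drops below are made-up illustrative values; only the 20 mΩ shunt value comes from the text.

```python
R_SHUNT_OHM = 0.020  # 20 mOhm shunt, the value quoted in the text


def core_power_w(v_drop_v, v_rail_v, r_shunt_ohm=R_SHUNT_OHM):
    """Power on one core's rail: rail voltage times shunt current (I = V_drop / R)."""
    return v_rail_v * (v_drop_v / r_shunt_ohm)


def total_cpu_power_w(core_samples):
    """Sum per-core power for one sample instant.

    core_samples is a list of (v_drop_v, v_rail_v) tuples, one per core,
    because each Krait core sits on its own voltage plane.
    """
    return sum(core_power_w(v_drop, v_rail) for v_drop, v_rail in core_samples)


# Hypothetical instant: core 0 is busy at a higher voltage, core 1 is nearly idle.
sample = [(0.010, 1.1), (0.002, 0.9)]
print(f"total CPU power: {total_cpu_power_w(sample):.2f} W")
```

Because the cores can sit at different voltages, each rail's power has to be computed separately before summing; adding the voltage drops first would give the wrong answer.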

Unfortunately, that's not all that's necessary to accurately measure Qualcomm CPU power. If you remember back to our original Krait architecture article you'll know that Qualcomm puts its L2 cache on a separate voltage and frequency plane. While the CPU cores in this case can run at up to 1.5GHz, the L2 cache tops out at 1.3GHz. I remembered this little fact late in the testing process, and we haven't yet found the power delivery circuit responsible for Krait's L2 cache. As a result, the CPU specific numbers for Qualcomm exclude any power consumed by the L2 cache. The total platform power numbers do include it however as they are measured at the battery.

The larger inductor in yellow feeds the GPU and it's instrumented using another 20 mΩ resistor.

Visualizing Krait's Multiple Power/Frequency Domains

Qualcomm remains adamant about its asynchronous clocking with multiple voltage planes. The graph below shows power draw broken down by each core while running SunSpider:

SunSpider is a great benchmark to showcase exactly why Qualcomm has each core running on its own power/frequency plane. For a mixed workload like this, the second core isn't totally idle/power gated but it isn't exactly super active either. If both cores were tied to the same voltage/frequency, the second core would have higher leakage current than in this case. The counter argument would be that if you ran the second core at its max frequency as well it would be able to complete its task quicker and go to sleep, drawing little to no power. The second approach would require a very fast microcontroller to switch between v/f modes and it's unclear which of the two would offer better power savings. It's just nice to be able to visualize exactly why Qualcomm does what it does here.

On the other end of the spectrum however is a benchmark like Kraken, where both cores are fairly active and the workload is balanced across both cores:

 

Here there's no real benefit to having two independent voltage/frequency planes; both cores would be served fine by running at the same voltage and frequency. Qualcomm would argue that the Kraken case is rare (single threaded performance still dominates most user experience), and the power savings in situations like SunSpider are what make asynchronous clocking worth it. This is a much bigger philosophical debate that would require far more than a couple of graphs to support and it's not one that I want to get into here. I suspect that given its current power management architecture, Qualcomm likely picked the best available solution for minimizing power consumption. It's more effort to manage multiple power/frequency domains, effort that I doubt Qualcomm would put in without seeing some benefit over the alternative. That being said, what works best for a Qualcomm SoC isn't necessarily what's best for a different architecture.



Krait: Idle Power

We'll start out our power investigation looking at behavior at idle. Although battery life when you're actually using your device is very important, having a fast SoC that can quickly complete tasks and race to sleep means that you need to be able to drive down to very low idle power levels to actually benefit from that performance. Here we're looking at power consumption at the Start Screen in Windows RT/8. You'll notice that there are two distinct periods during the benchmark, with the latter part of the graph showing lower power consumption thanks to the live tiles going to sleep. In this test, WiFi is enabled but there's no background syncing of anything. WiFi being on is why we continue to see power spikes even after the live tiles have gone to sleep:

The W510 does a great job of drawing little power at idle. Its silly WiFi implementation results in peak idle power consumption that's very similar to the Dell XPS 10, but the lowest the platform hits is appreciably lower than anything else. Surface RT remains the more power hungry of the three, while the XPS 10 falls somewhere in between MS and Acer.

If we isolate CPU core power alone though, things are a bit different. Keep in mind that we don't have the L2 power island instrumented, so the XPS 10 looks a little better than it should here but minimum CPU power consumption is very good on Krait. Although the Atom Z2760 is built on a special SoC derivative of Intel's 32nm process, I do suspect that it's not quite as low power as TSMC's 28nm LP. Things may change by the time 22nm rolls around however. All meaningful compute transistors here should be power gated, and what we end up looking at is the best case leakage for all SoCs. The Krait/28nm LP combination is awesome. I'm not sure why Tegra 3 is so much more active here towards the very end of the curve by comparison.

Adreno 225, or at least whatever Qualcomm drives off of the GPU power rail is extremely power efficient at idle. The PowerVR SGX 545 curve looks flatter at the end but Qualcomm is able to hit lower minimum power levels. It's not clear to me how much of this is architecture vs. process technology. On the GPU side there is some activity happening here as the display is still being refreshed even though the system is idle, so we're not looking at purely power gated consumption here.

To take the WiFi controller out of the equation, I tossed all tablets into Airplane mode and re-ran the same tests as above. You'll notice much less fluctuation in power consumption once the live tiles go to sleep.

Take WiFi out of the equation and Acer's W510 looks really good. Intel worked very hard with Acer to ensure power consumption was as low as possible on this device. The XPS 10 does a bit better than Surface RT here, but not tremendously so. Acer/Intel hold the clear advantage.

Looking at the CPU power island alone (excluding the L2 cache for Krait), we continue to see lower idle power consumption from APQ8060A vs. Atom Z2760. Once again I believe this is a TSMC 28nm LP advantage more than an architectural thing.



SunSpider 0.9.1

The results get more interesting when we look at power consumption during active workloads. We'll start off with SunSpider, a mid-length JavaScript benchmark that we frequently use in our reviews:

At the platform level, Qualcomm's APQ8060A powered Dell XPS 10 falls in between Surface RT and Acer's W510. Active power looks very similar to the Intel powered W510, but performance is appreciably slower so total energy consumed is higher.

Looking at the CPU, the situation changes a bit. Intel's peak power consumption is similar to Tegra 3, while Krait manages to come in appreciably lower. I suspect that missing the L2 cache power island here is lowering Qualcomm's power consumption by 100 - 200mW but overall CPU-only power consumption would still be lower. Once again, at idle Krait seems to have a bit of an advantage as well.

The situation changes once we look at GPU power consumption, with Intel/Imagination having the clear advantage here.

JavaScript Performance - SunSpider 0.9.1

Kraken

Mozilla's Kraken benchmark is a new addition to our js performance suite, and it's a beast. The test runs for much longer than SunSpider, but largely tells a similar story:

 

At the platform level, Acer's W510 has slightly higher peak power consumption compared to the Dell XPS 10 but it also completes the test quicker, giving it a better overall energy usage profile.

Looking at the CPU cores themselves, Qualcomm holds onto its lead here although once again, I suspect the margin of victory is exaggerated by the fact that we're not taking into account L2 power consumption for Qualcomm. Intel does deliver better performance, which allows the CPU to race to sleep quicker than on APQ8060A.

The comparison to Tegra 3 is not surprising; this is exactly what we've seen play out in our battery life tests as well.

JavaScript Performance - Mozilla Kraken Benchmark

RIABench

RIABench's Focus Tests are on the other end of the spectrum, and take a matter of seconds to complete. What we get in turn is a more granular look at power consumption:

 

Here the W510 consumes more power at the platform level, but drops to a lower idle state than the XPS 10. Surface RT clearly uses more power than both.

Krait's CPU level (excluding L2 cache) power consumption is once again lower than Atom's, but Atom completes the task quicker. In this case total energy usage is still in Qualcomm's favor. The discrepancy between the CPU specific power results and the total platform results is partly due to the missing L2 cache power consumption data from the CPU power chart for Qualcomm, and partly due to differences in the tablets themselves.

JavaScript Performance - RIABench Focus Tests



Krait: WebXPRT 2013 Community Preview 1

I also included Principled Technologies' new HTML5/js web test suite WebXPRT in our power analysis. Intel and Qualcomm remain quite close in these tests. Because I didn't run the Qualcomm tests at the same time as the Intel tests, the graphs aren't perfectly aligned; as a result it looks like Intel took longer to complete the test when in reality the opposite is true. Once again at the platform level, the W510 beats the XPS 10, but at the CPU level Krait manages to do better than Atom. Looking at GPU power consumption alone, Intel/Imagination are once again more power efficient. As 3D performance doesn't matter much here, the Qualcomm/Adreno 225 3D performance advantage does nothing - it just costs more power.

Once again there's no contest when we include Tegra 3 in the comparison. Atom/Krait are in a different league. It'll be interesting to see how Tegra 4 will do here...

WebXPRT - Overall Score

TouchXPRT 2013

As our first native client test, we turned to PT's TouchXPRT 2013. As there is no "run-all" functionality in the TouchXPRT benchmark, we had to present individual power curves for each benchmark. Unlike the previous tests where Qualcomm was consistently slower than Intel, many of the TouchXPRT tests show the two competitors performing quite similarly. This gives us a better idea of how these two fare when performance is equal. For the most part, Acer/Intel seem to win at the platform and GPU levels, while Qualcomm takes the win at the CPU level. Once again, it's not abundantly clear to me how much of Qualcomm's CPU core power advantage is due to the fact that we're not taking into account power consumption of the L2 cache.

 

 

 



GPU Power Consumption - 3D Gaming Workload

While we don't yet have final GPU benchmarks under Windows RT/8 that we can share numbers from, the charts below show power consumption in the same DX title running through roughly the same play path. Tegra 3 remains the fastest in this test, followed by Adreno 225 and finally the PowerVR SGX 545/Atom solution. Power consumption roughly follows that same order, however Tegra 3 burns much more power in delivering that performance than either of the competitors. I'd be really interested to see how some of the higher performing Imagination cores do here.



What's Next? ARM's Cortex A15

Comparing to Qualcomm's APQ8060A gives us a much better idea of how Atom fares in the modern world. Like Intel, Qualcomm appears to prioritize single threaded performance and builds its SoCs on a leading edge LP process. If this were the middle of 2012, the Qualcomm comparison is where we'd stop. However, this is a new year and there's a new kid in town: ARM's Cortex A15.

We've already looked at Cortex A15 performance and found it to be astounding. While Intel's 5-year old Atom core can still outperform most of the other ARM based designs on the market, the Cortex A15 easily outperforms it. But at what power cost?

To find out, we looked at a Google Nexus 10 featuring a Samsung Exynos 5250 SoC. The 5250 (aka Exynos 5 Dual) features two ARM Cortex A15s running at up to 1.7GHz, coupled with an ARM Mali-T604 GPU. The testing methodology remains identical.

Idle Power

As the Exynos 5250 isn't running Windows RT, we don't need to go through the same song and dance to wait for live tiles to stop animating. The Android home screen is static to begin with, so all swings in power consumption have more to do with WiFi at this point:

At idle, the Nexus 10 platform uses more power than any of the other tablets. This shouldn't be too surprising as the display requires much more power, so I don't think we can draw any conclusions about the SoC just yet. But just to be sure, let's look at power delivery to the 5250's CPU and GPU blocks themselves:

Ah the wonderful world of power gating. Despite having much more power hungry CPU cores, when they're doing nothing the ARM Cortex A15 looks no different than Atom or even Krait.

Mali-T604 looks excellent here. With virtually nothing happening on the display the GPU doesn't have a lot of work to do to begin with, and I believe we're also seeing some of the benefits of Samsung's 32nm LP (HK+MG) process.

Remove WiFi from the equation and things remain fairly similar: total platform power is high thanks to a more power hungry display, but at the SoC level idle power consumption is competitive. The GPU power consumption continues to be amazing, although it's possible that Samsung simply doesn't dangle as much off of the GPU power rail as the competitors.



Cortex A15: SunSpider 0.9.1

SunSpider performance in Chrome on the Nexus 10 isn't all that great to begin with, so the Exynos 5250 curve is longer than the competition. I wouldn't pay too much attention to overall performance as that's more of a Chrome optimization issue, but we begin to shine some light on Cortex A15's power consumption:

Although these line graphs are neat to look at, it's tough to quantify exactly what's going on here. Following every graph from here on forward I'll present a bar chart that integrates over the benchmark time period (excluding idle) and presents total energy used during the task in Joules.
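For readers who want to reproduce this kind of bar chart from their own logs, here is a minimal Python sketch of integrating a power trace into task energy. It uses the trapezoidal rule, which is my assumption (the article doesn't say which integration method was used), and the timestamps and wattages are invented for illustration.

```python
def task_energy_joules(times_s, power_w, start_s, end_s):
    """Integrate power over [start_s, end_s] with the trapezoidal rule.

    Only sample intervals that lie entirely inside the window are counted,
    which is one simple way to exclude the idle periods before and after a run.
    Energy in Joules is power in Watts multiplied by time in seconds.
    """
    if len(times_s) != len(power_w):
        raise ValueError("times_s and power_w must have the same length")
    energy = 0.0
    for i in range(1, len(times_s)):
        t0, t1 = times_s[i - 1], times_s[i]
        if t0 < start_s or t1 > end_s:
            continue  # interval touches idle time outside the benchmark window
        energy += 0.5 * (power_w[i - 1] + power_w[i]) * (t1 - t0)
    return energy


# Hypothetical trace: idle, two seconds of work at 1 W, then idle again.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
power = [0.1, 1.0, 1.0, 1.0, 0.1]
print(task_energy_joules(times, power, start_s=1.0, end_s=3.0))
```

This is also why a chip with higher peak power can still come out ahead: 3 W for 1 second is 3 J, while 1 W for 4 seconds is 4 J.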

Task Energy - SunSpider 0.9.1 - Total Platform

The data here reflects what you see in the chart above fairly well. Acer/Intel manage to get the edge over Dell/Qualcomm when it comes to total energy consumed during the test. The Nexus 10 doesn't do so well here but that's likely a software issue more than anything else.

CPU power is just insane. Peak power consumption is around 3W, compared to around 1W for the competition.

Task Energy - SunSpider 0.9.1 - CPU Only

Looking at the CPU core itself, Qualcomm appears to have the advantage here but keep in mind that we aren't yet tracking L2 cache power on Krait (but we are on Atom). Regardless Atom and Krait are very close.

Even GPU power consumption is pretty high compared to everything else (minus Tegra 3).

Task Energy - SunSpider 0.9.1 - GPU Only

SunSpider - Max, Avg, Min Power

For your reference, the remaining graphs present max, average and min power draw throughout the course of the benchmark (excluding beginning/end idle times).
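A short Python sketch of how max, average and min figures like these could be pulled out of a sampled trace follows. It assumes uniformly spaced samples, so that a plain mean is a fair average, and the window bounds and sample values are hypothetical.

```python
def power_stats(times_s, power_w, start_s, end_s):
    """Return (max, average, min) power for samples inside [start_s, end_s].

    Samples outside the window (beginning/end idle time) are ignored.
    The average is a simple arithmetic mean, which assumes uniform sampling.
    """
    window = [p for t, p in zip(times_s, power_w) if start_s <= t <= end_s]
    if not window:
        raise ValueError("no samples fall inside the requested window")
    return max(window), sum(window) / len(window), min(window)


# Hypothetical trace with idle samples at both ends.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
power = [0.1, 1.0, 3.0, 2.0, 0.1]
peak, average, floor = power_stats(times, power, start_s=1.0, end_s=3.0)
print(peak, average, floor)
```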

Max Power Draw - SunSpider 0.9.1 - Total Platform

Max Power Draw - SunSpider 0.9.1 - GPU Only

Max Power Draw - SunSpider 0.9.1 - CPU Only

Average Power Draw

Average Power Draw - SunSpider 0.9.1 - Total Platform

Average Power Draw - SunSpider 0.9.1 - GPU Only

Average Power Draw - SunSpider 0.9.1 - CPU Only

Minimum Power Draw

Min Power Draw - SunSpider 0.9.1 - Total Platform

Min Power Draw - SunSpider 0.9.1 - GPU Only

Min Power Draw - SunSpider 0.9.1 - CPU Only



Cortex A15: Kraken

While SunSpider wasn't a great performance target for Exynos 5250, Kraken is a different story entirely. The Cortex A15s complete the task significantly quicker than the competition, and as a result achieve competitive energy usage although at significantly higher peak power consumption.

 

 

Task Energy - Kraken - Total Platform

Despite the high peak power consumption of the Nexus 10 and its Cortex A15s, total energy usage is the lowest out of any of the contenders here since the Exynos 5250 is able to complete the benchmark so quickly. Intel is up next, followed by Qualcomm.

Once again we're seeing peak CPU power usage of ~3W, compared to < 1.5W for the competition. The performance advantage is enough to justify the added power. However, in devices that simply can't dissipate this much heat (e.g. smartphones), I wonder what will happen.

Task Energy - Kraken - CPU Only

Isolate the CPU cores themselves and the race is much closer, this time with Qualcomm taking the lead.

Task Energy - Kraken - GPU Only

When mostly idle, the Mali-T604 on Samsung's 32nm LP (HK+MG) process barely sips power.

Kraken - Max, Avg, Min Power

Max Power Draw - Kraken - Total Platform

Max Power Draw - Kraken - GPU Only

Max Power Draw - Kraken - CPU Only

Average Power Draw

Average Power Draw - Kraken - Total Platform

Average Power Draw - Kraken - GPU Only

Average Power Draw - Kraken - CPU Only

Minimum Power Draw

Min Power Draw - Kraken - Total Platform

Min Power Draw - Kraken - GPU Only

Min Power Draw - Kraken - CPU Only



Cortex A15: RIABench

The RIABench story isn't any different from the other tests, although peak power consumption is slightly lower for the Cortex A15 here. The gap between it and Atom/Krait remains quite large. The big leap in performance does come at a real cost in power consumption.

 

Task Energy - RIABench - Total Platform

 

Task Energy - RIABench - CPU Only

Task Energy - RIABench - GPU Only

RIABench - Max, Avg, Min Power

Max Power Draw - RIABench - Total Platform

Max Power Draw - RIABench - GPU Only

Max Power Draw - RIABench - CPU Only

Average Power

Average Power Draw - RIABench - Total Platform

Average Power Draw - RIABench - GPU Only

Average Power Draw - RIABench - CPU Only

Minimum Power

Min Power Draw - RIABench - Total Platform

Min Power Draw - RIABench - GPU Only

Min Power Draw - RIABench - CPU Only



Cortex A15: WebXPRT 2013 - Community Preview 1

Obviously since we don't have an Android version of TouchXPRT we can't run that, but we can use PT's WebXPRT test on all of the platforms here. Exynos 5250 manages the best score we've seen thus far (246), although since we are using different browsers it's entirely possible that its performance is still being held back a bit. The performance advantage over Atom is around 9%, however the power expended to get here is significant.

 

Task Energy - WebXPRT 2013 CP1 - Total Platform

 

Once again, peak CPU power usage is in a completely different league. Here we see spikes nearing 4W for the dual core Cortex A15 SoC, compared to < 1.5W for Intel and Qualcomm.

Task Energy - WebXPRT 2013 CP1 - CPU Only

Task Energy - WebXPRT 2013 CP1 - GPU Only

WebXPRT 2013 CP1 - Max, Avg, Min Power

Max Power Draw - WebXPRT 2013 CP1 - Total Platform

Max Power Draw - WebXPRT 2013 CP1 - GPU Only

Max Power Draw - WebXPRT 2013 CP1 - CPU Only

Average Power Draw

Average Power Draw - WebXPRT 2013 CP1 - Total Platform

Average Power Draw - WebXPRT 2013 CP1 - GPU Only

Average Power Draw - WebXPRT 2013 CP1 - CPU Only

Minimum Power Draw

Min Power Draw - WebXPRT 2013 CP1 - Total Platform

Min Power Draw - WebXPRT 2013 CP1 - GPU Only

Min Power Draw - WebXPRT 2013 CP1 - CPU Only



Cortex A15: GPU Power Consumption - 3D Gaming Workload

ARM's Mali-T604 GPU is pretty quick, but similar to ARM's Cortex A15s it can definitely use a considerable amount of power to deliver that performance. Peak GPU power consumption tops out at just under 4W compared to ~1W for Qualcomm's Adreno 225. Even the Cortex A15s pull a decent amount of power in this test compared to the alternatives. It seems like that 4W max we keep seeing is likely the typical TDP for the Exynos 5250, anywhere from 1x - 4x what we get with Atom Z2760 and APQ8060A.

Task Energy - 3D Game 1 - Total Platform

Task Energy - 3D Game 1 - CPU Only

Task Energy - 3D Game 1 - GPU Only

The Mali-T604's performance advantage here comes at a price: total energy consumed is far higher than any of the competing solutions.

GPU Power Consumption - Max, Avg, Min Power

Max Power Draw - 3D Game 1 - Total Platform

Max Power Draw - 3D Game 1 - GPU Only

Max Power Draw - 3D Game 1 - CPU Only

Average Power Draw

Average Power Draw - 3D Game 1 - Total Platform

Average Power Draw - 3D Game 1 - GPU Only

Average Power Draw - 3D Game 1 - CPU Only

Minimum Power Draw

Min Power Draw - 3D Game 1 - Total Platform

Min Power Draw - 3D Game 1 - GPU Only

Min Power Draw - 3D Game 1 - CPU Only



Determining the TDP of Exynos 5 Dual

Throughout all of our Cortex A15 testing we kept bumping into that 4W ceiling with both the CPU and GPU - but we rarely saw both blocks use that much power at the same time. Intel actually tipped me off to this test to find out what happens if we try and force both the CPU and GPU to run at max performance at the same time. The graph below is divided into five distinct sections, denoted by colored bars above the sections. On this chart I have individual lines for GPU power consumption (green), CPU power consumption (blue) and total platform power consumption, including display, measured at the battery (red).

In the first section (yellow), we begin playing Modern Combat 3 - a GPU intensive first person shooter. GPU power consumption is just shy of 4W, while CPU power consumption remains below 1W. After about a minute of play we switch away from MC3 and you can see both CPU and GPU power consumption drop considerably. In the next section (orange), we fire up a multithreaded instance of CoreMark - a small CPU benchmark - and allow it to loop indefinitely. CPU power draw peaks at just over 4W, while GPU power consumption is understandably very low.

Next, while CoreMark is still running on both cores, we switch back to Modern Combat 3 (pink section of the graph). GPU voltage ramps way up, power consumption is around 4W, but note what happens to CPU power consumption. The CPU cores step down to a much lower voltage/frequency for the background task (~800MHz from 1.7GHz). Total SoC TDP jumps above 4W but the power controller quickly responds by reducing CPU voltage/frequency in order to keep things under control at ~4W. To confirm that CoreMark is still running, we then switch back to the benchmark (blue segment) and you see CPU performance ramps up as GPU performance winds down. Finally we switch back to MC3, combined CPU + GPU power is around 8W for a short period of time before the CPU is throttled.
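The throttling behavior described above can be pictured with a toy model. The Python sketch below is not Samsung's power controller; it is a hypothetical governor that picks the fastest CPU step fitting within whatever power budget the GPU leaves over. The budget, the frequency steps other than 1.7GHz and ~800MHz, and every per-step power estimate are assumptions chosen purely for illustration.

```python
def pick_cpu_step(gpu_power_w, budget_w, cpu_steps):
    """Choose the highest CPU frequency whose estimated power fits the budget.

    cpu_steps is a list of (freq_ghz, est_power_w) tuples. If nothing fits,
    fall back to the lowest step rather than stopping the CPU entirely.
    """
    if not cpu_steps:
        raise ValueError("cpu_steps must not be empty")
    remaining_w = budget_w - gpu_power_w
    ordered = sorted(cpu_steps, reverse=True)  # fastest step first
    for freq_ghz, est_power_w in ordered:
        if est_power_w <= remaining_w:
            return freq_ghz
    return ordered[-1][0]


# Assumed values, not measurements: a 4.5 W budget and three CPU steps.
STEPS = [(1.7, 4.0), (1.2, 1.8), (0.8, 0.6)]
print(pick_cpu_step(gpu_power_w=0.2, budget_w=4.5, cpu_steps=STEPS))  # GPU idle
print(pick_cpu_step(gpu_power_w=3.8, budget_w=4.5, cpu_steps=STEPS))  # GPU busy
```

With the GPU nearly idle the model leaves the CPU at its top step, and with the GPU busy it drops the CPU to the lowest step, which mirrors the shape of the measured trace even though the real controller certainly works differently in detail.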

Now this is a fairly contrived scenario, but it's necessary to understand the behavior of the Exynos 5250. The SoC is allowed to reach 8W, making that its max TDP by conventional definitions, but seems to strive for around 4W as its typical power under load. Why are these two numbers important? With Haswell, Intel has demonstrated interest (and ability) to deliver a part with an 8W TDP. In practice, Intel would need to deliver about half that to really fit into a device like the Nexus 10 but all of a sudden it seems a lot more feasible. Samsung hits 4W by throttling its CPU cores when both the CPU and GPU subsystems are being taxed. I wonder what an 8W Haswell would look like in a similar situation...



Final Words

Whereas I didn't really have anything new to conclude in the original article (Atom Z2760 is faster and more power efficient than Tegra 3), there's a lot to talk about here. We already know that Atom is faster than Krait, but from a power standpoint the two SoCs are extremely competitive. At the platform level Intel (at least in the Acer W510) generally leads in power efficiency. Note that this advantage could just as easily be due to display and other power advantages in the W510 itself and not necessarily indicative of an SoC advantage.

Looking at the CPU cores themselves, Qualcomm takes the lead. It's unclear how things would change if we could include L2 cache power consumption for Qualcomm as we do for Intel (see page 2 for an explanation). I suspect that Qualcomm does maintain the power advantage here though, even with the L2 cache included.

On the GPU side, Intel/Imagination win there although the roles reverse as Adreno 225 holds a performance advantage. For modern UI performance, the PowerVR SGX 545 is good enough but Adreno 225 is clearly the faster 3D GPU. Intel has underspecced its ultra mobile GPUs for a while, so a lot of the power advantage is due to the lower performing GPU. In 2D/modern UI tests however, the performance advantage isn't realized and thus the power advantage is still valid.

Qualcomm is able to generally push to lower idle power levels, indicating that even Intel's 32nm SoC process is getting a little long in the tooth. TSMC's 28nm LP and Samsung's 32nm LP processes both help silicon built in those fabs drive down to insanely low idle power levels. That being said, it is still surprising to me that a 5-year-old Atom architecture paired with a low power version of a 3-year-old process technology can be this competitive. In the next 9 - 12 months we'll finally get an updated, out-of-order Atom core built on a brand new 22nm low power/SoC process from Intel. This is one area where we should see real improvement. Intel's chances to do well in this space are good if it can manage to execute well and get its parts into designs people care about.


Device level power consumption, from our iPhone 5 review, look familiar?

If the previous article was about busting the x86 power myth, one key takeaway here is that Intel's low power SoC designs are headed in the right direction. Atom's power curve looks a lot like Qualcomm's, and I suspect a lot like Apple's. There are performance/power tradeoffs that all three make, but they're all being designed the way they should.

The Cortex A15 data is honestly the most intriguing. I'm not sure how the first A15 based smartphone SoCs will compare to Exynos 5 Dual in terms of power consumption, but at least based on the data here it looks like Cortex A15 is really in a league of its own when it comes to power consumption. Depending on the task that may not be an issue, but you still need a chassis that's capable of dissipating 1 - 4x the power of a present day smartphone SoC made by Qualcomm or Intel. Obviously for tablets the Cortex A15 can work just fine, but I am curious to see what will happen in a smartphone form factor. With lower voltage/clocks and a well architected turbo mode it may be possible to deliver reasonable battery life, but simply tossing the Exynos 5 Dual from the Nexus 10 into a smartphone isn't going to work well. It's very obvious to me why ARM proposed big.LITTLE with Cortex A15 and why Apple designed Swift.

I'd always heard about Haswell as the solution to the ARM problem, particularly in reference to the Cortex A15. The data here, particularly on the previous page, helped me understand exactly what that meant. Under a CPU or GPU heavy workload, the Exynos 5 Dual will draw around 4W. Peak TDP however is closer to 8W. If you remember back to IDF, Intel specifically called out 8W as a potential design target for Haswell. In reality, I expect that we'll see Haswell parts even lower power than that. While it may still be a stretch to bring Haswell down to 4W, it's very clear to me that Intel sees this as a possibility in the near term. Perhaps not at 22nm, but definitely at 14nm. We already know Core can hit below 8W at 22nm. If it can get down to around 4W, then that opens up a whole new class of form factors to a traditionally high-end architecture.

Ultimately I feel like that's how all of this is going to play out. Intel's Core architectures will likely service the 4W and above space, while Atom will take care of everything else below it. The really crazy part is that it's not too absurd to think about being able to get a Core based SoC into a large smartphone as early as 14nm, and definitely by 10nm (~2017) should the need arise. We've often talked about smartphones being used as mainstream computing devices in the future, but this is how we're going to get there. By the time Intel moves to 10nm ultramobile SoCs, you'll be able to get somewhere around Sandy/Ivy Bridge class performance in a phone.

At the end of the day, I'd say that Intel's chances for long term success in the tablet space are pretty good - at least architecturally. Intel still needs a Nexus, iPad or other similarly important design win, but it should have the right technology to get there by 2014. It's up to Paul or his replacement to ensure that everything works on the business side.

As far as smartphones go, the problem is a lot more complicated. Intel needs a good high-end baseband strategy which, as of late, the Infineon acquisition hasn't been able to produce. I've heard promising things in this regard but the baseband side of Intel remains embarrassingly quiet. This is an area where Qualcomm is really the undisputed leader, and Intel has a lot of work ahead of it here. As for the rest of the smartphone SoC, Intel is on the right track. Its existing architecture remains performance and power competitive with the best Qualcomm has to offer today. Both Intel and Qualcomm have architecture updates planned in the not too distant future (with Qualcomm out of the gate first), so this will be one interesting battle to watch. If ARM is the new AMD, then Krait is the new Athlon 64. The difference is, this time, Intel isn't shipping a Pentium 4.
