Kirin 920 SoC & Platform Power Analysis

The centerpiece of the Honor 6 is the new HiSilicon Kirin 920. This is the first non-Samsung big.LITTLE chip to reach the market in consumer devices. The Kirin 920 is the successor to HiSilicon's Kirin 910T, which ships in the Huawei Ascend P7, but don't let the minor change in naming fool you: the 920, or more aptly the Hi3630, as its actual model number describes it, is a major generational upgrade in every measurable aspect.

The Hi3630 is a fully HMP-enabled big.LITTLE design with 4x Cortex A7 and 4x Cortex A15 cores. HiSilicon has remained relatively conservative with the clock speeds, which do not exceed 1.3GHz for the little cluster and 1.7GHz for the big cluster. The big CPUs use the newer r3 revision of the A15 IP, so we should expect better power management and power efficiency than in past A15 implementations.

On the GPU side we find a Mali T628MP4 clocked in at 600MHz. This is nothing to write home about as the T628 was to be found in devices already over a year ago in the form of the Exynos 5420. The MP4 configuration is also a downgrade from Samsung's MP6 implementation, so we should expect lower performance. I feel a bit underwhelmed by HiSilicon's GPU decision here as it seems they target a more mid-range performance segment rather than trying to compete with Samsung and Qualcomm. We'll see later in the benchmark section how this works out for the Honor 6.

HiSilicon "Kirin 920" Hi3630 vs Direct Competitors

| SoC | HiSilicon Hi3630 | Samsung Exynos 5422 | Samsung Exynos 5430 | Qualcomm MSM8974v3 |
| --- | --- | --- | --- | --- |
| CPU | 4x Cortex A7 r0p5 @ 1.3GHz + 4x Cortex A15 r3p3 @ 1.7GHz | 4x Cortex A7 r0p5 @ 1.3GHz + 4x Cortex A15 r2p4 @ 1.9GHz | 4x Cortex A7 r0p5 @ 1.3GHz + 4x Cortex A15 r3p3 @ 1.8GHz | 4x Krait 400 @ 2.3GHz |
| Memory Controller | 2x 32-bit @ 800MHz DDR, 12.8GB/s b/w | 2x 32-bit @ 933MHz DDR, 14.9GB/s b/w | 2x 32-bit @ 1066MHz DDR, 17.0GB/s b/w | 2x 32-bit @ 933MHz DDR, 14.9GB/s b/w |
| GPU | Mali T628MP4 @ 600MHz | Mali T628MP6 @ 533MHz | Mali T628MP6 @ 600MHz | Adreno 330 @ 578MHz |
| Integrated Modem | "Balong" LTE Cat. 6 300Mbps | n/a | n/a | MDM 9x25 LTE Cat. 4 150Mbps |
| Video H/W | H264 1080p Enc- & Decoder | H264 2160p Enc- & Decoder | H264 2160p Enc- & Decoder + H265 4K Decoder | H264 2160p Enc- & Decoder |
| Mfc. Process | TSMC 28nm HPm | Samsung 28nm HKMG | Samsung 20nm HKMG | TSMC 28nm HPm |

The SoC is manufactured on TSMC's 28nm HPm process. Unfortunately I wasn't able to determine the running voltages of the chip, as it seems HiSilicon employs a separate microcontroller and a closed firmware layer for direct DVFS control (DVFS is still arbitrated by the kernel though).

We have a standard 2x32-bit LPDDR3 memory interface running at 800MHz DDR, making some 12.8GB/s of bandwidth available to the SoC. Hardware video encoders and decoders allow for H264 1080p recording and playback. The SoC employs some auxiliary accelerator blocks such as a JPEG hardware unit. We have little information on the ISP that HiSilicon employs, but it should be similar in design to what Samsung employs, meaning a Cortex A5 core with dedicated SIMD accelerators.
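The bandwidth figure follows directly from the interface width and clock. As a minimal sketch of that arithmetic (the function name is made up for illustration, and the result is a theoretical peak in decimal gigabytes, not a measured value):

```python
def peak_bandwidth_gbs(channels: int, bus_width_bits: int, clock_mhz: float,
                       transfers_per_clock: int = 2) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    DDR memory moves data on both clock edges, hence two transfers per clock.
    """
    bytes_per_transfer = channels * bus_width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# 2x 32-bit channels at 800MHz DDR (Hi3630) and at 933MHz DDR (Exynos 5422)
print(f"{peak_bandwidth_gbs(2, 32, 800):.1f} GB/s")
print(f"{peak_bandwidth_gbs(2, 32, 933):.1f} GB/s")
```

The two calls reproduce the 12.8GB/s and 14.9GB/s entries in the comparison table.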

The NAND/MMC interfaces use the same DesignWare IP that we find on Exynos SoCs, deploying 3 controllers each handling the main eMMC NAND, the external SD card via SDIO, and also the Broadcom BCM4334 Wi-Fi chip via SDIO.

Probably the most important aspect of the Kirin 920 SoC is that it has a new LTE modem integrated into the same die. The "Balong" modem is capable of category 6 LTE speeds with carrier aggregation, making it not only one of the first Cat. 6 modems, but the very first one available as integrated silicon from any vendor. Looking back at the rest of the SoC's specifications, this may be one of the reasons why they appear conservative: modems take a long time to validate, and integrating one into an SoC delays the whole chip.

Unfortunately we couldn't review the modem in this Chinese unit as it lacks the RF front-end compatible with western FDD networks. For what it's worth, it runs 2G and EDGE seemingly well...

Power management

While knowing about the silicon employed gives us some notion of its expected performance, nowadays modern power management makes it pretty much unpredictable how efficient an SoC will be. In the future I'll be trying to expose more of how vendors implement their power management schemes and what we should expect of devices in daily use.

In the case of the HiSilicon Hi3630 there's a bit of a double-edged sword story going on.

As a fully HMP-enabled big.LITTLE chip, the OS employs a full Global Task Scheduling (GTS) scheme inside of the Linux Kernel (version 3.10.33) on the device. To be able to understand GTS we need a little explanation around the core mechanism which decides how a task is migrated between the two clusters:

The kernel employs a mechanism to track load continuously for each scheduler entity (a process or a cgroup of processes). This per-entity load-tracking algorithm is at the core of the scheduler mechanic for GTS. A simplified overview defines three main control parameters: the up- and down-thresholds, and the load-average period which acts as a window frame for the decision making. If a task's load exceeds the up-threshold, it is migrated over to the big cluster, and similarly if the task's load falls under the down-threshold it is bounced back onto the little cluster.
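To make the threshold mechanism concrete, here is a minimal sketch of the idea rather than the kernel's actual code. The load is modelled as a geometrically decaying average; the names, the threshold values (0.7 and 0.25) and the 32ms half-life are assumptions chosen for illustration, not values read from the Hi3630.

```python
from dataclasses import dataclass


@dataclass
class TrackedTask:
    load: float = 0.0        # decayed load average: 0.0 = idle, 1.0 = always runnable
    cluster: str = "little"  # tasks start on the little cluster in this sketch


def update_load(load: float, busy: float, half_life_ms: float, dt_ms: float = 1.0) -> float:
    """Blend the latest busy fraction into a geometrically decaying average."""
    decay = 0.5 ** (dt_ms / half_life_ms)
    return load * decay + busy * (1.0 - decay)


def place(task: TrackedTask, up: float, down: float) -> str:
    """Apply the up/down thresholds; loads between them leave the task where it is."""
    if task.cluster == "little" and task.load > up:
        task.cluster = "big"
    elif task.cluster == "big" and task.load < down:
        task.cluster = "little"
    return task.cluster


def run(task: TrackedTask, busy: float, duration_ms: int,
        up: float = 0.7, down: float = 0.25, half_life_ms: float = 32.0) -> str:
    """Advance the task in 1 ms ticks and return the cluster it ends on."""
    for _ in range(duration_ms):
        task.load = update_load(task.load, busy, half_life_ms)
        place(task, up, down)
    return task.cluster


task = TrackedTask()
print(run(task, busy=1.0, duration_ms=100))  # sustained load
print(run(task, busy=0.0, duration_ms=100))  # task goes idle again
```

With these assumed values the task ends on the big cluster after the busy phase and returns to the little cluster after the idle phase. The gap between the two thresholds is what keeps a task with a middling load from bouncing back and forth, and a shorter half-life makes migrations react faster in both directions.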

In Huawei's case we see the HMP up- and down-thresholds used as variable control parameters, as the preferred method of controlling the chip's performance and power, as opposed to the usual clock-frequency limits. Keep this in mind for the battery life benchmarks as this will impact them in substantial ways.

The chip of course comes with advanced clock- and power-gating mechanisms for the CPU cores. We have the usual ARM architectural core clock-gating state WFI (Wait-for-interrupt) on a per-CPU basis, as on all modern ARM chips. As a secondary-level CPUIdle state HiSilicon power-gates each individual core for prolonged idle periods (C1), and finally if all CPUs inside a cluster are sitting in extended idle periods the whole cluster is shut down (C2). Keep in mind that we are talking about entry latencies of 500µs for C1 and 5000µs for C2, which represents a very fine-grained power-gating scheme compared to SoCs of the past. The little cluster may not enter the C2 state while the screen is enabled.
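The three-level scheme can be sketched as a small decision function. This is a simplification under stated assumptions: it treats a state as worthwhile once the predicted idle time reaches the state's entry latency, whereas real CPUIdle governors also weigh target residency and exit latency, which are not given here. The function name and its inputs are hypothetical.

```python
from typing import List

C1_LATENCY_US = 500    # per-core power gating, figure from the article
C2_LATENCY_US = 5000   # whole-cluster shutdown, figure from the article


def pick_idle_states(predicted_idle_us: List[int], is_little: bool,
                     screen_on: bool) -> List[str]:
    """Return one idle state per core of a cluster.

    C2 is a cluster-wide state, so it is only chosen when every core expects
    a long idle period. The little cluster is kept out of C2 while the
    screen is on.
    """
    cluster_idle = all(t >= C2_LATENCY_US for t in predicted_idle_us)
    if cluster_idle and not (is_little and screen_on):
        return ["C2"] * len(predicted_idle_us)
    return ["C1" if t >= C1_LATENCY_US else "WFI" for t in predicted_idle_us]


# One busy-ish core keeps the big cluster powered; the others gate individually.
print(pick_idle_states([100, 800, 9000, 9000], is_little=False, screen_on=True))
# All big cores idle for long enough: the whole cluster shuts down.
print(pick_idle_states([9000] * 4, is_little=False, screen_on=True))
# Same idle times on the little cluster with the screen on: cores stop at C1.
print(pick_idle_states([9000] * 4, is_little=True, screen_on=True))
```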

Because the power-gating is done via CPUIdle and not via classical hotplugging, the CPUs always appear online to the system, so don't be alarmed if that seems unusual. This avoids the overhead found in Qualcomm SoCs and past A9-based SoCs, as hotplugging is a very expensive operation that requires a CPU to be taken out of coherency and mandates a full stop of the system for a certain amount of time. The vastly decreased latency also enables much finer-grained idling. A side-effect is that to classical monitoring tools the A15 cores might appear stuck at some higher frequency in the CPUFreq statistics, while in reality the whole cluster is simply power-gated. This mode of operation is valid for all present and future big.LITTLE SoCs.

An interesting fact that I noticed while analysing the Hi3630's software stack is that it employs different CPUIdle drivers for the two clusters, with differing idle-state parameters. This is in contrast to what I've seen Samsung do, so in that regard HiSilicon employs a better software implementation.

The little cluster scales in frequency from 400MHz up to 1300MHz in 200MHz steps and is controlled by an Interactive-based governor. Google has standardized the "boostpulse" QoS mechanic in its Interactive governor and the Hi3630 takes full advantage of it, boosting up to 1200MHz when triggered by user-space events. We notice this when switching between applications in Android. In addition, the HMP thresholds are lowered for the duration of the boostpulse, making it easier for processes to migrate over to the big cluster. DVFS switches happen on a coarser 80ms interval.

On the big cluster, the chip scales from 800MHz to 1700MHz, also in rough 200MHz steps. We have a more standard Ondemand governor with very conservative parameters so as to avoid unnecessary switches to high frequencies. We see an extremely small sampling interval of 10ms on the big cluster; this is the fastest default setting I've seen on any ARM-based SoC to date.

On the GPU side, the Mali T628MP4 scales from an idle 120MHz to 600MHz in 6 steps, employing an Ondemand algorithm on a 20ms sample interval. Again, due to the SoCs having the same GPU IP I can't stop myself from comparing it to Samsung's implementation of the GPU DVFS drivers: this is a much more aggressive algorithm than what we see in Exynos SoCs. While the latter can only reach the higher frequencies in sequential order from frequency to frequency, the HiSilicon chip can jump directly from its minimum state to the full 600MHz with a much quicker response time. I'm still not sure how wise this is, as it appears to be a tad too aggressive and may impact power efficiency. Usually ARM licensees are responsible for implementing GPU power gating at the SoC level, so while I don't have any direct evidence of this without the driver sources, I'll assume this is the case for the Hi3630.
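The difference between the two ramp-up policies can be illustrated with a toy model. Only the six levels, the 120MHz and 600MHz endpoints and the 20ms sample interval come from the text above; the utilization thresholds are invented, and the same 20ms interval is assumed for both policies purely to isolate the effect of the stepping rule.

```python
from typing import Callable, Optional

NUM_LEVELS = 6   # level 0 = 120MHz, level 5 = 600MHz; intermediate steps unknown
SAMPLE_MS = 20   # sample interval quoted for the Hi3630 GPU governor


def step_sequential(level: int, utilization: float,
                    up: float = 0.9, down: float = 0.5) -> int:
    """Move at most one level per sample, as described for the Exynos driver."""
    if utilization > up:
        return min(level + 1, NUM_LEVELS - 1)
    if utilization < down:
        return max(level - 1, 0)
    return level


def step_direct(level: int, utilization: float,
                up: float = 0.9, down: float = 0.5) -> int:
    """Jump straight to the top level under load, as described for the Hi3630."""
    if utilization > up:
        return NUM_LEVELS - 1
    if utilization < down:
        return max(level - 1, 0)
    return level


def ms_to_reach_max(step: Callable[[int, float], int],
                    utilization: float = 1.0, max_samples: int = 1000) -> Optional[int]:
    """Time to go from the lowest to the highest level, or None if never reached."""
    level = 0
    for sample in range(1, max_samples + 1):
        level = step(level, utilization)
        if level == NUM_LEVELS - 1:
            return sample * SAMPLE_MS
    return None


print(ms_to_reach_max(step_sequential))  # five one-level steps
print(ms_to_reach_max(step_direct))      # a single jump
```

Under these assumptions the sequential policy needs five sample periods (100ms) to reach the top level while the direct policy needs one (20ms), which is the responsiveness gap described above. The flip side is that a brief spike in load is enough to send the GPU to its most power-hungry state.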

The memory controller's driver seems more or less identical to what Samsung deploys, scaling from 120MHz to 800MHz using the same governor algorithm as the GPU, but also employing a QoS scheme for use-cases that demand a minimum level of bandwidth.

Platform Power

Once in a while, we get lucky and a device comes with a coulomb-counting fuel-gauge that allows us to do precise power measurements without much hassle or external equipment. To my delight, the Honor 6 is one of these and I promptly went on to do some power analysis of the phone.
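For readers who want to try something similar, here is a rough sketch of how fuel-gauge readings turn into a power figure. It assumes the gauge is exposed through the Linux power_supply class, whose `current_now` and `voltage_now` attributes report microamps and microvolts. The directory name varies between devices and is a guess here, and this is not necessarily the method used for the measurements in this article.

```python
from pathlib import Path
from typing import Iterable, Tuple


def read_sample(supply_dir: str = "/sys/class/power_supply/battery") -> Tuple[int, int]:
    """Read one (current in µA, voltage in µV) pair; the directory name is an assumption."""
    base = Path(supply_dir)
    current_ua = int((base / "current_now").read_text())
    voltage_uv = int((base / "voltage_now").read_text())
    return current_ua, voltage_uv


def average_power_mw(samples: Iterable[Tuple[int, int]]) -> float:
    """Average P = I * V over the samples, in milliwatts.

    The sign convention of current_now differs between drivers, so the
    magnitude is used.
    """
    powers = [abs(i_ua) * v_uv / 1e9 for i_ua, v_uv in samples]  # µA * µV = pW; /1e9 -> mW
    if not powers:
        raise ValueError("need at least one sample")
    return sum(powers) / len(powers)


# Synthetic samples for illustration only: 250 mA and 260 mA at 4.0 V.
print(average_power_mw([(250_000, 4_000_000), (-260_000, 4_000_000)]))
```

On a real device one would call `read_sample()` in a loop at a fixed interval while the workload runs and feed the collected pairs to `average_power_mw`.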

Huawei Honor 6 Platform Power

First we see that the device's idle power at our standardised 200cd/m² measuring brightness comes in at 965mW; for comparison, Anand did a similar measurement for the Galaxy S5, which came in at 854mW with its AMOLED screen. Further measurements at minimum brightness (684mW) and maximum brightness (1466mW) give us an estimated range for how efficient the JDI-manufactured panel is.

Continuing on, I tested out the camera's power usage as that is one of the most power intensive tasks for a smartphone besides playing games. At 2.5W for the preview screen and 3W for 1080p video recording we still see very reasonable values competitive with what Qualcomm and Samsung provide.

Similarly a run of Sunspider averages out at around 3W. Interesting to see here was the discrepancy between Chrome and the stock provided browser. In all test cases I was able to achieve a lower power usage on the stock browser than on Chrome. This may very well have to do with optimized CPU & GPU libraries that OEMs ship with the phone versus the more generic ones that Google bundles with Chrome.

GFXBench is where things start to get ugly: a T-Rex onscreen run averages out at 4.6W of power consumption, which is beyond what we find in any competing smartphone. This really piqued my interest and I tried to isolate where the power was coming from. I forcefully turned off the A15 cluster and was able to shave almost a full 1W off the power consumption while losing only 8% of performance in the benchmark. What's left is some minor power consumption on the A7 cluster and a large chunk going to GPU and memory. When normalizing for power and performance, the Mali T628MP4 in the Kirin 920 manages only around half the perf/W of the Adreno 330 found in the Snapdragon 801, which is a very poor showing.
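A quick back-of-the-envelope calculation shows what disabling the A15 cluster does to efficiency in this test. The sketch below takes the saving as a full 1W, so it is an upper bound given that the measured saving was "almost" 1W; the function name is made up for illustration.

```python
def perf_per_watt_gain(power_before_w: float, power_saved_w: float,
                       perf_loss_fraction: float) -> float:
    """Ratio of perf/W after a change to perf/W before it."""
    relative_power = (power_before_w - power_saved_w) / power_before_w
    relative_perf = 1.0 - perf_loss_fraction
    return relative_perf / relative_power


# T-Rex onscreen: 4.6W baseline, about 1W saved, 8% performance lost
gain = perf_per_watt_gain(4.6, 1.0, 0.08)
print(f"{(gain - 1) * 100:.0f}% better perf/W")  # prints "18% better perf/W"
```

In other words, giving up 8% of the frame rate buys roughly 18% more performance per watt for the device as a whole in this workload.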

ARM has promised a 400% energy efficiency improvement over the T604 in the T760 and we can see why that's desperately needed: the current generation of Midgard GPUs can't compete in either performance or power efficiency. For avid gamers, it's certainly better to look at a Qualcomm device for lack of other options in current Android devices.

While the T-Rex numbers were bad, the CPU full load ones are a disaster. Turning on a 4-thread stress test which fully loads the A15 cluster makes the device consume a whopping 7.5W. While we're going crazy, we might as well also try to see peak device power consumption: running both the stress test and T-Rex in tandem results in an average power consumption of 8.5W. Here we finally see thermal throttling putting a limit on device power, as the SoC limits itself after a few seconds. Peak power comes in at 11.5W in the intervals where the thermal mechanism clears the limits, only to re-enable them seconds later.

For academic purposes, I again disabled the A15 cluster to try to isolate power consumption on the A7 cores. The frugal Cortex A7 barely manages to exceed 1W for the cluster and memory combined.

It is clear that HiSilicon employs no power budgeting algorithms at all, as the Kirin 920 leaves any kind of limiting solely to the thermal throttling driver. The problem with this approach is that you are trusting your application not to behave like a power virus. We've seen how disabling the big cluster in the T-Rex test-case can massively improve power consumption while having only a small impact on performance. We have also seen that it is possible to deploy a smart power allocation mechanism, such as the one found in Samsung's GTS-enabled Exynos SoCs, and remain within a TDP typical of a smartphone form factor. This is an enormous oversight in what otherwise seemed like an excellent software stack for the Kirin 920 - I hope HiSilicon will resolve this issue in the future, as it's solely a software problem that's easily fixable.

59 Comments

  • imaheadcase - Monday, September 15, 2014

    By the time you need to replace the battery you will be getting a new phone anyways..so its a moot point.
  • Alexey291 - Monday, September 15, 2014

    6 months in case of one of my phones? Damn thing expanded and basically lost about 50% of its capacity (I'm being generous here). The amount of effort it took to get it through warranty process (leaving me without a phone in the meantime)... Because you know "its still working isn't it?"

    Never again tyvm.
  • Stuka87 - Monday, September 15, 2014

    "Takes up literally no space"

    Seriously? Do you understand what the meaning of "literally" is?
  • Alexey291 - Monday, September 15, 2014

    I would have long since edited it to "literally no -extra- space" (because you know that would have worked as an exaggeration and that is pretty much what I wanted to say) but alas the comment system here is poop :)

    But you did have a point to make didn't you? Oh no you're just being an idiot. Fair enough.
  • semo - Sunday, September 14, 2014

    So just the planned obsolescence then. Why isn't this considered outrageous? Maybe because marketing has convinced users that points 1 and 2 are actual problems (as Alexey291 has pointed out, that's not the case). Maybe you can't really make a oh la la looking phone with a removable battery like the HTC One but we don't all want or like such devices.

    Why can the auto industry cater to such a large number of wants/needs but the phone industry can't? They only make the same looking huge phones with sealed batteries, no Qi, no expandable storage, single SIM only, etc... It feels like there is no choice unless you want something practical and pocket friendly (a proper HTC Sensation successor would be nice)
  • Alexey291 - Monday, September 15, 2014

    Hear hear!
  • Ethos Evoss - Sunday, September 14, 2014

    jesus chris people GET OVER with replacing battery stupidness ! seriously .. you looking only what that phone doesn't what it doesn't have .. it has powerfull 3000 batt jesus christ people grow up
  • semo - Sunday, September 14, 2014

    Why is that such a big problem for you? There's plenty of phones for you to choose from if you must have a sealed battery. Why can't the rest of us have a choice?
  • Alexey291 - Monday, September 15, 2014

    that's until that cheap but (supposedly) powerful 3kmah battery swells and damages the phone's internal structure. Loses 50% of its original capacity. All in under 6 months.

    And before you say "that never happens" it happens very damn often especially in Huawei and Xiaomi phones >.>
  • semo - Monday, September 15, 2014

    And don't expect the likes of Zerolemon and Anker to offer a better/bigger battery as they generally don't support non user replaceable batteries (most users won't bother unless they can just pop the battery in).
