Today in Beijing, HiSilicon held an extensive media briefing on its brand new flagship SoC - the Kirin 950. The new chipset is a very large leap for the Huawei subsidiary as it represents the first true flagship successor to the Kirin 920 that we saw a little over a year ago in the Honor 6 review.

In the past, our analysis of the Kirin chipsets' overall performance in phones such as the Mate 7 or the new P8 was not overly positive. This was mostly due to the lacklustre showing of the GPU and the generally low efficiency of the SoC. The new chipset, the Kirin 950, promises to bring a lot of improvements and also represents a lot of firsts for the company.

HiSilicon High-End Kirin Lineup

| SoC | Kirin 920/925 | Kirin 930/935 | Kirin 950 |
|---|---|---|---|
| CPU | 4x Cortex A15 (1.7/1.8 GHz) + 4x Cortex A7 (1.3 GHz) | 4x Cortex A53 (2.0/2.2 GHz) + 4x Cortex A53 (1.5 GHz) | 4x Cortex A72 (2.3 GHz) + 4x Cortex A53 (1.8 GHz) |
| Memory Controller | 2x 32-bit LPDDR3 @ 800MHz, 12.8GB/s b/w | 2x 32-bit LPDDR3 @ 800MHz, 12.8GB/s b/w | 2x 32-bit LPDDR3 or LPDDR4 (hybrid controller) |
| GPU | Mali T628MP4, 600MHz | Mali T628MP4, 680MHz | Mali T880MP4, 900MHz |
| Encode/Decode | 1080p H.264 Decode & Encode | 1080p H.264 Decode & Encode | 1080p H.264 Decode & Encode; 2160p30 HEVC Decode |
| Camera/ISP | Bad One (Needs Improvement) | Bad One (Needs Improvement) | Dual 14bit ISP, 940MP/s |
| Integrated Modem | Balong Integrated UE Cat. 6 LTE | Balong Integrated UE Cat. 6 LTE | Balong Integrated UE Cat. 6 LTE |

On the CPU side, HiSilicon continues to use a traditional big.LITTLE setup with four Cortex A72 and four A53 cores. This effectively makes HiSilicon the second vendor after MediaTek to ship ARM's new CPU core and the first one to do so in a mobile device. Thanks to the new process node, the big cores can be pushed up to 2.3GHz while the little cores clock in at up to 1.8GHz. HiSilicon estimates that the A72 is overall 11% faster clock-for-clock than current A57 designs, so together with the slightly higher clocks we should see the Kirin 950 perform about 20% better than the Exynos 7420 in CPU-bound tasks.
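The roughly 20% figure can be sanity-checked with a quick back-of-the-envelope calculation. The sketch below assumes the Exynos 7420's A57 cores peak at 2.1GHz (a figure not given above) and that performance scales linearly with clock speed, which is optimistic for real workloads.

```python
# Back-of-the-envelope estimate only; not a benchmark result.
A72_IPC_GAIN = 1.11         # HiSilicon's claimed clock-for-clock advantage over the A57
KIRIN_950_BIG_GHZ = 2.3     # big-cluster clock quoted in the article
EXYNOS_7420_BIG_GHZ = 2.1   # assumption: peak A57 clock of the Exynos 7420


def relative_cpu_performance(ipc_gain, clock_ghz, baseline_clock_ghz):
    """Relative single-thread performance, assuming linear scaling with clock."""
    return ipc_gain * clock_ghz / baseline_clock_ghz


speedup = relative_cpu_performance(
    A72_IPC_GAIN, KIRIN_950_BIG_GHZ, EXYNOS_7420_BIG_GHZ
)
print(f"Estimated advantage over the baseline: {(speedup - 1) * 100:.1f}%")
```

Under these assumptions the result lands slightly above 20%, which is consistent with the estimate in the text.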

On the GPU side, the Kirin 950 gets an upgrade to the new Mali T880 GPU, again making this the first silicon implementation of the new ARM IP. The designers chose to remain with an MP4 configuration, but with a vastly increased clock rate of up to 900MHz. I asked the company's senior planning manager why they chose to go with a lower core count and high frequency over a potentially more power efficient wider implementation at lower clock speeds, and the answer seems to be strikingly similar to MediaTek's explanation to the same question:

For the vast majority of use-cases and users the GPU isn't a bottleneck and the full potential of more powerful GPUs is never utilized, so the vendors prefer to save die space and thus cost by using fewer GPU cores. As such, while the upgrade represents a significant improvement over previous Kirin chipsets, one shouldn't expect it to compete with more powerful alternatives such as those from Qualcomm or Samsung.

Due to the new SoC having both A72 CPU cores and the new T880 Mali GPU I was expecting it to also ship with ARM's new CCI-500 - after all, all three IPs were launched simultaneously back in February. Unfortunately HiSilicon explained that the new interconnect wasn't ready during the Kirin 950's design phase and thus the designers had to rely on the older CCI-400 that has been used in all big.LITTLE SoCs released to date.

HiSilicon claims to have made some significant leaps in terms of ISP and DSP performance. In fact, the Kirin 950 is the first SoC that uses their newly developed PrimISP and IVP32 DSP. The new IP arose in part out of the 2013 recruitment of former Texas Instruments OMAP SoC engineers in Nice, France. The subsequent founding of Huawei's Image Technology R&D Center gave the company the design resources to build products tailored to its needs. As such, Huawei is finally able to sport a cutting-edge and competitive ISP/DSP integrated in the SoC, which in the future will allow it to avoid dedicated external ISPs such as those employed in this year's Huawei devices such as the Honor 6+, P8 or Mate S.

On the multimedia side, the company did not talk much about its encode and decode capabilities, limiting it to a single line in a single slide of their presentation, meaning we had to probe for further details. While the new SoC can now decode HEVC at up to 4K resolution at 30fps, the encode capabilities have not evolved from the previous generation and are thus limited to 1080p30 H.264. This also means that while the ISP would be able to handle 4K video, the encoder would make it impossible to implement 4K recording on a device without a reduction in frame rate.
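To put the frame rate limitation into numbers, here is a rough sketch. It assumes that the encoder's throughput is bounded purely by pixels per second at its 1080p30 rating and that "4K" means 3840x2160; neither assumption was confirmed by HiSilicon, and real encoders rarely scale this cleanly.

```python
# Rough pixel-throughput estimate; the scaling model is an assumption.
def pixels_per_second(width, height, fps):
    """Raw pixel rate of a video stream."""
    return width * height * fps


def max_fps_at_resolution(pixel_budget, width, height):
    """Highest frame rate that fits within a fixed pixel-per-second budget."""
    return pixel_budget / (width * height)


# Encoder budget taken from the stated 1080p30 H.264 limit.
encoder_budget = pixels_per_second(1920, 1080, 30)

# Assumption: "4K" recording means 3840x2160 (UHD).
uhd_fps = max_fps_at_resolution(encoder_budget, 3840, 2160)
print(f"Estimated 4K encode rate: {uhd_fps:.1f} fps")
```

Since a UHD frame has four times the pixels of a 1080p frame, the same pixel budget yields a quarter of the frame rate, well short of what would be usable for video recording.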

In terms of the memory subsystem, the SoC is the first in the mobile industry to employ a hybrid LPDDR3 / LPDDR4 memory controller. It was only last week that ARM announced a similar IP in the form of the DMC-500, but HiSilicon explained that the block employed in the Kirin 950 is of their own design and they were very proud to be able to offer this capability ahead of the competition.
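For reference, the 12.8GB/s bandwidth listed in the table for the previous Kirin chipsets follows directly from the 2x 32-bit LPDDR3 configuration at 800MHz. The sketch below shows the arithmetic; the function name is purely illustrative. HiSilicon did not state the LPDDR4 clock for the Kirin 950, so no LPDDR4 figure is computed here.

```python
# Peak theoretical bandwidth of a DDR-type memory interface.
def peak_bandwidth_gb_s(channels, bus_width_bits, clock_mhz):
    """Peak bandwidth in GB/s (decimal), with two transfers per clock (DDR)."""
    transfers_per_second = clock_mhz * 1e6 * 2
    bytes_per_transfer = channels * bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


# 2x 32-bit LPDDR3 @ 800MHz, as listed for the Kirin 920 and 930 families.
lpddr3_bandwidth = peak_bandwidth_gb_s(channels=2, bus_width_bits=32, clock_mhz=800)
print(f"LPDDR3 peak bandwidth: {lpddr3_bandwidth:.1f} GB/s")
```

The same function can be used for an LPDDR4 configuration once the memory clock of a shipping device is known.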

In terms of connectivity, the SoC still employs an integrated Balong modem capable of 5-mode UE Category 6 LTE speeds of up to 300 Mbps. There wasn't much talk about the modem so likely the improvements were kept relatively minor.

Lastly, we saw an upgrade to the SoC's sensor-hub. The new i5 succeeds the i3 used in the Kirin 92X and 930s. The new co-processor is powered by an ARM Cortex M7 core which is able to provide much more processing performance at much lower power, including an FP unit and other controls.

TSMC 16FF+ Manufactured

As mentioned earlier, the Kirin 950 is HiSilicon's first TSMC 16FF+ manufactured mobile SoC. This also makes the Chinese vendor second in line after Apple to release mobile silicon based on the new manufacturing node.

HiSilicon explains that, along with Apple, it has been one of the two main lead partners of the Taiwanese semiconductor giant, and that the parties have been working closely together to improve the design and to tune the process. In fact, the company revealed that first production (commonly referred to as risk production) started as early as last January. Over the following months the companies cooperated to sort out bugs and imperfections in the design (chip revisions), going from 20% yield in the earliest runs to up to 80% yield and qualified mass production this last August.
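To illustrate what the yield improvement means economically, here is a simplified sketch. It assumes a fixed wafer cost and a fixed number of die candidates per wafer, so that the cost of each good die is simply inversely proportional to yield; actual foundry pricing is more complicated and was not discussed at the briefing.

```python
# Simplified yield model: cost per good die is inversely proportional to yield.
def relative_cost_per_good_die(yield_fraction):
    """Cost per working die relative to a hypothetical 100% yield."""
    if not 0 < yield_fraction <= 1:
        raise ValueError("yield_fraction must be in (0, 1]")
    return 1.0 / yield_fraction


early_cost = relative_cost_per_good_die(0.20)   # earliest runs, per HiSilicon
mature_cost = relative_cost_per_good_die(0.80)  # qualified mass production
improvement = early_cost / mature_cost
print(f"Cost per good die dropped by a factor of {improvement:.1f}")
```

Under this model, going from 20% to 80% yield cuts the cost per good die by a factor of four.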

HiSilicon stressed one aspect of the new process in particular: reduced power density. Over the last few years we've seen semiconductor manufacturers able to continuously shrink their process geometry, but power consumption reductions were not able to scale at the same pace. The result is that with each new generation, even if total TDP might have been kept the same, the actual power density on the silicon has gone up significantly. This causes problems with thermal dissipation and can lead to hot-spots on the silicon die which then exceed the safe operating temperatures of the chipset. Until now vendors would circumvent this problem by creating "dark silicon", meaning silicon area on the die which contains no logic and only serves as "padding" to help thermal dissipation.
 
HiSilicon explains that TSMC's 20SoC node in particular had seen the negative effects of this development as the power density went up 42.5% over the previous 28HPM node. The introduction of FinFETs in the new 16FF+ is able to drastically improve this metric by more than halving the power density compared to the 20SoC process. The company was especially enthusiastic when talking about this improvement and emphasized the severity of the issues that rising power density has caused in the past. One visible impact of this seems to be the CPU's block sizes, as we're seeing a much smaller 'big cluster' than we would normally expect.
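Chaining the two claims gives a rough idea of where 16FF+ sits relative to 28HPM. The sketch below normalizes 28HPM to 1.0 and treats "more than halving" as an upper bound of exactly one half, so the result is a ceiling rather than a measured value.

```python
# Relative power density, normalized to TSMC 28HPM = 1.0.
DENSITY_28HPM = 1.0
INCREASE_20SOC_OVER_28HPM = 0.425   # +42.5%, per HiSilicon
MAX_RATIO_16FFP_VS_20SOC = 0.5      # "more than halving" read as an upper bound

density_20soc = DENSITY_28HPM * (1 + INCREASE_20SOC_OVER_28HPM)
density_16ffp_upper_bound = density_20soc * MAX_RATIO_16FFP_VS_20SOC

print(f"20SoC relative to 28HPM: {density_20soc:.3f}")
print(f"16FF+ relative to 28HPM: at most {density_16ffp_upper_bound:.4f}")
```

In other words, if both of HiSilicon's figures hold, 16FF+ ends up with a power density at least about 29% below that of 28HPM.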

The company was able to show off the implementation area of both an A57 and an A72 on 16FF+. The more interesting numbers, which we can use in an apples-to-apples comparison to put dark silicon into perspective, are the A57 figures. Here HiSilicon claims that an A57 implementation takes up around 9.7mm² - much less than the 13.2 and 15.3mm² we've seen from 20SoC vendors such as Nvidia on the X1 or Qualcomm on the S810 - the point here being that the implementation is much smaller than what the ~8-10% process shrink allows.

Of course, the Kirin 950 uses the more efficient A72 cores which are able to architecturally reduce the block's footprint. As such, the Kirin's big CPU cluster comes in at only 8.0mm², beating the Exynos 7420's 8.88mm² even though Samsung has an advantage on process density.
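The area figures quoted above can be compared directly. This sketch computes the relative reduction for each pairing; the attribution of 13.2mm² to the X1 and 15.3mm² to the S810 follows the order in which they are listed and should be read as an assumption.

```python
# Big-cluster implementation areas in mm², as quoted in the article.
A57_16FFP = 9.7           # HiSilicon's A57 figure on 16FF+
A72_KIRIN_950 = 8.0       # Kirin 950 big cluster
A57_20SOC = {
    "Nvidia X1": 13.2,        # assumption: first figure belongs to the X1
    "Qualcomm S810": 15.3,    # assumption: second figure belongs to the S810
}
A57_EXYNOS_7420 = 8.88


def area_reduction(new_mm2, old_mm2):
    """Fractional area saved when going from old_mm2 to new_mm2."""
    return 1 - new_mm2 / old_mm2


for name, area in A57_20SOC.items():
    saved = area_reduction(A57_16FFP, area)
    print(f"A57 on 16FF+ vs {name}: {saved * 100:.1f}% smaller")

saved_vs_exynos = area_reduction(A72_KIRIN_950, A57_EXYNOS_7420)
print(f"Kirin 950 A72 cluster vs Exynos 7420: {saved_vs_exynos * 100:.1f}% smaller")
```

The 20SoC comparisons come out well above the ~8-10% attributable to the process shrink alone, which is the gap the dark silicon argument is meant to explain.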

In terms of power consumption, they also had a slide up demonstrating the power curves of the cores:

According to the graphic, the A53s max out at around 200mW while the A72s go up to ~1450mW. On the big cores this is somewhat comparable to the Exynos 7420 even though the figures we publish use a different measurement methodology and test-case. HiSilicon uses Dhrystone which I found to use about 10% more power than the power virus which I use for our deep dive power measurement articles.

The company had live development demo boards running at the event. While we unfortunately didn't get more time with them, their presence does demonstrate the high maturity of the SoC. In fact, we should be seeing products in consumers' hands before the end of the year.

All in all, I think HiSilicon was able to showcase some very significant improvements to the Kirin lineup. I see the Kirin 950 as the true successor to the 920, and it seems that we'll finally be able to move forward in terms of performance and efficiency. While the Kirin still has some rough edges, especially in its media capabilities, it promises to patch up a lot of the issues that seem to have plagued Huawei's latest flagships. I'm looking forward to testing upcoming devices with the SoC to see where it places in the competitive landscape.


  • zeeBomb - Friday, November 06, 2015

    One more thing...the Kirin 920-935 having bad ISP is rather true tbh. A camera like the Honor 7 and Mate 7/S that has good specs on paper like a IMX230, and then fails to get colours accuracy and detail is a joke. Almost to a point like its a slap to the face!
  • mercucu1111 - Tuesday, November 17, 2015

    http://images.anandtech.com/doci/9330/a53-average-...

    What?
  • SunnyNW - Thursday, November 05, 2015

    Has there actually been any confirmation that Apple's A9 was indeed manufactured on TSMC's 16FF+ node and not just regular 16FF?
  • Andrei Frumusanu - Thursday, November 05, 2015

    Yes. Regular 16FF never went into any kind of mass production.
  • webdoctors - Thursday, November 05, 2015

    Slide 3: First Commercial A72...

    What, A72 is only 15% faster than A57? Which was only ~20% faster than A15. The last 4 years has only given us a 35% boost in ARM CPUs? We need some real competition in this space...
  • witeken - Thursday, November 05, 2015

    Broxton won't arrive until the second half of 2016, which is a delay of more than a year (Intel originally said it would be released mid-2015).
  • syxbit - Thursday, November 05, 2015

    True, but I think that's assuming the same process.
    In 4 years we've also gone from 45nm to 14nm.
  • extide - Thursday, November 05, 2015

    Seems like Apple is the only one to really push hard on this arch. I guess we will have to see what the Qualcomm Kryo puts out, but I don't know it really is a wildcard at this point, could be amazing, could be terrible.
  • ToTTenTranz - Thursday, November 05, 2015

    35% boost if all the CPU cores were being made in the same process, which they weren't.

    Besides, the A72 consumes less power while being faster.
    The A57 is a power hog (leading many IHVs to use only Cortex A53 for their "high-end" ARM64 chips), so it's natural that ARM would focus on power efficiency over raw performance.

    Besides, it's not like smartphones and tablets are in dire need of higher single-threaded performance. A Cortex A15 at 1.6-1.8GHz already blasts through most "real-life" Javascript scenarios.
  • melgross - Thursday, November 05, 2015

    No, single core performance is very important. For most uses is more important than a greater number of cores.
