Final Words

HiSilicon’s Kirin 950 delivered impressive performance and efficiency, raising our expectations for its successor. And on paper at least, the Kirin 960 seems better in every way. It incorporates ARM’s latest IP, including A73 CPUs, the new Mali-G71 GPU with more cores, and a CCI-550 interconnect. It offers other improvements too, such as UFS 2.1 support and a new modem that supports higher LTE speeds. But when it comes to performance and efficiency, the Kirin 960 improves in some areas and regresses in others.

The Kirin 960’s A73 CPU cores are marginally faster than the 950’s A72 cores when handling integer workloads, with a more noticeable lead over Qualcomm’s Kryo and the older A57. When looking at floating-point IPC, the opposite is true, with Qualcomm’s Kryo and Kirin 950’s A72 cores posting better results than the 960’s A73.

Some of this performance regression may be explained by the Kirin 960’s memory performance. Both latency and read bandwidth improve for its larger 64KB L1 cache, but write bandwidth is lower than the Kirin 950’s. The 960’s L2 cache bandwidth is also lower for both reads and writes. Its latency to main memory improves by 25%, however, and bandwidth improves by an impressive 69%.

What’s really disappointing (and puzzling) about Kirin 960, though, is that its CPU efficiency is actually worse than the 950’s. ARM did a lot of work to reduce the A73’s power consumption relative to the A72, but the Kirin 960’s A73 cores see a substantial power increase over the 950’s A72 cores. The poor efficiency numbers are likely a combination of HiSilicon’s specific implementation and the switch to the 16FFC process. This was definitely an unexpected result considering the Mate 9’s excellent battery life. Fortunately, Huawei was able to save power elsewhere, such as in the display, to make up for the SoC’s power increase, but it’s difficult not to think about how much better the battery life could have been.

Power consumption for the Kirin 960’s GPU is even worse, with peak power numbers that are entirely inappropriate for a smartphone. Part of the problem is poor efficiency, again likely a combination of implementation and process, which is made worse by an overly aggressive 1037MHz peak operating point that serves only to improve the spec sheet and benchmark results.
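
One way to see why an aggressive peak operating point hurts efficiency is the usual first-order approximation that CMOS dynamic power scales with switched capacitance, voltage squared, and frequency. The sketch below applies that approximation. The function names and any frequency or voltage ratios passed to them are illustrative assumptions, not measurements of the Kirin 960, and static leakage is ignored.

```python
def relative_dynamic_power(freq_ratio: float, voltage_ratio: float) -> float:
    """First-order CMOS dynamic power scaling, P ~ C * V^2 * f.

    Both arguments are ratios relative to a baseline operating point
    (hypothetical values; capacitance is assumed unchanged).
    """
    return voltage_ratio ** 2 * freq_ratio


def relative_perf_per_watt(freq_ratio: float, voltage_ratio: float) -> float:
    """Relative efficiency, optimistically assuming performance scales
    linearly with clock frequency."""
    return freq_ratio / relative_dynamic_power(freq_ratio, voltage_ratio)


if __name__ == "__main__":
    # Hypothetical example: a 20% higher clock that needs 10% more voltage.
    power = relative_dynamic_power(1.2, 1.1)
    efficiency = relative_perf_per_watt(1.2, 1.1)
    print(f"relative power: {power:.3f}, relative perf/W: {efficiency:.3f}")
```

Under these assumptions, performance per watt falls with the square of the voltage increase needed to reach the higher clock, so the top of the frequency range costs far more power than the extra performance it returns.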

The Kirin 960 is difficult to categorize. It’s definitely not a clear upgrade over the 950, but it does just enough things right that we cannot dismiss it outright either. For example, its generally improved integer performance and lower system memory latency give it an advantage over the 950 in many real-world workloads. We cannot completely condemn its GPU either, because its sustained performance, at least in the Mate 9’s large aluminum chassis, is on par with or better than competing flagship phones, as is its battery life when gaming. Certainly the Mate 9 proves that Kirin 960 is a viable flagship SoC as long as Huawei puts in the effort to work around its flaws. But with a new generation of 10nm SoCs just around the corner, those flaws will only become more apparent.

86 Comments

  • Meteor2 - Wednesday, March 15, 2017 - link

    Andrei! That's a pretty big stamp :). I hope you're well.
  • aryonoco - Tuesday, March 14, 2017 - link

    A great article, very insightful, and absolutely unique on the web.

    Well done Matt, well done AT.
  • name99 - Tuesday, March 14, 2017 - link

    One issue in comparing the GeekBench results to the SPEC results is the question of compiler optimization.
    Were the SPEC results specifically targeted at the A73? And using the most recent version of LLVM?

    GeekBench (as far as I can tell) compiles a particular version (say version 4) with a particular compiler and target, and does not update those over time until say GeekBench 5 is released. This is not an awful practice --- it certainly makes it a lot easier to compare results in such a way that compiler optimizations don't confuse the issue. But it DOES mean that
    - the SPEC results may be picking up A73 specific tuning that GeekBench does not reflect.
    - depending on when you compiled the comparison AArch64 binaries, some fraction (and this may be high, 5% or more) of the A73 "improvement" may reflect LLVM improvement, both generally and in specific ARMv8 optimizations.

    If the SPEC results were "maximally" compiled (so that LTO was used, something that only really started to work well in the most recent LLVM versions) there could be even more of a compiler-based discrepancy.
  • MrSewerPickle - Thursday, March 16, 2017 - link

    Thank you guys for the review. Please keep these details up and going. You guys are a rarity these days and still doing great. Facts are hard to post without excessive opinion and rambling accompanying them.
  • socalbigmike - Thursday, March 16, 2017 - link

    Just don't try and run any apps on it! Because half of them will NOT run.
  • darkich - Thursday, March 23, 2017 - link

    Wow, they messed up again with that high-clocked GPU implementation. Just ridiculous. Huawei keeps making the same mistake over and over again, refusing to take note from Samsung. I'm certain Samsung's low-clocked G71 MP18 will consume far less power than this 8-core setup!
