Final Thoughts

What I wanted to showcase with this article was not only the particular advances of the Kirin 970, but also to use it as an opportunity to refresh everyone on the competitive landscape of the high-end Android SoC market. As the modern, post-iPhone smartphone ecosystem passes its 10-year anniversary, we’re seeing increasing consolidation and vertical integration of the silicon that powers today’s devices.

I wouldn’t necessarily say that Apple is the SoC trend setter that other companies are trying to copy, as much as other vendors are coming to the same conclusion Apple has: to be able to evolve and compete in a mature ecosystem, you need to control the silicon roadmap yourself. Otherwise you run the risk of not being able to differentiate yourself from other vendors using similar component stacks, or of not being competitive against those vendors who do have vertical integration. Apple was early to recognize this, and to date Huawei has been the only other OEM actually able to realize this goal of quasi-silicon independence.

I say quasi-independence because while the companies are designing their own SoCs, they are still relying on designs from the big IP licensing firms for key components such as the CPUs or GPUs. The Kirin 970, for example, doesn’t really manage to differentiate itself from the Snapdragon 835 in regard to CPU performance or efficiency, as both ARM Cortex-A73 powered parts end up within margins of error of each other.

Snapdragon 820’s Kryo CPU core was a hard sell against a faster, more efficient, and smaller Cortex-A72. Samsung’s custom CPU efforts fared slightly better than Qualcomm’s; however, the Exynos M1 and M2 haven’t yet managed to present a proper differentiating advantage over ARM’s CPUs. Samsung LSI’s performance claims for the Exynos 9810 are definitely eyebrow-raising and might finally mark the point where years of investment and development on a custom CPU truly pay off, but Samsung’s mobile division has yet to demonstrate true and committed vertical integration. Considering all of this, HiSilicon’s decision to stick with ARM CPUs makes sense.

While Qualcomm has backpedalled on using its custom CPU designs in mobile, the company does demonstrate the potential and advantages of controlling your own IP designs when it comes to the GPU. To draw parallels, on the desktop GPU side of things we already see the competitive and market consequences of one vendor having a ~33% efficiency advantage (Nvidia GeForce GTX 1080 vs AMD Radeon Vega 64). Just imagine that disparity increasing to 75-90%, and that’s the current state of the mobile landscape (Snapdragon 835 vs Kirin 970). In both cases silicon vendors can compensate for efficiency and performance by going with a larger GPU, something that is largely invisible to the end-user experience but definitely an unsustainable solution, as it eats into the silicon vendor’s gross margin. With PPA disparities at the high end nearing factors of 4x, it definitely gives pause and makes one wonder where we’ll be heading in the next couple of years.
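To put that compensation trade-off in rough numbers, here’s a quick back-of-the-envelope sketch in Python. The efficiency ratios mirror the figures quoted above, while the GPU block area and cost-per-mm² values are purely illustrative placeholders, not actual vendor data.

```python
# Back-of-the-envelope: how an efficiency deficit turns into die area and cost.
# All numbers are illustrative; the perf/W ratios follow the figures in the text.

def compensating_area(eff_ratio: float, base_area_mm2: float) -> float:
    """Die area needed to match a rival's performance at iso-power,
    assuming performance scales linearly with GPU width (a simplification)."""
    return base_area_mm2 * eff_ratio

BASE_GPU_AREA_MM2 = 25.0   # hypothetical GPU block size
COST_PER_MM2 = 0.10        # hypothetical $/mm^2 at a mature node

for name, ratio in [("desktop (~33% deficit)", 1.33),
                    ("mobile (~90% deficit)", 1.90)]:
    area = compensating_area(ratio, BASE_GPU_AREA_MM2)
    extra_cost = (area - BASE_GPU_AREA_MM2) * COST_PER_MM2
    print(f"{name}: {area:.1f} mm^2 GPU, ~${extra_cost:.2f} extra per chip")
```

Even with linear scaling, the less efficient vendor pays for parity on every single die shipped, which is exactly why it eats into gross margin rather than the user experience.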

Beyond CPU, GPU, and modem IP, SoCs have a lot more component blocks that are generally less talked about. Media blocks such as encoders/decoders eventually end up summarized as feature-checkboxes going up to X*Y resolution at Z frames per second. Even more esoteric are the camera pipelines, such as the ISPs of modern SoCs. Here the lack of knowledge of how they work or what their capabilities are is due in part to the silicon vendors’ secrecy, but also to the fact that currently the truly differentiating camera experiences are defined by software algorithm implementations. The Kirin 970’s new Cadence Tensilica Vision P6 DSP definitely uplifts the camera capabilities of the devices powered by the new SoC, but that’s something we’ll cover in a future device-centric review.
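As a side note on how those feature-checkboxes are read, here is a minimal sketch assuming a hypothetical capability table: spec sheets list only the maximum mode per codec, and any combination at or below that throughput is implied (the pixel-rate comparison is a deliberate simplification of how real decoder levels work).

```python
# Hypothetical decode capability check: spec sheets list only the maximum
# mode per codec; anything with equal or lower pixel throughput is implied.

MAX_DECODE = {"H.265": (3840, 2160, 60), "H.264": (3840, 2160, 30)}  # illustrative

def can_decode(codec: str, width: int, height: int, fps: int) -> bool:
    if codec not in MAX_DECODE:
        return False
    mw, mh, mf = MAX_DECODE[codec]
    # Simplification: compare pixel rate against the listed maximum mode.
    return width * height * fps <= mw * mh * mf

print(can_decode("H.265", 3840, 2160, 30))  # True: covered by the 2160p60 listing
print(can_decode("H.264", 3840, 2160, 60))  # False: exceeds the 2160p30 listing
```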

The NPU is a new class of IP whose uses are still in their infancy. Did the Kirin 970 need to have it included to be competitive? No. Does its addition make it more competitive? Yes. Well, maybe. With the software ecosystem lagging behind, it’s still early to say how crucial neural network acceleration IP in smartphones will become, and we have a chicken-or-egg situation where certain use-cases might simply not be feasible without the hardware. The marketing advantages for Huawei have been loud and clear, and it looks like industry-wide adoption is inevitable and on its way. I don’t foresee myself recommending or not recommending a device based on its existing, or lack of, “AI” capabilities for some time to come, and similarly consumers should apply a wait & see approach to the whole topic.
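To illustrate the chicken-or-egg point, here is a minimal sketch of the kind of capability-gated fallback an app developer faces today; the npu_available probe and the latency figures are hypothetical placeholders, not any vendor’s actual API or measurements.

```python
# Hypothetical capability-gated inference path: a feature that only fits a
# real-time latency budget on an NPU gets disabled on other hardware, which
# in turn limits the incentive to build such features in the first place.

import time

FRAME_BUDGET_MS = 33.0  # ~30 fps real-time budget, illustrative

def npu_available() -> bool:
    return False  # placeholder probe; a real app would query the platform

def run_inference(on_npu: bool) -> float:
    """Pretend inference pass; returns elapsed milliseconds."""
    start = time.perf_counter()
    time.sleep(0.005 if on_npu else 0.080)  # illustrative 5 ms vs 80 ms
    return (time.perf_counter() - start) * 1000.0

latency = run_inference(on_npu=npu_available())
if latency <= FRAME_BUDGET_MS:
    print(f"real-time feature enabled ({latency:.0f} ms/frame)")
else:
    print(f"feature disabled: {latency:.0f} ms exceeds {FRAME_BUDGET_MS:.0f} ms budget")
```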

While this article went on a lot of tangents and comparisons against competitors, its main topic was the Kirin 970. HiSilicon’s new chipset proves itself an excellent smartphone SoC that’s well able to compete with Qualcomm’s and Samsung’s best. There’s still a looming release schedule disadvantage, as Huawei doesn’t follow the usual spring Android device refresh cycle, and we expect newer SoCs to naturally leapfrog the Kirin 970. This might change in the future, as both semiconductor manufacturing and IP roadmaps might fall out of sync with the spring device product launches.

I come back to the fact that Huawei is only one of two OEM vendors – and the only Android vendor – who is leveraging vertical integration between its SoC designs and the final phones. The company has come a long way over the past few years, and we’ve seen solid generational improvements in both the silicon and the complete phones. What is most important is that the company is able to both set reasonable goals and execute on its targets. Talking to HiSilicon, I also see the important trait of self-awareness of shortcomings and of the need to improve in key areas. The motto of Intel’s Andy Grove – “only the paranoid survive” – seems apt for Huawei: I think the company is heading in the right direction in the mobile business, and that attitude is a key reason for its success.

Comments
  • GreenReaper - Thursday, January 25, 2018 - link

    If it can do 2160p60 Decode then I'd imagine that of course it can do 2160p30 Decode, just as it can do 1080p60/30 decode. You list the maximum in a category.
  • yhselp - Tuesday, January 23, 2018 - link

    What a wonderful article: a joy to read, thoughtful, and very, very insightful. Thank you, Andrei. Here's to more coverage like that in the future.

    It looks like the K970 could be used in smaller form factors. If Huawei were to make a premium, bezel-less ~4.8" 18:9 model powered by the K970, it would be wonderful - a premium Android phone about the size of the iPhone SE.

    Even though Samsung and Qualcomm (S820) have custom CPUs, it feels like their designs are much closer to stock ARM than Apple's CPUs. Why are they not making wider designs? Is it a matter of inability or unwillingness?
  • Raqia - Tuesday, January 23, 2018 - link

    Props for a nice article with data rich diagrams filled with interesting metrics as well as the efforts to normalize tests now and into the future w/ LLVM + SPECINT 06. (Adding the units after the numbers in the chart and "avg. watts" to the rightward pointing legend at the bottom would have helped me grok faster...) Phones are far from general purpose compute devices and their CPUs are mainly involved in directing program flow rather than actual computation, so leaning more heavily into memory operations with the larger data sets of SPECINT is a more appropriate CPU test than Geekbench. The main IPC uplift from OoOE comes from the prefetching and execution in parallel of the highest latency load and store operations and a good memory/cache subsystem does wonders for IPC in actual workloads. Qualcomm's Hexagon DSP has

    It would be interesting to see the 810 here, but its CPU figures would presumably blow off the chart. A modem or wifi test would also be interesting (care for a donation toward the aforementioned harness?), but likely a blowout in the good direction for the Qualcomm chipsets.
  • Andrei Frumusanu - Friday, January 26, 2018 - link

    Apologies for the chart labels, I did them in Excel and it doesn't allow for editing the secondary label position (watts after J/spec).

    The Snapdragon 810 devices wouldn't have been able to sustain their peak performance states for SPEC so I didn't even try to run it.

    Unless your donation is >$60k, modem testing is far beyond the reach of AT because of the sheer cost of the equipment needed to do this properly.
  • jbradfor - Wednesday, January 24, 2018 - link

    Andrei, two questions on the Master Lu tests. First, is there a chance you could run it on the 835 GPU as well and compare? Second, do these power number include DRAM power, or are they SoC only? If they do not include DRAM power, any chance you could measure that as well?
  • Andrei Frumusanu - Friday, January 26, 2018 - link

    The Master Lu uses the SNPE framework and currently doesn't have the option to choose the computing target on the SoC. The GPU isn't much, if at all, faster than the DSP, and it is less efficient.

    The power figures are the active power of the whole platform (total power minus idle power) so they include everything.
  • jbradfor - Monday, January 29, 2018 - link

    Thanks. Do you have the capability of measuring just the SoC power separately from the DRAM power?
  • ReturnFire - Wednesday, January 24, 2018 - link

    Great article Andrei. So glad there is new mobile stuff on AT. Fingers crossed for more 2018 flagship / soc articles!
  • KarlKastor - Thursday, January 25, 2018 - link

    "AnandTech is also partly guilty here; you have to just look at the top of the page: I really shouldn’t have published those performance benchmarks as they’re outright misleading and rewarding the misplaced design decisions made by the silicon vendors. I’m still not sure what to do here and to whom the onus falls onto."
    That is pretty easy: post sustained performance values and not just peak numbers. Just run the benchmarks ten times in a row; it's not that difficult.
    If every review showed sustained performance, people would see through this theater.

    And it is a big problem. Burst GPU performance is useless; no one plays a game for half a minute.
    Burst CPU performance is perhaps a different matter. It helps with overall snappiness.
  • Andrei Frumusanu - Friday, January 26, 2018 - link

    I'm planning to switch to this in future reviews.
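For illustration, a minimal sketch of the repeat-run methodology discussed in the comments above; run_benchmark is a hypothetical stand-in for whatever workload is actually being measured.

```python
# Sustained-performance harness sketch: run the same workload back-to-back
# and compare the first (burst) score against the later (throttled) ones.

import statistics
import time

def run_benchmark() -> float:
    """Hypothetical workload; returns a score. Replace with a real benchmark."""
    start = time.perf_counter()
    sum(i * i for i in range(10**6))  # stand-in compute kernel
    return 1.0 / (time.perf_counter() - start)

scores = [run_benchmark() for _ in range(10)]  # ten runs in a row, as suggested
burst, sustained = scores[0], statistics.mean(scores[5:])
print(f"burst score: {burst:.1f}, sustained score: {sustained:.1f} "
      f"({sustained / burst:.0%} of burst)")
```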
