Final Thoughts

What I wanted to showcase with this article was not only the particular advances of the Kirin 970, but also to use it as an opportunity to refresh everyone on the competitive landscape of the high-end Android SoC market. As the modern, post-iPhone smartphone ecosystem passes its 10-year anniversary, we’re seeing increasing consolidation and vertical integration of the silicon that powers today’s devices.

I wouldn’t necessarily say that Apple is the SoC trend setter that other companies are trying to copy, so much as that other vendors are coming to the same conclusion Apple has: to be able to evolve and compete in a mature ecosystem, you need to control the silicon roadmap yourself. Otherwise you run the risk of being unable to differentiate yourself from other vendors using similar component stacks, or of not being competitive against those vendors who do have vertical integration. Apple was early to recognize this, and to date Huawei has been the only other OEM actually able to realize this goal of quasi-silicon independence.

I say quasi-independence because while both companies design their own SoCs, they still rely on designs from the big IP licensing firms for key components such as the CPU or GPU. The Kirin 970, for example, doesn’t really manage to differentiate itself from the Snapdragon 835 in regards to CPU performance or efficiency, as both ARM Cortex-A73-powered parts end up within margins of error of each other.

The Snapdragon 820’s Kryo CPU core was a hard sell against a faster, more efficient, and smaller Cortex-A72. Samsung’s custom CPU efforts fared slightly better than Qualcomm’s; however, the Exynos M1 and M2 haven’t yet managed to present a proper differentiating advantage over ARM’s CPUs. Samsung LSI’s performance claims for the Exynos 9810 are definitely eyebrow-raising and might finally mark the point where years of investment and development in a custom CPU truly pay off, but Samsung’s mobile division has yet to demonstrate true and committed vertical integration. Considering all of this, HiSilicon’s decision to stick with ARM CPUs makes sense.

While Qualcomm has backpedalled on using its custom CPU designs in mobile, the company does demonstrate the potential and advantages of controlling your own IP when it comes to the GPU. To draw a parallel: on the desktop GPU side we already see the competitive and market consequences of one vendor having a ~33% efficiency advantage (Nvidia GeForce GTX 1080 vs AMD Radeon Vega 64). Now imagine that disparity increasing to 75-90%, because that’s the current state of the mobile landscape (Snapdragon 835 vs Kirin 970). In both cases silicon vendors can compensate for efficiency and performance by going with a larger GPU, something that is largely invisible to the end-user experience but definitely an unsustainable solution, as it eats into the silicon vendor’s gross margin. With PPA disparities at the high end nearing factors of 4x, it definitely gives us pause and makes us wonder where we’ll be heading in the next couple of years.
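
To make that margin tradeoff concrete, here is a minimal Python sketch of the arithmetic. All figures below are hypothetical stand-ins chosen to mirror the 75-90% gap discussed above; none of them are measured data:

```python
# Perf/W comparison and the "bigger GPU" workaround, with hypothetical numbers.

def perf_per_watt(fps, watts):
    """Efficiency as frames per second per watt of GPU power."""
    return fps / watts

# Two hypothetical GPUs reaching the same fps at different power draws:
gpu_a = perf_per_watt(fps=60.0, watts=3.0)   # 20.0 fps/W
gpu_b = perf_per_watt(fps=60.0, watts=5.5)   # ~10.9 fps/W

print(f"efficiency advantage: {gpu_a / gpu_b - 1.0:.0%}")  # ~83%, inside the 75-90% range

# The less efficient vendor can compensate with a wider GPU run at lower
# clocks: dynamic power scales roughly with f * V^2, so adding cores while
# dropping frequency and voltage can recover performance at similar power --
# but the extra die area eats into gross margin, which is why the approach
# becomes unsustainable as the gap grows.
```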

Beyond the CPU, GPU, and modem IP, SoCs have a lot more component blocks that are generally less talked about. Media blocks such as encoders/decoders usually end up summarized as feature checkboxes going up to X*Y resolution at Z frames per second. Even more esoteric are the camera pipelines, such as the ISPs of modern SoCs. Here the lack of knowledge of how they work and what their capabilities are is due in part to the silicon vendors’ secrecy, but also to the fact that truly differentiating camera experiences are currently defined by software algorithm implementations. The Kirin 970’s new Cadence Tensilica Vision P6 DSP definitely uplifts the camera capabilities of devices powered by the new SoC, but that’s something we’ll cover in a future device-centric review.

The NPU is a new class of IP whose uses are still in their infancy. Did the Kirin 970 need to include it to be competitive? No. Does its addition make the chip more competitive? Yes. Well, maybe. With the software ecosystem lagging behind the hardware, it’s still too early to say how crucial neural network acceleration IP in smartphones will become, and we have a sort of chicken-or-egg situation where certain use-cases might simply not be feasible without the hardware. The marketing advantages for Huawei have been loud and clear, and it looks like industry-wide adoption is inevitable and on its way. I don’t foresee myself recommending a device, or not, based on the presence or lack of “AI” capabilities for some time to come, and consumers should likewise apply a wait-and-see approach to the whole topic.

While this article went off on a lot of tangents and comparisons against competitors, its main topic was the Kirin 970. HiSilicon’s new chipset proves itself an excellent smartphone SoC that is well able to compete with Qualcomm’s and Samsung’s best. There’s still a looming release-schedule disadvantage, as Huawei doesn’t follow the usual spring Android device refresh cycle, and we expect newer SoCs to naturally leapfrog the Kirin 970. This might change in the future, as both semiconductor manufacturing and IP roadmaps may fall out of sync with the spring device launches.

I come back to the fact that Huawei is only one of two OEM vendors – and the only Android vendor – leveraging vertical integration between its SoC designs and its final phones. The company has come a long way over the past few years, and we’ve seen solid generational improvements in both the silicon and the complete phones. Most importantly, the company is able to both set reasonable goals and execute on its targets. Talking to HiSilicon, I also see the important trait of self-awareness: an awareness of shortcomings and of the need to improve in key areas. Intel’s Andy Grove had the motto “only the paranoid survive,” and it seems apt for Huawei: the company is heading in the right direction in the mobile business, and that vigilance is a key reason for its success.

Comments

  • StormyParis - Monday, January 22, 2018

    If the Modem IP is Huawei's one true in-house part, why didn't you at least test it alongside the CPU and GPU? I'd assume in the real world, it too has a large impact on battery and performance?
  • Ian Cutress - Monday, January 22, 2018

    The kit to properly test modem power/attenuation effects on the battery runs around $50-100k. We did borrow one once, a few years ago, but it was only a short-term loan. Not sure if/when we'll be able to do that testing again.
  • juicytuna - Monday, January 22, 2018

    How does Mali have so many design wins? Why did Samsung switch from PowerVR to Mali? Cost savings? Politics? Because it clearly wasn't a decision made on technical merit.
  • lilmoe - Tuesday, January 23, 2018

    Because OEMs like Samsung are not stupid? And Mali is actually very power efficient and competitive?

    What are you basing your GPU decision on? Nothing in the articles provides evidence that Mali is less efficient than Adreno in UI acceleration or 60fps capped popular games (or even 60fps 1080p normalized T-Rex benchmark)...

    Measuring the constant power draw of the GPU, which is supposed to be reached in very short bursts, is absolutely meaningless.
  • lilmoe - Tuesday, January 23, 2018

    ***Measuring the max (constant) power draw of the GPU, which is supposed to be reached in very short bursts during a workload, is absolutely meaningless.
  • jospoortvliet - Saturday, January 27, 2018

    Your argument is half-way sensible for a CPU but not for a GPU.

    A GPU should not even HAVE a boost clock - there is no point in that for typical GPU workloads. Where a CPU is often active in bursts, a GPU has to sustain performance in games - normal UI work barely taxes it anyway.

    So yes the max sustained performance and associated efficiency is ALL that matters. And MALI, at least in the implementations we have seen, is behind.
  • lilmoe - Sunday, January 28, 2018

    I think you're confusing fixed function processing with general purpose GPUs. Modern GPU clocks behave just like CPU cores, and yes, with bursts, just like Nvidia's and AMD's. Not all scenes rendered in a game, for example, need the same GPU power, and not all games have the same GPU power needs.

    Yes, there is a certain performance envelope that most popular games target. That performance envelope/target is definitely not SlingShot nor T-Rex.

    This is where Andrei's and your argument crumbles. You need to figure out that performance target and measure efficiency and power draw at that target. That's relatively easy to do; open up Candy Crush and Asphalt 8 and measure on-screen fps and power draw. That's how you measure efficiency on A SMARTPHONE SoC. Your problem is that you think people are using these SoCs like they would on a workstation. They don't. No one is going to render a 3ds Max project on these phones, and there are no games that even saturate last year's flagship mobile GPU.

    Not sure if you're not getting my simple and sensible point, or you're just being stubborn about it. Mobile SoC designers have argued for bursty gpu behavior for years. You guys need to get off your damn high horse and stop deluding yourself into thinking that you know better. What Apple or Qualcomm do isn't necessarily best, but it might be best for the gpu architecture THEY'RE using.

    As for the CPU, you agree but Andrei insists on making the same mistake. You DON'T measure efficiency at max clocks. Again, max clocks are used in bursts and only for VERY short periods of time. You measure efficiency by measuring the time it takes to complete a COMMON workload and the total power it consumes at that. Another hint, that common workload is NOT geekbench, and it sure as hell isn't SPEC.
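
    A rough sketch of that idea in Python, with entirely made-up numbers (hypothetical cores, not measured data), showing why energy per workload and power at max clocks can disagree:

    ```python
    # Efficiency as total energy for a fixed, common workload: E = P_avg * t.
    def energy_joules(avg_watts, seconds):
        return avg_watts * seconds

    # Two hypothetical cores completing the SAME workload:
    fast_core = energy_joules(avg_watts=2.0, seconds=3.0)  # bursts high, races to idle: 6.0 J
    slow_core = energy_joules(avg_watts=1.2, seconds=6.0)  # lower power, runs longer: 7.2 J

    print(fast_core, slow_core)
    # The core with the higher peak power draw finishes first and spends
    # less total energy, which is why energy per completed workload, not
    # power at max clocks, is the number that matters.
    ```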
  • lilmoe - Sunday, January 28, 2018

    The A75 is achieving higher performance mostly with higher clocks. The Exynos M3 is a wide core WITH higher clocks. Do you really believe these guys are idiots? You really think that's going to affect efficiency negatively? You think Android OEMs will make the same "mistake" Apple did and not provide adequate and sustainable power delivery?

    Laughable.
  • futrtrubl - Monday, January 22, 2018

    "The Kirin 970 in particular closes in on the efficiency of the Snapdragon 835, leapfrogging the Kirin 960 and Exynos SoCs."
    Except according to the chart right above it the 960 is still more efficient.
  • Andrei Frumusanu - Monday, January 22, 2018

    The efficiency axis is portrayed as energy (joules) per performance (test score). In this case the less energy used, the more efficient, meaning the shorter the bars, the better.
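
    As a minimal sketch of how the metric is computed (illustrative numbers only, not the review's data):

    ```python
    # Efficiency as energy per performance: joules consumed over the
    # benchmark run divided by the resulting score. Lower is better.
    def joules_per_point(avg_watts, runtime_s, score):
        return (avg_watts * runtime_s) / score

    soc_a = joules_per_point(avg_watts=4.0, runtime_s=100.0, score=2500.0)  # 0.160 J/point
    soc_b = joules_per_point(avg_watts=3.5, runtime_s=120.0, score=2400.0)  # 0.175 J/point

    print(soc_a, soc_b)
    # SoC A draws more power but finishes faster and scores higher, so its
    # bar (J/point) is shorter -- i.e. it is the more efficient part.
    ```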
