NPU Performance Tested

To test the performance of the NPU we need a benchmark which currently targets all of the various vendor APIs. Unfortunately at this stage short of developing our own implementation the choices are scarce, but luckily there is one: Popular Chinese benchmark suite Master Lu recently introduced an AI benchmark implementing both HiSilicon’s HiAI as well as Qualcomm’s SNPE frameworks. The benchmarks implements three different neural network models: VGG16, InceptionV3 as well as ResNet34. The input dataset are 100 images which are a subset of the ImageNet reference database. As a fall-back the app implements the TensorFlow inferencing library to run on the CPU. I’ve ran the performance figures on the Mate 10 Pro, Mate 9 as well as two Snapdragon 835 (Pixel 2 XL & V30) devices respectively running on the CPU as well as the Hexagon DSP.

Similarly to the SPEC2006 results I chose to use a more complex graph to better showcase the three dimensions of average power (W), efficiency (mJ/inference) as well as absolute performance (fps / inferences per second).

First thing we notice from the graph is that we can observe an order of magnitude difference in performance between the NPU and CPU implementations. Running the networks as they are on the CPUs we’re not able to exceed 1-2fps and we do so at very heavy CPU power consumption. Both the Snapdragon 835 as well as the Kirin 960 CPUs struggle with the workloads with average power exceeding sustainable workloads.

Qualcomm’s Hexagon DSP is able to improve on the CPU performance by a factor of 5-8x. But Huawei’s NPU performance figures are again several factors above that, showcasing up to a 4x lead in ResNet34. The reason for the different performance ratio differences between the different models is their design. Convolutional layers are heavily parallelisable whilst the polling and fully connected layers of the models must use more serial processing steps. ResNet in particular makes use of a larger percentage of convolution processing for a single inference and thus is able to achieve a higher utilization rate of the Kirin NPU.

In terms of power efficiency we’re very near to Huawei’s claims of up to a 50x improvement. This is the key characteristic that will enable CNNs to be used in real-world use-cases. I was quite surprised to see Qualcomm’s DSP reach similar efficiency levels as Huawei’s NPU – albeit at 1/3rd to 1/4th of the performance. This should bode quite well in terms of the Snapdragon 845’s Hexagon 685 which promises up to a 3x increase in performance.

I wanted to take the opportunity to make a rant about Google’s Pixel 2: I was able to actually run the benchmark on the Snapdragon 835’s CPU because the Pixel 2 devices lacked support for the SNPE framework. This was in a sense maybe both expected as well as unexpected. With the introduction of the NN API in Android 8.1, which the Pixel 2 phones support and use acceleration through the dedicated Pixel Visual Core SoC, it’s natural that Google would want to push usage of Android’s standard APIs. But on the other hand this is also a limitation on the capabilities of the phone by the OEM vendor which I can’t help but compare to the decision by Google to by default omit OpelCL in Android. This is a decision which in my eyes has heavily stifled the ecosystem and is why we don’t see more GPU accelerated compute workloads, out of which CNNs could have been one.

While we can’t run the Master Lu AI test on an iPhone, HiSilicon did publish some slides with reported internally numbers we can try to correlate. Based on the models included in the slide, the Apple A11 neural network IP’s performance should land somewhere slightly ahead of the Snapdragon 835’s DSP but still far behind the Kirin NPU, but again we can't independently verify these figures due to lack of a fitting iOS benchmark we can run ourselves.

Of course the important question is, what is this all good for? HiSilicon discloses that one use-case being used is noise reduction via CNN processing, and thus is able to increase voice recognition rate in heavy traffic from 80% to 92%.

The other most publicised use-case is the implementation in the camera app. The Mate 10’s camera makes use of the NPU to run inferencing to recognize different scenarios and optimize the camera settings based on pre-sets for those scenarios. The Mate 10 comes with a translation app which was developed with Microsoft, which is able to use the NPU for accelerated offline translation, and this was definitely the single most impressive usage for me. Inside the built-in gallery application we also see the use of image classification to create a new section where pictures are organized by content type. The former scenarios where the SoC is doing live inferencing on a media stream such as the camera feed is also the use-case where HiSilicon has an advantage over Qualcomm as employs both a DSP and the NPU whereas Snapdragon SoCs have to share the DSP resources between vision processing and neural network inferencing workloads.

Oddly enough the Kirin 970 has sort of double the silicon IP capable of running neural network efficiently as its vision pipeline also includes a Cadence Tensilica Vision P6 DSP which should be in the same performance class as Qualcomm’s Hexagon 680 DSP, but is currently not exposed for user applications.

While the Mate 10 does make some use of the NPU it’s hard to argue that it’s a definitive differentiating factor for the end-user. Currently neural network usage in mobile doesn’t seem to have the same killer-applications that they have in automotive and security camera sectors. Again this is due to the ecosystem being its early days and the Mate 10 among the first devices to actually offer such a dedicated acceleration block. It’s arguable if it’s worth it for the Kirin 970 to have implemented such a piece and Huawei is very open about the fact that it’s reaching out to developers to try and find more use-cases for the silicon, and at least Huawei should be lauded for innovating with something new.

Huawei/Microsoft's translation app seemed to be the most distinguished experience on the Mate 10 so maybe there’s more non-image based use-cases that can be explored in the future. Currently the app allows the traditional snapshot of a foreign language text and then shows a translated overlay, but imagine a future implementation where it’s able to do it live from the camera feed and allow for an AR experience. MediaTek at CES also showed a distinguishing use-case of using CNNs: for video conferencing the video encoder is fed metadata on scene composition by a CNN layer doing image recognition and telling the encoder to use finer-grained block sizes where a user’s face would be, thus increasing video quality. It’s more likely that neural network use-cases will slowly creep up with time rather than there being a new revolutionary thing, as more devices will start to incorporate such IPs and they become more widespread so will developers be more enticed to find uses for them.

An Introduction to Neural Network Processing Final Thoughts
Comments Locked

116 Comments

View All Comments

  • Ratman6161 - Wednesday, January 24, 2018 - link

    Personally I think Samsung is in a great position...wheather you consider them "truly vertically integrated" or not. One thing to remember is that most often, Samsung flagship devices come in two variants. It's mostly in the US where we get the Qualcomm variants while elsewhere tends to get Exynos. The dual source is a great arrangement because every once in a while Qualcomm is going to turn out a something problematic like the Snapdragon 810. When that happens Samsung has the option to use its own which is what they did with the Galaxy S6/Note 5 generation which was Exynos only.

    Another point is: what do you consider "truly vertically integrated". The story cites Apple and Huewai but they don't actually manufacture their SOC's and neither does Qualcomm. I believe the Kirin SOC's are actually manufactured by TSMC while Apple and Qualcomm SOC's have at various times been actually manufactured in Samsung FABs. As far as I know, Samsung is the only company that even has the capability to design and also manufacture their own SOC. So in a way, you could say that my Samsung Note 5 is about the most vertically integrated phone there is, along with non-US versions of the S7 and S8 generations. In those cases you have a samsung SOC manufactured in a Samsung FAB in a Samsung phone with a Samsung screen etc. Don't make the mistake of thinking the whole world is just like us...they aren't. Also many of the screens for other brands are also of Samsung manufacture so you have to keep in mind that there is a lot more to the device than the SOC
  • fred666 - Monday, January 22, 2018 - link

    Huawei only uses HiSilicon SoCs? Nothing from Qualcomm?
  • Andrei Frumusanu - Monday, January 22, 2018 - link

    They've used Qualcomm chip-sets and still do use them in segments they can't fill with their own SoCs.
  • niva - Monday, January 22, 2018 - link

    So they still use QC chips, but unlike them, Samsung isn't vertically integrated because they use QC chips.

    Get out of here.
  • Dr. Swag - Monday, January 22, 2018 - link

    His point is Huawei only uses non-HiSilicon chips in price segments that they do not have SoCs for. Samsung, however, does sometimes use QC silicon even if they have SoCs that can fill that segment (e.g. Samsung uses the Snapdragon 835s even though they have the 8895).

    I'm not saying that I agree with Andrei's view, but there is a difference.
  • niva - Tuesday, January 23, 2018 - link

    I completely disagree with the assessment that Samsung is somehow not "as vertically integrated" as Huawei. Samsung is not just vertically integrated, it produces components for many other key players in the market. They have reasons why they CHOOSE not to use their SOCs in specific markets and areas. Some of the rationale behind those choices may be questioned, but it's a choice. I too think that the world would be a better place if they actually put their own chip designs into their phones and directly competed against Qualcom. That of course might be the end of Qualcom and a whole lot of other companies... Samsung can easily turn into a monopoly that suffocates the entire market, so it's not just veritcal, but horizontal integration. What Huawei has accomplished in short order is impressive, but isn't Huawei just another branch of the Chinese government at this point? Sure yeah, their country is more vertically integrated. Maybe that's the line to take to justify the statement...
  • levizx - Monday, February 26, 2018 - link

    No, it's not INTEGRATED because it doesn't prefer its own over outsourcing. Samsung Mobile department runs separately from its Semiconductor department which act as a contractor no different than Qualcomm.

    As for Huawei being a branch of the Chinese government, it's as true as Google being part of the US government. Stop spruiking conspiracy theory. I know for a fact their employees almost fully owns the company.
  • KarlKastor - Thursday, January 25, 2018 - link

    Well, that's not true. Huawei choose the Snapdragon 625 in the Nova. Why not use their own Kirin 600 Series? it is the same market segment.

    Samsung only opts for Snapdragon, where they have no own SoCs: all regions with CDMA2000 Networks.
    In all other regions, europe for example, they ship all smartphones frome the J- and A-Series to the S-Series and Note with their Exynos SoCs.
  • yslee - Tuesday, January 30, 2018 - link

    You keep on repeating that line, but where I am we have no CDMA2000 networks and still get Snapdragon Samsungs.
  • levizx - Monday, February 26, 2018 - link

    That's also not true, Samsung uses Snapdragon where there's no CDMA2000 as well. Huawei used to use VIA's 55nm CBP8.2D over Snapdragon.

    Mid-tier is not so indicative compared to higher end devices when it comes to, well everything. They may even outsource the ENTIRE DESIGN to a third party, and still proves nothing in particular. They might have chosen S625 because of supply issues which is completely reasonable. Same can not be applied to Samsung, since there's no such thing as supply issues when it comes to Exynos and Snapdragon.

Log in

Don't have an account? Sign up now