Inference Performance: APIs, Where Art Thou?

Having covered the new CPU complexes of both new Exynos and Snapdragon SoCs, up next is the new generation neural processing engines in each chip.

The Snapdragon 855 brings big performance improvements to the table thanks to a doubling of the HVX units inside the Hexagon 690 DSP. The HVX units in the last two generations of Snapdragon chips were the IP blocks that took the brunt of new integer neural network inferencing work, an area the IP is particularly adept at.

The new tensor accelerator inside the Hexagon 690 was shown off by Qualcomm at the preview event back in January. Unfortunately, one of the issues with the new block is that it’s currently only accessible through Qualcomm’s own SDK tools, and it won’t offer acceleration for NNAPI workloads until later in the year with Android Q.

Looking at a compatibility matrix of which kinds of workloads can be accelerated by which hardware blocks through NNAPI reveals quite a sad state of things:

NNAPI SoC Block Usage Estimates

SoC \ Model Type    INT8    FP16    FP32
Exynos 9820         GPU     GPU     GPU
Exynos 9810         GPU?    GPU     CPU
Snapdragon 855      DSP     GPU     GPU
Snapdragon 845      DSP     GPU     GPU
Kirin 980           GPU?    NPU     CPU

What stands out in particular is Samsung’s new Exynos 9820 chipset. Even though the SoC promises to come with an NPU that on paper is extremely powerful, the software side of things makes it seem as if the block doesn’t exist. Currently Samsung doesn’t publicly offer even a proprietary SDK for the new NPU, much less NNAPI drivers. I’ve been told that Samsung looks to address this later in the year, but how exactly the Galaxy S10 will benefit from new functionality in the future is quite unclear.

For Qualcomm, as the HVX units are integer-only, only quantised INT8 inference models can be accelerated by the block, with FP16 and FP32 models falling back to what should be GPU acceleration. It’s to be noted that my matrix here could be wrong, as we’re dealing with abstraction layers, and depending on the model features required the drivers could run models on different IP blocks.

Finally, HiSilicon’s Kirin 980 currently only offers NNAPI acceleration on the NPU for FP16 models, with INT8 and FP32 models falling back to the CPU, as the device is seemingly not using Arm’s NNAPI drivers for the Mali GPU, or at least not taking advantage of INT8 acceleration in the same way as Samsung's GPU drivers.
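To make the matrix easier to query, here is a minimal Python sketch that transcribes the table above into a dictionary. The names `NNAPI_BLOCK_ESTIMATES`, `socs_using`, and `dedicated_accelerator_coverage` are made up for illustration, and the entries remain estimates: a trailing "?" marks a cell the table itself flags as uncertain.

```python
# Estimated NNAPI hardware block per SoC and model type, transcribed from the
# table above. These are estimates, not vendor-confirmed mappings; a trailing
# "?" marks a cell that the table itself flags as uncertain.
NNAPI_BLOCK_ESTIMATES = {
    "Exynos 9820":    {"INT8": "GPU",  "FP16": "GPU", "FP32": "GPU"},
    "Exynos 9810":    {"INT8": "GPU?", "FP16": "GPU", "FP32": "CPU"},
    "Snapdragon 855": {"INT8": "DSP",  "FP16": "GPU", "FP32": "GPU"},
    "Snapdragon 845": {"INT8": "DSP",  "FP16": "GPU", "FP32": "GPU"},
    "Kirin 980":      {"INT8": "GPU?", "FP16": "NPU", "FP32": "CPU"},
}


def socs_using(block, model_type):
    """Return the SoCs whose estimated block for model_type is `block`.

    The "?" uncertainty marker is ignored for matching purposes.
    """
    return [
        soc
        for soc, row in NNAPI_BLOCK_ESTIMATES.items()
        if row[model_type].rstrip("?") == block
    ]


def dedicated_accelerator_coverage():
    """Map each SoC to the model types estimated to run on a DSP or NPU."""
    return {
        soc: [mtype for mtype, block in row.items() if block in ("DSP", "NPU")]
        for soc, row in NNAPI_BLOCK_ESTIMATES.items()
    }
```

For example, `socs_using("DSP", "INT8")` picks out the two Snapdragon chips, while `dedicated_accelerator_coverage()` returns an empty list for the Exynos 9820, whose NPU is not exposed through NNAPI at all.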

Before we even get to the benchmark figures, it’s clear that the results will be a mess with various SoCs performing quite differently depending on the workload.

For the benchmark, we’re using a brand-new version of Andrey Ignatov’s AI-Benchmark, namely the just-released version 3.0. The new version tunes the models and introduces a new Pro-Mode that, most interestingly, is now able to measure sustained throughput inference performance. This latter point is important as we can have very different performance figures between one-shot inferences and back-to-back inferences. In the former case, software and DVFS behaviour can vastly overshadow the actual performance capability of the hardware, as in many cases we’re dealing with timings in the tens or hundreds of milliseconds.

Going forward we’ll be taking advantage of the new benchmark’s flexibility and posting both instantaneous single inference times as well as sequential throughput inference times, better showcasing and separating the impact of software and hardware capabilities.
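The difference between the two measurement styles can be sketched in a few lines of Python. This is not how AI-Benchmark itself takes its measurements; it is only an illustration of the idea, and `run_inference`, `iterations`, and `warmup` are hypothetical names standing in for whatever the real workload and settings would be.

```python
import time


def single_inference_ms(run_inference):
    """Time one call in isolation, as in a one-shot measurement.

    On a real device this figure includes ramp-up effects such as
    scheduler and DVFS latency, not just raw hardware speed.
    """
    start = time.perf_counter()
    run_inference()
    return (time.perf_counter() - start) * 1000.0


def sustained_inference_ms(run_inference, iterations=50, warmup=5):
    """Average time per call over back-to-back calls, after a warm-up.

    The warm-up calls are not timed, so the result reflects steady-state
    throughput rather than the first-call ramp-up.
    """
    for _ in range(warmup):
        run_inference()
    start = time.perf_counter()
    for _ in range(iterations):
        run_inference()
    return (time.perf_counter() - start) * 1000.0 / iterations


if __name__ == "__main__":
    def fake_model():
        # Hypothetical stand-in workload; a real test would invoke a model.
        sum(i * i for i in range(100_000))

    print("single (ms):   ", single_inference_ms(fake_model))
    print("sustained (ms):", sustained_inference_ms(fake_model))
```

A large gap between the two figures on the same chip points at software responsiveness rather than hardware capability, which is the separation the two result sets are meant to expose.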

There’s a lot of data here, so for the sake of brevity I’ll simply put all the results up and we’ll go over the general analysis at the end:

[Charts: AIBenchmark 3 results]
AIBenchmark 3 - 1a - The Life - CPU (FP)
AIBenchmark 3 - 1b - The Life - NNAPI (INT8)
AIBenchmark 3 - 1c - The Life - NNAPI (FP16)
AIBenchmark 3 - 2a - Zoo - NNAPI (INT8)
AIBenchmark 3 - 2b - Zoo - CPU (FP)
AIBenchmark 3 - 2c - Zoo - NNAPI (FP16)
AIBenchmark 3 - 3a - Pioneers - CPU (INT)
AIBenchmark 3 - 3b - Pioneers - NNAPI (INT8)
AIBenchmark 3 - 3c - Pioneers - NNAPI (FP16)
AIBenchmark 3 - 4 - Let's Play! - CPU (FP)
AIBenchmark 3 - 5a - Masterpiece - NNAPI (INT8)
AIBenchmark 3 - 5b - Masterpiece - NNAPI (FP16)
AIBenchmark 3 - 6b - Cartoons - NNAPI (FP16)
AIBenchmark 3 - 7a - Ms.Universe - CPU (INT)
AIBenchmark 3 - 7b - Ms.Universe - CPU (FP)
AIBenchmark 3 - 8 - Blur iT! - CPU (FP)
AIBenchmark 3 - 9 - Berlin Driving - NNAPI (FP16)
AIBenchmark 3 - 10a - WESPE-dn - NNAPI (FP16)
AIBenchmark 3 - 10b - WESPE-dn - NNAPI (FP32)

As initially predicted, the results are spread extremely widely across all the SoCs.

The new tests also include workloads that are solely using TensorFlow libraries on the CPU, so the results not only showcase NNAPI accelerator offloading but can also serve as a CPU benchmark.

In the CPU-only tests, we see the Snapdragon 855 and Exynos 9820 in the lead; however, there’s a notable difference between the two when it comes to their instantaneous vs sequential performance. The Snapdragon 855 is able to post significantly better single inference figures than the Exynos, although the latter catches up in longer duration workloads. Inherently this is a difference in software characteristics between the two chips: although Samsung has improved scheduler responsiveness in the new chip, it still lags behind the Qualcomm variant.

In INT8 workloads there is no contest, as Qualcomm is far ahead of the competition in NNAPI benchmarks simply because it is the only vendor able to offload this work to an actual accelerator. Samsung’s Exynos 9820 performance here has also drastically improved thanks to the Mali G76’s new INT8 dot-product instructions. It’s odd that the same GPU in the Kirin 980 doesn’t show the same improvements, which could be due to out-of-date Arm GPU NNAPI drivers on the Mate 20.

The FP16 performance crown often goes to the Kirin 980 NPU, but in some workloads it seems to fall back to the GPU, and in those cases Qualcomm’s GPU clearly has the lead.

Finally, for FP32 workloads it’s again the Qualcomm GPU which takes an undisputed lead in performance.

Overall, machine inferencing performance today is an absolute mess. Amid all the chaos, though, Qualcomm seems to be the only SoC supplier able to deliver consistently good performance, and its software stack is clearly the best. Things will evolve over the coming months, and it will be interesting to see what Samsung will be able to achieve with regard to their custom SDK and NNAPI for the Exynos NPU, but much like Huawei’s Kirin NPU it’s all just marketing until we actually see the software deliver on the hardware capabilities, something which may take longer than the first-year active lifespan of the new hardware.

229 Comments

  • Thraxen - Friday, March 29, 2019 - link

    I’m in that customization category and also not technically naive so avoiding any security issues is, well, as natural as not falling for e-mail scams. Anyway, I’m typing this reply on my iPad Pro and like it quite a bit, but compared to my S10 it’s boring as hell. The phone just feels more exciting while the iPad feels... safe? Like it was jointly produced by Fisher Price or something.
  • jaju123 - Sunday, March 31, 2019 - link

    Lol, I have the same experience. The iPad pro 11 that I have is like a kids version of what a mobile OS should be. I can barely do anything on it, whereas android on my mate 20 pro feels like an OS for adults.
  • Thraxen - Sunday, March 31, 2019 - link

    Exactly. I love customizing my phone. I can add widgets (real ones, not that card BS on iOS), change the screen grid layout, change all the icons or just one, use live wall papers (real ones, not that handful of very limited ones on iOS), add automation with apps like Tasker, change the dialer/contacts/etc apps, change how notification functions, etc, etc, etc...

    If there’s something you don’t like how it works or looks on Android there’s a very good chance you can change it. On iOS everything is Apple’s way. And I get the logic there. Apple is big on having a very consistent user experience. But for someone like me it’s painfully boring. Everyone’s iOS devices look the same. So on one hand it means you are immediately comfortable using any iOS device, but on the other it’s like living in one of those neighborhoods where the builder used the same floor plan for every house. It’s soul-suckingly boring.
  • Speedfriend - Friday, March 29, 2019 - link

    I use a iPhone and Android daily, and despite benchmarks saying that my iPhone 7 is much faster than my pixel 2 XL, in reality it is slower, takes longer to log into new WiFi, kills apps in the background and takes far worse photos. Plus it is loaded with bloatware I can't even remove off the home screen and can't even rearrange the home screen with icons at the bottom.
  • Wardrive86 - Friday, March 29, 2019 - link

    This is absolutely true. My job always upgrades me to the latest Iphone and Ipad. After having multiple generations of Iphone, browser performance is not as good as benchmarks suggest. Personal and work are always on the same network either WiFi or Verizon.
  • GekkePrutser - Saturday, March 30, 2019 - link

    That's because Apple skimps so much on memory. They make great SoCs but their memory skimping hurts the overall experience by killing off apps in the background too much. Especially after one or two iOS updates it becomes really bad.
  • Irish910 - Saturday, March 30, 2019 - link

    That’s just a blatant lie. I used an iPhone 7 Plus for almost 2 years and the thing was hella fast. Using my XS Max I can barely see a speed difference under most circumstances. The only thing that might seem “faster” is the non animations of apps in android. iOS is much more fluid and smooth. But memory, chipset and software, the iPhone should be faster.
  • arayoflight - Saturday, March 30, 2019 - link

    That applies only to the US. The iPhones are much, much more expensive outside of US. In my country, the 128GB S10+ costs less than the base 64GB iPhone XR (yes, the XR). If you are going to get comparable, even the base XS max costs about 1.5x of the S10+, and comes with half the storage to boot.

    Not to mention that Apple phones don't work that well outside US as well. There are no ubiquitous Apple stores which fix your problems immediately, Apple maps doesn't work well, or siri with non-US accents. You can't disable or set defaults to google assistant or google maps or chrome as well, so good luck. Also, the rest of the world doesn't use imessage, but WhatsApp.

    iPhones are a much worse deal outside of US, They have excellent performance and displays yes, but they aren't excellent value for the atrocious prices you pay.
  • cha0z_ - Tuesday, April 9, 2019 - link

    This, when I got my (sadly exynos as EU) note 9 it was HALF the price of the XS max 256GB at my carrier both and with deal. I literally could take two note 9 instead of a single xs max 256GB. Even if we argue that the xs max is a better phone (tho in reality it has it's + and - compared to the note 9), is it two times the price better? Had the money to buy both, but tbh I like android generally more. Tho I must admit that the iphones are a lot a lot smoother... got iphone 6s too and it's smoother than the note 9 and that's not exactly making me happy. :D
  • id4andrei - Saturday, March 30, 2019 - link

    You keep saying Android's security problems like it's an axiom. You're just as safe with a high end Android device like you are with an iphone. Android does not have ads. Tracking can be disabled or enabled with as much ease as on ios.

    Stop spreading bullshit. You are tracked and monetized on ios via 3rd parties just like on Android. Ios gathers data about you just like Android.
