The Snapdragon 855 Phone Roundup: Searching for the Best Implementations
by Andrei Frumusanu on September 5, 2019 8:30 AM EST
Machine Learning Inference Performance
AIMark makes use of various vendor SDKs to implement the benchmarks. This means that the end results really aren’t a proper apples-to-apples comparison. However, the benchmark represents an approach that will actually be used by some vendors in their in-house applications, or even by the rare third-party app.
Summarizing the four sub-tests of AIMark 3, we see a few clear outliers: the OnePlus 7 Pro and Xiaomi’s Black Shark 2 lack the Qualcomm driver libraries on which the benchmark relies (or they’re broken), and the application crashes. Another outlier is the G8: here the phone also lacked the libraries for hardware acceleration, but the app at least fell back to CPU execution of the workloads, albeit at a massive performance penalty.
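The difference between the crashing devices and the G8 comes down to whether the application handles a missing acceleration library gracefully. As a rough illustration only, the fallback behaviour can be sketched as below. This is not AIMark's code: `BackendUnavailable`, `run_with_fallback`, and the backend names are all hypothetical, made up for this sketch.

```python
class BackendUnavailable(Exception):
    """Hypothetical error raised when a backend's driver library is missing or broken."""


def run_with_fallback(workload, backends):
    """Try each (name, runner) pair in order and return (name, result)
    from the first backend that does not report itself unavailable."""
    errors = []
    for name, runner in backends:
        try:
            return name, runner(workload)
        except BackendUnavailable as exc:
            # Remember why this backend was skipped, then try the next one.
            errors.append(f"{name}: {exc}")
    # No backend worked: fail with a readable error instead of crashing blindly.
    raise RuntimeError("no usable backend: " + "; ".join(errors))
```

An application that lists a CPU runner last in `backends` would behave like the G8 did here, while one that only lists the accelerated backend has nothing to fall back to.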
We notably see that AIMark has been able to implement Samsung’s NPU SDK as we’re seeing evident hardware acceleration. Huawei’s P30 Pro also makes use of its NPU here via its proprietary SDK. Naturally, Apple’s iPhone XS uses the CoreML framework to accelerate the AI workloads.
Overall, the Snapdragon 855 devices with the latest SDKs and frameworks seem to compete extremely well here, leading the pack across all the different sub-tests.
AIBenchmark takes a different approach to benchmarking. Here the test uses the hardware-agnostic NNAPI to accelerate inferencing, meaning it doesn’t use any proprietary aspects of a given piece of hardware except for the drivers that actually enable the abstraction between software and hardware. This approach is more apples-to-apples, but also means that we can’t do cross-platform comparisons, like testing iPhones.
We’re publishing one-shot inference times. These differ from sustained-performance inference times in that the figures include more timing overhead from the software stack, from initialising the test to actually executing the computation.
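To make the distinction concrete, here is a minimal sketch of how the two kinds of figure could be measured. It is not how AIBenchmark itself is implemented: `measure_latency`, `setup`, `run_inference`, and the warm-up and run counts are hypothetical names and values chosen for illustration.

```python
import time
from statistics import median


def measure_latency(run_inference, setup, warmup_runs=3, timed_runs=10,
                    clock=time.perf_counter):
    """Return (one_shot_seconds, sustained_seconds) for a workload."""
    # One-shot: the timer covers initialisation plus the very first inference.
    start = clock()
    state = setup()
    run_inference(state)
    one_shot = clock() - start

    # Sustained: warm up first, then time only the repeated inference calls.
    for _ in range(warmup_runs):
        run_inference(state)
    samples = []
    for _ in range(timed_runs):
        t0 = clock()
        run_inference(state)
        samples.append(clock() - t0)

    # The median keeps a single slow run from skewing the sustained figure.
    return one_shot, median(samples)
```

With this structure, any cost paid in `setup` or on the first call shows up only in the one-shot figure, which is why one-shot times are more sensitive to the software stack.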
AIBenchmark 3 - NNAPI CPU
We’re segregating the AIBenchmark scores by execution block, starting off with the regular CPU workloads that simply use TensorFlow libraries and do not attempt to run on specialized hardware blocks.
We’re seeing largely regular results here, although some observations pop up again, such as the Black Shark 2 posting very conservative results in some subtests. The other big outlier is the OPPO Reno 10x, which consistently performs better than the rest of the pack. This is very interesting, particularly because the phone isn’t actually able to use NNAPI acceleration in the tests we’re covering next. This means that the “plain” TensorFlow libraries the OPPO is making use of are performing better than what’s employed on the rest of the devices.
AIBenchmark 3 - NNAPI INT8
INT8 performance is dominated by the Snapdragon 855 devices, and this is thanks to the vector processing units of the Hexagon DSP. Test 3-6 stands out for some devices, and it’s likely that this is due to discrepancies in the NNAPI drivers which don’t fully accelerate things on the hardware for the S10, Mi9 and G8.
AIBenchmark 3 - NNAPI FP16
FP16 performance is again quite even across the board, with the difference coming down to DVFS and scheduler responsiveness. Here the Snapdragon 855 competes neck-and-neck with the Kirin 980, winning some tests while losing others. Keep in mind that the Kirin 980’s NPU doesn’t support INT8 acceleration at this time, and that’s why it shows up better in the FP16 benchmarks.
AIBenchmark 3 - NNAPI FP32
Finally, on the FP32 benchmark, Qualcomm accelerates these workloads on the GPU and is far ahead of the competition, which either has deficient GPU drivers or has to fall back to CPU execution.