Machine Learning Inference Performance

AIMark 3

AIMark makes use of various vendor SDKs to implement the benchmarks. This means the end results aren’t a true apples-to-apples comparison; however, it represents an approach that some vendors will actually use in their in-house applications, and even some rare third-party apps.

[Charts: 鲁大师 / Master Lu AIMark 3 results for InceptionV3, ResNet34, MobileNet-SSD and DeepLabV3]

In AIMark 3, the benchmark uses each vendor’s proprietary SDK in order to accelerate the NN workloads as optimally as possible. On Qualcomm’s devices, this seemingly means the benchmark is also able to take advantage of the new Tensor cores. Here, the performance improvement of the new Snapdragon 865 is outstanding, posting 2-3x the performance of its predecessor.

AIBenchmark 3

AIBenchmark takes a different approach to benchmarking. Here the test uses the hardware-agnostic NNAPI in order to accelerate inferencing, meaning it doesn’t use any proprietary aspects of the given hardware except for the drivers that actually enable the abstraction between software and hardware. This approach is more apples-to-apples, but it also means that we can’t do cross-platform comparisons, such as testing iPhones.
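To make the driver abstraction more concrete, here is a deliberately simplified Python sketch of the idea behind NNAPI-style dispatch: each vendor driver advertises which operations and data types it can execute, and the runtime places every operation on the most preferred driver that supports it, falling back to the CPU otherwise. The driver names, preference ranks and supported-operation sets below are hypothetical, and the real NNAPI runtime relies on driver-reported capabilities and its own partitioning logic rather than this exact rule.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple

# An (operation, data type) pair, e.g. ("conv2d", "int8"); labels are hypothetical.
OpKey = Tuple[str, str]


@dataclass
class Driver:
    """Hypothetical vendor driver: a name, a preference rank (lower is
    preferred) and the set of (op, dtype) pairs it claims to support."""
    name: str
    rank: int
    supported: FrozenSet[OpKey] = field(default_factory=frozenset)


def partition(model: List[OpKey], drivers: List[Driver]) -> List[str]:
    """Assign each op to the most preferred driver that supports it,
    falling back to "cpu" when no accelerator driver claims the op."""
    ordered = sorted(drivers, key=lambda d: d.rank)
    placement = []
    for op in model:
        device = next((d.name for d in ordered if op in d.supported), "cpu")
        placement.append(device)
    return placement


if __name__ == "__main__":
    drivers = [
        Driver("dsp", rank=0, supported=frozenset({("conv2d", "int8"), ("relu", "int8")})),
        Driver("gpu", rank=1, supported=frozenset({("conv2d", "fp16"), ("relu", "fp16")})),
    ]
    model = [("conv2d", "int8"), ("relu", "int8"), ("softmax", "int8")]
    print(partition(model, drivers))  # ['dsp', 'dsp', 'cpu']
```

With these made-up capabilities, the two INT8 operations land on the "dsp" driver while the unsupported softmax falls back to the CPU. The same model can therefore end up on different execution blocks depending on what each chipset's drivers expose, which is why the results below are segregated by execution block.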

We’re publishing one-shot inference times. Compared to sustained inference times, these figures include more overhead from the software stack, covering everything from initialising the test to actually executing the computation.
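The difference between one-shot and sustained figures can be illustrated with a small timing harness. This is a hypothetical Python sketch, not AIBenchmark's code: `setup` stands in for whatever initialisation the software stack performs (loading and compiling the model, for instance), and the fake workload's 50 ms setup and 5 ms per-inference costs are made-up numbers chosen only to show the effect.

```python
import time
from typing import Callable

# A "setup" callable performs the one-time work (e.g. loading/compiling a model)
# and returns a "run" callable that performs a single inference.
Setup = Callable[[], Callable[[], None]]


def one_shot_ms(setup: Setup) -> float:
    """Wall-clock time of setup plus a single inference, in milliseconds."""
    start = time.perf_counter()
    run = setup()
    run()
    return (time.perf_counter() - start) * 1000.0


def sustained_ms(setup: Setup, warmup: int = 3, iterations: int = 20) -> float:
    """Mean time per inference after setup and warm-up runs, in milliseconds."""
    run = setup()  # setup cost is excluded from the measurement
    for _ in range(warmup):
        run()
    start = time.perf_counter()
    for _ in range(iterations):
        run()
    return (time.perf_counter() - start) * 1000.0 / iterations


def fake_setup() -> Callable[[], None]:
    """Hypothetical workload: about 50 ms of initialisation, 5 ms per inference."""
    time.sleep(0.050)
    return lambda: time.sleep(0.005)


if __name__ == "__main__":
    print(f"one-shot:  {one_shot_ms(fake_setup):.1f} ms")
    print(f"sustained: {sustained_ms(fake_setup):.1f} ms")
```

Because the one-shot measurement includes the setup cost while the sustained measurement excludes it (and discards warm-up runs), the one-shot figure comes out much larger for the same workload.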

AIBenchmark 3 - NNAPI CPU

We’re segregating the AIBenchmark scores by execution block, starting off with the regular CPU workloads that simply use TensorFlow libraries and do not attempt to run on specialized hardware blocks.

[Charts: AIBenchmark 3 CPU results for 1 - The Life (FP), 2 - Zoo (FP), 3 - Pioneers (INT), 4 - Let's Play (FP), 7 - Ms. Universe (FP and INT), 8 - Blur iT! (FP)]

Starting off with the CPU-accelerated benchmarks, we’re seeing some large improvements on the Snapdragon 865. The FP workloads in particular see big performance increases, which are likely linked to the microarchitectural improvements of the Cortex-A77.

AIBenchmark 3 - NNAPI INT8

[Charts: AIBenchmark 3 INT8 results for 1 - The Life, 2 - Zoo, 3 - Pioneers, 5 - Masterpiece, 6 - Cartoons]

INT8 workload acceleration in AIBenchmark happens on the HVX cores of the DSP rather than on the Tensor cores, which the benchmark currently doesn’t support. The performance increases here are roughly in line with what we’d expect from the iterative clock frequency increases of the IP block.
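Put as a formula, under the simplifying assumption (ours, not a measured fact) that the DSP's work per clock cycle is unchanged between generations, inference time t scales inversely with the block's clock frequency f, so the expected speedup is simply the ratio of the clocks:

```latex
% Assumption: unchanged per-clock throughput of the DSP block between generations
t \propto \frac{1}{f}
\quad\Longrightarrow\quad
\text{speedup} = \frac{t_{\text{S855}}}{t_{\text{S865}}} \approx \frac{f_{\text{S865}}}{f_{\text{S855}}}
```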

AIBenchmark 3 - NNAPI FP16

[Charts: AIBenchmark 3 FP16 results for 1 - The Life, 2 - Zoo, 3 - Pioneers, 5 - Masterpiece, 6 - Cartoons, 9 - Berlin Driving, 10 - WESPE-dn]

FP16 acceleration on the Snapdragon 865 through NNAPI is likely handled by the GPU, and we’re seeing iterative improvements in the scores. Huawei’s Mate 30 Pro leads in the vast majority of the tests, as it’s able to make use of its NPU, which supports FP16 acceleration; its performance here is significantly ahead of the Qualcomm chipsets.

AIBenchmark 3 - NNAPI FP32

[Chart: AIBenchmark 3 FP32 result for 10 - WESPE-dn]

Finally, the FP32 test should be accelerated by the GPU. Oddly enough, the QRD865 doesn’t fare as well here as some of the best S855 devices. Note that today’s results are based on an early software stack for the S865; it’s possible, even very likely, that things will improve over the coming months and that the results will be different on commercial devices.

Overall, AI benchmarks again present a conundrum for us today: the tests need to be continuously developed in order to properly support the hardware. AIBenchmark currently doesn’t make use of the Snapdragon 865’s Tensor cores, so it can’t showcase one of the chipset’s biggest areas of improvement. In that sense, the benchmark numbers don’t mean very much, and the true power of the chipset will only be exhibited by first-party applications, such as the camera apps, of the upcoming Snapdragon 865 devices.


178 Comments

  • jospoortvliet - Monday, December 16, 2019

    The best Snapdragon can barely keep up with the A11, as Andrei points out in his analysis. YouTube speed tests are by far the most useless and pointless benchmarks ever devised, which is why not a single reputable source (like AnandTech) ever uses them...

    Sorry, but the only question here is how much faster the A14 will be. 40%, 50% or even more...
  • Kishoreshack - Monday, December 16, 2019

    Why doesn't Qualcomm simply increase its die size and use a larger die properly, to at least come closer to Apple?
    Maybe it needs more than a larger die size;
    it needs a better architecture.
    Arm or Qualcomm, whom to blame?
  • eastcoast_pete - Monday, December 16, 2019

    A key problem for smartphones is the power budget. These SoCs are already pushing 5 W and up when running at full tilt, so even a nicely sized battery (5000 mAh) can be drained in 3-4 hours at most if someone runs them that way. Apple has managed to accommodate high peak/burst performance while still getting good overall power usage, and I still find their battery life wanting.
  • Quantumz0d - Monday, December 16, 2019

    Why do they need to? Apple is only Apple and it only works for them.

    If you watch real-world speed tests on YouTube, see how the OP7 Pro flies through the tasks, giving the user a faster and smoother experience.

    And go to the ScyllaDB website and see how AWS Graviton 2 stacks up against Intel in benchmarks, and how they mention that benchmarks alone should not be taken as a measure.

    Apple's OS lacks a filesystem. It can never be a computer. iOS is a kid-friendly OS. You can't even fucking change the launcher / icons, forget other system-level changes.

    Qcomm needs competition from MediaTek, Exynos and Huawei HiSilicon, but except for Exynos all are garbage because they do not let us unlock bootloaders. And Android phones see community-driven ROMs; there is so much choice, and even the DAPs from 200 USD to 3000 USD have Qcomm technology.

    Repairing is also easier due to the HW boxes which can bring a QComm9008 brick back to life, whereas with Apple it's a ball-and-chain ecosystem.

    I see my SD835 run like butter through everything I throw at it, and it has an SD slot too.

    This stupid white-knighting of Apple processors beating x86, and of their use case versus Android phones, is a big sham. People need to realize benchmarks are not the only factor when you compare processors across OSes.
  • jospoortvliet - Monday, December 16, 2019

    A 1995 computer running MS-DOS 6.0 is also butter smooth; I hope you don't think that means an Intel 486 DX4 is faster than an Apple chip.

    Please stop with your nonsense about "real world tests". In the real world your 835 has a slower CPU, GPU and storage. That doesn't mean it is garbage; it is fine that you are happy with it, but it is not your duty to defend the honor of Oppo against facts. I don't want an iPhone either, due to their walled garden, but that doesn't mean I live under the delusion that my brand new Galaxy S10e is anything other than at least 40% slower and twice as inefficient as an iPhone 11...
  • cha0z_ - Friday, December 27, 2019

    Coming from the Exynos 9810 Note 9 to the iPhone 11 Pro Max... the SoC on the iPhone is literally several times faster and more efficient than the Exynos. The difference is absurdly big, and people still call Apple slower because of design choices (like the slow animations, etc.). It's super smooth in all conditions and at all times, plus it's rofl fast in any app/game (not to mention apps have functions not available on Android). Good luck running full PC Civilization 6 on Android with decent performance later in the game on a bigger map and with decent battery life. There is a reason why the game was not ported to Android (and not only piracy): it would run poorly even on most high-end current-gen Android phones.
  • ksec - Monday, December 16, 2019

    They could, but are you going to pay for it? Let's say Qualcomm has to charge $50 more (inclusive of their profits) to reach the same level of performance; as a consumer, you will have to pay roughly $100 more.

    In a cut-throat Android market, who is going to risk raising their smartphone's price by $100?

    There is a reason why Samsung and Huawei are trying to make SoCs themselves: instead of putting those profits into Qualcomm's hands, they want that money to go towards more die space to better differentiate their products and compete with Apple.

    Now here is another question: how many consumers will notice the difference in CPU speed? And how many consumers will notice the difference in modem quality?

    It's all a set of trade-offs, not only in engineering, but also in cost, markets, risk... etc...
  • jospoortvliet - Monday, December 16, 2019

    It is a matter of cost. Arm could design a CPU core that is 4 times the size of the A76 and 50% faster, catching up to Apple. But that would cost a lot of die size and thus money... for high-margin, high-cost devices it is OK, but not for cheap ones. Ape can afford this...
  • jospoortvliet - Monday, December 16, 2019

    Ape - I mean apple of course!
  • cha0z_ - Friday, December 27, 2019

    It's not as simple as putting a lot of transistors in it. You can somewhat tackle the problem that way, but by itself it will not lead to the desired end result. I can elaborate, but it will be a lengthy and highly technical post.
