Machine Learning Inference Performance

AIMark 3

AIMark makes use of various vendor SDKs to implement the benchmarks. This means that the end results aren’t a proper apples-to-apples comparison; however, it represents an approach that some vendors will actually use in their in-house applications, as will the occasional third-party app.

[Charts: Master Lu (鲁大师) AIMark 3 - InceptionV3, ResNet34, MobileNet-SSD, DeepLabV3]

In AIMark 3, the benchmark uses each vendor’s proprietary SDK in order to accelerate the NN workloads as optimally as possible. For Qualcomm’s devices, this means the benchmark is seemingly also able to take advantage of the new Tensor cores. Here, the performance improvement of the new Snapdragon 865 is outstanding, at 2-3x the performance of its predecessor.
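A "2-3x" figure is a ratio between two results, and which way round the ratio goes depends on the metric. A minimal sketch, assuming AIMark reports scores where higher is better and that latency figures (such as the AIBenchmark inference times) are in milliseconds where lower is better; the helper name `speedup` and the numbers in the usage lines are hypothetical, not values from the charts:

```python
def speedup(old: float, new: float, higher_is_better: bool = True) -> float:
    """Return how many times faster the `new` result is than the `old` one.

    For scores (higher is better) the ratio is new / old; for inference
    times (lower is better) it is old / new.
    """
    if old <= 0 or new <= 0:
        raise ValueError("results must be positive")
    return new / old if higher_is_better else old / new


# Hypothetical values, not taken from the charts above.
print(speedup(1000.0, 2500.0))                      # score-based: 2.5
print(speedup(30.0, 12.0, higher_is_better=False))  # time-based: 2.5
```

Keeping the direction explicit avoids accidentally reporting the inverse of a speedup when mixing score-based and time-based benchmarks.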

AIBenchmark 3

AIBenchmark takes a different approach to benchmarking. Here the test uses the hardware-agnostic NNAPI to accelerate inferencing, meaning it doesn’t use any proprietary aspects of the given hardware beyond the drivers that provide the abstraction between software and hardware. This approach is more apples-to-apples, but it also means that we can’t do cross-platform comparisons such as testing iPhones, since NNAPI is an Android API.

We’re publishing one-shot inference times. Compared with sustained-performance inference times, these figures include more timing overhead from the software stack, covering everything from initialising the test to actually executing the computation.
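To make the distinction concrete, here is a minimal sketch of how one-shot and sustained figures could both be derived from the same series of runs. It assumes a hypothetical `run_inference` callable standing in for a single inference pass: the first measured run carries whatever one-time setup the software stack performs, while the sustained figure averages the runs after a warm-up. This is an illustration, not AIBenchmark’s actual methodology.

```python
import time
from statistics import mean
from typing import Callable, List, Tuple


def time_runs(run_inference: Callable[[], object], runs: int) -> List[float]:
    """Time `runs` consecutive calls of `run_inference`, in milliseconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def one_shot_vs_sustained(latencies_ms: List[float], warmup: int = 1) -> Tuple[float, float]:
    """Return (one-shot, sustained) latency.

    The one-shot figure is the first run, including any first-use overhead;
    the sustained figure is the mean of the runs after `warmup` discarded runs.
    """
    if len(latencies_ms) <= warmup:
        raise ValueError("need more runs than warm-up runs")
    return latencies_ms[0], mean(latencies_ms[warmup:])


# Hypothetical latencies: the first run pays the setup cost.
print(one_shot_vs_sustained([48.0, 20.0, 21.0, 19.0]))  # (48.0, 20.0)
```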

AIBenchmark 3 - NNAPI CPU

We’re separating the AIBenchmark scores by execution block, starting with the regular CPU workloads, which simply use the TensorFlow libraries and do not attempt to run on specialized hardware blocks.

[Charts: AIBenchmark 3 NNAPI CPU - 1 The Life (FP), 2 Zoo (FP), 3 Pioneers (INT), 4 Let's Play (FP), 7 Ms. Universe (FP), 7 Ms. Universe (INT), 8 Blur iT! (FP)]

Starting with the CPU-accelerated benchmarks, we see some large improvements on the Snapdragon 865. The FP workloads in particular show big performance increases, and these are likely linked to the microarchitectural improvements of the Cortex-A77.

AIBenchmark 3 - NNAPI INT8

[Charts: AIBenchmark 3 NNAPI INT8 - 1 The Life, 2 Zoo, 3 Pioneers, 5 Masterpiece, 6 Cartoons]

INT8 workload acceleration in AI Benchmark happens on the HVX cores of the DSP rather than on the Tensor cores, which the benchmark currently doesn’t support. The performance increases here are roughly in line with what we’d expect from iterative clock frequency increases of the IP block.
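Whether a generational gain is explained by clock frequency alone can be checked by dividing the measured speedup by the clock ratio. A minimal sketch; the function name is hypothetical, and so are the frequencies in the example, since the HVX clocks aren’t given here:

```python
def per_clock_gain(speedup: float, old_clock_mhz: float, new_clock_mhz: float) -> float:
    """Speedup remaining after dividing out the clock-frequency increase.

    A result near 1.0 means the clock bump alone explains the gain; values
    clearly above 1.0 point to improvements beyond frequency.
    """
    if old_clock_mhz <= 0 or new_clock_mhz <= 0:
        raise ValueError("clock frequencies must be positive")
    return speedup / (new_clock_mhz / old_clock_mhz)


# Hypothetical: a 20% clock increase and a 1.2x measured speedup.
print(per_clock_gain(1.2, 1000.0, 1200.0))  # about 1.0
```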

AIBenchmark 3 - NNAPI FP16

[Charts: AIBenchmark 3 NNAPI FP16 - 1 The Life, 2 Zoo, 3 Pioneers, 5 Masterpiece, 6 Cartoons, 9 Berlin Driving, 10 WESPE-dn]

FP16 acceleration on the Snapdragon 865 through NNAPI is likely handled by the GPU, and we’re seeing iterative improvements in the scores. Huawei’s Mate 30 Pro leads in the vast majority of the tests, as it’s able to make use of its NPU, which supports FP16 acceleration; its performance here is significantly ahead of the Qualcomm chipsets.
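For readers unfamiliar with the formats: FP16 (IEEE 754 half precision) stores each value in 2 bytes instead of FP32’s 4, halving storage per value at the cost of precision. Python’s standard `struct` module can pack both formats (`'e'` for half and `'f'` for single precision), so a round trip shows the precision FP16 gives up. This only illustrates the number formats; it says nothing about how the GPU or NPU actually executes a network.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Pack `value` in the given IEEE 754 format and unpack it again.

    'e' is half precision (FP16) and 'f' is single precision (FP32).
    """
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


print(struct.calcsize("e"), struct.calcsize("f"))  # 2 4 (bytes per value)
print(round_trip(0.1, "e"))     # 0.0999755859375
print(round_trip(4097.0, "e"))  # 4096.0: integers above 2048 are no longer all exact
print(round_trip(4097.0, "f"))  # 4097.0
```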

AIBenchmark 3 - NNAPI FP32

[Chart: AIBenchmark 3 NNAPI FP32 - 10 WESPE-dn]

Finally, the FP32 test should be accelerated by the GPU. Oddly enough, here the QRD865 doesn’t fare as well as some of the best Snapdragon 855 devices. Note that today’s results are based on an early software stack for the Snapdragon 865; it’s possible, even very likely, that things will improve over the coming months and that results will differ on commercial devices.

Overall, AI benchmarks again present a conundrum for us: the tests need to be continuously developed in order to properly support the hardware. The test currently doesn’t make use of the Snapdragon 865’s Tensor cores, so it can’t showcase one of the chipset’s biggest areas of improvement. In that sense, the benchmarks don’t mean very much, and the true power of the chipset will only be exhibited by first-party applications, such as the camera apps, of the upcoming Snapdragon 865 devices.


178 Comments


  • Andrei Frumusanu - Monday, December 16, 2019

    You forgot I'm a member of the Illuminati, half mole-people from my dad's side and half lizard-man from my mother's side. I love my monthly deep state paycheck alongside the Apple subsidies I get for spreading their narrative. Wait till people find out the earth is really flat.
  • Quantumz0d - Monday, December 16, 2019

    LOL. Lawyer manipulation is for their Class Actions, the KB fiasco, Touch Disease, Error 53... not you (just clarifying), and idk if you know Louis Rossman on YT. If not, I suggest watching to learn how the fleecing is done and how the consumer is always kept in the dark. The revelations of their stranglehold on the supply of HW IC chips to repair services and their lobbying against repair are enough to understand and gauge the fundamental pillars of a company and its ethics.

    Sorry I take ethics and choice/liberty into account over utopian performance and elitist / Luxury status quo stance.
  • Andrei Frumusanu - Monday, December 16, 2019

    I pleaded with you to not go into tangential rants for this article again, yet here we are.
  • Andrei Frumusanu - Monday, December 16, 2019

    > How? Just like Geekbench, different compilers are used. Different distribution of loads are made.

    Please explain to me what the hell "different distributions of loads are made" is meant to mean? You have zero technical rationale behind such statements. All the comparisons here were made with the Clang/LLVM compilers on all platforms - bar the ISA, there is exactly zero difference in the workload logic between the platforms, and Apple's toolchain isn't doing something completely different either that it would suddenly invalidate the comparison.

    > You are showing Apple A13 (LOL A13 is faster than the fastest AMD or Intel chip) using Jurassic Spec benchmark?

    Yes I am because that is the reality of the matter.

    > We are talking about efficiency here, your beloved Apple chip is sucking twice the power than SD855 or SD865 per workload.

    And it's finishing the workload than twice as fast, ending up being *almost* as efficient in terms of the energy used by the computation. What matters here is the energy efficiency, not the power efficiency, and in this regard Apple's devices are top of the line.

    > While your chart if showing Apple has twice the performance vs SD865, the phone doesn't tell lies.

    What's even your point here? Of course the iPhones are significantly faster in loading webpages?

    Return here when you have an actual factual argument to present, because right now you've just been repeating complete nonsense.
  • joms_us - Monday, December 16, 2019

    > Please explain to me what the hell "different distributions of loads are made" is meant to mean? You have zero technical rationale behind such statements. All the comparisons here were made with the Clang/LLVM compilers on all platforms - bar the ISA, there is exactly zero difference in the workload logic between the platforms, and Apple's toolchain isn't doing something completely different either that it would suddenly invalidate the comparison.

    The compiler may be the same but the scheduler of tasks in Android and Windows are different than in iOS. Many background apps are running simultaneously on Android and Windows machines, how about iOS? Frozen apps? LOL

    >Yes I am because that is the reality of the matter.

    Only matters to you, not in the outside world. If you really think A9 has better IPC than Ryzen or Skylake, why don't you join the Apple engineers and build the fastest gaming/productivity PC with an Apple A9 chip and sell it like hotcakes? No? Cannot be? Even Apple doesn't claim their SoC is faster than even a low-end desktop today LOL. They're even milking customers with overpriced Macs with "Intel" inside.

    > And it's finishing the workload than twice as fast, ending up being *almost* as efficient in terms of the energy used by the computation. What matters here is the energy efficiency, not the power efficiency, and in this regard Apple's devices are top of the line.

    What matters is how fast it can finish the whole task not each micro-workload nonsense. If I want to zip and upload a file or encode and upload a video, I only care how fast it will finish the whole task. If I want to play games, do I care how fast the damn phone will compute the vector, pixel location, math operations etc? I only care how elegant, smooth and fast the gaming experience will be.

    iPhone is not twice as fast at loading any web page, any consumer app or even exporting or transcoding videos. Different apps yield different results; you are showing one worthless primitive benchmark where iPhone is fast, but out there, hundreds or thousands of different apps and websites are showing the opposite results.

    Here is one or two for you, one is showing twice the performance over the other =D

    https://youtu.be/ay9V5Ec8eiY?t=529

    https://youtu.be/DtSgdrKztGk?t=432
  • Andrei Frumusanu - Monday, December 16, 2019

    > the scheduler of tasks in Android and Windows are different than in iOS.

    The scheduler isn't any different, because the scheduler doesn't do anything when there's only a single thread on a core to be run. There is literally no scheduling.

    > If you really think A9 has better IPC than Ryzen or Skylake

    Correction, I don't really just think it, I know it.

    > What matters is how fast it can finish the whole task not each micro-workload nonsense.

    The whole SPEC suite takes exactly an hour to complete, so quit with the micro nonsense if you have no idea what's even being tested here.

    > Here is one or two for you, one is showing twice the performance over the other =D

    Both phones don't even use the freaking CPU when transcoding videos - they're both offloaded using the dedicated fixed function video encoders much like you can offload encoding on desktop PCs to your GPU's encoders, instead of doing it inefficiently on the CPU.

    You have absolutely ZERO understanding of what's going on here.
  • joms_us - Monday, December 16, 2019

    > The scheduler isn't any different, because the scheduler doesn't do anything when there's only a single thread on a core to be run. There is literally no scheduling.

    Then the SoC is not maximized but underperforming.

    > Correction, I don't really just think it, I know it.

    Sure you do, now where is the fastest processor on this planet? Where is our A9-powered gaming PC LOL.

    > The whole SPEC suite takes exactly an hour to complete, so quit with the micro nonsense if you have no idea what's even being tested here.

    Just goes to show how primitive your tool is. 2020 is just around the corner, here you are still using a 2006 tool. This is like claiming Wolfdale is faster than Ryzen because it can finish 1M SuperPI faster LOL.
  • Dug - Monday, December 16, 2019

    You really don't have any argument because you really aren't sure what you are talking about.
  • joms_us - Monday, December 16, 2019

    Am I or you? Isn't it clear that SPEC results do not translate to the real world? Where is the double performance as shown here? Show us proof that iPhone has twice the performance, I've posted links with two Android phones decimating iPhone 11.

    Sure, you can claim all day long that iPhone is the fastest phone via SPEC LOL; I'd rather see it translate to actual performance, not imaginary numbers.
  • cha0z_ - Monday, December 23, 2019

    You clearly have no idea what you are talking about. Dunno why Andrei dedicated so much of his time trying to explain to you in primitive language what's going on (so you can understand).
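The energy-versus-power point raised in the thread comes down to E = P × t: a chip that draws more power but finishes sooner can use about the same energy for the whole job. A minimal sketch with hypothetical figures, not measurements from this article:

```python
def energy_joules(avg_power_w: float, runtime_s: float) -> float:
    """Energy used by a workload: average power (W) multiplied by runtime (s)."""
    return avg_power_w * runtime_s


# Hypothetical: twice the power draw, finishing in a bit more than half the time.
baseline = energy_joules(avg_power_w=2.0, runtime_s=10.0)
faster = energy_joules(avg_power_w=4.0, runtime_s=5.5)
print(baseline, faster)   # 20.0 22.0
print(faster / baseline)  # energy ratio, about 1.1
```

In this example the faster chip draws twice the power yet uses only about 10% more energy for the job, which is why energy per workload, rather than instantaneous power, is the figure that matters for efficiency.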
