Machine Learning Inference Performance

AIMark 3

AIMark uses various vendor SDKs to implement its benchmarks. This means that the end results aren’t a proper apples-to-apples comparison; however, they do represent an approach that some vendors will actually use in their in-house applications, or even in the rare third-party app.

[Charts: 鲁大师 / Master Lu - AIMark 3: InceptionV3, ResNet34, MobileNet-SSD, DeepLabV3]

In AIMark 3, the benchmark uses each vendor’s proprietary SDK to accelerate the NN workloads as optimally as possible. For Qualcomm’s devices, this seemingly means that the benchmark is also able to take advantage of the new Tensor cores. Here, the performance improvements of the new Snapdragon 865 chip are outstanding, posting 2-3x the performance of its predecessor.

AIBenchmark 3

AIBenchmark takes a different approach to benchmarking. Here the test uses the hardware-agnostic NNAPI to accelerate inferencing, meaning it doesn’t use any proprietary aspects of a given piece of hardware except for the drivers that actually enable the abstraction between software and hardware. This approach is more apples-to-apples, but it also means that we can’t do cross-platform comparisons, such as testing iPhones.

We’re publishing one-shot inference times. These differ from sustained-performance inference times in that the figures include more timing overhead from the software stack, covering everything from initialising the test to actually executing the computation.
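To make the one-shot versus sustained distinction concrete, here is a minimal Python sketch of how the two figures could be measured. It is not how AIBenchmark itself works: the function name `measure_latency` and the `load_model` and `run_inference` callables are hypothetical stand-ins for whatever a real test harness would provide.

```python
import time
from statistics import median


def measure_latency(load_model, run_inference, warm_runs=20,
                    clock=time.perf_counter):
    """Return (one_shot_seconds, sustained_seconds) for a workload.

    load_model and run_inference are hypothetical callables supplied by
    the caller; they are assumptions for illustration only.
    """
    # One-shot: the timer covers initialisation plus the first inference,
    # so software-stack setup overhead is included in the figure.
    start = clock()
    model = load_model()
    run_inference(model)
    one_shot = clock() - start

    # Sustained: the model is already initialised, so each sample covers
    # only the inference call itself. The median damps outliers.
    samples = []
    for _ in range(warm_runs):
        t0 = clock()
        run_inference(model)
        samples.append(clock() - t0)

    return one_shot, median(samples)


if __name__ == "__main__":
    # Toy stand-in workload: "loading" is slow, "inference" is fast.
    def load_model():
        time.sleep(0.05)
        return object()

    def run_inference(model):
        time.sleep(0.005)

    one_shot, sustained = measure_latency(load_model, run_inference, warm_runs=5)
    print(f"one-shot: {one_shot * 1000:.1f} ms, sustained: {sustained * 1000:.1f} ms")
```

With a workload like the toy one above, the one-shot figure comes out larger than the sustained one because it absorbs the setup cost, which is the overhead the published numbers include.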

AIBenchmark 3 - NNAPI CPU

We’re segregating the AIBenchmark scores by execution block, starting off with the regular CPU workloads that simply use TensorFlow libraries and do not attempt to run on specialized hardware blocks.

[Charts: AIBenchmark 3 - CPU: 1 - The Life (FP), 2 - Zoo (FP), 3 - Pioneers (INT), 4 - Let's Play (FP), 7 - Ms. Universe (FP), 7 - Ms. Universe (INT), 8 - Blur iT! (FP)]

Starting off with the CPU-accelerated benchmarks, we’re seeing some large improvements from the Snapdragon 865. It’s particularly the FP workloads that are seeing big performance increases, and these improvements are likely linked to the microarchitectural improvements of the A77.

AIBenchmark 3 - NNAPI INT8

[Charts: AIBenchmark 3 - INT8: 1 - The Life, 2 - Zoo, 3 - Pioneers, 5 - Masterpiece, 6 - Cartoons]

INT8 workload acceleration in AIBenchmark happens on the HVX cores of the DSP rather than the Tensor cores, which the benchmark currently doesn’t support. The performance increases here are relatively in line with what we expect from iterative clock frequency increases of the IP block.

AIBenchmark 3 - NNAPI FP16

[Charts: AIBenchmark 3 - FP16: 1 - The Life, 2 - Zoo, 3 - Pioneers, 5 - Masterpiece, 6 - Cartoons, 9 - Berlin Driving, 10 - WESPE-dn]

FP16 acceleration on the Snapdragon 865 through NNAPI is likely facilitated through the GPU, and we’re seeing iterative improvements in the scores. Huawei’s Mate 30 Pro leads in the vast majority of the tests, as it’s able to make use of its NPU, which supports FP16 acceleration, and its performance here is significantly ahead of the Qualcomm chipsets.

AIBenchmark 3 - NNAPI FP32

[Chart: AIBenchmark 3 - 10 - WESPE-dn - FP32]

Finally, the FP32 test should be accelerated by the GPU. Oddly enough, the QRD865 doesn’t fare as well here as some of the best S855 devices. Note that today’s results were based on an early software stack for the S865; it’s possible and even very likely that things will improve over the coming months, and that the results will be different on commercial devices.

Overall, there’s again a conundrum for us with regard to AI benchmarks today: the tests need to be continuously developed in order to properly support the hardware. The test currently doesn’t make use of the Tensor cores of the Snapdragon 865, so it’s not able to showcase one of the biggest areas of improvement for the chipset. In that sense, benchmarks don’t really mean very much, and the true power of the chipset will only be exhibited by first-party applications, such as the camera apps, of the upcoming Snapdragon 865 devices.

178 Comments

  • joms_us - Monday, December 16, 2019 - link

    Right, he even claimed a 2015 Apple A9 is faster than Skylake and Ryzen processors today. Only a complete !Diot will believe this claim.
  • Quantumz0d - Monday, December 16, 2019 - link

    You should see AT forum. A thread has been dedicated to discuss this BS fanboyism and outcome was Apple won.
  • Andrei Frumusanu - Monday, December 16, 2019 - link

    x86 emulation on Arm has absolutely nothing to do with any topic discussed here or QC vs Apple performance. I'm sick and tired of your tirades here as nothing you say remains technical or on point to the matter.

    The experience I have, when dismissing any other aspects such as iOS's super slow animations, is that the iPhones are far ahead in performance of any Android device out there, which is very much what the benchmarks depict.
  • Quantumz0d - Monday, December 16, 2019 - link

    Did I mention anything from your article on QC vs x86 ? I was replying to a comment on "Revolutionary" performance of A series vs x86. And then you claimed it as nonsensical point of x86 on ARM.

    So "super slow animations" & "far ahead". What do you mean by that ? An iPhone X vs a 11 Pro will exhibit the launching speed, then loading speed differences same as 835 vs 855 which can be observed. Everything ApplePro guy did a massive video of iPhones across multiple A series iterations which is the ONLY way a user can see the performance improvement.

    But when Android vs iOS you are saying iPhone animation speeds are super slow yet the benches show much lead..So how is the user seeing the far ahead in performance out there when OP7 Pro vs iPhone 11 Pro Max, like iPhone is still faster as you claim but in reality user is seeing same ?
  • Andrei Frumusanu - Monday, December 16, 2019 - link

    Apparently I'm able to say that because I'm able to differentiate between CPU performance, raw performance, and "platform performance".

    CPU performance is clear cut on where we're at and if you're still arguing this then I have no interest in discussing this.

    Raw performance is what I would call things that are not actually affected by the OS, web content *is* far faster on the latest iPhone than on Androids, that's a fact. Among this is actual real applications, when Civilization came to iOS the developers notably commented on the performance being essentially almost as good as desktop devices, the performance is equal to x86 laptops or better: https://www.anandtech.com/show/13661/the-2018-appl...

    And finally, the platform experience includes stuff like the very slow animations. I expect this is a big part as to what you regard as being part of your "experience" and "reality". I even complained about this in the iPhone 11 review as I stated that I feel the hardware is being held back by the software here.

    Now here's what might blow your mind: I can both state that Apple's CPUs are far superior at the same time as stating that the Android experience might be faster, because both statements are very much correct.
  • Quantumz0d - Monday, December 16, 2019 - link

    Okay thanks for that clarity on Raw performance and other breakdowns like CPU, Platform. Yes I can also see that Web performance on A series has always been faster vs Androids.

    I forgot about that article. Good read, and on Civ 6 port however it lacks the GFX options. I would also mention that TFlops cannot be even compared within same company. Like Vega 64 is 12TFs vs a 5700XT at 9TFs, latter completely wrecks the former in majority except for the compute loads utlizing HBM. I know you mentioned the FP16 and other aspects of the figure in opening, just saying as many people just take that aspect. Esp the new Xbox SX and Console as a whole (They add the CPU too into that figure)

    And finally. Yes ARM scales in normal browsing, small tasks vs x86 laptops which majority of the people nowadays are doing (colleagues don't even use PCs) but for higher performance and other workloads ARM cannot cut it at all.

    Plus I'd also add these x86 laptop parts throttle a lot incl. Macbooks obv because they are skimping on cooling them for thinness so their consistency isn't there as well just like A series.
  • joms_us - Monday, December 16, 2019 - link

    When I look at the comparisons here, I look only for Android vs. Android or Apple vs. Apple. Comparing them with different OSes and more so primitive tools is a worthless approach. Firstly, the results need to be normalized, one Soc is showing lead while sucking more power than the other. Secondly, the bloated scores of Apple Soc here does not represent real-world results. Most Android phones with SD855 are faster if not the same than iPhone 11.
  • Andrei Frumusanu - Monday, December 16, 2019 - link

    > Comparing them with different OSes and more so primitive tools is a worthless approach.

    SPEC is a native apples-to-apples comparison. The web benchmarks and the 3D benchmarks are apples-to-apples interpreted or abstracted, same-workload comparisons.
    All the tests here are directly comparable - the tests which aren't and which rely on OS specific APIs, such as PCMark, obviously don't have the Apple data.

    > Firstly, the results need to be normalized, one Soc is showing lead while sucking more power than the other.

    That's a very stupid rationale. If you were to follow that logic you'd have to normalise little cores up in performance as well because they suck much less power.
  • joms_us - Monday, December 16, 2019 - link

    > SPEC is a native apples-to-apples comparison.

    Stop right there, Apple vs. Apple only

    > The web benchmarks and the 3D benchmarks are apples-to-apples interpreted or abstracted, same-workload comparisons.
    All the tests here are directly comparable - the tests which aren't and which rely on OS specific APIs, such as PCMark, obviously don't have the Apple data.

    How? Just like Geekbench, different compilers are used. Different distribution of loads are made.
    My Ryzen 2700 can finish 5 full GB runs as fast as one full GB run on an iPhone, and yet the single core score of the iPhone is higher than any Ryzen. You are showing Apple A13 (LOL A13 is faster than the fastest AMD or Intel chip) using Jurassic Spec benchmark?

    Talk about dreams vs. reality.

    > That's a very stupid rationale. If you were to follow that logic you'd have to normalise little cores up in performance as well because they suck much less power.

    We are talking about efficiency here, your beloved Apple chip is sucking twice the power than SD855 or SD865 per workload.

    Have you ever loaded a consumer website or run a consumer app with these phones side-by-side? Don't tell me they are not using CPU or memory resources. They are, they are doing most if not all of the workloads on the charts here. While your chart is showing Apple has twice the performance vs SD865, the phone doesn't tell lies. A bloated benchmark score does not translate to real-world results.

    It is time to stop this worthless propaganda that Android SoC is inferior than Apple and the laughable IPC king (iPhone chip is faster than desktop processors).

    Until iPhone can play Crysis smoother than even low end laptops, this BS claim that it is the fastest chip should stop.
  • Quantumz0d - Monday, December 16, 2019 - link

    Agreed.

    It really feels like a propaganda every single article on CPU Apple gets super limelight because of these benches on a closed walled garden platform from OS to HW to Repair.

    The power consumption of A series processors deteriorating the battery was nicely thrown under the rug by Apple throttling bs. They even added the latest throttle switch for XS series. But yea no one cares. Apple's deeppockets allow top lawyers in their hands to manipulate every thing.

    The consumer app part. Its perfect use case since we never see any of the Android phones lag as interpreted here due to the dominance of A series by 2-3x folds and in real life nothing is observable. And comparing that to the x86 Desktop machines with proper OS and a computing usecases like Blender, Vray, MATLAB, Compliation, MIPS of Compression and decompression, Decode/Encoding and superior Filesystem support and socketed / Standardized HW (PCIe, I/O options), Virtualization and Gaming, DRAM scaling choice (user can buy whatever memory they want or any HW as its obvious)..this whole thing screams bs. It would be better if the highlight is mentioned on benches and realwork might differ but its not the case at all.

    The worst is spineless corporate agenda of allowing Chinese CPC to harvest every bit from their Cloud data Center in China allowing the subversion and anti liberty. A.k.a Anti American principles.
