Machine Learning Inference Performance

The new SoC generations also bring new AI capabilities, though the two chips differ considerably in what they offer. The Snapdragon 865 adds a great deal of new Tensor core performance that should accelerate ML workloads, but software still plays a big role in extracting that capability from the hardware.

Samsung’s Exynos 990 is the odd one out in this regard: the company quotes the SoC’s NPU and DSP as being able to deliver 10 TOPS, but it’s not clear how this figure is broken down. SLSI has also been able to take advantage of the new Mali-G77 GPU and its ML abilities, exposing them through NNAPI.

We’re skipping AIMark for today’s test as the benchmark couldn’t use hardware acceleration on either device, since it lacks updated support for both Qualcomm’s and SLSI’s ML SDKs. We thus fall back to AIBenchmark 3, which uses NNAPI acceleration.

AIBenchmark 3

AIBenchmark takes a different approach to benchmarking. Here the test uses the hardware-agnostic NNAPI to accelerate inferencing, meaning it doesn’t rely on any proprietary aspects of a given piece of hardware except for the drivers that enable the abstraction between software and hardware. This approach is more apples-to-apples across Android devices, but because NNAPI is Android-only it also means that we can’t do cross-platform comparisons, such as testing iPhones.

We’re publishing one-shot inference times. Compared to sustained-performance inference times, these figures include more timing overhead from the software stack, covering everything from initializing the test to actually executing the computation.
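To make the distinction concrete, here is a minimal sketch of how a one-shot measurement differs from a sustained one. It does not reflect how AIBenchmark itself measures anything: `make_toy_model`, its sleep durations, and the warm-up and run counts are all hypothetical stand-ins chosen for illustration.

```python
import time


def one_shot_ms(make_model, sample):
    """Time a single cold inference; model setup is inside the timed region."""
    start = time.perf_counter()
    model = make_model()  # initialization cost is counted
    model(sample)
    return (time.perf_counter() - start) * 1000.0


def sustained_ms(make_model, sample, warmup=3, runs=20):
    """Average per-inference time with setup and warm-up runs excluded."""
    model = make_model()
    for _ in range(warmup):
        model(sample)
    start = time.perf_counter()
    for _ in range(runs):
        model(sample)
    return (time.perf_counter() - start) * 1000.0 / runs


def make_toy_model(setup_s=0.02, infer_s=0.002):
    """Hypothetical model: sleeps to imitate setup and compute time."""
    time.sleep(setup_s)  # stands in for driver and graph initialization

    def infer(_sample):
        time.sleep(infer_s)  # stands in for the actual computation
        return 0

    return infer


if __name__ == "__main__":
    print(f"one-shot:  {one_shot_ms(make_toy_model, None):.1f} ms")
    print(f"sustained: {sustained_ms(make_toy_model, None):.1f} ms")
```

With these made-up numbers the one-shot figure is dominated by setup cost, which is why one-shot results say as much about the software stack as about the raw throughput of the hardware.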

AIBenchmark 3 - NNAPI CPU

We’re segregating the AIBenchmark scores by execution block, starting off with the regular CPU workloads that simply use TensorFlow libraries and do not attempt to run on specialized hardware blocks.

[Charts: AIBenchmark 3, CPU: 1 - The Life (FP); 2 - Zoo (FP); 3 - Pioneers (INT); 4 - Let's Play (FP); 7 - Ms. Universe (FP); 7 - Ms. Universe (INT); 8 - Blur iT! (FP)]

In the purely CPU-based workloads, both phones perform very well, but the Snapdragon 865’s A77 cores are evidently in the lead by a good margin. Note that the scores for the S10 phones have also been updated – I noted a big performance boost with the Android 10 updates and the newer NNAPI versions of the test.

AIBenchmark 3 - NNAPI INT8

[Charts: AIBenchmark 3, INT8: 1 - The Life; 2 - Zoo; 3 - Pioneers; 5 - Masterpiece; 6 - Cartoons]

Integer ML performance on both phones is good, but because the Snapdragon 865 leverages its Hexagon DSP cores for such workloads, it is well ahead of the Exynos 990 S20. The Exynos variant nevertheless shows some very big performance improvements over its predecessor. I still think that Samsung is only exposing the SoC’s GPU to NNAPI here, but because the new GPU microarchitecture is able to accelerate ML workloads, we’re seeing a big improvement compared to the Exynos 9820.

AIBenchmark 3 - NNAPI FP16

[Charts: AIBenchmark 3, FP16: 1 - The Life; 2 - Zoo; 3 - Pioneers; 5 - Masterpiece; 6 - Cartoons; 9 - Berlin Driving; 10 - WESPE-dn]

In FP16 workloads, the Exynos 990’s GPU actually outperforms the Snapdragon 865’s Adreno unit more often than not. Where the workload allows it, HiSilicon’s NPU is still far in the lead, as it supports FP16 acceleration, which isn’t present on either the Snapdragon or Exynos SoCs – both fall back to their GPUs.

AIBenchmark 3 - NNAPI FP32

[Chart: AIBenchmark 3, FP32: 10 - WESPE-dn]

Finally, FP32 again runs on the GPU of each SoC, and again the Exynos 990 holds quite a large performance lead over the Snapdragon 865.

It’s certainly encouraging to see the Samsung SoC keep up with the Snapdragon variant of the S20, indicating that other vendors are now finally paying better attention to their ML capabilities. We don’t know much at all about the DSP or the NPU of the Exynos 990, as Samsung’s EDEN AI SDK is still not public – I hope that they finally open up more and allow third-party developers to take advantage of the available hardware.


137 Comments

  • toyeboy89 - Friday, April 3, 2020

    I'm really amazed by the fact that the iPhone XR is still beating the Snapdragon 865 in GFXBench in both peak and sustained performance. I am hoping the OnePlus 8 has better sustained performance.
  • TMCThomas - Friday, April 3, 2020

    Amazing review! Always wait for this one before getting a new Samsung. And I won't be getting any of the S20 phones. For me they kind of feel like "beta" phones. The 120Hz which is not quite ready for 1440p yet, the underutilized 108MP camera, the space zoom which is blurry, the camera hole still being there, the big camera bump and so on. I think all these features and more could be way more refined with the next Galaxy S, which I'll be waiting for. Also the poor Exynos 990 performance, especially the GPU part, is just unacceptable to me. Especially with it probably being a lot better next year, so I'll skip this year.
  • wheeliebin - Friday, April 3, 2020

    Thanks Andrei, really good review!

    I have read many users complain about extra crazy post-processing on the S10/S20 series when there is a face detected in the frame. i.e. the phone will apply an aggressive 'smooth skin' filter that you can't disable unless you shoot RAW. I was hoping that your review might touch on this however there were no people in your example shots so perhaps you didn't get a chance to experience the problem. I wonder if you have heard of this issue and can replicate it yourself with the S20 range?
  • anonomouse - Friday, April 3, 2020

    Hi Andrei, did you also run the bandwidth and MLP sweeps from previous reviews? Last year you noted the Snapdragon 855/A76 had peculiar behavior in the L1, and it would be also interesting to see if there are any MLP changes in both the SD865 and the Exynos.
  • anonomouse - Friday, April 3, 2020

    Also, any idea why the new scores for these in 403.gcc seems to be worse than their previous generation products? In particular the score for the SD865 in these S20s is substantially worse than the SD865 score from the QRD preview article.
  • Andrei Frumusanu - Saturday, April 4, 2020

    Yes I know. I don't know why that happens. I also got a V60 now and the scores there are higher, I'm wondering if there's something with Samsung's shared libraries.
  • anonomouse - Sunday, April 5, 2020

    What type of compile flags are used for these binaries? Are they the same for all of the tested binaries (or even same binary on each given platform)? Are LTO or PGO used (and if not why not)?

    I'm also not convinced of this statement from the article:

    "I had mentioned that the 7LPP process is quite a wildcard in the comparisons here. Luckily, I’ve been able to get my hands on a Snapdragon 765G, another SoC that’s manufactured on Samsung’s EUV process. It’s also quite a nice comparison as we’re able to compare that chip’s performance A76 cores at 2.4GHz to the middle A76 cores of the Exynos 990 which run at 2.5GHz. Performance and power between the two chips here pretty much match each other, and a clearly worse than other TSMC A76-based SoCs, especially the Kirin 990’s. The only conclusion here is that Samsung’s 7LPP node is quite behind TSMC’s N7/N7P/N7+ nodes when it comes to power efficiency – anywhere from 20 to 30%."

    Both the energy consumed and the performance scores for both of these A76's seem to also very closely track the "mid" 2.43Ghz A76's on the TSMC-fabbed SD855 - all of which have similar L2's and similar frequencies, but possibly differ significantly (to the point of being suboptimal on latency) on the memory hierarchy and SoC beyond that - which greatly affects many of the SPEC workloads. All of these may also have implementation targets. Given this, is it really conclusive that the Samsung process is truly 20-30% worse in energy efficiency? Granted, things will probably not look pretty next year when TSMC is on a true 5nm and Samsung is not.
  • Andrei Frumusanu - Monday, April 6, 2020

    The test is just -Ofast without any other addition. LTO wasn't/isn't in a good state on the Android NDK - it's something to look into in maybe a new binary revision.

    As for the 855 figures, well, that's also on an earlier 7nm. HiSilicon did a lot better in terms of their physical implementation. If not against N7, 7LPP clearly has a disadvantage against N7P/N7+.
  • Andrei Frumusanu - Saturday, April 4, 2020

    I'll add them in, that test takes a whole day and I needed the phones doing battery tests and other stuff.
  • dad_at - Saturday, April 4, 2020

    Again, your S10+ Exynos results in pc mark are false as of 2020. In performance mode I easily get 9500 work 2.0 overall, about 9600 in browser bench, 21K in photo editing. PC mark in general is inconsistent, irrelevant benchmark, not representative of actual performance in daily usage. The same about these ancient SPEC synthetics. No one uses these for performance evaluation now.
