Machine Learning Inference Performance

The new SoC generations also bring new AI capabilities, though the two chips differ considerably in what they offer. The Snapdragon 865 adds a whole lot of new Tensor core performance, which should accelerate ML workloads, but software still plays a big role in extracting that capability from the hardware.

Samsung’s Exynos 990 is the odd one out here: the company quotes the SoC’s NPU and DSP as being able to deliver 10 TOPS, but it’s not clear how this figure is broken down. SLSI has also been able to take advantage of the new Mali-G77 GPU and its ML abilities, exposing them through NNAPI.

We’re skipping AIMark for today’s test as the benchmark couldn’t use hardware acceleration on either device, lacking updated support for either Qualcomm’s or SLSI’s ML SDKs. We thus fall back to AIBenchmark 3, which uses NNAPI acceleration.

AIBenchmark 3

AIBenchmark takes a different approach to benchmarking. Here the test uses the hardware-agnostic NNAPI to accelerate inferencing, meaning it doesn’t use any proprietary aspects of a given piece of hardware except for the drivers that actually enable the abstraction between software and hardware. This approach is more apples-to-apples, but because NNAPI is an Android API it also means that we can’t do cross-platform comparisons, like testing iPhones.

We’re publishing one-shot inference times. Compared to sustained-performance inference times, these figures include more timing overhead from the software stack, covering everything from initializing the test to actually executing the computation.
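To make the distinction concrete, here is a minimal Python sketch of the two measurement styles. It is not how AIBenchmark itself measures anything: `load_model` and `run_model` are hypothetical stand-ins (a sleep and a small arithmetic loop) for initializing the test and executing the computation, and the warm-up and iteration counts are arbitrary assumptions.

```python
import time
from statistics import mean


def load_model():
    """Hypothetical stand-in for test initialization (driver setup, model load)."""
    time.sleep(0.05)  # assumed, fixed setup cost purely for illustration
    return list(range(10_000))


def run_model(state):
    """Hypothetical stand-in for one inference pass over already-loaded state."""
    return sum(x * x for x in state)


def one_shot_latency(setup, run):
    """Time a single run, including the setup that precedes it."""
    start = time.perf_counter()
    state = setup()
    run(state)
    return time.perf_counter() - start


def sustained_latency(setup, run, warmup=3, iterations=20):
    """Average time per run after setup and warm-up are excluded."""
    state = setup()
    for _ in range(warmup):
        run(state)  # warm-up runs are not timed
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run(state)
        samples.append(time.perf_counter() - start)
    return mean(samples)


if __name__ == "__main__":
    print(f"one-shot:  {one_shot_latency(load_model, run_model) * 1000:.2f} ms")
    print(f"sustained: {sustained_latency(load_model, run_model) * 1000:.2f} ms")
```

With these stand-ins the one-shot figure is dominated by the setup cost, which is the kind of software-stack overhead the one-shot numbers include.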

AIBenchmark 3 - NNAPI CPU

We’re segregating the AIBenchmark scores by execution block, starting off with the regular CPU workloads that simply use TensorFlow libraries and do not attempt to run on specialized hardware blocks.

[Charts: AIBenchmark 3, CPU: 1 - The Life (FP); 2 - Zoo (FP); 3 - Pioneers (INT); 4 - Let's Play (FP); 7 - Ms. Universe (FP); 7 - Ms. Universe (INT); 8 - Blur iT! (FP)]

In the purely CPU-executed workloads, both phones perform very well, but the Snapdragon 865’s A77 cores are evidently in the lead by a good margin. Note that the scores for the S10 phones have also been updated: I noted a big performance boost with the Android 10 updates and the newer NNAPI versions of the test.

AIBenchmark 3 - NNAPI INT8

[Charts: AIBenchmark 3, INT8: 1 - The Life; 2 - Zoo; 3 - Pioneers; 5 - Masterpiece; 6 - Cartoons]

Integer ML performance on both phones is good, but because the Snapdragon 865 leverages the Hexagon DSP cores for such workload types, it’s well ahead of the Exynos 990 S20. The latter variant, however, also showcases some very big performance improvements over its predecessor. I still think that Samsung is only exposing the GPU of the SoC through NNAPI, but because the new microarchitecture is able to accelerate ML workloads, we’re seeing a big performance improvement compared to the Exynos 9820.

AIBenchmark 3 - NNAPI FP16

[Charts: AIBenchmark 3, FP16: 1 - The Life; 2 - Zoo; 3 - Pioneers; 5 - Masterpiece; 6 - Cartoons; 9 - Berlin Driving; 10 - WESPE-dn]

In FP16 workloads, the Exynos 990’s GPU actually manages to outperform the Snapdragon 865’s Adreno unit more often than not. In workloads that allow it, HiSilicon’s NPU is still far in the lead, as it supports FP16 acceleration, which isn’t present on either the Snapdragon or Exynos SoCs, both of which fall back to their GPUs.

AIBenchmark 3 - NNAPI FP32

[Chart: AIBenchmark 3, FP32: 10 - WESPE-dn]

Finally, FP32 again uses the GPU of each SoC, and again the Exynos 990 shows quite a large performance lead over the Snapdragon 865.

It’s certainly encouraging to see the Samsung SoC keep up with the Snapdragon variant of the S20, which suggests that other vendors are now finally paying better attention to their ML capabilities. We don’t know much at all about the DSP or the NPU of the Exynos 990, as Samsung’s EDEN AI SDK is still not public. I hope that they finally open up more and allow third-party developers to take advantage of the available hardware.

137 Comments

  • MAGAover9000 - Tuesday, April 7, 2020 - link

    I have the s10+. Fantastic device. Very happy
  • id4andrei - Friday, April 3, 2020 - link

    No need for me to praise this review any longer. Still, I must nitpick. The 3dmark GPU test always has caveats in your reviews. Drop it if you feel it is detected by OEMs or it's a false GPU test like the physics one.

    On web tests. I read on wiki that JetStream is an Apple made test, literally. Wouldn't you say that's a big caveat when testing against ios? Similarly Speedometer is developed by the webkit team at Apple. With Android webview based on Blink, not webkit, wouldn't Android smartphones be at a disadvantage against iphones? I don't see Kraken(Firefox) or Octane(Google) being used.

    Kraken would actually be neutral to both. Other 3rd party tests might be Testdrive(Microsoft) or Basemark.
  • Andrei Frumusanu - Friday, April 3, 2020 - link

    I don't think that the fact that the WebKit team made those tests is a valid argument against using them. You can go and read the source JS yourself if you wish, and they're industry accepted benchmarks. Both Kraken and Octane are ancient and outdated and we dropped them just like we dropped SunSpider of the early days.
  • id4andrei - Friday, April 3, 2020 - link

    Thank you for the prompt answer.
  • s.yu - Friday, April 3, 2020 - link

    Thank you Andrei, again the most comprehensive and reliable set of samples anywhere!
    There seems to be considerable sample variation again (last time with Samsung was the main module since S9 with the variable aperture) in the UWA, S20+E and S20UE should have absolutely identical UWA performance but the S20UE seems to have far worse sagittal resolution than the S20+E, and Samsung's processing isn't that good in the first place, considering the 12MP 1.4μm could produce incredibly sharp pictures as that been the specs of the Pixels' main module for generations.
    I don't regret their switch to f/1.8 because the old module that went up to f/1.5 wasn't sharp wide open, especially in the corners, but a further two stops' variation to f/3.3 could be useful for more DoF in closeups provided inserting that physical aperture into the tiny module doesn't compromise the optical design otherwise.
    This time around the E seems to generally outperform the S, except in color as E doesn't seem to have proper color fidelity...almost as if chroma NR is set too high even in broad daylight, and the "hybridization" of the digital zoom, in which the E clearly uses a smaller portion from the periscope's readout than the S in the resulting merge. Speaking of the zoom, S20+ still performs slightly worse at 2x(16MP readout) than S10's native 12MP, though the difference is small and could be down to lens variation. Considering S10U's Z height, they could've easily fixed the S20U like Xiaomi, going 1/2.3" f/2 12MP with the 2x. Xiaomi used it despite a 4-1 bin, all the more reason to use it with a 9-1 bin. S20U's corner performance at 3x would also be much improved.
    Regarding the comparison with the Fuji though, I suspect your unit has trouble focusing to infinity correctly, because the train and forest samples show clear superiority of the Fuji's zoom. I especially recognize that kind of slight haziness as being very responsive to dehaze and low radius sharpening in LR and would result in far more detail with extraction in post. Also, with an ILC, there's always stopping down a little for more sharpness and more DoF.
    Regarding the full res modes, it's not worth storing 108MP of data with the CFA asking for a 9-1 bin, of course the 64MP would be better, without the RAW it's hard to say for sure, but the 64MP seems to be quad bayer.
  • s.yu - Friday, April 3, 2020 - link

    I don't agree with your remark about the night comparison with Mate30P though, the UWA is not "UW" so it has better image quality, that's true, and the night mode of the Mate30P is far superior, that's also true, but not auto mode, nor any aspect of the telephoto as it's clearly using a crop of the main for 3x. Samsung does attempt to use the 4x for telephoto and although there's a significant issue of chroma noise, it's far sharper than Mate30P's crop, with at least twice 3 times the effective resolution in night mode. With S20U you could also crop out a single shot 3-4x of similar brightness to the Mate30P crop, but it's just a crop.
    As for the potential of P40P surpassing S20U, that model operates on a 9.4MP crop by default, interpolated to 12.5MP which clearly has consequences. In daylight it's often a regression compared to P30P (much less match Mate30P), and in night shots using the current firmware it has severe color issues of rendering large portions of the scene as a crimson red, so it's hard to say at this point too.
  • s.yu - Friday, April 3, 2020 - link

    Oh, there's exception of the Mate30P auto mode in the last sample, but the night mode isn't constantly superior either.
  • RealBeast - Friday, April 3, 2020 - link

    I've been looking forward to getting one of these, not sure which yet. The fly in the ointment now is that I won't see my Mom (who gets my old S9+) until the Fall due to the whole COVID problem, not to mention less income. That will weigh heavily on sales of what is otherwise an amazing looking phone for me.
  • 29a - Friday, April 3, 2020 - link

    How large are the picture file sizes created by this thing?
  • BedfordTim - Sunday, April 5, 2020 - link

    The same size as any other 12MP camera. They will depend on content, hdr, motion and compression options but I would expect about 36MB for a raw image and 8MB for a high quality jpeg.