Final Thoughts

Today’s preview focused solely on the performance metrics of the new chipset, which only cover a very small subset of the new features that the chip will be bringing to devices next year. A lot of the talking-points of the new SoC such as 5G connectivity, or the new camera and media capabilities, are aspects for which we’ll have to wait on commercial devices.

For what we’ve been able to test today, the Snapdragon 865 seems very solid. The new Cortex-A77 CPU does bring larger IPC improvements to the table, and thanks to the Snapdragon 865’s improved memory subsystem, the chip has been able to showcase healthy performance increases. I did find it odd that the web benchmarks didn’t quite perform as well as I had expected – I don’t know if the new microarchitecture just doesn’t improve these workloads as much, or if it might have been a software issue on the QRD865 phone; we’ll have to wait for commercial devices to have a clearer picture of the situation. System performance of the new chip certainly shouldn’t be disappointing, and even on a conservative baseline configuration, 2020 flagships should see an increase in responsiveness compared to the Snapdragon 855.

AI performance of the new chip is also improved – although our limited benchmark suite here isn’t able to fully expose the hardware improvements that the S865 brings with it. It’s likely that first-party camera applications will be the first real workloads that will be able to showcase the new capabilities of the chip.

On the GPU side, the improvements are also quite solid, but I just have a feeling that the narrative here isn’t quite the same anymore for Qualcomm, as Apple’s the elephant in the room now here as well. During the launch of the chipset the company was quite eager to promote that its sustained performance is better than the competition. While we weren’t able to test this aspect of the Snapdragon 865 on the QRD865 due to time constraints, the simple fact is that the chip’s peak performance remains inferior to Apple’s sustained performance, with the fruit company essentially dominating an area where previously Qualcomm was king. In this regard, I hope Qualcomm is able to catch up in the future, as the differences here are seemingly getting bigger each year.

Overall, the Snapdragon 865 seems like a very well-balanced chip and I have no doubt it’ll serve as a very competitive foundation for 2020 flagships. Qualcomm’s strengths lie in the fact that they’re able to deliver a complete solution with 5G connectivity – we do however hope that in the future the company will be able to offer more solid performance upgrades; the competition out there is getting tough.

GPU Performance & Power


View All Comments

  • Bulat Ziganshin - Monday, December 16, 2019 - link

    The Spec2006 tables show that A13 has performance similar to x86 desktop chips, which may be considered as revolution. Can you please add frequencies of the chips (both x86 and Apple) too, at least some estimations? Also, what are the memory configs (freq/CAS/...)? It will be also interesting to see x86 chips in individual SPEC benchmarks so we can analyze what are the weak and string points of Apple architecture. Reply
  • Andrei Frumusanu - Monday, December 16, 2019 - link

    The Apple chips are running near their peak frequencies, with some subtests being slightly throttled due to power. The 9900K was at 5GHz, the 3950X at 4.6-4.65GHz, 3200CL16 on the desktop parts.

    I added the detailed overview over all chips; here's it again:
  • unclevagz - Monday, December 16, 2019 - link

    It would be nice if some contemporary x86 laptop chips could be added to that list (Ryzen/Ice Lake/Coffee Lake...) just for ease of comparison between ARM and x86 mobile chips. Reply
  • sam_ - Monday, December 16, 2019 - link

    Any strong reason for these tests being compiled with -mcpu=cortex-a53 on Android/Linux?

    One might expect for SoCs with 8.2 on all cores there may be some uplift from at least targeting cortex-a55, if not cortex-a75?

    When you're expecting to run on a big core, forcing the compiler to target a in-order core which can only execute one ASIMD instruction per cycle seems likely to restrict the perf (unrolling insufficiently etc.). Certainly seems a bit unfair for aarch64 vs. x64 comparison, and probably makes the apple SoCs look better too (assuming XCode isn't targeting a LITTLE core by default). It also likely makes newer bigger cores look worse than they should vs. older cores with smaller OoO windows.

    I get not wanting to target compilation to every CPU individually, but would be interesting to know how much of an effect this has; perhaps this could contribute to the expected IPC gains for FP not being achieved?
  • Andrei Frumusanu - Monday, December 16, 2019 - link

    The tuning models only have very minor impact on the performance results. Whilst using the respective models for each µarch can give another 1-1.5% boost in some tests, as an overall average across all micro-architectures I found that giving the A53 model results in the highest performance. This is compared to not supplying any model at all, or using the common A57 model.

    The A55 model just points to the A53 scheduling model, so they're the same.
  • sam_ - Monday, December 16, 2019 - link

    Hmm, I took a look at LLVM and the scheduling model is indeed the same for A53 and A55, but A55 should enable instruction generation for the various extensions introduced since v8.0. I can believe that for spec 2006 8.1 atomics/SQRDMLAH/fp16/dot product/etc. instructions don't get generated.

    It looks like not much attention has been paid to tweaking the LLVM backend for more recent big cores than A57, beyond getting the features right for instruction generation, so I can believe cortex-a53 still ends up within a couple of percent of more specific tuning. Probably means there's more work to be done on LLVM.

    If it is easy to test I think it would be interesting to try cortex-a57, or maybe exynos-m4 tuning on a77 because these targets do seem to unroll more aggressively than other cortex-X targets with the current LLVM backend.
    I made a toy example on godbolt: , though for this particular loop I think a77 would have the vector integer MLA unit saturated with unroll by 2 (and is probably memory bound!), still the other targets would seem more predisposed to exposing instruction level parallelism.
  • Andrei Frumusanu - Tuesday, December 17, 2019 - link

    I pointed out to Arm that there's not much optimisations going on in terms of the models, but they said that they're not putting a lot of effort into that, and instead trying to optimise the general Arm64 target.

    I tested the A57 targets in the past, I'll have a look again on things like the M4 tuning over the coming months as I finally get to port SPEC2017.
  • Quantumz0d - Monday, December 16, 2019 - link

    Sigh another comment on the x86 vs A series. Why dont people understand running an x86 code on ARM will have a massive impact in performance ? How do people think a fanless BGA processor with sub 10W design beat an x86 in realworld just because it has Muh Benchwarrior ? There are so many possible workloads from SIMD, HT/SMT, ALU.

    Having scalability is also the key. Look at x86 AMD and Intel how they do it by making a Large Wafer and having multi SKUs with LGA/PGA (AM4) sockets allowing for maximum robustness.

    ARM is all about efficiency and economical bandwidth and it won't scale like x86 for all workloads. If you add AVX its dead. And Freq scaling with HT/SMT. Add the TSMC N7 which is only fit for mobile SoCs. Ryzen don't scale much into clocks because of this limitation.

    ARM is always Custom if you see as per Vendor. Its bad. Look at MediaTek trash no GPL policy. Huawei as well. Except QcommCAF and Exynos. Its a shame that TI OMAP left.
  • Andrei Frumusanu - Monday, December 16, 2019 - link

    > Why dont people understand running an x86 code on ARM will have a massive impact in performance ?

    Nobody even mentioned anything regarding this, you're going off on a nonsensical rant yet again. For once, please keep the comments section level-headed.
  • Quantumz0d - Monday, December 16, 2019 - link

    What ? Its a genuine point. ARM based 8c processors Windows machines like Surface Pro X can only emulate 32bit x86 code. 64bit isnt here and running both emulation will have am impact (slow) That's what I mean. They need native code to run and rival.

    Rant ? Benches = Realworld right. How come a user is able to see an OP7 Pro breeze through and not lag and offer shitty performance vs an iPhone ? I saw with my own OP3 downclocked on Sultan ROM due to the high clockspeed bug on 82x platform not just me, So many other users. GB score and benches do not only mean performance esp in ARM arena.

    Except for bragging rights, This is pure Whiteknighting.

Log in

Don't have an account? Sign up now