GPU Performance & Power

On the GPU side of things, testing the QRD865 is a bit complicated, as we simply didn’t have enough time to run the device through our usual test methodology, where we stress both the peak and sustained performance of the chip. Thus, the results we’re able to present today solely address the peak performance characteristics of the new Adreno 650 GPU.

Disclaimer On Power: As with the CPU results, the GPU power measurements on the QRD865 are not as high confidence as on a commercial device, and the preliminary power and efficiency figures posted below might differ in final devices.

3DMark Sling Shot 3.1 Extreme Unlimited - Physics

The 3DMark Physics test is a CPU-bound benchmark run within a GPU power-constrained scenario. Oddly enough, the QRD865 doesn’t showcase major improvements here compared to its predecessor, in some cases actually coming in slightly slower than the Pixel 4 XL, and also falling behind the Kirin 990-powered Mate 30 Pro even though the new Snapdragon has a microarchitectural advantage. It seems the Cortex-A77 does very little to relieve the bottlenecks of this test.

3DMark Sling Shot 3.1 Extreme Unlimited - Graphics

In the 3DMark Graphics test, the QRD865’s results are more in line with what we expect of the GPU. Depending on which Snapdragon 855 device you compare against, we’re seeing 15-22% improvements in peak performance.

GFXBench Aztec Ruins - High - Vulkan/Metal - Off-screen

In the GFXBench Aztec High benchmark, the improvement over the Snapdragon 855 is roughly 26%. There’s one apparent issue when looking at the chart rankings: although peak performance has improved, the QRD865 still isn’t able to reach the sustained performance of Apple’s latest A13 phones.

GFXBench Aztec High Offscreen Power Efficiency (System Active Power)

Device (SoC)                        Mfc. Process   FPS     Avg. Power (W)   Perf/W Efficiency
iPhone 11 Pro (A13) Warm            N7P            26.14   3.83             6.82 fps/W
iPhone 11 Pro (A13) Cold / Peak     N7P            34.00   6.21             5.47 fps/W
iPhone XS (A12) Warm                N7             19.32   3.81             5.07 fps/W
iPhone XS (A12) Cold / Peak         N7             26.59   5.56             4.78 fps/W
QRD865 (Snapdragon 865)             N7P            20.38   4.58             4.44 fps/W
Mate 30 Pro (Kirin 990 4G)          N7             16.50   3.96             4.16 fps/W
Galaxy S10+ (Snapdragon 855)        N7             16.17   4.69             3.44 fps/W
Galaxy S10+ (Exynos 9820)           8LPP           15.59   4.80             3.24 fps/W

Looking at the estimated power draw of the phone, it does indeed look like Qualcomm has been able to maintain the same power levels as the Snapdragon 855, but the improvements in performance and efficiency aren’t enough to catch up to either the A12 or the A13, with Apple ahead in performance, power, and efficiency alike.
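For clarity, the efficiency column in these tables is simply measured frames per second divided by average system active power. A minimal Python sketch, using figures from the Aztec High table above, reproduces the column:

```python
# Perf/W efficiency is FPS divided by average system active power.
# Figures below are taken from the Aztec High (offscreen) table above.
def perf_per_watt(fps: float, watts: float) -> float:
    """Frames per second delivered per watt of system active power."""
    return fps / watts

aztec_high = {
    "iPhone 11 Pro (A13, warm)": (26.14, 3.83),
    "QRD865 (Snapdragon 865)": (20.38, 4.58),
    "Galaxy S10+ (Snapdragon 855)": (16.17, 4.69),
}

for device, (fps, watts) in aztec_high.items():
    print(f"{device}: {perf_per_watt(fps, watts):.2f} fps/W")
```

Note that small rounding differences against the published column (e.g. 4.44 vs. 4.45 fps/W for the QRD865) stem from the unrounded measurement data used in the original table.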

GFXBench Aztec Ruins - Normal - Vulkan/Metal - Off-screen

GFXBench Aztec Normal Offscreen Power Efficiency (System Active Power)

Device (SoC)                        Mfc. Process   FPS     Avg. Power (W)   Perf/W Efficiency
iPhone 11 Pro (A13) Warm            N7P            73.27   4.07             18.00 fps/W
iPhone 11 Pro (A13) Cold / Peak     N7P            91.62   6.08             15.06 fps/W
iPhone XS (A12) Warm                N7             55.70   3.88             14.35 fps/W
iPhone XS (A12) Cold / Peak         N7             76.00   5.59             13.59 fps/W
QRD865 (Snapdragon 865)             N7P            53.65   4.65             11.53 fps/W
Mate 30 Pro (Kirin 990 4G)          N7             41.68   4.01             10.39 fps/W
Galaxy S10+ (Snapdragon 855)        N7             40.63   4.14             9.81 fps/W
Galaxy S10+ (Exynos 9820)           8LPP           40.18   4.62             8.69 fps/W

We’re seeing a similar scenario in the Normal variant of the Aztec test. Although the performance improvements here do match the promised figures, it’s not enough to catch up to Apple’s two latest SoC generations.

GFXBench Manhattan 3.1 Off-screen

GFXBench Manhattan 3.1 Offscreen Power Efficiency (System Active Power)

Device (SoC)                        Mfc. Process   FPS      Avg. Power (W)   Perf/W Efficiency
iPhone 11 Pro (A13) Warm            N7P            100.58   4.21             23.89 fps/W
iPhone 11 Pro (A13) Cold / Peak     N7P            123.54   6.04             20.45 fps/W
iPhone XS (A12) Warm                N7             76.51    3.79             20.18 fps/W
iPhone XS (A12) Cold / Peak         N7             103.83   5.98             17.36 fps/W
QRD865 (Snapdragon 865)             N7P            89.38    5.17             17.28 fps/W
Mate 30 Pro (Kirin 990 4G)          N7             75.69    5.04             15.01 fps/W
Galaxy S10+ (Snapdragon 855)        N7             70.67    4.88             14.46 fps/W
Galaxy S10+ (Exynos 9820)           8LPP           68.87    5.10             13.48 fps/W
Galaxy S9+ (Snapdragon 845)         10LPP          61.16    5.01             11.99 fps/W
Mate 20 Pro (Kirin 980)             N7             54.54    4.57             11.93 fps/W
Galaxy S9 (Exynos 9810)             10LPP          46.04    4.08             11.28 fps/W
Galaxy S8 (Snapdragon 835)          10LPE          38.90    3.79             10.26 fps/W
Galaxy S8 (Exynos 8895)             10LPE          42.49    7.35             5.78 fps/W

Even in more traditional tests such as Manhattan 3.1, although the Adreno 650 is again able to showcase good improvements this generation, it seems that Qualcomm didn’t aim quite high enough.

GFXBench T-Rex 2.7 Off-screen

GFXBench T-Rex Offscreen Power Efficiency (System Active Power)

Device (SoC)                        Mfc. Process   FPS      Avg. Power (W)   Perf/W Efficiency
iPhone 11 Pro (A13) Warm            N7P            289.03   4.78             60.46 fps/W
iPhone 11 Pro (A13) Cold / Peak     N7P            328.90   5.93             55.46 fps/W
iPhone XS (A12) Warm                N7             197.80   3.95             50.07 fps/W
iPhone XS (A12) Cold / Peak         N7             271.86   6.10             44.56 fps/W
QRD865 (Snapdragon 865)             N7P            206.07   4.70             43.84 fps/W
Galaxy S10+ (Snapdragon 855)        N7             167.16   4.10             40.70 fps/W
Mate 30 Pro (Kirin 990 4G)          N7             152.27   4.34             35.08 fps/W
Galaxy S9+ (Snapdragon 845)         10LPP          150.40   4.42             34.00 fps/W
Galaxy S10+ (Exynos 9820)           8LPP           166.00   4.96             33.40 fps/W
Galaxy S9 (Exynos 9810)             10LPP          141.91   4.34             32.67 fps/W
Galaxy S8 (Snapdragon 835)          10LPE          108.20   3.45             31.31 fps/W
Mate 20 Pro (Kirin 980)             N7             135.75   4.64             29.25 fps/W
Galaxy S8 (Exynos 8895)             10LPE          121.00   5.86             20.65 fps/W

Lastly, T-Rex, the least compute-heavy workload tested here and one mostly bottlenecked by texture and fillrate throughput, sees a 23% increase for the Snapdragon 865.
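Tying the four tests together, the generational gains can be computed directly from the peak off-screen FPS figures in the tables above, comparing the QRD865 against the Snapdragon 855 Galaxy S10+. A quick Python sketch:

```python
# Generation-over-generation peak FPS gains: QRD865 (Snapdragon 865)
# versus the Snapdragon 855 Galaxy S10+, using figures from the tables above.
def gain_pct(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0

peak_fps = {
    "Aztec High":    (20.38, 16.17),
    "Aztec Normal":  (53.65, 40.63),
    "Manhattan 3.1": (89.38, 70.67),
    "T-Rex":         (206.07, 167.16),
}

for test, (s865, s855) in peak_fps.items():
    print(f"{test}: +{gain_pct(s865, s855):.0f}%")
```

The compute-heavier Aztec tests come out at or above Qualcomm’s headline 25% claim, while the fillrate-bound T-Rex lands slightly below it at 23%.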

Overall GPU Conclusion – Good Improvements – Competitively Not Enough

Overall, we were able to verify the Snapdragon 865’s performance improvements and Qualcomm’s 25% claims seem to be largely accurate. The issue is that this doesn’t seem to be enough to keep up with the large improvements that Apple has been able to showcase over the last two generations.

During the chipset’s launch, Qualcomm was eager to mention that its product is able to showcase better long-term sustained performance than a competitor which “throttles within minutes”. While we don’t have confirmation as to whom exactly they were referring, the data and narrative here only match Apple’s device behaviour. Whilst we weren’t able to test the sustained performance of the QRD865 today, that hardly helps Qualcomm, as the Snapdragon 865 and Adreno 650’s peak performance falls below the A13’s sustained performance.

Apple isn’t the only one Qualcomm has to worry about: the 25% performance increase this generation puts the Adreno 650 within reach of Arm’s Mali-G77, and in theory Samsung’s Exynos 990 should be able to catch up with the Snapdragon 865. Qualcomm had been regarded as the mobile GPU leader over the last few years, but it’s clear that development has slowed down considerably recently, and the Adreno family has lost its crown.

178 Comments
  • rpg1966 - Monday, December 16, 2019 - link

    How is Apple so far ahead in some/many respects, given that Arm is dedicated to designing these microarchitectures?
  • eastcoast_pete - Monday, December 16, 2019 - link

    In addition to spending $$$ on R&D, Apple can optimize (tailor, really) its SoCs 100% to its OS and vice versa. Also, not sure if anybody has figures just how much the (internal) costs of Apple's SoCs are compared to what Samsung, Xiaomi etc. pay QC for their flagship SoCs. Would be interesting to know how much this boils down to costs.
  • jospoortvliet - Monday, December 16, 2019 - link

    I think cost is the big factor. Qualcomm and Arm keep chips small for cost reasons. Apple throws transistors at the problem and cares little...
  • s.yu - Monday, December 16, 2019 - link

    I like the approach of throwing transistors :)
  • generalako - Monday, December 16, 2019 - link

    Can we stop with these excuses? What cost reasons? Who's stopping them from making two architectures then, letting OEMs decide which to use -- if Apple does it, why not them? Samsung aiming at large cores with their failed M4 clearly points towards a desire/intention to have larger cores that are more performant. Let's not make the assumption that there's no need here--there clearly is.

    Furthermore, where is the excuse in ARM still being on the A55 for the third straight year? Or Qualcomm being on the same GPU architecture for 3 straight years, with such incremental GPU improvements the past two years that they not only let Apple match and vastly surpass them, but are even getting matched by Mali?

    There's simply no excuse for the laziness going on. ARM's architecture is actually impressive, with still big year-on-year IPC gains (whereas Apple has actually stagnated here the past two years). But abandoning any work on efficiency cores is inexcusable. As is the fact that none of the OEMs has done anything to deal with this problem.
  • Retycint - Monday, December 16, 2019 - link

    Probably because ARM designs for general use - mobiles, tablets, TVs, cars etc, whereas Apple designs specifically for their devices. So naturally Apple is able to devote more resources and time to optimize for their platform, and also design cores/chips specific to their use (phone or tablet).

    But then again I'm an outsider, so the reality could be entirely different
  • generalako - Monday, December 16, 2019 - link

    TIL using the same A55 architecture is "for general use" /s

    If ARM had actually done their job and released efficiency cores more often, like Apple does every year, we'd have far more performant and efficient smartphones today across the spectrum. Flagship phones would benefit in idle use (including standby), and also in assigning far more resource-mild works to these cores than they do today.

    But mid-range and low-end phones would benefit a huge amount here, with efficiency cores performing close to performance cores (often 1-2 older gen clocked substantially lower). That would also be cheaper, as it would make cluster of 2 performance cores not as necessary--fitting right in with your logic of making cheap designs for general use.
  • quadrivial - Monday, December 16, 2019 - link

    There's a few reasons.

    Apple seems to have started before arm did. They launched their design just 2 years or so after the announcement of a64 while arm needed the usual 4-5 years for a new design. I don't believe Apple's designers are that much better than normal (I think they handed them the ISA and threatened to buy out MIPS if they didn't). Arm has never recovered that lead time.

    That said, PA Semi had a bunch of great designers who has already done a lot of work with low power designs (mostly POWER designs if I recall correctly).

    Another factor is a32 support. It's a much more complex design and doesn't do performance, power consumption, or die area any favors. Apple has ecosystem control, so they just dropped the complex parts and just did a64. This also drastically reduces time to design any particular part of the core and less time to verify everything meaning more time optimizing vs teams trying to do both at once.

    Finally, Apple has a vested interest in getting faster as fast as possible. Arm and the mobile market want gradual performance updates to encourage upgrades. Even if they could design an iPhone killer today, I don't think they would. There's already enough trouble with people believing their phones are fast enough as is.

    Apple isn't designing these chips for phones though. They make them for their pro tablets. The performance push is even more important for laptops though. The current chip is close to x86 in mobile performance. Their upcoming 5nm designs should be right at x86 performance for consumer applications while using a fraction of the power. They're already including a harvested mobile chip in every laptop for their T2. Getting rid of Intel on their MacBook air would do two things. It would improve profits per unit by a hundred dollars or so (that's almost 10% of low end models). It also threatens Intel to get them better deals on higher end models.

    We may see arm move in a similar direction, but they can't get away with mandating their users and developers change architectures. Their early attempts with things like the surface or server chips (a57 was mostly for servers with a72 being the more mobile-focused design) fell flat. As a result, they seem to be taking a conservative approach that eliminates risk to their core market.

    The success or failure of the 8cx will probably be extremely impactful on future arm designs. If it achieves success, then focusing on shipping powerful, 64-bit only chip designs seems much more likely. I like my Pixelbook, but I'd be willing to try an 8cx if the price and features were right (that includes support for Linux containers).
  • Raqia - Monday, December 16, 2019 - link

    Nice post! You're right, it really does seem like Apple's own implementations defined the ARM v8.x spec given how soon after ARM's release their chips dropped. ARM is also crimped by the need to address server markets so their chips have a more complex cache and uncore hierarchies than Apple's and generally smaller caches with lower single threaded performance. Their customers' area budgets are also more limited compared to Apple who doesn't generally integrate a modem into their SoC designs.
  • aliasfox - Monday, December 16, 2019 - link

    I would also add that Qualcomm only makes a dozen or so dollars per chip, whereas Apple makes hundreds of dollars per newest generation iPhone and iPad Pro. Qualcomm's business model just puts them at a disadvantage in this case - they have to make a chip that's not only competitive in performance, but at a low enough cost that a) they can make money selling it, and b) handset vendors can make money using it. Apple doesn't really have to worry about that because for all intents and purposes, their chip division is a part of their mobile division.

    I wonder if it's in the cards for Apple to ever include both an Intel processor as well as a full fledged mobile chip in the future, working in the same way as integrated/discrete graphics - the system would primarily run on the A13x, with the Intel chip firing up for Intel-binary apps as needed.
