GPU Performance & Power

The Kirin 980 is the first SoC to sport Arm’s newest-generation Mali G76 GPU. In simplified terms, the new IP differs significantly from previous generations in that the GPU cores are essentially twice as big and twice as capable as the previous-generation Mali G72 cores. So while the G76MP10 configuration of the Kirin 980 might sound small, it’s not small at all in terms of theoretical performance.

GPU performance and efficiency have been a big thorn in the side of both the Kirin 960 and 970, as both SoCs showcased less than stellar power figures, which in turn forced limited GPU clocks and performance. It’s here that Huawei made the biggest promises in terms of improvements: a claimed 46% increase in performance alongside a staggering 178% increase in power efficiency. The latter figure especially caught some attention, as you just don’t see such increases in the industry.

3DMark Sling Shot 3.1 Extreme Unlimited - Physics

Starting off with 3DMark Sling Shot Extreme Unlimited and the Physics sub-test, we see the Mate 20s showcase leading peak performance figures. This is mainly a CPU test with only a minor GPU load. The performance jump here undoubtedly comes from the new Cortex A76 microarchitecture.

In terms of sustained performance, we see some diverging figures between the Mate 20 and Mate 20 Pro, as the Pro is able to reach much higher sustained scores. Before drawing any conclusions, it’s worth also looking at the GPU results.

3DMark Sling Shot 3.1 Extreme Unlimited - Graphics

On the Graphics sub-test, we see both new Mate 20s reach respectable peak performance figures; however, both still throttle quite a lot before they reach thermal equilibrium. Compared to Kirin 970 devices on stock firmware, such as the P20 Pro, the performance increase is nevertheless quite significant.

GFXBench Aztec Ruins - High - Vulkan/Metal - Off-screen

GFXBench Aztec Ruins - Normal - Vulkan/Metal - Off-screen

In the new Aztec Ruins Vulkan benchmarks, in both High and Normal quality modes, we see some really odd performance behaviour. While the peak performance isn’t all that great, the sustained performance is almost identical to it. On the Normal run the Mate 20 Pro was actually able to maintain higher performance than the Mate 20, something that we also saw in the 3DMark Physics run. It would be interesting to know whether the benchmark is in some way CPU bound, or whether the devices have different thermal limits for Vulkan and OpenGL ES workloads.

GFXBench Manhattan 3.1 Off-screen

In Manhattan 3.1, we again see respectable performance gains in both peak and sustained figures. Compared to the Vulkan runs, these scores showcase a more expected delta between peak and sustained. The Kirin 980 here generally matches most Snapdragon 845 devices, short of the OnePlus 6 and G7, which seem to allow much higher sustained power limits.

GFXBench Manhattan 3.1 Offscreen Power Efficiency (System Active Power)

Device (SoC)                    Mfc. Process  FPS     Avg. Power (W)  Perf/W Efficiency
iPhone XS (A12) Warm            7FF           76.51   3.79            20.18 fps/W
iPhone XS (A12) Cold / Peak     7FF           103.83  5.98            17.36 fps/W
Galaxy S9+ (Snapdragon 845)     10LPP         61.16   5.01            11.99 fps/W
Huawei Mate 20 Pro (Kirin 980)  7FF           54.54   4.57            11.93 fps/W
Galaxy S9 (Exynos 9810)         10LPP         46.04   4.08            11.28 fps/W
Galaxy S8 (Snapdragon 835)      10LPE         38.90   3.79            10.26 fps/W
LeEco Le Pro3 (Snapdragon 821)  14LPP         33.04   4.18            7.90 fps/W
Galaxy S7 (Snapdragon 820)      14LPP         30.98   3.98            7.78 fps/W
Huawei Mate 10 (Kirin 970)      10FF          37.66   6.33            5.94 fps/W
Galaxy S8 (Exynos 8895)         10LPE         42.49   7.35            5.78 fps/W
Galaxy S7 (Exynos 8890)         14LPP         29.41   5.95            4.94 fps/W
Meizu PRO 5 (Exynos 7420)       14LPE         14.45   3.47            4.16 fps/W
Nexus 6P (Snapdragon 810 v2.1)  20Soc         21.94   5.44            4.03 fps/W
Huawei Mate 8 (Kirin 950)       16FF+         10.37   2.75            3.77 fps/W
Huawei Mate 9 (Kirin 960)       16FFC         32.49   8.63            3.77 fps/W
Huawei P9 (Kirin 955)           16FF+         10.59   2.98            3.55 fps/W

Looking at the power efficiency during Manhattan 3.1, we unfortunately see that the phone and chipset didn’t quite meet my projections in efficiency. Performance is exactly where it should be; however, power is off by about 1W, as I had hoped to see about 3.5W peak power against the 4.57W measured. At the peak performance of both chipsets, the Kirin 980 showcases a 100% efficiency gain over the Kirin 970 (11.93 versus 5.94 fps/W), which is still a pretty massive generational improvement, even if the previous generation didn’t exactly set the bar all that high.
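As a quick sanity check on the figures above, here is a minimal Python sketch that recomputes perf/W and the generational gains from the FPS and average power columns of the Manhattan 3.1 table. The function names are my own, and because it divides the rounded table values, the results can differ from the table's Perf/W column in the last digit.

```python
# FPS and average power copied from the Manhattan 3.1 off-screen table.
kirin_980 = {"fps": 54.54, "watts": 4.57}  # Huawei Mate 20 Pro
kirin_970 = {"fps": 37.66, "watts": 6.33}  # Huawei Mate 10

# Peak power the author had hoped to see, as stated in the text.
PROJECTED_WATTS = 3.5


def perf_per_watt(result):
    """Return efficiency in frames per second per watt."""
    return result["fps"] / result["watts"]


def percent_gain(new, old):
    """Return the relative improvement of new over old, in percent."""
    return (new / old - 1.0) * 100.0


eff_980 = perf_per_watt(kirin_980)
eff_970 = perf_per_watt(kirin_970)

print(f"Kirin 980: {eff_980:.2f} fps/W")
print(f"Kirin 970: {eff_970:.2f} fps/W")
print(f"Efficiency gain: {percent_gain(eff_980, eff_970):.0f}%")
print(f"Performance gain: {percent_gain(kirin_980['fps'], kirin_970['fps']):.0f}%")
print(f"Power above projection: {kirin_980['watts'] - PROJECTED_WATTS:.2f} W")
```

The efficiency ratio comes out at roughly 2x, which is where the 100% figure comes from.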

With regard to Huawei’s 178% power efficiency claim during the chipset’s announcement: I still think this number is correct; however, this was evidently a traditional case of somewhat misleading presentation, or a mixup between “or” and “and” in the relationship between the performance and power efficiency improvements. In hindsight, the 178% efficiency figure likely refers to the efficiency advantage of the Kirin 980 at the same performance as the Kirin 970, which, given the measured power figures here, is definitely plausible.
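To put the iso-performance reading into numbers, the sketch below works out what power draw a 178% efficiency gain would imply if the Kirin 980 ran at the Kirin 970's measured Manhattan 3.1 peak. This is purely illustrative arithmetic: the function name is hypothetical, and no measurement at that operating point was taken here.

```python
# Measured Kirin 970 peak figures from the Manhattan 3.1 table (Huawei Mate 10).
K970_FPS = 37.66
K970_WATTS = 6.33

# Huawei's claimed efficiency improvement, in percent.
CLAIMED_GAIN_PCT = 178.0


def implied_power_at_same_fps(base_fps, base_watts, gain_pct):
    """Power a chip would draw at base_fps if its fps/W were gain_pct higher."""
    base_eff = base_fps / base_watts
    target_eff = base_eff * (1.0 + gain_pct / 100.0)
    return base_fps / target_eff


watts = implied_power_at_same_fps(K970_FPS, K970_WATTS, CLAIMED_GAIN_PCT)
print(f"Implied Kirin 980 power at {K970_FPS} fps: {watts:.2f} W")
```

Under that assumption the chip would need to deliver Kirin 970-level performance at a bit under 2.3W. Running a GPU at lower clocks and voltages generally improves its efficiency, so that is not an unreasonable figure given the 4.57W measured at the higher peak.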

GFXBench T-Rex 2.7 Off-screen

In T-Rex, the peak performance improvements over the Kirin 970 are far smaller, and I do wonder exactly what the bottleneck here is. Nevertheless, the sustained performance jumps 50%, but yet again this is just for the Mate 20 Pro, as the regular Mate 20 sees far more severe throttling. T-Rex is likely CPU bound in many ways, as it hits very high frame rates on modern SoCs.

GFXBench T-Rex Offscreen Power Efficiency (System Active Power)

Device (SoC)                    Mfc. Process  FPS     Avg. Power (W)  Perf/W Efficiency
iPhone XS (A12) Warm            7FF           197.80  3.95            50.07 fps/W
iPhone XS (A12) Cold / Peak     7FF           271.86  6.10            44.56 fps/W
Galaxy S9+ (Snapdragon 845)     10LPP         150.40  4.42            34.00 fps/W
Galaxy S9 (Exynos 9810)         10LPP         141.91  4.34            32.67 fps/W
Galaxy S8 (Snapdragon 835)      10LPE         108.20  3.45            31.31 fps/W
Huawei Mate 20 Pro (Kirin 980)  7FF           135.75  4.64            29.25 fps/W
LeEco Le Pro3 (Snapdragon 821)  14LPP         94.97   3.91            24.26 fps/W
Galaxy S7 (Snapdragon 820)      14LPP         90.59   4.18            21.67 fps/W
Galaxy S8 (Exynos 8895)         10LPE         121.00  5.86            20.65 fps/W
Galaxy S7 (Exynos 8890)         14LPP         87.00   4.70            18.51 fps/W
Huawei Mate 10 (Kirin 970)      10FF          127.25  7.93            16.04 fps/W
Meizu PRO 5 (Exynos 7420)       14LPE         55.67   3.83            14.54 fps/W
Nexus 6P (Snapdragon 810 v2.1)  20Soc         58.97   4.70            12.54 fps/W
Huawei Mate 8 (Kirin 950)       16FF+         41.69   3.58            11.64 fps/W
Huawei P9 (Kirin 955)           16FF+         40.42   3.68            10.98 fps/W
Huawei Mate 9 (Kirin 960)       16FFC         99.16   9.51            10.42 fps/W

Again, the power efficiency as measured on T-Rex sees a significant jump over the Kirin 970; however, most of this improvement simply goes towards reducing the actual power usage from the ridiculously high values of its predecessor, with only a small gain in peak performance.
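The split between "less power" and "more performance" can be made explicit by factoring the efficiency ratio into a performance ratio and a power ratio. The following is a small sketch using the T-Rex table values; the helper name is my own.

```python
# (FPS, average power in W) from the T-Rex off-screen table.
K980_FPS, K980_WATTS = 135.75, 4.64  # Huawei Mate 20 Pro
K970_FPS, K970_WATTS = 127.25, 7.93  # Huawei Mate 10


def decompose_efficiency_gain(new_fps, new_watts, old_fps, old_watts):
    """Split the fps/W ratio into its performance and power components.

    efficiency_ratio == perf_ratio * power_saving_ratio, where
    power_saving_ratio is old power divided by new power.
    """
    perf_ratio = new_fps / old_fps
    power_saving_ratio = old_watts / new_watts
    return perf_ratio, power_saving_ratio, perf_ratio * power_saving_ratio


perf, power, eff = decompose_efficiency_gain(
    K980_FPS, K980_WATTS, K970_FPS, K970_WATTS
)

print(f"Peak performance change: {(perf - 1) * 100:+.1f}%")
print(f"Power draw change:       {(1 / power - 1) * 100:+.1f}%")
print(f"Efficiency change:       {(eff - 1) * 100:+.1f}%")
```

With these inputs the performance factor is only slightly above 1, while the power factor is about 1.7, so the bulk of the efficiency improvement does come from the reduced power draw.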

I wouldn’t take this as a definitive verdict on the Mali G76 as of yet, as over the last 3 generations Samsung has been able to extract much better results out of their GPU implementations inside the Exynos SoCs than what HiSilicon was able to achieve in the Kirins. The next generation Exynos 9820 should be able to do better than this, so maybe that’s where the Mali G76 will hit its projected targets.

Overall, the Kirin 980 is definitely posting substantial improvements over its predecessor; however, Arm’s Mali GPU still seems to lag a tad behind the higher-end competition from Apple and Qualcomm. What is definitely positive for Huawei is that the new SoC is finally able to shed the atrocious performance showcased by the previous generation chipsets, and is now actually competitive with most recent devices.

141 Comments

  • Javert89 - Friday, November 16, 2018 - link

    Perhaps the most interesting part is missing :( How does the middle cluster at 1.92 GHz do in terms of performance and power? The same performance as a 2.8 GHz A75 at half the power usage?
  • Andrei Frumusanu - Friday, November 16, 2018 - link

    I couldn't test it without root.
  • ternnence - Friday, November 16, 2018 - link

    try syscall(__NR_sched_setaffinity, pid, sizeof(mask), &mask)
  • ternnence - Friday, November 16, 2018 - link

    FYI, https://stackoverflow.com/questions/7467848/is-it-...
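For readers who want to try the affinity approach suggested above, here is a minimal sketch in Python, whose `os.sched_setaffinity` wraps the same Linux call. The CPU ids in `ASSUMED_MIDDLE_CLUSTER` are an assumption for illustration only and must be checked against the device's actual topology. The helper `pin_to_cpus` is hypothetical, and whether the call is permitted without root on a given phone is not something this sketch can guarantee.

```python
import os

# Assumption for illustration: which CPU ids form the middle cluster.
# Verify against the real topology of the device before relying on this.
ASSUMED_MIDDLE_CLUSTER = {4, 5}


def pin_to_cpus(wanted, pid=0):
    """Restrict pid (0 = calling process) to the wanted CPU ids.

    Only ids the process is currently allowed to run on are used.
    Returns the previous affinity set so the caller can restore it.
    """
    allowed = os.sched_getaffinity(pid)
    usable = set(wanted) & allowed
    if not usable:
        raise ValueError(f"none of {sorted(wanted)} is in {sorted(allowed)}")
    os.sched_setaffinity(pid, usable)
    return allowed


# Portable demo: pin to the lowest allowed CPU, then restore.
previous = pin_to_cpus({min(os.sched_getaffinity(0))})
print("now restricted to:", sorted(os.sched_getaffinity(0)))
os.sched_setaffinity(0, previous)
```

On a real test run, one would call `pin_to_cpus(ASSUMED_MIDDLE_CLUSTER)` before starting the benchmark workload, so that the scheduler keeps it on those cores.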
  • pjcamp - Friday, November 16, 2018 - link

    If it weren't for Huawei's aggressively belligerent stance against unlocked bootloaders . . . .
  • name99 - Friday, November 16, 2018 - link

    Andrei, can you please explain something that I just do not understand in any of these phone reviews (Apple or Android).
    The die shots always show 4x 16-wide LPDDR4 PHYs. OK, so 64-bit wide channel to DRAM, seems reasonable.

    Now the fastest normal LPDDR4 is LPDDR4-2133, which in any normal naming scheme would imply 2,133MT/s. So one transaction, 8 bytes wide, gives us guaranteed-not-to-exceed of 17GB/s.
    But of course Huawei's Geekbench4 memory bandwidth is ~22GB/s. Maybe Huawei are using slightly faster LPDDR4-2166 or whatever, but the details don't change --- the only way the numbers work out is if the "maximum bandwidth" of the DRAM is actually around 34 GB/s.

    Which implies that EITHER
    - LPDDR4-2133 does NOT mean 2133MT/sec. (But that's what common sense would suggest, and this recent AnandTech article on DDR5
    https://www.anandtech.com/show/13605/sk-hynix-deve... )

    OR

    - somehow there is 128-bits of width between all the high-end phone SoCs (either 2 independent 64-bit channels [more likely IMHO] or a single 128-bit wide channel).

    Can you clarify?
  • anonomouse - Friday, November 16, 2018 - link

    It’s 2133MHz IO and it’s DDR, so 4266MT/s. Each LPDDR4 channel is 16 bits. Hence the common listing of LPDDR4X-4266.

    Usually these are advertised/listed at the MT/s rate so DDR4-2666 has an IO clock of 1333MHz. Main difference being that DDR4 has a 64 bit channel width.
  • name99 - Friday, November 16, 2018 - link

    But then look at the article I gave, for DDR5
    https://www.anandtech.com/show/13605/sk-hynix-deve...

    This includes sentences like "The new DDR5 chip from SK Hynix supports a 5200 MT/sec/pin data transfer rate, which is 60% faster than the 3200 MT/s rate officially supported by DDR4."
    which strongly implies that a DDR4-3200 is NOT running at 6400 MT/s.

    WTF is going on here? Micron lists their LPDDR4, for example, as LPDDR4-2133, NOT as LPDDR4-4266?
  • N Zaljov - Sunday, November 18, 2018 - link

    I fail to see any issues with the current naming convention, apart from being confusing asf.

    "Micron lists their LPDDR4, for example, as LPDDR4-2133, NOT as LPDDR4-4266" - of course they are: https://www.micron.com/parts/dram/mobile-ddr4-sdra...

    Although there seems to be a typo in the specs of their partlists, which can be confusing, but they are clearly listing their LPDDR4(x) as LPDDR4-4266 (or, typoed, LPDDR4-4166), with an I/O clk of 2133 MHz and an actual memory clockspeed of around 533,3 MHz (on-demand modulation will keep the clock of the memory arrays somewhere between 533,25 and 533,35, depending on the load).
  • Andrei Frumusanu - Friday, November 16, 2018 - link

    The DSU's interface is limited at 2x 128bit per ACE interface to the memory subsystem/interconnect (32B/cycle in each direction) times the frequency of the DSU/L3 of which we aren't certain in the Kirin 980, but let's take the S845 which runs at 1478MHz IIRC: ~47GB/s. Plenty enough. We don't know the interconnect bandwidth from the DSU to the memory controller. The memory controllers themselves internally run at a different frequency (usually half) but what matters is talking about the DRAM speed. The Kirin 980/Mate 20's run on LPDDR4X at 2133MHz, or actually 4266MT/s because it's DDR. That's a peak of 4*16*4266/8=34.12GB/s.

    The actual answer is a lot simpler and more stupid. Geekbench 4's multi-threaded memory test just caps out at 2 threads, so in reality there's only ever two CPUs stressing the memory controller. Beyond this I've been told by some vendors that it doesn't scale in the test itself.

    My conclusion: Ignore all the GB4 memory tests.
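The bandwidth arithmetic from this thread can be reproduced with a short Python sketch. It uses only figures quoted in the comments above (4x 16-bit channels, 2133 versus 4266 MT/s, 32B/cycle at the Snapdragon 845's 1478MHz DSU clock). The function name is my own and 1 GB is taken as 1000 MB.

```python
def peak_bandwidth_gbs(channels, bits_per_channel, mega_transfers_per_s):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1000 MB)."""
    bytes_per_transfer = channels * bits_per_channel / 8
    return bytes_per_transfer * mega_transfers_per_s / 1000.0


# Reading "2133" as the transfer rate gives the ~17 GB/s ceiling from the question.
naive = peak_bandwidth_gbs(4, 16, 2133)

# Reading it as the I/O clock of a DDR interface doubles the transfer rate.
actual = peak_bandwidth_gbs(4, 16, 4266)

# DSU ceiling quoted for the Snapdragon 845: 32 bytes/cycle at 1478 MHz.
dsu = 32 * 1478 / 1000.0

print(f"4x16-bit @ 2133 MT/s: {naive:.1f} GB/s")
print(f"4x16-bit @ 4266 MT/s: {actual:.1f} GB/s")
print(f"DSU @ 1478 MHz:       {dsu:.1f} GB/s")
```

Both the DRAM peak and the DSU ceiling sit well above the ~22GB/s Geekbench 4 result, which is consistent with the test itself, and not the memory subsystem, being the limit.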
