GPU Performance

For 3D graphics and games, the Kirin 970 is the first SoC to make use of ARM’s second-generation Bifrost GPU architecture, Heimdall / G72. The new IP is an evolutionary update over last year’s Mali G71, with density and efficiency improvements.

The density increase as well as the process node shrink allowed HiSilicon to increase the GPU core count by 50%, from 8 to 12, while still reducing the absolute silicon area of the GPU block. There is no mincing words about last year’s G71 performance: the GPU unfortunately came nowhere near the efficiency goals projected by ARM in either the Exynos 8895 or the Kirin 960. The Kirin 960 was especially remarkable in that devices powered by it reached until-then unheard-of average power figures at peak performance states, around the 9W mark for the Mate 9. I still remember that two years ago I praised HiSilicon for implementing a GPU conservative enough that it could properly sustain its maximum performance state within the device thermal envelope, staying below 4W. Before continuing the power argument with the Kirin 970’s own figures, let’s go over the peak performance figures in the most commonly used industry 3D benchmarks.

3DMark Sling Shot 3.1 Extreme Unlimited - Overall

3DMark Sling Shot 3.1 Extreme Unlimited - Graphics

3DMark Sling Shot 3.1 Extreme Unlimited - Physics

In 3DMark Sling Shot 3.1 Extreme Unlimited we see the G72 in the Kirin 970, oddly enough, not improving at all. I ran the benchmark several times and made sure thermals weren’t the cause, but the phone still wasn’t able to improve on the Kirin 960 save for a small increase in the physics score. I’m not yet sure what the cause is here; I haven’t rooted the device yet and so couldn’t monitor GPU frequency, which means I can’t tell whether it’s using some kind of limiting mechanism.

GFXBench Car Chase ES 3.1 / Metal (Off Screen 1080p)

GFXBench Manhattan ES 3.1 / Metal (Off Screen 1080p)

GFXBench T-Rex HD (Offscreen)

Moving on to Kishonti’s GFXBench, we see the Kirin 970 achieve its theoretical gains of 15-20%. As a reminder, while the GPU core count increased 50% from 8 to 12 cores, the maximum frequency has been reduced considerably, from 1033MHz down to 746MHz, leaving only a marginal performance upgrade to be expected.
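To put the core count and clock figures above into perspective, here is a minimal back-of-the-envelope sketch. It assumes that raw GPU throughput scales linearly with cores × clock, which is a simplification rather than anything measured here, and the function names are purely illustrative. Under that assumption the wider, slower configuration is only about 8% ahead on paper, so the remainder of a 15-20% gain would have to come from per-core, per-clock improvements or other factors.

```python
# Back-of-the-envelope scaling, ASSUMING throughput ~ cores x clock (a simplification).

def raw_throughput(cores: int, mhz: float) -> float:
    """Relative raw throughput under the linear cores x clock assumption."""
    return cores * mhz

kirin_960 = raw_throughput(8, 1033)    # Mali G71MP8 at 1033MHz
kirin_970 = raw_throughput(12, 746)    # Mali G72MP12 at 746MHz

# Gain attributable to the configuration alone (more cores, lower clock).
raw_gain = kirin_970 / kirin_960 - 1

def implied_per_core_gain(observed_gain: float) -> float:
    """Per-core, per-clock gain needed to explain an observed overall gain."""
    return (1 + observed_gain) / (1 + raw_gain) - 1

print(f"configuration-only gain: {raw_gain:.1%}")
for observed in (0.15, 0.20):
    print(f"observed {observed:.0%} -> implied per-core gain "
          f"{implied_per_core_gain(observed):.1%}")
```

The split between configuration and architecture is only as good as the linear-scaling assumption; memory bandwidth and thermal limits are ignored entirely.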

The Kirin 970’s G72MP12 ends up slightly below the Exynos 8895’s G71MP20 and the Snapdragon 835’s Adreno 540 in more compute-bound workloads such as Manhattan 3.1 or Car Chase. In T-Rex the GPU has a slight lead over the Exynos 8895, but only when the device is cool, as it quickly starts throttling down from its maximum frequencies at slightly more elevated temperatures.

GPU Power Efficiency

 

GFXBench Manhattan 3.1 Offscreen Power Efficiency (System Active Power)

Device (SoC) | Mfc. Process | FPS | Avg. Power (W) | Perf/W Efficiency
-------------|--------------|-----|----------------|------------------
Galaxy S8 (Snapdragon 835) | 10LPE | 38.90 | 3.79 | 10.26 fps/W
LeEco Le Pro3 (Snapdragon 821) | 14LPP | 33.04 | 4.18 | 7.90 fps/W
Galaxy S7 (Snapdragon 820) | 14LPP | 30.98 | 3.98 | 7.78 fps/W
Huawei Mate 10 (Kirin 970) | 10FF | 37.66 | 6.33 | 5.94 fps/W
Galaxy S8 (Exynos 8895) | 10LPE | 42.49 | 7.35 | 5.78 fps/W
Meizu PRO 5 (Exynos 7420) | 14LPE | 14.45 | 3.47 | 4.16 fps/W
Nexus 6P (Snapdragon 810 v2.1) | 20SoC | 21.94 | 5.44 | 4.03 fps/W
Huawei Mate 8 (Kirin 950) | 16FF+ | 10.37 | 2.75 | 3.77 fps/W
Huawei Mate 9 (Kirin 960) | 16FFC | 32.49 | 8.63 | 3.77 fps/W
Huawei P9 (Kirin 955) | 16FF+ | 10.59 | 2.98 | 3.55 fps/W

In terms of average platform active power consumption, the Mate 10 shows a significant improvement over last year’s Mate 9. In Manhattan we go down from 8.63W to 6.33W. In terms of efficiency at similar peak performance, the Kirin 970 manages to only slightly outpace the Exynos 8895 and its Mali G71. The architectural improvements that the G72 promises to bring are counteracted by the fact that the Exynos uses more cores at lower frequencies (and more efficient voltages), with both ending up at a similar performance and efficiency point. The same effect applies between the Kirin 960 and 970, but in reverse. Here the addition of more cores at a lower frequency amplifies the process and architectural efficiency gains versus the G71, resulting in an absolute efficiency gain of 57% at peak performance, which comes near Huawei’s stated claim of a 50% efficiency gain. It’s to be noted that the true efficiency gain at the same performance point is likely near the 100% mark, meaning that at the Kirin 960’s peak performance level the Kirin 970 and its G72 implementation would be nearly twice as efficient.
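The two efficiency figures in this paragraph can be sketched numerically from the Manhattan table values. The peak comparison is plain arithmetic on the measured fps and watts. The iso-performance part is only a rough model: it assumes that platform power scales with performance raised to some exponent between 2 and 3 (a stand-in for frequency and voltage scaling), which is an assumption of this sketch and not something measured in the article. All names are illustrative.

```python
# Manhattan 3.1 off-screen figures from the table: (fps, average watts).
MATE_9 = (32.49, 8.63)    # Kirin 960
MATE_10 = (37.66, 6.33)   # Kirin 970

def perf_per_watt(fps: float, watts: float) -> float:
    return fps / watts

# Efficiency gain at each chip's own peak performance point.
peak_gain = perf_per_watt(*MATE_10) / perf_per_watt(*MATE_9) - 1

def iso_perf_ratio(exponent: float) -> float:
    """Estimated Kirin 970 vs Kirin 960 efficiency at the Kirin 960's fps.

    ASSUMPTION: power ~ performance ** exponent when slowing the Kirin 970
    down; static and non-GPU platform power are ignored.
    """
    target_fps, ref_watts = MATE_9
    fps, watts = MATE_10
    scale = target_fps / fps               # how far to slow the Kirin 970
    est_watts = watts * scale ** exponent  # modeled power at the lower fps
    return perf_per_watt(target_fps, est_watts) / perf_per_watt(target_fps, ref_watts)

print(f"peak efficiency gain: {peak_gain:.0%}")
for exponent in (2, 3):
    print(f"exponent {exponent}: iso-performance efficiency ratio "
          f"{iso_perf_ratio(exponent):.2f}x")
```

With unrounded inputs the peak gain lands within a point of the 57% derived from the rounded table column, and the modeled iso-performance ratio comes out somewhere around 1.8x to 2.1x depending on the exponent, which is in the neighbourhood of the "nearly double" estimate above.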

Whilst this all might sound optimistic in terms of performance and efficiency gains, it’s all rather meaningless as the Mate 10 and Kirin 970 average power drains are still far above sustainable thermal envelopes at 6.3W.

GFXBench T-Rex Offscreen Power Efficiency (System Active Power)

Device (SoC) | Mfc. Process | FPS | Avg. Power (W) | Perf/W Efficiency
-------------|--------------|-----|----------------|------------------
Galaxy S8 (Snapdragon 835) | 10LPE | 108.20 | 3.45 | 31.31 fps/W
LeEco Le Pro3 (Snapdragon 821) | 14LPP | 94.97 | 3.91 | 24.26 fps/W
Galaxy S7 (Snapdragon 820) | 14LPP | 90.59 | 4.18 | 21.67 fps/W
Galaxy S8 (Exynos 8895) | 10LPE | 121.00 | 5.86 | 20.65 fps/W
Galaxy S7 (Exynos 8890) | 14LPP | 87.00 | 4.70 | 18.51 fps/W
Huawei Mate 10 (Kirin 970) | 10FF | 127.25 | 7.93 | 16.04 fps/W
Meizu PRO 5 (Exynos 7420) | 14LPE | 55.67 | 3.83 | 14.54 fps/W
Nexus 6P (Snapdragon 810 v2.1) | 20SoC | 58.97 | 4.70 | 12.54 fps/W
Huawei Mate 8 (Kirin 950) | 16FF+ | 41.69 | 3.58 | 11.64 fps/W
Huawei P9 (Kirin 955) | 16FF+ | 40.42 | 3.68 | 10.98 fps/W
Huawei Mate 9 (Kirin 960) | 16FFC | 99.16 | 9.51 | 10.42 fps/W

Again on T-Rex, which is less ALU-heavy and more bound by texture, fill rate and triangle rate, we see the Kirin 970 reach impressive performance levels at impressively bad power figures. At 7.93W the phone doesn’t seem to be able to sustain its peak frequencies for long: even on a second consecutive run we see performance go down as thermal throttling kicks in. So while the Kirin 970 slightly outpaces the Exynos 8895 in performance, it does so at 25% lower efficiency.

The previous paragraph might sound dire, but against the Kirin 960 it’s again a vast improvement. So disastrous was the peak power of the Mate 9 that even at 28% higher peak performance, the Mate 10 still manages to be 53% more efficient, again validating Huawei’s marketing claims. At iso-performance I again estimate that the Kirin 970 is likely nearly twice as efficient as the Kirin 960.
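As a quick check of the T-Rex comparison, the sketch below recomputes the performance and efficiency deltas between the Mate 10 and Mate 9 directly from the table’s fps and power columns. The helper name is hypothetical; only the four measured values come from the table.

```python
# T-Rex off-screen figures from the table: (fps, average watts).
MATE_9_TREX = (99.16, 9.51)     # Kirin 960
MATE_10_TREX = (127.25, 7.93)   # Kirin 970

def relative_gains(new: tuple, old: tuple) -> tuple:
    """Return (performance gain, efficiency gain) of `new` over `old`."""
    new_fps, new_watts = new
    old_fps, old_watts = old
    perf_gain = new_fps / old_fps - 1
    eff_gain = (new_fps / new_watts) / (old_fps / old_watts) - 1
    return perf_gain, eff_gain

perf_gain, eff_gain = relative_gains(MATE_10_TREX, MATE_9_TREX)
print(f"performance: +{perf_gain:.0%}, efficiency: +{eff_gain:.0%}")
```

Working from the unrounded fps and watts rather than the rounded Perf/W column can shift the efficiency figure by about a percentage point.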

In all this you’ll probably have noticed Qualcomm consistently at the top of the charts. Indeed, over the last few generations Qualcomm seems to be the only company that has managed to increase performance through architectural and process node improvements without ever blowing up the power budget. On the contrary, Qualcomm seems to be able to steadily lower average power generation after generation, reaching an extremely impressive 3.5-3.8W on the Snapdragon 835. It’s widely quoted that a mobile GPU’s power budget is 1.5-2W, but over the last few years the only high-end GPU able to achieve that seems to be Adreno, and the gap seems to be widening with every generation.

In my review of the Mate 8 there were a lot of users in the comments section who still deemed the performance of the T880MP4 in the Kirin 950 unsatisfactory and uncompetitive. Unfortunately this view is widespread among most users and most media, and it was one of the main complaints about Huawei devices in the past. Today Huawei is able to compete at the top of the benchmarks, but at a rather ghastly hidden cost in efficiency and unsustainable power that is, to be perfectly honest, a lot harder to test and to communicate to users.

AnandTech is also partly guilty here; you just have to look at the top of the page. I really shouldn’t have published those performance benchmarks, as they’re outright misleading and reward the misplaced design decisions made by the silicon vendors. I’m still not sure what to do here or on whom the onus falls. As long as vendors keep away from configuring devices with unreachable and unsustainable performance states on 3D workloads and stay within reasonable levels, the whole topic becomes a non-issue. If things don’t improve, we’ll have to take a hard look at how to handle these situations; I’m considering simply no longer posting any GPU peak performance figures in device reviews and keeping them in separate, more technical SoC pieces such as this one.

Overall I think we’re at a critical point in time for the mobile GPU landscape. Qualcomm currently holds such an enormous lead in performance, density and efficiency that other silicon vendors who rely on IP vendors for their GPUs are in a tight and precarious situation in terms of their ability to offer competitive products. I see this as a key catalyst for why Apple has stated that it plans to abandon Imagination as its GPU IP provider in upcoming SoCs, and why Samsung has accelerated efforts to replace Mali and introduce its in-house S-GPU, maybe as early as 2019. Over the course of the next two years we’ll be seeing some exciting shake-ups of the SoC GPU space, that’s for sure.


116 Comments


  • lilmoe - Monday, January 22, 2018 - link

    Unfortunately, they're not "fully" vertical as of yet. They've been held back since the start by Qualcomm's platform, because of licensing and "other" issues that no one seems to be willing to explain. Like Andrei said, they use the lowest common denominator of both the Exynos and Snapdragon platforms, and that's almost always lower on the Snapdragons.

    Where I disagree with Andrei, and others, is on the efficiency numbers and the type of workloads used to reach those results. Measuring efficiency at MAX CPU and GPU load is unrealistic, and frankly, misleading. Under no circumstance is there a smartphone workload that demands that kind of constant load from either the CPU or GPU. A better measure would be running an actual popular game for 30 mins in airplane mode and measuring power consumption accordingly, or loading popular websites, using the native browser, and measuring power draw at set intervals for a set period of time (not even a benchmarking web application).

    Again, these platforms are designed for actual, real world, modern smartphone workloads, usually running Android. They do NOT run workstation workloads and shouldn't be measured as such. Such notions, like Andrei has admitted, are what push OEMs to be "benchmark competitive", not "experience competitive". Apple is also guilty of this (proof is in the latest events, where their power delivery can't handle the SoC, or the SoC is designed well above sustainable TDP). I can't stress this enough. You just don't run SPEC and then measure "efficiency". It just doesn't work that way. There is no app out there that stresses a smartphone SoC this much, not even the leading game. As a matter of fact, there isn't an Android (or iPhone) game that saturates last year's flagship GPU (probably not even the year before's).

    We've reached a point of perfectly acceptable CPU and GPU performance for flagships running 1080p and 1440p resolution screens. Co-processors, such as the decoder, ISP, DSP and NPU, in addition to software optimization, are far, FAR more important at this time, and what Huawei has done with their NPU is very interesting and meaningful. Kudos to them. I just hope these co-processors are meant to improve the experience, not collect and process private user data in any form.
  • star-affinity - Monday, January 22, 2018 - link

    Just curious about your claims about Apple – so you think it's a design fault? I'm thinking that the problem arises only when the battery has been worn out, and a healthy battery won't have the problem of not supplying enough juice for the SoC.
  • lilmoe - Monday, January 22, 2018 - link

    Their batteries are too small, by design, so that's the first design flaw. But that still shouldn't warrant unexpected slowdowns within 12-18 months of normal usage; their SoCs are too power hungry at peak performance, and the constant bursts were taking their toll on the already smaller batteries that weren't protected with a proper power delivery system. It goes both ways.
  • Samus - Monday, January 22, 2018 - link

    Exactly this. Apple still uses 1500mAh batteries in 4.7" phones. When more than half the energy is depleted in a cell this small, the nominal voltage drops to 3.6-3.7V from the 3.9-4.0V peak. A sudden spike in demand on a cell hovering around 3.6V could cause it to hit the low-voltage cutoff, normally 3.4V for Li-Ion and 3.5V for Li-Polymer. To prevent damage to the chemistry, the internal power management will shut the phone down, or slow the phone down to prevent these voltage drops.

    Apple designed their software to protect the hardware. It isn't necessarily a hardware problem, it's just an inherently flawed design. A larger battery that can sustain voltage drops, or even a capacitor, would help, but both take up "valuable space" according to Apple, like that headphone jack that was eliminated for no reason. A guy even successfully reinstalled a headphone jack in an iPhone 7 without losing any functionality...it was just a matter of relocating some components.
  • ZolaIII - Wednesday, January 24, 2018 - link

    Try the Dolphin emulator and you will see not only how stressed the GPU is but also how much more performance it needs.
  • Shadowfax_25 - Monday, January 22, 2018 - link

    "Rather than using Exynos as an exclusive keystone component of the Galaxy series, Samsing has instead been dual-sourcing it along with Qualcomm’s Snapdragon SoCs."

    This is a bit untrue. It's well known that Qualcomm's CDMA patents are the stumbling block for Samsung. We'll probably see Exynos-based models in the US within the next two versions once Verizon phases out their CDMA network.
  • Andrei Frumusanu - Monday, January 22, 2018 - link

    Samsung has already introduced a CDMA capable Exynos in the 7872 and also offers a standalone CDMA capable modem (S359). Two years ago when I talked to SLSI's VP they openly said that introducing CDMA is not a technical issue and that it would take them two years to bring it to market once they decided they needed to do so (hey, maybe I was the catalyst!), but they didn't clarify the reason why it wasn't done earlier. Of course the whole topic is a hot mess and we can only speculate as outsiders.
  • KarlKastor - Thursday, January 25, 2018 - link

    Uh, how many devices have shipped yet with the 7872?
    Why do you think they came with an MDM9635 in the Galaxy S6 in all CDMA2000 regions? In all other regions they used their integrated Shannon modem.
    The other option is to use a Snapdragon SoC with a QC modem. They also opt for this alternative, but in the S6 they didn't want to use the crappy Snapdragon 810.

    It is possible that Qualcomm today drops its policies concerning CDMA2000 because it is obsolete.
  • jjj - Monday, January 22, 2018 - link

    Don't forget that Qualcomm is a foundry customer for Samsung and that could be why they still use it.
    Also, cost is a major factor when it comes to vertical integration, at sufficient scale integration can be much cheaper.
    What Huawei isn't doing is to prioritize the user experience and use their high end SoCs in lower end devices too, that's a huge mistake. They got much lower costs than others in high end and gaining scale by using these SoCs in lower end devices, would decrease costs further. It's an opportunity for much more meaningful differentiation that they fail to exploit. Granted, the upside is being reduced nowadays by upper mid range SoCs with big cores and Huawei might be forced into using their high end SoCs more as the competition between Qualcomm and Mediatek is rather ferocious and upper mid becomes better and better.

    Got to wonder about A75 and the clocks it arrives at ... While at it, I hope that maybe you take a close look at the SD670 when it arrives as it seems it will slightly beat SD835 in CPU perf.

    On the GPU side, the biggest problem is the lack of real world tests. In PC we have that and we buy what we need, in mobile somehow being anything but first is a disaster and that's nuts. Not everybody needs a Ferrari but mobile reviews are trying to sell one to everybody.
  • HStewart - Monday, January 22, 2018 - link

    This could be a good example of why Windows 10 for ARM will fail - it only works on Qualcomm CPUs, and that could explain why Samsung created Intel-based Windows tablets.

    I do believe that ARM, especially Samsung, has a good market in phones and tablets - I love my Samsung Tab S3 but I also love my Samsung TabPro S - both have different purposes.
