The Detailed Explanation of GPU Turbo

Under the hood, Huawei uses TensorFlow neural network models that are pre-trained by the company on a title-by-title basis. By examining the title in detail, over many thousands of hours (real or simulated), the neural network can build its own internal model of how the game runs and its power/performance requirements. The end result can be put into one dense sentence:

Optimized Per-Device Per-Game DVFS Control using Neural Networks

In the training phase, the network analyzes and adjusts the SoC's DVFS parameters in order to achieve the best possible performance while minimizing power consumption. In practice this means selecting the lowest DVFS states on the CPUs, GPU, and memory controllers that still allow the game to hit 60fps, without stepping to any higher state than necessary (in other words, minimizing performance headroom). The end result is that for every unit of work the CPU/GPU/DRAM has to do or manage, the corresponding hardware block receives just the amount of power it needs. This has a knock-on effect on both performance and power consumption, though mostly the latter.
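
Huawei has not disclosed the model architecture or the exact loss it trains against, but the optimization target is straightforward to illustrate: heavily penalize any frame that misses the 16.7ms budget of a 60fps target, and otherwise penalize power draw so that excess headroom is never rewarded. The sketch below is purely our own illustration of that idea, with made-up names and weights, and is not Huawei's code:

```python
# Hypothetical illustration of the kind of objective GPU Turbo's training
# would optimize: hit 60fps (16.7ms per frame) at the lowest-power DVFS
# states possible. Names, weights, and structure are assumptions.

FRAME_BUDGET_MS = 1000.0 / 60.0  # 60fps target

def dvfs_cost(frame_time_ms: float, power_mw: float,
              miss_penalty: float = 100.0, power_weight: float = 0.01) -> float:
    """Lower is better: heavily penalize missed frames, mildly penalize power."""
    cost = power_weight * power_mw            # every milliwatt costs something
    if frame_time_ms > FRAME_BUDGET_MS:       # frame missed the 60fps deadline
        cost += miss_penalty * (frame_time_ms - FRAME_BUDGET_MS)
    return cost

# A state that renders in 16.0ms at 2500mW beats one that renders in 14.0ms
# at 3200mW, because the extra headroom buys nothing at a fixed 60fps cap.
print(dvfs_cost(16.0, 2500.0))   # 25.0
print(dvfs_cost(14.0, 3200.0))   # 32.0
```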

The resulting model is then included in the firmware for devices that support GPU Turbo. Each title has a specific network model for each smartphone, as the workload varies with the title and the available resources vary with the phone model. As far as we understand the technology, on the device itself there appears to be an interception layer between the application and the GPU driver which monitors render calls; these serve as inputs to the neural network model. Because the network model was trained to output the optimal DVFS settings for a given scene, the GPU Turbo mechanism can apply its predictions immediately and adjust the hardware's DVFS accordingly.
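
Huawei did not say how the interception layer hooks into the driver, but the per-frame control loop it implies can be sketched roughly as follows. Every name here, from the feature vector to the operating point tables and the frequency-setting helper, is an assumption made for illustration, not Huawei's implementation:

```python
# Hypothetical sketch of the per-frame control loop GPU Turbo implies:
# intercepted render-call statistics feed a pre-trained per-title model,
# whose output is mapped onto discrete DVFS operating points.
# All names, features, and interfaces below are illustrative assumptions.

from dataclasses import dataclass
import numpy as np

GPU_FREQS_MHZ = [400, 533, 667, 767]       # example GPU OPP table
DRAM_FREQS_MHZ = [800, 1066, 1333, 1866]   # example DRAM OPP table

@dataclass
class RenderCall:
    vertex_count: int
    texture_bytes: int

def frame_features(calls: list[RenderCall]) -> np.ndarray:
    """Reduce one frame's intercepted render calls to a fixed-size feature vector."""
    return np.array([len(calls),                              # draw call count
                     sum(c.vertex_count for c in calls),      # geometry load
                     sum(c.texture_bytes for c in calls)],    # bandwidth pressure
                    dtype=np.float32)

def set_frequency(block: str, freq_mhz: int) -> None:
    """Stand-in for whatever kernel interface actually sets the operating point."""
    print(f"{block} -> {freq_mhz} MHz")

def apply_dvfs(model, calls: list[RenderCall]) -> None:
    """Run the per-title model and pin DVFS states for the coming frame."""
    gpu_idx, dram_idx = model(frame_features(calls))   # model outputs OPP indices
    set_frequency("gpu", GPU_FREQS_MHZ[gpu_idx])
    set_frequency("dram", DRAM_FREQS_MHZ[dram_idx])

# Usage with a stub "model" that always picks the second-lowest states:
apply_dvfs(lambda features: (1, 1),
           [RenderCall(12_000, 4_000_000), RenderCall(8_000, 2_500_000)])
```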

For SoCs that have one, the inferencing (execution) of the network model is accelerated by the SoC's own NPU, which allows for extremely fast predictions; where GPU Turbo is introduced on SoCs that don't sport an NPU, a CPU software fallback is used instead. One thing I do have to wonder about is just how much rendering latency this induces; it can't be much, though, and Huawei says it focuses a lot on this area of the implementation. Huawei confirmed that these models are all 16-bit floating point (FP16), which means that for future devices like the Kirin 980, further optimization might come from INT8 models that take advantage of the new NPU's support for that data type.
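
Huawei only confirmed the FP16 data type; it did not say which toolchain produces the deployed models or how an INT8 variant would be generated. Purely as a point of reference, this is roughly what FP16 and full-integer post-training quantization look like in stock TensorFlow Lite, with a hypothetical model path and random calibration data standing in for logged render features:

```python
# Generic TensorFlow Lite post-training quantization, shown only as a point
# of reference for the FP16 -> INT8 step discussed above. The saved-model
# path and the calibration data are assumptions; this is not Huawei's toolchain.

import numpy as np
import tensorflow as tf

def representative_frames():
    """Calibration samples (random here) standing in for logged render features."""
    for _ in range(100):
        yield [np.random.rand(1, 3).astype(np.float32)]

# FP16 quantization: weights stored as float16, roughly halving model size.
conv = tf.lite.TFLiteConverter.from_saved_model("dvfs_model")  # hypothetical path
conv.optimizations = [tf.lite.Optimize.DEFAULT]
conv.target_spec.supported_types = [tf.float16]
fp16_model = conv.convert()

# Full INT8 quantization: needs representative data to calibrate activations.
conv = tf.lite.TFLiteConverter.from_saved_model("dvfs_model")
conv.optimizations = [tf.lite.Optimize.DEFAULT]
conv.representative_dataset = representative_frames
conv.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
int8_model = conv.convert()
```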

Essentially, GPU Turbo is a DVFS mechanism that works in conjunction with the rendering pipeline at a much finer granularity, which means it is able to predict the hardware requirements of the coming frame and adjust accordingly. This is how GPU Turbo in particular is able to claim much reduced performance jitter versus more conventional "reactive" DVFS drivers, which just monitor GPU utilization via hardware counters and adapt after the fact.
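
For contrast, a conventional reactive governor behaves roughly like the sketch below: it reads a utilization counter after the work has already happened and steps the clock one sampling window too late. The thresholds and operating points are generic illustrations rather than any specific vendor's governor:

```python
# Generic illustration of a reactive, utilization-driven GPU governor, the
# kind of "after-the-fact" DVFS the article contrasts GPU Turbo against.
# Thresholds and the OPP table are illustrative, not any vendor's values.

GPU_FREQS_MHZ = [400, 533, 667, 767]

def reactive_step(current_idx: int, utilization: float,
                  up_threshold: float = 0.90, down_threshold: float = 0.60) -> int:
    """Pick the next OPP from the *previous* window's utilization counter."""
    if utilization > up_threshold and current_idx < len(GPU_FREQS_MHZ) - 1:
        return current_idx + 1          # busy last window -> clock up (too late)
    if utilization < down_threshold and current_idx > 0:
        return current_idx - 1          # idle last window -> clock down
    return current_idx

# A sudden heavy frame is only answered one window later, which is exactly
# the jitter a predictive, per-frame model tries to avoid.
idx = 0
for util in [0.5, 0.95, 0.95, 0.7]:
    idx = reactive_step(idx, util)
    print(GPU_FREQS_MHZ[idx])           # 400, 533, 667, 667
```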

Thoughts After A More Detailed Explanation

What Huawei has done here is certainly an interesting approach with the clear potential for real-world benefits. We can see how distributing resources optimally across available hardware within a limited power budget will help the performance, the efficiency, and the power consumption, all of which is already a careful balancing act in smartphones. So the detailed explanation makes a lot of technical sense, and we have no issues with this at all. It’s a very impressive feat that could have ramifications in a much wider technology space, eventually including PCs.

The downside to the technology is its per-device and per-game nature. Huawei did not go into detail about how long it takes to train a single game: the first version of GPU Turbo supports PUBG and a Chinese game called Mobile Legends: Bang Bang. The second version, coming with the Mate 20, includes NBA 2K18, Rules of Survival, Arena of Valor, and Vainglory.

Technically the granularity is per-SoC rather than per-device, although different devices will have different thermal or memory performance limits. But it is obvious that, while Huawei is very proud of the technology, this is a slow per-game roll out. There is no silver bullet here: while the ideal goal would be a single optimized network that can deal with every game on the market, everything outside the supported list still has to rely on the default DVFS mechanisms to get the job done.

Huawei is going after its core gaming market first with GPU Turbo, which means plenty of Battle Royale and MOBA action, like PUBG and Arena of Valor, as well as tie-ins with companies like EA/Tencent for NBA 2K18. I suspect on the back of this realization, some companies will want to get in contact with Huawei to add their title to the list of games to be optimized. Our only request is that you also include tools so we can benchmark the game and output frame-time data, please!

On the next page, we go into our analysis on GPU Turbo with devices on hand. We also come across an issue with how Arm’s Mali GPU (used in Huawei Kirin SoCs) renders games differently to Huawei’s competitor devices.

Comments

  • eastcoast_pete - Tuesday, September 4, 2018 - link

    Thanks Andrei! I agree that this is, in principle, an interesting way to adjust power use and GPU performance in a finer-grained way than otherwise implemented. IMO, it also seems to be an attempt to push HiSilicon's AI core, as its other benefits are a bit more hidden for now (for lack of a better word). Today's power modes (at least on Android) are a bit all-high or all-low, so anything finer grained is welcome. Question: how long can the "turbo" turbo for before it gets a bit warm for the SoC? Did Huawei say anything about thermal limitations? I assume the AI is adjusting according to outside temperature and SoC to outside temperature differential?

    Regardless of AI-supported or not, I frequently wish I could more finely adjust power profiles for CPU, GPU and memory and make choices for my phone myself, along the lines of: 1. Strong, short CPU and GPU bursts enabled, otherwise balanced, to account for thermals and battery use (most everyday use, no gaming), 2. No burst, energy saver all round (need to watch my battery use) and 3. High power mode limited only by thermals (gaming mode), but allows to vary power allocations to CPU and GPU cores. An intelligent management and power allocation would be great for all these, but especially 3.
  • Ian Cutress - Tuesday, September 4, 2018 - link

    GPU Turbo also has a CPU mode, if there isn't an NPU present. That's enabling Huawei to roll it out to older devices. The NPU does make it more efficient though.

    In your mode 3, battery life is still a concern. Pushing the power causes the efficiency to decrease as the hardware is pushed to the edge of its capabilities. The question is how much of a trade off is valid? Thermals can also ramp a lot too - you'll hit thermal skin temp limits a lot earlier than you think. That also comes down to efficiency and design.
  • kb9fcc - Tuesday, September 4, 2018 - link

    Sounds reminiscent of the days when nVidia and ATI would cook some code into their drivers that could detect when certain games and/or benchmarking tools were being run and tweak the performance to return results that favored their GPU.
  • mode_13h - Tuesday, September 4, 2018 - link

    Who's to say Nvidia isn't already doing a variation of GPU Turbo, in their game-ready drivers? The upside is less, with a desktop GPU, but perhaps they could do things like preemptively spike the core clock speed and dip the memory clock, if they knew the next few frames would be shader-limited but with memory bandwidth to spare.
  • Kvaern1 - Tuesday, September 4, 2018 - link

    I don't suppose China has a law that punishes partyboss-owned corporations for making wild dishonest claims.
  • darckhart - Tuesday, September 4, 2018 - link

    ehhh it's getting hype now, but I bet it will only be supported on a few games/apps. it's a bit like nvidia's game ready drivers: sure the newest big name game releases get support (but only for newer GPUs) and then what happens when the game updates/patches? will the team keep the game in the library and let the AI keep testing so as to keep it optimized? how many games will be added to the library? how often? which SoCs will continue to be supported?
  • mode_13h - Tuesday, September 4, 2018 - link

    Of course, if they just operated a cloud service that automatically trained models based on automatically-uploaded performance data, then it could easily scale to most apps on most phones.
  • Ratman6161 - Tuesday, September 4, 2018 - link

    meh....only for games? So what. Yes, I know a lot of people reading this article care about games, but for those of us who don't this is meaningless. But looking at it as a gamer might, it still seems pretty worthless. Per SoC and per game? That's going to take constant updates to keep up with the latest releases. And how long can they keep that up? Personally if I were that interested in games, I'd just buy something that's better at gaming to begin with.
  • mode_13h - Tuesday, September 4, 2018 - link

    See my point above.

    Beyond that, the benefits of a scheme like this, even on "something that's better at gaming to begin with", are longer battery life and less heat. Didn't you see the part where it clocks everything just high enough to hit 60 fps? That's as fast as most phones' displays will update, so any more and you're wasting power.
  • mode_13h - Tuesday, September 4, 2018 - link

    I would add that the biggest benefit is to be had by games, since they use the GPU more heavily than most other apps. They also have an upper limit on how fast they need to run.

    However, a variation on this could be used to manage the speeds of different CPU cores and the distribution of tasks between them.
