GPU Performance & Power

The Google Tensor’s GPU is quite a beast. A Mali G78 with 20 cores, it sports 42% more cores than the Exynos 2100 implementation, and is only second to HiSilicon’s Kirin 9000. However, unlike the Kirin 9000 with its more power efficient N5 process node, the Tensor SoC comes on the same process node as the Exynos 2100. With a much larger GPU, one would expect Google to drive the block at lower frequencies in order to achieve better energy efficiency. To our surprise, the G78MP20 runs at up to 1GHz on the tiler and L2, and up to 848MHz on the shader cores, which is essentially the same as the smaller Exynos 2100 implementation of the GPU. This immediately raises red flags for the Tensor when it comes to power consumption, as the chip certainly can’t pull a rabbit out of a hat in terms of efficiency, so let’s see what happens:
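
As a quick sanity check on those numbers, below is a minimal back-of-the-envelope sketch; the Exynos 2100 clocks in it (~854MHz for both the shader cores and the tiler) are assumptions for illustration, not figures from this review:

```python
# Rough theoretical scaling between the Tensor's G78MP20 and the Exynos 2100's
# G78MP14, based purely on core count and clocks. The Exynos 2100 figures
# (~854 MHz shader and tiler clocks) are assumptions for illustration only.

tensor_cores, tensor_shader_mhz, tensor_tiler_mhz = 20, 848, 1000
exynos_cores, exynos_shader_mhz, exynos_tiler_mhz = 14, 854, 854  # assumed clocks

shader_ratio = (tensor_cores * tensor_shader_mhz) / (exynos_cores * exynos_shader_mhz)
tiler_ratio = tensor_tiler_mhz / exynos_tiler_mhz

print(f"Theoretical shader throughput advantage: {shader_ratio:.2f}x")  # ~1.42x
print(f"Tiler/L2 clock advantage:                {tiler_ratio:.2f}x")   # ~1.17x
```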

3DMark Wild Life Unlimited

In 3DMark Wild Life Unlimited, the first thing to note is that for some reason the regular Pixel 6 didn’t want to run the test as it errored out due to memory – I’m not sure what happened here, but it was isolated to the baseline model, as the Pro unit had no issues.

The Pixel 6 Pro’s peak performance is respectable, however it’s only 21% faster than the Exynos 2100, not exactly what we’d expect from 42% more cores. A large issue with recent Mali GPUs has been that while you can throw more shader cores at a workload, shared resources such as the tiler and L2 still remain a single unit on the GPU. Google does take advantage of the G78’s ability to clock this part of the GPU higher in the Tensor implementation, however that’s only a 16% increase in pure clock frequency – maybe the workload is bottlenecked somewhere in this part of the GPU architecture.
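
One way to picture why 42% more shader cores might only buy ~21% more performance is a toy Amdahl’s-law-style split between work that scales with the shader cores and work serialized on the shared tiler/L2. The shared fractions and speedup factors below are illustrative assumptions, not measured frame-time breakdowns:

```python
# Toy Amdahl's-law-style model: if a fraction of frame time is bound by the
# shared tiler/L2 (clocked only ~1.16x higher) while the rest scales with
# shader throughput (~1.42x), the overall speedup lands well below 1.42x.
# The "shared fraction" values are illustrative guesses, not measurements.

def overall_speedup(shared_fraction, tiler_speedup=1.16, shader_speedup=1.42):
    """Harmonic combination of the shared-resource and shader-bound phases."""
    return 1.0 / (shared_fraction / tiler_speedup +
                  (1.0 - shared_fraction) / shader_speedup)

for shared in (0.0, 0.2, 0.4, 0.6):
    print(f"shared fraction {shared:.0%}: overall speedup {overall_speedup(shared):.2f}x")
```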

Sustained performance right off the start doesn’t look too good for the Pixel 6 Pro, as it throttles considerably once the device gets hot; more on this in a bit.

Basemark GPU 1.2 - Medium 1440p - Off-Screen / Blit

In Basemark GPU, the Pixel 6 phones both showcase odd peak performance figures that are way lower than we expected; here the chip doesn’t even manage to outperform the Exynos 2100. I’m not sure what the technical explanation is, as on paper the chip should be faster.

GFXBench Aztec Ruins - High - Vulkan/Metal - Off-screen

In Aztec High, the peak performance of the Tensor is again below what you’d expect, at +14% vs the Exynos 2100, and slightly ahead of the Snapdragon 888.

Sustained performance is quite bad here, and the Pixel 6 Pro especially seems to throttle more severely than the Pixel 6.

Looking at the power consumption of the phones at peak performance, the Pixel 6 lands in at around 7.28W, however this figure is a bit misleading. In actuality, the phone is drawing peak power figures in excess of 9-10W, but this is so much power that the SoC isn’t able to complete a single run of the benchmark without throttling, so the average power for a given run ends up much lower. This would also explain why our peak performance figures are less than what’s expected of a GPU clocked this high: it simply can’t maintain that speed for long enough to produce an FPS figure at the peak frequencies.
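
To illustrate how a throttled run drags the reported average down, here’s a minimal sketch with entirely made-up durations and wattages:

```python
# Why an averaged power figure understates peak draw when the GPU throttles
# mid-run. All durations and wattages here are made-up examples, not
# measurements from the Pixel 6.

peak_seconds, peak_watts = 40, 9.5            # hypothetical un-throttled portion
throttled_seconds, throttled_watts = 35, 4.5  # hypothetical throttled portion

total_energy_joules = peak_seconds * peak_watts + throttled_seconds * throttled_watts
average_watts = total_energy_joules / (peak_seconds + throttled_seconds)

print(f"Average power over the run: {average_watts:.2f} W")  # ~7.2 W, well below peak
```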

At sustained frequencies, the Pixel 6 and Pixel 6 Pro end up in different spots, however both sit at quite low power figures of around 3W.

GFXBench Aztec Ruins - Normal - Vulkan/Metal - Off-screen

Aztec Normal shows similar results: peak performance of the GPU is barely any better than the smaller Exynos 2100 configuration, and sustained performance figures are also significantly lower.

Sustained power after throttling is also quite weird here, as the phones seemingly throttle to <3W on the SoC. The Pixel 6 for some reason appears to have better power characteristics; it’s possible that its chip bin has lower power than my 6 Pro unit.

GFXBench Manhattan 3.1 Off-screen

Manhattan 3.1 shows a similar peak and sustained performance standing, which isn’t too favourable for the Tensor.

Power levels in Manhattan are higher than in the Aztec benchmarks. I think the CPUs or the DRAM contribute more to the total power due to the higher achieved framerates, and this slightly helps heat dissipation rather than having everything focused on the GPU.

Overall, the GPU performance of the Google Tensor is quite disappointing. On paper, the massive G78MP20 GPU seemed like a juggernaut at the frequencies Google ships the chip at, but in practice it doesn’t reach its theoretical levels of performance. That being said, over the last year of SoC releases, almost every vendor in the industry has introduced some absurd ultra-high-power GPU configuration that throttles quickly. Why they do this, I don’t know; GPU compute for burst performance is always one of the reasons given, so maybe Google is also aiming the GPU towards compute rather than gaming.

In terms of sustained performance levels, the larger GPU should in theory have allowed it to run at lower frequencies, and thus at better efficiency, in turn delivering more performance than a smaller implementation like that of the Exynos 2100. The reality is that the Pixel 6 phones struggle with thermal dissipation, and that seems to be completely unrelated to the chip itself.


[Teardown image, source: PBKreviews]

Both the Pixel 6 and Pixel 6 Pro are quite special in their hardware designs, in that they’re among the rare Android devices that adopt an internal hardware design without a midframe adhered to the display panel. Looking at various teardowns of the phone, we can see that the display is relatively easily removable from the rest of the phone body, a design that’s actually more similar to Apple’s iPhones than to any other Android flagship. This bodes well for the repairability of the screen, but it doesn’t do well for the thermal dissipation of the SoC. Much like iPhones have issues with thermal dissipation and show much lower sustained power levels under stress, the Pixel 6 phones suffer from the same issue, as they cannot effectively use the display panel as a heat sink. This comes in contrast with other flagship Android devices: the Galaxy S21 Ultra, for example, has its display panel adhered to the midframe of the phone, which isn’t great for repairability, but allows Samsung to employ a gigantic thermal dissipation pad the size of half the phone’s footprint, with a direct heat pathway from the SoC to the display. Other thermally optimised devices share similar designs, and are able to better dump heat into the full body of the phone.

The Pixel 6 Pro, in contrast, has quite stark hot spots: the left side of the phone, near the SoC, gets quite hot at up to 45°C, while at the same time the right side of the device barely reaches 30-33°C, which is a large temperature gradient and signifies poor heat transfer. Also, while I’m not sure how other people feel about this, it does make the Pixel 6 phones feel more “hollow” in their build quality, but that might just be a nit-pick.

In any case, while the Google Tensor’s gaming performance might be adequate, it’s no better than the Exynos 2100, and it gets further handicapped by the thermal design of the Pixel 6 phones. Generally, one can say these are not the best phones for high-end gaming, which lines up with our subjective experiences with the devices in actually demanding games like Genshin Impact.

Comments

  • Alistair - Tuesday, November 2, 2021 - link

    It's very irritating how slow Android SOCs are. I'll just keep on waiting. Won't give up my existing Android phone until actual performance improvements arrive. Hopefully Samsung x AMD will make a difference next year.
  • Speedfriend - Thursday, November 4, 2021 - link

    Looking at the excellent battery life of the iPhone 13 (which I am currently waiting for as my work phone), does the iPhone still kill/suspend background tasks? When I used to day trade, my iPhone would stop prices updating in the background; very annoying when I would flick to the app to check prices and unwittingly see prices hours old.
  • ksec - Tuesday, November 2, 2021 - link

    Av1 hardware decoder having design problem again?

    Where have I heard of this before?
  • Peskarik - Tuesday, November 2, 2021 - link

    preplanned obsolescence
  • tuxRoller - Tuesday, November 2, 2021 - link

    I wonder if Google is using the panfrost open source driver for Mali? That might account for some of the performance issues.
  • TheinsanegamerN - Tuesday, November 2, 2021 - link

    Seems to me based on thermals that the Pixel 6/Pro suffer from thermal throttling, and thus have lower power budgets than they should have given the internal hardware, leading to poor results.

    Makes me wonder what one of these chips could do in a better designed chassis.
  • name99 - Tuesday, November 2, 2021 - link

    I'd like to ask a question that's not rooted in any particular company, whether it's x86, Google, or Apple, namely: how different *really* are all these AI acceleration tools, and what sort of timelines can we expect for what?

    Here are the kinda use cases I'm aware of:
    For vision we have
    - various photo improvement stuff (deblur, bokeh, night vision etc). Works at a level people consider OK, getting better every year.
    Presumably the next step is similar improvement applied to video.

    - recognition. Objects, OCR. I'd say the Apple stuff is "acceptable". The OCR is genuinely useful (eg search for "covid" will bring up a scan of my covid card without me ever having tagged it or whatever), and the object recognition gets better every year. Basics like "cat" or person recognition work well, the newest stuff (like recognizing plant species) seems to be accurate, but the current UI is idiotic and needs to be fixed (irrelevant for our purposes).
    On the one hand, you can say Google has had this for years. On the other hand my practical experience with Google Lens and recognition is that the app has been through so many rounds of "it's on iOS, no it isn't; it's available in the browser, no it isn't" that I've lost all interest in trying to figure out where it now lives when I want that sort of functionality. So I've no idea whether it's better than Apple along any important dimensions.

    For audio we have
    - speech recognition, and speech synth. Both of these have been moved over the years from Apple servers to Apple HW, and honestly both are now remarkably good. The only time speech recognition serves me poorly is when there is a mic issue (like my watch is covered by something, or I'm using the mic associated with my car head unit, not the iPhone mic).
    You only realize how impressive this is when you hear voice synth from older platforms, like the last time I used Tesla maybe 3 yrs ago the voice synth was noticeably more grating and "synthetic" than Apple. I assume Google is at essentially Apple level -- less HW and worse mics to throw at the problem, but probably better models.

    - maybe there's some AI now powering Shazam? Regardless it always worked well, but gets better and faster every year.

    For misc we have
    - various pose/motion recognition stuff. Apple does this for recognizing types of exercises, or handwashing, and it works fine. I don't know if Google does anything similar. It does need a watch. Not clear how much further this can go. You can fantasize about weird gesture UIs, but I'm not sure the world cares.

    - AI-powered keyboards. In the case of Apple this seems an utter disaster. They've been at it for years, it seems no better now with 100x the HW than it was five years ago, and I think everyone hates it. Not sure what's going on here.
    Maybe it's just a bad UI for indicating that the "recognition" is tentative and may be revised as you go further?
    Maybe the model is (not quite, but almost entirely) single-word based rather than grammar and semantic based?
    Maybe the model simply does not learn, ever, from how I write?
    Maybe the model is too much trained by the actual writing of cretins and illiterates, and tries to force my language down to that level?
    Regardless, it's just terrible.

    What's this like in Google world? no "AI"-powered keyboards?, or they exist and are hated? or they exist and work really well?

    Finally we have language.
    Translation seems to have crossed into "good enough" territory. I just compared Chinese->English for both Apple and Google and while both were good enough, neither was yet at fluent level. (Honestly I was impressed at the Apple quality which I rate as notably better than Google -- not what I expected!)

    I've not yet had occasion to test Apple in translating images; when I tried this with Google, last time maybe 4 yrs ago, it worked but pretty terribly. The translation itself kept changing, like there was no intelligence being applied to use the "persistence" fact that the image was always of the same sign or item in a shop or whatever; and the presentation of the image, trying to overlay the original text and match font/size/style was so hit or miss as to be distracting.

    Beyond translation we have semantic tasks (most obviously in the form of asking Siri/Google "knowledge" questions). I'm not interested in "which is a more useful assistant" type comparisons, rather which does a better job of faking semantic knowledge. Anecdotally Google is far ahead here, Alexa somewhat behind, and Apple even worse than Alexa; but I'm not sure those "rate the assistant" tests really get at what I am after. I'm more interested in the sorts of tests where you feed the AI a little story then ask it "common sense" questions, or related tasks like smart text summarization. At this level of language sophistication, everybody seems to be hopeless apart from huge experimental models.

    So to recalibrate:
    Google (and Apple, and QC) are putting lots of AI compute onto their SoCs. Where is it used, and how does it help?
    Vision and video are, I think clear answers and we know what's happening there.
    Audio (recognition and synth) are less clear because it's not as clear what's done locally and what's shipped off to a server. But quality has clearly become a lot better, and at least some of that I think happens locally.
    Translation I'm extremely unclear how much happens locally vs remotely.
    And semantics/content/language (even at just the basic smart secretary level) seems hopeless, nothing like intelligent summaries of piles of text, or actually useful understanding of my interests. Recommendation systems, for example, seem utterly hopeless, no matter the field or the company.

    So, eg, we have Tensor with the ability to run a small BERT-style model at higher performance than anyone else. Do we have ways today in which that is used? Ways in which it will be used in future that aren't gimmicks? (For example there was supposed to be that thing with Google answering the phone and taking orders or whatever it was doing, but that seems to have vanished without a trace.)

    As I said, none of this is supposed to be confrontational. I just want a feel for various aspects of the landscape today -- who's good at what? are certain skills limited by lack of inference or by model size? what are surprising successes and failures?
  • dotjaz - Tuesday, November 2, 2021 - link

    " but I do think it’s likely that at the time of design of the chip, Samsung didn’t have newer IP ready for integration"

    Come on. Even A77 was ready wayyyy before G78 and X1, how is it even remotely possible to have A76 not by choice?
  • Andrei Frumusanu - Wednesday, November 3, 2021 - link

    Samsung never used A77.
  • anonym - Sunday, November 7, 2021 - link

    Exynos 980 uses Cortex-A77
