Google's IP: Tensor TPU/NPU

At the heart of the Google Tensor we find the TPU, which actually gives the chip its marketing name. Developed by Google with input and feedback from the company's research teams, and drawing on years of extensive experience in machine learning, the new TPU is central to the experiences Google promises for the Pixel 6 phones. There's a lot to talk about here, but let's first break down some numbers to see where the Tensor's performance ends up relative to the competition.

We start off with MLCommons' MLPerf – the benchmark suite is designed in close collaboration with industry vendors to be representative of actual workloads that run on devices. We also run variants of the benchmark that take advantage of the various vendors' SDKs and acceleration frameworks. Google sent us a variant of the MLPerf app to test the Pixel 6 phones with – it's to be noted that the workloads on the Tensor run via NNAPI, while other phones are optimised to run through the respective chip vendor's libraries, such as Qualcomm's SNPE, Samsung's EDEN, or MediaTek's Neuron. Unfortunately, the Apple variant lacks CoreML acceleration, so we should expect lower scores on the A15.
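
To make concrete what "running via NNAPI" means, here is a minimal sketch of how an Android app hands a TensorFlow Lite model to the platform NN runtime through the NNAPI delegate, which then dispatches supported operations to whatever accelerator the vendor driver exposes (the TPU in Tensor's case). This is an illustration only, not the MLPerf app's actual code; the model file and tensor shapes are placeholders.

```kotlin
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.nnapi.NnApiDelegate
import java.io.File

// Runs a single inference of a TFLite model through NNAPI, letting the platform's
// NN runtime dispatch supported operations to the on-device accelerator.
// Model file and tensor shapes are placeholders for illustration.
fun runThroughNnapi(modelFile: File, input: Array<FloatArray>, output: Array<FloatArray>) {
    val nnApiDelegate = NnApiDelegate()
    val options = Interpreter.Options().addDelegate(nnApiDelegate)
    val interpreter = Interpreter(modelFile, options)
    interpreter.run(input, output)   // ops unsupported by the driver fall back to the CPU
    interpreter.close()
    nnApiDelegate.close()
}
```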

[Charts: MLPerf 1.0.1 - Image Classification, Object Detection, Image Segmentation, Image Classification (Offline)]

Starting off with the Image Classification, Object Detection, and Image Segmentation workloads, the Pixel 6 Pro and the Google Tensor showcase good performance, and the phone is able to outperform the Exynos 2100's NPU and software stack. More recently, Qualcomm optimised its software implementation for MLPerf 1.1, achieving higher scores than a few months ago, and this allows the Snapdragon 888 to post significantly better results than what we're seeing on the Google Tensor and its TPU – at least for those workloads, on the current software releases and optimisations.

[Chart: MLPerf 1.0.1 - Language Processing]

The Language Processing test of MLPerf is a MobileBERT model, and here, whether due to architectural traits of the TPU or simply a vastly superior software implementation, the Google Tensor is able to obliterate the competition in terms of inference speed.
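
As a rough illustration of what that class of model does on-device, below is a hypothetical sketch using TensorFlow Lite's Task Library to run a MobileBERT-style question-answering model; the model file name, passage, and question are made up for the example, and this is not the benchmark's own code.

```kotlin
import android.content.Context
import org.tensorflow.lite.task.text.qa.BertQuestionAnswerer

// Loads a MobileBERT-style question-answering model and extracts an answer span
// from a passage of text. "mobilebert_qa.tflite" is a placeholder model file.
fun answerFromPassage(appContext: Context): String {
    val answerer = BertQuestionAnswerer.createFromFile(appContext, "mobilebert_qa.tflite")
    val passage = "The Google Tensor pairs its CPU and GPU with a TPU for on-device ML."
    val question = "What does the Tensor pair its CPU and GPU with?"
    val answers = answerer.answer(passage, question)   // ranked candidate answer spans
    return answers.firstOrNull()?.text ?: ""           // best-scoring span, e.g. "a TPU"
}
```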

In Google's marketing, language processing features such as live transcription and live translation are major parts of the differentiating experience that the new Google Tensor enables for the Pixel 6 series devices – in fact, when talking about TPU performance, it's exactly these workloads that the company highlights as the killer use-cases and describes as state-of-the-art.

If the scores here are indeed a direct representation of Google’s design focus of the TPU, then that’s a massively impressive competitive advantage over other platforms, as it represents a giant leap in performance.

[Chart: GeekBench ML 0.5.0]

Another benchmark we have available is GeekBench ML, which is currently still in a pre-release state, meaning the models and acceleration methods can still change in future updates.

The performance here depends on the APIs used, with the test either using TensorFlow delegates for the GPU or CPU, or using NNAPI on Android devices (and CoreML on iOS). The GPU results should represent only GPU ML performance, which is surprisingly not that great on the Tensor, as it somehow lands below the Exynos 2100's GPU.
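
As a sketch of how those paths differ in practice, this is roughly how an app would build a TensorFlow Lite interpreter on either the GPU delegate or a multi-threaded CPU configuration; it's illustrative only, since GeekBench ML's internals aren't public, and the thread count is an arbitrary choice.

```kotlin
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.GpuDelegate
import java.io.File

// Builds an interpreter either on the GPU delegate (Mali/Adreno shaders) or on the
// CPU with several threads; neither path touches the NPU/TPU, which is why the GPU
// and NNAPI scores above exercise different blocks of the SoC.
fun buildInterpreter(modelFile: File, useGpu: Boolean): Interpreter {
    val options = Interpreter.Options()
    if (useGpu) {
        options.addDelegate(GpuDelegate())
    } else {
        options.setNumThreads(4)   // CPU path, thread count is illustrative
    }
    return Interpreter(modelFile, options)
}
```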

In NNAPI mode, the Tensor is able to more clearly distinguish itself from the other SoCs, showcasing a 44% lead over the Snapdragon 888. It's likely this represents the TPU's performance lead, however it's very hard to come to conclusions when working through such abstraction-layer APIs.

[Chart: AI Benchmark 4 - NNAPI (CPU+GPU+NPU)]

In AI Benchmark 4, when running the benchmark in pure NNAPI mode, the Google Tensor again showcases a very large performance advantage over the competition. Again, it's hard to come to conclusions about what's driving the performance here, since the workloads make use of the CPU, GPU, and NPU.

I briefly looked at the power profile of the Pixel 6 Pro when running the test, and it showcased similar power figures to the Exynos 2100, with extremely high burst power figures of up to 14W during individual inferences. Due to the much higher performance the Tensor showcases, it is also that much more efficient. The Snapdragon 888 peaked at around 12W in the same workloads, so the efficiency gap there isn't as large, however it's still in favour of Google's chip.
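
To spell out the efficiency reasoning: energy per inference is power multiplied by inference time, so a chip that bursts to higher power but finishes sooner can still use less energy per inference. The timings in the sketch below are hypothetical placeholders, not measured values.

```kotlin
// Energy (joules) = power (watts) × time (seconds). The inference times here are
// hypothetical placeholders used only to illustrate the trade-off described above.
fun energyPerInference(powerWatts: Double, inferenceTimeMs: Double): Double =
    powerWatts * (inferenceTimeMs / 1000.0)

fun main() {
    val tensorJoules = energyPerInference(powerWatts = 14.0, inferenceTimeMs = 4.0)      // 0.056 J
    val snapdragonJoules = energyPerInference(powerWatts = 12.0, inferenceTimeMs = 6.0)  // 0.072 J
    println("Tensor: $tensorJoules J/inference, Snapdragon 888: $snapdragonJoules J/inference")
}
```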

All in all, the Tensor's ML performance has been Google's main marketing point, and the chip doesn't disappoint in that regard, as the TPU is seemingly able to showcase extremely large performance advantages over the competition. While power draw is still very high, completing an inference faster means that energy efficiency is also much better.

I asked Google what their plans are in regards to the software side of things for the TPU – whether they'll release a public SDK for developers to tap into the TPU, or whether things will remain more NNAPI-centric as they are today on the Pixels. The company wouldn't commit to any plans yet as it's still very early – in general, that's the same tone we've heard from other companies; even Samsung, two years after the release of its first-gen NPU, doesn't make its Eden SDK publicly available. Google notes that there is massive performance potential in the TPU and that the Pixel 6 phones are able to use it in first-party software, which enables the many ML features of the camera as well as the translation features on the phone.


108 Comments


  • Alistair - Tuesday, November 2, 2021

    It's very irritating how slow Android SOCs are. I'll just keep on waiting. Won't give up my existing Android phone until actual performance improvements arrive. Hopefully Samsung x AMD will make a difference next year.
  • Speedfriend - Thursday, November 4, 2021

    Looking at the excellent battery life of the iPhone 13 (which I am currently waiting for as my work phone), does the iPhone still kill/suspend background tasks? When I used to day trade, my iPhone would stop prices updating in the background, very annoying when I would flick to the app to check prices and unwittingly see prices that were hours old.
  • ksec - Tuesday, November 2, 2021

    AV1 hardware decoder having design problems again?

    Where have I heard of this before?
  • Peskarik - Tuesday, November 2, 2021

    preplanned obsolescence
  • tuxRoller - Tuesday, November 2, 2021

    I wonder if Google is using the panfrost open source driver for Mali? That might account for some of the performance issues.
  • TheinsanegamerN - Tuesday, November 2, 2021

    Seems to me based on thermals that the Pixel 6/Pro suffer from thermal throttling, and thus have lower power budgets than they should have given the internal hardware, leading to poor results.

    Makes me wonder what one of these chips could do in a better designed chassis.
  • name99 - Tuesday, November 2, 2021

    I'd like to ask a question that's not rooted in any particular company, whether it's x86, Google, or Apple, namely: how different *really* are all these AI acceleration tools, and what sort of timelines can we expect for what?

    Here are the kinda use cases I'm aware of:
    For vision we have
    - various photo improvement stuff (deblur, bokeh, night vision etc). Works at a level people consider OK, getting better every year.
    Presumably the next step is similar improvement applied to video.

    - recognition. Objects, OCR. I'd say the Apple stuff is "acceptable". The OCR is genuinely useful (eg search for "covid" will bring up a scan of my covid card without me ever having tagged it or whatever), and the object recognition gets better every year. Basics like "cat" or person recognition work well, the newest stuff (like recognizing plant species) seems to be accurate, but the current UI is idiotic and needs to be fixed (irrelevant for our purposes).
    On the one hand, you can say Google has had this for years. On the other hand my practical experience with Google Lens and recognition is that the app has been through so many rounds of "it's on iOS, no it isn't; it's available in the browser, no it isn't" that I've lost all interest in trying to figure out where it now lives when I want that sort of functionality. So I've no idea whether it's better than Apple along any important dimensions.

    For audio we have
    - speech recognition, and speech synth. Both of these have been moved over the years from Apple servers to Apple HW, and honestly both are now remarkably good. The only time speech recognition serves me poorly is when there is a mic issue (like my watch is covered by something, or I'm using the mic associated with my car head unit, not the iPhone mic).
    You only realize how impressive this is when you hear voice synth from older platforms, like the last time I used Tesla maybe 3 yrs ago the voice synth was noticeably more grating and "synthetic" than Apple. I assume Google is at essentially Apple level -- less HW and worse mics to throw at the problem, but probably better models.

    - maybe there's some AI now powering Shazam? Regardless it always worked well, but gets better and faster every year.

    For misc we have
    - various pose/motion recognition stuff. Apple does this for recognizing types of exercises, or handwashing, and it works fine. I don't know if Google does anything similar. It does need a watch. Not clear how much further this can go. You can fantasize about weird gesture UIs, but I'm not sure the world cares.

    - AI-powered keyboards. In the case of Apple this seems an utter disaster. They've been at it for years, it seems no better now with 100x the HW than it was five years ago, and I think everyone hates it. Not sure what's going on here.
    Maybe it's just a bad UI for indicating that the "recognition" is tentative and may be revised as you go further?
    Maybe the model is (not quite, but almost entirely) single-word based rather than grammar and semantic based?
    Maybe the model simply does not learn, ever, from how I write?
    Maybe the model is too much trained by the actual writing of cretins and illiterates, and tries to force my language down to that level?
    Regardless, it's just terrible.

    What's this like in Google world? no "AI"-powered keyboards?, or they exist and are hated? or they exist and work really well?

    Finally we have language.
    Translation seems to have crossed into "good enough" territory. I just compared Chinese->English for both Apple and Google and while both were good enough, neither was yet at fluent level. (Honestly I was impressed at the Apple quality which I rate as notably better than Google -- not what I expected!)

    I've not yet had occasion to test Apple in translating images; when I tried this with Google, last time maybe 4 yrs ago, it worked but pretty terribly. The translation itself kept changing, like there was no intelligence being applied to use the "persistence" fact that the image was always of the same sign or item in a shop or whatever; and the presentation of the image, trying to overlay the original text and match font/size/style was so hit or miss as to be distracting.

    Beyond translation we have semantic tasks (most obviously in the form of asking Siri/Google "knowledge" questions). I'm not interested in "which is a more useful assistant" type comparisons, rather which does a better job of faking semantic knowledge. Anecdotally Google is far ahead here, Alexa somewhat behind, and Apple even worse than Alexa; but I'm not sure those "rate the assistant" tests really get at what I am after. I'm more interested in the sorts of tests where you feed the AI a little story then ask it "common sense" questions, or related tasks like smart text summarization. At this level of language sophistication, everybody seems to be hopeless apart from huge experimental models.

    So to recalibrate:
    Google (and Apple, and QC) are putting lots of AI compute onto their SoCs. Where is it used, and how does it help?
    Vision and video are, I think clear answers and we know what's happening there.
    Audio (recognition and synth) are less clear because it's not as clear what's done locally and what's shipped off to a server. But quality has clearly become a lot better, and at least some of that I think happens locally.
    Translation I'm extremely unclear how much happens locally vs remotely.
    And semantics/content/language (even at just the basic smart secretary level) seems hopeless, nothing like intelligent summaries of piles of text, or actually useful understanding of my interests. Recommendation systems, for example, seem utterly hopeless, no matter the field or the company.

    So, eg, we have Tensor with the ability to run a small BERT-style model at higher performance than anyone else. Do we have ways today in which that is used? Ways in which it will be used in future that aren't gimmicks? (For example there was supposed to be that thing with Google answering the phone and taking orders or whatever it was doing, but that seems to have vanished without a trace.)

    As I said, none of this is supposed to be confrontational. I just want a feel for various aspects of the landscape today -- who's good at what? are certain skills limited by lack of inference or by model size? what are surprising successes and failures?
  • dotjaz - Tuesday, November 2, 2021

    " but I do think it’s likely that at the time of design of the chip, Samsung didn’t have newer IP ready for integration"

    Come on. Even A77 was ready wayyyy before G78 and X1, how is it even remotely possible to have A76 not by choice?
  • Andrei Frumusanu - Wednesday, November 3, 2021

    Samsung never used A77.
  • anonym - Sunday, November 7, 2021

    Exynos 980 uses Cortex-A77
