Google's IP: Tensor TPU/NPU

At the heart of the Google Tensor, we find the TPU, which actually gives the chip its marketing name. Developed by Google with input and feedback from the company’s research teams, and drawing on years of extensive experience in the field of machine learning, the new TPU is central to the experiences Google promises for the Pixel 6 phones. There’s a lot to talk about here, but let’s first break down some numbers, to see where the performance of the Tensor ends up relative to the competition.

We start off with MLCommons’ MLPerf – the benchmark suite’s developers work closely with all industry vendors to design something that is representative of actual workloads that run on devices. We also run variants of the benchmark which are able to take advantage of the various vendors’ SDKs and acceleration frameworks. Google sent us a variant of the MLPerf app to test the Pixel 6 phones with – it’s to be noted that the workloads on the Tensor run via NNAPI, while other phones are optimised to run through the respective chip vendor’s libraries, such as Qualcomm’s SNPE, Samsung’s EDEN, or MediaTek’s Neuron. Unfortunately, the Apple variant is lacking CoreML acceleration, thus we should expect lower scores on the A15.
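
To illustrate the NNAPI path the Tensor numbers are gathered on, here is a minimal sketch assuming TensorFlow Lite’s Kotlin/Java API on Android; the model file, input shape, and output size are placeholder values, and this is not the actual MLPerf harness.

```kotlin
// Minimal sketch: dispatching a TFLite model through the NNAPI delegate, so that
// supported ops are handed to the platform ML driver (on the Pixel 6, the TPU).
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.nnapi.NnApiDelegate
import java.io.File
import java.nio.ByteBuffer
import java.nio.ByteOrder

fun classifyViaNnapi(modelFile: File): FloatArray {
    val nnapi = NnApiDelegate()
    val interpreter = Interpreter(modelFile, Interpreter.Options().addDelegate(nnapi))
    // Placeholder 224x224 RGB float input, as used by typical image-classification models.
    val input = ByteBuffer.allocateDirect(1 * 224 * 224 * 3 * 4).order(ByteOrder.nativeOrder())
    val output = Array(1) { FloatArray(1000) }   // placeholder 1000-class output
    interpreter.run(input, output)
    interpreter.close()
    nnapi.close()
    return output[0]
}
```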

MLPerf 1.0.1 - Image Classification
MLPerf 1.0.1 - Object Detection
MLPerf 1.0.1 - Image Segmentation
MLPerf 1.0.1 - Image Classification (Offline)

Starting off with the Image Classification, Object Detection, and Image Segmentation workloads, the Pixel 6 Pro and the Google Tensor showcase good performance, and the phone is able to outperform the Exynos 2100’s NPU and software stack. More recently, Qualcomm has optimised its software implementation for MLPerf 1.1, achieving higher scores than a few months ago, and this allows the Snapdragon 888 to post significantly better results than what we’re seeing from the Google Tensor and the TPU – at least for these workloads, on the current software releases and optimisations.

MLPerf 1.0.1 - Language Processing 

The Language Processing test of MLPerf is a MobileBERT model, and here – whether for architectural reasons of the TPU or simply thanks to a vastly superior software implementation – the Google Tensor is able to obliterate the competition in terms of inference speed.

In Google’s marketing, language processing features such as live transcription and live translation are a major part of the differentiating features that the new Google Tensor enables for the Pixel 6 series devices – in fact, when talking about the TPU performance, it’s exactly these workloads that the company highlights as the killer use-cases and describes as state-of-the-art.

If the scores here are indeed a direct representation of Google’s design focus of the TPU, then that’s a massively impressive competitive advantage over other platforms, as it represents a giant leap in performance.

GeekBench ML 0.5.0

Other benchmarks we have available include GeekBench ML, which is currently still in a pre-release state, meaning the models and acceleration methods can still change in future updates.

The performance here depends on the APIs used, with the test either using TensorFlow delegates for the GPU or CPU, or using NNAPI on Android devices (and CoreML on iOS). The GPU results should represent only the GPU’s ML performance, which is surprisingly not that great on the Tensor, as it somehow lands below the Exynos 2100’s GPU.

In NNAPI mode, the Tensor is able to more clearly distinguish itself from the other SoCs, showcasing a 44% lead over the Snapdragon 888. It’s likely this represents the TPU’s performance lead, however it’s very hard to come to conclusions when working through such abstraction-layer APIs.
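
To make the distinction between these execution modes concrete, the sketch below shows how an app might select a CPU, GPU-delegate, or NNAPI path in TensorFlow Lite on Android; the enum and helper are hypothetical names for illustration, not GeekBench ML’s actual implementation.

```kotlin
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.GpuDelegate
import org.tensorflow.lite.nnapi.NnApiDelegate
import java.io.File

enum class MlBackend { CPU, GPU, NNAPI }

fun buildInterpreter(modelFile: File, backend: MlBackend): Interpreter {
    val options = Interpreter.Options()
    when (backend) {
        MlBackend.CPU -> options.setNumThreads(4)                // plain CPU execution
        MlBackend.GPU -> options.addDelegate(GpuDelegate())      // exercises only the GPU's ML performance
        MlBackend.NNAPI -> options.addDelegate(NnApiDelegate())  // hands work to the platform accelerator (TPU/NPU/DSP)
    }
    return Interpreter(modelFile, options)
}
```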

AI Benchmark 4 - NNAPI (CPU+GPU+NPU)

In AI Benchmark 4, when running the benchmark in pure NNAPI mode, the Google Tensor again showcases a very large performance advantage over the competition. Again, it’s hard to come to conclusions as to what’s driving the performance here, as the benchmark makes use of the CPU, GPU, and NPU.

I briefly looked at the power profile of the Pixel 6 Pro when running the test, and it showcased similar power figures to the Exynos 2100, with extremely high burst power figures of up to 14W when doing individual inferences. Because the Tensor completes the work much faster, it is also that much more efficient. The Snapdragon 888 peaked at around 12W in the same workloads, so the efficiency gap here isn’t as large, however it’s still in favour of Google’s chip.
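
The efficiency argument boils down to energy per inference rather than instantaneous power. The following sketch works through that arithmetic using the burst power figures quoted above; the inference rates are made-up placeholders purely to illustrate the relationship, not measured values.

```kotlin
// Energy per inference = power draw / throughput (joules = watts / inferences-per-second).
data class InferenceRun(val name: String, val powerWatts: Double, val inferencesPerSecond: Double)

fun joulesPerInference(run: InferenceRun): Double =
    run.powerWatts / run.inferencesPerSecond

fun main() {
    // Power figures from the measurements above; throughput numbers are hypothetical.
    val tensor = InferenceRun("Google Tensor", powerWatts = 14.0, inferencesPerSecond = 200.0)
    val snapdragon = InferenceRun("Snapdragon 888", powerWatts = 12.0, inferencesPerSecond = 120.0)

    // Despite the higher burst power, finishing each inference sooner means less energy spent per inference.
    listOf(tensor, snapdragon).forEach {
        println("${it.name}: ${"%.3f".format(joulesPerInference(it))} J per inference")
    }
}
```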

All in all, the Tensor’s ML performance has been Google’s main marketing point, and the chip doesn’t disappoint in that regard, as the TPU is seemingly able to showcase extremely large performance advantages over the competition. While power draw is still very high, completing an inference faster means that energy efficiency is also much better.

I asked Google what its plans are regarding the software side of the TPU – whether it will release a public SDK for developers to tap into the TPU, or whether things will remain more NNAPI-centric as they are today on the Pixels. The company wouldn’t commit to any plans yet as it’s still very early – in general, that’s the same tone we’ve heard from other companies; even Samsung, two years after the release of its first-gen NPU, still doesn’t make its Eden SDK publicly available. Google notes that there is massive performance potential in the TPU and that the Pixel 6 phones are able to use it in first-party software, which enables the many ML features of the camera and the various translation features on the phone.

Comments

  • Silver5urfer - Tuesday, November 2, 2021 - link

    You said you do Kernels but "As someone who was rather hopeful that google would take control and bring us android users a true apple chip equivalent some day, this is definitely not the case with google silicon."

    What is Android lacking from needing that so called A series processor onto the platform ? I already see Android modding has been drained a lot now. It's there on XDA but less than 1% of user base uses mods, maybe root but it's still niche.

    Android has been on a downhill since a long time. With Android v9 Pie to be specific. Google started to mimic iOS on superficial level with starting from OS level information density loss now on 12 it's insane, you get 4 QS toggles. It's worst. People love it somehow because new coat of trash paint is good.

    On HW side, except for OnePlus phones no phones have proper mod ecosystem. Pixels had but due to the crappy policies they implemented on the HW side like AB system, Read only filesystem copied from Huawei horrible fusing of filesystems and then enforcing all these at CTS level, they added the worst of all - Scoped Storage which ruined all the user use cases of having a pocket computer to a silly iOS like trash device. Now on Android any photo you download goes into that Application specific folder and you cannot change it, due to API level block on Playstore for targeting Android v11 which comes with Scoped Storage by default. Next year big thing is coming, all 32bit applications will be obsoleted because ARM is going to remove the 32bit IP from the Silicon designs. That makes 888 the last 32bit capable CPU.

    Again what do you expect ? Apple A series shines in these Anandtech SPEC scores but when It comes to real life Application work done performance, they do not show the same level of difference. Which is basically Application launch speed and performance of the said application now Android 12 adds a splash screen BS to all apps globally. Making it even worse.

    There's nothing that Google is going to provide you or anyone to have something that doesn't exist, Android needs freedom and that is being eroded away every year with more and more Apple inspired crap. The only reason Google did this is to experiment on those billions of dollars and millions for their R&D, Pixel division has been in loss since 2016, less than 3% North American marketshare. Only became 3 from 2 due to A series budget Pixels. And they do not even sell overseas on many markets. In fact they imitate Apple so much that now they want the stupid HW exclusive joke processors for their lineup imitating Apple for no reason. Qcomm provides all the blobs and baseband packages, If Google can make them deliver support for 6 years they can do it, but they won't because sales. All that no charger because environment, no 3.5mm jack because no space, no SD slot is all a big fat LIE.

    Their GS101 is a joke, a shame to CPU engineering, trash thermal design, useless A7x cores and the bloated X1 x2 cores for nothing, except for their ISP nothing is useful and even the Pixel camera can be ported to other phones, Magic Eraser for eg works on old Pixels, soon other phones due to Camera API2 and Modding.

    Google's vision of Android was dead since v9 and since the death of Nexus series. Now it's more of a former shell with trash people running for their agenda of yearly consumerism and a social media tool rather than the old era of computer in your pocket, to make it worse the PR of Pixel is horrible and more political screaming than anything else.
  • Zoolook - Saturday, November 6, 2021 - link

    Apple silicon shines in part due to being on a superior process, and a much better memory subsystem, Samsung process is far behind TSMC in regards to efficiency unfortunately.
  • Zoolook - Saturday, November 6, 2021 - link

    Small nitpick, A8X GPU was a PowerVR licence, A11 had the first Apple inhouse GPU.
  • iphonebestgamephone - Sunday, November 14, 2021 - link

    "cut power consumption in half WITHOUT increasing performance"

    Make a custom kernel and uc/uv it and there you go. Should be easy for a pro kernel dev like you.
  • tipoo - Tuesday, November 2, 2021 - link

    Thanks for this analysis, it's great.

    I'm still left wondering what the point of Tensor is after all this. It doesn't seem better than what was on market even for Android. I guess the extra security updates are nice but still not extra OS updates even though it's theirs. And the NPU doesn't seem to outperform either despite them talking about that the most.

    And boy do these charts just make A15 look even more above and beyond their efforts, but even A4 started with Cortex cores, maybe in 2-3 spins Google will go more custom.
  • Blastdoor - Tuesday, November 2, 2021 - link

    I wonder if we will now see a similar pattern play out in the laptop space, with Macs moving well beyond the competition in CPU and GPU performance/watt, and landing at similar marketshare (it would be a big deal for the Mac to achieve the same share of the laptop market that the iPhone has of the smartphone market).
  • tipoo - Tuesday, November 2, 2021 - link

    Well I'm definitely going to hold my Apple stocks for years and that's one part of the reason. M1 Pro and Max are absolute slam dunks on the industry, and their chipmaking was part of what won me over on their phones.
  • TheinsanegamerN - Tuesday, November 2, 2021 - link

    When did apple manage that? I can easily recall the M1 pulling notably more power than the 4700u in order to beat it in benchmarks despite having 5nm to play with. The M1X max pulls close to 100W at full tilt, and is completely unsustainable.
  • Spleter - Tuesday, November 2, 2021 - link

    I think you are confusing temperature in degrees and not the amount of watts.
  • Alistair - Wednesday, November 3, 2021 - link

    when it is drawing 100 watts it is competing against windows laptops that are drawing 200 watts, i'm not sure what the problem is
