Google’s Tensor Processing Unit: What We Know

by Joshua Ho on May 20, 2016 6:00 AM EST

39 Comments

If you followed Google’s announcements at I/O 2016, one standout from the keynote was the mention of a Tensor Processing Unit, or TPU (not to be confused with thermoplastic urethane). I was hoping to learn more about this TPU, but Google is currently holding the architectural details close to its chest.

More will come later this year, but for now what we know is that this is an actual processor with an ISA of some kind. What exactly that ISA entails isn't something Google is disclosing at this time, and I'm curious as to whether it's even Turing complete. In its blog post on the TPU, however, Google did mention that the chip uses "reduced computational precision." It’s a fair bet that, unlike GPUs, the TPU has no ISA-level support for 64-bit data types, and given the workload it’s likely that we’re looking at 16-bit floats or fixed-point values, or possibly even 8-bit values.
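
To make "reduced computational precision" a bit more concrete, here is a minimal sketch of a dot product computed on 8-bit fixed-point integers and rescaled at the end. To be clear, this is not Google's scheme, which hasn't been disclosed. The `quantize` and `quantized_dot` helpers, the single shared scale factor, and the sample values are all assumptions made purely for illustration.

```python
def quantize(values, bits=8):
    """Map floats to signed integers that share one scale factor (hypothetical scheme)."""
    qmax = 2 ** (bits - 1) - 1                  # 127 for 8 bits
    peak = max(abs(v) for v in values) or 1.0   # avoid dividing by zero for all-zero input
    scale = peak / qmax                         # real value represented by one integer step
    return [round(v / scale) for v in values], scale


def quantized_dot(a, b, bits=8):
    """Dot product done as integer multiply-accumulate, converted back to a float once."""
    qa, sa = quantize(a, bits)
    qb, sb = quantize(b, bits)
    acc = sum(x * y for x, y in zip(qa, qb))    # integer-only inner loop
    return acc * sa * sb


# Made-up sample data, not taken from any real model
weights = [0.12, -0.53, 0.88, 0.05]
inputs = [1.0, 0.25, -0.75, 0.5]

exact = sum(w * x for w, x in zip(weights, inputs))
approx = quantized_dot(weights, inputs)
print(exact, approx)
```

The two results differ only slightly, which is the usual argument for why a neural network workload can tolerate narrow data types in exchange for smaller, lower-power arithmetic units.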

Reaching even further, it’s possible that instructions are statically scheduled in the TPU, although this is based on a rather general comment about static scheduling being more power efficient than dynamic scheduling, which is hardly a revelation. I wouldn’t be entirely surprised if the TPU actually looks an awful lot like a VLIW DSP with support for massive levels of SIMD and some twist to make it easier to program, especially given recent research papers and industry discussions regarding the power efficiency and potential of DSPs in machine learning applications. Of course, this is just idle speculation, so it’s entirely possible that I’m completely off the mark here, but it’ll definitely be interesting to see exactly what architecture Google has decided is best suited to machine learning applications.
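
For readers unfamiliar with the term, static scheduling means the compiler, rather than the hardware, decides which operations issue together in each cycle. The sketch below is a toy list scheduler that packs operations into fixed-width bundles, roughly in the spirit of a VLIW machine. The issue width of 2, the single-cycle latency for every operation, and the operation names are all hypothetical, and none of this is drawn from anything Google has said about the TPU.

```python
def schedule(ops, deps, width=2):
    """Pack ops into issue bundles ahead of time.

    ops:   operation names in program order
    deps:  maps an op to the ops whose results it needs
    width: assumed number of ops the machine can issue per cycle
    Every op is assumed to finish in one cycle.
    """
    done = set()
    pending = list(ops)
    bundles = []
    while pending:
        # An op is ready only if all of its inputs came from earlier bundles
        ready = [op for op in pending if all(d in done for d in deps.get(op, ()))]
        if not ready:
            raise ValueError("unsatisfiable dependency")
        bundle = ready[:width]                  # fill the bundle in program order
        bundles.append(bundle)
        done.update(bundle)
        pending = [op for op in pending if op not in bundle]
    return bundles


# Hypothetical multiply-add sequence: store(a * b + c)
ops = ["load_a", "load_b", "mul", "load_c", "add", "store"]
deps = {"mul": ["load_a", "load_b"], "add": ["mul", "load_c"], "store": ["add"]}

bundles = schedule(ops, deps)
for cycle, bundle in enumerate(bundles):
    print(cycle, bundle)
```

Because the bundles are fixed before the program runs, the hardware needs no logic to track dependencies or reorder instructions on the fly, which is where the power savings would come from.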

Source: Google

nathanddrews - Friday, May 20, 2016 - link
So is it basically a really fast, but really dumb CPU?
HollyDOL - Friday, May 20, 2016 - link
It sounds a bit like the offspring of a Pentium III and a Bitcoin ASIC. But I could be completely wrong. Until Google provides more info, it's pure guesswork...
nathanddrews - Friday, May 20, 2016 - link
https://cloudplatform.googleblog.com/2016/05/Googl...
Krysto - Friday, May 20, 2016 - link
I would think the architecture is a lot simpler than a Pentium 3. It should be a lot closer to a Bitcoin ASIC.
JoshHo - Friday, May 20, 2016 - link
That doesn't make a lot of sense. Pentium III is a CPU core with out-of-order and speculative execution, while Bitcoin ASICs are basically just dedicated silicon for SHA-256.

Machine learning really needs a huge amount of SIMD in order to enable faster matrix multiplication, with some general purpose instructions so you don't have to involve the CPU just to evaluate whether you need to break out of a loop. Static scheduling also means that the compiler is deciding how to schedule operations rather than relying on hardware to do so dynamically.
HollyDOL - Monday, May 23, 2016 - link
Sorry for the confusion, it was rather meant literally, not absolutely. I was not referring to the internal architecture or anything like that... just the SIMD in the P3 and the single-purpose nature of a Bitcoin ASIC.
ddriver - Friday, May 20, 2016 - link
It is basically a special-purpose processor, much better at doing that one thing for the power and transistor budget. A lot of fuss about nothing, which has become standard practice nowadays.

"Deep learning" has very shallow hardware requirements. A traditional processor architecture is far more capable, and in this particular task that capability would be wasted. It is basically like driving a semi truck to the corner store to get a pack of cigarettes.
name99 - Friday, May 20, 2016 - link
The "fuss" is because it's one more step in the slow decline of Intel.
Every large computational task that gets moved to custom silicon is a slice of the data center/server business for which Intel is no longer obligatory; and that data center/server business is the only place Intel makes money these days.

The Intel fans insist this doesn't matter, and will tell you how nothing can replace a fast low-latency CPU for certain types of server and business tasks. Which is all true, but it misses the point.
Once upon a time Intel was a "computation" company, whose business grew as computation grew. Then they fscked up their mobile strategy (something I've discussed in great detail elsewhere) and they shrank from a computation company to a "high performance computation" company --- and stayed basically flat as the greatest expansion of computation in human history happened in mobile.
Now, as certain types of computation move first to GPUs, then to even more specialized chips, they are shrinking from a "high performance computation" company to a "low-latency CPUs" company, and once again business will stay flat, even as the revenue that goes into GPUs and these specialized chips explodes.
And, soon enough (starting probably next year, really taking off around 2020) we'll see the ARM server CPUs targeting high frequencies and high IPC, and Intel will shrink further, to the "x86 CPUs" category, where their only real selling point is the ability to run dusty decks --- basically the same place IBM zSeries is today.

That is why the TPU matters: it's the third step (after mobile, and after GPUs) in the shrinking relevance of Intel to the entire spectrum of computation.
name99 - Friday, May 20, 2016 - link
Hah. Great minds think alike!
I hadn't read that Wired article ( http://www.wired.com/2016/05/googles-making-chips-... ) when I wrote the above, but yeah, they mostly see things the same way I do...
ddriver - Friday, May 20, 2016 - link
Saying that in any context other than a joke is just sad...

How does this hurt Intel? Why should Intel be worried or even care? It doesn't compete with Intel's products, and couldn't even if it wanted to. There is nothing preventing Intel from manufacturing special-purpose processors if they wanted to. As I already mentioned, Google only developed this because they need it to comb through the personal information they've mined over the years. Intel doesn't need that, Google does, so they make a special-purpose chip they can't buy, because barely anyone else needs it. Google will not, and could not, compete with Intel even if it wanted to; they don't have the experience, resources and know-how of Intel. The thing is that Intel won't go into anything that doesn't promise high profit margins - that's why their mobile device business isn't going too well - Intel is not very enthusiastic about low-margin markets.

Don't be such a tool, buying into sensationalist article titles. Those chips will serve one purpose alone - assist google in capitalizing on people's data, they have exabytes of data collected, and they need the most efficient tools to process it.
