Imagination Announces B-Series GPU IP: Scaling up with Multi-GPU

Author: Andrei Frumusanu

by Andrei Frumusanu on October 13, 2020 4:00 AM EST


Introducing BXS Series: Functional Safety for Autonomy

Besides aiming at higher-performance designs, Imagination is also putting greater focus on the automotive and industrial markets. To cover these use-cases, Imagination is today also launching the new “BXS” series of GPU IP, where the S stands for safety.

The new GPU IP line-up mirrors the standard BXT, BXM and BXE configurations, but adds support for ISO 26262 / ASIL-B functional safety features.

Imagination is introducing a new feature called “Tile Region Protection” in which a configurable region of render tiles on the render frame can be marked as safety critical, and for which the GPU can check for correct execution and rendering, allowing it to be ISO 26262 certified.

TRP is implemented from the smallest BXE-equivalent BXS GPU (frankly, Imagination could have done better here than calling the whole safety line-up BXS), allowing for work repetition to achieve fault detection. Furthermore, Imagination allows for end-to-end data integrity protection via CRC checking of all data going in and out of the GPU, further helping the IP achieve safety requirements.
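The idea of detecting faults by repeating work and comparing checksums can be illustrated with a minimal Python sketch. This is not Imagination's implementation: `render_tile`, `render_tile_checked`, `render_frame` and the byte-string "tiles" are hypothetical names invented for illustration, and `zlib.crc32` merely stands in for whatever checksum the hardware actually uses.

```python
import zlib


def render_tile(tile_id: int, scene: bytes) -> bytes:
    """Hypothetical stand-in for rendering one tile: deterministic bytes from the inputs."""
    return bytes((b + tile_id) % 256 for b in scene)


def render_tile_checked(tile_id, scene, render=render_tile):
    """Render a safety-critical tile twice and compare CRCs of the two results.

    A mismatch means the two passes disagreed, so a fault is reported.
    """
    first = render(tile_id, scene)
    second = render(tile_id, scene)
    if zlib.crc32(first) != zlib.crc32(second):
        raise RuntimeError(f"fault detected in tile {tile_id}")
    return first


def render_frame(scene, num_tiles, protected):
    """Render all tiles; only tiles in the `protected` set pay for the repeated pass."""
    return [
        render_tile_checked(t, scene) if t in protected else render_tile(t, scene)
        for t in range(num_tiles)
    ]
```

In this sketch only the tiles marked as protected are rendered twice, which is where the extra cost of repeating work comes from.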

TRP requires a single GPU to repeat work, which in turn means reduced performance in a system. A more performance-oriented way of scaling things would be a multi-GPU implementation.

A multi-GPU configuration in an automotive design would also serve the purpose of partitioning the GPUs for multiple independent workloads; whilst in a consumer implementation you would expect the GPUs to mostly act and appear as a single large unit to a host, automotive use-cases could also have the multiple GPUs act completely independently from each other. It’s also possible to mix and match GPUs: for example, a 4-core implementation could have 3 partitions, with two GPUs working together to pool resources for a more demanding task such as the infotainment system, while the two other GPUs each handle other independent workloads.
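The 4-core, 3-partition example can be written down as a small data structure. The Python sketch below is purely illustrative: `Partition`, `validate_partitions` and the workload names other than infotainment are assumptions, not part of any Imagination driver or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str
    gpu_cores: tuple  # indices of the physical GPU cores pooled into this partition


def validate_partitions(partitions, total_cores):
    """Check that every core belongs to exactly one partition; return a core -> owner map."""
    assigned = [core for p in partitions for core in p.gpu_cores]
    if sorted(assigned) != list(range(total_cores)):
        raise ValueError("each GPU core must be assigned to exactly one partition")
    return {core: p.name for p in partitions for core in p.gpu_cores}


# Two cores pooled for infotainment, two cores working independently.
# The names of the two single-core workloads are hypothetical.
layout = [
    Partition("infotainment", (0, 1)),
    Partition("instrument_cluster", (2,)),
    Partition("driver_monitoring", (3,)),
]
core_owner = validate_partitions(layout, total_cores=4)
```

The check that no core is shared between partitions mirrors the point that the partitions are meant to run independently of each other.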

Imagination naturally also continues to support hardware virtualisation within one single GPU with up to 8 “hyperlanes” (guests). So, you could split up a 2-core design into 3 partitions, such as depicted above.

Beyond the addition of safety-critical features on the BXS series, the automotive IP also features some specific enhancements in the microarchitecture that allow for better performance scaling for workloads that are more unique to the automotive space. One such aspect is geometry, where automotive vendors have the tendency to use absurd amounts of triangles. Imagination says they’ve tweaked their designs to cover these more demanding use-cases, and together with some MSAA-specific optimisations they can reach up to 60% greater performance for these automotive edge-cases, compared to the regular non-automotive IP.


74 Comments


Yojimbo - Tuesday, October 13, 2020 - link
I didn't know Xi JinPing was an engineer...
EthiaW - Wednesday, October 14, 2020 - link
Those Chinese have only managed to outcast the former corporate leaders recently. The shift from engineer culture will take time, if not reverted by the UK government.
Yojimbo - Wednesday, October 14, 2020 - link
They had the stubbornness to not be bought by Apple in order to be bought by the Chinese government. And through what will or method is the UK government going to change the culture of the company?
Yojimbo - Tuesday, October 13, 2020 - link
Hey, you're right. He studied chemical engineering. I knew that, but forgot.
melgross - Tuesday, October 13, 2020 - link
With Apple being 60% of their sales, and 80% of their profits, they demanded $1 billion from Apple, which refused that ridiculous price.

The company is likely worth no more than $100 million, if that, considering their sales are now just about $20 million a year.
colinisation - Tuesday, October 13, 2020 - link
Well, if not Apple, why not ARM? I know ARM tried to buy them at some point in the past.
But once Apple left, their valuation would have taken a pretty substantial hit. ARM's GPU IP is successful, but I don't think it is the most area/power efficient, so it looked to me like something they would have explored. Both would have been in the same country, and maybe it would have spurred ARM into providing a more viable alternative to Qualcomm in the smartphone GPU space.
CiccioB - Tuesday, October 13, 2020 - link
"Whereas current monolithic GPU designs have trouble being broken up into chiplets in the same way CPUs can be, Imagination’s decentralised multi-GPU approach would have no issues in being implemented across multiple chiplets, and still appear as a single GPU to software."

There's no problem in splitting today's desktop monolithic GPUs into chiplets.
What is done here is to create small chiplets that each have all the pieces a monolithic one needs. The main one is the memory controller.

Splitting a GPU over chiplets all having their own MC is technically simple but makes a mess when trying to use them due to the NUMA configuration. Being connected with a slow bus makes data sharing between chiplets almost impossible and so needs the programmer/driver to split the needed data over the single chiplet memory space and not make algorithms that share data between them.

The real problem with MCM configuration is data sharing = bandwidth.
You have to allow data to flow from one core to another regardless of which chiplet it physically sits on. That's the only way you can obtain really efficient MCM GPUs.
And that requires high power+wide buses and complex data management with most probably very big caches (= silicon and power again) to mask memory latency and natural bandwidth restriction as it is impossible to have buses as fast as actual ones that connect 1TB/s to a GPU for each chiplet.

As you can see, to make their GPUs work in parallel in the HPC market, Nvidia made a very fast point-to-point connection and created very fast switches to connect them together.
hehatemeXX - Tuesday, October 13, 2020 - link
That's why Infinity Cache is big. The bandwidth limitation is removed.
Yojimbo - Tuesday, October 13, 2020 - link
Anything Infinity is big, except compared to a bigger Infinity.
CiccioB - Tuesday, October 13, 2020 - link
It is removed just for the size of the cache.
If you need more than that amount of data you'll still be limited by bandwidth.
With the big cache latency now added.

If it were so easy to reduce the bandwidth limitations, anyone would just add a big enough cache... the fact is that there's no big enough cache for the immense quantity of data GPUs work with, unless you want all your VRAM as a cache (but then you won't be connected with such a limited bus).
