ARM Unveils Next Generation Bifrost GPU Architecture & Mali-G71: The New High-End Mali

Name: ARM Unveils Next Generation Bifrost GPU Architecture & Mali-G71: The New High-End Mali
Item: ARM Unveils Next Generation Bifrost GPU Architecture & Mali-G71: The New High-End Mali
Author: Ryan Smith

by Ryan Smith on May 30, 2016 7:00 AM EST

Posted in
GPUs
Arm
Mali
Mali G
Bifrost

57 Comments | Add A Comment

57 Comments

Putting It Together: Mali-G71

Now that we’ve seen what Bifrost looks like at the core level, it’s time to take a look at the big picture. The first ARM GPU design that Bifrost will be going into is the high-end Mali-G71, which is being announced alongside the Cortex-A73 this morning.

Like other Mali designs, the G71 is designed for a variable number of shader cores, up to 32 in total. This gives ARM’s clients a significant amount of scalability to work with, and ARM’s own marketing slides show an 8x span in designs, from G71MP4 all the way up to the full G71MP32. At the same time it’s not completely clear at the moment if a full sized 32 core design is even viable for a mobile device, at least on current processes. Mali’s core count scalability tends to be very forward looking in that regard.

Connecting the various shader cores together is a new control fabric for Bifrost. The new fabric goes hand-in-hand with the earlier changes we discussed to the core design, which changed how various units are attached to the fabric. There are also some implications for heterogeneous compute, which we’ll get back to in a bit.

Meanwhile at the other end of the fabric is G71’s L2 cache subsystem. In a change from Midgard, the L2 cache is now a single logical L2 cache, as opposed to being a fully segmented cache before. Furthermore the cache has been reworked a bit to cut down on the number of partial lines that are flushed out to memory. Partial lines became a more pressing problem with LPDDR4, which introduced a larger prefetch size that in turn is less tolerant of partial lines.

But the biggest news here where the L2 cache fits into the bigger picture in the ARM ecosystem, where it’s attached to the SoC coherent interconnect, such as ARM’s new CoreLink CCI-550 interconnect or third-party proprietary interconnect. Overall G71 now offers up to 4 full ACE (fully coherent) interfaces to the interconnect, versus only two ACE Lite (IO coherent) interfaces on Midgard. Taken altogether, thanks to a combination of architecture changes at the GPU level, the fabric level, the cache level, and the interconnect level, G71 offers full cache coherency with the rest of the system. As a result, when paired up with a suitable CPU core, G71 is capable of heterogeneous compute.

ARM has stated their intention to step into offering heterogeneous compute functionality for some time now, and G71 is in turn their first GPU to be released with support for the feature. The implementation here allows for a full “fine grained” compute, meaning that both the CPU and GPU can see each other’s caches, allowing for the greatest potential performance gains from heterogeneous compute.

From a software standpoint, it’s interesting to note that ARM has gone with an OpenCL 2.0-centric approach, intending to make the functionality accessible through that and related (SPIR-V utilizing) APIs such as Vulkan. G71 however does not support the Heterogeneous System Architecture’s HSAIL standard, this despite ARM being a member of the HSA Foundation. ARM did not have too much to say on the matter, but has stated that they never “totally bought into” HSAIL. OpenCL 2.0, by comparison, is a more generic implementation at the API level, leaving ARM to sort out the low level details as they see fit.

Update 06/01: With yesterday's announcement of the HSA 1.1 specification, I went back to ARM to ask them whether the new specification impacts the company’s heterogenous compute plans at all, especially given that their architecture doesn’t support the 1.0 standard. As it turns out, ARM is going a route very similar to AMD’s ROCm platform: while the company isn’t utilizing the HSAIL – and thus in the strictest sense isn’t a complete HSA platform – they are using the HSA standard in the development of their hardware.

At a hardware level, the HSA specification standardizes a number of aspects of the hardware for common interoperability and easier programming purposes, including signals, queues, floating point number handling, and other, low-level minutiae about how heterogeneous execution should work. This is separate from the HSAIL, which is more concerned with the software aspects of heterogeneous programming, and though helpful, is not necessary for heterogeneous compute. As a result while Mali-G71 is technically not an HSA platform, in practice it is HSA hardware, using the HSA specification as a means to offer a common and well understood execution model for heterogeneous compute. So ARM is very much on-board with HSA – and is essentially supplying one of the first non-AMD HSA 1.1 hardware designs – even if they’re not using HSAIL itself.

At this point heterogeneous compute is still a long term play for ARM. The potential performance improvements are, in the right scenarios, very significant. And using the GPU instead of the CPU is again a sound move when there’s lots of suitable parallel work to throw at it, especially in SoCs where power efficiency is so critical. But it will take time to bring software developers on board, so while the hardware will soon be here, it will take some time for the software to catch up.

That said, a big part of the process will the natural migration towards newer APIs that better support heterogeneous execution. ARM of course has been big on Vulkan support, and while Vulkan is first and foremost a graphics API, as the line blurs between graphics and compute, it feeds into their compute plans as well. The forthcoming Vulkan 1.1 specification is set to introduce some new compute functionality that further bridges the gap between Vulkan 1.0 and OpenCL 2.x, which ARM in turn will be preparing to take advantage of.

But regardless of the compute implications, ARM sees Vulkan as being important to the long term progression of software development. The lower overhead of Vulkan factors well into the power and thermal needs of mobile devices; unnecessary CPU work not only burns power, but it eats into thermal headroom that could be going to the GPU. Consequently, expect to see ARM pushing Vulkan even harder in the coming months in alignment with G71.

The Bifrost Core: Decoupled Wrapping Things Up: Mali-G71, Coming Soon

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

57 Comments

View All Comments

Ariknowsbest - Monday, May 30, 2016 - link
Mainly MediaTek and Rockchip that I can remember.
Ryan Smith - Monday, May 30, 2016 - link
Bingo. I haven't forgotten about the IMG crew, but in the Android space (which is really the only competitive space for GPU IP licensing) they've lost most of their market share, especially at the high-end.
name99 - Tuesday, May 31, 2016 - link
However it would be interesting to know how these various features (eg primacy of SIMT rather than SIMD, coherent common address space) compare to PowerVR.
lucam - Tuesday, May 31, 2016 - link
At this point I think it is a blessing that IMG has Apple as big customer; without it they would have completely lost all mobile market share.
Ariknowsbest - Tuesday, May 31, 2016 - link
But it's not good to be dependent on one large customer. Maybe the emergence of VR can help them to retake market share.
lucam - Tuesday, May 31, 2016 - link
Totally agree with you. PowerVr is an hell of solution, but for some reason IMG has lost his leadership in the mobile market, almost disappeared in Android.
I wonder if IMG didn't have Apple, what could be the situation now. Maybe even worse..
zeeBomb - Monday, May 30, 2016 - link
Stay frosty my friends.
Krysto - Monday, May 30, 2016 - link
I guess ARM will abandon HSAIL now that SPIR-V and Vulkan are here. It probably makes sense to stop focusing on OpenCL as well, if developers can just use some other language than OpenCL with SPIR-V.
mdriftmeyer - Monday, May 30, 2016 - link
One uses C99+ or C11++ in OpenCL 2.x. SPIR-V same thing. Why would I care to write in SPIR-V unless it was a requirement for portability? If I want a lower level, higher performance result I'll skip SPIR-V which bridges with OpenCL via LLVM-IR and go straight to using Clang/LLVM and OpenCL?

Don't confuse SPIR-V with the HSA Foundation. They are solving different needs and SPIR-V doesn't address what APUs via AMD are by designed to resolve.
beginner99 - Tuesday, May 31, 2016 - link
Yeah that's a bit of a bummer. For me this pretty much means HSA is DOA. No software company will invest in something HSA compatible if it only is available on AMD APUs.

ARM Unveils Next Generation Bifrost GPU Architecture & Mali-G71: The New High-End Mali

Putting It Together: Mali-G71

Post Your Comment

57 Comments

View All Comments

Ariknowsbest - Monday, May 30, 2016 - link

Ryan Smith - Monday, May 30, 2016 - link

name99 - Tuesday, May 31, 2016 - link

lucam - Tuesday, May 31, 2016 - link

Ariknowsbest - Tuesday, May 31, 2016 - link

lucam - Tuesday, May 31, 2016 - link

zeeBomb - Monday, May 30, 2016 - link

Krysto - Monday, May 30, 2016 - link

mdriftmeyer - Monday, May 30, 2016 - link

beginner99 - Tuesday, May 31, 2016 - link

Log in

Don't have an account? Sign up now