ARM Unveils Next Generation Bifrost GPU Architecture & Mali-G71: The New High-End Maliby Ryan Smith on May 30, 2016 7:00 AM EST
Putting It Together: Mali-G71
Now that we’ve seen what Bifrost looks like at the core level, it’s time to take a look at the big picture. The first ARM GPU design that Bifrost will be going into is the high-end Mali-G71, which is being announced alongside the Cortex-A73 this morning.
Like other Mali designs, the G71 is designed for a variable number of shader cores, up to 32 in total. This gives ARM’s clients a significant amount of scalability to work with, and ARM’s own marketing slides show an 8x span in designs, from G71MP4 all the way up to the full G71MP32. At the same time it’s not completely clear at the moment if a full sized 32 core design is even viable for a mobile device, at least on current processes. Mali’s core count scalability tends to be very forward looking in that regard.
Connecting the various shader cores together is a new control fabric for Bifrost. The new fabric goes hand-in-hand with the earlier changes we discussed to the core design, which changed how various units are attached to the fabric. There are also some implications for heterogeneous compute, which we’ll get back to in a bit.
Meanwhile at the other end of the fabric is G71’s L2 cache subsystem. In a change from Midgard, the L2 cache is now a single logical L2 cache, as opposed to being a fully segmented cache before. Furthermore the cache has been reworked a bit to cut down on the number of partial lines that are flushed out to memory. Partial lines became a more pressing problem with LPDDR4, which introduced a larger prefetch size that in turn is less tolerant of partial lines.
But the biggest news here where the L2 cache fits into the bigger picture in the ARM ecosystem, where it’s attached to the SoC coherent interconnect, such as ARM’s new CoreLink CCI-550 interconnect or third-party proprietary interconnect. Overall G71 now offers up to 4 full ACE (fully coherent) interfaces to the interconnect, versus only two ACE Lite (IO coherent) interfaces on Midgard. Taken altogether, thanks to a combination of architecture changes at the GPU level, the fabric level, the cache level, and the interconnect level, G71 offers full cache coherency with the rest of the system. As a result, when paired up with a suitable CPU core, G71 is capable of heterogeneous compute.
ARM has stated their intention to step into offering heterogeneous compute functionality for some time now, and G71 is in turn their first GPU to be released with support for the feature. The implementation here allows for a full “fine grained” compute, meaning that both the CPU and GPU can see each other’s caches, allowing for the greatest potential performance gains from heterogeneous compute.
From a software standpoint, it’s interesting to note that ARM has gone with an OpenCL 2.0-centric approach, intending to make the functionality accessible through that and related (SPIR-V utilizing) APIs such as Vulkan. G71 however does not support the Heterogeneous System Architecture’s HSAIL standard, this despite ARM being a member of the HSA Foundation. ARM did not have too much to say on the matter, but has stated that they never “totally bought into” HSAIL. OpenCL 2.0, by comparison, is a more generic implementation at the API level, leaving ARM to sort out the low level details as they see fit.
Update 06/01: With yesterday's announcement of the HSA 1.1 specification, I went back to ARM to ask them whether the new specification impacts the company’s heterogenous compute plans at all, especially given that their architecture doesn’t support the 1.0 standard. As it turns out, ARM is going a route very similar to AMD’s ROCm platform: while the company isn’t utilizing the HSAIL – and thus in the strictest sense isn’t a complete HSA platform – they are using the HSA standard in the development of their hardware.
At a hardware level, the HSA specification standardizes a number of aspects of the hardware for common interoperability and easier programming purposes, including signals, queues, floating point number handling, and other, low-level minutiae about how heterogeneous execution should work. This is separate from the HSAIL, which is more concerned with the software aspects of heterogeneous programming, and though helpful, is not necessary for heterogeneous compute. As a result while Mali-G71 is technically not an HSA platform, in practice it is HSA hardware, using the HSA specification as a means to offer a common and well understood execution model for heterogeneous compute. So ARM is very much on-board with HSA – and is essentially supplying one of the first non-AMD HSA 1.1 hardware designs – even if they’re not using HSAIL itself.
At this point heterogeneous compute is still a long term play for ARM. The potential performance improvements are, in the right scenarios, very significant. And using the GPU instead of the CPU is again a sound move when there’s lots of suitable parallel work to throw at it, especially in SoCs where power efficiency is so critical. But it will take time to bring software developers on board, so while the hardware will soon be here, it will take some time for the software to catch up.
That said, a big part of the process will the natural migration towards newer APIs that better support heterogeneous execution. ARM of course has been big on Vulkan support, and while Vulkan is first and foremost a graphics API, as the line blurs between graphics and compute, it feeds into their compute plans as well. The forthcoming Vulkan 1.1 specification is set to introduce some new compute functionality that further bridges the gap between Vulkan 1.0 and OpenCL 2.x, which ARM in turn will be preparing to take advantage of.
But regardless of the compute implications, ARM sees Vulkan as being important to the long term progression of software development. The lower overhead of Vulkan factors well into the power and thermal needs of mobile devices; unnecessary CPU work not only burns power, but it eats into thermal headroom that could be going to the GPU. Consequently, expect to see ARM pushing Vulkan even harder in the coming months in alignment with G71.