Putting It Together: Mali-G71

Now that we’ve seen what Bifrost looks like at the core level, it’s time to take a look at the big picture. The first ARM GPU design that Bifrost will be going into is the high-end Mali-G71, which is being announced alongside the Cortex-A73 this morning.

Like other Mali designs, the G71 is designed for a variable number of shader cores, up to 32 in total. This gives ARM’s clients a significant amount of scalability to work with, and ARM’s own marketing slides show an 8x span in designs, from G71MP4 all the way up to the full G71MP32. At the same time it’s not completely clear at the moment if a full sized 32 core design is even viable for a mobile device, at least on current processes. Mali’s core count scalability tends to be very forward looking in that regard.

Connecting the various shader cores together is a new control fabric for Bifrost. The new fabric goes hand-in-hand with the earlier changes we discussed to the core design, which changed how various units are attached to the fabric. There are also some implications for heterogeneous compute, which we’ll get back to in a bit.

Meanwhile, at the other end of the fabric is G71’s L2 cache subsystem. In a change from Midgard, the L2 is now a single logical cache, rather than the fully segmented cache used before. Furthermore, the cache has been reworked a bit to cut down on the number of partial lines that are flushed out to memory. Partial lines became a more pressing problem with LPDDR4, which introduced a larger prefetch size that in turn is less tolerant of partial lines.
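To put a rough number on why partial lines hurt more as the prefetch size grows, here is a minimal sketch. It assumes a deliberately simplified model in which memory is always written in whole bursts; the function name `flush_efficiency` and the 32-byte and 64-byte burst sizes are illustrative assumptions, not figures from ARM or from the LPDDR4 specification.

```python
import math


def flush_efficiency(valid_bytes: int, burst_bytes: int) -> float:
    """Share of the bytes moved to DRAM that carry useful data.

    Toy model (an assumption, not ARM's design): memory is written in
    whole bursts of `burst_bytes`, so a partially filled cache line still
    pays for every burst it touches.
    """
    if valid_bytes <= 0 or burst_bytes <= 0:
        raise ValueError("sizes must be positive")
    bursts = math.ceil(valid_bytes / burst_bytes)
    return valid_bytes / (bursts * burst_bytes)


# Hypothetical burst sizes, chosen only to show the trend.
for burst in (32, 64):
    print(burst, flush_efficiency(16, burst))
```

Under this model, a 16-byte partial write uses half of a 32-byte burst but only a quarter of a 64-byte burst, which is the sense in which a larger prefetch is less forgiving of partial lines.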

But the biggest news here is where the L2 cache fits into the bigger picture in the ARM ecosystem, where it’s attached to the SoC coherent interconnect, such as ARM’s new CoreLink CCI-550 interconnect or a third-party proprietary interconnect. Overall G71 now offers up to 4 full ACE (fully coherent) interfaces to the interconnect, versus only two ACE Lite (IO coherent) interfaces on Midgard. Taken all together, thanks to a combination of architecture changes at the GPU level, the fabric level, the cache level, and the interconnect level, G71 offers full cache coherency with the rest of the system. As a result, when paired up with a suitable CPU core, G71 is capable of heterogeneous compute.

ARM has stated for some time now that they intend to offer heterogeneous compute functionality, and G71 is in turn their first GPU to be released with support for the feature. The implementation here allows for full “fine grained” compute, meaning that both the CPU and GPU can see each other’s caches, allowing for the greatest potential performance gains from heterogeneous compute.

From a software standpoint, it’s interesting to note that ARM has gone with an OpenCL 2.0-centric approach, intending to make the functionality accessible through that and related (SPIR-V utilizing) APIs such as Vulkan. G71 however does not support the Heterogeneous System Architecture’s HSAIL standard, this despite ARM being a member of the HSA Foundation. ARM did not have too much to say on the matter, but has stated that they never “totally bought into” HSAIL. OpenCL 2.0, by comparison, is a more generic implementation at the API level, leaving ARM to sort out the low level details as they see fit.

Update 06/01: With yesterday's announcement of the HSA 1.1 specification, I went back to ARM to ask them whether the new specification impacts the company’s heterogeneous compute plans at all, especially given that their architecture doesn’t support the 1.0 standard. As it turns out, ARM is going a route very similar to AMD’s ROCm platform: while the company isn’t utilizing the HSAIL – and thus in the strictest sense isn’t a complete HSA platform – they are using the HSA standard in the development of their hardware.

At a hardware level, the HSA specification standardizes a number of aspects of the hardware for common interoperability and easier programming purposes, including signals, queues, floating point number handling, and other low-level minutiae about how heterogeneous execution should work. This is separate from the HSAIL, which is more concerned with the software aspects of heterogeneous programming, and though helpful, is not necessary for heterogeneous compute. As a result, while Mali-G71 is technically not an HSA platform, in practice it is HSA hardware, using the HSA specification as a means to offer a common and well understood execution model for heterogeneous compute. So ARM is very much on-board with HSA – and is essentially supplying one of the first non-AMD HSA 1.1 hardware designs – even if they’re not using HSAIL itself.
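For readers who haven't run into the queue-and-signal model before, the general idea can be sketched in a few lines. This is a toy illustration only: it assumes a fixed-size ring buffer with a write index, a read index, and a "doorbell" that wakes the consumer. The names `ToyQueue`, `submit`, `consume`, `STOP`, and `run_demo` are hypothetical, and none of this reproduces the HSA specification's actual packet format or API, nor anything ARM has disclosed about G71.

```python
import threading
import time

STOP = object()  # hypothetical sentinel telling the consumer to exit


class ToyQueue:
    """Ring buffer plus doorbell; a toy model, not the HSA queue format."""

    def __init__(self, size: int):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.slots = [None] * size
        self.mask = size - 1
        self.write_index = 0  # advanced only by the producer (the "CPU")
        self.read_index = 0   # advanced only by the consumer (the "GPU")
        self.doorbell = threading.Condition()

    def submit(self, packet) -> bool:
        """Place a packet in the next slot and ring the doorbell."""
        with self.doorbell:
            if self.write_index - self.read_index == len(self.slots):
                return False  # queue is full; caller must retry later
            self.slots[self.write_index & self.mask] = packet
            self.write_index += 1
            self.doorbell.notify()
            return True

    def consume(self, timeout=None):
        """Wait for the doorbell, then take the oldest packet (None on timeout)."""
        with self.doorbell:
            ready = self.doorbell.wait_for(
                lambda: self.read_index < self.write_index, timeout
            )
            if not ready:
                return None
            packet = self.slots[self.read_index & self.mask]
            self.read_index += 1
            return packet


def run_demo(values):
    """A producer thread hands work to a consumer thread through the queue."""
    queue = ToyQueue(size=4)
    results = []

    def agent():
        while True:
            packet = queue.consume()
            if packet is STOP:
                return
            results.append(packet * packet)  # stand-in for real work

    worker = threading.Thread(target=agent)
    worker.start()
    for item in list(values) + [STOP]:
        while not queue.submit(item):
            time.sleep(0.001)  # back off while the queue is full
    worker.join()
    return results
```

The point of standardizing this kind of structure is that the producer and consumer only need to agree on the layout of the queue and the meaning of the signal, not on each other's internals.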

At this point heterogeneous compute is still a long term play for ARM. The potential performance improvements are, in the right scenarios, very significant. And using the GPU instead of the CPU is again a sound move when there’s lots of suitable parallel work to throw at it, especially in SoCs where power efficiency is so critical. But it will take time to bring software developers on board, so while the hardware will soon be here, it will take some time for the software to catch up.

That said, a big part of the process will be the natural migration towards newer APIs that better support heterogeneous execution. ARM of course has been big on Vulkan support, and while Vulkan is first and foremost a graphics API, as the line blurs between graphics and compute, it feeds into their compute plans as well. The forthcoming Vulkan 1.1 specification is set to introduce some new compute functionality that further bridges the gap between Vulkan 1.0 and OpenCL 2.x, which ARM in turn will be preparing to take advantage of.

But regardless of the compute implications, ARM sees Vulkan as being important to the long term progression of software development. The lower overhead of Vulkan factors well into the power and thermal needs of mobile devices; unnecessary CPU work not only burns power, but it eats into thermal headroom that could be going to the GPU. Consequently, expect to see ARM pushing Vulkan even harder in the coming months in alignment with G71.


57 Comments


  • Andrei Frumusanu - Monday, May 30, 2016 - link

    > it's cheaper this way than buying license to modify these cores (and you have to add R&D costs of modifying uarch).

    There is no such license. ARM does not allow vendors to modify the licensed micro-architectures, even on the newly announced license it's ARM themselves which do the modifications, not giving vendors access to change the RTL.

    http://www.anandtech.com/show/10366/arm-built-on-c...
  • Tabalan - Monday, May 30, 2016 - link

    I meant architecture license, custom design cores. My bad, used wrong phrase, thanks for pointing that out.
  • Shadow7037932 - Monday, May 30, 2016 - link

    Samsung sells the Exynos, but vendors are unlikely to jump ship from Qualcomm because the OEMs already know the Qualcomm stuff well.
  • FullmetalTitan - Monday, May 30, 2016 - link

    Also a bit of global supply economics in play there. Samsung uses Exynos chips in their flagship phones in the Asian market and sometimes select parts of the European market, but typically they buy from Qualcomm for the NA market and majority of Europe due to marketing, existing penetration, etc. It also helps that Samsung currently is a fabricator of the newest Snapdragon SoCs in both Korea and the U.S. and that affords them prime pricing.

    They would be slapping the right hand with the left by getting too deep into the R&D and fighting for market share with Exynos. It would be taking revenue from their foundry business to try to grow their design business, and margins in mobile logic are pretty slim these days.
  • Howard72 - Monday, May 30, 2016 - link

    Mali G71 parallel architecture is now consider as RISC SIMD?
  • Howard72 - Monday, May 30, 2016 - link

    *considerd
  • OEMG - Monday, May 30, 2016 - link

    I hope they also push this to the low-end, at least for the sake of API parity (Vulkan! Vulkan! Vulkan!) among a wide range of devices.
  • LeptonX - Monday, May 30, 2016 - link

    "In just six years, the number of GPU vendors with a major presence in high-end Android phones has been whittled down to only two: the vertically integrated Qualcomm, and the IP-licensing ARM."

    What about PowerVR? Why did you omit them? Aren't they still a major player with GPUs that are at the top both in terms of overall performance and energy-efficiency?
  • Ariknowsbest - Monday, May 30, 2016 - link

    I would consider PowerVR as a major player, after several generations i find them to be very balanced and powerful.
    But they have lost marketshare to ARM.

    So far only bought products with PowerVR and Adreno and a couple Tegras.
  • Colin1497 - Monday, May 30, 2016 - link

    I think the "android" statement is the qualifier. PowerVR is obviously big with Apple as a customer. In the non-Apple space I guess they're more limited? MediaTek uses them?
