Putting It Together: Mali-G71

Now that we’ve seen what Bifrost looks like at the core level, it’s time to take a look at the big picture. The first ARM GPU design that Bifrost will be going into is the high-end Mali-G71, which is being announced alongside the Cortex-A73 this morning.

Like other Mali designs, the G71 is designed for a variable number of shader cores, up to 32 in total. This gives ARM’s clients a significant amount of scalability to work with, and ARM’s own marketing slides show an 8x span in designs, from G71MP4 all the way up to the full G71MP32. At the same time it’s not completely clear at the moment if a full sized 32 core design is even viable for a mobile device, at least on current processes. Mali’s core count scalability tends to be very forward looking in that regard.

Connecting the various shader cores together is a new control fabric for Bifrost. The new fabric goes hand-in-hand with the earlier changes we discussed to the core design, which changed how various units are attached to the fabric. There are also some implications for heterogeneous compute, which we’ll get back to in a bit.

Meanwhile at the other end of the fabric is G71’s L2 cache subsystem. In a change from Midgard, the L2 cache is now a single logical L2 cache, as opposed to being a fully segmented cache before. Furthermore the cache has been reworked a bit to cut down on the number of partial lines that are flushed out to memory. Partial lines became a more pressing problem with LPDDR4, which introduced a larger prefetch size that in turn is less tolerant of partial lines.

But the biggest news here where the L2 cache fits into the bigger picture in the ARM ecosystem, where it’s attached to the SoC coherent interconnect, such as ARM’s new CoreLink CCI-550 interconnect or third-party proprietary interconnect. Overall G71 now offers up to 4 full ACE (fully coherent) interfaces to the interconnect, versus only two ACE Lite (IO coherent) interfaces on Midgard. Taken altogether, thanks to a combination of architecture changes at the GPU level, the fabric level, the cache level, and the interconnect level, G71 offers full cache coherency with the rest of the system. As a result, when paired up with a suitable CPU core, G71 is capable of heterogeneous compute.

ARM has stated their intention to step into offering heterogeneous compute functionality for some time now, and G71 is in turn their first GPU to be released with support for the feature. The implementation here allows for a full “fine grained” compute, meaning that both the CPU and GPU can see each other’s caches, allowing for the greatest potential performance gains from heterogeneous compute.

From a software standpoint, it’s interesting to note that ARM has gone with an OpenCL 2.0-centric approach, intending to make the functionality accessible through that and related (SPIR-V utilizing) APIs such as Vulkan. G71 however does not support the Heterogeneous System Architecture’s HSAIL standard, this despite ARM being a member of the HSA Foundation. ARM did not have too much to say on the matter, but has stated that they never “totally bought into” HSAIL. OpenCL 2.0, by comparison, is a more generic implementation at the API level, leaving ARM to sort out the low level details as they see fit.

Update 06/01: With yesterday's announcement of the HSA 1.1 specification, I went back to ARM to ask them whether the new specification impacts the company’s heterogenous compute plans at all, especially given that their architecture doesn’t support the 1.0 standard. As it turns out, ARM is going a route very similar to AMD’s ROCm platform: while the company isn’t utilizing the HSAIL – and thus in the strictest sense isn’t a complete HSA platform – they are using the HSA standard in the development of their hardware.

At a hardware level, the HSA specification standardizes a number of aspects of the hardware for common interoperability and easier programming purposes, including signals, queues, floating point number handling, and other, low-level minutiae about how heterogeneous execution should work. This is separate from the HSAIL, which is more concerned with the software aspects of heterogeneous programming, and though helpful, is not necessary for heterogeneous compute. As a result while Mali-G71 is technically not an HSA platform, in practice it is HSA hardware, using the HSA specification as a means to offer a common and well understood execution model for heterogeneous compute. So ARM is very much on-board with HSA – and is essentially supplying one of the first non-AMD HSA 1.1 hardware designs – even if they’re not using HSAIL itself.

At this point heterogeneous compute is still a long term play for ARM. The potential performance improvements are, in the right scenarios, very significant. And using the GPU instead of the CPU is again a sound move when there’s lots of suitable parallel work to throw at it, especially in SoCs where power efficiency is so critical. But it will take time to bring software developers on board, so while the hardware will soon be here, it will take some time for the software to catch up.

That said, a big part of the process will the natural migration towards newer APIs that better support heterogeneous execution. ARM of course has been big on Vulkan support, and while Vulkan is first and foremost a graphics API, as the line blurs between graphics and compute, it feeds into their compute plans as well. The forthcoming Vulkan 1.1 specification is set to introduce some new compute functionality that further bridges the gap between Vulkan 1.0 and OpenCL 2.x, which ARM in turn will be preparing to take advantage of.

But regardless of the compute implications, ARM sees Vulkan as being important to the long term progression of software development. The lower overhead of Vulkan factors well into the power and thermal needs of mobile devices; unnecessary CPU work not only burns power, but it eats into thermal headroom that could be going to the GPU. Consequently, expect to see ARM pushing Vulkan even harder in the coming months in alignment with G71.

The Bifrost Core: Decoupled Wrapping Things Up: Mali-G71, Coming Soon
POST A COMMENT

57 Comments

View All Comments

  • Ranger1065 - Monday, May 30, 2016 - link

    Interesting but not quite the GPU review the faithfull are awaiting...hope springs eternal. Reply
  • Shadow7037932 - Monday, May 30, 2016 - link

    This is more interesting than a GTX 1070/1080 review imo. We more or less know what the nVidia cards are capable of. This ARM GPU design will be relevant for the next 2-3 years. Reply
  • Alexey291 - Monday, May 30, 2016 - link

    Well yeah, ofc it's not interesting anymore because by the time their reviews hit whatever it is they are reviewing is a known quantity. 1080 in this case is a perfect example. Reply
  • name99 - Tuesday, May 31, 2016 - link

    Oh give it a rest! Your whining about the 1080 is growing tiresome. Reply
  • SpartanJet - Monday, May 30, 2016 - link

    Relevant for what? Phone GPU's are fine as is. For mobile gaming? Its all cash shop garbage. For productivity? ADroid and iOS pale in comparison to a real OS like Windows 10.

    I find the Nvidia much more interesting.
    Reply
  • name99 - Tuesday, May 31, 2016 - link

    "Phone GPUs are fine as is."
    And there, ladies and gentlemen, is the "640K is enough for anyone" of our times...
    Truly strong the vision is, in this one.
    Reply
  • imaheadcase - Tuesday, May 31, 2016 - link

    He is not wrong though. A faster GPU on one offers nothing for people as of right now. If you look at the mobile apps that are used, not a single one would even come close to using what they use now..and phones are at max quality for the screen size they use.

    The only way phones can improve now that users would notice is storage/CPU/and better app quality in general which is terrible.
    Reply
  • shadarlo - Tuesday, May 31, 2016 - link

    You realize that GPU's are used for things beyond prettying up a screen right?

    Lets just stop advancing mobile GPU's because in the future we will never use anything more advanced that needs more power or less power usage... *eye roll*
    Reply
  • ribzy - Monday, November 21, 2016 - link

    I think what he's trying to say is that the content is not available or sufficient right now to really justify the cost. Let's say you built a car but don't have the fuel that's needed to power it. Or what about those that got a 4K TV early and now are stuck with something that doesn't have HDR and that runs old, not updateable TV software. I understand his concern. Reply
  • mr_tawan - Thursday, June 02, 2016 - link

    Well as the phone's and tablet's resolution is going ridiculously higher and higher everyday, having strong GPU is a must. It's not all about gaming, it affects day-to-day use as well. Everything on displays now on every OS nowadays are hardware-accelerated. And the current mobile GPU are under-power comparing to the task it's given (eg. driving 2K displays at 60fps). Reply

Log in

Don't have an account? Sign up now