Putting It Together: Mali-G71

Now that we’ve seen what Bifrost looks like at the core level, it’s time to take a look at the big picture. The first ARM GPU design that Bifrost will be going into is the high-end Mali-G71, which is being announced alongside the Cortex-A73 this morning.

Like other Mali designs, the G71 is designed for a variable number of shader cores, up to 32 in total. This gives ARM’s clients a significant amount of scalability to work with, and ARM’s own marketing slides show an 8x span in designs, from G71MP4 all the way up to the full G71MP32. At the same time it’s not completely clear at the moment if a full sized 32 core design is even viable for a mobile device, at least on current processes. Mali’s core count scalability tends to be very forward looking in that regard.

Connecting the various shader cores together is a new control fabric for Bifrost. The new fabric goes hand-in-hand with the earlier changes we discussed to the core design, which changed how various units are attached to the fabric. There are also some implications for heterogeneous compute, which we’ll get back to in a bit.

Meanwhile at the other end of the fabric is G71’s L2 cache subsystem. In a change from Midgard, the L2 cache is now a single logical L2 cache, as opposed to being a fully segmented cache before. Furthermore the cache has been reworked a bit to cut down on the number of partial lines that are flushed out to memory. Partial lines became a more pressing problem with LPDDR4, which introduced a larger prefetch size that in turn is less tolerant of partial lines.

But the biggest news here where the L2 cache fits into the bigger picture in the ARM ecosystem, where it’s attached to the SoC coherent interconnect, such as ARM’s new CoreLink CCI-550 interconnect or third-party proprietary interconnect. Overall G71 now offers up to 4 full ACE (fully coherent) interfaces to the interconnect, versus only two ACE Lite (IO coherent) interfaces on Midgard. Taken altogether, thanks to a combination of architecture changes at the GPU level, the fabric level, the cache level, and the interconnect level, G71 offers full cache coherency with the rest of the system. As a result, when paired up with a suitable CPU core, G71 is capable of heterogeneous compute.

ARM has stated their intention to step into offering heterogeneous compute functionality for some time now, and G71 is in turn their first GPU to be released with support for the feature. The implementation here allows for a full “fine grained” compute, meaning that both the CPU and GPU can see each other’s caches, allowing for the greatest potential performance gains from heterogeneous compute.

From a software standpoint, it’s interesting to note that ARM has gone with an OpenCL 2.0-centric approach, intending to make the functionality accessible through that and related (SPIR-V utilizing) APIs such as Vulkan. G71 however does not support the Heterogeneous System Architecture’s HSAIL standard, this despite ARM being a member of the HSA Foundation. ARM did not have too much to say on the matter, but has stated that they never “totally bought into” HSAIL. OpenCL 2.0, by comparison, is a more generic implementation at the API level, leaving ARM to sort out the low level details as they see fit.

Update 06/01: With yesterday's announcement of the HSA 1.1 specification, I went back to ARM to ask them whether the new specification impacts the company’s heterogenous compute plans at all, especially given that their architecture doesn’t support the 1.0 standard. As it turns out, ARM is going a route very similar to AMD’s ROCm platform: while the company isn’t utilizing the HSAIL – and thus in the strictest sense isn’t a complete HSA platform – they are using the HSA standard in the development of their hardware.

At a hardware level, the HSA specification standardizes a number of aspects of the hardware for common interoperability and easier programming purposes, including signals, queues, floating point number handling, and other, low-level minutiae about how heterogeneous execution should work. This is separate from the HSAIL, which is more concerned with the software aspects of heterogeneous programming, and though helpful, is not necessary for heterogeneous compute. As a result while Mali-G71 is technically not an HSA platform, in practice it is HSA hardware, using the HSA specification as a means to offer a common and well understood execution model for heterogeneous compute. So ARM is very much on-board with HSA – and is essentially supplying one of the first non-AMD HSA 1.1 hardware designs – even if they’re not using HSAIL itself.

At this point heterogeneous compute is still a long term play for ARM. The potential performance improvements are, in the right scenarios, very significant. And using the GPU instead of the CPU is again a sound move when there’s lots of suitable parallel work to throw at it, especially in SoCs where power efficiency is so critical. But it will take time to bring software developers on board, so while the hardware will soon be here, it will take some time for the software to catch up.

That said, a big part of the process will the natural migration towards newer APIs that better support heterogeneous execution. ARM of course has been big on Vulkan support, and while Vulkan is first and foremost a graphics API, as the line blurs between graphics and compute, it feeds into their compute plans as well. The forthcoming Vulkan 1.1 specification is set to introduce some new compute functionality that further bridges the gap between Vulkan 1.0 and OpenCL 2.x, which ARM in turn will be preparing to take advantage of.

But regardless of the compute implications, ARM sees Vulkan as being important to the long term progression of software development. The lower overhead of Vulkan factors well into the power and thermal needs of mobile devices; unnecessary CPU work not only burns power, but it eats into thermal headroom that could be going to the GPU. Consequently, expect to see ARM pushing Vulkan even harder in the coming months in alignment with G71.

The Bifrost Core: Decoupled Wrapping Things Up: Mali-G71, Coming Soon
Comments Locked

57 Comments

View All Comments

  • mdriftmeyer - Tuesday, May 31, 2016 - link

    Aren't you glad you commented yesterday? See the update to HSA.
  • prisonerX - Wednesday, June 1, 2016 - link

    You're confusing OpenCL C with OpenCL. SPIR-V is an intermediate language also supported by OpenCL.
  • pencea - Monday, May 30, 2016 - link

    How about the review for the GTX 1080? It's been days since the card came out. Other major sites have already posted their reviews on both the 1080 and 1070, while AnandTech still haven't posted one yet. Only a pathetic preview.
  • Quake - Monday, May 30, 2016 - link

    Quantiy over quality my friend. Anandtech is known to write some concise, detailed and thorough articles. For me, sites like Engadget are tabloid newspapers while Anandtech is a respected newspaper that takes it time to write some thorough and intelligent reviews.
  • Tabalan - Monday, May 30, 2016 - link

    True, but they could split review into 2 parts - 1st one would consists of tests, benchmarks, etc (like other websites do), other would be about uarchi on it's own. This way almost everyone would be happy.
  • funkforce - Monday, May 30, 2016 - link

    Engadget?! Come on! PCWorld, PCGamesHardware, HardOCP, KitGuru, Hothardware, Tomshardware, TechSpot, HardwareCanucks, TweakTown all posted their review, many almost as thourough as Anandtech... 13 days ago! And we could forgive Mr Smith if it was a one time thing, but it's been like this every GPU review since he took over as Editor in Chief.
    When Anand was in charge, no review was late like this and it still was as thorough.

    Now I LOVE Ryan's writing, it's the best bar none, he is and awesome guy for sure!
    But please! For the love of GOD, step down as Editor and focus on writing only, delivering on time without a boss to push you is not your thing...
    Just check earlier reviews, same thing, 1-2 weeks late, even though promised so many times it would come out a week or two before actually published. (Except GTX 960 which never got published at all after 7 weeks of promises and then just silence)
    Alexa shows this website has lost an insane amount of readers in 1 year. I have been here for almost 20 years and I just want AT to be great again. Please someone do something! Anyone?! <3 Please save AT!
  • r3loaded - Tuesday, May 31, 2016 - link

    Because while other sites will be satisfied with a review that covers "zomg runs Crysis 3 and GTA V on max settings at xyz FPS, temps and noise are pretty good", Anandtech doesn't really roll that way. They won't be satisfied with their review until they've completed a deep dive on the Pascal architecture, the merits of GDDR5X and how it compares with GDDR5 and HBM/HBM2, and quantifying frame latency and consistency.

    Other reviews are written by gamers and computer enthusiasts. Anandtech reviews are written by computer engineers.
  • prisonerX - Tuesday, May 31, 2016 - link

    1080 whiners like you are really tedious. I hope they cancel the review.
  • jjj - Monday, May 30, 2016 - link

    Any clue about cache sizes and if a reduction there is factored into the perf density math? Also wondering about thermal , some of the mentioned changes will help but some more details would be nice.
  • allanmac - Monday, May 30, 2016 - link

    Nice review. Delivering a full-featured Vulkan/SPIR-V 1.1 GPU to the masses is something we're all ready for.

Log in

Don't have an account? Sign up now