Midgard: The Modern Mali

Midgard is ARM’s current-generation SoC GPU architecture. At the highest level it is an interesting take on GPUs that in some ways looks a lot like other GPUs we’ve seen before, and in other ways (owing to its uncommon ancestry) is radically unlike them. This is coupled with the fact that as an SoC GPU supplier, ARM is in an unusual position: they can offer both CPU and GPU designs to third-party licensees, unlike most other GPU designers, who either use their designs internally (Qualcomm, NVIDIA) or license out GPUs but not ARM CPUs (Imagination). From a sales perspective this means ARM can offer the CPU and GPU designs together in a bundle, but perhaps more importantly it means they have the capability to design the two in concert with each other, as the sole creator of the ARM ISA.

Architecturally, Midgard is a direct descendant of Utgard. While there is a significant difference in how unified and discrete shaders operate, and as a result they cannot simply be swapped, the resulting shader design for Midgard still ends up inheriting many of Utgard’s design elements, features, and quirks. At the same time the surrounding functional blocks that compose the rest of the GPU have received their own upgrades over the years to improve performance and features, but are nonetheless distinctly descended from Utgard as well. At the end of the day this is a distinction more important for programmers than it is for users (or even tech enthusiasts), but going forward it’s interesting to note just how similar Utgard and Midgard are, a similarity we don’t normally see between unified and discrete shader designs.

Midgard is designed to span much of the range for SoC GPUs, from cheap, area-efficient designs to relatively massive designs with an eye on gaming. To do so, ARM offers a few different variations on the Midgard design that share the same underlying architecture but vary slightly in features and internal organization. So for the purposes of today’s article we’ll be focusing on ARM’s latest and greatest design, Mali-T760, but we will also be calling out differences as necessary.

First and foremost then, let’s talk about design goals and features. Unlike the bare-bones OpenGL ES 2.0 Utgard architecture, Midgard has been designed to be a more feature-rich architecture that offers not only solid graphics performance but solid compute performance too. This is in part a logical extension of what a unified shader GPU can already do – they’re innately good at mass math for graphics, so compute is only a minor stretch – but also a deliberate decision by ARM to push compute harder than they would otherwise have to for merely a graphics product.

From an API standpoint then Midgard was designed as what is best described as an OpenGL ES 3.0+ part. The architecture was designed from the start to offer functionality beyond what OpenGL ES 3.0 would offer, a decision that has since benefitted ARM by allowing Midgard parts to keep up with newer API standards. In fact ARM has just recently completed OpenGL ES 3.1 conformance testing, with their updated drivers passing Khronos’s required tests. As such all Midgard parts at a hardware level can support OpenGL ES 3.1, with software support reliant on OS and device vendors shipping updated OSes and drivers that enable 3.1 functionality.

Even then Midgard has some functionality that has gone untapped, but will be enabled in the Android ecosystem through the upcoming Android Extension Pack for Android L. The AEP will further build on OpenGL ES 3.1 by enabling features such as tessellation and geometry shaders, features that did not make it into 3.1. As with OpenGL ES 3.1, ARM has confirmed that they expect all Midgard GPUs to support the AEP.
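Since both OpenGL ES 3.1 and the AEP depend on shipped drivers rather than on hardware alone, applications generally have to check at runtime what a device actually exposes. What follows is a minimal sketch of that check, not ARM's or Google's implementation. It assumes the application has already read the `GL_VERSION` string and the extension list from its GL context, and that the AEP is advertised through the `GL_ANDROID_extension_pack_es31a` extension name. The helper names and the sample strings are hypothetical and purely for illustration.

```python
import re

# Extension name under which the Android Extension Pack is advertised.
AEP_EXTENSION = "GL_ANDROID_extension_pack_es31a"


def parse_gles_version(version_string):
    """Return (major, minor), or None if the string does not start
    with "OpenGL ES <major>.<minor>"."""
    match = re.match(r"OpenGL ES (\d+)\.(\d+)", version_string)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def supports_aep(version_string, extensions):
    """True if the context reports OpenGL ES 3.1 or later and lists the AEP."""
    version = parse_gles_version(version_string)
    return version is not None and version >= (3, 1) and AEP_EXTENSION in extensions


# Hypothetical values standing in for what a real GL context would return.
sample_version = "OpenGL ES 3.1 vendor-build-info"
sample_extensions = {"GL_ANDROID_extension_pack_es31a", "GL_KHR_debug"}

print(parse_gles_version(sample_version))
print(supports_aep(sample_version, sample_extensions))
```

The version comparison is done on integer tuples rather than on the raw string, so a future "3.10" would not be mistaken for something older than "3.2".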

Finally, along with OpenGL ES support, ARM also officially offers Direct3D support on Midgard. This functionality has not yet been tapped – all Windows Phone and Windows RT devices so far have been Qualcomm or NVIDIA based – but in principle it is there. One thing to note is that among the Mali-T700 series, only Mali-T760 is Direct3D Feature Level 11_1 capable. Mali-T720 supports only Feature Level 9_3, more befitting of the market realities and its status as a lower cost, lower complexity part.

Meanwhile from a compute standpoint Midgard is intended to be a strong competitor by supporting Android’s RenderScript framework and OpenCL 1.2 full profile. OpenCL support on SoC GPUs has been spotty due in part to the fact that the major OSes haven’t consistently supported it (iOS never has and Android only recently), and of those SoC GPUs that do support it, not all of them support the full profile as opposed to the much more restricted embedded profile. As is often the case with GPU computing just how well this functionality is used is up to the capabilities and imaginations of developers, but ARM has made it clear that they’re fully backing GPU computing even in the SoC space.
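To make the full-versus-embedded distinction concrete: an OpenCL device reports its profile as either `FULL_PROFILE` or `EMBEDDED_PROFILE` (the `CL_DEVICE_PROFILE` query), and its version as a string beginning with `OpenCL <major>.<minor>` (the `CL_DEVICE_VERSION` query). The sketch below classifies devices from those two strings. It assumes the strings have already been obtained from the OpenCL runtime, and the function names and sample device entries are hypothetical, not taken from any real product.

```python
import re


def parse_opencl_version(device_version):
    """Return (major, minor) from a CL_DEVICE_VERSION-style string, or None."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", device_version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_full_profile_at_least(profile, device_version, minimum=(1, 2)):
    """True if the device reports the full profile at or above `minimum`."""
    version = parse_opencl_version(device_version)
    return profile == "FULL_PROFILE" and version is not None and version >= minimum


# Hypothetical (profile, version) pairs standing in for real device queries.
sample_devices = {
    "full-profile-gpu": ("FULL_PROFILE", "OpenCL 1.2 vendor-info"),
    "embedded-profile-gpu": ("EMBEDDED_PROFILE", "OpenCL 1.2 vendor-info"),
    "older-full-profile-gpu": ("FULL_PROFILE", "OpenCL 1.1 vendor-info"),
}

for name, (profile, version) in sample_devices.items():
    print(name, is_full_profile_at_least(profile, version))
```

Only the first sample entry satisfies the OpenCL 1.2 full profile bar described above; the other two fail on profile and on version respectively.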

66 Comments

  • 3DPowerFX - Thursday, July 3, 2014 - link

    Once again, AnandTech has published a great article! Thanks ARM and AnandTech.

    Just one point: there is a small mistake in the article about the Samsung Exynos 3470 GPU. It's not the Mali 450MP but the undead Mali 400MP GPU. Although it would be nice to have the latest one.
  • Cogman - Thursday, July 3, 2014 - link

    On transaction elimination. A movie is actually much worse about being eliminated than anything else. The only saving grace for a movie is the fact that the FPS are often much lower than what the device is natively putting out (so 60fps is a typical display refresh rate whereas movies typically operate at 24->30fps). After that, everything changes right down to the smallest detail. This is the grainy effect that you see in movies.

    For games, there could be some benefit assuming the game isn't a high action one. The biggest win will be still images (90% of what these displays are going to be displaying).
  • EdvardS - Thursday, July 3, 2014 - link

    Movies are not actually that bad. Remember that videos we watch on our devices have already been compressed with lossy algorithms looking for temporal resemblance, which seems to boost the transaction elimination efficiency as well.
  • BMNify - Thursday, July 3, 2014 - link

    gem did a writeup, but I can't find it now! Take a look here as regards transaction elimination: http://community.arm.com/groups/arm-mali-graphics/...

    BTW "the grainy effect that you see in movies" has absolutely nothing to do with frame rate.

    It's put there (as in artificially) by the post-processing, due to the fact that today everyone's using 8bit per pixel as in the Rec. 709 (HDTV) color space, which produces banding and other visible anomalies, not the new official Rec. 2020 (UHDTV/UHD-1/UHD-2) real 10bit/12bit color space we will see soon.
  • tuxRoller - Thursday, July 3, 2014 - link

    Consider asking Red Hat's Rob Clark. He's been reverse engineering the Adreno arch (his driver, freedreno (https://github.com/freedreno/freedreno/wiki), however, is not a reverse-engineered Adreno driver) for a few years now and can almost certainly give you at least that much info.
    His blog is at http://bloggingthemonkey.blogspot.com, and he's a super nice guy.
  • jwcalla - Friday, July 4, 2014 - link

    Qualcomm is a really closed company. They just did a massive DMCA takedown on GitHub: https://github.com/github/dmca/blob/master/2014-07...

    Their software side isn't that great either.
  • tuxRoller - Friday, July 4, 2014 - link

    I'm not sure why this is addressed to me. Although I expect AT will ignore what I've written so as not to upset their corporate friends, what I suggested is what they should do if they are really interested in the tech.
    What's strange to me is that they did something similar with their analysis of Cyclone to what I'm suggesting they do, except in the Qualcomm case the work is done by someone else.
  • Death666Angel - Thursday, July 3, 2014 - link

    Awesome to see this here! I hope the Adreno team will follow suit soon and lay their doubts to rest.

    "LG’s Viewty" Holy shit, that was my 2nd ever phone (after my first flip phone got broken when I rammed a car with my bike). That thing was pretty bad all in all. But the slow motion camera was great for its time! :D It broke too while I was in a fight, but that was the last one. Touch Pro 2, Galaxy S2, Galaxy Nexus and LG G2 are all working fine to this day. :D
  • Willardjuice - Thursday, July 3, 2014 - link

    "From a sales perspective this means ARM can offer the CPU and GPU designs together in a bundle, but perhaps more importantly it means they have the capability design the two in concert with each other, being in the position of the sole creator of the ARM ISA."

    lol, the bundle aspect is far more important for ARM gpu sales. ;)
  • skiboysteve - Thursday, July 3, 2014 - link

    Truth. Basically makes it so a competitor needs to show a significant performance, power, feature, or cost difference before it's worth an integrator investing in breaking apart the bundle.
