Over the last few years the SoC GPU space has taken an interesting path, and one I admittedly wasn't expecting. At the start of this decade the playing field for SoC-class GPUs was rather diverse, with everyone from NVIDIA to Broadcom (and everything in between) participating in it. Consolidation in the GPU space was inevitable, as we've already seen with SoC vendors dropping out, but I am surprised by just how quickly it has happened. In just six years, the number of GPU vendors with a major presence in high-end Android phones has been whittled down to only two: the vertically integrated Qualcomm, and the IP-licensing ARM.

That ARM has managed to secure most of the licensed GPU market for themselves is a testament to both their engineering and their IP licensing efforts. ARM's path into this market has been non-traditional: the company acquired an essentially unknown GPU vendor (Falanx) a decade ago and has grown it into the 800lb gorilla it is today. ARM's expertise in IP licensing, coupled with a somewhat unusual GPU architecture, has proven to be a powerful combination for the company as they have secured a number of significant wins from the high end to the low end.

Much of this growth was built on the back of the company’s GPU architecture of the last few years, Midgard. Initially launched in 2012, Midgard has been the cornerstone of ARM’s Mali 600, 700, and 800 series designs. As ARM’s first unified shader design for GPUs, Midgard has been extended over the years to support newer features such as geometry tessellation and 10bpc color, along with newer APIs such as OpenGL ES 3.1/3.2 and Vulkan.

However, as Midgard approaches its fourth birthday and the SoC GPU landscape evolves, Midgard's time at the top is coming to an end. Amidst the backdrop of Computex 2016 and alongside their new Cortex-A73 CPU, ARM is announcing their next generation GPU architecture, Bifrost. A significant update to ARM's GPU architecture, Bifrost will first be deployed in ARM's Mali-G71 GPU.

Recap: Mali & VLIW

One of the interesting aspects of SoC GPU development over the years is that it has been a very distinct echo of larger discrete GPU development. Many innovations and changes that first show up with dGPUs will show up in SoC GPUs a few years later, as newer manufacturing processes allow for those developments to fit within the extreme space and power requirements of an SoC-class GPU. At the same time mobile games/graphics development follows a similar path, with mobile application developers picking up rendering techniques first used elsewhere.

ARM’s architectural development, in turn, has been a good example of this process. The non-unified Utgard architecture gave way to the unified Midgard architecture in 2012, about 6 years after dGPUs first made the transition. And as we learned when we examined the Midgard architecture in depth, Midgard was an architecture well suited for the rendering paradigms of the time.

Midgard's shader core, in short, was an Instruction Level Parallelism-centric design, employing a Very Long Instruction Word (VLIW) instruction format. To achieve maximum utilization out of Midgard's shader cores, you needed to extract a significant amount of ILP, 4 concurrent instructions, in order to fill all of the slots in a shader core. This sort of design maps well to basic graphics workloads, as 4-component RGBA color is a natural fit for the 4 lanes of ARM's VLIW-4 design. Furthermore, VLIW designs are traditionally very space efficient, as there's relatively little overhead logic, which is always a boon given the tight constraints of the SoC space.
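To make the utilization argument concrete, here is a toy model. It assumes a simplified 4-wide VLIW ALU in which every instruction issues in its own bundle with no co-issue of other work; this illustrates the general principle only and is not ARM's actual Midgard instruction encoding. The function name `lane_utilization` is hypothetical.

```python
# Toy model of a 4-wide VLIW ALU (a simplification, not ARM's real Midgard ISA).
VLIW_WIDTH = 4


def lane_utilization(op_widths, width=VLIW_WIDTH):
    """Fraction of ALU lanes doing useful work if each op issues in its own bundle.

    op_widths: number of vector components each op touches
               (1 = scalar, 4 = a full RGBA vec4).
    """
    if not op_widths:
        return 0.0
    if any(w < 1 or w > width for w in op_widths):
        raise ValueError("each op must use between 1 and `width` lanes")
    used = sum(op_widths)               # lanes that compute something
    issued = len(op_widths) * width     # lanes that were available
    return used / issued


# Classic color math: every op works on a full RGBA vector.
rgba_shader = [4, 4, 4, 4]
# A newer-style workload: mostly scalar and 2-component ops.
scalar_shader = [1, 2, 1, 1]

print(lane_utilization(rgba_shader))    # 1.0
print(lane_utilization(scalar_shader))  # 0.3125
```

The RGBA workload keeps all four lanes busy, while the scalar-heavy one leaves more than two-thirds of them idle unless something else can be packed alongside it.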

However, returning to our earlier point about SoC GPUs echoing discrete GPUs: as we've seen in the discrete space, VLIW has a limited shelf life. Newer rendering paradigms often work with just 1 or 2 components at once, which leaves lanes empty unless other work can be found to fill them and achieve full GPU utilization. A good shader compiler can help here, but it becomes an escalating technology war over time, as good performance becomes increasingly compiler-dependent, and writing a compiler that can extract the necessary ILP is a challenge in and of itself. What history has shown us, and what is going to happen again in the mobile market, is that rendering workloads will continue to shift away from a style that is suitable for VLIW.
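The compiler's side of the problem can be sketched the same way. The hypothetical `pack_bundles` below greedily packs scalar operations into 4-slot bundles, assuming an operation may only issue once all of its inputs were produced by an earlier bundle. Real shader compilers use far more sophisticated scheduling; the point is only that independent work fills every slot, while a dependency chain leaves three of the four slots idle no matter how clever the compiler is.

```python
from typing import Dict, List

WIDTH = 4  # slots per bundle in this toy VLIW-4 model


def pack_bundles(deps: Dict[str, List[str]], width: int = WIDTH) -> List[List[str]]:
    """Greedily pack scalar ops into VLIW bundles of up to `width` slots.

    `deps` maps each op to the ops whose results it reads. An op is placed only
    once all of its inputs come from earlier bundles, so ops that share a
    bundle never depend on each other.
    """
    done = set()
    remaining = list(deps)  # dicts preserve insertion (program) order
    bundles: List[List[str]] = []
    while remaining:
        bundle: List[str] = []
        for op in remaining:
            if len(bundle) == width:
                break
            if all(d in done for d in deps[op]):
                bundle.append(op)
        if not bundle:
            raise ValueError("cyclic or missing dependency")
        done.update(bundle)
        remaining = [op for op in remaining if op not in bundle]
        bundles.append(bundle)
    return bundles


def slot_utilization(bundles: List[List[str]], width: int = WIDTH) -> float:
    """Fraction of issued slots that hold a real op."""
    if not bundles:
        return 0.0
    return sum(len(b) for b in bundles) / (len(bundles) * width)


# Four independent scalar ops: plenty of ILP, one full bundle.
independent = {"a": [], "b": [], "c": [], "d": []}
# The same four ops as a dependency chain: no ILP to extract.
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}

print(pack_bundles(independent), slot_utilization(pack_bundles(independent)))
# [['a', 'b', 'c', 'd']] 1.0
print(pack_bundles(chain), slot_utilization(pack_bundles(chain)))
# [['a'], ['b'], ['c'], ['d']] 0.25
```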

The Bifrost Quad: Replacing ILP with TLP

57 Comments

  • Ariknowsbest - Monday, May 30, 2016

    Mainly MediaTek and Rockchip that I can remember.
  • Ryan Smith - Monday, May 30, 2016

    Bingo. I haven't forgotten about the IMG crew, but in the Android space (which is really the only competitive space for GPU IP licensing) they've lost most of their market share, especially at the high end.
  • name99 - Tuesday, May 31, 2016

    However, it would be interesting to know how these various features (e.g. primacy of SIMT rather than SIMD, coherent common address space) compare to PowerVR.
  • lucam - Tuesday, May 31, 2016

    At this point I think it is a blessing that IMG has Apple as a big customer; without it they would have completely lost all mobile market share.
  • Ariknowsbest - Tuesday, May 31, 2016

    But it's not good to be dependent on one large customer. Maybe the emergence of VR can help them retake market share.
  • lucam - Tuesday, May 31, 2016

    Totally agree with you. PowerVR is a hell of a solution, but for some reason IMG has lost its leadership in the mobile market and has almost disappeared from Android.
    I wonder what the situation would be now if IMG didn't have Apple. Maybe even worse.
  • zeeBomb - Monday, May 30, 2016

    Stay frosty, my friends.
  • Krysto - Monday, May 30, 2016

    I guess ARM will abandon HSAIL now that SPIR-V and Vulkan are here. It probably makes sense to stop focusing on OpenCL as well, if developers can just use a language other than OpenCL with SPIR-V.
  • mdriftmeyer - Monday, May 30, 2016

    One uses C99+ or C11++ in OpenCL 2.x, and the same goes for SPIR-V. Why would I care to write in SPIR-V unless portability required it? If I want a lower-level, higher-performance result, I'll skip SPIR-V, which bridges with OpenCL via LLVM IR, and go straight to Clang/LLVM and OpenCL.

    Don't confuse SPIR-V with the HSA Foundation. They address different needs, and SPIR-V doesn't address what AMD's APUs are designed to resolve.
  • beginner99 - Tuesday, May 31, 2016

    Yeah, that's a bit of a bummer. For me this pretty much means HSA is DOA. No software company will invest in something HSA-compatible if it is only available on AMD APUs.