Today Arm announces a new Mali Multimedia Suite of products consisting of GPU, display processor and video processor IPs targeting the mainstream and low-end. The announcement comes at a time when the industry’s biggest growth comes from China, where “premium” experience devices hold >30% of global smartphone market share. Indeed, we’ve seen the same narrative from MediaTek, where it was a core factor in the strategy shift and re-focus on the P-series.

The coverage starts with the announcement of the Mali-G52 mid-range GPU IP, which follows the G51 announced in October 2016. The G51 was a rather odd GPU design in the sense that we haven’t seen any consumer SoCs adopt it, as vendors have seemingly preferred to use low-core-count G71 and G72 configurations. Arm states that the DTV market is also a large-volume market where mainstream GPUs are in demand; however, we have less visibility into the silicon of those markets.

The G52 promises great gains as Arm posts up to a 30% improvement in performance density, meaning fps/mm². The efficiency improvements are more conservative with a posted 15% improvement over the G51.

What surprised me is that Arm has divulged that one of the core changes of the G52 over the G51 is also a characteristic that I'm expecting to find in Arm’s next generation high-end GPUs. The big change is the doubling of the ALU lanes within an execution engine of a core. As a refresher, a single ALU lane in the Bifrost architecture (G71, G72, G51) included an FMA and an ADD/SF unit. An execution engine comprised four of these lanes, making up a wavefront which in Arm’s terminology was called a Quad. The G52 is the first to double the lanes from four to eight, effectively doubling the ALU throughput within an execution engine.

Arm says that this is where most of the gains in performance and density come from, as the doubling of the ALU lanes only increases the core area by ~1.22x. The claimed 3.6x performance increase in machine learning workloads is attributed to the fact that the new ALUs can now handle 8-bit dot product operations.
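To put the area figure into perspective, here is a back-of-the-envelope sketch of what doubling the lanes at ~1.22x the core area means for peak ALU throughput per unit of area. The function name is made up for illustration, and the result is a theoretical ALU-only ceiling under the assumption that throughput scales exactly 2x; it is not Arm's 30% performance density figure, which covers the whole core in real workloads.

```python
def alu_density_gain(throughput_scale: float, area_scale: float) -> float:
    """Relative change in peak ALU throughput per unit of core area."""
    if area_scale <= 0:
        raise ValueError("area_scale must be positive")
    return throughput_scale / area_scale


# Figures from the article: 2x the ALU lanes for ~1.22x the core area.
gain = alu_density_gain(throughput_scale=2.0, area_scale=1.22)
print(f"Peak ALU throughput per unit area: {gain:.2f}x")
```

The gap between this ceiling and the quoted 30% is expected, since texture units, pixel back-ends and memory bandwidth do not double along with the ALUs.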

The G52 continues to use the G51’s “dual-pixel” texture units which are able to process 2 pixels and 2 texels per cycle. A confusing matter for some G51 configurations was that the GPU could consist of cores with either “uni-pixel” or “dual-pixel” setups, so a configuration such as the “G51MP3” paired a uni-pixel core with a dual-pixel core. There’s even more confusion when we realise that in the past Arm’s MP designation for GPUs meant multi-pixel and actually counted the pixel throughput of a GPU. The G52 now fixes this confusion, and future MP designations will refer to multi-processor configurations, so a G52MP4 will mean there are 4 GPU cores whereas a G51MP4 officially described a two-core configuration.

Arm Mali G52 vs G51

                                Mali-G52                           Mali-G51
Core Configurations             1-4                                1-3
ALU Lanes Per Core (Default)    16 (2 EU), 24 (3 EU)               12
Texture Units Per Core          2                                  1-2
Pixel Units Per Core            2                                  1-2
FLOPS:Pixel Ratio               16:1 (2 EU), 24:1 (3 EU)           12:1 (Dual-pixel), 24:1 (Uni-pixel)
APIs                            OpenGLES 3.2, OpenCL 2.0, Vulkan   OpenGLES 3.2, OpenCL 2.0, Vulkan

To give customers more choice between compute and fill-rate focused configurations, Arm allows the G52 to be used with core setups containing either two or three execution engines. This means the FLOPS/core will come in at either 32 or 48 when counting only the FMAs, or 48 or 72 if you also count the additional ADD/SF unit. The FLOPS:pixel ratio naturally changes as well, as that is the point of the configuration flexibility: customers can choose a 16:1 or a 24:1 ratio. This is a lot more compute-heavy than the G51’s 12:1 ratio and is now in line with the higher-end GPUs.
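The arithmetic behind those numbers can be sketched in a few lines. This is only an illustration: the `CoreConfig` class and its field names are made up, an FMA is counted as two floating-point operations and the ADD/SF unit as one, and the three execution engines for the G51 are an assumption inferred from its 12 lanes per core at four lanes per engine.

```python
from dataclasses import dataclass

FLOPS_PER_FMA = 2     # one fused multiply-add counts as two floating-point ops
FLOPS_PER_ADD_SF = 1  # assumption: the ADD/SF unit adds one op per lane per cycle


@dataclass(frozen=True)
class CoreConfig:
    """Hypothetical description of a single GPU core configuration."""
    name: str
    execution_engines: int
    lanes_per_engine: int
    pixels_per_clock: int

    @property
    def lanes(self) -> int:
        return self.execution_engines * self.lanes_per_engine

    @property
    def fma_flops(self) -> int:
        # FLOPS per clock when counting only the FMA units
        return self.lanes * FLOPS_PER_FMA

    @property
    def total_flops(self) -> int:
        # FLOPS per clock when also counting the ADD/SF units
        return self.lanes * (FLOPS_PER_FMA + FLOPS_PER_ADD_SF)

    @property
    def flops_per_pixel(self) -> float:
        # The FLOPS:pixel ratio uses the FMA-only figure
        return self.fma_flops / self.pixels_per_clock


configs = [
    CoreConfig("G52 (2 EE)", 2, 8, 2),
    CoreConfig("G52 (3 EE)", 3, 8, 2),
    CoreConfig("G51 dual-pixel", 3, 4, 2),  # 3 EEs inferred from 12 lanes per core
    CoreConfig("G51 uni-pixel", 3, 4, 1),
]

for c in configs:
    print(f"{c.name:16} lanes={c.lanes:2} FMA FLOPS={c.fma_flops:2} "
          f"FMA+ADD FLOPS={c.total_flops:2} FLOPS:pixel={c.flops_per_pixel:.0f}:1")
```

Under these assumptions the two G52 setups work out to 32 and 48 FMA FLOPS per core (48 and 72 with the ADD/SF unit) and ratios of 16:1 and 24:1, matching the figures in the table.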

The Mali-400 is Arm’s most successful GPU, and one could probably say it’s the most successful GPU ever from any vendor, as the IP is now nearing its 10-year anniversary and is still shipping in new products today. Although it has received generational updates over the years, it’s only now that we finally see the need for a new ultra-low-end GPU, as operating systems and workloads make OpenGLES >3.0 and Vulkan a hard requirement, something that the good old Mali-400 can’t do.

The new Mali-G31 is meant to finally replace the Mali-400 in super low-end designs. The G31 is not related to the G52 in architecture as it still employs the traditional quad layout (4 ALU lanes). While the G52 helped clear the confusion in configuration, the G31 remains confusing as it comes with either one execution engine (4 lanes) and a 1 pixel per clock texture unit, or two execution engines (2x4 lanes) and a 2 pixel per clock TMU. In a single-core configuration the G31 promises up to a 20% area reduction over the G51MP2 and up to 12% better UI performance, a metric likely tied to the fillrate efficiency of the core.

Wrapping up today’s announcement are updates to the display and video processors.

The Mali-V52 is a follow-up to the V61, which was also announced back at the end of 2016 along with the G51. The V52 scales down the V61 and targets the mid-range with more limited capabilities, offering up to 4K60 encoding and decoding (as opposed to 4K120 for the V61). The improvements allowed a 2x decode performance increase which in turn enabled a 38% smaller silicon area, which is a significant figure. Arm also says that for HEVC encoding the new architecture has improved its heuristics and achieves up to 20% better quality when handling the variable block sizes of the codec.

Finally the Mali-D51 is a follow-up to the DP650 and is derived from the higher-end Mali-D71 whose architecture was disclosed under the Mali-Cetus codename here at AnandTech. The new IP allows for a 2x increase in area efficiency and supports up to 8 composition layers much like the D71. Arm’s display processors are quite unique as they allow for offloading UI rendering completely to the display processor from the GPU and in doing this achieve very good power efficiency compared to GPU-only approaches. 

4 Comments

  • ET - Tuesday, March 06, 2018

    Here's to hoping that the G31 indeed replaces the 400/450.
  • PeachNCream - Tuesday, March 06, 2018

    "...able to process 2 pixels and 2 texels per cycle."

    I just got a Riva TNT's "TwiN Texel Engine" flashback reading that.
  • serendip - Tuesday, March 06, 2018

    "Arm’s display processors are quite unique as they allow for offloading UI rendering completely to the display processor from the GPU and in doing this achieve very good power efficiency compared to GPU-only approaches."

    Please elaborate on this, I have always thought that Android rendering pipelines used the GPU for everything. I assume the display processor handles 2D layers?
  • karthik.hegde - Tuesday, March 06, 2018

    Generally the display processors' task is to fetch the frames from the frame-buffer somewhere in the memory and convert to the right protocol (VGA etc) and push it to the display only. Some might include more supports like cropping, re-scaling and other CV operations. However, in Arm's case it allows you to overlay many rendered frames to form a final frame that the UI has - which they call as frame composition. I think this is what is being referred by the author here.
