Section by Andrei Frumusanu

Performance & Efficiency

In terms of scalability and performance, what we can generally say is that one G76 core is roughly equal to two G72 cores. This also changes the configuration options Arm offers: the largest G76 configuration now tops out at MP20.

Going all-out on core count, this means a 25% higher maximum performance point. To date, vendors haven't come near the G71 and G72's maximum configuration of MP32; the largest Mali implementation so far was the Exynos 8895's G71MP20.
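The "one G76 core ≈ two G72 cores" rule and the 25% figure can be cross-checked by counting FP32 lanes. This is a back-of-the-envelope sketch, not Arm's methodology. It assumes three execution engines per core in both designs, 8 lanes per engine on the G76 (consistent with the 8*3*20 figure quoted in the comments) and 4 lanes per engine on the G72, and it treats lane count as a proxy for throughput at equal clocks, which real silicon won't exactly deliver.

```python
# Back-of-the-envelope: compare maximum G72 and G76 configurations by FP32 lanes.
# Assumptions (not from Arm documentation quoted here): 3 execution engines per
# core in both GPUs, 8 lanes per engine on G76, 4 lanes per engine on G72, and
# equal clocks / per-lane throughput.

ENGINES_PER_CORE = 3
LANES_PER_ENGINE = {"G72": 4, "G76": 8}
MAX_CORES = {"G72": 32, "G76": 20}  # MP32 and MP20 maximum configurations


def lanes_per_core(gpu: str) -> int:
    """FP32 lanes in a single shader core."""
    return ENGINES_PER_CORE * LANES_PER_ENGINE[gpu]


def max_lanes(gpu: str) -> int:
    """FP32 lanes in the largest configuration Arm offers."""
    return lanes_per_core(gpu) * MAX_CORES[gpu]


if __name__ == "__main__":
    for gpu in ("G72", "G76"):
        print(f"{gpu}MP{MAX_CORES[gpu]}: {max_lanes(gpu)} lanes")
    # One G76 core carries as many lanes as this many G72 cores.
    print(f"1 G76 core = {lanes_per_core('G76') // lanes_per_core('G72')} G72 cores")
    uplift = max_lanes("G76") / max_lanes("G72") - 1
    print(f"Max-config uplift: {uplift:.0%}")
```

Under these assumptions a G76MP20 is equivalent to a hypothetical G72MP40, and 40/32 gives the 25% higher ceiling.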

Improving the performance density of the cores by consolidating functional blocks and execution engines into fewer “cores” improves the GPU's PPA (power, performance, area) dramatically. At iso-process and iso-frequency, with similar area configurations, the G76 is said to improve the fps/mm² metric by 39% in Manhattan 3.0 and, thanks to the improvements in the geometry pipelines, by a significant 65% in Car Chase. The casual gaming benchmark here represents a simpler, fill-rate-bound workload such as Angry Birds or Candy Crush.

In terms of power efficiency, the metrics presented here show the performance improvement at iso-process node and iso-frequency, at iso-power, with peak power set at a target of 2.3W for the GPU alone. Keep in mind that 3D workloads carry significant power overhead from the memory subsystem and DRAM, which is why this figure is lower than the total platform active power figures I've usually published in the past.

In general, the improvement we're looking at in common benchmarks like Manhattan is a 1.3x increase in performance at equal power, area, process and frequency.

How this would look in a late 2018 / early 2019 SoC would be something like the following projection:

GFXBench Manhattan 3.1 Offscreen Power Efficiency
(System Active Power)

Device (SoC) | Mfc. Process | FPS | Avg. Power (W) | Perf/W Efficiency (fps/W)
Mali G76MP12 SoC Projection | 7/8nm class | 69.00 | 4.08 | 16.90
Galaxy S9+ (Snapdragon 845) | 10LPP | 61.16 | 5.01 | 11.99
Galaxy S9 (Exynos 9810) | 10LPP | 46.04 | 4.08 | 11.28
Galaxy S8 (Snapdragon 835) | 10LPE | 38.90 | 3.79 | 10.26
LeEco Le Pro3 (Snapdragon 821) | 14LPP | 33.04 | 4.18 | 7.90
Galaxy S7 (Snapdragon 820) | 14LPP | 30.98 | 3.98 | 7.78
Huawei Mate 10 (Kirin 970) | 10FF | 37.66 | 6.33 | 5.94
Galaxy S8 (Exynos 8895) | 10LPE | 42.49 | 7.35 | 5.78
Galaxy S7 (Exynos 8890) | 14LPP | 29.41 | 5.95 | 4.94
Meizu PRO 5 (Exynos 7420) | 14LPE | 14.45 | 3.47 | 4.16
Nexus 6P (Snapdragon 810 v2.1) | 20SoC | 21.94 | 5.44 | 4.03
Huawei Mate 8 (Kirin 950) | 16FF+ | 10.37 | 2.75 | 3.77
Huawei Mate 9 (Kirin 960) | 16FFC | 32.49 | 8.63 | 3.77
Huawei P9 (Kirin 955) | 16FF+ | 10.59 | 2.98 | 3.55
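The Perf/W column is simply FPS divided by average power, i.e. frames rendered per joule. As a sanity check on the table, here is a minimal sketch that recomputes efficiency from the FPS and power columns and flags any row whose listed value deviates by more than 1%; the 1% tolerance and the helper names are illustrative choices, not part of the original data.

```python
# Rows transcribed from the table above:
# (device, FPS, average system active power in W, listed fps/W).
ROWS = [
    ("Mali G76MP12 SoC Projection", 69.00, 4.08, 16.90),
    ("Galaxy S9+ (Snapdragon 845)", 61.16, 5.01, 11.99),
    ("Galaxy S9 (Exynos 9810)", 46.04, 4.08, 11.28),
    ("Galaxy S8 (Snapdragon 835)", 38.90, 3.79, 10.26),
    ("LeEco Le Pro3 (Snapdragon 821)", 33.04, 4.18, 7.90),
    ("Galaxy S7 (Snapdragon 820)", 30.98, 3.98, 7.78),
    ("Huawei Mate 10 (Kirin 970)", 37.66, 6.33, 5.94),
    ("Galaxy S8 (Exynos 8895)", 42.49, 7.35, 5.78),
    ("Galaxy S7 (Exynos 8890)", 29.41, 5.95, 4.94),
    ("Meizu PRO 5 (Exynos 7420)", 14.45, 3.47, 4.16),
    ("Nexus 6P (Snapdragon 810 v2.1)", 21.94, 5.44, 4.03),
    ("Huawei Mate 8 (Kirin 950)", 10.37, 2.75, 3.77),
    ("Huawei Mate 9 (Kirin 960)", 32.49, 8.63, 3.77),
    ("Huawei P9 (Kirin 955)", 10.59, 2.98, 3.55),
]


def fps_per_watt(fps: float, watts: float) -> float:
    """Efficiency as frames per joule: FPS divided by average power."""
    return fps / watts


def mismatches(rows, rel_tol: float = 0.01):
    """Return (name, recomputed fps/W, listed fps/W) for rows off by more than rel_tol."""
    flagged = []
    for name, fps, watts, listed in rows:
        computed = fps_per_watt(fps, watts)
        if abs(computed - listed) / listed > rel_tol:
            flagged.append((name, round(computed, 2), listed))
    return flagged


if __name__ == "__main__":
    # Rank devices by recomputed efficiency, best first.
    ranked = sorted(ROWS, key=lambda r: fps_per_watt(r[1], r[2]), reverse=True)
    for name, fps, watts, _ in ranked:
        print(f"{name:32s} {fps_per_watt(fps, watts):6.2f} fps/W")
    print("Inconsistent rows:", mismatches(ROWS))
```

With the figures as transcribed, only the Galaxy S9+ row is flagged: 61.16 / 5.01 works out to about 12.21 fps/W rather than the listed 11.99, so one of those two numbers may be off; the arithmetic alone can't tell which.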

Arm states that the 1.5x target performance improvement for a future G76 on 7nm would come from scaling the GPU from a G72MP18 to a G76MP12, so it seems natural to take the Exynos 9810 as the baseline for the performance projection. Assuming the power target doesn't change, a G76MP12 on the upcoming process node would outperform the current-generation leader, the Snapdragon 845, by 13% in Manhattan 3.1. Power efficiency at peak performance would also be roughly 41% better (16.90 vs 11.99 fps/W).
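The projection is simple arithmetic on two rows of the table. A minimal sketch, assuming (as the text does) that average power stays at the Exynos 9810's 4.08 W and that Arm's 1.5x target applies directly to the Manhattan 3.1 score; the function and constant names are illustrative.

```python
# Hypothetical projection: apply Arm's 1.5x G72MP18 -> G76MP12 target to the
# Exynos 9810 result, holding average power constant.
BASELINE_FPS = 46.04      # Galaxy S9 (Exynos 9810, G72MP18), Manhattan 3.1 offscreen
BASELINE_WATTS = 4.08     # average system active power for the same run
TARGET_UPLIFT = 1.5       # Arm's stated performance target
SD845_FPS = 61.16         # Galaxy S9+ (Snapdragon 845)
SD845_FPS_PER_W = 11.99   # listed efficiency for the Snapdragon 845


def project(fps: float, watts: float, uplift: float) -> tuple[float, float]:
    """Scale performance by `uplift` at constant power; return (fps, fps/W)."""
    projected_fps = fps * uplift
    return projected_fps, projected_fps / watts


if __name__ == "__main__":
    fps, eff = project(BASELINE_FPS, BASELINE_WATTS, TARGET_UPLIFT)
    print(f"Projected G76MP12: {fps:.2f} fps, {eff:.2f} fps/W")
    print(f"vs Snapdragon 845 performance: {fps / SD845_FPS - 1:+.0%}")
    print(f"vs Snapdragon 845 efficiency:  {eff / SD845_FPS_PER_W - 1:+.0%}")
```

This gives about 69.06 fps, close to the table's 69.00, which is roughly 13% ahead of the Snapdragon 845 and about 41% better in fps/W against the 845's listed 11.99. Relative to the Exynos 9810 itself, efficiency improves by exactly the 1.5x uplift, since power is held constant.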

Obviously the competition won't be standing still. Although Qualcomm misstepped somewhat on power efficiency with the Adreno 630, it may well catch up in next year's iteration, and the process node improvements alone could then be sufficient for it to retake the lead on the GPU side.

End Remarks

All in all, the Mali G76 provides extremely solid advancements: 30% better performance at the same area and power is a hefty generational improvement. However, while this will greatly improve the competitiveness of Mali GPUs, I don't think it will be quite enough to catch up with the competition.

In terms of the microarchitectural changes, I think Arm made the right choice in consolidating the cores and beefing them up. The high core count of Mali GPUs currently seems to be a double-edged sword: while it provides extremely fine-grained configurability and lets vendors pick exactly the core count that fits their GPU area budget, it also brings inevitable overhead.

The Mali G76 shows the kind of improvement that comes simply from cutting per-core control-logic overhead. Arm envisions an MP12 configuration for a flagship SoC, and I still think this is too many cores. Compared to the 4-core Adreno 540, the 2-core Adreno 630 or even the 3-core Apple A11 GPU, it's easy to see why Mali lags behind in power efficiency and area. I hope that in the future we'll see another doubling of the computational resources per core, as that would bring another large improvement and help close the gap to the competition.

For now, I’m looking forward to how the landscape will change with upcoming SoCs and how the G76 will perform in actual silicon.


25 Comments


  • eastcoast_pete - Friday, June 01, 2018

    I expect some headwind for this, but bear with me. Firstly, great that ARM keeps pushing forward on the graphics front; this does sound promising. Here's my crazy (?) question: would a MALI G76-based graphics card for a PC (laptop or desktop) be a. feasible and b. better/faster than Intel's embedded graphics? Like many users, I have gotten frustrated with the crypto-craze-induced price explosion for NVIDIA and AMD dedicated graphics, and Intel seems to have thrown in the towel on making anything close to those when it comes to graphics. So, if one can run WIN 10 on ARM chips, would a graphics setup with, let's say, 20 Mali G76 cores be worthwhile to have? How would it compare to lower-end dedicated graphics from the current duopoly? Is any company out there ambitious and daring enough to try?
  • eastcoast_pete - Friday, June 01, 2018

    Just to clarify: I mean a dedicated multicore (20, please) PCIe-connected MALI graphics card in a PC with a grown-up Intel or AMD Ryzen CPU - hence "crazy", but maybe not. I know there will be some sort of ARM-CPU based WIN 10 laptops, but those target the market currently served by Celeron etc.
  • Alurian - Friday, June 01, 2018

    Arguably MALI might one day be powerful enough to do interesting things with, should ARM choose to take that direction. But comparing MALI to the dedicated graphics units that AMD and NVIDIA have been working on for decades... certainly not in the short term. If it were that easy, Intel would have popped out a competitor chip by now.
  • Valantar - Friday, June 01, 2018

    I'd say it depends on your use case. For desktop usage and multimedia, it'd probably be decent, although it would use significantly more power than any iGPU simply due to being a PCIe device.

    On the other hand, for 3D and gaming, drivers are key (and far more complex), and ARM would here have to play catch-up with a decade or more of driver development from their competitors. It would not go well.
  • duploxxx - Friday, June 01, 2018

    "Like many users, I have gotten frustrated with... and Intel seems to have thrown in the towel...."

    How does that sound, do you think?

    Easy solution: buy a Ryzen APU. More than enough CPU power to run Win10 and a decent GPU, and if you think the Intel CPUs are better then ******
  • eastcoast_pete - Friday, June 01, 2018

    Have you tried to buy a graphics card recently? I actually like the Ryzen chips, but once you add the (required) dedicated graphics card, it gets expensive fast. There is a place for a good, cheap graphics solution that still beats Intel's embedded graphics but doesn't break the bank. Think HTPC setups. My comment on Intel having thrown in the towel refers to them using AMD dedicated graphics fused to their CPUs in recent months; they have clearly abandoned the idea of increasing the performance of their own designs (Iris, much?), and that is bad for the competitive situation in the lower-end graphics space.
  • jimjamjamie - Friday, June 01, 2018

    Ryzen 3 2200G
    Ryzen 5 2400G

    Thank me later.
  • dromoxen - Sunday, June 03, 2018

    2200GE
    2400GE
    Intel has far from given up on graphics; in fact, they are plunging into it with both feet. They are demonstrating their own discrete card and hope for availability sometime in 2019. The AMD-powered Hades is just a stopgap, maybe even a little technology demonstrator if you will. The promise of APU-accelerated processing is finally arriving, most especially for AI apps.
  • Ryan Smith - Friday, June 01, 2018

    Truthfully I don't have a good answer for you. A Mali-G76MP20 works out to 480 ALUs (8*3*20), which isn't a lot by PC standards. However ALUs alone aren't everything, as we can clearly see comparing NVIDIA to AMD, AMD to Intel, etc.

    At a high level, high core count Malis are meant to offer laptop-class performance. So I'd expect them to be performance competitive with an Intel GT2 configuration, if not ahead of them in some cases. (Note that Mali is only for iGPUs as part of an SoC; it's lacking a bunch of important bits necessary to be used discretely)

    At least if Arm gets their way, then perhaps one day we'll get to see this. With Windows-on-ARM, there's no reason you couldn't eventually build an A76+G76 SoC for a Windows machine.
  • eastcoast_pete - Friday, June 01, 2018

    Thanks Ryan! I wouldn't expect MALI graphics in a PC to challenge the high end of dedicated graphics, but if they came close to an NVIDIA 1030 card while being significantly cheaper, I would be game to try. That being said, I realize that going from an SoC to an actual stand-alone part would require some heavy lifting. But then, there is an untapped market waiting to be served. Lastly, this must have occurred to the people at ARM graphics (the MALI team), and I wonder if any of them has ever speculated on how their newest and hottest would stack up against GT2, or entry-level NVIDIA and AMD solutions. Any off-the-record remarks?
