Arm Announces Mali-G76 GPU: Scaling up Bifrost
by Ryan Smith & Andrei Frumusanu on May 31, 2018 3:00 PM EST
Section by Andrei Frumusanu
Performance & Efficiency
In terms of scalability and performance, what we can generally say is that one G76 core is roughly equal to two G72 cores. This also changes the configuration options that Arm offers: the largest GPU now tops out at an MP20 configuration.
When going all-out in laying down cores, this means we have a 25% higher maximum performance point. To date we haven’t seen vendors come near the maximum configuration option of MP32 for the G71 and G72; the largest Mali implementation so far was the Exynos 8895 with a G71MP20.
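The 25% figure follows directly from the rule of thumb above. A minimal sketch of the arithmetic, assuming the "one G76 core is roughly two G72 cores" equivalence holds across the whole configuration range (the function and constant names are purely illustrative):

```python
# Assumption taken from the text: one G76 core ~ two G72 cores.
G72_CORES_PER_G76_CORE = 2


def g72_equivalent(g76_cores: int) -> int:
    """Express a G76 core count in G72-equivalent cores."""
    return g76_cores * G72_CORES_PER_G76_CORE


max_g76_cores = 20  # largest G76 configuration (MP20)
max_g72_cores = 32  # largest G71/G72 configuration (MP32)

# Relative gain of the largest G76 over the largest G72 configuration.
uplift = g72_equivalent(max_g76_cores) / max_g72_cores - 1
print(f"G76MP20 ~ G72MP{g72_equivalent(max_g76_cores)}, uplift {uplift:.0%}")
```

This only compares the theoretical maximum configurations; it says nothing about what vendors actually ship.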
Improving the performance density of the cores by consolidating functional blocks and execution engines in fewer “cores” improves the PPA of the GPU dramatically. The G76 at iso process and frequency, at similar area configurations, is said to improve the fps/mm² metric by 39% in Manhattan 3.0 and thanks to the improvements in the geometry pipelines, a significant 65% in Car Chase. The casual gaming benchmark here depicts a simpler fill-rate bound workload such as Angry Birds and Candy Crush.
In terms of power efficiency, the metrics presented here depict the performance improvement at iso process node and frequency, and at iso power, with peak power targeted at 2.3W for the GPU alone. Keep in mind that 3D workloads carry significant power overhead from the memory subsystem and DRAM, which is why this figure is lower than the total platform active power figures I’ve usually published in the past.
In general, the improvement we’re looking at in common benchmarks like Manhattan is a 1.3x increase in performance at equal power, area, process and frequency.
How this would look in a late 2018 / early 2019 SoC would be something like the following projection:
GFXBench Manhattan 3.1 Offscreen Power Efficiency (System Active Power)
|Device (SoC)||Mfc. Process||FPS||Avg. Power (W)||Perf/W Efficiency|
|Mali G76MP12 SoC Projection||7/8nm class||69.00||4.08||16.90 fps/W|
|Galaxy S9+ (Snapdragon 845)||10LPP||61.16||5.01||11.99 fps/W|
|Galaxy S9 (Exynos 9810)||10LPP||46.04||4.08||11.28 fps/W|
|Galaxy S8 (Snapdragon 835)||10LPE||38.90||3.79||10.26 fps/W|
|LeEco Le Pro3 (Snapdragon 821)||14LPP||33.04||4.18||7.90 fps/W|
|Galaxy S7 (Snapdragon 820)||14LPP||30.98||3.98||7.78 fps/W|
|Huawei Mate 10 (Kirin 970)||10FF||37.66||6.33||5.94 fps/W|
|Galaxy S8 (Exynos 8895)||10LPE||42.49||7.35||5.78 fps/W|
|Galaxy S7 (Exynos 8890)||14LPP||29.41||5.95||4.94 fps/W|
|Meizu PRO 5 (Exynos 7420)||14LPE||14.45||3.47||4.16 fps/W|
|Nexus 6P (Snapdragon 810 v2.1)||20SoC||21.94||5.44||4.03 fps/W|
|Huawei Mate 8 (Kirin 950)||16FF+||10.37||2.75||3.77 fps/W|
|Huawei Mate 9 (Kirin 960)||16FFC||32.49||8.63||3.77 fps/W|
|Huawei P9 (Kirin 955)||16FF+||10.59||2.98||3.55 fps/W|
Arm states that the 1.5x target improvement in performance on a future G76 in 7nm would happen thanks to a relative increase of the GPU capabilities, scaling from a G72MP18 to a G76MP12. So it seems natural to take the Exynos 9810 as a baseline for the performance projections. Assuming the power target wouldn’t change, we’d see a G76MP12 in the upcoming process node outperforming the current generation leader, the Snapdragon 845, by 13% in Manhattan 3.1. Power efficiency at peak performance would also be 47% better.
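The projection row in the table can be roughly reproduced from the Exynos 9810 baseline. The following is a sketch under the stated assumptions only (1.5x performance, unchanged system power); the variable names are illustrative, and the results land close to, but not exactly on, the rounded 69.00 fps and 16.90 fps/W shown in the table:

```python
# Measured values from the table above.
exynos_9810_fps = 46.04      # Galaxy S9 (Exynos 9810), Manhattan 3.1 offscreen
exynos_9810_power_w = 4.08   # average system active power
snapdragon_845_fps = 61.16   # Galaxy S9+ (Snapdragon 845)

# Assumptions from the text: 1.5x performance target, same power envelope.
target_speedup = 1.5

projected_fps = exynos_9810_fps * target_speedup
projected_fps_per_w = projected_fps / exynos_9810_power_w
lead_over_s845 = projected_fps / snapdragon_845_fps - 1

print(f"Projected: {projected_fps:.2f} fps, {projected_fps_per_w:.2f} fps/W")
print(f"Lead over Snapdragon 845: {lead_over_s845:.0%}")
```

Holding power constant is the key assumption here; a real SoC on a new process node may well land at a different power point.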
All in all, the Mali G76 provides extremely solid advancements: 30% better performance at the same area and power is a heavy generational improvement. However, while this will greatly improve the competitiveness of Mali GPUs, I don’t think it will be quite sufficient to catch up with the competition.
In terms of the microarchitectural changes, I think Arm made the right choices in consolidating the cores and beefing them up. Currently it seems that the high core count in Mali GPUs is a double-edged sword: while it does provide extremely fine-grained configurability and allows vendors to pick exactly the core count that fits their area budget for the GPU, it also causes inevitable overhead.
The Mali G76 demonstrates the kind of improvement that comes from simply avoiding overhead in control logic. Arm envisions an MP12 configuration for a flagship SoC, and I still think this is rather too many cores. Compared to the 4-core Adreno 540, 2-core Adreno 630 or even the 3-core Apple A11 GPU, it’s easy to see why Mali lags behind in power efficiency and area. I hope that in the future we’ll see another doubling of the computational resources per core, as that would bring another large improvement and help close the gap to the competition.
For now, I’m looking forward to how the landscape will change with upcoming SoCs and how the G76 will perform in actual silicon.