Section by Andrei Frumusanu

Performance & Efficiency

In terms of scalability and performance, one G76 core is roughly equal to two G72 cores. This also changes the configuration options that Arm offers: the maximum core count for the largest G76 GPU is now an MP20 configuration.

When going all-out in laying down cores, this means a 25% higher maximum performance point, as an MP20 G76 corresponds to roughly 40 G72-class cores against the previous MP32 ceiling. To date we haven’t seen vendors come near the maximum MP32 configuration for the G71 and G72; the largest Mali implementation was the Exynos 8895 with a G71MP20.
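The arithmetic behind that 25% figure can be sketched as follows. This is a back-of-the-envelope illustration only: it takes the rough "one G76 core equals two G72 cores" equivalence at face value, and the function names are my own, not anything from Arm.

```python
# Rough equivalence quoted in the article: one G76 core ~ two G72 cores.
# This is an approximation, not a measured scaling factor.
G72_CORES_PER_G76_CORE = 2


def g72_equivalent_cores(g76_cores: int) -> int:
    """Express a G76 core count in G72-class cores (hypothetical helper)."""
    return g76_cores * G72_CORES_PER_G76_CORE


def relative_peak(g76_max_cores: int, g72_max_cores: int) -> float:
    """Ratio of the largest G76 configuration to the largest G72 one."""
    return g72_equivalent_cores(g76_max_cores) / g72_max_cores


# G76 tops out at MP20, G71/G72 at MP32.
max_gain = relative_peak(20, 32)
print(f"Maximum configuration: {max_gain:.2f}x the G72 ceiling")
```

The same helper puts the MP12 flagship configuration Arm envisions at about 24 G72-class cores.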

Improving the performance density of the cores by consolidating functional blocks and execution engines into fewer “cores” improves the PPA (power, performance and area) of the GPU dramatically. At iso process and frequency, and at similar area configurations, the G76 is said to improve the fps/mm² metric by 39% in Manhattan 3.0 and, thanks to the improvements in the geometry pipelines, by a significant 65% in Car Chase. The casual gaming benchmark here depicts a simpler fill-rate bound workload such as Angry Birds and Candy Crush.

In terms of power efficiency, the metrics presented here depict the performance improvement at iso process node and frequency, and at iso power, with peak power targeted at 2.3W for the GPU alone. Keep in mind that for 3D workloads there’s significant power overhead from the memory subsystem and DRAM, which is why this figure is lower than the total platform active power figures I’ve usually published in the past.

In general, the improvement we’re looking at in common benchmarks like Manhattan is a 1.3x increase in performance at equal power, area, process and frequency.

How this would look in a late 2018 / early 2019 SoC would be something like the following projection:

GFXBench Manhattan 3.1 Offscreen Power Efficiency (System Active Power)

| Device (SoC) | Mfc. Process | FPS | Avg. Power (W) | Perf/W Efficiency |
|---|---|---|---|---|
| Mali G76MP12 SoC Projection | 7/8nm class | 69.00 | 4.08 | 16.90 fps/W |
| Galaxy S9+ (Snapdragon 845) | 10LPP | 61.16 | 5.01 | 11.99 fps/W |
| Galaxy S9 (Exynos 9810) | 10LPP | 46.04 | 4.08 | 11.28 fps/W |
| Galaxy S8 (Snapdragon 835) | 10LPE | 38.90 | 3.79 | 10.26 fps/W |
| LeEco Le Pro3 (Snapdragon 821) | 14LPP | 33.04 | 4.18 | 7.90 fps/W |
| Galaxy S7 (Snapdragon 820) | 14LPP | 30.98 | 3.98 | 7.78 fps/W |
| Huawei Mate 10 (Kirin 970) | 10FF | 37.66 | 6.33 | 5.94 fps/W |
| Galaxy S8 (Exynos 8895) | 10LPE | 42.49 | 7.35 | 5.78 fps/W |
| Galaxy S7 (Exynos 8890) | 14LPP | 29.41 | 5.95 | 4.94 fps/W |
| Meizu PRO 5 (Exynos 7420) | 14LPE | 14.45 | 3.47 | 4.16 fps/W |
| Nexus 6P (Snapdragon 810 v2.1) | 20Soc | 21.94 | 5.44 | 4.03 fps/W |
| Huawei Mate 8 (Kirin 950) | 16FF+ | 10.37 | 2.75 | 3.77 fps/W |
| Huawei Mate 9 (Kirin 960) | 16FFC | 32.49 | 8.63 | 3.77 fps/W |
| Huawei P9 (Kirin 955) | 16FF+ | 10.59 | 2.98 | 3.55 fps/W |

Arm states that the 1.5x target improvement in performance on a future G76 on 7nm would come from a relative increase in GPU capabilities when scaling from a G72MP18 to a G76MP12. So it seems natural to take the Exynos 9810 as the baseline for the performance projections. Assuming the power target doesn’t change, we’d see a G76MP12 on the upcoming process node outperform the current-generation leader, the Snapdragon 845, by 13% in Manhattan 3.1. Power efficiency at peak performance would also be 47% better.
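For readers who want to retrace the projection row, here is a minimal sketch of how it can be derived from the table figures. It assumes the projection is simply the Exynos 9810 result scaled by Arm's 1.5x target at unchanged power; the variable and function names are illustrative, and the table's own entries (69.00 fps, 16.90 fps/W) are rounded slightly differently from the raw products.

```python
# Measured baseline from the table: Galaxy S9 (Exynos 9810, G72MP18).
EXYNOS_9810 = {"fps": 46.04, "power_w": 4.08}
# Current-generation leader from the table: Galaxy S9+ (Snapdragon 845).
SNAPDRAGON_845_FPS = 61.16
# Arm's stated target uplift for a 7nm G76MP12 over a G72MP18.
TARGET_UPLIFT = 1.5


def project(baseline: dict, uplift: float) -> dict:
    """Scale baseline performance by `uplift`, holding power constant."""
    fps = baseline["fps"] * uplift
    power_w = baseline["power_w"]
    return {"fps": fps, "power_w": power_w, "fps_per_w": fps / power_w}


projection = project(EXYNOS_9810, TARGET_UPLIFT)
lead_over_845 = projection["fps"] / SNAPDRAGON_845_FPS - 1

print(f"Projected FPS:        {projection['fps']:.2f}")
print(f"Projected efficiency: {projection['fps_per_w']:.2f} fps/W")
print(f"Lead over S845:       {lead_over_845:.0%}")
```

Changing `TARGET_UPLIFT` or the power figure shows how sensitive the projected lead is to Arm actually hitting its target in shipping silicon.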

Obviously the competition won’t be standing still. Although Qualcomm had a bit of a misstep in terms of power efficiency with the Adreno 630, it’s possible it will catch up in next year’s iteration, and the process node improvements alone would then be sufficient to retake the lead on the GPU side.

End Remarks

All in all, the Mali G76 provides extremely solid advancements: 30% better performance at the same area and power is a heavy generational improvement. However, while this will greatly improve the competitiveness of Mali GPUs, I don’t think it will be quite sufficient to catch up with the competition.

In terms of the microarchitectural changes, I think Arm made the right choices in consolidating the cores and beefing them up. Currently it seems that the high core count in Mali GPUs is a double-edged sword: while it does provide extremely fine-grained configurability and allows vendors to pick exactly the core count that fits their area budget for the GPU, it also causes inevitable overhead.

The Mali G76 demonstrates the kind of improvement that comes from simply avoiding overhead in control logic. Arm envisions an MP12 configuration for a flagship SoC, and I still think this is rather too many cores. Compared to the 4-core Adreno 540, the 2-core Adreno 630 or even the 3-core Apple A11 GPU, it’s easy to see why Mali lags behind in power efficiency and area. I hope that in the future we’ll see another doubling of the computational resources per core, as that would bring another large improvement and help close the gap to the competition.

For now, I’m looking forward to how the landscape will change with upcoming SoCs and how the G76 will perform in actual silicon.

25 Comments

  • ET - Monday, June 4, 2018 - link

    How 'significantly cheaper' would you expect such a card to be compared to a $70 discrete GPU?

    Based on the expected GFXBench score and further extrapolation, the G76MP20 could perform about the same as the 1030, and it's possible that it could work with slower RAM and save there, but still, I don't see how it could be a really successful or high margin product. There would be need for a complete product line reaching significantly higher performance to make this more than a curiosity.
  • eastcoast_pete - Monday, June 4, 2018 - link

    I would really appreciate if you could provide a link to a vendor's site that lists a 1030 card for $ 70. The cheapest I have seen them was for ~ $ 120. If I can get one for $ 70 - we have a deal, even if it is the even further throttled DDR4 version. $ 70 is about what that card is really worth.

    Unrelated to this: My question arose from a situation I believe a number of us have: a HTPC that's otherwise Ok (in my case, around a Haswell i5), but cannot for the life of it decode 2160p HEVC at 30 fps or faster. If nothing else, a 1030 class card does at least have HDMI 2.0 out. For a new build, I would probably give the Ryzen 2400G a spin.
  • ET - Wednesday, June 6, 2018 - link

    I think I can post again. Spam filter blocked me yesterday from posting anything at all. I'll try the part without dollar signs first.

    If you just want video, why would you need a GeForce 1030 level GPU? Video is a different ARM IP anyway, not part of the G76.

    I do see a small market for a very low power USB GPU that's simply a mobile CPU with some low power RAM. All that basically needs is drivers, and preferably BIOS support. That would allow for example creating Ryzen based PCs without having to stick a GPU in the case, and would work for people like you with old hardware who want support for newer standards, including for laptop owners who want video out and for whom a GPU upgrade is impractical.
  • ET - Wednesday, June 6, 2018 - link

    Okay, now for the tricky part.

    I indeed see that the 1030 has gone up in price. I can find it for $ 90 at Amazon and Newegg, so it's not as bad as you say, and there's a DDR4 version for $ 77, which may be okay if what you're looking for is video playback and not 3D performance. However, I don't think a G76 part would solve the GPU market prices problem. If it's good enough, its price will go up like the rest of them. If it's not, its market share will be rather small. I think (as I posted in the other part) that a low power USB card would have a larger market. It would be a more convenient add-on, which could be applied to more configurations.
  • darkich - Friday, June 1, 2018 - link

    16.9fps/W vs 11.9fps/W (Snapdragon 845), and you "don't think it will catch up with the competition".
  • vladx - Friday, June 1, 2018 - link

    Indeed the author/s seem quite biased.
  • Andrei Frumusanu - Saturday, June 2, 2018 - link

    There's a process node difference between that comparison. An eventual Snapdragon 855 will surpass it.
  • vladx - Saturday, June 2, 2018 - link

    Jumping to such conclusions doesn't sit well with being an impartial party.
  • jospoortvliet - Monday, June 4, 2018 - link

    Oh come on you think they should assume the next snapdragon is not improved to be seen as impartial?

    They point out that the projection is that this MALI will be 15% faster than the current snapdragon. But it comes out next year and this will have to compete with the next snapdragon, not the 845. Totally sane to point out that given their history it seems a stretch to same that Qualcomm will only improve their new high end SOC by 15% or less...
  • jospoortvliet - Monday, June 4, 2018 - link

    Same -> assume
