Performance Targets: 30% Better PPC and Efficiency

On paper, the new Valhall architecture and the new Mali-G77 certainly seem like big changes, but what matters more is what the performance, efficiency, and area claims actually are.

Arm’s performance claims are interesting because they’re being published on a performance-per-mm² basis. Because vendors can vary both core count and frequency when implementing their GPUs, it’s hard to give a single clear figure that describes the improvement between two discrete GPU configurations. In the case of the G77, Arm claims that the new IP provides 1.2x to 1.4x the performance/mm² of the G76. In absolute terms, a G77 shader core is said to be about the same size as a G76 core.

What this means is that the gain can be translated directly into either a smaller GPU for the vendor, or more space for additional GPU cores and consequently higher performance. In particular, Arm claims the G77 does very well in texture-heavy games, so it will be interesting to see how different workloads behave once devices actually come out.
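
As a rough illustration of that trade-off, the sketch below converts Arm's claimed 1.2x to 1.4x performance/mm² range into the two extremes a vendor could pick: the same performance in less area, or more performance in the same area. The function names and the normalized G76 baseline of 1.0 are assumptions made for illustration, and real designs will not scale this cleanly with core count.

```python
def area_for_same_performance(perf_per_mm2_gain: float) -> float:
    """Relative GPU area needed to match baseline performance (baseline = 1.0)."""
    if perf_per_mm2_gain <= 0:
        raise ValueError("gain must be positive")
    return 1.0 / perf_per_mm2_gain


def performance_for_same_area(perf_per_mm2_gain: float) -> float:
    """Relative performance when the baseline GPU area is kept (baseline = 1.0)."""
    if perf_per_mm2_gain <= 0:
        raise ValueError("gain must be positive")
    return perf_per_mm2_gain


# Arm's claimed range for the G77 versus the G76 (idealized, linear scaling assumed).
for gain in (1.2, 1.4):
    area = area_for_same_performance(gain)
    perf = performance_for_same_area(gain)
    print(f"{gain:.1f}x perf/mm2: {1 - area:.0%} less area at equal performance, "
          f"or {perf - 1:.0%} more performance at equal area")
```

Under these idealized assumptions, the claimed range works out to roughly 17% to 29% less area for the same performance.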

Another way to increase performance is to clock the GPU higher. Here the fundamental limit is the 4-5W thermal envelope of smartphones. In a comparison at iso-process and iso-performance, the new G77 is said to use between 17% and 29% less energy and power to complete the same workloads. In other words, performance/W is 1.2x to 1.39x better. Arm states that frequency fundamentally shouldn’t change much between the G76 and G77, and internally Arm still targets an 850MHz sign-off.
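
The relationship between the energy figures and the performance/W figures can be sketched as follows. This is only a back-of-the-envelope conversion, assuming that performance per watt equals work done per joule, so that using a fraction (1 - s) of the energy for the same workload implies a gain of 1 / (1 - s). The function name is hypothetical.

```python
def perf_per_watt_gain(energy_saving: float) -> float:
    """Performance/W multiplier implied by a fractional energy saving
    on a fixed workload (e.g. 0.17 for 17% less energy)."""
    if not 0.0 <= energy_saving < 1.0:
        raise ValueError("energy_saving must be in [0, 1)")
    return 1.0 / (1.0 - energy_saving)


# Arm's quoted energy savings for the G77 versus the G76.
for saving in (0.17, 0.29):
    print(f"{saving:.0%} less energy -> {perf_per_watt_gain(saving):.2f}x perf/W")
```

This yields roughly 1.20x and 1.41x, which is close to, though not exactly, the 1.2x to 1.39x range quoted by Arm.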

This year I’m not going to attempt any performance and efficiency projections as there are just too many variables at play. One of the larger changes for next year is that I’m expecting SoCs to support LPDDR5, which will likely change the power dynamics in smartphones by a notable margin.

Arm does note that they are expecting 1.4x performance jumps in next year’s SoCs with the G77. Using Samsung’s Exynos 9820 as the reference G76 implementation, this would mean that a future G77 SoC would come close to the GPU performance of Apple’s A12 at better power efficiency (assuming power levels are maintained). This would put Qualcomm in trouble as it would be a clear jump ahead of the current-generation Adreno 640; however, we expect Qualcomm to follow up with a new-generation GPU as well.

Machine learning performance of the G77 is something that Arm is quite proud of. Here it’s not just the fact that the cores have 33% more processing units, but also the much-improved design of the LSC and its bandwidth, that pushes the G77’s inferencing performance to an average of 1.6x that of the G76.

Finally, Arm made a generational comparison against the last two generations of Mali GPUs. On the same process and at the same performance, the new G77 continues on track with 30% year-on-year energy efficiency improvements, and uses 50% less energy than a Mali-G72.
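
The two figures are consistent with each other if the 30% saving compounds over two generations (G72 to G76, then G76 to G77). The sketch below shows that arithmetic; applying an identical 30% saving to each step is an assumption for illustration, and the function name is hypothetical.

```python
def cumulative_energy_saving(per_generation_saving: float, generations: int) -> float:
    """Total fractional energy saving after compounding the same
    per-generation saving over several generations."""
    if not 0.0 <= per_generation_saving < 1.0:
        raise ValueError("per_generation_saving must be in [0, 1)")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    remaining = (1.0 - per_generation_saving) ** generations
    return 1.0 - remaining


# Two generational steps at 30% each: 0.7 * 0.7 = 0.49 of the original energy.
saving = cumulative_energy_saving(0.30, 2)
print(f"G72 -> G77 energy saving: {saving:.0%}")
```

Two compounded 30% reductions leave about 49% of the original energy, in line with the roughly 50% figure quoted against the Mali-G72.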

Conclusion & End Remarks

During the TechDay event Arm was clearly very excited about the new Valhall architecture and the new Mali-G77. There’s good reason to be excited, as it seems like Arm is about to showcase a significant generational jump in its Mali GPU IP.

The new G77 employs a brand-new architecture that fundamentally revamps Arm’s execution core, aiming for something more modern and in line with desktop GPU architectures. This seems like a shift that was a long time coming: while the G76 was a relatively good GPU, the previous-generation G72 and G71 weren’t.

I’m expecting to see the Mali-G77 in the next generation of Samsung Exynos and HiSilicon Kirin SoCs later this year and early next year. On paper, and if everything goes right, the G77 should be able to close the performance and efficiency gap to Apple and Qualcomm. In particular, the G77 should be able to leap ahead of Qualcomm’s Adreno GPUs, at least against the current generation.

I’m fairly optimistic; now Arm and its partner licensees just need to execute properly for users to be able to enjoy the end results.

42 Comments

  • darkich - Monday, May 27, 2019

    40% more performance just from design improvements?
    That's ridiculous, if true..
  • spaceship9876 - Monday, May 27, 2019

    I really hope they release a Mali-G32 replacement for the G31 with this new architecture, a smaller die with lower power consumption and better performance would be great for entry level phones.
  • KECHEES - Tuesday, May 28, 2019

    And come to think of it, the other Mali GPU was fabbed on 8nm, so given that 7nm EUV is supposedly 50% more efficient, we should be looking at a staggering performance improvement that's way above Arm's 40% target.
  • ballsystemlord - Monday, May 27, 2019

    Spelling and grammar corrections (Hint: have someone read what you're writing so that you don't make so many dumb mistakes).

    "Valhall and the new Mali-G77 follow up on the last three generation of Mali GPUs with some significant improvements in performance,..."
    Missing s:
    "Valhall and the new Mali-G77 follow up on the last three generations of Mali GPUs with some significant improvements in performance,..."

    "...the new ISA is said to be more compiler friendly and adapted and designed to better aligned with modern APIs such as Vulkan."
    Missing "be":
    "...the new ISA is said to be more compiler friendly and adapted and designed to be better aligned with modern APIs such as Vulkan."

    "Dwelling deeper into the structure of the execution engine,..."
    Very awkward, try delving:
    "Delving deeper into the structure of the execution engine,..."

    "One single has more instances on the primary datapath, and less instances of the control and I-cache,..."
    Single what, engine? Maybe "EE"?
    "One single EE has more instances on the primary datapath, and less instances of the control and I-cache,..."

    "On the hit-path, the texture cache itself has been improved and is now 32KB and is able of 16 texels/cycle throughput."
    Missing words, maybe:
    "On the hit-path, the texture cache itself has been improved and is now 32KB and is able to process 16 texels/cycle throughput."

    "Arm states that fundamentally frequency between the G76 and G77 shouldn't change much at all, an internally Arm still targets an 850MHz sign-off."
    "and" not "an"
    "Arm states that fundamentally frequency between the G76 and G77 shouldn't change much at all, and internally Arm still targets an 850MHz sign-off."
  • warreo - Monday, May 27, 2019

    Not to say we should excuse journalists for less than stellar writing, but having read his stuff for a long time, with Andrei you have to accept the good (technical expertise) with the "could use improvement" (writing/word choice). There's no one out there that offers the kind of analysis and insights Andrei does, so as a reader I continue to read his articles with great interest and don't let the typos and writing bother me.
  • phoenix_rizzen - Tuesday, May 28, 2019

    I don't mind the typos and wording issues and grammar issues ... if this was a blog where the content was written and posted directly by the author.

    What really bugs me is that Anandtech (and Ars, and other news sites) supposedly have editors on staff, yet these issues still slip through. :( There was a time when articles would pass through two or three stages of proofing to make sure these kinds of things didn't make it to press. But, it seems even for-pay "newspapers" these days are lacking in the QA/proofing department, so there's not much we can expect from for-free news sites. :(
  • Andrei Frumusanu - Tuesday, May 28, 2019

    Thanks for the corrections.
  • eastcoast_pete - Monday, May 27, 2019

    As mentioned in my post on Andrei's A77 article, I believe that at least some of these efforts are also to help establish ARM's designs as believable competition in the ultraportable space. With the graphics, that won't apply to Qualcomm, but is vital for Huawei and Samsung, as they rely on ARM-designed GPUs. A hexa- or octacore A77 with 12 or 16 of these might just be able to go head-to-head with Intel's low power chips.
  • Andrei Frumusanu - Tuesday, May 28, 2019

    Currently the big issue with Mali and ultra-portable is the fact that Arm has no plans for Windows drivers. Thus aside from ChromeOS devices, they're not really targeting that form-factor as much on the GPU as they are on the CPU (because Qualcomm uses the CPU).
  • eastcoast_pete - Wednesday, May 29, 2019

    Andrei, that's an important point. Also shows that MS is not as full-throated in its Windows-on-ARM as they let on. While I believe that some of the existing graphics support in Windows for QC's Adreno House is due to QC doing a lot of the heavy lifting, I don't believe that ARM would say no to a collaborative effort with MS to get MALI supported in Windows.
