Small Performance Improvements - Uncertain Projections

Summing up all the different microarchitectural advancements, Arm presents us with the performance improvements we can expect from the Mali-G78:

As for the performance the GPU can gain from the asynchronous top level, which improves geometry throughput relative to the shader cores, Arm projects a roughly 8% boost in benchmarks, with a larger ~14% boost in some game titles.

These improvements are quite small, but from a SoC vendor’s perspective I suppose this wouldn’t be too complicated to implement, as it would only cost an additional PLL or just a frequency divider to achieve the extra performance.

The generational power efficiency improvements of the G78 over the G77 in a similar configuration are 10%, likely attributable to the FMA and cache improvements of the core. It’s small, but we take what we can get.

The energy efficiency gain from the async feature is claimed to be around 6-13% depending on the workload. This is a more complex figure in my view. The main problem is that to achieve this, the SoC vendor needs to employ a second voltage rail for the GPU to gain the most benefit from the asynchronous frequencies. The efficiency benefit here is small enough that it raises the question of whether it’s not just cheaper to add a few extra cores and clock them lower, rather than incurring the cost of the extra PMIC rail, inductors and capacitors. It’s an easy efficiency gain for flagship SoCs, but I’m really wondering what vendors will be deploying in the mid-range and lower.

Mali-G68 GPU: It's the same

Alongside the Mali-G78, Arm is today also announcing the new Mali-G68 GPU:

You might be wondering why I’m including this as a footnote at the end of the article rather than covering it in more detail. The truth is, this is the exact same IP as the Mali-G78, with the only difference being that this GPU configuration only scales up to 6 cores. In essence, if the microarchitecture is implemented with up to 6 cores, it’s branded as a G68, and if it uses 7 or more cores, it’s branded as a G78.
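To make the naming rule concrete, here is a minimal Python sketch of it. The function name `mali_branding` and the lower-bound check are purely illustrative assumptions of mine and not anything Arm publishes; only the threshold between 6 and 7 cores comes from the description above.

```python
def mali_branding(core_count: int) -> str:
    """Return the marketing name for a given shader core count.

    Illustrative only: the same underlying IP is assumed for both names,
    and just the 6-versus-7 core threshold is taken from the article.
    """
    if core_count < 1:
        # Hypothetical sanity check; a GPU needs at least one core.
        raise ValueError("core_count must be at least 1")
    # Up to 6 cores is branded G68; 7 or more is branded G78.
    return "Mali-G68" if core_count <= 6 else "Mali-G78"


if __name__ == "__main__":
    for cores in (2, 6, 7, 14):
        print(cores, "cores:", mali_branding(cores))
```

The point of the sketch is that the core count is the only input: nothing about the microarchitecture changes between the two names.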

Arm had actually used this marketing approach with the G57, which turned out to be the same IP as the G77, leading to some confusion with the MediaTek Dimensity 800 SoC that was announced earlier this year. We had described that GPU as a derivative of the G77 until MediaTek reached out to us to point out that it’s actually the same GPU.

It’s pretty disappointing to see Arm do such marketing exercises, as it can be technically misleading. We asked what their rationale is, and they explained that it’s actually a customer demand for them to better differentiate their products. It’s a somewhat credible argument, but on the other hand MediaTek itself felt the need to point this misleading branding out to us, so it seems that not everybody is on the same page on the matter.

Arm does say that future iterations in this series might see real microarchitectural differentiation compared to the bigger implementations. In that scenario, the branding would at least make more sense.

Mali-G78: Meagre improvements, or just bad vendor implementations?

If you haven’t caught on by now, I’m feeling quite pessimistic about the Mali-G78. First of all, it’s just not that big of a generational upgrade compared to the Mali-G77, even by Arm’s own standards and advertised figures.

You could forgive the smaller upgrades if we had started from an excellent baseline performance. The Mali-G77 promised a whole ton of improvements in both performance and efficiency. The actual results we’ve seen out of the Exynos 990 and the MediaTek D1000 were anything but stellar. On one hand we had a SoC which seemingly had a bad implementation on a seemingly immature process node, and on the other hand we had some very mid-range performance even though it was an MP9 GPU configuration. Truth is, we still don’t know if the Mali-G77 is a good GPU or not, as we simply haven’t seen a good implementation out there. If we don’t know if the G77 is good or not, then it’s also impossible to project if the G78 will be any good.

I see Arm having the exact same problem they’ve been facing in the CPU space until the just-announced Cortex-X1: they’re stuck having to design a scalable GPU that fits all target markets and pleases all customer design points. Technically, that’s never the best option, as you end up with something that always has compromises.

As for potential implementers of the G78, amongst the biggest vendors HiSilicon is likely to be the first adopter, if they can manage to bring the new Kirin chipsets to market amidst the current political situation. Whether Samsung and AMD will manage to bring out an RDNA-based mobile Exynos next year is also still unclear, though I’m sure that’s what they’re striving for. The biggest issue on the competitive landscape is Apple. Even if the G77 had managed to live up to its projections, the G78’s improvements are certainly too meagre to catch up to the Apple GPUs. We’re also supposed to be seeing the first SoC designs with Imagination A-series GPUs later this year, which is a whole other wildcard. That’s a very tough competitive landscape for Mali, so let’s hope the G78 will see more success in the future.


36 Comments


  • tkSteveFOX - Wednesday, May 27, 2020

    Apart from MTK and Huawei most will drop using Mali Cores as the architecture doesn't scale well at all.
    Anything over 7-8 cores and you start to lose performance and get the consumption up.
    When Samsung finally unveil their RDNA powered GPU, even Apple's cores might lose their crown.
    I doubt it will be very power efficient though, just like Apple's.
  • lightningz71 - Wednesday, May 27, 2020

    Haven't the various mobile RISC cores gotten very close to hitting the wall with respect to memory bandwidth? Feeding the G78 in a full-house config with enough data to allow it to reach its full throughput potential would require fairly massive amounts of RAM bandwidth. All that bandwidth will require some very wide channels and a lot of memory ICs on the phone motherboards, or, it'll require some quite power hungry HBM stacks. At best, we get a couple of channels of low power DRAM that spends as much time as possible in low power mode. I just don't see it being very useful on a mobile device. At the very best, if it's used in an ARM Windows laptop, and if it gets a solid memory subsystem attached to it, it MAY be competitive with other iGPU solutions available in the market. However, once you go down that road, you have to ask yourself, is it worth putting that many resources into the CPU and its memory subsystem when there are available low power dGPU solutions out there that will still run rings around it in performance and not cost any more per unit to integrate into your solution? Even if it costs a bit more power to do so, in a laptop, you have a much larger form factor and much larger power budgets to play with.
  • ballsystemlord - Thursday, May 28, 2020

    Spelling error:

    "The core's cache shave also had they cache maintenance algorithms improved with better dependency tracking,..."
    "the" not "they":
    "The core's cache shave also had the cache maintenance algorithms improved with better dependency tracking,..."
  • Lobstermobster - Saturday, June 6, 2020

    How can we compare this new mobile GPU to others made by Qualcomm, Nvidia and Imagination? How many teraflops do these mobile GPUs have? I know the Switch uses a Tegra chip that can go up to 1 teraflops in dock mode
  • iphonebestgamephone - Sunday, June 7, 2020

    Whats the use of knowing the flops anyway.
  • IUU - Friday, October 2, 2020

    "Whats the use of knowing the flops anyway." I believe it is one of the most important metrics to know. Because a chip will always perform a certain percentage of its theoretical performance, often about 60 to 70% of theoretical. So, if a chip's theoretical performance is say x5 compared to another chip, no-one can fool you with the usual nonsense, "yes but it is real world performance that matters". Because a x5 theoretical performance wins hands down in real world scenarios, no matter what marketing gimmicks would want you to believe.

    That said, just consider the modern fashion of hiding details about architecture, of a lot of companies, lately even by Intel, and you will see there is an effort to go by marketing only to hide potential weaknesses.
