GPU Performance

The performance improvement of the A12 GPU was one of the biggest highlights of the keynote presentation, promising up to 50% higher performance versus the A11 GPU. Apple has achieved this by “simply” adding a fourth GPU core, up from three on the A11, and by introducing memory compression on the GPU. The memory compression is, I think, the biggest contributing factor to the increased microarchitectural performance of the GPU, as it really is a huge one-time shift – one which, admittedly, took Apple a long time to make.

One thing that I’d like to mention before going into the benchmarks is that peak performance and peak power consumption of the latest Apple GPUs is a problem. We’ve seen Apple transition from promoting its sustained performance over time to actually being one of the worst “offenders” in terms of performance degradation from the peak capabilities of the SoC. There are reasons for this, and I’ll be addressing them shortly.

3DMark Sling Shot 3.1 Extreme Unlimited - Physics

In the 3DMark Physics test, which is mostly a CPU-bound test that also stresses the overall platform power limits while the GPU is doing work, we see the iPhone XS and the A12 achieve some great gains over last year’s iPhone. In the past, this test had been particularly problematic for Apple CPUs; however, it seems this microarchitectural hiccup was solved in the A11 and its Monsoon cores. The Vortex cores, along with the generally improved power efficiency of the SoC, raise performance further, finally matching Arm’s cores in this particular test.

3DMark Sling Shot 3.1 Extreme Unlimited - Graphics

In the Graphics part of the 3DMark test, the iPhone XS showcases 41% better sustained performance than last year’s iPhone X. In this particular test, the OnePlus 6’s more generous thermals still allow the Snapdragon 845 to outperform the new chip.

In terms of peak performance, I encountered some serious issues in 3DMark: I was completely unable to complete a single run on either the iPhone XS or XS Max while the devices were cool. If the device is cool enough, the GPU will boost to such high performance states that it actually crashes. I was able to reproduce this consistently, over and over again. I attempted to measure power during this test, and the platform showed instantaneous average power of 7-8 watts – with transients above this that I suspect my measurement methodology didn’t capture. For the GPU to crash, the power delivery must be failing to supply the necessary transient currents during operation, resulting in a voltage dip that corrupts the GPU’s operation.

After iterating the test several times over a few attempts, in order to heat up the SoC to the point where it decides to start off at a lower GPU frequency, the devices will successfully complete the test.

GFXBench

Kishonti recently released the new GFXBench 5 Aztec Ruins test, which brings a newer, more modern, and more complex workload to our test suite. In an ideal world we would be testing real games; however, this is an incredible headache on mobile devices, as there are essentially no games with built-in benchmarking modes. There are some tools to gather fps values, but the biggest concern is repeatability of the workload when one plays the game manually – also a huge concern for many of today’s online games.

GFXBench Sub-Tests | Aztec High | Aztec Normal | Manhattan 3.1 | T-Rex
Scene length | 64.3s | 64.3s | 62s | 56s
Resolution | 2560 x 1440 | 1920 x 1080 | 1920 x 1080 | 1920 x 1080
Compute shaded pixels | ~1.5% of work | ~1.5% of work | ~3% of work | ~2.4% of work
Total shaded pixels | ~5.80M / frame (~161% of scene) | ~2.64M / frame (~127% of scene) | ~1.90M / frame (~92% of scene) | ~0.65M / frame (~31% of scene)
Avg. triangles per frame | ~440K | ~207K | ~244K | ~724K
Memory B/W per frame (Mali G72 specific) | VK: 652MB (413R + 239W) / GL: 514MB (331R + 182W) | VK: 268MB (160R + 107W) / GL: 242MB (154R + 87W) | 135MB (88R + 46W) | 73MB (51R + 22W)

I still think synthetic benchmark testing has a very solid place here – as long as you understand the characteristics of the benchmark. Kishonti’s GFXBench has been an industry standard for years now, and the new Aztec test gives us a different kind of workload. The new tests are a lot more shader heavy, making use of more complex effects which stress the arithmetic power of the GPUs. While the data in the above table was collected on an Arm Mali G72 GPU, it should still give an overall indication of what to expect on other architectures. The new tests are also very bandwidth hungry due to their larger textures.

In general, games will correlate with benchmarks depending on the mix of the various graphical workloads: being fillrate or texture heavy, having complex geometry, or leaning on the ever-increasing complexity of shader effects that demand more arithmetic power from the GPU.

GFXBench Aztec Ruins - Normal - Vulkan/Metal - Off-screen

In Aztec Ruins in Normal mode, which is the less demanding of the new tests, the new Apple A12 phones post some extremely high peak performance figures, showing a 51% increase over last year’s iPhones.

In terms of sustained performance, the figures quickly drop after a few minutes and stabilise further down the line. Here, the iPhone XS outperforms the iPhone X by 61%. The Apple A12 is also able to beat the current leader, the Snapdragon 845 inside the OnePlus 6, by 45% in sustained performance.

GFXBench Aztec Ruins - High - Vulkan/Metal - Off-screen

In the High mode of Aztec Ruins, we’re seeing an eerily similar performance ranking. The iPhone XS’s peak performance is again great, but what should matter is the sustained score. Here again the iPhone XS performs 61% better than the iPhone X. The performance delta to the OnePlus 6’s Snapdragon 845 is reduced to 31%, a tad less than in the Normal run; it’s possible we’re hitting bottlenecks in some aspects of the microarchitecture.

GPU Power

Platform and GPU power for Apple devices has been something I’ve wanted to publish for some time, but there were complexities in achieving this. I was able to get reasonable figures for the new iPhone XS – however, data on older SoCs will have to wait for a future opportunity.

I haven’t had time to measure Aztec across the swath of devices, so we’re still relying on the standard Manhattan 3.1 and T-Rex figures. First off, to get the full performance figures out of the way:

GFXBench Manhattan 3.1 Off-screen

Again in Manhattan 3.1, the new iPhone XS performs an extraordinary 75% better than the iPhone X. The improvements here come not only from the GPU’s microarchitectural changes, the extra core, and the SoC’s new process node, but also from the new memory compression, which reduces power consumption on the external DRAM – something that can represent up to 20-30% of system power in bandwidth-heavy 3D workloads. Power saved on the DRAM means a larger thermal envelope available to the GPU and SoC, increasing performance.
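
To put rough numbers on that argument, here’s a back-of-the-envelope sketch in Swift; every input in it (the 4W envelope, the 25% DRAM share, the 30% traffic saving) is an assumption for illustration, not a measured figure:

    import Foundation

    // Back-of-the-envelope estimate of how much headroom DRAM compression
    // could free up. Every input here is an illustrative assumption.
    let sustainedEnvelopeW = 4.0   // typical sustained phone power budget
    let dramShare          = 0.25  // DRAM at ~20-30% of system power
    let trafficSaving      = 0.30  // assumed DRAM traffic cut from compression

    let dramPowerW = sustainedEnvelopeW * dramShare  // ~1.0 W spent on DRAM
    let freedW     = dramPowerW * trafficSaving      // ~0.3 W handed back to the GPU
    print(String(format: "~%.1f W of the %.1f W envelope freed for the GPU",
                 freedW, sustainedEnvelopeW))

Even under assumptions this conservative, compression hands back a meaningful slice of a sustained power budget.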

GFXBench Manhattan 3.1 Offscreen Power Efficiency (System Active Power)

Device (SoC) | Mfc. Process | FPS | Avg. Power (W) | Perf/W Efficiency
iPhone XS (A12) Warm | 7FF | 76.51 | 3.79 | 20.18 fps/W
iPhone XS (A12) Cold / Peak | 7FF | 103.83 | 5.98 | 17.36 fps/W
Galaxy S9+ (Snapdragon 845) | 10LPP | 61.16 | 5.01 | 11.99 fps/W
Galaxy S9 (Exynos 9810) | 10LPP | 46.04 | 4.08 | 11.28 fps/W
Galaxy S8 (Snapdragon 835) | 10LPE | 38.90 | 3.79 | 10.26 fps/W
LeEco Le Pro3 (Snapdragon 821) | 14LPP | 33.04 | 4.18 | 7.90 fps/W
Galaxy S7 (Snapdragon 820) | 14LPP | 30.98 | 3.98 | 7.78 fps/W
Huawei Mate 10 (Kirin 970) | 10FF | 37.66 | 6.33 | 5.94 fps/W
Galaxy S8 (Exynos 8895) | 10LPE | 42.49 | 7.35 | 5.78 fps/W
Galaxy S7 (Exynos 8890) | 14LPP | 29.41 | 5.95 | 4.94 fps/W
Meizu PRO 5 (Exynos 7420) | 14LPE | 14.45 | 3.47 | 4.16 fps/W
Nexus 6P (Snapdragon 810 v2.1) | 20Soc | 21.94 | 5.44 | 4.03 fps/W
Huawei Mate 8 (Kirin 950) | 16FF+ | 10.37 | 2.75 | 3.77 fps/W
Huawei Mate 9 (Kirin 960) | 16FFC | 32.49 | 8.63 | 3.77 fps/W
Huawei P9 (Kirin 955) | 16FF+ | 10.59 | 2.98 | 3.55 fps/W

The power figures here are system active power, meaning the total device power minus the idle power of a given workload scenario (which includes screen power, among other things).
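
As a concrete illustration of that methodology, the sketch below reproduces the A12’s Cold/Peak Manhattan entry. The total and idle power inputs are assumptions chosen so the maths lands on the table’s figures, not my actual measurements:

    import Foundation

    // Sketch of the table's methodology: active power is total device power
    // minus the scenario's idle power (screen etc.), and efficiency is FPS
    // per active watt. The total/idle inputs are assumptions, not my data.
    let totalDevicePowerW = 7.10    // assumed reading during the benchmark run
    let idlePowerW        = 1.12    // assumed baseline: screen on, GPU idle
    let avgFPS            = 103.83  // the A12 Cold/Peak Manhattan result

    let activePowerW = totalDevicePowerW - idlePowerW  // 5.98 W
    let fpsPerWatt   = avgFPS / activePowerW           // ≈ 17.36 fps/W
    print(String(format: "%.2f W active, %.2f fps/W", activePowerW, fpsPerWatt))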

At peak performance, when the device is cool under 22°C ambient temperatures, the Apple A12’s GPU can get quite power hungry, reaching 6W. And this wasn’t even the true peak average of the GPU, as I mentioned that 3DMark reached around 7.5W (before crashing).

Even at this high power figure, the efficiency of the A12 beats all other SoCs. While this is somewhat interesting, it’s incredibly important to emphasise Apple’s throttling behaviour: after only 3 minutes, or 3 benchmark runs, the phone throttles by around 25%, to what I describe in the efficiency table as the “Warm” state. Here power reaches a more reasonable 3.79W. It’s to be noted that power efficiency did not drastically go up, improving by only 16% over the peak figures. What this points to is that the platform has a relatively shallow power curve, and that performance is mostly limited by thermals.
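
Those two percentages fall straight out of the table’s own rows; a quick sanity check:

    import Foundation

    // Sanity check of the throttling maths, using the Manhattan table rows.
    let peak = (fps: 103.83, watts: 5.98)  // Cold / Peak row
    let warm = (fps: 76.51,  watts: 3.79)  // Warm row, after ~3 runs

    let perfDrop       = 1 - warm.fps / peak.fps       // ≈ 26%
    let efficiencyGain = (warm.fps / warm.watts) /
                         (peak.fps / peak.watts) - 1   // ≈ 16%
    print(String(format: "perf -%.0f%%, efficiency +%.0f%%",
                 perfDrop * 100, efficiencyGain * 100))

A 26% performance drop buys only a 16% efficiency gain, which is what makes the power curve look shallow.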

GFXBench T-Rex 2.7 Off-screen

Moving on to T-Rex, again the iPhone XS showcased a similar 61% improvement in sustained performance.

GFXBench T-Rex Offscreen Power Efficiency (System Active Power)

Device (SoC) | Mfc. Process | FPS | Avg. Power (W) | Perf/W Efficiency
iPhone XS (A12) Warm | 7FF | 197.80 | 3.95 | 50.07 fps/W
iPhone XS (A12) Cold / Peak | 7FF | 271.86 | 6.10 | 44.56 fps/W
Galaxy S9+ (Snapdragon 845) | 10LPP | 150.40 | 4.42 | 34.00 fps/W
Galaxy S9 (Exynos 9810) | 10LPP | 141.91 | 4.34 | 32.67 fps/W
Galaxy S8 (Snapdragon 835) | 10LPE | 108.20 | 3.45 | 31.31 fps/W
LeEco Le Pro3 (Snapdragon 821) | 14LPP | 94.97 | 3.91 | 24.26 fps/W
Galaxy S7 (Snapdragon 820) | 14LPP | 90.59 | 4.18 | 21.67 fps/W
Galaxy S8 (Exynos 8895) | 10LPE | 121.00 | 5.86 | 20.65 fps/W
Galaxy S7 (Exynos 8890) | 14LPP | 87.00 | 4.70 | 18.51 fps/W
Huawei Mate 10 (Kirin 970) | 10FF | 127.25 | 7.93 | 16.04 fps/W
Meizu PRO 5 (Exynos 7420) | 14LPE | 55.67 | 3.83 | 14.54 fps/W
Nexus 6P (Snapdragon 810 v2.1) | 20Soc | 58.97 | 4.70 | 12.54 fps/W
Huawei Mate 8 (Kirin 950) | 16FF+ | 41.69 | 3.58 | 11.64 fps/W
Huawei P9 (Kirin 955) | 16FF+ | 40.42 | 3.68 | 10.98 fps/W
Huawei Mate 9 (Kirin 960) | 16FFC | 99.16 | 9.51 | 10.42 fps/W

Power consumption in T-Rex is in line with what we saw in Manhattan, with the peak figures on a cold device reaching a little over 6W. After 3 runs, this again reduces to under 4W, at a 28% reduction in performance. Efficiency again doesn’t improve by much, pointing to a shallow power curve once more.

It’s to be noted that the power measurements of the “Warm” runs don’t represent sustained performance; I simply wanted to add an additional data-point to the table alongside the peak figures. Sustained power envelopes for most devices are in the 3-3.5W range.

So why does Apple post such big discrepancies between peak and sustained performance, when the latter was a keynote focus point for Apple as recently as the iPhone 6 and the A8? The change comes down to how everyday GPU use-cases have evolved, and how Apple uses the GPU for non-3D workloads.

Apple makes heavy use of GPU compute for various purposes, ranging from general hardware acceleration in apps to camera image processing. These are use-cases where sustained performance doesn’t really matter, because they’re transactional workloads: fixed amounts of work that need to be processed as fast as possible.
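
For readers unfamiliar with what such a transactional workload looks like in practice, below is a minimal Metal compute sketch in Swift: a fixed buffer of work is dispatched once and raced to completion, after which the GPU can drop back to idle. The kernel, buffer size, and threadgroup size are illustrative assumptions, not anything shipped by Apple.

    import Metal

    // A minimal "transactional" GPU compute job: a fixed workload submitted
    // once and completed as fast as possible. All sizes are illustrative.
    let kernelSource = """
    #include <metal_stdlib>
    using namespace metal;
    kernel void scale(device float *data [[buffer(0)]],
                      uint id [[thread_position_in_grid]]) {
        data[id] *= 2.0f;
    }
    """

    let device   = MTLCreateSystemDefaultDevice()!
    let queue    = device.makeCommandQueue()!
    let library  = try! device.makeLibrary(source: kernelSource, options: nil)
    let pipeline = try! device.makeComputePipelineState(
        function: library.makeFunction(name: "scale")!)

    let count  = 1 << 20
    let buffer = device.makeBuffer(length: count * MemoryLayout<Float>.stride,
                                   options: .storageModeShared)!

    let cmd = queue.makeCommandBuffer()!
    let enc = cmd.makeComputeCommandEncoder()!
    enc.setComputePipelineState(pipeline)
    enc.setBuffer(buffer, offset: 0, index: 0)
    // Non-uniform dispatchThreads is supported on A11-class GPUs and newer.
    enc.dispatchThreads(MTLSize(width: count, height: 1, depth: 1),
                        threadsPerThreadgroup: MTLSize(width: 64, height: 1, depth: 1))
    enc.endEncoding()
    cmd.commit()
    cmd.waitUntilCompleted()   // job done; the GPU can return to idle

On Android, the equivalent would have to go through OpenCL or RenderScript, which is exactly where the fragmentation problems below come in.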

Android GPU compute has been a literal disaster over the last few years, and I primarily blame Google for not supporting OpenCL in AOSP – leaving support extremely patchy among vendors. RenderScript never picked up much traction as it just doesn’t guarantee performance. The fragmentation of Android devices and SoCs has meant that GPU compute is essentially non-existent in third-party apps (please correct me if I’m wrong!).

Apple’s vertical integration and tight control of the API stack means that GPU compute is a reality, and peak transactional GPU performance is a metric that is worth consideration.

Now while this does explain the throttling, I still think Apple could do some kind of optimisation in regards to thermals. I played some Fortnite on the iPhone XS, and the way the phones heated up isn’t something I was much of a fan of. There must be some way to let actual games and applications that have sustained-performance characteristics start off with the GPU limited to this sustained performance state.

Other than the thermal and peak performance considerations, the iPhone XS and XS Max, thanks to the new A12 SoC, showcase industry-leading performance and efficiency, and are currently the best mobile platforms for gaming, period.

Comments

  • Speedfriend - Monday, October 8, 2018

    So you would expect them to use that powerful SOC to deliver real battery improvements, but somehow they can't. No one I speak to complains that their modern smartphone is slow, but everyone complains about battery life.
  • melgross - Saturday, October 6, 2018

    It’s both. The deep dive isolates the SoC to a great extent. It can be done with any phone.
  • eastcoast_pete - Friday, October 5, 2018

    Andrei, thanks for the review! Yes, these are outstanding phones at outrageous prices. I appreciate the in-depth testing and detailed background, especially on the A12's CPU and GPU. While I don't own an iPhone and don't like iOS, I also believe that, phone-wise, the XS and XS Max are the new kings of the hill. The A12's performance is certainly in PC laptop class, and I wonder if (or how) the recent Apple-Qualcomm spat that kept QC's modem tech out of the new iPhones has helped Intel to keep its status as CPU provider for Apple's laptops, at least for now.
    One final comment, and one question: Andrei, I agree with you 100% that Apple missed an opportunity when they decided on a rather middling battery capacity for the XS Max. If I buy a big phone, I expect a big battery. Give the XS Max a 5000 mAh or larger battery, and it really is "the Max", at least among iPhones. At that size, a few mm additional thickness are not as important as run time. Maybe Apple kept that upgrade for its mid-cycle refresh next year - look, bigger batteries.
    @Andrei. I remember reading somewhere that the iPhone X and 8 used 128 bit wide memory buses. Questions: Is that the case here, and how does the memory system and bus compare to Android phones? And, in your estimate, how much of the A12's speed advantages are due to Apple's memory speeds and bus width ?
  • dudedud - Friday, October 5, 2018

    I was sure that only the AxX chips were 128-bit, but I would also want to know if this has changed.
  • RSAUser - Saturday, October 6, 2018

    A12 is definitely not in the laptop class unless you're looking at the extreme low-power tier.

    Just because it is quite a bit faster than the equivalent on mobile does not mean it can compete at a different power envelope. If that were true, Intel would already have dominated the SoC market. It requires a completely different CPU design. It's why they can use it for the Touch Bar on the MacBook but not as a main processor.
  • ws3 - Sunday, October 7, 2018

    This review did not compare the A12 with “mobile” Intel chips but rather with server chips. The A12 is on par with Skylake server CPUs on a single threaded basis. Let that sink in.

    As to why Intel doesn’t dominate the SoC space, Intel’s designs haven’t been energy efficient enough, and the x86 instruction set offers no advantage on mobile.
  • tipoo - Thursday, October 18, 2018

    It's already competing with laptop and desktop class chips, not just mobile fare. It's up there per core with Skylake server, and NOT normalized per clock, just core vs core.

    It's like people don't read these articles year over year and are still using lines from when A7 arrived...
  • tipoo - Thursday, October 18, 2018

    Only the A10X and A8X were 128-bit; on mobile, that's still power-limited for memory bandwidth.
  • juicytuna - Friday, October 5, 2018

    Apple's big cores are like magic at this point. 2-3x the performance per watt of the nearest competitors is just ridiculous.
  • sing_electric - Friday, October 5, 2018

    I know this is almost a side point, but this really goes to show what a mess Android (Google/Qualcomm) is compared to iOS. At the rate Snapdragon is improving, it'll be 2020/2021 before Qualcomm sells a chip as fast as 2017's A11, and Google is shooting itself in the foot by not having APIs available that take advantage of Snapdragon's (relative) GPU strength.

    That's on top of other long-term Android issues (like how in 2018, Android phones still can't handle a 1:1 match of finger movement to scrolling, which the iPhone could in 2008). Honestly, if I weren't so invested in Android at this point, I'd really consider switching now.
