System & ML Performance

Having investigated the new A13’s CPU performance, it’s time to look at how it performs in some system-level tests. Unfortunately, there’s still a frustrating lack of proper system tests for iOS, particularly tests like PCMark that would more accurately represent application use-cases. In lieu of that, we have to fall back to browser-based benchmarks. Browser performance is still an important aspect of device performance, as it remains one of the main workloads that put large amounts of stress on the CPU while also being sensitive to performance latency (essentially, responsiveness).

As always, the following benchmarks aren’t just a representation of the hardware’s capabilities, but also of a phone’s software optimizations. iOS 13 has again increased browser-based benchmark performance by roughly 10% in our testing. We’ve gone ahead and updated the performance figures of previous-generation iPhones with new scores on iOS 13 to have proper Apple-to-Apple comparisons for the new iPhone 11s.

Speedometer 2.0 - OS WebView

In Speedometer 2.0, the new A13-based phones exhibit a 19-20% performance increase over the previous-generation iPhone XS and its A12. The increase is in line with Apple’s performance claims. It is a bit smaller than what we saw last year with the A12, as it seems the main boost to the scores last year came from the upgrade to a 128KB L1I cache.

JetStream 2 - OS WebView

JetStream 2 is a newer browser benchmark that was released earlier this year. The test is longer and possibly more complex than Speedometer 2.0, although we still have to do proper profiling of the workload. The A13’s increase here is about 13%. Apple’s chipsets, CPUs, and custom JavaScript engine continue to dominate the mobile benchmarks, posting double the performance we see from the next-best competition.

WebXPRT 3 - OS WebView

Finally, WebXPRT represents more of a “scaling” workload that isn’t as steady-state as the previous benchmarks. Still, even here the new iPhones showcase an 18-19% performance increase.
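As a rough illustration of how generational figures like these are derived and combined, the sketch below computes a fractional uplift from two scores and then takes the geometric mean of the three uplifts quoted above. Taking the midpoint of each quoted range is our assumption, and the function names are purely illustrative; no raw benchmark scores are reproduced here.

```python
import math


def uplift(new_score, old_score):
    """Fractional improvement of new_score over old_score (0.19 means +19%)."""
    return new_score / old_score - 1.0


def geomean_uplift(uplifts):
    """Combine per-benchmark fractional uplifts via the geometric mean of their ratios."""
    ratios = [1.0 + u for u in uplifts]
    return math.prod(ratios) ** (1.0 / len(ratios)) - 1.0


# Midpoints of the ranges quoted in the text (using midpoints is an assumption).
quoted = {
    "Speedometer 2.0": 0.195,  # quoted as 19-20%
    "JetStream 2": 0.13,       # quoted as about 13%
    "WebXPRT 3": 0.185,        # quoted as 18-19%
}

combined = geomean_uplift(quoted.values())
print(f"Combined browser uplift: {combined:.1%}")
```

The geometric mean is used rather than a plain average because the inputs are ratios; it keeps the combined figure independent of each benchmark's score scale.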

Last year, Apple made big changes to the kernel scheduler in iOS 12 and vastly shortened the ramp-up time of the CPU DVFS algorithm, decreasing the time the system takes to transition from low idle frequencies and idle small cores to the full performance of the large cores. This resulted in significantly improved device responsiveness across a wide range of past iPhone generations.

Compared to the A12, the A13 doesn’t change all that much in terms of the time it takes to reach the maximum clock-speed of the large Lightning cores, with the CPU core reaching its peak in a little over 100ms.

What does change a lot is the time the workload resides on the smaller Thunder efficiency cores. On the A13, the small cores ramp up significantly faster than on the A12. There’s also a major change in scheduler behavior, specifically in when the workload migrates from the small cores to the large cores. On the A13 this now happens after around 30ms, while on the A12 it would take up to 54ms. Because the small cores can no longer request higher memory controller performance states on their own, it likely makes sense to migrate to the large cores sooner in the case of a more demanding workload.

The A13’s Lightning cores start off at a base frequency of around 910MHz, which is a bit lower than the A12’s base frequency of 1180MHz. What this means is that Apple has extended the dynamic range of the large cores in the A13 both towards higher performance and towards lower, more efficient frequencies.
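For readers curious how a ramp-up time like this can be estimated from user space, one common approach is to run identical chunks of work back to back, timestamp each one, and look for the point at which the per-chunk duration settles at its fastest value. The sketch below illustrates that idea only; it is not the tool used for the measurements above, the function names and the 5% tolerance are our assumptions, and Python's interpreter overhead makes it far too coarse for real DVFS work.

```python
import time


def sample_chunk_times(n_chunks=200, iters_per_chunk=20_000):
    """Run identical busy-loop chunks back to back.

    Returns a list of (start_offset_ms, duration_ms) tuples, one per chunk.
    """
    samples = []
    t0 = time.perf_counter_ns()
    for _ in range(n_chunks):
        start = time.perf_counter_ns()
        acc = 0
        for i in range(iters_per_chunk):
            acc += i * i  # fixed amount of work per chunk
        end = time.perf_counter_ns()
        samples.append(((start - t0) / 1e6, (end - start) / 1e6))
    return samples


def ramp_time_ms(samples, tolerance=0.05):
    """Start offset of the final unbroken run of chunks that are all within
    `tolerance` of the fastest chunk, or None if the last chunk is still slow.
    """
    fastest = min(duration for _, duration in samples)
    limit = fastest * (1.0 + tolerance)
    settled_at = None
    for offset, duration in samples:
        if duration <= limit:
            if settled_at is None:
                settled_at = offset  # candidate start of the settled region
        else:
            settled_at = None  # a slow chunk resets the candidate
    return settled_at


if __name__ == "__main__":
    print("Estimated ramp time (ms):", ramp_time_ms(sample_chunk_times()))
```

Chunk duration here is only a proxy for clock frequency: a shorter chunk implies a faster core, but scheduler noise and core migrations show up in the same signal, which is why the settled region is required to last until the end of the run.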

Machine Learning Inference Performance

Apple has also claimed to have increased the performance of their neural processor IP block in the A13. To use this unit, you have to make use of the CoreML framework. Unfortunately we don’t have a custom tool for testing this as of yet, so we have to fall back to one of the rare external applications out there which does provide a benchmark for this, and that’s Master Lu’s AIMark.

As with the web-browser workloads, iOS 13 has brought performance improvements for past devices, so we’ve rerun the iPhone X and XS scores for proper comparisons to the new iPhone 11.

鲁大师 / Master Lu - AIMark 3 - InceptionV3
鲁大师 / Master Lu - AIMark 3 - ResNet34
鲁大师 / Master Lu - AIMark 3 - MobileNet-SSD
鲁大师 / Master Lu - AIMark 3 - DeepLabV3

The improvements for the iPhone 11 and the new A13 vary depending on the model and workload. For the classical models such as InceptionV3 and ResNet34, we’re seeing 23-29% improvements in the inference rate. MobileNet-SSD sees a more limited 17% increase, while DeepLabV3 sees a major increase of 48%.

Generally, the issue with running machine learning benchmarks is that they run through an abstraction layer, which in this case is CoreML. We don’t have guarantees on how much of the model is actually being run on the NPU versus the CPU and GPU, as things can differ a lot depending on the ML drivers of the device.

Nevertheless, the A13 and iPhone 11 here are very competitive and provide good iterative performance boosts for this generation.

Performance Conclusion

Overall, performance on the iPhone 11s is excellent, as we've come to expect time and time again from Apple. With that said, however, I can’t really say that I notice too much of a difference from the iPhone XS in daily usage. So while the A13 delivers class-leading performance, it's probably not going to be very compelling for users coming from last year's A12 devices; the bigger impact will be felt coming from older devices. Otherwise, with this much horsepower I feel like the user experience would benefit significantly more from an option to accelerate application and system animations, or rather even just turn them off completely, in order to really feel the proper snappiness of the hardware.

242 Comments

  • Henk Poley - Saturday, October 19, 2019 - link

    Does the A13 have more security features, such as the pointer encryption that was added with the A12 (essentially binding pointers to their origin, e.g. processes)? It was kinda interesting that the recently uncovered mass exploitation of iPhones didn't touch any of the A12 iDevices (and neither do jailbreaks).
  • techsorz - Sunday, October 20, 2019 - link

    I'm sorry Anandtech, but your GPU review is absolutely horrendous. You are using 3DMark on iOS, which hasn't received an update since iOS 10, and then comparing it to the Android version, which was updated in June 2019. There is a reason you are getting conflicting results when you switch over to GFXBench, which was updated on iOS in 2018. How this didn't make you wonder is amazing.
  • Andrei Frumusanu - Sunday, October 20, 2019 - link

    The 3D workloads do not get updated between the update versions, so your whole logic is moot.
  • techsorz - Sunday, October 20, 2019 - link

    Are you kidding me? The load won't change, but the score sure will. It makes it look like the iPhone throttles much more than it does in reality. That the score is 50% lower due to unoptimized garbage does not mean that the chipset actually throttled by 50%.

    I can't believe that I have to explain this to you. 3DMark supports an operating system that is 3 years old; for all we know it is running in compatibility mode and is emulated.
  • Andrei Frumusanu - Sunday, October 20, 2019 - link

    Explain to me how the score will change if the workload doesn't change? That makes absolutely zero sense.

    You're just spouting gibberish with stuff like compatibility mode or emulation, as those things don't even exist - the workload is running on Metal and the iOS version is irrelevant in that regard.
  • techsorz - Monday, October 21, 2019 - link

    In computing you have what is called a low-level 3D API. This is what Metal and DirectX are. This is what controls how efficiently you use the hardware you have available. If you have a new version of this API in, say, iOS 13, and you run an iOS 10 application, you will run into compatibility issues. These issues can degrade performance without it being proportional to the actual throttling taking place. On Android, however, it is compatible with the latest low-level APIs as well as various performance modes.

    The hilarious thing is that Anandtech even contradicts itself, using an "only" 1 year outdated benchmark, where the iPhone suddenly throttles less at full load. This entire article is just a box full of fail. If you want to educate yourself, I suggest you watch Speedtest G on YouTube. Or Gary Explains. He has a video on both 'REAL' iOS and Android throttling, done using the latest version of their respective APIs.
  • Andrei Frumusanu - Monday, October 21, 2019 - link

    > If you have a new version of this API in, say, iOS 13, and you run an iOS 10 application, you will run into compatibility issues. These issues can degrade performance without it being proportional to the actual throttling taking place. On Android, however, it is compatible with the latest low-level APIs as well as various performance modes.

    Complete and utter nonsense. You literally have no idea what you're talking about.
  • techsorz - Monday, October 21, 2019 - link

    How about you provide a proper response instead of saying it's nonsense? How can the throttling be different at full load on 2 different benchmarks otherwise? There is clearly no connection between actual throttling and the score itself. You are literally contradicting yourself in your own review.
  • Andrei Frumusanu - Monday, October 21, 2019 - link

    A proper response to what exactly? Until now all you've managed to do is complain that the test is somehow broken and wrong and that I need to educate myself.

    The whole thing has absolutely nothing to do with software versions or OS version or whatever other thing. The peak and sustained scores are performed with the same workloads and nothing other than the phone's temperature has changed - the % throttling is a physical attribute of the phone; the phone doesn't suddenly throttle more on one benchmark than on another simply because that benchmark was released a few years ago.

    The throttling is different on the different tests *because they are different workloads*. 3DMark and Aztec High will put very high stress on the ALUs of the GPU, more than the other tests, and create more heat and higher hotspot temperatures on the GPU, resulting in more throttling and reduced frequencies in those tests. T-Rex, for example, will be less taxing on the GPU in terms of its computation blocks and has more of its load spread out to the CPU and DRAM, which also spreads out the temperature, and that's why it throttles the least.
  • techsorz - Monday, October 21, 2019 - link

    Thank you for your informative reply. Then, is it crazy to assume that the 3-year-old 3DMark benchmark is not providing the same workload as the 2019 version on Android? Maybe you could run an outdated, buggy benchmark on a ROG 2 as well and it would stress the ALUs even more? Possibly, the ROG 2 is getting a much more sensible workload while the iPhone is getting unrealistic loads that don't utilize the architecture at all. In which case, it is pretty unfair and misleading. It's like taking a car and only testing 1 wheel while the other cars get to use all 4.
