System Performance - Slightly Underwhelming?

While synthetic steady-state workloads are one thing, real-world workloads are more transactional, and their performance is determined not just by hardware but also by software. Here, things like the CPU scheduler and OS APIs can have a big effect on the perceived performance of a device.

PCMark Work 2.0 - Web Browsing 2.0

Starting off with PCMark’s Web Browsing 2.0 test, the Snapdragon 855 gets off to a bad start. For some reason the S855 QRD wasn’t able to distinguish itself from the lower end of Snapdragon 845 devices; we had expected the phone to perform similarly to the Kirin 980 in the Mate 20s.

PCMark Work 2.0 - Video Editing

The video editing score is also quite mediocre, but the reason is that this test has largely reached a performance plateau where most of today’s devices no longer show meaningful differences between each other.

PCMark Work 2.0 - Writing 2.0

The writing sub-test is one of the most important in PCMark, and luckily the Snapdragon 855 QRD performed as expected here, landing within range of the Mate 20s.

PCMark Work 2.0 - Photo Editing 2.0

The photo editing sub-test is characterised by shorter, heavy RenderScript workload bursts. The QRD performs well, although it only lands within the range of the top Snapdragon 845 devices.

PCMark Work 2.0 - Data Manipulation

Finally, in the data manipulation test, which is more single-thread bound, the Snapdragon 855 performs well, but it remains neck-and-neck with the Kirin 980 devices and behind the Pixel 3 with its very aggressive scheduler implementation.

PCMark Work 2.0 - Performance

Overall, the Snapdragon 855 QRD ended up among the top scorers in PCMark. However, I found the result a bit disappointing, as it doesn’t achieve a higher ranking than the Pixel 3, and Huawei’s Kirin 980-powered Mate 20s are also ahead.

Speedometer 2.0 - OS WebView

WebXPRT 3 - OS WebView

I’ve discussed the results with Qualcomm, and they were surprised to see the numbers end up like this. They said it’s something they will look into, and that the scheduler and software stack on commercial devices might improve performance. It’s something to revisit once we get our hands on the first phones.

The web-based benchmarks such as Speedometer 2.0 and WebXPRT 3 show similarly muted results. I had expected Qualcomm to perform really well here given the scheduler performance showcased by the Snapdragon 845. The results of the Snapdragon 855 are quite meagre, especially in a steady-state throughput workload such as Speedometer 2.0. There the Snapdragon 855 only manages a ~17% improvement over the last generation, and it also lags notably behind the Kirin 980.

Performance Scaling Ramp Test

One of the newer tests, which I introduced last year and used in our review of the Apple iPhone XS, is the scaling ramp test; there it showcased the improved DVFS responsiveness of iOS 12 across several generations of iPhones.
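As a rough illustration of what a ramp test of this kind measures, here is a minimal Python sketch. It is not the tool used for the charts in this article; the function names `busy_chunk` and `ramp_trace`, the idle time, and the chunk sizes are all assumptions chosen for illustration. The idea is simply to let the system go idle, then run identical chunks of work back-to-back and timestamp each one, so that shrinking chunk durations reveal the CPU clocking up.

```python
import time


def busy_chunk(iterations: int) -> int:
    """Do a fixed amount of integer work (hypothetical workload)."""
    acc = 0
    for i in range(iterations):
        acc = (acc + i * i) & 0xFFFFFFFF
    return acc


def ramp_trace(idle_s: float = 0.5, chunks: int = 200, iterations: int = 20_000):
    """Return a list of (elapsed_ms, chunk_duration_ms) samples.

    The sleep gives the frequency governor a chance to drop to a low
    power state; the loop then runs identical chunks and records how
    long each one took and when it finished.
    """
    time.sleep(idle_s)
    start = time.perf_counter()
    samples = []
    for _ in range(chunks):
        t0 = time.perf_counter()
        busy_chunk(iterations)
        t1 = time.perf_counter()
        samples.append(((t1 - start) * 1000.0, (t1 - t0) * 1000.0))
    return samples
```

Plotting the inverse of the chunk duration against elapsed time gives a relative performance curve over time. An interpreted harness like this adds noise of its own, so a native implementation would be the better choice for real measurements.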

I quickly ran this on the S855 QRD to get a sense of the scheduler and DVFS mechanism:

Here we see the Snapdragon 855 QRD being able to scale from a sleeping idle state to its maximum performance state in around 100ms. For comparison, I also show the scaling behaviour of the S845 in both the S9+ and the Pixel 3. The difference between the Pixel 3’s aggressive boost behaviour and the S9+’s more step-wise frequency scaling is the best visual representation of the perceived responsiveness difference between the two devices.

The Snapdragon 855 falls somewhere in between the two. It should be noted that the workload gets boosted to an “efficient” big core at 2.45GHz in around 40ms, which is very fast scaling behaviour.
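Figures such as "around 100ms to maximum performance" can be read off a ramp trace mechanically. The sketch below shows one way to do that, assuming the trace is a list of (time in ms, relative performance) pairs. The helper name `time_to_fraction` and the numbers in `example_trace` are made up for illustration; they loosely echo the shape described above and are not measurements from any device.

```python
def time_to_fraction(trace, fraction=0.95):
    """Return the first timestamp (ms) at which performance reaches
    `fraction` of the peak value seen in the trace.

    `trace` is a non-empty list of (time_ms, performance) pairs in
    chronological order.
    """
    peak = max(perf for _, perf in trace)
    for t_ms, perf in trace:
        if perf >= fraction * peak:
            return t_ms
    return None  # only reachable if fraction > 1


# Illustrative, made-up trace: NOT measured data.
example_trace = [
    (0.0, 0.30),
    (20.0, 0.55),
    (40.0, 0.85),
    (70.0, 0.90),
    (100.0, 1.00),
    (120.0, 1.00),
]

print(time_to_fraction(example_trace, 0.80))  # 40.0 for the made-up trace
print(time_to_fraction(example_trace, 0.95))  # 100.0 for the made-up trace
```

Using a fraction of the peak rather than the exact peak makes the result less sensitive to small run-to-run variation at the top of the curve.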

 

Comparing the Snapdragon 855 against the Kirin 980, we see that the Snapdragon isn’t any slower in reaching the maximum performance states. What is odd in these results is that the workload sees a significant pause of ~2.4ms when migrating over from the little cores, something that seems to affect only devices with Qualcomm’s custom scheduler. It’s an interesting find that I’ll have to investigate more.
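A pause like the one described shows up in a trace as an unusually large gap between consecutive sample timestamps. The following is a minimal sketch of how such gaps could be flagged, assuming the trace provides one timestamp per completed chunk of work; the helper name `find_pauses` and the threshold are hypothetical choices, not part of the test used here.

```python
def find_pauses(timestamps_ms, min_gap_ms):
    """Return (start_ms, gap_ms) for every pair of consecutive samples
    that are more than `min_gap_ms` apart."""
    pauses = []
    for prev, cur in zip(timestamps_ms, timestamps_ms[1:]):
        gap = cur - prev
        if gap > min_gap_ms:
            pauses.append((prev, gap))
    return pauses


# Made-up timestamps: samples every 0.5ms with one 2.5ms stall.
print(find_pauses([0.0, 0.5, 1.0, 3.5, 4.0], min_gap_ms=1.0))  # [(1.0, 2.5)]
```

The threshold has to sit well above the normal chunk duration, otherwise ordinary slow chunks at low frequencies would be reported as pauses.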

Overall, real-world performance of the Snapdragon 855 is a bit lower than I had expected it to be. I’m not exactly sure what the cause here is; on the scheduler side we’ve verified that the workload doesn’t inherently scale slower than the Kirin 980. The only other explanation I could see is that we might be seeing some disadvantage of the smaller L3 cache or even the higher DRAM latency.

As we’ve seen in past Snapdragon performance previews, final commercial device performance is subject to change, and it’s possible that performance will be better tuned in actual shipping phones.

 


132 Comments

  • tipoo - Tuesday, January 15, 2019

    Untrue. Apple's cores are wider, deeper, and more OoO than anything else in mobile, and use massive caches at that. You have it reversed: if Android could use the A12 it would post impressive benchmarks. It's the hardware design.

    Low level benchmarks are meant to remove the OS from the equation. Proof is in the pudding.
  • goatfajitas - Tuesday, January 15, 2019

    The A12 is a great CPU, but it's not magic. It's all ARM. The difference is in the implementation and control that Apple has with integration. Whatever though, both ways have benefits and downsides. I am just saying that people that think it's all about this CPU that is somehow years ahead of everyone else are mistaken as to the reality of the situation. Suffice to say, it's all fast.
  • axius81 - Tuesday, January 15, 2019

    This just doesn't make sense. "It's all ARM." Yeah, sure, and one company's implementation of that instruction set can absolutely be superior.

    That's like saying "It's all x86 / x86-64." when we're comparing AMD and Intel. One can *absolutely* be faster than the other at implementing that instruction set - and in practice, is.

    Apple makes amazing ARM chips, irrespective of iOS.
  • goatfajitas - Tuesday, January 15, 2019

    They are great chips, I am just saying they are not (hardware-wise) way beyond what the competition is doing. A lot of that performance is OS, tight integration with apps, drivers, APIs etc., as it's all controlled by one company. That isn't a bad thing, that is a good thing for Apple customers.
  • techconc - Tuesday, January 15, 2019

    Actually, Apple is significantly ahead of what the competition is doing with ARM based chips. This can be objectively measured.
  • tipoo - Wednesday, January 16, 2019

    What do you call their massive cache and issue width advantage if not being hardware-wise beyond the competition? It's not magic, but Apple is clearly spending more on die area than Qualcomm is.
  • bji - Tuesday, January 15, 2019

    Yeah I don't think you know what you're talking about. I think you read somewhere that some of Apple's performance/stability superiority over Android comes from Apple controlling the whole stack and you've generalized that into places where the statement just isn't true.
  • techconc - Tuesday, January 15, 2019

    You seem to conflate the ARM instruction set with the actual design of the chip. You then play off Apple's obvious advantages as some sort of magic... err.. "integration" as you call it. That's nonsense. You might be able to claim that for a specific application, but not for generic benchmarks.
  • tipoo - Wednesday, January 16, 2019

    I didn't say it was magic. I said it's not entirely down to some ambiguous "optimization" with the OS. The cores themselves are physically impressive regardless of OS. It's when people play it off as some pie-in-the-sky optimization advantage that they're claiming magic; you can't make a 3-wide Braswell core fly just with vertical integration.

    "It's all ARM."

    This shows me you may have missed a crucial step: Apple is only licensing the ARM instruction set, but otherwise they design the whole very wide, deep, very OoO core themselves.
