Overall Analysis & Conclusion

Hopefully we've managed to cover a few of the more common use-cases that are routinely encountered in daily usage on Android and get a good idea of how applications behave. We've seen some quite expected numbers for some use-cases but also stumbled on very large surprises that weren't quite as obvious. 

There were two cases that especially stood out: browser usage, and application installation and updates. It could be argued that app updates are merely a corner-case that doesn't affect a user's experience much. After all, installing and updating apps represents only an insignificant fraction of what a user does on a device. Browser usage and web-page rendering in general, however, is one of the most common scenarios on a smartphone, and here's where we encountered the largest surprises.

When I started out on this piece, the goal I set was to either confirm or debunk how useful homogeneous 8-core designs would be in the real world. The fact that Chrome and, to a lesser extent, Samsung's stock browser were able to consistently load up to 6-8 concurrent processes while loading a page suddenly gives a lot of credence to these 8-core designs, which we would otherwise not have thought capable of fully using their CPU configurations. In terms of pure computational load, web-page rendering remains one of the heaviest tasks on a smartphone, so it's very encouraging to see that today's web rendering engines are able to make good use of parallelization to spread the load between the available CPU cores.

It's hard to summarize the vast amount of data of the last 16 pages in an orderly and correct manner. After all, we are talking about extremely varying use-cases and time-scales for each scenario. While averaging the metrics over the course of a scenario might seem a good idea at first, one has to keep in mind that this wouldn't properly represent cases where load peaks for short durations. It's these small computational bursts which are most often the cause of "lags" and frame-drops. So to better represent the bottlenecks which determine the user-visible cases of application speed and performance, we instead use the 90th percentile of the CPU run-queue depths:

90th Percentile Run-Queue Depth Averages

Scenario                        Little Cluster   Big Cluster   Little + Big Clusters
S-Browser - AnandTech Article   2.27             2.19          3.87
S-Browser - AnandTech FP        3.12             1.25          4.15
Chrome - AnandTech FP           5.69             1.84          7.10
Chrome - BBC Frontpage          5.00             2.00          6.22
Hangouts Launch                 2.77             2.11          4.01
Hangouts Writing A Message      2.80             0.05          2.57
Reddit Sync Launch              1.84             1.11          2.38
Reddit Sync Scrolling           0.95             1.03          1.46
Play Store Open & Scroll        2.87             0.78          3.45
Play Store App Updates          3.73             5.42          8.51
Camera: Launch                  1.45             2.73          2.98
Camera: Still Snapshot          4.12             0.87          4.59
Camera: Video Recording         5.17             2.04          5.42
Real Racing 3 Launch            2.16             1.33          3.26
Real Racing 3 Playing           2.09             0.89          2.96
Modern Combat 5 Playing         2.09             0.73          2.68

I was wary of creating this table as it can be easily misinterpreted: because run-queue depth averages are not directly representative of the number of concurrent threads in a given scenario, we lose information when aggregating them for a given cluster or the whole system. This happens, for example, on the big cluster in the AT article load scenario, where the 90th percentile of the aggregate rq-depth reaches 2.19 while in reality this figure is composed of 4 medium-high threads. Readers should thus keep in mind the detailed graphs of the preceding pages when reading the table.
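To make the two caveats above concrete, here is a minimal Python sketch. All sample values are hypothetical and chosen purely for illustration; they are not taken from the measurements, and the nearest-rank percentile used here is an assumption rather than the exact method behind the table. It shows how a mean can hide short bursts that the 90th percentile exposes, and how two very different thread layouts can produce the same aggregate depth.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical run-queue depth trace: mostly light load plus a short burst.
rq_depth = [0.2] * 16 + [4.0] * 4

mean_depth = sum(rq_depth) / len(rq_depth)   # the burst is averaged away
p90_depth = percentile(rq_depth, 90)         # the burst dominates

# Hypothetical per-core depths: two busy cores vs. four moderately loaded cores.
two_heavy = [1.1, 1.1, 0.0, 0.0]
four_medium = [0.55, 0.55, 0.55, 0.55]

# Both layouts sum to roughly the same aggregate, so the thread count is lost.
aggregate_two = sum(two_heavy)
aggregate_four = sum(four_medium)

print(f"mean={mean_depth:.2f} p90={p90_depth:.2f}")
print(f"aggregate (2 heavy)={aggregate_two:.2f} aggregate (4 medium)={aggregate_four:.2f}")
```

In this toy trace the mean stays below 1 while the 90th percentile reports the burst depth, which is why the percentile is the more useful figure for spotting user-visible bottlenecks.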

While not directly the goal of the article, the collected data also serves as a perfect case-study for heterogeneous big.LITTLE SoCs. We've long seen discussions concerning what the "ideal" big.LITTLE configuration would be. There are several angles to this: the optimal little and big cluster core counts, and whether we're aiming for performance or power efficiency in each case. In terms of low- to medium-performance threads, we've had several cases where 4 little cores weren't enough. Web page rendering in Chrome in particular seems to be the killer use-case where actually having two clusters of highly efficient cores makes sense.

On the high-performance "big" cluster side, the discussion topic is more about whether 2 or 4 core designs make more sense. I think the decision here is not about performance but rather about power efficiency. A 2-core big-cluster design would provide more than enough performance for most use-cases, but as we've seen throughout our testing during interactive use it's more common than not to have 2+ threads placed on the big cluster. So while a 2-core design could handle bursts where ~3-4 threads are placed onto the big cluster, the CPUs would need to scale up higher in frequency to provide the same performance compared to a wider 4-core design. And scaling up higher in frequency has a quadratically detrimental effect on power efficiency as we need higher operating voltages. At the end of the day I think the 4 big core designs are not only the better performing ones but also the more efficient ones. 
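The frequency argument can be sketched with the textbook approximation for dynamic CPU power, P ≈ C·V²·f. The voltage and frequency points below are hypothetical and not measured from any SoC; the sketch also assumes the work splits perfectly across cores and ignores static leakage and scheduling overhead.

```python
def dynamic_power(cores, volts, ghz, capacitance=1.0):
    """Relative dynamic power of `cores` identical cores, using P ~ C * V^2 * f."""
    return cores * capacitance * volts ** 2 * ghz

# Same total throughput (4 core-GHz) delivered in two ways.
# Hypothetical operating points: a higher clock is assumed to need a higher voltage.
wide = dynamic_power(cores=4, volts=0.8, ghz=1.0)    # 4 cores at a low V/f point
narrow = dynamic_power(cores=2, volts=1.1, ghz=2.0)  # 2 cores at a high V/f point

print(f"4 cores @ 1.0 GHz: {wide:.2f} (relative units)")
print(f"2 cores @ 2.0 GHz: {narrow:.2f} (relative units)")
print(f"narrow / wide: {narrow / wide:.2f}x")
```

Under these assumed operating points the 2-core configuration draws noticeably more power for the same nominal throughput, purely because of the V² term. Real voltage curves differ per chip and process, so the exact ratio is only illustrative.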

This puts one particular vendor in quite an interesting position: MediaTek. Even if one isn't able to fully saturate a cluster, one can still derive power efficiency advantages because two small clusters can operate at separate frequencies and thus separate efficiency points. I've encountered enough scenarios that would in theory fit the Helio X20's tri-cluster design that I'm starting to think that such a design would actually be a very smart choice for current Android devices.

What about more traditional SoC configurations? As mentioned earlier, symmetric 8-core designs such as MediaTek's Helio X10 would, contrary to one's expectations, seemingly be able to take advantage of their higher core counts. So while it would be preferable to have higher performance cores such as Cortex A57s or A72s, one has to keep in mind that the target market of these architectures is limited to higher-end SoCs. The 8 little-core designs are mostly targeted at the entry- and mid-level, where adding a second Cortex A53 cluster can be a very cheap way of still providing benefits in everyday usage, particularly in web-browsing.

What is clear, though, is that despite some corner-cases the vast majority of applications do seem to be optimal for quad-core SoCs. This is why traditional 4-core and 4.4 big.LITTLE designs still appear to make the most sense in terms of providing a balanced configuration and making the most use of the hardware at hand. For big.LITTLE, even if there were no use-cases where all cores are concurrently used, it's not a big deal, as what we are aiming for in heterogeneous systems is power efficiency gains.

This is also the point of the discussion where the debate of the potential detrimental effect of having more cores comes into play: The fact that a SoC has more cores does not automatically mean it uses more power. As demonstrated in the data, modern power management is advanced enough to make extensive use of fine-grained power-gated idle states, thus eliminating any overhead there might be of simply having more physical cores on the silicon. If there are cases (And as we've seen, there are!) which make use of more cores then this should be seen purely as an added bonus and icing on the cake. 

What about narrow CPU-core number design philosophies? Would such designs make sense on Android? This is probably another question that our readers will ask themselves when looking at the data. Apple and recently Nvidia with their Denver architecture both choose to keep going the route of employing large 2-core designs that are strong in their single-threaded performance but fall behind in terms of multi-threaded performance.

For Apple it can be argued that we're dealing with a very different operating system, and it is likely that iOS applications are less threaded than their Android counterparts. But there are cases where this doesn't necessarily need to hold true: for example, browser rendering engines, as demonstrated, can be multi-threaded if adapted to do so. Native high-end games which already make use of multiple threads are also unlikely to differ in their threading logic between the platforms.

While such narrow CPU-core designs would have higher performance at a given frequency - it is not a direct indicator of the actual performance/W efficiency that a single thread would have on these chipsets. We still haven't had a chance to make a proper apples-to-apples comparison for these architectures so we're limited to theorycrafting with the data we currently have available to us:

What we see in the use-case analysis is that the number of use-cases where an application is visibly limited by single-threaded performance seems to be very small. In fact, in a large number of the analyzed scenarios the Cortex A57 cores of our test device would rarely need to ramp up to their full frequency beyond short bursts (thermal throttling was not a factor in any of the tests). On the other hand, scenarios where we'd find 3-4 high load threads don't seem to be particularly hard to find, and actually appear to be a pretty common occurrence. For mobile, the choice seems to be obvious due to the power curve implications. As long as the loads are not so small that it isn't worthwhile to spend the energy to bring a secondary core out of its idle state, one could generalize that if one is able to spread the load over multiple CPUs, it will always be preferable and more efficient to do so.

In the end, what we should take away from this analysis is that Android devices can make much better use of multi-threading than initially expected. There's very solid evidence that not only are 4.4 big.LITTLE designs validated, but we also find practical benefits of using 8-core "little" designs over similar single-cluster 4-core SoCs. For the foreseeable future it seems that vendors who rely on ARM's CPU designs will be well served with a continued use of 4.4 b.L designs. Only MediaTek seems to fall out of the norm here with its upcoming X20 SoC, and I'm definitely looking forward to seeing how it behaves in the real world. We'll also see some vendors revert to quad-core designs in their custom architectures. While we've yet to get a better picture of how these will behave in terms of performance and power, I think that 4 cores will be quite a reasonable target and sweet-spot for vendors to aim for.


  • lilmoe - Tuesday, September 1, 2015

    "we're seeing what or how windows & the x86 platform has stagnated"

    Your argument is highly inaccurate and extremely dated. This isn't Windows XP era anymore... Windows 10 and 10 Mobile might as well be better than Android in what you're giving kudos to Google for (which they've somewhat managed after YEARS of promises). There's still a huge chunk of overhead in Android's rendering pipeline that needs serious attention. Android has made huge improvements, yes, but there's still lots of work that needs to be done.

    @Impulses has a good point too; It's extremely difficult to get a fair apples-to-apples comparison when it comes to optimal handling of workloads for varying thermal limits. CPUs at ~2W TDP behave VERY differently from those at 15W, and both behave yet differently from those running at 37W+. This becomes evident when middle ground ~5W mobile CPUs are in the picture, like Intel's Core M, where devices running those are showing no better battery life than their 15W counterparts running the same OS. (Windows 10 is changing that, however, and is showing extreme battery savings in these lower TDPs, more so than the improvements in higher TDP parts, which tells a lot about W10).

    If that isn't clear enough already, read the article again. The author CLEARLY mentions in the first page not to make the mistake of applying the aforementioned metrics to other platforms and operating systems, and to strictly stick with Android and big.LITTLE.
  • Alexvrb - Tuesday, September 1, 2015

    Thank you lilmoe and name99! I read his comment and I was like really? These results don't support his claims and were never intended to compare platforms - as specifically stated by the author.
  • R0H1T - Thursday, September 3, 2015

    XP to win10 took what a decade & a half? Vista was the last major change, after XP, DX10 & UAC, with win7 then win8 & now win10 bringing only incremental updates. Yeah I call that slow & we've had quad cores since what nearly a decade now, even then a vast majority of systems (desktops+notebooks) are dual core or 2 cores+HT surely that must make you cringe! Then we have programs that don't make use of multiple cores efficiently &/or the latest instruction sets like AVX. There's just a single web browser, that I know of, which uses the latter on PC! Call it whatever you may or twist it however you like to but this is one of the major reasons that PC sales are declining not just "everyone owns one & so they don't need it" excuse that's thrown around far too often. So far as "extrapolating this article to my observations" argument is concerned, there's no need to do that since there's historical precedence & copious amount of evidence to support pretty much every word of what I've said.
  • Azethoth - Thursday, September 3, 2015

    Ugh dude, you have no idea what you are talking about. 4.4 architectures on a phone are a desperate attempt to reduce power usage. I am a programmer and compile times matter to me and threading helps. Even so going from 8 threads on my desktop CPU to 12 threads on the E CPU a year later only reduces a total recompile of 26 minutes by 2-3 minutes. But that E cannot clock as high, so in the regular incremental compile case it is slower. Do you get this? You are factually wrong for an actual core dependent use case.

    Now I can stick my head in the sand like you and pretend that more cores are automatically better but it just isn't for my workload. You may as well bitch that I should be running on multi thousand dollar server CPUs with 16 cores. Again no. They have their place in a server, but no place in my desktop.
  • Samus - Tuesday, September 1, 2015

    If "Google and Android" have 'nailed' MT then why do $600+ Android phones feel more sluggish, have a choppier UI, and launch programs slower than a 3 year old iPhone 5 or Lumia 800?

    Perhaps because the kernel and underlying architecture are so bloated because they need to support so many SOC's. They've resorted to heavy compression just to keep distribution sizes down, which also hits performance.

    Android only has one place, on cheap phones. You're an idiot if you buy a $600+ Android phone when you get the same crappy experience on a $50 Kyocera.

    I've tried so hard to like Android over the years, but every device I've had completely disappointed me compared to older Blackberry and modern iPhone devices where you don't need to find hacked distributions when manufactures drop the ball supporting the phone, or just make a crappy ROM in general. Even Nexus devices aren't immune to this and historically they haven't been very good phones, although admittedly, the only high-end Android phone worth buying is a Nexus, but now they cost so much it isn't justifiable.

    Basically I recommend two phones to people. If they want a cheap phone, get a OnePlus One or some other sub-$300 Android device. If your budget is higher, get an iPhone, or if you are adventurous, a WinMo device. At least the iPhone will receive support for 4-5 years and holds its value during that time.
  • Buk Lau - Tuesday, September 1, 2015

    I'm calling BS on most of your claims. Your experience with a Moto E (not saying it's a bad phone) will be vastly different from that of a Note 5, and those differences can start as obvious as how often you need to refresh your Chrome pages as you run out of RAM.
    What "600+" Android phone are you talking about that feels “more sluggish and slower” than a 3 year old iPhone? If you want people to take your claim seriously then at least provide some examples rather than this generic BS that anyone can easily come up with.
    The way Android is designed makes it kind of difficult to bring updates, surprising as you may find that. Every time the OS updates, there are changes to the HAL (hardware abstraction layer) and those changes can be minor or significant. It is then up to the SoC provider to provide the proper drivers needed after the HAL change, and they certainly won't provide it for free. At the same time, OEMs also have to decide how much the new update will impede performance. For example my first gen Moto X got an update to 5.1.1 a few months ago and despite the new features, there are still performance hits in places. Even older devices probably will do better on Jelly Bean and KitKat anyways since Google Play services can be updated independent of OS version.
    Here’s some useful info on why Android is as fragmented as it is
    http://www.xda-developers.com/opinion-android-is-i...
    The biggest reason Apple updated all those 4S isn’t because how they loved their users, but rather to purposely slow down their devices to force them to upgrade. You can just ask the 4S users around you to see what iOS 8 really meant for them.
    I do agree however that people should try more $300-400 devices that are near flagship level with compromises that are more tolerable, and this $600+ smartphone price should really tone itself down a bit.
  • Kutark - Tuesday, September 1, 2015

    Yeah i have to call bullshit on his claims too. I mean i know its anecdotal, but my buddies and i have had literally dozens of android phones over the years, as well as various iphones. And none of us have seen any kind of performance difference between the two. Im thinking he just had a shit experience with one android phone and like most people just wrote it off at that point.

    I have had a bad experience with an HTC Rezound, but every phone ive had before or after that has been fantastic. I absolutely LOVE my LG G3, its extremely responsive and fast, and i've never had issues with slowdowns on it. That being said i dont do any "gaming" (and i put gaming in quotes for a reason) on the phone, so i can't speak to that. But as far as browser, youtube, other apps, etc. It couldn't be more perfect.
  • Samus - Wednesday, September 2, 2015

    I'm an IT director and I have a "shit experience" with android phones people bring to me every week.

    Defending android is like defending your Kia Rio. It's a low cost tool to fit a low cost niche. The experience is the same no matter who is driving.
  • Kutark - Wednesday, September 2, 2015

    If you say so. As an IT director you should know that 99% of the time there is a problem, its user related and not hardware related. One thing i will give apple is that they lock their products down so hard that its much harder for the user to F it up. Whereas on more open platforms like android or windows, the user has much more control and thus much more ability to F things up royally.

    Whether thats a plus or a minus really just depends on what you're looking for. For people who want or need control over their hardware, its a plus, for people who just want something "to work" so to speak, its a minus.
  • mkozakewich - Wednesday, September 2, 2015

    Your claim that Apple is trying to slow down devices throws off your entire argument, really.
