The Mobile CPU Core-Count Debate: Analyzing The Real World

Name: The Mobile CPU Core-Count Debate: Analyzing The Real World
Item: The Mobile CPU Core-Count Debate: Analyzing The Real World
Author: Andrei Frumusanu

by Andrei Frumusanu on September 1, 2015 8:00 AM EST

Posted in
Smartphones
CPUs
Mobile
SoCs

157 Comments | Add A Comment

157 Comments

S-Browser - AnandTech Article

We start off with some browser-based scenarios such as website loading and scrolling. Since our device is a Samsung one, this is a good opportunity to verify the differences between the stock browser and Chrome as we've in the past identified large performance discrepancies between the two applications.

To also give the readers an idea of the actions logged, I've also recorded recreations of the actions during logging. These are not the actual events represented in the data as I didn't want the recording to affect the CPU behaviour.

We start off by loading an article on AnandTech and quickly scrolling through it. It's mostly at the beginning of the events that we're seeing high computational load as the website is being loaded and rendered.

Starting off at a look of the little cluster behaviour:

The time period of the data is 11.3s, as represented in the x-axis of the power state distribution chart. During the rendering of the page there doesn't seem to be any particular high load on the little cores in terms of threads, as we only see about 1 little thread use up around 20% of the CPU's capacity. Still this causes the cluster to remain at around the 1000MHz mark and causes the little cores to mostly stay in their active power state.

Once the website is loaded around the 6s mark, threads begin to migrate back to the little cores. Here we actually see them being used quite extensively as we see peaks of 70-80% usage. We actually have bursts where may seem like the total concurrent threads on the little cluster exceeds 4, but still nothing too dramatically overloaded.

Moving on to the big cluster:

On the big cluster, we see an inversion of the run-queue graph. Where the little cores didn't have many threads placed on them, we see large activity on the big cluster. The initial web site rendering is clearly done by the big cluster, and it looks like all 4 cores have working threads on them. Once the rendering is done and we're just scrolling through the page, the load on the big cluster is mostly limited to 1 large thread.

What is interesting to see here is that even though it's mostly just 1 large thread that requires performance on the big cores, most of the other cores still have some sort of activity on them which causes them to not be able to fall back into their power-collapse state. As a result, we see them stay within the low-residency clock-gated state.

On the frequency side, the big cores scale up to 1300-1500 MHz while rendering the initial site and 1000-1200 while scrolling around the loaded page.

When looking at the total amount of threads on the system, we can see that the S-Browser makes good use of at least 4 CPU cores with some peaks of up to 5 threads. All in all, this is a scenario which doesn't necessarily makes use of 8 cores per-se, however the 4+4 setup of big.LITTLE SoCs does seem to be fully utilized for power management as the computational load shifts between the clusters depending on the needed performance.

Introduction & Methodology Browser: S-Browser - AnandTech Frontpage

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

157 Comments

View All Comments

lilmoe - Tuesday, September 1, 2015 - link
"If 4 threads running on 4 small cores at 50% FMax can be done by one big core at FMin without wasting any cycles, the advantage actually goes to the big core configuration."

That's hardly a real-world or even valid comparison. Things aren't measured that way.

On the chip level, it all boils down to a direct comparison, which by itself isn't telling much because the core configuration of two different chips isn't usually the only difference. Other metrics start to kick in. Those arguing dual-core wide cores are thinking iOS, which by itself invalidates the comparison. We're talking Android here.

On the software side, real life scenarios aren't easy to quantify.

This article simply states the following:

- Android currently has relatively good parallelism capabilities for common workloads,
- Therefore, there is merit in 8 small core and 4x4 big.LITTLE configurations from an efficiency perspective. The latter being beneficial for comparable performance with custom core designs when needed.

Most users are either browsing, texting, or on social media. Most of the games played, BY FAR, are the less demanding ones that usually don't trigger the big cores.

I've said this before in reply to someone else. When QC and Samsung release their custom quad core designs, which do honestly believe would me more power efficient, those chips as-is? Or the same chips in addition to little cores in big.LITTLE (provided they can be properly configured that way).

A wise man once said: "efficiency is king".
lilmoe - Tuesday, September 1, 2015 - link
You guys are deliberately stretching the scope in which the findings of this article applies to.

Just stop.

It has been clearly stated that this only applies to how Android (and Android Apps) manage to benefit from more cores in terms of efficiency. It was clearly stated that this doesn't apply to other operating systems "iOS in particular".
Nenad - Tuesday, September 1, 2015 - link
I agree.

Especially since any app designed for performance will launch as many threads as needed to use available cores. So looking if "there are more than 4 threads active on 4+4 core CPU" can be misleading. If you run those tests on 2 core CPU, would number of threads remain same or be reduced? How about 10 core CPU?

In other words, only comparing performance and power usage (and not number of threads) would tell us if 4+4 is better than 4 or than 2 cores. Problem with that is finding different CPUs on same technology platform (to ensure only number of cores is different, and not 20nm vs 28nm vs different process etc).

Barring that, comparison of power performance per cost among 4+4 vs 4 vs 2 is also better indicator than comparing number of threads.

TD;DR: it is 'easy' to have more threads than CPU cores, but it does not indicate neither performance nor power usage.
ThisIsChrisKim - Tuesday, September 1, 2015 - link
The question being answered was this, "On the following pages we’ll have a look at about 20 different real-world often encountered use-cases where we monitor CPU frequency, power states and scheduler run-queues. What we are looking for specifically is the run-queue depth spikes for each scenario to see just how many threads are spawned during the various scenarios."

It was simply assessing if multiple cores were actually used in the little-big design. Not a comparison of different designs.
Aenean144 - Tuesday, September 1, 2015 - link
Yes, that was what the article was trying to find out, but it didn't answer the question of whether 4-core and 8-core designs are better than 2 core or 3 core designs. That's been the contention of the "can't use that many cores" mantra.

All this article has explained is that the OS scheduler can distribute threads across a lot of cores, something hardly anyone has a problem with.

What I'd like to see is the performance or user experience difference between 2-core, 4-core, and 8-core designs, all using the same SoC. There's nothing magic about this. In PCs today, we've largely settled on 2-core and 4-core designs for consumer systems. 6-core and 8-core systems for gaming rigs, but that's largely an artifact of Intel's SKUs.

So, if I believe the marketing, these smartphones really need to have 8-core designs when my laptop or desktop, capable of handling an order of magnitude more computation needs, with just 2-cores or 4-cores?
prisonerX - Tuesday, September 1, 2015 - link
You're missing the point. Your desktop is faster because it uses much more power. Mobile phones have more cores because it's more efficient to use more lower power cores than fewer high power cores.
Aenean144 - Tuesday, September 1, 2015 - link
My desktop and laptop are power limited at their respective TDPs, and it's been this way for a very long time. If more cores were the answer, why are we sitting at mostly 2-core and 4-core CPUs in the PC space?

All this stuff isn't new whatsoever, and the PC space went through the same core-count race 10 years ago. There has to be something systematically different such that Intel went down this path while ARM smartphones are in the midst of a core-count race.

I've read that it could be an economics thing as ARMH gets money on a per-core basis while spending money on complicated DVFS schemas and high IPC cores isn't worth it for them. Maybe in the PC space, we don't need the performance anymore.
mkozakewich - Wednesday, September 2, 2015 - link
It's because we're always fighting for quicker single-thread work. A lot of things can be parallelized, but there are also a lot of things today that aren't. I agree that Intel should try out some kind of big.LITTLE thing with a couple Atom cores and a Core M, just to see how it runs.
prisonerX - Wednesday, September 2, 2015 - link
It's Intel's backward looking strategy. They're competing in high power/high single thread performance because they can win that with legacy desktop software and a legacy CPU architecture.

Meanwhile, the rest of the world is going low power multithreaded, because that's the future. Going forward it's the only way to increase performance with low power. Google are correctly pushing an aggressively multithreaded software architecture.

Intel have already hit the wall with single threaded performance. They can win the present but not the future. Desktops aren't moving forward because no-one cares about them except gamers, and gamers largely don't care about power usage because they don't run on batteries.
Frihed - Friday, September 4, 2015 - link
In the desktop, the costs of making a chip matters much more, as the bigger chips are some times more expensive than a hole mobile device. The costs of putting more cores in the chip counts there.

The Mobile CPU Core-Count Debate: Analyzing The Real World

S-Browser - AnandTech Article

Post Your Comment

157 Comments

View All Comments

lilmoe - Tuesday, September 1, 2015 - link

lilmoe - Tuesday, September 1, 2015 - link

Nenad - Tuesday, September 1, 2015 - link

ThisIsChrisKim - Tuesday, September 1, 2015 - link

Aenean144 - Tuesday, September 1, 2015 - link

prisonerX - Tuesday, September 1, 2015 - link

Aenean144 - Tuesday, September 1, 2015 - link

mkozakewich - Wednesday, September 2, 2015 - link

prisonerX - Wednesday, September 2, 2015 - link

Frihed - Friday, September 4, 2015 - link

Log in

Don't have an account? Sign up now