S-Browser - AnandTech Article

We start off with some browser-based scenarios such as website loading and scrolling. Since our device is a Samsung one, this is a good opportunity to verify the differences between the stock browser and Chrome, as we've identified large performance discrepancies between the two applications in the past.

To give readers an idea of the actions logged, I've also recorded recreations of the actions. These recordings are not the actual events represented in the data, as I didn't want the recording to affect the CPU behaviour.

We start off by loading an article on AnandTech and quickly scrolling through it. Most of the high computational load comes at the beginning of the events, while the website is being loaded and rendered.

Starting off with a look at the little cluster's behaviour:

The time period of the data is 11.3s, as represented on the x-axis of the power state distribution chart. During the rendering of the page there doesn't seem to be any particularly high load on the little cores in terms of threads, as we only see about 1 little thread use up around 20% of the CPU's capacity. Still, this causes the cluster to remain at around the 1000MHz mark and causes the little cores to mostly stay in their active power state.

Once the website is loaded around the 6s mark, threads begin to migrate back to the little cores. Here we actually see them being used quite extensively, with peaks of 70-80% usage. There are bursts where it may seem like the total number of concurrent threads on the little cluster exceeds 4, but the cluster is still not dramatically overloaded.

Moving on to the big cluster:

On the big cluster, we see an inversion of the run-queue graph. Where the little cores didn't have many threads placed on them, we see large activity on the big cluster. The initial website rendering is clearly done by the big cluster, and it looks like all 4 cores have working threads on them. Once the rendering is done and we're just scrolling through the page, the load on the big cluster is mostly limited to 1 large thread.

What is interesting to see here is that even though it's mostly just 1 large thread that requires performance on the big cores, most of the other cores still have some sort of activity on them which causes them to not be able to fall back into their power-collapse state. As a result, we see them stay within the low-residency clock-gated state.
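
The power state distribution discussed here comes down to residency: the share of the capture window a core spends in each state. As a minimal sketch, assuming a log of state-entry timestamps, the calculation could look like the following Python. The trace format, the function name state_residency, and the sample numbers are all hypothetical and are not taken from the article's data; only the three state names come from the text.

```python
from collections import defaultdict


def state_residency(transitions, end_time):
    """Return the fraction of the capture window spent in each power state.

    transitions: list of (timestamp_s, state) tuples sorted by time; each
    entry marks the moment the core entered that state.
    end_time: timestamp (in seconds) at which the capture ends.
    """
    if not transitions:
        return {}
    totals = defaultdict(float)
    # Pair each transition with the next one to get how long the state lasted;
    # the last state lasts until the end of the capture.
    boundaries = transitions[1:] + [(end_time, None)]
    for (start, state), (stop, _) in zip(transitions, boundaries):
        totals[state] += stop - start
    window = end_time - transitions[0][0]
    return {state: duration / window for state, duration in totals.items()}


# Hypothetical trace for a single big core over a 10 s window (made-up values).
trace = [
    (0.0, "active"),
    (2.0, "clock-gated"),
    (5.0, "active"),
    (6.0, "power-collapse"),
    (8.0, "clock-gated"),
]

residency = state_residency(trace, end_time=10.0)
for state, share in sorted(residency.items()):
    print(f"{state}: {share:.0%}")
```

A core that keeps getting small bits of work, as described above, would show up in such a summary as a large clock-gated share and a small power-collapse share.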

On the frequency side, the big cores scale up to 1300-1500 MHz while rendering the initial site and 1000-1200 MHz while scrolling around the loaded page.

When looking at the total number of threads on the system, we can see that the S-Browser makes good use of at least 4 CPU cores, with some peaks of up to 5 threads. All in all, this is a scenario which doesn't necessarily make use of 8 cores per se; however, the 4+4 setup of big.LITTLE SoCs does seem to be fully utilized for power management, as the computational load shifts between the clusters depending on the needed performance.
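
The system-wide thread count is simply the sum of the two clusters' run-queue depths at each sampling instant. The Python sketch below illustrates that bookkeeping under stated assumptions: the function name summarize_threads, the core_limit parameter, and the sample values are hypothetical and only loosely shaped like the behaviour described above (big cluster busy during rendering, little cluster busier while scrolling).

```python
def summarize_threads(little, big, core_limit=4):
    """Combine per-cluster run-queue depths into system-wide figures.

    little, big: equal-length lists of run-queue depth samples, i.e. the
    number of runnable threads on each cluster at every sampling instant.
    core_limit: thread count above which a sample is counted as exceeding
    what a single 4-core cluster could run concurrently.
    """
    if len(little) != len(big):
        raise ValueError("both clusters need the same number of samples")
    if not little:
        raise ValueError("need at least one sample")
    total = [l + b for l, b in zip(little, big)]
    return {
        "average": sum(total) / len(total),
        "peak": max(total),
        # Fraction of samples where more than core_limit threads were runnable.
        "share_above_limit": sum(t > core_limit for t in total) / len(total),
    }


# Hypothetical samples (made-up values, not the article's measurements).
little_rq = [0.5, 0.5, 1.0, 2.0, 3.0, 2.5, 3.0, 2.0]
big_rq = [3.5, 4.0, 4.0, 2.0, 1.0, 1.0, 1.5, 1.0]

summary = summarize_threads(little_rq, big_rq)
print(summary)
```

Summing before averaging matters here: the per-cluster graphs each look modest on their own, and it is only the combined figure that shows how many cores' worth of work the browser generates.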


157 Comments


  • yankeeDDL - Tuesday, September 1, 2015 - link

    Just wanted to say that it's a great article. Well done and very interesting: the use of 4+4 cores on a mobile platform, while on a PC we still have plenty of 2-core CPUs, seemed quite ridiculous. But no, clearly, it makes sense.
  • Tolwyns - Tuesday, September 1, 2015 - link

    Very interesting article. These tests were done on Android 5, I take it. I know that this analysis is geared toward current hardware, but most of the "4 cores are only marketing" discussion was quite a while back, when most devices had some version of Android 4. I wonder if the benefits of more cores did show up then. The second thing I'm interested in is "How much of this is applicable to other SoCs". Not much, I gather. And related to that, "How much of this is limited to Samsung devices", because they made the CPU and the firmware/software layer of the tested device.
  • SunLord - Tuesday, September 1, 2015 - link

    I'm kinda curious how an 8-core version of the X20 with 2 low-power, 4 mid-power and 2 high-power cores would perform.
  • Shadowmaster625 - Tuesday, September 1, 2015 - link

    It is kind of a misleading analysis. One single haswell core could juggle all of these processes and still have plenty of time to sleep. So you're not really telling us anything here. Is a wider fatter core better than all these narrow underpowered cores? Given the performance and power consumption of the apple SoCs, I would still have to say yes.
  • IanHagen - Tuesday, September 1, 2015 - link

    This! When developing for iOS I usually have to spawn several threads (queues in Apple's world) for things that would otherwise block the main queue, which would cause the UI to "freeze", and the dual-core SoCs inside the devices I'm targeting are munching my threads absolutely fine. Just by saying that the several extra cores found in Android phones aren't sleeping, you're not coming to any definitive conclusion about any clear advantage of having them.
  • nightbringer57 - Tuesday, September 1, 2015 - link

    The thing is that when you have 4 threads, 4 cores can potentially do the job more efficiently with performance equal to a single core with 4 times the execution speed.
  • nightbringer57 - Tuesday, September 1, 2015 - link

    *by efficiently, I mean, using less power*
  • metafor - Tuesday, September 1, 2015 - link

    Potentially, but not necessarily. Threading and thread migration aren't free. It depends on how much performance you really need. The A57(R3), for instance, at very low frequencies is actually slightly more power efficient than the A53 at its peak frequency (surprising, I know).

    If you have 4 threads that need absolutely-bare-minimum performance that a min-frequency single-core could handle, waking up 4 cores (even if they're smaller) and loading the code/data into the caches of each of those cores isn't necessarily a clear win. Especially if they share the same code.
  • lilmoe - Tuesday, September 1, 2015 - link

    "The A57(R3), for instance, at very low frequencies is actually slightly more power efficient than the A53 at its peak frequency (surprising, I know)."

    Cool story. Except that, in most of the smaller multithreaded workload cases, the little cores usually aren't near their saturation levels. Also, in most cases, when they _do_ get saturated, the workload is transferred and dealt with by a big core or two in short bursts.

    Even if it isn't a "clear win", in *some* workloads mind you, saying that there isn't any apparent merit in these configurations is really irresponsible.
  • metafor - Tuesday, September 1, 2015 - link

    I don't think I said there's no merit to such configurations. I simply said parallelizing a workload isn't always a clear win over using a single core. It depends on the required performance level and the efficiency curve of the small core and big core.

    If 4 threads running on 4 small cores at 50% FMax can be done by one big core at FMin without wasting any cycles, the advantage actually goes to the big core configuration. The small core configuration works if there's a thread that requires so little performance, it'd be wasteful to run it on the big core even at FMin.

    The conclusion of which is best for the given workload isn't as clear cut as saying "look, the small cores are being used by a lot of threads!". But rather, by measuring power and perf using the two configurations.
