S-Browser - AnandTech Article

We start off with some browser-based scenarios such as website loading and scrolling. Since our device is a Samsung one, this is a good opportunity to verify the differences between the stock browser and Chrome, as we've identified large performance discrepancies between the two applications in the past.

To give readers an idea of the actions logged, I've also recorded recreations of the actions performed during logging. These are not the actual events represented in the data, as I didn't want the recording to affect the CPU behaviour.

We start off by loading an article on AnandTech and quickly scrolling through it. It's mostly at the beginning of the events that we're seeing high computational load as the website is being loaded and rendered.

Starting off with a look at the little cluster behaviour:

The time period of the data is 11.3s, as represented on the x-axis of the power state distribution chart. During the rendering of the page there doesn't seem to be any particularly high load on the little cores in terms of threads, as we only see about 1 little thread use up around 20% of the CPU's capacity. Still, this causes the cluster to remain at around the 1000MHz mark and causes the little cores to mostly stay in their active power state.

Once the website is loaded around the 6s mark, threads begin to migrate back to the little cores. Here we actually see them being used quite extensively, with peaks of 70-80% usage. There are bursts where it may seem like the total number of concurrent threads on the little cluster exceeds 4, but still nothing too dramatically overloaded.

Moving on to the big cluster:

On the big cluster, we see an inversion of the run-queue graph. Where the little cores didn't have many threads placed on them, we see large activity on the big cluster. The initial website rendering is clearly done by the big cluster, and it looks like all 4 cores have working threads on them. Once the rendering is done and we're just scrolling through the page, the load on the big cluster is mostly limited to 1 large thread.

What is interesting to see here is that even though it's mostly just 1 large thread that requires performance on the big cores, most of the other cores still have some sort of activity on them which causes them to not be able to fall back into their power-collapse state. As a result, we see them stay within the low-residency clock-gated state.
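As a rough illustration of how a power state distribution like the one charted here can be derived, the sketch below counts how often a core was observed in each state and converts the counts into shares of the sampled time. It assumes evenly spaced samples; the state names and the sample values are hypothetical and are not the data logged for this article.

```python
from collections import Counter


def state_distribution(samples):
    """Return the fraction of samples spent in each power state."""
    total = len(samples)
    if total == 0:
        return {}
    counts = Counter(samples)
    return {state: count / total for state, count in counts.items()}


# Hypothetical per-sample states for one big core (not the logged data).
core_samples = ["active"] * 3 + ["clock_gated"] * 6 + ["power_gated"] * 1

distribution = state_distribution(core_samples)
for state, share in distribution.items():
    print(f"{state}: {share:.0%}")
```

With made-up samples like these, a core that rarely reaches the power-gated state shows up as a large clock-gated share, which is the pattern described above for the mostly idle big cores.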

On the frequency side, the big cores scale up to 1300-1500 MHz while rendering the initial site and 1000-1200 MHz while scrolling around the loaded page.

When looking at the total number of threads on the system, we can see that the S-Browser makes good use of at least 4 CPU cores, with some peaks of up to 5 threads. All in all, this is a scenario which doesn't necessarily make use of 8 cores per se; however, the 4+4 setup of big.LITTLE SoCs does seem to be fully utilized for power management, as the computational load shifts between the clusters depending on the needed performance.
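To make the "total threads" view concrete, here is a minimal sketch of how per-cluster run-queue depths could be combined into a system-wide figure, and how one might check what share of the time at least 4 threads were runnable. It assumes both series were sampled at the same instants; the function names and all values are hypothetical and chosen only to mimic the shape of the load shifting from the big to the little cluster.

```python
def total_runnable(little_depths, big_depths):
    """Sum per-cluster run-queue depths sample by sample."""
    return [little + big for little, big in zip(little_depths, big_depths)]


def share_at_or_above(series, threshold):
    """Fraction of samples whose value is at least `threshold`."""
    if not series:
        return 0.0
    return sum(1 for value in series if value >= threshold) / len(series)


# Hypothetical run-queue depths (not the logged data): the big cluster
# carries the page rendering, the little cluster picks up work afterwards.
little = [0.25, 0.25, 0.25, 0.75, 2.5, 3.0]
big = [4.0, 3.75, 4.5, 1.0, 1.0, 1.25]

total = total_runnable(little, big)
print("total runnable threads:", total)
print("share of samples with >= 4 threads:", share_at_or_above(total, 4))
```

A figure like this only says how many threads were runnable at once, not whether running them in parallel was necessary, so it should be read as a measure of utilization and not of benefit.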


157 Comments


  • jjj - Wednesday, September 2, 2015 - link

    Fortune seems way heavy for example but even Amazon's home page (desktop version) seems not too friendly.
  • djscrew - Tuesday, September 1, 2015 - link

    Love the article, but after reading it, I feel like the articles you write comparing phone CPU performance & battery life are far more applicable. You lose access to so much of the information in this article that at the end of the day testing the actual phone & OS usage of the CPU makes more sense.
  • Daniel Egger - Tuesday, September 1, 2015 - link

    What I'm sincerely missing in this article is the differentiation between multi-processing and multi-threading, with the difference being that multi-processing is partitioning the workload across multiple processes whereas multi-threading spawns threads which are then run in the OS directly or again mapped to processes in different ways -- depending on the OS, in Linux they're actually mapped onto processes. Threads do share context with their creator, so shared information requires locking, which wastes performance and increases waiting times. The solution to this in the threading-happy world is to throw more threads at a problem in the hope that locking contention doesn't go through the roof and there's always enough work to do to keep the cores busy.

    So the optimum way to utilise resources to a maximum is actually not to use MT but MP for the heavy lifting and make sure that the heavy work is split evenly across the available number of to-be-utilised cores.

    For me it would actually be interesting to know whether some apps are actually clever enough to do MP for the real work or are just stupidly creating threads (and also how many).

    Since someone mentioned iOS: Actually if you're using queues this is not a traditional threading model but more akin to a MP model where different queues handled by workers (IMNSHO confusingly called thread) are used to dispatch work to in a usually lock free manner. Those workers (although they can be managed manually) are managed by the system and adjust automatically to the available resources to always deliver the best possible performance.
  • extide - Tuesday, September 1, 2015 - link

    Don't forget, most of it is in Java, so it's probably one Java process with several threads, not multiple Java processes. The native apps could go either way.
  • Daniel Egger - Tuesday, September 1, 2015 - link

    One interesting question here is: What does Google do? Chrome on regular desktop OS uses one process per view to properly isolate views from one another; does anybody know whether Chrome on Android does the same? I couldn't figure it out from the available documentation...
  • praeses - Tuesday, September 1, 2015 - link

    Next time can the colour legend below the graphs have their little squares enlarged to the height of the text? For those who are colour-challenged, it would make it a lot easier to match even when the image is blown-up. There doesn't seem to be a reason to have them so small.
  • endrebjorsvik - Thursday, September 3, 2015 - link

    I would rather make the colors more intuitive. For instance by using a colormap like the jet colormap from Octave/Matlab. Low clock frequencies should be mapped to cool colors (blue to green), while high clock frequencies should be mapped to warm colors (yellow to red). By doing that you just have to look at the legend only once. After that, the colors speak for themselves.

    The plots are really hard to read now when you have green at both low and high frequency (700 and 1400), and four shades of blue evenly distributed over the frequency range (500, 900, 1100, 1500). When I want to read such a plot, I don't care whether the frequency is 600 or 700. So these two colors don't have to be very different. But 500 and 1500 should be vastly different. The plots in this article are made in the opposite way. All the small steps have big color differences in order to be able to distinguish every small step from each other. But at some point the map ran out of major colors and started repeating the spectrum again, with only slightly different colors.
  • qlum - Tuesday, September 1, 2015 - link

    It would be interesting to see how desktop systems hold up in these tests, especially with AMD's 2 cores per module design.
  • name99 - Tuesday, September 1, 2015 - link

    Andrei,
    After so much work on your part it seems uncouth to complain! But this is the internet, so here goes...

    If you ever have the energy to revise this topic, allow me to suggest two changes to substantially improve the value of the results:

    With respect to how results are displayed:
    - Might I suggest you change the stacking order of the Power State Distribution graphs so that we see Power Gated (ie the most power saving state) at the bottom, with Clock Gated (slightly less power saving) in the middle, and Active on top.
    - The frequency distribution graphs make it really difficult to distinguish certain color pairs, and to see the big picture. Might I suggest that a plot using just grey scale (eg black at lowest frequency to white at highest frequency) would actually be easier to parse and to show the general structural pattern?

    As a larger point, while this data is interesting in many ways, it doesn't (IMHO) answer the real question of interest. Knowing that there are frequently four runnable threads is NOT the same thing as knowing that four cores are useful, because it is quite possible that those threads are low priority, and that sliding them so as to run consecutively rather than simultaneously would have no effect on perceived user performance.

    The only way, I think, that one can REALLY answer this particular question ("are four cores valuable, and if so how") is an elimination study. (Alternatives like trying to figure out the average run duration of short term threads is really tough, especially given the granularity at which data is reported).

    So the question is: does Android provide facilities for knocking out certain cores so that the scheduler just ignores them? If so, I can suggest a few very interesting experiments one might run to see the effects of certain knockout patterns. In each case, ideally, one would want to learn
    - "throughput" style performance (how fast the system scores on various benchmarks)
    - effect on battery usage
    - "snappiness" (which is difficult to measure objectively, but maybe is obvious enough for subjective results to be noticed).

    So, for example, what if we knock out all the .LITTLE cores? How much faster does the system seem to run, with what effect on battery? Likewise if we knock out all the big cores? What if we have just two big cores (vaguely equivalent to an iPhone 6)? What if we have two big and two LITTLE cores?

    I don't have any axe to grind here --- I've no idea what these experiments will show. But it would certainly be interesting to know, for example, if a system consisting of only 4 big cores feels noticeably snappier than a big.LITTLE system while battery life is 95% as long? That might be a tradeoff many people are willing to make. Or, maybe it goes the other way --- a system with only one big core and 2 little cores feels just as fast as an octocore system, but the battery lasts 50% longer?
  • justinoes - Tuesday, September 1, 2015 - link

    This was a seriously fascinating read. It points to a few things...

    First, either Android has some serious ability to take advantage of multiple cores or ILP has improved dramatically. I remember when the Moto X (1st Gen) came out with a dual core CPU, engineers at Moto said that even opening many websites didn't use more than two cores on most phones. [http://www.cnet.com/news/top-motorola-engineer-def...] Does this mean that Android has stepped up its game dramatically or was that information not true to begin with?

    Second, It seems like there are two related components to the question that I have about multi-core performance. First, do extra cores get used? (You show that they do. Question answered.) Secondly, do extra cores matter from a performance perspective (if clock speed is compromised or otherwise)? (This is probably harder to answer because cores and clock are confounded - better CPU -> more cores, faster clock and complicated by the heterogeneous nature of these CPUs core setups.)

    I suppose the second question could be (mostly) answered by taking a homogeneous core CPU, disabling cores sequentially, and looking at the changes in user-experienced performance and power consumption. I'm sure some people will buy something with the maximum number of cores, but I'm just curious about whether it'll make a difference in real-world situations.
