S-Browser - AnandTech Article

We start off with some browser-based scenarios such as website loading and scrolling. Since our device is a Samsung one, this is a good opportunity to verify the differences between the stock browser and Chrome, as we have previously identified large performance discrepancies between the two applications.

To give readers an idea of the actions logged, I've also recorded recreations of them. These recordings are not the actual events represented in the data, as I didn't want the recording to affect the CPU behaviour.

We start off by loading an article on AnandTech and quickly scrolling through it. It's mostly at the beginning of the events that we're seeing high computational load as the website is being loaded and rendered.

Starting off with a look at the little cluster's behaviour:

The time period of the data is 11.3s, as represented on the x-axis of the power state distribution chart. During the rendering of the page there doesn't seem to be any particularly high load on the little cores in terms of threads, as we only see about 1 little thread use up around 20% of the CPU's capacity. Still, this causes the cluster to remain at around the 1000MHz mark and causes the little cores to mostly stay in their active power state.

Once the website is loaded around the 6s mark, threads begin to migrate back to the little cores. Here we actually see them being used quite extensively, with peaks of 70-80% usage. There are bursts where it may seem like the total number of concurrent threads on the little cluster exceeds 4, but still nothing too dramatically overloaded.

Moving on to the big cluster:

On the big cluster, we see an inversion of the run-queue graph. Where the little cores didn't have many threads placed on them, we see large activity on the big cluster. The initial website rendering is clearly done by the big cluster, and it looks like all 4 cores have working threads on them. Once the rendering is done and we're just scrolling through the page, the load on the big cluster is mostly limited to 1 large thread.

What is interesting to see here is that even though it's mostly just 1 large thread that requires performance on the big cores, most of the other cores still have some sort of activity on them which causes them to not be able to fall back into their power-collapse state. As a result, we see them stay within the low-residency clock-gated state.

On the frequency side, the big cores scale up to 1300-1500 MHz while rendering the initial site and 1000-1200 MHz while scrolling around the loaded page.

When looking at the total number of threads on the system, we can see that the S-Browser makes good use of at least 4 CPU cores, with some peaks of up to 5 threads. All in all, this is a scenario which doesn't necessarily make use of 8 cores per se; however, the 4+4 setup of big.LITTLE SoCs does seem to be fully utilized for power management, as the computational load shifts between the clusters depending on the needed performance.
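
As a rough illustration of how a "total threads" figure like the one above could be derived, the sketch below simply adds the per-cluster run-queue depths sample by sample and reports the mean and the peak. The sample values and the names `little_rq`, `big_rq` and `total_threads` are made up for the example; they are not the data behind the charts.

```python
# Hypothetical run-queue depth samples (runnable threads per cluster),
# one value per logging interval. Illustrative numbers only, not the
# measurements from the article.
little_rq = [0.2, 0.3, 0.2, 1.5, 2.8, 3.1]
big_rq = [3.9, 4.0, 3.6, 1.2, 1.0, 1.1]


def total_threads(little, big):
    """Sum the little and big cluster run-queue depths per sample."""
    return [l + b for l, b in zip(little, big)]


totals = total_threads(little_rq, big_rq)
mean_total = sum(totals) / len(totals)
peak_total = max(totals)

print(f"mean concurrent threads: {mean_total:.2f}")
print(f"peak concurrent threads: {peak_total:.2f}")
```

With samples shaped like these, the system-wide total stays near 4 even as the load moves from the big cluster to the little one, which is the pattern described above.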


157 Comments


  • V900 - Tuesday, September 1, 2015 - link

    A question that DOESN'T get answered however is: Does the fact that all cores get used, contribute to a better/faster user experience?

    If there were only 2 or 4 cores present, would they complete the tasks just as fast?

    In other words, is there a gain from all 8 cores being used, or do all 8 cores get used just because they are there? (By low priority threads, which in a quad/dual core CPU would have been done sequentially, in just as fast a time?)

    Since Apple's dual core iPhones always outperform Android quad and octa core phones, I would think that the latter is closer to the truth.

    Read up on what some of the other posters here have written about low priority threads, and Microsoft's research on the matter.

    And ignore anyone who tries to over-interpret this article!
  • frenchy_2001 - Wednesday, September 2, 2015 - link

    > Does the fact that all cores get used, contribute to a better/faster user experience?
    It does not, as long as your CPU can process all the threads in a timely manner.
    It contributes to lower power usage though, as dynamic power grows with the square of voltage and the required voltage rises with frequency, while spreading work across more cores only increases power linearly.
    Basically, if 2 A53 @ 800MHz can do the same amount of work as 1 A53 @ 1.6GHz, the 2 slower cores will do it for less power (refer to the perf/W curve on the conclusion page).

    This was the goal of ARM when they designed big.LITTLE and this article shows that the S6 uses it correctly (by using small cores predominantly and keeping frequencies low). It is one more trick to deliver strong immediate computation, good perfs/W at moderate usage and great idle power while idling. I would not extrapolate beyond that as too many variables are in play (kernel/governor/HW/apps...)
  • name99 - Tuesday, September 1, 2015 - link

    "When I started out this piece the goals I set out to reach was to either confirm or debunk on how useful homogeneous 8-core designs would be in the real world"

    You mean heterogeneous above rather than homogeneous.
  • Andrei Frumusanu - Tuesday, September 1, 2015 - link

    No, I meant specifically 8x A53 SoCs.
  • lilmoe - Tuesday, September 1, 2015 - link

    I've been waiting for this piece since the GS6 came out. I can't even imagine the amount of time and work you've put into it. THANK YOU Andre.

    Now I hope we can put to rest the argument that Android would do better with only 2 high performance cores VS more core configurations. Google has been promising this for years and they're finally _starting_ to deliver. They're not there yet, lots of work needs to be done to exterminate all that ridiculous overhead (evident in the charts).

    I'm also glad that it's finally evident that Chrome on Android VS SBrowser has significant impact on performance and battery life. It should only be fair to ask that Anandtech starts using the built-in browser for each respective device when benchmarking.

    We're _just_ reaping the benefits of properly implemented big.LITTLE configurations, in both hardware and software, after 2 years of waiting. What's funny is that both Qualcomm and Samsung are moving away from these implementations back to Quad-core CPUs with Kryo and Mongoose respectively... I personally hope we get the best of both worlds in the form of Mediatek's 10 core big.LITTLE implementation, except the 2 high perf cores being either Kryo or Mongoose for their relatively insane single-threaded performance.
  • V900 - Tuesday, September 1, 2015 - link

    You're coming up with conclusions that aren't supported by the article.

    Can we put 2 vs 8 core argument to rest? Nope.

    This test only shows, that when there are 4 (or 8) cores available, Android occasionally uses them all.

    It says NOTHING about whether an 8 core CPU would be faster than with 2 wide cores. (Samsung and Qualcomm are moving towards Apple-like wide dual core designs. I doubt they'd do that, if 8 cores were really always faster/economical than 2)

    In fact, the article doesn't really tell us whether 8 small cores are faster/more economical than 2 or 4 small cores. Keep in mind what people have brought up about the priority of threads. Some of the threads you see occupying all 8 cores, are low priority threads, that could just as quickly be completed in sequence if there were only 2 or 4 low power cores available.
  • lilmoe - Tuesday, September 1, 2015 - link

    "Can we put 2 vs 8 core argument to rest? Nope."

    Are you sure we're on the same page here? We're talking efficiency, right?

    "This test only shows, that when there are 4 (or 8) cores available, Android occasionally uses them all."

    No it doesn't. Android is capable of utilizing all cores, yes, but it only allocates threads to the amount of cores *needed*, which is much, MUCH more power efficient than elevating a smaller number of high performance cores to their max performance/freq states.

    "It says NOTHING about whether an 8 core CPU would be faster than with 2 wide cores. (Samsung and Qualcomm are moving towards Apple-like wide dual core designs. I doubt they'd do that, if 8 cores were really always faster/economical than 2)".

    True, it doesn't show direct comparisons with modern wide cores running Android, because there isn't any. But even taking MT overhead and core switching overhead into account, I believe it's safe to say that things should be comparable (since the small cluster is rarely saturated), except (again) much more efficient. And no, QC and Samsung aren't moving to any dual core configuration; they're both moving to Quad-core configuration (ie: the most optimal for Android), which further proves the argument that more cores running at a lower frequency (and lower power draw) is more efficient than having fewer cores running at their relative max for MOBILE DEVICES.

    The problem isn't the premise, it's the means. ARM's reference core designs aren't optimal in comparison to custom designs neither in performance nor in power consumption. Theoretically speaking, if Qualcomm or Samsung use little versions of their custom cores in 8-core configurations, or 4x4 big.LITTLE, we might theoretically see tremendous power savings in comparison. Again, this applies to Android based on this article.

    "In fact, the article doesn't really tell us whether 8 small cores are faster/more economical than 2 or 4 small cores."

    This article STRICTLY talks about the impact a 4x4 core big.LITTLE configuration has on ANDROID if you want BOTH performance and maximum efficiency. It clearly displays how Android (and its apps) is capable of dividing the load into multiple threads; therefore having more cores has its benefits. Also, you can clearly see that there is noticeable overhead here and there, and throwing more cores at the problem, running at lower frequency, is a better brute force solution to, AGAIN, maximize efficiency while maintaining high performance WHEN NEEDED, which is usually in relatively short bursts. Android still has a ways to go with optimization, but its current incarnation proves that more cores are more efficient.

    You are making the wrong comparisons here. What you should be asking for is comparisons between a quad-core A57 chip, VS an 8-core A53 chip, VS a 4x4 A57/A53 big.LITTLE chip. That, and only that would be a valid apples-to-apples comparison, which in this case is only valid when tested with Android. Unfortunately, good luck finding these chips from the same manufacturer built on the same process...
  • lilmoe - Tuesday, September 1, 2015 - link

    "they're both moving to Quad-core configuration (ie: the most optimal for Android)"

    In regard to large/wide core non-big.LITTLE designs that is.
  • lefty2 - Tuesday, September 1, 2015 - link

    You are right. The article is deeply flawed. Nowhere is there any evidence of a 4 core device rendering a web page faster than a 2 core.
  • lopri - Wednesday, September 2, 2015 - link

    Qualcomm's next custom core is 2+2 but Samsung's is 4+4. But I agree with the gist of your argument. Different core counts, but they all aim for the same goal - performance and efficiency.
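
The power argument in frenchy_2001's comment above (two A53 cores at 800MHz versus one at 1.6GHz) can be sketched with the common approximation that dynamic power scales as C * V^2 * f. This is only a back-of-the-envelope illustration: the two voltages are hypothetical operating points chosen for the example, not values measured on the Exynos 7420, and the model assumes the work parallelizes perfectly and ignores static leakage.

```python
def dynamic_power(voltage_v, freq_mhz, capacitance=1.0):
    """Relative dynamic power, P ~ C * V^2 * f, in arbitrary units."""
    return capacitance * voltage_v ** 2 * freq_mhz


# Hypothetical voltage/frequency points (assumptions for illustration,
# NOT measured values): the lower frequency runs at a lower voltage.
V_AT_800_MHZ = 0.80
V_AT_1600_MHZ = 1.00

# One core at 1.6 GHz versus two cores at 800 MHz doing the same
# total work, assuming perfect parallel scaling.
one_fast_core = dynamic_power(V_AT_1600_MHZ, 1600)
two_slow_cores = 2 * dynamic_power(V_AT_800_MHZ, 800)

ratio = two_slow_cores / one_fast_core
print(f"two slow cores draw {ratio:.0%} of the single fast core's power")
```

Under these assumed voltages the aggregate frequency is the same in both cases, so the saving comes entirely from the V^2 term. If both points ran at the same voltage, the model would show no difference at all.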
