Scheduler mechanisms: WALT & PELT

Over the years, it seems Arm noticed the slow progress and now appears to be working more closely with Google in developing the Android common kernel, utilizing out-of-tree (meaning outside of the official Linux kernel) modifications that benefit performance and battery life of mobile devices. Qualcomm also has been a great contributor as WALT is now integrated into the Android common kernel, and there’s a lot of work going on from these parties as well as other SoC manufacturers to advance the platform in a way that benefits commercial devices a lot more.

Samsung LSI’s situation here seems very puzzling. The Exynos 9810 is the first flagship SoC to actually make use of EAS, and they are basing the BSP (Board Support Package) kernel on the Android common kernel. The issue here is that instead of choosing to optimise the SoC through WALT, they chose to fall back to fully PELT-dictated task utilisation. That’s still fine in terms of core migrations, however they also chose to use a very vanilla schedutil CPU frequency driver. This means that the frequency ramp-up of the Exynos 9810 CPUs has the same characteristics as PELT, and so it also brings with it one of the existing disadvantages of PELT: a relatively slow ramp-up.
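To get a feel for how slow that ramp-up is, here is a minimal sketch of a PELT-style signal feeding a schedutil-style frequency request. It assumes 1ms accounting periods (upstream uses 1024µs), a task that starts from zero utilisation and then runs continuously, and the upstream schedutil mapping of 1.25 × max frequency × utilisation / max capacity. The function names and the 1GHz maximum frequency are made up for illustration and are not taken from the Exynos BSP.

```python
def pelt_ramp(half_life_ms=32, max_capacity=1024, duration_ms=200):
    """Yield (ms, util) for a task that starts at util 0 and runs continuously.

    Simplified model: one accounting period per millisecond, with a decay
    factor y chosen so that y ** half_life_ms == 0.5.
    """
    y = 0.5 ** (1.0 / half_life_ms)
    util = 0.0
    for ms in range(1, duration_ms + 1):
        # Old history decays by y, the fully busy new period contributes (1 - y).
        util = util * y + max_capacity * (1.0 - y)
        yield ms, util


def schedutil_freq(util, max_freq_khz, max_capacity=1024):
    """Frequency request with schedutil's 25% headroom, clamped to the maximum."""
    return min(max_freq_khz, 1.25 * max_freq_khz * util / max_capacity)


def ms_until_max_freq(half_life_ms=32, max_freq_khz=1_000_000):
    """Milliseconds of continuous running before the maximum frequency is requested."""
    for ms, util in pelt_ramp(half_life_ms):
        if schedutil_freq(util, max_freq_khz) >= max_freq_khz:
            return ms
    return None
```

Under these assumptions, `ms_until_max_freq(32)` comes out at 75ms, so a task has to run flat out for several display frames before the governor even asks for the top frequency. A real kernel adds rate limits and discrete frequency steps on top of this.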


Source: BKK16-208: EAS

Source: WALT vs PELT : Redux – SFO17-307

One of the best resources on the issue actually comes from Qualcomm, as they had spearheaded the topic years ago. In the above presentation given at Linaro Connect 2016 in Bangkok, we see a visual representation of the behaviour of PELT vs WinLT (as WALT was called at the time). The metrics to note here in the context of the Exynos 9810 are util_avg (which is the default behaviour on the Galaxy S9) and its contrast to WALT’s ravg.demand and actual task execution. So out of all the possible options in terms of BSP configurations, Samsung seems to have chosen the worst one for performance. And I do think this was a conscious choice, as Samsung added mechanisms to both the scheduler (eHMP) and schedutil (freqvar) to counteract this very slow behaviour caused by PELT.
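To put rough numbers on the gap between the two signals, here is a small sketch comparing a window-based demand estimate with a PELT-style average. It assumes WALT-like defaults as I understand them (20ms windows, a history of five windows, and a "max of most recent and average" policy); the function names are my own and the real implementations track considerably more state.

```python
def walt_demand(window_runtimes_ms, window_ms=20, hist_size=5, max_capacity=1024):
    """Window-based demand: max of the most recent window and the history average.

    window_runtimes_ms lists how long the task ran in each past window,
    oldest first. Missing history is treated as idle windows.
    """
    history = window_runtimes_ms[-hist_size:]
    if not history:
        return 0.0
    recent = history[-1]
    average = sum(history) / hist_size
    return max_capacity * max(recent, average) / window_ms


def pelt_util(run_ms, half_life_ms=32, max_capacity=1024):
    """PELT-style utilisation after run_ms of continuous running from idle."""
    return max_capacity * (1.0 - 0.5 ** (run_ms / half_life_ms))
```

For a task that was idle and then ran for one full window, `walt_demand([0, 0, 0, 0, 20])` already reports full demand, while `pelt_util(20)` is still only about a third of the maximum capacity.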

In trying to resolve this whole issue, instead of adding additional logic on top of everything I looked into fixing the issue at the source.

What was first tried is perhaps the most obvious route, and that's to enable WALT and see where that goes. While using WALT as a CPU utilisation signal for the Exynos S9 gave outstandingly good performance, it also very badly degraded battery life. I had a look at the Snapdragon 845 Galaxy S9’s scheduler, but here it seems Qualcomm diverges significantly from the Google common kernel which the Exynos is based on. This being far too much work to port, I had another look at the Pixel 2’s kernel – which luckily was a lot nearer to Samsung’s. I ported all relevant patches which were also applied to the Pixel 2 devices, along with porting EAS to a January state of the 4.9-eas-dev branch. This improved WALT’s behaviour while keeping performance, however there was still significant battery life degradation compared to the previous configuration. I didn’t want to spend more time on this so I looked through other avenues.


Source: LKML Estimate_Utilization (With UtilEst)

Looking through Arm's resources, it looks very much like the company is aware of the performance issues and is actively trying to improve the behaviour of PELT to more closely match that of WALT. One significant change is a new utilisation signal called util_est (Utilisation estimation) which is added on top of PELT and is meant to be used for CPU frequency selection. I backported the patch and immediately saw a significant improvement in responsiveness, as the CPUs now spent more time in higher frequency states. Another simple way of improving PELT was reducing the ramp/decay timings, which incidentally also got an upstream patch very recently. I backported this as well to the kernel, and after testing an 8ms half-life setting for a bit and judging it to not be good for battery life, I settled on a 16ms setting, which is an improvement over the 32ms of the stock kernel and gives the best performance and battery compromise.
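Both changes can be sketched in a few lines. This is a rough illustration that assumes the upstream util_est design (remember the utilisation a task had when it last went to sleep, keep a moving average of it with a 1/4 weight, and take the maximum of those and the current PELT value), plus the closed-form PELT ramp time for a given half-life. The class and function names are mine, not the kernel's.

```python
import math


def ramp_time_ms(fraction, half_life_ms):
    """Continuous running time for PELT util to climb from 0 to `fraction` of max."""
    return -half_life_ms * math.log2(1.0 - fraction)


class UtilEst:
    """Simplified per-task utilisation estimate layered on top of PELT."""

    def __init__(self, weight_shift=2):
        self.ewma = 0.0      # moving average of util seen at past dequeues
        self.enqueued = 0.0  # util at the most recent dequeue
        self.weight = 1.0 / (1 << weight_shift)

    def on_dequeue(self, util_avg):
        """Record the PELT util the task had built up when it went to sleep."""
        self.enqueued = util_avg
        self.ewma += self.weight * (util_avg - self.ewma)

    def estimate(self, util_avg):
        """Never report less than what the task needed last time it ran."""
        return max(util_avg, self.ewma, self.enqueued)
```

The point of the estimate is that a task which built up a utilisation of 800 before sleeping is still treated as an 800 task when it wakes up, even if its PELT value has decayed to 100 in the meantime. Separately, `ramp_time_ms(0.8, 16)` is half of `ramp_time_ms(0.8, 32)`, which is why halving the half-life halves the time it takes to reach any given frequency.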

Because of these significant changes in the way the scheduler is fed utilisation statistics, the existing tunings from Samsung obviously weren’t valid anymore. I adapted most of them as best I could, which basically involved just disabling most of them as they were no longer needed. I also significantly changed the EAS capacity and cost tables, as I do not think that the way Samsung populated the tables is correct or representative of actual power usage, which is very unfortunate. Incidentally, this last bit was one of the reasons that performance changed when I limited the CPU frequency in part 1, as it shifted the whole capacity table and changed the scheduler heuristic.

But of course, what most of you are here for is not how this was done but rather the hard data on the effects of my experimenting, so let's dive into the results.


76 Comments


  • Andrei Frumusanu - Saturday, April 21, 2018 - link

    It's not a replacement; they serve different purposes.
  • The_Assimilator - Saturday, April 21, 2018 - link

    A single tech writer with some smarts is able to do what a tech conglomerate with multiple billions of dollars can't. That is absolutely fucking pathetic on Samsung's part; I used to think they were just incompetent, but to be able to design and fab their own CPUs, yet not provide appropriate working drivers for that CPU? Words quite honestly fail me. Reply
  • BurntMyBacon - Monday, April 23, 2018 - link

    An alternate view is that they have great teams for chip design (though perhaps not as much this one), semi-conductor fabrication, and phone hardware integration., but not particularly good (being kind) teams for their software/firmware development. Reply
  • johnnycanadian - Saturday, April 21, 2018 - link

    And the verdict is ... if you want the best Android experience, either pick up a Pixel 2 XL or wait a few months for the P3. If you want the highest performing smartphone, tolerate Apple and the never-quite-works-properly Siri and their gimmicky fashion-first business model. I expected better out of Samsung ... The Note9 is supposed to be quite an evolution ... hopefully enough that I can be convinced to trade in my Note5. As for everyday use, my Pixel 1 XL is still running brilliantly and there simply isn't enough of a performance delta to risk a third-party Android build that may or may not receive OS updates 24 months from now. Reply
  • santz - Saturday, April 21, 2018 - link

    thank you for the excellent writeup. I just hope the Note 9 will not have exynos for their international version.
  • Seattletech - Sunday, April 22, 2018 - link

    Sign me up for
    2 M3
    2 A75
    4 A55
  • N Zaljov - Sunday, April 22, 2018 - link

    Magnificent article, thanks for wrapping everything up in such a detailed manner.

    The more I‘m looking at it, the more I‘m asking myself: „WTF were they smoking when they came up with the ingenieus idea of putting all Meerkat cores into a single clock- & voltage-domain?“. And tbh, even today I can‘t come up with a proper explanation other than „Time to market & saving xtors for the sake of not blowing up the chips budget“.
  • Quantumz0d - Sunday, April 22, 2018 - link

    Another majestic one. As expected from the initial experiment, though I was under the remark that the custom tuning would at least fix up the issue a bit, it did but the performance/efficiency is just bad. Very bad for such a high profile flagship device.

    A pity that this level of SoC should need 4000Mah capacity. It would have been much better for tuning and custom software enthusiasts and considering the constant high performance the voltage/power scaling is fine so vs the SD845 this SoC won't kill the total endurance of the battery fast. That's the only good thing of 9810 custom tuned/stock vs 845. But the XDA developer community and the top devs will get much more resources to work with with ease on the other phones with 845 platform like Pixel 3 or OP6 (Unfortunate Notch B$) don't have hopes on HTC as their 10 was a fail even with optimization the efficiency was off small battery and WiFi issues. OP3(T) and 5T seem the best choice for now.

    Much thanks Andrei for this, superb analysis and thanks for letting us know about the EAS too, that was total gold. I needed that. I guess I will pass this. I don't want another 6hr SOT, OP3 barely has 3-5 and 6 is super bad with custom tuning that too after a gap of 1Yr. Will wait for the next Exynos if it has a headphone jack / OP7.

    And please keep this work coming going forth, don't leave us in the dark, Ofc it would be great if you get a good opportunity but we need you.

    Thank you again Andrei.
  • Azurael - Sunday, April 22, 2018 - link

    On my OnePlus 5, I go from 11 hours SOT (with the standard setup) to about 7 (using EAS+schedutil) and performance in most benchmarks regresses - it clearly needs a lot of work.
  • Spoelie - Sunday, April 22, 2018 - link

    Doesn't seem like someone on XDA is picking up this config yet - would love to try it on my S9.
