Mixed-Usage Power & Preliminary Battery Life

We haven’t had both Galaxy S21 Ultra variants long enough to run our more extensive battery life testing routines, but I did run further power analysis on the more compute-heavy PCMark suite, as well as some battery life numbers at 120Hz.

I’ll start off with some power profiling. These figures are generally accurate to within a 5% margin of error relative to power usage at the battery level, but are measured as input power into the device, as this allows much higher-resolution data sampling without actually dismantling the phones.

I set the devices to minimum brightness to minimise the screen’s contribution to the power figures, with both phones in 120Hz mode and a lamp shining on them to enable the display’s 120Hz VRR/LFD power savings. Both devices are in their default performance modes.

Galaxy S21 Ultra - Snapdragon 888

The Snapdragon 888’s power chart looks relatively straightforward here, with the different sections corresponding to the different sub-tests in PCMark. We see varied activity, with clear load spikes and corresponding power spikes whenever the SoC had computations to do.

Instantaneous peak power is around the 9W mark, which should be mostly due to the Cortex-X1 cores of the chip running alongside the Cortex-A78 cores.

Galaxy S21 Ultra - Exynos 2100

The Exynos chart looks generally similar, which is no surprise given that it’s the same workload. What’s more interesting about the Exynos is that it has much larger power spikes, up to 14W, which is notably more than the Snapdragon. The Cortex-A78 cores on the Exynos run at much higher frequencies and power than on the Snapdragon, and together with the higher power draw of the X1 cores, it makes sense that the Exynos’ instantaneous power is considerably higher when all the cores are under load.

The photo editing section of the test is very intriguing, as the power profile is very different to that of the Snapdragon. This RenderScript section should be accelerated by the GPU, and here the Exynos’ baseline power rises significantly compared to the Snapdragon for the majority of the test, by almost 1W in magnitude. I really wonder what’s happening under the hood and where that power comes from; maybe the GPU doesn’t have as fine-grained power gating or DVFS?

Galaxy S21 Ultra Power Usage in PCMark
Minimum Brightness, 120Hz, Default Performance Mode

                       Snapdragon 888     Exynos 2100     Exynos vs Snapdragon
Idle Screen Power      759mW              797mW           +4.8%
Web-browsing           12935
Video Editing          7866
Writing 2.0            12966
Photo Editing          26884
Data Manipulation      9918

Tabulating the PCMark test scores with the respective power figures across both Galaxy S21 Ultras, we see a few trends.

Performance-wise, surprisingly enough, the Exynos has a few sub-tests where it outperforms the Snapdragon, notably the Web-browsing, Writing and Photo Editing tests. The Writing sub-test in particular shows an +18% advantage in favour of the Exynos, while it loses more notably in the more single-thread-bound Data Manipulation test.

What’s also notable is that the Exynos’ power consumption is quite a bit higher across the board, on all tests. Setting aside the Photo Editing test, where it has a 49% power disadvantage (which grows even higher once we account for baseline device power), the rest of the tests should be apples-to-apples CPU comparisons. We’re seeing roughly +15-20% device power disadvantages, and when accounting for baseline power, this actually grows to around 18-35%. The Exynos does showcase a performance advantage in some of the tests, but not enough to make up for the increased power, meaning its perf/W is lower.
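A quick back-of-the-envelope sketch of why the gap widens once baseline device power is subtracted; the wattages here are hypothetical placeholders in the ballpark of the figures discussed, not measurements:

```python
# Sketch: device-level vs SoC-level power disadvantage.
# All wattages below are illustrative assumptions, not measured values.

baseline_w = 0.78           # assumed idle device power (screen on, min brightness)
snapdragon_device_w = 3.0   # hypothetical average device power during a sub-test
exynos_device_w = 3.5       # hypothetical; ~17% higher at the device level

# Device-level disadvantage: simple ratio of total device power.
device_delta = exynos_device_w / snapdragon_device_w - 1

# SoC-level disadvantage: subtract the shared baseline first, so the
# remaining delta is attributed to the SoC alone.
soc_delta = (exynos_device_w - baseline_w) / (snapdragon_device_w - baseline_w) - 1

print(f"device-level power disadvantage: {device_delta:+.1%}")
print(f"SoC-level power disadvantage:    {soc_delta:+.1%}")
```

Because the baseline is a fixed cost common to both phones, stripping it out always magnifies the relative SoC-side difference, which is how a ~17% device-level gap can become a 20%+ SoC-level gap.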

Looking at the DVFS of the two chips, we see that they’re generally reaching peak performance at roughly the same time, in around 37-38ms. The Snapdragon 888 will schedule a workload directly on the A78 cores at 2.41GHz during that time before ramping up to the X1 cores at 2.84GHz. The Exynos starts off on the A55 cores at idle frequencies of around 400-624MHz for 900µs, ramping up to 2210MHz for 4.2ms, before migrating onto the Cortex-A78 cores, which start at 1768MHz and ramp up to 2600MHz. Oddly enough, when migrating to the X1 cores the scheduler seems to have trouble moving the load over, as the cores run at the idle 533MHz before realising they have work to do and ramping up to the maximum 2.91GHz.
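The Exynos ramp sequence described above can be sketched as a simple timeline; the durations for the A78 and X1 stages are assumptions for illustration (only the A55 stage timings are stated in the text):

```python
# Sketch of the Exynos 2100 single-thread ramp-up, as a list of
# (duration_ms, cluster, MHz) stages. A78/X1 stage durations are
# assumed values chosen to land near the ~37ms time-to-peak measured.

ramp = [
    (0.9,  "A55", 624),    # idle frequencies for ~900µs
    (4.2,  "A55", 2210),   # A55 ramps to its maximum
    (10.0, "A78", 1768),   # migration to A78 (duration assumed)
    (12.0, "A78", 2600),   # A78 maximum (duration assumed)
    (5.0,  "X1",  533),    # scheduler stalls at the X1 idle clock (duration assumed)
    (0.0,  "X1",  2910),   # peak frequency finally reached
]

time_to_peak = sum(duration for duration, _, _ in ramp)
print(f"time to peak: {time_to_peak:.1f} ms")
```

The notable point the sketch captures is the wasted X1 stage: several milliseconds at 533MHz contribute latency but essentially no work.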

What’s interesting about the Exynos here is that for single-threaded workloads it doesn’t actually visit the A78’s maximum frequencies, which is actually a benefit for power efficiency and makes the SoC behave more like its Snapdragon counterparts even though it has higher peak frequencies. What’s a bit concerning is that even in this extremely simplistic load, which is just an add dependency chain, the X1 cores’ frequencies on the Exynos don’t look solid, but rather fluctuate quite a bit. The resulting 2888MHz readout doesn’t actually exist in the SoC’s frequency tables, so I have to wonder if it’s real, or if Samsung has employed some new sort of hardware DFS mechanism that works on extremely fine-grained timescales.
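One way a frequency readout can land outside the frequency table is if the hardware duty-cycles between neighbouring table entries faster than the sampling window, so a counter-based measurement averages the two. A sketch of this idea, with an assumed time split (the 93/7 ratio is purely illustrative):

```python
# Sketch: a cycle-counter-based frequency readout averages over the sample
# window, so rapid switching between two real table frequencies can yield
# an intermediate "phantom" frequency. The duty-cycle split is an assumption.

f_low, f_high = 2600, 2910   # MHz; both are real frequency-table entries
share_high = 0.93            # assumed fraction of the window spent at f_high

apparent = share_high * f_high + (1 - share_high) * f_low
print(f"apparent frequency: {apparent:.0f} MHz")  # prints 2888 MHz
```

If something like this fine-grained throttling is happening in hardware, the kernel’s frequency tables would never show the intermediate value, matching the odd 2888MHz readout.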

PCMark Work 2.0 - Battery Life

In terms of battery life in PCMark between the two phones, because we measured higher power draw on the Exynos, we naturally also see lower battery life on the new Samsung chip compared to the Snapdragon 888 variant of the S21 Ultra. The battery life here is tested in our traditional fashion, with the screen calibrated to 200cd/m² brightness.

The Snapdragon 888 S21 Ultra here fares better than the Galaxy S20 Ultra in terms of battery life, but only by a minor amount. These results aren’t exactly great given the S21 Ultra’s massively more advanced and more efficient display.

For the Exynos 2100 S21 Ultra, the battery results here are actually slightly worse than the Exynos 990 S20 Ultra’s. This means that despite the new, much more efficient screen, the Exynos 2100 is so aggressive in its performance scaling that it draws notably more average power than the Exynos 990. Yes, the Exynos 2100 is also significantly more performant than its predecessor, and this is immediately visible in everyday device usage, but it’s performance that was achieved not just through efficiency, but also through higher power usage.

We also got a smaller S21 with the Exynos 2100. This variant, as well as the S21+, lacks the new super-efficient OLED screen of the S21 Ultra, and as such the SoC’s more aggressive power draw shows up more prominently as quite poor power efficiency in this test at 120Hz.

Web Browsing Battery Life 2016 (WiFi)

In the web-browsing test, which is less compute-heavy and leans more towards display power consumption, both new S21 Ultras fare significantly better than their predecessor due to the much improved OLED display. These 120Hz numbers (at QHD, no less) are actually fantastic, and show off the advancements of the new panel.

Nevertheless, the Snapdragon 888 variant of the S21 Ultra still pulls ahead of the Exynos 2100 version due to its better SoC efficiency and lower power levels. The 12.7% lead here is also similar to the general SoC efficiency differences we’ve seen in the other tests.

Update: February 14th - I'm retracting the 120Hz battery life results of the new VRR/LFD display devices pending re-testing, after discovering power-management inconsistencies in the test results. 60Hz results seem unaffected. Further details in the full review.

As we spend more time with the devices, we’ll be completing the test numbers at 60Hz as well as gathering more data from our web-browsing test. For the time being, the general view is that these new SoCs showcase quite increased performance, but their power draw has also gone up, meaning that battery life should actually regress generationally, barring non-SoC factors such as the S21 Ultra’s new, more efficient display panel.



Comments

  • Spunjji - Thursday, February 11, 2021 - link

    I'm not an expert by any means, but I think Samsung's biggest problem was always optimisation - they use lots of die area for computing resources but the memory interfaces aren't optimised well enough to feed the beast, and they kept trying to push clocks higher to compensate.

    The handy car analogy would be:
    Samsung - Dodge Viper. More cubes! More noise! More fuel! Grrr.
    Qualcomm / ARM - Honda Civic. Gets you there. Efficient and compact.
    Apple - Bugatti Veyron. Big engine, but well-engineered. Everything absolutely *sings*.
  • Shorty_ - Monday, February 15, 2021 - link

    you're right, but you also don't really touch on why Apple can do that and x86 designs can't. The issue is that uOP decoding on x86 is *awfully* slow and inefficient on power.

    This was explained to me as follows:

    Variable-length instructions are an utter nightmare to work with. I'll try to explain with regular words how a decoder handles variable length. Here's all the instructions coming in:

    x86: addmatrixdogchewspout
    ARM: dogcatputnetgotfin

    Now, ARM is fixed length (3-letters only), so if I'm decoding them, I just add a space between every 3 letters.
    ARM: dogcatputnetgotfin
    ARM decoded: dog cat put net got fin

    done. Now I can re-order them in a huge buffer, avoid dependencies, and fill my execution ports on the backend.
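    The fixed-length case really is as simple as the analogy suggests; a minimal sketch of the 3-letter "decoder":

```python
def decode_fixed(stream: str, width: int = 3) -> list[str]:
    """Split a fixed-width instruction stream into individual ops."""
    # Every instruction boundary is known in advance: just step by `width`.
    return [stream[i:i + width] for i in range(0, len(stream), width)]

print(decode_fixed("dogcatputnetgotfin"))
# ['dog', 'cat', 'put', 'net', 'got', 'fin']
```

    Because every boundary is statically known, all the ops can be decoded in parallel, which is what makes very wide ARM frontends cheap.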

    x86 is variable-length. This means I cannot reliably figure out where the spaces should go, so I have to try all of them and then throw out what doesn't work.
    Look at how much more work there is to do.

    x86: addmatrixdogchewspout

    reading frame 1 (n=3): addmatrixdogchewspout
    Partially decoded ops: add, , dog, , ,
    reading frame 2 (n=4): matrixchewspout
    Partially decoded ops: add, , dog, chew, ,
    reading frame 3 (n=5): matrixspout
    Partially decoded ops: add, , dog, chew, spout,
    reading frame 4 (n=6): matrix
    Partially decoded ops: add, matrix, dog, chew, spout,

    Fully expanded micro-ops: add, ma1, ma2, ma3, ma4, dog, ch1, ch2, ch3, sp1, sp2, sp3
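    The variable-length case from the analogy can be sketched as a serial decoder that must test every candidate length at each offset before it knows where the next op begins (the toy "ISA" below is just the word set from the example):

```python
# Sketch: variable-length decoding needs to probe lengths at every offset
# to find instruction boundaries. VALID_OPS is the toy ISA from the analogy.

VALID_OPS = {"add", "matrix", "dog", "chew", "spout"}
MAX_LEN = 6  # longest op in this toy ISA (real x86 instructions go up to 15 bytes)

def decode_variable(stream: str) -> list[str]:
    ops, i = [], 0
    while i < len(stream):
        for n in range(1, MAX_LEN + 1):      # probe every length at this offset
            if stream[i:i + n] in VALID_OPS:
                ops.append(stream[i:i + n])
                i += n                       # only now do we know the next offset
                break
        else:
            raise ValueError(f"undecodable at offset {i}")
    return ops

print(decode_variable("addmatrixdogchewspout"))
# ['add', 'matrix', 'dog', 'chew', 'spout']
```

    Note how each op's end position must be resolved before the next one can even start, which is the serial dependency that makes wide x86 decoders so expensive; real decoders attack multiple candidate offsets speculatively in parallel and discard the wrong ones.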

    This is why most x86 cores only have a 3-4 wide frontend. Those decoders are massive, and extremely energy intensive. They cost a decent bit of transistor budget and a lot of thermal budget even at idle. And they have to process all the different lengths and then unpack them, like I showed above with "regular" words. They have excellent throughput because they expand instructions into a ton of micro-ops... BUT that expansion is inconsistent, and hilariously inefficient.

    This is why x86/64 cores require SMT for the best overall throughput -- the timing differences create plenty of room for other stuff to be executed while waiting on large instructions to expand. And with this example... we only stepped up to 6-byte instructions. x86 is 1-15 bytes so imagine how much longer the example would have been.

    Apple doesn't bother with SMT on their ARM core design, and instead goes for a massive reorder buffer, presenting only a single logical core to the programmer. Their 8-wide design can efficiently unpack instructions, fit them into a massive 630-µop reorder buffer, and fill the backend easily, achieving high occupancy even at low clock speeds. Effectively, a reorder buffer, if it's big enough, is better than SMT, because SMT requires programmer awareness and effort, and not everything is parallelizable.
  • Karim Braija - Saturday, February 20, 2021 - link

    I'm not sure the SPENCint2006 benchmark is really reliable. It's been around for a long time now, and I don't think it's dependable any more, given that these are powerful new processors. So I think it isn't very reliable and doesn't say anything precise. I don't think you should trust this benchmark 100%.
  • serendip - Monday, February 8, 2021 - link

    "Looking at all these results, it suddenly makes sense as to why Qualcomm launched another bin/refresh of the Snapdragon 865 in the form of the Snapdragon 870."

    So this means Qualcomm is hedging its bets by having two flagship chips on separate TSMC and Samsung processes? Hopefully the situation will improve once X1 cores get built on TSMC 5nm and there's more experience with integrating X1 + A78. All this also makes SD888 phones a bit pointless if you already have an SD865 device.
  • Bluetooth - Monday, February 8, 2021 - link

    Why would they skimp on the cache? Was the neural engine or something else with higher priority getting the silicon?
  • Kangal - Tuesday, February 9, 2021 - link

    I think Samsung was rushing, and it's usually easier to stamp out something that's smaller (cache takes a lot of silicon estate). They rushed because of the switch from their M-cores to the X-core, and also internalising the 5G radio.

    Here's the weird part: I actually think their Mongoose cores would be competitive this time. Unlike Andrei, I estimated the Cortex-X1 was going to be a load of crap, and it seems I was right. Having node parity with Qualcomm, the immature implementation that is the X1, and a further refined Mongoose core... it would've meant they'd be quite competitive (better/same/worse), but that's not saying much after looking at Apple.

    How do I figure?
    The Mongoose core was a Cortex A57 alternative which was competitive against Cortex A72 cores. So it started as midcore (Cortex A72) and evolved into a highcore implementation as early as 2019 with the S9 when they began to get really wide, really fast, really hot/thirsty. Those are great for a Large Tablet or Ultrabook, but not good properties for a smaller handheld.

    There was a precedent for this, in the overclocked QSD 845 SoCs, the 855+, and the subpar QSD 865 implementation. Heck, it goes all the way back to 2016 when MediaTek was designing 2+4+4-core chipsets (and they failed miserably, as you'd imagine). I think when consumers buy these, companies send orders, fabs design them, etc... they always forget about the software. This is what separates Apple from Qualcomm, and Qualcomm from the rest. You can either brute-force your way to the top, or try to do things more cost/thermal efficiently.
  • Andrei Frumusanu - Tuesday, February 9, 2021 - link

    > Unlike Andrei, I estimated the Cortex-X1 was going to be a load of crap, and seems I was right.

    The X1 *is* great, and far better than Samsung's custom cores.
  • Kangal - Wednesday, February 10, 2021 - link

    First of all, apologies for sounding crass.
    Also, you're a professional in this field, I'm merely an enthusiast (aka Armchair Expert) take what I say with a grain of salt. So if you correct me, I stand corrected.

    Nevertheless, I'm very unimpressed by big cores: the Mongoose M5, to a lesser extent the Cortex-X1, and to a much, much lesser extent the Firestorm. I do not think the X1 is great. Remember, the "middle cores" still haven't hit their limits, so it makes little sense to go even thirstier/hotter. Even if the power and thermal issues weren't so dire with these big cores, the performance difference between the middle cores and big cores is negligible, and there are no applications that are optimised for or demand the big cores. Apple's big-core implementation is much more optimised, they're smarter about thermals, and the performance delta between it and the middle cores is substantial, hence why their implementation works and compares favourably to the X1/M5.

    I can see a future for big cores. Yet I think it might involve killing the little cores (A53/A55) and replacing them with general-purpose cores that are almost as efficient yet able to perform much better, acting as middle cores. Otherwise latency is always going to be an issue when shifting work from one core to another, then another. I suspect the Cortex-X2 will right many of the X1's wrongs; combined with a node jump, it should hopefully be a solid platform. Maybe similar to the 20nm Cortex-A57 versus 16nm Cortex-A72 evolution we saw back in 2016. Vendors have little freedom when it comes to implementing the X1 cores, and I suspect things will ease up for the X2, which could mean operating at reasonable levels.

    So even with the current (and future) drawbacks of big cores, I think they could be a good addition for several reasons: application-specific optimisations, or an external dock. We might get a DeX implementation that's native to Android/AOSP, combined with an external dock that provides higher power delivery AND adequate active cooling. I can see that as a boon for content creators and entertainment consumers alike. My eye is on emulation performance; perhaps this brute force can help stabilise the currently weak Switch and PS2 emulation on Android (Wii U next?).
  • iphonebestgamephone - Monday, February 15, 2021 - link

    The improvements with the 888 in damonps2 and eggns are quite good. Check some vids on youtube.
  • Archer_Legend - Tuesday, February 9, 2021 - link

    Actually Samsung still has M6 cores in its belly; the development team was shut down only after they completed the M6 cores.

    Difficult to say whether they would have been better than an X1.

    However it seems that ARM rushed this whole A78 and X1 thing, and Samsung rushed to put too much stuff into the chip with evidently not enough time to do it well.
