Conclusion

Samsung’s Exynos 7420 is a major stepping stone for Samsung LSI. While on a functional and IP basis the chipset hasn’t seen substantial differentiation from its predecessor, it’s in the actual physical implementation and manufacturing process that the new SoC has raised the bar.

On the CPU side of things, we saw some performance improvements due to slightly higher clocks and what seems to be a better cache implementation, especially on the big CPU cluster. Equally, on the big cluster Samsung has played it safe and gone for power efficiency rather than aiming for the maximum achievable clocks. ARM’s Cortex A57 in the Exynos 5433 was already outperforming its direct competitor, the Snapdragon 805, so there was no need for the Exynos 7420 to push the clocks much higher. This is a good design decision for the new SoC, as both maximum power and power efficiency have improved considerably. With the new part using 35-45% less power at equal frequencies, it now has the required TDP and efficiency to be placed in thin smartphones such as the Galaxy S6.

I think Samsung could even have gotten away in performance benchmarks with capping the chip at 1.9GHz, keeping power consumption below the 1W per core mark. This would have slightly improved efficiency under high loads, as the small 10% performance degradation would have been worth the 26% power improvement.
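The trade-off can be checked with a little arithmetic: for a fixed amount of work, energy is power multiplied by time, and time scales inversely with performance. The following is only a minimal sketch using the 10% and 26% figures quoted above. It assumes a fixed, purely CPU-bound workload and ignores the power drawn by the rest of the platform; the function name `relative_energy` is illustrative and not part of any measurement tooling used in this review.

```python
def relative_energy(perf_ratio: float, power_ratio: float) -> float:
    """Energy needed for a fixed task, relative to the baseline configuration.

    perf_ratio:  new performance / baseline performance (0.90 = 10% slower)
    power_ratio: new power / baseline power (0.74 = 26% less power)
    """
    if perf_ratio <= 0:
        raise ValueError("perf_ratio must be positive")
    # A slower chip needs proportionally longer to finish the same work.
    time_ratio = 1.0 / perf_ratio
    # Energy = power x time, so the ratios simply multiply.
    return power_ratio * time_ratio


# Figures from the text: 10% less performance, 26% less power.
capped = relative_energy(perf_ratio=0.90, power_ratio=0.74)
print(f"Energy per task relative to full clocks: {capped:.3f}")
```

Under these assumptions the capped configuration would need roughly 18% less energy for the same work, which is the sense in which efficiency improves even though peak performance drops.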

In the review of the Exynos 5433 I was very up front about my disappointment with that SoC’s software and power management as it showed very little optimization and the degradation in real-world use-cases was measurable. This time around, it seems Samsung Electronics did a better job at properly configuring the scaling parameters of the SoC’s power management. Gone are the odd misconfigurations, and with them also most of the inefficient behaviors that we were able to measure on the big.LITTLE SoC’s predecessor. While there’s still plenty of room for improvement such as an eventual upgrade to an energy-aware scheduler, it currently does the job in a satisfactory way.

On the GPU side of things we saw something of a two-sided story. The good side is that the Exynos 7420’s Mali T760MP8 combined with the 14nm process not only makes this the fastest SoC we’ve seen in a smartphone but also currently the most efficient one that we have measured. The bad side is that, while it’s the most efficient SoC, its performance and power again overshoot the sustainable TDP of the phone, so it will inevitably thermal throttle to lower frequency states during active usage. Over the last few generations this issue has grown worse and worse as semiconductor vendors and OEMs tried to boost their competitive position in benchmark scoreboards.

While for the CPU there are real-world uses and performance advantages to having overdrive frequencies above the sustainable TDP, one cannot say the same for the GPU. Samsung is not alone in this practice, as Qualcomm and many others also employ overpowered configurations that make no sense in the devices they ship in. Having a reasonably balanced SoC has become more the exception than the rule. One can argue that these are high-performance designs that are also meant to go into tablets and larger form-factors, and that SoC vendors should consequently not be the ones on the receiving end of the blame – it would then be the OEM’s responsibility to properly configure and limit power via software when using the parts in smaller devices. Ultimately, I’d like to see this practice go away as it brings only disadvantages to the end-consumer and leads to an inconsistent gaming experience with reduced battery life.

The Galaxy S6 with the Exynos 7420 is among the first wave of devices to feature LPDDR4 memory. While the performance improvement was nothing ground-breaking, with the boost coming in at an average of 18-20% in GFXBench, it’s mostly the efficiency that should have the biggest impact on a device’s experience. While I wasn’t able to fully quantify this advantage during measurement due to the complexity of the task, the theoretical gains show that improvements in daily use-cases should be substantial.

Overall, the big question is how good the Exynos 7420 finally is. The verdict on a SoC vastly depends on the competing alternatives available at the time. For the better part of 2015 this will most likely be Qualcomm’s Snapdragon 810 and, to a lesser extent, the Snapdragon 808. In this piece I was already able to show GPU numbers for the S810, and the results unfortunately showed no improvement over the Snapdragon 805, which the Exynos 7420 already beats in both performance and power. While I already have CPU numbers for the 810, we weren’t quite ready to include them in this piece as they warrant a more in-depth look in a separate article. Readers who have read our review of the HTC M9 will know what to expect, as the SoC just wasn’t able to perform as promised, and I can confirm that the efficiency disadvantage relative to the Exynos 7420 is significant.

Ultimately, this leaves the Exynos 7420 without real competition. Samsung was able to hit it out of the park with the new 14nm design and subsequently leapfrogged competing solutions. For the near future, the Exynos 7420 comfortably stands alone above other Android-targeted designs as it sets the new benchmark for what a 2015 SoC should be.


114 Comments


  • jjj - Monday, June 29, 2015 - link

    The power doesn't look that great, for the A57 seems to allow 300-350MHz higher clocks, granted it's not a clean shrink. It looks good here because on 20nm they pushed the clocks way high.
  • name99 - Monday, June 29, 2015 - link

    Insofar as rumors can be believed, the bulk of A9's are scheduled to be produced by Samsung, presumably on this process. It seems strange to have Apple design/layout everything twice for the same CPU, so if these same rumors (30% going to TSMC) are correct, presumably that means the A9X will be on TSMC.

    As for characterizing Apple CPUs, while there are limits to what one can learn (eg in the voltage/power tradeoffs), there is a LOT which can be done but which, to my disappointment, has still not been done. In particular if someone wanted, I think there's scope for learning an awful lot from carefully crafted micro benchmarks. Agner Fog has given a large number of examples of how to do this in the x86 space, while Henry Wong at stuffedcow.net has done the same for a few less obvious parts of the x86 architecture and for GPUs.

    It strikes me as bizarre how little we know about Apple CPUs even after two years.
    The basic numbers (logical registers, window, ROB size) seem to about match Intel these days, and the architecture seems to be 6-wide with two functional clusters. There appears to be a loop buffer (but how large?) But that's about it.
    How well does the branch prediction work and where does it fail?
    What prefetchers are provided? (at I1, D1, L2, L3)
    Do the caches do anything smart (like dead block prediction) for either performance or power?
    Does the memory manager do anything smart (like virtual write queue in the L3)?
    etc etc etc

    Obviously Apple doesn't tell us these. (Nowadays the ONLY company that does is IBM, and only in pay-walled articles in their JRD.) But people write the micro benchmarks to figure this out for Intel and AMD, and I wish the same sort of enthusiasm and community existed in the ARM world.
  • SunnyNW - Wednesday, July 1, 2015 - link

    Believe word on the street is the A9 will be Sammy 14nm and the A9X TSM 16nm+
  • SunnyNW - Wednesday, July 1, 2015 - link

    Please ignore this comment, should have read the rest of the comments before posting since Name99 already alluded to this below. Sorry
  • CiccioB - Monday, June 29, 2015 - link

    Is the heterogeneous processing that allows all 8 cores to work together active?
    Judging by the numbers in the various benchmarks, it seems this feature is not used.
    What I would like to know exactly is whether the benchmark numbers of this SoC can be directly compared to those of an SoC with only 4 cores, like the upcoming Qualcomm Snapdragon 820, which is based on a custom architecture and has "only" 4 cores rather than a big.LITTLE configuration.
  • Andrei Frumusanu - Monday, June 29, 2015 - link

    HMP is active. Why do you think it seems to be not used?
  • CiccioB - Monday, June 29, 2015 - link

    Because with 8 cores active (or what they should be with HMP) the result is not even near 4x the score of a single core.
    So I wonder if those 8 cores are really active. And whether they are of any real use if, to keep consumption adequate, the frequencies of the bigger cores get limited.
  • Andrei Frumusanu - Monday, June 29, 2015 - link

    All the cores are always active and they do not get limited other than in thermal stress situations. I didn't publish any benchmarks comparing single vs multi-core performance so your assumption must be based on something else. Having X-times the cores doesn't mean you'll have X-times the performance, it completely depends on the application.

    It's still a perfectly valid comparison to look at traditional quad-cores vs bL octa-cores. In the end you're looking at total power and total performance and for use-cases such as PCMark the number of cores used shouldn't be of interest to the user.
  • Refuge - Monday, June 29, 2015 - link

    I would hazard a guess that thermal throttling has something to do with part of it.
  • ruturaj1989@gmail.com - Monday, June 29, 2015 - link

    It does have 4 cores but I guess they are in big.LITTLE configuration too. We will see shortly. HMP is active but I am not sure if every bench app uses all the cores.
