T760 investigated - Samsung Galaxy Note 4 Exynos Review

Name: ARM A53/A57/T760 investigated - Samsung Galaxy Note 4 Exynos Review
Item: ARM A53/A57/T760 investigated - Samsung Galaxy Note 4 Exynos Review

by Andrei Frumusanu & Ryan Smith on February 10, 2015 7:30 AM EST

135 Comments | Add A Comment

135 Comments

20nm Manufacturing Process

Both Samsung Semiconductor and TSMC delivered their first 20nm products in Q3 2014, but they don't represent the same jump in efficiency. Samsung's 28nm HKMG process varied a lot from TSMC's 28nm HPM process. While Samsung initially had a process lead with their gate-first approach when introducing 32nm HKMG and subsequently the 28nm shrink, TSMC went the route of gate-last approach. The advantage of the gate-last approach is that it allows for lower variance in the manufacturing process and being able to allow for better power characteristics. We've seen this as TSMC introduced the highly optimized HPM process in mobile. Qualcomm has been the biggest beneficiary as they've taken full advantage of this process jump with the Snapdragon 800 series as they moved from 28nm LP in previous SoCs.

In practical terms, Samsung is brought back on even terms with TSMC in terms of theoretical power consumption. In fact, 28nm HPM still has the same nominal transistor voltage as Samsung's new 20nm process.

Luckily Samsung provides useful power modeling values as part of the new Intelligent Power Allocation driver for the 5422 and 5430 so we can get a rough theoretical apples-to-apples comparison as to what their 20nm process brings over the 28nm one used in their previous SoCs.

I took the median chip bin for both SoCs to extract the voltage tables in the comparison and used the P=C*f*V² formula to compute the theoretical power figure, just as Samsung does in their IPA driver for the power allocation figures. The C coefficient values are also provided by the platform tables.

We can see that for the A15 cores, there's an average 24% power reduction over all frequencies, with the top frequencies achieving a good 29% reduction. The A7 cores see the biggest overall voltage drop, averaging around -125mV, resulting in an overall 40% power reduction and even 56% at the top frequency. It's also very likely that Samsung has been tweaking the layout of the cores for either power or die size; we've seen this as the block sizes of the CPUs have varied a lot between the 5410, 5420 and 5422, even though they were on the same process node.

While these figures provide quite a significant power reduction by themselves, they must be put into perspective with what Qualcomm is publishing for their Krait cores. The Snapdragon 805 on a median speed bin at 2.65GHz declares itself with a 965mW power consumption, going down to 57mW at 300MHz. While keeping in mind that these figures ignore L2 cache power consumption as Qualcomm feeds this on a dedicated voltage rail, it still gives us a good representation of how efficient the HPM process is. The highest voltages on the S805 are still lower than the top few frequencies found on both the 5430 and the 5433.

20nm does bring with itself a big improvement in die size. If we take the 5420 as the 28nm comparison part and match it against the 5430, we see a big 45% decrease on the A7 core size, and an even bigger 64% reduction on the A15 core size. The total cluster sizes remain relatively conservative in their scaling while shrinking about 15%; this is due to SRAM in the caches having a lower shrinking factor than pure logic blocks. One must keep in mind that auxiliary logic such as PLLs, bus interfaces, and various other small blocks are part of a CPU cluster and may also impact the effective scalability. Samsung also takes advantage of artificially scaling CPU core sizes to control power consumption, so we might not be looking at an apples-to-apples comparison, especially when considering that the 5430 is employing a newer major IP revision of the CPU cores.

Exynos 5420 vs Exynos 5430 block sizes
	Exynos 5420	Exynos 5430	Scaling Factor
A7 core	0.58mm²	0.4mm²	0.690
A7 cluster	3.8mm²	3.3mm²	0.868
A15 core	2.74mm²	1.67mm²	0.609
A15 cluster	16.49mm²	14.5mm²	0.879

The Mali T628 between the 5420 and the 5430 actually had an increase in die size despite the process shrink, but this is due to a big increase in the cache sizes.

Samsung regards their 20nm node as very short-lived and the 5430 and 5433 look to be the only high volume chips that will be coming out on the process as their attention is focused on shipping 14nm FinFET devices in the next few months. In fact at the Samsung Investor Forum 2014 they announced mass production of a new high-end SoC has already begun mid-November and will be ramping up to full volume in early 2015. I suspect this to be the Exynos 7420 as that is the successor SoC to the 5433.

All in all, the argument that this 20nm chip should be more power efficient than the competitors' 28nm is not completely factual and doesn't seem to hold up in practice. The process still seems young and unoptimized compared to what TSMC offers on 28nm.

Before we get to the performance and power figures, I'm handing things over to Ryan as we take a look at the architectural changes, starting with an analysis of the Cortex A53.

Note 4 with Exynos 5433 - An Overview Cortex A53 - Architecture

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

135 Comments

View All Comments

habbakuk87 - Wednesday, February 11, 2015 - link
I had like to thanks the writers for great in depth article, this is the kind of thing which has kept me coming to this site over the past many years.
Keep up the good work.
Arbie - Wednesday, February 11, 2015 - link
Yes, this is a great article - knowledgeable and in-depth. Work like this is what keeps Anandtech way above the crowd. Thanks.
joe_dude - Wednesday, February 11, 2015 - link
What it looks like to me is that they need to cap 4 big core load to 1.6 GHz, which would keep the thermals under control. Going by the chart, 1 core @ 1.9 GHz, 2 cores @ 1.8 GHz, 3 cores @ 1.7 GHz would work nicely for power consumption/heat as well. It seems Samsung and Qualcomm are setting the max frequency too high. That last 100 to 200 MHz requires a lot more voltage.
joe_dude - Wednesday, February 11, 2015 - link
You look at the power consumption at 1.9 GHz vs. 1.6 GHz - 7.39w vs. 4.44w. That's the crux of the problem right there. 4 cores should not be running at such a high clock speed (and voltage needed to support it).
PrinceGaz - Wednesday, February 11, 2015 - link
"So the question is, is it still worth to try and get an Exynos variant over the Snapdragon one? I definitely think so. In everyday usage the Exynos variant is faster. The small battery disadvantage is more than outweighed by the increased performance of the new ARM cores."

That made me laugh as it is easily the most out of touch comment I've read in an AnandTech review.

It may be true for you as a reviewer sitting at home with the phone plugged into a USB charger running benchmarks that you feel the extra performance is worthwhile, but it will count for nothing when you use it in the real world and find your battery is dead earlier. Almost everyone with a smartphone today would rather have longer battery life instead of higher performance, because just like with PCs for some time now, the performance is already good enough.

Longer battery life trumps extra performance in smartphones now. I don't know why you want to put a positive spin on these higher performance but lower efficiency A57/A53 cores; I doubt ARM are paying you, but it seems they are a step backwards for people who use their phone primarily on the move, which includes most people.
patrickjchase - Wednesday, February 11, 2015 - link
From long experience with a variety of microarchitectures (including ARMs), I would guess that the latency/bandwidth "oddities" reflect differences in hardware prefetch. That's the most logical culprit among the list of changes that ARM provided.

The hypothesis that the capability to dual-issue loads/stores impacts bandwidth to L2 and/or DDR seems questionable, because 4xA7 already has enough ld/st bandwidth to saturate both. Memory-limited code shouldn't see much impact due to issue-rule relaxation.
aryonoco - Wednesday, February 11, 2015 - link
Andrei and Ryan, thank you. I have not been impressed with anything Anandtech has published this much since the Original HTC One M7 review by Brian.

I believe you guys have just published the most thorough, detailed, comprehensive review of every aspect of an ARM SoC. Short of working at a chip maker's lab, I don't think anyone is going to have any better exposure to the ARM ecosystem than what you guys have presented here.

Huge thanks for finally paying attention to the SoCs that don't make it to the North American market. I've been fascinated by their performance and power consumption metrics, it's great to finally have an authoritative view of them. I would love to have your take on some other SoCs in this regard as well, especially Cortex A17. There is not much coverage of Cortex A17 and I think, given the situation with big.LITTLE, a quad core well optimised Cortex A17 might actually be a hidden weapon that no one seems to be using.

Also very much looking forward to your coverage of overclocking/undervolting mobile devices. You guys are truly bringing AnandTech to the mobile industry.

Once again, thank you. To be honest, I've been a bit worried since Anand and Brian's departure about the direction of AT, and it's so good to see it thrive in such capable hands now.
Andrei Frumusanu - Wednesday, February 11, 2015 - link
I'm planning on reviewing the A17, but still in the process of securing a device. Hopefully in the near future.
aryonoco - Wednesday, February 11, 2015 - link
On the subject of Javascript benchmarks and Chrome vs Stock browsers:

Are we sure that all of the difference is indeed due to the optimized libraries that Samsung has developed, and that there is no benchmark-targeting optimization going on? After all, we saw what happend with Sunspider (and thanks for dropping it), it is impossible that they are not targeting Kraken and Octane as well?

Would it be feasible for Anandtech to develop its own proprietary javascript benchmark? It could answer a few of these questions.
tuxRoller - Wednesday, February 11, 2015 - link
This was an excellent read.
Good detail, and enlightening investigation.
With regards to the power aware governor, you have to realize that it's a very hard problem. One that no one has managed, yet (iirc, huawei has claimed that their kernel has such a scheduler, but, you've seen how well it works), for general loads. Yes, there are many ideas but, as you've surmised, board implementations can drastically change assumptions.
BTW, power collapse appears to just be the arm term for the state under which cpuidle takes over for that core. So, it's not actually powered off, thus hotplug would still be needed.
For an overview of the domain see: http://lwn.net/Articles/482344/

Some useful links:
https://rt.wiki.kernel.org/index.php/Energy_Aware_...
https://lkml.org/lkml/2014/5/23/621

ARM A53/A57/T760 investigated - Samsung Galaxy Note 4 Exynos Review

20nm Manufacturing Process

Post Your Comment

135 Comments

View All Comments

habbakuk87 - Wednesday, February 11, 2015 - link

Arbie - Wednesday, February 11, 2015 - link

joe_dude - Wednesday, February 11, 2015 - link

joe_dude - Wednesday, February 11, 2015 - link

PrinceGaz - Wednesday, February 11, 2015 - link

patrickjchase - Wednesday, February 11, 2015 - link

aryonoco - Wednesday, February 11, 2015 - link

Andrei Frumusanu - Wednesday, February 11, 2015 - link

aryonoco - Wednesday, February 11, 2015 - link

tuxRoller - Wednesday, February 11, 2015 - link

Log in

Don't have an account? Sign up now