AMD Launches Carrizo: The Laptop Leap of Efficiency and Architecture Updates

Author: Dr. Ian Cutress

by Ian Cutress on June 2, 2015 9:00 PM EST

Power Saving and Power Consumption

When it comes to power, Carrizo features a few technologies worth discussing. The first is the use of low power states and the different frequency domains within the SoC. Previous designs had relatively few power planes, which left fewer opportunities for the SoC to power down areas not in use. Carrizo has ten power planes that can be controlled at run-time, allowing for what can be described as a dynamic race to sleep. This is bundled with access to the S0i3 power state, giving sub-50 mW SoC power draw when in sleep and wake-up times under a second.

This is also combined with automated voltage/frequency sensors, of which each Excavator core has ten. These sensors take into account the instructions being processed, the temperature of the SoC, the quality of power delivery, and the voltage and frequency at that point, in order to relay information about how the system should adjust for the optimal power or performance point.

AMD states that this gives them the ability to shift the frequency/power curve further to the right on a per-module basis, providing another reduction in power or increase in frequency as required.

Next up for discussion is the voltage adaptive operation that was introduced back in Kaveri. I want to mention it here again because when it was first announced, I thought I understood it well enough to write about it. Having since come across another explanation of the feature by David Kanter, the reason for doing it finally clicked. I’m not going to steal his thunder, and I suggest you read his coverage for more detail, but the concept is this:

When a processor does work, it draws power. The system has to be in a position to provide that power, and it acts to restabilize the supply while the processor is performing work. The work being done causes the voltage across the processor to drop, which is classically called voltage droop. As long as the droop does not take the voltage below the minimum required for operation, all is good. This holds if the supply of power is consistent, although that cannot always be guaranteed: the CPU manufacturer does not have control over the quality of the motherboard, the power supply or the power conversion at hand. Inconsistency shows up as ripple in the supplied power, and the CPU has to be able to cope with it, because ripple combined with a processor doing work could cause the voltage to drop below the threshold.

The easiest way to cope is to set the voltage of the processor higher to begin with, so it can withstand a bigger drop. This doesn’t work well in mobile, as more voltage results in a bigger power draw and a worse experience. There are other potential solutions, which Kanter outlines in his piece.
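To put a rough number on why extra voltage margin is expensive, here is a minimal sketch using the textbook approximation that dynamic power scales with C·V²·f. That model, the function name and the example margins are assumptions for illustration and are not figures from AMD.

```python
def dynamic_power_ratio(voltage_ratio: float, frequency_ratio: float = 1.0) -> float:
    """Relative dynamic power under the textbook P ~ C * V^2 * f approximation.

    Capacitance is assumed constant, so it cancels out of the ratio.
    Leakage power is ignored entirely.
    """
    return voltage_ratio ** 2 * frequency_ratio


# Hypothetical guard-band margins added on top of the nominal voltage.
for margin in (0.00, 0.05, 0.10):
    ratio = dynamic_power_ratio(1.0 + margin)
    print(f"+{margin:.0%} voltage margin -> {ratio - 1.0:+.1%} dynamic power")
```

Under this simplified model, a 5% voltage guard band costs roughly 10% more dynamic power at the same frequency, which is why a fixed margin is unattractive in a power-limited laptop part.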

The way AMD has tackled the problem is to get the processor to respond directly. When the voltage drops below a threshold value, the system will reduce the frequency and the voltage of the processor by around 5%, causing the work being done to slow down and not draw as much. At AMD’s Tech Day, they said this happens in as few as 3 cycles from detection, or in under a nanosecond. When the voltage has recovered (i.e. the power delivery is back at a more tolerable level), the frequency is cranked back up and work can continue at a normal rate.
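The behaviour described above can be sketched as a simple threshold controller. This is a toy model and not AMD's implementation: the class name, the 1.00 V threshold and the voltage trace are hypothetical, only the frequency is modelled, and the real hardware reacts within a few clock cycles, not once per sample.

```python
from dataclasses import dataclass


@dataclass
class DroopResponder:
    """Toy model of a droop-triggered frequency reduction."""

    nominal_ghz: float = 3.5   # example clock speed quoted in the article
    threshold_v: float = 1.00  # hypothetical droop-detection threshold
    reduction: float = 0.05    # "around 5%" frequency reduction
    throttled: bool = False

    def step(self, sensed_v: float) -> float:
        """Return the clock (GHz) to run at for one sensed voltage sample."""
        # Throttle while the sensed voltage is below the threshold,
        # and return to the nominal clock as soon as it recovers.
        self.throttled = sensed_v < self.threshold_v
        if self.throttled:
            return self.nominal_ghz * (1.0 - self.reduction)
        return self.nominal_ghz


def run(trace):
    """Feed a sequence of sensed voltages through a fresh responder."""
    responder = DroopResponder()
    return [responder.step(v) for v in trace]


# Hypothetical trace: stable supply, a brief droop, then recovery.
print(run([1.05, 1.04, 0.98, 0.99, 1.03]))
```

A real design would likely add some hysteresis so the clock does not oscillate around the threshold; that is left out here to keep the sketch short.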

Obviously the level of the threshold and the size of the frequency drop will determine how much time is spent in this lower frequency state. We were told that with the settings used in Carrizo, the CPU hits this state less than 1% of the time, but it accounts for a sizeable chunk of overall average power reduction for a 3.5 GHz processor. This may sound odd, but it makes sense when you consider that the top 5% of the frequency range is more costly in terms of power than any other 5%. By removing that 5% of extreme power draw for a minimal performance loss (a 5% frequency loss for under 1% of the time), it saves enough power to be worthwhile.
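The figures quoted here are easy to sanity-check. The sketch below assumes the 3-cycle response is measured at the 3.5 GHz example clock and treats "less than 1% of the time" as an upper bound of exactly 1%. The helper names are hypothetical.

```python
def detection_latency_ns(cycles: int, clock_ghz: float) -> float:
    """Time taken by a number of clock cycles, in nanoseconds.

    One cycle at f GHz lasts 1/f ns, so n cycles last n/f ns.
    """
    return cycles / clock_ghz


def average_frequency_loss(reduction: float, time_fraction: float) -> float:
    """Time-averaged fractional frequency loss from occasional throttling."""
    return reduction * time_fraction


latency = detection_latency_ns(3, 3.5)     # about 0.857 ns, under a nanosecond
loss = average_frequency_loss(0.05, 0.01)  # 0.0005, i.e. 0.05% on average

print(f"3 cycles at 3.5 GHz: {latency:.3f} ns")
print(f"Average frequency loss: {loss:.2%}")
```

So the worst-case average frequency penalty is on the order of 0.05%, which is the sense in which the performance loss is minimal.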

137 Comments

name99 - Saturday, June 6, 2015 - link
You are comparing a $400 laptop to a $1500 laptop and, what do you know, the $1500 laptop comes out better. What a surprise!

The point is that in this space batteries have long been cheap and the energy efficiency nothing like at the higher end. Which means the work-life has been something like 3 hrs. If AMD shifts that to six hours with this chip, that's a massive improvement in the target space.

You're also making bad assumptions about why these laptops are bought. If you rely on your laptop heavily for your job, you buy a $1500 laptop. These machines are bought to act as light performance desk machines that are occasionally (but only occasionally) taken to a conference room or on a field trip.
name99 - Saturday, June 6, 2015 - link
AMD does not have infinite resources. This play makes sense.
Intel is essentially operating by starting with a Xeon design point and progressively stripping things out to get to Broadwell-M, which means that Broadwell-M over-supplies this $400-$700 market. Meanwhile at the really low end, Intel has Atom.

AMD is seeing (correctly, I think) that there is something of a gap in the Intel line which they can cover AND that this gap will probably persist for some time --- Intel isn't going to create a third line just to fit that gap.
Krysto - Wednesday, June 3, 2015 - link
I might be ready to get into AMD, as AMD has a lot of innovation lately. But it still disappoints me greatly that they aren't able to adopt a more modern process node.

If they launch their new high-performance CPU core next year as part of an APU that uses HBM memory and is at the very least on 16nm FinFET, I might get that instead of a Skylake laptop. HSA is pretty cool and one of the reasons I'd get it.
UtilityMax - Wednesday, June 3, 2015 - link
The Kaveri FX parts are still almost half as slow in IPC as a competing Intel Core i3 with the same TDP. Only in tests involving multithreaded apps that can load all four cores the FX parts are keeping up with the Core i3. Let's hope the Carrizo generation of APUs will improve this situation.
silverblue - Thursday, June 4, 2015 - link
Without being an AMD apologist, I think the point was that single threaded performance was "good enough" for your usual light work which tends to be hamstrung by I/O anyway.

There are two things that I need to see clarified about Carrizo, however:

1) Does Carrizo drop CPU frequency automatically when the GPU is being taxed? That's certainly going to be an issue as regards the comparison with an i3.
2) With the addition of AVX2, were there any architectural changes made to accommodate AVX2, for example a wider FlexFPU?
sonicmerlin - Tuesday, June 9, 2015 - link
Yup. I'll wait for the 14 nm Zen APUs with HBM. The performance leap (both CPU and GPU) should be truly massive.
Phartindust - Thursday, June 4, 2015 - link
Dude your gettin a Dell with a AMD processor!
When was the last time that happened?
Looks like @Dell loves #Carrizo, and will use @AMD once again. #AMDRTP http://www.cnet.com/au/news/dell-inspirion-amd-car... …
elabdump - Friday, June 5, 2015 - link
Don't forget that Intel gives you a non-fixable NSA-approved BIOS: http://mjg59.dreamwidth.org/33981.html
patrickjchase - Friday, June 5, 2015 - link
Ian, you appear to have confused I-cache and D-cache.

You wrote: "The L1 data cache is also now an 8-way associative design, but with the better branch prediction when needed it will only activate the one segment required and when possible power down the rest".

This is of course gibberish. Branch prediction would help to predict the target set of an *instruction* fetch from the I-cache, but is useless for D-cache set prediction for the most part (I say "for the most part" because Brad Calder did publish a way-prediction scheme based on instruction address back in the 90s. It didn't work very well and hasn't been productized that I know of).
zodiacfml - Friday, June 5, 2015 - link
Imagine what they could do with a 14nm version of this, probably at half the cost of a Core M with 60 to 70% of the M's CPU performance, yet with better graphics at the same TDP.
