Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics

Name: Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics
Item: Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics
Author: Dr. Ian Cutress

by Dr. Ian Cutress on September 17, 2019 10:00 AM EST

Posted in
CPUs
AMD
Zen
Turbo Boost
Ryzen
Zen 2

144 Comments | Add A Comment

144 Comments

Defining Turbo, Intel Style

Since 2008, mainstream multi-core x86 processors have come to the market with this notion of ‘turbo’. Turbo allows the processor, where plausible and depending on the design rules, to increase its frequency beyond the number listed on the box. There are tradeoffs, such as Turbo may only work for a limited number of cores, or increased power consumption / decreased efficiency, but ultimately the original goal of Turbo is to offer increased throughput within specifications, and only for limited time periods. With Turbo, users could extract more performance within the physical limits of the silicon as sold.

In the beginning, Turbo was basic. When an operating system requested peak performance from a processor, it would increase the frequency and voltage along a curve within the processor power, current, and thermal limits, or until it hit some other limitation, such as a predefined Turbo frequency look-up table. As Turbo has become more sophisticated, other elements of the design come into play: sustained power, peak power, core count, loaded core count, instruction set, and a system designer’s ability to allow for increased power draw. One laudable goal here was to allow component manufacturers the ability to differentiate their product with better power delivery and tweaked firmwares to give higher performance.

For the last 10 years, we have lived with Intel’s definition of Turbo (or Turbo Boost 2.0, technically) as the defacto understanding of what Turbo is meant to mean. Under this scheme, a processor has a sustained power level, and peak power level, a power budget, and assuming budget is available, the processor will go to a Turbo frequency based on what instructions are being run and how many cores are active. That Turbo frequency is governed by a Turbo table.

The Turbo We All Understand: Intel Turbo

So, for example. I have a hypothetical processor that has a sustained power level (PL1) of 100W. The peak power level (PL2) is 150W*. The budget for this turbo (Tau) is 20 seconds, or the equivalent of 1000 joules of energy (20*(150-100)), which is replenished at a rate of 50 joules per second. This quad core CPU has a base frequency of 3.0 GHz, but offers a single core turbo of 4.0 GHz, and 2-core to 4-core of 3.5 GHz.

So tabulated, our hypothetical processor gets these values:

Sustained Power Level	PL1 / TDP	100 W
Peak Power Level	PL2	150 W
Turbo Window*	Tau	20 s
Total Power Budget*	(150-100) * 20	1000 J
*Turbo Window (and Total Power Budget) is typically defined for a given workload complexity, where 100% is a total power virus. Normally this value is around 95%

*Intel provides ‘suggested’ PL2 values and ‘suggested’ Tau values to motherboard manufacturers. But ultimately these can be changed by the manufacturers – Intel allows their partners to adjust these values without breaking warranty. Intel believes that its manufacturing partners can differentiate their systems with power delivery and other features to allow a fully configurable value of PL2 and Tau. Intel sometimes works with its partners to find the best values. But the take away message about PL2 and Tau is that they are system dependent. You can read more about this in our interview with Intel’s Guy Therien.

Now please note that a workload, even a single thread workload, can be ‘light’ or it can be ‘heavy’. If I created a piece of software that was a never ending while(true) loop with no operations, then the workload would be ‘light’ on the core and not stressing all the parts of the core. A heavy workload might involve trigonometric functions, or some level of instruction-level parallelism that causes more of the core to run at the same time. A ‘heavy’ workload therefore draws more power, even though it is still contained with a single thread.

If I run a light workload that requires a single thread, it will start the processor at 4.0 GHz. If the power of that single thread is below 100W, then I will use none of my budget, as it is refilled immediately. If I then switch to a heavy workload, and the core now consumes 110W, then my 1000 joules of turbo budget would decrease by 10 joules every second. In effect, I would get 100 seconds of turbo on this workload, and when the budget is depleted, the sustained power level (PL1) would kick in and reduce the frequency to ensure that the consumption on the chip stayed at 100W. My budget of energy for turbo would not increase, because the 100 joules/second that is being added is immediately taken away by the heavy workload. This frequency may not be the 3.0 GHz base frequency – it depends on the voltage/power characteristics of the individual chip. That 3.0 GHz base value is the value that Intel guarantees on its hardware – so every one of this hypothetical processor will be a minimum of 3.0 GHz at 100W on a sustained workload.

To clarify, Intel does not guarantee any turbo speed that is part of the specification sheet.

Now with a multithreaded workload, the same thing occurs, but you are more likely to hit both the peak power level (PL2) of 150W, and the 1000 joules of budget will disappear in the 20 seconds listed in the firmware. If the chip, with a 4-core heavy workload, hits the 150W value, the frequency will be decreased to maintain 150W – so as a result we may end up with less than the ‘3.5 GHz’ four-core turbo that was listed on the box, despite being in turbo.

So when a workload is what we call ‘bursty’, with periods of heavy and light work, the turbo budget may be refilled quicker than it is used in light workloads, allowing for more turbo when the workload gets heavy again. This makes it important when benchmarking software one after another – the first run will always have the full turbo budget, but if subsequent runs do not allow the budget to refill, it may get less turbo.

As stated, that turbo power level (PL2) and power budget time (Tau) are configurable by the motherboard manufacturer. We see that on enterprise motherboards, companies often stick to Intel’s recommended settings, but with consumer overclocking motherboards, the turbo power might be 2x-5x higher, and the power budget time might be essentially infinite, allowing for turbo to remain. The manufacturer can do this if they can guarantee that the power delivery to the processor, and the thermal solution, are suitable.

(It should be noted that Intel actually uses a weighted algorithm for its budget calculations, rather than the simplistic view I’ve given here. That means that the data from 2 seconds ago is weighted more than the data from 10 seconds ago when determining how much power budget is left. However, when the power budget time is essentially infinite, as how most consumer motherboards are set today, it doesn’t particularly matter either way given that the CPUs will turbo all the time.)

Ultimately, Intel uses what are called ‘Turbo Tables’ to govern the peak frequency for any given number of cores that are loaded. These tables assume that the processor is under the PL2 value, and there is turbo budget available. For example, here are Intel’s turbo tables for Intel’s 8^th Generation Coffee Lake desktop CPUs.

So Intel provides the sustained power level (PL1, or TDP), the Base frequency (3.70 GHz for the Core i7-8700K), and a range of turbo frequencies based on the core loading, assuming the motherboard manufacturer set PL2 isn’t hit and power budget is available.

The Effect of Intel’s Turbo Regime, and Intel’s Binning

At the time, Intel did a good job in conveying its turbo strategy to the press. It helped that staying on quad-core processors for several generations meant that the actual turbo power consumption of these quad-core chips was actually lower than sustained power value, and so we had a false sense of security that turbo could go on forever. With the benefit of hindsight, the nuances relating to turbo power limits and power budgets were obfuscated, and people ultimately didn’t care on the desktop – all the turbo for all the time was an easy concept to understand.

One other key metric that perhaps went under the radar is how Intel was able to apply its turbo frequencies to the CPU.

For any given CPU, any core within that design could hit the top turbo. It allowed for threads to be loaded onto whatever core was necessary, without the need to micromanage the best thread positioning for the best performance. If Intel stated that the single core turbo frequency was 4.6 GHz, then any core could go up to 4.6 GHz, even if each individual core could go beyond that.

For example, here’s a theoretical six-core Core i5-9600K, with a 3.7 GHz base frequency, and a 4.6 GHz turbo frequency. The higher numbers represent theoretical maximums of each core at the turbo voltage.

This is actually a strategy related to how Intel segments its CPUs after manufacturing, a process called binning. If a processor has the right power/thermal characteristics to reach a given frequency in a given power, then it could be labelled as the most appropriate CPU for retail and sold as such. Because Intel aimed for a homogeneous monolithic design, every core in the design was tested such that it performed equally (or almost equally) with every other core. Invariably some cores will perform better than others, if tweaked to the limits, but under Intel’s regime, it helped Intel to spread the workloads around as to not create thermal hotspots on the processor, and also level out any wear and tear that might be caused over the lifetime of the product. It also meant that in a hypervisor, every virtual machine could experience the same peak frequencies, regardless of the cores they used.

With binning, Intel (or any other company), is selecting a set of voltages and frequencies for a processor to which it is guaranteed. From the manufacturing, Intel (or others) can see the predicted lifespan of a given processor for a range of frequencies and voltages, and the ones that hit the right mark (based on internal requirements) means that a silicon chip ends up as a certain CPU. For example, if a piece of silicon does hit 9900K voltages and frequencies, but the lifespan rating of that piece of silicon is only two years, Intel might knock it down to a 9700K, which gives a predicted lifespan of fifteen years. It’s that sort of thing that determines how high a chip can perform. Obviously chips that can achieve high targets can also be reclassified as slower parts based on inventory levels or demand.

This is how the general public, the enthusiasts, and even the journalists and reviewers covering the market, have viewed Turbo for a long time. It’s a well-known part of the desktop space and to a large extent is easy to understand. If someone said ‘Turbo’ frequency, everyone was agreed on the same basic principles and no explanation was needed. We all assumed that when Turbo was mentioned, this is what they meant, and this is what it would mean for eternity.

Now insert AMD, March 2017, with its new Zen core microarchitecture. Everyone assumed Turbo would work in exactly the same way. It does not.

Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics AMD’s Turbo: Something Different

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

144 Comments

View All Comments

ajlueke - Tuesday, September 17, 2019 - link
More specifically, I was referring to this test from the article.

"Because of the new binning strategy – and despite what some of AMD's poorly executed marketing material has been saying – PBO hasn't been having the same effect, and users are seeing little-to-no benefit. This isn’t because PBO is failing, it’s because the CPU out of the box is already near its peak limits, and AMD’s metrics from manufacturing state that the CPU has a lifespan that AMD is happy with despite being near silicon limits."

What silicon limits exactly? AMDs marketing material has always indicated that a CPU will boost until it reaches either the PPT, TDC, EDC, or thermal limits. If none of those are met, it will boost until Fmax, which it simply will not exceed. Now, in a single threaded workload, the user is almost never at a PPT,TDC, EDC or thermal limit, and seem to be just shy of Fmax anyway. Now, if the user enables the auto-oc feature and extends Fmax by 100, 150 or 200MHz...nothing happens. The identical clockspeed and performance are observed.
I see the same thing happen in multicore on my 3900X. I normally hits the EDC and PPT limits under standard boosting. If I remove them, with precision boost overdrive, it does boost higher, but not by much. It again seems to stop a certain point. Again, EDC, TDC and PPT motherboard limits are not met, I am certainly not at Fmax, and the chip is under 70C, but it stops nonetheless. Nothing I can do makes it boost further.
"The Stilt", seems to mention the silicon fitness monitoring feature (FIT) in his "Matisse Strictly Technical" post on overclock.net. FIT appears to be a specific voltage limit for high and low current the CPU cannot exceed. This has never been included in AMDs documentation, and would help explain why the processor's stop boosting when according to AMD's own documentation, they should keep on going. So what exactly is this feature, and how does it work? I think that answer would do a great deal to alleviate user confusion.
mabellon - Tuesday, September 17, 2019 - link
>> "To a certain extent, Intel already kind of does this with its Turbo Boost Max 3.0 processors... [the] difference between the two companies is that AMD has essentially applied this idea chip-wide and through its product stack, while Intel has not, potentially leaving out-of-the-box performance on the table."

What does this mean? What has Intel not done that AMD has done? Both have variable max frequency per core. Both expose this concept to the OS. Both rely on the same Window scheduler. What are you alluding to is different here?

It seems to me that Intel's HEDT platform with Turbo 3.0 is very much similar to AMD's implementation in the sense of having certain cores run faster. @Ian how is performance left on the table for Intel here? (Intel non HEDT is obviously stuck on Turbo 2.0 which is at a disadvantage)
Targon - Tuesday, September 17, 2019 - link
The majority of Intel chips are multiplier locked, so there isn't any real overclocking ability to speak of. It is only the k chips that users can overclock. AMD on the other hand, has PBO which is more advanced when it comes down to it.
edzieba - Thursday, September 19, 2019 - link
"What does this mean? What has Intel not done that AMD has done?"

Intel picks the maximum 'turbo' bin as the lowest that any core can achieve. AMD picks their maximum boost bin as the highest that any single core could achieve. 'Turbo 3.0' pre-selected two cores that were able to clock above the all-core turbo bin and allowed them to clock higher for lightly threaded workloads.
Jaxidian - Tuesday, September 17, 2019 - link
Is this WSL tool available for us to use? I'd love to have a better view of what speeds my cores could hit with a tool like this. In fact, I'd probably use it to map out all 12 cores (disabling 11 of them at a time). Obviously even that wouldn't quite give the whole picture, but it would be an interesting baseline map to have for my 3900x chip.
Jaxidian - Tuesday, September 17, 2019 - link
I got my "no" answer here: https://twitter.com/IanCutress/status/117401405985...

"It's a custom kludgy thing for internal use."
MFinn3333 - Tuesday, September 17, 2019 - link
I miss the old days when I would just push the Turbo frequency on my 286 and the CPU would go from 10MHz to 12MHz. Sure occassionally chip poppped off from the Glue but it was totally worth it to play Dune 2.
sing_electric - Tuesday, September 17, 2019 - link
"Turbo, in this instance, is aspirational. We typically talk about things like ‘a 4.4 GHz Turbo frequency’, when technically we should be stating ‘up to 4.4 GHz Turbo frequency’."

This is true, but EXACTLY the problem. The marketing teams at AMD, Intel and everyone else KNOW that when you see "3.6 GHz / 4.5 GHZ Turbo" written on a box, your eye falls to the second, larger number, and that's what sticks in your head.

Why should the consumer know that some of the numbers on the box (core count, base freq) are guaranteed, but some (turbo) aren't? That makes no sense and is borderline deceptive. And this doesn't just matter to the fairly small, tech savvy group of people who buy a processor alone in a box - here's how Dell lists the processor on its base config XPS 13 laptop when you go to "Tech Specs & Customization"

"8th Generation Intel® Core™ i5-8265U Processor (6M Cache, up to 3.9 GHz, 4 cores)"

Dell doesn't even bother LISTING the base frequency, even when you click to get more detail - how's a consumer supposed to gauge how fast their processor is? (To their credit, Apple, HP and Lenovo all list base frequency and "up to" the turbo).

Turbo is a great technology for getting the most out of limited silicon, but both AMD and Intel are, while not QUITE being untruthful, certainly trying to put their products in as good of a light as possible.
DigitalFreak - Tuesday, September 17, 2019 - link
That's marketing for you. Step as close to the "deceive the customer" line as possible without getting sued.
Jaxidian - Tuesday, September 17, 2019 - link
I'm looking at the retail box for my 3900x right now. The only thing it says about frequencies is "4.6 GHz Max Boost, 3.8 GHz Base". There is no "up to" verbiage anywhere on the box. From a FTC advertising standpoint, the 4.6GHz should be guaranteed even if only under nuanced "limited single-core" and "with specific but reasonable motherboard, cooling, and software" scenarios.

While this is a very good article and I generally have very few issues with AMD's new approach here, I'm of the belief that legally, a 3900x should be guaranteed to hit 4.6GHz when in a specific-yet-real-world scenario. I don't mean $100 mobos with $25 coolers should be able to hit it. But a better-than-budget x570 motherboard using the stock cooler with proper updates on a supported OS should absolutely hit 4.6GHz with certain loads. Otherwise, I think there's a real legal issue here.

All this said, I am now seeing 4.6GHz from time to time on my 3900x with ABBA on my x570 Aorus Master, so we're good here. Never saw higher than 4.575 before ABBA.

Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics

Defining Turbo, Intel Style

The Turbo We All Understand: Intel Turbo

The Effect of Intel’s Turbo Regime, and Intel’s Binning

Post Your Comment

144 Comments

View All Comments

ajlueke - Tuesday, September 17, 2019 - link

mabellon - Tuesday, September 17, 2019 - link

Targon - Tuesday, September 17, 2019 - link

edzieba - Thursday, September 19, 2019 - link

Jaxidian - Tuesday, September 17, 2019 - link

Jaxidian - Tuesday, September 17, 2019 - link

MFinn3333 - Tuesday, September 17, 2019 - link

sing_electric - Tuesday, September 17, 2019 - link

DigitalFreak - Tuesday, September 17, 2019 - link

Jaxidian - Tuesday, September 17, 2019 - link

Log in

Don't have an account? Sign up now