Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics

Name: Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics
Item: Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics
Author: Dr. Ian Cutress

by Dr. Ian Cutress on September 17, 2019 10:00 AM EST

Posted in
CPUs
AMD
Zen
Turbo Boost
Ryzen
Zen 2

144 Comments | Add A Comment

144 Comments

Detecting Turbo: Microseconds vs. Milliseconds

One of the biggest issues with obtaining frequency data is the actual process of monitoring. While there are some basic OS commands to obtain the frequency, it isn’t just as simple as reading a number.

How To Read A Frequency

As an outlay, we have to differentiate between the frequency of a processor vs. the frequency of a core. On modern PC processors, each core can act independently of each other in terms of frequency and voltage, and so each core can report different numbers. In order to read the value for each core, that core has to be targeted using an affinity mask that binds the reading to a particular core. If a simple ‘what’s the frequency’ request goes out to a processor without an affinity mask, it will return the value of the core to which that thread ends up being assigned. Typically this is the fastest core, but if there is already work being performed on a chip, that thread might end up on an idle core. If a request to find out ‘what is the current frequency of the processor’ is made, users could end up with a number of values: the frequency on a specific core, the frequency of the fastest core, or an average frequency of all the cores. To add more confusion to the matter, if the load on a core is taken into account, depending on the way the request is made, a core running at ‘50%’ load at peak frequency might end up returning a value of half frequency.

There are a multitude of programs that report frequency. Several of the most popular include:

CPU-Z
HWiNFO
Intel XTU
Intel Power Gadget
Ryzen Master
AIDA64

Some of these use similar methods to access frequency values, others have more intricate methods, and then the reporting and logging of each frequency value can have different effects on the system being tested.

I asked one of the main developers of these monitoring tools how they detect the frequency of a core. They gave me a brief overview – it’s not as simple as it turns out.

Know the BCLK (~100 MHz) precisely. Normally this is done my measuring the APIC clock, but on modern systems that use internal clock references (Win 10 1803+) this causes additional interrupt bandwidth, and so often this value is polled rarely and cached.
Detect the CPU Core multiplier by reading a single Model Specific Register based on the CPU. This has to be done in kernel mode, so there is additional overhead switching from user mode to kernel mode and back.
This has to be repeated for each core by using an affinity mask, using a standard Win32 API call of SetCurrentThreadAffinityMask. As this is an API call, there is again additional overhead.

So the frequency of a single core here is measured by the base clock / BCLK and multiplying it by the Core Multiplier as defined in the registers for that core, all through an affinity mask. Typically BCLK is the same across all cores, but even that has some drift and fluctuations over time, so it will depend on how frequently you request that data.

Another alternative method is to apply a simple load – a known array of consistent instructions and to measure the number of cycles / length of time it takes to compute that small array. This method might be considered more accurate by some, but it still requires the appropriate affinity mask to be put in place, and actually puts in additional load to the system, which could cause erroneous readings.

How Quick Can Turbo Occur

Modern processors typically Turbo anywhere from 4 GHz to 5 GHz, or four to five billion cycles a second. That means each cycle at 5 GHz is equal to 0.2 nanoseconds, or 0.2 x 10^-9 seconds. These processors don’t stay at that frequency – they adjust the frequency up or down based on the load requests, which helps manage power and heat. How quickly a processor can respond to these requests for a higher frequency has become a battleground in recent years.

How a processor manages its frequency all comes down to how it interacts with the operating system. In a pre-Skylake world, a processor would have a number of pre-defined ACPI power states, relating to performance (P), Device (D), and processor (C), based on if the processor was on, in sleep, or needed high frequency. P-states relied on a voltage-frequency scaling, and the OS could control P0 to P1 to P2 and beyond, with P1 being the guaranteed base frequency and any higher P number being OS controlled. The OS could request P0, which enabled the processor to enter boost mode. All of this would go through a set of OS drivers relating to power and frequency control; this came to be known as SpeedStep for Intel, and Cool’n’Quiet for AMD.

As defined in the ACPI specifications, with the introduction of UEFI control came CPPC, or Collaborative Processor Performance Control. Requiring CPU and OS support, with Skylake we saw Intel and Microsoft introduced a new ‘Speed Shift’ feature that put the control of the frequency modes of the processor back in the hands of the processor – the CPU could directly respond to the instruction density coming into the core and modify the frequency directly without additional commands. The end result of CPPC, and Speed Shift for Intel, was a much faster frequency response mechanism.

With Speed Shift in Skylake, on Windows, Intel was promoting that before Speed Shift they were changing frequency anywhere up to 100 milliseconds (0.1 s) after the request was made. With Speed Shift, that had come down to the 35 millisecond mark, around a 50-66% improvement. With subsequent updates to the Skylake architecture and the driver stack, Intel states that this has improved further.

Users can detect to see if CPPC is enabled on their Intel system very easily. By going to the Event Viewer, selecting Window Logs -> System, and then going to a time stamp where the machine was last rebooted, we can see ACPI CPPC listed under the Kernel-Processor-Power source.

For my Core i7-8565U Whiskey Lake CPU, it shows that APCI CPPC is enabled, and that my CPU Core 5 is running at 2.0 GHz base with a 230% peak turbo, or 4.6 GHz, which relates to the single-core turbo frequency of my processor.

For AMD, with Zen 2, the company announced the use of CPPC2 in collaboration with Microsoft. This is CPPC but with a few extra additional tweaks to the driver stack for when an AMD processor is detected.

Here AMD is claiming that they can change frequency, when using the Windows 10 May 2019 update or newer, on the scale of 1-2 ms, compared to 30 ms with the standard CPPC interface. This comes down to how AMD has implemented its ‘CPPC2’ model, with a series of shim drivers in place to help speed the process along. If we go back to how we can detect that CPPC mode similar to Intel, we see a subtle difference:

Ryzen 7 3700X

Notice here it doesn’t say CPPC2, just CPPC. What does display is the 3600 MHz base frequency of our 3700X, and a maximum performance percentage of 145%, which would make the peak turbo of this processor somewhere near 5220 MHz. Clearly that isn’t the peak turbo of this CPU (which would be 4400 MHz), which means that AMD is using this artificially high value combined with its CPPC driver updates to help drive a faster frequency response time.

The Observer Effect

Depending on the software being used, and the way it calculates the current frequency of any given core/processor, we could end up artificially loading the system, because as explained above it is not as simple as just reading a number – extra calculations have to be made or API calls have to be driven. The more frequently the user tries to detect the frequency, the more artificial load is generated on the core, and at some point the system will detect this as requiring a different frequency, making the readings change.

This is called the observer effect. And it is quite easy to see it in action.

For any tool that allows the user to change the polling frequency, as the user changes that frequency from once per second to ten times per second, then 100 times per second, or 1000 times per second, even on a completely idle system, some spikes will be drawn – more if the results are being logged to memory or a data file.

Therein lies the crutch of frequency reporting. Ultimately we need the polling frequency to be quick enough to capture all the different frequency changes, but we don’t want it interfering with the measurement. Combined with CPPC, this can make detecting certain peak frequencies particularly annoying.

Let’s go back to our time scales for instructions and frequency changes. At 4 GHz, we can break down the following:

Time Scales at 4 GHz
AnandTech	Time	Unit
One Cycle	`0.00000000025`	s
Simple Loop (1000 cycles)	`0.0000025`	s
CPPC Frequency Change (AMD)	`0.002`	s
Frequency Polling	`0.1`	s

Note that a frequency change is the equivalent to losing around 800,000 cycles at 4 GHz, so the CPU has to gauge to what point the frequency change is worth it based on the instructions flowing into the core.

But what this does tell is one of the inherent flaws in frequency monitoring – if a CPU can change frequency as quickly as every 1-2 ms, but we can only poll at around 50-100 ms, then we can miss some turbo values. If a processor quickly fires up to a peak turbo, processes a few instructions, and then drops down almost immediately due to power/frequency requirements not being met for the incoming instruction stream, it won’t ever be seen by the frequency polling software. If the requirements are met of course, then we do see the turbo frequency – the value we end up seeing is the one that the system feels is more long-term suitable.

With an attempt at sub-1ms polling time, we can see this in effect. The blue line shows the Ryzen processor in a balanced power configuration, and at around 3.6 milliseconds the 3700X jumps up to 4350-4400 MHz, bouncing around between the two. But by 4.6 milliseconds, we have already jumped down to 4.3 GHz, then at 5.2 milliseconds we are at 4.2 GHz.

We were able to obtain this data using Windows Subsystem for Linux, using an add-dependency chain from which we derive the frequency based on the throughput. There is no observer effect here because it is the workload – not something that can be done when an external workload is used. It gives us a resolution of around 40 microseconds, and relies on the scheduler automatically assigning the thread to the best core.

But simply put, unless a user is polling this quick, the user will not see the momentary peaks in turbo frequency if they are on the boundary of supporting it. The downside of this is that polling this quick puts an artificial load on the system, and means any concurrent running benchmark will be inadequate.

(For users wondering what that orange line is, that would be the processor in ‘performance mode’, which gives a higher tolerance for turbo.)

It all leads to a question – if a core hits a turbo frequency but you are unable to detect it, does that count?

Ultimately, by opting for a more aggressive binning strategy so close to silicon limits, AMD has reached a point where, depending on the workload and the environment, a desktop CPU might only sustain a top Turbo bins momentarily. Like Turbo itself, this is not a bad thing, as it extracts more performance from their processors that would otherwise be left on the table by lower clockspeeds. But compared to Intel’s processors and what we’re used to, these highest bins require more platform management to ensure that the processor is indeed reaching its full potential.

AMD’s Turbo Issue (Abridged) AMD Found An Issue, for +25-50 MHz

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

144 Comments

View All Comments

ajlueke - Tuesday, September 17, 2019 - link
More specifically, I was referring to this test from the article.

"Because of the new binning strategy – and despite what some of AMD's poorly executed marketing material has been saying – PBO hasn't been having the same effect, and users are seeing little-to-no benefit. This isn’t because PBO is failing, it’s because the CPU out of the box is already near its peak limits, and AMD’s metrics from manufacturing state that the CPU has a lifespan that AMD is happy with despite being near silicon limits."

What silicon limits exactly? AMDs marketing material has always indicated that a CPU will boost until it reaches either the PPT, TDC, EDC, or thermal limits. If none of those are met, it will boost until Fmax, which it simply will not exceed. Now, in a single threaded workload, the user is almost never at a PPT,TDC, EDC or thermal limit, and seem to be just shy of Fmax anyway. Now, if the user enables the auto-oc feature and extends Fmax by 100, 150 or 200MHz...nothing happens. The identical clockspeed and performance are observed.
I see the same thing happen in multicore on my 3900X. I normally hits the EDC and PPT limits under standard boosting. If I remove them, with precision boost overdrive, it does boost higher, but not by much. It again seems to stop a certain point. Again, EDC, TDC and PPT motherboard limits are not met, I am certainly not at Fmax, and the chip is under 70C, but it stops nonetheless. Nothing I can do makes it boost further.
"The Stilt", seems to mention the silicon fitness monitoring feature (FIT) in his "Matisse Strictly Technical" post on overclock.net. FIT appears to be a specific voltage limit for high and low current the CPU cannot exceed. This has never been included in AMDs documentation, and would help explain why the processor's stop boosting when according to AMD's own documentation, they should keep on going. So what exactly is this feature, and how does it work? I think that answer would do a great deal to alleviate user confusion.
mabellon - Tuesday, September 17, 2019 - link
>> "To a certain extent, Intel already kind of does this with its Turbo Boost Max 3.0 processors... [the] difference between the two companies is that AMD has essentially applied this idea chip-wide and through its product stack, while Intel has not, potentially leaving out-of-the-box performance on the table."

What does this mean? What has Intel not done that AMD has done? Both have variable max frequency per core. Both expose this concept to the OS. Both rely on the same Window scheduler. What are you alluding to is different here?

It seems to me that Intel's HEDT platform with Turbo 3.0 is very much similar to AMD's implementation in the sense of having certain cores run faster. @Ian how is performance left on the table for Intel here? (Intel non HEDT is obviously stuck on Turbo 2.0 which is at a disadvantage)
Targon - Tuesday, September 17, 2019 - link
The majority of Intel chips are multiplier locked, so there isn't any real overclocking ability to speak of. It is only the k chips that users can overclock. AMD on the other hand, has PBO which is more advanced when it comes down to it.
edzieba - Thursday, September 19, 2019 - link
"What does this mean? What has Intel not done that AMD has done?"

Intel picks the maximum 'turbo' bin as the lowest that any core can achieve. AMD picks their maximum boost bin as the highest that any single core could achieve. 'Turbo 3.0' pre-selected two cores that were able to clock above the all-core turbo bin and allowed them to clock higher for lightly threaded workloads.
Jaxidian - Tuesday, September 17, 2019 - link
Is this WSL tool available for us to use? I'd love to have a better view of what speeds my cores could hit with a tool like this. In fact, I'd probably use it to map out all 12 cores (disabling 11 of them at a time). Obviously even that wouldn't quite give the whole picture, but it would be an interesting baseline map to have for my 3900x chip.
Jaxidian - Tuesday, September 17, 2019 - link
I got my "no" answer here: https://twitter.com/IanCutress/status/117401405985...

"It's a custom kludgy thing for internal use."
MFinn3333 - Tuesday, September 17, 2019 - link
I miss the old days when I would just push the Turbo frequency on my 286 and the CPU would go from 10MHz to 12MHz. Sure occassionally chip poppped off from the Glue but it was totally worth it to play Dune 2.
sing_electric - Tuesday, September 17, 2019 - link
"Turbo, in this instance, is aspirational. We typically talk about things like ‘a 4.4 GHz Turbo frequency’, when technically we should be stating ‘up to 4.4 GHz Turbo frequency’."

This is true, but EXACTLY the problem. The marketing teams at AMD, Intel and everyone else KNOW that when you see "3.6 GHz / 4.5 GHZ Turbo" written on a box, your eye falls to the second, larger number, and that's what sticks in your head.

Why should the consumer know that some of the numbers on the box (core count, base freq) are guaranteed, but some (turbo) aren't? That makes no sense and is borderline deceptive. And this doesn't just matter to the fairly small, tech savvy group of people who buy a processor alone in a box - here's how Dell lists the processor on its base config XPS 13 laptop when you go to "Tech Specs & Customization"

"8th Generation Intel® Core™ i5-8265U Processor (6M Cache, up to 3.9 GHz, 4 cores)"

Dell doesn't even bother LISTING the base frequency, even when you click to get more detail - how's a consumer supposed to gauge how fast their processor is? (To their credit, Apple, HP and Lenovo all list base frequency and "up to" the turbo).

Turbo is a great technology for getting the most out of limited silicon, but both AMD and Intel are, while not QUITE being untruthful, certainly trying to put their products in as good of a light as possible.
DigitalFreak - Tuesday, September 17, 2019 - link
That's marketing for you. Step as close to the "deceive the customer" line as possible without getting sued.
Jaxidian - Tuesday, September 17, 2019 - link
I'm looking at the retail box for my 3900x right now. The only thing it says about frequencies is "4.6 GHz Max Boost, 3.8 GHz Base". There is no "up to" verbiage anywhere on the box. From a FTC advertising standpoint, the 4.6GHz should be guaranteed even if only under nuanced "limited single-core" and "with specific but reasonable motherboard, cooling, and software" scenarios.

While this is a very good article and I generally have very few issues with AMD's new approach here, I'm of the belief that legally, a 3900x should be guaranteed to hit 4.6GHz when in a specific-yet-real-world scenario. I don't mean $100 mobos with $25 coolers should be able to hit it. But a better-than-budget x570 motherboard using the stock cooler with proper updates on a supported OS should absolutely hit 4.6GHz with certain loads. Otherwise, I think there's a real legal issue here.

All this said, I am now seeing 4.6GHz from time to time on my 3900x with ABBA on my x570 Aorus Master, so we're good here. Never saw higher than 4.575 before ABBA.

Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics

Detecting Turbo: Microseconds vs. Milliseconds

How To Read A Frequency

How Quick Can Turbo Occur

The Observer Effect

Post Your Comment

144 Comments

View All Comments

ajlueke - Tuesday, September 17, 2019 - link

mabellon - Tuesday, September 17, 2019 - link

Targon - Tuesday, September 17, 2019 - link

edzieba - Thursday, September 19, 2019 - link

Jaxidian - Tuesday, September 17, 2019 - link

Jaxidian - Tuesday, September 17, 2019 - link

MFinn3333 - Tuesday, September 17, 2019 - link

sing_electric - Tuesday, September 17, 2019 - link

DigitalFreak - Tuesday, September 17, 2019 - link

Jaxidian - Tuesday, September 17, 2019 - link

Log in

Don't have an account? Sign up now