AMD’s Turbo Issue (Abridged)

So why all this talk about how each company does its Turbo functionality, as well as its binning strategy? I prepped this story with the differences because a lot of our user base still thinks in terms of Intel’s way of doing things. Now that AMD is in the game and carving its own path, it’s important to understand AMD’s strategy in the context of the products coming out.

With that in mind, let’s cover AMD’s recent news.

Ever since the launch of Zen 2 and the Ryzen 3000 series CPUs, AMD has, as usual, advertised core counts, base frequency, TDP, and turbo frequencies. Since the initial launch day reviews, and on through public availability, groups of enthusiast users looking to get the most out of their shiny new hardware have reported that their processors are not hitting the advertised turbo frequencies.

If a processor had a guaranteed 4.4 GHz turbo frequency, users were complaining that the peak turbo frequencies they observed were 25-100 MHz lower, or in some cases more than 100 MHz down on what was advertised. This kind of frequency shortfall was being loosely reported throughout the ecosystem, but no-one particularly acted on it until these past few weeks, around the same time that AMD had several other news stories going on.

Multiple outlets, such as Hardware Unboxed, noted that the frequencies they were seeing significantly depended on the motherboard used. Hardware Unboxed tested 14 different AMD AM4 motherboards (some X570, others not X570) with a Ryzen 7 3800X, expecting a peak turbo frequency during Cinebench R20 of 4.5 GHz. Only one motherboard was consistent across most CPUs, while a few others were hit and miss.

This obviously plays into some reasoning that the turbo is motherboard dependent, all else being equal. It should be noted that there’s no guarantee that all these motherboards, despite being on the latest BIOSes, actually had AMD’s latest firmware versions in place.

Another outlet, Gamers Nexus, also observed that they could guarantee a CPU would hit its rated turbo speeds when the system was kept cold, using either chilled water or a sub-zero cooling environment. This would ultimately lead some to believe that the problem relates to a thermal capacity issue within the motherboard, CPU, or power delivery.

The Stilt, a popular user commonly associated with AMD’s hardware and its foibles, posted on 8/12 that AMD had reduced its peak temperature value for the Ryzen 3000 CPUs, and had introduced a middle temperature value to help guide the turbo. These values would be part of the SMU, or System Management Unit, that helps control turbo functionality.

Peter Tan, aka Shamino, a world renowned (retired) overclocker and senior engineer for ASUS’ motherboard division, acknowledged the issue in a forum post on 8/22 with his own take on the matter. He stated that AMD’s initial turbo boost behavior was ultimately too aggressive, and that the current boost behavior is in line with what AMD needs in order to ensure the longevity of the chip.

It should be stated that Shamino is speaking here in his personal capacity.

For those not ingrained in the minutiae of forum life, the biggest spotlight on this issue came from Roman Hartung, or Der8auer, through his YouTube channel. He enlisted the help of his audience to tabulate what frequencies users were getting.

In the survey, the following details were requested:

  • CPU
  • Motherboard
  • AGESA version/BIOS version
  • PBO disabled
  • Air cooling

Now obviously when it was announced that this survey was going to happen, Roman and AMD discussed the pros and cons of the survey behind the scenes. As you might expect, AMD had reservations about whether this survey was going to be fair in any way – it’s about as unscientific as you can get. Naturally Roman argued that these would be real world results from users’ machines, rather than in-lab results, and that AMD should be guaranteeing specific frequency values to users on their home machines.

AMD also pointed out that this sort of survey has an inherent selection bias: users who feel negatively impacted by any issue (through AMD’s fault or the user’s own) are more likely to respond than those who are happy with the performance. Roman agreed that this would be a concern, but still highlighted the fact that users shouldn’t be having these issues in the first place.

AMD also mentioned that the Windows version couldn’t be controlled, to which Roman argued that if turbo is only valid for a certain Windows version, then it’s not fair to promote it. He did concede, however, that the best performance comes with the latest version of Windows 10, and that users on Windows 7 will have to accept some level of reduced performance.

Roman and AMD did at least agree on a testing scenario in order to standardize the reporting. Based on AMD’s recommendations, Roman asked his audience to use Cinebench R15 as a single threaded load and HWiNFO as the reporting tool, set to a 500 millisecond (0.5 second) polling interval, and to list the peak frequency reported for the CPU.

The survey ended up with ~3300 valid submissions. Roman checked them one-by-one to make sure that all the data was present and that the screenshots showed the right values, and he removed any data points that didn’t pass the testing conditions (such as PBO enabled). The results are explained in Roman’s video, which is well worth a look. I’ve summarized the data for each CPU here.

Der8auer's Ryzen Turbo Survey Results
AnandTech                    3600    3600X   3700X   3800X   3900X
Rated Turbo (MHz)            4200    4400    4400    4500    4600
Average Survey (MHz)         4158    4320    4345    4450    4475
Mode Survey (MHz)*           4200    4350    4375    4475    4525
Total Results Submitted      568     190     1087    159     722
# Results Minus Outliers     542     180     1036    150     685
Results >= Rated Turbo       272     17      153     39      38
% Results >= Rated Turbo     50%     9%      15%     26%     6%
*Mode = most frequent result

I have corrected a couple of Roman’s calculations based on the video data, but they were minor changes.

For each CPU, we have the listed turbo frequency, the average turbo frequency from the survey, and the modal CPU frequency (i.e. the most frequently reported frequency). Beyond this, the number of users that reported a frequency equal to the turbo frequency or higher is listed as a percentage.

On the positive side, the modal reported CPU frequency for almost all chips (except the 3900X) is relatively close to the rated value, showing that most users are within 25-50 MHz of the advertised peak turbo frequency. The downside is that the actual number of users achieving the rated turbo is quite low. Aside from the Ryzen 5 3600, at 50%, all the other CPUs struggle to reach the turbo speeds printed on the box.
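To make the three summary rows concrete, here is a minimal Python sketch of how an average, a mode, and a share of results at or above the rated turbo could be computed from a list of reported peak clocks. The readings below are made-up illustrative values, not data from Roman’s survey, and the `summarize` helper is a hypothetical name rather than anything Roman used.

```python
from statistics import mean, multimode


def summarize(peaks_mhz, rated_mhz):
    """Summarize reported peak clocks for one CPU model against its rated turbo."""
    at_or_above = sum(1 for p in peaks_mhz if p >= rated_mhz)
    return {
        "average_mhz": round(mean(peaks_mhz)),
        # multimode returns every value tied for most frequent
        "mode_mhz": multimode(peaks_mhz),
        "at_or_above": at_or_above,
        "share_pct": round(100 * at_or_above / len(peaks_mhz), 1),
    }


# Hypothetical peak readings (MHz) for a chip rated at 4400 MHz
sample = [4400, 4375, 4375, 4350, 4375, 4400, 4325, 4350, 4375, 4325]
summary = summarize(sample, rated_mhz=4400)
print(summary)
```

In this invented sample the mode sits only 25 MHz under the rated value, yet just two of the ten readings reach it, which is the same pattern the survey table shows: a mode close to the rated clock can coexist with a small share of chips actually hitting it.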

As you might imagine, this data caused quite a stir in the community, and a number of vocal users who had invested hard earned money into their systems were agonizingly frustrated that they were not seeing the numbers that the box promised.

Before covering AMD’s response, I want to discuss frequency monitoring tools, turbo times, and the inherent issues with the Observer Effect. There’s also the issue of how long turbo needs to be active for it to count (or even register in software).
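As a rough illustration of why polling matters, the Python sketch below builds a synthetic per-millisecond frequency trace with a few short bursts and compares what a tool would see at different polling intervals. Everything in it is an assumption for illustration: the 4300/4400 MHz values, the 20 ms burst length, the burst timings, and the helper names are invented, and it does not model how HWiNFO or the processor actually behave.

```python
def build_trace(duration_ms, base_mhz, burst_mhz, burst_starts_ms, burst_len_ms):
    """Return a per-millisecond frequency trace with short bursts above base."""
    trace = [base_mhz] * duration_ms
    for start in burst_starts_ms:
        for t in range(start, min(start + burst_len_ms, duration_ms)):
            trace[t] = burst_mhz
    return trace


def polled_peak(trace, interval_ms, offset_ms=0):
    """Peak value seen by a tool that reads the trace once every interval_ms."""
    return max(trace[offset_ms::interval_ms])


# Hypothetical 10 second trace: 4300 MHz with three 20 ms bursts to 4400 MHz
trace = build_trace(10_000, 4300, 4400, [120, 2730, 6410], 20)

true_peak = max(trace)              # what the core actually reached
seen_500 = polled_peak(trace, 500)  # a 500 ms poll, as used in the survey
seen_10 = polled_peak(trace, 10)    # a much faster 10 ms poll

print(true_peak, seen_500, seen_10)
```

In this contrived trace none of the bursts lines up with a 500 ms sample, so the slow poll reports 4300 MHz even though the core reached 4400 MHz, while the 10 ms poll catches it. Real tools also perturb the system they are measuring, which this sketch ignores.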


144 Comments

  • Smell This - Wednesday, September 18, 2019


    { s-n-i-c-k-e-r }
  • BurntMyBacon - Wednesday, September 18, 2019

    Electron migration is generally considered to be the result of momentum transfer from the electrons, which move in the applied electric field, to the ions which make up the lattice of the interconnect material.

    Intuitively speaking, raising the frequency would proportionally increase the number of pulses over a given time, but the momentum (number of electrons) transferred per pulse would remain the same. Conversely, raising the voltage would proportionally increase the momentum (number of electrons) per pulse, but not the number of pulses over a given time. To make an analogy, raising the frequency is like moving your sandpaper faster while raising your voltage is like using coarser grit sandpaper at the same speed.

    You might assume that if the total number of electrons is the same, then the wear will be the same. However, there is a certain amount of force required to dislodge an atom (or multiple atoms) from the interconnect material lattice. Though the concept is different, you can simplistically think of it like stationary friction. Increasing the voltage increases the force (momentum) from each pulse which could overcome this resistance where nominal voltages may not be enough. Also, increasing voltage has a larger effect on heat produced than increasing frequency. Adding heat energy into the system may lower the required force to dislodge the atom(s). If the nominal voltage is unable or only intermittently able to exceed the required force, then raising the frequency will have little effect compared to raising the voltage. That said, continuous strain will probably weaken the resistance over time, but it is likely that this is still less significant than increasing voltage. Based on this, I would expect (read my opinion) four things:
    1) Electron migration becomes exponentially worse the farther you exceed specifications (Though depending on where your initial durability is it may not be problematic)
    2) The rate of electron migration is not constant. Holding all variables constant, it likely increases over time. That said, there are likely a lot of process specific variables that determine how quickly the rate increases.
    3) Increasing voltage has a greater effect on electron migration than frequency. Increasing frequency alone may be considered far more affordable from a durability standpoint than increases that require significant voltage.
    4) Up to a point, better cooling will likely reduce electron migration. We are already aware that increased heat physically expands the different materials in the semiconductor at different rates. It is likely that increased heat energy in the system also makes it easier to dislodge atoms from their lattice. Reducing this heat build-up should lessen the effect here.

    Some or all of these may be partially or fully incorrect, but this is where my out of date intuition from limited experience in silicon fabrication takes me.
  • eastcoast_pete - Wednesday, September 18, 2019

    Thanks Ian! And, as mentioned, would also like to hear from you or Ryan on the same for GPUs. With lots of former cryptomining cards still in the (used) market, I often wonder just how badly those GPUs were abused in their former lives.
  • nathanddrews - Tuesday, September 17, 2019

    My hypothesis is that CPUs are more likely to outlive their usefulness long before a hardware failure. CPUs failing due to overclocking is not something we hear much about - I'm thinking it's effectively a non-issue. My i5-3570K has been overclocked at 4.2GHz on air for 7 years without fault. I don't think it has seen any time over 60C. That said, as a CPU, it has nearly exhausted its usefulness in gaming scenarios due to lack of both speed and cores.

    What would cause a CPU to "burn out" that hasn't already been accounted for via throttling, auto-shutdown procedures, etc.?
  • dullard - Tuesday, September 17, 2019

    Thermal cycling causes CPU damage. Different materials expand at different rates when they heat up; eventually this fatigue builds up and parts begin to crack. The estimated failure rate for a CPU that never reaches above 60°C is 0.1% ( https://www.dfrsolutions.com/hubfs/Resources/servi... ). So, in that case, you are correct that your CPU will be just fine.

    But now CPUs are reaching 100°C, not 60°C. That higher peak doubles the temperature range the CPUs are cycling through. Also, with turbo kicking on/off quickly, the CPUs are cycling more often than before. https://encrypted-tbn0.gstatic.com/images?q=tbn:AN...
  • GreenReaper - Wednesday, September 18, 2019

    Simple solution: run BOINC 24/7, keeps it at 100°C all the time!
    I'm sure this isn't why my Surface Pro is bulging out of its case on three sides...
  • Death666Angel - Thursday, September 19, 2019

    Next up: The RGB enabled hair dryer upgrade to stop your precious silicon from thermal cycling when you shut down your PC!
  • mikato - Monday, September 23, 2019

    Now I wonder how computer parts had an RGB craze before hair dryers did. Have there been any RGB hair dryers already?
  • tygrus - Saturday, September 28, 2019

    The CPU temperature sensors have changed type and location. Old sensors were closer to the surface temperature just under the heatsink (more of an average or single spot assumed to be the hottest). Now it's the highest of multiple sensors built into the silicon and indicates higher temperatures for the same power & area than before. There is always a temperature gradient from the hot spots to where heat is radiated.
  • eastcoast_pete - Wednesday, September 18, 2019

    For me, the key statement in your comment is that your Ivy Bridge i5 rarely if ever went above 60 C. That is a perfectly reasonable upper temperature for a CPU. Many current CPUs easily get 50% hotter, and that's before any overclocking and overvolting. For GPUs, it's even worse; 100 - 110 C is often considered "normal" for "factory overclocked" cards.
