AMD Found An Issue, for +25-50 MHz

Of course, once Roman’s dataset and its results hit the internet, a number of outlets reported on it and a lot of people were in a spin. It wasn’t long before AMD responded, in the form of a blog post. I’m going to take the relevant bits and pieces from it here, starting with the acknowledgement that a flaw was indeed found:

As we noted in this blog, we also resolved an issue in our BIOS that was reducing maximum boost frequency by 25-50MHz depending on workload. We expect our motherboard partners to make this update available as a patch in two to three weeks. Following the installation of the latest BIOS update, a consumer running a bursty, single threaded application on a PC with the latest software updates and adequate voltage and thermal headroom should see the maximum boost frequency of their processor.

AMD acknowledged that it had found a bug in its firmware that was reducing the maximum boost frequency of its CPUs by 25-50 MHz. If we take Roman’s data survey, adding 50 MHz to every value would push all the averages and modal values for each CPU above the turbo frequency. It wouldn’t necessarily help the users who were reporting 200-300 MHz lower frequencies, but AMD had an answer for that too:

Achieving this maximum boost frequency, and the duration of time the processor sits at this maximum boost frequency, will vary from PC to PC based on many factors such as having adequate voltage and current headroom, the ambient temperature, installing the most up-to-date software and BIOS, and especially the application of thermal paste and the effectiveness of the system/processor cooling solution.

As we stated in the AMD Turbo section of this piece, the way that AMD implements its turbo is different: the processor monitors things like power delivery and voltage/current headroom, and adjusts its voltage and frequency based on the platform in use. AMD is reiterating this, as I expected it would have to.
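
To illustrate why the frequency a user observes depends so heavily on the workload and on when it is sampled, here is a minimal sketch (my own illustration, not AMD’s or any vendor’s tool) for a Linux system exposing the standard cpufreq sysfs interface. It runs a bursty single-threaded load, loosely in the spirit of the "bursty, single threaded application" AMD describes, and records the highest per-core frequency seen while polling; the burst length, idle gap, and sampling interval are arbitrary choices for illustration.

# Minimal sketch: watch per-core frequency under a bursty single-threaded load.
# Linux-only; reads the standard cpufreq sysfs nodes. Timings are arbitrary.
import glob, threading, time

def bursty_load(stop, work_ms=20, idle_ms=30):
    # Alternate short compute bursts with idle gaps on one thread.
    x = 0
    while not stop.is_set():
        end = time.perf_counter() + work_ms / 1000.0
        while time.perf_counter() < end:
            x += 1                      # busy work
        time.sleep(idle_ms / 1000.0)

def core_freqs_khz():
    # Current frequency (kHz) reported for every logical CPU.
    paths = glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq")
    return [int(open(p).read()) for p in paths]

stop = threading.Event()
worker = threading.Thread(target=bursty_load, args=(stop,))
worker.start()
peak = 0
for _ in range(200):                    # sample every 10 ms for about 2 seconds
    peak = max(peak, max(core_freqs_khz()))
    time.sleep(0.01)
stop.set(); worker.join()
print("Highest frequency observed: %.3f GHz" % (peak / 1e6))

Shorten the bursts or stretch the sampling interval and the peak you catch will change, which is exactly why different reporting tools can show different numbers for the same chip under the same conditions.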

In the blog post, AMD mentioned that it had changed its firmware (1003AB) in August for system stability reasons, categorically denying that the change was made for CPU longevity reasons, and said that the latest firmware (1003ABBA) improves performance without affecting longevity either.

The way AMD distributes its firmware is through AGESA (AMD Generic Encapsulated Software Architecture). The AGESA is essentially a base set of firmware and library files that gets distributed to motherboard vendors, who then apply their own UEFI interfaces on top. The AGESA can also include updates for other parts of the system, such as the System Management Unit, that have their own firmware related to their operation. This can make updating things a bit annoying: motherboard vendors have been known to mix and match different firmware versions, because ultimately the user just ends up with ‘BIOS F9’ or something similar.

AMD’s latest AGESA at the time of writing is 1003ABBA, which is going through motherboard vendors right now. MSI and GIGABYTE have already launched beta BIOS updates with the new AGESA and should be pushing them through to stable versions shortly, as should ASUS and ASRock.

Some media outlets have already tested this new firmware, and in almost all circumstances they are seeing a 25-50 MHz uplift in reported frequency. See the Tom’s Hardware article as a reference, but in general, reports show a 0.5-2.0% increase in performance in single-threaded, turbo-limited tests.
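
As a rough sanity check on those figures, the arithmetic below shows what a 25-50 MHz bump works out to against the rated single-core boost clocks of a few Ryzen 3000 parts (the published box frequencies). The frequency uplift alone comes to roughly 0.5-1.2%, so the reported performance range is in the right ballpark once run-to-run variance is factored in.

# Back-of-the-envelope: 25-50 MHz relative to rated single-core boost clocks.
rated_boost_mhz = {
    "Ryzen 5 3600":  4200,
    "Ryzen 7 3700X": 4400,
    "Ryzen 9 3900X": 4600,
}
for part, boost in rated_boost_mhz.items():
    low, high = 25 / boost * 100, 50 / boost * 100
    print("%s: +25-50 MHz is a %.1f%%-%.1f%% frequency uplift" % (part, low, high))
# The Ryzen 9 3900X, for example, works out to roughly 0.5%-1.1%.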

I Have a Ryzen 3000 CPU, Does It Affect Me?

The short answer is that if you are not overclocking, then yes. When a BIOS based on AGESA 1003ABBA becomes available for your particular motherboard, updating is advised. Note that updating a BIOS typically means that all BIOS settings are lost, so keep a note of your settings in case the DRAM needs XMP re-enabled or similar.

Users who keep an ear to the ground on the latest AMD BIOS developments should already know the procedure.

The Future of Turbo

This would be the point at which I note that single-thread frequency does not always equal performance. As part of the research for this article, I learned that some users believe the turbo frequency listed on the box is the all-core turbo frequency, which just goes to show that turbo still isn’t well understood in name alone. But as modern workloads move to multi-threaded environments with background processes, the amount of time spent in single-thread turbo is being reduced. Ultimately we’re ending up with a threading balance between background processes and immediate latency-sensitive requirements.

At the end of the day, AMD identifying a 25-50 MHz deficit and fixing it is a good thing. The number of people for whom this is a critical boundary that enables a new workflow, though, is zero. For all the media reports that drummed up AMD not hitting its published turbo speeds as a big deal, most of those reporters were, by contrast, very subdued about AMD’s fix. An extra 2% in single-core turbo-limited performance hasn’t really changed anything for anyone, despite all the fuss that was made.

I wrote this piece just to lay some cards on the table. The way AMD is approaching the concept of Turbo is very different to what most people are used to. The way AMD is binning its CPUs on a per-core basis is very different to what we’re used to. With all that in mind, peak turbo frequencies are not covered by warranty and are not guaranteed, despite the marketing material that goes into them. Users who find that a problem are encouraged to vote with their wallet in this instance.

Moving forward, I’m going to ask our motherboard editor, Gavin, to start tracking peak frequencies with our WSL tool. Because we’re defining the workload, our results might end up different to what users are seeing with their reporting tools while running CineBench or any other workload, but it can offer the purest result we can think of.
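
To give a flavour of what defining the workload means in practice, below is a minimal sketch (illustrative only, and not our actual tool) that runs a fixed, repeatable amount of single-threaded work and records both how long it takes and the highest frequency reported while it runs. It uses the cross-platform psutil package for the frequency readings; how finely psutil reports per-core frequency varies by operating system, so treat the numbers as indicative rather than definitive.

# Minimal sketch: a fixed, defined single-threaded workload with frequency logging.
# Not our actual tool; psutil's frequency reporting granularity varies by OS.
import threading, time
import psutil                     # pip install psutil

def fixed_work(iterations=50_000_000):
    # A fixed, repeatable integer workload on one thread.
    x = 0
    for i in range(iterations):
        x ^= i
    return x

peak_mhz = 0.0
done = threading.Event()

def sampler():
    global peak_mhz
    while not done.is_set():
        freqs = psutil.cpu_freq(percpu=True) or []
        readings = [f.current for f in freqs if f and f.current]
        if readings:
            peak_mhz = max(peak_mhz, max(readings))
        time.sleep(0.01)

watcher = threading.Thread(target=sampler)
watcher.start()
start = time.perf_counter()
fixed_work()
elapsed = time.perf_counter() - start
done.set(); watcher.join()
print("Workload time: %.2f s, peak reported frequency: %.0f MHz" % (elapsed, peak_mhz))

Because the workload is identical from run to run, the elapsed time doubles as a crude performance check: a firmware change that genuinely adds 25-50 MHz at the top end should shave a fraction of a percent off it.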

Ultimately, the recommendations we made in our launch-day Ryzen review still stand. If anything, had we experienced some frequency loss, a few extra MHz on the ST tests would push the parts slightly up the graphs. Over time we will be retesting with the latest BIOS updates.

Comments

  • Smell This - Wednesday, September 18, 2019 - link

    { s-n-i-c-k-e-r }
  • BurntMyBacon - Wednesday, September 18, 2019 - link

    Electron migration is generally considered to be the result of momentum transfer from the electrons, which move in the applied electric field, to the ions which make up the lattice of the interconnect material.

    Intuitively speaking, raising the frequency would proportionally increase the number of pulses over a given time, but the momentum (number of electrons) transferred per pulse would remain the same. Conversely, raising the voltage would proportionally increase the momentum (number of electrons) per pulse, but not the number of pulses over a given time. To make an analogy, raising the frequency is like moving your sandpaper faster while raising your voltage is like using coarser grit sandpaper at the same speed.

    You might assume that if the total number of electrons is the same, then the wear will be the same? However, there is a certain amount of force required to dislodge an atom (or multiple atoms) from the interconnect material lattice. Though the concept is different, you can simplistically think of it like stationary friction. Increasing the voltage increases the force (momentum) from each pulse, which could overcome this resistance where nominal voltages may not be enough. Also, increasing voltage has a larger effect on heat produced than increasing frequency. Adding heat energy into the system may lower the required force to dislodge the atom(s). If the nominal voltage is unable or only intermittently able to exceed the required force, then raising the frequency will have little effect compared to raising the voltage. That said, continuous strain will probably weaken the resistance over time, but it is likely that this is still less significant than increasing voltage. Based on this, I would expect (read: my opinion) four things:
    1) Electron migration becomes exponentially worse the farther you exceed specifications (Though depending on where your initial durability is it may not be problematic)
    2) The rate of electron migration is not constant. Holding all variables constant, it likely increases over time. That said, there are likely a lot of process specific variables that determine how quickly the rate increases.
    3) Increasing voltage has a greater effect on electron migration than frequency. Increasing frequency alone may be considered far more affordable from a durability standpoint than increases that require significant voltage.
    4) Up to a point, better cooling will likely reduce electron migration. We are already aware that increased heat physically expands the different materials in the semiconductor at different rates. It is likely that increased heat energy in the system also makes it easier to dislodge atoms from their lattice. Reducing this heat build-up should lessen the effect here.

    Some or all of these may be partially or fully incorrect, but this is where my out of date intuition from limited experience in silicon fabrication takes me.
  • eastcoast_pete - Wednesday, September 18, 2019 - link

    Thanks Ian! And, as mentioned, I would also like to hear from you or Ryan on the same for GPUs. With lots of former cryptomining cards still in the (used) market, I often wonder just how badly those GPUs were abused in their former lives.
  • nathanddrews - Tuesday, September 17, 2019 - link

    My hypothesis is that CPUs are more likely to outlive their usefulness long before a hardware failure. CPUs failing due to overclocking is not something we hear much about - I'm thinking it's effectively a non-issue. My i5-3570K has been overclocked at 4.2GHz on air for 7 years without fault. I don't think it has seen any time over 60C. That said, as a CPU, it has nearly exhausted its usefulness in gaming scenarios due to lack of both speed and cores.

    What would cause a CPU to "burn out" that hasn't already been accounted for via throttling, auto-shutdown procedures, etc.?
  • dullard - Tuesday, September 17, 2019 - link

    Thermal cycling causes CPU damage. Different materials expand at different rates when heated, and eventually this fatigue builds up and parts begin to crack. The estimated failure rate for a CPU that never reaches above 60°C is 0.1% ( https://www.dfrsolutions.com/hubfs/Resources/servi... ). So, in that case, you are correct that your CPU will be just fine.

    But now CPUs are reaching 100°C, not 60°C. That roughly doubles the temperature range the CPUs are cycling through. Also, with turbo kicking on and off quickly, the CPUs are cycling more often than before. https://encrypted-tbn0.gstatic.com/images?q=tbn:AN...
  • GreenReaper - Wednesday, September 18, 2019 - link

    Simple solution: run BOINC 24/7, keeps it at 100°C all the time!
    I'm sure this isn't why my Surface Pro isn't bulging out of its case on three sides...
  • Death666Angel - Thursday, September 19, 2019 - link

    Next up: The RGB enabled hair dryer upgrade to stop your precious silicon from thermal cycling when you shut down your PC!
  • mikato - Monday, September 23, 2019 - link

    Now I wonder how computer parts had an RGB craze before hair dryers did. Have there been any RGB hair dryers already?
  • tygrus - Saturday, September 28, 2019 - link

    The CPU temperature sensors have changed in type and location. Old sensors were closer to the surface temperature just under the heatsink (more of an average, or a single spot assumed to be the hottest). Now it's the highest of multiple sensors built into the silicon, which indicates higher temperatures for the same power and area than before. There is always a temperature gradient from the hot spots to where the heat is radiated.
  • eastcoast_pete - Wednesday, September 18, 2019 - link

    For me, the key statement in your comment is that your Sandy Bridge i7 rarely if ever went above 60 C. That is a perfectly reasonable upper temperature for a CPU. Many current CPUs easily get 50% hotter, and that's before any overclocking and overvolting. For GPUs, it's even worse; 100 - 110 C is often considered "normal" for "factory overclocked" cards.
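
As an editorial footnote to the electromigration and temperature discussion in the comments above: the standard first-order model for electromigration lifetime is Black’s equation, MTTF = A * J^(-n) * exp(Ea / (kT)), where J is current density, Ea an activation energy, and T the absolute temperature. The short sketch below only compares relative lifetimes; the activation energy and current-density exponent are generic textbook values rather than figures for any specific process, so treat the output as illustrative of the trend, not as a prediction for any real CPU.

# Relative electromigration lifetime from Black's equation:
#   MTTF = A * J**(-n) * exp(Ea / (k * T))
# Constants are generic textbook values, not process-specific figures.
import math

K_BOLTZMANN_EV = 8.617e-5   # Boltzmann constant, eV/K
EA_EV = 0.9                 # assumed activation energy, eV (typical textbook value)
N_EXP = 2.0                 # assumed current-density exponent (typical textbook value)

def relative_mttf(j_ratio, temp_c, ref_temp_c=60.0):
    # Lifetime relative to nominal current density at the reference temperature.
    t_ref, t = ref_temp_c + 273.15, temp_c + 273.15
    ref = math.exp(EA_EV / (K_BOLTZMANN_EV * t_ref))
    return (j_ratio ** -N_EXP) * math.exp(EA_EV / (K_BOLTZMANN_EV * t)) / ref

print("Same current density, 95C vs 60C: %.2fx relative lifetime" % relative_mttf(1.0, 95))
print("+10%% current density at 95C:     %.2fx relative lifetime" % relative_mttf(1.1, 95))

With these assumed constants, simply running hotter costs far more projected lifetime than a modest bump in current density, which lines up with the intuition in the comments that voltage and temperature matter more than frequency alone.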
