Conclusion

Although the fundamental issue is clear, in that some users are experiencing burnout of their Ryzen 7000X3D processors, it isn't limited to Zen 4-based SKUs with 3D V-Cache. The problem could potentially lie at several doors: the silicon itself, the motherboard's implementation of SoC voltage, and, in some cases, uncontrolled current running through the chip and socket. Because the issue is highly destructive, killing not only the processor but in some cases the motherboard socket with it, AMD and motherboard vendors are having a tumultuous time diagnosing it and implementing a solid fix.

While we were writing this article, Gamers Nexus posted a new video in which they sent the failed CPU to an external laboratory to investigate the issue. The 'failure analysis report,' as GN is calling it, draws on an unnamed external lab for a variety of testing including, but not limited to, C-mode scanning acoustic microscopy, X-ray analysis, a 3D CT scan, high-magnification microscopy, and cross-section (Xsec) scanning electron microscopy.

The biggest takeaway from Gamers Nexus's most recent lab-based analysis is that multiple manufacturing defects could have caused the issue. The lab in question couldn't identify the issues specifically, and much of it at this point is based on assumptions; assumption isn't a scientific basis for establishing anything, and yields only opinions.

In our testing, we set out not to replicate the burnout issue but to understand what AMD's AGESA updates are doing to variables such as current, voltage, and power, focusing on the SoC, as that's what AMD has primarily concentrated on capping to alleviate the issues.

Looking at the peak current we experienced overall with our AMD Ryzen 9 7950X3D on ASRock's X670E Taichi motherboard, we can see that everything is fundamentally well within control; this can be taken one of two ways. The first is that ASRock has done things correctly by applying 1.30 V to the SoC by default when applying memory overclocks via AMD's EXPO and XMP profiles. The second is that other vendors haven't been getting things as right, especially with reports of ASUS boards before the new AGESA firmware having OCP (overcurrent protection) failures, resulting in too much current going through the chip.

In our testing and focusing primarily on the amperage levels, we can see that the initial Ryzen 7000X3D firmware, AGESA 1.0.0.5c, did spike higher than all the other firmware in terms of SoC current. This is the case at default memory settings and with AMD EXPO applied to our G.Skill Trident Z5 Neo DDR5-6000 memory kit. Despite the higher peaks in SoC current, the peak didn't seem troublesome, but having a lower current is always advantageous in helping reduce overall power, heat, and in this case, not overloading and frying CPUs.

In our testing, the latest (at the time of writing) AGESA 1.0.0.7 (BETA) firmware had the lowest peak SoC current at default settings and the lowest amperage with EXPO applied. By setting 1.25 V instead of ASRock's 1.30 V default on the SoC voltage, we managed to lower the peak SoC amperage.

Turning to the average current through the SoC rails, AMD's AGESA 1.0.0.5c, as expected, has the highest average. AMD AGESA 1.0.0.7 (BETA), the newest at the time of writing, has the lowest average current levels. We reduced the average current by around 7% by setting 1.25 V on the SoC voltage instead of relying on ASRock's 'one size to fit all' approach. Interestingly, despite having the highest peak SoC current, AGESA 1.0.0.6 has a very marginally lower average current through the SoC, by 0.01 A. 
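As a rough illustration of how the peak and average figures discussed here, and a percentage reduction between two configurations, can be derived from logged current samples, here is a minimal Python sketch. The function names and the sample values are hypothetical placeholders chosen for easy arithmetic; they are not our measurements and do not reflect any particular logging tool.

```python
from statistics import mean


def summarize_current(samples_amps):
    """Return (peak, average) for a list of SoC current samples in amps."""
    if not samples_amps:
        raise ValueError("need at least one sample")
    return max(samples_amps), mean(samples_amps)


def percent_reduction(baseline, new):
    """Percentage by which `new` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline * 100.0


# PLACEHOLDER samples (amps), not measured data: one hypothetical run at the
# board's default SoC voltage and one at a manually lowered SoC voltage.
default_run = [10.0, 12.0, 14.0, 12.0]
manual_run = [9.0, 11.0, 13.0, 11.0]

peak_default, avg_default = summarize_current(default_run)
peak_manual, avg_manual = summarize_current(manual_run)
reduction = percent_reduction(avg_default, avg_manual)

print(f"default: peak {peak_default:.2f} A, average {avg_default:.2f} A")
print(f"manual:  peak {peak_manual:.2f} A, average {avg_manual:.2f} A")
print(f"average current reduced by {reduction:.1f}%")
```

Peak and average answer different questions: the peak captures the worst momentary spike, while the average reflects sustained load, which is why one firmware can lead on one metric and not the other.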

Final Words: Speculations on Ryzen 7000 Burnout Issue, But Nothing Conclusive

The biggest problem is that AMD's Ryzen 7000 series processors (mainly X3D) are burning up inside the socket, frying, and sometimes totaling the AM5 socket. This is a big issue that AMD and its partners still need to address 'properly.' It's not that they aren't working tirelessly to rectify the problem; AMD has released three AGESA firmware updates in just over a month. AMD's most significant strategy to fix the issue has been to curtail SoC voltage, which Gamers Nexus has found to be at the root of the problem. It's not the only problem, but rampant, unchecked SoC current is a notable cause of the destruction.

Perhaps one of the other core problems with all of these issues is the sheer amount of speculation. We aren't interested in speculation because, as good as it sometimes is to speculate, assumptions can run wild. Even when Gamers Nexus sent one of their dead Ryzen 7000X3D CPUs to an external lab, the unnamed lab didn't come up with anything particularly conclusive. While it is clear that a lot of speculation and analysis has already been done, and more is likely to come until the root cause is identified, the buck stops with AMD and its partners.


Image Credit: Speedrookie/Reddit

From our testing, we can say clearly that we didn't experience any issues with the ASRock X670E Taichi, nor did we find any cause for concern. If anything, one trend stands out, and we're making this claim based on our own testing: AMD's AGESA 1.0.0.6 looks rushed, though it is certainly not without benefit to users, nor beyond scrutiny. It benefits users by preventing them from accidentally applying too much SoC voltage to the chip; on AGESA 1.0.0.5c, ASRock's X670E Taichi allowed us to set 2.50 V.

With the second BIOS fix through AGESA 1.0.0.7 (BETA), we observed more reserved SoC current, peak power from the SoC and more conservative average values. This is a step in the right direction in terms of lowering the likelihood that SoC voltage and current are going to kill the CPU. While AMD is rolling out its AGESA firmware, it's fundamental to note that these revisions are listed as BETA, which gives AMD room to improve for a comprehensively tested and tweaked firmware designed to alleviate all of the issues above.

Exposure to higher voltage and heat can energize the atoms and molecules of a dielectric material and trigger chemical reactions that break down its structure, leading to dielectric degradation. Common mechanisms of degradation include thermal oxidation and electrical breakdown, which respectively create defects and conductivity in the material. The end result is the loss of insulation properties, increased leakage currents, and eventual material failure.

In the case of the Ryzen 7000/X3D series, the large current and heat are accelerating dielectric degradation, not only weakening the integrity of the silicon and its internals but effectively damaging them beyond the point of no return. This is why it's important to operate with lower voltages, which in turn lowers current, lowers total power output, and consequently lowers temperatures. Overshooting so far on a chip with a fragile 3D-packaged die attached through vias isn't likely to turn out well, at least not from a theoretical standpoint.
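To put rough numbers on the voltage, current, and power relationship, the sketch below uses P = V × I and, as a loudly stated simplification, treats the SoC rail as a fixed resistance so that current scales linearly with voltage. Real silicon does not behave like a simple resistor, and the 10 A reference current is a hypothetical placeholder rather than a measured value, so this is only a first-order illustration of the trend.

```python
def rail_power_watts(voltage_v, current_a):
    """Electrical power on a rail: P = V * I."""
    return voltage_v * current_a


def scale_resistive(voltage_v, ref_voltage_v, ref_current_a):
    """Estimate (current, power) at a new voltage.

    ASSUMPTION: the load is modeled as a fixed resistance derived from a
    reference operating point. This is a simplification for illustration only.
    """
    resistance_ohm = ref_voltage_v / ref_current_a
    current_a = voltage_v / resistance_ohm
    return current_a, rail_power_watts(voltage_v, current_a)


# Reference point: 1.30 V default, with a PLACEHOLDER current of 10 A.
ref_v, ref_i = 1.30, 10.0
ref_p = rail_power_watts(ref_v, ref_i)

# Same simplified load at a manually set 1.25 V.
low_i, low_p = scale_resistive(1.25, ref_v, ref_i)

print(f"1.30 V: {ref_i:.2f} A, {ref_p:.2f} W")
print(f"1.25 V: {low_i:.2f} A, {low_p:.2f} W")
```

Under this simplified model, power falls with the square of the voltage, which is why even a small reduction in SoC voltage is worth having.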

It's expected that AMD will soon roll out a new fully-fledged AGESA firmware to mitigate these issues, which, according to Gamers Nexus, are likely a result of failing to implement proper fail-safes in overcurrent protection (OCP), thereby in some cases letting current run rampant through the CPU. Whether this is down to motherboard vendors such as ASUS, GIGABYTE, and ASRock, or is something under AMD's umbrella, is speculation at this point.

Our testing shows that the latest AGESA 1.0.0.7 (BETA) firmware is undoubtedly better overall than the initial firmware. However, the news that AMD openSIL is set to replace AGESA firmware in 2026 is another variable entirely. The key takeaway is that, at least on the ASRock X670E Taichi, things are working as they should be with AGESA 1.0.0.7 (BETA), and we look forward to a full (non-BETA) release of AMD's latest AGESA in the coming weeks.


39 Comments


  • Golgatha777 - Tuesday, May 16, 2023 - link

    I'm on a B650E-F with a 7700X CPU. With all that's gone on here with the X3D parts, I think I'll give it awhile to sort things out before I think about upgrading to the 7800X3D. I upgraded to 1406 when I first got my motherboard, and I believe was the first one with X3D support (until ASUS started daily edits of their BIOS and CPU support lists anyway). I have a very stable system, so I plan to sit on the sidelines and not upgrade my BIOS until there's a non-beta one that's been listed for at least a couple of months.
  • GreenReaper - Saturday, May 27, 2023 - link

    I think if it was actually damaged it would have likely shown up in improper working. Most of the damage seems to have been people manually increasing SoC voltage, being allowed to do so.
  • The_Assimilator - Tuesday, May 16, 2023 - link

    This embarrassing disaster is the cherry on top for the dismal and disappointing Zen 4 launch. AMD managed to replicate the original Zen's rubbish memory controller, but this time around they "fixed" it by allowing board partners to overvolt it through the roof - with the inevitable result. Play stupid games, win stupid prizes.
  • Sunrise089 - Tuesday, May 16, 2023 - link

    I’ve vaguely followed this story but am obviously still somewhat out of the loop.

    I appreciate this article, but it seems like the conclusion here could be “ASRock boards don’t suffer from the overvolting issue,” no?

    So what IS the real issue here? Is it just that Asus had bad voltage settings applied when users used faster memory? And that Asus just assumed they’d be fine because AMD would have protections in the chip that would prevent damage?

    Is there more to it than that? Because otherwise I don’t understand why this is being presented as a general issue affecting AMD and multiple board partners if it’s really only Asus-specific.
  • meacupla - Wednesday, May 17, 2023 - link

    There is more to it than that, yes. The older BIOS allowed 7000X3D to be overvolted when XMP was enabled. Asus was the most egregious, but this same flaw seems to have existed on all vendors.
    Asus mobos had a fail safe that didn't kick in properly.
    It seems that AMD chips also don't have a fail safe that kicks in properly either.

    AMD and mobo makers endorse fast RAM speeds, but AMD only "officially" supports DDR5-5200.
    To ensure maximum RAM compatibility, Asus likely pushed Vsoc too high to get DDR5-6000 to 6400 to work on their mobos.
    A high delta between Vsoc and Vram has resulted in poor RAM stability on the AM4 platform, and probably also does so on AM5, but that is just my guess.
  • Targon - Friday, May 26, 2023 - link

    There is a difference between allowing the user to do stupid things, and the BIOS by default doing stupid things. This goes back to the old idea of AMD having supported freedom by allowing motherboard makers to tune things themselves, but when those motherboard makers completely screw up and don't even read the, "you shouldn't go over 1.3V" guidance, causing things to go horribly wrong, then AMD had to remove some of those freedoms.

    Remember as well that Intel had lots of time to really focus on allowing a lot of voltage to their chips since Intel went from 6th to 10th generation on the same CPU design, and only factory overclocking(more clock speed but also needing more voltage) made newer chips actually faster from those generations. AMD hasn't had to do that for quite a while, and the Ryzen improvements since the Zen+ days to now have all been design improvements, combined with benefits that come from using better fab processes(lower voltages, higher clock speeds, etc).

    Realistically, there are some failsafes in place, but if the chip gets damaged due to excessive voltage, the failsafes in place seem to have broken down. It's like a fire killing your smoke detector, and as a result, you get no warning that your house is burning down.
  • edzieba - Wednesday, May 17, 2023 - link

    "So what IS the real issue here?"

    - No overvoltage limits (or limits set far above hardware-bricking levels) in hardware or in AMD's AGESA, from launch
    - No QC step by AMD and/or motherboard and/or DIMM vendors confirming voltage setpoints for EXPO do not exceed limits
    Or worse
    - No published voltage limits (or published limits incorrect) so everyone involved was flying blind in setting voltages in the first place

    That every motherboard manufacturer simultaneously and independently decided to exceed core voltage limits seems extraordinarily unlikely. More likely is that they all believed based on information from AMD that they were operating within safe voltage ranges, and subsequently optimised voltages for speed and stability over power consumption (as they have been doing for years with XMP) unaware of Ryzen's vulnerability.
  • Targon - Friday, May 26, 2023 - link

    AMD had given the guidance to the motherboard makers, but Asus clearly ignored that information. Further, when the X3D chips came out, AMD again would have had to tell the motherboard makers, "for this chip, these are the safe voltages!", and again, Asus dropped the ball, while clearly, ASRock and most others did not. If anything, that proves that ASRock is no longer that "low end garbage" brand that they were 20 years ago.
  • haplo602 - Wednesday, May 17, 2023 - link

    Thing is, nobody as of now explained why only 7800X3D burned out ... no other model did that ... Even GN did not try as their investigation was clearly in the clickbait and spectacle direction and not the scientific explanation direction ...
  • meacupla - Wednesday, May 17, 2023 - link

    Well the X3D vs regular is pretty obvious. Regular 7000 series are not as heat sensitive as X3D chips, since they don't have 3D V-cache sitting on top of the CPU.

    Between the X3D chips, it's not so obvious, since it could be any number of factors, including how the 7900X3D and 7950X3D are dual chiplets of dissimilar chips, how the BIOS was handling vsoc between the various CPUs, the most popular RAM configuration on those two (ie 16GB at 6000 vs 32~64GB at 3600~4800), etc.

    Trying to destructively test a 7900X3D and 7950X3D is going to be very expensive, very quick.
