Conclusion

Although the fundamental issue is clear, in that some users are experiencing burnout of their Ryzen 7000X3D processors, the issue isn't limited to just Zen 4-based SKUs with 3D V-Cache. The problem could lie at several doors: the silicon itself, the motherboard's implementation of SoC voltage, and, in some cases, uncontrolled current running through the chip and socket. Because the issue is so destructive, killing not only the processor but in some cases taking the motherboard socket with it, AMD and motherboard vendors are having a tumultuous time diagnosing it and implementing a solid fix.

While we were writing this article, Gamers Nexus posted a new video in which they sent a failed CPU to an external laboratory for investigation. The 'failure analysis report,' as GN is calling it, relies on an unnamed external lab to perform a variety of tests including, but not limited to, C-mode scanning acoustic microscopy, X-ray analysis, a 3D CT scan, high-magnification microscopy, and scanning electron microscopy of a cross-section (Xsec).

The biggest takeaway from Gamers Nexus's most recent lab-based analysis is that multiple manufacturing defects could have caused the issue. The lab couldn't identify the cause specifically, and much of the analysis at this point rests on assumptions; assumptions can only produce opinions, not scientifically established conclusions.

In our testing, we did not set out to replicate the burnout issue. Instead, we wanted to understand what AMD's AGESA updates are doing to variables such as current, voltage, and power, focusing on the SoC, as that's what AMD has primarily concentrated on capping to try and alleviate the issues.

Looking at the overall picture of the peak current we measured with our AMD Ryzen 9 7950X3D on ASRock's X670E Taichi motherboard, everything is fundamentally well within control. This can be taken one of two ways. The first is that ASRock has done things correctly by applying 1.30 V to the SoC by default when memory overclocks are applied via AMD's EXPO and XMP profiles. The second is that other vendors haven't been getting things as right, especially given reports of ASUS boards on firmware from before the new AGESA having OCP (over-current protection) failures, resulting in too much current going through the chip.

Focusing primarily on amperage, we can see that the initial Ryzen 7000X3D firmware, AGESA 1.0.0.5c, spiked higher than all the other firmware in terms of SoC current. This is the case both at default memory settings and with AMD EXPO applied to our G.Skill Trident Z5 Neo DDR5-6000 memory kit. Despite the higher peaks in SoC current, the peak didn't seem troublesome, but a lower current is always advantageous in helping reduce overall power and heat and, in this case, in avoiding overloading and frying CPUs.

In our testing, the latest (at the time of writing) AGESA 1.0.0.7 (BETA) firmware had the lowest peak SoC current at default settings and the lowest amperage with EXPO applied. By setting 1.25 V instead of ASRock's 1.30 V default on the SoC voltage, we managed to lower the peak SoC amperage.

Turning to the average current through the SoC rails, AMD's AGESA 1.0.0.5c, as expected, has the highest average. AMD AGESA 1.0.0.7 (BETA), the newest at the time of writing, has the lowest average current levels. We reduced the average current by around 7% by setting 1.25 V on the SoC voltage instead of relying on ASRock's 'one size fits all' approach. Interestingly, despite having the highest peak SoC current, AGESA 1.0.0.6 has a very marginally lower average current through the SoC, by 0.01 A.
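As a rough illustration of how peak and average figures like these can be derived from a sensor log, here is a minimal Python sketch. The function names and the sample values are hypothetical placeholders chosen for illustration; they are not measurements from our testing, and this is not the tooling we used.

```python
from statistics import mean


def summarize_current(samples_amps):
    """Return (peak, average) for a list of SoC current samples in amperes."""
    if not samples_amps:
        raise ValueError("need at least one sample")
    return max(samples_amps), mean(samples_amps)


def percent_reduction(baseline, candidate):
    """Percentage by which candidate is lower than baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - candidate) / baseline * 100.0


# Placeholder samples only, NOT data from this article.
log_default_vsoc = [10.0, 12.0, 14.0, 12.0]
log_lowered_vsoc = [9.0, 11.0, 13.0, 11.0]

peak_a, avg_a = summarize_current(log_default_vsoc)
peak_b, avg_b = summarize_current(log_lowered_vsoc)

print(f"peak: {peak_a:.2f} A vs {peak_b:.2f} A")
print(f"average: {avg_a:.2f} A vs {avg_b:.2f} A")
print(f"average reduction: {percent_reduction(avg_a, avg_b):.1f}%")
```

Keeping peak and average separate matters here, because a firmware can show the highest spike while still averaging close to the others.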

Final Words: Speculations on Ryzen 7000 Burnout Issue, But Nothing Conclusive

The biggest problem is that AMD's Ryzen 7000 series processors (mainly X3D) are burning up inside the socket, and sometimes destroying the AM5 socket with them. This is a big issue that AMD and its partners still need to address 'properly.' It's not that they aren't working tirelessly to rectify the problem, as AMD has released three AGESA firmware updates in just over a month. AMD's main strategy to fix the issue has been to curtail SoC voltage, which Gamers Nexus has found to be at the root of the problem; it's not the only problem, but unchecked SoC current is a notable cause of the destruction.

Perhaps one of the other core problems with all of these issues is the amount of speculation. We aren't interested in speculation because, as good as it sometimes is to speculate, assumptions can run wild. Even when Gamers Nexus sent one of their dead Ryzen 7000X3D CPUs to an external lab, the unnamed lab didn't come up with anything particularly conclusive. While it is clear that a lot of speculation and analysis has already been done, and more is likely to come until the root cause is identified, the buck stops with AMD and its partners.


Image Credit: Speedrookie/Reddit

From our testing, we can clearly state that we didn't experience any issues with the ASRock X670E Taichi, nor did we find any cause for concern. If anything, we can see one particular trend throughout our testing, and we're making this claim based on our own results: AMD's AGESA 1.0.0.6 looks rushed, although that's certainly not without benefit to users, nor without scrutiny. It benefits users by preventing them from accidentally applying too much SoC voltage to the chip; on AGESA 1.0.0.5c, ASRock's X670E Taichi allowed us to set 2.50 V.

With the second BIOS fix through AGESA 1.0.0.7 (BETA), we observed more reserved SoC current and peak SoC power, along with more conservative average values. This is a step in the right direction in terms of lowering the likelihood that SoC voltage and current are going to kill the CPU. While AMD is rolling out its AGESA firmware, it's important to note that these revisions are listed as BETA, which gives AMD room to deliver a comprehensively tested and tweaked firmware designed to alleviate all of the issues above.

Exposure to higher voltage and heat can energize the atoms and molecules of a dielectric material and trigger chemical reactions that break down its structure, leading to dielectric degradation. Common mechanisms of degradation include thermal oxidation and electrical breakdown, which respectively create defects and conductivity in the material. The end result is the loss of insulation properties, increased leakage currents, and eventual material failure.

In the case of the Ryzen 7000/X3D series, the large current and heat are accelerating dielectric degradation, not only weakening the integrity of the silicon and its internals but effectively damaging them beyond the point of no return. This is why it's important to operate at lower voltages, which in turn lowers current, total power output, and temperatures. Overshooting so far on a chip with a fragile 3D-packaged die attached through vias isn't likely to turn out well, at least from a theoretical standpoint.
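To make the voltage, current, and power relationship concrete, here is a first-order Python sketch. It assumes the SoC rail behaves like a fixed resistance, so that I = V / R and P = V * I = V^2 / R. That is a deliberate simplification for illustration only; a real SoC is not a fixed resistor, so measured figures will differ. The function name is hypothetical, and the two voltages are the 1.30 V and 1.25 V settings discussed in this article.

```python
def resistive_scaling(v_old, v_new):
    """Return (current_ratio, power_ratio) for a fixed-resistance load.

    Assumes I = V / R and P = V * I = V**2 / R with R unchanged,
    which is a simplification and not a model of the real SoC rail.
    """
    if v_old <= 0 or v_new <= 0:
        raise ValueError("voltages must be positive")
    ratio = v_new / v_old
    # Current scales linearly with voltage, power scales with its square.
    return ratio, ratio ** 2


current_ratio, power_ratio = resistive_scaling(1.30, 1.25)
print(f"current falls by about {(1 - current_ratio) * 100:.1f}%")
print(f"power falls by about {(1 - power_ratio) * 100:.1f}%")
```

Under this assumption, power drops faster than voltage because of the squared term, which is why even a small reduction in SoC voltage is worth having.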

It's expected that AMD will soon roll out a new fully-fledged AGESA firmware to mitigate these issues, which, according to Gamers Nexus, are likely the result of a failure to implement proper fail-safes in over-current protection (OCP), in some cases letting current run rampant through the CPU. Whether this is down to motherboard vendors such as ASUS, GIGABYTE, and ASRock, or is something under AMD's umbrella, is speculation at this point.

Our testing shows that the latest AGESA 1.0.0.7 (BETA) firmware is undoubtedly better overall than the initial firmware. However, the news that AMD openSIL is set to replace AGESA firmware in 2026 is another variable entirely. The key takeaway is that, at least on the ASRock X670E Taichi, things are working as they should be with AGESA 1.0.0.7 (BETA), and we look forward to a full (non-BETA) release of AMD's latest AGESA in the coming weeks.


39 Comments


  • haplo602 - Thursday, May 18, 2023 - link

    but they brag about the sunk cost and beg for shop purchases throughout the whole video ...

    they could have at least tried a regular 7800 in the same mobo and compare the voltage readings to have at least something relevant ...
  • dan121loveu - Wednesday, May 17, 2023 - link

    Gamernexus premise on poor SOC overvolt is wrong. You have to question if their other findings are reliable. Unofficial Asus video on die sense and socket sense, not a new thing, they had one few years back in the same channel. They have even put this features in their ROG X670E homepage. Auto-translate to english video here https://www.youtube.com/watch?v=l8r4LVV_jsQ
  • Silver5urfer - Wednesday, May 17, 2023 - link

    Very interesting and a good video.

    Major takeaway point are - Die Sense is enabled on default for the C8H and other premium ASUS boards, and GN has same board, and that means HWInfo should actually show proper data. This is like those premium Intel boards which have a switch that allows directly reading from Die Sense on the fly (HWinfo helps to spit out that data). Only diff here it's already default you get 2 measurement points to compare it to !

    So GN fked up by using the farthest point, and they did not even check or bothered to check the HWinfo reading when they are doing their big space age big brain investigation and throw a bunch of terms at the audience to confuse massively. 1M views already are not free so the sensationalism has to get the maximum coverage skipping all the points in the middle and it works, always did because avg consumer is a dumb rock.

    I presume AT's X670 Taichi is also similar design, so the VSoC reading is accurate (guessing). Now if you go to Igor's Lab they have a Gigabyte Aorus X670E which is also using Mobo read points, they make the similar mistake like GN using board readouts farthest ones to measure the Voltages which gives them again wrong picture how GB is also shoving 0.03-0.05 volts more despite the new AGESA, and they are ignoring the HWinfo readings as I see only one measurement result from them too. Why not just double check the HWInfo readings instead go for the singular measurement point ?

    Top notch journalism nowadays lmao..
  • haplo602 - Thursday, May 18, 2023 - link

    so you boot up the system into BIOS/UEFI, change the settings you want to test and then it reboots and fries the CPU right away ... HOW do you get anything from HWinfo there when you did not even make it to Windows with a functional CPU ? but I am sure you would figure out a way genius ...
  • Silver5urfer - Thursday, May 18, 2023 - link

    You completely seem to miss the point. Let alone understand this. I'm talking about the behavior while you are talking about a scenario of all CPUs are dying and so boo hoo I cannot get a read out. If I had an Intel board I'd knew it because I as I alr mentioned I know Die Sense on Apex exists directly.
  • Techie2 - Wednesday, May 17, 2023 - link

    The key takeaway for me is that the majority of mobo makers caused the burnouts by automatically bumping the SoC voltages too high when EXPO is enabled. It does not surprise me at all that Asus had excessive voltage. IME they always push the envelope to get minutely better performance numbers and great reviews. It does not surprise me that Asrock used a proper mobo design. They have been doing this for many years IME.
  • dicobalt - Thursday, May 18, 2023 - link

    This reminds me of when Intel's Core was first released and the memory controller would get easily fried. I was one of the fryers.
  • biostud - Friday, May 19, 2023 - link

    I'm using the 1.21 BIOS with 1.0.0.6 AGESA in my ASRock X670E PRO RS, it only applies 1.25V voltage for vSoc on my 7800X3D.
  • GreenReaper - Saturday, May 27, 2023 - link

    And that's likely all you need. It's both the minimum and maximum for me - I wasn't able to go beyond 1.25V (which incidentally measured as 1.272V...) without running into issues, while going below it showed computation errors in y-Cruncher's HNT test - a great tool for diagnosing Infinity Fabric instability, which also applies to BOINC tasks.
