Conclusion

Although the fundamental issue is clear, in that some users are experiencing burnout of their Ryzen 7000X3D processors, the issue isn't limited to Zen 4-based SKUs with 3D V-Cache. The problem could potentially lie at several doors: the silicon itself, the motherboard's implementation of SoC voltage, and, in some cases, uncontrolled current running through the chip and socket. Because the issue is highly destructive, killing not only the processor but in some cases taking the motherboard socket with it, AMD and motherboard vendors are having a tumultuous time diagnosing it and implementing a solid fix.

While writing this article, Gamers Nexus posted a new video in which they sent the failed CPU to an external laboratory to investigate the issue. The 'failure analysis report,' as GN is calling it, relies on an unnamed external lab for a variety of testing including, but not limited to, C-mode scanning acoustic microscopy, X-ray analysis, a 3D CT scan, high-magnification microscopy, and cross-section (Xsec) scanning with an electron microscope.

The biggest takeaway from Gamers Nexus's most recent external lab-based analysis is that multiple manufacturing defects could have caused the issue. The lab in question couldn't identify the issues specifically, and much of it at this point is based on assumptions; assumption isn't a scientific basis for establishing anything beyond opinion.

In our testing, we set out not to replicate the burnout issue but to understand what AMD's AGESA updates are doing to variables such as current, voltage, and power, focusing mainly on the SoC, as that's what AMD has primarily concentrated on capping to alleviate the issues.

Looking at the peak current we recorded with our AMD Ryzen 9 7950X3D on ASRock's X670E Taichi motherboard, we can see that everything is fundamentally well within control; this can be taken one of two ways. The first is that ASRock has done things correctly by applying 1.30 V to the SoC by default when memory overclocks are applied via AMD EXPO and XMP profiles. The second is that other vendors haven't been getting things as right, especially given reports of ASUS boards, before the new AGESA firmware, having overcurrent protection (OCP) failures that resulted in too much current going through the chip.

Focusing primarily on amperage levels in our testing, we can see that the initial Ryzen 7000X3D firmware, AGESA 1.0.0.5c, did spike higher than all the other firmware in terms of SoC current. This is the case both at default memory settings and with AMD EXPO applied to our G.Skill Trident Z5 Neo DDR5-6000 memory kit. Despite the higher peaks in SoC current, the peak didn't seem troublesome, but a lower current is always advantageous in reducing overall power and heat and, in this case, in avoiding overloading and frying CPUs.

In our testing, the latest (at the time of writing) AGESA 1.0.0.7 (BETA) firmware had the lowest peak SoC current at default settings and the lowest amperage with EXPO applied. By setting 1.25 V instead of ASRock's 1.30 V default on the SoC voltage, we managed to lower the peak SoC amperage.

Turning to the average current through the SoC rails, AMD's AGESA 1.0.0.5c, as expected, has the highest average. AMD AGESA 1.0.0.7 (BETA), the newest at the time of writing, has the lowest average current levels. We reduced the average current by around 7% by setting 1.25 V on the SoC voltage instead of relying on ASRock's 'one size fits all' approach. Interestingly, despite having the highest peak SoC current, AGESA 1.0.0.6 has a very marginally lower average current through the SoC, by 0.01 A.
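
For readers who log their own sensor data, the peak and average figures discussed above boil down to simple statistics over a series of current samples. The following is a minimal Python sketch, not the tooling used for this article; the function names and the sample values are hypothetical and purely illustrative.

```python
from statistics import mean


def summarize_current(samples_amps):
    """Return (peak, average) for a list of SoC current samples in amps."""
    if not samples_amps:
        raise ValueError("need at least one sample")
    return max(samples_amps), mean(samples_amps)


def percent_reduction(baseline, reduced):
    """Percentage by which `reduced` is lower than `baseline`."""
    return (baseline - reduced) / baseline * 100.0


# Hypothetical readings, NOT measurements from this article.
default_soc = [10.0, 12.0, 11.0, 15.0]
manual_soc = [9.0, 11.0, 10.0, 14.0]

peak_a, avg_a = summarize_current(default_soc)
peak_b, avg_b = summarize_current(manual_soc)
print(f"peak: {peak_a:.2f} A -> {peak_b:.2f} A")
print(f"average reduced by {percent_reduction(avg_a, avg_b):.1f}%")
```

Keeping the peak and the average separate matters because a firmware can post a high momentary spike while still averaging lower over a run, so neither number alone tells the whole story.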

Final Words: Speculations on Ryzen 7000 Burnout Issue, But Nothing Conclusive

The biggest problem is that AMD's Ryzen 7000 series processors (mainly X3D) are burning up inside the socket, frying, and sometimes totaling the AM5 socket. This is a big issue that AMD and its partners still need to address 'properly.' It's not that they aren't working tirelessly to rectify the problem, as AMD has released three AGESA firmware updates in just over a month. AMD's most significant strategy to fix the issue has been to curtail SoC voltage, which, as Gamers Nexus has found, is at the root of the problem; it's not the only problem, but uncontrolled and unattended SoC current is a notable cause of the destruction.

Perhaps one of the other core problems surrounding these issues is the sheer amount of speculation. We aren't interested in speculation because, as good as it sometimes is to speculate, assumptions can run wild. Even when Gamers Nexus sent one of their dead Ryzen 7000X3D CPUs to an external lab, the unnamed lab didn't come up with anything particularly conclusive. While a lot of speculation and analysis has already been done, and more is likely to come until the root cause is identified, the buck stops with AMD and its partners.


Image Credit: Speedrookie/Reddit

From our testing, we can clearly state that we didn't experience any issues with the ASRock X670E Taichi, nor did we find any cause for concern. If anything, we can see one particular trend throughout our testing, and we're making this claim based on that testing alone: AMD's AGESA 1.0.0.6 looks rushed, though it is not without benefit to users, nor beyond scrutiny. It benefits users by preventing them from accidentally applying too much SoC voltage to the chip; on AGESA 1.0.0.5c, ASRock's X670E Taichi allowed us to set 2.50 V.

With the second BIOS fix through AGESA 1.0.0.7 (BETA), we observed more reserved SoC current and peak SoC power, along with more conservative average values. This is a step in the right direction in terms of lowering the likelihood that SoC voltage and current are going to kill the CPU. While AMD is rolling out its AGESA firmware, it's important to note that these revisions are listed as BETA, which gives AMD room to deliver a comprehensively tested and tweaked firmware designed to alleviate all of the issues above.

Exposure to higher voltage and heat can energize the atoms and molecules of a dielectric material and trigger chemical reactions that break down its structure, leading to dielectric degradation. Common mechanisms of degradation include thermal oxidation and electrical breakdown, which respectively create defects and conductivity in the material. The end result is the loss of insulation properties, increased leakage currents, and eventual material failure.

In the case of the Ryzen 7000/X3D series, the large current and heat accelerate dielectric degradation, not only weakening the integrity of the silicon and its internals but effectively damaging them beyond the point of no return. This is why it's important to operate at lower voltages, which in turn lowers current, total power output, and temperatures. Overshooting so far on a chip with a fragile 3D-packaged die attached through vias isn't likely to turn out well, at least from a theoretical standpoint.
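
As a rough illustration of why a small voltage reduction matters, consider a deliberately simplified model in which the SoC rail behaves like a fixed resistive load, so that current scales with voltage (I = V / R) and power with its square (P = V² / R). Real silicon does not behave this simply, and the resistance value below is an arbitrary placeholder rather than a measured figure; only the two voltages, 1.30 V and 1.25 V, come from our testing.

```python
def resistive_load(voltage, resistance_ohms):
    """Current (A) and power (W) for an idealized fixed-resistance load."""
    current = voltage / resistance_ohms
    power = voltage * current
    return current, power


R_PLACEHOLDER = 0.1  # ohms; arbitrary, chosen only for illustration

i_hi, p_hi = resistive_load(1.30, R_PLACEHOLDER)
i_lo, p_lo = resistive_load(1.25, R_PLACEHOLDER)

# In this model the ratios depend only on the two voltages, not on R.
print(f"current ratio: {i_lo / i_hi:.3f}")
print(f"power ratio:   {p_lo / p_hi:.3f}")
```

Under this assumption, dropping from 1.30 V to 1.25 V cuts current by just under 4% and power by roughly 7.5%. Measured behavior on real hardware will differ, so treat this only as intuition for the direction of the effect.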

AMD is expected to roll out a new, fully-fledged AGESA firmware soon to mitigate these issues, which, according to Gamers Nexus, are likely the result of a failure to implement proper fail-safes in overcurrent protection (OCP), thereby in some cases letting current run rampant through the CPU. Whether this is down to motherboard vendors such as ASUS, GIGABYTE, and ASRock, or is something under AMD's umbrella, is speculation at this point.

Our testing shows that the latest AGESA 1.0.0.7 (BETA) firmware is undoubtedly better overall than the initial firmware. However, the news that AMD openSIL is set to replace AGESA firmware in 2026 is another variable entirely. The key takeaway is that, at least on the ASRock X670E Taichi, things are working as they should be with AGESA 1.0.0.7 (BETA), and we look forward to a full (non-BETA) release of AMD's latest AGESA in the coming weeks.


39 Comments


  • techjunkie123 - Tuesday, May 16, 2023 - link

    Any by anandtech reviewer, I meant reader.
  • TheinsanegamerN - Wednesday, May 17, 2023 - link

    As predicted, GN calls out AMD and people start whining.
  • Silver5urfer - Wednesday, May 17, 2023 - link

    GN kisses Nvidia and people shrug off. Also their new Muh Failure rate website page, it does not list the LGA1700 socket engineering failure, but has 12VHPWR as "Fixed per GN standards" is laughable at best as the socket design causes other unwanted behavior along with the HS contact and longevity of the PCB traces. See Buildzoid IMC video on RPL, in short contact issue for the Socket but hey you can use the Thermalgrizzly Contact Frame and fix it while screwing your mobo in the process with non-factory Torque spec funnily Thermalright one is far superior, however since GN said the former is good all people learn the hardway.

    Both the OPs are correct, throwing a bunch of zoom images and extrapolating on lack of information with confusing the end user tricking onto some space age analysis does not help. Meanwhile AT's solid pieces on both LGA1700 bendgate and the AGESA on X3D provides far more useful information the Thermal and Electrical behavior from 3 diff AGESA versions is excellent approach to check what is going on than poking in the dark (Lithography, Metallurgy etc), only positive thing to come out of GN was ASUS rolling back their shady tactics esp many YTers called them out.

    Anandtech has no rival in how they cover, I wish they reviewed GPUs and staff did not leave (Ian, Andrei etc), but the facts are hard truth, like the YT content killed blogs like this which is a big loss to many but most of the people around the world do not care for in-depth pieces and real Tech Journalism which is not capitalizing on the content for clicks. All the reviewers out there just simply copy paste the slide deck OR read their PR guide in virtually all the sites / videos except AT.
  • cheshirster - Tuesday, May 16, 2023 - link

    "we can see that everything is fundamentally well within control"
    I've seen an interesting behavior on GB board.
    It was trying to prevent high voltage delta between vmem and vsoc when using manual settings.
    The parasite current between different voltage lines (in case of too big delta) could be the case of the problem and you won't see the solution working by simply measuring voltages.
  • meacupla - Tuesday, May 16, 2023 - link

    On Ryzen 2000, 3000, and 5000, high voltage delta between Vsoc and Vram did result in poor ram stability. Particularly when the DDR4 required more than 1.35V to run at its rated speed.
  • Silver5urfer - Tuesday, May 16, 2023 - link

    I already knew that Failure Analysis Lab won't do anything. It is known fact that how can an unnamed lab can breakdown the reason of the CPU failure after the fact when they are not the ones associated with the OEM manufacturer of the said processor. Same for that 12VHPWR GN's video which did not yield anything. It's just a shock value capitalization for the maximum hits on the topic the real deal was 12VHPWR Nvidia statement which was given to GN rather like AMD who is giving it from their PR handle directly, allowing Nvidia to shrug off.

    Anyways moving on the AMD perhaps did not do proper verification as this is their First take at EXPO and the Zen 4 processor outside the Server HPC space where they are severely limited to add the extra consumer features like Overclocking, and Intel has an edge here because Intel has been doing the OC business since their Core series processors debuted on DDR3 which means literally 2-3 generations of Memory Overclock experience. Plus they sponsor HWBot too.

    I find Anandtech's conclusions far more useful, yet read non conclusive as stated which is obvious when you are dealing with Microprocessors made in this era where we have ton of variables at play. Plus gives a good insight on how the Current, Voltage and Temperatures are being effected thus giving some picture of the inner workings of the CPU which we cannot really ever know because one AMD does not provide documentation / datasheets like Intel, two is these are bleeding edge tech, it's hard to know many things esp when you have a non monolithic design, on Intel it's easy and esp Intel can fix the clock rate as in user can do it. Plus no uncore sitting on a different piece of die (may change in MTL and ARL in the future).

    So yea this is a great piece on how the AGESA varies than a nice Electron Microscope zoom picture content with a ton of VLSI terminology thrown at the user to confuse them. However I do agree with GN's ASUS part 100%, because that company has been complete pile of rubbish nowadays. I had to return a lot of Z590 boards because their Mobo PCB paint was chipped off on brand new APEX boards. Then the whole BIOS problems associated with ASUS - ROG Forums are a disaster now, they killed the site with mobile focus. ASUS implements ARB, Anti Roll Back, forcing you to get restricted to a BIOS this is very bad because on Z590 their boards had RTX40 series PCIe4.0 issue as in they did not run on 4.0 speed. Beta BIOS had the fix but actual update did not and they had ARB on the actual one, that's how bad ASUS is, some of them cannot even be rolled back even if you use BIOS Flashback. The Armory Crate is a cancer software which you cannot get rid of due to Registry into deep OS stack. Same like Intel XTU (Should use Throttlestop which is leagues ahead).

    All in all it's unfortunate situation AMD should improve this and gain from this experience, they learned a lot with Zen 3, the IODie was a mess on it now Zen 4 is solid in that dept esp when they reduced the Memory variables from 3 values (fclk, mclk, uclk) in zen 3 to zen 4 now only has 2 resulting in a stable I/O handling. Plus the significantly higher clock ratio etc.
  • Silver5urfer - Tuesday, May 16, 2023 - link

    More clarification,

    Nvidia's 12VHPWR ultimately was flawed as Intel ATX 3.0 power revises the 4 Sense pins to be elongated plus use of the Tulip design vs the Dot which was mentioned by Igor and ignored by GN. And the fact that AMD's R9 295 X2 on Anandtech here pulls 500W using just 2x8Pin standard further reinforcing that 12VHPWR is a clearly rushed one, and the fact that RTX 3090Ti does not have this problem because of lack of 4 Sense pins thus limiting the hard power cap.

    And about ASUS, to add after getting caught they are now issuing a PR that all Mobos with EXPO and Beta also are covered under warranty. GN take did something good they have a long way to go esp how their BIOS is top notch yet they cram too much voltage into every possible way. Taking advantage of the brand value and consumer mindset.
  • Hairs - Saturday, May 27, 2023 - link

    Igor's analysis on the 12vHPWR was absolute guesswork based on looking at some mobile phone pictures posted to reddit.

    The failure analysis lab pointed out that tulip vs dot would not provide sufficient difference in power connector stability to cause the issue on its own, as suggested by Igor.

    GN is the only tech outlet that did any actual testing on the cables, and they did this not just by sending one out to a professional lab for verification, but by doing individual unit tests on multiple physical tables to re-create possible error scenarios. Of all the possibilities (and they tested different cables from different vendors) the only one which reliably recreates the burnout is when the connector is both not fully seated, and also sits at a slight angle in the socket. Both of these are user-error problems, but the user error is compounded by the fact that the physical security of the socket (not the internal design of the pin connectors) isn't robust enough compared to the old 8-pin design.

    Literally everyone else was guessing. Only GN actually tried to recreate the problem and validate what was going on.
  • Hairs - Saturday, May 27, 2023 - link

    "The key takeaway is that, at least on the ASRock X670E Taichi, things are working as they should be with AGESA 1.0.0.7 (BETA), and we look forward to a full release (non-BETA) of their latest AGESA in the coming weeks."

    Anandtech haven't even tested OCP, which is one of ASUS's primary failures and yet claim "everything" is working. Great conclusion and in-depth analysis there. "Ignore that other reviewer who claims there are multiple problems, I ran HWinfo and it's grand."

    Calling GN's work "shock value capitalization" when they were literally the only tech reviewers either on YT or on written blogs who took actual time to analyse things and delayed their content specifically to avoid bandwagon-jumping and pushing a scare narrative that all cards using the connector were guaranteed to burn up is laughable.

    Where was Anandtech's deep analysis of the topic? Oh right they haven't done any GPU work in years other than reprint PR releases.
  • army165 - Tuesday, May 16, 2023 - link

    I have a 7800X3D and an Asus B650 board. Should I take it out and inspect it for damage? I upgraded to the 1303 BIOS when I got the board and didn't move to the Beta BIOS 1410 until after Asus redacted their "we won't fix this if you use this" message on the BIOS description.
