Data Corruption - not Political Corruption - with NVIDIA’s Latest Boards

Our performance board roundups ended up delayed for a variety of reasons, but we will be back on track next week. Every conceivable problem has hit us, from shoddy BIOS releases to repeated trouble getting Crysis to benchmark correctly under 64-bit Vista. We are still not sure about the cause of the latter, as one image works and another does not on identical hardware and software setups. We finally reached the point of being able to benchmark, but it is not a process we would wish upon our worst enemies.

However, none of that compares to the data corruption problems we are seeing intermittently on the 790i and 780i platforms. We honestly thought NVIDIA had solved these problems back in 2006 on the 680i platform. Since the MCP has not changed, it is disconcerting to us that this problem seems to be rearing its ugly head again. This time, the data corruption problems appear to be confined to memory overclocking, especially on the 790i boards. We are not talking massive overclocks here; hitting the right combination of FSB rates around 400MHz and memory speeds above DDR3-1600 seems to trigger our problems. What makes this even more perplexing is that, during extreme overclocking, we have been able to reach higher DDR3 speeds with absolute stability on the 790i than on the X48.

On the 780i boards, the magical combination is right above 400MHz FSB (1600 QDR) with memory unlinked anywhere from DDR2-900 to DDR2-1200. Our 780i problems have been minor for the most part, but the underlying issue is that after the systems recover from a BSOD, we typically have stability problems or gremlin-like behavior until we reload the system. This same problem can occur on Intel or AMD chipset boards, but in our experience to date it is extremely rare unless we push the memory beyond reasonable settings.
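For readers who want the arithmetic behind these figures, here is a minimal sketch in Python. It assumes the usual conventions: the Intel front-side bus is quad-pumped, so a 400MHz base clock gives 1600 MT/s, and a DDR2/DDR3 rating is twice the memory clock. The function names are hypothetical and chosen purely for illustration; this is not a tool from our test suite.

```python
from fractions import Fraction


def fsb_effective_rate(fsb_mhz: int) -> int:
    """Effective rate of a quad-pumped (QDR) FSB: four transfers per clock."""
    return fsb_mhz * 4


def memory_clock_mhz(ddr_rating: int) -> Fraction:
    """Memory clock for a DDR rating (e.g. DDR2-900): two transfers per clock."""
    return Fraction(ddr_rating, 2)


def fsb_to_memory_ratio(fsb_mhz: int, ddr_rating: int) -> Fraction:
    """FSB clock to memory clock ratio, reduced to lowest terms."""
    return Fraction(fsb_mhz) / memory_clock_mhz(ddr_rating)


# The combinations mentioned in the text: 400MHz FSB with unlinked memory.
for rating in (900, 1200, 1600):
    ratio = fsb_to_memory_ratio(400, rating)
    print(
        f"FSB 400MHz ({fsb_effective_rate(400)} QDR), DDR-{rating}: "
        f"memory clock {float(memory_clock_mhz(rating)):.0f}MHz, "
        f"FSB:memory = {ratio.numerator}:{ratio.denominator}"
    )
```

At 400MHz FSB, DDR2-900 works out to an 8:9 FSB-to-memory ratio and DDR2-1200 to 2:3, while the 790i case of DDR3-1600 is 1:2.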


Back to the 790i boards: the data corruption problems have occurred more frequently, as the boards (and their early BIOS revisions) seem more susceptible to faulty behavior when pushing the memory above DDR3-1600 with low latencies. We have not nailed down exact settings at this point, as they tend to fluctuate between test sessions and boards. What we do know is that we are tired of constantly reloading our images after making minor changes to our settings.

It may be only a coincidence, but over the past couple of months we have lost two WD Raptors, a couple of Samsung 500GB drives, and a WD 250GB drive while benchmarking the 790i/780i boards. It may have just been time for these drives to meet their maker, as our particular samples have spent the past year or so running benchmarks almost 24/7 (that might not sound like a long time, but we abuse the drives when testing in this manner). We have certainly had hard drive failures when testing other chipsets, ranging from complete mechanical breakdowns to index tables so corrupted that we could not fully recover the disk. It could just be bad luck on our part.

However, we think it goes deeper than that. After the first roundups this coming week, we plan to delve into it. The reason is that we have not had any data corruption problems testing our 650i/750i, GeForce 6100/6150, or GeForce 7050 boards, none of which utilize the MCP found in the 680i/780i/790i boards. Of course, this could be tied to the fact that we do not push those boards as hard, but knowing about the previous 680i problems makes us think the current BIOS code or Vista drivers need to be revised again.

Other problems

We share test notes on an almost continual basis with each other when testing boards. We thought some of the test notes from our upcoming roundup would be interesting. In all fairness to NVIDIA, we are including our X48 thoughts as we wrap up testing.

790i test notes:

a) CPU multiplier likes to change at will, causing an inability to POST after changing BIOS options. (Problem is likely linked to bad NVIDIA base code.)

b) Poor memory read performance above 475FSB unless you enable “P1” and “P2”, settings whose operation NVIDIA refuses to document or explain.

c) EVGA/XFX (NVIDIA reference design) lacks support for tRFC tuning - high-density DDR3 configurations often refuse to work unless the module SPDs are tuned by the manufacturer. (This makes them needlessly slow in low-density configurations.)

d) The chipset does not do a very good job of balancing read vs. write priorities with respect to memory access - copy scores lower than X38/X48.

e) Regardless of what NVIDIA says, we think PCI-E 2.0 (and 1.x) implementation is still better on Intel’s Express chipsets - give us SLI on Intel to prove it!!!

f) Possible problem with NVIDIA reference design: sustained overclocked operation at >~1.9V for VDIMM may cause critical failure of 790i (Ultra) SPP. This does not seem to affect ASUS S2E design and is the most critical issue facing the board; we need to verify before making recommendations.

g) Possible HDD corruption issues. (We have lost two 74GB WD Raptors so far…)

X48 test notes:

a) Chipset defaults to tRD values that are excessively loose and are not competitive with NVIDIA’s new 790i. The problem is that most motherboard manufacturers do not allow this to be specifically tuned in the BIOS.

b) DMI interface (x4 PCI-E link) is sloooow… X38/X48 should have been paired with ICH10(R), which will be PCI-E 2.0 compliant on the link interface.

c) Haven’t found an Intel X48 board yet that will handle 8GB of DDR3 properly, even though this is a major bullet point for chipset support. Is the fault with the board makers or the memory makers? (We need to test this on the Intel DX48BT2 that just arrived.)

d) Chipset runs HOT… might even be hotter than 790i. Intel should have shrunk this thing long ago!

That is it for now and we will have additional information in the first roundup. Now a take on Gigabyte.


81 Comments


  • johnsonx - Monday, April 7, 2008

    After running for 4 days (not doing anything, just idling but at 3.2ghz - no CnQ yet), nothing appears to be wrong with that system. I guess the proof will come when I put it under a full load later.
  • WW2Planes1 - Saturday, April 5, 2008

    Power moFSets?

    Other than that, good to know about the power requirements of the new Phenoms; it probably wouldn't have crossed my mind when I go to build my new system. Although, after reading this, I'll probably wait a while for now.
  • piroroadkill - Sunday, April 6, 2008

    I noticed that too; it should surely be MOSFET: metal–oxide–semiconductor field-effect transistor.
  • Glenn - Saturday, April 5, 2008

    Great article and kudos for the honesty in the face of some hardware giants! In the end, hopefully they will appreciate it too!

    I build and service a lot of systems and have learned some hard lessons along the way. My philosophy may not work for others but it has certainly made my life easier. I quit using anything but Intel chipsets, which also required a shift away from using AMD processors.

    No matter how much good I read about Nvidia, SIS, ATI or other chipsets, in real world day to day use, there has always seemed to be hair growing out of something! My experience has shown that for every purported performance or functionality promise, there has been a reliability tradeoff somewhere which I am ultimately responsible for. No Thanks! I may still venture away from that philosophy on my own system occasionally, but if it's built to sell, then it's Intel!
  • sprockkets - Saturday, April 5, 2008

    When you say you lost the HDDs, did they just go corrupt, or did they stop working or being detected altogether?
  • Gary Key - Sunday, April 6, 2008

    We have two 74GB Raptors that are basically dead, they will power up but cannot be low-level formatted. The WD 250GB drive basically had the same problem. The two 500GB Samsungs will power up and repeat a click-clack pattern. The Samsungs have been returned for analysis as will the 74GB Raptors.

    I am getting ready to do the same for a pair of 150GB Raptors that I wrote off in December. However, those drives failed (no longer accessible) during RAID testing on the 780i board (usually I yell at myself when that happens as I have had far too many RAID 0 arrays drop a drive over the years). I did not think much of it until we started having these data corruption problems over the past six weeks.

    Between Kris, Raja, and me there have probably been around 14 to 16 image reloads the past six weeks after overclocking. We fully expect to trash the OS when exploring the boundaries of memory/FSB rates, but it might happen once or twice a month at most and is not limited to NVIDIA chipsets. However, all of these failures have been on the 780i/790i boards and we were not really pushing the systems except for two times when the drives failed or the images were corrupted.

    The frustrating/perplexing problem is that the 790i testing with Kris resulted in some of the best overclocks we have ever experienced and they were 100% stable. We changed the settings to a normal overclock at 400FSB/1600 DDR3 and the images became corrupted or the drives went south. It is not repeatable. We have seen results like ours in various forums so there is something amiss here; we are just trying to find it right now.

    In all cases, we have had the memory settings set at something other than stock/default. I am still working with Derek as he has experienced several data corruption problems during SLI testing the past couple of weeks. I did not mention that until we figure out if his problems are related to ours.
  • TheBeagle - Saturday, April 5, 2008

    I was wondering (and hoping) if this article would ever appear. It took guts for Gary and AT to publish this article. We all know by the banners, etc. on AnandTech that Gigabyte is a major advertiser on this web site. So for Gary to "tell it like it is" is truly a breath of fresh air. Gary was quite understated in his description of that FIRESTORM that is brewing against Gigabyte on account of its rather insane handling of this fiasco involving the failed N680i boards. In fact, this matter ought to be a case study on how to NOT handle a public relations crisis!

    What is even worse is Gigabyte's equally asinine reported requirement that an owner of a failed N680i board has to actually own a QX6850 processor (and show a receipt and pictures of it) in order to get a replacement/upgraded motherboard. That is just NUTS! The N680i board NEVER supported that processor, although it was clearly and openly advertised on the web, in the literature and on the board packaging to specifically support an "Intel Dual Core 2 1333FSB Extreme" processor. That condition concerning ownership of a QX6850 CPU is just a flat out slimy maneuver by Gigabyte to avoid having to replace these failed N680i boards!

    I, for one, want to openly thank Gary and AT for their courage to disclose this matter in a published article - WELL DONE!!

    Best regards to everyone. TheBeagle
  • gfredsen - Saturday, April 5, 2008

    "it is discerning to us that this problem seems to be rearing its ugly head again." Did you perhaps mean to say disconcerting? I know how it is, believe me, I know.
  • Gary Key - Sunday, April 6, 2008

    Sorry, I set the article to a post time that occurred before I finished my final edits and Jarred had the opportunity to complete his edits. It was disconcerting to me that I was still writing while the article was live. ;-) Thanks for the comments.
  • corporategoon - Sunday, April 6, 2008

    I'll pick nits!

    There are sentence fragments, incorrect words, wrong phrases (low and behold should be Lo and behold), and sentences that just don't make sense. Even without the missing word, "In addition, we will look at what we despise about the new releases of PowerDVD 8 and WinDVD 9, maybe it’s not their fault but whose it." still isn't a proper sentence.

    Great work on the research side - I'm guessing this was just a quickly written article to address these issues before the full reviews go up. Still...
