The SF-2281 BSOD Bug

A few weeks ago I was finally able to reproduce the SF-2281 BSOD bug in house. In working on some new benchmarks for our CPU Bench database I built an updated testbed using OCZ's Agility 3. All of the existing benchmarks in CPU Bench use a first generation Intel X25-M and I felt like now was a good time to update that hardware. My CPU testbeds need to be stable given their importance in my life so if I find a particular hardware combination that works, I tend to stick to it. I've been using Intel's DH67BL motherboard for this particular testbed since I'm not doing any overclocking - just stock Sandy Bridge numbers using Intel's HD 3000 GPU. The platform worked perfectly and it has been crash free for weeks.

A slew of tablet announcements pulled me away from CPUs for a bit, but I wanted to get more testing done while I worked on other things. With my eye off the ball I accidentally continued CPU testing using an ASUS P8Z68-V Pro instead of my Intel board. All of a sudden I couldn't complete a handful of my benchmarks. I never did see a blue screen but I'd get hard locks that required a power cycle/reset to fix. It didn't take me long to realize that I had been testing on the wrong board, but it also hit me that I may have finally reproduced the infamous SandForce BSOD issue. The recent Apple announcements once more kept me away from my CPU/SSD work but with a way to reproduce the issue I vowed to return to the faulty testbed when my schedule allowed.

Even on the latest drive firmware, I still get hard locks on the ASUS P8Z68-V Pro. They aren't as frequent as they were with the older firmware revision, but they still happen. What's particularly interesting is that the problem doesn't occur on Intel's DH67BL, only on the ASUS board. To make matters worse, I switched power supplies on the platform and my method for reproducing the bug no longer seems to work. I'm still digging to try to find a good, reproducible test scenario but I'm not quite there yet. It's also not a Sandy Bridge problem as I've seen the hard lock on ASRock's A75 Extreme6 Llano motherboard, although admittedly not as frequently.

Those who have reported issues have done so from a variety of platforms including Alienware, Clevo and Dell notebooks. Clearly the problem isn't limited to a single platform.

At the same time there are those who have no problems at all. I've got a 240GB Vertex 3 in my 2011 MacBook Pro (15-inch) and haven't seen any issues. The same goes for Brian Klug, Vivek Gowri and Jason Inofuentes. I've sent them all SF-2281 drives for use in their primary machines and none of them have come back to me with issues.

I don't believe the issue is entirely due to a lack of testing/validation. SandForce drives are operating at speeds that just a year ago no one even thought of hitting on a single SATA port. Prior to the SF-2281 I'm not sure that a lot of these motherboard manufacturers ever really tested if you could push more than 400MB/s over their SATA ports. I know that type of testing happens during chipset development, but I'd be surprised if every single motherboard manufacturer did the same.

Regardless, the problem still exists and it's a valid reason to look elsewhere. My best advice is to look around and see whether other users with a system setup similar to yours have had issues with these drives. If you do own one of these drives and are having issues, I don't know that there's a good solution out today. Your best bet is to get your money back and try a different drive from a different vendor.

Update: I'm still working on a sort of litmus test to get this problem to appear more consistently. Unfortunately even with the platform and conditions narrowed down, it's still an issue that appears rarely, randomly and without any sort of predictability. SandForce has offered to fly down to my office to do a trace on the system as soon as I can reproduce it regularly. 


90 Comments


  • secretanchitman - Thursday, August 11, 2011 - link

    Thanks for the great review Anand! I'm rocking a Patriot Wildfire 240GB in my 2011 15" mbp (2.2ghz, 8GB, 6750m 1GB, 1680x1050 anti-glare) and it's been 100% perfect. I haven't seen any errors whatsoever in snow leopard, lion, and windows 7 via boot camp.

    These benchmarks are pretty consistent with what I see on my own drive, although the 240GB is a bit higher all around. :)
  • Movieman420 - Thursday, August 11, 2011 - link

    Here is a good summary of the issue to date:

    From OCZ:

    '...I think the ultimate fix will come with a FW coupled with Orom change and new RST/IME driver and possibly UEFI update for the motherboards, the issue needs to be nailed down, at this time its floating around with Orom changes etc and what ever SF do can be countered by what the Orom is doing...and yes SF are talking to Intel so i would hope between them they can get it worked out....'

    Full Post:

    http://www.ocztechnologyforum.com/forum/showthread...
  • Nickel020 - Thursday, August 11, 2011 - link

    Was gonna post this as well as the likely cause for the problems with the Asus board.

    Then again, if the Intel H67 is your testbed Anand, have you even updated the BIOS or are you staying with an older one for comparability? With an older BIOS it might have an older OROM as well and thus the issue could then not be solely caused by the OROM.
  • xijox - Thursday, August 11, 2011 - link

    Thank you for another great write-up, Anand!

    I'm curious why you left the Corsair out of several of the benchmark results (4KB Random Read, 128KB Sequential Read and Write)?
  • beginner99 - Thursday, August 11, 2011 - link

    Maybe because it performed worse than expected and the site got a little bribe from Corsair not to publish but instead put a nice commercial on the last page?
  • philosofool - Thursday, August 11, 2011 - link

    Don't be a jerk. If you're going to accuse someone of something like this, have some evidence.
  • Anand Lal Shimpi - Thursday, August 11, 2011 - link

    Or because I accidentally put the wrong graphs in the piece :) It has been fixed.

    Take care,
    Anand
  • Beenthere - Thursday, August 11, 2011 - link

    Sorry but the current SSDs are unreliable at this point in time and it's unscrupulous to continue selling these SSDs when a manufacturer doesn't know the root cause, doesn't have a resolution for the operational/compatibility issues and cannot tell consumers which systems can use these SSDs without issue.

    It's good to see Anandtech substantiate what I have been saying for some time. Now consumers need to stop purchasing these SSDs until they are properly revised so they function without issues for everyone.
  • gevorg - Thursday, August 11, 2011 - link

    SSDs offer amazing performance, but too many of them are cursed with reliability problems. "A is faster than B at price point C" is not sufficient to make buying decisions with SSDs. When and how can benchmarks examine SSD quality issues?
  • Axonn - Thursday, August 11, 2011 - link

    Why is the Corsair Force 3 in only one of the random/sequential speed benchmarks? And why can't I see the Corsair GT anywhere on the first page?
