The SF-2281 BSOD Bug

A few weeks ago I was finally able to reproduce the SF-2281 BSOD bug in house. While working on some new benchmarks for our CPU Bench database, I built an updated testbed using OCZ's Agility 3. All of the existing benchmarks in CPU Bench use a first generation Intel X25-M, and I felt that now was a good time to update that hardware. My CPU testbeds need to be stable given their importance in my life, so if I find a hardware combination that works, I tend to stick with it. I've been using Intel's DH67BL motherboard for this particular testbed since I'm not doing any overclocking, just stock Sandy Bridge numbers using Intel's HD 3000 GPU. The platform worked perfectly and had been crash free for weeks.

A slew of tablet announcements pulled me away from CPUs for a bit, but I wanted to keep testing while I worked on other things. With my eye off the ball, I accidentally continued CPU testing on an ASUS P8Z68-V Pro instead of my Intel board. All of a sudden I couldn't complete a handful of my benchmarks. I never saw a blue screen, but I'd get hard locks that required a power cycle or reset to recover. It didn't take me long to realize that I had been testing on the wrong board, but it also hit me that I may have finally reproduced the infamous SandForce BSOD issue. The recent Apple announcements once again kept me away from my CPU/SSD work, but with a way to reproduce the issue in hand, I vowed to return to the faulty testbed when my schedule allowed.

Even on the latest drive firmware, I still get hard locks on the ASUS P8Z68-V Pro. They aren't as frequent as they were with the older firmware revision, but they still happen. What's particularly interesting is that the problem doesn't occur on Intel's DH67BL, only on the ASUS board. To make matters worse, I switched power supplies on the platform and my method for reproducing the bug no longer seems to work. I'm still digging to find a good, reproducible test scenario, but I'm not quite there yet. It's also not a Sandy Bridge-specific problem, as I've seen the hard lock on ASRock's A75 Extreme6 Llano motherboard, although admittedly less frequently.

Those who have reported issues have done so from a variety of platforms including Alienware, Clevo and Dell notebooks. Clearly the problem isn't limited to a single platform.

At the same time there are those who have no problems at all. I've got a 240GB Vertex 3 in my 2011 MacBook Pro (15-inch) and haven't seen any issues. The same goes for Brian Klug, Vivek Gowri and Jason Inofuentes. I've sent them all SF-2281 drives for use in their primary machines and none of them have come back to me with issues.

I don't believe the issue is entirely due to a lack of testing/validation. SandForce drives are operating at speeds that just a year ago no one even thought of hitting on a single SATA port. Prior to the SF-2281, I'm not sure many of these motherboard manufacturers ever really tested whether you could push more than 400MB/s over their SATA ports. I know that type of testing happens during chipset development, but I'd be surprised if every single motherboard manufacturer did the same.
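For a sense of scale: SATA's 1.5Gbps, 3Gbps and 6Gbps links use 8b/10b line encoding, so only 8 of every 10 bits on the wire carry data. The short Python sketch below does that arithmetic. It is a rough estimate that ignores protocol overhead (so real-world ceilings are somewhat lower), and the function name is purely illustrative. It shows why a 3Gbps port tops out around 300MB/s, meaning sustained transfers above 400MB/s only became possible, and therefore testable, with 6Gbps ports and drives like the SF-2281.

```python
# Rough sketch: payload bandwidth of a SATA link, assuming 8b/10b line
# encoding (used by SATA 1.5/3/6Gbps) and ignoring protocol overhead,
# so actual drive throughput will be somewhat lower than these figures.

def sata_payload_mb_per_s(line_rate_gbps: float) -> float:
    """Convert a SATA line rate in Gb/s to payload bandwidth in MB/s (10^6 bytes/s)."""
    line_bits_per_s = line_rate_gbps * 1e9
    # 8b/10b encoding: every 10 bits on the wire carry 8 bits of data
    data_bits_per_s = line_bits_per_s * 8 / 10
    # bits -> bytes -> megabytes (decimal, as drive vendors quote it)
    return data_bits_per_s / 8 / 1e6


for name, rate in [("SATA 1.5Gbps", 1.5), ("SATA 3Gbps", 3.0), ("SATA 6Gbps", 6.0)]:
    print(f"{name}: {sata_payload_mb_per_s(rate):.0f} MB/s")
```

Running it prints 150, 300 and 600 MB/s for the three generations, which puts the SF-2281's 400MB/s-plus transfers well past anything a 3Gbps port could ever have carried.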

Regardless, the problem does still exist, and it's a valid reason to look elsewhere. My best advice is to look around and see whether other users with a system setup similar to yours have had issues with these drives. If you do own one of these drives and are having issues, I don't know that there's a good solution out today. Your best bet is to get your money back and try a different drive from a different vendor.

Update: I'm still working on a sort of litmus test to get this problem to appear more consistently. Unfortunately, even with the platform and conditions narrowed down, it's still an issue that appears rarely, randomly and without any sort of predictability. SandForce has offered to fly down to my office to do a trace on the system as soon as I can reproduce it regularly.

90 Comments

  • arklab - Thursday, August 11, 2011

    A pity you didn't get the new... err, revised OWC 240GB Mercury EXTREME™ Pro 6G SSD.

    It now uses the SandForce 2282 controller.
    While said to be similar to the troubled 2281, I'm wondering if it is different enough to sidestep the BSOD bug.

    It may well also be faster - at least by a bit.

    Only the 240GB has the new controller, not their 120GB, though the 480GB will also be getting it "soon".

    PLEASE get one, test, and add to this review!
  • cigar3tte - Thursday, August 11, 2011

    Anand mentioned that he didn't see any BSODs with the 240GB drives he passed out. AFAIK, only the 120GB drives have the problem.

    Also, the BSOD only happens when you are running the OS on the drive. So if you have the drive as an add-on, you'd just lose the drive, but no BSOD, I believe.

    I returned my 120GB Corsair Force 3 and got a 64GB Micro Center SSD (the first SandForce controller) instead.
  • jcompagner - Sunday, August 14, 2011

    Ehh, I have one of the first 240GB Vertex 3s in my Dell XPS 17 Sandy Bridge laptop.

    With the first firmware, 2.02, I didn't get a BSOD after I got Windows 7 64-bit installed right (using the Intel drivers, fixing the LPM settings in the registry).
    Everything was working quite well.

    Then we got 2 firmware versions that were horrible: BSODs almost every other day. Then we got 2.09, which OCZ says is a bit of a debug/intermediate release, not really a final release. And what is the end result? ROCK STABLE!! No BSODs at all anymore.

    But then came the 2.11 release, and they stressed that everybody should upgrade and also upgrade to the latest 10.6 Intel drivers. I thought OK, let's do it then.

    In 2 weeks: 3 BSODs, at least 2 of them those F6 errors again.

    Now I think it is possible to go back to 2.09 again, which I am planning to do if I get 1 more hang/BSOD...
  • geek4life!! - Thursday, August 11, 2011

    I thought the point of OCZ purchasing Indilinx was to have their own drives made "in house".

    To my knowledge they already have some drives out that use the Indilinx controller, with more to come in the future.

    I would like your take on this, Anand.
  • zepi - Thursday, August 11, 2011

    How about digging deeper into SSD behavior in server usage?

    What kind of penalties can be expected if daring admins use a couple of SSDs in a RAID for database/Exchange storage? Or should one expect problems if you run a truckload of virtual machines from a reasonably priced RAID-5 of MLC SSDs?

    Does the lack of TRIM support in RAID kill the performance, and which drivers are the best, etc.?
  • cactusdog - Thursday, August 11, 2011

    Great review, but why wouldn't you use the latest RST driver? It's supposed to fix some issues.
  • Bill Thomas - Thursday, August 11, 2011

    What's your take on the new EdgeTech Boost SSDs?
  • ThomasHRB - Thursday, August 11, 2011

    Thanks for another great article Anand, I love reading all the articles on this site. I noticed that you have also managed to see the BSOD issues that others are having.

    I don't know if my situation is related, but from personal experience and a bit of trial and error, I found that unstable power seems to be related to the frequency of these BSOD events. I recently built a new system while I was on holiday in Brisbane, Australia.
    Basic Specs:
    Mainboard - Gigabyte GA-Z68X-UD3R-B3
    Graphics - Gigabyte GV-N580UD-15I
    CPU - Intel Core i7-2600K (stock clock)
    Cooler - Corsair H60 (great for computers running in countries where ambient temps regularly reach 35 degrees Celsius)
    PSU - Corsair TX750

    In Brisbane my machine ran stable for 2 solid weeks (no shutdowns, only restarts during software installations, OS updates, etc.).

    However, when I got back to Fiji and powered up my machine, I had these BSODs every day or 2 (I shut down my machine during the day when I am at work and at night when I am asleep). CPU temp never exceeded 55 degrees C, measured with CoreTemp and RealTemp, and GPU temp also never went above 60 degrees C, measured with the nvidia gadget from addgadget.com.

    All my computers sit behind an APC Back-UPS RS (BR1500). I also have an Onkyo TX-NR609 hooked up to the mini-HDMI port, so I disconnected that for a few days, but I saw no difference.

    However, last Friday a major power spike caused my broadband router (D-Link DIR-300) to crash, and I had to reset the unit to get it working. My machine also had a BSOD at that exact same moment. So I thought it was possible that I was getting a power spike transmitted through the Ethernet cable from my ISP (the only thing I did not have an isolation unit for).

    So the next day I bought and installed an APC ProtectNET (PNET1GB), and I have not had a single BSOD in almost 1 full week of running (no shutdowns, and my Onkyo has been hooked back up).

    Although this narrative is long and reflects nothing more than my personal experiences, I at least found it strange that my BSODs seem to have nothing to do with the Vertex 3 and more to do with random power fluctuations in my living environment.

    And it may be possible that other people are having the same problem I had, and attributing it to a particular piece of hardware simply because other people have done the same attribution.

    Kind Regards.
    Thomas Rodgers
  • etamin - Thursday, August 11, 2011

    Great article! The only thing that's holding me back from buying an SSD is that secure data erasing is difficult on an SSD and a full rewrite of the drive is neither time efficient nor helpful to the longevity of the drive...or so I have heard from a few other sources. What is your take on this secure deletion dilemma (if it actually exists)?
  • lyeoh - Friday, August 12, 2011

    AFAIK, erasing a "conventional" 1TB drive is not very practical either (it takes about 3 hours).

    Options:
    a) Use encryption: refer to the "noncompressible" benchmarks, use the more reliable SSDs, and use hardware acceleration or fast CPUs, e.g. http://www.truecrypt.org/docs/?s=hardware-accelera...
    b) Use physical destruction - e.g. thermite, throwing it into lava, etc :).