The SF-2281 BSOD Bug

A few weeks ago I was finally able to reproduce the SF-2281 BSOD bug in house. In working on some new benchmarks for our CPU Bench database I built an updated testbed using OCZ's Agility 3. All of the existing benchmarks in CPU Bench use a first generation Intel X25-M and I felt like now was a good time to update that hardware. My CPU testbeds need to be stable given their importance in my life so if I find a particular hardware combination that works, I tend to stick to it. I've been using Intel's DH67BL motherboard for this particular testbed since I'm not doing any overclocking - just stock Sandy Bridge numbers using Intel's HD 3000 GPU. The platform worked perfectly and it has been crash free for weeks.

A slew of tablet announcements pulled me away from CPUs for a bit, but I wanted to get more testing done while I worked on other things. With my eye off the ball I accidentally continued CPU testing using an ASUS P8Z68-V Pro instead of my Intel board. All of a sudden I couldn't complete a handful of my benchmarks. I never did see a blue screen but I'd get hard locks that required a power cycle/reset to fix. It didn't take me long to realize that I had been testing on the wrong board, but it also hit me that I may have finally reproduced the infamous SandForce BSOD issue. The recent Apple announcements once more kept me away from my CPU/SSD work but with a way to reproduce the issue I vowed to return to the faulty testbed when my schedule allowed.

Even on the latest drive firmware, I still get hard locks on the ASUS P8Z68-V Pro. They aren't as frequent as before with the older firmware revision, but they still happen. What's particularly interesting is that the problem doesn't occur on Intel's DH67BL, only on the ASUS board. To make matters worse, I switched power supplies on the platform and my method for reproducing the bug no longer seems to work. I'm still digging to try and find a good, reproducible test scenario but I'm not quite there yet. It's also not a Sandy Bridge problem as I've seen the hard lock on ASRock's A75 Extreme6 Llano motherboard, although admittedly not as frequently.

Those who have reported issues have done so from a variety of platforms including Alienware, Clevo and Dell notebooks. Clearly the problem isn't limited to a single platform.

At the same time there are those who have no problems at all. I've got a 240GB Vertex 3 in my 2011 MacBook Pro (15-inch) and haven't seen any issues. The same goes for Brian Klug, Vivek Gowri and Jason Inofuentes. I've sent them all SF-2281 drives for use in their primary machines and none of them have come back to me with issues.

I don't believe the issue is entirely due to a lack of testing/validation. SandForce drives are operating at speeds that just a year ago no one even thought of hitting on a single SATA port. Prior to the SF-2281 I'm not sure that a lot of these motherboard manufacturers ever really tested if you could push more than 400MB/s over their SATA ports. I know that type of testing happens during chipset development, but I'd be surprised if every single motherboard manufacturer did the same.
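To put that 400MB/s figure in context, here's a rough back-of-the-envelope sketch of usable SATA bandwidth. It assumes the standard 8b/10b line encoding (10 bits on the wire per data byte) and ignores framing and protocol overhead, so real-world throughput will be lower. The function name is hypothetical and only there for illustration.

```python
def sata_payload_mb_per_s(line_rate_gbps: float) -> float:
    """Approximate upper bound on SATA payload bandwidth in MB/s.

    Assumption: 8b/10b encoding, so every data byte costs 10 line bits.
    Framing and protocol overhead are ignored.
    """
    bits_per_second = line_rate_gbps * 1e9
    bytes_per_second = bits_per_second / 10  # 10 line bits per data byte
    return bytes_per_second / 1e6  # decimal megabytes per second


# Compare the two link speeds found on boards of this era.
for name, rate in [("SATA 3Gbps", 3.0), ("SATA 6Gbps", 6.0)]:
    print(f"{name}: ~{sata_payload_mb_per_s(rate):.0f} MB/s upper bound")
```

A 3Gbps port tops out at roughly 300MB/s, so anything north of 400MB/s can only happen on a 6Gbps port, which was still fairly new territory for most boards.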

Regardless, the problem does still exist and it's a valid reason to look elsewhere. My best advice is to look around and see if other users with a system setup similar to yours have had issues with these drives. If you do own one of these drives and are having issues, I don't know that there's a good solution out today. Your best bet is to get your money back and try a different drive from a different vendor.

Update: I'm still working on a sort of litmus test to get this problem to appear more consistently. Unfortunately even with the platform and conditions narrowed down, it's still an issue that appears rarely, randomly and without any sort of predictability. SandForce has offered to fly down to my office to do a trace on the system as soon as I can reproduce it regularly. 

90 Comments

  • rigged - Sunday, August 14, 2011

    are you using the SF-2281 or SF-2282 based OWC drive?

    only the new 240GB and 480GB drives from OWC use this controller.

    http://eshop.macsales.com/item/Other+World+Computi...

    Under Specs

    Controller: SandForce 2282 Series
  • Justin Case - Sunday, August 14, 2011

    It's not just the BSOD. Even systems that don't crash have frequent freezes for anything up to 90 seconds. That's enough to make network transfers abort, connections to game servers drop, etc.

    I've tried three Corsair drives on multiple platforms and I know people who have used those and also OCZ. Not a single drive was 100% stable on any platform. They tended to crash more on Intel chipsets and freeze more on AMD chipsets (sometimes recoverable, sometimes a hard lock), but NOT A SINGLE ONE was problem-free for more than 2 or 3 days in a row (often you'll get two or three freezes within the same hour).

    The first job of a drive is to reliably hold your data. People use SSDs to install their OS and applications. It takes days to reinstall and recover from errors. It's irrelevant if some drive gives you 500 happybytes in some benchmark when the same drive keeps losing your thesis or getting you killed in Team Fortress. I have systems with Raptors that have been running for 5 years without a single error.

    If you have any problems (which you will), don't let them string you along with nonsense about your cables or obscure BIOS options or promises about future fixes. Return the drive and demand a refund. Both OCZ and Corsair are still selling drives that they KNOW to be defective, and removing any reference to those problems from the support section of their sites (you'll still find thousands of complaints in their user forums, though). Demanding refunds (or starting a class action suit) seems to be the only language they understand.
  • SjarbaDarba - Sunday, August 14, 2011

    I experience some hard locks too, mainly during gaming only since upgrading to a 120GB Vertex 3.

    System is an X58-UD7 + i7-960, 2 GTX570 OC Sli, Seasonic X-850 80+ GOLD, 6GB Corsair DDR3 1600C8.

    Was originally using 2 x 300GB Velociraptors in RAID 0 with WD1002FAEX and Seagate 2TB XT, stable for 1-2 months before upgrading to the V3. Storage configuration since upgrade is 120GB Vertex3, WD1002FAEX and Raptors RAID 0.

    System perfectly stable under 600GB RAID 0 OS with Crysis 2, CS:S, LoL, L4D, L4D2 and Borderlands all playing stable at all loads with hardware monitoring active, no problems found with any hardware or software at this point, system performed flawlessly for all tasks.

    Dropped the V3 in with the typical "It's fast - but it could be faster" attitude we all know and love and instantly started experiencing ... whackness. System works flawlessly 99% of the time, however, a few times a week I will lock up and need to power cycle - I have the SSD running in AHCI with TRIM etc. enabled, page file, defrag etc. turned off and pretty much every detail of the drive perfectly specced for optimum performance.

    If I lock up and have to cycle, upon restart the SATA controller the SSD is attached to will hang at BIOS and not detect the V3 - however - cycling again at this point allows the SSD to be detected within ~1 second and Windows boots normally.

    At this point, however (and using nVidia 275.33 drivers) returning to desktop boots me in 800x600 resolution with no nVidia control panel and a further power cycle is required again to reset the resolution.

    Yet to test this problem with nVidia 280.16 drivers but haven't had stability problems since then.

    Sorry for any tl;dr, just thought Anand might like to hear about a strange error I've encountered in the SF controller.

    P.S: System is 3DMark, Furmark and Prime stable, it just has some whack locks randomly and the SSD disappears completely for a power cycle.
  • readyrover - Monday, August 15, 2011

    I was going to dive into my first SSD with a Bulldozer build on the upcoming horizon...until this all shakes out...absolutely no way. My usage is for processing large music files on a Digital Audio Workstation with multiple time based effects and multi-tracks of instruments. I have been experiencing some latency bottlenecks and thought "wow" ssd is an instant fix!

    If they have ironed out the problems and the reviews' negative percentages drop back below an astounding 20% of my recent research..then perhaps a year from now...Bulldozers should be less expensive then as well..

    Just my humble opinion, but I can't roll the dice on a hit and miss crash...."Please Mr. $120 hour guitarist...would you wait an hour for me to fix the computer and replay that absolutely inspired, one of a kind improvisation...AGAIN!"

    Brrrr...shiver...run away fast!
  • Gothmoth - Friday, August 19, 2011

    i have a few asus z68- v pro boards (three to be exact).

    all of them have a vertex3 120 GB SSD as C drive.
    all have 16 GB g.skill ram and run win 7 64 bit sp1.

    i had not a single issue with the vertex 3 since i bought them (13. april 2011).

    i have still the first firmware running.
    thank god i have avoided updating to firmware v2.06 or v2.09.

    i have put the vertex3 240 GB from a friend in my system with firmware 2.06.
    we could reproduce the BSOD after 1 hour.
    he has constant crashes on his gigabyte motherboard based system.

    we put one of my vertex3 120GB SSD in his system and it was running flawless for 2 days.
  • twindragon6 - Friday, August 26, 2011

    I know the market sucks! But I would rather pay more for something that actually works than pay less for something that doesn't and be stuck with an expensive paperweight!
  • alpha754293 - Friday, September 02, 2011

    Anand:

    Does that BSOD bug only affect drives that are boot drives? i.e. What would happen if the test drives were slave/data/non-OS-containing drives? Does it still do the same BSOD thing?
  • Keith2468 - Monday, December 12, 2011

    Digital people tend to think of digital issues when looking for the causes of computer hardware and software failure. But sometimes the failures are not digital in origin.

    The power supply may well be critical to SSD failures.

    What causes SSD failures? Largely power disturbances to the SSD.

    Why are SSDs with smaller IOPS and smaller caches less likely to fail?
    Less data to move from volatile RAM cache to Flash when power disturbances occur.

    Why should you not use a notebook SSD in a desktop?
    A notebook SSD designer will typically assume that the notebook's battery means he doesn't have to design for power disturbances.

    "The design of an SSD's power down management system is a fundamental characteristic of the SSD which can determine its suitability and compatibility with user operational environments. Systems integrators must take this into account when qualifying SSDs in new applications - because subtle differences in OS timings, rack power loading and rack logic affect some types of SSDs more than others. Users should be aware that power management inside the SSD (a factor which doesn't get much space in most product datasheets) is as important to reliable operation as management of endurance, IOPS, cost and other headline parameters."

    http://www.storagesearch.com/ssd-power-going-down....
  • jfraser7 - Friday, November 14, 2014

    This article is very useful because Mac OS X 10.10 Yosemite dropped all support for third-party Solid State Drives, except for those which use SandForce controllers.
  • jfraser7 - Friday, November 14, 2014

    Also, all three of Kingston's recent Solid State Drive lines (V300, KC300 & HyperX) use SandForce controllers.
