The SF-2281 BSOD Bug

A few weeks ago I was finally able to reproduce the SF-2281 BSOD bug in house. In working on some new benchmarks for our CPU Bench database I built an updated testbed using OCZ's Agility 3. All of the existing benchmarks in CPU Bench use a first generation Intel X25-M and I felt like now was a good time to update that hardware. My CPU testbeds need to be stable given their importance in my life so if I find a particular hardware combination that works, I tend to stick to it. I've been using Intel's DH67BL motherboard for this particular testbed since I'm not doing any overclocking - just stock Sandy Bridge numbers using Intel's HD 3000 GPU. The platform worked perfectly and it has been crash free for weeks.

A slew of tablet announcements pulled me away from CPUs for a bit, but I wanted to get more testing done while I worked on other things. With my eye off the ball I accidentally continued CPU testing using an ASUS P8Z68-V Pro instead of my Intel board. All of a sudden I couldn't complete a handful of my benchmarks. I never did see a blue screen but I'd get hard locks that required a power cycle/reset to fix. It didn't take me long to realize that I had been testing on the wrong board, but it also hit me that I may have finally reproduced the infamous SandForce BSOD issue. The recent Apple announcements once more kept me away from my CPU/SSD work but with a way to reproduce the issue I vowed to return to the faulty testbed when my schedule allowed.

Even on the latest drive firmware, I still get hard locks on the ASUS P8Z68-V Pro. They aren't as frequent as before with the older firmware revision, but they still happen. What's particularly interesting is that the problem doesn't occur on Intel's DH67BL, only on the ASUS board. To make matters worse, I switched power supplies on the platform and my method for reproducing the bug no longer seems to work. I'm still digging to try and find a good, reproducible test scenario but I'm not quite there yet. It's also not a Sandy Bridge problem as I've seen the hard lock on ASRock's A75 Extreme6 Llano motherboard, although admittedly not as frequently.

Those who have reported issues have done so from a variety of platforms including Alienware, Clevo and Dell notebooks. Clearly the problem isn't limited to a single platform.

At the same time there are those who have no problems at all. I've got a 240GB Vertex 3 in my 2011 MacBook Pro (15-inch) and haven't seen any issues. The same goes for Brian Klug, Vivek Gowri and Jason Inofuentes. I've sent them all SF-2281 drives for use in their primary machines and none of them have come back to me with issues.

I don't believe the issue is entirely due to a lack of testing/validation. SandForce drives are operating at speeds that just a year ago no one even thought of hitting on a single SATA port. Prior to the SF-2281 I'm not sure that a lot of these motherboard manufacturers ever really tested if you could push more than 400MB/s over their SATA ports. I know that type of testing happens during chipset development, but I'd be surprised if every single motherboard manufacturer did the same.

Regardless, the problem does still exist and it's a valid reason to look elsewhere. My best advice is to look around and see if other users with a system setup similar to yours have had issues with these drives. If you do own one of these drives and are having issues, I don't know that there's a good solution out today. Your best bet is to get your money back and try a different drive from a different vendor.

Update: I'm still working on a sort of litmus test to get this problem to appear more consistently. Unfortunately even with the platform and conditions narrowed down, it's still an issue that appears rarely, randomly and without any sort of predictability. SandForce has offered to fly down to my office to do a trace on the system as soon as I can reproduce it regularly. 


90 Comments

  • bernardl - Thursday, August 11, 2011 - link

    I am pretty surprised by how little mention the OWC SSD gets in your introduction and conclusion. It seems to be among the top 3 performers in every single one of your tests, and their products have proven extremely reliable and durable over the years.

    I am using 3 of their SSDs (Mac Pro boot, Mac mini boot and external storage for a music server) and have experienced zero issues and stable/fast performance.

    Cheers,
    Bernard
  • arntc - Friday, August 12, 2011 - link

    I'm not sure if I read correctly between the lines: should one stick to the previous generation of consumer SSDs if you're on the prowl for a system disk in a notebook?

    If the 3-dimensional comparison of Price/Performance/Reliability is charted, which SSD would currently come out on top (subjective comments allowed)?
  • 86waterpumper - Friday, August 12, 2011 - link

    I just bought the 120GB version of the Mercury Extreme 6G for my Sandy Bridge build a few weeks ago. I sure wish I had known they were coming out with faster drives :( Oh well, so far no BSOD issues, and I hope I don't see one!
  • FunBunny2 - Friday, August 12, 2011 - link

    My understanding is that NAND is measured in bits because the controllers see the data as bits, not bytes, leveling across available (addressable) dies. Yes?
  • vashtyphoon - Friday, August 12, 2011 - link

    Thanks for the article, very informative, but it made me cringe.

    I just placed an order for a PC build based off of the Sandy Bridge guide, with the OCZ Vertex 2, but I changed the motherboard to an ASUS P8Z68-V LE, same base model as the setup that caused the BSODs with a Vertex 3 on page 2.... Is this going to really mess me up or do the Vertex 2s have a better track record?

    Any thoughts?
  • brakhage - Friday, August 12, 2011 - link

    I just got 3 OCZ drives, 2 vertex3 60's, and a Solid3 120. I quickly encountered the BSOD/freeze issue on the 2 60s (OS drives). After extensive research and thread-chasing, it seems like OCZ has a solid solution, though it's not simple.

    Basically, you update RST and INF drivers (and throw in a BIOS update if possible), then flash the firmware (2.11), clear cmos and fire it up. I've been BSOD-free ever since... EXCEPT when I use the Solid3, which I got a few days later, and haven't flashed yet. (I'm using it for additional programs, so when I play a game that's on the Solid3, I freeze up. Maybe. It may be an overclocking issue there, I haven't had time to figure it out. I just got that Solid 3 a couple days ago - the same day I OC'd the new machine.)

    Full details on this fix can be found on the OCZ forums; once in a terse post, once in a more verbose one.

    So: the firmware flash is a bit of a problem. The tool they provide didn't detect my drives, and it isn't recommended to flash a drive from the OS stored on that drive. I installed windows 7 on a second (spinner) hdd, and tried the tool; it still didn't work. (Maybe because they're in RAID 0?) So I put Ubuntu on the spinner and flashed them through that with no problem.

    (The HDD has since been disconnected, and I haven't gotten around to hooking it up again to flash the solid3, but I'll try to do that this weekend - hopefully that will fix this one too.)

    All this said, the above posters are absolutely right - this should never have happened. However, I'm WAAYYY too impatient to wait for Sandforce to solve the problem, and that impatience extends to waiting for programs to load or for the system to boot. SSDs are like Linux - freaking awesome, but, yes, they aren't the plug-n-play, fire-up-and-forget, McDonalds-style components we've come to expect when running big name OS's. Frustrating, yes, but totally worth it.
  • KPOM - Saturday, August 13, 2011 - link

    Given the ongoing reliability issues with the Sandforce drives, perhaps Apple is justified in using "slower" Toshiba and Samsung SSDs. I've had SSDs since my 2008 Rev B MacBook Air and haven't had a problem with them (the 2008 had a Samsung, my 2010 a Toshiba, and my 2011 a Samsung).
  • Ao1 - Saturday, August 13, 2011 - link

    Lal can you please provide some statistics to back up your claim that the 8MB bug is a plague? How many occurrences of the bug have been reported and how many 310 have been sold?

    Can you also please confirm why you suspect that Intel have cut corners, resulting in deficiencies in quality control procedures? Perhaps half of the validation team were made redundant; or is that statement just an outrageous speculation?
  • mikeyd55 - Saturday, August 13, 2011 - link

    In 2011, for a consumer to even have to be concerned about technology issues like this, is very disconcerting and bad for everyone. Don’t release a product when it’s not stable and/or hasn’t been thoroughly tested – even if it has to cost more as a result. It’s cheaper in the end for all! It reminds me of my experiences with cell / smart phones that are continuously released to consumers despite their software / hardware/ firmware not being ready for prime time. Regarding my recent (June ‘11) SSD build: OCZ Vertex 3 MAX IOPS 120 GB (updated to firmware 2.06), no hard drive, Intel DZ68DB mb (updated to second BIOS revision), and Windows 7 Home Premium 64 bit; I’ve been fortunate, so far at least, to not have experienced any BSOD’s, although under this cloud of uncertainty, I’m especially leery of updating mb BIOS, firmware or any drivers until Sandforce gets a true handle on this problem.
  • 86waterpumper - Saturday, August 13, 2011 - link

    Well I have had two BSODs so far just this weekend :( System is a 2500K, non-overclocked, running in the normal temperature ranges. The first BSOD happened during a Windows update so I chalked it up to that, but the 2nd one happened a while ago with the system just sitting there idle. Looks like the OWC drives for sure are affected too. Now my question is, how do I prove it is the hard drive causing the BSOD lol. Also is there any newer firmware than 3.19 out yet to install or what is the fix?
