The SF-2281 BSOD Bug

A few weeks ago I was finally able to reproduce the SF-2281 BSOD bug in house. While working on some new benchmarks for our CPU Bench database, I built an updated testbed using OCZ's Agility 3. All of the existing benchmarks in CPU Bench use a first generation Intel X25-M and I felt like now was a good time to update that hardware. My CPU testbeds need to be stable given their importance in my life, so if I find a particular hardware combination that works, I tend to stick to it. I've been using Intel's DH67BL motherboard for this particular testbed since I'm not doing any overclocking - just stock Sandy Bridge numbers using Intel's HD 3000 GPU. The platform worked perfectly and it has been crash free for weeks.

A slew of tablet announcements pulled me away from CPUs for a bit, but I wanted to get more testing done while I worked on other things. With my eye off the ball I accidentally continued CPU testing using an ASUS P8Z68-V Pro instead of my Intel board. All of a sudden I couldn't complete a handful of my benchmarks. I never did see a blue screen but I'd get hard locks that required a power cycle/reset to fix. It didn't take me long to realize that I had been testing on the wrong board, but it also hit me that I may have finally reproduced the infamous SandForce BSOD issue. The recent Apple announcements once more kept me away from my CPU/SSD work, but with a way to reproduce the issue I vowed to return to the faulty testbed when my schedule allowed.

Even on the latest drive firmware, I still get hard locks on the ASUS P8Z68-V Pro. They aren't as frequent as they were with the older firmware revision, but they still happen. What's particularly interesting is that the problem doesn't occur on Intel's DH67BL, only on the ASUS board. To make matters worse, I switched power supplies on the platform and my method for reproducing the bug no longer seems to work. I'm still digging to try to find a good, reproducible test scenario but I'm not quite there yet. It's also not limited to Sandy Bridge, as I've seen the hard lock on ASRock's A75 Extreme6 Llano motherboard, although admittedly not as frequently.

Those who have reported issues have done so from a variety of platforms including Alienware, Clevo and Dell notebooks. Clearly the problem isn't limited to a single platform.

At the same time there are those who have no problems at all. I've got a 240GB Vertex 3 in my 2011 MacBook Pro (15-inch) and haven't seen any issues. The same goes for Brian Klug, Vivek Gowri and Jason Inofuentes. I've sent them all SF-2281 drives for use in their primary machines and none of them have come back to me with issues.

I don't believe the issue is entirely due to a lack of testing/validation. SandForce drives are operating at speeds that just a year ago no one even thought of hitting on a single SATA port. Prior to the SF-2281 I'm not sure that a lot of these motherboard manufacturers ever really tested if you could push more than 400MB/s over their SATA ports. I know that type of testing happens during chipset development, but I'd be surprised if every single motherboard manufacturer did the same.

Regardless, the problem does still exist and it's a valid reason to look elsewhere. My best advice is to look around and see if other users with a system setup similar to yours have had issues with these drives. If you do own one of these drives and are having issues, I don't know that there's a good solution out today. Your best bet is to get your money back and try a different drive from a different vendor.

Update: I'm still working on a sort of litmus test to get this problem to appear more consistently. Unfortunately even with the platform and conditions narrowed down, it's still an issue that appears rarely, randomly and without any sort of predictability. SandForce has offered to fly down to my office to do a trace on the system as soon as I can reproduce it regularly. 


90 Comments


  • V3ctorPT - Thursday, August 11, 2011 - link

    Exactly what I think, I have an X25-M 160Gb and that thing is still working flawlessly with the advertised speeds, every week he gets the Intel Optimizer and it's good...

    Even my Gskill Falcon 1 64Gb is doing great, no BSODs, no unexpected problems, the only "bad" thing that I saw was in SSD Life Free, when it says my SSD is at 80% of NAND wear n' tear, my Intel is at 100%.

    CrystalDisk Info confirms those conditions (that SSD Life reports), Anand, do you think these "tools" are trust worthy? Or they're some sort of scam?
  • SjarbaDarba - Sunday, August 14, 2011 - link

    Where I work - we have had 265 Vertex II drives come back since June 2010.

    That's one every day or two since for our 1 store, hardly reliable tech.
  • Ikefu - Thursday, August 11, 2011 - link

    "a 64Gb 25nm NAND die will set you back somewhere from $10 - $20. If we assume the best case scenario that's $160 for the NAND alone"

    I think you meant to say an 8Gb Nand die will set you back $10-$20. Not 64Gb

    Yay math typos. Those are always hard to catch.
  • bobbozzo - Thursday, August 11, 2011 - link

    No, 64Gb = 8GB

    Note the capitalization/case.
  • Ryan Smith - Thursday, August 11, 2011 - link

    We're using gigaBITs (little b), not gigaBYTEs (big B).

    64Gb x 16 modules / 8 bits-to-bytes = 128GBytes.
  • Ikefu - Thursday, August 11, 2011 - link

    Ah Capitalization for the loss, I see my error now. Thank you =)

    Later in the article they refer to 8GB so the switch from Gigabits to Gigabytes threw me.
  • philosofool - Thursday, August 11, 2011 - link

    I made the same mistake at first.

    Can I request that, in the future, we write either in terms of bytes or bits for the same type of part? There's no need to switch from bits to bytes when talking about storage capacity and you just confuse a reader or two when you do.
  • nbrenner - Thursday, August 11, 2011 - link

    I understand the GB vs Gb argument, but even if it takes 8 modules to make up 64Gb it was stated that a 64Gb die would set you back $10-$20, so saying a 128Gb drive would cost $160 didn't make any sense until 3 paragraphs later when it said the largest die you could get is 8GB.

    I think most of us read that if 64Gb is $10-$20, then why in the world would it cost $160 to get to 128Gb?
  • Death666Angel - Friday, August 12, 2011 - link

    Unless he edited it, it clearly states "128GB". I think the b=bit and B=byte is quite clear, though I would not complain if they stick with one thing and not change it in between. :-)
  • Mathieu Bourgie - Thursday, August 11, 2011 - link

    Once again, a fantastic article from you Anand on SSDs.

    I couldn't agree more on the state of consumer SSDs and their reliability (or lack of...).

    The problem as you mentioned is the small margins that manufacturers are getting (if they are actually manufacturing it...), which results in less QA than required and products that launch with too many bugs. The issue is, this won't go away, because many customers do want the price per GB to go down before they'll buy. Probably waiting for that psychological $1 per GB, that same $1 per GB that HDDs reached many years ago.

    With prices per GiB (actual capacity in Windows) dropping below $1.50, reliability is one of the last barriers for SSDs to actually become mainstream. Most power users now have one or are considering one, but SSDs are still very rare in most desktops/laptops sold by HP, Dell and the like. Sometimes they will be offered as an option (with additional cost), but rarely as a standard drive (only a handful or two of exceptions come to mind for laptops).

    I can only hope that the reliability situation improves, because I do wish to see a major computing breakthrough, that is for SSDs to replace HDDs entirely one day. As you said years ago in an early SSD article, once you had a SSD, you can't go without one.

    My desktop used to have two Samsung F3 1TB in RAID 0. Switching to it from my laptop (which had an Intel 120GB X25-M G2) was almost painful. Being accustomed to the speed of the SSD, the HDDs felt awfully slow. And I'm talking about two top of the line (besides raptors) HDDs in RAID 0 here, not a five year old IDE HDD here.

    It's always a pleasure to read your articles Anand, keep up the outstanding work!
