Firmware Installation

Now that the hardware side of the drive is ready, it's time to put some intelligence (the firmware) inside. 

The firmware download is done by custom PC setups that consist of normal PC hardware (if you look closely, you can see ASUS' logo on a motherboard or two) running some sort of a Linux distro with OCZ's custom firmware download tool. If you zoom into the monitor you can see that in this case the system is applying firmware to 240GB ARC100 drives.

Once the firmware has been loaded, the drives move on to run-in testing. OCZ has developed a custom script that writes and reads all LBAs eight times in order to identify bad blocks. If a drive has more bad blocks than a preset threshold allows, it is pulled aside and either fixed or destroyed. The scripts also test performance using common benchmarking tools (e.g. AS-SSD and ATTO) to ensure that all drives meet the spec.
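To make the idea concrete, here is a minimal sketch of what such a run-in loop could look like. It is not OCZ's script: the `SimulatedDrive` class, the 4KB block size, the test patterns and the threshold of two bad blocks are all assumptions made for illustration, and a real drive would be accessed through the host interface rather than an in-memory list.

```python
BLOCK_SIZE = 4096  # assumed block size, purely for illustration


class SimulatedDrive:
    """Hypothetical in-memory stand-in for a drive under test."""

    def __init__(self, num_blocks, bad=()):
        self.num_blocks = num_blocks
        self.bad = set(bad)  # LBAs that corrupt whatever is written to them
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks

    def write(self, lba, data):
        if lba in self.bad:
            # Simulate a faulty block by flipping every bit of the data
            data = bytes(b ^ 0xFF for b in data)
        self.blocks[lba] = data

    def read(self, lba):
        return self.blocks[lba]


def run_in_test(drive, passes=8, max_bad_blocks=2):
    """Write and read back every LBA `passes` times.

    Returns (passed, bad_lbas). The threshold value is an assumption;
    the article only says that a preset threshold exists.
    """
    bad_lbas = set()
    for p in range(passes):
        # Use a different fill byte on each pass so stale data is not
        # mistaken for a successful write
        pattern = bytes([(0x55 + p) % 256]) * BLOCK_SIZE
        for lba in range(drive.num_blocks):
            drive.write(lba, pattern)
        for lba in range(drive.num_blocks):
            if drive.read(lba) != pattern:
                bad_lbas.add(lba)
    return len(bad_lbas) <= max_bad_blocks, sorted(bad_lbas)
```

Calling `run_in_test(SimulatedDrive(64, bad={3, 10, 40}))` would flag the drive, since three bad LBAs exceed the assumed threshold of two, while a drive with no faulty blocks passes.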

Currently OCZ has two different test setups. One half of the test systems are regular PCs that are very similar to the firmware download systems, whereas the other half are custom racks pictured above. OCZ is looking to move all testing to rack-based cabins since one cabin can simultaneously test 256 drives, which is far more efficient than having dozens of PC setups around that can only test a handful of drives each at a time. The test regime is the same in both cases, so it's purely a matter of space and labor efficiency.

At the moment SATA-based drives are tested through the host, which means that the IO commands are sent by the host, similar to how we test SSDs. For PCIe drives, however, OCZ is developing a Manufacturing Self Test (MST), which is essentially a custom firmware that is loaded into the drive and then reads and writes all LBAs to test for bad blocks. The benefit of MST is that it bypasses the host interface (i.e. all IO commands are generated by the controller/firmware), making the test cycle faster as the host overhead is removed.

Additionally, every month a sample of finished drives goes through more rigorous testing called Ongoing Reliability Testing (ORT) to ensure that nothing has changed in production quality. ORT consists of a Thermal Cycle Test (TCT), where the drive is subjected to thermal shocks to validate the quality of manufacturing, and a Reliability Demonstration Test (RDT), where drives are tested at elevated temperature (~70°C) to demonstrate that the mean time between failures (MTBF) meets the specification.
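OCZ did not share how the RDT numbers are calculated, but a common way to relate a short, hot test to an MTBF figure is the Arrhenius acceleration model. The sketch below assumes that model; the function names, the 0.7 eV activation energy and the 40°C use temperature in the example are illustrative assumptions, not OCZ's parameters.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(use_temp_c, stress_temp_c, activation_energy_ev):
    """Acceleration factor of a test at stress_temp_c relative to use_temp_c."""
    t_use = use_temp_c + 273.15
    t_stress = stress_temp_c + 273.15
    exponent = activation_energy_ev / BOLTZMANN_EV_PER_K * (1 / t_use - 1 / t_stress)
    return math.exp(exponent)


def demonstrated_mtbf_hours(num_drives, test_hours, failures, acceleration):
    """Rough point estimate: equivalent device-hours divided by failures.

    Zero failures is counted as one failure here, which is a crude
    simplification; a real analysis would use a confidence bound instead.
    """
    device_hours = num_drives * test_hours * acceleration
    return device_hours / max(failures, 1)


# Example with assumed values: 70°C test, 40°C typical use, 0.7 eV
af = arrhenius_acceleration(40, 70, 0.7)
mtbf = demonstrated_mtbf_hours(num_drives=100, test_hours=1000, failures=1, acceleration=af)
```

With these assumed inputs the acceleration factor comes out a little under 10, meaning each hour at 70°C counts for roughly ten hours at 40°C. That is why a relatively short test on a modest sample can support an MTBF claim in the hundreds of thousands of hours.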

The run-in testing hasn't changed much since Toshiba took over, but Toshiba did help OCZ align with its quality standards. All the processes running today have been inspected by Toshiba and meet the strict standards set by the company. Note that the purpose of run-in testing isn't to screen for firmware bugs, but to ensure that the hardware is functional. Firmware development and validation are done before mass production begins, and since Toshiba took over, OCZ has modified its development process to increase the quality and reliability of its products.

OCZ's whole philosophy has actually changed since the previous CEO left the company. In the past OCZ always tried to be first to market at any cost and tried to cover every possible micro-niche, which resulted in too many product lines for the resources OCZ had. Nowadays OCZ is putting a lot of effort into product qualification and no longer has a dozen products in development at the same time, meaning that there are now sufficient resources to properly validate every product before it enters mass production.

The run-in testing may seem light with only eight full LBA read/write spans, but honestly I don't think it's necessary to hammer a drive for days because any apparent hardware flaw should surface very quickly. Basically, the hardware either works or it doesn't, and once the drive leaves the factory it's more likely to fail due to a firmware anomaly than a physical hardware failure.


64 Comments

  • dreamslacker - Thursday, May 21, 2015 - link

    Not quite. You can only do that if the entire process is automated and centrally controlled.

    Their SMT line uses tape reels for parts. If you wanted to pre-test NAND flash before mounting it, you would need to extract, test and reinsert the chips into the tape reels (or trays) in a specific order.

    The mapping information from testing will also need to be entered into a centralized database. The reels or trays will then need to be sent to the SMT line in the same specific order and the circuit boards, once soldered, will also have to be sent to the firmware flashing racks in order as well. Otherwise, there is no way of telling which board contains which flash chips.

    Again, they have to extract and send for labeling in a specific order so that the label serials will match what is programmed into the drive.

    All of this basically means you need to invest in a completely automated line with conveyor belts, robotic arms, automatic labeling machines, database servers and apps to integrate them.

    It's simply more economical, for a smaller scale operation like this, to solder first and ditch the defective units as long as your components do not come with a high defect rate.
  • caleblloyd - Thursday, May 21, 2015 - link

    I think the article got posted Tuesday morning then taken down a few minutes later. But all good now!
  • dreamslacker - Wednesday, May 20, 2015 - link

    Odd that they intend to both shift towards a rack-based system and also MST at the same time.

    A decent rack type system would be running ASICs that offload the test regime from the host computer(s) and can be configured on-the-fly to run both drive tests as well as programming.
  • DanNeely - Wednesday, May 20, 2015 - link

    Even if most of the testing can be done on board the drive itself using the MST, they still need to plug the drives into something else to verify the SATA connection and to read out the results of the MST. MST will just let them swap new batches of drives into the tester more frequently.
  • dreamslacker - Thursday, May 21, 2015 - link

    MST is normally used when you have reduced capability test chambers / racks. For mechanical drives that need to be tested under high temperature conditions, it allows them to dump a load of drives into what is effectively a large oven with triggered power supplies for thermal tests.

    If they are utilizing 'smarter' racks, such as the Xyratex (now owned by Seagate) modules, then the ASICs will handle everything and loading/ unloading MST firmware manually isn't needed.

    The primary difference is that a smarter ASIC based unit can automatically run the tests, dump the logs and prep the drive for sale (clear the test data, reset the counters etc) whereas MST requires the worker to extract the drives and send them to another workspace to have the MST logs dumped and the production firmware loaded in.

    The ASIC based test racks still connect to a computer, you just have the ability to use a single computer for more drives (for Seagate's Xyratex modules, it's 1 computer to 192 drives). They are, however, headless units and are fully controlled by a central server via CGI & Python scripts.
    Very little user intervention is required on their fully automated line except when they need to re-configure the racks to accept modules with different drive interfaces. This is done by a single terminal console where they telnet into the machine handling the modules and edit the configuration file by VI.
  • GTan36 - Wednesday, May 20, 2015 - link

    Bought an OCZ ARC100 240GB SSD last week for my first PC build. It's really fast and you can tell the drive is high quality and manufactured well. Boots Windows 8.1 in less than 15 seconds and was pretty affordable.

    It's nice to see OCZ becoming a quality brand.
  • r3loaded - Wednesday, May 20, 2015 - link

    Excellent article, it was like a written version of a How It's Made episode!
  • Phynaz - Wednesday, May 20, 2015 - link

    Presented by Anandtech, a division of OCZ Marketing
  • xthetenth - Wednesday, May 20, 2015 - link

    Yes, this tour of a company's factories and discussion of how they make things is different than all the others because
  • MrSpadge - Wednesday, May 20, 2015 - link

    Don't fault AT for OCZ being more open than other manufacturers.
