Inside OCZ's Factory: How SSDs Are Made
by Kristian Vättö on May 20, 2015 8:30 AM EST
Now that the hardware side of the drive is ready, it's time to put some intelligence (the firmware) inside.
The firmware download is done by custom PC setups that consist of normal PC hardware (if you look closely, you can see ASUS' logo on a motherboard or two) running some sort of Linux distro with OCZ's custom firmware download tool. If you zoom into the monitor, you can see that in this case the system is applying firmware to 240GB ARC100 drives.
Once the firmware has been loaded, the drives move to run-in testing. OCZ has developed a custom script that writes and reads all LBAs eight times in order to identify bad blocks. If a drive has more bad blocks than a preset threshold allows, it is pulled from the line and either fixed or destroyed. The script also tests performance using common benchmarking tools (e.g. AS-SSD and ATTO) to ensure that every drive meets the spec.
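The run-in pass described above can be sketched roughly as follows. This is a hypothetical illustration, not OCZ's script: the `SimulatedDrive` class, the random byte patterns, treating each LBA as one block, and the threshold value are all assumptions made for the example. A defective block is modeled as one that corrupts whatever is written to it, so every read-back mismatch is recorded as a bad block.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SimulatedDrive:
    """Hypothetical in-memory drive; each LBA is treated as one block (a simplification)."""
    num_lbas: int
    defective: set[int] = field(default_factory=set)
    _cells: dict[int, int] = field(default_factory=dict)

    def write(self, lba: int, value: int) -> None:
        # A defective block flips every bit of the byte written to it.
        self._cells[lba] = value ^ 0xFF if lba in self.defective else value

    def read(self, lba: int) -> int:
        return self._cells[lba]


def run_in(drive: SimulatedDrive, passes: int = 8, seed: int = 0) -> set[int]:
    """Write, then read back, every LBA `passes` times; return the LBAs that failed."""
    rng = random.Random(seed)
    bad: set[int] = set()
    for _ in range(passes):
        # A fresh random pattern each pass makes it unlikely that a block
        # passes verification just by returning stale data.
        pattern = [rng.randrange(256) for _ in range(drive.num_lbas)]
        for lba, value in enumerate(pattern):
            drive.write(lba, value)
        for lba, value in enumerate(pattern):
            if drive.read(lba) != value:
                bad.add(lba)
    return bad


def disposition(bad_blocks: set[int], threshold: int) -> str:
    """Pass the drive, or pull it for rework/scrap if bad blocks exceed the threshold."""
    return "pass" if len(bad_blocks) <= threshold else "pull"


if __name__ == "__main__":
    drive = SimulatedDrive(num_lbas=1024, defective={3, 500, 777})
    bad = run_in(drive)
    print(sorted(bad), disposition(bad, threshold=2))  # prints: [3, 500, 777] pull
```

Using a new pattern on every pass is a design choice of this sketch; writing the same fixed pattern eight times would exercise the write and read paths just as often but would be less likely to catch a cell that merely retains an old value.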
Currently OCZ has two different test setups. Half of the test systems are regular PCs that are very similar to the firmware download systems, whereas the other half are the custom racks pictured above. OCZ is looking to move all testing to rack-based cabins, since one cabin can test 256 drives simultaneously, which is far more efficient than having dozens of PC setups around that can each test only a handful of drives at a time. The test regime is the same in both cases, so it's purely a matter of space and labor efficiency.
At the moment, SATA-based drives are tested through the host, which means that the IO commands are sent by the host, similar to how we test SSDs. For PCIe drives, however, OCZ is developing a Manufacturing Self Test (MST), which is essentially custom firmware loaded into the drive that reads and writes all LBAs to test for bad blocks. The benefit of MST is that it bypasses the host interface (i.e. all IO commands are generated by the controller/firmware), making the test cycle faster because the host overhead is removed.
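To see why taking the host out of the loop shortens the test cycle, here is a deliberately simple cost model. It is a sketch under assumptions, not measured data: each IO is assumed to carry a fixed host-side overhead that is not overlapped with the data transfer (real hosts hide part of it with command queuing), and the IO size, media speed, and overhead figures in the example are hypothetical. MST is modeled simply as the same pass with zero per-IO host overhead.

```python
def pass_time_s(capacity_bytes: int, io_size_bytes: int,
                media_mb_s: float, per_io_overhead_us: float) -> float:
    """Estimated seconds for one full-LBA pass (write or read) under a serial cost model."""
    num_ios = -(-capacity_bytes // io_size_bytes)        # ceiling division: number of commands
    transfer_s = capacity_bytes / (media_mb_s * 1e6)     # time spent actually moving data
    overhead_s = num_ios * per_io_overhead_us * 1e-6     # host cost per command, assumed not overlapped
    return transfer_s + overhead_s


if __name__ == "__main__":
    capacity = 240 * 10**9   # a 240GB drive
    io_size = 128 * 1024     # hypothetical 128KiB transfers
    host_driven = pass_time_s(capacity, io_size, media_mb_s=450, per_io_overhead_us=50)
    self_test = pass_time_s(capacity, io_size, media_mb_s=450, per_io_overhead_us=0)
    print(f"host-driven: {host_driven:.0f} s, MST-style: {self_test:.0f} s")
```

In this model the saving grows with the number of commands, so it matters most for small transfer sizes and for tests that repeat the full span several times.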
Additionally, every month a sample of finished drives goes through more rigorous testing called Ongoing Reliability Testing (ORT) to ensure that production quality hasn't changed. ORT consists of a Thermal Cycle Test (TCT), in which drives are subjected to thermal shocks to validate the quality of manufacturing, and a Reliability Demonstration Test (RDT), in which drives are run at an elevated temperature (~70°C) to demonstrate that the mean time between failures (MTBF) meets the specification.
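OCZ doesn't say how its RDT is sized, but a common textbook approach to an MTBF demonstration combines a zero-failure test plan (assuming exponentially distributed failure times, so the required device-hours are T = −MTBF·ln(1 − confidence)) with an Arrhenius acceleration factor for the elevated temperature. The sketch below shows that arithmetic; the activation energy, use temperature, MTBF target, confidence level, and sample size are hypothetical inputs, not OCZ's figures.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_af(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Arrhenius acceleration factor between a use and a stress temperature (both in °C)."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))


def zero_failure_device_hours(mtbf_hours: float, confidence: float) -> float:
    """Equivalent device-hours needed to demonstrate `mtbf_hours` at `confidence`
    with zero failures, assuming a constant failure rate (exponential lifetimes)."""
    return -mtbf_hours * math.log(1.0 - confidence)


def stress_hours_per_drive(mtbf_hours: float, confidence: float,
                           drives: int, af: float) -> float:
    """Hours each drive must run at the stress temperature, sharing the load evenly."""
    return zero_failure_device_hours(mtbf_hours, confidence) / (drives * af)


if __name__ == "__main__":
    af = arrhenius_af(ea_ev=0.7, t_use_c=40.0, t_stress_c=70.0)  # hypothetical Ea and use temperature
    hours = stress_hours_per_drive(mtbf_hours=1_000_000, confidence=0.6, drives=100, af=af)
    print(f"AF = {af:.2f}; each of 100 drives runs about {hours:.0f} h at 70°C with no failures")
```

The elevated temperature is what makes such a test practical: without acceleration, demonstrating a seven-digit MTBF would take far more drive-hours than a monthly sample can supply.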
The run-in testing hasn't changed much since Toshiba took over, but Toshiba did help OCZ align with its quality standards. All of the processes running today have been inspected by Toshiba and meet the company's strict standards. Note that the purpose of run-in testing isn't to screen for firmware bugs but to ensure that the hardware is functional. Firmware development and validation are done before mass production begins, and since Toshiba took over, OCZ has modified its development process to increase the quality and reliability of its products.
OCZ's whole philosophy has actually changed since the previous CEO left the company. In the past, OCZ always tried to be first to market at any cost and to cover every possible micro-niche, which resulted in too many product lines for the resources OCZ had. Nowadays OCZ is putting a lot of effort into product qualification, and it no longer has a dozen products in development at the same time, meaning that there are now sufficient resources to properly validate every product before it enters mass production.
The run-in testing may seem light with only eight full LBA read/write spans, but honestly I don't think it's necessary to hammer a drive for days, because any hardware flaw should surface very quickly. Basically, the hardware either works or it doesn't, and once the drive leaves the factory it's more likely to fail due to a firmware anomaly than a physical hardware failure.
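For a sense of scale, the arithmetic behind eight full spans is easy to work out. As a hypothetical sketch, assuming a 240GB drive and illustrative sustained speeds of 400MB/s write and 450MB/s read (not OCZ's specs, and ignoring the benchmark runs), eight write-plus-read passes amount to 1.92TB written and the same amount read back, in roughly two and a half hours.

```python
def run_in_totals(capacity_gb: float, passes: int,
                  write_mb_s: float, read_mb_s: float) -> tuple[float, float]:
    """Return (terabytes written, hours) for `passes` full write-then-read spans."""
    capacity_mb = capacity_gb * 1000                # decimal units, as drive capacities are quoted
    written_tb = capacity_gb * passes / 1000        # the same volume is also read back
    seconds_per_pass = capacity_mb / write_mb_s + capacity_mb / read_mb_s
    return written_tb, passes * seconds_per_pass / 3600


if __name__ == "__main__":
    tb, hours = run_in_totals(240, passes=8, write_mb_s=400, read_mb_s=450)
    print(f"{tb:.2f} TB written (and read back) in about {hours:.1f} h")  # 1.92 TB, ~2.5 h
```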