One constant in the storage business is that capacity per drive keeps increasing. Spinning hard-disk drives are closing in on 20 TB, while solid-state drives range from 4 TB to 16 TB, or even more if you are willing to entertain an exotic implementation. Today at the Data Centre World conference in London, I was quite surprised to hear that, for reasons of risk management, we're unlikely to see much demand for drives over 16 TB.

Speaking with a few individuals at the show about expanding capacities, I heard that storage customers who need high density are starting to discuss maximum drive-size requirements based on their deployment needs. One message coming through is that drive size is itself a way of managing risk: a large-capacity drive allows for high density, but when a large drive fails, a lot of data is lost (or at least has to be rebuilt) in one go.

If we consider how data is used in the datacentre, there are several tiers based on how often the data is accessed. Long-term storage, known as cold storage, is accessed very infrequently and is typically served by mechanical hard drives offering long-term data retention. A large drive failure at this level might lose substantial archival data, or require long rebuild times. More regularly accessed storage, nearline or warm storage, is accessed frequently but often acts as a localised cache in front of the long-term tier. For this case, imagine Netflix storing a good amount of its back catalogue for users to access: the loss of a drive here requires falling back to colder storage, and the rebuild times come into play. For hot storage, the storage under constant read/write access, we are often dealing with DRAM or large database workloads with many operations per second. This is where a drive failure and rebuild can result in critical issues with server uptime and availability.

Ultimately the size of the drive and its failure rate determine the risk and potential downtime, and aside from engineering more reliable drives, the other variable for risk management is drive size. Based on the conversations I've had today, 16 TB seems to be that inflection point; no-one wants to lose 16 TB of data in one go, regardless of how often it is accessed, or how much failover capability the storage array has.

I was told that, sure, drives above 16 TB do exist in the market, but aside from niche applications (where that risk is an acceptable trade-off for higher density), volumes are low. This inflection point, one would imagine, is subject to change as the nature of data and data analytics changes over time. Samsung's PM983 NF1 drive tops out at 16 TB, and while Intel has shown samples of 8 TB units of its long-ruler E1.L form factor, it has listed future QLC-based drives up to 32 TB. Of course, 16 TB per drive puts no limit on the number of drives per system: we have seen 1U units with 36 of these drives in the past, and Intel has been promoting up to 1 PB in a 1U form factor. It is worth noting that the market for 8 TB SATA SSDs is relatively small; no-one wants to rebuild that large a drive at 500 MB/s, which would take a minimum of 4.44 hours, bringing server uptime down to 99.95% rather than the 99.999% metric (roughly 5.3 minutes of downtime per year).
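
To sanity-check that rebuild arithmetic, here is a minimal back-of-the-envelope sketch in Python. It assumes decimal units (1 TB = 1,000,000 MB), that the entire rebuild counts as downtime, and that one such rebuild happens per year; the 8 TB capacity and 500 MB/s rate are taken from the paragraph above, and everything else is an illustrative assumption rather than anything stated at the show.

    # Rebuild-time and uptime arithmetic for an 8 TB drive rebuilt at 500 MB/s.
    # Assumptions: decimal units (1 TB = 1e6 MB), the full rebuild counts as
    # downtime, and one such rebuild happens per year.
    capacity_tb = 8
    rebuild_mb_per_s = 500

    rebuild_hours = capacity_tb * 1_000_000 / rebuild_mb_per_s / 3600
    hours_per_year = 365.25 * 24

    uptime = 1 - rebuild_hours / hours_per_year
    five_nines_budget_min = hours_per_year * 60 * (1 - 0.99999)

    print(f"Rebuild time: {rebuild_hours:.2f} hours")           # ~4.44 hours
    print(f"Uptime with one rebuild per year: {uptime:.3%}")    # ~99.949%
    print(f"99.999% uptime allows ~{five_nines_budget_min:.1f} min of downtime per year")  # ~5.3 min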

Comments

  • zepi - Wednesday, March 13, 2019 - link

    I don't find small drives surprising at all. There are many many people who don't save anything locally. They rely solely on dropbox / onedrive / google drive and are happy that their files are available on every device they own.
  • Scott_T - Wednesday, March 13, 2019 - link

    definitely, every home user's computer I've worked on would be fine with a 256GB SSD (about $30 now!), and with everyone streaming video and music I don't see that changing.
  • erple2 - Wednesday, March 20, 2019 - link

    The only thing that this fails for is "gaming" - though if gaming services start streaming, then it's possible that large drives become even less useful.
  • CaedenV - Wednesday, March 13, 2019 - link

    yep... 128GB is a flash drive or SD card these days. I don't understand people who build with a 128GB SSD and a 2TB HDD and then expect users to figure out how to redirect folders, or otherwise use the larger space. They just fill up the SSD and then wonder why on earth they can't install the next game.
    I am a data hog on my NAS, but on my local system I feel like I am a fairly light user, and even I need 256GB of space at minimum just for windows, office, and a few games.
  • mitsuhashi - Wednesday, March 13, 2019 - link

    You're misusing B vs b!
    **triggered**
  • bloodgain - Wednesday, March 13, 2019 - link

    Baloney. It's a matter of cost, not capacity. If the price of SSDs drops significantly -- maybe by half, definitely by 75% -- then you simply switch to redundant storage and rebuild time becomes a non-issue, as there is no downtime for the rebuild, no matter how long it takes. If data centers could buy a 1 PB enterprise-class SSD for $25K, they'd order them by the pallet-load.
  • deil - Wednesday, March 13, 2019 - link

    Problem is, paying the price of 10x 4TB SSDs for a 16 TB one does not mean it will survive 10x longer, and you still need to RAID them. That size already goes into bulk storage territory, and that is dominated by cheaper HDDs.
    They don't say that 32TB SSDs have no reason to exist, they say it is stopping to be cost-effective. Speed is the reason for going to SSD, and you don't need it as big as cold storage; size does not mean faster access/copy.

    And nobody sane would lose 16 TB, as anyone who has 16 TB of important stuff RAIDs their storage.
    RAIDs can wait for replication, there is no need to recover swiftly -> HDDs win
  • TrevorH - Wednesday, March 13, 2019 - link

    > no-one wants to rebuild that large a drive at 500 MB/s, which would take a minimum of 4.44 hours, bringing server uptime down to 99.95% rather than the 99.999% metric

    Who rebuilds a drive while a server is down?
  • jordanclock - Wednesday, March 13, 2019 - link

    You rebuild on live production servers?
  • afidel - Wednesday, March 13, 2019 - link

    Absolutely. If you had to take down a server, or worse a SAN array, every time a drive died, no data center would ever get anything done. Mine was ~500 drives; with a 1.5% AFR that's 2 drives a month lost. You're not stopping operations 2x a month to wait on the RAID rebuild.
