What's Necessary to Get Full Performance Out of a Solid State Drive?

The storage hardware in the new consoles opens up new possibilities, but on its own the hardware cannot revolutionize gaming. Implementing the new features enabled by fast SSDs still requires software work from the console vendors and game developers. Extracting full performance from a high-end NVMe SSD requires a different approach to IO than methods that work for hard drives and optical discs.

We don't have any next-gen console SSDs to play with yet, but based on the few specifications released so far we can make some pretty solid projections about their performance characteristics. First and foremost, hitting the advertised sequential read speeds will require keeping the SSDs busy with a lot of requests for data. Consider one of the fastest SSDs we've ever tested, the Samsung PM1725a enterprise SSD. It's capable of reaching over 6.2 GB/s when performing sequential reads in 128kB chunks. But asking for those chunks one at a time only gets us 680 MB/s. This drive requires a queue depth of at least 16 to hit 5 GB/s, and at least QD32 to hit 6 GB/s. Newer SSDs with faster flash memory may not require queue depths that are quite so high, but games will definitely need to make more than a few requests at a time to keep the console SSDs busy.

The consoles cannot afford to waste too much CPU power on communicating with the SSDs, so they need a way for just one or two threads to manage all of the IO requests and still have CPU time left over for those cores to do something useful with the data. That means the consoles will have to be programmed using asynchronous IO APIs, where a thread issues a read request to the operating system (or IO coprocessor) but goes back to work while the request is being processed. And the thread will have to check back later to see if a request has been fulfilled. In the hard drive days, such a thread would go off and do several non-storage tasks while waiting for a read operation to complete. Now, that thread will have to spend that time issuing several more requests.
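
As a rough illustration of that issue-then-poll pattern (not any console's actual API), here is a minimal sketch using plain POSIX asynchronous IO; the file name, chunk size, and queue depth are arbitrary choices for the example:

```c
// Minimal sketch of the issue-then-poll pattern using POSIX AIO.
// The consoles' real IO APIs are not public; this only illustrates how
// one thread can keep many requests in flight while still having time
// for other work. Link with -lrt on Linux.
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define QUEUE_DEPTH 32
#define CHUNK_SIZE  (128 * 1024)

int main(void) {
    int fd = open("assets.bin", O_RDONLY);   // example file name
    if (fd < 0) { perror("open"); return 1; }

    struct aiocb cbs[QUEUE_DEPTH];
    char *buffers[QUEUE_DEPTH];

    // Issue QUEUE_DEPTH reads without waiting for any of them.
    for (int i = 0; i < QUEUE_DEPTH; i++) {
        buffers[i] = malloc(CHUNK_SIZE);
        memset(&cbs[i], 0, sizeof(cbs[i]));
        cbs[i].aio_fildes = fd;
        cbs[i].aio_buf    = buffers[i];
        cbs[i].aio_nbytes = CHUNK_SIZE;
        cbs[i].aio_offset = (off_t)i * CHUNK_SIZE;
        aio_read(&cbs[i]);
    }

    // The thread is free to do other work here, then check back later.
    int done = 0;
    while (done < QUEUE_DEPTH) {
        for (int i = 0; i < QUEUE_DEPTH; i++) {
            if (cbs[i].aio_fildes != -1 && aio_error(&cbs[i]) != EINPROGRESS) {
                ssize_t n = aio_return(&cbs[i]);
                printf("chunk %d: %zd bytes\n", i, n);
                cbs[i].aio_fildes = -1;   // mark this request as handled
                done++;
            }
        }
        // ...do useful, non-storage work between polls...
    }
    return 0;
}
```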

In addition to keeping queue depths up, obtaining full speed from the SSDs will require doing IO in relatively large chunks. Trying to hit 5.5 GB/s with 4kB requests would require handling about 1.4M IOs per second, which would strain several parts of the system with overhead. Fortunately, games tend to naturally deal with larger chunks of data, so this requirement isn't too much trouble; it mainly means that many traditional measures of SSD performance are irrelevant.
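
The arithmetic behind that figure is simple; the short sketch below just divides the 5.5 GB/s target mentioned above by a few request sizes to show how quickly the required command rate falls as requests get larger:

```c
// Back-of-the-envelope request rates needed to sustain 5.5 GB/s at
// various request sizes (illustration only).
#include <stdio.h>

int main(void) {
    const double target_bytes_per_sec = 5.5e9;
    const double sizes_kib[] = {4, 64, 128, 1024};

    for (int i = 0; i < 4; i++) {
        double bytes = sizes_kib[i] * 1024.0;
        printf("%6.0f KiB requests: ~%.0fk IOs per second\n",
               sizes_kib[i], target_bytes_per_sec / bytes / 1000.0);
    }
    return 0;
}
// 4 KiB -> ~1343k IOPS, but 128 KiB -> only ~42k IOPS
```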

Microsoft has said very little about the software side of the Xbox Series X storage stack beyond announcing a new API called DirectStorage. There's no public description yet of how it works or how it differs from existing or previous console storage APIs, but it is designed to be more efficient:

DirectStorage can reduce the CPU overhead for these I/O operations from multiple cores to taking just a small fraction of a single core.

The most interesting bit about DirectStorage is that Microsoft plans to bring it to Windows, so the new API cannot rely on any custom hardware and has to be something that works on top of a regular NTFS filesystem. Based on our experiences testing fast SSDs under Windows, the operating system could certainly use a lower-overhead storage API, and it would be applicable to far more than just video games.

Sony's storage API design is probably intertwined with their IO coprocessors, but it's unlikely that game developers will have to be specifically aware that their IO requests are being offloaded. Mark Cerny has stated that games can bypass normal file IO, a point he elaborated on in an interview with Digital Foundry:

There's low level and high level access and game-makers can choose whichever flavour they want - but it's the new I/O API that allows developers to tap into the extreme speed of the new hardware. The concept of filenames and paths is gone in favour of an ID-based system which tells the system exactly where to find the data they need as quickly as possible. Developers simply need to specify the ID, the start location and end location and a few milliseconds later, the data is delivered. Two command lists are sent to the hardware - one with the list of IDs, the other centring on memory allocation and deallocation - i.e. making sure that the memory is freed up for the new data.

Getting rid of filenames and paths doesn't win much performance on its own, especially since the system still has to support a hierarchical filesystem API for the sake of older code. The real savings come from being able to specify the whole IO procedure in a single step, instead of the application having to manage individual stages like decompression and relocating the data in memory, both of which are handled by special-purpose hardware on the PS5.
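
Purely as a hypothetical sketch (Sony has not published its API, and every name and field below is invented for illustration), the two command lists Cerny describes might look something like this:

```c
// Hypothetical sketch of an ID-based, single-step IO request, loosely
// following Cerny's description. Sony's real API is not public; every
// name and field here is invented for illustration only.
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t asset_id;      // an ID replaces the filename and path
    uint64_t start;         // start location within the asset
    uint64_t end;           // end location within the asset
    void    *destination;   // where the decompressed data should land
} AssetReadCmd;

typedef struct {
    void  *address;         // memory to free up for the incoming data
    size_t length;
} MemoryFreeCmd;

// One submission hands the hardware both command lists: the reads to
// perform and the memory regions to reclaim. Decompression and placement
// in memory are handled by the IO hardware, not the application.
// (Stub body so the sketch compiles; a real implementation would hand
// these lists off to the IO coprocessors.)
void submit_io_batch(const AssetReadCmd *reads, size_t n_reads,
                     const MemoryFreeCmd *frees, size_t n_frees) {
    (void)reads; (void)n_reads; (void)frees; (void)n_frees;
}
```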

For a more public example of what a modern high-performance storage API can accomplish, it's worth looking at the io_uring asynchronous API added to Linux last year. We used it on our last round of enterprise SSD reviews to get much better throughput and latency out of the fastest drives available. Where old-school Unix-style synchronous IO topped out at a bit less than 600k IOPS on our 36-core server, io_uring allowed a single core to hit 400k IOPS. Even compared to the previous asynchronous IO APIs in Linux, io_uring has lower overhead and better scalability. The API's design has applications communicating with the operating system in much the same way that the operating system communicates with NVMe SSDs: pairs of command submission and completion queues accessible by both parties. Large batches of IO commands can be submitted with at most one system call, and no system calls are needed to check for command completion. That's a big advantage in a post-Spectre world where system call overhead is much higher. Recent experimentation has even shown that the io_uring design allows shader programs running on a GPU to submit IO requests with minimal CPU involvement.
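
For readers who want to see what this looks like in practice, here is a minimal liburing example on Linux: a batch of reads is queued, submitted with a single system call, and the completions are then reaped from the completion queue. The file name, queue depth, and chunk size are arbitrary:

```c
// Minimal io_uring read batch using liburing (Linux 5.1+), showing the
// submission/completion queue pattern described above.
// Build with: gcc demo.c -luring
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <stdlib.h>

#define QUEUE_DEPTH 32
#define CHUNK_SIZE  (128 * 1024)

int main(void) {
    struct io_uring ring;
    if (io_uring_queue_init(QUEUE_DEPTH, &ring, 0) < 0) return 1;

    int fd = open("assets.bin", O_RDONLY);   // example file name
    if (fd < 0) { perror("open"); return 1; }

    // Queue up a full batch of reads, then submit them all at once.
    for (int i = 0; i < QUEUE_DEPTH; i++) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        void *buf = malloc(CHUNK_SIZE);
        io_uring_prep_read(sqe, fd, buf, CHUNK_SIZE, (off_t)i * CHUNK_SIZE);
        io_uring_sqe_set_data(sqe, buf);
    }
    io_uring_submit(&ring);   // one system call for the whole batch

    // Reap completions; io_uring_peek_cqe could be used to avoid blocking.
    for (int i = 0; i < QUEUE_DEPTH; i++) {
        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);
        printf("read returned %d bytes\n", cqe->res);
        free(io_uring_cqe_get_data(cqe));
        io_uring_cqe_seen(&ring, cqe);
    }

    io_uring_queue_exit(&ring);
    return 0;
}
```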

Most of the work relating to io_uring on Linux is too recent to have influenced console development, but it still illustrates a general direction that the industry is moving toward, driven by the same needs to make good use of NVMe performance without wasting too much CPU time.

Keeping Latency Under Control

While game developers will need to put some effort into extracting full performance from the console SSDs, there is a competing goal. Pushing an SSD to its performance limits causes latency to increase significantly, especially if queue depths go above what is necessary to saturate the drive. This extra latency doesn't matter if the console is just showing a loading screen, but next-generation games will want to keep the game running interactively while streaming in large quantities of data. Sony has outlined their plan for dealing with this challenge: their SSD implements a custom feature to support 6 priority levels for IO commands, allowing large amounts of data to be loaded without getting in the way when a more urgent read request crops up. Sony didn't explain much of the reasoning behind this feature or how it works, but it's easy to see why they need something to prioritize IO.


Loading a new world in 2.25 seconds as Ratchet & Clank fall through an inter-dimensional rift

Mark Cerny gave a hypothetical example of when multiple priority levels are needed: when a player is moving into a new area, lots of new textures may need to be loaded, at several GB per second. But since the game isn't interrupted by a loading screen, stuff keeps happening, and an in-game event (e.g. a character getting shot) may require data like a new sound effect to be loaded. The request for that sound effect will be issued after the requests for several GB of textures, but it needs to be completed before all the texture loading is done, because stuttering sound is much more noticeable and distracting than a slight delay in the gradual loading of fresh texture data.

But the NVMe standard already includes a prioritization feature, so why did Sony develop their own? Sony's SSD will support 6 priority levels, and Mark Cerny claims that the NVMe standard only supports "2 true priority levels". A quick glance at the NVMe spec shows that it's not that simple:

The NVMe spec defines two different command arbitration schemes for determining which queue will supply the next command to be handled by the drive. The default is a simple round-robin balancing that treats all IO queues equally and leaves all prioritization up to the host system. Drives can also optionally implement the weighted round robin scheme, which provides four priority levels (not counting one reserved for admin commands). But the detail Sony is apparently concerned with is that among those four priority levels, only the "urgent" class is given strict priority over the other levels. Strict prioritization is the simplest form of prioritization to implement, but it is a poor choice for general-purpose systems because lower-priority requests can be starved indefinitely. In a closed, specialized system like a game console it's much easier to coordinate all the software that's doing IO in order to avoid deadlocks and starvation, and much of the IO done by a game console also comes with natural timing requirements.
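
To make the distinction concrete, here is a toy model of that arbitration scheme: urgent commands are drained with strict priority, while the remaining classes share the drive in proportion to their weights. The weights and queue contents below are invented for illustration; in real hardware the host programs the weights:

```c
// Toy model of NVMe-style weighted round robin arbitration: "urgent"
// commands are always drained first (strict priority), then the high,
// medium, and low classes are served in proportion to their weights.
#include <stdio.h>

enum { URGENT, HIGH, MEDIUM, LOW, NUM_CLASSES };

int pending[NUM_CLASSES] = {2, 40, 40, 40};   // queued commands per class
int weight[NUM_CLASSES]  = {0, 8, 4, 1};      // urgent ignores weights

int main(void) {
    int served[NUM_CLASSES] = {0};

    for (int round = 0; round < 20; round++) {
        // Strict priority: drain all urgent commands first.
        while (pending[URGENT] > 0) { pending[URGENT]--; served[URGENT]++; }

        // Weighted round robin over the remaining classes.
        for (int c = HIGH; c < NUM_CLASSES; c++) {
            for (int burst = 0; burst < weight[c] && pending[c] > 0; burst++) {
                pending[c]--;
                served[c]++;
            }
        }
    }

    printf("served urgent=%d high=%d medium=%d low=%d\n",
           served[URGENT], served[HIGH], served[MEDIUM], served[LOW]);
    return 0;
}
```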

This much attention being given to command arbitration comes as a bit of a surprise. The conventional wisdom about NVMe SSDs is that they are usually so fast that IO prioritization is unnecessary, and wasting CPU time on re-ordering IO commands is just as likely to reduce overall performance. In the PC and server space, the NVMe WRR command arbitration feature has been largely ignored by drive manufacturers and OS vendors—a partial survey of our consumer NVMe SSD collection only turned up two brands that have enabled this feature on their drives. So when it comes to supporting third-party SSD upgrades, Sony cannot depend on using the WRR command arbitration feature. This might mean they also won't bother to use it even when a drive has this feature, instead relying entirely on their own mechanism managed by the CPU and IO coprocessors.

Sony says that because off-the-shelf NVMe drives lack six priority levels, third-party drives will need slightly higher raw performance to match the real-world performance of Sony's own drive: the six priority levels will have to be emulated on the host side, using some combination of CPU and IO coprocessor work. Based on our observations of enterprise SSDs (which are designed with more of a focus on QoS than consumer SSDs), holding 15-20% of performance in reserve typically keeps latency plenty low (about 2x the latency of an idle SSD) without any other prioritization mechanism, so we project that drives capable of about 6.5GB/s or more, which puts the PS5's 5.5GB/s at roughly 85% of the drive's capability, should have no trouble at all.

Latency spikes as drives get close to their throughput limit

It's still a bit of a mystery what Sony plans to do with so many priority levels. We can certainly imagine a hierarchy of several priority levels for different kinds of data: Perhaps game code is the highest priority to load, since at least one thread of execution will be completely stalled while handling a page fault, so this data is needed as fast as possible (and ideally should be kept in RAM full-time rather than loaded on the fly). Texture pre-fetching is probably the lowest priority, especially fetching higher-resolution mipmaps when a lower-resolution version is already in RAM and usable in the interim. Geometry may be a higher priority than textures, because it may be needed for collision detection and textures are useless without geometry to apply them to. Sound effects should ideally be loaded with latency of at most a few tens of milliseconds. Sony's patent mentions giving higher priority to IO done using the new API, on the theory that such code is more likely to be performance-critical.
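
Purely as an illustration of how such a hierarchy might be encoded (none of this comes from Sony), the categories above could map onto six levels roughly like this:

```c
// Hypothetical mapping of asset types to six IO priority levels.
// Sony has not published how its levels are actually assigned; this
// simply orders the categories discussed above from most to least
// latency-sensitive.
typedef enum {
    IO_PRIO_PAGE_FAULT = 0,   // code/data a stalled thread is waiting on
    IO_PRIO_AUDIO      = 1,   // sound effects; stutter is very noticeable
    IO_PRIO_GEOMETRY   = 2,   // may be needed for collision detection
    IO_PRIO_TEXTURE    = 3,   // visible pop-in, but tolerable briefly
    IO_PRIO_PREFETCH   = 4,   // speculative loads, e.g. higher-res mipmaps
    IO_PRIO_BACKGROUND = 5    // bulk streaming with no hard deadline
} IoPriority;
```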

Planning out six priority classes of data for a game engine isn't too difficult, but that doesn't mean it will actually be useful to break things down that way when interacting with actual hardware. Recall that the whole point of prioritization and other QoS methods is to avoid excess latency. Excess latency happens when you give the SSD more requests than it can work on simultaneously; some of the requests have to sit in the command queue(s) waiting their turn. If a lot of commands are queued up, a new command added at the back of the line will have a long wait: a backlog of a thousand 128kB requests, for example, represents more than 20ms of work at 5.5GB/s. If a game sends the PS5 SSD new requests at a rate totaling more than 5.5GB/s, a backlog will build up and latency will keep growing until the game stops requesting data more quickly than the SSD can deliver it. When the game is requesting data at well below 5.5GB/s, every new read command sent to the SSD will start being processed almost immediately.

So what matters most is limiting the number of requests that can pile up in the SSD's queues, and once that problem is solved, there's not much need for further prioritization. A single throttled queue for all the background, latency-insensitive IO commands should suffice, and then everything else can be handled with low latency.
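
A minimal sketch of that idea, assuming a game engine that routes bulk streaming through its own software queue (the cap of eight in-flight requests is an arbitrary illustrative value):

```c
// Sketch of the simpler approach described above: cap how many
// background (latency-insensitive) requests may be in flight at once,
// so latency-sensitive reads never sit behind a deep backlog.
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_BACKGROUND_INFLIGHT 8   // illustrative value, not tuned

static atomic_int background_inflight = 0;

// Returns true if the caller may submit another background read now;
// otherwise the request should stay in a software queue for later.
bool try_submit_background_read(void) {
    int current = atomic_load(&background_inflight);
    while (current < MAX_BACKGROUND_INFLIGHT) {
        if (atomic_compare_exchange_weak(&background_inflight, &current,
                                         current + 1))
            return true;   // slot reserved; go ahead and submit
    }
    return false;          // SSD already has enough background work queued
}

// Called from the completion path for each finished background read.
void background_read_completed(void) {
    atomic_fetch_sub(&background_inflight, 1);
}
```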

Closing Thoughts

The transition of console gaming to solid state storage will change the landscape of video game design and development. A dam is breaking, and game developers will soon be free to ignore the limitations of hard drives and start exploring the possibilities of fast storage. It may take a while for games to fully utilize the performance of the new console SSDs, but there will be many tangible improvements available at launch.

The effects of this transition will also spill over into the PC gaming market, exerting pressure to finally push hard drives out of low-end gaming PCs and allowing gamers with high-end PCs to start enjoying more performance from their heretofore underutilized fast SSDs. And changes to the Windows operating system itself are already underway because of these new consoles.

Ultimately, it will be interesting to see whether the novel parts of the new console storage subsystems end up being a real advantage that influences the direction of PC hardware development, or if they end up just being interesting quirks that get left in the dust as PC hardware eventually overtakes the consoles with superior raw performance. NVMe SSDs arrived at the high end of the consumer market five years ago. Now, they're crossing a tipping point and are well on the way to becoming the mainstream standard for storage.

Comments

  • eddman - Monday, June 15, 2020 - link

    Yes, I added the CPU in the paths simply because the data goes through the CPU complex, but not necessarily through the cores.

    "Data coming in from the SSD can be forwarded .... to the GPU (P2P DMA)"

    You mean the data does not go through system RAM? The CPU still has to process the I/O related operations, right?

    It seems nvidia has tackled this issue with a proprietary solution for their workstation products:
    https://developer.nvidia.com/gpudirect
    https://devblogs.nvidia.com/gpudirect-storage/

    They talk about the data path between GPU and storage.

    "The standard path between GPU memory and NVMe drives uses a bounce buffer in system memory that hangs off of the CPU.

    GPU DMA engines cannot target storage. Storage DMA engines cannot target GPU memory through the file system without GPUDirect Storage.

    DMA engines, however, need to be programmed by a driver on the CPU."

    Maybe MS' DirectStorage is similar to nvidia's solution.
  • Oxford Guy - Monday, June 15, 2020 - link

    "Consoles" are nothing more than artificial walled software gardens that exist because of consumer stupidity.

    They offer absolutely nothing the PC platform can't offer, via Linux + Vulkan + OpenGL.

    Period.
  • Oxford Guy - Monday, June 15, 2020 - link

    "but also going a step beyond the PC market to get the most benefit out of solid state storage."

    In order to justify their existence. Too bad it doesn't justify it.

    It's more console smoke and mirrors. People fall for it, though.
  • Oxford Guy - Monday, June 15, 2020 - link

    Consoles made sense when personal computer hardware was too expensive for just playing games, for most consumers.

    Back in the day, when real consoles existed, even computer expansion modules didn't take off. Why? Cost. Those "consoles" were really personal computers. All they needed was a keyboard, writable storage, etc. But, people didn't upgrade ANY console to a computer in large numbers. Even the NES had an expansion port on the bottom that sat unused. Lots of companies had wishful thinking about turning a console into a PC and some of them used that in marketing and were sued vaporware/inadequateanddelayedware (Intellivision).

    Just the cost of adding a real keyboard was more than consumers were willing to pay. Even inexpensive personal computers (PCs!) had chicklet keyboards, like the Atari 400. That thing cost a lot to build because of the stricter EMI emissions standards of its time but Atari used a chicklet keyboard anyway to save money. Sinclair also used them. Many inexpensive "home" computers that had full-travel keyboards were so mushy they were terrible to use. Early home PCs like the VideoBrain highlight just how much companies tried to cut corners just on the keyboard.

    Then, there is the writable storage. Cassettes were too slow and were extremely unreliable. Floppy drives were way too expensive for most PC consumers until the Apple II (where Wozniak developed a software controller to reduce cost a great deal vs. a mechanical one). They remained too expensive for gaming boxes, with the small exception of the shoddy Famicom FDS in Japan.

    All of these problems were solved a long time ago. Writable storage is dirt cheap. Keyboards are dirt cheap. Full-quality graphics display hardware is dirt cheap (as opposed to the true console days when a computer with more pixels/characters would cost a bundle and "consoles" would have much less resolution).

    The only thing remaining is the question: "Is the PC software ecosystem good enough". The answer was a firm no when it was Windows + DirectX. Now that we have Vulkan, though, there is no need for DirectX. Developers can use either the low-latency lower-level Vulkan or the high-level OpenGL, depending upon their needs for specific titles. Consumers and companies don't have to pay the Microsoft tax because Linux is viable.

    There literally is no credible justification for the existence of non-handheld "consoles" anymore. There hasn't been for some time now. The hardware is the same. In the old days a console would have much less RAM memory, due to cost. It would have much lower resolution, typically, due to cost. It wouldn't have high storage capacity, due to cost.

    All of that is moot. There is NOT ONE IOTA of difference between today's "console" and a PC. The walled software garden can evaporate. All it takes is Dorothy to use her bucket of water instead of continuing to drink the Kool-Aid.
  • Oxford Guy - Monday, June 15, 2020 - link

    Back in the day:

    A console had:

    much lower-resolution graphics, designed for TV sets at low cost
    much less RAM
    no floppy drive
    no keyboard
    no hard disk

    A quality personal computer had:

    more RAM, plus expansion (except for Jobs perversities like the original Mac)
    80 column character-based or, later, high-resolution bitmapped monitor graphics
    (there were some home PCs that used televisions but had things like disk drives)
    floppy drive support
    hard disk support (except, again, for the first Mac, which was a bad joke)
    a full-travel full-size non-mushy keyboard
    expansion slots (typically — not the first Mac!)
    an operating system and first-party software (both of which cost)
    thick paperbook manuals
    typically, a more powerful CPU (although not always)

    Today:

    A console has:

    Nothing a PC doesn’t have except for a stupid walled software garden.

    A PC has:

    Everything a console has except for the ludicrous walled software garden, a thing that offers no added value for consumers — quite the opposite.
  • Oxford Guy - Monday, June 15, 2020 - link

    The common claim that "consoles" of today offer more simplicity is a lie, too.

    In the true console days, you'd stick a cartridge in, turn on the power, and press start.

    Today, just as with the "PC" (really the same thing) — you have a complex operating system that needs to be patched relentlessly. You have games that have to be patched relentlessly. You have microtransactions. You have log-ins/accounts and software stores. Literally, you have games on disc that you can't even play until you patch the software to be compatible with the latest OS DRM. Developers also helpfully use that as an opportunity to drastically change gameplay (as with PS3 Topspin) and you have no choice in the matter. Remember, it's always an "upgrade".

    The hardware is identical. Even the controllers, once one of the few advantages of consoles (except for some, like the Atari 5200, which were boneheaded), are the same. They use the same USB ports and such. There is no difference. Even if there were, the rise of Chinese manufacturing and the Internet means you could get a cheap and effective adapter with minimal fuss.

    You want fast storage so badly? You can get it on the PC. You want software that is honed to be fast and efficient? Easily done. It's all x86 stuff.

    Give me justified elaborate custom chips (not frivolous garbage like Apple's T2), truly novel form factors that are needed for special gameplay, and things like that and then, maybe, you might be able to sell to people on the higher end of the Bell curve.

    If I were writing an article on consoles I'd use a headline something like this: "Consoles of 2020: The SSD Speed Gimmick — Betting on the Bell Curve"

    It would be bad enough if there were only one extra stupid walled garden (beyond Windows + DirectX). But to have three is even more irksome.
  • edzieba - Monday, June 15, 2020 - link

    "partially resident textures"

    Megatexturing is back!

    "The most interesting bit about DirectStorage is that Microsoft plans to bring it to Windows, so the new API cannot be relying on any custom hardware and it has to be something that would work on top of a regular NTFS filesystem. "

    The latter does not imply the former. API support just means that the API calls will not fail. It doesn't mean they will be as fast as a system using dedicated hardware to handle those calls. Just like with DXR: you can easily support DXR calls on a GPU without dedicated BVH traversal hardware, they'll just be as slow as unaccelerated raytracing has always been.
    Soft API support for DirectStorage makes sense to aid in Microsoft's quest for 'cross play' between PC and XboX. If the same API calls can be used for both, developers are more likely to put work into implementing DirectStorage. As long as DirectStorage doesn't have too large a penalty when used on PC without dedicated hardware, the reduction in dev overhead is attractive.
  • eddman - Monday, June 15, 2020 - link

    "The latter does not imply the former. API support just means that the API calls will not fail. It doesn't mean they will be as fast as a system using dedicated hardware to handle those calls."

    True, but apparently nvidia's GPUDirect Storage, which enables direct transfer between GPU and storage, is a software only solution and doesn't require specialized hardware.

    If that's the case, then there's a good chance MS' DirectStorage is a software solution too.

    AFA I can tell, the custom I/O chips in XSX and PS5 are used for compressing the assets to increase the bandwidth, not enable direct GPU-to-storage access.

    We'll know soon enough.
  • ichaya - Monday, June 15, 2020 - link

    You have to ask: What is causing low FPS for current gen games? I think loading textures are by far the largest culprit, and even in cases where it's only a few levels or a few sections of a few levels, it does affect the overall immersion and playability of games where all of this storage tech should help.
  • Oxford Guy - Monday, June 15, 2020 - link

    I love how people forget how there is fast storage available on the "PC" (in quotes because, except for the Switch, these Sony/MS "consoles" are PCs with smoke and mirrors trickery to disguise that fact — the fact that all they are are stupidity taxes).

    Yes, stupidity taxes. That's exactly what "consoles" are, except for the Switch, which has a form factor that differs from PC.
