Power Management

Idle power management for SSDs can be surprisingly complicated, especially for NVMe drives. But it is also vitally important for any battery-powered system. Real-world client storage workloads leave SSDs idle most of the time, so idle behavior is a big factor in how battery-friendly a drive is. Power draw when idle isn't the only thing that matters; how quickly a drive can enter or wake up from a low-power state can have a big impact on how effective its power management is.

For SATA SSDs, the host system doesn't have a lot of say in how the drive manages power. Using the SATA Aggressive Link Power Management (ALPM) feature to mostly power down the SATA connection is usually sufficient to put a drive to sleep. But the lowest-power sleep state supported by SATA devices (DevSleep) requires extra signalling on a pin that's part of the SATA power connector. This means that DevSleep is in practice only supported on laptops, and our desktop testbeds cannot use or measure this sleep state.

NVMe includes numerous power management and thermal management features. Most of them are optional in the NVMe spec, but there's a common subset supported by most consumer SSDs. NVMe drives can support many different power states, including multiple active and multiple inactive power states. The drive's firmware reports its capabilities to the host system:

Samsung 980 PRO NVMe Power States
Controller: Samsung Elpis
Firmware: 1B2QGXA7

Power State   Maximum Power   Active/Idle   Entry Latency   Exit Latency
PS 0          8.49 W          Active        -               -
PS 1          4.48 W          Active        -               0.2 ms
PS 2          3.18 W          Active        -               1.0 ms
PS 3          40 mW           Idle          2.0 ms          1.2 ms
PS 4          5 mW            Idle          0.5 ms          9.5 ms
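Each row of that table comes from a 32-byte Power State Descriptor in the drive's Identify Controller data. As a rough illustration, the Python sketch below decodes the first 12 bytes of one descriptor following the field layout in the NVMe specification: a 16-bit maximum power value, a flags byte whose MXPS bit selects 0.01 W or 0.0001 W units and whose NOPS bit marks a non-operational state (which is likely what the "Idle" column reflects), then entry and exit latencies in microseconds. The raw bytes used for PS 4 are a hypothetical encoding reconstructed from the table, not a dump from the actual drive, and the class and function names are our own.

```python
import struct
from dataclasses import dataclass

# Flag bits in byte 3 of an NVMe Power State Descriptor.
MXPS = 0x01  # Max Power Scale: 1 = units of 0.0001 W, 0 = units of 0.01 W
NOPS = 0x02  # Non-Operational State: the drive cannot process I/O here


@dataclass
class PowerStateDescriptor:
    max_power_w: float
    non_operational: bool
    entry_latency_us: int
    exit_latency_us: int


def decode_psd(raw: bytes) -> PowerStateDescriptor:
    """Decode the first 12 bytes of a 32-byte descriptor (little-endian)."""
    max_power, _reserved, flags, enlat, exlat = struct.unpack_from("<HBBII", raw)
    unit_w = 0.0001 if flags & MXPS else 0.01
    return PowerStateDescriptor(
        max_power_w=max_power * unit_w,
        non_operational=bool(flags & NOPS),
        entry_latency_us=enlat,
        exit_latency_us=exlat,
    )


# Hypothetical encoding of the PS 4 row: 50 x 0.0001 W = 5 mW, 0.5 ms / 9.5 ms.
ps4_raw = struct.pack("<HBBII", 50, 0, MXPS | NOPS, 500, 9500).ljust(32, b"\0")
ps4 = decode_psd(ps4_raw)
print(f"{ps4.max_power_w * 1000:.1f} mW, idle={ps4.non_operational}, "
      f"exit {ps4.exit_latency_us / 1000} ms")
```

The 0.0001 W scale matters for states like PS 4: 5 mW cannot be expressed as a whole number of 0.01 W units.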


When a drive and the host OS both support the Autonomous Power State Transition (APST) feature in NVMe 1.1 or later, the host system can give the drive a set of rules for how long it should wait while idle before dropping down to a lower-power state. Operating systems choose these delays based on the power state entry and exit latencies claimed by the drive, along with other configuration information about the system's overall tolerance for increased disk access times.
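To make the APST idea concrete, here is a Python sketch modeled on the heuristic in older Linux kernels' nvme_configure_apst() (later kernel versions may choose timeouts differently, so treat this as illustrative). Walking from the deepest power state up to PS 0, each state is told to drop to the nearest deeper non-operational state after an idle time of roughly 50 times that target state's combined entry and exit latency. Any state whose exit latency exceeds the host's tolerance (100,000 µs by default, set via the nvme_core.default_ps_max_latency_us parameter) is skipped. The class and function names are our own.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (target power state, idle time in ms), or None if the state never transitions.
ApstEntry = Optional[Tuple[int, int]]


@dataclass
class PowerState:
    non_operational: bool  # True for the "Idle" states in the table
    entry_latency_us: int
    exit_latency_us: int


def build_apst_table(states: List[PowerState],
                     max_latency_us: int = 100_000) -> List[ApstEntry]:
    """Return one APST entry per power state, indexed by power state number."""
    table: List[ApstEntry] = [None] * len(states)
    target: ApstEntry = None
    # Walk from the deepest (highest-numbered) state to PS 0, so each state
    # points at the nearest deeper idle state that passed the checks below.
    for index in range(len(states) - 1, -1, -1):
        table[index] = target
        ps = states[index]
        if not ps.non_operational:
            continue  # only non-operational states are useful APST targets
        if ps.exit_latency_us > max_latency_us:
            continue  # waking from this state would exceed the host's tolerance
        total_us = ps.entry_latency_us + ps.exit_latency_us
        # Round up total_us / 20 ms (about 50x the total latency); 24-bit field.
        idle_ms = min((total_us + 19) // 20, (1 << 24) - 1)
        target = (index, idle_ms)
    return table


# Samsung 980 PRO values from the table above ("-" entries treated as 0).
samsung_980_pro = [
    PowerState(False, 0, 0),
    PowerState(False, 0, 200),
    PowerState(False, 0, 1000),
    PowerState(True, 2000, 1200),
    PowerState(True, 500, 9500),
]
print(build_apst_table(samsung_980_pro))
```

With the default tolerance this yields [(3, 160), (3, 160), (3, 160), (4, 500), None]: PS 0 through PS 2 drop to PS 3 after 160 ms of idle time, and PS 3 drops to PS 4 after a further 500 ms. Lowering max_latency_us to 5000 removes PS 4 as a target because of its 9.5 ms exit latency.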

One common problem with the NVMe APST feature is that the NVMe spec doesn't really say anything about how APST interacts with PCIe Active State Power Management (ASPM). SSD vendors tend to make assumptions that, e.g., a system which configures the drive to use its deepest idle state will fully support PCIe ASPM. Most of the time, things work out, but it's also possible to end up with a drive that goes to sleep and never wakes up, or a drive that falls back to its highest power state if anything goes wrong when it tries to go to sleep.

Using our Coffee Lake testbed that has fully functional PCIe power management, we test SSD power in three states. Active idle is when the drive is not using any externally-configurable power management features: SATA or PCIe link power management is disabled, and NVMe APST is off. We're now using a more reliable and broadly-compatible method for disabling APST through the Linux kernel rather than directly poking the drive's registers. This means that some drives will probably end up showing higher active idle power draw than we have previously measured.

Even though there are many combinations of power management settings and power states that can be used with a typical consumer NVMe SSD, we condense them down to just two low-power configurations to test. What we call "Desktop Idle" uses the features that are almost always available and working on desktop platforms, even if they're off by default: SATA ALPM, NVMe APST, and PCIe ASPM.

Next, we have the "Laptop Idle" state, with all the power-saving features fully enabled. For SATA SSDs, this would include DevSleep, so we can't fairly measure the Laptop Idle power draw of SATA SSDs. For NVMe SSDs, this adds PCIe L1 substates on top of the Desktop Idle settings.
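On Linux, the three configurations map fairly naturally onto a handful of standard sysfs settings. The sketch below is a hypothetical Python helper, not our actual test harness. It assumes the ALPM policy files under /sys/class/scsi_host, the per-controller pm_qos_latency_tolerance_us file that the kernel's NVMe driver uses as its APST latency tolerance (a value of 0 turns APST off), and the global pcie_aspm policy, where powersupersave additionally enables L1 substates. Writing these files needs root, and the ASPM policy write can be refused if the firmware did not hand ASPM control to the OS. The sysfs_root parameter exists only so the helper can be exercised against a fake directory tree.

```python
import glob
import os

ALPM = "class/scsi_host/host*/link_power_management_policy"
NVME_LATENCY = "class/nvme/nvme*/power/pm_qos_latency_tolerance_us"
ASPM = "module/pcie_aspm/parameters/policy"

# Hypothetical mapping of the three test states onto Linux sysfs settings.
PROFILES = {
    "active_idle": {
        ALPM: "max_performance",  # SATA link power management off
        NVME_LATENCY: "0",  # a tolerance of 0 disables APST
        ASPM: "performance",  # ASPM off
    },
    "desktop_idle": {
        ALPM: "med_power_with_dipm",
        NVME_LATENCY: "100000",  # kernel default tolerance, in microseconds
        ASPM: "powersave",  # L0s/L1, no L1 substates
    },
    "laptop_idle": {
        ALPM: "med_power_with_dipm",  # DevSleep is not covered here
        NVME_LATENCY: "100000",
        ASPM: "powersupersave",  # powersave plus L1 substates
    },
}


def apply_profile(name: str, sysfs_root: str = "/sys") -> list:
    """Write each setting of a profile to every matching file; return paths written."""
    written = []
    for pattern, value in PROFILES[name].items():
        for path in sorted(glob.glob(os.path.join(sysfs_root, pattern))):
            with open(path, "w") as f:
                f.write(value)
            written.append(path)
    return written
```

Keeping the profiles as plain data makes it easy to confirm that Laptop Idle differs from Desktop Idle only in the ASPM policy for NVMe drives.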

Idle Power Consumption - No PM / Desktop / Laptop

Accurately measuring the time it takes for a drive to enter a low-power state is tricky, but measuring the time taken to wake up is straightforward. We run a synthetic test that performs a single 4kB random read once every 10 seconds. When power management features are disabled and the drive stays in its active idle state, the random read latency will be determined mainly by the speed of the NAND flash. When the drive is in the Desktop Idle or Laptop Idle state, it will go to sleep between each random read, so we can repeatedly sample the time taken to wake up and perform a random read. The difference between this time and the random read latency from the drive in the active idle state is due almost entirely to the overhead of waking up the drive from a sleep state, and this difference is what we report as a drive's wake-up latency.
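The measurement itself is simple to sketch. The Python below is a hypothetical illustration, not the tool behind our results, and its function names are our own. It waits for an idle interval (10 seconds by default, as described above), then times one 4kB read at a random aligned offset, using O_DIRECT on Linux so the page cache cannot satisfy the read; latencies are reported in microseconds. Reading a raw device such as /dev/nvme0n1 requires root. Reducing each run to a single number with the mean is our assumption; a median or percentile would work the same way.

```python
import mmap
import os
import random
import statistics
import time

BLOCK = 4096  # 4kB random reads


def sample_read_latencies(path: str, samples: int, interval_s: float = 10.0,
                          direct: bool = True) -> list:
    """Sleep, then time one random 4kB read; repeat. Returns latencies in microseconds."""
    # O_DIRECT (Linux) bypasses the page cache so every read reaches the drive.
    flags = os.O_RDONLY | (os.O_DIRECT if direct else 0)
    fd = os.open(path, flags)
    buf = mmap.mmap(-1, BLOCK)  # anonymous mapping: page-aligned, as O_DIRECT needs
    try:
        n_blocks = os.lseek(fd, 0, os.SEEK_END) // BLOCK  # also works on block devices
        if n_blocks == 0:
            raise ValueError(f"{path} is smaller than {BLOCK} bytes")
        latencies = []
        for _ in range(samples):
            time.sleep(interval_s)  # give the drive time to enter its idle state
            offset = random.randrange(n_blocks) * BLOCK
            start = time.perf_counter_ns()
            os.preadv(fd, [buf], offset)
            latencies.append((time.perf_counter_ns() - start) / 1000)
        return latencies
    finally:
        buf.close()
        os.close(fd)


def wake_up_latency_us(sleep_samples: list, active_samples: list) -> float:
    """Mean extra latency with power management on, versus active idle."""
    return statistics.mean(sleep_samples) - statistics.mean(active_samples)
```

Running the sampler once with power management disabled and once in the Desktop Idle or Laptop Idle configuration, then passing both lists to wake_up_latency_us(), gives the wake-up latency as defined above.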

Idle Wake-Up Latency


Conclusions

In this article we hope we've given you some insight into how much goes into testing a modern solid state storage drive: much more than just running CrystalDiskMark and finding peak sequential speeds! The new suite is not only more in-depth, but we've also streamlined it somewhat for automation, enabling fewer sleepless nights as deadlines loom on the horizon (or, put another way, more reviews to come). We're keen to take on additional feedback about the testing, so please leave a comment below.

Comments

  • drmaddogs - Saturday, June 19, 2021

    Random is measured by Chaos measures. Turing had it best. And AI mimics this like the human brain.
  • pexxie - Friday, February 12, 2021

    I was hoping to hear more from the linux fundi. :-(
    I guess criticism is easy, guidance takes effort. :-P
  • pexxie - Saturday, February 13, 2021

    An alternative to this might be a retention or volatility test. So basically hook the SSD up in a way that you can quickly yank out its sata or power cable. Then copy a very big file to it, and immediately after Windows says the copy is done; yank out the data or power cable. Then reboot and do a checksum on the file on the target SSD, and compare to the original, and see if any of them have actually written all the data.
  • pexxie - Saturday, February 13, 2021

    I wish we could edit posts. Grrr.
    Otherwise if it's an M.2 slot, hit the reset button on the PC immediately after Windows says the file has finished copying. Then compare checksums.
  • pexxie - Saturday, February 13, 2021

    So basically testing power loss resiliency. Over there in the 1st world, power reliability is of no concern, but it's a big concern here in the 3rd world. Power ain't reliable like in America.
  • pexxie - Saturday, February 13, 2021

    You can observe the disk's misconduct with the disk LED on your chassis. The disk LED should stop when the file copy is done, but it doesn't, so it still takes time for it to get it onto non-volatile storage. So the data is still floating around in volatile memory while that LED is still on. I have 4 SSDs; for one of them the LED only stays on like a second after the file copy is "done." The others take 5-ish seconds. They all fail in a power cut test: killing power immediately after the OS says the copy is "done." Checked in Windows and Linux. I suspected this was misconduct by Windows, but since I see it in Linux too, I'm more confident about it being disk misconduct.
  • pexxie - Saturday, February 13, 2021

    My bad. Actually this LED thing was because of buffered writes by the OS. Using xcopy in Windows with the /J parameter avoids this "misconduct." So it is actually the OS behaving badly. Now to just figure out how to force all writes to be unbuffered....
    Even using unbuffered writing, my SSDs still fail my power cut test: parts of the file sit in volatile memory for too long after the copy is "done" and the file gets corrupted on the destination disk.
  • pexxie - Sunday, February 14, 2021

    Woohoo! Finally solved this by mounting partitions in Linux using the "sync" option. I knew TLC chips were insanely slow, but damn, less than 1MB/s sequential writing is madness. At least I'm getting 10MB/s sequential with my old MLC chips. So it was the doing of the OS all along. Multiple layers of caching make a tortoise storage medium look like a rabbit.

    Won't add any more posts/spam. Just wish I could consolidate into 1.
  • kpb321 - Monday, February 1, 2021

    How is the AMD Ryzen 5 3600X system being run without a GPU? That chip doesn't have integrated video so generally I expect it would fail at POST with beep codes. AFAIK none of the AMD APUs have PCI-E 4 support so I don't think there is a way to use integrated video and support PCI-E 4. I mean it doesn't need much of a video card and the 580 in the other system is probably overkill for storage testing, but it seems like it would need something even if it's installed in one of the PCI-E 3 lanes hanging off the chipset instead of the 4.0 lanes off the CPU.
  • frbeckenbauer - Monday, February 1, 2021

    You can run Ryzen headless without issues on many motherboards, while some will indeed refuse to boot. MSI apparently provides a BIOS that has the error disabled so it works headless if you ask them.
