AnandTech Storage Bench - The Destroyer

The Destroyer is an extremely long test replicating the access patterns of very IO-intensive desktop usage. A detailed breakdown can be found in this article. Like real-world usage, the drives do get the occasional break that allows for some background garbage collection and flushing caches, but those idle times are limited to 25ms so that it doesn't take all week to run the test. These AnandTech Storage Bench (ATSB) tests do not involve running the actual applications that generated the workloads, so the scores are relatively insensitive to changes in CPU performance and RAM from our new testbed, but the jump to a newer version of Windows and the newer storage drivers can have an impact.
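The idle-time truncation described above can be sketched in a few lines. This is an illustrative reconstruction, not AnandTech's actual replay tooling; the function name and trace format are assumptions:

```python
# Rebuild a trace timeline so that no idle gap between I/Os exceeds
# the 25 ms cap mentioned above. Timestamps are in integer milliseconds.
IDLE_CAP_MS = 25

def cap_idle_times(timestamps_ms, cap=IDLE_CAP_MS):
    """Return a compressed timeline where each inter-I/O gap is at most `cap`."""
    capped = [timestamps_ms[0]]
    for prev, cur in zip(timestamps_ms, timestamps_ms[1:]):
        capped.append(capped[-1] + min(cur - prev, cap))
    return capped

# A 2-second think time collapses to 25 ms; short gaps are preserved.
print(cap_idle_times([0, 10, 2010]))  # → [0, 10, 35]
```

This is why the replay finishes in hours rather than the days the original desktop usage spanned, while still giving the drive brief idle windows for garbage collection.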

We quantify performance on this test by reporting the drive's average data throughput, the average latency of the I/O operations, and the total energy used by the drive over the course of the test.
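As a rough illustration of how those three metrics fall out of a replay log (the record fields and function below are assumptions for the sketch, not the real ATSB log format):

```python
# Compute the three reported metrics from per-I/O records plus a
# measured average drive power. Field names are illustrative.
def summarize(ios, avg_power_w, wall_time_s):
    total_bytes = sum(io["bytes"] for io in ios)
    avg_latency_ms = sum(io["latency_ms"] for io in ios) / len(ios)
    data_rate_mbps = total_bytes / wall_time_s / 1e6  # MB/s over the whole run
    energy_j = avg_power_w * wall_time_s              # energy = power x time
    return data_rate_mbps, avg_latency_ms, energy_j

ios = [{"bytes": 4096, "latency_ms": 0.2}, {"bytes": 131072, "latency_ms": 0.5}]
print(summarize(ios, avg_power_w=2.0, wall_time_s=1.0))
```

Note that total energy, not average power, is what gets reported: a faster drive that draws more power can still win on energy by finishing the test sooner.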

ATSB - The Destroyer (Data Rate)

The Toshiba XG6 is slightly faster than the XG5 on The Destroyer. It still trails the fastest retail SSDs, but at twice the speed of a mainstream SATA drive it is well into high-end territory.

ATSB - The Destroyer (Average Latency)
ATSB - The Destroyer (99th Percentile Latency)

Average and 99th percentile latency have both improved for the XG6, bringing it even closer to the top of the charts and leaving only a small handful of drives that score better.

ATSB - The Destroyer (Average Read Latency)
ATSB - The Destroyer (Average Write Latency)

The average read latency for the XG6 is only slightly better than the XG5, which was the slowest drive in the high-end tier. For average write latency, the XG6 represents a much more substantial improvement that puts it ahead of almost every other TLC-based drive.

ATSB - The Destroyer (99th Percentile Read Latency)
ATSB - The Destroyer (99th Percentile Write Latency)

The XG5 already had very good QoS, with 99th percentile read and write latencies that were quite low. The XG6 improves on both counts, with the write latency improvement being the larger of the two.

ATSB - The Destroyer (Power)

The total energy usage of the XG6 over the course of The Destroyer is very slightly higher than what the XG5 required, but this tiny efficiency sacrifice is easily justified by the performance increases. Toshiba's XG series remains one of the few options for a high-performance NVMe SSD with power efficiency that is comparable to mainstream SATA drives.


32 Comments


  • Valantar - Friday, September 07, 2018 - link

    AFAIK they're very careful about which patches are applied to the test beds, and if any affect performance, older drives are retested to account for it. Benchmarks like this are never really applicable outside the system they're tested on, but the system is designed to provide a level playing field and repeatable results. That's really the best you can hope for. Unless the test bed has a consistent >10% performance deficit compared to most other systems out there, there's no reason to change it unless it's becoming outdated in other significant areas.
  • iwod - Thursday, September 06, 2018 - link

    So we are limited by the PCI-E interface again. Since the birth of the SSD, we pushed past SATA 3Gbps / 6Gbps, then PCI-E 2.0 x4 at 2 GB/s, and now PCI-E 3.0 at 4 GB/s.

    When are we going to get PCI-E 4.0? Or, since 5.0 is only just around the corner, we may as well wait for it. That is 16 GB/s, plenty of room for SSD makers to figure out how to get there.
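    The generational math in this comment works out as follows. A quick sketch with rounded per-lane figures (real-world throughput is somewhat lower after encoding and protocol overhead):

    ```python
    # Approximate per-direction PCIe bandwidth, in GB/s per lane,
    # for the generations mentioned in the thread (rounded figures).
    GBPS_PER_LANE = {2.0: 0.5, 3.0: 1.0, 4.0: 2.0, 5.0: 4.0}

    def link_bandwidth_gbps(gen, lanes):
        """Aggregate one-direction bandwidth of a PCIe link."""
        return GBPS_PER_LANE[gen] * lanes

    for gen in (2.0, 3.0, 4.0, 5.0):
        print(f"PCIe {gen} x4 ~ {link_bandwidth_gbps(gen, 4):g} GB/s")
    ```

    So a PCIe 5.0 x4 M.2 link would indeed offer roughly 16 GB/s, about four times the 3.0 x4 ceiling current NVMe drives run into.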
  • MrSpadge - Thursday, September 06, 2018 - link

    There's no need to rush there. If you need higher performance, use multiple drives, perhaps on an HEDT or enterprise platform if you need extreme performance.

    But don't be surprised if that doesn't help your PC as much as you thought. The ultimate limit currently is a RAMdisk. Launch a game from there or install some software: it's still surprisingly slow, because the CPU becomes the bottleneck. And that already applies to modern SSDs, which is obvious in benchmarks that test copying, installing, application launching, etc.
  • abufrejoval - Friday, September 07, 2018 - link

    Could also be the OS or the RAMdisk driver. When I finished building my 128GB 18-core system with a leftover 2.4 TB FusionIO card and 10Gbit Ethernet, I obviously wanted to bench it on Windows and Linux. I was rather shocked to see how slow things generally remained and how pretty much all of these 36 HT "CPUs" were just yawning.

    In the end I never found out whether it was the last free version (3.4.8) of SoftPerfect's RAM disk, which didn't seem to make use of all four Xeon E5 memory channels, or some bottleneck in Windows (I've never seen Windows Update use more than a single core), but I never got anywhere near the 70GB/s Johan had me dreaming of (https://www.anandtech.com/show/8423/intel-xeon-e5-... I don't think I even saturated the 10GBase-T network, if I recall correctly.

    It was quite different in many cases on Linux, but I do remember running an entire Oracle database on tmpfs once, and then an OLTP benchmark on that... again earning myself a totally bored system under the most intensive benchmark hammering I could orchestrate.

    There are so many serialization points in all parts of that stack, you never really get the performance you pay for until someone has gone all the way and rewritten the entire software stack from scratch for parallel and in-memory.

    Latency is the killer for performance in storage, not bandwidth. You can saturate all bandwidth capacities with HDDs, even tape. Thing is, with dozens (modern CPUs) or thousands (modern GPGPUs) of cores, SSDs *become tape*, because of the latencies incurred on non-linear access patterns.

    That's why, after NVMe, NV-DIMMs or true non-volatile RAM are becoming so important. You might argue that a cache line read from main memory still looks like a tape library change compared to the register file of an xPU, but it's still way better than PCIe-5-10 with a kernel-based block layer abstraction could ever be.

    Linear speed and loops are dead: If you cannot unroll, you'll have to crawl.
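    The latency argument in this comment can be put in numbers. A back-of-envelope sketch using Little's law, with illustrative figures rather than measured ones:

    ```python
    # Effective random-access throughput when every access pays full
    # latency: IOPS = queue_depth / latency, throughput = IOPS * block size.
    def random_throughput_mbps(block_kib, latency_us, queue_depth=1):
        """MB/s for latency-bound random access at a given queue depth."""
        ios_per_s = queue_depth * 1_000_000 / latency_us
        return ios_per_s * block_kib * 1024 / 1e6

    # 4 KiB reads at QD1: ~100 us for a NAND SSD vs ~10 ms for an HDD seek.
    print(random_throughput_mbps(4, 100))     # SSD: ~41 MB/s, far below its sequential rate
    print(random_throughput_mbps(4, 10_000))  # HDD: ~0.4 MB/s
    ```

    The sequential interface bandwidth barely matters here; cutting the per-access latency (or keeping queues deep) is what moves the number, which is the comment's point about non-volatile RAM.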
  • halcyon - Monday, September 10, 2018 - link

    Thank you for writing this.
  • Quantum Mechanix - Monday, September 10, 2018 - link

    Awesome write-up, my favorite kind of comment, where I walk away just a *tiny* bit less ignorant. Thank you! :)
  • DanNeely - Thursday, September 06, 2018 - link

    We've been 3.0 x4 bottlenecked for a few years.

    From what I've read about implementing 4.0/5.0 on a mobo, I'm not convinced we'll see them on consumer boards, at least not in their current form. The maximum PCB trace length without expensive boosters is too short; AIUI 4.0 is marginal out to the top PCIe slot/chipset, and 5.0 would need signal boosters even to go that far. Estimates I've seen were $50-100 (I think for an x16 slot) to make a 4.0 slot, and several times that for 5.0. Cables can apparently go several times longer than PCB traces while maintaining signal quality, but I'm skeptical about them getting snaked around consumer mobos.

    And as MrSpadge pointed out, in many applications scaling out wider is an option, and from what I've read that's what enterprise storage is looking at. Instead of x4 slots with 2-4x the bandwidth of current ones, that market is more interested in 5.0 x1 connections that have the same bandwidth as current devices but would allow them to connect four times as many drives. That seems plausible to me, since enterprise drive firmware is generally tuned for steady-state performance rather than bursts, and most of those drives don't come as close to saturating their buses as high-end consumer drives do on shorter/more intermittent workloads.
  • abufrejoval - Friday, September 07, 2018 - link

    I guess that's why they are working on silicon photonics: PCB voltage levels, densities, layers, trace lengths... wherever you look, there are walls of physics rising into mountains. If only PCBs weren't so much cheaper than silicon interposers, photonics, and other new and rare things!
  • npz - Friday, September 07, 2018 - link

    I don't see how this particular drive, outside of any burst I/O, is limited by PCIe 3.0 x4 at all. It's not even close to the top competitors in throughput.

    That said, higher-bandwidth options can use PCIe AIC slots rather than M.2 on the desktop, as there is at least one planned NVMe drive for x8. Yet the biggest bottleneck is small random I/O, including multi-threaded, and that is the one most felt by the end user.
  • darwiniandude - Sunday, September 09, 2018 - link

    Any testing under Windows on current MacBook Pro hardware? I would've thought those SSDs are much, much faster, but I'd love to see the same test run on them.
