Sequential Read Performance

Our first test of sequential read performance uses short bursts of 128MB, issued as 128kB operations with no queuing. The test averages performance across eight bursts for a total of 1GB of data transferred from a drive containing 16GB of data. Between each burst the drive is given enough idle time to keep the overall duty cycle at 20%.
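
For readers who want to see the shape of this workload spelled out, below is a minimal sketch of how such a burst test could be reproduced with fio driven from Python. The device path, the choice of fio, and the exact option values are assumptions for illustration, not the article's actual test harness.

```python
# Rough approximation (not the review's actual harness) of the burst
# sequential read test: eight 128MB bursts of 128kB reads at QD1, with
# enough idle time after each burst to hold a ~20% duty cycle.
import subprocess
import time

DEVICE = "/dev/nvme0n1"   # assumed test target holding 16GB of data
BURSTS = 8
DUTY_CYCLE = 0.20

for i in range(BURSTS):
    start = time.monotonic()
    subprocess.run([
        "fio", "--name=burst-seq-read",
        f"--filename={DEVICE}",
        "--rw=read", "--bs=128k", "--iodepth=1",
        "--ioengine=libaio", "--direct=1",
        "--io_size=128m",             # 128MB per burst, 1GB in total
        f"--offset={i * 128}m",       # step through the data sequentially
    ], check=True)
    busy = time.monotonic() - start
    # Idle so that busy / (busy + idle) works out to the 20% duty cycle.
    time.sleep(busy * (1 - DUTY_CYCLE) / DUTY_CYCLE)
```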

Burst 128kB Sequential Read (Queue Depth 1)

The burst sequential read speed of the Toshiba XG6 is slightly slower than the XG5, and still middle of the road for NVMe drives. The top drives are approaching twice the QD1 performance of the XG6, so this is probably where Toshiba needs to focus the most on improving.

Our test of sustained sequential reads uses queue depths from 1 to 32, with the performance and power scores computed as the average of QD1, QD2 and QD4. Each queue depth is tested for up to one minute or 32GB transferred, from a drive containing 64GB of data. This test is run twice: once with the drive prepared by sequentially writing the test data, and again after the random write test has mixed things up, causing fragmentation inside the SSD that isn't visible to the OS. These two scores represent the two extremes of how the drive would perform under real-world usage, where wear leveling and modifications to some existing data will create some internal fragmentation that degrades performance, but usually not to the extent shown here.
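
A rough sketch of how such a queue depth sweep and its headline number could be assembled is shown below; again, the tooling (fio plus Python) and option values are assumptions for illustration. The fragmented-drive pass would simply repeat the same sweep after the random write preconditioning.

```python
# Rough approximation (not the review's actual harness) of the sustained
# sequential read sweep: each queue depth runs for up to one minute or
# 32GB, and the headline score averages only the QD1, QD2 and QD4 results.
import json
import subprocess

DEVICE = "/dev/nvme0n1"           # assumed test target holding 64GB of data
QUEUE_DEPTHS = [1, 2, 4, 8, 16, 32]
results_mib_s = {}

for qd in QUEUE_DEPTHS:
    out = subprocess.run([
        "fio", f"--name=seq-read-qd{qd}",
        f"--filename={DEVICE}",
        "--rw=read", "--bs=128k", f"--iodepth={qd}",
        "--ioengine=libaio", "--direct=1",
        "--io_size=32g", "--runtime=60",    # stop at whichever limit hits first
        "--output-format=json",
    ], check=True, capture_output=True, text=True)
    bw_kib_s = json.loads(out.stdout)["jobs"][0]["read"]["bw"]   # KiB/s
    results_mib_s[qd] = bw_kib_s / 1024

# The reported score weights only the low queue depths.
score = sum(results_mib_s[qd] for qd in (1, 2, 4)) / 3
print(f"Sustained sequential read score: {score:.0f} MiB/s")
```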

Sustained 128kB Sequential Read

The longer sequential read test, including moderately higher queue depths, puts the XG6 in a much better light, with a clear improvement over the XG5 and scores that trail only Samsung and Silicon Motion.

Sustained 128kB Sequential Read (Power Efficiency in MB/s/W, Average Power in W)

The power efficiency of the Toshiba XG6 during sequential reads is unmatched by any other current drive using flash memory. It delivers 15% better performance per Watt than the SM2262EN, which is tied for the highest absolute performance.
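
For reference, the efficiency metric plotted in these charts is simply sustained throughput divided by the average power drawn during the test; the small helper below (the function name is mine) just makes the MB/s/W unit explicit.

```python
def efficiency_mb_s_per_watt(throughput_mb_s: float, avg_power_w: float) -> float:
    """Power efficiency as charted here: sustained throughput in MB/s
    divided by the average power draw in Watts over the same test."""
    return throughput_mb_s / avg_power_w
```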

The Toshiba XG5 wasn't quite able to deliver its maximum sequential read speed at QD4, but the XG6 is saturated by then and delivers more than 3GB/s. The Silicon Motion controllers scale up in performance soonest, with the HP EX920 delivering full sequential read speed at QD2 while some high-end drives don't saturate until QD16.

Sequential Write Performance

Our test of sequential write burst performance is structured identically to the sequential read burst performance test save for the direction of the data transfer. Each burst writes 128MB as 128kB operations issued at QD1, for a total of 1GB of data written to a drive containing 16GB of data.

Burst 128kB Sequential Write (Queue Depth 1)

The burst sequential write speed of the Toshiba XG6 is another slight regression relative to the XG5, but it doesn't change its standing all that much. The Samsung 970 EVO and the upcoming high-end controllers from Silicon Motion and Phison offer substantially better QD1 performance, but the XG6 is more or less tied with most of the current high-end drives like the WD Black and the HP EX920.

Our test of sustained sequential writes is structured identically to our sustained sequential read test, save for the direction of the data transfers. Queue depths range from 1 to 32 and each queue depth is tested for up to one minute or 32GB, followed by up to one minute of idle time for the drive to cool off and perform garbage collection. The test is confined to a 64GB span of the drive.
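
As with the read sweep, a minimal sketch of one pass through the write test is given below, again assuming fio and Python rather than the review's own tooling; the 64GB confinement and the idle window between queue depths are the parts that differ from the read version.

```python
# Rough approximation (not the review's actual harness) of the sustained
# sequential write sweep: up to one minute or 32GB of 128kB writes per
# queue depth, confined to a 64GB span, with idle time afterwards so the
# drive can cool off and perform garbage collection.
import subprocess
import time

DEVICE = "/dev/nvme0n1"   # assumed test target

def write_step(qd: int) -> None:
    subprocess.run([
        "fio", f"--name=seq-write-qd{qd}",
        f"--filename={DEVICE}",
        "--rw=write", "--bs=128k", f"--iodepth={qd}",
        "--ioengine=libaio", "--direct=1",
        "--size=64g",                       # confine the test to a 64GB span
        "--io_size=32g", "--runtime=60",    # stop at whichever limit hits first
    ], check=True)
    time.sleep(60)   # idle window for SLC cache flushing / garbage collection

for qd in [1, 2, 4, 8, 16, 32]:
    write_step(qd)
```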

Sustained 128kB Sequential Write

The sustained sequential write speed of the Toshiba XG6 is a substantial improvement over the XG5 and puts the XG6 very close to the top of the charts with clearly better performance than any of the BiCS3-based drives.

Sustained 128kB Sequential Write (Power Efficiency in MB/s/W, Average Power in W)

The Toshiba XG5 was still holding on to its lead in power efficiency (among flash-based SSDs), and the XG6 runs up the score by another 5% by delivering much higher performance while still drawing less power than any competing high-end NVMe drive.

The sequential write performance of the XG6 shows some variability due to the SLC cache filling up and requiring some background work that doesn't fit within the idle time this test provides, but performance doesn't drop as much or as often as for the Phison E12, and the average at higher queue depths is still competitive with the drives that offer steadier write speeds.

Comments

  • Valantar - Friday, September 7, 2018 - link

    AFAIK they're very careful which patches are applied to test beds, and if they affect performance, older drives are retested to account for this. Benchmarks like this are never really applicable outside of the system they're tested in, but the system is designed to provide a level playing field and repeatable results. That's really the best you can hope for. Unless the test bed has a consistent >10% performance deficit to most other systems out there, there's no reason to change it unless it's becoming outdated in other significant areas.
  • iwod - Thursday, September 6, 2018 - link

    So we are limited by the PCIe interface again. Since the birth of the SSD, we pushed past SATA 3Gbps / 6Gbps, then PCI-E 2.0 x4 at 2GB/s, and now PCI-E 3.0 at 4GB/s.

    When are we going to get PCI-E 4.0? Or, since 5.0 is only just around the corner, we may as well wait for it. That is 16GB/s, plenty of room for SSD makers to figure out how to get there.
  • MrSpadge - Thursday, September 6, 2018 - link

    There's no need to rush there. If you need higher performance, use multiple drives. Maybe on a HEDT or Enterprise platform if you need extreme performance.

    But don't be surprised if that doesn't help your PC as much as you thought. The ultimate limit currently is a RAMdisk. Launch a game from there or install some software - it's still surprisingly slow, because the CPU becomes the bottleneck. And that already applies to modern SSDs, which is obvious in benchmarks that test copying, installing, application launching, etc.
  • abufrejoval - Friday, September 7, 2018 - link

    Could also be the OS or the RAMdisk driver. When I finished building my 128GB 18-Core system with a FusionIO 2.4 TB leftover and 10Gbit Ethernet, I obviously wanted to bench it on Windows and Linux. I was rather shocked to see how slow things generally remained and how pretty much all these 36 HT-"CPU"s were just yawning.

    In the end I never found out if it was the last free version (3.4.8) of SoftPerfect's RAM disk that didn't seem to make use of all four Xeon E5 memory channels, or some bottleneck in Windows (never seen Windows Update use more than a single core), but I never got anywhere near the 70GB/s Johan had me dream of (https://www.anandtech.com/show/8423/intel-xeon-e5-...). Don't think I even saturated the 10Gbase-T network, if I recall correctly.

    It was quite different in many cases on Linux, but I do remember running an entire Oracle database on tmpfs once, and then an OLTP benchmark on that... again earning myself a totally bored system under the most intensive benchmark hammering I could orchestrate.

    There are so many serialization points in all parts of that stack, you never really get the performance you pay for until someone has gone all the way and rewritten the entire software stack from scratch for parallel and in-memory.

    Latency is the killer for performance in storage, not bandwidth. You can saturate all bandwidth capacities with HDDs, even tape. Thing is, with dozens (modern CPUs) or thousands (modern GPGPUs) of cores, SSDs *become tape*, because of the latencies incurred on non-linear access patterns.

    That's why after NVMe, NV-DIMMs or true non-volatile RAM is becoming so important. You might argue that a cache line read from main memory still looks like a tape library change against the register file of an xPU, but it's still way better than PCIe-5-10 with a kernel based block layer abstraction could ever be.

    Linear speed and loops are dead: If you cannot unroll, you'll have to crawl.
  • halcyon - Monday, September 10, 2018 - link

    Thank you for writing this.
  • Quantum Mechanix - Monday, September 10, 2018 - link

    Awesome write-up - my favorite kind of comment, where I walk away just a *tiny* bit less ignorant. Thank you! :)
  • DanNeely - Thursday, September 6, 2018 - link

    We've been 3.0 x4 bottlenecked for a few years.

    From what I've read about implementing 4.0/5.0 on a mobo, I'm not convinced we'll see them on consumer boards, at least not in their current form. The maximum PCB trace length without expensive boosters is too short; AIUI 4.0 is marginal to the top PCIe slot/chipset, and 5.0 would need signal boosters even to go that far. Estimates I've seen were $50-100 (I think for an x16 slot) to make a 4.0 slot and several times that for 5.0. Cables can apparently go several times longer than PCB traces while maintaining signal quality, but I'm skeptical about them getting snaked around consumer mobos.

    And as MrSpadge pointed out, in many applications scaling out wider is an option, and from what I've read that's what enterprise storage is looking at. Instead of x4 slots that have 2/4x the bandwidth of current ones, that market is more interested in 5.0 x1 connections that have the same bandwidth as current devices but would allow them to connect four times as many drives. That seems plausible to me, since enterprise drive firmware is generally tuned for steady-state performance rather than bursts, and most of those drives don't come as close to saturating buses as high-end consumer drives do for shorter/more intermittent workloads.
  • abufrejoval - Friday, September 7, 2018 - link

    I guess that's why they are working on silicon photonics: PCB voltage levels, densities, layers, trace lengths... Wherever you look there are walls of physics rising into mountains. If only PCBs weren't so much cheaper than silicon interposers, photonics and other new and rare things!
  • darwiniandude - Sunday, September 9, 2018 - link

    Any testing under Windows on current MacBook Pro hardware? Those SSDs, I would've thought, are much, much faster, but I'd love to see the same test on them.
  • halcyon - Monday, September 10, 2018 - link

    Thanks for the review. For the future, could you consider segregating the drives into different tiers based on results, e.g. video editing, DB, generic OS/boot/app drive, compilation, whatnot?

    Now it seems that one drive is better in one thing, and another drive in another scenario. But not having your in-depth knowledge makes it harder to assess which drive would be closest to optimal in which scenario.
