In 2017, Toshiba was the first vendor to ship 64-layer 3D NAND in the consumer SSD market with their XG5 NVMe SSD. Now a little over a year later, the XG6 is the first SSD with 96-layer 3D NAND. The new generation of flash memory allows for better performance, improved power efficiency, and lower costs.

As the high-end tier of Toshiba's OEM SSD product line, the XG series is not officially available for retail purchase, but we think this one is pretty likely to be used as the starting point for a retail product. The last XG series drive with a retail counterpart was the XG3, the planar MLC-based sibling to the OCZ RD400. Toshiba's low-end NVMe BG series of single-chip BGA SSDs also got a retail version in the Toshiba RC100, mounted on an M.2 2242 card. Retail and OEM versions usually have some firmware differences and occasionally one or two significant hardware differences, such as 19nm vs 15nm MLC for the XG3 and RD400, or 30mm vs 42mm card length for the BG3 and RC100. Despite those differences, OEM SSDs are usually a pretty accurate preview of later retail versions.

OEM SSDs are usually not designed with maximum performance as the primary goal. OEMs prefer to have the option of sourcing more than one SSD for use in each of their systems, and they aren't interested in paying a large premium for some of their drives to be substantially faster than the rest. That said, for certain systems OEMs do want a true high-end drive, and the bar for that gets higher with every generation. Late last year, Toshiba introduced a higher performing XG5-P variant with a focus on better performance for more intensive workloads and benchmarks that exercise the full drive capacity with lots of random access. The XG6 is only intended to directly replace the XG5, but in light of the performance increases it brings, the 1TB XG5-P is now obsolete. The 2TB XG5-P may stick around for a while longer simply because the XG6 is not available in capacities above 1TB.

The new 96-layer BiCS4 3D TLC NAND used by the Toshiba XG6 is the most advanced flash memory currently shipping, but relative to the 64-layer BiCS3 that currently makes up most of the NAND volume from Toshiba and SanDisk, it is more of an incremental update than a revolutionary change. The increased layer count improves density, but the TLC die capacities are still 256Gb and 512Gb. The I/O interface has been upgraded to the Toggle NAND 3.0 standard, with speeds in the 667-800MT/s range compared to the 400-533MT/s speeds used by earlier 3D NAND from Toshiba. The speed bump brings Toshiba's NAND up to par with its current competition, but it will soon be eclipsed by the 1.4GT/s Toggle 4.0 interface that Samsung's upcoming 96L V-NAND will be using. (Though it remains to be seen whether such a big increase in interface speed will have much effect on overall drive performance when drives will still be limited to PCIe 3.0 x4 speeds for another generation or two.) The NAND interface voltage has also dropped from 1.8V to 1.2V, so the higher I/O speed shouldn't have much impact on power efficiency.
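
To put those interface speeds in perspective, here is a back-of-the-envelope comparison of aggregate NAND interface bandwidth against the PCIe 3.0 x4 host link. It is only a sketch under stated assumptions: an 8-bit Toggle NAND bus (one byte per transfer), the 8 channels of the XG6's controller, and PCIe overhead limited to 128b/130b line encoding, with packet and protocol overhead ignored. The function names are ours, for illustration only.

```python
# Rough sketch: raw NAND interface bandwidth vs. the PCIe 3.0 x4 host link.
# Assumptions (not measured figures): 8-bit NAND bus, 8 channels,
# PCIe overhead limited to 128b/130b encoding.

def nand_bandwidth_mb_s(mt_per_s: float, channels: int = 8, bus_bytes: int = 1) -> float:
    """Aggregate raw NAND interface bandwidth in MB/s."""
    return mt_per_s * bus_bytes * channels


def pcie3_bandwidth_mb_s(lanes: int = 4) -> float:
    """PCIe 3.0 bandwidth in MB/s after 128b/130b encoding."""
    gt_per_s = 8.0  # PCIe 3.0 signalling rate per lane
    bits_per_s = gt_per_s * 1e9 * 128 / 130
    return bits_per_s / 8 / 1e6 * lanes


if __name__ == "__main__":
    host = pcie3_bandwidth_mb_s()
    for label, rate in [("533 MT/s", 533), ("800 MT/s", 800), ("1400 MT/s", 1400)]:
        nand = nand_bandwidth_mb_s(rate)
        print(f"{label}: {nand:.0f} MB/s NAND vs {host:.0f} MB/s host ({nand / host:.2f}x)")
```

Under these assumptions, eight channels at 533MT/s already offer more raw interface bandwidth than the host link can carry, which fits the doubt expressed above about how much a 1.4GT/s interface will matter while drives remain on PCIe 3.0 x4.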

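Toshiba OEM NVMe SSD Comparison

| Model | XG6 | XG5 | XG5-P | BG3 | XG3 |
|---|---|---|---|---|---|
| Retail Counterpart | None | None | None | RC100 | RD400 |
| Capacities | 256GB, 512GB, 1024GB | 256GB, 512GB, 1024GB | 1TB, 2TB | 128GB, 256GB, 512GB | 128GB, 256GB, 512GB, 1024GB |
| Form Factor | M.2 2280 | M.2 2280 | M.2 2280 | M.2 2230 | M.2 2280 |
| Host Interface | PCIe 3.1 x4 | PCIe 3.1 x4 | PCIe 3.1 x4 | PCIe 3.1 x2 | PCIe 3.1 x4 |
| Protocol | NVMe 1.3a | NVMe 1.2.1 | NVMe 1.2.1 | NVMe 1.2.1 | NVMe 1.1b |
| NAND Flash | Toshiba 96L BiCS4 3D TLC | Toshiba 64L BiCS3 3D TLC | Toshiba 64L BiCS3 3D TLC | Toshiba 64L BiCS3 3D TLC | Toshiba 19nm MLC |
| Sequential Read | 3180 MB/s | 3000 MB/s | 3000 MB/s | 1500 MB/s | 2400 MB/s |
| Sequential Write | 2960 MB/s | 2100 MB/s | 2200 MB/s | 800 MB/s | 1500 MB/s |
| Random Read | 355k IOPS | | 320k IOPS | | |
| Random Write | 365k IOPS | | 265k IOPS | | |
| Power, Read | 4.2 W | 4.5 W | 4.9 W | 3.3 W | 5.5 W |
| Power, Write | 4.7 W | 3.4 W | 3.4 W | 3.2 W | 6.4 W |
| Power, Idle | 3 mW | 3 mW | 3 mW | 5 mW | 6 mW |
| TCG Opal Encryption | Optional | Optional | Optional | Optional | No |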

Aside from the upgrade to a new generation of 3D NAND, not much has changed from the XG5. The Toshiba XG6 is still using the same TC58NCP090GSB 8-channel controller as the XG5, but with another year's worth of firmware development. The use of an existing controller probably helped Toshiba get the XG6 out the door sooner and ensure they could be first to ship drives with 96L NAND, but it is possible that the XG6's performance is being held back a bit by the older controller. The controller is not really obsolete yet since it is still one of the most power-efficient NVMe controllers available, but the new in-house controller Western Digital debuted earlier this year gets more performance out of the same flash while usually offering similar power efficiency. Toshiba will need a new controller next year in order to keep the XG series in the high-end segment.

The basic layout of the XG6 has not changed from the XG5, though the power delivery components have been modified slightly, likely to accommodate the lower voltage for the NAND interface. The XG6 is another single-sided design to maximize compatibility with the thinnest notebook computers. Our 1TB sample has two NAND packages, each containing eight 512Gb BiCS4 3D TLC dies. Toshiba is using 256Gb dies on at least some of the smaller capacities, but they won't say specifically whether it's just the 256GB model or also the 512GB model. Either way, it's nice that they are willing to use the slightly less cost-effective low-capacity parts for smaller drives in order to retain most of the performance by keeping all 8 of the controller's channels populated.
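
The capacity and channel arithmetic behind that paragraph can be sketched in a few lines. This is a simplification under our own assumptions: nominal drive capacity is treated as the plain sum of die capacities (no spare area or over-provisioning), and dies are assumed to be spread one per channel before any channel gets a second die. Toshiba has not published the internal layout, and the helper names are hypothetical.

```python
# Sketch of the die-count arithmetic; ignores spare area and over-provisioning.
CHANNELS = 8  # channels on the XG6's controller, per the article


def dies_needed(capacity_gb: int, die_gbit: int) -> int:
    """Number of dies whose raw capacity adds up to the nominal drive capacity."""
    total_gbit = capacity_gb * 8       # gigabytes -> gigabits
    return -(-total_gbit // die_gbit)  # ceiling division


def channels_populated(dies: int, channels: int = CHANNELS) -> int:
    """Channels with at least one die, assuming dies are spread evenly."""
    return min(dies, channels)


if __name__ == "__main__":
    # The 1TB sample: two packages of eight 512Gb dies each.
    print("1TB sample:", 2 * 8 * 512 // 8, "GB")
    for capacity in (256, 512, 1024):
        for die in (256, 512):
            n = dies_needed(capacity, die)
            print(f"{capacity}GB with {die}Gb dies: {n} dies, "
                  f"{channels_populated(n)}/{CHANNELS} channels populated")
```

Under these assumptions, a 256GB drive built from 512Gb dies would need only four dies and leave half the channels empty, while a 512GB drive fills all eight channels with either die size.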

Our XG6 came with an unusual rigid plastic label that gives it the polished appearance of a retail product, but doesn't actually serve as the heatspreader it resembles. Thanks to the power efficiency of Toshiba's controller, heat shouldn't be a problem at all.

The Competition

We don't get OEM SSDs in for review very often. Toshiba is only really sampling the XG series because it is where their 64L and 96L NAND have debuted, and they haven't had retail versions ready to sample instead. Most of the other drives we have to compare the XG6 against are retail models, but most of them have OEM counterparts based on the same hardware and similar or identical firmware. For example, the WD Black is closely related to the WD SN720 that was announced slightly earlier but wasn't sampled for review.

AnandTech 2018 Consumer SSD Testbed

| Component | Details |
|---|---|
| CPU | Intel Xeon E3 1240 v5 |
| Motherboard | ASRock Fatal1ty E3V5 Performance Gaming/OC |
| Chipset | Intel C232 |
| Memory | 4x 8GB G.SKILL Ripjaws DDR4-2400 CL15 |
| Graphics | AMD Radeon HD 5450, 1920x1200@60Hz |
| Software | Windows 10 x64, version 1709; Linux kernel version 4.14, fio version 3.6; Spectre/Meltdown microcode and OS patches current as of May 2018 |

34 Comments

  • 29a - Thursday, September 06, 2018

    If that was the case they wouldn't use the spectre/md patches.
  • Valantar - Friday, September 07, 2018

    AFAIK they're very careful which patches are applied to test beds, and if they affect performance, older drives are retested to account for this. Benchmarks like this are never really applicable outside of the system they're tested in, but the system is designed to provide a level playing field and repeatable results. That's really the best you can hope for. Unless the test bed has a consistent >10% performance deficit to most other systems out there, there's no reason to change it unless it's becoming outdated in other significant areas.
  • iwod - Thursday, September 06, 2018

    So we are limited by the PCI-e interface again. Since the birth of SSDs, we pushed past SATA 3Gbps / 6Gbps, then PCI-E 2.0 x4 at 2GB/s, and now PCI-E 3.0 at 4GB/s.

    When are we going to get PCI-E 4.0? Or, since 5.0 is only just around the corner, we may as well wait for it. That is 16GB/s, plenty of room for SSD makers to figure out how to get there.
  • MrSpadge - Thursday, September 06, 2018

    There's no need to rush there. If you need higher performance, use multiple drives. Maybe on a HEDT or Enterprise platform if you need extreme performance.

    But don't be surprised if that won't help your PC as much as you thought. The ultimate limit currently is a RAMdisk. Launch a game from there or install some software - it's still surprisingly slow, because the CPU becomes the bottleneck. And that already applies to modern SSDs, which is obvious in benchmarks which test copying, installing or application launching etc.
  • abufrejoval - Friday, September 07, 2018

    Could also be the OS or the RAMdisk driver. When I finished building my 128GB 18-Core system with a FusionIO 2.4 TB leftover and 10Gbit Ethernet, I obviously wanted to bench it on Windows and Linux. I was rather shocked to see how slow things generally remained and how pretty much all these 36 HT-"CPU"s were just yawning.

    In the end I never found out if it was the last free version (3.4.8) of SoftPerfect's RAM disk that didn't seem to make use of all four Xeon E5 memory channels, or some bottleneck in Windows (I've never seen Windows Update use more than a single core), but I never got anywhere near the 70GB/s Johan had me dream of (https://www.anandtech.com/show/8423/intel-xeon-e5-...). Don't think I even saturated the 10Gbase-T network, if I recall correctly.

    It was quite different in many cases on Linux, but I do remember running an entire Oracle database on tmpfs once, and then an OLTP benchmark on that... again earning myself a totally bored system under the most intensive benchmark hammering I could orchestrate.

    There are so many serialization points in all parts of that stack, you never really get the performance you pay for until someone has gone all the way and rewritten the entire software stack from scratch for parallel and in-memory.

    Latency is the killer for performance in storage, not bandwidth. You can saturate all bandwidth capacities with HDDs, even tape. Thing is, with dozens (modern CPUs) or thousands (modern GPGPUs) of cores, SSDs *become tape*, because of the latencies incurred on non-linear access patterns.

    That's why after NVMe, NV-DIMMs or true non-volatile RAM is becoming so important. You might argue that a cache line read from main memory still looks like a tape library change against the register file of an xPU, but it's still way better than PCIe-5-10 with a kernel based block layer abstraction could ever be.

    Linear speed and loops are dead: If you cannot unroll, you'll have to crawl.
  • halcyon - Monday, September 10, 2018

    Thank you for writing this.
  • Quantum Mechanix - Monday, September 10, 2018

    Awesome write up- my favorite kind of comment, where I walk away just a *tiny* less ignorant. Thank you! :)
  • DanNeely - Thursday, September 06, 2018

    We've been 3.0 x4 bottlenecked for a few years.

    From what I've read about implementing 4.0/5.0 on a mobo, I'm not convinced we'll see them on consumer boards, at least not in their current form. The maximum PCB trace length without expensive boosters is too short; AIUI 4.0 is marginal to the top PCIe slot/chipset and 5.0 would need signal boosters even to go that far. Estimates I've seen were $50-100 (I think for an x16 slot) to make a 4.0 slot and several times that for 5.0. Cables can apparently go several times longer than PCB traces while maintaining signal quality, but I'm skeptical about them getting snaked around consumer mobos.

    And as MrSpadge pointed out, in many applications scaling out wider is an option, and from what I've read that is what Enterprise Storage is looking at. Instead of x4 slots that have 2/4x the bandwidth of current ones, that market is more interested in 5.0 x1 connections that have the same bandwidth as current devices but which would allow them to connect 4 times as many drives. That seems plausible to me since enterprise drive firmware is generally tuned for steady state performance, not bursts, and most of them don't come as close to saturating buses as high end consumer drives do for shorter/more intermittent workloads.
  • abufrejoval - Friday, September 07, 2018

    I guess that's why they are working on silicon photonics: PCB voltage levels, densities, layers, trace lengths... Wherever you look there are walls of physics rising into mountains. If only PCBs weren't so much cheaper than silicon interposers, photonics and other new and rare things!
  • npz - Friday, September 07, 2018

    I don't see how this particular drive, outside of any burst I/O, is limited by PCIe 3 x4 at all. It's not even close to top competitors in throughput.

    That said, higher bandwidth options can use PCIe AIC slots rather than M.2 for desktop, as there is at least one planned NVMe drive for x8. Yet the biggest bottleneck for I/O is small random I/O, including multi-threaded, and that is the one most felt by the end user.
