
  • iwod - Thursday, August 21, 2014 - link

    Finally, I have been waiting for two weeks for Anandtech to report this! Since other places aren't much good at discussing it.

    "in our experience the efficiency of PCIe has been about 80%"

    What causes that? I am pretty sure PCIe has very low overhead.
    I think this will be the next SSD for any current SSD owner to upgrade to, since all current SSDs are practically limited by SATA. And maybe it's time for Apple to make their own firmware and SSD with this controller?
  • Kristian Vättö - Thursday, August 21, 2014 - link

    "Finally, I have been waiting for two weeks for Anandtech to report this! Since other places aren't much good at discussing it."

    That's the reason why I'm not a big fan of live reporting at trade shows. As everyone is trying to be the first, I'd rather take my time and add some analysis instead of rewriting the PR. Too bad I didn't have the chance to meet with Marvell at FMS, so my details are limited to the PR :/

    As for the PCIe efficiency, I'm not sure about that (yet). Based on my internal tests you can only get ~780MB/s out of a PCIe 2.0 x2 link and ~1560MB/s with x4, and Ryan, our GPU editor, confirmed similar efficiency with PCIe 3.0 with CUDA bandwidth.

    From what I have heard, there are ways to increase the maximum bandwidth (which is why SF3700 is rated at up to 1.8GB/s with PCIe 2.0 x4) by playing with PCIe clock settings but I have yet to try that. I will definitely investigate this once we have more PCIe SSDs shipping.
  • repoman27 - Thursday, August 21, 2014 - link

    It's due to protocol overhead and is directly related to the TLP Maximum Payload Size. Each Transaction Layer Packet has either a 12 or 16 byte header depending on whether it's 32 or 64-bit, optional ECRC which adds another 4 bytes, a 2 byte sequence number, LCRC which uses 4 bytes, and another couple bytes for framing. The TLPs are also interspersed with 8 byte Data Link Layer Packets at regular intervals. With a TLP Max Payload Size of 128 B, which is typical of current Intel desktop and mobile platforms, and provided no retransmissions, that works out to a theoretical peak efficiency of 2560 bytes of payload throughput for every 3112 bytes transferred, or ~82%. With larger maximum payload sizes, better efficiency can be achieved—up to 99% for a payload size of 4096 B.

    I really hope this controller provides for more than 8 channels, seeing as you would need 16 channels running at north of 200 MB/s apiece to hit the 3240 MB/s that a PCIe 3.0 x4 link is capable of.
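    The efficiency arithmetic above can be sketched roughly as follows. This is only an illustration: the function name and parameters are made up here, and the 7.6 bytes of DLLP traffic per TLP is an assumption back-fitted so that 20 TLPs with a 12-byte header and no ECRC add up to the 3112 bytes quoted (the comment does not break that total down).

```python
def tlp_efficiency(payload_bytes, header_bytes=12, ecrc_bytes=0,
                   dllp_bytes_per_tlp=7.6):
    """Fraction of link bytes that are payload, assuming no retransmissions.

    dllp_bytes_per_tlp is an ASSUMED average share of 8-byte DLLPs per TLP,
    chosen to reproduce the 2560/3112 figure; it is not taken from the spec.
    """
    SEQ, LCRC, FRAMING = 2, 4, 2  # per-TLP byte counts listed in the comment
    overhead = (header_bytes + ecrc_bytes + SEQ + LCRC + FRAMING
                + dllp_bytes_per_tlp)
    return payload_bytes / (payload_bytes + overhead)


if __name__ == "__main__":
    for mps in (128, 256, 512, 4096):
        print(f"{mps:>5} B max payload: {tlp_efficiency(mps):.1%}")
```

    With these assumptions a 128 B payload lands at about 82% and a 4096 B payload at about 99%, in line with the figures in the comment.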
  • Kristian Vättö - Thursday, August 21, 2014 - link

    Thanks for the detailed explanation, it makes a lot more sense now.

    Most of the currently available NAND already support ONFI 3.0 or Toggle-Mode 2.0, which are good for up to 400MB/s per channel, so achieving 3GB/s should be possible even with an 8-channel design.
  • repoman27 - Thursday, August 21, 2014 - link

    And a quick count shows the 88SS1093 package has 557 balls vs 400 for the 88SS9187, 320 for the 88SS1074, or 289 for the 88NV9145. So it could be a more than 8 channel design, or they actually expect the 400 MT/s NAND interfaces to deliver close to 400 MB/s.
  • npz - Thursday, August 21, 2014 - link

    Most desktop BIOSes have actually let you set the TLP payload size up to 4k for several years now, and the onboard chipset devices do support it. The only issue is add-on devices and switches. But all modern devices should support 4k packets. Very few, however, support ECRC.
  • micksh - Friday, August 22, 2014 - link

    The PCIe controller has been in the CPU for a long time now. Intel desktop processors support only a 128-byte TLP payload size. Server CPUs (E, EP series in the LGA2011 socket) support a 256-byte maximum.
  • iwod - Friday, August 22, 2014 - link

    That is something I have been thinking about as well. We are running out of PCI-Express lanes direct from the CPU. We need 4x for the SSD (direct-connected I/O!), 16x for the GPU, and a few more for other connectivity.
  • DanNeely - Friday, August 22, 2014 - link

    Skylake is rumored to have 20 CPU lanes on its mass-market/consumer model to feed PCIe storage without getting in the way of the GPU.
  • iwod - Saturday, August 23, 2014 - link

    Well, looks like I will have to skip the Broadwell generation then. With this controller, PCIe-based SSDs and NVMe, I think the bottleneck will shift somewhere else. Hopefully software (OS / filesystem) will catch up to take advantage of it soon.
  • leminlyme - Tuesday, September 02, 2014 - link

    But doesn't Haswell-E have 28 or 40 lanes? Forgive me if I'm mistaken; I don't understand all of this yet, I'm just reading and learning now as a "new enthusiast".
  • npz - Friday, August 22, 2014 - link

    You may be right. My own experience was from programming on AMD chipsets from several years ago, where the PCIe root complex was (and still is) in the mobo chipset.
  • npz - Thursday, August 21, 2014 - link

    Also Kristian, don't forget 8b/10b encoding. IMO the "theoretical max" figures quoted should never be the giga-transfers of the physical layer, but rather the post-8b/10b bandwidth of the data link layer, since the theoretical max for data transmission will always be reduced by 20%.
  • Kristian Vättö - Friday, August 22, 2014 - link

    As repoman27 stated below, the 80% efficiency is after the 8b/10b encoding overhead, so I had that included already :)
  • eSyr - Friday, August 22, 2014 - link

    PCIe Gen3 uses 128b/130b encoding.
  • Stan11003 - Thursday, August 21, 2014 - link

    From Wikipedia:
    PCIe 1.x uses an 8b/10b encoding scheme that results in a 20 percent ((10−8)/10) overhead on the raw bit rate. It uses a 2.5 GHz clock rate, therefore delivering an effective 250 000 000 bytes per second (250 MB/s) maximum data rate.

    This scheme was also used for PCIe 2.x. Luckily for all of us, PCIe 3.0 has a new scheme.

    PCI Express 3.0 upgrades the encoding scheme to 128b/130b from the previous 8b/10b encoding, reducing the overhead to approximately 1.54% ((130–128)/130), as opposed to the 20% overhead of PCI Express 2.0. This is achieved by a technique called "scrambling" that applies a known binary polynomial to a data stream in a feedback topology. Because the scrambling polynomial is known, the data can be recovered by running it through a feedback topology using the inverse polynomial. PCI Express 3.0's 8 GT/s bit rate effectively delivers 985 MB/s per lane, practically doubling the lane bandwidth relative to PCI Express 2.0.
  • repoman27 - Thursday, August 21, 2014 - link

    The 80% number Kristian referred to is on top of the loss due to encoding efficiency. But yes, PCIe 2.0 is really 4 Gbit/s per lane because of 8b/10b encoding, yet PCIe 3.0 is still 7.877 Gbit/s (close to the nominal 8 Gbit/s) because of the switch to 128b/130b.
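    For anyone who wants to check the encoding numbers in this thread, here is a small sketch. The table and function names are invented for illustration; the transfer rates (5 GT/s and 8 GT/s) and encodings (8b/10b and 128b/130b) are the ones quoted above, and the optional protocol efficiency is just a multiplier, not a measured value.

```python
# (GT/s per lane, data bits, total bits) for each generation discussed above
ENCODING = {
    "2.0": (5.0, 8, 10),      # 8b/10b
    "3.0": (8.0, 128, 130),   # 128b/130b
}


def lane_mb_per_s(gen):
    """Post-encoding bandwidth of one lane in MB/s (1 MB = 10**6 bytes)."""
    gt_per_s, data_bits, total_bits = ENCODING[gen]
    bits_per_s = gt_per_s * 1e9 * data_bits / total_bits
    return bits_per_s / 8 / 1e6


def link_mb_per_s(gen, lanes, protocol_efficiency=1.0):
    """Bandwidth of a multi-lane link, optionally scaled by TLP efficiency."""
    return lane_mb_per_s(gen) * lanes * protocol_efficiency


if __name__ == "__main__":
    print(f"PCIe 2.0 lane: {lane_mb_per_s('2.0'):.0f} MB/s")
    print(f"PCIe 3.0 lane: {lane_mb_per_s('3.0'):.1f} MB/s")
    # Using the 2560/3112 packet efficiency mentioned earlier in the thread
    print(f"PCIe 3.0 x4:   {link_mb_per_s('3.0', 4, 2560 / 3112):.0f} MB/s")
```

    Taken this way, the ~1560MB/s measured on a PCIe 2.0 x4 link is about 78% of the 2000 MB/s left after encoding, and a PCIe 3.0 x4 link at the 2560/3112 packet efficiency comes out at roughly the 3240 MB/s mentioned earlier.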
  • DIYEyal - Thursday, August 21, 2014 - link

    Finally, the NVMe revolution begins... (I'm aware it's not the first product, but it's good to see more NVMe controllers), although this seems like it's targeted more towards M.2 SSDs than desktop PCIe cards.
  • frenchy_2001 - Friday, August 22, 2014 - link

    There is absolutely no difference from a controller perspective between the form factors: both use a PCIe interface and will deliver data over the NVMe protocol. M.2 is limited to 4 lanes at up to Gen3, while a PCIe card could use up to 16 lanes of Gen3. As this controller is limited to x4 anyway, there is no restriction... (As a comparison, Intel's P3700 uses a Gen3 x4 controller too and is sold in PCIe card format only so far.)
  • MrSpadge - Thursday, August 21, 2014 - link

    This could be a nice competitor to SF3700. And I'd love to see an MX100-style drive based on this :)
  • Peeping Tom - Thursday, August 21, 2014 - link

    More innovation, please.
  • chris81 - Friday, August 22, 2014 - link

    Always keep in mind that SATA is half-duplex, but PCIe is full-duplex. So even PCIe 2.0 x2 with 10 Gb/s is much better than 6 Gb/s SATA III (both use 8b/10b encoding).
  • FaaR - Friday, August 22, 2014 - link

    It shouldn't be half duplex - well in theory anyway - due to having separate wires for send and receive...
