RAPID 2.0: Support For More RAM & Updated Caching Algorithm

When the 840 EVO launched a year ago, Samsung introduced a new feature called RAPID (Real-time Accelerated Processing of I/O Data). The idea behind RAPID is very simple: it uses the excess DRAM in your system to cache IOs, thus accelerating storage performance. Modern computers tend to have quite a bit of DRAM that is not always used by the system, so RAPID turns a portion of that into a DRAM cache. 
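
RAPID's internals are not public, but the general idea is easy to illustrate. Below is a minimal sketch, in Python, of a block-level least-recently-used read cache held in host DRAM; the block size, capacity handling, and the read_from_ssd callback are illustrative assumptions, not Samsung's implementation.

```python
from collections import OrderedDict

class DramBlockCache:
    """Toy block-level read cache in host DRAM with LRU eviction.

    Illustrative sketch only; block_size and capacity are arbitrary
    choices, not values used by RAPID.
    """

    def __init__(self, capacity_bytes, block_size=4096):
        self.capacity_blocks = capacity_bytes // block_size
        self.block_size = block_size
        self.blocks = OrderedDict()  # LBA -> cached data

    def read(self, lba, read_from_ssd):
        if lba in self.blocks:               # cache hit: serve from DRAM
            self.blocks.move_to_end(lba)     # mark as most recently used
            return self.blocks[lba]
        data = read_from_ssd(lba)            # cache miss: go to the SSD
        self._insert(lba, data)
        return data

    def _insert(self, lba, data):
        self.blocks[lba] = data
        self.blocks.move_to_end(lba)
        if len(self.blocks) > self.capacity_blocks:
            self.blocks.popitem(last=False)  # evict least recently used block
```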

With the 850 Pro, Samsung is introducing Magician 4.4 along with an updated version of RAPID. Version 1.0 of RAPID supported up to 1GB of DRAM (or up to 25% of total RAM if you had less than 4GB), whereas version 2.0 raises the ceiling to 4GB for systems with 16GB of RAM or more. The 25% limit still applies, so RAPID will not take 4GB from a system with only 8GB installed; in that case the cache tops out at 2GB.
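
As a rough illustration of the allocation rule described above (assuming a simple "quarter of installed RAM, capped at 4GB" policy; the exact logic inside Magician is not documented):

```python
GiB = 1024 ** 3

def rapid_cache_size(installed_ram_bytes):
    """Approximate RAPID 2.0 cache ceiling: 25% of installed RAM, capped at 4GB.

    Illustrative only; the policy inside Samsung Magician may differ.
    """
    return min(installed_ram_bytes // 4, 4 * GiB)

# 4GB system -> 1GB cache, 8GB -> 2GB, 16GB and 32GB -> 4GB
for ram_gib in (4, 8, 16, 32):
    print(ram_gib, "GB RAM ->", rapid_cache_size(ram_gib * GiB) / GiB, "GB cache")
```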

I highly recommend reading the RAPID page of our 840 EVO review, where Anand explained the architecture and behavior of RAPID in detail; here I will keep the fundamentals short and focus on what has changed.

In addition to increasing the RAM allocation, Samsung has also improved the caching algorithms. Unfortunately, I was not able to get any details before the launch, but I am guessing that the new version is better optimized for the file types and IO sizes that gain the most from caching. Remember, while RAPID works at the block level, the software also looks at file types to determine which files and IO blocks should be prioritized. The increased RAM allocation also calls for updated caching algorithms: with a 4GB cache RAPID can hold more data at a time, which means it can relax the file type and block size restrictions (i.e. it can also cache larger files and IOs).
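
Since Samsung has not published the algorithm, the following is only a guess at what such an admission filter might look like; the file extensions, size thresholds, and the 4GB relaxation are my assumptions for illustration.

```python
# Hypothetical cache-admission filter of the kind described above. The
# extensions, thresholds, and relaxation rule are assumptions, not Samsung's.
HOT_EXTENSIONS = {".exe", ".dll", ".sys"}     # assumed high-priority file types

def should_cache(io_size_bytes, file_extension, cache_capacity_bytes):
    # With a larger cache, admit larger IOs/files; with a small cache, be picky.
    max_io = 1 << 20 if cache_capacity_bytes >= 4 * 1024**3 else 128 * 1024
    if io_size_bytes > max_io:
        return False
    # Prioritize file types that benefit most from caching.
    return file_extension.lower() in HOT_EXTENSIONS or io_size_bytes <= 64 * 1024
```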

To test how the new version of RAPID performs, I put it through our Storage Benches as well as PCMark 8’s storage test. Our testbed is equipped with 32GB of RAM, so we should be able to get the full benefit of RAPID 2.0.

Samsung SSD 850 Pro 256GB
                  ATSB - Heavy 2011 Workload            ATSB - Light 2011 Workload
                  Avg Data Rate    Avg Service Time     Avg Data Rate    Avg Service Time
RAPID Disabled    310.8MB/s        676.7ms              366.6MB/s        302.5ms
RAPID Enabled     549.1MB/s        143.4ms              664.4MB/s        134.6ms

The performance increase in our Storage Benches is pretty outstanding. In both the Heavy and Light suites the increase in throughput is around 80%, making the 850 Pro even faster than the Samsung XP941 PCIe SSD. 

Samsung SSD 850 Pro 1TB
                  PCMark 8 - Storage Score    PCMark 8 - Storage Bandwidth
RAPID Disabled    4998                        298.6MB/s
RAPID Enabled     5046                        472.8MB/s

PCMark 8, on the other hand, tells a different story. As you can see, the bandwidth is again much higher, by about 60%, but the storage score is a mere 1% higher.

(Chart: PCMark 8 - Application Performance)

PCMark 8 also records the completion time of each task in the storage suite, which explains why the storage scores are about equal. The fundamental issue is that today's applications are still designed with hard drives in mind, so they cannot utilize the full potential of SSDs. Even though throughput is much higher with RAPID, application performance is not, because the software has been written to wait several milliseconds for each IO to complete and does not know what to do when the response time suddenly drops to the order of a millisecond or two. That is also why most applications load the necessary data into RAM at launch and only touch storage when they must; back in the hard drive days, you wanted to avoid hitting the disk as much as possible.
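
A back-of-the-envelope model illustrates the point. The numbers below are made up, but they show how little total run time changes when the application serializes a fixed amount of its own work with every IO:

```python
# Toy model: an application issues synchronous IOs and does a fixed amount of
# CPU work per IO. All numbers are invented purely for illustration.
io_count = 1000
cpu_per_io_ms = 5.0          # application-side work per IO

def total_time_ms(io_latency_ms):
    return io_count * (cpu_per_io_ms + io_latency_ms)

baseline = total_time_ms(0.5)   # SSD without a DRAM cache
cached   = total_time_ms(0.05)  # same workload with a DRAM cache in front
print(f"speedup: {baseline / cached:.2f}x")  # ~1.09x despite 10x faster IO
```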

It will be interesting to see what the industry does with the software stack over the next few years. In the enterprise, we have seen several OEMs release their own APIs (like SanDisk's ZetaScale) so companies can optimize their server software for SSDs and take full advantage of NAND. I do not believe a similar approach works for the client market, as ultimately everything is in the hands of Microsoft.

I also tried running the 2013 suite, a.k.a. The Destroyer, but for some reason RAPID did not like that and the system BSODed midway through the test. I suspect this is because our Storage Benches are run without a partition, whereas RAPID also works at the file system level in the sense that it takes hints about which files should be cached. It may simply be that under a high queue depth workload (like the 2013 suite), RAPID does not know which IOs to cache because there is no file system to guide it. I hit the same BSOD immediately when I fired up our IO consistency test (also run without a partition), but when I tested a similar 4KB random write workload with the new Iometer (which supports file system testing), there was absolutely no issue. This further suggests that the problem lies in our tests rather than in the RAPID software itself, and end users will always run the drive with a partition anyway.

As Anand mentioned in the 840 EVO review, it is possible to monitor RAPID's RAM usage by looking at the non-paged RAM pool. Instead of just watching Resource Monitor, I decided to take the monitoring one step further and recorded the RAM usage over time with Windows' Performance Monitor while running the 2011 Heavy workload. RAPID appears to be fairly aggressive about RAM caching: usage climbs to ~4.7GB almost immediately after firing up the test and stays there for nearly the entire run. There are some drops, although I am not sure what is causing them. The idle times in the trace are limited to a maximum of 25 seconds, so some of the drops could be caused by that. I need to run some additional tests and monitor the IOs to see whether it is just the idle times or whether RAPID is excluding certain types of IOs.
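
Performance Monitor's Memory\Pool Nonpaged Bytes counter is the easiest way to log this. For reference, the same figure can also be polled programmatically through the Win32 GetPerformanceInfo API; the sketch below is Windows-only and simply prints the non-paged pool size once a second (the sampling interval and duration are arbitrary choices).

```python
import ctypes
import time
from ctypes import wintypes

class PERFORMANCE_INFORMATION(ctypes.Structure):
    # Mirrors the PERFORMANCE_INFORMATION struct from psapi.h
    _fields_ = [
        ("cb", wintypes.DWORD),
        ("CommitTotal", ctypes.c_size_t),
        ("CommitLimit", ctypes.c_size_t),
        ("CommitPeak", ctypes.c_size_t),
        ("PhysicalTotal", ctypes.c_size_t),
        ("PhysicalAvailable", ctypes.c_size_t),
        ("SystemCache", ctypes.c_size_t),
        ("KernelTotal", ctypes.c_size_t),
        ("KernelPaged", ctypes.c_size_t),
        ("KernelNonpaged", ctypes.c_size_t),
        ("PageSize", ctypes.c_size_t),
        ("HandleCount", wintypes.DWORD),
        ("ProcessCount", wintypes.DWORD),
        ("ThreadCount", wintypes.DWORD),
    ]

def nonpaged_pool_bytes():
    pi = PERFORMANCE_INFORMATION()
    pi.cb = ctypes.sizeof(pi)
    if not ctypes.windll.psapi.GetPerformanceInfo(ctypes.byref(pi), pi.cb):
        raise ctypes.WinError()
    return pi.KernelNonpaged * pi.PageSize   # pool size in bytes

# Sample once a second for a minute; RAPID's cache shows up as growth here.
for _ in range(60):
    print(time.strftime("%H:%M:%S"), round(nonpaged_pool_bytes() / 2**20), "MiB")
    time.sleep(1)
```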

I also ran ATTO to see how the updated RAPID responds to different transfer sizes. Read performance scales quite linearly until hitting an IO size of 256KB. ATTO stores its performance values in 32-bit integers, and with RAPID enabled the result exceeds what that variable can hold, so the reported figure wraps around back to zero.
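
To illustrate the overflow, assuming ATTO keeps the result as bytes per second in an unsigned 32-bit variable (which would cap out around 4.29GB/s; the internal unit is my assumption):

```python
# Simulate a bytes/second result being stored in an unsigned 32-bit variable.
# Assumes the unit is bytes per second; ATTO's actual internals are a guess.
UINT32_MAX = 2**32 - 1                       # 4,294,967,295 (~4.29 GB/s)

def as_uint32(true_bytes_per_sec):
    return true_bytes_per_sec & 0xFFFFFFFF   # wraps past UINT32_MAX

print(as_uint32(3_000_000_000))  # 3.0 GB/s is reported correctly
print(as_uint32(4_500_000_000))  # ~4.5 GB/s wraps around to ~205 MB/s
```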

With writes, RAPID continues to cache consistently until hitting 1MB transfers, at which point it starts to cache less aggressively.

160 Comments

  • Cerb - Tuesday, July 1, 2014 - link

    As soon as it is cheap enough. But don't get your hopes up about performance. SD cards are mostly limited by their controllers being slow, and in the tiny package they fit in, with the narrow margins they have, there's not a lot of room, physically or economically, to give them fast controllers, even in a big card that must have several NAND dies, i.e. full-size SD, where multiple channels might be viable. It sucks, and I dislike shopping for SD cards as much as anybody, but today, that's how it is.
  • frenchy_2001 - Tuesday, July 1, 2014 - link

    I think he was talking about V-NAND (3D cells), which is independent of the controller.
    I would guess it will, as density will continue to scale up, which will make it the cheaper technology.
    It is cutting edge now, but it will let Samsung scale to higher densities very aggressively in the coming years, replacing all of their 2D NAND production (they announced this when presenting the 3D cells).
  • Harry Lloyd - Tuesday, July 1, 2014 - link

    Personally I have no interest in this kind of performance, and I really hope they focus on reducing prices and increasing capacities. The MX100 is just great for home usage (system and gaming), and I would like to see a 512 GB equivalent for around $100 by the end of 2015.
  • Spatty - Tuesday, July 1, 2014 - link

    "Oftentimes when cell size is discussed, it is only the actual size of the cell that is taken into account, which leaves the distance between cells out of the conclusion."

    Incorrect. Oftentimes what is being discussed is the half pitch: the 16nm, 19nm, 20nm, etc. of the die. That is not the cell. The cell is always defined as the repeatable structure in a memory device, and this includes the space between cells as described. The cell size is incorrectly referenced as being the half pitch.

    Then there is the marketing gimmick of companies calling their products 19nm when it is really 19nm by 2xnm: a rectangle, not a true 19nm square half pitch.
  • Larry Endomorph - Tuesday, July 1, 2014 - link

    Good review. Bad charts. All of these are useless to color blind people:
    http://images.anandtech.com/doci/8216/NAND%20overv...
    http://images.anandtech.com/doci/8216/cell%20inter...
    http://images.anandtech.com/doci/8216/V-NAND_1.png
    http://images.anandtech.com/doci/8216/850%20Pro%20...
    http://images.anandtech.com/doci/8216/850%20Pro%20...
    http://images.anandtech.com/doci/8216/850%20Pro%20...
    http://images.anandtech.com/doci/8216/850%20Pro%20...
    http://images.anandtech.com/doci/8216/850%20Pro%20...
    http://images.anandtech.com/doci/8216/850%20Pro%20...
    http://images.anandtech.com/doci/8216/850%20Pro%20...
    http://images.anandtech.com/doci/8216/850%20Pro%20...
  • Cerb - Tuesday, July 1, 2014 - link

    I never paid much attention, but you're right. If they changed the point shapes, and maybe dashed a couple of the lines, they could take care of that easily.
  • fokka - Tuesday, July 1, 2014 - link

    it's great to see a new drive from samsung and even greater seeing them advancing ssd tech and performance in such substantial ways. keeping that in mind i'm not really surprised about the msrp sammy is asking for its drives. and as always when new devices hit the scene, we're comparing msrp with real market prices here, so the difference should be a bit lower in a couple weeks when enough stock is available.

    that said, even if sata3 remains the most important storage interface today, it's kind of a shame seeing such a beautiful drive limited by this "old" interface. i know the new standards like m2, sata3.2 and pci-e-drives are still kind of a mess, but we already saw what higher throughputs in combination with more efficient interface protocols can do and seeing an expensive enthusiast drive like the 850 pro connected to sata3 just makes it seem more limited than it needed to be.

    all that said, it doesn't change much for the average user, or advanced users even, since for most people a good sized evo or crucial is all they ever need in the years to come. upgrading to expensive drives like the 850 will only make sense for the most demanding users, for the rest it will only get interesting again when pci based storage gets more affordable.
  • Daniel Egger - Tuesday, July 1, 2014 - link

    Minor nit: There's no such thing as "pentalobe torx"; it's either one or the other. I'm guessing it might have been Torx security, since pentalobe screws have only been used by Apple, starting a couple of years back.
  • iwod - Tuesday, July 1, 2014 - link

    It's great to see it doing well in the power consumption area, which is important in notebooks. I hope we can bring this down to 2W or even 1.5W during operation.

    I really do think our SSD storage tier deserves a PCI-E lane direct from the CPU. It would be great if the market just settled on 2x PCI-E 3.0 from the CPU. We get 2GB/s out of it. That is plenty of headroom to grow until we move to PCI-E 4.0.
  • hojnikb - Tuesday, July 1, 2014 - link

    That's what SATA Express is doing.
