Original Link: http://www.anandtech.com/show/2973/6gbps-sata-performance-amd-890gx-vs-intel-x58-p55



Earlier this month my Crucial RealSSD C300 died in the middle of testing for AMD’s 890GX launch. This was a problem for two reasons:

1) Crucial’s RealSSD C300 is currently shipping and selling to paying customers. The 256GB drive costs $799.
2) AMD’s 890GX is the first chipset to natively support 6Gbps SATA. The C300 is the first SSD to natively support the standard as well. Butter, meet toast.

Since then, Crucial dispatched a new drive and discovered what happened to my first drive (more on this in a separate update). While waiting for the autopsy report, I decided to look at 890GX 6Gbps performance since it was absent from my original review.


AMD's SB850 with Native 6Gbps SATA

In the 890GX review I found that AMD’s new South Bridge, the SB850, wasn’t quite as fast as Intel’s ICH/PCH when dealing with the latest crop of high performance SSDs. My concerns were particularly about high bandwidth or high IOPS situations, admittedly things that you only bump into if you’re spending a good amount of money on an SSD. Case in point, here is OCZ’s Vertex LE running on an AMD 890GX compared to an Intel H55:

Iometer 6-22-2008 Performance
Platform | 2MB Sequential Read | 2MB Sequential Write | 4KB Random Read | 4KB Random Write (4K Aligned)
AMD 890GX | 248 MB/s | 217.5 MB/s | 38.4 MB/s | 130.1 MB/s
Intel H55 | 264.9 MB/s | 247.7 MB/s | 48.6 MB/s | 180 MB/s

My concern was that if 3Gbps SSDs were underperforming on the SB850, then 6Gbps SSDs definitely would.

Other reviewers had mixed results with the SB850. Some boards did well while others did worse. I also discovered that AMD’s own internal testing is done on an internal reference board with both Cool’n’Quiet and SB power management disabled, which is why disabling CnQ improved performance in my results. As for why AMD does its own internal testing this way, your guess is as good as mine.

I received an ASUS 890GX board for this followup and updated to the latest BIOS on that board. That didn’t fix my performance problems. Using AMD’s latest SB850 AHCI drivers however (1.2.0.164), did...sort of:

Iometer 6-22-2008 Performance
Platform | 2MB Sequential Read | 2MB Sequential Write | 4KB Random Read | 4KB Random Write (4K Aligned)
AMD 890GX (3/2/10) | 248 MB/s | 217.5 MB/s | 38.4 MB/s | 130.1 MB/s
AMD 890GX (3/25/10) | 253.5 MB/s | 223.8 MB/s | 51.2 MB/s | 152.1 MB/s
Intel H55 | 264.9 MB/s | 247.7 MB/s | 48.6 MB/s | 180 MB/s

All performance improved, but we’re still looking at lower performance compared to Intel’s 3Gbps SATA controller except for random read speed. Random read speed is faster on the 890GX (but slower than X58).
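To put the driver update in perspective, the gains can be worked out directly from the two 890GX rows in the table. The following is a small sketch using only the figures above; the dictionary and function names are illustrative and not part of any test tool.

```python
# Percentage change between the 3/2/10 and 3/25/10 AMD 890GX results (MB/s),
# taken from the Iometer table above. Names here are illustrative only.
before = {
    "2MB sequential read": 248.0,
    "2MB sequential write": 217.5,
    "4KB random read": 38.4,
    "4KB random write": 130.1,
}
after = {
    "2MB sequential read": 253.5,
    "2MB sequential write": 223.8,
    "4KB random read": 51.2,
    "4KB random write": 152.1,
}

def pct_gain(old: float, new: float) -> float:
    """Relative improvement of new over old, in percent."""
    return (new - old) / old * 100.0

for test, old in before.items():
    print(f"{test}: {pct_gain(old, after[test]):+.1f}%")
```

The sequential numbers barely move, while the 4KB random results account for nearly all of the improvement.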

The best part of it all is that I no longer had to disable CnQ or C1E to get this performance. I will note that my performance is still lower than what AMD is getting on its internal reference board and the performance from 3rd party boards varies significantly from one board to the next depending on board and BIOS revisions. But at least we’re getting somewhere.

In testing the 890GX, I decided to look into how Intel’s chipsets perform with this new wave of high performance SSDs. It’s not as straightforward as you’d think.



The Primer: PCI Express 1.0 vs. 2.0

A serial interface, PCI Express is organized into lanes. Each lane has an independent set of transmit and receive pins, and data can be sent in both directions simultaneously. And here’s where things get misleading. Bandwidth in a single direction for a single PCIe 1.0 lane (x1) is 250MB/s, but because you can send and receive 250MB/s at the same time Intel likes to state the bandwidth available to a PCIe 1.0 x1 slot as 500MB/s. While that is the total aggregate bandwidth available to a single slot, you can only reach that bandwidth figure if you’re reading and writing at the same time.


One of our first encounters with PCI Express was at IDF in 2002

PCI Express 2.0 doubles the bidirectional bandwidth per lane. Instead of 250MB/s in each direction per lane, you get 500MB/s.
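The 250MB/s and 500MB/s figures follow from the per-lane signaling rate (2.5 GT/s for PCIe 1.0, 5 GT/s for PCIe 2.0) and the 8b/10b line encoding both generations use. A minimal sketch of that arithmetic, with hypothetical function and constant names:

```python
# Per-direction PCIe bandwidth from signaling rate and 8b/10b encoding.
# Names are illustrative; the result is a theoretical ceiling that ignores
# packet and protocol overhead.
GIGATRANSFERS_PER_S = {"1.0": 2.5, "2.0": 5.0}

def pcie_mb_per_s(gen: str, lanes: int = 1) -> float:
    """Theoretical MB/s in one direction for a link of the given width."""
    raw_bits = GIGATRANSFERS_PER_S[gen] * 1e9   # line bits per second per lane
    data_bits = raw_bits * 8 / 10               # 8 data bits per 10 line bits
    return data_bits / 8 / 1e6 * lanes          # bits -> bytes -> MB

for gen in ("1.0", "2.0"):
    one_way = pcie_mb_per_s(gen)
    print(f"PCIe {gen} x1: {one_way:.0f} MB/s each way, "
          f"{2 * one_way:.0f} MB/s aggregate")
```

The aggregate column is the number that tends to show up in marketing material; a one-way transfer such as a sequential read only ever sees the per-direction figure.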

Other than graphics, there haven’t been any high bandwidth consumers on the PCIe bus in desktops. Thus the distinction between PCIe 1.0 and 2.0 has never really mattered. Today, both USB 3.0 and 6Gbps SATA aim to change that. Both can easily saturate a PCIe 1.0 x1 connection.


Intel's X58 Chipset. The only PCIe 2.0 lanes come from the IOH.

This is a problem because all Intel chipsets have a combination of PCIe 1.0 and 2.0 slots. Intel’s X58 chipset for example has 36 PCIe 2.0 lanes off of the X58 IOH, plus an additional 6 PCIe 1.0 lanes off the ICH. AMD’s 7 and 8 series chipsets don’t have any PCIe 1.0 slots.


AMD's 890GX doesn't have any PCIe 1.0 lanes

No desktop chipset natively supports both 6Gbps SATA and USB 3.0. AMD’s 8-series brings native 6Gbps SATA support, but USB 3 still requires an external controller. On Intel chipsets, you need a separate controller for both 6Gbps SATA and USB 3.

These 3rd party controllers are all PCIe devices, just placed on the motherboard. NEC’s µPD720200 is exclusively used by all motherboard manufacturers for enabling USB 3.0 support. The µPD720200 has a PCIe 2.0 x1 interface and supports two USB 3.0 ports.

The USB 3 spec calls for transfer rates of up to 500MB/s. Connected to a PCIe 2.0 interface, you get 500MB/s up and down, more than enough bandwidth for the controller. However if you connect the controller to a PCIe 1.0 interface, you only get half that (and even less in practice). It’s not a problem today but eventually, with a fast enough USB 3 device, you’d run into a bottleneck.

The 6Gbps situation isn’t any better. Marvell’s 88SE91xx PCIe 2.0 controller is the only way to enable 6Gbps SATA on motherboards (other than 890GX boards) or add-in cards today.

The interface is only a single PCIe 2.0 lane. The 6Gbps SATA link runs at 750MB/s raw (about 600MB/s after 8b/10b encoding), but the PCIe 2.0 x1 interface limits read/write speed to 500MB/s. Pair it with a PCIe 1.0 x1 interface and you’re down to 250MB/s (and much less in reality due to bus overhead).



Crucial’s RealSSD C300 - The Perfect Test Candidate

The C300 is capable of pushing over 300MB/s in sequential reads. More than enough bandwidth to need 6Gbps SATA as well as expose limitations from PCIe 1.0 slots.
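One way to see why a drive in this class is a good test candidate is to compare its speed against the theoretical ceiling of each link it might sit behind. The sketch below is illustrative only: the SATA ceilings assume 8b/10b encoding on the line rate, the 340MB/s drive speed is a stand-in for "over 300MB/s", and the names are hypothetical.

```python
from typing import List

# Theoretical one-direction ceilings in MB/s, before protocol overhead.
# SATA values assume 8b/10b encoding on the 3 Gbps and 6 Gbps line rates.
LINK_CEILINGS_MB_S = {
    "3Gbps SATA": 300.0,
    "6Gbps SATA": 600.0,
    "PCIe 1.0 x1": 250.0,
    "PCIe 2.0 x1": 500.0,
}

def limiting_links(drive_mb_s: float) -> List[str]:
    """Return the links whose ceiling is below the drive's sequential speed."""
    return [name for name, cap in LINK_CEILINGS_MB_S.items() if cap < drive_mb_s]

# 340 MB/s is an illustrative value for a drive that exceeds 300 MB/s.
print(limiting_links(340.0))
```

Any link the function returns will cap the drive before the drive itself runs out of steam, even on paper.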

To test the C300 I’m using Highpoint’s RocketRAID 62X. This PCIe 2.0 x1 card has a Marvell 88SE9128 6Gbps controller on it.

What About P5x/H5x?

Unlike Intel’s X58, the P55 and H5x chipsets don’t have any PCIe 2.0 lanes. The LGA-1156 Core i7/5/3 processors have an on-die PCIe 2.0 controller with 16 lanes, but the actual chipset only has 8 PCIe 1.0 lanes. And as we’ve already established, a single PCIe 1.0 lane isn’t enough to feed a bandwidth hungry SSD on a 6Gbps SATA controller.

Gigabyte does the obvious thing and uses the PCIe 2.0 lanes coming off the CPU for USB 3 and 6Gbps SATA. This works perfectly if you are using integrated graphics. If you’re using discrete graphics, you have the option of giving it 8 lanes and having the remaining lanes used by USB 3/SATA 6Gbps. Most graphics cards are just fine running in x8 mode so it’s not too big of a loss. If you have two graphics cards installed however, Gigabyte’s P55 boards will switch to using the PCIe 1.0 lanes from the P55/H5x.

ASUS uses the same approach on its lower end P55 boards, but takes a different approach on its SLI/CF P55 boards. Enter the PLX PEX8608:

The PLX PEX8608 combines 4 PCIe x1 lanes and devotes their bandwidth to the Marvell 6Gbps controller. You lose some usable PCIe lanes from the PCH, but you get PCIe 2.0-like performance from the Marvell controller.
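The reasoning behind the switch can be sketched as a simple link budget: the controller can go no faster than the slower of its upstream and downstream connections. This is a rough model under stated assumptions, not a description of how ASUS wires its boards. It assumes the switch's downstream link to the Marvell controller runs at PCIe 2.0 x1 speed, and the names are hypothetical.

```python
# Rough link budget for a PCIe 2.0 x1 device fed by N PCIe 1.0 lanes
# through a switch. Theoretical figures only; real throughput is lower.
PCIE1_LANE_MB_S = 250.0   # PCIe 1.0, one lane, one direction
PCIE2_LANE_MB_S = 500.0   # PCIe 2.0, one lane, one direction

def switch_ceiling(upstream_lanes: int) -> float:
    """Ceiling in MB/s for the device behind the switch."""
    upstream = upstream_lanes * PCIE1_LANE_MB_S   # lanes aggregated by the switch
    downstream = PCIE2_LANE_MB_S                  # assumed PCIe 2.0 x1 link to the device
    return min(upstream, downstream)

print(switch_ceiling(1))  # a single PCIe 1.0 lane feeding the controller
print(switch_ceiling(4))  # four PCIe 1.0 lanes combined
```

With four lanes upstream the PCIe 1.0 side stops being the narrowest point, which is consistent with the PCIe 2.0-like behavior described above.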

For most users, ASUS and Gigabyte’s varying approaches should deliver the same results. If you are running a multi-GPU setup, then ASUS’ approach makes more sense if you are planning on using a 6Gbps SATA drive. The downside is added cost and power consumed by the PLX chip (an extra ~1.5W).



The Test Platforms

To see how much we’d be limited by Intel’s PCIe 1.0 slots and AMD’s new SB850, I put together a handful of test platforms.

I’ve got ASUS’ 890GX motherboard equipped with native 6Gbps SATA support. This board/chipset should give us full bandwidth to the Crucial RealSSD C300:


ASUS' M4A89GTD Pro/USB3

I’ve got Intel’s own X58 motherboard. With no on-board 6Gbps support I installed my RocketRAID 62X card into a PCIe 2.0 x16 slot, a PCIe 1.0 x4 slot and a PCIe 1.0 x1 slot.


Intel's DX58SO

Gigabyte sent its X58-UD3R motherboard, which has a Marvell 6Gbps controller branching off one of the X58’s PCIe 2.0 lanes.

Next up is Intel’s P55 board where I use one of the x16 slots branching off the CPU socket, as well as a PCIe 1.0 x1 slot from the PCH. The results here should be equal to an H55/H57 platform, which I also verified.

Finally I’ve got ASUS’ P7H57D-V EVO with the PLX solution, just to see how well combining a bunch of PCIe 1.0 lanes to feed Marvell’s 6Gbps SATA controller works.



The First Test: Sequential Read Speed

The C300 can break 300MB/s in sequential read performance so it’s the perfect test for 6Gbps SATA bandwidth.

Intel’s X58 is actually the best platform here, delivering over 340MB/s from the C300 itself. If anything, we’re bound by the Marvell controller or the C300 itself in this case. AMD’s 890GX follows next at 319MB/s. It’s faster than 3Gbps SATA for sure, but just not quite as fast as the Marvell controller on an Intel X58.

The most surprising is that using the Marvell controller on Intel’s P55 platform, even in a PCIe 2.0 x16 slot, only delivers 308MB/s of read bandwidth. The PCIe controller is on the CPU die and should theoretically be lower latency than anything the X58 can muster, but for whatever reason it actually delivers lower bandwidth than the off-die X58 PCIe controller. This is true regardless of whether we use Lynnfield or Clarkdale in the motherboard, or if we’re using a P55, H55 or H57 motherboard. All platform/CPU combinations result in performance right around 310MB/s - a good 30MB/s slower than the X58. Remember that this is Intel’s first on-die PCIe implementation. It’s possible that performance is lower in order to first ensure compatibility. We may see better performance out of Sandy Bridge in 2011.

Using any of the PCIe 1.0 slots delivers absolutely horrid performance. Thanks to encoding and bus overhead, the most we can get out of a PCIe 1.0 slot is ~192MB/s with our setup. Intel’s X58 board has a PCIe 1.0 x4 slot that appears to give us better performance than any other 1.0 slot for some reason despite us only using 1 lane on it.

Using one of the x1 slots on a P55 motherboard limits us to a disappointing 163.8MB/s. In other words, there’s no benefit to even having a 6Gbps drive here. ASUS’ PLX implementation however fixes that right up - at 336.9MB/s it’s within earshot of Intel’s X58.

It’s also worth noting that you’re better off using your 6Gbps SSD on one of the native 3Gbps SATA ports rather than use a 6Gbps card in a PCIe 1.0 slot. Intel’s native SATA ports read at ~265MB/s - better than the Marvell controller on any PCIe 1.0 slot.
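A quick way to gauge how much is lost to overhead is to divide the measured PCIe 1.0 results by the 250MB/s per-direction figure for a single lane. The sketch below uses only the numbers quoted in this section; the labels and function name are illustrative.

```python
# Measured sequential read speed as a share of the PCIe 1.0 x1 ceiling.
# 250 MB/s already accounts for 8b/10b encoding, so the remainder is
# protocol and platform overhead.
PCIE1_X1_THEORETICAL_MB_S = 250.0

measured_mb_s = {
    "best PCIe 1.0 slot (approx.)": 192.0,
    "P55 PCIe 1.0 x1 slot": 163.8,
}

def efficiency(measured: float, theoretical: float = PCIE1_X1_THEORETICAL_MB_S) -> float:
    """Measured throughput as a percentage of the theoretical ceiling."""
    return measured / theoretical * 100.0

for label, mb_s in measured_mb_s.items():
    print(f"{label}: {efficiency(mb_s):.1f}% of theoretical")
```

Both results land well under the ~265MB/s from Intel’s native 3Gbps ports, which is the comparison that matters when deciding where to plug the drive in.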



Random Read Performance is Also Affected

It’s not all about peak bandwidth either. Remember that bandwidth and latency are related, so it’s not too surprising that the setups that delivered the least bandwidth also hurt small file read speed.

The target here is around 80MB/s. That’s what Intel’s X58 can do off one of its native 3Gbps SATA ports. Let’s see how everything else fares:

At 80MB/s the Crucial RealSSD C300 is pushing roughly 20,000 IOPS in this test. In fact, that’s the highest random read speed of any MLC SSD we’ve ever tested. With the 890GX the C300 can only manage 64.3MB/s.
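The conversion between throughput and IOPS for a fixed transfer size is simple division. The sketch below assumes 1MB = 1,000,000 bytes and a 4KB transfer of 4,096 bytes; Iometer's own accounting may differ slightly, and the function name is hypothetical.

```python
def mb_s_to_iops(mb_per_s: float, block_bytes: int = 4096) -> float:
    """Convert throughput at a fixed transfer size into operations per second.

    Assumes 1 MB = 1,000,000 bytes; block_bytes defaults to a 4KB transfer.
    """
    return mb_per_s * 1_000_000 / block_bytes

for label, mb_s in (("peak", 80.0), ("890GX", 64.3)):
    print(f"{label}: {mb_s} MB/s is about {mb_s_to_iops(mb_s):,.0f} IOPS")
```

Read this way, the gap between 80MB/s and 64.3MB/s is a few thousand fewer 4KB reads completed every second.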

Naturally I shared my data with AMD before publishing, including my Iometer test scripts. Running on its internal 890GX test platform, AMD was able to achieve a 4KB random read speed of 102.6MB/s in this test - faster than anything I’d ever tested. Unfortunately that appears to be using AMD’s own internal reference board and not one of the publicly available 890GX platforms. The good news is that if AMD’s numbers are accurate, there is hope for 890GX’s SATA performance. It’s just a matter of getting the 3rd party boards up to speed (AMD has since shared some more results with me that show performance with some beta BIOSes on 3rd party boards improving even more).

Using the Marvell 6Gbps controller in any PCIe 2.0 slot (or off a PCIe 2.0 interface as is the case with Gigabyte’s X58), or in one of ASUS’ 6Gbps ports behind the PLX switch, yields peak performance more or less.

Any of the PCIe 1.0 slots however saw a drop from ~80MB/s to ~65MB/s. The exception being Intel’s odd x4 slot that is a PCIe 1.0 slot, but branches off the X58 IOH and thus appears to offer lower latency than PCIe 1.0 slots dangling off the ICH.



Write Performance Isn’t Safe Either

Testing read performance is the easiest since I just fill the SSD with data once and can read off it multiple times. To achieve repeatable write performance however I have to secure erase the drive between each test. Not impossible, but annoying given that only certain motherboards allow me to drop the SATA ports into legacy mode which is necessary for the DOS based secure erase application to work. For that reason I’m only providing a small subset of my testbeds here to prove that write speed is also impacted:

Here the results are even more frustrating. Paired with a PCIe 1.0 slot, random write speed is virtually cut in half. The frustration comes from the fact that Intel’s native 3Gbps controller is faster than almost anything else here.

I say almost because we do have one exception. AMD’s 890GX delivers a staggering 180MB/s in random write performance, a full 31% faster than Intel’s X58. The random write speed makes me believe that with a bit of driver and/or BIOS work we can get random read performance up there as well.

Performance in the Real World

These differences are visible in the real world as well. I took four systems and copied a 10GB file from the C300 to itself and measured average write speed:

The real takeaway here is that sticking a 6Gbps controller behind a PCIe 1.0 slot wreaks havoc on performance.
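A copy test along these lines can be approximated with a short script. To be clear, this is a sketch and not the method used for the results above: the function name, chunk size and file paths are all hypothetical, and the OS file cache can inflate the result if the source file was read recently.

```python
import os
import time

def timed_copy(src: str, dst: str, chunk_bytes: int = 4 * 1024 * 1024) -> float:
    """Copy src to dst in chunks and return the average throughput in MB/s."""
    total = 0
    start = time.perf_counter()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_bytes)
            if not chunk:
                break
            fout.write(chunk)
            total += len(chunk)
        fout.flush()
        os.fsync(fout.fileno())   # wait until the data has been handed to the drive
    elapsed = time.perf_counter() - start
    return total / 1e6 / elapsed
```

Calling `timed_copy("big.bin", "big_copy.bin")` with both (hypothetical) paths on the drive under test exercises reads and writes at the same time, which is the same kind of load as copying a file from a drive to itself.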



Final Words

AMD’s ATI acquisition was about bringing graphics to the portfolio with the eventual goal of integration into the CPU itself. We’ll see the first of that early next year with Llano. But as AMD goes down this integration route, it needs to make sure that its chipsets are at least up to par with Intel’s. Many have complained about AMD’s South Bridges in the past, but with SB850 we’ve actually seen some real improvement. There still appear to be some strange behaviors and I don’t like that there’s any discrepancy between AMD’s reference board and retail 890GX boards, but these results look very promising.

AMD’s native 6Gbps implementation manages to outperform both Marvell and Intel’s controllers in the 4KB random write test by a substantial margin. AMD’s sequential read speed is lower than the Marvell controller, and random read speed is lower than Intel’s 3Gbps controller. With a bit of work, AMD looks like it could have the best performing SATA controller on the market.

Intel’s X58 still has a few tricks left up its sleeve - it manages to be a very high performing 3Gbps SATA controller. Other than in sequential read speed, it’s even faster than Marvell’s 6Gbps controller with a 6Gbps SSD - although not by much.


Marvell makes the only 6Gbps SSD controller today. By next year that will change.

The P55 and H55 platforms are far less exciting. Any 6Gbps controller connected off the PCH is severely limited by Intel’s use of PCIe 1.0 slots. Unfortunately this means that you’ll have to use the 16 PCIe 2.0 lanes branching off the CPU for any real performance. That either results in you limiting your GPU to only 8 lanes or dropping back down to PCIe 1.0 if you have two graphics cards installed. ASUS’ PLX solution is an elegant workaround for the specific case of a user having two graphics cards and a 6Gbps SATA controller on-board. Our tests show that it does work well.

We have to give AMD credit here. Its platform group has clearly done the right thing. By switching to PCIe 2.0 completely and enabling 6Gbps SATA today, its platforms won’t be a bottleneck for any early adopters of fast SSDs. For Intel these issues don't go away until 2011 with the 6-series chipsets (Cougar Point) which will at least enable 6Gbps SATA.
