Original Link: http://www.anandtech.com/show/4503/sandy-bridge-memory-scaling-choosing-the-best-ddr3



Investigating Sandy Bridge Memory Scaling

Intel's Second Generation Core processors, based on the Sandy Bridge architecture, include a number of improvements over the previous-generation Nehalem architecture. We’ll be testing one specific area today: the improved memory controller. Current Sandy Bridge-based processors officially support up to DDR3-1333 memory. Unfortunately, due to changes in the architecture, running faster-rated memory (or overclocking memory) by raising the base clock is extremely limited on Sandy Bridge. Luckily, there are additional memory multipliers that support DDR3-1600, DDR3-1866, and DDR3-2133 memory. Some motherboards include support for even higher memory multipliers, but we’ll confine our investigations to DDR3-2133 and below.

Since Sandy Bridge is rated for up to DDR3-1333 memory, we will start there and work our way up to DDR3-2133 memory. We'll also be testing a variety of common CAS latency options for these memory speeds. Our purpose is to show how higher bandwidth memory affects performance on Sandy Bridge, and how latency changes—or doesn’t change—the picture. More specifically, we’ll be looking at the impact of memory speed on application and gaming performance, with some synthetic memory tests thrown into the mix. We’ll also test some overclocked configurations. So how much difference will lowering the CAS latency make, and does memory performance scale with processor clock speed?
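For context, here's a back-of-envelope calculation (not a figure from the article) of the theoretical peak bandwidth each DDR3 speed grade provides in a dual-channel configuration like Sandy Bridge's:

```python
# Theoretical peak bandwidth for dual-channel DDR3 (illustrative only):
# transfers/s x 8 bytes per 64-bit channel x 2 channels.
def peak_bandwidth_gbs(data_rate_mts, channels=2, bus_bytes=8):
    """Return theoretical peak bandwidth in GB/s for a given data rate in MT/s."""
    return data_rate_mts * bus_bytes * channels / 1000.0

for rate in (1333, 1600, 1866, 2133):
    print(f"DDR3-{rate}: {peak_bandwidth_gbs(rate):.1f} GB/s")
# DDR3-1333: 21.3 GB/s
# DDR3-1600: 25.6 GB/s
# DDR3-1866: 29.9 GB/s
# DDR3-2133: 34.1 GB/s
```

Real-world throughput falls well short of these peaks, but the ratios give a sense of the headroom each step up in speed grade offers.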

Back when I originally envisioned this comparison, the price gap between DDR3-1333 and DDR3-2133 memory was much wider. A quick scan of Newegg reveals that a mere $34 separates those two 4GB kits. Below is a breakdown of the lowest prices (as of 7/16/2011) for various memory configurations.

4GB 2x2GB Kits
DDR3-1333 CL9 $31
DDR3-1333 CL8 $40
DDR3-1600 CL9 $40
DDR3-1600 CL8 $41
DDR3-1333 CL7 $45
DDR3-1600 CL7 $50
DDR3-1866 CL9 $60
DDR3-2133 CL9 $65


8GB 2x4GB Kits
DDR3-1333 CL9 $58
DDR3-1600 CL9 $66
DDR3-1333 CL7 $75
DDR3-1600 CL8 $80
DDR3-1866 CL9 $85
DDR3-1600 CL7 $115
DDR3-2133 CL9 $150

You can see from the above chart that balancing memory clocks with latency results in some interesting choices, particularly on the 8GB kits where price differences are a bit larger. Is it best to go with a slower clock speed and better timings, or vice versa, or is the optimal path somewhere in between? That’s the aim of this article.
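To make the tradeoff concrete, here's the premium each 8GB kit carries over the baseline DDR3-1333 CL9 kit, computed from the prices listed above:

```python
# Premium over the cheapest kit, using the 8GB (2x4GB) prices from the table.
kits_8gb = {
    "DDR3-1333 CL9": 58, "DDR3-1600 CL9": 66, "DDR3-1333 CL7": 75,
    "DDR3-1600 CL8": 80, "DDR3-1866 CL9": 85, "DDR3-1600 CL7": 115,
    "DDR3-2133 CL9": 150,
}
base = kits_8gb["DDR3-1333 CL9"]
for name, price in kits_8gb.items():
    print(f"{name}: ${price} (+{100 * (price - base) / base:.0f}%)")
```

Going from DDR3-1333 CL9 to DDR3-1600 CL9 costs about 14% more, while DDR3-2133 CL9 carries a 159% premium, which frames the performance question the rest of this article tries to answer.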



Test Configuration and Settings

For our testing, we used the following system.

Memory Benchmarking System Configuration
CPU Intel Core i7-2600K (Stock with Turbo Boost enabled: 3.5GHz - 3.8GHz)
Motherboard ASUS P8P67 Pro - BIOS version 1502
Memory Patriot Viper Extreme Division 2 4GB (2x2GB) DDR3-2133 Kit
Graphics MSI GTX 580 Lightning - Stock clocks (832MHz/1050MHz)
SSD OCZ Agility 2 120GB
PSU Corsair HX850 Power Supply
OS Microsoft Windows 7 Professional 64-bit

You’ll notice that we list only one specific set of memory; I don't have modules specifically rated for each of the memory speeds tested. Instead, I used a pair of DDR3-2133 modules that worked flawlessly at all of the lower speeds. Thanks to Patriot for supplying the DDR3-2133 4GB kit used for today's testing. To ensure my results weren't skewed, I tested a pair of DDR3-1600 CL9 modules against the DDR3-2133 CL9 modules downclocked to DDR3-1600 CL9; the results were identical. There may be minor variations between memory brands, but our testing should be sufficient as a baseline measurement of what to expect. We then used the following clock speeds and timings:

Tested Memory Speeds
DDR3-1333: 7-7-7-18-2T, 8-8-8-18-2T, 9-9-9-18-2T
DDR3-1600: 7-8-7-21-2T, 8-8-8-21-2T, 9-9-9-21-2T
DDR3-1866: 8-9-8-24-2T, 9-9-9-24-2T
DDR3-2133: 9-11-9-27-2T
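The timings above trade clock speed against CAS cycles, and the absolute first-word latency each combination implies is easy to compute (a standard back-of-envelope calculation, not from the article): CAS cycles times the clock period, where the I/O clock runs at half the data rate.

```python
# First-word CAS latency in nanoseconds: CL cycles x clock period.
# The DDR3 I/O clock is half the data rate, so period_ns = 2000 / data_rate.
def cas_ns(data_rate_mts, cl):
    """Absolute CAS latency in ns for a given data rate (MT/s) and CL."""
    return cl * 2000.0 / data_rate_mts

for rate, cl in [(1333, 7), (1333, 9), (1600, 7), (1600, 9),
                 (1866, 9), (2133, 9)]:
    print(f"DDR3-{rate} CL{cl}: {cas_ns(rate, cl):.2f} ns")
# e.g. DDR3-1333 CL9 -> 13.50 ns; DDR3-2133 CL9 -> 8.44 ns
```

Note that DDR3-2133 CL9 actually has lower absolute latency than DDR3-1333 CL7 (8.44 ns vs. 10.50 ns), so higher CAS numbers at higher clocks aren't necessarily a step backward.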

Testing Procedures

Each of the tests was performed three times, with the average of those three runs used for the final results. There were a few exceptions. First, PCMark 7 was only run once, because it loops three times internally before providing its score. Second, the x264 HD Benchmark was only run once, because it loops four times in a single run. Third and finally, the LINPACK benchmark was looped twenty-five times, because it doubled as our stability test. And with that out of the way, let’s get to the test results.



AIDA64 Memory Benchmark

AIDA64 provides a basic synthetic benchmark for comparing the read, write, and copy performance of system memory while also measuring latency. This should provide us the raw bandwidth for each memory configuration that was tested. Later on, we'll see how this translates into real world performance.

AIDA64 v1.60.1300 - Memory Read

AIDA64 v1.60.1300 - Memory Write

AIDA64 v1.60.1300 - Memory Copy

AIDA64 v1.60.1300 - Memory Latency

Our preliminary results show us the expected memory scaling. The faster DDR3-2133 memory has a ~36% advantage over the slowest DDR3-1333 memory in the read test, and the copy and latency tests show similar results. However, the write test closes the gap with only a ~7% difference between the fastest and the slowest. Overall, we see a linear performance increase as the memory clock speed is raised as well as when the CAS latency is lowered. Synthetic tests are really the best-case scenario, so let's move on to find out how the extra raw bandwidth affects our other tests.



LINPACK Benchmark

At first I wasn't going to include the results of the LINPACK benchmark, but I figured there's no reason for them to go to waste since they were used for stability testing anyway. The LINPACK benchmark measures a system's floating-point computing power; today, it's widely used by enthusiasts for testing the stability of overclocked systems. Later versions of LINPACK include support for Intel's AVX instruction set, which stresses the CPU and RAM even more than before. We'll be using LinX, a front end to the LINPACK benchmark.

LinX v0.6.4 - Linpack Benchmark v10.3.4.007

Now we begin to see how that extra ~36% of bandwidth really affects system performance. As you can see, there's not exactly a ~36% advantage in LINPACK from the fastest to the slowest. Here, we're barely seeing a ~3% advantage for the faster memory. Once we get to DDR3-1600, there's not much of a difference at all.

PCMark 7

We'll measure overall system performance using the PCMark suite. This will perform a broad range of tests including video playback, video transcoding (downscaling), system storage (gaming), graphics (DX9), image manipulation, system storage (importing pictures), web browsing, data decrypting, and system storage (Windows Defender).

PCMark 7 v1.0.4 - PCMark Suite

If you take a step back and look at performance from an overall perspective, you can see that faster memory doesn't really have much of an effect. Every other speed tested shows only a ~2% performance increase over the slowest configuration. Outside of CAS 9 DDR3-1333, then, you can pretty much use any DDR3 memory and get close to optimal performance in general applications.



7-Zip

Many people are moving over to 7-Zip for their compression/decompression needs. 7-Zip is not only free and open source, but it also has a built-in benchmark for measuring system performance using the LZMA compression/decompression method. Keep in mind that these tests are run in memory and bypass any potential disk bottlenecks. The compression routines in particular can put a heavy load on the memory subsystem, as many megabytes of data are scanned for patterns that allow the compression to take place. In a sense, data compression is one of the best real-world tests for memory performance.
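To illustrate the kind of in-memory, pattern-scanning workload the benchmark exercises, here's a rough Python sketch using the standard library's lzma module (LZMA is the same algorithm family; this is not 7-Zip's benchmark itself):

```python
import lzma
import os
import time

# A mix of incompressible and highly compressible data, held entirely in RAM
# so no disk I/O is involved, mirroring how 7-Zip's benchmark operates.
data = os.urandom(1 << 20) + b"pattern" * (1 << 17)

start = time.perf_counter()
packed = lzma.compress(data, preset=6)
elapsed = time.perf_counter() - start

print(f"{len(data) / 1e6:.1f} MB -> {len(packed) / 1e6:.1f} MB "
      f"in {elapsed:.2f} s ({len(data) / 1e6 / elapsed:.1f} MB/s)")
```

The compressor walks a large dictionary window looking for repeated patterns, which is exactly the sort of access pattern that stresses memory bandwidth and latency rather than raw ALU throughput.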

7-Zip v9.20 Compressing Benchmark

7-Zip v9.20 Decompressing Benchmark

The compression test shows a linear performance increase with a ~7% variance between the fastest and slowest. If you do a fair amount of compressing, you could potentially save some time in the long run by using faster memory. This, of course, is assuming you're not bottlenecked elsewhere such as in your I/O or CPU performance. The decompression test isn’t affected by faster memory in the same way, as there’s no pattern recognition going on; it’s simply expanding the already found patterns into the original files. With less than 2% separating the range, it's unlikely to make much of a difference if you’re primarily decompressing files.

x264 HD Benchmark

The x264 HD Benchmark measures how fast your system can encode a short HD-quality video clip into a high quality H.264 video file. There are two separate passes performed and compared. Multiple passes are generally used to ensure the highest quality video output, and the first pass tends to be more I/O bound while the second pass is typically constrained by CPU performance.

x264 HD Benchmark v4.0 - Pass 1

x264 HD Benchmark v4.0 - Pass 2

While not a huge spread, we do see a difference of 5% from the fastest to the slowest in the first pass. The second pass, however, shows a less than 2% gain. If encoding is one of your system's primary tasks, it's possible that having faster memory could pay off over time, but a faster CPU will be far more beneficial.

Cinebench 11.5

The Cinebench CPU test scenario uses all of your system's processing power to render a photorealistic 3D scene containing approximately 2,000 objects and nearly 300,000 polygons. This scene makes use of various algorithms to stress all available processor cores, but how does memory speed come into play?

Cinebench R11.5 - CPU

Apparently not much in this benchmark. We're looking at a less than 2% difference from the fastest to the slowest. It's possible that CAS latency is more important for this type of load, but due to the extremely small variance, I don't believe that statement is conclusive. Overall, even a single CPU bin would be enough to close the gap between the fastest and slowest memory we tested.



3DMark 11

We're going to start the graphics benchmarks with the synthetic 3DMark test. The latest version, 3DMark 11, is still very GPU dependent. However, it does include a CPU Physics test and a combined graphics/physics test for simulating those types of loads. We’ll use the overall score with the three subtests to see if we can find any areas where memory performance makes a noticeable difference.

3DMark 11 v1.02 - Performance Preset (Overall)

3DMark 11 v1.02 - Performance Preset (Graphics)

3DMark 11 v1.02 - Performance Preset (Physics)

3DMark 11 v1.02 - Performance Preset (Combined)

The overall score, which is heavily weighted toward the graphics tests, shows a mere ~1% change across the board. In the graphics test itself, faster memory makes absolutely no difference at all. It's not until we get to the physics test that we see some improvement from increasing the memory speed: a performance boost of up to 11% when going from DDR3-1333 to DDR3-2133. The combined test renders a 3D scene on the GPU while performing physics tasks on the CPU. Here again, we see a very small 2% increase in performance from the slowest to the fastest.

Crysis and Metro 2033

Based on 3DMark 11, then, we’d expect most games to show very little improvement from upgrading your memory, but we ran several gaming benchmarks just to be sure. I decided to combine the analysis for Crysis: Warhead and Metro 2033 due to the virtually non-existent differences observed during these tests. Crysis: Warhead was the previous king of the hill when it came to bringing video cards to their knees. The newer kid on the block, Metro 2033, has somewhat taken over that throne. Just how do they react to the various memory configurations we're testing today?

It's worth noting that the settings used here are the settings that I would actually play these games at: 1920x1080 with most of the high quality features enabled. Frame rates are well above 30, so definitely playable, though they’re below 60 so some would say they’re not perfectly smooth. Regardless, unless you play at settings where your GPU isn’t the primary bottleneck, you should see similar scaling from memory performance.

Crysis: Warhead - 1920x1080 0xAA DX10 Enthusiast 64-bit - Frost

Metro 2033 - 1920x1080 AAA 16xAF DX11 Very High - Frontline

The results weren't very stimulating, were they? Just as expected, gaming with faster memory just doesn't make any notable difference. I could have potentially lowered the resolution and settings in an attempt to produce some sort of difference, but I felt that testing these games at the settings they're most likely to be played at was far more enlightening. If you want better gaming performance, the GPU is the best component to upgrade—no news there.



Memory Scaling with Overclocking

What happens when we increase the CPU clock speed on our Core i7-2600K from the default 3.5GHz to 4.8GHz; how will that affect memory performance? To find out, I ran the memory bandwidth tests again comparing DDR3-1333 CL9, DDR3-1600 CL9, and DDR3-2133 CL9 at both 3.5GHz and 4.8GHz CPU clock speeds. I also ran the most bandwidth intensive real-world test along with the least bandwidth intensive real-world test at the overclocked CPU speed to see if the faster CPU clock speed made any difference here as well.

AIDA64 v1.60.1300 - Memory Read (Overclocked)

AIDA64 v1.60.1300 - Memory Write (Overclocked)

AIDA64 v1.60.1300 - Memory Copy (Overclocked)

AIDA64 v1.60.1300 - Memory Latency (Overclocked)

The AIDA64 memory benchmark shows that memory bandwidth does scale with CPU clock speed. Going from DDR3-1333 to DDR3-1600 showed a 14% boost on our stock CPU while showing a 16% boost on our overclocked CPU. Stepping up from DDR3-1333 to DDR3-2133 saw a 33% increase on the stock CPU and a 43% increase on our overclocked CPU. The copy and latency tests showed similar results. What's more impressive is that the write test showed a much larger 15% increase from DDR3-1333 to DDR3-1600 on the overclocked CPU compared to 3% on the stock CPU. Going from DDR3-1333 to DDR3-2133 increased write performance by 22% when overclocked compared to 7% when stock. While it's interesting to see how an overclocked CPU affects raw memory bandwidth, I'm much more interested to see how it affects our real-world benchmarks.

x264 HD Benchmark v4.0 - Pass 1 (Overclocked)

x264 HD Benchmark v4.0 - Pass 2  (Overclocked)

Cinebench R11.5 - CPU (Overclocked)

The extra bandwidth gained with the overclocked CPU doesn't exactly translate into much. The first pass of the x264 test reveals a 7% advantage for DDR3-2133 over DDR3-1333 on our overclocked CPU while the stock CPU shows a 5% increase. The increase for DDR3-1600 over DDR3-1333 is 3% for both our overclocked and stock CPUs. Once we move on to the second pass, there's no discernible advantage for faster memory on our overclocked system. The Cinebench test results are every bit as unimpressive with overclocking as at stock: overclocked or not, faster memory makes no real difference (though the faster CPU clock speed definitely helps a lot).



Final Words

I think we confirmed what we pretty much knew all along: Sandy Bridge's improved memory controller has all but eliminated the need for extreme memory bandwidth, at least for this architecture. It's only when you get down to DDR3-1333 that you see a minor performance penalty. The sweet spot appears to be at DDR3-1600, where you will see a minor performance increase over DDR3-1333 with only a slight increase in cost. The performance increase gained by going up to DDR3-1866 or DDR3-2133 isn't nearly as pronounced.

As a corollary, we've seen that some applications react differently to higher memory speeds than others. The compression and video encoding tests benefited the most from the increased memory bandwidth, while the overall synthetic benchmark and 3D rendering test did not. If your primary concern is gaming, you’ll want to invest in more GPU power instead of faster system memory; likewise, a faster CPU will be far more useful than more memory performance for most applications. Outside of chasing ORB chart placement, memory is one of the components least likely to play a significant role in performance.

We also found that memory bandwidth does scale with CPU clock speed; however, it still doesn't translate into any meaningful real-world performance. The sweet spot still appears to be DDR3-1600. All of the extra performance gained by overclocking almost certainly comes from the CPU overclock itself and not from the extra memory bandwidth.

Finally, although the effects of low-latency memory can be seen in our bandwidth tests, it doesn't show any real-world advantage over its higher-latency (ahem, cheaper) counterparts. None of the real-world tests performed showed any reason to prefer low latency over raw clock speed.

Even though there's merely a $34 price difference between the fastest and slowest memory tested today, I still don't believe there's any value in the more expensive memory kits on the Sandy Bridge platform. Once you have enough bandwidth (DDR3-1600 at a small $9-$10 price premium), there's just not enough of a performance increase beyond that to justify the additional cost, even when it's only $34 between 4GB kits. Once you jump to the 8GB kits, the price difference for CL9 DDR3-1600 is a mere $8, but it becomes much more pronounced at $92 to move to DDR3-2133. We simply can’t justify such a price difference based on our testing.

Of course, testing with Sandy Bridge doesn't necessarily say anything about other platforms. It's possible that AMD's Llano and Bulldozer platforms will benefit more from higher-bandwidth and/or lower-latency memory, but we'll save that article for another day. We've also shown that integrated graphics solutions can benefit from faster memory, particularly higher-performance IGPs like Llano. Ultimately, it's up to you to choose what's best for your particular situation, and we hope this article will help you make better-informed decisions.
