7-Zip

Many people are moving over to 7-Zip for their compression/decompression needs. 7-Zip is not only free and open source, it also includes a built-in benchmark that measures system performance using LZMA compression and decompression. Keep in mind that these tests are run entirely in memory, bypassing any potential disk bottlenecks. The compression routine in particular puts a heavy load on the memory subsystem, as many megabytes of data are scanned for patterns that allow the compression to take place. In that sense, data compression is one of the better real-world tests of memory performance.

7-Zip v9.20 Compressing Benchmark

7-Zip v9.20 Decompressing Benchmark

The compression test shows a near-linear performance increase, with roughly 7% separating the fastest and slowest configurations. If you do a fair amount of compressing, faster memory could save you some time in the long run, assuming you aren't bottlenecked elsewhere, such as by I/O or CPU performance. The decompression test isn't affected by faster memory in the same way: there's no pattern searching going on, just expansion of the matches already found back into the original data. With less than 2% separating the range, memory speed is unlikely to make much of a difference if you're primarily decompressing files.
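
The 7-Zip benchmark itself is the authoritative test here, but if you want a feel for why an in-memory compression run leans on memory while decompression does not, the rough Python sketch below times LZMA compression and decompression of a buffer that never touches disk. The buffer size and contents are arbitrary choices for illustration, not what 7-Zip's benchmark uses.

```python
import lzma
import os
import time

# Build a ~32 MB in-memory buffer with repeating structure so the compressor
# has patterns to find (purely random data barely compresses at all).
block = os.urandom(64 * 1024)
data = block * 512  # ~32 MB

# Compression: the encoder scans the buffer for matches, which is the
# memory-heavy part of the workload.
start = time.perf_counter()
compressed = lzma.compress(data, preset=6)
compress_time = time.perf_counter() - start

# Decompression: simply expands the stored matches back into the original
# data, so it leans far less on the memory subsystem.
start = time.perf_counter()
restored = lzma.decompress(compressed)
decompress_time = time.perf_counter() - start

assert restored == data
mb = len(data) / (1024 * 1024)
print(f"compress:   {mb / compress_time:6.1f} MB/s")
print(f"decompress: {mb / decompress_time:6.1f} MB/s")
```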

x264 HD Benchmark

The x264 HD Benchmark measures how quickly your system can encode a short HD-quality video clip into a high-quality H.264 file. The encode is performed in two separate passes, which are reported individually. Multiple passes are generally used to ensure the highest quality output; the first pass tends to be more I/O-bound, while the second is typically constrained by CPU performance.

x264 HD Benchmark v4.0 - Pass 1

x264 HD Benchmark v4.0 - Pass 2

While not a huge spread, we do see a 5% difference from fastest to slowest in the first pass. The second pass, however, shows a gain of less than 2%. If encoding is one of your system's primary tasks, faster memory could pay off over time, but a faster CPU will be far more beneficial.
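
The x264 HD Benchmark ships with its own clip and scripts, but a comparable two-pass H.264 encode can be timed against any source using ffmpeg's libx264 encoder. The sketch below is only an illustration of the two-pass workflow described above; it assumes ffmpeg is installed on a Unix-like system, and the file names and bitrate are placeholders.

```python
import subprocess
import time

SOURCE = "clip_1080p.mp4"   # placeholder: any short HD clip
BITRATE = "4000k"           # placeholder target bitrate

def timed_run(args):
    """Run an external command and return how long it took in seconds."""
    start = time.perf_counter()
    subprocess.run(args, check=True)
    return time.perf_counter() - start

# Pass 1: analysis only; the encoded video is discarded via the null muxer.
pass1 = timed_run([
    "ffmpeg", "-y", "-i", SOURCE,
    "-c:v", "libx264", "-b:v", BITRATE, "-pass", "1",
    "-an", "-f", "null", "/dev/null",
])

# Pass 2: re-encodes using the statistics gathered in pass 1.
pass2 = timed_run([
    "ffmpeg", "-y", "-i", SOURCE,
    "-c:v", "libx264", "-b:v", BITRATE, "-pass", "2",
    "-an", "clip_out.mp4",
])

print(f"pass 1: {pass1:.1f} s, pass 2: {pass2:.1f} s")
```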

Cinebench 11.5

The Cinebench CPU test scenario uses all of your system's processing power to render a photorealistic 3D scene containing approximately 2,000 objects and nearly 300,000 polygons. This scene makes use of various algorithms to stress all available processor cores, but how does memory speed come into play?

Cinebench R11.5 - CPU

Apparently not much in this benchmark. We're looking at less than a 2% difference from fastest to slowest. It's possible that CAS latency matters more for this type of load, but the variance is too small to say so conclusively. Overall, even a single CPU speed bin would be enough to close the gap between the fastest and slowest memory we tested.

Comments

  • mga318 - Monday, July 25, 2011 - link

    You mentioned Llano at the end, but in the Llano reviews & tests, memory bandwidth was tested primarily with little reference to latency. I'd be curious as to which is more important with a higher performance IGP like Llano's. Would CAS 7 (or 6) be preferable over 1866 or 2166 speeds with CAS 8 or 9?
  • DarkUltra - Monday, July 25, 2011 - link

    How about testing Valve's particle benchmark or a Source-based game at low resolution with a non-geometry-limited 3D card (Fermi) and an overclocked CPU? Valve did an incredible job with their game engine. They used a combination of fine-grained and coarse threading to max out all the CPU cores. Very few games can do that today, but more may in the future.
  • DarkUltra - Monday, July 25, 2011 - link

    Why test with 4GB? RAM is cheap; most people who buy the premium 2600K should pair it with two 4GB modules. I imagine Windows will require 4GB of RAM and games the same in the future. Just look at all the .net developers out there; .net usually results in incredibly memory-bloated programs.
  • dingetje - Monday, July 25, 2011 - link

    hehe yeah
    .net sucks
  • Atom1 - Monday, July 25, 2011 - link

    Most algorithms on the CPU platform are optimized to have their data inside the CPU cache 99% of the time. If you look at SiSoft Sandra's chart of bandwidth as a function of copied block size, you can see that the CPU cache is 10-50x faster than main memory, depending on the cache level. Linpack here is no exception; the primary reason for Linpack's success is its ability to keep its data in the CPU cache nearly all of the time. Therefore, if you do find an algorithm that benefits considerably from main memory bandwidth, you can be sure the programmer did a poor job. I think it is a kind of challenge to see which operations and applications take a hit when main memory is 2x faster or 2x slower. I would be interested to see where the breaking point is, when even well-written software starts to take a hit.
  • DanNeely - Monday, July 25, 2011 - link

    That's only true for benchmarks and highly computationally intensive apps (and even there, many problem classes can't be packed into the cache or written to stream data into it). In the real world, where 99% of software's performance is bound by network I/O, disk I/O, or user input, tuning data to maximize CPU cache use is wasted engineering effort. This is why most line-of-business software is written in Java or .net, not C++; the finer-grained memory control of the latter doesn't benefit anything, while the higher-level nature of the former allows significantly faster development.
  • Rick83 - Monday, July 25, 2011 - link

    I think image editing (simple computation on large datasets) and engineering software (numerical simulations) are two types of application that benefit more than average from memory bandwidth, and in the second case, latency.
    But yeah, with CPU caches reaching tens of megabytes, memory bandwidth and latency are getting less important for many problems.
  • MrSpadge - Wednesday, July 27, 2011 - link

    True.. large matrix operations love bandwidth and low latency never hurts. I've seen ~13% speedup on part of my Matlab code going from DDR3-1333 CL9 to DDR3-1600 CL9 on an i7 870!

    MrS
  • Patrick Wolf - Monday, July 25, 2011 - link

    You don't test CPU gaming benchmarks at normal settings because you may become GPU-limited, so why do it here?
    http://www.xbitlabs.com/articles/memory/display/sa...
  • dsheffie - Monday, July 25, 2011 - link

    ....uh... Linpack is just LU, which in turn is just DGEMM. DGEMM has incredible operand reuse (O(sqrt(cache size))).
