Final Words

I think we confirmed what we pretty much knew all along: Sandy Bridge's improved memory controller has all but eliminated the need for extreme memory bandwidth, at least for this architecture. It's only when you get down to DDR3-1333 that you see a minor performance penalty. The sweet spot appears to be at DDR3-1600, where you will see a minor performance increase over DDR3-1333 with only a slight increase in cost. The performance increase gained by going up to DDR3-1866 or DDR3-2133 isn't nearly as pronounced.
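For a rough sense of what each speed grade offers on paper, here is a minimal sketch of theoretical peak bandwidth. It assumes a dual-channel configuration with a 64-bit bus per channel and uses the nominal transfer rates; the function name and constants are illustrative and not taken from our test setup. These are ceilings, not measured results.

```python
# Theoretical peak bandwidth for the DDR3 speed grades discussed above.
# Assumptions (not measurements): dual-channel operation, a 64-bit (8-byte)
# data bus per channel, and nominal transfer rates (e.g. 1333 MT/s).

BYTES_PER_TRANSFER = 8   # one 64-bit channel moves 8 bytes per transfer
CHANNELS = 2             # dual-channel configuration assumed


def peak_bandwidth_gbs(transfers_mt_s, channels=CHANNELS):
    """Return theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return transfers_mt_s * 1_000_000 * BYTES_PER_TRANSFER * channels / 1e9


if __name__ == "__main__":
    baseline = peak_bandwidth_gbs(1333)
    for speed in (1333, 1600, 1866, 2133):
        bw = peak_bandwidth_gbs(speed)
        gain = (bw / baseline - 1) * 100
        print(f"DDR3-{speed}: {bw:5.1f} GB/s peak ({gain:+.0f}% vs DDR3-1333)")
```

On paper, DDR3-2133 offers roughly 60% more peak bandwidth than DDR3-1333, which makes the small real-world differences we measured all the more telling.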

As a corollary, we've seen that some applications react differently to higher memory speeds than others. The compression and video encoding tests benefited the most from the increased memory bandwidth, while the overall synthetic benchmark and 3D rendering test did not. If your primary concern is gaming, you’ll want to consider investing in more GPU power instead of faster system memory; likewise, a faster CPU will be far more useful than more memory performance for most applications. Outside of chasing ORB chart placement, memory is one of the components least likely to play a significant role in performance.

We also found that memory bandwidth does scale with CPU clock speed; however, the extra bandwidth still doesn't translate into any meaningful real-world performance gain. The sweet spot still appears to be DDR3-1600. The extra performance gained by overclocking almost certainly comes from the CPU overclock itself and not from the extra memory bandwidth.

Finally, although the effects of low latency memory can be seen in our bandwidth tests, the low latency kits don't show any real-world advantage over their higher latency (ahem, cheaper) counterparts. None of the real-world tests performed showed any reason to prefer low latency over raw speed.
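One way to see why latency ratings and speed grades trade off against each other is to convert CAS latency from clock cycles into nanoseconds. The sketch below uses the standard relationship (DDR memory transfers twice per clock, so one cycle lasts 2000 divided by the transfer rate in MT/s, in nanoseconds). The function name and the speed/CL pairings in the loop are illustrative assumptions, not the exact kits we tested.

```python
# Convert a CAS latency rating (in clock cycles) to absolute time.
# DDR transfers data twice per clock, so the I/O clock in MHz is half the
# transfer rate in MT/s, and one clock cycle lasts 2000 / (MT/s) nanoseconds.


def cas_latency_ns(transfers_mt_s, cas_cycles):
    """Return the absolute CAS latency in nanoseconds."""
    return cas_cycles * 2000.0 / transfers_mt_s


if __name__ == "__main__":
    # Hypothetical speed/CL pairings chosen only to illustrate the trade-off.
    examples = [(1333, 7), (1333, 9), (1600, 9), (1866, 9), (2133, 9)]
    for speed, cl in examples:
        print(f"DDR3-{speed} CL{cl}: {cas_latency_ns(speed, cl):.2f} ns")
```

By this measure a DDR3-1333 CL7 kit (10.5 ns) and a DDR3-1600 CL9 kit (11.25 ns) are within a nanosecond of each other, while the faster kit also brings more bandwidth.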

Even though there's merely a $34 price difference between the fastest and slowest memory tested today, I still don't believe there's any value in the more expensive memory kits on the Sandy Bridge platform. Once you have enough bandwidth (DDR3-1600 at a small $9-$10 price premium), there's just not enough of a performance increase beyond that to justify the additional cost, even when it's only $34 between 4GB kits. Once you jump to the 8GB kits, the price difference for CL9 DDR3-1600 is a mere $8, but it becomes much more pronounced at $92 to move to DDR3-2133. We simply can’t justify such a price difference based on our testing.

Of course, testing with Sandy Bridge doesn't necessarily say anything about other platforms. It's possible that AMD's Llano and Bulldozer platforms will benefit more from higher bandwidth and/or better latency memory, but we'll save that article for another day. We've also shown that integrated graphics solutions can benefit from faster memory, particularly higher performance IGPs like Llano's. Ultimately, it's up to you to choose what's best for your particular situation, and we hope this article will help you make better-informed decisions.

76 Comments

  • mga318 - Monday, July 25, 2011 - link

    You mentioned Llano at the end, but in the Llano reviews & tests, memory bandwidth was tested primarily with little reference to latency. I'd be curious as to which is more important with a higher performance IGP like Llano's. Would CAS 7 (or 6) be preferable over 1866 or 2166 speeds with CAS 8 or 9?
  • DarkUltra - Monday, July 25, 2011 - link

    How about testing Valve's particle benchmark or a Source-based game at low resolution with a non-geometry limited 3D card (Fermi) and overclocked CPU? Valve did an incredible job with their game engine. They used a combination of fine-grained and coarse threading to max out all the CPU cores. Very few games can do that today, but may in the future.
  • DarkUltra - Monday, July 25, 2011 - link

    Why test with 4GB? RAM is cheap, most people who buy the premium 2600K should pair it with two 4GB modules. I imagine Windows would require 4GB ram and games the same in the future. Just look at all the .net developers out there, .net usually results in incredible memory bloated programs.
  • dingetje - Monday, July 25, 2011 - link

    hehe yeah
    .net sucks
  • Atom1 - Monday, July 25, 2011 - link

    Most algorithms on CPU platforms are optimized to have their data inside the CPU cache 99% of the time. If you look at SiSoftware Sandra, where there is a chart of bandwidth as a function of the block size copied, you can see that CPU cache is 10-50x faster than global memory depending on the level. Linpack here is no exception. The primary reason for the success of Linpack is its ability to have data in CPU cache nearly all of the time. Therefore, if you do find an algorithm which can benefit considerably from global memory bandwidth, you can be sure it is a poor job on the programmer's side. I think it is a kind of challenge to see which operations and applications do take a hit when the main memory is 2x faster or 2x slower. I would be interested to see where the breaking point is, when even well written software starts to take a hit.
  • DanNeely - Monday, July 25, 2011 - link

    That's only true for benchmarks and highly computationally intensive apps (and even there many problem classes can't be packed into the cache or written to stream data into it). In the real world where 99% of software's performance is bound by network IO, HD IO, or user input trying to tune data to maximize the CPU cache is wasted engineering effort. This is why most line of business is written using java or .net, not C++; the finer grained memory control of the latter doesn't benefit anything while the higher level nature of the former allows for significantly faster development.
  • Rick83 - Monday, July 25, 2011 - link

    I think image editing (simple computation on large datasets) and engineering software (numerical simulations) are two types of application that benefit more than average from memory bandwidth, and in the second case, latency.
    But, yeah, with CPU caches reaching the tens of Megabytes, Memory bandwidth and latency is getting less important for many problems.
  • MrSpadge - Wednesday, July 27, 2011 - link

    True.. large matrix operations love bandwidth and low latency never hurts. I've seen ~13% speedup on part of my Matlab code going from DDR3-1333 CL9 to DDR3-1600 CL9 on an i7 870!

    MrS
  • Patrick Wolf - Monday, July 25, 2011 - link

    You don't test CPU gaming benchmarks at normal settings cause you may become GPU limited so why do it here?
    http://www.xbitlabs.com/articles/memory/display/sa...
  • dsheffie - Monday, July 25, 2011 - link

    ....uh...Linpack is just LU which in turn is just DGEMM. DGEMM has incredible operand reuse (O(sqrt(cache size))).
