Memory Scaling on Haswell CPU, IGP and dGPU: DDR3-1333 to DDR3-3000 Tested with G.Skill

Author: Dr. Ian Cutress

by Ian Cutress on September 26, 2013 4:00 PM EST

Posted in: Memory, G.Skill, Haswell, DDR3


One side of CPUs I like to exploit is raw compute, and whether a variety of mathematical loads can stress the system in a way that real-world usage might not. For these benchmarks we are using ones developed for testing MP servers and workstation systems back in early 2013, such as grid solvers and Brownian motion code. Please head over to the first of such reviews where the mathematics and small snippets of code are available.

3D Movement Algorithm Test

The algorithms in 3DPM employ either uniform or normally distributed random number generation, and differ in how much they rely on trigonometric operations, conditional statements, generation and rejection, fused operations, etc. The benchmark runs through six algorithms for a specified number of particles and steps, calculates the speed of each algorithm, then sums them all for a final score. This is an example of a real-world situation that a computational scientist may find themselves in, rather than a pure synthetic benchmark. The benchmark is also parallel across the simulated particles, and we test single-threaded as well as multi-threaded performance. Results are expressed in millions of particles moved per second, and a higher number is better.
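To make the description concrete, here is a minimal sketch of two ways a particle could be given a random unit step: one using trigonometric functions and one using generation and rejection. This is not the 3DPM source; the function names, the unit step length and the way the score is computed are all assumptions made for illustration.

```python
import math
import random
import time


def step_trig(rng):
    """Uniform random direction on the unit sphere via trigonometric functions."""
    z = rng.uniform(-1.0, 1.0)               # cosine of the polar angle
    phi = rng.uniform(0.0, 2.0 * math.pi)    # azimuthal angle
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def step_rejection(rng):
    """Uniform random direction via generation and rejection in the unit ball."""
    while True:
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        z = rng.uniform(-1.0, 1.0)
        d2 = x * x + y * y + z * z
        if 0.0 < d2 <= 1.0:                  # reject points outside the ball
            inv = 1.0 / math.sqrt(d2)
            return (x * inv, y * inv, z * inv)


def mean_square_displacement(n_particles, n_steps, step_fn, seed=0):
    """Move each particle n_steps times, keeping only three floats of state."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_particles):
        x = y = z = 0.0
        for _ in range(n_steps):
            dx, dy, dz = step_fn(rng)
            x += dx
            y += dy
            z += dz
        total += x * x + y * y + z * z       # particle is discarded after this
    return total / n_particles


def score_mpps(n_particles, n_steps, step_fn):
    """Hypothetical score: millions of particle movements per second."""
    start = time.perf_counter()
    mean_square_displacement(n_particles, n_steps, step_fn)
    elapsed = time.perf_counter() - start
    return n_particles * n_steps / elapsed / 1e6
```

Each particle only ever needs its own three coordinates, which is why a loop of this shape has such a small working set.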

Single-Threaded:

For software that deals with one particle movement at a time and then discards it, very few memory accesses go beyond the caches into main DRAM. As a result, we see little differentiation between the memory kits, except perhaps a small decline at 3000 C12, possibly caused by a loose automatic setting.

Multi-Threaded:

With all the cores loaded, the caches should be under more pressure, yet in the 3DPM-MT test we see less than a 2% difference across the results and no trend that would suggest a consistent gain from faster memory.

N-Body Simulation

When a series of heavy masses are in space, they interact with each other through the force of gravity. Thus when a star cluster forms, the interaction of every large mass with every other large mass defines the speed at which these elements approach each other. When dealing with millions and billions of stars on such a large scale, the movement of each of these stars can be simulated through the physical theorems that describe the interactions. The benchmark detects whether the processor is SSE2 or SSE4 capable, and runs the relevant code path. We run a simulation of 10240 particles of equal mass; the output of this code is in GFLOPS, and the result recorded is the peak GFLOPS value.

Despite every particle interacting with every other, the fact that a simulation of this scale can hold all of its particles in the caches between time steps means that memory speed has no effect on the result.
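The all-pairs interaction and the cache argument can be sketched as follows. This is an illustrative version, not the benchmark's SSE code: the softening term, the gravitational constant of 1 and the assumed layout of seven single-precision values per particle (position, velocity, mass) are assumptions. Note also that plain Python floats are double precision.

```python
import math


def accelerations(pos, mass, softening=1e-3, g=1.0):
    """All-pairs gravitational accelerations: n * (n - 1) interactions."""
    n = len(pos)
    eps2 = softening * softening             # avoids division by zero at r = 0
    acc = []
    for i in range(n):
        xi, yi, zi = pos[i]
        ax = ay = az = 0.0
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            s = g * mass[j] / (r2 * math.sqrt(r2))   # G * m_j / r^3
            ax += s * dx
            ay += s * dy
            az += s * dz
        acc.append((ax, ay, az))
    return acc


def footprint_bytes(n_particles, values_per_particle=7, bytes_per_value=4):
    """Assumed storage: x, y, z, vx, vy, vz, m in single precision."""
    return n_particles * values_per_particle * bytes_per_value


# 10240 particles, as in the test above
print(footprint_bytes(10240) / 1024, "KiB")
```

Under those assumptions the whole particle set is a few hundred kilobytes, while the arithmetic grows with the square of the particle count, so the work per byte fetched is very high.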

Grid Solvers - Explicit Finite Difference

For any grid of regular nodes, the simplest way to calculate a node's value at the next time step is to use the values of the nodes around it at the current time step. This makes for easy mathematics and parallel simulation, as each node calculated depends only on the previous time step, not on the neighbouring nodes in the time step being calculated. By choosing a regular grid, we avoid the extra levels of memory access that irregular grids require. We test both 2D and 3D explicit finite difference simulations with 2ⁿ nodes in each dimension, in single precision, using OpenMP for threading. The grid is isotropic and the boundary conditions are sinks. We iterate through a series of grid sizes, and results are shown in terms of ‘million nodes per second’ where the peak value is given in the results – higher is better.
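As a rough illustration of the explicit update, the sketch below performs one time step on a 2D grid with sink boundaries. The article does not say which equation is being solved, so simple diffusion with a five-point stencil is assumed here, with `alpha` standing for D·dt/h² (this scheme is stable for `alpha` up to 0.25 in 2D). The function name is hypothetical.

```python
def explicit_step(u, alpha=0.2):
    """One explicit diffusion step on a square grid (list of rows).

    Every new value depends only on the previous time step, so all
    interior nodes could be computed in parallel. Boundary nodes are
    held at zero, acting as sinks.
    """
    n = len(u)
    new = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = u[i][j] + alpha * (
                u[i - 1][j] + u[i + 1][j]
                + u[i][j - 1] + u[i][j + 1]
                - 4.0 * u[i][j]
            )
    return new


# Usage: a single spike in the middle of a small grid spreads to its neighbours.
grid = [[0.0] * 8 for _ in range(8)]
grid[4][4] = 1.0
grid = explicit_step(grid)
```

Two full grids are live at once (old and new time step), which is what eventually pushes larger simulations out of cache.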

Two-Dimensional Grid:

In 2D we get a small bump in calculation speed at 1600 C9, with all other results being fairly equal. Statistically this looks like an outlier, although the result seemed repeatable.

Three Dimensions:

In three dimensions, the memory jumps required to access new rows of the simulation are far greater, resulting in L3 cache misses and accesses into main memory when the simulation is large enough. At this boundary it seems that low CAS latencies work well, as do memory speeds above 2400 MHz. The 2400 C12 result is a surprise.
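The size of those memory jumps can be estimated with some simple arithmetic. The sketch below assumes a row-major array of single-precision values and two live buffers (old and new time step); both are assumptions about the layout rather than details taken from the benchmark, and the helper names are hypothetical.

```python
def neighbour_strides_bytes(n, bytes_per_value=4):
    """Byte distance to the next node along each axis of an n^3 row-major grid."""
    return {
        "x": bytes_per_value,            # adjacent in memory
        "y": n * bytes_per_value,        # one row away
        "z": n * n * bytes_per_value,    # one plane away
    }


def grid_footprint_bytes(n, dims, buffers=2, bytes_per_value=4):
    """Total storage for `buffers` copies of an n^dims grid."""
    return buffers * (n ** dims) * bytes_per_value


for exponent in range(5, 10):
    n = 2 ** exponent
    mib = grid_footprint_bytes(n, dims=3) / 2 ** 20
    print(n, neighbour_strides_bytes(n)["z"], "byte plane stride,", mib, "MiB")
```

Under these assumptions a 128-node cube already needs 16 MiB for its two buffers, and every stencil reaches a full plane forwards and backwards, so the larger 3D sizes cannot stay in a last-level cache of a few megabytes.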

Grid Solvers - Implicit Finite Difference + Alternating Direction Implicit Method

The implicit method takes a different approach to the explicit method – instead of considering one unknown in the new time step to be calculated from known elements in the previous time step, we consider that an old point can influence several new points by way of simultaneous equations. This adds to the complexity of the simulation – the grid of nodes is solved as a series of rows and columns rather than points, reducing the parallel nature of the simulation by a dimension and drastically increasing the memory requirements of each thread. The upside, as noted above, is the less stringent stability rules related to time steps and grid spacing. For this we simulate a 2D grid of 2ⁿ nodes in each dimension, using OpenMP in single precision. Again our grid is isotropic with the boundaries acting as sinks. We iterate through a series of grid sizes, and results are shown in terms of ‘million nodes per second’ where the peak value is given in the results – higher is better.
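A common way to implement this kind of row-and-column solve is a Peaceman-Rachford style ADI step, where each row or column becomes a tridiagonal system solved with the Thomas algorithm. The sketch below assumes that scheme and a diffusion equation, with `alpha` standing for D·dt/(2h²); the article does not confirm either, and all function names are hypothetical.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system; a = sub-, b = main, c = super-diagonal.

    a[0] and c[-1] are unused. The forward and backward recurrences are
    inherently sequential along the row.
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def implicit_row_sweep(u, alpha):
    """Half step: implicit along each row, explicit across rows; zero boundaries."""
    n = len(u)
    m = n - 2                                   # interior nodes per row
    a = [-alpha] * m
    b = [1.0 + 2.0 * alpha] * m
    c = [-alpha] * m
    new = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):                   # rows are independent of each other
        rhs = [
            u[i][j] + alpha * (u[i - 1][j] - 2.0 * u[i][j] + u[i + 1][j])
            for j in range(1, n - 1)
        ]
        new[i][1:n - 1] = thomas(a, b, c, rhs)
    return new


def transpose(u):
    return [list(row) for row in zip(*u)]


def adi_step(u, alpha):
    """Full step: sweep the rows, then sweep the columns via a transpose."""
    half = implicit_row_sweep(u, alpha)
    return transpose(implicit_row_sweep(transpose(half), alpha))
```

Only whole rows (or columns) can be handed out to threads, and each thread needs scratch arrays the length of a row, which matches the reduced parallelism and higher per-thread memory use described above.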

2D Implicit:

Despite the nature of implicit calculations, it would seem that as long as 1333 MHz is avoided, results are fairly similar, with 1866 C8 being a surprise outlier.


89 Comments


Rainman11 - Tuesday, October 1, 2013 - link
The gaming segment was utterly pointless. Show the difference using a resolution of at least 1080p or don't even bother including it.
Anonymuze - Tuesday, October 1, 2013 - link
I'm really curious to see a similar test on HD5000 or (28W) HD5100 - they don't have the benefit of EDRAM like the HD5200 and should be much closer to being memory bandwidth limited than HD4600.
Anonymuze - Tuesday, October 1, 2013 - link
..."should be much closer to being memory bandwidth limited"
I meant to say "should be much closer to memory bandwidth limits" or "should be much more memory bandwidth limited" - pick one :P
Hrel - Thursday, October 3, 2013 - link
This is a lot of pages on content that all just tells you to buy 1866-CL9. Good to know.
SetiroN - Friday, October 4, 2013 - link
Ian, you REALLY should include code compilation benchmarks.
80% of the people I know who actually need a powerful CPU/RAM/SSD combination use it to build software.
You took the time to test IGP performance (who the spends money on RAM to play on an HD4000?) when you could have provided much more useful data. :)
dreamer77dd - Saturday, October 5, 2013 - link
AMD might like higher speed RAM then Intel. That could be interesting article also.
Laststop311 - Sunday, October 6, 2013 - link
This article just confirmed my suspicions, that this more expensive faster ram basically has no effect on your system. Basically anything 1866+ is going to be relatively the same performance. I use 2133mhz CAS 8 ram in my system and am totally happy and only paid 105 for 4x4GB kit.
SmokingCrop - Sunday, October 27, 2013 - link
What a useless test.. Now we don't even know if resolution matters..
No one is going to be doing crossfire so (s)he can play on 1 monitor with 1360x768 pixels..
qiplayer - Saturday, November 2, 2013 - link
I don't understand testing a 3000mhz kit and to evaluate gaming performance use that resolution (extremely low) and even not one gpu.
I would suggest to once test the difference with this very interesting test on a triple hd resolution with 2 or 3 gpu. Or even better, as we talk about memory for the enthusiasts, cpu should be overclocked, gpu should be at least 2 and overclocked.
Te title cud be: Aiming at 120hz on 5800x1080, how much to spend on the ram?
Maybe it comes out that 150$ more on memory are enough for 5% higher fps, that are not nothing when spending already some $$$$ on gpu to get the best, another $$$ on cpu and $$$$to put all on water.
