CPU Compute

One aspect of CPUs I like to probe is raw compute: whether a variety of mathematical workloads can stress the system in a way that real-world usage might not.  For these benchmarks we use ones developed for testing MP servers and workstation systems back in early 2013, such as grid solvers and Brownian motion code.  Please head over to the first of those reviews, where the mathematics and small snippets of code are available.

3D Movement Algorithm Test

The algorithms in 3DPM employ either uniform or normally distributed random number generation, and differ in how heavily they rely on trigonometric operations, conditional statements, generation and rejection, fused operations, etc.  The benchmark runs through six algorithms for a specified number of particles and steps, calculates the speed of each algorithm, then sums them all for a final score.  This is an example of a real-world situation that a computational scientist may find themselves in, rather than a purely synthetic benchmark.  The benchmark is also parallel across the particles simulated, and we test single-threaded as well as multi-threaded performance.  Results are expressed in millions of particles moved per second, and a higher number is better.
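As a rough illustration of the kind of work described above, here is a minimal single-threaded sketch of one particle movement algorithm.  It is not the 3DPM source: the function names (`random_unit_vector`, `move_particles`, `score`) are hypothetical, the direction is picked with a trigonometric method built on uniform random numbers, and the score counts particle steps per second, which is an assumption about how 'particles moved' is tallied.

```python
import math
import random
import time


def random_unit_vector(rng):
    """Pick a uniformly distributed direction on the unit sphere.

    Trigonometric method: z is uniform in [-1, 1] and the azimuth is
    uniform in [0, 2*pi). This is one of several possible algorithms.
    """
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return r * math.cos(phi), r * math.sin(phi), z


def move_particles(n_particles, n_steps, seed=0):
    """Move each particle n_steps unit-length steps from the origin."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_particles):
        x = y = z = 0.0
        for _ in range(n_steps):
            dx, dy, dz = random_unit_vector(rng)
            x += dx
            y += dy
            z += dz
        finals.append((x, y, z))
    return finals


def score(n_particles, n_steps):
    """Return millions of particle steps per second (assumed metric)."""
    start = time.perf_counter()
    move_particles(n_particles, n_steps)
    elapsed = time.perf_counter() - start
    return n_particles * n_steps / elapsed / 1e6
```

Each particle is independent of the others, which is why this kind of code parallelizes cleanly across particles in a multi-threaded run.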

3D Particle Movement: Single Threaded

3D Particle Movement: Multi-Threaded

N-Body Simulation

When a series of heavy mass elements are in space, they interact with each other through the force of gravity.  Thus when a star cluster forms, the interaction of every large mass with every other large mass defines the speed at which these elements approach each other.  When dealing with millions and billions of stars on such a large scale, the movement of each of these stars can be simulated through the physical laws that describe the interactions.  The benchmark detects whether the processor is SSE2 or SSE4 capable, and runs the relevant code path.  We run a simulation of 10240 particles of equal mass; the output of this code is in GFLOPs, and the result recorded is the peak GFLOPs value.
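The all-pairs interaction described above can be sketched as a direct summation over every pair of equal-mass bodies.  This is an illustrative sketch only, not the benchmark's code: the names `accelerations` and `estimated_gflops` are hypothetical, the softening term is an assumption added to avoid division by zero, and the figure of 20 floating point operations per interaction is a counting convention assumed here, not one stated in the text.

```python
import math


def accelerations(pos, mass=1.0, g=1.0, soft=1e-3):
    """Gravitational acceleration on each body from every other body.

    pos is a list of (x, y, z) tuples; all bodies share the same mass.
    soft is a softening length (an assumption of this sketch).
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi, zi = pos[i]
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + soft * soft
            s = g * mass / (r2 * math.sqrt(r2))  # G * m / r^3
            acc[i][0] += s * dx
            acc[i][1] += s * dy
            acc[i][2] += s * dz
    return acc


def estimated_gflops(n, seconds, flops_per_interaction=20):
    """Estimate GFLOPs for one all-pairs pass over n bodies.

    flops_per_interaction=20 is an assumed counting convention.
    """
    return n * (n - 1) * flops_per_interaction / seconds / 1e9
```

The cost grows with the square of the particle count, so a 10240-particle run performs roughly 105 million pair interactions per time step.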

N-Body Simulation

Grid Solvers - Explicit Finite Difference

For any grid of regular nodes, the simplest way to calculate a node's value at the next time step is to use the values of the nodes around it.  This makes for easy mathematics and parallel simulation, as each node calculated depends only on values from the previous time step, not on the nodes around it in the time step currently being calculated.  By choosing a regular grid, we avoid the extra levels of memory access that irregular grids require.  We test both 2D and 3D explicit finite difference simulations with 2^n nodes in each dimension, in single precision, using OpenMP for threading.  The grid is isotropic and the boundary conditions are sinks.  We iterate through a series of grid sizes, and results are shown in terms of ‘million nodes per second’, where the peak value is given in the results; higher is better.
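A minimal sketch of one explicit 2D time step follows, assuming a diffusion-type update on a square grid with the boundary held at zero to act as a sink.  The names `explicit_step` and `million_nodes_per_second` are hypothetical, `alpha` stands for the usual ratio of diffusivity times time step to grid spacing squared, and the code uses Python floats rather than the single precision and OpenMP threading of the actual benchmark.

```python
import time


def explicit_step(u, alpha):
    """Advance a square 2D grid by one explicit finite difference step.

    Each interior node is updated from its four neighbours at the
    previous time step only. Boundary nodes stay at zero (sinks).
    For this scheme alpha must not exceed 0.25 for stability.
    """
    n = len(u)
    new = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = u[i][j] + alpha * (
                u[i + 1][j] + u[i - 1][j]
                + u[i][j + 1] + u[i][j - 1]
                - 4.0 * u[i][j]
            )
    return new


def million_nodes_per_second(exponent, steps, alpha=0.2):
    """Time `steps` updates of a grid with 2**exponent nodes per side."""
    n = 2 ** exponent
    u = [[0.0] * n for _ in range(n)]
    u[n // 2][n // 2] = 1.0  # single hot node in the middle
    start = time.perf_counter()
    for _ in range(steps):
        u = explicit_step(u, alpha)
    elapsed = time.perf_counter() - start
    return n * n * steps / elapsed / 1e6
```

Because `new` is written only from values in `u`, every node in a step can be computed independently, which is what makes the explicit method easy to spread across threads.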

Explicit Finite Difference Solver (2D)

Explicit Finite Difference Solver (3D)

Grid Solvers - Implicit Finite Difference + Alternating Direction Implicit Method

The implicit method takes a different approach to the explicit method: instead of considering one unknown in the new time step to be calculated from known elements in the previous time step, we consider that an old point can influence several new points by way of simultaneous equations.  This adds to the complexity of the simulation, as the grid of nodes is solved as a series of rows and columns rather than points, reducing the parallel nature of the simulation by a dimension and drastically increasing the memory requirements of each thread.  The upside is the less stringent stability rules relating time steps to grid spacing.  For this we simulate a 2D grid of 2^n nodes in each dimension, using OpenMP in single precision.  Again our grid is isotropic with the boundaries acting as sinks.  We iterate through a series of grid sizes, and results are shown in terms of ‘million nodes per second’, where the peak value is given in the results; higher is better.
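To show what solving a row as a set of simultaneous equations looks like, here is a minimal sketch of a single implicit row sweep, the building block that an alternating direction implicit scheme applies first along rows and then along columns.  It assumes a diffusion-type problem with zero values beyond each end of the row (sinks) and solves the resulting tridiagonal system with the Thomas algorithm; the names `thomas` and `implicit_row_step` and the parameter `r` are hypothetical, not taken from the benchmark.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    lower[i], diag[i], upper[i] are the coefficients of x[i-1], x[i]
    and x[i+1] in equation i; lower[0] and upper[-1] are ignored.
    """
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def implicit_row_step(row, r):
    """One implicit step along a row with zero (sink) values at both ends.

    Solves -r*x[i-1] + (1 + 2r)*x[i] - r*x[i+1] = row[i] for all i.
    """
    n = len(row)
    return thomas([-r] * n, [1.0 + 2.0 * r] * n, [-r] * n, row)
```

The whole row must be solved together, so the unit of parallel work is a row or column rather than a single node, and the result stays bounded even for large values of `r`.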

Implicit Finite Difference Solver (2D)


19 Comments


  • Khenglish - Tuesday, December 17, 2013 - link

    Your tRFC and tRRD rows are flipped in the table on the first page.

    That tRFC is ridiculously high. Cut it by 150-200 in XTU (yes, by up to 200. It's that ridiculously high) and you'll see around a 500MB/s bandwidth improvement. For reference I've run 2133 CAS9 stable at tRFC 128 in a laptop.

    Also it would be nice to see results on an IVB system as well as Haswell. IVB is just as good at the top end, but I've seen signs that haswell has a better IMC, so while this memory is stable on haswell, it might not be on IVB. Also most people still have an IVB or SB anyway.
  • Gen-An - Wednesday, December 18, 2013 - link

    You can't cut the tRFC by much on these, they use 4Gbit Hynix H5TQ4G83MFR ICs, and especially not at the high clocks they're running. It's going to be over 200 at about any speed, and you pretty much have to run it at 396 or so for 2933.
  • Hairs_ - Tuesday, December 17, 2013 - link

    Can we please stop this series of articles now? please?
  • jasonelmore - Tuesday, December 17, 2013 - link

    yeah enough with the RAM reviews. we've been on DDR3 for almost 10 years,
  • Jeffrey Bosboom - Tuesday, December 17, 2013 - link

    The only explanation I can think of for all these RAM reviews is that in exchange for samples, Anandtech is obligated to provide brand exposure. Otherwise there'd just be one roundup saying "yeah, they're all about the same". If that's the case, I feel sorry for Ian, who surely has better things to do with his time.
  • Navvie - Wednesday, December 18, 2013 - link

    This is what I figured, the original article was quite interesting. But this is really scraping the bottom of the barrel for article ideas.
  • DanNeely - Wednesday, December 18, 2013 - link

    Gotta agree. At most this should be a twice a year round up; maybe only once yearly depending on how frequently binning changes switch out which clock/cl combinations offer the best bang for the buck.
  • Gen-An - Wednesday, December 18, 2013 - link

    Ian, you state these sticks are using Hynix CFR, but CFR is a 2Gbit IC, it'd be impossible to make an 8GB DIMM with them. This has to be Hynix 4Gbit MFR.
  • ceomrman - Tuesday, December 31, 2013 - link

    Hmmm... couldn't this article just say "premium RAM is a hoax. Just buy some decent sticks from a known brand with plenty of 4 or 5 Egg reviews, and go with 1866 MHz if your motherboard will support it. Go ahead and buy faster RAM, but don't spend much since the performance impact is not noticeable in the real world."
    There are zero users who would not benefit more from some other use of the money, either in their savings account or in the form of a bigger SSD, nicer motherboard, more efficient PSU, faster CPU, etc.
    Dog + pony show = Blech.
