One of the touted benefits of Haswell is the compute capability afforded by the IGP.  For anyone using DirectCompute or C++ AMP, the compute units of the HD 4600 can be exploited as easily as those of any discrete GPU, although efficiency might come into question.  As some of the benchmarks below show, some of our computational software runs faster on the IGP than on the CPU, particularly in the highly multithreaded scenarios.
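
As a rough illustration of how little host code is needed to target the IGP with C++ AMP, consider the minimal sketch below. This is not the code used in our benchmarks (the function and names are arbitrary), but the pattern, an array_view plus a parallel_for_each over a restrict(amp) lambda, is essentially all that is required to dispatch work to the HD 4600 or any other DirectX 11 accelerator.

    #include <amp.h>
    #include <vector>
    using namespace concurrency;

    // Minimal C++ AMP sketch: the runtime picks a DirectX 11 accelerator
    // (the HD 4600 IGP, or a discrete GPU) and runs the lambda across it.
    void scale_on_gpu(std::vector<float>& data, float factor) {
        array_view<float, 1> av(static_cast<int>(data.size()), data);
        parallel_for_each(av.extent, [=](index<1> idx) restrict(amp) {
            av[idx] *= factor;      // executed per element on the accelerator
        });
        av.synchronize();           // copy the results back to host memory
    }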

Grid Solvers - Explicit Finite Difference on IGP

As before, we test both 2D and 3D explicit finite difference simulations with 2^n nodes in each dimension, using OpenMP as the threading operator, in single precision.  The grid is isotropic and the boundary conditions are sinks.  We iterate through a series of grid sizes and report the peak value in ‘million nodes per second’ – higher is better.
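
For reference, a single explicit update step of the 2D solver looks roughly like the sketch below. This is a simplified single-precision illustration rather than our benchmark source: it assumes a 5-point stencil with the diffusivity and time step folded into one coefficient (alpha), and treats the sink boundaries as fixed at zero.

    #include <amp.h>
    using namespace concurrency;

    // One explicit finite difference step over an ny x nx grid (illustrative only).
    // 'alpha' folds the diffusivity and the dt/dx^2 term into a single coefficient.
    void fd_step_2d(array_view<const float, 2> in, array_view<float, 2> out, float alpha) {
        out.discard_data();
        parallel_for_each(out.extent, [=](index<2> idx) restrict(amp) {
            int i = idx[0], j = idx[1];
            // Boundary nodes are sinks: hold them at zero.
            if (i == 0 || j == 0 || i == in.extent[0] - 1 || j == in.extent[1] - 1) {
                out[idx] = 0.0f;
                return;
            }
            float c = in(i, j);
            out[idx] = c + alpha * (in(i - 1, j) + in(i + 1, j) +
                                    in(i, j - 1) + in(i, j + 1) - 4.0f * c);
        });
    }

A typical implementation keeps two grids and swaps them between iterations, dispatching one such kernel per time step.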

Two Dimensional:

The results on the IGP are 50% higher than those on the CPU, and memory speed appears to make a difference as well.  Anything above 1333 MHz brings at least a 2% gain; the next step up comes at 2666 MHz for another 2%, which might not be cost effective.

Three Dimensional:

The 3D results are a little haphazard, with 1333 C7 and 2400 C9 both performing well.  1600 C11 is definitely out of the running, while anything at 2400 MHz or above affords a benefit of close to 10% or more.

N-Body Simulation on IGP

As with the CPU compute tests, we run a simulation of 10240 particles of equal mass - the output of this code is in GFLOPs, and the recorded result is the peak GFLOPs value.
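
The core of an n-body test like this is an O(N^2) force accumulation; a simplified sketch (not our exact kernel) is shown below. Each pairwise interaction costs on the order of 20 floating-point operations, which is where a GFLOPs figure comes from, and eps2 is the usual softening term that avoids a division by zero when two particles coincide.

    #include <amp.h>
    #include <amp_math.h>
    using namespace concurrency;

    // Illustrative O(N^2) gravitational step for equal-mass particles.
    // 'pos' holds x, y, z per particle; 'acc' receives the resulting acceleration.
    void nbody_step(array_view<const float, 2> pos, array_view<float, 2> acc, float eps2) {
        int n = pos.extent[0];
        parallel_for_each(extent<1>(n), [=](index<1> idx) restrict(amp) {
            int i = idx[0];
            float ax = 0.0f, ay = 0.0f, az = 0.0f;
            for (int j = 0; j < n; ++j) {
                float dx = pos(j, 0) - pos(i, 0);
                float dy = pos(j, 1) - pos(i, 1);
                float dz = pos(j, 2) - pos(i, 2);
                float r2 = dx * dx + dy * dy + dz * dz + eps2;  // softened distance^2
                float inv = 1.0f / fast_math::sqrtf(r2);
                float inv3 = inv * inv * inv;                   // 1/r^3; equal masses folded out
                ax += dx * inv3;
                ay += dy * inv3;
                az += dz * inv3;
            }
            acc(i, 0) = ax;
            acc(i, 1) = ay;
            acc(i, 2) = az;
        });
    }

Because each thread streams through the same 10240 positions, the working set fits comfortably in cache, which may help explain the result below.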

For a workload measured in FLOPs, performance does not seem to be affected by memory.

3D Particle Movement on IGP

As with our CPU Compute algorithm, we calculate the random motion of free particles in 3D, which involves random number generation and trigonometric functions.  For this application we take the fastest true-3D motion algorithm and test a variety of particle densities to find the peak movement speed.  Results are given in ‘million particle movements calculated per second’, and a higher number is better.
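
One movement step of this kind can be sketched as follows; the generator and constants are purely illustrative (a small LCG stands in for whatever RNG the benchmark actually uses). Two uniform random numbers become spherical angles, and sine/cosine map them onto a unit step direction, which is what makes the 'true 3D' algorithms trigonometry-heavy.

    #include <amp.h>
    #include <amp_math.h>
    using namespace concurrency;

    // Illustrative 'true 3D' random walk: each thread moves one particle.
    void move_particles(array_view<float, 2> p, int steps, unsigned seed) {
        parallel_for_each(extent<1>(p.extent[0]), [=](index<1> idx) restrict(amp) {
            unsigned s = seed ^ (static_cast<unsigned>(idx[0]) * 2654435761u);  // per-thread RNG state
            float x = p(idx[0], 0), y = p(idx[0], 1), z = p(idx[0], 2);
            for (int k = 0; k < steps; ++k) {
                s = s * 1664525u + 1013904223u;
                float u = (s >> 8) * (1.0f / 16777216.0f);      // uniform in [0, 1)
                s = s * 1664525u + 1013904223u;
                float v = (s >> 8) * (1.0f / 16777216.0f);
                float theta = 6.2831853f * u;                   // azimuthal angle
                float phi = fast_math::acosf(2.0f * v - 1.0f);  // polar angle, uniform on the sphere
                x += fast_math::sinf(phi) * fast_math::cosf(theta);
                y += fast_math::sinf(phi) * fast_math::sinf(theta);
                z += fast_math::cosf(phi);
            }
            p(idx[0], 0) = x;
            p(idx[0], 1) = y;
            p(idx[0], 2) = z;
        });
    }

Since each thread only touches its own particle, a kernel like this is almost entirely ALU-bound, which is consistent with the insensitivity to memory speed seen below.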

Despite this result being over 35x that of the equivalent calculation on a fully multithreaded 4770K CPU (200 vs. 7000), there is again little difference between memory speeds.  3000 C12 takes a small lead over the rest, similar to the n-Body test.

Matrix Multiplication on IGP

Matrix multiplication occurs in a number of mathematical models and is typically written to avoid memory accesses where possible, optimizing the number of reads and writes according to the registers available to each thread or batch of dispatched threads.  Here we have a crude MatMul implementation, and we iterate through a variety of matrix sizes to find the peak speed.  Results are given in terms of ‘million nodes per second’, and a higher number is better.
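
For reference, a crude (untiled) kernel of the sort described looks roughly like the sketch below, with one thread per output element; the benchmark's actual balance of reads and writes per thread will differ.

    #include <amp.h>
    using namespace concurrency;

    // Naive C = A * B for n x n matrices: each thread computes one element of C,
    // so every thread re-reads a full row of A and a full column of B.
    void matmul(array_view<const float, 2> a, array_view<const float, 2> b,
                array_view<float, 2> c) {
        c.discard_data();
        int n = a.extent[0];
        parallel_for_each(c.extent, [=](index<2> idx) restrict(amp) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k)
                sum += a(idx[0], k) * b(k, idx[1]);
            c[idx] = sum;
        });
    }

The usual refinement is a tiled variant that stages sub-blocks of A and B in tile_static (group-shared) memory so each external read is reused by a whole tile of threads, trading DRAM traffic for on-chip reuse.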

Matrix multiplication at this scale seems to vary little between memory settings, although a shift towards lower CL timings gives a marginally better (though statistically minor) result.

3D Particle Movement on IGP

This is similar to our 3DPM multithreaded test, except that we run the fastest of our six movement algorithms with several million threads, each moving a particle in a random direction for a fixed number of steps.  Final results are given in million movements per second, and a higher number is better.

While there is a slight dip using 1333 C9, in general almost all of our memory timing settings perform roughly the same.  The peak shown with our memory kit at its XMP-rated timings is presumably due more to the BCLK adjustments that need to be made in order to hit that memory frequency.

Comments

  • MrSpadge - Thursday, September 26, 2013 - link

    Is your HDD scratching because you're running out of RAM? Then an upgrade is worth it, otherwise not.
  • nevertell - Thursday, September 26, 2013 - link

    Why does going from 2933 to 3000, with the same latencies, automatically make the system run slower on almost all of the benchmarks ? Is it because of the ratio between cpu, base and memory clock frequencies ?
  • IanCutress - Thursday, September 26, 2013 - link

    Moving to the 3000 MHz setting doesn't actually move to the 3000 MHz strap - it puts it on 2933 and adds a drop of BCLK, meaning we had to drop the CPU multiplier to keep the final CPU speed (BCLK * multi) constant. At 3000 MHz though, all the subtimings in the XMP profile are set by the SPD. For the other MHz settings, we set the primaries, but we left the motherboard system on auto for secondary/tertiary timings, and it may have resulted in tighter timings under 2933. There are a few instances where the 3000 kit has a 2-3% advantage, a couple where it's at a disadvantage, but the rest are around about the same (within some statistical variance).

    Ian
  • mikk - Thursday, September 26, 2013 - link

    What a stupid nonsense these iGPU Benchmarks. Under 10 fps, are you serious? Do it with some usable fps and not in a slide show.
  • MrSpadge - Thursday, September 26, 2013 - link

    Well, that's the reality of gaming on these iGPUs in low "HD" resolution. But I actually agree with you: running at 10 fps is just not realistic and hence not worth much.

    The problem I see with these benchmarks is that at maximum detail settings you're putting an emphasis on shaders. By turning details down you'd push more pixels and shift the balance towards needing more bandwidth to achieve just that. And since in any real world situation you'd see >30 fps, you ARE pushing more pixels in these cases.
  • RYF - Saturday, September 28, 2013 - link

    The purpose was to put the iGPU into strain and explore the impacts of having faster memory in improving the performance.

    You seriously have no idea...
  • MrSpadge - Thursday, September 26, 2013 - link

    Your benchmark choices are nice, but I've seen quite a few "real world" applications which benefit far more from high-performance memory:
    - matrix inversion in Matlab (Intel MKL), probably in other languages / libs too
    - crunching Einstein@Home (BOINC) on all 8 threads
    - crunching Einstein@Home on 7 threads and 2 Einstein@Home tasks on the iGPU
    - crunching 5+ POEM@Home (BOINC) tasks on a high end GPU

    It obviously depends on the user how real the "real world" applications are. For me they are far more relevant than my occasional game, which is usually fast enough anyway.
  • MrSpadge - Thursday, September 26, 2013 - link

    Edit: in fact, I have set a maximum of 31 fps in PrecisionX for my nVidia, so that the games don't eat up too much crunching time ;)
  • Oscarcharliezulu - Thursday, September 26, 2013 - link

    Yep it'd be interesting to understand where extra speed does help, eg database, j2ee servers, cad, transactional systems of any kind, etc. otherwise great read and a great story idea, thanks.
  • willis936 - Thursday, September 26, 2013 - link

    SystemCompute - 2D Ex CPU 1600CL10. Nice.
