IGP Compute

One of the touted benefits of Haswell is the compute capability afforded by the IGP. For anyone using DirectCompute or C++ AMP, the compute units of the HD 4600 can be exploited as easily as those of any discrete GPU, although efficiency might come into question. As some of the benchmarks below show, some of our computational software runs faster on the IGP than on the CPU (particularly in the highly multithreaded scenarios).

Grid Solvers - Explicit Finite Difference on IGP

As before, we test both 2D and 3D explicit finite difference simulations with 2^n nodes in each dimension, using OpenMP as the threading operator in single precision. The grid is isotropic and the boundary conditions are sinks. We iterate through a series of grid sizes, and results are shown in terms of ‘million nodes per second’, where the peak value is given in the results; higher is better.
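As a rough illustration of what such a solver does, here is a minimal 2D sketch in standard C++. It is not the benchmark's code: the function name `run_fd2d`, the coefficient `alpha` (standing in for D·dt/dx²), and the unit point source at the centre are all assumptions made for the example. The edge nodes are held at zero to act as sinks.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Runs `steps` explicit (forward-time, centred-space) diffusion updates on a
// square grid with 2^n nodes per side and returns the final field.
// `alpha` = D*dt/dx^2 is a hypothetical parameter; the 2D scheme is only
// stable for alpha <= 0.25.
std::vector<float> run_fd2d(unsigned n, unsigned steps, float alpha) {
    assert(n >= 2 && n < 15 && alpha > 0.0f && alpha <= 0.25f);
    const std::size_t side = std::size_t{1} << n;
    std::vector<float> cur(side * side, 0.0f), next(side * side, 0.0f);
    cur[(side / 2) * side + side / 2] = 1.0f;  // unit point source at the centre

    for (unsigned s = 0; s < steps; ++s) {
        // Rows are independent, so OpenMP can split them across threads.
        // Without OpenMP enabled the pragma is ignored and the loop runs serially.
        #pragma omp parallel for
        for (std::ptrdiff_t y = 1; y < static_cast<std::ptrdiff_t>(side) - 1; ++y) {
            for (std::size_t x = 1; x + 1 < side; ++x) {
                const std::size_t i = static_cast<std::size_t>(y) * side + x;
                next[i] = cur[i] + alpha * (cur[i - 1] + cur[i + 1] +
                                            cur[i - side] + cur[i + side] -
                                            4.0f * cur[i]);
            }
        }
        // Edge nodes are never written, so they stay at zero (the sink).
        std::swap(cur, next);
    }
    return cur;
}
```

A 'million nodes per second' figure would then follow from dividing the number of node updates (nodes per side squared, times the number of steps) by the elapsed time; the timing harness is left out here.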

Explicit Finite Difference Solver (2D) on IGP

Explicit Finite Difference Solver (3D) on IGP

N-Body Simulation on IGP

As with the CPU compute, we run a simulation of 10240 particles of equal mass. The output for this code is in terms of GFLOPs, and the result recorded was the peak GFLOPs value.
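For readers unfamiliar with this kind of workload, the core of an equal-mass n-body step can be sketched as a direct all-pairs acceleration sum. This is only an assumed form of the kernel, not the benchmark's implementation; `Body`, `accelerations`, `gm` (G times the common mass) and the softening term `eps2` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Body { float x, y, z; };

// Direct O(N^2) gravitational accelerations for bodies of equal mass.
// `gm` and `eps2` are hypothetical parameters. With eps2 == 0, two bodies at
// exactly the same position would divide by zero, so pass a small positive
// eps2 when that can happen.
std::vector<Body> accelerations(const std::vector<Body>& pos, float gm, float eps2) {
    assert(gm > 0.0f && eps2 >= 0.0f);
    std::vector<Body> acc(pos.size(), Body{0.0f, 0.0f, 0.0f});
    for (std::size_t i = 0; i < pos.size(); ++i) {
        float ax = 0.0f, ay = 0.0f, az = 0.0f;
        for (std::size_t j = 0; j < pos.size(); ++j) {
            if (j == i) continue;  // a body exerts no force on itself
            const float dx = pos[j].x - pos[i].x;
            const float dy = pos[j].y - pos[i].y;
            const float dz = pos[j].z - pos[i].z;
            const float r2 = dx * dx + dy * dy + dz * dz + eps2;
            const float inv_r3 = 1.0f / (r2 * std::sqrt(r2));
            ax += gm * dx * inv_r3;
            ay += gm * dy * inv_r3;
            az += gm * dz * inv_r3;
        }
        acc[i] = Body{ax, ay, az};
    }
    return acc;
}
```

Each outer iteration is independent of the others, which is why this pattern maps well onto many GPU threads.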

N-Body Simulation on IGP

3D Particle Movement on IGP

Similar to our CPU Compute algorithm, we calculate the random motion in 3D of free particles involving random number generation and trigonometric functions.  For this application we take the fastest true-3D motion algorithm and test a variety of particle densities to find the peak movement speed.  Results are given in ‘million particle movements calculated per second’, and a higher number is better.
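The text does not say which motion algorithm is used, so the following is only one plausible sketch of a 'true-3D' step: each particle moves one unit in a direction drawn uniformly over the sphere, using a random cos(theta), a random azimuth, and trigonometric functions. The names `Particle` and `random_walk`, the unit step length, and the choice of `std::mt19937` are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

struct Particle { float x, y, z; };

// Moves every particle `steps` times by a unit-length step in a uniformly
// random 3D direction. Hypothetical sketch, not the benchmark's algorithm.
void random_walk(std::vector<Particle>& particles, unsigned steps, std::uint32_t seed) {
    assert(!particles.empty());
    const float two_pi = 6.28318530718f;
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> cos_theta(-1.0f, 1.0f);
    std::uniform_real_distribution<float> azimuth(0.0f, two_pi);
    for (unsigned s = 0; s < steps; ++s) {
        for (Particle& q : particles) {
            // Uniform cos(theta) plus uniform azimuth gives a direction that
            // is uniform over the unit sphere.
            const float c = cos_theta(rng);
            const float sin_theta = std::sqrt(std::max(0.0f, 1.0f - c * c));
            const float phi = azimuth(rng);
            q.x += sin_theta * std::cos(phi);
            q.y += sin_theta * std::sin(phi);
            q.z += c;
        }
    }
}
```

A 'million particle movements per second' figure would be the particle count times the step count divided by the elapsed time.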

3D Particle Movement on IGP

Matrix Multiplication on IGP

Matrix Multiplication occurs in a number of mathematical models, and is typically designed to avoid memory accesses where possible and optimize for a number of reads and writes depending on the registers available to each thread or batch of dispatched threads. Here we have a crude MatMul implementation, and iterate through a variety of matrix sizes to find the peak speed. Results are given in terms of ‘million nodes per second’ and a higher number is better.
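A 'crude' implementation in this sense is presumably something close to the textbook triple loop, with no tiling or register blocking. The sketch below shows that form for square, row-major, single-precision matrices; the function name `matmul` and the storage layout are assumptions, not details from the benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive n x n matrix product c = a * b, all matrices stored row-major.
// No blocking or other memory optimisation is attempted.
std::vector<float> matmul(const std::vector<float>& a,
                          const std::vector<float>& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<float> c(n * n, 0.0f);
    for (std::size_t row = 0; row < n; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[row * n + k] * b[k * n + col];
            }
            c[row * n + col] = sum;
        }
    }
    return c;
}
```

Every output element re-reads a full row of `a` and a full column of `b`, which is the memory traffic that tuned implementations try to avoid.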

Matrix Multiplication on IGP

23 Comments

  • The Von Matrices - Monday, November 11, 2013

    The silly part is that this is marketed as "gaming" memory while its advantages in gaming on a discrete GPU are minimal. It should be marketed as accelerating applications, which would be a much more reasonable statement. I bought 2400MHz memory not because I play games but because I perform encoding and file compression on my PC, and that is a situation where fast memory makes a difference.

    As far as making a recommendation on value, Ian stated (and I agree) that memory prices are very volatile. It's basically impossible to make a lasting value comparison on memory because of this. What is a great deal today could be eclipsed next week by a dramatic price decrease of a faster, better product. I agree with Ian omitting a value comparison because it would be pointless a month after the article is posted. However, the performance comparisons of different memory speeds and timings are still of value.

    I think the general conclusion he stated is still of value - buy something faster than DDR3-1600 but don't spend too much money because the performance increase is minimal beyond that.
  • DanNeely - Monday, November 11, 2013

    Are any of your planned reviews going to look at the impact of timing relaxation needed to run 4 dimms instead of 2? Having bumped off 12GB a few times I'm now running 18 in my aging i7-920 box; and with both my browsers (Opera, FF) having multi-process upgrades forthcoming that will let them expand beyond the 4GB barrier I've decided on 4x8gb for my new system.
  • The Von Matrices - Monday, November 11, 2013

    I don't understand why you're creating a new term "performance index" instead of just using the more standard time to first word (in ns). It would behave exactly in reverse to your "performance index" with lower times being better but otherwise the comparison would be the same.
  • ShieTar - Tuesday, November 12, 2013

    I agree. It's not only more standard, it is also physically more meaningful, and can be adapted to describe the performance of software with known algorithms. E.g. if your ramdisk is reading 512-byte sectors from memory, its performance will scale with the "time to get a full sector".

    But of course, frequency is also a much more useful parameter to distinguish electromagnetic signal than wavelength, and you still can't get anybody who learned their field on wavelength to give it up. Once people start to think within certain terms, they are very stubborn about changing definitions.
  • whyso - Monday, November 11, 2013

    If you run IGP benchmarks can you please run at something relevant? 11 fps is not relevant.
  • cmdrdredd - Monday, November 11, 2013

    Still with these big heatsinks on the memory? I almost have to use the low profile Samsung stuff because of my Noctua cooler not allowing much clearance.
  • meacupla - Monday, November 11, 2013

    I think these unnecessarily tall RAM heatsinks are still being made, because the manufacturers think people will use CLC CPU coolers instead of a dual tower heatsink.

    or maybe they think the only people who will buy this type of RAM are people with real water cooling loops.
    or maybe they are for LN2 overclocking contests or something.

    Either way, if the customer is sensible enough to buy a tower heatsink in the first place, I'm sure they would also be sensible and buy some lower profile, 1600Mhz or 1866Mhz CAS8 or CAS9 RAM, instead of overkill 2400Mhz.
  • DanNeely - Tuesday, November 12, 2013

    giant ramsinks long predate CLCs. For that matter I'm fairly sure they predate tower style heatsinks as well.
  • Hood6558 - Wednesday, November 13, 2013

    Overkill is best, sensible decisions are for Grandma's email machine...
  • Kamus - Tuesday, November 12, 2013

    Some Battlefield 4 tests would've been nice... According to Corsair, 2400 memory was giving up to 20% better performance than 1333, but I've yet to see another test like that one to corroborate.
