Gigabyte GA-7PESH1 Review: A Dual Processor Motherboard through a Scientist’s Eyes

Author: Dr. Ian Cutress

by Ian Cutress on January 5, 2013 10:00 AM EST

Two Dimensional Implicit Finite Difference

The ‘Finite Difference’ part of this computational grid solver means that the derivation of this method is similar to that shown in the Explicit Finite Difference method on the previous page. We are presented with the following equation which explains Fick’s first law of diffusion for mass transport in three dimensions:

[8]

The implicit method treats the concentrations at time t+1 as a set of unknowns. The equations are therefore coupled into a system of simultaneous equations with an equal number of unknowns, which must be solved together:

[9]

[10]

The implicit method is algorithmically more complex than the explicit method, but does offer the advantage of unconditional stability with respect to time.
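
To make the coupling concrete, here is a sketch of the standard backward-in-time discretisation in one dimension. It assumes a uniform node spacing Δx, a constant diffusion coefficient D and the shorthand λ; these symbols are assumptions for illustration and the numbered equations in this article may be written differently.

```latex
\frac{C_i^{t+1} - C_i^{t}}{\Delta t}
  = D\,\frac{C_{i-1}^{t+1} - 2C_i^{t+1} + C_{i+1}^{t+1}}{\Delta x^2}
\quad\Longrightarrow\quad
-\lambda C_{i-1}^{t+1} + (1 + 2\lambda)\,C_i^{t+1} - \lambda C_{i+1}^{t+1} = C_i^{t},
\qquad \lambda = \frac{D\,\Delta t}{\Delta x^2}
```

Each interior node i contributes one such equation linking three neighbouring unknowns, which is why the resulting matrix is tri-diagonal.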

The Alternating Direction Implicit (ADI) Method

For a system in two dimensions (labelled r and z), such as a microdisk simulation, the linear system has to be solved in both directions using Fick’s Laws:

[11]

The alternating direction implicit (ADI) method is a straightforward way of solving what are essentially two-dimensional simultaneous equations whilst retaining a high degree of algorithmic stability.

ADI splits equation [11] into two half time steps – by treating one dimension explicitly and the other dimension implicitly in the same half time step. Thus the explicit values known in one direction are fed into the series of simultaneous equations to solve the other direction. For example, using the r direction explicitly to solve the z direction implicitly:

[12]

[13]

Solving equation [13] gives the concentrations in the z direction; the concentrations for the next half time step can then be calculated in the r direction, and so on until the desired simulation time is reached. Each set of half time step equations is solved using the Thomas Algorithm for tri-diagonal matrices.
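
As an illustration of the tri-diagonal solve, the following is a minimal sketch of the textbook Thomas algorithm, not the code used for the benchmark. The function name `thomasSolve` and the four-vector layout are assumptions made for this example.

```cpp
#include <cstddef>
#include <vector>

// Solve a tri-diagonal system A x = d in O(n) with the Thomas algorithm.
//   a: sub-diagonal   (a[0] is unused)
//   b: main diagonal
//   c: super-diagonal (c[n-1] is unused)
//   d: right-hand side
// All four vectors must have the same length n. No pivoting is performed,
// so the matrix is assumed to be diagonally dominant (hypothetical helper).
std::vector<double> thomasSolve(const std::vector<double>& a,
                                const std::vector<double>& b,
                                const std::vector<double>& c,
                                const std::vector<double>& d)
{
    const std::size_t n = d.size();
    std::vector<double> x(n);
    if (n == 0) return x;

    std::vector<double> cp(n), dp(n);  // modified super-diagonal and RHS

    // Forward sweep: eliminate the sub-diagonal.
    cp[0] = c[0] / b[0];
    dp[0] = d[0] / b[0];
    for (std::size_t i = 1; i < n; ++i) {
        const double m = b[i] - a[i] * cp[i - 1];
        cp[i] = c[i] / m;
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m;
    }

    // Back substitution: recover the unknowns from last to first.
    x[n - 1] = dp[n - 1];
    for (std::size_t i = n - 1; i-- > 0; )
        x[i] = dp[i] - cp[i] * x[i + 1];

    return x;
}
```

For an implicit diffusion step the three diagonals are constant along a line, so a production version would typically avoid storing them as full vectors.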

Application to this Review

For the purposes of this review, we generate an x-by-x grid of points with a relative concentration of 1, where the boundaries are fixed at a relative concentration of 0. The grid is regular, which simplifies the calculations. Because each half time step calculates in only one dimension, adjacent rows or columns (depending on direction) are independent of one another, so for a simulation of x-by-x nodes we can spawn x threads. These threads are substantially bulkier in terms of memory usage than those of the explicit finite difference method.
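
The row/column independence described above can be sketched as follows. This is an illustrative example under stated assumptions, not the benchmark code: it uses a Cartesian grid rather than the (r, z) microdisk geometry, a row-major `std::vector<double>`, and a hypothetical function name `adiHalfStep` with `lambda = D*dt / (2*h*h)`.

```cpp
#include <vector>

// One ADI half time step on an n-by-n grid stored row-major in `in`.
// Boundary nodes are held at 0; `in` and `out` must be different vectors.
// alongRows == true : implicit along each row, explicit across rows.
// alongRows == false: implicit along each column, explicit across columns.
// All names and the Cartesian layout are assumptions for illustration.
void adiHalfStep(const std::vector<double>& in, std::vector<double>& out,
                 int n, double lambda, bool alongRows)
{
    const int pointStride = alongRows ? 1 : n;  // step along the implicit line
    const int lineStride  = alongRows ? n : 1;  // step to the neighbouring line
    const int m = n - 2;                        // interior nodes per line

    out.assign(in.size(), 0.0);                 // also sets the boundaries to 0
    if (m <= 0) return;

    // Each interior line is an independent tri-diagonal system, so the
    // lines can be shared out between threads with no synchronisation.
    #pragma omp parallel for
    for (int k = 1; k <= m; ++k) {
        std::vector<double> cp(m), dp(m);       // per-line scratch space
        const double b = 1.0 + 2.0 * lambda;    // main diagonal; off-diagonals are -lambda

        // Build the explicit right-hand side and do the forward sweep.
        for (int p = 0; p < m; ++p) {
            const int idx = k * lineStride + (p + 1) * pointStride;
            const double rhs = lambda * in[idx - lineStride]
                             + (1.0 - 2.0 * lambda) * in[idx]
                             + lambda * in[idx + lineStride];
            if (p == 0) {
                cp[0] = -lambda / b;
                dp[0] = rhs / b;
            } else {
                const double denom = b + lambda * cp[p - 1];
                cp[p] = -lambda / denom;
                dp[p] = (rhs + lambda * dp[p - 1]) / denom;
            }
        }

        // Back substitution along the same line.
        out[k * lineStride + m * pointStride] = dp[m - 1];
        for (int p = m - 2; p >= 0; --p) {
            const int idx = k * lineStride + (p + 1) * pointStride;
            out[idx] = dp[p] - cp[p] * out[idx + pointStride];
        }
    }
}
```

A full time step under these assumptions would call `adiHalfStep(u, tmp, n, lambda, true)` followed by `adiHalfStep(tmp, u, n, lambda, false)`. Without OpenMP enabled the pragma is ignored and the loop simply runs serially.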

The code was written in C++ using Visual Studio 2012, with OpenMP providing the multithreading. The main function to do the calculations is as follows.

For our scores, we increase the size of the grid from 2x2 until we hit 2GB of memory usage. At each stage, the time taken to repeatedly process the grid over many time steps is converted into ‘million nodes per second’, and the peak value is used for our results.
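
One plausible way to arrive at such a score is sketched below. The formula (grid nodes multiplied by time steps, divided by elapsed seconds) and the function name `millionNodesPerSecond` are assumptions for illustration; the exact accounting used for the published numbers is not stated here.

```cpp
#include <cstddef>

// Throughput in millions of node updates per second, assuming every node
// of a gridSide-by-gridSide grid is updated once per time step.
// Hypothetical helper: the benchmark's own accounting may differ.
double millionNodesPerSecond(std::size_t gridSide, std::size_t timeSteps,
                             double seconds)
{
    if (seconds <= 0.0) return 0.0;  // guard against a zero-length timing
    const double nodeUpdates = static_cast<double>(gridSide)
                             * static_cast<double>(gridSide)
                             * static_cast<double>(timeSteps);
    return nodeUpdates / seconds / 1.0e6;
}
```

Under this assumption, a 1000x1000 grid processed for 100 time steps in 2 seconds would score 50 million nodes per second.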

Implicit Finite Difference: 2D

Where the explicit 2D method was indifferent to HyperThreading and the explicit 3D method was very sensitive to it, the implicit 2D method is a mix of both: there are still benefits to be had from enabling HyperThreading. The line between single processor and dual processor systems is blurred a little by the different speeds of the SP results, but in terms of price/performance, the DP system is at the wrong end.


64 Comments

mayankleoboy1 - Saturday, January 5, 2013 - link
Ian :

How much difference do you think Xeon Phi will make in these very different type of Computations?
Will buying a Xeon Phi "pay itself out" as you said in the above comments ? (or is xeon phi linux only ?)
IanCutress - Saturday, January 5, 2013 - link
As far as we know, Xeon Phi will be released for Linux only to begin with. I have friends who have been able to play with them so far, and they are getting 700 GFlops+ in DGEMM in double precision.

It always comes down to the algorithm with these codes. It seems that if you have single precision code that doesn't mind being in a 2P system, then the GPU route may be preferable. If not, then Phi is an option. I'm hoping to get my hands on one inside H1 this year. I just have to get my hands dirty with Linux as well.

In terms of the codes used here, if I were to guess, the Implicit Finite Difference would probably benefit a lot from Xeon Phi if it works the way I hope it does.

Ian
mayankleoboy1 - Saturday, January 5, 2013 - link
Rather stupid question, but have you tried using PGO builds ?
Also, do you build the code with the default optimizations, or use the MSVC equivalent switch of -O2 ?
IanCutress - Saturday, January 5, 2013 - link
Using Visual Studio 2012, all the speed optimisations were enabled including /GL, /O2, /Ot and /fp:fast. For each part I analysed the sections which took the most time using the Performance Analysis tools, and tried to avoid the long memory reads. Hence the Ex-FD uses an iterative loading which actually boosts speed by a good 20-30% compared with running without it.

Ian
Klimax - Sunday, January 6, 2013 - link
Interesting. Why not Ox (all optimisations on)

BTW: Do you have access to VTune?
IanCutress - Wednesday, January 9, 2013 - link
In case /Ox performs an optimisation for memory over speed in an attempt to balance optimisations. As speed is priority #1, it made more sense to me to optimise for that only. If VS2012 gave more options, I'd adjust accordingly.

Never heard of VTune, but I did use the Performance Analysis tools in VS2012 to optimise certain parts of the code.

Ian
Beenthere - Saturday, January 5, 2013 - link
Business and mobo makers do not use 2P mobos to get high benches or performance bragging rights per se. These systems are built for bullet-proof reliability and up time. It does no good for a mobo/system to be 3% faster if it crashes while running a month long analysis. These 2P mobos are about 100% reliability, something rarely found in an enthusiast's mobo.

Enterprise mobos are rarely sold by enthusiast marketeers. Newegg has a few enterprise mobos listed primarily because they have started a Newegg Biz website to expand their revenue streams. They don't have much in the line of true enterprise hardware however. It's a token offering because manufacturers are not likely to support whoring of the enterprise market lest they lose all of their quality vendors who provide customer technical product support.
psyq321 - Sunday, January 6, 2013 - link
Actually, ASUS Z9PE-D8 WS allows for some overclocking capabilities.

CPU overclocking with 2P/4P Xeon E5 (2600/4600 sequence) is a no-go because Intel explicitly did not store proper ICC data so it is impossible to manipulate BCLK meaningfully (set the different ratios). Oh, and the multipliers are locked :)

However, Z9PE D8 WS allows memory overclocking - I managed to run 100% 24/7 stable with the Samsung ECC 1600 DDR3 "low voltage" RAM (16 GB sticks) - just switching memory voltage from 1.35v to 1.55v allows overclocking memory from 1600 MHz to 2133 MHz.

Why would anyone want to do that in a scientific or b2b environment? The only usage I can see is in applications where memory I/O is the biggest bottleneck. Large-scale neural simulations are one such application, and getting 10 GB/s more of memory I/O can help a lot - especially if stable.

Also, low-latency trading applications are known to benefit from overclocked hardware and it is, in fact, used in production environment.

Modern hardware does tend to have larger headrooms between the manufacturer's operating point and the limits - if the benefit from an overclock outweighs the work invested to find the point where the results become unstable - and, of course, the shorter life span of the hardware - then it can be used. And it is used, for example in some trading scenarios.
Drazick - Saturday, January 5, 2013 - link
Will You, Please, Update Your Google+ Page?

It would be much easier to follow you there.
Ryan Smith - Saturday, January 5, 2013 - link
Our Google+ page is just a token page. If you wish to follow us then your best option is to follow our RSS feeds.
