Browsing through a manufacturer’s website can offer a startling view of the product line up.  Such was the case when I sprawled through Gigabyte’s range, only to find that they offer server line products, including dual processor motherboards.  These are typically sold in a B2B environment (to system builders and integrators) rather than to the public, but after a couple of emails they were happy to send over their GA-7PESH1 model and a couple of Xeon CPUs for testing.  Coming from a background where we used dual processor systems for some serious CPU Workstation throughput, it was interesting to see how the Sandy Bridge-E Xeons compared to consumer grade hardware for getting the job done. 

In my recent academic career as a computational chemist, we developed our own code to solve issues of diffusion and migration.  This started with implicit grid solvers – everyone in the research group (coming from chemistry backgrounds rather than computer science backgrounds), as part of their training, wrote their own grid and solver classes in C++ which would be the backbone of the results obtained in their doctorate degree.  Due to the idiosyncratic nature of coders and learning how to code, some of the students naturally wrote classes were easily multi-threaded at a high level, whereas some used a large amount of localized cache which made multithreading impractical.  Nevertheless, single threaded performance was a major part in being able to obtain the results of the simulations which could last from seconds to weeks.  As part of my role in the group, I introduced the chemists to OpenMP which sped up some of their simulations, but as a result caused the shift in writing this code towards the multithreaded.  I orchestrated the purchasing of dual processor (DP) Nehalem workstations from Dell (the preferred source of IT equipment for the academic institution (despite my openness to build in-house custom hardware) in order to speed up the newly multithreaded code (with ECC memory for safety), and then embarked on my own research which looked at off-the-shelf FEM solvers then explicit calculations to parallelize the code at a low level, which took me to GPUs, which resulted in nine first author research papers overall in those three years. 

In a lot of the simulations written during that period by the multiple researchers, one element was consistent – trying to use as much processor power as possible.  When one of us needed more horsepower for a larger number of simulations, we used each other’s machines to get the job done quicker.  Thus when it came to purchasing those DP machines, I explored the SR-2 route and the possibility of self-building the machines, but this was quickly shot down by the IT department who preferred pre-built machines with a warranty.  In the end we purchased three dual E5520 systems, to give each machine 8 cores / 16 threads of processing power, as well as some ECC memory (thankfully the nature of the simulations required no more than a few megabytes each), to fit into the budget.  When I left that position, these machines were still going strong, with one colleague using all three to correlate the theoretical predictions with experimental results.

Since leaving that position and working for AnandTech, I still partake in exploring other avenues where my research could go into, albeit in my spare time without funding.  Thankfully moving to a single OCed Sandy Bridge-E processor let me keep the high level CPU code comparable to during the research group, even if I don’t have the ECC memory.  The GPU code is also faster, moving from a GTX480 during research to 580/680s now.  One of the benchmarks in my motherboard reviews is derived from one of my research papers – regular readers of our motherboard reviews will recognize the 3DPM benchmark from those reviews and in the review today, just to see how far computation has gone.  Being a chemist rather than a computer scientist, the code for this benchmark could be comparable to similar non-CompSci trained individuals – from a complexity point of view it is very basic, slightly optimized to perform faster calculations (FMA) but not the best it could be in terms of full blown SSE/SSE2/AVX extensions et al.

With the vast number of possible uses for high performance systems, it would be impossible for me to cover them all.  Johan de Gelas, our server reviewer, lives and breathes this type of technology, and hence his benchmark suite deals more with virtualization, VMs and database accessing.  As my perspective is usually from performance and utility, the review of this motherboard will be based around my history and perspective.  As I mentioned previously, this product is primarily B2B (business to business) rather than B2C (business to consumer), however from a home build standpoint, it offers an alternative to the two main Sandy Bridge-E based Xeon home-build workstation products in the market – the ASUS Z9PE-D8 WS and the EVGA SR-X.  Hopefully we will get these other products in as comparison points for you.

Gigabyte GA-7PESH1 Visual Inspection, Board Features
Comments Locked

64 Comments

View All Comments

  • mayankleoboy1 - Saturday, January 5, 2013 - link

    Ian :

    How much difference do you think Xeon Phi will make in these very different type of Computations?
    Will buying a Xeon Phi "pay itself out" as you said in the above comments ? (or is xeon phi linux only ?)
  • IanCutress - Saturday, January 5, 2013 - link

    As far as we know, Xeon Phi will be released for Linux only to begin with. I have friends who have been able to play with them so far, and getting 700 GFlops+ in DGEMM in double precision.

    It always comes down to the algorithm with these codes. It seems that if you have single precision code that doesn't mind being in a 2P system, then the GPU route may be preferable. If not, then Phi is an option. I'm hoping to get my hands on one inside H1 this year. I just have to get my hands dirty with Linux as well.

    In terms of the codes used here, if I were to guess, the Implicit Finite Difference would probably benefit a lot from Xeon Phi if it works the way I hope it does.

    Ian
  • mayankleoboy1 - Saturday, January 5, 2013 - link

    Rather stupid question, but have you tried using PGO builds ?
    Also, do you build the code with the default optimizations, or use the MSVC equivalent switch of -O2 ?
  • IanCutress - Saturday, January 5, 2013 - link

    Using Visual Studio 2012, all the speed optimisations were enabled including /GL, /O2, /Ot and /fp:fast. For each part I analysed the sections which took the most time using the Performance Analysis tools, and tried to avoid the long memory reads. Hence the Ex-FD uses an iterative loading which actually boosts speed by a good 20-30% than without it.

    Ian
  • Klimax - Sunday, January 6, 2013 - link

    Interesting. Why not Ox (all optimisations on)

    BTW: Do you have access to VTune?
  • IanCutress - Wednesday, January 9, 2013 - link

    In case /Ox performs an optimisation for memory over speed in an attempt to balance optimisations. As speed is priority #1, it made more sense to me to optimise for that only. If VS2012 gave more options, I'd adjust accordingly.

    Never heard of VTune, but I did use the Performance Analysis tools in VS2012 to optimise certain parts of the code.

    Ian
  • Beenthere - Saturday, January 5, 2013 - link

    Business and mobo makers do not use 2P mobos to get high benches or performance bragging rights per se. These systems are build for bullet-proof reliability and up time. It does no good for a mobo/system to be 3% faster if it crashes while running a month long analysis. These 2P mobos are about 100% reliability, something rarely found in a enthusiasts mobo.

    Enterprise mobos are rarely sold by enthusiast marketeers. Newegg has a few enterprise mobos listed primarily because they have started a Newegg Biz website to expand their revenue streams. They don't have much in the line of true enterprise hardware however. It's a token offering because manufacturers are not likely to support whoring of the enterprise market lest they lose all of their quality vendors who provide customer technical product support.
  • psyq321 - Sunday, January 6, 2013 - link

    Actually, ASUS Z9PE-D8 WS allows for some overclocking capabilities.

    CPU overclocking with 2P/4P Xeon E5 (2600/4600 sequence) is a no-go because Intel explicitly did not store proper ICC data so it is impossible to manipulate BCLK meaningfully (set the different ratios). Oh, and the multipliers are locked :)

    However, Z9PE D8 WS allows memory overclocking - I managed to run 100% 24/7 stable with the Samsung ECC 1600 DDR3 "low voltage" RAM (16 GB sticks) - just switching memory voltage from 1.35v to 1.55v allows overclocking memory from 1600 MHz to 2133 MHz.

    Why would anyone want to do that in a scientific or b2b environment? The only usage I can see are applications where memory I/O is the biggest bottleneck. Large-scale neural simulations are one of such applications, and getting 10 GB/s more of memory I/O can help a lot - especially if stable.

    Also, low-latency trading applications are known to benefit from overclocked hardware and it is, in fact, used in production environment.

    Modern hardware does tend to have larger headrooms between the manufacturer's operating point and the limits - if the benefit from an overclock is more benefitial than work invested to find the point where the results become unstable - and, of course, shorter life span of the hardware - then, it can be used. And it is used, for example in some trading scenarios.
  • Drazick - Saturday, January 5, 2013 - link

    Will You, Please, Update Your Google+ Page?

    It would be much easier to follow you there.
  • Ryan Smith - Saturday, January 5, 2013 - link

    Our Google+ page is just a token page. If you wish to follow us then your best option is to follow our RSS feeds.

Log in

Don't have an account? Sign up now