As part of this review, we also ran our normal motherboard benchmarks.

WinRAR x64 3.93 - link

With 64-bit WinRAR, we compress the set of files used in the USB speed tests. WinRAR x64 3.93 attempts to use multithreading when possible, and provides as a good test for when a system has variable threaded load.  If a system has multiple speeds to invoke at different loading, the switching between those speeds will determine how well the system will do.

WinRar x64 3.93

WinRAR is another example where enabling HyperThreading is actually hurting the throughput of the system. But even with all 32 threads in the system, the lack of memory speed hurts the benchmark.

FastStone Image Viewer 4.2 - link

FastStone Image Viewer is a free piece of software I have been using for quite a few years now.  It allows quick viewing of flat images, as well as resizing, changing color depth, adding simple text or simple filters.  It also has a bulk image conversion tool, which we use here.  The software currently operates only in single-thread mode, which should change in later versions of the software.  For this test, we convert a series of 170 files, of various resolutions, dimensions and types (of a total size of 163MB), all to the .gif format of 640x480 dimensions.

FastStone Image Viewer 4.2

FastStone is relatively unaffected due to the single-threaded nature of the program.

Xilisoft Video Converter

With XVC, users can convert any type of normal video to any compatible format for smartphones, tablets and other devices.  By default, it uses all available threads on the system, and in the presence of appropriate graphics cards, can utilize CUDA for NVIDIA GPUs as well as AMD APP for AMD GPUs.  For this test, we use a set of 33 HD videos, each lasting 30 seconds, and convert them from 1080p to an iPod H.264 video format using just the CPU.  The time taken to convert these videos gives us our result.

Xilisoft Video Converter 7

With XVC having many threads is what wins the day, and having HT enabled made the process very fast indeed.  With HT on, we have 32 threads, meaning most of the videos were actually converted very quickly – the final 33rd video caused an extra delay at the end.  This is yet another example of an algorithm that can be ported to GPUs, as XVC offers both an AMD and NVIDIA option for conversion.

x264 HD Benchmark

The x264 HD Benchmark uses a common HD encoding tool to process an HD MPEG2 source at 1280x720 at 3963 Kbps.  This test represents a standardized result which can be compared across other reviews, and is dependant on both CPU power and memory speed.  The benchmark performs a 2-pass encode, and the results shown are the average of each pass performed four times.

x264 HD Pass 1

x264 HD Pass 2

In contrast to XVC, which splits its threads across many files, the x264 HD benchmark splits threads across one file.  As a result it seems that having HT off gives a subtle 13.7% boost in performance in the first pass and 11.0% boost in the second pass.  The results of the first pass makes the second pass a lot more efficient across all the threads due to fewer memory accesses.

n-Body Simulations System Benchmarks
Comments Locked

64 Comments

View All Comments

  • mayankleoboy1 - Saturday, January 5, 2013 - link

    Ian :

    How much difference do you think Xeon Phi will make in these very different type of Computations?
    Will buying a Xeon Phi "pay itself out" as you said in the above comments ? (or is xeon phi linux only ?)
  • IanCutress - Saturday, January 5, 2013 - link

    As far as we know, Xeon Phi will be released for Linux only to begin with. I have friends who have been able to play with them so far, and getting 700 GFlops+ in DGEMM in double precision.

    It always comes down to the algorithm with these codes. It seems that if you have single precision code that doesn't mind being in a 2P system, then the GPU route may be preferable. If not, then Phi is an option. I'm hoping to get my hands on one inside H1 this year. I just have to get my hands dirty with Linux as well.

    In terms of the codes used here, if I were to guess, the Implicit Finite Difference would probably benefit a lot from Xeon Phi if it works the way I hope it does.

    Ian
  • mayankleoboy1 - Saturday, January 5, 2013 - link

    Rather stupid question, but have you tried using PGO builds ?
    Also, do you build the code with the default optimizations, or use the MSVC equivalent switch of -O2 ?
  • IanCutress - Saturday, January 5, 2013 - link

    Using Visual Studio 2012, all the speed optimisations were enabled including /GL, /O2, /Ot and /fp:fast. For each part I analysed the sections which took the most time using the Performance Analysis tools, and tried to avoid the long memory reads. Hence the Ex-FD uses an iterative loading which actually boosts speed by a good 20-30% than without it.

    Ian
  • Klimax - Sunday, January 6, 2013 - link

    Interesting. Why not Ox (all optimisations on)

    BTW: Do you have access to VTune?
  • IanCutress - Wednesday, January 9, 2013 - link

    In case /Ox performs an optimisation for memory over speed in an attempt to balance optimisations. As speed is priority #1, it made more sense to me to optimise for that only. If VS2012 gave more options, I'd adjust accordingly.

    Never heard of VTune, but I did use the Performance Analysis tools in VS2012 to optimise certain parts of the code.

    Ian
  • Beenthere - Saturday, January 5, 2013 - link

    Business and mobo makers do not use 2P mobos to get high benches or performance bragging rights per se. These systems are build for bullet-proof reliability and up time. It does no good for a mobo/system to be 3% faster if it crashes while running a month long analysis. These 2P mobos are about 100% reliability, something rarely found in a enthusiasts mobo.

    Enterprise mobos are rarely sold by enthusiast marketeers. Newegg has a few enterprise mobos listed primarily because they have started a Newegg Biz website to expand their revenue streams. They don't have much in the line of true enterprise hardware however. It's a token offering because manufacturers are not likely to support whoring of the enterprise market lest they lose all of their quality vendors who provide customer technical product support.
  • psyq321 - Sunday, January 6, 2013 - link

    Actually, ASUS Z9PE-D8 WS allows for some overclocking capabilities.

    CPU overclocking with 2P/4P Xeon E5 (2600/4600 sequence) is a no-go because Intel explicitly did not store proper ICC data so it is impossible to manipulate BCLK meaningfully (set the different ratios). Oh, and the multipliers are locked :)

    However, Z9PE D8 WS allows memory overclocking - I managed to run 100% 24/7 stable with the Samsung ECC 1600 DDR3 "low voltage" RAM (16 GB sticks) - just switching memory voltage from 1.35v to 1.55v allows overclocking memory from 1600 MHz to 2133 MHz.

    Why would anyone want to do that in a scientific or b2b environment? The only usage I can see are applications where memory I/O is the biggest bottleneck. Large-scale neural simulations are one of such applications, and getting 10 GB/s more of memory I/O can help a lot - especially if stable.

    Also, low-latency trading applications are known to benefit from overclocked hardware and it is, in fact, used in production environment.

    Modern hardware does tend to have larger headrooms between the manufacturer's operating point and the limits - if the benefit from an overclock is more benefitial than work invested to find the point where the results become unstable - and, of course, shorter life span of the hardware - then, it can be used. And it is used, for example in some trading scenarios.
  • Drazick - Saturday, January 5, 2013 - link

    Will You, Please, Update Your Google+ Page?

    It would be much easier to follow you there.
  • Ryan Smith - Saturday, January 5, 2013 - link

    Our Google+ page is just a token page. If you wish to follow us then your best option is to follow our RSS feeds.

Log in

Don't have an account? Sign up now