Professional Performance: Windows

Agisoft Photoscan – 2D to 3D Image Manipulation

Agisoft Photoscan creates 3D models from 2D images, a computationally expensive process. The algorithm is split into four distinct phases, and the different phases of the reconstruction call on fast memory, fast IPC, more cores, or even OpenCL compute devices at different points. Agisoft supplied us with a special version of the software to script the process, where we take 50 images of a stately home and convert them into a medium quality model. This benchmark typically takes around 15-20 minutes on a high-end PC using the CPU alone, with GPUs reducing that time.
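The scripted benchmark is essentially four timed stages run back to back. As a rough illustration of that structure only, the sketch below times each stage in turn; the stage functions are placeholders, not Photoscan's actual scripting API.

```python
import time

# Placeholder stage functions standing in for Photoscan's real reconstruction
# stages; the actual benchmark drives Photoscan's own scripting interface.
def align_photos():       time.sleep(0.1)  # feature detection and matching
def build_dense_cloud():  time.sleep(0.1)  # depth maps -> dense point cloud
def build_mesh():         time.sleep(0.1)  # surface reconstruction
def build_texture():      time.sleep(0.1)  # texture projection

def run_stage(name, fn):
    """Run one stage and report its wall-clock time."""
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    print(f"{name:<28} {elapsed:6.1f} s")
    return elapsed

stages = [
    ("Align photos", align_photos),
    ("Build dense point cloud", build_dense_cloud),
    ("Build mesh", build_mesh),
    ("Build texture", build_texture),
]
total = sum(run_stage(name, fn) for name, fn in stages)
print(f"Total: {total / 60:.2f} min")
```

Because the stages differ so much in length, a memory-sensitive stage that is short overall can disappear into run-to-run variation of the longer stages, which is exactly what we see below.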

Agisoft Photoscan 1.0.0

On paper, Photoscan offers more opportunity for faster memory to make a difference. However, the most memory-dependent stage (stage 3) is a small part of the overall calculation, and its contribution was absorbed by the natural variation in the larger stages, giving at most a 1.1% difference between times.

Cinebench R15

Cinebench R15 - Single Thread

Cinebench R15 - MultiThread

Cinebench has historically been CPU dependent, and here it shows only a 2% difference between the JEDEC settings and the peak result.

3D Particle Movement

3DPM is a self-penned benchmark, taking the basic 3D movement algorithms used in Brownian motion simulations and testing them for speed. High floating-point performance, frequency, and IPC win in the single-threaded version, whereas the multi-threaded version also rewards a high core count and efficient thread handling.
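3DPM itself is not publicly distributed, but a minimal sketch of the idea under our own assumptions (a random-walk step loop per particle, timed once single-threaded and once across worker processes) would look like this:

```python
import random
import time
from multiprocessing import Pool

STEPS = 100_000      # random 3D steps per particle
PARTICLES = 8        # one work unit per particle

def move_particle(seed):
    """Random-walk one particle in 3D: FP-heavy, tiny memory footprint."""
    rng = random.Random(seed)
    x = y = z = 0.0
    for _ in range(STEPS):
        x += rng.gauss(0.0, 1.0)
        y += rng.gauss(0.0, 1.0)
        z += rng.gauss(0.0, 1.0)
    return (x, y, z)

if __name__ == "__main__":
    t0 = time.perf_counter()
    single = [move_particle(i) for i in range(PARTICLES)]    # single-threaded
    t1 = time.perf_counter()
    with Pool() as pool:                                      # one process per core
        parallel = pool.map(move_particle, range(PARTICLES))
    t2 = time.perf_counter()
    print(f"single-thread: {t1 - t0:.2f} s, multi-process: {t2 - t1:.2f} s")
```

The working set per particle is a handful of floats, which is why memory speed barely registers in this kind of test.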

3D Particle Movement: Single Threaded

3D Particle Movement: MultiThreaded

3DPM is also relatively memory agnostic for DDR4 on Haswell-E, showing that DDR4-2133 is good enough.

Professional Performance: Linux

Built around several freely available benchmarks for Linux, Linux-Bench is a project spearheaded by Patrick at ServeTheHome to streamline about a dozen of these tests into a single neat package, run via a set of three commands using an Ubuntu 14.04 LiveCD. These tests include fluid dynamics code used by NASA, ray-tracing, molecular modeling, and a scalable data structure server for web deployments. We run Linux-Bench and have chosen to report a select few of the tests that rely on CPU and DRAM speed.

C-Ray

C-Ray is a simple ray-tracing program that focuses almost exclusively on processor performance rather than DRAM access. The test in Linux-Bench renders a fairly complex scene, offering a large, scalable workload.
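The reason it is so processor-bound is the nature of the work: the hot path is floating-point intersection math over a scene small enough to stay in cache. A minimal sketch of that kind of inner loop (a single ray-sphere test, not C-Ray's actual code):

```python
import math

def ray_sphere_hit(origin, direction, center, radius):
    """Return the nearest hit distance along the ray, or None if it misses.
    Pure floating-point work on a handful of values -- cache-resident."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    lx, ly, lz = ox - center[0], oy - center[1], oz - center[2]
    a = dx * dx + dy * dy + dz * dz
    b = 2.0 * (dx * lx + dy * ly + dz * lz)
    c = lx * lx + ly * ly + lz * lz - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t > 0.0 else None

# One primary ray per pixel against a single sphere: the scene data is tiny,
# so performance is governed by FLOPS and IPC rather than memory bandwidth.
hits = sum(
    ray_sphere_hit((x / 100.0, y / 100.0, -5.0), (0.0, 0.0, 1.0),
                   (0.0, 0.0, 0.0), 1.0) is not None
    for y in range(-100, 100)
    for x in range(-100, 100)
)
print(hits, "of 40000 rays hit the sphere")
```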

Linux-Bench c-ray 1.1 (Hard)

Natural variation accounts for the 4% spread here, although the faster, higher-density memory actually returned slower times.

NAMD, Scalable Molecular Dynamics

Developed by the Theoretical and Computational Biophysics Group at the University of Illinois at Urbana-Champaign, NAMD is a set of parallel molecular dynamics codes designed for extreme parallelization, up to and beyond 200,000 cores. The reference paper detailing NAMD has over 4000 citations, and our testing runs a small simulation where the number of calculation steps per unit time is the output metric.
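In other words, the result reported is throughput: integration steps completed per second. The toy loop below illustrates that metric only (a crude pairwise-force particle integrator, not NAMD's algorithms or code):

```python
import random
import time

# Toy molecular-dynamics loop: N particles with pairwise repulsive forces,
# advanced with a simple symplectic Euler update. The reported number is
# integration steps per second, the same kind of metric Linux-Bench uses.
N, DT, STEPS = 64, 0.001, 200
pos = [[random.uniform(0.0, 10.0) for _ in range(3)] for _ in range(N)]
vel = [[0.0, 0.0, 0.0] for _ in range(N)]

def forces(pos):
    f = [[0.0, 0.0, 0.0] for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):
            d = [pos[i][k] - pos[j][k] for k in range(3)]
            r2 = sum(x * x for x in d) + 1e-9
            s = 1.0 / (r2 * r2)            # crude repulsive force ~ 1/r^4
            for k in range(3):
                f[i][k] += s * d[k]
                f[j][k] -= s * d[k]
    return f

start = time.perf_counter()
for _ in range(STEPS):
    f = forces(pos)
    for i in range(N):
        for k in range(3):
            vel[i][k] += f[i][k] * DT
            pos[i][k] += vel[i][k] * DT
elapsed = time.perf_counter() - start
print(f"{STEPS / elapsed:.1f} steps per second")
```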

Linux-Bench NAMD Molecular Dynamics

NAMD showed little difference between our memory kits, peaking at 0.7% above JEDEC.

NPB, Fluid Dynamics

Aside from LINPACK, there are many other ways to benchmark supercomputers in terms of how effective they are at various types of mathematical processing. The NAS Parallel Benchmarks (NPB) are a set of small programs originally designed at NASA to test its supercomputers on fluid dynamics simulations, useful for airflow modeling and design.

Linux-Bench NPB Fluid Dynamics

Despite the 4x8 GB results dipping below the baseline, the faster memory does make a slight difference in NPB, peaking at a 4.3% performance increase for the DDR4-3000+ memory kits.

Redis

Many online applications rely on key-value caches and data structure servers to operate. Redis is an open-source, scalable data structure server with a big developer base, and it relies heavily on memory bandwidth as well as CPU performance.
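For a flavour of the workload (this assumes a local Redis server and the redis-py client, neither of which is part of Linux-Bench itself), a simple key-value loop looks like this:

```python
import redis

# Assumes a Redis server listening on localhost:6379 and the redis-py client.
r = redis.Redis(host="localhost", port=6379, db=0)

# Simple SET round trips, batched through a pipeline: each operation touches
# little data, but a high request rate stresses both the CPU and the memory
# subsystem on the server side.
pipe = r.pipeline()
for i in range(10_000):
    pipe.set(f"user:{i}", f"session-{i}")
ops = len(pipe.execute())

print(ops, "keys written;", r.get("user:42"))
```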

Linux-Bench Redis Memory-Key Store, 100x

When tackling a high number of users, Redis performs up to 17% better with DDR4-2800 or faster memory, the largest gain in our benchmark suite.

Comments

  • dgingeri - Thursday, February 5, 2015 - link

    Really, what applications use this bandwidth now?

    I'm the admin of a server software test lab, and we've been forced to move to the Xeon E5 v3 platform for some of our software, and it isn't seeing any enhancement from DDR4 either. These are machines and software using 256GB of memory at a time. The steps from Xeon E5 and DDR3 1066 to E5 v2 and DDR3 1333 and then up to the E5 v3 and DDR4 2133 are showing no value whatsoever. We have a couple of workloads where data dedup and throughput are processor intensive and require a lot of memory, but memory bandwidth doesn't show any improvement. However, since Dell is EOLing their R720, under Intel's recommendation, we're stuck moving up to the new platform. So, it's driving up our costs with no increase in performance.

    I would think that if anything would use memory bandwidth, it would be data dedup or storage software. What other apps would see any help from this?
  • Mr Perfect - Thursday, February 5, 2015 - link

    Have you seen the reported reduction in power consumption? With 256GB per machine, it sounds like you should be benefiting from the lower power draw (and lower cooling costs) of DDR4.
  • Murloc - Thursday, February 5, 2015 - link

    depending on the country and its energy prices, the expense to upgrade and the efficiency gains made, you may not even be able to recoup the costs, ever.
    From a green point of view it may be even worse due to embodied energy going to waste depending on what happens to the old server.
  • Mr Perfect - Friday, February 6, 2015 - link

    True, but if you have to buy DDR4 machines because the DDR3 ones are out of production(like the OP), then dropping power and cooling would be a neat side bonus.

    And now, just because I'm curious: If the max DDR4 DIMM is 8GB, and there's 256GB per server, then that's 32 DIMMs. 32 times 1 to 2 watts less per DIMM would be 32 to 64 watts less load on the PSU. If the PSU is 80% efficient, then that should be 38.4 to 76.8 watts less at the wall per machine. Not really spectacular, but then you've also got cooling. If the AC is 80% efficient, that would be 46.08 to 92.16 watts less power to the AC. So in total, the new DDR4 server would cost you (wall draw plus AC draw) 84.48 to 168.96 watts lower load per server versus the discontinued DDR3 ones. Not very exciting if you've only got a couple of them, but I could see large server farms benefiting.

    Anyone know how to work out the KWh and resulting price from electric rates?
  • menting - Friday, February 6, 2015 - link

    100W for an hour straight = 0.1 kWh. If you figure 10-20 cents per kWh, it's about 1-2 cents per hour for a 100W difference. That comes to about $7-$14 per month in bills, provided that 100W is consistent 24/7.
  • menting - Thursday, February 5, 2015 - link

    pattern recognition is one that comes to mind.
  • Murloc - Thursday, February 5, 2015 - link

    physical restraints of light speed? Isn't any minuscule parasitic capacitance way more speed limiting than that?
  • menting - Thursday, February 5, 2015 - link

    there's tons of limiting factors, with capacitance being one of those. But even if you take pains to optimize those, the one factor that nobody can get around is the speed of light.
  • menting - Thursday, February 5, 2015 - link

    i guess i should say speed of electricity in a conductive medium instead of speed of light.
  • retrospooty - Friday, February 6, 2015 - link

    Agreed, if an app required high total bandwidth it would benefit.

    Now see if you can name a few that actually need that.
