Professional Performance: Windows

Agisoft Photoscan – 2D to 3D Image Manipulation: link

Agisoft Photoscan creates 3D models from 2D images, a process which is very computationally expensive. The algorithm is split into four distinct phases, and different phases of the model reconstruction require either fast memory, high IPC, more cores, or even OpenCL compute devices to hand. Agisoft supplied us with a special version of the software to script the process, where we take 50 images of a stately home and convert them into a medium quality model. This benchmark typically takes around 15-20 minutes on a high-end PC on the CPU alone, with GPUs reducing the time.

Agisoft Photoscan 1.0.0

On paper, Photoscan should offer more opportunities for faster memory to make a difference. However, the most memory-dependent stage (stage 3) turns out to be a small part of the overall calculation, and its effect was absorbed by the natural variation in the larger stages, giving at most a 1.1% difference between times.
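To make the "absorbed by natural variation" reasoning concrete, here is a minimal sketch of how a difference between two memory kits could be compared against run-to-run spread. The function names and the timings are hypothetical placeholders chosen for illustration; they are not our measured data or the script we used.

```python
from statistics import median


def percent_difference(baseline, value):
    """Percent change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


def run_spread_percent(times):
    """Run-to-run spread: (max - min) as a percentage of the median."""
    return (max(times) - min(times)) / median(times) * 100.0


def is_within_noise(baseline_runs, test_runs):
    """True when the gap between medians is smaller than the larger spread."""
    gap = abs(percent_difference(median(baseline_runs), median(test_runs)))
    noise = max(run_spread_percent(baseline_runs), run_spread_percent(test_runs))
    return gap < noise


# Hypothetical timings in seconds, for illustration only (not measured data).
jedec_runs = [1000.0, 1010.0, 990.0]
fast_runs = [995.0, 985.0, 1005.0]

if __name__ == "__main__":
    gap = percent_difference(median(jedec_runs), median(fast_runs))
    print(f"median gap: {gap:.2f}%")
    print(f"within noise: {is_within_noise(jedec_runs, fast_runs)}")
```

Comparing medians rather than single runs is one reasonable choice here because a median is less sensitive to an occasional slow run; other spread measures, such as standard deviation, would work just as well.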

Cinebench R15

Cinebench R15 - Single Thread

Cinebench R15 - MultiThread

Cinebench is historically CPU dependent, giving a 2% difference from JEDEC to peak results.

3D Particle Movement

3DPM is a self-penned benchmark, taking basic 3D movement algorithms used in Brownian Motion simulations and testing them for speed. High floating point performance, MHz and IPC win in the single thread version, whereas the multithread version has to handle the threads and loves more cores.
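For readers unfamiliar with this kind of workload, the sketch below shows one simple way to move particles in 3D with random unit-length steps, which is the general flavor of a Brownian Motion style simulation. This is an assumption-laden illustration and not the 3DPM source: the function names, the step model, and the sampling method are all hypothetical.

```python
import math
import random


def random_unit_vector(rng):
    """Return a uniformly distributed direction on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)             # cosine of the polar angle
    phi = rng.uniform(0.0, 2.0 * math.pi)  # azimuthal angle
    r = math.sqrt(1.0 - z * z)             # radius of the circle at height z
    return r * math.cos(phi), r * math.sin(phi), z


def move_particles(n_particles, n_steps, seed=0):
    """Move each particle n_steps unit-length steps in random directions."""
    rng = random.Random(seed)
    positions = [(0.0, 0.0, 0.0)] * n_particles
    for _ in range(n_steps):
        moved = []
        for x, y, z in positions:
            dx, dy, dz = random_unit_vector(rng)
            moved.append((x + dx, y + dy, z + dz))
        positions = moved
    return positions


if __name__ == "__main__":
    # Hypothetical sizes chosen only to keep the example quick to run.
    final = move_particles(n_particles=1000, n_steps=100, seed=42)
    print(f"moved {len(final)} particles")
```

Each step is dominated by floating point math (a square root, a sine and a cosine) with very little data to fetch, which is consistent with this style of test favoring core speed over memory speed.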

3D Particle Movement: Single Threaded

3D Particle Movement: MultiThreaded

3DPM is also relatively memory agnostic for DDR4 on Haswell-E, showing that DDR4-2133 is good enough.

Professional Performance: Linux

Built around several freely available benchmarks for Linux, Linux-Bench is a project spearheaded by Patrick at ServeTheHome to streamline about a dozen of these tests in a single neat package run via a set of three commands using an Ubuntu 14.04 LiveCD. These tests include fluid dynamics used by NASA, ray-tracing, molecular modeling, and a scalable data structure server for web deployments. We run Linux-Bench and have chosen to report a select few of the tests that rely on CPU and DRAM speed.

C-Ray: link

C-Ray is a simple ray-tracing program that focuses almost exclusively on processor performance rather than DRAM access. The test in Linux-Bench renders a heavy complex scene offering a large scalable scenario.

Linux-Bench c-ray 1.1 (Hard)

Natural variation gives a 4% difference, although the faster and denser memory gave slower times.

NAMD, Scalable Molecular Dynamics: link

Developed by the Theoretical and Computational Biophysics Group at the University of Illinois at Urbana-Champaign, NAMD is a set of parallel molecular dynamics codes for extreme parallelization up to and beyond 200,000 cores. The reference paper detailing NAMD has over 4000 citations, and our testing runs a small simulation where the calculation steps per unit time is the output vector.

Linux-Bench NAMD Molecular Dynamics

NAMD showed little difference between our memory kits, peaking at 0.7% above JEDEC.

NPB, Fluid Dynamics: link

Aside from LINPACK, there are many other ways to benchmark supercomputers in terms of how effective they are for various types of mathematical processes. The NAS Parallel Benchmarks (NPB) are a set of small programs originally designed for NASA to test their supercomputers in terms of fluid dynamics simulations, useful for airflow reactions and design.

Linux-Bench NPB Fluid Dynamics

Despite the 4x8 GB results going south of the border, the faster memory does give a slight difference in NPB, peaking at 4.3% increased performance for the 3000+ memory kits.

Redis: link

Many online applications rely on key-value caches and data structure servers to operate. Redis is an open-source, scalable web technology with a strong developer base, and it relies heavily on memory bandwidth as well as CPU performance.

Linux-Bench Redis Memory-Key Store, 100x

When tackling a high number of users, Redis performs up to 17% better with 2800+ memory, which is our best benchmark result.

120 Comments

  • wyewye - Sunday, February 8, 2015 - link

    Extremely weak review.

    Ian, is this your first memory review?
    Everyone knows that in real world apps the difference is small. What's the point of showing a gazillion charts with 1% differences? You had way more random noise from the test errors; those numbers are meaningless.
    For memory, synthetic tests are the only way.

    Thumbs down, bring back Anand for decent reviews.
  • wyewye - Sunday, February 8, 2015 - link

    @Ian
    ProTip: when the differences are small and you get obviously wrong results like 2800@cl14 slower than 2133@cl16, run 10 or 20 tests, eliminate spikes and compute the median.
  • wyewye - Sunday, February 8, 2015 - link

    Ian stop being sloppy and do a better job next time!
  • Oxford Guy - Sunday, February 8, 2015 - link

    "Moving from a standard DDR3-2133 C11 kit to DDR4-2133 C15, just by looking at the numbers, feels like a downgrade despite what the rest of the system is."

    Sure... let's just ignore the C10 and C9 DDR3 that's available to make DDR4 look better?
  • eanazag - Monday, February 9, 2015 - link

    Why not post some RAM disk numbers?

    What I saw in the article is that the cheapest, high capacity made the most sense for my dollar.
  • SFP1977 - Tuesday, February 10, 2015 - link

    Am I missing something, or how did they overcome the fact that their 2011 test processor has 4 memory lanes while that 1150 processor has only 2?
  • deanp0219 - Wednesday, February 11, 2015 - link

    Great article, but in fairness, you're comparing the first run of DDR4 modules against very well developed and evolved DDR3 modules. When DDR3 was first released, I'll bet some of the high-end DDR2 modules available at the time matched up with them fairly well. We'll have to see where DDR4 technology goes from here. Again, great read though. Totally not a reflection on the article -- nothing you can do about the state of the tech. Made me feel better about my DDR3-2133 machine!
  • MattMe - Friday, July 10, 2015 - link

    Am I right in thinking that the benefits of DDR4 outside of power consumption could well be in scenarios where integrated graphics are being utilised?

    The additional channels and clock speeds are more likely to have an effect there than with an external GPU, I would assume. But we're still yet to see any DDR4L in the consumer market (as far as I'm aware), its most beneficial area.

    Seeing some benchmarks including integrated graphics would be very interesting, especially in smaller, lower powered systems like a NUC or similar.
  • LorneK - Monday, October 5, 2015 - link

    My gripe with Cinebench as a "professional" test is that aside from tracing rays, it in no way resembles the kind of rendering that an actual professional would be doing.

    There's hardly any geometry, hardly any textures, no displacement, no advanced lighting models, etc.

    So yeah, DDR4 makes barely any impact in Cinebench, but I have to wonder how much of that is due to Cinebench requiring almost nothing from RAM in general.

    Someone needs to come along and make a truly useful rendering benchmark. A complex scene with millions of polygons, gigs of textures, global illumination, glossy reflections, the works basically.

    Only then can we actually know what various aspects of a machine's hardware are affecting.

    An amazing SSD would reduce initial scene spool up time. Fast single thread performance would also increase render start times. Beefy RAM configs would be better at feeding the CPUs the multiple GBs needed to do the job. And the render tiles would take long enough to complete that a 72 thread Xeon box isn't wasting half its resources simply moving from tile to tile and rendering microscopic regions.
  • Zerung - Tuesday, February 9, 2016 - link

    My Asus Mobo notes the following:
    'Due to Intel® chipset limitation, DDR4 2133 MHz and higher memory modules on XMP mode will run at the maximum transfer rate of DDR4 2133 Mhz'. Does this mean that running the DDR4 3400 CL16 may not give me the latency below 10?
    Thanks
