DDR4 vs DDR3L on the CPU

One of the big questions when DDR4 was launched was around the comparison to DDR3. Was it better, was it worse? DDR4 by default switches down to an operating voltage of 1.2 volts from 1.5 volts, making it more power efficient, and the standard increases the maximum capacity on an unbuffered memory module. There are also some other enhancements such as per-IC voltage drop control and a design to aid DRAM placement in motherboards. But there was one big scary number – a CAS Latency of 15 (known as C15 or CL15).

Let’s do a quick memory recap on frequency (technically, transfer rate but used interchangeably for this purpose) against latency.

The CAS latency is the number of clock cycles between the memory controller issuing a column read command and the requested data becoming available. So a CL of 15 means that there are 15 clocks between that request and getting access. Generally, a lower CL is better.

The frequency is the rate at which those clocks occur. DDR stands for Double Data Rate, which means that in each clock cycle there are two transfers, one each on the rise and fall of the clock signal. This is why the quoted transfer rate (the 1600 in DDR3-1600) is double the actual clock frequency (800 MHz). The reciprocal of the clock frequency (one divided by the frequency) is the time taken to perform a clock.

But the important thing here is that the latency is a number of clocks and thus is just a number, and the frequency determines how fast these clocks go. So on its own the CAS Latency value doesn’t say much. The important metric is when the two are used together: the true latency is the CAS Latency multiplied by the time taken per clock, and here’s a table of values from Crucial’s recent whitepaper on the subject:

So here we have the values for True Latency:

DDR3-1600 C11: 13.75 nanoseconds
DDR4-2133 C15: 14.06 nanoseconds
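
As a quick sanity check, the arithmetic behind these two figures can be sketched in a few lines of Python. This is a minimal sketch rather than anything taken from Crucial's whitepaper: the function name true_latency_ns is my own, and it assumes the number in the kit name is the transfer rate in MT/s, so the clock runs at half that rate.

```python
def true_latency_ns(transfer_rate_mts: float, cas_latency: int) -> float:
    """Return the CAS latency of a DDR kit in nanoseconds.

    Assumes transfer_rate_mts is the rating from the kit name
    (e.g. 1600 for DDR3-1600), given in mega-transfers per second.
    """
    # DDR moves data twice per clock, so the clock is half the transfer rate.
    clock_mhz = transfer_rate_mts / 2
    # One clock period in nanoseconds is 1000 divided by the clock in MHz.
    clock_period_ns = 1000 / clock_mhz
    return cas_latency * clock_period_ns


for name, rate, cl in [("DDR3-1600 C11", 1600, 11), ("DDR4-2133 C15", 2133, 15)]:
    print(f"{name}: {true_latency_ns(rate, cl):.2f} nanoseconds")
```

Run as written, this reproduces the 13.75 ns and 14.06 ns values listed above.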

In fact, despite the development of new memory interfaces, the true latency for DRAM under default specifications has stayed roughly the same since the original DDR. As we make faster memory modules, the CAS Latency rises to keep higher frequency memory stable, but overall the true latency stays the same.

Normally in our DRAM reviews I refer to the performance index, which has a similar effect in gauging general performance:

DDR3-1600 C11: 1600/11 = 145.5
DDR4-2133 C15: 2133/15 = 142.2

Faster memory gives a bigger number, and reducing the CL gives a bigger number too. Thus when comparing memory kits, if the difference in performance index is greater than 10, the kit with the larger index tends to win out, though for kits with similar indices the one with the higher frequency is preferred.
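
The rule of thumb above can be written down as a short Python sketch. The names performance_index and preferred_kit are hypothetical helpers of my own, and the threshold of 10 is simply the figure quoted in the text, so treat this as an illustration of the reasoning rather than a formal method.

```python
def performance_index(transfer_rate_mts: float, cas_latency: int) -> float:
    """Performance index as used in the text: transfer rate divided by CL."""
    return transfer_rate_mts / cas_latency


def preferred_kit(kit_a, kit_b, threshold=10.0):
    """Pick between two kits, each given as (name, transfer_rate, cas_latency).

    If the indices differ by more than the threshold, the larger index wins;
    otherwise the kit with the higher transfer rate is preferred.
    """
    pi_a = performance_index(kit_a[1], kit_a[2])
    pi_b = performance_index(kit_b[1], kit_b[2])
    if abs(pi_a - pi_b) > threshold:
        return kit_a if pi_a > pi_b else kit_b
    return kit_a if kit_a[1] >= kit_b[1] else kit_b


ddr3 = ("DDR3-1600 C11", 1600, 11)
ddr4 = ("DDR4-2133 C15", 2133, 15)

for name, rate, cl in (ddr3, ddr4):
    print(f"{name}: PI = {performance_index(rate, cl):.1f}")

print("Preferred:", preferred_kit(ddr3, ddr4)[0])
```

For these two kits the indices are only about 3 apart, so the rule falls through to the frequency comparison and picks DDR4-2133 C15. It is also worth noting why the index tracks true latency so closely: the true latency in nanoseconds is the CL multiplied by 2000 and divided by the transfer rate, which is just 2000 divided by the performance index.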

“But who uses DDR3-1600 C11? Isn’t most memory like DDR3-1866 C9?”

This is a valid point: as DDR3 has matured, kits running faster than default specifications have become the norm. The performance index for this kit is:

DDR3-1866 C9: 1866/9 = 207.3

In the grand scheme of things, a PI of 207 is actually quite large, and super high for DDR3L. There are a few DDR3 memory kits that go beyond this up to a PI of 220, and an overclock beyond normal voltages might reach 240, but a value of 207 shows the maturity of the DDR3 market. If we look at the current DDR4 market, we can pick up kits with DDR4-3000 C15 ratings, which are similarly in the 200 bracket now too.

I’ve prefaced our DDR3L vs DDR4 testing with all this as a response to ‘large CL = bad’. Actually, you have to compare both numbers. Now that we have a platform that runs both, and were able to source a beta DDR3L/DDR4 combination motherboard to test them on, we can see how ‘regular DDR4’ squares up against ‘high performance DDR3(L)’.

For these tests, both sets of numbers were run with the CPU at 3.0 GHz and hyperthreading disabled. Memory speeds were DDR4-2133 C15 and DDR3-1866 C9 respectively.

Dolphin Benchmark

Many emulators are bound by single thread CPU performance, and general reports tended to suggest that Haswell provided a significant boost to emulator performance. This benchmark runs a Wii program that raytraces a complex 3D scene inside the Dolphin Wii emulator. Performance on this benchmark is a good proxy of the speed of Dolphin CPU emulation, which is an intensive single core task using most aspects of a CPU. Results are given in minutes (lower is better), where the Wii itself scores 17.53 minutes.

Dolphin Emulation Benchmark

Cinebench R15

Cinebench is a benchmark based around Cinema 4D, and is fairly well known among enthusiasts for stressing the CPU for a provided workload. Results are given as a score, where higher is better.

Cinebench R15 - Single Threaded

Cinebench R15 - Multi-Threaded

Point Calculations – 3D Movement Algorithm Test

3DPM is a self-penned benchmark, taking basic 3D movement algorithms used in Brownian Motion simulations and testing them for speed. High floating point performance, MHz and IPC win in the single thread version, whereas the multithread version has to handle the threads and loves more cores. For a brief explanation of the platform agnostic coding behind this benchmark, see my forum post here.

3D Particle Movement: Single Threaded

3D Particle Movement: MultiThreaded

Compression – WinRAR 5.0.1

Our WinRAR test from 2013 is updated to the latest version of WinRAR at the start of 2014. We compress a set of 2867 files across 320 folders totaling 1.52 GB in size – 95% of these files are small typical website files, and the rest (90% of the size) are small 30 second 720p videos.

WinRAR 5.01, 2867 files, 1.52 GB

Image Manipulation – FastStone Image Viewer 4.9

Similarly to WinRAR, the FastStone test is updated for 2014 to the latest version. FastStone is the program I use to perform quick or bulk actions on images, such as resizing, adjusting for color and cropping. In our test we take a series of 170 images in various sizes and formats and convert them all into 640x480 .gif files, maintaining the aspect ratio. FastStone does not use multithreading for this test, and thus single threaded performance is often the winner.

FastStone Image Viewer 4.9

Video Conversion – Handbrake v0.9.9

Handbrake is a media conversion tool that was initially designed to help convert DVD ISOs and Video CDs into more common video formats. The principle today is still the same, primarily as an output for H.264 + AAC/MP3 audio within an MKV container. In our test we use the same videos as in the Xilisoft test, and results are given in frames per second.

HandBrake v0.9.9 LQ Film

HandBrake v0.9.9 2x4K

Rendering – PovRay 3.7

The Persistence of Vision RayTracer, or PovRay, is a freeware package for, as the name suggests, ray tracing. It is a pure renderer, rather than modeling software, but the latest beta version contains a handy benchmark for stressing all processing threads on a platform. We have been using this test in motherboard reviews to test memory stability at various CPU speeds to good effect: if it passes the test, the IMC in the CPU is stable for a given CPU speed. As a CPU test, it runs for approximately 2-3 minutes on high end platforms.

POV-Ray 3.7 Beta RC4

Synthetic – 7-Zip 9.2

As an open source compression tool, 7-Zip is a popular tool for making sets of files easier to handle and transfer. The software offers up its own benchmark, for which we report the result.

7-zip Benchmark

Overall: DDR4 vs DDR3L on the CPU

Pretty sure the results speak for themselves:

Comparing default DDR4 to a high performance DDR3 memory kit is almost an equal contest. Having the faster frequency helps for large frame video encoding (HandBrake HQ) as well as WinRAR which is normally memory intensive. The only real benchmark loss was FastStone, which regressed by one second (out of 48 seconds).

The end result, looking at the CPU test scores, is that upgrading to DDR4 doesn’t degrade performance from your high end DRAM kit, and you get the added benefits of future upgrades, faster speeds, higher density modules, and lower power consumption due to the lower voltage.


476 Comments

  • CaedenV - Wednesday, August 5, 2015

    Agreed, seems like the only way to get a real performance boost is to up the core count rather than waiting for dramatically more powerful single-core parts to hit the market.
  • kmmatney - Wednesday, August 5, 2015

    If you have an overclocked Sandy Bridge, it seems like a lot of money to spend (new motherboard and memory) for a 30% gain in speed. I personally like to upgrade my GPU and CPU when I can get close to double the performance of the previous hardware. It's a nice improvement here, but nothing earth-shattering - especially considering you need a new motherboard and memory.
  • Midwayman - Wednesday, August 5, 2015

    And right as dx12 is hitting as well. That sandy bridge may live a couple more generations if dx12 lives up to the hype.
  • freaqiedude - Wednesday, August 5, 2015

    Agreed, I really don't see the point of spending money for a 30% speed bump in general (as it's not that much) when the benefit in games is barely a few percent, and my other workloads are fast enough as is.

    If Intel would release a mainstream hexa/octa core I would be all over that, as the things I do that are heavy are all SIMD and thus fully multithreaded, but I can't justify a new PC for 25% extra performance in some areas. With CPU performance becoming less and less relevant for games, that at least is no reason for me to upgrade...
  • Xenonite - Thursday, August 6, 2015

    "If Intel would release a mainstream hexa/octa core I would be all over that, as the things I do that are heavy are all SIMD and thus fully multithreaded, but I can't justify a new PC for 25% extra performance in some areas."

    SIMD actually has absolutely nothing to do with multithreading. SIMD refers to instruction-level parallelism, and all that has to be done to make use of it, for a well-coded app, is to recompile with the appropriate compiler flag. If the apps you are interested in have indeed been SIMD optimised, then the new AVX and AVX2 instructions have the potential to DOUBLE your CPU performance. Even if your application has been carefully designed with multi-threading in mind (which very few developers can, let alone are willing to, do) the move from a quad core to a hexa core CPU will yield a best-case performance increase of less than 50%, which is less than half what AVX and AVX2 brings to the table (with AVX-512 having the potential to again provide double the performance of AVX/AVX2).

    Unfortunately it seems that almost all developers simply refuse to support the new AVX instructions, with most apps being compiled for >10 year old SSE or SSE2 processors.

    If someone actually tried, these new processors (actually Haswell and Broadwell too) could easily provide double the performance of Sandy Bridge on integer workloads. When compared to the 900-series Nehalem-based CPUs, the increase would be even greater and applicable to all workloads (integer and floating point).
  • boeush - Thursday, August 6, 2015

    Right, and wrong. SIMD are vector based calculations. Most code and algorithms do not involve vector math (whether FP or integer). So compiling with or without appropriate switches will not make much of a difference for the vast majority of programs. That's not to say that certain specialized scenarios can't benefit - but even then you still run into a SIMD version of Amdahl's Law, with speedup being strictly limited to the fraction of the code (and overall CPU time spent) that is vectorizable in the first place. Ironically, some of the best vectorizable scenarios are also embarrassingly parallel and suitable to offloading on the GPU (e.g. via OpenCL, or via 3D graphics APIs and programmable shaders) - so with that option now widely available, technologically mature, and performant well beyond any CPU's capability, the practical utility of SSE/AVX is diminished even further. Then there is the fact that a compiler is not really intelligent enough to automatically rewrite your code for you to take good advantage of AVX; you'd actually have to code/build against hand-optimized AVX-centric libraries in the first place. And lastly, AVX 512 is available only on Xeons (Knights Landing Phi and Skylake) so no developer targeting the consumer base can take advantage of AVX 512.
  • Gonemad - Wednesday, August 5, 2015

    I'm running an i7 920 and was asking myself the same thing, since I'm getting near 60-ish FPS on GTA 5 with everything on at 1080p (more like 1920 x 1200), running with a R9 280. It seems the CPU would be holding the GFX card back, but not on GTA 5.

    Warcraft - who could have guessed - is getting abysmal 30 FPS just standing still in the Garrison. However, system resources shows GFX card is being pushed, while the CPU barely needs to move.

    I was thinking perhaps the multicore incompatibility on Warcraft would be an issue, but then again the evidence I have shows otherwise. On the other hand, GTA 5, that was created in the multicore era, runs smoothly.

    Either I have an aberrant system, or some i7 920 era benchmarks could help me understand what exactly do I need to upgrade. Even specific Warcraft behaviour on benchmarks could help me, but I couldn't find any good decisive benchmarks on this Blizzard title... not recently.
  • Samus - Wednesday, August 5, 2015

    The problem now with nehalem and the first gen i7 in general isn't the CPU, but the x58 chipset and its outdated PCI express bus and quickpath creating a bottleneck. The triple channel memory controller went mostly unsaturated because of the other chipset bottlenecks which is why it was dropped and (mostly) never reintroduced outside of enthusiast x99 quad channel interface.

    For certain applications the i7 920 is, amazingly, still competitive today, but gaming is not one of them. An SLI GTX 570 configuration saturates the bus, I found out first hand that is about the most you can get out of the platform.
  • D. Lister - Thursday, August 6, 2015

    Well said. The i7 9xx series had a good run, but now, as an enthusiast/gamer in '15, you wouldn't want to go any lower than Sandy Bridge.
  • vdek - Thursday, August 6, 2015

    I'm still running my x58 motherboard. I ended up upgrading to a Xeon 5650 for $75, which is a 6 core 32nm CPU compatible with the x58. Overclocked at 4.2 GHz on air, the thing has excellent gaming performance, and I see absolutely no reason to upgrade to Skylake.
