CPU Real World Performance

A small note on real world testing versus synthetic testing: because of the way DRAM affects a system, there can be a large disconnect between what we observe in synthetic tests and what we see in real world testing. Synthetic tests are designed to exploit one particular feature, usually in an unrealistic scenario, such as pure memory read speeds or bandwidth numbers. While these are good for exploring the peak potential of a system, their results often do not translate to common prosumer real world tasks as well as CPU speed does. So while spending 10x on memory might show a large improvement in peak bandwidth numbers, users will have to weigh up the real world benefits in order to find the day-to-day difference when going for expensive hardware. Typically the limiting factor might be something else in the system, such as the size of a cache, so with all the will in the world a faster read speed won’t make much difference. As a result, we tend to stick to real world tests for almost all of our testing (with a couple of minor exceptions). Our benchmarks are either derived from areas such as transcoding a film or come from a regular software format such as molecular dynamics running a consistent scene.

HandBrake v0.9.9

For HandBrake, we take two videos (a 2h20 640x266 DVD rip and a 10min double UHD 3840x4320 animation short) and convert them to x264 format in an MP4 container.  Results are given in terms of the frames per second processed, and HandBrake uses as many threads as possible.

HandBrake v0.9.9 LQ Film

HandBrake v0.9.9 HQ Film

The low quality conversion is more reliant on CPU cycles available, while the high resolution conversion seems to have a very slight ~3% benefit moving up to DDR4-3000 memory.

WinRAR 5.01

Our WinRAR test from 2013 is updated to the latest version of WinRAR at the start of 2014. We compress a set of 2867 files across 320 folders totaling 1.52 GB in size – 95% of these files are small typical website files, and the rest (90% of the size) are small 30 second 720p videos.

WinRAR 5.01

The biggest difference was a 5% gain over DDR4-2133 C15, although the gains appeared to be random rather than following memory speed.

FastStone Image Viewer 4.9

FastStone Image Viewer is a free piece of software I have been using for quite a few years now. It allows quick viewing of flat images, as well as resizing, changing color depth, adding simple text or simple filters. It also has a bulk image conversion tool, which we use here. The software currently operates only in single-thread mode, which should change in later versions of the software. For this test, we convert a series of 170 files, of various resolutions, dimensions and types (of a total size of 163MB), all to the .gif format of 640x480 dimensions. Results shown are in seconds, lower is better.

FastStone Image Viewer 4.9

No difference between the memory speeds in FastStone.

x264 HD 3.0 Benchmark

The x264 HD Benchmark uses a common HD encoding tool to process an HD MPEG2 source at 1280x720 at 3963 Kbps. This test represents a standardized result which can be compared across other reviews, and is dependent on both CPU power and memory speed. The benchmark performs a 2-pass encode, and the results shown are the average frame rate of each pass performed four times. Higher is better this time around.

x264 HD 3.0, 1st Pass

x264 HD 3.0, 2nd Pass

The faster memory showed a 2.5% gain on the first pass, but less than a 1% gain in the second pass.
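
As a rough illustration of how figures like these are derived, the sketch below averages the frame rate of each pass over several runs and expresses one memory kit's result as a percentage gain over a baseline. The function names and the sample numbers are hypothetical placeholders, not data from this review, and this is not necessarily how the benchmark itself reports its averages.

```python
from statistics import mean


def pass_averages(runs):
    """Average first-pass and second-pass frame rates over several runs.

    `runs` is a list of (first_pass_fps, second_pass_fps) tuples,
    one tuple per benchmark run.
    """
    first = mean(run[0] for run in runs)
    second = mean(run[1] for run in runs)
    return first, second


def percent_gain(result, baseline):
    """Percentage improvement of `result` over `baseline` (higher is better)."""
    return (result - baseline) / baseline * 100


if __name__ == "__main__":
    # Placeholder numbers for illustration only; NOT measurements.
    baseline_runs = [(100.0, 20.0)] * 4
    faster_runs = [(102.5, 20.1)] * 4

    base_first, base_second = pass_averages(baseline_runs)
    fast_first, fast_second = pass_averages(faster_runs)

    print(f"1st pass gain: {percent_gain(fast_first, base_first):.1f}%")
    print(f"2nd pass gain: {percent_gain(fast_second, base_second):.1f}%")
```

For tests where lower is better, such as the FastStone conversion time, the sign of the gain would need to be flipped.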

7-Zip 9.2

7-Zip is a popular open source compression tool for making sets of files easier to handle and transfer. The software offers its own built-in benchmark, and we report its result.

7-Zip 9.2

At most a 2% gain was shown by 3000+ memory.

Mozilla Kraken 1.1

Kraken is one of the more popular web benchmarks and stresses a variety of code. We run this benchmark in Chrome 35.

Mozilla Kraken 1.1

Kraken seemed to prefer the fast 1.2V memory, giving a 4.8% gain at DDR4-2800 C16, although this gain did not carry over to the faster memory kits.

WebXPRT

A more in-depth web test featuring stock price rendering, image manipulation and face recognition algorithms, also run in Chrome 35.

WebXPRT

The DDR4-3200 gave an 11% gain over the base JEDEC memory, although this seemed to be more of a step than a slow rise.


120 Comments

  • dgingeri - Thursday, February 5, 2015 - link

    Really, what applications use this bandwidth now?

    I'm the admin of a server software test lab, and we've been forced to move to the Xeon E5 v3 platform for some of our software, and it isn't seeing any enhancement from DDR4 either. These are machines and software using 256GB of memory at a time. The steps from Xeon E5 and DDR3 1066 to E5 v2 and DDR3 1333 and then up to the E5 v3 and DDR4 2133 are showing no value whatsoever. We have a couple of aspects, such as data dedup and throughput, that are processor intensive and require a lot of memory, but the memory bandwidth doesn't show any enhancement. However, since Dell is EOLing their R720, under Intel's recommendation, we're stuck moving up to the new platform. So, it's driving up our costs with no increase in performance.

    I would think that if anything would use memory bandwidth, it would be data dedup or storage software. What other apps would see any help from this?
  • Mr Perfect - Thursday, February 5, 2015 - link

    Have you seen the reported reduction in power consumption? With 256GBs per machine, it sounds like you should be benefiting from the lower power draw(and lower cooling costs) of DDR4.
  • Murloc - Thursday, February 5, 2015 - link

    depending on the country and its energy prices, the expense to upgrade and the efficiency gains made, you may not even be able to recoup the costs, ever.
    From a green point of view it may be even worse due to embodied energy going to waste depending on what happens to the old server.
  • Mr Perfect - Friday, February 6, 2015 - link

    True, but if you have to buy DDR4 machines because the DDR3 ones are out of production(like the OP), then dropping power and cooling would be a neat side bonus.

    And now, just because I'm curious: If the max DDR4 DIMM is 8GB, and there's 256GB per server, then that's 32 DIMMs. 32 times 1 to 2 watts less a DIMM would be 32 to 64 watts less load on the PSU. If the PSU is 80% efficient, then that should be 38.4 to 76.8 watts less at the wall per machine. Not really spectacular, but then you've also got cooling. If the AC is 80% efficient, that would be 46.08 to 92.16 watts less power to the AC. So in total, the new DDR4 server would cost you (wall draw plus AC draw) 84.48 to 168.96 watts lower load per server versus the discontinued DDR3 ones. Not very exciting if you've only got a couple of them, but I could see large server farms benefiting.

    Anyone know how to work out the KWh and resulting price from electric rates?
  • menting - Friday, February 6, 2015 - link

    100W for an hour straight = 0.1KWH. If you figure 10-20 cents per KWH, it's about 1-2 cents per hour for a 100W difference. That comes to about $7-$14 per month in bills provided that 100W is consistent 24/7.
  • menting - Thursday, February 5, 2015 - link

    pattern recognition is one that comes to mind.
  • Murloc - Thursday, February 5, 2015 - link

    physical restraints of light speed? Isn't any minuscule parasitic capacitance way more speed limiting than that?
  • menting - Thursday, February 5, 2015 - link

    there's tons of limiting factors, with capacitance being one of those. But even if you take pains to optimize those, the one factor that nobody can get around is the speed of light.
  • menting - Thursday, February 5, 2015 - link

    i guess i should say speed of electricity in a conductive medium instead of speed of light.
  • retrospooty - Friday, February 6, 2015 - link

    Agreed if an app required high total bandwidth it would benefit.

    Now see if you can name a few that actually need that.
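
The power and cost arithmetic discussed in the comments above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function names are made up for illustration, a month is taken as 30 days of continuous 24/7 load, and wall draw is modelled as load divided by PSU efficiency. Under that model, 32 to 64 W of load at 80% efficiency works out to 40 to 80 W at the wall, slightly more than the 38.4 to 76.8 W obtained by multiplying by 1.2.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month, constant load


def wall_power(load_watts, psu_efficiency):
    """Power drawn at the wall for a given DC load.

    Assumes efficiency = load / wall draw, so wall draw = load / efficiency.
    """
    return load_watts / psu_efficiency


def energy_kwh(watts, hours):
    """Energy in kilowatt-hours for a constant draw of `watts` over `hours`."""
    return watts / 1000 * hours


def monthly_cost_dollars(watts, cents_per_kwh, hours=HOURS_PER_MONTH):
    """Cost in dollars of drawing `watts` continuously for `hours`."""
    return energy_kwh(watts, hours) * cents_per_kwh / 100


if __name__ == "__main__":
    # 32 DIMMs saving 1 to 2 W each, behind an 80% efficient PSU.
    for load in (32, 64):
        print(f"{load} W load -> {wall_power(load, 0.8):.1f} W at the wall")

    # A constant 100 W difference at 10 and 20 cents per kWh.
    for rate in (10, 20):
        print(f"100 W at {rate} c/kWh: ${monthly_cost_dollars(100, rate):.2f} per month")
```

Cooling overhead is left out on purpose, since air conditioning is usually rated by a coefficient of performance rather than a simple efficiency percentage.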
