DDR4

Intel and the DRAM world are switching over to DDR4, and with good reason: DDR4 is a large step forward. Its highlights include:

  • Speeds up to 3200 MT/s (a 1.6 GHz clock with double data rate)
  • Lower DRAM I/O voltage (1.2 V instead of 1.5 V VDDQ)
  • Twice the capacity (using the same DRAM chips)
  • Improved RAS
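Because DDR transfers data on both clock edges, the MT/s figure is twice the clock frequency. The sketch below makes that arithmetic explicit and derives the theoretical peak bandwidth of one channel. It assumes a standard 64-bit (8-byte) data bus and ignores ECC bits; the function names are our own.

```python
# Sketch: DDR transfer rate -> I/O clock and theoretical peak bandwidth.
# Assumes a standard 64-bit (8-byte) data channel; ECC bits are not counted.

def io_clock_mhz(mt_per_s: float) -> float:
    """DDR moves data on both clock edges, so the clock is half the transfer rate."""
    return mt_per_s / 2

def peak_bandwidth_gb_s(mt_per_s: float, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth of one channel in GB/s (10**9 bytes/s)."""
    return mt_per_s * 1e6 * bus_bytes / 1e9

if __name__ == "__main__":
    # DDR3-1866, the DDR4-2100 used in our testing, and top-speed DDR4-3200
    for rate in (1866, 2100, 3200):
        print(f"{rate} MT/s: {io_clock_mhz(rate):.0f} MHz clock, "
              f"{peak_bandwidth_gb_s(rate):.1f} GB/s per channel")
```

For DDR4-3200 this gives the 1.6 GHz clock quoted above and 25.6 GB/s per channel. Sustained bandwidth in practice is lower than this theoretical peak.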

The improvements start with the internal organization. A DDR3 chip has eight independent banks, while DDR4 has 16, organized in a 4x4 configuration: four bank groups of four banks each. More banks mean that more pages can stay open (more page hits, lower latency) at a small power cost, which is completely offset by a whole range of power-efficiency features (see below). The resulting efficiency gains are rather large. Samsung quantifies them in the slide below.
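To show how the 4x4 organization works, here is a minimal sketch that splits a bank number into its bank group (selected by the BG1:BG0 address bits) and its bank within the group (BA1:BA0). The flat numbering `bg * 4 + ba` is a hypothetical convention chosen for illustration; real memory controllers use their own address maps.

```python
# Sketch: DDR4's 16 banks addressed as 4 bank groups x 4 banks.
# The flat index (bank_group * 4 + bank) is a hypothetical numbering
# for illustration only.

BANK_GROUPS = 4      # selected by the BG1:BG0 address bits
BANKS_PER_GROUP = 4  # selected by the BA1:BA0 address bits

def decode_bank(flat_index: int) -> tuple:
    """Split a flat bank index 0..15 into (bank_group, bank)."""
    if not 0 <= flat_index < BANK_GROUPS * BANKS_PER_GROUP:
        raise ValueError("DDR4 has 16 banks: index must be 0..15")
    return divmod(flat_index, BANKS_PER_GROUP)

def same_bank_group(a: int, b: int) -> bool:
    """Back-to-back column accesses within one bank group must wait the
    longer tCCD_L; accesses to different groups can use the shorter tCCD_S."""
    return decode_bank(a)[0] == decode_bank(b)[0]

if __name__ == "__main__":
    print("DDR4 banks:", BANK_GROUPS * BANKS_PER_GROUP, "(DDR3: 8)")
    print("bank 6 ->", decode_bank(6))  # (group 1, bank 2)
```

A controller that spreads consecutive accesses across bank groups avoids the longer same-group delay, which is one reason the grouping exists.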

Samsung claims about 21% lower power thanks to the drop in operating voltage (1.5 V to 1.2 V). Low Power DDR4 will run at 1.05 V and lower power usage even further. But there is more to DDR4 than a lower voltage: Samsung claims that, when both are manufactured with the same process technology, DDR4 needs only two-thirds of the power of DDR3L.
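As a rough point of comparison, the sketch below applies the textbook dynamic-power model P ∝ C·V²·f (an assumption on our part, with C and f held constant) to the 1.5 V to 1.2 V drop, and converts Samsung's "two-thirds of the power" claim into a percentage saving.

```python
# First-order estimate of the power saved by lowering the supply voltage,
# assuming the textbook dynamic-power model P ~ C * V^2 * f with C and f fixed.
# Parts of DRAM power (e.g. background and refresh, internal supplies) do not
# scale this way, so the vendors' ~20% figures for the voltage drop are lower.

def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    """New power as a fraction of old power under the V^2 model."""
    return (v_new / v_old) ** 2

ratio = dynamic_power_ratio(1.2, 1.5)
print(f"1.5 V -> 1.2 V: {ratio:.2f}x power, {1 - ratio:.0%} lower (idealized)")

# Samsung's 'DDR4 needs two-thirds of the power of DDR3L', as a saving:
print(f"two-thirds of the power = {1 - 2 / 3:.0%} lower")
```

The idealized 36% is an upper bound for the voltage drop alone; the roughly 21% Samsung quotes shows that only part of a DRAM's power budget scales with the square of VDDQ.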

Micron gives a breakdown of the features that, besides the obvious drop in VDDQ, made DDR4 more power efficient.

Note that the total power-efficiency increase is 30-35%, and this is not just a result of the VDD reduction (20%). In that sense, DDR4 is a larger step forward than previous DDR technology transitions. Of course, the 30-35% improvement is measured with the RAM running at the same speed. DDR4 can also run at much higher speeds (3200 MT/s vs 1866 MT/s), at the cost of some of the power savings. The DDR4 memory that we use for testing runs at 2100 MT/s, a good compromise between a mild speed increase and power efficiency.
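How much do the features other than the VDD reduction contribute? The sketch below backs that out of Micron's numbers, assuming the savings compound multiplicatively (remaining power = (1 - VDD saving) x (1 - other saving)). If Micron's breakdown is additive instead, the remaining features simply account for 10 to 15 percentage points.

```python
# Share of DDR4's 30-35% power-efficiency gain left for features other than
# the 20% VDD reduction. Assumes the savings compound multiplicatively:
#   remaining_power = (1 - vdd_saving) * (1 - other_saving)

def other_saving(total_saving: float, vdd_saving: float = 0.20) -> float:
    """Saving attributable to everything except the VDD reduction."""
    return 1 - (1 - total_saving) / (1 - vdd_saving)

for total in (0.30, 0.35):
    print(f"total {total:.0%}: other features ~{other_saving(total):.1%}")
```

Under this assumption, the remaining power optimizations are worth roughly 12.5% to 19% on top of the voltage drop.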

A more elaborate discussion will follow in our next server memory article, but in short, each bank also has much smaller rows (four times smaller), so the DRAM can cycle much faster (a shorter cycle time). The result is lower latency.
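Latency only improves if cycle counts stay reasonable as the clock rises, so it helps to express DRAM timings in nanoseconds. This small helper converts a timing given in clock cycles to absolute time. The timing value in the usage line is a hypothetical example, not a measured or specified figure.

```python
# Convert a DRAM timing given in clock cycles to nanoseconds.
# The I/O clock runs at half the transfer rate (MT/s), so one cycle
# lasts 2000 / MT/s nanoseconds.

def cycles_to_ns(cycles: int, mt_per_s: float) -> float:
    clock_mhz = mt_per_s / 2
    return cycles * 1000 / clock_mhz

if __name__ == "__main__":
    # Hypothetical example: a 16-cycle timing on DDR4-3200 (1600 MHz clock)
    print(f"{cycles_to_ns(16, 3200):.1f} ns")
```

The same number of cycles takes less time at a higher clock, so a faster-cycling DRAM can hold latency steady or lower it even as transfer rates climb.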

The improved signal-to-noise ratio and the extra addressing pins allow DDR4 to support eight DRAM stacks instead of four (DDR3). As a result, DDR4 can support twice the capacity of DDR3 using the same (4-16 Gb) DRAM chips. This requires 3D stacking technology, which will take time to implement. However, since 8 Gb chips are now in use, 32 GB Registered DIMMs should soon be a reality, as should 64 GB LRDIMMs. We'll discuss this in more detail on the next page.
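The capacity figures follow from simple arithmetic on chip density, chip width and rank count. The sketch below assumes a 64-bit data bus (ECC chips add no usable capacity), and treats `dies_per_package` as a hypothetical parameter for 3D-stacked parts. The rank counts in the usage lines are one plausible configuration, not a statement about any specific product.

```python
# Sketch: usable capacity of a DIMM from chip density, chip width and ranks.
# Assumes a 64-bit data bus; the extra ECC chips in each rank are ignored.
# dies_per_package is a hypothetical knob for 3D-stacked DRAM packages.

def dimm_capacity_gb(chip_gbit: int, chip_width: int, ranks: int,
                     dies_per_package: int = 1) -> float:
    chips_per_rank = 64 // chip_width        # e.g. 16 chips of x4 width
    total_gbit = chip_gbit * chips_per_rank * ranks * dies_per_package
    return total_gbit / 8                    # gigabits -> gigabytes

if __name__ == "__main__":
    # Dual-rank x4 RDIMM built from 8 Gb chips
    print(dimm_capacity_gb(8, 4, ranks=2), "GB")
    # Quad-rank x4 LRDIMM built from 8 Gb chips
    print(dimm_capacity_gb(8, 4, ranks=4), "GB")
```

With 8 Gb x4 chips, two ranks give the 32 GB RDIMM and four ranks the 64 GB LRDIMM mentioned above; stacking dies multiplies capacity further without needing denser chips.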


85 Comments


  • cmikeh2 - Monday, September 8, 2014

    In the SKU comparison table you have the E5-2690V2 listed as a 12/24 part when it is in fact a 10/20 part. Just a tiny quibble. Overall a fantastic read.
  • KAlmquist - Monday, September 8, 2014

    Also, the 2637 v2 is 4/8, not 6/12.
  • isa - Monday, September 8, 2014

    Looking forward to a new supercomputer record using these behemoths.
  • Bruce Allen - Monday, September 8, 2014

    Awesome article. I'd love to see Cinebench and other applications tests. We do a lot of rendering (currently with older dual Xeons) and would love to compare these new Xeons versus the new 5960X chips - software license costs per computer are so high that the 5960X setups will need much higher price/performance to be worth it. We actually use Cinema 4D in production so those scores are relevant. We use V-Ray, Mental Ray and Arnold for Maya too but in general those track with the Cinebench scores so they are a decent guide. Thank you!
  • Ian Cutress - Monday, September 8, 2014

    I've got some E5 v3 Xeons in for a more workstation oriented review. Look out for that soon :)
  • fastgeek - Monday, September 8, 2014

    From my notes a while back... two E5-2690 v3's (all cores + turbo enabled) under 2012 Server yielded 3,129 for multithreaded and 79 for single.

    While not Haswell, I can tell you that four E5-4657L V2's returned 4,722 / 94 respectively.

    Hope that helps somewhat. :-)
  • fastgeek - Monday, September 8, 2014

    I don't see a way to edit my previous comment; but those scores were from Cinebench R15
  • wireframed - Saturday, September 20, 2014

    You pay for licenses for render Nodes? Switch to 3DS, and you get 9999 nodes for free (unless they changed the licensing since I last checked). :)
  • Lone Ranger - Monday, September 8, 2014

    You make mention that the large core count chips are pretty good about raising their clock rate when only a few cores are active. Under Linux, what is the best way to see actual turbo frequencies? cpuinfo doesn't show live/actual clock rate.
  • JohanAnandtech - Monday, September 8, 2014

    The best way to do this is using Intel's PCM. However, this does not work right now (only on Sandy and Ivy, not Haswell). I deduced it from the fact that performance was almost identical and from previous profiling of some of our benchmarks.
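For readers asking how to observe actual turbo clocks on Linux without PCM: one common approach, and the one used by the kernel's turbostat tool as far as we know, is to compare the IA32_APERF (0xE8) and IA32_MPERF (0xE7) model-specific registers. MPERF counts at the fixed reference (base) frequency and APERF at the actual frequency, both only while the core is active, so base x ΔAPERF / ΔMPERF approximates the average running clock. The sketch below assumes root access and the Linux `msr` kernel module; the helper names are our own.

```python
# Sketch: average effective frequency of one core from APERF/MPERF.
# Requires root and the Linux 'msr' module (modprobe msr) for /dev/cpu/N/msr.
# base_mhz must be the CPU's nominal (reference) frequency.

import os
import struct
import time

IA32_MPERF = 0xE7  # counts at the reference (base) frequency while in C0
IA32_APERF = 0xE8  # counts at the actual frequency while in C0

def read_msr(cpu: int, reg: int) -> int:
    """Read one 64-bit MSR; the file offset selects the register."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)

def effective_mhz(base_mhz: float, aperf0: int, aperf1: int,
                  mperf0: int, mperf1: int) -> float:
    """Average clock while active over the sampling interval."""
    d_mperf = mperf1 - mperf0
    if d_mperf == 0:
        return 0.0  # the core was idle for the whole interval
    return base_mhz * (aperf1 - aperf0) / d_mperf

def sample(cpu: int, base_mhz: float, interval_s: float = 1.0) -> float:
    """Read both counters, wait, read again, and return the effective MHz."""
    a0, m0 = read_msr(cpu, IA32_APERF), read_msr(cpu, IA32_MPERF)
    time.sleep(interval_s)
    a1, m1 = read_msr(cpu, IA32_APERF), read_msr(cpu, IA32_MPERF)
    return effective_mhz(base_mhz, a0, a1, m0, m1)
```

For example, `sample(0, base_mhz=2600.0)` on a 2.6 GHz part returns the average clock of core 0 over one second. The two registers are not read atomically, so expect small errors on short intervals.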
