Memory Subsystem: Latency

To measure latency, we use the open source TinyMemBench benchmark. The source was compiled for x86 with gcc 4.8.2, with optimization set to "-O2". The TinyMemBench manual describes the measurement well:

Average time is measured for random memory accesses in the buffers of different sizes. The larger the buffer, the more significant the relative contributions of TLB, L1/L2 cache misses, and DRAM accesses become. All the numbers represent extra time, which needs to be added to L1 cache latency (4 cycles).
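Random-access latency tests are commonly built on pointer chasing: the buffer is linked into one random cycle, and every load's address comes from the previous load, so the hardware cannot overlap or prefetch the accesses. The sketch below shows that access pattern, assuming a single cycle built with Sattolo's algorithm. It illustrates the general technique, not TinyMemBench's own source, and `make_chain` and `chase` are hypothetical names. Python's interpreter overhead would swamp any cache effect, so a real measurement needs a compiled language.

```python
import random


def make_chain(n, seed=0):
    """Build one random cycle over n slots using Sattolo's algorithm.

    chain[i] holds the index to visit after slot i, so following the
    chain touches every slot exactly once before returning to the start.
    """
    rng = random.Random(seed)
    chain = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i guarantees a single n-cycle
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def chase(chain, steps, start=0):
    """Follow the chain for `steps` dependent loads; return the final index."""
    idx = start
    for _ in range(steps):
        idx = chain[idx]  # the next address depends on the value just loaded
    return idx
```

Growing `n` past the L1, L2 and L3 capacities is what makes the TLB, cache-miss and DRAM contributions appear one after another.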

We tested with dual random read, as we wanted to see how the memory system copes with multiple outstanding read requests. To keep the graph readable, we limited ourselves to CPUs with distinct memory subsystems. The Xeon E5-2695 and 2699 have a very similar memory subsystem (dual memory controller), so we tested only the E5-2699.
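A "dual" random read can be pictured as two independent pointer chains advanced in lockstep: the two loads in each iteration do not depend on each other, so an out-of-order core can keep both misses outstanding at once. The sketch below assumes that interpretation; it is not TinyMemBench's code, and `random_cycle`, `dual_chase` and the 4096-slot buffer size are hypothetical.

```python
import random


def random_cycle(n, rng):
    """Link n slots into one random cycle: each slot points to the next slot in a shuffled order."""
    order = list(range(n))
    rng.shuffle(order)
    chain = [0] * n
    for k in range(n):
        chain[order[k]] = order[(k + 1) % n]
    return chain


def dual_chase(chain_a, chain_b, steps):
    """Advance two independent pointer chains in lockstep.

    Each chain is still a dependent sequence, but the two loads issued
    per iteration are independent of each other, which is what lets the
    memory system service two requests in parallel.
    """
    a = b = 0
    for _ in range(steps):
        a = chain_a[a]
        b = chain_b[b]
    return a, b


rng = random.Random(42)
n = 1 << 12  # hypothetical buffer of 4096 slots
chain_a, chain_b = random_cycle(n, rng), random_cycle(n, rng)
```

Comparing the time per iteration of `dual_chase` against a single chase is what reveals how well the memory system overlaps requests.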

The massive L3 caches do have a disadvantage: higher latency. The L3 cache of the Xeon E5-2699 v3 (45MB) has a latency between 20 and 32 ns, while the 20MB cache of the Xeon E5-2690 hovers between 15 and 20 ns. That translates to about 90 cycles versus 60, which is considerable. However, Haswell's L3 cache is not inherently much worse: the 20MB L3 cache of the Xeon E5-2667 v3 is only slightly slower than that of the Xeon E5-2690, and still faster than the 30MB L3 of the Xeon E5-2697 v2. The main culprit is simply the huge amount of cache on the E5-2699 v3. In the next test, we focus on the latency of the DRAM subsystem.
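The cycle figures follow from cycles = latency in ns × core clock in GHz. A minimal sketch of that arithmetic, which also adds back the 4-cycle L1 latency that the TinyMemBench manual says its numbers exclude. The clock speeds in the usage line are assumptions for illustration, since the effective clocks during the test are not stated here; `ns_to_cycles` and `total_latency_ns` are hypothetical helpers.

```python
L1_CYCLES = 4  # L1 latency quoted in the TinyMemBench manual


def ns_to_cycles(ns, ghz):
    """Convert a latency in nanoseconds to core clock cycles at `ghz` GHz."""
    return ns * ghz


def total_latency_ns(extra_ns, ghz, l1_cycles=L1_CYCLES):
    """TinyMemBench reports time on top of an L1 hit; add the L1 latency back in ns."""
    return extra_ns + l1_cycles / ghz


# Assumed clocks: 32 ns at 2.8 GHz versus 20 ns at 3.0 GHz.
print(round(ns_to_cycles(32, 2.8)))  # → 90
print(ns_to_cycles(20, 3.0))  # → 60.0
```

Because the conversion scales with the clock, the same nanosecond latency costs more cycles on a faster-clocked core.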

[Graph: Dual Random Read Latency]

The DRAM subsystem is still three to four times slower than the massive L3 cache. LRDIMMs add only a very small latency overhead (+3.6% at most), which is negligible.
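Both comparisons above are simple ratios. A minimal sketch of the arithmetic; the 100 ns and 103.6 ns inputs in the test are made-up figures that merely reproduce a 3.6% overhead, not measured results, and `overhead_pct` and `slowdown` are hypothetical helpers.

```python
def overhead_pct(baseline_ns, measured_ns):
    """Relative latency overhead of `measured_ns` over `baseline_ns`, in percent."""
    return (measured_ns - baseline_ns) / baseline_ns * 100.0


def slowdown(fast_ns, slow_ns):
    """How many times slower `slow_ns` is than `fast_ns`."""
    return slow_ns / fast_ns
```

Expressing the LRDIMM penalty as a percentage of the RDIMM latency is what makes it directly comparable to the much larger DRAM-versus-L3 gap.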

DDR4-2133 appears to have the same latency as DDR3-1866: we measured 81.6 ns on the Xeon E5-2697 v2. With DDR4-2400 just around the corner, DDR4 should soon give the new platform a performance boost.
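A plausible part of why the two generations land so close is that first-word CAS latency in nanoseconds is CL cycles × 2000 / data rate in MT/s: a higher data rate paired with proportionally more CL cycles yields roughly the same time. The sketch assumes common CL values (CL13 for DDR3-1866, CL15 for DDR4-2133, CL17 for DDR4-2400); the DIMMs in this test may use different timings, and CAS is only one component of the ~80 ns load-to-use latency measured above. `cas_latency_ns` is a hypothetical helper.

```python
def cas_latency_ns(cl_cycles, data_rate_mts):
    """First-word CAS latency in ns.

    DDR transfers twice per I/O clock, so the clock runs at half the
    data rate and one clock period lasts 2000 / data_rate_mts ns.
    """
    return cl_cycles * 2000.0 / data_rate_mts


# Assumed timings, for illustration only.
for name, cl, rate in [("DDR3-1866", 13, 1866), ("DDR4-2133", 15, 2133), ("DDR4-2400", 17, 2400)]:
    print(f"{name} CL{cl}: {cas_latency_ns(cl, rate):.2f} ns")
```

Under these assumed timings all three land near 14 ns, so the gain from DDR4-2400 would come mainly from bandwidth rather than lower latency.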


85 Comments


  • SuperVeloce - Tuesday, September 9, 2014

    Oh, never mind... I unknowingly caught an error.
  • JohanAnandtech - Tuesday, September 9, 2014

    thx! Fixed. Sorry for the late reaction, jetlagged and trying to get to the hectic pace of IDF :-)
  • hescominsoon - Tuesday, September 9, 2014

    As long as AMD continues its idiotic design of two integer units sharing an FPU, they will be an afterthought in the CPU department.
  • nils_ - Sunday, September 14, 2014

    Serious competition for Intel will not come from AMD any time soon, but possibly from IBM with the POWER8. Tyan even came out with a single-socket board for that CPU, so it might make its way into the same market soon.
  • ScarletEagle - Tuesday, September 16, 2014

    Any feel for the relative HPC performance of the E5-2680v3 with respect to the E5-2650Lv3? I am looking at purchasing a PowerEdge 730 with two of these and the 2133MHz RAM. My guess is that the higher base clock speed should make somewhat of an improvement?
