Memory Subsystem: Latency

To measure latency, we use the open-source TinyMemBench benchmark. The source was compiled for x86 with gcc 4.8.2 at optimization level "-O2". The TinyMemBench manual describes the measurement well:

Average time is measured for random memory accesses in the buffers of different sizes. The larger the buffer, the more significant the relative contributions of TLB, L1/L2 cache misses, and DRAM accesses become. All the numbers represent extra time, which needs to be added to L1 cache latency (4 cycles).

We tested with dual random read, as we wanted to see how the memory system coped with multiple read requests. To keep the graph readable we limited ourselves to the CPUs that were different.
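To make the idea of a "dual random read" concrete, here is a minimal C sketch of a common way to measure random access latency: two independent pointer chases through a randomly permuted buffer. This is not TinyMemBench's actual code. The function names (`build_chain`, `dual_chase_ns`), the xorshift generator, and the choice of starting points are our own assumptions for illustration. It also reports total time per pair of loads, rather than the extra time on top of L1 latency that TinyMemBench reports.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Small xorshift PRNG (hypothetical helper); the state must be non-zero. */
static uint64_t xorshift64(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *s = x;
}

/* Fill next[] with a random single-cycle permutation (Sattolo's algorithm),
   so that following next[i] visits every slot once before repeating. */
static void build_chain(size_t *next, size_t n, uint64_t seed)
{
    uint64_t s = seed ? seed : 1;
    assert(n > 1);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&s) % i);  /* j in [0, i-1] */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

static volatile size_t sink;  /* keeps the compiler from dropping the loads */

/* Walk two independent dependency chains at once and return the average
   wall-clock time per pair of loads, in nanoseconds. */
static double dual_chase_ns(const size_t *next, size_t n, size_t steps)
{
    struct timespec t0, t1;
    size_t a = 0, b = n / 2;  /* two different starting slots */
    assert(n > 1 && steps > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < steps; i++) {
        a = next[a];  /* each load depends on the previous one in its chain */
        b = next[b];  /* second chain is independent of the first */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    sink = a ^ b;
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

To use it, allocate an array of `size_t` that matches the buffer size of interest, call `build_chain` once, then call `dual_chase_ns` with a few million steps. Because the two chains do not depend on each other, the memory system can overlap their misses, which is the behavior a dual-read test is meant to expose.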

L3 caches have grown significantly over the past few years, but it is not all good news. The L3 cache of the Xeon E3 responds very quickly (about 10 ns, or fewer than 30 cycles at 2.8 GHz), while the L3 cache of the new generation needs almost twice as much time to respond (about 20 ns, or roughly 50 cycles at 2.6 GHz). Larger L3 caches are not always a blessing and can come at the cost of higher latency: some applications, such as search engines and HPC applications that work on huge amounts of data, have only a relatively small share of cacheable data and instructions.
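The cycle counts quoted for the L3 latencies follow from multiplying the latency in nanoseconds by the clock speed in GHz. A minimal C sketch of that arithmetic, with `ns_to_cycles` as a hypothetical helper name of our own:

```c
#include <assert.h>

/* Convert a latency in nanoseconds to core clock cycles.
   A clock of f GHz ticks f times per nanosecond, so cycles = ns * GHz. */
static double ns_to_cycles(double latency_ns, double clock_ghz)
{
    assert(latency_ns >= 0.0 && clock_ghz > 0.0);
    return latency_ns * clock_ghz;
}
```

For the figures above, `ns_to_cycles(10.0, 2.8)` works out to about 28 cycles and `ns_to_cycles(20.0, 2.6)` to about 52 cycles, in line with the "fewer than 30" and "roughly 50" quoted in the text.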

It gets worse for the "large L3 cache" models when we look at latency of accessing memory (measured at 64 MB): 

Latency in memory

The higher L3-cache latency makes memory accesses slower on the Xeon E5. Despite having access to DDR4-2133 DIMMs, the Xeon E5-2650L accesses memory more slowly than the Xeon E3-1230L. Memory latency is also a major weakness of the Atom C2750, which has a much less sophisticated memory controller and prefetcher.


  • julianb - Saturday, October 31, 2015 - link

    Thanks for the reply, man.
    And sorry for my late reply, totally forgot about this thread :)
  • eva2000 - Tuesday, June 23, 2015 - link

    Nice... Xeon D-1540 is awesome, but I wish it was clocked 0.2Ghz higher across the board would be just enough to tip that scale versus E5. Did my own benchmarks at :)
  • extide - Wednesday, June 24, 2015 - link

    Thats probably exactly why it ISNT clocked 0.2Ghz higher across the board ;)

    I'm sure Intel wants to see some space between this and E5.
  • boogerlad - Tuesday, June 23, 2015 - link

    If this was marketed for the consumer market with the ability to overclock, this would outsell everything completely. This is what the enthusiast needs!!!
  • Refuge - Tuesday, June 23, 2015 - link

    I don't think this is going to do much of anything for an enthusiast.

    Unless they are interested in building a server for some experiment or project.
  • JohanAnandtech - Wednesday, June 24, 2015 - link

    I still think the i7 59xx series is a better match for consumers: higher clocks and thus ST performance. The Xeon D's most interesting features, such as integrated 10 GbE and low power, don't interest most performance consumers. Most people will have a hard time saturating a 1 GbE line, and power savings are not a priority.
  • tspacie - Wednesday, June 24, 2015 - link

    Seems to tick all the boxes for a software development machine. Very good at compilation. Reasonably priced for the performance. Low power. ECC memory. I'm tempted
  • extide - Wednesday, June 24, 2015 - link

    EXACTLY what I was thinking!
  • MrSpadge - Saturday, June 27, 2015 - link

    I would be very tempted by such a chip as well, using it for BOINC. However, Broadwell loses some of the power efficiency advantage if you push it harder, i.e. the largest gains are at low and moderate frequency. Perfect for such server chips and mobile ones, but not so much for people aiming for 4+ GHz.
  • MaxKreimerman - Tuesday, June 23, 2015 - link

    Sounds impressive in just a 45 W package, but impossible to find on retail sites such as newegg or wiredzone
