Memory Subsystem: Bandwidth

As we have reported before, measuring the full bandwidth potential with John McCalpin's Stream bandwidth benchmark has become a matter of extreme tuning, requiring a very deep understanding of the platform. 

With our previous binaries, both the first- and second-generation EPYC parts could not get past 200-210 GB/s, giving the impression of a "bandwidth wall" despite the move to eight channels of DDR4-3200. So instead we report the results that Intel's and AMD's best binaries produce, using AVX-512 (Intel) and AVX2 (AMD).
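For context, the Triad kernel at the heart of these numbers is only a few lines of C; all of the "extreme tuning" lives in the compiler flags, memory allocation, and thread/NUMA pinning around it. A minimal sketch of the kernel (the function name, array sizing note, and flags below are illustrative, not our exact build):

    #include <stddef.h>

    /* STREAM Triad kernel: a[i] = b[i] + scalar * c[i].
       Reported bandwidth = 3 arrays * 8 bytes * n / elapsed seconds.
       The arrays must be several times larger than the last-level cache
       (hundreds of MB each on a 64-core EPYC) so the loop streams from DRAM. */
    void triad(double *restrict a, const double *restrict b,
               const double *restrict c, double scalar, size_t n)
    {
    #pragma omp parallel for
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + scalar * c[i];
    }

Compiled with something like gcc -O3 -fopenmp -march=native (again, illustrative), whether the compiler emits AVX2 or AVX-512 loads and non-temporal stores for this loop, and how the threads are spread across the memory channels, is what separates the 200-210 GB/s "wall" from the peak figures below.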

The results are expressed in gigabytes per second.

[Chart: Stream Triad]

AMD can reach even higher numbers with the "NUMA nodes per socket" (NPS) setting at 4. With four nodes per socket, AMD reports up to 353 GB/s. NPS4 causes each CCX to access only the memory controllers on the central I/O die that offer the lowest latency.
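A quick way to sanity-check that NPS4 (rather than the default NPS1) actually took effect is to ask the OS how many NUMA nodes it sees: under NPS4 a single Rome socket shows up as four nodes. A small sketch using libnuma (assuming the libnuma development package is installed; link with -lnuma):

    #include <stdio.h>
    #include <numa.h>

    int main(void)
    {
        if (numa_available() < 0) {
            puts("NUMA is not available on this system");
            return 1;
        }
        /* Under NPS1 a single socket reports 1 node; under NPS4 it reports 4. */
        printf("NUMA nodes visible to the OS: %d\n", numa_num_configured_nodes());
        printf("CPU 0 belongs to node:        %d\n", numa_node_of_cpu(0));
        return 0;
    }

Binding the benchmark threads and their memory to the local node (for example with numactl --cpunodebind / --membind) is then what actually buys the extra bandwidth.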

Those numbers only matter to the small niche of carefully AVX2/AVX-512 optimized HPC applications. AMD claims a 45% bandwidth advantage over the best (28-core) Intel SKUs, and we have every reason to believe them, but again, it is only relevant to that niche.

For the rest of the enterprise world (probably 95+%), memory latency has a much larger impact than peak bandwidth.

Comments (180)

  • Zoolook - Saturday, August 10, 2019 - link

    It's been a pretty good investment for me, bought at $8 two years ago, seems like I'll keep it for a while longer.
  • CheapSushi - Wednesday, August 7, 2019 - link

    It's glorious...one might say.... even EPYC.
  • abufrejoval - Wednesday, August 7, 2019 - link

    Hard to believe a 64-core CPU can be had for the price of a used middle-class car or the price of four RTX 2080 Tis.

    Of course once you add 2TB of RAM and as many PCIe 4 SSDs as those lanes will feed, it no longer feels that affordable.

    There are a lot of clouds still running ancient Sandy/Ivy Bridge and Haswell CPUs: I guess replacing those will eat quite a lot of chips.

    And to think that it's the very same 8-core part that powers the entire range: that stroke of simplicity and genius took so many years of planning ahead and staying on track during times when AMD was really not doing well. Almost makes you believe that corporations owned by shareholders can sometimes actually execute a strategy, without Facebook-type voting rights.

    Raising my coffee mug in a salute!
  • schujj07 - Thursday, August 8, 2019 - link

    Sandy Bridge maxed out at 8c/16t.
    Ivy Bridge maxed out at 15c/30t.
    Haswell maxed out at 18c/36t.
    That means that a single-socket Epyc 64c/128t can give you more CPU cores than a quad-socket Sandy Bridge (32c/64t) or Ivy Bridge (60c/120t), and only a few fewer cores than a quad-socket Haswell (72c/144t).
  • Eris_Floralia - Wednesday, August 7, 2019 - link

    This is what we've all been waiting for!
  • Eris_Floralia - Wednesday, August 7, 2019 - link

    Thank you for all the work!
  • quorm - Wednesday, August 7, 2019 - link

    Given the range of configurations and prices here, I don't see much room for threadripper. Maybe 16 - 32 cores with higher clock speeds? Really wondering what a new threadripper can bring to the table.
  • willis936 - Wednesday, August 7, 2019 - link

    A reduced feature set and lower prices, namely.
  • quorm - Wednesday, August 7, 2019 - link

    Reduced in what way, though? I'm assuming Threadripper will be 4 chiplets, 64 PCIe lanes, single socket only. All Ryzen parts support ECC.

    So, what can it offer? At 32 cores, 8 channel memory becomes useful for a lot of workloads. Seems like a lot of professionals would just choose epyc this time. On the other end, I don't think any gamers need more than a 3900x/3950x. Is threadripper just going to be for bragging rights?
  • quorm - Wednesday, August 7, 2019 - link

    Sorry, forgot to add: the 3950X is $750, the Epyc 7302P is $825. Where is Threadripper going to fit?
