Broadwell with eDRAM: Still Has Gaming Legs

As we cross over into the 2020s, we now have more memory bandwidth from main memory than a processor in 2015 had from its eDRAM. Intel's Broadwell processors were advertised as having 128 megabytes of 'eDRAM', which enabled 50 GB/s of bidirectional bandwidth at a lower latency than main memory, which ran at only 25.6 GB/s. Modern processors have access to DDR4-3200, which is 51.2 GB/s, and future processors are looking at 65 GB/s or higher.
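As a rough check on the main memory figures above, theoretical peak bandwidth can be computed from the transfer rate, the bus width, and the channel count. The sketch below assumes a dual-channel configuration with 64-bit (8-byte) channels, and assumes the 25.6 figure corresponds to DDR3-1600; neither assumption is stated in the text. The function name is purely illustrative, and results are in decimal units (GB/s, 10^9 bytes per second).

```python
def peak_bandwidth_gb_s(transfers_mt_s: int, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes per second).

    transfers_mt_s: data rate in megatransfers per second (e.g. 3200 for DDR4-3200)
    channels:       number of memory channels (assumed: 2, a typical desktop setup)
    bus_bytes:      bytes moved per transfer per channel (assumed: 8, a 64-bit channel)
    """
    # MT/s * bytes per transfer * channels gives MB/s; divide by 1000 for GB/s.
    return transfers_mt_s * bus_bytes * channels / 1000


# Assumed configurations, both dual channel.
ddr3_1600 = peak_bandwidth_gb_s(1600)
ddr4_3200 = peak_bandwidth_gb_s(3200)

print(f"DDR3-1600 dual channel: {ddr3_1600:.1f} GB/s")
print(f"DDR4-3200 dual channel: {ddr4_3200:.1f} GB/s")
```

These are theoretical peaks; sustained bandwidth in real workloads is lower.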

At this time, it is perhaps poignant to take a step back and understand the beauty of having 128 MiB of dedicated silicon for a single task.

Intel’s eDRAM-enabled Broadwell processors accelerated a significant number of workloads that are sensitive to memory bandwidth and memory latency, gaming in particular. What eDRAM has enabled in our testing, even if we set aside the now-antiquated CPU performance, is surprisingly good gaming performance. Most of our CPU gaming tests are designed to create a CPU-limited scenario, which is exactly where Broadwell plays best. Our final CPU gaming test is a 1080p Max scenario where the CPU matters less, but there still appear to be good benefits from having an on-package DRAM and that much lower latency all the way out to 128 MiB.

There have always been questions around exactly what 128 MiB of eDRAM cost Intel to produce and supply to a generation of processors. At launch, Intel priced the eDRAM versions of 14 nm Broadwell processors at $60 above the non-eDRAM versions of their 22 nm Haswell equivalents. There are arguments that it cost Intel somewhere south of $10 per processor to build and enable, but Intel couldn’t price it that low because of market segmentation. Remember that the eDRAM was built on a 22 nm SoC process that was already mature at the time.

As we move into an era where AMD is showcasing its new ‘double’ 32 MiB L3 cache on Zen 3 as a key part of its improved gaming performance, it is worth remembering that we already had 128 MiB of gaming acceleration in 2015. It was enabled through a very specific piece of hardware built into the chip. If we could do it in 2015, why can’t we do it in 2020?

What about HBM-enabled eDRAM for 2021?

Fast forward to 2020, and we now have mature 14 nm and 7 nm processes, as well as a cavalcade of packaging and eDRAM opportunities. We might consider that adding 1-2 GiB of eDRAM to a package could be done with high-bandwidth connectivity, using either Intel’s embedded multi-die interconnect bridge (EMIB) technology or TSMC’s 3DFabric technology.

If we did that today, it could arguably be just as complex as adding 128 MiB was back in 2015. We now have extensive EDA and packaging tools to deal with chiplet designs and multi-die environments.

So consider: at a time when high-performance consumer processors are in the realm of $300 up to $500-$800, would customers consider paying $60 more for a modern high-end processor with 2 gigabytes of intermediate L4 cache? It would extend AMD’s idea of high-performance gaming cache well beyond the 32 MiB of Zen 3, or perhaps give Intel a different dynamic to its future processor portfolio.

As we move into a more chiplet-enabled environment, some of those chiplets could be an extra cache layer. However, to put some of this into perspective:

  • The 128 MiB of eDRAM for Intel's Broadwell was built (and is still built) on Intel's 22nm IO process and used 77 mm² of die area.
  • AMD's new RX 6000 GPUs use '128 MiB' of 7nm Infinity Cache SRAM. At an estimated 6.4 billion transistors, or 24% of the 26.8 billion transistors and ~510-530 mm² die, this cache requires a substantial amount of die area, even on 7nm.

This would suggest that in order for future products to integrate large amounts of cache or eDRAM, layered solutions will be required. This will demand large investment in design and packaging, especially for thermal control.
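The transistor estimate in the comparison above can be sanity-checked with some quick arithmetic. The sketch below assumes a standard six-transistor (6T) SRAM cell per bit, which the article does not state, and ignores tags, ECC, redundancy, and peripheral circuitry. The proportional area figure further assumes uniform transistor density across the die; SRAM arrays are usually denser than logic, so that figure likely overstates the real footprint. All names are illustrative.

```python
MIB = 1024 ** 2  # bytes per MiB


def sram_transistors(capacity_mib: int, transistors_per_bit: int = 6) -> int:
    """Transistors in the bit cells alone, assuming a 6T SRAM cell (an assumption)."""
    return capacity_mib * MIB * 8 * transistors_per_bit


# Figures quoted in the article.
TOTAL_GPU_TRANSISTORS = 26.8e9      # RX 6000 die
GPU_DIE_AREA_MM2 = (510, 530)       # estimated range
EDRAM_CAPACITY_MIB = 128
EDRAM_DIE_AREA_MM2 = 77

cache_transistors = sram_transistors(128)
share = cache_transistors / TOTAL_GPU_TRANSISTORS

# Naive proportional area, assuming uniform transistor density (likely an overestimate).
area_low, area_high = (share * a for a in GPU_DIE_AREA_MM2)

# Capacity density of the 22nm eDRAM die, for comparison.
edram_mib_per_mm2 = EDRAM_CAPACITY_MIB / EDRAM_DIE_AREA_MM2

print(f"Infinity Cache bit cells: {cache_transistors / 1e9:.2f} billion transistors")
print(f"Share of die transistors: {share:.1%}")
print(f"Proportional area:        {area_low:.0f}-{area_high:.0f} mm2")
print(f"22nm eDRAM density:       {edram_mib_per_mm2:.2f} MiB/mm2")
```

Under these assumptions the bit cells alone account for roughly 6.4 billion transistors, in line with the estimate quoted above.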

Many thanks to Dylan522p for some minor updates on die size and for pointing out that the same 22nm eDRAM chip is still in use today in Apple's 2020 base MacBook Pro 13.


120 Comments


  • krowes - Monday, November 2, 2020

    CL22 memory for the Ryzen setup? Makes absolutely no sense.
  • Ian Cutress - Tuesday, November 3, 2020

    That's JEDEC standard.
  • Khenglish - Monday, November 2, 2020

    Was anyone else bothered by the fact that Intel's highest performing single thread CPU is the 1185G7, which is only accessible in 28W tiny BGA laptops?

    Also the 128mb edram cache does seem to make on average a 10% improvement over the edramless 4790S at the same TDP. I would love to see edram on more cpus. It's so rare to need more than 8 cores. I'd rather have 8 cores with edram than 16+ cores and no edram.
  • ichaya - Monday, November 2, 2020

    There's definitely a cost trade-off involved, but with an I/O die since Zen 2, it seems like AMD could just spin up a different I/O die, and justify the cost easily by selling to HEDT/Workstation/DC.
  • Notmyusualid - Wednesday, November 4, 2020

    Chalk me up as 'bothered'.
  • zodiacfml - Monday, November 2, 2020

    Yeah but Intel is about squeezing the last dollar in its products for a couple of years now.
  • Endymio - Monday, November 2, 2020

    CPU register-> 3 levels of cache -> eDRAM -> DRAM -> Optane -> SSD -> Hard Drive.

    The human brain gets by with 2 levels of storage. I really don't feel that computers should require 9. The entire approach needs rethinking.
  • Tomatotech - Tuesday, November 3, 2020

    You remember everything without writing down anything? You remarkable person.

    The rest of us rely on written materials, textbooks, reference libraries, wikipedia, and the internet to remember stuff. If you jot down all the levels of hierarchical storage available to the average degree-educated person, it's probably somewhere around 9 too depending on how you count it.

    Not everything you need to find out is on the internet or in books either. Data storage and retrieval also includes things like having to ask your brother for Aunt Jenny's number so you can ring Aunt Jenny and ask her some detail about early family life, and of course Aunt Jenny will tell you to go and ring Uncle Jonny, but she doesn't have Jonny's number, wait a moment while she asks Max for it and so on.
  • eastcoast_pete - Tuesday, November 3, 2020

    You realize that the closer the cache is to actual processor speed, the more demanding the manufacturing gets and the more die area it eats. That's why there aren't any (consumer) CPUs with 1 or more MB of L1 Cache. Also, as Tomatotech wrote, we humans use mnemonic assists all the time, so the analogy short-term/long-term memory is incomplete. Writing and even drawing was invented to allow for longer-term storage and easier distribution of information. Lastly, at least IMO, it boils down to cost vs. benefit/performance as to how many levels of memory storage are best, and depends on the usage scenario.
  • Oxford Guy - Monday, November 2, 2020

    Peter Bright of Ars in 2015:

    "Intel’s Skylake lineup is robbing us of the performance king we deserve. The one Skylake processor I want is the one that Intel isn't selling.

    in games the performance was remarkable. The 65W 3.3-3.7GHz i7-5775C beat the 91W 4-4.2GHz Skylake i7-6700K. The Skylake processor has a higher clock speed, it has a higher power budget, and its improved core means that it executes more instructions per cycle, but that enormous L4 cache meant that the Broadwell could offset its disadvantages and then some. In CPU-bound games such as Project Cars and Civilization: Beyond Earth, the older chip managed to pull ahead of its newer successor.

    in memory-intensive workloads, such as some games and scientific applications, the cache is better than 21 percent more clock speed and 40 percent more power. That's the kind of gain that doesn't come along very often in our dismal post-Moore's law world.

    Those 5775C results tantalized us with the prospect of a comparable Skylake part. Pair that ginormous cache with Intel's latest-and-greatest core and raise the speed limit on the clock speed by giving it a 90-odd W power envelope, and one can't help but imagine that the result would be a fine processor for gaming and workstations alike. But imagine is all we can do because Intel isn't releasing such a chip. There won't be socketed, desktop-oriented eDRAM parts because, well, who knows why.

    Intel could have had a Skylake processor that was exciting to gamers and anyone else with performance-critical workloads. For the right task, that extra memory can do the work of a 20 percent overclock, without running anything out of spec. It would have been the must-have part for enthusiasts everywhere. And I'm tremendously disappointed that the company isn't going to make it."

    In addition to Bright's comments I remember Anandtech's article that showed the 5675C beating or equalling the 5775C in one or more gaming tests, apparently largely due to the throttling due to Intel's decision to hobble Broadwell with such a low TDP.
