A Broadwell Retrospective Review in 2020: Is eDRAM Still Worth It?
by Dr. Ian Cutress on November 2, 2020 11:00 AM EST
Broadwell with eDRAM: Still Has Gaming Legs
As we cross over into the 2020s, we now have more memory bandwidth from DRAM than a processor had in 2015. Intel's Broadwell processors were advertised as having 128 megabytes of 'eDRAM', which enabled 50 GB/s of bidirectional bandwidth at a lower latency than main memory, which ran at only 25.6 GB/s. Modern processors have access to DDR4-3200, which is 51.2 GB/s, and future processors are looking at 65 GB/s or higher.
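As a rough check on the main-memory figures above, peak theoretical bandwidth can be estimated as transfer rate times bus width times channel count. The sketch below assumes dual-channel memory with a 64-bit bus per channel, and assumes the 25.6 figure corresponds to DDR3-1600 (the text does not name the memory speed). The function name is hypothetical, and the result is in decimal gigabytes per second.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s, channels=2, bus_width_bits=64):
    """Peak theoretical DRAM bandwidth in decimal GB/s.

    transfer_rate_mt_s: data rate in mega-transfers per second (1600 for DDR3-1600).
    channels and bus_width_bits are assumptions: dual channel, 64 bits per channel.
    """
    bytes_per_transfer = bus_width_bits // 8
    return transfer_rate_mt_s * 1e6 * bytes_per_transfer * channels / 1e9


# Assumed 2015 configuration: dual-channel DDR3-1600
ddr3_1600 = peak_bandwidth_gb_s(1600)
# Configuration named in the text: DDR4-3200, assumed dual channel
ddr4_3200 = peak_bandwidth_gb_s(3200)

print(f"DDR3-1600 x2: {ddr3_1600:.1f} GB/s")
print(f"DDR4-3200 x2: {ddr4_3200:.1f} GB/s")
```

These are theoretical peaks; measured bandwidth is lower, and the formula says nothing about latency, which is the other half of the eDRAM argument.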
At this time, it is perhaps poignant to take a step back and understand the beauty of having 128 MiB of dedicated silicon for a singular task.
Intel’s eDRAM-enabled Broadwell processors accelerated a significant number of workloads that are sensitive to memory bandwidth and memory latency, gaming in particular. What eDRAM has enabled in our testing, even if we set aside the now antiquated CPU performance, is surprisingly good gaming performance. Most of our CPU gaming tests are designed to create a CPU-limited scenario, which is exactly where Broadwell plays best. Our final CPU gaming test is a 1080p Max scenario where the CPU matters less, but there still appears to be a real benefit from having on-package DRAM and that much lower latency all the way out to 128 MiB.
There have always been questions around exactly what 128 MiB of eDRAM cost Intel to produce and supply to a generation of processors. At launch, Intel priced the eDRAM versions of 14 nm Broadwell processors $60 above the non-eDRAM versions of their 22 nm Haswell equivalents. There are arguments to say that it cost Intel somewhere south of $10 per processor to build and enable, but Intel couldn’t charge that little because of market segmentation. Remember that the eDRAM was built on a mature 22 nm SoC process at the time.
As we move into an era where AMD is showcasing its new ‘double’ 32 MiB L3 cache on Zen 3 as a key part of its improved gaming performance, it is worth remembering that we already had 128 MiB of gaming acceleration in 2015. It was enabled through a very specific piece of hardware built into the chip. If we could do it in 2015, why can’t we do it in 2020?
What about HBM-enabled eDRAM for 2021?
Fast forward to 2020, and we now have mature 14 nm and 7 nm processes, as well as a cavalcade of packaging and eDRAM opportunities. We might consider that adding 1-2 GiB of eDRAM to a package could be done with high bandwidth connectivity, using either Intel’s Embedded Multi-die Interconnect Bridge (EMIB) technology or TSMC’s 3DFabric technology.
If we did that today, it could arguably be just as complex as it was to add 128 MiB back in 2015. We now have extensive EDA and packaging tools to deal with chiplet designs and multi-die environments.
So consider: at a time when high performance consumer processors are in the realm of $300 up to $500-$800, would customers consider paying $60 more for a modern high-end processor with 2 gigabytes of intermediate L4 cache? It would extend AMD’s idea of high-performance gaming cache well beyond the 32 MiB of Zen 3, or perhaps give Intel a different dynamic to its future processor portfolio.
As we move into a more chiplet-enabled environment, some of those chiplets could be an extra cache layer. However, to put some of this into perspective:
- Broadwell's 128 MiB of eDRAM was built (and is still built) on Intel's 22nm IO process and used 77 mm2 of die area.
- AMD's new RX 6000 GPUs use '128 MiB' of 7nm Infinity Cache SRAM. At an estimated 6.4 billion transistors, or 24% of the 26.8 billion transistors and ~510-530mm2 die, this cache requires a substantial amount of die area, even on 7nm.
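The transistor estimate in the second bullet can be reproduced with some back-of-the-envelope arithmetic. The sketch below assumes a conventional 6-transistor SRAM cell and counts only the data array (no tags, ECC, or control logic); neither assumption is stated in the text. The area estimate further assumes that die area scales with transistor share, which likely overstates it since SRAM tends to be denser than logic. The names are hypothetical.

```python
MIB = 1024 * 1024  # bytes per MiB


def sram_transistors(capacity_mib, transistors_per_bit=6):
    """Transistor count of the SRAM data array only.

    transistors_per_bit=6 is an assumption (standard 6T cell).
    """
    return capacity_mib * MIB * 8 * transistors_per_bit


# Figures quoted in the list above
total_transistors = 26.8e9          # whole GPU
die_area_mm2 = (510, 530)           # estimated die size range
edram_mib, edram_area_mm2 = 128, 77  # Broadwell eDRAM on 22nm

cache_transistors = sram_transistors(128)  # 6,442,450,944, about 6.4 billion
share = cache_transistors / total_transistors

# Crude assumption: cache area share equals transistor share
area_low, area_high = (share * a for a in die_area_mm2)

print(f"Cache transistors: {cache_transistors / 1e9:.2f} billion")
print(f"Share of die transistors: {share:.1%}")
print(f"Implied cache area: {area_low:.0f}-{area_high:.0f} mm2")
print(f"22nm eDRAM density: {edram_mib / edram_area_mm2:.2f} MiB/mm2")
```

Under these assumptions the 7nm SRAM cache would occupy more area than the 22nm eDRAM die did for the same capacity, which is the comparison the two bullets are drawing.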
This would suggest that for future products to integrate large amounts of cache or eDRAM, layered solutions will be required. That will demand large investment in design and packaging, especially in thermal control.
Many thanks to Dylan522p for some minor updates on die size and for pointing out that the same 22nm eDRAM chip is still in use today in Apple's 2020 base MacBook Pro 13.
120 Comments
bernstein - Monday, November 2, 2020
GDDR6 would be ideally suited as an L4 CPU cache... it has >500GB/s throughput and relatively low cost...
e36Jeff - Monday, November 2, 2020
Sure, if you build a 256-bit bus and somehow cram 8 GDDR6 chips onto the CPU package. You'd also be losing 30-40W of TDP to that.
This is an application that HBM2 would be much better for. You can easily cram up to 4GB into the package with a much lower TDP impact and still get your 500+GB/s throughput. The biggest issue for this is going to be the impact of having to add in another memory controller and the associated die space and power that it eats up.
FreckledTrout - Monday, November 2, 2020
This is also how I see it playing out. Certainly by the time Intel/AMD switch to using GAAFET maybe before. You just need a couple die shrinks that bring densities up and power down.
bernstein - Monday, November 2, 2020
scratch that, GDDR6 has much too high latency...
stanleyipkiss - Monday, November 2, 2020
The 5775C was ahead of its time. Don't know why they didn't go down that rabbit hole (of increasing the size with each gen)
hecksagon - Monday, November 2, 2020
Adding an extra 84mm2 of die area is a recipe for margin erosion, especially when the benefit is situational.
CrispySilicon - Monday, November 2, 2020
Well, I use a 5775C for my main home PC (using it now) and it's more than that. Broadwell was designed for low power. It doesn't run well over 4GHz and it's not made to.
My rig idles at about 800MHz, clocks up to 4GHz on all cores, 2GHz on the eDRAM, and 2GHz on DDR3L (overclocked 1866 HyperX Fury), yes, 3L, because THAT'S where the magic happens. Low power performance.
I've also used TridentX 2400CL10 modules in it, not worth the higher voltage.
I'm going to upgrade finally next year. CXL and DDR5 will finally retire this diamond in the rough.
Retest with nothing in the BIOS changed except the eDRAM multiplier to 20 and see what happens.
Notmyusualid - Wednesday, November 4, 2020
I usually run my Broadwell at 4.4GHz 24/7. However I have a failed BIOS battery so using the m/b default 4.0GHz overclock settings today. I don't let mine idle at low speeds, it's High Performance mode only & I only boot the Desktop for gaming, or Software Defined Radio. Both of which want GHz.
Memory is Vengeance LED 3200MHz (CL15 & only stable at 3000MHz, XMP is not stable either), and 32GB is currently installed.
Given;
C:\Windows\System32>winsat mem
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: System memory performance assessment ''
> Run Time 00:00:05.45
> Memory Performance 54386.55 MB/s
> Total Run Time 00:00:06.65
I think that is why my Broadwell missed out on any eDRAM - it wasn't necessary.
Dolphin runs about 35x seconds, as I remember it.
6950X running cool in 2020...
MrCommunistGen - Monday, November 2, 2020
HA. Epic timing. Just starting to read this now, but I recently built a system with a Broadwell-based Xeon E3 chip I got for cheap on eBay. Mostly just because I wanted to play with a chip that had eDRAM and the price of entry for an i5 or i7 has remained pretty high.
This will be a very interesting read!
alufan - Monday, November 2, 2020
News all day as long as its about Intel so it seems on here said it before and have seen nothing since to change my mind