Addressing the Memory Bandwidth Problem

Integrated graphics solutions always bumped into a glass ceiling because they lacked the high-speed memory interfaces of their discrete counterparts. As Haswell is predominantly a mobile-focused architecture, designed to span the gamut from 10W to 84W TDPs, relying on a power-hungry high-speed external memory interface wasn’t going to cut it. Intel’s solution to the problem, like most of Intel’s solutions, involves custom silicon. As an owner of several bleeding edge foundries, would you expect anything less?

As we’ve been talking about for a while now, the highest end Haswell graphics configuration includes 128MB of eDRAM on-package. The eDRAM itself is a custom design by Intel and it’s built on a variant of Intel’s P1271 22nm SoC process (not P1270, the CPU process). Intel needed a set of low leakage 22nm transistors rather than the ability to drive very high frequencies which is why it’s using the mobile SoC 22nm process variant here.

Despite its name, the eDRAM silicon is actually separate from the main microprocessor die - it’s simply housed on the same package. Intel’s reasoning here is obvious. By making Crystalwell (the codename for the eDRAM silicon) a discrete die, it’s easier to respond to changes in demand. If Crystalwell demand is lower than expected, Intel still has a lot of quad-core GT3 Haswell die that it can sell and vice versa.

Crystalwell Architecture

Unlike previous eDRAM implementations in game consoles, Crystalwell is a true 4th level cache in the memory hierarchy. It acts as a victim buffer to the L3 cache, meaning anything evicted from L3 cache immediately goes into the L4 cache. Both CPU and GPU requests are cached. The cache can dynamically allocate its partitioning between CPU and GPU use. If you don’t use the GPU at all (e.g. discrete GPU installed), Crystalwell will still work on caching CPU requests. That’s right, Haswell CPUs equipped with Crystalwell effectively have a 128MB L4 cache.
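The victim buffer behavior can be illustrated with a toy model. This is a minimal sketch and not Intel's implementation: the `LRUCache` class, the `access` and `replay` helpers, the LRU replacement policy, the tiny capacities, and the choice to move a line back into L3 on an L4 hit are all assumptions made for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache that tracks line addresses in LRU order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def lookup(self, addr):
        """Return True on a hit and mark the line as most recently used."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        return False

    def remove(self, addr):
        """Drop a line if present."""
        self.lines.pop(addr, None)

    def insert(self, addr):
        """Insert a line; return the evicted address, or None if nothing was evicted."""
        self.lines[addr] = True
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            victim, _ = self.lines.popitem(last=False)
            return victim
        return None


def access(l3, l4, addr):
    """Serve one request and return the level that supplied it."""
    if l3.lookup(addr):
        return "L3"
    if l4.lookup(addr):
        source = "L4"
        l4.remove(addr)  # assumption: the line moves back up into L3
    else:
        source = "DRAM"
    victim = l3.insert(addr)
    if victim is not None:
        l4.insert(victim)  # anything evicted from L3 falls into L4
    return source


def replay(trace, l3_lines=2, l4_lines=4):
    """Run a list of line addresses through a fresh L3/L4 pair."""
    l3, l4 = LRUCache(l3_lines), LRUCache(l4_lines)
    return [access(l3, l4, addr) for addr in trace]


if __name__ == "__main__":
    print(replay([1, 2, 3, 1, 3]))
```

With a two-line L3, the fourth access finds line 1 in the L4 because it was pushed out of L3 when line 3 arrived. The L4 is only ever filled by L3 evictions, which is what makes it a victim buffer in this sketch.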

Intel isn’t providing much detail on the connection to Crystalwell other than to say that it’s a narrow, double-pumped serial interface capable of delivering 50GB/s bi-directional bandwidth (100GB/s aggregate). Access latency after a miss in the L3 cache is 30 - 32ns, nicely in between an L3 and main memory access.

The eDRAM clock tops out at 1.6GHz.

There’s only a single size of eDRAM offered this generation: 128MB. Since it’s a cache and not a buffer (and a giant one at that), Intel found that hit rate rarely dropped below 95%. It turns out that for current workloads, Intel didn’t see much benefit beyond a 32MB eDRAM; however, it wanted the design to be future proof. Intel doubled the size to deal with any increases in game complexity, and doubled it again just to be sure. I believe the exact wording Intel’s Tom Piazza used when explaining why 128MB was “go big or go home”. It’s very rare that we see Intel be so liberal with die area, which makes me think this 128MB design is going to stick around for a while.
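To get a feel for what a 95% hit rate buys, the expected latency of a request that misses L3 can be written as a weighted average. This is a rough sketch: the 95% hit rate and the 30 - 32ns eDRAM latency come from Intel's figures quoted in this article, while the function name and the 80ns main memory latency are hypothetical placeholders, not measurements.

```python
def avg_latency_after_l3_miss(hit_rate, edram_latency_ns, dram_latency_ns):
    """Expected latency of a request that missed L3, weighted by L4 hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * edram_latency_ns + (1.0 - hit_rate) * dram_latency_ns


if __name__ == "__main__":
    # 31ns is the midpoint of the quoted 30 - 32ns range.
    # 80ns is an assumed DRAM latency used only for illustration.
    for hit_rate in (0.80, 0.95):
        latency = avg_latency_after_l3_miss(hit_rate, 31.0, 80.0)
        print(f"hit rate {hit_rate:.0%}: {latency:.2f} ns")
```

The model ignores queuing and any extra lookup cost on an L4 miss, so treat the result as a first-order estimate only.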

The 32MB number is particularly interesting because it’s the same number Microsoft arrived at for the embedded SRAM on the Xbox One silicon. If you felt that I was hinting heavily at the Xbox One being ok if its eSRAM was indeed a cache, this is why. I’d also like to point out the difference in future proofing between the two designs.

The Crystalwell-enabled graphics driver can choose to keep certain things out of the eDRAM. The frame buffer isn’t stored in eDRAM, for example.

Peak Theoretical Memory Bandwidth

GPU                     | Memory Interface     | Memory Frequency        | Peak Theoretical Bandwidth
Intel Iris Pro 5200     | 128-bit DDR3 + eDRAM | 1600MHz + 1600MHz eDRAM | 25.6GB/s + 50GB/s eDRAM (bidirectional)
NVIDIA GeForce GT 650M  | 128-bit GDDR5        | 5016MHz                 | 80.3GB/s
Intel HD 5100/4600/4000 | 128-bit DDR3         | 1600MHz                 | 25.6GB/s
Apple A6X               | 128-bit LPDDR2       | 1066MHz                 | 17.1GB/s
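The external memory figures in the table follow from bus width multiplied by transfer rate. The sketch below reproduces them, assuming the frequency column is the effective data rate in mega-transfers per second and that 1GB means 10^9 bytes. The function name and the `configs` dictionary are illustrative, and the eDRAM link is left out because Intel hasn't disclosed its width.

```python
def peak_bandwidth_gbs(bus_width_bits, transfer_rate_mts):
    """Peak theoretical bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


# (bus width in bits, effective transfer rate in MT/s), taken from the table
configs = {
    "Intel HD 5100/4600/4000 (DDR3)": (128, 1600),
    "NVIDIA GeForce GT 650M (GDDR5)": (128, 5016),
    "Apple A6X (LPDDR2)": (128, 1066),
}

if __name__ == "__main__":
    for name, (bits, mts) in configs.items():
        print(f"{name}: {peak_bandwidth_gbs(bits, mts):.1f} GB/s")
```

A 128-bit bus moves 16 bytes per transfer, so DDR3-1600 works out to 16 x 1600 million = 25.6GB/s.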

Intel claims that it would take a 100 - 130GB/s GDDR memory interface to deliver similar effective performance to Crystalwell since the latter is a cache. Accessing the same data (e.g. texture reads) over and over again is greatly benefitted by having a large L4 cache on package.

I get the impression that the plan might be to keep the eDRAM on an n-1 process going forward. When Intel moves to 14nm with Broadwell, it’s entirely possible that Crystalwell will remain at 22nm. Doing so would help Intel put older fabs to use, especially if there’s no need for a near term increase in eDRAM size. I asked about the potential to integrate eDRAM on-die, but was told that it’s far too early for that discussion. Given the size of the 128MB eDRAM on 22nm (~84mm^2), I can understand why. Intel did float an interesting idea by me though. In the future it could integrate 16 - 32MB of eDRAM on-die for specific use cases (e.g. storing the frame buffer).
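A quick back-of-the-envelope calculation shows why a smaller on-die eDRAM is more plausible than the full 128MB. The sketch scales the ~84mm^2 figure linearly with capacity, which is a crude assumption: the die also carries I/O and control logic that would not shrink proportionally, and an on-die version might be built on a different process. The constant and function names are illustrative.

```python
EDRAM_CAPACITY_MB = 128
EDRAM_AREA_MM2 = 84.0  # approximate Crystalwell die size on 22nm, per the article


def scaled_area_mm2(capacity_mb):
    """Estimate die area for a given capacity, assuming area scales linearly."""
    return capacity_mb * EDRAM_AREA_MM2 / EDRAM_CAPACITY_MB


if __name__ == "__main__":
    density = EDRAM_CAPACITY_MB / EDRAM_AREA_MM2
    print(f"density: {density:.2f} MB/mm^2")
    for capacity_mb in (16, 32):
        print(f"{capacity_mb}MB: ~{scaled_area_mm2(capacity_mb):.1f} mm^2")
```

Under that assumption, 16 - 32MB would need very roughly 10 - 21mm^2 of die area.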

Intel settled on eDRAM because of its high bandwidth and low power characteristics. According to Intel, Crystalwell’s bandwidth curve is very flat - far more workload independent than GDDR5. The power consumption also sounds very good. At idle, simply refreshing whatever data is stored within, the Crystalwell die will consume between 0.5W and 1W. Under load, operating at full bandwidth, the power usage is 3.5 - 4.5W. The idle figures might sound a bit high, but do keep in mind that since Crystalwell caches both CPU and GPU memory it’s entirely possible to shut off the main memory controller and operate completely on-package depending on the workload. At the same time, I suspect there’s room for future power improvements especially as Crystalwell (or a lower power derivative) heads towards ultra mobile silicon.
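The load power figures can be turned into a rough energy cost per byte moved. The article doesn't say whether "full bandwidth" means 50GB/s in one direction or the 100GB/s aggregate, so the sketch computes both. It also attributes all of the load power to data movement, which is a simplifying assumption. The function name is illustrative.

```python
def picojoules_per_byte(power_w, bandwidth_gbs):
    """Energy per byte transferred, in picojoules (1 GB = 10**9 bytes)."""
    joules_per_byte = power_w / (bandwidth_gbs * 1e9)
    return joules_per_byte * 1e12


if __name__ == "__main__":
    for bandwidth_gbs in (50, 100):
        low = picojoules_per_byte(3.5, bandwidth_gbs)
        high = picojoules_per_byte(4.5, bandwidth_gbs)
        print(f"{bandwidth_gbs}GB/s: {low:.0f} - {high:.0f} pJ/byte")
```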

Crystalwell is tracked by Haswell’s PCU (Power Control Unit) just like the CPU cores, GPU, L3 cache, etc. Paying attention to thermals, workload and even eDRAM hit rate, the PCU can shift power budget between the CPU, GPU and eDRAM.

Crystalwell is only offered alongside quad-core GT3 Haswell. Unlike previous generations of Intel graphics, high-end socketed desktop parts do not get Crystalwell. Only mobile H-SKUs and desktop (BGA-only) R-SKUs have Crystalwell at this point. Given the potential use as a very large CPU cache, it’s a bit insane that Intel won’t even offer a single K-series SKU with Crystalwell on-board.

As for why lower end parts don’t get it, they simply don’t have high enough memory bandwidth demands - particularly in GT1/GT2 graphics configurations. According to Intel, once you get to about 18W then GT3e starts to make sense but you run into die size constraints there. An Ultrabook SKU with Crystalwell would make a ton of sense, but given where Ultrabooks are headed (price-wise) I’m not sure Intel could get any takers.


174 Comments


  • DanaGoyette - Saturday, June 01, 2013

    Any idea if this IGP supports 30-bit color and/or 120Hz displays?
    Currently, laptops like the HP EliteBook 8770w and Dell Precision M6700 haven't been able to use Optimus if you opt for such displays. It would be nice to see that question addressed...
  • DickGumshoe - Saturday, June 01, 2013

    I have been planning on getting a Haswell rMBP 15". I was holding out for Haswell namely due to the increased iGPU performance. My primary issue with the current Ivy Bridge rMBP is the lagginess with much of the UI, especially when there are multiple open windows.

    However, I'm a bit concerned about how the Haswell CPU's will compare with the current Ivy Bridge CPU's that Apple is currently shipping with the rMBP. The Haswell equivalent of the current rMBP Ivy Bridge CPU's do not have the Iris Pro, they only have the "slightly improved" HD 4600.

    Obviously, we still need to wait until WWDC, but based on the released Haswell info, will Haswell only be a slight bump in performance for the 15" rMBP? If so, that is *very* disappointing news.
  • hfm - Saturday, June 01, 2013

    This is a huge win for Intel, definitely performance on par with a 650M. It's just as playable on nearly all those games at 1366x768. Even though the 650M pulls away at 1600X900, I wouldn't call either gpu playable in most of those games at that resolution.

    If you look at it intelligently, this is a huge win by Intel. The 750M may save them, but if I was in the market for an Ultrabook to complement my gaming notebook, I would definitely go with Iris Pro. Hell, even if I didn't have a dedicated gaming notebook I would probably get Iris Pro in my Ultrabook just for the power savings, it's not that much slower at playable resolution.
  • IntelUser2000 - Tuesday, June 04, 2013

    Iris Pro 5200 with eDRAM is only for the quad core standard notebook parts. The highest available for the Ultrabook is the 28W version, the regular Iris 5100. Preliminary results show the Iris 5100 to be roughly on par with Desktop HD 4600.
  • smilingcrow - Saturday, June 01, 2013

    For those commenting about pricing Intel has only released data for the high end Iris Pro enabled SKUs at this point and cheaper ones are due later.
    The high end chips are generally best avoided due to being poor value so stay tuned.
  • whyso - Saturday, June 01, 2013

    Yes, the rmbp is clearly using 90 watts on an 85 watt power adapter for the WHOLE SYSTEM!
  • gxtoast - Sunday, June 02, 2013

    Question for Anand:

    I'm looking at getting a Haswell 15" Ultrabook with 16GB RAM and plenty of SSD to run up some fairly sophisticated Cisco, Microsoft and VMware cloud labs.

    Is it likely that the Crystalwell cache could offset the lower performance specifications on the 4950HQ to make it as competitive, or more so, against the 4900MQ in this scenario?

    It would also be good to understand the performance improvement, for non-game video tasks, the HQ part might have over the 4900MQ on a FHD panel. If the advantage isn't there, then, unless the Crystalwell makes a big difference, the 4900MQ part is likely the one to get.

    Cheers
  • piesquared - Sunday, June 02, 2013

    Question. Why in Kabini reviews did we get the standard "just wait til intel releases their next gen parts to see the real competition OMGBBSAUCE!!" marketing spiel, while not a mention that hsw's competition is Kaveri?
  • IntelUser2000 - Sunday, June 02, 2013

    Uhh, because Haswell launch was less than a month away from Kabini, while Kaveri is 6+ months away from Haswell?

    AMD paper launched Kabini and Richland in March, and products are coming now. Kaveri claims to be late Q4 for Desktop and early Q1 next year for mobile. If they do the same thing, that means Feb-March for Desktop Kaveri and April/May for Mobile. Yeah.... perhaps you should think about that.
  • JarredWalton - Sunday, June 02, 2013

    The Kabini article never said, "just wait and see what Intel has coming!" so much as it said, "We need to see the actual notebooks to see how this plays out, and with Intel's Celeron and Pentium ULV parts already at Kabini's expected price point, it's a tough row to hoe." Kabini is great as an ARM or Atom competitor; it's not quite so awesome compared to Core i3, unless the OEMs pass the price savings along in some meaningful way. I'd take Kabini with a better display over Core i3 ULV, but I'll be shocked if we actually see a major OEM do Kabini with a quality 1080p panel for under $500.
