Haswell GPU Architecture & Iris Pro

In 2010, Intel’s Clarkdale and Arrandale CPUs dropped the GMA (Graphics Media Accelerator) label from their integrated graphics. From that point on, all Intel graphics would be known as Intel HD Graphics. With certain versions of Haswell, Intel once again parts ways with its old brand and introduces a new one. This time the change is much more significant.

Intel attempted to simplify the naming confusion with this slide:

While Sandy and Ivy Bridge featured two different GPU implementations (GT1 and GT2), Haswell adds a third (GT3).

Basically it boils down to this: Haswell GT1 is just called Intel HD Graphics, and Haswell GT2 is HD 4200/4400/4600. Haswell GT3 at or below 1.1GHz is called HD 5000. Haswell GT3 capable of hitting 1.3GHz is called Iris 5100, and finally Haswell GT3e (GT3 + embedded DRAM) is called Iris Pro 5200.
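The naming rules above can be written down as a small lookup. This is only an illustrative sketch in Python: the function name and its parameters are made up here, the text does not say what separates the three GT2 names, and treating every GT3 part above 1.1GHz as Iris 5100 is an assumption (the text only mentions parts that reach 1.3GHz).

```python
def haswell_gpu_brand(tier: str, max_ghz: float = 0.0, edram: bool = False) -> str:
    """Map a Haswell GPU configuration to the brand name described in the text."""
    tier = tier.upper()
    if tier == "GT1":
        return "Intel HD Graphics"
    if tier == "GT2":
        # The text does not say which GT2 parts get which of the three names.
        return "HD 4200/4400/4600"
    if tier == "GT3":
        if edram:
            return "Iris Pro 5200"  # GT3e: GT3 plus embedded DRAM
        # Assumption: anything above 1.1GHz is treated as the 1.3GHz-class part.
        return "HD 5000" if max_ghz <= 1.1 else "Iris 5100"
    raise ValueError(f"unknown Haswell GPU tier: {tier!r}")


print(haswell_gpu_brand("GT3", max_ghz=1.3, edram=True))
```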

The fundamental GPU architecture hasn’t changed much between Ivy Bridge and Haswell. There are some enhancements, but for the most part what we’re looking at here is a dramatic increase in the amount of die area allocated for graphics.

All GPU vendors have some fundamental building block they scale up/down to hit various performance/power/price targets. AMD calls theirs a Compute Unit, NVIDIA’s is known as an SMX, and Intel’s is called a sub-slice.

In Haswell, each graphics sub-slice features 10 EUs. Each EU is a dual-issue SIMD machine with two 4-wide vector ALUs:

Low Level Architecture Comparison
| | AMD GCN | Intel Gen7 Graphics | NVIDIA Kepler |
| Building Block | GCN Compute Unit | Sub-Slice | Kepler SMX |
| Shader Building Block | 16-wide Vector SIMD | 2 x 4-wide Vector SIMD | 32-wide Vector SIMD |
| Smallest Implementation | 4 SIMDs | 10 SIMDs | 6 SIMDs |
| Smallest Implementation (ALUs) | 64 | 80 | 192 |
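
The ALU counts in the last row follow directly from the two rows above it: the number of SIMD units in the smallest building block times the width of each unit. A minimal Python sketch of that arithmetic, using only the figures from the table (the function and variable names are illustrative, not vendor terminology):

```python
def alus_per_block(simd_units: int, lanes_per_unit: int) -> int:
    """Total ALU lanes in one building block."""
    return simd_units * lanes_per_unit


# (SIMD units in the smallest implementation, lanes per unit), from the table.
blocks = {
    "AMD GCN Compute Unit": (4, 16),
    "Intel Gen7 sub-slice": (10, 2 * 4),  # each EU has two 4-wide ALUs
    "NVIDIA Kepler SMX": (6, 32),
}

alu_counts = {name: alus_per_block(*cfg) for name, cfg in blocks.items()}
for name, count in alu_counts.items():
    print(f"{name}: {count} ALUs")
```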

There are limitations as to what can be co-issued down each EU’s pair of pipes. Intel addressed many of the co-issue limitations last generation with Ivy Bridge, but there are still some that remain.

Architecturally, this makes Intel’s Gen7 graphics core a bit odd compared to AMD’s GCN and NVIDIA’s Kepler, both of which feature much wider SIMD arrays without any co-issue requirements. A single Haswell sub-slice, however, delivers a number of ALUs that is competitive with the smallest AMD and NVIDIA implementations.

Intel had a decent building block with Ivy Bridge, but it chose not to scale it up as far as it would go. With Haswell that changes. In its highest performing configuration, Haswell implements four sub-slices, or 40 EUs. Doing the math reveals a very competent-looking part on paper:

Peak Theoretical GPU Performance
| GPU | Cores/EUs | Peak FP ops per Core/EU | Max GPU Frequency | Peak GFLOPS |
| Intel Iris 5100 / Iris Pro 5200 | 40 | 16 | 1300MHz | 832 GFLOPS |
| Intel HD Graphics 5000 | 40 | 16 | 1100MHz | 704 GFLOPS |
| NVIDIA GeForce GT 650M | 384 | 2 | 900MHz | 691.2 GFLOPS |
| Intel HD Graphics 4600 | 20 | 16 | 1350MHz | 432 GFLOPS |
| Intel HD Graphics 4000 | 16 | 16 | 1150MHz | 294.4 GFLOPS |
| Intel HD Graphics 3000 | 12 | 12 | 1350MHz | 194.4 GFLOPS |
| Intel HD Graphics 2000 | 6 | 12 | 1350MHz | 97.2 GFLOPS |
| Apple A6X | 32 | 8 | 300MHz | 76.8 GFLOPS |
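
Each peak GFLOPS figure in the table is simply the product of the other three columns. The sketch below reproduces a few rows in Python. The function name is made up for illustration, and the per-EU figure of 16 for Haswell is consistent with two 4-wide ALUs each counted as a multiply-add (2 x 4 x 2), which the table itself does not spell out.

```python
def peak_gflops(units: int, flops_per_unit_per_clock: int, mhz: int) -> float:
    """Peak GFLOPS = execution units x FP ops per unit per clock x clock (GHz)."""
    return units * flops_per_unit_per_clock * mhz / 1000.0


# (cores or EUs, peak FP ops per core/EU per clock, max frequency in MHz)
parts = {
    "Iris 5100 / Iris Pro 5200": (40, 16, 1300),
    "HD Graphics 5000": (40, 16, 1100),
    "GeForce GT 650M": (384, 2, 900),
    "HD Graphics 4600": (20, 16, 1350),
    "HD Graphics 4000": (16, 16, 1150),
}

gflops = {name: peak_gflops(*cfg) for name, cfg in parts.items()}
for name, value in gflops.items():
    print(f"{name}: {value:.1f} GFLOPS")
```

The same function makes the GT2 drop-off easy to see: halving the EU count from 40 to 20 roughly halves the result, since the clock barely changes.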

In its highest-end configuration, Iris has more raw compute power than a GeForce GT 650M - and even more than a GeForce GT 750M. We’re comparing across architectures here, so this won’t necessarily translate into a performance advantage in games, but the takeaway is that with HD 5000, Iris 5100 and Iris Pro 5200 Intel is finally walking the walk of a GPU company.

Peak theoretical performance falls off steeply as soon as you start looking at the GT2 and GT1 implementations. With 1/4 to 1/2 of the execution resources of the GT3 graphics implementation, and no corresponding increase in frequency to offset the loss, the slower parts are substantially less capable. The good news is that Haswell GT2 (HD 4600) is at least more capable than Ivy Bridge GT2 (HD 4000).

Taking a step back and looking at the rest of the theoretical numbers gives us a more well-rounded look at Intel’s graphics architectures:

Peak Theoretical GPU Performance
| GPU | Peak Pixel Fill Rate | Peak Texel Rate | Peak Polygon Rate | Peak GFLOPS |
| Intel Iris 5100 / Iris Pro 5200 | 10.4 GPixels/s | 20.8 GTexels/s | 650 MPolys/s | 832 GFLOPS |
| Intel HD Graphics 5000 | 8.8 GPixels/s | 17.6 GTexels/s | 550 MPolys/s | 704 GFLOPS |
| NVIDIA GeForce GT 650M | 14.4 GPixels/s | 28.8 GTexels/s | 900 MPolys/s | 691.2 GFLOPS |
| Intel HD Graphics 4600 | 5.4 GPixels/s | 10.8 GTexels/s | 675 MPolys/s | 432 GFLOPS |
| AMD Radeon HD 7660D (Desktop Trinity, A10-5800K) | 6.4 GPixels/s | 19.2 GTexels/s | 800 MPolys/s | 614 GFLOPS |
| AMD Radeon HD 7660G (Mobile Trinity, A10-4600M) | 3.97 GPixels/s | 11.9 GTexels/s | 496 MPolys/s | 380 GFLOPS |

Intel may have more raw compute, but NVIDIA invested more everywhere else in the pipeline. Triangle, texturing and pixel throughput capabilities are all higher on the 650M than on Iris Pro 5200. Compared to AMD's Trinity however, Intel has a big advantage.
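
One way to see the gap between Intel and NVIDIA is to divide each peak rate by the GPU clock listed in the earlier table, which gives throughput per clock. This is a rough Python sketch derived purely from the published peak numbers: the per-clock figures are computed here, not quoted from Intel or NVIDIA, and the AMD parts are left out because their clocks are not listed in this section.

```python
# name: (max clock in MHz, GPixels/s, GTexels/s, MPolys/s), taken from the two tables.
PEAK = {
    "Iris 5100 / Iris Pro 5200": (1300, 10.4, 20.8, 650),
    "HD Graphics 5000": (1100, 8.8, 17.6, 550),
    "GeForce GT 650M": (900, 14.4, 28.8, 900),
    "HD Graphics 4600": (1350, 5.4, 10.8, 675),
}


def per_clock(name: str) -> tuple:
    """Return (pixels, texels, polygons) per clock, rounded to two decimals."""
    mhz, gpix, gtex, mpoly = PEAK[name]
    return (
        round(gpix * 1000 / mhz, 2),  # G/s -> M/s, then divide by MHz
        round(gtex * 1000 / mhz, 2),
        round(mpoly / mhz, 2),
    )


for name in PEAK:
    pixels, texels, polys = per_clock(name)
    print(f"{name}: {pixels} pixels/clk, {texels} texels/clk, {polys} polys/clk")
```

On these numbers the GT 650M handles twice as many pixels, texels and polygons per clock as GT3, which is why it stays ahead in those metrics despite its lower frequency.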

177 Comments

  • DanaGoyette - Saturday, June 1, 2013

    Any idea if this IGP supports 30-bit color and/or 120Hz displays?
    Currently, laptops like the HP EliteBook 8770w and Dell Precision M6700 haven't been able to use Optimus if you opt for such displays. It would be nice to see that question addressed...
  • DickGumshoe - Saturday, June 1, 2013

    I have been planning on getting a Haswell rMBP 15". I was holding out for Haswell namely due to the increased iGPU performance. My primary issue with the current Ivy Bridge rMBP is the lagginess with much of the UI, especially when there are multiple open windows.

    However, I'm a bit concerned about how the Haswell CPUs will compare with the Ivy Bridge CPUs that Apple is currently shipping with the rMBP. The Haswell equivalents of the current rMBP Ivy Bridge CPUs do not have the Iris Pro; they only have the "slightly improved" HD 4600.

    Obviously, we still need to wait until WWDC, but based on the released Haswell info, will Haswell only be a slight bump in performance for the 15" rMBP? If so, that is *very* disappointing news.
  • hfm - Saturday, June 1, 2013

    This is a huge win for Intel, definitely performance on par with a 650M. It's just as playable on nearly all those games at 1366x768. Even though the 650M pulls away at 1600x900, I wouldn't call either GPU playable in most of those games at that resolution.

    If you look at it intelligently, this is a huge win by Intel. The 750M may save them, but if I was in the market for an Ultrabook to complement my gaming notebook, I would definitely go with Iris Pro. Hell, even if I didn't have a dedicated gaming notebook I would probably get Iris Pro in my Ultrabook just for the power savings; it's not that much slower at playable resolution.
  • IntelUser2000 - Tuesday, June 4, 2013

    Iris Pro 5200 with eDRAM is only for the quad core standard notebook parts. The highest available for the Ultrabook is the 28W version, the regular Iris 5100. Preliminary results show the Iris 5100 to be roughly on par with Desktop HD 4600.
  • smilingcrow - Saturday, June 1, 2013

    For those commenting about pricing, Intel has only released data for the high end Iris Pro enabled SKUs at this point, and cheaper ones are due later.
    The high end chips are generally best avoided due to being poor value so stay tuned.
  • whyso - Saturday, June 1, 2013

    Yes, the rmbp is clearly using 90 watts on an 85 watt power adapter for the WHOLE SYSTEM!
  • gxtoast - Sunday, June 2, 2013

    Question for Anand:

    I'm looking at getting a Haswell 15" Ultrabook with 16GB RAM and plenty of SSD to run up some fairly sophisticated Cisco, Microsoft and VMware cloud labs.

    Is it likely that the Crystalwell cache could offset the lower performance specifications on the 4950HQ to make it as competitive, or more so, against the 4900MQ in this scenario?

    It would also be good to understand the performance improvement, for non-game video tasks, the HQ part might have over the 4900MQ on a FHD panel. If the advantage isn't there, then, unless the Crystalwell makes a big difference, the 4900MQ part is likely the one to get.

    Cheers
  • piesquared - Sunday, June 2, 2013

    Question. Why in Kabini reviews did we get the standard "just wait til intel releases their next gen parts to see the real competition OMGBBSAUCE!!" marketing spiel, while not a mention that hsw's competition is Kaveri?
  • IntelUser2000 - Sunday, June 2, 2013

    Uhh, because Haswell launch was less than a month away from Kabini, while Kaveri is 6+ months away from Haswell?

    AMD paper launched Kabini and Richland in March, and products are coming now. Kaveri claims to be late Q4 for Desktop and early Q1 next year for mobile. If they do the same thing, that means Feb-March for Desktop Kaveri and April/May for Mobile. Yeah.... perhaps you should think about that.
  • JarredWalton - Sunday, June 2, 2013

    The Kabini article never said, "just wait and see what Intel has coming!" so much as it said, "We need to see the actual notebooks to see how this plays out, and with Intel's Celeron and Pentium ULV parts already at Kabini's expected price point, it's a tough row to hoe." Kabini is great as an ARM or Atom competitor; it's not quite so awesome compared to Core i3, unless the OEMs pass the price savings along in some meaningful way. I'd take Kabini with a better display over Core i3 ULV, but I'll be shocked if we actually see a major OEM do Kabini with a quality 1080p panel for under $500.
