Haswell GPU Architecture & Iris Pro

In 2010, Intel’s Clarkdale and Arrandale CPUs dropped the GMA (Graphics Media Accelerator) label from their integrated graphics. From that point on, all Intel integrated graphics would be known as Intel HD Graphics. With certain versions of Haswell, Intel once again parts ways with an old brand and introduces a new one, and this time the change is much more significant.

Intel attempted to simplify the naming confusion with this slide:

While Sandy and Ivy Bridge featured two different GPU implementations (GT1 and GT2), Haswell adds a third (GT3).

Basically, it boils down to this: Haswell GT1 is just called Intel HD Graphics; Haswell GT2 is HD 4200/4400/4600; Haswell GT3 running at or below 1.1GHz is called HD 5000; Haswell GT3 capable of hitting 1.3GHz is called Iris 5100; and finally Haswell GT3e (GT3 + embedded DRAM) is called Iris Pro 5200.
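The tiers above can be captured in a small lookup. This is an illustrative sketch of the branding rules as described here, not an official Intel decoder:

```python
# Haswell graphics branding by GPU configuration, per the tiers above.
# For GT3, the max frequency decides HD 5000 vs. Iris 5100.
def haswell_gpu_brand(config: str, max_ghz: float = 0.0) -> str:
    if config == "GT1":
        return "Intel HD Graphics"
    if config == "GT2":
        return "Intel HD Graphics 4200/4400/4600"
    if config == "GT3":
        return "Intel Iris 5100" if max_ghz > 1.1 else "Intel HD Graphics 5000"
    if config == "GT3e":  # GT3 + embedded DRAM
        return "Intel Iris Pro 5200"
    raise ValueError(f"unknown Haswell graphics config: {config}")
```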

The fundamental GPU architecture hasn’t changed much between Ivy Bridge and Haswell. There are some enhancements, but for the most part what we’re looking at here is a dramatic increase in the amount of die area allocated for graphics.

All GPU vendors have some fundamental building block they scale up/down to hit various performance/power/price targets. AMD calls theirs a Compute Unit, NVIDIA’s is known as an SMX, and Intel’s is called a sub-slice.

In Haswell, each graphics sub-slice features 10 EUs. Each EU is a dual-issue SIMD machine with two 4-wide vector ALUs:

Low Level Architecture Comparison

| | AMD GCN | Intel Gen7 Graphics | NVIDIA Kepler |
|---|---|---|---|
| Building Block | GCN Compute Unit | Sub-Slice | Kepler SMX |
| Shader Building Block | 16-wide Vector SIMD | 2 x 4-wide Vector SIMD | 32-wide Vector SIMD |
| Smallest Implementation | 4 SIMDs | 10 SIMDs | 6 SIMDs |
| Smallest Implementation (ALUs) | 64 | 80 | 192 |
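The ALU counts in the last row follow directly from the SIMD organization quoted above; a quick consistency check:

```python
# ALUs in each vendor's smallest implementation = SIMD count x lanes per SIMD.
# For Intel, each "SIMD" here is one EU: two 4-wide vector ALUs = 8 lanes.
smallest = {
    "AMD GCN CU":        (4, 16),   # 4 SIMDs x 16 lanes
    "Intel sub-slice":   (10, 8),   # 10 EUs x (2 x 4-wide)
    "NVIDIA Kepler SMX": (6, 32),   # 6 SIMDs x 32 lanes
}
alus = {name: simds * lanes for name, (simds, lanes) in smallest.items()}
# Intel's 80-ALU sub-slice lands between GCN's 64 and Kepler's 192.
```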

There are limitations as to what can be co-issued down each EU’s pair of pipes. Intel addressed many of the co-issue limitations last generation with Ivy Bridge, but there are still some that remain.

Architecturally, this makes Intel’s Gen7 graphics core a bit odd compared to AMD’s GCN and NVIDIA’s Kepler, both of which feature much wider SIMD arrays without any co-issue requirements. The smallest sub-slice in Haswell however delivers a competitive number of ALUs to AMD and NVIDIA implementations.

Intel had a decent building block with Ivy Bridge, but it chose not to scale it up as far as it would go. With Haswell that changes. In its highest performing configuration, Haswell implements four sub-slices or 40 EUs. Doing the math reveals a very competent looking part on paper:

Peak Theoretical GPU Performance

| GPU | Cores/EUs | Peak FP Ops per Core/EU | Max GPU Frequency | Peak GFLOPS |
|---|---|---|---|---|
| Intel Iris 5100/Iris Pro 5200 | 40 | 16 | 1300MHz | 832 |
| Intel HD Graphics 5000 | 40 | 16 | 1100MHz | 704 |
| NVIDIA GeForce GT 650M | 384 | 2 | 900MHz | 691.2 |
| Intel HD Graphics 4600 | 20 | 16 | 1350MHz | 432 |
| Intel HD Graphics 4000 | 16 | 16 | 1150MHz | 294.4 |
| Intel HD Graphics 3000 | 12 | 12 | 1350MHz | 194.4 |
| Intel HD Graphics 2000 | 6 | 12 | 1350MHz | 97.2 |
| Apple A6X | 32 | 8 | 300MHz | 76.8 |
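The GFLOPS column above is just cores (or EUs) multiplied by peak FP ops per clock and by clock speed; a short sketch reproduces a few of the rows:

```python
# Peak GFLOPS = cores x peak FP ops per core per clock x clock in GHz.
# Figures taken from the table above; this is just a consistency check.
def peak_gflops(cores: int, fp_ops_per_core: int, mhz: int) -> float:
    return cores * fp_ops_per_core * mhz / 1000.0

specs = {
    "Iris Pro 5200":    (40, 16, 1300),   # 832 GFLOPS
    "HD Graphics 5000": (40, 16, 1100),   # 704 GFLOPS
    "GeForce GT 650M":  (384, 2, 900),    # 691.2 GFLOPS
    "HD Graphics 4600": (20, 16, 1350),   # 432 GFLOPS
}
for name, (cores, ops, mhz) in specs.items():
    print(f"{name}: {peak_gflops(cores, ops, mhz):.1f} GFLOPS")
```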

In its highest-end configuration, Iris has more raw compute power than a GeForce GT 650M - and even more than a GeForce GT 750M. We're comparing across architectures here, so this won't necessarily translate into a performance advantage in games, but the takeaway is that with HD 5000, Iris 5100 and Iris Pro 5200, Intel is finally walking the walk of a GPU company.

Peak theoretical performance falls off steeply as soon as you start looking at the GT2 and GT1 implementations. With 1/4 to 1/2 of the execution resources of GT3, and no corresponding increase in frequency to offset the loss, the slower parts are substantially less capable. The good news is that Haswell GT2 (HD 4600) is at least more capable than Ivy Bridge GT2 (HD 4000).

Taking a step back and looking at the rest of the theoretical numbers gives us a more well-rounded look at Intel's graphics architectures:

Peak Theoretical GPU Performance

| GPU | Peak Pixel Fill Rate | Peak Texel Rate | Peak Polygon Rate | Peak GFLOPS |
|---|---|---|---|---|
| Intel Iris 5100/Iris Pro 5200 | 10.4 GPixels/s | 20.8 GTexels/s | 650 MPolys/s | 832 |
| Intel HD Graphics 5000 | 8.8 GPixels/s | 17.6 GTexels/s | 550 MPolys/s | 704 |
| NVIDIA GeForce GT 650M | 14.4 GPixels/s | 28.8 GTexels/s | 900 MPolys/s | 691.2 |
| Intel HD Graphics 4600 | 5.4 GPixels/s | 10.8 GTexels/s | 675 MPolys/s | 432 |
| AMD Radeon HD 7660D (Desktop Trinity, A10-5800K) | 6.4 GPixels/s | 19.2 GTexels/s | 800 MPolys/s | 614 |
| AMD Radeon HD 7660G (Mobile Trinity, A10-4600M) | 3.97 GPixels/s | 11.9 GTexels/s | 496 MPolys/s | 380 |
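Fixed-function throughput scales linearly with clock, so dividing the rates above by each part's max frequency suggests Haswell GT3 pushes roughly 8 pixels, 16 texels and 0.5 triangles per clock. Those per-clock figures are inferred from the table, not quoted from a spec sheet, and the sketch below just runs the multiplication the other way:

```python
# Throughput = per-clock rate x GPU clock. Per-clock figures here are
# inferred from the table (rate / max frequency), not from vendor specs.
def throughput(pix_per_clk, tex_per_clk, tris_per_clk, ghz):
    return {
        "GPixels/s": pix_per_clk * ghz,
        "GTexels/s": tex_per_clk * ghz,
        "MPolys/s":  tris_per_clk * ghz * 1000,
    }

iris_pro = throughput(8, 16, 0.5, 1.3)   # should match the Iris Pro 5200 row
hd5000   = throughput(8, 16, 0.5, 1.1)   # should match the HD 5000 row
```

The same per-clock rates at half the unit count reproduce the HD 4600 (GT2) row, which is consistent with GT2 being half of a GT3.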

Intel may have more raw compute, but NVIDIA invested more everywhere else in the pipeline. Triangle, texturing and pixel throughput are all higher on the 650M than on Iris Pro 5200. Compared to AMD's Trinity, however, Intel has a big advantage.

Comments

  • TheJian - Sunday, June 2, 2013 - link

    This is useless at anything above 1366x768 for games (and even that is questionable, as I don't think you were posting minimum fps here). It will also be facing Richland shortly, not AMD's aging Trinity. And the claims of catching a 650M...ROFL. Whatever, Intel. I wouldn't touch a device today with less than 1600x900 and want to be able to output it to at least a 1080p when in house (if not higher, 22in or 24in). Discrete is here to stay, clearly. I have a Dell i9300 (GeForce 6800) from ~2005 that is more potent and runs 1600x900 stuff fine; I think it has 256MB of memory. My dad has an i9200 (Radeon 9700 Pro with 128MB I think) that this Iris would have trouble with. Intel has a ways to go before they can claim to take out even the low-end discrete cards. You are NOT going to game on this crap and enjoy it, never mind trying to use HDMI/DVI out to a higher-res monitor at home. Good for perhaps the NICHE road warrior market, not much more.

    But hey, at least it plays quite a bit of the GOG games catalog now...LOL. Icewind Dale and Baldur's gate should run fine :)
  • wizfactor - Sunday, June 2, 2013 - link

    Shimpi's guess as to what will go into the 15-inch rMBP is interesting, but I have a gut feeling that it will not be the case. Despite the huge gains that Iris Pro has over the existing HD 4000, it is still a step back from last year's GT 650M. I doubt Apple will be able to convince its customers to spend $2199 on a computer that has less graphics performance than last year's (now discounted) model. Despite its visual similarity to an Air, the rMBP still has performance as a priority, so my guess is that Apple will stick to discrete for the time-being.

    That being said, I think Iris Pro opens up a huge opportunity to the 15-inch rMBP lineup, mainly a lower entry model that finally undercuts the $2000 barrier. In other words, while the $2199 price point may be too high to switch entirely to iGPU, Apple might be able to pull it off at $1799. Want a 15-inch Retina Display? Here's a more affordable model with decent performance. Want a discrete GPU? You can get that with the existing $2199 price point.

    As far as the 13-inch version is concerned, my guesses are rather murky. I would agree with the others that a quad-core Haswell with Iris Pro is the best-case scenario for the 13-inch model, but it might be too high an expectation for Apple engineers to live up to. I think Apple's minimum target with the 13-inch rMBP should be dual-core Haswell with Iris 5100. This way, Apple can stick to a lower TDP via dual-core, and while Iris isn't as strong as Iris Pro, its gain over HD 4000 is enough to justify the upgrade. Of course, there's always the chance that Apple has temporary exclusivity on an unannounced dual-core Haswell with Iris Pro, the same way it had exclusivity with ULV Core 2 Duo years ago with MBA, but I prefer not to make Haswell models out of thin air.
  • BSMonitor - Monday, June 3, 2013 - link

    You are assuming that the next MBP will have the same chassis size. If thin is in, the dGPU-less Iris Pro is EXTREMELY attractive for heat/power considerations.

    More likely is the end of the thicker MBP and separate thin MBAir lines. Almost certainly, starting in two weeks we have just one line, MBP all with retina, all the thickness of MBAir. 11" up to 15"..
  • TheJian - Sunday, June 2, 2013 - link

    As far as encoding goes, why do you guys ignore cuda?
    http://www.extremetech.com/computing/128681-the-wr...
    Extremetech's last comment:
    "Avoid MediaEspresso entirely."

    So the one you pick is the worst of the bunch to show GPU power...jeez. You guys clearly have a CS6 suite license, so why not run Adobe Premiere, which uses CUDA, and run it vs the same vid render you use in Sony's Vegas? Surely you can rip the same vid in both to find out why you'd seek a CUDA-enabled app to rip with. Handbrake looks like they're working on supporting CUDA shortly as well. Or heck, try FREEMAKE (yes, free, with CUDA). Anything besides ignoring CUDA and acting like this is what a user would get at home. If I owned an NV card (and I don't in my desktop) I'd seek CUDA for everything I did that I could find. Freemake just put out another update 5/29, a few days ago.
    http://www.tested.com/tech/windows/1574-handbrake-...
    2.5yrs ago it was equal, my guess is they've improved Cuda use by now. You've gotta love Adam and Jamie... :) Glad they branched out past just the Mythbusters show.
  • xrror - Sunday, June 2, 2013 - link

    I have a bad suspicion one of the reasons why you won't see a desktop Haswell part with eDRAM is that it would pretty much euthanize socket 2011 on the spot.

    IF Intel does actually release a "K" part with it enabled, I wonder how restrictive or flexible the frequency ratios on the eDRAM will be?

    Speaking of socket 2011, I wonder if/when Intel will ever refresh it from Sandy-E?
  • wizfactor - Sunday, June 2, 2013 - link

    I wouldn't call myself an expert on computer hardware, but isn't it possible that Iris Pro's bottleneck at 1600x900 resolutions could be attributed to insufficient video memory? Sure, that eDRAM is a screamer as far as latency is concerned, but if the game is running on higher resolutions and utilising HD textures, that 128MB would fill up really quickly, and the chip would be forced to swap often. Better to not have to keep loading and unloading stuff in memory, right?

    Others note the similarity between Crystalwell and the Xbox One's 32MB Cache, but let's not forget that the Xbox One has its own video memory; Iris Pro does not, or put another way, it's only got 128 MB of it. In a time where PC games demand at least 512 MB of video RAM or more, shouldn't the bottleneck that would affect Iris Pro be obvious? 128 MB of RAM is sure as hell a lot more than 0, but if games demand at least four times as much memory, then wouldn't Iris Pro be forced to use regular RAM to compensate, still? This sounds to me like what's causing Iris Pro to choke at higher resolutions.

    If I am at least right about Crystalwell, it is still very impressive that Iris Pro was able to get in reach of the GT 650M with so little memory to work with. It could also explain why Iris Pro does so much better in Crysis: Warhead, where the minimum requirements are more lenient with video memory (256 MB minimum). If I am wrong, however, somebody please correct me, and I would love to have more discussion on this matter.
  • BSMonitor - Monday, June 3, 2013 - link

    Me thinks thou not know what thou talking about ;)
  • F_A - Monday, June 3, 2013 - link

    The video memory is stored in main memory, be it 4GB and above... (so the min specs of Crysis are clearly met)... the point is bandwidth.
    The article says there is roughly 50GB/s when the cache is run at 1.6GHz.
    So ramping it up in future makes the new Iris 5300, I suppose.
  • glugglug - Tuesday, June 4, 2013 - link

    Video cards may have 512MB to 1GB of video memory for marketing purposes, but you would be hard pressed to find a single game title that makes use of more than 128.
  • tipoo - Wednesday, January 21, 2015 - link

    Uhh, what? Games can use far more than that, seeing them push past 2GB is common. But what matters is how much of that memory needs high bandwidth, and that's where 128MB of cache can be a good enough solution for most games.
