Haswell GPU Architecture & Iris Pro

In 2010, Intel’s Clarkdale and Arrandale CPUs dropped the GMA (Graphics Media Accelerator) label from their integrated graphics. From that point on, all Intel graphics would be known as Intel HD Graphics. With certain versions of Haswell, Intel once again parts ways with its old brand and introduces a new one; this time the change is much more significant.

Intel attempted to simplify the naming confusion with this slide:

While Sandy and Ivy Bridge featured two different GPU implementations (GT1 and GT2), Haswell adds a third (GT3).

Basically, it boils down to this: Haswell GT1 is just called Intel HD Graphics; Haswell GT2 is HD 4200/4400/4600; Haswell GT3 at or below 1.1GHz is called HD 5000; Haswell GT3 capable of hitting 1.3GHz is called Iris 5100; and finally, Haswell GT3e (GT3 plus embedded DRAM) is called Iris Pro 5200.

The fundamental GPU architecture hasn’t changed much between Ivy Bridge and Haswell. There are some enhancements, but for the most part what we’re looking at here is a dramatic increase in the amount of die area allocated for graphics.

All GPU vendors have some fundamental building block they scale up/down to hit various performance/power/price targets. AMD calls theirs a Compute Unit, NVIDIA’s is known as an SMX, and Intel’s is called a sub-slice.

In Haswell, each graphics sub-slice features 10 EUs. Each EU is a dual-issue SIMD machine with two 4-wide vector ALUs:

Low Level Architecture Comparison

| | AMD GCN | Intel Gen7 Graphics | NVIDIA Kepler |
|---|---|---|---|
| Building Block | GCN Compute Unit | Sub-Slice | Kepler SMX |
| Shader Building Block | 16-wide Vector SIMD | 2 x 4-wide Vector SIMD | 32-wide Vector SIMD |
| Smallest Implementation | 4 SIMDs | 10 SIMDs | 6 SIMDs |
| Smallest Implementation (ALUs) | 64 | 80 | 192 |
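The ALU counts in the last row fall out of the building-block widths directly. A quick sketch (the widths and counts are taken from the table; the labels are just for illustration):

```python
# Recomputing the "Smallest Implementation (ALUs)" row from the
# SIMD width and SIMD count of each vendor's building block.
blocks = {
    "AMD GCN Compute Unit": (16, 4),   # 16-wide SIMD x 4 SIMDs
    "Intel Gen7 sub-slice": (8, 10),   # each EU is 2 x 4-wide = 8 lanes, 10 EUs
    "NVIDIA Kepler SMX":    (32, 6),   # 32-wide SIMD x 6 SIMDs
}
alus = {name: width * count for name, (width, count) in blocks.items()}
for name, n in alus.items():
    print(f"{name}: {n} ALUs")
```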

There are limitations as to what can be co-issued down each EU’s pair of pipes. Intel addressed many of the co-issue limitations last generation with Ivy Bridge, but there are still some that remain.

Architecturally, this makes Intel’s Gen7 graphics core a bit odd compared to AMD’s GCN and NVIDIA’s Kepler, both of which feature much wider SIMD arrays without any co-issue requirements. The smallest sub-slice in Haswell, however, delivers a number of ALUs competitive with AMD and NVIDIA implementations.

Intel had a decent building block with Ivy Bridge, but it chose not to scale it up as far as it would go. With Haswell that changes. In its highest performing configuration, Haswell implements four sub-slices or 40 EUs. Doing the math reveals a very competent looking part on paper:

Peak Theoretical GPU Performance

| | Cores/EUs | Peak FP Ops per Core/EU | Max GPU Frequency | Peak GFLOPS |
|---|---|---|---|---|
| Intel Iris Pro 5100/5200 | 40 | 16 | 1300MHz | 832 GFLOPS |
| Intel HD Graphics 5000 | 40 | 16 | 1100MHz | 704 GFLOPS |
| NVIDIA GeForce GT 650M | 384 | 2 | 900MHz | 691.2 GFLOPS |
| Intel HD Graphics 4600 | 20 | 16 | 1350MHz | 432 GFLOPS |
| Intel HD Graphics 4000 | 16 | 16 | 1150MHz | 294.4 GFLOPS |
| Intel HD Graphics 3000 | 12 | 12 | 1350MHz | 194.4 GFLOPS |
| Intel HD Graphics 2000 | 6 | 12 | 1350MHz | 97.2 GFLOPS |
| Apple A6X | 32 | 8 | 300MHz | 76.8 GFLOPS |
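The "math" behind the GFLOPS column is simply units multiplied by FP ops per unit per clock multiplied by clock speed. A sketch reproducing a few rows of the table:

```python
# Peak GFLOPS = cores/EUs x peak FP ops per core/EU per clock x clock.
# Unit counts and clocks are taken from the table above.
parts = [
    ("Intel Iris Pro 5200",    40,  16, 1300),  # units, ops/unit/clk, MHz
    ("Intel HD Graphics 5000", 40,  16, 1100),
    ("NVIDIA GeForce GT 650M", 384, 2,  900),
    ("Intel HD Graphics 4600", 20,  16, 1350),
]
gflops = {name: units * ops * mhz / 1000 for name, units, ops, mhz in parts}
for name, g in gflops.items():
    print(f"{name}: {g} GFLOPS")
```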

In its highest end configuration, Iris has more raw compute power than a GeForce GT 650M - and even more than a GeForce GT 750M. Now we’re comparing across architectures here so this won’t necessarily translate into a performance advantage in games, but the takeaway is that with HD 5000, Iris 5100 and Iris Pro 5200 Intel is finally walking the walk of a GPU company.

Peak theoretical performance falls off steeply as soon as you start looking at the GT2 and GT1 implementations. With 1/4 to 1/2 of the execution resources of the GT3 implementation, and no corresponding increase in frequency to offset the loss, the slower parts are substantially less capable. The good news is that Haswell GT2 (HD 4600) is at least more capable than Ivy Bridge GT2 (HD 4000).

Taking a step back and looking at the rest of the theoretical numbers gives us a more well-rounded look at Intel’s graphics architectures:

Peak Theoretical GPU Performance

| | Peak Pixel Fill Rate | Peak Texel Rate | Peak Polygon Rate | Peak GFLOPS |
|---|---|---|---|---|
| Intel Iris Pro 5100/5200 | 10.4 GPixels/s | 20.8 GTexels/s | 650 MPolys/s | 832 GFLOPS |
| Intel HD Graphics 5000 | 8.8 GPixels/s | 17.6 GTexels/s | 550 MPolys/s | 704 GFLOPS |
| NVIDIA GeForce GT 650M | 14.4 GPixels/s | 28.8 GTexels/s | 900 MPolys/s | 691.2 GFLOPS |
| Intel HD Graphics 4600 | 5.4 GPixels/s | 10.8 GTexels/s | 675 MPolys/s | 432 GFLOPS |
| AMD Radeon HD 7660D (Desktop Trinity, A10-5800K) | 6.4 GPixels/s | 19.2 GTexels/s | 800 MPolys/s | 614 GFLOPS |
| AMD Radeon HD 7660G (Mobile Trinity, A10-4600M) | 3.97 GPixels/s | 11.9 GTexels/s | 496 MPolys/s | 380 GFLOPS |

Intel may have more raw compute, but NVIDIA invested more everywhere else in the pipeline. Triangle, texturing and pixel throughput capabilities are all higher on the 650M than on Iris Pro 5200. Compared to AMD's Trinity, however, Intel has a big advantage.
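One way to see where NVIDIA invested is to normalize those fixed-function rates by clock speed, which gives per-clock throughput. A rough sketch using figures from the table above:

```python
# Per-clock throughput = rate / clock. This separates architectural width
# from clock speed: the GT 650M runs slower but is twice as wide in
# pixel and texel throughput per clock.
parts = {
    "Intel Iris Pro 5200":    {"ghz": 1.3, "gpix_s": 10.4, "gtex_s": 20.8},
    "Intel HD Graphics 5000": {"ghz": 1.1, "gpix_s": 8.8,  "gtex_s": 17.6},
    "NVIDIA GeForce GT 650M": {"ghz": 0.9, "gpix_s": 14.4, "gtex_s": 28.8},
}
per_clock = {
    name: (round(p["gpix_s"] / p["ghz"], 1), round(p["gtex_s"] / p["ghz"], 1))
    for name, p in parts.items()
}
for name, (pix, tex) in per_clock.items():
    print(f"{name}: {pix} pixels/clk, {tex} texels/clk")
```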

177 Comments

  • HisDivineOrder - Saturday, June 1, 2013 - link

    I see Razer making an Edge tablet with an Iris-based chip. In fact, it seems built for that idea more than anything else. That or a NUC HTPC run at 720p with no AA ever. You've got superior performance to any console out there right now and it's in a size smaller than an AppleTV.

    So yeah, the next Razer Edge should include this as an optional way to lower the cost of the whole system. I also think the next Surface Pro should use this. So high end x86-based laptops with Windows 8 Pro.

    And NUC/BRIX systems that are so small they don't have room for discrete GPUs.

    I imagine some thinner than makes sense ultrathins could also use this to great effect.

    All that said, most systems people will be able to afford and use on a regular basis won't be using this chip. I think that's sad, but it's the way it will be until Intel stops trying to use Iris as a bonus for the high end users instead of trying to put discrete GPUs out of business by putting these on every chip they make, so people start seeing it CAN do a decent job on its own within its specific limitations.

    Right now, no one's going to see that, except those few fringe cases. Strictly speaking, while it might not have matched the 650m (or its successor), it did a decent job with the 640m and that's a lot better than any other IGP by Intel.
  • Spunjji - Tuesday, June 4, 2013 - link

    You confused me here on these points:

    1) The NUC uses a 17W TDP chip and overheats. We're not going to have Iris in that form factor yet.
    2) It would increase the cost of the Edge, not lower it. Same TDP problem too.

    Otherwise I agree, this really needs to roll down lower in the food chain to have a serious impact. Hopefully they'll do that with Broadwell, when the die area used by the GPU effectively becomes free thanks to the process switch.
  • whyso - Saturday, June 1, 2013 - link

    So Intel was right. Iris Pro pretty much matches a 650m at playable settings (30 fps +). Note that AnandTech is being full of BullS**t here and comparing it to an OVERCLOCKED 650m from Apple. Let's see, when Intel made that 'equal to a 650m' claim it was talking about a standard 650m, not an overclocked 650m running at 900/2500 (GDDR5) vs the normal 835/1000 (GDDR5 + boost at full, no boost = 735 mhz core). If you look at a standard clocked GDDR3 variant, Iris Pro 5200 and the 650m are pretty much very similar (depending on the games), within around 10%. New Intel drivers should further shorten the gap (given that Intel is quite good in compute).
  • JarredWalton - Sunday, June 2, 2013 - link

    http://www.anandtech.com/bench/Product/814

    For the games I tested, the rMBP15 isn't that much faster in many titles. Iris isn't quite able to match GT 650M, but it's pretty close all things considered.
  • Spunjji - Tuesday, June 4, 2013 - link

    I will believe this about new Intel drivers when I see them. I seriously, genuinely hope they surprise me, though.
  • dbcoopernz - Saturday, June 1, 2013 - link

    Are you going to test this system with madVR?
  • Ryan Smith - Sunday, June 2, 2013 - link

    We have Ganesh working to answer that question right now.
  • dbcoopernz - Sunday, June 2, 2013 - link

    Cool. :)
  • JDG1980 - Saturday, June 1, 2013 - link

    I would have liked to see some madVR tests. It seems to me that the particular architecture of this chip - lots of computing power, somewhat less memory bandwidth - would be very well suited to madVR's better processing options. It's been established that difficult features like Jinc scaling (the best quality) are limited by shader performance, not bandwidth.
    The price is far steeper than I would have expected, but once it inevitably drops a bit, I could see mini-ITX boards with this become a viable solution for high-end, passively-cooled HTPCs.
    By the way, did they ever fix the 23.976 fps error that has been there since Clarkdale?
  • dbcoopernz - Saturday, June 1, 2013 - link

    Missing Remote reports that 23.976 timing is much better.

    http://www.missingremote.com/review/intel-core-i7-...
