Broadwell GPU Architecture

While Intel’s tick products are relatively conservative on the CPU side, the same cannot be said for the GPU side. Although the same general tick-tock rules apply to GPUs as they do to CPUs – the bigger architectural changes come with the tock – the embarrassingly parallel nature of graphics, coupled with the density improvements from newer process nodes, means that even in a tick Intel’s GPU improvements are going to be substantial. And Broadwell will be no exception.

At a high level, Broadwell’s GPU is a continuation of the Intel Gen7 architecture first introduced in Ivy Bridge and further refined as Gen7.5 in Haswell. While there are some important underlying changes that we’ll get to in a moment, at a fundamental level this is still the same GPU architecture that we’ve seen from Intel for the last two generations, just with more features, more polish, and more optimizations than ever before.

In terms of functionality Broadwell’s GPU has been upgraded to support the latest and greatest graphics APIs, an important milestone for Intel as this means their iGPU is now at feature parity with iGPUs and dGPUs from AMD and NVIDIA. With support for Direct3D feature level 11_2 and Intel’s previous commitment to Direct3D 12, Intel no longer trails AMD and NVIDIA in base features; in fact with FL 11_2 support they’re even technically ahead of NVIDIA’s FL 11_0 Kepler and Maxwell architectures. FL 11_2 is a rather minor update in the long run, but support for it means that Intel now supports tiled resources and pre-compiled shader headers.

Meanwhile on the compute front, Intel has confirmed that Broadwell’s GPU will offer support for OpenCL 2.0, including OpenCL’s shared virtual memory (SVM). OpenCL 2.0 brings with it several improvements that allow GPUs to be more robust compute devices, and though Intel doesn’t have a programming paradigm comparable to AMD’s HSA, SVM nonetheless affords Intel and OpenCL programmers the chance to better leverage Broadwell’s CPU and GPU together by directly sharing complex, pointer-based data structures rather than copying them back and forth.
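
To make the SVM benefit concrete, here is a minimal host-side sketch in C of what sharing a pointer-based structure looks like under OpenCL 2.0. The kernel name (walk_list), the node layout, and the use of fine-grained buffer SVM are illustrative assumptions rather than Intel-specific details, and error checking is omitted for brevity:

```c
/* Minimal sketch of OpenCL 2.0 shared virtual memory (SVM).
 * Assumptions: the device reports fine-grained buffer SVM support,
 * and a kernel named "walk_list" (hypothetical) has been built. */
#define CL_TARGET_OPENCL_VERSION 200
#include <CL/cl.h>
#include <stddef.h>

typedef struct Node {
    float        value;
    struct Node *next;   /* embedded pointer stays valid on the GPU under SVM */
} Node;

void run_svm_example(cl_context ctx, cl_command_queue queue, cl_kernel walk_list)
{
    /* Allocate two nodes in shared virtual memory. With fine-grained
     * buffer SVM the host can read and write them directly, no map calls. */
    Node *head = (Node *)clSVMAlloc(ctx,
        CL_MEM_READ_WRITE | CL_MEM_SVM_FINE_GRAIN_BUFFER,
        2 * sizeof(Node), 0);

    head[0].value = 1.0f;
    head[0].next  = &head[1];   /* raw pointer, shared as-is with the GPU */
    head[1].value = 2.0f;
    head[1].next  = NULL;

    /* Hand the SVM pointer straight to the kernel -- no clCreateBuffer,
     * no clEnqueueWriteBuffer, no marshalling of the list. */
    clSetKernelArgSVMPointer(walk_list, 0, head);

    size_t global_size = 1;
    clEnqueueNDRangeKernel(queue, walk_list, 1, NULL, &global_size,
                           NULL, 0, NULL, NULL);
    clFinish(queue);

    clSVMFree(ctx, head);
}
```

The embedded next pointer is the whole point: with classic cl_mem buffers the list would have to be flattened and copied across, whereas SVM lets the CPU and GPU chase the same pointers.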

Digging deeper, however, quickly reveals that Intel hasn’t left their GPU architecture entirely alone. Broadwell-Y, like Haswell-Y before it, implements a single-slice configuration of Intel’s GPU architecture. However the composition of a slice is changing for Broadwell, and this will have a significant impact on the balance between the various execution units.

Low Level Architecture Comparison

                                  AMD GCN               NVIDIA Maxwell        Intel Gen7.5 Graphics     Intel Gen8 Graphics
Building Block                    GCN Compute Unit      Maxwell SMM           Sub-Slice                 Sub-Slice
Shader Building Block             16-wide Vector SIMD   32-wide Vector SIMD   2 x 4-wide Vector SIMD    2 x 4-wide Vector SIMD
Smallest Implementation           4 SIMDs               4 SIMDs               10 EUs (20 SIMDs)         8 EUs (16 SIMDs)
Smallest Implementation (ALUs)    64                    128                   80                        64
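
The ALU counts in the table’s bottom row follow directly from the rows above it; a quick sketch of that arithmetic (labels match the table):

```c
/* How the "Smallest Implementation (ALUs)" row is derived:
 * ALUs = vector SIMDs in the smallest block x lanes per SIMD. */
#include <stdio.h>

typedef struct {
    const char *name;
    int simds;      /* vector SIMDs in the smallest block */
    int lanes;      /* lanes (ALUs) per SIMD */
} Block;

int main(void)
{
    Block blocks[] = {
        { "AMD GCN Compute Unit",                       4, 16 },  /* 64 ALUs  */
        { "NVIDIA Maxwell SMM",                         4, 32 },  /* 128 ALUs */
        { "Intel Gen7.5 sub-slice (10 EUs x 2 SIMDs)", 20,  4 },  /* 80 ALUs  */
        { "Intel Gen8 sub-slice (8 EUs x 2 SIMDs)",    16,  4 },  /* 64 ALUs  */
    };
    for (int i = 0; i < 4; i++)
        printf("%-45s %3d ALUs\n", blocks[i].name,
               blocks[i].simds * blocks[i].lanes);
    return 0;
}
```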

In Haswell-Y Intel used a GT2 configuration, which was composed of a single slice that in turn contained 2 sub-slices. In Intel’s GPU architecture the sub-slice is the smallest functional building block of the GPU, containing the EUs (shaders) along with caches and texture/data/media samplers. Each EU was in turn composed of two 4-wide vector SIMDs, with 10 EUs per sub-slice.

For Broadwell Intel is not changing the fundamental GPU architecture, but they are rebalancing the number of EUs per sub-slice and increasing the number of sub-slices overall. As compared to Haswell, Broadwell’s sub-slices will contain 8 EUs per sub-slice, with a complete slice now containing 3 sub-slices. Taken altogether this means that whereas Haswell-Y was a 2x10EU GPU, Broadwell-Y will be a 3x8EU GPU.

The ramifications of this are that not only does the total number of EUs increase by 20% from 20 to 24, but Intel has also greatly increased the ratio of L1 cache and samplers to EUs. There is now 25% more sampling throughput per EU, with a total increase in sampler throughput (at identical clockspeeds) of 50%. By PC GPU standards an increase in the ratio of samplers to EUs is very rare, with most designs decreasing that ratio over the years. The fact that Intel is increasing it is a strong sign that Haswell’s balance may have been suboptimal for modern workloads, lacking enough sampler throughput to keep up with its shaders.
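 
Those percentages fall straight out of the two slice configurations; a short sketch of the math, assuming one sampler per sub-slice as described above:

```c
/* Rebalancing math for Haswell-Y (GT2, 2x10 EU) vs Broadwell-Y (3x8 EU).
 * Assumes one texture sampler per sub-slice, per the description above. */
#include <stdio.h>

int main(void)
{
    int hsw_subslices = 2, hsw_eus_per = 10;
    int bdw_subslices = 3, bdw_eus_per = 8;

    int hsw_eus = hsw_subslices * hsw_eus_per;   /* 20 EUs */
    int bdw_eus = bdw_subslices * bdw_eus_per;   /* 24 EUs */

    printf("EU increase:              %+.0f%%\n",
           100.0 * (bdw_eus - hsw_eus) / hsw_eus);                   /* +20% */
    printf("Sampler increase:         %+.0f%%\n",
           100.0 * (bdw_subslices - hsw_subslices) / hsw_subslices); /* +50% */
    printf("Sampling per EU increase: %+.0f%%\n",
           100.0 * ((double)bdw_subslices / bdw_eus -
                    (double)hsw_subslices / hsw_eus) /
                   ((double)hsw_subslices / hsw_eus));               /* +25% */
    return 0;
}
```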

Moving on, along with the sub-slices, the front end and common slice are also receiving improvements of their own. The common slice – responsible for housing the ROPs, the rasterizer, and a port to the L3 cache – is receiving some microarchitectural improvements to further increase pixel and Z fill rates. Meanwhile the front end’s geometry units are being beefed up to increase geometry throughput.

Much like overall CPU performance, Intel isn’t talking about overall GPU performance at this time. Between the 20% increase in shading resources and 50% increase in sampling resources Broadwell’s GPU should deliver some strong performance gains, though it seems unlikely that it will be on the order of a full generational gain (e.g. catching up to Haswell GT3). What Intel is doing however is reiterating the benefits of their 14nm process in this case, noting that because 14nm significantly reduces GPU power consumption it will allow for more thermal headroom, which should further improve both burst and sustained GPU performance in TDP-limited scenarios relative to Haswell.

14nm isn’t the only technique Intel has to optimize power consumption on Broadwell’s GPU, which brings us to Broadwell’s final GPU technology improvement: Duty Cycle Control. While Intel has been able to clamp down on GPU idle power consumption over the years, they are increasingly fighting the laws of physics in extracting more idle power gains. At this point Intel can significantly scale down the frequency and operating voltage of their GPU, but past a point this offers diminishing returns. Transistors require a minimum voltage to operate – the threshold voltage – which means that after a certain point Intel can no longer scale down their voltage (and hence idle power consumption) further.

Intel’s solution to this problem is both a bit brute force and a bit genius, and is definitely unlike anything else we’ve seen on PC GPUs thus far. Since Intel can’t reduce their idle voltage any further, they are going to start outright turning the GPU off instead – a process known as duty cycling. By putting the GPU on a duty cycle Intel can run the GPU for just a fraction of the time – down to 12.5% of the time – which sidesteps the threshold voltage issue entirely.
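
As a back-of-the-envelope illustration of why this works, consider a GPU whose voltage scaling has bottomed out at some floor power; the figures below are invented placeholders, not Intel’s numbers:

```c
/* Illustrative model of GPU Duty Cycle Control. Once voltage has hit
 * the threshold-voltage floor, voltage scaling no longer saves power,
 * but gating the GPU off for part of each cycle scales average power
 * roughly linearly with on-time. All figures are invented placeholders. */
#include <stdio.h>

int main(void)
{
    const double floor_power_mw = 200.0;  /* hypothetical power at Vmin */
    const double duty_cycles[]  = { 1.0, 0.5, 0.25, 0.125 };

    for (int i = 0; i < 4; i++) {
        /* Average power ~= on-time fraction x floor power,
         * ignoring the overhead of entering/exiting the off state. */
        double avg_mw = duty_cycles[i] * floor_power_mw;
        printf("duty cycle %5.1f%% -> ~%5.1f mW average\n",
               duty_cycles[i] * 100.0, avg_mw);
    }
    return 0;
}
```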

This duty cycling is transparent to applications and the end user, with the display controller decoupled from the GPU clock domain and always staying online so that attached displays are always being fed regardless of what the GPU itself is doing. Control of the duty cycle is then handled through a combination of the GPU hardware and Intel’s graphics drivers, so both components will play a part in establishing the cycle.

Because today’s preview is Broadwell-Y centric, it’s unclear whether GPU duty cycle control is just a Broadwell-Y feature or whether it will be enabled in additional Broadwell products. Like many of Intel’s announced optimizations for Broadwell, duty cycle control is especially important for the TDP and battery life constrained Y SKU, but ultimately all mobile SKUs would stand to benefit from this feature. So it will be interesting to see just how widely it is enabled.

Moving on, last but not least in our GPU discussion, Intel is also upgrading their GPU’s media capabilities for Broadwell. The aforementioned increase in sub-slices and the resulting increase in samplers will have a direct impact on the GPU’s video processing capabilities – the Video Quality Engine and QuickSync – further increasing the throughput of each of them, up to 2x in the case of the video engine. Intel is also promising quality improvements in QuickSync, though they haven’t specified whether this is from technical improvements to the encoder or having more GPU resources to work with.

Broadwell’s video decode capabilities will also be increasing compared to Haswell. On top of Intel’s existing codec support, Broadwell will implement a hybrid H.265 decoder, allowing Broadwell to decode the next-generation video codec in hardware, though not with the same degree of power efficiency as H.264 today. In this hybrid setup Intel will utilize portions of their fixed function video decoder while executing the remaining decoding steps on their shaders in order to offer complete H.265 decoding. Using the shaders for part of the decoding process is less power efficient than doing everything in fixed function hardware, but it’s still better than falling back to the even less efficient CPU.
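
Intel hasn’t disclosed which decode stages stay in fixed function hardware and which move to the EUs, so the stage split in the following sketch is purely illustrative of the hybrid idea:

```c
/* Purely illustrative sketch of a hybrid H.265 decode pipeline.
 * Intel has not disclosed the actual stage split; this only shows
 * the concept of routing some stages to fixed function hardware
 * and others to the shader (EU) array. */
#include <stdio.h>

enum Engine { FIXED_FUNCTION, SHADER_ARRAY };

struct Stage {
    const char *name;
    enum Engine engine;   /* assumed mapping, not Intel's */
};

int main(void)
{
    struct Stage pipeline[] = {
        { "Entropy decode (CABAC)",   FIXED_FUNCTION },
        { "Inverse transform",        SHADER_ARRAY   },
        { "Motion compensation",      SHADER_ARRAY   },
        { "Deblocking / SAO filters", FIXED_FUNCTION },
    };
    for (int i = 0; i < 4; i++)
        printf("%-26s -> %s\n", pipeline[i].name,
               pipeline[i].engine == FIXED_FUNCTION
                   ? "fixed function" : "EU shaders");
    return 0;
}
```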

The use of a hybrid approach is essentially a stop-gap solution to the problem – the lead time on the finalization of H.265 left little time to develop a fixed function decoder for anyone with a long product cycle like Intel – and we expect that future generation products will have a full fixed function decoder. In the meantime Intel will be in the company of other GPU manufacturers such as NVIDIA, who is using a similar hybrid approach for H.265 on their Maxwell architecture.

Finally, Broadwell’s display controller will be receiving an update of its own. Broadwell is too soon for HDMI 2.0 or DisplayPort 1.3 – it will support HDMI 1.4 and DP 1.2/eDP 1.3a respectively – but the Y SKU in particular is getting native support for 4K. This is admittedly something of a backport since Haswell already supports 4K displays, but in Haswell’s case that feature was not available on Haswell-Y, so this is the first time native 4K support has come to a Y-series SKU. This means that Broadwell-Y will be able to drive 4K displays, whether that means a 4K display in the device itself or a 4K display hooked up externally (with an overall limit of 2 displays on Broadwell-Y). Don’t expect Broadwell-Y to have the performance necessary to do intensive rendering at this resolution, but for desktop work and video playback this should be enough.

Comments

  • mapesdhs - Monday, August 11, 2014


    Yeah, sure, and that's exactly what everyone was saying back when we were waiting for
    the follow-on to Phenom II; just wait, their next chip will be great! Intel killer! Hmph. I recall
    even many diehard AMD fans were pretty angry when BD finally came out.

    Benchmarks show again and again that AMD's CPUs hold back performance in numerous
    scenarios. I'd rather get a used 2700K than an 8350; leaves the latter in the dust for all CPU
    tasks and far better for gaming.

    Btw, you've answered your own point: if an 8350 is overkill for a game, giving 120fps, then
    surely one would be better off with an i3, G3258 or somesuch, more than enough for most
    gaming if the game is such that one's GPU setup & screen res, etc. is giving that sort of
    frame rate, in which case power consumption is less, etc.

    I really hope AMD can get back in the game, but I don't see it happening any time soon.
    They don't have the design talent or the resources to come up with something genuinely
    new and better.

    Ian.
  • wurizen - Monday, August 11, 2014

    an fx-8350 isn't holding anything back. come on, man. are you like a stat paper queen obsessor or something? oh, please. an fx-8350 and an amd r9 290 gpu will give you "happy" frame rates. i say happy because i know the frame rates will be high enough. more than good enough even. will it be lower than an i7-4770k and an r9 290? maybe. maybe the fx-8350 will avg 85 fps on so and so game while the i7-4770k will avg 90 fps. boohoo. who cares about 5 more frames.

    also, while you mention i3 as a sufficient viable alternative to an fx-8350. remember that the cost will probably be about the same. and fx-8350 is like $190. maybe the i3 is 20 dollars less. but, here's the big but, an i3 is not as good as an fx-8350 in video editing stuff and photo editing stuff if one would like to use their pc for more than just games. an fx-8350, while not as power efficient as an i3 (but who cares since we are talking about a desktop) literally has more bang for the buck. it has more cores and is faster.

    amd will get back in the game. it is just a question of when. an fx-8350 is already toe-to-toe with an i7-2600k, which is no slouch in todays standard. so, amd just needs to refine their cpu's.

    as for talent? amd came up with x64, or amd64 before intel. intel developed their own x86-64 later.

    the resource that intel has over amd is just die shrinking. that's it. architecturally, an fx chip or the phenom chip before it seems like a more elegant design to me than intel chips. but that's subjective. and i don't really know that much about cpu's. but, i have been around since the days of 286 so maybe i just see intel as those guys who made 286 which were ubiquitous and plain. i also remember cyrix. and i remember g4 chips. and to me, the fx chip is like a great chip. it's full of compromises and promises at the same time.
  • Drumsticks - Monday, August 11, 2014

    I think AMD might have a way back into the game, but the difference right now is way worse than you say.

    http://www.anandtech.com/bench/product/697?vs=287

    FX-8350 trails the 2600k frequently by 10-20% or more (in gaming).

    http://www.anandtech.com/bench/product/697?vs=288

    i5-2500k beats it just as badly and actually sells for less than the 8350 used on ebay. Games love single threaded power and the 8350 just doesn't have it.
  • wurizen - Monday, August 11, 2014

    the games they have in that comparison are starcraft 2 and dragon age. 47 fps at 768 p for 8350 looks suspect on starcraft 2. what gpu did they use?

    it's not way worse as i say. omg.

    i have an i7-3770k oc'd to 4.1Ghz and a stock FX-8320 at stock. both can run cod: ghost and bf3. haven't tested my other games. might do the starcraft 2 test tomorrow. i don't have the numbers nor care. what ppl need to realize is the actual game experience while playing games and not the number. is the game smooth? a cpu that can't handle a game will be very evident. this means it's time to upgrade. and there are no fx cpu's from amd that can't handle modern games. again, they will trail intel, but that is like a car going at 220mph so that car wins but the other car is going at 190mph and it will lose but realistically and the experience of going at 190mph will still be fast. the good thing is that amd or cpu don't race each other unless you care about benchmarks. but, if you look past the benchmarks and just focus on the experience itself, an fx series cpu by amd is plenty fast enuff.

    omg.
  • silverblue - Tuesday, August 12, 2014

    We're well within the realms of diminishing returns as regards standard CPU IPC. AMD has the most to gain here, though with HSA, will they bother?
  • kaix2 - Tuesday, August 12, 2014

    so your response to people who are disappointed that broadwell is focused more on TDP instead of performance is to buy an AMD cpu with even lower performance?
  • wurizen - Tuesday, August 12, 2014

    well, they don't have to get the fx-9590, which has the server-like tdp of a 2008 cpu, or a gpu-like tdp of 220 watts. there is a more modest tdp of 125w with the fx-8350. all overclockable. seems like a good cpu for tinkerers, pc enthusiasts, gamers and video editors. i don't even think it's a budget cpu. there are the 6-core and 4-core variants which are cheaper. i am also not saying that an fx-8350 is like the best cpu since it's not and falls way down in the benchmark charts. but, it's not a bad cpu at all. it gets the work done (video editing) and lets you play games (it's a modern cpu after all) even though it's sort of 2 yrs old already. the 990FX chipset is even an older chipset. there's something to be said about that and i think im trying to say it. in light of all the news about intel, which we are guaranteed to get every year with each tick and tock... there is that little AMD sitting in the corner with a chipset that hasn't been updated for yrs and an 8-core cpu that's remarkably affordable. the performance is not that low at all. i mean, video editing with it or playing games with it doesn't hamper one's experience. so, maybe one will have to wait a couple more minutes for a video to render in a video editing program versus say an i7-4790k. but, one can simply get up from one's chair and return. instead of staring at how fast their cpu renders a video on the screen.

    know what i'm saying?

    so, yeah. an fx-8350 with an old 990fx mobo and now intel's upcoming broadwell cpu's with z97 chipsets and all the bells and whistles – productivity for either one will probably be similar. also, most video editing programs now will also leverage the gpu, so an old fx-8350 w/ a compatible gpu will have help rendering those videos....

    i guess it's like new doesn't mean anything now. or something. like m.2 sata and pcie 3.0, which intel chipsets have over amd, is kinda superfluous and doesn't really help or do much.

    know what im saying?
  • rkrb79 - Tuesday, November 18, 2014

    Agreed!!
  • name99 - Tuesday, August 12, 2014

    Oh yes, Skylake.
    Intel has given 5% IPC improvements for every generation since Nehalem, but now Skylake is going to change everything?
    If you're one of the ten people on the planet who can actually get value out of AVX-512 then, sure, great leap forward. For everyone else, if you were pissed off at IB, HSW, BDW, you're going to be just as pissed off with Skylake.
  • DanNeely - Tuesday, August 12, 2014

    No, the interest in Skylake is for all the non-CPU speed things promised with it. PCIe 4.0 and a bump from 16 to 20 CPU lanes (for PCIe storage) are at the top of the list. Other expected, but AFAIK not confirmed, benefits include USB3.1 and more USB3.x on the chipset than the current generation. We should have consumer DDR4 with Skylake too; but that's not expected to be a big bump in the real world.
