The Intel Xe-LP GPU Architecture Deep Dive: Building Up The Next Generation
by Ryan Smith on August 13, 2020 9:00 AM EST- Posted in
- GPUs
- Intel
- Tiger Lake
- Xe
- Xe-LP
- DG1
- Intel Arch Day 2020
- SG1
Xe-LP Media & Display Controllers
Our final stop on our deep dive through the Xe-LP architecture are the non-rendering aspects of the GPU: its media and display controllers. Changes here aren’t quite as eye-catching as changes to the core architecture, but the improvements made in these blocks help to keep the overall GPU current by supporting new media formats as well as new display connectivity protocols.
First off, let’s talk about the media engine. There are no crazy overhauls to speak of here, but for Xe-LP Intel has made some sensible additions to the engine. The marquee feature here is easily support for AV1 decode acceleration, making Intel the first vendor out of the Big 3 to add hardware decode support for the new codec.
The up-and-coming royalty free codec is widely expected to become the de facto successor to H.264/AVC, as while HEVC has been on the market for a number of years (and is already supported in all recent GPUs), the madcap royalty situation around the codec has discouraged its adoption. By contrast, AV1 should deliver similar or slightly better quality than HEVC without royalties for its use in distribution, which makes it a lot more palatable to content vendors. The one downside to AV1 thus far is that it’s pretty CPU heavy, which makes hardware decode support all the more important not only for battery life reasons, but even ensuring smooth, glitch-free playback.
Meanwhile, similar to Intel’s rendering performance goals, the company has worked to improve the encoding and decoding throughput of the media engine. According to the company the updated block is now capable of up to 2x the encode and decode throughput. For consumer machines this is unlikely to matter too much, but it’s something that will be particularly important for the SG1 server product, which will focus on bulk encoding.
Finally, among smaller changes made to the media engine, Intel has added official support for HDR and Dolby Vision playback. HDR support continues to roll out to PCs, if only slowly, so this is a notable step in ensuring that newer PCs can handle HDR content encoded in those formats. Also notable is an improvement to Intel’s HEVC encoding block, which now supports the HEVC Screen Content Coding (SCC) extension, which is designed to improve HEVC compression rates on content with static or nearly-static images.
Xe-LP Display Controller: DisplayPort 1.4, HDMI 2.0, & 8K Displays
Last but certainly not least, we have Xe-LP’s display engine. Like the media block there are no radical changes here, but there are some welcome improvements throughout.
Perhaps the biggest change here is that after several years, Intel has finally added a fourth display pipeline, meaning that the GPU can now drive four independent displays. Prior to this, Gen11 and earlier designs could only handle three displays, and though even that is more than most people will use, 4 displays has become the magic number for other GPU designs. Complicating matters is the recent push for dual screen laptops and other mobile devices with multiple displays, which would then eat up two of those three outputs.
On which note, Intel has also added a second embedded DisplayPort output, which would be extremely useful for those dual screen devices.
Otherwise, the basic display output choices are unchanged from Gen11. Xe-LP supports DisplayPort 1.4 as well as HDMI 2.0. The latter is a bit disappointing since HDMI 2.1 televisions are now shipping, but it’s admittedly not unusual for Intel to take an extra generation to adopt newer HDMI standards. These display outputs can also be fed into a USB4/Thunderbolt 4 port, where DisplayPort data is a first-class citizen and can be muxed in the signal, or the port reconfigured via alt modes.
Not evident on Intel’s block diagrams, the company has made some plumbing changes to better feed the display controllers. Specifically, the company has increased the bandwidth available to the display engine so that it can handle the kind of extreme, high-resolution displays that DisplayPort 1.4 was designed to feed. As a result the controller now has enough bandwidth and internal processing power to drive 8K UHD displays, as well as the recent generation of 360Hz displays.
33 Comments
View All Comments
mode_13h - Thursday, August 13, 2020 - link
I can't speak to Direct 3D, but OpenGL talks about work group invocations. I don't believe "threads" is mentioned anywhere in the API.Dolda2000 - Thursday, August 13, 2020 - link
Admittedly I haven't read the whole article yet, but it strikes me how the presentations seems to be comparing the new GPU to the previous GPU, rather than presenting it as a new architecture. Does this confirm that using the "Xe" moniker for this product is just marketing, and that it in fact is an evolution of previous Gen architectures?I mean, I don't mind if that's the case, I just wish they wouldn't overmarket it.
Ryan Smith - Thursday, August 13, 2020 - link
" is an evolution of previous Gen architectures?"It is an evolution of the previous Gen architectures. A major evolution, but an evolution none the less. Not even Intel is going to do a clean sheet design when they have bits and pieces that already work fine.
Dolda2000 - Thursday, August 13, 2020 - link
Certainly, they're not going to create a new clean-slate ALU design just for the sake of it, but it has always been my impression that Xe (at least Xe-HPC) was going to be a more-or-less new architecture. Maybe that has just been my misunderstanding the whole, and Xe-HPC too is going to be fundamentally Gen-based (though I seem to recall that being explicitly denied at some point), but what I was getting at here was that Xe-HPC is going to be the new architecture, and meanwhile this is "merely" an evolution of Gen for which they're just borrowing the product name of their higher-end offering to make it seem like more than what it is.mode_13h - Thursday, August 13, 2020 - link
You should distinguish between the ISA and uArch of the shader cores (EUs) vs. the macro-architecture of the GPU (e.g. buses, memories, caches, fixed-function units, etc.).So, you can have a macro-architecture that's *very* different, even while the ISA is a small evolution and the uArch of the EUs is somewhere in between.
tipoo - Thursday, August 13, 2020 - link
RDNA 1 still has significant GCN bits in it, I'm sure Nvidia does the same a few generations in a row, there's no necessary contention between it being an evolution and it being marked as something substantially new.abufrejoval - Thursday, August 13, 2020 - link
IMHO the overhead of multi GPU rendering with an iGPU and dGPU can't really be offset by the small contribution the iGPU is likely to make to a beefy dGPU.More likely will be dGPU via Thunderbolt 4 and very seamless transitions on docking/undocking and that's good enough.
Too bad that won't work nearly as well with Ryzen notebooks so there again consumer choice goes down the drain somewhat. Not that I believe TB dGPU is a really an attractive market unless prices change dramatically.
mode_13h - Thursday, August 13, 2020 - link
Agreed. I think it would work much better to task the iGPU with other compute tasks that involve less communication bandwidth with the dGPU. Things like physics, AI, audio processing, etc.brucethemoose - Thursday, August 13, 2020 - link
Maybe post processing? Like an Intel version lf ReShade? IIRC the frames have to come back to the IGPU's display block anyway.tipoo - Thursday, August 13, 2020 - link
In this case the IGP would be nearly equivalent to DG1