Graphics Tile: A Generational Leap Through Arc, Xe-LPG Graphics

As part of their disaggregated architecture for Meteor Lake, Intel has opted to use a separate tile for graphics. Intel has gone down an interesting route for its disaggregated graphics, with the most notable inclusion through an upgrade to Intel's Arc Graphics architecture. Powering Intel's integrated graphics for Meteor Lake is a new graphics architecture which Intel calls Xe-LPG (and no, we're not talking about fuel here). Based on Intel's current discrete graphics architecture known as Xe-HPG (used in their Arc GPUs), Intel claims 2x performance per watt compared to the Xe-LP architecture-based Iris Xe integrated graphics within Intel's 12th Gen Core series.

There are a number of different elements within the graphics and media area of Meteor Lake, the bulk of which is built into the graphics tile, where the Xe-LPG graphics architecture is located. Unlike the compute tile (Intel 4) and the SoC tile, which is manufactured on TSMC N6 (6 nm), the graphics tile is made on TSMC's N5 node (5 nm), the same generational family as the nodes used by AMD and NVIDIA's discrete and integrated GPUs.

With Meteor Lake and the graphics tile with the Xe-LPG graphics processor, Intel is promising discrete-level performance in an integrated form factor. Looking at the finer specifications, Intel includes 8 x Xe graphics cores with 128 vector engines (12 per Xe core) and 8 samplers, representing a 1.33 x increase over Intel's previous Xe LP graphics. There are also 4 Pixel backends, which is an improvement over the 3 PBs on Xe LP. Intel also doubles the number of geometry pipelines within Xe-LPG, with two, and they also introduce 8 dedicated Ray Tracing Units (RTU), which is new for Intel's integrated graphics line-up.

Looking at the makeup of Intel's Xe core, as previously mentioned, there are 16 Vector Engines that have a bus width of 256-bit, while each core also has 192 KB of shared L1 cache. Each Vector Engine enables 16 FP32 ops per clock, and 32 FP16 ops per clock, with a shared FP64 execution port with 64 INT8 ops per clock. One dedicated FP64 ops per clock unit is new over what's previously been seen in Raptor Lake and shares the overall design philosophy of Meteor Lake on power efficiency; pairs of Vector Engines can run in lockstep for better efficiency.

As part of Intel's goal of advancing the overall experience with Xe-LPG for users, the graphics are DX12 optimized, and Intel now brings Out of Order Samplng (OoOS) to Xe-LPG. It's worth noting that when talking about Execution Units (EUs), Intel's new and current term for this is Xe Vector Engines, or XVE for short. Intel hasn't provided us with how OoOS works within Xe-LPG, but we've reached out for more details.

Comparing Intel Xe Integrated Graphics (Mobile)
  Meteor Lake
(Xe-LPG)
Raptor Lake 
(Xe-LP)
Alder Lake GT1
(Xe-LP)
Tiger Lake GT2
(Xe-LP)
Process Node TSMC N5 Intel 7 Intel 7 Intel 7
Vector Engines/EUs 128 96 96 96
ALUs/Shaders 1024 768 768 768
TMUs ?* 48 16 48
ROPs ?* 24 8 24
Ray Tracing Units 8 - - -
TDP ? 15 W 15 W 15 W

*Intel hasn't given us a deep dive into the finer specifications of Xe-LPG integrated graphics. Looking at an existing integrated Intel Arc equivalent with similar specs, the Meteor Lake Xe LPG could have 64 TMUs and 32 ROPs per the Arc A370M, which also has 1024 ALUs.

Comparing Intel's integrated Xe graphics from previous mobile architectures, Meteor Lake, through the Xe-LPG Arc based graphics, has 128 XVEs, which is an increase of 1.33 X or 32 XVE/EUs, than the previous Xe-LP generation. Regarding arithmetic logic units (ALUs), which are essentially shader cores, Xe-LPG has been increased to 1024, which is 128 ALUs per Xe-LPG core. As previously mentioned, Intel hasn't given us more about the finer specifications, including TMUs or ROPs, but does bring 8 Ray Tracing units, which is new for Xe-LPG when compared directly to Xe-LP.

Meanwhile, with Intel's Foveros 3D packaging technology, disaggregating the Media Engine and Display Engine from the graphics tile means when doing encoding or decoding, as well as video playback, it doesn't require the graphics tile to be powered up to do workloads on more power consuming cores.

Intel Xe-LPG is the next step up from Xe LP, and one area where performance and efficiency gains are made is through a lower voltage frequency (V/F) curve, allowing the graphics to run at a lower minimum voltage with a higher maximum core clock speed. Intel has also optimized the pipelines for faster frequencies and is claiming up to 2 x performance at iso-voltages, which for a mobile platform such as Meteor Lake, adds more potential with a key focus on achieving a figure of up to 20% in power savings compared to the previous generation.

I/O Tile: Extended and Scalable Depending on Segment Intel Meteor Lake: Changing The Strategy, Laying the Foundation for Intel 3
Comments Locked

107 Comments

View All Comments

  • GeoffreyA - Saturday, September 23, 2023 - link

    I agree with most of what you're saying. What I was trying to get at is that there seems to be a belief that Apple has superior engineering ingenuity than Intel and AMD, when really, it is the difference between fixed- and variable-length instruction sets and all that entails. What I'd like to see is all of them on the same playing field and where each then stands, from a CPU point of view. Quite likely, there won't be much of a difference because good design principles are always the same. It's trying to be out of the ordinary that leads to Pentium 4s and Bulldozers.
  • GeoffreyA - Saturday, September 23, 2023 - link

    And yes, I'd like to see RISC-V winning in the end, rather than ARM.
  • GeoffreyA - Saturday, September 23, 2023 - link

    The thing is, ARM is almost fully ready on the Windows side of the coin. Windows on ARM appears to be working well, x64 emulation is up and running, increasingly more programs are getting ARM compiles, and Microsoft's VS and compilers now have ARM on an equal footing with x64. So, if Intel or AMD decided to make an ARM CPU, people could go over quite easily, similar to the early days of x64.
  • FWhitTrampoline - Thursday, September 21, 2023 - link

    Edit: royalist/encumberments to royalty/encumberments!

    And Firefox's Spell Checker is so bad that The Mozilla Foundation should be stripped of their Tax Exempt status until they fully comply and fix that.
  • Bluetooth - Saturday, September 23, 2023 - link

    Intel has proposed X86-S ISA, to get rid of all the legacy code and boot directly into 64 bit, (the proposal is available on their website). But I don't know, if this is enough to allow them to build wider decoders to improve the single thread performance.
  • GeoffreyA - Saturday, September 23, 2023 - link

    I took a look at x86-S and it certainly would be welcome, getting rid of unnecessary legacy features. From my understanding, I don't think it would help to build wider decoders. The problem in x86 is that the length of each instruction varies and is not known beforehand. At execution time, length has got to be worked out in predecode, and I imagine this constrains how much can be sent through the decoders, as well as taking up a great deal of power. In the fixed-width ISA, it is trivial to know where each instruction starts and send them off to the decoders in mass. A bit like comparing a linked list with an array.
  • FWhitTrampoline - Tuesday, September 19, 2023 - link

    up to clocked 2Ghz+ should read: Clocked up to.
  • Bluetooth - Saturday, September 23, 2023 - link

    He may overstate the power, but don't diss his remark by only focusing on that error, as the mobile processor is running at much lower frequencies.
  • tipoo - Tuesday, September 19, 2023 - link

    It sounds like you carried forward 3W from 2008. The A17 Pro draws more power than ever.

    https://www.youtube.com/watch?v=TX_RQpMUNx0
  • StevoLincolnite - Tuesday, September 19, 2023 - link

    He is nothing but a liar.

Log in

Don't have an account? Sign up now