Graphics Tile: A Generational Leap Through Arc, Xe-LPG Graphics

As part of their disaggregated architecture for Meteor Lake, Intel has opted to use a separate tile for graphics. Intel has gone down an interesting route for its disaggregated graphics, with the most notable inclusion through an upgrade to Intel's Arc Graphics architecture. Powering Intel's integrated graphics for Meteor Lake is a new graphics architecture which Intel calls Xe-LPG (and no, we're not talking about fuel here). Based on Intel's current discrete graphics architecture known as Xe-HPG (used in their Arc GPUs), Intel claims 2x performance per watt compared to the Xe-LP architecture-based Iris Xe integrated graphics within Intel's 12th Gen Core series.

There are a number of different elements within the graphics and media area of Meteor Lake, the bulk of which is built into the graphics tile, where the Xe-LPG graphics architecture is located. Unlike the compute tile (Intel 4) and the SoC tile, which is manufactured on TSMC N6 (6 nm), the graphics tile is made on TSMC's N5 node (5 nm), the same generational family as the nodes used by AMD and NVIDIA's discrete and integrated GPUs.

With Meteor Lake and the graphics tile with the Xe-LPG graphics processor, Intel is promising discrete-level performance in an integrated form factor. Looking at the finer specifications, Intel includes 8 x Xe graphics cores with 128 vector engines (12 per Xe core) and 8 samplers, representing a 1.33 x increase over Intel's previous Xe LP graphics. There are also 4 Pixel backends, which is an improvement over the 3 PBs on Xe LP. Intel also doubles the number of geometry pipelines within Xe-LPG, with two, and they also introduce 8 dedicated Ray Tracing Units (RTU), which is new for Intel's integrated graphics line-up.

Looking at the makeup of Intel's Xe core, as previously mentioned, there are 16 Vector Engines that have a bus width of 256-bit, while each core also has 192 KB of shared L1 cache. Each Vector Engine enables 16 FP32 ops per clock, and 32 FP16 ops per clock, with a shared FP64 execution port with 64 INT8 ops per clock. One dedicated FP64 ops per clock unit is new over what's previously been seen in Raptor Lake and shares the overall design philosophy of Meteor Lake on power efficiency; pairs of Vector Engines can run in lockstep for better efficiency.

As part of Intel's goal of advancing the overall experience with Xe-LPG for users, the graphics are DX12 optimized, and Intel now brings Out of Order Samplng (OoOS) to Xe-LPG. It's worth noting that when talking about Execution Units (EUs), Intel's new and current term for this is Xe Vector Engines, or XVE for short. Intel hasn't provided us with how OoOS works within Xe-LPG, but we've reached out for more details.

Comparing Intel Xe Integrated Graphics (Mobile)
  Meteor Lake
(Xe-LPG)
Raptor Lake 
(Xe-LP)
Alder Lake GT1
(Xe-LP)
Tiger Lake GT2
(Xe-LP)
Process Node TSMC N5 Intel 7 Intel 7 Intel 7
Vector Engines/EUs 128 96 96 96
ALUs/Shaders 1024 768 768 768
TMUs ?* 48 16 48
ROPs ?* 24 8 24
Ray Tracing Units 8 - - -
TDP ? 15 W 15 W 15 W

*Intel hasn't given us a deep dive into the finer specifications of Xe-LPG integrated graphics. Looking at an existing integrated Intel Arc equivalent with similar specs, the Meteor Lake Xe LPG could have 64 TMUs and 32 ROPs per the Arc A370M, which also has 1024 ALUs.

Comparing Intel's integrated Xe graphics from previous mobile architectures, Meteor Lake, through the Xe-LPG Arc based graphics, has 128 XVEs, which is an increase of 1.33 X or 32 XVE/EUs, than the previous Xe-LP generation. Regarding arithmetic logic units (ALUs), which are essentially shader cores, Xe-LPG has been increased to 1024, which is 128 ALUs per Xe-LPG core. As previously mentioned, Intel hasn't given us more about the finer specifications, including TMUs or ROPs, but does bring 8 Ray Tracing units, which is new for Xe-LPG when compared directly to Xe-LP.

Meanwhile, with Intel's Foveros 3D packaging technology, disaggregating the Media Engine and Display Engine from the graphics tile means when doing encoding or decoding, as well as video playback, it doesn't require the graphics tile to be powered up to do workloads on more power consuming cores.

Intel Xe-LPG is the next step up from Xe LP, and one area where performance and efficiency gains are made is through a lower voltage frequency (V/F) curve, allowing the graphics to run at a lower minimum voltage with a higher maximum core clock speed. Intel has also optimized the pipelines for faster frequencies and is claiming up to 2 x performance at iso-voltages, which for a mobile platform such as Meteor Lake, adds more potential with a key focus on achieving a figure of up to 20% in power savings compared to the previous generation.

I/O Tile: Extended and Scalable Depending on Segment Intel Meteor Lake: Changing The Strategy, Laying the Foundation for Intel 3
Comments Locked

107 Comments

View All Comments

  • FWhitTrampoline - Wednesday, September 20, 2023 - link

    I'm more focused the on eGPU usage for OCuLink so I'm not stating that TB4/USB4 connectivity does not have its usage model for your use case. But pure PCIe is lowest latency for eGPU usage and can be easily adopted by more OEMs than just GPD for their handhelds as that OCuLink will work with any makers' GPUs as long as one is using an OCuLink capable eGPU adapter or enclosure.

    And ETA Prime has extensively tested OCuLink adapters with plenty of Mini Desktop PCs and even the Steam Deck(M.2 slot is only PCIe 3.0 capable). It's the 64Gbs on any PCIe 4.0/x4 connection(M.2/NVMe or other) that's what good for eGPUs via OCuLink relative to the current bandwidth of TB4/USB4 40Gbs.
  • Exotica - Wednesday, September 20, 2023 - link

    I’ve seen those videos and the performance advantages for EGPUs. But most of the EGPUs in the market use alpine ridge. A chipset known to reserve bandwidth for DP and have less available for PCIe (22 Gbps). Perhaps there may be one or two based on Titan ridge with slightly more pcie bandwidth. It’s hard to say how barlow ridge will perform in terms of the amount of pcie bandwidth made available to peripherals. But a 64 Gbps pcie connection will not saturate the 80 Gbps link so hopefully we can have most of the available 64 Gbps pcie bandwidth. Another problem with occulink is that there’s no power delivery so you need to have a separate wire for power.

    So Barlow ridge TB5 has the potential to be a one cable solution, power upto 240W, pcie up to 64 Gbps, and it will also tunnel DisplayPort. Occulink is cool. But thunderbolt tunnels more capabilities over the wire.
  • FWhitTrampoline - Wednesday, September 20, 2023 - link

    OCuLink is lower latency as was stated in the earlier posts! And TB4/TB# or USB4/USB# will not be able to beat Pure PCIe connectivity for low latency and latency is the bigger factor for gaming workloads. TB tunneling protocol encapsulation of PCIe/Any other Protocol will add latency the result of having to do the extra encoding/encapsulation and decoding/de-encapsulation steps there and back whereas OCuLink is just unadulterated PCIe passed over an external cable.

    More Device makers need to be adding OCuLink capability to their systems as that's simple to do and requires no TB#/USB4-V# controller chip to be hung off of MB PCI lanes as the OCuLink port is just passing PCIe signals outside of the device. And TB5/USB4-V2 is more than 64Gbs but that will require more PCIe lanes be attached to the respective TB5/USB4-V2 controller and use more overheard to do that whereas if one has the same numbers of PCIe lanes connected via OCuLink then that's always going to be lower overhead with more available/usable bandwidth and lower latency for OCULink.

    Most likely the PCIe lane counts will remain at 4 lanes Max and that will just go from PCIe 4.0 to PCIe 5.0 instead to support TB5 and USB4-V2 bandwidth but whatever PCIe standard utilized OCuLink will always have lower overhead and lower latency than TB/Whatever or USB4/Whatever as with OCuLink that's skipping the extra tunneling protocol steps required.

    Plus by extension and with any OCuLink Ports being pure PCIe Protocol Based, that opens up the possibility of OCuLink to TB/USB/Whatever Adapters being utilized for maximum flexibility for other use cases as well.
  • Exotica - Wednesday, September 20, 2023 - link

    OCulink has merit for sure, but again, it is clunky. Unlike thunderbolt, it doesn't tunnel displayport or provide power delivery. It also doesn't support hotplugging. That is why it will most likely remain a niche offering. Also you're saying OcCulink is lower latency, but by how much? Where is the test data to prove that ?

    And does it really matter? Operating systems can be run directly off of thunderbolt NVME storage, the latency is low enough for a smooth experience. And even if OcCulink is technically faster, a GPU such as a 4080 or 4090 or 7900XTX in a PCIe4x4 or even PCIe5x4 eGPU thunderbolt 5 enclosure will be much faster than the iGPU or even internal graphics. And if the eGPU enclosure is thunderbolt enabled, it can power the laptop or host device and probably act as a dock and provide additional downstream thunderbolt ports and possibly USB as well. Thunderbolt provides flexibility that OcCulink does not. Both standards have merit.

    But I have a feeling Thunderbolt 5, if implemented properly in terms of bug-free firmware NVMs from Intel, will gain mass market appeal. The mass market is hungry for the additional bandwidth. AsMedia will probably do extremely well as well with its USB4 and upcoming USB4v2 offerings.
  • TheinsanegamerN - Thursday, September 21, 2023 - link

    Dont waste your time, Trampoline is an OCUlink shill who will ignore any criticism for his beloved zuckertech. The idea that most people dont want to disassemble a laptop to use a dock is totally alien to him.
  • FWhitTrampoline - Thursday, September 21, 2023 - link

    LOL, OCuLink's creator PCI-SIG is a not for profit Standards Organization that's responsible for the PCIe standards so it's not like they are any Business Interest with a Fiduciary responsibility to any investors.

    OCuLink is just a Port/electrical PCIe extension cabling standard that was in fact originally intended to be used in consumer products but Intel, a member of PCI-SIG along with other industry members, had a vested interest in that Intel/Apple co-developed Thunderbolt IP, because of TB controllers and sales of TB controllers related interests.

    And TB4/Later and USB4/Later will never have as low latency owing to the fact that any PCIe signalling will have to be intercepted and encapsulated by the TB/USB/Whatever protocol controller in order to be sent down the TB cabling whereas over the OCuLink ports/cabling that's just the PCIe signalling/packets there and no extra delays there related to any extra tunneling protocol encoding/encapsulation and decoding/de-encapsulating steps required.

    So OCuLink represents the maximum flexibility as that's the better lowest latency solution for eGPUs being just pure unadulterated PCIe signaling. And because it's just PCIe that opens up the possibility of all sorts of external adapters that take in PCIe and can convert that to Display Port/HDMI/USB/TB/Whatever the end users need because all Motherboard external I/O, for the most part, is in the from of PCIe and OCulink just brings that PCIe directly out of devices via Ports/External cables.

    And to be so dogmitacilly opposed to OCulink is the same as being opposed to PCIe! And does any rational person think that that's logical! OCuLink is External PCie and that's all there is to that and it's the lowest latency method to interface with GPUs via any PCIe Slot or externally via an OCuLink connection(PCIe is PCIe).

    Give me a Laptop with at least One OCuLink PCIe X4/4.0 port and with that I can interface to an eGPU at 64Gbs bandwidth/lowest latency possible! And there can and will be adapters that can be plugged into that One OCulink port that can do what any other ports on the laptop can do because those ports are all just connected to some MB PCIe lanes in the first place.
  • Kevin G - Wednesday, September 20, 2023 - link

    The main advantage of the TB4 is that the form factor is USB-C which can be configured for various other IO. This is highly desirable in a portable form factor like laptops or tablets. Performance is 'good enough' for external GPU usage. OCuLink maybe faster but doesn't have the flexibility like TB4 over the USB-C connector does. OCuLink has its niche but a mainstream consumer IO solution is not one of them.
  • FWhitTrampoline - Thursday, September 21, 2023 - link

    OCuLink is just externally routed PCIe lanes and really there can be one OCuLink port on every laptop specifically for the best and lowest latency eGPU interfacing and even OCuLink to HDMI/Display Port/whatever adapters that can make the OCuLink port into any other port at the end users discretion. So for eGPUs/Enclosures that have OCuLink ports that's 64Gbs/Lowest latency there and for any Legacy TB4/USB only external eGPU devices just get an OCuLink to TB4/USB4 adapter in the interim and live with the lower bandwidth and higher latency.

    GPD already has a line of Handheld Gaming devices that utilize a dedicated OCuLink port and a portable eGPU that supports both OCUlink interfacing and TB4/USB4 interfacing. And I do hope that GPD Branches out into the regular laptop market as GPD's external portable eGPU works with other makers products and even products that have M.2/NVMe capable slots available via an M.2/NVMe to OCuLink adapter! LOL, only Vested Interests would Object to OCuLink in the consumer market space, specifically those Vested Interests with Business Models that do not like any competition.
  • TheinsanegamerN - Thursday, September 21, 2023 - link

    Because most people dont want to disassemble their laptop to plug in a m.2 adapter, you knucklehead.
  • FWhitTrampoline - Thursday, September 21, 2023 - link

    No one is forcing you to do that and for others that's an option, albeit and inconvenient one. But really the adapters are not meant for Laptops in the first place and even for Mini Desktop PCs is not an easy task there but still more manageable that doing that with a laptop. It would just better if there was more Mini Desktop PC OEMs/Laptops OEMs where those OEMs would adopt an OCuLink PCie 4.0/x4 Port for eGPU usage like GPD has done with their line of handheld gaming devices. And with mass adoption of OCuLink there could also be adapters as well to support all the other standards as OCuLink being PCIe based by extension will support that as well.

Log in

Don't have an account? Sign up now