Performance Expectations & First Thoughts

Wrapping up this GPU architecture deep dive, while Intel didn’t use this year’s architecture day to discuss specific products and SKUs, the company did take a moment to discuss performance expectations for Xe-LP, and offer some quick videos of Xe-LP in action. Unfortunately we weren’t allowed to record these demos (least someone leak them), but we’ll post them here as soon as Intel releases copies to the public.

At any rate, as previously discussed, Intel’s goal was to double Ice Lake’s (Gen11) graphics performance, which Xe-LP will be accomplishing via a combination of a wider GPU (more hardware), a more power-efficient GPU (allowing higher clocks), and a more throughput-efficient GPU (higher IPC). This is a lofty goal given the fact that they don’t get the benefit of a wholly new process node, but Intel does seem rather confident about the performance potential of its new 10nm SuperFin process node, as well as the payoff from the tried-and-true method of brute forcing things by throwing more hardware at it.

Looking at our own performance data from reviews of Ice Lake and Ryzen 3000 “Renoir” laptops, if Intel can meet their performance goals then Tiger Lake should be able to pull ahead of AMD’s comparable U-series Ryzen APUs. As always, this is going to be game-dependent, but high-end Ice Lake laptops were never behind by more than 30% or so in GPU-limited scenarios. But since we’re talking about mobile scenarios, the power and cooling will always be a potential wildcard that can hold a laptop back. So for ultraportable gaming laptops in particular, Intel will undoubtedly want its partners to build laptops with the cooling capabilities to match, to give Tiger Lake every possible chance to succeed.

Framerates aside, Intel also expects Xe-LP’s performance to significantly raise the bar on image quality. With integrated graphics generally bringing up the rear in terms of image quality in order to deliver the necessary framerates, doubling their iGPU performance would allow a lot of games to be run at higher image quality settings. This again would vary from game to game, but at least for promotional purposes, Intel is eyeballing Tiger Lake/Xe-LP being able to run at high image quality in games where Ice Lake could only manage low.

But Xe-LP isn’t just an integrated graphics solution: it’s for discrete graphics too. And while we eagerly anticipate more information on DG1, given Intel’s focus today on architecture over products, we’re left with more questions than answers. Intel has a very interesting and OEM-friendly plan in place with Xe-LP, and by leveraging the same architecture for both the iGPU and an optional discrete GPU, OEMs are going to love the fact that they don’t have to validate and load separate GPU drivers for the integrated and discrete GPUs.

Most importantly, however, Intel is also refusing to answer the 10 million pixel question: will Tiger Lake’s iGPU be able to work in concert with the DG1? Intel has certainly not made any efforts to shoot down that idea, but they also aren’t confirming it, either. And even then, if they utilize mutli-GPU rendering, will they get it right? Multi-GPU rendering on the desktop is all but dead, and for good reason: it tends not to play nicely with certain modern rendering techniques, and it can add a fair bit of input lag. The answer to this question – and whether Intel has been able to conquer the traditional drawbacks of multi-GPU rendering – will absolutely have a huge impact on the commercial viability of the DG1 GPU. So we’ll be eagerly awaiting the answer to those questions.

Otherwise, Xe-LP marks an important step in the evolution of Intel’s GPU architectures, never mind a huge stepping stone in their plans to become a top-to-bottom GPU supplier. Though only destined for laptops, Xe-LP is the basis of something much bigger for Intel: Xe-LP will be the foundation of an entire generation of GPUs to come. So what Intel does here with regards to features, architecture, and above all else power efficiency will have enormous repercussions to come, for everything from gaming hardware to supercomputers. In many ways it’s the dawn of a new era for Intel, and one they are hoping will be a better era than what they leave behind.

Xe-LP Media & Display Controllers
Comments Locked


View All Comments

  • mode_13h - Thursday, August 13, 2020 - link

    As always, thanks for the deep coverage.

    Not finished reading, but I already have one complaint:

    > Gen11’s smallest wavefront width is 8 threads wide (SIMD8), so it can take multiple clock cycles to execute a single wavefront, with Intel interleaving multiple threads as a form of latency hiding.

    Wow. Mixing 2 different definitions of "thread" in the same sentence? Please don't.

    Last I checked Nvidia is the only one talking about SIMD lanes as if they're threads. In Intel's Gen 9 whitepaper, it uses "threads" in a manner equivalent to CPU threads, and they talk about SIMD lanes as SIMD lanes.

    And speaking of Gen 9, they claim it has 7-way SMT. Did they ever specify this, for Gen 11? I don't recall seeing it in their Gen 11 whitepaper, which went into significantly less detail on the EUs than previous whitepapers.
  • mode_13h - Thursday, August 13, 2020 - link

    I guess your article could be self-consistent by replacing the second use of "thread" in that quoted sentence with "wavefront"?

    Although, "wavefront" is an AMD term (Nvidia calls them "Warps"). However, Intel's slides suggest they still call them "threads".
  • Ryan Smith - Thursday, August 13, 2020 - link

    "I guess your article could be self-consistent by replacing the second use of "thread" in that quoted sentence with "wavefront"?"

    You are correct sir! That was supposed to be "wavefront".

    And Intel tends to use "wave" in its literature, though I prefer to collapse it down to just wavefront to keep things reasonably consistent. We don't need 2 nearly-identical terms for the same thing.
  • mode_13h - Thursday, August 13, 2020 - link

    Cool. Thanks for the reply!

    BTW, I don't mind the term "wavefront" - I said that more to point it out to those who might not know.
  • mode_13h - Thursday, August 13, 2020 - link

    IMO, the reason Nvidia has long called their Warp elements "threads" is so they can claim that each SIMD lane is a "core", to make their GPUs *sound* more impressive.

    Since Volta finally fixed their per-lane IP register (which is basically just a fancy form of branch predication), there's almost a touch of truth in that characterization, and I'd finally agree that their ISA is more than just a straight-forward combination of SIMD + SMT.
  • xenol - Thursday, August 13, 2020 - link

    AMD feels more confusing. Their base unit is a "stream processor" which seems to suggest something larger than it really is. But a group of stream processors is called a Compute Unit, which that seems to suggest something smaller than it really is.

    Though looking at some of the programming literature for GPUs, I can see where the "thread" terminology comes from. So this looks more like a problem of someone coming up with their own language instead of the industry coming together to standardize on it. However, given that NVIDIA, AMD, and Intel have their own way of doing things, it may not be possible to do that and for the sake of clarity, having their own terminology is more or less correct.
  • mode_13h - Thursday, August 13, 2020 - link

    Since Nvidia's Fermi and AMD's GCN, their architectures basically amount to SIMD + SMT. I'm not sure exactly when Intel added SMT.

    Anyway, I wouldn't characterize their architectures as fundamentally different. Intel is traditionally the most distinct, among the three.
  • jim bone - Friday, August 14, 2020 - link

    recent editions of Hennessy and Patterson have a nice table mapping the CPU terminology to nvidia’s GPU terminology:
  • jim bone - Friday, August 14, 2020 - link

    and yes for reasons nvidia calls a vertical slice of simd instructions a thread
  • kpx86 - Thursday, August 13, 2020 - link

    I believe the SW libraries like DirectX and OpenGL use threads this way.

    From MSFT website: The maximum number of threads is limited to D3D11_CS_4_X_THREAD_GROUP_MAX_THREADS_PER_GROUP (768) per group.

Log in

Don't have an account? Sign up now