Building an Optimized Rasterizer for Larrabee

We've touched on the latency focus. We talked about caches and internal memory busses. But what about external memory? To be honest, the answer is that we don't know. But we have an idea of the direction they want to move in. Lower external bandwidth and possibly lower framebuffer size than traditional hardware seems to be the goal. If they can maintain good performance, reducing the amount of memory and the number of traces on the board will reduce the cost to add-in card vendors who may want to sell cards based on Larrabee (and in turn could reduce cost to the end user).

This bit of speculation isn't just based on what we know about the hardware so far. It's also based on the direction they decided to take with their rasterizer: Intel is implementing a tile based rasterizer to support DirectX and OpenGL as well as their own software renderer. Speaking of their software renderer, they did state that it would be available for use by developers so that they don't have to start from nothing. When asked whether it would be available only as a set of binaries or as source, our answer was that this was still under discussion. We put in our two cents and suggested that distributing the source is the way to go.

Anyway, we haven't discussed tile based rasterization in quite a while on AnandTech as the Kyro line didn't stick around on the desktop. To briefly run it down, screen space is broken up into tiles. For each tile, primitives (triangles) are set aside. Fragments are created for a tile based on all the geometry therein. Since none of these fragments are further processed or shaded until the entire tile is finished, only visible fragments are sent on to be shaded (at least, this is how it used to be: some aspects of DX10+ may require occluded fragments to hang around in some cases). Occluded fragments are thrown out during rasterization. Intel does also support Z culling at geometry, fragment and pixel levels, which is also very useful as the actual rasterization, blending etc. must occur in software as well. Cutting down work at every point possible is the modus operandi of optimizing graphics.

This is in stark contrast to immediate mode renderers, which are what ATI and NVIDIA have been building for the past decade. Immediate mode rendering requires more memory bandwidth as it processes every fragment in the scene, sometimes even those that aren't visible (that can't easily be thrown out by pre-shading depth test techniques). Immediate mode renderers have some tricks that can let them know what fragments will be visible in the scene to help cut down on work, but there are still cases where the GPU does extra work that it doesn't need to because the fragment it is processing and shading isn't even visible in the scene. Immediate mode renderers require more memory bandwidth than tile based renderers, but some algorithms and features have been easier to implement with immediate mode.

STMicro had a short run of popular tile (or deferred) renderers in the early 2000s with the Kyro series. This style of rendering still lives on in cell phone/smart phone and other ultra low power devices that need graphics. While performance on this hardware is very low, memory efficiency is important in this space and thus tile based renderers are preferred.

The technique dropped out of the desktop space not because it was inherently unable to perform, but simply because the players that won out in the era didn't choose to make use of it. With smaller process technology, larger on die cache sizes, larger tiles sizes, and smaller geometry (meaning less triangles span multiple tiles), some advantages of tile based rendering have gotten ... well, more advantageous with advancements in technology.

Getting into the details of tile based rendering is a bit beyond where we want to go right now. But the point is that this technique results fewer occluded fragments end up being shaded. Additionally, the grouping of fragments into tiles helps with breaking up the workload and could help to optimize prefetching and caching so that fragments are only ever fetched once from external memory (tiles on Larrabee will fit into less than half the L2 space per core). These and other features help to reduce bandwidth needs compared to immediate mode renderers.

Looking a little deeper, it is both the burden and advantage of Larrabee that it implements all steps of the traditional graphics pipeline in software. While current GPUs have hardware for geometry setup, rasterization, texturing, filtering, compressing, decompressing, blending and much more, Larrabee maintains a minimum of fixed function features (related to texturing). Often, for a specific purpose, fixed function hardware can be more efficient and faster than general purpose hardware. But at the same time, the needs of individual games shift, and allocating greater or fewer resources to a specific component of the rendering pipeline does have advantages over fixed function hardware. Current GPUs can't shift resources to offer faster rasterization if needed. They can't devote more flops to speeding up stenciling or blending.

The flexibility of Larrabee allows it to best fit any game running on it. But keep in mind that just because software has a greater potential to better utilize the hardware, we won't necessarily see better performance than what is currently out there. The burden is still on Intel to build a part that offers real-world performance that matches or exceeds what is currently out there. Efficiency and adaptability are irrelevant if real performance isn't there to back it up.

Thread and Data Management: It's Time to Blow Your Mind Shading Tiles with Larrabee (With Extra Goodies)
Comments Locked

101 Comments

View All Comments

  • phaxmohdem - Monday, August 4, 2008 - link

    Can your mom play Crysis? *burn*
  • JonnyDough - Monday, August 4, 2008 - link

    I suppose she could but I don't think she would want to. Why do you care anyway? Have some sort of weird fetish with moms playing video games or are you just looking for another woman to relate to?

    Ooooh, burn!
  • Griswold - Monday, August 4, 2008 - link

    He is looking for the one playing his mom, I think.
  • bigboxes - Monday, August 4, 2008 - link

    Yup. He worded it incorrectly. It should have read, "but can it play your mom?" :p
  • Tilmitt - Monday, August 4, 2008 - link

    I'm really disappointed that Intel isn't building a regular GPU. I doubt that bolting a load of unoptimised x86 cores together is going to be able to perform anywhere near as well as a GPU built from the ground up to accelerate graphics, given equal die sizes.
  • JKflipflop98 - Monday, August 4, 2008 - link

    WTF? Did you read the article?
  • Zoomer - Sunday, August 10, 2008 - link

    He had a point. More programmable == more transistors. Can't escape from that fact.

    Given equal number of transistors, running the same program, a more programmable solution will always be crushed by fixed function processors.
  • JonnyDough - Monday, August 4, 2008 - link

    I was wondering that too. This is obviously a push towards a smaller Centrino type package. Imagine a powerful CPU that can push graphics too. At some point this will save a lot of battery juice in a notebook computer, along with space. It may not be able to play games, but I'm pretty sure it will make for some great basic laptops someday that can run video. Not all college kids and overseas marines want to play video games. Some just want to watch clips of their family back home.
  • rudolphna - Monday, August 4, 2008 - link

    as interesting and cool as this sounds, this is even more bad news for AMD, who was finally making up for lost ground. granted, its still probably 2 years away, and hopefully AMD will be back to its old self (Athlon64 era) They are finally getting products that can actually compete. Another challenger, especially from its biggest rival-Intel- cannot be good for them.
  • bigboxes - Monday, August 4, 2008 - link

    What are you talking about? It's been nothing but good news for AMD lately. Sure, let Intel sink a lot of $$ into graphics. Sounds like a win for AMD (in a roundabout way). It's like AMD investing into a graphics maker (ATI) instead of concentrating on what makes them great. Most of the Intel supporters were all over AMD for making that decision. Turn this around and watch Intel invest heavily into graphics and it's a grand slam. I guess it's all about perspective. :)

Log in

Don't have an account? Sign up now