Shading Tiles with Larrabee (With Extra Goodies)

We've looked a bit at how we get from triangles to tiles. Intel also shared a somewhat deeper look at how it is organizing its software renderer on the back end (from the tiles to the screen).

First, full tiles are fetched into cache. Reaching back to how threads are organized, we can have four simultaneous threads running, and keeping all four of these threads working on parts of the same data set helps avoid thrashing the cache. Intel has indicated that the organization of software rendering threads during back end processing will be as illustrated in the following diagram.

We see that there are four threads, with one acting as a fragment setup thread that takes all the geometry in the tile and creates fragments from it for further processing. There are then three work threads that take ready fragments (or more like groups of 4 to 16 fragments each -- just a guess for now), check to see if they are visible, shade them (load textures and run the associated shader programs), perform any antialiasing and handle blend operations. Remember that this is all just software. It doesn't have to happen this way, but this is the direction Intel has indicated it has taken for its software renderer and for implementing DirectX and OpenGL.
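To make the one-setup-thread, three-work-thread split a little more concrete, here is a minimal sketch of that producer/consumer arrangement for a single tile. This is only our illustration, not Intel's code: the names (`setup_thread`, `work_thread`, `shade_tile`), the group size of 4, the representation of a "triangle" as an id plus a list of covered pixels, and the placeholder shader are all assumptions. The visibility test, antialiasing and blending are left out, and Python threads only show the structure, not the real parallelism.

```python
import queue
import threading

NUM_WORKERS = 3   # three work threads, as in Intel's description
GROUP_SIZE = 4    # assumed group size; the real figure is not public


def setup_thread(triangles, work_q):
    """Turn the tile's geometry into groups of fragments (hypothetical stand-in)."""
    group = []
    for tri_id, pixels in triangles:
        for px in pixels:
            group.append((tri_id, px))
            if len(group) == GROUP_SIZE:
                work_q.put(group)
                group = []
    if group:
        work_q.put(group)          # flush the last, partially filled group
    for _ in range(NUM_WORKERS):
        work_q.put(None)           # one stop marker per work thread


def work_thread(work_q, tile, lock):
    """Take ready fragment groups and shade them into the tile."""
    while True:
        group = work_q.get()
        if group is None:
            break
        for tri_id, px in group:
            shaded = tri_id * 10   # placeholder for texture loads + shader program
            with lock:
                tile[px] = shaded


def shade_tile(triangles):
    work_q = queue.Queue()
    tile = {}
    lock = threading.Lock()
    workers = [
        threading.Thread(target=work_thread, args=(work_q, tile, lock))
        for _ in range(NUM_WORKERS)
    ]
    for w in workers:
        w.start()
    setup = threading.Thread(target=setup_thread, args=(triangles, work_q))
    setup.start()
    setup.join()
    for w in workers:
        w.join()
    return tile


# Two "triangles" covering non-overlapping pixels of one tile.
tile = shade_tile([(1, [(0, 0), (1, 0), (2, 0)]), (2, [(0, 1), (1, 1)])])
```

All four threads touch the same tile, which is the point of the arrangement: the data they share stays in cache while the setup thread keeps the work threads fed.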

By the time Larrabee arrives as a product, I certainly hope that we'll get a deeper look at what's really going on under the hood and how everything is organized. I suppose the holy grail would be if Intel decides to release its software renderer source code to the general public, but even if we don't get that, we'll try to get information on all the different types of threads, fibers and strands that are spawned to handle all the different steps in the rendering pipeline.

Beyond just taking traditionally fixed function features and running them in software, Intel can do a few cool things that are difficult with current hardware. In order to get layered transparency to work right, game developers need to sort objects and polygons as best they can from back to front (rendering the furthest object to the screen first). If this isn't done, we can get some funky artifacts that don't look right. Since all this is software, Intel can help developers out: where there is transparency, they can maintain a list of fragments at that screen position with z info attached rather than just blending or discarding data immediately. This way, when the blend is performed, it can be done properly no matter what order the geometry was rendered in.
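The per-pixel fragment list described above can be sketched in a few lines. This is a rough illustration under our own assumptions, not Intel's implementation: the `Fragment` type, the convention that larger z means farther from the viewer, and the simple "over" blend are all choices we made for the example.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    z: float       # depth; larger = farther from the viewer (assumed convention)
    color: tuple   # (r, g, b), each in 0..1
    alpha: float   # opacity, 0 = fully transparent, 1 = opaque


def blend_over(dst, frag):
    """Standard 'over' blend of one fragment onto the color already in place."""
    return tuple(frag.alpha * c + (1.0 - frag.alpha) * d
                 for c, d in zip(frag.color, dst))


def naive_blend(fragments, background=(0.0, 0.0, 0.0)):
    """Blend immediately, in whatever order the geometry was submitted."""
    result = background
    for frag in fragments:
        result = blend_over(result, frag)
    return result


def resolve_pixel(fragments, background=(0.0, 0.0, 0.0)):
    """Keep the whole list, then sort back to front by z before blending."""
    result = background
    for frag in sorted(fragments, key=lambda f: f.z, reverse=True):
        result = blend_over(result, frag)
    return result


red = Fragment(z=1.0, color=(1.0, 0.0, 0.0), alpha=0.5)    # nearer
blue = Fragment(z=2.0, color=(0.0, 0.0, 1.0), alpha=0.5)   # farther

# Submission order changes the naive result but not the resolved one.
print(naive_blend([red, blue]), naive_blend([blue, red]))
print(resolve_pixel([red, blue]), resolve_pixel([blue, red]))
```

The cost is memory: the list at each pixel grows with the number of transparent layers, which is exactly the kind of variable-size structure that is awkward in fixed function hardware and easy in software.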

Additionally, Irregular Z-buffers (which can allow for the creation of screen resolution shadow maps to avoid artifacts) and other complex data structures that can't easily or efficiently be implemented on traditional GPU hardware can be implemented on Larrabee without a second thought. Some of this stuff Intel can do on the back end to improve quality and performance in all applications, but some of it really won't make a difference until developers start to embrace the new architecture. And it's not just doing new things -- there are probably plenty of devs out there who would love to entirely skip the step of sorting their polygons when dealing with layered transparency.


101 Comments


  • ocyl - Monday, August 4, 2008 - link

    Larrabee will be shipped when Diablo III is, and it will mark the beginning of the end for DirectX.

    Calling it first here at AnandTech.

    Thanks go to Anand and Derek for the very well written article. You are the ones who keep tech journalism alive.
  • erikespo - Monday, August 4, 2008 - link

    "At 143 mm^2, Intel could fit 10 Larrabee-like cores so let's double that. Now we're at 286mm^2 (still smaller than GT200 and about the size of AMD's RV770) and 20-cores. Double that once more and we've got 40-cores and have a 572mm^2 die, virtually the same size as NVIDIA's GT200 but on a 65nm process. "

    this math is way off

    143 mm^2 is 20449mm.. if they fit 10 there that is 2044.9 per core
    286mm^2 is 81796mm.. that is 4X the space so 40 cores in 286^2
    and 572mm^2 is 327184mm is 160 cores..

    double length will double area.. doubling length and width will quadruple area.
  • bauerbrazil - Monday, August 4, 2008 - link

    Hahahaha, YOUR math is way off!!!

    Jesus.
  • erikespo - Monday, August 4, 2008 - link

    I see where the article and you got your math..
    you both did 143mm^2 / 10 and got 14.3 then divided 286^2 by 14.3 and got 20.. this math is only acting on the one number..

    I know this because the area of 14.3 is 204.49 mm. 10 of those would be 2044.9mm. but the area of 143mm^2 is 20449mm.
  • WeaselITB - Monday, August 4, 2008 - link

    Wow ... No.
    143mm^2 is NOT equivalent to 143^2 mm ... Your analysis is flawed.

    If we use your example, 2mm^2 is NOT 2mm x 2mm ... it's actually root(2)mm x root(2)mm ... 4mm^2 is 2mm x 2mm, not 4mm x 4mm (that'd be 16mm).

    Maybe you should examine in depth that Wikipedia article you linked earlier ...

    Thanks,
    -Weasel
  • MamiyaOtaru - Monday, August 4, 2008 - link

    143mm^2 is NOT equivalent to 143^2 mm

    ^^THIS

    That's it in a nutshell. mm² doesn't mean you square 143, it refers to Square Millimeters, a unit of area (unlike Millimeters, a unit of distance).

    Revised mspaint illustration: http://img379.imageshack.us/my.php?image=squaremmh...
  • erikespo - Monday, August 4, 2008 - link

    Anandtech Comment Section.. Forever record of my retardedness
  • erikespo - Monday, August 4, 2008 - link

    Dang.. Many apologies..
    got my square area and squared numbers confused..
  • WeaselITB - Monday, August 4, 2008 - link

    [quote]4mm^2 is 2mm x 2mm, not 4mm x 4mm (that'd be 16mm).[/quote]

    Dang, that was supposed to read "(that'd be 16mm^2)."

    Thanks,
    -Weasel
  • erikespo - Monday, August 4, 2008 - link

    another way to look at it is how many 143mm^2 squares does it take to make up 286mm^2?

    only 2 would only be 143mm x 286mm

    since 10 cores fit into 143 x 143, 20 will fit into 143 x 286mm
    286 x 286 (which is double that of 143 x 286mm) the 286mm^2 would fit 40
