Shading Tiles with Larrabee (With Extra Goodies)

We've looked a bit at how we get from triangles to tiles. Intel also shared a deeper look at how it is organizing its software renderer on the back end (from the tiles to the screen).

First, full tiles are fetched into cache. Recalling how threads are organized, each core can run four simultaneous threads, and keeping all four working on parts of the same data set helps avoid thrashing the cache. Intel has indicated that the software rendering threads will be organized during back-end processing as illustrated in the following diagram.

We see that there are four threads, with one acting as a fragment setup thread that takes all the geometry in the tile and creates fragments from it for further processing. Three work threads then take ready fragments (or, more likely, groups of 4 to 16 fragments each -- just a guess for now), check whether they are visible, shade them (load textures and run the associated shader programs), perform any antialiasing, and handle blend operations. Remember that this is all just software. It doesn't have to happen this way, but this is the direction Intel has indicated it is taking for its software renderer and for implementing DirectX and OpenGL.
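To make that thread arrangement concrete, here is a minimal sketch of one setup thread feeding three work threads through a queue. It is only an illustration, not Intel's renderer: the names (`setup_thread`, `work_thread`, `render_tile`), the 4x4 tile size, and the fragment layout `(x, y, z, color)` are all assumptions, rasterization is faked by handing each "triangle" its fragments up front, shading is reduced to a flat color, and antialiasing and blending are left out. Smaller z is assumed to mean closer to the viewer.

```python
import queue
import threading

TILE_SIZE = 4      # hypothetical tile edge, in pixels
NUM_WORKERS = 3    # three work threads, as described above
DONE = None        # sentinel that tells a work thread to exit


def setup_thread(triangles, work_queue):
    """Fragment setup: turn the tile's geometry into queued fragment groups."""
    for tri in triangles:
        # Stand-in for real rasterization: each "triangle" already carries
        # its fragments as (x, y, z, color) tuples.
        work_queue.put(tri["fragments"])
    for _ in range(NUM_WORKERS):
        work_queue.put(DONE)  # one sentinel per work thread


def work_thread(work_queue, tile, lock):
    """Take ready fragment groups, depth-test them, and write the survivors."""
    while True:
        group = work_queue.get()
        if group is DONE:
            break
        for x, y, z, color in group:
            with lock:  # the tile is shared by all three work threads
                nearest_z, _ = tile[y][x]
                if z < nearest_z:            # visible: closer than what is stored
                    tile[y][x] = (z, color)  # "shading" is just a flat color here


def render_tile(triangles):
    """Run one setup thread and three work threads over a single tile."""
    tile = [[(float("inf"), None)] * TILE_SIZE for _ in range(TILE_SIZE)]
    work_queue = queue.Queue()
    lock = threading.Lock()
    threads = [threading.Thread(target=setup_thread, args=(triangles, work_queue))]
    threads += [
        threading.Thread(target=work_thread, args=(work_queue, tile, lock))
        for _ in range(NUM_WORKERS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tile
```

Because the depth test keeps the closest fragment, the finished tile is the same no matter which work thread picks up which group, as long as overlapping opaque fragments have distinct depths.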

By the time Larrabee arrives as a product, I certainly hope that we'll get a deeper look at what's really going on under the hood and how everything is organized. I suppose the holy grail would be for Intel to release its software renderer source code to the general public, but even if we don't get that, we'll try to get information on all the different types of threads, fibers and strands that are spawned to handle the different steps in the rendering pipeline.

Beyond just taking traditionally fixed function features and running them in software, Intel can do a few cool things that are difficult with current hardware. In order to get layered transparency to work right, game developers need to sort objects and polygons as best they can from back to front (rendering the furthest object to the screen first). If this isn't done, we can get some funky artifacts that don't look right. Since all this is software, Intel can help developers out: where there is transparency, the renderer can maintain a list of fragments at that screen position with z info attached rather than just blending or discarding data immediately. This way, when the blend is performed, it can be done properly no matter what order the geometry was rendered in.
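The fragment-list idea can be sketched in a few lines. This is a hedged illustration rather than Intel's implementation: the class `FragmentListBuffer`, its methods, and the helper `blend_over` are hypothetical names, colors are plain RGB tuples in the 0.0 to 1.0 range, and larger z is assumed to mean farther from the viewer. Fragments are stored per screen position as they arrive, and sorting is deferred until the blend is actually performed.

```python
def blend_over(dst_rgb, src_rgb, alpha):
    """Standard 'over' blend of a translucent source onto a destination color."""
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src_rgb, dst_rgb))


class FragmentListBuffer:
    """Keeps every translucent fragment per pixel instead of blending at once."""

    def __init__(self, background=(0.0, 0.0, 0.0)):
        self.background = background
        self.fragments = {}  # (x, y) -> list of (z, rgb, alpha)

    def submit(self, x, y, z, rgb, alpha):
        # Just record the fragment with its depth; no blending happens yet.
        self.fragments.setdefault((x, y), []).append((z, rgb, alpha))

    def resolve(self, x, y):
        # Sort back to front (assumption: larger z is farther away), then blend.
        color = self.background
        stored = self.fragments.get((x, y), [])
        for _, rgb, alpha in sorted(stored, key=lambda f: f[0], reverse=True):
            color = blend_over(color, rgb, alpha)
        return color


# Submitting the same two fragments in either order gives the same pixel.
a = FragmentListBuffer()
a.submit(0, 0, 10.0, (1.0, 0.0, 0.0), 0.5)  # far, red
a.submit(0, 0, 5.0, (0.0, 0.0, 1.0), 0.5)   # near, blue

b = FragmentListBuffer()
b.submit(0, 0, 5.0, (0.0, 0.0, 1.0), 0.5)   # near, blue
b.submit(0, 0, 10.0, (1.0, 0.0, 0.0), 0.5)  # far, red

print(a.resolve(0, 0) == b.resolve(0, 0))
```

The trade-off is memory: each pixel's list grows with the number of translucent layers covering it, which is cheap to handle in software but awkward for fixed-function blending hardware.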

Additionally, Irregular Z-buffers (which can allow for the creation of screen resolution shadow maps to avoid artifacts) and other complex data structures that can't easily or efficiently be implemented on traditional GPU hardware can be implemented on Larrabee without a second thought. Some of this stuff Intel can do on the back end to improve quality and performance in all applications, but some of it really won't make a difference until developers start to embrace the new architecture. And it's not just doing new things -- there are probably plenty of devs out there who would love to entirely skip the step of sorting their polygons when dealing with layered transparency.

101 Comments


  • erikespo - Monday, August 4, 2008 - link

    http://en.wikipedia.org/wiki/Square_%28geometry%29

    helpful page to take you back to first grade

    and excuse my decimal point.. it is 204.49mm total per core or 14.3mm^2
  • erikespo - Monday, August 4, 2008 - link

    Explain.

    lets use smaller numbers for you 2mm^2 is 2mm by 2 mm or 4 total mm

    double that and it is 4mm^2 or 4 mm by 4 mm or 16mm total..

    we are talking about area or 2 dimensions not 1 dimension.

    Same math applies to the article
  • MamiyaOtaru - Monday, August 4, 2008 - link

    No, you're way off. 2mm² is TWO square millimeters. (a rectangle 1x2 for example). Double that would be 4mm², which could either be 1x4 or 2x2.

    NUMBERmm² doesn't mean NUMBERxNUMBER mm, it means exactly what it says: NUMBER mm².

    Using your smaller numbers: 2mm² is not "4 total mm"; it is TWO mm². Saying it is 4 total mm doesn't even make sense. You _can't_ measure area in millimeters. You measure it in square millimeters, and there are two of them (_2_mm²).

    Here's an mspaint visual (if links work): http://img105.imageshack.us/my.php?image=squaremma...

    You're so sure you're right on this, it's really depressing :(
  • darkequitus - Monday, August 4, 2008 - link

    I did not appreciate the writer creaming over every digital page they wrote, especially when Larrabee's performance is, at the moment, mainly based on Intel hype and nothing real.
  • ZootyGray - Monday, August 4, 2008 - link

    THANK YOU.

    Somebody finally said it.

    The others prefer Eutopian illusion - aka the curse aka ntel antitrust. ntel has no grafx and the fools in the public buy "inside' and nvid and ati aren't exactly friends of the curse.

    welcome to the matrix. wakey wakey
  • ZootyGray - Monday, August 4, 2008 - link

    and a 16 pager on maybe might could be should be = wannabe "employ-boy"
    - payday ? hooyeh. This is so disappointing for me. Credibility sags to a new low.
  • strikeback03 - Tuesday, August 5, 2008 - link

    Someone whose two posts contain about 10 complete words and no complete thoughts says Anandtech's credibility has sagged to a new low?
  • ZootyGray - Tuesday, August 5, 2008 - link

    haha yeh - lots of room for thinking.
    or - if no thinkeez - ya gots der 16 pg inundation (that's a big word like marmalade) all based on nothing-is-real - you like that kind of brainwash? we don't know anything; but here's the tekspex?
    btw - did u get it? the matrix idea? watch the movie. cos here it is. pardon my loaded cryptic literacy.
    thx
    if you don't get it - well, that's what they want - a world of sleeping mob. never mind, that's just my concern.

  • The Preacher - Monday, August 4, 2008 - link

    I don't really care about how good it will be at executing some software renderer, but I feel it is going to kick ass in scientific calculations. Matrix operations, FFT/convolution, tremendous bandwidth, double precision... I may write C++/x86 assembly code directly for it, and I may put this into a rack of servers and use it through MPI. Give me a compiler with vector intrinsic functions for it and my dreams just came true! :)
  • elerick - Monday, August 4, 2008 - link

    I have been a daily reader of another hardware review site for years. I read nearly every article that headlines and find many of them quite lacking. Today I got wind of your review of Larrabee. It was very well written and delivered an amazing amount of tech knowledge that isn't commonly covered. I'm glad to have found this site, and I never create an account, but today I felt obligated to. Great work.

    PS: any news on that AMD / Fusion? or is that just them being intimidated by Intel's Larrabee?
