Shading Tiles with Larrabee (With Extra Goodies)

We've looked at the way we get from triangles to tiles a bit. Intel shared a bit of a deeper look at how they are organizing their software render on the back end (from the tiles to the screen).

First, full tiles are fetched into cache. Reaching back to understanding how threads are organized, we can have four simulatneous threads running, and keeping all four of these threads working on parts of the same data set will help keep from thrashing the cache. Intel has indicated that the organization of software rendering threads durring back end processing will be as illustrated in the following diagram.

We see that there are 4 thread with one acting as a fragment setup thread which takes all the geometry in the tile and creating fragments from it for further processing. There are then three work threads that take ready fragments (or more like groups of 4 to 16 fragments each -- just a guess for now), check to see if they are visible, shade the fragment (load textures and run associated shader programs), perform any antialiasing and handle blend operations. Remember that this is all just software. It doesn't have to happen this way, but this is the direction Intel had indicated they have taken for their software renderer and for implementing DirectX and OpenGL.

By the time Larrabee arrives as a product, I certainly hope that we'll get a deeper look at what's really going on under the hood and how everything is organized. I suppose the holy grail would be if Intel decides to release it's software renderer source code to the general public, but even if we don't get that we'll try to get information on all the different types of threads, fibers and strands that are spawned to handle all the different steps in the rendering pipeline.

Beyond just taking traditionally fixed function features and running them in software, Intel can do a few cool things that are difficult with current hardware. In order to get layered transparency to work right, game developers need to sort objects and polygons as best then can from back to front (rendendering the furthest object to the screen first). If this isn't done, we can get some funky artifacts that don't look right. Since all this is software, Intel can do a few cool things to help developers out: where there is transparency, they can maintain an list of fragments at that screen position with z info attached rather than just blending or discarding data immediately. This way, when the blend is performed, it can be done properly no matter what order the geometry was rendered in.

Additionally, Irregular Z-buffers (which can allow for the creation of screen resolution shadow maps to avoid artifacts) and other complex data structures that can't easily or efficiently be implemented on traditional GPU hardware can be implemented on Larrabee without a second thought. Some of this stuff Intel can do on the back end to improve quality and performance in all applications, but some of it really won't make a difference until developers start to embrace the new architecture. And it's not just doing new things -- there are probably plenty of devs out there who would love to entirely skip the step of sorting their polygons when dealing with layered transparency.

Building an Optimized Rasterizer for Larrabee The Future of Larrabee: The Many Core Era and Launch Questions
Comments Locked

101 Comments

View All Comments

  • Griswold - Monday, August 4, 2008 - link

    You seem to be confused. Time for a nap.
  • MDme - Monday, August 4, 2008 - link

    but AMD will have Cinema 2.0. did you see that demo? by 2010, AMD will have the RV990 or whatever...and Nvidia will have GT400?
  • phaxmohdem - Monday, August 4, 2008 - link

    Considering how long it took nVidia to release a single GPU significantly faster than G80, I'd be shocked if we wee GT300 by 2009/2010. however a GTX 295GT X2 ULTRA OC is not out of the question ;)
  • shuffle2 - Monday, August 4, 2008 - link

    mm², how hard is that to write? >.>
  • 1prophet - Monday, August 4, 2008 - link

    They need to hit one out of the park with the drivers (software)as well.
  • jltate - Tuesday, August 5, 2008 - link

    I've got a bunch of comments, so I'll just list them all here.

    SSE doesn't have fused multiply-add operations. Larrabee does -- thus that 10 core processor could perform a peak of 320 floating point operations per cycle (it's mentioned in the SIGGRAPH paper).

    Larrabee's programming model is variable width -- the hardware can and likely will be augmented in the future to perform more than just 16 operations in parallel.

    The ring bus between cores was stated to be for each group of 16. Intel stated that for more than 16 cores they'd use "multiple short-linked rings".

    Also, the diagram only shows one memory controller on one side with fixed function logic on the other, not two memory controllers as you showed on page 5 of your article. However, Intel stated in the paper that the configuration and number of processors, fixed function blocks and I/O controllers would be implementation dependent. So in effect it could very well have a half-dozen 64-bit interfaces like G80.

    My forecast? This thing will rock. I for one simply cannot wait.
  • Laura Wilson - Monday, August 4, 2008 - link

    that's the truth

    they say they know this. it sounds like they know this ... we'll see what happens :-)
  • gigahertz20 - Monday, August 4, 2008 - link

    I'm going to predict Larrabee will provide a huge boost of performance over Intel's current crappy integrated graphic solutions, but will not be able to compete with AMD/ATI's and Nvidia's high end GPU's when it (Larrabee) finally launches. If Intel can deliver a monster that can push 100+ FPS in Crysis and doesn't cost so much that it breaks the bank like the current Nvidia GTX 280's, then they will have a real winner! When it finally launches though, who knows what AMD/ATI and Nvidia will have out to compete against it, wonder if Intel is just trying to push out a mainstream chip or go high end as well...guess I need to read the rest of the article :)
  • JEDIYoda - Tuesday, August 5, 2008 - link

    dreaming again huh??? you people who want top notch performance without having to pay for it....rofl..hahaha
  • FITCamaro - Monday, August 4, 2008 - link

    This isn't mean to compete with their IGPs. At least not initially.

Log in

Don't have an account? Sign up now