Building an Optimized Rasterizer for Larrabee

We've touched on the latency focus. We talked about caches and internal memory busses. But what about external memory? To be honest, the answer is that we don't know. But we have an idea of the direction they want to move in. Lower external bandwidth and possibly lower framebuffer size than traditional hardware seems to be the goal. If they can maintain good performance, reducing the amount of memory and the number of traces on the board will reduce the cost to add-in card vendors who may want to sell cards based on Larrabee (and in turn could reduce cost to the end user).

This bit of speculation isn't just based on what we know about the hardware so far. It's also based on the direction they decided to take with their rasterizer: Intel is implementing a tile based rasterizer to support DirectX and OpenGL as well as their own software renderer. Speaking of their software renderer, they did state that it would be available for use by developers so that they don't have to start from nothing. When asked whether it would be available only as a set of binaries or as source, our answer was that this was still under discussion. We put in our two cents and suggested that distributing the source is the way to go.

Anyway, we haven't discussed tile based rasterization in quite a while on AnandTech as the Kyro line didn't stick around on the desktop. To briefly run it down, screen space is broken up into tiles. For each tile, primitives (triangles) are set aside. Fragments are created for a tile based on all the geometry therein. Since none of these fragments are further processed or shaded until the entire tile is finished, only visible fragments are sent on to be shaded (at least, this is how it used to be: some aspects of DX10+ may require occluded fragments to hang around in some cases). Occluded fragments are thrown out during rasterization. Intel does also support Z culling at geometry, fragment and pixel levels, which is also very useful as the actual rasterization, blending etc. must occur in software as well. Cutting down work at every point possible is the modus operandi of optimizing graphics.

This is in stark contrast to immediate mode renderers, which are what ATI and NVIDIA have been building for the past decade. Immediate mode rendering requires more memory bandwidth as it processes every fragment in the scene, sometimes even those that aren't visible (that can't easily be thrown out by pre-shading depth test techniques). Immediate mode renderers have some tricks that can let them know what fragments will be visible in the scene to help cut down on work, but there are still cases where the GPU does extra work that it doesn't need to because the fragment it is processing and shading isn't even visible in the scene. Immediate mode renderers require more memory bandwidth than tile based renderers, but some algorithms and features have been easier to implement with immediate mode.

STMicro had a short run of popular tile (or deferred) renderers in the early 2000s with the Kyro series. This style of rendering still lives on in cell phone/smart phone and other ultra low power devices that need graphics. While performance on this hardware is very low, memory efficiency is important in this space and thus tile based renderers are preferred.

The technique dropped out of the desktop space not because it was inherently unable to perform, but simply because the players that won out in the era didn't choose to make use of it. With smaller process technology, larger on die cache sizes, larger tiles sizes, and smaller geometry (meaning less triangles span multiple tiles), some advantages of tile based rendering have gotten ... well, more advantageous with advancements in technology.

Getting into the details of tile based rendering is a bit beyond where we want to go right now. But the point is that this technique results fewer occluded fragments end up being shaded. Additionally, the grouping of fragments into tiles helps with breaking up the workload and could help to optimize prefetching and caching so that fragments are only ever fetched once from external memory (tiles on Larrabee will fit into less than half the L2 space per core). These and other features help to reduce bandwidth needs compared to immediate mode renderers.

Looking a little deeper, it is both the burden and advantage of Larrabee that it implements all steps of the traditional graphics pipeline in software. While current GPUs have hardware for geometry setup, rasterization, texturing, filtering, compressing, decompressing, blending and much more, Larrabee maintains a minimum of fixed function features (related to texturing). Often, for a specific purpose, fixed function hardware can be more efficient and faster than general purpose hardware. But at the same time, the needs of individual games shift, and allocating greater or fewer resources to a specific component of the rendering pipeline does have advantages over fixed function hardware. Current GPUs can't shift resources to offer faster rasterization if needed. They can't devote more flops to speeding up stenciling or blending.

The flexibility of Larrabee allows it to best fit any game running on it. But keep in mind that just because software has a greater potential to better utilize the hardware, we won't necessarily see better performance than what is currently out there. The burden is still on Intel to build a part that offers real-world performance that matches or exceeds what is currently out there. Efficiency and adaptability are irrelevant if real performance isn't there to back it up.

Thread and Data Management: It's Time to Blow Your Mind Shading Tiles with Larrabee (With Extra Goodies)
Comments Locked

101 Comments

View All Comments

  • del - Friday, August 15, 2008 - link

    Don't be a hater. :P Intel has got it goin' on right now. Believe in the POWAH of Larrabee... unless it proves to be a failure upon release.

    :)
  • Thatguy97 - Sunday, June 28, 2015 - link

    IM FROM TEH FUTURE LARRABAE WAS CANCELLED OMG XDDDDD
  • atlmann10 - Saturday, August 9, 2008 - link

    Think about this ok AMD originally was a private IBM cpu manufacturer. Then bought out and run as a side unit of INTEL, that was dropped after they were done with them. So in a way the were partners and I'm sure there was some friendliness. As it's always been said keep your friends close but your enemies closer. There have been some things especially in these past two years that struck me kind of odd. Such as AMD's graphics chips running fine on a x38/48 chipset and the physics collaboration things as well as a few other rumors. Then Nvidia starts spouting off about how they could kick INTELS A77 etc. Now AMD has a definite GPU coprocessor in ATI and they wanna break into the market of GPU's etc. They know that there will be graphics competition with Nvidia being there largest competitior because there dedicated to GPU's solely and have a reputation. However now AMD has some chips that compete straight on weakening Nvidia to a point. Then AMD is getting more and more out of there cpu's gpu's and chipsets so INTEl jumps in the CPU GPU market just like AMD. Either way it turns out more are going to go with INTEL cpu's and many other products where AMD is kind of a fringe player. Who would you rather compete against full on 2 major GPU manufacturers or attempt to kind of co-align yourself with there competetitor while the somewhat down. Then throw out a whole new way to do graphics that performs well Nvidia is already loosing market share. So more people try it and the same number of people go with ATI. That leaves a much lower market for Nvidia plus there paying back what some 200 million dollars in bad GPU's right now as well and a few other problems they been having. Now this is not anything I know but knowing INTEL loves to stick it to competitors when there weak think about it.
  • benkantor - Wednesday, August 6, 2008 - link

    if you could fit 10 Larrabees on 143 mm^2, you could fit 40 Larrabees on 286 mm^2, not 20... :P
  • MamiyaOtaru - Saturday, August 9, 2008 - link

    For the love of education. We've already been through this. See the end of page 6 through page 7 in the comments section.

    143mm^2 doesn't mean 143*143. It means 143 square millimeters. 286 square millimeters is twice as many, allowing twice as many cores.
    http://img379.imageshack.us/my.php?image=squaremmh...">http://img379.imageshack.us/my.php?image=squaremmh...

    The article is right and you are so very wrong.
  • Barack Obama - Wednesday, August 6, 2008 - link

    Derek and Anand deliver again!
  • KGR - Wednesday, August 6, 2008 - link

    I am not a profeesional about software and hardware that is why maybe this question can sound nonsense .
    If larrabee will have a software renderer and programmed by C++ is it possible that it is not depended on windows? I mean if it doesnt need direct X can we run the games on Linux also??
  • npoe1 - Tuesday, August 5, 2008 - link

    I enjoyed reading this so much. I think that this kind of articles is what Anandtech needs; I usually go to Arstechnica to read things like this one.

    Again, thanks!
  • TrEmEnDo - Tuesday, August 5, 2008 - link

    I am definitely impressed with this new development and I expect that this technology will be disruptive down the road, however I feel that somehow they are about to commit another of their megalomaniac mistakes.
    Has anyone stopped for a sec and look where all gaming industry is heading into? Are PCs the future gaming platform? Maybe I am missing something but aren't the big guys already struggling to retain a 'decent' percentage of the multibillion gaming pie (PC gaming alliance anyone...)? I believe that whether us, tech enthusiast, hardcore pc gamers like it or not, it is the console arena where the big guns are going to be playing in a few years from now.
    Guys, we are seeing this happening everyday, we see tittles appearing and disappearing everyday b/c companies don't want to commit the resources to develop games for more than one or two platforms (normally doing a sloppy work BTW). Now that the grandpas of graphic hardware had manage to get DX/D3D derived engines into the last gen consoles (xenos, RSX) and a terribly inertial and rigid developer community avoids and whines about how difficult is to program for the few hardware 'jewels' that we have already in our hands (Cell/RV770/G200) do you think anyone except Intel is in the mood for yet another graphics industry spin?

    I have no doubt that this new development will have its own niche application or someone will definitely find something appropriate for it, but to say that Larrabee CAN do graphics and to say larrabee will kick ass so bad that in 3 years from now we all will be gaming from a Larrabee containing computer are two very different things.
    Congrats to Intel as the fathers of the creature, and congrats to us to see the tech world moving on....but just don't think this will change the world as we know it.
  • hooflung - Tuesday, August 5, 2008 - link

    They are doing something very AMD like and taking it a step further and tossing in a few Power ideals in. I just wonder what the power profile will look like and who will partner up with Intel for it.

    I am sure they will have 4+ of these cores built into integrated chip sets for OEMs and laptops to really boost those areas. And people who buy laptops will see that they can get a desktop with 'bigger larrabee' and play their games faster than their budget/laptop computer.

    So it does make sense. However, it is an empire made on a lot of ifs. It will be fun to watch. Thanks anandtech for the informative article.

Log in

Don't have an account? Sign up now