Building an Optimized Rasterizer for Larrabee

We've touched on the latency focus. We talked about caches and internal memory buses. But what about external memory? To be honest, we don't know. But we have an idea of the direction Intel wants to move in: the goal seems to be lower external bandwidth, and possibly a smaller framebuffer, than traditional hardware. If Intel can maintain good performance, reducing the amount of memory and the number of traces on the board will lower costs for add-in card vendors who may want to sell cards based on Larrabee (which in turn could lower prices for end users).

This bit of speculation isn't just based on what we know about the hardware so far. It's also based on the direction Intel decided to take with its rasterizer: Intel is implementing a tile-based rasterizer to support DirectX and OpenGL as well as its own software renderer. Speaking of that software renderer, Intel did state that it would be available to developers so that they don't have to start from nothing. When we asked whether it would be available only as a set of binaries or as source, we were told that this was still under discussion. We put in our two cents and suggested that distributing the source is the way to go.

Anyway, we haven't discussed tile-based rasterization in quite a while on AnandTech, as the Kyro line didn't stick around on the desktop. To briefly run it down, screen space is broken up into tiles, and each primitive (triangle) is set aside, or binned, for every tile it overlaps. Fragments for a tile are then generated from all of the geometry binned to it. Since none of these fragments are further processed or shaded until the entire tile is finished, only visible fragments are sent on to be shaded (at least, this is how it used to be: some aspects of DX10+ may require occluded fragments to hang around in some cases). Occluded fragments are thrown out during rasterization. Intel also supports Z culling at the geometry, fragment and pixel levels, which is especially useful because rasterization, blending and so on must also be done in software. Cutting down work at every possible point is the modus operandi of optimizing graphics.
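
The binning step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: it bins conservatively by each triangle's screen-space bounding box (a real binner, Intel's included, may test exact triangle/tile overlap), and the Triangle type and bin_triangles function are hypothetical names, not part of any Intel API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Triangle:
    # Screen-space vertex positions in pixels, as (x, y) pairs.
    v0: tuple
    v1: tuple
    v2: tuple


def bin_triangles(triangles, tile_size, width, height):
    """Map (tile_x, tile_y) -> list of indices of triangles binned to that tile.

    Conservative binning: a triangle goes into every tile that its
    screen-space bounding box overlaps, clamped to the screen.
    """
    bins = defaultdict(list)
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    for i, tri in enumerate(triangles):
        xs = (tri.v0[0], tri.v1[0], tri.v2[0])
        ys = (tri.v0[1], tri.v1[1], tri.v2[1])
        # Convert the bounding box into an inclusive range of tile indices.
        tx0 = max(int(min(xs)) // tile_size, 0)
        tx1 = min(int(max(xs)) // tile_size, tiles_x - 1)
        ty0 = max(int(min(ys)) // tile_size, 0)
        ty1 = min(int(max(ys)) // tile_size, tiles_y - 1)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(i)
    return dict(bins)
```

With 64-pixel tiles on a 256×256 screen, a small triangle near the origin lands only in tile (0, 0), while one straddling the corner at (64, 64) is binned into four tiles. Each tile's list can then be rasterized and depth-resolved on its own before anything is shaded.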

This is in stark contrast to immediate mode renderers, which are what ATI and NVIDIA have been building for the past decade. Immediate mode rendering requires more memory bandwidth because it processes every fragment in the scene, sometimes including fragments that aren't visible (those that can't easily be thrown out by pre-shading depth test techniques). Immediate mode renderers have some tricks for determining which fragments will be visible in order to cut down on work, but there are still cases where the GPU does unnecessary work because the fragment it is processing and shading isn't even visible in the scene. So while immediate mode renderers need more memory bandwidth than tile-based renderers, some algorithms and features have been easier to implement with them.
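
The difference in wasted work can be illustrated by counting shader invocations for a single pixel. This toy model assumes opaque geometry, a less-than depth test and one fragment per surface; it ignores blending and the DX10+ cases noted above, and both function names are hypothetical.

```python
def immediate_mode_shades(depths):
    """Count shader invocations for one pixel with an early depth test.

    Fragments arrive in submission order. Each one nearer than the current
    depth-buffer value is shaded right away, even if a later fragment
    ends up covering it.
    """
    nearest = float("inf")
    shaded = 0
    for z in depths:
        if z < nearest:
            nearest = z
            shaded += 1
    return shaded


def deferred_shades(depths):
    """Count shader invocations when visibility is resolved before shading.

    A tile-based deferred renderer finds the nearest opaque fragment for
    the pixel first, so at most one fragment is shaded.
    """
    return 1 if depths else 0
```

For a pixel covered by four opaque surfaces drawn back to front, immediate_mode_shades([4.0, 3.0, 2.0, 1.0]) counts 4 invocations while deferred_shades counts 1. Drawn front to back, the immediate mode count also drops to 1, which is why sorting and early depth tricks help immediate mode renderers but can't always save them.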

STMicro had a short run of popular tile-based (or deferred) renderers in the early 2000s with the Kyro series. This style of rendering still lives on in cell phones, smart phones and other ultra low power devices that need graphics. Performance on this hardware is very low, but memory efficiency is important in this space, so tile-based renderers are preferred.

The technique dropped out of the desktop space not because it was inherently unable to perform, but simply because the players that won out in that era didn't choose to use it. With smaller process technology, larger on-die caches, larger tile sizes, and smaller geometry (meaning fewer triangles span multiple tiles), some advantages of tile-based rendering have gotten ... well, more advantageous.
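
The point about smaller geometry can be made concrete with an idealized model (our assumption, not an Intel figure): treat a triangle's screen-space bounding box as an s × s pixel square dropped at a uniformly random position on a grid of T × T pixel tiles. Along one axis, a span of length s touches 1 + s/T tiles on average, and the two axes are independent, so the expected number of tiles the box touches is:

```latex
\mathbb{E}[\text{tiles touched}] = \left(1 + \frac{s}{T}\right)^{2},
\qquad
\left(1 + \tfrac{8}{64}\right)^{2} \approx 1.27,
\qquad
\left(1 + \tfrac{64}{64}\right)^{2} = 4
```

With hypothetical 64-pixel tiles, an 8-pixel triangle is binned to about 1.27 tiles on average, while a 64-pixel one lands in 4. Since a bounding box overestimates the area a thin triangle actually covers, this is an upper-bound style estimate.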

Getting into the details of tile-based rendering is a bit beyond where we want to go right now. But the point is that this technique results in fewer occluded fragments being shaded. Additionally, grouping fragments into tiles helps break up the workload and could help optimize prefetching and caching so that fragments are only ever fetched once from external memory (tiles on Larrabee will fit into less than half the L2 space per core). These and other features help reduce bandwidth needs compared to immediate mode renderers.
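
The cache budget is simple arithmetic: a tile's on-chip footprint is its pixel count times samples per pixel times bytes per sample. The sketch below assumes a 256 KB L2 per core, 32-bit color and 32-bit depth/stencil; those figures and the tile configurations are assumptions for illustration, not Intel's stated tile format.

```python
# Assumed per-core L2 size for this example; treat it as a parameter.
L2_BYTES_PER_CORE = 256 * 1024


def tile_footprint(width, height, color_bytes=4, depth_bytes=4, samples=1):
    """Bytes needed to hold one tile's color and depth buffers on chip."""
    return width * height * samples * (color_bytes + depth_bytes)


if __name__ == "__main__":
    # Hypothetical tile configurations: (width, height, samples per pixel).
    for w, h, samples in [(64, 64, 1), (64, 64, 4), (128, 128, 1)]:
        size = tile_footprint(w, h, samples=samples)
        share = size / L2_BYTES_PER_CORE
        print(f"{w}x{h}, {samples}x AA: {size // 1024} KB ({share:.1%} of L2)")
```

Under these assumptions a 64×64 tile needs 32 KB, an eighth of the L2, while quadrupling the sample count or doubling both tile dimensions brings it to exactly half. Tile size and pixel format therefore trade off directly against the "less than half the L2" budget.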

Looking a little deeper, it is both the burden and the advantage of Larrabee that it implements all steps of the traditional graphics pipeline in software. While current GPUs have hardware for geometry setup, rasterization, texturing, filtering, compression, decompression, blending and much more, Larrabee maintains a minimum of fixed function features (related to texturing). For a specific purpose, fixed function hardware can often be more efficient and faster than general purpose hardware. But the needs of individual games differ, and being able to allocate more or fewer resources to a specific stage of the rendering pipeline is an advantage over fixed function hardware. Current GPUs can't shift resources to offer faster rasterization if needed. They can't devote more flops to speeding up stenciling or blending.
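
The ability to rebalance work between stages can be illustrated with a toy scheduler. The allocate_cores function below is hypothetical and is not how Intel has said its renderer divides work; it simply splits a fixed pool of cores across pipeline stages in proportion to each stage's measured cost, the kind of shift that fixed function hardware can't make.

```python
import math


def allocate_cores(workload, n_cores):
    """Split n_cores across pipeline stages in proportion to their workload.

    workload maps stage name -> measured cost (any consistent unit).
    Largest-remainder rounding guarantees the shares sum to n_cores.
    """
    total = sum(workload.values())
    if total <= 0:
        raise ValueError("workload must have a positive total")
    exact = {stage: cost * n_cores / total for stage, cost in workload.items()}
    shares = {stage: math.floor(x) for stage, x in exact.items()}
    leftover = n_cores - sum(shares.values())
    # Give the remaining cores to the stages with the largest fractional parts;
    # ties keep the stages' original order because sorted() is stable.
    by_remainder = sorted(exact, key=lambda s: exact[s] - shares[s], reverse=True)
    for stage in by_remainder[:leftover]:
        shares[stage] += 1
    return shares
```

A shading-heavy workload such as {"raster": 1, "shade": 3} on 8 cores yields {"raster": 2, "shade": 6}; if the next frame is rasterization-heavy, the same general purpose cores are simply reassigned rather than sitting idle.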

The flexibility of Larrabee allows it to best fit any game running on it. But keep in mind that just because software has greater potential to utilize the hardware well, that doesn't mean we will see better performance than what is currently available. The burden is still on Intel to build a part that offers real-world performance matching or exceeding that of current GPUs. Efficiency and adaptability are irrelevant if real performance isn't there to back it up.

101 Comments

  • iop3u2 - Monday, August 4, 2008

    First of all it's called d3d not directx.

    Secondly you seem to imply that direct3d/opengl will cease to exist at some point if larrabee succeeds. I think you don't quite get what they are. They are APIs. Larrabee won't make programming APIless. Are you serious anand or what?
  • The Preacher - Tuesday, August 5, 2008

    It could make programming D3D/OpenGL-less for programs/PCs that exploit Larrabee. And if the share of such programs/PCs increases, the share of competing solutions logically decreases and might eventually vanish (although not anytime soon).
  • iop3u2 - Tuesday, August 5, 2008

    Just because you can, for example, write a C program without the C lib, it doesn't mean that people follow that road. It's all about what programmers will choose to do.

    Also, even if they do vanish there will still be a need for an api. So there will either be a new api or they won't vanish. Both situations make no difference whatsoever to the fact that larrabee will always need api implementations.
  • ZootyGray - Tuesday, August 5, 2008

    right - and I will put hotels on boardwalk and park place :)

    I used to own an 815 chipset - it was like version 14 or whatever so it didn't suk as bad as some of the earlier ones - but it did blow up - I think pixelated FarCry and Doom3 really killed it. But o sure, the software fixes and bubblegum patches made it good, for a while. I really do think I am going to wait for this just so I can watch the lineups of returns - or read the funny forum posts of sheep seeking help - baaaahaha :) The best part is that it doesn't exist - delay, postpone - kinda like the 64bit chip also. Maybe later, maybe. But the ads invade the living room.
    Make sure you keep yer getouttajailfree card - receipt.
    Ummm let's see: I think I will buy this one!

    Reality is that 4870x2 is on deck. Not 'rumour and sigh'. I just know there will be a 16page article on that - not!
  • Pok3R - Monday, August 4, 2008

    Larrabee means good news for consumers, and definitely bad news for nvidia. Maybe the worst in decades...with AMD and Ati having enough human resources now to face it, and Nvidia having nothing but bad policies and falling stocks despite good $elling numbers...

    The future, today, is definitely Intel vs AMD/Ati.
  • initialised - Monday, August 4, 2008

    A miniature render farm (you know, like the ones used to make films like Hulk and WALL-E) on a chip. Let's hope AMD and nVidia can keep up.
  • ZootyGray - Monday, August 4, 2008

    Really? Guess again. There is NOT anything to keep up to.

    I do not accept that the grafx loser in the industry is going to simply become numero uno overnight.

    You really think that nvidia and ati have been sleeping for decades?

    Supporting the destruction of ntel's only competitors leaves us at the mercy of a group that's already been busted for monop and antitrst.

    Well written article? Of course, but I think it's like you are all fished in on many fronts. Nothing is really known except spin. This is beachfront property in the desert.

    There's nothing to watch except what we usually watch - released hardware benchmarks.

    I tell you AMD is going to be the cpu of choice in a few months when the truth about the bias in the benchies is revealed. And try - try real hard - to imagine ati+amd creating the ultimate cpu+gpu powerhouse. ntel needs this hype because I am not the only one with vision here. they are rich and scared, for now.

    but such talk seems to be frowned upon - so let's all cheer for the best grafx manufacturer - ntel = kkaakk! sorry to offend, so many of you just might be lost in the paid mob. so just watch and you will see for yourself- no need to believe me. I really know almost nothing - but I am free to see for myself. sorry to offend - I just can't cosign bs. but that's just me and a very few other posters here who have also been criticized. watch and see for yourself. watch...
  • Mr Roboto - Monday, August 4, 2008

    I'd have to agree with the skeptics here. While the article is well written and informative (What AnandTech articles aren't?) it's purely speculation that Intel can get all of the variables right. How does a company that hasn't made a competitive GPU since the days of the 486 suddenly jump to Nvidia and ATI GPU type levels on their first try, never mind surpassing them. It's absolutely absurd to think that these chips are going to replace GPU's in terms of performance. I believe Larrabee will kick the shit out of Intel's own IGP but then again that's not much of a feat.

    Again I have to agree with previous posters that Intel just isn't that innovative. Even as I speak there are many lawsuits pending against Intel, most of them having to do with accusations of stolen IP that were used to design the Core2Duo. Antitrust suits aside, it's clear that Intel is similar to MS in that they just bully, bribe or outright steal to get ahead then pay whatever fines are levied because in the end they can never fine them enough to not make it worthwhile for Intel or MS to break the law.

    The 65nm Core2Duo is amazing. The 45nm E8400 I just bought is even more so. However the more I think about Intel's past failures as well as how they operate as a company the more far fetched this whole thing becomes.

    IMO they should have tried to compete in the dedicated GPU market before trying something like this. From a purely marketing standpoint Intel and graphics just don't go together. To come into a new field in which they are unproven (I would bet Intel executives believe that building IGPs has somehow given them experience) and make outrageous claims, such as that the GPU is dead and Intel will now be the leader, is absurd.
  • JarredWalton - Tuesday, August 5, 2008

    I think a lot of you are missing the point that we fully understand this is all on paper and what remains to be seen is how it actually pans out in practice. Without the necessary drivers to run DirectX and OpenGL at high performance, this will fail. How many times was that mentioned? At least two or three.

    Now, the other thing to consider is that in terms of complexity, a modern Core 2 core is far more complex to design than any of the GPUs out there. You have all sorts of general functions that need to be coded. A GPU core these days consists of a relatively simple core that you then repeat 4, 8, 16, 32, etc. times. Intel is doing exactly that with Larrabee. They went back to a simple x86 core and tacked on some serious vector processing power. Sounds a lot like NVIDIA's SP or ATI's SPU really.

    Fundamentally, they have what is necessary to make this work, and all that remains is to see if they can pull off the software side. That's a big IF, but then Intel is a big company. We have reached the point where GPUs and CPUs are merging - CUDA and GPGPU aim to do just that in some ways - so for Intel to start at the CPU side and move towards a GPU is no less valid an approach than NVIDIA/ATI starting at GPUs and moving towards general purpose CPUs.
  • Midwayman - Monday, August 4, 2008

    I'm not interested in the graphics so much. It may or may not compete with the top-end nvidia chips if released on time. What is more interesting is whether this can easily be integrated as a general purpose cpu for non-graphics work. Imagine getting a benefit out of your gpu 100% of the time, not just when you're gaming. I know it's possible to use more modern GPUs this way if you code specifically for them, but with its x86 architecture, it might be able to do it without having apps specifically coded for it.
