Programming for Larrabee

The Larrabee programming model is what sets it apart from the competition. While competing GPU architectures have become increasingly programmable over the years, Larrabee starts from a position of being fully programmable. To the developer, it appears as exactly what it is - an arrangement of fully cache coherent x86 microprocessors. The first iteration of Larrabee will hide this fact from the OS through its graphics driver, but future versions of the chip could conceivably populate task manager just like your desktop x86 cores do today.

You have two options for harnessing the power of Larrabee: writing standard DirectX/OpenGL code, or writing directly to the hardware using Larrabee C/C++, which, as it turns out, is standard C (you can use compilers from MS, Intel, GCC, etc.). In a sense, this is no different than what NVIDIA offers with its GPUs - they will run DirectX/OpenGL code, or they can run C code thanks to CUDA. The difference here is that writing directly to Larrabee gives you some additional programming flexibility, thanks to the GPU being an array of fully functional x86 cores. Programming for x86 architectures is a paradigm the software community as a whole is already used to: there's no learning curve, no new hardware limitations to worry about, and no waiting on additional iterations of CUDA to enable new features. You treat Larrabee like you treat your host CPU.
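
To picture what "treat Larrabee like your host CPU" means in practice, here is a minimal sketch in plain C. OpenMP is used purely for illustration - Intel hasn't detailed a specific threading model here - and the function and buffer names are our own, not part of any Intel toolchain.

```c
/*
 * A minimal sketch of the "treat Larrabee like your host CPU" idea:
 * plain C with an OpenMP pragma, compilable by any x86 compiler
 * (GCC, ICC, MSVC). Nothing here is Larrabee-specific; that's the point.
 * The function and parameter names are illustrative, not an Intel API.
 */
#include <stddef.h>

/* Scale-and-add over a buffer of floats, spread across however many
 * x86 cores the part exposes, exactly as you would on a desktop CPU. */
void scale_and_add(float *dst, const float *src, float a, size_t n)
{
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        dst[i] = a * src[i] + dst[i];
}
```

Build it with or without -fopenmp and it behaves the same, just serial in the latter case; that portability is exactly the "no new hardware limitations" argument.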

Game developers aren't big on learning new tricks, however, especially on an unproven, unreleased hardware platform such as Larrabee. Larrabee must run DirectX/OpenGL code out of the box, and to do this Intel has written its own Larrabee-native software renderer to interface between DX/OGL and the Larrabee hardware.

In AMD/NVIDIA GPUs, DirectX/OpenGL instructions map to an internal GPU instruction set at runtime. With Larrabee, Intel does this mapping in software: DX/OGL calls are handed to its software renderer, which then runs on the Larrabee hardware.
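
To make the idea concrete, here is a purely illustrative sketch of what "mapping in software" looks like: API-level commands end up as calls into ordinary C functions rather than into a hardware command decoder. Every name below is a hypothetical stand-in; Intel's actual renderer interfaces are not public.

```c
/*
 * Illustrative only: a DX/OGL-style command is dispatched to an ordinary
 * C routine that runs on the Larrabee cores, instead of being decoded by
 * fixed-function hardware. All names are hypothetical.
 */
#include <stdio.h>

typedef enum { CMD_CLEAR, CMD_DRAW, CMD_PRESENT, CMD_COUNT } api_cmd_t;
typedef void (*renderer_fn)(void *args);

/* Stubs standing in for the real software renderer's work. */
static void sw_clear(void *args)   { (void)args; puts("clear color/depth in software"); }
static void sw_draw(void *args)    { (void)args; puts("bin and rasterize triangles on x86 cores"); }
static void sw_present(void *args) { (void)args; puts("flush finished tiles to the framebuffer"); }

/* The runtime "mapping": API commands index into plain C functions
 * rather than into a GPU command decoder. */
static const renderer_fn dispatch[CMD_COUNT] = {
    [CMD_CLEAR]   = sw_clear,
    [CMD_DRAW]    = sw_draw,
    [CMD_PRESENT] = sw_present,
};

void submit(api_cmd_t cmd, void *args) { dispatch[cmd](args); }

int main(void)
{
    submit(CMD_CLEAR, NULL);
    submit(CMD_DRAW, NULL);
    submit(CMD_PRESENT, NULL);
    return 0;
}
```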

This intermediate stage should incur a performance penalty, as writing directly to Larrabee is always going to be faster. However, Intel has apparently produced a highly optimized software renderer for Larrabee, one efficient enough that any performance penalty introduced by the intermediate stage is made up for by the reduction in memory bandwidth the software renderer enables (we'll get to how this is possible in a moment).

Developers can also take a hybrid approach to Larrabee development. Larrabee can run standard DX/OGL code, but if there are features developers want that aren't exposed in the current DirectX version, they can simply write those features themselves in Larrabee C/C++.
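
As a rough illustration of that hybrid model (with hypothetical names, since Intel hasn't published an interface for this), most of a frame could go through the standard API path while one custom effect runs as plain C on the cores:

```c
/*
 * Hedged sketch of the hybrid model: the bulk of the frame goes through
 * the standard DX/OGL path, while one effect the current DirectX version
 * doesn't expose is written as plain C and run on the Larrabee cores.
 * All names below are hypothetical, not an actual Intel interface.
 */
#include <stddef.h>
#include <stdint.h>

/* Stub standing in for whatever the standard API path renders. */
static void render_scene_via_d3d(void) { /* normal DX/OGL draw calls */ }

/* A custom effect written in ordinary C, e.g. a post-process the fixed
 * API doesn't offer yet. Here it just forces the alpha channel opaque. */
static void custom_postprocess(uint32_t *framebuffer, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++)
        framebuffer[i] |= 0xFF000000u;
}

void render_frame(uint32_t *framebuffer, size_t pixels)
{
    render_scene_via_d3d();                  /* standard DX/OGL portion */
    custom_postprocess(framebuffer, pixels); /* Larrabee C/C++ portion  */
}
```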

Without hardware it's difficult to tell exactly how well Larrabee will run DirectX/OpenGL code, but Intel knows it must run current games very well in order to make this GPU a success.

Comments

  • phaxmohdem - Monday, August 4, 2008 - link

    Can your mom play Crysis? *burn*
  • JonnyDough - Monday, August 4, 2008 - link

    I suppose she could but I don't think she would want to. Why do you care anyway? Have some sort of weird fetish with moms playing video games or are you just looking for another woman to relate to?

    Ooooh, burn!
  • Griswold - Monday, August 4, 2008 - link

    He is looking for the one playing his mom, I think.
  • bigboxes - Monday, August 4, 2008 - link

    Yup. He worded it incorrectly. It should have read, "but can it play your mom?" :p
  • Tilmitt - Monday, August 4, 2008 - link

    I'm really disappointed that Intel isn't building a regular GPU. I doubt that bolting a load of unoptimised x86 cores together is going to be able to perform anywhere near as well as a GPU built from the ground up to accelerate graphics, given equal die sizes.
  • JKflipflop98 - Monday, August 4, 2008 - link

    WTF? Did you read the article?
  • Zoomer - Sunday, August 10, 2008 - link

    He had a point. More programmable == more transistors. Can't escape from that fact.

    Given equal number of transistors, running the same program, a more programmable solution will always be crushed by fixed function processors.
  • JonnyDough - Monday, August 4, 2008 - link

    I was wondering that too. This is obviously a push towards a smaller Centrino type package. Imagine a powerful CPU that can push graphics too. At some point this will save a lot of battery juice in a notebook computer, along with space. It may not be able to play games, but I'm pretty sure it will make for some great basic laptops someday that can run video. Not all college kids and overseas marines want to play video games. Some just want to watch clips of their family back home.
  • rudolphna - Monday, August 4, 2008 - link

    As interesting and cool as this sounds, this is even more bad news for AMD, who was finally making up for lost ground. Granted, it's still probably 2 years away, and hopefully AMD will be back to its old self (Athlon64 era). They are finally getting products that can actually compete. Another challenger, especially from its biggest rival - Intel - cannot be good for them.
  • bigboxes - Monday, August 4, 2008 - link

    What are you talking about? It's been nothing but good news for AMD lately. Sure, let Intel sink a lot of $$ into graphics. Sounds like a win for AMD (in a roundabout way). It's like AMD investing into a graphics maker (ATI) instead of concentrating on what makes them great. Most of the Intel supporters were all over AMD for making that decision. Turn this around and watch Intel invest heavily into graphics and it's a grand slam. I guess it's all about perspective. :)
