Programming for Larrabee

The Larrabee programming model is what sets it apart from the competition. While competing GPU architectures have become increasingly programmable over the years, Larrabee starts from a position of being fully programmable. To the developer, it appears as exactly what it is - an arrangement of fully cache coherent x86 microprocessors. The first iteration of Larrabee will hide this fact from the OS through its graphics driver, but future versions of the chip could conceivably show up in Task Manager just like your desktop x86 cores do today.

You have two options for harnessing the power of Larrabee: writing standard DirectX/OpenGL code, or writing directly to the hardware using Larrabee C/C++, which as it turns out is standard C (you can use compilers from MS, Intel, GCC, etc...). In a sense, this is no different from what NVIDIA offers with its GPUs - they will run DirectX/OpenGL code, or they can run C code thanks to CUDA. The difference here is that writing directly to Larrabee gives you some additional programming flexibility thanks to the GPU being an array of fully functional x86 cores. Programming for x86 architectures is a paradigm that the software community as a whole is used to: there's no learning curve, no new hardware limitations to worry about and no waiting on additional iterations of CUDA to enable new features. You treat Larrabee like you treat your host CPU.
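To make "you treat Larrabee like you treat your host CPU" a little more concrete, here is a minimal sketch of the kind of ordinary multithreaded C++ that runs on any multi-core x86 machine today. It is not Larrabee code and uses no Larrabee-specific toolchain or API (none has been shown publicly); the function name `saxpy_parallel` and the one-chunk-per-thread split are our own assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not from Intel): computes y[i] = a * x[i] + y[i],
// splitting the index range into contiguous chunks, one per thread.
void saxpy_parallel(float a, const std::vector<float>& x,
                    std::vector<float>& y, unsigned num_threads) {
    assert(x.size() == y.size());
    if (num_threads == 0) num_threads = 1;
    const std::size_t n = x.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // no work left for further threads
        // Each thread writes only its own slice of y, so no locking is needed.
        workers.emplace_back([a, &x, &y, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                y[i] = a * x[i] + y[i];
        });
    }
    for (auto& w : workers) w.join();
}
```

Calling `saxpy_parallel(3.0f, x, y, 4)` updates `y` in place using up to four threads. The point is only that nothing here is GPU-specific: threads, shared memory and plain loops are the whole programming model.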

Game developers aren't big on learning new tricks, however, especially on an unproven, unreleased hardware platform such as Larrabee. Larrabee must run DirectX/OpenGL code out of the box, and to do this Intel has written its own Larrabee-native software renderer to interface between DX/OGL and the Larrabee hardware.

In AMD/NVIDIA GPUs, DirectX/OpenGL instructions map to an internal GPU instruction set at runtime. With Larrabee, Intel does this mapping in software: it takes DX/OGL instructions, maps them to its software renderer, and then runs that software renderer on the Larrabee hardware.
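For readers wondering what "doing it in software" looks like, here is a tiny sketch of one pipeline stage, triangle rasterization, written as plain C++. This is a textbook edge-function coverage test, not Intel's renderer; the names `Point`, `edge` and `rasterize` are hypothetical, and a real renderer would add tiling, vectorization and sub-pixel precision.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { int x, y; };

// Twice the signed area of triangle (a, b, p). The sign tells us on which
// side of the edge a->b the point p lies.
long edge(Point a, Point b, Point p) {
    return static_cast<long>(b.x - a.x) * (p.y - a.y) -
           static_cast<long>(b.y - a.y) * (p.x - a.x);
}

// Returns a width*height coverage mask: 1 where the pixel (sampled at its
// integer coordinates) lies inside or on the triangle, 0 elsewhere.
std::vector<unsigned char> rasterize(Point v0, Point v1, Point v2,
                                     int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<unsigned char> mask(static_cast<std::size_t>(width) * height, 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const Point p{x, y};
            const long w0 = edge(v1, v2, p);
            const long w1 = edge(v2, v0, p);
            const long w2 = edge(v0, v1, p);
            // Accept either winding order: all three signs must agree.
            const bool inside = (w0 >= 0 && w1 >= 0 && w2 >= 0) ||
                                (w0 <= 0 && w1 <= 0 && w2 <= 0);
            if (inside) mask[static_cast<std::size_t>(y) * width + x] = 1;
        }
    }
    return mask;
}
```

On a conventional GPU this test lives in fixed-function hardware. In a software renderer it is just another loop, which is why it can be changed or replaced without new silicon.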

This intermediate stage should incur a performance penalty, as writing directly to Larrabee is always going to be faster. However, Intel has apparently produced a highly optimized software renderer for Larrabee, one that's efficient enough that any performance penalty introduced by the intermediate stage is made up for by the reduction in memory bandwidth enabled by the software renderer (we'll get to how this is possible in a moment).

Developers can also use a hybrid approach to Larrabee development. Larrabee can run standard DX/OGL code, but if there are features developers want to implement that aren't enabled in the current DirectX version, they can simply write those features in Larrabee C/C++.

Without hardware it's difficult to tell exactly how well Larrabee will run DirectX/OpenGL code, but Intel knows it must run current games very well in order to make this GPU a success.

101 Comments

  • erikespo - Monday, August 4, 2008 - link

    http://en.wikipedia.org/wiki/Square_%28geometry%29

    helpful page to take you back to first grade

    and excuse my decimal point.. it is 204.49mm total per core or 14.3mm^2
  • erikespo - Monday, August 4, 2008 - link

    Explain.

    lets use smaller numbers for you 2mm^2 is 2mm by 2 mm or 4 total mm

    double that and it is 4mm^2 or 4 mm by 4 mm or 16mm total..

    we are talking about area or 2 dimensions not 1 dimension.

    Same math applies to the article
  • MamiyaOtaru - Monday, August 4, 2008 - link

    No, you're way off. 2mm² is TWO square millimeters. (a rectangle 1x2 for example). Double that would be 4mm², which could either be 1x4 or 2x2.

    NUMBERmm² doesn't mean NUMBERxNUMBER mm, it means exactly what it says: NUMBER mm².

    Using your smaller numbers: 2mm² is not "4 total mm"; it is TWO mm². Saying it is 4 total mm doesn't even make sense. You _can't_ measure area in millimeters. You measure it in square millimeters, and there are two of them (_2_mm²).

    Here's an mspaint visual (if links work): http://img105.imageshack.us/my.php?image=squaremma...

    You're so sure you're right on this, it's really depressing :(
  • darkequitus - Monday, August 4, 2008 - link

    I did not appreciate the writer creaming over every digital page they wrote, especially when Larrabee's performance is mainly, at the moment, based on Intel hype and nothing real.
  • ZootyGray - Monday, August 4, 2008 - link

    THANK YOU.

    Somebody finally said it.

    The others prefer Eutopian illusion - aka the curse aka ntel antitrust. ntel has no grafx and the fools in the public buy "inside' and nvid and ati aren't exactly friends of the curse.

    welcome to the matrix. wakey wakey
  • ZootyGray - Monday, August 4, 2008 - link

    and a 16 pager on maybe might could be should be = wannabe "employ-boy"
    - payday ? hooyeh. This is so disappointing for me. Credibility sags to a new low.
  • strikeback03 - Tuesday, August 5, 2008 - link

    Someone whose two posts contain about 10 complete words and no complete thoughts says Anandtech's credibility has sagged to a new low?
  • ZootyGray - Tuesday, August 5, 2008 - link

    haha yeh - lots of room for thinking.
    or - if no thinkeez - ya gots der 16 pg inundation (that's a big word like marmalade) all based on nothing-is-real - you like that kind of brainwash? we don't know anything; but here's the tekspex?
    btw - did u get it? the matrix idea? watch the movie. cos here it is. pardon my loaded cryptic literacy.
    thx
    if you don't get it - well, that's what they want - a world of sleeping mob. never mind, that's just my concern.

  • The Preacher - Monday, August 4, 2008 - link

    I don't really care about how good it will be at executing some software renderer, but I feel it is going to kick ass in scientific calculations. Matrix operations, FFT/convolution, tremendous bandwidth, double precision... I may write C++/x86 assembly code directly for it and I may put this into a rack of servers and use it through MPI. Give me a compiler with vector intrinsic functions for it and my dreams just came true! :)
  • elerick - Monday, August 4, 2008 - link

    I have been a daily reader of another hardware review site for years. I read nearly every article that headlines and find many of them quite lacking. Today I got wind of your review of Larrabee. It was very well written and presented an amazing amount of tech knowledge that isn't commonly covered. I'm glad to have found this site, and I never create accounts, but today I felt obligated to. Great work.

    PS: any news on that AMD / Fusion? or is that just them being intimidated by Intel's Larrabee?
