Programming for Larrabee

The Larrabee programming model is what sets it apart from the competition. While competing GPU architectures have become increasingly programmable over the years, Larrabee starts from a position of being fully programmable. To the developer, it appears as exactly what it is - an arrangement of fully cache coherent x86 microprocessors. The first iteration of Larrabee will hide this fact from the OS through its graphics driver, but future versions of the chip could conceivably populate task manager just like your desktop x86 cores do today.

You have two options for harnessing the power of Larrabee: writing standard DirectX/OpenGL code, or writing directly to the hardware using Larrabee C/C++, which as it turns out is standard C (you can use compilers from MS, Intel, GCC, etc...). In a sense, this is no different than what NVIDIA offers with its GPUs - they will run DirectX/OpenGL code, or they can also run C-code thanks to CUDA. The difference here is that writing directly to Larrabee gives you some additional programming flexibility thanks to the GPU being an array of fully functional x86 GPUs. Programming for x86 architectures is a paradigm that the software community as a whole is used to, there's no learning curve, no new hardware limitations to worry about and no waiting on additional iterations of CUDA to enable new features. You treat Larrabee like you treat your host CPU.

Game developers aren't big on learning new tricks however, especially on an unproven, unreleased hardware platform such as Larrabee. Larrabee must run DirectX/OpenGL code out of the box, and to do this Intel has written its own Larrabee native software renderer to interface between DX/OGL and the Larrabee hardware.

In AMD/NVIDIA GPUs, DirectX/OpenGL instructions map to an internal GPU instruction set at runtime. With Larrabee Intel does this mapping in software, taking DX/OGL instructions, mapping them to its software renderer, and then running its software renderer on the Larrabee hardware.

This intermediate stage should incur a performance penalty, as writing directly to Larrabee is always going to be faster. However Intel has apparently produced a highly optimized software renderer for Larrabee, once that's efficient enough so that any performance penalty introduced by the intermediate stage is made up for by the reduction of memory bandwidth enabled by the software renderer (we'll get to how this is possible in a moment).

Developers can also use a hybrid approach to Larrabee development. Larrabee can run standard DX/OGL code but if there are features developers want to implement that aren't enabled in the current DirectX version, they can simply write those features that they want in Larrabee C/C++.

Without hardware it's difficult to tell exactly how well Larrabee will run DirectX/OpenGL code, but Intel knows it must succeed on running current games very well in order to make this GPU a success.

Cache and Memory Hierarchy: Architected for Low Latency Operation A Tribute to Michael Abrash: The ISA
Comments Locked

101 Comments

View All Comments

  • Shinei - Monday, August 4, 2008 - link

    Some competition might do nVidia good--if Larrabee manages to outperform nvidia, you know nvidia will go berserk and release another hammer like the NV40 after R3x0 spanked them for a year.

    Maybe we'll start seeing those price/performance gains we've been spoiled with until ATI/AMD decided to stop being competitive.

    Overall, this can only mean good things, even if Larrabee itself ultimately fails.
  • Griswold - Monday, August 4, 2008 - link

    Wake-up call dumbo. AMD just started to mop the floor with nvidias products as far as price/performance goes.
  • watersb - Monday, August 4, 2008 - link

    great article!

    You compare the Larrabee to a Core 2 duo - for SIMD instructions, you multiplied by a (hypothetical) 10 cores to show Larrabee at 160 SIMD instructions per clock (IPC). But you show non-vector IPC as 2.

    For a 10-core Larrabee, shouldn't that be x10 as well? For 20 scalar IPC
  • Adamv1 - Monday, August 4, 2008 - link

    I know Intel has been working on Ray Tracing and I'm really curious how this is going to fit into the picture.

    From what i remember Ray Tracing is a highly parallel and scales quite well with more cores and they were talking about introducing it on 8 core processors, it seems to me this would be a great platform to try it on.
  • SuperGee - Thursday, August 7, 2008 - link

    How it fit's.
    GPU from ATI and nV are called HArdware renderers. Stil a lot of fixed funtion. Rops TMU blender rasterizer etc. And unified shader are on the evolution to get more general purpouse. But they aren't fully GP.
    This larrabee a exotic X86 massive multi core. Will act as just like a Multicore CPU. But optimised for GPU task and deployed as GPU.
    So iNTel use a Software renderer and wil first emulate DirectX/OpenGL on it with its drivers.
    Like nv ATI is more HAL with as backup HEL
    Where Larrabee is pure HEL. But it's parralel power wil boost Software method as it is just like a large bunch of X86 cores.
    HEL wil runs fast, as if it was 'HAL' with LArrabee. Because the software computing power for such task are avaible with it.

    What this means is that as a GFX engine developer you got full freedom if you going to use larrabee directly.

    Like they say first with a DirectX/openGL driver. Later with also a CPU driver where it can be easy target directly. thus like GPGPU task. but larrabee could pop up as extra cores in windows.
    This means, because whatever you do is like a software solution.
    You can make a software rendere on Ratracing method, but also a Voxel engine could be done to. But this software rendere will be accelerated bij the larrabee massive multicore CPU with could do GPU stuf also very good. But will boost any software renderer. Offcourse it must be full optimised for larrabee to get the most out of it. using those vector units and X86 larrabee extention.

    Novalogic could use this to, for there Voxel game engine back in the day's of PIII.

    It could accelerate any software renderer wich depend heavily on parralel computing.
  • icrf - Monday, August 4, 2008 - link

    Since I don't play many games anymore, that aspect of Larrabee doesn't interest me any more than making economies of scale so I can buy one cheap. I'm very interested in seeing how well something like POV-Ray or an H.264 encoder can be implemented, and what kind of speed increase it'd see. Sure, these things could be implemented on current GPUs through Cuda/CTM, but that's such an different kind of task, it's not at all quick or easy. If it's significantly simpler, we'd actually see software sooner that supports it.
  • cyberserf - Monday, August 4, 2008 - link

    one word: MATROX
  • Guuts - Monday, August 4, 2008 - link

    You're going to have to use more than one word, sorry... I have no idea what in this article has anything to do with Matrox.
  • phaxmohdem - Monday, August 4, 2008 - link

    What you mean you DON'T have a Parhelia card in your PC? WTF is wrong with you?
  • TonyB - Monday, August 4, 2008 - link

    but can it play crysis?!

Log in

Don't have an account? Sign up now