Intel's Larrabee Architecture Disclosure: A Calculated First Move


by Anand Lal Shimpi & Derek Wilson on August 4, 2008 12:00 AM EST


Programming for Larrabee

The Larrabee programming model is what sets it apart from the competition. While competing GPU architectures have become increasingly programmable over the years, Larrabee starts from a position of being fully programmable. To the developer, it appears as exactly what it is: an arrangement of fully cache-coherent x86 microprocessors. The first iteration of Larrabee will hide this fact from the OS through its graphics driver, but future versions of the chip could conceivably show up in Task Manager just like your desktop x86 cores do today.

You have two options for harnessing the power of Larrabee: writing standard DirectX/OpenGL code, or writing directly to the hardware using Larrabee C/C++, which as it turns out is standard C/C++ (you can use compilers from Microsoft, Intel, GCC, etc.). In a sense, this is no different from what NVIDIA offers with its GPUs: they will run DirectX/OpenGL code, or they can run C code thanks to CUDA. The difference is that writing directly to Larrabee gives you additional programming flexibility, because the GPU is an array of fully functional x86 cores. Programming for x86 is a paradigm the software community as a whole already knows: there's no learning curve, no new hardware limitations to worry about, and no waiting on further iterations of CUDA to enable new features. You treat Larrabee like you treat your host CPU.

Game developers aren't big on learning new tricks, however, especially on an unproven, unreleased hardware platform such as Larrabee. Larrabee must run DirectX/OpenGL code out of the box, and to do this Intel has written its own native Larrabee software renderer that sits between DX/OGL and the Larrabee hardware.

On AMD and NVIDIA GPUs, DirectX/OpenGL instructions are mapped to an internal GPU instruction set at runtime. With Larrabee, Intel does this mapping in software: it takes DX/OGL instructions, maps them to its software renderer, and then runs that software renderer on the Larrabee hardware.

This intermediate stage should incur a performance penalty, since writing directly to Larrabee is always going to be faster. However, Intel has apparently produced a highly optimized software renderer for Larrabee, one efficient enough that any performance penalty introduced by the intermediate stage is made up for by the reduction in memory bandwidth the renderer enables (we'll get to how this is possible in a moment).

Developers can also take a hybrid approach to Larrabee development. Larrabee can run standard DX/OGL code, but if developers want features that aren't enabled in the current DirectX version, they can simply write those features in Larrabee C/C++.

Without hardware it's difficult to tell exactly how well Larrabee will run DirectX/OpenGL code, but Intel knows this GPU can only succeed if it runs current games very well.

101 Comments


ocyl - Monday, August 4, 2008
Larrabee will be shipped when Diablo III is, and it will mark the beginning of the end for DirectX.

Calling it first here at AnandTech.

Thanks go to Anand and Derek for the very well written article. You are the ones who keep tech journalism alive.
erikespo - Monday, August 4, 2008
"At 143 mm^2, Intel could fit 10 Larrabee-like cores so let's double that. Now we're at 286mm^2 (still smaller than GT200 and about the size of AMD's RV770) and 20-cores. Double that once more and we've got 40-cores and have a 572mm^2 die, virtually the same size as NVIDIA's GT200 but on a 65nm process. "

this math is way off

143 mm^2 is 20449mm.. if they fit 10 there that is 2044.9 per core
286mm^2 is 81796mm.. that is 4X the space so 40 cores in 286^2
and 572mm^2 is 327184mm is 160 cores..

double length will double area.. doubling length and width will quadruple area.
bauerbrazil - Monday, August 4, 2008
Hahahaha, YOUR math is way off!!!

Jesus.
erikespo - Monday, August 4, 2008
I see where the article and you got your math..
you both did 143mm^2 / 10 and got 14.3 then divided 286^2 by 14.3 and got 20.. this math is only acting on the one number..

I know this because the area of 14.3 is 204.49 mm. 10 of those would be 2044.9mm. but the area of 143mm^2 is 20449mm.
WeaselITB - Monday, August 4, 2008
Wow ... No.
143mm^2 is NOT equivalent to 143^2 mm ... Your analysis is flawed.

If we use your example, 2mm^2 is NOT 2mm x 2mm ... it's actually root(2)mm x root(2)mm ... 4mm^2 is 2mm x 2mm, not 4mm x 4mm (that'd be 16mm).

Maybe you should examine in depth that Wikipedia article you linked earlier ...

Thanks,
-Weasel
MamiyaOtaru - Monday, August 4, 2008
143mm^2 is NOT equivalent to 143^2 mm

^^THIS

That's it in a nutshell. mm² doesn't mean you square 143, it refers to Square Millimeters, a unit of area (unlike Millimeters, a unit of distance).

Revised mspaint illustration: http://img379.imageshack.us/my.php?image=squaremmh...
erikespo - Monday, August 4, 2008
Anandtech Comment Section.. Forever record of my retardedness
erikespo - Monday, August 4, 2008
Dang.. Many apologies..
got my square area and squared numbers confused..
WeaselITB - Monday, August 4, 2008
[quote]4mm^2 is 2mm x 2mm, not 4mm x 4mm (that'd be 16mm).[/quote]

Dang, that was supposed to read "(that'd be 16mm^2)."

Thanks,
-Weasel
erikespo - Monday, August 4, 2008
another way to look at it is: how many 143mm^2 squares does it take to make up 286mm^2?

only 2 would only be 143mm x 286mm

since 10 cores fit into 143 x 143, 20 will fit into 143 x 286mm
286 x 286 (which is double that of 143 x 286mm) the 286mm^2 would fit 40
