The Design Experiment: Could Intel Build a GPU?

Larrabee is fundamentally built out of existing Intel x86 core technology, which not only means that the chip design isn't foreign to Intel, but also has serious implications for the future of desktop microprocessors. Larrabee isn't however built on Intel's current bread and butter, the Core architecture, instead Intel turned to a much older architecture as the basis for Larrabee: the original Pentium.

The original Pentium was manufactured on a 0.80µm process, later shrinking to 0.60µm. The question Intel posed was this: could an updated version of the Pentium core, built on a modern day process and equipped with a very wide vector unit, make a solid foundation for a high-end GPU?

To first test the theory Intel took a standard Core 2 Duo, with a 4MB L2 cache at an undisclosed clock speed (somewhere in the 1.8 - 2.9GHz range I'd guess). Then, on the same manufacturing process, roughly the same die area and power consumption, Intel sought to find out how many of these modified Pentium cores it could fit. The number was 10.

So in the space of a dual-core Core 2 Duo, Intel could construct this hypothetical 10-core chip. Let's look at the stats:

  Intel Core 2 Duo Hypothetical Larrabee
# of CPU Cores 2 out of order 10 in-order
Instructions per Issue 4 per clock 2 per clock
VPU Lanes per Core 4-wide SSE 16-wide
L2 Cache Size 4MB 4MB
Single-Stream Throughput 4 per clock 2 per clock
Vector Throughput 8 per clock 160 per clock


Note that what we're comparing here are operation throughputs, not how fast it can actually execute anything, just how many operations it can retire per clock.

Running a single instruction stream (e.g. single threaded application), the Core 2 can process as many as four operations per clock, since it can issue 4-instructions per clock and it isn't execution unit constrained. The 10-core design however can only issue two instructions per clock and thus the peak execution rate for a single instruction stream is two operations per clock, half the throughput of the Core 2. That's fine however since you'll actually want to be running vector operations on this core and leave your single threaded tasks to your Core 2 CPU anyways, and here's where the proposed architecture spreads its wings.

With two cores, each with their ability to execute 4 concurrent SSE operations per clock, you've got a throughput of 8 ops per clock on Core 2. On the 10-core design? 160 ops per clock, an increase of 20x in roughly the same die area and power budget.

On paper this could actually work. If you had enough of these cores, you could get the vector throughput necessary to actually build a reasonable GPU. Of course there are issues like adapting the x86 instruction set for use in a GPU, getting all of the cores to communicate with one another and actually keeping all of these execution resources busy - but this design experiment showed that it was possible.

Thus Larrabee was born.

Index Not Quite a Pentium, Not Quite an Atom: The Larrabee Core


View All Comments

  • FujiT - Monday, August 04, 2008 - link

    Some if you just don't get it.

    It's not about whether or not it can play crysis with 100 FPS and it's not as much about whether it can compete with AMD/nVidia (although that's important too).

    I see this chip as a beginning of a new revolution in computing. It reminds me a lot of a cell processor (although i don't know that much about architecture) where a smarter CPU will tell the dumber CPUs what to do. The ability to have a many core CPU with a mixture of really smart and dumber, but FP optimized cores will really make stuff like rendering a lot faster on a CPU, and would take programs such as F@H to the next level. The added perk is the fact that it's all x86 as anand pointed out.
  • DerekWilson - Monday, August 04, 2008 - link

    this is a pretty good observation ...

    but no matter how much potential it has, performance in games is going to be the thing that actually makes or breaks it. it's of no use to anyone if no one buys it. and no one is going to buy it because of potential -- it's all about whether or not they can deliver on game performance.
  • Griswold - Monday, August 04, 2008 - link

    Well, it seems you dont get it either. Reply
  • helms - Monday, August 04, 2008 - link

    I decided to check out the development of this game I heard about ages ago that seemed pretty unique not only the game but the game engine for it. Going to the website it seems Intel acquired them at the end of February.">">

    I wonder how significant this is.
  • iwodo - Monday, August 04, 2008 - link

    I forgot to ask, how will the Software Render works out on Mac? Since all Direct X code are run to Software renderer doesn't that fundamentally mean most of the current Windows based games could be run on Mac with little work? Reply
  • MamiyaOtaru - Monday, August 04, 2008 - link

    Not really. Larrabee will be translating directx to its software renderer. But unless Microsoft ports the directX API to OSX, there will be nothing for Larrabee to translate. Reply
  • Aethelwolf - Monday, August 04, 2008 - link

    I wonder if game devs can write their games in directx then have the software renderer convert it into larrabee's ISA on windows platform, capturing the binary somehow. Distribute the directx on windows and the software ISA for mac. No need for two separate code paths. Reply
  • iwodo - Monday, August 04, 2008 - link

    If anyone can just point out the assumption anand make are false? Then it would be great, because what he is saying is simply too good to be true.

    One point to mention the 4Mb Cache takes up nearly 50% of the die size. So if intel could rely more on bandwidth and saving on cache they could put in a few more core.

    And am i the only one who think 2010 is far away from Introduction. I think 2009 summer seems like a much better time. Then they will have another 6 - 8 months before they move on to 32nm with higher clock speed.

    And for the Game developers, with the cash intel have, 10 Million for every high profile studio like Blizzard, 50 Million to EA to optimize for Intel. It would only cost them 100 million of pocket money.
  • ZootyGray - Monday, August 04, 2008 - link

    I was thinking of all the p90's I threw away - could have made a cpu sandwich, with a lil peanut software butter, and had this tower of babel thing sticking out the side of the case with a fan on top, called lazarus, or something - such an opportunity to utilize all that old tek - such imagery.

    griswold u r funny :)
  • Griswold - Monday, August 04, 2008 - link

    You definitely are confused. Time for a nap. Reply

Log in

Don't have an account? Sign up now