Not Quite a Pentium, Not Quite an Atom: The Larrabee Core

Intel gave us enough information about Larrabee to begin a discussion of specifications, but not enough to even begin making any conclusions. We'll start with what we pretty much already know.

Intel's Larrabee is built out of a number of x86 cores that look, at a very high level, like this:

Each core is a dual-issue, in-order architecture loosely derived from the original Pentium microprocessor. The Pentium core was modified to include support for 64-bit operations, the updates to the x86 instruction set, larger caches, 4-way SMT/Hyper Threading and a 16-wide vector ALU.

While the team that ended up working on Atom may have originally worked on the Larrabee cores, there are some significant differences between Larrabee and Atom. Atom is geared towards much higher single threaded performance, with a deeper pipeline, a larger L2 cache and additional microarchitectural tweaks to improve general desktop performance.

  Intel Larrabee Core Intel Pentium Core (P54C) Intel Atom Core
Manufacturing Process 45nm 0.60µm 45nm
Simultaneous Multi-Threading 4-way 1-way 2-way
Issue Width dual-issue dual-issue dual-issue
Pipeline Depth 5-stages (?) 5-stages 16-stages
Scalar Execution Resources 2 x Integer ALUs (?)
1 x FPU (?)
2 x Integer ALUs
1 x FPU
2 x Integer ALUs
1 x FPU
Vector Execution Resources 16-wide Vector ALU None 1 x SIMD SSE
L1 Cache (I/D) 32KB/32KB 8KB/8KB 32KB/24KB
L2 Cache 256KB None (External) 512KB
ISA 64-bit x86
SSEn support?
Parallel/Graphics?
32-bit x86 64-bit x86
Full Merom ISA compatibility

 

Larrabee on the other hand is more Pentium-like to begin with; Intel states that Larrabee's execution pipeline is "short" and followed up with us by saying that it's closer to the 5-stage pipeline of the original Pentium than the 16-stage pipeline of Atom. While both Atom and Larrabee support SMT (Simultaneous Multi-Threading), Larrabee can work on four threads concurrently compared to two on Atom and one on the original Pentium.

L1 cache sizes are similar between Larrabee and Atom, but Larrabee gets a full 32KB data cache compared to 24KB on Atom. If you remember back to our architectural discussion of Atom, the smaller L1 D-cache was a side effect of going to a register file instead of a small signal array for the cache. Die size increased but operating voltage decreased, forcing Atom to have a smaller L1 D-cache but enabling it to reach lower power targets. Larrabee is a little less constrained and thus we have conventional balanced L1 caches, at 4x the size of that in the original Pentium.

The Pentium had no on-die L2 cache, it relied on external SRAM to be installed on the motherboard. In order to maintain good desktop performance Atom came equipped with a 512KB L2 cache, while each Larrabee core will feature a 256KB L2 cache. Larrabee's architecture does stress the importance of large, fast caches as you'll soon see, but 256KB is the right size for Intel's architecture at this point. Larrabee's default OpenGL/DirectX renderer is tile based and it turns out that most 64x64 or 128x128 tiles with 32-bit color/32-bit Z can fit in a 128KB space, leaving an additional 128KB left over for caching additional data. And remember, this is just on one Larrabee core - the whole GPU will be built out of many more.

The big difference between Larrabee, Pentium and Atom is in the vector execution side. The original Pentium had no SIMD units, Atom added support for SSE and Larrabee takes a giant leap with a massive 16-wide vector ALU. This unit is able to work on up to 16 32-bit floating point operations simultaneously, making it far wider than any of the aforementioned cores. Given the nature of the applications that Larrabee will be targeting, such a wide vector unit makes total sense.

Other changes to the Pentium core that made it into Larrabee are things like 64-bit x86 support and hardware prefetchers, although it is unknown as to how these compare to Atom's prefetchers. It is a fair guess to say that prefetching will include optimizations for data parallel situations, but whether this is in addition to other prefetch technology or a replacement for it is something we'll have to wait to find out.

The Design Experiment: Could Intel Build a GPU? Drilling Deeper and Making the AMD/NVIDIA Comparison
Comments Locked

101 Comments

View All Comments

  • Shinei - Monday, August 4, 2008 - link

    Some competition might do nVidia good--if Larrabee manages to outperform nvidia, you know nvidia will go berserk and release another hammer like the NV40 after R3x0 spanked them for a year.

    Maybe we'll start seeing those price/performance gains we've been spoiled with until ATI/AMD decided to stop being competitive.

    Overall, this can only mean good things, even if Larrabee itself ultimately fails.
  • Griswold - Monday, August 4, 2008 - link

    Wake-up call dumbo. AMD just started to mop the floor with nvidias products as far as price/performance goes.
  • watersb - Monday, August 4, 2008 - link

    great article!

    You compare the Larrabee to a Core 2 duo - for SIMD instructions, you multiplied by a (hypothetical) 10 cores to show Larrabee at 160 SIMD instructions per clock (IPC). But you show non-vector IPC as 2.

    For a 10-core Larrabee, shouldn't that be x10 as well? For 20 scalar IPC
  • Adamv1 - Monday, August 4, 2008 - link

    I know Intel has been working on Ray Tracing and I'm really curious how this is going to fit into the picture.

    From what i remember Ray Tracing is a highly parallel and scales quite well with more cores and they were talking about introducing it on 8 core processors, it seems to me this would be a great platform to try it on.
  • SuperGee - Thursday, August 7, 2008 - link

    How it fit's.
    GPU from ATI and nV are called HArdware renderers. Stil a lot of fixed funtion. Rops TMU blender rasterizer etc. And unified shader are on the evolution to get more general purpouse. But they aren't fully GP.
    This larrabee a exotic X86 massive multi core. Will act as just like a Multicore CPU. But optimised for GPU task and deployed as GPU.
    So iNTel use a Software renderer and wil first emulate DirectX/OpenGL on it with its drivers.
    Like nv ATI is more HAL with as backup HEL
    Where Larrabee is pure HEL. But it's parralel power wil boost Software method as it is just like a large bunch of X86 cores.
    HEL wil runs fast, as if it was 'HAL' with LArrabee. Because the software computing power for such task are avaible with it.

    What this means is that as a GFX engine developer you got full freedom if you going to use larrabee directly.

    Like they say first with a DirectX/openGL driver. Later with also a CPU driver where it can be easy target directly. thus like GPGPU task. but larrabee could pop up as extra cores in windows.
    This means, because whatever you do is like a software solution.
    You can make a software rendere on Ratracing method, but also a Voxel engine could be done to. But this software rendere will be accelerated bij the larrabee massive multicore CPU with could do GPU stuf also very good. But will boost any software renderer. Offcourse it must be full optimised for larrabee to get the most out of it. using those vector units and X86 larrabee extention.

    Novalogic could use this to, for there Voxel game engine back in the day's of PIII.

    It could accelerate any software renderer wich depend heavily on parralel computing.
  • icrf - Monday, August 4, 2008 - link

    Since I don't play many games anymore, that aspect of Larrabee doesn't interest me any more than making economies of scale so I can buy one cheap. I'm very interested in seeing how well something like POV-Ray or an H.264 encoder can be implemented, and what kind of speed increase it'd see. Sure, these things could be implemented on current GPUs through Cuda/CTM, but that's such an different kind of task, it's not at all quick or easy. If it's significantly simpler, we'd actually see software sooner that supports it.
  • cyberserf - Monday, August 4, 2008 - link

    one word: MATROX
  • Guuts - Monday, August 4, 2008 - link

    You're going to have to use more than one word, sorry... I have no idea what in this article has anything to do with Matrox.
  • phaxmohdem - Monday, August 4, 2008 - link

    What you mean you DON'T have a Parhelia card in your PC? WTF is wrong with you?
  • TonyB - Monday, August 4, 2008 - link

    but can it play crysis?!

Log in

Don't have an account? Sign up now