Cache and Memory Hierarchy: Architected for Low Latency Operation

Intel has had a lot of experience building very high performance caches. Intel's caches are more dense than what AMD has been able to produce on the x86 microprocessor front, and as we saw in our Nehalem preview - Intel is also able to deliver significantly lower latency caches than the competition as well. Thus it should come as no surprise to anyone that Larrabee's strengths come from being built on fully programmable x86 cores, and from having very large, very fast coherent caches.

Each Larrabee core features 4x the L1 caches of the original Pentium. The Pentium had an 8KB L1 data cache and an 8KB L1 instruction cache, each Larrabee core has a 32KB/32KB L1 D/I cache. The reasoning is that each Larrabee core can work on 4x the threads of the original Pentium and thus with a 4x as large L1 the architecture remains balanced. The original Pentium didn't have an integrated L2 cache, but each Larrabee core has access to its own L2 cache partition - 256KB in size.

Larrabee's L2 pool increases with each core. An 8-core Larrabee would have 2MB of total L2 cache (256KB per core x 8 cores), a 32-core Larrabee would have an 8MB L2 cache. Each core only has access to its L2 cache partition, it can read/write to its 256KB portion of the pool and that's it. Communication with other Larrabee cores happens over the ring bus; a single core will look for data in its L2 cache, if it doesn't find it there it will place the request on the ring bus and will eventualy find the data in its L2.

Intel doesn't attempt to hide latency nearly as much as NVIDIA does, instead relying on its high speed, low latency caches. The ratio of compute resources to cache size is much lower with Larrabee than either AMD or NVIDIA's architectures.

  AMD RV770 NVIDIA GT200 Intel Larrabee
Scalar ops per L1 Cache 80 24 16
L1 Cache Size 16KB unknown 32KB
Scalar ops per L2 Cache 100 30 16
L2 Cache Size unknown unknown 256KB

 

While both AMD and NVIDIA are very shy on giving out cache sizes, we do know that RV670 had a 256KB L2 for the entire chip cache and can expect that RV770 to have something larger, but not large enough to come close to what Intel has with Larrabee. NVIDIA is much closer in the compute-to-cache ratio than AMD, which makes sense given its approach to designing much larger GPUs, but we have no reason to believe that NVIDIA has larger caches on the GT200 die than Intel with Larrabee.

The caches are fully coherent, just like they are on a multi-core desktop CPU. The fully coherent caches makes for some interesting cases when looking at multi-GPU configurations. While Intel wouldn't get specific with multi-GPU Larrabee plans, it did state that with a multi-GPU Larrabee setup Intel doesn't "expect to have quite as much pain as they [AMD/NVIDIA] do".

We asked whether there was any limitation to maintaining cache coherence across multiple chips and the anwswer was that it could be possible with enough bandwidth between the two chips. While NVIDIA and AMD are still adding bits and pieces to refine multi-GPU rendering, Intel could have a very robust solution right out of the gate if desired (think shared framebuffer and much more efficient work load division for a single frame).

How Many Cores in a Larrabee? Programming for Larrabee
Comments Locked

101 Comments

View All Comments

  • Griswold - Monday, August 4, 2008 - link

    You seem to be confused. Time for a nap.
  • MDme - Monday, August 4, 2008 - link

    but AMD will have Cinema 2.0. did you see that demo? by 2010, AMD will have the RV990 or whatever...and Nvidia will have GT400?
  • phaxmohdem - Monday, August 4, 2008 - link

    Considering how long it took nVidia to release a single GPU significantly faster than G80, I'd be shocked if we wee GT300 by 2009/2010. however a GTX 295GT X2 ULTRA OC is not out of the question ;)
  • shuffle2 - Monday, August 4, 2008 - link

    mm², how hard is that to write? >.>
  • 1prophet - Monday, August 4, 2008 - link

    They need to hit one out of the park with the drivers (software)as well.
  • jltate - Tuesday, August 5, 2008 - link

    I've got a bunch of comments, so I'll just list them all here.

    SSE doesn't have fused multiply-add operations. Larrabee does -- thus that 10 core processor could perform a peak of 320 floating point operations per cycle (it's mentioned in the SIGGRAPH paper).

    Larrabee's programming model is variable width -- the hardware can and likely will be augmented in the future to perform more than just 16 operations in parallel.

    The ring bus between cores was stated to be for each group of 16. Intel stated that for more than 16 cores they'd use "multiple short-linked rings".

    Also, the diagram only shows one memory controller on one side with fixed function logic on the other, not two memory controllers as you showed on page 5 of your article. However, Intel stated in the paper that the configuration and number of processors, fixed function blocks and I/O controllers would be implementation dependent. So in effect it could very well have a half-dozen 64-bit interfaces like G80.

    My forecast? This thing will rock. I for one simply cannot wait.
  • Laura Wilson - Monday, August 4, 2008 - link

    that's the truth

    they say they know this. it sounds like they know this ... we'll see what happens :-)
  • gigahertz20 - Monday, August 4, 2008 - link

    I'm going to predict Larrabee will provide a huge boost of performance over Intel's current crappy integrated graphic solutions, but will not be able to compete with AMD/ATI's and Nvidia's high end GPU's when it (Larrabee) finally launches. If Intel can deliver a monster that can push 100+ FPS in Crysis and doesn't cost so much that it breaks the bank like the current Nvidia GTX 280's, then they will have a real winner! When it finally launches though, who knows what AMD/ATI and Nvidia will have out to compete against it, wonder if Intel is just trying to push out a mainstream chip or go high end as well...guess I need to read the rest of the article :)
  • JEDIYoda - Tuesday, August 5, 2008 - link

    dreaming again huh??? you people who want top notch performance without having to pay for it....rofl..hahaha
  • FITCamaro - Monday, August 4, 2008 - link

    This isn't mean to compete with their IGPs. At least not initially.

Log in

Don't have an account? Sign up now