Intel's Larrabee Architecture Disclosure: A Calculated First Move

by Anand Lal Shimpi & Derek Wilson on August 4, 2008 12:00 AM EST

Posted in
GPUs

Cache and Memory Hierarchy: Architected for Low Latency Operation

Intel has had a lot of experience building very high performance caches. Intel's caches are denser than what AMD has been able to produce on the x86 microprocessor front, and as we saw in our Nehalem preview, Intel is also able to deliver significantly lower latency caches than the competition. Thus it should come as no surprise that Larrabee's strengths come from being built on fully programmable x86 cores and from having very large, very fast coherent caches.

Each Larrabee core features 4x the L1 cache of the original Pentium. The Pentium had an 8KB L1 data cache and an 8KB L1 instruction cache; each Larrabee core has a 32KB/32KB L1 D/I cache. The reasoning is that each Larrabee core can work on 4x the threads of the original Pentium, so with an L1 that is 4x as large the architecture remains balanced. The original Pentium didn't have an integrated L2 cache, but each Larrabee core has access to its own L2 cache partition, 256KB in size.
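
As a quick check on that balance argument, here is a minimal Python sketch of the L1-per-thread arithmetic. The function name is hypothetical, and the thread counts (one for the original Pentium, four per Larrabee core) are taken from the "4x the threads" statement above.

```python
def l1_kb_per_thread(l1_kb: int, threads: int) -> float:
    """L1 capacity (data or instruction) available per hardware thread."""
    return l1_kb / threads


# Original Pentium: 8KB L1 data cache, one thread per core.
pentium_kb = l1_kb_per_thread(8, 1)

# Larrabee: 32KB L1 data cache, assumed four threads per core.
larrabee_kb = l1_kb_per_thread(32, 4)

print(pentium_kb, larrabee_kb)
```

Under these assumptions both designs end up with the same L1 capacity per thread, which is the sense in which the architecture "remains balanced".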

Larrabee's L2 pool increases with each core. An 8-core Larrabee would have 2MB of total L2 cache (256KB per core x 8 cores); a 32-core Larrabee would have an 8MB L2 cache. Each core only has access to its L2 cache partition: it can read/write to its 256KB portion of the pool and that's it. Communication with other Larrabee cores happens over the ring bus. A single core will look for data in its L2 cache; if it doesn't find it there, it will place the request on the ring bus and will eventually find the data in its L2.
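
The scaling and the lookup path described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `ToyRing` class, its dict-based partitions, and the "copy into the requester's partition" fill behavior are all hypothetical simplifications, not Intel's actual coherence protocol.

```python
L2_PARTITION_KB = 256  # per-core L2 partition size, from the article


def total_l2_kb(cores: int) -> int:
    """Total L2 pool for a chip with the given number of cores."""
    return cores * L2_PARTITION_KB


class ToyRing:
    """Hypothetical model: one L2 partition (a dict) per core on a ring."""

    def __init__(self, cores: int):
        self.partitions = [dict() for _ in range(cores)]

    def read(self, core: int, addr: int):
        """Return (value, source); source is 'local', 'ring' or 'miss'."""
        local = self.partitions[core]
        if addr in local:
            return local[addr], "local"
        # Local miss: the request travels around the ring to the other cores.
        n = len(self.partitions)
        for hop in range(1, n):
            other = self.partitions[(core + hop) % n]
            if addr in other:
                # Assumption: the data ends up in the requester's own L2.
                local[addr] = other[addr]
                return local[addr], "ring"
        return None, "miss"  # a real chip would go out to memory here


print(total_l2_kb(8), total_l2_kb(32))
```

For example, with `ring = ToyRing(4)` and a value stored in core 2's partition, a first `ring.read(0, addr)` is served over the ring, and a repeat of the same read is then served from core 0's own partition.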

Intel doesn't attempt to hide latency nearly as much as NVIDIA does, instead relying on its high speed, low latency caches. The ratio of compute resources to cache size is much lower with Larrabee than with either AMD's or NVIDIA's architectures.

                          AMD RV770   NVIDIA GT200   Intel Larrabee
Scalar ops per L1 Cache   80          24             16
L1 Cache Size             16KB        unknown        32KB
Scalar ops per L2 Cache   100         30             16
L2 Cache Size             unknown     unknown        256KB

While both AMD and NVIDIA are very shy about giving out cache sizes, we do know that RV670 had a 256KB L2 cache for the entire chip, and we can expect RV770 to have something larger, but not large enough to come close to what Intel has with Larrabee. NVIDIA is much closer in the compute-to-cache ratio than AMD, which makes sense given its approach to designing much larger GPUs, but we have no reason to believe that NVIDIA has larger caches on the GT200 die than Intel has with Larrabee.
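
To put the table on a common footing, the following Python sketch divides the scalar-ops figures by the cache sizes where both are given. It assumes the "scalar ops per cache" rows can simply be divided by cache size to get ops per KB; the function name and data layout are illustrative, and unknown sizes are left as `None` rather than guessed.

```python
def ops_per_kb(scalar_ops, cache_kb):
    """Scalar ops per KB of cache, or None when the cache size is unknown."""
    if cache_kb is None:
        return None
    return scalar_ops / cache_kb


# (scalar ops per cache, cache size in KB) as listed in the table above.
l1 = {
    "AMD RV770": (80, 16),
    "NVIDIA GT200": (24, None),
    "Intel Larrabee": (16, 32),
}
l2 = {
    "AMD RV770": (100, None),
    "NVIDIA GT200": (30, None),
    "Intel Larrabee": (16, 256),
}

for chip in l1:
    print(chip, ops_per_kb(*l1[chip]), ops_per_kb(*l2[chip]))
```

With the published numbers, only the RV770 and Larrabee L1 figures can be compared directly; the rest depend on sizes AMD and NVIDIA have not disclosed.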

The caches are fully coherent, just like they are on a multi-core desktop CPU. The fully coherent caches make for some interesting cases when looking at multi-GPU configurations. While Intel wouldn't get specific with multi-GPU Larrabee plans, it did state that with a multi-GPU Larrabee setup Intel doesn't "expect to have quite as much pain as they [AMD/NVIDIA] do".

We asked whether there was any limitation to maintaining cache coherence across multiple chips, and the answer was that it could be possible with enough bandwidth between the two chips. While NVIDIA and AMD are still adding bits and pieces to refine multi-GPU rendering, Intel could have a very robust solution right out of the gate if desired (think shared framebuffer and much more efficient workload division for a single frame).

101 Comments

erikespo - Monday, August 4, 2008 - link
http://en.wikipedia.org/wiki/Square_%28geometry%29

helpful page to take you back to first grade

and excuse my decimal point.. it is 204.49mm total per core or 14.3mm^2
erikespo - Monday, August 4, 2008 - link
Explain.

lets use smaller numbers for you 2mm^2 is 2mm by 2 mm or 4 total mm

double that and it is 4mm^2 or 4 mm by 4 mm or 16mm total..

we are talking about area or 2 dimensions not 1 dimension.

Same math applies to the article
MamiyaOtaru - Monday, August 4, 2008 - link
No, you're way off. 2mm² is TWO square millimeters. (a rectangle 1x2 for example). Double that would be 4mm², which could either be 1x4 or 2x2.

NUMBERmm² doesn't mean NUMBERxNUMBER mm, it means exactly what it says: NUMBER mm².

Using your smaller numbers: 2mm² is not "4 total mm"; it is TWO mm². Saying it is 4 total mm doesn't even make sense. You _can't_ measure area in millimeters. You measure it in square millimeters, and there are two of them (_2_mm²).

Here's an mspaint visual (if links work: http://img105.imageshack.us/my.php?image=squaremma...)

You're so sure you're right on this, it's really depressing :(
darkequitus - Monday, August 4, 2008 - link
I did not appreciate the writer creaming over every digital page they wrote, especially when Larrabee's performance is mainly at the moment based on Intel hype and nothing real.
ZootyGray - Monday, August 4, 2008 - link
THANK YOU.

Somebody finally said it.

The others prefer Eutopian illusion - aka the curse aka ntel antitrust. ntel has no grafx and the fools in the public buy "inside' and nvid and ati aren't exactly friends of the curse.

welcome to the matrix. wakey wakey
ZootyGray - Monday, August 4, 2008 - link
and a 16 pager on maybe might could be should be = wannabe "employ-boy"
- payday ? hooyeh. This is so disappointing for me. Credibility sags to a new low.
strikeback03 - Tuesday, August 5, 2008 - link
Someone whose two posts contain about 10 complete words and no complete thoughts says Anandtech's credibility has sagged to a new low?
ZootyGray - Tuesday, August 5, 2008 - link
haha yeh - lots of room for thinking.
or - if no thinkeez - ya gots der 16 pg inundation (that's a big word like marmalade) all based on nothing-is-real - you like that kind of brainwash? we don't know anything; but here's the tekspex?
btw - did u get it? the matrix idea? watch the movie. cos here it is. pardon my loaded cryptic literacy.
thx
if you don't get it - well, that's what they want - a world of sleeping mob. never mind, that's just my concern.
The Preacher - Monday, August 4, 2008 - link
I don't really care about how good it will be executing some software renderer but I feel it is going to kick ass in scientific calculations. Matrix operations, FFT/convolution, tremendous bandwidth, double precision... I may write C++/x86 assembly code directly for it and I may put this into a rack of servers and use it through MPI. Give me a compiler with vector intrinsic functions for it and my dreams just came true! :)
elerick - Monday, August 4, 2008 - link
I have been a daily reader of another hardware review site for years. I read nearly every article that headlines and find many of them quite lacking. Today I got wind of your review for the Larrabee. It was very well written and produced an amazing amount of tech knowledge not really commonly reviewed. I'm glad to have found this site, and I never create an account but today I felt obligated to. Great work.

PS: any news on that AMD / Fusion? or is that just them being intimidated by Intel's Larrabee?
