Drilling Deeper and Making the AMD/NVIDIA Comparison

Don't be fooled by the initial diagram, this simple x86 core gets far more complex. In the image below, the block to the left is the Larrabee core we mentioned earlier, to the right we've blown up the vector unit and its associated parts:

The vector unit is key and within that unit you've got a ton of registers and a very wide vector ALU, which leads us to the fundamental building block of Larrabee. NVIDIA's GT200 is built out of Streaming Processors, AMD's RV770 out of Stream Processing Units and Larrabee's performance comes from these 16-wide vector ALUs:

The vector ALU can behave as a 16-wide single precision ALU or an 8-wide double precision, although that doesn't necessarily translate into equivalent throughput (which Intel would not at this point clarify). Compared to ATI and NVIDIA, here's how Larrabee looks at a basic execution unit level:

NVIDIA's SPs work on a single operation, AMD's can work on five, and Larrabee's vector unit can work on sixteen. NVIDIA has a couple hundred of these SPs in its high end GPUs, AMD has 160 and Intel is expected to have anywhere from 16 - 32 of these cores in Larrabee. If NVIDIA is on the tons-of-simple-hardware end of the spectrum, Intel is on the exact opposite end of the scale.

We've already shown that AMD's architecture requires a lot of help from the compiler to properly schedule and maximize the utilization of its execution resources within one of its 5-wide SPs, with Larrabee the importance of the compiler is tremendous. Luckily for Larrabee, some of the best (if not the best) compilers are made by Intel. If anyone could get away with this sort of an architecture, it's Intel.

At the same time, while we don't have a full understanding of the details yet, we get the idea that Larrabee's vector unit is sort of a chameleon. From the information we have, these vector units could exectue atomic 16-wide ops for a single thread of a running program and can handle register swizzling across all 16 exectution units. This implies something very AMD like and wide. But it also looks like each of the 16 vector execution units, using the mask registers can branch independently (looking very much more like NVIDIA's solution).

We've already seen how AMD and NVIDIA architectural differences show distinct advantages and disadvantages against eachother in different games. If Intel is able to adapt the way the vector unit is used to suit specific situations, they could have something huge on their hands. Again, we don't have enough detail to tell what's going to happen, but things do look very interesting.

Not Quite a Pentium, Not Quite an Atom: The Larrabee Core Putting it all Together - Return of the Ring Bus
Comments Locked

101 Comments

View All Comments

  • phaxmohdem - Monday, August 4, 2008 - link

    Can your mom play Crysis? *burn*
  • JonnyDough - Monday, August 4, 2008 - link

    I suppose she could but I don't think she would want to. Why do you care anyway? Have some sort of weird fetish with moms playing video games or are you just looking for another woman to relate to?

    Ooooh, burn!
  • Griswold - Monday, August 4, 2008 - link

    He is looking for the one playing his mom, I think.
  • bigboxes - Monday, August 4, 2008 - link

    Yup. He worded it incorrectly. It should have read, "but can it play your mom?" :p
  • Tilmitt - Monday, August 4, 2008 - link

    I'm really disappointed that Intel isn't building a regular GPU. I doubt that bolting a load of unoptimised x86 cores together is going to be able to perform anywhere near as well as a GPU built from the ground up to accelerate graphics, given equal die sizes.
  • JKflipflop98 - Monday, August 4, 2008 - link

    WTF? Did you read the article?
  • Zoomer - Sunday, August 10, 2008 - link

    He had a point. More programmable == more transistors. Can't escape from that fact.

    Given equal number of transistors, running the same program, a more programmable solution will always be crushed by fixed function processors.
  • JonnyDough - Monday, August 4, 2008 - link

    I was wondering that too. This is obviously a push towards a smaller Centrino type package. Imagine a powerful CPU that can push graphics too. At some point this will save a lot of battery juice in a notebook computer, along with space. It may not be able to play games, but I'm pretty sure it will make for some great basic laptops someday that can run video. Not all college kids and overseas marines want to play video games. Some just want to watch clips of their family back home.
  • rudolphna - Monday, August 4, 2008 - link

    as interesting and cool as this sounds, this is even more bad news for AMD, who was finally making up for lost ground. granted, its still probably 2 years away, and hopefully AMD will be back to its old self (Athlon64 era) They are finally getting products that can actually compete. Another challenger, especially from its biggest rival-Intel- cannot be good for them.
  • bigboxes - Monday, August 4, 2008 - link

    What are you talking about? It's been nothing but good news for AMD lately. Sure, let Intel sink a lot of $$ into graphics. Sounds like a win for AMD (in a roundabout way). It's like AMD investing into a graphics maker (ATI) instead of concentrating on what makes them great. Most of the Intel supporters were all over AMD for making that decision. Turn this around and watch Intel invest heavily into graphics and it's a grand slam. I guess it's all about perspective. :)

Log in

Don't have an account? Sign up now