A New Architecture

This is a first. Usually when we go into these performance previews we’re aware of the architecture we’re reviewing; all we’re missing are the intimate details of how well it performs. This was the case for Conroe, Nehalem and Lynnfield (we sat Westmere out until final hardware was ready). Sandy Bridge is a different story entirely.

Here’s what we do know.

Sandy Bridge is a 32nm CPU with an on-die GPU. While Clarkdale/Arrandale have a 45nm GPU on package, Sandy Bridge moves the GPU transistors on die. Not only is the GPU on die but it shares the L3 cache of the CPU.

There are two different GPU configurations, referred to internally as 1 core or 2 cores. A single GPU core in this case refers to 6 EUs, Intel’s basic unit of graphics execution (NVIDIA would call them CUDA cores). Sandy Bridge will be offered in configurations with 6 or 12 EUs.

While the numbers may not sound like much, the Sandy Bridge GPU is significantly redesigned compared to what’s out currently. Intel has already announced a ~2x performance improvement compared to Clarkdale/Arrandale, and after testing Sandy Bridge I can say that Intel has been able to achieve at least that.

Both the CPU and GPU on SB will be able to turbo independently of one another. If you’re playing a game that uses more GPU than CPU, the CPU may run at stock speed (or lower) and the GPU can use the additional thermal headroom to clock up. The same applies in reverse if you’re running something computationally intensive.
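
A rough way to picture the sharing: the package has one thermal/power budget that the CPU and GPU both draw from, and whichever side is busier gets to spend the leftover headroom on higher clocks. The toy C model below is purely illustrative; the structure, names and wattage figures are made up and are not Intel’s actual power management algorithm.

    #include <stdio.h>

    /* Toy model of a shared package power budget split between CPU and GPU
     * turbo. All numbers are hypothetical; this is not Intel's algorithm. */
    #define PACKAGE_BUDGET_W 95.0   /* assumed total package budget */

    typedef struct {
        double base_power_w;        /* power draw at stock clocks, fully loaded */
        double demand;              /* 0.0 (idle) .. 1.0 (fully loaded) */
    } unit_t;

    int main(void)
    {
        unit_t cpu = { 65.0, 0.3 }; /* lightly loaded CPU */
        unit_t gpu = { 30.0, 1.0 }; /* GPU-bound game */

        double used     = cpu.base_power_w * cpu.demand +
                          gpu.base_power_w * gpu.demand;
        double headroom = PACKAGE_BUDGET_W - used;

        /* The busier unit gets the spare budget and can clock above its base. */
        if (gpu.demand > cpu.demand)
            printf("GPU turbo budget: %.1f W (base %.1f W + %.1f W headroom)\n",
                   gpu.base_power_w + headroom, gpu.base_power_w, headroom);
        else
            printf("CPU turbo budget: %.1f W (base %.1f W + %.1f W headroom)\n",
                   cpu.base_power_w + headroom, cpu.base_power_w, headroom);
        return 0;
    }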

On the CPU side little is known about the execution pipeline. Sandy Bridge adds support for AVX instructions, just like Bulldozer. The CPU will also have dedicated video transcoding hardware to fend off advances by GPUs in the transcoding space.
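
For context, AVX widens the SSE SIMD registers from 128 bits to 256 bits for floating point work, so a single instruction can operate on eight single-precision values instead of four. Here’s a minimal sketch of what that looks like to a programmer using the standard <immintrin.h> intrinsics (generic AVX code, nothing Sandy Bridge specific; build with -mavx):

    #include <immintrin.h>   /* AVX intrinsics */
    #include <stdio.h>

    int main(void)
    {
        float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
        float c[8];

        __m256 va = _mm256_loadu_ps(a);     /* load eight floats at once */
        __m256 vb = _mm256_loadu_ps(b);
        __m256 vc = _mm256_add_ps(va, vb);  /* one add covers all eight lanes */
        _mm256_storeu_ps(c, vc);

        for (int i = 0; i < 8; i++)
            printf("%.0f ", c[i]);          /* prints: 9 9 9 9 9 9 9 9 */
        printf("\n");
        return 0;
    }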

Caches remain mostly unchanged. The L1 cache is still 64KB (32KB instruction + 32KB data) and the L2 is still a low latency 256KB. I measured their latencies at 4 and 10 cycles respectively, unchanged from the previous generation. The L3 cache, however, has changed.

Only the Core i7 2600 has an 8MB L3 cache; the 2400 and 2500 have a 6MB L3 and the 2100 has a 3MB L3. The L3 size should matter more with Sandy Bridge because it’s shared by the GPU in those cases where the integrated graphics is active. I’m a bit puzzled why Intel strayed from the 2MB of L3 per core that Nehalem’s lead architect was so committed to. I guess I’ll find out more from him at IDF :)

The other change appears to be either L3 cache latency or prefetcher aggressiveness, or both. Although most third party tools don’t accurately measure L3 latency, they can usually give you a rough idea of latency changes between similar architectures. In this case I turned to cachemem, which reported Sandy Bridge’s L3 latency as 26 cycles, down from ~35 in Lynnfield (Lynnfield’s actual L3 latency is 42 clocks).

As I mentioned before, I’m not sure whether this is the result of a lower latency L3 cache or more aggressive prefetchers, or both. I had limited time with the system and was unfortunately unable to do much more.
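
For those curious how latency figures like these are typically obtained, the sketch below shows the basic pointer-chasing approach commonly used for this kind of measurement: walk a buffer sized to miss the L2 but fit in the L3 in a random cyclic order so the prefetchers can’t predict the next address, then divide total time by the number of dependent loads. This is my own illustration rather than the tool used above; the buffer size and iteration count are arbitrary, and converting the nanosecond result to cycles requires knowing the actual core clock.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define BUF_BYTES (4 * 1024 * 1024)          /* ~4MB: past the L2, inside the L3 */
    #define N         (BUF_BYTES / sizeof(void *))
    #define ITERS     10000000L

    int main(void)
    {
        void **buf = malloc(BUF_BYTES);
        size_t *order = malloc(N * sizeof(size_t));
        if (!buf || !order) return 1;

        /* Build a random cyclic permutation: each element points to the next. */
        for (size_t i = 0; i < N; i++) order[i] = i;
        srand(1);
        for (size_t i = N - 1; i > 0; i--) {     /* Fisher-Yates shuffle */
            size_t j = (size_t)rand() % (i + 1);
            size_t t = order[i]; order[i] = order[j]; order[j] = t;
        }
        for (size_t i = 0; i < N; i++)
            buf[order[i]] = &buf[order[(i + 1) % N]];

        /* Chase the pointers; every load depends on the one before it. */
        void **p = &buf[order[0]];
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < ITERS; i++)
            p = (void **)*p;
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        /* Print p so the compiler can't discard the loop. */
        printf("avg %.2f ns per load (%p)\n", ns / ITERS, (void *)p);
        return 0;
    }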

And that’s about it. I can fit everything I know about Sandy Bridge onto a single page and even then it’s not telling us much. We’ll certainly find out more at IDF next month. What I will say is this: Sandy Bridge is not a minor update. As you’ll soon see, the performance improvements the CPU will offer across the board will make most anyone want to upgrade.

200 Comments

  • teohhanhui - Saturday, August 28, 2010 - link

    Just something like nVidia Optimus? Perhaps Intel could come up with a more elegant solution to the same problem...
  • hnzw rui - Friday, August 27, 2010 - link

    Hmm, based on the roadmap I actually think the i7-2600K will be priced close to the i7-875K. The i7-950 is supposed to drop to $294 next week putting it in the high end Mainstream price range (it'll still be Q3'10 then). Also, all the $500+ processors are in the Performance category (i7-970, $885; i7-960, $562; i7-880, $562).

    If the i7-2600K goes for $340 or thereabouts, I can already see supply shortages due to high demand (and the eventual price gouging that would follow).
  • liyunjiu - Friday, August 27, 2010 - link

    How are the comparisons between NVIDIA low end discrete/mobile graphics?
  • tatertot - Friday, August 27, 2010 - link

    Hey Anand,

    How could you tell that this sample had only 6 execution units active in the GPU vs. the full 12?

    Was it just what this particular SKU is supposed to have, or some CPU-Z type info, or... ?

    thx
  • Anand Lal Shimpi - Saturday, August 28, 2010 - link

    Right now all desktop parts have 6 EUs and all mobile parts have 12 EUs. There are no exceptions on the mobile side; there may be exceptions on the desktop side, but from the information I have (and the performance I saw) this wasn't one of those exceptions.

    Take care,
    Anand
  • steddy - Saturday, August 28, 2010 - link

    "all mobile parts have 12 EUs"

    Sweet! Guess the good ol' GeForce 310m is on the way out.
  • mianmian - Saturday, August 28, 2010 - link

    The mobile CPU/GPU usually runs at much lower frequencies.
    I guess the 12 EU mobile GPU will perform on par with the desktop 6 EU one.
  • IntelUser2000 - Saturday, August 28, 2010 - link

    That seriously doesn't make sense. Couple of possible scenarios then.

    - Performance isn't EU bound and 2x the EUs only brings 10-20%
    - The mobile parts are FAR faster than desktop parts (unlikely)
    - The mobile parts do have 12 EUs, but are clocked low enough to perform like the 6 EU desktop (but why?)
    - There will be specialized versions like the i5 661
  • DanNeely - Sunday, August 29, 2010 - link

    Actually I think it does. Regardless of whether they have 6 or 12 EUs, it's still not going to be a replacement for any but the bottom tier of GPUs. However, adding a budget GPU to a desktop system has a fairly minimal opportunity cost since you're just sticking a card into a slot.

    Adding a replacement GPU in a laptop has a much higher opportunity cost. You're paying in board space and internal volume even if power gating, etc. minimizes the extra power draw; doubling the size of the on-die GPU will cost less than adding an external GPU that's twice as fast. You also can't upgrade a laptop GPU later on if you decide you need more power.
  • Anand Lal Shimpi - Tuesday, August 31, 2010 - link

    I spoke too soon; it looks like this may have been a 12 EU part. I've updated the article and will post an update as soon as I'm able to confirm it :)

    Take care,
    Anand
