Sandy Bridge Graphics

The largest performance improvement on Sandy Bridge vs. current Westmere architectures actually has nothing to do with the CPU, it’s all graphics. While the CPU cores show a 10 - 30% improvement in performance, Sandy Bridge graphics performance is easily double what Intel delivered with Clarkdale/Arrandale at the beginning of the year.

While the ‘dales improved performance by moving graphics to a 45nm process, increasing shader hardware and clock speed, SNB graphics improves through a significant increase in IPC.

The Sandy Bridge GPU is on-die built out of the same 32nm transistors as the CPU cores. The GPU is on its own power island and clock domain. The GPU can be powered down or clocked up independently of the CPU. Graphics turbo is available on both desktop and mobile parts, and you get more bins with graphics turbo on Sandy Bridge than you did with Arrandale.

The GPU is treated like an equal citizen in the Sandy Bridge world, it gets equal access to the L3 cache. The graphics driver controls what gets into the L3 cache and you can even limit how much cache the GPU is able to use. Storing graphics data in the cache is particularly important as it saves trips to main memory which are costly from both a performance and power standpoint. Redesigning a GPU to make use of a cache isn’t a simple task. It usually requires the sort of complete re-design that NVIDIA did with GF100 for example.

SNB graphics is the anti-Larrabee. While Larrabee focused on extensive use of fully programmable hardware (with the exception of the texture hardware), SNB graphics (internally referred to as Gen 6 graphics) makes extensive use of fixed function hardware. The design mentality was anything that could be described by a fixed function should be implemented in fixed function hardware. The benefit is performance/power/die area efficiency, at the expense of flexibility. Keeping much of the GPU fixed function is in-line with Intel’s CPU centric view of the world. By contrast, taking the GPU as programmable as possible makes more sense for a GPU focused company like NVIDIA.

The programmable shader hardware is composed of shaders/cores/execution units that Intel calls EUs. Each EU can dual issue picking instructions from multiple threads. The internal ISA maps one-to-one with most DirectX 10 API instructions resulting in a very CISC-like architecture. Moving to one-to-one API to instruction mapping increases IPC by effectively increasing the width of the EUs.

There are other improvements within the EU. Transcendental math is handled by hardware in the EU and its performance has been sped up considerably. Intel told us that sine and cosine operations are several orders of magnitude faster now than they were in current HD Graphics.

In previous Intel graphics architectures the register file was repartitioned on the fly. If a thread needed fewer registers, the remaining registers could be allocated to another thread. While this was a great approach for saving die area, it proved to be a limiter for performance. In many cases threads couldn’t be worked on as there were no registers available for use. Prior to Intel HD Graphics each thread had an average of 64 registers, the ‘dales brought that average up to 80 per thread. In SNB, the GPU has a fixed amount of registers per thread: 120. The register count limiting thread count scenarios have been alleviated.

All of these enhancements result in 2x the instruction throughput per EU compared to Intel’s HD Graphics today.


Sandy Bridge vs. NVIDIA GeForce 310M Playing Starcraft 2

At launch there will be two versions of Sandy Bridge graphics: one with 6 EUs and one with 12 EUs. All mobile parts (at launch) will use 12 EUs, while desktop SKUs may either use 6 or 12 depending on the model. From what I’ve heard, the higher end SKUs may have 12 while the lower end models will have 6 but I haven’t seen it on a roadmap yet so I can’t confirm. With a 2x improvement in throughput per EU, you should see greater than Intel HD Graphics performance even with the 6 EU parts (thanks to higher clock speeds and the GPU sharing the L3 cache).

The Ring Bus & System Agent Sandy Bridge Media Engine
POST A COMMENT

62 Comments

View All Comments

  • beginner99 - Tuesday, September 14, 2010 - link

    AMD's been taking about fusion forever but I can't get rid of the feeling that this Intel implementation will be much more "fused" than the AMD one will be. AMD barley has CPU turbo so adding a comined cpu/gpu turbo at once, maybe they can pull it off but experience makes me doubt that very much.

    BTW, if it takes like 3mm^2 for a super fast video encoder I ask my self, why wasn't this done before?
    Reply
  • duploxxx - Tuesday, September 14, 2010 - link

    first or not, doesn't really matter.

    who says AMD need's GPU turbo? If Liano really is a 400SP GPU it will knock any Intel GPU with or without turbo.

    If we see the first results of Anadtech review which seems to be a GT2 part it doesn't have a chance at all.

    core i5 is really castrated due to lack of HT, This is exactly where liano will fight against, with a bit less cpu power.
    Reply
  • B3an - Tuesday, September 14, 2010 - link

    Even if AMD's GPU in Liano is faster, intels GPU is finally decent and good enough for most people, but more importantly more people will care about CPU performance because most users dont play games and this GPU can more than easily handle HD video. And i'm sure SB will be faster than anything AMD has. Then throw in the AVX and i'd say Intel clearly have a better option for the vast majority of people, it just comes down to price now. Reply
  • B3an - Tuesday, September 14, 2010 - link

    Sorry, didnt mean AVX, i meant the hardware accelerated video encoding. Reply
  • bitcrazed - Tuesday, September 14, 2010 - link

    But it's not just about raw power - it's about power per dollar.

    If you've got $500 to spend on a mobo and CPU, where do you spend it? On a slower Intel platform or on a faster AMD platform?

    If AMD get their pricing right, they could turn this into a no-brainer decision, greatly increasing their sales.
    Reply
  • duploxxx - Tuesday, September 14, 2010 - link

    now here comes the issue with the real fanboys:

    "And i'm sure SB will be faster than anything AMD has."

    It's exactly price where AMD has the better option. It's people " known brand name" that keeps them at buying the same thing without knowledge... yeah lets buy a Pentium.
    Reply
  • takeulo - Wednesday, September 15, 2010 - link

    hahahahah yeah i agree AMD is the better option at all if i have the high budget i'll go for Insane i mean Intel but since im only "poor" and i cant afford it so i'll stick to AMD and my money worth it

    sorry for my bad english XD
    Reply
  • MySchizoBuddy - Monday, December 20, 2010 - link

    how do you know Intel GPU has reached good enough state (do you have benchmarks to support your hypothesis). they have been trying to reach this state for as long as i can remember.

    your good enough state might be very different that somebodies else's good enough state.
    Reply
  • bindesh - Tuesday, September 20, 2011 - link

    Your all doubts will be cleared after watching this video, and related once.

    http://www.youtube.com/watch?v=XqBk0uHrxII&fea...

    I am having 3 AMDs and 1 Intel, Believe me with the price of AMD CPUs, i can only get a celeron in Intel. Which cannot run NFS SHIFT. Or TIme Shift. But other hand, with AMD athlon, i have completed Devil May Cry 4 with decent speed. And the laptop costs 24K, Toshiba C650, psg xxxxx18 model. It has 360 GB SSD, ATI 4200HD.

    Can you get such price and performance with Intel?

    Best part is that i am running it with 800MHz cpu speed, with performance much much greater than 55K intel dual core laptop of my friend.
    Reply
  • vlado08 - Tuesday, September 14, 2010 - link

    Still no word ont the 23.976 FPS play back? Reply

Log in

Don't have an account? Sign up now