Efficiency Gets Another Boon: Parallel Kernel Support

In GPU programming, a kernel is the function or small program running across the GPU hardware. Kernels are parallel in nature and perform the same task(s) on a very large dataset.

Typically, companies like NVIDIA don't disclose their hardware limitations until a developer bumps into one of them. In GT200/G80, the entire chip could only be working on one kernel at a time.

When dealing with graphics this isn't usually a problem. There are millions of pixels to render. The problem is wider than the machine. But as you start to do more general purpose computing, not all kernels are going to be wide enough to fill the entire machine. If a single kernel couldn't fill every SM with threads/instructions, then those SMs just went idle. That's bad.


GT200 (left) vs. Fermi (right)

Fermi, once again, fixes this. Fermi's global dispatch logic can now issue multiple kernels in parallel to the entire system. At more than twice the size of GT200, the likelihood of idle SMs went up tremendously. NVIDIA needs to be able to dispatch multiple kernels in parallel to keep Fermi fed.

Application switch time (moving between GPU and CUDA mode) is also much faster on Fermi. NVIDIA says the transition is now 10x faster than GT200, and fast enough to be performed multiple times within a single frame. This is very important for implementing more elaborate GPU accelerated physics (or PhysX, great ;)…).

The connections to the outside world have also been improved. Fermi now supports parallel transfers to/from the CPU. Previously CPU->GPU and GPU->CPU transfers had to happen serially.

A More Efficient Architecture ECC, Unified 64-bit Addressing and New ISA
Comments Locked

415 Comments

View All Comments

  • Kougar - Friday, October 2, 2009 - link

    Hey Anand:

    Just wanted to say thanks for the article. Love the quotes and behind-the-scene views, and in general the ever so informative articles like this that just can't be found elsewhere. So, thank you!
  • bobvodka - Friday, October 2, 2009 - link

    Someone earlier askes if supporting doubles was going to waste silicon, I don't think it will.

    If you look at the through put numbers and the fact that FP64 is half that of FP32 with the SFU disabled I suspect what is going on is that the FP64 calculations are being done by 2 cores at once with the SFU being involved in some way (given how it is decoupled from the cores there is no apprent good reason why the SFU should be disabled during FP64 operation).

    A comment was also made re:ECC memory.
    I suspect this wont make it to the consumer board; there is no good reason to do so and it would just cost silicon and power for a feature users don't need.

  • Zool - Friday, October 2, 2009 - link

    Maybe the consumer board wont hawe ECC but it will be still in the silicon (disabled). I dont think that they will produce two different silicons just becouse of ECC.
  • bobvodka - Friday, October 2, 2009 - link

    hmmm, you are probably right on that score and that might aid yield if they can turn it off as any faults in the ECC areas could be safely ignored.

    Chances of them using ECC ram on the boards themselves I would have said was zero simply due to cost :)
  • halcyon - Friday, October 2, 2009 - link

    Same foundry, same process, much more transistors....

    Based on roughly extrapolating scaling from the RV870, how much bigger power draw would this baby have?

    The dollar draw from my wallet is going to be really powerful, that's for sure, but how about power?



  • deeper - Friday, October 2, 2009 - link

    Well, not only is the GT300 months away but it looks like the card they showed off is a fake anyhoo, check it out at Charlie Demerjian's www.semiaccurate.com
  • Zool - Friday, October 2, 2009 - link

    Could you pls delete majority of SiliconDoc replies and than this after them. Its embarassing to read them.
  • Pirks - Friday, October 2, 2009 - link

    I call BS. How many people have 2560x1600 30-inchers? Two? Three? Main point - resolutions are _VERY_ far from being stagnated, they have SOOOOOOOOO _MUCH_ room for growth until 2560x1600 which right now covers maybe 1% of the PC gaming market. 90% of PC gamers still use low-res 1680x1050 if not less (I for one have 1400x1050, yeah shame on me, I don't want to spend $800 on hi-end SLI setup just to play Crysis in all its hi-res beauty, for.get.it.)

    Shame Anand, real shame.

    Otherwise top notch quality stuff, as always with Ananad.
  • bigboxes - Friday, October 2, 2009 - link

    1680x1050 = low res??? Seriously? That's hi-def bro. I understand you can do better, but for my 20" widescreen it is definitley hi-def.
  • JarredWalton - Friday, October 2, 2009 - link

    I believe what you describe is exactly what is meant by stagnation. From Merriam-Webster: "To become stagnant." Stagnant: "Not advancing or developing." So yeah, I'd say that pretty much sums up display resolutions: they're not advancing.

    Is that bad? Not necessarily, especially when we have so many applications that do thing based purely on the wonderful pixel instead of on points or DPI. I use a 30" LCD, and I love the extra resolution for working with images, but the text by default tends to be too small. I have to zoom to 150% in a lot of apps (including Firefox/IE) to get what I consider comfortably readable text. I would say that 2560x1600 on a 30" LCD is about as much as I see myself needing for a good, looooong time.

Log in

Don't have an account? Sign up now