A More Efficient Architecture

GPUs, like CPUs, work on streams of instructions called threads. While high-end CPUs work on as many as 8 complicated threads at a time, GPUs handle many more threads in parallel.

The table below shows just how many threads each generation of NVIDIA GPU can have in flight at the same time:

                       Fermi   GT200   G80
Max Threads in Flight  24576   30720   12288

 

Fermi can't actually support as many threads in parallel as GT200. NVIDIA found that the majority of compute cases were bound by shared memory size, not thread count in GT200. Thus thread count went down, and shared memory size went up in Fermi.

NVIDIA groups 32 threads into a unit called a warp (a term borrowed from weaving, where the warp is the set of parallel threads stretched on a loom). In GT200 and G80, half of a warp was issued to an SM every clock cycle. In other words, it took two clocks to issue a full 32 threads to a single SM.
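To put numbers on that, here's a quick back-of-the-envelope sketch in Python. It uses only the figures quoted above (32 threads per warp, half a warp issued per clock on GT200/G80, and the thread counts from the table). The function and variable names are my own, purely for illustration.

```python
WARP_SIZE = 32          # threads per warp, as stated above
ISSUE_WIDTH_OLD = 16    # half a warp issued per clock on GT200/G80

# Max threads in flight, copied from the table above
MAX_THREADS = {"Fermi": 24576, "GT200": 30720, "G80": 12288}


def clocks_to_issue_warp(issue_width, warp_size=WARP_SIZE):
    """Clocks needed to issue one full warp at a given issue width."""
    return -(-warp_size // issue_width)  # ceiling division


def warps_in_flight(max_threads, warp_size=WARP_SIZE):
    """Whole warps that fit in a chip's thread budget."""
    return max_threads // warp_size


issue_clocks = clocks_to_issue_warp(ISSUE_WIDTH_OLD)
warps = {gpu: warps_in_flight(count) for gpu, count in MAX_THREADS.items()}

print("clocks to issue one warp:", issue_clocks)
for gpu, count in warps.items():
    print(gpu, "warps in flight:", count)
```

Dividing the thread counts by the warp size just restates the table in the unit the hardware actually schedules.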

In previous architectures, the SM dispatch logic was closely coupled to the execution hardware. If you sent threads to the special function units (SFUs), the entire SM couldn't issue new instructions until those instructions were done executing. If the only execution units in use were the SFUs, the vast majority of your SM in GT200/G80 went unused. That's terrible for efficiency.

Fermi fixes this. There are two independent dispatch units at the front end of each SM in Fermi. These units are completely decoupled from the rest of the SM. Each dispatch unit can select and issue half of a warp every clock cycle. The threads can be from different warps in order to optimize the chance of finding independent operations.

There's a full crossbar between the dispatch units and the execution hardware in the SM. Each dispatch unit can send threads to any group of execution units within the SM (with some limitations).

The one inflexibility in NVIDIA's threading architecture is that every thread in a warp must be executing the same instruction at the same time. If they are, then you get full utilization of your resources. If they aren't, then some units go idle.
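A toy model helps show what "some units go idle" costs. The Python sketch below is not how the hardware is implemented. It assumes, for simplicity, that a warp whose threads take different branches runs each distinct branch in its own pass, that every branch is equally long, and that threads not on the current branch sit idle. The function name and the path labels are hypothetical.

```python
WARP_SIZE = 32  # threads per warp, as stated above


def divergence_cost(paths):
    """Estimate the cost of a divergent warp.

    paths holds one label per thread, naming the branch that thread takes.
    Assumption: each distinct branch runs in its own pass and all branches
    take the same time. Returns (passes, utilization).
    """
    if len(paths) != WARP_SIZE:
        raise ValueError("expected one path label per thread in the warp")
    passes = len(set(paths))
    # Every thread does useful work in exactly one of the passes.
    utilization = WARP_SIZE / (passes * WARP_SIZE)
    return passes, utilization


uniform = divergence_cost(["a"] * 32)              # all threads agree
split = divergence_cost(["a"] * 16 + ["b"] * 16)   # warp splits in half

print("uniform warp:", uniform)
print("split warp:", split)
```

Under these assumptions a warp that splits evenly across two branches needs two passes and keeps only half of its lanes busy in each.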

A single SM can execute:

Fermi          FP32  FP64  INT  SFU  LD/ST
Ops per clock  32    16    32   4    16

 

If you're executing FP64 instructions, the entire SM can only run at 16 ops per clock. You can't dual-issue FP64 and SFU operations.
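One way to read the table is to ask how many clocks a single 32-thread warp instruction would occupy each unit type at its peak rate. The Python sketch below does that division. It is an idealized estimate that ignores latency, scheduling and the dual-issue restrictions mentioned above, and the names are mine.

```python
WARP_SIZE = 32  # threads per warp

# Per-SM ops per clock on Fermi, copied from the table above
FERMI_OPS_PER_CLOCK = {"FP32": 32, "FP64": 16, "INT": 32, "SFU": 4, "LD/ST": 16}


def clocks_per_warp_instruction(unit):
    """Clocks for one warp-wide instruction at the unit's peak rate."""
    return -(-WARP_SIZE // FERMI_OPS_PER_CLOCK[unit])  # ceiling division


clocks = {unit: clocks_per_warp_instruction(unit) for unit in FERMI_OPS_PER_CLOCK}

for unit, n in clocks.items():
    print(unit, "clocks per warp instruction:", n)
```

The SFU row is the outlier: with only 4 ops per clock, a full warp takes 8 clocks, which is why tying up the whole SM during SFU work was so costly.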

The good news is that the SFU doesn't tie up the entire SM anymore. One dispatch unit can send 16 threads to the array of cores, while another can send 16 threads to the SFU. After two clocks, the dispatchers are free to send another pair of half-warps out again. As I mentioned before, in GT200/G80 the entire SM was tied up for a full 8 cycles after an SFU issue.
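To get a feel for why decoupling matters, here is a deliberately crude Python model. It compares a coupled SM, where nothing else issues while SFU work is in progress, with a decoupled SM, where core work and SFU work overlap. The 8-clock SFU figure comes from the text above. The 2-clock figure for core work is my assumption (one dispatch unit feeding 16 cores), and using the same numbers for both cases is a simplification meant to isolate the effect of decoupling, not a model of real GT200 timing.

```python
SFU_CLOCKS_PER_WARP = 8  # the 8-cycle SFU figure quoted above
ALU_CLOCKS_PER_WARP = 2  # assumption: 32 threads over 16 cores, one dispatcher


def coupled_cycles(n_alu, n_sfu):
    """Serialized: the whole SM waits while SFU work is in progress."""
    return n_alu * ALU_CLOCKS_PER_WARP + n_sfu * SFU_CLOCKS_PER_WARP


def decoupled_cycles(n_alu, n_sfu):
    """Overlapped: core and SFU work run side by side, so the longer
    stream sets the total (assumes the two streams are independent)."""
    return max(n_alu * ALU_CLOCKS_PER_WARP, n_sfu * SFU_CLOCKS_PER_WARP)


# Hypothetical mix: 8 core warp instructions and 2 SFU warp instructions
coupled = coupled_cycles(n_alu=8, n_sfu=2)
decoupled = decoupled_cycles(n_alu=8, n_sfu=2)

print("coupled:", coupled, "clocks")
print("decoupled:", decoupled, "clocks")
```

With this made-up mix the overlapped case finishes in half the clocks. Real gains depend on how much independent work is available to fill the other dispatch unit.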

The flexibility is nice, or rather, the inflexibility of GT200/G80 was horrible for efficiency and Fermi fixes that.

415 Comments

  • palladium - Monday, October 5, 2009 - link

    Not quite:

    http://www.dailytech.com/article.aspx?newsid=16410

    Scroll down halfway thru the comments. He re-registered as SilicconDoc and barks about his hatred for red roosters (in an Apple-related article!)
  • johnsonx - Monday, October 5, 2009 - link

    that looks more like someone mocking him
  • - Sunday, October 4, 2009 - link

    According to this very link http://www.anandtech.com/video/showdoc.aspx?i=3573... AMD already presented WORKING SILICON at Computex roughly 4 months ago on June 3rd. So it took roughly 4 and a half months to prepare drivers, infrastructure and mass production to have enough for the start of Windows 7 and DX11. However, Nvidia wasn't even talking about W7 and DX11, so late Q1 2010 or even later becomes more realistic than December. But there are many more questions ahead: what price point, clock rates and TDP? My impression is that Nvidia has no clue about these questions, and the more I watch this development, the more Fermi resembles the Voodoo5 chip and the V6000 card, which never made it to market because of its much too high TDP.
  • silverblue - Sunday, October 4, 2009 - link

    Nah, I expect nVidia to do everything they can to get this into retail channels because it's the culmination of a lot of hard work. I also expect it to be a monster, but I'm still curious as to how they're going to sort out mainstream options due to their top-down philosophy.

    That's not to say ATI's idea of a mid-range card that scales up and down doesn't have its flaws, but with both the 4800 and 5800 series, there's been a card out at the start with a bona fide GPU with nothing disabled (4850, and now 5870), along with a cheaper counterpart with slower RAM and a slightly handicapped core (4830/5850). Higher spec single GPU versions will most likely just benefit from more and/or faster RAM and/or a higher core clock, but the architecture of the core itself will probably be unchanged - can nVidia afford to release a competing version of Fermi without disabling parts of the core? If it's as powerful as we're led to believe, it will certainly warrant a higher price tag than the 5870.
  • Ahmed0 - Saturday, October 3, 2009 - link

    Nvidia wants it to be the jack of all trades. However, they risk it being an overpriced master of none. That's probably the reason they give their cards more and more gimmicks to play with each year. They are hoping that the card's value will be greater than the sum of its parts. And that might even be a successful strategy to some extent. In a consumerist world, reputation is everything.

    They might start overdoing it at some point though.

    It's like mobile phones nowadays. You really don't need to have a radio, an mp3 player, a camera or other such extras in it (in fact, my phone isn't able to do anything but call and send messages). But unless you have these features, you aren't considered competition. It gives you the opportunity to call your product "vastly superior" even though from a usability standpoint it isn't.
  • SymphonyX7 - Saturday, October 3, 2009 - link

    Ahh... I see where you're coming from. I've had many classmates who've asked me what laptop to buy and they're always so giddy when they see laptops with the "Geforce" sticker and say they want it cause they want some casual gaming. Yes, even if the GPU is a Geforce 9100M. I recommended laptops using AMD's Puma platform, and many of them asked if that's a good choice (unfortunately here, only the Macbook has a 9400M GPU and it's still outside many of my classmates' budgets). Seems like brand awareness of Nvidia amongst many consumers is still much better than AMD/ATI's. So it's an issue of clever branding then?

  • Lifted - Saturday, October 3, 2009 - link

    A little late for any meaningful discussion over here as AT let the trolls go for 40 or so pages. I doubt many people can be arsed to sort through it now, so you'd be better off going to a forum for a real discussion of Fermi.
  • neomocos - Saturday, October 3, 2009 - link

    if you missed it then here you go ... happy day for all of us :

    quote from comment posted on page 37 by Pastuch

    " Below is an email I got from Anand. Thanks so much for this wonderful site.

    -------------------------------------------------------------------
    Thank you for your email. SiliconDoc has been banned and we're accelerating the rollout of our new comments rating/reporting system as a result of him and a few other bad apples lately.

    A- "
  • james jwb - Saturday, October 3, 2009 - link

    Some may enjoy it, but this unusual freedom that blatant trolls using aggressive, rude language are getting lately is making a mockery of this site.

    I don't mind it going on for a while, even 20 pages tbh, it is funny, but at some point i'd like to see a message from Gary saying, "K, SiliconDoc, we've laughed enough at your drivel, tchau, banned! :)"

    That's what i want to see after reading through 380 bloody comments, not that he's pretty much gotten away with it. And if he has finally been banned, i'd actually love to know about it in the comments section.

    /Rant over.
  • Gary Key - Monday, October 5, 2009 - link

    He is gone as are a couple of others. We have a new comments system in final development now that should take care of this problem in the future.
