A More Efficient Architecture

GPUs, like CPUs, work on streams of instructions called threads. While high-end CPUs work on as many as 8 complicated threads at a time, GPUs keep tens of thousands of threads in flight at once.

The table below shows just how many threads each generation of NVIDIA GPU can have in flight at the same time:

GPU                     Fermi    GT200    G80
Max Threads in Flight   24576    30720    12288
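As a sanity check on the table, each total is the number of SMs on the full chip multiplied by the maximum number of threads a single SM can keep resident. The SM counts and per-SM limits below (16 × 1536 for Fermi, 30 × 1024 for GT200, 16 × 768 for G80) are the commonly published figures, not numbers taken from this article; treat this as a small arithmetic sketch, and the names `GPUS` and `threads_in_flight` as our own.

```python
# Max threads in flight = number of SMs x max resident threads per SM.
# Assumption: SM counts and per-SM limits are the published full-chip
# figures (GF100 Fermi, GT200 as in the GTX 280, G80 as in the 8800 GTX);
# they do not come from the article itself.
GPUS = {
    "Fermi": {"sms": 16, "threads_per_sm": 1536},
    "GT200": {"sms": 30, "threads_per_sm": 1024},
    "G80": {"sms": 16, "threads_per_sm": 768},
}


def threads_in_flight(sms: int, threads_per_sm: int) -> int:
    """Total threads the whole chip can keep resident at the same time."""
    return sms * threads_per_sm


if __name__ == "__main__":
    for name, spec in GPUS.items():
        print(f"{name:<6} {threads_in_flight(**spec):>6}")
```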

Fermi can't actually keep as many threads in flight as GT200. NVIDIA found that on GT200 the majority of compute workloads were bound by shared memory size, not thread count. So in Fermi the maximum thread count went down and shared memory size went up.
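A toy occupancy calculation shows why trading thread count for shared memory can pay off. The per-SM limits here are assumptions drawn from NVIDIA's published figures rather than from this article (16 KB of shared memory per SM on GT200, 48 KB per SM in Fermi's larger configuration, 8 resident blocks per SM on both), and the kernel (256-thread blocks using 6 KB of shared memory each) is purely hypothetical. The model also ignores allocation granularity and the small amount of shared memory GT200 uses for kernel arguments.

```python
# Toy occupancy model: how many threads a chip keeps resident when a
# kernel's shared-memory use, not the thread limit, is the bottleneck.
# Assumptions (not from the article): 16 KB shared memory per SM on GT200,
# 48 KB per SM on Fermi, 8 resident blocks per SM on both; allocation
# granularity and kernel-argument storage are ignored.
from dataclasses import dataclass


@dataclass(frozen=True)
class SmLimits:
    sms: int
    max_threads_per_sm: int
    shared_mem_per_sm: int  # bytes
    max_blocks_per_sm: int = 8


GT200 = SmLimits(sms=30, max_threads_per_sm=1024, shared_mem_per_sm=16 * 1024)
FERMI = SmLimits(sms=16, max_threads_per_sm=1536, shared_mem_per_sm=48 * 1024)


def resident_threads(limits: SmLimits, block_threads: int, smem_per_block: int) -> int:
    """Threads resident on the whole chip for one kernel configuration."""
    blocks_by_smem = limits.shared_mem_per_sm // smem_per_block
    blocks_by_threads = limits.max_threads_per_sm // block_threads
    blocks = min(blocks_by_smem, blocks_by_threads, limits.max_blocks_per_sm)
    return blocks * block_threads * limits.sms


if __name__ == "__main__":
    # Hypothetical kernel: 256-thread blocks, 6 KB of shared memory per block.
    for name, limits in (("GT200", GT200), ("Fermi", FERMI)):
        print(name, resident_threads(limits, block_threads=256, smem_per_block=6 * 1024))
```

Under these assumptions the hypothetical kernel keeps 15,360 threads resident on GT200 (shared memory allows only two blocks per SM) but 24,576 on Fermi, even though GT200's raw thread limit is higher.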

NVIDIA groups 32 threads into a unit called a warp (a term borrowed from weaving, where the warp is the set of parallel threads stretched on a loom). In GT200 and G80, half of a warp was issued to an SM every clock cycle. In other words, it takes two clocks to issue a full 32 threads to a single SM.
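The two-clock figure is just ceiling division of the warp size by the number of threads issued per clock. A minimal sketch; the function name is ours, not NVIDIA's:

```python
# Clocks needed to issue one warp when the front end sends a fixed number
# of threads per clock (16, i.e. half a warp, on GT200/G80 per the text).
WARP_SIZE = 32


def clocks_to_issue_warp(threads_per_clock: int, warp_size: int = WARP_SIZE) -> int:
    """Ceiling division: a partially filled last clock still costs a clock."""
    return -(-warp_size // threads_per_clock)


if __name__ == "__main__":
    print(clocks_to_issue_warp(16))  # half a warp per clock; prints 2
```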

In previous architectures, the SM dispatch logic was tightly coupled to the execution hardware. If you sent threads to the SFU, the entire SM couldn't issue new instructions until those instructions finished executing. If the SFUs were the only execution units in use, the vast majority of a GT200/G80 SM sat idle. That's terrible for efficiency.

Fermi fixes this. Each Fermi SM has two independent dispatch units at its front end, completely decoupled from the rest of the SM. Each dispatch unit can select and issue half of a warp every clock cycle, and the two can draw threads from different warps to improve the odds of finding independent operations.
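Why picking from different warps helps can be shown with a toy model. Each warp below is a list of instructions, where `True` means the instruction depends on the one just before it in the same warp. With two dispatch slots per clock, a pair taken from one warp must be independent, while a pair taken from two different warps never depends on itself. The timing assumptions (results ready one clock after issue, one instruction per slot per clock) are illustrative simplifications, not Fermi's real pipeline.

```python
# Toy dual-dispatch model. Each warp is a list of instructions; True means
# the instruction depends on the one just before it in the same warp.
# Assumptions (illustrative only): every result is ready one clock after
# issue, and each of the two dispatch slots issues one instruction per clock.
from collections import deque
from typing import List


def clocks_to_drain(warps: List[List[bool]], different_warps: bool) -> int:
    queues = [deque(w) for w in warps]
    clocks = 0
    while any(queues):
        clocks += 1
        if different_warps:
            # Fermi-style: the two slots take the next instruction from two
            # different warps, so the pair can never depend on each other.
            for q in [q for q in queues if q][:2]:
                q.popleft()
        else:
            # Both slots must come from one warp: the second instruction can
            # only go along if it does not depend on the first.
            q = next(q for q in queues if q)
            q.popleft()
            if q and not q[0]:
                q.popleft()
    return clocks


if __name__ == "__main__":
    chains = [[True] * 4, [True] * 4]  # two warps of fully dependent code
    print(clocks_to_drain(chains, different_warps=False))  # prints 8
    print(clocks_to_drain(chains, different_warps=True))   # prints 4
```

With two warps of purely dependent code, pairing within one warp never finds a second instruction to issue, while pairing across warps fills both slots every clock.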

There's a full crossbar between the dispatch units and the execution hardware in the SM. Each dispatch unit can send threads to any group of execution units within the SM (with some limitations).

The remaining inflexibility in NVIDIA's threading architecture is that every thread in a warp must execute the same instruction at the same time. If they do, you get full utilization of your resources; if they don't, some units go idle.
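The cost of threads in a warp disagreeing can be estimated with a simple model of an if/else branch. We assume, as a simplification, that the warp runs every path at least one thread takes, one instruction per clock across all 32 lanes, with non-participating threads masked off; path lengths are counted in instructions, and the function name is hypothetical.

```python
# Toy SIMD-utilization model for a warp that hits an if/else.
# Assumption: the warp runs each path that at least one thread takes,
# one instruction per clock for all 32 lanes, with non-participating
# threads masked off (idle). Path lengths are in instructions.
WARP_SIZE = 32


def branch_utilization(threads_taking_if: int, if_len: int, else_len: int) -> float:
    """Fraction of lane-clocks that do useful work across both paths."""
    if not 0 <= threads_taking_if <= WARP_SIZE:
        raise ValueError("thread count must be between 0 and the warp size")
    if if_len <= 0 or else_len <= 0:
        raise ValueError("path lengths must be positive")
    threads_taking_else = WARP_SIZE - threads_taking_if
    useful = threads_taking_if * if_len + threads_taking_else * else_len
    clocks = (if_len if threads_taking_if else 0) + (else_len if threads_taking_else else 0)
    return useful / (clocks * WARP_SIZE)


if __name__ == "__main__":
    print(branch_utilization(32, 10, 10))  # all threads agree: prints 1.0
    print(branch_utilization(31, 10, 10))  # one thread diverges: prints 0.5
    print(branch_utilization(31, 10, 2))   # short else path: prints 0.8125
```

In this model a single divergent thread is enough to halve utilization when both paths are the same length, which is why the same-instruction requirement matters.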

A single SM can execute:

Fermi SM         FP32    FP64    INT    SFU    LD/ST
Ops per clock    32      16      32     4      16

If you're executing FP64 instructions, the entire SM can only run at 16 ops per clock; you can't dual-issue FP64 and SFU operations.
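Dividing the 32-thread warp by the per-clock rates in the table gives how many clocks of SM throughput one warp instruction of each type costs; an SFU warp, for example, needs 32 / 4 = 8 clocks. The pairing rule below is an assumption consistent with the text: FP64 is treated as issuing alone, since it caps the whole SM at 16 ops per clock, and every other pair is treated as legal, which is a simplification.

```python
# Per-SM throughput from the table above, in thread-operations per clock.
WARP_SIZE = 32
OPS_PER_CLOCK = {"FP32": 32, "FP64": 16, "INT": 32, "SFU": 4, "LD/ST": 16}


def clocks_per_warp(op: str) -> int:
    """SM clocks of throughput one 32-thread warp instruction of this type consumes."""
    return -(-WARP_SIZE // OPS_PER_CLOCK[op])


def can_dual_issue(a: str, b: str) -> bool:
    """Assumption: FP64 issues alone; all other pairs are treated as legal."""
    return "FP64" not in (a, b)


if __name__ == "__main__":
    for op in OPS_PER_CLOCK:
        print(op, clocks_per_warp(op))
    print(can_dual_issue("FP32", "SFU"))  # prints True
    print(can_dual_issue("FP64", "SFU"))  # prints False
```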

The good news is that the SFU doesn't tie up the entire SM anymore. One dispatch unit can send 16 threads to the array of cores, while another can send 16 threads to the SFU. After two clocks, the dispatchers are free to send another pair of half-warps out again. As I mentioned before, in GT200/G80 the entire SM was tied up for a full 8 cycles after an SFU issue.
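To put a rough number on the difference, count the dispatcher-clocks left over for non-SFU work in the 8 cycles after an SFU issue. This is a toy model with loudly simplified assumptions (listed in the comments), not a cycle-accurate simulation of either chip; the function and parameter names are hypothetical.

```python
# Toy issue-slot count for the window after one SFU instruction is issued.
# Assumptions: each dispatcher issues one half-warp per clock, independent
# non-SFU work is always waiting, GT200/G80 blocks all issue for 8 cycles
# after an SFU issue (per the text), and on Fermi the dispatcher that sent
# the SFU work is busy for 2 clocks while the other one keeps going.
def non_sfu_issue_slots(window: int, dispatchers: int,
                        blocked_clocks: int = 0,
                        sfu_dispatch_clocks: int = 2) -> int:
    """Dispatcher-clocks in the window left over for non-SFU half-warps."""
    if blocked_clocks:
        # GT200/G80 style: the whole SM stalls, so nothing else issues
        # until the blocked period ends.
        return dispatchers * max(0, window - blocked_clocks)
    # Fermi style: only the dispatcher that sent the SFU work is occupied.
    return dispatchers * window - min(window, sfu_dispatch_clocks)


if __name__ == "__main__":
    print(non_sfu_issue_slots(8, dispatchers=1, blocked_clocks=8))  # GT200/G80: prints 0
    print(non_sfu_issue_slots(8, dispatchers=2))                    # Fermi: prints 14
```

Under these assumptions GT200/G80 gets no other half-warps out during those 8 cycles, while Fermi still has 14 dispatcher-clocks available for other work.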

The new flexibility is nice; or rather, the inflexibility of GT200/G80 was horrible for efficiency, and Fermi fixes that.

415 Comments

  • Moricon - Thursday, October 1, 2009

    Silicondoc you are a big GREEN C*OK lol it even rhymes!
  • Alberto - Thursday, October 1, 2009

    The GPGPU market is very small... it doesn't make good revenue for a company. Moreover, the software development is HARD and expensive; this piece of silicon looks like an NVidia mistake: too big to manufacture for a graphics card, nice (on paper) for a niche market.
    In comparison, AMD is doing far better, and Intel too with Larrabee.
    Hard times on the horizon for Nvidia: it has a monster with very low manufacturing yields, but nothing feasible for the consumer arena.
    A prediction? AMD will have the lead in graphics cards in the coming years.....
  • SiliconDoc - Thursday, October 1, 2009

    The yields are fine; apparently you caught wind of the ati pr crew who was caught out lying.
    If not, you just made your standard idiot assumption, because the actual FACTS concerning these two latest 40nm chips are that ati yields have been very poor, and nvidia's have been good.
    ---
    Nice try, but you're wrong.

    " Scalable Informatics has been selling NVidia Tesla (C1060) units as part of our Pegasus-GPU and Pegasus-GPU+Cell units. Several issues have arisen with Tesla availability and pricing.

    First issue: Tesla units are currently on a 4-8 week back order. We have no control over this, all other vendors have the exact same issues. NVidia is not shipping Tesla in any appreciable volumes.

    Our workaround: Until NVidia is able to ramp its volume to appropriate levels, Scalable Informatics will provide loaner GTX260 cards in place of the Tesla units. Once the Tesla units ship, we will exchange the GTX260 units for the Tesla units.

    Update: 1-September-2009

    Tesla C1060 units are now readily available for Pegasus and JackRabbit systems.
    ---
    NOW THE PRICING

    " Scalable Informatics JackRabbit systems are available in deskside and rackmount configurations, starting at 8 TB (TeraByte) in size, with individual systems ranging from 8TB to 96TB, and storage clusters up to 62 PB (PetaByte), with most systems starting price under $1USD/GB."

    So, an 8TB system is 8 grand, 96TB 96 grand, and a 62 petabyte in the approaching one MILLION range.
    http://www.scalableinformatics.com/catalog

    Yes, not much there. LOLOLOL
    --
    POWER SAVINGS replacing massive cpu computers
    --
    The BNP Paribas (finance) study showed a $250,000 500 core cluster (37.5 kW) replaced with a 2 S1070 Tesla cluster at a cost of $24,000 and using only 2.8 kW. A study with oil and gas company Hess showed an $8M 2000-socket system (1.2Mw) being replaced by a 32 S1070 cluster for $400,000 and using only 45 kW in 31x less space. If you are running a CUDA-enabled application, or have access to the source code (you’ll need that to take advantage of the GPUs), you can clearly get significant performance gains for certain applications.
    -
    about 4 TFLOPS of peak from four C1060 cards (or 3 C1060 and a Quadro) and plugs into a standard wall outlet. Word from some of those selling this system is that sales have been mostly in the academic space and a little slower than expected, possibly due to the initially high ($10k+) price point. Prices have started to come down, however, and that might help sales. You can buy these today from vendors like Dell, Colfax, AMAX, Microway, and Penguin (for a partial list see NVIDIA’s PS product page).
    -
    ---

    And, of course you predict amd will have the lead in videocards the next few years. LOL
    bhwahahahahaaaaaaaaaaaaaaaaaa
  • thebeastie - Thursday, October 1, 2009

    Personally I think NVidia has made the best bet it can make by supporting more Tesla-style stuff, and in general just building a bigger, madder GPU.

    The fact is that there aren't many good PC games around. I would say NVidia made some good sales out of Crysis by itself, with people building a new PC with that game in mind putting a lot of weight on GPU choice.

    But it is just not enough. L4D 2 is the next big title, but since it's on the Valve engine, everyone knows you will get 100fps on a GTX 275.
    The other twist is that Steam has probably been one of the best things for gaming on the PC; it just makes things 10 times easier.

    Manually patching games etc is a killer for all but those who are gaming enthusiasts.
  • Dante80 - Thursday, October 1, 2009

    GT300 looks like a revolutionary product as far as HPC and GPU Computing are concerned. Happy times ahead, for professionals and scientists at least...

    Regarding the 3d gaming market though, things are not as optimistic. GT300 performance is rather irrelevant, due to the fact that nvidia currently does not have a speedy answer for the discrete, budget, mainstream and lower performance segments. Price projections aside, the GT300 will get the performance crown, and act as a marketing boost for the rest of the product line. Customers in the higher performance and enthusiast markets that have brand loyalty towards the greens are locked in anyway. And yes, that's still irrelevant.

    Remember ppl, the profit and bulk of the market is in a price segment nvidia does not even try to address currently. We can only hope that the greens can get something more than GT200 rebranding/respins out for the lower market segments. Fast. Ideally, the new architecture should be easy to scale down. Let's hope for that, or it's definitely rough times ahead for nvidia. Especially if you look closely at the 5850 performance per $ ratio, as well as the Juniper projections. And add in the economic crisis, shifting consumer focus, the gap between the performance software needs and the performance the hardware delivers, the locking of TFT resolutions and heat/power consumption concerns.

    With AMD getting the whole 5XXX family out of the warehouses in under 6 months (I think that's a first for the GPU industry, I might be wrong though), the greens are in a rather tight spot atm. GT200 respins won't save the round, GT300 @500$++ won't save the round, and Tesla certainly won't save the round (just look at sales and profit in recent years in the HPC-GPUCU segments).

    Let's hope for the best, it's in our interest as consumers anyway..
  • blindbox - Thursday, October 1, 2009

    I'm sorry, but I couldn't resist.

    The Adventures of SiliconDoc.

    NVIDIA GeForce GTS 250: A Rebadged 9800 GTX+
    http://www.anandtech.com/showdoc.aspx?i=3523

    ATI Radeon HD 4890 vs. NVIDIA GeForce GTX 275
    http://www.anandtech.com/video/showdoc.aspx?i=3539...

    AMD's Radeon HD 5870: Bringing About the Next Generation Of GPUs
    http://www.anandtech.com/video/showdoc.aspx?i=3643...

    The Radeon HD 4870 1GB: The Card to Get
    http://www.anandtech.com/showdoc.aspx?i=3415

    Overclocking Extravaganza: GTX 275's Complex Characteristics
    http://www.anandtech.com/video/showdoc.aspx?i=3575

    NVIDIA GeForce GTX 295: Leading the Pack
    http://www.anandtech.com/showdoc.aspx?i=3498&p...

    Faster Graphics For Lower Prices: ATI Radeon HD 4770
    http://www.anandtech.com/video/showdoc.aspx?i=3553...

    Of course, check the comments.

    I couldn't find his comments in the 4870x2 review, nor the pre-DX10 days.
  • tamalero - Friday, October 2, 2009

    this guy is such an epic trainwreck....
    I actually wonder if this guy is the ANGRY GERMAN KID in disguise ( check the video on youtube lol )
  • Docket - Thursday, October 1, 2009

    Yep SiliconDoc has been making same nonsense noise elsewhere as well and been banned at least from one other site (google silicondocs):

    http://forums.bit-tech.net/showthread.php?p=203896...
    Here extract from bit-tech staff:
    -----------
    OK time for you to go, you contribute nothing to the community other than trolling, bye bye.

    I'm leaving all your posts here for evidence that you're a complete lunatic, but I'm glad you realise that you do need help. It's the first step.

    I recommend checking out Nvidia forums and posting there - you'll feel more at home.
    -----------

    S/He is obviously a retarded person. I mean initially it was "fun" to read but now I'm just so bored with this s*it and it is actually interfering while trying to read comments from other readers. Maybe that is the whole point of the noise; to side track any meaningful conversation.

    I vote for silicondoc to be banned from this site (or give me the ability to filter all posts from and related to this user)... anyone else?

  • SiliconDoc - Thursday, October 1, 2009

    It seems to me, you wish to remain absolutely blind with your fuming hatred and emotional issues, let's see WHAT was supposedly said at your link:

    " Originally Posted by wuyanxu
    nVidia is trying very hard to NOT loose this round, they've priced this too aggressively, surely there's some cooperate law on this? "
    ---
    Here we see the brain deranged red rooster, who has been deceived by the likes of you know who for so long that a low priced Nvidia card that beats the ati card must be "illegally priced", according to the little red communist overseas.
    I suppose pointing that out in a fashion you and your glorious roosters don't like, is plenty reason for you to shriek "contributes nothing" and "let's ban him!"
    Well, fire up your glowing red torches, and I will gladly continue to show what fools red roosters can be, and often are.
    I'm so happy you linked some silicondoc post on some other forum, and we had the chance to see the deranged red rooster screech that a low priced Nvidia GTX275 is illegal.
    --
    Good for you, you're such a big help here.
  • strikeback03 - Thursday, October 1, 2009

    would be nice. I was wondering when I saw this article how it could have 140 comments already, forgetting he was sure to come trolling. I've stopped reading each comment thread after he got involved, since any chance of reliable information coming out has ceased.