NVIDIA's Fermi: Architected for Tesla, 3 Billion Transistors in 2010

Author: Anand Lal Shimpi

by Anand Lal Shimpi on September 30, 2009 12:00 AM EST

Posted in
GPUs

A More Efficient Architecture

GPUs, like CPUs, work on streams of instructions called threads. While high-end CPUs work on as many as 8 complicated threads at a time, GPUs handle many more threads in parallel.

The table below shows just how many threads each generation of NVIDIA GPU can have in flight at the same time:

	Fermi	GT200	G80
Max Threads in Flight	24576	30720	12288

Fermi can't actually support as many threads in parallel as GT200. NVIDIA found that the majority of compute cases were bound by shared memory size, not thread count in GT200. Thus thread count went down, and shared memory size went up in Fermi.
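
The totals in the table can be reproduced from per-SM limits. Here is a minimal sketch in Python, assuming the SM counts and per-SM thread limits from NVIDIA's published specifications (those figures are not stated on this page, and the names `GPUS`, `threads_in_flight` and `warps_per_sm` are purely illustrative):

```python
# Assumed figures: SM counts and per-SM thread limits come from NVIDIA's
# published specifications, not from this article.
GPUS = {
    "G80":   {"sms": 16, "max_threads_per_sm": 768},
    "GT200": {"sms": 30, "max_threads_per_sm": 1024},
    "Fermi": {"sms": 16, "max_threads_per_sm": 1536},
}

WARP_SIZE = 32  # threads per warp, as described in the article


def threads_in_flight(gpu):
    """Maximum resident threads across the whole chip."""
    spec = GPUS[gpu]
    return spec["sms"] * spec["max_threads_per_sm"]


def warps_per_sm(gpu):
    """Maximum resident warps on a single SM."""
    return GPUS[gpu]["max_threads_per_sm"] // WARP_SIZE


for name in GPUS:
    print(name, threads_in_flight(name), warps_per_sm(name))
```

Under these assumptions, Fermi holds more threads per SM than GT200 but has fewer SMs, which is why the chip-wide total drops.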

NVIDIA groups 32 threads into a unit called a warp (taken from the weaving term warp, the set of parallel threads stretched on a loom). In GT200 and G80, half of a warp was issued to an SM every clock cycle. In other words, it took two clocks to issue a full 32 threads to a single SM.

In previous architectures, the SM dispatch logic was closely coupled to the execution hardware. If you sent threads to the SFU, the entire SM couldn't issue new instructions until those instructions were done executing. If the only execution units in use were in your SFUs, the vast majority of your SM in GT200/G80 went unused. That's terrible for efficiency.

Fermi fixes this. There are two independent dispatch units at the front end of each SM in Fermi. These units are completely decoupled from the rest of the SM. Each dispatch unit can select and issue half of a warp every clock cycle. The threads can be from different warps in order to optimize the chance of finding independent operations.

There's a full crossbar between the dispatch units and the execution hardware in the SM. Each unit can dispatch threads to any group of units within the SM (with some limitations).

The inflexibility of NVIDIA's threading architecture is that every thread in the warp must be executing the same instruction at the same time. If they are, then you get full utilization of your resources. If they aren't, then some units go idle.
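
To put a rough number on what idle units cost, here is a toy model in Python. It assumes that when threads in a warp take different paths, the hardware runs each distinct path in turn with the other lanes masked off. That is a simplification chosen for illustration, not a description of NVIDIA's actual scheduling, and the function and parameter names are hypothetical:

```python
WARP_SIZE = 32  # threads per warp, as described in the article


def simt_utilization(thread_paths, path_lengths):
    """Fraction of lane-slots doing useful work for one warp.

    thread_paths: list of 32 path labels, one per thread.
    path_lengths: dict mapping each path label to its instruction count.
    Assumption: every distinct path is issued to all 32 lanes, and lanes
    whose thread is not on that path sit idle.
    """
    if len(thread_paths) != WARP_SIZE:
        raise ValueError("a warp has exactly 32 threads")
    distinct = set(thread_paths)
    # Lane-slots issued: each distinct path occupies the full warp width.
    issued = sum(path_lengths[p] for p in distinct) * WARP_SIZE
    # Lane-slots that did real work: each thread runs only its own path.
    useful = sum(path_lengths[p] for p in thread_paths)
    return useful / issued


lengths = {"a": 10, "b": 10}
print(simt_utilization(["a"] * 32, lengths))               # all threads agree
print(simt_utilization(["a"] * 16 + ["b"] * 16, lengths))  # even split
print(simt_utilization(["a"] * 31 + ["b"], lengths))       # one straggler
```

In this model a single thread taking a different path of equal length costs as much as an even split, because both paths still occupy the full warp width.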

A single SM can execute:

Fermi	FP32	FP64	INT	SFU	LD/ST
Ops per clock	32	16	32	4	16

If you're executing FP64 instructions the entire SM can only run at 16 ops per clock. You can't dual issue FP64 and SFU operations.
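
The table can also be read as the number of clocks each unit type needs to work through one full warp at peak rate. Here is a minimal sketch in Python, assuming throughput is the only limit (it ignores latency, operand availability and the dispatch restrictions mentioned above); the names are illustrative:

```python
import math

WARP_SIZE = 32  # threads per warp

# Peak per-SM rates copied from the table above.
FERMI_SM_OPS_PER_CLOCK = {
    "FP32": 32,
    "FP64": 16,
    "INT": 32,
    "SFU": 4,
    "LD/ST": 16,
}


def clocks_per_warp(op):
    """Clocks for one SM to execute one instruction for all 32 threads."""
    return math.ceil(WARP_SIZE / FERMI_SM_OPS_PER_CLOCK[op])


for op in FERMI_SM_OPS_PER_CLOCK:
    print(op, clocks_per_warp(op))
```

This measures how long a unit type stays busy with one warp's instruction. It says nothing about when the dispatch units themselves become free to issue again.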

The good news is that the SFU doesn't tie up the entire SM anymore. One dispatch unit can send 16 threads to the array of cores, while another can send 16 threads to the SFU. After two clocks, the dispatchers are free to send another pair of half-warps out again. As I mentioned before, in GT200/G80 the entire SM was tied up for a full 8 cycles after an SFU issue.

The flexibility is nice, or rather, the inflexibility of GT200/G80 was horrible for efficiency and Fermi fixes that.

415 Comments

- Friday, October 2, 2009 - link
You are all talking too much about technologies. Who cares about this? DX11 from ATI is already available in Japan and they are selling like sex dolls. And why didn't NVIDIA provide any benchmarks? Perhaps the drivers aren't ready, or NVIDIA doesn't even know at what clock speed this monster can run without exhausting your PC's power supply. Fermi is not here yet; it is a concept, not a product. ATI will cash in and NVIDIA can only look on. And when the Fermi monster finally arrives, ATI will roll out the 5890 and X2 in the luxury class and some other products in the 100-dollar class. NVIDIA will always be a few months late and ATI will get the business. It is that easy. Who wants all this CUDA stuff? Some number crunching in the science field, OK. But if it were for PhysX, an add-on board would do. In reality there was never any run on PhysX. Why should this boom come now? I think NVIDIA bet on the wrong card and they will suffer heavily for this wrong decision. They would have been better off buying VIA or its CPU division instead of PhysX. PhysX is not a standard architecture and never will be. In contrast, ATI is doing just what gamers want, and this is where the money is. Where are the gaming benchmarks for Fermi? NVIDIA is over!
- Friday, October 2, 2009 - link
With all this CUDA and PhysX stuff NVIDIA will have 20-30% more power consumption at any price point and up to 50% higher production costs because of their much bigger die size. ATI will lower the price whenever necessary in order to beat NVIDIA in the marketplace! And when will NVIDIA arrive? Yesterday we didn't see even a paper launch! It was the announcement of a paper launch, maybe in late December, but the cards won't be available until late Q1 2010, I guess. They are so far out of the business, but most people do not realize this.
Ahmed0 - Friday, October 2, 2009 - link
I know for sure SD is from Illinois (his online profiles which are related to his rants [which in turn are related to each other] point to it).

So, I'm going to go out on a limb here and suggest that SiliconDoc was/is this guy:

http://www.automotiveforums.com/vbulletin/member.p...

A little googling might (or might not) support the fact that he is a loony. Just type "site:forums.sohc4.net silicondoc" and you'll find he has quite a reputation there (different site, but it seems to be the same profile, "handwriting" and same bike).

And that MIGHT lead us to the fact that he MIGHT actually be (currently) 45 and not a young raging teenage nerd called Brian.

Of course... this is just some fun guesswork I did (it's all just oh so entertaining).
Ahmed0 - Friday, October 2, 2009 - link
Well... either that or all users called SiliconDoc are arsholes.
k1ckass - Friday, October 2, 2009 - link
I guess silicondoc would eat **** if nvidia says that it tastes good, LOL.

btw, the Fermi card shown appears to be fake...
http://www.semiaccurate.com/2009/10/01/nvidia-fake...

and btw, I use an nvidia gtx, and will probably get an hd5870 next week because of all this crap nvidia throws at its consumers.
Pastuch - Friday, October 2, 2009 - link
Below is an email I got from Anand. Thanks so much for this wonderful site.

-------------------------------------------------------------------
Thank you for your email. SiliconDoc has been banned and we're accelerating the rollout of our new comments rating/reporting system as a result of him and a few other bad apples lately.

A-
tamalero - Saturday, October 3, 2009 - link
about time, was getting boring with the constant "bubba, red roosters, morons..etc.."
sigmatau - Friday, October 2, 2009 - link
....
...
SiliconDoc getting banned.... PRICELESS.
PorscheRacer - Friday, October 2, 2009 - link
So it's safe now to post again? Many thanks have to go to Anand for cleaning up the virus that has infected these comments. I mean, it's new tech. Aren't we free to postulate about what we think is going on, discuss our thoughts and feelings without fear of some person trolling us down till we can't breathe? It feels better in here now, so thanks again.
Mr Perfect - Saturday, October 3, 2009 - link
It looks like it's safe... After about 37 pages.

Good job though, it's actually been worse in Anandtech comments than it usually is on Daily Tech! Now that's saying something...