The Rest of GF104

Besides adding superscalar dispatch abilities to GF104, NVIDIA has also made a number of other tweaks to the Fermi architecture for this GPU.

As a mid-range product, GF104 does not need to do 2 jobs at once. GF100 had to be usable as a desktop/professional graphics GPU, but also as a compute GPU for NVIDIA’s Tesla line of cards. GF104 will not be a Tesla product, so those compute abilities are not as critical. Specifically, NVIDIA has taken a chisel to Tesla’s flagship compute abilities of FP64 and ECC, which in GF100 desktop GPUs were artificially throttled and disabled respectively.

For GF104, ECC is completely gone. Barring the errant burst of solar radiation, the odds of a flipped bit or other error in the operation of a GPU are extremely slim. NVIDIA only added the feature for Tesla customers who demanded increased reliability, as they could not accept a silent error in their work. For graphics, however, this is unnecessary, so the feature has been dropped.

Double-precision floating-point (FP64), on the other hand, hasn’t been entirely dropped. Like ECC, FP64 is primarily a Tesla feature, but at the same time NVIDIA believes it is not in their best interests to remove the feature. From NVIDIA’s perspective, without FP64 on their consumer cards developers could not test and debug FP64 code on their desktops and laptops, which in turn would impede development for Tesla and hurt their efforts to expand into the professional compute space. As a result, GF104 has an interesting compromise on FP64.

For GF104, NVIDIA removed FP64 from only 2 of the 3 blocks of CUDA cores. As a result, 1 block of 16 CUDA cores is FP64 capable, while the other 2 are not. This gives NVIDIA the advantage of being able to employ smaller CUDA cores for 32 of the 48 CUDA cores in each SM while not removing FP64 entirely. Because only 1 block of CUDA cores has FP64 capabilities, and that block in turn executes FP64 instructions at 1/4 FP32 performance (handicapped from a native 1/2), GF104 will not be an FP64 monster. But the effective execution rate of 1/12th FP32 performance will be enough to program in FP64 and debug as necessary.
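The 1/12th figure follows directly from the numbers above: one third of the cores in an SM handle FP64, and each does so at a quarter of its FP32 rate. A minimal sketch of that arithmetic, where the function name is ours and purely illustrative:

```python
from fractions import Fraction


def effective_fp64_rate(fp64_cores, total_cores, per_core_rate):
    """FP64 throughput of one SM as a fraction of its FP32 throughput."""
    return Fraction(fp64_cores, total_cores) * per_core_rate


# Figures from the text: 16 of the 48 cores in an SM are FP64 capable,
# and those cores run FP64 at 1/4 of their FP32 rate.
gf104_rate = effective_fp64_rate(16, 48, Fraction(1, 4))

# Hypothetical comparison: the same block running at its native 1/2 rate.
unthrottled_rate = effective_fp64_rate(16, 48, Fraction(1, 2))

print(gf104_rate)        # 1/12
print(unthrottled_rate)  # 1/6
```

Even without the artificial handicap, the single FP64 block would cap the SM at 1/6th of its FP32 rate, so the layout itself accounts for most of the gap.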

Moving on, we have GF104’s texture units. GF100 was an interesting beast when it came to texturing, as it had texture units more efficient than GT200, but fewer of them overall. We don’t have any data that points to GF100 being absolutely deficient on texturing speeds, but at the same time it’s hard to imagine that GF100 was overbuilt to the point that losing 32 texture units wouldn’t hurt.

So for GF104, NVIDIA has doubled up on the number of texture units per SM. A “full” GF104 has the same number of texture units as GF100 (64) in half as many SMs. NVIDIA tells us that this change was made largely because texture units are small enough that they can be added without consuming too much additional die space, rather than because of a specific need such as lacking texture performance or having too little texture performance relative to shading performance. But this isn’t something we can prove or disprove. High-detail settings optimized for high-end cards often go heavy on anti-aliasing or shading as opposed to textures, so ultimately we’re not surprised that NVIDIA kept the texture unit count constant while reducing the shader count in moving from GF100 to GF104. The shaders will be missed much less than the texture units would have been.
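To put the texture-to-shader balance in numbers, here is a rough sketch comparing the two full dies. The 64 texture units and GF104's 48 cores per SM come from the text; the SM counts (16 and 8) and GF100's 32 cores per SM are assumed from the published full-die specifications rather than stated on this page.

```python
from fractions import Fraction

# Full-die configurations. SM counts and GF100's cores per SM are
# assumptions taken from published specs, not from this page.
CHIPS = {
    "GF100": {"sms": 16, "cores_per_sm": 32, "texture_units": 64},
    "GF104": {"sms": 8, "cores_per_sm": 48, "texture_units": 64},
}


def texture_summary(chip):
    """Return total cores, texture units per SM, and cores per texture unit."""
    cores = chip["sms"] * chip["cores_per_sm"]
    return {
        "cores": cores,
        "tex_per_sm": chip["texture_units"] // chip["sms"],
        "cores_per_tex": Fraction(cores, chip["texture_units"]),
    }


for name, chip in CHIPS.items():
    print(name, texture_summary(chip))
```

Under those assumptions each texture unit serves 8 CUDA cores on GF100 but only 6 on GF104, which is the shift toward texturing described above.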

Finally, we have the ROPs. There haven’t been any significant changes here, but the ROP count does affect compute performance by impacting memory bandwidth and L2 cache. Even though NVIDIA keeps the same number of SMs on both the 1GB and 768MB versions of the GTX 460, the latter will have less L2 cache, which may impact compute performance. Compute performance on the GTX 460 may also be impacted by pressure on the registers and L1 cache: NVIDIA increased the number of CUDA cores per SM, but not the size of the register file or the amount of L1 cache/shared memory, so there are now additional CUDA cores fighting for the same resources. In worst-case scenarios, this can hurt the efficiency of GF104 compared to GF100.
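A back-of-the-envelope view of that resource pressure, as a sketch only. The per-SM sizes used here (32,768 32-bit registers and 64 KB of combined L1/shared memory) are assumed from NVIDIA's published Fermi specifications, as is GF100's 32 cores per SM. Registers are really allocated per thread rather than per core, so the "share per core" is just a convenient proxy.

```python
# Per-SM resources, assumed identical on GF100 and GF104 as the text describes.
# The sizes themselves are assumptions from published Fermi specifications.
REGISTERS_PER_SM = 32 * 1024          # 32-bit registers
L1_SHARED_BYTES_PER_SM = 64 * 1024    # combined L1 cache / shared memory


def per_core_share(cores_per_sm):
    """Divide the fixed per-SM resources evenly across the SM's CUDA cores."""
    return {
        "registers": REGISTERS_PER_SM / cores_per_sm,
        "l1_shared_bytes": L1_SHARED_BYTES_PER_SM / cores_per_sm,
    }


gf100 = per_core_share(32)  # GF100: 32 cores per SM (assumed)
gf104 = per_core_share(48)  # GF104: 48 cores per SM (from the text)

ratio = gf104["registers"] / gf100["registers"]
print(f"GF104 per-core share relative to GF100: {ratio:.3f}")
```

Whatever the exact sizes, the ratio depends only on the core counts: each GF104 core has two thirds of the registers and L1/shared memory that a GF100 core has to work with.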

For those of you who are curious, with all of these SM changes between GF100 and GF104 the size of an SM did increase, but not by nearly as much as one would think: after adding the additional functional units, infusing the warp schedulers with superscalar dispatch capabilities, and removing unnecessary ECC and FP64 hardware, the size of an SM only increased by 25%. This is a tradeoff NVIDIA could not afford on the already massive GF100, but it made sense on GF104, where the performance increase could justify the extra die space.

93 Comments

  • jfelano - Monday, July 12, 2010

    I've already seen 5830's at $170 after rebate. So there goes that short lived Nvidia advantage.
  • itsmekirill - Monday, July 12, 2010

    IMO the most important story here is not that it beats the HD 5830 or GTX 465, but that the SLI configuration is trading blows with 5870 CF and 5970.

    For ~450 dollars you can get comparable if not superior performance to an $800 CF setup or a $650 5970.
  • tcnasc - Monday, July 12, 2010

    Yes, that's what got me too!

    My 5870 feels so expensive right now
    Should I sell it and buy 2 GTX 460?
  • fausto412 - Monday, July 12, 2010

    i didn't miss that...i got my 5870 3 weeks ago and i was like "wtf...this is a 200 dollar card, it shouldn't do that well!"
  • VIDYA - Monday, July 12, 2010

    good review but a bit partial towards the new born child(gtx 460, cant hold that kind of joy).......its pretty much still oranges and apples .....both are good and differ at a few games. Nvidia shouldn't have sold 480, 470, 465 and instead should be waiting back to mature the chip into 104gf.....think about the owners of gf100 chip reading this and cursing themselves for not holding back for a month. But all said and done from both sides, we all know that Nvidia is still no:1 when it comes to drivers and software support updates.
  • SongEmu - Monday, July 12, 2010

    The quality and depth of these articles is exactly why I keep Anandtech bookmarked.

    Also, it's good to see nVidia with its head on straight. I was afraid I'd have to give up CUDA and all those other goodies on my next upgrade, because there was no way in GF100-hell I was going to buy a GTX470 toaster.
  • sparkuss - Monday, July 12, 2010

    I thought I looked at all the charts and I also didn't see any mention in the conclusion of 5850 CF vs the 460 1GB. Then again I'm old and senile so I have that going for me!

    I know the price differences but I would still like to see the comparison, especially if this could actually cause AMD to lower 5850 prices.
  • 7Enigma - Tuesday, July 13, 2010

    Agreed. That would probably be the most common CF setup and certainly the most applicable from a cost-comparison standpoint. Sure it would be $600 vs. $400 but why have the 5870 CF which is even more crazy at $800 vs. $400?
  • rocky12345 - Monday, July 12, 2010

    I read the review & it was pretty good & well written. It is good to see Nvidia get their act together somewhat. I am not a Nvidia fan any more; I lost faith in them when they started rebranding the 8800 series over & over again & I switched to ATI after that. I still have a 9800GT 1GB in one of my systems which I rebadged myself from a 8800GT 1GB & clocked it at 755Mhz core & 2200mhz memory; it is fast enough for my secondary system for when friends come over to game.

    This new chip from nvidia makes a lot more sense than what they released a few months ago. Would I own one? Hell no. I already have enough money tied up in video cards in my main system as I own 2 4870x2 2GB cards highly over clocked crossfired; until ATI comes out with a single card that can beat my 2 beasts in quad GPU I am fine with what I have.
  • Belard - Monday, July 12, 2010

    Right on, the GTX 460-768 should be a 455, its so NOT the same card.

    For the most part... why is idiot-Nvidia even bothering with the "GTX" part since their model numbers don't collide? ie: there won't be a GTX 460 and a GTS 460 or a GT 460... well, maybe... who knows.

    The two 460's is designed EXACTLY to do what its going to do. People will buy the cheaper junk card and not get the performance they should get if they only spent $30 more.

    Considering the age of the ATI 5000 series... it really should be EASY for ATI to reduce the prices of the line a bit.

    The 5850 should be a $200 card by now. 5830 at $150, 5770 at $125... perhaps soon. 6000s come out just before Christmas?
