The Rest of GF104

Besides adding superscalar dispatch abilities to GF104, NVIDIA has also made a number of other tweaks to the Fermi architecture for this GPU.

As a mid-range product, GF104 does not need to do 2 jobs at once. GF100 had to be usable as a desktop/professional graphics GPU, but also as a compute GPU for NVIDIA’s Tesla line of cards. GF104 will not be a Tesla product, so those compute abilities are not as critical. Specifically, NVIDIA has taken a chisel to Tesla’s flagship compute abilities of FP64 and ECC, which in GF100 desktop GPUs were artificially throttled and disabled respectively.

For GF104, ECC is completely gone. Barring the errant burst of solar radiation, the odds of a flipped bit or other error in the operation of a GPU are extremely slim. NVIDIA only added the feature for Tesla customers who demanded increased reliability, as they could not accept a silent error in their work. For graphics, however, this is unnecessary, so the feature has been dropped.

Double-precision floating-point (FP64), on the other hand, hasn’t been entirely dropped. Like ECC, FP64 is primarily a Tesla feature, but NVIDIA believes it is not in its best interests to remove it. From NVIDIA’s perspective, without FP64 on consumer cards developers could not test and debug FP64 code on their desktops and laptops, which in turn would impede development for Tesla and hurt NVIDIA’s efforts to expand into the professional compute space. As a result GF104 has an interesting compromise on FP64.

For GF104, NVIDIA removed FP64 from only 2 of the 3 blocks of CUDA cores. As a result 1 block of 16 CUDA cores is FP64 capable, while the other 2 are not. This gives NVIDIA the advantage of being able to employ smaller CUDA cores for 32 of the 48 CUDA cores in each SM while not removing FP64 entirely. Because only 1 block of CUDA cores has FP64 capabilities, and that block in turn executes FP64 instructions at 1/4 FP32 performance (handicapped from a native 1/2), GF104 will not be an FP64 monster. With 1 of 3 blocks running at 1/4 rate, the effective execution rate works out to 1/12th FP32 performance, which will be enough to effectively program in FP64 and debug as necessary.
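The 1/12th figure follows directly from the numbers above. As a minimal sketch of that arithmetic (the function name is ours, purely for illustration, and the model assumes throughput simply scales with the share of FP64-capable cores):

```python
from fractions import Fraction

def effective_fp64_rate(cores_per_sm, fp64_cores, fp64_rate_on_capable):
    """FP64 throughput as a fraction of the SM's peak FP32 throughput."""
    # Share of the SM's CUDA cores that can execute FP64 at all
    capable_share = Fraction(fp64_cores, cores_per_sm)
    # Those cores run FP64 at a reduced rate relative to FP32
    return capable_share * fp64_rate_on_capable

# GF104: 48 CUDA cores per SM, 16 of them FP64 capable, running at 1/4 rate
gf104 = effective_fp64_rate(48, 16, Fraction(1, 4))
print(gf104)  # 1/12
```

At the native 1/2 rate the same block of 16 cores would deliver 1/6th of FP32 throughput, so the handicap halves what the hardware could otherwise do.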

Moving on, we have GF104’s texture units. GF100 was an interesting beast when it came to texturing, as it had texture units more efficient than GT200’s, but fewer of them overall. We don’t have any data that points to GF100 being absolutely deficient on texturing speeds, but at the same time it’s hard to imagine that GF100 was overbuilt to the point that losing 32 texture units wouldn’t hurt.

So for GF104, NVIDIA has doubled up on the number of texture units per SM. A “full” GF104 has the same number of texture units as GF100 (64) in half as many SMs. NVIDIA tells us that this change is largely because texture units are small enough that they can be added without consuming too much additional die space, rather than because of a specific need for more of them, such as lacking texture performance outright or having too little texture performance relative to shading performance. But this isn’t something we can prove or disprove. High-detail settings optimized for high-end cards often go heavy on anti-aliasing or shading as opposed to textures, so ultimately we’re not surprised that NVIDIA kept the texture unit count constant while reducing the shader count in moving from GF100 to GF104. The shaders will be missed much less than the texture units would have been.
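To put numbers on the shift in balance, here is a rough sketch comparing the "full" versions of both chips. The 64 texture units and 48 CUDA cores per GF104 SM come from the text; the SM counts (16 for GF100, 8 for GF104) and GF100's 32 CUDA cores per SM are assumptions taken from NVIDIA's published specifications rather than from this page.

```python
from fractions import Fraction

# name: (SMs, texture units, CUDA cores per SM)
# SM counts and GF100's cores per SM are assumed from public specs.
FULL_CHIPS = {"GF100": (16, 64, 32), "GF104": (8, 64, 48)}

def texture_ratios(sms, tex_units, cores_per_sm):
    """Return (texture units per SM, texture units per CUDA core)."""
    tex_per_sm = Fraction(tex_units, sms)
    tex_per_core = Fraction(tex_units, sms * cores_per_sm)
    return tex_per_sm, tex_per_core

for name, cfg in FULL_CHIPS.items():
    per_sm, per_core = texture_ratios(*cfg)
    print(f"{name}: {per_sm} texture units per SM, {per_core} per CUDA core")
```

Under those assumptions GF104 goes from one texture unit per 8 CUDA cores to one per 6, which is the tilt toward texturing described above.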

 

Finally, we have the ROPs. There haven’t been any significant changes here, but the ROP count does affect compute performance by impacting memory bandwidth and L2 cache. Even though NVIDIA keeps the same number of SMs on both the 1GB and 768MB versions of the GTX 460, the latter will have less L2 cache, which may impact compute performance. Compute performance on the GTX 460 may also be impacted by pressure on the registers and L1 cache: NVIDIA increased the number of CUDA cores per SM, but not the size of the Register File or the amount of L1 cache/shared memory, so there are now additional CUDA cores fighting for the same resources. In the worst-case scenarios, this can hurt the efficiency of GF104 compared to GF100.
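A back-of-the-envelope sketch shows how much tighter those shared resources get per CUDA core. The per-SM figures used here (a 32K-entry register file and 64KB of combined L1 cache/shared memory) and GF100's 32 CUDA cores per SM are assumptions based on NVIDIA's published Fermi specifications; only the 48-core figure comes from this article.

```python
REGISTERS_PER_SM = 32 * 1024  # assumed: 32K 32-bit registers per Fermi SM
L1_SHARED_KB_PER_SM = 64      # assumed: combined L1 cache/shared memory per SM

def per_core_share(cores_per_sm):
    """Return (registers, KB of L1/shared memory) available per CUDA core."""
    return REGISTERS_PER_SM / cores_per_sm, L1_SHARED_KB_PER_SM / cores_per_sm

for name, cores in (("GF100", 32), ("GF104", 48)):
    regs, l1_kb = per_core_share(cores)
    print(f"{name}: {regs:.0f} registers, {l1_kb:.2f} KB L1/shared per CUDA core")
```

Whatever the exact sizes, the ratio is what matters: going from 32 to 48 cores with unchanged resources leaves each core with two-thirds of what it had on GF100.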

For those of you who are curious, with all of these SM changes between GF100 and GF104 the size of an SM did increase, but not by nearly as much as one would think: after adding the additional functional units, infusing the warp schedulers with superscalar dispatch capabilities, and removing unnecessary ECC and FP64 hardware, the size of an SM only increased by 25%. This is a tradeoff NVIDIA could not afford on the already massive GF100, but made sense on GF104 where the performance increase could justify the extra die space.
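One way to read that 25% figure is as die area per CUDA core. The sketch below is only illustrative: it assumes GF100's SM has 32 CUDA cores (from NVIDIA's public specifications, not this page) and ignores that the SM also gained texture units and other hardware.

```python
from fractions import Fraction

def relative_area_per_core(area_growth, old_cores, new_cores):
    """New SM area per CUDA core relative to the old SM (1 = unchanged)."""
    return (1 + area_growth) / Fraction(new_cores, old_cores)

# SM grows by 25% while CUDA cores per SM go from 32 (assumed) to 48
ratio = relative_area_per_core(Fraction(1, 4), 32, 48)
print(ratio)  # 5/6
```

In other words, each CUDA core's share of the SM shrinks by roughly 17%, even before accounting for the extra texture units that came along for the ride.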


93 Comments


  • threedeadfish - Monday, July 12, 2010 - link

    I know you guys are all up in arms when a company releases information about upcoming products, but you know that's information that can help a consumer. I was looking for a card that was powerful enough while being quiet and not using too much power. I ended up with a 5770 and I think it's a great product, however the 460 offers 5830 performance at 5770 power and noise for only $30 more. I would have waited another week if I had any idea this was coming. You can't tell me nobody at Anandtech knew this was coming. Your anti-paper launch campaign has a downside: it doesn't give consumers valuable information, and as a result the video card I'll be using for the next couple years will be much less powerful than it would have been if the 465 article just gave me a heads up, or just a little message saying hold off on $200 video card purchases, something's coming. I only buy a new video card every few years, please give me the information I need to make the best purchase. In this case waiting another week is what I should have done.
  • notext - Monday, July 12, 2010 - link

    If you notice, everyone put out their info on this card today. That is because of an NDA. Even suggesting anything about this card without nVidia's permission is a quick way to guarantee you won't get future releases.
  • Phate-13 - Monday, July 12, 2010 - link

    Euhm, where can you find the "at 5770 power consumption"? The tables are quite clear that it uses 40-70 Watts MORE under load than the 5770.

    And indeed, there is something called and NDA.
  • Phate-13 - Monday, July 12, 2010 - link

    **** this. I want to be able to edit my posts.

    'something called AN NDA.'
  • Death666Angel - Thursday, July 15, 2010 - link

    This is a review site, not a news or rumours site. If you are interested in what the next couple of months bring from companies like Intel, AMD and nVidia, you need to start using sites like Fudzilla, which report hardware news and rumours.

    And trust me, there was plenty of information on the 460 being in the making and probably outperforming the 465 at a lower price point. :)

    And if you regret the purchase of a 9 month old card because one that just got released has higher performance (20%-40%?), while using more electricity (20%), costs more (60% - 130€ to 210€ for the cheapest cards each), you are going to be a very sad PC buyer, because normally, a new product will be faster _and_ cheaper, while now it is just faster, but a hellovalot more expensive too. :-)
  • Lord 666 - Monday, July 12, 2010 - link

    Definitely some details missing for a complete picture on this card.
  • Lonyo - Monday, July 12, 2010 - link

    There's more too.

    No real discussion of the reduction in polymorph engine to shader ratio, such as tessellation benchmarks (synthetic or otherwise).
    Nothing on minimum frame rates (and anything which is put up uses the older 10.3 drivers for ATI).
    In addition to the general compute performance benchmarks that you mention.

    Nothing about CUDA games (e.g. Just Cause 2) comparing the GTX465 to the GTX460.
    No consideration of ROP vs memory changes (i.e. is it memory bandwidth limited or is it purely the ROP reduction causing the performance hit on the 768MB card).

    Maybe the cards didn't come out in time. Maybe everything, or more stuff at least, will be covered in Pt 2, but it is somewhat disappointing that so many things are totally missing.
  • Ryan Smith - Monday, July 12, 2010 - link

    You hit the nail on the head with your comment on time. I actually have the data, but with the limited amount of time I had I wasn't able to write the analysis (most of my time was spent on better covering the architecture). That will be amended to the article later today, but for now you can see the raw graphs.

    http://images.anandtech.com/graphs/gtx460_07111017...
    http://images.anandtech.com/graphs/gtx460_07111017...
    http://images.anandtech.com/graphs/gtx460_07111017...
    http://images.anandtech.com/graphs/gtx460_07111017...
  • Lonyo - Monday, July 12, 2010 - link

    I hope I didn't come off as too harsh. I started writing and then towards the end realised it could be a time thing, and didn't go back to amend what I had written.
    After looking at most other sites, their reviews are sometimes even worse, covering only a very small handful of games.

    Thanks for the early graphs, much appreciated. Shame NV didn't give more time for proper reviews.
  • jonny30 - Monday, July 12, 2010 - link

    - maybe in your country my dear friend.......maybe there i tell you ;)
    - in my country is 300 you see.......300 as a price start i mean :)
    - and for those 100 extra i buy another hdd for example, not another video card if you know what i mean
    - so, maybe is worth for you, but for me to jump from 4870 to this......
    - i am sorry, but it is not wort it........
