Compute & Tessellation Performance

With our earlier discussion of the GF104’s revised architecture in mind, we have run a selection of compute and tessellation benchmarks alongside our gaming benchmarks, specifically to look at the architecture. Because NVIDIA added an additional block of CUDA cores to each SM without adding another warp scheduler, the resulting superscalar design requires that the card extract instruction-level parallelism (ILP) from the warps in order to utilize all 3 blocks of CUDA cores simultaneously.

As a result the range of best case to worst case scenarios is wider on GF104 than it is on GF100. While GF100 could virtually always keep 2 warps going and reach peak utilization, GF104 can only reach peak utilization when at least 1 of the warps has an ILP-safe instruction waiting to go. Otherwise the 3rd block of CUDA cores is effectively stalled and a GTX 460 performs more like a 224 CUDA core part. Conversely, with a total of 4 dispatch units GF104 is capable of exceeding GF100’s efficiency by utilizing 4 of 7 execution blocks in an SM instead of 2 of 6.

In other words, GF104 can be either more or less efficient than GF100.
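To make the best case/worst case spread concrete, here is a toy model in Python. It is only a sketch: it assumes the GTX 460's total of 336 CUDA cores (the 224-core worst case above times 3/2), and it reduces ILP availability to a single fraction of issue cycles, which is a simplification of how the hardware actually schedules warps. The function name and the `ilp_fraction` parameter are made up for illustration.

```python
def effective_cuda_cores(total_cores: int = 336, ilp_fraction: float = 0.0) -> float:
    """Toy estimate of how many CUDA cores a GF104 part keeps busy.

    Assumption: two of the three core blocks per SM are always fed by the
    two warp schedulers; the third block is fed only in the fraction of
    cycles where a warp has a second, independent instruction ready.
    """
    if not 0.0 <= ilp_fraction <= 1.0:
        raise ValueError("ilp_fraction must be between 0 and 1")
    always_busy = total_cores * 2 / 3   # two blocks out of three
    ilp_dependent = total_cores / 3     # the third block
    return always_busy + ilp_dependent * ilp_fraction


for fraction in (0.0, 0.5, 1.0):
    cores = effective_cuda_cores(ilp_fraction=fraction)
    print(f"ILP available {fraction:.0%} of the time: {cores:.0f} effective cores")
```

With no ILP the model lands on the 224-core figure quoted above, and with ILP always available it reaches all 336 cores. Real programs fall somewhere in between.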

For our testing we’re utilizing a GTX 480, a GTX 465, and both versions of the GTX 460, the latter in particular to see if the 768MB card’s reduced L2 cache or memory bandwidth will have a significant impact on compute performance. Something to keep in mind is that with its higher clockspeed, the GTX 460 has more compute performance on paper than the GTX 465: 907 GFLOPS for the GTX 460, versus 855 GFLOPS for the GTX 465. As such the GTX 460 has the potential to win, but only when it can extract enough ILP to keep the 3rd block of CUDA cores working. In the worst case scenario, where every math instruction is dependent, the GTX 460 drops to 605 GFLOPS. Meanwhile the GTX 480 is capable of 1344 GFLOPS, which means that on paper the GTX 465 and GTX 460 are 63% and 45%-67% as fast, respectively.
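The paper figures above can be reproduced with a few lines of Python. This sketch uses only the GFLOPS numbers quoted in the text; the variable and function names are illustrative. The worst case is taken as two thirds of the GTX 460's peak, since only 2 of the 3 CUDA core blocks are busy. Note that the GTX 465 ratio works out to about 63.6%, so the 63% quoted above is the truncated value.

```python
# Paper throughput figures quoted in the text, in GFLOPS.
GTX_480 = 1344
GTX_465 = 855
GTX_460_PEAK = 907

# Worst case for GF104: only two of the three CUDA core blocks are busy.
gtx_460_worst = GTX_460_PEAK * 2 / 3


def relative_to_480(gflops: float) -> float:
    """Return throughput as a fraction of the GTX 480's paper figure."""
    return gflops / GTX_480


print(f"GTX 460 worst case: {gtx_460_worst:.0f} GFLOPS")
print(f"GTX 465 vs GTX 480: {relative_to_480(GTX_465):.1%}")
print(f"GTX 460 vs GTX 480: {relative_to_480(gtx_460_worst):.1%} "
      f"to {relative_to_480(GTX_460_PEAK):.1%}")
```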

We’ll start with Stanford’s Folding@Home client. Here we’re using the same benchmark version of the client as in our GTX 480 article, running the Lambda work-unit. In this case we almost have a tie between the GTX 460 and the GTX 465, with the two differing by only a few nodes per day. The GTX 465 reaches 65% of the performance of the GTX 480 here, which is actually better than the 63% theoretical figure. It’s likely that the GTX 480 is being held back elsewhere, allowing slower cards to close the gap to some degree.

With that in mind the GTX 460 cards achieve 66% of the performance of the GTX 480, giving them a slight edge over the GTX 465. Because we’ve seen the GTX 465 pull off better than perfect scaling in this test, it’s very unlikely that the GTX 460 is actually achieving a perfect ILP scenario, but it must be close. Folding@Home is clearly not L2 cache or memory bandwidth dependent either, as the 768MB version of the GTX 460 does no worse than its 1GB counterpart.

Next up on our list of compute benchmarks is Badaboom, the CUDA-based video encoder. Here we’re measuring the average framerate for the encode of a 2-minute 1080i video capture. Right off the bat we’re seeing dramatically different results than we saw with Folding@Home, with the GTX 460 cards falling well behind the GTX 465. It’s immediately clear that Badaboom is presenting a sub-optimal scenario for the GTX 460, where the GPU cannot extract much ILP from the program’s warps. At 56% of the speed of a GTX 480, this is worse than what we saw with Folding@Home but is also right in the middle of our best/worst case scenarios; if anything Badaboom is probably very close to average.

Meanwhile this is another program where the 768MB card’s reduced memory bandwidth and L2 cache are not affecting it in the slightest, as it returns the same 35fps rate as the 1GB card.
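The "right in the middle" reading for Badaboom, and the near-best-case reading for Folding@Home, can be checked by placing each measured result inside the 45%-67% paper range. This is a rough sketch only: it assumes performance scales linearly with paper compute throughput, which the Folding@Home results already suggest is not strictly true, and the helper name is hypothetical.

```python
def position_in_range(measured_pct: int, worst_pct: int = 45, best_pct: int = 67) -> float:
    """Locate a result between the worst case (0.0) and the best case (1.0).

    All inputs are percentages of GTX 480 performance, as quoted in the text.
    """
    return (measured_pct - worst_pct) / (best_pct - worst_pct)


# GTX 460 results relative to the GTX 480, from the two benchmarks above.
results = {"Folding@Home": 66, "Badaboom": 56}
for name, pct in results.items():
    print(f"{name}: {position_in_range(pct):.2f} of the way from worst to best case")
```

Badaboom sits exactly halfway between the two extremes, while Folding@Home sits near the top of the range.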

Our third and final compute benchmark is the PostFX OpenCL benchmark from GPU Caps Viewer. The PostFX benchmark clearly isn’t solely compute limited on the GTX 400 series, giving us a fairly narrow range of results that are otherwise consistent with Badaboom. At 82fps, the GTX 460 trails the GTX 465 by around 7%, once again showing that the superscalar GTX 460 has more trouble achieving its peak efficiency than the more straightforward GTX 465.

Our final benchmark is a quick look at tessellation. As GF104 packs more CUDA cores into an SM, the GPU has more than half the compute capabilities of GF100 but only a straight 50% of the geometry capabilities. Specifically, the GTX 460 has 45% of the geometry capabilities of the GTX 480 after taking into account the number of active SMs and the clockspeed difference.
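The 45% figure can be worked through as follows. This is a sketch under stated assumptions: the SM counts and core clocks below are the commonly cited reference specifications (7 active SMs at 675MHz for the GTX 460, 15 active SMs at 700MHz for the GTX 480) and are not given in this section. It also assumes geometry throughput scales with one PolyMorph engine per active SM times the core clock. The function name is illustrative.

```python
def geometry_throughput(active_sms: int, core_clock_mhz: int) -> int:
    """Relative geometry rate, assuming one PolyMorph engine per active SM."""
    return active_sms * core_clock_mhz


# Assumed reference specs; not stated in this section of the article.
gtx_460 = geometry_throughput(active_sms=7, core_clock_mhz=675)
gtx_480 = geometry_throughput(active_sms=15, core_clock_mhz=700)

print(f"GTX 460 geometry rate relative to GTX 480: {gtx_460 / gtx_480:.0%}")
```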

With the DirectX 11 Detail Tessellation sample program, we’re primarily looking at whether we can throw a high enough tessellation load at the GPU to overwhelm its tessellation abilities and bring it to its knees. In this case we cannot, as the GTX 460 scales from tessellation factor 7 to tessellation factor 11 at basically the same rate as the GTX 480 and GTX 465. This means that the GTX 460 still has plenty of tessellation power for even this demanding sample, but by the same measure it shows that the GTX 480 is overbuilt if future games target the GTX 460 for tessellation.

All things considered our compute and tessellation results are where we expected them to be. That is to say that the GTX 460’s wider range of best and worst case scenarios will show up in real-world programs, making its performance relative to a GTX 465 strongly application dependent. While the GF104 GPU’s architectural changes seem to be well tuned for gaming needs and lead to the GTX 460 meeting or beating the GTX 465, the same can’t be said for compute. At this point it would be a reasonable assumption that the GTX 465 is going to outperform the GTX 460 in most compute workloads, so how much this matters to buyers depends on how often they run compute workloads and whether they can live with the GTX 465’s lower power efficiency.

93 Comments

  • threedeadfish - Monday, July 12, 2010

    I know you guys are all up in arms when a company releases information about upcoming products, but you know that's information that can help a consumer. I was looking for a card that was powerful enough while being quiet and not using too much power. I ended up with a 5770 and I think it's a great product, however the 460 offers 5830 performance at 5770 power and noise for only $30 more. I would have waited another week if I had any idea this was coming. You can't tell me nobody at Anandtech knew this was coming. Your anti-paper launch campaign has a downside: it doesn't give consumers valuable information, and as a result the video card I'll be using for the next couple of years will be much less powerful than it would have been if the 465 article had just given me a heads up, or just a little message saying hold off on $200 video card purchases, something's coming. I only buy a new video card every few years; please give me the information I need to make the best purchase. In this case waiting another week is what I should have done.
  • notext - Monday, July 12, 2010

    If you notice, everyone put out their info on this card today. That is because of an NDA. Even suggesting anything about this card without nVidia's permission is a quick way to guarantee you won't get future releases.
  • Phate-13 - Monday, July 12, 2010

    Euhm, where can you find the "at 5770 power consumption"? The tables are quite clear that it uses 40-70 Watts MORE under load than the 5770.

    And indeed, there is something called and NDA.
  • Phate-13 - Monday, July 12, 2010

    **** this. I want to be able to edit my posts.

    'something called AN NDA.'
  • Death666Angel - Thursday, July 15, 2010

    This is a review site, not a news or rumours site. If you are interested in what the next couple of months will bring from companies like Intel, AMD and nVidia, you need to start using sites like Fudzilla that report hardware news and rumours.

    And trust me, there was plenty of information on the 460 being in the making and probably outperforming the 465 at a lower price point. :)

    And if you regret the purchase of a 9 month old card because one that just got released has higher performance (20%-40%?), while using more electricity (20%), costs more (60% - 130€ to 210€ for the cheapest cards each), you are going to be a very sad PC buyer, because normally, a new product will be faster _and_ cheaper, while now it is just faster, but a hellovalot more expensive too. :-)
  • Lord 666 - Monday, July 12, 2010

    Definitely some details missing for a complete picture on this card.
  • Lonyo - Monday, July 12, 2010

    There's more too.

    No real discussion of the reduction in polymorph engine to shader ratio, such as tessellation benchmarks (synthetic or otherwise).
    Nothing on minimum frame rates (and anything which is put up uses the older 10.3 drivers for ATI).
    In addition to the general compute performance benchmarks that you mention.

    Nothing about CUDA games (e.g. Just Cause 2) comparing the GTX465 to the GTX460.
    No consideration of ROP vs memory changes (i.e. is it memory bandwidth limited or is it purely the ROP reduction causing the performance hit on the 768MB card).

    Maybe the cards didn't come out in time. Maybe everything, or more stuff at least, will be covered in Pt 2, but it is somewhat disappointing that so many things are totally missing.
  • Ryan Smith - Monday, July 12, 2010

    You hit the nail on the head with your comment on time. I actually have the data, but with the limited amount of time I had I wasn't able to write the analysis (most of my time was spent on better covering the architecture). That will be amended to the article later today, but for now you can see the raw graphs.

    http://images.anandtech.com/graphs/gtx460_07111017...
    http://images.anandtech.com/graphs/gtx460_07111017...
    http://images.anandtech.com/graphs/gtx460_07111017...
    http://images.anandtech.com/graphs/gtx460_07111017...
  • Lonyo - Monday, July 12, 2010

    I hope I didn't come off as too harsh. I started writing and then towards the end realised it could be a time thing, and didn't go back to amend what I had written.
    After looking at most other sites, their reviews are sometimes even worse, covering only a very small handful of games.

    Thanks for the early graphs, much appreciated. Shame NV didn't give more time for proper reviews.
  • jonny30 - Monday, July 12, 2010

    - maybe in your country my dear friend.......maybe there i tell you ;)
    - in my country is 300 you see.......300 as a price start i mean :)
    - and for those 100 extra i buy another hdd for example, not another video card if you know what i mean
    - so, maybe is worth for you, but for me to jump from 4870 to this......
    - i am sorry, but it is not wort it........
