Compute & Tessellation Performance

As we mentioned in our look at the new Forceware 260 driver set, the drivers NVIDIA provided for our testing of the GTS 450 have a broken OpenCL component, so we had to cut our compute benchmarking short by dropping our OpenCL benchmark. Furthermore, our pre-release version of Badaboom with Fermi support doesn’t work either, so that was dropped as well. What we offer instead is a much more abbreviated look at the GTS 450’s compute performance.

As was the case with GF104, GF106 is a superscalar design. With 2 warp schedulers per SM, only 2 of the 3 banks of 16 CUDA cores in each SM can be put to use unless NVIDIA’s hardware can extract a degree of instruction level parallelism from the code. As a result all of these GF104-derived GPUs have a wider range of compute performance than what we’re used to. In the best case scenario, where ILP can be extracted on every clock, we achieve peak theoretical performance. The worst case scenario is 2/3rds of that performance. So the GTS 450 can perform anywhere between a 128 CUDA core card and a 192 CUDA core card depending on the application in use.
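The range described above can be worked through as a short sketch. The SM count and the three-bank layout are assumptions filled in from the 192-core total and the 2/3rds figure; the function name and the `ilp_fraction` parameter are hypothetical and only there for illustration.

```python
# Assumed GTS 450 (GF106) layout, inferred from the 192-core total and the
# 2/3rds worst case: 4 SMs, each with 3 banks of 16 CUDA cores and
# 2 warp schedulers.
SM_COUNT = 4
BANKS_PER_SM = 3
CORES_PER_BANK = 16
WARP_SCHEDULERS_PER_SM = 2


def effective_cores(ilp_fraction: float) -> float:
    """Average number of busy CUDA cores for a given ILP hit rate.

    ilp_fraction is the share of clocks (0.0 to 1.0) on which the hardware
    finds an independent instruction to feed the extra bank.
    """
    if not 0.0 <= ilp_fraction <= 1.0:
        raise ValueError("ilp_fraction must be between 0 and 1")
    # Banks that the warp schedulers can always keep fed.
    always_busy = WARP_SCHEDULERS_PER_SM * CORES_PER_BANK
    # Banks that only do work when ILP is found.
    ilp_only = (BANKS_PER_SM - WARP_SCHEDULERS_PER_SM) * CORES_PER_BANK
    return SM_COUNT * (always_busy + ilp_fraction * ilp_only)


for fraction in (0.0, 0.5, 1.0):
    cores = effective_cores(fraction)
    print(f"ILP on {fraction:.0%} of clocks: {cores:.0f} effective cores")
```

This is a simple linear model; real workloads will not find ILP at a constant rate, so it only brackets the range rather than predicting where a given application lands.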

For our look at computing performance we once again have the CUDA version of Folding @ Home. Using the Lambda work unit, we run a short benchmark that extrapolates the number of nodes per day the card would be able to process. All things considered, the GTS 450 does quite well here compared to the rest of the Fermi family thanks to its high clock speed. It may only have around 57% as many CUDA cores as the GTX 460, but the higher clock speed means that it delivers just shy of 70% of the performance. Furthermore we’ve already established that this benchmark isn’t L2 cache or memory bandwidth limited, so even though the GTS 450 isn’t using a “full” GF106 chip here, it isn’t penalized for the limitation.
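As a rough check on the core count and clock speed argument, the sketch below compares the two cards on cores multiplied by shader clock. The shader clocks (1566MHz for the GTS 450, 1350MHz for the GTX 460) and the GTX 460's 336 cores are reference specifications assumed here, not figures taken from this page, and the helper names are hypothetical.

```python
# Assumed reference specs (not stated in this section):
# CUDA core count and shader clock in MHz.
GTS_450 = {"cores": 192, "shader_mhz": 1566}
GTX_460 = {"cores": 336, "shader_mhz": 1350}


def core_ratio(card: dict, baseline: dict) -> float:
    """Ratio of CUDA core counts, ignoring clock speed."""
    return card["cores"] / baseline["cores"]


def throughput_ratio(card: dict, baseline: dict) -> float:
    """Ratio of theoretical shader throughput (cores x shader clock)."""
    card_rate = card["cores"] * card["shader_mhz"]
    baseline_rate = baseline["cores"] * baseline["shader_mhz"]
    return card_rate / baseline_rate


print(f"Core count ratio:  {core_ratio(GTS_450, GTX_460):.1%}")
print(f"Cores x clock:     {throughput_ratio(GTS_450, GTX_460):.1%}")
```

Under these assumptions the clock advantage lifts the GTS 450 from about 57% to about 66% of the GTX 460 on paper, which is in the same neighborhood as the measured result. The model says nothing about why the measured figure sits slightly higher.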

Our other benchmark is a quick look at tessellation. With the DirectX 11 Detail Tessellation sample program, we’re primarily looking at whether we can throw a high enough tessellation load at the GPU to overwhelm its tessellation abilities and bring it to its knees. In this case we cannot, as the GTS 450 scales from tessellation factor 7 to tessellation factor 11 at only a little below the rate of the GTX 480 and GTX 460, achieving 63% of its performance at factor 11. This means that the GTS 450 should still have plenty of tessellation power for even this demanding sample, but of course this is heavily dependent on how much tessellation is used by future games.

One interesting thing is that because NVIDIA built its geometry units into its Polymorph Engines, its geometry abilities scale in a way that AMD’s don’t, thanks to AMD’s relatively constant fixed-function pipeline in the Radeon HD 5000 series. With the GTX 480 NVIDIA was advertising an 8-fold increase in geometry performance over the GTX 285, but with the GTS 450 NVIDIA is only talking about around a 2.4x increase over the GTS 250. This neatly showcases the much wider range of geometry performance in NVIDIA’s Fermi family. It also reinforces the fact that NVIDIA needs developers to fully utilize tessellation in order to maximize the geometry capabilities of the GTX 480. Otherwise, if a card like the GTS 450 is the geometry baseline, then scaling geometry capabilities through the Polymorph Engines will not have paid off.
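To put the spread between those two advertised figures in context, the sketch below sets them next to the number of Polymorph Engines on each card. The engine counts (15 on the GTX 480, 4 on the GTS 450, one per enabled SM) are assumptions not given on this page, and the two advertised gains are measured against different older cards, so this is only a loose illustration and not a like-for-like comparison.

```python
# Advertised geometry gains quoted in the text. Note the different baselines:
# GTX 480 versus GTX 285, GTS 450 versus GTS 250.
ADVERTISED_GAIN = {"GTX 480": 8.0, "GTS 450": 2.4}

# Assumed Polymorph Engine counts (one per enabled SM); not stated in the text.
POLYMORPH_ENGINES = {"GTX 480": 15, "GTS 450": 4}


def spread(values: dict, high: str, low: str) -> float:
    """How many times larger the value for `high` is than the value for `low`."""
    return values[high] / values[low]


advertised_spread = spread(ADVERTISED_GAIN, "GTX 480", "GTS 450")
engine_spread = spread(POLYMORPH_ENGINES, "GTX 480", "GTS 450")

print(f"Advertised gain spread:   {advertised_spread:.2f}x")
print(f"Polymorph Engine spread:  {engine_spread:.2f}x")
```

Both ratios come out at a little over 3x, which is consistent with the idea that geometry throughput in this family grows with the number of Polymorph Engines, though clock speeds and the differing baselines are ignored here.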

66 Comments

  • just4U - Monday, September 13, 2010

    overclockers is the only review I've seen that shows the 250 in the mix and by the looks of it the 450 is a good 25-30% faster than the 250 on most games they tested with... what reviews are you reading?

    Personally I see no reason to rush out and buy two of these. A 460 is cheaper and too close in performance to justify it.
  • marraco - Tuesday, September 14, 2010

    http://www.tomshardware.com/reviews/geforce-gts-45...
  • KG Bird - Monday, September 13, 2010

    Nice review, but answer one question for me. Why does the HD 5770 scale well in crossfire while others like the HD 5870 don't?
  • heflys - Monday, September 13, 2010

    That's one of the great mysteries that has plagued the Crossfire setup. Might just be crappy drivers. Who knows.....
  • Jedi2155 - Monday, September 13, 2010

    Do it right...and do it right the first time is the name of the game.
  • OCNewbie - Monday, September 13, 2010

    Why would they compare a GTS 450 overclocked to a stock 5770? Wouldn't it make more sense to compare apples to apples, or in this case, OC'd versus OC'd? If you're gonna OC the GTS 450, then wouldn't it be reasonable to expect you'd also OC the 5770? Doesn't the 5770 OC quite well? This is probably even less of a debate, as far as which is the best performer, if you factor in OC'ng to BOTH cards.
  • jabber - Tuesday, September 14, 2010

    Most 5770 cards go up to 900/1300 pretty easy.

    Would leave the 450 even further in the dust however.
  • Belard - Monday, September 13, 2010

    Looks like AMD still has a solid product line of DX11 parts. So an end-user would still be looking at the older 210~250 cards for the $40~80 market.

    AMD could still easily reduce their prices across the board... but guess they're going to wait until the 6000 series ships and then blow out the 5000 series for cheap.

    If the 6750 comes out at $120, but a good deal faster than the 5770, that is going to hurt.
  • KingKuei - Monday, September 13, 2010

    So what happened to the Release 260 drivers we were supposed to get this morning?

    Anand mentioned something about a bug in the driver related to OpenGL (or was it CUDA?) that they were going to fix before releasing it. Yet it's already late afternoon Monday and there's nothing on nVidia's site yet.

    The big deal for me actually is related to SC2. SUPPOSEDLY, this is the driver release that fixes many of the issues related to framerate drops in SC2. I care more about that than the GTS 450!
  • Spazweasel - Tuesday, September 14, 2010

    One thing to note: there is a low-profile (double slot) version of this card already available from Palit. I can't find a low-profile 5770. For low-profile cases, this is therefore likely the best you can currently get, and given it's a low-thermal-impact part, this makes sense.

    The ball is in your court, Powercolor/Sparkle!