Final Words

There's no question that NVIDIA has built a very impressive chip with the GT200. The largest microprocessor we've ever reviewed, the GT200 packs an unreal amount of computational horsepower. What's even more impressive is that we can fully expect NVIDIA to double the transistor count once again in about 18 months, leaving us once more in complete awe of what can be done. We're a little over a decade away from being able to render and display images that are nearly indistinguishable from reality, and it's going to take massive GPUs like the GT200 to get us there.

Interestingly, though, AMD has made public its decision to go in the opposite direction. No longer will ATI push as many transistors as possible into giant packages to do battle with NVIDIA for the coveted "halo" product, the one that inspires the masses to believe an entire company is better because it made the fastest possible thing, regardless of value. Instead, ATI will pursue a strategy it more or less stumbled into: providing midrange cards that offer the highest possible performance per dollar.

With AMD dropping out of the high-end single-GPU space (it will still compete there with multi-GPU solutions), NVIDIA will be left all alone at the top of the performance charts for the foreseeable future. But as we saw from our benchmarks, that doesn't always work out quite like we would expect.

There's another very important aspect of GT200 that's worth considering: a die-shrunk, higher clocked version of GT200 will eventually compete with Intel's Larrabee GPU. The GT200 is big enough that it could easily smuggle a Penryn into your system without you noticing, which despite being hilarious also highlights a very important point: NVIDIA could easily toss a high performance general purpose sequential microprocessor on its GPUs if it wanted to. At the same time, if NVIDIA can build a 1.4 billion transistor chip that's nearly 6x the size of Penryn, so can Intel - the difference being that Intel already has the high performance, general purpose, sequential microprocessor that it could integrate alongside a highly parallel GPU workhorse. While Intel has remained relatively quiet on Larrabee as of late, NVIDIA's increased aggressiveness towards its Santa Clara neighbors is making more sense every day.

We already know that Larrabee will be built on Intel's 45nm process, but given the level of performance it will have to compete with, it wouldn't be too far-fetched for Larrabee to be Intel's first 1 - 2 billion transistor microprocessor for use in a desktop machine (Nehalem is only 781M transistors).

Intel had better keep an eye on NVIDIA as the GT200 cements its leadership position in the GPU market. NVIDIA hand-designed the logic that went into much of the GT200 and managed to produce it without investing in a single fab; that is a scary combination for Intel to go up against. That's not to say Intel couldn't out-engineer NVIDIA here, but it's going to be a challenging competition.

NVIDIA has entered a new realm with the GT200, producing a world-class microprocessor that is powerful enough to appear on even Intel's radar. If NVIDIA could enable GPU acceleration in more applications, and do so faster, it would actually be able to give Intel a tough time even before Larrabee arrives. Fortunately for Intel, NVIDIA is still just getting started on its move into the compute space.

But then we have the question of whether or not you should buy one of these things. As impressive as the GT200 is, the GeForce GTX 280 is simply overpriced for the performance it delivers. It is NVIDIA's fastest single-card, single-GPU solution, but for $150 less than a GTX 280 you can get a faster graphics card in NVIDIA's own GeForce 9800 GX2. The obvious downside to the GX2 is that it is a multi-GPU card, and there are going to be some situations where it doesn't scale well, but overall it is a far better buy than the GTX 280.

Even in a comparison of four-card and two-card SLI, the GTX 280 doesn't deliver $300 more in value today. NVIDIA's position is that games will have higher compute and bandwidth requirements in the future, giving the GTX 280 more longevity. While that may or may not be true depending on what actually happens in the industry, we can't recommend something based on possible future performance. It just doesn't make sense to buy something today that won't give you better performance on the software that's currently available, especially when it costs so much more than a faster solution.

The GeForce GTX 260 is a bit more reasonable. At $400 it is generally equal to, if not faster than, the Radeon HD 3870 X2, and with no other NVIDIA cards occupying the $400 price point it has no competitor within its own family. Unfortunately, 8800 GT SLI is much cheaper, and many people already have an 8800 GT they could augment.

The availability of cheaper, faster alternatives to GT200 hardware is quite dangerous for NVIDIA, as value counts for a lot even at the high end. An overpriced high-end card is only really attractive if it's actually the fastest thing out there.

But maybe, with the lowered high-end threat from AMD, NVIDIA has decided to make a gutsy move by positioning its hardware such that multi-GPU solutions offer higher value than single-GPU solutions. Or maybe this is all just a really good way to sell more SLI motherboards.


108 Comments


  • junkmonk - Monday, June 16, 2008 - link

    I can has vertex data? LMFAO, hahha that was a good laugh.
  • PrinceGaz - Monday, June 16, 2008 - link

    When I looked at that, I assumed it must be a non-native English speaker who put that in the block. I'm still not entirely sure what it was trying to convey other than that the core will need to be fed with lots of vertices to keep it busy.
  • Spoelie - Tuesday, June 17, 2008 - link

    http://icanhascheezburger.com/
    http://icanhascheezburger.com/tag/cheezburger/
  • chizow - Monday, June 16, 2008 - link

    It's going to take some time to digest it all, but you two have done it again with a massive but highly readable write-up of a new complex microchip. You guys are still the best at what you do, but a few points I wanted to make:

    1) THANK YOU for the clock-for-clock comparo with G80. I haven't fully digested the results, but I disagree with your high-low increase thresholds being dependent solely on TMU and SP counts. You don't mention that GT200 also has 33% more ROPs, which I think was the most important addition to GT200.

    2) The SP pipeline discussion was very interesting. I read through 3/4 of it and glanced over the last few paragraphs, and it didn't seem like you really concluded the discussion by drawing on the relevance of NV's pipeline design. Is that why NV's SPs are so much better than ATI's, and why they perform so well compared to deeply pipelined traditional CPUs? What I gathered was that NV's pipeline isn't nearly as rigid or static as traditional pipelines, meaning they're more efficient and less dependent on other data in the pipe.

    3) I could've lived without the DX10.1 discussion and more hints at some DX10.1 AC/TWIMTBP conspiracy. You hinted at the main reason NV wouldn't include DX10.1 on this generation (ROI) then discount it in the same breath and make the leap to conspiracy theory. There's no doubt NV is throwing around market share/marketing muscle to make 10.1 irrelevant but does that come as any surprise if their best interest is maximizing ROI and their current gen parts already outperform the competition without DX10.1?

    4) CPU bottlenecking seems to be a major issue in this high-end of GPUs with the X2/SLI solutions and now GT200 single-GPUs. I noticed this in a few of the other reviews where FPS results were flattening out at even 16x12 and 19x12 resolutions with 4GHz C2D/Qs. You'll even see it in a few of your benches at those higher (16/19x12) resolutions in QW:ET and even COD4 and those were with 4x AA. I'm sure the results would be very close to flat without AA.

    That's all I can think of for now, but again another great job. I'll be reading/referencing it for the next few days I'm sure. Thanks again!
  • OccamsAftershave - Monday, June 16, 2008 - link

    "If NVIDIA put the time in (or enlisted help) to make CUDA an ANSI or ISO standard extension to a programming language, we could really start to get excited."

    Open standards are coming. For example, see Apple's OpenCL, coming in their next OS release.
    http://news.yahoo.com/s/nf/20080612/bs_nf/60250
  • ltcommanderdata - Monday, June 16, 2008 - link

    At least AMD seems to be moving toward standardizing their GPGPU support.

    http://www.amd.com/us-en/Corporate/VirtualPressRoo...

    AMD has officially joined Apple's OpenCL initiative under the Khronos Compute Working Group.

    Truthfully, with nVidia's statements about working with Apple on CUDA in the days leading up to WWDC, nVidia is probably on board with OpenCL too. It's just that their marketing people probably want to stick with their own CUDA branding for now, especially for the GT200 launch.

    Oh, and with AMD's launch of the FireStream 9250, I don't suppose we could see benchmarks of it against the new Tesla?
  • paydirt - Monday, June 16, 2008 - link

    Tons of people are reading this article and thinking "well, performance per cost, it's underwhelming (as a gaming graphics card)." What people are missing is that GPUs are quickly becoming the new supercomputers.
  • ScythedBlade - Monday, June 16, 2008 - link

    Lol ... anyone else catch that?
  • Griswold - Monday, June 16, 2008 - link

    Too expensive, too power hungry and according to other reviews, too loud for too little gain.

    The GT200 could become Nvidia's R600.

    Bring it on AMD, this is your big chance!
  • mczak - Monday, June 16, 2008 - link

    G92 does not have 6 rop partitions - only 4 (this is also wrong in the diagram). Only G80 had 6.
    And please correct that history rewriting - that the FX failed against radeon 9700 had NOTHING to do with the "powerful compute core" vs. the high bandwidth (ok the high bandwidth did help), in fact quite the opposite - it was slow because the "powerful compute core" was wimpy compared to the r300 core. It definitely had a lot more flexibility but the compute throughput simply was more or less nonexistent, unless you used it with pre-ps20 shaders (where it could use its fx12 texture combiners).
