No More Shader Replacement

The secret is all in compilation and scheduling. Now that NVIDIA has had more time to work with scheduling and profiling code on an already efficient and powerful architecture, they have an opportunity. This generation, rather than build a compiler to fit the hardware, they were able to take what they've learned and build their hardware to better fit a mature compiler already targeted to the architecture. All this leads up to the fact that the 7800 GTX with current drivers does absolutely no shader replacement. This is quite a big deal in light of the fact that, just over a year ago, thousands of shaders were stored in the driver ready for replacement on demand in NV3x and even NV4x. It's quite an accomplishment to have come this far with hardware and software in the relatively short amount of time NVIDIA has spent working with real-time compilation of shader programs.

All these factors come together to mean that the hardware is busy more of the time. And getting more things done faster is what it's all about.

So, NVIDIA is offering a nominal increase in clock speed to 430MHz, just a little more memory bandwidth (256bit memory bus running at a 1.2GHz data rate), 1.33x vertex pipelines, 1.5x pixel pipelines, and various increases in efficiency. These all work together to give us as much as double the performance in extreme cases. If the performance increase can actually be realized, we are looking at a pretty decent speed increase over the 6800 Ultra. Obviously, in the real world we won't be seeing more than a twofold performance increase in anything but a bad benchmark. In cases where games are CPU limited, we will likely see a much lower increase in performance, but performance double that of the 6800 Ultra is entirely possible in very shader limited games.
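
As a rough back-of-the-envelope check on those figures, the sketch below works through the arithmetic. The 256bit bus, 1.2GHz data rate, 430MHz clock, and 1.33x/1.5x pipeline ratios come from the text above. The 400MHz core clock for the 6800 Ultra and the simple Amdahl-style model of a CPU limited frame are assumptions added for illustration, and the function names are hypothetical.

```python
# Back-of-the-envelope scaling for the 7800 GTX versus the 6800 Ultra.
# From the article: 256-bit bus, 1.2GHz data rate, 430MHz core clock,
# 1.33x vertex pipelines, 1.5x pixel pipelines.
# ASSUMPTION (not from the article): 6800 Ultra core clock of 400MHz.


def memory_bandwidth_gb_s(bus_width_bits, data_rate_ghz):
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per ns."""
    return bus_width_bits / 8 * data_rate_ghz


def peak_scaling(pipeline_ratio, new_clock_mhz, old_clock_mhz):
    """Theoretical throughput ratio from more pipelines and a higher clock."""
    return pipeline_ratio * new_clock_mhz / old_clock_mhz


def frame_speedup(gpu_bound_fraction, gpu_speedup):
    """Amdahl-style estimate: only the GPU-bound share of frame time gets faster."""
    return 1.0 / ((1.0 - gpu_bound_fraction) + gpu_bound_fraction / gpu_speedup)


if __name__ == "__main__":
    print(f"memory bandwidth:    {memory_bandwidth_gb_s(256, 1.2):.1f} GB/s")
    print(f"peak pixel scaling:  {peak_scaling(1.5, 430, 400):.2f}x")
    print(f"peak vertex scaling: {peak_scaling(1.33, 430, 400):.2f}x")
    # How much of a 2x GPU improvement shows up as the frame gets more GPU bound.
    for fraction in (0.25, 0.5, 0.9, 1.0):
        print(f"GPU-bound {fraction:.0%}: {frame_speedup(fraction, 2.0):.2f}x overall")
```

Under these assumptions, pipelines and clock speed alone account for roughly a 1.6x peak pixel throughput increase, so anything approaching 2x would have to come from the efficiency improvements as well. In the same simple model, a frame that is only half GPU bound gains about 1.33x from a GPU that is twice as fast, which is why CPU limited games show much smaller gains.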

In fact, Epic reports that under certain Unreal Engine 3 tests they currently see 2x to 2.4x improvements in framerate over the 6800 Ultra. Of course, UE3 is not finished yet and there won't be games out based on the engine for a while. We don't usually like reporting performance numbers from software that hasn't been released, but even if these numbers are higher than we will see in a shipping product, it seems that NVIDIA has at least gotten it right for one developer's technology. We are very interested in seeing how next generation games will perform on this hardware. If we can trust these numbers at all, it looks like the performance advantage will only get better for the GeForce 7800 GTX until Windows Graphics Foundation 2.0 comes along and inspires new techniques beyond SM3.0 capabilities.

Right now, for each triangle that gets fed through the vertex pipeline, there are many pixels inside that triangle that need processing.

Bringing It All Together

Why didn't NVIDIA build a part with unified shaders?

Every generation, NVIDIA evaluates alternative architectures, but at this time they don't feel that a unified architecture is a good match to the current PC landscape. We will eventually see a unified shader architecture from NVIDIA, but it will not likely be until DirectX itself is focused around a unified shader architecture. At this point, vertex hardware doesn't need to be as complex or intricate as the pixel pipeline. As APIs develop more and more complex functionality it will be advantageous for hardware developers to move towards a more generic and programmable shader unit that can easily adapt to any floating point processing need.

As pixel processing is currently more important than vertex processing, NVIDIA is keeping the two separate in order to focus attention where it is due. Making hardware more generic usually makes it slower, while explicitly targeting a specific task can often improve performance a great deal.

When WGF 2.0 comes along and geometry shaders are able to dynamically generate vertex data inside the GPU, we will likely see an increased burden on vertex processing as well. Being able to programmatically generate vertex data will help to remove the burden on the system to supply all the model data to the GPU.
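
To make the trade-off between dedicated and unified units a little more concrete, here is a toy throughput model. It is only a sketch: the unit counts, workload sizes, and the 1.25x penalty for generic hardware are illustrative assumptions rather than measurements or NVIDIA specifications, and both function names are hypothetical.

```python
# Toy model of dedicated versus unified shader units.
# Every number used below is an illustrative ASSUMPTION.


def dedicated_time(vertex_work, pixel_work, vertex_units, pixel_units):
    """Frame time with separate vertex and pixel hardware running in parallel.

    Whichever stage takes longer sets the pace for the frame.
    """
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


def unified_time(vertex_work, pixel_work, total_units, generic_penalty):
    """Frame time when every unit can take either kind of work.

    A generic_penalty above 1.0 models generic units being slower per unit
    than hardware specialized for a single task.
    """
    return (vertex_work + pixel_work) * generic_penalty / total_units


if __name__ == "__main__":
    # Assumed split of 8 vertex and 24 pixel units, 32 units when unified.
    workloads = {
        "matches the 1:3 split": (80, 240),
        "vertex heavy": (240, 80),
    }
    for name, (vertex_work, pixel_work) in workloads.items():
        split = dedicated_time(vertex_work, pixel_work, 8, 24)
        unified = unified_time(vertex_work, pixel_work, 32, 1.25)
        print(f"{name}: dedicated {split:.1f}, unified {unified:.1f}")
```

In this model the dedicated design wins (10.0 versus 12.5 time units) as long as the workload matches the ratio the hardware was built for, which is the situation described above. Once the mix shifts toward vertex work, as it might with geometry shaders, the dedicated design falls behind (30.0 versus 12.5) because its pixel units sit idle.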


127 Comments


  • WaltC - Thursday, June 23, 2005

    I found this remark really strange and amusing:

    "It's taken three generations of revisions, augmentation, and massaging to get where we are, but the G70 is a testament to the potential the original NV30 design possessed. Using the knowledge gained from their experiences with NV3x and NV4x, the G70 is a very refined implementation of a well designed part."

    Oh, please...nV30 was so poor that it couldn't even run at its factory speeds without problems of all kinds--which is why nVidia officially cancelled nV30 production after shipping a mere few thousand units. JHH, nVidia's CEO, went on record saying, "nV30 was a failure" [quote, unquote] at the time. nV30 was not the foundation for nV40, let alone the G70.

    Indeed, if anything could be said to be foundational for both nV40 and G70, it would be ATi's R3x0 design of 2002. G70, imo, has far more in common with R300 than it does nV30. nV30, if you recall, was primarily a DX8 part with some hastily bolted on DX9-ish add-ons done in response to R300 (fully a DX9 part) which had been shipping for nine months prior to nV30 getting out of the door.

    In fact, ATi owes its meteoric rise to #1 in the 3d markets over the last three years precisely to the R3x0 products which served as the basis for its later R4x0 architectures. Good riddance to nV3x, I say.

    I'm always surprised at the short and selective memories displayed so often by tech writers--really makes me wonder, sometimes, whether they are writing tech copy for their readers or PR copy at the behest of specific companies, if you know what I mean.
  • JarredWalton - Thursday, June 23, 2005

    98 - As far as I know, the power was measured at the wall. We use a device called "Kill A Watt", and despite the rather lame name, it gives accurate results. It's almost impossible to measure the power draw of any single component without some very expensive equipment - you know, the stuff that AMD and Intel use for CPUs. So under load, the CPU and GPU (and RAM and chipset, probably) are using far more power than at idle.
  • PrinceGaz - Thursday, June 23, 2005

    I agree, starting at 1600x1200 for a card like this was a good idea. If your monitor can only do 1280x1024, you should consider getting a better one before buying a card like the 7800gtx. As a 2070/2141 owner myself, I know that a good monitor capable of high resolutions is a great investment that lasts a helluva lot longer than graphics cards, which are usually worthless after four or five years (along with most other components).

    I'm surprised that no one has moaned about the current lack of an AGP version, to go with their Athlon XP 1700+ or whatever ;)
  • Johnmcl7 - Thursday, June 23, 2005

    I think it was spot on to have 1600x1200 as the minimum resolution. Given the power of these cards, I think the 1024x768, no AA/AF results for 3Dmark2003/2005 which have been thrown around are a complete waste of time.

    John
  • Frallan - Thursday, June 23, 2005

    Good review... And re: the NDA deadlines and the sleepless nights - don't sweat it if a few mistakes are published. The readers here have their heads screwed on the right way and will find the issues soon enough. And for everyone that does not do 12*16 or 15*20 the answer is simple - U Don't Need The Power!! Save your hard earnt money and get a 6800gt instead.
  • Calin - Thursday, June 23, 2005

    Maybe if you could save the game, change the settings and reload it you could obtain images from exactly the same positions. In one of the fence images, the distance to the fence is quite a bit different in different screenshots
  • Calin - Thursday, June 23, 2005

    You had a 7800 SLI? I hate you all
    :p
  • xtknight - Thursday, June 23, 2005

    Edit: last post correction: actually 21-page report!
  • xtknight - Thursday, June 23, 2005

    Jeez...a couple spelling errors here and there...who cares? I'd like to see you type up a 12-page report and get it out the door in a couple days with no grammatical or spelling errors, especially when your main editor is gone. Remember that English study that showed the human brain interpreted words based on patterns and not spelling?

    I did read the whole review, word-for-word, with little to no trouble. There was not a SINGLE thing I had trouble comprehending. It's a better review than most sites have done which test lower resolutions. I love the non-CPU-limited benchmarks here.

    One thing that made me chuckle was "There is clearly a problem with the SLI support in Wolfenstein 3D". That MS-DOS game is in dire need of SLI. (It's abbreviated Wolfenstein: ET. Wolf3D is an oooold Nazi game.)
  • SDA - Thursday, June 23, 2005

    Derek or Jarred or Wesley or someone:

    Did you measure system power consumption as how much power the computer drew from the wall, or how much power the innards drew from the PSU?


    #95, it's a good thing you know enough about running a major hardware site to help them out with your advice! :-)
