We always get very excited when we see a new GPU architecture come down the pipe from ATI or NVIDIA. For the past few years, we've really just been seeing reworked versions of old parts. NV40 evolved from NV30, G70 was just a step up from NV40, and the same is true with ATI as well. Fundamentally, not much has changed since the introduction of DX9 class hardware. But today, G80 ushers in a new class of GPU architecture that truly surpasses everything currently on the market. Changes like this only come along once every few years, so we will be sure to savor the joy that discovering a new architecture brings, and this one is big.

These massive architecture updates generally coincide with the release of a new DirectX, and guess what we've got? Thus we begin today's review not with discussions of pixel shaders and transistors, but about DirectX and what it will mean for the next-generation of graphics hardware, including G80.

DirectX 10

There has been quite a lot of talk about what DirectX 10 will bring to the table, and what we can expect from DX10 class hardware. Well, the hardware is finally here, but much like the situation we saw with the launch of ATI's Radeon 9700 Pro, the hardware precedes the new API. In the mean time, we can only look at our shiny new hardware as it performs under DX9. Of course, we will see full DX9 support, encompassing everything we've come to know and love about the current generation of hardware.

Even though we won't get to see any of the new features of DX10 and Shader Model 4.0, the performance of G80 will shine through due to its unified shader model. This will allow developers to do more with SM3.0 and DX9 while we all wait for the transition to DX10. In the mean time we will absolutely be able to talk about what the latest installment of Microsoft's pervasive graphics API will bring to the table.

More Efficient State and Object Management

One of the major performance improvements we will see from DX10 is a reduction in overhead. Under DX9, state change and draw calls are made quite often and can generate so much overhead that the API becomes the limiting factor in performance. With DX10, we will see the addition of state objects which hold all of the state information for a given pipeline stage. There are 5 state objects in DX10: InputLayout (vertex buffer layout), Sampler, Rasterizer, DepthStencil, and Blend. These objects can quickly change all state information without multiple calls to set the state per attribute.

Constant buffers have also been added to hold data for use in shader programs.

Each shader program has access to 16 buffers of 4096 constants. Each buffer can be updated in one function call. This hugely reduces the overhead of managing a lot of input for shader programs to use. Similar to constant buffers, texture arrays are also available in order to allow for much more data to be stored for use with a shader program. 512 equally sized textures can be stored in a texture array, and each shader is allowed 128 texture arrays (as opposed to 16 textures in DX9). The combination of 8Kx8K texture sizes with all this texture storage space will offer a huge boost in texturing ability to DX10 based games and hardware.

A new construct called a "view" is being introduced in DX10 which will allow resources to be used as more than one type of thing at the same time. For instance, a pixel shader could render vertex data to a texture, and then a vertex shader could use a view to interpret the data as vertex buffer. Views will basically give developers the ability to share resources between pipeline stages more easily.

There is also an DrawAuto call which can redraw an object without having to go back out to the CPU. This combined with predicated rendering should cut down on the overhead and performance impact of large numbers of draw calls currently being used in DX9.

GPUs get Virtual Memory
Comments Locked

111 Comments

View All Comments

  • JarredWalton - Wednesday, November 8, 2006 - link

    Page 17:

    "The dual SLI connectors are for future applications, such as daisy chaining three G80 based GPUs, much like ATI's latest CrossFire offerings."

    Using a third GPU for physics processing is another possibility, once NVIDIA begins accelerating physics on their GPUs (something that has apparently been in the works for a year or so now).
  • Missing Ghost - Wednesday, November 8, 2006 - link

    So it seems like by substracting the highest 8800gtx sli power usage result with the one for the 8800gtx single card we can conclude that the card can use as much as 205W. Does anybody knows if this number could increase when the card is used in DX10 mode?
  • JarredWalton - Wednesday, November 8, 2006 - link

    Without DX10 games and an OS, we can't test it yet. Sorry.
  • JarredWalton - Wednesday, November 8, 2006 - link

    Incidentally, I would expect the added power draw in SLI comes from more than just the GPU. The CPU, RAM, and other components are likely pushed to a higher demand with SLI/CF than when running a single card. Look at FEAR as an example, and here's the power differences for the various cards. (Oblivion doesn't have X1950 CF numbers, unfortunately.)

    X1950 XTX: 91.3W
    7900 GTX: 102.7W
    7950 GX2: 121.0W
    8800 GTX: 164.8W

    Notice how in this case, X1950 XTX appears to use less power than the other cards, but that's clearly not the case in single GPU configurations, as it requires more than everything besides the 8800 GTX. Here's the Prey results as well:

    X1950 XTX: 111.4W
    7900 GTX: 115.6W
    7950 GX2: 70.9W
    8800 GTX: 192.4W

    So there, GX2 looks like it is more power efficient, mostly because QSLI isn't doing any good. Anyway, simple subtraction relative to dual GPUs isn't enough to determine the actual power draw of any card. That's why we presented the power data without a lot of commentary - we need to do further research before we come to any final conclusions.
  • IntelUser2000 - Wednesday, November 8, 2006 - link

    It looks like putting SLI uses +170W more power. You can see how significant video card is in terms of power consumption. It blows the Pentium D away by couple of times.
  • JoKeRr - Wednesday, November 8, 2006 - link

    well, keep in mind the inefficiency of PSU, generally around 80%, so as overall power draw increases, the marginal loss of power increases a lot as well. If u actually multiply by 0.8, it gives about 136W. I suppose the power draw is from the wall.
  • DerekWilson - Thursday, November 9, 2006 - link

    max TDP of G80 is at most 185W -- NVIDIA revised this to something in the 170W range, but we know it won't get over 185 in any case.

    But games generally don't enable a card to draw max power ... 3dmark on the other hand ...
  • photoguy99 - Wednesday, November 8, 2006 - link

    Isn't 1920x1440 a resolution that almost no one uses in real life?

    Wouldn't 1920x1200 apply many more people?

    It seems almost all 23", 24", and many high end laptops have 1900x1200.

    Yes we could interpolate benchmarks, but why when no one uses 1440 vertical?

  • Frallan - Saturday, November 11, 2006 - link

    Well i have one more suggestion for a resolution. Full HD is 1920*1080 - that is sure to be found in a lot of homes in the future (after X-mas any1 ;0) ) on large LCDs - I believe it would be a good idea to throw that in there as well. Especially right now since loads of people will have to decide how to spend their money. The 37" Full HD is a given but on what system will I be gaming PS-3/X-Box/PC... Pls advice.
  • JarredWalton - Wednesday, November 8, 2006 - link

    This should be the last time we use that resolution. We're moving to LCD resolutions, but Derek still did a lot of testing (all the lower resolutions) on his trusty old CRT. LOL

Log in

Don't have an account? Sign up now