All GPUs are Created Equal: Say Goodbye to Cap Bits

DX9 allows quite a bit of flexibility in implementation. ATI and NVIDIA are free to do things a little differently as they see fit. In order for software to understand how fully the hardware supports the required and optional features of DX9, the hardware has specific capability bits set that describe its features. Microsoft has eliminated this mechanism from DX10. Software written for DX10 will not have to worry about checking cap bits for DX10 hardware, because Microsoft has been much more specific about the features required to support DX10. There will still be differences in implementation, optimizations, performance characteristics and the like, but all DX10 hardware will have the same basic feature set to draw from. On the downside, hardware vendors who want to add custom features will have to rely on OpenGL (which allows custom vendor-specific extensions to the API).
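To make the difference concrete, here is a minimal sketch of what cap-bit checking means for application code. It is written in Python purely for illustration; the flag names and render path names are hypothetical and do not correspond to real Direct3D identifiers.

```python
from enum import IntFlag


class Caps(IntFlag):
    """Hypothetical capability bits; NOT real Direct3D flag names."""
    NONE = 0
    FP_RENDER_TARGETS = 1
    VERTEX_TEXTURE_FETCH = 2
    HW_INSTANCING = 4


def pick_path_dx9_style(caps: Caps) -> str:
    """DX9-style: test each optional feature bit before choosing a render path."""
    if Caps.FP_RENDER_TARGETS in caps and Caps.VERTEX_TEXTURE_FETCH in caps:
        return "full"
    if Caps.FP_RENDER_TARGETS in caps:
        return "no_vertex_textures"
    return "fallback"


def pick_path_dx10_style() -> str:
    """DX10-style: the baseline feature set is guaranteed, so nothing is queried."""
    return "full"


if __name__ == "__main__":
    print(pick_path_dx9_style(Caps.FP_RENDER_TARGETS))  # a card missing one feature
    print(pick_path_dx10_style())
```

The DX9-style function grows a new branch for every optional feature an effect depends on, while the DX10-style function has no branches at all. Any remaining per-vendor paths would exist for performance reasons rather than for missing features.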

This will make things much easier for game developers, as they won't have to worry about not having a specific feature around to use for an effect or rendering technique. This is also another step in the direction of eliminating the need for multiple GPU specific rendering paths. We can't say that developers won't write different code for different hardware, because we don't know anything about the differences in performance characteristics at this point. We do know from past experience (with NV30) that even something as simple as the order in which code is executed can make a significant difference in performance. We would like to think that issues like this won't present themselves, but we'll have to wait and see when more hardware and software comes along.

In order to avoid programming issues like the initial NV30 + SM2.0 problems, Microsoft will only allow HLSL (High Level Shader Language) to be used with DX10. This means no low-level shader ASM optimization, but it also means that each graphics hardware maker will have full control over how shaders get compiled. There is certainly a trade-off here, but this should help keep developers from inadvertently doing something that severely hampers performance on any given architecture.

If DirectX 10 sounds like a great boon to software developers, the fact that DX10 will only be supported in Windows Vista is certain to curb enthusiasm. Other than Vista-only games, all developers will still be required to support DX9 in order to keep the installed Windows XP user base as part of their target market. Some developers have actually made comments to the effect that DX10 is more of a headache than a help right now, and that won't change until they are able to abandon support of older hardware. Hopefully, the DX10 performance and feature benefits will be enough to encourage people to upgrade sooner rather than later, but if the past is any indication it could be several years before DX9 is abandoned by the majority of users and developers.

Unified Shaders

Unified shaders aren't actually a feature as much as a result of DX10. This is a small point that seems to get lost in the shuffle, but Microsoft doesn't require a specific implementation for DX10 compliance: they simply made a better implementation more feasible. Until now, building a GPU with unified shaders would not have been desirable, let alone practical, but Shader Model 4.0 lends itself well to this approach.

We haven't seen unified shaders yet because we didn't need or want them. Up to SM2.0, vertex shaders had a higher precision requirement than pixel shaders. While 32bit floating point was required for compliance at the vertex level, 24bit was all that was needed for full precision in pixel shaders. Partial precision hints were added to accommodate 16bit pixel shaders on NVIDIA hardware. It wouldn't have been practical at the launch of DX9 to require that all shader units be 32bit. The same goes for including pixel oriented features in the vertex shader hardware: the API didn't support it, so there was no need to include it. The R300 GPU is 218mm^2 with only 107 Million transistors, and adding any more complexity than necessary would have certainly produced a much larger chip than they would have been able to handle on the 150nm process employed at the time. These days, we are able to do much more in the same space: ATI's latest chip, the RV570, is about 230mm^2 and has 330 Million transistors.
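As a rough check on the "much more in the same space" point, the die sizes and transistor counts quoted above can be turned into transistor densities. The sketch below uses only those quoted figures (the RV570 area is approximate, so the results are too); the function name is our own.

```python
def density_m_per_mm2(transistors_millions: float, area_mm2: float) -> float:
    """Transistor density in millions of transistors per square millimetre."""
    return transistors_millions / area_mm2


# Figures as quoted in the text above.
r300 = density_m_per_mm2(107, 218)
rv570 = density_m_per_mm2(330, 230)
ratio = rv570 / r300

print(f"R300:  {r300:.2f} M/mm^2")   # 0.49
print(f"RV570: {rv570:.2f} M/mm^2")  # 1.43
print(f"ratio: {ratio:.1f}x")        # 2.9x
```

In other words, the newer chip packs roughly three times as many transistors into each square millimetre, which is the headroom that makes full 32bit shader units everywhere affordable.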

It is much cheaper, easier, and more efficient to build hardware to fit exactly what is required of each step in the rendering pipeline. This is as true with older hardware as it is with G80. Now that DX10 calls for full 32bit in each shader and nearly the same functionality for both vertex and pixel shader units, it doesn't make sense to duplicate and segregate the hardware. Now that functionality can't be excluded from either vertex or pixel processing, hardware designers are optimizing their parts to make the most efficient use of space. It just so happens that the best way to do this and meet the requirements of DX10 is with unified shaders.
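One way to see why segregated units waste space once both unit types must be equally capable is a toy throughput model. This is only a sketch under strong assumptions that are ours, not the article's: work is measured in abstract units, every unit processes one unit of work per time step, a unified unit handles vertex and pixel work at the same rate, and scheduling overhead is ignored. The unit counts are hypothetical.

```python
def fixed_time(vertex_work: float, pixel_work: float,
               vertex_units: int, pixel_units: int) -> float:
    """Segregated design: each pool only handles its own work type,
    so the frame takes as long as the slower pool."""
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


def unified_time(vertex_work: float, pixel_work: float, units: int) -> float:
    """Unified design: any unit can take any work, so the load is shared."""
    return (vertex_work + pixel_work) / units


if __name__ == "__main__":
    V_UNITS, P_UNITS = 8, 24  # hypothetical split of 32 units
    workloads = {
        "balanced": (80, 240),
        "vertex-heavy": (160, 240),
        "pixel-heavy": (16, 480),
    }
    for name, (v, p) in workloads.items():
        f = fixed_time(v, p, V_UNITS, P_UNITS)
        u = unified_time(v, p, V_UNITS + P_UNITS)
        print(f"{name:13s} fixed={f:5.1f} unified={u:5.1f}")
```

In this model the two designs tie only when the workload happens to match the fixed split. Any other mix leaves one pool idle in the segregated design, while the unified pool keeps every unit busy.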


111 Comments


  • Sunrise089 - Thursday, November 9, 2006

    Then I suppose he's in the market to part with an ugly old high-end CRT. I'd love to buy it from him. Seriously.
  • JarredWalton - Thursday, November 9, 2006

    You want an older 21" Cornerstone CRT? It's a beast, but you can have it for the cost of shipping (which unfortunately would probably be ~$50). I'd also sell my Samsung 997DF 19" CRT for about $50, and maybe an NEC FE991-SB for $50 (which unfortunately has a scratch from my daughter in the anti-glare coating). If anyone lives in the Olympia, WA area, you know how to contact me (I hope). I'd rather someone come by to pick up any of these CRTs rather than shipping, as I don't think I have the original boxes.
  • DerekWilson - Thursday, November 9, 2006

    lol next thing you know links to ebay auctions are gonna start showing up in our articles :-)
  • yyrkoon - Thursday, November 9, 2006

    lol, I've got a 21" techtronics I'll sell for $200 usd, plus shipping ;) Hasnt been used since I purchased my Viewsonic VA1912wb (well, been used very little ).
  • imaheadcase - Wednesday, November 8, 2006

    can't stand AA benchmarks myself :)

    Question: Do you have any info on what kinda card nvidia releasing this feb? Is it something in between these 2 cards or something even lower?

    Im looking for a $300ish g80! :D
  • flexy - Wednesday, November 8, 2006

    if ANYTHING counts then how those high-end cards perform WITH their various AA settings.... the power of those cards (and the money spent on :) RIGHT translated into ---> IMAGE QUALITY/PERFORMANCE.

    Please dont tell you you would get an G80 but do NOT care about AA, this does NOT make any sense...sorry...

    I am especially impressed reading that transparency AA has such a LITTLE performance impact. What game engine did you test this on ?

    On the older ATI cards (and am i right that T.A. is the same as "adaptive antialiasing" ? )...this feature (depending on game engine) is the FPS killer....eg. w/ games like oblivion (WHERE ARE THE GOTHIC 3 BENCHEIS BTW ? :)...much vegetation etc. game-engines.

    Enable transparency AA and see all those trees, grass etc. without jaggies.

  • imaheadcase - Thursday, November 9, 2006

    Well lots of people don't are for AA. Even if i had this card I would not use it. I visually see NO difference with it on or off. Its personal test. I don't even see "jaggies" on my older 9700 PRO card.
  • flexy - Thursday, November 9, 2006

    you sure are talking about ANTIALIASING ???

    What resoltions do you run ? Not that my CRT can even handle more than 1600x1200..but even w/ 1600 i get VERY prominent jaggies if i dont run AA.

    I made it a habit to run at least 4xAA in ANY game, and some engines (hl2:source engine) etc. run extremely well with 4xAA, even 6xAA is very playable at elast with HL2.

    The very recent games, namely NWN2 and G3 now dont support AA, playing at 1280x1024 and it looks utterly horrible ! If you say you dont see jags in say ANY resolution under 1600..very hard to believe
  • imaheadcase - Thursday, November 9, 2006

    Yes im talking about antialiasing. I normally play BF2 and oblivion at 1024x768 (9700 pro remember).

    Fact is most people won't see them unless someone points them out. The brain is still better at rendering stuff the way you want to see it vs hardware :)
  • flexy - Thursday, November 9, 2006

    ok..but then it's also a performance problem. If it doesn't bother you, well ok.
    I also have to settle w/ the fact that many RECENT games are even unable to do AA..however i wish they would.

    But once i get a 8800 i will do &&&& to get the most out of IQ, AA, AF, transparency/texture AF, you name it. ALONE also for the reason that i would need a super-high end monitor first to even run resolutions like 2000xsomething...and as long as i have a lame 19" CRT and CANNOT even go over 1600 (99,99% of games even running everything on 1280x or 1360x) i will use all the power to get out best possible IQ in those low resolutions.

    Also..looking at the benchmarks..its NOT that you lose any real time gaming-experiencee since THOSE monster cards are made for exactly this...eg. running oblivion with all those settings at MAX AND AA on and HDR...and you are still in VERY reasobale FPS ranges.
