Compilation Integration

To maximize performance, the NV3x pipeline needs to be kept as full as possible at all times. For this to happen, special care needs to be taken in how instructions are issued to the hardware. One aspect of this is that the architecture benefits from interleaved pairs of different types of instructions (for instance: issue two texture instructions, followed by two math instructions, followed by two texture instructions, etc.). This is in contrast to ATI's hardware, which prefers to see a large block of texture instructions followed by a large block of math instructions for optimal results.
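
To make the interleaving idea concrete, here is a minimal Python sketch of a greedy reordering pass. It is our own toy illustration, not NVIDIA's scheduler: the function name `interleave_pairs`, the instruction tuples, and the `group` parameter are all hypothetical, and a real scheduler would also weigh latencies and register use.

```python
def interleave_pairs(instrs, group=2):
    """Reorder instructions so kinds alternate in runs of `group`.

    instrs: list of (name, kind, deps) in program order, where kind is
    "tex" or "math" and deps lists the names that must be issued first.
    Dependencies are always respected; alternation is only a preference.
    """
    done, order = set(), []
    remaining = list(instrs)
    current, run = None, 0  # kind of the current run and its length
    while remaining:
        ready = [ins for ins in remaining if all(d in done for d in ins[2])]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        if current is None:
            want = ready[0][1]
        elif run < group:
            want = current  # keep filling the current pair
        else:
            want = "math" if current == "tex" else "tex"  # switch kinds
        # Fall back to the first ready instruction if the wanted kind is absent.
        pick = next((ins for ins in ready if ins[1] == want), ready[0])
        remaining.remove(pick)
        done.add(pick[0])
        order.append(pick[0])
        if pick[1] == current:
            run += 1
        else:
            current, run = pick[1], 1
    return order


# A block of texture fetches followed by a block of dependent math.
block_style = [
    ("t0", "tex", []), ("t1", "tex", []), ("t2", "tex", []), ("t3", "tex", []),
    ("m0", "math", ["t0"]), ("m1", "math", ["t1"]),
    ("m2", "math", ["t2"]), ("m3", "math", ["t3"]),
]
paired = interleave_pairs(block_style)
# paired == ["t0", "t1", "m0", "m1", "t2", "t3", "m2", "m3"]
```

The input is written in the "large block" style; the output is the same work issued as two texture instructions, two math instructions, and so on, without ever moving a math instruction ahead of the fetch it depends on.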

To illustrate NVIDIA's sensitivity to instruction order, the simplest example we can offer is calculating a^2 * 2^b:

mul r0,a,a
exp r1,b
mul r0,r0,r1

-takes 2 cycles on NV35

exp r1,b
mul r0,a,a
mul r0,r0,r1

-takes 1 cycle on NV35

This is a trivial example, but it gets the point across. Obviously, there are real benefits to be had from simple, standard compiler optimizations that don't affect the output of the code at all. What kind of optimizations are we talking about here? Allow us to elaborate.

Aside from maximizing the parallelism of the hardware, instruction reordering can also help reduce register pressure if we minimize the live ranges of registers that hold independent data. Consider this:

mul r0,a,a
mul r1,b,b
st r0
st r1

If we reorder the instructions we can use only one register without affecting the outcome of the code:

mul r0,a,a
st r0
mul r0,b,b
st r0
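
The register savings can be checked mechanically by counting how many values are live at once. The Python sketch below is our own illustration: the function `max_live` and the temporary names `t0` and `t1` are hypothetical, and the inputs `a` and `b` are assumed not to occupy temporary registers.

```python
def max_live(instrs):
    """Return the peak number of temporaries that are live at the same time.

    instrs: list of (dest, sources); dest is None for instructions such as
    a store that produce no value. A temporary is live from the instruction
    that writes it until its last use.
    """
    last_use = {}
    for i, (_, sources) in enumerate(instrs):
        for s in sources:
            last_use[s] = i
    live, peak = set(), 0
    for i, (dest, sources) in enumerate(instrs):
        for s in sources:
            if last_use[s] == i:
                live.discard(s)  # value dies at its last use
        if dest is not None and dest in last_use:
            live.add(dest)  # value is used later, so it needs a register
        peak = max(peak, len(live))
    return peak


# mul, mul, st, st: both products are alive at once.
original = [("t0", ["a", "a"]), ("t1", ["b", "b"]), (None, ["t0"]), (None, ["t1"])]
# mul, st, mul, st: each product is stored before the next is computed.
reordered = [("t0", ["a", "a"]), (None, ["t0"]), ("t1", ["b", "b"]), (None, ["t1"])]

print(max_live(original))   # 2
print(max_live(reordered))  # 1
```

With a peak of one live value, `t0` and `t1` can share a single physical register, which is exactly what the reordered listing above does with r0.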

Register allocation is a very hefty part of compiler optimization, but special care needs to be taken to do it correctly and quickly for this application. Commonly, a variety of graph coloring heuristics are available to compiler designers. It seems NVIDIA is using an interference graph style of register allocation, and is allocating registers per component, though we are unclear on what is meant by "component".
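
For readers unfamiliar with interference graphs, here is a minimal Python sketch of the general technique. It is a textbook-style greedy coloring under our own assumptions, not NVIDIA's allocator: the live ranges are invented, and the names `build_interference` and `greedy_color` are hypothetical.

```python
def build_interference(live_ranges):
    """Connect every pair of values whose live ranges overlap.

    live_ranges: {name: (start, end)} with end exclusive.
    """
    names = list(live_ranges)
    graph = {n: set() for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (s1, e1), (s2, e2) = live_ranges[a], live_ranges[b]
            if s1 < e2 and s2 < e1:  # half-open intervals overlap
                graph[a].add(b)
                graph[b].add(a)
    return graph


def greedy_color(graph):
    """Assign each node the lowest register number unused by its neighbors."""
    assignment = {}
    # Heuristic: handle the most constrained (highest-degree) nodes first.
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        used = {assignment[m] for m in graph[node] if m in assignment}
        reg = 0
        while reg in used:
            reg += 1
        assignment[node] = reg
    return assignment


ranges = {"t0": (0, 3), "t1": (1, 2), "t2": (2, 4), "t3": (3, 5)}
graph = build_interference(ranges)
registers = greedy_color(graph)
# Four temporaries fit in two registers: t0 and t3 share one, t1 and t2 the other.
```

Two values that interfere never receive the same register, while values whose lifetimes do not overlap are free to share one. Greedy coloring is only a heuristic; it does not guarantee the minimum register count in general.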

Dead code elimination is a very common optimization; essentially, if the developer includes code that can never be executed, we can eliminate this code from the program. Such situations are often revealed when performing multiple optimizations on code, but it’s still a useful feature for the occasional time a developer falls asleep at the screen.
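
As a rough illustration of removing code that can never be executed, the Python sketch below walks a toy control-flow graph and drops every block that cannot be reached from the entry point. The graph, the block names, and the function names are hypothetical; the source does not describe how NVIDIA's compiler represents shaders internally.

```python
def reachable_blocks(cfg, entry):
    """Return the set of blocks reachable from entry.

    cfg: {block: [successor blocks]}
    """
    seen, stack = set(), [entry]
    while stack:
        block = stack.pop()
        if block in seen:
            continue
        seen.add(block)
        stack.extend(cfg.get(block, ()))
    return seen


def eliminate_unreachable(cfg, entry):
    """Return a copy of cfg containing only the reachable blocks."""
    keep = reachable_blocks(cfg, entry)
    return {block: list(succs) for block, succs in cfg.items() if block in keep}


# "debug" has no path leading to it, so it can never run.
cfg = {"entry": ["body"], "body": ["exit"], "debug": ["exit"], "exit": []}
pruned = eliminate_unreachable(cfg, "entry")
# pruned keeps "entry", "body" and "exit" and drops "debug"
```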

There are a great many other optimizations that can be performed on code which have absolutely no effect on outcome. This is a very important aspect of computing, and it only gets more complicated as computer technology gets more powerful. Intel's Itanium processors are prohibitively difficult to hand code for, and no IA-64 based processor would run code well unless the compiler that generated the code was able to specifically tailor that code to the parallel nature of the hardware. We are seeing the same type of thing here with NVIDIA's architecture.

Of course, NVIDIA has the added challenge of implementing a real-time compiler much like a Java JIT or Transmeta's code morphing software. As such, there are other very interesting time-saving things they need to do with their compiler in order to reduce the impact of trying to adequately approximate the solution to an NP-complete problem in an extremely small amount of time.

A shader cache is implemented to store previously compiled shaders; this means that shaders shouldn't have to be compiled more than once. Directed Acyclic Graphs (DAGs) of the code are used to fingerprint compiled shaders. There is also a stock set of common, precompiled shaders that can get dropped in when NVIDIA detects what a developer is trying to accomplish. NVIDIA will need to take special care to make sure that this feature remains a feature and doesn't break anything, but we see this as a good thing as long as no one feels the power of the dark side.
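
One way a DAG could serve as a cache key is sketched below in Python. This is purely our assumption about how such a scheme might look: the hashing approach, the `COMMUTATIVE` set, and the names `fingerprint` and `ShaderCache` are hypothetical, and NVIDIA has not described its actual fingerprinting method in this detail.

```python
import hashlib

COMMUTATIVE = {"mul", "add"}  # assumed: operand order is irrelevant for these


def fingerprint(dag, node, memo=None):
    """Hash the expression rooted at node, ignoring node names.

    dag: {node: (op, [operand nodes])}; inputs are (label, []).
    """
    if memo is None:
        memo = {}
    if node not in memo:
        op, operands = dag[node]
        parts = [fingerprint(dag, child, memo) for child in operands]
        if op in COMMUTATIVE:
            parts.sort()  # a*b and b*a get the same fingerprint
        text = op + "(" + ",".join(parts) + ")"
        memo[node] = hashlib.sha256(text.encode()).hexdigest()
    return memo[node]


class ShaderCache:
    """Compile each distinct DAG once and reuse the result afterwards."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._store = {}
        self.compiles = 0

    def get(self, dag, root):
        key = fingerprint(dag, root)
        if key not in self._store:
            self.compiles += 1
            self._store[key] = self._compile(dag, root)
        return self._store[key]


# a^2 * 2^b written two ways: different node names and operand order.
dag1 = {"a": ("input:a", []), "b": ("input:b", []), "sq": ("mul", ["a", "a"]),
        "ex": ("exp", ["b"]), "out": ("mul", ["sq", "ex"])}
dag2 = {"x0": ("input:a", []), "x1": ("input:b", []), "p": ("exp", ["x1"]),
        "q": ("mul", ["x0", "x0"]), "r": ("mul", ["p", "q"])}

cache = ShaderCache(lambda dag, root: "compiled:" + fingerprint(dag, root)[:8])
first = cache.get(dag1, "out")
second = cache.get(dag2, "r")  # cache hit: same fingerprint as dag1
```

Because the fingerprint depends on the structure of the computation rather than on register names or instruction order, the two listings of a^2 * 2^b from earlier would map to the same cache entry in this sketch.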

Also, until the most recent couple of driver releases from NVIDIA, the real-time compiler didn't implement all of these important optimizations on shader code sent to the card by a game. The frame rate increases of beyond 50% with no image quality loss can be attributed to the enhancements NVIDIA has made to the real-time compiler. All of the performance we've previously seen has rested on how well NVIDIA and developers were able to hand code shaders and graphics subroutines.

Of course, writing "good code" (code that suits the hardware it’s written for) will help the compiler be more efficient as well. We certainly won't be seeing the end of NVIDIA sitting down at the table with developers to help them acclimate their code to NV3x hardware, but this Unified Compiler technology will definitely help us see better results from everyone's efforts.


114 Comments

  • Anonymous User - Thursday, October 23, 2003 - link

    Ever hear of journalistic integrity? He has a responsibility to be objective. He replies "also, there were no glaringly unplayable image quality issues on either side of the line."

    What a political answer - glaringly unplayable image quality issues? A $499 card shouldn't have any unplayable issues, heck even a $99 card should be playable.

    He's dodging the issue about playable image quality issues - missing or lower quality lighting effects for example. The point is that Nvidia has been caught lowering imager quality - removing the eye candy you are paying for in dx9 cards, and they have continued to do so.
  • Anonymous User - Thursday, October 23, 2003 - link

    Once again "IQ to come in part 2"... mebbe they will ... mebbe they won't... but they don't have a very good track record so far... and what is up with that choice of games? Go read the [H]OCP review... I may have been vocal against [H] in the past but there review of teh 5700 and 5950 is spot on with worthwhile gaming results.....

    I really fail to see how you recommend 5700 over 9600pro in this.... and skip all the NV 'driver bugs' too.... ah well nm... another nail in the AT coffin....
  • Anonymous User - Thursday, October 23, 2003 - link

    Hrmm, I see an NVida add on the top right of my screen. Ever see ATI adds ant anandtech? Know what complementary copy is?
  • Anonymous User - Thursday, October 23, 2003 - link

    Here's my conclusion: if you're gonna bitch and moan, read a different tech site. No one's forcing you to accept Derek's conclusions.

    I think some of you need to be a little more respectful with your comments and suggestions.
  • Anonymous User - Thursday, October 23, 2003 - link

    How can any conclusions be made without an image quality comparison. The "final words" section is based purely on the framerate numbers? How can you even draw a conclusion?
  • Anonymous User - Thursday, October 23, 2003 - link

    I'll just copy this from what I wrote at Beyond 3D:

    I was so confused by this comment from AT:

    AnandTech wrote: "In fact, NVIDIA has flipped the tables on ATI in the midrange segment and takes the performance crown with a late round TKO. It was a hard fought battle with many ties, but in the games where the NV36 based card took the performance lead, it lead with the style of a higher end card."


    That I tabulated my own results:

    NON AA
    ---------
    5700 wins 10 times
    9600 XT wins 6

    Where the 5700 won, it won on average by 15%
    Where the 9600 won, it won on average by 17%


    WITH AA / ANISO
    ---------
    5700 wins 6 times
    9600 wins 6 times

    Where the 5700 won, it won on average by 23%
    Where the 9600 won, it won on average by 54%

    There certainly is ZERO justification for saying something like: "but in the games where the NV36 based card took the performance lead, it lead with the style of a higher end card."

    That characteristic belongs to ATI, not nVidia.

    Another way to look at it: What percentage FPS difference is required to declare a "clear winner?"

    Let's say that less than 10% difference, the cards are tied. In this case:

    NO AA/ANISO
    ----------------

    5700 wins 6 tests
    9600 wins 4 tests

    When the 5700 wins, it's by an average of 22%
    When the 9600 wins, it's by an average of 22%

    With AA/Aniso
    ----------------
    5700 wins 4 tests
    9600 wins 6 tests

    When the 5700 wins, it's by an average of 33%
    When the 9600 wins, it's by an average of 54%


    I wish Anand's conclusions would actually agree with his data.
  • Anonymous User - Thursday, October 23, 2003 - link

    Hello? where are the hardware, software, and driver specs? Editorial review? What's that?
  • Anonymous User - Thursday, October 23, 2003 - link

    Separating Image Quality results from the review is completely misleading.
  • Anonymous User - Thursday, October 23, 2003 - link

    It's not unplayable image quality errors - the pics in the hardocp review show missing graphical features to enhance your gaming - ie walls with computers on them with nvidia with no blinking lights, on the ati it had purple and green blinking lights - yes playable on both - but when you pay $499 you want to see the game the way it was intended by the programmers. Same goes for the flashlight pics on hardocp , nvidia the flashlight beam is a mess, ati the flashlight beam is perfectly round like a real flashlight.

    Just another case of nvidia removing graphical effects to speed up their cards to compete with ati.
  • DerekWilson - Thursday, October 23, 2003 - link

    so, the cheapest 9800 Pro I see on new egg is a refurb for 280...

    also, there were no glaringly unplayable image quality issues on either side of the line.

    give us a chance to get everything we want to get done done wrt image quality. We've got a lot planned.
