Compilation Integration

To maximize performance, the NV3x pipeline needs to stay as full as possible at all times, which means special care must be taken in how instructions are issued to the hardware. One aspect of this is that the architecture benefits from interleaved pairs of different instruction types (for instance: two texture instructions, followed by two math instructions, followed by two more texture instructions, and so on). This contrasts with ATI's hardware, which performs best when given a large block of texture instructions followed by a large block of math instructions.

To illustrate NVIDIA's sensitivity to instruction order, consider the example of calculating a^2 * 2^b:

mul r0,a,a
exp r1,b
mul r0,r0,r1

-takes 2 cycles on NV35

exp r1,b
mul r0,a,a
mul r0,r0,r1

-takes 1 cycle on NV35

This is a trivial example, but it gets the point across. Clearly, there are real benefits to be had from performing simple, standard compiler optimizations that don't affect the output of the code at all. What kinds of optimizations are we talking about here? Allow us to elaborate.
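The effect of this kind of reordering can be sketched with a toy list scheduler. This is purely illustrative (the opcode latencies and the instruction representation are our own assumptions, not NV35's real numbers), but it shows how preferring to issue the long-latency exp first reproduces the faster ordering above:

```python
# Illustrative latencies only; not NV35's actual timing.
LATENCY = {"exp": 2, "mul": 1}

def schedule(prog, inputs):
    """prog: list of (dest, op, srcs); inputs: names defined outside the block.
    Greedily issues the ready instruction with the longest latency, so slow
    results are in flight while independent work executes."""
    pending, done, out = list(prog), set(inputs), []
    while pending:
        # an instruction is ready once all of its sources have been produced
        ready = [ins for ins in pending if all(s in done for s in ins[2])]
        nxt = max(ready, key=lambda ins: LATENCY[ins[1]])
        out.append(nxt)
        pending.remove(nxt)
        done.add(nxt[0])
    return out

prog = [("t0", "mul", ("a", "a")),    # a * a
        ("t1", "exp", ("b",)),        # 2^b
        ("t2", "mul", ("t0", "t1"))]  # a^2 * 2^b
```

Running `schedule(prog, {"a", "b"})` emits the exp before either mul, matching the one-cycle NV35 ordering shown above.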

Aside from reordering instructions to maximize the parallelism of the hardware, reordering can also help reduce register pressure by minimizing the live ranges of independent data. Consider this:

mul r0,a,a
mul r1,b,b
st r0
st r1

If we reorder the instructions, we can make do with only one register without affecting the outcome of the code:

mul r0,a,a
st r0
mul r0,b,b
st r0
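The savings above can be sketched with a tiny live-range-driven allocator. The instruction representation is our own invention, but it shows the mechanism: a register is returned to the free pool at its value's last use, so the reordered listing fits in a single register while the original needs two:

```python
def allocate(prog, regs):
    """prog: list of (dest, srcs), with dest=None for stores;
    regs: available physical registers.
    Returns a map from virtual name to physical register."""
    # record the index of each value's final use
    last_use = {}
    for i, (dest, srcs) in enumerate(prog):
        for s in srcs:
            last_use[s] = i
    free, assign = list(regs), {}
    for i, (dest, srcs) in enumerate(prog):
        for s in set(srcs):
            # free a register once its value is dead
            if last_use[s] == i and s in assign:
                free.append(assign[s])
        if dest is not None:
            if not free:
                raise RuntimeError("out of registers: spill needed")
            assign[dest] = free.pop(0)
    return assign

# the reordered listing from the article: compute, store, reuse
reordered = [("t0", ("a", "a")), (None, ("t0",)),
             ("t1", ("b", "b")), (None, ("t1",))]
```

With `regs=["r0"]`, both `t0` and `t1` land in `r0`; the original ordering (both muls before both stores) raises the spill error with a single register.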

Register allocation is a substantial part of compiler optimization, and special care must be taken to do it both correctly and quickly for this application. A variety of graph coloring heuristics are commonly available to compiler designers. NVIDIA appears to be using an interference-graph style of register allocation, and to be allocating registers per component, though we are unclear on exactly what is meant by "component".
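For illustration, the interference-graph approach can be sketched as a bare-bones Chaitin-style coloring pass; this is the textbook heuristic, not NVIDIA's actual allocator. Nodes are live ranges, an edge marks two ranges live at the same time, and colors are physical registers:

```python
def color(graph, k):
    """graph: {node: set of interfering neighbors}; k: register count.
    Returns a node->color map (0..k-1) or None if a spill is needed."""
    stack, g = [], {n: set(ns) for n, ns in graph.items()}
    while g:
        # a node with fewer than k neighbors is always colorable; remove it
        n = next((m for m in g if len(g[m]) < k), None)
        if n is None:
            return None  # every node has >= k neighbors: spill required
        stack.append(n)
        g.pop(n)
        for ns in g.values():
            ns.discard(n)
    # color in reverse removal order, avoiding already-colored neighbors
    colors = {}
    for n in reversed(stack):
        used = {colors[m] for m in graph[n] if m in colors}
        colors[n] = next(c for c in range(k) if c not in used)
    return colors
```

A triangle of three mutually interfering live ranges needs three registers: `color` returns None for k=2 and a valid assignment for k=3.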

Dead code elimination is a very common optimization; essentially, if a program contains code that can never execute, or whose results are never used, that code can be removed from the program. Such situations often emerge as other optimizations transform the code, but it's still a useful feature for the occasional time a developer falls asleep at the screen.
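A minimal sketch of dead code elimination on straight-line code follows; the instruction format is our own assumption. Walking backwards from the shader's required outputs, any instruction whose result is never consumed can simply be dropped:

```python
def eliminate_dead(prog, outputs):
    """prog: list of (dest, op, srcs); outputs: registers the shader
    must produce. Returns the program without dead instructions."""
    live, kept = set(outputs), []
    for dest, op, srcs in reversed(prog):
        if dest in live:
            kept.append((dest, op, srcs))
            live.discard(dest)      # this definition satisfies the use
            live.update(srcs)       # its sources are now needed
    kept.reverse()
    return kept
```

Given a program whose middle instruction feeds nothing, only the two contributing instructions survive.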

There are a great many other optimizations that can be performed on code with absolutely no effect on its outcome. This is a very important aspect of computing, and it only gets more complicated as computer technology gets more powerful. Intel's Itanium processors are prohibitively difficult to hand code, and no IA-64 based processor will run code well unless the compiler that generated it specifically tailored the code to the parallel nature of the hardware. We are seeing the same kind of thing here with NVIDIA's architecture.

Of course, NVIDIA has the added challenge of implementing a real-time compiler, much like the Java JIT or Transmeta's code morphing software. As such, there are other very interesting time-saving measures they need to take in their compiler to reduce the impact of trying to adequately approximate the solution to an NP-complete problem in an extremely small amount of time.

A shader cache is implemented to store previously compiled shaders, which means that shaders shouldn't have to be compiled more than once. Directed Acyclic Graphs (DAGs) of the code are used to fingerprint compiled shaders. There is also a stock set of common, precompiled shaders that can be dropped in when NVIDIA detects what a developer is trying to accomplish. NVIDIA will need to take special care to make sure that this feature remains a feature and doesn't break anything, but we see it as a good thing as long as no one feels the power of the dark side.
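A cache of this kind can be sketched as below. The canonicalize-and-hash fingerprint is a simple stand-in for NVIDIA's DAG fingerprinting (which survives deeper structural differences), and the class and names are our own invention:

```python
import hashlib

class ShaderCache:
    """Compile each distinct shader once; later lookups hit the cache."""

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn
        self.cache = {}
        self.compiles = 0  # how many real compilations occurred

    def _fingerprint(self, source):
        # canonicalize whitespace so superficially different sources map
        # to the same key; a real DAG fingerprint goes much further
        canonical = " ".join(source.split())
        return hashlib.sha1(canonical.encode()).hexdigest()

    def get(self, source):
        key = self._fingerprint(source)
        if key not in self.cache:
            self.compiles += 1
            self.cache[key] = self.compile_fn(source)
        return self.cache[key]
```

Two submissions of the same shader, differing only in whitespace, trigger a single compilation and return the same compiled object.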

Until the most recent couple of driver releases from NVIDIA, the real-time compiler didn't implement all of these important optimizations on shader code sent to the card by a game. The frame rate increases of over 50% with no loss in image quality can be attributed to these enhancements to the real-time compiler. All of the performance we'd previously seen rested on how well NVIDIA and developers were able to hand code shaders and graphics subroutines.

Of course, writing "good code" (code that suits the hardware it’s written for) will help the compiler be more efficient as well. We certainly won't be seeing the end of NVIDIA sitting down at the table with developers to help them acclimate their code to NV3x hardware, but this Unified Compiler technology will definitely help us see better results from everyone's efforts.


114 Comments


  • Anonymous User - Saturday, October 25, 2003 - link

    The irregularities ATi's drivers allegedly display in AquaMark 3 and UT2003 require further investigation. Factors such as image quality, driver reliability, and compatibility are hard to convey in a review anyway.
    this is from Tom
    so to all the ATI lovers here, go #"&' yourselves
    a few weeks/months ago you guys said that NVIDIA cheated
    and now ATI does and you still have a big mouth
    no, I am not an NVIDIA lover
    I am a performance lover (for me ATI may change that with the NV40) but you guys AAAARRRGGGGHHHH
    btw I haven't read pages 4 and 5, too busy
  • Anonymous User - Friday, October 24, 2003 - link

    Speaking of the HardOCP review: I did notice that on the initial review at nvnews, their screenshot of Halo shows the blinking lights in the hallways that HardOCP said were not there. They are using the 5950 and the latest drivers, so it would seem one of the two sites made a booboo.
  • Anonymous User - Friday, October 24, 2003 - link

    Excuse me, but if a review without an IQ comparison is OK, why even care about video cards at all? If image quality isn't important because you can't really notice it in games, go buy yourself a GeForce3 and knock yourself out, since it will play everything fine and you don't have to worry about image quality because you can even turn it up, how about that?

    And you don't need to zoom in on anything to see NVIDIA's new filtering method, which is now worse than ATI's. The bilinear filtering is ESPECIALLY noticeable in motion and causes the same kind of effect that aliasing does. Go look at the HardOCP shots for yourself, especially in NASCAR and Flight Sim
  • Anonymous User - Friday, October 24, 2003 - link

    Wow. Anyone who whines about people demanding actual image quality comparisons is certainly NOT a gamer, or at least not one with decent hardware and eyes.

    I'll school all of your asses at UT and I'm damn sure not going to do it with dynamic detail reduction(TM), not for $200 or more.

    If the IQ differences between the cards are so minimal, why is it readily obvious when playing FS2004 and TR:AOD and UT2003 which card you're playing on? I'll tell you why:
    Because the ground textures on FS2004 look like crap, and trilinear filtering DOES NOT WORK AT ALL AFTER THE FIRST STAGE, REGARDLESS OF APPLICATION OR DRIVER CONTROL SETTINGS in D3D with the FX family.
    Instead, we get slightly improved bilinear that looks visually inferior to trilinear by a mile.

    And you know what?
    It's a lot EASIER to see WHEN YOU'RE PLAYING THE GAME, because the texture 'horizon' is always moving.
    Not that anyone who's fawning over AT would know. An FX5200 ain't gonna show you son.
  • Anonymous User - Friday, October 24, 2003 - link

    watchu' talkin'bout willis?!

    watchu' talkin'bout willis?!

    watchu' talkin'bout willis?!



  • WooDaddy - Friday, October 24, 2003 - link

    #62, Live, I agree.

    Derek, thanks for the review. I really liked the fact that the Ti4200 was included. REALLY helpful. I think I can hold out for a while. If not, the ATI 9700PRO will be considered.
  • Jeff7181 - Friday, October 24, 2003 - link

    Ok... I checked out the other reviews... and HardOCP's results differ from AT's... but Tom's look pretty much the same.
    AT did come up with a different conclusion though, saying the FX5700 is a better buy than the 9600XT. And I agree. I know many of you will get your shorts in a knot about this, but ATI's driver quality still isn't up to par with nVidia's. A friend of mine has had nothing but trouble getting his 9800 Pro to work correctly.
    In my opinion, ATI will have to take a hefty lead in the performance area to make up for the driver problems to get their card into my rig.
  • Jeff7181 - Friday, October 24, 2003 - link

    I'll have to take a look at the reviews by other sites... but personally, on my Aopen FX5900 @ 490/950, everything looks great. The quality is better than the 45.23's in my opinion. Taking a look at a still picture that is blown up 400X to compare individual pixels is stupid. What might look worse pixel per pixel may look better at normal size, frame by frame.
  • Anonymous User - Friday, October 24, 2003 - link

    ok..5700ultra seems fine... but i bought 5600ultra 1-2 months ago... what will happen to me... ????
    :(
  • Anonymous User - Friday, October 24, 2003 - link

    i think we need a butt pirate joke right about now
