Compilation Integration

To maximize performance, the NV3x pipeline needs to be kept as full as possible at all times. For this to happen, special care must be taken in how instructions are issued to the hardware. One aspect of this is that the architecture benefits from interleaved pairs of different types of instructions (for instance: two texture instructions, followed by two math instructions, followed by two texture instructions, and so on). This is in contrast to ATI's hardware, which prefers to see a large block of texture instructions followed by a large block of math instructions for optimal results.

To illustrate NVIDIA's sensitivity to instruction order, the simplest example we can offer is calculating a^2 * 2^b:

mul r0,a,a
exp r1,b
mul r0,r0,r1

-takes 2 cycles on NV35

exp r1,b
mul r0,a,a
mul r0,r0,r1

-takes 1 cycle on NV35

This is a trivial example, but it gets the point across. Clearly, there are real benefits to be had from simple, standard compiler optimizations that don't affect the output of the code at all. What kind of optimizations are we talking about here? Allow us to elaborate.
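As a rough illustration of the interleaving described earlier, here is a minimal Python sketch of a greedy, dependency-aware scheduler that tries to issue instructions in alternating pairs of texture and math operations. The `Instr` type, the `kind` classification, and the `interleave` policy are all assumptions made up for this sketch; it is not NVIDIA's scheduler and does not model real NV3x cycle counts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    op: str        # mnemonic, e.g. "tex" or "mul"
    dest: str      # register written
    srcs: tuple    # registers read
    kind: str      # "tex" or "math" (hypothetical classification)

def must_follow(later, earlier):
    """True if swapping the two instructions could change the result."""
    return (earlier.dest in later.srcs       # read after write
            or earlier.dest == later.dest    # write after write
            or later.dest in earlier.srcs)   # write after read

def interleave(program, group=2):
    """Greedy list scheduling: issue up to `group` ready instructions of one
    kind, then switch kinds, without ever breaking a dependency."""
    remaining, scheduled = list(program), []
    want, run = "tex", 0
    while remaining:
        # Ready = not blocked by anything still waiting ahead of it.
        ready = [ins for i, ins in enumerate(remaining)
                 if not any(must_follow(ins, prev) for prev in remaining[:i])]
        if run >= group or all(ins.kind != want for ins in ready):
            want, run = ("math" if want == "tex" else "tex"), 0
        pick = next((ins for ins in ready if ins.kind == want), ready[0])
        if pick.kind != want:      # nothing of the wanted kind is ready
            want, run = pick.kind, 0
        remaining.remove(pick)
        scheduled.append(pick)
        run += 1
    return scheduled

# Input in "block" order: four texture fetches followed by four math ops.
program = [Instr("tex", f"t{i}", (f"c{i}",), "tex") for i in range(4)]
program += [Instr("mul", f"r{i}", (f"t{i}", f"t{i}"), "math") for i in range(4)]

for ins in interleave(program):
    print(ins.op, ins.dest, ",".join(ins.srcs))
```

The scheduler only moves an instruction ahead of others when no read or write hazard links them, so the reordered program computes the same values as the original.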

Aside from reordering instructions to maximize the parallelism of the hardware, reordering can also help reduce register pressure if we minimize the live ranges of registers that hold independent data. Consider this:

mul r0,a,a
mul r1,b,b
st r0
st r1

If we reorder the instructions we can use only one register without affecting the outcome of the code:

mul r0,a,a
st r0
mul r0,b,b
st r0
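To make the saving concrete, the following Python sketch counts how many temporaries hold a still-needed value at the same time in each ordering. The tuple encoding of instructions and the rule that names starting with `r` are temporaries are assumptions of this sketch, not anything taken from NVIDIA's tools.

```python
def peak_live_registers(program):
    """Largest number of temporaries that hold a still-needed value at once.

    `program` is a list of (op, dest, srcs) tuples in execution order;
    `dest` is None for instructions such as `st` that only read.
    """
    is_temp = lambda name: name.startswith("r")   # assumption: r* are temporaries
    live, peak = set(), 0
    for op, dest, srcs in reversed(program):      # walk backwards from the end
        live.discard(dest)                        # not needed before it is written
        live.update(s for s in srcs if is_temp(s))  # needed before it is read
        peak = max(peak, len(live))
    return peak

two_registers = [("mul", "r0", ("a", "a")), ("mul", "r1", ("b", "b")),
                 ("st", None, ("r0",)), ("st", None, ("r1",))]
one_register = [("mul", "r0", ("a", "a")), ("st", None, ("r0",)),
                ("mul", "r0", ("b", "b")), ("st", None, ("r0",))]

print(peak_live_registers(two_registers), peak_live_registers(one_register))
```

The first ordering keeps both products alive until the stores, while the second finishes with each value before the next one is computed.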

Register allocation is a very hefty part of compiler optimization, and special care needs to be taken to do it both correctly and quickly for this application. A variety of graph coloring heuristics are commonly available to compiler designers. It seems NVIDIA is using an interference graph style of register allocation, and is allocating registers per component, though we are unclear on what is meant by "component".
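For readers unfamiliar with the idea, here is a textbook-style Python sketch of interference graph allocation: two virtual registers interfere when their live ranges overlap, and a greedy coloring then assigns physical registers so that interfering values never share one. The function names, the `v0`/`v1` virtual register names, and the single-definition assumption are ours; this is a generic illustration, not a description of NVIDIA's allocator or of its per-component scheme.

```python
from itertools import count

def live_ranges(program):
    """Map each virtual register to (index of definition, index of last use)."""
    ranges = {}
    for i, (op, dest, srcs) in enumerate(program):
        for s in srcs:
            if s in ranges:                  # inputs such as a, b are skipped
                ranges[s] = (ranges[s][0], i)
        if dest is not None:
            ranges[dest] = (i, i)            # assumes each register is defined once
    return ranges

def interference_graph(ranges):
    """Connect two registers when their live ranges overlap."""
    graph = {v: set() for v in ranges}
    for u, (u_def, u_end) in ranges.items():
        for v, (v_def, v_end) in ranges.items():
            if u != v and u_def < v_end and v_def < u_end:
                graph[u].add(v)
    return graph

def color(graph):
    """Greedy coloring: most-constrained register first, lowest free color."""
    assignment = {}
    for v in sorted(graph, key=lambda v: len(graph[v]), reverse=True):
        taken = {assignment[n] for n in graph[v] if n in assignment}
        assignment[v] = next(c for c in count() if c not in taken)
    return assignment

def registers_needed(program):
    colors = color(interference_graph(live_ranges(program)))
    return len(set(colors.values()))

original = [("mul", "v0", ("a", "a")), ("mul", "v1", ("b", "b")),
            ("st", None, ("v0",)), ("st", None, ("v1",))]
reordered = [("mul", "v0", ("a", "a")), ("st", None, ("v0",)),
             ("mul", "v1", ("b", "b")), ("st", None, ("v1",))]

print(registers_needed(original), registers_needed(reordered))
```

Optimal graph coloring is NP-complete in general, which is why practical allocators rely on heuristics such as the greedy ordering used here.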

Dead code elimination is a very common optimization; essentially, if the developer includes code that can never be executed, we can eliminate this code from the program. Dead code often only becomes visible after other optimizations have been applied, but the pass is also useful for the occasional time a developer falls asleep at the screen.
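Shader programs of this kind are mostly straight-line code, so the Python sketch below shows a closely related form of the optimization rather than unreachable-code removal: it drops instructions whose results can never reach the shader's outputs. The tuple encoding and the `outputs` parameter are assumptions of the sketch.

```python
def eliminate_dead_code(program, outputs):
    """Drop instructions whose results can never reach `outputs`.

    `program` is a list of (op, dest, srcs) tuples in execution order.
    """
    needed = set(outputs)
    kept = []
    for op, dest, srcs in reversed(program):   # scan from the last instruction
        if dest in needed:
            needed.discard(dest)               # this write satisfies the need
            needed.update(srcs)                # its inputs are now needed instead
            kept.append((op, dest, srcs))
    kept.reverse()
    return kept

shader = [("mul", "r0", ("a", "a")),
          ("mul", "r2", ("b", "b")),           # r2 is never read: dead
          ("exp", "r1", ("b",)),
          ("mul", "r0", ("r0", "r1"))]

for op, dest, srcs in eliminate_dead_code(shader, outputs=["r0"]):
    print(op, dest, ",".join(srcs))
```

Running the pass backwards means a single sweep also removes chains of instructions that only feed other dead instructions.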

There are a great many other optimizations that can be performed on code without any effect on the outcome. This is a very important aspect of computing, and it only gets more complicated as computer technology gets more powerful. Intel's Itanium processors are prohibitively difficult to hand code for, and no IA-64 based processor would run code well unless the compiler that generated the code was able to tailor it specifically to the parallel nature of the hardware. We are seeing the same type of thing here with NVIDIA's architecture.

Of course, NVIDIA has the added challenge of implementing a real-time compiler, much like a Java JIT or Transmeta's Code Morphing software. As such, there are other very interesting time-saving things they need to do with their compiler in order to reduce the impact of trying to adequately approximate the solution to an NP-complete problem in an extremely small amount of time.

A shader cache is implemented to store previously compiled shaders; this means that shaders shouldn't have to be compiled more than once. Directed Acyclic Graphs (DAGs) of the code are used to fingerprint compiled shaders. There is also a stock set of common, precompiled shaders that can get dropped in when NVIDIA detects what a developer is trying to accomplish. NVIDIA will need to take special care to make sure that this feature remains a feature and doesn't break anything, but we see this as a good thing as long as no one feels the power of the dark side.
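We don't know how NVIDIA builds its fingerprints, but the general idea can be sketched in a few lines of Python: reduce the shader to the expression DAG behind its outputs, hash that, and use the hash as the cache key. Everything here (`dag_fingerprint`, `ShaderCache`, the tuple encoding, the choice of SHA-256) is a hypothetical illustration under those assumptions.

```python
import hashlib

def dag_fingerprint(program, outputs):
    """Hash the expression DAG behind `outputs`, ignoring register names
    and the order of independent instructions."""
    value = {}   # register -> expression string for the value it currently holds
    for op, dest, srcs in program:
        args = [value.get(s, s) for s in srcs]   # shader inputs keep their own name
        value[dest] = f"{op}({','.join(args)})"
    text = ";".join(value.get(o, o) for o in outputs)
    return hashlib.sha256(text.encode()).hexdigest()

class ShaderCache:
    """Compile each distinct DAG once and reuse the result afterwards."""
    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._compiled = {}
        self.misses = 0

    def get(self, program, outputs):
        key = dag_fingerprint(program, outputs)
        if key not in self._compiled:
            self.misses += 1
            self._compiled[key] = self._compile(program)
        return self._compiled[key]

# The two orderings of the a^2 * 2^b example describe the same DAG.
order_a = [("mul", "r0", ("a", "a")), ("exp", "r1", ("b",)), ("mul", "r0", ("r0", "r1"))]
order_b = [("exp", "r1", ("b",)), ("mul", "r0", ("a", "a")), ("mul", "r0", ("r0", "r1"))]

cache = ShaderCache(compile_fn=tuple)   # stand-in for a real compiler
cache.get(order_a, ["r0"])
cache.get(order_b, ["r0"])
print(cache.misses)
```

Because both orderings expand to the same expression, the second lookup is a cache hit. A production version would hash DAG nodes individually instead of building full expression strings, which can grow very large when subexpressions are shared.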

Also, until the most recent couple of driver releases from NVIDIA, the real-time compiler didn't implement all of these important optimizations on shader code sent to the card by a game. The frame rate increases of more than 50% with no image quality loss can be attributed to the enhancements NVIDIA has made to the real-time compiler. All of the performance we had seen before that rested on how well NVIDIA and developers were able to hand code shaders and graphics subroutines.

Of course, writing "good code" (code that suits the hardware it’s written for) will help the compiler be more efficient as well. We certainly won't be seeing the end of NVIDIA sitting down at the table with developers to help them acclimate their code to NV3x hardware, but this Unified Compiler technology will definitely help us see better results from everyone's efforts.


114 Comments


  • Anonymous User - Saturday, October 25, 2003 - link

    #49

    There are no DX9 cards....They run it via a DX9 wrapper since the native DX9 in-hardware support sucks more than Jenna Jameson on a gang bang movie.

    Horror stories? Like that of Cat 3.8 burning up monitor crap? Give me a break you idiot, I can say the same crap against NV without any proof, yet I lost 2 GFMX with a real bug on Det 6.xx where the speed of the GPU and memory doubled once you got out of Standby mode. Get back to your sandbox kid.
  • Anonymous User - Saturday, October 25, 2003 - link

    ""jesus, fanatics get all pissy if their card loses in FPS tests... you act like every consumer who reads this review will be swayed into believing that NV sells a superior midrange card... its obvious that the "ATI v NV" battle is personal to u... my only question is why? are you guys trying to justify your purchases by bashing something that poses a threat? personally, i dont let hardware sites choose what i buy... i often times purchase 2 contending cards, and take it upon myself to determine which is better... the winner stays in my machine, the loser goes back to where it came from...""



    My own personal reason is to save a few from lunatic flamers like you who post just rage instead of reason to support your standpoint.....pathetic.
  • Anonymous User - Saturday, October 25, 2003 - link

    Yup. HE couldn't do anything about scores (inevitable), so he proceeded to take everything NVidia said for granted (as usual).

    That's like listening to OJ Simpson accusing everyone else of being a murderer.

    Don't care if there were developers there...ALL HAD WAY BIGGER ISSUES with FX cards and said nothing, but to show a few selected screenshots on TWIMTBP games, and everyone with some brains knows NVidia pushes for optimized code on that software and/or code that will not work right on the competition.....old news.
  • Anonymous User - Saturday, October 25, 2003 - link

    "#19, I don't think it is a fanboy thing. It's an AT thing that's costing them their respect from other hardware sites and readers."


    Wiser words are yet to be spoken.
  • Anonymous User - Saturday, October 25, 2003 - link

    Driver optimization in Assembly explanation is complete BS.
    Easy and nice to rearrange commands and to clean code, but NO INFO about reducing data from 24/32bit registers to 16 bit....give me a break. No IQ loss? They should change their 14" 45Hz monitors for something more up to date, and please, use LOSSLESS images at HIGH RESOLUTIONS if you dare to compare IQ....Beyond3D is definitely light years beyond you at this.

    The benchmark that's hurting NVidia more besides HL2 is of no use because of a "strange" crash. WTF? How other sites can do it? Can some one plz explain them how to install software properly?

    Fun to see how NVidia "completely dominates" when it wins by 2-3% but "it can take a punch to the chin" when it is trailing by a similar number.

    TR was omitted, but they admit X2 runs like crap in FX, yet they put the scores in.

    For the last time...Gunmetal only has 2 Vertex Shader 2.0 instructions....just to be called a DX9 test...that's all. PS are of 1.1 level.
    Aquamark just uses 4 PS 2.0...dunno about VS 2.0 if any.
    Now, Tomb Raider uses 12 PS 2.0. The game can be crappy but there are plenty used, yet that "strange crash" won't allow people to see a future-proof scenario.



    ...should I go on?

    This is a big bunch of tree hugging hippie crap.
    [A] for sure will have a happy christmas...I wonder how much was it.
  • Anonymous User - Saturday, October 25, 2003 - link

    #85 what do you prefer? do you prefer playing 1024x768 @ 60 fps or playing 1600x1200 +FSAA 8x/6x +AA 8x @ 19 fps?
  • Anonymous User - Saturday, October 25, 2003 - link

    Is it just me or could this massive 24 page review have been fit easily into about 10 pages. I spent more time clicking to get to the next page than actually reading the review. I guess that's one way to keep your page view numbers high if you can't provide a decent analysis of the product you are reviewing.
  • Anonymous User - Saturday, October 25, 2003 - link

    LOL 83 you call other hardware sites IQ comparisons shoddy? Boy you have some mouth, go look at Anands IQ comparison in the high-end shootout. This pictures are tiny, compressed jpgs, they do NOT come in fullscreen versions, and most of them omit the ground! a part that should be required to see in any comparison. No wonder they didnt see any IQ problems, they couldnt even SEE Nvidias new filtering method because they dont even show the ground, where filtering IQ is most noticeable. Their Iq comparisons are complete BS and look like they are purposely trying to hide something from their users
  • Anonymous User - Saturday, October 25, 2003 - link

    ()_()
    ( ._.)‹^›
    ((")(")
  • Anonymous User - Saturday, October 25, 2003 - link

    I think this is hilarious,

    The day that a new graphics card comes out, one of the most REPUTABLE hardware review sites busts their ass to write an article about it and gets downplayed.

    They admit that their review is not a full review because there are other factors that they would like to invest more attention to and will release a part 2 at a later date. They go as far as to even say, "we still have more to come in the form of image quality analysis. Our findings in that arena will affect what we recommend just as much as pure speed." Which still seems unsatisfying.

    From what I've seen from EVERY other hardware review site, their IQ examples are shoddy at best (here's two images where one is f'd up, compare). A few of these posters are also much more versed in IQ technology than the rest of us (comments about trilinear filtering in a compiler setting) which is applauded and most likely the type of insight that AT will be devoting to their analysis on IQ.

    The majority of these posts, however, are nothing more than a chance for some immature limp dick computer junkie to get his rocks off by chastising one of the biggest names in hardware anonymously. I will continue to come this site and read the reviews, to learn about new technology and drool high performance electronics. And I will continue to read these comment boards, but mostly as a reminder of how pathetic some folks can be and to get a good laugh every once in awhile (still trying to get past Thomas Jefferson supporting Anal Fisting).

    -The Ways
