Compilation Integration

To maximize performance, the NV3x pipeline needs to be kept as full as possible at all times. For this to happen, special care must be taken in how instructions are issued to the hardware. One aspect of this is that the architecture benefits from interleaved pairs of different types of instructions (for instance: issue two texture instructions, followed by two math instructions, followed by two more texture instructions, and so on). This is in contrast to ATI's hardware, which prefers to see a large block of texture instructions followed by a large block of math instructions for optimal results.
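
As a rough illustration of the kind of interleaving described above, here is a minimal Python sketch. The instruction list, the "pairs of two" grouping, and the interleave_pairs function are our own illustrative assumptions, not NVIDIA's scheduler, and a real scheduler would also have to respect data dependencies:

def interleave_pairs(instructions):
    """Regroup a flat instruction list into alternating pairs of texture and
    math operations, preserving relative order within each category.
    Note: this toy version ignores data dependencies entirely."""
    tex = [i for i in instructions if i[0] == "tex"]
    math = [i for i in instructions if i[0] == "math"]
    scheduled = []
    while tex or math:
        scheduled.extend(tex[:2])   # issue two texture instructions...
        tex = tex[2:]
        scheduled.extend(math[:2])  # ...followed by two math instructions
        math = math[2:]
    return scheduled

shader = [
    ("tex", "t0"), ("tex", "t1"), ("tex", "t2"), ("tex", "t3"),
    ("math", "mul r0, t0, t1"), ("math", "mul r1, t2, t3"),
    ("math", "add r0, r0, r1"), ("math", "exp r1, r0"),
]

for kind, op in interleave_pairs(shader):
    print(kind, op)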

To illustrate NVIDIA's sensitivity to instruction order, consider the simple example of calculating a^2 * 2^b:

mul r0,a,a     // r0 = a * a
exp r1,b       // r1 = 2^b
mul r0,r0,r1   // r0 = a^2 * 2^b

-takes 2 cycles on NV35

exp r1,b       // r1 = 2^b
mul r0,a,a     // r0 = a * a
mul r0,r0,r1   // r0 = a^2 * 2^b

-takes 1 cycle on NV35

This is a trivial example, but it gets the point across. Clearly, there are real benefits to be had from simple, standard compiler optimizations that don't affect the output of the code at all. What kind of optimizations are we talking about here? Allow us to elaborate.

Aside from maximizing the parallelism of the hardware, instruction reordering can also help reduce register pressure by minimizing the live ranges of registers that hold independent data. Consider this:

mul r0,a,a     // r0 = a * a
mul r1,b,b     // r1 = b * b (r0 and r1 are now live at the same time)
st r0          // store a^2
st r1          // store b^2

If we reorder the instructions, we can use only one register without affecting the outcome of the code:

mul r0,a,a     // r0 = a * a
st r0          // store a^2; r0 is now free
mul r0,b,b     // r0 is reused for b * b
st r0          // store b^2
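
To make the register-pressure point concrete, here is a small Python sketch (our own illustration, not NVIDIA's allocator) that counts how many temporary registers are live at once for each ordering. Shader inputs such as a and b are ignored; only the temporaries r0 and r1 are tracked:

def max_live_registers(instructions):
    """instructions: list of (dest_register_or_None, registers_read).
    Returns the peak number of temporary registers live at any point."""
    last_use = {}
    for idx, (_dst, srcs) in enumerate(instructions):
        for r in srcs:
            last_use[r] = idx
    live, peak = set(), 0
    for idx, (dst, _srcs) in enumerate(instructions):
        if dst is not None:
            live.add(dst)                    # a value becomes live when written
        peak = max(peak, len(live))
        live = {r for r in live if last_use.get(r, -1) > idx}  # dead after last read
    return peak

# Original ordering: r0 and r1 are both live between the muls and the stores.
original = [("r0", []), ("r1", []), (None, ["r0"]), (None, ["r1"])]
# Reordered: r0 is stored (and dies) before it is reused for b * b.
reordered = [("r0", []), (None, ["r0"]), ("r0", []), (None, ["r0"])]

print(max_live_registers(original))   # 2
print(max_live_registers(reordered))  # 1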

Register allocation is a substantial part of compiler optimization, but special care must be taken to do it both correctly and quickly for this application. Compiler designers commonly choose from a variety of graph coloring heuristics. It appears NVIDIA is using an interference-graph style of register allocation and is allocating registers per component, though we are unclear on exactly what is meant by "component".
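
For reference, the interference-graph approach works roughly as follows: registers that are live at the same time "interfere" and must receive different hardware registers (colors). The greedy coloring below is a textbook sketch under our own assumptions, not NVIDIA's implementation:

def color_interference_graph(edges, registers):
    """Greedy coloring: give each virtual register the lowest color not
    already used by any of its colored neighbors."""
    neighbors = {r: set() for r in registers}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    coloring = {}
    for r in registers:          # a real allocator would order by degree or spill cost
        used = {coloring[n] for n in neighbors[r] if n in coloring}
        color = 0
        while color in used:
            color += 1
        coloring[r] = color
    return coloring

# v0 and v1 are live at the same time (they interfere); v2 is independent.
print(color_interference_graph([("v0", "v1")], ["v0", "v1", "v2"]))
# -> {'v0': 0, 'v1': 1, 'v2': 0}: two hardware registers suffice for three values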

Dead code elimination is a very common optimization; essentially, if the developer includes code that can never affect the program's output (it is never executed, or its results are never used), we can eliminate that code from the program. Such situations are often revealed while performing other optimizations on the code, but it's still a useful feature for the occasional time a developer falls asleep at the screen.
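
A minimal sketch of the idea, assuming a toy instruction format of (destination, sources) pairs and a known set of shader outputs; this is our illustration, not NVIDIA's compiler:

def eliminate_dead_code(instructions, outputs):
    """Keep only instructions whose results are (transitively) needed to
    produce the shader's outputs."""
    needed = set(outputs)
    kept = []
    for dst, srcs in reversed(instructions):   # walk backwards from the outputs
        if dst in needed:
            kept.append((dst, srcs))
            needed.update(srcs)                # the inputs of a live op are live too
    return list(reversed(kept))

program = [
    ("r0", ["a", "a"]),   # mul r0, a, a
    ("r1", ["b", "b"]),   # mul r1, b, b   <- result never used below
    ("r2", ["r0", "c"]),  # mul r2, r0, c
]
print(eliminate_dead_code(program, outputs={"r2"}))
# -> [('r0', ['a', 'a']), ('r2', ['r0', 'c'])]; the unused r1 computation is dropped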

There are a great many other optimizations that can be performed on code which have absolutely no effect on its outcome. This is a very important aspect of computing, and it only gets more complicated as computer technology gets more powerful. Intel's Itanium processors are prohibitively difficult to hand code, and no IA64-based processor runs code well unless the compiler that generated it was able to specifically tailor that code to the parallel nature of the hardware. We are seeing the same type of thing here with NVIDIA's architecture.

Of course, NVIDIA has the added challenge of implementing a real-time compiler, much like the Java JIT or Transmeta's code-morphing software. As such, there are other very interesting time-saving things they need to do with their compiler in order to reduce the impact of trying to adequately approximate the solution to an NP-complete problem in an extremely small amount of time.

A shader cache is implemented to store previously compiled shaders; this means that shaders shouldn't have to be compiled more than once. Directed Acyclic Graphs (DAGs) of the code are used to fingerprint compiled shaders. There is also a stock set of common, precompiled shaders that can be dropped in when NVIDIA detects what a developer is trying to accomplish. NVIDIA will need to take special care to make sure that this feature remains a feature and doesn't break anything, but we see it as a good thing as long as no one feels the power of the dark side.
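
A rough sketch of how such a fingerprinted cache could work: hash a canonical serialization of the shader's DAG and use the digest as the cache key. The DAG representation, the serialization, and the choice of SHA-1 here are our assumptions, not NVIDIA's actual scheme:

import hashlib

def fingerprint(node, memo=None):
    """node = (opcode, list of child nodes or operand strings); returns a stable hash."""
    if memo is None:
        memo = {}
    if id(node) in memo:
        return memo[id(node)]
    op, children = node
    parts = [op] + [c if isinstance(c, str) else fingerprint(c, memo) for c in children]
    digest = hashlib.sha1("|".join(parts).encode()).hexdigest()
    memo[id(node)] = digest
    return digest

shader_cache = {}

def compile_shader(dag, compile_fn):
    key = fingerprint(dag)
    if key not in shader_cache:          # only compile on a cache miss
        shader_cache[key] = compile_fn(dag)
    return shader_cache[key]

# a^2 * 2^b expressed as a DAG: mul(mul(a, a), exp(b))
dag = ("mul", [("mul", ["a", "a"]), ("exp", ["b"])])
print(compile_shader(dag, compile_fn=lambda d: "compiled:" + fingerprint(d)[:8]))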

Also, until the most recent couple of driver releases from NVIDIA, the real-time compiler didn't implement all of these important optimizations on shader code sent to the card by a game. The frame rate increases of beyond 50% with no image quality loss can be attributed to the enhancements NVIDIA has made to the real-time compiler. All of the performance we've previously seen rested on how well NVIDIA and developers were able to hand code shaders and graphics subroutines.

Of course, writing "good code" (code that suits the hardware it's written for) will help the compiler be more efficient as well. We certainly won't be seeing the end of NVIDIA sitting down at the table with developers to help them tailor their code to NV3x hardware, but this Unified Compiler technology will definitely help us see better results from everyone's efforts.
