Compilation Integration

To maximize performance, the NV3x pipeline needs to be kept as full as possible at all times, which means special care must be taken in how instructions are issued to the hardware. One aspect of this is that the architecture benefits from interleaved pairs of different instruction types (for instance: two texture instructions, followed by two math instructions, followed by two more texture instructions, and so on). This is in contrast to ATI's hardware, which prefers to see a large block of texture instructions followed by a large block of math instructions for optimal results.
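
To make the idea concrete, here is a minimal Python sketch (our own illustration, not NVIDIA's code) of a pass that emits instructions in alternating pairs by type; a real scheduler could only reorder instructions whose data dependencies allow it.

from collections import deque

def interleave_pairs(instructions):
    # instructions: list of (kind, text) tuples, kind is 'tex' or 'math'.
    # Dependencies are ignored here for brevity.
    queues = {'tex': deque(), 'math': deque()}
    for kind, text in instructions:
        queues[kind].append(text)

    out, current = [], 'tex'
    while queues['tex'] or queues['math']:
        q = queues[current]
        for _ in range(2):            # issue up to two of the same kind
            if q:
                out.append(q.popleft())
        current = 'math' if current == 'tex' else 'tex'
    return out

shader = [('tex', 'tex r0, t0'), ('tex', 'tex r1, t1'),
          ('math', 'mul r2, r0, c0'), ('math', 'mad r3, r1, c1, r2'),
          ('tex', 'tex r4, t2'), ('math', 'add r5, r2, r3')]
print('\n'.join(interleave_pairs(shader)))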

To illustrate the NV3x's sensitivity to instruction order, consider the simple example of calculating a^2 * 2^b:

mul r0,a,a    ; r0 = a^2
exp r1,b      ; r1 = 2^b
mul r0,r0,r1  ; r0 = a^2 * 2^b

-takes 2 cycles on NV35

exp r1,b      ; r1 = 2^b
mul r0,a,a    ; r0 = a^2 (independent of the exp above)
mul r0,r0,r1  ; r0 = a^2 * 2^b

-takes 1 cycle on NV35

This is a trivial example, but it gets the point across. Obviously, there are real benefits to be had from performing simple, standard compiler optimizations that don't affect the output of the code at all. What kinds of optimizations are we talking about here? Allow us to elaborate.

Aside from instruction reordering to maximize the parallelism of the hardware, reordering can also reduce register pressure by minimizing the live ranges of registers that hold independent data. Consider this:

mul r0,a,a    ; r0 = a^2
mul r1,b,b    ; r1 = b^2 (both results are now live at once)
st r0
st r1

If we reorder the instructions, we can use only one register without affecting the outcome of the code:

mul r0,a,a    ; r0 = a^2
st r0         ; r0 is dead after the store...
mul r0,b,b    ; ...so it can be reused to hold b^2
st r0
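
Minimizing live ranges is exactly what makes the second version possible. Here is a small Python sketch (again our own illustration) that measures peak register pressure from each value's definition point and last use; the reordered sequence peaks at one live value instead of two.

def peak_pressure(defs, last_uses):
    # defs[v] = instruction index where value v is defined;
    # last_uses[v] = index of the last instruction that reads v.
    events = []
    for v in defs:
        events.append((defs[v], 1))            # v becomes live
        events.append((last_uses[v] + 1, -1))  # v dies after its last use
    live = peak = 0
    for _, delta in sorted(events):
        live += delta
        peak = max(peak, live)
    return peak

# Original order (mul, mul, st, st): both products live at once.
print(peak_pressure({'a2': 0, 'b2': 1}, {'a2': 2, 'b2': 3}))  # -> 2
# Reordered (mul, st, mul, st): only one product live at a time.
print(peak_pressure({'a2': 0, 'b2': 2}, {'a2': 1, 'b2': 3}))  # -> 1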

Register allocation is a substantial part of compiler optimization, but special care needs to be taken to do it both correctly and quickly for this application. Commonly, a variety of graph coloring heuristics are available to compiler designers. It seems NVIDIA is using an interference-graph style of register allocation and is allocating registers per component; NVIDIA didn't spell out what is meant by "component," though it presumably refers to the individual x, y, z, and w channels of the four-wide vector registers.
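
For the curious, the following is a toy Python example of coloring an interference graph; it illustrates the general technique the term implies, not NVIDIA's implementation. Values that are live at the same time "interfere" and must land in different registers.

def color_registers(interference):
    # interference: dict mapping each value to the set of values that
    # are live at the same time and so need a different register.
    assignment = {}
    # A common heuristic: color the most-constrained values first.
    for v in sorted(interference, key=lambda v: -len(interference[v])):
        taken = {assignment[n] for n in interference[v] if n in assignment}
        reg = 0
        while reg in taken:   # pick the lowest register no neighbor uses
            reg += 1
        assignment[v] = reg
    return assignment

# From the first ordering above: a^2 and b^2 are live simultaneously.
print(color_registers({'a2': {'b2'}, 'b2': {'a2'}}))
# -> {'a2': 0, 'b2': 1}: two interfering values, two registers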

Dead code elimination is a very common optimization: if the developer includes code that can never be executed, or whose results are never used, the compiler can remove it from the program entirely. Such situations are often revealed while performing other optimizations on the code, but it's still a useful feature for the occasional time a developer falls asleep at the screen.
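
A sketch of the idea (ours, and heavily simplified): walk the program backwards, keeping only instructions whose results feed something that is actually needed.

def eliminate_dead_code(code, outputs):
    # code: list of (dest, sources) tuples in program order;
    # outputs: the registers the shader must actually write.
    needed, kept = set(outputs), []
    for dest, sources in reversed(code):
        if dest in needed:            # this result is consumed somewhere
            kept.append((dest, sources))
            needed.discard(dest)
            needed.update(sources)    # its inputs are now needed too
    return list(reversed(kept))

code = [('r0', ['a', 'a']),    # r0 = a * a
        ('r1', ['b', 'b']),    # r1 = b * b  (never used: dead)
        ('r2', ['r0', 'c0'])]  # r2 = r0 * c0
print(eliminate_dead_code(code, ['r2']))  # the r1 line is dropped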

There are a great many other optimizations that can be performed on code with absolutely no effect on its outcome. This is a very important aspect of computing, and it only gets more complicated as computer technology gets more powerful. Intel's Itanium processors are prohibitively difficult to hand-code, and no IA-64 based processor will run code well unless the compiler that generated the code was able to specifically tailor it to the parallel nature of the hardware. We are seeing the same type of thing here with NVIDIA's architecture.

Of course, NVIDIA has the added challenge of implementing a real-time compiler, much like the Java JIT or Transmeta's code morphing software. As such, there are other very interesting time-saving things they need to do with their compiler in order to reduce the cost of adequately approximating the solution to an NP-complete problem in an extremely small amount of time.
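
One obvious time-saver is simply to bound how long the compiler is allowed to think. The sketch below is purely our speculation about the shape of such a mechanism, not anything NVIDIA has described:

import time

def optimize_with_budget(shader, passes, budget_ms=1.0):
    # Run optimization passes until the time budget expires; whatever
    # has been achieved by then is what the hardware gets.
    deadline = time.perf_counter() + budget_ms / 1000.0
    for optimization_pass in passes:
        if time.perf_counter() >= deadline:
            break
        shader = optimization_pass(shader)
    return shader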

A shader cache is implemented to store previously compiled shaders; this means that no shader should have to be compiled more than once. Directed Acyclic Graphs (DAGs) of the code are used to fingerprint compiled shaders. There is also a stock set of common, precompiled shaders that can be dropped in when NVIDIA detects what a developer is trying to accomplish. NVIDIA will need to take special care to make sure that this feature remains a feature and doesn't break anything, but we see this as a good thing as long as no one feels the power of the dark side.
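
The caching half of this is straightforward to sketch. Below is a hypothetical Python illustration of DAG fingerprinting plus a compile cache; the hashing scheme is our invention, chosen only so that structurally identical shaders map to the same key.

import hashlib

_cache = {}

def fingerprint(node):
    # node: (op, *child_nodes). Hash the DAG bottom-up so that two
    # shaders with identical structure get identical fingerprints.
    op, *children = node
    digest = hashlib.sha1(op.encode())
    for child in children:
        digest.update(fingerprint(child).encode())
    return digest.hexdigest()

def compile_shader(dag, compile_fn):
    key = fingerprint(dag)
    if key not in _cache:         # first sighting: compile and remember
        _cache[key] = compile_fn(dag)
    return _cache[key]            # later sightings are free cache hits

# a^2 * 2^b from the first example, as a tiny expression DAG:
dag = ('mul', ('mul', ('a',), ('a',)), ('exp', ('b',)))
compiled = compile_shader(dag, compile_fn=lambda d: f'<code for {d}>')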

Also, until the most recent couple of driver releases from NVIDIA, the real-time compiler didn't implement all of these important optimizations on shader code sent to the card by a game. The frame rate increases of more than 50% with no image quality loss can be attributed to these enhancements to NVIDIA's real-time compiler. All of the performance we'd previously seen rested on how well NVIDIA and developers were able to hand-code shaders and graphics subroutines.

Of course, writing "good code" (code that suits the hardware it's written for) will help the compiler be more efficient as well. We certainly won't be seeing the end of NVIDIA sitting down at the table with developers to help them tailor their code to NV3x hardware, but this Unified Compiler technology will definitely help us see better results from everyone's efforts.

Comments

  • Anonymous User - Thursday, October 23, 2003 - link

    All I can say is good-bye and good riddance to Anandtech. HardOCP cleaned up their act. It's time for a house cleaning here.

  • Anonymous User - Thursday, October 23, 2003 - link

    Alrighty, #7 I really hope that you don't trust those numbers you posted and would discount anything of that nature as pure BS after seeing the graphs.

(e.g. some of the differences in the performance of all the cards were less than 10 percent on many of the DX8 tests, and your percentages are incredibly ludicrous... I dunno, maybe your ATI renders funny graphs)

The point is that nVidia has finally released a card that is competitive and in some cases superior to other technology. Derek isn't saying "nVidia wins", but more like "nVidia is finally starting to come around." The Final Word comments are strictly based on his experience. Do you have one of these cards? Will your opinion of the IQ differ from mine?

I think the greatest and truest comment is the one posted in all of the latest graphics card articles: "Wait until your game comes out to buy a new card". It seems like all the lemmings out there are so anxious to throw away their money to have the fastest thing on the market that they seriously get their feelings hurt by the prospect that what they want may not be the best. As an nVidiot myself, I am glad to see the GeForce line of cards starting to come around, and admit that the Radeon 9700 Pro is definitely the greatest piece of hardware created since the GeForce 3 quite a few years ago. I am glad both companies are staying competitive, but will always root for my favorite team.

    Make no mistake, fuzzy math is about as logical as strategery. And some of you guys really need a life.

    -The Ways
  • Anonymous User - Thursday, October 23, 2003 - link

    #49, i hate to tell you but basically all optimizations and new filtering methods only apply to FX cards
  • Anonymous User - Thursday, October 23, 2003 - link

    The message is clear--
    Oh, wait, it's not. I personally like nVidia's products, and am leery to jump to ATI, because I've heard lots of horror stories about the Cats completely screwing up a system. Yeah, I know that the Dets are supposed to be "Cheatenators" if you listen to fanATIcs, but I haven't had any complaints about my gameplaying using my overclocked Ti4200 in Halo, UT2k3, or Max Payne 2...
    I'm glad to see nVidia pushing out a decent DX9 midrange card, but I'm not glad to see it not taking the performance crown and almost LOSING to a card that's a full generation behind it in API support!! Not to mention that the 5700 will be obsolete and pretty much bargain basement by the time the games that matter in DX9 come out, like HL2 and Doom3...
So, the message is NOT clear. The winner remains to be seen, because this review is not finished, nor is the 5700 the last card nVidia's ever going to make. When we see NV40 and R420, then we can talk about the message being clear. Until then, it's all very fuzzy and dependent on which company you trust more... Well, that and how cheap the card is. :P
  • Anonymous User - Thursday, October 23, 2003 - link

    Did anyone else notice in the Nvidia PDF that the Det 50's offer AMD64 support? This sounds to me like it can work in a 64bit operating system. Am I wrong about this? It says it on page 19 of the PDF.
  • Anonymous User - Thursday, October 23, 2003 - link

    #45 it would be simple. when you click on an image to compare, just make the images pop-up in a little box, kind of like this comments box.

    also, maybe instead of making a conclusion at the end of the review, it might have been better to say "to be continued" or something like that.

    i kind of agree with what others are saying, how can you recommend something if you have only run half the tests so far...? seems like the conclusion came a little premature...

    what happens if ati comes out on top in the 2nd round of iq tests?

will the recommendation get flip-flopped?
  • sandorski - Thursday, October 23, 2003 - link

It's nice to see Nvidia competing on performance again. However, these visual anomalies and jerkiness give pause.

    Re Final Word: It seems rather odd that such statements would be made after the first part of a 2 part review, especially when the first part brings up some potentially serious issues that the second part will examine further.
  • Anonymous User - Thursday, October 23, 2003 - link

    #43

    uhm... tell me how Derek can do that? when the screen real estate is obviously taken over by those funky ads ;) No way! sponsors first! they paid for that space. We just have to learn to squint. those graphs look colorful though, i might add.

even if the game stutters when i play or i see artifacts i will sure be reminded by those graphs and continue to be inspired. wooohoooooooooooo!
  • Anonymous User - Thursday, October 23, 2003 - link

    The 5900 non-ultra at $220 looks like a better deal.

    http://www.newegg.com/app/ViewProduct.asp?descript...
  • Anonymous User - Thursday, October 23, 2003 - link

    I don't think it's right to make any recommendation unless IQ and Framerates are taken into consideration. And not the little bitty screenshots that I had to squint at from the last review.

    Derek, it would help if you made the screencaps larger, and made them animated so that the differences could be seen. For instance, someone put the images you used for that F1 Racer sim in a gif. Looking at the images side by side for the 51.xx drivers, the 45.xx drivers, and the ATI 3.7 drivers, I couldn't see a difference.

However, once the animated graphic was made, it was EXTREMELY apparent that the 3.7 and 45.xx drivers were heads above the 51.xx drivers. Yet in your conclusion you said that there were no palpable differences between the graphics.

    I think what people are trying to say is that you guys can do better than this, and we expect that from you. I know I certainly do.

    Regards,

    Long time AT reader
