Compilation Integration

To maximize performance, the NV3x pipeline needs to be as full as possible all the time. For this to happen, special care needs to be taken in how instructions are issued to the hardware. One aspect of this is that the architecture benefits from interleaved pairs of different types of instructions (for instance: issue two texture instructions, followed by two math instructions, followed by two texture instructions, etc.). This is in contrast to ATI's hardware, which prefers to see a large block of texture instructions followed by a large block of math instructions for optimal results.
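To make the interleaving idea concrete, here is a minimal sketch of our own, not NVIDIA's scheduler. It assumes every instruction is independent of every other (a real compiler would have to honor data dependencies), and the function name `interleave_pairs` and the instruction labels are purely illustrative.

```python
def interleave_pairs(tex_ops, math_ops, group=2):
    """Alternate groups of texture and math instructions.

    Assumes all instructions are mutually independent, so any order
    is legal; only the issue pattern changes.
    """
    out = []
    i = j = 0
    while i < len(tex_ops) or j < len(math_ops):
        out.extend(tex_ops[i:i + group])   # up to `group` texture ops
        i += group
        out.extend(math_ops[j:j + group])  # up to `group` math ops
        j += group
    return out


if __name__ == "__main__":
    tex = ["tex0", "tex1", "tex2", "tex3"]
    math_ops = ["m0", "m1", "m2"]
    print(interleave_pairs(tex, math_ops))
```

With `group=2` this yields the "two texture, two math" pattern described above; leftover instructions of either kind are simply appended in their original order.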

To illustrate NVIDIA's sensitivity to instruction order, the easiest example we can offer is calculating a^2 * 2^b:

mul r0,a,a
exp r1,b
mul r0,r0,r1

-takes 2 cycles on NV35

exp r1,b
mul r0,a,a
mul r0,r0,r1

-takes 1 cycle on NV35
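Why is the swap legal? The first two instructions neither read nor write each other's registers, so their order cannot change the result. The sketch below is a toy interpreter of our own, not NVIDIA's tooling, that checks this. It assumes `exp` means base-2 exponentiation, as the a^2 * 2^b formula implies, and it says nothing about cycle counts.

```python
def independent(x, y):
    """True if two instructions can be swapped without changing results."""
    _, x_dest, x_srcs = x
    _, y_dest, y_srcs = y
    return x_dest != y_dest and x_dest not in y_srcs and y_dest not in x_srcs


def run(program, inputs):
    """Execute (opcode, dest, sources) tuples over a register dictionary."""
    regs = dict(inputs)
    for op, dest, srcs in program:
        vals = [regs[s] for s in srcs]
        if op == "mul":
            regs[dest] = vals[0] * vals[1]
        elif op == "exp":  # assumed to be 2**x
            regs[dest] = 2.0 ** vals[0]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs


original = [("mul", "r0", ("a", "a")),
            ("exp", "r1", ("b",)),
            ("mul", "r0", ("r0", "r1"))]
reordered = [original[1], original[0], original[2]]

if __name__ == "__main__":
    inputs = {"a": 3.0, "b": 2.0}
    print(independent(original[0], original[1]))
    print(run(original, inputs)["r0"], run(reordered, inputs)["r0"])
```

Both orderings leave the same value in `r0`, which is the property a reordering pass has to preserve.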

This is a trivial example, but it does the job of getting the point across. Obviously, there are real benefits to be had from doing simple standard compiler optimizations which don't affect the output of the code at all. What kind of optimizations are we talking about here? Allow us to elaborate.

Aside from instruction reordering to maximize the parallelism of the hardware, reordering can also help reduce register pressure if we minimize the live ranges of registers that hold independent data. Consider this:

mul r0,a,a
mul r1,b,b
st r0
st r1

If we reorder the instructions we can use only one register without affecting the outcome of the code:

mul r0,a,a
st r0
mul r0,b,b
st r0
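One way to measure the difference between the two orderings is to count how many temporaries are live at once. The sketch below is our own illustration; the temporaries `t0` and `t1` are hypothetical virtual registers standing in for the values before physical registers are assigned.

```python
def peak_live_temps(program):
    """Return the largest number of temporaries live at the same time.

    `program` is a list of (opcode, dest_or_None, sources) tuples.
    A temporary is live from its definition until its last read.
    """
    last_use = {}
    for i, (_, _, srcs) in enumerate(program):
        for s in srcs:
            last_use[s] = i

    live = set()
    peak = 0
    for i, (_, dest, srcs) in enumerate(program):
        if dest is not None:
            live.add(dest)
        peak = max(peak, len(live))
        for s in srcs:
            if last_use[s] == i:
                live.discard(s)  # no later reads, so the value is dead
    return peak


compute_then_store = [("mul", "t0", ("a", "a")),
                      ("mul", "t1", ("b", "b")),
                      ("st", None, ("t0",)),
                      ("st", None, ("t1",))]
store_as_you_go = [("mul", "t0", ("a", "a")),
                   ("st", None, ("t0",)),
                   ("mul", "t1", ("b", "b")),
                   ("st", None, ("t1",))]

if __name__ == "__main__":
    print(peak_live_temps(compute_then_store), peak_live_temps(store_as_you_go))
```

The first ordering needs two registers at its peak and the second needs only one, which is why both values can share `r0`.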

Register allocation is a very hefty part of compiler optimization, but special care needs to be taken to do it correctly and quickly for this application. Commonly, a variety of graph coloring heuristics are available to compiler designers. It seems NVIDIA is using an interference graph style of register allocation, and is allocating registers per component, though we are unclear on what is meant by "component".
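For readers unfamiliar with the approach, here is a minimal sketch of interference-graph allocation with a greedy coloring heuristic. We do not know which heuristic NVIDIA actually uses; the live ranges, temporary names, and function names below are all made up for illustration.

```python
def build_interference(ranges):
    """Connect every pair of temporaries whose live ranges overlap.

    `ranges` maps a name to (definition_index, last_use_index).
    """
    graph = {name: set() for name in ranges}
    names = list(ranges)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (s1, e1), (s2, e2) = ranges[a], ranges[b]
            if s1 < e2 and s2 < e1:  # ranges overlap
                graph[a].add(b)
                graph[b].add(a)
    return graph


def greedy_color(graph):
    """Give each node the lowest register number unused by its neighbors."""
    assignment = {}
    # Visit the most constrained nodes first (a common heuristic).
    for node in sorted(graph, key=lambda n: len(graph[n]), reverse=True):
        used = {assignment[n] for n in graph[node] if n in assignment}
        reg = 0
        while reg in used:
            reg += 1
        assignment[node] = reg
    return assignment


if __name__ == "__main__":
    ranges = {"t0": (0, 2), "t1": (1, 3), "t2": (3, 4)}
    print(greedy_color(build_interference(ranges)))
```

Temporaries that interfere must get different registers, while `t2` can reuse a register because its range starts after the others have ended. Optimal coloring is NP-complete in general, which is why practical allocators rely on heuristics like this one.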

Dead code elimination is a very common optimization; essentially, if the developer includes code that can never be executed, we can eliminate this code from the program. Such situations are often revealed when performing multiple optimizations on code, but it’s still a useful feature for the occasional time a developer falls asleep at the screen.
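Compilers usually file a closely related cleanup under the same name: removing instructions whose results are never read. The sketch below shows that variant, since it applies even to straight-line shader code with no branches. It is our own illustration, and the tuple format and register names are assumptions.

```python
def eliminate_dead_code(program, outputs):
    """Drop instructions whose results never reach an output.

    `program` is a list of (opcode, dest, sources) tuples and
    `outputs` names the registers the caller actually needs.
    """
    live = set(outputs)
    kept = []
    # Walk backwards so each instruction sees what later code needs.
    for op, dest, srcs in reversed(program):
        if dest in live:
            live.discard(dest)  # this instruction supplies the value
            live.update(srcs)   # so its inputs become needed
            kept.append((op, dest, srcs))
    kept.reverse()
    return kept


if __name__ == "__main__":
    program = [("mul", "r0", ("a", "a")),
               ("mul", "r1", ("b", "b")),   # r1 is never read
               ("mul", "r2", ("r0", "r0"))]
    for instruction in eliminate_dead_code(program, {"r2"}):
        print(instruction)
```

Only the two instructions that feed `r2` survive; the multiply into `r1` is dropped because nothing reads it.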

There are a great many other optimizations that can be performed on code which have absolutely no effect on outcome. This is a very important aspect of computing, and only gets more complicated as computer technology gets more powerful. Intel's Itanium processors are prohibitive to hand coding, and no IA64 based processor would run code well unless the compiler that generated the code was able to specifically tailor that code to the parallel nature of the hardware. We are seeing the same type of thing here with NVIDIA's architecture.

Of course, NVIDIA has the added challenge of implementing a real-time compiler much like the Java JIT, or Transmeta's code morphing software. As such, there are other very interesting time-saving things they need to do with their compiler in order to reduce the impact of trying to adequately approximate the solution to an NP-complete problem in an extremely small amount of time.

A shader cache is implemented to store previously compiled shaders; this means that shaders shouldn't have to be compiled more than once. Directed Acyclic Graphs (DAGs) of the code are used to fingerprint compiled shaders. There is also a stock set of common, precompiled shaders that can get dropped in when NVIDIA detects what a developer is trying to accomplish. NVIDIA will need to take special care to make sure that this feature remains a feature and doesn't break anything, but we see this as a good thing as long as no one feels the power of the dark side.
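A rough sketch of how such a cache could work follows. NVIDIA fingerprints a DAG of the shader; as a much simpler stand-in, this version renames temporaries in order of first appearance and hashes the result, so two shaders that differ only in register naming share one entry. The class, the function names, and the rule that temporaries start with "r" are all our assumptions.

```python
import hashlib


def fingerprint(program):
    """Hash a shader after renaming temporaries in order of appearance.

    `program` is a list of (opcode, dest, sources) tuples. Registers
    starting with "r" are treated as temporaries (an assumption).
    """
    rename = {}
    lines = []
    for op, dest, srcs in program:
        canon = []
        for reg in (dest, *srcs):
            if reg.startswith("r"):
                rename.setdefault(reg, f"t{len(rename)}")
                canon.append(rename[reg])
            else:
                canon.append(reg)
        lines.append(op + " " + ",".join(canon))
    return hashlib.sha256("\n".join(lines).encode()).hexdigest()


class ShaderCache:
    """Compile each distinct shader once and reuse the result."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._store = {}
        self.compilations = 0

    def get(self, program):
        key = fingerprint(program)
        if key not in self._store:
            self._store[key] = self._compile(program)
            self.compilations += 1
        return self._store[key]


if __name__ == "__main__":
    cache = ShaderCache(compile_fn=tuple)  # placeholder "compiler"
    cache.get([("mul", "r0", ("a", "a")), ("exp", "r1", ("b",))])
    cache.get([("mul", "r5", ("a", "a")), ("exp", "r7", ("b",))])
    print(cache.compilations)
```

The second lookup hits the cache because the two shaders are identical once the register names are normalized. A DAG-based fingerprint could go further and also match shaders whose independent instructions appear in a different order.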

Also, until the most recent couple of driver releases from NVIDIA, the real-time compiler didn't implement all of these important optimizations on shader code sent to the card by a game. The frame rate increases of beyond 50% with no image quality loss can be attributed to the enhancements of the real-time compiler NVIDIA has implemented. All of the performance we've previously seen has rested on how well NVIDIA and developers were able to hand code shaders and graphics subroutines.

Of course, writing "good code" (code that suits the hardware it’s written for) will help the compiler be more efficient as well. We certainly won't be seeing the end of NVIDIA sitting down at the table with developers to help them acclimate their code to NV3x hardware, but this Unified Compiler technology will definitely help us see better results from everyone's efforts.


114 Comments


  • XPgeek - Tuesday, October 28, 2003 - link

    Today I purchased this eVGA GF FX 5700 Ultra. i have no complaints of image quality. i am using the 52.16 betas, and Battlefiled 1942 and its XPacks run great, as do the rest of my games. The only issue i have is its length. in my case, the power connector nestles right up to one of my hard drives. but it does fit. barely.

    To re-itterate, this is a very nice card. no, i havent tested a 9600Pro / XT myself, but o well. no i dont work for AT or any other reviewing site. and no im not biased. i actually went to Best Buy to get a 9600 Pro, but saw the 5700U instead. so i wont get HL2 for free. o well, i'll just buy it when it comes out.
  • Anonymous User - Monday, October 27, 2003 - link

    you misspelled comparing 110, doh! rofl you sux!
  • Anonymous User - Monday, October 27, 2003 - link

    106, if you read the review and don't get the impression that it's a rushed and shoddy job, well then you're just not a particularly smart or insightful person. which is ok, no one said you had to be. again, i'm camparing this to the old AT from 2,3,4 years ago. read some of the older reviews, and you'll see what i mean. or maybe you won't, whatever.
  • Anonymous User - Monday, October 27, 2003 - link

    that'd be earth 106. and you? thanks 108.
  • Anonymous User - Monday, October 27, 2003 - link

    #104 you mispelled the word fuck.
  • Anonymous User - Monday, October 27, 2003 - link

    ...nvidia sucks.
  • Anonymous User - Monday, October 27, 2003 - link

    #104, you're officially an idiot. AT didn't spend "much time"? What planet are you living on.
  • Anonymous User - Monday, October 27, 2003 - link

    Firingsquad has a decent image quality article up today. You can draw your own conclusion from the screen shots.
  • Anonymous User - Monday, October 27, 2003 - link

    why does anandtech use these anonymous forums? it just encourages all of this nonsense. wtf are you yelling at eachother fanboy-this and fanboy-that? grow the fuk up.

    that said, i think anyone who has been a fan of AT (like myself) must be concerned with the recent nature of the graphics card reviews. i'm an owner of both nvidia and ati cards, and am too damn old to be a fanboy (maybe i'm a fanman). ATs recent reviews have been rubbish. I understand about trying to get info out in a timely fashion, but these reviews read like they were written the night before they were due (so to speak). i mean, if i were grading these as college papers or something, AT would get a D at best. i'm mostly comparing this to previous AT work, not other websites. i'm still an AT fan, i'm not goin anywhere.

    for some reason, the problems seem to be with the graphic card reviews more than anythng else. maybe because this is the most competetive market, and they have to pump it out ASAP.. it just feels like they're not giving much time to their reviews.

    the posters that have done the metrics on the review seem to have the right idea. specifically, it looks most like a tie to me, with 5700ultra being best in opengl situations, and 9600xt being best in other situations (ok, maybe that's not a tie :)
    the "TKO" conclusion certainly is baffling.
  • Anonymous User - Sunday, October 26, 2003 - link

    Stop acting like a fanboy #102, you look stupider by the second. Oh, and I'd like to see you try to keep my mouth shut. Ahhh, too bad, the little geek has no control over the situation. lol
