Compilation Integration

In order to maximize performance, the NV3x pipeline needs to be as full as possible all the time. For this to happen, special care needs to be taken in how instructions are issued to the hardware. One aspect of this is that the architecture benefits from interleaved pairs of different types of instructions (for instance: issue two texture instructions, followed by two math instructions, followed by two texture instructions, etc.). This is in contrast to ATI's hardware, which prefers to see a large block of texture instructions followed by a large block of math instructions for optimal results.
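The two issue patterns described above can be sketched in a few lines of Python. This is only an illustration: the instruction strings are hypothetical, the function names are ours, and the sketch assumes every instruction is independent of the others, which a real scheduler cannot assume.

```python
from itertools import zip_longest


def chunks(seq, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def interleave_pairs(tex_ops, math_ops, group=2):
    """Alternate small groups of texture and math instructions.

    Assumption: all instructions are independent, so any order is legal.
    """
    ordered = []
    for tex_group, math_group in zip_longest(
        chunks(tex_ops, group), chunks(math_ops, group), fillvalue=[]
    ):
        ordered.extend(tex_group)
        ordered.extend(math_group)
    return ordered


def block_order(tex_ops, math_ops):
    """Issue every texture instruction first, then every math instruction."""
    return list(tex_ops) + list(math_ops)


# Hypothetical instruction names, used only to show the resulting order.
tex = ["tex t0", "tex t1", "tex t2", "tex t3"]
math = ["mul r0", "add r1", "mad r2", "mul r3"]

interleaved = interleave_pairs(tex, math)  # the pattern NV3x is said to prefer
blocked = block_order(tex, math)           # the pattern ATI is said to prefer
```

With the inputs above, the interleaved order starts with two texture instructions, then two math instructions, and repeats, while the blocked order keeps all four texture instructions together at the front.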

To illustrate NVIDIA's sensitivity to instruction order, the simplest example we can offer is calculating a^2 * 2^b:

mul r0,a,a
exp r1,b
mul r0,r0,r1

-takes 2 cycles on NV35

exp r1,b
mul r0,a,a
mul r0,r0,r1

-takes 1 cycle on NV35
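To confirm that the two orderings above compute the same value, here is a minimal Python sketch of an interpreter for just these two opcodes. It assumes `exp` means 2 raised to its operand (as the a^2 * 2^b formula implies) and it models only the result, not NV35 cycle counts; the names `run`, `slow_order`, and `fast_order` are ours.

```python
def run(program, inputs):
    """Execute a tiny mul/exp program and return the final value of r0."""
    regs = dict(inputs)
    for line in program:
        opcode, operands = line.split(None, 1)
        dst, *srcs = [name.strip() for name in operands.split(",")]
        if opcode == "mul":
            regs[dst] = regs[srcs[0]] * regs[srcs[1]]
        elif opcode == "exp":
            # Assumption: exp is a base-2 exponential, matching 2^b above.
            regs[dst] = 2.0 ** regs[srcs[0]]
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return regs["r0"]


slow_order = ["mul r0,a,a", "exp r1,b", "mul r0,r0,r1"]  # 2 cycles per the article
fast_order = ["exp r1,b", "mul r0,a,a", "mul r0,r0,r1"]  # 1 cycle per the article

inputs = {"a": 3.0, "b": 2.0}
result_slow = run(slow_order, inputs)
result_fast = run(fast_order, inputs)
```

Both orderings produce a^2 * 2^b for the same inputs, which is why the compiler is free to pick whichever one the hardware executes faster.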

This is a trivial example, but it does the job of getting the point across. Obviously, there are real benefits to be had from doing simple, standard compiler optimizations which don't affect the output of the code at all. What kind of optimizations are we talking about here? Allow us to elaborate.

Aside from instruction reordering to maximize the parallelism of the hardware, reordering can also help reduce register pressure if we minimize the live ranges of registers within independent data. Consider this:

mul r0,a,a
mul r1,b,b
st r0
st r1

If we reorder the instructions we can use only one register without affecting the outcome of the code:

mul r0,a,a
st r0
mul r0,b,b
st r0
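The effect of reordering on register pressure can be measured with a short Python sketch. It is a simplified model under our own assumptions: each computed result gets a hypothetical virtual name (`v0`, `v1`), a store is modeled as an instruction that only reads, and a value counts as live from its definition until its last use.

```python
def peak_live_values(program):
    """Return the largest number of computed values live at the same time.

    `program` is a list of (dst, srcs) pairs; dst is None for a store.
    Shader inputs such as a and b are not counted as temporaries.
    """
    first_def, last_use = {}, {}
    for index, (dst, srcs) in enumerate(program):
        for src in srcs:
            if src in first_def:
                last_use[src] = index
        if dst is not None:
            first_def.setdefault(dst, index)

    peak = 0
    for index in range(len(program)):
        # A value is live after instruction `index` if it has been defined
        # and its last use has not happened yet.
        live = sum(
            1
            for value, start in first_def.items()
            if start <= index < last_use.get(value, start)
        )
        peak = max(peak, live)
    return peak


# mul, mul, st, st: both results must be held at once.
original = [("v0", ["a", "a"]), ("v1", ["b", "b"]), (None, ["v0"]), (None, ["v1"])]
# mul, st, mul, st: each result is stored before the next one is computed.
reordered = [("v0", ["a", "a"]), (None, ["v0"]), ("v1", ["b", "b"]), (None, ["v1"])]

peak_original = peak_live_values(original)
peak_reordered = peak_live_values(reordered)
```

The original order needs two temporaries at its peak and the reordered one needs only one, which matches the single-register version shown above.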

Register allocation is a very hefty part of compiler optimization, but special care needs to be taken to do it correctly and quickly for this application. Commonly, a variety of graph coloring heuristics are available to compiler designers. It seems NVIDIA is using an interference graph style of register allocation, and is allocating registers per component, though we are unclear on what is meant by "component".
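As a rough illustration of interference-graph allocation in general, the Python sketch below builds a graph from live ranges and colors it greedily. The greedy, highest-degree-first heuristic and the example live ranges are our assumptions; nothing here is known to match the heuristic NVIDIA's compiler actually uses.

```python
def build_interference(live_ranges):
    """Connect every pair of values whose half-open live ranges overlap."""
    graph = {value: set() for value in live_ranges}
    values = list(live_ranges)
    for i, first in enumerate(values):
        for second in values[i + 1:]:
            start_a, end_a = live_ranges[first]
            start_b, end_b = live_ranges[second]
            if start_a < end_b and start_b < end_a:
                graph[first].add(second)
                graph[second].add(first)
    return graph


def greedy_color(graph):
    """Give each value the lowest register index unused by its neighbors.

    Values are visited highest-degree first, one common heuristic among many.
    """
    assignment = {}
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        taken = {assignment[n] for n in graph[node] if n in assignment}
        register = 0
        while register in taken:
            register += 1
        assignment[node] = register
    return assignment


# Hypothetical live ranges: (first instruction index, end index, exclusive).
live_ranges = {"v0": (0, 2), "v1": (1, 3), "v2": (2, 4), "v3": (3, 5)}

graph = build_interference(live_ranges)
assignment = greedy_color(graph)
registers_used = len(set(assignment.values()))
```

Here four values fit into two registers because no more than two of them are ever live together. Graph coloring in general is NP-complete, which is why real allocators rely on heuristics like this rather than exact solutions.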

Dead code elimination is a very common optimization; essentially, if the developer includes code that can never be executed, we can eliminate this code from the program. Such situations are often revealed when performing multiple optimizations on code, but it’s still a useful feature for the occasional time a developer falls asleep at the screen.
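Shader code of this kind is mostly straight-line, so the Python sketch below covers a closely related case rather than unreachable branches: it removes instructions whose results never contribute to an output. The instruction format and register names are hypothetical.

```python
def eliminate_dead_code(program, outputs):
    """Keep only the instructions whose results reach one of `outputs`.

    `program` is a list of (dst, srcs) pairs in straight-line order.
    """
    needed = set(outputs)
    kept = []
    # Walk backwards so we know what is needed before we see its definition.
    for dst, srcs in reversed(program):
        if dst in needed:
            needed.discard(dst)   # this instruction satisfies the need
            needed.update(srcs)   # and in turn needs its own operands
            kept.append((dst, srcs))
    kept.reverse()
    return kept


program = [
    ("r0", ["a", "a"]),
    ("r1", ["b", "b"]),   # r1 is never read again, so this is dead
    ("r2", ["r0", "c"]),
    ("out", ["r2"]),
]

optimized = eliminate_dead_code(program, outputs=["out"])
```

After the pass, the instruction that writes `r1` is gone and the remaining three instructions still produce the same `out`.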

There are a great many other optimizations that can be performed on code which have absolutely no effect on outcome. This is a very important aspect of computing, and only gets more complicated as computer technology gets more powerful. Intel's Itanium processors are prohibitive to hand coding, and no IA64 based processor would run code well unless the compiler that generated the code was able to specifically tailor that code to the parallel nature of the hardware. We are seeing the same type of thing here with NVIDIA's architecture.

Of course, NVIDIA has the added challenge of implementing a real-time compiler, much like the Java JIT or Transmeta's Code Morphing Software. As such, there are other very interesting time-saving things they need to do with their compiler in order to reduce the impact of trying to adequately approximate the solution to an NP-complete problem in an extremely small amount of time.

A shader cache is implemented to store previously compiled shaders; this means that shaders shouldn't have to be compiled more than once. Directed Acyclic Graphs (DAGs) of the code are used to fingerprint compiled shaders. There is also a stock set of common, precompiled shaders that can get dropped in when NVIDIA detects what a developer is trying to accomplish. NVIDIA will need to take special care to make sure that this feature remains a feature and doesn't break anything, but we see this as a good thing as long as no one feels the power of the dark side.
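One way to picture a fingerprinted shader cache is the Python sketch below. It is our own guess at the general idea, not NVIDIA's implementation: the shader is turned into a nested expression structure so that register names and the order of independent instructions drop out, that structure is hashed, and the hash is used as the cache key. The names `to_dag`, `fingerprint`, and `ShaderCache` are hypothetical.

```python
import hashlib


def to_dag(program, output):
    """Replace each register with the expression that defines it."""
    defs = {}
    for opcode, dst, srcs in program:
        defs[dst] = (opcode,) + tuple(defs.get(src, src) for src in srcs)
    return defs[output]


def fingerprint(program, output):
    """Hash the expression structure rather than the raw instruction text."""
    text = repr(to_dag(program, output))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ShaderCache:
    """Compile each distinct shader once and reuse the result afterwards."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._entries = {}
        self.compile_count = 0

    def get(self, program, output):
        key = fingerprint(program, output)
        if key not in self._entries:
            self.compile_count += 1
            self._entries[key] = self._compile(program)
        return self._entries[key]


# Same computation, different registers and a different instruction order.
shader_a = [("mul", "r0", ["a", "a"]), ("exp", "r1", ["b"]), ("mul", "r0", ["r0", "r1"])]
shader_b = [("exp", "r3", ["b"]), ("mul", "r2", ["a", "a"]), ("mul", "r2", ["r2", "r3"])]

# Stand-in for a real compiler: it just freezes the instruction list.
cache = ShaderCache(compile_fn=lambda program: tuple(program))
compiled_a = cache.get(shader_a, "r0")
compiled_b = cache.get(shader_b, "r2")
```

Because both shaders reduce to the same expression structure, the second lookup is a cache hit and the stand-in compiler runs only once.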

Also, until the most recent couple of driver releases from NVIDIA, the real-time compiler didn't implement all of these important optimizations on shader code sent to the card by a game. The frame rate increases of more than 50% with no image quality loss can be attributed to the enhancements NVIDIA has made to the real-time compiler. All of the performance we've previously seen has rested on how well NVIDIA and developers were able to hand code shaders and graphics subroutines.

Of course, writing "good code" (code that suits the hardware it’s written for) will help the compiler be more efficient as well. We certainly won't be seeing the end of NVIDIA sitting down at the table with developers to help them acclimate their code to NV3x hardware, but this Unified Compiler technology will definitely help us see better results from everyone's efforts.


114 Comments


  • Anonymous User - Thursday, October 23, 2003 - link

    #7, way to go! that's some hard numbers you got there. now that's what I call objective analysis! i agree with #12 and #14 as well.

    this review IMHO has been very subjective. even if they mentioned they would follow up with image quality reviews, it may be too late because "simple minded" individuals looking only for frame rate numbers may have already been influenced in their graphics card future buying decisions.

    ..and people, listen up! if you disagree with the site reviews then don't visit them and don't recommend them to friends. that way they get less hits and vendors leave them, and they die sooner or later. if we keep on visiting their sites even with good or bad intentions, they get hit counts and people see their rotating ads. that's how web site businesses operates now and a way for them to generate income. money talks in the hardware and technology business.

    let's not keep debating or bashing each other for this. haven't you noticed they are playing us all like fools. I pity the people who will believe these reviews without really evaluating them.

    so again, vote with your wallet! i'm no fanboy and i will always evaluate and buy the best product i see out there.

    tech reviewers please be responsible! you got big in the first place because the community supports you. you owe it to them. you start without ads at first, people come and now you see "hits" coming you see an oppurtunity to generate income then you confuse us with your AD BLOATED web sites and BIASED reviews.
  • Anonymous User - Thursday, October 23, 2003 - link

    I agree with everyone about the conclusion not making much sense. And what is the price before rebate? Ya know...the money I ACTUALLY HAVE TO HAND OVER? Rebates are sometimes(not all the time) a risky business. Who is honoring these rebates? Why wasnt this mentioed? Thats rather odd.

    If there is a rebate involved im assuming that the card must be $250. If this is the case then the 9700pro is the same price. So actually....If your going to shell out $250, just get a 9700pro and forget about the stupid rebate =)

  • Anonymous User - Thursday, October 23, 2003 - link

    This is a FACT. Its not an excuse...its just a fact that im putting out.

    To be 100% unbiased is NOT human.

    So the boys at anandtech tend to lean towards Nvidia alittle more then ATI...maybe 55-45, dont get made about it though...just take what they say with a grain of salt =)
  • Anonymous User - Thursday, October 23, 2003 - link

    i tought that tom pointed ati as the cheater
  • Anonymous User - Thursday, October 23, 2003 - link

    #24 You should try taking English 101...it might help you a bit :)
  • Anonymous User - Thursday, October 23, 2003 - link

    This IQ comparison better not be like the last one. If every other site gives me full uncompressed screenshots and shows me Nvidia filtering issues, anandtech better show me too if they dont want to lose respect
  • Anonymous User - Thursday, October 23, 2003 - link

    I like AnandTech, and respect their reviews. I agree the conclusion doesn't match the test results, but the conclusion is also subjective to actually using the device reviewed.

    Besides, HardOCP is way more biased than Anand or Tom's, it's just that their bias changes based on Kyle's latest whim.
  • Anonymous User - Thursday, October 23, 2003 - link

    If the IQ testing for this card is the same as the last IQ article, no thanks. Let's hope they put more effort into actually showing the obvious differences that were passed up last time.
  • Anonymous User - Thursday, October 23, 2003 - link

    ej guys 9or better ati fans) if derek made this review besides posting the numbers he probably saw the image quality. all you knows perfectly whose the best driver writer. and it's been so seen i got my first tnt. but anyways it is obvious that some games work better on ati's card somework better on nvidia. in xbitlabs forum yesterday one guy had posted after he upgrated his GF3 to radeon 9800 pro he started experiencing quality promblems. so don't bitch about the quality it all dipends on the test suit.
    it was the same situation with the reviews of fx-51 and pentium EE. every single review gontradict to another.
    btw those of you trade better start buying nvidia'a stocks.
  • Anonymous User - Thursday, October 23, 2003 - link

    #10, The ads rotate. I just saw two different ATI ads on the same page. Keep trying!
