Let's talk Compilers...

Creating the perfect compiler is one of the more difficult problems in computing. Compiler optimization and instruction scheduling are NP-complete problems (the search space explodes combinatorially, much like chess), so we can't "solve" them outright. Compounding the issue, the best compiled code comes from a compiler written specifically for a certain processor, one that knows it inside and out. If we use a standard compiler to produce generic x86 code, our program runs much slower than if we tell the compiler we have a P4 with SSE2 and all the goodies that go along with it. I know this all seems pretty obvious, but allow me to illustrate a little.
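To make that concrete, here is a minimal sketch (not from anything benchmarked in this article) of the kind of loop where target-specific compilation pays off. The file name and flags are purely illustrative: a compiler told to target generic x86 has to stick to plain scalar floating point, while one told it has a P4 with SSE2 is free to use those instructions, though whether it actually does depends on the compiler and its version.

```cpp
// saxpy.cpp -- a loop where knowing the target CPU matters (illustrative only).
//
// Generic x86 build:     g++ -O2 saxpy.cpp
// P4-with-SSE2 build:    g++ -O2 -march=pentium4 -msse2 saxpy.cpp
// (MSVC's rough equivalent of the second line is: cl /O2 /arch:SSE2 saxpy.cpp)
#include <cstdio>

const int N = 1 << 20;
static float x[N], y[N];

int main() {
    for (int i = 0; i < N; ++i) {
        x[i] = i * 0.5f;
        y[i] = i * 0.25f;
    }

    // The hot loop: with SSE2 available, the compiler may process four floats
    // per instruction; restricted to baseline x86, it must emit scalar code.
    for (int i = 0; i < N; ++i)
        y[i] = 2.0f * x[i] + y[i];

    // Print something so the loop isn't optimized away entirely.
    printf("%f\n", y[N - 1]);
    return 0;
}
```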

Since I've always been interested in 3D graphics, back in 1998 I decided to write a 3D engine with a friend of mine for a project in our C++ class. It only did software rendering, but we implemented a software z-buffer and did back-face culling with flat shading. Back then, my dad had a top-of-the-line PII 300, and I had acquired an AMD K6 200. Using a regular Borland C++ compiler with no real optimizations turned on, our little software 3D engine ran faster on my K6 than it did on my dad's PII. Honestly, I have no idea why that happened. But the point is that the compiler's standard output ran faster on my slower platform even though both systems produced the same results. Now, if I had had a compiler from Intel that was optimized for the PII and knew what it was doing (or if I had hand-coded the program in assembly for the PII), my code could have run insanely faster on my dad's box.
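For the curious, back-face culling is one of the cheapest wins in a software renderer: a triangle whose normal points away from the viewer can be skipped before any shading or z-buffer work. Below is a generic sketch with made-up types (Vec3, Triangle), not code from the engine described above; it assumes front faces are wound counter-clockwise.

```cpp
// Minimal back-face test for a software renderer (hypothetical types and names).
struct Vec3 { float x, y, z; };

Vec3  sub(const Vec3& a, const Vec3& b)   { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
float dot(const Vec3& a, const Vec3& b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3  cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

struct Triangle { Vec3 v0, v1, v2; };

// Returns true if the triangle faces away from the camera at 'eye' and can be culled.
// Assumes counter-clockwise winding for front faces.
bool isBackFace(const Triangle& t, const Vec3& eye) {
    Vec3 normal = cross(sub(t.v1, t.v0), sub(t.v2, t.v0));
    Vec3 toTri  = sub(t.v0, eye);          // vector from the eye to the triangle
    return dot(normal, toTri) >= 0.0f;     // normal points away from the viewer
}
```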

So, there are some really important points here. Intel and AMD processors were built around the same ISA (Instruction Set Architecture) and had a great deal in common back in 1998. Yet performance varied in favor of the underpowered machine in my test. When you look at ATI and NVIDIA, their GPUs are completely and totally different. Sure, they both have to be able to run OpenGL and DirectX 9, but this just means their drivers are able to map OGL or DX9 function calls to specific hardware routines (or even multiple hardware operations if necessary). It just so happens that the default Microsoft HLSL compiler generates code that runs faster on ATI's hardware than on NVIDIA's.
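As a toy illustration of that last point (this is not real driver code, and all of the names are invented), the same API-level call can turn into very different hardware work depending on whose driver sits underneath it:

```cpp
// Toy model of drivers mapping one API call to vendor-specific hardware routines.
#include <cstdio>

struct GpuDriver {
    virtual void drawTriangles(int count) = 0;   // the "API call" the game sees
    virtual ~GpuDriver() {}
};

struct VendorADriver : GpuDriver {
    void drawTriangles(int count) override {
        // Hypothetically, this hardware handles the call as a single native draw.
        printf("vendor A: one native draw of %d triangles\n", count);
    }
};

struct VendorBDriver : GpuDriver {
    void drawTriangles(int count) override {
        // Hypothetically, this hardware needs a state flush plus two smaller batches.
        printf("vendor B: flush pipeline state\n");
        printf("vendor B: draw %d triangles\n", count / 2);
        printf("vendor B: draw %d triangles\n", count - count / 2);
    }
};

int main() {
    VendorADriver a;
    VendorBDriver b;
    GpuDriver* drivers[] = { &a, &b };
    for (GpuDriver* d : drivers)
        d->drawTriangles(1000);                  // same call, different hardware paths
    return 0;
}
```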

NVIDIA's solution has been to sit down with developers and help hand-code things to run better on its hardware. Obviously, this is an inelegant solution, and it has caused quite a few problems (*cough* Valve *cough*). NVIDIA's goal is to eliminate this extended development effort via its compiler technology.

Obviously, if NVIDIA starts "optimizing" its compiler to the point where the hardware is doing things the developer never intended, we have a problem. I think it's very necessary to keep an eye on this, but it's helpful to remember that such things are not advantageous to NVIDIA. Over at Beyond3D, there is a comparison of the different compiler options (DX9 HLSL and NV Cg) for NVIDIA's shaders.

We didn't have time to delve into comparisons with the reference rasterizer for this article, but our visual inspections confirm Beyond3D's findings. Since going from the game code to the screen is what this is all about, as long as image quality remains pristine, we think using the Cg compiler makes perfect sense. It is important to know that the Cg compiler doesn't improve performance (except for a marginal gain while using AA), but it does improve image quality a great deal over the 45.xx Detonators.

Comments (117)

  • Anonymous User - Tuesday, October 7, 2003 - link

    #41 "[...] who butters your bread???"

    That's an interesting question; I suspect he does, though my question is "who wants to know?" ;)

    In regard to your other question, "Why can't we have a true winner now?": as for myself, I'm going to give Derek and Anand the benefit of the doubt.

    It seems to me that they realize that NVIDIA attempted to do something unique with its 5000 series, in that it does not exactly hold to the DirectX 9 spec. For instance, it has 16-bit and 32-bit precision modes while DX9 requires 24-bit - which ATI does (refer to the Half-Life 2 and DOOM 3 reviews). In the shader area, NVIDIA can hold FAR more code (micro-ops) than ATI - also, if you check back to Anand's original posts on the ATI and NVIDIA shootout(s), where there is a comparison of AA and AF, NVIDIA was a CLEAR winner. I seem to recall a while ago that NVIDIA claimed ATI didn't do TRUE AF and was therefore CHEATING. Boy, did that one come back around with teeth, huh?

    What I'm saying is NVIDIA tried to one-up ATI by trying to do more; unfortunately, it seems they tried to do TOO much and ended up pulling SHADY maneuvers like the whole Futuremark mess. They should have instead focused on the DX9 spec and the Microsoft pixel/vertex shader code path, and not tried to pull a Glide like 3dfx (excuse the parsed English).

    So, hopefully NVIDIA learns from its mistakes, modifies its silicon to meet the spec, and gives us all BETTER cards to choose from come March/April.



    As far as the authors are concerned, Anand and Derek seem to be attempting JUSTICE (helping the party who needs the most help, and treating all parties equally) - which in this case seems to be NVIDIA. The authors are helping NVIDIA by dropping HEAVY hints like the ones you quoted:
    " Next year will be the year of DX9 titles, and it will be under the next generation of games that we will finally be able to crown a true DX9 winner. Until then, anyone's guess is fair game." and
    " If NVIDIA can continue to extract the kinds of performance gains from unoptimized DX9 code as they have done with the 52.14 drivers (without sacrificing image quality), they will be well on their way to taking the performance crown back from ATI by the time NV40 and R400 drop.".
    If NVIDIA takes heed of these CONSTRUCTIVE statements, then the entire gaming community could benefit - better prices and higher quality, which usually benefits the customer (AMD vs. Intel sound familiar?).


    So, let us be easy and enjoy the night. Time will tell.

    Cheers,
    aka #37


    PS: Derek, please excuse me for leaving out your name before. The article was well written.
  • Anonymous User - Tuesday, October 7, 2003 - link

    Regarding my previous post #44, I wanted to write:

    ...the difference **between AA/AF and noAA/AF** is very noticeable in the game...
  • Jeff7181 - Tuesday, October 7, 2003 - link

    Can you say "highly programmable GPU?" I can =)
  • Anonymous User - Tuesday, October 7, 2003 - link

    Why didn't you guys wait for Catalyst 3.8? It's out tomorrow and is reported to fix many IQ problems in games like NWN. What would a couple of days have hurt, especially since this article is going to be irrelevant after the Cat drivers are released tomorrow?
  • Anonymous User - Tuesday, October 7, 2003 - link

    Note: the AA/AF and noAA/AF images of Warcraft3 have been mixed up for the NV52.14.

    It says a lot about the quality of the screenshots that it takes careful inspection to find this error. I have played a lot of War3 recently, and the difference is very noticeable in-game, even with this GF4.
  • Anonymous User - Tuesday, October 7, 2003 - link

    #18 It's not a problem figuring out the graphs; it's just weird that he would choose that type of graph and exclude FPS.

    BTW I own a 5900U and a 9700pro.

    I don't like people avoiding PS2.0 tests. My 5900 sucks at them. I paid too much for what I got in the 5900. I try to get a good bang for the buck, and the 5900 is not one.
  • Anonymous User - Tuesday, October 7, 2003 - link

    ...
  • DerekWilson - Tuesday, October 7, 2003 - link

    First off... Thanks Pete ;-) ...

    Secondly, Anand and I both put a great deal of work into this article, and I am very glad to see the responses it has generated.

    Many of the image quality issues from part 1 were due to rendering problems that couldn't be captured in a screenshot (like jerkiness in X2 and F1), or a lack of AA. For some of the tests, we just didn't do AA performance benchmarks if one driver or the other didn't do what it was supposed to. There were no apples-to-anything-other-than-apples tests in this review. The largest stretch was X2, where the screen was jerky and the AA was subpar. But we definitely noted that.

    TRAOD isn't a very high quality game, and it certainly isn't the only DX9 (with PS2.0) test on the list. Yes, ATI beat NV in that bench. But it's also true that ATI won most of the other benchmarks as well.

    Anyway, thanks again for the feedback, sorry BF1942 couldn't make it in, and we'll bring back a flight sim game as soon as we tweak it out.

    J Derek Wilson
  • Anonymous User - Tuesday, October 7, 2003 - link

    Didn't Gabe Newell complain about screen capture "issues" with the NVIDIA 50.xx drivers that show better image quality in screenshots than what actually shows up in-game?

    Anand spoke about image quality problems in almost every test in part 1, but I see almost nothing wrong with the screencaps in part 2.

    Can you verify this Anand?
  • Anonymous User - Tuesday, October 7, 2003 - link

    No difference in IQ, huh? Am I the only person to notice an IQ difference between the AA+8xAF pics of Aquamark3?

    http://images.anandtech.com/reviews/video/roundups...

    http://images.anandtech.com/reviews/video/roundups...

    It's funny how Anand and Derek did not comment on this. Maybe they missed it because they based their comparison on those tiny images. Ah, so that's what full-sized images are needed for?!
