Let's talk Compilers...

Creating the perfect compiler is one of the more difficult problems in computing. Compiler optimization and instruction scheduling are NP-complete problems (like the traveling salesman problem), so we can't "solve" them outright. Compounding the issue, the best compiled code comes from a compiler that is written specifically for a certain processor and knows it inside and out. If we use a standard compiler to produce generic x86 code, our program will run much slower than if we tell our compiler we have a P4 with SSE2 and all the goodies that go along with it. I know this all seems pretty obvious, but allow me to illustrate a little.

Since I've always been interested in 3D graphics, back in 1998 I decided to write a 3D engine with a friend of mine for a project in our C++ class. It only did software rendering, but we implemented a software z-buffer and did back-face culling with flat shading. Back then, my dad had a top-of-the-line PII 300, and I acquired an AMD K6 200. Using a regular Borland C++ compiler with no real optimizations turned on, our little software 3D engine ran faster on my K6 than it did on my dad's PII. Honestly, I have no idea why that happened. But the point is that the standard output of the compiler ran faster on my slower platform while both systems were producing the same output. Now, if I had had a compiler from Intel optimized for the PII that knew what it was doing (or if I had hand coded the program in assembly for the PII), my code could have run insanely faster on my dad's box.

So, there are some really important points here. Intel and AMD processors were built around the same ISA (Instruction Set Architecture) and had a great deal in common back in 1998. Yet performance varied in favor of the underpowered machine in my test. When you look at ATI and NVIDIA, their GPUs are completely and totally different. Sure, they both have to be able to run OpenGL and DirectX 9, but this just means they are able to map OGL or DX9 function calls (via their drivers) to specific hardware routines (or even multiple hardware operations if necessary). It just so happens that Microsoft's default HLSL compiler generates code that runs faster on ATI's hardware than on NVIDIA's.

The solution NVIDIA has is to sit down with developers and help hand-code things to run better on their hardware. Obviously this is an inelegant solution, and it has caused quite a few problems (*cough* Valve *cough*). NVIDIA's goal is to eliminate this extended development effort via its compiler technology.

Obviously, if NVIDIA starts "optimizing" their compiler to the point where their hardware is doing things not intended by the developer, we have a problem. I think it's very necessary to keep an eye on this, but it's helpful to remember that such things are not advantageous to NVIDIA. Over at Beyond3D, there is a comparison of the different compiler options (DX9 HLSL and NVIDIA's Cg) for NVIDIA's shaders.

We didn't have time to delve into comparisons with the reference rasterizer for this article, but our visual inspections confirm Beyond3D's findings. Since going from the game code to the screen is what this is all about, as long as image quality remains pristine, we think using the Cg compiler makes perfect sense. It is important to note that the Cg compiler doesn't improve performance (except for a marginal gain with AA enabled), but it does a great deal for image quality over the 45.xx Detonators.

Comments Locked

117 Comments


  • Anonymous User - Tuesday, October 7, 2003 - link

    How very balanced of you #30.

    Let us be patient; Anand is asking questions on OUR behalf in order to REVEAL truth.

    I'm focused on the questions and the answers. Where is your focus?
  • AgaBooga - Tuesday, October 7, 2003 - link

    #33, that's what came to my mind as soon as I read this article. I think that Anand may have just provided some input, done testing, or just edited it slightly...
  • Anonymous User - Tuesday, October 7, 2003 - link

    The IQ shots are not the best I could imagine.

    Some of them are cropped out so that you can't see a lot of details: UT2003, Aquamark3, Wolfenstein.

    Some of them are set up so that you wouldn't get any possible artifacts with texture filtering, because of the high camera angle: Warcraft3, C&C Generals.

    The Tomb Raider, Aquamark and Wolf screenshots are also too dark to notice anything. And I don't see any sign of a DX9 shader in either the Halo or the TR shots, so we have no idea of DX9 image quality.

    But kudos for all the testing you've done, must have been a lot of hard work.
  • Anonymous User - Tuesday, October 7, 2003 - link

    #30, ATI has not released performance drivers for a long time now, and they have already said not to hold your breath on those performance increases coming in the 3.8s either. Since the 3.1s the main focus seems to have been bug fixes, with slight performance improvements in various games. 3.8 = more features and bug fixes, with probably slight performance improvements here and there in specific games.
  • Anonymous User - Tuesday, October 7, 2003 - link

    Derek probably wrote the whole article while Anand was behind him cracking his whip. So I dunno about this "supposed" two authors!
  • Anonymous User - Tuesday, October 7, 2003 - link

    Would all the fanboys please take a deep breath or troll elsewhere? I swear to god some of you people will go out of your way to look for bias where there isn't any.

    I own a 9800 Pro and I for one am glad that it seems like Nvidia has closed the gap considerably, their customers deserve it.
  • Anonymous User - Tuesday, October 7, 2003 - link

    Great review, I love the IQ shots. I too am waiting to see the 9600xt review though.
  • AgaBooga - Tuesday, October 7, 2003 - link

    To those of you who mentioned Anand a few times, you should also note this was written by two authors, or at least worked on together by two authors, so you should try to understand that you may get different "types" of responses and analyses of similar results if they're done by different people. I think we should wait for the 3.8 Cat. article before we jump to too many conclusions.
  • PKIte - Tuesday, October 7, 2003 - link

    This is the way I take screen shots in final fantasy XI benchmark 2.

    - Use Hypersnap-dx
    - Enable directx capture in Hypersnap
    - Change Hypersnap “Quick Save” settings to repeat capture every 5 seconds
    - Launch Final Fantasy XI benchmark 2 menu
    - After you click the “START” button, press “Print Screen” once the resolution changes.

    Wow this is the biggest video card review I have ever read: Awesome!!
  • Anonymous User - Tuesday, October 7, 2003 - link

    > Right now NVIDIA is at a disadvantage; ATI's hardware is much easier to code for and the performance on Microsoft's HLSL compiler clearly favors the R3x0 over the NV3x

    Ever heard of the ps2_a compiler target?
