NVIDIA does Cg

Before we get into the next topic of discussion, let's have a quick lesson in the benefits of a high level programming language vs. hardware-centric assembly code. Remember that assembly language is the human-readable form of the instructions a particular processor executes; whether it is a GeForce4 GPU or a Pentium 4, each of those processors operates on its own architecture-specific assembly code. When you compile a program written in a high level programming language like C++, the compiler is merely translating your C++ code into assembly code, which is then fed to the processor in binary form. Before high level programming languages became prevalent on the PC, almost all coding was done by hand in assembly.

In order to illustrate how much more tedious writing in assembly can be, let's take a simple operation such as adding two integers together and storing the result in a location in memory. In a high level programming language (e.g. C, C++, Java, etc…) the process goes like this:

int result = 2 + 2;

The syntax obviously varies from one language to the next, but in that one line we defined an integer variable, stored it in memory and gave it the value of 2 + 2. Now let's do the same in assembly; again, this is a very general example and is not specific to any particular architecture:

ADD 2,2,R1
STORE R1,RESULT
RESULT: x133B

Once again, the syntax will vary from one architecture to the next, but the basic idea remains the same. The first line adds the two numbers and stores the result in register R1. The second line stores the contents of R1 at the memory address pointed to by the label RESULT. The third line tells the assembler to point the label RESULT at the appropriate memory address. Which one looks simpler to you?

Keep in mind that we're dealing with a relatively simple example here; once you start dealing with branches, loops and especially more complicated forms of memory addressing and allocation, assembly quickly becomes tedious.
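As a rough illustration of that point, here is a loop that sums an array, sketched in C++. The function name sumValues and its parameters are made up for this example. The comments note, in general terms, the bookkeeping that the compiler takes care of and that you would otherwise write out by hand in assembly.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example: add up the first `count` integers in `values`.
int sumValues(const int* values, std::size_t count) {
    // A null pointer is only acceptable when there is nothing to read.
    assert(values != nullptr || count == 0);

    int total = 0;  // in assembly: pick a register and clear it yourself

    // In assembly this single line becomes a counter register, a compare,
    // and a conditional branch back to a label at the top of the loop.
    for (std::size_t i = 0; i < count; ++i) {
        // In assembly: compute the address of element i, load it into a
        // register, then add it to the running total.
        total += values[i];
    }

    return total;  // in assembly: move the result to wherever the caller expects it
}
```

Calling sumValues on an array holding 1, 2, 3 and 4 with a count of 4 returns 10. The high level version stays the same no matter which processor it is compiled for, while each commented step would have to be rewritten for every architecture.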

You also have to be relatively familiar with the particular architecture you're coding for when using assembly as the opcodes and instruction formats do vary from one architecture to the next. This is both a pro and a con since it gives the programmer the opportunity to highly optimize their code for execution on a particular architecture but at the same time it makes their code virtually useless on any other platform.

Although NVIDIA always talked about how easy implementing DX8 pixel and vertex shader programs would be, they didn't really play up the fact that all of the coding was still done by hand in assembly. In order for more developers to actually take advantage of the shader capabilities of their next-generation GPUs, NVIDIA would have to offer a higher level language for them to write code in. A good compiler can generate code that is very close in performance to hand written assembly (and sometimes even faster); even more importantly, a compiler can target multiple architectures and platforms, which makes reusing code much more programmer-friendly.

With all of that said, it wasn't a surprise that NVIDIA launched their high level programming language, 'Cg', a few days ago. The name stands for 'C for graphics', and accordingly the language is very C-like but with an obvious skew towards writing shader programs. The syntax is nearly identical to that of Microsoft's own high level graphics programming language, called D3Dx, but the main difference between the two efforts is in NVIDIA's compiler development.

Here's an example of the reduction in code when going from raw assembly to Cg for the Phong Shader program used in the picture above:

Assembly Code for a Phong Shader

...
RSQR R0.x, R0.x;
MULR R0.xyz, R0.xxxx, R4.xyzz;
MOVR R5.xyz, -R0.xyzz;
MOVR R3.xyz, -R3.xyzz;
DP3R R3.x, R0.xyzz, R3.xyzz;
SLTR R4.x, R3.x, {0.000000}.x;
ADDR R3.x, {1.000000}.x, -R4.x;
MULR R3.xyz, R3.xxxx, R5.xyzz;
MULR R0.xyz, R0.xyzz, R4.xxxx;
ADDR R0.xyz, R0.xyzz, R3.xyzz;
DP3R R1.x, R0.xyzz, R1.xyzz;
MAXR R1.x, {0.000000}.x, R1.x;
LG2R R1.x, R1.x;
MULR R1.x, {10.000000}.x, R1.x;
EX2R R1.x, R1.x;
MOVR R1.xyz, R1.xxxx;
MULR R1.xyz, {0.900000, 0.800000, 1.000000}.xyzz, R1.xyzz;
DP3R R0.x, R0.xyzz, R2.xyzz;
MAXR R0.x, {0.000000}.x, R0.x;
MOVR R0.xyz, R0.xxxx;
ADDR R0.xyz, {0.100000, 0.100000, 0.100000}.xyzz, R0.xyzz;
MULR R0.xyz, {1.000000, 0.800000, 0.800000}.xyzz, R0.xyzz;
ADDR R1.xyz, R0.xyzz, R1.xyzz;
...

Cg Shader for same Phong Shader

...
COLOR cSpec = pow(max(0, dot(Nf, H)), phongExp).xxx;
COLOR cPlastic = Cd * (cAmbi + cDiff) + Cs * cSpec;
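To make the mapping between the two listings a little more concrete, here is a rough C++ sketch of what the two Cg lines compute. The constants are read off the assembly excerpt above (an exponent of 10, an ambient term of 0.1 and the two color triples); matching them to Cd, Cs, cAmbi and phongExp is an inference from the listing, not something NVIDIA spells out. The Vec3 type, the splat helper, the function name phongPlastic and the light direction parameter L are all hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Minimal 3-component vector; a stand-in for Cg's built-in vector and COLOR types.
struct Vec3 {
    float x, y, z;
};

// Component-wise add and multiply, which Cg provides out of the box.
Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(Vec3 a, Vec3 b) { return {a.x * b.x, a.y * b.y, a.z * b.z}; }

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Replicate a scalar across all three channels, like the .xxx swizzle in Cg.
Vec3 splat(float s) { return {s, s, s}; }

// Nf: surface normal, H: half-angle vector, L: direction to the light.
// L is an assumed name; the Cg excerpt does not show how cDiff is computed.
// All three vectors are expected to be unit length.
Vec3 phongPlastic(Vec3 Nf, Vec3 H, Vec3 L, float phongExp = 10.0f) {
    assert(phongExp > 0.0f);  // pow(0, e) is only well behaved for e > 0

    // Constants as they appear in the assembly excerpt (assumed mapping).
    const Vec3 Cd = {1.0f, 0.8f, 0.8f};     // diffuse color
    const Vec3 Cs = {0.9f, 0.8f, 1.0f};     // specular color
    const Vec3 cAmbi = {0.1f, 0.1f, 0.1f};  // ambient term

    // Diffuse term: DP3R followed by MAXR against 0 in the assembly.
    Vec3 cDiff = splat(std::max(0.0f, dot(Nf, L)));

    // Specular term: the assembly spells pow() out as LG2R, MULR, EX2R.
    Vec3 cSpec = splat(std::pow(std::max(0.0f, dot(Nf, H)), phongExp));

    return Cd * (cAmbi + cDiff) + Cs * cSpec;
}
```

Note that the LG2R, MULR by 10 and EX2R sequence in the assembly is just pow(x, 10) written as 2^(10 * log2(x)), which is the kind of detail a high level language hides. With the normal, half-angle vector and light direction all set to (0, 0, 1), this sketch should return roughly (2.0, 1.68, 1.88) before any clamping to the displayable range.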

Microsoft's compiler will obviously only compile for Direct3D, while NVIDIA's will be able to compile for both Direct3D and OpenGL. NVIDIA will also open-source the majority of the compiler, with the exception of the backend that contains hardware specific optimizations for their GPUs. NVIDIA's compiler will receive regular updates (at least once for every major GPU release) and it will support all currently available competitor GPUs; however, NVIDIA will be encouraging their competitors to develop their own compilers for Cg.

Currently Cg is offered in a beta state, but it will go gold in the fall alongside the release of NV30. NVIDIA claims that although the current version isn't faster than raw assembly code, the fall release will be much more optimized and faster than hand coded assembly.
