Pixel Shader Performance Tests

ShaderMark v2.0 is a program designed to stress test the shader performance of modern DX9 graphics hardware using Shader Model 2.0 programs written in HLSL, rendered on a couple of shapes in a scene.
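
For those unfamiliar with the kinds of programs ShaderMark runs, the sketch below is a minimal Shader Model 2.0 diffuse lighting pixel shader in HLSL. It is our own illustration (the shader and all of its names are hypothetical, not taken from ShaderMark), but it is representative of the simpler end of what gets benchmarked here.

```hlsl
// Hypothetical example for illustration only (not taken from ShaderMark):
// a minimal Shader Model 2.0 diffuse lighting pixel shader, compilable
// with the ps_2_0 profile.
sampler2D diffuseMap : register(s0);   // base texture bound by the application

float4 DiffusePS(float2 uv       : TEXCOORD0,
                 float3 normal   : TEXCOORD1,   // interpolated surface normal
                 float3 lightDir : TEXCOORD2    // interpolated vector to the light
                 ) : COLOR0
{
    float3 n = normalize(normal);
    float3 l = normalize(lightDir);
    float  nDotL = saturate(dot(n, l));         // Lambertian diffuse term
    return tex2D(diffuseMap, uv) * nDotL;
}
```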

We haven't used ShaderMark in the past because we don't advocate trying to predict the performance of real-world game code using a synthetic set of tests designed to push the hardware. Honestly, as we've said before, the only way to determine the performance of a certain program on specific hardware is to run that program on that hardware. As both software and hardware get more complex, the results of any given test become less and less generalizable, and games, graphics hardware, and modern computer systems are some of the most complex entities on earth.

So why are we using ShaderMark, you may ask? There are a couple of reasons. First, this is only a ballpark test. ATI and NVIDIA both have architectures that should be able to push a lot of shader operations through. It is a fact that NV3x had a bit of a handicap when it came to shader performance. A cursory glance at ShaderMark should tell us whether that handicap carries over to the current generation of cards, and whether or not R420 and NV40 are on a level playing field. We don't want to make a direct comparison; we just want to get a feel for the situation. With that in mind, here are the benchmarks.

 

ShaderMark v2.0 results by shader number (frames per second; a dash indicates no result for that card)

Shader   Radeon X800 XT PE   Radeon X800 Pro   GeForce 6800 Ultra   GeForce 6800 GT   GeForce FX 5950 U
2        310                 217               355                  314               65
3        244                 170               213                  188               43
4        238                 165               -                    -                 -
5        211                 146               162                  143               34
6        244                 169               211                  187               43
7        277                 160               205                  182               36
8        176                 121               -                    -                 -
9        157                 107               124                  110               20
10       352                 249               448                  410               72
11       291                 206               276                  248               54
12       220                 153               188                  167               34
13       134                 89                133                  118               20
14       140                 106               141                  129               29
15       195                 134               145                  128               29
16       163                 113               149                  133               27
17       18                  13                15                   13                3
18       159                 111               99                   89                17
19       49                  34                -                    -                 -
20       78                  56                -                    -                 -
21       85                  61                -                    -                 -
22       47                  33                -                    -                 -
23       49                  43                49                   46                -

These benchmarks were run with fp32 on NVIDIA hardware and fp24 on ATI hardware. It isn't really an apples-to-apples comparison, but with some of the shaders used in ShaderMark, partial precision floating point causes error accumulation (since this is a benchmark designed to stress shader performance, this is not surprising).
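
To make the precision issue concrete, here is a rough sketch (again, our own hypothetical code, not something pulled from ShaderMark) of how partial precision is requested in HLSL and where its error comes from: declaring values as half lets the compiler emit _pp-tagged fp16 instructions, and every operation in a dependent chain rounds its intermediate result, so the error compounds as the chain gets longer.

```hlsl
// Illustrative sketch only (not taken from ShaderMark). Declaring values as
// 'half' allows the compiler to emit _pp (fp16) instructions on hardware that
// honors partial precision; each operation in the dependent chain below
// rounds to fp16, so the error compounds with every pass.
sampler2D lightMap : register(s0);

float4 AccumPS(float2 uv : TEXCOORD0) : COLOR0
{
    half3 accum   = 0;
    half  falloff = 0.9;

    // ps_2_0 has no flow control; the compiler unrolls this loop.
    for (int i = 0; i < 8; i++)
    {
        half3 term = tex2D(lightMap, uv + float2(i * 0.01, 0)).rgb;
        accum = accum * falloff + term;   // fp16 multiply-add, rounded each iteration
    }
    return float4(accum, 1.0);
}
```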

ShaderMark v2.0 clearly shows a huge increase in pixel shader performance from NV38 to either flavor of NV40. Even though the results can't really be compared apples to apples (because of the difference in precision), NVIDIA manages to keep up with the ATI hardware fairly well. In fact, the diffuse lighting and environment mapping, shadowed bump mapping, and water color shaders don't show ATI wiping the floor with NVIDIA.

Looking at data collected with version 60.72 of the NVIDIA driver, no frame rates changed, and a visual inspection of the images output by each driver raised no red flags.

We would like to stress again that these are not apples-to-apples numbers, but the relative performance of each GPU indicates that the ATI and NVIDIA architectures are very close to comparable from a pixel shader standpoint (with each architecture favoring different types of shaders and operations).

In addition to getting a rough idea of performance, we can also look deeper into the heart of NV40 and see what happens, in terms of performance gains, when we enable partial precision rendering mode. As we have stated before, there were a few image quality issues with the types of shaders ShaderMark runs, but this bit of analysis will stick only to how much work gets done in the same amount of time, without regard to the relative quality of the work.

ShaderMark v2.0, partial precision (PP) vs. full precision, by shader number (frames per second)

Shader   GeForce 6800 U PP   GeForce 6800 GT PP   GeForce 6800 U   GeForce 6800 GT
2        413                 369                  355              314
3        320                 283                  213              188
5        250                 221                  162              143
6        300                 268                  211              187
7        285                 255                  205              182
9        159                 142                  124              110
10       432                 389                  448              410
11       288                 259                  276              248
12       258                 225                  188              167
13       175                 150                  133              118
14       167                 150                  141              129
15       195                 173                  145              128
16       180                 161                  149              133
17       21                  19                   15               13
18       155                 139                  99               89
23       49                  46                   49               46

The most obvious thing to notice is that, overall, partial precision mode rendering increases shader rendering speed. Shaders 2 through 8 are lighting shaders (with 2 being a simple diffuse lighting shader). These lighting shaders (especially the point and spot light shaders) make heavy use of vector normalization. As we are running in partial precision mode, this should translate to a partial precision normalize, which is a "free" operation on NV40. Almost any time a partial precision normalize is needed, NV40 is able to schedule the instruction immediately. This is not the case when dealing with full precision normalization, so the many ~50% performance gains coming out of those lighting shaders are probably due to the partial precision normalization hardware built into each shader unit in NV40.

The smaller performance gains (which, interestingly, occur on the shaders that have image quality issues) are most likely the result of decreased bandwidth requirements and decreased register pressure: a single internal fp32 register can hold two fp16 values, making scheduling and resource management much less of a task for the hardware.
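
As a sketch of what this looks like from the shader writer's side (a hypothetical lighting shader of our own, not one of ShaderMark's), using half vectors lets the compiler emit partial precision normalizes, which NV40 can hand off to the fp16 normalization hardware in each shader unit; the same shader written with float3 vectors has to run its normalizes through the general shader math units.

```hlsl
// Hypothetical bump-mapped lighting sketch (not one of ShaderMark's shaders).
// With 'half' vectors, the compiler can emit partial precision normalizes
// (nrm_pp), which NV40 services with its dedicated fp16 normalization
// hardware; change the half3 declarations to float3 and the same normalizes
// compete for the general-purpose shader math units instead.
sampler2D diffuseMap : register(s0);
sampler2D normalMap  : register(s1);

float4 LightPS(float2 uv       : TEXCOORD0,
               float3 lightVec : TEXCOORD1,   // per-pixel vector to the light
               float3 halfVec  : TEXCOORD2    // half-angle vector for specular
               ) : COLOR0
{
    half3 n = normalize((half3)(tex2D(normalMap, uv).rgb * 2 - 1));
    half3 l = normalize((half3)lightVec);
    half3 h = normalize((half3)halfVec);

    half diffuse  = saturate(dot(n, l));
    half specular = pow(saturate(dot(n, h)), 32);

    return tex2D(diffuseMap, uv) * diffuse + specular;
}
```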

As we work on our image quality analysis of NV40 and R420, we will be paying close attention to shader performance in both full and partial precision modes (since we want to look at what gamers will actually see in the real world). We will likely bring ShaderMark back for those tests as well. This is a new benchmark for us, so please bear with us as we get used to its ins and outs.

Comments

  • ZobarStyl - Tuesday, May 4, 2004 - link

    Jibbo, I thought that the dynamic branching capability that is part of PS3.0 could make rendering a scene faster because it skips rendering unnecessary pixels, and thus could offer an increase in performance, albeit a small one. In an interview, one of the developers of Far Cry said that there weren't many more things that PS3.0 could do that 2.0 can't, but that 3.0 can do things in a single pass that a 2.0 shader would have to do in multiple passes. The way he described it, the really pretty effects can come in later, but a streamlined (read: slightly faster) shader could very well improve NV40 scores as is. This seems kind of analogous to the whole 64-bit processor ordeal going on; Intel says you don't need it, but then most articles show higher scores from A64 chips when they are in a 64-bit OS, so basically if you streamline it, you can run a little bit faster than in less efficient 32-bit.

    In the end, it'll still be bitter fanboys fighting it out and buying whatever product their respective corporation feeds them, despite features or speeds or price or whatever. Personally, like I said before, I'll wait and see who really ends up earning my dollar.

    Anyway, thanks for keeping me on my toes though, jib...I can't get lazy now... =)
  • Barkuti - Tuesday, May 4, 2004 - link

    From my point of view, the 6800U is superior high end hardware. Folks, you don't need to be that intelligent to understand that if ATI needs 520 MHz to "beat" nVidia's 400 MHz chip, it will need to overclock proportionally to keep the same level of performance, which means it will need a good bunch of extra MHz to stay at least on par on the overclocking front.

    I think the final revision of the 6800U will manage overclocks of around 500 MHz (probably more if they deliberately set the initial clock low waiting for ATI), so ATI's hardware may need around 650 MHz, which I doubt it'll manage. As for the power requirements, sure, ATI is the winner, but nVidia's card can be fed with more standard PSUs than they claim; I just think they played it safe.
    Oh, sure, power may be a limiting factor when oc'ing the 6800U, but the reality is that people who buy this kind of hardware already have top end computer components (including the PSU), so no worries here either.

    And finally, I think PS 3.0 will make some additional difference. With the possibility of somewhat enhancing shader performance, plus the superior displacement mapping effect, it may give it the edge in at least a handful of games. We'll see.

    "Just my 2 cents"
    Cheers
  • Staples - Tuesday, May 4, 2004 - link

    Everyone be sure to check out Tom's review. Looks like the X800 did better here than it did against the 6800. I have seen other reviews and the X800 doesn't really seem as fast in comparison as it does here.

    Anyway, it is a lot faster than I thought. The 6800 was impressive, but it seems that the reason it does really well in some games and not so great in others is that some games have NVIDIA-specific code that the 6800 takes advantage of very well.
  • UlricT - Tuesday, May 4, 2004 - link

    wtf? the GT is outperforming the Ultra in F1 Challenge?
  • jibbo - Tuesday, May 4, 2004 - link

    Agree with you all the way on the fanboys, ZobarStyl.

    Just wanted to point out that PS3.0 is not "faster" - it's simply an API. It allows longer and more complex shaders so, if anything, it's likely to be "slower." I'm guessing that designers who use PS3.0 heavily will see serious fill-rate problems on the 6800. These shaders will have potentially 65k+ instructions with dynamic branching, a minimum of 4 render targets, a 32-bit FP minimum color format, etc. - I seriously doubt any hardcore 3.0 shader programs will run faster than existing 2.0 shaders.

    Clearly a developer can have much nicer quality and exotic effects if he/she exploits these, but how many gamers will have a PS3.0 card that will run these extremely complex shaders at high resolutions and AA/AF without crawling to single-digit fps? It's my guess that it will be *at least* a year until games show serious quality differentiation between PS2.0 and PS3.0. But I have been wrong in the past...
  • T8000 - Tuesday, May 4, 2004 - link

    I think it is strange that the tested X800 XT is clocked at 520 MHz, while the 6800U, which is manufactured by the same Taiwanese company and also has 16 pipelines, is set at 400 MHz.

    This suggests a lot of headroom on the 6800U or a large overclock on the X800XT.

    Also note that the 6800U scored much better on tomshardware.com (HALO: 65 FPS @ 1600x1200), but that could also be caused by their use of a 3.2 GHz P4 instead of a 2.2 GHz A64.
  • ZobarStyl - Tuesday, May 4, 2004 - link

    I love seeing these fanboys announce each product as the best thing ever (the same thing happened with the Prescott: Intel fanboys called it the end of AMD, and the AMD guys laughed and called it a flamethrower) without actually reading the benches. NV won some, ATi won some. Most of the time it was tiny margins either way. Fanboys aside, this is gonna be a driver war, nothing more. The biggest margin was on Far Cry, and I'm personally waiting on the faster PS3.0 path to see what that bench really looks like. This is a great card, but price drops and driver updates will eventually show us the real victor.
  • jibbo - Tuesday, May 4, 2004 - link

    If I had to guess, DX10 and Longhorn will coincide with the release of new hardware from everyone.
  • Akaz1976 - Tuesday, May 4, 2004 - link

    Just thought of something. If I am reading the AT review right, ATi has now milked the original Radeon 9700 architecture for nearly 2 years (which sure says a lot of good things about the ArtX design team).

    Anyone know when the true next gen chip can be expected?

    Akaz
  • Ilmater - Tuesday, May 4, 2004 - link

    ---------------------------------------
    Hearing about the 6850 and the other Emergency-Extreme-Whatever 6800 variants that are floating about irritates me greatly. Nvidia, you are losing your way!

    Instead of spending all that time, effort and $$ just to try to take the "speed champ" title, make your shit that much cheaper instead! If your 6800 Ultra was $425 instead of $500, that would give you a hell of a lot more market share and $$ than a stupid Emergency Edition of your top end cards... We laugh at Intel for doing it, and now you're doing it too, come fricking on...
    --------------------------------------------
    This is ridiculous!! What do you think the XT Platinum Edition from ATI is? The only difference is that nVidia released first, so it's more obvious when they do it than when ATI does. I'm not really a fanboy of either, but you shouldn't dog nVidia for something that everyone does.

    Plus, if nVidia dropped their prices, ATI would do the same thing. Then nVidia would be right back where it was before, but they wouldn't be making any money on the cards.
