The Performance Impact of Vertex Shaders

In the 'Dronez rolling demo', enabling vertex shaders doubles or triples GeForce3's triangle load (Figure 8). Triangle loads with vertex shaders enabled and disabled average 15300 and 7800 triangles per frame respectively. It must be emphasized that the degree of tessellation and the quality of animation are similar in either mode, so added geometric detail does not account for the difference. The more likely explanation for the divergent triangle counts is that, in vertex shader mode, streams of vertex data are retrieved in multiple passes. Because the space available for storing vertex attributes on the graphics processor is limited, it is conceivable that data is retrieved in separate passes for keyframe interpolation (two or more positions), vertex blending (matrices) and lighting (texture space coordinates). This does not necessarily imply that absolute geometry bandwidth increases by a factor of two to three, provided that each pass retrieves only the components of vertex data it needs rather than the entire vertex.


Figure 8: Triangle counts over more than 9000 frames of the 'Dronez rolling demo'. Bump mapping enabled.
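
The bandwidth argument can be made concrete with a small calculation. The sketch below is illustrative only: the attribute names, byte sizes and the split into three passes are assumptions chosen for the example, not measurements of Dronez or of GeForce3's vertex fetch. It compares three passes that each fetch the whole vertex with three passes that each fetch only the components they need.

```python
# Hypothetical per-vertex layout. Names and byte sizes are assumptions
# chosen for illustration, not taken from Dronez or GeForce3 documentation.
VERTEX_LAYOUT = {
    "position_key0": 12,  # first keyframe position, three 32-bit floats
    "position_key1": 12,  # second keyframe position
    "blend_data": 16,     # data used for vertex blending
    "tangent_basis": 36,  # texture space coordinates used for lighting
    "texcoord": 8,        # one 2D texture coordinate set
}

# Hypothetical split of those attributes across three passes.
PASSES = [
    ["position_key0", "position_key1"],  # keyframe interpolation
    ["blend_data"],                      # vertex blending
    ["tangent_basis", "texcoord"],       # lighting
]


def bytes_full_refetch(layout, passes):
    """Bytes per vertex if every pass fetched the entire vertex."""
    return sum(layout.values()) * len(passes)


def bytes_component_fetch(layout, passes):
    """Bytes per vertex if each pass fetched only its own components."""
    return sum(layout[name] for group in passes for name in group)


single_pass = sum(VERTEX_LAYOUT.values())
full = bytes_full_refetch(VERTEX_LAYOUT, PASSES)
partial = bytes_component_fetch(VERTEX_LAYOUT, PASSES)

print(f"single pass, whole vertex:   {single_pass} bytes per vertex")
print(f"three passes, whole vertex:  {full} bytes per vertex")
print(f"three passes, components:    {partial} bytes per vertex")
```

Under these assumptions the triangle count triples, because every triangle is submitted once per pass, yet the bytes fetched per vertex match the single-pass case as long as no component is fetched twice.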

The rate at which vertices are processed depends heavily on the instruction length of the vertex shader, in contrast to the invariant rate of hardwired transformation. GeForce3 executes one instruction per cycle. To put this in perspective, consider that a simple transform with a six-instruction vertex shader processes 3.3 million vertices in one second. On the other hand, GeForce3's fixed transformation pipe is capable of approximately 16.6 million vertices per second (as measured on 3DMark2001). This dependency on instruction length is reflected in the fillrate graphs in Figure 9. A vertex shader that only controls keyframe interpolation and vertex blending is faster than one that adds per-pixel lighting after these animation routines. Having said this, Figure 9 indicates that GeForce3 actually does an excellent job of relieving the central processing unit of vertex operations. At high resolution, the bottleneck shifts to memory bandwidth.


Figure 9: Dronez rolling demo. GeForce3 - vertex shader enabled. GeForce2 - vertex shader disabled.


Resolution (16-bit color)   640x480   800x600   1024x768   1280x1024   1600x1200
GeForce2, bump mapping        66.05     64.36      63.92       61.14       56.15
GeForce2                      94.5      93.35      92.22       87.96       79.76
GeForce3, bump mapping       112.45    104.18      98.46       82.67       69.56
GeForce3                     144.46    141.6      141.17      124.29      104.96
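
One way to read these results is to ask how much each configuration loses between the lowest and highest resolution. The short sketch below does that arithmetic on the published scores; the function name `percent_drop` is a hypothetical helper introduced here, and the source does not label the unit of the scores, so they are treated as plain numbers.

```python
# Scores copied from the results above, ordered by resolution.
RESOLUTIONS = ["640x480", "800x600", "1024x768", "1280x1024", "1600x1200"]
RESULTS = {
    "GeForce2, bump mapping": [66.05, 64.36, 63.92, 61.14, 56.15],
    "GeForce2": [94.5, 93.35, 92.22, 87.96, 79.76],
    "GeForce3, bump mapping": [112.45, 104.18, 98.46, 82.67, 69.56],
    "GeForce3": [144.46, 141.6, 141.17, 124.29, 104.96],
}


def percent_drop(scores):
    """Percentage of the first score lost by the time of the last score."""
    return 100.0 * (scores[0] - scores[-1]) / scores[0]


for name, scores in RESULTS.items():
    drop = percent_drop(scores)
    print(f"{name:24s} {drop:5.1f}% lower at {RESOLUTIONS[-1]} than at {RESOLUTIONS[0]}")
```

The GeForce2 runs lose about 15 percent across the range, while the GeForce3 runs lose roughly 27 percent without bump mapping and 38 percent with it. The steeper fall on GeForce3 is consistent with the bottleneck moving to memory bandwidth at high resolution.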
