# Benchmarks and Limitations

by Ga'ash Soffer on October 10, 1998 6:16 PM EST
Demo1.dm2, Demo2.dm2, Crusher.dm2, Massive1.dm2, Unreal timedemo, Forsaken Nuke.dem, Forsaken Biodome.dem, Mon2.dm2, etc. What are the differences between all of these benchmarks? What limitations do they expose? Why are Crusher results always lower than demo1.dm2 results? Why does Mon2.dm2 run faster than a Voodoo2 on just about any AGP board? All of these questions will be answered in detail in this article. Please note: this is a somewhat rigorous analysis of benchmarks, and some elementary calculus is used. I have tried my best to explain the calculus parts in plain English, but I'm not a math teacher...

Fill Rate Limited Benchmarks

A fill-rate limited benchmark is a benchmark that is designed to expose the fill-rate limit of a particular video card. Fill-rate limited benchmarks generally employ multi-pass rendering techniques (3D accelerator cycle chewers) and run at very high resolutions. Nearly every benchmark can be made fill-rate limited if it is run at a high enough resolution. However, not every 3D accelerator will reach its peak fill-rate in a given "fill-rate limited" benchmark. The Voodoo2 SLI, for example, will not reach a fill-rate limit with demo1.dm2 (running at 800x600 or less), while an i740 might reach its maximum potential on a system as slow as a PII/266. It is important to remember that just because a benchmark is fill-rate limited for one card, it is not necessarily fill-rate limited for another.

Some Fill Rate Limited Benchmarks

Some popular fill-rate limited benchmarks are the recently adopted Unreal timedemo test and Quake2's demo1.dm2, provided it is run at a high enough resolution (800x600 should do for almost everything but Voodoo2 SLI). Unreal, which uses three-pass rendering, is probably the biggest fill-rate hog of any game currently on the market. What does three-pass rendering mean? It means that on cards which do not support any special dual-pass / clock rendering, it will take three times as long to render a pixel in Unreal as it does to render a pixel in, let's say, Forsaken. Obviously, Unreal is very fill-rate limited. Quake2's demo1.dm2 is also a relatively fill-rate limited benchmark. Since Quake2 uses two-pass rendering, most 3D accelerators' fill-rates are already cut in half (compared to their peak fill-rate in single-pass games). Though demo1.dm2 does not expose the fill-rate limit of most cards as clearly as the Unreal timedemo, at a sufficiently high resolution it can be used to differentiate between the fill-rates of different 3D accelerators.
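To put rough numbers on the multi-pass argument, here is a minimal sketch of the frame-rate ceiling that fill rate alone imposes. The 100 Mpixel/s peak fill rate and the overdraw factor are hypothetical placeholders, not figures for any real card, and the model assumes every pass touches every drawn pixel and that the card cannot combine passes in one clock.

```python
def fps_ceiling(peak_fill_mpixels: float, width: int, height: int,
                passes: int, overdraw: float = 1.0) -> float:
    """Upper bound on frames per second imposed by fill rate alone.

    peak_fill_mpixels: peak single-pass fill rate in megapixels/second (assumed).
    passes: rendering passes per pixel (1 = single pass, 2 = Quake2-style, 3 = Unreal-style).
    overdraw: assumed average number of times each screen pixel is drawn per pass.
    """
    pixels_per_frame = width * height * overdraw * passes
    return peak_fill_mpixels * 1_000_000 / pixels_per_frame


# Hypothetical card with a 100 Mpixel/s peak fill rate, running at 800x600.
for name, passes in [("single pass", 1), ("two pass", 2), ("three pass", 3)]:
    print(f"{name}: {fps_ceiling(100, 800, 600, passes):.1f} fps max")
```

Under these assumptions, tripling the number of passes cuts the ceiling to a third, no matter how fast the CPU feeding the card is.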

Recognizing a Fill Rate Limited Benchmark

Perhaps the most important thing you will hopefully learn from this article is HOW to analyze benchmark results and tell whether they reflect the fill-rate limits of video cards or some other bottleneck (these other bottlenecks will be discussed in the next few pages). In order to recognize a fill-rate limited benchmark, we need to analyze results generated by one. Our results will consist of a fill-rate limited benchmark run at various CPU speeds (i.e. pumping different amounts of data to the card). Since a fill-rate limited benchmark is (you guessed it) fill-rate limited, no matter how much data is pumped to the card, once you hit the fill-rate limit, the FPS will REMAIN THE SAME.
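One way to turn "the FPS will remain the same" into a quick check is to compare how much FPS each CPU upgrade buys relative to how much faster the CPU got. The sketch below does that; the 0.1 threshold and the sample numbers are hypothetical, chosen only to illustrate the idea.

```python
def scaling_efficiency(results):
    """Ratio of relative FPS gain to relative CPU speed gain for each CPU step.

    results: list of (cpu_mhz, fps) pairs from one benchmark on one card.
    About 1.0 means FPS rose in proportion to CPU speed; about 0.0 means FPS
    stayed flat, as it does once the fill-rate limit is reached.
    """
    ordered = sorted(results)
    ratios = []
    for (mhz0, fps0), (mhz1, fps1) in zip(ordered, ordered[1:]):
        fps_gain = (fps1 - fps0) / fps0
        cpu_gain = (mhz1 - mhz0) / mhz0
        ratios.append(fps_gain / cpu_gain)
    return ratios


def looks_fill_rate_limited(results, threshold=0.1):
    """Rule of thumb (hypothetical threshold): the last CPU upgrade bought almost no FPS."""
    return scaling_efficiency(results)[-1] < threshold


# Hypothetical numbers, not measurements from this article.
plateau = [(233, 20.0), (266, 22.0), (300, 23.0), (350, 23.2), (400, 23.3)]
cpu_bound = [(233, 20.0), (300, 25.7), (400, 34.3)]
print(looks_fill_rate_limited(plateau))    # True
print(looks_fill_rate_limited(cpu_bound))  # False
```

Looking at the last step only keeps the rule simple. With real, noisy runs it is safer to look at the whole list returned by `scaling_efficiency` and see whether it trends toward zero.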

The Ideal Case

What was described above is the ideal case. In the ideal (theoretical) case, the 3D accelerator's performance will scale linearly with CPU speed until the fill-rate limit is reached. At that point the graph of FPS vs. CPU speed, let's call it f(x), becomes a horizontal line with a slope of zero. Of course, this never really happens. Since most cards have latencies, driver issues, etc., an effective fill-rate limit is reached before the absolute fill-rate limit.
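The ideal case is simply a linear function clipped at a ceiling. Here is a minimal sketch of that model; the 0.1 fps-per-MHz scaling factor and the 30 fps ceiling are hypothetical constants picked for illustration, not measured values.

```python
def ideal_fps(cpu_mhz: float, fps_per_mhz: float, fps_limit: float) -> float:
    """Ideal model: FPS rises linearly with CPU speed, then flattens at the fill-rate limit."""
    # Below the knee the card is starved for data, so FPS tracks CPU speed.
    # Above it the card is saturated, so extra CPU speed buys nothing (slope zero).
    return min(fps_per_mhz * cpu_mhz, fps_limit)


def knee_mhz(fps_per_mhz: float, fps_limit: float) -> float:
    """CPU speed at which the linear part of the ideal curve meets the fill-rate ceiling."""
    return fps_limit / fps_per_mhz


# Hypothetical constants: 0.1 fps per MHz and a 30 fps fill-rate ceiling.
for mhz in (233, 266, 300, 350, 400):
    print(mhz, round(ideal_fps(mhz, 0.1, 30.0), 1))
print("knee at", knee_mhz(0.1, 30.0), "MHz")
```

In this model, every CPU speed past the knee gives exactly the same FPS. Real cards bend over gradually instead of hitting a sharp corner.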

The Real World Situation

In "real life", as we begin to approach the absolute fill-rate limit of the accelerator, we will notice that the improvement in performance we get from increasing CPU speed (i.e. increasing the amount of data we feed the card) keeps shrinking. Using a little mathematical notation, this can be summarized as $f''(x) < 0$, where f(x) is the function of frames per second vs. CPU speed. For those of you who do not know simple calculus, $f''(x)$ is read "f double prime of x". It is the slope of the tangent line to the graph of $f'(x)$, which is itself the slope of the tangent line to f(x). In other words, it measures how quickly the slope of f(x) is changing. If x were time, $f''(x)$ would be the acceleration at any given point. So saying $f''(x) < 0$ is like saying the acceleration is negative: speed (actually velocity) is decreasing. In our case, it means that the rate at which frames/sec increases is itself decreasing. I hope I didn't lose anyone there... Anyway, below is a graph showing the results of running a fill-rate limited test (demo1.dm2) with the Riva 128ZX (at 800x600, where it is fill-rate limited) on Pentium IIs of various speeds (233, 266, 300, 350 and 400 MHz).
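If you have FPS numbers rather than a graph, you can estimate $f''(x)$ directly from them. The sketch below uses the second divided difference, which works even though the CPU speeds are not evenly spaced. The FPS values in it are hypothetical, since the figures behind the Riva 128ZX graph are not reproduced here.

```python
def second_differences(results):
    """Estimate f''(x) at each interior sample of unevenly spaced (x, f(x)) data.

    Uses the second divided difference
        2 * (slope_right - slope_left) / (x_right - x_left),
    which is exact for a parabola and handles CPU speeds that are not evenly spaced.
    """
    ordered = sorted(results)
    estimates = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(ordered, ordered[1:], ordered[2:]):
        slope_left = (y1 - y0) / (x1 - x0)
        slope_right = (y2 - y1) / (x2 - x1)
        estimates.append((x1, 2 * (slope_right - slope_left) / (x2 - x0)))
    return estimates


def concave_down(results):
    """True if every estimated f''(x) is negative, i.e. the FPS gains keep shrinking."""
    return all(value < 0 for _, value in second_differences(results))


# Hypothetical FPS figures at the CPU speeds used in the article.
sample = [(233, 20.0), (266, 22.0), (300, 23.0), (350, 23.2), (400, 23.3)]
for mhz, f2 in second_differences(sample):
    print(f"f''({mhz}) ~ {f2:.2e}")
print(concave_down(sample))  # True
```

With only five data points, normal run-to-run variation can flip the sign of an individual estimate. Treat this as a rough check, not a proof.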

As you can see from the graph, $f''(x)$ is clearly negative. This confirms the real-world fill-rate limit behavior described above. Unreal benchmarks are coming as soon as the OpenGL and D3D drivers mature... So, what happens when the results don't fit the curve shown above? This probably means that the test is limited by something other than fill rate. The next type of benchmark I will talk about is the Geometry (CPU) limited benchmark.

Geometry Limited Benchmark