Original Link: http://www.anandtech.com/show/523
In the world of computers, the cliché "more is better" seems to be commonplace. This cliché has been associated with computers since the dawn of the microprocessor. The years have seen computers with more megahertz, more storage space, and more options. Typically, the saying holds true, but occasionally more does not make a superior product. Take, for example, two systems, one based on Intel's Celeron clocked at 600 MHz and a second based on Intel's Pentium III clocked at 550 MHz. To the naive consumer, the Celeron at 600 MHz seems like the better deal simply because "more is better." A veteran computer user would know, however, that the Pentium III 550E is in fact the faster of the two in most cases. It is for this reason that "more is better" remains a cliché and not a fact.
This does not stop manufacturers from touting a product as being better than a clearly superior product just because it has "more". Marketing firms know that many of the consumers they target will hold the "more is better" cliché as truth. Due to this fact, it is oftentimes hard to tell when more is truly better, when more is only slightly better, and when more is actually inferior. In some cases, the answer may seem obvious right off the bat. Other times, the truth behind the marketing is skewed and the answer becomes less clear.
It has been a while since the "more is better" debate was waged in the video card market. With NVIDIA on a six-month product cycle and few other competing products hitting the shelves, it seems that not many advancements, beneficial or not, have popped up. Well, video card manufacturers, with the permission of NVIDIA, recently set out to change the market with the "more is better" philosophy. This change comes with the release of 64 MB DDR GeForce video cards from a variety of manufacturers. How much weight does the additional memory hold? Is the additional memory actually useful or just a marketing gimmick? Is the cost of such cards justified? Is more, in fact, better in this case? These questions can only be addressed through hands-on testing, an opportunity that AnandTech was recently given. The pages which follow attempt to guide the consumer through the pros and cons of 64 MB DDR GeForce cards and help assess the value and speed to be gained by having the additional RAM.
Limitations to the GeForce
It has been about four months since the last major revolution came to the video card market, a revolution brought upon by the release of NVIDIA's new video card processor: the GeForce. Referred to by NVIDIA as a GPU, standing for Graphics Processing Unit, the GeForce marked a significant improvement in 3D gaming and raw processing power. At a fill rate of 480 Million Pixels per Second and around 23 million transistors, the GeForce definitely had more than its predecessor, the TNT2. In addition, advanced features never seen before on video cards, such as T&L, provided for an even more optimistic view of the future of 3D gaming. However, for all the bang that came with NVIDIA's new powerhouse, there remain speed limitations to the GeForce product line.
In essence, there are two factors which affect the speed of a video card: the fill rate and the memory bandwidth. The fill rate refers to how many pixels the video processor can compute per second. The theoretical fill rate is directly proportional to the core speed of a given card. For example, given that the standard core speed of the GeForce is 120 MHz and that the GeForce GPU can process 4 pixels per clock cycle, we can calculate the fill rate to be 4 pixels times the 120 MHz clock speed, resulting in 480 Million Pixels per Second. Any increase in core clock speed would result in a proportional increase in fill rate, meaning that at a 160 MHz core clock speed, the GeForce would be able to process 640 Million Pixels per Second.
The second factor that affects how fast a video card can function is the memory bandwidth. While the GeForce GPU may be able to process 480 Million Pixels per Second, the video card needs a place to collect and store this data before it gets rendered on the screen. This temporary storage area is provided via RAM. The problem is not keeping the data in the RAM but rather getting the data there. Data from the GeForce GPU can only get to the monitor by passing through the RAM first. Ideally, data transfer between the processor and the memory system would be instantaneous, leaving no potential bottleneck in the system. This, however, is not the case. The data from the processor must pass to the memory via the memory bus, a factor controlled by memory clock speed. More often than not, this bandwidth is too little to keep up with the fast rate at which the core is sending out data. This is especially the case when running in 32-bit color because, in this mode, twice as much data has to be passed from processor to RAM. The result is a throughput that does not quite meet the theoretical fill rate described above; this real-world speed is the effective fill rate, and it is limited by the peak available bandwidth of the memory bus.
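The interplay between the two factors can be captured with a quick back-of-the-envelope model: actual throughput is the lesser of what the core can compute and what the memory bus can sustain. The per-pixel byte counts below are illustrative assumptions (a color write plus a Z-buffer read and write, ignoring texture traffic and overdraw), not measured figures:

```python
# Sketch of theoretical vs. bandwidth-limited fill rate for a stock GeForce.
# Per-pixel byte counts are assumptions for illustration only.

def theoretical_fill_rate(core_mhz, pixels_per_clock=4):
    """Pixels per second the GPU can compute (GeForce: 4 pixel pipelines)."""
    return core_mhz * 1e6 * pixels_per_clock

def bandwidth_limited_fill_rate(bandwidth_gbps, bytes_per_pixel):
    """Pixels per second the memory bus can sustain."""
    return bandwidth_gbps * 1e9 / bytes_per_pixel

core = theoretical_fill_rate(120)  # 480 million pixels/s at the stock 120 MHz

# 16-bit color: assume ~6 bytes/pixel (2 color write + 2 Z read + 2 Z write)
fill_16 = min(core, bandwidth_limited_fill_rate(4.8, 6))
# 32-bit color: assume ~12 bytes/pixel (4 + 4 + 4) -- twice the traffic
fill_32 = min(core, bandwidth_limited_fill_rate(4.8, 12))

print(fill_16 / 1e6)  # 480.0 -> core-limited in 16-bit color
print(fill_32 / 1e6)  # 400.0 -> memory-limited in 32-bit color
```

Under these assumed byte counts, the DDR card's 4.8 GB/s bus keeps up with the core in 16-bit color but falls short in 32-bit color, which mirrors the overclocking behavior observed in the benchmarks that follow.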
In the case of the GeForce, which speed factor plays a larger role in overall speed? To answer this question, we independently raised both the memory and core speeds of a typical DDR GeForce card. The Leadtek WinFast GeForce 256 DDR Rev B was used as the test card and results were recorded on an AMD Athlon system running at 750 MHz (see The Test section for more details). Quake III Arena was then run to determine which system proved to be the bottleneck. The next section begins with the graphs and continues into an explanation of the trends. The results may surprise you.
Result of GeForce Limitations
As suspected, raising the core clock speed as well as the memory clock speed results in an increase in frame rate. What the graphs above do a good job of showing, however, is how overclocking affects the card differently depending on whether you overclock the memory clock or the core clock. The core clock speed graphs show the FPS rate rising steadily at most resolutions in 16-bit color. The core clock speed increase does not seem to impact the frame rate as much in 32-bit color mode, as can be seen by the fact that the graph of these increases is nearly horizontal.
When examining the graphs of memory clock increases, we find the opposite to be the case. At 16-bit color, the memory clock increases result in only a small gain. The story changes, however, when 32-bit color mode is used. The data now suggests that increasing memory clock speed results in a steadily rising frame rate at most resolutions, the same result that was found by increasing core speed in 16-bit color.
The results of the above can easily be explained using the information about the GeForce given at the beginning of this section. When the core clock speed is increased in 16-bit color mode, the GeForce GPU is accomplishing close to its theoretical fill rate. This is due to the fact that, at 16-bit color, a smaller amount of data needs to be passed to the memory via the memory bus. Since the memory bus is not creating a bottleneck, data is free to flow from the GPU to the RAM and out to the display without many slowdowns; the 300 MHz memory clock is providing enough bandwidth.
When core clock speed is increased while running in 32-bit color mode, the outcome is a bit different. Some performance is gained by overclocking the core here, but not only is the gain smaller than that experienced in 16-bit color mode, it also seems to hit a limiting value (represented by a horizontal asymptote on the graph). This can be explained using our knowledge of the memory clock and bandwidth. Although the theoretical fill rate is being pushed out of the GPU, the memory bus cannot keep up with the massive amount of data it is receiving. The memory clock becomes a bottleneck simply because 32-bit color requires twice as much temporary storage area as 16-bit color modes. Traveling at 300 MHz, the data can only move so fast into and out of the onboard RAM. This creates a bottleneck, a fact which explains the almost horizontal slope of the graph.
When the core clock is kept constant and the memory clock speed is raised, the results are swapped. We find that, at 16-bit color, overclocking the memory does not seem to make any significant difference in frame rate. This is because, as described above, the GeForce was already meeting its theoretical fill rate since memory bandwidth was not an issue. Increasing the bandwidth by raising the memory clock does not result in any appreciable increase in speed at 16-bit color because the GPU (at its stock setting) cannot fill the memory bandwidth already available.
The tables turn, once again, when the color depth is changed from 16-bit to 32-bit. Unlike the results gathered by increasing core clock speed in 32-bit color, the speed of the DDR GeForce actually increases steadily when the memory clock is pushed up. Once again, this can be attributed to the memory bandwidth available at such high color depths. Although the GeForce may be able to process enough information to keep up with the 480 Million Pixels per Second, the memory bus at stock speeds cannot. By increasing the memory clock speed, the memory bandwidth was also increased. This resulted in faster frame rates because the bottleneck was widened: the massive amount of data traveling to the RAM is no longer forced to move at the speed a 300 MHz clock provides. Now information can move from core to memory at a rate which does not limit the core fill rate. This fact is represented graphically by the steadily rising slope of the FPS graph when memory clock speed is increased at a constant core speed.
NVIDIA knew this would be a problem upon the release of the GeForce. It is for this reason that the GeForce comes in two models: SDR and DDR. The results of the above are from tests performed on a DDR card. Let's take a look at how SDR GeForce cards compare to DDR GeForce cards and explore how SDR cards would react to overclocking.
DDR vs. SDR
The limitations to the GeForce product line described in the previous section were predicted by NVIDIA upon the GeForce's release. It was no secret that the GeForce would simply have problems sending out the required data on such a narrow memory bandwidth. In anticipation of this problem, NVIDIA released both SDR and DDR versions of the GeForce. What are the differences between the two varieties of GeForce cards? Both use the same processor with the same fill rate, so how can the DDR be significantly faster than the SDR? The difference comes from increasing the memory bandwidth, thus decreasing the bottleneck.
SDR, standing for Single Data Rate, is essentially the same type of memory found on pretty much all video cards up until now. What single data rate means is that for every memory clock cycle, the memory can be written to one time. This means that data can be passed from the GPU to the memory at a rate equal to the speed of the memory clock. For SDR GeForces, this speed is 166 MHz. The resulting peak available bandwidth can be calculated by multiplying the SDRAM clock speed (166 MHz) by the number of times written to per cycle (1 time) times the memory bus width (128 bits) times a conversion factor of 1/8 to convert bits to bytes. This results in 2.7 GB/s peak bandwidth for SDR GeForce cards.
DDR, standing for Double Data Rate, made its first commercial video card appearance with the GeForce. With double data rate RAM, the memory is actually written to twice per clock cycle of the memory clock. Writing on both the rising and falling edges of the cycle, the stock memory speed for DDR cards is 300 MHz, achieved by having 150 MHz RAM written to twice per cycle. The 300 MHz rating is not the actual speed that the RAM is running at; that remains 150 MHz. The RAM is simply written to twice per cycle, resulting in the effective '300 MHz' speed. Since memory clock speed is directly related to memory bandwidth, bandwidth is increased by having the faster memory clock. This is shown by the following peak bandwidth calculation, as described above: 150 MHz SDRAM clock * 2 writes per clock cycle (DDR) * 128 bits (memory bus width) * 1/8 (8 bits in a byte) = 4.8 GB/s peak bandwidth.
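Both calculations follow the same formula and can be reproduced in a few lines. This sketch simply restates the arithmetic from the two paragraphs above:

```python
# Peak memory bandwidth = clock speed x transfers per clock x bus width in bytes.
def peak_bandwidth_mb(clock_mhz, transfers_per_clock, bus_width_bits=128):
    """Peak bandwidth in MB/s; both GeForce variants use a 128-bit memory bus."""
    return clock_mhz * transfers_per_clock * bus_width_bits / 8

sdr = peak_bandwidth_mb(166, 1)  # SDR GeForce: one write per memory clock cycle
ddr = peak_bandwidth_mb(150, 2)  # DDR GeForce: two writes per cycle ("300 MHz")

print(f"SDR: {sdr / 1000:.1f} GB/s")  # ~2.7 GB/s
print(f"DDR: {ddr / 1000:.1f} GB/s")  # 4.8 GB/s
```

The SDR figure works out to 2656 MB/s exactly, which the article (like most coverage of the era) rounds to 2.7 GB/s.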
There is no question that DDR GeForce cards outperform their SDR counterparts. While theoretical fill rate remains the same in both cards, the effective fill rate is substantially greater in DDR cards due to the decreased bottleneck achieved by widening the memory bandwidth. The increase in speed proves that memory bandwidth is crucial to video card speed.
Performance of Quaver on SDR/DDR
It is almost impossible to push a video card to its limits using the built-in Quake III Arena demos. The demo's level is not only light on textures, it also does not contain as many rooms as other levels. The resolutions and color depths at which the demo plays flawlessly are often higher than those at which the full game is actually playable. In order to fully test how memory bandwidth and RAM size affect video card performance, a separate and more stressful benchmark is needed. For the purposes of this test, we turned to Anthony "Reverend" Tan's Quaver benchmarking demo. It is well known that some levels of Quake III Arena push video cards harder than others. The most notorious of these levels is the Q3DM9 map, with a reported 30-plus megabytes of textures. Tests were performed on our Pentium III 750E system (see The Test section for more details) with Quake III Arena settings at 'normal' except for one aspect: the texture detail slider was set all the way to the right, ensuring that the maximum textures were being processed and stressing the system. Let's take a look at how 32MB SDR and DDR cards react to such a stressful environment.
As can be seen in the above graphs, the difference in speed at 16-bit color is what we would expect. By having a higher peak bandwidth value, the DDR GeForce is able to move information into and out of the RAM at a faster rate than the SDR card can. The speed differences between the DDR GeForce and the SDR GeForce at 32-bit color mode are the intriguing numbers. The difference in speed begins at what we would expect, with the DDR GeForce beating the SDR. However, as the resolutions get higher, not only does the frame rate begin to crawl but the difference between the DDR and SDR cards becomes almost nonexistent. To what do we owe such an interesting outcome? The answer is RAM.
Earlier we discussed the limitations of memory bandwidth on a card's performance, but we assumed that the amount of memory on the video card was sufficient to hold the data being passed to the RAM by the processor. This assumption neglected to take into account what happens when the RAM on the video card becomes saturated with data and the GPU still needs to pass more information into it before the scene can be successfully rendered. What happens at this stage, when the RAM on the video card becomes saturated and more data still needs to be passed to it, is called texture swapping. The next section deals with what texture swapping is and why it is such a hindrance to smooth game play.
Texture Swapping: Hidden Evil
With the AGP bus came a new era in graphics card design. No longer would a video card have to swap textures, at least in theory. One of the most heavily touted features of the AGP bus was the fact that it could render by reading textures directly out of system memory. While this may be the case, one has to consider the very poor speed of the AGP bus as well as the fact that the GeForce does not seem to take advantage of this feature. Odds are that you have experienced texture swapping issues on many occasions during heavy 3D graphics use. The result of such texture swapping is quite evident to the user: the game or program will be responding speedily and then will begin to crawl upon entry into a new room or area with many textures. The response will remain sluggish as long as the path is changing or new objects are being encountered. The symptoms are alleviated quickly upon entry into a smaller or less complex room or area, whereupon game speed and response time return to normal. To explain this phenomenon, we once again turn to the memory bandwidth.
While at low colors, low resolutions, and/or low scene complexity, the 32 MB of on board video memory that many GeForce cards have is sufficient to render the current frame to the screen. The problem, however, arises when this 32 MB of available memory gets saturated with data. This usually occurs when running in 32-bit color mode at high resolutions with highly complicated scenes. The most drastic change is oftentimes seen when going from 16-bit to 32-bit color because a scene rendered in 32-bit color requires twice as much memory as the same scene in 16-bit mode. The computer has to be able to deal with the fact that the video card's memory is full and it still needs more textures to render a frame.
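A rough budget makes the saturation point concrete. The allocation below is an assumption for illustration (double buffering plus a Z-buffer at the same depth; actual driver behavior differs), but it shows why 32-bit color eats into texture space so quickly:

```python
# Estimate the fixed frame-buffer cost at a given mode; whatever remains of
# the 32 MB must hold textures. Assumes front buffer + back buffer + Z-buffer
# all at the same bit depth (an illustrative assumption, not driver-measured).
def framebuffer_mb(width, height, bytes_per_pixel, surfaces=3):
    return width * height * bytes_per_pixel * surfaces / 2**20

for depth_bytes in (2, 4):  # 16-bit vs. 32-bit color
    fb = framebuffer_mb(1024, 768, depth_bytes)
    print(f"{depth_bytes * 8}-bit at 1024x768: {fb:.1f} MB of buffers, "
          f"{32 - fb:.1f} MB left for textures")
```

Under these assumptions, 1024x768x32 leaves roughly 23 MB of a 32 MB card for textures; against a map like Q3DM9 with its reported 30-plus megabytes of textures, the card has no choice but to spill into system memory.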
The way that the computer handles the problem is by calling upon the system memory for help. That is right, the RAM you have installed in your computer is occasionally called upon to help render complex video scenes. Overflowing textures are stored in the system memory and pushed back out to the video card when called upon. This would be fine if the path to the system memory were fast; in reality, the slowdown in the game is caused by the slow path that the textures are forced to take. As explained in the DDR versus SDR section, the peak available memory bandwidth is 4.8 GB/s in DDR cards and 2.7 GB/s in the SDR GeForce card. We also saw in that section how much of a bottleneck is created by going a 'slow' 2.7 GB/s, thus resulting in the large speed increase of DDR cards over SDR cards. Well, you thought that 2.7 GB/s was pokey? Wait until you see how much peak bandwidth the AGP bus has.
We can calculate the peak bandwidth using the same method mentioned in the DDR versus SDR section. For a system running in AGP 1x mode, we take the AGP clock speed (66 MHz) and multiply it by the AGP mode (1x), then by the AGP bus width (32 bits), then by 1/8 to convert bits to bytes. The resulting speed is a super slow 266 MB/s, a mere 5% of the DDR card's on-card memory bus. When running in AGP 2x mode, the results are not much better. Using the same calculation as above and replacing the AGP mode with 2x, we find that in this mode the peak available bandwidth is approximately 533 MB/s. Once again, this is only about 11% of the DDR's speed. Finally comes the speed of writing to the system memory at AGP 4x. This time we replace the 2x with a 4x and find that at AGP 4x mode, the memory can be written to at 1.06 GB/s. While this approaches the speed of the DDR GeForce, it still remains only 22% of the DDR's power.
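The three AGP figures come from the same one-line formula with the mode changed, so they can be generated together. (The nominal AGP clock is 66 MHz; the commonly quoted 266/533/1066 MB/s figures use the exact 66.66 MHz clock.)

```python
# AGP peak bandwidth = bus clock x transfer mode x bus width in bytes.
def agp_bandwidth_mb(mode, clock_mhz=66.66, bus_width_bits=32):
    """Peak AGP bandwidth in MB/s for 1x, 2x, or 4x mode."""
    return clock_mhz * mode * bus_width_bits / 8

DDR_BUS_MB = 4800  # DDR GeForce on-card memory bus, from the earlier calculation

for mode in (1, 2, 4):
    mb = agp_bandwidth_mb(mode)
    print(f"AGP {mode}x: {mb:.0f} MB/s ({mb / DDR_BUS_MB:.1%} of the DDR bus)")
```

Even at AGP 4x, textures traveling over the bus move at less than a quarter of the speed of the DDR card's own memory, which is why spilling into system memory hurts so badly.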
We saw how much difference dropping from 4.8 GB/s to 2.7 GB/s makes when comparing DDR to SDR cards, and 2.7 GB/s is still over 50% of the DDR's bandwidth. Dropping all the way down to AGP bus speeds is the reason that the computer slows down under heavy textures. If we could avoid using system memory for rendering altogether and rely solely on the fast DDR memory bus, we would have a winner on our hands. Well, NVIDIA has come up with just the solution: the 64 MB DDR GeForce.
The Need for 64 MB
By incorporating 64 MB of DDR memory on the video card itself, NVIDIA could almost be assured that no current games or applications will require any system memory texture swapping. Rather than having to deal with the massive bottlenecks faced via AGP 1x, 2x, or 4x, 64 MB of on card DDR RAM could allow for 4.8 GB/s peak bandwidth no matter what. Saturation of this amount of memory would occur rarely, if ever, thus making it possible to run programs with heavy textures at many more resolutions than previously possible. Speed would no longer be an issue and games would play smoothly even at these normally strenuous levels. The need for 64 MB of on card memory is there, and NVIDIA has set out to fulfill your needs.
In order to test the effect of having more on card video memory, it was necessary to choose a card that has such a feature. The first 64 MB GeForce that came into our lab was the Platinum GeForce by a Korean company named SUMA. New to the market in general and brand new to the US market, SUMA seems to be the first manufacturer who was ready to produce retail versions of 64 MB GeForce cards. A few companies out there, mainly Dell, have begun producing 64 MB GeForce cards for OEM use; however, SUMA is the first one that we have seen available on the retail market. Stacked with 64 MB of DDR SDRAM in 8 chips, the Platinum GeForce is a powerhouse to say the least. In fact, it is such a powerhouse that it is even designed to fit an AGP Pro slot; however, the card worked fine in our test bed with a regular AGP slot.
You may be asking yourself why SUMA would use SDRAM when all other DDR cards use SGRAM. Well, it seems that Infineon, the manufacturer who provides the SGRAM chips used in every DDR GeForce we have seen, does not yet produce a DDR SGRAM chip in an eight-megabyte density. Thus, in order to fit all 64 megabytes on the card, SUMA (as well as other 64 MB GeForce manufacturers) chose to use Hyundai 6 ns SDRAM chips that are currently produced in 8 MB densities. Outfitted with eight of these bad boys and a chrome heatsink, the SUMA Platinum GeForce was ready to hit the testbed and serve as a model for all 64 MB GeForce cards to come.
Windows 98 SE Test System
|CPU(s)||Intel Pentium III 550E & 750E||AMD Athlon 550 & 750|
|Motherboard(s)||ABIT BF6||Tyan Trinity K7 S2380 KX133 Slot-A ATX|
|Memory||128MB PC133 Crucial Technology SDRAM|
|Hard Drive||Quantum Fireball CR 8.4 GB UDMA 33|
|Video Cards||Leadtek WinFast GeForce 256 DDR Revision B||ASUS Siluro GF256 SDR GeForce||SUMA Platinum GeForce 64MB DDR||ATI Rage Fury MAXX|
|Operating System||Windows 98 SE|
|Benchmarks||Quake III Arena demo001.dm3 & Quaver.dm3||Unreal Tournament 4.04 UTbench.dem|
Pentium III 750E - Quake III Arena demo001.dm3
The results collected under a normal run of Quake III Arena are not too surprising. We find that when set to "normal" mode, as this was the mode used to perform the benchmarks, the 64 MB DDR GeForce has a bit of a tough time standing out from the 32 MB GeForce. This is due simply to the fact that the GeForce card is not being pushed to any great limits in this demo. The on card memory is not coming close to being saturated and therefore the 64 megabytes of RAM work just as effectively as the 32 megabytes.
At the resolution of 1024x768 and below we actually find that the 32 MB GeForce outperforms its 64 MB brother. This is most likely due to the fact that all 32 MB GeForce cards use faster SGRAM while 64 MB cards use slower SDRAM because of the lack of availability of 8 MB SGRAM chips. Once the memory begins to get more saturated, the difference between the two becomes less because the GeForce processor cannot send information as fast. At high colors and resolutions we see the 64 MB GeForce take the lead by a hair.
Pentium III 750E - Quake III Arena Quaver.dm3
It is not until the graphics system is put under extreme stress that the 64 MB GeForce begins to shine. Although Quaver is a high stress benchmark, it actually does a very good job at modeling real world gaming scenarios. The typically used demo001.dm3 is a good benchmark for showing how a card will react under most conditions; however, Quaver addresses the stressful conditions that Quake III Arena often throws out.
The performance differences between the 32 MB GeForce and the 64 MB GeForce are present from the start. As soon as the resolution is pushed up to 640x480x32, the 64 MB GeForce shows what it is made of. The 32 MB GeForce actually begins to texture swap at this point due to the large amounts of data being passed to the card. The use of the AGP bus to transfer textures begins to become apparent to the user at this resolution as well: the game begins to get choppy and alternates from slow to fast during game play. The 64 MB GeForce, on the other hand, reacts smoothly at this resolution. We can be assured that it is the differences in memory sizes that cause the increase in performance because the GeForce is still operating at the same clock speed and memory speed.
The real difference comes at 1024x768x32. This resolution is commonly thought to be playable on a 32 MB DDR GeForce even with high texture detail enabled. As the Quaver demo shows, the 32 MB GeForce is relying heavily on system memory to store additional textures, a fact represented by the extremely slow FPS rating of 15.9. The 64 MB GeForce, on the other hand, can easily handle the large textures passed to it from the processor. By avoiding the AGP bus as a mode of texture transfer, the 64 MB GeForce is able to earn a playable FPS rating of 57.
You may be asking yourself why the ATI Rage Fury MAXX does not perform similarly to the 64 MB GeForce. Both have the same amount of RAM, right? Well, not really. The way the Rage Fury MAXX is set up, it actually gives each one of its Rage 128 processors 32 MB of RAM to work with. While this results in a total card memory amount of 64 MB, each processor is only able to use 32 MB of this memory to render its frame. This makes it nearly identical in function to a single 32 MB GeForce chip except for the fact that the two processors work in conjunction on the MAXX. 64 MB total RAM? Yes. 64 MB RAM for processor use? No.
Athlon 750 - Quake III Arena demo001.dm3
The results of the cards when tested on the Athlon 750 system remain almost identical to the results of the benchmarks on the Pentium III 750E. Once again we find the 32 MB GeForce beating the 64 MB card on a few occasions due to its use of SGRAM. The cards are still not pushed to their limits by demo001.dm3 because the demo is not run on a very taxing map. The 64 MB GeForce comes out on top at the higher resolutions and colors; however, the amount by which the 64 MB GeForce wins, a mere 1.3 FPS, is not noticeable under any circumstances.
Athlon 750 - Quake III Arena Quaver.dm3
The results are once again similar to those found on the Intel processor. The 64 MB card shows its difference right from the start at 640x480x32. Again, the differences become even larger when higher resolutions and colors are used. This is due to the fact that once the data stream becomes as large as it does at 1600x1200x32, there is no way that 32 MB of storage area can hold this data. 64 MB, on the other hand, seems to suffice and provide a relatively smooth game.
Pentium III 550E - Quake III Arena demo001.dm3
The data once again models the initial observations made on the Pentium III 750E runs. Demo001 just does not seem to push the cards enough to show a difference that justifies the increase in price.
Pentium III 550E - Quake III Arena Quaver.dm3
Here, once again, we see that the 64 MB GeForce crushes the 32 MB GeForce when the memory of the card is finally put to use. In the runs of demo001.dm3 the extra storage space available on the 64 MB GeForce goes unused. There is no performance to be gained by having additional storage space if it is not being utilized. The Quaver demo makes use of this space while also representing a real world gaming situation: at some point in your Quake III Arena playing time, you will come across the map used to record this demo: Q3DM9. When that day comes, if it has not already, you will understand why 64 MB of on card memory is extremely useful at specific colors and resolutions.
Athlon 550 - Quake III Arena demo001.dm3
While the Athlon 550 is a hair slower than the rest of its counterparts, the results of benchmarking on this system show the same trends as those first described. In demo001.dm3 it seems that the performance gain that comes from an additional 32 MB of video memory would not be worth the additional cost. Once again, Quaver proves differently.
Athlon 550 - Quake III Arena Quaver.dm3
Running the tests on the Athlon system at 550 MHz just goes to prove that the performance to be gained from a 64 MB GeForce card is not dependent on processor speed. Sure, the FPS ratings are lower overall; however, the difference between the 64 MB GeForce and the 32 MB GeForce remains just as large.
Pentium III 750E - Unreal Tournament
Although Unreal Tournament usually shows little difference between video cards, when comparing 32 MB GeForce cards to 64 MB GeForce cards a difference is to be seen. This difference does not become noticeable until the resolution is set at or above 1024x768, hence the reason the benchmarks start here. The difference to be noted is the large performance boost that the 64 MB GeForce experiences at 1280x1024x32. Rising over 12 FPS, the 64 MB GeForce obviously has an advantage here. This advantage is most likely a result of the heavy textures that Unreal Tournament uses to render the game.
Rather than having to mess at all with the AGP bus, the 64 MB GeForce is able to store all the textures necessary to run the game at the high resolution of 1280x1024x32. The results would be similar at 1600x1200x32, we suspect, however Unreal Tournament will not run at this mode.
Athlon 750 - Unreal Tournament
Once again we find that the 64 MB GeForce card is able to provide a substantial gain in FPS rating at 1280x1024x16. The large increase in performance suggests that texture swapping has been taken out of the picture. This is not observable from the user's perspective: even when using a 32 MB GeForce card that is running about 15 FPS slower than a 64 MB GeForce, no skipping or slowing down of the game seems to occur. Although it is always nice to have a higher FPS during game play, there seem to be no visual advantages gained from these 15 FPS.
Pentium III 550E - Unreal Tournament
The speed difference between cards seems to remain almost constant regardless of CPU speed. This time the 64 MB GeForce performs about 11 FPS better than the 32 MB GeForce at 1280x1024x32.
Athlon 550 - Unreal Tournament
The performance difference between the two GeForce cards once again hovers around 14 FPS. Using the logic described in the Pentium III 750E benchmark, we can begin to understand why the 64 MB GeForce is faster in Unreal Tournament.
Drivers: 5.13 vs 3.76
As the old saying goes, there is more than one way to skin a cat. We have seen the benchmarks and examined how large a bottleneck AGP texture swapping can cause on any system. Well, besides increasing the amount of RAM on a card, other factors can decrease the need for AGP texture swapping. For example, if the amount of texture data being transferred could be decreased, the net result would be the same as adding more RAM. The secret that every manufacturer of 64 MB GeForce cards does not want you to know is that the next set of drivers from NVIDIA will include just such a feature.
The recently leaked 5.13 reference drivers from NVIDIA are currently unsupported and said to be "broken" by NVIDIA. It is for this reason that we will not be supplying a download for the drivers nor a link to them. The truth of the matter is that while they are currently unsupported and not fully functional, the 5.13 drivers result in a massive performance gain when paired with 32 MB GeForce cards. Using S3TC compression, the drivers are able to make use of hardware compression to shrink down the size of textures. If the amount of storage space needed by textures can be shrunk, this large amount of data can be compressed to fit inside the 32 MB limit found on most GeForce cards. Because the compression is done in hardware, no speed is lost during the process. Hands-on testing also revealed no noticeable image quality loss.
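A rough sketch shows why compression is so effective here. S3TC's most common format (DXT1) packs a 4x4 block of pixels into 64 bits, a fixed 4 bits per texel, which is an 8:1 ratio against 32-bit textures. Treating a heavy map's texture load as 30 MB of uncompressed 32-bit data is an assumption for illustration, not a measurement of what the 5.13 drivers actually do:

```python
# Illustrative S3TC (DXT1) savings: 64 bits per 4x4 block = 4 bits per texel.
# The 30 MB input figure and 32-bit source depth are assumptions.
def dxt1_compressed_mb(uncompressed_mb, bits_per_texel_in=32):
    bits_per_texel_out = 64 / 16  # 64-bit block holds 16 texels -> 4 bits each
    return uncompressed_mb * bits_per_texel_out / bits_per_texel_in

print(dxt1_compressed_mb(30))  # 3.75 MB -- easily fits in 32 MB of card RAM
```

Under these assumptions, even a Q3DM9-sized texture set shrinks to a few megabytes, leaving the AGP bus out of the picture entirely on a 32 MB card.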
The best way to explore what kind of performance increase can be expected with the 5.13 driver set is to benchmark using the drivers. Although the drivers worked in what appeared to be a flawless manner on the 32 MB DDR GeForce tested, the 64 MB GeForce card could not complete any runs of Quaver to test the improvement on a 64 MB system. Below is a graphical representation of the speed increase to be expected due to this texture compression:
By looking at this data you can understand why both NVIDIA and video card manufacturers do not want the 5.13 drivers out yet. Releasing the drivers for public consumption would all but kill the demand for 64 MB GeForce cards. It is yet to be seen how much speed is to be gained by using the 5.13 drivers on 64 MB cards; however, it seems logical that the gains would not be very large. The reason that the 32 MB GeForce is able to keep up with (and in some cases surpass) the 64 MB GeForce is that texture compression prevents the 32 MB card from ever having to use the AGP bus as a mode of transportation. This is exactly what the 64 MB GeForce is preventing already by having such a large on card RAM amount. Compressing the textures for use on the 64 MB GeForce card would only mean that a fraction of the 64 MB storage area was in use, a change that would not increase speed at all.
Not only are 64 MB GeForce cards hard to come by on the retail market, they are also very costly. While they cost only a fraction of the price of the 64 MB Quadro, they are still significantly more expensive than 32 MB DDR GeForces. The question is: does the increase in speed justify the increase in price?
Let us first ignore what we learned in the last section about the performance of the 5.13 drivers on a 32 MB GeForce card. The 64 MB GeForce seems to offer some advantages over 32 MB GeForce cards. For example, a 64 MB GeForce card pretty much makes every level in Quake III Arena playable at 1024x768x32, a goal that remained unfulfilled until now. In addition, by preventing AGP texture swapping, the 64 MB GeForce prevents those times in games where you wonder what your system is doing. Not only are the sporadic skips and jumps annoying, they can also cost you a win. From this aspect of the card, the 64 MB GeForce seems like the product many have been waiting for.
We will continue to disregard the 5.13 driver issue for a bit longer so that we may examine the downsides to the 64 MB GeForce. First there is the price issue: 64 MB cards cost significantly more than 32 MB DDR GeForce cards. If price is of no concern, availability must be of some. It is almost impossible to track down one of these cards for retail purchase. It may be easier, in fact, to order a new computer from Dell and wait at least 3 weeks to get your 64 MB GeForce than to try to find one of these cards on the market. Both of these drawbacks combine to bring up a new problem: if you have to wait so long and pay so much, why not wait a few months until the new NVIDIA processor, code-named the NV15, comes out? It seems difficult to pay over $300 for a video card that may easily be dwarfed by NVIDIA's latest creation.
Now for the drawback that has been placed to the side of our minds: the 5.13 drivers. As the tests in the previous section show, significant speed is to be gained by using the 5.13 driver set. Although we are currently unsure how the final S3TC texture compression will work, we know that if the tests above are any indication there may not be a market for 64 MB GeForce cards for much longer. It is really too early to tell (especially since we are yet to test 5.13 drivers on 64 MB GeForce cards), but the wise consumer would wait and see how things pan out. Who knows, maybe in this case more will not prove to be better.