Original Link: http://www.anandtech.com/show/875



We've said it countless times before: there's generally a lot that happens behind the scenes at AnandTech that we can't always publish. Whether it involves large-cache CPUs that never made it to mass production or the GeForce4 that was demonstrated behind closed doors last November, some things must go unsaid until the time is right. Luckily for us, today we're able to talk about an egg we've been sitting on for a few months now.

You have to give credit where it's due, and in this case the recipients should be NVIDIA's very talented group of engineers. They have consistently been producing new graphics parts in record time; NVIDIA was so far ahead of schedule, in fact, that they could have released their newest GPU, the NV25, last December. The Santa Clara based company was on track to release the NV25 in December but held off in order to avoid cannibalizing sales of their GeForce3 line, which was selling so well.

More recently you've undoubtedly noticed the apparent lack of GeForce3 Ti 500 cards in the market. Board manufacturers have been complaining about Ti 500 chip shortages and even AMD had problems outfitting their last run of high-performance test systems with GeForce3 Ti 500 cards. It's no coincidence that the GeForce3 Ti 500 has been slowly fading away and it just happens to be near the release of NVIDIA's next-generation graphics cores.

Earlier this week ATI announced that they would be producing Radeon 8500 and 8500LE based cards with 128MB frame buffers, and later in this review you'll see exactly how this ties into NVIDIA's plans for the coming months.

And now to today's introductions; as you've undoubtedly heard if you've been reading the news, today NVIDIA is releasing two new lines of desktop GPUs - the GeForce4 Titanium line and the GeForce4 MX line. On the mobile side, NVIDIA will also be introducing the GeForce4 Go mobile GPU line.

In many ways this introduction is highly reminiscent of the launch of Intel's Pentium III three years ago. There will be much controversy over whether this new line of GPUs is deserving of a new name based on its improvements; that's just one of the many things we'll be tackling in this review.



Third time's a charm, but what about the fourth?

When the GeForce3 was released, NVIDIA wowed the press and end users alike. They did so not with benchmarks or extremely high frame rates, but with the promise of enabling much greater realism in games. This brings us to the classic chicken-and-egg problem of 3D graphics. Developers aren't going to spend time incorporating features such as DirectX 8 pixel and vertex shaders into their games unless the vast majority of the market has hardware capable of supporting those features. At the same time, that hardware will never make it to market unless vendors produce it knowing that end users will essentially be paying for features they may not use for as much as two years. Luckily, both ATI and NVIDIA have provided the necessary hardware to make that a reality, and now they're both on a quest to outfit the vast majority of the market, which is still running on pre-GeForce3 platforms, with better cards.

The overall recommendation for the GeForce3 was that the technology had promise but there was no reason to purchase the card upon its release because of a lack of games demanding that sort of power. Fast forward to today: there's still no compelling reason to need the power of a GeForce3; however, with NVIDIA's introduction of the GeForce3 Ti 200, it doesn't make sense not to have one.

The beauty of the GeForce3's introduction was that it was brought to market alongside a brand new API from Microsoft, DirectX 8. This gave the features the GeForce3 touted much more credibility, since they would definitely be used in games going forward. The launch of the GeForce4 isn't as blessed, as DirectX 9 still lies in the somewhat distant future. The task at hand for NVIDIA is to make the GeForce4 as attractive as possible with other tangible features to convince the world to upgrade. It's up to you to decide whether the features are worth it, and it's up to us to present them to you.

The GeForce4 core, as we've mentioned before, has been using the codename NV25 on all NVIDIA roadmaps. Architecturally, the core is quite impressive, as it is approximately 5% larger than the NV20 core while built on the same 0.15-micron process; keep that in mind as we go through the improvements the NV25 offers.

First and foremost, as it was pretty much expected from the start, the NV25 now features two vertex shader units akin to the Xbox's NV2A GPU. The presence of two vertex shader units will help in games that make extensive use of vertex shading operations as they are inherently parallel in nature. As developers grow more ambitious with lighting and other vertex processing algorithms, the need for fast vertex shader units operating in parallel will increase. In the end both the pixel shader and vertex shader units will need to increase in performance in order to avoid any pipeline bottlenecks but you can expect to see the performance of pixel shaders improve tremendously while the future performance of vertex shaders will improve only a few fold over what we have today.

As you can see from the above picture of the NV25 core, the second vertex shader unit occupies a decent portion of the GPU's die. Just by visual estimation we'd say that the second vertex shader is around 6.25% of the GeForce4's die. Going back to the first thing we said about the NV25 core, the fact that it's only 5% larger than the NV20 core becomes that much more impressive.

The pixel shader unit remains unchanged as does the majority of the rendering pipeline. The core still features 4 pixel pipelines and is capable of processing two textures per pixel; because of the core's programmable nature, it is possible to enable single pass quad-texturing just as was the case with the original GeForce3.

NVIDIA's marketing name for the original set of pixel and vertex shader units in the GeForce3 was the nfiniteFX engine. Its successor in the GeForce4 is thus named nfiniteFX II and basically consists of the addition of a second vertex shader. Although we haven't discussed clock speeds yet, it should be mentioned that both the vertex and pixel shaders also operate at higher clock speeds in the GeForce4.

The vast majority of the performance improvements of the NV25 core come from what NVIDIA is calling their Lightspeed Memory Architecture II. Before we get to what's new about LMA II let's talk about what's borrowed from the GeForce3.

NVIDIA's crossbar memory architecture is still present from the GeForce3. The architecture dictates that the GPU be outfitted with not one but four independent memory controllers, each with their own dedicated 32-bit DDR memory bus. All memory requests by the GPU are then split and load balanced among the individual memory controllers. What's interesting to note here is that the GeForce architecture benefits from this granularity in memory accesses while the same cannot be said for all GPU architectures. When ATI was designing the R200 core that went into the Radeon 8500 they noticed that by moving to larger 128-bit memory accesses rather than smaller memory accesses they saw a performance boost in most situations. ATI attributed this to the nature of some of their GPU caches.
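To make the crossbar idea concrete, here is a hedged sketch of splitting a memory request across four independent 32-bit controllers by address interleaving. This is our own illustration; NVIDIA has not published the actual routing logic, and all names below are ours.

```python
# Sketch of crossbar-style memory interleaving (illustrative only; the real
# NV25 controller logic is not public). A request is split into 32-bit words
# and distributed across four independent controllers by low-order address
# bits, so even small scattered accesses keep all four channels busy.

NUM_CONTROLLERS = 4
WORD_BYTES = 4  # each controller owns a 32-bit-wide DDR channel

def route(address: int) -> int:
    """Pick the controller that owns this byte address."""
    return (address // WORD_BYTES) % NUM_CONTROLLERS

def split_request(address: int, length: int) -> dict:
    """Split a linear request into per-controller word addresses."""
    per_controller = {c: [] for c in range(NUM_CONTROLLERS)}
    for word_addr in range(address, address + length, WORD_BYTES):
        per_controller[route(word_addr)].append(word_addr)
    return per_controller

# A 32-byte (256-bit) fetch lands evenly on all four controllers:
work = split_request(0x1000, 32)
assert all(len(addrs) == 2 for addrs in work.values())
```

The granularity point in the text falls out of this sketch: with 32-bit channels, a small access still occupies only one controller, leaving the other three free for unrelated requests.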

This brings us to a new feature of the NV25's LMA II - NVIDIA's Quad Cache. With so much going on in the CPU world, you should already understand the need for efficient, high-performance caches in any processor (graphics processors included). Quad Cache is NVIDIA's name for the NV25's primitive, vertex, texture and pixel caches. The names of the four caches clearly explain their purpose in the GPU; the primitive and vertex caches store triangle data while the texture and pixel caches store texture and pixel data respectively. Not much is known about the sizes of these caches or how they've changed from those present in the GeForce3, but they are worth mentioning. It should also be mentioned that these types of caches aren't exclusive to the NV25, as the Radeon 8500's core has a similar set of caches. The reason caches play a much smaller role in the discussion of GPUs than they do in the CPU world is that they make up a much smaller portion of a GPU's core than the L1 and L2 caches do on most CPUs. Remember that today's GPUs run much slower than their memory buses, while today's CPUs run 2 - 16 times faster than their memory buses, making cache a much more important part of a CPU than of a GPU.

Lossless Z-buffer compression is present in the NV25 core seemingly unchanged from the GeForce3. As you'll remember from our technology overview of the GeForce3, z-buffer data (data that indicates how "deep" in the screen or far away an object is from the user) is actually very easily compressible. Using a lossless compression algorithm, meaning that once uncompressed the data's values will be identical to their original form (much like zip file compression), NVIDIA is able to attain up to a 4:1 compression ratio on z-buffer data. Since a great deal of memory bandwidth is used by z-buffer reads and writes (the z-buffer is read from whenever a pixel is drawn), this compression can improve memory bandwidth utilization significantly. This is unchanged from the original GeForce3 core and a similar feature is present in all Radeon cores.
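The NV25's actual compression scheme isn't public, but a toy example shows why z data compresses so well losslessly: depth across a flat surface changes by a constant step per pixel, so delta coding followed by a zip-style compressor (zlib here) collapses it, and decompression restores every value bit-for-bit. All function names below are ours.

```python
# Illustrative only: the real NV25 scheme is NVIDIA's and undisclosed.
# Z-buffer rows from planar surfaces have constant per-pixel deltas, so
# delta-encoding a row and applying a general-purpose lossless compressor
# shrinks it dramatically while remaining exactly reversible.

import struct
import zlib

def compress_z_row(row):
    deltas = [row[0]] + [b - a for a, b in zip(row, row[1:])]
    raw = struct.pack(f"{len(deltas)}i", *deltas)
    return zlib.compress(raw, level=9)

def decompress_z_row(blob, n):
    deltas = struct.unpack(f"{n}i", zlib.decompress(blob))
    row = [deltas[0]]
    for d in deltas[1:]:
        row.append(row[-1] + d)
    return row

# A row from a planar surface: constant depth slope across 256 pixels.
row = [100_000 + 37 * x for x in range(256)]
blob = compress_z_row(row)
assert decompress_z_row(blob, 256) == row   # lossless round trip
assert len(blob) <= 256                      # at least 4:1 vs the 1024-byte raw row
```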

Fast Z-clear also makes its way from the original GeForce3; this technology, also found in the Radeon cores, is an ultra-fast algorithm that is used to set all of the values in the z-buffer to zero. Why would you want to do that? After rendering a scene, the z-buffer values are no longer of any use for the next scene and thus must be discarded.

By far the biggest improvement in the NV25's architecture, and the one that more than justifies calling it the second-generation Lightspeed Memory Architecture, is the improved Visibility Subsystem. The original GeForce3 introduced a feature known as Z-occlusion culling, essentially a technology that allows the GPU to look at z-buffer values to see whether a pixel will be visible when rendered. If the pixel isn't visible and will be overwritten (or overdrawn) by another pixel, it isn't rendered and memory bandwidth is saved. This addresses the problem known as overdraw, and in the NV25's architecture the performance of the Z-occlusion culling algorithms has been improved tremendously. While NVIDIA vaguely states that "LMA II increases fill rates by optimizing the read, write and selection of pixels to be rendered," this is one of those cases where reality is understated by marketing. You'll see the performance data behind this in the test section, but clock for clock, the GeForce4 is approximately 25% more efficient at discarding unseen pixels than the GeForce3.
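In principle, Z-occlusion culling is just an early depth test: consult the z-buffer before shading, and discard fragments that would only be overdrawn. A minimal sketch of the principle (ours, not NVIDIA's hardware algorithm):

```python
# Minimal sketch of Z-occlusion culling (early depth rejection). Fragments
# that fail the depth test are discarded before shading, saving the texture
# fetches and frame-buffer writes they would otherwise cost. Front-to-back
# submission maximizes the savings.

def render(fragments, width, height):
    """fragments: iterable of (x, y, z, color); smaller z = closer."""
    zbuffer = [[float("inf")] * width for _ in range(height)]
    framebuffer = [[None] * width for _ in range(height)]
    shaded = culled = 0
    for x, y, z, color in fragments:
        if z >= zbuffer[y][x]:      # occluded: reject before shading
            culled += 1
            continue
        zbuffer[y][x] = z
        framebuffer[y][x] = color   # "shading" + write happen only here
        shaded += 1
    return framebuffer, shaded, culled

# Heavy overdraw, front to back: one pixel covered by 10 fragments.
frags = [(0, 0, z, f"layer{z}") for z in range(1, 11)]
fb, shaded, culled = render(frags, 1, 1)
assert fb[0][0] == "layer1" and shaded == 1 and culled == 9
```

With the same fragments submitted back to front, all ten would pass the test and be shaded, which is exactly the overdraw cost this hardware tries to avoid.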

The final feature of LMA II is what is known as auto pre-charge. Remember that unlike cache, DRAM is capacitance based and must be periodically refreshed. The benefit is that DRAM can be manufactured using far fewer costly components than cache, but the downside is that there are sometimes hefty delays in accessing data. One such delay occurs when attempting to read from or write to a part of memory other than the area currently being worked with. Auto pre-charge uses a bit of logic in the memory controller to effectively guess which rows and columns in the DRAM array will be accessed next. These rows and columns are then charged before they are used, reducing the time the GPU waits if it does end up requesting data from the precharged rows/columns (banks). Should the GPU not request data from those banks, then business proceeds as usual and nothing is lost.
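A hedged sketch of the idea (NVIDIA's actual prediction logic is undisclosed; the latencies and the simple stride predictor below are our own illustration): the controller guesses the next DRAM row from the recent access stride and opens it early, so a correct guess avoids the activate delay while a wrong guess costs nothing on this model.

```python
# Toy model of auto pre-charge. Latency numbers are illustrative, not real
# DRAM timings; the stride predictor stands in for whatever heuristic the
# hardware actually uses.

ROW_OPEN_LATENCY = 5   # cycles to activate a closed row
ROW_HIT_LATENCY = 1    # cycles when the row was opened ahead of time

def simulate(rows):
    last = predicted = None
    cycles = 0
    for row in rows:
        if row == predicted or row == last:
            cycles += ROW_HIT_LATENCY       # predicted (or still-open) row
        else:
            cycles += ROW_OPEN_LATENCY      # miss: pay the activate delay
        # predict a continuation of the current stride
        predicted = row + (row - last) if last is not None else None
        last = row
    return cycles

sequential = simulate(list(range(100)))       # streaming framebuffer access
random_ish = simulate([7, 91, 3, 55] * 25)    # scattered access pattern
assert sequential < random_ish
```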

Auto pre-charge, together with the NV25's Quad Cache, is very necessary in preventing the GPU from wasting precious clock cycles sitting idle, waiting for data from memory. Auto pre-charge won't offer more memory bandwidth directly; however, it does offer lower-latency memory accesses, which in turn translates into a greater percentage of usable memory bandwidth.



Accuview AA

We owe quite a bit to the late 3dfx, including some of our very first introductions to the world of true 3D gaming on the PC. One thing we most definitely have to "thank" 3dfx for is exposing us to anti-aliasing. After 3dfx demonstrated their VSA-100 architecture running then-current games with FSAA enabled, we were spoiled for life. Suddenly we began picking out aliasing in virtually every game we saw; even first person shooters, where aliasing wasn't supposed to be that big of a deal since everything moves so quickly, fell victim to our obsession with FSAA. Unfortunately, the hardware available back then wasn't powerful enough to allow FSAA to be enabled in most games. Finally, two years later, we are getting hardware that can run not only present-generation titles but, as you'll soon see, the next generation of games at high resolutions and high frame rates with AA enabled.

Going back to that picture of the NV25 core you'll notice that almost 13% of the die is dedicated to what NVIDIA calls their Accuview AA Engine. When you're dealing with a part that's as complex as the NV25, dedicating such a large portion of the die to a single feature must mean that the feature carries great importance in the eyes of the manufacturer. In this case it's clear that NVIDIA's goal is to not only offer AA as an option to everyone, but to make it as much of a standard as 32-bit color depths.

The Accuview AA engine gives the NV25 core higher performance AA, even to the point where Quincunx AA can be performed at exactly the same speed as 2X AA.
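As a rough illustration of how a quincunx-style resolve works: like 2X AA, only two samples per pixel are stored (so the rendering cost matches 2X), and the extra quality comes from the resolve filter, which blends a pixel's samples with samples borrowed from its neighbors in an X-shaped (quincunx) pattern. The sketch below is heavily simplified, storing one sample per pixel, and the 0.5/0.125 weights are a commonly cited description rather than NVIDIA's published filter.

```python
# Simplified quincunx-style resolve: each output pixel is a weighted blend
# of its own sample (weight 0.5, an assumed value) and four diagonal
# neighbors (weight 0.125 each). Real Quincunx stores two samples per pixel;
# we use one here to keep the sketch short.

def quincunx_resolve(samples, x, y):
    h, w = len(samples), len(samples[0])
    def tap(ty, tx):
        # clamp at the frame edge
        return samples[min(max(ty, 0), h - 1)][min(max(tx, 0), w - 1)]
    center = samples[y][x]
    corners = tap(y-1, x-1) + tap(y-1, x+1) + tap(y+1, x-1) + tap(y+1, x+1)
    return 0.5 * center + 0.125 * corners

frame = [[0, 0, 0], [0, 8, 0], [0, 0, 0]]    # one bright pixel
assert quincunx_resolve(frame, 1, 1) == 4.0  # hard edge is softened
assert quincunx_resolve(frame, 0, 0) == 1.0  # neighbor picks up 1/8 of it
```

The blur this filter introduces is also why Quincunx has its critics; it trades some texture sharpness for smoother edges at 2X cost.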

The new AA engine also allows for a new 4X AA mode only under Direct3D applications called 4XS. The difference between 4X and 4XS is that the latter offers more texture samples per pixel in order to generate a better looking AA image.



nView

One of the most useful features of the NV25 core is the inclusion of the nView multiple display core. Originally introduced alongside the GeForce2 MX as TwinView, nView is an extension of TwinView with much greater flexibility from a hardware standpoint. The NV25 core features dual 350MHz integrated RAMDACs thus allowing for dual CRTs without any external circuitry. Although the core itself doesn't feature any TMDS transmitters there is support for up to two external TMDS transmitters on the GeForce4 to drive dual digital flat panel displays.


nView setup is extremely easy to use

The hardware behind nView enables the technology, but it is the software that truly makes it a feature. From a software standpoint, NVIDIA managed to hire talent from the same company that ATI gained much of its Hydravision technology from - Appian. The end result is that nView is virtually identical to Hydravision in terms of features, but NVIDIA has made it slightly more user friendly by integrating a very simple setup wizard that can have any user up and running in as few as 11 clicks. The wizard clearly explains all of the major nView features and offers options to turn features like multiple desktops on or off.


nView allows you to make windows transparent when dragged, unfortunately with larger windows this can be quite slow


Multiple desktops can be navigated easily through Explorer integration of nView

Alongside the easy to use setup there are some other neat features that give nView a slight feature edge over ATI's Hydravision. For example, nView allows you to set a double right-click to open an HTML link in a new IE browser window on the second monitor. This is perfect for surfing websites or discussion forums, as you can keep the main page open while opening an article or thread on the second monitor just by double-clicking.


The nView extensions option appears in almost all right-click menus after installation

There are still features that both nView and Hydravision lack that give Matrox the slight leg up on both technologies but after using nView and Hydravision extensively for the past couple of weeks we can honestly say that NVIDIA has done an excellent job with nView on the GeForce4.



The GeForce4 Titanium Lineup

We've taken you this far without mentioning a single product based on the NV25 core, and now we're going to talk about just that. Today the GeForce4 is being introduced in two distinct flavors, the Titanium 4600 and the Titanium 4400. As was the case with the various GeForce3 Titanium cards, the two differ only in their GPU and memory clocks.

The GeForce4 Ti 4600 is NVIDIA's new flagship and will feature a 300MHz core clock. The card will also feature a 128-bit DDR memory bus running at 325MHz that connects to a total of 128MB of DDR SDRAM. The DDR SDRAM uses BGA packaging, which not only makes the memory much easier to lay out on the card but also lets NVIDIA benefit from reasonably low prices on 16MB-density BGA memory chips.

The 325MHz memory clock gives the Ti 4600 an incredible 10.4GB/s of raw memory bandwidth, more than the previous bandwidth champ, the Radeon 8500, with its 8.8GB/s. Also keep in mind that these figures don't take into account efficiency improvements such as LMA II on the GeForce4 or HyperZ II on the Radeon 8500.
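These raw bandwidth numbers follow directly from bus width and effective DDR clock; a quick check (the helper function is just our illustration):

```python
# Raw bandwidth = effective memory clock x bus width in bytes. DDR transfers
# data on both clock edges, so a 325MHz DDR clock moves data at an effective
# 650MHz.

def bandwidth_gbs(clock_mhz, bus_bits, ddr=True):
    transfers_per_sec = clock_mhz * 1e6 * (2 if ddr else 1)
    return transfers_per_sec * (bus_bits / 8) / 1e9

assert round(bandwidth_gbs(325, 128), 1) == 10.4   # GeForce4 Ti 4600
assert round(bandwidth_gbs(275, 128), 1) == 8.8    # Radeon 8500 / Ti 4400
```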



The GeForce4 Ti 4600 also sports NVIDIA's new reference cooling design on the core. The low profile heatsink is coupled with a much more durable fan than NVIDIA's older designs, although it's unlikely that third party manufacturers will stick solely to NVIDIA's design. As NVIDIA's flagship, you can also expect the Ti 4600 to carry NVIDIA's flagship price of $399.

Just beneath the Ti 4600 in the product line is the GeForce4 Ti 4400; this card features a 275MHz GPU clock and a 275MHz DDR memory clock (effectively 550MHz) on the same 128-bit DDR bus. The Ti 4400 will also ship with a 128MB frame buffer using BGA memory. The final decision on frame buffer size is up to board manufacturers, but we don't expect many to make Ti 4600 or Ti 4400 offerings with 64MB frame buffers at launch.

The use of a larger 128MB frame buffer enables game developers to use larger textures without fear of running out of texture memory on the card itself. Another interesting fact is that a 128MB frame buffer enables anti-aliasing at 1600 x 1200 however even with the GeForce4 you won't be able to turn on AA at such a high resolution and expect smooth frame rates in the latest games.

The new cooling system is also present on the Ti 4400 reference board, and these cards will carry a $299 price tag.

A third GeForce4 model will be introduced in approximately 8 weeks, and that is the GeForce4 Ti 4200. The Ti 4200 will be identical to the 4600 and 4400 except that it will run at 225MHz core and 250MHz memory. The clock speeds haven't been finalized yet but the truly attractive point of the GeForce4 Ti 4200 is its $199 MSRP. The Ti 4200 will actually end up playing an extremely important role in NVIDIA's lineup but in order to understand why, you'll have to understand the GeForce4 MX - NVIDIA's mainstream graphics part.



The NV17 comes home

We really tried our best to hint at where we thought this one was going around the Comdex timeframe with our article on the NV17M. If you remember, the NV17M was announced on the first day of last year's Comdex and interestingly enough the mobile GPU was announced without an official name.

It turned out that the NV17M was eventually going to be a desktop part as well under the codename NV17, which is what we had always known as the GeForce3 MX. The major problem with this name is that the NV17 core lacks all of the DirectX 8 pixel and vertex shader units that made the original GeForce3 what it was. Instead, the NV17 would basically be a GeForce2 MX with an improved memory controller, a multisample AA unit, and updated video features; another way of looking at it would be a GeForce3 without two of its pixel pipelines or DirectX 8 compliance. The problem most developers would have with this is that uneducated end users would purchase the GeForce3 MX believing it had at least the basic functionality of the regular GeForce3, only a bit slower, while in reality the GeForce3 MX would not allow developers to assume that a great portion of the market had DX8-compliant cards.

Luckily NVIDIA decided against calling the desktop NV17 the GeForce3 MX; unfortunately, they stuck with the name GeForce4 MX instead. This is even more misleading to those who aren't well informed, as it gives the impression that the card has at least the minimal feature set of the GeForce3 - which it doesn't.

More specifically, the GeForce4 MX features no DirectX 8 pixel shaders and only limited support for vertex shaders. The chip does support NVIDIA's Shading Rasterizer (NSR) from the original GeForce2 but that's definitely a step back from the programmable nature of the GeForce3 core.

NVIDIA did, however, make the GeForce4 MX very powerful at running present day titles. While the core features only two pixel pipelines, each capable of applying two textures per pixel (thus offering half the theoretical fill rate of an equivalently clocked GeForce4), it does feature the Lightspeed Memory Architecture II from its elder brother as well as a few other features.

The GeForce4 MX's memory controller isn't directly borrowed from the GeForce4; the only change is that the LMA II on the GeForce4 MX is made up of just two independent memory controllers. So instead of 4 x 32-bit load-balanced memory controllers, the GeForce4 MX has 2 x 64-bit load-balanced memory controllers. While it's unclear what performance difference, if any, exists between the two configurations (and it would also be very difficult to measure), it is very clear why NVIDIA chose this setup for the GeForce4 MX. Remember that the nForce chipset features a dual channel 64-bit DDR memory interface, and that the GeForce2 MX was the integrated graphics core of the first-generation nForce chipset. You can just as easily expect the GeForce4 MX core to be used in the next-generation nForce platform, especially since its memory controller will work so well there without modification.

The rest of the GeForce4 MX is virtually identical to that of the GeForce4; it features the same Accuview AA engine, the same nView support, and the same improved Visibility Subsystem. The only remaining difference is that the GeForce4 MX features dual integrated TMDS transmitters for dual DVI output of resolutions up to 1280 x 1024 per monitor. The inclusion of nView makes the GeForce4 MX a very attractive card, especially at the lower price points for corporate users looking for good dual monitor support.

Today the GeForce4 MX line consists of three cards: the MX 460, the MX 440 and the MX 420. While all three feature a 128-bit memory bus with a 64MB frame buffer, only the 460 and 440 use DDR SDRAM; the 420 uses conventional SDR SDRAM.



The GeForce4 MX 460 will be priced at $179 and it will have a 300MHz GPU clock and a 275MHz memory clock. The MX 440 will be clocked internally at 270MHz with a 200MHz memory clock and it will be priced at $149. Finally the entry-level GeForce4 MX 420 will be clocked at 250MHz core with 166MHz memory and it will retail for $99.

Now it's time to address the obvious tradeoffs that NVIDIA made on the GeForce4 MX. First of all there is the naming convention which developers have already been complaining to NVIDIA about. No longer can developers assume that going forward, their user base will have a DX8 compliant video card installed. You have to realize that well over 80% of all graphics cards sold are priced under $200. Since that is the market that the GeForce4 MX is after, should the product succeed then a great deal of that 80%+ of the market will not have DX8 support which means that developers will be less inclined to support those features in their games. For anyone that's ever seen what can be done with programmable pixel shaders, it's very clear that we want developers to use those features as soon as possible.

From NVIDIA's standpoint, there are no major titles that require those features, and the cost in die size of implementing DX8 pixel and vertex shaders is too great for a 0.15-micron part priced as low as $99. The GeForce4 MX will do quite well in today's titles while offering great features such as nView for those who care about more than just gaming.

The situation we're placed in is a difficult one: if you want cheap but excellent performance in today's games, it would seem that you're limited to the GeForce4 MX, yet by purchasing the GeForce4 MX you're actually working against developers bringing DX8 features to the mass market. This leaves you with three options; the first two are to stick with a GeForce3 Ti 200 or to go with ATI's 128MB Radeon 8500LE.

Remember the GeForce4 Ti 4200 we mentioned earlier? That's your third option. NVIDIA realizes that at the $199 price point the clear recommendation would be to stay far, far away from a GeForce4 MX, as you wouldn't be buying any future performance at all. At the same time, ATI realized the weakness in NVIDIA's product line (they've known about it for a while now) and positioned the 128MB Radeon 8500LE at a price point very similar to the GeForce4 MX 460's. When picking between those two, the Radeon 8500LE clearly offers better support for the future and is the better option. In order to prevent the Radeon 8500LE from completely taking over the precious $150 - $199 market, NVIDIA will be providing the GeForce4 Ti 4200 at the $199 price point. This part will offer full DX8 pixel and vertex shader support, yet it will be priced at a very competitive level.

Although this effectively makes the GeForce4 MX 460 pointless, as it is priced only $20 less, it does solve the issue of getting too many new non-DX8-compliant cards into the hands of end users. The important task now is to make sure that all serious gamers opt for at least the GeForce4 Ti 4200 and not a GeForce4 MX card. Again, the GeForce4 Ti 4200 won't be available for another 8 weeks or so, but it's definitely worth waiting for if you're going to be purchasing a sub-$200 gaming card.



The Test

We tested using an AMD Athlon XP 2000+ on an ASUS A7V266-E under Windows XP. All NVIDIA cards used the latest 27.30 drivers with GeForce4 support, while all ATI cards used the latest 7.66 beta drivers released on ATI's website.

Due to time constraints we were not able to include the GeForce4 Ti 4200, Kyro II and Radeon 7500 although we will be doing future comparisons with those cards included.

Memory Controller & Occlusion Culling Performance - VillageMark

VillageMark is a benchmark developed by PowerVR to show off the benefits of their tile-based rendering architecture, which you may be most familiar with from the Kyro II card. Based on a tile, or deferred, rendering architecture, the Kyro II's core had the unique capability of not rendering any pixels that wouldn't be seen by the user. The VillageMark test has an incredible amount of overdraw (the engine instructs the graphics unit to draw a large number of pixels that will never be seen) in order to stress the importance of features such as z-occlusion culling and other memory bandwidth optimization techniques.

While none of the architectures we're comparing today can be classified as deferred renderers, they do borrow some very advanced techniques from the deferred rendering world, and thus VillageMark becomes a great benchmark for them. This way we can find out exactly how much of an improvement NVIDIA's LMA II is over the original Lightspeed Memory Architecture.

VillageMark (frames per second)

NVIDIA GeForce4 Ti 4600: 118
NVIDIA GeForce4 Ti 4400: 107
NVIDIA GeForce4 240/500: 94
ATI Radeon 8500: 92
ATI Radeon 8500LE: 83
NVIDIA GeForce3 Ti 500: 75
NVIDIA GeForce3: 63
NVIDIA GeForce4 MX 460: 53
NVIDIA GeForce4 MX 440: 43
NVIDIA GeForce3 Ti 200: 37
NVIDIA GeForce2 Ti 200: 37
The first thing that should be mentioned is that the latest drivers from ATI resulted in a bit of a performance drop here. Normally the Radeon 8500 comes out with a 113 fps score, but with the latest drivers the score is pegged at 92 fps. Using the latest drivers fixed some significant bugs in one of the games we used to benchmark and thus we stuck with the latest drivers for this review.

Comparing the GeForce4 to the GeForce3 Ti 500 clock for clock, both running at 240/500, we can see that there is definitely a significant improvement in the GeForce4's Z-occlusion culling algorithms. While the improvement isn't the 50% that NVIDIA is boasting, the 25% improvement seen here is impressive nonetheless. The increased clock speeds of the shipping GeForce4 boards make the improvement all the more noticeable.
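For reference, the 25% figure comes straight from the VillageMark results at matched clocks:

```python
# Clock-for-clock VillageMark comparison: with both parts at 240/500, the
# frame rate gap isolates the architectural (occlusion culling) gains.
geforce4_240_500_fps = 94
geforce3_ti500_fps = 75

gain_pct = (geforce4_240_500_fps / geforce3_ti500_fps - 1) * 100
assert round(gain_pct) == 25
```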

The GeForce4's LMA II is clearly more than just a new marketing term.



Synthetic DirectX 8 Performance - 3DMark 2001

Next we'll take a look at some theoretical numbers from 3DMark 2001 in order to make some predictions about the overall performance of the contenders.

 
3DMark 2001 Results
(Game tests show Low / High detail frame rates; Fill Rate shows Single / Multi textured; High Poly shows 1 / 8 lights; N/A indicates a test the card could not run.)

NVIDIA GeForce4 Ti 4600
  Score 9814 | Car Chase 146 / 51.2 | Dragothic 182.6 / 105.4 | Lobby 132.6 / 60.2 | Nature 43.4
  Fill Rate 1038.8 / 2299 | High Poly 50 / 12.6 | EMBM 174.2 | DOT3 151.2 | Vertex Shader 101.9 | Pixel Shader 122 | Point Sprites 30.3

NVIDIA GeForce4 Ti 4400
  Score 9369 | Car Chase 144.4 / 51.2 | Dragothic 167.5 / 98.1 | Lobby 130.9 / 59.8 | Nature 38.1
  Fill Rate 912.4 / 2077.7 | High Poly 46.5 / 11.5 | EMBM 160.7 | DOT3 132.2 | Vertex Shader 92.3 | Pixel Shader 110.5 | Point Sprites 26

NVIDIA GeForce4 240/500
  Score 8733 | Car Chase 137.9 / 50.3 | Dragothic 153.1 / 85.3 | Lobby 127.8 / 59.4 | Nature 32.2
  Fill Rate 802.2 / 1817.2 | High Poly 38.4 / 8.1 | EMBM 146.1 | DOT3 117.4 | Vertex Shader 80.9 | Pixel Shader 97.6 | Point Sprites 19.8

ATI Radeon 8500
  Score 8526 | Car Chase 132.6 / 47.9 | Dragothic 133.2 / 79.5 | Lobby 128.3 / 57.7 | Nature 44.2
  Fill Rate 819 / 1807 | High Poly 36 / 9.7 | EMBM 112.6 | DOT3 87.6 | Vertex Shader 87.4 | Pixel Shader 101.5 | Point Sprites 28.1

NVIDIA GeForce3 Ti 500
  Score 8095 | Car Chase 126.5 / 50.5 | Dragothic 120.3 / 67.2 | Lobby 126.1 / 59.5 | Nature 41.1
  Fill Rate 739.7 / 1556.6 | High Poly 26.1 / 5.6 | EMBM 122.2 | DOT3 119.8 | Vertex Shader 55.3 | Pixel Shader 90.6 | Point Sprites 17.6

ATI Radeon 8500LE
  Score 8088 | Car Chase 126.9 / 49.6 | Dragothic 121.4 / 71.9 | Lobby 123.8 / 57 | Nature 39.9
  Fill Rate 743.2 / 1644.3 | High Poly 35.6 / 8.8 | EMBM 103.5 | DOT3 79.2 | Vertex Shader 79.2 | Pixel Shader 91.6 | Point Sprites 25.4

NVIDIA GeForce3
  Score 7139 | Car Chase 113 / 48.8 | Dragothic 109 / 57.7 | Lobby 117.6 / 56.8 | Nature 23.8
  Fill Rate 640 / 1326.5 | High Poly 23.7 / 5.1 | EMBM 109.9 | DOT3 105.2 | Vertex Shader 42.3 | Pixel Shader 77 | Point Sprites 14.8

NVIDIA GeForce3 Ti 200
  Score 6565 | Car Chase 101.8 / 47.2 | Dragothic 100.1 / 50.4 | Lobby 109.6 / 54.1 | Nature 20.8
  Fill Rate 558.4 / 1156.4 | High Poly 21.5 / 4.5 | EMBM 99.8 | DOT3 91.7 | Vertex Shader 36.9 | Pixel Shader 67.6 | Point Sprites 12.9

NVIDIA GeForce4 MX 460
  Score 6091 | Car Chase 112.5 / 45.4 | Dragothic 99.2 / 50.3 | Lobby 103 / 51.5 | Nature N/A
  Fill Rate 546.2 / 601.4 | High Poly 31.2 / 7.3 | EMBM N/A | DOT3 77.8 | Vertex Shader 48.2 | Pixel Shader N/A | Point Sprites 10.3

NVIDIA GeForce4 MX 440
  Score 5480 | Car Chase 95.5 / 43.7 | Dragothic 87.2 / 45.1 | Lobby 91.5 / 48.1 | Nature N/A
  Fill Rate 397.3 / 550.5 | High Poly 27.8 / 6.6 | EMBM N/A | DOT3 58.7 | Vertex Shader 44.1 | Pixel Shader N/A | Point Sprites 7.4

NVIDIA GeForce2 Ti 200
  Score 4941 | Car Chase 85 / 38.1 | Dragothic 82.4 / 38.5 | Lobby 85.5 / 44 | Nature N/A
  Fill Rate 326 / 623.7 | High Poly 26.9 / 6 | EMBM N/A | DOT3 50.8 | Vertex Shader 54.7 | Pixel Shader N/A | Point Sprites 8.6

We can already see that the GeForce4s are off to a strong start; the GeForce4, when clocked at 240/500 (the same speed as the GeForce3 Ti 500), is already significantly faster than its equivalently clocked predecessor. This is again strong evidence that the architectural improvements in the GeForce4 are significant enough to warrant the name change. Looking at vertex shader performance alone, going from a GeForce3 Ti 500 to a GeForce4 clocked at the same 240/500 speed yields a 46% increase, courtesy of the GeForce4's dual vertex shaders. Interestingly enough, the Radeon 8500's vertex shader unit still manages to perform quite well, defeated only by the higher-clocked GeForce4 Ti 4400 and Ti 4600.
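The 46% figure can be read straight off the table above:

```python
# Vertex shader scaling at matched 240/500 clocks, from the 3DMark 2001
# results; the only vertex-path difference between the two parts is the
# GeForce4's second vertex shader unit.
gf4_240_500_vertex = 80.9
gf3_ti500_vertex = 55.3

assert round((gf4_240_500_vertex / gf3_ti500_vertex - 1) * 100) == 46
```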

Polygon throughput has also improved tremendously with the GeForce4, which should help next-generation games such as Unreal Tournament II and Unreal 2; we will attempt to characterize that later on with the Unreal Performance Test 2002.

Do take note of the tests that the GeForce4 MX could not complete because of its lack of full DX8 compliance.



Quake III Arena

Quake III Arena 1.30 demo four
High Quality - 1024x768x32 (fps)

NVIDIA GeForce4 Ti 4600: 218.4
NVIDIA GeForce4 Ti 4400: 216
NVIDIA GeForce3 Ti 500: 211.2
ATI Radeon 8500: 202.6
NVIDIA GeForce3: 196.2
ATI Radeon 8500LE: 191.6
NVIDIA GeForce4 MX 460: 186.7
NVIDIA GeForce3 Ti 200: 181.9
NVIDIA GeForce4 MX 440: 153.8
NVIDIA GeForce2 Ti 200: 131.8

Starting out with a fairly basic game (by today's standards) at a fairly common resolution, we see that there's virtually no performance benefit from the new GeForce4s. Luckily for NVIDIA, running Quake III Arena at 1024 x 768 wasn't the main goal of the NV25, as you'll see going forward. The performance of all of these cards in this benchmark is beyond respectable, with the "slowest" card coming in above 130 fps.

The GeForce4 MX cards perform quite well in this test, outperforming not only the top-of-the-line GeForce2 but, in some cases, the GeForce3 Ti 200 as well. We also see that the Radeon 8500LE performs almost on par with the GeForce4 MX 460, yet it offers full DX8 pixel and vertex shader support at a very similar price point.

Quake III Arena
High Quality - 1280x1024x32 (fps)

NVIDIA GeForce4 Ti 4600: 195.6
NVIDIA GeForce4 Ti 4400: 183.7
NVIDIA GeForce3 Ti 500: 165.8
ATI Radeon 8500: 152
NVIDIA GeForce3: 146.2
ATI Radeon 8500LE: 138.2
NVIDIA GeForce4 MX 460: 134.4
NVIDIA GeForce3 Ti 200: 128.5
NVIDIA GeForce4 MX 440: 103.2
NVIDIA GeForce2 Ti 200: 84.3

The situation changes a bit as we increase the resolution to 1280 x 1024: the GeForce4 Ti 4400 is now almost 11% faster than the GeForce3 Ti 500, while the Ti 4600 is just shy of 18% faster.

Quake III Arena
High Quality - 1600x1200x32 (fps)

NVIDIA GeForce4 Ti 4600: 160.6
NVIDIA GeForce4 Ti 4400: 143.1
NVIDIA GeForce3 Ti 500: 123.5
ATI Radeon 8500: 111.4
NVIDIA GeForce3: 106.9
ATI Radeon 8500LE: 100.6
NVIDIA GeForce4 MX 460: 97.5
NVIDIA GeForce3 Ti 200: 92.8
NVIDIA GeForce4 MX 440: 73.1
NVIDIA GeForce2 Ti 200: 59.3

For our final test with Quake III Arena we have the classic 1600 x 1200 test. When the GeForce2 Ultra was released we were amazed at its ability to run Quake III above 60 fps at 1600 x 1200; today we have a card capable of running the game at that resolution at more than 160 fps.

Here the GeForce4 can truly begin to distance itself from the GeForce3 Ti 500, but this is far from where the card shines.



Serious Sam: The Second Encounter

When Serious Sam was released it instantly became a benchmarking favorite, courtesy of Croteam's attention to detail in making their engine as configurable as possible. The Second Encounter is a direct evolution of the original Serious Sam title with an even more stressful set of options within the engine. We ran the benchmark using the publicly available demo version of The Second Encounter with the demo's built-in extreme defaults enabled. The only exceptions were that anisotropic filtering was disabled, as was Truform on all ATI cards, to prevent unfairly penalizing their performance.

Serious Sam: The Second Encounter
Fill Rate (Mpixels/s)

NVIDIA GeForce4 Ti 4600: 764.77
NVIDIA GeForce4 Ti 4400: 645.77
NVIDIA GeForce3 Ti 500: 559.27
NVIDIA GeForce3: 484
ATI Radeon 8500: 467.66
NVIDIA GeForce4 MX 460: 449.69
NVIDIA GeForce3 Ti 200: 421.44
ATI Radeon 8500LE: 368.16
NVIDIA GeForce2 Ti 200: 331.04
NVIDIA GeForce4 MX 440: 316.53

As a prelude to the actual performance benchmarks we'll use Serious Sam's built-in synthetic test to get an idea of the fill rates we should expect the various cards to push. Here the GeForce4 Ti 4600 comes out with a 36% higher fill rate than the GeForce3 Ti 500, which will definitely translate into a real-world performance improvement, especially once we get into more memory bandwidth limited situations.
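It's also worth putting these measured numbers next to each card's theoretical peak; a rough sketch, assuming the cards' published core clocks (300MHz for the Ti 4600, 240MHz for the Ti 500) and four pixel pipelines apiece:

```python
# Measured in-game fill rate as a fraction of theoretical peak.
# Peak assumes one pixel per pipeline per clock (no multitexturing).
def peak_fill_mpixels(core_mhz: float, pipelines: int) -> float:
    return core_mhz * pipelines  # MPixels/s

measured = {"GeForce4 Ti 4600": 764.77, "GeForce3 Ti 500": 559.27}
peak = {"GeForce4 Ti 4600": peak_fill_mpixels(300, 4),  # 1200 MPixels/s
        "GeForce3 Ti 500": peak_fill_mpixels(240, 4)}   # 960 MPixels/s

for card, rate in measured.items():
    # roughly 64% and 58% respectively: memory bandwidth, not the
    # pixel pipelines, is what holds real-world fill rate back
    print(f"{card}: {rate / peak[card]:.0%} of theoretical peak")
```

The GeForce4's higher efficiency here lines up with its wider, more effective memory subsystem.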

Serious Sam: The Second Encounter
Polygon Throughput (MTriangles/sec)

NVIDIA GeForce4 Ti 4600: 9.78
NVIDIA GeForce4 Ti 4400: 9.07
NVIDIA GeForce3 Ti 500: 6.99
NVIDIA GeForce3: 6.03
NVIDIA GeForce3 Ti 200: 5.26
NVIDIA GeForce2 Ti 200: 4.44
NVIDIA GeForce4 MX 460: 3.77
NVIDIA GeForce4 MX 440: 2.86
ATI Radeon 8500: 2.13
ATI Radeon 8500LE: 2.08

Here we notice some serious issues with the latest Radeon 8500 drivers under Serious Sam, as we know the cards should be performing much, much better than they are here.

What's interesting to note is that the Ti 4600 not only offers a 36% higher fill rate than the GeForce3 Ti 500 but also pushes 39% more polygons than the previous-generation flagship. This means that regardless of whether a situation is fill rate/memory bandwidth limited or geometry limited, the Ti 4600 will produce significantly faster results than even the high-performing Ti 500.

We won't spend too much more time on what to expect however and we'll just dive into what we actually ended up seeing...

Serious Sam: The Second Encounter
High Quality - 1024x768x32 (fps)

NVIDIA GeForce4 Ti 4600: 109.7
NVIDIA GeForce4 Ti 4400: 99.7
NVIDIA GeForce3 Ti 500: 78.9
NVIDIA GeForce3: 64.9
NVIDIA GeForce3 Ti 200: 59.9
NVIDIA GeForce4 MX 460: 50.1
ATI Radeon 8500: 46.8
NVIDIA GeForce2 Ti 200: 42.9
ATI Radeon 8500LE: 41.4
NVIDIA GeForce4 MX 440: 39.6

The first thing we've got to talk about, before we even touch on the GeForce4, is the Radeon 8500. ATI's latest beta drivers, although fixing a number of problems in arguably more important benchmarks (e.g. UPT2002), have reduced performance significantly under Serious Sam. As you can tell from the performance chart above, the Radeon 8500 drops from where it normally resides, between the GeForce3 Ti 200 and the GeForce3, down to the level of the GeForce2 Ti 200. ATI is approximately a month away from shipping a WHQL certified version of these drivers (v7.66), which should hopefully have all of the kinks worked out.

Moving on to the GeForce4, now we're able to see this card shine. The Ti 4600 is an astounding 39% faster than the GeForce3 Ti 500, while the Ti 4400 still holds an impressive 26% lead over the previous flagship.

The GeForce4 MX doesn't perform all too well in this test, falling behind even the GeForce3 Ti 200. It seems the two pixel pipeline limitation actually shows in this and other newer games, causing the new MX series to lag behind a bit.

Serious Sam: The Second Encounter
High Quality - 1280x1024x32 (fps)

NVIDIA GeForce4 Ti 4600: 76
NVIDIA GeForce4 Ti 4400: 67
NVIDIA GeForce3 Ti 500: 52.4
NVIDIA GeForce3: 45.4
NVIDIA GeForce3 Ti 200: 39.7
NVIDIA GeForce4 MX 460: 32.9
ATI Radeon 8500: 28.6
ATI Radeon 8500LE: 25.5
NVIDIA GeForce2 Ti 200: 25.4
NVIDIA GeForce4 MX 440: 24.3

Moving on to 1280 x 1024, the Ti 4600 is a full 45% faster than the GeForce3 Ti 500. Normally with each sequential GPU release we see small, arguably insignificant performance improvements; this alone is proof that the GeForce4 is a bigger step forward than anything NVIDIA has put forth in their very successful reign.

The GeForce4 MX 460 remains just below the GeForce3 Ti 200, with the Radeon 8500 clocking in just below it. With updated drivers the Radeon 8500 could easily pull ahead to the level of the GeForce3; however, we'll have to wait a little while longer for those.

The GeForce4 Ti 4200 should offer performance somewhere between the GeForce3 Ti 500 and the GeForce4 Ti 4400, although we didn't have enough time to run a quick set of benchmarks at the 225/500 clock speed.

Serious Sam: The Second Encounter
High Quality - 1600x1200x32 (fps)

NVIDIA GeForce4 Ti 4600: 55
NVIDIA GeForce4 Ti 4400: 47.5
NVIDIA GeForce3 Ti 500: 36.9
NVIDIA GeForce3: 31.8
NVIDIA GeForce3 Ti 200: 28
NVIDIA GeForce4 MX 460: 25.2
ATI Radeon 8500: 21.9
ATI Radeon 8500LE: 17.6
NVIDIA GeForce2 Ti 200: 17.2
NVIDIA GeForce4 MX 440: 17.2

We finish off with scores at 1600 x 1200, where the Ti 4600 isn't quite able to reach 60 fps; that's still not bad considering the old flagship GeForce3 Ti 500 can't break 40 fps. The rest of the cards clearly can't cope with the higher resolution, as it takes the incredible fill rate and memory bandwidth of a card like the GeForce4 Ti 4400 or 4600 to run at 1600 x 1200 relatively smoothly.

We'd still opt for a lower resolution in order to get a higher frame rate and one that's more consistently above 60 fps.



Return to Castle Wolfenstein

We started out this comparison with Quake III Arena as the benchmark of choice and pointed out its age, as all competing cards were able to produce frame rates over 130 fps. Return to Castle Wolfenstein gives us a much more up-to-date implementation of the Quake III engine. This implementation is definitely much more stressful, as we're not able to see the same kinds of frame rates, although the standings don't change all too much. An interesting note is that the 128MB frame buffers of the GeForce4 cards do come in handy when all of the texture quality and detail settings are turned up, as they significantly reduce the amount of variation in frame rates.
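A back-of-the-envelope sketch shows why the extra memory matters once detail settings go up; this ignores driver and AGP overhead, and the surface count is illustrative (double-buffered color plus a depth/stencil buffer):

```python
# Rough frame buffer footprint at 32-bit color: front + back color
# buffers plus a 32-bit depth/stencil buffer (3 surfaces total).
def framebuffer_mb(width: int, height: int,
                   bytes_per_pixel: int = 4, surfaces: int = 3) -> float:
    return width * height * bytes_per_pixel * surfaces / (1024 ** 2)

fb = framebuffer_mb(1600, 1200)  # about 22 MB at 1600 x 1200
for card_mb in (64, 128):
    # the remainder is what's left for textures before swapping over AGP
    print(f"{card_mb}MB card: ~{card_mb - fb:.0f}MB left for textures")
```

Doubling the frame buffer roughly two-and-a-half times the texture headroom at this resolution, which is exactly where the frame rate variation shows up.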

As usual, we tested with the default high quality settings enabled and simply adjusted the resolution from 1024 x 768 up to 1600 x 1200.

Return to Castle Wolfenstein
atdemo1 - 1024x768x32 (fps)

NVIDIA GeForce4 Ti 4600: 150.5
NVIDIA GeForce4 Ti 4400: 149.6
NVIDIA GeForce3 Ti 500: 143.5
NVIDIA GeForce3: 139.9
ATI Radeon 8500: 138.4
NVIDIA GeForce4 MX 460: 133.8
NVIDIA GeForce3 Ti 200: 133
ATI Radeon 8500LE: 132.6
NVIDIA GeForce4 MX 440: 116.9
NVIDIA GeForce2 Ti 200: 100.8

Just as was the case with Quake III Arena, initially at 1024 x 768 we don't see much variation between the top few cards. The GeForce4 does not distance itself from the GeForce3 all that much, and the GeForce4 MX performs quite well, even outperforming the GeForce3 Ti 200.

Return to Castle Wolfenstein
atdemo1 - 1280x1024x32 (fps)

NVIDIA GeForce4 Ti 4600: 144.7
NVIDIA GeForce4 Ti 4400: 138
NVIDIA GeForce3 Ti 500: 111.5
NVIDIA GeForce3: 110.8
ATI Radeon 8500: 107.6
NVIDIA GeForce4 MX 460: 103.7
NVIDIA GeForce3 Ti 200: 101.6
ATI Radeon 8500LE: 96
NVIDIA GeForce4 MX 440: 85.9
NVIDIA GeForce2 Ti 200: 66.9

Increasing the resolution one notch separates the results much more quickly than under Quake III Arena, with the two GeForce4 cards distancing themselves from the Ti 500 by a good 24 - 29% margin.

The raw memory bandwidth of the GeForce4 MX 460 is what keeps its performance up, just on the heels of the Radeon 8500 and even slightly ahead of the GeForce3 Ti 200. Unlike Serious Sam, the two pixel pipeline limitation of the MX's architecture does not pose a significant setback under RtCW. We mustn't forget that the game is still based on the Quake III engine, which was released when cards had no more than two rendering pipelines, whereas Serious Sam is a much newer engine.

Return to Castle Wolfenstein
atdemo1 - 1600x1200x32 (fps)

NVIDIA GeForce4 Ti 4600: 125
NVIDIA GeForce4 Ti 4400: 112.8
NVIDIA GeForce3 Ti 500: 90.6
NVIDIA GeForce3: 82.4
ATI Radeon 8500: 79.3
NVIDIA GeForce4 MX 460: 78.8
ATI Radeon 8500LE: 73.2
NVIDIA GeForce3 Ti 200: 73.1
NVIDIA GeForce4 MX 440: 61.6
NVIDIA GeForce2 Ti 200: 44.8

At 1600 x 1200 we once again see the same stellar performance out of the GeForce4 series; the MX 460 continues to do well here too.



Unreal Performance Test 2002 - Build 856

We've of course saved the best for last. About a week ago we introduced a brand new way of measuring video card performance through the use of the latest build of the current Unreal Engine that will be used in upcoming games such as Unreal Tournament II and Unreal 2. We began using this Unreal Engine benchmark as a method of not only characterizing the performance of today's graphics cards in tomorrow's games, but also as a way of helping developers such as Epic work more closely with the hardware vendors.

After publishing our initial article we asked Epic Games' founder and lead programmer, Tim Sweeney, about his perception of the immediate results of using the Unreal Performance Test 2002:

"Behind the scenes, the benchmark has really helped focus NVIDIA and ATI on next-generation game performance -- and move away from using vintage 1999 games like Unreal Tournament and Quake 3 to judge new 3D cards. The improved interaction between our team and the hardware makers is leading to a lot of performance improvements that definitely wouldn't have happened without the benchmark."

Today's GeForce4 review provides an excellent environment in which to continue using the Unreal Performance Test 2002. Daniel Vogel, another talented programmer at Epic, spent a great deal of time putting the finishing touches on a new build of the benchmark that includes indoor as well as outdoor arenas. At Epic's request we cannot publish screenshots of the benchmark; however, there are already a number of screenshots on Epic's website if you're interested in seeing exactly what this engine is capable of.

In the original article on the new benchmark we outlined the type of stresses that this test will provide that previous benchmarks couldn't, for more information please feel free to read the first article.

Unreal Performance Test 2002
1024x768x32 (fps)

NVIDIA GeForce4 Ti 4600: 85.6
NVIDIA GeForce4 Ti 4400: 82.8
NVIDIA GeForce4 240/500: 75
NVIDIA GeForce3 Ti 500: 65.3
NVIDIA GeForce3: 62.8
ATI Radeon 8500: 58.7
ATI Radeon 8500LE: 55.8
NVIDIA GeForce3 Ti 200: 55.5
NVIDIA GeForce2 Ti 200: 37.8
NVIDIA GeForce4 MX 460: 32.4
NVIDIA GeForce4 MX 440: 25.8

First of all, it's very important to note that the flickering fog issues we complained about on the Radeon 8500 in our original article have been fixed with the latest beta drivers (v7.66) from ATI. At the same time, performance decreased with this latest driver revision, giving the GeForce3 Ti 500 the lead where the Radeon 8500 used to be on top. Whether or not this is related to the flickering fog fix, we are still not sure.

We also threw in another data point to look at: the GeForce4 clocked at 240/500, the exact operating frequency of the GeForce3 Ti 500. This helps illustrate the performance improvement that results from the GeForce4's architectural enhancements rather than just the clock speed boosts. Here we can see that a 15% boost in performance can be attributed to the architectural improvements of the GeForce4 alone.

Combining that 15% improvement with the increase in GPU and memory clock speeds results in a 31% performance improvement over the Ti 500 for the GeForce4 Ti 4600. In a next-generation first person shooter, the GeForce4 is able to already run at almost 90 fps without even being optimized for the game.

A significant disappointment is the GeForce4 MX, which fails to even outperform the GeForce2 Ti 200. This is exactly why we recommend either going for the GeForce3 Ti 200 or the Radeon 8500LE, or waiting for the GeForce4 Ti 4200; and this is only at 1024 x 768.

Unreal Performance Test 2002
1280x1024x32 (fps)

NVIDIA GeForce4 Ti 4600: 67.3
NVIDIA GeForce4 Ti 4400: 61
NVIDIA GeForce4 240/500: 54.6
NVIDIA GeForce3 Ti 500: 47.2
NVIDIA GeForce3: 43.9
ATI Radeon 8500: 42.3
ATI Radeon 8500LE: 38.6
NVIDIA GeForce3 Ti 200: 38.5
NVIDIA GeForce2 Ti 200: 24.3
NVIDIA GeForce4 MX 460: 23
NVIDIA GeForce4 MX 440: 17.8

The performance standings don't change at 1280 x 1024, but now it takes no less than a GeForce4 Ti 4400 to break the 60 fps barrier. Granted, we're looking at quite possibly the worst case scenario for this engine, but that should make you appreciate the power of the GeForce4 even more. The 4600 still performs very well, while the GeForce4 MX cards make it clear that the only thing they share with the Ti line is the name.

Unreal Performance Test 2002
1600x1200x32 (fps)

NVIDIA GeForce4 Ti 4600: 49.3
NVIDIA GeForce4 Ti 4400: 44.1
NVIDIA GeForce4 240/500: 39
NVIDIA GeForce3 Ti 500: 32.1
NVIDIA GeForce3: 29.4
ATI Radeon 8500: 28.5
ATI Radeon 8500LE: 26.3
NVIDIA GeForce3 Ti 200: 26
NVIDIA GeForce4 MX 460: 16.6
NVIDIA GeForce2 Ti 200: 15.5
NVIDIA GeForce4 MX 440: 12.8

When we first looked at performance under UPT2002 we said that 1600 x 1200 was pretty much wishful thinking; with the GeForce4 that all changes. Granted, the 4600 isn't at 60 fps just yet, but it is very close, and that is without any significant driver or code optimizations, as the engine is still not in a completed state.



AA Performance

In order to investigate AA performance we used the Unreal Performance Test 2002 and tried the various AA settings on each of the cards. We'll save a visual comparison for a future article, as we've already compared most of these algorithms.

Unreal Performance Test 2002
2X AA - 1024x768x32 (fps)

NVIDIA GeForce4 Ti 4600: 78.8
NVIDIA GeForce4 Ti 4400: 71.8
NVIDIA GeForce3 Ti 500: 60.1
ATI Radeon 8500: 46.4
NVIDIA GeForce3: 40.5
NVIDIA GeForce3 Ti 200: 33.5
NVIDIA GeForce2 Ti 200: 27.3
NVIDIA GeForce4 MX 460: 25.2
ATI Radeon 8500LE: 20.3
NVIDIA GeForce4 MX 440: 18

This is quite possibly the most impressive part; the GeForce4 Ti 4600 can run the current build of the Unreal Engine, Build 856, through a very stressful benchmark at 1024 x 768 with 2X AA enabled at close to 80 fps. The GeForce4 continues to illustrate why it's truly a card built for the future. At the same time, the GeForce4 MX makes it clear that the mainstream won't be performing nearly as well as the rest of the market.

Unreal Performance Test 2002
Quincunx AA - 1024x768x32 (fps)

NVIDIA GeForce4 Ti 4600: 78.8
NVIDIA GeForce4 Ti 4400: 71.8
NVIDIA GeForce3 Ti 500: 60.1
NVIDIA GeForce3: 35.9
NVIDIA GeForce3 Ti 200: 33.4
NVIDIA GeForce4 MX 460: 28.3
NVIDIA GeForce4 MX 440: 18

Obviously the Quincunx AA mode is limited only to NVIDIA cards and thus the results are quite predictable.

Unreal Performance Test 2002
4X AA - 1024x768x32 (fps)

NVIDIA GeForce4 Ti 4600: 51.2
NVIDIA GeForce4 Ti 4400: 45.2
NVIDIA GeForce3 Ti 500: 37
NVIDIA GeForce3: 27.1
NVIDIA GeForce3 Ti 200: 25.6
NVIDIA GeForce4 MX 460: 22.8
ATI Radeon 8500: 22.2
ATI Radeon 8500LE: 20.3
NVIDIA GeForce4 MX 440: 18
NVIDIA GeForce2 Ti 200: 16.2

Because of NVIDIA's advanced multisampling AA algorithm, the performance hit incurred when going to 4X AA is minimal, resulting in relatively decent frame rates even with 4X enabled. The reality of the matter, however, is that 4X AA will be pushing it a little too far in some next-generation games, making 2X AA, or simply increasing the resolution, a much better alternative.
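The reason the multisampling hit stays comparatively small can be seen in a toy cost model; this is a simplification of how multisampling differs from supersampling in general, not a description of NVIDIA's actual hardware:

```python
# Toy AA cost model. Supersampling textures and shades every
# sub-sample; multisampling shades each pixel once and only writes
# color/z to the covered sub-samples, so shading cost stays flat as
# the sample count rises while only frame buffer traffic scales.
def ssaa_cost(samples: int) -> dict:
    return {"shading_x": samples, "framebuffer_x": samples}

def msaa_cost(samples: int) -> dict:
    return {"shading_x": 1, "framebuffer_x": samples}

for n in (2, 4):
    print(f"{n}X: SSAA {ssaa_cost(n)} vs MSAA {msaa_cost(n)}")
```

This is why 4X multisampling costs the GeForce4 mostly memory bandwidth, while 4X supersampling on the Radeon costs bandwidth and fill rate alike.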

Unreal Performance Test 2002
4XS AA - 1024x768x32 (fps)

NVIDIA GeForce4 Ti 4600: 45
NVIDIA GeForce4 Ti 4400: 39.5
NVIDIA GeForce4 MX 460: 22.8
NVIDIA GeForce4 MX 440: 18

The new 4XS AA mode is again NVIDIA-only and improves image quality at the cost of a small performance hit. The option is only present under Direct3D, limiting its use even more.



Anisotropic Filtering Performance

With the release of ATI's 7.66 beta drivers, an anisotropic filtering control panel has been integrated into the Radeon's driver set, making it much more user friendly to enable this higher quality filtering. The new slider actually selects the maximum degree of texture anisotropy that will be used; the Radeon hardware calculates what level of anisotropy should be applied depending on the situation, giving the Radeon 8500 a much smaller performance hit when anisotropic filtering is enabled.
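Conceptually, the adaptive scheme looks something like the sketch below; the function name and the power-of-two stepping are our own illustration of the idea, not ATI's actual hardware logic:

```python
# Sketch of adaptive anisotropic filtering: the driver slider sets a
# *maximum* degree, and the per-pixel degree is derived from how
# elongated the texture footprint is on screen.
def select_aniso_degree(footprint_major: float, footprint_minor: float,
                        max_degree: int = 16) -> int:
    """Clamp the footprint aspect ratio to the user-selected maximum."""
    ratio = footprint_major / max(footprint_minor, 1e-6)
    degree = 1
    while degree * 2 <= max_degree and degree < ratio:
        degree *= 2  # step in powers of two, as filtering hardware does
    return degree

print(select_aniso_degree(1.0, 1.0))   # prints 1: screen-facing surface
print(select_aniso_degree(8.0, 1.0))   # prints 8: oblique surface
print(select_aniso_degree(30.0, 1.0))  # prints 16: clamped at the slider
```

Most pixels in a scene face the camera fairly squarely, so they get cheap low-degree filtering; only the oblique surfaces pay the full 16-tap cost, which is why the average hit is small.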

Unreal Performance Test 2002
Anisotropic Filtering
16-tap - 1024x768x32 (fps)

NVIDIA GeForce4 Ti 4600: 61.4
NVIDIA GeForce4 Ti 4400: 56.3
ATI Radeon 8500: 51.4
NVIDIA GeForce3 Ti 500: 51.3
NVIDIA GeForce3: 47.5
NVIDIA GeForce3 Ti 200: 42
NVIDIA GeForce2 Ti 200: 32.4
NVIDIA GeForce4 MX 460: 18.7
NVIDIA GeForce4 MX 440: 15.1

Here we see that although the GeForce4 drops significantly in performance, the Radeon 8500 takes a very small performance penalty and is all of a sudden within 20% of the fastest GeForce4; again, this is because of the Radeon's more efficient anisotropic filtering algorithm.

Unreal Performance Test 2002
Anisotropic Filtering
16-tap and 2X FSAA - 1024x768x32 (fps)

NVIDIA GeForce4 Ti 4600: 57.2
NVIDIA GeForce4 Ti 4400: 51.9
NVIDIA GeForce3 Ti 500: 47.7
ATI Radeon 8500: 26.8
NVIDIA GeForce3: 26.5
NVIDIA GeForce3 Ti 200: 24.7
NVIDIA GeForce2 Ti 200: 22.4

One thing we proved in our last AA investigation is that for the visual quality of NVIDIA's multisampling AA to equal that of ATI's supersampled AA, on both an anti-aliasing and a texture quality level, NVIDIA's cards had to be run with 16-tap anisotropic filtering enabled in conjunction with their AA mode. Enabling anisotropic filtering on the Radeon 8500 won't hurt performance all too much, but turning it on alongside the Radeon's supersampling AA reduces performance significantly.

The GeForce4 takes much less of a hit when simply enabling 2X AA because of its more efficient multisampling algorithm.

Unreal Performance Test 2002
Anisotropic Filtering
16-tap and 4X FSAA - 1024x768x32 (fps)

NVIDIA GeForce4 Ti 4600: 43.6
NVIDIA GeForce4 Ti 4400: 38.2
NVIDIA GeForce3 Ti 500: 28.7
NVIDIA GeForce3: 19.8
ATI Radeon 8500: 18.9
NVIDIA GeForce3 Ti 200: 18.7
NVIDIA GeForce2 Ti 200: 11.8

The same story can be told with 16-tap anisotropic filtering enabled and 4X AA. This time there is much more of a performance difference between the GeForce4 Ti 4600 and the GeForce3 Ti 500 because of the increase in memory bandwidth dependency.



Final Words

The past couple of launches from NVIDIA have been of products that have never really surpassed their predecessors by more than 10 - 20% initially. The gaps eventually extended beyond that but never has an NVIDIA launch been as exciting from a pure performance standpoint as today's launch of the GeForce4. Many situations placed the GeForce4 Ti 4600 anywhere from 20% to over 50% faster than the GeForce3 Ti 500, and that's without even turning on anti-aliasing.

The GeForce4 is not only capable of quite a bit in today's games but it will be a serious contender in titles that will be hitting the streets in the coming months before the end of the year. The improvements to the architecture of the GeForce3 have definitely given the GeForce4 a decent boost in performance, and coupled with more efficient management of die space (removing some portions of the die, optimizing others) NVIDIA is sitting on a very impressive GPU.

The GeForce4 is easily the crowned victor of the Unreal Performance Test 2002, at least for the next few months; its feature set is made complete by improved AA performance and nView.

As a multi-monitor solution, nView is excellent; it is everything ATI's Hydravision is, and it's even easier to use. The fact that NVIDIA has integrated nView into all of their cards indicates their belief in the technology, and it truly is useful, capable of greatly improving productivity.

We can't help feeling more than a little disappointed by the GeForce4 MX, however. The $99 GeForce4 MX 420 will be a good replacement for the current GeForce2 MX, and the $149 GeForce4 MX 440 may be justifiable, but the $179 MX 460 makes very little sense to us. Even the 440 could use some more justification to make perfect sense for the user who does care about gaming. Our recommendation is to stay away from the MX 460 and wait a few more weeks for the GeForce4 Ti 4200; pay the extra $20 and gain full DirectX 8 pixel and vertex shader support as well as two additional rendering pipelines.

The GeForce4 Ti 4200 is the perfect example of a product that would have never seen the light of day had it not been for competition from ATI, which clearly reiterates the point that competition definitely helps an industry, especially when it's competition between two capable firms such as ATI and NVIDIA.

ATI will not have anything spectacular to respond to the GeForce4 with immediately, however they are very confident that their next part will be improved in every way imaginable and thus very impressive. The usual complaint about drivers is the first thing that comes up whenever the words ATI and potential are used in the same sentence but from what we're hearing from developers, it seems as if ATI is much more responsive now to driver problems than they've ever been in the past.

For now, the GeForce4 is the first NVIDIA GPU in recent history to hit the market in such impressive fashion. The GeForce3 was exciting but offered limited performance gains; the Ti 500 gave us another boost but nothing too impressive. The GeForce4, however, is not only improved architecturally, but its raw power translates into serious performance gains even today, not to mention feature improvements such as Accuview AA and nView multi-monitor support.

From the standpoint of the competition, ATI seems relatively unimpressed with the GeForce4's specifications from what we've heard. Whether or not this means that the supposed R300 core is just that much better is up for speculation...
