Original Link: http://www.anandtech.com/show/1717
NVIDIA's GeForce 7800 GTX Hits The Ground Running
by Derek Wilson on June 22, 2005 9:00 AM EST
Introduction

A vast expanse of destruction lies before you. Billowing blue smoke rises from the ashes of the destroyed city, and flames continue to lick towards the sky. The horizon shimmers from the heat waves and smoke emanating from the rubble. As you proceed into the wreckage, your boots splash through puddles, sending out ripples and churning up the ashes. One of the buildings appears to have escaped most of the force of the blast, so you head towards it hoping to find some shelter and a place to relax for a moment.
A glint of light reflects off of the cracked windows, and you instinctively dive to the ground. A split second later, the glass shatters and fragments rain down around you as the bullet misses its intended mark. You roll to the side and watch as dirt and rubble plumes into the air from the spot you so recently occupied. As you marvel at the small particles of dirt scattering into the air, you realize it's already too late; you're too far from cover and the sniper is skilled. As your body slams towards the ground and the scene fades to black, you're glad to know that this was only a game, regardless of how lifelike it appears...
That's not a description of any actual game, but it could be in the very near future judging by the progress we continue to see on the graphics front. The attempt to bring such visions to life is reason enough for us to encourage and revere continued excellence in the field of computer graphics. The ongoing struggle between ATI and NVIDIA to bring forth the most parallel and powerful GPUs at reasonable prices opens new possibilities to developers, pushing them to create content beyond the realm of dreams and move onto ground where angels fear to tread: reality. With each successive generation we work our way closer and closer to blurring the line between reality and rendering, while every step leaves us wanting more. Once again it is time to check in on our progress down the infinite road to graphical perfection.
The latest offering from NVIDIA does not bring a host of new features or an upgraded shader model version as the past few generations have. The NV4x architecture remains a solid base for this product, as the entire DirectX 9 feature set was already fully supported in hardware. Though the G70 (yes, the name change was just to reconcile code and marketing names) is directly based on the NV4x architecture, there are quite a few changes to the internals of the pipelines as well as an overall increase in the width and clock speed of the part. This update much resembles what we saw when ATI moved from R300 to R420, in that most of the features and block diagrams are the same as last year's part with a few revisions here and there to improve efficiency.
One of the most impressive aspects of this launch is that the part is available now. I mean right now. Order it today and plug it in tomorrow. That's right, not only has NVIDIA gotten the part to vendors, but vendors have gotten their product all the way to retailers. This is unprecedented for any graphics hardware launch in recent memory. In the midst of all the recent paper launches in the computer hardware industry, this move is a challenge to all other hardware design houses.
ATI is particularly on the spot after today. Their recent history of announcing products that don't see any significant volume in the retail market for months is disruptive in and of itself. Now that NVIDIA has made this move, ATI absolutely must follow suit. Over the past year, the public has been getting quite tired of failed assurances that product will be available "next week". This very refreshing blast of availability is long overdue. ATI cannot afford to have R520 availability "soon" after launch; ATI must have products available for retail purchase at launch.
We do commend NVIDIA for getting product out there before launching it. But now we move on to the least pleasant side of this launch: price. The GeForce 7800 GTX will cost a solid $600. Of course, we do expect retailers to charge a premium for the early adopters. Prices we are seeing at launch are on the order of $650. This means those who want to build an SLI system based on the GeForce 7800 GTX will be paying between $1200 and $1300 just for the graphics component of their system.
So, what exactly is bigger, better, and faster this time around? And more importantly, what does that mean for game performance and quality (and is it worth the price)? This is the right place to find the answers. As developers continue to grow in shader prowess, we expect to see hardware of this generation stretch its legs even more, as NVIDIA believes this is the point where pure math and shader processing power will become the most important factor in graphics hardware.
The Pipeline Overview

First, let us take a second to run through NVIDIA's architecture in general. DirectX or OpenGL commands and HLSL and GLSL shaders are translated and compiled for the architectures. Commands and data are sent to the hardware where we go from numbers, instructions and artwork to a rendered frame.
The first major stop along the way is the vertex engine where geometry is processed. Vertices can be manipulated using math and texture data, and the output of the vertex pipelines is passed on down the line to the fragment (or pixel) engine. Here, every pixel on the screen is processed based on input from the vertex engine. After the pixels have been processed for all the geometry, the final scene must be assembled based on color and z data generated for each pixel. Anti-aliasing and blending are done into the framebuffer for final render output in what NVIDIA calls the render output pipeline (ROP). Now that we have a general overview, let's take a look at the G70 itself.
The G70 GPU is quite a large IC. Weighing in at 302 million transistors, we would certainly hope that NVIDIA packed enough power in the chip to match its size. The 110nm TSMC process will certainly help with die size, but that is quite a few transistors. The actual die area is only slightly greater than NV4x. In fact, NVIDIA is able to fit the same number of ICs on a single wafer.
A glance at a block diagram of the hardware gives us a first look at the methods by which NVIDIA increased performance this time around.
The first thing to notice is that we now have 8 (up from 6) vertex pipelines. We still aren't vertex processing limited (except in the workstation market), but this 33% upgrade in vertex power will help to keep the extra pixel pipelines fed as well as handle any added vertex load developers try to throw at games in the near future. There are plenty of beautiful things that can be done with vertex shaders that we aren't seeing come about in games yet like parallax and relief mapping as well as extended use of geometry instancing and vertex texturing.
Moving on to pixel pipelines, we see a 50% increase in the number of pipelines packed under the hood. Each of the 24 pixel pipes is also more powerful than those of NV4x. We will cover just why that is a little later on. For now though, it is interesting to note that we do not see an increase in the 16 ROPs. These pipelines take the output of the fragment crossbar (which aggregates all of the pixel shader output) and finalizes the rendering process. It is here where MSAA is performed, as well as the color and z/stencil operations. Not matching the number of ROPs to the number of pixel pipelines indicates that NVIDIA feels its fill rate and ability to handle current and near future resolutions is not an issue that needs to be addressed in this incarnation of the GeForce. As NVIDIA's UltraShadow II technology is driven by the hardware's ability to handle twice as many z operations per clock when a z only pass is performed, this also means that we won't see improved performance in this area.
If NVIDIA is correct in their guess (and we see no reason they should be wrong), we will see increasing amounts of processing being done per pixel in future titles. This means that each pixel will spend more time in the pixel pipeline. In order to keep the ROPs busy in light of a decreased output flow from a single pixel pipe, the ratio of pixel pipes to ROPs can be increased. This is in accord with the situation we've already described.
ROPs will need to be driven faster as common resolutions increase, though this can also be mitigated by increases in frequency. We will also need more ROPs once the pixel pipelines are able to saturate the fragment crossbar in spite of the increased time a pixel spends being shaded.
No More Memory Bandwidth

Again, we have a 256 bit (4x 64 bit) memory interface to GDDR3 memory. The local graphics memory setup is not significantly different from the 6800 series of cards and only runs slightly faster at a 1.2 GHz effective data rate. This will work out in NVIDIA's favor as long as newer games continue to put a heavier burden on pixel shader processing. NVIDIA sees texture bandwidth as outweighing color and z bandwidth in the not too distant future. This doesn't mean the quest after ever increasing bandwidth will stop; it just means that the reasons we will need more bandwidth will change.
A good example of the changing needs of graphics cards is Half-Life 2. While the game runs very well even on older graphics cards like the 9800 Pro, the design is such that increased memory bandwidth is far less important than having more shader processing power. This is why we see the 6600GT cards significantly outperform the 9800 Pro. Even more interesting is that in our testing, we found that enabling 4xAA on a 9800 Pro didn't affect performance of HL2 much at all, while increasing the resolution from 1024x768 to 1280x1024 had a substantial impact on frame rates. If the HL2 model is a good example of the future of 3D engines, NVIDIA's decision to increase pixel processing power while leaving memory bandwidth for the future makes a lot of sense.
On an interesting side note, the performance tests in this article are mostly based around 1600x1200 and higher resolutions. Memory usage at 2048x1536 with 32bit color and z-buffer runs a solid 144MB for double buffered rendering with 4x AA. This makes a 256MB card a prerequisite for this setup, but depending on the textures, render targets and other local memory usage, 256MB may be a little short. PCI Express helps a little to alleviate any burden placed on system memory, but it is conceivable that some games could get choppier when swapping in and out large textures, normal maps, and the like.
We don't feel that ATI's 512MB X850 really brings anything necessary to the table, but with this generation we could start to see a real use for 512MB of local memory. MRTs, larger textures, normal maps, vertex textures, huge resolutions, and a lack of hardware compression for fp16 and fp32 textures all mean that we are on the verge of seeing games push memory usage way up. Processing these huge stores of data requires GPUs powerful enough to utilize them efficiently. The G70 begins to offer that kind of power. For the majority of today's games, we are fine with 256MB of RAM, but moving into the future it's easy to see how more would help.
In addition to these issues, a 512MB card would be a wonderful fit for Dual-Link DVI. This would make the part a nice companion to Apple's largest Cinema Display (which is currently beyond the maximum resolution supported by the GeForce 7800 GTX). In case anyone is curious, a double buffered 4xAA 32bit color+z framebuffer at 2560x1600 is about 190MB.
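The article's framebuffer figures line up with a simple accounting of two color buffers (double buffering) plus one z buffer, each at 4 bytes per pixel and 4 samples per pixel for 4xAA. This quick sketch reproduces both numbers; the exact buffer breakdown is our assumption, inferred from the figures given:

```python
# Rough framebuffer size estimate. Assumes double-buffered color (2 buffers)
# plus one z buffer, 4 bytes per pixel each, multiplied out for 4xAA samples.
def framebuffer_mib(width, height, bytes_per_pixel=4, aa_samples=4, buffers=3):
    return width * height * bytes_per_pixel * aa_samples * buffers / 2**20

print(framebuffer_mib(2048, 1536))  # 144.0 -> the "solid 144MB" figure
print(framebuffer_mib(2560, 1600))  # 187.5 -> "about 190MB" for the 30" display
```

Whatever the precise internal layout, the point stands: at these resolutions the framebuffer alone consumes more than half of a 256MB card before textures and render targets are even considered.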
In our briefings on G70, we were told that every part of the chip has been at least slightly updated from NV4x, but the general architecture and feature set is the same. There have been a couple of more significant updates as well, namely the increased performance capability of a single shader pipe and the addition of transparency antialiasing. Let's take a look at these factors right now.
Inside The Pipes

The pixel pipe is made up of two vector units and a texture unit that all operate together to facilitate effective shader program execution. There are a couple of mini-ALUs in each shader pipeline that allow operations such as a free fp16 normalize and other specialized features that relate to and assist the two main ALUs.
Even though this block diagram looks slightly different from ones shown during the 6800 launch, NVIDIA has informed us that these mini-ALUs were also present in NV4x hardware. There was much talk when the 6800 launched about the distinct functionality each of the main shader ALUs had. In NV4x, only one ALU had the ability to perform a single clock MADD (multiply-add). Similarly, only one ALU assisted in texture address operations for the texture unit. Simply having these two distinct ALUs (regardless of their functionality difference) is what was able to push the NV4x so much faster than the NV3x architecture.
In their ongoing research into commonly used shaders (and likely much of their work with shader replacement), NVIDIA discovered that a very high percentage of shader instructions were MADDs. Multiply-add is extremely common in 3D mathematics as linear algebra, matrix manipulation, and vector calculus are a huge part of graphics. G70 implements MADD on both main Shader ALUs. Taking into account the 50% increase in shader pipelines and each pipe's ability to compute twice as many MADD operations per clock, the G70 has the theoretical ability to triple MADD performance over the NV4x architecture (on a clock for clock basis).
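The clock-for-clock tripling claimed above falls out of simple arithmetic on the pipe counts given in the article (16 pixel pipes with one single-cycle MADD ALU each on NV4x, versus 24 pipes with two MADD-capable ALUs each on G70):

```python
# Back-of-the-envelope peak MADD issue rate, clock for clock.
nv40_madds_per_clock = 16 * 1   # 16 pixel pipes, one single-cycle MADD ALU each
g70_madds_per_clock = 24 * 2    # 24 pixel pipes, both main ALUs can issue a MADD
print(g70_madds_per_clock / nv40_madds_per_clock)  # 3.0 -> triple the throughput
```

This is a theoretical peak, of course; real shaders mix MADDs with texture fetches and other operations, which is why actual gains land well short of 3x.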
Of course, we pressed the development team to tell us if both Shader ALUs featured identical functionality. The answer is that they do not. Other than knowing that only one ALU is responsible for assisting the texture hardware, we were unable to extract a detailed answer about how similar the ALUs are. Suffice it to say that they still don't share all features, but that NVIDIA certainly feels that the current setup will allow G70 to extract twice the shader performance for a single fragment over NV4x (depending on the shader of course). We have also learned that the penalty for branching in the pixel shaders is much less than in previous hardware. This may or may not mean that the pipelines are less dependent on following the exact same instruction path, but we really don't have the ability to determine what is going on at that level.
No More Shader Replacement

The secret is all in compilation and scheduling. Now that NVIDIA has had more time to work with scheduling and profiling code on an already efficient and powerful architecture, they have an opportunity. This generation, rather than build a compiler to fit hardware, they were able to take what they've learned and build their hardware to better fit a mature compiler already targeted to the architecture. All this leads up to the fact that the 7800 GTX with current drivers does absolutely no shader replacement. This is quite a big deal in light of the fact that, just over a year ago, thousands of shaders were stored in the driver ready for replacement on demand in NV3x and even NV4x. It's quite an asset to have come this far with hardware and software in the relatively short amount of time NVIDIA has spent working with real-time compilation of shader programs.
All these factors come together to mean that the hardware is busy more of the time. And getting more things done faster is what it's all about.
So, NVIDIA is offering a nominal increase in clock speed to 430MHz, just a little more memory bandwidth (256-bit memory bus running at a 1.2GHz data rate), 1.33x vertex pipelines, 1.5x pixel pipelines, and various increases in efficiency. These all work together to give us as much as double the performance in extreme cases. If the performance increase can actually be realized, we are looking at a pretty decent speed increase over the 6800 Ultra. Obviously, in the real world we won't be seeing a threefold performance increase in anything but a bad benchmark. In cases where games are CPU limited, we will likely see a much lower increase in performance, but performance double that of the 6800 Ultra is entirely possible in very shader limited games.
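For context, the "just a little more memory bandwidth" is easy to quantify. A quick sketch (the 6800 Ultra's 1.1GHz effective data rate is our own figure, based on its shipping 550MHz GDDR3):

```python
# Hypothetical peak-bandwidth arithmetic: bus width in bits / 8 gives bytes
# per transfer; multiply by the effective data rate in GHz to get GB/s.
def bandwidth_gbps(bus_bits, data_rate_ghz):
    return bus_bits / 8 * data_rate_ghz

gtx_7800 = bandwidth_gbps(256, 1.2)    # 38.4 GB/s for the 7800 GTX
ultra_6800 = bandwidth_gbps(256, 1.1)  # 35.2 GB/s for the 6800 Ultra
print(gtx_7800 / ultra_6800 - 1)       # ~0.09, i.e. roughly a 9% bandwidth bump
```

A 9% bandwidth bump against a 50% wider pixel engine underlines just how much NVIDIA is betting on shader-bound rather than bandwidth-bound workloads.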
In fact, EPIC reports that under certain Unreal Engine 3 tests they currently see two to 2.4x improvements in framerate over the 6800 Ultra. Of course, UE3 is not finished yet and there won't be games out based on the engine for a while. We don't usually like reporting performance numbers from software that hasn't been released, but even if these numbers are higher than we will see in a shipping product, it seems that NVIDIA has at least gotten it right for one developer's technology. We are very interested in seeing how next generation games will perform on this hardware. If we can trust these numbers at all, it looks like the performance advantage will only get better for the GeForce 7800 GTX until Windows Graphics Foundation 2.0 comes along and inspires new techniques beyond SM3.0 capabilities.
Right now, for each triangle that gets fed through the vertex pipeline, there are many pixels inside the resulting object that need the attention of the pixel pipelines.
Bringing It All Together
Why didn't NVIDIA build a part with unified shaders?
Every generation, NVIDIA evaluates alternative architectures, but at this time they don't feel that a unified architecture is a good match to the current PC landscape. We will eventually see a unified shader architecture from NVIDIA, but it will not likely be until DirectX itself is focused around a unified shader architecture. At this point, vertex hardware doesn't need to be as complex or intricate as the pixel pipeline. As APIs develop more and more complex functionality it will be advantageous for hardware developers to move towards a more generic and programmable shader unit that can easily adapt to any floating point processing need.
As pixel processing is currently more important than vertex processing, NVIDIA is separating the two in order to focus attention where it is due. Making hardware more generic usually makes it necessarily slower, but explicitly targeting a specific aspect of something can often improve performance a great deal.
When WGF 2.0 comes along and geometry shaders are able to dynamically generate vertex data inside the GPU we will likely see an increased burden on vertex processing as well. Being able to programmatically generate vertex data will help to remove the burden on the system to supply all the model data to the GPU.
Transparency AA, Purevideo, and HDTV

One of the problems with Multisample AA is its inability to correct aliasing within a polygon. One of the main new features that NVIDIA added to the G70 is a method to combat the most notorious problem associated with MSAA: antialiasing of transparent textures.
When transparency AA is enabled on a GeForce 7800 GTX, textures that make use of the alpha channel can be flagged to have either supersample or multisample AA performed inside the texture. This can help a great deal for features often implemented with transparent textures such as leaves, vegetation or chain link fences.
This higher image quality comes at an increased performance cost, but no longer will fences, bushes, and trees cause a marked decrease in image quality even while running 4xAA. We will explore the performance hit and quality of Transparency AA in our analysis of the hardware, but NVIDIA provides the option of running with either SSAA or MSAA in this mode. MSAA incurs less of a performance hit, but SSAA is higher quality. We are glad that the choice is left to the end user and would even prefer that we get the choice in how FSAA is performed as well.
With increasingly powerful hardware we can afford to "waste" some cycles on SS in order to achieve slightly higher image quality in games that are severely CPU bound. Check out our recent Insider Article on NVIDIA's upcoming introduction of a 16x AA mode for 7800 GTX SLI systems. We will test this mode as soon as NVIDIA offers a driver with support for it.
This time around, Purevideo has extended support for HD format acceleration. The 7800 GTX will now have support for spatial-temporal de-interlacing for HD content. This feature promises to make 1080i content look that much better on a PC. NVIDIA has also said that the 7800 GTX should support H.264, but has said that the driver will not have support until near year's end. As we have already seen an H.264 demo from ATI, the lack of anything tangible from NVIDIA at this point is disappointing. We are hesitant to even mention NVIDIA's claimed "support" before we see it running on actual hardware (especially after the lacking and late Purevideo support for initial NV40 parts). This time around, we can expect more support for alternate video players from NVIDIA as they are working with InterVideo and Cyberlink.
Not tied to the 7800 GTX is NVIDIA's latest improvement on HDTV support in their 75 series drivers (also launching today). Over time support for fitting a PC's output to any HDTV has improved, but this latest update makes it that much easier to deal with. Providing sliders and a full screen underscan adjustment feature is long overdue, but we still wish modern hardware could provide a more fully featured plug and play environment for HDTV.
We will also be getting some Windows MCE extensions that make HDTV setups easier to configure as well. If the US can manage to keep broadcast flags off the law books, public support for and adoption of digital television services and computers as media center boxes will surely continue to grow and prosper.
The Test, Card, and High Resolution

Before we get down to the performance tests, let's look at our test system.
ASUS nForce 4 A8N-SLI Deluxe Motherboard
1GB PC3200 DDR at 2:2:2:8
120GB Seagate 7200.7 HD
OCZ PowerStream 600W PSU
The card this time around is a single slot solution. With the process shrink to 110nm and only a modest increase in clock speed, NVIDIA has produced a chip that runs at lower power and temperature than NV40. At the same time, the increase in parallelism has served to boost performance.
Layout of the card is relatively similar to the 6800 Ultra, but there are a few differences. We've still got 2 DVI slots (both single link), but the solder point for the Silicon Image TMDS chip for dual-link DVI is either missing or moved. We will certainly be interested in seeing a workstation version of this part.
Here's a quick recap and summary of the G70 and GeForce 7800 GTX:
110 nm TSMC fabrication process
Single Slot HSF
430MHz Core clock
600MHz GDDR3 256MB/256-bit
8 vertex shader units
24 pixel shader units
16 ROP units
2 DVI and one HDTV / VIVO connection
PCI Express (demand for an AGP part will be determined and addressed if necessary)
350W Power Supply Recommended (500W for SLI)
So what is all the fuss about? Here's a look at the NVIDIA GeForce 7800 GTX card:
Why high res?
It is important to remember that we tested at resolutions of 1600x1200 and higher because lower resolutions are CPU limited without AA and AF enabled. In many cases the GeForce 7800 GTX doesn't show much difference in performance with and without antialiasing at lower resolutions. This kind of data doesn't give us much useful information about the card. We have truly reached another plateau in graphics performance with this part: pushing the card to the max is all but necessary in order to understand its performance characteristics.
Battlefield 2 Demo

For this game, we recorded our own timedemo using the freely available demo version of the game. The demo was played back using EA's demo.cmd file, but we used FRAPS to determine the framerate as the timedemo feature incorrectly incorporates frames from the loading screen (which generally runs at >400 fps on the cards we tested).
With the added graphical effects, Battlefield 2 is quite a bit more demanding of systems than its predecessor. In fact, BF2 actually has a huge memory footprint and could even take advantage of more than 1 GB of RAM! That said, frame rates varied quite a bit between the configurations, and once again a single 7800GTX matches or beats the 6800U SLI setup - it's a tie at 1600x1200, and the 7800 holds a 42% lead at 2048x1536. ATI does very well here, surpassing the 6800U by a decent margin and coming within striking distance of the SLI setup at 2048x1536. As with other games, the 6800 series struggles with the high resolution, running less than half as fast compared to 1600x1200. The benefit of SLI over a single card ranges from over 100% on the 6800U to 59% for the 7800GTX. If you want AA/AF at 1600x1200 or higher resolutions, only the 6800U SLI or 7800GTX setups are even remotely able to handle the strain.
Doom 3 1.3 Performance
Unlike most of the other games we're looking at, Doom 3 actually places quite a strain on the memory bandwidth of the graphics card. This seems to be a common occurrence with many of the OpenGL games, though Doom 3 more so than others. The reason for this is the large number of stencil calculations that are required for the real time shadows. This allows the 6800U SLI setup to actually outperform a single 7800GTX by a sizeable margin - remember that the difference in memory bandwidth between a 6800U and a 7800GTX is only 9%. We also see that antialiasing has a major impact on the single 7800GTX, though it still maintains a commanding lead (25 to 81% depending on resolution and settings) over the 6800U. In the SLI configurations, the 7800GTX only leads by 15% at 1600x1200 4xAA, but that grows to 61% when we move to 2048x1536.
Switching to the ATI card, we can see that ATI has done a lot to close the performance gap in Doom 3. While the 6800U still wins in 1600x1200, the ATI card actually comes out ahead at 2048x1536. Like we've seen in a few other games, though, the NVIDIA drivers don't seem to handle 2048x1536 very well. With AA/AF enabled, the 6800U once again takes a 50% performance hit when increasing the resolution. Due to the dark atmosphere and lighting flashes, Doom 3 is a game that definitely needs to run at a high refresh rate or with VSYNC enabled, so again the lack of performance at 2048x1536 isn't the end of the world. What we're mostly concerned with is taxing the hardware to show future potential, and it's safe to say that the 7800GTX - particularly with SLI - will be able to handle all games for many years.
EVE: Online Performance

Eve is clearly not the most demanding of games when it comes to graphics cards. In fact, the difference between the various setups is at most 5%. There is also a problem with the SLI support, as it actually slows down performance in Eve. This could be due to the increased demands on the CPU required for dividing the workload between the cards, though we don't see this performance hit as much on the 6800 Ultra cards. Other than the problem with SLI decreasing performance, we're clearly CPU limited in Eve, so if this is your game du jour right now, you probably don't need to worry about a GPU upgrade for a while.
Everquest 2 Performance

Despite the fact that Everquest 2 is an MMORPG, it has some of the most demanding graphics of any game to date. The extreme quality mode manages to tax the system so severely that even at 1280x1024 we aren't able to get above 25 FPS with the 7800 GTX. ATI pulls ahead of the single 6800U by over 100% in the widescreen 1920x1200 resolution, though in more reasonable settings the performance is closer. It's interesting to note that the 6800U actually outperforms the X850XTPE in the "Extreme Quality" mode.
Everquest 2 does put a larger strain on the CPU than many other games, so the benefits of SLI are rather limited without enabling AA/AF. The biggest gap between a single card and SLI setup is only 22% at 1920x1200 with the 6800U cards; the 7800 SLI setup only manages a 13% margin of victory at 1600x1200. Turning on the AA/AF changes things quite a bit, however, with the SLI setups gaining 40 to 65% on the 6800U and 60 to 72% on the 7800 cards. The single 7800 competes rather well with the 6800SLI and takes the lead in most of the non-AA/AF settings, though the SLI cards win by 17% once AA/AF is enabled. The exception is once again the 2048x1536 resolution, where the 6800 cards simply can't provide acceptable frame rates with AA/AF enabled.
Enabling AA/AF causes a massive performance hit on most of the configurations, though. The single 7800 loses almost half of its performance while the 6800 configurations fare even worse in some cases. 2048 in particular causes them to run at 1/3 to 1/4 the speed that they managed without AA. It may be a driver bug, though with the graphical complexity and polygon counts of EQ2, it's difficult to lay all the blame on the drivers. ATI does a little better, but they still take a 30 to 45% performance hit by enabling AA/AF. For those of you addicted to the lifestyle known as Everquest 2, you may actually be able to stomach the cost of 7800GTX SLI.
One issue that we encountered on a few resolutions and configurations in EQ2 was a strange flickering/rendering problem. This occurred on both the 6600GT and 7800GTX SLI configurations, though it was only with AA/AF enabled and only at 2048x1536 for the 7800 SLI cards. We didn't bother running the 6600GT SLI at anything more than 1600x1200, of course, as even at that resolution the frame rates were all but unplayable.
Guild Wars Performance

Guild Wars isn't quite as demanding in terms of frame rates as most FPS games, as it is more of a 3D MMORPG version of Diablo, so all of the cards really provide acceptable frame rates. Guild Wars also includes easy support for widescreen resolutions that you find on some of the newer LCDs, and you can see the performance results. The 7800 GTX puts in a good showing here. We had reported SLI numbers, but we've since pulled them as there is no default profile and the settings we ran with produced worse results than single card performance.
Half Life 2 Performance

Half-Life 2 is arguably one of the best looking games currently available. We mentioned earlier that it stresses pixel processing power more than memory bandwidth on the graphics card, and we see that here. While enabling AA/AF does cause a performance loss at these high resolutions, it isn't truly severe until we reach 2048x1536. Assuming a more common resolution of 1600x1200 - not everyone has a monitor with support for higher resolutions - the single 7800GTX is actually faster than 6800U SLI. In fact, the only time we see the 6800 SLI setup win out is when we run 4xAA/8xAF at 2048x1536, and then it's only by 3%. Also worth noting is that while SLI helps out the 6800 series quite a bit (provided you have a fast CPU and are running a high resolution), the 7800GTX is clearly running into CPU limitations. Our FX-55 can't push any of the cards past the 142 FPS mark regardless of resolution.
ATI has done well in HL2 since its release, and that trend continues. The SLI configurations (other than the 6600GT) all surpass the performance of the X850XTPE, but it does come out ahead of the 6800U in a single card match - it's as much as 42% faster when we look at the 1600x1200 AA/AF scores. The 7800GTX, of course, manages to beat it quite easily. The 540 MHz core clock of the XTPE is quite impressive in the pixel shader heavy HL2, but with the additional pipelines and improved MADD functionality, the 7800GTX chews up HL2 and spits out Combine gravel.
One thing that isn't immediately clear is why the 6800U cards have difficulties supporting the 2048x1536 resolution in some games. Performance drops by almost half when switching from 1600x1200 to 2048x1536, so either there's a driver problem or the 6800U simply doesn't do well with the demands of HL2 at such resolutions. There are 63% more pixels at 2048x1536 compared to 1600x1200, so it's rather shocking to see a performance decrease larger than this amount. We would venture to guess that it's a matter of priorities: the number of people that actually run 2048x1536 resolution in games is very small in comparison to the total number of gamers, and with most cards only providing a 60Hz refresh rate at that resolution, many don't worry too much about gaming performance.
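The pixel-count claim in the preceding paragraph is easy to verify with a one-liner:

```python
# Pixel counts at the two resolutions under discussion.
hi = 2048 * 1536  # 3,145,728 pixels
lo = 1600 * 1200  # 1,920,000 pixels
print(hi / lo - 1)  # ~0.638, i.e. about 64% more pixels at 2048x1536
```

Since the workload grows by roughly 64% but performance drops by nearly half (a 100% slowdown), something beyond raw pixel count is clearly holding the 6800U back at that resolution.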
Splinter Cell: Chaos Theory Performance
In Chaos Theory, we have another instance where the single 7800GTX is able to outperform the 6800U SLI setup in most situations. Without AA/AF, it wins by 10% and 42% at 1600x1200 and 2048x1536, respectively. With AA/AF, however, the 6800U SLI beats the single 7800GTX by 39% at 1600x1200. Again, the 6800U cards seem to have difficulty running 2048x1536, with the single card running 115% faster at 1600x1200 than at 2048x1536. If you're looking for the ultimate in resolution and image quality, the 7800GTX SLI setup manages an impressive 80% performance advantage over the single card at the highest resolution. Whether or not that's worth the cost is up to the individual to decide, of course - most of us would be perfectly happy with 1600x1200 and 4xAA. The 6800U SLI setup also leads the single 6800U by 81% at 2048x1536, though with an average frame rate below 40 FPS, it's something of a hollow victory.
Star Wars: Knights of the Old Republic II Performance
As an RPG, frame rates are less critical, and all of the setups really do quite well. ATI comes out ahead of the 6800U cards, while the 7800GTX takes the lead by a sizeable margin. It's interesting to note that while the 6800U cards do not benefit from SLI in this game, the 7800GTX does: a 23% increase without AA/AF and 31% with it enabled. The single 7800GTX also manages a similar performance gap over the 6800U. Assuming the 6800U gets SLI support and gains a similar amount, it should roughly match the 7800GTX. For now, though, the single card is clearly the better solution.
Tiger Woods PGA Tour 2005 Performance
Tiger Woods 2005 is similar to Eve in being limited by the CPU. At 1600x1200, we only see an 11% and 13% spread - with AA/AF and without, respectively - between the various configurations (not counting the 6600GT SLI). Of course, in a golf simulation frame rates aren't nearly as critical as in FPS games, so the 30+ FPS results generated by all contenders are more than acceptable.
Unreal Tournament 2004 Performance
Unreal Tournament is a game that is clearly CPU limited. While we see a slight difference in performance between the single 6800 Ultra and 6800U SLI, it is still far less than what we'll find in other games. The 7800GTX receives only a minor increase in performance going to SLI mode, and only when running 1600x1200 with 4xAA/8xAF. We've heard that the fully object-oriented C++ design of the Unreal Engine contributes to the heavier CPU load. Whatever the cause, it's pretty clear that the current Unreal Engine isn't in desperate need of more graphics power. UE3 will add support for multi-threading as well as increased shader effects, though, so don't take UT2K4 as indicative of future Unreal Engine requirements.
Worth noting is that a single 7800GTX is only slightly slower (2%) than 6800U SLI, so if you've been holding off on upgrading in anticipation of the G70, you can save money over a 6800U SLI setup while giving up almost nothing. Of course, the performance advantage of the 7800GTX (even in SLI mode) over the 6800U isn't so great in this game that you really need to consider upgrading, as it's only 39% from the single 6800U to the SLI'ed 7800GTX. However, we'll continue to look at how the single 7800GTX compares to the 6800U SLI and X850XT.
Speaking of the X850XT, Unreal Tournament has often favored NVIDIA cards slightly, and here we see the top ATI card outperformed by every NVIDIA setup, including the 6600GT SLI - at least when AA/AF aren't enabled. Once we enable those, the 6600GT SLI loses ground to the ATI card. Still, even the slowest of these configurations is capable of providing good to great frame rates at 1600x1200 4xAA/8xAF.
Wolfenstein: Enemy Territory Performance
As an older OpenGL game that lacks any fragment programs (OpenGL's equivalent of DirectX pixel shaders), most of these setups breeze through it, even at high resolutions like 2048x1536. The only setup that doesn't provide smooth frame rates with AA/AF enabled is the 6600GT. There is clearly a problem with SLI support in Enemy Territory, as both the 6800U and 7800GTX run slower in SLI mode than in single card mode. We would assume that the 6600GT SLI is also failing to take advantage of the second card, so the relatively low frame rates in AA/AF mode are due to driver problems more than anything. Also of note is that despite the lack of advanced pixel effects, Enemy Territory is relatively CPU limited: 120 FPS appears to be the maximum, even with an FX-55.
The one noticeable advantage of the 7800GTX is that it makes 2048x1536 4xAA a truly playable resolution, even for competitive online gamers. The speedup at 1600x1200 4xAA is only 18%, but 2048x1536 runs almost 50% faster. If you have a monitor capable of running such a resolution, the 7800GTX is one of the few cards that can actually handle it. When (or perhaps if) NVIDIA fixes SLI support for Enemy Territory, we should also see acceptable frame rates out of the SLI setups even at the maximum resolution, though given the popularity of the game, we have to wonder why NVIDIA hasn't already included proper SLI support.
The ATI side of the equation is once again lacking. While the X850XTPE is more or less able to match the 6800U, ATI clearly has no current answer to the 7800GTX. We'll have to see if CrossFire does better at accelerating Enemy Territory than SLI. If it does, we expect NVIDIA to suddenly find the impetus to get SLI support working properly.
Transparency AA Performance
For our Transparency AA testing, we ran through the AT_Canals_08-rev7 timedemo in Half-Life 2. This demo features quite a few fences, and we wanted to test NVIDIA's claim that graphical quality is significantly improved. Our performance tests show that transparency MSAA comes at only a marginal hit to overall framerate, while transparency SSAA consumes quite a bit of our performance.
To illustrate the effect of the new AA modes, we took screenshots of a telltale frame from our demo run. The screenshots show very clearly that SSAA provides quite a quality improvement over no AA; MSAA, on the other hand, doesn't look like it's worth the investment.
No Transparency AA
MS Transparency AA
SS Transparency AA
Below, we have prepared some mouseovers to demonstrate the difference between the various levels of Transparency Anti-Aliasing. Put your mouse over the first image to see the difference between no Transparency AA and SSAA. The second image compares no AA with MSAA, and the final image compares SSAA to MSAA.
No Transparency AA versus SSAA
No Transparency AA versus MSAA
SSAA versus MSAA
The difference between no Transparency AA and MSAA is very small. Only with a difference map are we even able to distinguish between the two. However, the difference between MSAA and SSAA is very pronounced.
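For readers curious how a difference map is produced: each pixel of one screenshot is subtracted from the corresponding pixel of the other, and the absolute difference is kept. A minimal sketch follows; the tiny grayscale grids and their values are hypothetical stand-ins for real screenshots, which would be loaded with an image library.

```python
def difference_map(img_a, img_b):
    # Per-pixel absolute difference; identical regions come out 0 (black),
    # so any change introduced by an AA mode stands out as non-zero.
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]

no_aa = [[255, 0], [128, 64]]   # hypothetical pixels, no transparency AA
ms_aa = [[255, 2], [126, 64]]   # nearly identical: MSAA changes very little

print(difference_map(no_aa, ms_aa))  # → [[0, 2], [2, 0]]
```

A mostly black difference map, like the one comparing no AA to MSAA, means the two images are nearly indistinguishable.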
Power Consumption
We measured power consumption between the power supply and the wall. This essentially amplifies any differences in power draw, because the power supply is not 100% efficient. Ideally, we would measure the power draw of the card itself, but it is very difficult to determine the draw from both the PCIe bus and the 12V molex connector.
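To illustrate why wall measurements amplify differences, consider a hypothetical power supply that is 75% efficient (the figure is an assumption for illustration, not a measured value):

```python
EFFICIENCY = 0.75  # assumed PSU efficiency, for illustration only

def wall_draw(dc_watts):
    # Every watt the components draw costs 1/EFFICIENCY watts at the wall.
    return dc_watts / EFFICIENCY

# A 20 W difference in actual DC draw between two configurations...
delta_dc = 120 - 100
# ...shows up as a larger difference measured at the wall.
delta_wall = wall_draw(120) - wall_draw(100)
print(round(delta_wall, 1))  # → 26.7
```

Relative differences between cards are preserved, but absolute gaps in watts are inflated by the inefficiency of the supply.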
As we can see, the GeForce 7800 GTX delivers on its promise of higher performance at lower power, thanks to the 110nm manufacturing process. We can't wait to see what NVIDIA's mobile division has planned for this.
Final Words
It's taken three generations of revisions, augmentation, and massaging to get where we are, but the G70 is a testament to the potential the original NV30 design possessed. Using the knowledge gained from their experiences with NV3x and NV4x, NVIDIA has made the G70 a very refined implementation of a well-designed architecture.
With up to three times the MADD throughput, 50% more pixel pipes, and 33% more vertex power than the 6800 Ultra, the GeForce 7800 GTX is a force to be reckoned with. Putting this much processing power into a package that pulls less juice from the wall than a 6800 Ultra is quite a feat as well. The 300+ million transistors fabbed on a 110nm process are quite capable, and NVIDIA's compiler technology is finally mature enough to handle all games with no shader replacement.
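The percentages above follow directly from the published unit counts - 24 pixel pipelines and 8 vertex units on the 7800 GTX versus 16 and 6 on the 6800 Ultra:

```python
# Unit counts: GeForce 7800 GTX vs GeForce 6800 Ultra.
pixel_gain  = 24 / 16 - 1   # pixel pipelines
vertex_gain = 8 / 6 - 1     # vertex units
print(f"{pixel_gain:.0%} more pixel pipes, {vertex_gain:.0%} more vertex power")
# → 50% more pixel pipes, 33% more vertex power
```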
Adding transparency AA and further enhancing the efficiency of the PureVideo hardware are the most tangible feature additions of the GeForce 7800 GTX. The tweaks in the pipeline really only show up in performance numbers rather than feature enhancements. As there has been no DirectX update since the last part, NVIDIA has opted not to introduce any extra features; their reasoning is that developers are slow enough to adopt DirectX changes, let alone a feature that would only run using OpenGL extensions.
Even though no features have been added to the vertex and pixel shaders directly, the increased power will give game developers more freedom to create truly impressive experiences. Though not seen in any game out now or coming in the near term, the 7800 GTX does offer the ability to render nearly "The Spirits Within"-quality graphics in real time. Games that live up to this example (such as Unreal Tournament 2007) still have quite a ways to go before they make it into our hands and onto our hardware, but it is nice to know that the 7800 GTX will have the power to run them when they arrive.
It is quite difficult to sum up this launch. From what is essentially a very thorough refresh of NV4x, we've got something that is more than the sum of its parts. The GeForce 7800 GTX is capable of smooth frame rates at incredibly high resolutions. Bringing hardware and compiler together into a solution that keeps the hardware busier than previous generations is definitely one of the most important aspects of this part. Eliminating shader replacement while performing this well is no small feat.
Aside from the well-executed hardware, NVIDIA has pulled off an incredible launch with availability right now. A quick look at our RealTime Price Engine shows several brands already available for as low as $569. We can't stress enough how happy we are with NVIDIA's push to provide product in the retail market on the same day the product is announced. ATI really needs to follow suit with its upcoming CrossFire launch.
For $600, we would like to see 512MB onboard, but given the current gaming landscape, we agree that more than 256MB is not an absolute necessity. Then again, the GeForce 7800 GTX would have no reason to exist right now if not to accommodate future titles that will be more taxing than current games.
Overall, we consider this a successful launch. Based on the performance of the 7800 GTX, we can also infer that the PS3's RSX will be even more powerful than the G70. As RSX will be a 90nm part and still has some time to develop further, the design will likely be easier to program, faster, and full of new features.