Original Link: http://www.anandtech.com/show/2213
DX10 for the Masses: NVIDIA 8600 and 8500 Series Launch
by Derek Wilson on April 17, 2007 9:00 AM EST
Incredibly high priced and high powered graphics cards are exciting. For geeks like us, learning about the newest and best hardware out there is like watching a street race between a Lambo and a Ferrari. We are able to see what the highest achievable performance really offers when put to the test. Standing on the bleeding edge of technology and looking out at what is currently possible towards the next great advancement inspires us. The volume and depth of knowledge required to build a GPU humbles us while physically demonstrating the potential of the human race.
Unfortunately, this hardware, while attainable for a price, is often out of reach for most of its admirers. Sometimes it isn't owning the thing which we set apart that gives us joy, but knowing of its existence and relishing the fact that some time, in the not too distant future, that kind of performance will be available for a reasonable price as a mainstream staple.
While the performance of the 8800 GTX is still a ways off from making it to the mainstream, we finally have the feature set (and then some) of the GeForce 8 series in a very affordable package. We only have the fastest of the new parts from NVIDIA in hand, but today's announcement includes these new additions to the lineup:
GeForce 8600 GTS
GeForce 8600 GT
GeForce 8500 GT
GeForce 8400 (OEM only)
GeForce 8300 (OEM only)
The OEM only parts will not be available as add-in cards but will only be included in pre-built boxes by various system builders. While the 8600 GTS should be available immediately, we are seeing a little lag time between now and when the 8600 GT and 8500 GT will be available (though we are assured both will be on shelves before May 1). While NVIDIA has been very good about sticking to a hard launch strategy for quite a while now, we were recently informed that this policy would be changing.
Rather than coordinate hardware availability with their announcement, NVIDIA will move to announcing a product with availability to follow. Our understanding is that availability will happen within a couple weeks of announcements. There are reasons NVIDIA would prefer to do things this way, and they shared a few with us. It's difficult for all the various card manufacturers to meet the same schedule for availability, and giving them more time to get their hardware ready for shipment will even the playing field. It's also hard to keep information from leaking when parts are already moving into the channel for distribution. With some sites able to get their hands on this information without talking to NVIDIA (and thus able to avoid abiding by embargo dates on publication), taking measures to slow or stop leaks helps NVIDIA control information flow to the public and appease publishers who don't like getting scooped.
The bottom line, as we understand it, is that hard launches are difficult. It is our stance that anything worth doing right justifies the trouble it takes. NVIDIA stated that, as long as the information is accurate, there is no issue with delayed launches (or early announcements depending on how we look at things). On the surface this is true, but the necessity of a hard launch has reared its ugly head time and time again. We wouldn't need hard launches if we had infinite trust in hardware makers. The most blatant example in recent memory is the X700 XT from ATI. This product was announced, tested, reviewed (quite positively), but never saw the light of day. This type of misinformation can lead people to put off upgrading while waiting for the hardware or, even worse, trick people into buying hardware that does not match the performance of products we review.
So many people get confused by the fact that we still love hard launches even if only a handful of parts are available from a couple retailers. Sure, high availability at launch is a nice pipe dream, but the real meat of a hard launch is in the simple fact that we know the hardware is available, we know the hardware has the specs a company says it will, and we know the street price of the product. Trust is terrific, but this is business. NVIDIA, AMD, Intel, and everyone else are fighting an information war. On top of that, the pace of our industry is incredible and can cause plans to change at the drop of a hat. Even if a company is completely trustworthy, no one can predict the future and sometimes the plug needs to be pulled at the very last second.
In spite of all this, NVIDIA will do what they will do and we will continue to publish information on hardware as soon as we have it and are able. Just expect us to be very unforgiving when hardware specs don't match up exactly with what we are given to review.
For now, we have new hardware at hand, some available now and some not. While the basic architecture is the same as the 8800, there have been some tweaks and modifications. Before we get to performance testing, let's take a look at what we're working with.
Under the Hood of G84
So the quick and dirty summary of the changes is that the G84 is a reduced width G80 with a higher proportion of texture to shader hardware and a reworked PureVideo processing engine (dubbed VP2 as opposed to G80's VP1). Because there are fewer ROPs, fill rate and antialiasing capabilities will be reduced from the G80 as well. That kind of throughput isn't as necessary on a budget card, where shader power won't be able to keep up with huge resolutions anyway.
We expect the target audience of the 8600 series to be running 1280x1024 resolution panels. Of course, some people will be running larger panels and we will test some higher resolutions to see what kind of capabilities the hardware has, but tests above 1600x1200 are somewhat academic. As 1080p TVs become more popular in the coming years, however, we may start putting pressure on graphics makers to target 1920x1200 as their standard resolution for mainstream parts even if average computer monitor sizes weigh in with fewer pixels.
In order to achieve playable performance at 1280x1024 with good quality settings, NVIDIA has gone with 32 shaders, 16 texture address units, and 8 ROPs. Here's the full breakdown:
GeForce 8600/8500 Hardware
| | GeForce 8600 GTS | GeForce 8600 GT | GeForce 8500 GT |
|---|---|---|---|
| Texture Address / Filtering | 16/16 | 16/16 | 8/8 |
| Core Clock | 675 MHz | 540 MHz | 450 MHz |
| Shader Clock | 1.45 GHz | 1.19 GHz | 900 MHz |
| Memory Clock (Data Rate) | 2 GHz | 1.4 GHz | 800 MHz |
| Memory Bus Width | 128-bit | 128-bit | 128-bit |
| Frame Buffer | 256 MB | 256 MB | 256 MB / 512 MB |
| Outputs | 2x dual-link DVI | 2x dual-link DVI | ? |
| Transistor Count | 289 M | 289 M | ? |
| Price | $200 - $230 | $150 - $160 | $90 - $130 |
We'll tackle the 8500 in more depth when we have hardware. For now, we'll include the data as reference. As for the 8600, right out of the gate, 32 SPs mean one third the clock for clock shader power of the 8800 GTS. At the same time, NVIDIA has increased the ratio of Texture address units to SPs from 1:4 to 1:2. We also see a 1:1 ratio of texture address and filter units. These changes prompted NVIDIA to further optimize their scheduling algorithms.
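To make the ratios in this paragraph concrete, here is a small Python sketch that recomputes them from the unit counts quoted in the text. The peak-rate helper is our own illustration and assumes one operation per unit per core clock, which is a simplification and not a figure NVIDIA supplied; the name `peak_rate_mops` is hypothetical.

```python
from fractions import Fraction

# Unit counts quoted in the article for the 8600 series (G84).
G84_SPS, G84_TEX_ADDR, G84_ROPS = 32, 16, 8
# 8800 GTS counts: 24 texture address units is quoted directly; 96 SPs
# follows from "one third" (32 * 3) and the 1:4 texture-to-SP ratio.
G80_GTS_SPS, G80_GTS_TEX_ADDR = 96, 24

sp_fraction = Fraction(G84_SPS, G80_GTS_SPS)              # 1/3
g84_tex_per_sp = Fraction(G84_TEX_ADDR, G84_SPS)          # 1/2
g80_tex_per_sp = Fraction(G80_GTS_TEX_ADDR, G80_GTS_SPS)  # 1/4

# Core clocks (MHz) from the hardware table.
CORE_CLOCK_MHZ = {"8600 GTS": 675, "8600 GT": 540}


def peak_rate_mops(units: int, clock_mhz: int) -> int:
    """Theoretical peak in millions of operations per second, assuming
    (as a simplification) one operation per unit per core clock."""
    return units * clock_mhz


for card, clock in CORE_CLOCK_MHZ.items():
    pixels = peak_rate_mops(G84_ROPS, clock)
    texels = peak_rate_mops(G84_TEX_ADDR, clock)
    print(f"{card}: {pixels} Mpixels/s, {texels} Mtexels/s (theoretical peak)")
```

These are paper numbers only. As noted in the text, scheduling and efficiency changes mean real throughput will not scale directly with unit counts.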
The combination of greater resource availability and improved scheduling allows for increased efficiency. In other words, clock for clock, G84 SPs are more efficient than G80 SPs. This makes it harder to compare performance based on specifications. Apparently stencil culling performance has also been improved, which should help boost algorithms like the Doom 3 engine's shadowing technique. NVIDIA didn't give us any detail on how stencil culling performance was improved, but indicated that this, among other things, was also tweaked with the new hardware.
Top this off with the fact that G84 has also been enhanced for higher clock speeds than G80 and we can expect much more work to be done by each SP per second than on 8800 hardware. Exactly how much is something we don't have an easy way of measuring as changes in efficiency will vary by the algorithms running on the hardware as well.
With 256 MB of memory on a 128-bit bus, we can expect a little more memory pressure than on the 8800 series. The 2 x 64-bit wide channels provide 40% of the bus width of an 8800 GTS. This isn't as cut down as the number of SPs; remember that the texture address units have only been reduced from 24 on the 8800 GTS to 16 on the 8600 series. Certainly the reduction of 20 ROPs to 8 will help cut down on memory traffic, but that extra texturing power won't be insignificant. While we don't have quantitative measurements, our impression is that memory bandwidth is more important in NVIDIA's more finely grained unified architecture than it was with the GeForce 7 series pipelined architecture. Sticking with a 128-bit memory interface for their mainstream part might work this time around, but depending on what we see from game developers over the next six months, this could easily change in the near future.
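For a rough sense of the memory pressure involved, the sketch below derives peak theoretical bandwidth from the bus width and data rates in the hardware table. The 320-bit figure for the 8800 GTS is inferred from the 40% statement above (128 / 0.4), and the function name is our own; treat the results as paper maximums, not measurements.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_mhz: float) -> float:
    """Peak theoretical memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mhz * 1e6 / 1e9


# (bus width in bits, effective data rate in MHz) from the hardware table.
CARDS = {
    "8600 GTS": (128, 2000),
    "8600 GT": (128, 1400),
    "8500 GT": (128, 800),
}

for name, (width, rate) in CARDS.items():
    print(f"{name}: {peak_bandwidth_gb_s(width, rate):.1f} GB/s")

# Bus width relative to the 8800 GTS (320-bit, inferred from the 40% figure).
bus_width_ratio = 128 / 320
print(f"Bus width vs. 8800 GTS: {bus_width_ratio:.0%}")
```

Under these assumptions the 8600 GTS tops out at 32 GB/s, with the 8600 GT and 8500 GT at 22.4 GB/s and 12.8 GB/s respectively.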
Let's round out our architectural discussion with a nice block diagram for the 8600 series:
We can see very clearly that this is a cut down G80. As we have discussed, many of these blocks have been tweaked and enhanced to provide more efficient processing. The fundamental function of each block remains the same, and the inside of each SP remains unchanged as well. The features supported are also the same as G80. For 8500 hardware, based on G86, we drop down from two blocks of Shaders and ROPs to one each.
Two full dual-link DVI ports on a $150 card is a very nice addition. With the move from analog to digital displays, seeing a reduction in maximum resolution on budget parts because of single-link bandwidth limitations, while not devastating, isn't desirable. There are tradeoffs in moving from analog to digital display hardware, and now an additional issue has a resolution. Now we just need to see display makers crank up pixel density and improve color space without reducing response time and this old Sony GDM-F520 can finally rest in peace.
On the video output front, G84 makes a major improvement over all other graphics cards on the market: G84 based hardware supporting HDCP will be capable of HDCP over dual-link connections. This is a major feature, as a handful of larger widescreen monitors like Dell's 30" only support 1920x1080 with a dual-link connection. Unless both links are protected with HDCP, software players will refuse to play AACS protected HD content. NVIDIA has found a way around the problem by using one key ROM but sending the key over both links. The monitor is able to handle HDCP connections on both links, and is able to display the video properly at the right resolution.
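As a back-of-the-envelope illustration of why large panels need dual-link, the sketch below compares the pixel rate of a few display modes against the single-link limit. The 165 MHz ceiling comes from the DVI specification and is not a number quoted in this article. The calculation ignores blanking intervals, so it is only a lower bound: a mode that passes this check can still need dual-link once real timings are applied.

```python
# Assumption (from the DVI specification, not from the article): a single
# TMDS link tops out at a 165 MHz pixel clock.
SINGLE_LINK_MAX_MHZ = 165.0


def min_pixel_clock_mhz(width: int, height: int, refresh_hz: float = 60.0) -> float:
    """Lower bound on the pixel clock: active pixels only, blanking ignored."""
    return width * height * refresh_hz / 1e6


def exceeds_single_link(width: int, height: int, refresh_hz: float = 60.0) -> bool:
    """True when even the blanking-free lower bound is above one link's limit."""
    return min_pixel_clock_mhz(width, height, refresh_hz) > SINGLE_LINK_MAX_MHZ


for w, h in [(1920, 1080), (1920, 1200), (2560, 1600)]:
    clock = min_pixel_clock_mhz(w, h)
    verdict = "needs dual-link" if exceeds_single_link(w, h) else "may fit single-link"
    print(f"{w}x{h} @ 60 Hz: at least {clock:.2f} MHz ({verdict})")
```

A 2560x1600 panel at 60 Hz is well past the single-link ceiling even before blanking is counted, which is why 30" displays depend on both links.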
As for manufacturing, the G84 is still an 80 nm part. While G80 is impressively huge at 681M transistors, G84 is "only" 289M transistors. This puts it at nearly the same transistor count as G71 (7900 GTX). While performance of the 8600 series doesn't quite compare to the 7900 GTX, the 80 nm process makes smaller die sizes (and lower prices) possible.
In addition to all this, PureVideo has received a significant boost this time around.
The New Face of PureVideo HD
The processing requirements of the highest quality HD-DVD and Blu-ray content are non-trivial. Current midrange CPUs struggle to keep up without assistance and older hardware simply cannot perform the task adequately. AMD and NVIDIA have been stepping in with GPU assisted video decode acceleration. With G84, NVIDIA takes this to another level moving well beyond simply accelerating bits and pieces of the process.
The new PureVideo hardware, VP2, is capable of offloading the entire decode process for HD-DVD and Blu-ray movies. With NVIDIA saying that 100% of the H.264 video decode process can be offloaded at up to 40 Mbits/sec on mainstream hardware, the average user will now be able to enjoy HD content on their PC (when prices on HD-DVD and Blu-ray drives fall, of course). There will still be some CPU involvement in the process, as the player will still need to run, AACS does have some overhead, and the CPU is responsible for I/O management.
This is quite a large change, even from the previous version of PureVideo. One of the most processing intensive tasks is decoding the entropy encoded bitstream. Entropy encoding is a method of coding that creates variable length symbols where the size of the symbol is inversely proportional to the probability of encountering it. In other words, patterns that occur often will be represented by short symbols when encoded while less probable patterns will get larger symbols. NVIDIA's BSP (bitstream processor) handles this.
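The idea of giving frequent patterns short codes can be sketched with a classic Huffman code. To be clear, H.264 does not use plain Huffman coding (it uses CAVLC and CABAC), and this is not how NVIDIA's BSP works; the snippet only illustrates the general principle, with code length growing as a symbol becomes rarer (roughly tracking -log2 of its probability). The function name is our own.

```python
import heapq
from collections import Counter


def huffman_code_lengths(message: str) -> dict[str, int]:
    """Return the Huffman code length in bits for each symbol in message."""
    freq = Counter(message)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {sym: 1 for sym in freq}
    # Heap entries: (weight, tie-breaker, {symbol: depth in the code tree}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


message = "aaaaaaaabbbbccd"  # 'a' is common, 'd' is rare
lengths = huffman_code_lengths(message)
total_bits = sum(lengths[s] for s in message)
print(lengths, total_bits)
```

For this toy message the most common symbol gets a 1-bit code and the rarest get 3 bits, for 25 bits total against 30 bits with a fixed 2-bit code. Decoding such a stream is inherently serial, since each symbol's length is only known once the previous one has been decoded, which is part of why offloading it from the CPU matters.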
Just adding the decoding of CABAC and CAVLC bitstreams (the two types of entropy encoding supported by H.264) would have helped quite a bit, but G84 also accelerates the inverse transform step. After the bitstream is processed, the data must go through an inverse transform to recover the video stream which then must have motion compensation and deblocking performed on it. This is a bit of an over simplification, but 100% of the process is 100% no matter how we slice it. Here's a look at the breakdown and how CPU involvement has changed between VP1 and VP2.
We have a copy of WinDVD that supports the new hardware acceleration and we are planning a follow up article to investigate real world impact of this change. As we mentioned, in spite of the fact that all video decoding is accelerated on the GPU, other tasks like I/O must be handled by the CPU. We are also interested in finding videos of more than 40 Mbit/sec to try and push the capabilities of the hardware and see what happens. We are interested in discovering the cheapest, slowest processor that can effectively play back full bandwidth HD content when paired with G84 hardware.
It is important to emphasize the fact that HDCP is supported over dual-link DVI, allowing 8600 and 8500 hardware to play HDCP protected content at its full resolution on any monitor capable of displaying 1920x1080. Pairing one of these cards with a Dell 30" monitor might not make sense for gamers, but for those who need maximal 2D desktop space and video playback, the 8600 GT or GTS would be a terrific option.
While it would be nice to have this hardware in NVIDIA's higher end offerings, this technology arguably makes more sense in mainstream parts. High end, expensive graphics cards are usually paired with high end expensive CPUs and lots of RAM. The decode assistance that these higher end cards offer is more than enough to enable a high end CPU to handle the hardest hitting HD videos. With mainstream graphics hardware providing a huge amount of decode assistance, the lower end CPUs that people pair with this hardware will benefit greatly.
The Cards and The Test
Both of our cards, the 8600 GT and the 8600 GTS, feature two DVI ports and a 7-pin video port. The GTS requires a 6-pin PCIe power connector, while the GT is capable of running using only the power provided by the PCIe slot. Each card is a single slot solution, and there isn't really anything surprising about the hardware. Here's a look at what we're working with:
In testing the 8600 cards, we used 158.16 drivers. Because we tested under Windows XP, we had to use the 93 series driver for our 7 series parts, the 97 series driver for our 8800 parts and the 158.16 beta driver for our new 8600 hardware. While Vista drivers are unified and the 8800 drivers were recently updated, GeForce 7 series owners running Windows XP (the vast majority of NVIDIA's customers) have been stuck with the same driver revision since early November last year. We are certainly hoping that NVIDIA will release a new unified Windows XP driver soon. Testing with three different drivers from one hardware manufacturer is less than optimal.
We haven't done any Windows Vista testing this time around, as we still care about maximum performance and testing in the environment most people will be using their hardware. This is not to say that we are ignoring Vista: we will be looking into DX10 benchmarks in the very near future. Right now, there is just no reason to move our testing to a new platform.
Here's our test setup:
System Test Configuration
| Component | Configuration |
|---|---|
| CPU | Intel Core 2 Extreme X6800 (2.93GHz/4MB) |
| Motherboard | EVGA nForce 680i SLI |
| Chipset | NVIDIA nForce 680i SLI |
| Chipset Drivers | NVIDIA nForce 9.35 |
| Hard Disk | Seagate 7200.7 160GB SATA |
| Memory | Corsair XMS2 DDR2-800 4-4-4-12 (1GB x 2) |
| Video Drivers | ATI Catalyst 7.3; NVIDIA ForceWare 93.71 (G70); NVIDIA ForceWare 97.94 (G80); NVIDIA ForceWare 158.16 (8600) |
| Desktop Resolution | 1280 x 800 - 32-bit @ 60Hz |
| OS | Windows XP Professional SP2 |
The latest 100 series drivers do expose an issue with BF2 that enables 16xCSAA when 4xMSAA is selected in game. To combat this, we used the control panel to select 4xAA under the "enhance" application setting.
All of our games were tested using the highest selectable in-game quality options with the exception of Rainbow Six: Vegas. Our 8600 hardware had a hard time keeping up with hardware skinning enabled even at 1024x768. In light of this, we tested with hardware skinning off and medium blur. We will be doing a follow up performance article including more games. We are looking at newer titles like Supreme Commander, S.T.A.L.K.E.R., and Command & Conquer 3. We will also follow up with video decode performance.
For the comparisons that follow, the 8600 GTS is priced similarly to AMD's X1950 Pro, while the 8600 GT competes with the X1950 GT.
Battlefield 2 Performance
Our first test shows that current offerings from AMD's camp at the $150 and $200 price points get the better of NVIDIA's new 8 series parts under BF2 with all the settings maxed out. Battlefield 2 does represent a less intense generation of DX9 games where HDR, floating point, and lots of shading power aren't the focus. Certainly we would like to see new hardware hit the market with higher performance per dollar than existing parts, but this is only our first test and feature set does count for a lot.
Looking at antialiasing performance, we see that the new hardware suffers quite a bit more here than other parts. Both 8600 parts perform near the X1650 XT, which is not a good thing. Obviously, the 128-bit memory interface comes into play with antialiasing enabled.
The Elder Scrolls IV: Oblivion Performance
One of the more important tests we have in our arsenal is Oblivion. This game really pushes the limits of hardware with beautiful scenery and HDR effects. While the 8800 GTS 320MB owns this benchmark, the 8600 GTS comes in second, well above both the nearest AMD competitor and the current $200 NVIDIA part it is replacing: the 7950 GT. Even the 8600 GT comes in ahead of most of the pack.
There is certainly an affinity between Oblivion and the GeForce 8 series, and if this is any indication of the type of games on the horizon NVIDIA is in a good place. Unfortunately, we can't know if this will be typical of the future games or not; most likely, it will be true sometimes and not others. It's all really up to the game developers at this point.
This OpenGL game represents the Doom 3 engine and is heavy on the texture and z/stencil operations. There are quite a few factors that could cause this game to run poorly on the G84 (clearly G80 has no real issues here). The reduced memory bandwidth or fewer ROPs (which means fewer z/stencil ops/clock) are likely culprits, but without more information it would be hard to nail down the reason we see such poor performance.
The 8600 GTS is able to keep up with its competition from AMD, but lags quite a bit behind the 7950 GT. The 8600 GT isn't able to hold its own against current $150 offerings, but it does at least stay ahead of the 7600 GT which held the $150 line for quite a while.
Both the 8600 cards fall further than other hardware when AA is enabled. Neither of our early generation DX9 games paints an attractive picture of the 8600 hardware. Let's take a look at our final test to round things out.
Rainbow Six: Vegas Performance
Our final test shows the 8600 GTS on par with both the 7950 GT and the X1950 Pro. Each of these $200 parts hits the same performance mark, and thus the 8600 GTS wins on features here.
As for the 8600 GT, performance is ahead of the 7900 GS and just hanging on to the X1950 GT. Once again, even though we have more features with the 8600 hardware, we would love to see more performance for these price points.
Final Words
DirectX 10 is here, and NVIDIA has the hardware for it. Much like ATI led the way into DX9, NVIDIA has taken hold of the industry and we can expect to see developers take their DX10 cues from G80 behavior. After all, 8800 cards have been available for nearly half a year without any other DX10 alternative, so developers and consumers have both moved towards NVIDIA for now. Hopefully the robust design of DX10 will help avoid the pitfalls we saw in getting DX9 performance even across multiple GPU architectures.
Now that affordable GeForce 8 Series hardware is here, we have to weigh in on NVIDIA's implementation. While the 8600 GT improves on the performance of its spiritual predecessor the 7600 GT, we don't see significant performance improvements above hardware currently available at the target prices for the new hardware. In NVIDIA's favor, our newest and most shader intensive tests (Oblivion and Rainbow Six: Vegas) paint the 8600 hardware in a more favorable light than older tests that rely less on shader programs and more on texture, z, and color fill rates.
We are planning on looking further into this issue and will be publishing a second article on 8600 GTS/GT performance in the near future using games like S.T.A.L.K.E.R., Supreme Commander, and Company of Heroes. Hopefully these tests will help confirm our conclusion that near future titles that place a heavier emphasis on shader performance will benefit more from G84 based hardware than previous models.
Whatever we feel about where performance should be, we are very happy with the work NVIDIA has put into video processing. We hope our upcoming video decoding performance update will reflect the expectations NVIDIA has set by claiming 100% offload of H.264 decode. VC-1 and MPEG-2 are not decoded 100% by the GPU, but at least in the case of MPEG-2 it's not nearly as CPU intensive anyway. Including two dual-link DVI ports even on $150 hardware with the capability to play HDCP protected content over a dual-link connection really makes the 8600 GTS and 8600 GT the hardware of choice for those who want HD video on their PC.
For users who own 7600 GT, 7900 GS, or X1950 Pro hardware, we can't recommend an upgrade to one of these new parts. Even though new features and higher performance in a few applications are welcome, there's not enough of a difference to justify the upgrade. On the other hand, those who are searching for new hardware to buy in the $150 - $200 range will certainly not be disappointed with 8600 based graphics. These cards aren't quite the silver bullet NVIDIA had with the 6600 series, but DX10 and great video processing are nothing to sneeze at. The features the 8600 series supports do add quite a bit of value where pure framerate may be lacking.
These cards are a good fit for users who have a 1280x1024 panel, though some of the newer games may need to have a couple settings turned down from the max to run smoothly. That's the classic definition of midrange, so in some ways it makes sense. At the same time, NVIDIA hasn't won the battle yet, as AMD has yet to unveil their DX10 class hardware. With midrange performance that's just on par with the old hardware occupying the various price points, NVIDIA has left themselves open this time around. We'll have to wait and see if AMD can capitalize.