Original Link: http://www.anandtech.com/show/3987/amds-radeon-6870-6850-renewing-competition-in-the-midrange-market
AMD’s Radeon HD 6870 & 6850: Renewing Competition in the Mid-Range Market
by Ryan Smith on October 21, 2010 10:08 PM EST
All things considered, the Radeon HD 5000 series has gone very well for AMD. When they launched it just over a year ago, they beat NVIDIA to the punch by nearly 6 months and enjoyed a solid term as the kings of the GPU world, with halo parts like the 5870 and 5970 giving them renewed exposure at the high-end of the market while mainstream products like the 5670 redefined the HTPC. Ultimately all good things come to an end though, and as NVIDIA has launched the GeForce 400 series, AMD has needed to give up the single-GPU halo and lower prices in order to remain competitive.
But if spring is a period of renewal for NVIDIA, then it’s fall that’s AMD’s chance for renewal. Long before Cypress and the 5000 series even launched, AMD’s engineers had been hard at work at what would follow Cypress. Now a year after Cypress we get to meet the first GPU of the next Radeon family: Barts. With it comes the Radeon HD 6800 series, the culmination of what AMD has learned since designing and launching the 5800 series. AMD may not have a new process to produce chips on this year, but as we’ll see they definitely haven’t run out of ideas or ways to improve their efficiency on the 40nm process.
||AMD Radeon HD 6870||AMD Radeon HD 6850||AMD Radeon HD 5870||AMD Radeon HD 5850||AMD Radeon HD 4870|
|Memory Clock||1.05GHz (4.2GHz effective) GDDR5||1GHz (4GHz effective) GDDR5||1.2GHz (4.8GHz effective) GDDR5||1GHz (4GHz effective) GDDR5||900MHz (3600MHz effective) GDDR5|
|Memory Bus Width||256-bit||256-bit||256-bit||256-bit||256-bit|
|Manufacturing Process||TSMC 40nm||TSMC 40nm||TSMC 40nm||TSMC 40nm||TSMC 55nm|
Launching today are the first two members of AMD’s HD 6000 series. At the top end we have the Radeon HD 6870, a card utilizing a full-fledged version of AMD’s new Barts GPU. The core clock runs at 900MHz, which is driving 32 ROPs and 1120 SPs. Attached to that is 1GB of GDDR5 running at 4.2GHz effective. AMD puts the load TDP at 151W (the same as the Radeon HD 5850) and the idle TDP at 19W, lower than the last generation parts.
Below that is the Radeon HD 6850, which in the long tradition of 50-parts utilizes a harvested version of the Barts GPU; this, along with a lower load voltage, makes the card the low-power member of the 6800 family. The 6850 runs at 775MHz and is attached to 960 SPs. Like the 6870 it has 1GB of GDDR5, this time running at 4GHz effective. With its lower power consumption its load TDP is 127W, and its idle TDP is unchanged from the 6870 at 19W.
The Barts GPU at the heart of these cards is the first GPU of AMD’s Northern Islands family. We’ll dive more into its architecture later, but for now it’s easiest to call it a Cypress derivative. Contrary to the (many) early rumors, it’s still using the same VLIW5 design, cache hierarchy, and ROPs as Cypress. There are some very notable changes compared to Cypress, but except for tessellation these are more about quality and features than they are about performance.
Compared to Cypress, Barts is a notably smaller GPU. It’s still made on TSMC’s finally-mature 40nm process, but compared to Cypress AMD has shaved off 450 million transistors, bringing the die size down from 334mm² to 255mm². Much of this is achieved through a reduction in the SIMD count, but as we’ll see when we talk about architecture, it’s one of many tricks. As a result of AMD’s efforts, Barts at 255mm² is right in the middle of what AMD considers their sweet spot. As you may recall from the 5870/Cypress launch, Cypress missed the sweet spot in the name of features and performance, which made it a powerful chip but also made it more expensive to produce (and harder to fabricate) than AMD would have liked. Barts is a return to the sweet spot, and more generally a return to the structure AMD operated on with the 4800 series.
With a focus on the sweet spot, it should come as no surprise that AMD is also focusing on costs and pricing. Realistically the 6800 series composes a lower tier of cards than the 5800 series – the performance is a bit lower, and so is the pricing. With a smaller GPU, cheaper GDDR5, and cheaper/fewer components, AMD is able to practically drive some members of the 6800 series down below $200, something that wasn’t possible with Cypress.
For today’s launch AMD is pricing the Radeon HD 6870 at $239, and the Radeon HD 6850 at $179. This is a hard launch, and boards should be available by the time you’re reading this article (or shortly thereafter). The launch quantities are, as AMD puts it, in the “tens of thousands” for the entire 6800 series. Unfortunately they are not providing a breakdown based on card, so we don’t have a solid idea of how much of each card will be available. We do know that all the initial 6870 cards are going to be relabeled reference cards, while the 6850 is launching with a number of custom designs – and in fact a reference 6850 may be hard to come by. We believe this is a sign that most of the card supply will be 6850s with far fewer 6870s being on the market, but this isn’t something we can back up with numbers. Tens of thousands of units may also mean that all the cards are in short supply, as cheaper cards have a tendency to fly off the shelves even faster than expensive cards – and the 5800 series certainly set a record there.
The rest of AMD’s products remain unchanged. The 5700 continues as-is, while the 5800 will be entering its twilight weeks. We’re seeing prices on the cards come down a bit, particularly on the 5850 which is caught between the 6800 cards in performance, but officially AMD isn’t changing the 5800 series pricing. Even with that, AMD expects the remaining card supply to only last through the end of the year.
Countering AMD’s launch, NVIDIA has repriced their own cards. The GTX 460 768MB stays at $169, while the GTX 460 1GB will be coming down to $199, and the GTX 470 is coming down to a mind-boggling $259 (GF100 is not a cheap chip to make, folks!). NVIDIA is also banking on factory overclocked GTX 460 1GB cards, which we’ll get to in a bit. Seeing as how AMD delivered a rude surprise for NVIDIA when they dropped the price of the 5770 series ahead of the GTS 450 launch last month, NVIDIA is at least trying to return the favor.
Ultimately this means we’re looking at staggered pricing. NVIDIA and AMD do not have any products that are directly competing at the same price points: at every $20 you’re looking at switching between AMD and NVIDIA.
|October 2010 Video Card MSRPs|
||$240||Radeon HD 6870|
||$180||Radeon HD 6850|
|$130||Radeon HD 5770|
|$80||Radeon HD 5670/5570|
Barts: The Next Evolution of Cypress
At the heart of today’s new cards is Barts, the first member of AMD’s Northern Islands GPUs. As we quickly hinted at earlier, Barts is a very direct descendant of Cypress. This is both a product of design, and a product of consequences.
It should come as no surprise that AMD was originally looking to produce what would be the Northern Islands family on TSMC’s 32nm process; as originally scheduled this would line up with the launch window AMD wanted, and half-node shrinks are easier for them than trying to do a full-node shrink. Unfortunately the 32nm process quickly became doomed for a number of reasons.
Economically, per-transistor it was going to be more expensive than the 40nm process, which is a big problem when you’re trying to make an economical chip like Barts. Technologically, 32nm was following TSMC’s troubled 40nm process; TSMC’s troubles ended up being AMD’s troubles when they launched the 5800 series last year, as yields were low and wafers were few, right at a time where AMD needed every chip they could get to capitalize on their lead over NVIDIA. 32nm never reached completion so we can’t really talk about yields or such, but it’s sufficient to say that TSMC had their hands full fixing 40nm and bringing up 28nm without also worrying about 32nm.
Ultimately 32nm was canceled around November of last year. But even before that AMD made the hard choice to take a hard turn to the left and move what would become Barts to 40nm. As a result AMD had to make some sacrifices and design choices to make Barts possible on 40nm, and to make it to market in a short period of time.
For these reasons, architecturally Barts is very much a rebalanced Cypress, and with the exception of a few key changes we could talk about Barts in the same way we talked about Juniper (the 5700 series) last year.
Barts continues AMD’s DirectX 11 legacy, building upon what they’ve already achieved with Cypress. At the SPU level, Barts, like Cypress and every DX10 AMD design before it, continues to use AMD’s VLIW5 design. 5 stream processors – the w, x, y, z, and t units – work together with a branch unit and a set of GPRs to process instructions. The 4 simple SPs can work together to process 4 FP32 MADs per clock, while the t unit can either do FP32 math like the other units or handle special functions such as transcendentals. Here is a breakdown of what a single Barts SPU can do in a single clock cycle:
- 4 32-bit FP MAD per clock
- 4 24-bit Int MUL or ADD per clock
- SFU : 1 32-bit FP MAD per clock
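To put those per-clock figures in perspective, here is a rough back-of-the-envelope sketch of the peak FP32 throughput they imply for the two launch cards. It assumes every SP (the four simple units plus the t unit) retires one MAD per clock and that a MAD counts as 2 floating-point operations; the function name is ours, and real-world throughput will be lower.

```python
# Peak FP32 throughput implied by the per-clock breakdown above.
# Assumption: one MAD per SP per clock, and a MAD counts as 2 FLOPs.

def peak_gflops(stream_processors: int, core_clock_mhz: float) -> float:
    """Theoretical peak FP32 GFLOPS for a given SP count and core clock."""
    flops_per_sp_per_clock = 2  # one multiply-add
    return stream_processors * flops_per_sp_per_clock * core_clock_mhz / 1000.0

hd6870_gflops = peak_gflops(1120, 900)  # full Barts
hd6850_gflops = peak_gflops(960, 775)   # harvested Barts

print(f"6870: {hd6870_gflops:.0f} GFLOPS, 6850: {hd6850_gflops:.0f} GFLOPS")
```

These are theoretical ceilings only; VLIW5 designs rarely keep all five units busy on real shader code.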
Compared to Cypress, you’ll note that FP64 performance is not quoted, and this isn’t a mistake. Barts isn’t meant to be a high-end product (that would be the 6900 series) so FP64 has been shown the door in order to bring the size of the GPU down. AMD is still a very gaming-centric company versus NVIDIA’s philosophy of GPU computing everywhere, so this makes sense for AMD’s position, while NVIDIA’s comparable products still offer FP64 if only for development purposes.
Above the SPs and SPUs, we have the SIMD. This remains unchanged from Cypress, with 80 SPs making up a SIMD. The L1 cache and number of texture units per SIMD remain at 16KB L1 texture, 8KB L1 compute, and 4 texture units per SIMD.
At the macro level AMD maintains the same 32 ROP design (which combined with Barts’ higher clocks, actually gives it an advantage over Cypress). Attached to the ROPs are AMD’s L2 cache and memory controllers; there are 4 128KB blocks of L2 cache (for a total of 512KB L2) and 4 64bit memory controllers that give Barts a 256bit memory bus.
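As a quick sanity check on how these building blocks add up, the sketch below models the layout described above as a small data structure. The class and field names are our own invention for illustration; the 14 SIMD figure for the full chip comes from AMD’s chosen design, while the 12 SIMD figure for the 6850 is simply derived from its 960 SPs at 80 SPs per SIMD.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GpuLayout:
    """Hypothetical model of a Barts-style layout; names are illustrative only."""
    simds: int
    sps_per_simd: int = 80
    texture_units_per_simd: int = 4
    l2_blocks: int = 4
    l2_block_kb: int = 128
    memory_controllers: int = 4
    controller_width_bits: int = 64

    @property
    def stream_processors(self) -> int:
        return self.simds * self.sps_per_simd

    @property
    def texture_units(self) -> int:
        return self.simds * self.texture_units_per_simd

    @property
    def l2_total_kb(self) -> int:
        return self.l2_blocks * self.l2_block_kb

    @property
    def bus_width_bits(self) -> int:
        return self.memory_controllers * self.controller_width_bits

barts_full = GpuLayout(simds=14)       # Radeon HD 6870
barts_harvested = GpuLayout(simds=12)  # Radeon HD 6850 (960 SPs / 80 per SIMD)

for name, gpu in (("6870", barts_full), ("6850", barts_harvested)):
    print(name, gpu.stream_processors, "SPs,", gpu.texture_units, "texture units,",
          gpu.l2_total_kb, "KB L2,", gpu.bus_width_bits, "bit bus")
```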
Barts is not just a simple Cypress derivative however. For non-gaming/compute uses, UVD and the display controller have both been overhauled. Meanwhile for gaming Barts did receive one important upgrade: an enhanced tessellation unit. AMD has responded to NVIDIA’s prodding about tessellation at least in part, equipping Barts with a tessellation unit that in the best-case scenario can double their tessellation performance compared to Cypress. AMD has a whole manifesto on tessellation that we’ll get in to, but for now we’ll work with the following chart:
AMD has chosen to focus on tessellation performance at lower tessellation factors, as they believe these are the most important factors for gaming purposes. From their own testing the advantage over Cypress approaches 2x between factors 6 and 10, while being closer to a 1.5x increase before that and after that up to factor 13 or so. At the highest tessellation factors Barts’ tessellation unit falls to performance roughly in line with Cypress’, squeezing out a small advantage due to the 6870’s higher clockspeed. Ultimately this means tessellation performance is improved on AMD products at lower tessellation factors, but AMD’s tessellation performance is still going to more-or-less collapse at high factors when they’re doing an extreme amount of triangle subdivision.
So with all of this said, Barts ends up being 25% smaller than Cypress, but in terms of performance we’ve found it to only be 7% slower when comparing the 6870 to the 5870. How AMD accomplished this is the rebalancing we mentioned earlier.
Based on AMD’s design decisions and our performance data, it would appear that Cypress has more computing/shading power than it necessarily needs. True, Barts is slower, but it’s a bit slower and a lot smaller. AMD’s various compute ratios, such as compute:geometry and compute:rasterization would appear to be less than ideal on Cypress. So Barts changes the ratios.
Compared to Cypress and factoring in 6870/5870 clockspeeds, Barts has about 75% of the compute/shader/texture power of Cypress. However it has more rasterization, tessellation, and ROP power than Cypress; or in other words Barts is less of a compute/shader GPU and a bit more of a traditional rasterizing GPU with a dash of tessellation thrown in. Even in the worst case scenarios from our testing the drop-off at 1920x1200 is only 13% compared to Cypress/5870, so while Cypress had a great deal of compute capabilities, it’s clearly difficult to make extremely effective use of it even on the most shader-heavy games of today.
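The rebalancing is easy to see with some simple arithmetic. The sketch below compares the 6870 against the 5870 on paper; note that the 5870’s 1600 SPs and 850MHz core clock are not quoted in the text above and are filled in here as assumptions from that card’s published specifications. ROP throughput is approximated as ROP count times clock, which ignores everything else that affects fill rate.

```python
# Paper comparison of Barts (6870) against Cypress (5870).
# Assumption: 5870 = 1600 SPs, 32 ROPs, 850MHz core (published specs, not from this text).
hd6870 = {"sps": 1120, "rops": 32, "clock_mhz": 900, "die_mm2": 255}
hd5870 = {"sps": 1600, "rops": 32, "clock_mhz": 850, "die_mm2": 334}

def ratio(metric) -> float:
    """Ratio of a derived metric, 6870 relative to 5870."""
    return metric(hd6870) / metric(hd5870)

shader_ratio = ratio(lambda g: g["sps"] * g["clock_mhz"])   # compute/shader/texture
rop_ratio = ratio(lambda g: g["rops"] * g["clock_mhz"])     # ROP throughput
area_ratio = ratio(lambda g: g["die_mm2"])                  # die size

print(f"shader throughput: {shader_ratio:.1%} of Cypress")
print(f"ROP throughput:    {rop_ratio:.1%} of Cypress")
print(f"die area:          {area_ratio:.1%} of Cypress")
```

On these numbers shader throughput lands at roughly three quarters of Cypress while ROP throughput is slightly ahead, on a die that is roughly a quarter smaller.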
However it’s worth noting that internally AMD was throwing around 2 designs for Barts: a 16 SIMD (1280 SP) 16 ROP design, and a 14 SIMD (1120 SP) 32 ROP design that they ultimately went with. The 14/32 design was faster, but only by 2%. This along with the ease of porting the design from Cypress made it the right choice for AMD, but it also means that Cypress/Barts is not exclusively bound on the shader/texture side or the ROP/raster side.
Along with selectively reducing functional blocks from Cypress and removing FP64 support, AMD made one other major change to improve efficiency for Barts: they’re using Redwood’s memory controller. In the past we’ve talked about the inherent complexities of driving GDDR5 at high speeds, but until now we’ve never known just how complex it is. It turns out that Cypress’s memory controller is nearly twice as big as Redwood’s! By reducing their desired memory speeds from 4.8GHz to 4.2GHz, AMD was able to reduce the size of their memory controller by nearly 50%. Admittedly we don’t know just how much space this design choice saved AMD, but from our discussions with them it’s clearly significant. And it also perfectly highlights just how hard it is to drive GDDR5 at 5GHz and beyond, and why both AMD and NVIDIA cited their memory controllers as some of their biggest issues when bringing up Cypress and GF100 respectively.
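To quantify what AMD gave up for the smaller controller, here is a minimal sketch of the peak memory bandwidth at each data rate on a 256-bit bus. It uses the usual bus-width-times-data-rate approximation; the function name is ours.

```python
def memory_bandwidth_gbs(bus_width_bits: int, effective_rate_ghz: float) -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * effective_rate_ghz

bw_5870 = memory_bandwidth_gbs(256, 4.8)  # Cypress-class controller
bw_6870 = memory_bandwidth_gbs(256, 4.2)  # Redwood-class controller
bw_6850 = memory_bandwidth_gbs(256, 4.0)

print(f"5870: {bw_5870:.1f} GB/s, 6870: {bw_6870:.1f} GB/s, 6850: {bw_6850:.1f} GB/s")
print(f"6870 keeps {bw_6870 / bw_5870:.1%} of the 5870's bandwidth")
```

In other words, a 12.5% cut in peak bandwidth bought AMD a memory controller nearly half the size.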
Ultimately all of these efficiency changes are necessary for AMD to continue to compete in the GPU market, particularly in the face of NVIDIA and the GF104 GPU powering the GTX 460. Case in point, in the previous quarter AMD’s graphics division only made $1mil in profit. While Barts was in design years before that quarter, the situation still succinctly showcases why it’s important to target each market segment with an appropriate GPU; harvested GPUs are only a stop-gap solution, and in the end purposely crippling good GPUs is a good way to cripple a company’s gross margin.
Seeing the Future: DisplayPort 1.2
While Barts doesn’t bring a massive overhaul to AMD’s core architecture, it’s a different story for all of the secondary controllers contained within Barts. Compared to Cypress, practically everything involving displays and video decoding has been refreshed, replaced, or overhauled, making these feature upgrades the defining change for the 6800 series.
We’ll start on the display side with DisplayPort. AMD has been a major backer of DisplayPort since it was created in 2006, and in 2009 they went as far as making DisplayPort part of their standard port configuration for most of the 5000 series cards. Furthermore for AMD DisplayPort goes hand-in-hand with their Eyefinity initiative, as AMD relies on the fact that DisplayPort doesn’t require an independent clock generator for each monitor in order to efficiently drive 6 monitors from a single card.
So with AMD’s investment in DisplayPort it should come as no surprise that they’re already ready with support for the next version of DisplayPort, less than a year after the specification was finalized. The Radeon HD 6800 series will be the first products anywhere shipping with DP1.2 support – in fact AMD can’t even call it DP1.2 Compliant because the other devices needed for compliance testing aren’t available yet. Instead they’re calling it DP1.2 Ready for the time being.
So what does DP1.2 bring to the table? On a technical level, the only major change is that DP1.2 doubles DP1.1’s bandwidth, from 10.8Gbps (8.64Gbps video) to 21.6Gbps (17.28Gbps video); or to put this in DVI terms DP1.2 will have roughly twice as much video bandwidth as a dual-link DVI port. It’s this doubling of DisplayPort’s bandwidth, along with the definition of new standards, that enables DP1.2’s new features.
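For the curious, the raw and video figures fall out of a few lines of arithmetic. The sketch below assumes a 4-lane main link running at 2.7Gbps per lane for DP1.1 and 5.4Gbps per lane for DP1.2, with 8b/10b encoding eating 20% of the raw rate; the dual-link DVI figure assumes two 165MHz links at 24 bits per pixel. None of those per-lane details are spelled out above, so treat them as our assumptions.

```python
# Assumptions: 4 lanes, 8b/10b line coding (8 payload bits per 10 transmitted bits).
LANES = 4
ENCODING_EFFICIENCY = 8 / 10

def link_rates(per_lane_gbps: float) -> tuple[float, float]:
    """Return (raw link rate, usable video rate) in Gbps."""
    raw = LANES * per_lane_gbps
    return raw, raw * ENCODING_EFFICIENCY

dp11_raw, dp11_video = link_rates(2.7)  # DP1.1
dp12_raw, dp12_video = link_rates(5.4)  # DP1.2

# Dual-link DVI: 2 links x 165 MHz pixel clock x 24 bits per pixel.
dl_dvi_video = 2 * 165e6 * 24 / 1e9

print(f"DP1.1: {dp11_raw:.2f} Gbps raw, {dp11_video:.2f} Gbps video")
print(f"DP1.2: {dp12_raw:.2f} Gbps raw, {dp12_video:.2f} Gbps video")
print(f"DL-DVI: {dl_dvi_video:.2f} Gbps, DP1.2 is {dp12_video / dl_dvi_video:.2f}x that")
```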
At the moment the feature AMD is touting the most with DP1.2 is its ability to drive multiple monitors from a single port, which relates directly to AMD’s Eyefinity technology. DP1.2’s bandwidth upgrade means that it has more than enough bandwidth to drive even the largest consumer monitor; more specifically a single DP1.2 link has enough bandwidth to drive 2 2560 monitors or 4 1920 monitors at 60Hz. Furthermore because DisplayPort is a packet-based transmission medium, it’s easy to expand its feature set since devices only need to know how to handle packets addressed to them. For these reasons multiple display support was canonized in to the DP1.2 standard under the name Multi-Stream Transport (MST).
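A rough way to check the multi-monitor claim is to add up the pixel data each display needs and compare it to the 17.28Gbps of video bandwidth. The sketch below assumes “2560” means 2560x1600 and “1920” means 1920x1200, uses 24-bit color, and ignores blanking intervals and MST packet overhead, so it understates the real requirement somewhat.

```python
DP12_VIDEO_GBPS = 17.28  # DP1.2 video bandwidth quoted above

def stream_gbps(width: int, height: int, refresh_hz: int, bits_per_pixel: int = 24) -> float:
    """Active-pixel data rate in Gbps (assumption: blanking overhead ignored)."""
    return width * height * refresh_hz * bits_per_pixel / 1e9

def fits(streams: list[float], link_gbps: float = DP12_VIDEO_GBPS) -> bool:
    """True if the combined streams fit within the link's video bandwidth."""
    return sum(streams) <= link_gbps

two_2560 = [stream_gbps(2560, 1600, 60)] * 2
four_1920 = [stream_gbps(1920, 1200, 60)] * 4
three_2560 = [stream_gbps(2560, 1600, 60)] * 3

print(f"2 x 2560x1600: {sum(two_2560):.2f} Gbps, fits: {fits(two_2560)}")
print(f"4 x 1920x1200: {sum(four_1920):.2f} Gbps, fits: {fits(four_1920)}")
print(f"3 x 2560x1600: {sum(three_2560):.2f} Gbps, fits: {fits(three_2560)}")
```

Even before overhead, a third 2560x1600 display would push past the link’s budget, which lines up with the 2-display limit at that resolution.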
MST, as the name implies, takes advantage of DP1.2’s bandwidth and packetized nature by interleaving several display streams in to a single DP1.2 stream, with a completely unique display stream for each monitor. Meanwhile on the receiving end there are two ways to handle MST: daisy-chaining and hubs. Daisy-chaining is rather self-explanatory, with one DP1.2 monitor plugged in to the next one to pass along the signal to each successive monitor. In practice we don’t expect to see daisy-chaining used much except on prefabricated multi-monitor setups, as daisy-chaining requires DP1.2 monitors and can be clumsy to set up.
The alternative method is to use a DP1.2 MST hub. A MST hub splits up the signal between client devices, and in spite of what the name “hub” may imply a MST hub is actually a smart device – it’s closer to a USB hub in that it’s actively processing signals than it is an Ethernet hub that blindly passes things along. The importance of this distinction is that the MST hub does away with the need to have a DP1.2 compliant monitor, as the hub is taking care of separating the display streams and communicating to the host via DP1.2. Furthermore MST hubs are compatible with adaptors, meaning DVI/VGA/HDMI ports can be created off of a MST hub by using the appropriate active adaptor. At the end of the day the MST hub is how AMD and other manufacturers are going to drive multiple displays from devices that don’t have the space for multiple outputs.
For Barts AMD is keeping parity with Cypress’s display controller, giving Barts the ability to drive up to 6 monitors. Unlike Cypress however, the existence of MST hubs means that AMD doesn’t need to dedicate all the space on a card’s bracket to mini-DP outputs; instead AMD is using 2 mini-DP ports to drive 6 monitors in a 3+3 configuration. This in turn means the Eyefinity6 line as we know it is rendered redundant, as AMD & partners no longer need to produce separate E6 cards now that every Barts card can drive 6 DP monitors. Thus as far as AMD’s Eyefinity initiative is concerned it just became a lot more practical to do a 6 monitor Eyefinity setup on a single card, performance notwithstanding.
For the moment the catch is that AMD is the first company to market with a product supporting DP1.2, putting the company in a chicken & egg position with AMD serving as the chicken. MST hubs and DP1.2 displays aren’t expected to be available until early 2011 (hint: look for them at CES) which means it’s going to be a bit longer before the rest of the hardware ecosystem catches up to what AMD can do with Barts.
Besides MST, DP1.2’s bandwidth has three other uses for AMD: higher resolutions/bitdepths, bitstreaming audio, and 3D stereoscopy. As DP1.1’s video bandwidth was only comparable to DL-DVI, the monitor limits were similar: 2560x2048@60Hz with 24bit color. With double the bandwidth for DP1.2, AMD can now drive larger and/or higher bitdepth monitors over DP; 4096x2160@50Hz for the largest monitors, and a number of lower resolutions with 30bit color. When talking to AMD Senior Fellow and company DisplayPort guru David Glen, higher color depths in particular came up a number of times. Although David isn’t necessarily speaking for AMD here, it’s his belief that we’re going to see color depths become important in the consumer space over the next several years as companies look to add new features and functionality to their monitors. And it’s DisplayPort that he wants to use to deliver that functionality.
Along with higher color depths at higher resolutions, DP1.2 also improves on the quality of the audio passed along by DP. DP1.1 was capable of passing along multichannel LPCM audio, but it only had 6.144Mbps available for audio, which ruled out multichannel audio at high bitrates (e.g. 8 channel LPCM 192kHz/24bit) or even compressed lossless audio. With DP1.2 the audio channel has been increased to 48Mbps, giving DP enough bandwidth for unrestricted LPCM along with support for Dolby and DTS lossless audio formats. This brings it up to par with HDMI, which has been able to support these features since 1.3.
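The audio limits are simple to verify: uncompressed LPCM needs channels times sample rate times bit depth. The short sketch below (the function name is ours) shows why 8 channel 192kHz/24bit audio overflows DP1.1’s 6.144Mbps but fits comfortably in DP1.2’s 48Mbps.

```python
DP11_AUDIO_MBPS = 6.144
DP12_AUDIO_MBPS = 48.0

def lpcm_mbps(channels: int, sample_rate_hz: int, bits_per_sample: int) -> float:
    """Uncompressed LPCM bitrate in Mbps."""
    return channels * sample_rate_hz * bits_per_sample / 1e6

high_end = lpcm_mbps(8, 192_000, 24)  # the 8 channel example from the text
basic = lpcm_mbps(8, 48_000, 16)      # 8 channels at a more modest quality

print(f"8ch 192kHz/24bit: {high_end:.3f} Mbps")
print(f"8ch 48kHz/16bit:  {basic:.3f} Mbps")
print("fits DP1.1:", high_end <= DP11_AUDIO_MBPS, "| fits DP1.2:", high_end <= DP12_AUDIO_MBPS)
```

Incidentally, 8 channels of 48kHz/16bit LPCM work out to exactly 6.144Mbps, which suggests where the DP1.1 figure comes from.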
Finally, much like how DP1.2 goes hand-in-hand with AMD’s Eyefinity initiative, it also goes hand-in-hand with the company’s new 3D stereoscopy initiative, HD3D. We’ll cover HD3D in depth later, but for now we’ll touch on how it relates to DP1.2. With DP1.2’s additional bandwidth it now has more bandwidth than either HDMI 1.4a or DL-DVI, which AMD believes is crucial to enabling better 3D experiences. Case in point, for 3D HDMI 1.4a maxes out at 1080p24 (48Hz total), which is enough for a full resolution movie in 3D but isn’t enough for live action video or 3D gaming, both of which require 120Hz in order to achieve 60Hz in each eye. DP1.2 on the other hand could drive 2560x1600 @ 120Hz, giving 60Hz to each eye at resolutions above full HD.
Ultimately this blurs the line between HDMI and DisplayPort and whether they’re complementary or competitive interfaces, but you can see where this is going. The most immediate benefit would be that this would make it possible to play Blu-Ray 3D in a window, as it currently has to be played in full screen mode when using HDMI 1.4a in order to make use of 1080p24.
In the meantime however the biggest holdup is still going to be adoption. Support for DisplayPort is steadily improving with most Dell and HP monitors now supporting DisplayPort, but a number of other parties still do not support it, particularly among the cheap TN monitors that crowd the market these days. AMD’s DisplayPort ambitions are still reliant on more display manufacturers including DP support on all of their monitors, and retailers like Newegg and Best Buy making it easier to find and identify monitors with DP support. CES 2011 should give us a good indication on how much support there is for DP on the display side of things, as display manufacturers will be showing off their latest wares.
Seeing the Present: HDMI 1.4a, UVD3, and Display Correction
DisplayPort wasn’t the only aspect of AMD’s display controller that got an overhaul, however; AMD’s HDMI capabilities have also been brought up to modern standards. Coming from Cypress with support for HDMI 1.3, AMD now supports HDMI 1.4a on the Barts-based 6800 series, and presumably they will do so on the rest of the 6000 series too. With HDMI 1.4a support AMD can now support full resolution (1080p) 3D stereoscopy for movies, and 720p for games and other material that require 60Hz/eye, along with 4k x 2k resolution for monitors and TVs that have equivalent support. Unlike DP this has less to do with monitors and more to do with TVs, so the importance of this will be seen more on future AMD cards when AMD refreshes their lower-end parts that we normally use with HTPCs.
Launching alongside support for displaying full resolution 3D stereoscopic video is the hardware necessary to decode such video, in the form of the latest version of AMD’s Unified Video Decoder: UVD3. The last time UVD received a major update was with UVD2, which launched alongside the Radeon HD 4000 series and added partial MPEG-2 decoding support by moving IDCT and MoComp from shaders in to the UVD fixed function hardware.
With the Radeon 6800 series AMD is releasing UVD3, which like UVD2 before it builds on the existing UVD feature set. UVD3 is adding support for 3 more-or-less new codecs: MPEG-2, MVC, and MPEG-4 ASP (better known as DivX/XviD). Starting with MPEG-4 ASP, it’s the only codec supported by UVD3 that’s actually new, as previously all MPEG-4 ASP decoding was done in software when it came to AMD GPUs. With UVD3 AMD can now completely offload MPEG-4 ASP decoding to the GPU, bringing forth the usual advantages of greatly reducing the amount of work the CPU needs to do and ideally reducing power consumption in the process.
AMD adding MPEG-4 ASP support gives us an interesting chance to compare and contrast them to NVIDIA, who added similar support a year ago in the GT21x GPUs. AMD is a good bit behind NVIDIA here, but they’re making up for it by launching with much better software support for this feature than NVIDIA did; NVIDIA still does not expose their MPEG-4 ASP decoder in most situations, and overall did a poor job of advertising it. When we talked with DivX (who is AMD’s launch partner for this feature) they didn’t even know that NVIDIA had MPEG-4 ASP support. Meanwhile AMD is launching with DivX and had a beta version of the DivX codec with UVD3 support ready to test, and furthermore AMD is fully exposing their MPEG-4 ASP capabilities in their drivers as we see in this DXVA Checker screenshot.
The only downside at this time is that even with Microsoft’s greater focus on codecs for Windows 7, Windows 7 doesn’t know what to do with DXVA acceleration of MPEG-4 ASP. So while Win7 can play MPEG-4 ASP in software, you’re still going to need a 3rd party codec like the DivX codec to get hardware support for MPEG-4 ASP.
The other bit worth mentioning is that while AMD is launching support for MPEG-4 ASP decoding here on the 6800 series, much like HDMI 1.4a it’s not going to be a big deal for the 6800 series market. MPEG-4 ASP is a fairly lightweight codec, so support for it is going to be a bigger deal on low-end products, particularly AMD’s APUs if Llano and Bobcat end up using UVD3, as MPEG-4 ASP decoding in software requires a much greater share of resources on those products.
Up next is MPEG-2, which has been a codec stuck in limbo for quite some time over at AMD. MPEG-2 is even older and easier to decode than MPEG-4 ASP, and while GPUs have supported MPEG-2 decode acceleration as early as last decade, CPUs quickly became fast enough that, when combined with low levels of hardware decode acceleration (inverse discrete cosine transform), they were more than enough to play MPEG-2 content. Thus AMD hasn’t done much with MPEG-2 over the years other than moving IDCT/MoComp from the shaders to UVD for UVD2.
Because of the similarities between MPEG-4 ASP and MPEG-2, when AMD added support for full MPEG-4 ASP decode acceleration they were able to easily add support for full MPEG-2 decode acceleration as well, reusing the MPEG-4 ASP entropy decode block for MPEG-2. Even more so than MPEG-4 ASP however, the benefits for this are going to lie with AMD’s low-end products where getting MPEG-2 off of the CPU should be a boon for battery life.
The final addition to UVD3 is support for Multiview Video Coding, which isn’t a new codec per se, but rather is an extension to H.264 for 3D stereoscopy. H.264 needed to be amended to support the packed frame formats used to store and transmit 3D stereoscopic videos, so with UVD3 AMD is adding support for MVC so that UVD can handle Blu-Ray 3D.
Finally, coupled with support for new codecs and new display outputs is a refinement of the existing color correction capabilities in AMD’s display controller. Cypress and the rest of the 5000 series could do color correction directly on their display controllers, but they could only do so after gamma correction was applied, meaning they had to work in the non-linear gamma color space. Technically speaking this worked, but color accuracy suffered as a result. With the 6800 series’ new display controller, AMD can now perform color calibration in linear space by converting the image from gamma to linear color space for the color correction, before converting it back to gamma color space for display purposes.
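To illustrate why the order of operations matters, here is a minimal sketch contrasting the two approaches. It is not AMD’s implementation: we assume the standard sRGB transfer function stands in for “gamma”, and the 3x3 correction matrix is a made-up example rather than a real calibration profile.

```python
def srgb_to_linear(c: float) -> float:
    """Decode one sRGB-encoded channel (0..1) to linear light."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c: float) -> float:
    """Encode one linear-light channel (0..1) back to sRGB."""
    return c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

def apply_matrix(matrix, rgb):
    """Multiply a 3x3 matrix by an RGB triple, clamping the result to 0..1."""
    return [min(1.0, max(0.0, sum(m * c for m, c in zip(row, rgb)))) for row in matrix]

def correct_in_gamma_space(rgb, matrix):
    """Older approach: apply the correction directly to gamma-encoded values."""
    return apply_matrix(matrix, rgb)

def correct_in_linear_space(rgb, matrix):
    """Newer approach: gamma -> linear, correct, then linear -> gamma."""
    linear = [srgb_to_linear(c) for c in rgb]
    corrected = apply_matrix(matrix, linear)
    return [linear_to_srgb(c) for c in corrected]

# Hypothetical correction matrix that pulls a little saturation out of red and blue.
EXAMPLE_MATRIX = [[0.9, 0.1, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.1, 0.9]]
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

pure_red = [1.0, 0.0, 0.0]
gamma_result = correct_in_gamma_space(pure_red, EXAMPLE_MATRIX)
linear_result = correct_in_linear_space(pure_red, EXAMPLE_MATRIX)
round_trip = correct_in_linear_space([0.5, 0.25, 0.75], IDENTITY)

print("gamma-space result: ", [round(c, 4) for c in gamma_result])
print("linear-space result:", [round(c, 4) for c in linear_result])
```

The two paths produce visibly different red values from the same matrix, which is the accuracy gap the linear-space path is meant to close.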
As color correction is being used to correct for wide-gamut monitors the importance of this change won’t be seen right away for most users, but as wide-gamut monitors become more widespread color correction becomes increasingly important since wide-gamut monitors will misinterpret the normal sRGB colorspace that most rendering is done in.
High IQ: AMD Fixes Texture Filtering and Adds Morphological AA
“There’s nowhere left to go for quality beyond angle-independent filtering at the moment.”
With the launch of the 5800 series last year, I had high praise for AMD’s anisotropic filtering. AMD brought truly angle-independent filtering to gaming (and are still the only game in town), putting an end to angle-dependent deficiencies and especially AMD’s poor AF on the 4800 series. At both the 5800 series launch and the GTX 480 launch, I’ve said that I’ve been unable to find a meaningful difference or deficiency in AMD’s filtering quality, and NVIDIA was only deficient by being not quite angle-independent. I have held – and continued to hold until last week – the opinion that there’s no practical difference between the two.
It turns out I was wrong. Whoops.
The same week as when I went down to Los Angeles for AMD’s 6800 series press event, a reader sent me a link to a couple of forum topics discussing AF quality. While I still think most of the differences are superficial, there was one shot comparing AMD and NVIDIA that caught my attention: Trackmania.
The shot clearly shows a transition between mipmaps on the road, something filtering is supposed to resolve. In this case it’s not a superficial difference; it’s very noticeable and very annoying.
AMD appears to agree with everyone else. As it turns out their texture mapping units on the 5000 series really do have an issue with texture filtering, specifically when it comes to “noisy” textures with complex regular patterns. AMD’s texture filtering algorithm was stumbling here and not properly blending the transitions between the mipmaps of these textures, resulting in the kind of visible transitions that we saw in the above Trackmania screenshot.
|Radeon HD 5870||Radeon HD 6870||GeForce GTX 480|
So for the 6800 series, AMD has refined their texture filtering algorithm to better handle this case. Highly regular textures are now filtered properly so that there’s no longer a visible transition between them. As was the case when AMD added angle-independent filtering we can’t test the performance impact of this since we don’t have the ability to enable/disable this new filtering algorithm, but it should be free or close to it. In any case it doesn’t compromise AMD’s existing filtering features, and goes hand-in-hand with their existing angle-independent filtering.
At this point we’re still working on recreating the Trackmania scenario for a proper comparison (which we’ll add to this article when it’s done), but so far it looks good – we aren’t seeing the clear texture transitions that we do on the 5800 series. In an attempt to not make another foolish claim I’m not going to call it perfect, but from our testing we can’t find any clear game examples of where the 6870’s texture filtering is deficient compared to NVIDIA’s – they seem to be equals once again. And even the 5870 with its regular texture problem still does well in everything we’ve tested except Trackmania. As a result I don’t believe this change will be the deciding factor for most people besides the hardcore Trackmania players, but it’s always great to see progress on the texture filtering front.
Moving on from filtering, there’s the matter of anti-aliasing. AMD’s AA advantage from the launch of the 5800 series has evaporated over the last year with the introduction of the GeForce 400 series. With the GTX 480’s first major driver update we saw NVIDIA enable their transparency supersampling mode for DX10 games, on top of their existing ability to use CSAA coverage samples for Alpha To Coverage sampling. The result was that under DX10 NVIDIA has a clear advantage in heavily aliased games such as Crysis and Bad Company 2, where TrSS could smooth out many of the jaggies for a moderate but reasonable performance hit.
For the 6800 series AMD is once again working on their AA quality. While not necessarily a response to NVIDIA’s DX10/DX11 TrSS/SSAA abilities, AMD is introducing a new AA mode, Morphological Anti-Aliasing (MLAA), which should make them competitive with NVIDIA on DX10/DX11 games.
In a nutshell, MLAA is a post-process anti-aliasing filter. Traditional AA modes operate on an image before it’s done rendering and all of the rendering data is thrown away; MSAA for example works on polygon edges, and even TrSS needs to know where alpha covered textures are. MLAA on the other hand is applied to the final image after rendering, with no background knowledge of how it’s rendered. Specifically MLAA is looking for certain types of high-contrast boundaries, and when it finds them it treats them as if they were an aliasing artifact and blends the surrounding pixels to reduce the contrast and remove the aliasing.
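As a rough illustration of what "post-process" means here, the sketch below looks only at a finished grayscale image, flags pixels that sit on a high-contrast boundary, and blends them toward their neighbours. It is a deliberately crude stand-in: real MLAA classifies edge shapes and computes coverage-based blend weights, and nothing below reflects AMD’s implementation. The threshold and strength parameters are our own assumptions.

```python
def postprocess_aa(image, threshold=0.5, strength=0.5):
    """Soften high-contrast boundaries using only the final image.

    image: 2D list of luminance values in [0, 1].
    No geometry, depth, or alpha information is consulted.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # read from image, write to out
    for y in range(height):
        for x in range(width):
            centre = image[y][x]
            # Gather the 4-connected neighbours that exist at this pixel.
            neighbours = []
            if x > 0:
                neighbours.append(image[y][x - 1])
            if x < width - 1:
                neighbours.append(image[y][x + 1])
            if y > 0:
                neighbours.append(image[y - 1][x])
            if y < height - 1:
                neighbours.append(image[y + 1][x])
            # Treat any sufficiently large jump as an aliasing artifact.
            if neighbours and max(abs(centre - n) for n in neighbours) > threshold:
                average = sum(neighbours) / len(neighbours)
                out[y][x] = centre + (average - centre) * strength
    return out


# A hard vertical edge: dark on the left, bright on the right.
edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]
for row in postprocess_aa(edge):
    print(row)
```

Because the filter cannot tell a real aliased edge from legitimate high-contrast detail such as text, it will blur both.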
MLAA is not a new AA method, but it is the first time we’re seeing it on a PC video card. It’s already in use on video game consoles, where it’s a cheap way to implement AA without requiring the kind of memory bandwidth MSAA requires. In fact it’s an all-around cheap way to perform AA, as it doesn’t require too much computational time either.
For the 6800 series, AMD is implementing MLAA as the ultimate solution to anti-aliasing. Because it’s a post-processing filter, it is API-agnostic, and will work with everything. Deferred rendering? Check. Alpha textures? Done. Screwball games like Bad Company 2 that alias everywhere? Can do! And it should be fast too; AMD says it’s no worse than their Edge Detect AA mode.
So what’s the catch? The catch is that it’s a post-processing filter; it’s not genuine anti-aliasing as we know it because it’s not operating on the scene as it’s being rendered. Where traditional AA uses the rendering data to determine exactly what, where, and how to anti-alias things, MLAA is effectively a best-guess at anti-aliasing the final image. Based on what we’ve seen so far we expect that it’s going to try to anti-alias things from time to time that don’t need it, and that the resulting edges won’t be quite as well blended as with MSAA/SSAA. SSAA is still going to offer the best image quality (and this is something AMD has available under DX9), while MSAA + transparency/adaptive anti-aliasing will be the next best method.
Unfortunately AMD only delivered the drivers that enable MLAA yesterday, so we haven’t had a chance to go over the quality of MLAA in-depth. As it’s a post-processing filter we can actually see exactly how it affects images (AMD provides a handy tool to do this) so we’ll update this article shortly with our findings.
Finally, for those of you curious how this is being handled internally, this is actually being done by AMD’s drivers through a DirectCompute shader. Furthermore they’re taking heavy advantage of the Local Data Store of their SIMD design to keep adjacent pixels in memory to speed it up, with this being the biggest reason why it has such a low amount of overhead. Since it’s a Compute Shader, this also means that it should be capable of being back-ported to the 5000 series, although AMD has not committed to this yet. There doesn’t appear to be a technical reason why this isn’t possible, so ultimately it’s up to AMD and if they want to use it to drive 6800 series sales over 5000 series sales.
What’s In a Name?
GPU naming is rarely consistent. While NVIDIA is usually the biggest perpetrator of naming confusion and suddenly switched names, AMD does not have a clean record either (the Mobility 5100 series comes to mind). However we’re not sure there’s precedent for AMD’s latest naming decision, and there’s really no stepping around it. So we have a few thoughts we’d like to share.
Since the introduction of the Radeon 3870 in 2007, 800 has been the series designation for AMD’s high-end products. The only time they’ve broken this is last year, when AMD ditched the X2 moniker for their dual-GPU card for the 5900 designation, a move that ruffled a few feathers but at least made some sense since the 5970 wasn’t a true 5870 X2. Regardless, the 800 series has since 2007 been AMD’s designation for their top single-chip product.
With that naming scheme come expectations of performance. Each 800 series card has been successively faster, and while pricing has been inconsistent as AMD’s die size and costs have shifted, ultimately each 800 series card was a notable step up in performance from the previous card. With the 6800 this is not the case. In fact it’s absolutely a step down: the 6800 series is on average 7% slower than the 5800 series. This doesn’t mean that AMD hasn’t made enhancements to the card – we’ve already covered the enhanced tessellation unit, AA/AF, UVD3, and other features – but these are for the most part features and not performance enhancements.
Today AMD is turning their naming scheme on its head by launching these Barts cards with the 6800 name, but without better-than-5800 performance. AMD’s rationale for doing this is that they’re going to be continuing to sell the 5700 series, and that as a result they didn’t want to call these cards the 6700 series and introduce confusion. Furthermore AMD is trying to recapture the glory days of the 4800 series, where those parts sold for under $300 and then quickly under $200. It wasn’t until the 5800 series that an 800 series card became outright expensive. So for these reasons, AMD wanted to call these Barts cards the 6800 series.
We find ourselves in disagreement with AMD here.
We don’t have a problem with AMD introducing the 6 series here – the changes they’ve made, even if not extreme, at least justify that. But there’s a very real issue of creating confusion for buyers of the 5800 series now by introducing the 6800 series. The performance may be close and the power consumption lower, but make no mistake, the 5800 series was faster.
Ultimately this is not our problem; this is AMD’s problem. So we can’t claim harm per se, but we can reflect on matters. The Barts cards being introduced today should have been called the 6700 series. It would have made the latest rendition of the 700 series more expensive than last time, but at the same time Barts is a very worthy upgrade to the 5700 series. But then that’s the problem for AMD; they don’t want to hurt sales of the 5700 series while it’s still on the market.
NVIDIA’s 6870 Competitor & the Test
As we mentioned on the front page of this article, AMD and NVIDIA don’t officially have competing products at the same price points. The 6870 and 6850 are more expensive than the GTX 460 1GB and 768MB respectively, and above the 6870 is the GTX 470. However NVIDIA is particularly keen to have a competitor to the 6870 that isn’t a GTX 470, and so they’re pushing a 2nd option: a factory overclocked GTX 460 1GB.
As a matter of editorial policy we do not include overclocked cards on general reviews. As a product, reference cards will continue to be produced for quite a while, with good products continuing on for years. Overclocked cards on the other hand come and go depending on market conditions, and even worse no two overclocked cards are alike. If we did normally include overclocked cards, our charts would be full of cards that are only different by 5MHz.
However with the 6800 launch NVIDIA is pushing the overclocked GTX 460 option far harder than we’ve seen them push overclocked cards in the past – we had an EVGA GTX 460 1GB FTW on our doorstep before we were even back from Los Angeles. Given how well the GTX 460 overclocks and how many heavily overclocked cards there are on the market, we believe there is at least some merit to NVIDIA’s arguments, so in this case we went ahead and included the EVGA card in our review. As a reference point it’s clocked at 850MHz core and 4GHz memory versus 675MHz core and 3.6GHz memory for a stock GTX 460, giving it a massive 26% core overclock and a much more moderate 11% memory overclock.
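For anyone who wants to check the overclock figures, the arithmetic is simply the ratio of the overclocked speed to the stock speed. The helper name below is our own; the clocks are the ones quoted above, with the memory expressed as 3600MHz and 4000MHz effective.

```python
def overclock_percent(stock, overclocked):
    """Percentage increase of the overclocked speed over the stock speed."""
    return (overclocked / stock - 1.0) * 100.0


core = overclock_percent(675, 850)        # core clock in MHz
memory = overclock_percent(3600, 4000)    # effective memory clock in MHz
print(f"core: {core:.0f}%, memory: {memory:.0f}%")  # core: 26%, memory: 11%
```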
However with that we’ll attach the biggest disclaimer we can that while we’re including the card, we don’t believe NVIDIA is taking the right action here. If they were serious about having a higher clocked GTX 460 on the market, then they need to make a new product, such as a GTX 461. Without NVIDIA establishing guidelines, these overclocked GTX 460 cards can vary in clockspeed, cooling, and ultimately performance by a very wide margin. In primary reviews such as these we’re interested in looking at cards that will be around for a while, and without an official product from NVIDIA there’s no guarantee any of these factory overclocked cards will still be around.
If nothing else, pushing overclocked cards makes for a messy situation for buyers. An official product provides a baseline of performance that buyers can see in reviews like ours and expect in any cards they buy. With overclocked cards, this is absent. Pushing factory overclocked cards may give NVIDIA a competitive product, but it’s being done in a way we can’t approve of.
Moving on, for today’s launch we’re using AMD’s latest beta launch drivers, version 8.782RC2, which is analogous to Catalyst 10.10. For the NVIDIA cards we’re using the WHQL version of 260.89.
Keeping with our desire to periodically refresh our benchmark suite, we’ve gone ahead and shuffled around a few benchmarks. We’ve dropped Left 4 Dead (our highest performing benchmark) and the DX11 rendition of BattleForge for Civilization 5 and Metro 2033 respectively, both running in DX11 mode.
With the refresh in mind, we’ve had to cut short our usual selection of cards, as we’ve had under a week to (re)benchmark everything and to write this article, shorter than what we usually have for an article of this magnitude. We’ll be adding these new cards and the rest of our normal lineup to the GPU Bench early next week when we finish benchmarking them.
|CPU:||Intel Core i7-920 @ 3.33GHz|
|Motherboard:||Asus Rampage II Extreme|
|Chipset Drivers:||Intel 188.8.131.525 (Intel)|
|Hard Disk:||OCZ Summit (120GB)|
|Memory:||Patriot Viper DDR3-1333 3 x 2GB (7-7-7-20)|
|Video Cards:||AMD Radeon HD 6870, AMD Radeon HD 6850, AMD Radeon HD 5870, AMD Radeon HD 5850, AMD Radeon HD 5770, AMD Radeon HD 4870, NVIDIA GeForce GTX 480, NVIDIA GeForce GTX 470, NVIDIA GeForce GTX 460 1GB, NVIDIA GeForce GTX 460 768MB, NVIDIA GeForce GTS 450, NVIDIA GeForce GTX 285, NVIDIA GeForce GTX 260 Core 216, EVGA GeForce GTX 460 1GB FTW|
|Video Drivers:||NVIDIA ForceWare 260.89, AMD Catalyst 10.10|
|OS:||Windows 7 Ultimate 64-bit|
Kicking things off as always is Crysis: Warhead, still the toughest game in our benchmark suite. Even 2 years since the release of the original Crysis, “but can it run Crysis?” is still an important question, and the answer continues to be “no.” One of these years we’ll actually be able to run it with full Enthusiast settings…
For reasons we’ve yet to determine, Crysis continues to do a very good job serving as an overall barometer for video card performance. Much of what we see here will show up later, including the order that cards fall in.
As we’ve been expecting, the 6800 series cannot keep up with the 5800 series – Barts is still a “rebalanced” Cypress after all. The performance gap isn’t too severe, and it certainly couldn’t justify 5870 prices at today’s prices, but the 6870 and 6850 definitely aren’t perfect replacements for their 5800 series counterparts.
Focusing on 1920x1200, we have a 3-way race between the GTX 470, EVGA GTX 460, and the 6870. The 6870 comes out ahead, with the EVGA and then the GTX 470 bringing up the rear at under a frame behind. Meanwhile near the 6850 is the GTX 460 1GB, which is 2fps behind, while even farther down the line is the GTX 460 768MB, which officially is only $10 cheaper than the 6850 and yet is well behind the pack. As we’ll see, the 6850 will quickly assert itself as the GTX 460 1GB’s peer when it comes to performance.
Meanwhile taking a quick look at Crossfire performance we see an interesting trend: the 6800 series cards are much closer to their 5800 series counterparts than they are in single card mode. Here the 6850CF even manages to top the 5850CF, an act that nearly defies logic. This is something we’ll have to keep an eye on in later results.
Moving on to our minimums, the picture changes slightly in NVIDIA’s favor. The 6870 drops to the bottom of its pack, while the 6850’s lead narrows versus both GTX 460 cards. Meanwhile in CF mode now both 6800 series cards top their 5800 series counterparts. Crysis’ minimum framerate has always been a bit brutal to AMD cards due to how AMD’s drivers manage their memory, a problem compounded by Crossfire mode. Perhaps something has changed?
Up next is BattleForge, Electronic Arts’ free to play online RTS. As far as RTSes go this game can be quite demanding, and this is without the game’s DX11 features.
Unlike Crysis, BattleForge delivers distinctly different results. The GTX 470 is the clear winner by nearly 15%, while the EVGA GTX 460 and the 6870 are neck-and-neck. Meanwhile the 6850 takes a convincing lead over the stock-clocked GTX 460s. Both 6800 series cards end up falling where we would expect them to, splitting either side of the 5850 and falling behind the 5870.
Meanwhile we once more see unusual Crossfire results; the 6800 series doesn’t beat the 5800 series this time, but it once again closes the gap left by the individual cards.
The third game on our list is one of the games new to our benchmark suite: Starcraft II. Under normal circumstances Starcraft II isn’t particularly GPU limited – it spends much of its time CPU limited – but in 2-player battles in particular we find that it responds more to GPU performance than it does CPU performance. As icing on the cake we have 4x AA enabled, which thanks to the deferred rendering workarounds used by AMD and NVIDIA, crushes the performance of all the cards involved.
Do note that due to the amount of time it takes to run this benchmark we had to cut our testing short. We’ll have our full results in Bench next week.
When we were running our benchmarks for Starcraft, Anand asked me more than once whether our results were a fluke. At 2560 the Radeon cards trample the GTX 460, and at 1920 it’s still lopsided, even though the 6850 only breaks even in most other games. Compared to the other Radeon cards the 6000 series continues to perform near the 5850, but otherwise the GTX 460 is at a clear disadvantage.
The second new game on our list is 4A Games’ Metro 2033, their tunnel shooter released earlier this year. Last month the game finally received a major patch resolving some outstanding image quality issues with the game, finally making it suitable for use in our benchmark suite. At the same time a dedicated benchmark mode was added to the game, giving us the ability to reliably benchmark much more stressful situations than we could with FRAPS. If Crysis is a tropical GPU killer, then Metro would be its underground counterpart.
From the moment you run Metro it’s clear that it’s going to be a shader-heavy game, and the results from our benchmarks mirror this. The 6800 series does particularly poorly compared to the 5800 series here as a result of the loss of shaders, giving the 5800 series a solid 10% lead over the 6800 series, and showcasing why AMD’s rebalanced design for Barts comes with its own set of tradeoffs.
In the 6870 pack, we’re looking at a dead heat; how Metro reports averages creates a wider spread than the actual performance differences in these cards. Metro is a hard game but it’s a fair game: everything is equally slow. Meanwhile the 6850 manages a small lead over the GTX 460 1GB.
Finally, looking once more at Crossfire scores we see the same mysterious pattern: the 6800 series nearly closes the 5800 series gap.
Ubisoft’s 2008 aerial action game is one of the less demanding games in our benchmark suite, particularly for the latest generation of cards. However it’s fairly unique in that it’s one of the few flying games of any kind that comes with a proper benchmark.
Unlike our previous shader-bound games, HAWX is a game that’s light on the shaders and comparatively heavier on geometry, texturing, and general rasterization. As a result it’s one of the best games for the Barts architecture, as the 6800 series comes out only a frame behind their 5800 series counterparts thanks to the equal number of ROPs and the higher clockspeeds of the 6800 series. In this game at least, 6800 and 5800 are equals.
Unfortunately for AMD, both generations may be equal, but compared to NVIDIA they’re equally slow. The EVGA GTX 460 and the GTX 470 enjoy a healthy 10% lead over the 5870/6870, while the 6850 has more in common with the GTX 460 768MB than it does the GTX 460 1GB.
Meanwhile in an action that blows our mind, the 6800 series cards in Crossfire manage to convincingly beat the 5800 series in Crossfire. Admittedly we’re talking about a difference that’s academic (169fps vs 154fps) but it’s as clear a sign as any that something special is going on with the 6800 series in Crossfire.
The last new game in our benchmark suite is Civilization 5, the latest incarnation in Firaxis Games’ series of turn-based strategy games. Civ 5 gives us an interesting look at things that not even RTSes can match, with a much weaker focus on shading in the game world, and a much greater focus on creating the geometry needed to bring such a world to life. In doing so it uses a slew of DirectX 11 technologies, including tessellation for said geometry and compute shaders for on-the-fly texture decompression.
It’s also one of the few games banned at AnandTech, as “one more turn” and article deadlines are rarely compatible.
Civ 5 has given us benchmark results that quite honestly we have yet to fully appreciate. A tight clustering of results would normally indicate that we’re CPU bound, but the multi-GPU results – particularly for the AMD cards – turn this concept on its head by improving performance by 47% anyhow. The most telling results however are found in the GTX 460 cards, where there’s a clear jump in performance going from the 768MB card to the 1GB card, and again from the 1GB card to the EVGA card. The 1GB GTX 460 only improves on memory, memory bandwidth, and ROPs, greatly narrowing down the factors. No one factor can explain our results, but we believe we’re almost simultaneously memory and geometry bound.
With that in mind, this is clearly a game that benefits NVIDIA’s GPUs right now when we’re looking at single-GPU performance. This likely comes down to NVIDIA’s greater geometry capabilities, but we’re not willing to rule out drivers quite yet, particularly when a partially CPU-bound game comes into play. In any case NVIDIA’s advantage leads to their wiping the floor with AMD here, as even the mere GTX 460 768MB can best a 5870, let alone the 6800 series.
Crossfire changes things up, but only because NVIDIA apparently does not have a SLI profile for Civ 5 at this time.
The latest game in the Battlefield series - Bad Company 2 – remains as one of the cornerstone DX11 games in our benchmark suite. As BC2 doesn’t have a built-in benchmark or recording mode, here we take a FRAPS run of the jeep chase in the first act, which as an on-rails portion of the game provides very consistent results and a spectacle of explosions, trees, and more.
Our experience with Bad Company 2 more or less matches our experiences with other shader-heavy games at 1920x1200. The GTX 470, 6870, and EVGA GTX 460 all vie for the top of their pack within a frame of each other, while the 6850 enjoys a clear lead over the GTX 460 1GB. However what’s interesting is that the Radeon 5800 series takes a very obvious lead here, a lead that’s larger than in most other games. If you ever wanted to know just how shader-bound Bad Company 2 is, there’s the answer you’re looking for.
As for the Crossfire situation, once again the 6800 series closes the gap. Even the GTX 470 in SLI can’t quite keep up with the 6850CF, which is as much a story of how well the game runs on AMD cards as it is a story of what’s clearly going on with the 6800 series and Crossfire.
The 3rd game in the STALKER series continues to build on GSC Game World’s X-Ray Engine by adding DX11 support, tessellation, and more. This also makes it another one of the highly demanding games in our benchmark suite.
STALKER is a game that seems to have gone back and forth some with driver optimizations. In the latest round, if anything, the NVIDIA cards seem to have a slight edge against the 6870. Here the 6870 is competitive with the EVGA GTX 460, however it’s nearly 10% behind the GTX 470. The 6850 is far more normal, more or less tying with the GTX 460 1GB and enjoying the usual lead over the 768MB version. Meanwhile compared to the 5800 series the 6800 series does rather poorly here, as STALKER is another game that’s shader-bound; the 6870 only takes a slight lead over the 5850, and is several frames per second behind the 5870.
As for our Crossfire cards, they have once again performed another miracle. Being shader bound doesn’t stop the 6870CF from beating the 5870CF while the 6850CF ties the 5850CF.
Codemasters’ 2009 off-road racing game continues its reign as the token racer in our benchmark suite. As the first DX11 racer, DIRT 2 makes pretty thorough use of DX11’s tessellation abilities, not to mention still being the best looking racer we have ever seen.
For whatever reason DIRT 2 is a game that NVIDIA cards have done and continue to do well at, and today’s launch doesn’t change that. The 6850 falls to even the GTX 460 768MB, while the 6870 at least bests the stock-clocked GTX 460 1GB, but falls to both the GTX 470 and EVGA GTX 460 by double-digit percentages. AMD may win at other games, but DIRT 2 is a game where they have to take it on the chin.
Electronic Arts’ space-faring RPG is our Unreal Engine 3 game. While it doesn’t have a built in benchmark, it does let us force anti-aliasing through driver control panels, giving us a better idea of UE3’s performance at higher quality settings. Since we can’t use a recording/benchmark in ME2, we use FRAPS to record a short run.
Mass Effect 2 ends up being another game that AMD struggles at. It’s not a game we’d normally classify as being shader-bound, but looking at the 6800 series performance relative to the 5800 series, it’s either that or texture-bound. In any case the 6850 is still competitive with the GTX 460 1GB, but the 6870 falls behind NVIDIA’s competing cards by several frames per second.
Meanwhile it’s interesting to note that this is the only game where the 6800 series doesn’t dig itself out of a hole in CF mode. Both the 6850CF and 6870CF are where we’d expect them to be relative to the 5870CF/5850CF if we took a simple extrapolation of their performance.
Finally among our revised benchmark suite we have Wolfenstein, the most recent game to be released using the id Tech 4 engine. All things considered it’s not a very graphically intensive game, but at this point it’s the most recent OpenGL title available. It’s more than likely the entire OpenGL landscape will be thrown upside-down once id releases Rage next year.
Wolfenstein isn’t particularly shader-heavy, but it’s not ROP-heavy either. As a result the 6800 series can’t quite close the gap here like it can on HAWX, although it’s certainly closer than when we’re on a shader-heavy game. Looking at the competition, the GTX 470 strangely plummets here, leaving the 6870 and EVGA GTX 460 to compete at the top of their class. Meanwhile the 6850 is tied up with the GTX 460 1GB. Our Crossfire cards also end up neck-and-neck, but this appears to have more to do with the fact that we’re CPU bound once we hit 110fps or so.
For a while now we’ve been trying to establish a proper cross-platform compute benchmark suite to add to our GPU articles. It’s not been entirely successful.
While GPUs have been compute capable in some form since 2006 with the launch of G80, and AMD significantly improved their compute capabilities in 2009 with Cypress, the software has been slow to catch on. From gatherings such as NVIDIA’s GTC we’ve seen first-hand how GPU computing is being used in the high-performance computing market, but the consumer side hasn’t materialized as quickly as the right situations for using GPU computing aren’t as straightforward and many developers are unwilling to attach themselves to a single platform in the process.
2009 saw the ratification of OpenCL 1.0 and the launch of DirectCompute, and while the launch of these cross-platform APIs removed some of the roadblocks, we heard as recently as last month from Adobe and others that there’s still work to be done before companies can confidently deploy GPU compute accelerated software. The immaturity of OpenCL drivers was cited as one cause, however there’s also the fact that a lot of computers simply don’t have a suitable compute-capable GPU – it’s Intel that’s the world’s biggest GPU vendor after all.
So here in the fall of 2010 our search for a wide variety of GPU compute applications hasn’t panned out quite like we expected it to. Widespread adoption of GPU computing in consumer applications is still around the corner, so for the time being we have to get creative.
With that in mind we’ve gone ahead and cooked up a new GPU compute benchmark suite based on the software available to us. On the consumer side we have the latest version of Cyberlink’s MediaEspresso video encoding suite and an interesting sub-benchmark from Civilization V. On the professional side we have SmallLuxGPU, an OpenCL based ray tracer. We don’t expect this to be the be all and end all of GPU computing benchmarks, but it gives us a place to start and allows us to cover both cross-platform APIs and NVIDIA & AMD’s platform-specific APIs.
Our first compute benchmark comes from Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of their texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes.
In our look at Civ V’s performance as a game, we noted that it favors NVIDIA’s GPUs at the moment, and this may be part of the reason why. NVIDIA’s GPUs clean up here, particularly when compared to the 6800 series and its reduced shader count. Furthermore within the GPU families the results are very straightforward, with the order following the relative compute power of each GPU. To be fair to AMD they made a conscious decision to not chase GPU computing performance with the 6800 series, but as a result it fares poorly here.
Our second compute benchmark is Cyberlink’s MediaEspresso 6, the latest version of their GPU-accelerated video encoding suite. MediaEspresso 6 doesn’t currently utilize a common API, and instead has codepaths for both AMD’s APP (née Stream) and NVIDIA’s CUDA APIs, which gives us a chance to test each API with a common program bridging them. As we’ll see this doesn’t necessarily mean that MediaEspresso behaves similarly on both AMD and NVIDIA GPUs, but for MediaEspresso users it is what it is.
We decided to go ahead and use MediaEspresso in this article not knowing what we’d find, and it turns out the results were both more and less than we were expecting at the same time. While our charts don’t show it, video transcoding isn’t all that GPU intensive with MediaEspresso; once we achieve a certain threshold of compute performance on a GPU – such as a GTX 460 in the case of an NVIDIA card – the rest of the process is CPU bottlenecked. As a result all of our Fermi NVIDIA cards at the GTX 460 or better take just as long to encode our sample video, and while the AMD cards show some stratification, it’s on the order of only a couple of seconds. From this it’s clear that with Cyberlink’s technology having a GPU is going to help, but it can’t completely offload what’s historically been a CPU-intensive activity.
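The plateau described above is what a simple bottleneck model predicts. The sketch below is a toy model under our own assumptions: the CPU and GPU stages are taken to overlap, so the slower stage sets the total time. The numbers are made up for illustration and are not measurements from our testing.

```python
def encode_seconds(cpu_seconds, gpu_work_units, gpu_units_per_second):
    """Toy model: CPU and GPU stages overlap, so the slower stage sets the pace."""
    gpu_seconds = gpu_work_units / gpu_units_per_second
    return max(cpu_seconds, gpu_seconds)


# Illustrative values only (hypothetical, not benchmark data).
CPU_SECONDS = 30.0
GPU_WORK = 600.0

for throughput in (10.0, 20.0, 40.0, 80.0):
    # Past the point where the GPU stage takes 30 seconds, a faster GPU changes nothing.
    print(throughput, encode_seconds(CPU_SECONDS, GPU_WORK, throughput))
```

In this model every GPU past the threshold finishes in the same time, which matches the flat results we see from the GTX 460 upward.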
As for an AMD/NVIDIA cross comparison, the results are straightforward but not particularly enlightening. It turns out that MediaEspresso 6 is significantly faster on NVIDIA GPUs than it is on AMD GPUs, but since we’ve already established that MediaEspresso 6 is CPU limited when using these powerful GPUs, it doesn’t say anything about the hardware. AMD and NVIDIA both provide common GPU video encoding frameworks for their products that Cyberlink taps in to, and it’s here where we believe the difference lies.
In particular we see MediaEspresso 6 achieve 50% CPU utilization (4 core) when being used with an NVIDIA GPU, while it only achieves 13% CPU utilization (1 core) with an AMD GPU. At this point it would appear that the CPU portions of NVIDIA’s GPU encoding framework are multithreaded while AMD’s framework is singlethreaded. And since the performance bottleneck for video encoding still lies with the CPU, this would be why the NVIDIA GPUs do so much better than the AMD GPUs in this benchmark.
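The utilization figures line up with whole threads being busy. As a quick check, assuming the testbed’s Core i7-920 presents 8 logical processors to Windows (4 cores with Hyper-Threading), the arithmetic works out as follows; the helper name is our own.

```python
LOGICAL_CPUS = 8  # assumption: Core i7-920, 4 cores with Hyper-Threading enabled


def utilization_percent(busy_threads, logical_cpus=LOGICAL_CPUS):
    """Overall CPU utilization when busy_threads logical processors are saturated."""
    return busy_threads / logical_cpus * 100.0


print(utilization_percent(4))  # 50.0, the NVIDIA case
print(utilization_percent(1))  # 12.5, which rounds to the 13% seen with AMD
```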
Our final GPU compute benchmark is SmallLuxGPU, the GPU ray tracing branch of the open source LuxRender renderer. While it’s still in beta, SmallLuxGPU recently hit a milestone by implementing a complete ray tracing engine in OpenCL, allowing them to fully offload the process to the GPU. It’s this ray tracing engine we’re testing.
Compared to our other two GPU computing benchmarks, SmallLuxGPU follows the theoretical performance of our GPUs much more closely. As a result our Radeon GPUs with their difficult-to-utilize VLIW5 design end up topping the charts by a significant margin, while the fastest comparable NVIDIA GPU is still 10% slower than the 6850. Ultimately what we’re looking at is what amounts to the best-case scenarios for these GPUs, with this being as good an example as any that in the right circumstances AMD’s VLIW5 shader design can go toe-to-toe with NVIDIA’s compute-focused design and still win.
At the other end of the spectrum from GPU computing performance is GPU tessellation performance, used exclusively for graphical purposes. For the Radeon 6800 series, AMD enhanced their tessellation unit to offer better tessellation performance at lower tessellation factors. In order to analyze the performance of AMD’s enhanced tessellator, we’re using the Unigine Heaven benchmark and Microsoft’s DirectX 11 Detail Tessellation sample program to measure the tessellation performance of a few of our cards.
Since Heaven is a synthetic benchmark at the moment (the DX11 engine isn’t currently used in any games) we’re less concerned with performance relative to NVIDIA’s cards and more concerned with performance relative to the 5870. Compared to the 5870 the 6870 ends up being slightly slower when using moderate amounts of tessellation, while it pulls ahead when using extreme amounts of tessellation. Considering that the 6870 is around 7% slower in games than the 5870 this is actually quite an accomplishment for Barts, and one that we can easily trace back to AMD’s tessellator improvements.
Our second tessellation test is Microsoft’s DirectX 11 Detail Tessellation sample program, which is a much more straightforward test of tessellation performance. Here we’re simply looking at the framerate of the program at different tessellation levels, specifically level 7 (the default level) and level 11 (the maximum level). Here AMD’s tessellation improvements become even more apparent, with the 6870 handily beating the 5870. In fact our results are very close to AMD’s own internal results – at level 7 the 6870 is 43% faster than the 5870, while at level 11 that improvement drops to 29% as the increased level leads to an increasingly large tessellation factor. However this also highlights the fact that AMD’s tessellation performance still collapses at high factors compared to NVIDIA’s GPUs, making it all the more important for AMD to encourage developers to use more reasonable tessellation factors.
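To see why high tessellation factors are so punishing, consider an idealised model in which every edge of a triangle patch is split into the same number of segments. This is a simplification of our own: the real DirectX 11 tessellator uses a different subdivision pattern and allows separate per-edge and interior factors, but the quadratic growth is the point.

```python
def triangles_per_patch(factor):
    """Triangles produced by uniformly splitting each edge of a triangle
    patch into `factor` segments (idealised model, not the D3D11 pattern)."""
    return factor * factor


# 64 is the maximum tessellation factor DirectX 11 allows.
for factor in (1, 2, 4, 8, 16, 32, 64):
    print(factor, triangles_per_patch(factor))
```

Doubling the factor roughly quadruples the geometry load, so a tessellator that copes well at moderate factors can still fall over at extreme ones.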
Last but not least in our look at AMD’s new Radeon 6800 series is our look at power consumption, GPU temperatures, and the amount of noise generated. With efficiency being one of the major design goals for Barts, AMD stands to gain a lot of ground here compared to the 5800 series for only a minor drop in performance.
Looking quickly at the voltages of the 6800 series, we have 4 samples – 2 each of the 6870 and the 6850. Both of our 6870 cards have an idle voltage of 0.945v and a load voltage of 1.172v, and seeing as how they’re both based on AMD’s reference design this is what we would expect for a design that is based around a single VID.
However our 6850 results, which include a non-reference card in the form of XFX’s customized 6850, are much more interesting. While our reference 6850 has a load voltage of 1.094v, our XFX card reports a load voltage of 1.148v. We’ll be taking a look at the XFX 6850 in-depth next week in our 6850 roundup, but for now this leaves us with the question of whether AMD is using variable VIDs, or if XFX is purposely setting theirs higher for overclocking purposes.
|Radeon HD 6800 Series Load Voltage|
|Ref 6870||XFX 6870||Ref 6850||XFX 6850|
|1.172v||1.172v||1.094v||1.148v|
Finally our EVGA GTX 460 1GB FTW card has a VID of 0.975v, which compared to all the other GTX 460 cards we’ve tested thus far makes it quite notable. This is lower than any of those other cards by 0.012v, a property we believe is necessary to sell such a heavily overclocked card without causing a similarly large rise in power/heat/noise. It’s also for this reason that we question whether NVIDIA could actually supply suitable GF104 GPUs in high volumes, as GPUs capable of running at this voltage are likely coming from the cream of the crop for NVIDIA.
For our tests, please note that we do not have a pair of reference 6850s. For our second 6850 we are using XFX’s customized 6850 card, which means our results will undoubtedly differ from what a pair of true reference cards would do. However as the 6850 reference design will not be widely available this is less important than it sounds.
As always we start our look at power/temp/noise with our look at idle power. Because we use a 1200W PSU in our GPU test rig our PSU efficiency at idle is quite low, leading to the suppression of the actual difference between cards. But even with this kind of suppression it’s still possible to pick out which cards have a lower idle power draw, as the best cards will still result in a total system power draw that’s at least a couple of watts lower.
AMD’s official specs call for the 6800 series to have a lower idle power draw than the 5800 series, and while we can’t account for all 8 watts we do manage to shave a couple of watts off compared to our 5800 series cards. The Crossfire results are even more impressive, with the 6870CF drawing 11W less than the 5870CF.
Compared to the 6800 series the GeForce GTX 460 768MB does manage to hang on to top honors here for a single card by a watt; in SLI, however, our 1GB cards do worse than our 6800 series cards by 10W.
Looking at load power consumption it’s clear from the start that AMD’s efficiency gains are going to pay off here. On the latest iteration of our power consumption chart the 6850 draws less than even the already conservative 5850, by 20W under Crysis and 25W under FurMark, showcasing how AMD was able to reduce their power consumption by a significant amount while giving up much less in the way of performance.
Compared to the 6800 series NVIDIA does notably worse here, with all of the GTX 460 cards pulling down more power than the 6870 and the GTX 470 being in a league of its own. While NVIDIA was competitive with Cypress on power, they’re not in a position to match Barts. They can deliver Barts-like performance (and then some), but they have to consume more power to do it.
Up next is our look at GPU temperatures, starting with idle temps. As we mentioned in our GTX 460 review, NVIDIA ended up producing a very effective reference cooler for the GTX 460, utilizing an open-air design that, by exhausting hot air both inside and outside of the case, is capable of reaching temperatures fully exhausting coolers can’t match. As a result all of the GTX 460 cards top our charts here.
Prior to the GTX 460 series this is a metric the 5850 always did well in, so we had expected a similar performance from the 6850, only to come away disappointed. What we’re ultimately looking at is a matter of the quality of the cooler: the 6850 may consume less power than the 5850 at idle, but it packs a weaker cooler overall, so it ends up running warmer. For a gaming card such as the 6800 series idle temperatures are almost entirely academic once we get below 50C, but even so this tells us something about the 6850 reference cooler.
Thanks to the GTX 460’s open-air cooler, all of our GTX 460 cards top our temperature chart even with their higher power consumption. The trade-off is that all of these cards require a well-ventilated case, while the Radeon 5800 and 6800 series will tolerate much poorer cases so long as there’s enough ventilation for the card to pull in air in the first place.
As was the case with idle temperatures, the reference 6850 ends up doing worse than the 5850 here thanks to its less effective cooler; however the 6870 ends up doing better than both the 6850 and 5870 due to its more effective cooler and its lower power consumption compared to the 5870. While these cards can’t quite touch the GTX 460 series, we’re still looking at some of the coolest cards among our current benchmark suite.
Meanwhile our XFX 6850 ends up doing the best out of all of our cards here, though this comes at the cost of more noise. We'll touch on this more next week in our 6850 roundup.
Last but not least is idle noise, which isn’t much of a story with modern cards. With the exception of the GTX 470/480, the latest GeForce and Radeon cards are both capable of running up against the noise floor of our testing environment.
Under load we once again see an NVIDIA GTX 460 card top the chart thanks to its open-air design. This is followed very closely however by the Radeon 6850, which at 47.7dB is our third-quietest card and finally shows off the advantages of the tradeoffs AMD made with the reference cooler. The 6850 may not be as cool as the 5850, but it’s quite a bit quieter. As for the XFX card, this is where XFX has to pay the piper, as their 6850 card ends up being as loud as a 5870 in exchange for its lower temperatures.
Meanwhile the 6870 ends up being quite a bit louder than both the GTX 460 series and the 6850, coming in at 55.2dB. This is a definite leg-up compared to the 5870 and nicely cements the fact that the 6870 is intended to be the 5850’s replacement, but it means the GTX 460 series spoils the results here. Once custom-design vendor cards come out for the 6870, I suspect we’re going to see someone quickly sell a 6870 with a less aggressive cooler, trading higher temperatures for less noise.
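To put the two load noise figures quoted above in perspective, the decibel gap between the 6870 and the 6850 can be converted into a ratio. This is only a rough sketch: it assumes the readings behave as ordinary sound pressure levels on the standard logarithmic decibel scale, the function names are our own for illustration, and neither ratio maps directly onto how loud a card is perceived to be.

```python
# Load noise readings quoted in the text (dB).
HD6870_LOAD_DB = 55.2
HD6850_LOAD_DB = 47.7


def power_ratio(db_a: float, db_b: float) -> float:
    """Sound power (intensity) ratio implied by two decibel readings."""
    return 10 ** ((db_a - db_b) / 10)


def pressure_ratio(db_a: float, db_b: float) -> float:
    """Sound pressure (amplitude) ratio implied by two decibel readings."""
    return 10 ** ((db_a - db_b) / 20)


if __name__ == "__main__":
    gap = HD6870_LOAD_DB - HD6850_LOAD_DB
    print(f"Gap between cards: {gap:.1f} dB")
    print(f"Sound power ratio: {power_ratio(HD6870_LOAD_DB, HD6850_LOAD_DB):.2f}x")
    print(f"Sound pressure ratio: {pressure_ratio(HD6870_LOAD_DB, HD6850_LOAD_DB):.2f}x")
```

As a common rule of thumb, a 10dB increase is perceived as roughly twice as loud, so a gap of this size is easy to hear even though it falls short of a doubling.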
Going into our first meeting with AMD, we weren’t quite sure what to expect with the Radeon HD 6800 series. After all, how do you follow up on the blockbuster that was Cypress and the Radeon HD 5800 series?
The answer is that you don’t, at least not right away. With AMD’s choice of names in mind, Barts and the 6800 series aren’t the true successor to Cypress; but they are the next generation of Radeon for another market. That doesn’t mean it isn’t a great product line though – in fact far from it.
The 6800 series hits the one market segment that AMD couldn’t reach with either the 5800 series or the 5700 series: the $200 market. As we said back in July when we crowned the GTX 460 the $200 king, the most successful chips are those chips that are designed from the get-go for the market they’re being sold in. The GTX 460 succeeded where Cypress could not, as the penalty for using a harvested Cypress chip for that market was too severe and AMD had little else to work with.
Now 3 months later AMD has their appropriate answer to the $200 market in the form of Barts and the Radeon HD 6800 series. The Barts GPU is small enough to cheaply produce for that market, and with AMD’s rebalanced design it’s capable of trailing the 5800 series by only 7%, making Cypress-like performance available for prices lower than before. It’s the missing link that AMD has needed to be competitive with the GTX 460.
As a result, even with NVIDIA’s latest round of price drops AMD has managed to dethrone the $200 king, and in the process is reshaping the competitive market only recently established by the GTX 460. With AMD and NVIDIA’s price stratification there are very few head-to-head matchups, but there are a few different situations that bear looking at.
At the top end we have the Mexican standoff between the recently price-reduced GTX 470, the newly released Radeon HD 6870, and the overclocked GTX 460 as represented by the EVGA GTX 460 1GB FTW. At $260 the GTX 470 is several percent faster than the 6870, and at only $20 more NVIDIA has done a good job pricing the card. If performance is your sole concern, then the GTX 470 is hard to beat at those prices – though we suspect NVIDIA isn’t happy about selling GF100 cards at such a low price.
Meanwhile if you care about a balance of performance and power/heat/noise, then it’s the 6870 versus the EVGA GTX 460; and the EVGA card wins in an unfair fight. As an overclocked card in a launch card article we’re not going to give it a nod, but we’re not going to ignore it; it’s 5% faster than the reference 6870 while at the same time it’s cooler and quieter (thanks in large part to the fact that it’s an open-air design). At least as long as it’s on the market (we have our doubts about how many suitable GPUs NVIDIA can produce), it’s hard to pass up even when faced with the 6870.
Without the EVGA card in the picture though, the 6870 is clearly sitting at a sweet spot in terms of price, performance, and noise. It’s faster than the 5850 while drawing only as much power and yet it’s still slightly quieter. Meanwhile it completely clobbers the reference clocked GTX 460 1GB in gaming performance, although with NVIDIA’s new prices and the $30 premium we would hope that this is the case. If nothing else the 6870 wins by default – NVIDIA doesn’t have a real product to put against it.
As for the Radeon HD 6850 however, things are much more lopsided in AMD’s favor. It’s give and take depending on the benchmark, but ultimately it’s just as fast as the GTX 460 1GB on average, even though it’s officially $20 cheaper. And at the same time it draws less power and produces less noise than the GTX 460 1GB. In fact unless the GTX 460 1GB were cheaper than the 6850, we really can’t come up with a reason to buy it. For all the advantage of an overclock when going up against the 6870, the stock clocked card has nothing on the 6850. Even the GTX 460 768MB, while $10-$20 cheaper than the 6850, still has to contend with the fact that the 6850 is almost 10% faster and only marginally louder.
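The value argument above can be sketched as a quick performance-per-dollar calculation. The figures here are illustrative assumptions derived from the prices and percentages mentioned in this article rather than measured data: the 6850 is taken at $180, the GTX 460 1GB at $20 more, the GTX 460 768MB at $10 to $20 less, and "almost 10% faster" is rounded to exactly 10%. The performance index is normalized to the 6850.

```python
# Illustrative figures only: (price in USD, relative performance index).
# Prices are inferred from the article's text; the index treats the 6850
# as 1.00 and rounds "almost 10% faster" to exactly 10% (an assumption).
CARDS = {
    "Radeon HD 6850": (180, 1.00),
    "GTX 460 1GB": (200, 1.00),
    "GTX 460 768MB (low price)": (160, 1 / 1.10),
    "GTX 460 768MB (high price)": (170, 1 / 1.10),
}


def perf_per_100_dollars(price: float, perf: float) -> float:
    """Relative performance delivered per $100 spent."""
    return perf / price * 100


value = {
    name: perf_per_100_dollars(price, perf)
    for name, (price, perf) in CARDS.items()
}

if __name__ == "__main__":
    # Print the cards from best to worst value.
    for name, score in sorted(value.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:28s} {score:.3f}")
```

Under these assumptions the 6850 comes out ahead of the GTX 460 1GB, while the 768MB card lands on either side of the 6850 depending on which end of its price range is used. This metric ignores power, noise, and the 768MB card's smaller memory.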
In fact our only real concern is that while the reference 6850 is a great card, the XFX card is less so – XFX heavily prioritized temperatures over noise, and while this pays off with a load temperature even better than the GTX 460, it comes at the price of noise levels exceeding even the 6870. Shortly before publication we got a note from XFX that they’re going to work on releasing a BIOS with a less aggressive fan, which hopefully should resolve the issue. In the meantime we suggest checking back here next week, as we’ll have several custom 6850s arriving that we’ll be reviewing as part of a 6850 roundup.
Wrapping things up, we believe this will probably go down as being the most competitive card launch of the year. AMD and NVIDIA reposition themselves against each other with every launch, but by first launching the Radeon HD 6000 series against NVIDIA’s mid-to-high range GTX 460, AMD has gone head-first into one of NVIDIA’s most prized markets, and NVIDIA is pushing right back. If you had told us 3 months ago that we would be able to get GTX 460 1GB performance for $180 only a couple of months later, we likely would have called you mad, and yet here we are. The competitive market is alive and then some.
Ultimately this probably won’t go down in history as one of AMD’s strongest launches – there’s only so much you can do without a die shrink – but it’s still a welcome addition to the Radeon family. With a new generation of Radeon cards taking their foothold, we can now turn our eyes towards the future and see what AMD will be bringing us with the Radeon HD 6900 series and the Cayman GPU.