Original Link: http://www.anandtech.com/show/7930/the-amd-radeon-r9-295x2-review



Although the days of AMD’s “small die” strategy have long since ended, one aspect of AMD’s strategy that they have stuck with since the strategy’s inception has been the concept of a dual-GPU card. AMD’s first modern dual-GPU card, the Radeon HD 3870 X2 (sorry, Rage Fury MAXX), came at a low point for the company where such a product was needed just to close the gap between AMD’s products and NVIDIA’s flagship big die video cards. However with AMD’s greatly improved fortune these days, AMD no longer has to play just to tie but they can play to win. AMD’s dual-GPU cards have evolved accordingly and these days they are the high-flying flagships of AMD’s lineup, embodying the concept of putting as much performance into a single card as is reasonably possible.

The last time we took a look at a new AMD dual-GPU video card was just under a year ago, when AMD launched the Radeon HD 7990. Based on AMD’s then-flagship Tahiti GPUs, the 7990 was a solid design that offered performance competitive with a dual card (7970GHz Edition Crossfire) setup while fixing many of the earlier Radeon HD 6990’s weaknesses. However the 7990 also had its share of weaknesses and outright bad timing – it came just 2 months after NVIDIA released their blockbuster GeForce GTX Titan, and it also launched right when the FCAT utility became available, enabling reliable frame pacing analysis and exposing the weak points in AMD’s drivers at the time.

Since then AMD has been hard at work on both the software and hardware sides of their business, sorting out their frame pacing problems but also launching new products in the process. Most significant among these was the launch of their newer GCN 1.1 Hawaii GPU, and the Radeon R9 290 series cards that are powered by it. Though Tahiti remains in AMD’s product stack, Hawaii’s greater performance and additional features heralded the retail retirement of the dual-Tahiti 7990, once again leaving an opening in AMD’s product stack.

That brings us to today and the launch of the Radeon R9 295X2. After much consumer speculation and more than a few teasers, AMD is releasing their long-awaited Hawaii-powered entry to their dual-GPU series of cards. With Hawaii AMD has a very powerful (and very power hungry) GPU at their disposal, and for its incarnation in the R9 295X2 AMD is going above and beyond anything they’ve done before, making it very clear that they’re playing to win.

AMD GPU Specification Comparison
Specification | AMD Radeon R9 295X2 | AMD Radeon R9 290X | AMD Radeon HD 7990 | AMD Radeon HD 7970 GHz Edition
Stream Processors | 2 x 2816 | 2816 | 2 x 2048 | 2048
Texture Units | 2 x 176 | 176 | 2 x 128 | 128
ROPs | 2 x 64 | 64 | 2 x 32 | 32
Core Clock | ? | 727MHz | 950MHz | 1000MHz
Boost Clock | 1018MHz | 1000MHz | 1000MHz | 1050MHz
Memory Clock | 5GHz GDDR5 | 5GHz GDDR5 | 6GHz GDDR5 | 6GHz GDDR5
Memory Bus Width | 2 x 512-bit | 512-bit | 2 x 384-bit | 384-bit
VRAM | 2 x 4GB | 4GB | 2 x 3GB | 3GB
FP64 | 1/8 | 1/8 | 1/4 | 1/4
TrueAudio | Y | Y | N | N
Transistor Count | 2 x 6.2B | 6.2B | 2 x 4.31B | 4.31B
Typical Board Power | 500W | 250W | 375W | 250W
Manufacturing Process | TSMC 28nm | TSMC 28nm | TSMC 28nm | TSMC 28nm
Architecture | GCN 1.1 | GCN 1.1 | GCN 1.0 | GCN 1.0
GPU | Hawaii | Hawaii | Tahiti | Tahiti
Launch Date | 04/21/2014 | 10/24/2013 | 04/23/2013 | 06/22/2012
Launch Price | $1499 | $549 | $999 | $549

Starting with a brief look at the specifications, much of the Radeon R9 295X2’s design, goals, and performance can be observed in the specifications alone. Whereas the 7990 was almost a 7970GE Crossfire setup on a single card, AMD is not making any compromises for the R9 295X2, equipping the card with a pair of fully enabled Hawaii GPUs and then clocking them even higher than their single-GPU flagship, the R9 290X. As a result unlike AMD’s past dual-GPU cards, which made some performance tradeoffs in the name of power consumption and heat, AMD’s singular goal with the R9 295X2 is to offer the complete performance of an R9 290X “Uber” Crossfire setup on a single card.

Altogether this means we’re looking at a pair of top-tier Hawaii GPUs, each with their full 2816 SPs and 64 ROPs enabled. AMD has set the boost clock on these GPUs to 1018MHz – just 2% faster than the 290X – which means performance is generally a wash compared to the R9 290X in CF, though the slightly higher clock should offset the penalties from the additional latency that the necessary PCIe bridge chip introduces. Otherwise compared to the retired 7990, the R9 295X2 should be a far more capable card, offering 40% more shading/texturing performance and 2x the ROP throughput of AMD’s previous flagship. Like the R9 290X compared to the 7970GHz, we’re still looking at what are fundamentally parts from the same generation and made on the same 28nm process, so AMD doesn’t get the benefits of a generational improvement in architectures and manufacturing, but even within the confines of 28nm AMD has been able to do quite a bit with Hawaii to improve their performance over Tahiti based products.
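For readers who want to check the 40% and 2x figures, the following is a minimal Python sketch. It assumes that theoretical peak throughput scales simply with unit count times boost clock, which is our simplification for illustration and not a statement about real-world performance; the dictionary and function names are our own.

```python
# Per-GPU figures taken from the specification table in this article.
R9_295X2 = {"stream_processors": 2816, "rops": 64, "boost_mhz": 1018}
HD_7990 = {"stream_processors": 2048, "rops": 32, "boost_mhz": 1000}


def relative_throughput(new, old, unit_key):
    """Ratio of (unit count x boost clock) between two GPUs.

    Assumption: peak throughput is proportional to units x clock.
    """
    return (new[unit_key] * new["boost_mhz"]) / (old[unit_key] * old["boost_mhz"])


shading_gain = relative_throughput(R9_295X2, HD_7990, "stream_processors")
rop_gain = relative_throughput(R9_295X2, HD_7990, "rops")

print(f"Shading/texturing: {shading_gain:.2f}x")  # prints 1.40x
print(f"ROP throughput:    {rop_gain:.2f}x")      # prints 2.04x
```

Texture units follow the same ratio as the stream processors (176 vs. 128 per GPU), which is why shading and texturing are quoted together.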

Meanwhile AMD is taking the same no-compromises strategy when it comes to memory. The R9 290X was equipped with 4GB of 5GHz GDDR5, operating on a 512-bit memory bus, and for the R9 295X2 in turn each GPU is getting the same 4GB of memory on the same bus. The fact that AMD has been able to lay down 1024 GDDR5 memory bus lines on a single board is no small feat (the wails of the engineers can be heard for miles), and while it is necessary to keep up with the 290X we weren’t entirely sure if AMD was going to be able and willing to pull it off. Nonetheless, the end result is that each GPU gets the same 320GB/sec as the 290X does, and compared to the 7990 this is an 11% increase in memory bandwidth, not to mention a 33% increase in memory capacity.
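The bandwidth and capacity numbers fall out of simple arithmetic, sketched below in Python. The only assumption is the usual reading of a "5GHz" GDDR5 rating as an effective 5Gbps per pin; the function name is our own.

```python
def gddr5_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/sec: bus width (bits) x per-pin rate (Gbps) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


# Per-GPU figures from the specification table.
r9_295x2_bw = gddr5_bandwidth_gb_s(512, 5.0)  # 320.0 GB/sec
hd_7990_bw = gddr5_bandwidth_gb_s(384, 6.0)   # 288.0 GB/sec

bandwidth_gain_pct = (r9_295x2_bw / hd_7990_bw - 1) * 100
capacity_gain_pct = (4 / 3 - 1) * 100  # 4GB vs. 3GB per GPU

print(f"R9 295X2: {r9_295x2_bw:.0f} GB/sec per GPU")
print(f"HD 7990:  {hd_7990_bw:.0f} GB/sec per GPU")
print(f"Bandwidth gain: {bandwidth_gain_pct:.0f}%")  # prints 11%
print(f"Capacity gain:  {capacity_gain_pct:.0f}%")   # prints 33%
```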

Now as can be expected by any card labeled a “no compromises” card by its manufacturer, all of this performance does come at a cost. Hawaii is a very powerful GPU but it is also very power hungry; AMD has finally given us an official Typical Board Power (TBP) for the R9 290X of 250W, and with R9 295X2 AMD is outright doubling it. R9 295X2 is a 500W card, the first 500W reference card from either GPU manufacturer.

As one can expect, moving 500W of heat is no easy task. AMD came close once before with the 6990 – a card designed to handle up to 450W in its AUSUM mode – but the 6990 was dogged by the incredibly loud split blower AMD needed to use to cool the beast. For the 7990 AMD dropped their sights and their power target to just 375W, and at the same time went to a large open air blower that allowed them to offer a dual-GPU card with reasonable noise levels. But for the R9 295X2 AMD is once again turning up the heat, requiring new methods of cooling if they want to offer 500W of cooling while maintaining reasonable noise levels.

To dissipate 500W of heat AMD has moved past blowers and even open air coolers, and moved on to a closed loop liquid cooler (CLLC). We’ll cover AMD’s cooling apparatus in more detail when we take a closer look at the construction of the R9 295X2, but as with AMD’s 500W target AMD is charting new territory for a reference card by making a CLLC the baseline cooler. With two Asetek pumps and a 120mm radiator to dissipate heat, the R9 295X2 is a significant departure from AMD’s past designs and an equally significant change in the traditionally conservative system requirements for a reference card.

In any case, the fact that AMD went this route isn’t wholly surprising – there aren’t too many ways to move 500W of heat – but the lack of significant binning did catch us off guard. Dual-GPU cards often (but not always) use highly binned GPUs to further contain power consumption, which isn’t something AMD has done this time around, and hence the reason for the R9 295X2’s doubled power consumption. So long as AMD can remove the heat then they’ll be fine, and from our test results it’s clear that AMD has definitely done some binning, but nonetheless it’s interesting that we aren’t seeing as aggressive binning here as in past years.

Finally, let’s dive into pricing, availability, and competition. Given the relatively exotic cooling requirements for the R9 295X2, it comes as no great surprise that AMD is targeting the same luxury video card crowd that the GTX Titan pioneered last year when it premiered at $1000. This means using more expensive cooling devices, a greater emphasis on build quality with a focus on metal shrouding, and a few gimmicks to make the card stand out in windowed cases. To that end the R9 295X2 will by its very nature be an extremely low volume part, but if AMD has played their cards right it will be the finest card they have ever built.

The price for that level of performance and quality on a single card will be $1499 (€1099 + VAT), $500 higher than the 7990’s $999 launch price, and similarly $500 higher than NVIDIA’s closest competitor, the GTX Titan Black. With two R9 290Xs running for roughly $1200 at current prices, we’ve expected for some time that a dual-GPU Hawaii card would be over $1000, so AMD isn’t too far off from our expectations. Ultimately AMD’s $1500 price tag amounts to a $300 premium for getting two 290Xs on to a single card, along with the 295X2’s much improved build quality and more complex cooling apparatus. Meanwhile GPU complexity and heat density have reached a point where the cost of putting together a dual-GPU card is going to exceed the cost of a single card, so these kinds of dual-GPU premiums are going to be here to stay.

As always, the R9 295X2’s competition will be a mix of dual video card setups such as dual R9 290Xs and dual GTX 780 Tis, and of course NVIDIA’s forthcoming dual-GPU card. Dual video card setups will always be cheaper than a single dual-GPU card, so the difference lies in the smaller space requirements of a single video card and the power/heat/noise savings that such a card provides. In the AMD ecosystem the reference 290X is dogged by its loud reference cooler, so as we’ll see in our test results the R9 295X2 will have a significant advantage over the 290X when it comes to noise.

Meanwhile in NVIDIA’s ecosystem, NVIDIA has the dual GTX 780 Ti, the dual GTX Titan Black, and the GTX Titan Z. The dual GTX 780 Ti is going to be the closest competitor to the R9 295X2 at roughly $1350, with a pair of GTX Titan Blacks carrying both a performance edge and a significant price premium. As for the GTX Titan Z, NVIDIA’s forthcoming dual-GPU card is scheduled to launch later this month, and while it should be a performance powerhouse it’s also going to retail at $3000, twice the price of the R9 295X2. So although the GTX Titan Z can be used for gaming, we’re expecting it to be leveraged more for its compute performance than its gaming performance. In any case based on NVIDIA’s theoretical performance figures we have a strong suspicion that the GTX Titan Z is underclocked for TDP reasons, so it remains to be seen whether it’s even competitive with the R9 295X2 in gaming performance.

For availability the R9 295X2 will be a soft launch for AMD, with AMD announcing the card 2 weeks ahead of its expected retail date. AMD tells us that the card should start appearing at retailers and in boutique systems on the week of April 21st, and while multiple AMD partners will be offering this card we don’t have a complete list of partners at this time (but expect it to be a short list). The good news is that unlike most of AMD’s recent product launches, we aren’t expecting availability to be a significant problem. Due to the price premium over a pair of 290Xs and recent drops in cryptocoin value, it’s unlikely that miners will want the 295X2, meaning the demand and customer base should follow the more traditional gamer demand curves.

Finally, it’s worth noting that unlike the launch of the 7990, AMD isn’t doing any game bundle promotions for the R9 295X2. AMD hasn’t been nearly as aggressive on game bundles this year, and in the case of the R9 295X2 there isn’t a specific product (e.g. GTX Titan) that AMD needs to counter. Any pack-in items – be it games or devices – will be the domain of the board partners this time around. Also, the AMD Mystery Briefcase was just a promotional item, so partners won’t be packing their retail cards quite so extravagantly.

Spring 2014 GPU Pricing Comparison
AMD | Price | NVIDIA
 | $3000 | GeForce GTX Titan Z
Radeon R9 295X2 | $1500 | 
 | $1100 | GeForce GTX Titan Black
 | $700 | GeForce GTX 780 Ti
Radeon R9 290X | $600 | 
Radeon R9 290 | $500 | GeForce GTX 780


Meet the Radeon R9 295X2: Cooling & Power Delivery

Kicking off our look at the R9 295X2’s features and build quality, as we alluded to in our introduction the R9 295X2 is a card of many firsts for AMD. It is the first 500W card for the company, it is the first luxury class video card for the company, and it is of course the first reference card with a closed loop liquid cooler. And since the elephant in the room is that CLLC, let’s start our review with that.

Liquid cooling for GPUs is in and of itself not a new concept. Self-builders and boutique builders have been using self-assembled (open loop) liquid cooling assemblies on GPUs for years, as GPUs have exceeded CPUs for heat generation for some time now. However it’s only within the last few years that closed loop liquid coolers have come into the mainstream, and more recently still that board manufacturers have started using CLLCs in their base designs as opposed to as an aftermarket modification.

The first notable card to ship with a CLLC was Asus’s ROG ARES II, a high end boutique design that put two 7970GEs on a single board. The ARES brand is used for Asus’s custom designed cutting edge video cards, both of which have been dual-GPU cards in the same vein as today’s R9 295X2. At the time Asus needed to move around 500W (just like AMD today) and so they used an interesting dual cooler design that combined a split blower with a CLLC, allowing for both the GPUs and the associated discrete components to be effectively cooled in spite of the immense heat. The end result was a rare (1000pc) card that served as a proof of concept for the retail CLLC design, and at least to some extent the market for such a design.

It’s this design that AMD is clearly taking a hint or two from in building the R9 295X2. Like Asus’s design, AMD is essentially punting on cooling the GPUs directly, moving to a split design that uses both direct and indirect cooling.

For GPU cooling AMD has teamed up with Asetek to utilize their collection of CLLC parts to build out the R9 295X2. Each GPU is outfitted with one of Asetek’s higher end copper-based pumps, which operate together in a serial design. At the other end of the loop is one of Asetek’s 120mm radiators – we haven’t been able to identify the specific radiator, but at 38mm thick it’s somewhere in between Asetek’s standard (single) thickness and double-thick radiators. Sending air through the radiator in turn is a single 120mm fan in a push configuration. With the bulk of the heat generated by the R9 295X2 coming from the GPUs, the bulk of the heat exhaustion in turn is handled by this radiator setup.

Ultimately the use of a CLLC is what’s key to making the R9 295X2 a viable and acoustically practical product. While it’s plenty possible to build a triple slot open air cooler, as exemplified by products such as the PowerColor custom 7990, those products end up being especially large even by video card standards, and noise is often an ongoing concern. By going with a separate radiator and using liquid cooling to attach it to the GPUs, AMD is able to use a heatsink with a much larger surface area and a large, slow case fan to move air as opposed to having to operate a smaller fan at higher (i.e. louder) speeds, as we saw with the Radeon HD 6990. At the end of the day the PCI-Express form factor imposes certain limits on cooler design by virtue of its shape and the overall size of I/O slots, so AMD has sidestepped those limits entirely by utilizing an external radiator.

Meanwhile a secondary split-blower design provides cooling for the discrete components on the board itself, including the various VRMs and the RAM. AMD utilizes a metal baseplate and secondary copper heatsink over the PCB to channel heat out of those smaller components, with grooves in the baseplate directing the airflow. The use of this split design with a radiator and a split-blower allows AMD to use a CLLC for the GPUs while resolving the biggest roadblock in aftermarket CLLC GPU cooling, which is cooling those discrete components. The end result is a cooling setup that is almost (but not quite) a fully exhausting cooler. All of the GPU heat and half of the discrete component heat is exhausted outside of the case, leaving only a small amount of heat from the other half of the discrete components to be cycled back into the case.

On that note, as can be expected from the inclusion of a CLLC, the R9 295X2’s mounting requirements will have a significant impact on its case compatibility. Installing the R9 295X2 will require an easily accessible 120mm fan exhaust mount, something that’s available in many cases, but not all of them. Our own testbed has a 120mm/140mm mount directly above the GPU which we’re using for our testing, but this is something we can get away with because our 2x140mm CPU CLLC is mounted at the top of the case. Mounting a CPU CLLC and an R9 295X2 will likely tap all of the mounts available in mid-tower cases, so going for a dual R9 295X2 setup can be assumed to require a full tower ATX case to come up with enough mounting points.


AMD Suggested Mounting

Moving on, beyond the cooler we have the board itself. At 12 inches long the R9 295X2 is the same length as the 7990 and the 6990, so radiator aside it fits in the same space as AMD’s previous dual-GPU cards. Also unchanged compared to past AMD designs, and unlike Asus’s earlier CLLC card, the R9 295X2 is also a standard height card. Some additional clearance is required for the CLLC hoses, but the board and the shroud itself do not protrude any, making it a bit easier to install the card.

Removing the pump and heatsink assembly exposes the board itself, which utilizes a fairly typical layout for a dual-GPU design. The two Hawaii GPUs sit at opposite ends of the board, with each GPU surrounded by its 4GB of VRAM. For each GPU 8 chips are on the front of the board while the other 8 chips are on the back. Meanwhile the various VRM components lie at the center of the card (explaining the earlier secondary copper heatsink), and to the left of that a PLX 48 lane PCIe switch.

The PCB itself is 14 layers, making it an especially intricate PCB, but one necessary to carry 500W while also routing 1024 GDDR5 lines and 48 PCIe 3 lanes. For power delivery and regulation AMD is using a 4+1+1 design for each GPU, which breaks down to 4 power phases for the GPU, one power phase for the memory interface, and one power phase for the memory itself. This 4+1+1 setup is functional for AMD’s needs at stock settings, but between the CLLC and the power delivery system it’s clear that AMD hasn’t built this board for extreme overclocking.

Speaking of power delivery, let’s talk about the 2 8pin PCIe power sockets that are found at the top right side of the card. For those of our readers who can quote PCIe specifications by heart, the standard limit for an 8pin PCIe socket is 150W, which in this configuration would mean that the R9 295X2 has a 375W (150+150+75) power delivery system. By PCIe standards this has the board coming up short, but as we found out back in 2011 with the launch of the 6990, when it comes to these high end specialty cards PCIe compliance no longer matters. In the case of the 6990 and now the R9 295X2, AMD is essentially designing to the capabilities of the hardware rather than the PCIe specification, and the PCI-SIG for their part is not an enforcement body. Other than likely not being able to get their card validated as PCI-Express compliant and therefore included on the Systems Integrator List, AMD isn’t penalized for exceeding the PCIe power delivery standard.

So why does the 500W R9 295X2 only have 2 PCIe power sockets? As it turns out this is an intentional decision by AMD to improve the card’s compatibility. Dual dual-GPU (Quadfire) setups are especially popular with boutique builders and their customers, and very few PSUs offer more than 4 8pin PCIe power plugs. As a result, by using just 2 power sockets the R9 295X2 is compatible with a wider range of PSUs when being used in Quadfire setups. Meanwhile on the power delivery side of the equation, most (if not all) of the PSUs that can reliably push the necessary wattage to support one or two R9 295X2s have no problem delivering the roughly 220W per socket that the card requires. At the end of the day that is why AMD can do this at all: the PSUs on the market today can handle it.

Speaking of power, it’s worth pointing out that AMD’s official system requirements for the R9 295X2 call for a PSU that can deliver 28A per 8pin PCIe power connector, with a combined amperage of 50A. For most PSUs this means you’re looking at an 800W PSU being required for a single card, and a 1500W PSU for a Quadfire setup.
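To put the power delivery numbers side by side, here is a rough Python sketch of the arithmetic. It makes two simplifying assumptions of our own: that all connector power is delivered at 12V, and that the slot contributes its full 75W. The constant and function names are ours, chosen for illustration.

```python
PCIE_SLOT_W = 75      # PCIe x16 slot limit, as used in the 150+150+75 sum
PCIE_8PIN_W = 150     # standard limit for one 8pin PCIe power socket
RAIL_VOLTAGE = 12.0   # assumption: all connector power is on the 12V rail


def spec_budget_w(num_8pin):
    """Power budget if the card stayed within the PCIe limits."""
    return PCIE_SLOT_W + num_8pin * PCIE_8PIN_W


def per_socket_draw_w(board_power_w, num_8pin):
    """Average draw per 8pin socket, assuming the slot supplies its full 75W."""
    return (board_power_w - PCIE_SLOT_W) / num_8pin


def amps_to_watts(amps):
    """Convert a current requirement at 12V into watts."""
    return amps * RAIL_VOLTAGE


print(spec_budget_w(2))           # 375 (the in-spec budget for two 8pin sockets)
print(per_socket_draw_w(500, 2))  # 212.5 (in line with the roughly 220W per socket)
print(amps_to_watts(28))          # 336.0 (AMD's per-connector requirement)
print(amps_to_watts(50))          # 600.0 (AMD's combined requirement)
```

Under these assumptions AMD's 28A per-connector requirement leaves generous headroom over the average draw, and the 50A combined figure sits comfortably above the card's 500W rating.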

Meanwhile our final stop on our tour of the R9 295X2’s PCB is the I/O connectivity. Here AMD is using the same setup as they used for the 7990, with 4 mini-DisplayPort connectors and a single DL-DVI-D connector sharing the bottom row of the I/O bracket, while the top of the bracket is dedicated to exhausting hot air. With a single R9 290X already capable of driving most games at high settings on single-display (2560x1600/1440) resolutions, the R9 295X2 is primarily targeted towards users who are either using 4K displays or using Eyefinity setups, either of which is best matched with DisplayPorts rather than additional DVI/HDMI ports. As always these are dual-mode ports, so they can easily be converted to HDMI and DVI if the need arises.



Meet the Radeon R9 295X2: Build Quality & Performance Expectations

Moving on, let’s talk about the build quality of the card itself. With the 7990 AMD did not explicitly chase the luxury market despite its $1000 price tag, choosing to use a standard mixture of metal and plastic parts as part of their more aggressive pricing and positioning of that card. Even with that choice, the 7990 was a solid card that was no worse for the use of plastic, reflecting the fact that while metal adds a degree of sturdiness to a card, it’s not strictly necessary for a well-designed card.

However with the R9 295X2 priced at $1500 and AMD choosing to go after the luxury market, AMD has stepped up their build quality in order to meet the higher expectations NVIDIA has set with their GTX Titan series of cards. The end result is that while the R9 295X2 isn’t a carbon copy of the GTX Titan by any means, it does successfully implement the metal finish that we’ve seen with NVIDIA’s luxury cards, and in the process ends up being a very sturdy card that can stand toe-to-toe with Titan.

Overall AMD is using a 2 piece design here. Mounted on top of the PCB is the baseplate, which runs the length of the card. Meanwhile a series of screws around the edge of the metal shroud holds it to the baseplate, making it similar to the shroud AMD used for their reference 290X cards. The bolts seen at the top of the card, despite appearances, are for all practical purposes decorative, with the aforementioned screws being the real key to holding the card together.

Elsewhere, at the top of the card we can see that AMD has taken a page from NVIDIA’s playbook and invested in red LED lighting for the fan on the card and the Radeon logo. This is one of those gimmicks we mentioned earlier, which don’t improve the functionality of the card at all but have become popular with luxury buyers for showing off their cards.

Wrapping up our discussion of the R9 295X2’s construction, let’s quickly discuss the card’s physical limits and AMD’s own overclocking limits. While we’ll get into the heart of the matter in our look at power/temperature/noise, this is as good a time as any to point out the card’s various limits and why they are what they are.

Starting with the fans, because AMD is relying on a split cooling design with both an on-board fan and a CLLC, AMD doesn’t offer any fan control options for this card. In the case of the CLLC this is because the CLLC itself controls its own fan speed, with the single 120mm fan slaved into the pumps and the pumps in turn adjusting the fan based on their own sensors. Since the 120mm fan is under the control of the pumps and not tied into the board’s fan controller, there’s no way to control this fan short of introducing a physical fan controller in the middle.

The 120mm fan in question is a simple 2pin fan, which is rated to operate between 1200 RPM and 2000 RPM (+/- 10%). Using an optical tachometer we measured the fan operating between 1340 RPM at idle and 1860 RPM under load, which is a bit high for idle based on AMD’s specifications, but within the 10% window for load operation. The fan does get up to full speed before the GPUs reach their 75C temperature limit, so whether it’s FurMark or Crysis, in our experience the 120mm fan maxes out at the same speed and hence the same noise levels.

AMD Radeon R9 295X2 120mm Radiator Fan Speeds
Load | Spec | Measured
Idle | 1200 RPM +/- 10% | ~1340 RPM
Full Load | 2000 RPM +/- 10% | ~1860 RPM
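The spec windows are easy to verify by hand, but for completeness here is a small Python sketch that applies the +/- 10% tolerance to our measured values. The function name and its default tolerance argument are our own.

```python
def within_tolerance(measured_rpm, rated_rpm, tolerance=0.10):
    """Return True if the measurement falls inside rated_rpm +/- tolerance."""
    low = rated_rpm * (1 - tolerance)
    high = rated_rpm * (1 + tolerance)
    return low <= measured_rpm <= high


# Rated and measured speeds from the fan speed table.
print(within_tolerance(1340, 1200))  # False: the idle window tops out at about 1320 RPM
print(within_tolerance(1860, 2000))  # True: the load window is about 1800 to 2200 RPM
```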

Meanwhile the smaller fan on the card itself is rated for 1350 RPM to 2050 RPM, and is tied into the board’s fan controller. However since this fan is responding to board component temperatures rather than GPU temperatures, AMD has not made this fan controllable. AMD’s APIs do not directly expose the temperatures of these components (MSI AB was only able to tell us the GPU temperatures), so it’s not unexpected that this fan can’t be directly controlled.

While fan controls aren’t exposed, AMD does expose the other overclocking controls in their Overdrive control panel. Power limits, temperature limits, GPU clockspeed, and memory clockspeed are all exposed and can be adjusted. However unlike the 290 series, which allowed a GPU temperature of up to 95C, AMD has clamped down on the R9 295X2’s GPUs, only allowing them to reach 75C before throttling. Upon finding this we asked AMD why they were using such a relatively low temperature limit, and the response we received is that it’s due to a combination of factors including the operational requirements of the CLLC itself, and what AMD considers the best temperature for optimal performance. As we briefly discussed in our 290X review, leakage increases with temperature, and while Hawaii is supposed to be a lower leakage part, leakage is still going to be occurring. To that end our best guess is that 75C is as warm as Hawaii can get before leakage starts becoming a problem for this card.

Which brings us to our final point, which is how close AMD is operating to the limits of the Asetek CLLC. For the nearly 500W that the R9 295X2’s CLLC needs to dissipate, a single 120mm CLLC is relatively small by CLLC standards. CPU CLLCs are easily found in larger sizes, including 140mm, 2x120mm, and 2x140mm (which is what we use for our CPU). In all of those cases CPUs will generate less heat than a pair of GPUs, which means the R9 295X2’s radiator is operating under quite a bit of load.

Based on our testing the CLLC is big enough to handle the load at stock, but only just. Under our most strenuous gaming workloads we’re hitting temperatures in the low 70s, with the 120mm fan reaching its maximum speed all the while. This level of cooling performance is enough to keep the card from throttling, but it’s clear that the CLLC can’t handle any more heat. This ties in to what we said earlier about AMD not designing this card for overclocking. Even without voltage adjustments, just significantly increasing the power limit would cause it to throttle based on GPU temperatures. To that end the 120mm CLLC gets the job done, but this is clearly a card that’s best suited for running at stock.



Revisiting the Radeon HD 7990 & Frame Pacing

Before we jump into our full benchmark suite, the launch of a new AMD dual-GPU card makes this an opportune time to revisit the state of frame pacing on AMD’s cards and to reflect on the Radeon HD 7990, so we’d like to take a moment to do just that.

When the 7990 launched last year, for AMD it came at an unfortunate time when the subject of frame pacing was finally coming to a head. With the incredibly coincidental release of NVIDIA’s FCAT tool it became possible to systematically and objectively measure frame pacing, and those findings showed that AMD’s frame pacing algorithms were significantly lagging NVIDIA’s. AMD Crossfire setups, including the 7990, were doing little if anything to mete out frames in an even manner, resulting in a range of outcomes from badly paced frames to dropping frames altogether. Worse, the problem was especially prevalent on multi-display Eyefinity setups, including the pseudo-multi-display methods that are used to drive 4K monitors at 60Hz even today.

The issue of frame pacing had been brewing for some time and AMD was quick to respond to concerns and agree that it needed to be addressed, but the complex nature of the problem meant that it would take some time to fully resolve. AMD’s efforts resulted in a series of phases of Crossfire frame pacing improvements. Phase 1 was released in August, 4 months after the launch of the 7990, and implemented better Crossfire frame pacing for games operating at or below 2560x1600.

The 2560x1600 limitation was a significant one, as this essentially limited AMD’s fixes to single-display setups and excluded Eyefinity and 4K setups. This limit in turn was directly related to the technical underpinnings of AMD’s GCN 1.0 (and earlier) GPUs, which used the Crossfire Bridge Interconnect to share data when using Crossfire. The CFBI offered just 900MB/sec of bandwidth, which was enough for 2560x1600 but nothing more. To move larger frames between GCN 1.0 GPUs, AMD has to undertake a much trickier process of involving the PCI-Express bus.
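To get a rough feel for why 900MB/sec runs out above 2560x1600, consider the back-of-the-envelope Python sketch below. Every input other than the 900MB/sec figure is our own assumption for illustration: 4 bytes per pixel, a 60fps target, and alternate frame rendering in which the second GPU ships every other finished frame to the first. Real traffic depends on the game, the frame rate, and the driver.

```python
CFBI_MB_S = 900        # bridge bandwidth quoted in the text
BYTES_PER_PIXEL = 4    # assumption: 32-bit color
TARGET_FPS = 60        # assumption: 60 frames per second overall
AFR_SHARE = 0.5        # assumption: half of all frames cross the bridge


def transfer_rate_mb_s(width, height):
    """Estimated bridge traffic in MB/sec (1MB = 10**6 bytes) under the assumptions above."""
    frame_bytes = width * height * BYTES_PER_PIXEL
    return frame_bytes * TARGET_FPS * AFR_SHARE / 1e6


def fits_cfbi(width, height):
    """True if the estimated traffic stays within the CFBI's bandwidth."""
    return transfer_rate_mb_s(width, height) <= CFBI_MB_S


for width, height in [(2560, 1600), (3840, 2160)]:
    rate = transfer_rate_mb_s(width, height)
    print(f"{width}x{height}: {rate:.0f} MB/sec, fits: {fits_cfbi(width, height)}")
```

Under these assumptions 2560x1600 needs roughly half of the bridge's bandwidth, while a 4K frame stream slightly exceeds it.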

In the intervening period between then and now AMD has released their GCN 1.1 GPUs, which implement the XDMA block to specifically and efficiently handle frame transfers between GPUs. The end result of the XDMA block is that Hawaii based products – including the R9 295X2 – have no trouble with frame pacing. This in turn makes the R9 295X2 all the more important for AMD due to the fact that it’s the first dual-GPU video card from them to utilize this feature. Otherwise for the 7990 and other GCN 1.0 products, utilizing Crossfire with high resolutions involves a great deal more effort under the hood.

It was only finally in February of this year that AMD rolled out their Phase 2 driver, which implemented their high resolution frame pacing solution for pre-GCN 1.1 video cards. But since that same driver also launched support for AMD’s Mantle API and their Heterogeneous System Architecture, we haven’t had a chance to reevaluate AMD’s frame pacing situation until now. In our full benchmark section we’ll include a complete breakdown of frame pacing performance for both AMD and NVIDIA setups, but we first wanted to stop and take a look at frame pacing for pre-GCN 1.1 cards in particular.

So we’ve set out to answer the following question: now that AMD is supporting high resolution frame pacing on cards such as the 7990, has the 7990 been fully fixed?

The short answer, unfortunately, is that it’s a mixed bag. AMD has made significant improvements since we last evaluated frame pacing on the 7990 back at the 290X launch, which at the time saw the 7990 dropping frames left and right. But AMD has still not come far enough to truly fix the issue, as we’ll see.

We’ll start off with our delta percentage data, which is the average difference in frame times as a percentage. In an ideal world this number would be 0, indicating that every frame was delivered in exactly as much time as the previous one, which would give us a perfectly smooth experience. In practice however this is impossible to achieve even in a single-GPU setup, let alone a multi-GPU setup. So for multi-GPU setups our cutoff is 20%; if a GPU can deliver a frame with a variance of no more than 20% of the time the previous frame took, then the frame delivery is consistent enough that gameplay should be reasonably smooth and below the bounds of human perception, even if it’s not perfect.
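As an illustration of the metric, here is a minimal Python sketch of one plausible way to compute a delta percentage from a list of frame times. The exact normalization is an assumption on our part: this version divides the average absolute difference between consecutive frame times by the average frame time. The function name and the sample traces are hypothetical.

```python
def delta_percentage(frame_times_ms):
    """Average absolute change between consecutive frame times,
    expressed as a percentage of the average frame time.

    Assumption: normalizing by the mean frame time; other
    normalizations are possible.
    """
    if len(frame_times_ms) < 2:
        raise ValueError("need at least two frame times")
    deltas = [abs(b - a) for a, b in zip(frame_times_ms, frame_times_ms[1:])]
    mean_delta = sum(deltas) / len(deltas)
    mean_frame_time = sum(frame_times_ms) / len(frame_times_ms)
    return 100.0 * mean_delta / mean_frame_time


steady = [16.7] * 100           # hypothetical, perfectly paced trace
alternating = [15.0, 30.0] * 50  # hypothetical, badly paced trace

print(f"{delta_percentage(steady):.1f}%")       # prints 0.0%
print(f"{delta_percentage(alternating):.1f}%")  # prints 66.7%
```

A trace that flips between short and long frames lands far above the 20% cutoff even though its average frame rate looks healthy, which is exactly the kind of behavior the metric is meant to expose.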

Radeon HD 7990 Delta Percentages (Catalyst 14.4 Beta)

The end result, as we saw back in August with Catalyst 13.8, is that AMD’s frame pacing situation has been brought under control in single-display (2560x1440 and lower) resolutions, as AMD was able to continue using the CFBI and merely changed their algorithms to better handle frame pacing.

However the state of frame pacing for high resolution Crossfire, when invoking the PCIe bus, is still fundamentally broken. Of the games we have that scale with multiple GPUs, the best game is Bioshock: Infinite with a 48.5% delta, 3x the variance of 2560x1440. It gets worse from there, going up as high as 70% for Thief. To be clear this is a significant improvement over the 7990 that was dropping frames before AMD’s latest fix, but the deltas are still more than twice what we believe the cutoff should be.

Radeon R9 295X2 Delta Percentages (Catalyst 14.4 Beta)

The Radeon R9 295X2 by comparison fares much better. Not only are AMD’s deltas below 20% on everything but Crysis 3 (where it’s essentially skirting that value), but in most of our games the variance drops with the increased resolution, rather than massively increasing as it does with the 7990. This is the kind of chart we’d like to see for the 7990 as well, and not just the R9 295X2.

Our final graph is a plot of frame times on both cards on the main menu of Thief, showcasing how the cards compare and giving us a visual for just what’s going on. As our delta percentages picked up on, the 7990’s frame times are all over the place, with the card frequently cycling between 15ms frame times and 30ms frame times. This is as opposed to the R9 295X2, which is relatively consistent throughout.

What makes this all the more interesting though – and is something we’ve seen on other charts – is that the 7990’s variance drops towards the end. It’s still unquestionably worse than the R9 295X2 and exceeds our 20% threshold, but compared to the worst point on the chart it has come close to being halved.

This data indicates that for the pre-GCN 1.1 cards AMD is relying on some kind of long term adaptive timing mechanism that takes quite some time (at least a minute) to kick in. Only after that do AMD’s frame pacing mechanisms exert enough control to better regulate frame timings. We’ve known since the launch of the 290X that AMD is using some kind of short term adaptive timer for the 290X and single-display resolutions on pre-GCN 1.1 cards, but this is the first time we’ve seen a long term adaptive timer in use.
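One way to spot this kind of slow-acting behavior in a frame time log is to compute the delta percentage over consecutive windows rather than over the whole run. The sketch below is only an illustration: the metric definition (mean absolute frame-to-frame change over mean frame time) is an assumption, and the trace is made-up data shaped like the pattern described above, not our measurements.

```python
def windowed_delta_percentages(frame_times_ms, window=4):
    """Delta percentage for each consecutive, non-overlapping window of frames.

    Uses an assumed definition: mean absolute frame-to-frame change divided
    by the mean frame time within the window.
    """
    if window < 2:
        raise ValueError("window must contain at least two frames")
    results = []
    for start in range(0, len(frame_times_ms) - window + 1, window):
        chunk = frame_times_ms[start:start + window]
        deltas = [abs(b - a) for a, b in zip(chunk, chunk[1:])]
        mean_delta = sum(deltas) / len(deltas)
        mean_time = sum(chunk) / len(chunk)
        results.append(100.0 * mean_delta / mean_time)
    return results


# Hypothetical trace: wide 15ms/30ms swings early, narrower swings later.
trace = [15.0, 30.0, 15.0, 30.0, 19.0, 26.0, 19.0, 26.0]

for index, pct in enumerate(windowed_delta_percentages(trace)):
    print(f"window {index}: {pct:.1f}%")
# window 0: 66.7%
# window 1: 31.1%
```

In this made-up trace the second window shows roughly half the variance of the first while still sitting above a 20% threshold, which is the shape of result a long term adaptive timer would produce.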

The end result is that in extended play sessions the frame pacing situation on the 7990 should be better than what we’re seeing with our relatively short benchmarks. However it also means that the initial frame pacing situation will be very bad and that in the best case scenario even after a few minutes the frame timing variance is still much higher than the 20% threshold it takes to make the frame times reasonably consistent.

To that end, though the 7990 is much improved, it’s hard to say that the frame pacing situation has been fixed. To be sure single-display is fine and has been fine since August, but even with AMD’s most recent changes the 7990 (and presumably other pre-GCN 1.1 cards) is still struggling to display frames at an even pace. For a card originally touted as the perfect card for 4K, that is not a great outcome.



The Test

Starting with today’s article we’ve made a small change to our suite of games. We are replacing our last 2012 game, Hitman: Absolution with another Square Enix title: the recently released Thief. Both games make use of many of the same graphical features, and both games include a built-in benchmark that is a good approximation of what a worst case rendering load in the game will behave like, making Thief a solid replacement for the older Hitman.

Meanwhile we’ve also updated all of our benchmark results to reflect the latest drivers from AMD and NVIDIA. For all AMD cards we are using AMD’s R9 295X2 launch drivers, Catalyst 14.4. Catalyst 14.4 appears to be a new branch of AMD’s drivers, given the version number 14.100; however, we have found very few performance changes in our tests.

As for NVIDIA cards, we’re using the just-launched 337.50 drivers. These drivers contain a collection of performance improvements for NVIDIA cards and coincidentally come at just the right time for NVIDIA to counter AMD’s latest product launch.

We also need to quickly note that because AMD’s Radeon R9 295X2 uses an external 120mm radiator, we’ve had to modify our testbed to house the card. For our R9 295X2 tests we have pulled our testbed’s rear 140mm fan and replaced it with the R9 295X2 radiator. All other tests have the 140mm fan installed as normal.

CPU: Intel Core i7-4960X @ 4.2GHz
Motherboard: ASRock Fatal1ty X79 Professional
Power Supply: Corsair AX1200i
Hard Disk: Samsung SSD 840 EVO (750GB)
Memory: G.Skill RipjawZ DDR3-1866 4 x 8GB (9-10-9-26)
Case: NZXT Phantom 630 Windowed Edition
Monitor: Asus PQ321
Video Cards: AMD Radeon R9 295X2, AMD Radeon R9 290X, AMD Radeon R9 290, AMD Radeon HD 7990, AMD Radeon HD 6990, NVIDIA GeForce GTX Titan Black, NVIDIA GeForce GTX 780 Ti, NVIDIA GeForce GTX 780, NVIDIA GeForce GTX 690, NVIDIA GeForce GTX 590
Video Drivers: NVIDIA Release 337.50 Beta, AMD Catalyst 14.4 Beta
OS: Windows 8.1 Pro

 



Metro: Last Light

As always, kicking off our look at performance is 4A Games’ latest entry in their Metro series of subterranean shooters, Metro: Last Light. The original Metro: 2033 was a graphically punishing game for its time and Metro: Last Light is in its own right too. On the other hand it scales well with resolution and quality settings, so it’s still playable on lower end hardware.

Metro: Last Light - 3840x2160 - High Quality

Metro: Last Light - 3840x2160 - Low Quality

Metro: Last Light - 2560x1440 - High Quality

Our first gaming benchmark pretty much sets the tone for what we’ll be seeing in this review. In building the 295X2 AMD set out to build a single card that could match the performance of the 290X “Uber” in Crossfire, and that is exactly what we see happening here. The 295X2 and 290XU CF swap places due to run-to-run variation, but ultimately both tie together, whether it’s above the GTX 780 Ti SLI or below it.

As we’ve already seen with the 290X, thanks in part to AMD’s ROP advantage, AMD’s strong suit is in very high resolutions. This leads to the 295X2 edging out the competition at 2160p, while being edged out itself at 1440p. Nonetheless between AMD and NVIDIA setups this is a very close fight thus far, and will be throughout. As for Metro, even at the punishing resolution of 2160p, the 295X2 is fast enough to keep this game going at above 50fps.



Company of Heroes 2

Our second benchmark in our benchmark suite is Relic Games’ Company of Heroes 2, the developer’s World War II Eastern Front themed RTS. For Company of Heroes 2 Relic was kind enough to put together a very strenuous built-in benchmark that was captured from one of the most demanding, snow-bound maps in the game, giving us a great look at CoH2’s performance at its worst. Consequently if a card can do well here then it should have no trouble throughout the rest of the game.

Company of Heroes 2 - 3840x2160 - Low Quality

Company of Heroes 2 - 2560x1440 - Maximum Quality + Med. AA

Company of Heroes 2’s underlying engine is not AFR friendly, and as a result it receives no gains from the second GPU on the 295X2. This is a subtle but important reminder that although most games benefit from multi-GPU setups, there will always be games like Company of Heroes 2 where it’s not possible to scale beyond a single GPU. This is why maximizing single-GPU performance first before going wider is the preferred way to improve GPU performance.

Company of Heroes 2 - Min. Frame Rate - 3840x2160 - Low Quality

Company of Heroes 2 - Min. Frame Rate - 2560x1440 - Maximum Quality + Med. AA



Bioshock Infinite

Bioshock Infinite is Irrational Games’ latest entry in the Bioshock franchise. Though it’s based on Unreal Engine 3 – making it our obligatory UE3 game – Irrational had added a number of effects that make the game rather GPU-intensive on its highest settings. As an added bonus it includes a built-in benchmark composed of several scenes, a rarity for UE3 engine games, so we can easily get a good representation of what Bioshock’s performance is like.

Bioshock Infinite - 3840x2160 - Ultra Quality + DDoF

Bioshock Infinite - 3840x2160 - Medium Quality

Bioshock Infinite - 2560x1440 - Ultra Quality + DDoF

At Bioshock’s highest quality settings the game generally favors NVIDIA’s GPUs, particularly since NVIDIA’s most recent driver release. As a result we’ll see the 295X2 come up short of 60fps on Ultra quality at 2160p, and otherwise trail the GTX 780 Ti SLI at both 2160p and 1440p. However it’s interesting to note that at 2160p with Medium quality – a compromise setting mostly for testing single-GPU setups at this resolution – we see the 295X2 jump ahead of NVIDIA’s best, illustrating the fact that what’s ultimately dragging down AMD’s performance in this game is a greater degree of bottlenecking with Bioshock’s Ultra quality effects.

Bioshock Infinite - Delta Percentages

Bioshock Infinite - Surround/4K - Delta Percentages

Meanwhile our first set of frame pacing benchmarks has more or less set the stage. Thanks to its XDMA engine the 295X2 is able to deliver acceptable frame pacing performance at both 1440p and 2160p, though at 1440p in particular NVIDIA does technically fare better than AMD here. As for the Radeon HD 7990, this offers a solid example of how AMD’s older GCN 1.0 based dual-GPU card still has great difficulty with frame pacing at higher resolutions.



Battlefield 4

Our current major multiplayer action game of our benchmark suite is Battlefield 4, DICE’s 2013 multiplayer military shooter. After a rocky start, Battlefield 4 has finally reached a point where it’s stable enough for benchmark use, giving us the ability to profile one of the most popular and strenuous shooters out there. As these benchmarks are from single player mode, based on our experiences our rule of thumb here is that multiplayer framerates will dip to half our single player framerates, which means a card needs to be able to average at least 60fps if it’s to be able to hold up in multiplayer.

Battlefield 4 - 3840x2160 - Ultra Quality - 0x MSAA

Battlefield 4 - 3840x2160 - Medium Quality

Battlefield 4 - 2560x1440 - Ultra Quality

As is the case in a few of our other games, whether it’s AMD who’s winning or NVIDIA who’s winning depends on the resolution. At 2160p with Ultra settings (and no MSAA) it’s AMD on top, with the 295X2 capable of delivering 68fps. This is safely past the 60fps threshold needed to ensure that minimum framerates don’t drop below 30fps in multiplayer. Otherwise at 1440p the NVIDIA GTX 780 Ti SLI setup pulls ahead, however the 295X2 is right on its tail.
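The rule of thumb behind that 60fps target can be written out as a quick check. This is a minimal sketch of the halving rule described above; the function names and the 0.5 dip factor as a parameter are our own framing, and the 59fps input is a hypothetical value for contrast.

```python
def estimated_multiplayer_fps(single_player_avg_fps, dip_factor=0.5):
    """Estimate the multiplayer low point as half the single player average."""
    return single_player_avg_fps * dip_factor


def holds_up_in_multiplayer(single_player_avg_fps, floor_fps=30.0):
    """True if the estimated multiplayer dip stays at or above the floor."""
    return estimated_multiplayer_fps(single_player_avg_fps) >= floor_fps


# 68fps is the 295X2's single player result at 2160p Ultra (no MSAA).
print(estimated_multiplayer_fps(68))   # 34.0
print(holds_up_in_multiplayer(68))     # True
# Hypothetical card averaging just under the 60fps target.
print(holds_up_in_multiplayer(59))     # False
```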

In the meantime this game is also a good example of just how much faster than the 7990 the 295X2 is, despite the fact that both products are based on high-end (for their time) 28nm GPUs. The 295X2 ends up being over 40% faster at both 2160p and 1440p, showing just how far AMD has come in single card dual-GPU performance in the last year.

Battlefield 4 - Delta Percentages

Battlefield 4 - Surround/4K - Delta Percentages

Shifting to our delta percentage benchmarks, we once more find that the 295X2 has no problem staying within the 20% threshold needed for smooth frame pacing. Though overall NVIDIA does hold an edge, especially at 1440p.



Crysis 3

Still one of our most punishing benchmarks, Crysis 3 needs no introduction. With Crysis 3, Crytek has gone back to trying to kill computers and still holds the “most punishing shooter” title in our benchmark suite. Only in a handful of setups can we even run Crysis 3 at its highest (Very High) settings, and that’s still without AA. Crysis 1 was an excellent template for the kind of performance required to drive games for the next few years, and Crysis 3 looks to be much the same for 2014.

Crysis 3 - 3840x2160 - Medium Quality + FXAA

Crysis 3 - 3840x2160 - Low Quality + FXAA

Crysis 3 - 2560x1440 - High Quality + FXAA

Crysis 3 is another title that regularly favors NVIDIA cards, and despite AMD being able to close the gap through superior Crossfire scaling the 295X2 still trails the GTX 780 Ti SLI at all resolutions. That said, AMD does at least make it relatively close, and all the while manages to crack 60fps at 2160p with Medium settings, which is about as good as any dual-GPU setup can hope for in Crysis 3 right now.

Crysis 3 - Delta Percentages

Crysis 3 - Surround/4K - Delta Percentages

When it comes to our look at frame pacing, Crysis 3 is the one game where even the XDMA-equipped 295X2 is struggling to meet our own standards for acceptable frame pacing. Our normal threshold here is 20%, which the 295X2 just misses. The remaining difference is under 2% and in all likelihood should not be a significant problem for smoothness (it certainly hasn’t been an issue in our testing), but nonetheless it’s a game that could stand to see further improvements in AMD’s drivers.



Crysis: Warhead

Up next is our legacy title for 2013/2014, Crysis: Warhead. The stand-alone expansion to 2007’s Crysis, at over 5 years old Crysis: Warhead can still beat most systems down. Crysis was intended to be future-looking as far as performance and visual quality goes, and it has clearly achieved that. We’ve only finally reached the point where single-GPU cards have come out that can hit 60fps at 1920 with 4xAA, never mind 2560 and beyond.

Crysis: Warhead - 3840x2160 - Gamer Quality

Crysis: Warhead - 2560x1440 - Enthusiast Quality + 4x MSAA

At 1440p AMD and NVIDIA are within 10% of each other. However if we crank up the resolution to 2160p, the GTX 780 Ti SLI starts falling well behind the 295X2. Though this performance advantage doesn't translate to improved minimums; even at 2160p NVIDIA and AMD are close together on minimum framerates.

Crysis: Warhead - Min. Frame Rate - 3840x2160 - Gamer Quality

Crysis: Warhead - Min. Frame Rate - 2560x1440 - Enthusiast Quality + 4x MSAA



Total War: Rome 2

The second strategy game in our benchmark suite, Total War: Rome 2 is the latest game in the Total War franchise. Total War games have traditionally been a mix of CPU and GPU bottlenecks, so it takes a good system on both ends of the equation to do well here. In this case the game comes with a built-in benchmark that plays out over a forested area with a large number of units, definitely stressing the GPU in particular.

For this game in particular we’ve also gone and turned down the shadows to medium. Rome’s shadows are extremely CPU intensive (as opposed to GPU intensive), so this keeps us from CPU bottlenecking nearly as easily.

Total War: Rome 2 - 3840x2160 - Very High Quality + Med. Shadows

Total War: Rome 2 - 2560x1440 - Extreme Quality + Med. Shadows

For the moment we are including Total War: Rome II as a “freebie” in this review, as neither AMD nor NVIDIA is able to properly render this game. A recent patch for the game made it AFR friendly, unlocking multi-GPU scaling that hasn’t been available for the several months prior. However due to what’s presumably an outstanding bug in the game, when using CF/SLI we’re seeing different rendering artifacts on both AMD and NVIDIA cards.

Given the nature of the artifacting we suspect that performance will remain roughly the same once the problem is resolved, in which case the 295X2 will hold a small but significant lead, but there is no way to know for sure until the rendering issue is corrected. In the meantime this is progress for all multi-GPU cards, even if the game’s developers don’t have it perfected quite yet.



Thief

Our newest addition to our benchmark suite is Eidos Montreal’s stealth action game, Thief. Set amidst a Victorian-era fantasy environment, Thief is an Unreal Engine 3 based title which makes use of a number of supplementary Direct3D 11 effects, including tessellation and advanced lighting. Adding further quality to the game on its highest settings is support for SSAA, which can eliminate most forms of aliasing while bringing even the most powerful video cards to their knees.

Thief - 3840x2160 - High Quality

Thief - 3840x2160 - Low Quality

Thief - 2560x1440 - Very High Quality

Our first major review with Thief finds AMD taking a small lead at 2160p, with NVIDIA returning the favor at 1440p. In the case of 1440p both the AMD and NVIDIA setups are able to deliver well over 60fps (despite the heavy use of SSAA at this setting), while at 2160p even the 295X2 falls just a hair short of cracking 60fps even with the slightly lower quality settings.

Thief - Min. Frame Rate - 3840x2160 - High Quality

Thief - Min. Frame Rate - 3840x2160 - Low Quality

Thief - Min. Frame Rate - 2560x1440 - Very High Quality

Meanwhile when it comes to minimum framerates, while AMD and NVIDIA are close together at 1440p and 2160p with Low quality settings, moving to 2160p with High quality settings pretty much busts the NVIDIA SLI setup. It’s difficult to say for sure on the basis of a single SLI setup, but it looks like the memory requirements at these settings may be overwhelming the 3GB NVIDIA cards, especially in light of the GTX Titan Black’s unusual performance lead over the GTX 780 Ti. The additional buffer handling for SLI further eats into the pool of memory available for these cards, which in turn further hamstrings performance.

Thief - Surround/4K - Delta Percentages

Thief - Delta Percentages

On the other hand, other than the GTX 780 SLI’s initial bottoming out in this benchmark, NVIDIA does deliver stronger frame pacing performance. In both cases the 295X2 delivers acceptable consistency, staying under 20% variance, but it’s still a wider degree of variance than what we’re seeing with the GTX 780 Ti SLI setup.



GRID 2

The final game in our benchmark suite is also our racing entry, Codemasters’ GRID 2. Codemasters continues to set the bar for graphical fidelity in racing games, and with GRID 2 they’ve gone back to racing on the pavement, bringing to life cities and highways alike. Based on their in-house EGO engine, GRID 2 includes a DirectCompute based advanced lighting system in its highest quality settings, which incurs a significant performance penalty but does a good job of emulating more realistic lighting within the game world.

GRID 2 - 3840x2160 - Maximum Quality + 4x MSAA

GRID 2 - 2560x1440 - Maximum Quality + 4x MSAA

There’s little to say about GRID 2 other than that we continue to get amazing performance out of the game even though it’s still one of the best looking games in our benchmark suite. For what it is worth the 295X2 holds a distinct advantage over the GTX 780 Ti SLI at both resolutions we test – especially at 2160p – but even the relatively slow GTX 780 Ti SLI is delivering better than 120fps at 1440p and better than 60fps at 2160p, so the difference is somewhat academic (ed: 120Hz 4K monitors, anyone?).

GRID 2 - Delta Percentages

GRID 2 - Surround/4K - Delta Percentages



Synthetics

As always we’ll also take a quick look at synthetic performance. These synthetic benchmarks do scale with multiple GPUs, but since we’re just looking at a scaled up version of AMD’s existing Hawaii architecture, we will not find any surprises.

Synthetic: TessMark, Image Set 4, 64x Tessellation

Synthetic: 3DMark Vantage Texel Fill

Synthetic: 3DMark Vantage Pixel Fill



Compute

Our final set of performance benchmarks is compute performance, which for dual-GPU cards is always a mixed bag. Unlike gaming where the somewhat genericized AFR process is applicable to most games, when it comes to compute the ability for a program to make good use of multiple GPUs lies solely in the hands of the program’s authors and the algorithms they use.

At the same time while we’re covering compute performance for completeness, the high price and unconventional cooling apparatus for the 295X2 is likely to deter most serious compute users.

In any case, our first compute benchmark is LuxMark 2.0, the official benchmark of SmallLuxGPU 2.0. SmallLuxGPU is an OpenCL accelerated ray tracer that is part of the larger LuxRender suite. Ray tracing has become a stronghold for GPUs in recent years as ray tracing maps well to GPU pipelines, allowing artists to render scenes much more quickly than with CPUs alone.

Compute: LuxMark 2.0

As one of the few compute tasks that’s generally multi-GPU friendly, ray tracing is going to be the best case scenario for compute performance for the 295X2. Under LuxMark AMD sees virtually perfect scaling, with the 295X2 nearly doubling the 290X’s performance under this benchmark. No other single card is currently capable of catching up to the 295X2 in this case.

Our second compute benchmark is Sony Vegas Pro 12, an OpenGL and OpenCL video editing and authoring package. Vegas can use GPUs in a few different ways, the primary uses being to accelerate the video effects and compositing process itself, and in the video encoding step. With video encoding being increasingly offloaded to dedicated DSPs these days we’re focusing on the editing and compositing process, rendering to a low CPU overhead format (XDCAM EX). This specific test comes from Sony, and measures how long it takes to render a video.

Compute: Sony Vegas Pro 12 Video Render

Sony Vegas Pro on the other hand sees no advantage from multiple GPUs. The 295X2 does just as well as the other Hawaii cards at 22 seconds, sharing the top of the chart, but the second GPU goes unused.

Our third benchmark set comes from CLBenchmark 1.1. CLBenchmark contains a number of subtests; we’re focusing on the most practical of them, the computer vision test and the fluid simulation test. The former is a useful proxy for computer imaging tasks where systems are required to parse images and identify features (e.g. humans), while fluid simulations are common in professional graphics work and games alike.

Compute: CLBenchmark 1.1 Fluid Simulation

Compute: CLBenchmark 1.1 Computer Vision

Like Vegas Pro, the CLBenchmark sub-tests we use here don't scale with additional GPUs. So the 295X2 can only match the performance of the 290X on these benchmarks.

Moving on, our fourth compute benchmark is FAHBench, the official Folding @ Home benchmark. Folding @ Home is the popular Stanford-backed research and distributed computing initiative that has work distributed to millions of volunteer computers over the internet, each of which is responsible for a tiny slice of a protein folding simulation. FAHBench can test both single precision and double precision floating point performance, with single precision being the most useful metric for most consumer cards due to their low double precision performance. Each precision has two modes, explicit and implicit, the difference being whether water atoms are included in the simulation, which adds quite a bit of work and overhead. This is another OpenCL test, as Folding @ Home has moved exclusively to OpenCL this year with FAHCore 17.

Compute: Folding @ Home: Explicit, Single Precision

Compute: Folding @ Home: Explicit, Double Precision

Unlike most of our compute benchmarks, Folding@Home does see some degree of multi-GPU scaling. However the outcome is really a mixed bag; single-precision performance ends up being a wash (if not a slight regression) while double-precision is seeing sub-50% scaling.

Wrapping things up, our final compute benchmark is an in-house project developed by our very own Dr. Ian Cutress. SystemCompute is our first C++ AMP benchmark, utilizing Microsoft’s simple C++ extensions to allow the easy use of GPU computing in C++ programs. SystemCompute in turn is a collection of benchmarks for several different fundamental compute algorithms, as described in this previous article, with the final score represented in points. DirectCompute is the compute backend for C++ AMP on Windows, so this forms our other DirectCompute test.

Compute: SystemCompute v0.5.7.2 C++ AMP Benchmark

Our final compute benchmark has the 295X2 and 290X virtually tied once again, as this is another benchmark that doesn’t scale up with multiple GPUs.



Power, Temperature, & Noise

As always, last but not least is our look at power, temperature, and noise. Next to price and performance of course, these are some of the most important aspects of a GPU, due in large part to the impact of noise. All things considered, a loud card is undesirable unless there’s a sufficiently good reason – or sufficiently good performance – to ignore the noise.

Because the R9 295X2 is for all practical purposes performance-equivalent to a pair of 290X “Uber” mode cards in Crossfire, noise will play a big part in our evaluation here. With AMD charging what amounts to a $300 premium for this card over a pair of 290Xs, the better noise performance AMD can deliver over the 290X CF the more justifiable the premium is. At the same time the 290X CF is an absurdly loud setup due to the high fan speeds required, so it would reflect poorly on AMD if they could not beat it.

Meanwhile since we don’t have any voltage readouts for the R9 295X2, we’ll start off this section with a look at average clockspeeds.

Radeon R9 295X2 Average Clockspeeds
Boost Clock 1018MHz
Metro: LL 1018MHz
CoH2 1000MHz
Bioshock 1018MHz
Battlefield 4 1018MHz
Crysis 3 1018MHz
Crysis: Warhead 1018MHz
TW: Rome 2 1005MHz
Thief 1018MHz
GRID 2 1018MHz
Furmark 860MHz

AMD told us that one of their goals for the R9 295X2 was to rival the 290X “Uber” in Crossfire, and to do this they would need to be able to operate with little-to-no throttling. History has shown us that the 290X has the power to run at 1000MHz in virtually all games, but it lacks the cooling capacity to do so unless its fan speeds are greatly increased.

In the case of the R9 295X2, we can safely say that AMD has been able to deliver on their clockspeed promises, which is why the R9 295X2’s performance has been indistinguishable from 290X “Uber” Crossfire performance. With the exception of our strategy games, both of which throttle at certain points due to power restrictions on both the R9 295X2 and R9 290X, AMD’s latest dual-GPU card is able to maintain 1018MHz on all of our games.

In fact it does so well that we’re unable to determine the card’s base clockspeeds. FurMark bottoms out at 860MHz from power throttling, but try as we might we can’t get it to level out at any other lower clockspeed when introducing throttling scenarios, completely unlike the 290X. AMD’s refusal to publish their base clockspeeds – or even realistic average boost clockspeeds (ala NVIDIA’s boost clock) – continues to disappoint us. But in the case of the R9 295X2 in particular we’re happy to report that the 1018MHz boost clock AMD claims is the clockspeed the card should be able to sustain across all games.
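To put the throttling in proportion, the short sketch below expresses each average clockspeed from the table above as a percentage of the 1018MHz boost clock and lists only the workloads that fell short. The clockspeeds are the ones we recorded; the variable and function names are just for illustration.

```python
BOOST_CLOCK_MHZ = 1018

# Average clockspeeds from the table above, in MHz.
average_clock_mhz = {
    "Metro: LL": 1018,
    "CoH2": 1000,
    "Bioshock": 1018,
    "Battlefield 4": 1018,
    "Crysis 3": 1018,
    "Crysis: Warhead": 1018,
    "TW: Rome 2": 1005,
    "Thief": 1018,
    "GRID 2": 1018,
    "Furmark": 860,
}


def percent_of_boost(clock_mhz, boost_mhz=BOOST_CLOCK_MHZ):
    """Average clockspeed as a percentage of the advertised boost clock."""
    return 100.0 * clock_mhz / boost_mhz


# Keep only the workloads that averaged below the boost clock.
throttled = {
    name: round(percent_of_boost(mhz), 1)
    for name, mhz in average_clock_mhz.items()
    if mhz < BOOST_CLOCK_MHZ
}
print(throttled)
# {'CoH2': 98.2, 'TW: Rome 2': 98.7, 'Furmark': 84.5}
```

The two strategy games lose less than 2% of the boost clock on average, while FurMark is the only workload that pulls the card down substantially.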

Idle Power Consumption

Starting as always with idle power, it’s clear that AMD has been able to integrate a pair of 290Xs on to a single card without significantly altering the power requirements. At 98W at the wall, our 295X2-equipped system draws just 1W more than when we have a pair of 290Xs in, with the final watt being explained by the PCIe switch chip. This is even a slight improvement over the 7990 despite 295X2 having bigger GPUs with additional memory, with AMD’s newer card saving an additional 4W.

Load Power Consumption - Crysis 3

Moving over to load power consumption, what’s also clear is that in building the 295X2 AMD has only done a limited amount of chip binning. Compared to the 290X CF “Uber”, the 295X2 delivers virtually identical performance while drawing 41W less at the wall. To be sure, a 41W savings is nothing to sneeze at, but it also does little to resolve Hawaii’s status as a power-hungry chip. When NVIDIA is delivering better performance yet for 110W less at the wall, there’s no getting around the fact that AMD really did need a 500W budget to bring a pair of Hawaii GPUs on to a single board in this manner.

At the same time this is also a stark reminder that in the absence of newer manufacturing nodes from TSMC, high-end GPUs are primarily power limited. In building the 295X2 AMD was able to significantly increase their dual-GPU card performance, but it comes with a 150W+ power consumption increase as compared to the previous-generation 7990. This is not free performance by any means.

Load Power Consumption - FurMark

Switching to FurMark, our favorite pathological power test serves to reiterate what we have seen and said earlier. The 295X2 has the performance of a 290X CF setup; it also has the power consumption of a 290X CF setup. 675W at the wall ends up being exactly as much power as a 290X CF setup draws, while saving 55W over the thermally unrestricted 290X CF “Uber” setup. Furthermore in once again comparing it to the 7990 we can see just how much more power the 295X2 can draw than its predecessor, showing us a power consumption increase of 196W at the wall.

Meanwhile it’s interesting to note that while power consumption under Crysis 3 shows the 295X2 and GTX 780 Ti SLI far apart, this test puts them within 23W of each other. Since everyone hits their power limits here, 2 x 250W comes very close to AMD’s 500W limit.
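The FurMark comparisons above can be unpacked with some simple arithmetic. In the sketch below only the 675W reading and the two deltas are quoted figures; the wall power values for the other setups are implied by subtracting or adding those deltas, and the variable names are our own.

```python
# FurMark figures quoted above, measured at the wall (whole system).
R9_295X2_WALL_W = 675
SAVINGS_VS_290X_CF_UBER_W = 55   # 295X2 draws this much less than 290X CF "Uber"
INCREASE_OVER_7990_W = 196       # 295X2 draws this much more than the 7990

# Values implied by the deltas (derived, not separately quoted).
implied_290x_cf_uber_w = R9_295X2_WALL_W + SAVINGS_VS_290X_CF_UBER_W
implied_7990_w = R9_295X2_WALL_W - INCREASE_OVER_7990_W
increase_over_7990_pct = 100.0 * INCREASE_OVER_7990_W / implied_7990_w

# Board power limits: two 250W NVIDIA cards against AMD's single 500W card.
nvidia_sli_limit_w = 2 * 250
amd_295x2_limit_w = 500

print(implied_290x_cf_uber_w)                   # 730
print(implied_7990_w)                           # 479
print(round(increase_over_7990_pct, 1))         # 40.9
print(nvidia_sli_limit_w == amd_295x2_limit_w)  # True
```

With both setups capped at the same combined board power, it follows that their wall power converges once every card is pinned at its limit.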

Idle GPU Temperature

Moving on to temperatures, we can see that AMD’s CLLC performs rather well for itself even under idle. At 32C the 295X2 is roughly on par with most single-card setups, all the while it beats other setups like the 7990 and 290X CF by 5C and 10C respectively.

Load GPU Temperature - Crysis 3

The advantage of using a CLLC for a card like the 295X2 is not just in the amount of raw heat the cooler can collect and dissipate, but what temperatures it can sustain while doing so. At 71C under load the 295X2 is cooler than every other high end card on our charts, including the otherwise respectable 7990. 71C is not all that uncommon for semi-custom and fully-custom open air cooled cards, something we’ve frequently seen over the years, but it’s very rare for a reference card to register this low. It in turn is made all the more impressive due to the fact that the 295X2’s cooling setup is virtually a fully exhaustive cooler, so unlike open air cards that are recycling their hot air this is 71C with many of the benefits traditionally found on a blower type cooler.

It is however important to keep in mind that the 295X2 has a relatively low 75C temperature throttle. So although we’re not throttling under this test (sustaining 1018MHz throughout), we’re only 4C off from reaching that point. Measuring the 120mm fan on the CLLC finds that the fan is operating at its full speed, so the CLLC is operating very close to its own limits, too.

Finally this is why the leakage concept is so important here. At 71C AMD is undoubtedly leaking less power than the 290X CF is at 94C (how much is anyone’s guess at this point), which is part of the reason that the 295X2 is viable in the first place, and combined with the CLLC’s limits is why there is a need for a 75C throttle.

Load GPU Temperature - FurMark

Under FurMark we find our load temperatures unchanged. While FurMark undoubtedly puts a greater load on our 295X2, the card’s power throttle has kept the card from generating too much more in the way of heat, keeping temperatures in check and within the limits of the CLLC.

Idle Noise Levels

Last but not least we have our look at noise. Next to power consumption, we can see the 295X2’s second greatest weakness is idle noise. At 42.6dB the 295X2 is not particularly bad, but it’s just not particularly good either. It’s 1.1dB louder than the 290X CF, 2.6dB louder than a singular 290X, and significantly louder than the 7990. Both the 120mm CLLC fan and the on-board fans appear to be contributing to the noise here, and without fan controls it’s not possible to further adjust either of them. Though by our own admission it’s also not clear if either fan can even run lower or if this is at their minimum operating speeds.

At the end of the day we don’t believe 42.6dB will be a deal-breaker, and in fact for the kind of systems that many 295X2s will be going into there’s a good chance it won’t even be the loudest component at idle. But it is not capable of replicating the quiet sub-40dB idle noise levels that some of our best cards can pull off.

Load Noise Levels - Crysis 3

Moving on to load noise on the other hand exposes one of the 295X2’s greatest strengths. At this point the CLLC fan is running at full speed (~1860 RPM) and yet despite that we’re only measuring 50dB of noise. For a 500W card. For a card that delivers as much performance as an AMD setup that otherwise generates a full 15dB more noise.

To both praise and shame AMD, the reference cooler for the 290 series cards should have been better than what we ultimately got. But on the other hand you’d hardly believe that the 295X2 is a card from the same company that brought us the 290 just 6 months ago. The use of a CLLC is not groundbreaking and it does have some very clear tradeoffs, but the fact that AMD can offer this much performance at just 50dB of noise is music to our ears.

Now with that said, at least under games some of our better dual card setups aren’t too far off. The GTX 780 Ti SLI is just 3.7dB louder and delivers better framerates in this benchmark, so there are tradeoffs to be had and this is a testament to just how capable NVIDIA’s GTX Titan cooler is. But the fact that AMD can deliver this kind of framerate coupled with this low a noise level is still a significant accomplishment.
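To put those decibel gaps in perspective, it helps to remember that the decibel scale is logarithmic. The short sketch below converts the two differences quoted above (15dB and 3.7dB) into linear ratios using the standard decibel definitions. The function names are our own, chosen for illustration, and the output describes physical sound intensity and pressure rather than perceived loudness, which depends on the listener and the measurement weighting.

```python
# Convert a difference in decibels into linear ratios.
# Standard definitions: intensity (power) ratio = 10^(dB/10),
# sound pressure ratio = 10^(dB/20).
# Function names are illustrative only.

def intensity_ratio(delta_db: float) -> float:
    """Ratio of sound intensity for a given dB difference."""
    return 10 ** (delta_db / 10)


def pressure_ratio(delta_db: float) -> float:
    """Ratio of sound pressure for a given dB difference."""
    return 10 ** (delta_db / 20)


# dB gaps relative to the 295X2's 50dB, as quoted in the text.
gaps = [
    ("AMD setup at +15dB", 15.0),
    ("GTX 780 Ti SLI at +3.7dB", 3.7),
]

for label, delta in gaps:
    print(
        f"{label}: {intensity_ratio(delta):.1f}x intensity, "
        f"{pressure_ratio(delta):.2f}x sound pressure"
    )
```

In other words a 15dB gap is far larger than the raw number suggests, while a 3.7dB gap is a much more modest step up.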

Load Noise Levels - FurMark

Finally, going with our pathological test case puts the 295X2 in an even better position. Dissipating a full 500W, the 295X2 holds at 50dB when every other card goes to 51dB, 60dB, and even 66dB. This is with little doubt the limit of what AMD’s CLLC can do, but nonetheless it’s solid proof of just how capable the 295X2 is. Even in our worst case scenario it holds at 50dB, delivering top tier performance while generating far less noise than the alternatives.

In the end it’s difficult to avoid heaping praise on AMD for the 295X2’s cooler. To be abundantly clear, this rides heavily on the fact that the R9 295X2 is the only card in this review that is using a CLLC (every other card uses a blower or open air cooler), but there is such a thing as having the sense to push boundaries when one needs to. Which is to say that this isn’t as much a case of AMD innovating as it is AMD having the sense to turn to Asetek for their CLLC, but regardless of the source of the parts the results speak for themselves.

At the same time however, while AMD is doing a fantastic job of moving nearly 500W of heat it bears repeating that we’re still in a situation where AMD needs to move 500W of heat in the first place. Does the performance justify the power? Probably. But with the R9 295X2 AMD is treading new ground when it comes to power consumption, and there are other cost and cooling tradeoffs to be had for such high power consumption.



Final Words

Bringing this review to a close, when I first heard that AMD was going to build a full performance dual Hawaii GPU solution, I was admittedly unsure about what to expect. The power requirements for a dual Hawaii card would pose an interesting set of challenges for AMD, and AMD’s most recent high-end air coolers were not as effective as they should have been.

In that context AMD’s decision to build a card around a closed loop liquid cooling solution makes a lot of sense for what they wanted to achieve. Like any uncommon cooling solution, the semi-exotic nature of a CLLC is a double edged sword that brings with it both benefits and drawbacks; the benefits being enhanced cooling performance, and the drawbacks being complexity and size. So it was clear from the start that given AMD’s goals and their chips, the benefits they could stand to gain could very well outweigh the drawbacks of going so far off of the beaten path.

To that end the Radeon R9 295X2 is a beast, and that goes for every sense of the word. From a performance standpoint AMD has delivered on their goals of offering the full, unthrottled performance of a Radeon R9 290X Crossfire solution. AMD has called the 295X2 an “uncompromised” card and that’s exactly what they have put together, making absolutely no performance compromises in putting a pair of Hawaii GPUs on to a single video card. In a sense it’s almost too simple – there are no real edge cases or other performance bottlenecks to discuss – but then again that’s exactly what it means to offer uncompromised performance.

“Beastly” is just as fitting for the card when it comes to its cooling too. With a maximum noise level of 50dB the 295X2’s CLLC is unlike anything we’ve reviewed before, offering acoustic performance as good as or better than some of the best high end cards of this generation despite the heavy cooling workloads such a product calls for. Which brings us to the other beastly aspect, which is the card’s 500W TDP. AMD has put together a card that can put out 500W of heat and still keep itself cooled, but there’s no getting around the fact that at half a kilowatt in power consumption the 295X2 draws more power than any other single card we’ve reviewed before.

Taken altogether this puts the 295X2 in a very interesting spot. The performance offered by the 295X2 is the same performance offered by the 290X in Crossfire, no more and no less. This means that depending on whether we’re looking at 2K or 4K resolutions the 295X2 either trails a cheaper set of GTX 780 Tis in SLI by 5%, or at the kinds of resolutions that most require this much performance it can now exceed those very same GeForce cards by 5%. On the other hand NVIDIA still holds a consistent edge over AMD in frame pacing. But thanks to their XDMA engine AMD's frame pacing performance is vastly improved compared to their prior dual-GPU cards and is now good enough overall (though there's definitely room for further improvement).

But more significantly, by its very nature as a CLLC equipped dual-GPU video card the 295X2 stands alone among current video cards. There’s nothing else like it in terms of design, and that admittedly makes it difficult to properly place the 295X2 in reference to other video cards. Do we talk about how it’s one of only a handful of dual-GPU cards? Or do we talk about the price? Or do we talk about the unconventional cooler?

However perhaps it’s best to frame the 295X2 with respect to its competition, or rather the lack thereof. For all the benefits and drawbacks of AMD’s card, perhaps the most unexpected thing they have going for them is that they won’t be facing any real competition from NVIDIA. NVIDIA has announced their own dual-GPU card for later this month, the GeForce GTX Titan Z. But priced at $3000 and targeted more heavily at compute users than at gamers, the GTX Titan Z is going to reside in its own little niche, leaving the 295X2 alone in the market at half the price. We’ll see what GTX Titan Z brings to the table later this month, but no matter what, AMD is going to have an incredible edge on price that we expect will make most potential buyers think twice, despite the 295X2’s own $1500 price tag.

Ultimately while this outcome does put the 295X2 in something of a “winner by default” position, it does not change the fact that AMD has put together a very solid card, and what’s by far their best dual-GPU card yet. Between the price tag and the unconventional cooler it’s certainly a departure from the norm, but for those buyers who can afford and fit this beastly card, it sets a new and very high standard for just what a dual-GPU should do.
