Original Link: http://www.anandtech.com/show/6934/choosing-a-gaming-cpu-single-multigpu-at-1440p
Choosing a Gaming CPU: Single + Multi-GPU at 1440p, April 2013
by Ian Cutress on May 8, 2013 10:00 AM EST
One question when building or upgrading a gaming system is which CPU to choose - does it matter if I have a quad core from Intel, or a quad module from AMD? Perhaps something simpler will do the trick, and I can spend the difference on the GPU. What if you are running a multi-GPU setup, does the CPU have a bigger effect? These were the questions I set out to help answer.
A few things before we start:
This set of results is by no means extensive or exhaustive. For the sake of expediency I could not select 10 different gaming titles across a variety of engines and then test them in seven or more different configurations per game and per CPU, nor could I test every different CPU made. As a result, on the gaming side, I limited myself to one resolution, one set of settings, and four very regular testing titles that offer time demos: Metro 2033, DiRT 3, Civilization V and Sleeping Dogs. This is obviously not Skyrim, Battlefield 3, Crysis 3 or Far Cry 3, which may be more relevant in your setup.
The arguments for and against time demo testing as well as the arguments for taking FRAPs values of sequences are well documented (time demos might not be representative vs. consistency and realism of FRAPsing a repeated run across a field), however all of our tests can be run on home systems to get a feel for how a system performs. Below is a discussion regarding AI, one of the common usages for a CPU in a game, and how it affects the system. Out of our benchmarks, DiRT 3 plays a game, including AI in the result, and the turn-based Civilization V has no concern for direct AI except for time between turns.
All this ties in with my position as the senior motherboard editor here at AnandTech – the position gives me access to a wide variety of motherboard chipsets, lane allocations and a fair number of CPUs. GPUs are not necessarily in large supply on my side of the reviewing area, but ASUS and ECS have provided my test beds with HD7970s and GTX580s respectively, and these cards have been an integral part of my test bed for 12 and 21 months. The task set before me in this review would be almost a career in itself if we were to expand to more GPUs and more multi-GPU setups. Thus testing up to 4x 7970 and up to 2x GTX 580 is a more than reasonable place to start.
Where It All Began
The most important point to note is how this set of results came to pass. Several months ago I came across a few sets of testing by other review websites that floored me – simple CPU comparison tests for gaming which were spreading like wildfire among the forums, and some results contradicted the general prevailing opinion on the topic. These results were pulling all sorts of lurking forum users out of the woodwork to have an opinion, and being the well-adjusted scientist I am, I set forth to confirm the results were, at least in part, valid.
What came next was a shock – some had no real explanation of the hardware setups. While the basic overview of hardware was supplied, there was no rundown of settings used, and no attempt to justify the findings which had obviously caused quite a stir. Needless to say, I was stunned by the lack of detailed testing, as well as by both the results and a lot of the conversation that followed, particularly from avid fans of Team Blue and Team Red. I planned to right this wrong the best way I know how – with science!
The other reason for pulling together the results in this article is perhaps the one I originally started with – the need to update drivers every so often. Since the Ivy Bridge release, I have been using Catalyst 12.3 and GeForce 296.10 WHQL on my test beds. This causes problems – older drivers are not optimized, readers sometimes complain if older drivers are used, and new games cannot be added to the test bed because they might not scale correctly due to the older drivers. So while there are some reviews on the internet that update drivers between testing and keep the old numbers (leading to skewed results), taking time out to retest a number of platforms for more data points solely on the new drivers is a large undertaking.
For example, testing new drivers over six platforms (CPU/motherboard combinations) would mean: six platforms, four games, seven different GPU configurations, ~10 minutes per test plus 2+ hours to set up each platform and install a new OS/drivers/set up benchmarks. That makes 40+ hours of solid testing (if all goes without a second lost here or there), or just over a full working week – more if I also test the CPU performance for a computational benchmark update, or exponentially more if I include multiple resolutions and setting options.
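As a quick check of that estimate, the arithmetic can be written out as below. This is only a sketch using the figures quoted in the paragraph above; the variable names are illustrative, and the setup time is taken at its stated lower bound of two hours per platform.

```python
# Rough retest-time estimate from the figures quoted in the text.
platforms = 6
games = 4
gpu_configs = 7
minutes_per_test = 10           # "~10 minutes per test"
setup_hours_per_platform = 2    # "2+ hours", taken at the lower bound

# Every platform runs every game in every GPU configuration.
benchmark_hours = platforms * games * gpu_configs * minutes_per_test / 60
setup_hours = platforms * setup_hours_per_platform
total_hours = benchmark_hours + setup_hours

print(f"Benchmarking: {benchmark_hours:.0f} h")   # 28 h
print(f"Setup:        {setup_hours} h")           # 12 h
print(f"Total:        {total_hours:.0f} h")       # 40 h
```

Adding a second resolution or settings level multiplies the benchmarking term rather than adding to it, which is why the total grows so quickly.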
If this is all that is worked on that week, it means no new content – so it happens rarely, perhaps once a year or before a big launch. This time was now, and when I started this testing, I was moving to Catalyst 13.1 and GeForce 310.90, which by the time this review goes live will have already been superseded! In reality, I have been slowly working on this data set for the best part of 10 weeks while also reviewing other hardware (but keeping those reviews with consistent driver comparisons). In total this review encapsulates 24 different CPU setups, with up to 6 different GPU configurations, meaning 430 data points, 1375 benchmark loops and over 51 hours in just GPU benchmarks alone, without considering setup time or driver issues.
What Does the CPU do in a Game?
A lot of game developers use customized versions of game engines, such as the EGO engine for driving games or the Unreal engine. The engine provides the underpinnings for a lot of the code, and the optimizations therein. The engine also decides what in the game gets offloaded onto the GPU.
Imagine the code that makes up the game as a linear sequence of events. In order to go through the game quickly, we need the fastest single core processor available. Of course, games are not like this – lots of the game can be parallelized, such as vector calculations for graphics. These were of course the first to be moved from CPU to the GPU. Over time, more parts of the code have made the move – physics and compute being the main features in recent months and years.
The GPU is good at independent, simple tasks – calculating which color is in which pixel is an example of this, along with additional processing and post-processing features (FXAA and so on). If a task is linear, it lives on the CPU, such as loading textures into memory or negotiating which data to transfer between the memory and the GPUs. The CPU also takes control of independent complex tasks, as the CPU is the one that can make complicated logic analysis.
Very few parts of a game come under this heading of ‘independent yet complex’. Anything suitable for the GPU but not ported over will be here, and the big one usually quoted is artificial intelligence. Deciding where an NPC is going to run, shoot or fly could be considered a very complex set of calculations, ideal for fast CPUs. The counter argument is that games have had complex AI for years – the number of times I personally was destroyed by a Dark Sim on Perfect Dark on the N64 is testament to either my uselessness or the fact that complex AI can be configured with not much CPU power. AI is unlikely to be a limiting factor in frame rates due to CPU usage.
What is most likely going to be the limiting factor is how the CPU can manage data. As engines evolve, they try and use data between the CPU, memory and GPUs less – if textures can be kept on the GPU, then they will stay there. But some engines are not as perfect as we would like them to be, resulting in the CPU as the limiting factor. As CPU performance increases, and those that write the engines in which games are made understand the ecosystem, CPU performance should be less of an issue over time. All roads point towards the PS4 of course, and its 8-core Jaguar processor. Is this all that is needed for a single GPU, albeit in an HSA environment?
Another angle I wanted to test beyond most other websites is multi-GPU. There is content online dealing mostly with single GPU setups, with a few for dual GPU. Even though the number of multi-GPU users is actually quite small globally, the enthusiast markets are clearly geared for it. We get motherboards with support for four GPU cards; we have cases that will support a dual processor board as well as four double-height GPUs. Then there are GPUs being released with two sets of silicon on a PCB, wrapped in a double or triple width cooler.
More often than not on a forum, people will ask ‘what GPU for $xxx’ and some of the suggestions will be towards two GPUs at half the budget, as it commonly offers more performance than a single GPU if the game and the drivers all work smoothly (at the cost of power, heat, and bad driver scenarios). The ecosystem supports multi-GPU setups, so I felt it right to test at least one four-way setup. Although with great power comes great responsibility – there was no point testing 4-way 7970s on 1080p.
Typically in this price bracket, users will go for multi-monitor setups, along the lines of 5760x1080, or big monitor setups like 1440p, 1600p, or the mega-rich might try 4K. Ultimately the high end enthusiast, with cash to burn, is going to gravitate towards 4K, and I cannot wait until that becomes a reality. So for a median point in all of this, we are testing at 1440p and maximum settings. This will put the strain on our Core 2 Duo and Celeron G465 samples, but should be easy pickings for our multi-processor, multi-GPU beast of a machine.
A Minor Problem In Interpreting Results
Throughout testing for this review, there were clearly going to be some issues to consider. Chief of these is the question of consistency and in particular if something like Metro 2033 decides to have an ‘easy’ run which reports +3% higher than normal. For that specific example we get around this by double testing, as the easy run typically appears in the first batch – so we run two or three batches of four and disregard the first batch.
The other, perhaps bigger, issue is interpreting results. If I get 40.0 FPS on a Phenom II X4-960T, 40.1 FPS on an i5-2500K, and then 40.2 FPS on a Phenom II X2-555 BE, does that make the results invalid? The important points to recognize here are statistics and system state.
System State: We have all had times booting a PC when it feels sluggish, but this sluggish behavior disappears on reboot. The same thing can occur with testing, and usually happens as a result of bad initialization or a bad cache optimization routine at boot time. As a result, we try and spot these circumstances and re-run. With more time we would take 100 different measurements of each benchmark, with reboots, and cross out the outliers. Time constraints outside of academia unfortunately do not give us this opportunity.
Statistics: System state aside, frame rate values will often fluctuate around an average. This will mean (depending on the benchmark) that the result could be +/- a few percentage points on each run. So what happens if you have a run of four time demos, and each of them is +2% above the ‘average’ FPS? From the outside, as you will not know the true average, you cannot say if it is valid as the data set is extremely small. If we take more runs, we can find the variance (the technical version of the term), the standard deviation, and perhaps represent the mean, median and mode of a set of results.
As always, the main constraint in articles like these is time – the quicker to publish, the less testing, the larger the error bars and the higher likelihood that some results are going to be skewed because it just so happened to be a good/bad benchmark run. So the example given above of the X2-555 getting a better result is down to interpretation – each result might be +/- 0.5 FPS on average, and because they are all pretty similar we are actually more GPU limited. So it is more whether the GPU has a good/bad run in this circumstance.
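To make the interpretation point concrete, here is a small sketch that treats each result as a range rather than a single number. The +/- 0.5 FPS margin comes from the paragraph above; the helper name `indistinguishable` and the simple overlap rule are assumptions for illustration, not a formal statistical test.

```python
def indistinguishable(a, b, margin):
    """True when two results, each carrying +/- margin, have overlapping ranges."""
    return abs(a - b) <= 2 * margin


# The three example results from the text, in FPS.
results = {
    "Phenom II X4-960T": 40.0,
    "i5-2500K": 40.1,
    "Phenom II X2-555 BE": 40.2,
}
margin = 0.5  # assumed run-to-run spread in FPS

names = list(results)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        same = indistinguishable(results[first], results[second], margin)
        verdict = "overlap" if same else "distinct"
        print(f"{first} vs {second}: {verdict}")
```

All three pairs overlap under this rule, so the ordering between these CPUs carries no real information in a GPU-limited test.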
For this example, I batched 100 runs of my common WinRAR test in motherboard testing, on an i5-2500K CPU with a Maximus V Formula. Results varied between 71 seconds and 74 seconds, with a large gravitation towards the lower end. To represent this statistically, we normally use a histogram, which separates the results up into ‘bins’ (e.g. 71.00 seconds to 71.25 seconds) of how accurate the final result has to be. Here is an initial representation of the data (time vs. run number), and a few histograms of that data, using a bin size of 1.00 s, 0.75s, 0.5s, 0.33s, 0.25s and 0.1s.
As we get down to the lower bin sizes, there is a pair of large groupings of results between ~71 seconds and ~ 72 seconds. The overall average/mean of the data is 71.88 due to the outliers around 74 seconds, with the median at 72.04 seconds and standard deviation of 0.660. What is the right value to report? Overall average? Peak? Average +/- standard deviation? With the results very skewed around two values, what happens if I do 1-3 runs and get ~71 seconds and none around ~72 seconds?
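The effect of bin size can be sketched with a few lines of code. The timings below are made-up placeholder values chosen to mimic two groupings plus a couple of slow outliers; they are not the measured WinRAR results, and the `histogram` helper is hypothetical.

```python
import math
from collections import Counter
from statistics import mean, median, stdev


def histogram(values, start, bin_size):
    """Count values per bin; bin k covers [start + k*bin_size, start + (k+1)*bin_size)."""
    counts = Counter(math.floor((v - start) / bin_size) for v in values)
    return dict(sorted(counts.items()))


# Placeholder timings in seconds (NOT the article's data).
times = [71.10, 71.15, 71.20, 71.30, 71.95, 72.05, 72.10, 72.15, 73.80, 73.90]

for bin_size in (1.0, 0.5, 0.25):
    print(f"bin size {bin_size:.2f} s")
    for k, count in histogram(times, 71.0, bin_size).items():
        low = 71.0 + k * bin_size
        print(f"  {low:.2f}-{low + bin_size:.2f} s: {'#' * count}")

print(f"mean={mean(times):.2f}  median={median(times):.2f}  stdev={stdev(times):.2f}")
```

With wide bins the two groupings merge into one lump; with narrow bins they separate, which is the same behavior described for the real data above. Neither the mean nor the median alone reveals that there are two clusters.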
Statistics is clearly a large field, and without a large sample size, most numbers can be one-off results that are not truly reflective of the data. It is important to ask yourself every time you read a review with a result – how many data points went into that final value, and what analysis was performed?
For this review, we typically take four runs of our GPU tests each, except Civilization V which is extremely consistent +/- 0.1 FPS. The result reported is the average of those four values, minus any results we feel are inconsistent. At times runs have been repeated in order to confirm the value, but this will not be noted in the results.
The Bulldozer Challenge
Another purpose of this article was to tackle the problem surrounding Bulldozer and its derivatives, such as Piledriver and thus all Trinity APUs. The architecture is such that Windows 7, by default, does not assign new threads to new modules – the ‘freshly installed’ stance is to double up on threads per module before moving to the next. By installing a pair of Windows Updates (which do not show in Windows Update automatically), we get an effect called ‘core parking’, which assigns the first series of threads each to its own module, giving each thread access to a pair of INT units and an FP unit, rather than having pairs of threads competing for the prize. This affects variable threaded loading the most, particularly from 2 to 2N-2 threads where N is the number of modules in the CPU (thus 2 to 6 threads in an FX-8150). It should come as no surprise that games fall into this category, so we want to test with and without the core parking updates in our benchmarks.
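The 2 to 2N-2 range can be checked with a toy model of the two placement policies. This is a deliberately simplified sketch: the function `place_threads` is hypothetical, it only counts threads per module, and it does not attempt to model how the Windows scheduler actually behaves.

```python
def place_threads(threads, modules, spread):
    """Return the number of threads placed on each module (two cores per module).

    spread=False: fill both cores of a module before moving to the next one
                  (the default behaviour described in the text).
    spread=True:  give each of the first threads its own module, then double up
                  (the behaviour described after the updates).
    """
    if threads > 2 * modules:
        raise ValueError("more threads than cores in this simple model")
    load = [0] * modules
    for t in range(threads):
        index = t % modules if spread else t // 2
        load[index] += 1
    return load


modules = 4  # FX-8150: four modules, eight threads
differs = [
    t
    for t in range(1, 2 * modules + 1)
    if sorted(place_threads(t, modules, False)) != sorted(place_threads(t, modules, True))
]
print(differs)  # [2, 3, 4, 5, 6]
```

In this model the two policies produce the same module loading at 1, 7 and 8 threads, and differ everywhere in between, matching the 2 to 6 thread range quoted for the FX-8150.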
Hurdles with NVIDIA and 3-Way SLI on Ivy Bridge
Users who have been keeping up to date with motherboard options on Z77 will understand that there are several ways to put three PCIe slots onto a motherboard. The majority of sub-$250 motherboards will use three PCIe slots in a PCIe 3.0 x8/x8 + PCIe 2.0 x4 arrangement (meaning x8/x8 from the CPU and x4 from the chipset), allowing either two-way SLI or three-way Crossfire. Some motherboards will use a different Ivy Bridge lane allocation option such that we have a PCIe 3.0 x8/x4/x4 layout, giving three-way Crossfire but only two-way SLI. In fact in this arrangement, fitting the final x4 with a sound/raid card disables two-way SLI entirely.
This is due to a not widely publicized requirement of SLI – it needs at least an x8 lane allocation in order to work (either PCIe 2.0 or 3.0). Anything less than this on any GPU and you will be denied in the software. So putting in that third card will cause the second slot to drop to x4, disabling two-way SLI. There are motherboards that have a switch to change to x8/x8 + x4 in this scenario, but we are still capped at two-way SLI.
The only way to go onto 3-way or 4-way SLI is via a PLX 8747 enabled motherboard, which greatly increases the cost of a motherboard build. This should be kept in mind when dealing with the final results.
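The x8 rule described above reduces to a simple check on the lanes each populated slot receives. The sketch below encodes only that rule as stated in the text; the helper `sli_allowed` and the layout names are illustrative, and real driver behavior may involve further conditions.

```python
MIN_SLI_LANES = 8  # per the text: every GPU needs at least an x8 link


def sli_allowed(lanes_per_gpu):
    """SLI needs two or more GPUs, each with at least MIN_SLI_LANES lanes."""
    return len(lanes_per_gpu) >= 2 and all(
        lanes >= MIN_SLI_LANES for lanes in lanes_per_gpu
    )


# Lanes seen by each populated slot in the layouts discussed above.
layouts = {
    "x8/x8, two cards": [8, 8],
    "x8/x4/x4, three cards": [8, 4, 4],
    "x8/x8 + x4 chipset, three cards": [8, 8, 4],
    "x16/x8/x8 via PLX, three cards": [16, 8, 8],
}

for name, lanes in layouts.items():
    status = "SLI possible" if sli_allowed(lanes) else "SLI denied"
    print(f"{name}: {status}")
```

Only the two-card x8/x8 case and the PLX-based three-card case pass, which is why three-way SLI on this platform requires the more expensive PLX boards.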
It has come to my attention that even if the results were to come out X > Y, some users may call out that the better processor draws more power, which at the end of the day costs more money if you add it up over a year. For the purposes of this review, we are of the opinion that if you are gaming on a budget, then high-end GPUs such as the ones used here are not going to be within your price range.
Simple fun gaming can be had on a low resolution, limited detail system for not much money – for example at a recent LAN I went to I enjoyed 3-4 hours of TF2 fun on my AMD netbook with integrated HD3210 graphics, even though I had to install the ultra-low resolution texture pack and mods to get 30+ FPS. But I had a great time, and thus the beauty of high definition graphics of the bigger systems might not be of concern as long as the frame rates are good.
But if you want the best, you will pay for the best, even if it comes at the electricity cost. Budget gaming is fine, but this review is designed to focus on 1440p with maximum settings, which is not a budget gaming scenario.
Format Of This Article
On the next couple of pages, I will be going through in detail our hardware for this review, including CPUs, motherboards, GPUs and memory. Then we will move to the actual hardware setups, with CPU speeds and memory timings (with motherboards that actually enable XMP) detailed. Also important to note are the motherboards being used – for completeness I have tested several CPUs in two different motherboards because of GPU lane allocations.
We are living in an age where PCIe switches and additional chips are used to expand GPU lane layouts, so much so that there are up to 20 different configurations for Z77 motherboards alone. Sometimes the lane allocation makes a difference, and it can make a large difference using three or more GPUs (x8/x4/x4 vs. x16/x8/x8 with PLX), even with the added latency sometimes associated with the PCIe switches. Our testing over time will include the majority of the PCIe lane allocations on modern setups, but for our first article we are looking at the major ones we are likely to come across.
The results pages will start with a basic CPU analysis, running through my regular motherboard tests on the CPU. This should give us a feel for how much power each CPU has in dealing with mathematics and real world tests, both for integer operations (important on Bulldozer/Piledriver/Radeon) and floating point operations (where Intel/NVIDIA seem to perform best).
We will then move to each of our four gaming titles in turn, in our six different GPU configurations. As mentioned above, in GPU limited scenarios it may seem odd if a sub-$100 CPU is higher than one north of $300, but we hope to explain the tide of results as we go.
I hope this will be an ongoing project here at AnandTech, and over time we can add more CPUs, 4K testing, perhaps even show four-way Titan should that be available to us. The only danger is that on a driver or game change, it takes another chunk of time to get data! Any suggestions of course are greatly appreciated – drop me an email at firstname.lastname@example.org. Our next port of call will most likely be Haswell, which I am very much looking forward to testing.
CPUs, GPUs, Motherboards, and Memory
For an article like this, getting a range of CPUs, which includes the most common and popular, is very important. I have been at AnandTech for just over two years now, and in that time we have had Sandy Bridge, Llano, Bulldozer, Sandy Bridge-E, Ivy Bridge, Trinity and Vishera, of which I tend to get supplied the top end processors of each generation for testing. (As a motherboard reviewer, it is important to make the motherboard the limiting factor.) A lot of users have jumped to one of these platforms, although a large number are still on Wolfdale (Core2), Nehalem, Westmere, Phenom II (Thuban/Zosma/Deneb) or Athlon II.
I have attempted to pool all my AnandTech resources, contacts, and personal resources together to get a good spread of the current ecosystem, with more focus on the modern end of the spectrum. It is worth noting that a multi-GPU user is more likely to have the top line Ivy Bridge, Vishera or Sandy Bridge-E CPU, as well as a top range motherboard, rather than an old Wolfdale. Nevertheless, we will see how they perform. There are a few obvious CPU omissions that I could not obtain for this first review which will hopefully be remedied over time in our next update.
My criteria for obtaining CPUs was to use at least one from the most recent architectures, as well as a range of cores/modules/threads/speeds. The basic list as it stands is:
|Name||Architecture||Socket||Cores / Modules (Threads)||Speed (MHz)||Turbo (MHz)||L2 / L3 Cache|
|A6-3650||Llano||FM1||4 (4)||2600||N/A||4 MB / None|
|A8-3850||Llano||FM1||4 (4)||2900||N/A||4 MB / None|
|A8-5600K||Trinity||FM2||2 (4)||3600||3900||4 MB / None|
|A10-5800K||Trinity||FM2||2 (4)||3800||4200||4 MB / None|
|Phenom II X2-555 BE||Callisto K10||AM3||2 (2)||3200||N/A||1 MB / 6 MB|
|Phenom II X4-960T||Zosma K10||AM3||4 (4)||3200||N/A||2 MB / 6 MB|
|Phenom II X6-1100T||Thuban K10||AM3||6 (6)||3300||3700||3 MB / 6 MB|
|FX-8150||Bulldozer||AM3+||4 (8)||3600||4200||8 MB / 8 MB|
|FX-8350||Piledriver||AM3+||4 (8)||4000||4200||8 MB / 8 MB|
|E6400||Conroe||775||2 (2)||2133||N/A||2 MB / None|
|E6700||Conroe||775||2 (2)||2667||N/A||4 MB / None|
|Celeron G465||Sandy Bridge||1155||1 (2)||1900||N/A||0.25 MB / 1.5 MB|
|Core i5-2500K||Sandy Bridge||1155||4 (4)||3300||3700||1 MB / 6 MB|
|Core i7-2600K||Sandy Bridge||1155||4 (8)||3400||3800||1 MB / 8 MB|
|Core i3-3225||Ivy Bridge||1155||2 (4)||3300||N/A||0.5 MB / 3 MB|
|Core i7-3770K||Ivy Bridge||1155||4 (8)||3500||3900||1 MB / 8 MB|
|Core i7-3930K||Sandy Bridge-E||2011||6 (12)||3200||3800||1.5 MB / 12 MB|
|Core i7-3960X||Sandy Bridge-E||2011||6 (12)||3300||3900||1.5 MB / 15 MB|
|Xeon X5690||Westmere||1366||6 (12)||3467||3733||1.5 MB / 12 MB|
A small selection
The omissions are clear to see, such as the i5-3570K, a dual core Llano/Trinity, a dual/tri module Bulldozer/Piledriver, i7-920, i7-3820, or anything Nehalem. These will hopefully be coming up in another review.
My first and foremost thanks go to both ASUS and ECS for supplying me with these GPUs for my test beds. They have been in and out of 60+ motherboards without any issue, and will hopefully continue. My usual scenario for updating GPUs is to flip AMD/NVIDIA every couple of generations – last time it was HD5850 to HD7970, and as such in the future we will move to a 7-series NVIDIA card or a set of Titans (which might outlive a generation or two).
ASUS HD 7970 (HD7970-3GD5)
The ASUS HD 7970 is the reference model at the 7970 launch, using GCN architecture, 2048 SPs at 925 MHz with 3GB of 4.6GHz GDDR5 memory. We have four cards to be used in 1x, 2x, 3x and 4x configurations where possible, also using PCIe 3.0 when enabled by default.
ECS GTX 580 (NGTX580-1536PI-F)
ECS is both a motherboard manufacturer and an NVIDIA card manufacturer, and while most of their VGA models are sold outside of the US, some do make it onto etailers like Newegg. This GTX 580 is also a reference model, with 512 CUDA cores at 772 MHz and 1.5GB of 4GHz GDDR5 memory. We have two cards to be used in 1x and 2x configurations at PCIe 2.0.
The CPU is not always the main part of the picture for this sort of review – the motherboard is equally important as the motherboard dictates how the CPU and the GPU communicate with each other, and what the lane allocation will be. As mentioned on the previous page, there are 20+ PCIe configurations for Z77 alone when you consider some boards are native, some use a PLX 8747 chip, others use two PLX 8747 chips, and about half of the Z77 motherboards on the market enable four PCIe 2.0 lanes from the chipset for CrossFireX use (at high latency).
We have tried to be fair and take motherboards that may have a small premium but are equipped to deal with the job. As a result, some motherboards may also use MultiCore Turbo, which as we have detailed in the past, gives the top turbo speed of the CPU regardless of the loading.
As a result of this lane allocation business, each value in our review will be attributed to a CPU, whether it uses MCT, and a lane allocation. This would mean something such as i7-3770K+ (3 - x16/x8/x8) would represent an i7-3770K with MCT in a PCIe 3.0 tri-GPU configuration. More on this below.
The ASUS Maximus V Formula has a three way lane allocation of x8/x4/x4 for Ivy Bridge, x8/x8 for Sandy Bridge, and enables MCT.
The Gigabyte Z77X-UP7 has a four way lane allocation of x16/x16, x16/x8/x8 and x8/x8/x8/x8, all via a PLX 8747 chip. It also has a single x16 that bypasses the PLX chip and is thus native, and all configurations enable MCT.
The Gigabyte G1.Sniper M3 is a little different, offering x16, x8/x8, or if you accidentally put the cards in the wrong slots, x16 + x4 from the chipset. This additional configuration is seen on a number of cheaper Z77 ATX motherboards, as well as a few mATX models. The G1.Sniper M3 also implements MCT as standard.
The ASRock X79 Professional is a PCIe 2.0 enabled board offering x16/x16, x16/x16/x8 and x16/x8/x8/x8.
The ASUS Rampage IV Extreme is a PCIe 3.0 enabled board offering the same PCIe layout as the ASRock, except it enables MCT by default.
For Westmere Xeons: The EVGA SR-2
Due to the timing of the first roundup, I was able to use an EVGA SR-2 with a pair of Xeons on loan from Gigabyte for our server testing. The SR-2 forms the basis of our beast machine below, and uses two Westmere-EP Xeons to give PCIe 2.0 x16/x16/x16/x16 via NF200 chips.
For Core 2 Duo: The MSI i975X Platinum PowerUp and ASUS Commando (P965)
The MSI is the motherboard I used for our quick Core 2 Duo comparison pipeline post a few months ago – I still have it sitting on my desk, and it seemed apt to include it in this test. The MSI i975X Platinum PowerUp offers two PCIe 1.1 slots, capable of Crossfire up to x8/x8. I also rummaged through my pile of old motherboards and found the ASUS Commando with a CPU installed, and as it offered x16+x4, this was tested also.
Llano throws a little oddball into the mix, being a true quad core unlike Trinity. The A75-UD4H from Gigabyte was the first one to hand, and offers two PCIe slots at x8/x8. Like the Core 2 Duo setup, we are not SLI enabled.
After finding an A8-3850 CPU as another comparison point for the A6-3650, I pulled out the A75 Extreme6, which offers three-way CFX as x8/x8 + x4 from the chipset as well as the configurations offered by the A75-UD4H.
For Trinity: The Gigabyte F2A85X-UP4
Technically A85X motherboards for Trinity support up to x8/x8 in Crossfire, but the F2A85X-UP4, like other high end A85X motherboards, implements four lanes from the chipset for 3-way AMD linking. Our initial showing on three-way via that chipset linking was not that great, and this review will help quantify this.
For AM3: The ASUS Crosshair V Formula
As the 990FX covers a lot of processor families, the safest place to sit would be on one of the top motherboards available. Technically the Formula-Z is newer and supports Vishera more easily, but we have not had the Formula-Z in to test, and the basic Formula was still able to run an FX-8350 as long as we kept the VRMs cool as a cucumber. The CVF offers up to three-way CFX and SLI testing (x16/x8/x8).
Our good friends at G.Skill are putting their best foot forward in supplying us with high end kits to test. The issue with the memory is more dependent on what the motherboard will support – in order to keep testing consistent, no overclocks were performed. This meant that boards and BIOSes limited to a certain DRAM multiplier were set at the maximum multiplier possible. In order to keep things fairer overall, the modules were adjusted for tighter timings. All of this is noted in our final setup lists.
Our main memory testing kit is our trusty G.Skill 4x4GB DDR3-2400 RipjawsX kit which has been part of our motherboard testing for over twelve months. For times when we had two systems being tested side by side, a G.Skill 4x4GB DDR3-2400 Trident X kit was also used.
For The Beast, which is one of the systems that has the issue with higher memory dividers, we pulled in a pair of tri-channel kits from X58 testing. These are high-end kits as well, currently discontinued as they tended to stop working with too much voltage. We have sets of 3x2GB OCZ Blade DDR3-2133 8-9-8 and 3x1GB Dominator GT DDR3-2000 7-8-7 for this purpose, which we ran at 1333 6-7-6 due to motherboard limitations at stock settings.
To end, our Core 2 Duo CPUs clearly get their own DDR2 memory for completeness. This is a 2x2GB kit of OCZ DDR2-1033 5-6-6.
To start, we want to thank the many manufacturers who have donated kit for our test beds in order to make this review, along with many others, possible.
Thank you to OCZ for providing us with 1250W Gold Power Supplies.
Thank you to G.Skill for providing us with the memory kits.
Thank you to ASUS for providing us with the AMD GPUs and some IO Testing kit.
Thank you to ECS for providing us with the NVIDIA GPUs.
Thank you to Corsair for providing us with the Corsair H80i CLC.
Thank you to Rosewill for providing us with the 500W Platinum Power Supply for mITX testing, a BlackHawk Ultra, and 1600W Hercules PSU for extreme dual CPU + quad GPU testing, and RK-9100 keyboards.
Thank you to Gigabyte for providing us with the X5690 CPUs.
Also many thanks go to the manufacturers who over the years have provided review samples which contribute to this review.
In order to keep the testing fair, we set strict rules in place for each of these setups. For every new chipset, the SSD was formatted and a fresh installation of the OS was applied. The chipset drivers for the motherboard were installed, along with NVIDIA drivers then AMD drivers. The games were preinstalled on a second partition, but relinked to ensure they worked properly. The games were then tested as follows:
Metro 2033: Benchmark Mode, two runs of four scenes at 1440p, max settings. First run of four is discarded, average of second run is taken (minus outliers).
DiRT 3: Benchmark Mode, four runs of the first scene with 8 cars at 1440p, max settings. Average is taken.
Civilization V: One five minute run of the benchmark mode accessible at the command line, at 1440p and max settings. Results produced are total frames in sets of 60 seconds, average taken.
Sleeping Dogs: Using the Adrenaline benchmark software, four scenes at 1440p in Ultra settings. Average is taken.
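The reduction from raw runs to a reported number can be sketched as follows for the Metro 2033 and Civilization V procedures. The figures are invented placeholders, the function names are hypothetical, and the manual outlier removal mentioned above is a judgment call that is deliberately not automated here.

```python
from statistics import mean


def metro_result(batches):
    """Discard the first batch of scenes, then average the remaining runs (FPS)."""
    kept = [fps for batch in batches[1:] for fps in batch]
    return mean(kept)


def civ_result(frames_per_minute):
    """Convert total frames per 60-second set into FPS, then average."""
    return mean(frames / 60 for frames in frames_per_minute)


# Placeholder data only: two Metro batches of four scenes, five Civ V minutes.
metro_batches = [
    [31.0, 30.8, 30.9, 31.1],  # first batch, discarded (may contain the 'easy' run)
    [30.0, 30.2, 29.8, 30.0],  # second batch, averaged
]
civ_frames = [3600, 3540, 3660, 3600, 3600]

print(f"Metro 2033:     {metro_result(metro_batches):.1f} FPS")
print(f"Civilization V: {civ_result(civ_frames):.1f} FPS")
```

DiRT 3 and Sleeping Dogs follow the simpler pattern of averaging four runs directly.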
If the platform was being reused for the next CPU (e.g. Crosshair V Formula, moving from FX-8150 to FX-8350), no reinstall was performed. If the platform was changed for the next test, a full reinstall and setup took place.
How to Read This Review
Due to the large number of different variables in our review, it is hard to accurately label each data point with all the information about that setup. It also stands to reason that just putting the CPU model is also a bad idea when the same CPU could be in two different motherboards with different GPU lane allocations. There is also the memory aspect to consider, as well as if a motherboard uses MCT at stock. Here is a set of labels correlating to configurations you will see in this review:
CPU[+][(CP)] (PCIe version – lane allocation to GPUs [PLX])
First is the name of the CPU, then an optional + identifier for MCT enabled motherboards. (CP) indicates we are dealing with a Bulldozer derived CPU and using the Core Parking updates. Inside the parentheses is the PCIe version of the lanes we are dealing with, along with the lane allocation to each GPU. The final flag is if a PLX chip is involved in lane allocation.
Thus, for example:
A10-5800K (2 – x16/x16): A10-5800K with two GPUs in PCIe 2.0 mode
A10-5800K (CP) (2 – x16/x16): A10-5800K using Core Parking updates with two GPUs in PCIe 2.0 mode
FX-8350 (2 – x16/x16/x8): FX-8350 with three GPUs in PCIe 2.0 mode
i7-3770K (3/2 – x8/x8 + x4): i7-3770K powering three GPUs in PCIe 3.0 but the third GPU is using the PCIe 2.0 x4 from the chipset
i7-3770K+ (3 – x16): i7-3770K (with MCT) powering one GPU in PCIe 3.0 mode
i7-3770K+ (3 – x8/x8/x8/x8 PLX): i7-3770K (with MCT) powering four GPUs in PCIe 3.0 via a PLX chip
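For readers who want to process the chart labels programmatically, the format described above can be picked apart with a regular expression. The sketch below is one possible reading of the label scheme; the `parse_label` function and the field names in its result are hypothetical, and the GPU count is inferred simply by counting `xN` entries in the lane allocation.

```python
import re

# CPU[+][(CP)] (PCIe version – lane allocation [PLX])
LABEL_RE = re.compile(
    r"^(?P<cpu>.+?)(?P<mct>\+)?\s*(?P<cp>\(CP\))?\s*"
    r"\((?P<pcie>[\d/]+)\s*[–-]\s*(?P<lanes>.+?)(?P<plx>\s+PLX)?\)$"
)

def parse_label(label):
    """Split a chart label into its parts (field names are illustrative)."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised label: {label!r}")
    lanes = match.group("lanes")
    return {
        "cpu": match.group("cpu").strip(),
        "mct": match.group("mct") is not None,          # '+' suffix
        "core_parking": match.group("cp") is not None,  # '(CP)' flag
        "pcie": match.group("pcie"),                    # e.g. '3' or '3/2'
        "lanes": lanes,                                 # e.g. 'x8/x8 + x4'
        "gpus": len(re.findall(r"x\d+", lanes)),        # one xN per GPU
        "plx": match.group("plx") is not None,
    }
```

For example, `parse_label("i7-3770K+ (3 – x8/x8/x8/x8 PLX)")` reports an MCT-enabled i7-3770K with four GPUs behind a PLX chip.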
Common Configuration Points
All the system setups below have the following consistent configuration points:
- A fresh install of Windows 7 Ultimate 64-bit
- Either an Intel Stock CPU Cooler, a Corsair H80i CLC or Thermalright TRUE Copper
- OCZ 1250W Gold ZX Series PSUs (Rosewill 1600W Hercules for The Beast)
- Up to 4x ASUS AMD HD 7970 GPUs, using Catalyst 13.1
- Up to 2x ECS NVIDIA GTX 580 GPUs, using GeForce WHQL 310.90
- SSD Boot Drives, either OCZ Vertex 3 128GB or Kingston HyperX 120GB
- LG GH22NS50 Optical Drives
- Open Test Beds, either a DimasTech V2.5 EasyHard or a CoolerMaster Test Lab
A6-3650 + Gigabyte A75-UD4H + 16GB DDR3-1866 8-10-10
A8-3850 + ASRock A75 Extreme6 + 16GB DDR3-1866 8-10-10
A8-5600K + Gigabyte F2A85-UP4 + 16GB DDR3-2133 9-10-10
A10-5800K + Gigabyte F2A85-UP4 + 16GB DDR3-2133 9-10-10
X2-555 BE + ASUS Crosshair V Formula + 16GB DDR3-1600 8-8-8
X4-960T + ASUS Crosshair V Formula + 16GB DDR3-1600 8-8-8
X6-1100T + ASUS Crosshair V Formula + 16GB DDR3-1600 8-8-8
FX-8150 + ASUS Crosshair V Formula + 16GB DDR3-2133 10-12-11
FX-8350 + ASUS Crosshair V Formula + 16GB DDR3-2133 9-11-10
FX-8150 + ASUS Crosshair V Formula + 16GB DDR3-2133 10-12-11 + CP
FX-8350 + ASUS Crosshair V Formula + 16GB DDR3-2133 9-11-10 + CP
E6400 + MSI i975X Platinum + 4GB DDR2-666 5-6-6
E6700 + ASUS P965 Commando + 4GB DDR2-666 4-5-5
Celeron G465 + ASUS Maximus V Formula + 16GB DDR3-2133 9-11-11
i5-2500K + ASUS Maximus V Formula + 16GB DDR3-2133 9-11-11
i7-2600K + ASUS Maximus V Formula + 16GB DDR3-2133 9-11-11
i3-3225 + ASUS Maximus V Formula + 16GB DDR3-2400 10-12-12
i7-3770K + Gigabyte Z77X-UP7 + 16GB DDR3-2133 9-11-11
i7-3770K + ASUS Maximus V Formula + 16GB DDR3-2400 9-11-11
i7-3770K + Gigabyte G1.Sniper M3 + 16GB DDR3-2400 9-11-11
i7-3930K + ASUS Rampage IV Extreme + 16GB DDR3-2133 10-12-12
i7-3960X + ASRock X79 Professional + 16GB DDR3-2133 10-12-12
Xeon X5690 + EVGA SR-2 + 6GB DDR3-1333 6-7-7
2x Xeon X5690 + EVGA SR-2 + 9GB DDR3-1333 6-7-7
The Beast is a special machine put together to help with the review as a result of various hardware coming into my possession all at the same time. The core of the system is an EVGA SR-2 motherboard, the best and last dual processor motherboard to deal with overclockable Xeon processors. This is paired with a couple of X5690 Xeon processors, the highest clocked Westmere Xeon that Intel offers, and many thanks to Gigabyte for loaning these to us for a pair of reviews. I went and purchased a pair of Intel Xeon socket 1567 coolers for the system, which have a 2U z-height restriction but are copper piped and cooled by powerful (and loud) delta fans. These provided enough cooling power to push the Xeons from 3.43GHz to 4.6GHz during some overclocking attempts, so are more than adequate for the job at hand (if you can put up with the noise).
Our system is paired with some high quality DDR3 Hyper memory, once famed for its overclocking prowess but due to frequent deaths from high voltage, is now relegated as a memory for overclockers. However at stock this memory performs great, often in the region of DDR3-2000 C7, so our memory kits are well primed for this setup.
Of course a full system is nothing without a case and power supply to help justify a build. With the motherboard being absolutely huge, no standard case would take it – only large cases designed for desktop-based server 2P motherboards are adequate. Luckily there is one case which is selling well, and Dustin reviewed recently – the Rosewill Blackhawk Ultra. Aside from the weight, this case had no issues with installing the motherboard; it could easily fit in another 10 HDDs, four optical bays, and any major GPU setup you could possibly think of – with plenty of fans just for good measure. Read Dustin’s review for a more thorough analysis, but I have some good shots of the system and motherboard installed for you:
Rosewill also has the perfect power supply for dealing with a dual processor, quad CrossFireX setup. First, consider how many connections this 2P setup needs – we have a normal 24-pin ATX connector for the motherboard, one 8-pin CPU power connector for each CPU, an additional 6-pin PCIe power connector for each CPU to provide extra power, another 6-pin PCIe power connector to provide power to the PCIe slots, and then two 6+2 PCIe power connectors for each of the four GPUs. That makes 11 PCIe connectors needed in total, and this is alongside all the fans in the case and whatever SSD/ODD setup a user wants. The power supply used for this monster is the 1600W Hercules, rated 80PLUS Silver. With access to 16 PCIe connectors, the only way you might need any more is with a compute rig having seven single slot cards each needing two connectors. With the CPUs and GPUs both overclocked, our system was drawing almost 1500W at the wall (at a 240V source) under a high CPU+GPU load.
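The connector count is easy to check. The small helper below is a hypothetical tally, assuming one extra PCIe connector per CPU, one for the PCIe slots, and two per GPU across four GPUs, which is the reading that reproduces the total of 11 given above.

```python
def pcie_connectors(cpus, gpus, per_cpu=1, per_gpu=2, slot_power=1):
    """Count PCIe power connectors for a dual-socket board like the SR-2.

    The defaults are assumptions drawn from the description: one extra
    6-pin per CPU, one 6-pin for the slots, two 6+2 connectors per GPU.
    """
    return cpus * per_cpu + slot_power + gpus * per_gpu

beast = pcie_connectors(cpus=2, gpus=4)        # 2 + 1 + 8
compute_rig = pcie_connectors(cpus=2, gpus=7)  # 2 + 1 + 14
print(beast, compute_rig)
```

Under the same assumptions, the seven-card compute rig mentioned above would need 17 connectors, one more than the 16 the power supply provides.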
Using a 2P system as a desktop comes with its own set of issues, namely some CPU benchmarks not optimized for 2P, or in this case, some trouble getting some games to even work. It seems that the more money you can throw at a gaming system the more problems start to arise, but The Beast provides a nice comparison point when we look at high-end Ivy Bridge, Sandy Bridge-E and Piledriver processors in multiple-GPU setups.
Our first port of call with all our testing is CPU throughput analysis, using our regular motherboard review benchmarks.
Point Calculations - 3D Movement Algorithm Test
The algorithms in 3DPM employ either uniform random number generation or normal distribution random number generation, and vary in amounts of trigonometric operations, conditional statements, generation and rejection, fused operations, etc. The benchmark runs through six algorithms for a specified number of particles and steps, and calculates the speed of each algorithm, then sums them all for a final score. This is an example of a real world situation that a computational scientist may find themselves in, rather than a pure synthetic benchmark. The benchmark is also parallel between particles simulated, and we test the single threaded performance as well as the multi-threaded performance.
As mentioned in previous reviews, this benchmark is written how most people would tackle the situation – using floating point numbers. This is also where Intel excels, compared to AMD’s decision to move more towards INT ops (such as hashing), which is typically linked to optimized code or normal OS behavior.
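To give a flavour of this kind of workload, here is a minimal sketch of one particle taking unit-length steps in uniformly random 3D directions, using uniform random numbers and trigonometric operations in floating point. It is not one of the six 3DPM algorithms, which are not described here; the function names and the choice of sampling method are assumptions made purely for illustration.

```python
import math
import random

def random_unit_vector(rng):
    """Pick a uniformly distributed direction on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)              # uniform height on the sphere
    phi = rng.uniform(0.0, 2.0 * math.pi)   # uniform angle around the axis
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)

def walk(steps, seed=0):
    """Move one particle `steps` unit steps and return its final position."""
    rng = random.Random(seed)
    x = y = z = 0.0
    for _ in range(steps):
        dx, dy, dz = random_unit_vector(rng)
        x, y, z = x + dx, y + dy, z + dz
    return (x, y, z)
```

Because each particle is independent of the others, a loop over particles calling `walk` can be split across threads without any communication, which is the property that makes this type of benchmark parallel between particles.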
Compression - WinRAR x64 3.93 + WinRAR 4.2
With 64-bit WinRAR, we compress the set of files used in our motherboard USB speed tests. WinRAR x64 3.93 attempts to use multithreading when possible and provides a good test for when a system has variable threaded load. WinRAR 4.2 does this a lot better! If a system has multiple speeds to invoke at different loading, the switching between those speeds will determine how well the system will do.
Due to the late inclusion of 4.2, our results list for it is a little smaller than I would have hoped. But it is interesting to note that with the Core Parking updates, an FX-8350 overtakes an i5-2500K with MCT.
Image Manipulation - FastStone Image Viewer 4.2
FastStone Image Viewer is a free piece of software I have been using for quite a few years now. It allows quick viewing of flat images, as well as resizing, changing color depth, adding simple text or simple filters. It also has a bulk image conversion tool, which we use here. The software currently operates only in single-thread mode, which should change in later versions of the software. For this test, we convert a series of 170 files, of various resolutions, dimensions and types (of a total size of 163MB), all to the .gif format of 640x480 dimensions.
In terms of pure single thread speed, it is worth noting the X6-1100T is leading the AMD pack.
Video Conversion - Xilisoft Video Converter 7
With XVC, users can convert any type of normal video to any compatible format for smartphones, tablets and other devices. By default, it uses all available threads on the system, and in the presence of appropriate graphics cards, can utilize CUDA for NVIDIA GPUs as well as AMD WinAPP for AMD GPUs. For this test, we use a set of 33 HD videos, each lasting 30 seconds, and convert them from 1080p to an iPod H.264 video format using just the CPU. The time taken to convert these videos gives us our result.
XVC is a little odd in how it arranges its multicore processing. For our set of 33 videos, it will arrange them in batches of threads – so if we take the 8 thread FX-8350, it will arrange the videos into 4 batches of 8, and then a fifth batch of one. That final batch will only have one thread assigned to it (!), and will not get a full 8 threads worth of power. This is also why the 2x X5690 finishes in 6 seconds but the normal X5690 takes longer – you would expect a halving of time moving to two CPUs but XVC arranges the batches such that there is always one at the end that only gets a single thread.
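The batching behaviour described above can be modelled in a couple of lines. This is a simplified sketch based only on that description: it assumes every video is encoded by exactly one thread and that every batch takes about the same time, and `xvc_batches` is a hypothetical name rather than anything from XVC itself.

```python
def xvc_batches(videos, threads):
    """Split `videos` into batches of at most `threads`, one thread per video."""
    full, rest = divmod(videos, threads)
    return [threads] * full + ([rest] if rest else [])

for threads in (8, 12, 24):
    batches = xvc_batches(33, threads)
    print(threads, batches, len(batches))
```

Under these assumptions, 33 videos need five batches on 8 threads, three batches on 12 threads and two batches on 24 threads, so doubling from one X5690 to two would cut the batch count by a third rather than by half.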
Rendering – PovRay 3.7
The Persistence of Vision RayTracer, or PovRay, is a freeware package for, as the name suggests, ray tracing. It is a pure renderer, rather than modeling software, but the latest beta version contains a handy benchmark for stressing all processing threads on a platform. We have been using this test in motherboard reviews to test memory stability at various CPU speeds to good effect – if it passes the test, the IMC in the CPU is stable for a given CPU speed. As a CPU test, it runs for approximately 2-3 minutes on high end platforms.
The SMP engine in PovRay is not perfect, though scaling up in CPUs gives almost a 2x effect. The results from this test are great – here we see an FX-8350 CPU below an i7-3770K (with MCT) until the Core Parking updates are applied, at which point the FX-8350 performs better!
Video Conversion - x264 HD Benchmark
The x264 HD Benchmark uses a common HD encoding tool to process an HD MPEG2 source at 1280x720 at 3963 Kbps. This test represents a standardized result which can be compared across other reviews, and is dependent on both CPU power and memory speed. The benchmark performs a 2-pass encode, and the results shown are the average of each pass performed four times.
Grid Solvers - Explicit Finite Difference
For any grid of regular nodes, the simplest way to calculate the next time step is to use the values of those around it. This makes for easy mathematics and parallel simulation, as each node calculated is only dependent on the previous time step, not the nodes around it on the current calculated time step. By choosing a regular grid, we reduce the levels of memory access required for irregular grids. We test both 2D and 3D explicit finite difference simulations with 2^n nodes in each dimension, using OpenMP as the threading operator in single precision. The grid is isotropic and the boundary conditions are sinks. Values are floating point, with memory cache sizes and speeds playing a part in the overall score.
Grid solvers do love a fast processor and plenty of cache in order to store data. When moving up to 3D, it is harder to keep that data within the CPU and spending extra time coding in batches can help throughput. Our simulation takes a very naïve approach in code, using simple operations.
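A naïve explicit update of the kind described above might look like the following. This is an illustrative Python sketch rather than the benchmark's own code (which uses OpenMP): it assumes a simple 2D diffusion problem, and `alpha` (the diffusion coefficient times the time step over the grid spacing squared) is a name introduced here, which must stay at or below 0.25 for the 2D explicit scheme to remain stable.

```python
def explicit_step(grid, alpha=0.2):
    """One explicit time step of 2D diffusion on a square, isotropic grid.

    Every new value depends only on the previous time step, so all nodes
    could be updated in parallel. Boundary nodes are held at zero (sinks).
    """
    n = len(grid)
    new = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            neighbours = (grid[i - 1][j] + grid[i + 1][j]
                          + grid[i][j - 1] + grid[i][j + 1])
            new[i][j] = grid[i][j] + alpha * (neighbours - 4.0 * grid[i][j])
    return new

# 2^3 = 8 nodes per dimension with a single hot node near the centre.
n = 2 ** 3
grid = [[0.0] * n for _ in range(n)]
grid[4][4] = 1.0
grid = explicit_step(grid)
```

Writing into a separate `new` grid is what keeps the update independent of the order in which nodes are visited.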
Grid Solvers - Implicit Finite Difference + Alternating Direction Implicit Method
The implicit method takes a different approach to the explicit method – instead of considering one unknown in the new time step to be calculated from known elements in the previous time step, we consider that an old point can influence several new points by way of simultaneous equations. This adds to the complexity of the simulation – the grid of nodes is solved as a series of rows and columns rather than points, reducing the parallel nature of the simulation by a dimension and drastically increasing the memory requirements of each thread. The upside, as noted above, is the less stringent stability rules related to time steps and grid spacing. For this we simulate a 2D grid of 2^n nodes in each dimension, using OpenMP in single precision. Again our grid is isotropic with the boundaries acting as sinks. Values are floating point, with memory cache sizes and speeds playing a part in the overall score.
2D Implicit is harsher than an Explicit calculation – each thread needs a lot more memory, which only ever grows as the size of the simulation increases.
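To show why each row (or column) has to be solved as a whole, here is a sketch of a single implicit update along one line of nodes. Each line gives a tridiagonal system of simultaneous equations, solved here with the standard Thomas algorithm. This is an assumption about the general shape of such a solver, not the benchmark's code, and the function names and `alpha` parameter are introduced for illustration.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: forward elimination, then back substitution."""
    n = len(rhs)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def implicit_row_step(row, alpha=1.0):
    """Solve (1 + 2a) u[i] - a u[i-1] - a u[i+1] = old[i] along one row.

    Nodes beyond either end of the row are treated as sinks (zero).
    """
    n = len(row)
    lower = [-alpha] * n   # lower[0] is never used
    upper = [-alpha] * n   # upper[-1] does not affect the result
    diag = [1.0 + 2.0 * alpha] * n
    return solve_tridiagonal(lower, diag, upper, row)
```

The working arrays `c`, `d` and `x` are per-row scratch space, which hints at why each thread needs more memory here than in the explicit case; different rows can still be solved in parallel.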
Point Calculations - n-Body Simulation
When a series of heavy mass elements are in space, they interact with each other through the force of gravity. Thus when a star cluster forms, the interaction of every large mass with every other large mass defines the speed at which these elements approach each other. When dealing with millions and billions of stars on such a large scale, the movement of each of these stars can be simulated through the physical theorems that describe the interactions. The benchmark detects whether the processor is SSE2 or SSE4 capable, and implements the relevant code. We run a simulation of 10240 particles of equal mass - the output for this code is in terms of GFLOPs, and the result recorded was the peak GFLOPs value.
As we only look at base/SSE2/SSE4 depending on the processor (auto-detection), we don’t see full AVX numbers in terms of FLOPs.
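The core of an n-body simulation is the all-pairs gravity calculation, which can be sketched as below. This is a plain Python illustration with no SSE code paths; the function name, the unit gravitational constant and the `softening` term (a small value that avoids division by zero for very close particles) are assumptions for the example, not details of the benchmark.

```python
import math

def accelerations(positions, mass=1.0, g=1.0, softening=1e-3):
    """Direct-sum gravitational acceleration on each equal-mass particle."""
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(n):
            if i == j:
                continue
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + softening * softening
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            # Pull particle i towards particle j with strength g * m / r^2.
            acc[i][0] += g * mass * dx * inv_r3
            acc[i][1] += g * mass * dy * inv_r3
            acc[i][2] += g * mass * dz * inv_r3
    return acc
```

The double loop makes the cost grow with the square of the particle count, so 10240 particles means roughly 105 million pair interactions per step, almost all of it floating point work.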
Our first analysis is with the perennial reviewers’ favorite, Metro 2033. It occurs in a lot of reviews for a couple of reasons – it has a very easy to use benchmark GUI that anyone can use, and it is often very GPU limited, at least in single GPU mode. Metro 2033 is a strenuous DX11 benchmark that can challenge most systems that try to run it at any high-end settings. Developed by 4A Games and released in March 2010, we use the inbuilt DirectX 11 Frontline benchmark to test the hardware at 1440p with full graphical settings. Results are given as the average frame rate from a second batch of 4 runs, as Metro has a tendency to inflate the scores for the first batch by up to 5%.
With one 7970 at 1440p, every processor is in full x16 allocation and there seems to be no split between any processor with 4 threads or above. Processors with two threads fall behind, but not by much as the X2-555 BE still gets 30 FPS. There seems to be no split between PCIe 3.0 or PCIe 2.0, or with respect to memory.
When we start using two GPUs in the setup, the Intel processors have an advantage, with those running PCIe 2.0 a few FPS ahead of the FX-8350. Both cores and single thread speed seem to have some effect (i3-3225 is quite low, FX-8350 > X6-1100T).
More results in favour of Intel processors and PCIe 3.0, the i7-3770K in an x8/x4/x4 surpassing the FX-8350 in an x16/x16/x8 by almost 10 frames per second. There seems to be no advantage to having a Sandy Bridge-E setup over an Ivy Bridge one so far.
While we have limited results, PCIe 3.0 wins against PCIe 2.0 by 5%.
From dual core AMD all the way up to the latest Ivy Bridge, results for a single GTX 580 are all roughly the same, indicating a GPU throughput limited scenario.
Similar to one GTX580, we are still GPU limited here.
Metro 2033 conclusion
A few points are readily apparent from Metro 2033 tests – the more powerful the GPU, the more important the CPU choice is, and that CPU choice does not matter until you get to at least three 7970s. In that case, you want a PCIe 3.0 setup more than anything else.
DiRT 3 is a rallying video game and the third in the DiRT series, itself part of the Colin McRae Rally series, developed and published by Codemasters. DiRT 3 also falls under the list of ‘games with a handy benchmark mode’. In previous testing, DiRT 3 has always seemed to love cores, memory, GPUs, PCIe lane bandwidth, everything. The small issue with DiRT 3 is that depending on the benchmark mode tested, the benchmark launcher is not indicative of game play per se, citing numbers higher than actually observed. Despite this, the benchmark mode also includes an element of uncertainty, by actually driving a race, rather than a predetermined sequence of events such as Metro 2033. This in essence should make the benchmark more variable, but we take repeated runs in order to smooth this out. Using the benchmark mode, DiRT 3 is run at 1440p with Ultra graphical settings. Results are reported as the average frame rate across four runs.
While the testing shows a pretty dynamic split between Intel and AMD at around the 82 FPS mark, all processors are within roughly 1 or 2 FPS of this mark, meaning that even an A8-5600K will feel like the i7-3770K.
When reaching two GPUs, the Intel/AMD split is getting larger. The FX-8350 puts up a good fight against the i5-2500K and i7-2600K, but the top i7-3770K offers almost 20 FPS more and 40 more than either the X6-1100T or FX-8150.
Moving up to three GPUs and DiRT 3 is jumping on the PCIe bandwagon, enjoying bandwidth and cores as much as possible. Despite this, the gap to the best AMD processor is growing – almost 70 FPS between the FX-8350 and the i7-3770K.
At four GPUs, bandwidth wins out, and the PLX effect on the UP7 seems to cause a small dip compared to the native lane allocation on the RIVE (there could also be some influence due to 6 cores over 4).
Similar to the one 7970 setup, using one GTX 580 has a split between AMD and Intel that is quite noticeable. Despite the split, all the CPUs perform within 1.3 FPS, meaning no big difference.
Moving to dual GTX 580s, and while the split gets bigger, processors like the i3-3225 are starting to lag behind. The difference between the best AMD and best Intel processor is only 2 FPS though, nothing to write home about.
DiRT 3 conclusion
Much like Metro 2033, DiRT 3 has a GPU barrier and until you hit that mark, the choice of CPU makes no real difference at all. In this case, at two-way 7970s, choosing a quad core Intel processor does the business over the FX-8350 by a noticeable gap that continues to grow as more GPUs are added (assuming you want more than 120 FPS).
A game that has plagued my testing over the past twelve months is Civilization V. Being on the older 12.3 Catalyst drivers was somewhat of a nightmare, giving no scaling, and as a result I dropped it from my test suite after only a couple of reviews. With the later drivers used for this review, the situation has improved but only slightly, as you will see below. Civilization V seems to run into a scaling bottleneck very early on, and any additional GPU allocation only causes worse performance.
Our Civilization V testing uses Ryan’s GPU benchmark test all wrapped up in a neat batch file. We test at 1440p, and report the average frame rate of a 5 minute test.
Civilization V is the first game where we see a gap when comparing processor families. A big part of what makes Civ5 perform at the best rates seems to be PCIe 3.0, followed by CPU performance – our PCIe 2.0 Intel processors are a little behind the PCIe 3.0 models. By virtue of not having a PCIe 3.0 AMD motherboard in for testing, the bad rap falls on AMD until PCIe 3.0 becomes part of their main game.
The power of PCIe 3.0 is more apparent with two 7970 GPUs, however it is worth noting that only processors such as the i5-2500K and above have actually improved their performance with the second GPU. Everything else stays relatively similar.
More cores and PCIe 3.0 are winners here, but no GPU configuration has scaled above two GPUs.
Again, no scaling.
While the top end Intel processors again take the lead, an interesting point is that now we have all PCIe 2.0 values for comparison, the non-hyper threaded 2500K takes the top spot, 10% higher than the FX-8350.
We have another Intel/AMD split, by virtue of the fact that none of the AMD processors scaled above the first GPU. On the Intel side, you need at least an i5-2500K to see scaling, similar to what we saw with the 7970s.
Civilization V conclusion
Intel processors are the clear winner here, though not one stands out over the other. Having PCIe 3.0 seems to be the positive point for Civilization V, but in most cases scaling is still out of the window unless you have a monster machine under your belt.
While not necessarily a game on everybody’s lips, Sleeping Dogs is a strenuous game with a pretty hardcore benchmark that scales well with additional GPU power. The team over at Adrenaline.com.br are supreme for making an easy to use benchmark GUI, allowing a numpty like me to charge ahead with a set of four 1440p runs with maximum graphical settings.
Sleeping Dogs seems to tax the CPU so little that the only CPU that falls behind by the smallest of margins is an E6400 (and the G465 which would not run the benchmark). Intel visually takes all the top spots, but AMD is all in the mix with less than 0.5 FPS splitting an X2-555 BE and an i7-3770K.
A split starts to develop between Intel and AMD again, although you would be hard pressed to choose between the CPUs as everything above an i3-3225 scores 50-56 FPS. The X2-555 BE unfortunately drops off, suggesting that Sleeping Dogs is a fan of the cores and this little CPU is lacking.
At three GPUs the gap is there, with the best Intel processors over 10% ahead of the best AMD. Neither PCIe lane allocation nor memory seems to be playing a part, just a case of threads then single thread performance.
Despite our Beast machine having double the threads, an i7-3960X in PCIe 3.0 mode takes top spot.
It is worth noting the scaling in Sleeping Dogs. The i7-3960X moved from 28.2 -> 56.23 -> 80.85 -> 101.15 FPS, with the fourth card adding +71% of a single card's performance when moving from three to four GPUs. This speaks of a well written game more than anything.
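The scaling figure can be reproduced from the four frame rates quoted above by expressing each additional card's gain as a fraction of the single-card result. The helper name below is hypothetical; the numbers are the i7-3960X results from the text.

```python
def scaling_steps(fps_by_gpu_count):
    """Gain from each added GPU, as a fraction of the single-GPU frame rate."""
    single = fps_by_gpu_count[0]
    return [(after - before) / single
            for before, after in zip(fps_by_gpu_count, fps_by_gpu_count[1:])]

steps = scaling_steps([28.2, 56.23, 80.85, 101.15])
for gpus, gain in enumerate(steps, start=2):
    print(f"GPU {gpus}: +{gain:.1%} of a single card")
```

The last step comes out just under 72% of a single card, which truncates to the 71% quoted above; the second and third cards add roughly 99% and 87% respectively.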
There is almost nothing to separate every CPU when using a single GTX 580.
Same thing with two GTX 580s – even an X2-555 BE is within 1 FPS (3%) of an i7-3960X.
Sleeping Dogs Conclusion
Due to the successful scaling and GPU limited nature of Sleeping Dogs, almost any CPU you throw at it will get the same result. When you move into three GPUs or more territory, it seems that having the single thread CPU speed of an Intel processor gets a few more FPS at the end of the day.
After testing for this review, one thing is clear in my mind – the performance of CPUs paired with a single GPU is hitting a limit. As games get more complex, those designing the graphics and physics engines know that shifting calculations onto the GPU gives a greater boost in performance. If an engine is written to take advantage of the GPU, then the CPU does not really matter for the most part. If you can transfer textures over to the GPU and keep them in memory, the work of the CPU is essentially done apart from light maintenance or interfacing with the network.
Perhaps a better test would have been with more mid-range GPUs, such as 660 Tis or 7790s; with limited memory on the GPU itself, having that faster CPU and faster DDR3 memory might make a big difference. However the ecosystem may be that a gamer can buy a good GPU and not have to worry that the CPU might be a bit underpowered. Unless you need the performance of a big CPU, the big GPU should be the main priority, as the CPU becomes less of a concern with more powerful GPUs and higher resolutions.
There is also scope for those using less powerful GPUs, such that the CPU could matter a lot more in this scenario. With limited memory, the CPU would have to organize more texture copies between the memory and the GPU, causing other aspects of the system to become the limiting factor. This is very important when interpreting our results. However, our results for our testing scenarios show several points worth noting.
Firstly, it is important to test accurately, fairly, and with good will. Choosing to perform a comparative test while misleading the audience by not understanding how it works underneath is a poor game to play. Leave the bias at home, let the results do the talking.
In three of our games, having a single GPU makes almost no difference to what CPU performs the best. Civilization V was the sole exception, which also has issues scaling when you add more GPUs if you do not have the most expensive CPUs on the market. For Civilization V, I would suggest having only a single GPU and trying to get the best out of it.
In DiRT 3, Sleeping Dogs and Metro 2033, almost every CPU performed the same in a single GPU setup. Moving up the GPUs and DiRT 3 leaned towards PCIe 3.0 above two GPUs, Metro 2033 started to lean towards Intel CPUs and Sleeping Dogs needed CPU power when scaling up.
Above three GPUs, the extra horsepower from the single thread performance of an Intel CPU starts to make sense, with as much as 70 FPS difference in DiRT 3. Sleeping Dogs also starts to become sensitive to CPU choice.
We Know What Is Missing
On my list of future updates to this article, we need an i5-3570K processor, as well as dual and tri-module Piledriver and an i7-920 for a roundup. I will have a short window soon to rummage in a large storeroom of processors, which will be a prime opportunity for some of the harder to acquire CPUs. Haswell is just around the corner and should provide an interesting update to data points across the spectrum, in most of its desktop forms. From now on I will aim to cover all the different PCIe lane allocations in a chipset, as well as some of those odd ones caused by PLX chips.
If you have a specific processor you would like me to test for a future article, please leave a note below in the comments, and we will try to cover it. :) Top of that list is an i5-3570K, followed by Haswell, then some more AMD cores. I have 29 more processors on my 'ideal' list (if I can get them), but if anyone has any suggestions that I may not have thought of, please let me know. If I am able to get a hold of Titans, I may be in a position to retest across the board for NVIDIA results, meaning another benchmark or two as well (Bioshock Infinite perhaps).
Recommendations for the Games Tested at 1440p/Max Settings
A CPU for Single GPU Gaming: A8-5600K + Core Parking updates
If I were gaming today on a single GPU, the A8-5600K (or non-K equivalent) would strike me as a price competitive choice for frame rates, as long as you are not a big Civilization V player and don’t mind the single threaded performance. The A8-5600K scores within a percentage point or two across the board in single GPU frame rates with both a HD7970 and a GTX580, and feels the same in the OS as an equivalent Intel CPU. The A8-5600K will also overclock a little, giving a boost, and comes in at a stout $110, meaning that some of those $$$ can go towards a beefier GPU or an SSD. The only downside is if you are planning some heavy OS work – if the software is Piledriver-aware all might be well, although most processing is not, and perhaps an i3-3225 or FX-8350 might be worth a look.
A CPU for Dual GPU Gaming: i5-2500K or FX-8350
Looking back through the results, moving to a dual GPU setup obviously has some issues. Various AMD platforms are not certified for dual NVIDIA cards for example, meaning while they may excel for AMD, you cannot recommend them for Team Green. There is also the dilemma that while in certain games you can be fairly GPU limited (Metro 2033, Sleeping Dogs), there are others where having the CPU horsepower can double the frame rate (Civilization V).
After the overview, my recommendation for dual GPU gaming comes in at the feet of the i5-2500K. This recommendation may seem odd – these chips are not the latest from Intel, but chances are that pre-owned they will be hitting a nice price point, especially if/when people move over to Haswell. If you were buying new, the obvious answer would be looking at an i5-3570K on Ivy Bridge rather than the 2500K, so consider this suggestion a minimum CPU recommendation.
On the AMD side, the FX-8350 puts up a good show across most of the benchmarks, but falls spectacularly in Civilization V. If this is not the game you are aiming for and you want to invest in AMD, then the FX-8350 is a good choice for dual GPU gaming.
A CPU for Tri-GPU Gaming: i7-3770K with an x8/x4/x4 (AMD) or PLX (NVIDIA) motherboard
By moving up in GPU power we also have to boost the CPU power in order to see the best scaling at 1440p. It might be a sad thing to hear but the only CPU in our testing that provides the top frame rates at this level is the top line Ivy Bridge model. For a comparison point, the Sandy Bridge-E 6-core results were often very similar, but the price jump to such a setup is prohibitive to all but the most sturdy of wallets.
As noted in the introduction, using 3-way on NVIDIA with Ivy Bridge will require a PLX motherboard in order to get enough lanes to satisfy the SLI requirement of x8 minimum per GPU. This also raises the bar in terms of price, as PLX motherboards start around the $280 mark. For a 3-way AMD setup, an x8/x4/x4 enabled motherboard performs similarly to a PLX enabled one, and ahead of the slightly crippled x8/x8 + x4 variations. However investing in a PLX board would help moving to a 4-way setup should that be your intended goal. In either scenario, at stock clocks, the i7-3770K is the processor of choice from our testing suite.
A CPU for Quad-GPU Gaming: i7-3770K with a PLX motherboard
A four-way GPU configuration is for those insane few users that have both the money and the physical requirement for pixel power. We are all aware of the law of diminishing returns, and more often than not adding that fourth GPU is taking the biscuit for most resolutions. Despite this, even at 1440p, we see awesome scaling in games like Sleeping Dogs (+73% of a single card moving from three to four cards) and more recently I have seen that four-way GTX680s help give BF3 in Ultra settings a healthy 35 FPS minimum on a 4K monitor. So while four-way setups are insane, there is clearly a usage scenario where it matters to have card number four.
Our testing was pretty clear as to what CPUs are needed at 1440p with fairly powerful GPUs. While the i7-2600K was nearly there in all our benchmarks, only two sets of CPUs made sure of the highest frame rates – the i7-3770K and any six-core Sandy Bridge-E. As mentioned in the three-way conclusion, the price barrier to SB-E is a big step for most users (even if they are splashing out $1500+ on four big cards), giving the nod to an Ivy Bridge configuration. Of course that i7-3770K CPU will have to be paired with a PLX enabled motherboard as well.
One could argue that with overclocking the i7-2600K could come into play, and I don’t doubt that is the case. People building three and four way GPU monsters are more than likely to run extra cooling and overclock. Unfortunately that adds plenty of variables and extra testing which will have to be made at a later date. For now our recommendation at stock, for 4-way at 1440p, is an i7-3770K CPU.
What to Take Away From Our Testing
Ultimately the spectrum for testing this sort of thing is huge - the minute you deal with multiple GPUs in a system, testing different GPUs, testing different resolutions, testing different quality settings, and then extrapolating those across the normal array of benchmarks we apply to a GPU test, we might as well spend a month just looking at a single CPU platform!
We know the testing done here today looks at a niche scenario - 1440p at Max Settings using very powerful GPUs. The trend in gaming, as I see it, will be towards the higher resolution panels, and with Korean 27" monitors coming into the market, if you're ok with that sort of monitor it is a direction to take to improve your gaming experience. 4K is on the horizon, which means either more pixel pushing power or lower resolutions/settings if you want the quality. Testing at 1440p/max settings is something I like to test as it pushes the GPU and hopefully the rest of the system - if you're a gamer, you want the best experience, and finding the hardware to do that is one of the most important things in that process (after getting good at the game you want).
So these results are offered in order to aid a purchasing decision based on our small sample size. No sample size is ever going to be big enough (unless you are able to test in Narnia), but we hope to expand on this in the future. Consider the data, read our conclusions - you may have a different interpretation of the data. Let us know what you think!