Original Link: http://www.anandtech.com/show/6985/choosing-a-gaming-cpu-at-1440p-adding-in-haswell-
Choosing a Gaming CPU at 1440p: Adding in Haswell
by Ian Cutress on June 4, 2013 10:00 AM EST
A few weeks ago we released our first set of results to aid readers in deciding what CPU they may want for a new single or multi-GPU build. Today we add in some results for the top end Haswell CPU, the i7-4770K.
As you may have gathered from our initial Haswell coverage, we have had access to the processors for a couple of weeks now, and in that time we have run through our gaming CPU tests as far as the motherboards we have had access to allow. We have had a variety of PCIe combinations (up to x8/x4/x4, including x8/x8 + x1 and x8/x8 + x4) worth testing to make sure you aim for the multi-GPU motherboard that fits best. We have also had a small amount of time to test a few more CPUs (Q9400, E6550) to fill out the roster a little.
In order to keep consistency, I want this article to contain all the information from the previous article rather than just reference back – I personally find the exercise of applying statistics to the data we obtain (and how we obtain it) very important. The new CPUs will be highlighted, and any adjustments to our conclusions will also be published. I also want to answer some of the questions raised by our last Gaming CPU article.
Where to Begin?
One question when building or upgrading a gaming system is which CPU to choose – does it matter if I have a quad core from Intel, or a quad module from AMD? Perhaps something simpler will do the trick, and I can spend the difference on the GPU. And if you are running a multi-GPU setup, does the CPU have a bigger effect? These are the questions I set out to help answer.
A few things before we start:
This set of results is a work in progress. For the sake of expediency I could not select 10 different gaming titles across a variety of engines and then test them in seven or more different configurations per game and per CPU, nor could I test every different CPU made. As a result, on the gaming side, I limited myself to one resolution, one set of settings, and four very regular testing titles that offer time demos: Metro 2033, Dirt 3, Civilization V and Sleeping Dogs. This is obviously not Skyrim, Battlefield 3, Crysis 3 or Far Cry 3, which may be more relevant to your setup. The arguments for and against time demo testing, as well as the arguments for taking FRAPS values of repeated sequences, are well documented (time demos might not be representative, versus the consistency and realism of FRAPSing a repeated run across a field); however, all of our tests can be run on home systems to get a feel for how a system performs. Below is a discussion regarding AI, one of the common uses for a CPU in a game, and how it affects the system. Out of our benchmarks, Dirt 3 plays out a full race, including the AI in the result, and the turn-based Civilization V has no concern for direct AI except for the time between turns.
All this combines with my unique position as the senior motherboard editor here at AnandTech – the position gives me access to a wide variety of motherboard chipsets, lane allocations and a fair number of CPUs. GPUs are not necessarily in large supply on my side of the reviewing area, but ASUS and ECS have provided my test beds with HD 7970s and GTX 580s respectively, and those cards have been a core part of my test beds for 12 and 21 months. The task set before me in this review would be almost a career in itself if we were to expand to more GPUs and more multi-GPU setups, so testing up to 4x 7970 and up to 2x GTX 580 is a more than reasonable place to start.
Where It All Began
The most important point to note is how this set of results came to pass. Several months ago I came across a few sets of testing by other review websites that floored me – simple CPU comparison tests for gaming which were spreading like wildfire among the forums, and some results contradicted the general prevailing opinion on the topic. These results were pulling all sorts of lurking forum users out of the woodwork to have an opinion, and being the well-adjusted scientist I am, I set forth to confirm the results were, at least in part, valid. What came next was a shock – some had no real explanation of the hardware setups. While the basic overview of hardware was supplied, there was no run-down of the settings used, and no attempt to justify the findings, which had obviously caused quite a stir. Needless to say, I was stunned by the lack of verbose testing, as well as by the results and a lot of the conversation that followed, particularly from avid fans of Team Blue and Team Red. I planned to right this wrong the best way I know how – with science!
The other reason for pulling together the results in this article is perhaps the one I originally started with – the need to update drivers every so often. Since the Ivy Bridge release, I have been using Catalyst 12.3 and GeForce 296.10 WHQL on my test beds. This causes problems – older drivers are not optimized, readers sometimes complain if older drivers are used, and new games cannot be added to the test bed because they might not scale correctly on the older drivers. So while there are some reviews on the internet that update drivers between testing while keeping the old numbers (leading to skewed results), taking time out to retest a number of platforms for more data points solely on the new drivers is a large undertaking. For example, testing new drivers over six platforms (CPU/motherboard combinations) would mean: six platforms, four games, seven different GPU configurations, ~10 minutes per test, plus 2+ hours to set up each platform and install a new OS/drivers/benchmarks. That makes 40+ hours of solid testing (if all goes without a second lost here or there), or just over a full working week – more if I also test the CPU performance for a computational benchmark update, or exponentially more if I include multiple resolutions and setting options. If this is all that is worked on that week, it means no new content – so it happens rarely, perhaps once a year or before a big launch. This time was now, and when I started this testing, I was moving to Catalyst 13.1 and GeForce 310.90, which by the time this review goes live will have already been superseded! In reality, I have been slowly working on this data set for the best part of 10 weeks while also reviewing other hardware (but keeping those reviews with consistent driver comparisons). In total this review encapsulates 24 different CPU setups, with up to 6 different GPU configurations, meaning 430 data points, 1375 benchmark loops and over 51 hours in GPU benchmarks alone.
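The arithmetic behind the 40+ hour estimate can be sketched directly from the figures quoted above:

```python
# Back-of-envelope for one driver retest cycle, using the figures above:
# 6 platforms, 4 games, 7 GPU configurations, ~10 minutes per test,
# plus 2+ hours of OS/driver/benchmark setup per platform.
platforms, games, gpu_configs = 6, 4, 7
minutes_per_test = 10
setup_hours_per_platform = 2

bench_hours = platforms * games * gpu_configs * minutes_per_test / 60
setup_hours = platforms * setup_hours_per_platform
total_hours = bench_hours + setup_hours

print(f"benchmarks: {bench_hours:.0f} h + setup: {setup_hours} h "
      f"= {total_hours:.0f}+ h")  # ~40 hours, just over a working week
```

That is before any CPU compute benchmarks, extra resolutions, or re-runs to confirm odd results are factored in.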
What Does the CPU do in a Game?
A lot of game developers use customized versions of game engines, such as the EGO engine for driving games or the Unreal engine. The engine provides the underpinnings for a lot of the code, and the optimizations therein. The engine also decides what in the game gets offloaded onto the GPU.
Imagine the code that makes up the game as a linear sequence of events. To get through the game quickly, we would need the fastest single-core processor available. Of course, games are not like this – much of a game can be parallelized, such as vector calculations for graphics, and these were the first to be moved from the CPU to the GPU. Over time, more parts of the code have made the move – physics and compute being the main features in recent months and years.
The GPU is good at independent, simple tasks – calculating which color goes in which pixel is an example of this, along with additional processing and post-processing features (FXAA and so on). If a task is linear, it lives on the CPU, such as loading textures into memory or negotiating which data to transfer between memory and the GPUs. The CPU also takes control of independent yet complex tasks, as the CPU is the part that can perform complicated logic analysis.
Very few parts of a game come under this heading of ‘independent yet complex’. Anything suitable for the GPU but not yet ported over will be here, and the big one usually quoted is artificial intelligence. Deciding where an NPC is going to run, shoot or fly could be considered a very complex set of calculations, ideal for fast CPUs. The counter-argument is that games have had complex AI for years – the number of times I personally was destroyed by a Dark Sim in Perfect Dark on the N64 is testament either to my uselessness or to the fact that complex AI can be achieved without much CPU power. AI is unlikely to be a limiting factor in frame rates due to CPU usage.
What is most likely going to be the limiting factor is how the CPU manages data. As engines evolve, they try to move data between the CPU, memory and GPUs less often – if textures can be kept on the GPU, they will stay there. But some engines are not as perfect as we would like, resulting in the CPU becoming the limiting factor. As CPU performance increases, and those who write the engines better understand the ecosystem, the CPU should become less of an issue over time. All roads point towards the PS4 of course, and its 8-core Jaguar processor. Is this all that is needed for a single GPU, albeit in an HSA environment?
Another angle I wanted to test, beyond most other websites, is multi-GPU. There is plenty of content online dealing with single GPU setups, with a few for dual GPU. Even though the number of multi-GPU users is actually quite small globally, the enthusiast market is clearly geared for it. We get motherboards with support for four GPUs; we have cases that will support a dual processor board as well as four double-height GPUs. Then there are GPUs being released with two sets of silicon on one PCB, wrapped in a double or triple height cooler. More often than not on a forum, people will ask ‘what GPU for $xxx’, and some of the suggestions will be towards two GPUs at half the budget each, as this commonly offers more performance than a single GPU if the game and the drivers all work smoothly (at the cost of power, heat, and bad driver scenarios). The ecosystem supports multi-GPU setups, so I felt it right to test at least one four-way setup. Although with great power comes great responsibility – there was no point testing 4-way 7970s at 1080p. Typically in this price bracket, users will go for multi-monitor setups, along the lines of 5760x1080, or big monitor setups like 1440p or 1600p, or the mega-rich might try 4K. Ultimately the high-end enthusiast, with cash to burn, is going to gravitate towards 4K, and I cannot wait until that becomes a reality. So as a median point in all of this, we are testing at 1440p and maximum settings. This will put a strain on our Core2Duo and Celeron G465 samples, but should be easy pickings for our multi-processor, multi-GPU beast of a machine.
A Minor Problem In Interpreting Results
Throughout testing for this review, there were clearly going to be some issues to consider. Chief among these is consistency – in particular, whether something like Metro 2033 decides to have an ‘easy’ run which reports ~3% higher than normal. For that specific example we get around this by double testing: as the easy run typically appears in the first batch, we run two or three batches of four and disregard the first batch.
The other, perhaps bigger, issue is interpreting results. If I get 40.0 FPS on a Phenom II X4-960T, 40.1 FPS on an i5-2500K, and then 40.2 FPS on a Phenom II X2-555 BE, does that make the results invalid? The important points to recognize here are statistics and system state.
- System State: We have all had times when booting a PC and it feels sluggish, but this sluggish behavior disappears on reboot. The same thing can occur with testing, and usually happens as a result of bad initialization or a bad cache optimization routine at boot time. As a result, we try and spot these circumstances and re-run. With more time we would take 100 different measurements of each benchmark, with reboots, and cross out the outliers. Time constraints outside of academia unfortunately do not give us this opportunity.
- Statistics: System state aside, frame rate values will often fluctuate around an average. This will mean (depending on the benchmark) that the result could be +/- a few percentage points on each run. So what happens if you have a run of 4 time demos, and each of them is +2% above the ‘average’ FPS? From the outside, as you will not know the true average, you cannot say if it is valid, as the data set is extremely small. If we take more runs, we can find the variance (in its statistical sense), the standard deviation, and perhaps the mean, median and mode of a set of results. As always, the main constraint in articles like these is time – the quicker to publish, the less testing, the larger the error bars and the higher the likelihood that some results are going to be skewed because it just so happened to be a good/bad benchmark run. So the example given above of the X2-555 getting a better result is down to interpretation – each result might be +/- 0.5 FPS on average, and because they are all pretty similar we are actually more GPU limited. So it is more a matter of whether the GPU has a good/bad run in this circumstance.
For this example, I batched 100 runs of my common WinRAR test from motherboard testing, on an i5-2500K CPU with a Maximus V Formula. Results varied between 71 seconds and 74 seconds, with a large gravitation towards the lower end. To represent this statistically, we normally use a histogram, which separates the results into ‘bins’ (e.g. 71.00 seconds to 71.25 seconds) whose width reflects how accurate the final result has to be. Below is an initial representation of the data (time vs. run number), and a few histograms of that data, using bin sizes of 1.00 s, 0.75 s, 0.50 s, 0.33 s, 0.25 s and 0.10 s.
As we get down to the lower bin sizes, there is a pair of large groupings of results between ~71 seconds and ~72 seconds. The overall average/mean of the data is 71.88 seconds due to the outliers around 74 seconds, with the median at 72.04 seconds and a standard deviation of 0.66 seconds. What is the right value to report? Overall average? Peak? Average +/- standard deviation? With the results skewed around two values, what happens if I do 1-3 runs and get ~71 seconds and none around ~72 seconds?
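As a minimal sketch of this kind of analysis (using simulated numbers, not the actual WinRAR data), the binning and summary statistics can be reproduced in a few lines of Python:

```python
import random
import statistics

# Illustrative sketch with simulated data (NOT the actual WinRAR runs):
# 100 results clustered around 71-72 s with a few slow outliers near 74 s.
random.seed(0)
runs = ([random.gauss(71.3, 0.2) for _ in range(55)]
        + [random.gauss(72.0, 0.2) for _ in range(40)]
        + [random.gauss(74.0, 0.3) for _ in range(5)])

mean = statistics.mean(runs)
median = statistics.median(runs)
stdev = statistics.stdev(runs)

def histogram(data, bin_width):
    """Count results into bins of the given width (keyed by bin start)."""
    bins = {}
    for x in data:
        lo = bin_width * int(x / bin_width)
        bins[lo] = bins.get(lo, 0) + 1
    return dict(sorted(bins.items()))

print(f"mean {mean:.2f} s, median {median:.2f} s, stdev {stdev:.3f} s")
print(histogram(runs, 0.5))
```

Shrinking `bin_width` reveals the two groupings near 71 s and 72 s; widening it hides them, which is exactly the reporting dilemma described above.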
Statistics is clearly a large field, and without a large sample size, most numbers can be one-off results that are not truly reflective of the data. It is important to ask yourself every time you read a review with a result – how many data points went into that final value, and what analysis was performed?
For this review, we typically take 4 runs of each of our GPU tests, except Civilization V, which is extremely consistent at +/- 0.1 FPS. The result reported is the average of those four values, minus any results we feel are inconsistent. At times runs have been repeated in order to confirm a value, but this will not be noted in the results.
Reporting the Minimum FPS
A lot of readers have noted in the past that they would like to see minimum FPS values. The minimum FPS is a good measure to point to for ‘the worst gameplay experience’, but even with our testing it would be an effort to report it. I know a lot of websites do report minimum FPS, but it is important to realize that:
In a test that places AI in the center of the picture, it can be difficult to remain consistent. Take for example a run of Dirt 3 – this runs a standard race with several AI cars in which anything can happen. If in one of the runs there is a big six-car crash, lots of elements will be going on, resulting in a severe dip in FPS. In that run I get a minimum of 6 FPS, whereas in others I get a minimum of ~40 FPS. Which is the right number to report? Technically it would be 6 FPS, but then any CPU that did not have a big crash pile-up would look better when theoretically it has not been put to the test.
If I had the time to run 100 tests of each benchmark, I would happily provide histograms of data representing how often the minimum FPS value fluctuated between runs. But that just is not possible when finding a balance between complete testing and releasing results for you all to see.
While I admit that the time-demo benchmarks that are not AI dependent will have a more regular minimum FPS, the average FPS result allows the inconsistency of a run to be smoothed out. Ideally perhaps we should be reporting the standard deviation (which would help account for those stray ultra-low FPS values), but that brings its own cavalcade of issues depending on whether the run is mainly higher or lower than average, and the distribution will most likely be skewed rather than normal.
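To sketch why the raw minimum misleads, consider two hypothetical frame-rate traces (invented numbers, not captured data) for the Dirt 3 scenario above:

```python
import statistics

# Hypothetical per-second frame rates for two Dirt 3 runs: a clean race,
# and one with a brief six-car pile-up that tanks the frame rate.
clean_run = [62, 60, 58, 55, 57, 61, 59, 56, 60, 58]
crash_run = [62, 60, 58, 6, 9, 61, 59, 56, 60, 58]

for name, run in (("clean", clean_run), ("crash", crash_run)):
    print(f"{name}: min {min(run)} FPS, "
          f"mean {statistics.mean(run):.1f} FPS, "
          f"stdev {statistics.stdev(run):.1f}")

# The minimum collapses from 55 FPS to 6 FPS between runs, while the
# average moves far less - one reason this review reports averages.
```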
While FCAT is a great way to test frame rates, it needs to be set up accordingly, and getting data out of it is not the simple run-and-gun benchmark procedure one would like – it is even more complicated in terms of data retrieval and analysis than FRAPS, which personally I tend not to touch with a barge pole. While I understand the merits of such a system, it would be ideal if a benchmark mode used FCAT in its own overlay to report data.
Why Test at 1440p? Most Gamers play at 1080p!
Obviously one resolution is not a catch-all situation. There will be users on the cheapest 1080p screen money can buy, and those using tri-monitor setups who want peak performance. Having a multi-GPU test at 1080p is a little strange, personally, and ideally those high-end setups really need to be pushing pixels. While 1440p is not the de facto standard, it provides an ideal mid-point for analysis. Take for example the Steam survey:
What we see is 30.73% of gamers running at 1080p, but 4.16% of gamers above 1080p. If that applies to all of the 4.6 million gamers currently on Steam, we are talking about ~200,000 individuals with setups bigger than 1080p playing games on Steam right now, who may or may not have to run at a lower resolution to get playable frame rates.
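The estimate falls straight out of the quoted survey numbers:

```python
# Rough check of the ~200,000 figure from the quoted survey numbers.
steam_gamers = 4_600_000        # gamers on Steam at the time
above_1080p_share = 0.0416      # 4.16% of survey respondents above 1080p

print(round(steam_gamers * above_1080p_share))  # ~191,000
```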
So 1080p is still the mainstay for gamers at large, but there is movement afoot to multi-monitor and higher resolution monitors. As a random point of data, personally my gaming rig does have a 1080p screen, but that is only because my two 1440p Korean panels are used for AnandTech review testing, such as this article.
The Bulldozer Challenge
Another purpose of this article was to tackle the problem surrounding Bulldozer and its derivatives, such as Piledriver and thus all Trinity APUs. The architecture is such that Windows 7, by default, does not optimally assign new threads to new modules – the ‘freshly installed’ stance is to double up on threads per module before moving to the next. By installing a pair of Windows updates (which do not show up in Windows Update automatically), we get an effect called ‘core parking’, which assigns the first series of threads each to its own module, giving each thread access to a pair of INT units and an FP unit, rather than having pairs of threads competing for the prize. This affects variable-threaded loading the most, particularly from 2 to 2N-2 threads, where N is the number of modules in the CPU (thus 2 to 6 threads on an FX-8150). It should come as no surprise that games fall into this category, so we test with and without the core parking updates in our benchmarks.
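The two placement policies can be sketched as follows. This is an illustration of the idea only, not the actual Windows scheduler logic:

```python
# Sketch of the two thread-placement policies described above for a
# hypothetical 4-module chip (an FX-8150-like layout: two integer cores
# per module, one shared FP unit per module).
MODULES = 4

def assign_default(n_threads):
    """Pre-hotfix behavior: fill both cores of a module before moving on."""
    return [t // 2 for t in range(n_threads)]

def assign_parked(n_threads):
    """With the hotfixes: one thread per module first, then double up."""
    return [t % MODULES for t in range(n_threads)]

for n in (2, 4, 6):
    print(f"{n} threads -> default: {assign_default(n)}  "
          f"parked: {assign_parked(n)}")

# With two threads, the default policy packs both onto module 0, sharing
# its FP unit; the parked policy gives each thread a module to itself.
```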
Hurdles with NVIDIA and 3-Way SLI on Ivy Bridge
Users who have been keeping up to date with motherboard options on Z77 will understand that there are several ways to put three PCIe slots onto a motherboard. The majority of sub-$250 motherboards will use three PCIe slots in a PCIe 3.0 x8/x8 + PCIe 2.0 x4 arrangement (meaning x8/x8 from the CPU and x4 from the chipset), allowing either two-way SLI or three-way Crossfire. Some motherboards will use a different Ivy Bridge lane allocation option such that we have a PCIe 3.0 x8/x4/x4 layout, giving three-way Crossfire but only two-way SLI. In fact, in this arrangement, fitting the final x4 slot with a sound/RAID card disables two-way SLI entirely.
This is due to a not widely publicized requirement of SLI – it needs at least an x8 lane allocation in order to work (either PCIe 2.0 or 3.0). Anything less than this on any GPU and you will be denied in the software. So putting in that third card will cause the second slot to drop to x4, disabling two-way SLI. There are motherboards with a switch to change to x8/x8 + x4 in this scenario, but we are still capped at two-way SLI.
The only way to get 3-way or 4-way SLI is via a PLX 8747 enabled motherboard, which greatly increases the cost of a motherboard build. This should be kept in mind when dealing with the final results.
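The eligibility rule can be sketched as a simple check (hypothetical helper names, not any vendor's actual API):

```python
# Sketch of the lane requirements described above. A layout is a list of
# lane widths, one per occupied GPU slot.
def sli_allowed(layout):
    """SLI needs every GPU slot at x8 or wider (PCIe 2.0 or 3.0)."""
    return len(layout) >= 2 and all(width >= 8 for width in layout)

def crossfire_allowed(layout):
    """Crossfire will accept an x4 slot."""
    return len(layout) >= 2 and all(width >= 4 for width in layout)

print(sli_allowed([8, 8]))           # two-way SLI on x8/x8: allowed
print(sli_allowed([8, 4, 4]))        # x8/x4/x4: SLI denied in software
print(crossfire_allowed([8, 4, 4]))  # three-way Crossfire: fine
```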
It has come to my attention that even if the results were to come out X > Y, some users may point out that the better processor draws more power, which at the end of the day costs more money if you add it up over a year. For the purposes of this review, we are of the opinion that if you are gaming on a budget, then high-end GPUs such as the ones used here are not going to be within your price range. Simple fun gaming can be had on a low resolution, limited detail system for not much money – for example, at a recent LAN I enjoyed 3-4 hours of TF2 fun on my AMD netbook with integrated HD3210 graphics, even though I had to install the ultra-low resolution texture pack and mods to get 30+ FPS. But I had a great time, and thus the beauty of high definition graphics on the bigger systems might not be of concern as long as the frame rates are good. But if you want the best, you will pay for the best, even if it comes at an electricity cost. Budget gaming is fine, but this review is designed to focus on 1440p with maximum settings, which is not a budget gaming scenario.
Format Of This Article
On the next couple of pages, I will be going through in detail our hardware for this review, including CPUs, motherboards, GPUs and memory. Then we will move to the actual hardware setups, with CPU speeds and memory timings (with motherboards that actually enable XMP) detailed. Also important to note is the motherboards being used – for completeness I have tested several CPUs in two different motherboards because of GPU lane allocations. We are living in an age where PCIe switches and additional chips are used to expand GPU lane layouts, so much so that there are up to 20 different configurations for Z77 motherboards alone. Sometimes the lane allocation makes a difference, and it can make a large difference using three or more GPUs (x8/x4/x4 vs. x16/x8/x8 with PLX), even with the added latency sometimes associated with the PCIe switches. Our testing over time will include the majority of the PCIe lane allocations on modern setups – for our first article we are looking at the major ones we are likely to come across.
The results pages will start with a basic CPU analysis, running through my regular motherboard tests on the CPU. This should give us a feel for how much power each CPU has in dealing with mathematics and real world tests, both for integer operations (important on Bulldozer/Piledriver/Radeon) and floating point operations (where Intel/NVIDIA seem to perform best).
We will then move to each of our four gaming titles in turn, in our six different GPU configurations. As mentioned above, in GPU limited scenarios it may seem odd if a sub-$100 CPU is higher than one north of $300, but we hope to explain the tide of results as we go.
I hope this will be an ongoing project here at AnandTech, and over time we can add more CPUs, 4K testing, perhaps even show four-way Titan should that be available to us. The only danger is that on a driver or game change, it takes another chunk of time to get data! Any suggestions of course are greatly appreciated – drop me an email at email@example.com.
For an article like this, getting a range of CPUs, which includes the most common and popular, is very important. I have been at AnandTech for just over two years now, and in that time we have had Sandy Bridge, Llano, Bulldozer, Sandy Bridge-E, Ivy Bridge, Trinity and Vishera, of which I tend to get supplied the top end processors of each generation for testing (as a motherboard reviewer, it is important to make the motherboard the limiting factor). A lot of users have jumped to one of these platforms, although a large number are still on Wolfdale (Core2), Nehalem, Westmere, Phenom II (Thuban/Zosma/Deneb) or Athlon II. I have attempted to pool all my AnandTech resources, contacts and personal resources together to get a good spread of the current ecosystem, with more focus on the modern end of the spectrum. It is worth noting that a multi-GPU user is more likely to have a top-line Ivy Bridge, Vishera or Sandy Bridge-E CPU, as well as a top-range motherboard, rather than an old Wolfdale. Nevertheless, we will see how they perform. There are a few obvious CPU omissions that I could not obtain for this first review which will hopefully be remedied over time in our next update.
My criteria for obtaining CPUs was to use at least one from the most recent architectures, as well as a range of cores/modules/threads/speeds. The basic list as it stands is:
| CPU | Core | Socket | Cores/Modules (Threads) | Speed (MHz) | Turbo (MHz) | L2 / L3 Cache |
|-----|------|--------|-------------------------|-------------|-------------|---------------|
| A6-3650 | Llano | FM1 | 4 (4) | 2600 | N/A | 4 MB / None |
| A8-3850 | Llano | FM1 | 4 (4) | 2900 | N/A | 4 MB / None |
| A8-5600K | Trinity | FM2 | 2 (4) | 3600 | 3900 | 4 MB / None |
| A10-5800K | Trinity | FM2 | 2 (4) | 3800 | 4200 | 4 MB / None |
| Phenom II X2-555 BE | Callisto K10 | AM3 | 2 (2) | 3200 | N/A | 1 MB / 6 MB |
| Phenom II X4-960T | Zosma K10 | AM3 | 4 (4) | 3200 | N/A | 2 MB / 6 MB |
| Phenom II X6-1100T | Thuban K10 | AM3 | 6 (6) | 3300 | 3700 | 3 MB / 6 MB |
| FX-8150 | Bulldozer | AM3+ | 4 (8) | 3600 | 4200 | 8 MB / 8 MB |
| FX-8350 | Piledriver | AM3+ | 4 (8) | 4000 | 4200 | 8 MB / 8 MB |
| CPU | Core | Socket | Cores/Modules (Threads) | Speed (MHz) | Turbo (MHz) | L2 / L3 Cache |
|-----|------|--------|-------------------------|-------------|-------------|---------------|
| E6400 | Conroe | 775 | 2 (2) | 2133 | N/A | 2 MB / None |
| E6550 | Conroe | 775 | 2 (2) | 2333 | N/A | 4 MB / None |
| E6700 | Conroe | 775 | 2 (2) | 2667 | N/A | 4 MB / None |
| Q9400 | Yorkfield | 775 | 4 (4) | 2667 | N/A | 6 MB / None |
| Xeon X5690 | Westmere | 1366 | 6 (12) | 3467 | 3733 | 1.5 MB / 12 MB |
| Celeron G465 | Sandy Bridge | 1155 | 1 (2) | 1900 | N/A | 0.25 MB / 1.5 MB |
| Core i5-2500K | Sandy Bridge | 1155 | 4 (4) | 3300 | 3700 | 1 MB / 6 MB |
| Core i7-2600K | Sandy Bridge | 1155 | 4 (8) | 3400 | 3800 | 1 MB / 8 MB |
| Core i7-3930K | Sandy Bridge-E | 2011 | 6 (12) | 3200 | 3800 | 1.5 MB / 12 MB |
| Core i7-3960X | Sandy Bridge-E | 2011 | 6 (12) | 3300 | 3900 | 1.5 MB / 15 MB |
| Core i3-3225 | Ivy Bridge | 1155 | 2 (4) | 3300 | N/A | 0.5 MB / 3 MB |
| Core i7-3770K | Ivy Bridge | 1155 | 4 (8) | 3500 | 3900 | 1 MB / 8 MB |
| Core i7-4770K | Haswell | 1150 | 4 (8) | 3500 | 3900 | 1 MB / 8 MB |
The omissions are clear to see, such as the i5-3570K, a dual core Llano/Trinity, a dual/tri module Bulldozer/Piledriver, the i7-920, the i7-3820, or anything Nehalem. These will hopefully be coming up in another review.
My first and foremost thanks go to both ASUS and ECS for supplying me with these GPUs for my test beds. They have been in and out of 60+ motherboards without any issue, and will hopefully continue to do so. My usual scenario for updating GPUs is to flip between AMD and NVIDIA every couple of generations – last time it was HD 5850 to HD 7970, and as such in the future we will move to a 7-series NVIDIA card or a set of Titans (which might outlive a generation or two).
The ASUS HD 7970 we use is the reference model at the 7970 launch, using GCN architecture, 2048 SPs at 925 MHz with 3 GB of 4.6 GHz GDDR5 memory. We have four cards to be used in 1x, 2x, 3x and 4x configurations where possible, also using PCIe 3.0 when enabled by default.
ECS is both a motherboard manufacturer and an NVIDIA card manufacturer, and while most of their VGA models are sold outside of the US, some do make it onto e-tailers like Newegg. This GTX 580 is also a reference model, with 512 CUDA cores at 772 MHz and 1.5 GB of 4 GHz GDDR5 memory. We have two cards to be used in 1x and 2x configurations at PCIe 2.0.
The CPU is not always the main part of the picture for this sort of review – the motherboard is equally important, as the motherboard dictates how the CPU and the GPU communicate with each other, and what the lane allocation will be. As mentioned on the previous page, there are 20+ PCIe configurations for Z77 alone when you consider that some boards are native, some use a PLX 8747 chip, others use two PLX 8747 chips, and about half of the Z77 motherboards on the market enable four PCIe 2.0 lanes from the chipset for CrossFireX use (at high latency). We have tried to be fair and take motherboards that may carry a small premium but are equipped to deal with the job. As a result, some motherboards may also use MultiCore Turbo, which, as we have detailed in the past, gives the top turbo speed of the CPU regardless of the loading.
As a result of this lane allocation business, each value in our review will be attributed to both a CPU, whether it uses MCT, and a lane allocation. This would mean something such as i7-3770K+ (3 - x16/x8/x8) would represent an i7-3770K with MCT in a PCIe 3.0 tri-GPU configuration. More on this below.
The ASUS Z87-Pro nominally offers a PCIe 3.0 x8/x8 layout with an additional PCIe 2.0 x4 from the PCH; however, this slot starts in x1 mode without a change in the BIOS.
The MSI Z87A-GD65 gives us PCIe 3.0 x8/x4/x4.
The Gigabyte Z87X-UD3H gives PCIe 3.0 x8/x8 + PCIe 2.0 x4, but this motherboard was tested with MCT off.
The ASUS Maximus V Formula has a three way lane allocation of x8/x4/x4 for Ivy Bridge, x8/x8 for Sandy Bridge, and enables MCT.
The Gigabyte Z77X-UP7 has a four way lane allocation of x16/x16, x16/x8/x8 and x8/x8/x8/x8, all via a PLX 8747 chip. It also has a single x16 that bypasses the PLX chip and is thus native, and all configurations enable MCT.
The Gigabyte G1.Sniper M3 is a little different, offering x16, x8/x8, or if you accidentally put the cards in the wrong slots, x16 + x4 from the chipset. This additional configuration is seen on a number of cheaper Z77 ATX motherboards, as well as a few mATX models. The G1.Sniper M3 also implements MCT as standard.
The ASRock X79 Professional is a PCIe 2.0 enabled board offering x16/x16, x16/x16/x8 and x16/x8/x8/x8.
The ASUS Maximus IV Extreme is a PCIe 3.0 enabled board offering the same as the ASRock, except it enables MCT by default.
For Westmere Xeons: The EVGA SR-2
Due to the timing of the first roundup, I was able to use an EVGA SR-2 with a pair of Xeons on loan from Gigabyte for our server testing. The SR-2 forms the basis of our beast machine below, and uses two Westmere-EP Xeons to give PCIe 2.0 x16/x16/x16/x16 via NF200 chips.
For Core 2 Duo: The MSI i975X Platinum PowerUp and ASUS Commando (P965)
The MSI is the motherboard I used for our quick Core2Duo comparison pipeline post in Q1 2013 – I still have it sitting on my desk, and it seemed apt to include it in this test. The MSI i975X Platinum PowerUp offers two PCIe 1.1 slots, capable of Crossfire up to x8/x8. I also rummaged through my pile of old motherboards and found the ASUS Commando with a CPU installed, and as it offered x16+x4, this was tested also.
For Llano: The Gigabyte A75-UD4H and ASRock A75 Extreme6
Llano throws a little oddball into the mix, being a true quad core unlike Trinity. The A75-UD4H from Gigabyte was the first one to hand, and offers two PCIe slots at x8/x8. Like the Core2Duo setup, we are not SLI enabled.
After finding an A8-3850 CPU to set against the A6-3650, I pulled out the A75 Extreme6, which offers three-way CFX as x8/x8 + x4 from the chipset, as well as the configurations offered by the A75-UD4H.
For Trinity: The Gigabyte F2A85X-UP4
Technically A85X motherboards for Trinity support up to x8/x8 in Crossfire, but the F2A85X-UP4, like other high end A85X motherboards, implements four lanes from the chipset for three-way AMD configurations. Our initial showing of three-way via that chipset link was not that great, and this review will help quantify it.
For AM3: The ASUS Crosshair V Formula
As the 990FX covers a lot of processor families, the safest place to sit would be on one of the top motherboards available. Technically the Formula-Z is newer and supports Vishera more easily, but we have not had the Formula-Z in to test, and the basic Formula was still able to run an FX-8350 as long as we kept the VRMs cool as a cucumber. The CVF offers up to three-way CFX and SLI testing (x16/x8/x8).
Our good friends at G.Skill put their best foot forward in supplying us with high end kits to test. What memory we actually run depends mostly on what the motherboard will support – in order to keep testing consistent, no overclocks were performed. This meant that boards and BIOSes limited to a certain DRAM multiplier were set at the maximum multiplier possible. To keep things fair overall, the modules were then adjusted for tighter timings. All of this is noted in our final setup lists.
Our main memory testing kit is our trusty G.Skill 4x4 GB DDR3-2400 9-11-11 1.65 V RipjawsX kit which has been part of our motherboard testing for over twelve months. For times when we had two systems being tested side by side, a G.Skill 4x4 GB DDR3-2400 10-12-12 1.65 V TridentX kit was also used.
For The Beast, which is one of the systems limited to lower memory dividers, we pulled in a pair of tri-channel kits from X58 testing. These are high-end kits as well, currently discontinued as they tended to stop working when fed too much voltage. We have sets of 3x2 GB OCZ Blade DDR3-2133 8-9-8 and 3x1 GB Dominator GT DDR3-2000 7-8-7 for this purpose, which we ran at 1333 6-7-6 due to motherboard limitations at stock settings.
Our Core2Duo CPUs naturally get their own DDR2 memory for completeness. This is a 2x2 GB kit of OCZ Platinum DDR2-666 5-5-5.
For Haswell we were offered new kits for testing, this time from Corsair and their Vengeance Pro series. This is a 2x8 GB kit of DDR3-2400 10-12-12 1.65 V.
Testing Methodology, Hardware Configurations
To start, we want to thank the many manufacturers who have donated kit for our test beds in order to make this review, along with many others, possible.
Thank you to OCZ for providing us with 1250W Gold Power Supplies.
Thank you to G.Skill for providing us with memory kits.
Thank you to Corsair for providing us with an AX1200i PSU and 16GB 2400C10 memory.
Thank you to ASUS for providing us with the AMD GPUs and some IO Testing kit.
Thank you to ECS for providing us with the NVIDIA GPUs.
Thank you to Corsair for providing us with the Corsair H80i CLC.
Thank you to Rosewill for providing us with the 500W Platinum Power Supply for mITX testing, BlackHawk Ultra, and 1600W Hercules PSU for extreme dual CPU + quad GPU testing, and RK-9100 keyboards.
Thank you to Gigabyte for providing us with the X5690 CPUs.
Also many thanks go to the manufacturers who over the years have provided review samples which contribute to this review.
In order to keep the testing fair, we set strict rules in place for each of these setups. For every new chipset, the SSD was formatted and a fresh installation of the OS was applied. The chipset drivers for the motherboard were installed, along with NVIDIA drivers then AMD drivers. The games were preinstalled on a second partition, but relinked to ensure they worked properly. The games were then tested as follows:
Metro 2033: Benchmark Mode, two runs of four scenes of Frontline at 1440p, max settings. First run of four is discarded, average of second run is taken (minus outliers).
Dirt3: Benchmark Mode, four runs of the first scene with 8 cars at 1440p, max settings. Average is taken.
Civilization V: One five minute run of the benchmark mode accessible at the command line, at 1440p and max settings. Results produced are total frames in sets of 60 seconds, average taken.
Sleeping Dogs: Using the Adrenaline benchmark software, four scenes at 1440p in Ultra settings. Average is taken.
If the platform was being used for the next CPU (e.g. Maximus V Formula, moving from FX-8150 to FX-8350), no need to reinstall. If the platform is changed for the next test, a full reinstall and setup takes place.
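The "minus outliers" averaging described above can be sketched as a simple trimmed mean. This is a hypothetical helper purely for illustration, not the exact script we use; the 10% tolerance is our own assumption:

```python
def trimmed_average(fps_runs, tolerance=0.1):
    """Average a set of FPS results, discarding runs that deviate
    from the median by more than `tolerance` (10% by default)."""
    runs = sorted(fps_runs)
    median = runs[len(runs) // 2]
    kept = [f for f in runs if abs(f - median) <= tolerance * median]
    return sum(kept) / len(kept)

# An inflated run (e.g. a first-batch warm-up) gets discarded:
print(trimmed_average([61.2, 60.8, 60.5, 72.0]))  # ~60.83
```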
How to Read This Review
Due to the large number of different variables in our review, it is hard to accurately label each data point with all the information about that setup. It also stands to reason that just putting the CPU model is also a bad idea when the same CPU could be in two different motherboards with different GPU lane allocations. There is also the memory aspect to consider, as well as if a motherboard uses MCT at stock. Here is a set of labels correlating to configurations you will see in this review:
CPU[+] [CP] (PCIe version – lane allocation to GPUs [PLX])
First is the name of the CPU, then an optional + identifier for MCT enabled motherboards. CP indicates we are dealing with a Bulldozer derived CPU and using the Core Parking updates. Inside the circular brackets is the PCIe version of the lanes we are dealing with, along with the lane allocation to each GPU. The final flag is if a PLX chip is involved in lane allocation.
A10-5800K (2 – x16/x16): A10-5800K with two GPUs in PCIe 2.0 mode
A10-5800K (CP) (2 – x16/x16): A10-5800K using Core Parking updates with two GPUs in PCIe 2.0 mode
FX-8350 (2 – x16/x16/x8): FX-8350 with three GPUs in PCIe 2.0 mode
i7-3770K (3+2 – x8/x8 + x4): i7-3770K powering three GPUs in PCIe 3.0 but the third GPU is using the PCIe 2.0 x4 from the chipset
i7-3770K+ (3 – x16): i7-3770K (with MCT) powering one GPU in PCIe 3.0 mode
i7-3770K+ (3 – x8/x8/x8/x8 PLX): i7-3770K (with MCT) powering four GPUs in PCIe 3.0 via a PLX chip
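As an illustration of how this notation decomposes, here is a hypothetical parser for the labels. The regular expression and field names are our own, purely for demonstration; the review does not use any such script:

```python
import re

# Hypothetical parser for the configuration labels used in this review,
# e.g. "i7-3770K+ (3 – x8/x8/x8/x8 PLX)".
LABEL = re.compile(
    r"(?P<cpu>[^(+]+?)(?P<mct>\+)?\s*(?:\((?P<cp>CP)\)\s*)?"
    r"\((?P<pcie>[\d.+]+)\s*[–-]\s*(?P<lanes>[x\d/+ ]+?)\s*(?P<plx>PLX)?\)"
)

def parse_label(label):
    """Split a configuration label into its components."""
    m = LABEL.match(label)
    return {
        "cpu": m.group("cpu").strip(),
        "mct": m.group("mct") is not None,          # MCT enabled at stock
        "core_parking": m.group("cp") is not None,  # Core Parking updates
        "pcie_version": m.group("pcie"),
        "lanes": m.group("lanes").strip(),
        "plx": m.group("plx") is not None,          # PLX chip in the lane path
    }
```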
Common Configuration Points
All the system setups below have the following consistent configurations points:
- A fresh install of Windows 7 Ultimate 64-bit
- Either an Intel Stock CPU Cooler, a Corsair H80i CLC or Thermalright TRUE Copper
- OCZ 1250W Gold ZX Series PSUs (Rosewill 1600W Hercules for The Beast)
- Up to 4x ASUS AMD HD 7970 GPUs, using Catalyst 13.1
- Up to 2x ECS NVIDIA GTX 580 GPUs, using GeForce WHQL 310.90
- SSD Boot Drives, either OCZ Vertex 3 128 GB or Kingston HyperX 120 GB
- LG GH22NS50 Optical Drives
- Open Test Beds, either a DimasTech V2.5 EasyHard or a CoolerMaster Test Lab
An asterisk (*) indicates the new data for this update.
A6-3650 + Gigabyte A75-UD4H + 16GB DDR3-1866 8-10-10
A8-3850 + ASRock A75 Extreme6 + 16GB DDR3-1866 8-10-10
A8-5600K + Gigabyte F2A85X-UP4 + 16GB DDR3-2133 9-10-10
A10-5800K + Gigabyte F2A85X-UP4 + 16GB DDR3-2133 9-10-10
X2-555 BE + ASUS Crosshair V Formula + 16GB DDR3-1600 8-8-8
X4-960T + ASUS Crosshair V Formula + 16GB DDR3-1600 8-8-8
X6-1100T + ASUS Crosshair V Formula + 16GB DDR3-1600 8-8-8
FX-8150 + ASUS Crosshair V Formula + 16GB DDR3-2133 10-12-11
FX-8350 + ASUS Crosshair V Formula + 16GB DDR3-2133 9-11-10
FX-8150 + ASUS Crosshair V Formula + 16GB DDR3-2133 10-12-11 + CP
FX-8350 + ASUS Crosshair V Formula + 16GB DDR3-2133 9-11-10 + CP
E6400 + MSI i975X Platinum + 4GB DDR2-666 5-6-6
*E6400 + ASUS P965 Commando + 4GB DDR2-666 4-5-5
*E6550 + ASUS P965 Commando + 4GB DDR2-666 5-6-6
E6700 + ASUS P965 Commando + 4GB DDR2-666 4-5-5
*Q9400 + ASUS P965 Commando + 4GB DDR2-666 5-6-6
Xeon X5690 + EVGA SR-2 + 6GB DDR3-1333 6-7-7
2x Xeon X5690 + EVGA SR-2 + 9GB DDR3-1333 6-7-7
Celeron G465 + ASUS Maximus V Formula + 16GB DDR3-2133 9-11-11
i5-2500K + ASUS Maximus V Formula + 16GB DDR3-2133 9-11-11
i7-2600K + ASUS Maximus V Formula + 16GB DDR3-2133 9-11-11
i3-3225 + ASUS Maximus V Formula + 16GB DDR3-2400 10-12-12
i7-3770K + Gigabyte Z77X-UP7 + 16GB DDR3-2133 9-11-11
i7-3770K + ASUS Maximus V Formula + 16GB DDR3-2400 9-11-11
i7-3930K + ASUS Rampage IV Extreme + 16GB DDR3-2133 10-12-12
i7-3960X + ASRock X79 Professional + 16GB DDR3-2133 10-12-12
*i7-4770K + Gigabyte Z87X-UD3H + 16GB DDR3-2400 10-12-12
*i7-4770K + ASUS Z87-Pro + 16GB DDR3-2400 10-12-12
*i7-4770K + MSI Z87A-GD65 Gaming + 16GB DDR3-2400 10-12-12
Point Calculations - 3D Movement Algorithm Test
The algorithms in 3DPM employ both uniform random number generation or normal distribution random number generation, and vary in amounts of trigonometric operations, conditional statements, generation and rejection, fused operations, etc. The benchmark runs through six algorithms for a specified number of particles and steps, and calculates the speed of each algorithm, then sums them all for a final score. This is an example of a real world situation that a computational scientist may find themselves in, rather than a pure synthetic benchmark. The benchmark is also parallel between particles simulated, and we test the single threaded performance as well as the multi-threaded performance.
As mentioned in previous reviews, this benchmark is written the way most people would tackle the problem – using floating point numbers. This is also where Intel excels, in contrast to AMD’s decision to move more towards INT ops (useful for tasks such as hashing), which typically rewards hand-optimized code and normal OS behavior.
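To give a flavour of the kind of work involved, one of the trig-heavy, floating-point algorithms might look something like the sketch below. This is an illustration of the general technique described, not the actual 3DPM source:

```python
import math
import random

def particle_walk(steps, seed=0):
    """Move one particle by unit steps in uniformly random 3D
    directions, using the floating-point/trig-heavy formulation the
    text describes, and return its net displacement."""
    rng = random.Random(seed)
    x = y = z = 0.0
    for _ in range(steps):
        theta = math.acos(2.0 * rng.random() - 1.0)  # polar angle
        phi = 2.0 * math.pi * rng.random()           # azimuthal angle
        x += math.sin(theta) * math.cos(phi)
        y += math.sin(theta) * math.sin(phi)
        z += math.cos(theta)
    return math.sqrt(x * x + y * y + z * z)
```

Each particle is independent of every other, which is why the benchmark parallelizes so cleanly: the multi-threaded version simply splits particles across cores.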
The 4770K comes in top in single threaded performance, showcasing the IPC gains of the new architecture. This is also shown in multithreaded tests with MCT both off and on.
Compression - WinRAR x64 3.93 + WinRAR 4.2
With 64-bit WinRAR, we compress the set of files used in our motherboard USB speed tests. WinRAR x64 3.93 attempts to use multithreading where possible and provides a good test for a system under variable threaded load; WinRAR 4.2 does this a lot better. If a system invokes different clock speeds at different loads, the switching between those speeds will determine how well the system does.
Due to the late inclusion of 4.2, our results list for it is a little smaller than I would have hoped. But it is interesting to note that with the Core Parking updates, an FX-8350 overtakes an i5-2500K with MCT.
Image Manipulation - FastStone Image Viewer 4.2
FastStone Image Viewer is a free piece of software I have been using for quite a few years now. It allows quick viewing of flat images, as well as resizing, changing color depth, adding simple text or simple filters. It also has a bulk image conversion tool, which we use here. The software currently operates only in single-thread mode, which should change in later versions of the software. For this test, we convert a series of 170 files, of various resolutions, dimensions and types (of a total size of 163MB), all to the .gif format of 640x480 dimensions.
In terms of pure single thread speed, it is worth noting that the X6-1100T leads the AMD pack, while the 4770K takes the top spot.
Video Conversion - Xilisoft Video Converter 7
With XVC, users can convert any type of normal video to any compatible format for smartphones, tablets and other devices. By default, it uses all available threads on the system, and in the presence of appropriate graphics cards, can utilize CUDA for NVIDIA GPUs as well as AMD WinAPP for AMD GPUs. For this test, we use a set of 33 HD videos, each lasting 30 seconds, and convert them from 1080p to an iPod H.264 video format using just the CPU. The time taken to convert these videos gives us our result.
XVC is a little odd in how it arranges its multicore processing. For our set of 33 videos, it will arrange them in batches of threads – so if we take the 8 thread FX-8350, it will arrange the videos into 4 batches of 8, and then a fifth batch of one. That final batch will only have one thread assigned to it (!), and will not get a full 8 threads worth of power. This is also why the 2x X5690 finishes in 6 seconds but the normal X5690 takes longer – you would expect a halving of time moving to two CPUs but XVC arranges the batches such that there is always one at the end that only gets a single thread.
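The batching behaviour described above can be written down explicitly. This is a sketch of the arithmetic, not XVC's actual scheduler:

```python
def xvc_batches(n_videos, n_threads):
    """Group videos into full batches of `n_threads`; any remainder
    forms a final, under-occupied batch that only keeps one thread
    busy per video."""
    full, remainder = divmod(n_videos, n_threads)
    batches = [n_threads] * full
    if remainder:
        batches.append(remainder)
    return batches

print(xvc_batches(33, 8))   # [8, 8, 8, 8, 1] – four full batches plus a straggler
```

Running `xvc_batches(33, 24)` for the dual X5690 setup gives `[24, 9]`: the second batch is far from full, which is why doubling the thread count does not halve the conversion time.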
Rendering – PovRay 3.7
The Persistence of Vision RayTracer, or PovRay, is a freeware package for, as the name suggests, ray tracing. It is a pure renderer rather than modeling software, but the latest beta version contains a handy benchmark for stressing all processing threads on a platform. We have been using this test in motherboard reviews to test memory stability at various CPU speeds to good effect – if it passes the test, the IMC in the CPU is stable at a given CPU speed. As a CPU test, it runs for approximately 2-3 minutes on high end platforms.
The SMP engine in PovRay is not perfect, though doubling up on CPUs gives almost a 2x effect. The results from this test are great – here we see an FX-8350 below an i7-3770K (with MCT) until the Core Parking updates are applied, after which the FX-8350 performs better! The 4770K also has a chance to flex its compute muscles, performing almost as well as the six-core Westmere CPU.
Video Conversion - x264 HD Benchmark
The x264 HD Benchmark uses a common HD encoding tool to process an HD MPEG2 source at 1280x720 at 3963 Kbps. This test represents a standardized result which can be compared across other reviews, and is dependent on both CPU power and memory speed. The benchmark performs a 2-pass encode, and the results shown are the average of each pass performed four times.
Grid Solvers - Explicit Finite Difference
For any grid of regular nodes, the simplest way to calculate the next time step is to use the values of the nodes around it. This makes for easy mathematics and parallel simulation, as each node calculated depends only on the previous time step, not on the nodes around it in the current time step. By choosing a regular grid, we reduce the level of memory access required compared to irregular grids. We test both 2D and 3D explicit finite difference simulations with 2^n nodes in each dimension, using OpenMP as the threading model in single precision. The grid is isotropic and the boundary conditions are sinks. Values are floating point, with memory cache sizes and speeds playing a part in the overall score.
Grid solvers do love a fast processor and plenty of cache in order to store data. When moving up to 3D, it is harder to keep that data within the CPU and spending extra time coding in batches can help throughput. Our simulation takes a very naïve approach in code, using simple operations.
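As a concrete (and deliberately naïve) sketch of what one explicit step looks like, here is a 2D heat-equation stencil with sink boundaries. This is our assumption of the general technique the text describes, not the benchmark's actual source, and it uses NumPy rather than OpenMP:

```python
import numpy as np

def explicit_step(u, alpha=0.2):
    """One explicit finite-difference step: each interior node is
    updated only from the *previous* time step's neighbours, which is
    what makes the method trivially parallel across nodes."""
    new = u.copy()
    new[1:-1, 1:-1] = u[1:-1, 1:-1] + alpha * (
        u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
        - 4.0 * u[1:-1, 1:-1]
    )
    new[0, :] = new[-1, :] = new[:, 0] = new[:, -1] = 0.0  # sink boundaries
    return new

n = 2 ** 5                       # 2^n nodes per dimension, as in the text
u = np.zeros((n, n), dtype=np.float32)
u[n // 2, n // 2] = 1.0          # point source in the middle
for _ in range(10):
    u = explicit_step(u)
```

Note that `alpha` must stay below 0.25 for this 2D stencil to remain stable, which is exactly the kind of stability restriction the implicit method below relaxes.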
Grid Solvers - Implicit Finite Difference + Alternating Direction Implicit Method
The implicit method takes a different approach to the explicit method – instead of calculating one unknown in the new time step from known elements in the previous time step, we consider that an old point can influence several new points by way of simultaneous equations. This adds to the complexity of the simulation – the grid of nodes is solved as a series of rows and columns rather than points, reducing the parallel nature of the simulation by a dimension and drastically increasing the memory requirements of each thread. The upside, as noted above, is the less stringent stability rules relating to time steps and grid spacing. For this we simulate a 2D grid of 2^n nodes in each dimension, using OpenMP in single precision. Again our grid is isotropic with the boundaries acting as sinks. Values are floating point, with memory cache sizes and speeds playing a part in the overall score.
A 2D implicit calculation is harsher than an explicit one – each thread needs a lot more memory, which only grows as the size of the simulation increases.
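Each of those row or column solves reduces to a tridiagonal system; the Thomas algorithm below is a minimal sketch of that per-row work unit. It is our illustration of the standard technique, not the code used in the article:

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system, the per-row/per-column work unit of
    an implicit (ADI-style) sweep. `a` is the sub-diagonal, `b` the
    diagonal, `c` the super-diagonal and `d` the right-hand side (all
    length n; a[0] and c[-1] are unused)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

Unlike the explicit stencil, each solve needs scratch arrays the full length of a row, which is the per-thread memory growth the text refers to.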
Point Calculations - n-Body Simulation
When a series of heavy mass elements are in space, they interact with each other through the force of gravity. Thus when a star cluster forms, the interaction of every large mass with every other large mass defines the speed at which these elements approach each other. When dealing with millions and billions of stars on such a large scale, the movement of each of these stars can be simulated through the physical theorems that describe the interactions. The benchmark detects whether the processor is SSE2 or SSE4 capable, and implements the relevant code. We run a simulation of 10240 particles of equal mass – the output for this code is in terms of GFLOPs, and the result recorded is the peak GFLOPs value.
As we only look at base/SSE2/SSE4 depending on the processor (auto-detection), we don’t see full AVX numbers in terms of FLOPs.
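A minimal direct-summation kernel of the kind being timed might look like the sketch below. This is illustrative only: the benchmark itself runs vectorized SSE2/SSE4 paths, and the softening term `eps` is our own addition to avoid a divide-by-zero:

```python
import math

def nbody_accelerations(positions, masses, g=1.0, eps=1e-3):
    """Naive O(n^2) gravity kernel: each particle's acceleration is the
    sum of contributions from every other particle. Each pairwise
    interaction costs a fixed handful of FLOPs, which is how a GFLOPs
    figure falls out of particle count and step rate."""
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(n):
            if i == j:
                continue
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps  # softened distance^2
            f = g * masses[j] / (r2 * math.sqrt(r2))
            acc[i][0] += f * dx
            acc[i][1] += f * dy
            acc[i][2] += f * dz
    return acc
```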
Our first analysis is with the perennial reviewers’ favorite, Metro 2033. It occurs in a lot of reviews for a couple of reasons – it has a very easy to use benchmark GUI that anyone can use, and it is often very GPU limited, at least in single GPU mode. Metro 2033 is a strenuous DX11 benchmark that can challenge most systems that try to run it at any high-end settings. Developed by 4A Games and released in March 2010, we use the inbuilt DirectX 11 Frontline benchmark to test the hardware at 1440p with full graphical settings. Results are given as the average frame rate from a second batch of 4 runs, as Metro has a tendency to inflate the scores for the first batch by up to 5%.
With one 7970 at 1440p, every processor is in full x16 allocation and there seems to be no split between any processor with 4 threads or above. Processors with two threads fall behind, but not by much as the X2-555 BE still gets 30 FPS. There seems to be no split between PCIe 3.0 or PCIe 2.0, or with respect to memory.
When we start using two GPUs in the setup, the Intel processors have an advantage, with those running PCIe 2.0 a few FPS ahead of the FX-8350. Both cores and single thread speed seem to have some effect (i3-3225 is quite low, FX-8350 > X6-1100T).
More results in favour of Intel processors and PCIe 3.0, the i7-3770K in an x8/x4/x4 surpassing the FX-8350 in an x16/x16/x8 by almost 10 frames per second. There seems to be no advantage to having a Sandy Bridge-E setup over an Ivy Bridge one so far.
While we have limited results, PCIe 3.0 wins against PCIe 2.0 by 5%.
From dual core AMD all the way up to the latest Ivy Bridge, results for a single GTX 580 are all roughly the same, indicating a GPU throughput limited scenario.
Similar to one GTX580, we are still GPU limited here.
Metro 2033 conclusion
A few points are readily apparent from Metro 2033 tests – the more powerful the GPU, the more important the CPU choice is, and that CPU choice does not matter until you get to at least three 7970s. In that case, you want a PCIe 3.0 setup more than anything else.
DiRT 3 is a rallying video game, the third title in the DiRT offshoot of the Colin McRae Rally series, developed and published by Codemasters. DiRT 3 also falls under the list of ‘games with a handy benchmark mode’. In previous testing, DiRT 3 has always seemed to love cores, memory, GPUs, PCIe lane bandwidth, everything. The small issue with DiRT 3 is that depending on the benchmark mode tested, the launcher is not indicative of game play per se, citing numbers higher than actually observed. Despite this, the benchmark mode also includes an element of uncertainty by actually driving a race, rather than running a predetermined sequence of events as in Metro 2033. This in essence should make the benchmark more variable, but we take repeated runs in order to smooth this out. Using the benchmark mode, DiRT 3 is run at 1440p with Ultra graphical settings. Results are reported as the average frame rate across four runs.
While the testing shows a fairly distinct split between Intel and AMD at around the 82 FPS mark, all processors are roughly +/- 1 or 2 FPS around this mark, meaning that even an A8-5600K will feel like the i7-3770K. The 4770K has a small but ultimately unnoticeable advantage in gameplay.
When reaching two GPUs, the Intel/AMD split is getting larger. The FX-8350 puts up a good fight against the i5-2500K and i7-2600K, but the top i7-3770K offers almost 20 FPS more and 40 more than either the X6-1100T or FX-8150.
Moving up to three GPUs, DiRT 3 jumps on the PCIe bandwagon, enjoying bandwidth and cores as much as possible. Despite this, the gap to the best AMD processor is growing – almost 70 FPS between the FX-8350 and the i7-3770K. The 4770K is slightly ahead of the 3770K at x8/x4/x4, suggesting a small IPC difference.
At four GPUs, bandwidth wins out, and the PLX effect on the UP7 seems to cause a small dip compared to the native lane allocation on the RIVE (there could also be some influence due to 6 cores over 4).
Similar to the one 7970 setup, using one GTX 580 has a split between AMD and Intel that is quite noticeable. Despite the split, all the CPUs perform within 1.3 FPS, meaning no big difference.
Moving to dual GTX 580s, and while the split gets bigger, processors like the i3-3225 are starting to lag behind. The difference between the best AMD and best Intel processor is only 2 FPS though, nothing to write home about.
DiRT 3 conclusion
Much like Metro 2033, DiRT 3 has a GPU barrier and until you hit that mark, the choice of CPU makes no real difference at all. In this case, at two-way 7970s, choosing a quad core Intel processor does the business over the FX-8350 by a noticeable gap that continues to grow as more GPUs are added, (assuming you want more than 120 FPS).
A game that has plagued my testing over the past twelve months is Civilization V. The older 12.3 Catalyst drivers were somewhat of a nightmare, giving no scaling, and as a result I dropped the game from my test suite after only a couple of reviews. With the later drivers used for this review, the situation has improved, but only slightly, as you will see below. Civilization V seems to run into a scaling bottleneck very early on, and any additional GPU allocation only causes worse performance.
Our Civilization V testing uses Ryan’s GPU benchmark test all wrapped up in a neat batch file. We test at 1440p, and report the average frame rate of a 5 minute test.
Civilization V is the first game where we see a gap when comparing processor families. A big part of what makes Civ5 perform at its best seems to be PCIe 3.0, followed by CPU performance – our PCIe 2.0 Intel processors are a little behind the PCIe 3.0 models. By virtue of not having a PCIe 3.0 AMD motherboard in for testing, the bad rap falls on AMD until PCIe 3.0 reaches their mainstream platforms.
The power of PCIe 3.0 is more apparent with two 7970 GPUs, however it is worth noting that only processors such as the i5-2500K and above have actually improved their performance with the second GPU. Everything else stays relatively similar.
More cores and PCIe 3.0 are winners here, but no GPU configuration has scaled above two GPUs.
Again, no scaling.
While the top end Intel processors again take the lead, an interesting point is that now we have all PCIe 2.0 values for comparison, the non-hyper threaded 2500K takes the top spot, 10% higher than the FX-8350.
We have another Intel/AMD split, by virtue of the fact that none of the AMD processors scaled above the first GPU. On the Intel side, you need at least an i5-2500K to see scaling, similar to what we saw with the 7970s.
Civilization V conclusion
Intel processors are the clear winner here, though not one stands out over the other. Having PCIe 3.0 seems to be the positive point for Civilization V, but in most cases scaling is still out of the window unless you have a monster machine under your belt.
While not necessarily a game on everybody’s lips, Sleeping Dogs is a strenuous game with a pretty hardcore benchmark that scales well with additional GPU power. The team over at Adrenaline.com.br are supreme for making an easy to use benchmark GUI, allowing a numpty like me to charge ahead with a set of four 1440p runs with maximum graphical settings.
Sleeping Dogs seems to tax the CPU so little that the only CPU that falls behind by the smallest of margins is an E6400 (and the G465 which would not run the benchmark). Intel visually takes all the top spots, but AMD is all in the mix with less than 0.5 FPS splitting an X2-555 BE and an i7-3770K.
A split starts to develop between Intel and AMD again, although you would be hard pressed to choose between the CPUs, as everything above an i3-3225 scores 50-56 FPS. The X2-555 BE unfortunately drops off, suggesting that Sleeping Dogs is a fan of cores and this little CPU is found lacking.
At three GPUs the gap is there, with the best Intel processors over 10% ahead of the best AMD. Neither PCIe lane allocation nor memory seems to be playing a part – it is simply a case of thread count first, then single thread performance.
Despite our Beast machine having double the threads, an i7-3960X in PCIe 3.0 mode takes top spot.
It is worth noting the scaling in Sleeping Dogs. The i7-3960X moved from 28.2 -> 56.23 -> 80.85 -> 101.15 FPS, with the fourth card adding around 72% of a single card's performance. This speaks of a well written game more than anything.
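Those scaling figures can be sanity-checked with a line of arithmetic, using the FPS numbers quoted above:

```python
def incremental_scaling(fps):
    """Express each extra GPU's gain as a fraction of a single card's
    performance, given a list of FPS results for 1..n GPUs."""
    single = fps[0]
    return [(fps[i] - fps[i - 1]) / single for i in range(1, len(fps))]

# i7-3960X Sleeping Dogs figures: the second, third and fourth cards
# add roughly 99%, 87% and 72% of a single card respectively.
gains = incremental_scaling([28.2, 56.23, 80.85, 101.15])
```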
There is almost nothing to separate every CPU when using a single GTX 580.
Same thing with two GTX 580s – even an X2-555 BE is within 1 FPS (3%) of an i7-3960X.
Sleeping Dogs Conclusion
Due to the successful scaling and GPU limited nature of Sleeping Dogs, almost any CPU you throw at it will get the same result. When you move into three GPUs or more territory, it seems that having the single thread CPU speed of an Intel processor gets a few more FPS at the end of the day.
Because we have only managed to get hold of the top Haswell processor thus far, it is a little difficult to see where Haswell lies. On the face of it, Haswell is more than adequate in our testing scenario for a single GPU experience and will perform as well as a mid-range CPU. It is when you start moving up into more GPUs, more demanding games and higher resolutions that the big boys start to take control.
On almost all fronts, the i7-4770K is the preferred chip over anything Sandy Bridge-E – if not by virtue of its single threaded speed, then due to the price difference. Sandy Bridge-E is still there if you need the raw CPU horsepower for other things.
Our analysis also shows that without the proper configuration in the BIOS, having a GPU at PCIe 2.0 x1 is really bad for scaling. On the ASUS Z87 Pro, the third full-length PCIe slot is at x1 bandwidth, as it shares the four PCIe lanes from the chipset with other controllers on board – if it is moved up to PCIe 2.0 x4, then the other controllers are disabled. Nonetheless, scaling at either PCIe 2.0 x1 or x4 cannot compete with a proper PCIe 3.0 x8/x4/x4 setup.
Over the course of Haswell's lifetime, we will update the results as we get hold of PLX enabled motherboards for some of those x8/x8/x8/x8 layouts, not to mention the odd-looking PCIe 3.0 x8/x4/x4 + PCIe 2.0 x4 layouts seen on a couple of motherboards in our Z87 motherboard preview.
As mentioned in our last Gaming CPU testing, the results show several points worth noting.
Firstly, it is important to test accurately, fairly, and in good faith. Choosing to perform a comparative test while misleading the audience through a lack of understanding of how it works underneath is a poor game to play. Leave the bias at home and let the results do the talking.
In three of our games, with a single GPU the choice of CPU makes almost no difference to performance. Civilization V was the sole exception, and it also has issues scaling when you add more GPUs unless you have the most expensive CPUs on the market. For Civilization V, I would suggest having only a single GPU and trying to get the best out of it.
In Dirt 3, Sleeping Dogs and Metro 2033, almost every CPU performed the same in a single GPU setup. Moving up in GPU count, Dirt 3 leaned towards PCIe 3.0 above two GPUs, Metro 2033 started to lean towards AMD GPUs, and Sleeping Dogs was agnostic.
Above three GPUs, the extra horsepower from the single thread performance of an Intel CPU was starting to make sense, with as much as 70 FPS difference in Dirt 3. Sleeping Dogs was also starting to become sensitive to CPU choice.
We Know What Is Missing
As it has only been a month or so since the last Gaming CPU update, and with my hands deep in Haswell testing, new CPUs have not been streaming through the mail. However, due to suggestions from readers and a little digging, I currently have the following list to acquire and test/retest:
Xeon E3-1220L v2
Athlon II X2 220
Athlon II X2 250
Athlon II X2 280
Athlon II X3 425
Athlon II X3 460
Phenom II X3 740
Phenom II X4 820
Phenom II X4 925
Phenom II X6 1045T
A8-5600K + Core Parking retest
A10-5800K + Core Parking retest
As you can imagine, that is quite a list, and I will be breaking it down into sections and updates for everyone.
But for now, onto our recommendations.
Recommendations for the Games Tested at 1440p/Max Settings
A CPU for Single GPU Gaming: A8-5600K + Core Parking updates
If I were gaming today on a single GPU, the A8-5600K (or non-K equivalent) would strike me as a price competitive choice for frame rates, as long as you are not a big Civilization V player and do not mind the single threaded performance. The A8-5600K scores within a percentage point or two across the board in single GPU frame rates with both a HD7970 and a GTX580, and feels the same in the OS as an equivalent Intel CPU. The A8-5600K will also overclock a little, giving a boost, and comes in at a stout $110, meaning that some of those $$$ can go towards a beefier GPU or an SSD. The only downside is if you are planning some heavy OS work – if the software is Piledriver-aware, all is well, but most software is not, and perhaps an i3-3225 or FX-8350 might be worth a look.
It is possible to consider the non-IGP versions of the A8-5600K, such as the FX-4xxx variant or the Athlon X4 750K BE. But as we have not had these chips in to test, it would be unethical to suggest them without having data to back them up. Watch this space, we have processors in the list to test.
A CPU for Dual GPU Gaming: i5-2500K or FX-8350
Looking back through the results, moving to a dual GPU setup obviously has some issues. Various AMD platforms are not certified for dual NVIDIA cards, for example, meaning that while they may excel with AMD cards, you cannot recommend them for team Green. There is also the dilemma that while in certain games you can be fairly GPU limited (Metro 2033, Sleeping Dogs), there are others where having the CPU horsepower can double the frame rate (Civilization V).
After the overview, my recommendation for dual GPU gaming comes in at the feet of the i5-2500K. This recommendation may seem odd – these chips are not the latest from Intel, but chances are that pre-owned they will be hitting a nice price point, especially if/when people move over to Haswell. If you were buying new, the obvious answer would be looking at an i5-3570K on Ivy Bridge rather than the 2500K, so consider this suggestion a minimum CPU recommendation.
On the AMD side, the FX-8350 puts up a good show across most of the benchmarks, but falls spectacularly in Civilization V. If this is not the game you are aiming for and want to invest AMD, then the FX-8350 is a good choice for dual GPU gaming.
A CPU for Tri-GPU Gaming: i7-4770K with an x8/x4/x4 (AMD) or PLX (NVIDIA) motherboard
By moving up in GPU power we also have to boost the CPU power in order to see the best scaling at 1440p. It might be a sad thing to hear, but the only CPUs in our testing that provide the top frame rates at this level are the top line Ivy Bridge and Haswell models. For a comparison point, the Sandy Bridge-E 6-core results were often very similar, but the price jump to such a setup is prohibitive to all but the most sturdy of wallets. Of course we would suggest Haswell over Ivy Bridge on the basis that Haswell is the newer platform, but users who can get hold of the i7-3770K in a sale would reap the benefits.
As noted in the introduction, using 3-way on NVIDIA with Ivy Bridge/Haswell will require a PLX motherboard in order to get enough lanes to satisfy the SLI requirement of x8 minimum per CPU. This also raises the bar in terms of price, as PLX motherboards start around the $280 mark. For a 3-way AMD setup, an x8/x4/x4 enabled motherboard performs similarly to a PLX enabled one, and ahead of the slightly crippled x8/x8 + x4 variations. However investing in a PLX board would help moving to a 4-way setup should that be your intended goal. In either scenario, the i7-3770K or i7-4770K are the processors of choice from our testing suite.
A CPU for Quad-GPU Gaming: i7-3770K with a PLX motherboard
So our recommendation for four-way, based on results, would nominally be an i7-3770K. We cannot recommend the 4770K as of yet, as we have no data to back it up! That data will be coming in the next update, and if we had to predict, the 4770K would be the preferential chip based on its single thread speed and newer platform.
Even so, a four-way GPU configuration is for those insane few users who have both the money and the physical requirement for pixel power. We are all aware of the law of diminishing returns, and more often than not adding that fourth GPU is taking the biscuit at most resolutions. Despite this, even at 1440p, we see awesome scaling in games like Sleeping Dogs (+73% of a single card moving from three to four cards), and more recently I have seen four-way GTX 680s give BF3 at Ultra settings a healthy 35 FPS minimum on a 4K monitor. So while four-way setups are insane, there is clearly a usage scenario where it matters to have card number four.
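To make the "+73% of a single card" metric concrete, here is a small sketch of the arithmetic. The FPS figures below are invented for illustration – they are not our measured Sleeping Dogs numbers – but they show how the gain from adding one more card is expressed as a fraction of single-card performance.

```python
def scaling_gain(fps_n, fps_n_plus_1, fps_single):
    """Gain from adding one more card, as a fraction of a single card's FPS."""
    return (fps_n_plus_1 - fps_n) / fps_single

# Hypothetical numbers: one card at 40 FPS, three cards at 105 FPS,
# four cards at 134 FPS. The fourth card then adds (134-105)/40 = 0.725
# of a single card -- roughly the +73% scaling quoted in the text.
print(round(scaling_gain(105, 134, 40), 3))  # 0.725
```

Perfect scaling would give a value of 1.0 for every card added; diminishing returns show up as this fraction shrinking with each extra GPU.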
Our testing was pretty clear as to which CPUs are needed at 1440p with fairly powerful GPUs. While the i7-2600K was nearly there in all our benchmarks, only two sets of CPUs secured the highest frame rates – the i7-3770K/4770K and any six-core Sandy Bridge-E. As mentioned in the three-way conclusion, the price barrier to SB-E is a big step for most users (even if they are splashing out $1500+ on four big cards), giving the nod to an Ivy Bridge configuration. Of course that CPU will have to be paired with a PLX enabled motherboard as well.
One could argue that with overclocking the i7-2600K could come into play, and I do not doubt that is the case. People building three and four way GPU monsters are more than likely to run extra cooling and overclock. Unfortunately that adds plenty of variables and extra testing, which will have to wait for a later date. For now our recommendation at stock, for 4-way at 1440p, is the i7-3770K.
What We Have Not Tested
In the intro to this update, I addressed a couple of points regarding testing at 1440p over 1080p, as well as reasons for not using FCAT or reporting minimum FPS. But one of the bigger issues brought up in the first Gaming CPU article comes from the multiplayer gaming perspective, such as dealing with a 64-player map in BF3. This is going to be a CPU intensive situation for sure, dealing with the network interface as well as updating the GPU and processing. The only issue from our side is repeatable testing. I focused a lot on the statistics of reporting benchmark results, and getting a consistent multiplayer environment for game testing that can be viewed objectively is, for all intents and purposes, impossible. Sure, I could play a few rounds in every configuration, but FPS numbers would be all over the place based on how the rounds went. I would not be happy publishing such data and then basing recommendations on it.
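The repeatability concern can be put in statistical terms. A quick way to judge whether a benchmark is consistent enough to publish is the coefficient of variation (standard deviation divided by the mean) across repeated runs. The FPS figures below are hypothetical, chosen only to contrast a tight time demo with the kind of spread multiplayer rounds would produce.

```python
import statistics

def cv(fps_runs):
    """Coefficient of variation: relative spread across repeated runs."""
    return statistics.stdev(fps_runs) / statistics.mean(fps_runs)

timedemo = [61.2, 61.5, 60.9, 61.3]   # a repeatable time demo: sub-1% spread
mp_rounds = [48.0, 71.0, 55.0, 66.0]  # hypothetical 64-player rounds: ~17% spread

print(cv(timedemo) < 0.02)   # True: tight enough to draw conclusions from
print(cv(mp_rounds) > 0.10)  # True: too noisy to base recommendations on
```

With run-to-run variation an order of magnitude larger than the differences between mid-range CPUs, multiplayer numbers would drown the signal we are trying to measure.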
The purpose of the data in this article is to help buying decisions based on the games at hand. A reader who plays more strenuous games should recognize that riding the cusp of a CPU performance boundary might not be the best route, especially when modifications come into play that drag the frame rates right down or cause more complex calculations to be performed. In that situation it makes sense to play it safe with a more powerful processor, and as such our recommendations may not necessarily apply. The recommendations try to find a balance between performance, price, and the state of affairs tested in this article at the present time. If a user knows that future titles are going to be demanding and they need a system for the next 3-5 years, some future proofing will have to form part of the personal decision when it comes down to paying for hardware.
When friends or family come up to me and say 'I want to play X and have Y to spend' (not an uncommon occurrence), I try to match what they want with their budget – gaming typically gets a big GPU to begin with and then a processor to match, depending on what sort of games they play. With more CPUs under our belt here at AnandTech, and an added element of understanding of where the data comes from and how it was obtained, we hope to help make such decisions.
As always, we are open to suggestions! I have had requests for Bioshock Infinite and Tomb Raider to be included – unfortunately each new driver update is still increasing performance for these titles, meaning that our numbers would not be relevant next quarter without a full retest. I will hopefully put them in the testing with the next driver update.