Original Link: http://www.anandtech.com/show/7189/choosing-a-gaming-cpu-september-2013
Choosing a Gaming CPU October 2013: i7-4960X, i5-4670K, Nehalem and Intel Update
by Ian Cutress on October 3, 2013 10:05 AM EST
Quad Core with Hyperthreading versus Quad Core
Back in April we launched our first set of benchmarks relating to which CPU we should choose for gaming. To that list we now add results from several Intel CPUs, including the vital data point of the quad core i5-4670K, some other Haswell CPUs, the new extreme i7-4960X processor and some vintage Nehalem CPUs we could not get hold of for the first round of results.
Many thanks go to GIGABYTE for the loan of the Haswell+Nehalem CPUs for this update and for use of an X58A-UD9.
The i5-4670K provides a salient data point in our testing – the question is always asked whether having more threads makes a difference. Hyperthreading allows the processor to present each physical core as two logical cores, though sometimes at the expense of single-thread speed when both logical threads on a core are busy. The i5-4670K also lands on the budget side of the equation if we are talking pricing, currently retailing for $240 compared to $340 for the i7-4770K. It is often suggested that the overclockable i5 should offer similar performance to the i7, and our inquisitive minds at AnandTech always want to settle the important questions in our testing.
Alongside the i5-4670K, this update also tests an i5-4430, which at the time of testing is Intel’s slowest quad core part from the initial Haswell release. The dual core parts have yet to reach our testbeds, so they will follow later. At the other end of the scale we are testing the ultimate high end processor, the newly released i7-4960X, offering six hyperthreaded cores at a 4.0 GHz turbo frequency. On the back of our Crystalwell testing, the CPU results from the i7-4750HQ are included, and at the request of some of our readers, I was also able to source a pair of Haswell Xeons for testing – the E3-1280 v3 and the E3-1285 v3. The only difference between these two chips is the presence of the IGP on the 1285, which raises the official TDP by two watts. For users who need neither overclocking nor an IGP, the E3-1280 v3 is a potential choice, with a slightly higher clock speed than the i7-4770K, all the benefits of a Xeon, and a $50 price difference.
Due to the time it takes to test any CPU for this article, it was nearly impossible to go through all previous generations of processors from both AMD and Intel, let alone a wide variety showing where clock speeds and cache sizes are important. However, for this Intel update, three LGA1366 CPUs managed to pass my way for a few weeks. The top selling i7-920 is part of this trio, along with the i7-950, which acted as a slightly more expensive upgrade, and the full-fat i7-990X, which was the i7-4960X of its day in terms of busting a wallet buckle or two. The first two in that list are quad cores with hyperthreading, whereas the i7-990X is a hyperthreaded hexa-core. Clearly Nehalem (and Westmere) suffer an IPC disadvantage compared to Sandy Bridge, Ivy Bridge and Haswell, but it is important to test where such a ‘performance platform’ sits in the grand scheme of things.
WHERE IS THE AMD?!?
Next update! I currently have several AMD CPUs in to test (Richland, Trinity, even a Sempron or two and a Llano) and have requested at least a half dozen more from various sources (Piledriver dual/quad module, Athlon II X4), as well as a CPU or two from AM2/AM2+. The Intel CPUs landed in my office first, and it made sense to split the testing into two separate articles. But rest assured, I hope that FX-6xxx, FX-4xxx and A10-6xxx numbers will be on their way soon. Of course, the FX-9590 and its counterpart are also on my list as and when we can get hold of media samples.
Your Games are Old and do not Consider Multiplayer!
This is not an uncommon criticism of this article and the format it takes. In order to be honest with my results, I have chosen titles which have ceased to be boosted by regular driver updates. Due to the level of testing (one CPU can take 20+ hours including setup, CPU tests and GPU tests), we need a stable platform for comparison. I go into detail on the next page on our testing procedure, but one important aspect of our testing is consistency and repeatability. Almost no multiplayer (MP) scenario can offer this while at the same time maintaining enough testing throughput for the results to remain at least partially relevant.
My next big update for games and drivers will be in 2014, hopefully with a GPU update. I hope this will entail more thorough testing (minimum FPS + average FPS), along with a move from our GTX 580s to something more powerful with PCIe 3.0 on the NVIDIA side. We are currently looking at Bioshock Infinite and Tomb Raider as possible avenues, and a couple of other titles look interesting.
Format Of This Article
On the next couple of pages, I will start by going through the reasons for this article. Many of the reasons are the same as the previous Haswell Update, but for consistency and clarity it makes sense to at least repeat them for new readers coming to read the results.
I will also list in detail our hardware for this review, including CPUs, motherboards, GPUs and memory. Then we will move to the actual hardware setups, with CPU speeds and memory timings detailed.
Also important to note are the motherboards being used – for completeness I have tested several CPUs in two different motherboards because of GPU lane allocations. We are living in an age where PCIe switches and additional chips are used to expand GPU lane layouts, so much so that there are up to 20 different configurations for Z77/Z87 motherboards alone. Sometimes the lane allocation makes a difference, and it can make a large difference using three or more GPUs (x8/x4/x4 vs. x16/x8/x8 with PLX), even with the added latency sometimes associated with the PCIe switches. Our testing over time will include the majority of the PCIe lane allocations on modern setups – for our first article we are looking at the major ones we are likely to come across.
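As a rough illustration of why lane allocation matters, the sketch below computes the nominal link bandwidth each slot gets under the two layouts mentioned above. It assumes the standard PCIe encoding overheads (8b/10b for PCIe 2.0, 128b/130b for PCIe 3.0) and ignores packet overhead and the added latency of a switch; the function name is our own, for illustration only. Note that a PLX switch still feeds all of its slots from the CPU's 16 upstream lanes, so the x16/x8/x8 figures are per-slot link widths, not extra total bandwidth to the CPU.

```python
# Nominal usable bandwidth per lane, per direction, in MB/s.
# PCIe 2.0: 5 GT/s with 8b/10b encoding; PCIe 3.0: 8 GT/s with 128b/130b encoding.
# Packet/protocol overhead is ignored, so real throughput is somewhat lower.
PER_LANE_MBPS = {
    "2.0": 5000 * 8 / 10 / 8,     # = 500.0 MB/s
    "3.0": 8000 * 128 / 130 / 8,  # ~= 984.6 MB/s
}


def slot_bandwidth(layout, gen="3.0"):
    """Return the per-slot link bandwidth in MB/s for a lane layout such as (8, 4, 4)."""
    per_lane = PER_LANE_MBPS[gen]
    return [lanes * per_lane for lanes in layout]


for name, layout in [("x8/x4/x4", (8, 4, 4)), ("x16/x8/x8 (PLX)", (16, 8, 8))]:
    per_slot = [round(mb / 1000, 2) for mb in slot_bandwidth(layout)]
    print(f"{name:>16}: {per_slot} GB/s per slot at PCIe 3.0")
```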
The results pages will start with a basic CPU analysis, running through my regular motherboard tests on the CPU. This should give us a feel for how much power each CPU has in dealing with mathematics and real world tests, both for integer operations (important on Bulldozer/Piledriver/Radeon) and floating point operations (where Intel/NVIDIA seem to perform best).
We will then move to each of our four gaming titles in turn, in our six different GPU configurations. As discussed on the next page, in GPU limited scenarios it may seem odd if a sub-$100 CPU scores higher than one north of $300, but we hope to explain the trend of results as we go.
This will be an ongoing project here at AnandTech, and over time we can add more CPUs, more in-depth testing, perhaps even an extreme four-way setup should that be available to us. The only danger is that on a driver or game change, it takes another chunk of time to get data! Any suggestions of course are greatly appreciated – drop me an email at email@example.com.
The Importance of Data
In order to keep consistency, I want this article to contain all the information we had in the previous article rather than just reference back – I personally find applying statistics to the data we obtain (and how we obtain it) very important. The new CPUs will be highlighted, and any adjustments to our conclusions will also be published. I also want to answer some of the questions raised by our previous Gaming CPU articles.
Where to Begin?
One question when building or upgrading a gaming system is which CPU to choose - does it matter if I have a quad core from Intel or a quad module from AMD? Perhaps something simpler will do the trick, and I can spend the difference on the GPU. What if you are running a multi-GPU setup - does the CPU have a bigger effect? This was the question I set out to help answer.
A few things before we start:
For the sake of expediency I could not select 10 different gaming titles across a variety of engines and then test them in seven or more different configurations per game and per CPU, nor could I test every CPU ever made. As a result, on the gaming side, I limited myself to one resolution, one set of settings, and four very regular testing titles that offer time demos: Metro 2033, Dirt 3, Civilization V and Sleeping Dogs. This is obviously not Skyrim, Battlefield 3, Crysis 3 or Far Cry 3, which may be more relevant to your setup.
The arguments for and against time demo testing, as well as for taking FRAPS values of sequences, are well documented (time demos might not be representative, vs. the consistency and realism of FRAPSing a repeated run across a field); however, all of our tests can be run on home systems to get a feel for how a system performs. Below is a discussion regarding AI, one of the common uses for a CPU in a game, and how it affects the system. Out of our benchmarks, Dirt 3 plays an actual race, including AI in the result, and the turn-based Civilization V has no concern for direct AI except for time between turns.
All this combines with my unique position as the motherboard senior editor here at AnandTech – the position gives me access to a wide variety of motherboard chipsets, lane allocations and a fair number of CPUs. GPUs are not necessarily in large supply on my side of the reviewing area, but ASUS and ECS have provided my test beds with HD 7970s and GTX 580s respectively, and these cards have been a quintessential part of my test beds for almost two years. The task set before me in this review would be almost a career in itself if we were to expand to more GPUs and more multi-GPU setups. Thus testing up to 4x 7970 and up to 2x GTX 580 is a more than reasonable place to start.
Where It All Began
The most important point to note is how this set of results came to pass. In 2012 I came across a few sets of testing by other review websites that floored me – simple CPU comparison tests for gaming which were spreading like wildfire among the forums, and some results contradicted the general prevailing opinion on the topic. These results were pulling all sorts of lurking forum users out of the woodwork to have an opinion, and being the well-adjusted scientist I am, I set forth to confirm whether the results were, at least in part, valid. What came next was a shock – some of the results posted online came with no real explanation of the hardware setups. While a basic overview of the hardware was supplied, there was no rundown of the settings used, and no attempt to justify the findings which had obviously caused quite a stir. Needless to say, I was stunned by the lack of thorough testing, as well as by both the results and much of the conversation that followed, particularly from avid fans of Team Blue and Team Red. I planned to right this wrong the best way I know how – with science!
The other reason for pulling together the results in this article is perhaps the one I originally started with – the need to update drivers every so often. From the Ivy Bridge release to Haswell I had been using Catalyst 12.3 and GeForce 296.10 WHQL on my test beds, even though the latest drivers were 13.1 and 320.90. This causes problems – older drivers are not optimized, readers sometimes complain if older drivers are used, and new games cannot be added to the test bed because they might not scale correctly on the older drivers. So while some reviews on the internet update drivers between tests and keep the old numbers (leading to skewed results), actually taking time out to retest a number of platforms on the new drivers alone is a large undertaking. For example, testing new drivers over six platforms (CPU/motherboard combinations) would mean: six platforms, four games, seven different GPU configurations, ~10 minutes per test, plus 2+ hours to set up each platform and install a new OS, drivers and benchmarks. That makes 20-40+ hours of solid testing (if all goes well), or up to a full working week per CPU – more if I also test CPU performance for a computational benchmark update, or exponentially more if I include multiple resolutions and settings. If this is all that is worked on that week, it means no new content – so it happens rarely, perhaps once a year or before a big launch. This time was now, and when I started this testing I was moving to Catalyst 13.1 and GeForce 310.90, which by the time the first part of the review went live had already been superseded! Now in the official Part 2, we present the testing results from 49 different CPU and motherboard combinations.
Some initial AMD Testing from Part 1
What Does the CPU do in a Game?
A lot of game developers use customized versions of game engines, such as the EGO engine for driving games or the Unreal engine. The engine provides the underpinnings for a lot of the code, and the optimizations therein. The engine also decides what in the game gets offloaded onto the GPU.
Imagine the code that makes up the game as a linear sequence of events in order. In order to go through the game quickly, we need the fastest single core processor available. Of course, games are not like this – lots of the game can be parallelized, such as vector calculations for graphics. These vector calculations were of course the first to be moved from CPU to the GPU. Over time, more parts of the code have made the move – physics and compute being the main features in recent months and years.
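The article does not name it, but the standard way to quantify this trade-off is Amdahl's law: if only a fraction p of the work can run in parallel, the speedup on n cores is 1 / ((1 - p) + p / n). The sketch below applies it with illustrative parallel fractions, which are assumptions and not measurements from any game engine; the function name is our own.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup over one core when only `parallel_fraction`
    of the work can be spread across `cores`; the rest stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0 or cores < 1:
        raise ValueError("parallel_fraction must be in [0, 1] and cores >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


# Illustrative parallel fractions only, not measured from any game.
for p in (0.5, 0.8, 0.95):
    speedups = [round(amdahl_speedup(p, n), 2) for n in (1, 2, 4, 8)]
    print(f"parallel fraction {p:.2f}: speedup on 1/2/4/8 cores = {speedups}")
```

The serial term caps the speedup at 1 / (1 - p) however many cores are added, which is why per-core speed still matters for the parts of a game that remain linear.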
The GPU is good at independent, simple tasks – calculating which color is in which pixel is an example of this, along with additional processing and post-processing features (FXAA and so on). If a task is linear, it lives on the CPU, such as loading textures into memory or negotiating which data to transfer between the memory and the GPUs. The CPU also takes control of independent complex tasks, as the CPU is better suited to complicated logic.
Very few parts of a game come under this heading of ‘independent yet complex’. Anything suitable for the GPU but not ported over will be here, and the big one usually quoted is artificial intelligence. Deciding where an NPC is going to run, shoot or fly could be considered a very complex set of calculations, ideal for fast CPUs. The counter argument is that games have had complex AI for years – the number of times I personally was destroyed by a Dark Sim on Perfect Dark on the N64 is testament to either my uselessness or the fact that complex AI can be configured with not much CPU power. AI is unlikely to be a limiting factor in frame rates due to CPU usage.
What is most likely going to be the limiting factor is how the CPU manages data. As engines evolve, they try to move data between the CPU, memory and GPUs less – if textures can be kept on the GPU, then they will stay there. But some engines are not as perfect as we would like them to be, resulting in the CPU being the limiting factor. As CPU performance increases, and those who write the engines understand the ecosystem better, CPU performance should become less of an issue over time. All roads point towards the PS4 of course, and its 8-core Jaguar processor. Is this all that is needed for a single GPU, albeit in an HSA environment?
Another angle I wanted to test beyond most other websites is multi-GPU. There is content online dealing mostly with single GPU setups, with a few for dual GPU. Even though the number of multi-GPU users is actually quite small globally, the enthusiast market is clearly geared for it. We get motherboards with support for 4 GPU cards; we have cases that will support a dual processor board as well as four double-height GPUs. Then there are GPUs being released with two sets of silicon on a PCB, wrapped in a double or triple height cooler. More often than not on a forum, people will ask ‘what GPU for $xxx’ and some of the suggestions will be towards two GPUs at half the budget each, as this commonly offers more performance than a single GPU if the game and the drivers all work smoothly (at the cost of power, heat, and bad driver scenarios). The ecosystem supports multi-GPU setups, so I felt it right to test at least one four-way setup. However, with great power comes great responsibility – there was no point testing 4-way 7970s at 1080p. Typically in this price bracket, users will go for multi-monitor setups along the lines of 5760x1080, or big monitor setups like 1440p or 1600p, or the mega-rich might try 4K. Ultimately the high end enthusiast, with cash to burn, is going to gravitate towards 4K, and I cannot wait until that becomes a reality. So as a middle ground in all of this, we are testing at 1440p and maximum settings. This will put the strain on our Core2Duo and Celeron G465 samples, but should be easy pickings for our multi-processor, multi-GPU beast of a machine.
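To put 1440p in context, the sketch below counts pixels for the resolutions mentioned above relative to 1080p. "4K" is assumed here to mean 3840x2160 (UHD); the dictionary and function names are our own.

```python
# Common resolutions mentioned above; "4K" is assumed to mean 3840x2160 (UHD).
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "1440p": (2560, 1440),
    "1600p": (2560, 1600),
    "5760x1080": (5760, 1080),
    "4K (UHD)": (3840, 2160),
}


def pixel_ratio(resolution, base=(1920, 1080)):
    """How many times more pixels `resolution` has than `base`."""
    width, height = resolution
    return (width * height) / (base[0] * base[1])


for name, res in RESOLUTIONS.items():
    print(f"{name:>10}: {res[0] * res[1]:>9,} pixels, {pixel_ratio(res):.2f}x 1080p")
```

At roughly 1.78x the pixels of 1080p, 1440p sits well below triple-1080p (3x) and 4K (4x), which is what makes it a middle ground.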
A Minor Problem In Interpreting Results
Throughout testing for this review, there were clearly going to be some issues to consider. Chief among them is consistency, in particular when something like Metro 2033 decides to have an ‘easy’ run which reports 3% higher than normal. For that specific example we get around this by repeat testing, as the easy run typically appears in the first batch – so we run two or three batches of four runs and disregard the first batch.
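The batching procedure can be sketched as follows. The FPS numbers are hypothetical, chosen so the first batch reads about 3% high, and the function name is our own rather than part of any benchmark tool.

```python
from statistics import mean


def batched_average(batches):
    """Average all runs except those in the first batch, which may contain an
    unusually 'easy' run (as described above for Metro 2033)."""
    if len(batches) < 2:
        raise ValueError("need at least two batches so the first can be discarded")
    kept = [fps for batch in batches[1:] for fps in batch]
    return mean(kept)


# Hypothetical FPS values: the first batch of four runs reads ~3% high.
batches = [
    [41.2, 41.3, 41.1, 41.2],
    [40.0, 40.1, 39.9, 40.0],
    [40.1, 39.9, 40.0, 40.0],
]
print(f"{batched_average(batches):.2f} FPS")
```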
The other, perhaps bigger, issue is interpreting results. If I get 40.0 FPS on a Phenom II X4-960T, 40.1 FPS on an i5-2500K, and then 40.2 FPS on a Phenom II X2-555 BE, does that make the results invalid? The important points to recognize here are statistics and system state.
- System State: We have all had times when booting a PC and it feels sluggish, but this sluggish behavior disappears on reboot. The same thing can occur with testing, and it usually happens as a result of bad initialization or a bad cache optimization routine at boot time. As a result, we try to spot these circumstances and re-run. With more time we would take 100 different measurements of each benchmark, with reboots, and cross out the outliers. Time constraints outside of academia unfortunately do not give us this opportunity.
- Statistics: System state aside, frame rate values will often fluctuate around an average. This means (depending on the benchmark) that the result could be +/- a few percent on each run. So what happens if you have a run of 4 time demos, and each of them is 2% above the ‘average’ FPS? From the outside, as you will not know the true average, you cannot say whether the result is valid, as the data set is extremely small. If we take more runs, we can find the variance (in the technical sense of the term) and the standard deviation, and perhaps report the mean, median and mode of a set of results. As always, the main constraint in articles like these is time – the quicker to publish, the less testing, the larger the error bars and the higher the likelihood that some results are skewed because they just so happened to come from a good/bad benchmark run. So the example given above of the X2-555 getting a better result is down to interpretation – each result might be +/- 0.5 FPS on average, and because they are all so similar, we are actually GPU limited. So in this circumstance, it comes down more to whether the GPU had a good or bad run.
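A minimal sketch of that reasoning, applied to the three results from the example above and assuming the +/- 0.5 FPS run-to-run noise mentioned there. The rule that two results only count as different when the gap exceeds both noise bands combined is our own simple heuristic, not a formal significance test, and the function names are hypothetical.

```python
from statistics import mean, median, stdev


def summarize(runs):
    """Mean, median and sample standard deviation of repeated benchmark runs."""
    return {
        "mean": mean(runs),
        "median": median(runs),
        "stdev": stdev(runs) if len(runs) > 1 else 0.0,
    }


def clearly_different(a, b, noise):
    """Treat two averaged results as different only if the gap exceeds the
    combined +/- `noise` band of both (a deliberately simple rule of thumb)."""
    return abs(a - b) > 2 * noise


# The three results from the example above, with an assumed +/- 0.5 FPS noise.
results = {"Phenom II X4-960T": 40.0, "i5-2500K": 40.1, "Phenom II X2-555 BE": 40.2}
print(clearly_different(results["Phenom II X2-555 BE"], results["Phenom II X4-960T"], noise=0.5))
print(summarize(list(results.values())))
```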
For this example, I batched 100 runs of my common WinRAR test from motherboard testing, on an i5-2500K CPU with a Maximus V Formula. Results varied between 71 seconds and 74 seconds, with a large gravitation towards the lower end. To represent this statistically, we normally use a histogram, which separates the results into ‘bins’ (e.g. 71.00 seconds to 71.25 seconds), where the bin width sets how finely the results are resolved. Here is an initial representation of the data (time vs. run number), and a few histograms of that data, using bin sizes of 1.00 s, 0.75 s, 0.5 s, 0.33 s, 0.25 s and 0.1 s.
As we get down to the smaller bin sizes, there is a pair of large groupings of results at ~71 seconds and ~72 seconds. The overall average (mean) of the data is 71.88 seconds, which takes in the outliers around 74 seconds, while the median is 72.04 seconds and the standard deviation is 0.660 seconds. What is the right value to report? Overall average? Peak? Average +/- standard deviation? With the results heavily clustered around two values, what happens if I do 1-3 runs and get ~71 seconds and none around ~72 seconds?
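The binning behind those histograms can be sketched in a few lines. The 100 WinRAR timings are not reproduced in the article, so the values below are invented for illustration. The hypothetical function counts each value into a half-open bin [start + k*size, start + (k+1)*size) and omits empty bins.

```python
import math
from collections import Counter


def histogram(values, bin_size, start=None):
    """Count values into half-open bins [start + k*bin_size, start + (k+1)*bin_size).

    Returns {bin_lower_edge: count}, omitting empty bins.
    """
    if start is None:
        start = min(values)
    counts = Counter(math.floor((v - start) / bin_size) for v in values)
    return {round(start + k * bin_size, 6): counts[k] for k in sorted(counts)}


# Invented timings (seconds) with two clusters and one slow outlier.
times = [71.0, 71.1, 71.2, 71.9, 72.0, 72.1, 74.0]
for size in (1.0, 0.5, 0.25):
    print(size, histogram(times, size))
```

Changing only the bin size reshapes the picture, which is why the same data can suggest one peak or two.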
Statistics is clearly a large field, and without a large sample size, most numbers can be one-off results that are not truly reflective of the data. It is important to ask yourself every time you read a review with a result – how many data points went into that final value, and what analysis was performed?
For this review, we typically take four runs of each GPU test, except for Civilization V, which is extremely consistent (+/- 0.1 FPS). The result reported is the average of those four values, minus any results we feel are inconsistent. At times runs have been repeated in order to confirm the value, but this will not be noted in the results.
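Deciding which runs are ‘inconsistent’ is a judgment call in the article, so it cannot be reproduced exactly. One mechanical stand-in, sketched below, drops any run more than a set fraction away from the median before averaging. The 3% threshold and the function name are assumptions made for illustration.

```python
from statistics import mean, median


def reported_fps(runs, tolerance=0.03):
    """Average the runs after dropping any more than `tolerance` (a fraction)
    away from the median; returns (average, number_dropped).

    The 3% default is an illustrative assumption, not the article's rule.
    """
    mid = median(runs)
    kept = [r for r in runs if abs(r - mid) <= tolerance * mid]
    if not kept:
        raise ValueError("runs too inconsistent to report; re-test")
    return mean(kept), len(runs) - len(kept)


# Four hypothetical runs, one of which is clearly out of line.
average, dropped = reported_fps([60.1, 59.8, 60.3, 52.0])
print(f"{average:.2f} FPS ({dropped} run dropped)")
```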
Reporting the Minimum FPS
A lot of readers have noted in the past that they would like to see minimum FPS values. The minimum FPS is a good measure to point to for ‘the worst gameplay experience’, but even with our testing, it would be an effort to go back, retest all scenarios and report it. I know a lot of websites do report minimum FPS, but it is important to realize the following:
In a test that places AI at the center of the picture, it can be difficult to remain consistent. Take for example a run of Dirt 3 – this runs a standard race with several AI cars in which anything can happen. If in one of the runs there is a big six-car crash, lots of elements will be on screen, resulting in a severe dip in FPS. In this run I get a minimum of 6 FPS, whereas in others I get a minimum of ~40 FPS. Which is the right number to report? Technically it would be 6 FPS, but then any CPU that did not have a big pile-up would look better, even though it has not really been put to the same test.
If I had the time to run 100 tests of each benchmark, I would happily provide histograms of data representing how often the minimum FPS value fluctuated between runs. But that just is not possible when finding a balance between complete testing and releasing results for you all to see.
Many sites do offer a plot of FPS against time, to show what the average FPS looks like, where the dips are and how bad the ‘minimum FPS’ value actually is. In reality, this data set involves a large amount of adjacent-point averaging, meaning that the FPS reported is actually the average FPS over the last 50-200 frames. If we were going for exact FPS, using the time taken to render each individual frame, some of the data would jump about, especially in high pressure scenarios. In this regard, it is always important to question (especially if it is not specifically stated) how the benchmark software obtains its FPS data.
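The effect of that averaging on the ‘minimum FPS’ can be shown with a synthetic frame-time trace. The trace is invented, and the 100-frame window is an arbitrary pick within the 50-200 frame range mentioned above; real tools may average differently. Both function names are hypothetical.

```python
def instantaneous_fps(frame_times_ms):
    """FPS implied by each individual frame time."""
    return [1000.0 / t for t in frame_times_ms]


def rolling_fps(frame_times_ms, window):
    """FPS averaged over (up to) the last `window` frames: frames / elapsed seconds."""
    fps = []
    for i in range(len(frame_times_ms)):
        chunk = frame_times_ms[max(0, i - window + 1): i + 1]
        fps.append(1000.0 * len(chunk) / sum(chunk))
    return fps


# Synthetic trace: a steady 60 FPS (~16.7 ms per frame) with a single 100 ms hitch.
frames = [1000 / 60] * 99 + [100.0] + [1000 / 60] * 100
print(min(instantaneous_fps(frames)))                  # 10.0, the hitch itself
print(round(min(rolling_fps(frames, window=100)), 1))  # 57.1, the hitch is averaged away
```

The same single hitch reads as 10 FPS per frame but barely dents a 100-frame rolling average, so a reported minimum depends heavily on how the tool smooths its data.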
While I admit that the time-demo benchmarks that are not AI dependent as such will have a more regular minimum FPS, the average FPS result allows inconsistencies within a run to be smoothed out. Ideally perhaps we should be reporting the standard deviation (which would help put those stray ultra-low FPS values in context), but that brings its own cavalcade of issues, such as whether the run is mainly higher or lower than average, and the data will most likely not follow a normal distribution but a skewed one.
Nevertheless, due to the requests, I will endeavor to report our minimum FPS data when this article gets a new driver and GPU update in 2014. Due to the level of testing already performed, minimum FPS data gathered from this point on would contain a lot of holes, and I would not feel comfortable reporting patchy data. Stay tuned for our next driver update (also a game update) for this data.
While FCAT is a great way to test frame rates, it needs careful setup, and getting data is not as simple a run-and-gun for benchmark results as one would like – it is even more complicated in terms of data retrieval and analysis than FRAPS, which personally I tend not to touch with a barge pole. While I understand the merits of such a system, it would be ideal if a game's benchmark mode used FCAT in its own overlay to report data.
Why Test at 1440p? Most Gamers play at 1080p!
Obviously one resolution is not a catch-all situation. There will be users on the cheapest 1080p screen money can buy, and those using tri-monitor setups who want peak performance. Personally, I find a multi-GPU test at 1080p a little strange, and ideally for those high end setups you really need to be pushing the pixels. While 1440p is not the de facto standard, it provides an ideal mid-point for analysis. Take for example the Steam survey:
What we see is 30.73% of gamers running at 1080p, but 4.16% of gamers are above 1080p (1.25% above 1200p). If that applies to all of the 4.6 million gamers currently on Steam, we are talking about ~200,000 individuals with setups bigger than 1080p playing games on Steam right now (~57,500 bigger than 1200p), who may or may not have to run at a lower resolution to get frame rates.
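The scaling above is simple proportion; the sketch below reproduces it from the quoted survey shares and the 4.6 million user figure. The function name is our own.

```python
STEAM_USERS = 4_600_000  # figure quoted above


def users_with_share(share_percent, total=STEAM_USERS):
    """Scale a Steam survey percentage to an absolute number of users."""
    return round(total * share_percent / 100)


print(users_with_share(4.16))  # 191360, which the text rounds to ~200,000 (above 1080p)
print(users_with_share(1.25))  # 57500 (above 1200p)
```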
So 1080p is still the mainstay for gamers at large, but there is movement afoot to multi-monitor and higher resolution monitors. As a random point of data, personally my gaming rig does have a 1080p screen, but that is only because my two 1440p Korean panels are used for AnandTech review testing, such as this article.
I do have a desire to push this test fully into 4K when I can get my hands on a 4K panel, despite the potential lack of immediate relevance in modern gaming. The push towards higher resolutions in the monitor space is happening, slowly but surely.
The Bulldozer Challenge
Another purpose of this article was to tackle the problem surrounding Bulldozer and its derivatives, such as Piledriver and thus all Trinity and Richland APUs. The architecture is such that Windows 7, by default, does not optimally assign new threads to new modules – the ‘freshly installed’ stance is to double up on threads per module before moving to the next. By installing a pair of Windows Updates (which do not show in Windows Update automatically), we get an effect called ‘core parking’, which assigns each of the first series of threads to its own module, so that each thread gets a module (two INT cores and one shared FP unit) to itself rather than having pairs of threads competing for the prize. This affects variably threaded loads the most, particularly from 2 to 2N-2 threads, where N is the number of modules in the CPU (thus 2 to 6 threads in an FX-8150, which has four modules). It should come as no surprise that games fall into this category, so we want to test with and without the core parking updates in our benchmarks.
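Why the difference shows up only between 2 and 2N-2 threads can be illustrated with a toy model of the two placement policies. This is not how the Windows scheduler is actually implemented; it simply counts how many modules end up with two threads sharing their resources under ‘fill a module first’ versus ‘one thread per module first’. The function name is hypothetical.

```python
def shared_modules(threads, modules, spread_first):
    """Count modules that end up running two threads (and so share their
    front end and FP unit) under a simple placement policy.

    spread_first=False: fill both cores of a module before moving on.
    spread_first=True:  give each module one thread first, then double up.
    """
    if not 0 <= threads <= 2 * modules:
        raise ValueError("thread count must be between 0 and 2 * modules")
    per_module = [0] * modules
    for t in range(threads):
        module = t % modules if spread_first else t // 2
        per_module[module] += 1
    return sum(1 for n in per_module if n == 2)


# A four-module chip such as the FX-8150 (N = 4, eight threads).
for t in range(9):
    print(t, shared_modules(t, 4, spread_first=False), shared_modules(t, 4, spread_first=True))
```

With one thread, or with 2N-1 or more threads, both policies end up equally shared; only in between does spreading threads across modules avoid contention.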
Hurdles with NVIDIA and 3-Way SLI on Ivy Bridge/Haswell
Users who have been keeping up to date with motherboard options on Z77/Z87 will understand that there are several ways to put three PCIe slots onto a motherboard. The majority of sub-$250 motherboards will use three PCIe slots in a PCIe 3.0 x8/x8 + PCIe 2.0 x4 arrangement (meaning x8/x8 from the CPU and x4 from the chipset), allowing either two-way SLI or three-way Crossfire. Some motherboards will use a different lane allocation, giving a PCIe 3.0 x8/x4/x4 layout with three-way Crossfire but only two-way SLI. In fact, in this arrangement, fitting the final x4 slot with a sound or RAID card disables two-way SLI entirely.
This is due to a not widely publicized requirement of SLI – it needs at least an x8 lane allocation in order to work (either PCIe 2.0 or 3.0). Anything less than this on any GPU and you will be denied in the software. So putting a card in that third slot will cause the second slot to drop to x4, disabling two-way SLI. There are motherboards that have a switch to change to x8/x8 + x4 in this scenario, but we are still capped at two-way SLI.
The only way to go to 3-way or 4-way SLI is via a PLX 8747 enabled motherboard, which greatly increases the cost of a motherboard build. This should be kept in mind when dealing with the final results.
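The lane rules above can be expressed as a small check. It encodes only what the text states: SLI needs every installed GPU on at least x8, while Crossfire is assumed here to have no such minimum. The function name and layout labels are our own.

```python
def multi_gpu_support(lanes_per_gpu):
    """Which multi-GPU modes a set of installed GPU slot widths allows, using
    the rules described above: every GPU needs at least x8 for SLI, while
    Crossfire is assumed here to have no such minimum."""
    gpus = len(lanes_per_gpu)
    return {
        "gpus": gpus,
        "sli": gpus >= 2 and all(lanes >= 8 for lanes in lanes_per_gpu),
        "crossfire": gpus >= 2,
    }


layouts = {
    "x8/x8 (two GPUs)": [8, 8],
    "x8/x4 (third slot holds a sound card)": [8, 4],
    "x8/x4/x4 (three GPUs)": [8, 4, 4],
    "x16/x8/x8 via PLX 8747": [16, 8, 8],
}
for name, lanes in layouts.items():
    print(name, multi_gpu_support(lanes))
```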
It has come to my attention that even if the results were to come out X > Y, some users may call out that the better processor draws more power, which at the end of the day costs more money if you add it up over a year. For the purposes of this review, we are of the opinion that if you are gaming on a budget, then high-end GPUs such as the ones used here are not going to be within your price range. Simple fun gaming can be had on a low resolution, limited detail system for not much money – for example at a recent LAN I went to I enjoyed 3-4 hours of TF2 fun on my AMD netbook with integrated HD3210 graphics, even though I had to install the ultra-low resolution texture pack and mods to get 30+ FPS. But I had a great time, and thus the beauty of high definition graphics of the bigger systems might not be of concern as long as the frame rates are good. But if you want the best, you will pay for the best, even if it comes at the electricity cost. Budget gaming is fine, but this review is designed to focus at high resolutions with maximum settings, which is not a budget gaming scenario.
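For readers who want to weigh that electricity cost themselves, the yearly figure is just extra watts times hours of use times the price per kWh. Every input below (50 W extra under load, 4 hours of gaming a day, $0.12 per kWh) is a hypothetical placeholder, not a measurement from this review, and the function name is our own.

```python
def annual_cost(extra_watts, hours_per_day, price_per_kwh, days=365):
    """Yearly cost of drawing `extra_watts` more for `hours_per_day` hours."""
    kwh_per_year = extra_watts * hours_per_day * days / 1000
    return kwh_per_year * price_per_kwh


# Hypothetical inputs: 50 W more under load, 4 hours of gaming a day, $0.12 per kWh.
print(f"${annual_cost(50, 4, 0.12):.2f} per year")
```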
For an article like this, getting a range of CPUs which includes the most common and popular is very important. I have been at AnandTech for just over two years now, and in that time we have had Sandy Bridge, Llano, Bulldozer, Sandy Bridge-E, Ivy Bridge, Trinity and Vishera, and I tend to be supplied with the top end processor of each generation for testing (as a motherboard reviewer, it is important to make the motherboard the limiting factor). A lot of users have jumped to one of these platforms, although a large number are still on Wolfdale (Core2), Nehalem, Westmere, Phenom II (Thuban/Zosma/Deneb) or Athlon II. I have attempted to pool all my AnandTech resources, contacts and personal resources together to get a good spread of the current ecosystem, with more focus on the modern end of the spectrum. It is worth noting that a multi-GPU user is more likely to have the top line Ivy Bridge, Vishera or Sandy Bridge-E CPU, as well as a top range motherboard, rather than an old Wolfdale. As time progresses I hope to obtain greater ranges of CPU speeds, core counts and caches to suit almost all tastes.
My criteria for obtaining CPUs were to use at least one from each of the most recent architectures, as well as a range of cores/modules/threads/speeds. The basic list as it stands is shown below, with the CPU and GPU columns on the left showing what we were able to test:
|CPU||GPU||Name||IGP||Socket||C / M (T)||Speed (MHz)||Turbo (MHz)||L2/L3|
|L2007||Nano||BGA400||1 (1)||1600||1 MB / -|
|CPU||GPU||Name||IGP||Socket||C / M (T)||Speed (MHz)||Turbo (MHz)||L2/L3|
|E-350||Fusion||FT1||2 (2)||1600||1 MB / -|
|A6-3650||Llano||FM1||4 (4)||2600||4 MB / -|
|A8-3850||Llano||FM1||4 (4)||2900||4 MB / -|
|A8-5600K||Trinity||FM2||2 (4)||3600||3900||4 MB / -|
|A10-5800K||Trinity||FM2||2 (4)||3800||4200||4 MB / -|
|A6-5200||Kabini||FT3||4 (4)||2000||2 MB / -|
|Callisto K10||AM3||2 (2)||3200||1 MB / 6 MB|
|Zosma K10||AM3||4 (4)||3200||2 MB / 6 MB|
|Thuban K10||AM3||6 (6)||3300||3700||3 MB / 6 MB|
|FX-8150||Bulldozer||AM3+||4 (8)||3600||4200||8 MB / 8 MB|
|FX-8350||Piledriver||AM3+||4 (8)||4000||4200||8 MB / 8 MB|
|CPU||GPU||Name||IGP||Socket||C / M (T)||Speed (MHz)||Turbo (MHz)||L2/L3|
|E6400||Conroe||775||2 (2)||2133||2 MB / -|
|E6550||Conroe||775||2 (2)||2333||4 MB / -|
|E6700||Conroe||775||2 (2)||2667||4 MB / -|
|Q9400||Yorkfield||775||4 (4)||2667||6 MB / -|
|Nehalem||1366||4 (8)||2667||2933||1 MB / 8 MB|
|Nehalem||1366||4 (8)||3067||3333||1 MB / 8 MB|
|Westmere||1366||6 (12)||3467||3733||1.5 MB / 12 MB|
|Westmere||1366||6 (12)||3467||3733||1.5 MB / 12 MB|
2 x Xeon
|Westmere||1366||12 (24)||3467||3733||1.5 MB / 12 MB|
|BGA1023||2 (2)||1100||0.5 MB / 2 MB|
|1155||1 (2)||1900||0.25 MB / 1.5 MB|
|1155||4 (4)||3300||3700||1 MB / 6 MB|
|1155||4 (8)||3400||3800||1 MB / 8 MB|
|2011||6 (12)||3200||3800||1.5 MB / 12 MB|
|2011||6 (12)||3300||3900||1.5 MB / 15 MB|
2 x Xeon
|2011||16 (32)||2900||3800||2 MB / 20 MB|
4 x Xeon
|2011||32 (64)||2600||3100||2 MB / 20 MB|
|Ivy Bridge||1155||2 (4)||3300||0.5 MB / 3 MB|
|Ivy Bridge||1155||4 (8)||3500||3900||1 MB / 8 MB|
|Ivy Bridge-E||2011||6 (12)||3600||4000||1.5 MB / 15 MB|
|Haswell||1150||4 (4)||3000||3200||1 MB / 6 MB|
|Haswell||1150||4 (4)||3400||3800||1 MB / 6 MB|
|Haswell||1150||4 (8)||3500||3900||1 MB / 8 MB|
1 MB / 6 MB
128 MB L4
|Haswell||1150||4 (8)||3600||4000||1 MB / 8 MB|
|Haswell||1150||4 (8)||3600||4000||1 MB / 8 MB|
Note: the indication on the left hand side is whether we have tested the CPU in terms of our CPU tests or our GPU tests. In certain circumstances GPU tests were unavailable, but the CPU tests provide interesting data points.
This is Part 2 of our Gaming CPU series, with Part 1 covering a basic range of CPUs and a Haswell update covering the i7-4770K. For Part 2 this is primarily an Intel 4670K/Nehalem update, whereas Part 3 of our testing will focus on the AMD side. I currently have many AMD CPUs in house (Richland, Trinity, K10) and am on the request list for a few more (Vishera, more Richland).
My first and foremost thanks go to both ASUS and ECS for supplying me with these GPUs for my test beds. They have been in and out of 60+ motherboards without any issue, and will hopefully continue. My usual scenario for updating GPUs is to flip AMD/NVIDIA every couple of generations – last time it was HD5850 to HD7970, and as such in the future we will move to a 7-series NVIDIA card or a set of Titans (which might outlive a generation or two).
The ASUS HD 7970 we use is the reference model at the 7970 launch, using GCN architecture, 2048 SPs at 925 MHz with 3 GB of 4.6 GHz GDDR5 memory. We had four cards to be used in 1x, 2x, 3x and 4x configurations where possible, also using PCIe 3.0 when enabled by default, although for this update we were limited to three.
ECS is both a motherboard manufacturer and an NVIDIA card manufacturer, and while most of their VGA models are sold outside of the US, some do make it onto e-tailers like Newegg. This GTX 580 is also a reference model, with 512 CUDA cores at 772 MHz and 1.5 GB of 4 GHz GDDR5 memory. We have two cards to be used in 1x and 2x configurations at PCIe 2.0.
The CPU is not always the main part of the picture for this sort of review: the motherboard is equally important, as it dictates how the CPU and the GPUs communicate with each other and what the lane allocation will be. As mentioned on the previous page, there are 20+ PCIe configurations for Z87/Z77 alone when you consider that some boards are native, some use a PLX 8747 chip, others use two PLX 8747 chips, and about half of the Z87/Z77 motherboards on the market enable four PCIe 2.0 lanes from the chipset for CrossFireX use (at high latency). We have tried to be fair and take motherboards that may carry a small premium but are equipped to deal with the job. Note that some of these motherboards also use MultiCore Turbo (MCT), which, as we have detailed in the past, applies the top turbo speed of the CPU regardless of the loading.
As a result of this lane allocation business, each value in our review will be attributed to a CPU, whether it uses MCT, and a lane allocation.
|Socket||Chipset||Motherboard||Lane Allocation|
|1150||Z87||ASUS Z87-Pro||PCIe 3.0 x8/x8 + PCIe 2.0 x4|
|1150||Z87||MSI Z87-GD65 Gaming||PCIe 3.0 x8/x8/x4|
|1150||Z87||GIGABYTE Z87X-UD3H||PCIe 3.0 x8/x8 + PCIe 2.0 x4|
|1150||Z87||MSI Z87 XPower||PCIe 3.0 x8/x8/x8/x8 via PLX8747|
|1155||Z77||ASUS Maximus V Formula||PCIe 3.0 x8/x4/x4|
|1155||Z77||GIGABYTE Z77X-UP7||PCIe 3.0 x8/x8/x8/x8 via PLX8747|
|1155||Z77||GIGABYTE G1.Sniper M3||PCIe 3.0 x8/x8 or x16 + PCIe 2.0 x4|
|2011||X79||ASRock X79 Professional||PCIe 2.0 x16/x8/x8/x8|
|2011||X79||ASUS Rampage IV Extreme||PCIe 3.0 x16/x8/x8/x8|
|2011||X79||GIGABYTE X79-UD3||PCIe 3.0 x16/x8/x8/x8|
|1366||X58||GIGABYTE X58A-UD9||PCIe 2.0 x16/x16/x16/x16 via NF200|
|1366||X58||ASRock X58 Extreme3||PCIe 2.0 x16/x16 + x4|
|1366||5520||EVGA SR-2||PCIe 2.0 x16/x16/x16/x16 via NF200|
|775||975X||MSI Platinum Power Up||PCIe 1.1 x8/x8|
|775||P965||ASUS Commando||PCIe 1.1 x16 + x4|
|FM1||A75||GIGABYTE A75-UD4H||PCIe 2.0 x8/x8|
|FM1||A75||ASRock A75 Extreme6||PCIe 2.0 x8/x8 + x4|
|FM2||A85X||GIGABYTE F2A85X-UP4||PCIe 2.0 x8/x8 + x4|
|AM3||990FX||ASUS Crosshair V Formula||PCIe 2.0 x16/x8/x8|
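The per-slot figures behind these allocations can be compared with a little arithmetic. The sketch below is a hypothetical helper (`slot_bandwidths` is our own name, not part of any tool used in this review) that splits an allocation string from the table into per-GPU links and attaches the theoretical per-direction bandwidth of each PCIe generation: 250 MB/s per lane for PCIe 1.1 and 500 MB/s for PCIe 2.0 (both 8b/10b encoded), and roughly 985 MB/s for PCIe 3.0 (128b/130b encoded). It assumes that a segment without its own version label inherits the previous one, which may not hold for chipset-attached slots, and it ignores the fact that PLX 8747 and NF200 switches share their upstream lanes, so the numbers are link ceilings rather than measured throughput.

```python
import re

# Theoretical usable bandwidth per lane, per direction, in MB/s.
PER_LANE_MBPS = {
    "1.1": 2500 * 8 / 10 / 8,     # 2.5 GT/s, 8b/10b encoding    -> 250 MB/s
    "2.0": 5000 * 8 / 10 / 8,     # 5.0 GT/s, 8b/10b encoding    -> 500 MB/s
    "3.0": 8000 * 128 / 130 / 8,  # 8.0 GT/s, 128b/130b encoding -> ~985 MB/s
}


def slot_bandwidths(allocation: str) -> list[tuple[str, int, float]]:
    """Split an allocation such as 'PCIe 3.0 x8/x8 + PCIe 2.0 x4' into
    (PCIe version, lane count, MB/s) tuples, one per GPU slot.

    Assumption: a '+' segment without its own 'PCIe N.N' label inherits the
    previous segment's version. Switch chips (PLX/NF200) are not modelled.
    """
    slots = []
    version = None
    for segment in allocation.split("+"):
        label = re.search(r"PCIe (\d\.\d)", segment)
        if label:
            version = label.group(1)
        if version is None:
            raise ValueError(f"no PCIe version found in {allocation!r}")
        # Lane widths are written as lowercase 'x8', 'x16'; 'PLX8747' uses an
        # uppercase X and is therefore not picked up as a slot.
        for width in re.findall(r"x(\d+)", segment):
            lanes = int(width)
            slots.append((version, lanes, lanes * PER_LANE_MBPS[version]))
    return slots


for v, lanes, mbps in slot_bandwidths("PCIe 3.0 x8/x8 + PCIe 2.0 x4"):
    print(f"PCIe {v} x{lanes}: {mbps:,.0f} MB/s")
```

On these theoretical numbers, an x8 PCIe 3.0 link (about 7,900 MB/s) carries nearly as much as an x16 PCIe 2.0 link (8,000 MB/s), while a chipset x4 PCIe 2.0 slot offers a quarter of that. "Or" alternatives such as the G1.Sniper M3 entry are not handled by this sketch.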
Our good friends at G.Skill are putting their best foot forward in supplying us with high end kits to test. The main constraint on memory is what the motherboard will support: in order to keep testing consistent, no overclocks were performed. This meant that boards and BIOSes limited to a certain DRAM multiplier were set at the maximum multiplier possible. In order to keep things fairer overall, the modules were adjusted for tighter timings. All of this is noted in our final setup lists.
Our main memory testing kit is our trusty G.Skill 4x4 GB DDR3-2400 9-11-11 1.65 V RipjawsX kit which has been part of our motherboard testing for over twelve months. For times when we had two systems being tested side by side, a G.Skill 4x4 GB DDR3-2400 10-12-12 1.65 V TridentX kit was also used.
For The Beast, which is one of the systems that has an issue with higher memory dividers, we pulled in a pair of tri-channel kits from X58 testing. These are high-end kits as well, currently discontinued as they tended to stop working with too much voltage. We have a set of 3x2 GB OCZ Blade DDR3-2133 8-9-8 and a set of 3x1 GB Corsair Dominator GT DDR3-2000 7-8-7 for this purpose, which we ran at 1333 6-7-6 due to motherboard limitations at stock settings.
Our Core 2 CPUs naturally get their own DDR2 memory for completeness. This is a 2x2 GB kit of OCZ Platinum DDR2-666 5-5-5.
For Haswell we were offered new kits for testing, this time from Corsair and their Vengeance Pro series. This is a 2x8 GB kit of DDR3-2400 10-12-12 1.65 V.
To start, we want to thank the many manufacturers who have donated kit for our test beds in order to make this review, along with many others, possible.
Thank you to OCZ for providing us with 1250W Gold Power Supplies.
Thank you to G.Skill for providing us with memory kits.
Thank you to Corsair for providing us with an AX1200i PSU and 16GB 2400C10 memory.
Thank you to ASUS for providing us with the AMD GPUs and some IO Testing kit.
Thank you to ECS for providing us with the NVIDIA GPUs.
Thank you to Corsair for providing us with the Corsair H80i CLC.
Thank you to Rosewill for providing us with the 500W Platinum Power Supply for mITX testing, BlackHawk Ultra, and 1600W Hercules PSU for extreme dual CPU + quad GPU testing, and RK-9100 keyboards.
Also many thanks go to the manufacturers who over the years have provided review samples which contribute to this review. For this Intel update we would particularly like to thank Gigabyte for loaning the Haswell and Nehalem CPUs!
In order to keep the testing fair, we set strict rules in place for each of these setups. For every new chipset, the SSD was formatted and a fresh installation of the OS was applied. The chipset drivers for the motherboard were installed, along with NVIDIA drivers then AMD drivers. The games were preinstalled on a second partition, but relinked to ensure they worked properly. The games were then tested as follows:
Metro 2033: Benchmark Mode, two batches of four runs of Frontline at 1440p, max settings. The first batch of four is discarded, and the average of the second batch is taken (minus outliers).
Dirt3: Benchmark Mode, four runs of the first scene with 8 cars at 1440p, max settings. Average is taken.
Civilization V: One five minute run of the benchmark mode, accessible at the command line, at 1440p and max settings. Results are produced as total frames in each 60-second interval, and the average is taken.
Sleeping Dogs: Using the Adrenaline benchmark software, four scenes at 1440p in Ultra settings. Average is taken.
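The averaging rules above are simple enough to write down. In this sketch, `metro_average` and `civ5_average_fps` are hypothetical helper names, and the outlier rule (dropping any run more than 5% away from the median of the second batch) is our assumption, since the procedure only says that outliers are removed.

```python
from statistics import mean, median


def metro_average(runs: list[float], batch: int = 4, tolerance: float = 0.05) -> float:
    """Discard the first batch of runs, then average the second batch after
    dropping runs more than `tolerance` (as a fraction) away from its median.
    The outlier rule is an assumption; the review does not define one."""
    second = runs[batch:2 * batch]
    centre = median(second)
    kept = [r for r in second if abs(r - centre) <= tolerance * centre]
    return mean(kept or second)  # fall back to the whole batch if all are dropped


def civ5_average_fps(frames_per_minute: list[int]) -> float:
    """Civ5 reports total frames in each 60-second window; convert to mean FPS."""
    return mean(f / 60 for f in frames_per_minute)


# Eight Metro runs: the first four are the discarded warm-up batch.
print(metro_average([36.0, 35.0, 35.0, 36.0, 34.0, 34.0, 30.0, 34.0]))
print(civ5_average_fps([4800, 4750, 4820, 4790, 4810]))
```

Discarding the whole first batch reflects the note that Metro tends to inflate first-batch scores by up to 5%.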
If the platform was being used for the next CPU (e.g. the Crosshair V Formula, moving from FX-8150 to FX-8350), there was no need to reinstall. If the platform changed for the next test, a full reinstall and setup took place.
How to Read This Review
Due to the large number of different variables in our review, it is hard to accurately label each data point with all the information about that setup. Listing just the CPU model would also be misleading when the same CPU could be in two different motherboards with different GPU lane allocations. There is also the memory to consider, as well as whether a motherboard uses MCT at stock. Here is a set of labels correlating to configurations you will see in this review:
CPU[+] [CP] (PCIe version – lane allocation to GPUs [PLX])
e.g. A10-5800K (2 – x16/x16): A10-5800K with two GPUs in PCIe 2.0 mode
- First is the name of the CPU, then an optional + identifier for MCT enabled motherboards.
- CP indicates we are dealing with a Bulldozer derived CPU and using the Core Parking updates.
- Inside the circular brackets is the PCIe version of the lanes we are dealing with, along with the lane allocation to each GPU.
- The final flag indicates whether a PLX chip is involved in lane allocation.
This is one of the more complex configurations:
i7-3770K+ (3 – x8/x8/x8/x8 PLX)
This means an i7-3770K (with MCT) powering four GPUs in PCIe 3.0 via a PLX chip.
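Because the labels follow a fixed pattern, they can be decoded mechanically. The following is a minimal sketch that assumes exactly the layout described above (CPU name, optional + for MCT, optional CP, then the PCIe version, the lane list and an optional PLX flag in brackets). `parse_label` and `Config` are illustrative names of our own, and the "FX-8350 CP" label in the checks is a hypothetical example.

```python
import re
from dataclasses import dataclass

# CPU[+] [CP] (PCIe version – lane allocation [PLX]); accepts an en dash or hyphen.
LABEL = re.compile(
    r"^(?P<cpu>.+?)(?P<mct>\+)?"
    r"(?:\s+(?P<cp>CP))?"
    r"\s*\((?P<pcie>\d)\s*[–-]\s*"
    r"(?P<lanes>x\d+(?:/x\d+)*)"
    r"(?:\s+(?P<plx>PLX))?\)$"
)


@dataclass
class Config:
    cpu: str
    mct: bool            # MultiCore Turbo enabled by the motherboard
    core_parking: bool   # Bulldozer-derived CPU with the Core Parking updates
    pcie_version: int
    lanes: list[int]     # lanes allocated to each GPU, in slot order
    plx: bool            # lanes provided through a PLX switch


def parse_label(label: str) -> Config:
    m = LABEL.match(label.strip())
    if m is None:
        raise ValueError(f"unrecognised label: {label!r}")
    return Config(
        cpu=m["cpu"],
        mct=m["mct"] is not None,
        core_parking=m["cp"] is not None,
        pcie_version=int(m["pcie"]),
        lanes=[int(w) for w in m["lanes"].replace("x", "").split("/")],
        plx=m["plx"] is not None,
    )


print(parse_label("i7-3770K+ (3 – x8/x8/x8/x8 PLX)"))
print(parse_label("A10-5800K (2 – x16/x16)"))
```

The lazy CPU match means a trailing + is read as the MCT flag rather than as part of the model name.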
Common Configuration Points
All the system setups below have the following consistent configuration points:
- A fresh install of Windows 7 Ultimate 64-bit
- Either an Intel Stock CPU Cooler, a Corsair H80i CLC or Thermalright TRUE Copper
- OCZ 1250W Gold ZX Series PSU or Corsair AX1200i PSU for SP
- Rosewill 1600W Hercules for DP systems
- Up to 4x ASUS AMD HD 7970 GPUs, using Catalyst 13.1
- Up to 2x ECS NVIDIA GTX 580 GPUs, using GeForce WHQL 310.90
- SSD Boot Drives, OCZ Vertex 3 128 GB
- LG GH22NS50 Optical Drives
- Open Test Beds, either a DimasTech V2.5 EasyHard or a CoolerMaster Test Lab
CPU and Motherboard Configurations
Those listed as ‘Part 2’ are new for this update.
|Part 1||A6-3650 + Gigabyte A75-UD4H + 16GB DDR3-1866 8-10-10|
|Part 1||A8-3850 + ASRock A75 Extreme6 + 16GB DDR3-1866 8-10-10|
|Part 1||A8-5600K + Gigabyte F2A85X-UP4 + 16GB DDR3-2133 9-10-10|
|Part 1||A10-5800K + Gigabyte F2A85X-UP4 + 16GB DDR3-2133 9-10-10|
|Part 1||X2-555 BE + ASUS Crosshair V Formula + 16GB DDR3-1600 8-8-8|
|Part 1||X4-960T + ASUS Crosshair V Formula + 16GB DDR3-1600 8-8-8|
|Part 1||X6-1100T + ASUS Crosshair V Formula + 16GB DDR3-1600 8-8-8|
|Part 1||FX-8150 + ASUS Crosshair V Formula + 16GB DDR3-2133 10-12-11|
|Part 1||FX-8350 + ASUS Crosshair V Formula + 16GB DDR3-2133 9-11-10|
|Part 1||FX-8150 + ASUS Crosshair V Formula + 16GB DDR3-2133 10-12-11 + CP|
|Part 1||FX-8350 + ASUS Crosshair V Formula + 16GB DDR3-2133 9-11-10 + CP|
|Part 1||E6400 + MSI i975X Platinum + 4GB DDR2-666 5-6-6|
|Part 1||E6700 + ASUS P965 Commando + 4GB DDR2-666 4-5-5|
|Part 1||Xeon X5690 + EVGA SR-2 + 6GB DDR3-1333 6-7-7|
|Part 1||2x Xeon X5690 + EVGA SR-2|
|Part 1||Celeron G465 + ASUS Maximus V Formula + 16GB DDR3-2133 9-11-11|
|Part 1||i5-2500K + ASUS Maximus V Formula + 16GB DDR3-2133 9-11-11|
|Part 1||i7-2600K + ASUS Maximus V Formula + 16GB DDR3-2133 9-11-11|
|Part 1||i3-3225 + ASUS Maximus V Formula + 16GB DDR3-2400 10-12-12|
|Part 1||i7-3770K + Gigabyte Z77X-UP7 + 16GB DDR3-2133 9-11-11|
|Part 1||i7-3770K + ASUS Maximus V Formula + 16GB DDR3-2400 9-11-11|
|Part 1||i7-3930K + ASUS Rampage IV Extreme + 16GB DDR3-2133 10-12-12|
|Part 1||i7-3960X + ASRock X79 Professional + 16GB DDR3-2133 10-12-12|
|Part 1b||E6400 + ASUS P965 Commando + 4GB DDR2-666 4-5-5|
|Part 1b||E6550 + ASUS P965 Commando + 4GB DDR2-666 5-6-6|
|Part 1b||Q9400 + ASUS P965 Commando + 4GB DDR2-666 5-6-6|
|Part 1b||i7-4770K + Gigabyte Z87X-UD3H + 16GB DDR3-2400 10-12-12|
|Part 1b||i7-4770K + ASUS Z87-Pro + 16GB DDR3-2400 10-12-12|
|Part 1b||i7-4770K + MSI Z87A-GD65 Gaming + 16GB DDR3-2400 10-12-12|
|Part 2||A6-5200 + ASRock IMB-A180-H + 8GB DDR3-1333 9-9-10|
|Part 2||Fusion E-350 + Zotac Fusion-A-E + 8GB DDR3-1066 7-7-7|
|Part 2||i7-4770K + MSI Z87 XPower + 16GB DDR3-2400 10-12-12|
|Part 2||4x E5-4650L + SuperMicro + 128GB DDR3-1600 11-11-11|
|Part 2||2x E5-2690 + Gigabyte GA-7PESH1 + 32GB DDR3-1600 11-11-11|
|Part 2||Celeron 847 + ECS NM70-I2 + 8GB DDR3-1333 9-9-9|
|Part 2||i7-920 + Gigabyte X58A-UD9 + 6GB DDR3-1866 7-8-7|
|Part 2||i7-950 + Gigabyte X58A-UD9 + 6GB DDR3-1866 7-8-7|
|Part 2||i7-990X + Gigabyte X58A-UD9 + 6GB DDR3-1866 7-8-7|
|Part 2||i7-920 + ASRock X58 Extreme3 + 6GB DDR3-1866 7-8-7|
|Part 2||i7-950 + ASRock X58 Extreme3 + 6GB DDR3-1866 7-8-7|
|Part 2||i7-990X + ASRock X58 Extreme3 + 6GB DDR3-1866 7-8-7|
|Part 2||i5-4430 + Gigabyte Z87X-UD3H + DDR3-2400 10-12-12|
|Part 2||i5-4670K + Gigabyte Z87X-UD3H + DDR3-2400 10-12-12|
|Part 2||Xeon E3-1280 v3 + Gigabyte Z87X-UD3H + DDR3-2400 10-12-12|
|Part 2||Xeon E3-1285 v3 + Gigabyte Z87X-UD3H + DDR3-2400 10-12-12|
|Part 2||Via L2007 + ECS VX900-I + 8GB DDR3-1066 7-7-7|
Our first port of call with all our testing is CPU throughput analysis, using our regular motherboard review benchmarks.
Point Calculations - 3D Movement Algorithm Test
The algorithms in 3DPM employ either uniform or normally distributed random number generation, and vary in the amount of trigonometric operations, conditional statements, generation and rejection, fused operations, etc. The benchmark runs through six algorithms for a specified number of particles and steps, calculates the speed of each algorithm, then sums them all for a final score. This is an example of a real world situation that a computational scientist may find themselves in, rather than a pure synthetic benchmark. The benchmark is also parallel between particles simulated, and we test the single thread performance as well as the multi-threaded performance.
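3DPM's own source is not reproduced here, so the following is only a hypothetical illustration of the kind of work described: particles taking unit steps in random 3D directions, with one direction generator built from uniform random numbers plus trigonometry and another from normally distributed random numbers, each timed and summed into a score. The function names, the particle and step counts, the use of two algorithms instead of six, and the million-steps-per-second unit are all our assumptions. Because each particle is independent, a multithreaded version would split the particle loop across threads; this sketch runs on one thread.

```python
import math
import random
import time


def direction_trig(rng: random.Random) -> tuple[float, float, float]:
    """Uniform random direction from uniform random numbers plus trigonometry:
    uniform azimuth and uniform z (i.e. uniform cosine of the polar angle)."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    z = rng.uniform(-1.0, 1.0)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(theta), r * math.sin(theta), z)


def direction_normal(rng: random.Random) -> tuple[float, float, float]:
    """Uniform random direction from three normally distributed numbers,
    normalised to unit length."""
    while True:
        x, y, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0.0:
            return (x / norm, y / norm, z / norm)


def walk(direction, particles: int, steps: int, seed: int = 0):
    """Move each particle `steps` unit steps from the origin. Particles are
    independent, so this outer loop is what a threaded version would split."""
    rng = random.Random(seed)
    final = []
    for _ in range(particles):
        x = y = z = 0.0
        for _ in range(steps):
            dx, dy, dz = direction(rng)
            x, y, z = x + dx, y + dy, z + dz
        final.append((x, y, z))
    return final


def score(particles: int = 500, steps: int = 100) -> float:
    """Sum of each algorithm's speed in million particle-steps per second."""
    total = 0.0
    for direction in (direction_trig, direction_normal):
        start = time.perf_counter()
        walk(direction, particles, steps)
        total += particles * steps / (time.perf_counter() - start) / 1e6
    return total


print(f"score: {score():.3f}")
```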
For single thread performance, the higher MHz Haswell CPUs sit on top of the list; interestingly enough, it is the Xeons. Comparing these to the i7-4960X, which also sits at 4 GHz, shows the generational difference in this purely single-threaded test. The 100 MHz difference between the i5-4670K and the i7-4770K shows up as two points in this test. The s1366 CPUs are staggered between scores of 90.93 and 115.79, with the i7-920 falling short of the X6-1100T. Due to the IPC difference, the i7-990X is behind the i5-2500K and anything newer at a similar MHz.
In the multithreaded test, core count, MHz and FP performance win out, so the i5-4670K, even in a motherboard with Multi-Core Turbo, sits behind the eight threads of the FX-8350 and the six threads of the X6-1100T. The i7-4770K scores around 75% more, along with the Xeons. Among the Nehalem CPUs, the i7-990X scores around 200 points higher than the latest Haswell CPUs due to its six core / twelve thread design. Unfortunately the i7-920/i7-950 are a little behind, with the i7-2600K offering a noticeable boost.
Compression - WinRAR x64 3.93 + WinRAR 4.2
With 64-bit WinRAR, we compress the set of files used in the USB speed tests. WinRAR x64 3.93 attempts to use multithreading when possible, and provides a good test for when a system has a variable threaded load. WinRAR 4.2 does this a lot better! If a system has multiple speeds to invoke at different loading, the switching between those speeds will determine how well the system will do.
The only downside with WinRAR is that when you're dealing with slow CPUs, they are very slow! Using this older version of WinRAR, the quad core Nehalem CPUs keep pace with the FX-8350, although higher IPC seems to win out over core count here, with the 4.0 GHz Haswell Xeons scoring best.
The improvements in WinRAR 4.2 due to optimisations and multi-threading result in more cores giving better results. The i7-990X does well here, although Sandy Bridge-E and Ivy Bridge-E take the top spots. Due to the threading advantage WinRAR takes, the i7-4770K gets a 20 second advantage over its non-hyperthreaded cousin, the i5-4670K.
Image Manipulation - FastStone Image Viewer 4.2
FastStone Image Viewer is a free piece of software I have been using for quite a few years now. It allows quick viewing of flat images, as well as resizing, changing color depth, adding simple text or simple filters. It also has a bulk image conversion tool, which we use here. The software currently operates only in single-thread mode, which should change in later versions of the software. For this test, we convert a series of 170 files, of various resolutions, dimensions and types (of a total size of 163MB), all to the .gif format of 640x480 dimensions.
FastStone loves single threaded IPC and MHz, so it's no surprise for the Haswell CPUs to be on top, with no discernible difference between the i5-4670K and the i7-4770K. The old school Nehalems take a knock, with the i7-920 being almost a full 60% slower than the top scores.
Video Conversion - Xilisoft Video Converter 7
With XVC, users can convert any type of normal video to any compatible format for smartphones, tablets and other devices. By default, it uses all available threads on the system, and in the presence of appropriate graphics cards, can utilize CUDA for NVIDIA GPUs as well as AMD APP for AMD GPUs. For this test, we use a set of 33 HD videos, each lasting 30 seconds, and convert them from 1080p to an iPod H.264 video format using just the CPU. The time taken to convert these videos gives us our result.
For fully multithreaded video conversion, a combination of cores, IPC and MHz take top spots, hence the i7-4960X is the consumer CPU to get. The i7-990X has a smaller advantage over the quad core Haswells this time, and here is one benchmark where the i5-4670K falls behind the FX-8350s due to the integer nature of the workload. Interestingly enough the i5-4430 slots in with an i5-2500K due to IPC increases despite lower power consumption and MHz.
Rendering – PovRay 3.7
The Persistence of Vision RayTracer, or PovRay, is a freeware package for, as the name suggests, ray tracing. It is a pure renderer, rather than modeling software, but the latest beta version contains a handy benchmark for stressing all processing threads on a platform. We have been using this test in motherboard reviews to test memory stability at various CPU speeds to good effect: if it passes the test, the IMC in the CPU is stable for a given CPU speed. As a CPU test, it runs for approximately 2-3 minutes on high end platforms.
PovRay is another case of 'multithreading takes all', as shown by our 4P testing on E5-4650L CPUs. The i7-990X still shows its worth, being at least as quick as the i7-4770K, although the i7-920 and i7-950 are further down the pecking order.
Video Conversion - x264 HD Benchmark
The x264 HD Benchmark uses a common HD encoding tool to process an HD MPEG2 source at 1280x720 at 3963 Kbps. This test represents a standardized result which can be compared across other reviews, and is dependent on both CPU power and memory speed. The benchmark performs a 2-pass encode, and the results shown are the average of each pass performed four times.
Grid Solvers - Explicit Finite Difference
For any grid of regular nodes, the simplest way to calculate the next time step is to use the values of those around it. This makes for easy mathematics and parallel simulation, as each node calculated is only dependent on the previous time step, not the nodes around it on the current calculated time step. By choosing a regular grid, we reduce the levels of memory access required for irregular grids. We test both 2D and 3D explicit finite difference simulations with 2^n nodes in each dimension, using OpenMP as the threading operator in single precision. The grid is isotropic and the boundary conditions are sinks. Values are floating point, with memory cache sizes and speeds playing a part in the overall score.
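A minimal sketch of such an explicit update, assuming the 2D diffusion (heat) equation as the model problem, a square grid and sink boundaries held at zero. The benchmark itself uses OpenMP in single precision; this is a serial, double-precision Python illustration, and `explicit_step` is a hypothetical name. Each new value reads only the previous time step, which is why the node loop parallelises trivially, and this forward-Euler scheme is only stable while r = αΔt/Δx² ≤ 0.25 in 2D.

```python
def explicit_step(u: list[list[float]], r: float) -> list[list[float]]:
    """One forward-Euler step of 2D diffusion on a square, regular grid.

    r = alpha * dt / dx**2 (stable for r <= 0.25 in 2D). Boundary nodes are
    sinks held at zero. Every new value depends only on the previous step, so
    the double loop could be split across threads (e.g. an OpenMP parallel for).
    """
    n = len(u)
    new = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            laplacian = (u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1]
                         - 4.0 * u[i][j])
            new[i][j] = u[i][j] + r * laplacian
    return new


# A 2^4 = 16 node grid with a single hot spot near the middle.
n = 16
grid = [[0.0] * n for _ in range(n)]
grid[n // 2][n // 2] = 1.0
for _ in range(50):
    grid = explicit_step(grid, 0.25)
print(f"heat remaining after 50 steps: {sum(map(sum, grid)):.4f}")
```

The total slowly drops below 1 as heat reaches the sink boundaries; a 3D version adds a third index and two more neighbours per node, which is where cache capacity starts to matter.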
Grid solvers do love a fast processor and plenty of cache in order to store data. When moving up to 3D, it is harder to keep that data within the CPU and spending extra time coding in batches can help throughput. Our simulation takes a very naïve approach in code, using simple operations, but that doesn't stop the single socket, highly threaded CPUs taking top spots. The i5-4670K takes a surprising twist in 2D, outpacing the i7-4770K.
Grid Solvers - Implicit Finite Difference + Alternating Direction Implicit Method
The implicit method takes a different approach to the explicit method: instead of considering one unknown in the new time step to be calculated from known elements in the previous time step, we consider that an old point can influence several new points by way of simultaneous equations. This adds to the complexity of the simulation: the grid of nodes is solved as a series of rows and columns rather than points, reducing the parallel nature of the simulation by a dimension and drastically increasing the memory requirements of each thread. The upside, as noted above, is the less stringent stability rules related to time steps and grid spacing. For this we simulate a 2D grid of 2^n nodes in each dimension, using OpenMP in single precision. Again our grid is isotropic with the boundaries acting as sinks. Values are floating point, with memory cache sizes and speeds playing a part in the overall score.
Large caches matter even more in implicit simulation, along with core and thread counts. The i5-4430 is on the lower rungs of the Intel bloc, but the i7-990X is at the top.
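Solving the grid as a series of rows and columns means that each row (or column) becomes a tridiagonal linear system. One standard way to solve such a system is the Thomas algorithm, sketched below together with a hypothetical `implicit_row_sweep` that applies backward-Euler diffusion along each row with zero (sink) boundaries. This is a simplification under our own assumptions: a full ADI step alternates row and column sweeps and treats the other direction explicitly in each half step, and we do not know which solver the benchmark itself uses.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]
    by forward elimination and back substitution. a[0] and c[-1] are unused."""
    n = len(d)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = c[0] / b[0] if n > 1 else 0.0
    d_prime[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * c_prime[i - 1]
        c_prime[i] = c[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (d[i] - a[i] * d_prime[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


def implicit_row_sweep(u, r):
    """Backward-Euler diffusion along each interior row of a square grid:
    (1 + 2r) x[j] - r (x[j-1] + x[j+1]) = u[i][j], with sink (zero) boundaries.
    Hypothetical helper: a full ADI step would alternate this with a column
    sweep and include an explicit term for the other direction."""
    n = len(u)
    m = n - 2  # interior nodes per row
    new = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        a = [-r] * m
        b = [1.0 + 2.0 * r] * m
        c = [-r] * m
        new[i][1:n - 1] = thomas(a, b, c, u[i][1:n - 1])
    return new


grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 1.0
# r = 10 would be far outside the explicit limit of 0.25, yet stays bounded here.
print(implicit_row_sweep(grid, 10.0)[2])
```

Each row solve is sequential along the row, which is the loss of one dimension of parallelism described above; rows (or columns) can still be handed to different threads.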
Point Calculations - n-Body Simulation
When a series of heavy mass elements are in space, they interact with each other through the force of gravity. Thus when a star cluster forms, the interaction of every large mass with every other large mass defines the speed at which these elements approach each other. When dealing with millions and billions of stars on such a large scale, the movement of each of these stars can be simulated through the physical theorems that describe the interactions. The benchmark detects whether the processor is SSE2 or SSE4 capable, and implements the relevant code. We run a simulation of 10240 particles of equal mass; the output for this code is in terms of GFLOPs, and the result recorded was the peak GFLOPs value.
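The core of a direct n-body code is the all-pairs force loop, sketched below in plain Python as a hypothetical illustration rather than the benchmark's SSE2/SSE4 code. Softening (a small length added to each separation) is our assumption to avoid division by zero. The `gflops` helper assumes a convention used by some n-body benchmarks, N² interactions per step at 20 floating point operations each; the benchmark's own FLOP accounting is not stated in this review.

```python
import math


def accelerations(pos, mass, g=1.0, softening=1e-3):
    """Direct all-pairs gravitational acceleration on each body (O(N^2)).
    `softening` is an assumed small length that keeps r^2 away from zero."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    eps2 = softening * softening
    for i in range(n):
        xi, yi, zi = pos[i]
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            s = g * mass[j] / (r2 * math.sqrt(r2))  # G * m_j / r^3
            acc[i][0] += s * dx
            acc[i][1] += s * dy
            acc[i][2] += s * dz
    return acc


def gflops(n_bodies, seconds, flops_per_interaction=20):
    """GFLOPs under the assumed convention of N^2 interactions per step."""
    return n_bodies * n_bodies * flops_per_interaction / seconds / 1e9


# Two equal masses two units apart pull on each other with equal and opposite force.
print(accelerations([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0], softening=0.0))
# 10240 particles means roughly 105 million pair interactions per step.
print(gflops(10240, 1.0))
```

The inner loop is the part that SIMD extensions such as SSE2 or SSE4 accelerate, by evaluating several interactions per instruction.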
Due to extension enhancements, we see that a quad core Haswell Xeon scores roughly the same as the hex-core Nehalem, with the i5-4430 not far behind. If anything, the i7-920 and i7-950 take a nose dive here, and it's worth investing even in an i5-4430 for a 50% performance enhancement.
Our first analysis is with the perennial reviewers’ favorite, Metro2033. It occurs in a lot of reviews for a couple of reasons – it has a very easy to use benchmark GUI that anyone can use, and it is often very GPU limited, at least in single GPU mode. Metro2033 is a strenuous DX11 benchmark that can challenge most systems that try to run it at any high-end settings. Developed by 4A Games and released in March 2010, we use the inbuilt DirectX 11 Frontline benchmark to test the hardware at 1440p with full graphical settings. Results are given as the average frame rate from a second batch of 4 runs, as Metro has a tendency to inflate the scores for the first batch by up to 5%.
Almost all our test results fall between 31 and 35 FPS, which technically means a 10% difference between the Nehalem CPUs and the latest Intel and AMD CPUs.
With two 7970s, the Nehalems are in the ballpark of the Piledriver CPUs, while the quad core i5-4670K is similar to the full fat i7-4770K. Anything quad core and Intel, Sandy Bridge and above, hits 60 FPS average.
At three GPUs we have a bit more separation going on, with the Nehalems losing out due to IPC; only on the NF200 enabled motherboard do we get 70 FPS. There are no benefits moving to the hex-core Ivy Bridge-E i7-4960X, but the jump from 4670K to 4770K nets five FPS.
As with the 7970s, most modern CPUs perform the same on a single NVIDIA GPU. Beware of single core CPUs however, with the G465 not faring well.
Similarly in dual NVIDIA GPU, there is not much difference - ~3 FPS at most unless you deal with dual core CPUs. Interestingly the results seem to be a little varied within that 41-44 FPS band.
In terms of single GPU, almost all the CPUs we have tested perform the same within a margin. On dual AMD GPUs we start to see a split, with the older Nehalem CPUs falling under 60 FPS. On tri-GPU setups the i5-4430 performs close to the Nehalems, and moving from 4670K to 4770K merits a jump from 72.47 FPS to 74-77, depending on lane allocation.
Dirt 3 is a rallying video game and the third in the Dirt series of the Colin McRae Rally series, developed and published by Codemasters. Dirt 3 also falls under the list of ‘games with a handy benchmark mode’. In previous testing, Dirt 3 has always seemed to love cores, memory, GPUs, PCIe lane bandwidth, everything. The small issue with Dirt 3 is that depending on the benchmark mode tested, the benchmark launcher is not indicative of game play per se, citing numbers higher than actually observed. Despite this, the benchmark mode also includes an element of uncertainty, by actually driving a race, rather than a predetermined sequence of events such as Metro 2033. This in essence should make the benchmark more variable, but we take repeated runs in order to smooth this out. Using the benchmark mode, Dirt 3 is run at 1440p with Ultra graphical settings. Results are reported as the average frame rate across four runs.
Similar to Metro, pure dual core CPUs seem best avoided when pushing a high resolution with a single GPU. The Haswell CPUs seem to be near the top due to their IPC advantage.
When running dual AMD GPUs only the top AMD chips seem to latch on to the tail of Intel, with the hex-core CPUs taking the top spots. Again there's no real change moving from 4670K to 4770K, and even the Nehalem CPUs keep within 4% of the top spots.
At three GPUs the 4670K seems to provide the equivalent grunt to the 4770K, though more cores and more lanes seems to be the order of the day. Moving from a hybrid CPU/PCH x8/x8 + x4 lane allocation to a pure CPU allocation (x8/x4/x4) merits a 30 FPS rise in itself. The Nehalem CPUs, without NF200 support, seem to be on the back foot performing worse than Piledriver.
On the NVIDIA side, one GPU performs similarly across the board in our test.
When it comes to dual NVIDIA GPUs, ideally the latest AMD architecture and anything above a dual core Intel Sandy Bridge processor is enough to hit 100 FPS.
Our big variations occurred on the AMD GPU side, where it was clear that beyond two GPUs, moving on from Nehalem might bring a boost to frame rates. The 4670K is still on par with the 4770K in our testing, and the i5-4430 seemed to be on a similar line most of the way but was down a peg on tri-GPU.
A game that has plagued my testing over the past twelve months is Civilization V. The older 12.3 Catalyst drivers were somewhat of a nightmare, giving no scaling, and as a result I dropped it from my test suite after only a couple of reviews. With the later drivers used for this review, the situation has improved but only slightly, as you will see below. Civilization V seems to run into a scaling bottleneck very early on, and any additional GPU allocation only causes worse performance.
Our Civilization V testing uses Ryan’s GPU benchmark test all wrapped up in a neat batch file. We test at 1440p, and report the average frame rate of a 5 minute test.
Civ5 seems to love IPC, with our Haswell and Ivy-E CPUs all near the top. All our PCIe 3.0 combinations hit 80 FPS or above.
On multiple AMD GPUs the PCIe 3.0 combinations get the biggest boost, along with anything using a PLX or NF200 chip to boost lane allocations. There seems to be a barrier around 100-108 FPS that only Haswell and Ivy Bridge CPUs are moving over, except the one 990X result. The i7-4960X takes top spot, and the i7-920 is 45 FPS behind, almost 1/3. The i5-4430 is lower than expected, showing little scaling after the first GPU.
Civ5 scales poorly beyond one GPU, let alone beyond two, meaning most of our tri-GPU results are similar to dual GPU. Again, anything purely PCIe 3.0 seems to get the biggest boost, with the 4670K still fighting alongside the 4770K.
For a single GTX 580 the top spots above 80 FPS are all on the side of Sandy Bridge and above, with Nehalem scoring below this marker. It seems that dual core CPUs take a bashing, suggesting a quad core minimum.
More NVIDIA GPUs for Civ5 means more cores and more lanes where possible, with the i7-4960X taking the top spot. This is almost 40 FPS higher than the i5-4430 and the Nehalem CPUs. The 4670K doesn't miss a beat against the i7-4770K.
Civilization V Conclusion
We see some of our biggest variations in CPU performance in Civilization V, where it is clear that a modern Intel processor (Ivy/Haswell), at least quad core, is needed to get the job done for the higher frame rates. Arguably any high-end AMD processor will perform >60 FPS in our testing here as well, perhaps making the point moot. For single GPU, the i5-4430 performs well in Civ5, though in dual GPU the i5-4670K might be a better investment.
Sleeping Dogs is a strenuous game with a pretty hardcore benchmark that scales well with additional GPU power when SSAO is enabled. The team at Adrenaline.com.br is supreme for making an easy to use benchmark GUI, allowing a numpty like me to charge ahead with a set of four 1440p runs with maximum graphical settings.
With one AMD GPU, Sleeping Dogs is similar across the board.
On dual AMD GPUs, there seems to be a little kink with those running x16+x4 lane allocations, although this is a minor difference.
Between an i7-920 and an i5-4430 we get a 7 FPS difference, almost 10%, showing the change over CPU generations. In fact at this level anything above that i7-920 gives 70 FPS+, but the hex-core Ivy-E takes top spot at ~81 FPS.
There is only 0.4 FPS between Core 2 Duo and Haswell: for one NVIDIA GPU, the CPU does not seem to matter(!)
The same goes for dual NVIDIA GPUs, with less than ~3% between the top and bottom results.
Sleeping Dogs Conclusion
While the NVIDIA results did not change much between different CPUs, any modern processor seems to hit the high notes when it comes to multi-GPU Sleeping Dogs.
The i5-4670K vs i7-4770K Dilemma
The big debate on Gaming CPUs always circles around to how many cores does a game use, and whether they are sufficiently utilized to matter. Some users are concerned if a title does not use all the CPU cores, while others would prefer that the CPU is a minimal part of the equation when work can be offloaded onto the GPU. So here is a question:
Do you prefer:
- a game that uses the CPU as much as possible such that the CPU can be a bottleneck, or
- a game that offloads most of the CPU work to the GPU thus making the GPU the primary bottleneck?
I am firmly in the latter camp and like to think that the latter is the result of game optimization, and that some users would focus budgets on GPUs if that is their primary concern for a system. Of course you can have your cake and eat it too with a hex-core system, as long as the wallet stretches.
No matter how much philosophical mumbo-jumbo you want to throw at ‘the ideal situation’, reality always comes back to the question ‘but what should I get today?’. There are plenty of forum posts regarding processor recommendations, especially when it comes to Intel’s flagship mainstream processor, the i7-4770K, and its modified counterpart, the i5-4670K. Whether the hyperthreading of the i7-4770K provides a boost in games over the i5-4670K is an answer I wanted to provide, given the price difference between these processors is around $100 at Newegg today and that money might be better spent on a GPU.
Here is a table comparing all our results with both CPUs in an x8/x8 + x4 motherboard, with a ‘win’ going to the side that has a +3.5% FPS advantage:
In direct comparison, only two benchmarks had more than a 3.5% FPS jump with the 4770K in favor, and one actually in favor of the 4670K.
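The 3.5% rule can be written as a small classifier. It is a sketch: the review does not say whether the advantage is measured against the slower or the faster result, so the hypothetical `verdict` function assumes the slower one, and the sample numbers are illustrative only (they echo the tri-GPU Metro 2033 figures quoted earlier, which come from different lane allocations).

```python
def verdict(fps_i5: float, fps_i7: float, threshold: float = 0.035) -> str:
    """Return which CPU 'wins' one benchmark, or 'tie'.
    Assumption: the 3.5% advantage is measured relative to the slower result."""
    slower = min(fps_i5, fps_i7)
    if fps_i7 > slower * (1.0 + threshold):
        return "i7-4770K"
    if fps_i5 > slower * (1.0 + threshold):
        return "i5-4670K"
    return "tie"


print(verdict(72.47, 74.0))  # 74.0 / 72.47 is about 1.021, under the threshold
print(verdict(72.47, 77.0))  # 77.0 / 72.47 is about 1.062, over the threshold
```

A threshold of this kind keeps run-to-run noise of a couple of percent from being counted as a win for either CPU.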
So in terms of answering the question, for our benchmarks, it would seem that the i5-4670K is the more cost effective choice in buying a Haswell processor.
Nehalem Can Still Be Strong, But Update Soon
Getting a chance to cover a range of Nehalem CPUs has been a goal since testing started for Part 1, and it is clear to see why these performance platforms earned that name. If you invested in an i7-920, like I did, and are lucky enough to run a nice D0 stepping CPU, then some bases are covered on the single and dual GPU front, although there are some holes where Nehalem clearly falls behind the leading pack of CPUs.
Of course with socket 1366 CPUs there are some compromises. The motherboards for these CPUs do not have PCIe 3.0 (which our results suggest is important in multi-GPU setups), nor do they have native USB 3.0 or SATA 6 Gbps, meaning you’ll be scrounging around for mid-performing add-on controllers at best. Features such as Thunderbolt are but a wish unless you are willing to upgrade.
The most direct comparison for us is the i7-950 against the i7-4770K: both are quad core processors with hyperthreading, with the 4770K taking a small MHz lead and a large IPC lead. Going back through our benchmark scenarios, the 4770K needs a PLX board to support some of the nicer 3- and 4-way GPU setups, but there is a clear CPU advantage on the side of Haswell. The triple channel memory support of Nehalem was a big plus point at launch, but now that we can kit out our dual channel mainstream platforms with DDR3-2400 C10 memory with relative ease (or faster if you are that way inclined), that memory bandwidth advantage has narrowed.
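To put the memory argument in numbers, theoretical peak bandwidth is simply channels × 8 bytes per 64-bit DDR3 channel × transfer rate. The Python sketch below compares a few example configurations; the DDR3-1066 and DDR3-1600 triple channel speeds are illustrative choices for a Nehalem system, not configurations we benchmarked, and real-world bandwidth will be lower than these ceilings.

```python
# Theoretical peak DRAM bandwidth = channels x 8 bytes (one 64-bit DDR3
# channel) x transfer rate in MT/s. These are ceilings, not measured numbers.

BYTES_PER_TRANSFER = 8  # a 64-bit DDR3 channel moves 8 bytes per transfer


def peak_bandwidth_gbs(channels: int, mts: float) -> float:
    """Peak bandwidth in GB/s (10^9 bytes/s) for a DDR3 configuration."""
    return channels * BYTES_PER_TRANSFER * mts / 1000.0


# Example configurations (speeds chosen for illustration).
configs = {
    "Nehalem, triple channel DDR3-1066": (3, 1066),
    "Nehalem, triple channel DDR3-1600": (3, 1600),
    "Haswell, dual channel DDR3-2400": (2, 2400),
}

for name, (channels, mts) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(channels, mts):.1f} GB/s")
```

Dual channel DDR3-2400 works out to 38.4 GB/s, the same ceiling as triple channel DDR3-1600 and well above the 25.6 GB/s of triple channel DDR3-1066.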
If you were lucky (or rich) enough to jump on the extreme end of Westmere (i7-980, i7-980X or i7-990X), then that hex-core system keeps a small advantage in multithreaded tests over the top performing Haswell solution (PovRay on the i7-4770K = 1612.68, on the i7-990X = 1636.40, a lead of roughly 1.5%). Haswell’s IPC advantage proves useful in most multi-GPU setups, whereas in our other single and some dual GPU benchmarks the performance difference between the two processors is almost negligible. When you hit three-way GPU configurations, it is all about the lane counts.
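As a quick check on how small that multithreaded lead is, the PovRay scores quoted above can be turned into a percentage with a one-line Python helper (PovRay reports a score where higher is better; the helper name is our own):

```python
# Percentage lead of one benchmark score over another (higher is better).
def percent_lead(score: float, baseline: float) -> float:
    return (score / baseline - 1.0) * 100.0


povray_i7_4770k = 1612.68
povray_i7_990x = 1636.40
lead = percent_lead(povray_i7_990x, povray_i7_4770k)
print(f"i7-990X lead over i7-4770K in PovRay: {lead:.2f}%")
```

This works out to a lead of just under 1.5% for the i7-990X.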
Investing in the i7-4960X
Plunging in at the high end is always expensive. These are the high-margin parts whose virtues manufacturers want to promote, in the hope that users will also invest lower down the product stack. With more cores, more MHz and strong IPC, the CPU benchmark results are plain to see for anyone needing cores and grunt.
The downside of the i7-4960X is the chipset: we are still on X79, and even with a motherboard refresh there are some limitations that motherboard manufacturers cannot escape. Now that Haswell/Z87 offers a full complement of SATA 6 Gbps ports and native USB 3.0, X79 has to provide that functionality via extra controllers. The big upside at the extreme end is the PCIe lane allocation, which has some benefits in gaming.
Across our benchmark range, the i7-4960X and i7-4770K deliver similar results, with single and dual GPU results on par across the board. When we start moving into tri-GPU setups, there are several things to consider:
- the x8/x8 + x4 PCIe allocation on Z87 is bad
- the x8/x4/x4 PCIe allocation on Z87 fares better
- having a PLX chip on a Z87 board for x16/x8/x8 is best, but this is more expensive
- you don’t have to worry about this with an i7-4960X
- but going Ivy Bridge-E is more expensive to begin with
The i7-4960X takes the top spot in Dirt 3 tri-GPU, Civ5 tri-GPU and Sleeping Dogs tri-GPU, suggesting that if you want the absolute best frame rates with more than two GPUs, then the i7-4960X is your answer. However, the i7-4770K with a PLX-enabled motherboard will give results almost as good as the i7-4960X (often within 1-2%) for a lower CPU cost.
Next Update: Part 3
As mentioned in the early parts of this article, our next update will focus solely on the AMD midrange. A few of our partners have kindly volunteered processors for testing; we have also made a small call to AMD for a few of the major ones and scouted eBay for the less expensive models. As you can imagine, that is quite a list to choose from, but needs must when the devil drives. The A10-6800K and similar processors will certainly be included in as many GPU configurations as possible.
Recommendations for the Games Tested at 1440p/Max Settings
A CPU for Single GPU Gaming:
AMD: A8-5600K + Core Parking updates
If I were gaming today on a single GPU, the A8-5600K (or its non-K equivalent) would strike me as a price competitive choice for frame rates, as long as you are not a big Civilization V player and do not mind the single threaded performance. The A8-5600K scores within a percentage point or two across the board in single GPU frame rates with both an HD7970 and a GTX580, and feels the same in the OS as an equivalent Intel CPU. The A8-5600K will also overclock a little, giving a boost, and comes in at a modest $110, meaning that some of those $$$ can go towards a beefier GPU or an SSD. The only downside comes if you are planning some heavy non-gaming CPU work: if the software is Piledriver-aware, all is well, but most software is not, and an i3-3225 or FX-8350 might be worth a look.
It is possible to consider non-IGP alternatives to the A8-5600K, such as an FX-4xxx part or the Athlon X4 750K. But as we have not had these chips in to test, it would be unethical to suggest them without data to back them up. Watch this space: these processors are on our list to test.
A CPU for Dual GPU Gaming:
Intel: i5-4430 / i5-4670K
AMD: FX-8350 + Core Parking Updates
Based on our benchmarks, it again comes down to whether you are a Civilization V type gamer, or whether the engine your game is based on behaves similarly to Civ5. If the answer is no, then the i5-4430 performs within a low single-digit percentage of our top performers, and the FX-8350 puts up a reasonable showing. If the answer is yes, then anything short of the i5-4670K means that performance is being lost.
Looking back through the results, moving to a dual GPU setup obviously raises some issues. For example, various AMD platforms are not certified for dual NVIDIA cards, meaning that while they may excel with AMD GPUs, we cannot recommend them for team Green. There is also the dilemma that while some games are fairly GPU limited (Metro 2033, Sleeping Dogs), there are others where having the CPU horsepower can double the frame rate (Civilization V).
After the overview, my recommendation for dual GPU gaming falls to the i5-4430 and the i5-4670K, depending on your CPU workloads. The price difference between these two processors is around $40, and for that extra we also get an overclockable CPU.
A CPU for Tri-GPU Gaming:
i5-4670K with an x8/x4/x4 (AMD) or PLX (NVIDIA) motherboard
By moving up in GPU power we also have to boost the CPU power in order to see the best scaling at 1440p. The CPUs in our testing that provide the top frame rates at this level are the top line Ivy Bridge and Haswell models. For a comparison point, the Sandy/Ivy Bridge-E 6-core results were often very similar, but the price jump to such a setup is prohibitive to all but the sturdiest of wallets. Of course we would suggest Haswell over Ivy Bridge, as Haswell is the newer platform.
As noted in the introduction, using 3-way on NVIDIA with Ivy Bridge will require a PLX motherboard in order to get enough lanes to satisfy the SLI requirement of a minimum of x8 per GPU. This also raises the bar in terms of price, as PLX motherboards start around the $280 mark. For a 3-way AMD setup, an x8/x4/x4 enabled motherboard performs similarly to a PLX enabled one, and ahead of the slightly crippled x8/x8 + x4 variations. However, investing in a PLX board would make it easier to move to a 4-way setup, should that be your intended goal.
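The lane arithmetic behind these recommendations can be sketched in Python. The x8 minimum per card for SLI comes from the text above; the per-lane bandwidth figures follow from the PCIe 2.0 (5 GT/s, 8b/10b) and PCIe 3.0 (8 GT/s, 128b/130b) encodings. The x16/x16/x8 split for the i7-4960X is an assumed typical X79 board layout, and the PLX entry lists link widths only: GPUs behind a PLX switch still share its x16 uplink to the CPU.

```python
from dataclasses import dataclass

SLI_MIN_LANES = 8  # NVIDIA SLI: each card needs at least an x8 link

# Approximate usable bandwidth per lane, per direction, in GB/s.
# PCIe 2.0: 5 GT/s with 8b/10b encoding; PCIe 3.0: 8 GT/s with 128b/130b.
GBS_PER_LANE = {2: 5 * 8 / 10 / 8, 3: 8 * 128 / 130 / 8}


@dataclass
class Slot:
    lanes: int  # electrical link width
    gen: int    # PCIe generation of the link

    def bandwidth_gbs(self) -> float:
        return self.lanes * GBS_PER_LANE[self.gen]


@dataclass
class Platform:
    name: str
    slots: list[Slot]

    def sli_ok(self) -> bool:
        """True if every GPU slot meets the x8 SLI minimum."""
        return all(s.lanes >= SLI_MIN_LANES for s in self.slots)

    def weakest_link_gbs(self) -> float:
        """Bandwidth of the slowest GPU slot, which tends to limit scaling."""
        return min(s.bandwidth_gbs() for s in self.slots)


PLATFORMS = [
    Platform("Z87 x8/x8 + x4 (x4 from PCH)", [Slot(8, 3), Slot(8, 3), Slot(4, 2)]),
    Platform("Z87 x8/x4/x4", [Slot(8, 3), Slot(4, 3), Slot(4, 3)]),
    Platform("Z87 + PLX x16/x8/x8", [Slot(16, 3), Slot(8, 3), Slot(8, 3)]),
    Platform("X79 i7-4960X x16/x16/x8 (assumed)", [Slot(16, 3), Slot(16, 3), Slot(8, 3)]),
]

for p in PLATFORMS:
    sli = "ok" if p.sli_ok() else "not supported"
    print(f"{p.name}: SLI {sli}, weakest slot ~{p.weakest_link_gbs():.1f} GB/s")
```

Only the PLX and X79 layouts pass the SLI check, in line with the recommendation above, while the PCIe 2.0 x4 slot from the PCH leaves the x8/x8 + x4 layout with a weakest link of roughly 2 GB/s per direction.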
A CPU for Quad-GPU Gaming:
i5-4670K with a PLX motherboard
While our fourth GPU for this update was unfortunately in need of repair, by extension of the tri-GPU results we would say at this point that, as long as the game title scales, you need at least the CPU recommended for tri-GPU setups in order to keep frame rates in the top echelons. There are a couple of Haswell Z87 motherboards that offer an odd x8/x4/x4 + x4 PCIe lane allocation, although given our tri-GPU results using the PCIe 2.0 x4 from the PCH, I would not be too confident of seeing anything spectacular from those boards. Our tri-GPU testing shows that the PLX chip has a positive effect, which should only grow as more GPUs are added. For the wallets that open wider than most, the socket 2011 processors are also at your beck and call.
Even so, a four-way GPU configuration is for those few users who have both the money and the physical requirement for pixel power. We are all aware of the law of diminishing returns, and more often than not adding that fourth GPU is overkill at most resolutions. Despite this, even at 1440p, we see awesome scaling in games like Sleeping Dogs (moving from three to four cards adds 73% of a single card’s frame rate), and more recently I have seen four-way GTX680s give BF3 at Ultra settings a healthy 35 FPS minimum on a 4K monitor. So while four-way setups are insane, there is clearly a usage scenario where card number four matters.
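The Sleeping Dogs scaling figure is the frame rate added by the fourth card, expressed as a fraction of single-card performance. A small Python sketch of that calculation follows; the frame rates in the example are hypothetical values chosen only so that the fourth card adds 73%, not measured results.

```python
def marginal_scaling(fps: list[float]) -> list[float]:
    """Frame rate added by each extra card, as a fraction of single-card FPS.

    fps[i] is the frame rate with i + 1 cards, so result[k] is the gain
    from going from k + 1 to k + 2 cards, divided by fps[0].
    """
    single = fps[0]
    return [(fps[k + 1] - fps[k]) / single for k in range(len(fps) - 1)]


# Hypothetical 1- to 4-card frame rates, picked so that the fourth card adds
# 73% of a single card, in line with the Sleeping Dogs figure above.
fps = [40.0, 78.0, 110.0, 139.2]
for cards, gain in enumerate(marginal_scaling(fps), start=2):
    print(f"card {cards}: +{gain:.0%} of a single card")
```

Measuring each card against single-card performance, rather than against the previous configuration, makes diminishing returns easy to spot as the gains shrink card by card.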
Next on the Horizon: AMD, Dual Core Haswell, 2014 Updates
By the end of the year, it would make sense to cycle back around to the AMD platforms we have not tested in their entirety, including CPUs like the A10-6800K and the non-IGP oriented Athlon X4 750K. The stack of CPUs under $150 is larger than I originally thought, varying in CPU speed, cores and cache levels. We have a number in for testing which should provide a few interesting data points.
At the beginning of September, Intel formally put on sale the dual core Haswell CPUs that are now populating e-tailers. I would like to get a few in to bolster the area around the i3-3225, which is looking a little forlorn. Depending on samples and prices, I hope to test these in due course, either this year or at the beginning of next.
Looking ahead to next year, I am planning an update to our testing following recommendations from our readers. This includes a driver update (to the latest WHQL), hopefully an update on our NVIDIA GPU side to something around the GTX 760 or above, and also an update to the game list with more relevant titles. At present we are looking at Company of Heroes 2, Bioshock Infinite, F1 2012/2013, Tomb Raider, and Sleeping Dogs again. We will stick with 1440p for 2014, and also aim to report minimum frame rates.
If you have any suggestions for our Gaming CPU 2014 update, please forward them on to firstname.lastname@example.org!