Choosing a Gaming CPU: Single + Multi-GPU at 1440p, April 2013

Name: Choosing a Gaming CPU: Single + Multi-GPU at 1440p, April 2013
Item: Choosing a Gaming CPU: Single + Multi-GPU at 1440p, April 2013
Author: Dr. Ian Cutress

by Ian Cutress on May 8, 2013 10:00 AM EST

Posted in
CPUs
Guides
Gaming
GPUs

242 Comments | Add A Comment

242 Comments

CPU Benchmarks

Point Calculations - 3D Movement Algorithm Test

The algorithms in 3DPM employ both uniform random number generation or normal distribution random number generation, and vary in amounts of trigonometric operations, conditional statements, generation and rejection, fused operations, etc. The benchmark runs through six algorithms for a specified number of particles and steps, and calculates the speed of each algorithm, then sums them all for a final score. This is an example of a real world situation that a computational scientist may find themselves in, rather than a pure synthetic benchmark. The benchmark is also parallel between particles simulated, and we test the single threaded performance as well as the multi-threaded performance.

3D Particle Movement Single Threaded

3D Particle Movement MultiThreaded

As mentioned in previous reviews, this benchmark is written how most people would tackle the situation – using floating point numbers. This is also where Intel excels, compared to AMD’s decision to move more towards INT ops (such as hashing), which is typically linked to optimized code or normal OS behavior.

Compression - WinRAR x64 3.93 + WinRAR 4.2

With 64-bit WinRAR, we compress the set of files used in our motherboard USB speed tests. WinRAR x64 3.93 attempts to use multithreading when possible and provides a good test for when a system has variable threaded load. WinRAR 4.2 does this a lot better! If a system has multiple speeds to invoke at different loading, the switching between those speeds will determine how well the system will do.

WinRAR 3.93

WinRAR 4.2

Due to the late inclusion of 4.2, our results list for it is a little smaller than I would have hoped. But it is interesting to note that with the Core Parking updates, an FX-8350 overtakes an i5-2500K with MCT.

Image Manipulation - FastStone Image Viewer 4.2

FastStone Image Viewer is a free piece of software I have been using for quite a few years now. It allows quick viewing of flat images, as well as resizing, changing color depth, adding simple text or simple filters. It also has a bulk image conversion tool, which we use here. The software currently operates only in single-thread mode, which should change in later versions of the software. For this test, we convert a series of 170 files, of various resolutions, dimensions and types (of a total size of 163MB), all to the .gif format of 640x480 dimensions.

FastStone Image Viewer 4.2

In terms of pure single thread speed, it is worth noting the X6-1100T is leading the AMD pack.

Video Conversion - Xilisoft Video Converter 7

With XVC, users can convert any type of normal video to any compatible format for smartphones, tablets and other devices. By default, it uses all available threads on the system, and in the presence of appropriate graphics cards, can utilize CUDA for NVIDIA GPUs as well as AMD WinAPP for AMD GPUs. For this test, we use a set of 33 HD videos, each lasting 30 seconds, and convert them from 1080p to an iPod H.264 video format using just the CPU. The time taken to convert these videos gives us our result.

Xilisoft Video Converter 7

XVC is a little odd in how it arranges its multicore processing. For our set of 33 videos, it will arrange them in batches of threads – so if we take the 8 thread FX-8350, it will arrange the videos into 4 batches of 8, and then a fifth batch of one. That final batch will only have one thread assigned to it (!), and will not get a full 8 threads worth of power. This is also why the 2x X5690 finishes in 6 seconds but the normal X5690 takes longer – you would expect a halving of time moving to two CPUs but XVC arranges the batches such that there is always one at the end that only gets a single thread.

Rendering – PovRay 3.7

The Persistence of Vision RayTracer, or PovRay, is a freeware package for as the name suggests, ray tracing. It is a pure renderer, rather than modeling software, but the latest beta version contains a handy benchmark for stressing all processing threads on a platform. We have been using this test in motherboard reviews to test memory stability at various CPU speeds to good effect – if it passes the test, the IMC in the CPU is stable for a given CPU speed. As a CPU test, it runs for approximately 2-3 minutes on high end platforms.

PovRay 3.7 Multithreaded Benchmark

The SMP engine in PovRay is not perfect, though scaling up in CPUs gives almost a 2x effect. The results from this test are great – here we see an FX-8350 CPU below an i7-3770K (with MCT), until the Core Parking updates are applied, meaning the FX-8350 performs better!

Video Conversion - x264 HD Benchmark

The x264 HD Benchmark uses a common HD encoding tool to process an HD MPEG2 source at 1280x720 at 3963 Kbps. This test represents a standardized result which can be compared across other reviews, and is dependent on both CPU power and memory speed. The benchmark performs a 2-pass encode, and the results shown are the average of each pass performed four times.

x264 HD Benchmark Pass 1

x264 HD Benchmark Pass 2

Grid Solvers - Explicit Finite Difference

For any grid of regular nodes, the simplest way to calculate the next time step is to use the values of those around it. This makes for easy mathematics and parallel simulation, as each node calculated is only dependent on the previous time step, not the nodes around it on the current calculated time step. By choosing a regular grid, we reduce the levels of memory access required for irregular grids. We test both 2D and 3D explicit finite difference simulations with 2ⁿ nodes in each dimension, using OpenMP as the threading operator in single precision. The grid is isotropic and the boundary conditions are sinks. Values are floating point, with memory cache sizes and speeds playing a part in the overall score.

Explicit Finite Difference Grid Solver (2D)

Explicit Finite Difference Grid Solver (3D)

Grid solvers do love a fast processor and plenty of cache in order to store data. When moving up to 3D, it is harder to keep that data within the CPU and spending extra time coding in batches can help throughput. Our simulation takes a very naïve approach in code, using simple operations.

Grid Solvers - Implicit Finite Difference + Alternating Direction Implicit Method

The implicit method takes a different approach to the explicit method – instead of considering one unknown in the new time step to be calculated from known elements in the previous time step, we consider that an old point can influence several new points by way of simultaneous equations. This adds to the complexity of the simulation – the grid of nodes is solved as a series of rows and columns rather than points, reducing the parallel nature of the simulation by a dimension and drastically increasing the memory requirements of each thread. The upside, as noted above, is the less stringent stability rules related to time steps and grid spacing. For this we simulate a 2D grid of 2ⁿ nodes in each dimension, using OpenMP in single precision. Again our grid is isotropic with the boundaries acting as sinks. Values are floating point, with memory cache sizes and speeds playing a part in the overall score.

Implicit Finite Difference Grid Solver (2D)

2D Implicit is harsher than an Explicit calculation – each thread needs more a lot memory, which only ever grows as the size of the simulation increases.

Point Calculations - n-Body Simulation

When a series of heavy mass elements are in space, they interact with each other through the force of gravity. Thus when a star cluster forms, the interaction of every large mass with every other large mass defines the speed at which these elements approach each other. When dealing with millions and billions of stars on such a large scale, the movement of each of these stars can be simulated through the physical theorems that describe the interactions. The benchmark detects whether the processor is SSE2 or SSE4 capable, and implements the relative code. We run a simulation of 10240 particles of equal mass - the output for this code is in terms of GFLOPs, and the result recorded was the peak GFLOPs value.

n-body Simulation via C++ AMP

As we only look at base/SSE2/SSE4 depending on the processor (auto-detection), we don’t see full AVX numbers in terms of FLOPs.

Testing Methodology, Hardware Configurations, and The Beast GPU Benchmarks: Metro2033

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

242 Comments

View All Comments

Patrese - Wednesday, May 8, 2013 - link
Awesome article, thanks! Is it possible to include some sort of gaming physics testing? Now that PhysX is beginning to catch some momentum, I'd be great to see if a 8-module AMD processor handles physics stuff better than a 4-core comparable Intel one, and at what point does a dedicated physics card starts to make sense, if at all.

I’d be also nice if a “mainstream gaming” article could be made too. Benchmarks at 1080p with cards like the 660Ti and 7850, for instance. No need for 3 way SLI/CF on those, so you'll not need as much time in Narnia. :)
araczynski - Wednesday, May 8, 2013 - link
interesting read, although i find it too focused to be of much general use (or useful future reference). i'd like to have seen for example how an E8500 holds up (too big of a gap between E6500 and i52500), as well as at least ONE game i would even bother playing (skyrim/witcher/etc). and of course like you mentioned, even a slightly bigger sampling of graphics cards. (i think you mentioned that).

anywho, i realize this wasn't meant to be anything exhaustive (i do appreciate having the CPU/GPU benches available here as a good reference though), and i do like the detail/explanation length you went into.

so thanks :)
xinthius - Wednesday, May 8, 2013 - link
But AMD offers good price to performance at lower tiers, they should be recommend.
yougotkicked - Wednesday, May 8, 2013 - link
Regarding your comments on the role of artificial intelligence in game performance/programming: I've just finished a course in AI, and while implementations may vary quite a bit from game to game, many AI programs can be reduced to highly-parallel brute-force computation, simply evaluating the resulting states of many potential decisions for a numerical representation of their desirability, then selecting the best option from the set of evaluated actions. Obviously this is something that will vary greatly from game to game, but in games with many independent AI managed elements, I would expect a certain amount of the processing to be offloaded to the GPU.

Other than that I agree with you on the demands of AI in games; my professor (who specializes in game AI and has experience in the industry) said that the AI is usually given about 10% of the CPU time in a game, so it's rarely a limiting factor.

I'm still working through the whole article (really enjoying it so far) so I'm sure I'll have many more comments/questions later.
IanCutress - Wednesday, May 8, 2013 - link
Based on previous CUDA experience, CUDA doesn't like a lot of IF statements in its routines. So if you're offloading different AI parts onto the GPU, unless all the elements are being put through the same set of if commands (and states), it won't work too well, with some warps taking a lot longer than others if there is large branch deviation. It's a task suited to MIMD environments, like a CPU. Then again, it really depends on the game. Clever AI is already here, because we confine it to a self-created system. One could argue that the bots in CounterStrike are not particularly smart, but the system can put their accuracy up to 100% to make it harder. It's a lot of give and take, perhaps. It is times like these I wish I did CompSci rather than Chemistry :) I need to take one of those MIT online AI courses. You know, inbetween testing!

Ian
yougotkicked - Wednesday, May 8, 2013 - link
I suppose conditionals would make offloading some AI components to the GPU impractical, but there still remains a subset of AI computations which seem very GPU friendly to me. State evaluation functions seems like a prime example, the CPU would be responsible for deciding which options to evaluate, building an array of states to be evaluated by the GPU. These situations probably don't come up very often in FPSs, but in something like Civilization I can see it being quite common.

I've actually got to head over to that class now, I'll ask the professor if he knows of any AI's using GPU computing in modern games.
airmantharp - Wednesday, May 8, 2013 - link
Like Ian said, GPU's aren't good 'branch' processors, but I do see where you're coming from. Things like real physics, audio environment maps, and pre-render lighting maps could be fed to AI routines running on the CPU. This would allow for a much greater 'simulation awareness' for AI actions.
yougotkicked - Wednesday, May 8, 2013 - link
I spoke with my professor and he said that as far as he knows, many people have discussed to prospect of using GPU's for AI, but nobody has actually done so yet. He's going to ask some friends of his at some major game studios to see if they are working on it.

He did agree with me that there are some aspects that could be computed on a GPU, but a lot of the existing AI methods are inherently sequential, so offloading it to the GPU will require new algorithms in many cases.
TheQweaker - Thursday, May 9, 2013 - link
You may wish to check nVidia's GTC conference web site where you can find some GPU AI Research. Also, nVidia published various PDF slides on GPU Path Planning.

If you look deeper in some specific AI Domains such as, say, AI Planning (first used in F.E.A.R. in 2005, lately used in KillZone 3 and Transformers 3: The Fall of the Cybertron) you can find papers investigating the use of GPUs.

On of the bottom line of current GPU AI research is that GPUs crunch large numbers of data very fast so, currently, there is not much hope in using the many GPU threads for tiny amounts of data of state space search.

Hoping this helps.

-- The Qweaker.
yougotkicked - Thursday, May 9, 2013 - link
Thanks for pointing me towards those papers, they look pretty interesting and I've been looking for a topic to write my final paper on ;)

Choosing a Gaming CPU: Single + Multi-GPU at 1440p, April 2013

Post Your Comment

242 Comments

View All Comments

Patrese - Wednesday, May 8, 2013 - link

araczynski - Wednesday, May 8, 2013 - link

xinthius - Wednesday, May 8, 2013 - link

yougotkicked - Wednesday, May 8, 2013 - link

IanCutress - Wednesday, May 8, 2013 - link

yougotkicked - Wednesday, May 8, 2013 - link

airmantharp - Wednesday, May 8, 2013 - link

yougotkicked - Wednesday, May 8, 2013 - link

TheQweaker - Thursday, May 9, 2013 - link

yougotkicked - Thursday, May 9, 2013 - link

Log in

Don't have an account? Sign up now