CPU Benchmarks

Point Calculations - 3D Movement Algorithm Test

The algorithms in 3DPM employ either uniform or normally distributed random number generation, and vary in their use of trigonometric operations, conditional statements, generation and rejection, fused operations, etc. The benchmark runs through six algorithms for a specified number of particles and steps, calculates the speed of each algorithm, then sums them for a final score. This is an example of a real-world situation that a computational scientist may find themselves in, rather than a pure synthetic benchmark. The benchmark is also parallel across the particles simulated, and we test single-threaded as well as multi-threaded performance.
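As a rough illustration of what such algorithms look like, here is a minimal Python sketch of two ways to pick a random unit step in 3D: one using uniform numbers with generation and rejection, one using normally distributed numbers. This is not the 3DPM source. The function names, the unit step length and the fixed seed are all assumptions made for the example.

```python
import math
import random


def step_rejection(rng):
    """Unit direction from uniform numbers: draw a point in the cube
    [-1, 1]^3, reject it unless it lies inside the unit sphere, then
    scale it onto the sphere surface."""
    while True:
        x, y, z = (rng.uniform(-1.0, 1.0) for _ in range(3))
        r2 = x * x + y * y + z * z
        if 0.0 < r2 <= 1.0:
            r = math.sqrt(r2)
            return (x / r, y / r, z / r)


def step_normal(rng):
    """Unit direction from three normally distributed components,
    normalized to length one (no rejection needed in practice)."""
    while True:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        r = math.sqrt(x * x + y * y + z * z)
        if r > 0.0:
            return (x / r, y / r, z / r)


def simulate(step_fn, particles, steps, seed=0):
    """Move each particle from the origin by `steps` random unit steps.
    Particles are independent, which is what makes the work parallel."""
    rng = random.Random(seed)
    positions = []
    for _ in range(particles):
        px = py = pz = 0.0
        for _ in range(steps):
            dx, dy, dz = step_fn(rng)
            px += dx
            py += dy
            pz += dz
        positions.append((px, py, pz))
    return positions
```

Timing `simulate` for each step function and dividing particles × steps by the elapsed time would give a per-algorithm speed similar in spirit to the score described above.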

3D Particle Movement Single Threaded

3D Particle Movement MultiThreaded

As mentioned in previous reviews, this benchmark is written how most people would tackle the situation – using floating point numbers. This is also where Intel excels, compared to AMD’s decision to move more towards INT ops (such as hashing), which are typically associated with optimized code or normal OS behavior.

Compression - WinRAR x64 3.93 + WinRAR 4.2

With 64-bit WinRAR, we compress the set of files used in our motherboard USB speed tests. WinRAR x64 3.93 attempts to use multithreading when possible and provides a good test for when a system has variable threaded load. WinRAR 4.2 does this a lot better! If a system has multiple speeds to invoke at different loading, the switching between those speeds will determine how well the system will do.

WinRAR 3.93

WinRAR 4.2

Due to the late inclusion of 4.2, our results list for it is a little smaller than I would have hoped. But it is interesting to note that with the Core Parking updates, an FX-8350 overtakes an i5-2500K with MCT.

Image Manipulation - FastStone Image Viewer 4.2

FastStone Image Viewer is a free piece of software I have been using for quite a few years now. It allows quick viewing of flat images, as well as resizing, changing color depth, adding simple text or simple filters. It also has a bulk image conversion tool, which we use here. The software currently operates only in single-thread mode, which should change in later versions of the software. For this test, we convert a series of 170 files, of various resolutions, dimensions and types (of a total size of 163MB), all to the .gif format of 640x480 dimensions.

FastStone Image Viewer 4.2

In terms of pure single thread speed, it is worth noting the X6-1100T is leading the AMD pack.

Video Conversion - Xilisoft Video Converter 7

With XVC, users can convert any type of normal video to any compatible format for smartphones, tablets and other devices. By default, it uses all available threads on the system, and in the presence of appropriate graphics cards, can utilize CUDA for NVIDIA GPUs as well as AMD WinAPP for AMD GPUs. For this test, we use a set of 33 HD videos, each lasting 30 seconds, and convert them from 1080p to an iPod H.264 video format using just the CPU. The time taken to convert these videos gives us our result.

Xilisoft Video Converter 7

XVC is a little odd in how it arranges its multicore processing. For our set of 33 videos, it will arrange them in batches of threads – so if we take the 8 thread FX-8350, it will arrange the videos into 4 batches of 8, and then a fifth batch of one. That final batch will only have one thread assigned to it (!), and will not get a full 8 threads worth of power. This is also why the 2x X5690 finishes in 6 seconds but the normal X5690 takes longer – you would expect a halving of time moving to two CPUs but XVC arranges the batches such that there is always one at the end that only gets a single thread.
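The batching arithmetic described above can be checked with a few lines of Python. This is only a sketch of the behavior as observed in our testing, not of XVC's internals, and `batch_sizes` is a hypothetical helper name.

```python
def batch_sizes(videos, threads):
    """Split `videos` jobs into batches of at most `threads` jobs each,
    one job per thread, with any remainder forming a final short batch."""
    full, rest = divmod(videos, threads)
    return [threads] * full + ([rest] if rest else [])


# 33 videos on an 8-thread CPU: four full batches, then one video alone.
print(batch_sizes(33, 8))
```

The last batch is the costly one: with a single video left, seven of the eight threads sit idle while it finishes.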

Rendering – PovRay 3.7

The Persistence of Vision RayTracer, or PovRay, is a freeware package for, as the name suggests, ray tracing. It is a pure renderer, rather than modeling software, but the latest beta version contains a handy benchmark for stressing all processing threads on a platform. We have been using this test in motherboard reviews to test memory stability at various CPU speeds to good effect – if it passes the test, the IMC in the CPU is stable for a given CPU speed. As a CPU test, it runs for approximately 2-3 minutes on high end platforms.

PovRay 3.7 Multithreaded Benchmark

The SMP engine in PovRay is not perfect, though scaling up in CPUs gives almost a 2x effect. The results from this test are great – here we see an FX-8350 CPU below an i7-3770K (with MCT), until the Core Parking updates are applied, meaning the FX-8350 performs better!

Video Conversion - x264 HD Benchmark

The x264 HD Benchmark uses a common HD encoding tool to process an HD MPEG2 source at 1280x720 at 3963 Kbps. This test represents a standardized result which can be compared across other reviews, and is dependent on both CPU power and memory speed. The benchmark performs a 2-pass encode, and the results shown are the average of each pass performed four times.

x264 HD Benchmark Pass 1

x264 HD Benchmark Pass 2

Grid Solvers - Explicit Finite Difference

For any grid of regular nodes, the simplest way to calculate the next time step is to use the values of those around it. This makes for easy mathematics and parallel simulation, as each node calculated is only dependent on the previous time step, not the nodes around it on the current calculated time step. By choosing a regular grid, we avoid the extra levels of memory access that irregular grids require. We test both 2D and 3D explicit finite difference simulations with 2^n nodes in each dimension, using OpenMP as the threading operator in single precision. The grid is isotropic and the boundary conditions are sinks. Values are floating point, with memory cache sizes and speeds playing a part in the overall score.
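To make the idea concrete, here is a minimal Python sketch of one explicit time step on a 2D grid. It assumes a simple diffusion-style update with a five-point stencil and a coefficient `alpha`. The benchmark's actual equation and code are not given here, so treat the stencil, the names and the plain Python lists as illustrative only.

```python
def explicit_step(u, alpha):
    """Advance a square 2D grid by one explicit time step.

    Every new value depends only on values from the previous step,
    so all nodes could be updated in parallel. Boundary nodes are
    left at zero, acting as sinks.
    """
    n = len(u)
    new = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            neighbours = u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
            new[i][j] = u[i][j] + alpha * (neighbours - 4.0 * u[i][j])
    return new


# Usage: a single hot node in the middle of an 8x8 grid spreads outwards.
grid = [[0.0] * 8 for _ in range(8)]
grid[4][4] = 1.0
grid = explicit_step(grid, 0.2)
```

For this particular stencil the update is only stable for `alpha` up to 0.25, which is the kind of time step restriction that explicit methods carry.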

Explicit Finite Difference Grid Solver (2D)

Explicit Finite Difference Grid Solver (3D)

Grid solvers do love a fast processor and plenty of cache in order to store data. When moving up to 3D, it is harder to keep that data within the CPU and spending extra time coding in batches can help throughput. Our simulation takes a very naïve approach in code, using simple operations.

Grid Solvers - Implicit Finite Difference + Alternating Direction Implicit Method

The implicit method takes a different approach to the explicit method – instead of considering one unknown in the new time step to be calculated from known elements in the previous time step, we consider that an old point can influence several new points by way of simultaneous equations. This adds to the complexity of the simulation – the grid of nodes is solved as a series of rows and columns rather than points, reducing the parallel nature of the simulation by a dimension and drastically increasing the memory requirements of each thread. The upside, as noted above, is less stringent stability rules related to time steps and grid spacing. For this we simulate a 2D grid of 2^n nodes in each dimension, using OpenMP in single precision. Again our grid is isotropic with the boundaries acting as sinks. Values are floating point, with memory cache sizes and speeds playing a part in the overall score.
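Solving a whole row at once usually comes down to a tridiagonal system of simultaneous equations. The Python sketch below shows one implicit row sweep using the Thomas algorithm, assuming a diffusion-style coefficient `alpha` and sink boundaries. The benchmark's own solver is not shown in this article, so the names and the choice of algorithm here are assumptions. A full alternating direction implicit step would apply such a sweep along rows and then along columns.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm.

    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination.
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def implicit_row_sweep(row, alpha):
    """Solve (1 + 2*alpha)*x[i] - alpha*x[i-1] - alpha*x[i+1] = row[i]
    for the interior nodes of one row, with zero (sink) values outside."""
    n = len(row)
    return thomas([-alpha] * n, [1.0 + 2.0 * alpha] * n, [-alpha] * n, row)
```

Each sweep needs scratch arrays as long as the row, which hints at why the per-thread memory footprint is larger than in the explicit case.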

Implicit Finite Difference Grid Solver (2D)

2D Implicit is harsher than an Explicit calculation – each thread needs a lot more memory, which only ever grows as the size of the simulation increases.

Point Calculations - n-Body Simulation

When a series of heavy mass elements are in space, they interact with each other through the force of gravity. Thus when a star cluster forms, the interaction of every large mass with every other large mass defines the speed at which these elements approach each other. When dealing with millions and billions of stars on such a large scale, the movement of each of these stars can be simulated through the physical theorems that describe the interactions. The benchmark detects whether the processor is SSE2 or SSE4 capable, and implements the relevant code. We run a simulation of 10240 particles of equal mass - the output for this code is in terms of GFLOPs, and the result recorded was the peak GFLOPs value.
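The every-mass-with-every-other-mass interaction can be sketched as a direct sum in Python. This is a scalar illustration only, not the SSE code the benchmark runs. The gravitational constant `G`, the softening length `soft` and the function name are assumptions added for the example.

```python
import math


def accelerations(pos, mass, G=1.0, soft=1e-3):
    """Gravitational acceleration on every particle from every other one.

    pos: list of (x, y, z) tuples; mass: list of masses.
    `soft` is a small softening length that avoids division by zero
    when two particles come very close together.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            dz = pos[j][2] - pos[i][2]
            r2 = dx * dx + dy * dy + dz * dz + soft * soft
            s = G * mass[j] / (r2 * math.sqrt(r2))
            acc[i][0] += s * dx
            acc[i][1] += s * dy
            acc[i][2] += s * dz
    return acc
```

The double loop is why the workload grows with the square of the particle count, and why the inner arithmetic is a natural fit for SIMD instructions.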

n-body Simulation via C++ AMP

As we only look at base/SSE2/SSE4 depending on the processor (auto-detection), we don’t see full AVX numbers in terms of FLOPs.


242 Comments


  • JarredWalton - Wednesday, May 8, 2013

    "While I haven't programmed AI..." Doesn't that make most of your other assumptions and guesses related to this area invalid?

    As for the rest, the point of the article isn't to compare HD 7970 with GTX 580, or to look at pure CPU scaling; rather, it's to look at CPU and GPU scaling in games at settings people are likely to use with a variety of CPUs, which necessitates using multiple motherboards. Given that in general people aren't going to buy two or three GPUs to run at lower resolutions and detail settings, the choice to run 1440p makes perfect sense: it's not so far out of reach that people don't use it, and it will allow the dual, triple, and quad GPU setups room to stretch (when they can).

    The first section shows CPU performance comparison, just as a background to the gaming comparisons. We can see how huge the gap is in CPU performance between a variety of processors, but how does that translate to gaming, and in particular, how does it translate to gaming with higher performance GPUs? People don't buy a Radeon HD 5450 for serious gaming, and they likely don't play games.

    For the rest: there is no subset of games that properly encompass "what people actually play". But if we're looking at what people play, it's going to include a lot of Flash games and Facebook games that work fine on Intel HD 4000. I guess we should just stop there? In other words, we know the limitations of the testing, and there will always be limitations. We can list many more flaws or questions that you haven't, but if you're interested in playing games on a modern PC, and you want to know a good choice for your CPU and GPU(s), the article provides a good set of data to help you determine if you might want to upgrade or not. If you're happy playing at 1366x768 and Medium detail, no, this won't help much. If you want minimum detail and maximum frame rate at 1080p, it's also generally useless. I'd argue however that the people looking for either of those are far less in number, or at least if they do exist they're not looking to research gaming performance until it affects them.
  • wcg66 - Wednesday, May 8, 2013

    Ian, thanks for this. I'd really like to see how these tests change at even higher resolutions, 3 monitor setups of 5760x1080, for example. There are folks claiming that the additional PCIe lanes in the i7 e-series makes for significantly better performance. Your results don't bear this out. If anything the 3930K is behind or sometimes barely ahead (if you consider error margins, arguably it's on par with the regular i7 chips.) I own an i7 2700K and 3930K.
  • Moon Patrol - Wednesday, May 8, 2013

    Awesome review! Very impressed with the effort and time put into this! Thanks a lot!
    It'd be cool if you could maybe fit an i7 860 in somewhere over there. Socket 1156 is feeling left out :P I have i7 860...
  • Quizzical - Wednesday, May 8, 2013

    Great data for people who want to overload their video card and figure out which CPU will help them do it. But it's basically worthless for gamers who want to make games run smoothly and look nice and want to know what CPU will help them do it.

    Would you do video card benchmarks by running undemanding games at minimum settings and using an old single core Celeron processor? That's basically the video card equivalent to treating this as a CPU benchmark. The article goes far out of its way to make things GPU-bound so that you can't see differences between CPUs, both by the games chosen and the settings within those games.

    But hey, if you want to compare a Radeon HD 7970 to a GeForce GTX 580, this is the definitive article for it and there will never be a better data set for that.
  • JarredWalton - Wednesday, May 8, 2013

    Troll much? The article clearly didn't go too far out of the way to make things GPU bound, as evidenced by the fact that two of the games aren't GPU bound even with a single 7970. How many people out there buy a 7970 to play at anything less than 1080p -- or even at 1080p? I'd guess most 7970 owners are running at least 1440p or multi-monitor...or perhaps just doing Bitcoin, but that's not really part of the discussion here, unless the discussion is GPU hashing prowess.
  • Quizzical - Wednesday, May 8, 2013

    If they're not GPU bound with a single 7970, then why does adding a second 7970 (or a second GTX 580) greatly increase performance in all four games? That can't happen if you're looking mostly at a CPU bottleneck, as it means that the CPU is doing a lot more work than before in order to deliver those extra frames. Indeed, sometimes it wouldn't happen even if you were purely GPU bound, as CrossFire and SLI don't always work properly.

    If you're trying to compare various options for a given component, you try to do tests where the different benchmark results will mostly reflect differences in the particular component that you're trying to test. If you're trying to compare video cards, you want differences in scores to mostly reflect video card performance rather than being bottlenecked by something else. If you're trying to compare solid state drives, you want differences in scores to mostly reflect differences in solid state drive performance rather than being bottlenecked by something else. And if you're trying to compare processors, you want differences in scores to mostly reflect differences in CPU performance, not to get results that mostly say, hey, we managed to make everything mostly limited by the GPU.

    When you're trying to do benchmarks to compare video cards, you (or whoever does video card reviews on this site) understand this principle perfectly well. A while back, there was a review on this site in which the author (which might be you; I don't care to look it up) specifically said that he wanted to use Skyrim, but it was clearly CPU-bound for a bunch of video cards, so it wasn't included in the review.

    If you're not trying to make the games largely GPU bound, then why do you go to max settings? Why don't you turn off the settings that you know put a huge load on the GPU and don't meaningfully affect the CPU load? If you're doing benchmarking, the only reason to turn on settings that you know put a huge load on the GPU and no meaningful load on anything else is precisely that you want to be GPU bound. That makes sense for a video card review. Not so much if you're trying to compare processors.
  • JarredWalton - Wednesday, May 8, 2013

    You go to max settings because that's what most people with a 7970 (or two or three or four) are going to use. This isn't a purely CPU benchmark article, and it's not a purely GPU benchmark article; it's both, and hence, the benchmarks and settings are going to have to compromise somewhat.

    Ian could do a suite of testing at 640x480 (or maybe just 1366x768) in order to move the bottleneck more to the CPU, but no one in their right mind plays at that resolution with a high-end GPU. On a laptop, sure, but on a desktop with an HD 7970 or a GTX 580? Not a chance! And when you drop settings down to minimum (or even medium), it does change the CPU dynamic a lot -- less textures, less geometry, less everything. I've encountered games where even when I'm clearly CPU limited, Ultra quality is half the performance of Medium quality.
  • IndianaKrom - Friday, May 10, 2013

    Basically for the most part the single GPU game tests tell us absolutely nothing about the CPU because save for a couple especially old or low end CPUs, none of them even come close to hindering the already completely saturated GPU. The 2-4 GPU configurations are much more interesting because they show actual differences between different CPU and motherboard configurations. I do think it would be interesting to also show a low resolution test which would help reveal the impact of crossfire / SLI overhead versus a single more powerful GPU and could more directly expose the CPU limit.
  • Zink - Wednesday, May 8, 2013

    You should use a DSLR and edit the pictures better. The cover image is noisy and lacks contrast.
  • makerofthegames - Wednesday, May 8, 2013

    Very interesting article. And a lot of unwarranted criticism in the comments.

    I'm kind of disappointed that the dual Xeons failed so many benchmarks. I was looking to see how I should upgrade my venerable 2x5150 machine - whether to go with fast dual-cores, or with similar-speed quad-cores. But all the benchmarks for the Xeons were either "the same as every other CPU", or "no results".

    Oh well, I have more important things to upgrade on it anyways. And I realize that "people using Xeon 5150s for gaming" is a segment about as big as "Atom gamers".
