CPU Benchmarks

Point Calculations - 3D Movement Algorithm Test

The algorithms in 3DPM employ either uniform or normally distributed random number generation, and vary in their mix of trigonometric operations, conditional statements, generation and rejection, fused operations, etc.  The benchmark runs through six algorithms for a specified number of particles and steps, calculates the speed of each algorithm, then sums them for a final score.  This is an example of a real-world situation that a computational scientist may find themselves in, rather than a purely synthetic benchmark.  The benchmark is also parallel across the particles simulated, and we test both single-threaded and multithreaded performance.
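To make the description above concrete, here is a minimal sketch of what such a particle movement test could look like. It is not the 3DPM source: the three step functions, the names `run` and `total_score`, and the particle and step counts are all assumptions chosen for illustration. Each function picks a random unit direction in a different way (trigonometry, generation and rejection, normally distributed values), and the score is the sum of the steps per second achieved by each.

```python
import math
import random
import time


def step_trig(rng):
    # Uniform direction on the unit sphere using trigonometric operations.
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return r * math.cos(phi), r * math.sin(phi), z


def step_rejection(rng):
    # Generate points in a cube and reject those outside the unit sphere,
    # then normalise the accepted point onto the sphere surface.
    while True:
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        z = rng.uniform(-1.0, 1.0)
        d2 = x * x + y * y + z * z
        if 0.0 < d2 <= 1.0:
            d = math.sqrt(d2)
            return x / d, y / d, z / d


def step_normal(rng):
    # Normalise three normally distributed values to get a unit direction.
    while True:
        x = rng.gauss(0.0, 1.0)
        y = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        d = math.sqrt(x * x + y * y + z * z)
        if d > 0.0:
            return x / d, y / d, z / d


# Hypothetical stand-ins; the real benchmark runs six algorithms.
ALGORITHMS = [step_trig, step_rejection, step_normal]


def run(algorithm, particles, steps, seed=0):
    # Move every particle `steps` times and report steps per second.
    rng = random.Random(seed)
    positions = []
    start = time.perf_counter()
    for _ in range(particles):
        px = py = pz = 0.0
        for _ in range(steps):
            dx, dy, dz = algorithm(rng)
            px += dx
            py += dy
            pz += dz
        positions.append((px, py, pz))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return positions, particles * steps / elapsed


def total_score(particles=200, steps=50):
    # Sum the per-algorithm speeds into one final score.
    return sum(run(a, particles, steps)[1] for a in ALGORITHMS)
```

Because each particle is independent of the others, the outer loop over particles is the natural place to split the work across threads.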

3D Particle Movement Single Threaded

For single-threaded performance, the higher MHz Haswell CPUs sit at the top of the list; interestingly enough, it is the Xeons.  Comparing these to the i7-4960X, which also sits at 4 GHz, shows the generational difference in this purely single-threaded test.  The 100 MHz difference between the i5-4670K and the i7-4770K shows up as two points in this test.  The s1366 CPUs are staggered between a score of 90.93 and 115.79, with the i7-920 falling short of the X6-1100T.  Due to the IPC difference, the i7-990X is behind the i5-2500K and anything newer at a similar MHz.

3D Particle Movement MultiThreaded

For the multithreaded test, cores and MHz with FP performance win out, so the i5-4670K, even in a motherboard with Multi-Core Turbo, sits behind the eight threads of the FX-8350 and six threads of the X6-1100T.  The i7-4770K scores another 75%, along with the Xeons.  In terms of the Nehalem CPUs, the i7-990X scores 200 points higher than the latest Haswell CPUs due to its six core / twelve thread design.  Unfortunately the i7-920/i7-950 are a little behind, with the i7-2600K offering a noticeable boost.

Compression - WinRAR x64 3.93 + WinRAR 4.2

With 64-bit WinRAR, we compress the set of files used in the USB speed tests.  WinRAR x64 3.93 attempts to use multithreading when possible, and provides a good test for when a system has a variable threaded load.  WinRAR 4.2 does this a lot better!  If a system has multiple speeds to invoke at different loading, the switching between those speeds will determine how well the system will do.

WinRAR 3.93

The only downside with WinRAR is that when you're dealing with slow CPUs, they are very slow!  The quad core Nehalem CPUs are kept on track by the FX-8350 using this older version of WinRAR, although it seems the higher IPC wins out here over cores with the 4.0 GHz Haswell Xeons scoring best.

WinRAR 4.2

The improvements in WinRAR 4.2 due to optimisations and multithreading result in more cores giving better results.  The i7-990X does well here, although Sandy Bridge-E and Ivy Bridge-E take the top spots.  Due to the threading advantage WinRAR takes, the i7-4770K gets a 20 second advantage over its non-hyperthreaded cousin, the i5-4670K.

Image Manipulation - FastStone Image Viewer 4.2

FastStone Image Viewer is a free piece of software I have been using for quite a few years now.  It allows quick viewing of flat images, as well as resizing, changing color depth, adding simple text or simple filters.  It also has a bulk image conversion tool, which we use here.  The software currently operates only in single-thread mode, which should change in later versions of the software.  For this test, we convert a series of 170 files, of various resolutions, dimensions and types (of a total size of 163MB), all to the .gif format of 640x480 dimensions.

FastStone Image Viewer 4.2

FastStone loves single-threaded IPC and MHz, so it's no surprise for the Haswell CPUs to be on top, with no discernible difference between the i5-4670K and the i7-4770K.  The old school Nehalems take a knock, with the i7-920 being almost a full 60% slower than the top scores.

Video Conversion - Xilisoft Video Converter 7

With XVC, users can convert any type of normal video to any compatible format for smartphones, tablets and other devices.  By default, it uses all available threads on the system, and in the presence of appropriate graphics cards, can utilize CUDA for NVIDIA GPUs as well as AMD WinAPP for AMD GPUs.  For this test, we use a set of 33 HD videos, each lasting 30 seconds, and convert them from 1080p to an iPod H.264 video format using just the CPU.  The time taken to convert these videos gives us our result.

Xilisoft Video Converter 7

For fully multithreaded video conversion, a combination of cores, IPC and MHz takes the top spots, hence the i7-4960X is the consumer CPU to get.  The i7-990X has a smaller advantage over the quad core Haswells this time, and here is one benchmark where the i5-4670K falls behind the FX-8350 due to the integer nature of the workload.  Interestingly enough, the i5-4430 slots in with an i5-2500K due to IPC increases, despite lower power consumption and MHz.

Rendering – PovRay 3.7

The Persistence of Vision RayTracer, or PovRay, is a freeware package for, as the name suggests, ray tracing.  It is a pure renderer, rather than modeling software, but the latest beta version contains a handy benchmark for stressing all processing threads on a platform.  We have been using this test in motherboard reviews to test memory stability at various CPU speeds to good effect: if it passes the test, the IMC in the CPU is stable for a given CPU speed.  As a CPU test, it runs for approximately 2-3 minutes on high end platforms.

PovRay 3.7 Multithreaded Benchmark

PovRay is another 'multithreading takes all', as shown by our 4P testing on E5-4650L CPUs.  The i7-990X still shows its worth, being as quick as the i7-4770K at least, although the i7-920 and i7-950 are further down the pecking order.

Video Conversion - x264 HD Benchmark

The x264 HD Benchmark uses a common HD encoding tool to process an HD MPEG2 source at 1280x720 at 3963 Kbps.  This test represents a standardized result which can be compared across other reviews, and is dependent on both CPU power and memory speed.  The benchmark performs a 2-pass encode, and the results shown are the average of each pass performed four times.

x264 HD Benchmark Pass 1

x264 HD Benchmark Pass 2

Grid Solvers - Explicit Finite Difference

For any grid of regular nodes, the simplest way to calculate the next time step is to use the values of those around it.  This makes for easy mathematics and parallel simulation, as each node calculated is only dependent on the previous time step, not the nodes around it on the current calculated time step.  By choosing a regular grid, we reduce the levels of memory access required for irregular grids.  We test both 2D and 3D explicit finite difference simulations with 2^n nodes in each dimension, using OpenMP as the threading operator in single precision.  The grid is isotropic and the boundary conditions are sinks.  Values are floating point, with memory cache sizes and speeds playing a part in the overall score.
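As an illustration of the explicit scheme described above, the following is a minimal 2D sketch in plain Python rather than the OpenMP code used for the benchmark. The function names, the point-source starting condition and the value of `alpha` (the diffusion number, which must stay at or below 0.25 for this 2D scheme to remain stable) are assumptions made for the example.

```python
def explicit_step(grid, alpha=0.2):
    # Compute the next time step from the previous one only.
    # Every new node depends on its four old neighbours, so all
    # nodes could be updated in parallel.
    n = len(grid)
    new = [[0.0] * n for _ in range(n)]  # boundary nodes stay at 0 (sinks)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            neighbours = (grid[i - 1][j] + grid[i + 1][j]
                          + grid[i][j - 1] + grid[i][j + 1])
            new[i][j] = grid[i][j] + alpha * (neighbours - 4.0 * grid[i][j])
    return new


def simulate(exponent=4, steps=10, alpha=0.2):
    # Build a grid with 2^exponent nodes per dimension and a single
    # point source in the middle, then advance it `steps` times.
    n = 2 ** exponent
    grid = [[0.0] * n for _ in range(n)]
    grid[n // 2][n // 2] = 1.0
    for _ in range(steps):
        grid = explicit_step(grid, alpha)
    return grid
```

The two-buffer layout (`grid` in, `new` out) is what keeps each node independent of its neighbours on the current time step; the cost is that both buffers have to fit in cache to run at full speed.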

Explicit Finite Difference Grid Solver (2D)

Explicit Finite Difference Grid Solver (3D)

Grid solvers do love a fast processor and plenty of cache in order to store data.  When moving up to 3D, it is harder to keep that data within the CPU and spending extra time coding in batches can help throughput.  Our simulation takes a very naïve approach in code, using simple operations, but that doesn't stop the single socket, highly threaded CPUs taking top spots.  The i5-4670K takes a surprising twist in 2D, outpacing the i7-4770K.

Grid Solvers - Implicit Finite Difference + Alternating Direction Implicit Method

The implicit method takes a different approach to the explicit method: instead of considering one unknown in the new time step to be calculated from known elements in the previous time step, we consider that an old point can influence several new points by way of simultaneous equations.  This adds to the complexity of the simulation: the grid of nodes is solved as a series of rows and columns rather than points, reducing the parallel nature of the simulation by a dimension and drastically increasing the memory requirements of each thread.  The upside, as noted above, is the less stringent stability rules related to time steps and grid spacing.  For this we simulate a 2D grid of 2^n nodes in each dimension, using OpenMP in single precision.  Again our grid is isotropic with the boundaries acting as sinks.  Values are floating point, with memory cache sizes and speeds playing a part in the overall score.
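The row-and-column structure described above can be sketched as follows. This is an assumed, textbook-style alternating direction implicit step (Peaceman-Rachford splitting with a Thomas tridiagonal solver), not the benchmark's own code; the function names and the parameter `r` (half the diffusion number) are illustrative, and the sinks are modelled as zero values just outside the grid.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    # Thomas algorithm: forward elimination, then back substitution.
    # lower[0] and upper[-1] are never used by the solution.
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def half_step(grid, r):
    # Implicit along each row, explicit across rows.
    # Each row is one set of simultaneous equations, so the unit of
    # parallel work is a whole row rather than a single node.
    n = len(grid)
    lower = [-r] * n
    diag = [1.0 + 2.0 * r] * n
    upper = [-r] * n
    zeros = [0.0] * n  # sink values just outside the grid
    out = []
    for j, row in enumerate(grid):
        above = grid[j - 1] if j > 0 else zeros
        below = grid[j + 1] if j < n - 1 else zeros
        rhs = [row[i] + r * (above[i] - 2.0 * row[i] + below[i])
               for i in range(n)]
        out.append(solve_tridiagonal(lower, diag, upper, rhs))
    return out


def transpose(grid):
    return [list(column) for column in zip(*grid)]


def adi_step(grid, r):
    # First half step is implicit along rows, second along columns.
    half = half_step(grid, r)
    return transpose(half_step(transpose(half), r))
```

Each tridiagonal solve needs scratch arrays the length of a full row, which is one way to see why the per-thread memory footprint grows compared with the explicit method.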

Implicit Finite Difference Grid Solver (2D)

If anything, large caches matter more in implicit simulation, in line with both cores and threads.  The i5-4430 is on the lower rungs of the Intel bloc, but the 990X is at the top.

Point Calculations - n-Body Simulation

When a series of heavy mass elements are in space, they interact with each other through the force of gravity.  Thus when a star cluster forms, the interaction of every large mass with every other large mass defines the speed at which these elements approach each other.  When dealing with millions and billions of stars on such a large scale, the movement of each of these stars can be simulated through the physical theorems that describe the interactions.  The benchmark detects whether the processor is SSE2 or SSE4 capable, and implements the relevant code.  We run a simulation of 10240 particles of equal mass; the output for this code is in terms of GFLOPs, and the result recorded is the peak GFLOPs value.
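The all-pairs interaction described above can be sketched in a few lines. This is a plain Python illustration, not the SSE or C++ AMP code the benchmark runs: the function names, the softening term, the use of units where G = 1, and the figure of 20 floating point operations per interaction used for the GFLOPs estimate are all assumptions for the example.

```python
import math
import time

# Assumed cost of one pairwise interaction, used only for the estimate.
FLOPS_PER_INTERACTION = 20


def accelerations(positions, mass=1.0, softening=1e-3):
    # Direct summation: every particle feels every other particle,
    # so the work grows with the square of the particle count.
    eps2 = softening * softening
    result = []
    for i, (xi, yi, zi) in enumerate(positions):
        ax = ay = az = 0.0
        for j, (xj, yj, zj) in enumerate(positions):
            if i == j:
                continue
            dx = xj - xi
            dy = yj - yi
            dz = zj - zi
            d2 = dx * dx + dy * dy + dz * dz + eps2
            scale = mass / (d2 * math.sqrt(d2))  # G = 1 in these units
            ax += dx * scale
            ay += dy * scale
            az += dz * scale
        result.append((ax, ay, az))
    return result


def estimate_gflops(positions):
    # Time one full force evaluation and convert it to GFLOPs.
    n = len(positions)
    start = time.perf_counter()
    accelerations(positions)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return n * (n - 1) * FLOPS_PER_INTERACTION / elapsed / 1e9
```

With 10240 particles, one force evaluation covers roughly 105 million pairwise interactions, which is why vector extensions and core count both show up so clearly in the results.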

n-body Simulation via C++ AMP

Due to extension enhancements, we see that a quad core Haswell Xeon scores roughly the same as the hex-core Nehalem, with the i5-4430 not far behind. If anything, the i7-920 and i7-950 take a nose dive here, and it's worth investing even in an i5-4430 for a 50% performance enhancement.


137 Comments


  • BOMBOVA - Sunday, October 6, 2013 - link

    a low cost raid controller yes, 64kb, 128kb, show the merit of raid 0, at 6 Gbs each, i was doubtful myself, but took the test of the device, for i need better video editing performance, at least it works :) now we have to watch out for the 12Gbs devices coming soon, imo for games, not much notice of improvement to be seen, but in big data transfers, sata 3 improvements, can be had for low costs. gl, trying it out, borrow a card to try, if you can, Cheers.
  • BOMBOVA - Saturday, October 26, 2013 - link

    fact is you can set to 32 K blocks, or 64K, 'only" , but is a true Marvel controller chip, in the Syba, and is on the PCI e buss. , Control M, sets the chipset, works rite off, is quick, but, there is a hint, that the lanes are only 5 Gbit second, still is a fine patch upgrade, on low cost 6Gbit second ssd's i am in for 2x120 ssd's and controller for 250,
  • R-Type - Friday, October 4, 2013 - link

    Results are typical for a variety of games where the resolution is set to 1920 x 1200. Games include Dirt 3, Civilization V, Guild Wars 2, Mechwarrior Living Legends, Diablo 3, Starcraft 2, etc.
  • augiem - Friday, October 4, 2013 - link

    I'm really kind of shocked to see how well Nehalem stands up still in many benchmarks. If you adjust the i7 920 benchmarks to make up for the difference in frequency between it and the 4770K, it's not half bad. I used the difference between the i7-920 and i7-950 to determine how the benchmark scaled on Nehalem. If it was close enough to linearly (+/- 1%), I considered it. I saw a 6% - 40% performance advantage for Haswell across the CPU tests, which is actually smaller than I expected for an almost 6 year old chip. (Obviously this includes differences in the platforms too.) Striking that even in 6 years the speed hasn't even doubled.

    I'm still on an i7-920@3.6, so this was very relevant to me. If it were 40% across the board, it might be more compelling, but quite a few were more like 15%, 20%, etc. Now I understand Haswell is going to OC a lot further than this one, so in that way you could get the performance diff up there.

    I'm just dumbfounded that this Nehalem has lasted me 4.5 years already and it still doesn't feel slow. On the one hand it's great value for the money, but on the other hand it's a little disappointing to see the performance curve drop off like it has over the past 6 years.

    That would be a fun project. Make a graph showing average CPU performance increases over the last 30 years.
  • Genericuser1234 - Saturday, October 5, 2013 - link

    As interesting these chips are for getting maximum performance from the high w parts. Will you do an article about the low TDP parts that are the true masterpieces Intel makes. I live in Denmark and energy cost more and more. My PC is running almost nonstop and I am curious how well these chips perform in a gaming environment. How far behind are they on performance and what kind of power cost on a year based on an average workload / idle time are we talking about you could save. I find the low power chips to be Intel's true stars. Do more with less. Maybe even throw in a power house chip from 2 years ago for comparison. That would be an interesting article
  • agent_x007 - Saturday, October 5, 2013 - link

    Congrats to those who did testing part.
    Can't wait to see AMD added.

    Too bad Pentium XE 955/965 (ie. Presler B1/C1 @ 3,46/3,73GHz) didn't "cut it" for this comparison :(
    Hyper Threading and "Last of Netburst" legacy could be interesting in comparison with low end, fully-integrated setups, like VIA Nano or AMD Fusion.
    + There is also a possibility that today's multithreaded programs would better utilize the 4 threads of this kind of CPU, maybe to the point of matching Core 2 Duo's...

    Either way, to sum it up in two words : GREAT WORK.
  • khanov - Saturday, October 5, 2013 - link

    A good article, and nice to see an update now that new CPU's are out.

    Wouldn't it be nice if you could have all the benefits of X79 for multi GPU configurations, but without the added cost over Z87? Well actually you can, if you take in to account the quad-core LGA2011 CPUs.

    The i7-4820K is no more expensive than the i7-4770K, and motherboard costs are very similar too. So people seriously considering 3 or 4 GPUs might be very interested in this option, to gain the benefits of extra PCIe lane allocation without the extra cost of a hex core CPU.

    Ian, would you please consider adding i7-3820 and/or i7-4820K to the next update? It would be nice to see how well, or how badly, they fare against the competition.
  • MarcHFR - Sunday, October 6, 2013 - link

    Hi all,

    Sorry but i don't understand this review. What's the point of recommending different CPU on the only basis of single/dual/tri/quad GPU ?

    First, the GPU power is not related to the number of GPU only, with 2x660 you get lower performance than 1x780, but if i read the conclusion for 2x660 you recommend FX-8350 but A8-5600K for 1x780 ?

    Second, for example with only a 7970 with a small CPU or a big CPU you get exactly the same performance on Sleeping Dogs 2560*1440 max settings. But what kind of player will keep a setting that offer 28 fps on such a card ? None ! They will lower the graphic settings related to the GPU only to a point that they will get a higher framerate, like the 80 fps you get with three card.

    Whatever the number/power of the GPU, as soon as it's not a lower-end card, the CPU needed to get playable framerate is the same with a GTX 660 or 2xGTX 780 as soon as you don't use graphics settings related to the GPU that lower the framerate that can be sustained by the GPU under the framerate that can be sustained by the CPU.

    You can recommend different CPU to get more than 40/60/80/120 fps in some games (but good luck since integrated benchmark are generally not using the most CPU bound scene), but recommending different CPU for single/dual/tri/quad GPU seems a nonsense for me.
  • Majesticii - Sunday, October 6, 2013 - link

    Damn. How can you call this a CPU comparison with data like this. The games are run at such extreme values that in no way they represent the impact of a CPU. Sleeping dogs is just 4 graphs with 28fps, how can any respected researcher show this data without severe shame. To add insult to injury, the vast majority seems to think this is how CPU tests are done and call it a nice review. Literally my heart sank as i read through these comments. No one (except a few ignored), not even the reviewers has a clue what they're on about. This way of CPU-reviewing in games needs to stop. This isn't just uninformative, it's worse; It's completely misleading. Test games at 800x600 low settings, and pay no mind to those people calling for "real-world benchmarks". Stay true to what's real, instead of appealing to the community.
  • 3Ball - Monday, October 7, 2013 - link

    Forgive me if this is pointed out in the article and I have missed it, but it is worth pointing out. Battlefield 4 will use up to 8 cores/threads. My i7-860 @ 4.0ghz with hyperthreading is outperforming a friends Ivy bridge (3570k) at 4.4ghz without hyperthreading, so much so that my frames are better using a GTX680 against his GTX780.

    This could be the product of the "beta", but I do believe it is a sign of things to come. The new consoles are most likely going to influence multithreaded performance greatly considering the lower single thread performance present in the systems.

    I have been planning on rebuilding with haswell early next year and was planning on getting a 4670k, but have now changed that decision to going with a 4770k due to this experience. Just my two cents. Cheers!
