CPU Performance: Five Generations of Intel CPUs Compared

We are splitting our Haswell review coverage into two parts. The rest of this article focuses on the CPU side of Haswell, while coverage of the GPU - including Iris Pro and Crystalwell - has been spun off into another article: Intel Iris Pro 5200 Graphics Review: Core i7-4950HQ Tested.

The majority of the market doesn’t upgrade annually, so I went back a total of five generations to characterize Haswell’s CPU performance. Everything from a 2.53GHz Core 2 Duo through Nehalem, Sandy Bridge, Ivy Bridge and Haswell is represented here. With the exception of the Core 2 platform, every chip is running at or near its peak launch frequency.

In general, I saw performance gains over Ivy Bridge of 1 - 19%, with an average improvement of 8.3%. Some of the gains were quite impressive. The 7.8% increase in Kraken shows there’s still room for improvement in lightly threaded performance, while the double-digit FP performance gains in POV-Ray and x264 HD really play to Haswell’s strengths.

Compared to Sandy Bridge, Haswell looks even more impressive. The Core i7-4770K outperforms the i7-2700K by 7 - 26%, with an average performance advantage of 17%. The gains over Sandy Bridge aren’t large enough to make upgrading from a Sandy Bridge i7 to a Haswell i5 worthwhile though, as you still give up a lot if you go from 8 to 4 threads on a quad-core part running heavily threaded workloads.

Compared to Nehalem the gains average almost 44%.
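The per-benchmark ranges and averages quoted above are simple ratios of one chip's result to another's. The sketch below shows one way such figures could be derived. It is a minimal illustration, not the method used for this review: the scores are hypothetical placeholders rather than measurements from these charts, the helper name `percent_gain` is made up, and the use of an arithmetic mean is an assumption, since the text does not say how the averages were computed.

```python
from statistics import mean


def percent_gain(new, old, higher_is_better=True):
    """Percentage improvement of `new` over `old`.

    For time-based results (lower is better) the ratio is inverted,
    so a shorter run still shows up as a positive gain.
    """
    ratio = new / old if higher_is_better else old / new
    return (ratio - 1.0) * 100.0


# Hypothetical placeholder results, NOT data from this review.
# Each entry: (newer chip, older chip, higher_is_better)
results = {
    "render (points)": (8.0, 7.5, True),
    "encode (fps)": (66.0, 60.0, True),
    "compile (seconds)": (20.0, 23.0, False),
}

# Per-benchmark gain in percent, then the average across benchmarks.
gains = {name: percent_gain(*r) for name, r in results.items()}
average_gain = mean(gains.values())

for name, gain in gains.items():
    print(f"{name}: {gain:.1f}%")
print(f"average: {average_gain:.1f}%")
```

A geometric mean of the ratios is a common alternative when the benchmarks differ widely in scale. Either way, mixing "higher is better" and "lower is better" results requires inverting the time-based ones first, as the helper does.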

Cinebench 11.5 - Single Threaded

Cinebench 11.5 - Multi Threaded

POV-Ray 3.7 RC7

7-zip Benchmark - Single Threaded

7-zip Benchmark - Multithreaded

Kraken Javascript Benchmark (Chrome)

PCMark 7 - Overall

x264 HD 5.0.1 - First Pass

x264 HD 5.0.1 - Second Pass

TrueCrypt AES Benchmark

Quite possibly the most surprising result was just how consistent (and large) the performance improvements were in our Visual Studio 2012 compile test. With a 15% increase in performance vs. Ivy Bridge at the same frequencies, what we’re looking at here is the perfect example of Haswell’s IPC increases manifesting in a real-world benchmark.
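Comparing at the same frequency matters because it isolates per-clock (IPC) gains from clock speed gains. A rough sketch of that normalization follows, assuming a time-based test where lower is better. The function name `per_clock_gain_pct` and all input values are hypothetical, and dividing by clock speed is only a first-order proxy that ignores turbo behavior, memory speed and other platform differences.

```python
def per_clock_gain_pct(time_new, time_old, ghz_new, ghz_old):
    """Estimate the per-clock gain (in percent) from two run times.

    Work rate is proportional to 1 / time, so work per clock is
    proportional to 1 / (time * GHz). The gain is the ratio of the
    newer chip's work per clock to the older chip's, minus one.
    """
    return ((time_old * ghz_old) / (time_new * ghz_new) - 1.0) * 100.0


# Hypothetical run times in seconds, NOT measurements from this review.
# Equal clocks: the per-clock gain equals the raw speedup.
same_clock = per_clock_gain_pct(20.0, 23.0, ghz_new=3.5, ghz_old=3.5)

# A higher clock on the newer chip: part of the speedup is attributed
# to frequency, so the per-clock estimate shrinks.
faster_clock = per_clock_gain_pct(20.0, 23.0, ghz_new=3.9, ghz_old=3.5)

print(f"same clock:   {same_clock:.1f}%")
print(f"faster clock: {faster_clock:.1f}%")
```

When both chips run at the same frequency the clock terms cancel, which is why a same-frequency result can be read directly as an estimate of the IPC change.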

Gaming Performance

After spending far too much time on the Iris Pro test system, I didn’t have a ton of time left over to do a lot of gaming performance testing with Haswell. Luckily Ian had his gaming performance test data already in the engine, so I borrowed a couple of graphs.

As expected, Haswell is incrementally quicker than Ivy Bridge in GPU-bound gaming scenarios - and most definitely at the top of the charts.

Civilization V - One 7970, 1440p, Max Settings

Dirt 3 - One 7970, 1440p, Max Settings

210 Comments

  • Amaranthus - Monday, June 3, 2013 - link

    One of the main (and already implemented) uses of TSX is hardware lock elision. I'd guess the hypothesis is that physics code takes locks defensively but rarely actually has contention because the threads are working on different parts of the world. In this scenario more fine-grained locks on sections of the world would let you scale better, but that is a lot of work and HLE gives you the same benefit for free.
  • Jaybus - Monday, June 3, 2013 - link

    No. HLE (XACQUIRE and XRELEASE) do nothing by themselves. They reuse REPNE/REPE prefixes and on CPUs that do not support TSX are ignored on instructions that would be valid for XACQUIRE/XRELEASE if TSX were available. It is a backward compatibility method. Since all of those instructions may have a LOCK prefix, without TSX capability, a normal lock is used, NOT the optimistic locking provided by TSX that allows other threads to see the lock as already free.

    Without TSX the code is still (software) lock-free, but there is no possibility of multiple threads accessing the same memory simultaneously (as there is with TSX), so one or more threads will see a pipeline stall due to the LOCK prefix.
  • bji - Monday, June 3, 2013 - link

    I can't imagine that lock elision is that beneficial to very many applications. Lock contention is almost never a significant performance bottleneck; yeah there are poorly designed applications where lock contention can have a more significant effect, but proper multithreaded coding has the contended sections of code reduced to the smallest number of instructions possible, at which point the effects of lock contention are minimized.

    In order to take advantage of transactional memory and get the full benefits of TSX you have to write such radically different algorithms that I doubt that it's worth it except in the most unusual and specific cases. OK so you can use TSX instructions to make a hashtable or other container class suffer slightly less from lock contention, but that is oh so very rarely a significant aspect to the performance of any program.
  • klmccaughey - Monday, June 3, 2013 - link

    As a programmer, I disagree. This is a very useful feature set that, if it was more widely adopted, would prove very useful for many workaday tasks that the CPU performs.
  • bji - Monday, June 3, 2013 - link

    As a programmer, I am pretty sure that the benefits of TSX are limited to a very unusual and uncommon set of problems the performance increase of which will mean very very little to 99.99% of users 99.99% of the time. Also fully transactional memory algorithms require significant rework from their non-transactional counterparts meaning that taking full advantage of TSX takes developer effort which will not be worth it except in very rare circumstances.

    The HLE instructions may have some very minor benefit because they can be used with algorithms that don't need to be reworked at all (you just get a little bit more parallelism for free), but even then you're going to be avoiding some lock contention; even if you completely eliminated lock contention from most algorithms they would only be fractionally faster in real world usage. Lock contention just isn't that big of a deal in normal circumstances.
  • klmccaughey - Monday, June 3, 2013 - link

    Exactly. It would be the ubiquity of these features that would cause them to be useful - splitting them into segments defeats the adoption and use of said features. Intel are pushing segmentation too hard (too greedily?)
  • bill5 - Saturday, June 1, 2013 - link

    kind of a weird, scattered review.

    loved the q6600 though, since i still have one. and the 8350, since i have my eye on it.

    be interesting if this pushed 8350 prices down enough to be more attractive (it's currently only 180 on newegg). if not i'll probably go with i5 4670 (even though i'm getting tired of these faux msrp's, bet money that chip will be 229 on newegg forget 213)

    ps, my bill4 account was apparently banned (it kept saying i was posting spam and wouldn't allow me to post). i post controversial things that probably get downvoted, but they aren't spam. please stop doing that.
  • bill5 - Saturday, June 1, 2013 - link

    this really shows where the 8350 fails, single thread.

    it looks like clock for clock its ipc may be similar to my q6600. it only gains in single thread due to the gaudy 4.0 clock speed.

    otoh go to multithread and it holds its own against the other ~200 intel chips.
  • Nexus-7 - Saturday, June 1, 2013 - link

    In for one 4770k --

    I'm coming from an i5-750 running at 4GHz. I'm thinking this will be a sufficiently large leap forward although I'm tempted to wait for Ivy Bridge-E's 6c12t monsters.
  • chizow - Saturday, June 1, 2013 - link

    Similar boat, but I told myself I wouldn't wait for Intel's E platform anymore. X58 may be the last great E platform, mainly because it actually preceded the rest of the mainstream performance parts. Intel seems to sit on these E platforms for so long now that they almost become irrelevant by the time they launch.
