Mixed Workloads: Mild Gains

The one thing all of the following benchmarks have in common is more varied CPU utilization. Alongside periods of heavy single-core and all-core utilization, there are also times when these benchmarks use more than one core but fewer than all.

SYSMark has always been a fairly lightly threaded test: there are definite gains when going from 2 to 4 cores, but it is hardly a heavily threaded workload. The hotfixes have a negligible impact, both on the overall result and across the individual benchmark suites:

SYSMark 2012 - Overall

Our Visual Studio 2008 compile test is heavily threaded for the most part; however, the beginning of the build process uses only a fraction of the total available cores. The hotfixes show a reasonable impact on performance here (~5%):

Build Chromium Project - Visual Studio 2008
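
To put a number like that in context, here is a rough back-of-the-envelope sketch of how a speedup confined to the lightly threaded part of a run dilutes into the overall result. The function name and both inputs are hypothetical: we did not measure what share of the build is lightly threaded or how much that phase alone speeds up, so the values below are purely illustrative.

```python
def overall_gain(light_fraction: float, light_speedup: float) -> float:
    """Overall fractional speedup when only the lightly threaded phase gets faster.

    light_fraction: share of the original wall time spent in the lightly
                    threaded phase, between 0 and 1 (hypothetical input).
    light_speedup:  speedup factor of that phase alone, e.g. 1.10 means
                    10% faster (hypothetical input).
    """
    if not 0.0 <= light_fraction <= 1.0:
        raise ValueError("light_fraction must be between 0 and 1")
    if light_speedup <= 0.0:
        raise ValueError("light_speedup must be positive")

    # Normalize the original run time to 1.0. The fully threaded share is
    # unchanged; the lightly threaded share shrinks by the speedup factor.
    new_time = (1.0 - light_fraction) + light_fraction / light_speedup
    return 1.0 / new_time - 1.0


if __name__ == "__main__":
    # Illustrative values only, not measurements from this article.
    for fraction in (0.25, 0.5, 0.75):
        gain = overall_gain(fraction, 1.10)
        print(f"{fraction:.0%} of run lightly threaded -> {gain:.1%} overall")
```

The takeaway is simply that the smaller the lightly threaded share of a run, the less of any scheduling improvement shows up in the final score, which is consistent with fully threaded tests showing no change.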

The first pass of our x264 transcode benchmark doesn't use all available cores but it is more than just single threaded:

x264 HD Benchmark - 1st pass - v3.03

Performance goes up, but only by ~2% here. As expected, the second pass, which consumes all cores in the system, remains unchanged:

x264 HD Benchmark - 2nd pass - v3.03

Games are another place we can look for performance improvements, as it's rare to see consistent, many-core utilization while playing a modern title on a high-end CPU. Metro 2033 is fairly GPU bound and thus we don't see much of an improvement, although for whatever reason the 51.5 fps ceiling at 1920 x 1200 is broken by the hotfixes.

Metro 2033 Frontline Benchmark - 1024 x 768 - DX11 High Quality

Metro 2033 Frontline Benchmark - 1920 x 1200 - DX11 High Quality

DiRT 3 shows a 5% performance gain from the hotfixes. The improvement isn't enough to really change the standings here, but it's an example of a larger performance gain.

DiRT 3 - Aspen Benchmark - 1024 x 768 Low Quality

DiRT 3 - Aspen Benchmark - 1920 x 1200 High Quality

Crysis Warhead mirrors the roughly 5% gain we saw in DiRT 3:

Crysis Warhead Assault Benchmark - 1680 x 1050 Mainstream DX10 64-bit

Civilization V's CPU-bound no-render results show no gains, which is to be expected. Looking at the average frame rate during the simulation, however, we see a 4.9% increase in performance.

Civilization V - 1680 x 1050 - DX11 High Quality

Civilization V - 1680 x 1050 - DX11 High Quality

79 Comments

  • Ratman6161 - Saturday, January 28, 2012

    With an Intel i3 2100. I suspect the gamers might actually be better served with the i3 than AMD's 8150 other than the inability to overclock the i3. Think of how much better a GPU you could get by saving about $150 by going with an i3 over an 8150.

    Just sayin...
  • medi01 - Sunday, January 29, 2012

    Did you forget something important?
    Taking into account, cough, motherboard prices, cough?

    The "you can get much better GPU for XX bucks" stands for pretty much any non low end CPU.
  • xeridea - Monday, January 30, 2012

    AMD did a blind test with the 2700k and 8150. 28% chose 2700k, 51% chose 8150, 20% undecided. It is possibly an anomaly that the 8150 got more votes, but it at least shows that it is competitive as a gaming CPU, even compared to the 2700k which costs $100 more.

    IIRC the 8150 is competitive and sometimes beats the 2500k and even beats the 2600k in some highly threaded situations.

    Your figures are wrong. Intel has the 2600k for $325 and the 8150 for $270. This means the 8150 is $55 less than the 2600k, not $30. It is $40 more than the 2500k but you have to consider generally higher cost of motherboard/memory with Intel, and guaranteed socket incompatibility with every CPU.
  • arjuna1 - Saturday, January 28, 2012

    Yeah, imagine how pathetic a life can be when you need to sustain your ego by hating the competing brand of your PC's CPU.
  • Fox5 - Saturday, January 28, 2012

    Wasn't the point of the hotfix to consolidate threads onto modules, so that modules could be gated and turbo core enabled? Isn't that where the performance boost is supposed to come from?
  • KonradK - Sunday, January 29, 2012

    The Windows 7 scheduler has, since the beginning, tried to group many threads on a single core. This is mainly to maximize the effect of power gating. I'm not sure whether a core must be completely idle for Turbo Core/Turbo Boost to be enabled.
    The purpose of the hotfix was to make the scheduler distinguish between cores belonging to the same module and cores in separate modules.
  • Mech-Akuma - Monday, January 30, 2012

    This is what I thought as well. I remember reading somewhere when Bulldozer launched that Win7 prefers to primarily direct all threads to cores inside different modules. This would cause modules not to be able to enter C6 sleep and therefore the performance improvement of Turbo Core would be drastically cut. However since each module does not need to share resources (while each module is only using 1 core) performance is picked up there.

    However this contradicts what this article says about pre-hotfix scheduling. Could someone clarify how pre-hotfix scheduling worked?
  • silverblue - Saturday, January 28, 2012

    Toms did the same thing. However, I'm not sure it'd make much difference.
  • richaron - Monday, January 30, 2012

    As far as I'm aware 1600 is the most developed RAM at the moment. I consider anything faster as simply overclocked 1600; more "bandwidth", more voltage, but lower CAS latency.
    I'm not sure what the reviewers were thinking, but they probably know more than most of us...
  • KonradK - Saturday, January 28, 2012

    "No. Bulldozer adds silicon that actually executes instructions. [...]"

    The whole idea of Hyperthreading is to let a second thread utilize core resources left idle by the first thread (and vice versa).
    Depending on how much the execution units are replicated and how (un)optimally the code is written, the amount of waste (and the benefit from Hyperthreading) will be lower or greater.
    Maybe the execution units of Bulldozer's FPU are replicated to such an extent that they will be wasted in most cases unless used by two threads simultaneously.
    But the performance of two (FPU-intensive) threads running on the same module will never equal the performance of two threads running on separate modules.*
    Otherwise the hotfixes to the scheduler would be useless.
    *) Assuming that the CPU is not fully loaded, i.e. some cores remain idle.