Single & Heavily Threaded Workloads Need Not Apply

Given what these two hotfixes actually do, the only hope for performance gains comes from workloads that are neither single-threaded nor heavily threaded. To confirm that there are no gains at either end of the spectrum, we first turn to Cinebench, a 3D rendering test that lets us configure how many threads are in use:

Cinebench 11.5 - Single Threaded

Cinebench 11.5 - Multi-Threaded

With one thread or eight threads active, the FX-8150's performance is unchanged by the new hotfixes. I also ran TrueCrypt's encryption/decryption benchmark, another heavily threaded test that runs on all cores/modules:

AES-128 Performance - TrueCrypt 7.1 Benchmark

Once again, there's no change in performance. Although you can argue that CPU performance matters most when utilization is at its highest, most desktops spend their time somewhere between fully loading a single core and fully loading all cores. To test those cases, we need to look elsewhere.
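To make the reasoning above concrete, here is a minimal sketch of why the two extremes cannot benefit. It is a toy model and not the Windows scheduler: it assumes the FX-8150's four modules of two cores each, and the policy names `pack` and `spread` are hypothetical labels. `pack` stands in for a worst-case module-unaware placement, while `spread` approximates the post-hotfix preference for putting threads on separate modules first.

```python
# Toy model of thread placement on a 4-module, 2-cores-per-module CPU.
# Illustration only; this is NOT how the Windows scheduler is implemented.

MODULES = 4
CORES_PER_MODULE = 2


def place(n_threads, policy):
    """Return the number of threads placed on each module."""
    if not 0 <= n_threads <= MODULES * CORES_PER_MODULE:
        raise ValueError("thread count must fit on the available cores")
    counts = [0] * MODULES
    for i in range(n_threads):
        if policy == "spread":
            # Assumed post-hotfix preference: one thread per module first.
            counts[i % MODULES] += 1
        elif policy == "pack":
            # Assumed worst case without module awareness: fill a module
            # completely before moving to the next one.
            counts[i // CORES_PER_MODULE] += 1
        else:
            raise ValueError("unknown policy: " + policy)
    return counts


def threads_sharing_a_module(n_threads, policy):
    """Count threads that have to share a module with another thread."""
    return sum(c for c in place(n_threads, policy) if c > 1)


for n in range(1, MODULES * CORES_PER_MODULE + 1):
    print(n, threads_sharing_a_module(n, "pack"),
          threads_sharing_a_module(n, "spread"))
```

In this model a single thread never shares a module, and eight threads always fill every module, so both policies produce the same placement at the two ends. Only the thread counts in between leave room for a module-aware scheduler to do better.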

79 Comments

  • KonradK - Friday, January 27, 2012 - link

    "Ideally, threads with shared data sets would get scheduled on the same module, while threads that share no data would be scheduled on separate modules."

    I think it is impossible for the scheduler to predict how a thread will behave, and it is not practical to track the behaviour of a running thread (tracking which areas of memory threads access would be as computationally intensive as emulation).
    So ultimately the choice is between "threads should be scheduled on separate modules if possible" and "do not care which cores belong to the same module" (pre-hotfix behaviour).
    The second means that Bulldozer will behave like a PIV EE (2 cores, 4 threads) on Windows 2000, at least for threads that use the FPU heavily. Windows 2000 does not distinguish between logical and physical cores.
  • Araemo - Friday, January 27, 2012 - link

    I've noticed that Windows doesn't always schedule jobs well to take advantage of Intel Turbo Boost. I realize that it probably doesn't have a noticeable impact, but I do notice that running only one high-CPU-utilization thread still doesn't often kick turbo above the 3/4-cores-active frequency. I can use processor affinity on the various common background tasks to pull them all onto 1 or 2 cores to activate full turbo, but if a process is only using a percent or so of CPU resources, why schedule it on an otherwise inactive core if there is an already-active but 98% unutilized core available? I think the power gating efficiencies would actually be more useful than the pure MHz-related turbo efficiencies (running 2 cores 100MHz faster is probably going to waste less power than you gain by shutting down the other two cores completely/partially).

    Is there anything to address that behavior?
  • taltamir - Friday, January 27, 2012 - link

    Wouldn't those hotfixes improve performance on Intel HT processors as well?
  • tipoo - Friday, January 27, 2012 - link

    No, Windows already leaves virtual threads from hyperthreading alone until all the physical cores are used, so this won't improve things on the Intel side any. This is specifically for Bulldozer and future architectures like this.
  • Hale_ru - Monday, February 06, 2012 - link

    Bullshit!
    Win7 had nothing to do with Intel HT until AMD hit them in the head!

    I had so much asspain with Win7's shitty CPU scheduler on FEM and FDTD simulations.

    An 8 HT-core setup reduces overall performance by up to 50% (a HALF!!!) compared to a no-HT setup.
    A simple Task Manager check showed that Win7 was just putting low-threaded processes on the same core, with no option to change it. They just have the simplest incremental scheduler for Intel CPUs.
  • Hale_ru - Monday, February 06, 2012 - link

    So, it is recommended to use the AMD optimization patches (only the core-addressing one, not the C6 state patch) on any Win7 machine running simple multithreaded math workloads.
  • hescominsoon - Friday, January 27, 2012 - link

    Shared CPU modules that have to compete for resources? Reminds me of HT v1. IMO this is basically a quad-core chip with the other 4 threads available if the primary core isn't being used all the way. I've looked at the design and it's just nonsensical. This is not a futuristic bet but a desperate attempt at differentiation... with most likely disastrous results. AMD has now painted themselves into a niche product instead of a high-performance general-purpose CPU.
  • dgingeri - Friday, January 27, 2012 - link

    I like the design of the Bulldozer overall, but there is obviously a bottleneck that is causing problems with this chip. I'm thinking the bottleneck is likely the decoder. It can only handle 4 instructions per clock cycle, and it feeds 2 full Int cores and the FP unit shared between the two cores. I bet increasing the decoder capacity would show a really big increase in speed. What do you think?
  • bji - Saturday, January 28, 2012 - link

    I think that if it was something easy, AMD would have done it already or would be in the process of doing it.

    I also think that it's unlikely that it's something as simple as improving the decoder throughput, because one would think that AMD would have tried that pretty early on when evaluations of the chip showed that the decoder was limiting performance.

    These chips are incredibly complicated and all of the parts are incredibly interrelated. The last 25% of IPC is incredibly hard to achieve.
  • bobbozzo - Friday, January 27, 2012 - link

    The hotfixes also support Windows Server 2008 R2 Service Pack 1 (SP1).
