Final Words

AMD didn't overpromise as far as the benefits of these new scheduling/core parking hotfixes for Windows 7 are concerned. Single digit percentage gains can be expected in most mixed workloads, although there's a chance you'll see low double digit gains if the conditions are right. It's important to note that the hotfixes for Windows 7 aren't ideal either. They simply force threads to be scheduled on empty modules first rather than on idle cores of occupied modules. To properly utilize Bulldozer's architecture we'd need a scheduler that not only considers which cores/modules are available, but also biases its placement based on data dependencies between threads.
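As a rough illustration (my own sketch, not code from the hotfix or from Windows), here is what the "empty modules first" rule looks like, assuming a hypothetical 4-module/8-core part where logical cores 2n and 2n+1 share module n:

```c
#include <stdio.h>

#define NUM_MODULES       4   /* hypothetical 4-module / 8-core part */
#define CORES_PER_MODULE  2

/* Pick a core for a new thread given a bitmask of busy cores.
 * Hotfix-style policy: prefer a core on a completely idle module;
 * only fall back to the free core of a half-busy module when no
 * empty module remains. Returns a core index, or -1 if all are busy. */
static int pick_core(unsigned busy_mask)
{
    int fallback = -1;

    for (int m = 0; m < NUM_MODULES; m++) {
        int c0 = m * CORES_PER_MODULE;
        int c1 = c0 + 1;
        int busy0 = (busy_mask >> c0) & 1;
        int busy1 = (busy_mask >> c1) & 1;

        if (!busy0 && !busy1)
            return c0;             /* whole module idle: best choice */
        if (fallback < 0 && !busy0)
            fallback = c0;         /* idle core on a busy module: last resort */
        if (fallback < 0 && !busy1)
            fallback = c1;
    }
    return fallback;
}

int main(void)
{
    /* Cores 0 and 2 are busy (mask 0b0101). The old behavior might pick
     * core 1 and double up module 0; this policy returns core 4 on the
     * still-empty module 2. */
    printf("next core: %d\n", pick_core(0x5u));
    return 0;
}
```

A data-dependency-aware scheduler would add a second pass on top of this, keeping threads that share data on the same module; that is exactly the part neither the hotfix nor this sketch attempts.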

If Bulldozer were the last architecture to present this type of scheduling challenge, I'd say it's unlikely we'd see things get better. Luckily for AMD, I don't believe homogeneous multi-core architectures will be all we get moving forward. Schedulers will get better at understanding the underlying hardware, just as they have in the past. We may see better utilization of Bulldozer cores/modules in Windows 8, but as always, don't use the promise of what may come as the basis for any present-day purchasing decisions.

Comments

  • wumpus - Friday, January 27, 2012 - link

    I'd have to believe that any CPU with SMT enabled will benefit. That is, unless they already have this feature. Of course, Intel has been shipping SMT processors since the Pentium 4. I'd like to believe that Microsoft simply flipped whatever switch treats Bulldozer cores as SMT cores, but I don't have enough faith in Microsoft's scheduling to believe they ever got it right.
  • hansmuff - Friday, January 27, 2012 - link

    At least Windows 7 (haven't tested anything else) schedules threads properly on Sandy Bridge. HT only comes into play once all 4 cores are loaded.
  • tipoo - Friday, January 27, 2012 - link

    Windows already has intelligent behaviour for Hyperthreading. I don't think this will change anything on the Intel side.
  • silet1911 - Wednesday, February 1, 2012 - link

    Yes, a website called Jagatreview has reviewed a 2500 + patch and there is a small performance increase:

    http://www.jagatreview.com/2012/01/amd-fx-8120-vs-...
  • tk11 - Friday, January 27, 2012 - link

    Even if a scheduler did take the time to figure out when threads shared a significant number of recent memory accesses, would that be enough information to determine that a thread would perform optimally on the same module as a related thread rather than on an unused module?

    Also... Wouldn't running code that performed "intelligent core/module scheduling based on the memory addresses touched by a thread" negatively impact performance far more than any gains realized by scheduling threads on cores that are merely suspected to be more optimally suited to running each particular thread?
  • eastyy123 - Friday, January 27, 2012 - link

    Could someone explain the whole module/core thing to me please?

    I always assumed a core was basically a whole processor shrunk onto a die. Is that basically right?

    And how do AMD's modules differ?
  • KonradK - Friday, January 27, 2012 - link

    Long sory short:
    Bulldozer's module consists of 2 integer cores and 1 floating point (FPU) core.
  • KonradK - Friday, January 27, 2012 - link

    "Story" no "sory"
    I'm sorry...
  • Ammaross - Friday, January 27, 2012 - link

    "Bulldozer's module consist 2 integer cores and 1 floating point (FPU) core."

    However, the 1 FPU core can be used as two single floating point cores or a single double double floating point core, so it depends on the floating point data running through it.
  • KonradK - Friday, January 27, 2012 - link

    Not sure what you are supposing.
    Precision is the same regardless of whether one or two threads are executed by the FPU. There are single and double precision FPU instructions, but each thread can use any of them.
    However, if by single or double you mean performance:
    If two FPU-heavy threads run on the same module, each of them will get half the performance compared to the same two threads running on separate modules.
    It's just that in the first case one FPU is shared by two threads.
    And that is the whole point of the hotfixes: avoiding such a situation for as long as possible.
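Following up on KonradK's point about FPU sharing: here is a minimal sketch (my own illustration, not part of AMD's hotfix) of how an application could sidestep the shared-FPU penalty on its own by pinning two FP-heavy threads to different modules with the Win32 SetThreadAffinityMask call. The "cores 2n and 2n+1 share module n" numbering is an assumption; real code should verify the topology (e.g. via GetLogicalProcessorInformation) before relying on it.

```c
#include <windows.h>
#include <stdio.h>

/* Minimal sketch: pin two FP-heavy worker threads to different modules so
 * they don't contend for one shared FPU. Assumes (not guaranteed) that
 * logical processors 2n and 2n+1 belong to module n. */

static DWORD WINAPI fp_worker(LPVOID arg)
{
    (void)arg;
    volatile double x = 1.0;
    for (long i = 0; i < 100000000L; i++)   /* stand-in floating point workload */
        x = x * 1.0000001 + 0.5;
    return 0;
}

int main(void)
{
    HANDLE t[2];
    t[0] = CreateThread(NULL, 0, fp_worker, NULL, CREATE_SUSPENDED, NULL);
    t[1] = CreateThread(NULL, 0, fp_worker, NULL, CREATE_SUSPENDED, NULL);

    /* Core 0 sits on module 0 and core 2 on module 1 (under the assumption
     * above), so each thread gets a module's FPU to itself. */
    SetThreadAffinityMask(t[0], 1 << 0);
    SetThreadAffinityMask(t[1], 1 << 2);

    ResumeThread(t[0]);
    ResumeThread(t[1]);
    WaitForMultipleObjects(2, t, TRUE, INFINITE);

    CloseHandle(t[0]);
    CloseHandle(t[1]);
    puts("done");
    return 0;
}
```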
