Athlon II X2: Hardware C1E and Return of the CnQ Bug

I noticed something strange in my initial testing of the Athlon II X2. Take a look at these SYSMark results:

Processor                            SYSMark 2007 Overall
AMD Phenom II X2 550 BE (3.10GHz)    167
AMD Athlon II X2 250 (3.00GHz)       134
AMD Athlon X2 7850 (2.80GHz)         145

The Athlon II X2 250 is slower than the Athlon X2 7850 and significantly slower than the Phenom II X2 550. Remembering the Cool’n’Quiet bug from the original Phenom processor, I decided to turn CnQ off in the BIOS to see if the scores would go up:

Processor                         SYSMark 2007 Overall (CnQ On)    SYSMark 2007 Overall (CnQ Off)
AMD Athlon II X2 250 (3.00GHz)    134                              148

Indeed they did. 
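To put the gap in perspective, the relative differences can be worked out directly from the scores above. Here's a quick sketch in Python that uses only the SYSMark numbers from the tables; the variable and function names are mine.

```python
# SYSMark 2007 Overall scores, copied from the tables above.
scores = {
    "Phenom II X2 550 BE": 167,
    "Athlon II X2 250 (CnQ on)": 134,
    "Athlon II X2 250 (CnQ off)": 148,
    "Athlon X2 7850": 145,
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Gain from disabling Cool'n'Quiet on the Athlon II X2 250.
cnq_gain = percent_change(scores["Athlon II X2 250 (CnQ on)"],
                          scores["Athlon II X2 250 (CnQ off)"])

# Athlon II X2 250 relative to the Athlon X2 7850, with CnQ on and off.
vs_7850_cnq_on = percent_change(scores["Athlon X2 7850"],
                                scores["Athlon II X2 250 (CnQ on)"])
vs_7850_cnq_off = percent_change(scores["Athlon X2 7850"],
                                 scores["Athlon II X2 250 (CnQ off)"])

print(f"CnQ off vs. CnQ on:        {cnq_gain:+.1f}%")
print(f"vs. Athlon X2 7850 (on):   {vs_7850_cnq_on:+.1f}%")
print(f"vs. Athlon X2 7850 (off):  {vs_7850_cnq_off:+.1f}%")
```

Disabling CnQ is worth roughly 10% here, enough to move the Athlon II X2 250 from about 8% behind the Athlon X2 7850 to about 2% ahead of it.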

I contacted AMD and was informed that there’s more than meets the eye with the Athlon II X2.  Although the architecture is fundamentally a couple of Phenom II cores with larger L2 caches and no L3, there’s one more change to the die: microcode support for the C1E power state.

When the OS executes a halt instruction on a CPU (during a period of no activity, for example), the clock signal to the CPU is shut off for a period of time.  This saves power as no transistors are actively switching during this time.  The voltage supplied to the processor is left unchanged, however.  This power state is known as C1.

In the late Pentium 4 era, Intel introduced an Enhanced Halt State, called C1E.  Instead of just shutting off the clock to the CPU, when a CPU was in C1E its clock speed and voltage would both be reduced to their lowest possible values.  The reduction in voltage results in a reduction in leakage current, which in turn saves power.
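A rough way to see why dropping the voltage helps beyond stopping the clock is the usual first-order CMOS power model: switching power scales with C·V²·f, while leakage power is voltage times leakage current. The sketch below is illustrative only. Every number in it (capacitance, voltages, leakage currents) is a made-up assumption and not a measurement of any AMD or Intel part, and it simplifies both halt states to "no switching at all" so that only the voltage differs between them.

```python
def dynamic_power(c_eff, voltage, freq_hz):
    """First-order CMOS switching power in watts: P = C * V^2 * f."""
    return c_eff * voltage ** 2 * freq_hz


def leakage_power(voltage, leak_current):
    """Static (leakage) power in watts: P = V * I_leak."""
    return voltage * leak_current


# Hypothetical values, chosen only for illustration.
C_EFF = 5e-9         # effective switched capacitance, farads
F_FULL = 3.0e9       # full clock speed, hertz
V_FULL = 1.35        # normal core voltage, volts
V_LOW = 0.95         # reduced core voltage in the enhanced halt state, volts
I_LEAK_FULL = 8.0    # assumed leakage current at V_FULL, amps
I_LEAK_LOW = 4.0     # assumed (lower) leakage current at V_LOW, amps

# Running flat out: switching power plus leakage.
active = dynamic_power(C_EFF, V_FULL, F_FULL) + leakage_power(V_FULL, I_LEAK_FULL)

# C1: clock stopped (no switching), voltage unchanged, so leakage remains.
c1 = dynamic_power(C_EFF, V_FULL, 0.0) + leakage_power(V_FULL, I_LEAK_FULL)

# C1E, simplified: still no switching, but voltage and leakage current both drop.
c1e = dynamic_power(C_EFF, V_LOW, 0.0) + leakage_power(V_LOW, I_LEAK_LOW)

for name, watts in (("active", active), ("C1", c1), ("C1E", c1e)):
    print(f"{name:>6}: {watts:.1f} W")
```

The absolute figures mean nothing, but the ordering is the point: stopping the clock removes the switching term, and lowering the voltage then shrinks the leakage term that C1 alone leaves untouched.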

Apparently prior to the Athlon II X2, AMD enabled support for C1E outside of the processor.  Although I tried, I couldn’t get access to anyone at AMD to explain things any further so what I offer is my best guess.  I’m guessing that whenever a halt instruction was executed by the OS, AMD used some combination of its existing C1 support and Cool’n’Quiet to both stop the clock to the CPU and reduce voltage. 

Regardless of how AMD enabled it, motherboard makers were constantly botching it in their BIOSes, which resulted in different motherboards having very different power consumption levels, especially at idle.  It appears that some vendors were properly enabling this software-hack C1E state while others weren’t.

AMD always expressed frustration to me that the motherboard vendors kept screwing things up, and I’m guessing they got tired of dealing with it.  The new Athlon II X2 has microcode-level support for the C1E state; when the OS executes a halt instruction, the CPU now knows to both shut off its clock and drop its voltage.  No BIOS trickery necessary.

The problem with this, as you can guess, is that not all current motherboards have proper BIOS support for it.  Yep.

But that’s only half of the problem.  Simply not supporting the new hardware C1E in the Athlon II X2 won’t cause the issue I saw above; that has to do with Cool’n’Quiet, not C1E.  So what’s going on?

Late last week AMD finally got back to me with an answer.  The feature that caused the CnQ bug in the original Phenom processor was the processor’s ability to run each core at a different clock speed.  A nasty combination of Windows’ scheduler and the Phenom’s power management could result in cores, under load, running at 50% of their frequency.  AMD fixed the problem by removing the feature; in the Phenom II all cores attempt to run at the same frequency. 

When AMD put out its master BIOS code for all 7xx series reference motherboards, the Athlon II did not exist.  The fix that was applied to the Phenom II would not be applied to any other Phenom II-based derivatives; they would simply get treated as original Phenom processors with varying clock speeds between cores.

And that’s what’s going on.  The Athlon II isn’t told to run both of its cores at the same frequency and thus you can have situations where performance is much lower than it should be.
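To make the failure mode concrete, here's a toy simulation. It is not AMD's power management or Windows' scheduler: the governor rule (a core runs at full speed only if it was loaded on the previous tick, otherwise at half speed), the fixed migration pattern, and all of the names are assumptions picked purely to show how independent per-core clocks combined with thread migration can hold a busy thread at reduced speed.

```python
def simulate(ticks, per_core_clocks, migrate_every=1):
    """Average relative speed seen by one busy thread on a 2-core CPU.

    Toy governor (an assumption, not AMD's real algorithm):
      * per_core_clocks=True:  a core runs at 1.0 only if *it* was loaded
        on the previous tick, otherwise at 0.5.
      * per_core_clocks=False: both cores share one clock, which runs at
        1.0 if *either* core was loaded on the previous tick.
    The hypothetical scheduler moves the thread to the other core every
    `migrate_every` ticks.
    """
    loaded_last_tick = [False, False]
    work_done = 0.0
    for tick in range(ticks):
        core = (tick // migrate_every) % 2
        if per_core_clocks:
            at_full_speed = loaded_last_tick[core]
        else:
            at_full_speed = any(loaded_last_tick)
        work_done += 1.0 if at_full_speed else 0.5
        # Only the core that ran the thread counts as loaded for the next tick.
        loaded_last_tick = [i == core for i in range(2)]
    return work_done / ticks


print("independent clocks, thread bounced every tick:", simulate(1000, True))
print("shared clock, thread bounced every tick:      ", simulate(1000, False))
print("independent clocks, bounced every 4 ticks:    ", simulate(1000, True, 4))
```

With the thread bounced on every tick, the per-core model never leaves half speed, while the shared-clock model only pays for the first tick. Real hardware and real schedulers are far messier, so treat this as the shape of the problem and not its magnitude.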

AMD is aware of the issue and is currently working with motherboard vendors to properly enable BIOS support for the Athlon II.  Until then, the best way to run and use the Athlon II is with CnQ disabled.  Unlike the original Phenom, this bug should get fixed in the near future.

55 Comments

  • haplo602 - Tuesday, June 2, 2009

    can you include linux kernel compilation tests, or something similar or larger (gcc, libqt, X) ??? would help me much more than gaming and 3d rendering benches :-)
  • virvan - Tuesday, June 2, 2009

    Anand, I BEG you to include some kind of compilation tests in the "bench" application; some of us are actually programmers that spend more time building than watching or transcoding movies ;)
    A Linux Kernel bench + some kind of MS Visual C++ benchmark would be extremely welcome.
    Btw, when could we expect the old CPUs to be added to Bench? I am specifically waiting for Athlon XP and P3/P4's.
    10x
  • Anand Lal Shimpi - Tuesday, June 2, 2009

    I really do want to include a software build test, the question is what is the simplest to set up and run, most representative and most repeatable test I can run?

    I'd prefer something under Windows because it means one less OS/image change (which matters if you're trying to run something on ~70 different configurations) but I'm open to all suggestions.

    Thoughts? Feel free to take this conversation offline over email if you'd like to help.

    Take care,
    Anand
  • virvan - Wednesday, June 3, 2009

    You could try building a CGAL demo program (http://www.cgal.org/FAQ.html). It is cross platform and big enough (but not too big).
    I am really a Linux programmer but I could try to help if you are not a programmer. I haven't booted Windows for years but, hey, we have virtual machines nowadays :)
  • adiposity - Tuesday, June 2, 2009

    A fairly decent size build that I do is Qt under VS 2008.

    Instructions are here:

    http://wiki.qtcentre.org/index.php?title=Qt4_with_...

    Download source here:

    http://www.qtsoftware.com/downloads/windows-cpp

    You can use VS2008 Express.

    -Dan
  • haplo602 - Wednesday, June 3, 2009

    I have no experience with VS 2008. Can it be manually set to a certain number of compile threads? make has a command line parameter for this, so you can even test a single threaded compile and scale the number of threads used to find the drop-off point (where more threads do not yield better performance).

    qt is quite huge, but that's ok, since a compilation of a few minutes (linux kernel) won't tell much in the future, when processing power increases.
  • smitty3268 - Wednesday, June 3, 2009

    Yes, you can add the /MP parameter in Visual Studio.
  • adiposity - Wednesday, June 3, 2009


    From the page I linked before:

    Add this line to the .pro file for the release version:

    QMAKE_CXXFLAGS_RELEASE += -MP[processMax]


    -Dan
  • smitty3268 - Wednesday, June 3, 2009

    All of Qt might be a bit large for a simple benchmark.

    Something like Paint.NET or NDepend might make a good C# test.
  • adiposity - Wednesday, June 3, 2009

    Use:

    nmake sub-src

    It only compiles qt libraries, not the tools or examples.

    It really does not take very long (less than 10 minutes on a Core2Duo 2.4).

    -Dan