Athlon II X2: Hardware C1E and Return of the CnQ Bug

I noticed something strange in my initial testing of the Athlon II X2. Take a look at these SYSMark results:

Processor                            SYSMark 2007 Overall
AMD Phenom II X2 550 BE (3.10GHz)    167
AMD Athlon II X2 250 (3.00GHz)       134
AMD Athlon X2 7850 (2.80GHz)         145

The Athlon II X2 250 is slower than the Athlon X2 7850 and significantly slower than the Phenom II X2 550. Remembering the Cool’n’Quiet bug from the original Phenom processor, I decided to turn CnQ off in the BIOS to see if the scores would go up:

Processor                         SYSMark 2007 Overall (CnQ On)    SYSMark 2007 Overall (CnQ Off)
AMD Athlon II X2 250 (3.00GHz)    134                              148

Indeed they did. 
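
As a rough check on the size of the penalty, the scores above can be turned into percentages. This is only arithmetic on the published numbers; the helper name percent_change is made up for the example.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# SYSMark 2007 Overall scores quoted above.
cnq_on = 134
cnq_off = 148
athlon_x2_7850 = 145

gain = percent_change(cnq_on, cnq_off)
deficit = percent_change(athlon_x2_7850, cnq_on)

print(f"Athlon II X2 250, CnQ off vs. on: {gain:+.1f}%")      # +10.4%
print(f"Athlon II X2 250 (CnQ on) vs. X2 7850: {deficit:+.1f}%")  # -7.6%
```

In other words, turning CnQ off is worth roughly 10% here, enough to move the Athlon II X2 250 from behind the Athlon X2 7850 to slightly ahead of it.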

I contacted AMD and was informed that there’s more than meets the eye with the Athlon II X2.  Although the architecture is fundamentally a couple of Phenom II cores with larger L2 caches and no L3, there’s one more change to the die: microcode support for the C1E power state.

When the OS executes a halt instruction on a CPU (during a period of no activity, for example), the clock signal to the CPU is shut off for a period of time.  This saves power as no transistors are actively switching during this time.  The voltage supplied to the processor is left unchanged, however.  This power state is known as C1.

In the late Pentium 4 era Intel introduced an Enhanced Halt State, called C1E.  Instead of just shutting off the clock to the CPU, when a CPU was in C1E its clock speed and voltage would both be reduced to their lowest possible values.  The reduction in voltage results in a reduction in leakage current, which in turn saves power.
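
To see why dropping the voltage matters even when the clock is already stopped, here is a minimal sketch of a textbook-style power model. Everything in it is an assumption made for illustration: the two coefficients, the voltages, and the simplification that leakage power scales with the square of the voltage. For simplicity it also treats the clock as fully stopped in both halt states. None of the numbers are measurements of a real processor.

```python
from dataclasses import dataclass

# Hypothetical constants, picked only to make the comparison readable.
SWITCHING_COEFF = 10.0   # lumped activity * capacitance term, W per (V^2 * GHz)
LEAKAGE_COEFF = 8.0      # crude leakage term, W per V^2


@dataclass
class PowerState:
    name: str
    clock_ghz: float   # 0.0 means the clock is stopped
    voltage: float     # core voltage in volts


def power_watts(state: PowerState) -> float:
    """Dynamic power (C * V^2 * f) plus a simplified V^2 leakage term."""
    dynamic = SWITCHING_COEFF * state.voltage ** 2 * state.clock_ghz
    leakage = LEAKAGE_COEFF * state.voltage ** 2
    return dynamic + leakage


# Illustrative operating points, not measured values for any real CPU.
states = [
    PowerState("C0 (running)", clock_ghz=3.0, voltage=1.35),
    PowerState("C1 (clock stopped)", clock_ghz=0.0, voltage=1.35),
    PowerState("C1E (clock stopped, voltage lowered)", clock_ghz=0.0, voltage=1.00),
]

for s in states:
    print(f"{s.name}: {power_watts(s):.1f} W")
```

In this model C1 removes the switching term entirely but leaves leakage untouched, while C1E shrinks the leakage term as well because the voltage is lower.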

Apparently prior to the Athlon II X2, AMD enabled support for C1E outside of the processor.  Although I tried, I couldn’t get access to anyone at AMD to explain things any further so what I offer is my best guess.  I’m guessing that whenever a halt instruction was executed by the OS, AMD used some combination of its existing C1 support and Cool’n’Quiet to both stop the clock to the CPU and reduce voltage. 

Regardless of how AMD enabled it, motherboard makers were constantly botching it in their BIOSes, which resulted in different motherboards having very different power consumption levels, especially at idle.  It appears that some vendors were properly enabling this software-hack C1E state while others weren’t.

AMD always expressed frustration to me that the motherboard vendors kept screwing things up and I’m guessing they got tired of dealing with it.  The new Athlon II X2 has microcode level support for the C1E state; when the OS executes a halt instruction, the CPU now knows to both shut off its clock and drop its voltage.  No BIOS trickery necessary.

The problem with this, as you can guess, is that not all current motherboards have proper BIOS support for it.  Yep.

But that’s only half of the problem.  Simply not supporting the new hardware C1E in the Athlon II X2 won’t cause the issue I saw above; that has to do with Cool’n’Quiet, not C1E.  So what’s going on?

Late last week AMD finally got back to me with an answer.  The feature that caused the CnQ bug in the original Phenom processor was the processor’s ability to run each core at a different clock speed.  A nasty combination of Windows’ scheduler and the Phenom’s power management could result in cores, under load, running at 50% of their frequency.  AMD fixed the problem by removing the feature; in the Phenom II all cores attempt to run at the same frequency. 

When AMD put out its master BIOS code for all 7xx series reference motherboards, the Athlon II did not exist.  The fix that was applied to the Phenom II would not be applied to any other Phenom II-based derivatives; they would simply get treated as original Phenom processors, with clock speeds that can vary between cores.

And that’s what’s going on.  The Athlon II isn’t told to run both of its cores at the same frequency and thus you can have situations where performance is much lower than it should be.
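
AMD did not spell out the exact mechanism, so the following is only a toy model of one plausible way it could play out: a single busy thread is bounced between two cores by the scheduler, and a core only reaches full speed after it has been busy for a while. The function name, the tick-based timing, the 50% low state, and the rule that an idle core drops back immediately are all assumptions made for the sketch.

```python
LOW_SPEED = 0.5    # assumed low power state: half the rated clock
HIGH_SPEED = 1.0   # full rated clock


def average_speed(ticks: int, migrate_every: int, ramp_up: int, linked: bool) -> float:
    """Average relative speed seen by one busy thread on a two-core CPU.

    linked=False: each core tracks only its own load (original Phenom style).
    linked=True:  both cores follow the load of the whole package (the fix).
    """
    core_streak = [0, 0]   # consecutive busy ticks per core
    package_streak = 0     # consecutive busy ticks anywhere on the package
    total = 0.0
    for t in range(ticks):
        active = (t // migrate_every) % 2   # scheduler bounces the thread
        core_streak[active] += 1
        core_streak[1 - active] = 0         # idle core drops back at once
        package_streak += 1
        streak = package_streak if linked else core_streak[active]
        # A core runs at full speed only after more than `ramp_up` busy ticks.
        total += HIGH_SPEED if streak > ramp_up else LOW_SPEED
    return total / ticks


independent = average_speed(ticks=1000, migrate_every=10, ramp_up=10, linked=False)
linked = average_speed(ticks=1000, migrate_every=10, ramp_up=10, linked=True)
print(f"independent clocks: {independent:.3f}")
print(f"linked clocks:      {linked:.3f}")
```

With these parameters the thread never stays on one core long enough to trigger a ramp-up when the clocks are independent, so it sits at the 50% state for the whole run. With linked clocks it pays the ramp-up cost only once at the start.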

AMD is aware of the issue and is currently working with motherboard vendors to properly enable BIOS support for the Athlon II.  Until then, the best way to run and use the Athlon II is with CnQ disabled.  Unlike the original Phenom, this bug should get fixed in the near future.

55 Comments

  • TA152H - Tuesday, June 2, 2009

    I agree with almost everything you say, I only have a small caveat.

    Intel chips will suffer much less from this than AMD, since they have an inclusive cache architecture, and can readily read the information from the L3 cache. I still think AMD has an exclusive cache arrangement, which I really think they should change with regards to the L3 cache for reasons just like the one you mention.

    For what it's worth, Microsoft screwed Intel 14 years back when the Pentium Pro was released. Naturally, Intel got the blame for having miserable 16-bit performance (it was related to segmentation, which was part of all the 16-bit modes and, technically, even 32-bit mode, even though it was transparent), because Microsoft told them the world would be 32-bit by then. Of course Windows 95 had a lot of legacy code, and Windows NT, which we called "Not There" at the time, was about as common as a 20 year old virgin in western Europe. So, Intel took the blame, just like AMD is now, despite, once again, Microsoft's incompetence.

    Really, if you think about it, the ability to clock the processors differently could be a very useful feature, except for the fact Winblows can't use it properly.

    Good
  • TA152H - Tuesday, June 2, 2009

    First, I like seeing the Pentium 4s in the benchmarks, it was kind of interesting. They did better than I thought they would, and it makes me even more curious what they would be like on 45nm, since their clock speeds would probably be astronomical (since 45nm has much better power characteristics, and the clock speed limiter on the Pentium 4 was power use/heat).

    But, anyway, why not use the Pentium 4 670 (3.8 GHz), or Pentium EE 965 Extreme Edition (3.73 GHz) processors? Why use the next to fastest ones?

    Don't get me wrong, it was still informative, but I would have liked to see the fastest measured against today's processors, not one step removed. Even so, it was nice to see them, so it's just a minor complaint. I'm looking forward to seeing the Nano.
  • strikeback03 - Wednesday, June 3, 2009

    Might not have had any around. Figure the "best of breed" were the most likely to be either sold or go in a system for some family member when they were no longer needed for comparison duties.
  • ShangoY - Tuesday, June 2, 2009

    I am curious as to why the current cheapest Intel quad core was not included in the benches, yet you bothered to go grab the previously $999 Pentium 4 and then also included the Phenom X4 940.
  • Gary Key - Tuesday, June 2, 2009

    http://www.anandtech.com/bench/default.aspx?b=2 - You can compare them here.
  • Kenzid - Tuesday, June 2, 2009

    Does anybody know why AMD's transistor density is very low compared to Intel's? Is this because of Intel's high-k metal gate process or the architecture?
  • Goty - Tuesday, June 2, 2009

    It's more than likely due to the fact that Intel has much higher cache densities than AMD does. It probably had very little to nothing to do with the actual process (well, beyond the geometry size, that is).
  • TA152H - Tuesday, June 2, 2009

    What are you basing that on?

    Typically, cache is very dense, so you will notice transistor count increasing disproportionately to size as you add cache.

    With respect to the Athlon II X2 being larger than the Penryn, that's not really a bad thing, since it does more too; the Penryn needs a memory controller on the chipset that the Athlon II does not.
  • Kenzid - Tuesday, June 2, 2009

    Based on the die size chart above: the Core 2 Duo is 107mm2 with 410 million transistors, while the Athlon II has only 234 million transistors on 117mm2. It's almost half of the number Intel used on theirs. Does the IMC take that much space?
  • TA152H - Tuesday, June 2, 2009

    Well, take a look at the Pentium version, and you'll see what I was saying about the cache. We both can agree it's the same core, but one has a larger cache.

    The Pentium is 82 mm2, with 228 million transistors and 2.064 megabytes of cache (L1 + L2). But, since 1 megabyte is disabled, it's really 3.064, like the other Wolfdales have. The 6 MB version of the Core 2 is 107 mm2 with 410 million transistors.

    So, you can see that adding 3 MB of cache increased the transistor count by 182 million, but the size by only 25 mm2. Or, in other words, it increased transistors by about 80%, but size by about 31%. So, cache does increase transistor count disproportionately to die size.

    Oh, and yes, the IMC is quite large. You can view some of the pictures of the CPU die to see it, but it's far from insignificant in size.
