The Mystery of the Missing Performance

As in past experience, we saw some very odd system behavior in testing and suspected that Cool'n'Quiet (CnQ) might have something to do with it. To test this theory, we pulled out our Photoshop CS3 benchmark and ran it once with CnQ off and once with it on. Our results were staggeringly different.

It so happened that our first run through the benchmarks was with the power-saving feature disabled. Our numbers looked much better than in previous tests, and it seemed like everything made sense once again.

When we re-enabled CnQ for the second run, however, the issue seemed to have disappeared (as has happened at random in the past as well). We did install AMD's Power Meter to verify that CnQ was working (it was), and it is possible that installing this software somehow fixed the issue. But since the problem has come and gone at random before, we really can't suggest this as a surefire fix either.

In trying to reproduce the problem, we uninstalled the Power Meter, rebooted, disabled CnQ, then re-enabled it again. None of this brought back the poor performance we had seen, but in another odd twist, CnQ didn't really provide any power advantage either. Since we didn't measure power while the problem was apparent, it's entirely possible that CnQ's power savings are afflicted by the same underlying issue.

In fact, if both performance and power savings were negatively affected by whatever is happening, we would not be surprised. AMD has informed us that our power numbers don't show as much of a savings as they would expect from CnQ (interestingly enough, Johan saw similar behavior in his latest piece). We've asked AMD to help us track down the issue, but their power guy is currently on vacation so it will be a little while.

Because the Photoshop test took so much less time without CnQ, we actually wanted to measure power usage over the test and compare total energy used (watts × seconds, giving joules). We fully expected the non-CnQ run to complete the test so much faster that it would use less total energy to perform the operation. Unfortunately, we were unable to verify this theory.
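The arithmetic behind that comparison is simple enough to sketch. Here is a minimal example; the wattages and runtimes are made-up placeholders standing in for the measurements we weren't able to take, not real data:

```python
# Hypothetical illustration: comparing total energy for the two runs.
# All numbers below are invented placeholders, not our measurements.

def energy_joules(avg_watts: float, seconds: float) -> float:
    """Energy (J) = average power (W) x elapsed time (s)."""
    return avg_watts * seconds

# A faster, higher-power run vs. a slower run throttled by the CnQ bug.
no_cnq = energy_joules(avg_watts=140.0, seconds=60.0)    # 8400 J
with_cnq = energy_joules(avg_watts=110.0, seconds=95.0)  # 10450 J

print(f"CnQ off: {no_cnq:.0f} J, CnQ on: {with_cnq:.0f} J")
if no_cnq < with_cnq:
    print("Finishing faster used less total energy despite higher draw.")
```

With placeholder numbers like these, the faster run wins on total energy even at a substantially higher average draw, which is exactly the outcome we expected but couldn't confirm.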

One thing is certain: something is definitely not working as it should.

We do have a couple of theories, though nothing is confirmed or even makes complete sense yet. Still, why not share our thoughts and musings and see what comes of it? That approach worked fairly well in helping us find the instruction latency of the GT200, right?

Our first thought was that the chip took longer than it should to come out of its low-power state, but AMD said there's no reason the X2 should be able to do this any faster.

Our minds then wandered back to what we saw in the AMD Power Meter. Since Windows Vista takes it upon itself to move threads between cores in fairly stupid ways, during the Photoshop test we saw what looked like threads bouncing around between cores or cycling through them in rapid succession. Whatever was actually happening, the result was that one core would ramp up to full speed (from 1GHz to 2GHz) and then drop back down as the next core came up to speed.

We talked about how threads migrating between cores, each needing to wake the next core up rather than running on one already at full speed, could impact performance. Since Phenom is the only CPU architecture we currently have access to with individual PLLs per core (Intel's CPUs must run all cores at the same frequency), the CnQ issues could be related to that capability.
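One quick way to probe this theory would be to pin the benchmark to a single core so the scheduler can't bounce its threads between sleeping cores. A minimal sketch using the Linux affinity API as an assumption; on Vista the equivalent would be the SetProcessAffinityMask call, or `start /affinity` from the command line:

```python
# Sketch: restrict the current process to one logical CPU so its threads
# cannot migrate, ruling thread-bouncing in or out as the cause.
# Linux-only API shown; Windows uses SetProcessAffinityMask instead.
import os

def pin_to_core(core: int) -> None:
    # pid 0 means "the calling process"
    os.sched_setaffinity(0, {core})

pin_to_core(0)
print("allowed cores:", os.sched_getaffinity(0))  # -> allowed cores: {0}
```

If the slowdown disappears with the process pinned, that would point the finger squarely at thread migration across independently clocked cores.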

Whatever causes this problem has to be AMD-specific -- and not just AMD-specific but Phenom-specific, because we've never seen it on other AMD parts.

Or have we?

Last year, AMD GPUs exhibited quite an interesting issue with their power management features that was clearly evident in specific locations while playing Crysis. The culprit was the dynamic clocking of the GPU based on graphics load: because the hardware could switch quickly between modes, and due to the way Crytek's engine works, AMD GPUs were constantly speeding up and slowing down in situations where they should have stayed at full speed the entire time.

It is entirely possible that the CPU issue is of a similar nature. Perhaps the hardware that controls clock speed is slowing down and then speeding up each core when it should simply keep the core at full speed a short time longer. The solution to the GPU issue was to increase the amount of time the GPU had to show lowered activity before it was clocked down. This meant that an increase in activity resulted in an instant speed bump, while the GPU had to remain relatively lightly used for a longer period (still less than a second, if I recall correctly) before it was clocked back down.

Yes, it's the same company, but the similarities go a bit deeper than that. We really don't know what lies at the heart of the matter, but this kind of problem is certainly not without precedent. We will have to wait for AMD to help us understand what is happening and whether anything can be done about it. We do hope you've enjoyed our best guesses, and please let us know if you have any other plausible explanations we haven't addressed.

Comments
  • Gikaseixas - Wednesday, July 2, 2008 - link

Other sites tested it already and could hit 3.2 - 3.6GHz speeds. Hopefully Anandtech will be able to overclock a Phenom to its limits this time around.
  • Googer - Wednesday, July 2, 2008 - link

    Missing from the benchmarks is the Intel Core 2 Quad Q9550 Yorkfield 2.83GHz 12MB. How would this chip stack up against all others tested?

http://www.newegg.com/Product/Product.aspx?Item=N8...
  • RamarC - Wednesday, July 2, 2008 - link

    the q9550 isn't in the same price range as the other processors so that's why it wasn't included. as for performance, either subtract or add 10% to the q9450's figures.
  • DanD85 - Wednesday, July 2, 2008 - link

    Are you absolutely sure about that? As hothardware thinks differently:
    "By altering its multiplier and increasing the CPU voltage to 1.45v, we were able to take our Phenom X4 9950 to an respectable 3.1GHz using nothing but a stock AMD PIB cooler. Higher frequencies were possible, but we couldn't keep the system 100% stable, so we backed things down to 3.1GHz. While running at that speed, we re-ran some tests and also monitored core temperatures and found that the chip never broke the 60ºC mark, and hovered around 58ºC under load - at least according to AMD's Overdrive software. That is one heck of an overclock and relatively cool temperatures for a Phenom in our opinion. If the majority of chips have the same amount of headroom as ours, we suspect the 9950 Black Edition will be appealing to AMD CPU enthusiasts looking for the best the company has to offer."
http://www.hothardware.com/Articles/AMD_Phenom_X4_...
  • KaarlisK - Wednesday, July 2, 2008 - link

    Maybe by setting affinity for Photoshop's threads to a certain core, it would be possible to verify whether Vista's thread management is part of the cause?
  • Zoomer - Wednesday, July 2, 2008 - link

I know it's a novel concept, but what about running some benches in XP to see if it's another Vista issue?
  • Rhoxed - Wednesday, July 2, 2008 - link

    Increasing the NB core (IMC) clock (in Phenom it runs async from the Core Speed unlike Athlon which is Sync) drops latencies (especially L3) and increases memory performance/throughput, which in turn improves system performance. The Phenom starts to come to life when you hit a 2.6GHz core speed with a NB core clock at 2200MHz+. Depending on the application and CPU, increasing NB core speeds (getting up to 2200MHz+) can result in performance differences from 3%~12% in most cases.

Upping my NB/HT to 2400MHz over the stock 2000MHz at the same core clock (2800MHz), I net a ~15% increase (on a 9850BE)
  • RamarC - Tuesday, July 1, 2008 - link

    i'm a developer and want to upgrade my win2k3/ss2k5 server to a quad core. since it currently has a 3.4ghz p4d, a phenom 9x50 would be a big step-up (even though i don't have any performance issues). but the p4d has been very reliable and i don't want to have to deal with flaky hardware issues when i'm pushing code out the door. should i just bite the bullet and shell out the extra cash for a p45+q9450?
  • Calin - Wednesday, July 2, 2008 - link

    I have an AMD based PC at home, and I look forward to another AMD-based pc (780G and Phenom X3 or X4).
    These being said, I think for a server you really really should go for an Intel configuration. Also, at 3.4 GHz a P4D probably is one hell of a power draw.
    Compared to your current server, and based on what I think you need, I don't think a quad core would help you - a dual core would probably be enough, and Intel has those aplenty.
  • Zoomer - Wednesday, July 2, 2008 - link

    From the article, it seems like sticking to the cheaper, sub 100W TDP cpus and not overclocking is the way to go.
