The Pursuit of Clock Speed

Thus far I have pointed out that a number of resources in Bulldozer have been cut back compared to AMD's Phenom II architecture. Many of these tradeoffs were made to keep die size in check while adding new features (e.g. wider front end, larger queues/data structures, new instruction support). Everywhere from the front end through the execution clusters, AMD's opportunity to increase performance depends on both efficiency and clock speed. Bulldozer has to make better use of its resources than Phenom II and run at higher frequencies to outperform its predecessor. As a result, a major design target for Bulldozer was the ability to scale to higher clock speeds.

AMD's architects called this pursuit a low gate count per pipeline stage design. By reducing the number of gates per pipeline stage, you reduce the time spent in each stage and can increase the overall frequency of the processor. If this sounds familiar, it's because Intel used similar logic in the creation of the Pentium 4.
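As a rough illustration of that relationship, here is a toy model in Python. The per-gate delay, the latch overhead, and the gate counts are made-up numbers chosen only to show the trend. They are not figures from AMD or Intel, and the function name is hypothetical.

```python
# Toy model: clock period = (logic gates per stage * delay per gate) + fixed latch overhead.
# Every number here is hypothetical and chosen for illustration only.

def max_frequency_ghz(gates_per_stage, gate_delay_ps=12.0, latch_overhead_ps=40.0):
    """Return the highest clock (in GHz) at which one pipeline stage fits in one cycle."""
    cycle_time_ps = gates_per_stage * gate_delay_ps + latch_overhead_ps
    return 1000.0 / cycle_time_ps  # a 1000 ps period corresponds to 1 GHz


for gates in (24, 20, 16):
    print(f"{gates} gates/stage -> {max_frequency_ghz(gates):.2f} GHz")
```

Because the latch overhead is fixed, each further cut in gates per stage buys a little less frequency than the gate count alone would suggest, which is one reason pipelines can't be sliced arbitrarily thin.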

Where Bulldozer differs, AMD insists, is that the design didn't aggressively pursue frequency like the P4 did, but rather aggressively pursued gate count reduction per stage. According to AMD, the former results in power problems while the latter is more manageable.

AMD's target for Bulldozer was a 30% higher frequency than the previous generation architecture. Unfortunately that's a fairly vague statement and I couldn't get AMD to commit to anything more specific, but if we look at the top-end Phenom II X6 at 3.3GHz, a 30% increase in frequency would put Bulldozer at roughly 4.3GHz.

Unfortunately 4.3GHz isn't what the top-end AMD FX CPU ships at. The best we'll get at launch is 3.6GHz, a meager 9% increase over the outgoing architecture. Turbo Core does get AMD close to those initial frequency targets, but the turbo frequencies are typically seen only for very short periods of time.
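The arithmetic behind those figures can be checked with a short Python sketch. The clock speeds and the 30% target come from the text above; the variable and function names are just illustrative.

```python
# Clock figures quoted in the text, in GHz.
PHENOM_II_X6_GHZ = 3.3   # top-end Phenom II X6
FX_LAUNCH_GHZ = 3.6      # top-end AMD FX base clock at launch
TARGET_UPLIFT = 0.30     # AMD's stated frequency target over the previous generation


def pct_increase(old, new):
    """Percentage increase going from old to new."""
    return (new - old) / old * 100.0


target_ghz = PHENOM_II_X6_GHZ * (1.0 + TARGET_UPLIFT)
actual_uplift_pct = pct_increase(PHENOM_II_X6_GHZ, FX_LAUNCH_GHZ)
shortfall_ghz = target_ghz - FX_LAUNCH_GHZ

print(f"30% target:      {target_ghz:.2f} GHz")
print(f"launch uplift:   {actual_uplift_pct:.1f}%")
print(f"base clock gap:  {shortfall_ghz:.2f} GHz")
```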

As you may remember from the Pentium 4 days, a significantly deeper pipeline can bring with it significant penalties. We have two prior examples of architectures that increased pipeline length over their predecessors: Willamette and Prescott.

Willamette doubled the pipeline length of the P6 and was meant to make up for it with a corresponding increase in clock frequency. If you do less per clock cycle, you need to throw more clock cycles at the problem to have a neutral impact on performance. Although Willamette ran at higher clock speeds than the outgoing P6 architecture, the increase in frequency was gated by process technology. It wasn't until Northwood arrived that Intel could hit the clock speeds required to truly put distance between its newest and older architectures.

Prescott lengthened the pipeline once more, this time quite significantly. Much to our surprise, however, a lot of clever work on the architecture side let Intel keep average instructions executed per clock constant while increasing the length of the pipe. This enabled Prescott to hit higher frequencies and deliver more performance at the same time, without starting at an inherent disadvantage. Where Prescott did fall short was in the power consumption department. Running at extremely high frequencies required very high voltages and, as a result, power consumption skyrocketed.

AMD's goal with Bulldozer was to have IPC remain constant compared to its predecessor while increasing frequency, similar to Prescott. If IPC can remain constant, any frequency increases will translate into performance advantages. AMD attempted to do this through a wider front end, larger data structures within the chip and a wider execution path through each core. In many senses it succeeded, but single threaded performance still took a hit compared to Phenom II:

 

[Chart: Cinebench 11.5 - Single Threaded]

At the same clock speed, Phenom II is almost 7% faster per core than Bulldozer according to our Cinebench results. This takes into account all of the aforementioned IPC improvements. Despite AMD's efforts, IPC went down.

A slight reduction in IPC, however, is easily made up for by an increase in operating frequency. Unfortunately, it doesn't appear that AMD was able to hit the clock targets it needed for Bulldozer this time around.
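To put rough numbers on that tradeoff, here is a first-order sketch in Python that treats single-threaded performance as IPC multiplied by clock frequency. It assumes the roughly 7% per-clock gap from the Cinebench result applies uniformly, which real workloads will not do, and it ignores Turbo Core entirely. The names are illustrative.

```python
# First-order model: single-threaded performance ~ IPC * clock frequency.
# Assumption: the ~7% per-clock Cinebench gap applies uniformly across workloads.

PHENOM_II_GHZ = 3.3
PHENOM_IPC_ADVANTAGE = 1.07  # Phenom II per-clock performance relative to Bulldozer


def relative_performance(bulldozer_ghz):
    """Bulldozer single-thread performance relative to a 3.3GHz Phenom II (1.0 = parity)."""
    return bulldozer_ghz / (PHENOM_II_GHZ * PHENOM_IPC_ADVANTAGE)


# Clock at which Bulldozer would merely match the 3.3GHz Phenom II under this model.
break_even_ghz = PHENOM_II_GHZ * PHENOM_IPC_ADVANTAGE

print(f"break-even clock: {break_even_ghz:.2f} GHz")
for ghz in (3.6, 4.29):
    print(f"{ghz:.2f} GHz -> {relative_performance(ghz):.3f}x Phenom II")
```

Under these assumptions the break-even point is about 3.53GHz, so the 3.6GHz launch clock leaves a margin of only around 2%, whereas the 30% frequency target would have been worth roughly 21%.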

We've recently reported on GlobalFoundries' issues with 32nm yields. I can't help but wonder if the same type of issues that are impacting Llano today are also holding Bulldozer back.


430 Comments


  • Iketh - Wednesday, October 12, 2011 - link

    AMD Exec a year ago: "We about ready to release BD?"

    AMD Engineer: "Soon. At 4ghz, we're actually slower per thread and using double the power than Phenom at 3.4ghz, but we'll get there..."

    AMD Exec: /gquit
  • lyeoh - Wednesday, October 12, 2011 - link

    Bulldozer reminds me of the P4/Prescott for some reason ;).

    High clock, high watts, but not enough performance.

    Might be faster in parallelizable tasks but most people with such tasks would just buy more computers and build large clusters.
  • Iketh - Wednesday, October 12, 2011 - link

    The processors are popping up on Newegg now... the 8120 for $220 and 6100 for $190
  • vol7ron - Wednesday, October 12, 2011 - link

    Sigh... I made the mistake of buying a Prescott. Not to mention I bought an "E" batch, which ran even hotter and wasn't as overclockable.
  • actionjksn - Wednesday, October 12, 2011 - link

    Yeah I had one of those hot potato's too. Back then we thought Intel was finished.
  • just4U - Thursday, October 13, 2011 - link

    ckryan, you stated you were blown away by the 2500K, yes? It's odd you know.. I've owned a PII 920, PII 1055, PII 955 (tested lots of lowbie $60-80 parts from AMD too) .. also used an i7 920, i7 955, i5 2500k, i7 2600k (my most recent one) and .. I am not blown away by any of them..

    Last time I was blown away by a cpu was the Q6600..(before that the A64 3200+) since then other cpu's have been better but not so much so that I'd say that it was night and day differences.
  • CeriseCogburn - Wednesday, March 21, 2012 - link

    Ok that was some enormously skilled twisting and spinning. BD is an epic failure, period. I can't envision anyone with any needs, need, or combo thereof choosing it.
    It's so bad amd lied about its transistor count.
    Forget it, it's an epic fail and never anything more.
  • jiffylube1024 - Wednesday, October 12, 2011 - link

    Ugh, BD is quite the disappointment. The power consumption is absolutely through the roof -- unacceptable for 32nm, really!

    With that said, I am very intrigued in the FX-4100 4-core 3.6GHz part. This should be the replacement for the Athlon II 2-4 core series, and I'm very interested to see how it does vs ~3 GHz Athlon II X2's, X3's and X4's.
  • yankeeDDL - Wednesday, October 12, 2011 - link

    Wow ...
    I'm blown away.
    I have been waiting for BD's reviews and benchmarks for months. I have waited for BD for my new rig.
    I have used AMD for the past 8 years and I am ... was convinced that it always offered, by far, the best price/performance ratio for entry-level, mid range PCs.
    I am still a big fan of AMD ... but I have to stand corrected. BD is a POS. Longer pipelines? Didn't they learn anything from Pentium 3/4 debacle?
    A Phenom II X6 is almost always better than BD, even in power consumption. Come on: if BD had come out shortly after the Phenom I could see it as an incremental improvement, a new baseline to build upon. But it took AMD years to come out with BD ... and this is the result? Disappointing.
    I mean, betting everything on higher clock frequencies? At 4GHz? It's no wonder that Intel's IPC improvements are crunching BD: IPC is all about doing more with the same power, clock speed is all about throwing more power to do the same faster ...
    Boy. This ruined my day.
  • yankeeDDL - Wednesday, October 12, 2011 - link

    By the way, no matter how AMD slices it, I see the FX-8* as a 4-core CPU. A glorified one, but still a 4-core.
    If I was AMD, I would have considered a fair goal to obliterate the i5-2500 performance with the new FX-8 family, instead it comes short most of the time.
    What were they thinking?
