The Pursuit of Clock Speed

Thus far I have pointed out that a number of resources in Bulldozer have been cut back compared to their abundance in AMD's Phenom II architecture. Many of these tradeoffs were made in order to keep die size in check while adding new features (e.g. wider front end, larger queues/data structures, new instruction support). Everywhere from the Bulldozer front end through the execution clusters, AMD's opportunity to increase performance depends on both efficiency and clock speed. Bulldozer has to make better use of its resources than Phenom II as well as run at higher frequencies to outperform its predecessor. As a result, a major target for Bulldozer was to be able to scale to higher clock speeds.

AMD's architects called this pursuit a low gate count per pipeline stage design. By reducing the number of gates per pipeline stage, you reduce the time spent in each stage and can increase the overall frequency of the processor. If this sounds familiar, it's because Intel used similar logic in the creation of the Pentium 4.

Where Bulldozer differs, AMD insists, is that the design didn't aggressively pursue frequency like the P4, but rather aggressively pursued gate count reduction per stage. According to AMD, the former results in power problems while the latter is more manageable.
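To make the gates-per-stage argument concrete, here is a minimal first-order sketch in Python. It assumes cycle time is roughly the logic depth of the slowest stage multiplied by a per-gate delay, plus a fixed latch and clock-skew overhead. The function name and every number below (the gate counts, 12 ps per gate, 36 ps of overhead) are hypothetical values chosen for illustration, not figures from AMD or Intel.

```python
def max_frequency_ghz(gates_per_stage, gate_delay_ps, latch_overhead_ps):
    """First-order estimate of peak clock speed from pipeline stage depth.

    cycle time (ps) = gates in the slowest stage * delay per gate
                      + fixed latch/skew overhead
    All inputs are illustrative assumptions, not measured values.
    """
    cycle_ps = gates_per_stage * gate_delay_ps + latch_overhead_ps
    # A 1000 ps cycle corresponds to 1 GHz.
    return 1000.0 / cycle_ps


# Hypothetical process parameters (assumptions for illustration only).
GATE_DELAY_PS = 12.0
LATCH_OVERHEAD_PS = 36.0

for gates in (20, 15, 10):
    ghz = max_frequency_ghz(gates, GATE_DELAY_PS, LATCH_OVERHEAD_PS)
    print(f"{gates} gates/stage -> ~{ghz:.2f} GHz")
```

Under this model the fixed overhead is why halving the gate count does not double the clock: as stages get shallower, a growing share of each cycle goes to latching rather than useful work.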

AMD's target for Bulldozer was a 30% higher frequency than the previous generation architecture. Unfortunately that's a fairly vague statement and I couldn't get AMD to commit to anything more specific, but if we look at the top-end Phenom II X6 at 3.3GHz, a 30% increase in frequency would put Bulldozer at roughly 4.3GHz.

Unfortunately 4.3GHz isn't what the top-end AMD FX CPU ships at. The best we'll get at launch is 3.6GHz, a meager 9% increase over the outgoing architecture. Turbo Core does get AMD close to those initial frequency targets; however, the turbo frequencies are typically seen only for very short periods of time.
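The arithmetic behind those two figures is easy to check. The short Python sketch below uses only the clock speeds quoted above; the function and variable names are my own, and it looks at base clocks only, ignoring Turbo Core.

```python
# Clock speeds quoted in the text, in GHz.
PHENOM_II_X6_GHZ = 3.3   # top-end Phenom II X6
FX_LAUNCH_GHZ = 3.6      # top-end AMD FX base clock at launch
TARGET_UPLIFT = 0.30     # AMD's stated 30% frequency target


def target_frequency(base_ghz, uplift):
    """Frequency reached by raising base_ghz by the given fraction."""
    return base_ghz * (1.0 + uplift)


def percent_increase(old_ghz, new_ghz):
    """Percentage increase going from old_ghz to new_ghz."""
    return (new_ghz / old_ghz - 1.0) * 100.0


target = target_frequency(PHENOM_II_X6_GHZ, TARGET_UPLIFT)
actual_gain = percent_increase(PHENOM_II_X6_GHZ, FX_LAUNCH_GHZ)

print(f"30% target over Phenom II X6: {target:.2f} GHz")
print(f"Actual base clock gain at launch: {actual_gain:.1f}%")
```

The target works out to about 4.29GHz, and the shipping 3.6GHz base clock is a gain of roughly 9%, less than a third of what AMD was aiming for.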

As you may remember from the Pentium 4 days, a significantly deeper pipeline can bring with it significant penalties. We have two prior examples of architectures that increased pipeline length over their predecessors: Willamette and Prescott.

Willamette doubled the pipeline length of the P6 and was supposed to make up for it with a corresponding increase in clock frequency. If you do less per clock cycle, you need to throw more clock cycles at the problem to have a neutral impact on performance. Although Willamette ran at higher clock speeds than the outgoing P6 architecture, the increase in frequency was gated by process technology. It wasn't until Northwood arrived that Intel could hit the clock speeds required to truly put distance between its newest and older architectures.

Prescott lengthened the pipeline once more, this time quite significantly. Much to our surprise however, thanks to a lot of clever work on the architecture side Intel was able to keep average instructions executed per clock constant while increasing the length of the pipe. This enabled Prescott to hit higher frequencies and deliver more performance at the same time, without starting at an inherent disadvantage. Where Prescott did fall short however was in the power consumption department. Running at extremely high frequencies required very high voltages and as a result, power consumption skyrocketed.

AMD's goal with Bulldozer was to have IPC remain constant compared to its predecessor, while increasing frequency, similar to Prescott. If IPC can remain constant, any frequency increases will translate into performance advantages. AMD attempted to do this through a wider front end, larger data structures within the chip and a wider execution path through each core. In many senses it succeeded; however, single-threaded performance still took a hit compared to Phenom II:

 

[Chart: Cinebench 11.5 - Single Threaded]

At the same clock speed, Phenom II is almost 7% faster per core than Bulldozer according to our Cinebench results. This takes into account all of the aforementioned IPC improvements. Despite AMD's efforts, IPC went down.

A slight reduction in IPC, however, is easily made up for by an increase in operating frequency. Unfortunately, it doesn't appear that AMD was able to hit the clock targets it needed for Bulldozer this time around.
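To put rough numbers on that tradeoff, here is a minimal sketch that treats single-threaded performance as IPC multiplied by clock speed. It assumes the roughly 7% per-clock Cinebench deficit holds at every frequency and that performance scales linearly with clock, neither of which is guaranteed in practice. The names are my own, and only base clocks are considered.

```python
# Figures from the text.
PHENOM_II_IPC_ADVANTAGE = 1.07    # Phenom II "almost 7% faster" per clock
PHENOM_II_GHZ = 3.3               # top-end Phenom II X6
FX_LAUNCH_GHZ = 3.6               # top-end AMD FX base clock at launch
FX_TARGET_GHZ = 3.3 * 1.30        # the 30% frequency target


def relative_performance(bulldozer_ghz,
                         phenom_ghz=PHENOM_II_GHZ,
                         ipc_advantage=PHENOM_II_IPC_ADVANTAGE):
    """Bulldozer single-thread performance relative to Phenom II.

    Assumes performance = IPC * frequency, so 1.0 means parity.
    """
    return (bulldozer_ghz / phenom_ghz) / ipc_advantage


# Clock speed at which Bulldozer merely matches a 3.3GHz Phenom II.
break_even_ghz = PHENOM_II_GHZ * PHENOM_II_IPC_ADVANTAGE

print(f"Break-even clock: {break_even_ghz:.2f} GHz")
print(f"At launch clock:  {relative_performance(FX_LAUNCH_GHZ):.3f}x")
print(f"At target clock:  {relative_performance(FX_TARGET_GHZ):.3f}x")
```

Under these assumptions Bulldozer needs a little over 3.5GHz just to match a 3.3GHz Phenom II in single-threaded work, which leaves the 3.6GHz launch part only marginally ahead, while the original 30% target would have produced a clear lead.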

We've recently reported on GlobalFoundries' issues with 32nm yields. I can't help but wonder if the same type of issues that are impacting Llano today are also holding Bulldozer back.


430 Comments


  • madseven7 - Thursday, October 13, 2011 - link

    How could the 6core fx be competitive? For $169 you could get a Phenom 2 that beats the 8150 fx. Imagine what it would do to the 6 core fx chip?
  • jerkstorez - Wednesday, October 12, 2011 - link

    Was pretty hyped for this chip to come out, and if it was a decent performance boost, I was ready to upgrade from my Phenom II 1090t. It looks like it'd barely be an upgrade in many areas and a definite downgrade in others. With a whole new architecture and all the hype, I expected a lot more, at least better performance than the current generation of AMD chips. Very disappointed.
  • boobox - Wednesday, October 12, 2011 - link

    What is with the settings choice for all these benchmarks?

    Who is buying these processors and running games on medium settings and such low resolutions?

    I was hoping to see 1920x1080 at least for tests on the highest settings.
  • JarredWalton - Wednesday, October 12, 2011 - link

    It's a CPU review, so settings were selected to show both CPU-limited (or at least more CPU limited) and GPU-limited (or more GPU limited than CPU). And of course, that's only one facet of the overall review.
  • tipoo - Wednesday, October 12, 2011 - link

    Maxing out the GPU would create a bottleneck and hide the CPU's performance.
  • CoolGoodGuy - Wednesday, October 12, 2011 - link

    I read this extensive review. However, after the last page, your mention of a "Windows scheduler problem" made me think these tests might be biased. So, I thought of posting this comment.

    Windows is compiled using Intel Compiler, which optimises code very well for Intel processors but doesn't do that for AMD, whereas Linux is mostly compiled using GNU GCC. So, Linux benchmarks would be more neutral for both Intel and AMD processors.

    Also, nowadays a lot of desktop users have started using Linux distros like Ubuntu, and in servers Linux is mainstream.

    Server users would be greatly benefited by a Linux CPU benchmark.

    So, I would request you to include Linux benchmarks for processors in your reviews.

    Thanks,
  • JarredWalton - Wednesday, October 12, 2011 - link

    Server benchmarks will be coming from Johan at some point, though obviously those require a lot of time to put together. As for the compiler of Windows, you're never going to change that, and Windows is still 90%+ of the market. Ultimately, as a hardware manufacturer you need to make hardware that runs the stuff people are doing faster than the competition or there's not much point. It's like having the world's fastest GPU with crappy drivers: no one will like it because it can't run games.
  • FunBunny2 - Wednesday, October 12, 2011 - link

    Here's your answer:

    http://developer.amd.com/tools/gnu/pages/default.a...
  • Burticus - Wednesday, October 12, 2011 - link

    Wait wait wait wait wait, wait some more, wait for years... then.... bleh. Doesn't even match up to the last product. That is not the way to move the bar forward or stay in business, AMD.

    At some point why didn't someone just say.... you know what? This thing may sell like hotcakes for servers but just doesn't make sense for desktop. Sell the Opterons to get that server profit margin, and just die shrink the Thuban and make it faster for desktop. Know when something is a lost cause and fold the hand already.
  • Geforce man - Wednesday, October 12, 2011 - link

    Any chance for a few benchmarks at the overclocked settings? It would be quite interesting to see, as the 4.6GHz range seems to be a common OC for an i7 2600k (Obviously this would be slower, but it'd be nice to see)
