The Pursuit of Clock Speed

Thus far I have pointed out that Bulldozer has fewer of several key resources than AMD's Phenom II architecture had. Many of these tradeoffs were made to keep die size in check while adding new features (e.g. wider front end, larger queues/data structures, new instruction support). Everywhere from the front end through the execution clusters, AMD's opportunity to increase performance depends on both efficiency and clock speed. Bulldozer has to make better use of its resources than Phenom II and run at higher frequencies to outperform its predecessor. As a result, a major design target for Bulldozer was the ability to scale to higher clock speeds.

AMD's architects called this pursuit a low gate count per pipeline stage design. By reducing the number of gates per pipeline stage, you reduce the time spent in each stage and can increase the overall frequency of the processor. If this sounds familiar, it's because Intel used similar logic in the creation of the Pentium 4.

Where Bulldozer differs, AMD insists, is that the design didn't aggressively pursue frequency like the P4, but rather aggressively pursued gate count reduction per stage. According to AMD, the former results in power problems while the latter is more manageable.
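To make the gate count argument concrete, here is a toy model of my own, not anything AMD has disclosed: treat the cycle time as the number of gates in the slowest pipeline stage multiplied by a per-gate delay, plus a fixed latch overhead. Every name and number below is hypothetical and chosen only for illustration.

```python
def max_frequency_ghz(gates_per_stage: int,
                      gate_delay_ps: float,
                      latch_overhead_ps: float) -> float:
    """Estimate peak clock from the depth of the slowest pipeline stage.

    Toy model: cycle time = gates * delay per gate + fixed latch overhead.
    All inputs are illustrative assumptions, not figures for any real CPU.
    """
    cycle_time_ps = gates_per_stage * gate_delay_ps + latch_overhead_ps
    # A 1000 ps cycle corresponds to 1 GHz.
    return 1000.0 / cycle_time_ps


# Hypothetical process: 10 ps per gate, 50 ps of latch overhead per stage.
for gates in (25, 20, 15):
    freq = max_frequency_ghz(gates, gate_delay_ps=10.0, latch_overhead_ps=50.0)
    print(f"{gates} gates per stage: {freq:.2f} GHz")
```

Note that the fixed overhead term means cutting the gate count by some percentage buys a smaller percentage of extra frequency, which is one reason very deep pipelines see diminishing returns.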

AMD's target for Bulldozer was a 30% higher frequency than the previous generation architecture. Unfortunately that's a fairly vague statement and I couldn't get AMD to commit to anything more specific, but if we look at the top-end Phenom II X6 at 3.3GHz, a 30% increase in frequency would put Bulldozer at roughly 4.3GHz.

Unfortunately 4.3GHz isn't what the top-end AMD FX CPU ships at. The best we'll get at launch is 3.6GHz, a meager 9% increase over the outgoing architecture. Turbo Core does get AMD close to those initial frequency targets; however, the turbo frequencies are typically only seen for very short periods of time.
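The arithmetic behind those two figures is easy to check. A minimal sketch, using only the clock speeds quoted above; the function and constant names are my own:

```python
def target_clock(base_ghz: float, uplift: float) -> float:
    """Clock speed implied by a fractional frequency uplift over a base clock."""
    return base_ghz * (1.0 + uplift)


def percent_increase(old_ghz: float, new_ghz: float) -> float:
    """Percentage increase going from old_ghz to new_ghz."""
    return (new_ghz / old_ghz - 1.0) * 100.0


PHENOM_II_X6_GHZ = 3.3  # top-end Phenom II X6 base clock
FX_LAUNCH_GHZ = 3.6     # top-end AMD FX base clock at launch

# A 30% uplift over the Phenom II X6.
print(f"Target clock: {target_clock(PHENOM_II_X6_GHZ, 0.30):.2f} GHz")

# What the launch part actually delivers over the Phenom II X6.
print(f"Actual uplift: {percent_increase(PHENOM_II_X6_GHZ, FX_LAUNCH_GHZ):.1f}%")
```

The target works out to 4.29GHz, which I round to 4.3GHz, and the shipping base clock is about 9.1% higher than the Phenom II X6.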

As you may remember from the Pentium 4 days, a significantly deeper pipeline can bring with it significant penalties. We have two prior examples of architectures that increased pipeline length over their predecessors: Willamette and Prescott.

Willamette doubled the pipeline length of the P6 and was supposed to make up for it with a corresponding increase in clock frequency. If you do less per clock cycle, you need to throw more clock cycles at the problem to have a neutral impact on performance. Although Willamette ran at higher clock speeds than the outgoing P6 architecture, the increase in frequency was gated by process technology. It wasn't until Northwood arrived that Intel could hit the clock speeds required to truly put distance between its newest and older architectures.

Prescott lengthened the pipeline once more, this time quite significantly. Much to our surprise, however, thanks to a lot of clever work on the architecture side, Intel was able to keep average instructions executed per clock constant while increasing the length of the pipe. This enabled Prescott to hit higher frequencies and deliver more performance at the same time, without starting at an inherent disadvantage. Where Prescott did fall short, however, was in the power consumption department. Running at extremely high frequencies required very high voltages and, as a result, power consumption skyrocketed.

AMD's goal with Bulldozer was to have IPC remain constant compared to its predecessor while increasing frequency, similar to Prescott. If IPC can remain constant, any frequency increases will translate into performance advantages. AMD attempted to do this through a wider front end, larger data structures within the chip and a wider execution path through each core. In many senses it succeeded; however, single-threaded performance still took a hit compared to Phenom II:

Cinebench 11.5 - Single Threaded

At the same clock speed, Phenom II is almost 7% faster per core than Bulldozer according to our Cinebench results. This takes into account all of the aforementioned IPC improvements. Despite AMD's efforts, IPC went down.

A slight reduction in IPC, however, is easily made up for by an increase in operating frequency. Unfortunately, it doesn't appear that AMD was able to hit the clock targets it needed for Bulldozer this time around.
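To put rough numbers on that tradeoff, here is a back-of-the-envelope sketch. It assumes single-threaded performance scales linearly with IPC times clock speed, and it treats the "almost 7%" Cinebench gap as exactly 7% and as representative of other workloads. Both are simplifying assumptions on my part, and the function names are my own.

```python
def relative_performance(ipc_ratio: float, clock_ghz: float,
                         base_clock_ghz: float) -> float:
    """Performance relative to the baseline chip, assuming perf = IPC * clock."""
    return ipc_ratio * clock_ghz / base_clock_ghz


def break_even_clock(ipc_ratio: float, base_clock_ghz: float) -> float:
    """Clock needed to match the baseline chip despite an IPC deficit."""
    return base_clock_ghz / ipc_ratio


# Phenom II is ~7% faster per clock, so Bulldozer's IPC is 1/1.07 of Phenom II's.
BULLDOZER_IPC_RATIO = 1.0 / 1.07
PHENOM_II_X6_GHZ = 3.3

print(f"Break-even clock: "
      f"{break_even_clock(BULLDOZER_IPC_RATIO, PHENOM_II_X6_GHZ):.2f} GHz")

# Compare the launch base clock with the 30% frequency target.
for clock in (3.6, 4.3):
    ratio = relative_performance(BULLDOZER_IPC_RATIO, clock, PHENOM_II_X6_GHZ)
    print(f"At {clock} GHz: {ratio:.2f}x Phenom II X6 single-threaded performance")
```

Under those assumptions Bulldozer needs about 3.53GHz just to match a 3.3GHz Phenom II X6 in single-threaded work, so the 3.6GHz launch clock leaves it only around 2% ahead before Turbo Core is taken into account.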

We've recently reported on GlobalFoundries' issues with 32nm yields. I can't help but wonder if the same type of issues that are impacting Llano today are also holding Bulldozer back.

430 Comments

  • ThaHeretic - Saturday, October 15, 2011

    Here's something for a compile test: build the Linux kernel. Something people actually care about.
  • Loki726 - Monday, October 31, 2011

    The Linux kernel is more or less straight C with a little assembly; it is much easier on a compiler frontend and more likely to stress the backend optimizers and code generators.

    Chromium is much more representative of a modern C++ codebase. At least, it is more relevant to me.
  • nyran125 - Saturday, October 15, 2011

    What's the point in having 8 cores if it's not even as fast as an Intel 4 core and you get better performance overall with Intel? Here's the BIG reality: the high end 8 core is not that much cheaper than a 2600K. Like $20-60 MAX. You'd be crazy to buy an 8 core for the same price as an Intel 2600K...

    LIKE MAD!!!
  • Fiontar - Saturday, October 15, 2011

    Well, these numbers are pretty dismal all around. Maybe as the architecture and the process mature, this design will start to shine, but for the first generation, the results are very disappointing.

    As someone who is running a Phenom II X6 at a non-turbo core 4.0 GHz, air cooled, I just don't see why I would want to upgrade. If I got lucky and got a BD to overclock to 4.6 GHz, I might get a single digit % increase in performance over my Phenom II X6, which is not worth the cost or effort.

    I guess on the plus side, my Phenom II was a good upgrade investment. Unless I'm tempted to upgrade to an Intel set up in the near future, I think I can expect to get another year or two from my Phenom II before I start to see upgrade options that make sense. (I usually wait to upgrade my CPU until I can expect about a 40% increase in performance over my current system at a reasonable price).

    I hope AMD is able to remain competitive with NVidia in the GPU space, because they just aren't making it in the CPU space.

    BTW, if the BD can reliably be overclocked to 4.5 GHz+, why are they only selling them at 3.3 GHz? I'm guessing because the added power requirements then make them look bad on power consumption and performance per watt, which seems to be trumping pure performance as a goal for their CPU releases.
  • Fiontar - Saturday, October 15, 2011

    A big thumbs down to Anand for not posting any of the over-clock benchmarks. He ran them, why not include them in the review?

    With the BD running at an air cooled 4.5 GHz, or a water cooled 5.0 GHz, both a significant boost over the default clock speed, the OC benchmarks are more important to a lot of enthusiasts than the base numbers. In the article you say you ran the benchmarks on the OC part, so why didn't you include them in your charts? Or at least some details in the section of the article on the overclock? You tell us how high you managed to overclock the BD and under what conditions, but you gave us zero input on the payoff!
  • Oscarcharliezulu - Saturday, October 15, 2011

    ...was going to upgrade my old amd3 system to a BD, just a dev box, but I think a phenom x6 or 955 will be just fine. Bit sad too.
  • nhenk--1 - Sunday, October 16, 2011

    I think Anand hit the nail on the head mentioning that clock frequency is the major limitation of this chip. AMD even stated that they were targeting a 30% frequency boost. A 30% frequency increase over a 3.2 GHz Phenom II (AM3 launch frequency, I think) would be 4.2 GHz, 17% faster than the 3.6 GHz 8150.

    If AMD really did make this chip to scale linearly to frequency increases, and you add 17% performance to any of the benchmarks, BD would roughly match the i7. This was probably the initial intention at AMD. Instead the gigantic die, and limitations of 32nm geometries shot heat and power through the roof, and that extra 17% is simply out of reach.

    I am an AMD fan, but at this point we have to accept that we (consumers) are not a priority. AMD has been bleeding share in the server space where margins are high, and where this chip will probably do quite well. We bashed Barcelona at release too (I was still dumb enough to buy one), but it was a relative success in the server market.

    AMD needs to secure its spot in the server space if it wants to survive long term. 5 years from now we will all be connecting to iCloud with our ARM powered Macbook Vapor thin client laptops, and a server will do all of the processing for us. I will probably shed a tear when that happens, I like building PCs. Maybe I'll start building my own dedicated servers.

    The review looked fair to me, seems like Anand is trying very hard to be objective.
  • neotiger - Monday, October 17, 2011

    "server space where margins are high, and where this chip will probably do quite well."

    I don't see how Bulldozer could possibly do well in the server space. Did you see the numbers on power consumption? Yikes.

    For servers, power consumption is far more important than it is in the consumer space. And BD draws about TWICE as much power as Sandy Bridge does while performing worse.

    BD is going to fail worse in the server space than it will in the consumer space.
  • silverblue - Monday, October 17, 2011

    I'm not sure that I agree.

    For a start, you're far more likely to see heavily threaded workloads on servers than in the consumer space. Bulldozer does far better here than with lightly threaded workloads and even the 8150 often exceeds the i7-2600K under such conditions, so the potential is there for it to be a monster in the server space. Secondly, if Interlagos noticeably improves performance over Magny Cours, then coupled with the fact that you only need to pop the Interlagos CPU into your G34 system, this should be an upgrade. Finally, power consumption is only really an issue with Bulldozer when you're overclocking. Sure, Zambezi is a hungrier chip, but remember that it's got a hell of a lot more cache and execution hardware under the bonnet. Under the right circumstances, it should crush Thuban, though admittedly we expected more than just "under the right circumstances".

    I know very little about servers (obviously), however I am looking forward to Johan's review; it'd be good to see this thing perform to its strengths.
  • neotiger - Monday, October 17, 2011

    First, in the server space BD isn't competing with the i7-2600K. You have to remember that all the current Sandy Bridge i7s waste a big chunk of silicon real estate on the GPU, which is useless in servers. In 3 weeks Intel is releasing the 6 core version of SB, essentially taking the transistors that have been used for the GPU and turning them into 2 extra cores.

    Even in highly threaded workloads the 8150 performs at more or less the same level as the i7-2600K. In 3 weeks SB will increase threaded performance by 50% (going from 4 cores to 6). Once again the performance gap between SB and BD will be huge, in both single-threaded and multi-threaded workloads.

    Second, BD draws much more power than SB even at stock frequency. This is borne out by the benchmark data in the article.
