The Pursuit of Clock Speed

Thus far I have pointed out that a number of resources in Bulldozer have been scaled back compared to AMD's Phenom II architecture. Many of these tradeoffs were made in order to keep die size in check while adding new features (e.g. a wider front end, larger queues/data structures, new instruction support). Everywhere from the Bulldozer front end through the execution clusters, AMD's opportunity to increase performance depends on both efficiency and clock speed. Bulldozer has to make better use of its resources than Phenom II as well as run at higher frequencies to outperform its predecessor. As a result, a major target for Bulldozer was the ability to scale to higher clock speeds.

AMD's architects called this pursuit a low gate count per pipeline stage design. By reducing the number of gates per pipeline stage, you reduce the time spent in each stage and can increase the overall frequency of the processor. If this sounds familiar, it's because Intel used similar logic in the creation of the Pentium 4.
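To get a feel for why gate depth per stage matters, here's a back-of-the-envelope sketch. The gate delay, per-stage overhead and total logic depth figures below are invented purely for illustration; they aren't numbers AMD has disclosed:

```python
# Toy model: cycle time ~= (gates per stage * gate delay) + fixed latch/skew
# overhead per stage. All figures are assumptions for illustration only.
GATE_DELAY_PS = 20      # assumed delay of a single gate, in picoseconds
STAGE_OVERHEAD_PS = 50  # assumed latch + clock-skew overhead per stage
TOTAL_GATE_DEPTH = 300  # assumed total logic depth of the whole pipeline

for gates_per_stage in (30, 20, 15, 10):
    stages = TOTAL_GATE_DEPTH / gates_per_stage
    cycle_ps = gates_per_stage * GATE_DELAY_PS + STAGE_OVERHEAD_PS
    freq_ghz = 1000.0 / cycle_ps
    print(f"{gates_per_stage:2d} gates/stage -> {stages:4.0f} stages, ~{freq_ghz:.1f} GHz")
```

Fewer gates per stage means a shorter critical path per stage and thus a higher clock, but the pipeline gets longer and the fixed per-stage overhead eats a growing share of each cycle, which is why the approach eventually runs out of steam.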

Where Bulldozer differs, AMD insists, is that the design didn't aggressively pursue frequency the way the Pentium 4 did, but rather aggressively pursued gate count reduction per stage. According to AMD, the former results in power problems while the latter is more manageable.

AMD's target for Bulldozer was a 30% higher frequency than the previous generation architecture. Unfortunately that's a fairly vague statement and I couldn't get AMD to commit to anything more precise, but if we look at the top-end Phenom II X6 at 3.3GHz, a 30% increase in frequency would put Bulldozer at roughly 4.3GHz.

Unfortunately 4.3GHz isn't what the top-end AMD FX CPU ships at. The best we'll get at launch is 3.6GHz, a meager 9% increase over the outgoing architecture. Turbo Core does get AMD close to those initial frequency targets; however, the turbo frequencies are typically only seen for very short periods of time.
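For reference, the quick math behind those figures, using the 3.3GHz Phenom II X6 1100T and the 3.6GHz FX-8150 base clock:

```python
phenom_ii_x6 = 3.3   # GHz, top-end Phenom II X6 1100T
fx_8150 = 3.6        # GHz, top-end FX base clock at launch

frequency_target = phenom_ii_x6 * 1.30       # AMD's ~30% goal
shipping_gain = fx_8150 / phenom_ii_x6 - 1   # actual base-clock gain

print(f"30% target: ~{frequency_target:.1f} GHz")        # ~4.3 GHz
print(f"shipping base-clock gain: {shipping_gain:.0%}")  # ~9%
```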

As you may remember from the Pentium 4 days, a significantly deeper pipeline can bring with it significant penalties. We have two prior examples of architectures that increased pipeline length over their predecessors: Willamette and Prescott.

Willamette doubled the pipeline length of the P6 and was supposed to make up for it with a corresponding increase in clock frequency. If you do less per clock cycle, you need to throw more clock cycles at the problem to have a neutral impact on performance. Although Willamette ran at higher clock speeds than the outgoing P6 architecture, the increase in frequency was gated by process technology. It wasn't until Northwood arrived that Intel could hit the clock speeds required to truly put distance between its newest and older architectures.

Prescott lengthened the pipeline once more, this time quite significantly. Much to our surprise however, thanks to a lot of clever work on the architecture side Intel was able to keep average instructions executed per clock constant while increasing the length of the pipe. This enabled Prescott to hit higher frequencies and deliver more performance at the same time, without starting at an inherent disadvantage. Where Prescott did fall short however was in the power consumption department. Running at extremely high frequencies required very high voltages and as a result, power consumption skyrocketed.

AMD's goal with Bulldozer was to keep IPC constant relative to its predecessor while increasing frequency, similar to Prescott. If IPC can remain constant, any frequency increase translates directly into a performance advantage. AMD attempted to do this through a wider front end, larger data structures within the chip and a wider execution path through each core. In many senses it succeeded; however, single threaded performance still took a hit compared to Phenom II:

 

[Chart: Cinebench 11.5 - Single Threaded]

At the same clock speed, Phenom II is almost 7% faster per core than Bulldozer according to our Cinebench results. That figure already reflects all of the aforementioned IPC improvements. Despite AMD's efforts, IPC went down.

A slight reduction in IPC, however, is easily made up for by an increase in operating frequency. Unfortunately, it doesn't appear that AMD was able to hit the clock targets it needed for Bulldozer this time around.
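To put rough numbers on that tradeoff, here's a quick sketch that treats our ~7% single-threaded Cinebench deficit as a stand-in for the overall IPC gap (an oversimplification, since IPC varies by workload) and works out how much clock speed it takes to compensate:

```python
# Single-threaded performance scales roughly with IPC * frequency.
# The ~7% figure comes from our Cinebench 11.5 result at matched clocks.
bulldozer_ipc = 1 / 1.07   # relative to Phenom II (Phenom II ~7% faster per clock)
phenom_clock = 3.3         # GHz, Phenom II X6 1100T
fx_clock = 3.6             # GHz, FX-8150 base clock

break_even_clock = phenom_clock / bulldozer_ipc
relative_perf = bulldozer_ipc * (fx_clock / phenom_clock)

print(f"clock needed just to match Phenom II: ~{break_even_clock:.2f} GHz")  # ~3.53 GHz
print(f"FX-8150 vs. Phenom II X6, single threaded: ~{relative_perf:.1%}")    # ~102%
```

In other words, the 9% base-clock increase barely cancels out the IPC regression; any meaningful single-threaded gains over Phenom II have to come from Turbo Core or from workloads where the per-clock gap is smaller.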

We've recently reported on GlobalFoundries' issues with 32nm yields. I can't help but wonder if the same types of issues that are impacting Llano today are also holding Bulldozer back.

Comments

  • THizzle7XU - Wednesday, October 12, 2011 - link

    Well, why would you target the variable PC segment when you can program for a well established, large user-base platform with a single configuration and make a ton more money with probably far less QA work since there's only one set (two for multi-platform PS3 games) of hardware to test?

    And it's not like 360/PS3 games suddenly look like crap 5-6 years into their cycles. Think about how good PS2 games looked 7 years into that system's life cycle (God of War 2). Devs are just now getting the most out of the hardware. It's a great time to be playing games on 360/PS3 (and PC!).
  • GatorLord - Wednesday, October 12, 2011 - link

    Consider what AMD is and what AMD isn't and where computing is headed, and this chip really starts to make sense. While these benches seem frustrating to those of us on a desktop today, I think a slightly deeper dive shows that there is a whole world of hope here...with these chips, not something later.

    I dug into the deal with Cray and Oak Ridge, and Cray is selling ORNL massively powerful computers (think petaflops) using Bulldozer CPUs controlling Nvidia Tesla GPUs which perform the bulk of the processing. The GPUs do vastly more and faster FPU calculations and the CPU is vastly better at dishing out the grunt work and processing the results for use by humans or software or other hardware. This is the future of High Performance Computing, today, but on a government scale. OK, so what? I'm a client user.

    Here's what: AMD is actually best at making GPUs...no question. They have been in the GPGPU space as long as Nvidia...except the AMD engineers can collaborate on both CPU and GPU projects simultaneously without a bunch of awkward NDAs and antitrust BS getting in the way. That means that while they obviously can turn humble server chips into supercomputers by harnessing the many cores on a graphics card, how much more than we've seen is possible on our lowly desktops when this rebranded server chip enslaves the Ferraris on the PCI bus next door...the GPUs?

    I get it...it makes perfect sense now. Don't waste real estate on FPU dies when the ones next door are hundreds or thousands of times better and faster too. This is not the beginning of the end of AMD, but the end of the beginning (to shamelessly quote Churchill). Now all that cryptic talk about a supercomputer in your tablet makes sense...think Llano with a so-so CPU and a big GPU on the same die, with some code tweaks to schedule the GPU as a massive FPU, and the picture starts taking shape.

    Now imagine a full blown server chip (BD) harnessing full blown GPUs...Radeon 6XXX or 7XXX and we are talking about performance improvements in the orders of magnitude, not percentage points. Is AMD crazy? I'm thinking crazy like a fox.

    Oh...as a disclaimer, while I'm long AMD...I'm just an enthusiast like the rest of you and not a shill...I want both companies to make fast chips that I can use to do Monte Carlos and linear regressions...it just looks like AMD seems to have figured out how to play the hand they're holding for a change...here's to the future for us all.
  • Menoetios - Wednesday, October 12, 2011 - link

    I think you bring up a very good point here. This chip looks like it's designed to be very closely paired with a highly programmable GPU, which is where the GPU roadmaps are leading over the next year and a half. While the apples-to-apples nature of this review draws a disappointing picture, I'm very curious how AMD's "Fusion" products next year will look, as the various compute elements of the CPU and GPU become more tightly integrated. Bulldozer appears to fit perfectly in an ecosystem that we don't quite have yet.
  • GatorLord - Wednesday, October 12, 2011 - link

    Exactly. Ecosystem...I like it. This is what it must feel like to pick up a flashlight at the entrance to the tunnel when all you're used to is clubs and torches. Until you find the switch, it just seems worse at either...then voila!
  • actionjksn - Wednesday, October 12, 2011 - link

    Wow, I hope that made you feel better about the crappy chip also known as "Man With A Shovel"
    I was just hoping AMD would quit forcing Intel to have to keep on crippling their chips, just to keep them from putting AMD out of business. AMD better fix this abortion quick, this is getting old.
  • GatorLord - Thursday, October 13, 2011 - link

    Feeling fine. Not as good in the short run, but feeling better about the long run. Unfortunately, due to constraints, it takes AMD too long to get stuff dialed in and by the time they do, Intel has already made an end run and beat them to the punch.

    Intel can do that, they're 40x as big as AMD. Actually, and this may sound crazy until you digest it, the smartest thing Intel could do is spin off a couple of really good dev labs as competitors. Relying on AMD to drive your competition is risky in that AMD may not be able to innovate fast enough to push Intel where it could be if they had more and better sharks in the water nipping at their tails.

    You really need eight or more highly capable, highly aggressive competitors to create a fully functioning market free of monopolistic and oligopolistic sluggishness and BS hand signalling between them. This space is too capital intensive for that for the time being, with current chip making technology being what it is.
  • yankeeDDL - Wednesday, October 12, 2011 - link

    Just to be the devil's advocate ...
    The launch event in London sported two PCs, side by side, running Cinebench.
    One had the core i5-2500k, the other the FX8150.
    Of course, these systems are prepared by AMD, so the results from Anand are clearly more reliable (at least all the conditions are documented).
    Nevertheless, it is clear that in the demo from AMD, the FX runs faster. Not by a lot, but it is clearly faster than the i5.
    Video: http://www.viddler.com/explore/engadget/videos/335...

    Even so, assuming that this was a valid datapoint, things won't change too much: the i5-2500k is cheaper and (would be) slightly slower than the FX8150 in the most heavily threaded benchmark. But it would be slightly better than Anand's results show.
  • KamikaZeeFu - Wednesday, October 12, 2011 - link

    "Nevertheless, it is clear that in the demo from AMD, the FX runs faster. Not by a lot, but it is clearly faster than the i5."

    Check the review, cinebench r11.5 multithreaded chart.
    Anand's numbers mirror the ones by AMD. Multithreaded workloads are the only case where the 8150 will outperform an i5 2500k because it can process twice the amount of threads.

    Really disappointed in AMD here, but I expected subpar performance because it was eerily quiet about the FX line as far as performance went.

    Desktop BD is a full failure, they were aiming for high clock speeds and made sacrifices, but still failed their objective. By the time their process is mature and 4 GHz dozers hit the channel, Ivy Bridge will be out.

    As far as server performance goes, not even sure they will succeed there.
    As seen in the review, clock for clock performance isn't up compared to the previous generation, and in some cases it's actually slower. Considering that servers run at lower clocks in the first place, I don't see BD being any threat to Intel's server lineup.

    4 years to develop this chip, and their motto seemed to be "we'll do netburst but in not-fail"
  • medi01 - Wednesday, October 12, 2011 - link

    So CPU is a bottleneck in your games eh?
  • TekDemon - Wednesday, October 12, 2011 - link

    It's not, but people don't buy CPUs for today's games; generally you want your system to be future proof, so the more extra headroom there is in these CPU benchmarks, the better it holds up over the long term. Look back at CPU benchmarks from 3-4 years ago and you'll see that the CPUs that barely passed muster back then easily bottleneck you, whereas CPUs that had extra headroom are still usable for gaming. For example, the Core 2 Duo E8400 or E8500 is still a very capable gaming CPU, especially when given a mild overclock, and frankly in games that only use a few threads (like Starcraft 2) it gives Bulldozer a run for its money.
    I'm not a fanboy either way since I own that E8400 as well as a Phenom II (unlocked to X4, OC'ed to 3.9GHz) and an i5 2500K, but if I was building a new system I sure as heck would want extra headroom for future-proofing.
    That said? Of course these chips will offer more than enough power for general use. They're just not going to be good for high end systems. But in a general use situation the problem is that the power consumption is just crappy compared to the Intel solutions; even if you can argue that it's more than enough power for most people, why would you want to use more electricity?
