The Pursuit of Clock Speed

Thus far I have pointed out that a number of resources in Bulldozer have been cut back compared to AMD's Phenom II architecture. Many of these tradeoffs were made in order to keep die size in check while adding new features (e.g. wider front end, larger queues/data structures, new instruction support). Everywhere from the Bulldozer front end through the execution clusters, AMD's opportunity to increase performance depends on both efficiency and clock speed. Bulldozer has to make better use of its resources than Phenom II as well as run at higher frequencies to outperform its predecessor. As a result, a major target for Bulldozer was to be able to scale to higher clock speeds.

AMD's architects called this pursuit a low gate count per pipeline stage design. By reducing the number of gates per pipeline stage, you reduce the time spent in each stage and can increase the overall frequency of the processor. If this sounds familiar, it's because Intel used similar logic in the creation of the Pentium 4.
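To make the gates-per-stage argument concrete, here is a minimal sketch of a first-order timing model. Everything numeric in it is an assumption for illustration only: the function name `max_frequency_ghz`, the 12 ps gate delay, the 40 ps of latch overhead and the gate counts are hypothetical, not figures from AMD or Intel.

```python
def max_frequency_ghz(gates_per_stage, gate_delay_ps, latch_overhead_ps):
    """First-order estimate: the clock period must cover the slowest stage.

    The period is modeled as the logic delay of one stage plus a fixed
    per-stage overhead for the pipeline latches.
    """
    cycle_time_ps = gates_per_stage * gate_delay_ps + latch_overhead_ps
    return 1000.0 / cycle_time_ps  # a 1000 ps period corresponds to 1 GHz


# Hypothetical values, chosen only to show the trend.
GATE_DELAY_PS = 12.0
LATCH_OVERHEAD_PS = 40.0

for gates in (20, 15):
    freq = max_frequency_ghz(gates, GATE_DELAY_PS, LATCH_OVERHEAD_PS)
    print(f"{gates} gates per stage -> about {freq:.2f} GHz")
```

In this toy model the latch overhead does not shrink along with the logic, so each further cut in gates per stage buys a smaller frequency gain than the last.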

Where Bulldozer differs, AMD insists, is that the design didn't aggressively pursue frequency like the P4, but rather aggressively pursued gate count reduction per stage. According to AMD, the former results in power problems while the latter is more manageable.

AMD's target for Bulldozer was a 30% higher frequency than the previous generation architecture. Unfortunately that's a fairly vague statement and I couldn't get AMD to commit to anything more specific, but if we look at the top-end Phenom II X6 at 3.3GHz, a 30% increase in frequency would put Bulldozer at 4.3GHz.

Unfortunately 4.3GHz isn't what the top-end AMD FX CPU ships at. The best we'll get at launch is 3.6GHz, a meager 9% increase over the outgoing architecture. Turbo Core does get AMD close to those initial frequency targets, but the turbo frequencies are typically only seen for very short periods of time.
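The clock speed arithmetic above can be checked in a few lines. This is only a sketch of the calculation; the variable names are mine, and the inputs are the 3.3GHz Phenom II X6 clock, the 30% target and the 3.6GHz launch clock quoted in the text.

```python
PHENOM_II_X6_GHZ = 3.3      # top-end Phenom II X6 clock from the text
TARGET_INCREASE = 0.30      # AMD's stated frequency goal
FX_LAUNCH_GHZ = 3.6         # top-end FX base clock at launch

# What a 30% frequency increase over Phenom II would look like.
target_ghz = PHENOM_II_X6_GHZ * (1 + TARGET_INCREASE)

# What the launch part actually delivers over Phenom II.
actual_increase = FX_LAUNCH_GHZ / PHENOM_II_X6_GHZ - 1

print(f"Target clock: {target_ghz:.2f} GHz")
print(f"Actual increase at launch: {actual_increase:.1%}")
```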

As you may remember from the Pentium 4 days, a significantly deeper pipeline can bring with it significant penalties. We have two prior examples of architectures that increased pipeline length over their predecessors: Willamette and Prescott.

Willamette doubled the pipeline length of the P6 and was meant to make up for it with a corresponding increase in clock frequency. If you do less per clock cycle, you need to throw more clock cycles at the problem to have a neutral impact on performance. Although Willamette ran at higher clock speeds than the outgoing P6 architecture, the increase in frequency was gated by process technology. It wasn't until Northwood arrived that Intel could hit the clock speeds required to truly put distance between its newest and older architectures.

Prescott lengthened the pipeline once more, this time quite significantly. Much to our surprise however, thanks to a lot of clever work on the architecture side Intel was able to keep average instructions executed per clock constant while increasing the length of the pipe. This enabled Prescott to hit higher frequencies and deliver more performance at the same time, without starting at an inherent disadvantage. Where Prescott did fall short however was in the power consumption department. Running at extremely high frequencies required very high voltages and as a result, power consumption skyrocketed.

AMD's goal with Bulldozer was to have IPC remain constant compared to its predecessor, while increasing frequency, similar to Prescott. If IPC can remain constant, any frequency increases will translate into performance advantages. AMD attempted to do this through a wider front end, larger data structures within the chip and a wider execution path through each core. In many senses it succeeded, however single threaded performance still took a hit compared to Phenom II:

 

Cinebench 11.5 - Single Threaded

At the same clock speed, Phenom II is almost 7% faster per core than Bulldozer according to our Cinebench results. This takes into account all of the aforementioned IPC improvements. Despite AMD's efforts, IPC went down.

A slight reduction in IPC however is easily made up for by an increase in operating frequency. Unfortunately, it doesn't appear that AMD was able to hit the clock targets it needed for Bulldozer this time around.
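As a rough illustration of how frequency can offset the IPC deficit, the sketch below treats single threaded performance as per-clock throughput times clock speed. The assumptions are mine: it takes the "almost 7%" Cinebench gap as a flat 7%, assumes performance scales linearly with frequency, and the function name `bulldozer_relative_performance` is hypothetical. It describes this one benchmark, not Bulldozer in general.

```python
PHENOM_II_GHZ = 3.3
# Phenom II's per-clock advantage in Cinebench 11.5 single threaded,
# rounding the article's "almost 7%" up to 7% (an assumption).
PHENOM_II_PER_CLOCK_ADVANTAGE = 1.07


def bulldozer_relative_performance(bulldozer_ghz):
    """Bulldozer performance relative to a 3.3GHz Phenom II (1.0 = parity)."""
    return (bulldozer_ghz / PHENOM_II_GHZ) / PHENOM_II_PER_CLOCK_ADVANTAGE


# Clock Bulldozer needs just to match Phenom II under this model.
break_even_ghz = PHENOM_II_GHZ * PHENOM_II_PER_CLOCK_ADVANTAGE
print(f"Break-even clock: {break_even_ghz:.2f} GHz")

for ghz in (3.6, 4.3):
    rel = bulldozer_relative_performance(ghz)
    print(f"At {ghz} GHz: {rel - 1:+.1%} vs. Phenom II at 3.3 GHz")
```

Under these assumptions the break-even point is roughly 3.53GHz, so the 3.6GHz launch clock leaves only about a 2% single threaded lead, while the 4.3GHz target would have been worth around 22%.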

We've recently reported on GlobalFoundries' issues with 32nm yields. I can't help but wonder if the same type of issues that are impacting Llano today are also holding Bulldozer back.


430 Comments


  • AmdInside - Wednesday, October 12, 2011

    Their roadmap is aggressive, but when was the last time AMD came close to meeting their schedule? Not going to happen. But I do hope that they do, for consumers' sake.
  • Eagle70ss - Wednesday, October 12, 2011

    AMD really bent over and grabbed their ankles....I'm just wondering why it took so long to release douche-dozer...I was really hoping they would have a good part this time...Will Intel stand alone as the sole quality CPU maker?? Only time will tell, but it looks to be so....
  • silverblue - Wednesday, October 12, 2011

    I must say, I did expect this. That price drop wasn't exactly a giveaway, was it? Single threaded performance is generally poor and there really is something wrong with the caching. I simply refuse to believe a lack of BIOS optimisations is at fault for any of this... and blaming Windows 7 for not truly understanding Bulldozer's idiosyncrasies? Come off it; Windows 8 won't even be around when Piledriver appears, and we'll have to wait to see the second generation of this particular microarchitecture performing more like it "should". Bringing back the FX moniker certainly attracted attention, however if by doing so they wanted to remind us of the fact that the FX-51 was a server CPU, they've succeeded, if only on that basis, as the FX was king of all and not just in select benchmarks as the P4 tended to be.

    I can't wait for Johan's server review; I just want to see if this thing really does well in its natural habitat. It's got to have a success somewhere. Thankfully, I can see far more optimism in this area. Incidentally, I was expecting Bulldozer to be able to work on eight 128-bit FP instructions per clock as opposed to 6 with Thuban, so obviously I got my wires crossed on that one.

    You can't argue that Bulldozer hasn't a lot of promise, but at the same time, you can't argue that AMD haven't been trying to perform damage limitation on an already faulty product.
  • arjuna1 - Wednesday, October 12, 2011

    Nobody, and I mean nobody at all, expected Bulldozer to reach SB-like performance, and obviously nobody expected sub-Phenom II performance in certain applications either, but almost everything promised has been delivered, at lower prices than Intel, the way AMD has always done it. Quoting the article:
    "In many ways, where Bulldozer is a clear win is where AMD has always done well in: heavily threaded applications. If you're predominantly running well threaded workloads, Bulldozer will typically give you performance somewhere around or above Intel's 2500K."

    PS
    wolfman3k5, stop your Intel shilling, it almost looks as if Intel were paying you by the hour.
  • wolfman3k5 - Wednesday, October 12, 2011

    I get $22.50 per hour from Intel plus tips. I also get a $50.00 bonus if I surpass 1000 comments / posts per day. Between 3:00AM and 7:00AM I get $25.85 per hour. I make good money writing nice things about Intel. What do you do?
  • g101 - Wednesday, October 12, 2011

    What's surprising is that you apparently think that's "good money".

    Guess what, you little dumbshit kid, profit savvy professionals will still be running AMD. I couldn't care less about your shitty lightly threaded games and optimized synthetic benchmarks.

    Stupid children using their computers for play.
  • silverblue - Friday, October 14, 2011

    You need to bear in mind that a) AMD reintroduced the FX brand just for Zambezi, and b) JF-AMD actually started a thread entitled The Bulldozer Blog Is Live! on www.overclock.net. Regardless of whether John Fruehe is a server-focused guy or not, the point is that he and AMD both targeted the client side in terms of i) overclockers and ii) gamers. I might be wrong, but that's how I see it. Yes, he didn't come out and say directly that Zambezi would be a great gaming solution, but he DID say that IPC would be an improvement over their past products. Now that the reviews are out, he's nowhere to be seen, barring the odd login to do who-knows-what. Does overclock.net have any leaning towards the server market in any way?

    If Zambezi's poor performance is partly down to using faulty ASUS boards/anything less than 1866MHz RAM/an L1 cache bug/some weird hardware combinations/WHATEVER, I'm sure we'll find out in time, but regardless, it's going to be harming non-gaming workloads as well, so it's important to people like you as well.
  • silverblue - Friday, October 14, 2011

    Just thought I'd say that I've been a bit harsh to JF there. Out of all the AMD people who could've come along to have a chat, he was definitely the bravest. It was on his free time, and he's probably getting copious amounts of hate messages just for being an AMD rep.
  • Proxicon - Wednesday, October 12, 2011

    I stayed up all night to read this review....

    I guess the prices on 2600k won't be going down anytime soon. I had already built my complete system in my head. Then the reviews came..

    I kind of figured that if AMD was firing people and resignations were being handed in before a major launch, it wasn't going to be good. Also, no early release of benchmarks. That in itself was suspect. If they really had such a great processor, then why all the secrecy? I was hoping it was an Apple play. Boy, was I wrong.

    You guys buy the "faildozer" and help keep the prices of the 2600K low. I'll be looking for a 2600K....
  • 3DVagabond - Wednesday, October 12, 2011

    I'm not an expert, but Bulldozer seems to be a server chip pressed into desktop service. Designed for highly threaded workloads (and also designed to have even more cores than 8), many consumer tasks just aren't its forte. While it isn't competitive in single thread performance, if you use highly threaded workloads enough and aren't afraid to O/C to boost the single core performance, Bulldozer can be the better chip. That is if the price is right. The 8120 might be an awesome value in this scenario. We'll have to wait for reviews to be sure.

    One question, please. When you O/C'd the 8150, did you only use stock cooling? From the review it sounded like you did, but instead of saying so clearly, you said it wouldn't do 5GHz on "air" (I believe that was the statement? Feel free to flame me if I'm wrong. :D). So, to be clear, would it not do 5GHz on air with a top notch cooler, or did you only try the stock cooler?

    Thanks.
