The Bulldozer Review: AMD FX-8150 Testedby Anand Lal Shimpi on October 12, 2011 1:27 AM EST
The Pursuit of Clock Speed
Thus far I have pointed out that a number of resources in Bulldozer have gone down in number compared to their abundance in AMD's Phenom II architecture. Many of these tradeoffs were made in order to keep die size in check while adding new features (e.g. wider front end, larger queues/data structures, new instruction support). Everywhere from the Bulldozer front-end through the execution clusters, AMD's opportunity to increase performance depends on both efficiency and clock speed. Bulldozer has to make better use of its resources than Phenom II as well as run at higher frequencies to outperform its predecessor. As a result, a major target for Bulldozer was to be able to scale to higher clock speeds.
AMD's architects called this pursuit a low gate count per pipeline stage design. By reducing the number of gates per pipeline stage, you reduce the time spent in each stage and can increase the overall frequency of the processor. If this sounds familiar, it's because Intel used similar logic in the creation of the Pentium 4.
Where Bulldozer is different is AMD insists the design didn't aggressively pursue frequency like the P4, but rather aggressively pursued gate count reduction per stage. According to AMD, the former results in power problems while the latter is more manageable.
AMD's target for Bulldozer was a 30% higher frequency than the previous generation architecture. Unfortunately that's a fairly vague statement and I couldn't get AMD to commit to anything more pronounced, but if we look at the top-end Phenom II X6 at 3.3GHz a 30% increase in frequency would put Bulldozer at 4.3GHz.
Unfortunately 4.3GHz isn't what the top-end AMD FX CPU ships at. The best we'll get at launch is 3.6GHz, a meager 9% increase over the outgoing architecture. Turbo Core does get AMD close to those initial frequency targets, however the turbo frequencies are only typically seen for very short periods of time.
As you may remember from the Pentium 4 days, a significantly deeper pipeline can bring with it significant penalties. We have two prior examples of architectures that increased pipeline length over their predecessors: Willamette and Prescott.
Willamette doubled the pipeline length of the P6 and it was due to make up for it by the corresponding increase in clock frequency. If you do less per clock cycle, you need to throw more clock cycles at the problem to have a neutral impact on performance. Although Willamette ran at higher clock speeds than the outgoing P6 architecture, the increase in frequency was gated by process technology. It wasn't until Northwood arrived that Intel could hit the clock speeds required to truly put distance between its newest and older architectures.
Prescott lengthened the pipeline once more, this time quite significantly. Much to our surprise however, thanks to a lot of clever work on the architecture side Intel was able to keep average instructions executed per clock constant while increasing the length of the pipe. This enabled Prescott to hit higher frequencies and deliver more performance at the same time, without starting at an inherent disadvantage. Where Prescott did fall short however was in the power consumption department. Running at extremely high frequencies required very high voltages and as a result, power consumption skyrocketed.
AMD's goal with Bulldozer was to have IPC remain constant compared to its predecessor, while increasing frequency, similar to Prescott. If IPC can remain constant, any frequency increases will translate into performance advantages. AMD attempted to do this through a wider front end, larger data structures within the chip and a wider execution path through each core. In many senses it succeeded, however single threaded performance still took a hit compared to Phenom II:
At the same clock speed, Phenom II is almost 7% faster per core than Bulldozer according to our Cinebench results. This takes into account all of the aforementioned IPC improvements. Despite AMD's efforts, IPC went down.
A slight reduction in IPC however is easily made up for by an increase in operating frequency. Unfortunately, it doesn't appear that AMD was able to hit the clock targets it needed for Bulldozer this time around.
We've recently reported on Global Foundries' issues with 32nm yields. I can't help but wonder if the same type of issues that are impacting Llano today are also holding Bulldozer back.
Post Your CommentPlease log in or sign up to comment.
View All Comments
Kristian Vättö - Wednesday, October 12, 2011 - linkI'm happy that I went with i5-2500K. Performance, especially in gaming, seems to be pretty horrible.
ckryan - Wednesday, October 12, 2011 - linkI was just going to say the same thing. I was all about AMD last year, but early this year I picked up an i5 2500K and was blown away by efficiency and performance even in a hobbled H67. Once I bought a proper P67, it was on. It's not that Bulldozer is terrible (because it isn't); Sandy Bridge is just a "phenom". If SB had just been a little faster than Lynnfield, it would still be fast. But it's a big leap to SB, and it's certainly the best value. AMD has Bulldozer, an inconsistent performer that is better in some areas and worse in others, but has a hard time competing with it's own forebearer. It's still an unusual product that some people will really benefit from and some wont. The demise of the Phenom II can't come soon enough for AMD as some people will look at the benchmarks and conclude that a super cheap X4 955BE is a much better value than BD. I hope it isn't seen that way, but it's not a difficult conclusion to reach. Perhaps BD is more forward looking, and the other octocore will be cheaper than the 8150 so it's a better value. I'd really like to see the performance of the 4- and 6- before making judgement.
It's still technically a win, but it's a Pyrrhic victory.
ogreslayer - Wednesday, October 12, 2011 - linkI tell friends that exact thing all the time. Phenoms are great CPUs but switch to Nehelam or Sandy Bridge and the speed is noticibly different. At equal clocks Core 2 Quads are as fast or faster.
Bulldozer ends up with a lot of issues fanboys refused to see even though Anandtech and other sites did bring it up in previews. I guess it was just hope and a understandable disbelief that AMD would be behind for a decade till the next architecture. We can start at clockspeed but only being dual-channel is not helping memory bandwidth. I don't think there is enough L3 and they most definitely should have a shortpipeline to crush through processes. They need an 1.4 to 1.6 in CBmarks or what is thhe point of the modules.
The module philosophy is probably close to the future of x86 but I imagine seeing Intel keeping HT enabled on the high-end SKUs. Also I think both of them want to switch FP calculation over to GPUs.
slickr - Wednesday, October 12, 2011 - linkYeah I agree. To me Bulldozer comes like 1 year late.
Its just not competitive enough and the fact that you have to make a sacrifice to single threaded performance for multithreaded when even the multithreaded isn't that good and looses to 2600K is just sad.
They needed to win big with Bulldozer and they failed hard!
retrospooty - Wednesday, October 12, 2011 - linkYa, it seems to be a pattern lately with the last few AMD architectures.
1. Hype up the CPU as the next big thing
2. Release is delayed
3. Once released, benchmarks are severely underwhelming
JasperJanssen - Wednesday, October 12, 2011 - link4. Immediately start hyping up the next release as the salvation of all.
GatorLord - Thursday, October 20, 2011 - linkIt looks to me like BD is the CPU beta bug sponge for Trinity and beyond. Everybody these days releases a beta before the money launch.
Hence the B3 stepping...and probably a few more now that a capable fab is onboard with TSMC. BD is not a CPU like we're used to...its an APU/HPC engine designed to drive code and a Cayman class GPU at 28nm and lots of GHz...I get it now.
Also, the whole massive cache and 2B transistors, 800M dedicated to I/O, thing (SB uses 995M total) finally makes sense when you realize that this chip was designed to pump many smaller GPGPU caches full of raw data to process and combine all the outputs quickly.
Apparently GPUs compute very fast, but have slow fetch latencies and the best way to overcome that is by having their caches continously and rapidly filled...like from the CPU with the big cache and I/O machine on the same chip...how smart..and convenient...and fast.
Can you say 'OpenCL'?
jleach1 - Friday, October 21, 2011 - linkI don't see how this can be considered an APU, This product isn't being marketed as a HPC proc., and i don't see the benefit of this architecture design in GPGPU environments at all.
It's sad...i've always given major kudos to AMD. Back in the days of the Athlon's prime, it was awesome to see david stomping goliath.
But AMD has dropped the ball continuously since then. Thuban was nice, but it might as well be considered a fluke, seeing as AMD took a worthy architecture (Thuban) and ditched it for what's widely considered as a joke.
And the phrase "AMD dropped the ball" is an understatement.
They've ultimately failed. They havent competed with Intel in years. They...have...failed. After thuban came out i was starting to think that the fact that they competed for years on price and clock speed alone was a fluke, and just a blip on the radar. Now i see it the opposite way...it seems that AMD merely puts out good processors every once in a while...and only on accident.
medi01 - Wednesday, October 12, 2011 - linkWell, if anand didn't badmouth AMD's GPU's on top of CPU's, we would see less "fanboys" complainging about anand's bias.
vol7ron - Wednesday, October 12, 2011 - linkBy badmouth do you mean objectively tell the truth? Do you blame PCMark or FutureMark for any of that? Perhaps if all the tests just said that AMD was clearly better, it wouldn't be badmouthing anymore.