The Impact of Bulldozer's Pipeline

With a new branch prediction architecture and an unknown, but presumably significantly deeper pipeline, I was eager to find out just how much of a burden AMD's quest for frequency had placed on Bulldozer. To do so I turned to the trusty N-Queens solver, now baked into the AIDA64 benchmark suite.

The N-Queens problem is simple. On an N x N chessboard, how do you place N queens so they cannot attack one another? Solving the problem is incredibly branch intensive, and as a result it serves as a great measure of the impact of a deeper pipeline.
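To see where all those branches come from, here is a minimal backtracking sketch in Python. It is not AIDA64's implementation, whose internals and optimizations aren't documented here, and the function name `count_n_queens` is made up for illustration. The point is the test inside the loop: whether a square is safe depends on every earlier placement, so the outcome of that branch is hard to predict.

```python
def count_n_queens(n: int) -> int:
    """Count the ways to place n mutually non-attacking queens on an n x n board."""
    cols = set()       # columns already occupied
    diag_down = set()  # occupied "\" diagonals, identified by row - col
    diag_up = set()    # occupied "/" diagonals, identified by row + col

    def place(row: int) -> int:
        # Every row has a queen: one complete solution found.
        if row == n:
            return 1
        total = 0
        for col in range(n):
            # Data-dependent branch: is this square attacked by an earlier queen?
            if col in cols or (row - col) in diag_down or (row + col) in diag_up:
                continue
            # Place the queen, recurse into the next row, then undo (backtrack).
            cols.add(col)
            diag_down.add(row - col)
            diag_up.add(row + col)
            total += place(row + 1)
            cols.remove(col)
            diag_down.remove(row - col)
            diag_up.remove(row + col)
        return total

    return place(0)


if __name__ == "__main__":
    print(count_n_queens(8))
```

On a standard 8 x 8 board this counts the well-known 92 solutions. A tuned benchmark would likely use bitmasks and a compiled language, but the control flow has the same shape.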

The AIDA64 implementation of the N-Queens algorithm is heavily threaded, but I wanted to first get a look at single-core performance, so I disabled all but a single integer/fp core on Bulldozer and did the same on the competing processors. I looked at both constant-frequency and turbo-enabled speeds:

Single Core Branch Predictor Performance—AIDA64 Queens Benchmark

Unfortunately, things don't look good. Even with turbo enabled, the 3.6GHz Bulldozer part would need another 25% in clock speed to equal a 3.6GHz Phenom II X4. Even a 3.3GHz Phenom II X6 does better here. Without being fully aware of the optimizations at work in AIDA64, I wouldn't put too much focus on Sandy Bridge's performance here, but Intel is widely known for focusing on branch prediction performance.

If we let the N-Queens benchmark scale to all available threads, the performance issues are easily masked by throwing more threads at the problem:

SMP Branch Predictor Performance—AIDA64 Queens Benchmark

However, it is quite clear that for single-threaded or lightly threaded operations that are branch heavy, Bulldozer will be in for a fight.
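Why does the threaded run scale so readily? N-Queens splits cleanly into independent subproblems: fix the first-row queen in each column and count the completions separately. The sketch below shows that partitioning in Python. The function names and the `workers` parameter are hypothetical, and this is an assumption about how such a solver can be divided, not a description of what AIDA64 actually does. Note that CPython threads won't run this CPU-bound work in parallel, so the code illustrates the division of work rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def count_with_first_queen(n: int, first_col: int) -> int:
    """Count solutions whose row-0 queen sits in column first_col."""

    def place(row, cols, diag_down, diag_up):
        if row == n:
            return 1
        total = 0
        for col in range(n):
            # Skip squares attacked by any queen placed in an earlier row.
            if col in cols or (row - col) in diag_down or (row + col) in diag_up:
                continue
            # Pass fresh sets down so subproblems share no mutable state.
            total += place(
                row + 1,
                cols | {col},
                diag_down | {row - col},
                diag_up | {row + col},
            )
        return total

    # Row 0 is fixed at first_col; the search starts from row 1.
    return place(1, {first_col}, {-first_col}, {first_col})


def count_n_queens_partitioned(n: int, workers: int = 4) -> int:
    """Sum the independent per-column counts, one task per first-row column."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda col: count_with_first_queen(n, col), range(n)))


if __name__ == "__main__":
    print(count_n_queens_partitioned(8))
```

Because no task depends on another's result, adding cores adds throughput almost for free, which is how a weak single-threaded showing can be masked in the SMP chart.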


430 Comments


  • Hrel - Wednesday, October 12, 2011 - link

    yes, I use it as a term that means being cheap. My friends often call me jewish cause I hunt for bargains pretty relentlessly.
  • silverblue - Friday, October 14, 2011 - link

    I get called Scottish for the same thing. ;)
  • poohbear - Wednesday, October 12, 2011 - link

    So disappointing to read this. What on earth were they doing all this time?? AMD's NEW CPU can't even outperform its OLD CPU? Well, at least I can stick with my Phenom II X6 till Ivy Bridge comes out, and thank goodness I didn't buy a pricey AM3+ board before reading reviews. :p So sad to see AMD has come to this...
  • OutsideLoopComputers - Wednesday, October 12, 2011 - link

    I think when Anand publishes benchmarks with a couple of Bulldozers working together in a dual or quad-socket board (Opteron), THEN we will see why AMD designed it the way they did. If the FX achieves parity and sometimes superiority in heavily multithreaded apps vs Sandy Bridge in a single socket, then imagine how two or four of these working together will do in server applications vs Sandy Bridge Xeon. I'll bet we see superiority in most server disciplines.

    I don't think this silicon was designed to go after Intel desktop processors, but to compete directly with dual and quad socket Xeon.

    It's intended to be an Opteron right now and, as an afterthought, to be sold as an FX desktop single socket part, to bridge the gap between A-series and Opteron.
  • JohanAnandtech - Wednesday, October 12, 2011 - link

    Indeed. The market for high-end desktop parts is very small, with low margins, and shrinking! The mobile market is growing, so AMD A6 and A8 CPUs make a lot more sense.

    The server market keeps growing, and the profit margins are excellent because a large percentage of the market wants high-end parts (compare that to the desktop market, where almost everyone wants midrange and budget parts). The Zip and encryption benchmarks show that Bulldozer is definitely not a complete failure. We'll see :-)
  • g101 - Wednesday, October 12, 2011 - link

    Good to see an intelligent reviewer that knows how to do more than run synthetic benchmarks and games.

    It's funny seeing all the uneducated gamer "complete failure" comments.
  • bassbeast - Thursday, February 9, 2012 - link

    I'm sorry but you are wrong sir and here is why: They are marketing this chip at the CONSUMER and NOT the server, which makes it a total Faildozer.

    If they would have kept P2 for the consumer and kept BD for the Opteron then you sir would have been 100% correct, but by killing their P2 they have just admitted they are out of the desktop CPU business and for a company that small that is a seriously DUMB move. Their Athlon and P2 have been the "go to" chip for many of us system builders because it gave "good enough" performance in the apps that people use, but Faildozer is a hot pig of a chip that is worse for consumer loads in every. single. way. over the P2.

    I'm just glad i bought an X6 when i did, but when i can no longer get the P2 and Athlon II for new builds i'll be switching to intel, the BD simply is worthless for the consumer market and NEVER should have been marketed to it in the first place! so please get off your high horse and admit the truth, the BD chip should have never been sold for anything but servers.
  • haplo602 - Wednesday, October 12, 2011 - link

    This is a server CPU abused for the desktop.

    Have a look at FPU performance. Almost clock for clock (3.3G vs 3.6G) it beats 6 FPU units in Phenom X6. That's quite nice.

    Once they do some optimisations on a mature process, this will achieve SB performance levels. However until then I am going for 2389 optys ....
  • GourdFreeMan - Wednesday, October 12, 2011 - link

    You introduce the fact that AMD lengthened the pipeline in transitioning to Bulldozer without explicitly mentioning the pipeline length. How many stages exactly is Bulldozer's pipeline?
  • duploxxx - Wednesday, October 12, 2011 - link

    Well, there clearly seems to be something wrong with the usage of the modules in combination with the way too high latency on every cache and on memory. Single-threaded performance is hit by that, and so is gaming performance.

    So I hope anandtech can have a clear look at the following thread and continue to seek further:
    http://www.xtremesystems.org/forums/showthread.php...

    Secondly, during OC, just like with the previous gen, do something more with NB OC instead of just upping the GHz. There is more to an architecture than just the GHz....
