The Impact of Bulldozer's Pipeline

With a new branch prediction architecture and an unknown, but presumably significantly deeper pipline, I was eager to find out just how much of a burden AMD's quest for frequency had placed on Bulldozer. To do so I turned to the trusty N-Queens solver, now baked into the AIDA64 benchmark suite.

The N-Queens problem is simple. On an N x N chessboard, how do you place N queens so they cannot attack one another? Solving the problem is incredibly branch intensive, and as a result it serves as a great measure of the impact of a deeper pipeline.

The AIDA64 implementation of the N-Queens algorithm is heavily threaded, but I wanted to first get a look at single-core performance so I disabled all but a single integer/fp core on Bulldozer, as well as the competing processors. I also looked at constant frequency as well as turbo enabled speeds:

Single Core Branch Predictor Performance—AIDA64 Queens Benchmark

Unfortunately things don't look good. Even with turbo enabled, the 3.6GHz Bulldozer part needs another 25% higher frequency to equal a 3.6GHz Phenom II X4. Even a 3.3GHz Phenom II X6 does better here. Without being fully aware of the optimizations at work in AIDA64 I wouldn't put too much focus on Sandy Bridge's performance here, but Intel is widely known for focusing on branch prediction performance.

If we let the N-Queens benchmark scale to all available threads, the performance issues are easily masked by throwing more threads at the problem:

SMP Branch Predictor Performance—AIDA64 Queens Benchmark

However it is quite clear that for single or lightly threaded operations that are branch heavy, Bulldozer will be in for a fight.

Power Management and Real Turbo Core Cache and Memory Performance
Comments Locked

430 Comments

View All Comments

  • THizzle7XU - Wednesday, October 12, 2011 - link

    Well, why would you target the variable PC segment when you can program for a well established, large user-base platform with a single configuration and make a ton more money with probably far less QA work since there's only one set (two for multi-platform PS3 games) of hardware to test?

    And it's not like 360/PS3 games suddenly look like crap 5-6 years into their cycles. Think about how good PS2 games looked 7 years into that system's life cycle (God of War 2). Devs are just now getting the most of of the hardware. It's a great time to be playing games on 360/PS3 (and PC!).
  • GatorLord - Wednesday, October 12, 2011 - link

    Consider what AMD is and what AMD isn't and where computing is headed and this chip is really beginning to make sense. While these benches seem frustrating to those of us on a desktop today I think a slightly deeper dive shows that there is a whole world of hope here...with these chips, not something later.

    I dug into the deal with Cray and Oak Ridge, and Cray is selling ORNL massively powerful computers (think petaflops) using Bulldozer CPUs controlling Nvidia Tesla GPUs which perform the bulk of the processing. The GPUs do vastly more and faster FPU calculations and the CPU is vastly better at dishing out the grunt work and processing the results for use by humans or software or other hardware. This is the future of High Performance Computing, today, but on a government scale. OK, so what? I'm a client user.

    Here's what: AMD is actually best at making GPUs...no question. They have been in the GPGPU space as long as Nvidia...except the AMD engineers can collaborate on both CPU and GPU projects simultaneously without a bunch of awkward NDAs and antitrust BS getting in the way. That means that while they obviously can turn humble server chips into supercomputers by harnessing the many cores on a graphics card, how much more than we've seen is possible on our lowly desktops when this rebranded server chip enslaves the Ferraris on the PCI bus next door...the GPUs.

    I get it...it makes perfect sense now. Don't waste real estate on FPU dies when the one's next door are hundreds or thousands of times better and faster too. This is not the beginning of the end of AMD, but the end of the beginning (to shamlessely quote Churchill). Now all that cryptic talk about a supercomputer in your tablet makes sense...think Llano with a so-so CPU and a big GPU on the same die with some code tweaks to schedule the GPU as a massive FPU and the picture starts taking shape.

    Now imagine a full blown server chip (BD) harnessing full blown GPUs...Radeon 6XXX or 7XXX and we are talking about performance improvements in the orders of magnitude, not percentage points. Is AMD crazy? I'm thinking crazy like a fox.

    Oh..as a disclaimer, while I'm long AMD...I'm just an enthusiast like the rest of you and not a shill...I want both companies to make fast chips that I can use to do Monte Carlos and linear regressions...it just looks like AMD seems to have figured out how to play the hand they're holding for change...here's to the future for us all.
  • Menoetios - Wednesday, October 12, 2011 - link

    I think you bring up a very good point here. This chip looks like it's designed to be very closely paired with a highly programmable GPU, which is where the GPU roadmaps are leading over the next year and a half. While the apples-to-apples nature of this review draw a disappointing picture, I'm very curious how AMD's "Fusion" products next year will look, as the various compute elements of the CPU and GPU become more tightly integrated. Bulldozer appears to fit perfectly in an ecosystem that we don't quite have yet.
  • GatorLord - Wednesday, October 12, 2011 - link

    Exactly. Ecosystem...I like it. This is what it must feel like to pick up a flashlight at the entrance to the tunnel when all you're used to is clubs and torches. Until you find the switch, it just seems worse at either...then viola!
  • actionjksn - Wednesday, October 12, 2011 - link

    Wow I hope that made you feel better about the crappy chip also known a "Man With A Shovel"
    I was just hoping AMD would quit forcing Intel to have to keep on crippling their chips, just to keep them from putting AMD out of business. AMD better fix this abortion quick, this is getting old.
  • GatorLord - Thursday, October 13, 2011 - link

    Feeling fine. Not as good in the short run, but feeling better about the long run. Unfortunately, due to constraints, it takes AMD too long to get stuff dialed in and by the time they do, Intel has already made an end run and beat them to the punch.

    Intel can do that, they're 40x as big as AMD. Actually, and this may sound crazy until you digest it, the smartest thing Intel could do is spin off a couple of really good dev labs as competitors. Relying on AMD to drive your competition is risky in that AMD may not be able to innovate fast enough to push Intel where it could be if they had more and better sharks in the water nipping at their tails.

    You really need eight or more highly capable, highly aggressive competitors to create a fully functioning market free of monopolistic and oligopolistic sluggishness and BS hand signalling between them. This space is too capital intensive for that at the time being with the current chip making technology what it is.
  • yankeeDDL - Wednesday, October 12, 2011 - link

    Just to be the devil's advocate ...
    The launch event in London sported 2 PC, side by side, running Cinebench.
    One had the core i5-2500k, the other the FX8150.
    Of course, these systems are prepared by AMD, so the results from Anand are clearly more reliable (at least all the conditions are documented).
    Nevertheless, it is clear that in the demo from AMD, the FX runs faster. Not by a lot, but it is clearly faster than the i5.
    Video: http://www.viddler.com/explore/engadget/videos/335...

    Even so, assuming that this was a valid datapoint, things won't change too much: the i5-2500k is cheaper and (would be) slightly slower than the FX8150 in the most heavily threaded benchmark. But it would be slightly better than Anand's results show.
  • KamikaZeeFu - Wednesday, October 12, 2011 - link

    "Nevertheless, it is clear that in the demo from AMD, the FX runs faster. Not by a lot, but it is clearly faster than the i5."

    Check the review, cinebench r11.5 multithreaded chart.
    Anand's numbers mirror the ones by AMD. Multithreaded workloads are the only case where the 8150 will outperform an i5 2500k because it can process twice the amount of threads.

    Really disappointed in AMD here, but I expected subpar performance because it was eerily quiet about the FX line as far as performance went.

    Desktop BD is a full failure, they were aiming for high clock speeds and made sacrifices, but still failed their objective. By the time their process is mature and 4 GHz dozers hit the channel, Ivy bridge will be out.

    As far as server performance goes, not even sure they will succeed there.
    As seen in the review, clock for clock performance isn't up compared to the prvious generation, and in some cases it's actually slower. Considering that servers run at lower clocks in the first place, I don't see BD being any threat to intels server lineup.

    4 years to develop this chip, and their motto seemed to be "we'll do netburst but in not-fail"
  • medi01 - Wednesday, October 12, 2011 - link

    So CPU is a bottleneck in your games eh?
  • TekDemon - Wednesday, October 12, 2011 - link

    It's not but people don't buy CPUs for today's games, generally you want your system to be future proof so the more extra headroom there is in these CPU benchmarks the better it holds up over the long term. Look back at CPU benchmarks from 3-4 years ago and you'll see that the CPUs that barely passed muster back then easily bottleneck you whereas CPUs that had extra headroom are still usable for gaming. For example the Core 2 Duo E8400 or E8500 is still a very capable gaming CPU, especially when given a mild overclock and frankly in games that only use a few threads (like Starcraft 2) it gives Bulldozer a run for the money.
    I'm not a fanboy either way since I own that E8400 as well as a Phenom II (unlocked to X4, OC'ed to 3.9Ghz) and a i5 2500K but if I was building a new system I sure as heck would want extra headroom for future-proofing.
    That said? Of course these chips will be more than enough power for general use. They're just not going to be good for high end systems. But in a general use situation the problem is that the power consumption is just crappy compared to the intel solutions, even if you can argue that it's more than enough power for most people why would you want to use more electricity?

Log in

Don't have an account? Sign up now