The Impact of Bulldozer's Pipeline

With a new branch prediction architecture and an unknown but presumably significantly deeper pipeline, I was eager to find out just how much of a burden AMD's quest for frequency had placed on Bulldozer. To do so I turned to the trusty N-Queens solver, now baked into the AIDA64 benchmark suite.

The N-Queens problem is simple. On an N x N chessboard, how do you place N queens so they cannot attack one another? Solving the problem is incredibly branch intensive, and as a result it serves as a great measure of the impact of a deeper pipeline.
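To make the "branch intensive" point concrete, here is a minimal backtracking sketch of the problem in Python. It is not the AIDA64 implementation, whose internals and optimizations aren't known here; the function name `count_n_queens` and the set-based bookkeeping are illustrative assumptions. The thing to notice is the conflict check inside the loop: whether it passes depends entirely on the queens already placed, which is exactly the kind of data-dependent branch that is hard to predict.

```python
def count_n_queens(n: int) -> int:
    """Count the ways to place n non-attacking queens on an n x n board.

    Illustrative sketch only; not the solver used by AIDA64.
    """
    cols = set()       # columns that already hold a queen
    diag_down = set()  # occupied "\" diagonals, identified by row - col
    diag_up = set()    # occupied "/" diagonals, identified by row + col

    def place(row: int) -> int:
        # Every row has a queen: one complete solution found.
        if row == n:
            return 1
        found = 0
        for col in range(n):
            # Data-dependent branch: the outcome depends on all earlier
            # placements, so it follows no simple repeating pattern.
            if col in cols or (row - col) in diag_down or (row + col) in diag_up:
                continue
            # Place a queen, recurse into the next row, then undo (backtrack).
            cols.add(col)
            diag_down.add(row - col)
            diag_up.add(row + col)
            found += place(row + 1)
            cols.remove(col)
            diag_down.remove(row - col)
            diag_up.remove(row + col)
        return found

    return place(0)


if __name__ == "__main__":
    print(count_n_queens(8))
```

On a standard 8 x 8 board this counts the 92 known solutions. A mispredicted conflict check costs a pipeline flush, so the deeper the pipeline, the more each wrong guess hurts.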

The AIDA64 implementation of the N-Queens algorithm is heavily threaded, but I wanted to first get a look at single-core performance, so I disabled all but a single integer/FP core on Bulldozer, as well as on the competing processors. I also looked at constant-frequency as well as turbo-enabled speeds:

Single Core Branch Predictor Performance—AIDA64 Queens Benchmark

Unfortunately things don't look good. Even with turbo enabled, the 3.6GHz Bulldozer part needs another 25% higher frequency to equal a 3.6GHz Phenom II X4. Even a 3.3GHz Phenom II X6 does better here. Without being fully aware of the optimizations at work in AIDA64 I wouldn't put too much focus on Sandy Bridge's performance here, but Intel is widely known for focusing on branch prediction performance.

If we let the N-Queens benchmark scale to all available threads, the performance issues are easily masked by throwing more threads at the problem:

SMP Branch Predictor Performance—AIDA64 Queens Benchmark

However, it is quite clear that for single or lightly threaded operations that are branch heavy, Bulldozer will be in for a fight.


430 Comments


  • ThaHeretic - Saturday, October 15, 2011 - link

    Here's something for a compile test: build the Linux kernel. Something people actually care about.
  • Loki726 - Monday, October 31, 2011 - link

    The Linux kernel is more or less straight C with a little assembly; it is much easier on a compiler frontend and more likely to stress the backend optimizers and code generators.

    Chromium is much more representative of a modern C++ codebase. At least, it is more relevant to me.
  • nyran125 - Saturday, October 15, 2011 - link

    What's the point in having 8 cores, if it's not even as fast as an Intel 4 core and you get better performance overall with Intel.. Here's the BIG reality, the high end 8 core is not that much cheaper than a 2600K. Like $20-60 MAX. You'd be crazy to buy an 8 core for the same price as an Intel 2600K...

    LIKE MAD!!!
  • Fiontar - Saturday, October 15, 2011 - link

    Well, these numbers are pretty dismal all around. Maybe as the architecture and the process mature, this design will start to shine, but for the first generation, the results are very disappointing.

    As someone who is running a Phenom II X6 at a non-turbo core 4.0 Ghz, air cooled, I just don't see why I would want to upgrade. If I got lucky and got a BD overclock to 4.6 Ghz, I might get a single digit % increase in performance over my Phenom II X6, which is not worth the cost or effort.

    I guess on the plus side, my Phenom II was a good upgrade investment. Unless I'm tempted to upgrade to an Intel set up in the near future, I think I can expect to get another year or two from my Phenom II before I start to see upgrade options that make sense. (I usually wait to upgrade my CPU until I can expect about a 40% increase in performance over my current system at a reasonable price).

    I hope AMD is able to remain competitive with NVidia in the GPU space, because they just aren't making it in the CPU space.

    BTW, if the BD can reliably be overclocked to 4.5Ghz+, why are they only selling them at 3.3 Ghz? I'm guessing because the added power requirements then make them look bad on power consumption and performance per watt, which seems to be trumping pure performance as a goal for their CPU releases.
  • Fiontar - Saturday, October 15, 2011 - link

    A big thumbs down to Anand for not posting any of the over-clock benchmarks. He ran them, why not include them in the review?

    With the BD running at an air cooled 4.5 Ghz, or a water cooled 5.0 Ghz, both a significant boost over the default clock speed, the OC benchmarks are more important to a lot of enthusiasts than the base numbers. In the article you say you ran the benchmarks on the OC part, why didn't you include them in your charts? Or at least some details in the section of the article on the Over-clock? You tell us how high you managed to over-clock the BD and under what conditions, but you gave us zero input on the payoff!
  • Oscarcharliezulu - Saturday, October 15, 2011 - link

    ...was going to upgrade my old amd3 system to a BD, just a dev box, but I think a phenom x6 or 955 will be just fine. Bit sad too.
  • nhenk--1 - Sunday, October 16, 2011 - link

    I think Anand hit the nail on the head mentioning that clock frequency is the major limitation of this chip. AMD even stated that they were targeting a 30% frequency boost. A 30% frequency increase over a 3.2 GHz Phenom II (AM3 launch frequency I think) would be 4.2 GHz, 17% faster than the 3.6 GHz 8150.

    If AMD really did make this chip to scale linearly to frequency increases, and you add 17% performance to any of the benchmarks, BD would roughly match the i7. This was probably the initial intention at AMD. Instead the gigantic die, and limitations of 32nm geometries shot heat and power through the roof, and that extra 17% is simply out of reach.

    I am an AMD fan, but at this point we have to accept that we (consumers) are not a priority. AMD has been bleeding share in the server space where margins are high, and where this chip will probably do quite well. We bashed Barcelona at release too (I was still dumb enough to buy one), but it was a relative success in the server market.

    AMD needs to secure its spot in the server space if it wants to survive long term. 5 years from now we will all be connecting to iCloud with our ARM powered Macbook Vapor thin client laptops, and a server will do all of the processing for us. I will probably shed a tear when that happens, I like building PCs. Maybe I'll start building my own dedicated servers.

    The review looked fair to me, seems like Anand is trying very hard to be objective.
  • neotiger - Monday, October 17, 2011 - link

    "server space where margins are high, and where this chip will probably do quite well."

    I don't see how Bulldozer could possibly do well in the server space. Did you see the numbers on power consumption? Yikes.

    For servers, power consumption is far more important than it is in the consumer space. And BD draws about TWICE as much power as Sandy Bridge does while performing worse.

    BD is going to fail worse in the server space than it will in the consumer space.
  • silverblue - Monday, October 17, 2011 - link

    I'm not sure that I agree.

    For a start, you're far more likely to see heavily threaded workloads on servers than in the consumer space. Bulldozer does far better here than with lightly threaded workloads and even the 8150 often exceeds the i7-2600K under such conditions, so the potential is there for it to be a monster in the server space. Secondly, if Interlagos noticeably improves performance over Magny Cours, then the fact that you only need to pop the Interlagos CPU into your G34 system means this should be an easy upgrade. Finally, power consumption is only really an issue with Bulldozer when you're overclocking. Sure, Zambezi is a hungrier chip, but remember that it's got a hell of a lot more cache and execution hardware under the bonnet. Under the right circumstances, it should crush Thuban, though admittedly we expected more than just "under the right circumstances".

    I know very little about servers (obviously), however I am looking forward to Johan's review; it'd be good to see this thing perform to its strengths.
  • neotiger - Monday, October 17, 2011 - link

    First, in the server space BD isn't competing with i7-2600K. You have to remember that all the current Sandy Bridge i7s waste a big chunk of silicon real estate on GPU, which is useless in servers. In 3 weeks Intel is releasing the 6 core version of SB, essentially taking the transistors that have been used for GPU and turning them into 2 extra cores.

    Even in highly threaded workloads the 8150 performs at more or less the same level as the i7-2600K. In 3 weeks SB will increase threaded performance by 50% (going from 4 cores to 6). Once again the performance gap between SB and BD will be huge, in both single-threaded and multi-threaded workloads.

    Second, BD draws much higher power than SB even at stock frequency. This is borne out by the benchmark data in the article.
