Branch Prediction and a Deeper Pipeline

Bulldozer will use a deeper pipeline with less logic per stage compared to current Phenom II/Opteron processors. AMD argues that this will ensure clock speed won’t be a problem with the design and we should expect to see Bulldozer based products at similar if not higher clock speeds than what we have today with Phenom II.

With a deeper pipe, branch prediction becomes more important and Bulldozer has a significant change in the way branch prediction works.

In Phenom II, the branch prediction and instruction fetch logic are run in lockstep - when one stalls, the other also stalls. Branches are predicted as they are encountered. If the fetch logic grabs an x86 branch instruction, the prediction logic works in parallel to predict the likely target of that branch. However if the branch is incorrectly predicted, subsequent branches aren’t predicted until the current mispredict is correctly resolved. As a result, the fetch logic and prefetchers can’t work and potential performance is lost.

In Bulldozer the branch prediction and fetch logic are decoupled. The predictor now produces a queue of future fetch addresses. Even if there’s a mispredict the branch predictor can continue to fill its prediction queue with targets. The fetch logic can then check this queue of addresses against what’s in the instruction cache to avoid future misses in L1.


With Phenom AMD implemented comparable prefetching logic to what Intel did with Core. In Bulldozer, AMD is ramping up the aggressiveness of those prefetchers. There are independent prefetchers at both the L1 and L2 levels that support larger numbers of strides and large stride sizes (both compared to what exists in current AMD architectures). There’s also a non-strided data prefetcher that looks at correlated cache misses and uses that data to prefetch into the caches.

AMD unfortunately didn’t go into more detail on its prefetchers other than to promise that they are much more aggressive than what we have today. Aggressive prefetching usually means there’s a good amount of memory bandwidth available so I’m wondering if we’ll see Bulldozer adopt a 3 - 4 channel DDR3 memory controller in high end configurations similar to what we have today with Gulftown.

Power Gating & Real Turbo Mode

Each Bulldozer module in a processor can be clocked and power gated independently. This has two implications. You can now power off cores (in sets of two) that aren’t in use and save tons of idle power. You can also use the power savings to drive up the frequency of other cores in a Bulldozer CPU. With Bulldozer, AMD should have something functionally equivalent to Intel’s Turbo Boost modes. Since clock speed and power gating is controlled at the module level and not the core level there will still be some differences between the two but this should be much better than AMD’s current Core Turbo technology.

There’s of course extensive clock gating around the chip, but obviously the big change is power gating which AMD hasn’t had up to this point (Bobcat is also power gated).

Performance and Availability

While Bobcat is going to be in production in Q4 of this year, with system availability in Q1 of 2011 - Bulldozer is still a 2011 project and AMD isn’t giving any guidance as to when in 2011.

Parts are already back and in AMD’s labs but we have no indication of performance or rollout schedule. Given Bobcat’s schedule, I’d say that the first Bulldozer CPUs will be out no earlier than Q2 2011 and AMD’s unwillingness to specify what half of the year would imply that it’ll be a late Q2/early Q3 launch.

The first Bulldozer parts will be server focused, with high end desktop CPUs following but still in 2011.

A Real Redesign Final Words


View All Comments

  • SuperiorSpecimen - Tuesday, August 24, 2010 - link

    Let's see some competition outside of the price game! Reply
  • mrmojo1 - Tuesday, August 24, 2010 - link

    Awesome article, can't wait to see their release :) Should be very interesting! Reply
  • crawmm - Tuesday, August 24, 2010 - link

    I drooled on my laptop reading this. Thank you, Anand. Good overview. And fun reading after a day of tedious (and mindless) work. Reply
  • lothar98 - Tuesday, August 24, 2010 - link

    "In many ways the architecture looks to be on-par with what Intel has done with Nehalem/Westmere."

    I truly hope that this does not end up to be how things roll out. It has been far too long since we have seen good competition throughout the range of consumer CPU lineup. Currently we have options and competition in the mid-low end giving us exceptional bang for our buck. While one would never say you can get the best bang for your buck in the mid or high end everyone can still appreciate having options as well as getting value.
  • Freddo - Tuesday, August 24, 2010 - link

    Bobcat seems very interesting to me, I hope it won't take long until we see a good netbook with it, with good build quality (metal, no plastic toy), a HDMI port and 2GB RAM. Reply
  • Mike1111 - Tuesday, August 24, 2010 - link

    I'm wondering: what about AMD powered notebooks? And I don't mean netbooks or CULV notebooks. Looks like bulldozer won't come to notebooks until 2012, which would mean that AMD would most likely have to compete with Intel's 22nm Sandy Bridge successor, Ivy Bridge. Reply
  • Penti - Tuesday, August 24, 2010 - link

    Llano APU, it's briefly mentioned. It's where we're at. Basically K10-based 4-core with integrated DX11 GPU. Better then today but not much of a competition. Reply
  • mino - Tuesday, August 24, 2010 - link

    The GPU in the is supposed to be at least 5x the speed of current IGP performance.

    Basically you get a "discrete" GPU for a price of IGP ...
  • MonkeyPaw - Tuesday, August 24, 2010 - link

    I can see Bobcat scaling upward in notebooks. It's multi-core capable, and is a fully-functional CPU. A quad core Bobcat with better-than-Intel graphics should be a very fulfilling product for notebooks in the mid-range, while providing good battery life (thank you, power gating). Anything above that could be handled by low-voltage Bulldozers as a premium offering. To me, that seems like a better solution than Intel's, where the Atom to Core increase is so severe. Reply
  • Kiijibari - Tuesday, August 24, 2010 - link

    Ehh guys ...

    MMX is depracated in 64bit mode together with x87 and 3Dnow!:

    The x87, MMX, and 3DNow! instruction sets are deprecated in 64-bit modes. The instructions sets are still present for backward compatibility for 32-bit mode; however, to avoid compatibility issues in the future, their use in current and future projects is discouraged.

    Why on Earth should AMD build in 2 special MMX pipes in a brand new µarchitecture ?

    AMD just announced that they got rid of 3Dnow!, MMX pipes make no sense at all.

    You probably mean XOP, dont you ?

Log in

Don't have an account? Sign up now