Branch Prediction and a Deeper Pipeline

Bulldozer will use a deeper pipeline with less logic per stage than current Phenom II/Opteron processors. AMD argues that this will keep clock speed from being a problem for the design, and we should expect Bulldozer-based products at similar if not higher clock speeds than today’s Phenom II.

With a deeper pipe, branch prediction becomes more important because each mispredict throws away more in-flight work, and Bulldozer significantly changes the way branch prediction works.

In Phenom II, the branch prediction and instruction fetch logic run in lockstep: when one stalls, the other stalls too. Branches are predicted as they are encountered. If the fetch logic grabs an x86 branch instruction, the prediction logic works in parallel to predict the likely target of that branch. However, if the branch is mispredicted, subsequent branches aren’t predicted until the current mispredict is resolved. In the meantime the fetch logic and prefetchers can’t do useful work, and potential performance is lost.

In Bulldozer, the branch prediction and fetch logic are decoupled. The predictor now produces a queue of future fetch addresses. Even if there’s a mispredict, the branch predictor can continue to fill its prediction queue with targets. The fetch logic can then check this queue of addresses against what’s in the instruction cache to avoid future misses in L1.
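
To make the decoupled arrangement concrete, here is a minimal toy sketch in Python. It is not AMD's implementation: the 64-byte line size, the addresses, and the function name `lines_to_prefetch` are all assumptions chosen for illustration. It only shows the idea of checking queued fetch addresses against the instruction cache so that missing lines can be requested early.

```python
from collections import deque

LINE_BYTES = 64  # assumed cache line size, for illustration only


def lines_to_prefetch(prediction_queue, icache_lines):
    """Return line addresses named in the queue that are not yet cached.

    prediction_queue: future fetch addresses produced by the predictor.
    icache_lines: set of line-aligned addresses resident in the toy I-cache.
    """
    missing = []
    seen = set()
    for addr in prediction_queue:
        # Align the fetch address down to the start of its cache line.
        line = addr // LINE_BYTES * LINE_BYTES
        if line not in icache_lines and line not in seen:
            seen.add(line)
            missing.append(line)
    return missing


# The predictor runs ahead of fetch and queues future fetch addresses.
prediction_queue = deque([0x1000, 0x1010, 0x2040, 0x2048, 0x3000])
# Lines currently resident in the toy L1 instruction cache.
icache_lines = {0x1000, 0x3000}

to_prefetch = lines_to_prefetch(prediction_queue, icache_lines)
print([hex(a) for a in to_prefetch])
```

In this toy run only one line is absent from the cache, so that is the only one the fetch logic would need to request ahead of time.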

Prefetchers

With Phenom, AMD implemented prefetching logic comparable to what Intel did with Core. In Bulldozer, AMD is ramping up the aggressiveness of those prefetchers. There are independent prefetchers at both the L1 and L2 levels that support larger numbers of strides and larger stride sizes (both compared to what exists in current AMD architectures). There’s also a non-strided data prefetcher that looks at correlated cache misses and uses that data to prefetch into the caches.
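
For readers unfamiliar with strided prefetching, the following is a minimal sketch of the general technique, assuming a detector that looks at the last three access addresses. AMD has not disclosed how its prefetchers work, so the function name `predict_strided`, the three-access rule, and the `depth` parameter are hypothetical. A "more aggressive" prefetcher in this model would simply use a larger `depth` or track more independent streams.

```python
def predict_strided(addresses, depth=2):
    """Predict the next `depth` addresses if recent accesses share a stride.

    Returns an empty list when the last three addresses do not form a
    constant, non-zero stride.
    """
    if len(addresses) < 3:
        return []
    a, b, c = addresses[-3:]
    stride = c - b
    if stride == 0 or b - a != stride:
        return []
    # Run ahead of the access stream by `depth` strides.
    return [c + stride * i for i in range(1, depth + 1)]


# Three accesses spaced 0x80 bytes apart establish the stride.
history = [0x100, 0x180, 0x200]
prefetches = predict_strided(history, depth=2)
print([hex(a) for a in prefetches])
```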

AMD unfortunately didn’t go into more detail on its prefetchers other than to promise that they are much more aggressive than what we have today. Aggressive prefetching usually calls for a good amount of memory bandwidth, so I’m wondering if we’ll see Bulldozer adopt a three- or four-channel DDR3 memory controller in high-end configurations, similar to what we have today with Gulftown.

Power Gating & Real Turbo Mode

Each Bulldozer module in a processor can be clocked and power gated independently. This has two implications. You can now power off cores (in sets of two) that aren’t in use and save tons of idle power. You can also use the power savings to drive up the frequency of other cores in a Bulldozer CPU. With Bulldozer, AMD should have something functionally equivalent to Intel’s Turbo Boost modes. Since clock speed and power gating are controlled at the module level and not the core level, there will still be some differences between the two, but this should be much better than AMD’s current Turbo CORE technology.
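
The trade-off between gated modules and clock speed can be sketched with a deliberately simple model. Everything here is an assumption for illustration: the function name `turbo_frequency`, the clock figures, and above all the linear power model, in which a gated module draws nothing and power per module scales in proportion to frequency. Real power rises faster than frequency because voltage has to increase too, so this overstates the achievable headroom.

```python
def turbo_frequency(base_ghz, total_modules, active_modules, max_ghz):
    """Toy model: share a fixed power budget among the active modules only."""
    if not 0 < active_modules <= total_modules:
        raise ValueError("active_modules must be between 1 and total_modules")
    # Assumption: the budget equals all modules running at base clock, and
    # power per module is proportional to frequency (gated modules draw zero).
    scaled = base_ghz * total_modules / active_modules
    # A real part would also cap the boost at some maximum turbo clock.
    return min(scaled, max_ghz)


# Hypothetical 4-module part with a 3.0 GHz base and a 3.6 GHz turbo cap.
all_active = turbo_frequency(3.0, total_modules=4, active_modules=4, max_ghz=3.6)
two_active = turbo_frequency(3.0, total_modules=4, active_modules=2, max_ghz=3.6)
print(all_active, two_active)
```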

There’s of course extensive clock gating around the chip, but the big change is power gating, which AMD hasn’t had up to this point (Bobcat is also power gated).

Performance and Availability

While Bobcat is going to be in production in Q4 of this year, with system availability in Q1 of 2011, Bulldozer is still a 2011 project and AMD isn’t giving any guidance as to when in 2011.

Parts are already back and in AMD’s labs, but we have no indication of performance or rollout schedule. Given Bobcat’s schedule, I’d say that the first Bulldozer CPUs will be out no earlier than Q2 2011, and AMD’s unwillingness to specify which half of the year would imply a late Q2/early Q3 launch.

The first Bulldozer parts will be server-focused, with high-end desktop CPUs following but still in 2011.

76 Comments

  • mino - Tuesday, August 24, 2010 - link

    From the HW design POV, those pipes are "MMX/3Dnow" class stuff.
    They run SSE3, but they are still MMX-class.

    There is a reason Bulldozer has "FMAC" written there ...
  • Kiijibari - Tuesday, August 24, 2010 - link

    ... it is stupid to name a circuit after a deprecated ISA extension and not after its function.
    If it's doing stuff like 3dnow and mmx then call it Shuffle / permutation pipeline but not MMX ...

    The FMAC is the best example .. why is it written FMAC in that case and not SSE5/AVX/XOP ?
  • KonradK - Thursday, August 26, 2010 - link

    Deprecated does not mean prohibited. Also, there are existing MMX programs, 64-bit operating systems other than Windows, and compilers other than MSVC.

    MMX and x87 is prohibited in 64bit kernel code.

    http://msdn.microsoft.com/en-us/library/ff545910%2...
  • iwod - Tuesday, August 24, 2010 - link

    From the design of Bulldozer's FPU it is clear that AMD wants multi-threaded FPU work to run on OpenCL. The dual integer design looks interesting now, but it is up against Sandy Bridge, the architecture that is supposed to leap again like Pentium 4 to C2D. And if Bulldozer comes any later, it will be up against the die shrink of Sandy Bridge, Ivy Bridge. Things don't look so good here.

    It is mainstream / low end that looks very interesting. I am currently using a Pentium M 1.8Ghz Dothan with 2GB DDR Ram. With a Radeon 1600 Graphics. I dont get hardware acceleration from GPU, 720P is just barely playable with some very fast software decoder. It is fast enough to watch some 460p youtube and most of my day web serving.

    Now if Bobcat has similar or higher IPC than Dothan, a quad-core Bobcat with Radeon 5000 64 SP will still be within reasonable die size on 40nm. It will be cheap when it drops to 32nm or lower. Most of us don't need a SUPER FAST computer, and Bobcat with Radeon 5 Series or higher plus a fast SSD is all we need.
  • aegisofrime - Tuesday, August 24, 2010 - link

    I don't recall Sandy Bridge being a revolutionary leap. Everyone has been saying that it's more of evolutionary, the main difference being the addition of AVX.

    I REALLY REALLY REALLY hope that AMD announces later today what socket Bulldozer will be on... I desperately need more video encoding performance. I have an AM2+ motherboard and that bloody 1055T is singing its siren song to me every night. If Bulldozer is on AM3 I can get an AM3 board and the 1055T and do a quick upgrade to Bulldozer.

    Come on AMD. Your customers need more information to make an informed decision!
  • mino - Tuesday, August 24, 2010 - link

    Bulldozer gen1 == primarily servers
    => 16/12-core (MCM) Socket G34 (current platform)
    => 8/6/4-core Socket G32 (current platform)

    Bulldozer Desktop (hopefully before X-mas 2011)
    => 8?/6/4-core Socket AM3R2(or AM3+, whatever they call it)
  • Pirks - Tuesday, August 24, 2010 - link

    Huh? You want more video encoding performance and you think about upgrading CPU? What kind of idiocy is that? Use 480GTX with Badaboom and your video encoding speed won't be matched by CPUs of year 2020 or maybe even 2030 :P
  • aegisofrime - Tuesday, August 24, 2010 - link

    Don't talk if you don't know what you are talking about. No GPU encoder out there is able to match x264 quality or SPEED wise. And the huge flaw in your statement is that Badaboom doesn't even support Fermi GPUs right now.

    Have you done any serious video encoding before, or are you just trolling as usual?
  • ChronoReverse - Tuesday, August 24, 2010 - link

    Indeed. I would try out CUDA encoders every once in a while in hopes that I could at least get the quality of x264 at MINIMUM quality but they can't even match that.

    Since x264 at minimum quality encodes slightly quicker (on my quad core) than a CUDA encoder does (on my GTX260) and still yields better quality, I really appreciate faster CPUs.
  • mapesdhs - Tuesday, August 24, 2010 - link


    Hate to say it but unless GPU acceleration is available, the i7 is a far better choice for video encoding. I still use a 6000+ for most tasks, but numerous article reviews made it quite clear that AMD was not the best choice for video encoding, so I went with an i7 860 4GHz. Pricing was surprisingly good, speed is excellent.

    Ian.
