Cache and Memory Performance

I mentioned earlier that cache latencies are higher in order to accommodate the larger caches (8MB L2 + 8MB L3) as well as the high frequency design. We turned to our old friend cachemem to measure these latencies in clocks:

Cache/Memory Latency Comparison
  L1 L2 L3 Main Memory
AMD FX-8150 (3.6GHz) 4 21 65 195
AMD Phenom II X4 975 BE (3.6GHz) 3 15 59 182
AMD Phenom II X6 1100T (3.3GHz) 3 14 55 157
Intel Core i5 2500K (3.3GHz) 4 11 25 148

Cache latencies are up significantly across the board, which is to be expected given the increase in pipeline depth as well as cache size. But is Bulldozer able to overcome the increase through higher clocks? To find out we have to convert latency in clocks to latency in nanoseconds:

Memory Latency

We disable turbo in order to get predictable clock speeds, which lets us accurately calculate memory latency in ns. The FX-8150 at 3.6GHz has a longer trip down memory lane than its predecessor, also at 3.6GHz. The higher latency caches play a role in this as they are necessary to help drive AMD's frequency up. What happens if we turn turbo on and peg the FX-8150 at 3.9GHz? Memory latency goes down. Bulldozer still isn't able to get to main memory as quickly as Sandy Bridge, but thanks to Turbo Core it's able to do so better than the outgoing Phenom II.

L3 Cache Latency

L3 access latency is effectively a wash compared to the Phenom II thanks to the higher clock speeds enabled by Turbo Core. Latencies haven't really improved though, and Bulldozer has a long way to go before it reaches Sandy Bridge access latencies.

The Impact of Bulldozer's Pipeline Windows 7 Application Performance
Comments Locked

430 Comments

View All Comments

  • ThaHeretic - Saturday, October 15, 2011 - link

    Here's something for a compile test: build the Linux kernel. Something people actually care about.
  • Loki726 - Monday, October 31, 2011 - link

    The linux kernel is more or less straight C with a little assembly; it is much easier on a compiler frontend and more likely to stress the backend optimizers and code generators.

    Chromium is much more representative of a modern C++ codebase. At least, it is more relevant to me.
  • nyran125 - Saturday, October 15, 2011 - link

    Whats the point in having 8 cores, if its not even as fast as an intel 4 core and you get better performance overall with intel.. Heres the BIG reality, the high end 8 core is not that much cheaper than a 2600K. Liek $20-60 MAX> Youd be crazy to buy an 8 core for the same price as an intel 2600K...

    LIKE MAD!!!
  • Fiontar - Saturday, October 15, 2011 - link

    Well, these numbers are pretty dismal all around. Maybe as the architecture and the process mature, this design will start to shine, but for the first generation, the results are very disappointing.

    As someone who is running a Phenom II X6 at a non-turbo core 4.0 Ghz, air cooled, I just don't see why I would want to upgrade. If I got lucky and got a BD overclock to 4.6 Ghz, I might get a single digit % increase in performance over my Phenom II X6, which is not worth the cost or effort.

    I guess on the plus side, my Phenom II was a good upgrade investment. Unless I'm tempted to upgrade to an Intel set up in the near future, I think I can expect to get another year or two from my Phenom II before I start to see upgrade options that make sense. (I usually wait to upgrade my CPU until I can expect about a 40% increase in performance over my current system at a reasonable price).

    I hope AMD is able to remain competitive with NVidia in the GPU space, because they just aren't making it in the CPU space.

    BTW, if the BD can reliably be overclocked to to 4.5Ghz+, why are they only selling them at 3.3 Ghz? I'm guessing because the added power requirements then make them look bad on power consumption and performance per watt, which seems to be trumping pure performance as a goal for their CPU releases.
  • Fiontar - Saturday, October 15, 2011 - link

    A big thumbs down to Anand for not posting any of the over-clock benchmarks. He ran them, why not include them in the review?

    With the BD running at an air cooled 4.5 Ghz, or a water cooled 5.0 Ghz, both a significant boost over the default clock speed, the OC benchmarks are more important to a lot of enthusiasts than the base numbers. In the article you say you ran the benchmarks on the OC part, why didn't you include them in your charts? Or at least some details in the section of the article on the Over-clock? You tell us how high you managed to over-clock the BD and under what conditions, but you gave us zero input on the payoff!
  • Oscarcharliezulu - Saturday, October 15, 2011 - link

    ...was going to upgrade my old amd3 system to a BD, just a dev box, but I think a phenom x6 or 955 will be just fine. Bit sad too.
  • nhenk--1 - Sunday, October 16, 2011 - link

    I think Anand hit the nail on the head mentioning that clock frequency is the major limitation of this chip. AMD even stated that they were targeting a 30% frequency boost. A 30% frequency increase over a 3.2 GHz Phenom II (AM3 launch frequency i think) would be 4.2 GHz, 17% faster than the 3.6 GHz 8150.

    If AMD really did make this chip to scale linearly to frequency increases, and you add 17% performance to any of the benchmarks, BD would roughly match the i7. This was probably the initial intention at AMD. Instead the gigantic die, and limitations of 32nm geometries shot heat and power through the roof, and that extra 17% is simply out of reach.

    I am an AMD fan, but at this point we have to accept that we (consumers) are not a priority. AMD has been bleeding share in the server space where margins are high, and where this chip will probably do quite well. We bashed Barcelona at release too (I was still dumb enough to buy one), but it was a relative success in the server market.

    AMD needs to secure its spot in the server space if it wants to survive long term. 5 years from now we will all be connecting to iCloud with our ARM powered Macbook Vapor thin client laptops, and a server will do all of the processing for us. I will probably shed a tear when that happens, I like building PCs. Maybe I'll start building my own dedicated servers.

    The review looked fair to me, seems like Anand is trying very hard to be objective.
  • neotiger - Monday, October 17, 2011 - link

    "server space where margins are high, and where this chip will probably do quite well."

    I don't see how Bulldozer could possibly do well in the server space. Did you see the numbers on power consumption? Yikes.

    For servers power consumption is far more important than it is in the consumer space. And BD draws about TWICE as much power as Sandy Bridge does while performs worse.

    BD is going to fail worse in the server space than it will in the consumer space.
  • silverblue - Monday, October 17, 2011 - link

    I'm not sure that I agree.

    For a start, you're far more likely to see heavily threaded workloads on servers than in the consumer space. Bulldozer does far better here than with lightly threaded workloads and even the 8150 often exceeds the i7-2600K under such conditions, so the potential is there for it to be a monster in the server space. Secondly, if Interlagos noticably improves performance over Magny Cours then coupled with the fact that you only need the Interlagos CPU to pop into your G34 system means this should be an upgrade. Finally, power consumption is only really an issue with Bulldozer when you're overclocking. Sure, Zambezi is a hungrier chip, but remember that it's got a hell of a lot more cache and execution hardware under the bonnet. Under the right circumstances, it should crush Thuban, though admittedly we expected more than just "under the right circumstances".

    I know very little about servers (obviously), however I am looking forward to Johan's review; it'd be good to see this thing perform to its strengths.
  • neotiger - Monday, October 17, 2011 - link

    First, in the server space BD isn't competing with i7-2600K. You have to remember that all the current Sandy Bridge i7 waste a big chunk of silicon real estate on GPU, which is useless in servers. In 3 weeks Intel is releasing the 6 core version of SB, essentially take the transistors that have been used for GPU and turn them into 2 extra cores.

    Even in highly threaded workloads 8150 performs more or less the same level as i7-2600K. In 3 weeks SB will increase threaded performance by 50% (going from 4 cores to 6). Once again the performance gap between SB and BD will be huge, in both single-threaded and multi-threaded workloads.

    Second, BD draws much higher power than SB even in stock frequency. This is born out by the benchmark data in the article.

Log in

Don't have an account? Sign up now