Cache and Memory Performance

I mentioned earlier that Bulldozer's cache latencies are higher in order to accommodate the larger caches (8MB L2 + 8MB L3) as well as the high-frequency design. We turned to our old friend cachemem to measure these latencies in clock cycles:

Cache/Memory Latency Comparison (clock cycles)
Processor                         L1   L2   L3   Main Memory
AMD FX-8150 (3.6GHz)               4   21   65   195
AMD Phenom II X4 975 BE (3.6GHz)   3   15   59   182
AMD Phenom II X6 1100T (3.3GHz)    3   14   55   157
Intel Core i5 2500K (3.3GHz)       4   11   25   148
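
cachemem's internals aren't described here, but latency benchmarks of this kind commonly use pointer chasing: the buffer is arranged as a chain of indices so every load depends on the result of the previous one, and a random single-cycle order keeps hardware prefetchers from guessing the next address. The minimal Python sketch below only shows how such a chain can be built (Sattolo's algorithm) and walked; the function names are hypothetical, 64-byte cache lines are an assumption, and real timing has to be done in native code, since interpreter overhead would swamp a 4-cycle L1 hit.

```python
import random

def slots_for(working_set_bytes, line_bytes=64):
    """One chain slot per cache line; 64-byte lines are assumed."""
    return working_set_bytes // line_bytes

def build_chase_chain(n_slots, seed=0):
    """Return chain where chain[i] is the slot visited after slot i.

    Sattolo's algorithm (a Fisher-Yates variant that never swaps an element
    with itself) produces one random cycle through every slot, so the walk
    never settles into a short loop that fits in a smaller cache.
    """
    rng = random.Random(seed)
    chain = list(range(n_slots))
    for i in range(n_slots - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i: this is what makes it a single cycle
        chain[i], chain[j] = chain[j], chain[i]
    return chain

def walk(chain, steps, start=0):
    """Follow the chain; each index comes from the previous load (dependent loads)."""
    idx = start
    for _ in range(steps):
        idx = chain[idx]
    return idx

# Example: a 16KB working set -> 256 slots of 64 bytes each.
chain = build_chase_chain(slots_for(16 * 1024))
assert walk(chain, len(chain)) == 0  # one full lap returns to the start
```

Sweeping the working set from a few KB up past the 8MB L3 moves the walk from L1 through L2 and L3 out to main memory, which is how a tool like this can separate the four levels in the table.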

Cache latencies are up significantly across the board compared to Phenom II, which is to be expected given the increase in pipeline depth as well as cache size. But is Bulldozer able to overcome the increase through higher clocks? To find out, we have to convert latency in clocks to latency in nanoseconds:
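
The conversion itself is a single division: one cycle at f GHz lasts 1/f ns. A minimal Python sketch that applies it to every entry in the table above, assuming each chip runs at its listed clock with turbo off; the names are illustrative, and the article's charted ns values are not reproduced in this text.

```python
# Cycle counts and (turbo-off) clock speeds from the latency table above.
MEASURED = {
    "AMD FX-8150": (3.6, [4, 21, 65, 195]),
    "AMD Phenom II X4 975 BE": (3.6, [3, 15, 59, 182]),
    "AMD Phenom II X6 1100T": (3.3, [3, 14, 55, 157]),
    "Intel Core i5 2500K": (3.3, [4, 11, 25, 148]),
}
LEVELS = ["L1", "L2", "L3", "Main Memory"]

def cycles_to_ns(cycles, clock_ghz):
    """A cycle at f GHz lasts 1/f nanoseconds, so latency_ns = cycles / f."""
    return cycles / clock_ghz

def latency_table_ns(measured):
    """Convert every entry of a {chip: (clock_ghz, cycles)} table to ns."""
    return {
        chip: {level: cycles_to_ns(c, clock) for level, c in zip(LEVELS, cycles)}
        for chip, (clock, cycles) in measured.items()
    }

if __name__ == "__main__":
    for chip, row in latency_table_ns(MEASURED).items():
        cells = "  ".join(f"{lvl} {ns:5.1f}ns" for lvl, ns in row.items())
        print(f"{chip:<24} {cells}")
```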

Memory Latency

We disable turbo in order to get predictable clock speeds, which lets us accurately calculate memory latency in ns. The FX-8150 at 3.6GHz has a longer trip down memory lane than its predecessor, the Phenom II X4 975 BE, also at 3.6GHz. The higher-latency caches play a role in this, and they are a necessary trade-off to help drive AMD's clock speeds up. What happens if we turn turbo on and peg the FX-8150 at 3.9GHz? Memory latency goes down. Bulldozer still isn't able to get to main memory as quickly as Sandy Bridge, but thanks to Turbo Core it gets there faster than the outgoing Phenom II.
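
The arithmetic behind the equal-clock comparison, using the cycle counts from the table: at the same frequency, more cycles simply means more nanoseconds. The 3.9GHz line is only a best case. It assumes the 195-cycle count measured at 3.6GHz carries over to 3.9GHz, but part of a DRAM access takes a fixed amount of time regardless of core clock, so in practice the cycle count rises with frequency and the real improvement is smaller than this.

```latex
t_{\text{ns}} = \frac{N_{\text{cycles}}}{f_{\text{GHz}}}

\text{FX-8150 @ 3.6GHz: } \frac{195}{3.6} \approx 54.2\ \text{ns}
\qquad
\text{Phenom II X4 975 BE @ 3.6GHz: } \frac{182}{3.6} \approx 50.6\ \text{ns}

\text{FX-8150 @ 3.9GHz, if } N_{\text{cycles}} \text{ stayed at } 195\text{: }
\frac{195}{3.9} = 50.0\ \text{ns}
\quad \left(\text{a factor of } \tfrac{3.6}{3.9} \approx 0.923\right)
```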

L3 Cache Latency

L3 access latency is effectively a wash compared to the Phenom II, thanks to the higher clock speeds enabled by Turbo Core. Latencies haven't really improved, though, and Bulldozer has a long way to go before it reaches Sandy Bridge's access latencies.
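
One way to see why the L3 result comes out as a wash: solve for the clock at which the FX-8150's 65-cycle L3 takes as long, in ns, as each Phenom II's L3. A minimal sketch with a hypothetical helper name, assuming the 65-cycle count holds at higher clocks; that is a simplification, since the L3 may not be clocked with the cores, in which case its cost in core cycles grows with frequency.

```python
def break_even_clock(new_cycles, old_cycles, old_clock_ghz):
    """Clock at which new_cycles takes as long as old_cycles at old_clock_ghz.

    Solves new_cycles / f = old_cycles / old_clock_ghz for f, assuming the
    new chip's cycle count does not change with frequency.
    """
    return new_cycles * old_clock_ghz / old_cycles

# L3 cycle counts from the table: FX-8150 = 65 vs. the two Phenom IIs.
print(f"vs. X4 975 BE: {break_even_clock(65, 59, 3.6):.2f} GHz")  # 3.97 GHz
print(f"vs. X6 1100T: {break_even_clock(65, 55, 3.3):.2f} GHz")   # 3.90 GHz
```

Both break-even points sit right around the 3.9GHz turbo clock used above.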

430 Comments

  • nofumble62 - Thursday, October 13, 2011

    Crappy building block will mean crappy building.
  • richaron - Friday, October 14, 2011

    At first I was pissed off by being strung along for this pile of tripe. After sleeping on it, I am not completely giving up on this SERVER CHIP:
    1) FX is a performance moniker: scratch the stupid amount of cache & crank the clock
    2) I'm sure these numpties can get single-thread performance up to Thuban levels
    3) Patch the Windows scheduler ffs
    Fix those (relatively simple) things & it will kick ass. But it means most enthusiasts won't be spending money on AMD for a while yet.
  • 7Enigma - Friday, October 14, 2011

    The biggest problem for a server chip is the load power levels. It just doesn't compete on that benchmark, and it's one that is VERY important in a server environment from a cost/heat standpoint.

    Let's hope that's just a crappy leaky chip due to manufacturing, but it's too early to tell.
  • richaron - Friday, October 14, 2011

    I've worked in a 'server environment'. Of course power consumption is an issue. At the lower clock speeds & considering multithread performance, this is already a good/great contender. For virtual servers & scientific computing this is already a winner.
    With a few (hardware & software) tweaks it could be a GREAT PC chip in the long term.
  • ryansh - Friday, October 14, 2011

    Does anyone have a beta copy of Win8 to see if BD's performance increases like AMD says it will?
  • silverblue - Friday, October 14, 2011

    There are benchmarks here and there, but nothing to say it'll improve performance by more than 10% across the board. In any case, the competition also benefits from Windows 8, so it's still not a sign of AMD closing any sort of gap in a tangible fashion.
  • Pipperox - Friday, October 14, 2011

    But Bulldozer is different.
    The Windows 7 scheduler does not have a clue about its "modules" and "cores".
    So, for example, it may find it perfectly legit to schedule 2 FP-intensive threads to the same module.
    But that will reduce performance on Bulldozer, because the two cores in a module share one FPU.
    Also, one may want to schedule two integer threads that share the same memory space to the same module, instead of 2 different modules.
    This way the two threads can share the same L2 cache, instead of having to go to the L3, which would increase latency.

    None of the above applies to Thuban; it applies to Sandy Bridge to a lesser degree, but the Windows 7 scheduler is already aware of Sandy Bridge's architecture.
  • nirmv - Saturday, October 15, 2011

    Pipperox, it's no different from Intel's Hyper-Threading.
  • Pipperox - Sunday, October 16, 2011

    It is, although they're similar concepts.
    Let's take an example: you have 2 integer threads working on the same address space (for example, two parallel threads in the same process).
    All cores are idle.
    What is the best scheduling for a Hyper-Threading CPU?
    You schedule each thread to a different core, so that they can enjoy full execution resources.

    What is best on Bulldozer?
    You schedule them to the SAME module.
    This is because the integer execution resources are already split between the two cores of a BD module, so there would be no advantage to scheduling the threads to different modules.
    HOWEVER, if the 2 threads are on the same module, they can share the L2 cache instead of the L3 cache on BD, so they enjoy lower memory latency and higher bandwidth.

    There are cases where the above is not true, of course.

    But my example shows that optimal scheduling for Hyper-Threading can be SUB-optimal on Bulldozer.

    Hence the need for a Bulldozer-aware scheduler in Windows 8.
  • Regs - Friday, October 14, 2011

    AMD needs a 40-50% performance gain and they're not going to see it using Windows 8. What AMD needs is... actually, I have no clue what they need. I've never been so dumbfounded by a product that makes no sense and has no position in the market.
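
Pipperox's argument in the comments above boils down to a placement rule for two threads on an otherwise idle chip. A toy Python sketch of that rule as he states it; the function names and the CPU numbering (logical CPUs 2k and 2k+1 sharing a module or a Hyper-Threaded core) are hypothetical, since real enumeration depends on the OS and firmware.

```python
def unit_of(cpu):
    """Hypothetical numbering: logical CPUs 2k and 2k+1 share one unit
    (a Bulldozer module, or one Hyper-Threaded physical core)."""
    return cpu // 2

def place_two_threads(arch, fp_heavy, share_data):
    """Pick logical CPUs for two threads on an otherwise idle chip."""
    spread, packed = [0, 2], [0, 1]  # different units vs. the same unit
    if arch == "ht":
        # HT siblings share one core's execution resources, so spread out.
        return spread
    if arch == "bulldozer":
        if fp_heavy:
            # The two cores of a module share one FPU, so spread out.
            return spread
        # Each integer core has its own execution resources; packing lets
        # the pair share the module's L2 instead of meeting in the L3.
        return packed if share_data else spread
    raise ValueError(f"unknown arch: {arch!r}")

print(place_two_threads("ht", fp_heavy=False, share_data=True))         # [0, 2]
print(place_two_threads("bulldozer", fp_heavy=False, share_data=True))  # [0, 1]
```

As he notes, there are cases where this rule doesn't hold; the sketch only captures the idle-chip example he describes.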
