Integer Crunching Power

Each core has two integer execution units (EX0 and EX1) and two AGUs (Address Generation Units, AG0 and AG1). For comparison, the K10 core inside Magny-Cours and Istanbul had three ports, each feeding a “Fully featured ALU + AGU” pair. AMD marketing cleverly drew four pipeline blocks inside the Bulldozer integer core, but those PowerPoint blocks cannot hide the fact that each Bulldozer integer core has fewer execution resources.

In practice, AG0 and AG1 are little more than assistants to EX0 and EX1, with limited capabilities of their own. The Software Optimization Guide for AMD Family 15h Processors lists only a few instructions (page 248 in the January 2012 version) that can be processed by AG0 and AG1, and each time it adds the remark "First op to AG0 | AG1, Second to EX0 | EX1". AG0 and AG1 reduce the latency of the CALL and LEA instructions, but the maximum throughput of each integer core inside the Bulldozer module is only two integer instructions per clock cycle. Only when a fused branch enters EX0 while another integer instruction enters EX1 does throughput rise slightly, to three integer instructions per cycle, because the fused compare-and-branch pair counts as two instructions.
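The throughput arithmetic above can be sketched as a small calculation. This is a toy model under stated assumptions, not a simulator: it assumes EX0 and EX1 each accept one op per cycle and that a fused compare-and-branch pair occupies EX0 alone while counting as two instructions. The function and parameter names are hypothetical.

```python
def peak_int_instructions_per_cycle(fused_branch_in_ex0=False):
    """Toy peak count of integer instructions handled in one cycle.

    Assumptions (illustrative only): EX0 and EX1 each take one op per
    cycle; a fused branch (compare + jump) uses EX0 only but counts as
    two instructions.
    """
    ex0 = 2 if fused_branch_in_ex0 else 1
    ex1 = 1
    return ex0 + ex1


print(peak_int_instructions_per_cycle())                          # no fusion
print(peak_int_instructions_per_cycle(fused_branch_in_ex0=True))  # fused branch in EX0
```

Under these assumptions the first call gives the two-per-cycle baseline and the second gives the three-per-cycle best case described in the text.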

So the Bulldozer integer core can execute one integer instruction fewer per cycle (2 vs 3). That doesn’t mean that the Bulldozer integer core is one third slower, however. The integer core of Bulldozer is smaller but also more flexible. The dedicated 8-entry per-lane schedulers are gone, replaced by a single, much larger 40-entry scheduler. This means that Bulldozer should be better at extracting ILP (Instruction Level Parallelism) out of code that has low IPC (Instructions Per Clock).
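To make the flexibility argument concrete, here is a minimal sketch comparing how many waiting ops fit into three dedicated 8-entry queues versus one 40-entry queue. It is a deliberately crude model: it assumes every op is bound to a fixed lane, that nothing issues while the queues fill, and that allocation stops at the first op that finds its queue full. The function names and the example workload are hypothetical.

```python
from collections import Counter


def buffered_per_lane(op_lanes, entries_per_lane=8):
    """Ops accepted before some op finds its own lane's queue full."""
    used = Counter()
    accepted = 0
    for lane in op_lanes:
        if used[lane] == entries_per_lane:
            break  # this lane is full, so allocation stalls here
        used[lane] += 1
        accepted += 1
    return accepted


def buffered_unified(op_lanes, entries=40):
    """Ops accepted by one shared queue, regardless of lane."""
    return min(len(op_lanes), entries)


# Hypothetical skewed workload: 40 waiting ops, most bound for lane 0.
skewed = [0] * 30 + [1] * 5 + [2] * 5
print(buffered_per_lane(skewed), buffered_unified(skewed))
```

With a perfectly balanced stream the per-lane design holds 3 × 8 = 24 ops, but a skewed stream stalls as soon as one lane fills, while the shared queue keeps accepting ops until all 40 entries are used.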

In some integer-intensive applications, the somewhat lower maximum integer throughput might slow things down. That is the not-very-useful "it depends" answer, so let's clarify: what kind of applications are we talking about?


84 Comments


  • thunderising - Wednesday, May 30, 2012 - link

    Glad AMD has "Greater Performance" planned sometime in the future. Wow!
  • themossie - Wednesday, May 30, 2012 - link

    "There are also other factors at play, though, as it's already known that StarCraft II doesn't use more than two cores; theinstead, it's likely the..."

    (feel free to remove comment after fixing this)
  • Nightraptor - Wednesday, May 30, 2012 - link

    I am wondering if it would be possible to compare the processor performance of the Trinity A10 with an underclocked FX-4100 set to the same frequencies (I don't know if it is possible to disable the L3 cache on the FX-4100). This might give us a rough idea of how much the improvements of Piledriver have bought us. Just doing rough math in my head, it would seem that they have to be pretty significant given how the FX-4100 compared to the Phenom II X4s (it lost a lot, if not most of the time). The new A10 Trinitys, on the other hand, seem to win most of the time compared to the old architecture. Given that the A10 is a Piledriver-based FX-4XXX series equivalent minus the L3 cache, it would seem that Piledriver brought very significant enhancements. Either that or the Phenom II era processors responded much more poorly to the lack of L3 than Piledriver does.
  • coder543 - Wednesday, May 30, 2012 - link

    I was hoping they would be doing the same thing, even though it would be challenging to draw real information out of comparing a desktop processor and a mobile processor.
  • SleepyFE - Wednesday, May 30, 2012 - link

    +1
  • kyuu - Wednesday, May 30, 2012 - link

    +2

    If you can figure out some way to do a comparison and analysis of Piledriver's performance vs. Bulldozer, I think a great many of us are interested to see that. From benchmarks, it seems like Piledriver improved a great deal over Bulldozer, but it's difficult to tell without being able to compare two similar processors.
  • Aone - Wednesday, May 30, 2012 - link

    You can compare the A10-4600M or A8-4500M against mobile Llano, Phenom, or even Turion to see that the tweaked BD is nothing spectacular. For instance, in most cases the A8-4500M (2.3GHz base) loses to the Llano A8-3500M (1.5GHz base).
  • Nightraptor - Wednesday, May 30, 2012 - link

    Where are you getting the benchmarks from that show Trinity losing to Llano? In almost all the benchmarks I have been able to find (with a few exceptions) it seems that Trinity beats Llano, hence the original post. If the Piledriver enhancements were very minor I would've expected Trinity (a hacked quad core) to lose to Llano (a true quad core) most of the time. This didn't appear to happen, at least not in the AnandTech review.
  • Aone - Wednesday, May 30, 2012 - link

    "http://www.notebookcheck.net/AMD-A-Series-A8-4500M...
    Look at "Show comparison chart". Great info!
  • Nightraptor - Thursday, May 31, 2012 - link

    I'm not a big fan of the reliability of that website. They tend to be pretty scant on the test circumstances and configurations. Furthermore, I'm curious where they are getting the information for the A8-4500, as to the best of my knowledge the only Trinity in the wild at the moment is the A10, which AMD sent out in a custom-made review laptop. All they list is a "K75D Sample".
