IPC Increases: Double L1 Data Cache, Better Branch Prediction

One of the biggest changes in the design is the L1 data cache, which doubles in size from 64 KB to 128 KB while keeping the same efficiency. This is combined with a better prefetch pipeline and improved branch prediction to reduce the rate of cache misses. The L1 data cache is also now an 8-way associative design, but with the better branch prediction it will, when possible, activate only the one way required and power down the rest. This extends to not carrying extra data in 64-bit word constructions. Together with better clock gating and other minor adjustments, this reduces power consumption by up to 2x. It is worth pointing out that doubling the L1 cache is not always easy: it needs to sit close to the branch predictors and prefetch buffers to be effective, but it also requires die area. This was achieved by using the high density libraries and by prioritizing the lower level cache. Another element is latency, which normally has to increase when a cache grows in size, although AMD did not elaborate on how this was handled.
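To make the way-gating idea concrete, below is a minimal C sketch of way prediction in an 8-way set-associative 128 KB cache. The 64-byte line size, the per-set predictor and the lookup logic are illustrative assumptions for the sketch, not a description of AMD's actual implementation.

```c
/* Minimal sketch of way prediction in an 8-way set-associative L1D.
 * Sizes follow the article (128 KB, 8 ways); the 64-byte line size and
 * the per-set predictor are illustrative assumptions, not AMD's design. */
#include <stdint.h>
#include <stdbool.h>

#define CACHE_BYTES (128 * 1024)
#define WAYS        8
#define LINE_BYTES  64                                   /* assumed */
#define SETS        (CACHE_BYTES / (WAYS * LINE_BYTES))  /* = 256 sets */

typedef struct {
    uint64_t tag[WAYS];
    bool     valid[WAYS];
    uint8_t  predicted_way;   /* last way that hit in this set */
} cache_set_t;

static cache_set_t l1d[SETS];

/* Probe the predicted way first; only on a mispredict are the
 * remaining ways (and their tag/data arrays) activated. */
bool l1d_lookup(uint64_t addr, int *ways_activated)
{
    uint64_t set = (addr / LINE_BYTES) % SETS;
    uint64_t tag = addr / (LINE_BYTES * SETS);
    cache_set_t *s = &l1d[set];

    *ways_activated = 1;
    uint8_t w = s->predicted_way;
    if (s->valid[w] && s->tag[w] == tag)
        return true;                      /* fast, low-power hit */

    for (uint8_t i = 0; i < WAYS; i++) {  /* fall back: check all ways */
        if (i == w) continue;
        (*ways_activated)++;
        if (s->valid[i] && s->tag[i] == tag) {
            s->predicted_way = i;         /* retrain the predictor */
            return true;
        }
    }
    return false;                         /* miss */
}
```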

As listed above, the branch prediction benefits come from a 50% increase in the size of the branch target buffer (BTB). This allows the buffer to store more records of previously seen branches, increasing the likelihood of a successful prefetch when similar work is in flight. If floating point data is required, the FP port can initiate a quicker flush to loop the data back in for the next instruction. Adding support for new instructions is nothing new for a generational update, though AVX2 in particular is something a number of high end software packages will be interested in using in the future.
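As a rough illustration of why a larger BTB helps, here is a minimal C sketch of a direct-mapped branch target buffer. The entry count and indexing scheme are assumptions for the example; AMD only states that the structure grew by 50%, not how it is organized.

```c
/* Minimal sketch of a direct-mapped branch target buffer (BTB).
 * Entry count and indexing are illustrative assumptions. */
#include <stdint.h>
#include <stdbool.h>

#define BTB_ENTRIES 1024   /* assumed size, not AMD's figure */

typedef struct {
    uint64_t tag;          /* upper bits of the branch PC */
    uint64_t target;       /* predicted branch target */
    bool     valid;
} btb_entry_t;

static btb_entry_t btb[BTB_ENTRIES];

/* Look up the predicted target for a branch at 'pc'.
 * A larger BTB means fewer index collisions, so more branches keep
 * their history and can be fetched ahead of execution. */
bool btb_lookup(uint64_t pc, uint64_t *target)
{
    uint64_t idx = (pc >> 2) % BTB_ENTRIES;
    if (btb[idx].valid && btb[idx].tag == (pc >> 2)) {
        *target = btb[idx].target;
        return true;
    }
    return false;
}

/* Record the resolved target once the branch executes. */
void btb_update(uint64_t pc, uint64_t target)
{
    uint64_t idx = (pc >> 2) % BTB_ENTRIES;
    btb[idx] = (btb_entry_t){ .tag = pc >> 2, .target = target, .valid = true };
}
```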

These changes, according to AMD, translate to a 4-15% higher IPC for Excavator in Carrizo compared to Steamroller in Kaveri. This is perhaps a little more than we would normally expect from a generational increase (4-8% is more typical), but AMD is keen to stress that it comes in addition to lower power consumption and a reduced die area. As a result, at the same power Carrizo can have both an IPC advantage and a frequency advantage.
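Since performance scales with both IPC and clock speed, the two advantages compound. The short C example below works through that arithmetic with illustrative figures (not AMD's published numbers), assuming performance is proportional to IPC times frequency.

```c
/* Back-of-the-envelope: how an IPC gain and a frequency gain at the
 * same power compound into an overall single-threaded speedup.
 * The sample figures are illustrative, not AMD's published numbers. */
#include <stdio.h>

int main(void)
{
    double ipc_gain  = 0.05;   /* e.g. 5% higher IPC (AMD quotes 4-15%) */
    double freq_gain = 0.10;   /* e.g. 10% higher clock at the same power */

    /* performance ~ IPC x frequency, so the gains multiply */
    double speedup = (1.0 + ipc_gain) * (1.0 + freq_gain) - 1.0;

    printf("combined single-threaded gain: %.1f%%\n", speedup * 100.0);
    return 0;
}
```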

Consequently, AMD states that for the same power, Cinebench single threaded results go up 40% and multithreaded results up 55%. The benefits shrink the further up the power band you go, however, as the high density libraries perform slightly worse at higher power than Kaveri's design.

Comments

  • Novacius - Tuesday, June 2, 2015 - link

    From the article: "However for a 15W part, to which Carrizo will be primarily, this means either a 10%+ power reduction at the same frequency or a 5% increase in frequency for the same power. Should AMD release other APUs in the 7.5W region, this is where most of the gains are."

    It's power per core pair (module) and Carrizo clearly has two of them. This means the highest gains are exactly in the 15W TDP range.
  • Ian Cutress - Tuesday, June 2, 2015 - link

    Good point, article updated :)
  • Novacius - Tuesday, June 2, 2015 - link

    "The big upgrade in graphics for Carrizo is that the maximum number of compute units for a mobile APU moves up from six (384 SPs) to eight (512 SPs), affording a 33% potential improvement."

    Mobile Kaveri already has 8 CUs (FX-7600P), but only at 35W.
  • el etro - Tuesday, June 2, 2015 - link

    They stated that the power savings allowed them to put the 512SPs/8CUs in the 15W part.
  • Ian Cutress - Wednesday, June 3, 2015 - link

    I changed that on the fly a little while back between meetings, should be OK now :)
  • ClamShall - Tuesday, June 2, 2015 - link

    This article is basically just an explanation of AMD's marketing slides without actual empirical data to back things up (other than what has been provided by AMD). Worse still, it doesn't make any notable attempt to critically analyze whether the company's claims will or will not materialize.

    In short, this article should've been left to AMD's marketing team and posted on the company's site.
  • Iketh - Tuesday, June 2, 2015 - link

    I don't have time to visit multiple sites. This is the only one I visit. Thank you AT for this article and keep them coming please.
  • KenLuskin - Wednesday, June 3, 2015 - link

    clamboy, This is a good article, and you are just another Intel fanboy with butt hurt. Intel does the EXACT same thing, and yet dumbos like you suck it up...... Bend over and assume the position for maximum penetration!
  • SolMiester - Wednesday, June 3, 2015 - link

    Eh?, and how are AMD providing the butt hurt?....
  • formulav8 - Sunday, June 7, 2015 - link

Because there are many wackos who feel like an Intel/NVidia Corp is their mommy and hate to see AMD improve anything. Look at the comments for Freesync reviews and such. Stupid how anyone gets attached to a Corp who cares nothing about you.
