IPC Increases: Double L1 Data Cache, Better Branch Prediction

One of the biggest changes in the design is the L1 data cache, which doubles in size from 64 KB to 128 KB while keeping the same efficiency. This is combined with a better prefetch pipeline and better branch prediction to reduce the rate of cache misses in the design. The L1 data cache is also now an 8-way associative design, but with the better prediction it will, where it can, activate only the one segment required and power down the rest. This includes removing extra data from 64-bit word constructions. Along with better clock gating and minor adjustments, this reduces power consumption by up to a factor of two. It is worth pointing out that doubling the L1 cache is not always easy: it needs to be close to the branch predictors and prefetch buffers in order to be effective, but it also requires space. AMD achieved this by using the high density libraries and by prioritizing the lower level cache. Another element is latency, which normally has to increase when a cache grows in size, although AMD did not elaborate on how this was handled.
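The idea of waking only the segment that is expected to hold the data can be illustrated with a toy model. The sketch below is not AMD's implementation, which was not disclosed. It assumes a 64-byte line, a per-set "last way that hit" predictor, and round-robin replacement, all of which are illustrative choices, and it counts activated ways as a rough stand-in for dynamic power.

```python
LINE_BYTES = 64            # assumed line size, not stated in the article
WAYS = 8                   # 8-way associativity, as described above
CACHE_BYTES = 128 * 1024   # capacity figure quoted above


class WayPredictedCache:
    """Toy set-associative cache that probes one predicted way first."""

    def __init__(self, cache_bytes=CACHE_BYTES, ways=WAYS, line_bytes=LINE_BYTES):
        self.ways = ways
        self.line_bytes = line_bytes
        self.sets = cache_bytes // (line_bytes * ways)
        # tags[s][w] is the tag held in way w of set s, or None if empty
        self.tags = [[None] * ways for _ in range(self.sets)]
        # Hypothetical predictor: remember the last way that hit in each set
        self.predicted_way = [0] * self.sets
        self.next_victim = [0] * self.sets  # round-robin replacement pointer
        self.accesses = 0
        self.hits = 0
        self.ways_activated = 0

    def access(self, addr):
        line = addr // self.line_bytes
        s = line % self.sets
        tag = line // self.sets
        self.accesses += 1

        # Step 1: wake only the predicted way
        guess = self.predicted_way[s]
        self.ways_activated += 1
        if self.tags[s][guess] == tag:
            self.hits += 1
            return True

        # Step 2: prediction failed, so wake the remaining ways
        self.ways_activated += self.ways - 1
        for w in range(self.ways):
            if self.tags[s][w] == tag:
                self.predicted_way[s] = w
                self.hits += 1
                return True

        # Step 3: genuine miss, fill a victim way and predict it next time
        victim = self.next_victim[s]
        self.next_victim[s] = (victim + 1) % self.ways
        self.tags[s][victim] = tag
        self.predicted_way[s] = victim
        return False


def simulate_streaming(passes=10, buffer_bytes=4096):
    """Walk a small buffer repeatedly, one access per cache line."""
    cache = WayPredictedCache()
    for _ in range(passes):
        for addr in range(0, buffer_bytes, cache.line_bytes):
            cache.access(addr)
    return cache


if __name__ == "__main__":
    c = simulate_streaming()
    always_on = c.accesses * c.ways
    print(f"hits: {c.hits}/{c.accesses}")
    print(f"ways activated: {c.ways_activated} (vs {always_on} probing every way)")
```

In this run the model wakes 1,088 ways over 640 accesses, against 5,120 if all eight ways were probed every time. Real workloads would predict far less cleanly than a repeated linear walk, so the ratio here should be read as an upper bound on what such a scheme could save in this model.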

As listed above, the branch prediction benefits come from a 50% increase in the BTB size. This allows the buffer to store more records of previous interactions, increasing the likelihood of a successful prefetch when similar work is in flight. If this requires floating point data, the FP port can more quickly initiate the flush required to loop data back into the next command. Adding support for new instructions is nothing unusual in itself, though AVX2 is something a number of high end software packages will be interested in using in the future.
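To see why a larger BTB helps, consider a toy direct-mapped buffer. The sketch below is purely illustrative: the entry counts (512 and 768, a 50% step), the indexing by low address bits, and the 700-branch loop are assumptions chosen for the example, not figures from AMD.

```python
class BranchTargetBuffer:
    """Toy direct-mapped BTB: one (branch_pc, target) pair per slot."""

    def __init__(self, entries):
        self.entries = entries
        self.table = [None] * entries
        self.lookups = 0
        self.hits = 0

    def lookup_and_update(self, pc, actual_target):
        # Index by the branch address, assuming 4-byte aligned instructions
        slot = (pc >> 2) % self.entries
        self.lookups += 1
        entry = self.table[slot]
        hit = entry is not None and entry[0] == pc
        if hit:
            self.hits += 1
        # Always record the most recent branch seen at this slot
        self.table[slot] = (pc, actual_target)
        return hit


def run_loop(entries, branches=700, passes=10):
    """Replay a loop containing `branches` distinct taken branches."""
    btb = BranchTargetBuffer(entries)
    for _ in range(passes):
        for i in range(branches):
            pc = 4 * i              # hypothetical branch addresses
            btb.lookup_and_update(pc, pc + 64)
    return btb.hits, btb.lookups


if __name__ == "__main__":
    for size in (512, 768):         # hypothetical sizes, 50% apart
        hits, lookups = run_loop(size)
        print(f"{size} entries: {hits}/{lookups} lookups hit")
```

With these assumed numbers the working set of 700 branches overflows the smaller table, so branches that share a slot keep evicting each other, while the larger table holds the whole loop after the first pass. How much a real design gains depends on its associativity, indexing, and the code being run.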

These changes, according to AMD, translate to a 4-15% higher IPC for Excavator in Carrizo compared to Steamroller in Kaveri. This is perhaps a little more than what we would normally expect from a generational increase (4-8% is more typical), but AMD likes to stress that this comes in addition to lower power consumption and a reduced die area. As a result, at the same power Carrizo can have both an IPC advantage and a frequency advantage.

AMD states that for the same power, Cinebench single threaded results will go up 40% and multithreaded results up 55%. The benefits shrink the further up the power band you go, however, as the higher density libraries perform slightly worse at higher power than those used in Kaveri.
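As a rough sanity check, the single threaded figure can be split into its IPC and frequency parts. The sketch below assumes that performance scales as IPC multiplied by clock frequency, which is a simplification, and uses only the 40% and 4-15% figures quoted above; the function name is made up for this example.

```python
def implied_frequency_gain(perf_gain, ipc_gain):
    """Frequency uplift implied by a total gain, assuming perf = IPC * frequency."""
    return (1 + perf_gain) / (1 + ipc_gain) - 1


if __name__ == "__main__":
    perf = 0.40                      # single threaded gain at the same power
    for ipc in (0.04, 0.15):         # ends of the quoted IPC range
        freq = implied_frequency_gain(perf, ipc)
        print(f"IPC +{ipc:.0%} -> implied frequency +{freq:.1%}")
```

Under that assumption, the implied frequency uplift at the same power falls somewhere between roughly 22% and 35%, which suggests most of the claimed gain at low power comes from clock speed rather than IPC.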

138 Comments

  • zodiacfml - Friday, June 5, 2015

    Imagine what they could do with a 14nm version of this: probably half the cost of a Core M with 60 to 70% of the M's CPU performance, yet with better graphics at the same TDP.
  • AS118 - Saturday, June 6, 2015

    I already signed up for the mailing list that tells you when laptops with Carrizo come out and are ready to buy. You can do so on AMD's website if you're interested. The H.265 hardware decoding alone interests me, and all the other features, like program-specific acceleration and the better GPU performance for mainstream games, are nice.

    If you only play stuff like LoL or Counter-Strike, or browser games or even older games on GOG and Steam, the A10 and up look like they'll be quite good.
  • ivyanev - Sunday, June 7, 2015

    As the performance is more than enough for everyday use, and the price is good, using it in a mini PC would be great.
  • watzupken - Thursday, June 11, 2015

    I was thinking the same thing. If they can produce this for use in those NUC-sized PCs, I will consider getting one as an HTPC if the price is right.
  • Fujikoma - Sunday, June 7, 2015

    AMD not including VP9 support is a mistake. They could always drop it if YouTube isn't as popular, but a lot of video in media articles tends to be linked to YouTube.
    It would be nice to see a die shrink with AMD adding more CPU cores to make up the difference to at least compete with Intel in number crunching.
  • ivyanev - Tuesday, June 9, 2015

    Try using the h264ify plugin for Chrome. It disables VP8 and VP9 video, so YouTube plays the MP4 versions: butter smooth and efficient.
  • figus77 - Thursday, June 11, 2015

    I think everyone should look at APUs with respect. The APU is the future of PCs and notebooks. HBM on the next AMD GPU will be a start and a test for new APUs with on-chip HBM RAM, which will be far faster than any DDR4 now available on the market and probably any 'on motherboard' RAM we will ever see. AMD could start a revolution in the PC market, and others will probably copy them shortly, even with faster CPUs, but IF that happens we should be grateful to AMD.
    And sorry for my English...
  • JDub8 - Tuesday, June 16, 2015

    Something I'm always interested in but that is never addressed in these articles: the UVD playback and all its magical power savings. What codecs/players support it? If I have CCCP installed, will MPC-HC automatically benefit? Or will that be reserved for some cyberpower payware DVD/BD player?
