Graphics

The big upgrade in graphics for Carrizo is that the maximum number of compute units for a 15W mobile APU moves up from six (384 SPs) to eight (512 SPs), affording a potential 33% improvement. This means that the high end A10 Carrizo mobile APUs will align with the A10 Kaveri desktop APUs, although the desktop APUs will use 6x the power. Carrizo also moves to AMD’s third generation of Graphics Core Next, known as GCN 1.2, which is the same generation used in Tonga based retail graphics cards (the R9 285).
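To make the arithmetic behind that 33% figure explicit, here is a minimal Python sketch. It relies on each GCN compute unit containing 64 stream processors, which matches the 384 and 512 SP counts quoted above; the function names are purely illustrative.

```python
# Each GCN compute unit (CU) contains 64 stream processors (SPs).
SPS_PER_CU = 64


def stream_processors(compute_units: int) -> int:
    """Total SPs for a given number of compute units."""
    return compute_units * SPS_PER_CU


def relative_gain(old: int, new: int) -> float:
    """Fractional increase going from old to new."""
    return (new - old) / old


kaveri_sps = stream_processors(6)   # 15W Kaveri maximum
carrizo_sps = stream_processors(8)  # 15W Carrizo maximum
gain = relative_gain(kaveri_sps, carrizo_sps)

print(kaveri_sps, carrizo_sps, f"{gain:.0%}")  # prints "384 512 33%"
```

This is a ceiling on shader throughput at equal clocks rather than a performance prediction, since real gains also depend on frequency and memory bandwidth.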

This gives DirectX 12 support, but one of AMD’s aims with Carrizo is full HSA 1.0 support. Earlier this year, when AMD first released proper Carrizo details, we were told that Carrizo will support the full HSA 1.0 draft as it currently stands, since the specification has not yet been ratified, and that AMD will not push back the launch of Carrizo to wait for ratification. So there is a chance that Carrizo will not be certified as a fully HSA 1.0 compliant APU, but very few people are predicting major changes to the specification before ratification that would require hardware adjustments.

The difference between Kaveri’s ‘HSA Ready’ and Carrizo’s ‘HSA Final’ nomenclature comes down to one main feature: context switching. Kaveri can do everything Carrizo can do, apart from this. Context switching allows the HSA device to switch asynchronously to other work while it waits on a part that needs to finish. I would imagine that if Kaveri came across work that required this, it would sit idle waiting for that work to finish before continuing, which means that Carrizo would be faster in this regard.

One of the key parts of HSA is pointer translation, allowing both the CPU and GPU to access the same memory despite their different interpretations of how the memory in the system is configured. One of the features on Carrizo will be the use of address translation caches (ATCs) inside the GPU. These essentially keep a record of which address points to which data, and when a translation is found in a lower-level cache, that data can be accessed more quickly. These ATC L1/L2 caches will be inside the compute units themselves as well as the GPU memory controller, along with an overriding ATC L2 beyond the regular L2 per compute unit.
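As a rough illustration of why a translation cache helps, the Python sketch below models a two-level lookup that falls back to a page-table walk. The class name, cache capacities, 4KB page size and page-table contents are all assumptions made for illustration; none of them describe Carrizo's actual hardware.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size, for illustration only


class TranslationCache:
    """Tiny LRU cache mapping virtual page numbers to physical page numbers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # mark as most recently used
            return self.entries[vpn]
        return None

    def fill(self, vpn, ppn):
        self.entries[vpn] = ppn
        self.entries.move_to_end(vpn)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


def translate(vaddr, l1, l2, page_table):
    """Return (physical address, level that served the translation)."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    ppn = l1.lookup(vpn)
    level = "L1"
    if ppn is None:
        ppn = l2.lookup(vpn)
        level = "L2"
        if ppn is None:
            ppn = page_table[vpn]  # slow path: full page-table walk
            level = "walk"
            l2.fill(vpn, ppn)
        l1.fill(vpn, ppn)
    return ppn * PAGE_SIZE + offset, level


# Hypothetical page table shared by the CPU and GPU views of memory.
page_table = {0: 7, 1: 3, 2: 9}
l1, l2 = TranslationCache(1), TranslationCache(2)

first = translate(0x0010, l1, l2, page_table)   # cold: page-table walk
second = translate(0x0020, l1, l2, page_table)  # same page: L1 hit
third = translate(0x1000, l1, l2, page_table)   # new page: walk, evicts L1 entry
fourth = translate(0x0030, l1, l2, page_table)  # back to page 0: L2 hit
print(first, second, third, fourth)
```

The fourth access is the interesting one: the small L1 has already evicted the entry, but the larger L2 still holds it, so the slow walk is avoided.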

Use of GCN 1.2 means that AMD can use its latest color compression algorithms with little effort. They take a little more die area to implement (of which Excavator has more to play with than Kaveri), but afford performance improvements, particularly in gaming. The texture data is stored losslessly to maintain visual fidelity, and moves between graphics cores in this compressed state.
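AMD has not detailed the algorithm here, so the following Python sketch only illustrates the general idea of lossless delta compression: store one anchor value plus small differences between neighboring pixels, then reconstruct the original exactly. The function names, the eight-pixel single-channel tile and the bit-counting scheme are all assumptions for illustration, not AMD's implementation.

```python
def compress_tile(pixels):
    """Encode a tile as (anchor value, deltas from the previous pixel)."""
    if not pixels:
        return None, []
    anchor = pixels[0]
    deltas = [b - a for a, b in zip(pixels, pixels[1:])]
    return anchor, deltas


def decompress_tile(anchor, deltas):
    """Rebuild the original pixel values exactly from anchor and deltas."""
    if anchor is None:
        return []
    pixels = [anchor]
    for d in deltas:
        pixels.append(pixels[-1] + d)
    return pixels


def bits_needed(deltas):
    """Bits per delta in a signed fixed-width encoding (hypothetical scheme)."""
    if not deltas:
        return 0
    widest = max(abs(d) for d in deltas)
    return widest.bit_length() + 1  # +1 for the sign bit


# Hypothetical 8-bit, single-channel tile with a smooth gradient.
tile = [200, 201, 201, 203, 202, 202, 204, 205]
anchor, deltas = compress_tile(tile)
restored = decompress_tile(anchor, deltas)

raw_bits = 8 * len(tile)
packed_bits = 8 + bits_needed(deltas) * len(deltas)
print(restored == tile, raw_bits, packed_bits)
```

Smooth regions compress well because the deltas are small, while noisy regions gain little, which is why schemes of this kind typically help bandwidth rather than guaranteeing a fixed ratio.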

In a further effort to squeeze power out of the system, the GPU will have its own dedicated voltage plane as part of the system, rather than a separate voltage island requiring its own power delivery mechanism as before. AMD’s latest numbers on the improvements here date back only to June 2013 and come from internal simulations, rather than an actual direct comparison.

With all the performance metrics rolled in, AMD is quoting a 65% performance improvement at 15W compared to Kaveri. The adjustment in design allows a higher frequency for the same power, which combines with the additional compute units and other enhancements to give the overall score. At 35W the gain is less pronounced and more akin to a regular generational improvement: it is what we would normally expect, and it pales in comparison to the 15W numbers.


137 Comments

  • name99 - Saturday, June 6, 2015

    You are comparing a $400 laptop to a $1500 laptop and, what do you know, the $1500 laptop comes out better. What a surprise!

    The point is that in this space batteries have long been cheap and the energy efficiency nothing like at the higher end. Which means the work-life has been something like 3 hrs. If AMD shifts that to six hours with this chip, that's a massive improvement in the target space.

    You're also making bad assumptions about why these laptops are bought. If you rely on your laptop heavily for your job, you buy a $1500 laptop. These machines are bought to act as light performance desk machines that are occasionally (but only occasionally) taken to a conference room or on a field trip.
  • name99 - Saturday, June 6, 2015

    AMD does not have infinite resources. This play makes sense.
    Intel is essentially operating by starting with a Xeon design point and progressively stripping things out to get to Broadwell-M, which means that Broadwell-M over-supplies this $400-$700 market. Meanwhile at the really low end, Intel has Atom.

    AMD is seeing (correctly, I think) that there is something of a gap in the Intel line which they can cover AND that this gap will probably persist for some time --- Intel isn't going to create a third line just to fit that gap.
  • Krysto - Wednesday, June 3, 2015

    I might be ready to get into AMD, as AMD has a lot of innovation lately. But it still disappoints me greatly that they aren't able to adopt a more modern process node.

    If they launch their new high-performance CPU core next year as part of an APU that uses HBM memory and is at the very least on 16nm FinFET, I might get that instead of a Skylake laptop. HSA is pretty cool and one of the reasons I'd get it.
  • UtilityMax - Wednesday, June 3, 2015

    The Kaveri FX parts are still almost half as slow in IPC as a competing Intel Core i3 with the same TDP. Only in tests involving multithreaded apps that can load all four cores the FX parts are keeping up with the Core i3. Let's hope the Carrizo generation of APUs will improve this situation.
  • silverblue - Thursday, June 4, 2015

    Without being an AMD apologist, I think the point was that single threaded performance was "good enough" for your usual light work which tends to be hamstrung by I/O anyway.

    There are two things that I need to see clarified about Carrizo, however:

    1) Does Carrizo drop CPU frequency automatically when the GPU is being taxed? That's certainly going to be an issue as regards the comparison with an i3.
    2) With the addition of AVX2, were there any architectural changes made to accommodate AVX2, for example a wider FlexFPU?
  • sonicmerlin - Tuesday, June 9, 2015

    Yup. I'll wait for the 14 nm Zen APUs with HBM. The performance leap (both CPU and GPU) should be truly massive.
  • Phartindust - Thursday, June 4, 2015

    Dude your gettin a Dell with a AMD processor!
    When was the last time that happened?
    Looks like @Dell loves #Carrizo, and will use @AMD once again. #AMDRTP http://www.cnet.com/au/news/dell-inspirion-amd-car...
  • elabdump - Friday, June 5, 2015

    Don't forget that Intel gives you a non-fixable NSA approved BIOS: http://mjg59.dreamwidth.org/33981.html
  • patrickjchase - Friday, June 5, 2015

    Ian, you appear to have confused I-cache and D-cache.

    You wrote: "The L1 data cache is also now an 8-way associative design, but with the better branch prediction when needed it will only activate the one segment required and when possible power down the rest".

    This is of course gibberish. Branch prediction would help to predict the target set of an *instruction* fetch from the I-cache, but is useless for D-cache set prediction for the most part (I say "for the most part" because Brad Calder did publish a way-prediction scheme based on instruction address back in the 90s. It didn't work very well and hasn't been productized that I know of).
  • zodiacfml - Friday, June 5, 2015

    Imagine what they could do with a 14nm version of this, probably at half the cost of a Core M with 60 to 70% of the M's CPU performance, yet with better graphics at the same TDP.
