Kabini: Mainstream APU for Notebooks

AMD will be building two APUs based on Jaguar: Kabini and Temash. Kabini is AMD’s mainstream APU, which you can expect to see in ultra-thin, affordable notebooks. Note that both of these are full-blown SoCs by conventional definitions: the IO hub is integrated into the monolithic die. Kabini ends up being the first quad-core x86 SoC if we go by that definition.

Kabini will carry A and E series branding, and will be available in a full quad-core version (A series) as well as dual-core (E series). The list of Kabini parts launching is below:

On the GPU side we have a two-Compute-Unit implementation of AMD’s Graphics Core Next architecture. The geometry engine has been trimmed (to 1/4 primitive per clock) to fit these smaller, low-cost APUs. Double precision is supported at 1/16 the single-precision rate, although adds and some multiplies run at 1/8 the single-precision rate.
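
To put those fractional rates in perspective, here is a rough sketch that turns Compute Unit count, clock and issue rate into peak throughput. It assumes the standard GCN layout of 64 ALU lanes per Compute Unit, counts a fused multiply-add as two FLOPs, and uses a hypothetical 500MHz GPU clock purely for illustration, since no GPU clock is given here.

```python
# Peak-throughput sketch for a GCN GPU. Assumptions: 64 ALU lanes per Compute
# Unit, one instruction per lane per clock at full rate, and an illustrative clock.

LANES_PER_CU = 64  # GCN: four SIMD16 units per Compute Unit


def peak_gflops(compute_units: int, clock_mhz: float, rate: float, flops_per_op: int) -> float:
    """Peak GFLOPS for an instruction issuing at `rate` x the single-precision rate."""
    ops_per_clock = compute_units * LANES_PER_CU * rate
    return ops_per_clock * flops_per_op * clock_mhz / 1000.0


CLOCK_MHZ = 500.0  # hypothetical GPU clock, for illustration only

cases = [
    ("SP fused multiply-add", 1.0, 2),
    ("DP fused multiply-add (1/16 rate)", 1 / 16, 2),
    ("DP add (1/8 rate)", 1 / 8, 1),
]
for label, rate, flops in cases:
    print(f"{label}: {peak_gflops(2, CLOCK_MHZ, rate, flops):.1f} GFLOPS")
```

Under these assumptions the DP multiply-add and DP add cases land on the same GFLOPS figure: the add issues twice as often but does half the arithmetic per instruction.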

Kabini features a single 64-bit DDR3 memory controller, and TDPs range from 9W to 25W. Although Jaguar supports dynamic frequency boosting (aka Turbo mode), the feature isn’t present or enabled on Kabini: the CPU clocks noted in the table above are the highest you’ll see regardless of core activity.
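
A single 64-bit channel puts a hard ceiling on the bandwidth that the CPU cores and the GPU share. The arithmetic is bus width in bytes times transfer rate; the DDR3 speed grades below are examples chosen for illustration, not a list of what Kabini officially supports.

```python
# Theoretical peak DRAM bandwidth for one memory channel.
# The DDR3 speed grades are illustrative examples, not a Kabini spec sheet.


def peak_bandwidth_gbs(bus_width_bits: int, megatransfers_per_s: int) -> float:
    """Peak bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_transfer = bus_width_bits // 8  # 64-bit bus -> 8 bytes per transfer
    return bytes_per_transfer * megatransfers_per_s / 1000.0


for speed in (1333, 1600):
    print(f"64-bit DDR3-{speed}: {peak_bandwidth_gbs(64, speed):.1f} GB/s")
```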

We have a separate review focusing on the performance of AMD’s A4-5000 Kabini APU live today as well.

Temash: Entry Level APU for Tablets

While Kabini will go into more traditional notebook designs, Temash will head down into the tablet space. Temash TDPs range from 3.9W up to 9W. Of the three Temash parts launching today, two are dual-core designs, while the highest-end A6-1450 has four cores and supports Turbo Core. The A6-1450’s Turbo Core implementation also enables TDP sharing between the CPU and GPU (idle CPU cores can be power gated and their thermal budget given to the GPU, and vice versa).
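
The budget arithmetic behind TDP sharing can be shown with a toy model: a fixed package budget, with whatever the active CPU cores don’t consume handed to the GPU, and the reverse. All of the wattages are hypothetical and the model ignores voltage/frequency curves entirely; it illustrates the idea, not AMD’s actual Turbo Core algorithm.

```python
# Toy model of CPU/GPU TDP sharing. Every number here is hypothetical, and
# power-gated idle cores are assumed to draw nothing.
from dataclasses import dataclass


@dataclass
class PowerBudget:
    package_w: float   # total package TDP in watts
    per_core_w: float  # assumed draw of one active CPU core
    uncore_w: float    # assumed fixed draw of memory controller, IO, etc.


def gpu_headroom_w(b: PowerBudget, active_cores: int) -> float:
    """Watts left for the GPU once the uncore and the active CPU cores are paid for."""
    cpu_w = active_cores * b.per_core_w  # gated cores contribute 0 W
    return max(0.0, b.package_w - b.uncore_w - cpu_w)


def cpu_headroom_w(b: PowerBudget, gpu_w: float) -> float:
    """The reverse case: watts available to the CPU cores given the GPU's draw."""
    return max(0.0, b.package_w - b.uncore_w - gpu_w)


budget = PowerBudget(package_w=8.0, per_core_w=1.0, uncore_w=1.0)
for cores in (4, 2, 1):
    print(f"{cores} active core(s): {gpu_headroom_w(budget, cores):.1f} W for the GPU")
print(f"GPU mostly idle (0.5 W): {cpu_headroom_w(budget, 0.5):.1f} W for the CPU")
```

Gating cores raises the GPU’s share of the same fixed budget, which is the whole point of sharing rather than splitting the TDP statically.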

The A4-1200 is quite interesting as it carries a sub-4W TDP, low enough to make it into an iPad-like form factor. It’s also important to note that AMD doesn’t actually reduce the number of GPU cores in any of the Temash designs, it just scales down clock speed.

Xbox One & PlayStation 4

In both our Xbox One and PS4 articles I referred to the SoCs as using two Jaguar compute units; now you can understand why. Both designs incorporate two quad-core Jaguar modules, each with its own shared 2MB L2 cache. Communication between the modules isn’t ideal, so we’ll likely see both consoles prefer that related tasks run on the same module.
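
One way software could respect that preference is to place each group of cooperating threads on a single module, so the group shares one L2. The sketch below assumes cores 0-3 form the first module and cores 4-7 the second; neither console’s actual core numbering or scheduler is described here, and `place_group` is a hypothetical helper.

```python
# Sketch: place a group of cooperating threads on one quad-core module so they
# share that module's 2MB L2. The core numbering (0-3 = module 0, 4-7 = module 1)
# is an assumption for illustration.
from __future__ import annotations

CORES_PER_MODULE = 4
NUM_MODULES = 2


def module_of(core: int) -> int:
    """Module index for a core under the assumed numbering."""
    return core // CORES_PER_MODULE


def place_group(group_size: int, free_cores: set[int]) -> list[int] | None:
    """Pick free cores for the whole group from a single module, or None if none fits."""
    for module in range(NUM_MODULES):
        first = module * CORES_PER_MODULE
        candidates = [c for c in range(first, first + CORES_PER_MODULE) if c in free_cores]
        if len(candidates) >= group_size:
            return candidates[:group_size]
    return None  # caller could split the group across modules, at a cache-sharing cost


print(place_group(3, {1, 2, 3, 4, 5, 6, 7}))  # [1, 2, 3]: module 0 still has room
print(place_group(4, {1, 2, 3, 4, 5, 6, 7}))  # [4, 5, 6, 7]: falls through to module 1
```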

Looking at Kabini, we have a good idea of the dynamic range for Jaguar on TSMC’s 28nm process: 1GHz - 2GHz. Right around 1.6GHz seems to be the sweet spot, as going to 2GHz requires a 66% increase in TDP.
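
The cost of those last few hundred megahertz is easy to quantify. Reading the 66% figure as relative to the ~1.6GHz sweet spot, and assuming performance scales at most linearly with clock (a best case for a core that also waits on memory), both of which are assumptions, the step to 2GHz buys at most 25% more performance for 66% more TDP:

```python
# Best-case perf/W for a clock bump. The 66% TDP increase comes from the text;
# treating it as relative to 1.6GHz, and assuming perf scales linearly with clock,
# are both assumptions.


def perf_per_watt_ratio(f_old_ghz: float, f_new_ghz: float, tdp_increase: float) -> float:
    """New perf/W divided by old perf/W, with perf taken as proportional to clock."""
    perf_ratio = f_new_ghz / f_old_ghz  # upper bound on the performance gain
    power_ratio = 1.0 + tdp_increase    # 0.66 -> 1.66x the TDP
    return perf_ratio / power_ratio


ratio = perf_per_watt_ratio(1.6, 2.0, 0.66)
print(f"clock +{2.0 / 1.6 - 1:.0%}, perf/W at best {ratio:.2f}x the 1.6GHz level")
```

Even in that best case, roughly a quarter of the efficiency is given up at the top of the range, which is consistent with the sweet spot sitting around 1.6GHz.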

The major difference between AMD’s Temash/Kabini Jaguar implementations and what’s done in the consoles is really all of the unified memory addressing work and whatever coherency is supported on those platforms. Memory buses are obviously very different as well, but the CPU cores themselves are pretty much identical to what we’ve outlined here.


78 Comments

  • GuMeshow - Friday, May 24, 2013

    The Embedded G-Series SoCs seem to be exactly Kabini with ECC memory enabled (ex: GX-420CA and A5-5200). This will probably be the cheapest way to get ECC support and better performance than Atom; the next step up would be an Intel S1200KPR + Celeron G1610?

    I've been thinking of putting together a Router/Firewall/Proxy/NAS combo ...
  • R3MF - Thursday, May 23, 2013

    HSA?
  • Spoelie - Thursday, May 23, 2013

    Is it just me, or does the shared L2 cache merely enable the same scaling to 4 cores that Bobcat had to 2 cores? There is no "massive benefit" as alluded to in the numbers or discussion.

    Bobcat scores 0.32 for one thread and 0.61 for two threads, a scaling of 95% (0.64 would be perfect scaling).
    Jaguar scores 0.39 for one thread and 1.50 for four threads, a scaling of 96% (1.56 would be perfect scaling).

    The 1% difference could easily be a result of score rounding. I see that a four-core Bobcat would probably scale worse than Jaguar, but the percentages chosen in the table are a bit misleading.
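
    The scaling figures above can be reproduced, along with how far two-decimal rounding of the scores could move them. The sketch assumes each published score may be off by up to ±0.005:

```python
# Multithreaded scaling efficiency from the Cinebench scores quoted above,
# plus the range that +/-0.005 rounding of each score allows.


def scaling_efficiency(single: float, multi: float, threads: int) -> float:
    """Multi-thread score divided by perfect linear scaling of the single-thread score."""
    return multi / (single * threads)


def efficiency_bounds(single: float, multi: float, threads: int, err: float = 0.005):
    """(worst, best) efficiency if each rounded score is off by up to `err`."""
    worst = scaling_efficiency(single + err, multi - err, threads)
    best = scaling_efficiency(single - err, multi + err, threads)
    return worst, best


for name, single, multi, threads in (("Bobcat", 0.32, 0.61, 2), ("Jaguar", 0.39, 1.50, 4)):
    lo, hi = efficiency_bounds(single, multi, threads)
    print(f"{name}: {scaling_efficiency(single, multi, threads):.1%} "
          f"(rounding range {lo:.1%} to {hi:.1%})")
```

    Bobcat comes out at 95.3% (93.1% to 97.6%) and Jaguar at 96.2% (94.6% to 97.7%). The ranges overlap almost entirely, so the one-point gap is well within rounding noise.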
  • Spoelie - Thursday, May 23, 2013

    Of course, drawing such conclusions from a single benchmark is dangerous. If other benchmarks exhibit more code/data sharing and thread dependencies than Cinebench, their numbers might show a more appreciable scaling benefit from the shared L2 cache.
  • tipoo - Thursday, May 23, 2013

    I wonder how this compares to the PowerPC 750, which the Wii U is based on. With the PS4 and Xbox One being Jaguar-based, that comparison would be interesting.
  • aliasfox - Thursday, May 23, 2013

    Wii U uses a PPC 750? Correct me if I'm wrong, but the PPC 750 family is the same chip that Apple marketed as the G3 up until about 10 years ago? And IIRC, Dolphin in the GameCube was also based on this architecture?

    Back in the day, the G3 at least had formidable integer performance: clock for clock, it was able to outdo the Pentium II on certain (integer-heavy) benchmarks by 2x. Its downfall was an outdated chipset (no proper support for DDR) and the inability to scale to higher clock speeds. Integer performance may have been fast, but floating point performance wasn't nearly as impressive. That was fine when the Pentium II you were competing against ran at nearly the same clock, and bad when the PIII and Core Solos ran at 2x your clock speed.

    Considering the history of the PPC 750, I'd love to know how a modern version of it would compare.
  • tipoo - Thursday, May 23, 2013

    Yes, the Gamecube, Wii, and Wii U all use PowerPC 750 based processors. The Wii U is the only known multicore implementation of it, but the core itself appears unchanged from the Wii, according to the hacker that told us the clock speed and other details.
  • tipoo - Thursday, May 23, 2013

    And you're right, it was good at integer, but the FPU was absolutely terrible, which makes it an odd choice for games, since games rely much more on floating point math than integer. I think it was only kept for backwards compatibility, while even three Jaguar cores would have performed better and still been small.

    The Nintendo faithful are saying it won't matter since FP work will get pushed to the GPU, but the GPU is already straining to get even a little ahead of the PS360, plus not all algorithms work well on GPUs.
  • tipoo - Thursday, May 23, 2013

    Also barely any SIMD, just paired singles. Even the ancient Xenon had good SIMD.
  • tipoo - Thursday, May 23, 2013

    Unchanged on the actual core parts I mean, obviously the eDRAM is different from old 750s.
