Kabini: Mainstream APU for Notebooks

AMD will be building two APUs based on Jaguar: Kabini and Temash. Kabini is AMD’s mainstream APU, which you can expect to see in ultra-thin affordable notebooks. Note that both of these are full blown SoCs by conventional definitions - the IO hub is integrated into the monolithic die. Kabini ends up being the first quad-core x86 SoC if we go by that definition.

Kabini will carry A and E series branding, and will be available in a full quad-core version (A series) as well as dual-core (E series). The list of Kabini parts launching is below:

On the GPU side we have a 2 Compute Unit implementation of AMD’s Graphics Core Next architecture. The geometry engine has been culled a bit (1/4 primitive per clock) in order to make the transition into these smaller/low cost APUs. Double precision is supported at 1/16 rate, although adds and some muls will run at 1/8 the single precision rate.

Kabini features a single 64-bit DDR3 memory controller and ranges in TDPs from 9W to 25W. Although Jaguar supports dynamic frequency boosting (aka Turbo mode), the feature isn’t present/enabled on Kabini - all of the CPU clocks noted in the table above are the highest you’ll see regardless of core activity.

We have a separate review focusing on the performance of AMD’s A4-5000 Kabini APU live today as well.

Temash: Entry Level APU for Tablets

While Kabini will go into more traditional notebook designs, Temash will head down into the tablet space. The Temash TDPs range from 3.9W all the way up to 9W. Of the three Temash parts launching today, two are dual-core designs with the highest end A6-1450 boasting 4 cores as well as support for turbo core. The A6-1450’s turbo core implementation also enables TDP sharing between the CPU and GPU cores (idle CPUs can be power gated and their thermal budget given to the GPU, and vice versa).

The A4-1200 is quite interesting as it carries a sub-4W TDP, low enough to make it into an iPad-like form factor. It’s also important to note that AMD doesn’t actually reduce the number of GPU cores in any of the Temash designs, it just scales down clock speed.

Xbox One & PlayStation 4

In both our Xbox One and PS4 articles I referred to the SoCs as using two Jaguar compute units - now you can understand why. Both designs incorporate two quad-core Jaguar modules, each with their own shared 2MB L2 cache. Communication between the modules isn’t ideal, so we’ll likely see both consoles prefer that related tasks run on the same module.

Looking at Kabini, we have a good idea of the dynamic range for Jaguar on TSMC’s 28nm process: 1GHz - 2GHz. Right around 1.6GHz seems to be the sweet spot, as going to 2GHz requires a 66% increase in TDP.

The major change between AMD’s Temash/Kabini Jaguar implementations as what’s done in the consoles is really all of the unified memory addressing work and any coherency that’s supported on the platforms. Memory buses are obviously very different as well, but the CPU cores themselves are pretty much identical to what we’ve outlined here.

The Jaguar Compute Unit & Physical Layout/Synthesis Final Words
Comments Locked

78 Comments

View All Comments

  • Wolfpup - Wednesday, June 12, 2013 - link

    Ironically when I see an Intel sticker on a tablet (unless it's a Core i part), I avoid it like the plague. Bobcat would have been perfect for tablets, and a BIG selling point.
  • Wolfpup - Wednesday, June 12, 2013 - link

    Yeah, I really have no interest in an Atom tablet, partially even just because of the horrible video.

    I've got an 11.6" AMD c50 (lowest end Bobcat) based notebook, and while it's slow, it's still impressive how it runs anything, and in a pinch can even function as a main PC. AMD's got an even lower power Bobcat part with the exact same performance for tablets, but I don't know of shipping computers that used it, and it really would have been perfect. These new ones of course will be even better.

    I wonder if the companies building these understand that using AMD would be a selling point... I see "Atom" and my eyes glaze over....
  • codedivine - Thursday, May 23, 2013 - link

    4 DP FMAs per 16 cycles? Why even bother putting them in :|
  • Tuna-Fish - Thursday, May 23, 2013 - link

    Because it's expected by the spec, and some compute loads use it for very rarely used things.
  • Exophase - Thursday, May 23, 2013 - link

    "I should point out that ARM is increasingly looking like the odd-man-out here, with both Jaguar and Intel’s Silvermont retaining the dual-issue design of their predecessors."

    It's not just ARM, it's three different current gen ARM cores.. if you're going to pose it as ISA shouldn't it then just be ARM vs x86 and not ARM vs Silvermont and Jaguar?

    Besides, MIPS is 3-way in its CPUs targeting this power budget too (proAptiv), and so is PowerPC (e600 for instance). The reason why Silvermont and Jaguar is 2-way is really undeniable: x86 decoders are substantially more expensive than those for any of these ISAs, even Thumb-2. There's some validity to the argument that x86 instructions are more powerful (after first negating where they aren't - most critically, lack of three-way addressing adds a lot of extra move instructions for non-AVX processors) but nowhere close to 50% more powerful.
  • lmcd - Thursday, May 23, 2013 - link

    Isn't Qualcomm Krait 2-way?
  • Exophase - Thursday, May 23, 2013 - link

    Qualcomm hasn't said an awful lot about the internals of the uarch but several sources report 3-way decode and I haven't seen any say 2-way. It's possible that isn't fully symmetric or limited in some other way, we don't really know.
  • Krysto - Friday, May 24, 2013 - link

    Pretty sure it's 3-way.
  • tiquio - Thursday, May 23, 2013 - link

    I don't really understand the point about unique macros. What are macros in reference to CPU architecture.
  • quasi_accurate - Thursday, May 23, 2013 - link

    Don't worry, I had no idea either until I started working in the industry :) It just means custom circuits that are hand crafted by a human. This is as opposed to "synthesis", in which the RTL code (written in a hardware description language such as Verilog) are "synthesized" by design software into circuits.

Log in

Don't have an account? Sign up now