Compute Unit

Bobcat was pretty simple from a multi-core standpoint. Each Bobcat core had its own private 512KB L2 cache, and all core-to-core communication happened via a bus interface on each of the cores. The cache hierarchy was exclusive, as had been the case with all of AMD’s previous architectures.

Jaguar changes everything. AMD defines a Jaguar compute unit as up to four cores with a single, large, shared L2 cache. The L2 cache can be up to 2MB in size and is 16-way set associative. The L2 cache is also inclusive, a first in AMD’s history. In the past AMD always implemented exclusive caches, since an inclusive design duplicates L1 data in the L2 and thus shrinks the effective L2 capacity. The larger shared L2 cache is responsible for up to another 5-7% increase in IPC over Bobcat (totaling ~22%).
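
As a rough illustration of what those L2 parameters imply, here is a minimal sketch (assuming the common 64-byte cache line size, which is not stated above) that derives the set count and the per-core share of the shared L2:

```python
# Back-of-envelope L2 geometry for a Jaguar compute unit.
# Assumption: 64-byte cache lines (typical, but not given in the text above).
l2_bytes   = 2 * 1024 * 1024   # up to 2MB shared L2
ways       = 16                # 16-way set associative
line_bytes = 64                # assumed line size
cores      = 4                 # up to four cores per compute unit

lines = l2_bytes // line_bytes   # 32,768 cache lines
sets  = lines // ways            # 2,048 sets
share = l2_bytes // cores        # 512KB per core if split evenly

print(f"{lines} lines, {sets} sets, {share // 1024}KB per core (even split)")
```

Because the L2 is shared rather than private, a single busy core can of course occupy far more than that even 512KB split, unlike Bobcat's fixed private L2.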

AMD’s new cache architecture and lower latency core-to-core communication within a Jaguar compute unit means an even greater performance advantage over Bobcat in multithreaded workloads:

Multithreaded Performance Comparison

| | # of Cores | Cinebench 11.5 (Single Threaded) | Cinebench 11.5 (Multithreaded) |
|---|---|---|---|
| AMD A4-5000 (1.5GHz Jaguar x 4) | 4 | 0.39 | 1.50 |
| AMD E-350 (1.6GHz Bobcat x 2) | 2 | 0.32 | 0.61 |
| Advantage | 100% | 21.9% | 145.9% |
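
For reference, the "Advantage" row is simply the ratio of the Jaguar and Bobcat scores expressed as a percentage gain; a quick sketch of that arithmetic:

```python
# Recomputing the "Advantage" row from the Cinebench 11.5 scores above.
def advantage(jaguar_score, bobcat_score):
    return (jaguar_score / bobcat_score - 1) * 100

print(f"Single threaded: {advantage(0.39, 0.32):.1f}%")  # ~21.9%
print(f"Multithreaded:   {advantage(1.50, 0.61):.1f}%")  # ~145.9%
```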

The L1 caches remain unchanged at 32KB/32KB (I/D cache) per core.

Physical Layout and Synthesis

Bobcat was AMD’s first easily synthesized CPU core, a direct result of the ATI acquisition years before. With Jaguar, AMD made a conscious effort to further reduce the number of unique macros required by the design. The result was a great simplification, which helped AMD port Jaguar between foundries. There’s of course an area tradeoff when moving away from custom macros to more general designs, but it was deemed worthwhile. Looking at the results, you really can’t argue. A single Jaguar core measures only 3.1mm^2 at 28nm, compared to 4.9mm^2 for a 40nm Bobcat core.
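
To put those die-area figures in context, here is a quick sketch comparing the measured shrink against naive linear process scaling (a rough rule of thumb, not an AMD figure):

```python
# Comparing Jaguar's measured core area against idealized process scaling.
bobcat_area_mm2 = 4.9          # 40nm Bobcat core
jaguar_area_mm2 = 3.1          # 28nm Jaguar core

measured_ratio = jaguar_area_mm2 / bobcat_area_mm2   # ~0.63
ideal_ratio    = (28 / 40) ** 2                      # ~0.49 if area scaled perfectly

print(f"Measured shrink: {measured_ratio:.2f}x of Bobcat's area")
print(f"Ideal 40nm -> 28nm scaling: {ideal_ratio:.2f}x")
```

The gap between the two ratios is consistent with the area tradeoff described above: moving from custom macros to more general, synthesis-friendly designs costs some density, yet the core still ends up considerably smaller.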

Comments

  • skatendo - Friday, May 24, 2013 - link

    Not entirely true. The Wii U CPU is highly customized and has enhancements not found in typical PowerPC processors. It's been completely tailored for gaming. I'm not saying it has the power of the newer Jaguar chipsets, but the beauty of custom silicon is that you can do much more with less (Tegra 3's quad-core CPU and 12-core GPU vs. Apple's A5 dual-core CPU/GPU, anyone? Yeah, the A5 kicked its arse for games). That's why Nintendo didn't release tech specs: they tailored a system for games, and performance will manifest with upcoming games (not these sloppy ports we've seen so far).
  • tipoo - Friday, May 24, 2013 - link

    I'm aware it would be highly customized, but a plethora of developers have also come out and said the CPU sucks.
  • skatendo - Saturday, May 25, 2013 - link

    Also, the "plethora" of developers that said it sucked (namely the Metro: Last Light dev) said they had an early build of the Wii U SDK and that it was "slow". Having worked for a developer, I know they base their opinions on how fast/efficiently they can port over their game. The Wii U is a totally different infrastructure that lazy devs don't want to take the time to learn, especially with a newer GPGPU.
  • Kevin G - Sunday, May 26, 2013 - link

    If a developer wants to do GPGPU, the PS4 and Xbox One will be highly preferable due to their unified virtual memory space. If GPGPU was Nintendo's strategy, they shouldn't have picked a GPU from the Radeon 6000 generation. Sure, it can do GPGPU, but there are far more compromises in handing off the workload.
  • Simen1 - Thursday, May 23, 2013 - link

    What is the TDP and die size of the APUs in X-Box One and Playstation 4?
  • haukionkannel - Thursday, May 23, 2013 - link

    Double the 1.6GHz 4-core version and you are near. The wider memory controller eats some extra energy too, so maybe you have to add 0.2 to 0.3 to the calculation...
  • fellix - Thursday, May 23, 2013 - link

    "The L2 cache is also inclusive, a first in AMD’s history."

    Not exactly correct. The very first Athlon (K7) on Slot A with off-die L2 used an inclusive cache hierarchy. All models after that moved to an exclusive design.
  • Exophase - Thursday, May 23, 2013 - link

    Bulldozer is also mostly inclusive. Not strictly inclusive, but certainly not exclusive (you really wouldn't get such a thing from a write-through L1 cache).
  • whyso - Thursday, May 23, 2013 - link

    Ahh AMD, I love your marketing slides. Let's compare battery life and EXCLUDE the screen. Never mind that the screen consumes a large amount of power, and that when you add it in, the total battery life savings go down tremendously. (That's why Sandy -> Ivy Bridge didn't improve battery life that much on mobile.) Let's also leave out the rest-of-system power and SoC power for Brazos. It also looks like the system is using an SSD to generate these numbers, which, looking at the target market, almost no OEM will do.
  • extide - Thursday, May 23, 2013 - link

    It's a perfectly valid comparison to make. All laptops will include a screen and the screen has nothing to do with AMD (or Intel).
