Llano, Trinity and Kaveri Die: Compared

AMD sent along a high-res shot of Kaveri's die. Armed with the same from the previous two generations, we can get a decent idea of how AMD's APUs have progressed:

Llano, K10 Quad Core

Trinity and Richland Die, with two Piledriver modules and processor graphics

Kaveri, two modules and processor graphics

Moving from Llano to Trinity, we see the reduction from a fully fledged quad-core design to the dual-module layout AMD has kept across its APU range. Moving from Richland to Kaveri is actually a bigger step than one might imagine:

AMD APU Details
Core Name         Llano        Trinity      Richland     Kaveri
Microarch         K10          Piledriver   Piledriver   Steamroller
CPU Example       A8-3850      A10-5800K    A10-6800K    A10-7850K
Threads           4            4            4            4
Cores             4            2            2            2
GPU               HD 6550D     HD 7660D     HD 8670D     R7
GPU Arch          VLIW5        VLIW4        VLIW4        GCN 1.1
GPU Cores         400          384          384          512
Die Size (mm2)    228          246          246          245
Transistors       1.178B       1.303B       1.303B       2.41B
Power (TDP)       100W         100W         100W         95W
CPU Clock (MHz)   2900         3800         4100         3700
CPU Turbo (MHz)   N/A          4200         4400         4000
L1 I-Cache (C$)   256KB        128KB        128KB        192KB
L1 D-Cache (D$)   256KB        64KB         64KB         64KB
L2 Cache          4 x 1MB      2 x 2MB      2 x 2MB      2 x 2MB
Node              32nm SOI     32nm SOI     32nm SOI     28nm SHP
Memory            DDR3-1866    DDR3-1866    DDR3-2133    DDR3-2133
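To put a rough number on "bigger step", here is a minimal Python sketch that takes a few rows from the table above (values copied from it; the APUS dictionary and the generation_ratios function are purely illustrative names) and computes new-over-old ratios between any two generations:

```python
# Figures copied from the AMD APU Details table; the dict layout is illustrative only.
APUS = {
    "Llano":    {"transistors_b": 1.178, "die_mm2": 228, "gpu_cores": 400, "base_mhz": 2900},
    "Trinity":  {"transistors_b": 1.303, "die_mm2": 246, "gpu_cores": 384, "base_mhz": 3800},
    "Richland": {"transistors_b": 1.303, "die_mm2": 246, "gpu_cores": 384, "base_mhz": 4100},
    "Kaveri":   {"transistors_b": 2.41,  "die_mm2": 245, "gpu_cores": 512, "base_mhz": 3700},
}


def generation_ratios(old: str, new: str) -> dict[str, float]:
    """Return new/old for every numeric field the two chips share."""
    a, b = APUS[old], APUS[new]
    return {field: b[field] / a[field] for field in a}


if __name__ == "__main__":
    # Richland -> Kaveri: how much each spec changed between generations.
    for field, ratio in generation_ratios("Richland", "Kaveri").items():
        print(f"{field:>14}: {ratio:.2f}x")
```

Run as a script, it reports roughly 1.85x the transistors in essentially the same die area (1.00x), 1.33x the GPU cores and about 0.90x the base clock; the same call works for any pair of rows, e.g. generation_ratios("Llano", "Trinity").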

Looking back at Llano and Trinity/Richland, it's very clear that AMD's APUs on GF's 32nm SOI process had a real issue with transistor density. The table below attempts to put everything in perspective, but keep in mind that, outside of Intel, no one does a good job of documenting how they count (or estimate) transistors. My only hope is that AMD's transistor-counting methods are consistent across CPU and GPU, although even that may be wishful thinking:

Transistor Density Comparison
Chip                        Manufacturing Process   Transistor Count   Die Size    Transistors per mm2
AMD Kaveri                  GF 28nm SHP             2.41B              245 mm2     9.837M
AMD Richland                GF 32nm SOI             1.30B              246 mm2     5.285M
AMD Llano                   GF 32nm SOI             1.178B             228 mm2     5.166M
AMD Bonaire (R7 260X)       TSMC 28nm               2.08B              160 mm2     13.000M
AMD Pitcairn (R7 270/270X)  TSMC 28nm               2.80B              212 mm2     13.209M
AMD Vishera (FX-8350)       GF 32nm SOI             1.2B               315 mm2     3.810M
Intel Haswell 4C (GT2)      Intel 22nm              1.40B              177 mm2     7.910M
NVIDIA GK106 (GTX 660)      TSMC 28nm               2.54B              214 mm2     11.869M

If AMD is indeed counting the same way across APUs and GPUs, the move to Kaveri doesn't look all that extreme; rather, it lands at a sensible point between the previous APUs and AMD's discrete GCN GPUs. Compared with AMD's standalone CPUs such as Vishera, the APUs are far denser, since a large portion of their die is occupied by the GPU.
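The density column is simply transistor count divided by die area. The minimal Python sketch below uses figures copied from the Transistor Density Comparison table (the CHIPS dictionary and both helper functions are illustrative names, and the result inherits the same caveat about how each vendor counts transistors). It recomputes the density and measures where Kaveri falls on a linear scale between Richland and a 28nm GCN part such as Bonaire:

```python
# (transistors in billions, die area in mm2), copied from the density table.
CHIPS = {
    "Kaveri": (2.41, 245),
    "Richland": (1.30, 246),
    "Llano": (1.178, 228),
    "Bonaire": (2.08, 160),
    "Pitcairn": (2.80, 212),
    "Vishera": (1.2, 315),
}


def density_m_per_mm2(transistors_b: float, die_mm2: float) -> float:
    """Millions of transistors per mm2: (billions * 1000) / area."""
    return transistors_b * 1000 / die_mm2


def position_between(value: float, low: float, high: float) -> float:
    """Linear position of value on [low, high]: 0.0 is the low end, 1.0 the high end."""
    return (value - low) / (high - low)


if __name__ == "__main__":
    d = {name: density_m_per_mm2(*spec) for name, spec in CHIPS.items()}
    for name, value in sorted(d.items(), key=lambda kv: kv[1]):
        print(f"{name:>9}: {value:6.3f}M per mm2")
    # Where Kaveri sits between its 32nm predecessor and a 28nm GCN GPU.
    pos = position_between(d["Kaveri"], d["Richland"], d["Bonaire"])
    print(f"Kaveri position (Richland=0, Bonaire=1): {pos:.2f}")
```

Recomputed this way, Kaveri lands at about 0.59 on that Richland-to-Bonaire scale, which fits the "point in between" reading, while Vishera at roughly 3.81M per mm2 stays well below every APU. Small last-digit differences from the table (Llano computes to 5.167M) come down to rounding.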

Comments

  • geniekid - Tuesday, January 14, 2014

    Would've been nice to see a discrete GPU thrown in the mix - especially with all that talk about Dual Graphics.
  • Ryan Smith - Tuesday, January 14, 2014

    Dual graphics is not yet up and running (and it would require a different card than the 6750 Ian had on hand).
  • Nenad - Wednesday, January 15, 2014

    I wonder if Dual Graphics can work with HSA, although I doubt it, due to cache coherence if nothing else.

    While on HSA, I must say that it looks very promising. I do not have experience with AMD-specific GPU programming or with OpenCL, but I do with CUDA (and some AMP), and the ability to avoid CPU/GPU copies would be a great advantage in certain cases.

    The interesting thing is that AMD now has hardware that supports HSA but does not yet have the software tools (drivers, compilers...), while NVIDIA does not have the hardware but does have the software: in the new CUDA you can use unified memory, even if the driver simulates the copy for you (which supposedly means that when NVIDIA delivers the hardware, your unaltered app from last year will work and take advantage of it).

    Also, while HSA is a great step ahead, I wonder if we will ever see one much more important thing if GPGPU is ever to become mainstream: PREEMPTIVE MULTITASKING. As it is now, the programmer/app still needs to spend time figuring out how to split work into small chunks for the GPU, in order not to take too much GPU time at once. That increases the complexity of GPU code and relies on good behavior from other GPU apps. Hopefully, the next AMD 'unification' after HSA will be 'preemptive multitasking' ;p
  • tcube - Thursday, January 16, 2014

    Preemption and dynamic context switching are said to come with the Excavator core / Carrizo APU. And they do have the toolset for HSA/HSAIL; just look it up on AMD's site. Bolt, I think it's called; it is a C++ library.

    Furthermore, Project Sumatra will make Java execute on the GPU: at first via an OpenCL wrapper, then via HSA, and in the end the JVM itself will do it for you via HSA. Oracle is pretty committed to this.
  • kazriko - Thursday, January 30, 2014

    I think where multiple GPU and Dual Graphics setups will really shine is when we start getting more Mantle applications. With Mantle, each GPU in the system can be controlled independently, so developers could put GPGPU processes that benefit from low latency to the CPU on the APU's built-in GPU, and graphics rendering, which doesn't need such low latency, on the discrete graphics card.

    Preemption would be interesting, but I'm not sure how game-changing it would be once you get into HSA's juggling of tasks back and forth between different processors. Right now, they could already multitask by having several queues feeding the GPU, with several tasks from each queue running across the different CUs on the chip. Not preemptive, but definitely multi-threaded.
  • MaRao - Thursday, January 16, 2014

    Instead, AMD should create new chipsets with dual APU sockets. Two A8-7600 APUs could give tremendous CPU and GPU performance while keeping power usage at 90-100W.
  • PatHeist - Thursday, February 13, 2014

    Making dual socket boards scale well is tremendously complex. You also need to increase things like the CPU cache by a lot. Not to mention that performance would tend to scale very badly with the additional CPU cores for things like gaming.
  • kzac - Monday, February 16, 2015

    Having 2 or more APUs on a logic board would defeat the purpose of having an APU in the first place, which was to eliminate processing being handled by the logic board controller. With dual APU sockets, some controller would need to be interposed to direct work to the APUs, which could create a bottleneck in processing time (clock cycles). This is the very reason multi-core APUs and CPUs exist today.

    It's my expectation that at some point we will start to see much more memory added to the APU to increase throughput. Essentially, think of future APUs becoming a mini computer within; the only current limitations are heat extraction and power consumption.
  • 5thaccount - Tuesday, January 21, 2014

    I'm not so interested in dual graphics... I am really curious to see how it performs as a standard, old-fashioned CPU. You could even bench it with an NVIDIA card. No one seems to be reviewing it as a processor; all reviews review it as an APU. Funny thing is, several people I work with use these, but they all have discrete graphics.
  • geniekid - Tuesday, January 14, 2014

    Nvm. Too early!
