AMD Kaveri Docs Reference Quad-Channel Memory Interface, GDDR5 Option
by Anand Lal Shimpi on January 16, 2014 10:51 PM EST
Our own Ryan Smith pointed me at an excellent thread on Beyond3D where forum member yuri ran across a reference to additional memory controllers in AMD's recently released Kaveri APU. AMD's latest BIOS and Kernel Developer's Guide (BKDG 3.00) for Kaveri includes a reference to four DRAM controllers (DCT0 - 3) with only two in use (DCT0 and DCT3). The same guide also references a Gddr5Mode option for each DRAM controller.
Let me be very clear here: there's no chance that the recently launched Kaveri will be capable of GDDR5 or 4 x 64-bit memory operation (Socket-FM2+ pin-out alone would be an obvious limitation), but it's very possible that there were plans for one (or both) of those things in an AMD APU. Memory bandwidth can be a huge limit to scaling processor graphics performance, especially since the GPU has to share its limited bandwidth to main memory with a handful of CPU cores. Intel's workaround with Haswell was to pair it with 128MB of on-package eDRAM. AMD has typically shied away from more exotic solutions, leaving the launched Kaveri looking pretty normal on the memory bandwidth front.
In our Kaveri review, we asked the question whether or not any of you would be interested in a big Kaveri option with 12 - 20 CUs (768 - 1280 SPs) enabled, basically a high-end variant of the Xbox One or PS4 SoC. AMD would need a substantial increase in memory bandwidth to make such a thing feasible, but based on AMD's own docs it looks like that may not be too difficult to get.
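For reference, the CU and SP counts quoted above follow from each GCN compute unit holding 64 stream processors. Here is a minimal sketch of that arithmetic; the function name and the 8 CU baseline (the shipping top-end Kaveri configuration) are my own additions for illustration, not something taken from AMD's docs.

```python
# Each GCN compute unit (CU) contains 64 stream processors (SPs).
SPS_PER_CU = 64


def stream_processors(compute_units: int) -> int:
    """Return the SP count for a given number of GCN CUs."""
    return compute_units * SPS_PER_CU


# 8 CUs is included as a baseline for comparison (an assumption of this
# sketch); 12 and 20 CUs are the "big Kaveri" range from the review.
for cus in (8, 12, 20):
    print(f"{cus} CUs -> {stream_processors(cus)} SPs")
```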
There were rumors a while back of Kaveri using GDDR5 on a stick but it looks like nothing ever came of that. The options for a higher end Kaveri APU would have to be:
1) 256-bit wide DDR3 interface with standard DIMM slots, or
2) 256-bit wide GDDR5 interface with memory soldered down on the motherboard
I do wonder if AMD would consider the first option and toss some high-speed memory on-die (similar to the Xbox One SoC).
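To put rough numbers on the two options, theoretical peak bandwidth is simply bus width in bytes multiplied by the transfer rate. The sketch below assumes DDR3-2133 for the DDR3 cases and a 5500 MT/s data rate for GDDR5; both speed grades are illustrative assumptions on my part, not figures from the BKDG, and real-world throughput will be lower than these peaks.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    bus_width_bits: total interface width, e.g. 128 for 2 x 64-bit channels
    transfer_rate_mts: data rate in mega-transfers per second
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


# Speed grades here are assumed for illustration only.
configs = {
    "128-bit DDR3-2133 (dual-channel baseline)": (128, 2133),
    "256-bit DDR3-2133 (option 1)": (256, 2133),
    "256-bit GDDR5 at 5500 MT/s (option 2)": (256, 5500),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```

Under these assumptions, doubling the DDR3 interface doubles peak bandwidth, while GDDR5 at the same width goes well beyond that, which is why either option would matter for a larger GPU.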
All of this is an interesting academic exercise though, which brings me back to our original question from the Kaveri review. If you had the building blocks AMD has (Steamroller cores and GCN CUs) and the potential for a wider memory interface, would you try building a high-end APU for the desktop? If so, what would you build and why?
I know I'd be interested in a 2-module Steamroller core + 20 CUs with a 256-bit wide DDR3 interface, assuming AMD could stick some high-bandwidth memory on-die as well. More or less a high-end version of the Xbox One SoC. Such a thing would interest me but I'm not sure if anyone would buy it. Leave your thoughts in the comments below, I'm sure some important folks will get to read them :)
127 Comments
frogger4 - Friday, January 17, 2014 - link
Answering the question in the post: Yes, that is something I would buy (if in the market for a new PC or console).
The problem with Kaveri as it is right now is this: What role does it actually fill? It is a launch vehicle for HSA, and is a platform for entry level PC gaming. It does fill that role, but I believe that market is small.
There are better options for a traditional CPU + GPU computer (Intel), there are better options for GPU compute (a dedicated GPU), there are better options for midrange gaming (a console), and there are better options for mobile (Intel being more power efficient).
I see three directions that Kaveri would need to go to be more competitive: More powerful CPU - that doesn't look likely to happen; FX hasn't done well recently. Lower power for ultra mobile - that doesn't look likely to happen while Kaveri is at 28nm and Intel will be launching 14nm this year. But a more powerful GPU with greater memory bandwidth - That could happen! That would be a killer chip for professionals working at 4K resolutions, and it would make console level gaming performance more affordable and compact. A "Steam Machine" needs a one chip solution like the consoles, with the graphics performance of the consoles. HSA support and better-than-Jaguar CPU performance is just a plus!
Cloakstar - Friday, January 17, 2014 - link
The extra 2 memory controllers would be a real performance enabler, even with hUMA. Bank+row interleave on the AMD APUs has boosted performance in CPU+GPU memory starved scenarios by over 50%. 4 DDR3 controllers leading to 8 DDR3 slots would cheaply max 32GB hUMA, and beat 2-controller GDDR5 in bandwidth and latency.
The current APU design would benefit, but the 4 controllers would also enable higher performing, more efficient parts, with either the CPU or GPU doubled. I expect a 4+ CPU module Opteron APU at some point with 4 memory controllers.
I do not expect AMD to release an APU besting the XB1 or PS4 in gaming performance until at least 2015. Large-scale cooperative efforts like those tend to come with short-term noncompete clauses. This would mean doubling the GPU or offering a 4-controller GDDR5 solution are out of the question for this generation.
zachj - Friday, January 17, 2014 - link
As far as I understand it, the x86 instruction set in modern processors is really just a common API for each manufacturer's (AMD/Intel) custom hardware back-ends...if so, why can't AMD use the graphics hardware to offset its FPU deficit? Obviously AMD is expecting software companies to create/modify software to explicitly leverage the APU, but I don't understand why the x86 instruction set "API" doesn't afford them this capability for free; if my standard x86 code is issuing floating point operations, why not simply have the GPU quietly satisfy those operations on the back-end?
If they did that, wouldn't we have an extremely competitive "CPU" product from AMD?
As for memory bandwidth, I'd tend to think that on-package GDDR5 would be a safer bet than on-motherboard; not only does that allow me to take my GDDR5 with me if I buy a new motherboard, but memory bandwidth is always impacted by the length and quality of the traces between it and the memory controller, so on-package ensures those traces won't be a problem.
UtilityMax - Thursday, January 30, 2014 - link
The thing is that GPUs are only good for massively parallel computations. The SIMD instructions could take advantage of the GPU hardware. However, anything that's not done in parallel will be better off just running on the normal CPU cores.
200380051 - Friday, January 17, 2014 - link
BGA Kaveri with 256 bit memory, either onboard GDDR5 or quad-channel DDR3 SODIMMs, on mITX. A sweet, sweeeeet combo.
Also, some love for AM3+ owners?
Impressive chip @ 45w nonetheless. Can't wait to see this in mobile form factors.
Mathos - Friday, January 17, 2014 - link
I kinda figured from slides that I saw on the architecture blocks on other sites, that it having the capability to do quad channel was a given since it has a 256bit memory interface. Also the ability to run the IMC possibly in GDDR5 mode was a given, since it's essentially the same hUMA/HSA design that is in the PS4 which uses GDDR5 memory exclusively.
The limiting factor with Kaveri obviously is the FM2+ socket. They obviously decided to eschew the use of quad channel memory, or on board GDDR5, to maintain backwards compatibility of the FM2+ socket with older gen APUs.
I'd also imagine that may be why they haven't decided to make an FX version of the Steamroller cores, since in order to do so with HSA/hUMA, it would require a completely new socket. Remember we're also supposed to be seeing HSA/hUMA enabled GPUs this year as well. Otherwise it'd be stuck with the same memory and system constraints as previous AM3 CPUs, which historically tend to be bandwidth limited, partially by dual channel memory, and partially by IMC/L3 cache speeds.
I'd also imagine there is a lack of desire to use quad channel memory, even Intel dropped support for mainstream triple channel memory on its i series after the first gen i7's. Since it's a hard sell to convince people to buy ram in 3 stick kits instead of 2 stick. Now you only really see triple channel and quad channel in their server chips. Which is where we'll likely see the quad channel imc come into play for Kaveri, in the server space.
In the end, would I like to see an FX series replacement, even if it's on a new socket, with say 6 Steamroller Cores (3 modules), a decent ipg, and the possibility for triple or quad channel memory? Who the hell wouldn't?
Gotta remember with the GDDR5 thing though. AMD isn't in the position to drive use of a new memory standard these days. Back in the days of the Athlon, and Athlon XP, and Athlon 64 pushing DDR and DDR2 things were different. But, with hUMA allowing them to put GDDR5 on a board, allowing the CPU direct access to it, would be the easiest workaround for them.
toyotabedzrock - Friday, January 17, 2014 - link
Seems more likely this documentation is for the xbox and ps versions of the chip.
oaf_king - Friday, January 17, 2014 - link
if an OEM picked it up as a LAPTOP design it could be pretty excellent. The adjustable TDPs would be perfect because there could be a high-power mode for plugged in/cooling stand and obviously a lower mode for mobility. GDDR5 a little pricier but lower voltage so I'd guess overall pretty good battery. All depends on the GDDR5 latest pricing, but luckily it appears good ole Sony has pushed forward and it should be dropping now to mass production. Way to go, Sony :)
aryonoco - Friday, January 17, 2014 - link
I'd be very interested in a high-end Kaveri. I've pretty much determined that I'll be building/buying a Steam Machine this year both to act as my HTPC (with XBMC) and for occasional gaming. I'm not a big gamer, don't need the best graphics at the best settings, something moderate will be good for me.
Kaveri right now is nearly there but the graphics performance just falls short. If AMD can sell a version of Kaveri with 12 CUs and 256-bit wide DDR3 for about $200 I think it will be the perfect SoC for my Steam Machine.
ravyne - Friday, January 17, 2014 - link
I would love a big Kaveri -- I'd happily buy a 4 module / 16CU / quad-channel APU. I'd be plenty happy if that's what replaced the current FX line even. 8 x86-64 threads, 1024 shaders in the same memory space, and 70+ GB/s between them? Yes, please. I don't care if the TDP is approaching 150w, it'd be worthwhile.