Our own Ryan Smith pointed me at an excellent thread on Beyond3D where forum member yuri ran across a reference to additional memory controllers in AMD's recently released Kaveri APU. AMD's latest BIOS and Kernel Developer's Guide (BKDG 3.00) for Kaveri includes a reference to four DRAM controllers (DCT0 - 3) with only two in use (DCT0 and DCT3). The same guide also references a Gddr5Mode option for each DRAM controller.

Let me be very clear here: there's no chance that the recently launched Kaveri will be capable of GDDR5 or 4 x 64-bit memory operation (the Socket-FM2+ pin-out alone would be an obvious limitation), but it's very possible that there were plans for one (or both) of those things in an AMD APU. Memory bandwidth can be a huge limit on scaling processor graphics performance, especially since the GPU has to share its limited bandwidth to main memory with a handful of CPU cores. Intel's workaround with Haswell was to pair its highest-end graphics configuration (Iris Pro) with 128MB of on-package eDRAM. AMD has typically shied away from more exotic solutions, leaving the launched Kaveri looking pretty normal on the memory bandwidth front.

In our Kaveri review, we asked whether any of you would be interested in a big Kaveri option with 12 - 20 CUs (768 - 1280 SPs) enabled, basically a high-end variant of the Xbox One or PS4 SoC. AMD would need a substantial increase in memory bandwidth to make such a thing feasible, but based on AMD's own docs it looks like that may not be too difficult to get.
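The CU-to-shader arithmetic above can be sketched as follows, assuming 64 stream processors per GCN compute unit (which matches the 12 CUs = 768 SPs and 20 CUs = 1280 SPs figures) and one fused multiply-add (2 FLOPs) per SP per clock. The 800 MHz GPU clock is a hypothetical value chosen only for illustration, not an AMD spec.

```python
# Sketch: shader count and peak single-precision throughput for a
# hypothetical "big Kaveri" GPU. Assumes 64 SPs per GCN CU and one
# FMA (2 FLOPs) per SP per clock; the clock is an illustrative guess.

SPS_PER_CU = 64          # GCN: 4 SIMD16 units per compute unit
FLOPS_PER_SP_CLOCK = 2   # one fused multiply-add per clock


def stream_processors(cus: int) -> int:
    """Number of stream processors for a given CU count."""
    return cus * SPS_PER_CU


def peak_gflops(cus: int, clock_mhz: float) -> float:
    """Peak single-precision GFLOPS at the given GPU clock."""
    return stream_processors(cus) * FLOPS_PER_SP_CLOCK * clock_mhz / 1000.0


if __name__ == "__main__":
    ASSUMED_CLOCK_MHZ = 800.0  # hypothetical clock, not an AMD spec
    for cus in (12, 16, 20):
        print(f"{cus} CUs: {stream_processors(cus)} SPs, "
              f"~{peak_gflops(cus, ASSUMED_CLOCK_MHZ):.0f} GFLOPS "
              f"at {ASSUMED_CLOCK_MHZ:.0f} MHz")
```

Peak FLOPS scales linearly with CU count, which is why a 20-CU part would need far more bandwidth than today's 128-bit interface provides.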

There were rumors a while back of Kaveri using GDDR5 on a stick, but it looks like nothing ever came of that. The options for a higher-end Kaveri APU would have to be:

1) 256-bit wide DDR3 interface with standard DIMM slots, or
2) 256-bit wide GDDR5 interface with memory soldered down on the motherboard
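To put the two options in perspective, peak theoretical bandwidth is simply the bus width in bytes times the transfer rate. The sketch below assumes DDR3-2133 (the fastest speed Kaveri officially supports) for both DDR3 cases and uses 5500 MT/s purely as an illustrative GDDR5 data rate; real-world sustained bandwidth will be lower than these peaks.

```python
# Sketch: peak theoretical DRAM bandwidth = (bus width / 8) * transfer rate.
# Data rates are illustrative assumptions, not product specs.

def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Peak bandwidth in GB/s for a bus width (bits) and rate (MT/s)."""
    return bus_width_bits / 8 * transfer_rate_mts / 1000.0


CONFIGS = [
    ("128-bit DDR3-2133 (launched Kaveri)", 128, 2133),
    ("256-bit DDR3-2133 (option 1)", 256, 2133),
    ("256-bit GDDR5 @ 5500 MT/s (option 2)", 256, 5500),
]

if __name__ == "__main__":
    for name, width, rate in CONFIGS:
        print(f"{name}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```

Doubling the bus width at the same DDR3 speed doubles peak bandwidth; GDDR5 multiplies it again through its much higher per-pin data rate.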

I do wonder whether AMD would consider the first option combined with some high-speed memory on-die (similar to the Xbox One SoC).

All of this is an interesting academic exercise though, which brings me back to our original question from the Kaveri review. If you had the building blocks AMD has (Steamroller cores and GCN CUs) and the potential for a wider memory interface, would you try building a high-end APU for the desktop? If so, what would you build and why?

I know I'd be interested in a 2-module Steamroller CPU + 20 CUs with a 256-bit wide DDR3 interface, assuming AMD could stick some high-bandwidth memory on-die as well. More or less a high-end version of the Xbox One SoC. Such a thing would interest me, but I'm not sure if anyone would buy it. Leave your thoughts in the comments below; I'm sure some important folks will get to read them :)


  • ericore - Saturday, January 18, 2014

    Quad is a little overkill; even a tri-channel memory controller by default for FM2+ would have been the most intelligent option, with the best balance between money and performance. As for Kaveri's configuration, there is always a split: some people prefer more CPU, a few others prefer more GPU. If AMD wants to be intelligent about market saturation, it needs to cater to both crowds in its APU designs and make some of them more CPU-focused. What AMD doesn't seem to realize is that its APU is fundamentally nothing more than a compromise; it's not excellent at anything, just good at everything. The onboard graphics are still nowhere near where they need to be to forget about discrete solutions. 20 CUs with a better memory controller would definitely change that. HSA is another reason not to beef up the GPU on the APU: HSA has not taken off, so it makes more sense for AMD to add more CPU power until there is an HSA bottleneck. They are approaching the desktop from the wrong perspective. On laptops their strategy is on point, but it's completely the opposite for desktops. Personally, I'm thinking of investing in a G2140, since it's much faster than Kaveri in archiving, encoding, etc. I will then possibly buy a midrange Pirate Islands GPU, depending on whether Denver competes with Mantle. I suspect Denver will be a complete failure by comparison, because it still has to deal with DirectX, and who knows whether Nvidia even managed to offload any significant CPU time to Denver or whether it is mostly for PhysX. Also, AMD's APUs are way too pricey, so not only is it a compromise, it's an expensive one too: $200 Canadian, where Richland cost $140-145 Canadian.
  • ericore - Saturday, January 18, 2014

    Just so I'm clear, what I'm saying is that on the desktop side their flagship APU should be a 6-core, with whatever die space is left used for the GPU. That would have been a hit if they had released it instead of their current 7850K.
  • flashback_rtk - Saturday, January 18, 2014

    I have a question:

    If AMD were to release a version of Kaveri with L3 cache or quad-channel memory, would they be able to just tweak it and release it under the same platform with another number (like A10-7870K) or would they have to "re-engineer" the chip (like in Trinity>Richland)? As I understand it option 1 is only possible if the existing chip already has those components albeit disabled (I don't know if that's possible for the memory controller anyway), am I right?

    I also think that the APU is a strange proposition (outside laptops). Home office use is best served by Intel CPUs without that graphics horsepower, and there is only a small window where an APU makes sense before considering a dGPU for gaming. (For HTPCs I think that between AMD's and Intel's offerings the choices are fairly diverse, which is a good thing.) Although the chip looks great and like a good foundation to build upon, there are still many checkboxes unticked:

    1. Good Linux drivers: I use Linux for everything except gaming and some audio software, my family does as well. Intel is hassle-free. Nvidia gets gaming covered (if you game on Linux). I don't think this is more important than other possible improvements, but cheap beefy AMD+free Linux would be something that a lot of enthusiasts would like to see (for HTPC with native Linux gaming maxed out for example).
    2. Dual Graphics: I remember when I first read about Dual Graphics in Trinity and was very impressed. Then I saw some benchmarks and was disappointed. This may be fixed now with Kaveri (is it?). If Dual Graphics works, the chip becomes much more interesting.
    3. Mantle and TrueAudio: Everything sounds cool in theory, but it hasn't shipped yet and seems to be quite proprietary for now. I hope AMD gets everyone to use this easily and efficiently (easy for developers to add, stable and easy for end users to get performance out of, and non-problematic for non-Radeon users).
    4. L3 Cache: It puzzles me why this chip doesn't have it (may be part of a bigger plan), but it would make sense.
    5. HSA: This seems to be the reason for this chip's existence. But it's also something the end user won't notice or care about until real-world applications use it. The technologies that make up HSA seem well thought out, well designed, and a very good architecture for building a very wide range of systems. I really hope AMD pushes very hard with it and that other HSA Foundation members release products built on HSA.

    This is it, I think, from least to most important (even though I believe that each one is equally important). As everyone says, this whole APU thing is like an eternal promise. In this regard Kaveri feels rushed (I don't mean incomplete or buggy) but necessary. I am very interested to see what AMD's short-term plans are for FM2+ and Kaveri. For the long term, I hope they succeed with HSA like they did with x64 (I think that, theoretically, HSA's benefits would show up sooner than 64-bit's did).

    I may not be right in everything since I am guessing a lot of things I don't know, but just wanted to give my 2 cents.
  • ericore - Saturday, January 18, 2014

    Cache is used to keep the CPU fed.
    Intel may require L3 cache due to lower latency (greater bandwidth, greater buffer requirement).
    Kaveri has much higher latency, more or less voiding the need for an L3 cache.
    But on-die DRAM for graphics and HSA would make a lot of sense, though it is expensive.
    On-die cache is expensive.

  • flashback_rtk - Monday, January 20, 2014

    OK, but if they were to add an L3, they would have to redesign the chip and call it something else, right?
  • alwayssts - Saturday, January 18, 2014

    If you go purely on the GPU alone, it would seem the 7850K would need somewhere around 128-bit DDR3-2600 if it were simply a GPU without whatever benefits the CPU cache brings, and of course that does not count whatever bandwidth the CPU needs. That also assumes the design is under full load. Past AMD designs were clearly designed toward certain DDR3 speeds (above those listed in the processor spec) and toward peak bandwidth for the GPU alone, so perhaps that is a somewhat safe formula, assuming they did their homework and it makes sense for realistic loads on the whole chip, counting internal cache.
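    The DDR3-2600 estimate can be reproduced with the roughly 56.25 GB/s-per-TFLOPS rule of thumb given later in this comment. The sketch below is illustrative only: it assumes the A10-7850K GPU has 512 SPs at 720 MHz and does 2 FLOPs per SP per clock, and it ignores CPU traffic, as the comment does.

```python
# Sketch of the commenter's heuristic: ~56.25 GB/s of bandwidth per TFLOPS.
# GPU specs below are assumptions (A10-7850K: 512 SPs @ 720 MHz).

GBS_PER_TFLOPS = 56.25  # commenter's rule of thumb


def gpu_tflops(sps: int, clock_mhz: float) -> float:
    """Peak single-precision TFLOPS (2 FLOPs per SP per clock)."""
    return sps * 2 * clock_mhz / 1e6


def required_ddr_rate_mts(tflops: float, bus_width_bits: int) -> float:
    """Transfer rate (MT/s) needed to meet the heuristic on a given bus."""
    needed_gbs = tflops * GBS_PER_TFLOPS
    return needed_gbs * 1000.0 / (bus_width_bits / 8)


if __name__ == "__main__":
    tf = gpu_tflops(512, 720)                # ~0.737 TFLOPS
    rate = required_ddr_rate_mts(tf, 128)    # dual-channel (128-bit) DDR3
    print(f"{tf:.3f} TFLOPS -> {tf * GBS_PER_TFLOPS:.1f} GB/s "
          f"-> DDR3-{rate:.0f} on 128 bits")
```

    Under these assumptions the heuristic lands at about 2592 MT/s, which is where the "around DDR3-2600" figure comes from.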

    What a 256-bit bus would afford AMD is the ability to take their most efficient GPU design (outside of some aspects of Hawaii), Bonaire, with its 1x16 ROP array and 14 CUs (896 SPs): a GPU that could actually compete with the Xbox One/PS4 and play games at decent settings (especially in CrossFire), and run it at realistic clocks with cheap memory. DDR3-2400 kits can be had cheaply, and that is clearly where they should aim.

    I've always used a bandwidth rule of thumb of roughly 56.25 GB/s per 1 TFLOPS of GPU throughput for scaling, and up to this point it has always worked to show both tangible results and bottlenecks.

    If you take 256 bits × 2400 MT/s / 8 = 76.8 GB/s, then 76.8 / 56.25 ≈ 1.365 TFLOPS.
    1365.33 GFLOPS / (896 SPs × 2 FLOPs per clock) ≈ 762 MHz
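    The same rule of thumb, run in reverse, gives the clock a 14-CU (896 SP) GPU could be fed at on a 256-bit DDR3-2400 bus. A minimal sketch, assuming 2 FLOPs per SP per clock and the commenter's 56.25 GB/s-per-TFLOPS ratio:

```python
# Sketch: invert the ~56.25 GB/s-per-TFLOPS heuristic to find the GPU clock
# a given memory bus can "feed". Heuristic and configuration are the
# commenter's; 2 FLOPs/SP/clock assumes one FMA per clock.

GBS_PER_TFLOPS = 56.25


def bandwidth_gbs(bus_width_bits: int, rate_mts: float) -> float:
    """Peak bandwidth in GB/s for a bus width (bits) and rate (MT/s)."""
    return bus_width_bits / 8 * rate_mts / 1000.0


def balanced_clock_mhz(sps: int, bus_width_bits: int, rate_mts: float) -> float:
    """GPU clock (MHz) at which the heuristic's bandwidth budget is met."""
    tflops = bandwidth_gbs(bus_width_bits, rate_mts) / GBS_PER_TFLOPS
    return tflops * 1e6 / (sps * 2)


if __name__ == "__main__":
    bw = bandwidth_gbs(256, 2400)               # 76.8 GB/s
    clk = balanced_clock_mhz(896, 256, 2400)    # ~762 MHz
    print(f"{bw:.1f} GB/s -> {bw / GBS_PER_TFLOPS:.3f} TFLOPS -> {clk:.0f} MHz")
```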

    Those seem like very realistic clock speeds. Even though GF (on 28nm, for instance) is around 10% less efficient per volt than TSMC (the latter being roughly 1 V = 1 GHz on average, perhaps a percent or two lower; take the 1.163 V 7970 GHz Edition, which seems binned to 1150 MHz, or the 7870 at 1.218 V and 1200 MHz), that still leaves them with 764 MHz at the lowest/most efficient voltage of 0.85 V for HPP at GF, or just about perfect for such a design.

    Do I think that is a good idea? Hell yes! Both low-TDP versions and higher-TDP Black Editions (coupled with more voltage, better cooling, and faster memory as it becomes cheaper) could bring AMD back in the game in a big way. Would it not be amusing to say you overclocked your APU to 1029 MHz (1.15-1.175 V, still within the scalability power curve) with 3242 MT/s memory (as DDR3-3000-rated/capable kits may become cheaper) and were equal to a PS4?
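    The overclocking figures are consistent with the same heuristic. This sketch assumes 64 SPs per CU, 2 FLOPs per SP per clock, and a PS4 GPU of 18 CUs at 800 MHz for the comparison; it checks peak FLOPS only, not real game performance.

```python
# Sketch: compare the commenter's overclocked 14-CU APU with the PS4 GPU.
# Assumptions: 64 SPs/CU, 2 FLOPs per SP per clock, PS4 GPU = 18 CUs @ 800 MHz,
# and the commenter's ~56.25 GB/s-per-TFLOPS rule of thumb.

GBS_PER_TFLOPS = 56.25


def tflops(cus: int, clock_mhz: float) -> float:
    """Peak single-precision TFLOPS for a GCN GPU."""
    return cus * 64 * 2 * clock_mhz / 1e6


def heuristic_bandwidth_gbs(tf: float) -> float:
    """Bandwidth (GB/s) the rule of thumb asks for at a given TFLOPS."""
    return tf * GBS_PER_TFLOPS


if __name__ == "__main__":
    apu = tflops(14, 1029)
    ps4 = tflops(18, 800)
    mem = 256 / 8 * 3242 / 1000.0        # 256-bit DDR3 at 3242 MT/s
    print(f"APU {apu:.3f} TF vs PS4 {ps4:.3f} TF")
    print(f"heuristic wants {heuristic_bandwidth_gbs(apu):.1f} GB/s, "
          f"bus gives {mem:.1f} GB/s")
```

    Both peak-FLOPS figures come out at roughly 1.84 TFLOPS, and 3242 MT/s on a 256-bit bus supplies almost exactly the bandwidth the heuristic asks for at 1029 MHz.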

    Not saying that is exactly realistic on 28nm, but rather the core design would be solid for future iterations and there would be useful scalability from stock to overclockers.

    I hope AMD moves forward with this idea.
  • A-time - Sunday, January 19, 2014

    With four memory channels they could split the CPU and GPU and run the GPU faster.
  • tcube - Sunday, January 19, 2014

    Yes! We would like a big APU!!! My big APU choice:

    4 cores with a 1024-bit shared FP unit per module (or alternatively dynamic context switching and possibly no FP at all but more integer pipelines...), a deeper pipeline, and 20-25 CUs, running at 3.5-4 GHz for the CPU and 1-1.2 GHz for the GPU. To power this beast properly, 2-4 GB of on-package stacked RAM, the kind AMD is building with Hynix... this beast could even run without external RAM. Alternatively, I would rather go with 4/6 channels of DDR4 than either DDR3 or GDDR5... we're talking high-end after all, aren't we?

    For said APU I'd gladly pay 600€, no questions asked!! I would even go higher if needed!

    The behemoth I described would be rather big... probably in the 500mm² range on 28nm... but I'd go for broke and use GloFo's 14nm process (which they said they're now taping out on), if viable; that should shrink the die size and also lower power usage to manageable levels... It will be expensive, but such a beast would have around 3 TFLOPS of power, and HSA with preemption would bring the full brunt of the GPU to bear... it would be a valid Knights Landing counter... and a magnificent CPU...
  • ninjaquick - Monday, January 20, 2014

    I'd like to see a 4-Module 24+ CU 256-bit GDDR5 core, with widespread HSA implementation. AMD is really poised to take over if they can drive adoption.
  • Shadowmaster625 - Monday, January 20, 2014

    If AMD could actually get some drivers out there to take advantage of the unique features of the APU, then it would interest me much more. On paper there is at least the potential for 2-5 times the framerates of a discrete GPU with the same specs. But the drivers have to be totally redesigned from the bottom up. I just don't see this rather incompetent AMD laying all this groundwork. I remember back in the day when the SNES was competing against the PC. There was no way you could get SNES-type graphics on a PC in 1992. Just look at the boss fights in Super Contra. You simply could not get that from a 1992 PC game, not even close. There was way too much overhead in the GPU drivers, in the OS, and in just about every other aspect of the platform. It's kind of the same situation today. The entire game engine is designed around the GPU being in its own memory space. It is going to take 100x what AMD has to get developers to dump all they've done and start over.
