That Darn Memory Bus

Among the entire GTX 600 family, the GTX 660 Ti’s one unique feature is its memory controller layout. NVIDIA built GK104 with 4 memory controllers, each 64 bits wide, giving the entire GPU a combined memory bus width of 256 bits. These memory controllers are tied into the ROPs and L2 cache, with each controller forming part of a ROP partition containing 8 ROPs (or rather 1 ROP unit capable of processing 8 operations), 128KB of L2 cache, and the memory controller. To disable any of those things means taking out a whole ROP partition, which is exactly what NVIDIA has done.

The impact on the ROPs and the L2 cache is rather straightforward – render operation throughput is reduced by 25% and there’s 25% less L2 cache to store data in – but the loss of the memory controller is a much tougher concept to deal with. This goes for both NVIDIA on the design end and for consumers on the usage end.

256 is a nice power-of-two number. For video cards with power-of-two memory bus widths, it’s very easy to equip them with a similarly power-of-two memory capacity such as 1GB, 2GB, or 4GB of memory. For various minor technical reasons (mostly the sanity of the engineers), GPU manufacturers like sticking to power-of-two memory buses. And while this is by no means a hard design constraint in video card manufacturing, there are ramifications for deviating from it.

The biggest consequence of deviating from a power-of-two memory bus is that under normal circumstances this leads to a card’s memory capacity not lining up with the bulk of the cards on the market. To use the GTX 500 series as an example, NVIDIA had 1.5GB of memory on the GTX 580 at a time when the common Radeon HD 5870 had 1GB, giving NVIDIA a 512MB advantage. Later on, however, the common Radeon HD 6970 had 2GB of memory, leaving NVIDIA behind by 512MB. This also had one additional consequence for NVIDIA: they needed 12 memory chips where AMD needed 8, which generally inflates the bill of materials more than the price of higher speed memory in a narrower design does. This ended up not being a problem for the GTX 580 since 1.5GB was still plenty of memory for 2010/2011 and the high price tag could easily absorb the BoM hit, but this is not always the case.
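The 12-vs-8 chip count follows directly from GDDR5’s 32-bit device interface, something a quick sketch can illustrate (the chip width is a property of GDDR5; the bus widths are the GTX 580’s 384 bits and the Radeon HD 6970’s 256 bits):

```python
# GDDR5 memory chips expose a 32-bit interface, so the minimum number of
# chips needed to populate a bus scales directly with the bus width.
GDDR5_CHIP_WIDTH_BITS = 32

def min_chip_count(bus_width_bits):
    """Minimum GDDR5 chips needed to fill a memory bus of the given width."""
    return bus_width_bits // GDDR5_CHIP_WIDTH_BITS

print(min_chip_count(384))  # 12 chips (GTX 580's 384bit bus)
print(min_chip_count(256))  # 8 chips (Radeon HD 6970's 256bit bus)
```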

Because NVIDIA has disabled a ROP partition on GK104 in order to make the GTX 660 Ti, they’re dropping from a power-of-two 256bit bus to an off-size 192bit bus. Under normal circumstances this means that they’d need to either reduce the amount of memory on the card from 2GB to 1.5GB, or double it to 3GB. The former is undesirable for competitive reasons (AMD has 2GB cards below the 660 Ti and 3GB cards above) not to mention the fact that 1.5GB is too small for a $300 card in 2012. The latter on the other hand incurs the BoM hit as NVIDIA moves from 8 memory chips to 12 memory chips, a scenario that the lower margin GTX 660 Ti can’t as easily absorb, not to mention how silly it would be for a GTX 680 to have less memory than a GTX 660 Ti.

Rather than take either of the usual routes, NVIDIA is taking a third route of their own: putting 2GB of memory on the GTX 660 Ti anyhow. By putting more memory on one controller than on the other two – in effect breaking the symmetry of the memory banks – NVIDIA can have 2GB of memory attached to a 192bit memory bus. This is a technique NVIDIA has had available to them for quite some time, but it’s one they rarely pull out and only use when necessary.

We were first introduced to this technique with the GTX 550 Ti in 2011, which had a similarly odd-sized 192bit memory bus. By using a mix of 2Gb and 1Gb modules, NVIDIA could outfit the card with 1GB of memory rather than the 1.5GB/768MB that a 192bit memory bus would typically dictate.

For the GTX 660 Ti in 2012 NVIDIA is once again using their asymmetrical memory technique in order to outfit the card with 2GB of memory on a 192bit bus, but they’re implementing it slightly differently. Whereas the GTX 550 Ti mixed memory chip densities in order to get 1GB out of 6 chips, the GTX 660 Ti will mix up the number of chips attached to each controller in order to get 2GB out of 8 chips. Specifically, there will be 4 chips instead of 2 attached to one of the memory controllers, while the other controllers will continue to have 2 chips each. Doing it this way allows NVIDIA to use the same Hynix 2Gb chips they already use in the rest of the GTX 600 series, with the only high-level difference being the width of the bus connecting them.
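Both cards’ layouts can be sanity-checked with simple arithmetic. A minimal sketch, where each controller is a list of its chips’ densities in gigabits (the per-controller arrangements are assumptions consistent with the descriptions above, not confirmed board layouts):

```python
def total_capacity_mb(controllers):
    """Total VRAM in MB given per-controller chip lists (densities in Gb)."""
    total_gbit = sum(sum(chips) for chips in controllers)
    return total_gbit * 1024 // 8  # gigabits -> megabytes

# GTX 660 Ti: three 64bit controllers, all 2Gb chips, with one controller
# carrying 4 chips instead of 2 (assumed arrangement).
print(total_capacity_mb([[2, 2, 2, 2], [2, 2], [2, 2]]))  # 2048 MB

# GTX 550 Ti: 2 chips per controller, mixing 2Gb and 1Gb densities
# (assumed arrangement).
print(total_capacity_mb([[2, 2], [1, 1], [1, 1]]))        # 1024 MB
```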

Of course at a low level it’s more complex than that. In a symmetrical design with an equal amount of RAM on each controller it’s rather easy to interleave memory operations across all of the controllers, which maximizes the performance of the memory subsystem as a whole. But complete interleaving requires exactly that kind of symmetrical design, which means it isn’t suitable for NVIDIA’s asymmetrical memory designs. Instead NVIDIA must start playing tricks. And when tricks are involved, there’s always a downside.

The best case scenario is always going to be that the entire 192bit bus is in use, with memory operations interleaved across all 3 controllers, giving the card 144GB/sec of memory bandwidth (192bit * 6GHz / 8). But that can only be done for up to 1.5GB of memory; the final 512MB of memory is attached to a single memory controller. This invokes the worst case scenario, where only one 64-bit memory controller is in use, reducing memory bandwidth to a much more modest 48GB/sec.
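The two figures fall out of the standard bandwidth formula, bus width times effective data rate divided by 8 bits per byte:

```python
def bandwidth_gb_s(bus_width_bits, data_rate_ghz):
    """Peak bandwidth in GB/sec for a bus at the given effective data rate."""
    return bus_width_bits * data_rate_ghz / 8

print(bandwidth_gb_s(192, 6))  # 144.0 GB/sec: all three controllers interleaved
print(bandwidth_gb_s(64, 6))   # 48.0 GB/sec: only a single controller in use
```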

How NVIDIA spreads out memory accesses will have a great deal of impact on when we hit these scenarios. In the past we’ve tried to divine how NVIDIA is accomplishing this, but even with the compute capabilities of CUDA, memory appears to be too far abstracted for us to test any specific theories. And because NVIDIA is continuing to label the internal details of their memory bus a competitive advantage, they’re unwilling to share the details of its operation with us. Thus we’re largely dealing with a black box here, one where poking and prodding doesn’t produce much in the way of meaningful results.

As with the GTX 550 Ti, all we can really say at this time is that the performance we get in our benchmarks is the performance we get. Our best guess remains that NVIDIA is interleaving the lower 1.5GB of address space while pushing the last 512MB of address space into the larger memory bank, but we don’t have any hard data to back that up. For most users this shouldn’t be a problem (especially since GK104 is so wishy-washy at compute), but it remains that there’s always a downside to an asymmetrical memory design. With any luck one day we’ll find that downside and be able to better understand the GTX 660 Ti’s performance in the process.
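To make that guess concrete, here is a speculative toy model of the mapping. Everything in it beyond what the text states – the 256-byte stripe granularity and which controller holds the extra memory – is an assumption for illustration, not something NVIDIA has confirmed:

```python
STRIPE_BYTES = 256            # assumed interleave granularity
INTERLEAVED_TOP = 1536 << 20  # lower 1.5GB interleaved across all controllers
NUM_CONTROLLERS = 3
BIG_CONTROLLER = 0            # assumed index of the controller with 1GB attached

def controller_for(addr):
    """Map a physical address to a memory controller under the guessed scheme."""
    if addr < INTERLEAVED_TOP:
        # Lower 1.5GB: stripes rotate across all three controllers.
        return (addr // STRIPE_BYTES) % NUM_CONTROLLERS
    # Top 512MB: served entirely by the oversized bank.
    return BIG_CONTROLLER
```

In this model any working set below 1.5GB enjoys the full 144GB/sec, while accesses above it are serviced by one controller alone at 48GB/sec – which is exactly the cliff the benchmarks would need to expose.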


313 Comments


  • TheJian - Sunday, August 19, 2012 - link

    You forgot to mention the extra heat, noise and watts it takes to do that 1250mhz. Catching NV's cards isn't free in any of these regards as the 7950 BOOST edition already shows vs. even the zotac Amp version of the 660ti.

    AMD fanboys might call these features these days I guess...
  • Galidou - Monday, August 20, 2012 - link

    7950 boost is a reference board; as we said before, 4 out of 18 boards are reference on Newegg. It's unlikely people will buy a reference cooler from AMD because they're plain bad if you overclock. Will people please put the reference design out of the way when speaking about overclocking....

    Noise and temperatures on, for example, MSI's Twin Frozr III or Sapphire's OC edition are A LOT better – those cards stay silent and cool even when fully overclocked....
  • TheJian - Monday, August 20, 2012 - link

    Nearly every card on Newegg is OC'd by the manufacturer, and the top listed boost isn't even the top it can do by default, which is nice.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    The GTX 580 was already way ahead, so it didn't have to catch up to 6970. When OC'ed it was so far ahead it was ridiculous.
    That's why all the amd fans had a hissy fit when the 680 came out - it won on power - frames, features, price - all they could whine about was it wasn't "greatly faster" than last gen (finally admitting of course the 580 STOMPED the pedal to the floor, held it there, and blew the doors off everything before it).

    See, these are the types of bias that always rear their ugly little hate filled heads.
  • Galidou - Thursday, August 23, 2012 - link

    ''ugly little hate filled heads''

    The sentence speaks for itself about who's writing it; you're one of them, sorry. If it wasn't so filled with hate maybe I could let it pass, but.... no.

    It won for the first time on power/performance, let's not act like it has always been like that.... and before, even when they didn't win on that front, you speak like they won on everything forever. To you there's nothing good about AMD, we already know your opinion, you don't have to spread some more hatred on the subject, we know it.

    Just do yourself a favor, stop lacking respect to others for something everyone already knows about you, you hate AMD end of the line.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    You were wrong again and made a completely false comparison, and got called on it.
    Should I just call you blabbering idiot instead of hate filled biased amd fanboy ?
    How about NEITHER, and you keeping the facts straight ?
    HEY !
    That would actually be nice, but you're not nice.
  • Galidou - Tuesday, August 21, 2012 - link

    You didn't get the point of the memory controller/memory quantity they said here, poor newbie, let me explain things for you. On GCN there are six 64-bit memory buses; now divide 3GB of memory by 6, bingo, 512MB per controller. Now take 2GB of memory and divide it by the 3 memory buses of the 660 Ti (192bit) – ohhh, you can't do it evenly. What that means is that two of the 64-bit buses each take care of 512MB while the third bus takes care of 2x512MB.

    Asymmetrical memory may not be that bad, and it isn't a weirdo theory either; my little girl that's 10 years old would understand it if I explained that simple mathematical equation to her. Mr Cogburn the expert with less logic than a 10 year old girl... what are you doing here, you're so good, maybe you should join us at overclock.net and discuss with some pro overclockers and show us the results of your cpu/gpu on nitrogen if you're so pro about hardware.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    Wow, and to think you just said you couldn't support the hatred, but there you go spreading your hatred again.

    The author is WRONG. Be a man and face the facts.
  • Galidou - Thursday, August 23, 2012 - link

    I didn't use a damn word that lacked respect toward you in all my previous posts, unlike you calling me dummy, stupid, and so on in some previous posts. You used words like nazi, evil, and so on against AMD, but I'm the one who spreads the HATE LOL COMON.

    Don't try me, you are the most disrespectful, but still knowledgeable, person I've ever seen in my time. You expect me to stay totally cold all the time in the face of the fact that you diminish and attack people, calling them names because they are fanboys.... COMON.

    I speak once against your logic, but every post you have is full of hatred and attacks on people, but ohhh, when I say something ONCE, it's BAD. COMON.... Well I know it wasn't my best one and I offer you my dearest excuses, I truly am sorry. I was at the end of my rope, you're hard to follow, using everything you can to make us feel like AMD is related to cancer, hell, nazis and such, while I have a 4870 and 6850 crossfire that served me well.

    Imagine someone attacking and diminishing the user of a product because he uses said product, and you're one owner of that said product. Someone lacking respect for YOU in every way because you use that thing and you actually like it and it served you well, you had no problem with it, but still he can't stop and almost tries to make you believe that if you bought it, it's because you are plain stupid. You'd be mad; well, that's how you make most AMD users feel the way you speak of them.

    I'm sorry for being such an ass comparing your logic to my 10 year old daughter's while I know you're more logical than her. But you should really be sorry to any AMD video card owner that reads you, because you really make 'em believe AMD is the devil and their products are worthless while they're not.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    Don't start out with a big lie, and you won't hear from me in the way you don't like to hear or see.
    You've got yourself convinced, you already did your research, you've said so, you've told yourself a pile of lies, WELL KEEP YOUR LIES TO YOURSELF THEN, in your own head, swirling around, instead of laying them on here then making up every excuse when you get called on them !
    Pretty simple dude.
    Here' let me help you, this is you talking:
    " I've been brainwashed at overclockers and all the amd fanboys there have convinced me to "get into OC" and told me a dozen lies, half of which I blindly and unthinkingly repeated here as I attacked the guy who knew what he was talking about and proved me wrong, again and again. I hate him, and want him to do what I say, not what he does. I want to remain wrong and immensley biased for all the wrong reasons, because being an amd fanboy is my fun no matter how many falsehoods I spew that are very easily smashed by someone, whom I claim, doesn't know a thing after they prove me wrong, again and again".
    Hey dude, be an amd fanboy, just don't spew out those completely incorrect falsehoods, that's all. Not that hard is it ?
    LOL
    Yeah, it's really, really hard not to, otherwise, you'd have a heckuva time having a single point.
    I get it. I expect it from you people. You've got no other recourse.
    Honestly it would be better to just say I like AMD and I'm getting it because I like AMD and I don't care about anything but that.
