That Darn Memory Bus

Among the entire GTX 600 family, the GTX 660 Ti’s one unique feature is its memory controller layout. NVIDIA built GK104 with four 64-bit memory controllers, giving the full GPU a combined memory bus width of 256 bits. These memory controllers are tied into the ROPs and L2 cache, with each controller forming part of a ROP partition containing 8 ROPs (or rather one ROP unit capable of processing 8 operations per clock), 128KB of L2 cache, and the memory controller itself. Disabling any one of those elements means taking out the whole ROP partition, which is exactly what NVIDIA has done.
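The arithmetic of losing a partition can be sketched in a few lines of Python. This is purely a toy model of the resource counts described above, not anything resembling NVIDIA's hardware description:

```python
# GK104's full complement: four identical ROP partitions, each pairing one
# ROP unit (8 ops/clock), 128KB of L2 cache, and a 64-bit memory controller.
PARTITIONS_FULL = 4
ROPS_PER_PARTITION = 8
L2_KB_PER_PARTITION = 128
MC_WIDTH_BITS = 64

def totals(partitions):
    """Aggregate resources for a GK104 with the given partition count."""
    return {
        "rops": partitions * ROPS_PER_PARTITION,
        "l2_kb": partitions * L2_KB_PER_PARTITION,
        "bus_bits": partitions * MC_WIDTH_BITS,
    }

gtx680 = totals(PARTITIONS_FULL)        # 32 ROPs, 512KB L2, 256-bit bus
gtx660ti = totals(PARTITIONS_FULL - 1)  # 24 ROPs, 384KB L2, 192-bit bus
print(gtx680, gtx660ti)
```

Because the three resources live in the same partition, all of them scale down together by exactly one quarter; there is no way to trim only the memory controller.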

The impact on the ROPs and the L2 cache is rather straightforward – render operation throughput is reduced by 25% and there’s 25% less L2 cache to store data in – but the loss of the memory controller is a much tougher concept to deal with. This goes for both NVIDIA on the design end and for consumers on the usage end.

256 is a nice power-of-two number. Video cards with power-of-two memory bus widths are easy to equip with similarly power-of-two memory capacities such as 1GB, 2GB, or 4GB. For various minor technical reasons (mostly the sanity of the engineers), GPU manufacturers like sticking to power-of-two memory buses. And while this is by no means a hard design constraint in video card manufacturing, there are ramifications for deviating from it.

The biggest consequence of deviating from a power-of-two memory bus is that under normal circumstances a card’s memory capacity won’t line up with the bulk of the cards on the market. To use the GTX 500 series as an example, NVIDIA had 1.5GB of memory on the GTX 580 at a time when the common Radeon HD 5870 had 1GB, giving NVIDIA a 512MB advantage. Later on, however, the common Radeon HD 6970 had 2GB of memory, leaving NVIDIA behind by 512MB. This also had one additional consequence for NVIDIA: they needed 12 memory chips where AMD needed 8, which generally inflates the bill of materials more than the price of higher speed memory in a narrower design does. This ended up not being a problem for the GTX 580, since 1.5GB was still plenty of memory for 2010/2011 and the high price tag could easily absorb the BoM hit, but this is not always the case.

Because NVIDIA has disabled a ROP partition on GK104 in order to make the GTX 660 Ti, they’re dropping from a power-of-two 256bit bus to an off-size 192bit bus. Under normal circumstances this means that they’d need to either reduce the amount of memory on the card from 2GB to 1.5GB, or double it to 3GB. The former is undesirable for competitive reasons (AMD has 2GB cards below the 660 Ti and 3GB cards above) not to mention the fact that 1.5GB is too small for a $300 card in 2012. The latter on the other hand incurs the BoM hit as NVIDIA moves from 8 memory chips to 12 memory chips, a scenario that the lower margin GTX 660 Ti can’t as easily absorb, not to mention how silly it would be for a GTX 680 to have less memory than a GTX 660 Ti.

Rather than take either of the usual routes, NVIDIA is taking a third route: put 2GB of memory on the GTX 660 Ti anyhow. By putting more memory on one controller than on the other two – in effect breaking the symmetry of the memory banks – NVIDIA can attach 2GB of memory to a 192bit memory bus. This is a technique NVIDIA has had available to them for quite some time, but it’s one they rarely pull out, using it only when necessary.

We were first introduced to this technique with the GTX 550 Ti in 2011, which had a similarly odd 192bit memory bus. By using a mix of 2Gb and 1Gb modules, NVIDIA could outfit the card with 1GB of memory rather than the 1.5GB or 768MB that a 192bit memory bus would typically dictate.

For the GTX 660 Ti in 2012, NVIDIA is once again using their asymmetrical memory technique to pair 2GB of memory with a 192bit bus, but they’re implementing it slightly differently. Whereas the GTX 550 Ti mixed memory chip densities to get 1GB out of 6 chips, the GTX 660 Ti mixes the number of chips attached to each controller to get 2GB out of 8 chips. Specifically, one memory controller has 4 chips attached to it instead of 2, while the other two controllers have 2 chips each. Doing it this way allows NVIDIA to use the same Hynix 2Gb chips they already use in the rest of the GTX 600 series, with the only high-level difference being the width of the bus connecting them.
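The chip math works out neatly; a quick sketch of the arithmetic, using the per-controller chip counts described above:

```python
# Chip counts per 64-bit controller on the GTX 660 Ti as described:
# one controller carries twice the chips of the other two.
CHIP_DENSITY_GBIT = 2  # Hynix 2Gb GDDR5 modules

chips_per_controller = [2, 2, 4]

total_chips = sum(chips_per_controller)            # 8 chips total
capacity_gb = total_chips * CHIP_DENSITY_GBIT / 8  # gigabits -> gigabytes
print(total_chips, capacity_gb)                    # 8 chips, 2.0GB
```

A symmetric layout would have needed either [2, 2, 2] (1.5GB) or [4, 4, 4] (3GB, and 12 chips); the lopsided [2, 2, 4] arrangement is what lands on exactly 2GB from 8 chips.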

Of course at a low level it’s more complex than that. In a symmetrical design with an equal amount of RAM on each controller, it’s rather easy to interleave memory operations across all of the controllers, which maximizes performance of the memory subsystem as a whole. However complete interleaving requires that kind of symmetrical design, which means it’s not suitable for use with NVIDIA’s asymmetrical memory layout. Instead NVIDIA must start playing tricks. And when tricks are involved, there’s always a downside.
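To illustrate why symmetric interleaving is so effective, here is a toy model of round-robin striping. NVIDIA's actual address mapping is not public; the stripe granularity here is an assumption purely for illustration:

```python
# Toy model of fully symmetric interleaving: consecutive address stripes
# rotate round-robin across the controllers, so sequential traffic keeps
# all of them busy simultaneously.
NUM_CONTROLLERS = 3
STRIPE_BYTES = 256  # assumed stripe granularity; the real value is unknown

def controller_for(address):
    """Map a byte address to the controller that services it."""
    return (address // STRIPE_BYTES) % NUM_CONTROLLERS

# Six sequential stripe-sized accesses rotate 0,1,2,0,1,2 across controllers
hits = [controller_for(a) for a in range(0, 6 * STRIPE_BYTES, STRIPE_BYTES)]
print(hits)
```

With equal capacity behind each controller this mapping works for the whole address space; with 512MB extra behind one controller, it cannot, which is where the tricks come in.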

The best case scenario is always going to be that the entire 192bit bus is in use, interleaving a memory operation across all 3 controllers and giving the card 144GB/sec of memory bandwidth (192bit * 6GHz / 8). But that can only be done for the first 1.5GB of memory; the final 512MB is attached to a single memory controller. Accessing it invokes the worst case scenario, where only one 64-bit memory controller is in use, reducing memory bandwidth to a much more modest 48GB/sec.
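Both figures fall straight out of the bus width and the effective data rate; the arithmetic, restated as a couple of lines of Python:

```python
# Peak bandwidth = bus width (bits) * effective data rate (Gbps per pin) / 8
DATA_RATE_GBPS = 6  # 6GHz effective GDDR5 data rate on the GTX 660 Ti

def bandwidth_gb_s(bus_width_bits):
    return bus_width_bits * DATA_RATE_GBPS / 8  # bits -> bytes

best = bandwidth_gb_s(192)  # all three controllers interleaved: 144.0 GB/sec
worst = bandwidth_gb_s(64)  # only the oversized controller in use: 48.0 GB/sec
print(best, worst)
```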

How NVIDIA spreads out memory accesses will have a great deal of impact on when we hit these scenarios. In the past we’ve tried to divine how NVIDIA is accomplishing this, but even with the compute capabilities of CUDA, memory appears to be too heavily abstracted for us to test any specific theories. And because NVIDIA continues to label the internal details of their memory bus a competitive advantage, they’re unwilling to share the details of its operation with us. Thus we’re largely dealing with a black box here, one where poking and prodding doesn’t produce much in the way of meaningful results.

As with the GTX 550 Ti, all we can really say at this time is that the performance we get in our benchmarks is the performance we get. Our best guess remains that NVIDIA is interleaving the lower 1.5GB of address space while pushing the final 512MB of address space into the larger memory bank, but we don’t have any hard data to back that up. For most users this shouldn’t be a problem (especially since GK104 is so wishy-washy at compute), but it remains that there’s always a downside to an asymmetrical memory design. With any luck we’ll one day find that downside and be able to better understand the GTX 660 Ti’s performance in the process.
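Our best guess, expressed as code. To be clear, this address map is speculation on our part and not confirmed by NVIDIA; it simply combines the two bandwidth regimes above with the guessed 1.5GB boundary:

```python
# Speculative address map: the lower 1.5GB interleaves across all three
# controllers, while the top 512MB lives entirely in the oversized bank.
GB = 1024**3
INTERLEAVED_LIMIT = 3 * GB // 2  # guessed boundary at 1.5GB

def peak_bandwidth_gb_s(address):
    """Peak bandwidth available when streaming from the given address."""
    return 144.0 if address < INTERLEAVED_LIMIT else 48.0

print(peak_bandwidth_gb_s(1 * GB))       # inside the interleaved region
print(peak_bandwidth_gb_s(7 * GB // 4))  # inside the final 512MB
```

If this guess is right, a game's working set only pays the 48GB/sec penalty once it spills past 1.5GB of allocations, which would explain why the downside has been so hard to catch in benchmarks.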


  • CeriseCogburn - Thursday, August 23, 2012 - link

    No contribution there.
  • claysm - Wednesday, August 22, 2012 - link

    The article says that the 660 Ti is an average of 10-15% faster than the 7870, and that's true. But I feel that that average doesn't reflect how close those two cards really are in most games. If you throw out the results for Portal 2 and Battlefield 3 (since they are nVidia blowouts), the 660 Ti is only about 5% faster than the 7870.
    Now obviously you can't just throw those results away because you don't like them, but if you're not playing BF3 or Portal 2, then the 660 Ti and the 7870 are actually very close. And given the recent price drop of the 7870, it would definitely win the price/performance mark.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    At what resolution ?
    Oh, doesn't matter apparently.
  • claysm - Thursday, August 23, 2012 - link

    At every resolution tested. 1680x1050, 1920x1200, and 2560x1600.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    Not true. Nice try though; it was pathetic.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    No PhysX, no adaptive v-sync, inferior 3D, inferior 3 panel gaming, no target frame rate, poorer IQ, the list goes on and on.
    you have to be a fanboy fool to buy amd, and there are a lot of fools around, you being one of them.
  • claysm - Thursday, August 23, 2012 - link

    PhysX is not that great. There is only a single game this year that will have PhysX support, and that is Borderlands 2. Most of the effects that PhysX adds are just smoke and more fluid and cloth dynamics, sometimes a slightly more destructible environment.
    Adaptive V-Sync is cool, I saw a demonstration video of it.
    Inferior 3D is true, although your next point is stupid. AMD's Eyefinity is much better than nVidia Surround.
    I'm not a fanboy. Go to Bench and look at the results; do the math if you want. Barring BF3 and Portal 2, again since they are huge wins for nVidia, every other game on the list is extremely close. Of the 35 benchmarks that were run, it's the 8 from BF3 and Portal 2 that completely skew the average. The 660 Ti is more powerful, but the 7870 is a lot closer to the 660 Ti than the average would lead you to believe.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    Yeah whatever - buy the slow loser without the features, say they don't matter, get the one with crappy drivers, say that doesn't matter.. throw out a few games, say they don't matter, ignore the driver support that goes back to the nVidia 6 series, that doesn't matter, ignore the pathetic release drivers of amd, say that doesn't matter... put in the screwy amd extra download junk for taskbar control in eyefinity, pretend that doesn't matter - no bezel peek pretend that doesn't matter...

    DUDE - WHATEVER ... you're a FOOL to buy amd.
  • Ambilogy - Thursday, August 23, 2012 - link

    "Yeah whatever"

    Essentially, when someone said there's a 5% difference you just 'forget about it, nothing's here to notice, I'm busy trolling'. bullshit

    "- buy the slow loser"

    So for you "slow loser" means a little difference in framerates only at the almighty 19XX x 1XXX res? where everything is playable with other cards also? what about when new titles come and some stuff starts to go wrong with the 660 Ti? you can actually ignore the difference now, and future titles could go better for AMD with OpenCL stuff. You should have said "a little slower now, if we are lucky, still a little slower in the future". bullshit

    "without the features, say they don't matter"

    I don't actually notice PhysX while playing... and... if 2% of people play at very high res, how many do you think play at your marvelous nVidia 3D? bullshit. It's like saying this is bad because only 2% use it, and this is good but the percentage is even less. bullshit

    "get the one with crappy drivers"

    You read that a lot of people had AMD driver issues, nice, like a lot of people also have nVidia driver issues... do you know the percentage of driver failures? The failures stand out only because normally working drivers don't draw attention. That does not mean they're plagued by bugs. bullshit.

    ", say that doesn't matter.. throw out a few games, say they don't matter, ignore the driver support that goes back to the nVidia 6 series, that doesn't matter, ignore the pathetic release drivers of amd, say that doesn't matter... "

    Hey nice! I know how to repeat stuff I've already said without proving anything also! look: bullshit, bullshit, bullshit. The games you think matter can still be played; it's future games that will tax these cards to new limits, then we will see, and if those include OpenCL, where will your god be? "well I could play Battlefield 3 better some time ago, I'm sure these new games don't matter". or maybe a "yeah whatever"? :)

    And I'm tired now. I think this card is a fail; what does it do that cards already didn't do? what market does it cover that was not previously covered?

    OH NO BUT WE HAVE BETTER FPS FOR MAIN RESOLUTIONS
    Well, good luck with that in the future... I'm sure a man will buy a good 7950 with factory oc that will go just about as well, still playable and nice, and when the future comes then what? you can cry, cry hard.

    You cannot accept that your card is:

    1. Easy to equalize in performance, with little performance difference in most games or actually none if OC is considered.
    2. Focused on the marketing of some today games and completely forgot about future, memory bandwidth and so on.
    3. Overly marketised by nvidia.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    You cannot accept (your lies) that your card is:

    1. Easy to equalize in performance, with little performance difference in most games or actually none if OC is considered.

    I don't have a problem with that. 660Ti is hitting 1300+ on core and 7000+ on memory, and so you have a problem with that.
    The general idea you state, though I'M ALL FOR IT MAN!

    A FEW FPS SHOULD NOT BE THE THING YOU FOCUS ON, ESPECIALLY WHEN #1 ! ALL FOR IT ! 100% !

    Thus we get down to the added features- whoops ! nVidia is about 10 ahead on that now. That settles it.
    Hello ? Can YOU accept THAT ?
    It FOLLOWS 100% from your #1
    I'd like an answer about your acceptance level.

    2. Focused on the marketing of some today games and completely forgot about future, memory bandwidth and so on.

    Nope, it's already been proven it's a misnomer. Cores are gone , fps is too, before memory can be used. In the present, a bit faster now, cranked to the max, and FAILING on both sides with CURRENT GAMES - but some fantasy future is viable ? It's already been aborted.
    You need to ACCEPT THAT FACT.
    The other possibility would be driver enhancements, but both sides do that, and usually nvidia does it much better, and SERVICES PAST CARDS all the way back to 6 series AGP so amd loses that battle "years down the road" - dude...
    Accept or not ? Those are current facts.

    3. Overly marketised by nvidia. "

    Okay, so whatever that means... all I see is insane amd fanboyism - that's the PR call of the loser - MARKETING to get their failure hyped - hence we see the mind-infected amd fanboys everywhere; in fact, you probably said that because you have the PR-pumped nVidia hatred.
    Here's an example of "marketised":
    http://www.verdetrol.com/
    ROFL - you're few and far between and your dollars still hard at work.
    AMD adverts your butt in CCC - install and bang - the ads start flowing right onto your CCC screen...
    Is that " Overly marketised" ?

    I'm sorry you're going to have to do much better than that.
