That Darn Memory Bus

Among the entire GTX 600 family, the GTX 660 Ti’s one unique feature is its memory controller layout. NVIDIA built GK104 with 4 memory controllers, each 64 bits wide, giving the entire GPU a combined memory bus width of 256 bits. These memory controllers are tied into the ROPs and L2 cache, with each controller forming part of a ROP partition containing 8 ROPs (or rather 1 ROP unit capable of processing 8 operations), 128KB of L2 cache, and the memory controller. To disable any of those things means taking out a whole ROP partition, which is exactly what NVIDIA has done.

The impact on the ROPs and the L2 cache is rather straightforward – render operation throughput is reduced by 25% and there’s 25% less L2 cache to store data in – but the loss of the memory controller is a much tougher concept to deal with. This goes for both NVIDIA on the design end and for consumers on the usage end.
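To put rough numbers on that, here is a quick back-of-the-envelope sketch (illustrative Python, not anything from NVIDIA) of how fusing off one ROP partition trims GK104’s back end; the per-partition figures come straight from the description above.

```python
# Illustrative only: per-partition figures taken from the article text.
ROPS_PER_PARTITION = 8        # 1 ROP unit capable of processing 8 operations
L2_KB_PER_PARTITION = 128     # 128KB of L2 cache per partition
BUS_BITS_PER_PARTITION = 64   # one 64-bit memory controller per partition

def back_end(partitions):
    return {
        "rops": partitions * ROPS_PER_PARTITION,
        "l2_kb": partitions * L2_KB_PER_PARTITION,
        "bus_bits": partitions * BUS_BITS_PER_PARTITION,
    }

print(back_end(4))  # full GK104:  {'rops': 32, 'l2_kb': 512, 'bus_bits': 256}
print(back_end(3))  # GTX 660 Ti: {'rops': 24, 'l2_kb': 384, 'bus_bits': 192} (25% less of each)
```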

256 is a nice power-of-two number. For video cards with power-of-two memory bus widths, it’s very easy to equip them with a similarly power-of-two memory capacity such as 1GB, 2GB, or 4GB of memory. For various minor technical reasons (mostly the sanity of the engineers), GPU manufacturers like sticking to power-of-two memory busses. And while this is by no means a true design constraint in video card manufacturing, there are ramifications for deviating from it.

The biggest consequence of deviating from a power-of-two memory bus is that under normal circumstances this leads to a card’s memory capacity not lining up with the bulk of the cards on the market. To use the GTX 500 series as an example, NVIDIA had 1.5GB of memory on the GTX 580 at a time when the common Radeon HD 5870 had 1GB, giving NVIDIA a 512MB advantage. Later on, however, the common Radeon HD 6970 had 2GB of memory, leaving NVIDIA behind by 512MB. This also had one additional consequence for NVIDIA: they needed 12 memory chips where AMD needed 8, which generally inflates the bill of materials more than paying for higher speed memory on a narrower bus would. This ended up not being a problem for the GTX 580, since 1.5GB was still plenty of memory for 2010/2011 and the high price tag could easily absorb the BoM hit, but this is not always the case.

Because NVIDIA has disabled a ROP partition on GK104 in order to make the GTX 660 Ti, they’re dropping from a power-of-two 256bit bus to an off-size 192bit bus. Under normal circumstances this means that they’d need to either reduce the amount of memory on the card from 2GB to 1.5GB, or double it to 3GB. The former is undesirable for competitive reasons (AMD has 2GB cards below the 660 Ti and 3GB cards above) not to mention the fact that 1.5GB is too small for a $300 card in 2012. The latter on the other hand incurs the BoM hit as NVIDIA moves from 8 memory chips to 12 memory chips, a scenario that the lower margin GTX 660 Ti can’t as easily absorb, not to mention how silly it would be for a GTX 680 to have less memory than a GTX 660 Ti.
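As a sanity check on those capacity options, the arithmetic is simple once you remember that each GDDR5 chip drives a 32-bit slice of the bus. A minimal sketch, assuming 2Gb chips throughout (as used elsewhere in the GTX 600 series):

```python
CHIP_GBIT = 2        # 2Gb GDDR5 chips (assumption: same density everywhere)
CHIP_IO_BITS = 32    # each chip sits on a 32-bit slice of the bus

def natural_capacities_gb(bus_bits):
    chips = bus_bits // CHIP_IO_BITS       # one chip per 32-bit slice
    base = chips * CHIP_GBIT / 8           # gigabits -> gigabytes
    return base, base * 2                  # one or two chips per slice

print(natural_capacities_gb(256))  # (2.0, 4.0): a 256-bit card naturally gets 2GB or 4GB
print(natural_capacities_gb(192))  # (1.5, 3.0): a 192-bit card naturally gets 1.5GB or 3GB
```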

Rather than take either of the usual routes, NVIDIA is taking a third route of their own: put 2GB of memory on the GTX 660 Ti anyhow. By putting more memory on one controller than on the other two – in effect breaking the symmetry of the memory banks – NVIDIA can have 2GB of memory attached to a 192bit memory bus. This is a technique that NVIDIA has had available to them for quite some time, but it’s also one they rarely pull out, using it only when necessary.

We were first introduced to this technique with the GTX 550 Ti in 2011, which had a similarly odd 192bit memory bus. By using a mix of 2Gb and 1Gb modules, NVIDIA could outfit the card with 1GB of memory rather than the 1.5GB/768MB that a 192bit memory bus would typically dictate.

For the GTX 660 Ti in 2012, NVIDIA is once again using their asymmetrical memory technique to outfit the card with 2GB of memory on a 192bit bus, but they’re implementing it slightly differently. Whereas the GTX 550 Ti mixed memory chip densities in order to get 1GB out of 6 chips, the GTX 660 Ti will mix up the number of chips attached to each controller in order to get 2GB out of 8 chips. Specifically, there will be 4 chips instead of 2 attached to one of the memory controllers, while the other controllers will continue to have 2 chips each. Doing it this way allows NVIDIA to use the same Hynix 2Gb chips they already use in the rest of the GTX 600 series, with the only high-level difference being the width of the bus connecting them.
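For what it’s worth, the capacity math behind both asymmetrical layouts works out as below. The per-controller split is our reading of NVIDIA’s description rather than a confirmed board layout, and the GTX 550 Ti density mix shown is just one combination that reaches 1GB:

```python
def capacity_gb(chip_densities_gbit):
    return sum(chip_densities_gbit) / 8

# GTX 550 Ti: six chips on a 192-bit bus, mixing densities to reach 1GB
gtx550ti_chips = [1, 1, 1, 1, 2, 2]        # illustrative 1Gb/2Gb mix
print(capacity_gb(gtx550ti_chips))          # 1.0 GB

# GTX 660 Ti: eight identical 2Gb chips, but 4 of them hang off one controller
chips_per_controller = [4, 2, 2]            # assumed split per 64-bit controller
gtx660ti_chips = [2] * sum(chips_per_controller)
print(capacity_gb(gtx660ti_chips))          # 2.0 GB
print([n * 2 / 8 for n in chips_per_controller])  # [1.0, 0.5, 0.5] GB behind each controller
```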

Of course at a low level it’s more complex than that. In a symmetrical design with an equal amount of RAM on each controller it’s rather easy to interleave memory operations across all of the controllers, which maximizes performance of the memory subsystem as a whole. However, complete interleaving requires exactly that kind of symmetry, which means it isn’t suitable for NVIDIA’s asymmetrical memory design. Instead NVIDIA must start playing tricks. And when tricks are involved, there’s always a downside.
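To illustrate why symmetry matters, here is a toy model of straightforward interleaving; the stripe size and the round-robin scheme are assumptions for the sake of the example, not NVIDIA’s actual hashing:

```python
STRIPE_BYTES = 256  # hypothetical interleave granularity

def controller_for(addr, num_controllers=3):
    # Simple round-robin striping: consecutive stripes rotate across controllers
    return (addr // STRIPE_BYTES) % num_controllers

print([controller_for(a) for a in range(0, 8 * STRIPE_BYTES, STRIPE_BYTES)])
# [0, 1, 2, 0, 1, 2, 0, 1] -> every controller stays equally busy, but only
# if each controller has the same amount of DRAM behind it.
```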

The best case scenario is always going to be that the entire 192bit bus is in use, with a memory operation interleaved across all 3 controllers, giving the card 144GB/sec of memory bandwidth (192bit * 6GHz / 8). But that can only be done for the first 1.5GB of memory; the final 512MB is attached to a single memory controller. This invokes the worst case scenario, where only one 64-bit memory controller is in use, reducing memory bandwidth to a much more modest 48GB/sec.
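Both figures fall straight out of the bus math (bus width times data rate, divided by 8 to convert bits to bytes); a quick worked version:

```python
DATA_RATE_GHZ = 6  # 6GHz effective GDDR5 data rate on the GTX 660 Ti

def bandwidth_gb_per_sec(bus_bits):
    return bus_bits * DATA_RATE_GHZ / 8

print(bandwidth_gb_per_sec(192))  # 144.0 GB/sec: all three controllers interleaved
print(bandwidth_gb_per_sec(64))   #  48.0 GB/sec: the final 512MB on a single controller
```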

How NVIDIA spreads out memory accesses will have a great deal of impact on when we hit these scenarios. In the past we’ve tried to divine how NVIDIA is accomplishing this, but even with the compute capability of CUDA, memory appears to be too far abstracted for us to test any specific theories. And because NVIDIA continues to label the internal details of their memory bus a competitive advantage, they’re unwilling to share the details of its operation with us. Thus we’re largely dealing with a black box here, one where poking and prodding doesn’t produce much in the way of meaningful results.

As with the GTX 550 Ti, all we can really say at this time is that the performance we get in our benchmarks is the performance we get. Our best guess remains that NVIDIA is interleaving the lower 1.5GB of address space while pushing the last 512MB of address space into the larger memory bank, but we don’t have any hard data to back it up. For most users this shouldn’t be a problem (especially since GK104 is so wishy-washy at compute), but it remains that there’s always a downside to an asymmetrical memory design. With any luck one day we’ll find that downside and be able to better understand the GTX 660 Ti’s performance in the process.
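If that guess is right, the mapping would look something like the sketch below. To be clear, this is a speculative model built on the paragraph above, not NVIDIA’s documented behavior, and the stripe size is again an arbitrary assumption:

```python
GB = 1 << 30
STRIPE_BYTES = 256  # arbitrary granularity, as in the earlier sketch

def controller_for(addr):
    if addr < int(1.5 * GB):
        # Fast region: 3-way interleave across all controllers (~144GB/sec)
        return (addr // STRIPE_BYTES) % 3
    # Slow region: only the controller with the extra chips (~48GB/sec)
    return 0

print(controller_for(1 * GB))          # lands on one of controllers 0/1/2
print(controller_for(int(1.75 * GB)))  # always controller 0
```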

313 Comments

  • Galidou - Saturday, August 18, 2012 - link

    That's because Tom didn't use Portal 2 in benches and Nvidia is so gooood at it! Plus, instead of Dirt 3, he used Dirt Showdown and AMD is soooo good at it. So if you don't play Battlefield 3, Dirt 3 and Portal 2, there's a good chance that the 7870 might be better for you considering it will perform equally/very close to the higher priced GTX 660.

    But again, if I were a heavy Battlefield 3/Portal 2 player, the choice is obvious...
  • Galidou - Saturday, August 18, 2012 - link

    Correcting myself: higher priced GTX 660 Ti. But gotta remember at the same time, there's a limited quantity of Borderlands 2 they give away if you buy an Nvidia video card, which should be a testament that their cards perform well with this game, and it's worth $60 so you save that if you ever planned to buy it anyway....
  • CeriseCogburn - Sunday, August 19, 2012 - link

    Said the fella at the site that has been milking crysis one for amd fanboys for how long now, even as Crysis 2 has been out for almost a year now...
    Yeah, sure is all nVidia here (gag)(rolls eyes)(sees the way every review is worded)
  • TheJian - Monday, August 20, 2012 - link

    Obvious for everyone else too. Quit looking at Ryan's comments and 2560x1600, where 98% of us don't run. He bases most of his junk comments and conclusions on 2560x1600...WHAT FOR?
    68 24in monitors on newegg...NOT ONE over 1920x1200.
    52 27in monitors on newegg. 41 at 1920x1200 or less, only 11 at 2560x1440 (NOT 1600). His recommendations and crap comments are about a res you can't even get until you use multimonitor or a 30incher!

    Already proved all the games run better at 1920x1200. See my other post to you...It's a lot more than Battlefield and Portal 2, Dirt 3...Shogun 2, Skyrim, Batman AC, Witcher 2, Battlefield 3 multiplayer, Max Payne 3, Civ5 (landslide again at 1920x1200 here at anandtech). How many more do you need? Don't point me to wins at 2560x1600 either... :) Unless we all are getting 30inchers for free soon (obama handouts?), it doesn't matter.

    What games are OK to test without you calling them biased towards NV?
  • CeriseCogburn - Thursday, August 23, 2012 - link

    I thank you for telling the truth and putting up with the amd fanboys who can't find the truth with the swirling retarded amd fanboy redeye goggles sandblasted to their skulls.
    I really appreciate it as I don't feel like using the truth to refute the loons and taking so many hours to do so and having to rinse and repeat over and over again since nothing sinks into their insane skulls and manning up is something they never can do.
    I do have hope though since a few obviously kept responding to you, and hence likely reading some (ADD and dyslexia may still be a problem by the looks of it though) so maybe in 20 years with a lot of global warming and hence more blood flow to their brains as they run around screaming the end is nigh, the facts everywhere presented over and over again will perhaps start just a tiny bit to begin sinking in to the mush.
    LOL
    I have to say, you definitely deserve a medal for putting up with them, and doing that large a favor pointing out the facts. I appreciate it, as the amd lies are really not friendly to us gamers and especially amd fanboys who always claim they pinch every penny (which is really sad, too).
  • Galidou - Thursday, August 23, 2012 - link

    It often wins at 1920*1080 because the games are cpu limited and Nvidia has an advantage in using fewer cpu resources. It means something else too: if it's cpu limited, the graphics don't push the system enough, and at the same time it means that when graphically intensive new games come out, the cpu will be less in the way. What's bad about buying a video card that already maxes everything at 1080p and will keep doing so in the future because these games just aren't pushing it enough?

    I remember the gtx 580 when it came out, it was running everything in 1920*1080 while the gtx 570 and radeon 6970 were already doing this; still people bought the gtx 580, and now that games are more taxing it's useful at 1080p. But it's obvious the gtx 660 ti is superior in many ways and many games. What I want you two (Cerise and Jian) to understand, well I should just say Jian, since I understood a long time ago that Cerise has a closed mind on the subject, is that AMD has strengths too. It loses overall at 1080p with stock clocked cards, but someone can be happy with a card like that anyway..... All along I've been discussing, I never said Nvidia was bad, and I never dismissed their older gen cards as amazing parts either, while you just continued and tried to make people believe that you'll see AMAZING differences, HUMONGOUS GAINS by buying Nvidia and that AMD is cancer (or at least it looks like that in your eyes).

    It's quite hard for anyone running a 7950 like I now do and my friend does, or like my 6850 crossfire did, my 4870 did, my 8800gt and the gtx 460s I bought building computers for many of my friends do, to understand whatever rabble you might say about such a difference when 90% of their games are pegged at 60fps from high to ultra details. All these graphs, reviews and everything else, they're not reflecting what the average user feels when they play their game.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    " It's quite hard for anyone right now who's running a 7950 like I now do and my friend do "

    Wow, you didn't listen at all amd fanboy. So you have a report on triple mon Skyrim, or ... your mons not working in triple - you need some adapter or something else you have to buy ?

    Let's have your SkyRim loss and crash numbers.... LET'S SEE unplayable 29-34 fps if you're sporting a sandy 2500k

    http://www.bit-tech.net/hardware/2012/08/16/nvidia...

    Wow, cool, you lost.
  • TheJian - Monday, August 20, 2012 - link

    I debunked all this already (see other posts). Besides, they ran all cards at ref speeds...LOL. Bandwidth is NOT an issue where 98% of us run: 1920x1200 or below, even on 27in monitors (only 11 27in at newegg have above 1920x1200, and it's less than tested here at 2560x1440). Ryan misleads over and over in this review as if we run on 30in monitors. WE DON'T. @1920x1200 you won't have memory problems from either side. Not even with msaa/af/fxaa etc. ALL of the 24in monitors at newegg are 1920x1200/1080. NOT ONE at 2560x anything. Only 11 27in are 2560x1440; all 41 others are 1920x1080 (even less taxing than 1920x1200!). Ryan is just trying to stop people from buying Nvidia I guess. I'm not sure why he thinks 2560x1600 is important as I've already shown <2% use it in steampowered's hardware survey, and you basically have to have a special 27 or 30in to run above 1920x1200 native. Raise your hand if you are running in that 2% user group? I don't see many hands...LOL. Also note that of that 2%, most are running multi-monitor & usually multi card setups. But Ryan can't make a recommendation...LOL. He could if he would quit pretending we all use 2560x1600...ROFLMAO. I admit, at that res you MAY run into memory issues (for the 2% that do it).
  • saturn85 - Thursday, August 16, 2012 - link

    nice folding@home benchmark.
  • martyrant - Thursday, August 16, 2012 - link

    Is there anyone out there trying to mod this thing into a 670 yet, if it's all the identical parts with one of the four rop/mem buses disabled? I'd imagine some of these things, even if binned as failed 670s, would most likely have all 4 rop/mem buses functional.

    This would be a pretty sweet upgrade path if so :) Would be the Radeon 6950 all over again (and all the previous generations that were able to either do softmods or if anyone remembers the pencil graphite trick back in the day).

    Thanks for the review, I've been waiting for this one...even though I'm pretty disappointed. The 7970 I've seen on sale for $360 lately and right now it's looking like it's going to be the best bang for your buck. That's cheaper than a 670/680, only slightly more than a 660 Ti, and it's pretty much the performance crown single GPU for the most part, though AMD's drivers lately are scaring me.
