That Darn Memory Bus

Among the entire GTX 600 family, the GTX 660 Ti’s one unique feature is its memory controller layout. NVIDIA built GK104 with 4 memory controllers, each 64 bits wide, giving the entire GPU a combined memory bus width of 256 bits. These memory controllers are tied into the ROPs and L2 cache, with each controller forming part of a ROP partition containing 8 ROPs (or rather 1 ROP unit capable of processing 8 operations), 128KB of L2 cache, and the memory controller. To disable any of those things means taking out a whole ROP partition, which is exactly what NVIDIA has done.

The impact on the ROPs and the L2 cache is rather straightforward – render operation throughput is reduced by 25% and there's 25% less L2 cache to store data in – but the loss of a memory controller is a much thornier problem. This goes for both NVIDIA on the design end and for consumers on the usage end.
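
To put some numbers to that, here's a quick back-of-envelope sketch using the figures above. The structure is purely illustrative – it reproduces the arithmetic, not NVIDIA's actual hardware organization:

```python
# Back-of-envelope math for GK104's ROP partitions, using the
# figures from the article; the layout here is illustrative only.

FULL_PARTITIONS = 4        # GK104 has 4 ROP partitions
ROPS_PER_PARTITION = 8     # 1 ROP unit processing 8 operations each
L2_KB_PER_PARTITION = 128  # 128KB of L2 cache per partition
MC_WIDTH_BITS = 64         # one 64-bit memory controller per partition

def gpu_totals(partitions):
    return {
        "rops": partitions * ROPS_PER_PARTITION,
        "l2_kb": partitions * L2_KB_PER_PARTITION,
        "bus_bits": partitions * MC_WIDTH_BITS,
    }

print(gpu_totals(4))  # full GK104 (GTX 680): 32 ROPs, 512KB L2, 256-bit bus
print(gpu_totals(3))  # GTX 660 Ti: 24 ROPs, 384KB L2, 192-bit bus (-25% each)
```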

256 is a nice power-of-two number. For video cards with power-of-two memory bus widths, it's very easy to equip them with a similarly power-of-two memory capacity such as 1GB, 2GB, or 4GB of memory. For various minor technical reasons (mostly the sanity of the engineers), GPU manufacturers like sticking to power-of-two memory buses. And while this is by no means a true design constraint in video card manufacturing, there are ramifications for deviating from it.

The biggest consequence of deviating from a power-of-two memory bus is that under normal circumstances a card's memory capacity won't line up with the bulk of the cards on the market. To use the GTX 500 series as an example, the GTX 580's 384-bit bus meant NVIDIA had 1.5GB of memory on that card at a time when the common Radeon HD 5870 had 1GB, giving NVIDIA a 512MB advantage. Later on, however, the common Radeon HD 6970 had 2GB of memory, leaving NVIDIA behind by 512MB. This had one additional consequence for NVIDIA: they needed 12 memory chips where AMD needed 8, which generally inflates the bill of materials more than the higher-speed memory of a narrower design does. This ended up not being a problem for the GTX 580, since 1.5GB was still plenty of memory for 2010/2011 and the high price tag could easily absorb the BoM hit, but this is not always the case.
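
The underlying arithmetic is simple enough to sketch out. Each GDDR5 chip has a 32-bit interface, so the chip count – and with it the capacity options – falls directly out of the bus width. An illustrative snippet reproducing the configurations above:

```python
# Each GDDR5 chip presents a 32-bit interface, so bus width fixes
# the chip count, and chip density then fixes the capacity.

CHIP_INTERFACE_BITS = 32

def memory_config(bus_bits, chip_gbit):
    chips = bus_bits // CHIP_INTERFACE_BITS
    capacity_gb = chips * chip_gbit / 8  # gigabits -> gigabytes
    return chips, capacity_gb

print(memory_config(384, 1))  # GTX 580:        (12, 1.5) -> 12 chips, 1.5GB
print(memory_config(256, 1))  # Radeon HD 5870: (8, 1.0)  -> 8 chips, 1GB
print(memory_config(256, 2))  # Radeon HD 6970: (8, 2.0)  -> 8 chips, 2GB
```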

Because NVIDIA has disabled a ROP partition on GK104 in order to make the GTX 660 Ti, they're dropping from a power-of-two 256-bit bus to an odd-sized 192-bit bus. Under normal circumstances this means that they'd need to either reduce the amount of memory on the card from 2GB to 1.5GB, or double it to 3GB. The former is undesirable for competitive reasons (AMD has 2GB cards below the 660 Ti and 3GB cards above), not to mention the fact that 1.5GB is too small for a $300 card in 2012. The latter, on the other hand, incurs the BoM hit as NVIDIA moves from 8 memory chips to 12 memory chips, a scenario the lower-margin GTX 660 Ti can't as easily absorb – and that's before considering how silly it would be for a GTX 680 to have less memory than a GTX 660 Ti.

Rather than take either of those routes, NVIDIA is taking a third route of their own: putting 2GB of memory on the GTX 660 Ti anyhow. By attaching more memory to one controller than to the other two – in effect breaking the symmetry of the memory banks – NVIDIA can hang 2GB of memory off a 192-bit memory bus. This is a technique NVIDIA has had available for quite some time, but one they rarely pull out, using it only when necessary.

We were first introduced to this technique with the GTX 550 Ti in 2011, which had a similarly unusual 192-bit memory bus. By using a mix of 2Gb and 1Gb modules, NVIDIA could outfit that card with 1GB of memory rather than the 1.5GB or 768MB that a 192-bit memory bus would typically dictate.

For the GTX 660 Ti in 2012, NVIDIA is once again using their asymmetrical memory technique to outfit the card with 2GB of memory on a 192-bit bus, but they're implementing it slightly differently. Whereas the GTX 550 Ti mixed memory chip densities in order to get 1GB out of 6 chips, the GTX 660 Ti mixes the number of chips attached to each controller in order to get 2GB out of 8 chips. Specifically, one memory controller gets 4 chips instead of 2, while the other two controllers keep 2 chips each. Doing it this way allows NVIDIA to use the same Hynix 2Gb chips they already use in the rest of the GTX 600 series, with the only high-level difference being the width of the bus connecting them.
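
The resulting layout is easy enough to sketch with the chip counts above (again, illustrative only):

```python
# The GTX 660 Ti's asymmetric layout as described above: 8 Hynix 2Gb
# chips across three 64-bit controllers, one of them doubled up.

CHIP_GBIT = 2  # 2Gb GDDR5 chips, as on the rest of the GTX 600 series

chips_per_controller = [4, 2, 2]  # one doubled-up controller, two normal

capacities_mb = [n * CHIP_GBIT * 1024 // 8 for n in chips_per_controller]
print(capacities_mb)              # [1024, 512, 512]: 1GB + 512MB + 512MB
print(sum(capacities_mb) / 1024)  # 2.0 -> 2GB total from 8 chips, 192 bits
```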

Of course, at a low level it's more complex than that. In a symmetrical design with an equal amount of RAM on each controller, it's rather easy to interleave memory operations across all of the controllers, which maximizes the performance of the memory subsystem as a whole. However, complete interleaving requires that kind of symmetrical design, which means it isn't suitable for NVIDIA's asymmetrical memory designs. Instead NVIDIA must start playing tricks. And when tricks are involved, there's always a downside.

The best-case scenario is always going to be that the entire 192-bit bus is in use, interleaving a memory operation across all 3 controllers and giving the card 144GB/sec of memory bandwidth (192 bits * 6GHz / 8). But that can only be done for the first 1.5GB of memory; the final 512MB is attached to a single memory controller. That invokes the worst-case scenario, where only one 64-bit memory controller is in use, reducing memory bandwidth to a much more modest 48GB/sec.
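
For reference, a quick sketch spelling out the bandwidth arithmetic:

```python
# Peak bandwidth = bus width (bits) * effective data rate / 8 bits-per-byte.

def bandwidth_gbps(bus_bits, data_rate_ghz):
    return bus_bits * data_rate_ghz / 8

print(bandwidth_gbps(192, 6))  # 144.0 GB/sec: all 3 controllers interleaved
print(bandwidth_gbps(64, 6))   # 48.0 GB/sec: only the doubled-up controller
```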

How NVIDIA spreads out memory accesses will have a great deal of impact on when we hit these scenarios. In the past we've tried to divine how NVIDIA is accomplishing this, but even with CUDA's compute capabilities, memory appears to be too heavily abstracted for us to test any specific theories. And because NVIDIA continues to label the internal details of their memory bus a competitive advantage, they're unwilling to share the details of its operation with us. Thus we're largely dealing with a black box here, one where poking and prodding doesn't produce much in the way of meaningful results.

As with the GTX 550 Ti, all we can really say at this time is that the performance we get in our benchmarks is the performance we get. Our best guess remains that NVIDIA is interleaving the lower 1.5GB of address space while pushing the final 512MB of address space into the larger memory bank, but we don't have any hard data to back that up. For most users this shouldn't be a problem (especially since GK104 is so wishy-washy at compute), but the fact remains that there's always a downside to an asymmetrical memory design. With any luck we'll one day find that downside, and in the process better understand the GTX 660 Ti's performance.
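
To illustrate what that guess would look like in practice, here's a toy model of such a mapping. We want to stress that this is speculation on our part – NVIDIA hasn't disclosed the real mapping, and the 256-byte interleave stride in particular is an arbitrary assumption:

```python
# Toy model of our guessed address mapping: interleave the first 1.5GB
# across all three controllers, then let the final 512MB fall onto the
# extra chips of controller 0. Speculative; the stride is an assumption.

STRIDE = 256                    # assumed interleave granularity in bytes
INTERLEAVED_LIMIT = 1536 << 20  # first 1.5GB is spread across 3 controllers
CONTROLLERS = 3

def controller_for(addr):
    if addr < INTERLEAVED_LIMIT:
        return (addr // STRIDE) % CONTROLLERS  # full 192-bit interleaving
    return 0  # the final 512MB lives on controller 0's doubled-up bank

# Accesses below the 1.5GB mark would enjoy ~144GB/sec; anything above
# would be limited to the ~48GB/sec of a single 64-bit controller.
```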

313 Comments

  • TheJian - Monday, August 20, 2012 - link

    The 660 can go to 1100/1200 as easily as the 7950 gets to 1150 (so another 10% faster). Check the Asus card I linked to before. You'll have a hard time catching the 660 no matter what, and it costs you too, as noted by AnandTech and my comments on watts/cost/heat, etc.

    Memory bandwidth isn't the issue here, and all of it overclocks fairly close. We don't run at 2560x1600; it's not the weakness. That is a misnomer perpetuated by Ryan beating it like a dead horse when only 2% of users use any res above 1920x1200. I just debunked that idea further by showing that even the monitors at Newegg, including the 27-inchers, don't use that res. IE, no, bandwidth isn't the problem. A bad review on Ryan's part, and no conclusion, is the problem. The CORE clock/boost is the thing when it's not a bandwidth issue, and it's already been shown to not be true... LOL, yep, an NVIDIA conspiracy, the minimums were used here too... ROFL. Good luck digging for things wrong with the 660 Ti. Minimums are shown at HardOCP, Guru3D, AnandTech and more. Strange that you even brought this up with no proof.

    The NV cards have only been upped 100MHz, which is about ~10%, not 20 like you say. 915/1114 isn't 20%. You CAN get there, but not in the out-of-box experience. I'd guess nearly all of the memory will hit 6.6GHz; it's common for an overclocked 7970 or a GTX 680 to hit 7+GHz.
  • Galidou - Monday, August 20, 2012 - link

    I said 20% because most of their cards are way above reference clocks; I was just representing the reality, not the reference thingys. When you can buy factory-overclocked cards at the same price, let's say a $10 premium, mentioning the reference clocks is almost... useless. Plus over the internet, 80% of the reviews had factory-overclocked cards, so the performance we see everywhere, the one in everyone's head, is close to what a 20% overclock delivers.

    So in fact there's maybe 10-15% of the juice left for fellow overclockers. I'm estimating; it could be more in the case of better chips. Meanwhile the 7950 as we know it has been reviewed everywhere at its reference clocks/fan, and if you take an aftermarket cooler and get, let's be honest and say 40%, it's far ahead in terms of comparison from the reference reviews we have.

    And again and for the last time, it all depends on the games.
  • Galidou - Monday, August 20, 2012 - link

    When I look at things again and again, the memory bandwidth doesn't seem to be much of a problem. The only games where I can guess it could hurt are any new games that come out with DirectX 11-heavy graphics, something that taxes the cards in every aspect. Other than that, for now, the card doesn't seem to have any weaknesses at all.

    I never thought that for the moment it was a real weakness; the future will tell, but even there, 90% of gamers play at 1080p or less, and 80% of that 90% pay less than $150 for their video cards. For those paying more, it all depends on the chosen side, the games they play, overclocking or not, and the money they want to spend.

    Take overclocking out of the equation and Nvidia wins almost everything by a good margin. Anyone playing at 1080p won't be disappointed by any $200+ card unless they're inclined to play everything on ultra with 8x MSAA.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    You're going to have a CRAP experience and stuttering junk on your Eyefinity setup in between crashes.
    Come back and apologize to me, and then TheJian can hang his head and tell you he tried to warn you.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    Here's the WARNING for you again, with the 660 Ti STOMPING your dreamy 7950 into the turf in Skyrim at 2560x1080:

    http://www.bit-tech.net/hardware/2012/08/16/nvidia...
  • RussianSensation - Thursday, August 16, 2012 - link

    Hey Ryan,

    In Shogun 2 and Batman: AC at 1080p, almost none of the new cards are being stressed. I think you should increase the quality to Ultra for the new 2-3GB generation of cards even if the <1.5GB VRAM cards suffer, and bump AA to 8x in Batman. Otherwise all the cards pass these benchmarks without a problem. Same with Skyrim: maybe think about adding heavy mods OR at least testing that game with SSAA or 8xAA. Even the 6970 is getting >83 fps. Maybe you can start thinking of replacing some of these games; they just aren't very demanding for the new generation of cards anymore.
  • Ryan Smith - Saturday, August 18, 2012 - link

    Russian, it's unlikely that we'll ever bump AA up to 8x. I hate jaggies, but all 8x AA does is superficially slow things down; the quality improvement is negligible. If 4x MSAA doesn't get rid of jaggies in a game, then the problem isn't MSAA.

    This is also why we use SSAA on Portal 2: high-end cards are fast enough to use SSAA at a reasonable speed. Ultimately many of these games will get replaced in the next benchmark refresh, but if we need to throw up extra roadblocks in the future, it will be in the form of TrSSAA/AAA or SSAA, just like we did with Portal 2.
  • Biorganic - Saturday, August 18, 2012 - link

    I was speaking a bit about both. The article insinuates that the 660 Ti is on the same performance level as the 7950. The obvious caveat to your results is that it is ridiculously easy to overclock the 7950 by 35-45%, and GCN performance scales pretty well with clock increases. It should be noted in the article that the performance of an overclocked 7950 is beyond what the 660 Ti can attain, unless you guys can OC a 660 Ti sample by 30% or more.
  • CeriseCogburn - Sunday, August 19, 2012 - link

    Is this the exact same way we recommended the GTX 460 reviews? With some supermassive OC in the reviews, so we could really see what the great GTX 460 could do?
    NO>>>>>>>
    The EXACT OPPOSITE occurred here, by all of your type of people.
    Did we demand the 560 Ti be OC'ed to show how it surpasses the AMD series? NOPE.
    Did we go on and on about how massive the GTX 580's gains were with OC, even though it was already far, far ahead of all the AMD cards with its very low core clocks? NOPE - here we heard power whines.
    Did we just complain that the GTX 680 is not even in the review while the 7970 is?
    Nope.
    How about the GTX 470 or 480? Very low clocks; where were all of you then, demanding they be OC'ed because they gained massively...?
    Huh, where were you?
  • Galidou - Sunday, August 19, 2012 - link

    Performance scales pretty well with both designs, but AMD is just a little better at overclocking because the base clock seems terribly underclocked. It just feels like that, and it must be due to power constraints and noise on reference designs.
