That Darn Memory Bus

Among the entire GTX 600 family, the GTX 660 Ti’s one unique feature is its memory controller layout. NVIDIA built GK104 with 4 memory controllers, each 64 bits wide, giving the entire GPU a combined memory bus width of 256 bits. These memory controllers are tied into the ROPs and L2 cache, with each controller forming part of a ROP partition containing 8 ROPs (or rather 1 ROP unit capable of processing 8 operations), 128KB of L2 cache, and the memory controller. To disable any of those things means taking out a whole ROP partition, which is exactly what NVIDIA has done.

The impact on the ROPs and the L2 cache is rather straightforward – render operation throughput is reduced by 25% and there’s 25% less L2 cache to store data in – but the loss of the memory controller is a much tougher concept to deal with. This goes for both NVIDIA on the design end and for consumers on the usage end.
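
To put the partition arithmetic in concrete terms, here is a quick Python sketch that tallies up what a full GK104 and a one-partition-disabled GK104 are left with, using only the figures quoted above; the little data structure is illustrative bookkeeping on our part, not a claim about how GK104 is actually organized internally.

```python
# Rough sketch of the ROP-partition arithmetic described above.
# Per-partition figures (8 ROPs, 128KB of L2, one 64-bit memory controller)
# come from the text; the dataclass is just bookkeeping for illustration.

from dataclasses import dataclass

@dataclass
class RopPartition:
    rops: int = 8            # 1 ROP unit capable of processing 8 operations
    l2_kb: int = 128         # 128KB of L2 cache
    mc_width_bits: int = 64  # one 64-bit memory controller

def gpu_totals(active_partitions: int) -> dict:
    p = RopPartition()
    return {
        "rops": p.rops * active_partitions,
        "l2_kb": p.l2_kb * active_partitions,
        "bus_width_bits": p.mc_width_bits * active_partitions,
    }

print(gpu_totals(4))  # full GK104: 32 ROPs, 512KB L2, 256-bit bus
print(gpu_totals(3))  # one partition disabled: 24 ROPs, 384KB L2, 192-bit bus
```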

256 is a nice power-of-two number. For video cards with power-of-two memory bus widths, it’s very easy to equip them with a matching power-of-two memory capacity such as 1GB, 2GB, or 4GB of memory. For various minor technical reasons (mostly the sanity of the engineers), GPU manufacturers like sticking to power-of-two memory busses. And while this is by no means a true design constraint in video card manufacturing, there are ramifications for deviating from it.

The biggest consequence of deviating from a power-of-two memory bus is that under normal circumstances this leads to a card’s memory capacity not lining up with the bulk of the cards on the market. To use the GTX 500 series as an example, NVIDIA had 1.5GB of memory on the GTX 580 at a time when the common Radeon HD 5870 had 1GB, giving NVIDIA a 512MB advantage. Later on, however, the common Radeon HD 6970 had 2GB of memory, leaving NVIDIA behind by 512MB. This had one additional consequence for NVIDIA: they needed 12 memory chips where AMD needed 8, which generally inflates the bill of materials more than the price of higher speed memory in a narrower design does. This ended up not being a problem for the GTX 580, since 1.5GB was still plenty of memory for 2010/2011 and the high price tag could easily absorb the BoM hit, but this is not always the case.

Because NVIDIA has disabled a ROP partition on GK104 in order to make the GTX 660 Ti, they’re dropping from a power-of-two 256bit bus to an off-size 192bit bus. Under normal circumstances this means they’d need to either reduce the amount of memory on the card from 2GB to 1.5GB, or double it to 3GB. The former is undesirable for competitive reasons (AMD has 2GB cards below the 660 Ti and 3GB cards above), not to mention the fact that 1.5GB is too small for a $300 card in 2012. The latter, on the other hand, incurs the BoM hit as NVIDIA moves from 8 memory chips to 12, a scenario that the lower-margin GTX 660 Ti can’t as easily absorb, to say nothing of how silly it would look for a GTX 680 to have less memory than a GTX 660 Ti.

Rather than take either of the usual routes, NVIDIA is taking a third route of their own: put 2GB of memory on the GTX 660 Ti anyhow. By putting more memory on one controller than on the other two – in effect breaking the symmetry of the memory banks – NVIDIA can have 2GB of memory attached to a 192bit memory bus. This is a technique NVIDIA has had available to them for quite some time, but it’s also one they rarely pull out, using it only when necessary.

We were first introduced to this technique with the GTX 550 Ti in 2011, which had a similarly odd-sized 192bit memory bus. By using a mix of 2Gb and 1Gb modules, NVIDIA could outfit the card with 1GB of memory rather than the 1.5GB or 768MB that a 192bit memory bus would typically dictate.

For the GTX 660 Ti in 2012 NVIDIA is once again going to use their asymmetrical memory technique in order to outfit the card with 2GB of memory on a 192bit bus, but they’re going to implement it slightly differently. Whereas the GTX 550 Ti mixed memory chip density in order to get 1GB out of 6 chips, the GTX 660 Ti will mix up the number of chips attached to each controller in order to get 2GB out of 8 chips. Specifically, there will be 4 chips instead of 2 attached to one of the memory controllers, while the other controllers will continue to have 2 chips each. Doing it this way allows NVIDIA to use the same Hynix 2Gb chips they already use in the rest of the GTX 600 series, with the only high-level difference being the width of the bus connecting them.
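
As a sanity check on the capacity math, the sketch below tallies up both asymmetrical layouts. GDDR5 chips are 32 bits wide, which is why a 64-bit controller normally carries 2 of them; the per-controller chip groupings shown here are our own illustration of the arrangement described above, not a board-level schematic.

```python
# Back-of-the-envelope capacity math for the two asymmetrical layouts.
# Densities: a 1Gb chip holds 128MB, a 2Gb chip holds 256MB.

MB_PER_GB_DENSITY = 128  # one gigabit of density = 128MB of capacity

def capacity_mb(controllers):
    """controllers: one inner list of chip densities (in Gb) per 64-bit controller."""
    return sum(MB_PER_GB_DENSITY * d for chips in controllers for d in chips)

# GTX 550 Ti: mixed densities, 2 chips per controller -> 1GB on a 192bit bus
gtx_550_ti = [[1, 1], [1, 1], [2, 2]]
# GTX 660 Ti: 2Gb chips throughout, but 4 chips on one controller -> 2GB on 192 bits
gtx_660_ti = [[2, 2], [2, 2], [2, 2, 2, 2]]

print(capacity_mb(gtx_550_ti))  # 1024 MB
print(capacity_mb(gtx_660_ti))  # 2048 MB
```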

Of course, at a low level it’s more complex than that. In a symmetrical design with an equal amount of RAM on each controller it’s rather easy to interleave memory operations across all of the controllers, which maximizes the performance of the memory subsystem as a whole. However, complete interleaving requires that kind of symmetrical design, which means it isn’t suitable for NVIDIA’s asymmetrical memory configuration. Instead NVIDIA must start playing tricks. And when tricks are involved, there’s always a downside.

The best case scenario is always going to be that the entire 192bit bus is in use, with memory operations interleaved across all 3 controllers, giving the card 144GB/sec of memory bandwidth (192bit * 6GHz / 8). But that can only be done for up to 1.5GB of memory; the final 512MB is attached to a single memory controller. This invokes the worst case scenario, where only one 64-bit memory controller is in use, reducing memory bandwidth to a much more modest 48GB/sec.
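
For reference, those bandwidth figures are just the usual bits-to-bytes conversion at the GTX 660 Ti’s 6GHz effective GDDR5 data rate, as the short sketch below works out.

```python
# The bandwidth figures quoted above, worked out explicitly.
# 6GHz is the GTX 660 Ti's effective GDDR5 data rate.

def bandwidth_gb_s(bus_width_bits: int, data_rate_ghz: float) -> float:
    return bus_width_bits * data_rate_ghz / 8  # divide by 8 to convert bits to bytes

print(bandwidth_gb_s(192, 6.0))  # 144.0 GB/sec: all 3 controllers interleaved
print(bandwidth_gb_s(64, 6.0))   #  48.0 GB/sec: the lone controller holding the last 512MB
```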

How NVIDIA spreads out memory accesses will have a great deal of impact on when we hit these scenarios. In the past we’ve tried to divine how NVIDIA is accomplishing this, but even with the compute capability of CUDA, memory appears to be too far abstracted for us to test any specific theories. And because NVIDIA continues to label the internal details of their memory bus a competitive advantage, they’re unwilling to share the details of its operation with us. Thus we’re largely dealing with a black box here, one where poking and prodding doesn’t produce much in the way of meaningful results.

As with the GTX 550 Ti, all we can really say at this time is that the performance we get in our benchmarks is the performance we get. Our best guess remains that NVIDIA is interleaving the lower 1.5GB of address space while pushing the last 512MB of address space into the larger memory bank, but we don’t have any hard data to back it up. For most users this shouldn’t be a problem (especially since GK104 is so wishy-washy at compute), but the fact remains that there’s always a downside to an asymmetrical memory design. With any luck we’ll one day find that downside and be able to better understand the GTX 660 Ti’s performance in the process.
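
For what it’s worth, that guess can be written down as a toy address-mapping model like the one below; the 256-byte interleave stride is an arbitrary assumption on our part, and none of this is confirmed by NVIDIA.

```python
# A toy model of the mapping we're guessing at: the lower 1.5GB of address
# space interleaved across all three controllers, with the final 512MB
# mapped entirely to the controller carrying the extra chips. Purely our
# hypothesis; the real hardware's behavior is undisclosed, and the 256-byte
# stride is an arbitrary assumption.

INTERLEAVED_LIMIT = 1536 * 1024 * 1024   # first 1.5GB of address space
STRIDE = 256                             # assumed interleave granularity

def controller_for_address(addr: int) -> int:
    if addr < INTERLEAVED_LIMIT:
        return (addr // STRIDE) % 3      # round-robin across controllers 0-2
    return 2                             # last 512MB lives on the big bank only

print(controller_for_address(0x1000))              # inside the interleaved 1.5GB
print(controller_for_address(1800 * 1024 * 1024))  # in the top 512MB -> controller 2

# Accesses kept below 1.5GB see ~144GB/sec of aggregate bandwidth; anything
# touching the top 512MB is limited to the ~48GB/sec of a single controller.
```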

Comments

  • CeriseCogburn - Sunday, August 19, 2012 - link

    Boy but once we get to talking about the amd laptop APU's - Gee then the gaming sky is the limit and golly it's so, so important to take advantage of the great amd gaming hardware !
  • RussianSensation - Thursday, August 16, 2012 - link

    8800GT can't even play Crysis, Crysis Warhead, Crysis 2, Metro 2033, Shogun 2, Anno 2070, Witcher 2, Batman AC smoothly without resorting to DX9 or having everything set to Medium. It's a good card but new cards are 5x faster.
  • evolucion8 - Thursday, August 16, 2012 - link

    Odd, my laptop has a GTX 560M, which is pretty much a power-optimized GTS 450, and I'm able to play Crysis DX10 on high at 720p without AA. It runs between 23-33fps, which might not seem great but is enough for casual gaming. I wonder how an 8800GT couldn't run that game at least on medium at the same resolution. Regarding other games like Crysis 2 and Batman AC, they only run on DX9 or DX11; Metro 2033 is another story lol
  • Galidou - Saturday, August 18, 2012 - link

    You just said it yourself, 720p, no AA, 23-33 fps in a forum speaking about the GTX 660 Ti, surrounded by people playing mostly at 1080p and above... For me, anything below 40 fps is not super playable, and still it's a lot better when my fps is pegged at 60 with vsync.

    BTW GTS 450 = 9800gtx+ > 8800gt
  • CeriseCogburn - Thursday, August 23, 2012 - link

    Then those terrible fps drops to very low 10 or 0 on amd cards should be bothering you. Do they bother you ?
  • ionis - Thursday, August 16, 2012 - link

    Batman AC, Anno 2070, Hard Reset, and Skyrim are what I've been playing just fine, along with many other modern games. I don't think these games even have a DX 10 option, so of course they'll use DX 9.
  • TheJian - Sunday, August 19, 2012 - link

    He said 1680x1050.
    Metro 2033 at 37fps:
    http://www.tomshardware.com/charts/2011-gaming-gra...
    I wouldn't run a game in DX10 anyway regarding Crysis and warhead (both of which run dx10 or 9...with all from XP running 9, it's not even about the cards, it's about the os). Google Alex St. John and DirectX 10 for his opinion. Which is rather important since he created DirectX. He and Extremetech/Pcmag proved it sucks in I think it was 9 or 11 games, noting screenshots for those who wanted to compare versions of games run under 9 vs 10. Nothing but a performance hit they said.
    http://www.firingsquad.com/hardware/lost_planet_de...
    http://www.extremetech.com/computing/77486-bioshoc...
    http://www.extremetech.com/computing/78788-bioshoc...
    http://www.shacknews.com/article/46338/alex-st-joh...
    Regarding the rest...I'm pretty sure he could run most at his res without resorting to running everything on medium. The 8800GT was a pretty awesome card. My dad still owns his...LOL. With a stroke though, he doesn't play as much so will get my hand-me-down radeon 5850 when I upgrade at Black Friday this year. The 8800GT almost obsoleted the overpriced 8800GTX and 8800Ultra overnight at the time it came out.
    http://www.tomshardware.com/reviews/geforce-8800-g...
    first of many benchmarks in that review.

    I won't argue that newer cards are 5 times faster...But in 1680x1050 I might argue how many times he'll notice. :) My nephew only complained about skyrim at this res on a less potent card. I'd also note ionis would have to spend quite a bit to get 5x that card he already has. My own card was only bought because it was 2x faster than my old card (a duplicate of my dads, which is a faster clocked 8800GT 512MB, Huge MSI copper pipes made it near silent also, the PSU fan was louder). He'd have to spend a few hundred at least. I'm currently waiting for another double of my 5850 at $300 (which I think just got released here :)). But I can wait a few more months for a great deal. So I'm guessing about $300 to beat his old 8800GT by 5x. His gpu may be limited by the cpu at that res quite a bit no matter what he buys at $300. Most running an old 8800gt I'd guess are running older cpu's too. So to see that 5x may require a new cpu in a lot of games (which I'm assuming is his monitor's NATIVE res). But it certainly would allow him to set EVERYTHING gpu wise at max on 1680x1050, of that I wholeheartedly agree. If he witnesses a slowdown at that point it's most likely his CPU :) Nvidia/AMD have really kind of run out of excuses for us to buy new cards right now. Unless you have a 27in (in rare cases 24's at 2560) or above, or multi monitor it's hard to argue for dual cards, or a great card at 2560x1600+. Newegg's 24's are all 1920x1200 (20 models) or 1920x1080 (48 models) for native resolutions out of 68 total :) Those are the RECOMMENDED resolutions for these 68 24in models at newegg Ryan.

    Again, I wonder why Ryan couldn't make a recommendation with just a quick look at resolutions on Newegg's 68 24in monitors showing NONE in native at 2560x1600. Besides the fact that you have to jack all sorts of things around at that res on a 24 in the OS or they're small. 2560x1600 is ideally for 27in+. Raise your hand if you have a 27 or 30in...LOL. The recommendation is EASY at 1920x1200 (the highest native res of ANY 24 on newegg RYAN!). Even the $289 dell UltraSharp U2412M is only 1920x1200. This is a quite expensive 24in (albeit gorgeous). $400 24's on there are still 1920x1080 or 1920x1200. Still can't figure out what to recommend ryan? I don't get it. I'm all for giving AMD help if I can, but get real. The 660TI appears to dominate almost all games at these resolutions.
  • mlb12uk - Thursday, August 16, 2012 - link

    Hi Ryan

    Thanks for the review. I'm looking for a GPU for 1920x1080 to play Skyrim and upcoming mods. I'm looking at the GTX 660 and an HD 7870; both cards have 2GB of memory, which I think should be enough. My question is which would you recommend? The GTX 660 looks good, but the slower memory bandwidth seems to hinder it in certain games that make use of high memory availability (I'm guessing games like Skyrim?).

    What are your thoughts on this please?
  • RussianSensation - Thursday, August 16, 2012 - link

    I think you should be comparing a 660Ti to HD7950. The 7870 can be had for $250 on Newegg. If you plan on overclocking, 7950 is the better card for Skyrim, especially with mods and high AA. While not tested here, once you add mods and crank AA, the 7900 series is much faster than GTX600 in Skyrim:

    7950 800mhz leads GTX660Ti by 24% at 1080P with 8AA with mods in Skyrim:

    http://www.computerbase.de/artikel/grafikkarten/20...

    You can pick up HD7950 MSI Twin Frozr for $317 with 3 free games. It's already preoverclocked to 880mhz and is actually one of the best overclocking 7950s on the market.
  • rarson - Friday, August 17, 2012 - link

    I didn't actually notice your username when I was reading your reply, and was shocked to read that you were actually recommending the 7950... that's when I realized you weren't Ryan.
