That Darn Memory Bus

Among the entire GTX 600 family, the GTX 660 Ti’s one unique feature is its memory controller layout. NVIDIA built GK104 with 4 memory controllers, each 64 bits wide, giving the entire GPU a combined memory bus width of 256 bits. These memory controllers are tied into the ROPs and L2 cache, with each controller forming part of a ROP partition containing 8 ROPs (or rather 1 ROP unit capable of processing 8 operations), 128KB of L2 cache, and the memory controller. To disable any of those things means taking out a whole ROP partition, which is exactly what NVIDIA has done.
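
To put the math in one place, here's a minimal sketch (in Python, with names of our own invention rather than NVIDIA's) of how those per-partition resources add up, both for the full chip and for the GTX 660 Ti's cut-down configuration:

```python
# Each GK104 ROP partition bundles one 64-bit memory controller,
# one 8-pixel ROP unit, and 128KB of L2 cache.
def totals(partitions):
    return {
        "bus_width_bits": partitions * 64,
        "rops": partitions * 8,
        "l2_cache_kb": partitions * 128,
    }

print(totals(4))  # full GK104:  {'bus_width_bits': 256, 'rops': 32, 'l2_cache_kb': 512}
print(totals(3))  # GTX 660 Ti:  {'bus_width_bits': 192, 'rops': 24, 'l2_cache_kb': 384}
```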

The impact on the ROPs and the L2 cache is rather straightforward – render operation throughput is reduced by 25% and there’s 25% less L2 cache to store data in – but the loss of the memory controller is a much tougher concept to deal with. This goes for both NVIDIA on the design end and for consumers on the usage end.

256 is a nice power-of-two number. For video cards with power-of-two memory bus widths, it's very easy to equip them with a similarly power-of-two memory capacity such as 1GB, 2GB, or 4GB of memory. For various minor technical reasons (mostly the sanity of the engineers), GPU manufacturers like sticking to power-of-two memory buses. And while this is by no means a true design constraint in video card manufacturing, there are ramifications for deviating from it.

The biggest consequence of deviating from a power-of-two memory bus is that under normal circumstances this leads to a card's memory capacity not lining up with the bulk of the cards on the market. To use the GTX 500 series as an example, NVIDIA had 1.5GB of memory on the GTX 580 at a time when the common Radeon HD 5870 had 1GB, giving NVIDIA a 512MB advantage. Later on, however, the common Radeon HD 6970 had 2GB of memory, leaving NVIDIA behind by 512MB. This also had one additional consequence for NVIDIA: they needed 12 memory chips where AMD needed 8, which generally inflates the bill of materials more than the price of higher-speed memory in a narrower design does. This ended up not being a problem for the GTX 580 since 1.5GB was still plenty of memory for 2010/2011 and the high price tag could easily absorb the BoM hit, but this is not always the case.

Because NVIDIA has disabled a ROP partition on GK104 in order to make the GTX 660 Ti, they're dropping from a power-of-two 256bit bus to an off-size 192bit bus. Under normal circumstances this means that they'd need to either reduce the amount of memory on the card from 2GB to 1.5GB, or increase it to 3GB. The former is undesirable for competitive reasons (AMD has 2GB cards below the 660 Ti and 3GB cards above), not to mention the fact that 1.5GB is too small for a $300 card in 2012. The latter on the other hand incurs the BoM hit as NVIDIA moves from 8 memory chips to 12 memory chips, a scenario that the lower-margin GTX 660 Ti can't as easily absorb, not to mention how silly it would be for a GTX 680 to have less memory than a GTX 660 Ti.
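
For reference, the quick math behind those two symmetric options, assuming the 2Gb (256MB) GDDR5 chips that were standard in 2012 and either two chips per 64bit controller or four (doubled up in clamshell mode):

```python
CHIP_MB = 256            # one 2Gb GDDR5 chip
CONTROLLERS = 192 // 64  # three 64-bit controllers on the cut-down bus

print(CONTROLLERS * 2 * CHIP_MB)  # 1536MB: 6 chips, 2 per controller (1.5GB)
print(CONTROLLERS * 4 * CHIP_MB)  # 3072MB: 12 chips, 4 per controller (3GB)
```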

Rather than take the usual route NVIDIA is going to take their own third route: put 2GB of memory on the GTX 660 Ti anyhow. By putting more memory on one controller than the other two – in effect breaking the symmetry of the memory banks – NVIDIA can have 2GB of memory attached to a 192bit memory bus. This is a technique that NVIDIA has had available to them for quite some time, but it's also something they rarely pull out, using it only when necessary.

We were first introduced to this technique with the GTX 550 Ti in 2011, which had a similarly odd-sized 192bit memory bus. By using a mix of 2Gb and 1Gb modules, NVIDIA could outfit the card with 1GB of memory rather than the 1.5GB/768MB that a 192bit memory bus would typically dictate.

For the GTX 660 Ti in 2012 NVIDIA is once again going to use their asymmetrical memory technique in order to outfit the card with 2GB of memory on a 192bit bus, but they're implementing it slightly differently. Whereas the GTX 550 Ti mixed memory chip densities in order to get 1GB out of 6 chips, the GTX 660 Ti will mix up the number of chips attached to each controller in order to get 2GB out of 8 chips. Specifically, there will be 4 chips instead of 2 attached to one of the memory controllers, while the other controllers will continue to have 2 chips. By doing it in this manner, NVIDIA can use the same Hynix 2Gb chips they already use in the rest of the GTX 600 series, with the only high-level difference being the width of the bus connecting them.
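
As a sketch, here is one plausible chip arrangement for each card; the exact per-controller split is our own assumption, but the totals match the shipping configurations:

```python
def capacity_mb(banks):
    # banks: one list of chip sizes (in MB) per 64-bit controller
    return sum(sum(bank) for bank in banks)

gtx_550_ti = [[256, 256], [128, 128], [128, 128]]            # mixed 2Gb/1Gb densities
gtx_660_ti = [[256, 256, 256, 256], [256, 256], [256, 256]]  # all 2Gb, mixed counts

print(capacity_mb(gtx_550_ti))  # 1024MB (1GB from 6 chips)
print(capacity_mb(gtx_660_ti))  # 2048MB (2GB from 8 chips)
```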

Of course at a low level it's more complex than that. In a symmetrical design with an equal amount of RAM on each controller it's rather easy to interleave memory operations across all of the controllers, which maximizes performance of the memory subsystem as a whole. However complete interleaving requires exactly that kind of symmetrical design, which means it isn't suitable for NVIDIA's asymmetrical memory configurations. Instead NVIDIA must start playing tricks. And when tricks are involved, there's always a downside.
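
As a rough illustration of what symmetric interleaving buys you (the stripe size below is made up, since NVIDIA doesn't publish the real granularity), consecutive chunks of the address space simply rotate across the controllers, so any long contiguous access streams through all of them at once:

```python
STRIPE = 256  # bytes per stripe; illustrative only

def controller_for(addr, num_controllers=3):
    return (addr // STRIPE) % num_controllers

# A contiguous 1KB read touches every controller, running at full aggregate speed.
print({controller_for(a) for a in range(0, 1024, STRIPE)})  # {0, 1, 2}
```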

The best case scenario is always going to be that the entire 192bit bus is in use, with a memory operation interleaved across all 3 controllers, giving the card 144GB/sec of memory bandwidth (192bit * 6GHz / 8). But that can only be done for the first 1.5GB of memory; the final 512MB is attached to a single memory controller. That invokes the worst case scenario, where only one 64bit memory controller is in use, reducing memory bandwidth to a much more modest 48GB/sec.
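
Both figures fall out of the same back-of-the-envelope formula:

```python
def bandwidth_gb_per_sec(bus_width_bits, data_rate_gbps=6.0):
    # 6GHz effective GDDR5 moves 6 gigabits per pin per second
    return bus_width_bits * data_rate_gbps / 8  # bits -> bytes

print(bandwidth_gb_per_sec(192))  # 144.0: all three controllers interleaved
print(bandwidth_gb_per_sec(64))   #  48.0: the final 512MB on its lone controller
```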

How NVIDIA spreads out memory accesses will have a great deal of impact on when we hit these scenarios. In the past we've tried to divine how NVIDIA is accomplishing this, but even with the compute capability of CUDA, memory appears to be too far abstracted for us to test any specific theories. And because NVIDIA continues to label the internal details of their memory bus a competitive advantage, they're unwilling to share the details of its operation with us. Thus we're largely dealing with a black box here, one where poking and prodding doesn't produce much in the way of meaningful results.

As with the GTX 550 Ti, all we can really say at this time is that the performance we get in our benchmarks is the performance we get. Our best guess remains that NVIDIA is interleaving the lower 1.5GB of address space while pushing the last 512MB of address space into the larger memory bank, but we don't have any hard data to back it up. For most users this shouldn't be a problem (especially since GK104 is so wishy-washy at compute), but it remains that there's always a downside to an asymmetrical memory design. With any luck, one day we'll find that downside and be able to better understand the GTX 660 Ti's performance in the process.
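
Expressed as a sketch (and to be clear, this mapping is speculation on our part, not something NVIDIA has confirmed), that guess would look something like this:

```python
GB = 1024 ** 3
SYMMETRIC_TOP = 3 * GB // 2  # hypothesized 1.5GB interleaved region

def serving_controllers(addr):
    if addr < SYMMETRIC_TOP:
        return [0, 1, 2]  # interleaved across all three: ~144GB/sec
    return [2]            # the oversized bank alone: ~48GB/sec

print(serving_controllers(1 * GB))       # [0, 1, 2]
print(serving_controllers(7 * GB // 4))  # [2]
```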

Comments

  • TheJian - Monday, August 20, 2012 - link

    Please point me to a 7970 for $360. The cheapest on newegg even after rebate is $410.

    Nice try though. "I'm pretty disappointed". Why? You got a 30in monitor or something? At 1920x1200 this card beats the 7970 ghz edition in a lot of games. :) Skyrim being one and by 10fps right here in this article...LOL.

    Mod to 670 isn't worth it when the shipped cards already beat it (3 of them did here). Remember, you should be looking at 1920x1200 and ignoring Ryan's BS resolution only 2% or less use (it's a decimal point at steampowered hardware survey). If you're not running at 2560x1600 read the article again ignoring Ryan's comments. It's the best card at 1920x1200, regardless of Ryan's stupid page titles "that darned memory"...ROFL. Why? Still tromps everything at 1920x1200...LOL.

    Got anything to say Ryan? Any proof we'll use 2560x1600 in the world? Can you point to anything that says >2% use it? Can you point to a monitor using it that isn't a 27/30in? Raise your hand if you have a 30in...LOL.
  • JarredWalton - Tuesday, August 21, 2012 - link

    http://www.microcenter.com/single_product_results....

    That's at least $20 cheaper than what you state.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    That's a whitebox version card only. LOL
  • TheJian - Friday, August 24, 2012 - link

    And the prices just dropped, so yeah, I should be off by ~20 by now :) White box, as stated. No game. Well, Dirt Showdown don't count, it's rated so low ;)

    But nothing that states my analysis is incorrect. His recommendations were made based on 2560x1600 even though as proven 98% play 1920x1200 or less, and the monitor he pointed me to isn't even sold in the USA. You have to buy it in Korea. With a blank FAQ page, help is blank, no phone and a gmail acct for help. No returns. Are you going to buy one from out of country from a site like that? Nothing I said wasn't true.
  • Mr Perfect - Thursday, August 16, 2012 - link

    I wonder if any board partners will try making the board symmetrical again by pushing it up to 3GB? It's not like the extra RAM would do any good, but if you could keep an already memory-bandwidth-starved card humming along at 144GB/s and prevent it from dropping all the way down to 48GB/s, it might help.
  • CeriseCogburn - Sunday, August 19, 2012 - link

    It doesn't drop to 48GB, that was just the reviewer's little attack.
    You should have noticed the reviewer can't find anything wrong, including sudden loss of bandwidth, in this card, or the prior released nVidia models with a similarly weighted setup.
    The SPECULATION is what the amd fanboys get into, then for a year, or two, or more, they will keep talking about it, with zero evidence, and talk about the future date when it might matter.... or they might "discover" an issue they have desperately been hunting for.
    In the meantime, they'll cover up amd's actual flaws.
    It's like the hallowed and holy of holies amd perfect circle algorithm.
    After years of the candy love for it, it was admitted it had major flaws in game, with disturbing border lines at shader transitions.
    That after the endless praise for the perfect circle algorithm, when we were told - when push came to shove, and only in obscurity - that no in game advantage for it could be found, never mind the endless hours and tests spent searching for that desperately needed big amd fanboy win...
    So that's how it goes here. A huge nVidia advantage is either forgotten about and not mentioned, or actually derided and put down with misinformation and lies, until amd's next release, when it can finally be admitted that amd has had a huge fault in the exact area that was praised, and nVidia has a huge advantage and no fault even though it was criticized, and now it's okay because amd has fixed the problem in the new release... (then you find out the new release didn't really fix the problem, and a new set of spins and half truths starts after a single mention of what went wrong).
    Happened on AA issues here as well. Same thing.
  • JarredWalton - Tuesday, August 21, 2012 - link

    Most games are made to target specific amounts of memory, and often you won't hit the bottlenecks unless you run at higher detail settings. 1920x1200 even with 4xAA isn't likely to hit such limits, which is why the 2560x1600 numbers can tell us a bit more.

    Best case for accessing the full 2GB, NVIDIA would interleave the memory over the three 64-bit connections in a 1:1:2 ratio. That means in aggregate you would typically get 3/4 of the maximum bandwidth once you pass 1.5GB of usage. This would explain why the drop isn't as severe at the final 512MB, but however you want to look at it there is technically a portion of RAM that can only be accessed at 1/3 the speed of the rest of the RAM.

    The better question to ask is: are we not seeing any major differences because NVIDIA masks this, or because the added bandwidth isn't needed by the current crop of games? Probably both are true to varying degrees.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    " GTX 660 Ti and 7950 tied at roughly 67fps. If you want a brief summary of where this is going, there you go. Though the fact that the GTX 660 Ti actually increases its lead at 2560 is unexpected. "

    Theory vs fact.
  • TheJian - Monday, August 20, 2012 - link

    Memory starved at what? NEVER at 1920x1200 or less. Are you running a 30in monitor? All 24in monitors are 1920x1200 or below on newegg (68 of them!). 80% of the 27inchers are also this way on newegg.com. 3GB has been proven useless (well, 4GB was):
    http://www.guru3d.com/article/palit-geforce-gtx-68...

    "The 4GB -- Realistically there was NOT ONE game that we tested that could benefit from the two extra GB's of graphics memory. Even at 2560x1600 (which is a massive 4 Mpixels resolution) there was just no measurable difference."

    "But 2GB really covers 98% of the games in the highest resolutions. "
    Game over even on 2560x1600 for 4GB or 3GB. Ryan is misleading you...Sorry. Though he's talking bandwidth mostly, the point is 98% of us (all 24in and down, most 27in) are running at 1920x1200 or BELOW.
  • Galcobar - Thursday, August 16, 2012 - link

    Was wondering about how the Zotac was altered to stand in as a reference 660 Ti.

    Were the clock speeds and voltages lowered through one of the overclocking programs, or was a reference BIOS flashed onto it? I ask because as I understand NVIDIA's base/boost clock implementation, the base clock is set by the BIOS and is not alterable by outside software.
