That Darn Memory Bus

Among the entire GTX 600 family, the GTX 660 Ti’s one unique feature is its memory controller layout. NVIDIA built GK104 with 4 memory controllers, each 64 bits wide, giving the entire GPU a combined memory bus width of 256 bits. These memory controllers are tied into the ROPs and L2 cache, with each controller forming part of a ROP partition containing 8 ROPs (or rather 1 ROP unit capable of processing 8 operations), 128KB of L2 cache, and the memory controller. To disable any of those things means taking out a whole ROP partition, which is exactly what NVIDIA has done.

The impact on the ROPs and the L2 cache is rather straightforward – render operation throughput is reduced by 25% and there’s 25% less L2 cache to store data in – but the loss of the memory controller is a much tougher concept to deal with. This goes for both NVIDIA on the design end and for consumers on the usage end.
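To put concrete numbers on that reduction, here's a minimal back-of-envelope sketch in Python using only the figures from the paragraphs above; it models nothing about the actual hardware beyond simple multiplication:

```python
# Per-partition resources on GK104, per the figures above.
ROPS_PER_PARTITION = 8         # one ROP unit processing 8 operations
L2_PER_PARTITION_KB = 128
BUS_PER_PARTITION_BITS = 64    # one 64-bit memory controller

def totals(partitions):
    """Total ROPs, L2 cache (KB), and bus width (bits) for a given partition count."""
    return (partitions * ROPS_PER_PARTITION,
            partitions * L2_PER_PARTITION_KB,
            partitions * BUS_PER_PARTITION_BITS)

full_gk104 = totals(4)   # (32 ROPs, 512 KB L2, 256-bit bus)
gtx_660_ti = totals(3)   # (24 ROPs, 384 KB L2, 192-bit bus)
print(full_gk104, gtx_660_ti)
print(f"reduction: {1 - gtx_660_ti[0] / full_gk104[0]:.0%}")  # 25% across the board
```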

256 is a nice power-of-two number. For video cards with power-of-two memory bus widths, it’s very easy to equip them with a similarly power-of-two memory capacity such as 1GB, 2GB, or 4GB of memory. For various minor technical reasons (mostly the sanity of the engineers), GPU manufacturers like sticking to power-of-two memory buses. And while this is by no means a true design constraint in video card manufacturing, there are ramifications for deviating from it.

The biggest consequence of deviating from a power-of-two memory bus is that under normal circumstances a card’s memory capacity won’t line up with the bulk of the cards on the market. To use the GTX 500 series as an example, NVIDIA had 1.5GB of memory on the GTX 580 at a time when the common Radeon HD 5870 had 1GB, giving NVIDIA a 512MB advantage. Later on, however, the common Radeon HD 6970 had 2GB of memory, leaving NVIDIA behind by 512MB. This also had one additional consequence for NVIDIA: they needed 12 memory chips where AMD needed 8, which generally inflates the bill of materials more than the price of higher speed memory in a narrower design does. This ended up not being a problem for the GTX 580, since 1.5GB was still plenty of memory for 2010/2011 and the high price tag could easily absorb the BoM hit, but this is not always the case.

Because NVIDIA has disabled a ROP partition on GK104 in order to make the GTX 660 Ti, they’re dropping from a power-of-two 256bit bus to an off-size 192bit bus. Under normal circumstances this means they’d need to either reduce the amount of memory on the card from 2GB to 1.5GB, or double it to 3GB. The former is undesirable for competitive reasons (AMD has 2GB cards below the 660 Ti and 3GB cards above), not to mention the fact that 1.5GB is too small for a $300 card in 2012. The latter, on the other hand, incurs the BoM hit as NVIDIA moves from 8 memory chips to 12 memory chips, a scenario the lower-margin GTX 660 Ti can’t as easily absorb, not to mention how silly it would be for a GTX 680 to have less memory than a GTX 660 Ti.
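A quick worked example makes the capacity problem clearer. It assumes standard GDDR5 behavior – each chip has a 32-bit interface, halved to 16 bits in clamshell mode to double the chip count on the same bus – which is our assumption rather than anything NVIDIA has stated, though it matches the chip counts above:

```python
def symmetric_configs(bus_bits, chip_density_gbit):
    """Return (chips, capacity in GB) for normal and clamshell GDDR5 configs."""
    chips = bus_bits // 32                        # one 32-bit chip per lane group
    capacity_gb = chips * chip_density_gbit / 8   # Gbit -> GB
    return (chips, capacity_gb), (chips * 2, capacity_gb * 2)

# 256-bit bus of 2Gb chips (GTX 680): 8 chips -> 2GB, or 16 chips -> 4GB
print(symmetric_configs(256, 2))
# 192-bit bus of 2Gb chips: 6 chips -> 1.5GB, or 12 chips -> 3GB.
# Neither option hits the 2GB NVIDIA wants, hence what follows.
print(symmetric_configs(192, 2))
```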

Rather than take either of the usual routes, NVIDIA is taking a third route of their own: put 2GB of memory on the GTX 660 Ti anyhow. By putting more memory on one controller than the other two – in effect breaking the symmetry of the memory banks – NVIDIA can have 2GB of memory attached to a 192bit memory bus. This is a technique NVIDIA has had available to them for quite some time, but it’s one they rarely pull out, using it only when necessary.

We were first introduced to this technique with the GTX 550 Ti in 2011, which had a similarly odd-sized 192bit memory bus. By using a mix of 2Gb and 1Gb modules, NVIDIA could outfit the card with 1GB of memory rather than the 1.5GB/768MB that a 192bit memory bus would typically dictate.

For the GTX 660 Ti in 2012, NVIDIA is once again going to use their asymmetrical memory technique to outfit the card with 2GB of memory on a 192bit bus, but they’re going to implement it slightly differently. Whereas the GTX 550 Ti mixed memory chip density in order to get 1GB out of 6 chips, the GTX 660 Ti will mix up the number of chips attached to each controller in order to get 2GB out of 8 chips. Specifically, there will be 4 chips instead of 2 attached to one of the memory controllers, while the other controllers will continue to have 2 chips each. Doing it this way allows NVIDIA to use the same Hynix 2Gb chips they already use in the rest of the GTX 600 series, with the only high-level difference being the width of the bus connecting them.
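As a sketch of what that arrangement works out to – with the caveat that NVIDIA hasn’t published the exact topology, so the doubled-up controller presumably running its chips in 16-bit clamshell mode is our inference – the per-controller math is straightforward:

```python
CHIP_MB = 256                        # one 2Gb GDDR5 chip
chips_per_controller = [4, 2, 2]     # one doubled-up controller, per the article

per_controller_mb = [n * CHIP_MB for n in chips_per_controller]
print(per_controller_mb)                    # [1024, 512, 512]
print(sum(per_controller_mb) / 1024, "GB")  # 2.0 GB on a 192-bit bus, 8 chips total
```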

Of course at a low level it’s more complex than that. In a symmetrical design with an equal amount of RAM on each controller, it’s rather easy to interleave memory operations across all of the controllers, which maximizes the performance of the memory subsystem as a whole. However, complete interleaving requires exactly that kind of symmetrical design, which makes it unsuitable for NVIDIA’s asymmetrical memory layout. Instead NVIDIA must start playing tricks. And when tricks are involved, there’s always a downside.

The best case scenario is always going to be that the entire 192bit bus is in use, interleaving a memory operation across all 3 controllers and giving the card 144GB/sec of memory bandwidth (192bit * 6GHz / 8). But that can only be done for the first 1.5GB of memory; the final 512MB is attached to a single memory controller. This invokes the worst case scenario, where only one 64-bit memory controller is in use, reducing memory bandwidth to a much more modest 48GB/sec.
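Both figures come straight from the standard formula – bus width times data rate, divided by 8 bits per byte:

```python
def bandwidth_gb_per_sec(bus_bits, data_rate_ghz):
    """Peak memory bandwidth: bits per transfer * transfers/sec / 8 bits per byte."""
    return bus_bits * data_rate_ghz / 8

print(bandwidth_gb_per_sec(192, 6))  # 144.0 GB/sec: all 3 controllers interleaved
print(bandwidth_gb_per_sec(64, 6))   # 48.0 GB/sec: the last 512MB on 1 controller
```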

How NVIDIA spreads out memory accesses will have a great deal of impact on when we hit these scenarios. In the past we’ve tried to divine how NVIDIA is accomplishing this, but even with the compute capabilities of CUDA, memory appears to be too far abstracted for us to test any specific theories. And because NVIDIA continues to label the internal details of their memory bus a competitive advantage, they’re unwilling to share the details of its operation with us. Thus we’re largely dealing with a black box here, one where poking and prodding doesn’t produce much in the way of meaningful results.

As with the GTX 550 Ti, all we can really say at this time is that the performance we get in our benchmarks is the performance we get. Our best guess remains that NVIDIA is interleaving the lower 1.5GB of address space while pushing the last 512MB of address space into the larger memory bank, but we don’t have any hard data to back it up. For most users this shouldn’t be a problem (especially since GK104 is so wishy-washy at compute), but it remains that there’s always a downside to an asymmetrical memory design. With any luck, one day we’ll find that downside and be able to better understand the GTX 660 Ti’s performance in the process.
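For what it’s worth, that guess can at least be written down. The sketch below is purely hypothetical – NVIDIA has confirmed none of it, and the names and the 256-byte stripe granularity are invented for illustration – but it captures the mapping we suspect: interleave the first 1.5GB across all three controllers, then fall back to the oversized bank alone:

```python
STRIPE = 256                       # assumed interleave granularity, in bytes
INTERLEAVED_SPAN = 1536 * 2**20    # first 1.5GB: striped across all 3 controllers
TOTAL_MEMORY = 2048 * 2**20        # the card's full 2GB

def controller_for(addr):
    """Hypothetical address-to-controller mapping; MC0 owns the oversized bank."""
    if addr < INTERLEAVED_SPAN:
        return (addr // STRIPE) % 3    # round-robin across MC0..MC2: full 144GB/sec
    if addr < TOTAL_MEMORY:
        return 0                       # last 512MB lives only on MC0: 48GB/sec
    raise ValueError("address out of range")

print(controller_for(0x1000))                     # below 1.5GB: interleaved
print(controller_for(INTERLEAVED_SPAN + 0x1000))  # above it: single controller
```

If this is indeed how it works, real-world performance would hinge on how rarely the driver has to place hot data in that final 512MB.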
