That Darn Memory Bus

Among the entire GTX 600 family, the GTX 660 Ti’s one unique feature is its memory controller layout. NVIDIA built GK104 with 4 memory controllers, each 64 bits wide, giving the entire GPU a combined memory bus width of 256 bits. These memory controllers are tied into the ROPs and L2 cache, with each controller forming part of a ROP partition containing 8 ROPs (or rather 1 ROP unit capable of processing 8 operations), 128KB of L2 cache, and the memory controller. To disable any of those things means taking out a whole ROP partition, which is exactly what NVIDIA has done.

The impact on the ROPs and the L2 cache is rather straightforward – render operation throughput is reduced by 25% and there’s 25% less L2 cache to store data in – but the loss of the memory controller is a much tougher concept to deal with. This goes for both NVIDIA on the design end and for consumers on the usage end.
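
The cost of losing one ROP partition can be tallied in a few lines of Python. This is only a sketch: the per-partition figures (8 ROPs, 128KB of L2, a 64-bit controller) come from the description above, while the `RopPartition` class and `gpu_totals` function are names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RopPartition:
    """One GK104 ROP partition, using the figures quoted in the article."""
    rops: int = 8        # ROP operations per clock
    l2_kb: int = 128     # L2 cache in KB
    bus_bits: int = 64   # width of the attached memory controller


def gpu_totals(active_partitions: int, part: RopPartition = RopPartition()) -> dict:
    """Sum the resources of all enabled partitions (hypothetical helper)."""
    return {
        "rops": active_partitions * part.rops,
        "l2_kb": active_partitions * part.l2_kb,
        "bus_bits": active_partitions * part.bus_bits,
    }


print(gpu_totals(4))  # full GK104, as on the GTX 680
print(gpu_totals(3))  # one partition disabled, as on the GTX 660 Ti
```

With three partitions active, every total drops by the same 25%, which is why the ROPs, the L2 cache, and the bus width cannot be cut independently.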

256 is a nice power-of-two number. For video cards with power-of-two memory bus widths, it’s very easy to equip them with a similarly power-of-two memory capacity such as 1GB, 2GB, or 4GB of memory. For various minor technical reasons (mostly the sanity of the engineers), GPU manufacturers like sticking to power-of-two memory busses. And while this is by no means a true design constraint in video card manufacturing, there are ramifications for straying from it.

The biggest consequence of deviating from a power-of-two memory bus is that under normal circumstances a card’s memory capacity no longer lines up with the bulk of the cards on the market. To use the GTX 500 series as an example, NVIDIA had 1.5GB of memory on the GTX 580 at a time when the common Radeon HD 5870 had 1GB, giving NVIDIA a 512MB advantage. Later on, however, the common Radeon HD 6970 had 2GB of memory, leaving NVIDIA behind by 512MB. This also had one additional consequence for NVIDIA: they needed 12 memory chips where AMD needed 8, which generally inflates the bill of materials more than the price of higher speed memory in a narrower design does. This ended up not being a problem for the GTX 580 since 1.5GB was still plenty of memory for 2010/2011 and the high price tag could easily absorb the BoM hit, but this is not always the case.

Because NVIDIA has disabled a ROP partition on GK104 in order to make the GTX 660 Ti, they’re dropping from a power-of-two 256bit bus to an off-size 192bit bus. Under normal circumstances this means that they’d need to either reduce the amount of memory on the card from 2GB to 1.5GB, or double that to 3GB. The former is undesirable for competitive reasons (AMD has 2GB cards below the 660 Ti and 3GB cards above), not to mention the fact that 1.5GB is too small for a $300 card in 2012. The latter, on the other hand, incurs the BoM hit as NVIDIA moves from 8 memory chips to 12 memory chips, a scenario that the lower margin GTX 660 Ti can’t as easily absorb, not to mention how silly it would be for a GTX 680 to have less memory than a GTX 660 Ti.

Rather than take either of the usual routes, NVIDIA is going to take a third route of their own: put 2GB of memory on the GTX 660 Ti anyhow. By putting more memory on one controller than on the other two – in effect breaking the symmetry of the memory banks – NVIDIA can have 2GB of memory attached to a 192bit memory bus. This is a technique that NVIDIA has had available to them for quite some time, but it’s one they rarely pull out, using it only when necessary.

We were first introduced to this technique with the GTX 550 Ti in 2011, which had a similarly sized 192bit memory bus. By using a mix of 2Gb and 1Gb modules, NVIDIA could outfit the card with 1GB of memory rather than the 1.5GB/768MB that a 192bit memory bus would typically dictate.

For the GTX 660 Ti in 2012 NVIDIA is once again going to use their asymmetrical memory technique in order to outfit the card with 2GB of memory on a 192bit bus, but they’re going to be implementing it slightly differently. Whereas the GTX 550 Ti mixed memory chip density in order to get 1GB out of 6 chips, the GTX 660 Ti will mix up the number of chips attached to each controller in order to get 2GB out of 8 chips. Specifically, there will be 4 chips instead of 2 attached to one of the memory controllers, while the other controllers will continue to have 2 chips. Doing it in this manner allows NVIDIA to use the same Hynix 2Gb chips they already use in the rest of the GTX 600 series, with the only high-level difference being the width of the bus connecting them.
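
The capacity arithmetic behind these layouts can be sketched in Python. Each controller is modeled as a list of chip densities in gigabits; `capacity_gb` is a hypothetical helper, and the exact placement of the 2Gb chips on the GTX 550 Ti is an assumption, since the text only says the card mixed 2Gb and 1Gb modules across 6 chips.

```python
def capacity_gb(controllers):
    """Total capacity in GB, given one list of chip densities (in Gb) per controller."""
    total_gbit = sum(sum(chips) for chips in controllers)
    return total_gbit / 8  # 8 gigabits per gigabyte


# Symmetric layouts: every controller carries the same chips.
gtx_680 = [[2, 2]] * 4              # 256bit, 8 x 2Gb chips
symmetric_192_small = [[2, 2]] * 3  # 192bit, 6 x 2Gb chips
symmetric_192_large = [[2] * 4] * 3 # 192bit, 12 x 2Gb chips

# Asymmetric layouts described in the article.
gtx_660_ti = [[2, 2, 2, 2], [2, 2], [2, 2]]  # 8 x 2Gb, one controller doubled up
gtx_550_ti = [[2, 2], [1, 1], [1, 1]]        # mixed density; placement assumed

for name, layout in [
    ("GTX 680", gtx_680),
    ("192bit, 6 chips", symmetric_192_small),
    ("192bit, 12 chips", symmetric_192_large),
    ("GTX 660 Ti", gtx_660_ti),
    ("GTX 550 Ti", gtx_550_ti),
]:
    print(f"{name}: {capacity_gb(layout)} GB")
```

The two symmetric 192bit options land on 1.5GB and 3GB, and only the uneven layouts reach 2GB (or 1GB on the GTX 550 Ti).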

Of course at a low level it’s more complex than that. In a symmetrical design with an equal amount of RAM on each controller it’s rather easy to interleave memory operations across all of the controllers, which maximizes performance of the memory subsystem as a whole. However, complete interleaving requires that kind of symmetrical design, which means it’s not quite suitable for use on NVIDIA’s asymmetrical memory designs. Instead NVIDIA must start playing tricks. And when tricks are involved, there’s always a downside.

The best case scenario is always going to be that the entire 192bit bus is in use by interleaving a memory operation across all 3 controllers, giving the card 144GB/sec of memory bandwidth (192bit * 6GHz / 8). But that can only be done at up to 1.5GB of memory; the final 512MB of memory is attached to a single memory controller. This invokes the worst case scenario, where only one 64-bit memory controller is in use, reducing memory bandwidth to a much more modest 48GB/sec (64bit * 6GHz / 8).
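
Both figures fall out of the same formula: bus width in bits times the effective data rate, divided by 8 bits per byte. A minimal Python sketch, with `peak_bandwidth_gbs` as an illustrative name:

```python
def peak_bandwidth_gbs(bus_bits: int, data_rate_ghz: float) -> float:
    """Theoretical peak bandwidth in GB/sec for a given bus width and effective data rate."""
    return bus_bits * data_rate_ghz / 8  # divide by 8 to convert bits to bytes


best_case = peak_bandwidth_gbs(192, 6.0)   # all three 64-bit controllers in use
worst_case = peak_bandwidth_gbs(64, 6.0)   # a single 64-bit controller in use

print(f"best case:  {best_case} GB/sec")
print(f"worst case: {worst_case} GB/sec")
```

These are theoretical peaks; real workloads will land somewhere between the two depending on how accesses are spread across the controllers.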

How NVIDIA spreads out memory accesses will have a great deal of impact on when we hit these scenarios. In the past we’ve tried to divine how NVIDIA is accomplishing this, but even with the compute capability of CUDA, memory appears to be too far abstracted for us to test any specific theories. And because NVIDIA is continuing to label the internal details of their memory bus a competitive advantage, they’re unwilling to share the details of its operation with us. Thus we’re largely dealing with a black box here, one where poking and prodding doesn’t produce much in the way of meaningful results.

As with the GTX 550 Ti, all we can really say at this time is that the performance we get in our benchmarks is the performance we get. Our best guess remains that NVIDIA is interleaving the lower 1.5GB of address space while pushing the last 512MB of address space into the larger memory bank, but we don’t have any hard data to back it up. For most users this shouldn’t be a problem (especially since GK104 is so wishy-washy at compute), but it remains that there’s always a downside to an asymmetrical memory design. With any luck one day we’ll find that downside and be able to better understand the GTX 660 Ti’s performance in the process.
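
To make that guess concrete, here is a toy Python model of such an address map. Everything in it is hypothetical: NVIDIA has not disclosed the real scheme, and the 256-byte stride, the controller numbering, and the function names are assumptions chosen only to illustrate the idea.

```python
MIB = 1024 * 1024
INTERLEAVED_BYTES = 1536 * MIB  # lower 1.5GB, striped across all controllers
TOTAL_BYTES = 2048 * MIB        # 2GB card
NUM_CONTROLLERS = 3
STRIDE = 256                    # hypothetical interleave granularity in bytes
BIG_CONTROLLER = 0              # hypothetical index of the controller with 4 chips


def controller_for(addr: int) -> int:
    """Map a physical address to a memory controller under the guessed scheme."""
    if not 0 <= addr < TOTAL_BYTES:
        raise ValueError("address out of range")
    if addr < INTERLEAVED_BYTES:
        # Striped region: consecutive blocks rotate through the controllers.
        return (addr // STRIDE) % NUM_CONTROLLERS
    # Final 512MB: everything lands on the larger memory bank.
    return BIG_CONTROLLER


def controllers_touched(start: int, length: int) -> set:
    """Set of controllers hit by a contiguous access of `length` bytes."""
    first_block = start - (start % STRIDE)
    return {controller_for(a) for a in range(first_block, start + length, STRIDE)}


print(controllers_touched(0, 4096))                  # inside the striped 1.5GB
print(controllers_touched(INTERLEAVED_BYTES, 4096))  # inside the final 512MB
```

In this model a 4KB read from the lower region spreads across all three controllers, while the same read from the last 512MB hits only one, which is the 144GB/sec versus 48GB/sec split in miniature.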


313 Comments


  • blanarahul - Thursday, August 16, 2012 - link

    First! Oh yeah!
  • blanarahul - Thursday, August 16, 2012 - link

    GTX 660 Ti: Designed for overclockers. Overclock memory and thats it.
  • CeriseCogburn - Thursday, August 23, 2012 - link

    The cores are hitting over 1300 consistently. Oh well, buh bye amd.
  • Galidou - Monday, August 27, 2012 - link

    Well it depends on the samples, the 660 ti I bought for my wife, I tested it in my pc and over 1290 core clock(with boost) after 10-15 minutes gaming in a game that doesn't even tax the gpu past 70%, the video card crashes and windows tells me ''the adapter has stopped responding''.

    Crysis 2 stutters on some levels but it's mainly stable 95% of the time whereas my 7950 overclocked is not doing this.

    It would artifact in MSI kombustor with a slight increase in voltage and core clock above 1260. Good thing it's for my wife and not me, she won't overclock as it's way enough for her mere 1080p resolution. The memory overclocks at 6,6ghz easily.
  • GmTrix - Friday, August 17, 2012 - link

    Dear God, Have AnandTech readers really sunk to this level of childishness?
  • Chaitanya - Friday, August 17, 2012 - link

    shocking.
  • CeriseCogburn - Sunday, August 19, 2012 - link

    TXAA - AWESOME - THE JAGGIES ARE GONE.
    Thank you nVidia for having real technology development, unlike amd loser
    Thank you nVidia for being able to mix ram chip sizes or to distribute ram chips across your memory controllers with proprietary technology that you keep secret despite amd fanboys desiring to know how you do it so they can help amd implement for free.
    Thanks also for doing it so well, even with reviewers putting it down and claiming it can result in 48 bandwidth instead of 144 bandwidth, all the games and tests they have ever thrown at it in a desperate amd fanboy desire to find a chink in it's armor has yielded ABSOLUTELY NOTHING, as in, YOU'VE DONE IT PERFECTLY AGAIN nVidia.
    I just love the massive bias at this site.
    It must be their darn memory failing.
    Every time they make a crazy speculative attack here on nVidia where all their rabid research to find some fault provides a big fat goose egg, they try to do it again anyway, and they talk like they'll eventually find something even though they never do. By the time they give up, they're off on some other notional and failed to prove it put down against nVidia.
    192 bit bus / 2GB ram / unequal distribution / PERFECT PERFORMANCE IMPLEMENTATION
    Get used to it.
  • TheJian - Sunday, August 19, 2012 - link

    ROFL... I should have just read more posts...Might have saved me a crapload of typing Cerise...LOL. Nah, it needs to be said more by more than ONE person :) Call a spade a spade people.

    I tried to leave out the word BIAS and RYAN/Anandtech in the same sentence :)

    But hold on a minute, while I fire up my compute crap (or 2008 game rendered moot by it's own 2011+2012 hires patch equivalent) so I can run up my electric bill so I can prove the AMD card wins in something I never intend to use a gaming card for or run at a res that these things aren't being used for by 98% of the people. Folding? You must be kidding. Bitcoin hunting?...LOL that party was over ages ago - you won't pay for your card getting bitcoins today - it was over before anandtech did their article on bitcoins - but I bet they helped sell some AMD cards. Quadro+fireGL cards are for this crap (computational NON game stuff I mean). Recommending cards based on computational crap is pointless when they're for gaming.

    I'm an amd fanboy but ONLY at heart. My wallet wins all arguments regardless of my love for AMD (or my NV stock...LOL). I'm trying to hold out for AMD's next cpu's but I'm heavily leaning Ivy K for Black Friday, fanboy AMD love or not. They ruined their company by paying 3x the price for ATI, which in turn crapped on their stock and degraded their company to near junk bond status in said stock (damn them, I used to be able to ride the rollercoaster and make money on AMD!). I'm still hoping for a trick up their sleeve nobody knows about. But I think they're just holding back cpu's to clear shelves, nothing special in the new ones coming. Basically a sandy to ivy upgrade but on AMD's side for bullsnozer. The problem is it's still going to be behind ivy by 25-50% (in some cases far worse). Unless it's an EXCEPTIONAL price I can't help but pick IVY as I do a lot of rar/par stuff and of course gaming. I'd get hurt way too much by following my heart this round (I had to take xeon e3110 s775 last time for the same reason).

    My planned Black Friday upgrade looks like, X motherboard (too early for a pick or homework not knowing AMD yet), Ivy 3770K (likely) and a 660TI with the highest default clock I can get at a black friday price :) (meaning $299 or under for zotac AMP speeds or better). I already have 16GB ddr3 waiting here...LOL. I ordered it ages ago, figuring it's going to go through the roof at some point (win8? crappy as it is IMHO). I'm only down $10 so far after purchasing mem I think in Jan or so...LOL. In the end I think I'll be up $30-80 at some point (I only paid $75 for 16GB). Got my dad taken care of too, we're both just waiting on black friday and all this 28nm vid card crap to sort out. End of Nov should have some better tsmc cards available (or another fabs chips?). I'm guessing a ton at high clocks by then for under $299.

    Anyway, THANKS for the good laugh :) I needed that after reading my 4th asinine review. Guru3d looking up for the 5th though...LOL. He doesn't seem to care who wins, & caters more to the wallet it seems (great OC stuff there too). He usually doesn't have a ton of cards or chips in each review though, so you have to read more than one product review there to get the picture, but they're good reviews. Hilbert Hagedoorn (sp?) does pretty dang good. By the end of it, I'll have hit everyone I think (worth mentioning, techreport, hardocp, ixbtlabs, hexus etc - sorry if I left a good one out guys). I seem to read 10+ these days before parting with cash. :( I like hardocp for a difference in ideas of benchmarking. He benches and states the HIGHEST PLAYABLE SETTINGS per card. It's a good change IMHO, though I still require all the other reviews for more games etc. I'm just sure to hit him for vidcard reviews just for the settings I can expect to get away with in a few games. I wish guru3d had thrown in an OC'd 660TI into the 7950 boost review since they're so easily had clocked high at $299/309. But one more read gets that picture, or can be drawn by all the asinine reviews and his 7950 boost review...LOL. I have to get through the rest of guru3d, then off to hardocp for the different angle :) Ahh, weekend geek reading galore with two new gpu cards out this week ;)
  • Jorgan22 - Sunday, October 07, 2012 - link

    Review was a good read, glad to see the 660 TI is doing well.

    I have no idea what's up with the comments though, especially you TheJian, you wrote a novel, ending half the paragraphs with "... LOL".

    If you're going to waste so much time doing that, post it in the forums, not in a comment thread where its not going to get read buddy, just hurts you.
  • RussianSensation - Sunday, August 19, 2012 - link

    1) TXAA is a blurry mess. See videos or screenshots. It's an option but let's not try claiming it's some new revolutionary anti-aliasing features.

    Instead HD7950 can actually handle MSAA and mods in Skyrim and Batman AC and not choke.
    http://www.computerbase.de/artikel/grafikkarten/20...

    2) That review left 2 critical aspects out:

    (I) Factory preoverclocked, binned after-market 7950s run cooler, quieter and at way lower voltage than that reference artificially overvolted 7950B card tested in the review (see MSI TwinFrozr 3, Gigabyte Windforce 3x for $320-330 on Newegg).

    (II) Those same after-market 7950s hit 1100-1200mhz on 1.175V or less in our forum. At those speeds, the HD7950 > GTX680/HD7970 Ghz Edition. How is that for value at $320-330?

    The review didn't take into account that you can get way better 7950 cards and they overclock 30-50%, and yet the same review took after-market 660Tis and used their coolers for noise testing and overclocking sections against a reference based 7950.

    Let's see how the 660Ti does against the $320 MSI TwinFrozr 7950 @ 1150mhz with MSAA on in Metro 2033, Crysis 1/Warhead, Anno 2070, Skyrim with ENB Mods w/8xMSAA, Batman AC w/8xMSAA, Dirt Showdown, Sleeping Dogs, Sniper Elite V2, Serious Sam 3, Bulletstorm, Alan Wake, Crysis 2 with MSAA. It's going to get crushed, that's what will happen.
