That Darn Memory Bus

Among the entire GTX 600 family, the GTX 660 Ti’s one unique feature is its memory controller layout. NVIDIA built GK104 with 4 memory controllers, each 64 bits wide, giving the entire GPU a combined memory bus width of 256 bits. These memory controllers are tied into the ROPs and L2 cache, with each controller forming part of a ROP partition containing 8 ROPs (or rather 1 ROP unit capable of processing 8 operations), 128KB of L2 cache, and the memory controller. To disable any of those things means taking out a whole ROP partition, which is exactly what NVIDIA has done.

The impact on the ROPs and the L2 cache is rather straightforward – render operation throughput is reduced by 25% and there’s 25% less L2 cache to store data in – but the loss of the memory controller is a much tougher concept to deal with. This goes for both NVIDIA on the design end and for consumers on the usage end.
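
To put concrete numbers on that layout, here's a minimal sketch (in Python, purely for illustration) of GK104's ROP partition arrangement and what disabling one whole partition does to the totals:

```python
# A minimal sketch of GK104's ROP partition layout as described above;
# the structure and figures come from the text, the code is illustrative.
from dataclasses import dataclass

@dataclass
class RopPartition:
    rops: int = 8            # 1 ROP unit capable of processing 8 operations
    l2_kb: int = 128         # 128KB slice of L2 cache
    mc_width_bits: int = 64  # one 64-bit memory controller

def totals(partitions):
    """Sum up ROPs, L2, and bus width across the enabled partitions."""
    return (sum(p.rops for p in partitions),
            sum(p.l2_kb for p in partitions),
            sum(p.mc_width_bits for p in partitions))

full_gk104 = [RopPartition() for _ in range(4)]
gtx_660_ti = full_gk104[:3]  # one whole ROP partition disabled

print(totals(full_gk104))  # (32, 512, 256)
print(totals(gtx_660_ti))  # (24, 384, 192): 25% fewer ROPs, 25% less L2, 192bit bus
```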

256 is a nice power-of-two number. For video cards with power-of-two memory bus widths, it's very easy to equip them with a similarly power-of-two memory capacity such as 1GB, 2GB, or 4GB of memory. For various minor technical reasons (mostly the sanity of the engineers), GPU manufacturers like sticking to power-of-two memory busses. And while this is by no means a true design constraint in video card manufacturing, there are ramifications for deviating from it.

The biggest consequence of deviating from a power-of-two memory bus is that under normal circumstances this leads to a card's memory capacity not lining up with the bulk of the cards on the market. To use the GTX 500 series as an example, NVIDIA had 1.5GB of memory on the GTX 580 at a time when the common Radeon HD 5870 had 1GB, giving NVIDIA a 512MB advantage. Later on however the common Radeon HD 6970 had 2GB of memory, leaving NVIDIA behind by 512MB. This had one additional consequence for NVIDIA: they needed 12 memory chips where AMD needed 8, which generally inflates the bill of materials more than the price of higher speed memory in a narrower design does. This ended up not being a problem for the GTX 580 since 1.5GB was still plenty of memory for 2010/2011 and the high price tag could easily absorb the BoM hit, but this is not always the case.
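
The chip-count arithmetic works out as follows. This is a minimal sketch assuming the standard 32-bit interface of a GDDR5 chip (and ignoring clamshell modes, where two half-width chips share one channel); the GTX 580's 384bit bus is the reason for its 12 chips:

```python
# Chip count and capacity for a given bus width, assuming one 32-bit GDDR5
# chip per 32 bits of bus. Densities are in Gbit, capacities in GB.
GDDR5_CHIP_WIDTH = 32

def min_chips(bus_width_bits):
    return bus_width_bits // GDDR5_CHIP_WIDTH

def capacity_gb(bus_width_bits, chip_density_gbit):
    return min_chips(bus_width_bits) * chip_density_gbit / 8

print(min_chips(384), capacity_gb(384, 1))  # GTX 580 (384bit): 12 chips, 1.5GB
print(min_chips(256), capacity_gb(256, 1))  # HD 5870 (256bit): 8 chips, 1.0GB
print(min_chips(256), capacity_gb(256, 2))  # HD 6970 (256bit): 8 chips, 2.0GB
```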

Because NVIDIA has disabled a ROP partition on GK104 in order to make the GTX 660 Ti, they're dropping from a power-of-two 256bit bus to an off-size 192bit bus. Under normal circumstances this means that they'd need to either reduce the amount of memory on the card from 2GB to 1.5GB, or increase it to 3GB. The former is undesirable for competitive reasons (AMD has 2GB cards below the 660 Ti and 3GB cards above), not to mention the fact that 1.5GB is too small for a $300 card in 2012. The latter on the other hand incurs the BoM hit as NVIDIA moves from 8 memory chips to 12 memory chips, a scenario that the lower margin GTX 660 Ti can't as easily absorb, not to mention how silly it would be for a GTX 680 to have less memory than a GTX 660 Ti.

Rather than take either of the usual routes, NVIDIA is going to take a 3rd route of their own: put 2GB of memory on the GTX 660 Ti anyhow. By putting more memory on one controller than on the other two – in effect breaking the symmetry of the memory banks – NVIDIA can have 2GB of memory attached to a 192bit memory bus. This is a technique that NVIDIA has had available to them for quite some time, but it's also something they rarely pull out, using it only when necessary.

We were first introduced to this technique with the GTX 550 Ti in 2011, which had the same off-size 192bit memory bus. By using a mix of 2Gb and 1Gb modules, NVIDIA could outfit the card with 1GB of memory rather than the 1.5GB/768MB that a 192bit memory bus would typically dictate.

For the GTX 660 Ti in 2012 NVIDIA is once again going to use their asymmetrical memory technique in order to outfit the card with 2GB of memory on a 192bit bus, but they're going to be implementing it slightly differently. Whereas the GTX 550 Ti mixed memory chip densities in order to get 1GB out of 6 chips, the GTX 660 Ti will mix up the number of chips attached to each controller in order to get 2GB out of 8 chips. Specifically, there will be 4 chips instead of 2 attached to one of the memory controllers, while the other controllers will continue to have 2 chips each. Doing it this way allows NVIDIA to use the same Hynix 2Gb chips they already use in the rest of the GTX 600 series, with the only high-level difference being the width of the bus connecting them.
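
As a sanity check on that layout, here's a quick sketch of the capacity math. Note that the 4-chip controller presumably runs its chips in a half-width (clamshell) arrangement; that wiring detail is our assumption rather than anything NVIDIA has confirmed:

```python
# GTX 660 Ti asymmetric memory layout per the text: three 64-bit controllers,
# all using the same Hynix 2Gb chips, but one carries 4 chips instead of 2.
CHIP_DENSITY_GBIT = 2
chips_per_controller = [4, 2, 2]

per_controller_mb = [n * CHIP_DENSITY_GBIT * 1024 // 8 for n in chips_per_controller]
print(per_controller_mb)       # [1024, 512, 512]
print(sum(per_controller_mb))  # 2048MB: 2GB on a 192bit bus, 8 chips total
```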

Of course at a low level it's more complex than that. In a symmetrical design with an equal amount of RAM on each controller it's rather easy to interleave memory operations across all of the controllers, which maximizes performance of the memory subsystem as a whole. However complete interleaving requires exactly that kind of symmetrical design, which means it's not suitable for NVIDIA's asymmetrical memory designs. Instead NVIDIA must start playing tricks. And when tricks are involved, there's always a downside.

The best case scenario is always going to be that the entire 192bit bus is in use, with a memory operation interleaved across all 3 controllers, giving the card 144GB/sec of memory bandwidth (192bit * 6GHz / 8). But that can only happen within the first 1.5GB of memory; the final 512MB is attached to a single memory controller. This invokes the worst case scenario, where only 1 64-bit memory controller is in use, reducing memory bandwidth to a much more modest 48GB/sec.
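
For reference, the bandwidth arithmetic made explicit:

```python
# Peak bandwidth in GB/sec: bus width (bits) * effective data rate (Gbps per
# pin) / 8 to convert bits to bytes.
def bandwidth_gb_sec(bus_width_bits, data_rate_gbps):
    return bus_width_bits * data_rate_gbps / 8

print(bandwidth_gb_sec(192, 6.0))  # 144.0: best case, all 3 controllers
print(bandwidth_gb_sec(64, 6.0))   # 48.0: worst case, 1 controller only
```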

How NVIDIA spreads out memory accesses will have a great deal of impact on when we hit these scenarios. In the past we've tried to divine how NVIDIA is accomplishing this, but even with the compute capability of CUDA, memory appears to be too far abstracted for us to test any specific theories. And because NVIDIA continues to label the internal details of their memory bus a competitive advantage, they're unwilling to share the details of its operation with us. Thus we're largely dealing with a black box here, one where poking and prodding doesn't produce much in the way of meaningful results.

As with the GTX 550 Ti, all we can really say at this time is that the performance we get in our benchmarks is the performance we get. Our best guess remains that NVIDIA is interleaving the lower 1.5GB of address space while pushing the last 512MB of address space into the larger memory bank, but we don't have any hard data to back it up. For most users this shouldn't be a problem (especially since GK104 is so wishy-washy at compute), but it remains that there's always a downside to an asymmetrical memory design. With any luck one day we'll find that downside and be able to better understand the GTX 660 Ti's performance in the process.
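
For what it's worth, here's what our guess would look like written down. To be clear, this is a purely hypothetical sketch of an address-to-controller mapping, not NVIDIA's actual scheme (which remains a black box), and the stripe size is invented for illustration:

```python
# Hypothetical mapping matching our guess: addresses in the lower 1.5GB
# interleave across all 3 controllers, the last 512MB sits entirely on the
# controller with the extra chips. The 256-byte stripe is an assumption.
STRIPE_BYTES = 256
INTERLEAVED_LIMIT = 1536 * 2**20  # 1.5GB

def controller_for(address):
    if address < INTERLEAVED_LIMIT:
        return (address // STRIPE_BYTES) % 3  # full 144GB/sec region
    return 0                                  # single-controller 48GB/sec region

print(controller_for(0), controller_for(256), controller_for(512))  # 0 1 2
print(controller_for(1600 * 2**20))  # 0: upper 512MB, one controller only
```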

Comments (313)

  • TheJian - Sunday, August 19, 2012

    http://www.newegg.com/Product/Product.aspx?Item=N8...

    And it's $350. The only BOOST edition on newegg 2 days after this review.

    A full 6 660 TI's for $299 (one after rebate). So, is it unfair not to include a card that carries what looks like a $50 premium over the TI? I beg to differ. Also there are 11 cards available to BUY for the 660 TI. Nuff said?

    It was rightly picked on.
    Google 7950 boost, you get $349 cheapest and availability is next to none. Google 7950b you don't even get a result for shopping. The cheapest Radeon 7950 at newegg is already $319.99 (most after rebate). If you're looking at 1920x1200 and below the 660 TI is a no-brainer. It is close in the games it loses in, and dominates in a few it wins in. Not sure why the reference nvidia 660 ti is even in the list, you don't buy that. Zotac's $299 is basically the bottom you buy, and is faster than the ref design at 928mhz/1006 boost (not 915/boost 980), so consider the TI GREEN bar slower than what you'll actually buy for $299. Heck the 6th card I mentioned at $299 after rebate is running its base at 1019, boost at 1097! So they are clocking regular cards a full 100mhz faster than REF for $299. Another at $309 is also like this (1006/1084 boost). Knowing this you should be comparing the Zotac AMP (barely faster than the two I mention for $299 and $309) vs. the 7950, which is $320 at minimum!

    Zotac AMP (only 14mhz faster base than $299/309 card) vs. 7950 (again more expensive by $20) @ 1920x1200
    Civ5 <5% slower
    Skyrim >7% faster
    Battlefield3 >25% faster (above 40% or so in FXAA High)
    Portal 2 >54% faster (same in 2560x...even though it's useless IMHO)
    Batman Arkham >6% faster
    Shogun 2 >25% faster
    Dirt3 >6% faster
    Metro 2033 =WASH (zotac 51.5 vs. 7950 51...margin of error..LOL)
    Crysis Warhead >19% loss.
    Power@load: 315w Zotac AMP vs. 353w 7950 (vs. 373w for 7950B)! Not only is the 660 TI usually faster by a whopping amount, it's also going to cost you less at the register, and far less on the electric bill (all year, for the 2-4 years you'll probably have it - assuming you spend $300-350 on a gaming card to GAME on it).

    For $299 or $309 I'll RUN home with the 660 TI over the 7950 @ $319. The games where it loses, you won't notice the difference at those frame rates. At today's BOOST prices ($350) there really isn't a comparison to be made. I believe it will be a while before the 7950B is $320, let alone the 660 TI's $299.

    NVIDIA did an awesome job here for gamers. I'll wait for black friday in a few months, but unless something changes, perf/watt wise I know what I'm upgrading to. I don't play crysis much :) (ok, none). Seeding higher clocked cards or not, you can BUY them for $299, can't buy a BOOST for under $350. By your own account, only two makers of 7950 BOOST. Feel free to retract your comment ;)
  • CeriseCogburn - Sunday, August 19, 2012

    NO ONE plays crysis anymore, it's merely a placeholder to prop up AMD card stats. It's blatantly sick as Crysis 2 is out.
    It's IMMENSE bias for amd.
  • Galidou - Sunday, August 19, 2012

    They use Crysis 2 almost everywhere on the internet for one reason: it's heavy. No one plays 3DMark because it's not a game, yet it's always included in reviews because it's relevant to performance.
  • TheJian - Monday, August 20, 2012

    Read it again...He said NOBODY plays CRYSIS. He's confirming what I said.

    The complaint wasn't about crysis 1...It was about benchmarking a game from 2008 that isn't played, and is based on CryEngine 2, which a total of 7 games have been based on since 2007: Crysis 1, Warhead, Blue Mars (what? not one Metacritic review), Vigilance (what? no PC version), Merchants of Brooklyn (no reviews), The Day (?), and Entropia (?). Who cares?

    The complaint is Anandtech should use CRYSIS 2! With the hi-res patch and DX11 patch, with everything turned on. The CryEngine 3 game engine is used in 23 games, including the coming Crysis 3! Though after a little more homework I still think this will be a victory for AMD, it's far more relevant and not a landslide by any means. But it IS relevant, NV loser or not. Crysis 2 is still being played and I'm sure Crysis 3 will be for at least a while soon. 3x the games made on this engine...Warhead should be tossed and Crysis 2 used. But not without loading the 3 patches that get you all this goodness.
  • Galidou - Monday, August 20, 2012

    Well I meant Crysis, not the 2, confused there. Even if no one plays the first one it's still very intensive, but true, they should use Crysis 2 as it's more representative of games played now...
  • CeriseCogburn - Thursday, August 23, 2012

    Yes we all play 3dmark and upload our scores and compare.
    Not sure about you, you only play one game that now conveniently got an amd driver boost.
    Good for amd they actually did something for once - although i'll be glad to hear how many times it crashes for you each night @ 1300 WC.
    It will be a LOT. Believe me. 30 mods, not as many as myself, but you'll be going down with CCC often.
  • Galidou - Thursday, August 23, 2012

    Of all the video cards I had, and I had A LOT from the GeForce 2 GTS up to my recently retired 6850 crossfire (just received my Sapphire 7950 OC), I had close to 0 problems. How could you know anything about CCC when it's obvious you haven't had an AMD video card in years?

    I have 30 mods because more was already straining my limited video memory, and I already had a problem with one of them (realistic sounds of thunder) which, as I found out lately, was related to my hi-fi sound card driver (Asus Xonar STX).

    I had no problem with CCC at all, other than using it to scale my LCD TV so it fits the whole screen and to manage my game profiles. I didn't touch it much in the last year. It played Dirt 2, 3, Skyrim, GTA 4, Fallout 3, Fallout NV, Oblivion!!, and so on without a problem. And yet, you try to tell me I'll have problems with a program you don't know a thing about.

    But just so you might appreciate me for my efforts, my wife decided to replace her 4870 for the forthcoming Guild Wars 2, for energy and temperature reasons. So I got her a 660 ti, as my 6850s were already sold to a friend. She games at 1080p only and I didn't want to overclock her stuff, so it was obvious. At the same time I'll be able to compare both, but I already know I like Nvidia's UI more than AMD's CCC, though they look quite alike now.

    BTW just for the sake of it I researched with google:

    AMD drivers keep crashing:
    3.54 million results

    Nvidia drivers keep crashing:
    3.37 million results
  • CeriseCogburn - Thursday, August 23, 2012

    The reason I say what I do is because I DO HAVE A LOT of amd cards, you DUMMY.
  • CeriseCogburn - Thursday, August 23, 2012

    you're another idiot that gets everything wrong, attacks others for what they HAVE NOT SAID, gets corrected again and again, makes another crap offshoot lie, then, OF COURSE - HAS A PERFECT DUAL AMD SETUP THAT HAS NEVER HAD A PROBLEM, EVA!
    That means you have very little experience, a freaking teensy tiny tiny bit.
    Look in the mirror dummy.
  • Galidou - Sunday, August 19, 2012

    The 7950b is crap, I don't even want to hear about a reference design with a little boost. On newegg there are 4 cards out of 18 that are reference, and the others are mainly overclocked models with much better coolers, which will overclock terribly well.

    It's easy for the average user to see the win for nvidia considering 20% of the overclock has already been done and there's not much headroom left..... Once both are overclocked, the only game where the 660 ti remains faster is Portal 2.

    The Zotac might only have 14mhz more on base clock, but the core clock is not the thing here; the Zotac is the best of the pack because it comes with memory overclocked to 6.6GHz, which addresses the only weakness of the 660 ti: memory bandwidth. There's a weird thing in here tho, I found the minimum fps on another review, but on anandtech the minimums appeared only in the games where they were less noticeable, good job again Nvidia.
