That Darn Memory Bus

Among the entire GTX 600 family, the GTX 660 Ti’s one unique feature is its memory controller layout. NVIDIA built GK104 with 4 memory controllers, each 64 bits wide, giving the entire GPU a combined memory bus width of 256 bits. These memory controllers are tied into the ROPs and L2 cache, with each controller forming part of a ROP partition containing 8 ROPs (or rather 1 ROP unit capable of processing 8 operations), 128KB of L2 cache, and the memory controller. To disable any of those things means taking out a whole ROP partition, which is exactly what NVIDIA has done.

The impact on the ROPs and the L2 cache is rather straightforward – render operation throughput is reduced by 25% and there’s 25% less L2 cache to store data in – but the loss of the memory controller is a much tougher concept to deal with. This goes for both NVIDIA on the design end and for consumers on the usage end.
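To put the cut in concrete terms, here is a quick back-of-the-envelope tally (a sketch in Python, using nothing but the per-partition figures above: 8 ROPs, 128KB of L2, and a 64-bit memory controller per partition, four partitions on a full GK104) of what disabling one ROP partition costs.

```python
# Tally of what disabling one GK104 ROP partition costs, using the
# per-partition figures from the text.
FULL_PARTITIONS = 4
GTX_660_TI_PARTITIONS = 3  # one partition disabled

per_partition = {"ROPs": 8, "L2 cache (KB)": 128, "Memory bus (bits)": 64}

for resource, amount in per_partition.items():
    full = amount * FULL_PARTITIONS
    cut = amount * GTX_660_TI_PARTITIONS
    print(f"{resource}: {full} -> {cut} ({100 * (full - cut) // full}% less)")

# ROPs: 32 -> 24 (25% less)
# L2 cache (KB): 512 -> 384 (25% less)
# Memory bus (bits): 256 -> 192 (25% less)
```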

256 is a nice power-of-two number. For video cards with power-of-two memory bus widths, it’s very easy to equip them with a power-of-two memory capacity such as 1GB, 2GB, or 4GB of memory. For various minor technical reasons (mostly the sanity of the engineers), GPU manufacturers like sticking to power-of-two memory busses. And while this is by no means a true design constraint in video card manufacturing, there are ramifications for deviating from it.

The biggest consequence of deviating from a power-of-two memory bus is that under normal circumstances a card’s memory capacity won’t line up with the bulk of the cards on the market. To use the GTX 500 series as an example, NVIDIA had 1.5GB of memory on the GTX 580 at a time when the common Radeon HD 5870 had 1GB, giving NVIDIA a 512MB advantage. Later on however the common Radeon HD 6970 had 2GB of memory, leaving NVIDIA behind by 512MB. This also had one additional consequence for NVIDIA: they needed 12 memory chips where AMD needed 8, which generally inflates the bill of materials more than equipping a narrower bus with higher speed memory does. This ended up not being a problem for the GTX 580, since 1.5GB was still plenty of memory for 2010/2011 and the high price tag could easily absorb the BoM hit, but this is not always the case.

Because NVIDIA has disabled a ROP partition on GK104 in order to make the GTX 660 Ti, they’re dropping from a power-of-two 256bit bus to an odd-sized 192bit bus. Under normal circumstances this means they’d need to either reduce the amount of memory on the card from 2GB to 1.5GB, or increase it to 3GB. The former is undesirable for competitive reasons (AMD has 2GB cards below the 660 Ti and 3GB cards above it), not to mention that 1.5GB is too small for a $300 card in 2012. The latter incurs the BoM hit as NVIDIA moves from 8 memory chips to 12, a hit that the lower margin GTX 660 Ti can’t as easily absorb, and it would be rather silly for a GTX 680 to have less memory than a GTX 660 Ti.

Rather than take the usual route, NVIDIA is going with a third option of their own: put 2GB of memory on the GTX 660 Ti anyhow. By putting more memory on one controller than on the other two – in effect breaking the symmetry of the memory banks – NVIDIA can attach 2GB of memory to a 192bit memory bus. This is a technique that NVIDIA has had available to them for quite some time, but it’s one they rarely pull out, using it only when necessary.

We were first introduced to this technique with the GTX 550 Ti in 2011, which had a similarly odd-sized 192bit memory bus. By using a mix of 2Gb and 1Gb modules, NVIDIA could outfit the card with 1GB of memory rather than the 1.5GB/768MB that a 192bit memory bus would typically dictate.
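The article doesn’t spell out the exact mix, but with six 32-bit GDDR5 chips on a 192bit bus the arithmetic only works out one way; a small Python sketch to brute-force it:

```python
# The GTX 550 Ti's 192-bit bus implies six 32-bit GDDR5 chips; to reach 1GB
# (8Gb) with a mix of 2Gb and 1Gb modules, only one split works.
CHIPS = 6           # 192-bit bus / 32-bit chips
TARGET_GBITS = 8    # 1GB = 8Gb

for big in range(CHIPS + 1):
    small = CHIPS - big
    if 2 * big + 1 * small == TARGET_GBITS:
        print(f"{big} x 2Gb + {small} x 1Gb = 1GB")  # 2 x 2Gb + 4 x 1Gb = 1GB
```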

For the GTX 660 Ti in 2012, NVIDIA is once again going to use their asymmetrical memory technique to outfit the card with 2GB of memory on a 192bit bus, but they’re implementing it slightly differently. Whereas the GTX 550 Ti mixed memory chip densities to get 1GB out of 6 chips, the GTX 660 Ti will mix up the number of chips attached to each controller to get 2GB out of 8 chips. Specifically, there will be 4 chips instead of 2 attached to one of the memory controllers, while the other controllers will continue to have 2 chips each. Doing it this way allows NVIDIA to use the same Hynix 2Gb chips they already use in the rest of the GTX 600 series, with the only high-level difference being the width of the bus connecting them.
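As a sanity check on that layout, here’s the same arithmetic in Python: three 64-bit controllers, all populated with identical 2Gb (256MB) chips, with one controller carrying twice as many chips as the other two.

```python
# The GTX 660 Ti layout described above: three 64-bit controllers, all using
# the same 2Gb (256MB) chips, but with one controller carrying four chips
# instead of two.
CHIP_MB = 2 * 1024 // 8                 # a 2Gb chip holds 256MB
chips_per_controller = [2, 2, 4]        # the asymmetric part

per_controller_mb = [n * CHIP_MB for n in chips_per_controller]
print(per_controller_mb)                # [512, 512, 1024]
print(sum(per_controller_mb), "MB")     # 2048 MB -> the advertised 2GB
```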

Of course at a low level it’s more complex than that. In a symmetrical design with an equal amount of RAM on each controller it’s rather easy to interleave memory operations across all of the controllers, which maximizes performance of the memory subsystem as a whole. However complete interleaving requires that kind of symmetrical design, which means it’s not suitable for NVIDIA’s asymmetrical memory designs. Instead NVIDIA must start playing tricks. And when tricks are involved, there’s always a downside.

The best case scenario is always going to be that the entire 192bit bus is in use, interleaving a memory operation across all 3 controllers and giving the card 144GB/sec of memory bandwidth (192bit * 6GHz / 8). But that can only be done for the first 1.5GB of memory; the final 512MB is attached to a single memory controller. This invokes the worst case scenario, where only one 64-bit memory controller is in use, reducing memory bandwidth to a much more modest 48GB/sec.
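For reference, both figures fall straight out of the formula in the text; a tiny Python sketch:

```python
# The two extremes of the GTX 660 Ti's memory bandwidth, from the formula in
# the text: bus width (bits) x effective data rate (GHz) / 8 bits per byte.
DATA_RATE_GHZ = 6  # 6GHz effective GDDR5 data rate

def bandwidth_gb_per_sec(bus_width_bits: int) -> float:
    return bus_width_bits * DATA_RATE_GHZ / 8

print(bandwidth_gb_per_sec(192))  # 144.0 - all three controllers interleaved
print(bandwidth_gb_per_sec(64))   # 48.0  - only one 64-bit controller in use
```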

How NVIDIA spreads out memory accesses will have a great deal of impact on when we hit these scenarios. In the past we’ve tried to divine how NVIDIA is accomplishing this, but even with the compute capabilities of CUDA, memory appears to be too far abstracted for us to test any specific theories. And because NVIDIA continues to treat the internal details of their memory bus as a competitive advantage, they’re unwilling to share the details of its operation with us. Thus we’re largely dealing with a black box here, one where poking and prodding doesn’t produce much in the way of meaningful results.

As with the GTX 550 Ti, all we can really say at this time is that the performance we get in our benchmarks is the performance we get. Our best guess remains that NVIDIA is interleaving the lower 1.5GB of address space across all three controllers while pushing the last 512MB of address space into the larger memory bank, but we don’t have any hard data to back it up. For most users this shouldn’t be a problem (especially since GK104 is so wishy-washy at compute), but the fact remains that there’s always a downside to an asymmetrical memory design. With any luck one day we’ll find that downside and be able to better understand the GTX 660 Ti’s performance in the process.
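Purely as an illustration of that guess (and nothing more; NVIDIA hasn’t confirmed the mapping, and the interleave granularity and controller numbering below are arbitrary assumptions), here is what such a split address map might look like in Python.

```python
# An illustrative model of the guess above: interleave the first 1.5GB of the
# address space across all three controllers, then push the final 512MB onto
# the controller with the extra chips. The 256-byte stride and the controller
# numbering are arbitrary assumptions; NVIDIA has not confirmed any of this.
GB = 1024 ** 3
INTERLEAVED_REGION = 3 * GB // 2   # first 1.5GB interleaved across 3 controllers
STRIDE = 256                       # assumed interleave granularity
BIG_CONTROLLER = 2                 # assumed index of the 1GB controller

def controller_for(address: int) -> int:
    if address < INTERLEAVED_REGION:
        return (address // STRIDE) % 3   # round-robin across all three
    return BIG_CONTROLLER                # final 512MB lives on one controller

print(controller_for(0))                    # 0 - fast, interleaved region
print(controller_for(INTERLEAVED_REGION))   # 2 - slow, single-controller region
```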

Comments

  • Oxford Guy - Thursday, August 16, 2012 - link

    What is with the 285 being included? It's not even a DX 11 card.

    Where is the 480? Why is the 570 included instead of the 580?

    Where is the 680?
  • Ryan Smith - Saturday, August 18, 2012 - link

    The 285 was included because NVIDIA is promoting the GTX 660 Ti as a GTX 200 series upgrade, so I wanted to quickly throw in a GTX 285 where applicable. Basically there was no harm in including it where we could.

    As for the 480, it's equivalent to the 570 in performance (eerily so), so there's never a need to break it out separately.

    And the 680 is in Bench. It didn't make much sense to include a card $200 more expensive which would just compress the results among the $300 cards.
  • CeriseCogburn - Sunday, August 19, 2012 - link

    So you're saying the 680 is way faster than the 7970 which you included in every chart, since the 7970 won't compress those $300 card results.
    Thanks for admitting that the 7970 is so much slower.
  • Pixelpusher6 - Friday, August 17, 2012 - link

    Thanks Ryan. Great review as always.

    I know one of the differentiating factors for the Radeon 7950 is its 3GB of RAM, but I was curious: are there any current games which will max out 2GB of RAM with high resolution, AA, etc.?

    I think it's interesting how similar AMD's and Nvidia's GPUs are this generation. I believe Nvidia will be releasing the GTX 660 non-Ti based on GK106. Leaked specs seem to be similar to this card but with the texture units reduced to 64. I wonder how much of a performance reduction this will account for. I think it will be hard for Nvidia to get the same type of performance / $ as, say, the GTX 460 / 560 Ti this generation because of having to have GK104 fill in more market segments.

    Also I wasn't aware that Nvidia was still having trouble meeting demand for GK104 chips; I thought those issues were all cleared up. I think when AMD released their 7000 series chips they should have taken advantage of being first to market and been more competitive on price to grow market share rather than increase margins. At that time someone sitting on 8800GT era hardware would be hard pressed to upgrade knowing that AMD's inflated prices would come down once Nvidia brought their GPUs to market. People who hold on to their cards for a number of years are unlikely to upgrade 6 months later to Nvidia's product. If AMD cards were priced lower at this time a lot more people would have bought them, thereby beating Nvidia before they even had a card to market. I do give some credit to AMD for preparing for this launch and adjusting prices, but in my opinion this should have been done much earlier. AMD management needs to be more aggressive and catch Nvidia off guard, rather than just reacting to whatever they do. I would "preemptively" strike at the GTX 660 non-Ti by lowering prices on the 7850 to $199. Instead it seems they'll follow the trend and keep it at $240-250 right up until the launch of the GTX 660, then lower it to $199.
  • Ryan Smith - Saturday, August 18, 2012 - link

    Pixelpusher, there are no games we test that max out 2GB of VRAM out of the box. 3GB may one day prove to be advantageous, but right now, even at multi-monitor resolutions, 2GB is doing the job (since we're seeing these cards run out of compute/render performance before they run out of RAM).
  • Sudarshan_SMD - Friday, August 17, 2012 - link

    Where are naked images of the card?
  • CeriseCogburn - Thursday, August 23, 2012 - link

    You don't undress somebody you don't love.
  • dalearyous - Friday, August 17, 2012 - link

    It seems the biggest disappointment I see in the comments is the price point.

    But if this card comes bundled with Borderlands 2, and you were already planning on buying Borderlands 2, then this effectively puts the card at $240, worth it IMO.
  • rarson - Friday, August 17, 2012 - link

    but it's the middle of freaking August. While Tahiti was unfortunately clocked a bit lower than it probably should have been, and AMD took a bit too long to bring out the GE edition cards, Nvidia is now practically 8 months behind AMD, having only just released a $300 card. (In the 8 months that have gone by since the release of the 7950, its price has dropped from $450 to $320, effectively making it a competitor to the 660 Ti. AMD is able to compete on price with a better-performing card by virtue of the fact that it simply took Nvidia too damn long to get their product to market.) By the time the bottom end appears, AMD will be ready for Canary Islands.

    It's bad enough that Kepler (and Fermi, for that matter) was so late and so not available for several months, but it's taking forever to simply roll out the lower-tier products (and yes, I know 28nm wafers have been in short supply, but that's partially due to Nvidia's crappy Kepler yields... AMD have not had such supply problems). Can you imagine what would have happened if Nvidia actually tried to release GK110 as a consumer card? We'd have NOTHING. Hot, unmanufacturable nothing.

    Nvidia needs to get their shit together. At the rate they're going, they'll have to skip an entire generation just to get back on track. I liked the 680 because it was a good performer, but that doesn't do consumers any good when it's 4 months late to the party and almost completely unavailable. Perhaps by the end of the year, 28nm will have matured enough and Nvidia will be able to design something that yields decently while still offering the competitiveness that the 680 brought us, because what I'd really like to see is both companies releasing good cards at the same time. Thanks to Fermi and Kepler, that hasn't happened for a while now. Us consumers benefit from healthy competition and Nvidia has been screwing that up for everyone. Get it together, Nvidia!
  • CeriseCogburn - Sunday, August 19, 2012 - link

    So, as any wacko fanboy does, you fault nVidia for releasing a card later that drives the very top end tier AMD cards down from the $579-plus-shipping I paid to $170 less plus 3 free games.
    Yeah buddy, it's all nVidia's fault, and they need to get their act together, and if they do in fact get their act together, you can buy the very top AMD card for $150, because that's likely all it will be worth.
    Good to know it's all nVidia's fault. AMD went from $579 plus shipping to $409 and 3 free games, and nVidia sucks for not having its act together.
    The FDA as well as the EPA should ban the koolaid you're drinking.
