GTX 550 Ti’s Quirk: 1GB Of VRAM On A 192-bit Bus

One thing that has always set NVIDIA apart from AMD is their willingness to use non-power-of-2 memory bus widths. AMD sticks to 256/128/64-bit buses, while NVIDIA has used those along with more unusual widths such as 384, 320, and 192 bits. A wider bus lets NVIDIA tap more memory bandwidth, but NVIDIA also usually runs its memory slower than AMD does on comparable products, so the bandwidth advantage isn’t as pronounced as the bus widths suggest. The more immediate ramification, however, is that NVIDIA ends up with equally odd memory sizes: 1536MB, 1280MB, and 768MB.

768MB in particular can be problematic. When the GTX 460 launched, NVIDIA went with two flavors, 1GB and 768MB, the difference being how many memory controller/ROP blocks were enabled, which in turn determined how much RAM was connected. 768MB just isn’t very big these days – it’s only as much memory as NVIDIA’s top-of-the-line card carried back at the end of 2006. At high resolutions with anti-aliasing and high-quality textures it’s easy to swamp a card, making 1GB the preferred size for practically everything from $250 on down. So when NVIDIA has a 768MB card and AMD has a 1GB card, NVIDIA has a definite marketing problem and a potential performance problem.

Video Card Bus Width Comparison
NVIDIA           Bus Width    AMD              Bus Width
GTX 570          320-bit      Radeon HD 6970   256-bit
GTX 560 Ti       256-bit      Radeon HD 6950   256-bit
GTX 460 768MB    192-bit      Radeon HD 6850   256-bit
GTX 550 Ti       192-bit      Radeon HD 5770   128-bit
GTS 450          128-bit      Radeon HD 5750   128-bit

NVIDIA’s usual solution is to outfit cards with more RAM to fill out the wider bus, which is why we’ve seen 1536MB and 1280MB cards going up against 1GB AMD cards. On cheaper cards, though, the extra memory (or higher-density memory) is an added cost that cuts into margins. So what do you do with an oddly sized 192-bit memory bus on a midrange card? For the GTS 450 NVIDIA disabled a memory controller to bring it down to 128-bit; for the GTX 550 Ti they needed to do something different if they wanted a 192-bit bus while avoiding either settling for 768MB of memory or driving up costs with 1536MB. NVIDIA’s solution was to put 1GB on a 192-bit card anyhow, and this is the GTX 550 Ti’s defining feature from a technical perspective.

Under ideal circumstances, when interleaving memory banks you want the banks to be of equal capacity; this allows you to distribute memory operations equally among all banks throughout the entire memory space. Video cards, with their non-removable memory, have done this for ages, while full computers with their replaceable DIMMs have had to work with other layouts. Thus computers have supported additional interleaving options beyond symmetrical interleaving, most notably “flex” interleaving, where one bank is larger than the other.

It’s this technique that NVIDIA has adopted for the GTX 550 Ti. GF116 has three 64-bit memory controllers, each of which is attached to a pair of GDDR5 chips running in 32-bit mode. All told this is a 6-chip configuration, with NVIDIA using four 1Gb chips and two 2Gb chips. In the case of our Zotac card – and presumably all GTX 550 Ti cards – the memory is laid out as illustrated above, with the 1Gb devices split among two of the memory controllers, while both 2Gb devices sit on the third.
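As a sanity check on that layout, the capacity math works out as follows. This is just a sketch of the arithmetic; the per-controller labels are ours, but the chip counts and densities mirror the configuration described above (1Gb = 128MB):

```python
# Sanity-check the GTX 550 Ti's mixed-density memory layout described above.
# Each GDDR5 device runs in 32-bit mode; densities are in gigabits (Gb).
GBIT_TO_MB = 1024 // 8  # 1 Gb = 128 MB

controllers = {
    "MC0": [1, 1],  # two 1Gb devices on this 64-bit controller
    "MC1": [1, 1],  # two 1Gb devices
    "MC2": [2, 2],  # two 2Gb devices -- the oversized bank
}

per_mc_mb = {mc: sum(chips) * GBIT_TO_MB for mc, chips in controllers.items()}
total_mb = sum(per_mc_mb.values())
print(per_mc_mb)  # {'MC0': 256, 'MC1': 256, 'MC2': 512}
print(total_mb)   # 1024
```

The asymmetry is plain to see: two controllers carry 256MB each while the third carries 512MB, for the card's full 1GB.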

This marks the first time we’ve seen such a memory configuration on a video card, and as such it raises a number of questions. Our primary concern at this point is performance, as it’s mathematically impossible to organize the memory in such a way that the card always has access to its full theoretical memory bandwidth. The best case scenario is that the entire 192-bit bus is in use, giving the card 98.5GB/sec of memory bandwidth (192 bits × 4104MHz ÷ 8), while the worst case scenario is that only one 64-bit memory controller is in use, reducing memory bandwidth to a much more modest 32.8GB/sec.

How NVIDIA spreads out memory accesses will have a great deal of impact on when we hit these scenarios, and at this time they are labeling the internal details of their memory bus a competitive advantage, meaning they’re unwilling to share the details of its operation with us. Thus we’re largely dealing with a black box here, which we’re going to have to poke and prod at to try to determine how NVIDIA is distributing memory operations.

Our base assumption is that NVIDIA is using a memory interleaving mode similar to the “flex” modes on desktop computers, which means lower memory addresses are mapped across all 3 memory controllers, while higher addresses are mapped to the remaining RAM capacity on the 3rd memory controller. As such NVIDIA would have the full 98.5GB/sec of memory bandwidth available across the first 768MB, while the last 256MB would be much more painful at 32.8GB/sec. This isn’t the only way to distribute memory operations, however, and indeed NVIDIA doesn’t have to use one method at a time thanks to the 3 memory controllers, so the truth is likely much more complex.
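To make that hypothesis concrete, here is a sketch of how such a flex scheme might map addresses to controllers. To be clear, the stride and the mapping itself are our guesses for illustration, not NVIDIA's actual (undisclosed) scheme:

```python
# A sketch of "flex"-style interleaving as we hypothesize it: addresses below
# 768MB stripe across all three controllers, while everything above lands on
# MC2, which holds the extra 256MB. The 256-byte stride is an arbitrary guess.
MB = 1024 * 1024
STRIPE_LIMIT = 768 * MB   # symmetric region: 256MB per controller x 3
STRIDE = 256              # hypothetical interleave granularity

def controller_for(addr):
    if addr < STRIPE_LIMIT:
        return (addr // STRIDE) % 3   # full 192-bit striping at 98.5GB/sec
    return 2                          # last 256MB: MC2 only, 32.8GB/sec

print(controller_for(0))           # 0
print(controller_for(STRIDE))      # 1
print(controller_for(900 * MB))    # 2 -- past 768MB, pinned to MC2
```

Under this mapping any working set that spills past 768MB would see its upper portion bottlenecked on a single 64-bit controller, which is exactly the behavior we would want to measure.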

Given the black box nature of the GTX 550’s memory access methods, we decided to poke at things in the most practical manner available: CUDA. GPGPU operation makes it easy to write algorithms that test the memory across the entire address space, which in theory would make it easy to determine the GTX 550’s actual memory bandwidth, and whether it is consistent across the entire address space. Furthermore we have another very similar NVIDIA card with a 192-bit memory bus on hand – the GTX 460 768MB – so it would be easy to compare the two and see how a pure 192-bit card stacks up.
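Short of running the real CUDA test, the method can be illustrated with a scaled-down, host-side analog: time fixed-size copies at increasing offsets through a buffer and look for a bandwidth cliff. On the actual card the interesting cliff would be past the 768MB mark; this sketch (with sizes scaled down to host RAM) only demonstrates the technique, not the GPU result:

```python
# A rough host-side analog of the address-space sweep we wanted to run in
# CUDA: copy a fixed-size window at increasing offsets through a large buffer
# and time each copy. On the GPU, a bandwidth drop past 768MB would betray
# flex interleaving; here the buffer is scaled down and no cliff is expected.
import time

def sweep(buf_mb=64, window_mb=8):
    src = bytearray(buf_mb * 1024 * 1024)
    window = window_mb * 1024 * 1024
    results = []
    for offset in range(0, len(src) - window + 1, window):
        chunk = memoryview(src)[offset:offset + window]
        t0 = time.perf_counter()
        dst = bytes(chunk)                  # the timed copy of this window
        dt = time.perf_counter() - t0
        results.append((offset // (1024 * 1024), window / dt / 1e9))
    return results  # list of (offset in MB, measured GB/s)

for off_mb, gbs in sweep():
    print(f"offset {off_mb:3d}MB: {gbs:.1f} GB/s")
```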

We ran into one roadblock, however: apparently no one told the CUDA group that the GTX 550 was going to use mixed-density memory. As it stands CUDA (and the other APIs built upon it, such as OpenCL and DirectCompute) can only see 768MB minus whatever memory is already in use. While this lends support to our theory that NVIDIA is using flex mode interleaving, it makes the theory nearly impossible to test at this time, as graphics operations aren’t nearly flexible enough (and are much more prone to caching) to test this.


CUDA-Z: CUDA Available Memory. Clockwise, Top-Left: GTS 450, GTX 460 768MB, GTX 550 Ti

At this point NVIDIA tells us it’s a bug and that it should be fixed by the end of the month, but until then we’re left with our share of doubts. Although this doesn’t lead to any kind of faulty operation, it’s a pretty big bug to slip through NVIDIA’s QA process, which makes it all the more surprising.

In the meantime we did do some testing against the more limited memory capacity of the GTX 550. At this point the results are inconclusive at best. Using NVIDIA’s Bandwidth Test CUDA sample program, a simple test that measures memcopy bandwidth of the GPU, we tested the GTS 450, GTX 460 768MB, GTX 460 1GB, and GTX 550 Ti at both stock and normalized (GTX 460) clocks. The results were inconclusive – the test seems to scale with core clocks far more than with memory bandwidth – which may be another bug, or an artifact of the program having originally been written pre-Fermi. In any case here is the data, but we have low confidence in it.

As it stands the test shows almost no gain over the GTS 450 at normalized clocks; this doesn’t make a great deal of sense under any memory interleaving scheme, hence the low confidence. If and when all the bugs that may be causing this are fixed, we’ll definitely be revisiting the issue to try to better pin down how NVIDIA is doing memory interleaving.

Comments

  • HangFire - Tuesday, March 15, 2011 - link

    Advertisers hate it when you see how competitive older offerings are with new stuff. So, little-used features like those in DirectX 11 are used to force out comparisons with older cards that still deliver great frame rates and value, and to keep users from holding off on upgrades.

    We won this battle for a while and AT had a few older cards it included in its benchmarks. Now it's back to nothing but the latest.
  • morphologia - Tuesday, March 15, 2011 - link

    "Nothing but the latest?" The 4870 and 4870X2 shown in this comparison are hardly current. I suppose the 4870 is less likely to outperform than the 4890 is, but the X2 makes an even stronger showing. Still does not make sense.

    Also, on an unrelated note, it looked like the 6990 was dropping out of various comparison scales without explanation. BattleForge 1680x1050, for example. 6990 dominated the 1920x1200 but was inexplicably absent from 1680x1050, instead the 580 topped that chart. What's up with that??
  • Ryan Smith - Tuesday, March 15, 2011 - link

    The 6990 is not on any of the 1680 benchmarks. It's already CPU limited at 1920; at 1680 it's useless data since no one is going to use it at that resolution.
  • Ryan Smith - Tuesday, March 15, 2011 - link

    Due to the amount of time it takes to benchmark (and rebenchmark) GPUs, it's necessary to keep a truncated list, and from there not every card actually makes it into the article (and this is why we have GPU Bench). As such I focus on the current and previous generation of GPUs, while throwing in a sampling of 3rd & 4th generation GPUs as a baseline.

    I specifically pick these older GPUs based on architecture and relative performance - the idea being that while we don't have every GPU in the system, if it's a few years old it's well established how other minor variations of that GPU perform relative to the one in our results database. So in this case the 4870 is in there both because it's easy to visualize where the 4850/4890 would be relative to it, and because it was easily the most popular 4800 card.
  • morphologia - Tuesday, March 15, 2011 - link

    Seems like the 4870X2 was a bit of a spoiler, seeing as how it trumped a few of even the current generation, though it too was dropping in and out of the bar charts with no explanation. If you are going to include it at all, there should be more consistency. Otherwise it looks like ranking/stat doctoring.
  • 7Enigma - Thursday, March 17, 2011 - link

    Ryan's already mentioned why. It's a dual-GPU card that at the time was likely not tested at the low resolutions this particular article used for these lower-end cards. Likely 1920x1200 (or 1080) was the lowest this card was benchmarked at. I applaud Anandtech for including the data they have, and as mentioned you can use Bench to compare to your heart's desire. Bottom line: it is unlikely someone is gaming at less than 24" resolutions with a 4870X2, and if they are they can use Bench for that particular purpose.

    These guys have enough to do without going back and retesting cards from years ago. I'm just glad the data is in there.
  • nwarawa - Tuesday, March 15, 2011 - link

    I didn't hear much complaining about the GTX460 768MB all this time: all the reviews were heralding its value. Now we have an even less powerful GPU, and 768MB suddenly becomes an issue? The heck with that. 768MB should be the standard configuration for this card, with a MSRP of $129. If you want high resolutions with AA, you should be getting a more powerful GPU as well. Nvidia should use a 768MB model of the GTX550 to phase out the 768MB GTX460, keep the 1GB GTX460 for awhile, and encourage more brands to bin their GTX560Ti's and make some 2GB models (I know Palit/Gainward does one, but no availability where I live). An overclocked 2GB GTX560Ti would be handy in a handful of games (GTA4 immediately comes to mind), and would compete well with a 6950... leaving the GTX570 to dance with the 6970, and the GTX580 to maintain its single-chip lead.
  • HangFire - Wednesday, March 16, 2011 - link

    768MB did not suddenly become an issue. Previous AT articles on the two 460's have repeated warnings that 768MB would soon not be enough memory.

    Agreed that if you lower the price enough, 768MB becomes "enough" as you are unlikely to be driving high resolutions with the corresponding large numbers of in-memory high resolution textures with a low-end card. At moderate resolutions, 768MB is enough.
  • Belard - Tuesday, March 15, 2011 - link

    (AMD - you still suck for naming the SLOWER 6870 cards to replace the 5870s etc)

    LOL - this would be FUNNY if it wasn't so sad.

    1 - The "State of the art" 550 Ti (Total idiot) card is 0~5% faster than the 1+ year old ATI 5770. Really, other than for reference, 1280x1024 scores are useless for today's monitors. $120~140 means buying a 20" 1920x1080 monitor, $160~200 is a 21~22" model. I'm missing the 1920x1200 since it's not so bloody narrow. I'd love to see a 26~27" that does 2560 x 1600 on the market.

    So when comparing the results at 1920x1080, which is a STANDARD for today, the 550 is sometimes 0~3 fps faster, sometimes slower.

    2 - Price!? The 5770 is easily 1/3rd cheaper going for $100~125 vs, $150~160.

    3 - Stupid model names!? GeForce was given to the series. So WTF is GFX good for? If the 550 is almost the bottom end... why not GTS like the GTS 450 or GTS 250? There is no consistency. It doesn't denote feature sets.
    "TI" okay... What is the difference between a TI and a NON TI card? Oh yeah, the letters on the box and in bios, nothing else. Why bother?

    We know Nvidia will most likely skip the 600 series (What happened to the 300s?) so they too can be "7s" with ATI. So we'll see:
    Nvidia GeForce GT 720 mx
    Nvidia Geforce GTS 740 Pro
    Nvidia Geforce GTX 780 Ultra

    The Geforce 550 or "GF550" is ALL we need to know what the product is.

    4 - ATI 6850... it should have been included in this benchmark since it's in the same price range. Newegg has them for $150~180 ($160 avg). It would really show what people are paying for. The 6850 is about 10fps faster than the 5770/GF550.

    5 - GF 460-768 price is $130~160.. again, about 10fps faster than the GF550. But oh yeah, the 550 replaces the older and faster card. Hate it when that happens!

    Think I'll hold on to my ATI 4670 until the AMD 7670/7770 comes out... I want the performance of a 5870/6950 with the heat/noise and power issues of a 5770 at a price under $180.
  • phoible_123 - Tuesday, March 15, 2011 - link

    I find it kind of interesting that nVidia's price points for their mid-range cards are higher this time around.

    The GTX460 started at $220 IIRC (and had its price cut pretty quickly), while the GTX560 is $259. The price for the 460 was pretty killer, but by the time the 560 came out, the pricing was pretty ho hum. If they had launched it at the same price level as the 460, AMD wouldn't have been able to compete. Granted, I'm sure they priced it that way to keep margins high, but this is a process improvement rather than an all-new chip (basically it's the third-gen gf100)...

    The GTS450 was $130, while the GTX550 "TI" is $150. And when the GTS450 came out, the value prop wasn't that good (when compared to the 460). It's like half the card for only a little less money.

    I recently picked up a GTX460 768MB for $90 after rebates.

    It's kind of like the radeon 48xx vs 57xx comparison (less card for more money).
