GTX 550 Ti’s Quirk: 1GB Of VRAM On A 192-bit Bus

One thing that has always set NVIDIA apart from AMD is their willingness to use non-power-of-two memory bus widths. AMD sticks to 256/128/64-bit buses, while NVIDIA has used those along with more unusual widths such as 384, 320, and 192 bits. A wider bus lets NVIDIA tap more memory bandwidth, but they also usually run their memory slower than AMD does on comparable products, so NVIDIA's bandwidth advantage isn't as pronounced as the bus width alone would suggest. The more immediate ramification, however, is that NVIDIA ends up with equally odd memory sizes: 1536MB, 1280MB, and 768MB.

768MB in particular can be problematic. When the GTX 460 launched, NVIDIA went with two flavors, 1GB and 768MB, the difference being how many memory controller/ROP blocks were enabled, which in turn determined how much RAM was attached. 768MB just isn't very big these days – it's only as large as NVIDIA's top-of-the-line card from the end of 2006. At high resolutions with anti-aliasing and high-quality textures it's easy to swamp a card, making 1GB the preferred size for practically everything from $250 on down. So when NVIDIA has a 768MB card and AMD has a 1GB card, NVIDIA has a definite marketing problem and a potential performance problem.

Video Card Bus Width Comparison
NVIDIA           Bus Width    AMD              Bus Width
GTX 570          320-bit      Radeon HD 6970   256-bit
GTX 560 Ti       256-bit      Radeon HD 6950   256-bit
GTX 460 768MB    192-bit      Radeon HD 6850   256-bit
GTX 550 Ti       192-bit      Radeon HD 5770   128-bit
GTS 450          128-bit      Radeon HD 5750   128-bit

NVIDIA's usual solution is to outfit cards with more RAM to match the wider bus, which is why we've seen 1536MB and 1280MB cards going against 1GB AMD cards. With cheaper cards, though, the extra memory (or higher density memory) is an extra cost that cuts into margins. So what do you do when you have an oddly sized 192-bit memory bus on a midrange card? For the GTS 450, NVIDIA disabled a memory controller to bring it down to 128-bit; for the GTX 550 Ti, however, they needed a different approach if they wanted a 192-bit bus without settling for 768MB of memory or driving up costs with 1536MB. NVIDIA's solution was to put 1GB on a 192-bit card anyway, and this is the GTX 550 Ti's defining feature from a technical perspective.

Under ideal circumstances, when interleaving memory banks you want the banks to be of equal capacity; this allows you to distribute most memory operations equally among all banks throughout the entire memory space. Video cards, with their non-removable memory, have done this for ages, while full computers with their replaceable DIMMs have had to work with other layouts. Thus computers have supported additional interleaving options beyond symmetrical interleaving, most notably "flex" interleaving, where one bank is larger than the other.

It's this technique that NVIDIA has adopted for the GTX 550 Ti. GF116 has three 64-bit memory controllers, each of which is attached to a pair of GDDR5 chips running in 32-bit mode. All told this is a 6-chip configuration, with NVIDIA using four 1Gb chips and two 2Gb chips. In the case of our Zotac card – and presumably all GTX 550 Ti cards – the memory is laid out as illustrated above, with the 1Gb devices split among two of the memory controllers, while both 2Gb devices sit on the third.
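The capacity arithmetic behind this layout is worth spelling out. This quick sketch (not NVIDIA code, just the math implied by the chip configuration above, where a 1Gb device holds 128MB) confirms that the three controllers sum to 1GB:

```python
# Per-controller capacity for the GTX 550 Ti's mixed-density layout:
# each 64-bit controller drives two GDDR5 chips in 32-bit mode.
GBIT_MB = 128  # a 1Gb (gigabit) GDDR5 device holds 128MB

controllers = [
    2 * 1 * GBIT_MB,  # controller 0: two 1Gb chips -> 256MB
    2 * 1 * GBIT_MB,  # controller 1: two 1Gb chips -> 256MB
    2 * 2 * GBIT_MB,  # controller 2: two 2Gb chips -> 512MB
]

total_mb = sum(controllers)
print(controllers, total_mb)  # [256, 256, 512] 1024
```

Note that the first 768MB can be spread evenly (256MB per controller), while the final 256MB exists only on the third controller.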

This marks the first time we've seen such a memory configuration on a video card, and as such it raises a number of questions. Our primary concern at this point in time is performance, as it's mathematically impossible to organize the memory in such a way that the card always has access to its full theoretical memory bandwidth. The best-case scenario is that the entire 192-bit bus is in use, giving the card 98.5GB/sec of memory bandwidth (192 bits × 4104MHz ÷ 8), while the worst-case scenario is that only one 64-bit memory controller is in use, reducing memory bandwidth to a much more modest 32.8GB/sec.
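The two figures fall straight out of the standard bandwidth formula, as this small sketch shows:

```python
# Theoretical bandwidth = bus width (bits) / 8 bytes-per-bit-lane
# * effective data rate (MHz), giving MB/s; divide by 1000 for GB/s.
def bandwidth_gbps(bus_width_bits, data_rate_mhz):
    return bus_width_bits / 8 * data_rate_mhz / 1000

best = bandwidth_gbps(192, 4104)   # all three 64-bit controllers active
worst = bandwidth_gbps(64, 4104)   # only one 64-bit controller active
print(round(best, 1), round(worst, 1))  # 98.5 32.8
```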

How NVIDIA spreads out memory accesses will have a great deal of impact on when we hit these scenarios, but at this time NVIDIA is labeling the internal details of its memory bus a competitive advantage, meaning they're unwilling to share the details of its operation with us. Thus we're largely dealing with a black box, one we're going to have to poke and prod at to try to determine how NVIDIA is distributing memory operations.

Our base assumption is that NVIDIA is using a memory interleaving mode similar to the "flex" modes on desktop computers, which means lower memory addresses are mapped across all three memory controllers, while higher addresses are mapped to the remaining RAM capacity on the third memory controller. As such, NVIDIA would have the full 98.5GB/sec of memory bandwidth available across the first 768MB, while the last 256MB would be much more painful at 32.8GB/sec. This isn't the only way to distribute memory operations, however, and indeed NVIDIA doesn't have to use one method at a time thanks to the three memory controllers, so the truth is likely more complex.
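Our assumed flex-style address map can be sketched in a few lines. To be clear, this is purely hypothetical: the stripe granularity (256 bytes here) and the mapping function are our guesses, not NVIDIA's disclosed scheme.

```python
# Hypothetical "flex"-style address map for a 1GB / 192-bit card:
# the first 768MB is striped across all three controllers, and the
# final 256MB lives entirely on controller 2 (the 512MB controller).
# Stripe size is an assumption; NVIDIA hasn't disclosed the real scheme.
MB = 1 << 20
STRIPE = 256                      # bytes per interleave stripe (guess)
SYMMETRIC_END = 768 * MB          # end of the evenly interleaved region

def controller_for(addr):
    if addr < SYMMETRIC_END:
        return (addr // STRIPE) % 3   # interleaved region: full 192-bit bus
    return 2                          # tail region: lone 64-bit controller

print(controller_for(0), controller_for(STRIPE), controller_for(800 * MB))  # 0 1 2
```

Under this mapping, any access that lands above the 768MB mark is confined to one controller, which is exactly why the last 256MB would run at roughly a third of peak bandwidth.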

Given the black box nature of the GTX 550 Ti's memory access methods, we decided to poke at things in the most practical manner available: CUDA. GPGPU operation makes it easy to write algorithms that test memory across the entire address space, which in theory would make it easy to determine the GTX 550 Ti's actual memory bandwidth, and whether it's consistent across the entire address space. Furthermore we have another very similar NVIDIA card with a 192-bit memory bus on hand – the GTX 460 768MB – so it would be easy to compare the two and see how a pure 192-bit card compares.
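The measurement methodology itself is straightforward: time a large copy and divide bytes by seconds. This host-side Python sketch illustrates the same idea as NVIDIA's CUDA bandwidthTest sample, just against system RAM rather than GPU memory (so the numbers it prints reflect your host, not any video card):

```python
import time

def copy_bandwidth_gbps(size_mb=256, iterations=5):
    """Time a bulk memory copy and report the best GB/s over several runs."""
    src = bytearray(size_mb * (1 << 20))
    best = 0.0
    for _ in range(iterations):
        start = time.perf_counter()
        dst = bytes(src)               # one full copy of the buffer
        elapsed = time.perf_counter() - start
        best = max(best, len(dst) / elapsed / 1e9)
    return best

print(f"{copy_bandwidth_gbps():.1f} GB/s")
```

A GPU version would run the same loop over device-to-device copies at increasing offsets, which is how one would look for the bandwidth cliff past the 768MB mark.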

We ran into one roadblock, however: apparently no one told the CUDA group that the GTX 550 Ti was going to use mixed-density memory. As it stands, CUDA (and other APIs built upon it such as OpenCL and DirectCompute) can only see 768MB, minus whatever memory is already in use. While this lends support to our theory that NVIDIA is using flex-mode interleaving, it makes the theory nearly impossible to test at this time, as graphics operations aren't nearly flexible enough (and are much more prone to caching) to do the job.


CUDA-Z: CUDA Available Memory. Clockwise, Top-Left: GTS 450, GTX 460 768MB, GTX 550 Ti

At this point NVIDIA tells us it's a bug that should be fixed by the end of the month, but until then we're left with our share of doubts. Although it doesn't cause any kind of faulty operation, this is a pretty big bug to slip through NVIDIA's QA process, which makes it all the more surprising.

In the meantime we did do some testing within the more limited memory capacity visible on the GTX 550 Ti. Using NVIDIA's Bandwidth Test CUDA sample program, a simple test that measures the memcopy bandwidth of the GPU, we tested the GTS 450, GTX 460 768MB, GTX 460 1GB, and GTX 550 Ti at both stock and normalized (GTX 460) clocks. The results were inconclusive at best – the test seems to scale with core clock far more than with memory bandwidth – which may be another bug, or an artifact of the program having originally been written pre-Fermi. In any case, here is the data, but we have low confidence in it.

As it stands the test shows almost no gain over the GTS 450 at normalized clocks; this doesn’t make a great deal of sense under any memory interleaving scheme, hence the low confidence. If and when all the bugs that may be causing this are fixed, we’ll definitely be revisiting the issue to try to better pin down how NVIDIA is doing memory interleaving.

79 Comments


  • Soldier1969 - Wednesday, March 16, 2011 - link

Poor man's card, come back when you get at least a 580 or better...
  • valenti - Thursday, March 17, 2011 - link

    Ryan, can you explain where the nodes per day numbers come from?

I spend a fair amount of time hanging around folding sites, and I can't think of anybody else that uses the nodes per day metric. Most people use PPD (points per day), easily gathered from an F@H statistics program such as HFM.net.

I'm unsure how to convert nodes/day to PPD, if that's even possible. In actual practice, I find that a 450 card nets almost 10,000 PPD, while a 460 gets about 12,000. I prefer the 450 after taking into account price and power needs.

    You might want to search on "capturing a WU" to read about how to copy a protein's work files, allowing you to run the same protein for each card.
  • suddenone - Thursday, March 17, 2011 - link

How good this card is depends on what monitor you plan on using. Anyone who has a small monitor and plans to keep it for at least three years might be happy with this card. I am puzzled to see fps over 60 on a standard 60Hz monitor. My GTX 460 ran fine until I upgraded to a 24 inch 1080p monitor. I sold the card on eBay and bought the GTX 570 (big gun). The GTX 570 can run any of my games at over 30fps minimum with all effects at maximum. Peace out.
  • ol1bit - Thursday, March 17, 2011 - link

I bought two 460s 6 months ago or so for $149 each, and they beat the 550 in every category.

This is a bad launch by NVIDIA. The old product is faster at the same price or lower.
  • meatfestival - Monday, March 21, 2011 - link

    Not sure if it's already been posted, but although it does indeed use id Tech 4, Raven wrote a custom DX9 renderer for it.

    The multiplayer component (based on the Quake Wars fork of id Tech 4) is on the old OpenGL renderer.
  • ClagMaster - Friday, March 25, 2011 - link

The GF116 is much optimized and offers improved performance over the GTS 450 it is destined to replace.

However, the GTX 550 Ti is a low-end replacement for the GTS 450 and is priced too high. I wish NVIDIA had called it the GT 550 and reserved the Ti designation for something truly high performance.

For $20 more, I can get an ATI 6850 that gives me 25% more performance for 10% less power consumption.
  • chrcoluk - Friday, October 21, 2011 - link

I think people are being very harsh on the card.

It's slower than the older 460 in these benchmarks, but the gap isn't a gulf, and in addition people are completely discounting the power factor. The major problem with graphics cards today is that they use too much power; Intel has managed to reduce power draw while increasing performance, yet NVIDIA can't do the same.

Also, no consideration is given to what people are upgrading from. I currently have an 8800GT and am only now considering upgrading; a 550 Ti would double performance while using a little more power under load than the 8800GT, and only a third at idle. Buying a 460 would give me a bigger performance boost, but I'd need to connect a second power cable (wtf???) and it uses significantly more power than the 550, about 50% more. So for watts vs. performance the 550 beats the 460. But it seems people here and on most other sites give power consumption zero importance; maybe they don't pay their own power bills?

The 550 Ti will have many sales because NVIDIA knows most people don't upgrade every year, but more likely every 3+ years, so the 550 Ti doesn't have to beat the 460; it just has to be significantly better than DX9 and DX8 cards at a good price point, while also not forcing someone to buy a beefier PSU as well (not considered here when comparing prices). The 550 Ti will double my current performance, and I'd guess the 460 1GB would be about an extra 20% or so on top of that. The amount of video RAM is also important; texture-heavy games will saturate 768MB, so in those scenarios the 550 is a better choice than the 768MB 460 model.
  • xKerberos - Tuesday, January 17, 2012 - link

Oh, finally someone who makes some sense! In my country, GTX 460s are more expensive than GTX 550 Tis, and my overclocked Gigabyte 9800 GT was struggling to run Battlefield 3, so I picked up an MSI GTX 550 Ti with 1GB of RAM, factory overclocked. It runs BF3 like a dream at high. And seeing how it munches on VRAM, I'm surprised how people with 768MB cards run that game.

Granted, I have a tiny 1280x1024 Dell UltraSharp 19" monitor, so I don't need a high-end card to max games out. My 9800 GT was doing a fine job until BF3 came out. I also had to fork out for 8GB of DDR2 RAM to run that beast properly.

Anyway, what I like about my new card is that it was cheap (half the price of a GTX 560, although at half the performance), it runs very cool (36C at idle in a warm tropical country! 60C max under load), it's very quiet, and it does the job. I won't have to upgrade for a while yet!
  • UpStateMike - Wednesday, February 22, 2012 - link

I have to agree with this. About a month ago I began a project build to replace an ancient dinosaur that was maxed out and tired.

I kept my good 630W PSU, a 24" LED 1080p ViewSonic monitor, and a 9800 GTX card that I figured I could use in the new build.

I started with a new case – a Rosewill Challenger – and built around an i5 Sandy Bridge 2500K on an ASUS P8Z68-V Pro motherboard. I have 8GB of G.Skill DDR3-1333 at the moment.

Now that this is all up and running, instead of my old system being the bottleneck for the video card, the card is now the bottleneck.

I began by trying to cheap out, stay in the $80 range, and OC the cards mentioned here, but I liked the lower power use of the 550 Ti (I'm a medium gamer, but I wanted a PC that could handle any game I might get in the future). Although I'm at $120 for this card, my guess is that when I go for my next upgrade phase in about 6 months or so, I can wait for a good deal to come along, get another one to SLI, and add another 8GB of memory.

So for me, this is a big upgrade that I can build on with SLI, and it should keep me happy for the next couple of years. I do a lot of photo editing and whatnot, so this greatly helps with that, and I can still game as I get time. I can appreciate that the differences are minor for anyone who has bought a card in the last year, but coming from a 9800 GTX I'm very happy.
