Musing About Memory Bandwidth & The Test

As we discussed in our introduction, NVIDIA is launching the GeForce GT 640 exclusively as a DDR3 part. The resulting lack of memory bandwidth is going to hold back the performance of the card, and while we don’t have a GDDR5 card at this time, the performance impact will still be easy to see once we jump into our gaming performance section and compare it to other cards like the GTS 450.

In the meantime, however, we wanted to better visualize just how little memory bandwidth the DDR3 GT 640 has. It’s one thing to say that the card has 28GB/sec of memory bandwidth, but how does that compare to other cards? To answer that we drew up a couple of graphs based on the ratio of theoretical memory bandwidth to theoretical performance.

The first graph is the ratio of memory bandwidth to color ROP throughput (bytes per color operation), which is an entirely synthetic metric but a great illustration of the importance of memory bandwidth. Render operations are the biggest consumer of memory bandwidth, so this is where DDR3 cards typically choke.

Because of the nature of this graph, cards are all over the place, with particularly unusual configurations (such as the 4 ROP GT 440) appearing near the top. Still, high-end cards such as the GTX 680 and Radeon HD 7970 have among the highest ratios of memory bandwidth to color render operations, while lower-end cards generally have a lower ratio. At 1.97 B/cOP, the DDR3 GT 640 has the lowest ratio by far, at only 66% of the ratio found on the next-lowest card, the GeForce GT 440 OEM. The fact of the matter is that because of its high clockspeed and 16 ROPs, the DDR3 GT 640 has a lower ratio than any card before it.
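The arithmetic behind this first ratio can be sketched in a few lines of Python. This is a rough illustration rather than the exact method behind our chart: the 28GB/sec bandwidth and 16 ROPs come from the text above, while the 900MHz core clock and the assumption of one color operation per ROP per clock are inputs supplied for the example. The function name is hypothetical.

```python
def bytes_per_color_op(bandwidth_gb_s, rops, core_clock_mhz):
    """Theoretical memory bandwidth divided by theoretical ROP throughput."""
    # Assumption: each ROP retires one color operation per core clock.
    color_ops_per_sec = rops * core_clock_mhz * 1e6
    return (bandwidth_gb_s * 1e9) / color_ops_per_sec


# DDR3 GT 640: 28GB/sec and 16 ROPs are from the article text;
# the 900MHz core clock is an assumed value for illustration.
gt640_b_per_cop = bytes_per_color_op(28.0, 16, 900.0)
print(f"GT 640 DDR3: {gt640_b_per_cop:.2f} B/cOP")
```

With these rounded inputs the result lands a little under 2 B/cOP, close to the 1.97 figure quoted above; the small gap comes from rounding in the bandwidth and clock values.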

Our second graph is the ratio of memory bandwidth to shader operations (bytes per FLOP). Shaders aren’t nearly as bandwidth constrained as ROPs thanks to liberal use of register files and caches, but we wanted to take a look at more than just one ratio. Compared to our ROP chart the GT 640 isn’t nearly as much of an outlier, but it still has the lowest ratio of memory bandwidth per FLOP out of all of the cards in our charts.
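The bytes-per-FLOP ratio follows the same pattern. The sketch below is a hedged illustration only: the 28GB/sec figure is from the article, but the 384 shader cores, the 900MHz shader clock, and the convention of counting 2 FLOPs per core per clock (one fused multiply-add) are assumptions supplied for the example, and the function name is hypothetical.

```python
def bytes_per_flop(bandwidth_gb_s, shader_cores, shader_clock_mhz, flops_per_clock=2):
    """Theoretical memory bandwidth divided by theoretical shader throughput."""
    # Assumption: flops_per_clock=2 counts one fused multiply-add per core per clock.
    flops_per_sec = shader_cores * flops_per_clock * shader_clock_mhz * 1e6
    return (bandwidth_gb_s * 1e9) / flops_per_sec


# 28GB/sec is from the article; 384 cores at 900MHz are assumed values.
gt640_b_per_flop = bytes_per_flop(28.0, 384, 900.0)
print(f"GT 640 DDR3: {gt640_b_per_flop:.3f} B/FLOP")
```

Under these assumptions the card has only a few hundredths of a byte of memory bandwidth available per theoretical FLOP, which is why register files and caches matter so much for shader work.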

Now to be clear, not all of this is NVIDIA’s fault. Memory speeds have not kept pace with Moore’s Law, so GPU performance has been growing faster than memory speeds for quite some time, leading to a general downwards trend in these ratios. But the GT 640 is a special card in this respect, in that there has never been a card this starved for memory bandwidth. This is compounded by the fact that while GDDR5 speeds have at least been increasing at a modest rate over the years as GPU memory controllers and memory chip production have improved, DDR3 memory speeds have been locked in the 1.6GHz-1.8GHz range for years. Simply put, the gap between GDDR5 and DDR3 has never been greater. Even a conservative memory clock of 4.5GHz would give a GDDR5 card 2.5 times the memory bandwidth of a typical DDR3 card.
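The 2.5x figure falls straight out of the peak bandwidth formula, which a short Python sketch makes concrete. The 1.8GHz and 4.5GHz effective memory clocks are the ones quoted above; the 128-bit bus width is an assumption held constant on both sides so that only the memory type differs, and the function name is hypothetical.

```python
def memory_bandwidth_gb_s(bus_width_bits, effective_clock_ghz):
    """Peak bandwidth: bytes moved per transfer times transfers per second."""
    return (bus_width_bits / 8) * effective_clock_ghz


BUS_WIDTH_BITS = 128  # assumed; kept identical for both memory types

ddr3 = memory_bandwidth_gb_s(BUS_WIDTH_BITS, 1.8)   # top of the DDR3 range above
gddr5 = memory_bandwidth_gb_s(BUS_WIDTH_BITS, 4.5)  # the "conservative" GDDR5 clock
print(f"DDR3:  {ddr3:.1f} GB/sec")
print(f"GDDR5: {gddr5:.1f} GB/sec ({gddr5 / ddr3:.1f}x)")
```

Because the bus width cancels out of the ratio, the same 2.5x advantage holds for any bus width as long as both cards share it.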

Of course there’s still a place in the world for DDR3 cards, particularly in very low power situations, but that place is shrinking in size every day. If and when it arrives, we expect that the GDDR5 GT 640 will quickly trounce the DDR3 version in virtually all gaming scenarios. DDR3 for a card this power hungry (relatively speaking) and with this many ROPs just doesn’t look like it makes a lot of sense.

The Test

For our test we’re using the latest NVIDIA drivers at the time our benchmarks were taken (301.42), and for AMD’s cards we’re using the new Catalyst 12.6 betas. For analysis purposes we’ve thrown in a couple of additional cards that we don’t normally test, such as the GDDR5 GeForce GT 240. The GDDR5 GT 240 has 43.5GB/sec of memory bandwidth but with a far older and less powerful GPU, which makes for an interesting comparison on progress.

CPU: Intel Core i7-3960X @ 4.3GHz
Motherboard: EVGA X79 SLI
Chipset Drivers: Intel 9.2.3.1022
Power Supply: Antec True Power Quattro 1200
Hard Disk: Samsung 470 (256GB)
Memory: G.Skill Ripjaws DDR3-1867 4 x 4GB (8-10-9-26)
Case: Thermaltake Spedo Advance
Video Cards: AMD Radeon HD 7770
             AMD Radeon HD 7750-800
             AMD Radeon HD 6670
             AMD Radeon HD 5750
             NVIDIA GeForce GT 640 DDR3
             NVIDIA GeForce GTX 550 Ti
             NVIDIA GeForce GTS 450
             NVIDIA GeForce GT 440
             NVIDIA GeForce GT 240
Video Drivers: NVIDIA ForceWare 301.42
               AMD Catalyst 12.6 Beta
OS: Windows 7 Ultimate 64-bit




Comments

  • extide - Wednesday, June 20, 2012 - link

    For posting folding benchmarks! A lot of people really appreciate that!
  • Zink - Wednesday, June 20, 2012 - link

    No one else uses your benchmarking tool and it doesn't always correlate to performance with current F@H projects but that is the only reason I care about GPUs.
  • Marlin1975 - Wednesday, June 20, 2012 - link

    Good design if it had DDR5. If they can do 2gig of DDR5 then it'd be a great mid-price card.
  • Homeles - Wednesday, June 20, 2012 - link

    It would still be terrible until the price dropped.
  • Samus - Thursday, June 21, 2012 - link

    There's no reason this wouldn't be similar in speed to a GTX460 if it had DDR5. The only difference would be 128-bit vs 192-bit memory bus, everything else would be an advantage: same number of cores, substantially higher clock speed, lower power consumption increasing overclocking headroom, etc.
  • MrSpadge - Thursday, June 21, 2012 - link

    You forget: substantially lower shader clock speed, more coarse shader grouping -> more difficult to use them all at once, and software scheduling -> need a better compiler, can't do runtime optimizations.
  • t_case - Wednesday, June 20, 2012 - link

    So who has the Sony VPL-vw1000ES? Now that's a nice projector... only roughly the price of a new car heh.
  • stephenasmith - Wednesday, June 20, 2012 - link

    I love me some painfully slow gaming!
  • nitrousoxide - Wednesday, June 20, 2012 - link

    Just curious if the most powerful IGP can keep up with entry-level Kepler
  • Roland00Address - Wednesday, June 20, 2012 - link

    But this should get you an idea of what performance you would be getting with llano. (Numbers taken from Llano review that appeared 12 months ago so drivers will be old.)

    Crysis Warhead 1680x1050 performance quality
    A8-6550D with 1600 mhz memory
    58.8 fps
    A8-6550D with 1866 mhz memory
    62.5 fps
    GT 640
    99.8 fps

    This makes the 640 about 69.7% faster than a non-overclocked Llano (people are going to get 1600mhz memory).
