NVIDIA's 1.4 Billion Transistor GPU: GT200 Arrives as the GeForce GTX 280 & 260by Anand Lal Shimpi & Derek Wilson on June 16, 2008 9:00 AM EST
- Posted in
Folding@home Now on NVIDIA
Folding@home, for those who don't know, is a distributed computing app designed to help researchers better understand the process of protien folding. Knowing more about how protiens assemble themselves can help us better understand many diseases such as alzheimers, but protien folding is very complex and takes a long time to simulate. the problem is made much easier by breaking it up into smaller parts and allowing many people to work on the problem.
Most of this work has been done on the CPU, but PS3 and AMD R5xx GPUs have been able to fold for a while now. Recently support for AMD's R6xx lineup was added as well. NVIDIA GPUs haven't been enabled to run folding@home until now (or very soon anyway). Stanford has finally implemented a version of folding@home with CUDA support that will allow all G80 and higher hardware to run the client.
We've had the chance for the past couple days to play around with a pre-beta version of the folding client, and running folding on NVIDIA hardwarwe is definitly very fast. Work units and protiens are different on CPUs and GPUs because the hardware is suited to different tasks, but to give some perspective a quadcore CPU could simulate tens of nanoseconds of a protien fold, while GPUs can simulate hundreds.
While we don't have the ability to bring you any useful comparative benchmarks right now, Stanford is working on implemeting some standard test cases that can be run on different hardware. This will help us actually compare the performance of different hardware in a meaningful way. Right now giving you numbers to compare CPUs, PS3s, AMD and NVIDIA GPUs would be like directly comparing framerates from different games on different hardware as if they were related.
What we will say is that NVIDIA predicts that the GTX 280 will be capable of simulating something between 5 and 6 hundred nanoseconds of folding per day while CPUs are going to be two orders of magnitude slower. They also show the GTX 280 handily ahead of any current AMD solutions by high margins, but until we can test it ourselves we really don't want to put a finer point on it.
In our tests, we've actually seen the GT200 folding client perform at between 600 and 850ns per day (using the timestamps in the log file to determine performance), so we are quite impressed. Work units complete about every 20 to 25 minutes depending on the protien and whether or not the viewer is running (which does have a significant impact since the calculations and the display are both running on the GPU).
Hardware H.264 Encoding
For years now both ATI and NVIDIA have been boasting about how much better their GPUs were for video encoding than Intel's CPUs. They promised multi-fold speedups in performance but never delivered, so we've been stuck encoding and transcoding videos on CPUs.
With the GT200, NVIDIA has taken one step closer to actually delivering on these promises. We got a copy of a severely limited beta of Elemental Technologies' BadaBOOM Media Converter:
The media converter currently only works on the GeForce GTX 280 and GTX 260, but when it ships there will be support for G80/G92 based GPUs as well. The arguably more frustrating issue with it today is its lack of support for CPU-based encoding, so we can't actually make an apples-to-apples comparison to CPUs or other GPUs. The demo will also only encode up to 2 minutes of video.
With that out of the way however, BadaBOOM will perform H.264 encoding on your GPU. There is still a significant amount of work being done on the CPU during the encode, our Core 2 Extreme QX9770 was at 20 - 30% CPU utilization during the entire encode process, but it's better than the 50 - 100% it would normally be at if we were encoding on the CPU alone.
Then there's the speedup. We can't perform a true apples-to-apples comparison since we can't use BadaBOOM's H.264 encoder on anything else, but compared to using the open source x264 encoder the performance speedup is pretty good. We used AutoMKV and played with its presets to vary quality:
In the worst case scenario, the GTX 280 is around 40% faster than encoding on Intel's fastest CPU alone. In the best case scenario however, the GTX 280 can complete the encoding task in 1/10th the time.
We're not sure where a true apples-to-apples comparison would end up, but somewhere between those two extremes is probably a good guesstimate. Hopefully we'll see more examples of GPU based video encoder applications in the future as there seems to be a lot of potential here. Given how long it takes to encode a Blu-ray movie, we needn't even explain why it's necessary.