Original Link: http://www.anandtech.com/show/2095
The GPU Advances: ATI's Stream Processing & Folding@Home
by Ryan Smith on September 30, 2006 8:00 PM EST
In the continual progression of GPU technology, we've seen GPUs become increasingly useful at generalized tasks as they have added flexibility for game designers to implement more customized and more expansive graphical effects. What started out as a simple fixed-function rendering process, where texture and vertex data were fed into a GPU and pixels were pushed out, has evolved into a system where a great deal of processing takes place inside the GPU. The modern GPU can be used to store and manipulate data in ways that go far beyond just quickly figuring out what happens when multiple textures are mixed together.
Today's GPUs have become increasingly similar to CPUs in the range of tasks they can perform, while still specializing in only a subset of them. Starting with Shader Model 2.0 on cards like the Radeon 9700 and continuing with Shader Model 3.0 on today's latest cards, GPUs have become floating-point powerhouses able to perform most floating-point calculations many times faster than a CPU, a necessity since 3D rendering is a very FP-intensive process. At the same time, GPUs have gained programming constructs such as looping and branching, abilities previously found only on CPUs but crucial for programmers to use GPU resources effectively. In short, today's GPUs have in many ways become extremely powerful floating-point processors that have been used for 3D rendering but little else.
Both ATI and NVIDIA have been looking to put the expanded capabilities of their GPUs to good use, with varying success. So far, apart from 3D games and applications, the only programs that have effectively tapped this power have been video-related, such as video decoders, encoders, and effects processors. In short, the GPU has been underutilized: many tasks are hungry for floating-point power without being visual in nature, and so far those programs have not used the GPU to any large degree.
Meanwhile, the academic world has spent years designing and using custom-built floating-point hardware for its own research. The class of hardware relevant to today's topic, stream processors, consists of extremely powerful floating-point processors that operate on whole blocks of data at once, whereas CPUs carry out only a handful of numerical operations at a time. CPUs have implemented some stream processing through instruction sets like SSE and 3DNow!+, but these efforts still pale in comparison to what custom hardware can do. The same progress was happening on GPUs, only in a different direction, and until recently GPUs remained untapped as anything other than a graphics tool.
Because 3D rendering is itself a streaming task, today's GPUs have evolved into their own class of stream processors, sharing much in common with researchers' custom hardware. The key difference is that GPU designers have cut a few corners, leaving out functionality that 3D rendering does not need but a custom processor might offer. By and large, however, they have produced stream processors that are just as fast as custom hardware and, thanks to economies of scale, many times cheaper than a custom design.
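To make the stream-processing idea concrete, here is a toy Python model; it is not how any GPU is actually programmed. A kernel (a hypothetical 50/50 blend of two texture samples, named `blend_kernel` for illustration) is applied independently to every element of a stream. Because no element depends on any other, the work can be divided among many parallel units. The lane count of 48 simply mirrors the number of pixel shading units on ATI's R580, and in this model the lanes run one after another rather than truly at once.

```python
from typing import Callable, List, Sequence

# A kernel takes one element from each input stream and returns one output.
Kernel = Callable[[float, float], float]


def blend_kernel(a: float, b: float) -> float:
    """Hypothetical kernel: a 50/50 mix of two texture samples."""
    return 0.5 * a + 0.5 * b


def run_scalar(kernel: Kernel, xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """CPU-style: walk the stream one element at a time."""
    return [kernel(a, b) for a, b in zip(xs, ys)]


def run_stream(kernel: Kernel, xs: Sequence[float], ys: Sequence[float], lanes: int) -> List[float]:
    """Stream-style: divide the stream among `lanes` independent units.

    Each lane takes every `lanes`-th element. Since the kernel never reads
    another element's result, the lanes could run simultaneously on real
    hardware; in this model they simply run one after another.
    """
    n = min(len(xs), len(ys))
    out = [0.0] * n
    for lane in range(lanes):
        for i in range(lane, n, lanes):
            out[i] = kernel(xs[i], ys[i])
    return out


xs = [float(i) for i in range(100)]
ys = [float(100 - i) for i in range(100)]
# Dividing the work among 48 lanes gives the same answer as the serial loop.
assert run_stream(blend_kernel, xs, ys, lanes=48) == run_scalar(blend_kernel, xs, ys)
```

Both functions produce identical results; only the division of work differs, and that independence between elements is what lets stream hardware scale by adding more units.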
This is where ATI is looking for new ideas on what to run on its GPUs as part of its new stream computing initiative. The academic world is full of such ideas, with researchers chomping at the bit to run their experiments on more than a handful of custom hardware designs. One such application, and one of the stars of today's announcement, is Folding@Home, a Stanford research project that simulates protein folding in order to unlock the secrets of diseases caused by flawed protein folding.
For several years now, Dr. Vijay Pande of Stanford has been leading the Folding@Home project to research protein folding. Without diving unnecessarily into the biology, proteins are built from basic building blocks called amino acids, and as they are produced they must go through a folding process to achieve the right shape to perform their intended function. However, protein folding can go wrong for numerous reasons, and when it does it can cause various diseases as malformed proteins wreak havoc in the body.
Although Folding@Home's research involves multiple diseases, its primary focus at this point is Alzheimer's Disease, a brain-wasting condition that mainly affects older people, who slowly lose the ability to remember things and think clearly, and which eventually leads to death. Because Alzheimer's is caused by malformed proteins impairing normal brain function, understanding exactly how it occurs, and more importantly how to prevent and cure it, requires a better understanding of how proteins fold, why they fold incorrectly, and why malformed proteins cause still more proteins to fold incorrectly.
The biggest hurdle in this line of research is that it is extremely computing intensive: a single calculation can take 1 million days (over 2700 years) on a fast CPU. On top of that, simulating the entire folding process, which can take upwards of several seconds in real time, requires running many such calculations. Even when the load is split among the processors of a supercomputer, the work is still too computing intensive to complete in any reasonable amount of time: a processor simulates about 1 nanosecond of folding per day, and even if all of the grant money given out by the United States government were spent on supercomputers, it wouldn't come close to being enough.
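The scale of the problem follows from simple arithmetic. A minimal Python sketch, assuming the article's rate of 1 nanosecond of simulated folding per processor-day and a 365.25-day year; the function names are our own:

```python
DAYS_PER_YEAR = 365.25
NS_PER_DAY = 1.0  # simulated nanoseconds per processor-day (the article's figure)


def days_to_years(days: float) -> float:
    """Convert a number of days to years."""
    return days / DAYS_PER_YEAR


def processor_days(simulated_seconds: float, ns_per_day: float = NS_PER_DAY) -> float:
    """Processor-days needed to simulate `simulated_seconds` of folding."""
    return simulated_seconds * 1e9 / ns_per_day


print(round(days_to_years(1_000_000)))            # -> 2738
print(processor_days(1.0))                        # -> 1000000000.0
print(round(days_to_years(processor_days(1.0))))  # -> 2737851
```

At that rate, one second of folding works out to a billion processor-days, or roughly 2.7 million processor-years, which is why even a large supercomputer cannot close the gap.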
This is where the "@Home" portion of Folding@Home comes in. Needing more computing power than they could ever hope to buy, the Folding@Home research team decided to spread the processing across computers throughout the world, an approach called distributed computing. They hoped that average computer users would be willing to donate spare processor cycles to the project by running the Folding@Home client, which would grab small pieces of work from the project's central servers and return the results upon completion.
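The client/server pattern described here can be sketched as a fetch, compute, and submit loop. This is a hypothetical, single-process Python model assuming an in-memory work queue; the names `Server`, `WorkUnit`, and `run_client` are illustrative, do not reflect the real Folding@Home client or its network protocol, and the "simulation" is a placeholder sum.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional


@dataclass
class WorkUnit:
    unit_id: int
    data: List[float]


class Server:
    """Stands in for the project's central servers (hypothetical)."""

    def __init__(self, units: List[WorkUnit]) -> None:
        self.pending: Deque[WorkUnit] = deque(units)
        self.results: Dict[int, float] = {}

    def fetch(self) -> Optional[WorkUnit]:
        """Hand out the next small piece of work, or None when none remain."""
        return self.pending.popleft() if self.pending else None

    def submit(self, unit_id: int, result: float) -> None:
        """Accept a finished result from a client."""
        self.results[unit_id] = result


def process(unit: WorkUnit) -> float:
    """Placeholder for the actual simulation: just sums the inputs."""
    return sum(unit.data)


def run_client(server: Server, max_units: int) -> int:
    """Fetch, compute, and submit up to `max_units` work units.

    Returns how many units this client completed.
    """
    done = 0
    while done < max_units:
        unit = server.fetch()
        if unit is None:
            break
        server.submit(unit.unit_id, process(unit))
        done += 1
    return done


if __name__ == "__main__":
    server = Server([WorkUnit(i, [float(i), 1.0]) for i in range(10)])
    # Two "volunteers" taking turns; real clients run concurrently on their own machines.
    print(run_client(server, max_units=4), run_client(server, max_units=100))
```

Each client only ever holds one small unit at a time, so any number of volunteers can join or leave without the server needing to know in advance how many there will be.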
The call for help was successful: computer owners were more than willing to donate cycles to this research in the hope of finding a way to cure diseases like Alzheimer's. Entire teams formed in a race to see who could complete the most processing, including our own Team AnandTech, and the combined power of over two hundred thousand CPUs gave the Folding@Home project over 200 teraflops (a teraflop being one trillion floating-point operations per second) of sustained performance.
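Treating both figures as round numbers, the quoted totals imply an average sustained contribution of roughly one gigaflop per CPU. This is our own back-of-the-envelope division, not a figure reported by the project:

```latex
\frac{200 \times 10^{12}\ \text{FLOP/s}}{200{,}000\ \text{CPUs}} = 10^{9}\ \text{FLOP/s} \approx 1\ \text{GFLOPS per CPU}
```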
While this was a good start, it still fell short of the power the Folding@Home research group needed for long simulation runs alongside the short-run research that the Folding@Home community could do. Additionally, as processor clock speeds have recently hit a ceiling, AMD and Intel have moved to multi-core designs, which introduce scaling problems for the Folding@Home design and are not as effective as increasing clock speeds.
Since CPUs were not getting faster quickly enough for the Folding@Home research group, and the project was still well short of its processing goal, the focus has returned to stream processors, and in turn to GPUs. As we mentioned previously, the massive floating-point power of a GPU is well suited to research work, and GPUs excel at exactly the kind of processing Folding@Home requires. To get more computing power, the project has now turned to the GPU.
Enter the GPU
Modern GPUs such as the R580 core powering ATI's X19xx series have as many as 48 pixel shading units, designed to do exactly the kind of work the Folding@Home team requires. With help from ATI, the Folding@Home team has created a version of its client that can use ATI's X19xx GPUs, with very impressive results. While we do not have the client in our hands yet, as it will not be released until Monday, the Folding@Home team says the GPU-accelerated client is 20 to 40 times faster than its CPU-only clients. Once we have the client, we'll put this to the test, but even a fraction of this number would represent a massive speedup.
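If the claimed speedup held across an entire calculation, it would shrink the one-million-day figure quoted earlier considerably. A rough Python sketch, assuming the 20x and 40x factors apply uniformly to the whole calculation (an optimistic simplification, and a claim we have not yet tested); `gpu_days` is our own illustrative helper:

```python
CPU_DAYS = 1_000_000  # one calculation on a fast CPU, per the article
DAYS_PER_YEAR = 365.25


def gpu_days(cpu_days: float, speedup: float) -> float:
    """Days needed if the whole calculation ran `speedup` times faster."""
    return cpu_days / speedup


for factor in (20, 40):
    days = gpu_days(CPU_DAYS, factor)
    print(f"{factor}x: {days:,.0f} days (about {days / DAYS_PER_YEAR:.0f} years)")
```

That works out to 50,000 and 25,000 days respectively: a dramatic improvement, yet still far beyond what a single card could finish, so spreading the work across many GPUs still matters.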
With this kind of speedup, the Folding@Home research group hopes to finally run simulations involving longer folding periods and more complex proteins, allowing it to study proteins that were previously out of reach. This implementation also lets the group do some research on its own, without requiring the entire world's help, by building a cluster of (relatively) cheap video cards, something it has never been able to do before.
Unfortunately for home users, the number who can help out by donating their GPU resources is rather limited for the time being. The first beta client, to be released on Monday, works only on ATI GPUs, and even then only on single X19xx cards. The research group has indicated that it hopes to expand support to CrossFire-enabled platforms soon, along with less powerful ATI cards.
The situation for NVIDIA users isn't as rosy. While the research group would like to support the latest GeForce cards, its attempts so far at GPU-accelerated processing on those cards have shown them to be too slow compared to ATI's. We're not sure whether this is due to a subtle architectural difference between the two, or to ATI's greater emphasis on pixel shading in this generation of cards, but Folding@Home won't be coming to NVIDIA cards until the research group can solve the performance problem.
The Folding@Home project is the first of what ATI hopes will be many projects and applications, both academic and commercial, that can tap the power of GPUs. Given the results showcased by the Folding@Home project, the impact on applications well suited to GPUs could be huge. In the future we hope to test technologies such as GPU-accelerated physics processing, which both ATI and NVIDIA have promised to support, along with other yet-to-be-announced applications that use stream processing techniques.
It's been a longer wait than we had hoped, but we're finally seeing the power of the GPU unleashed as was promised so long ago, starting with Folding@Home. As GPUs continue to grow in capability and power, it should come as no surprise that ATI, NVIDIA, and their CPU-producing counterparts are looking at how to better connect GPUs and other coprocessors to the CPU in order to enable this kind of processing and boost its performance. As AMD's Torrenza technology and Intel's competing Geneseo technology make their way into computer designs, we'll no doubt see more applications make use of the GPU, in what could be one of the biggest single performance improvements in years. The GPU is not just for graphics anymore.
As for our readers interested in trying out the Folding@Home research group's efforts in GPU acceleration and contributing towards understanding and finding a cure for Alzheimer's, the first GPU beta client is scheduled to be released on Monday. For more information on Folding@Home or how to use the client once it does come out, our Team AnandTech members over in our Distributed Computing forum will be more than happy to give a helping hand.