Original Link: http://www.anandtech.com/show/2752
Today we have a new add-in board coprocessor in town. Caustic Graphics has announced their CausticOne hardware and CausticGL API which will enable hardware accelerated raytracing. We are reminded of Ageia's venture into dedicated hardware for physics, but Caustic Graphics seems to be taking a more balanced approach to bringing their hardware to market. The goal is to start at the top where cost is no object and get developers interested in and working with their hardware before they bring it to the end user.
Pixar and other studios that make heavy use of computer generated animation for films tend to have render farms that can take seconds, minutes or even hours to render a single frame. With full length films running about 150,000 frames (plus or minus), that time really adds up. Those that need to render one frame as near reality as possible (say car designers doing preliminary visualization of a new model) can kick off rendering jobs that take days to complete. These guys put tons of cash into their computer systems. Time is money and if Caustic can save these guys more time than it would cost them to buy the hardware and port their software, then Caustic will do well.
The long term goals might have something to do with gaming, but we definitely aren't looking at that option right now. By trying to penetrate the market at the back end like this, Caustic Graphics may avoid the pitfalls we saw Ageia run into. Of course, at this point it is unclear whether or not the end user will even need a dedicated raytracing card by the time the hardware makes it to market. With current GPUs getting faster all the time, CPUs becoming increasingly parallel, and Larrabee on the horizon, there are quite a number of factors that will affect the viability of a part like this in consumer space.
Regardless, Caustic Graphics is here and ready to start making an impact. Their SDK should be available to developers today, with hardware soon to follow. Before we take a deeper look at what Caustic Graphics is offering, let's talk a little bit about the differences between rasterization (what current GPUs do) and raytracing (what the Caustic Graphics hardware will accelerate).
What is Raytracing?
In all modern forms of 3D rendering for display on a computer, the goal is to determine the color of every pixel on the screen as fast as possible. Raytracing is simply a method that can be used to do so. Currently, the most common method for rendering realtime 3D graphics is rasterization. There are fundamental differences between the way rasterization and raytracing go about determining pixel color.
With rasterization and raytracing, we start with geometry. Triangles to be specific. We have a scene made up of triangles and shader programs are used to determine the color at any given point on every triangle. With a rasterizer, we loop through every triangle and use math to project the triangle onto the screen. This is like taking a 3D scene, and flattening it out. We find out what pixels every triangle overlaps and save the depth values for later when we shade those pixels. We use lighting algorithms, texture maps, and the location of the pixel on the triangle itself to do the shading.
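To make the rasterizer loop concrete, here is a minimal sketch in Python. It is our own illustration, not anything from Caustic or a real GPU driver: the names (project, edge, rasterize, WIDTH, HEIGHT) are made up, every triangle gets a single flat color in place of a real shader, and depth is interpolated in screen space without perspective correction. A production rasterizer would also limit the pixel loop to each triangle's bounding box.

```python
# Toy rasterizer: loop over triangles, project them, depth-test every pixel.
# All names and the tiny 8x8 "screen" are illustrative assumptions.

WIDTH, HEIGHT = 8, 8

def project(v):
    """Perspective-project a camera-space point (x, y, z) with z > 0 to pixel coordinates."""
    x, y, z = v
    sx = (x / z + 1.0) * 0.5 * WIDTH
    sy = (1.0 - (y / z + 1.0) * 0.5) * HEIGHT
    return sx, sy, z

def edge(a, b, p):
    """Signed area term telling which side of the edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(triangles):
    """triangles: list of (v0, v1, v2, color). Returns a HEIGHT x WIDTH grid of colors."""
    depth = [[float("inf")] * WIDTH for _ in range(HEIGHT)]
    color = [[None] * WIDTH for _ in range(HEIGHT)]
    for v0, v1, v2, tri_color in triangles:          # start from the geometry
        a, b, c = project(v0), project(v1), project(v2)
        area = edge(a, b, c)
        if area == 0:
            continue                                  # degenerate triangle
        for py in range(HEIGHT):
            for px in range(WIDTH):
                p = (px + 0.5, py + 0.5)              # pixel center
                w0 = edge(b, c, p) / area             # barycentric weights
                w1 = edge(c, a, p) / area
                w2 = edge(a, b, p) / area
                if w0 < 0 or w1 < 0 or w2 < 0:
                    continue                          # pixel not covered by this triangle
                z = w0 * a[2] + w1 * b[2] + w2 * c[2]
                if z < depth[py][px]:                 # keep the nearest surface
                    depth[py][px] = z
                    color[py][px] = tri_color         # stand-in for running a shader
    return color
```

Note that the outer loop is over triangles and the depth buffer sorts out visibility afterwards, which is why the order in which triangles are submitted doesn't change the final image here.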
Unlike rasterization, raytracing starts with the pixels. If we draw a line from a central eye (or camera) position through each pixel, we can use math to determine what triangles this line (called a primary ray) intersects. For every triangle that our primary ray intersects, we save the position of the intersection. After all our geometry has been checked for intersection, we keep the intersection closest to the viewer (ignoring transparency for a minute). This process means lots of conditionals and branching in addition to the compute power required by whatever shader programs are used.
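The primary ray test can be sketched in a few lines of Python. We use the well known Möller-Trumbore ray/triangle test here as one common choice; the article's sources don't say which intersection method any particular renderer (or Caustic's hardware) uses, and the function names are our own. The brute force loop over every triangle is also a simplification, since real renderers use acceleration structures to skip most of the geometry.

```python
# Toy primary-ray test: intersect one ray with every triangle, keep the closest hit.
# Function names are illustrative; Moller-Trumbore is our choice of intersection test.

EPS = 1e-9

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect(origin, direction, tri):
    """Return the distance t along the ray to the triangle, or None on a miss."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < EPS:
        return None                      # ray is parallel to the triangle's plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0 or u > 1:
        return None                      # outside the triangle
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0 or u + v > 1:
        return None                      # outside the triangle
    t = dot(e2, q) * inv
    return t if t > EPS else None        # ignore hits behind the eye

def closest_hit(origin, direction, triangles):
    """Check every triangle and keep the intersection nearest the viewer."""
    best_t, best_index = None, None
    for i, tri in enumerate(triangles):
        t = intersect(origin, direction, tri)
        if t is not None and (best_t is None or t < best_t):
            best_t, best_index = t, i
    return best_t, best_index
```

Even in this tiny version, four of the lines in intersect are early-exit conditionals, which gives a feel for the branching the text describes.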
From here, like with rasterization, we can use shaders to determine the color of our pixel, but the input the shaders use can be other rays (these are secondary rays) that have been cast (or shot) from our saved closest point of intersection. These secondary rays can be used to do lots of things like look for shadows (shoot a ray at every light source and see if that light is blocked by something), and reflections (shoot a ray at the angle the primary ray would reflect from and start the process over again). Rays used to do reflection, refraction, radiosity, and other effects can end up generating a good number of secondary rays. The key advantages to the rendering quality of raytracing lie in secondary rays, but these are also what add the incredible complexity to raytracing renderers.
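Two of the secondary rays mentioned above, the shadow ray and the reflection ray, are simple enough to sketch. This is our own toy code under stated assumptions: occluders are spheres rather than triangles purely to keep the intersection math short, lights are points, and all function names are hypothetical.

```python
import math

# Toy secondary rays: a shadow ray toward a point light and a mirror reflection.
# Spheres stand in for scene geometry to keep the example short (an assumption).

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def reflect(d, n):
    """Mirror direction d about the unit surface normal n: r = d - 2 (d . n) n."""
    return sub(d, scale(n, 2.0 * dot(d, n)))

def hit_sphere(origin, direction, center, radius):
    """Distance along a unit-length ray to the nearest sphere hit, or None."""
    oc = sub(origin, center)
    b = dot(oc, direction)
    disc = b * b - (dot(oc, oc) - radius * radius)
    if disc < 0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 1e-6:                     # small offset avoids re-hitting the start point
            return t
    return None

def in_shadow(point, light_pos, spheres):
    """Shoot a ray from the hit point at the light and see if anything blocks it."""
    to_light = sub(light_pos, point)
    dist = math.sqrt(dot(to_light, to_light))
    direction = scale(to_light, 1.0 / dist)
    for center, radius in spheres:
        t = hit_sphere(point, direction, center, radius)
        if t is not None and t < dist:   # blocker sits between the point and the light
            return True
    return False
```

A full renderer would call in_shadow once per light at every hit point, then feed the direction from reflect back into the same closest-hit search used for primary rays, which is where the ray count starts to balloon.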
Calculating secondary rays is particularly time consuming, as not only do we have the same branching issues, but we are less likely to see speed up from grouping rays together into packets. It's easy to see that when we shoot a lot of primary rays (say four for each pixel for some antialiasing), include a lot of bounces for reflective surfaces (lots of secondary rays becoming increasingly incoherent), have a lot of geometry (and thus lots of things to check for intersection), lots of light sources (which means lots of shadows), have translucent material with refraction indexes, or treat other lit objects as light sources (radiosity), computing our scene has a ton of branches and a ton of computation.
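A back of the envelope ray count shows how quickly these factors multiply. The model below is a deliberately crude assumption of ours: every ray hits something, every hit casts one shadow ray per light, and every hit spawns a fixed number of bounce rays (the hypothetical branch parameter, 1 for reflection only, 2 for reflection plus refraction) until a bounce limit is reached. The resolution and light count in the usage lines are hypothetical; only the four samples per pixel comes from the text.

```python
def rays_per_frame(width, height, samples_per_pixel, lights, max_bounces, branch=1):
    """Rough count of rays traced for one frame under the simple model above."""
    primary = width * height * samples_per_pixel
    # Surface hits per primary ray: 1 + branch + branch^2 + ... + branch^max_bounces
    hits = sum(branch ** k for k in range(max_bounces + 1))
    # Each hit was reached by one ray and casts one shadow ray per light.
    return primary * hits * (1 + lights)

# Hypothetical scene: 1920x1080, four samples per pixel, three lights, two bounces.
reflection_only = rays_per_frame(1920, 1080, 4, 3, 2)            # 99,532,800 rays
with_refraction = rays_per_frame(1920, 1080, 4, 3, 2, branch=2)  # 232,243,200 rays
```

Every one of those rays still has to be tested against the scene geometry, so the intersection work scales with this count times however many triangles each ray ends up checking.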
Typically CPUs are really good for branching and sequential operations. On the flipside, GPUs are great for situations with tons of parallel independent operations with little branching. Putting both lots of branching and lots of parallel independent operations together can lead to an algorithm that can't benefit from the full potential of either CPUs or GPUs. Caustic Graphics has put together hardware that attempts to create a perfect environment for raytracing, approaching the problem differently than either a CPU or a GPU. Unfortunately, they didn't go into much detail about the architecture of their hardware. But they did tell us some things and we can speculate on others.
CausticOne and the Initial Strategy
Out of the gate, Caustic isn't going after gaming. They aren't even going after the consumer market. Which, we think, is quite a wise move. Ageia showed very clearly the difficulty that can be experienced in trying to get support for a hardware feature that is not widely adopted into games. Game developers only have so many resources and they can't spend their time on aspects of game programming that completely ignore the vast majority of their users. Thus, the target for the first hardware will be in film, video and other offline rendering or simulation areas.
The idea isn't to displace the CPU or the GPU in rendering or even raytracing specifically, but to augment and assist them in an application where rays need to be cast and evaluated.
The CausticOne is a board built using FPGAs (field programmable gate arrays) and 4GB of RAM. Two of the FPGAs (the ones with heatsinks on them) make up SIMD processing units that handle evaluation of rays. We are told that the hardware provides about a 20x speedup over modern CPU based raytracing algorithms. And since this hardware can be combined with CPU based raytracing techniques, this extra speed is added on top of the speed current rendering systems already have. Potentially, we could integrate processing with CausticOne into GPU based raytracing techniques, but this has not yet been achieved. Certainly, if a single PC could make use of CPU, GPU and raytracing processor, we would see some incredible performance.
Caustic Graphics didn't go into much detail on the processor side of things, as they don't want to give away their special sauce. We know it's SIMD, and we know that it is built to handle secondary incoherent rays very efficiently. One of the difficulties in building a fast raytracing engine is that as we look at deeper and deeper bounces of light, we find less coherence between rays that have been traced back from the eye. Essentially, the more bounces we look at, the more likely it is that rays near each other will diverge.
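The loss of coherence is easy to demonstrate with a toy example of our own (it says nothing about how Caustic's hardware works). Two primary rays that leave the eye only a few degrees apart are bounced off a single sphere, and we compare the angle between them before and after the bounce. The scene, the numbers and the function names are all hypothetical.

```python
import math

# Toy coherence demo: two nearby primary rays diverge after one bounce off a sphere.

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def normalize(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))

def bounce_off_sphere(origin, direction, center, radius):
    """Reflected unit direction for a unit-length ray hitting a sphere, or None on a miss."""
    oc = sub(origin, center)
    b = dot(oc, direction)
    disc = b * b - (dot(oc, oc) - radius * radius)
    if disc < 0:
        return None
    t = -b - math.sqrt(disc)             # nearest hit; assumes the origin is outside
    if t <= 0:
        return None
    hit = (origin[0] + t * direction[0],
           origin[1] + t * direction[1],
           origin[2] + t * direction[2])
    normal = normalize(sub(hit, center))
    return sub(direction, scale(normal, 2.0 * dot(direction, normal)))

def angle_deg(a, b):
    """Angle in degrees between two unit vectors."""
    return math.degrees(math.acos(max(-1.0, min(1.0, dot(a, b)))))

eye = (0.0, 0.0, 0.0)
center, radius = (0.0, 0.0, 5.0), 1.0
ray_a = (0.0, 0.0, 1.0)
ray_b = normalize((0.1, 0.0, 1.0))       # a close neighbor of ray_a

before = angle_deg(ray_a, ray_b)
after = angle_deg(bounce_off_sphere(eye, ray_a, center, radius),
                  bounce_off_sphere(eye, ray_b, center, radius))
```

In this setup the two rays start out a handful of degrees apart and leave the sphere several tens of degrees apart, so after a single bounce they are already likely to touch completely different geometry.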
Speeding up raytracing on traditional hardware requires building packets of rays to shoot. In packets with high coherence, we see a lot of speed up because we reuse a lot of the work we do. Caustic Graphics tells us that their hardware makes it possible to shoot single rays without using packets and without hurting performance. Secondary incoherent rays also don't show the same type of performance degradation we see on CPUs and especially GPUs.
The CausticOne has a huge amount of RAM on board because, unlike with the GPU, the entire scene needs to be fully maintained in the memory of the card in order to maintain performance. Every ray shot needs to be checked against all the geometry in a scene, and then secondary rays shot from the first intersection need to have information about every other object and light source. With massive amounts of RAM and two FPGAs, we know beyond a shadow of a doubt that Caustic Graphics' hardware must be very fast at branching and very adept at computing rays once they've been traced back to an object.
Development is ongoing, and the CausticOne is slated to go to developers and those who run render farms. This will not be an end user product, but will be available to those who could have a use for it now (like movie studios with render farms or in High Performance Computing (HPC, or big iron) systems). Developers of all kinds will also have access to the hardware in order to start developing for it now before the consumer version hits the streets.
Their business model will be service and support for those who want and need it with CausticOne. Caustic Graphics has extended OpenGL ES 2.0 with GLSL to include support for shooting rays from shaders. They hope that their extensions will eventually become part of OpenGL, which may actually be useful in the future especially if hybrid rasterizing and raytracing rendering engines start to take off. They went with the ES spec, as it's less encumbered by legacy elements present in OpenGL.
On top of OpenGL ES 2.0 with Caustic's extensions, developers can use the CausticRender package which is a higher level set of tools designed on top of CausticGL (which is what they're calling their extended GL API). This allows developers to either dig down to the low level or start writing engines more intuitively. There are more tools that Caustic is working on, and they do hope to see content creation ISVs and others start building their own tools as well. They want to make it easy for anyone who already has a raytracing engine to port their software to a hardware accelerated version, and they also want people who need to render raytraced scenes to have software that can take advantage of their hardware.
Focusing on developers and render farms first is a great way to go, as it sort of mirrors the way NVIDIA used the HPC space to help get CUDA exposure. There are applications out there that never have enough power, and offering something that can provide an order of magnitude speed up is very attractive in that space. Getting software support in the high end content creation and rendering packages would definitely help get Caustic noticed and possibly enable them to trickle down into more consumer oriented markets. But that brings us to next year and the CausticTwo.
CausticTwo, the Long Term, and Preliminary Thoughts
Looking toward the future, Caustic Graphics will bring out the CausticTwo next year. The major difference with this hardware will be the replacement of the FPGAs with ASICs (application specific integrated circuit - a silicon chip like a CPU or a GPU). This will enable an estimated additional 14x performance improvement, as ASICs can run much faster than FPGAs. We could also see more RAM on board. This would bring the projected performance to over 200x the speed of current CPU based raytracing.
Of course, next year CPUs will be faster, but based on that kind of projection we are still looking at about two orders of magnitude more performance than CPU based algorithms. This means that instead of seconds per frame, we can start talking about frames per second. Unless we want even more photorealistic images. That will still take a very long time.
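The arithmetic behind these projections is simple enough to check. The 20x and 14x figures are the ones Caustic gave us; the 10 seconds per frame CPU render time in the usage line is a number we made up purely for illustration, as are the function names.

```python
CAUSTIC_ONE_SPEEDUP = 20   # vs. CPU based raytracing, as quoted by Caustic
ASIC_GAIN = 14             # estimated additional gain from moving FPGAs to ASICs

def projected_speedup(base=CAUSTIC_ONE_SPEEDUP, gain=ASIC_GAIN):
    """Compound speedup of CausticTwo over CPU based raytracing."""
    return base * gain

def frames_per_second(cpu_seconds_per_frame, speedup):
    """Frame rate if a frame that takes cpu_seconds_per_frame is sped up by speedup."""
    return speedup / cpu_seconds_per_frame

total = projected_speedup()             # 20 * 14 = 280, i.e. "over 200x"
fps = frames_per_second(10.0, total)    # hypothetical 10 s CPU frame -> 28 fps
```

The same math cuts the other way for heavier scenes: a frame that takes an hour on the CPU would still need roughly 13 seconds at 280x, which is why the most photorealistic work stays firmly offline.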
The CausticTwo will also be available to end users. Hopefully by this time raytracing plugins for 3D Studio Max, Maya, and all the other content creation tools that some prosumers and students dabble in will be hardware accelerated on Caustic Graphics hardware. And maybe at this point we'll start to see some realtime raytracing engine demos. Maybe.
Planting their flag firmly in film, video and advanced visualization markets makes the most sense and holds the most potential for long term viability. Jumping completely into games won't be the best way to go at this point -- it needs to be either a gradual adoption or they need to get their hardware into future game consoles. Pushing PC gaming before console adoption will likely prove just as difficult for Caustic as it did for Ageia, and might not be the best use of resources. Especially if they can carve out a niche in the higher end space.
But they do have their eye on games at some point and are already talking about game consoles. While hardware, service and support for render farms, large scale visualization and those who need the hyper-realism that raytracing can offer has the potential to create a sustainable business, conceiving a piece of hardware that becomes nearly required for gaming (like the GPU) would be the holy grail in this case. It's not likely, but you can bet it's at the back of their mind. Staying focused on more modest goals is definitely a better way to stay in business though.
But they could go another direction. They could try and get themselves acquired by a 3rd party like Ageia did. Of course, NVIDIA killed Ageia's hardware business, and it would be nice if Caustic's hardware technology survived any acquisition. But that is often times how these things go. We'll simply have to wait and see.
There is another factor looming on the horizon as well. As we mentioned earlier, raytracing is very branch heavy, memory dependent and compute heavy. It's a beast of an algorithm that seems to always have a bottleneck no matter what it is running on. Though it will still be a while before we have hardware, Larrabee might just as well be a solution to the raytracing problem. The Larrabee architecture tries to blend some of the CPU and GPU approach to processing, and the hybrid may enable a platform that competes with Caustic when it hits the scene. Memory organization and size are probably still going to favor Caustic, but we've continually heard rumblings that raytracing on Larrabee will be where it's at. It will certainly be interesting to compare the two approaches when they both arrive.
Beyond Larrabee, the long term plan for many core CPUs could include application specific processors. We will see combined CPUs and GPUs in the near future, and maybe we'll see dedicated raytracing units integrated as one or more of the many cores on a CPU down the road. The really long term picture is a bit more fuzzy, but they've got short term potential in the markets that need all the power they can get.
For now, we don't have hardware and we don't have developer feedback either. Caustic is going to get us a copy of their SDK so we can play around with it a bit and evaluate it. But as for knowing how applicable or useful Caustic Graphics hardware will be in the real world, we just don't have the information we need yet.
Here's to hoping for the best.