CausticOne and the Initial Strategy

Out of the gate, Caustic isn't going after gaming. They aren't even going after the consumer market. Which, we think, is quite a wise move. Ageia showed very clearly how difficult it can be to get support for a hardware feature that isn't widely adopted in games. Game developers only have so many resources, and they can't spend their time on aspects of game programming that the vast majority of their users will never see. Thus, the target for the first hardware will be film, video and other offline rendering or simulation areas.

The idea isn't to displace the CPU or the GPU in rendering or even raytracing specifically, but to augment and assist them in an application where rays need to be cast and evaluated.

The CausticOne is a board built using FPGAs (field programmable gate arrays) and 4GB of RAM. Two of the FPGAs (the ones with heatsinks on them) make up SIMD processing units that handle evaluation of rays. We are told that the hardware provides about a 20x speedup over modern CPU-based raytracing algorithms. And since this hardware can be combined with CPU-based raytracing techniques, that extra speed comes on top of the speed current rendering systems already have. Potentially, CausticOne processing could also be integrated into GPU-based raytracing techniques, but this has not yet been achieved. Certainly, if a single PC could make use of CPU, GPU and raytracing processor together, we would see some incredible performance.
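How much that buys an end-to-end renderer depends on what fraction of a frame's time is spent on the work the card offloads. Here is a minimal, Amdahl's-law style sketch of that reasoning; the 50/75/90 percent offload fractions are purely illustrative assumptions of ours, not Caustic's numbers:

#include <cstdio>

// Amdahl-style estimate: if a fraction f of frame time is ray
// traversal/intersection and that portion runs k times faster on the
// card, the whole frame speeds up by 1 / ((1 - f) + f / k).
static double combined_speedup(double offloaded_fraction, double accel_factor) {
    return 1.0 / ((1.0 - offloaded_fraction) + offloaded_fraction / accel_factor);
}

int main() {
    const double k = 20.0;  // the vendor-claimed speedup on the offloaded portion
    for (double f : {0.50, 0.75, 0.90}) {
        std::printf("offload %2.0f%% of frame time -> %.1fx overall\n",
                    f * 100.0, combined_speedup(f, k));
    }
    return 0;
}

In other words, the headline 20x only becomes a comparable end-to-end win when traversal and intersection dominate the frame time; the less of the frame the card handles, the smaller the overall gain.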

Caustic Graphics didn't go into much detail on the processor side of things, as they don't want to give away their special sauce. We know it's SIMD, and we know that it is built to handle secondary incoherent rays very efficiently. One of the difficulties in building a fast raytracing engine is that as we look at deeper and deeper bounces of light, we find less and less coherence between rays traced back from the eye. Essentially, the more bounces we follow, the more likely it is that rays which started out near each other will diverge.

Speeding up raytracing on traditional hardware requires building packets of rays to shoot together. With highly coherent packets we see a big speedup, because much of the traversal work is shared between rays. Caustic Graphics tells us that their hardware makes it possible to shoot single rays, without packets, and without hurting performance. Secondary incoherent rays also don't show the same kind of performance degradation we see on CPUs and especially GPUs.
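For a concrete picture of what packet tracing shares and why incoherence breaks it, here is a small self-contained sketch; the four-ray packet, the single bounding box and the ray values are toy assumptions of ours, not Caustic's algorithm:

#include <algorithm>
#include <array>
#include <cstdio>
#include <utility>

struct Ray { float ox, oy, oz, dx, dy, dz; };
struct Box { float lo[3], hi[3]; };

// Standard slab test: does the ray hit the axis-aligned box?
static bool hit(const Ray& r, const Box& b) {
    float o[3] = {r.ox, r.oy, r.oz}, d[3] = {r.dx, r.dy, r.dz};
    float tmin = 0.0f, tmax = 1e30f;
    for (int a = 0; a < 3; ++a) {
        float inv = 1.0f / d[a];
        float t0 = (b.lo[a] - o[a]) * inv, t1 = (b.hi[a] - o[a]) * inv;
        if (t0 > t1) std::swap(t0, t1);
        tmin = std::max(tmin, t0);
        tmax = std::min(tmax, t1);
    }
    return tmin <= tmax;
}

int main() {
    Box node{{-1.0f, -1.0f, 4.0f}, {1.0f, 1.0f, 6.0f}};  // one BVH node
    // Coherent packet: neighbouring primary rays, nearly parallel.
    std::array<Ray, 4> coherent{{{0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 1.0f},
                                 {0.1f, 0.0f, 0.0f, 0.01f, 0.0f, 1.0f},
                                 {0.0f, 0.1f, 0.0f, 0.0f, 0.01f, 1.0f},
                                 {0.1f, 0.1f, 0.0f, 0.01f, 0.01f, 1.0f}}};
    // Incoherent packet: secondary rays scattered from one hit point.
    std::array<Ray, 4> incoherent{{{0.0f, 0.0f, 3.0f, 0.01f, 0.01f, 1.0f},
                                   {0.0f, 0.0f, 3.0f, 1.0f, 0.01f, 0.01f},
                                   {0.0f, 0.0f, 3.0f, 0.01f, 1.0f, 0.01f},
                                   {0.0f, 0.0f, 3.0f, 0.01f, 0.01f, -1.0f}}};
    int c = 0, i = 0;
    for (const Ray& r : coherent)   c += hit(r, node);
    for (const Ray& r : incoherent) i += hit(r, node);
    std::printf("coherent packet: %d/4 rays want this node -> work is shared\n", c);
    std::printf("incoherent packet: %d/4 rays want it -> packet must split\n", i);
    return 0;
}

With the coherent packet, one traversal decision serves all four rays; with the incoherent packet the rays disagree, the packet has to follow every path any member needs, and the sharing that made packets worthwhile disappears. Caustic's claim is essentially that their hardware doesn't need that sharing in the first place.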

The CausticOne has a huge amount of RAM on board because, unlike with a GPU, the entire scene needs to be kept in the card's memory in order to maintain performance. Every ray shot needs to be checked against all the geometry in the scene, and then secondary rays shot from the first intersection need information about every other object and light source. With that much RAM and two FPGAs, it's clear that Caustic Graphics' hardware must be very fast at branching and very adept at evaluating rays once they've been traced back to an object.
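The memory requirement follows from the query every ray performs: find the nearest surface it hits, anywhere in the scene. The sketch below is deliberately naive (a brute-force loop over a toy sphere scene, no acceleration structure); real renderers prune most of these tests with a spatial hierarchy, but the geometry that hierarchy points at still has to be resident wherever the queries are answered:

#include <cmath>
#include <cstdio>
#include <vector>

struct Vec3 { float x, y, z; };
struct Sphere { Vec3 c; float r; };

// Nearest positive intersection distance along a normalized direction,
// or -1 on a miss.
static float hit_t(const Sphere& s, Vec3 o, Vec3 d) {
    Vec3 oc{o.x - s.c.x, o.y - s.c.y, o.z - s.c.z};
    float b = oc.x * d.x + oc.y * d.y + oc.z * d.z;
    float c = oc.x * oc.x + oc.y * oc.y + oc.z * oc.z - s.r * s.r;
    float disc = b * b - c;
    if (disc < 0.0f) return -1.0f;
    float t = -b - std::sqrt(disc);
    return t > 0.0f ? t : -1.0f;
}

int main() {
    std::vector<Sphere> scene{{{0.0f, 0.0f, 5.0f}, 1.0f},
                              {{2.0f, 0.0f, 7.0f}, 1.0f},
                              {{-2.0f, 1.0f, 6.0f}, 1.0f}};
    Vec3 origin{0.0f, 0.0f, 0.0f}, dir{0.0f, 0.0f, 1.0f};  // one ray down +z

    float nearest = 1e30f;
    int nearest_id = -1;
    for (int i = 0; i < (int)scene.size(); ++i) {  // every ray touches the scene
        float t = hit_t(scene[i], origin, dir);
        if (t > 0.0f && t < nearest) { nearest = t; nearest_id = i; }
    }
    if (nearest_id >= 0)
        std::printf("closest hit: sphere %d at t = %.2f\n", nearest_id, nearest);
    return 0;
}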

Development is ongoing, and the CausticOne is slated to go to developers and those who run render farms. This will not be an end-user product, but it will be available to those who could have a use for it now (like movie studios with render farms, or High Performance Computing (HPC, or "big iron") installations). Developers of all kinds will also have access to the hardware so they can start developing for it now, before the consumer version hits the streets.

Their business model for the CausticOne will be service and support for those who want and need it. Caustic Graphics has extended OpenGL ES 2.0 and GLSL to include support for shooting rays from shaders. They hope that their extensions will eventually become part of OpenGL, which could prove genuinely useful down the road, especially if hybrid rasterizing/raytracing rendering engines start to take off. They went with the ES spec because it's less encumbered by the legacy elements present in full OpenGL.

On top of OpenGL ES 2.0 with Caustic's extensions, developers can use the CausticRender package, a higher-level set of tools built on top of CausticGL (which is what they're calling their extended GL API). This lets developers either dig down to the low level or start writing engines more intuitively. There are more tools Caustic is working on, and they hope to see content creation ISVs and others start building their own tools as well. They want to make it easy for anyone who already has a raytracing engine to port their software to a hardware-accelerated version, and they also want people who need to render raytraced scenes to have software that can take advantage of their hardware.

Focusing on developers and render farms first is a great way to go, as it mirrors the way NVIDIA used the HPC space to help get CUDA exposure. There are applications out there that never have enough power, and offering something that can provide an order of magnitude speedup is very attractive in that space. Getting software support in the high-end content creation and rendering packages would definitely help get Caustic noticed, and possibly enable them to trickle down into more consumer-oriented markets. But that brings us to next year and the CausticTwo.

Comments

  • jido - Thursday, April 30, 2009

    With fast branching and parallel computing, as well as a good amount of RAM, there have to be other applications for this card. Could you do encryption/decryption or AI maybe?
  • lemonadesoda - Thursday, April 30, 2009

    Wouldn't the "clearspeed e710" PCI board outperform this? The hardware is there. This company should focus on the software libraries.

    The CATS™ 700 1U rack module contains 12 of these and delivers over 1 teraflops. With the right software you could probably do near-real-time raytracing (seconds rather than minutes or hours per scene). Farm the CATS 700 and yes, you could do real-time raytracing on a one or two second "lag". That is, use 20 of these things, all rendering separate frames, one frame apart. It will take you a few seconds to build the first scene, but then you will have the rest ready to show in real time.

    How to manage this? Well that's the software issue these guys should be working on... NOT reinventing hardware that is already out there.

    AFTER they get it working on CATS then perhaps they might consider developing their own optimised hardware. But that should come SECOND not FIRST.

    Back to MBA skool boyz.
  • kyleb2112 - Saturday, April 25, 2009

    If this can appreciably speed up raytracing, it'll find a market in the same demo that buys workstation graphics cards. But the way CPU cores are multiplying, they'll have to hurry up. Nothing loves multicores more than 3D rendering, and once we've got 32-core boxes this tech may be obsolete.
  • Draven31 - Thursday, April 23, 2009

    ART PURE/Renderdrive anyone? It's been done, and the last couple of times it was tried, it was cheaper to buy six render nodes that ended up being the same speed, within six months. If it's gonna be a year before they are even shipping cards to consumers, I doubt they are going to get *anywhere*
  • 7Enigma - Wednesday, April 22, 2009

    ...and no one has a clue what that will be. C'mon people, this article was little more than a puff press piece; interesting to read and it makes geeks giddy, but no actual substance. To be honest, this should be a blog post. Yes, the description of the differences between ray-tracing and rasterization is nice and all, but there is no meat to this product for the time being.

    So no, it's not 20X faster, or 100X faster, or 2% faster; it doesn't yet exist, and until independent testing has been done, I won't believe a word I read.
  • simtex - Wednesday, April 22, 2009

    To all the people who claim this is a bad idea because Intel will just copy it and make it run on their CPU: the CPU has other tasks too. If ray-tracing is to be used in games, I would be rather annoyed if I couldn't get physics and AI calculations because my CPU had to do all the rendering work. So a card to off-load some of the computations would surely be a nice addition. Of course, Intel could then cooperate with NVIDIA and offload physics and AI calculations to the graphics card, but that doesn't seem very likely atm.

    Also, Caustics doesn't claim that this is a production board; in fact they state that this is a prototype, and that their final product will use ASICs, not FPGAs. Furthermore, Caustics' design does in fact consider the bandwidth requirements for ray-tracing; they claim that their algorithms are specially designed to cope with the limited bandwidth, and that this is their major achievement. Personally I think they use some sort of ray-bundling; although this has also been implemented in software ray-tracers, they must have invented some new tricks to make it even better.

    Another great aspect of ray-tracing is that the frame rate is more dependent on the number of pixels you wish to render than on the number of triangles in the scene, in contrast to rasterization.
  • ssj4Gogeta - Wednesday, April 22, 2009

    Well I don't know much about all this, but I saw their video. The co-founder says that ray-tracing isn't a compute problem anymore, and that they looked at it in a different way. So, I'm wondering, if they're using a new algorithm or something that's making all the difference, can't Intel use their general purpose Larrabee to simulate that?
  • simtex - Thursday, April 23, 2009

    Depends whether the algorithm is publicly available; as I understand it, Caustics' claim is that it's their own algorithm and probably patented.
  • slusallek - Wednesday, April 22, 2009

    It is strange to see someone propose a hardware architecture that is bandwidth limited by design. They have separated ray tracing at the point where the bandwidth is highest -- which seems like not a really smart move.

    By doing the ray traversal and intersection on their card and the shading on the GPU, they must constantly transfer ray data between the two: transfer the hit point of each ray/pixel to the GPU (point, normal, texture coordinates, shader ID, ...) and then transfer any rays newly generated by a shader back to their chip (origin, direction, min/max_dir, ...). Each of those transfers is easily between 30 and 60 bytes per ray.

    So for an HD screen this is easily 100MB for just a single ray generation and only one sample per pixel. Given PCIe 1.0 x4 bandwidth of 1GB/s, this gives a theoretical maximum of just 5 fps -- and we have not done any work yet. No AA, no shadow rays, no reflections; they all generate multiples of this in bandwidth. Even with PCIe 3.0 and 16 lanes this will be a huge bandwidth issue.

    Let me also just point out that one of the first RTRT papers on visualizing car headlights (http://graphics.cs.uni-sb.de/Publications/2002/200...) already achieved up to 10 fps at the same video resolution. Note that the headlight used up to 25 rays per pixel and lots of complex multiple reflection and refraction. It ran on a cluster of 16 nodes with dual-CPU single-core Athlon 1800s, providing a total of 32 cores.

    Compare this with the latest machines used by Caustics, likely with dual quad-core CPUs giving something like 16 cores (with Hyper-Threading), each one easily providing a multiple of the FLOPS of the old Athlon. So it seems their performance is not even reaching what was already done 7 years ago.

    So in summary, I am not really impressed by what Caustics claims. Their hardware architecture is severely limited by design, and their software results are way behind what was done many years ago.
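The arithmetic in the comment above is easy to reproduce. A quick sketch, assuming a 1920x1080 frame, roughly 50 bytes per ray record in each direction, and 1GB/s of effective bus bandwidth (all round assumptions in the spirit of the comment, not measured values):

#include <cstdio>

// Rough check of the per-frame transfer cost described above, under
// assumed numbers: one hit record out and one new ray back per pixel.
int main() {
    const double pixels = 1920.0 * 1080.0;        // assumed HD frame
    const double bytes_per_record = 50.0;         // middle of the 30-60 B range
    const double bytes_per_frame = pixels * bytes_per_record * 2.0;  // out + back
    const double bus_bytes_per_s = 1.0e9;         // assumed effective bandwidth

    std::printf("per-frame traffic: %.0f MB\n", bytes_per_frame / 1.0e6);
    std::printf("bus-limited ceiling: %.1f fps\n", bus_bytes_per_s / bytes_per_frame);
    return 0;
}

Under those assumptions the transfer traffic alone caps the frame rate at roughly 5 fps before any shading, anti-aliasing or extra bounces, which is the commenter's point.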
  • nubie - Tuesday, April 21, 2009

    I for one am looking forward to a time when games no longer have ugly polygons.

    Even in recent games, cylindrical objects are rendered with as few as 8 sides.

    If you can feed the raw math of a surface into the ray intersection, instead of a polygon approximation, it will make the whole thing much more powerful.


    I would love to see this and Larrabee succeed, progress and competition are always good for the consumer.
