### What is Raytracing?

In all modern forms of 3D rendering for display on a computer, the goal is to determine the color of every pixel on the screen as fast as possible. Raytracing is simply a method that can be used to do so. Currently, the most common method for rendering realtime 3D graphics is rasterization. There are fundamental differences between the way rasterization and raytracing go about determining pixel color.

With both rasterization and raytracing, we start with geometry: triangles, to be specific. We have a scene made up of triangles, and shader programs are used to determine the color at any given point on every triangle. With a rasterizer, we loop through every triangle and use math to project the triangle onto the screen. This is like taking a 3D scene and flattening it out. We find out which pixels every triangle overlaps and save the depth values for later, when we shade those pixels. We use lighting algorithms, texture maps, and the location of the pixel on the triangle itself to do the shading.
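To make the rasterization loop concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not how any particular GPU or driver does it: the function names (`project`, `edge`, `rasterize`), the simple pinhole projection, and the choice to interpolate depth linearly in screen space are all ours, and shading is left out entirely.

```python
def project(vertex, width, height, focal=1.0):
    """Perspective-project a camera-space point (x, y, z) with z > 0 to pixel coordinates."""
    x, y, z = vertex
    sx = (focal * x / z + 1.0) * 0.5 * width
    sy = (1.0 - focal * y / z) * 0.5 * height
    return sx, sy, z


def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p) in screen space."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(triangles, width, height):
    """Loop over triangles, find the pixels each one covers, keep the nearest depth."""
    depth = [[float("inf")] * width for _ in range(height)]
    owner = [[None] * width for _ in range(height)]  # index of the visible triangle
    for index, triangle in enumerate(triangles):
        a, b, c = (project(v, width, height) for v in triangle)
        area = edge(a, b, c)
        if area == 0:
            continue  # degenerate (edge-on) triangle covers no pixels
        # Only test pixels inside the triangle's screen-space bounding box.
        min_x = max(int(min(a[0], b[0], c[0])), 0)
        max_x = min(int(max(a[0], b[0], c[0])), width - 1)
        min_y = max(int(min(a[1], b[1], c[1])), 0)
        max_y = min(int(max(a[1], b[1], c[1])), height - 1)
        for y in range(min_y, max_y + 1):
            for x in range(min_x, max_x + 1):
                p = (x + 0.5, y + 0.5)  # sample at the pixel center
                w0 = edge(b, c, p) / area
                w1 = edge(c, a, p) / area
                w2 = edge(a, b, p) / area
                if w0 < 0 or w1 < 0 or w2 < 0:
                    continue  # pixel center lies outside the triangle
                # Simplification: depth is interpolated linearly in screen space.
                z = w0 * a[2] + w1 * b[2] + w2 * c[2]
                if z < depth[y][x]:
                    depth[y][x] = z
                    owner[y][x] = index
    return owner, depth
```

Note that the outer loop runs over triangles and the inner loops over pixels, which is the ordering that raytracing turns around.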

Unlike rasterization, raytracing starts with the pixels. If we draw a line from a central eye (or camera) position through each pixel, we can use math to determine which triangles this line (called a primary ray) intersects. For every triangle that our primary ray intersects, we save the position of the intersection. After all our geometry has been checked for intersection, we keep the intersection closest to the viewer (ignoring transparency for a minute). This process means lots of conditionals and branching in addition to the compute power required by whatever shader programs are used.
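As a rough sketch of that intersection step, the Python below tests one ray against a list of triangles and keeps the closest hit. We use the well-known Möller-Trumbore ray/triangle test here as an assumption; the article does not say which intersection method any given renderer uses, and the helper names are ours. Real raytracers also avoid checking every triangle by using an acceleration structure, which this sketch omits.

```python
EPSILON = 1e-9


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def intersect(origin, direction, triangle):
    """Return the distance along the ray to the triangle, or None on a miss."""
    v0, v1, v2 = triangle
    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < EPSILON:
        return None  # ray is parallel to the triangle's plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None  # outside the triangle along the first edge
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None  # outside the triangle along the second edge
    t = dot(e2, q) * inv
    return t if t > EPSILON else None  # ignore hits behind the ray origin


def closest_hit(origin, direction, triangles):
    """Check every triangle and keep the intersection nearest the viewer."""
    best = None
    for index, triangle in enumerate(triangles):
        t = intersect(origin, direction, triangle)
        if t is not None and (best is None or t < best[0]):
            best = (t, index)
    return best  # (distance, triangle index) or None
```

Every early `return None` is one of the branches mentioned above, and which ones are taken differs from ray to ray.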

From here, like with rasterization, we can use shaders to determine the color of our pixel, but the input the shaders use can be other rays (these are secondary rays) that have been cast (or shot) from our saved closest point of intersection. These secondary rays can be used to do lots of things like look for shadows (shoot a ray at every light source and see if that light is blocked by something), and reflections (shoot a ray at the angle the primary ray would reflect from and start the process over again). Rays used to do reflection, refraction, radiosity, and other effects can end up generating a good number of secondary rays. The key advantages to the rendering quality of raytracing lie in secondary rays, but these are also what add the incredible complexity to raytracing renderers.
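The two simplest kinds of secondary ray can be sketched in a few lines of Python. This is only an illustration: the function names, the point-light model, and the small `bias` offset (used to keep a new ray from immediately re-hitting the surface it starts on) are our assumptions rather than anything described by the hardware vendor.

```python
import math


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def reflect(direction, normal):
    """Mirror an incoming direction about a unit surface normal: r = d - 2(d.n)n."""
    k = 2.0 * dot(direction, normal)
    return tuple(d - k * n for d, n in zip(direction, normal))


def shadow_ray(point, normal, light, bias=1e-4):
    """Build a ray from a hit point toward a point light.

    Returns (origin, unit direction, distance to the light). The light is
    blocked if anything is hit along the ray before that distance.
    """
    origin = tuple(p + bias * n for p, n in zip(point, normal))
    to_light = tuple(l - o for l, o in zip(light, origin))
    distance = math.sqrt(dot(to_light, to_light))
    direction = tuple(c / distance for c in to_light)
    return origin, direction, distance


def secondary_rays(point, normal, incoming, lights):
    """One shadow ray per light plus one mirror-reflection direction."""
    shadows = [shadow_ray(point, normal, light) for light in lights]
    return shadows, reflect(incoming, normal)
```

Each ray returned here goes back through the same closest-intersection search as a primary ray, and a reflection hit can spawn further secondary rays of its own.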

Calculating secondary rays is particularly time consuming, as not only do we have the same branching issues, but we are less likely to see a speedup from grouping rays together into packets. It's easy to see that computing our scene involves a ton of branches and a ton of computation when we shoot a lot of primary rays (say four for each pixel for some antialiasing), include a lot of bounces for reflective surfaces (lots of secondary rays becoming increasingly incoherent), have a lot of geometry (and thus lots of things to check for intersection), have lots of light sources (which means lots of shadows), have translucent materials with refraction indexes, or treat other lit objects as light sources (radiosity).
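A back-of-the-envelope count shows how quickly this adds up. The Python sketch below is a simplified worst-case model of our own, not a measurement: it assumes every ray hits something, every surface is a perfect mirror, each hit casts one shadow ray per light, and refraction and radiosity are ignored.

```python
def rays_per_frame(width, height, samples_per_pixel, bounces, lights):
    """Estimate rays cast per frame under a simple worst-case model.

    Assumptions: every ray hits a surface, every surface reflects,
    and each hit casts one shadow ray toward each light.
    """
    primary = width * height * samples_per_pixel
    reflection = primary * bounces            # one reflection ray per bounce
    hits = primary * (bounces + 1)            # primary hits plus each bounce's hit
    shadow = hits * lights                    # one shadow ray per light per hit
    return primary + reflection + shadow


if __name__ == "__main__":
    # Four samples per pixel at 1920x1080, two bounces, three lights.
    print(rays_per_frame(1920, 1080, 4, 2, 3))
```

Under this model the total works out to the primary ray count multiplied by (bounces + 1) × (lights + 1), so every extra bounce or light scales the whole workload.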

Typically CPUs are really good for branching and sequential operations. On the flipside, GPUs are great for situations with tons of parallel independent operations with little branching. Putting both lots of branching and lots of parallel independent operations together can lead to an algorithm that can't benefit from the full potential of either CPUs or GPUs. Caustic Graphics has put together hardware that attempts to create a perfect environment for raytracing, approaching the problem differently than either a CPU or a GPU. Unfortunately, they didn't go into much detail about the architecture of their hardware. But they did tell us some things and we can speculate on others.

• #### jido - Thursday, April 30, 2009

With fast branching and parallel computing, as well as a good amount of RAM, there have to be other applications for this card. Could you do encryption/decryption or AI maybe?

Wouldn't the "clearspeed e710" PCI board outperform this? The hardware is there. This company should focus on the software libraries.

The CATS™ 700 1U rack module contains 12 of these and delivers over 1 teraflops. With the right software you probably could do near-real time raytracing (seconds rather than minutes or hours per scene). Farm the CATS 700 and yes you could do real time raytracing on a one or two second "lag". That is, use 20 of these things, all rendering separate frames, one frame apart. It will take you a few seconds to build the first scene, but then you will have the rest ready to show real-time.

How to manage this? Well that's the software issue these guys should be working on... NOT reinventing hardware that is already out there.

AFTER they get it working on CATS then perhaps they might consider developing their own optimised hardware. But that should come SECOND not FIRST.

Back to MBA skool boyz.
• #### kyleb2112 - Saturday, April 25, 2009

If this can appreciably speed up raytracing, it'll find a market in the same demo that buys workstation graphics cards. But the way CPU cores are multiplying, they'll have to hurry up. Nothing loves multicores more than 3D rendering, and once we've got 32-core boxes this tech may be obsolete.
• #### Draven31 - Thursday, April 23, 2009

ART PURE/Renderdrive anyone? It's been done, and the last couple times it has been tried, it was cheaper to buy six render nodes that ended up being the same speed, within six months. If it's gonna be a year before they are even shipping cards to consumers, I doubt they are going to get *anywhere*
• #### 7Enigma - Wednesday, April 22, 2009

...and no one has a clue what that will be. Come on people, this article was little more than a puff press piece; interesting to read and make geeks giddy, but no actual substance. To be honest, this should be a blog post. Yes, the description of the differences between ray-tracing and rasterization is nice and all, but there is no meat to this product at the time being.

So no, it's not 20X faster, or 100X faster, or 2% faster; it doesn't yet exist, and until independent testing has been done, I won't believe a word I read.
• #### simtex - Wednesday, April 22, 2009

To all the people who claim this is a bad idea and that Intel will just copy it and make it run on their CPU: well, people, the CPU has other tasks too. If ray-tracing is to be used in games, I would be rather annoyed if I couldn't get physics and AI calculations because my CPU had to do all the rendering work. So a card to off-load some of the computations would surely be a nice addition. Of course Intel could then cooperate with nvidia and offload physics and AI calculations to the graphics card, but that doesn't seem very likely atm.

Also, Caustics doesn't claim that this is a production board; in fact they state that this is a prototype, and that their final product will use ASICs, not FPGAs. Furthermore, Caustics' design does in fact consider the bandwidth requirements for ray-tracing; actually they claim that their algorithms are specially designed to cope with the limited bandwidth, and that this is their major achievement. Personally I think they use some sort of ray-bundling. Although this has also been implemented in software ray-tracers, they must have invented some new tricks to make it even better.

Another great aspect of ray-tracing is that the frame rate is more dependent on the number of pixels you wish to render than it is on the number of triangles in the scene, in contrast to rasterization.
• #### ssj4Gogeta - Wednesday, April 22, 2009

Well I don't know much about all this, but I saw their video. The co-founder says that ray-tracing isn't a compute problem anymore, and that they looked at it in a different way. So, I'm wondering, if they're using a new algorithm or something that's making all the difference, can't Intel use their general purpose Larrabee to simulate that?
• #### simtex - Thursday, April 23, 2009

Depends whether the algorithm is publicly available; as I understand it, Caustics claims that it's their own algorithm, and it's probably patented.
• #### slusallek - Wednesday, April 22, 2009

It is strange to see that someone proposes a hardware architecture that by design is bandwidth limited. They have separated ray tracing at the point where the bandwidth is highest -- which seems like not a really smart move.

By doing the ray traversal and intersection on their card and the shading on the GPU they must constantly transfer ray data between the two: transfer the hit point of each ray/pixel to the GPU (point, normal, texture coordinates, shader ID, ...) and then transfer any rays newly generated by a shader back to their chip (origin, direction, min/max_dir, ...). Each of those transfers is easily between 30 and 60 bytes per ray.

So for a HD screen this is easily 100MB for just a single ray generation and only one sample per pixel. Given a PCI 1.0 4x bandwidth of 1GB/s this gives a theoretical maximum of just 5 fps -- and we have not done any work yet. No AA, no shadow rays, reflections, they all generate multiples of this in bandwidth. Even with PCI 3.0 and 16x lanes this will be a huge bandwidth issue.

Let me also just point out that one of the first RTRT papers on visualizing car headlights (http://graphics.cs.uni-sb.de/Publications/2002/200...) already achieved up to 10 fps at the same video resolution. Note that the headlight used up to 25 rays per pixel and lots of complex multiple reflection and refraction. It ran on a cluster of 16 nodes with dual CPU single-core Athlon 1800s providing a total of 32 cores.

Compare this with the latest machine used by Caustics, with likely dual quad-core CPUs giving something like 16 cores (with hyperthreading), each one easily providing a multiple of the FLOPS of the old Athlon. So it seems their performance is not even reaching what was already done 7 years ago.

So in summary, I am not really impressed by what Caustics claims. Their hardware architecture is severely limited by design and their software results are way behind what has been done many years ago.
• #### nubie - Tuesday, April 21, 2009

I for one am looking forward to a time when games no longer have ugly polygons.

Even in recent games cylindrical objects are pictured as comprised of as few as 8 sides.

If you can send the raw math into the equation for the rays to hit it will make the whole thing much more powerful.

I would love to see this and Larrabee succeed, progress and competition are always good for the consumer.