Ray Tracing in One Weekend - OpenGL Compute

As an exercise in learning ray tracing, I have implemented most of Ray Tracing in One Weekend, one of the most beloved ray tracing introduction books, in an OpenGL compute shader. The idea is to create an application that shoots multiple rays per pixel and checks for intersections with spheres, procedurally generating a frame and avoiding traditional triangle rendering entirely. Over time I have built up quite a nice C++ framework for creating new OpenGL applications, loading shaders and so on, so I could focus on the actual GLSL ray tracing program and only needed a bit of C++ for dynamically adding, removing and altering the rendered objects.

 

Rasterization

Traditional rendering is called rasterization: it is the process of taking the vertices (points in 3D space, called 'vectors' in math) that make up a triangle and converting them to monitor pixels. We take the triangle and project it flat onto the screen. This technique is used for pretty much every modern game; everything you see on the screen is made up of millions of triangles, all being rasterized every frame.

    Image above: Doom Eternal (2020), id Tech 7. Image below: rasterizing triangles.


Ray Tracing

Modern graphics cards have had years of hardware development to optimize the rasterization pipeline, and I believe many of those design decisions were made with rasterization as the one clear direction forward. The mathematical concept of ray tracing, though, has been around since before I was born, and it is good to see companies like Nvidia lead the way to a ray traced future. So how does it differ from rasterization?

As previously explained, rasterization fills every pixel by projecting triangles flat onto the screen. In ray tracing, we instead shoot a ray for every pixel on the screen and check for the closest triangle it intersects (if any). In an oversimplified example, we look up the intersected triangle's colour and write that to the pixel we shot from.

In pseudo-code it is something like this:

for pixel on screen:
    ray = ray(pixel)
    for object in world:
        for triangle in object:
            if ray.intersects(triangle):
                // check if it's the closest intersection
                // if so, get the colour of the triangle

You will notice that this involves a triple nested for loop, and with objects being made up of millions of triangles, you end up performing billions of intersection tests. Nvidia attempts to speed this process up with dedicated hardware that executes the intersection algorithms, but because of the recursive nature of tracing rays it will take years before we get to fully ray traced frames.

Now let's talk spheres. In this demo, we define the world objects to just be spheres, and it just so happens that, with the power of linear algebra, we can check for intersections directly:

for pixel on screen:
    ray = ray(pixel)
    for sphere in world:
        if ray.intersects(sphere):
            // check if it's the closest intersection
            // if so, get the colour of the sphere

This cuts out the entire triangle loop. So how do we check a ray directly against a sphere? Isn't a sphere made up of thousands of triangles?


 

Oh god, math

I will actually save you the boring parts because, well, they're boring... and there's a ton of online material that explains this picture in greater detail:
 


Essentially, a sphere can be defined by just its parameters instead of by triangles: a single vector representing the centre of the sphere ('C' in the image above) and a radius. A ray is simply a half-line in 3D space, defined by two vectors: the ray origin ('O') and the direction ('D') the ray travels in. With the power of the Pythagorean theorem we can find the float value ('P') which describes how far we have to walk along the ray to get to the intersection. Calculating the intersection point then boils down to:
 
    vector3D intersection = ray.origin + ray.direction * P

Finding P is the hard part.
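To make that concrete, here is a minimal sketch of the ray-sphere test in GLSL. Substituting the ray equation O + P*D into the sphere equation gives a quadratic in P, and its discriminant tells us whether the ray hits at all. The struct and function names here are illustrative, not necessarily the ones used in the actual project:

struct Ray    { vec3 origin; vec3 direction; };
struct Sphere { vec3 center; float radius; };

// Returns the distance P along the ray to the nearest intersection,
// or -1.0 if the ray misses the sphere entirely.
float hitSphere(Ray ray, Sphere sphere)
{
    vec3 oc = ray.origin - sphere.center;
    float a = dot(ray.direction, ray.direction);
    float b = 2.0 * dot(oc, ray.direction);
    float c = dot(oc, oc) - sphere.radius * sphere.radius;
    float discriminant = b * b - 4.0 * a * c;
    if (discriminant < 0.0)
        return -1.0; // no real roots, so no intersection
    // The smaller root is where the ray first enters the sphere.
    return (-b - sqrt(discriminant)) / (2.0 * a);
}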

Bounces

Using the previous techniques we can determine whether a ray intersects a sphere and memorize the colour if it does. We then create a new ray with its origin at the intersection point and shoot it back out into the scene. This way we keep bouncing until we miss. In real life, light loses some of its 'power' every time it bounces off of a surface, so every time we bounce we add the colour of the sphere we just hit to the previous colour, halving its contribution each time:

for pixel on screen:
    ray = ray(pixel)
    colour = rgba()
    weight = 0.5
    for bounce up to max bounces:
        if ray.intersects(closest sphere):
            colour += sphere.colour * weight
            weight *= 0.5
            ray = new ray from the intersection point
        else:
            break
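In GLSL, which forbids recursion, this bounce loop has to be written iteratively. Below is a rough sketch; the Hit struct, the closestHit() helper and the MAX_BOUNCES constant are hypothetical stand-ins for whatever the real shader uses:

vec3 traceRay(Ray ray)
{
    vec3 colour = vec3(0.0);
    float weight = 0.5; // halved after every bounce
    for (int bounce = 0; bounce < MAX_BOUNCES; bounce++)
    {
        Hit hit; // hypothetical struct holding position, normal and colour
        if (!closestHit(ray, hit))
            break; // the ray escaped the scene
        colour += hit.colour * weight;
        weight *= 0.5;
        // continue tracing from the surface we just hit
        ray.origin = hit.position;
        ray.direction = reflect(ray.direction, hit.normal);
    }
    return colour;
}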
  

 



This is a gross oversimplification, so if you want to learn more about this concept, look into Physically Based Rendering. It is basically a bunch of math to determine how much power is lost and how rays bounce off of surfaces, which is what determines reflections. For example, a perfectly reflective surface like a mirror bounces rays off at a single predictable angle, while rougher materials scatter them in many different directions.
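As a taste of the difference, GLSL's built-in reflect() gives the perfect mirror bounce, and a rough bounce can be sketched by perturbing it. The fuzz parameter and the randomUnitVector() helper below are hypothetical illustrations:

vec3 mirrorBounce(vec3 incoming, vec3 normal)
{
    return reflect(incoming, normal); // perfect mirror reflection
}

vec3 roughBounce(vec3 incoming, vec3 normal, float fuzz)
{
    // randomUnitVector() is a hypothetical helper returning a random
    // direction; larger fuzz values scatter the reflection more.
    return normalize(reflect(incoming, normal) + fuzz * randomUnitVector());
}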

What is OpenGL?

OpenGL is a way to interact with the graphics cards/units in a computer system through code. In itself, it is a giant API specification that driver vendors like Nvidia and AMD implement for their graphics cards. The API has been around for years, and compared to newer alternatives like DirectX 12 and Vulkan it is much easier to learn and use, but that comes at the cost of performance and flexibility.

What is Compute?

Rasterized rendering is done through various graphics card programs called shaders. There are shaders for every stage of the rasterization process; the most common are per-vertex (point in 3D space) shaders and per-pixel shaders. For example, in the pixel shader you look up the colour of the rasterized triangle and the program writes that colour to the screen. Compute shaders are programs that run outside of the rasterization process and have nothing to do with rendering; they are called 'compute' because that is what they are designed to do: perform work that only involves computation. In newer APIs this means you can execute compute shaders in parallel with vertex and pixel shaders, so one can render a complex scene and perform computation-only work like particles or animation alongside it.

Why Compute?

The biggest benefit of compute is being able to manually control concurrency. Modern graphics cards are made up of thousands of individual cores, so ideally we spread our work over as many of them as possible. Since we are writing our result directly to the screen, we can use a 'work group size' of 2, 4, 8 or 16 that evenly divides the monitor's resolution. Say we have a monitor of 1280 by 720 pixels: with a work group size of 4 we get 320 by 180 work groups, each running a 4x4 cluster of invocations of our shader program, one invocation per pixel. Even though we are in a compute shader, we can still get information like the monitor resolution and current pixel coordinates by using GLSL's built-in variables that give us the invocation number.
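Put together, the shader skeleton looks something like the sketch below, assuming a 1280x720 image bound to unit 0 and a C++ side call of glDispatchCompute(320, 180, 1). The binding point and names are illustrative:

#version 430
layout(local_size_x = 4, local_size_y = 4) in;
layout(binding = 0, rgba32f) uniform image2D outputImage;

void main()
{
    // One invocation per pixel: the global ID is our pixel coordinate.
    ivec2 pixel = ivec2(gl_GlobalInvocationID.xy);
    ivec2 resolution = imageSize(outputImage);
    if (pixel.x >= resolution.x || pixel.y >= resolution.y)
        return; // guard against any partial work groups

    // ...trace this pixel's rays here...
    imageStore(outputImage, pixel, vec4(0.0, 0.0, 0.0, 1.0));
}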

Why C++?

OpenGL is predominantly a C API. It has bindings for many modern languages, but most do not perform well and are often poorly maintained. Since C++ is largely compatible with C, we don't have to worry about bindings and get all the performance we need. Raekor is written in C++ and handles all the windowing, input, GUI and buffers. One of the nice things for this particular project is being able to tie the Camera class directly to the compute shader, using its matrices to generate ray starting positions and directions and allowing the user to move around. Lastly, we keep a container of Sphere class objects and send it to the GPU.
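On the shader side, generating those ray origins and directions from the camera matrices can be sketched like this, assuming the C++ side uploads the inverse projection and view matrices as uniforms (the uniform names are illustrative, and Ray is the struct from the earlier sketch):

uniform mat4 invProjection;
uniform mat4 invView;

Ray cameraRay(ivec2 pixel, ivec2 resolution)
{
    // Map the pixel centre to normalized device coordinates [-1, 1].
    vec2 ndc = (vec2(pixel) + 0.5) / vec2(resolution) * 2.0 - 1.0;
    // Unproject onto the near plane, then build a world space direction.
    vec4 target = invProjection * vec4(ndc, -1.0, 1.0);
    target /= target.w;
    vec3 direction = normalize((invView * vec4(target.xyz, 0.0)).xyz);
    // The camera position is the view matrix inverse applied to the origin.
    vec3 origin = (invView * vec4(0.0, 0.0, 0.0, 1.0)).xyz;
    return Ray(origin, direction);
}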

Challenges

The biggest problem with shaders is that they are very limited in terms of programming features. GLSL looks a lot like C/C++, but it is a very limited subset of it (no classes, no recursion, etc.). Because of this, there is no built-in functionality for random numbers, and we need random numbers for ray tracing: we shoot multiple rays per pixel, each with a slight random offset. To get random numbers in GLSL we take in a float called 'iTime' that holds the number of seconds passed since application start and use it as a seed.
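One common trick, and roughly what that looks like in practice, is a hash-style generator like the one below; the exact function used in the project may differ:

uniform float iTime;

// Classic fract/sin hash: cheap, seedable pseudo-randomness in [0, 1).
float rand(vec2 seed)
{
    return fract(sin(dot(seed, vec2(12.9898, 78.233))) * 43758.5453);
}

// Example: jitter the sample position within the current pixel,
// mixing in iTime so the pattern changes every frame.
// vec2 jitter = vec2(rand(vec2(pixel) + iTime), rand(vec2(pixel.yx) + iTime));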

Another problem is noise: because every ray gets a slight random offset, neighbouring pixels end up with colours that don't exactly make sense. To 'fix' this we shoot multiple rays through a single pixel and average the result over time, using two intermediate buffers: one accumulation buffer and one for the final image. The accumulation buffer works like a moving average: we store the colour values in RGB and an iteration count in the alpha channel. To get the colour for the final image we simply read the accumulation pixel and do RGB / A, giving us the average.
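A sketch of that accumulation step in GLSL, assuming the two buffers are bound as images (the binding points and names are illustrative):

layout(binding = 1, rgba32f) uniform image2D accumulation;
layout(binding = 2, rgba32f) uniform image2D finalImage;

void accumulate(ivec2 pixel, vec3 newColour)
{
    vec4 acc = imageLoad(accumulation, pixel);
    acc.rgb += newColour; // running sum of every frame's result
    acc.a   += 1.0;       // iteration count lives in the alpha channel
    imageStore(accumulation, pixel, acc);
    // The displayed colour is simply the moving average: RGB / A.
    imageStore(finalImage, pixel, vec4(acc.rgb / acc.a, 1.0));
}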
 
Result with noise

Final result
 
 
The last challenge is maintaining interactive frame rates. As the sphere count goes up, the trace count goes up, and we start losing frames. On a GTX 1080 Ti I am able to maintain 60 frames per second as long as I keep the sphere count under 25 or so. Frame rate also depends on viewing angle and distance; if you get up close to a sphere the shader has to do some awkward ray math and the frame rate starts to tank. This makes scalability very limited: adding spheres, increasing the maximum number of bounces and shooting more rays per pixel all introduce significant overhead, so these settings need to be balanced carefully.

Future

There is one feature missing, which is depth of field. I was unable to get it working in time, as I do not fully understand where to apply the randomized offset, since the book uses a different approach to the camera class than I do. There are also tons of optimizations possible, like splitting the geometry up into a bounding volume hierarchy so rays can exit early, and further optimizing the work groups through some form of shared memory (resulting in fewer cache misses).
 
The source code and application are publicly available.
 
