Mouse picking techniques

When creating any interactive 3D application, being able to fly around a scene and interact with objects makes it feel like all that math created something tangible. One of the simplest forms of interaction you would want in a game or an engine is selecting objects by clicking on them. Before implementing this I could select entities by clicking on their name in the scene hierarchy list, but that does not feel like interacting with the world, just the UI. Time to get to work.

Ray Casting

The first implementation we will look at is ray casting. The idea is to cast a ray into our scene and check for intersection with any mesh in the current scene. A ray is just a mathematical structure that holds two 3-component vectors: the ray's origin and its direction.

struct Ray {

    Ray() = default;

    Ray(Viewport& viewport, glm::vec2 coords);

    glm::vec3 origin;

    glm::vec3 direction;

};

On click we take the mouse position inside the window and convert it to a world-space direction vector that we use as the ray direction. The ray origin is simply the location of the viewport's camera in world-space (as always, it does not really matter what space they are in as long as all calculations are performed in the same space!). This article does a fantastic job of explaining the entire process. We can use this ray to check whether it intersects a game object's bounding box, though we will need separate calculations for axis-aligned and oriented boxes.
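To make the conversion concrete, here is a minimal sketch that avoids a matrix library. It is not the engine's Ray constructor: Vec3, Camera, PickRay and makePickRay are hypothetical names, and it assumes a perspective camera described by a position, an orthonormal basis and a vertical field of view, with mouse coordinates measured from the top-left corner. Instead of unprojecting through inverse matrices it builds the direction straight from the camera basis, which should give the same result for a standard perspective projection.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }

Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

// Hypothetical camera description: world-space position plus an orthonormal basis.
struct Camera {
    Vec3 position;
    Vec3 forward, right, up; // unit length, mutually perpendicular
    float fovY;              // vertical field of view in radians
};

struct PickRay { Vec3 origin, direction; };

// mouseX / mouseY are in pixels, measured from the top-left corner of the viewport.
PickRay makePickRay(const Camera& cam, float mouseX, float mouseY, float width, float height) {
    assert(width > 0.0f && height > 0.0f);

    // pixel -> normalized device coordinates in [-1, 1], flipping y so that up is positive
    float ndcX = 2.0f * mouseX / width - 1.0f;
    float ndcY = 1.0f - 2.0f * mouseY / height;

    // half extents of the image plane at distance 1 in front of the camera
    float halfH = std::tan(cam.fovY * 0.5f);
    float halfW = halfH * (width / height);

    Vec3 dir = cam.forward + cam.right * (ndcX * halfW) + cam.up * (ndcY * halfH);
    return {cam.position, normalize(dir)};
}
```

Clicking the exact centre of the viewport yields the camera's forward vector, which is a quick sanity check for any implementation of this step.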

The engine calculates a bounding box for every mesh when it is imported by naively looping over the vertices and keeping track of the minimum and maximum vector. Next we check for ray intersections with every bounding box. For both axis-aligned and oriented boxes we use an algorithm based on the slabs technique as described by Eric Haines in "An Introduction to Ray Tracing". Implementing it is straightforward; we just have to factor in the rotation of the box's axes when testing oriented boxes, which we can extract from the game object's transformation matrix.
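The bounding box calculation described above could look roughly like this. Vec3, Bounds and computeBounds are hypothetical stand-ins for illustration, not the engine's actual types.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

struct Bounds { Vec3 min, max; };

// Visits every vertex once and keeps the component-wise minimum and maximum.
Bounds computeBounds(const std::vector<Vec3>& vertices) {
    assert(!vertices.empty());
    Bounds b{vertices.front(), vertices.front()};
    for (const Vec3& v : vertices) {
        b.min = {std::min(b.min.x, v.x), std::min(b.min.y, v.y), std::min(b.min.z, v.z)};
        b.max = {std::max(b.max.x, v.x), std::max(b.max.y, v.y), std::max(b.max.z, v.z)};
    }
    return b;
}
```

Because the box is computed in the mesh's local space it only has to be done once at import time; the object's transform is applied later during the intersection test.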

std::optional<float> hitsOBB(const glm::vec3& min, const glm::vec3& max, const glm::mat4& modelMatrix);

std::optional<float> hitsAABB(const glm::vec3& min, const glm::vec3& max);
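For reference, a slab test for the axis-aligned case might be sketched as follows. This is an illustration under assumptions rather than the engine's implementation: it uses a plain Vec3 instead of glm, and a free function named slabTest (a made-up name) instead of a Ray member. The returned value is a true distance only if the direction is normalized.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <optional>
#include <utility>

struct Vec3 { float x, y, z; };

// Returns the distance along the ray to the entry point, or nothing on a miss.
// If the origin is inside the box the result is 0.
std::optional<float> slabTest(const Vec3& origin, const Vec3& direction,
                              const Vec3& min, const Vec3& max) {
    assert(direction.x != 0.0f || direction.y != 0.0f || direction.z != 0.0f);

    const float o[3]  = {origin.x, origin.y, origin.z};
    const float d[3]  = {direction.x, direction.y, direction.z};
    const float lo[3] = {min.x, min.y, min.z};
    const float hi[3] = {max.x, max.y, max.z};

    float tNear = 0.0f; // starting at 0 rejects boxes behind the origin
    float tFar  = std::numeric_limits<float>::max();

    for (int axis = 0; axis < 3; ++axis) {
        if (d[axis] == 0.0f) {
            // ray is parallel to this slab: a miss unless the origin lies between its planes
            if (o[axis] < lo[axis] || o[axis] > hi[axis]) return std::nullopt;
            continue;
        }
        float t0 = (lo[axis] - o[axis]) / d[axis];
        float t1 = (hi[axis] - o[axis]) / d[axis];
        if (t0 > t1) std::swap(t0, t1);

        // shrink the interval in which the ray is inside all slabs seen so far
        tNear = std::max(tNear, t0);
        tFar  = std::min(tFar, t1);
        if (tNear > tFar) return std::nullopt;
    }
    return tNear;
}
```

One way to handle the oriented case is to transform the ray into the box's local space with the inverse model matrix and run the same test there.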

We use std::optional to determine if the ray hit at all, and if it did we want the resulting distance. Since the engine now has an entity-component system we just create a system that checks for intersections and returns the Entity integer of the clicked object.

// if the camera is inside a mesh's world AABB we skip it

if (Math::pointInAABB(ray.origin, worldAABB[0], worldAABB[1])) {

    continue;

}

// check for ray hit

auto hitResult = ray.hitsOBB(mesh.aabb[0], mesh.aabb[1], worldTransform);

// if we hit do something

if (hitResult.has_value()) {

    // if it's the first iteration we init the shortest distance

    if (!result) {

        shortestDistance = hitResult;

        result = entity;

    } // every other iteration we check if the hit's distance is shorter and update the result

    else if (hitResult.value() < shortestDistance) {

        shortestDistance = hitResult;

        result = entity;

    }

}

return result;

Note that we early out if the camera is inside an object's bounding box, otherwise we would constantly select the object we are in because it's an instant hit. This also saves us a couple of CPU cycles of checking ray intersections. Finally we render the selected object's bounding box to the screen by drawing the box's edges using GL_LINES.
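As a rough sketch of the data needed for that wireframe: a box has 8 corners and 12 edges, so drawing it with GL_LINES takes 24 indices. The helper names below (boxCorners, boxLineIndices) are made up for illustration and may not match how the engine builds its overlay.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

// The 8 corners of the box: bit 0 of the index selects x, bit 1 selects y, bit 2 selects z.
std::array<Vec3, 8> boxCorners(const Vec3& min, const Vec3& max) {
    assert(min.x <= max.x && min.y <= max.y && min.z <= max.z);
    std::array<Vec3, 8> corners{};
    for (unsigned i = 0; i < 8; ++i) {
        corners[i] = {(i & 1) ? max.x : min.x,
                      (i & 2) ? max.y : min.y,
                      (i & 4) ? max.z : min.z};
    }
    return corners;
}

// 12 edges as 24 indices: two corners share an edge if their indices differ in exactly one bit.
std::array<unsigned, 24> boxLineIndices() {
    std::array<unsigned, 24> indices{};
    unsigned n = 0;
    for (unsigned i = 0; i < 8; ++i) {
        for (unsigned bit = 1; bit < 8; bit <<= 1) {
            if ((i & bit) == 0) { // emit each edge once, from the corner that has the bit cleared
                indices[n++] = i;
                indices[n++] = i | bit;
            }
        }
    }
    return indices;
}
```

The corners go into a vertex buffer, the indices into an element buffer, and the box can then be drawn with glDrawElements(GL_LINES, 24, GL_UNSIGNED_INT, nullptr) using the object's model matrix.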

This technique leaves a lot to be desired. It works fine for box-shaped meshes but loses accuracy when dealing with more complex geometry. One way to improve our algorithm is to use bounding box testing only as an early-out: once you find the best matching box, you test the ray against every triangle of that box's mesh. It's not a cheap improvement since the cost scales linearly with the mesh's triangle count. Maybe we can get the GPU to help us out?
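In practice such a refinement step usually tests the ray against the mesh's triangles. The post does not name an algorithm for this, so as an assumption here is a sketch of the widely used Möller-Trumbore ray-triangle test, with hypothetical helper names and a plain Vec3 instead of glm.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

struct Vec3 { float x, y, z; };

Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Möller-Trumbore: returns the distance along the ray to the triangle, or nothing on a miss.
std::optional<float> hitsTriangle(Vec3 origin, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2) {
    assert(dot(dir, dir) > 0.0f);
    const float epsilon = 1e-7f;

    Vec3 e1 = v1 - v0;
    Vec3 e2 = v2 - v0;
    Vec3 p = cross(dir, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < epsilon) return std::nullopt; // ray is parallel to the triangle

    float invDet = 1.0f / det;
    Vec3 s = origin - v0;
    float u = dot(s, p) * invDet;          // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return std::nullopt;

    Vec3 q = cross(s, e1);
    float v = dot(dir, q) * invDet;        // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return std::nullopt;

    float t = dot(e2, q) * invDet;         // distance along the ray
    if (t < 0.0f) return std::nullopt;     // triangle is behind the origin
    return t;
}
```

Looping this over every triangle of the candidate mesh and keeping the smallest distance gives an exact hit, at the linear cost mentioned above.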

Stencil Buffer Picking

The idea is simple: we take a point in our rendering pipeline where we draw the entire scene and generate a stencil buffer that holds a unique identifier for every fragment (or pixel in HLSL land) a mesh covers in the viewport. We can then read this buffer at our mouse coordinates to get back that unique id. In this example I will be using the OpenGL based deferred geometry pass to plug in the extra stencil buffer and code.

First we add an 8-bit stencil buffer to our render pass by creating a render buffer with both depth and stencil components and attaching it to the frame buffer.

GDepthBuffer.init(viewport.size.x, viewport.size.y, GL_DEPTH32F_STENCIL8);

GBuffer.attach(GDepthBuffer, GL_DEPTH_STENCIL_ATTACHMENT);

Next we define the stencil pipeline state.

// enable stencil stuff

glEnable(GL_STENCIL_TEST);

glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);

glStencilMask(0xFF); // write to all 8 bits of the stencil buffer

glStencilFunc(GL_ALWAYS, 0, 0xFF);  // always pass, with a reference value of 0

This tells OpenGL to enable stencil operations and only replace the stencil buffer's value if both the depth test and stencil test passed, else we keep the original value. The stencil function's reference value is set to zero. Next we continue rendering the scene by looping over every mesh. In this loop we can give OpenGL a reference value that uniquely identifies a mesh, which in our case is the Entity integer from the entity-component system. Don't forget to disable stencil testing at the end of the render pass.

// write the entity ID to the stencil buffer for picking

glStencilFunc(GL_ALWAYS, (GLint)entity, 0xFF);

We should now have a stencil buffer filled with entity integers for every mesh. If you are having a hard time visualising the stencil buffer in your head, picture it like this:

A stencil buffer is just a regular texture of let's say 1920 by 1080 pixels that fits the viewport. But instead of colour values per pixel (RGBA, 4 floats) it gets a single byte of data per pixel. So when we render a mesh, for every pixel it covers on the screen, we can write an 8-bit integer value to the stencil buffer. Keep in mind that 8 bits only give us 256 distinct values, so entity IDs above 255 cannot be told apart. We don't have to worry about where meshes are in world space because their pixels are drawn over each other anyway and we can only select what is visible inside the viewport.
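If it helps, the whole idea can be simulated on the CPU. The sketch below is a toy model, not OpenGL code: IdBuffer, drawRect and pick are hypothetical names, and meshes are reduced to screen-space rectangles at a constant depth. The write rule mirrors glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE) with a stencil test that always passes, so the id is only replaced where the depth test passes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU stand-in for a depth attachment plus an 8-bit stencil attachment.
struct IdBuffer {
    int width, height;
    std::vector<float> depth;
    std::vector<std::uint8_t> stencil;

    IdBuffer(int w, int h)
        : width(w), height(h),
          depth(static_cast<std::size_t>(w) * h, 1.0f),            // cleared to the far plane
          stencil(static_cast<std::size_t>(w) * h, std::uint8_t{0}) // 0 means "nothing here"
    {}

    // "Rasterizes" the rectangle [x0, x1) x [y0, y1) at depth z, writing id where it is visible.
    void drawRect(int x0, int y0, int x1, int y1, float z, std::uint8_t id) {
        assert(x0 >= 0 && y0 >= 0 && x1 <= width && y1 <= height);
        for (int y = y0; y < y1; ++y) {
            for (int x = x0; x < x1; ++x) {
                std::size_t i = static_cast<std::size_t>(y) * width + x;
                if (z < depth[i]) { // depth test passed: replace depth and id
                    depth[i] = z;
                    stencil[i] = id;
                }
            }
        }
    }

    // Reading one value back is all that picking needs.
    std::uint8_t pick(int x, int y) const {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return stencil[static_cast<std::size_t>(y) * width + x];
    }
};
```

Whatever order the rectangles are drawn in, each pixel ends up holding the id of the closest one, which is exactly the value we want back when the user clicks there.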

Next we create a function that reads from the stencil buffer at some xy coordinate and gives us back the mesh's integer.

ECS::Entity pick(uint32_t x, uint32_t y) {

    int id = 0; // stays 0 (no entity) if the read fails

    GBuffer.bind();

    glReadPixels(x, y, 1, 1, GL_STENCIL_INDEX, GL_INT, &id);

    GBuffer.unbind();

    return id;

}

Whenever the user clicks inside the viewport we call pick and set the active Entity to its result. Since we get per-pixel accuracy we can select much more complex geometry than with our ray cast method, but it loses accuracy when objects get far away, because they then cover only a handful of pixels.
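One detail to watch when calling pick: glReadPixels measures coordinates from the lower-left corner of the framebuffer, while windowing libraries such as GLFW and SDL report the mouse position from the top-left. Whether the engine already accounts for this elsewhere is not shown here, so treat the following as a sketch with hypothetical names (PixelCoord, toReadPixelsCoord).

```cpp
#include <cassert>
#include <cstdint>

struct PixelCoord { std::uint32_t x, y; };

// Converts a top-left-origin mouse position into the bottom-left-origin
// coordinates that glReadPixels expects.
PixelCoord toReadPixelsCoord(std::uint32_t mouseX, std::uint32_t mouseY,
                             std::uint32_t viewportHeight) {
    assert(mouseY < viewportHeight);
    return {mouseX, viewportHeight - 1 - mouseY};
}
```

With a 1080 pixel tall viewport, a click on the top row (mouseY = 0) maps to row 1079 of the buffer. If the viewport is a panel inside an editor window, the panel's offset has to be subtracted from the mouse position first.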

Another downside is that it keeps the GPU from doing render work: we have to bind the frame buffer and read pixels back from the stencil buffer, and the CPU has to wait until the GPU has finished rendering into it before the read can return. For the engine's editor this method works fine, but it gets tricky when implementing multi-select functionality. Using the stencil buffer we could also create an outline around the mesh, which would allow us to get rid of our bounding box overlay.

There are many optimizations to be made for both methods, and at the end of the day it comes down to use case. My engine will utilise stencil picking for now, but could run into hurdles in the future making me reconsider that decision.

As always the entire code can be found at https://github.com/nicovanbentum/Raekor
