### In a Nutshell

LambdaCube 3D is a domain specific language and library that makes it possible to program GPUs in a purely functional style.

Purely Functional Rendering Engine

May 15, 2016

Recently I created a new example that we added to the online editor: a simple showcase using ambient occlusion fields. This is a lightweight method to approximate ambient occlusion in real time using a 3D lookup table.

There is no single best method for calculating ambient occlusion, because various approaches shine under different conditions. For instance, screen-space methods are more likely to perform better when shading (very) small features, while working at the scale of objects or rooms requires solutions that work in world space, unless the artistic intent calls for a deliberately unrealistic look. VR applications especially favour world-space effects due to the increased need for temporal and spatial coherence.

I was interested in finding a reasonable approximation of ambient occlusion by big moving objects without unpleasant temporal artifacts, so it was clear from the beginning that a screen-space postprocessing effect was out of the question. I also ruled out approaches based on raymarching and raytracing, because I wanted it to be lightweight enough for mobile and other low-performance devices, and support any possible occluder shape defined as a triangle mesh. Being physically accurate was less of a concern for me as long as the result looked convincing.

First of all, I did a little research on world-space methods. I quickly found two solutions that are the most widely cited:

- Ambient Occlusion Fields by Kontkanen and Laine, which uses a cube map to encode the occlusion by a single object. Each entry of the map contains coefficients for an approximation function that returns the occlusion term given the distance from the centre of the object in the direction corresponding to the entry. They also describe a way to combine the occlusion terms originating from several objects by exploiting blending.
- Ambient Occlusion Fields and Decals in Infamous 2, which is a more direct approach that stores occlusion information (amount and general direction) in a 3D texture fitted around the casting object. This allows a more accurate reconstruction of occlusion, especially close to or within the convex hull of the object, at the cost of higher memory requirements.

I thought the latter approach was promising and created a prototype implementation. However, I was unhappy with the results exactly where I expected this method to shine: inside and near the object, and especially when it should have been self-shadowing.

After exhausting my patience for hacks, I had a different idea: instead of storing the general direction and strength of the occlusion at every sampling point, I’d directly store the strength in each of the six principal (axis-aligned) directions. The results surpassed all my expectations! The shading effect was very well-behaved and robust in general, and all the issues with missing occlusion went away instantly. While this meant increasing the number of terms from 4 to 6 for each sample, thanks to the improved behaviour the sampling resolution could be reduced enough to more than make up for it – consider that decreasing resolution by only 20% is enough to nearly halve the volume.
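The storage trade-off above can be checked with a few lines of arithmetic. This is only an illustration: the 20% figure is the one quoted in the text, and `relative_cost` is a hypothetical helper name, not something from the original implementation.

```python
def relative_cost(terms_per_sample, resolution_scale):
    """Storage cost relative to a 4-term field at full resolution (volume scales with the cube)."""
    return terms_per_sample * resolution_scale ** 3 / 4.0

volume_ratio = 0.8 ** 3                 # 20% lower resolution along each axis
six_term_cost = relative_cost(6, 0.8)   # six terms per sample on the smaller grid

print(round(volume_ratio, 3))    # 0.512
print(round(six_term_cost, 3))   # 0.768
```

So six terms on a grid that is 20% coarser along each axis still take up less space than four terms at the original resolution.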

The real beef of this method is in the preprocessing step to generate the field, so let’s go through the process step by step. First of all, we take the bounding box of the object and add some padding to capture the domain where we want the approximation to work:

Next, we sample occlusion at every point by rendering the object on an 8×8 cube map as seen from that point. We just want a black and white image where the occluded parts are white. There is no real need for higher resolution or antialiasing, as we’ll have more than 256 samples affecting each of the final terms. Here’s what the cube maps look like (using 10x10x10 sampling points for the illustration):

Now we need a way to reduce each cube map to just six occlusion terms, one for each of the principal directions. The obvious thing to do is to define them as averages over half cubes. E.g. the up term is an average of the upper half of the cube, the right term is derived from the right half etc. For better accuracy, it might help to weight the samples of the cube map based on the relative directions they represent, but I chose not to do this because I was satisfied with the outcome even with simple averaging, and the difference is unlikely to be significant. Your mileage may vary.
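The half-cube averaging can be sketched as follows. This is a minimal illustration, not the original code: the face order, the mapping of left/down/back to the negative X/Y/Z axes, and the names `FACES`, `texel_direction` and `reduce_cube_map` are all assumptions made for the example.

```python
FACE_SIZE = 8

# (major axis, sign) per face, in the order left, down, back, right, up, forward.
# The axis naming is an assumption made for this sketch.
FACES = [(0, -1), (1, -1), (2, -1), (0, 1), (1, 1), (2, 1)]

def texel_direction(face, i, j, size=FACE_SIZE):
    """Unnormalised direction through the centre of texel (i, j) on the given face."""
    axis, sign = FACES[face]
    d = [0.0, 0.0, 0.0]
    d[axis] = float(sign)
    d[(axis + 1) % 3] = (i + 0.5) / size * 2 - 1
    d[(axis + 2) % 3] = (j + 0.5) / size * 2 - 1
    return d

def reduce_cube_map(cube):
    """cube[face][j][i] is 1.0 where occluded and 0.0 elsewhere; returns six averaged terms."""
    terms = []
    for axis, sign in FACES:
        total, count = 0.0, 0
        for face in range(6):
            for j in range(FACE_SIZE):
                for i in range(FACE_SIZE):
                    # Keep only texels at an acute angle with the principal direction.
                    if texel_direction(face, i, j)[axis] * sign > 0:
                        total += cube[face][j][i]
                        count += 1
        terms.append(total / count)
    return terms

# Example: only the "up" face is fully occluded.
cube = [[[1.0 if face == 4 else 0.0] * FACE_SIZE for _ in range(FACE_SIZE)]
        for face in range(6)]
terms = reduce_cube_map(cube)
```

With an 8×8 cube map every half cube contains 192 texels (one full face plus half of each of the four adjacent faces), so in this example the up term comes out as 64/192 and each sideways term as 32/192.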

The resulting terms can be stored in two RGB texels per sample, either in a 3D texture or a 2D one if your target platform has no support for the former (looking at you, WebGL).
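For the 2D fallback, one possible layout is to place the Z slices side by side in a single image and emulate the 3D lookup with two fetches and a lerp. The sketch below assumes this layout, a cubic field, nearest-texel fetches within a slice (a GPU would normally filter bilinearly there) and texel centres at half-integer coordinates; `pack_atlas`, `fetch` and `sample_field_2d` are hypothetical names.

```python
import math

def pack_atlas(field, res):
    """field[(x, y, z)] -> RGB tuple; returns rows of a (res * res) x res image, slices side by side."""
    return [[field[(x, y, z)] for z in range(res) for x in range(res)]
            for y in range(res)]

def fetch(atlas, res, x, y, z_slice):
    """Stand-in for one 2D texture fetch with normalised (x, y) inside slice z_slice."""
    ix = min(int(x * res), res - 1)
    iy = min(int(y * res), res - 1)
    return atlas[iy][z_slice * res + ix]

def sample_field_2d(atlas, res, p):
    """Emulate a 3D lookup at normalised position p: two slice fetches and a lerp."""
    x, y, z = p
    zf = min(max(z * res - 0.5, 0.0), res - 1.0)  # continuous slice coordinate
    z0 = int(math.floor(zf))
    z1 = min(z0 + 1, res - 1)
    t = zf - z0
    a = fetch(atlas, res, x, y, z0)
    b = fetch(atlas, res, x, y, z1)
    return tuple(ca + (cb - ca) * t for ca, cb in zip(a, b))

# Example: a 2x2x2 field whose value equals the slice index.
res = 2
field = {(x, y, z): (float(z),) * 3
         for x in range(res) for y in range(res) for z in range(res)}
atlas = pack_atlas(field, res)
halfway = sample_field_2d(atlas, res, (0.25, 0.25, 0.5))
```

Since there are six terms per sample, the negative and the positive terms would each need an atlas like this (or two regions of the same image).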

To recap, here’s the whole field generation process in pseudocode:

    principal_directions = {left, down, back, right, up, forward}
    for each sample_index in (1, 1, 1) to (x_res, y_res, z_res)
        pos = position of the grid point at sample_index
        sample = black and white 8x8 cube map capture of object at pos
        for each dir_index in 1 to 6
            dir = principal_directions[dir_index]
            hemisphere = all texels of sample in the directions at acute angle with dir
            terms[dir_index] = average(hemisphere)
        field_negative[sample_index] = (r: terms[1], g: terms[2], b: terms[3])
        field_positive[sample_index] = (r: terms[4], g: terms[5], b: terms[6])
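To make the recap concrete without a renderer, here is a small runnable sketch that follows the same loop but swaps the cube map capture for a ray test against a sphere. That substitution is purely for illustration (the method described here rasterises an arbitrary triangle mesh); the grid placement with samples on the box corners and all function names are assumptions as well.

```python
PRINCIPAL = [(-1, 0, 0), (0, -1, 0), (0, 0, -1), (1, 0, 0), (0, 1, 0), (0, 0, 1)]

def cube_directions(size=8):
    """Unnormalised directions through the texel centres of a size x size cube map."""
    dirs = []
    for axis in range(3):
        for sign in (-1.0, 1.0):
            for j in range(size):
                for i in range(size):
                    d = [0.0, 0.0, 0.0]
                    d[axis] = sign
                    d[(axis + 1) % 3] = (i + 0.5) / size * 2 - 1
                    d[(axis + 2) % 3] = (j + 0.5) / size * 2 - 1
                    dirs.append(tuple(d))
    return dirs

def ray_hits_sphere(origin, direction, centre, radius):
    """Stand-in for the cube map capture: does the ray from origin hit the sphere?"""
    oc = [c - o for o, c in zip(origin, centre)]
    t = sum(a * b for a, b in zip(oc, direction)) / sum(x * x for x in direction)
    if t < 0:  # sphere centre lies behind the ray: a hit only if we start inside
        return sum(x * x for x in oc) <= radius * radius
    closest = [o + t * d for o, d in zip(origin, direction)]
    return sum((c - p) ** 2 for c, p in zip(centre, closest)) <= radius * radius

def bake_field(box_min, box_max, res, centre, radius):
    """Returns (field_negative, field_positive), each mapping a grid index to three terms."""
    dirs = cube_directions()
    field_negative, field_positive = {}, {}
    for iz in range(res):
        for iy in range(res):
            for ix in range(res):
                idx = (ix, iy, iz)
                pos = [lo + (hi - lo) * k / (res - 1)
                       for lo, hi, k in zip(box_min, box_max, idx)]
                sample = [1.0 if ray_hits_sphere(pos, d, centre, radius) else 0.0
                          for d in dirs]
                terms = []
                for p in PRINCIPAL:
                    hemisphere = [s for s, d in zip(sample, dirs)
                                  if sum(a * b for a, b in zip(p, d)) > 0]
                    terms.append(sum(hemisphere) / len(hemisphere))
                field_negative[idx] = tuple(terms[:3])
                field_positive[idx] = tuple(terms[3:])
    return field_negative, field_positive

# Unit sphere at the origin inside a padded box, sampled on a tiny 3x3x3 grid.
neg, pos = bake_field((-2, -2, -2), (2, 2, 2), 3, (0, 0, 0), 1.0)
```

At the grid point to the left of the sphere only the right-facing term picks up occlusion, while the sample at the centre of the sphere is fully occluded in every direction.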

This is what it looks like when sampling at a resolution of 32x32x32 (negative XYZ terms on top, positive XYZ terms on bottom):

The resulting image is basically a voxelised representation of the object. Given this data, it is very easy to extract the final occlusion term during rendering. The key equation is the following:

**occlusion = dot(minField(p), max(-n, 0)) + dot(maxField(p), max(n, 0))**, where

- **p** = the position of the sampling point in field space (this is normalised, i.e. (0,0,0) corresponds to one corner of the original bounding box used to generate the samples, and (1,1,1) covers the opposite corner)
- **n** = the normal of the surface in occluder local space
- **minField** = function to sample the minimum/negative terms (a single texture lookup if we have a 3D texture, two lookups and a lerp if we have a 2D texture)
- **maxField** = function to sample the maximum/positive terms

All we’re doing here is computing a weighted sum of the occlusion terms, where the weights are the clamped dot products of **n** with the six principal directions. These weights happen to be the same as the individual components of the normal, so instead of doing six dot products, we can get them by zeroing out the negative terms of **n** and **-n**.
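The equivalence between the six clamped dot products and the clamped components of the normal is easy to check in code. The sketch below takes the already sampled terms as plain tuples; the function names and the sample values are made up for the example.

```python
PRINCIPAL_NEG = [(-1, 0, 0), (0, -1, 0), (0, 0, -1)]
PRINCIPAL_POS = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def occlusion(min_terms, max_terms, n):
    """occlusion = dot(minField(p), max(-n, 0)) + dot(maxField(p), max(n, 0))"""
    neg_weights = [max(-c, 0.0) for c in n]
    pos_weights = [max(c, 0.0) for c in n]
    return dot(min_terms, neg_weights) + dot(max_terms, pos_weights)

def occlusion_six_dots(min_terms, max_terms, n):
    """Same result, written as six clamped dot products with the principal directions."""
    result = 0.0
    for term, d in zip(min_terms, PRINCIPAL_NEG):
        result += term * max(dot(n, d), 0.0)
    for term, d in zip(max_terms, PRINCIPAL_POS):
        result += term * max(dot(n, d), 0.0)
    return result

min_terms = (0.1, 0.2, 0.3)   # made-up values standing in for minField(p)
max_terms = (0.4, 0.5, 0.6)   # made-up values standing in for maxField(p)
up = occlusion(min_terms, max_terms, (0.0, 1.0, 0.0))
left = occlusion(min_terms, max_terms, (-1.0, 0.0, 0.0))
```

A normal pointing straight up selects only the positive Y term, and one pointing left only the negative X term; anything in between blends at most three of the six terms.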

Putting aside the details of obtaining **p** and **n** for a moment, let’s look at the result. Not very surprisingly, the ambient term computed from the above field suffers from aliasing, which is especially visible when moving the object. Blurring the field with an appropriate kernel before use can completely eliminate this artifact. I settled on the following 3x3x3 kernel:

| Slice | Row 1 | Row 2 | Row 3 |
| --- | --- | --- | --- |
| First | 1, 4, 1 | 4, 9, 4 | 1, 4, 1 |
| Middle | 4, 9, 4 | 9, 16, 9 | 4, 9, 4 |
| Last | 1, 4, 1 | 4, 9, 4 | 1, 4, 1 |
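The kernel weights happen to equal (4 − Manhattan distance from the centre) squared, which gives a compact way to build them. The sketch below applies the kernel to one scalar channel of the field; dividing by the sum of the weights and clamping coordinates at the border are assumptions of this example, since the text does not spell out either.

```python
def blur_kernel():
    """3x3x3 weights: 16 at the centre, 9 for face, 4 for edge and 1 for corner neighbours."""
    return {(dx, dy, dz): (4 - abs(dx) - abs(dy) - abs(dz)) ** 2
            for dz in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)}

def blur(field, res):
    """field maps (x, y, z) -> float on a res^3 grid; returns the blurred field."""
    kernel = blur_kernel()
    total = sum(kernel.values())

    def clamp(v):
        return min(max(v, 0), res - 1)

    out = {}
    for (x, y, z) in field:
        acc = 0.0
        for (dx, dy, dz), w in kernel.items():
            acc += w * field[(clamp(x + dx), clamp(y + dy), clamp(z + dz))]
        out[(x, y, z)] = acc / total
    return out

# Example: a single occluded sample in the middle of a 3x3x3 grid.
res = 3
impulse = {(x, y, z): 0.0 for x in range(res) for y in range(res) for z in range(res)}
impulse[(1, 1, 1)] = 1.0
blurred = blur(impulse, res)
kernel_sum = sum(blur_kernel().values())
```

The weights add up to 126, so the single sample ends up as 16/126 at the centre and 1/126 at each corner of the grid.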

Also, since the field is finite in size, I decided to simply fade out the terms to zero near the edge to improve the transition at the boundary. In the Infamous 2 implementation they opted for remapping the samples so the highest boundary value would be zero, but this means that every object has a different mapping that needs to be fixed with other magic coefficients later on. Here’s a comparison of the original (left) and the blurred (right) fields:

Back to the problem of sampling. Most of the work is just transforming points and vectors between coordinate systems, so it can be done in the vertex shader. Let’s define a few transformations:

- **F** – occluder local to (normalised) field space, i.e. map the bounding box in the occluder’s local space to the range (0,0,0)-(1,1,1); this matrix is tied to the baked field, therefore it’s constant
- **O** – occluder local to world space, i.e. occluder model transformation
- **R** – receiver local to world space, i.e. receiver model transformation

I’ll use the **f**, **o**, and **r** subscripts to denote that a point or vector is in field, occluder local or receiver local space, e.g. **p_{f}** is the field space position. The normal used for sampling is the receiver’s local normal **n_{r}** taken into occluder local space:

**n = n_{o} = normalize(O^{-1} * R * n_{r})**

The bias factor is inversely proportional to the field’s resolution (I’m using 1/32 in the example, but it could also be a non-uniform scale if the field is not cube shaped), and its role is to prevent surfaces from shadowing themselves. Note that we’re not transforming the normal into field space, since that would alter its direction.
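A rough sketch of the per-vertex work follows. The normal is computed as in the formula above; the sampling position is my assumption, taken as the field space position **F** · **O**⁻¹ · **R** · **p_{r}** pushed along the untransformed **n** by the bias, which is one way to read the description of the bias factor. The inverse of **O** is passed in precomputed, and all helper names are hypothetical.

```python
import math

def mat_mul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, v, w):
    """Apply a 4x4 matrix to (v, w); use w=1 for points and w=0 for directions."""
    v4 = list(v) + [w]
    return [sum(m[i][k] * v4[k] for k in range(4)) for i in range(3)]

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

def field_coords(F, O_inv, R, p_r, n_r, bias):
    """Returns (p, n): the biased field space position and the occluder local normal."""
    to_occluder = mat_mul(O_inv, R)
    n_o = normalize(transform(to_occluder, n_r, 0.0))
    p_f = transform(mat_mul(F, to_occluder), p_r, 1.0)
    # Assumption: offset along n directly in field space, scaled by the bias.
    p = [pf + no * bias for pf, no in zip(p_f, n_o)]
    return p, n_o

def translation(x, y, z):
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]

IDENTITY = translation(0, 0, 0)
# F maps the occluder's box [-1, 1]^3 to [0, 1]^3 (scale by 0.5, then shift by 0.5).
F = [[0.5, 0, 0, 0.5], [0, 0.5, 0, 0.5], [0, 0, 0.5, 0.5], [0, 0, 0, 1]]

p, n = field_coords(F, IDENTITY, translation(1, 0, 0),
                    p_r=(0, 0, 0), n_r=(0, 2, 0), bias=1.0 / 32)
```

In a real renderer the product of the three matrices would be computed once per occluder and receiver pair and passed to the vertex shader as a constant.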

And that’s pretty much it! So far I’m very pleased with the results. One improvement I believe might be worth looking into is reducing the number of terms from 6 to 4 per sample, so we can halve the texture space and the number of lookups needed. To do this, I’d pick the normals of a regular tetrahedron instead of the six principal directions for reference, and compute 4 dot products in the vertex shader (the 4 reference normals could be packed in a 4×3 matrix **N**) to determine the contribution of each term:

**weights = N * n_{o} = (dot(n_{o}, n_{1}), dot(n_{o}, n_{2}), dot(n_{o}, n_{3}), dot(n_{o}, n_{4}))**
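As a quick illustration of the weight computation, the sketch below uses one common choice of regular tetrahedron normals (alternating corners of a cube). That particular orientation is an assumption, as is the name `tetra_weights`; the formula above leaves both open, along with the question of whether the weights should be clamped.

```python
import math

S = 1.0 / math.sqrt(3.0)
# Rows of the 4x3 matrix N: unit normals of a regular tetrahedron (assumed orientation).
N = [( S,  S,  S),
     ( S, -S, -S),
     (-S,  S, -S),
     (-S, -S,  S)]

def tetra_weights(n_o):
    """weights = N * n_o, i.e. one dot product per reference normal."""
    return [sum(a * b for a, b in zip(row, n_o)) for row in N]

weights = tetra_weights(N[0])
```

A normal aligned with the first reference direction gets a weight of 1 for that term and −1/3 for each of the other three, and since the four reference normals sum to zero, so do the unclamped weights.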

As soon as LambdaCube is in good shape again, I’m planning to augment our beloved Stunts example with this effect.