### In a Nutshell

LambdaCube 3D is a domain specific language and library that makes it possible to program GPUs in a purely functional style.

Purely Functional Rendering Engine

October 14, 2012

Posted by on One of the test examples in the repository is an implementation of Variance Shadow Mapping, which lets us demonstrate multi-pass rendering through a relatively simple effect. In this post we’ll first have a look at the runnable EDSL implementation, discussing some of our syntactic woes on the way.

The Wikipedia article on shadow mapping gives a good overview of the general idea: first we render the scene from the point of view of the light source (using perspective projection for point or spot lights, and orthographic projection for directional lights), recording the distances of various surfaces from the light, then use this information as a look-up table in a second pass when we render the scene in the space of the main camera.

All we need to do to make shadow mapping work is to think carefully about the transformation matrices involved. The transformation from object local coordinates to screen coordinates is normally decomposed into three phases:

- model: object-local to world space (global coordinates);
- view: world to camera space (camera-local coordinates);
- projection: camera to screen space (coordinates after perspective projection and normalisation).

The model matrix is potentially different for every object, while view and projection are global to the rendering pass, so they can be composed before rendering. Consequently, we are going to use the following transformation matrices:

**modelMatrix**: per-object model matrix;**lightMatrix**: view-projection matrix for the light;**cameraMatrix**: view-projection matrix for the camera.

The first step is to record the distances of the closest surfaces from the light source. Whether the light is directional or point light is irrelevant; all we need is the light matrix to be able to transform the vertices as desired. This simple depth pass is handled by the following definition:

depth :: Exp Obj (FrameBuffer N1 (Float, Float)) depth = Accumulate accCtx PassAll frag (Rasterize triangleCtx prims) clearBuf where accCtx = AccumulationContext Nothing (DepthOp Less True :. ColorOp NoBlending True :. ZT) clearBuf = FrameBuffer (DepthImage n1 1000 :. ColorImage n1 0 :. ZT) prims = Transform vert (Fetch "geometrySlot" Triangle (IV3F "position")) lightMatrix = Uni (IM44F "lightMatrix") modelMatrix = Uni (IM44F "modelMatrix") vert :: Exp V V3F -> VertexOut Float vert pos = VertexOut lightPos (floatV 1) (Smooth depth :. ZT) where lightPos = lightMatrix @*. modelMatrix @*. v3v4 pos V4 _ _ depth _ = unpack' lightPos frag :: Exp F Float -> FragmentOut (Depth Float :+: Color Float :+: ZZ) frag depth = FragmentOutRastDepth (depth :. ZT)

Due to the lack of dedicated syntax, the notation is somewhat heavier than the DSL equivalent would be. The **Exp** type constructor corresponds to the **@** operator; its first argument is the frequency, while the second is the payload type. For technical reasons, we also need our own non-flat tuple representation, which is constructed with the **:.** operator (**:+:** in the type system). In this case, **depth** is a one-layer framebuffer that stores two floating point numbers per pixel. In this particular case, both happen to be the same: the distance from the light. The first is the depth generated by the rasteriser, and the second is the output of the fragment shader. The latter value will change later when we calculate variance.

The top level of the definition starts with the last stage of the pipeline, the accumulation step. This is where the fragment shader is applied to the output of the rasteriser, i.e. the fragment stream. The accumulation context describes what happens to each fragment produced by the shader: they are only kept if their depth is less than that in the framebuffer, the new depth is written to the buffer, and the new colour simply overwrites the old one without blending. There is no additional fragment filter, which is expressed with the **PassAll** constant (equivalent to providing a constantly true function). Before accumulation, the framebuffer is cleared by setting the raster depth to 1000 and the ‘colour’ to 0 in each pixel.

As for the fragment stream, it is obtained by rasterising the primitive stream, which consists of triangles. There is only one vertex attribute, the position. Everything else is irrelevant for this pass, since we don’t need to calculate anything besides the actual shapes. Geometry can be pushed into the pipeline through named slots, and this is the point where we can define the name.

The vertex shader transforms the position into light-space coordinates by going through the world coordinate system first. Our infix operators are all prefixed with **@** to avoid name collusions with the operators defined in the standard prelude; we intend to drop this prefix in the DSL. The **v3v4** function simply extends a 3D vector with the homogeneous coordinate w=1. For the payload, it simply emits the z coordinate of the light-space position. We don’t have a convenient interface for swizzling at the moment: vector components can be extracted by pattern matching on the result of the **unpack’** function (all the primitive functions have an apostrophe in their name to avoid collision with the prelude). Also, the definition of uniforms makes it quite apparent that we are basically building a raw AST.

In order to make use of the depth information, we need to convert it into a sampler, which will be referenced in the second, final pass:

shadowMapSize :: Num a => a shadowMapSize = 512 sm :: Exp Obj (FrameBuffer N1 (Float, V4F)) sm = Accumulate accCtx PassAll frag (Rasterize triangleCtx prims) clearBuf where accCtx = AccumulationContext Nothing (DepthOp Less True :. ColorOp NoBlending (one' :: V4B) :. ZT) clearBuf = FrameBuffer (DepthImage n1 1000 :. ColorImage n1 (V4 0.1 0.2 0.6 1) :. ZT) prims = Transform vert (Fetch "geometrySlot" Triangle (IV3F "position", IV3F "normal")) cameraMatrix = Uni (IM44F "cameraMatrix") lightMatrix = Uni (IM44F "lightMatrix") modelMatrix = Uni (IM44F "modelMatrix") lightPosition = Uni (IV3F "lightPosition") vert :: Exp V (V3F, V3F) -> VertexOut (V3F, V4F, V3F) vert attr = VertexOut viewPos (floatV 1) (Smooth (v4v3 worldPos) :. Smooth lightPos :. Smooth worldNormal :. ZT) where worldPos = modelMatrix @*. v3v4 localPos viewPos = cameraMatrix @*. worldPos lightPos = lightMatrix @*. worldPos worldNormal = normalize' (v4v3 (modelMatrix @*. n3v4 localNormal)) (localPos, localNormal) = untup2 attr frag :: Exp F (V3F, V4F, V3F) -> FragmentOut (Depth Float :+: Color V4F :+: ZZ) frag attr = FragmentOutRastDepth (luminance :. ZT) where V4 lightU lightV lightDepth lightW = unpack' lightPos uv = clampUV (scaleUV (pack' (V2 lightU lightV) @/ lightW)) surfaceDistance = texture' sampler uv lightPortion = Cond (lightDepth @<= surfaceDistance @+ floatF 0.01) (floatF 1) (floatF 0) lambert = max' (floatF 0) (dot' worldNormal (normalize' (lightPosition @- worldPos))) intensity = lambert @* lightPortion luminance = pack' (V4 intensity intensity intensity (floatF 1)) clampUV x = clamp' x (floatF 0) (floatF 1) scaleUV x = x @* floatF 0.5 @+ floatF 0.5 (worldPos, lightPos, worldNormal) = untup3 attr sampler = Sampler PointFilter Clamp shadowMap shadowMap :: Texture (Exp Obj) DIM2 SingleTex (Regular Float) Red shadowMap = Texture (Texture2D (Float Red) n1) (V2 shadowMapSize shadowMapSize) NoMip [PrjFrameBuffer "shadowMap" tix0 depth]

The definition of the sampler is right at the bottom. First we convert the framebuffer yielded by **depth** into an image using **PrjFrameBuffer**, which projects a given member of a tuple during the conversion. The predefined value **tix0** is the *tuple index* of the first element. In this case, our ‘tuple’ is degenerate, since it consists of a single element anyway. The resulting image is converted into a two-dimensional texture (**shadowMap**) with just a floating-point red channel. Finally, the texture is wrapped in a sampler structure (**sampler**), which specifies that it’s a non-repeating image that must not be smoothened during sampling, since that wouldn’t be meaningful for depth values at the edges of objects.

The pipeline setup is very similar to that of the depth pass, and most of the difference is in the shaders. This is a more complex case, where we have tuples both in the vertex and the fragment stream. Again, for technical reasons we need to unpack these representations with a dedicated function (**untup***) and pattern matching, just like vectors. We could also use view patterns to make this extra step a bit less painful, but in the end all this won’t be necessary in the DSL.

As for shadows, the work is divided up between the two shader phases. The vertex shader calculates the vectors needed: view position (used by the rasteriser), world space position, light space position and world space surface normal. We cheated a bit with the normal calculation, since we don’t use the inverse transpose of the matrix. This is fine as long as our transformations don’t involve any non-uniform scaling, or we’re only scaling axis-aligned cuboids. The **n3v4** function extends a 3D vector with w=0, so it is treated as a direction.

The fragment shader calculates the final colour of each pixel. The light space position is used to address the shadow map (**sampler**) as well as to quickly determine the distance to the light source without calculating an extra square root. The value of **lightPortion** is 0 if there is an occluder between the current point and the light source, 1 otherwise. To avoid self-shadowing, a little offset is applied to the stored depth of the closest surface (this could have been done in the first pass as well). Afterwards, we calculate the light contribution using the Lambert model, i.e. taking the cosine of the angle between the surface normal and the vector that points towards the light. Multiplying this value with the light proportion gives us an image like this:

We can clearly see that the chosen depth offset is not sufficient, since the background plane still suffers from self-shadowing.

In the actual example, we use a spotlight instead of a point light that radiates in every direction. This effect is simply achieved by calculating a colour that depends on the shadow UV coordinates. We can change the definition of **intensity** and **luminance** to get a better idea of the light’s direction:

uv' = uv @- floatF 0.5 spotShape = floatF 1 @- length' uv' @* floatF 4 intensity = max' (floatF 0) (spotShape @* lambert) V2 spotR spotG = unpack' (scaleUV (round' (uv' @* floatF 10)) @* intensity) luminance = pack' (V4 spotR spotG intensity (floatF 1)) @* lightPortion

The resulting image is maybe a bit more interesting than the previous one:

There are several issues with basic shadow mapping. Aliased shadow edges are only the tip of the iceberg; nowadays they can be easily fixed by using shadow samplers, which can provide bilinear filtering on the light proportion instead of just a binary comparison. However, if we want softer shadows, life gets a lot more complicated. The obvious solution is to take several samples of the shadow map and average the results, but this entails a potentially severe performance hit. We can also get away with a single sample if we apply a jitter to the UV coordinates, but this results in a noisy pattern instead of a smooth transition.

Variance shadow mapping makes it possible to get a proper smoothing effect with a single sample from the shadow map. The basic idea is to store a probability distribution instead of an exact function, and estimate the probability of a pixel being occluded instead of performing an exact test. The VSM algorithm uses Chebyshev’s inequality for the estimation. Since our shadow map is now a probability distribution, we can directly blur it with a Gaussian filter and get meaningful results. Another nice side effect is that the new formula also addresses the problem of self-shadowing, and provides a robust solution in place of the rather brittle and scene dependent offset hack.

In the first pass, we store the first two moments of the depth distribution:

moments :: Exp Obj (FrameBuffer N1 (Float, V2F)) moments = Accumulate accCtx PassAll frag (Rasterize triangleCtx prims) clearBuf where accCtx = AccumulationContext Nothing (DepthOp Less True :. ColorOp NoBlending (one' :: V2B) :. ZT) clearBuf = FrameBuffer (DepthImage n1 1000 :. ColorImage n1 (V2 0 0) :. ZT) prims = Transform vert (Fetch "geometrySlot" Triangle (IV3F "position")) lightMatrix = Uni (IM44F "lightMatrix") modelMatrix = Uni (IM44F "modelMatrix") vert :: Exp V V3F -> VertexOut Float vert pos = VertexOut lightPos (floatV 1) (Smooth depth :. ZT) where lightPos = lightMatrix @*. modelMatrix @*. v3v4 pos V4 _ _ depth _ = unpack' lightPos frag :: Exp F Float -> FragmentOut (Depth Float :+: Color V2F :+: ZZ) frag depth = FragmentOutRastDepth (pack' (V2 moment1 moment2) :. ZT) where dx = dFdx' depth dy = dFdy' depth moment1 = depth moment2 = depth @* depth @+ floatF 0.25 @* (dx @* dx @+ dy @* dy)

The difference between **moments** and **depth** is the type (two-dimensional float vector instead of a single float per pixel) and the fragment shader, which calculates two moments of the distribution. The second pass (**vsm**) is also similar to the basic shadow mapping case, the only thing that changes is the formula for the light portion, which now becomes the maximum probability that the surface is lit:

V2 moment1 moment2 = unpack' (texture' sampler uv) variance = max' (floatF 0.002) (moment2 @- moment1 @* moment1) distance = max' (floatF 0) (lightDepth @- moment1) lightProbMax = variance @/ (variance @+ distance @* distance)

The other thing that changes slightly is the definition of the sampler:

sampler = Sampler LinearFilter Clamp shadowMap shadowMap :: Texture (Exp Obj) DIM2 SingleTex (Regular Float) RG shadowMap = Texture (Texture2D (Float RG) n1) (V2 shadowMapSize shadowMapSize) NoMip [PrjFrameBuffer "shadowMap" tix0 moments]

Unlike the previous one, this texture has a green component as well to store the second moment, and the sampler is set up to perform linear filtering. Using the value of **lightProbMax** directly as the light portion is a good first approximation, but it leads to light bleeding, a well-known problem with VSM:

Before addressing the bleeding issue, we should first take advantage of the fact that the shadow map can be filtered. We are going to insert an extra pair of passes between **moments** and **vsm** that blurs the shadow map. It is a pair of passes because we exploit the separability of the Gaussian filter, so first we blur the image vertically, then horizontally, thereby doing O(n) work per pixel instead of O(n^{2}), where n is the width of the filter. The blur is described by the following function:

blur :: [(Float, Float)] -> Exp Obj (Image N1 V2F) -> Exp Obj (FrameBuffer N1 V2F) blur coefficients img = filter1D dirH (PrjFrameBuffer "" tix0 (filter1D dirV img)) where dirH v = Const (V2 (v / shadowMapSize) 0) :: Exp F V2F dirV v = Const (V2 0 (v / shadowMapSize)) :: Exp F V2F filter1D :: (Float -> Exp F V2F) -> Exp Obj (Image N1 V2F) -> Exp Obj (FrameBuffer N1 V2F) filter1D dir img = Accumulate accCtx PassAll frag (Rasterize triangleCtx prims) clearBuf where accCtx = AccumulationContext Nothing (ColorOp NoBlending (one' :: V2B) :. ZT) clearBuf = FrameBuffer (ColorImage n1 (V2 0 0) :. ZT) prims = Transform vert (Fetch "postSlot" Triangle (IV2F "position")) vert :: Exp V V2F -> VertexOut V2F vert uv = VertexOut pos (Const 1) (NoPerspective uv' :. ZT) where uv' = uv @* floatV 0.5 @+ floatV 0.5 pos = pack' (V4 u v (floatV 1) (floatV 1)) V2 u v = unpack' uv frag :: Exp F V2F -> FragmentOut (Color V2F :+: ZZ) frag uv = FragmentOut (sample :. ZT) where sample = foldr1 (@+) [texture' smp (uv @+ dir ofs) @* floatF coeff | (ofs, coeff) <- coefficients] smp = Sampler LinearFilter Clamp tex tex = Texture (Texture2D (Float RG) n1) (V2 shadowMapSize shadowMapSize) NoMip [img]

The **blur** function takes a set of coefficient-offset pairs and an input image, and yields a framebuffer containing the filtered version of the image. This shows the power of our approach: since this is an ordinary pure function, it can be applied to any image in any pipeline. In fact, it could be part of a standard library of utility functions after factoring out the resolution as a separate argument instead of using our global constant **shadowMapSize**. The only notable novelty here is the use of Haskell for meta-programming: the amount of shader code generated is proportional to the length of the coefficient list, since the summation is expanded statically. We need to find a convenient substitute for this facility when migrating to the DSL.

To insert the blur, we need to change the definition of the sampler in **vsm**:

sampler = Sampler LinearFilter Clamp shadowMapBlur shadowMapBlur :: Texture (Exp Obj) DIM2 SingleTex (Regular Float) RG shadowMapBlur = Texture (Texture2D (Float RG) n1) (V2 shadowMapSize shadowMapSize) NoMip [PrjFrameBuffer "shadowMap" tix0 blurredMoments] where blurredMoments = blur blurCoefficients (PrjFrameBuffer "blur" tix0 moments) blurCoefficients = [(-4.0, 0.05), (-3.0, 0.09), (-2.0, 0.12), (-1.0, 0.15), (0.0, 0.16), (1.0, 0.15), (2.0, 0.12), (3.0, 0.09), (4.0, 0.05)]

After this change, our shadows change drastically:

As it turns out, blurring also helps against light bleeding to a certain extent, since light bleeding appears in areas where the variance is big. However, it is still quite obvious. There are several ways to address the problem, and we chose the simplest for the sake of the example: raising the light portion to a power. The higher the exponent, the less the light bleeding, but unfortunately increasing the exponent also causes overdarkening. In a way, this is a hack in a similar vein as the depth offset is for simple shadow mapping. In the example, we use the square of **lightProbMax** as the light portion, which gives us nice soft shadows:

We recommend using this example as a starting point for your own experiments, as it has no external dependencies.