LambdaCube 3D

Purely Functional Rendering Engine

Monthly Archives: August 2012

Motivations and background

At its heart, the graphics pipeline is simply a configurable data-flow network. The main input of the network arrives as a stream of vertex descriptions, and additional data can be provided in various slots, which are constant during a rendering pass: uniforms of basic types and samplers (textures with some attached logic). After some processing steps, the final output is one or more raster images.

We can look at this data-flow network as a mathematical function that maps scene descriptions to bitmaps. The internal structure of this function can be defined in terms of smaller building blocks that correspond to various stages of the pipeline. Even the programmable pipeline has a more or less fixed global structure, but the transformations within the main stages can be freely defined through shaders, which suggests that the pipeline can be naturally modelled as a higher-order function.

If a single execution of the whole pipeline – a pass – is in essence a function application, it is also straightforward to compose these functions to build more complex rendering processes. All we need is a mapping from framebuffers to textures, which can be read back through samplers in a later pass. In the end, even multi-pass rendering can be modelled as a pure function.

Unfortunately, the practice of programming GPUs completely obscures the purity of the transformation, because the data-flow has to be set up manually, and its description requires at least two languages: a shader language for the GPU parts and the host language to hook up everything on the CPU side. The latter involves programming against a stateful API that sets up the video card through side effects. This is an unnecessarily error prone process, and difficult to structure with composable modules.

Working directly with the data-flow is beneficial for several reasons. As an abstraction over the hardware, it opens up the possibility of using all the capabilities of the machine without having to specifically program for each platform, and might also make profiling-driven optimisations feasible. This also makes the code future proof, since only the data-flow compiler needs to be updated from time to time. Besides, pure functions provide a degree of compositionality unmatched by code relying on side effects. It should be much easier to create reusable components in this system, because they are context independent. At the same time, the final pipeline could be made efficient by applying whole-program optimisation steps based on mathematical equalities, e.g. floating (hoisting) certain operations from the fragment shader to an earlier stage.

Some notable systems

Despite the fact that declarative stream programming is a well-established paradigm – represented by languages as old as e.g. Lustre –, and GPUs are an obvious subject for this approach, it is difficult to find systems that try to implement the idea outlined above. We quickly present a few relevant solutions below, and we’d be happy to receive pointers to others if you happen to know one or two; feel free to comment!

Real-Time Shading Language

The Stanford Real-Time Programmable Shading Project is a good starting point, since this is where the concept of computation frequency was first introduced. The frequency associated with a node in the data-flow network rather unsurprisingly specifies how often that node emits new values. This quantity is directly related to the pipeline phase the node is in: nodes in the fragment shader produce several values for each input coming from the vertex shader, which in turn processes several records (vertex attributes) for each possible change in the uniforms, since uniforms are fixed during a pass. The frequency concept can reach as far as the preprocessing stage, since compile-time constants can be thought of as the ultimate slowest-changing values.

While the language developed by the project is deliberately C-like in its appearance, it is purely functional in nature. All the pipeline is described by a single program, and it is up to the system to either allocate everything in the appropriate shader stage or even multiple passes if the fragment-level computation is too complex to fit in one (or in the absence of programmable shaders). Frequencies are handled by the type system and inferred by the compiler. Unfortunately, due to the fact that this is a decade-old project, the expressiveness of the language is severely limited.


If we are to marry graphics and functional programming, it is impossible not to mention Conal Elliott’s name. His Vertigo project shows another way to turn purely functional descriptions into executable shader code, and the demos show how to define parametric surfaces and various materials. Unfortunately, this system doesn’t address the issue of frequencies, as it generates only vertex shaders. However, Conal’s work is generally interesting because he approaches modelling from a purely mathematical standpoint, therefore everything is dictated by precise denotational semantics.


One more recent attempt is Renaissance, which allows the user to describe a single pass in a pure functional language. Given such a description, it can infer the type and frequency of the nodes solely from their dependencies, and allocate all the operations in the corresponding shader stages. Unlike RTSL, the syntax of this language is inspired by Haskell, and type annotations are optional. Conceptually, the programmer defines a fragment shader, and it is up to the compiler to float computations back to earlier stages if they don’t refer to any fragment-frequency constructs. Otherwise, the two systems are similar in expressive power.


The project formerly known as HaGPipe is a Haskell EDSL. This is the only other system we’re aware of that can describe multi-pass rendering processes (i.e. rendering to textures and using the results in subsequent passes) without any need for assistance from the host language or manually tinkering with framebuffers. All the other systems mentioned here are in essence shader language replacements, only intended to be just a small component in a rendering engine. GPipe distinguishes between frequencies in the type system, by explicitly stating whether a value is a PrimitiveStream or a FragmentStream, for instance. It also makes it possible to explicitly configure the fixed functionality of the rendering pipeline, e.g. blending equation or depth test, thereby providing a complete solution.


Unlike the others, Spark approaches modelling using OO principles. It is interesting for us mainly because it is trying to attack the same problem from another direction: organisation of rendering code with reusable components, where the separation of concerns is independent from the pipeline stages. In Spark, the means of combination are inheritance and generous use of mixins. The type system is aware of different frequencies, and it also allows the user to define new ones.


We decided to make LambdaCube a DSL as opposed to an embedded DSL, like GPipe, for a number of reasons.

Most importantly, DSLs are decoupled from the compiler of the host language. As a result, pipeline descriptions can be bundled with geometry and other scene data without introducing unnecessary dependencies. They don’t have to be mixed with the code, where they don’t belong.

Being responsible for all the compilation stages starting with parsing gives us in general a lot more control over the process of executing code as opposed to relying on a host language compiler. For instance, it should allow us to provide facilities to perform quick partial recompiles combined with hot-swapping, at least during development when no complex optimisations are turned on. Also, it could potentially enable advanced features like interactive optimisation, where individual rewrite rules could be turned on and off while a test scene is running, thereby getting instant feedback about their effect on performance.

Finally, when designing a DSL, we are not limited by the features or structure of a host language, e.g. its type system or standard library. We can potentially get a better fit with the domain. Even if the host language has an expressive type system that’s technically capable of modelling the domain, using it directly might not necessarily provide optimal experience. For instance, type errors tend to be cryptic due to the disconnect with the domain.

At the present moment, LambdaCube is only available as a pseudo-EDSL in Haskell, until we implement the compiler front-end. The language exposed is essentially the AST of the actual DSL made slightly friendlier through a thin wrapper module, which adds convenience features like infix operators. However, we intend to move forward with the parser as soon as possible.


A new era in 3D rendering

Even if the title of this opening post might sound slightly pompous, we believe that a bit of ambition won’t hurt at this point. After all, we intend to challenge the state of the art in 3D rendering. As for who the mysterious ‘we’ refers to, you can find out in the About section.

The primary topic of this blog will be LambdaCube 3D, a domain specific language and library that provides a purely functional way to program GPUs. However, we also intend to cover additional issues related to creating interactive audiovisual content, especially games, in idiomatic Haskell. Just a few examples: reliable cross-platform audio programming, integrating physics, functional reactive programming (FRP henceforth).

A bit of history

LambdaCube started its life in the beginning of 2009 as Csaba’s toy project to gain some experience in Haskell by putting cool things on the screen through the OpenGL bindings. As Gergely was his mentor at the university at the time, they teamed up and submitted an application to the Jane Street Summer Project. As a result, the demo application was completely rewritten and turned into a library in a few months, and saw its first Hackage release. This library was still designed as a reimplementation of the OGRE 3D engine, i.e. it would be able to load content created for OGRE, and it had superficial integration with the Bullet physics engine as well.

The first release was followed by a long period of silence due to other obligations (and maybe a slight case of double stealth mode), but development sped up again in 2011. After a few months of work behind the curtains, an improved version of the library was released, which already made it possible to build the Stunts demo. This version still followed the same principles as the first one, focusing on OGRE support, and was decidedly the last release of its kind, since it was impossible to provide a nice functional interface while building on a legacy base.

Previously, there have been attempts at programming GPUs in a way that’s truly functional in spirit. So far, the most feature complete system to our knowledge is GPipe, which served as the primary inspiration for the third major version of LambdaCube. Just as GPipe, LambdaCube allows the developer to describe the whole rendering pipeline in a purely functional manner as a stream processing network, and let the system figure out how to map the description to the underlying graphics APIs and manage resources. The primary difference between the two is that GPipe is a DSL embedded in Haskell, while LambdaCube aims to be a fully independent DSL that can exist outside the Haskell world.

The current state of affairs

In its current state, LambdaCube is already functional, but still in its infancy. The current API is a rudimentary EDSL that is not intended for direct use in the long run. It is essentially the internal phase of a compiler backend exposed for testing purposes. To exercise the library, we have created two small proof of concept examples: a port of the old LambdaCube Stunts example, and a Quake III level viewer. It is our pleasure to see that LambdaCube already provides much better runtime performance than GPipe without compromising the overall design.

If you are interested in getting your hands dirty, feel free to check out the Getting Started section.

Stunts demo with Parallel Split Shadow Mapping

Quake III Arena level viewer