In a Nutshell
LambdaCube 3D is a domain specific language and library that makes it possible to program GPUs in a purely functional style.
Purely Functional Rendering Engine
At its heart, the graphics pipeline is simply a configurable data-flow network. The main input of the network arrives as a stream of vertex descriptions, and additional data can be provided in various slots, which are constant during a rendering pass: uniforms of basic types and samplers (textures with some attached logic). After some processing steps, the final output is one or more raster images.
We can look at this data-flow network as a mathematical function that maps scene descriptions to bitmaps. The internal structure of this function can be defined in terms of smaller building blocks that correspond to various stages of the pipeline. Even the programmable pipeline has a more or less fixed global structure, but the transformations within the main stages can be freely defined through shaders, which suggests that the pipeline can be naturally modelled as a higher-order function.
If a single execution of the whole pipeline – a pass – is in essence a function application, it is also straightforward to compose these functions to build more complex rendering processes. All we need is a mapping from framebuffers to textures, which can be read back through samplers in a later pass. In the end, even multi-pass rendering can be modelled as a pure function.
Unfortunately, the practice of programming GPUs completely obscures the purity of the transformation, because the data-flow has to be set up manually, and its description requires at least two languages: a shader language for the GPU parts and the host language to hook up everything on the CPU side. The latter involves programming against a stateful API that sets up the video card through side effects. This is an unnecessarily error prone process, and difficult to structure with composable modules.
Working directly with the data-flow is beneficial for several reasons. As an abstraction over the hardware, it opens up the possibility of using all the capabilities of the machine without having to specifically program for each platform, and might also make profiling-driven optimisations feasible. This also makes the code future proof, since only the data-flow compiler needs to be updated from time to time. Besides, pure functions provide a degree of compositionality unmatched by code relying on side effects. It should be much easier to create reusable components in this system, because they are context independent. At the same time, the final pipeline could be made efficient by applying whole-program optimisation steps based on mathematical equalities, e.g. floating (hoisting) certain operations from the fragment shader to an earlier stage.
Despite the fact that declarative stream programming is a well-established paradigm – represented by languages as old as e.g. Lustre –, and GPUs are an obvious subject for this approach, it is difficult to find systems that try to implement the idea outlined above. We quickly present a few relevant solutions below, and we’d be happy to receive pointers to others if you happen to know one or two; feel free to comment!
The Stanford Real-Time Programmable Shading Project is a good starting point, since this is where the concept of computation frequency was first introduced. The frequency associated with a node in the data-flow network rather unsurprisingly specifies how often that node emits new values. This quantity is directly related to the pipeline phase the node is in: nodes in the fragment shader produce several values for each input coming from the vertex shader, which in turn processes several records (vertex attributes) for each possible change in the uniforms, since uniforms are fixed during a pass. The frequency concept can reach as far as the preprocessing stage, since compile-time constants can be thought of as the ultimate slowest-changing values.
While the language developed by the project is deliberately C-like in its appearance, it is purely functional in nature. All the pipeline is described by a single program, and it is up to the system to either allocate everything in the appropriate shader stage or even multiple passes if the fragment-level computation is too complex to fit in one (or in the absence of programmable shaders). Frequencies are handled by the type system and inferred by the compiler. Unfortunately, due to the fact that this is a decade-old project, the expressiveness of the language is severely limited.
If we are to marry graphics and functional programming, it is impossible not to mention Conal Elliott’s name. His Vertigo project shows another way to turn purely functional descriptions into executable shader code, and the demos show how to define parametric surfaces and various materials. Unfortunately, this system doesn’t address the issue of frequencies, as it generates only vertex shaders. However, Conal’s work is generally interesting because he approaches modelling from a purely mathematical standpoint, therefore everything is dictated by precise denotational semantics.
One more recent attempt is Renaissance, which allows the user to describe a single pass in a pure functional language. Given such a description, it can infer the type and frequency of the nodes solely from their dependencies, and allocate all the operations in the corresponding shader stages. Unlike RTSL, the syntax of this language is inspired by Haskell, and type annotations are optional. Conceptually, the programmer defines a fragment shader, and it is up to the compiler to float computations back to earlier stages if they don’t refer to any fragment-frequency constructs. Otherwise, the two systems are similar in expressive power.
The project formerly known as HaGPipe is a Haskell EDSL. This is the only other system we’re aware of that can describe multi-pass rendering processes (i.e. rendering to textures and using the results in subsequent passes) without any need for assistance from the host language or manually tinkering with framebuffers. All the other systems mentioned here are in essence shader language replacements, only intended to be just a small component in a rendering engine. GPipe distinguishes between frequencies in the type system, by explicitly stating whether a value is a PrimitiveStream or a FragmentStream, for instance. It also makes it possible to explicitly configure the fixed functionality of the rendering pipeline, e.g. blending equation or depth test, thereby providing a complete solution.
Unlike the others, Spark approaches modelling using OO principles. It is interesting for us mainly because it is trying to attack the same problem from another direction: organisation of rendering code with reusable components, where the separation of concerns is independent from the pipeline stages. In Spark, the means of combination are inheritance and generous use of mixins. The type system is aware of different frequencies, and it also allows the user to define new ones.
We decided to make LambdaCube a DSL as opposed to an embedded DSL, like GPipe, for a number of reasons.
Most importantly, DSLs are decoupled from the compiler of the host language. As a result, pipeline descriptions can be bundled with geometry and other scene data without introducing unnecessary dependencies. They don’t have to be mixed with the code, where they don’t belong.
Being responsible for all the compilation stages starting with parsing gives us in general a lot more control over the process of executing code as opposed to relying on a host language compiler. For instance, it should allow us to provide facilities to perform quick partial recompiles combined with hot-swapping, at least during development when no complex optimisations are turned on. Also, it could potentially enable advanced features like interactive optimisation, where individual rewrite rules could be turned on and off while a test scene is running, thereby getting instant feedback about their effect on performance.
Finally, when designing a DSL, we are not limited by the features or structure of a host language, e.g. its type system or standard library. We can potentially get a better fit with the domain. Even if the host language has an expressive type system that’s technically capable of modelling the domain, using it directly might not necessarily provide optimal experience. For instance, type errors tend to be cryptic due to the disconnect with the domain.
At the present moment, LambdaCube is only available as a pseudo-EDSL in Haskell, until we implement the compiler front-end. The language exposed is essentially the AST of the actual DSL made slightly friendlier through a thin wrapper module, which adds convenience features like infix operators. However, we intend to move forward with the parser as soon as possible.