LambdaCube 3D
Purely Functional Rendering Engine
Playing around with distance field font rendering
November 12, 2014
While waiting for LambdaCube to reach a stable state I started thinking about displaying text with it. I wanted to build a system that doesn’t limit the user to a pre-determined set of characters, so all the static atlas based methods were out of the question. From experience I also know that a dynamic atlas can fill up very quickly, as the same characters need to be rendered to it at all the different sizes we want to display them in. But I also wanted to see nice sharp edges, so simply zooming the texture wasn’t an option either.
What’s out there?
Short of converting the characters to triangle meshes and rendering them directly, the most popular solution to this problem is to represent them as signed distance fields (SDF). The most commonly cited source for this approach is a well-known white paper from Valve. Unfortunately, the problem with SDF based rendering is that it generally cannot preserve the sharpness of corners beyond the resolution of the texture:
The above letter was rendered at a resolution of 64 pixels per em. Displaying SDF shapes is very simple: just sample the texture with bilinear interpolation and apply a step function. By choosing the right step function adapted to the scale, it’s possible to get cheap anti-aliasing. If anti-aliasing isn’t needed, even simple alpha testing can do the trick, in which case rendering works without shaders. As an added bonus, choosing different step functions lets us render outlines, cheap blurs, and various other effects. In practice, the distance field does an excellent job when it comes to curves. When zoomed in, the piecewise bilinear nature of the curves becomes apparent (it actually tends to look piecewise linear), but at that point we could just swap in the real geometry if the application needs higher fidelity.
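To make the sampling step concrete, here is what the per-texel logic might look like as a plain Haskell sketch (in a real renderer this would live in a fragment shader; the 0.5 iso-value convention and all the names are my own illustrative choices, not code from any particular implementation):

```haskell
-- GLSL-style smoothstep: 0 below lo, 1 above hi, smooth in between.
smoothstep :: Float -> Float -> Float -> Float
smoothstep lo hi x = t * t * (3 - 2 * t)
  where t = max 0 (min 1 ((x - lo) / (hi - lo)))

-- Anti-aliased coverage from a sampled distance d in the 0..1 range,
-- with the outline sitting on the 0.5 iso-line. The width aa should be
-- roughly half a screen pixel expressed in distance units, so the edge
-- stays sharp at any scale.
sdfAlpha :: Float -> Float -> Float
sdfAlpha aa d = smoothstep (0.5 - aa) (0.5 + aa) d

-- A different step function gives an outline effect: a band of
-- width w around the 0.5 iso-line.
sdfOutline :: Float -> Float -> Float -> Float
sdfOutline aa w d =
  smoothstep (0.5 - w - aa) (0.5 - w + aa) d
    * (1 - smoothstep (0.5 + w - aa) (0.5 + w + aa) d)
```

Moving or widening the band around the iso-value is all it takes to switch between these effects, which is where the cheapness comes from.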
Of course, all of this is irrelevant if we cannot afford to lose the sharp corners. There are some existing approaches to solve this problem, but only two come to mind as serious attempts: Loop and Blinn’s method (a low-poly mesh with extra information on the curved parts) and GLyphy (an SDF represented as a combination of arc spline approximations). Both of these solutions are a bit too heavy for my purposes, and they also happen to be patented, which makes them not so desirable to integrate into one’s own system.
The Valve paper also points towards a possible solution: corners can be reconstructed from a two-channel distance field, one for each incident edge. They claimed not to pursue that direction because they ‘like the rounded style of this text’ (yeah, right!). Interestingly, I haven’t been able to find any follow-up on this remark, so I set out to create a solution along these lines. The end result is part of the LambdaCube Font Engine, or Lafonten for short.
Doubling the channels
Valve went for the brute-force option when generating the distance fields: take a high-resolution binary image of the shape and for each pixel measure the distance to the nearest pixel of the opposite colour. I couldn’t use this method as a starting point, because in order to properly reconstruct the shapes of letters I really needed to work with the geometry. Thankfully, there’s a handy library for processing TrueType fonts called FontyFruity that can extract the Bézier control points for the character outlines (note: at the time of this writing, the version required by Lafonten is not available on Hackage yet).
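For reference, the brute-force generation scheme is simple enough to sketch in a few lines. This is purely illustrative and not part of Lafonten (which works on the outline geometry instead); the inside function stands for a hypothetical binary image lookup:

```haskell
-- Naive brute-force distance field generation over a binary image.
-- Quadratic in the search radius, so only usable offline.
bruteForceDistance
  :: (Int -> Int -> Bool)  -- high-resolution binary image lookup
  -> Int                   -- search radius in image pixels
  -> Int -> Int            -- position of the texel being generated
  -> Float
bruteForceDistance inside r x y = sign * nearest
  where
    here = inside x y
    sign = if here then 1 else -1
    -- distance to the nearest pixel of the opposite colour, clamped
    -- to the search radius when no such pixel is found
    nearest = minimum $ fromIntegral r :
      [ sqrt (fromIntegral (dx * dx + dy * dy))
      | dx <- [-r .. r], dy <- [-r .. r]
      , inside (x + dx) (y + dy) /= here ]
```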
The obvious first step is to take the outline of the character, and slice it up along the corners that need to be kept sharp. These curve sections need to be extruded separately and they need to alternate between the two channels. If there’s an odd number of sections, one segment – preferably the longest one to minimise artifacts – can gradually transition from one channel to the other:
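As an illustrative sketch (not the actual Lafonten code), the channel assignment could be expressed like this, with the odd case handled by a single blending section:

```haskell
-- Channel weights for each outline section: (start, end) weight of
-- channel 1, with channel 2 implicitly being its complement. An even
-- number of sections alternates cleanly; with an odd count one section
-- blends from channel 1 to channel 2 along its length. For simplicity
-- this sketch always blends the first section, whereas it should
-- preferably be the longest one to minimise artifacts.
assignChannels :: Int -> [(Float, Float)]
assignChannels n
  | even n    = take n sections
  | otherwise = (1, 0) : take (n - 1) sections
  where
    sections = cycle [(1, 1), (0, 0)]
```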
The extrusion is performed in both directions, so the real outline of the character is the median line of the extruded sections. Afterwards, the internal part needs to be filled somehow. It turns out that simply using the maximum value in both channels yields good results:
The shape used for filling is created from the inner edges of the extruded curve sections. The convex corners are derived as intersections between the consecutive curves, while the concave corners need to be covered more aggressively to make sure there are no dark holes within the contour (not to mention that those curves don’t necessarily intersect if the angle is sharp enough).
Another transformation that turned out to be useful to avoid some undesired interaction between unrelated curves is an extra cutting step applied at obtuse convex angles. For instance, the bottom left part of the 3 above actually looks like this:
It might be the case that this cutting step is redundant, as it was originally introduced while I was experimenting with a different, more brittle channel selection scheme.
Unfortunately, this two-channel distance field doesn’t contain enough information to reconstruct the original shape. The problem is that depending on whether a corner is convex or concave, we need to combine the two fields with a different function to get a good approximation: convex corners are reconstructed as the minimum of the two values, while concave ones require the maximum. When we render the above image into a low-resolution texture and sample it linearly, we cannot tell which case we’re looking at without additional data.
Quadrupling the channels
After some experimentation, I came to the conclusion that I needed two additional channels to serve as masks. The final formula to calculate the approximate distance is the following:
max(max(d1 * m1, d2 * m2), min(d1, d2)).
The values of d1 and d2 come from the channels shown above, while m1 and m2 depend on what colour section we are in. If we’re in a section approximated by d1, then m1 = 1 and m2 = 0. For d2, it’s the opposite. The values of m1 and m2 can be obtained by thresholding the two extra channels.
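Put into code, the reconstruction amounts to the following (a sketch of the per-sample logic; the 0.5 threshold is an illustrative choice):

```haskell
-- Reconstruct an approximate distance from the four sampled channels:
-- two distances, and two weights acting as masks after thresholding.
reconstruct :: Float -> Float -> Float -> Float -> Float
reconstruct d1 d2 w1 w2 = max (max (d1 * m1) (d2 * m2)) (min d1 d2)
  where
    threshold x = if x > 0.5 then 1 else 0
    m1 = threshold w1
    m2 = threshold w2
```

The reconstructed value can then be fed to the same kind of step function as an ordinary single-channel distance field.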
The new channels can be thought of as weights, and they explicitly mark which distance channel is relevant in a given position. This is what they look like for the same character:
The trickiest part is around the corners: we need to make sure that the weights kick in before the distance from the other channel disappears, but not too soon, otherwise the current channel would interfere with the other side of the corner. There’s a lot of hand tuning and magic constants involved, because I haven’t yet found the time to derive a proper formula in a principled manner, but it handles most situations well enough already. It’s also obvious that the weights are erroneously low in the concave corners, but somehow this bug never seems to lead to visible errors, so I haven’t fixed it yet.
One way to visualise the interaction of the four channels is thresholding them and adding the corresponding weights and distances:
The left hand side shows the result of thresholding and adding the channels. The right hand side is the same image, but the green channel is filled with the result of evaluating the reconstruction formula. This visualisation also shows how delicate the balance is around convex curvatures when generating the distance fields with this approach. Fortunately it does the trick!
One last step towards happiness
Unfortunately, this is not the end of the story. When the above geometry is rendered at a small resolution, we get some new artifacts partly from rasterisation, partly from linear interpolation in certain cases. The first problem is caused by the fact that the corner vertices of the fill geometry are not shared with the outlines, just derived with a formula, so there’s no guarantee that the rasteriser will fill all the pixels. The second issue can happen when inner edges of the opposite channels touch. When this happens, interpolation is performed between opposite values that both represent inner areas, but their average is so low that it’s interpreted as an outer area:
As it turns out, both issues can be resolved in a single post-processing pass that detects these specific situations and patches up the texture accordingly. All-maximum pixels are spread to adjacent all-minimum pixels, and directly facing opposite channels are also unified with the max function.
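One possible reading of that pass, sketched per texel on the CPU side (the predicates are simplified stand-ins for the real detection logic, and only the two distance channels are shown):

```haskell
-- Patch a texel based on its distance channels and those of its
-- 4-connected neighbours. Values are normalised to the 0..1 range.
patchTexel :: (Float, Float) -> [(Float, Float)] -> (Float, Float)
patchTexel px@(d1, d2) neighbours
  -- a rasterisation hole: an all-minimum pixel right next to an
  -- all-maximum one must itself be inside, so fill it
  | isAllMin px && any isAllMax neighbours = (1, 1)
  -- touching inner edges of opposite channels: unify with max so the
  -- interpolated value stays on the inner side
  | innerIn1 px && any innerIn2 neighbours = (m, m)
  | innerIn2 px && any innerIn1 neighbours = (m, m)
  | otherwise = px
  where
    m = max d1 d2
    isAllMin (a, b) = a <= 0 && b <= 0
    isAllMax (a, b) = a >= 1 && b >= 1
    innerIn1 (a, b) = a > 0.5 && b <= 0.5
    innerIn2 (a, b) = b > 0.5 && a <= 0.5
```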
The bottom line
So was the final outcome worth the trouble? I believe so. The resolution necessary to correctly represent the characters depends highly on the font. Fonts with small details (e.g. small serifs) obviously require a finer resolution to reproduce with high fidelity. For instance, Times New Roman needs about 128 pixels per em, which is probably too much for most purposes. On the other hand, Droid Sans is fine with just 64 pixels per em, and Droid Serif with 72. As an extreme example, Ubuntu Regular renders nicely from just 40 pixels per em. In any case, the improvement over a simple distance field of the same resolution is quite spectacular:
In the end, I achieved the original goal. The characters are added to the atlas dynamically on demand, and they can be rendered at arbitrary sizes without getting blurry or blocky in appearance. At the moment the baking step is not very optimised, so it’s not possible to generate many characters without introducing frame drops. But it’s fast enough to be used on a loading screen, for instance.
What next?
I’m surprised how well this method works in practice despite the ad hoc approach I took in generating the distance field. However, I’m missing the robustness of the simple distance field, which doesn’t break down in surprising ways no matter what the input shape is. One alternative I’d like to explore involves a very different way of filling the channels. Instead of rendering the geometry directly as described above, I’m considering the use of diffusion curves or generalised Voronoi diagrams. Since the resolution needed to reproduce individual characters is not terribly high, it could be okay to do the baking on the CPU with a dumb brute-force algorithm at first. This would also make it possible to generate characters in a background thread, which would be ideal for games.
Very nice rendering. Did you test other fonts as well? What is the “ad hoc approach” you’re talking about for generating the distance field? Do you have more information concerning this technique (reference implementation)?
Yes, I tested ten or so fonts. Generally speaking, the method works best in the absence of small details, which require too much resolution to be practical. Thin or light typefaces are also problematic, since you need at least four texels across the cross-section of a stroke to accurately describe its shape.
The ad hoc approach simply refers to what the post just described. I started from the trivial idea of extruding the outline sections and then spent a lot of time trying to fix various artifacts instead of looking into whether I can derive my constants in a more disciplined way. Now I’m playing around with a classification algorithm that’s based on generalised discrete Voronoi diagrams, and it seems to need a lot less hand tweaking.
The reference implementation is found in the Lafonten library. It’s in Haskell, so sorry if that’s a problem! The most important parts are the letterOutlineTriangles and letterFillTriangles functions, which define the geometry that’s directly rendered into the distance field.
Thanks for the quick answer. I will try to adapt your code to freetype-gl (https://github.com/rougier/freetype-gl) once I have some free time…
Another related technique worth looking at to cope with small sizes:
http://www.java-gaming.org/index.php?topic=33612.0
You made a few tweets last year saying the following:
“Currently working on a different scheme for generating the field. Basically solving the distance equation per texel for nearby curves; this seems to be a lot more robust!”
Did anything come out of that work?
So far it’s a partial success. In most cases it works better than rendering the meshes as described in this post, but sadly there are still too many corner cases where it breaks down badly for it to be truly usable. I think the core idea is solid, but I need to come up with another way of figuring out which outline sections are relevant at any chosen point.
Do you think using the following method would be better than using composite distance fields?
http://wdobbie.com/post/gpu-text-rendering-with-vector-textures/
I don’t know if they can be directly compared, because they make different trade-offs. Analytical methods like that require less memory, but they are slower to render.
Would a Closest Point Coordinate Field be of any help in solving the corner problem, as it has more info than just a distance field?
http://github.prideout.net/coordinate-fields/
I don’t see how it would help with the corners by itself. In any case, my latest attempt does build a CPCF as an intermediate step to optimise the lookup of nearest curves that get stored in the final texture.
Have you tried creating text effects like outlines/shadows/glow using this method? I am curious how good they look, since you are not doing an exact distance transform.
The usable range in the reconstructed distance field is pretty small around the outline, since getting closer to other features would lead to artifacts (whenever the closest feature switches to another one). You also need to consider that there’s a trade-off between the size of the supported range and numerical accuracy. In the end, it’s possible to use such effects, but they have to be quite limited in size.
Hello, great work with this distance field! I looked at it and was pretty amazed. Is there any way to generate an output BMP/PNG with the distance field font?
I’m not sure I understand your question. If you’re asking whether there’s a ready tool that will do this for you, the answer is no at the moment. I’m still experimenting with a different way of generating the four channels.
Hi, just wondering, are there any updates on this? I don’t see any sort of code or download for the tools, so I am just curious. Thanks!
Well, there’s the old Haskell implementation that I used when writing this post. My more recent attempt hasn’t yielded truly useful results so far, so it’s not released anywhere.
If anyone is looking for a follow-up, this article was cited in this great paper: https://dspace.cvut.cz/bitstream/handle/10467/62770/F8-DP-2015-Chlumsky-Viktor-thesis.pdf