TTF CMAP Cleanup

Spent a bit of time over the weekend cleaning up the CMAP loading code for the TTF loader in my DOS game library. This stuff isn't particularly glamorous, even from the perspective of font graphics. All it does right now is map characters to glyph indices. I haven't even started on the glyph loading code itself.

There's a hell of a lot in the TTF format, and I'm only implementing a subset of it, but I honestly wasn't expecting to run into this kind of complexity before even getting to the glyph data. The character-to-glyph mapping stuff is hard.

There are around ten different subtable formats for mapping characters to glyphs, and I've spent most of the time on this project (the font loader, not the DOS game overall) working on format 4. It stores a series of "segments", each covering a range of character codes. Each segment can either map to a contiguous range of glyphs (by adding a per-segment delta to the character code), OR it can point into an array of glyph IDs hanging off the end of the subtable. The offsets into that array aren't measured from the start of the table. They're measured from the file position of that segment's own entry in the offset array, and you then step forward by the character's position within the segment.
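To make the offset business concrete, here's a minimal sketch of what a format 4 lookup can look like in C. This is not the library's actual code: the names `read_u16_be` and `cmap4_lookup` are made up for the example, it assumes a C99 compiler with `<stdint.h>`, and it walks the segments linearly instead of using the binary search the header fields are meant for.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 16-bit value from a byte buffer. */
static uint16_t read_u16_be(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper: map a character code to a glyph index using a
 * format 4 cmap subtable held in memory exactly as it appears in the file.
 * Returns 0 (the "missing glyph") when the character is not mapped.
 */
uint16_t cmap4_lookup(const uint8_t *sub, size_t len, uint16_t ch)
{
    size_t seg_count, end_codes, start_codes, id_deltas, id_range_offsets;
    size_t i;

    if (len < 16 || read_u16_be(sub) != 4)
        return 0;

    seg_count        = read_u16_be(sub + 6) / 2;         /* segCountX2 / 2 */
    end_codes        = 14;                               /* after 7 header fields */
    start_codes      = end_codes + 2 * seg_count + 2;    /* skip reservedPad */
    id_deltas        = start_codes + 2 * seg_count;
    id_range_offsets = id_deltas + 2 * seg_count;
    if (id_range_offsets + 2 * seg_count > len)
        return 0;

    for (i = 0; i < seg_count; i++) {
        uint16_t end, start, delta, range_offset, glyph;
        size_t entry_pos, glyph_pos;

        end = read_u16_be(sub + end_codes + 2 * i);
        if (ch > end)
            continue;                     /* not in this segment, try the next */

        start = read_u16_be(sub + start_codes + 2 * i);
        if (ch < start)
            return 0;                     /* falls in a gap between segments */

        delta        = read_u16_be(sub + id_deltas + 2 * i);
        entry_pos    = id_range_offsets + 2 * i;
        range_offset = read_u16_be(sub + entry_pos);

        if (range_offset == 0)            /* contiguous range: add the delta */
            return (uint16_t)(ch + delta);

        /* Offset is relative to this segment's own idRangeOffset entry. */
        glyph_pos = entry_pos + range_offset + 2 * (size_t)(ch - start);
        if (glyph_pos + 2 > len)
            return 0;
        glyph = read_u16_be(sub + glyph_pos);
        return glyph ? (uint16_t)(glyph + delta) : 0;
    }
    return 0;
}
```

The delta arithmetic is modulo 65536, which is why the casts back to `uint16_t` matter: a segment starting at 'A' (0x41) with a delta of 0xFFC0 lands on glyph 1.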

I get the impression that TTFs were just meant to be loaded into memory as-is, with all these character mapping shenanigans happening in-memory, given the way these offsets work. Of course, that doesn't work out so great on little-endian machines, because everything in the file is stored big-endian. So no matter which way you slice it, there's going to be some data massaging happening before you can really use it on little-endian hardware.
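As a rough idea of what that massaging can look like, here's a small C sketch that decodes big-endian values byte by byte, so it behaves the same regardless of the host's byte order. The names `be32_decode` and `be16_decode_array` are invented for this example (they aren't from the library), and it again assumes `<stdint.h>` is available.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one big-endian 32-bit value, e.g. a table offset or length. */
static uint32_t be32_decode(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode `count` big-endian 16-bit values from `src` into host-order `dst`. */
static void be16_decode_array(const uint8_t *src, uint16_t *dst, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        dst[i] = (uint16_t)(((uint16_t)src[2 * i] << 8) | src[2 * i + 1]);
    }
}
```

Decoding whole arrays up front costs some extra memory but keeps lookups simple. Reading values on demand keeps the file image intact, which matters if you want the file-relative offsets to keep working.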

Anyway. Yikes. I didn't know font loading was going to be quite this silly. But I'm still having fun, so... onwards!

Posted: 2020-01-12

Tags: dos, devlog