For millions of legacy documents, correct rendering depends on resources such as fonts, which are generally not embedded within the document structure and may not be adequately controlled for during archival ingest procedures. Large document collections depend on thousands of unique fonts that are not available on a common desktop workstation, which typically has between 100 and 200 fonts installed. Silent font substitution, as performed by applications such as Microsoft Office, can yield poorly rendered documents and may result in significant information loss.
We use a large collection of 225,000 Word documents to assess the difficulty of matching font requirements with a database of fonts. We describe the identifying information contained in common font formats, font requirements stored in Word documents, the API provided by Windows to support font requests by applications, the documented substitution algorithms used by Windows when requested fonts are not available, and the ways in which support software might be used to control font substitution in a preservation environment.
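As a rough illustration of the matching problem described above, the following minimal sketch compares the font names requested by each document against a set of available font names and counts how many documents request each missing font. It assumes that font names have already been extracted from the documents and from the font database; the function names and the normalization rule (case folding and whitespace collapsing) are hypothetical choices for illustration, not the procedure used in the study.

```python
from collections import Counter


def normalize(name):
    """Fold case and collapse whitespace (an assumed, simplistic matching rule)."""
    return " ".join(name.split()).casefold()


def missing_fonts(doc_fonts, available):
    """Count, for each unavailable font name, the documents that request it.

    doc_fonts: mapping of document identifier -> list of requested font names
    available: iterable of font names present in the font database
    """
    known = {normalize(name) for name in available}
    missing = Counter()
    for fonts in doc_fonts.values():
        # Deduplicate per document so each document counts at most once per font.
        for name in {normalize(f) for f in fonts}:
            if name not in known:
                missing[name] += 1
    return missing
```

For example, given two documents that both request Garamond and a database containing only Arial and Times New Roman, `missing_fonts` reports Garamond as missing in two documents. Real font matching is harder than exact name comparison, since a single font can carry several names and versions.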