Introduction by OpenMethods Editor (Delphine Montoliu): This post recounts the historical process leading up to the digitisation of the great German dictionaries.
When a small team of digital humanists at Trier University started digitising the Grimm in 1998, they must have felt much as Jacob Grimm did some 150 years earlier: 300 million printed characters, carrying the weight of centuries, seemed to press down on their shoulders. A simple image digitisation appeared to be the easiest way of publishing the dictionary online. That, however, would only have made the dictionary visible and browsable. What about making the content accessible and searchable, and bringing the existing networks of information to life? Should the team try to convert the images into machine-encoded text via optical character recognition (OCR)? The results of several trials were utterly disappointing: the quality of the print was simply not good enough. Faced with more than 67,000 columns densely populated with some 300,000,000 poorly printed characters, some with diacritics and some drawn from different alphabets, the OCR software of the late 1990s was not up to the task. It produced so many errors that proofreading would have been extremely time-consuming and expensive.
Original publication date: 12/01/2016.