OpenMethods

HIGHLIGHTING DIGITAL HUMANITIES METHODS AND TOOLS

Posted on August 23, 2017 (updated November 9, 2017) by Delphine Montoliu

From Books to Bytes: Turning Paper Dictionaries into Digital Format

Permalink: https://openmethods.dariah.eu/2017/08/23/from-books-to-bytes-turning-paper-dictionaries-into-digital-format-digilex/
Original blog post: http://digilex.hypotheses.org/132

Introduction by OpenMethods Editor (Delphine Montoliu): This post traces the historical process that led up to the digitization of German dictionaries.

When a small team of digital humanists at Trier University started digitising the Grimm in 1998, they felt as Jacob Grimm must have felt about 150 years earlier: 300 million printed characters, carrying the weight of centuries, seemed to rest on their shoulders. Plain image digitisation appeared to be the easiest way of publishing the dictionary online. That, however, would only have made the dictionary visible and browsable. What about making the content accessible and searchable, and bringing its existing information networks to life? Should the team try to convert the images into machine-encoded text via optical character recognition (OCR)? The results of several trials were utterly disappointing: the quality of the print was not good enough. Faced with more than 67,000 columns densely populated with about 300,000,000 poorly printed characters, some with diacritics and some from different alphabets, the OCR software of the late 1990s was not up to the task. It produced so many mistakes that proofreading would have been extremely time-consuming and expensive.
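
To give a rough sense of why proofreading OCR output looked prohibitive, the following minimal Python sketch relates the two figures quoted above (about 300,000,000 characters in 67,000 columns) to character-level OCR accuracy. The accuracy values are purely hypothetical assumptions chosen for illustration: the post does not report the error rates observed in the Trier trials, and the function names are invented here.

```python
# Figures quoted in the post.
TOTAL_CHARACTERS = 300_000_000
TOTAL_COLUMNS = 67_000


def expected_errors(total_characters: int, accuracy: float) -> int:
    """Return the expected number of misrecognised characters.

    `accuracy` is a hypothetical character-level OCR accuracy in [0, 1];
    it is an assumption for illustration, not a value from the post.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return round(total_characters * (1.0 - accuracy))


def errors_per_column(total_characters: int, total_columns: int, accuracy: float) -> float:
    """Return the average number of expected errors per dictionary column."""
    return expected_errors(total_characters, accuracy) / total_columns


if __name__ == "__main__":
    # Hypothetical accuracy levels, not measurements from the project.
    for accuracy in (0.90, 0.99, 0.999):
        total = expected_errors(TOTAL_CHARACTERS, accuracy)
        per_column = errors_per_column(TOTAL_CHARACTERS, TOTAL_COLUMNS, accuracy)
        print(f"accuracy {accuracy:.1%}: {total:,} errors, about {per_column:.0f} per column")
```

Under these assumed rates, even 99% character accuracy would leave about three million characters to correct, roughly 45 per column, which illustrates how quickly manual correction costs add up at this scale.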

 

Original publication date: 12/01/2016.

Source: From Books to Bytes: Turning Paper Dictionaries into Digital Format – DigiLex

Posted in Analysis, Capture, Code, Collaboration, Conversion, Creation, Data, Data Recognition, Digital Humanities, Dissemination, Encoding, English, Images, Interaction, Language, Languages, Linked open data, Meta-Activities, Methods, POS-Tagging, Preservation, Preservation Metadata, Programming, Project Management, Projects, Publishing, Research Activities, Research Objects, Research Techniques, Scanning, Searching, Sequence Alignment, Software, Stilistic Analysis, Storage, Structural Analysis, Technology Preservation, Text, Tools, Transcription, Visualization, Writing
Tagged via bookmarklet

    © Copyright 2017-2018 – OpenMethods
    HaS has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 675570