OpenMethods

HIGHLIGHTING DIGITAL HUMANITIES METHODS AND TOOLS

Posted on August 23, 2017 (updated November 9, 2017) by Delphine Montoliu

From Books to Bytes: Turning Paper Dictionaries into Digital Format

OpenMethods post (2017-08-23 07:43:42): https://openmethods.dariah.eu/2017/08/23/from-books-to-bytes-turning-paper-dictionaries-into-digital-format-digilex/
Original blog post: http://digilex.hypotheses.org/132

Introduction by OpenMethods Editor (Delphine Montoliu): This post traces the history leading up to the digitization of German dictionaries.

When a small team of digital humanists at Trier University started digitising the Grimm in 1998, they felt as Jacob Grimm must have felt about 150 years earlier. 300 million printed characters, carrying the weight of centuries, seemed to press down on their shoulders. Digitising the pages as images appeared to be the easiest way of publishing the dictionary online. That, however, would only have made the dictionary visible and browsable. What about making the content accessible and searchable, and bringing the existing information networks to life? Should the team try to convert the images into machine-encoded text via optical character recognition (OCR)? The results of several trials were utterly disappointing: the quality of the print was not good enough. The OCR software of the late 1990s was not up to the task of handling more than 67,000 columns densely populated with about 300,000,000 poorly printed characters, some with diacritics and some from different alphabets. It produced so many mistakes that proofreading would have been extremely time-consuming and expensive.

 

Original publication date: 12/01/2016.

Source: From Books to Bytes: Turning Paper Dictionaries into Digital Format – DigiLex

Posted in Analysis, Capture, Code, Collaboration, Conversion, Creation, Data, Data Recognition, Digital Humanities, Dissemination, Encoding, English, Images, Interaction, Language, Languages, Linked open data, Meta-Activities, Methods, POS-Tagging, Preservation, Preservation Metadata, Programming, Project Management, Projects, Publishing, Research Activities, Research Objects, Research Techniques, Scanning, Searching, Sequence Alignment, Software, Stilistic Analysis, Storage, Structural Analysis, Technology Preservation, Text, Tools, Transcription, Visualization, Writing
Tagged via bookmarklet


OpenMethods © 2017-2018.
All site content, except where otherwise noted, is licensed under a CC BY license. This is in line with DARIAH’s Open Access Policy.
HaS received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 675570.