Attributing Authorship in the Noisy Digitized Correspondence of Jacob and Wilhelm Grimm | Digital Humanities

https://openmethods.dariah.eu/2018/06/11/attributing-authorship-in-the-noisy-digitized-correspondence-of-jacob-and-wilhelm-grimm-digital-humanities/
Posted: 2018-06-11 14:10:43
Original article: https://www.frontiersin.org/article/10.3389/fdigh.2018.00004/full
Tags: Blog post, Analysis, Capture, Cleanup, Content Analysis, Data, Data Recognition, Digital Humanities, Enrichment, German Literature, Machine Learning, Manuscript, Methods, Persons, Research Activities, Research Objects, Research Techniques, Scanning, Stylistic Analysis, Text, Text Bearing Objects, Transcription

Introduction by OpenMethods Editor (Joris van Zundert): Apart from its buoyant conclusion that authorship attribution methods are fairly robust to the noise (transcription errors) introduced by optical character recognition and handwritten text recognition, this article offers a comprehensive account of how sophisticated computational techniques can be applied for testing and validation in a data curation process.

This article presents the results of a multidisciplinary project aimed at better understanding how different digitization strategies affect computational text analysis. More specifically, it describes an effort to automatically discern the authorship of Jacob and Wilhelm Grimm in a body of uncorrected correspondence processed by HTR (Handwritten Text Recognition) and OCR (Optical Character Recognition), and reports on the effect this noise has on the analyses needed to computationally distinguish the writing styles of the two brothers.
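To make the idea of "attribution under transcription noise" concrete, here is a minimal sketch of one common stylometric technique, Burrows's Delta over the most frequent words, paired with a crude character-substitution model standing in for OCR/HTR errors. This is not the method or pipeline of the article; the function names (`tokenize`, `add_noise`, `burrows_delta`), the toy two-author corpus and the uniform noise model are all hypothetical assumptions chosen for illustration.

```python
# Sketch only: a generic Burrows's Delta plus synthetic noise,
# NOT the method used in the article discussed above.
import random
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase word tokens; a deliberately naive tokenizer."""
    return re.findall(r"\w+", text.lower())


def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in a token list."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]


def add_noise(text, rate, rng):
    """Replace each letter with a random lowercase letter with probability `rate`.

    Assumption: a crude stand-in for OCR/HTR errors; real recognition
    errors are far less uniform than this.
    """
    letters = "abcdefghijklmnopqrstuvwxyz"
    return "".join(
        rng.choice(letters) if ch.isalpha() and rng.random() < rate else ch
        for ch in text
    )


def burrows_delta(train, test_text, n_mfw=50):
    """Attribute `test_text` to the author in `train` with the lowest Delta.

    `train` maps author name -> list of training texts. Features are the
    `n_mfw` most frequent words of the pooled training corpus, z-scored
    with the mean and standard deviation across all training texts.
    """
    docs = [(author, tokenize(t)) for author, texts in train.items() for t in texts]
    pooled = Counter()
    for _, toks in docs:
        pooled.update(toks)
    vocab = [w for w, _ in pooled.most_common(n_mfw)]

    vectors = [(author, relative_freqs(toks, vocab)) for author, toks in docs]
    columns = list(zip(*(vec for _, vec in vectors)))
    mus = [mean(col) for col in columns]
    sds = [pstdev(col) or 1.0 for col in columns]  # avoid division by zero

    def zscore(vec):
        return [(x - m) / s for x, m, s in zip(vec, mus, sds)]

    # Author profile = centroid of that author's z-scored training texts.
    centroids = {}
    for author in train:
        zs = [zscore(vec) for a, vec in vectors if a == author]
        centroids[author] = [mean(col) for col in zip(*zs)]

    # Delta = mean absolute difference of z-scores (lower = more similar).
    test_z = zscore(relative_freqs(tokenize(test_text), vocab))
    scores = {
        author: mean(abs(c - t) for c, t in zip(centroid, test_z))
        for author, centroid in centroids.items()
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical toy data: two "authors" with different function-word habits.
    train = {
        "A": ["der und der und ich der und " * 20, "der der und ich und der und " * 20],
        "B": ["die sie die mit sie die mit " * 20, "sie die mit die sie mit die " * 20],
    }
    test = "der und ich der und der " * 20
    rng = random.Random(0)
    for rate in (0.0, 0.05, 0.1):
        best, scores = burrows_delta(train, add_noise(test, rate, rng))
        print(rate, best, {a: round(s, 3) for a, s in scores.items()})
```

Because Delta relies on the relative frequencies of very common words, moderate character noise mostly scales those frequencies down rather than reshuffling them, which is one intuitive reason such methods can tolerate uncorrected transcriptions. A real experiment would, of course, use genuine OCR/HTR output and far larger corpora.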

Source: Attributing Authorship in the Noisy Digitized Correspondence of Jacob and Wilhelm Grimm | Digital Humanities

Original date of publication: 05/04/2018

Author(s): listed in the source article.

Drs. Joris J. van Zundert (b. 1972) is a senior researcher and developer in humanities computing. He holds a research position in the Department of Literary Studies at the Huygens Institute for the History of the Netherlands, a research institute of the Royal Netherlands Academy of Arts and Sciences (KNAW). His main interests as a researcher and developer are the possibilities of computational algorithms for the analysis of literary and historical texts, and the nature and properties of humanities information and data modeling. His current research focuses on the interaction between computer science and the humanities, and on the tensions between hermeneutic and ‘big data’ approaches.