Towards Scientific Workflows and Computer Simulation as a Method in Digital Humanities – Digitale Bibliothek – Gesellschaft für Informatik e.V.

Introduction: The explore! project tests computer simulation and text mining on autobiographic texts as well as the reusability of the approach in literary studies. To facilitate the application of the proposed method in a broader context and to new research questions, the text analysis is performed by means of scientific workflows that allow for the documentation, automation, and modularization of the processing steps. By enabling the reuse of proven workflows, the project aims to enhance the efficiency of data analysis in similar projects and to further advance collaboration between computer scientists and digital humanists.

Not All Character N-grams Are Created Equal: A Study in Authorship Attribution – ACL Anthology

Introduction: Studying n-grams of characters is today a classical choice in authorship attribution. Although the optimal length of these n-grams has been discussed, we still have few clues about which specific types of n-grams are the most helpful in efficiently identifying the author of a text. This paper partly fills that gap by showing that most of the information gained from studying n-grams of characters comes from affixes and punctuation.
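
To make the distinction between types of character n-grams concrete, the following is a minimal sketch that counts n-grams in three simplified categories: prefix, suffix, and punctuation. The function name `typed_char_ngrams`, the category labels, and the sample sentence are illustrative assumptions, not the paper's exact scheme or data.

```python
import string
from collections import Counter

PUNCT = set(string.punctuation)

def typed_char_ngrams(text, n=3):
    """Count character n-grams by a simplified, assumed type scheme."""
    counts = Counter()
    # Affixes: the first and last n characters of each word longer than n.
    for token in text.split():
        word = token.strip(string.punctuation)
        if len(word) > n:
            counts[("prefix", word[:n])] += 1
            counts[("suffix", word[-n:])] += 1
    # Punctuation: any n-gram of the raw text that contains a punctuation mark.
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        if any(ch in PUNCT for ch in gram):
            counts[("punct", gram)] += 1
    return counts

# Hypothetical sample sentence, used only to show the call.
counts = typed_char_ngrams("walked, talked.")
for (kind, gram), freq in sorted(counts.items()):
    print(kind, repr(gram), freq)
```

Typed counts like these could then serve as feature vectors for a classifier, which would allow the contribution of each category to be compared separately.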

Attributing Authorship in the Noisy Digitized Correspondence of Jacob and Wilhelm Grimm | Digital Humanities

Introduction: Apart from its buoyant conclusion that authorship attribution methods are rather robust to noise (transcription errors) introduced by optical character recognition and handwritten text recognition, this article also offers a comprehensive read on the application of sophisticated computational techniques for testing and validation in a data curation process. 


Introduction: Know Your Implementation: Subgraphs in Literary Networks shows how the online tool ezlinavis can account for detached subgraphs in the network analysis of literary texts. As a case study, Goethe’s Faust, Part One (1808) was analyzed and visualized with ezlinavis, and average distances were calculated, yielding new results concerning Faust as protagonist.
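
Why detached subgraphs matter for average distances can be illustrated with a small sketch. It is not the ezlinavis implementation; the toy graph, the node labels, and the choice to average only over reachable nodes are assumptions made for illustration.

```python
from collections import deque

def bfs_distances(graph, start):
    """Shortest-path lengths from start to every node reachable from it."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def average_distance(graph, node):
    """Mean distance from node to the nodes it can reach (None if isolated)."""
    others = [d for n, d in bfs_distances(graph, node).items() if n != node]
    return sum(others) / len(others) if others else None

def components(graph):
    """Split the graph into its connected subgraphs."""
    seen, result = set(), []
    for node in graph:
        if node not in seen:
            comp = set(bfs_distances(graph, node))
            seen |= comp
            result.append(comp)
    return result

# Hypothetical undirected network with two detached subgraphs.
graph = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B"},
    "D": {"E"}, "E": {"D"},
}
print(len(components(graph)))        # number of detached subgraphs
print(average_distance(graph, "A"))  # averaged within A's subgraph only
```

An implementation that instead treated unreachable nodes as infinitely distant, or silently dropped them, would report different averages for the same network, so the handling of detached subgraphs has to be known before such values are interpreted.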