Introduction: Studying n-grams of characters is today a classical choice in authorship attribution. While the optimal length of these n-grams has been discussed, we still have few clues about which specific types of n-grams are most helpful in efficiently identifying the author of a text. This paper partly fills that gap by showing that most of the information gained from studying character n-grams comes from affixes and punctuation.
Introduction: This blog post not only presents a technique for measuring poetic meter and using it to plot distances between poets, but also provides insight into the theoretical and empirical process leading to those results.
Introduction: Apart from its buoyant conclusion that authorship attribution methods are rather robust to noise (transcription errors) introduced by optical character recognition and handwritten text recognition, this article also offers a comprehensive read on the application of sophisticated computational techniques for testing and validation in a data curation process.
Introduction: This article introduces a novel way to unfold and discover patterns in complex texts, at the intersection of macro and micro analytics. This technique, called TIC (Transcendental Information Cascades), allows analysis of how a cast of characters is generated and managed dynamically over the duration of a text.
Introduction: Concepts are described differently in different times, and the way people talk about them reveals much about how they perceive these concepts. Researchers at the eScience Center in Amsterdam, together with scholars from Utrecht University, developed a visual tool to gain insight into such concept shifts.
Introduction: The post discusses the challenges that the traditional philological approach faces in creating digital corpora of critical editions of nonvernacular medieval works.
Introduction: This paper explores some of the new toolchains offered by the Open Web Platform and alternatives to be considered in daily editing workflows.
Introduction: This software paper in Polish describes “Magik” (Magician), a tool for textual scholars which allows for comparisons of different variants of the same text.
Introduction: This software paper describes ‘stylo’ – an R package for stylometric research and text processing.
Introduction: This post highlights digital methods and standards for an efficient analysis of historical data.