Introduction: Ted Underwood tests a new language representation model called “Bidirectional Encoder Representations from Transformers” (BERT) and asks whether humanists should use it. Because it is difficult to use and has had only limited success so far (e.g. in genre detection), he concludes that the approach will be important in the future but is not something humanists need to deal with at the moment. An important caveat worth reading.
Category: Machine Learning
Introduction: Apart from its buoyant conclusion that authorship attribution methods are rather robust to noise (transcription errors) introduced by optical character recognition and handwritten text recognition, this article also offers a comprehensive read on the application of sophisticated computational techniques for testing and validation in a data curation process.
Introduction: Ecologists are much aided by historical sources of information on human-animal interaction. But how does one cope with the plethora of different descriptions for the same animal in the historical record? A Dutch research group reports on how to aggregate ‘Bunzings’, ‘Ullingen’, and ‘Eierdieven’ (‘Egg-thieves’) into a useful historical ecology knowledge base.
Introduction: This software paper describes ‘stylo’ – an R package for stylometric research and text processing.
Introduction: This article traces the complex genealogy of distant reading back to social-scientific approaches in literary studies.
Introduction: This post analyses text/image sequence alignment and the quality of manuscript transcriptions.
Introduction: This post outlines the retro-digitisation and academic analysis of paper-based documents.
Introduction: This is a conference report on musicology and encoding.
Introduction: This post outlines a conference on an experiment in oral data storage.