Introduction: In this article, Spanish scholars Pablo Ruiz Fabo and Helena Bermúdez Sabel present two case studies on applying Natural Language Processing (NLP) technologies, entity linking, and Computational Linguistics methods to create corpus navigation interfaces. The authors also discuss how these technologies for automatic text analysis can enrich scholarly digital editions. They offer interesting perspectives on analogue and digital editions and their relation to ecdotic practice.
Introduction: Ted Underwood tests a new language representation model called “Bidirectional Encoder Representations from Transformers” (BERT) and asks whether humanists should use it. Given its high degree of difficulty and its limited success (e.g. in questions of genre detection), he concludes that this approach will be important in the future but is not something humanists need to deal with at the moment. An important caveat worth reading.
Introduction: Studying character n-grams is now a classic choice in authorship attribution. Although the optimal length of these n-grams has been discussed, we still have few clues about which specific types of n-grams are most helpful for efficiently identifying the author of a text. This paper partly fills that gap by showing that most of the information gained from studying character n-grams comes from affixes and punctuation.
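
To make the distinction between n-gram types concrete, here is a minimal Python sketch that counts character n-grams and labels each one by where it falls in the text. The function names (`categorize`, `typed_char_ngrams`) and the category labels are hypothetical, chosen for illustration; this is a rough approximation and not the typology or the method used in the paper.

```python
import string
from collections import Counter


def categorize(text, i, n):
    """Return a rough, illustrative label for the n-gram starting at index i."""
    gram = text[i:i + n]
    # Any n-gram touching ASCII punctuation is labelled "punct".
    if any(ch in string.punctuation for ch in gram):
        return "punct"
    # N-grams spanning whitespace straddle a word boundary.
    if any(ch.isspace() for ch in gram):
        return "space"
    # Check whether the n-gram sits at the start and/or end of a word.
    starts = i == 0 or not text[i - 1].isalnum()
    ends = i + n == len(text) or not text[i + n].isalnum()
    if starts and ends:
        return "whole-word"
    if starts:
        return "prefix"
    if ends:
        return "suffix"
    return "mid-word"


def typed_char_ngrams(text, n=3):
    """Count (category, n-gram) pairs over all character windows of length n."""
    counts = Counter()
    for i in range(len(text) - n + 1):
        counts[(categorize(text, i, n), text[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    for (category, gram), freq in sorted(typed_char_ngrams("walking, talking").items()):
        print(f"{category:10} {gram!r:7} {freq}")
```

Under these assumptions, an attribution experiment could compare classifiers trained on only the "prefix", "suffix", and "punct" counts against one trained on all n-grams, which is the kind of comparison the summary above describes.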