Introduction: In this article, José Calvo Tello offers a methodological guide on data curation for creating literary corpus for quantitative analysis. This brief tutorial covers all stages of the curation and creation process and guides the reader towards practical cases from Hispanic literature. The author deals with every single step in the creation of a literary corpus for quantitative analysis: from digitization, metadata, automatic processes for cleaning and mining the texts, to licenses, publishing and achiving/long term preservation.
Introduction: The first steps into working with digital methods of text analysis are often made with beginner-friendly tools. The DARIAH-DE TopicsExplorer opens up the world of topic modeling with an easy-to-understand GUI, numerous operating options and high-quality results. The team of forText of the University of Hamburg developed a tutorial (Lerneinheit) to guide users step by step from installing the software to the first results with a sample corpus. The tutorial also contains screenshots, videos, exercises and explanations. This follows the didactic concept of forText.
Introduction: Named Entity Recognition (NER) is used to identify textual elements that gives things a name. In this study, four different NER tools are evaluated using a corpus of modern and classic fantasy or science fiction novels. Since NER tools have been created for the news domain, it is interesting to see how they perform in a totally different domain. The article comes with a very detailed methodological part and the accompanying dataset is also made available.
https://openmethods.dariah.eu/2019/06/28/prensa-digitalizada-herramientas-y-metodos-digitales-para-una-investigacion-a-escala/ OpenMethods introduction to: Prensa digitalizada: herramientas y métodos digitales para una investigación a escala 2019-06-28 20:52:53 Gimena Del Rio http://revistas.uned.es/index.php/RHD/article/view/22527 Blog post Editing Literature Publishing Research Objects Spanish Aprendizaje…
Introduction: The explore! project tests computer stimulation and text mining on autobiographic texts as well as the reusability of the approach in literary studies. To facilitate the application of the proposed method in broader context and to new research questions, the text analysis is performed by means of scientific workflows that allow for the documentation, automation, and modularization of the processing steps. By enabling the reuse of proven workflows, the goal of the project is to enhance the efficiency of data analysis in similar projects and further advance collaboration between computer scientists and digital humanists.
Introduction: Studying n-grams of characters is today a classical choice in authorship attribution. If some discussion about the optimal length of these n-grams have been made, we have still have few clues about which specific type of n-grams are the most helpful in the process of efficiently identifying the author of a text. This paper partly fills that gap, by showing that most of the information gained from studying n-grams of characters comes from the affixes and punctuation.
Introduction: This blog post not only presents a technique of measuring poetic meter and using it to plot distances between poets, but it also provides an insight into the theoretical and empirical process leading to those results.
Introduction: Apart from its buoyant conclusion that authorship attribution methods are rather robust to noise (transcription errors) introduced by optical character recognition and handwritten text recognition, this article also offers a comprehensive read on the application of sophisticated computational techniques for testing and validation in a data curation process.
Introduction: The rperseus package provides classicists and other people interested in ancient philology and exegesis with corpora of texts from the ancient world (based on the Perseus Digital Library), combined with a toolkit designed to compare passages and selected words with parallels where the same expressions or words occur.
Introduction: Know Your Implementation: Subgraphs in Literary Networks shows how the online tool ezlinavis can give account of detached subgraphs while working with network analysis of literary texts. For this specific case, Goethe’s Faust, Part One (1808) was analyzed and visualized with ezlinavis, and average distances were calculated giving some new results to this research in relation to Faust as protagonist.