Introduction: Named Entity Recognition (NER) is used to identify textual elements that give things a name. In this study, four different NER tools are evaluated on a corpus of modern and classic fantasy and science fiction novels. Since NER tools have been created for the news domain, it is interesting to see how they perform in a totally different domain. The article includes a very detailed methodological section, and the accompanying dataset is also made available.
Introduction: Apart from its optimistic conclusion that authorship attribution methods are rather robust to noise (transcription errors) introduced by optical character recognition and handwritten text recognition, this article also offers a comprehensive account of how sophisticated computational techniques can be applied to testing and validation in a data curation process.
Introduction: The rperseus package provides classicists and other people interested in ancient philology and exegesis with corpora of texts from the ancient world (based on the Perseus Digital Library), combined with a toolkit designed to compare passages and selected words with parallels where the same expressions or words occur.
Introduction: This blog post presents “TEI Simple”, a framework developed to ensure a simpler interaction between TEI and other formats, and to enable easier customization.
Introduction: Ecologists are much aided by historical sources of information on human-animal interaction. But how does one cope with the plethora of different descriptions for the same animal in the historical record? A Dutch research group reports on how to aggregate ‘Bunzings’, ‘Ullingen’, and ‘Eierdieven’ (‘Egg-thieves’) into a useful historical ecology knowledge base.
Introduction: This report (available in English, French, German, Polish and Spanish) summarizes the findings of a web-based survey conducted in 2014/2015 by the Digital Methods and Practices Observatory (DiMPO), a DARIAH working group.
Introduction: The article discusses how letters are being used across the disciplines, identifying similarities and differences in transcription, digitisation and annotation practices. It is based on a workshop held after the end of the project Digitising experiences of migration: the development of interconnected letters collections (DEM). The aims were to examine issues and challenges surrounding digitisation, build capacity relating to correspondence mark-up, and initiate the process of interconnecting resources to encourage cross-disciplinary research. Subsequent to the DEM project, TEI templates were developed for capturing information within and about migrant correspondence, and visualisation tools were trialled with metadata from a sample of letter collections. Additionally, as a demonstration of how the project’s outputs could be repurposed and expanded, the correspondence metadata that was collected for DEM was added to a more general correspondence project, Visual Correspondence.
Introduction: This post explains the benefits of using BEACON for data enrichment and increased visibility, using the example of the Bibliografie deutsch-jüdische Geschichte Nordrhein-Westfalen.
Introduction: The post discusses the challenges that the traditional philological approach faces in creating digital corpora of critical editions of nonvernacular medieval works.
Introduction: This paper describes a project that applies Linked Open Data (LOD) to traditional catalog metadata.