Introduction: Named Entity Recognition (NER) is used to identify textual elements that gives things a name. In this study, four different NER tools are evaluated using a corpus of modern and classic fantasy or science fiction novels. Since NER tools have been created for the news domain, it is interesting to see how they perform in a totally different domain. The article comes with a very detailed methodological part and the accompanying dataset is also made available.
Introduction: In this article, Nicolás Quiroga reflects on the fundamental place of the note-taking practice in the work of historians. The artcile also reviews some tools for classifying information -which do not substantially affect the note-taking activity – and suggests how the use of these tools can create new digital approaches for historians.
Introduction: There is a postulated level of anthropomorphism where people feel uncanny about the appearance of a robot. But what happens if digital facsimiles and online editions become nigh indistinguishable from the real, yet materially remaining so vastly different? How do we ethically provide access to the digital object without creating a blindspot and neglect for the real thing. A question that keeps digital librarian Dot Porter awake and which she ponders in this thoughtful contribution.
Introduction: The world of R consists of innumerous packages. Most of them have very little download rates because they are limited to certain functions as part of a larger argument. Based on a surprising experience with the small package clipr Matthew Lincoln shares his thoughts about this reception phenomenon especially in the digital humanities.
Introduction: The Research Software Directory of the Netherlands eScience Institute provides easy access to software, source code and its documentation. More importantly, it makes it easy to cite software, which is highly advisable when using software to derive research results. The Research Software Directory positions itself as a platform that eases scientific referencing and reproducibility of software based research—good peer praxis that is still underdeveloped in the humanities.
Introduction: This lesson by Marten Düring from the “Programming Historian-Website” gently introduces novices to the topic to Network Visualisation of Historical Sources. As a case study it covers not only the general advantages of network visualisation for humanists but also a step-by-step explanation of the process from extraction of the data until the visualization (using the Palladio-tool). This lesson has also been translated into Spanish and includes many useful references for further reading.
Introduction: Standards are best explained in real life use cases. The Parthenos Standardization Survival Kit is a collection of research use case scenarios illustrating best practices in Digital Humanities and Heritage research. It is designed to support researchers in selecting and using the appropriate standards for their particular disciplines and workflows. The latest addition to the SSK is a scenario for creating a born-digital dictionary in TEI.
Introduction: The explore! project tests computer stimulation and text mining on autobiographic texts as well as the reusability of the approach in literary studies. To facilitate the application of the proposed method in broader context and to new research questions, the text analysis is performed by means of scientific workflows that allow for the documentation, automation, and modularization of the processing steps. By enabling the reuse of proven workflows, the goal of the project is to enhance the efficiency of data analysis in similar projects and further advance collaboration between computer scientists and digital humanists.
Introduction: Studying n-grams of characters is today a classical choice in authorship attribution. If some discussion about the optimal length of these n-grams have been made, we have still have few clues about which specific type of n-grams are the most helpful in the process of efficiently identifying the author of a text. This paper partly fills that gap, by showing that most of the information gained from studying n-grams of characters comes from the affixes and punctuation.
Introduction: This article assesses the issue of personalisation in internet research, raising important issues of how should we interpret users’ choices and how to account for the potential platform-design influence in your research workflow.