Introduction by OpenMethods Editor: Sustainability questions, such as how to maintain digital project outputs after the funding period or how to keep the aging code and infrastructure that our research depends on up to date, are among the major challenges DH projects are facing today. This post gives us a sneak peek into the solutions and working practices of the Center for Digital Humanities at Princeton. In their approach to building capacity for sustaining DH projects and preserving access to data and software, they view projects as collaborative and process-based scholarship. Their focus is therefore on implementing project management workflows and documentation tools that can be flexibly applied to projects of different scopes and sizes and that allow for further refinement as needed. By sharing these resources together with their real-life use cases in DH projects, they aim to benefit other scholarly communities and sustain a broader conversation about these tricky issues.
Introduction: Natural Language Processing techniques applied to historical languages have been attracting increasing interest in the academic community. Many online resources, such as annotated corpora, are already available, as are various methodologies and tools. However, for digital philologists dealing with these languages, it is important to rely on a specific pipeline: a sequence of working steps and applications that makes effective text analysis possible.
This need is addressed by the Classical Language Toolkit, as illustrated by Patrick J. Burns in his contribution “Building a Text Analysis Pipeline for Classical Languages”.
Introduction: Given in French by Mathieu Jacomy, also known for his work on Gephi, this seminar presentation gives a substantial introduction to Hyphe, an open-source web crawler designed by a team at the Sciences Po Medialab in Paris. Specifically devised for researchers’ use, Hyphe helps collect and curate a corpus of web pages through an easy-to-use interface.
Introduction: The indispensable Programming Historian comes with an introduction to Term Frequency – Inverse Document Frequency (tf-idf) provided by Matthew J. Lavin. The procedure, which measures how specific a term is to a document, has its origins in information retrieval, but it can also be applied as an exploratory tool, as a way of finding textual similarity, or as a pre-processing step for machine learning. It is therefore useful not only for textual scholars, but also for historians working with large collections of text.
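To make the idea of term specificity concrete, here is a minimal sketch of tf-idf in plain Python. It assumes the common textbook weighting (relative term frequency multiplied by the natural logarithm of the number of documents divided by the number of documents containing the term), which is not necessarily the variant used in Lavin's lesson. The function name `tf_idf` and the three sample sentences are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document (textbook tf-idf, an assumption)."""
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            # Relative term frequency times inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy corpus.
docs = [
    "the whale swam past the ship",
    "the ship sailed north",
    "the captain watched the whale",
]
scores = tf_idf(docs)

# Print the terms of the second document, most specific first.
for term, score in sorted(scores[1].items(), key=lambda item: -item[1]):
    print(f"{term}: {score:.3f}")
```

A word that occurs in every document, such as "the" here, scores zero under this weighting, while words confined to a single document score highest. That ranking is what makes the measure useful for exploring what distinguishes one text from the rest of a collection.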
Introduction: Linked Data and Linked Open Data are attracting increasing interest and finding application in many fields. An experiment conducted in 2018 at Furman University illustrates and discusses, from a pedagogical perspective, some of the challenges posed by Linked Open Data applied to research in the historical domain.
“Linked Open Data to navigate the Past: using Peripleo in class” by Chiara Palladino describes the use of the search engine Peripleo to reconstruct the past of four archaeologically relevant cities. Many databases, comprising various types of information, were consulted, and the results, as highlighted in Palladino's contribution, show both the advantages and the limitations of a Linked Open Data-oriented approach to historical investigations.
Introduction: Ted Underwood tests a new language representation model called “Bidirectional Encoder Representations from Transformers” (BERT) and asks if humanists should use it. Due to its high degree of difficulty and its limited success (e.g. in questions of genre detection), he concludes that this approach will be important in the future but is not something humanists need to deal with at the moment. An important caveat worth reading.
Introduction: Digital humanists looking for tools to visualize and analyze texts can rely on ‘Voyant Tools’ (https://voyant-tools.org), a software package created by S. Sinclair and G. Rockwell. Online resources are available for learning how to use Voyant. In this post, we highlight two of them: “Using Voyant-Tools to Formulate Research Questions for Textual Data” by Filipa Calado (GC Digital Fellows) and the tutorial “Investigating texts with Voyant” by Miriam Posner.
Introduction: In this article, Alejandro Bia Platas and Ramón P. Ñeco García introduce TEIdown, an extension of the Markdown syntax for creating XML-TEI documents, along with its transformation programs. TEIdown helps editors validate TEI documents and find errors in them.
Introduction: Named Entity Recognition (NER) is used to identify textual elements that give things a name. In this study, four different NER tools are evaluated using a corpus of modern and classic fantasy or science fiction novels. Since NER tools have been created for the news domain, it is interesting to see how they perform in a totally different domain. The article comes with a very detailed methodological part, and the accompanying dataset is also made available.
Introduction: In this article, Nicolás Quiroga reflects on the fundamental place of note-taking in the work of historians. The article also reviews some tools for classifying information (which do not substantially affect the note-taking activity) and suggests how the use of these tools can create new digital approaches for historians.