Introduction: In our guidelines for nominating content, databases are explicitly excluded. However, this database is an exception. This is not because of the burning issue of COVID-19 but because of its exemplary variety of digital humanities methods with which the data can be processed. AVOBMAT makes it possible to process 51,000 articles with almost every conceivable approach (Topic Modeling, Network Analysis, N-gram viewer, KWIC analyses, gender analyses, lexical diversity metrics, and so on). It is thus much more than a simple database: it is a welcome stage for the Who's Who (or What's What?) of OpenMethods.
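The source does not describe how AVOBMAT implements its analyses. As a rough illustration of what two of the listed approaches compute, the following minimal Python sketch builds a keyword-in-context (KWIC) concordance and calculates the simplest lexical diversity metric, the type-token ratio. The tokenization rule, the function names and the sample sentence are hypothetical and chosen only for demonstration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens: list[str], keyword: str, window: int = 3) -> list[str]:
    """Return one keyword-in-context line per occurrence of `keyword`."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>30} [{token}] {right}")
    return lines


def type_token_ratio(tokens: list[str]) -> float:
    """Lexical diversity: number of distinct word forms divided by number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


text = ("The virus spread quickly. Researchers tracked the virus "
        "and published articles about the virus every week.")
tokens = tokenize(text)
for line in kwic(tokens, "virus", window=2):
    print(line)
print(round(type_token_ratio(tokens), 3))
```

The type-token ratio is sensitive to text length because longer texts repeat more words. For this reason, tools often offer several lexical diversity metrics rather than this one alone.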
Introduction: The RIDE journal (the Review Journal of the Institute for Documentology and Scholarly Editing) aims to address current misalignments between scholarly workflows and their evaluation, and it provides a forum for the critical evaluation of the methodology of digital edition projects. This time, we have been cherry-picking from its latest issue (Issue 11), which is dedicated to the evaluation and critical improvement of tools and environments.
Ediarum is a toolbox developed for editors by the TELOTA initiative at the BBAW in Berlin to generate and annotate TEI-XML data in German. In his review, Andreas Mertgens addresses methodology and implementation, use cases, deployment and learning curve, the tool's open-source status, its sustainability and extensibility, and user interaction and the GUI. He also provides, of course, a rich functional overview.
Introduction: In this article, Spanish scholars Pablo Ruiz Fabo and Helena Bermúdez Sabel present two case studies on the application of Natural Language Processing (NLP) technologies, entity linking, and Computational Linguistics methods to create corpus navigation interfaces. The authors also examine how these technologies for automatic text analysis can enrich scholarly digital editions. They offer interesting perspectives on analogue and digital editions and their relation to ecdotic practice.
Introduction: The article illustrates the application of 'discourse-driven topic modeling' (DDTM) to the analysis of ChronicItaly. This corpus comprises several Italian-language newspapers published in the USA during the period of mass migration to America, between the end of the 19th century and the first two decades of the 20th (1898-1920).
The method combines Topic Modelling (TM) and the discourse-historical approach (DHA). The aim is to obtain a more comprehensive representation of the ethnocultural and linguistic identity of Italian migrants in the historical American context. This is especially relevant in crucial periods such as the one immediately preceding the outbreak of World War I and the years of the war itself.
Introduction: In this blog post, James Harry Morris introduces the method of web scraping. Starting with the installation of the required packages, he explains step by step how readers can extract relevant data from websites using only the Python programming language and convert it into a plain text file. Each step is presented transparently and comprehensibly. This makes the article a prime example of OpenMethods, and it gives readers the tools they need to work with amounts of data far too large to handle manually.
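The packages used in Morris's tutorial are not named here. The following is only a minimal sketch of the same idea using nothing but Python's standard library: `urllib.request` downloads a page, `html.parser` strips the markup, and `pathlib` writes the result to a plain text file. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect the text of an HTML page, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside a <script> or <style> element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the page's text, one text fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def scrape_to_file(url: str, out_path: str) -> None:
    """Download a page and save its text as a UTF-8 plain text file."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    Path(out_path).write_text(html_to_text(html), encoding="utf-8")
```

A call such as `scrape_to_file("https://example.com", "page.txt")` would save the visible text of that page. Dedicated parsing libraries are usually more robust on messy real-world HTML than this hand-written parser.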
Stanford CoreNLP aims to be a complete Java-based set of tools for various aspects of language analysis, from annotation to dependency parsing and from lemmatization to coreference resolution. It thus provides a range of tools that can potentially be applied to languages other than English.
One of the languages to which Stanford CoreNLP has been applied is Italian. The Tint pipeline was developed for Italian, as described in the paper “Italy goes to Stanford: a collection of CoreNLP modules for Italian” by Alessio Palmero Aprosio and Giovanni Moretti.
The whole pipeline can be found and downloaded from the Tint webpage. It comprises tokenization and sentence splitting, morphological analysis and lemmatization, part-of-speech tagging, named-entity recognition and dependency parsing, as well as wrappers that are still under construction.
Introduction: The first steps in working with digital methods of text analysis are often taken with beginner-friendly tools. The DARIAH-DE TopicsExplorer opens up the world of topic modeling with an easy-to-understand GUI, numerous operating options and high-quality results. The forText team at the University of Hamburg developed a tutorial (Lerneinheit, i.e. 'learning unit'). It guides users step by step from installing the software to obtaining their first results with a sample corpus. The tutorial also contains screenshots, videos, exercises and explanations, in keeping with forText's didactic concept.
Introduction: This seminar presentation, given in French by Mathieu Jacomy, offers a substantial introduction to Hyphe. Jacomy is also known for his work on Gephi. Hyphe is an open-source web crawler designed by a team at the Sciences Po Medialab in Paris. Devised specifically for researchers, it helps them collect and curate a corpus of web pages through an easy-to-use interface.
Introduction: The indispensable Programming Historian offers an introduction to Term Frequency-Inverse Document Frequency (tf-idf) by Matthew J. Lavin. The procedure measures how specific a term is to a particular document within a collection. It originated in information retrieval but can also serve as an exploratory tool, as a way of finding textual similarity, or as a pre-processing step for machine learning. It is therefore useful not only for textual scholars but also for historians working with large collections of text.
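To make the idea concrete, here is a minimal Python sketch of the textbook form of the measure. A term's score in a document is its raw count multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. This is an illustrative assumption, not the code from Lavin's lesson. Libraries commonly use smoothed and normalized variants, so their numbers will differ. The toy documents are invented.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Score each term in each document as raw count x log(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur at least once?
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({term: count * math.log(n_docs / df[term])
                       for term, count in tf.items()})
    return scores


docs = [
    "the king sent a letter to the pope".split(),
    "the pope answered the letter".split(),
    "the harvest failed in the north".split(),
]
scores = tf_idf(docs)
for i, doc_scores in enumerate(scores):
    # Show the three most document-specific terms of each document.
    top = sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(i, top)
```

A term that occurs in every document, such as "the" here, receives a score of zero. Terms confined to a single document, such as "harvest", receive the highest weight per occurrence. This is the specificity the procedure is designed to capture.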
Introduction: Linked Data and Linked Open Data are attracting increasing interest and finding application in many fields. An experiment conducted at Furman University in 2018 illustrates some of the challenges of applying Linked Open Data to historical research and discusses them from a pedagogical perspective.
“Linked Open Data to navigate the Past: using Peripleo in class” by Chiara Palladino describes how the search engine Peripleo was used to reconstruct the past of four archaeologically relevant cities. Many databases containing various types of information were consulted. As Palladino highlights, the results show both the advantages and the limitations of a Linked Open Data-oriented approach to historical investigation.