Research COVID-19 with AVOBMAT

Introduction: In our guidelines for nominating content, databases are explicitly excluded. However, this database is an exception, not because of the urgency of COVID-19 but because of the exemplary variety of digital humanities methods with which its data can be processed. AVOBMAT makes it possible to process 51,000 articles with almost every conceivable approach (topic modeling, network analysis, an N-gram viewer, KWIC analyses, gender analyses, lexical diversity metrics, and so on) and is thus much more than a simple database. Rather, it is a welcome stage for the Who is Who (or What is What?) of OpenMethods.

Navegación de corpus a través de anotaciones lingüísticas automáticas obtenidas por Procesamiento del Lenguaje Natural: de anecdótico a ecdótico

Introduction: In this article, Spanish scholars Pablo Ruiz Fabo and Helena Bermúdez Sabel present two case studies on applying Natural Language Processing (NLP) technologies, entity linking, and computational linguistics methods to create corpus navigation interfaces. The authors also discuss how these technologies for automatic text analysis can enrich scholarly digital editions, and they offer interesting perspectives on analogue and digital editions and their relation to ecdotic practice.

Mining ethnicity: Discourse-driven topic modelling of immigrant discourses in the USA, 1898–1920

Introduction: The article illustrates the application of ‘discourse-driven topic modeling’ (DDTM) to the analysis of ChroniclItaly, a corpus of Italian-language newspapers published in the USA during the period of mass migration to America between the end of the 19th century and the first two decades of the 20th (1898-1920).

The method combines topic modelling (TM) and the discourse-historical approach (DHA) to obtain a more comprehensive representation of the ethnocultural and linguistic identity of Italian migrants in the historical American context at crucial moments, such as the period immediately preceding the outbreak of World War I and the course of the war itself.

Web Scraping with Python for Beginners | The Digital Orientalist

Introduction: In this blog post, James Harry Morris introduces the method of web scraping. Step by step, starting with the installation of the packages, he explains how readers can extract relevant data from websites using only the Python programming language and convert it into a plain text file. Each step is presented transparently and comprehensibly, so that this article is a prime example of OpenMethods and gives readers the equipment they need to work with amounts of data too large to handle manually.
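The extract-and-save workflow described above can be illustrated with a minimal sketch. It is not the code from the blog post: it uses only Python's standard library rather than whatever packages the post installs, and the names `TextExtractor`, `html_to_text`, `save_text` and the sample HTML are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping scripts and styles."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html):
    """Return the text content of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def save_text(text, path):
    """Write the extracted text to a plain text file (UTF-8)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


# Hypothetical sample page; for a live site the HTML string could be fetched
# first, e.g. with urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_HTML = """<html><head><title>Demo</title><style>p {color: red}</style></head>
<body><h1>Heading</h1><p>First paragraph.</p><script>var x = 1;</script></body></html>"""

if __name__ == "__main__":
    print(html_to_text(SAMPLE_HTML))
```

Calling `save_text(html_to_text(SAMPLE_HTML), "page.txt")` would write the extracted text to a file. Before scraping a real website, it is worth checking its terms of use and `robots.txt`.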

Pipelines for languages: not only Latin! The Italian NLP Tool (Tint)

Stanford CoreNLP aims to be a complete Java-based set of tools for various aspects of language analysis, from annotation to dependency parsing and from lemmatization to coreference resolution. It thus provides a range of tools that can potentially be applied to languages other than English.

Among the languages to which Stanford CoreNLP has been applied is Italian, for which the Tint pipeline has been developed, as described in the paper “Italy goes to Stanford: a collection of CoreNLP modules for Italian” by Alessio Palmero Aprosio and Giovanni Moretti.

On the Tint webpage the whole pipeline can be found and downloaded: it comprises tokenization and sentence splitting, morphological analysis and lemmatization, part-of-speech tagging, named-entity recognition and dependency parsing, including wrappers under construction.

Topic Modeling mit dem DARIAH Topics Explorer | forTEXT

Introduction: The first steps in working with digital methods of text analysis are often taken with beginner-friendly tools. The DARIAH-DE TopicsExplorer opens up the world of topic modeling with an easy-to-understand GUI, numerous configuration options and high-quality results. The forTEXT team at the University of Hamburg developed a tutorial (Lerneinheit) that guides users step by step from installing the software to obtaining first results with a sample corpus. In keeping with the didactic concept of forTEXT, the tutorial also contains screenshots, videos, exercises and explanations.

Exploring internet with Hyphe

Introduction: Given in French by Mathieu Jacomy, who is also known for his work on Gephi, this seminar presentation offers a substantial introduction to Hyphe, an open-source web crawler designed by a team at the Sciences Po Medialab in Paris. Devised specifically for researchers, Hyphe helps them collect and curate a corpus of web pages through an easy-to-use interface.

Analyzing Documents with TF-IDF | Programming Historian

Introduction: The indispensable Programming Historian offers an introduction to Term Frequency – Inverse Document Frequency (tf-idf) by Matthew J. Lavin. The procedure, which measures how specific a term is to a document within a collection, has its origins in information retrieval, but it can also be applied as an exploratory tool, as a way of finding textual similarity, or as a pre-processing step for machine learning. It is therefore useful not only for textual scholars but also for historians working with large collections of text.
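To make the idea of term specificity concrete, here is a minimal sketch of one common textbook variant of tf-idf (raw term count multiplied by the natural logarithm of the number of documents divided by the document frequency). This is an assumption for illustration: the lesson itself may use a different, smoothed weighting, and the function name `tf_idf` and the three sample sentences are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    Assumed variant: score = raw term count * ln(N / document frequency),
    with naive lowercasing and whitespace tokenization.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        scores.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return scores


if __name__ == "__main__":
    sample = ["the cat sat", "the dog barked", "the cat purred"]
    for doc, doc_scores in zip(sample, tf_idf(sample)):
        top_term = max(doc_scores, key=doc_scores.get)
        print(f"{doc!r}: most specific term is {top_term!r}")
```

In this variant, a word that occurs in every document (here "the") scores zero, while a word unique to a single document scores highest, which is the intuition behind using tf-idf to surface distinctive vocabulary.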

Approaching Linked Data

Introduction: Linked Data and Linked Open Data are attracting increasing interest and finding applications in many fields. An experiment conducted in 2018 at Furman University illustrates and discusses, from a pedagogical perspective, some of the challenges posed by applying Linked Open Data to historical research.

“Linked Open Data to navigate the Past: using Peripleo in class” by Chiara Palladino describes the use of the search engine Peripleo to reconstruct the past of four archaeologically relevant cities. Many databases comprising various types of information were consulted, and the results, as Palladino highlights, show both the advantages and the limitations of a Linked Open Data-oriented approach to historical investigation.