The First of May in German Literature

The First of May in German Literature

Introduction by OpenMethods Editor (Erzsébet Tóth-Czifra): Research on date extractions from literature brings us closer to answering big questions of “when literature takes place”.  As Frank Fischer’s blog post, First of May in German literature shows, beyond mere quantification, this line of research also yields insights on the cultural significance of certain dates. In this case, the significance of 1st of May in German literature (as reflected in the “Corpus of German-Language Fiction” dataset) was determined with the help of a freely accessible data set and the open access tool HeidelTime. The brief description of the workflow is a smart demonstration of the potential of open DH methods and data sharing in sustainable ways.

Bonus one: the post starts out from briefly touching upon some of Frank’s public humanities activities.

Bonus two: mention of the Tiwoli (“Today in World Literature”) app, a fun side product built on to pof the date extraction research.

What Counts as Culture? Part I: Sentiment Analysis of The Times Music Reviews, 1950-2009 – train in the distance

What Counts as Culture? Part I: Sentiment Analysis of The Times Music Reviews, 1950-2009 – train in the distance

Introduction: This blog post by Lucy Havens presents a sentiment analysis of over 2000 Times Music Reviews using freely available tools: defoe for building the corpus of reviews, VADER for sentiment analysis and Jupiter Notebooks to provide a rich documentation and to connect the different components of the analysis. The description of the workflow comes with tool and method criticism reflections, including an outlook how to improve and continue to get better and more results.

Programmable Corpora: Introducing DraCor, an Infrastructure for the Research on European Drama

Programmable Corpora: Introducing DraCor, an Infrastructure for the Research on European Drama

Introduction: The DraCor ecosystem encourages various approaches to the browsing and consultation of the data collected in the corpora, like those detailed in the Tools section: the Shiny DraCor app (https://shiny.dracor.org/), along with the SPARQL queries and the Easy Linavis interfaces (https://dracor.org/sparql and https://ezlinavis.dracor.org/ respectively). The project, thus, aims at creating a suitable digital environment for the development of an innovative way to approach literary corpora, potentially open to collaborations and interactions with other initiatives thanks to its ontology and Linked Open data-based nature.
[Click ‘Read more’ for the full post!]

Web Scraping with Python for Beginners | The Digital Orientalist

Web Scraping with Python for Beginners | The Digital Orientalist

Introduction: In this blog post, James Harry Morris introduces the method of web scraping. Step by step from the installation of the packages, readers are explained how they can extract relevant data from websites using only the Python programming language and convert it into a plain text file. Each step is presented transparently and comprehensibly, so that this article is a prime example of OpenMethods and gives readers the equipment they need to work with huge amounts of data that would no longer be possible manually.

Diseño de corpus literario para análisis cuantitativos

Diseño de corpus literario para análisis cuantitativos

Introduction: In this article, José Calvo Tello offers a methodological guide on data curation for creating literary corpus for quantitative analysis. This brief tutorial covers all stages of the curation and creation process and guides the reader towards practical cases from Hispanic literature. The author deals with every single step in the creation of a literary corpus for quantitative analysis: from digitization, metadata, automatic processes for cleaning and mining the texts, to licenses, publishing and achiving/long term preservation.

A World of Possibilities: a corpus-based approach to the diachrony of modality in Latin

A World of Possibilities: a corpus-based approach to the diachrony of modality in Latin

Introduction: Hosted at the University of Lausanne, “A world of possibilities. Modal pathways over an extra-long period of time: the diachrony in the Latin language” (WoPoss) is a project under development exploiting a corpus-based approach to the study and reconstruction of the diachrony of modality in Latin.
Following specific annotation guidelines applied to a set of various texts pertaining to the time span between 3rd century BCE and 7th century CE, the work team lead by Francesca Dell’Oro aims at analyzing the patterns of modality in the Latin language through a close consideration of lexical markers.

Pipelines for languages: not only Latin! The Italian NLP Tool (Tint)

Pipelines for languages: not only Latin! The Italian NLP Tool (Tint)

The StandforCore NLP wishes to represent a complete Java-based set of tools for various aspects of language analysis, from annotation to dependency parsing, from lemmatization
to coreference resolution. It thus provides a range of tools which
can be potentially applied to other languages apart from English.

Among the languages to which the StandfordCore NLP is mainly applied there is Italian, for which the Tint pipeline has been developed as described in the paper “Italy goes to Stanford: a collection of CoreNLP modules for Italian” by Alessio Palmero Apostolo and Giovanni Moretti.

On the Tint webpage the whole pipeline can be found and downloaded: it comprises tokenization and sentence splitting, morphological analysis and lemmatization, part-of-speech tagging, named-entity recognition and dependency parsing, including wrappers under construction. [Click ‘Read more’ for the whole post.]

Approaching Linked Data

Approaching Linked Data

Introduction: Linked Data and Linked Open Data are gaining an increasing interest and application in many fields. A recent experiment conducted in 2018 at Furman University illustrates and discusses some of the challenges from a pedagogical perspective posed by Linked Open Data applied to research in the historical domain.

“Linked Open Data to navigate the Past: using Peripleo in class” by Chiara Palladino describes the exploitation of the search-engine Peripleo in order to reconstruct the past of four archeologically-relevant cities. Many databases, comprising various types of information, have been consulted, and the results, as highlighted in the contribution by Palladino, show both advantages and limitations of a Linked Open Data-oriented approach to historical investigations.