GROBID: when data extraction becomes a suite

GROBID: when data extraction becomes a suite

Introduction: GROBID is an already well-known open source tool in the field of Digital Humanities, originally built to extract and parse bibliographical metadata from scholarly works. The acronym stands for GeneRation Of BIbliographic Data.
Shaped by use cases and adoptions to a range of different DH and non-DH settings, the tool has been progressively evolved into a suite of technical features currently applied to various fields, like that of journals, dictionaries and archives.
[Click ‘Read more’ for the full post!]

Research COVID-19 with AVOBMAT

Research COVID-19 with AVOBMAT

Introduction: In our guidelines for nominating content, databases are explicitly excluded. However, this database is an exception, which is not due to the burning issue of COVID-19, but to its exemplary variety of digital humanities methods with which the data can be processed.AVOBMAT makes it possible to process 51,000 articles with almost every conceivable approach (Topic Modeling, Network Analysis, N-gram viewer, KWIC analyses, gender analyses, lexical diversity metrics, and so on) and is thus much more than just a simple database – rather, it is a welcome stage for the Who is Who (or What is What?) of OpenMethods.

Diseño de corpus literario para análisis cuantitativos

Diseño de corpus literario para análisis cuantitativos

Introduction: In this article, José Calvo Tello offers a methodological guide on data curation for creating literary corpus for quantitative analysis. This brief tutorial covers all stages of the curation and creation process and guides the reader towards practical cases from Hispanic literature. The author deals with every single step in the creation of a literary corpus for quantitative analysis: from digitization, metadata, automatic processes for cleaning and mining the texts, to licenses, publishing and achiving/long term preservation.

Approaching Linked Data

Approaching Linked Data

Introduction: Linked Data and Linked Open Data are gaining an increasing interest and application in many fields. A recent experiment conducted in 2018 at Furman University illustrates and discusses some of the challenges from a pedagogical perspective posed by Linked Open Data applied to research in the historical domain.

“Linked Open Data to navigate the Past: using Peripleo in class” by Chiara Palladino describes the exploitation of the search-engine Peripleo in order to reconstruct the past of four archeologically-relevant cities. Many databases, comprising various types of information, have been consulted, and the results, as highlighted in the contribution by Palladino, show both advantages and limitations of a Linked Open Data-oriented approach to historical investigations.

Evaluating named entity recognition tools for extracting social networks from novels

Evaluating named entity recognition tools for extracting social networks from novels

Introduction: Named Entity Recognition (NER) is used to identify textual elements that gives things a name. In this study, four different NER tools are evaluated using a corpus of modern and classic fantasy or science fiction novels. Since NER tools have been created for the news domain, it is interesting to see how they perform in a totally different domain. The article comes with a very detailed methodological part and the accompanying dataset is also made available.

El archivo y la toma de notas. El lugar del software en la interpretación histórica

El archivo y la toma de notas. El lugar del software en la interpretación histórica

Introduction: In this article, Nicolás Quiroga reflects on the fundamental place of the note-taking practice in the work of historians. The artcile also reviews some tools for classifying information -which do not substantially affect the note-taking activity – and suggests how the use of these tools can create new digital approaches for historians.