GROBID: when data extraction becomes a suite

Introduction: GROBID is a well-known open source tool in the field of Digital Humanities, originally built to extract and parse bibliographical metadata from scholarly works. The acronym stands for GeneRation Of BIbliographic Data.
Shaped by use cases and adaptations to a range of DH and non-DH settings, the tool has progressively evolved into a suite of technical features that is currently applied to various fields, such as journals, dictionaries and archives.
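
GROBID is usually run as a web service. The sketch below is a minimal example under stated assumptions, not GROBID's own client: it assumes a local instance on the default port 8070 and the `/api/processCitation` endpoint taking a `citations` form field (check the GROBID service documentation for your version). It prepares, but does not send, a request, and it parses a TEI `<biblStruct>` of the kind GROBID returns. The sample TEI is hand-written and hypothetical, and the helper names are our own.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: a GROBID service on localhost at its usual default port 8070.
GROBID_URL = "http://localhost:8070/api/processCitation"

# Hypothetical, hand-written TEI in the shape of a GROBID citation result.
SAMPLE_TEI = """
<biblStruct xmlns="http://www.tei-c.org/ns/1.0">
  <analytic>
    <title level="a" type="main">An Example Article</title>
    <author>
      <persName><forename type="first">Jane</forename><surname>Doe</surname></persName>
    </author>
  </analytic>
  <monogr>
    <title level="j">Example Journal</title>
    <imprint><date type="published" when="2020"/></imprint>
  </monogr>
</biblStruct>
"""


def build_citation_request(raw_citation: str, url: str = GROBID_URL) -> urllib.request.Request:
    """Prepare (but do not send) a form-encoded POST for one raw reference string."""
    body = urllib.parse.urlencode({"citations": raw_citation}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Accept": "application/xml"}, method="POST"
    )


def _local(tag: str) -> str:
    """Drop a '{namespace}' prefix so TEI with or without xmlns is handled alike."""
    return tag.rsplit("}", 1)[-1]


def parse_bibl_struct(tei_xml: str) -> dict:
    """Extract the first title, all author names and the year from a <biblStruct>."""
    root = ET.fromstring(tei_xml.strip())
    record = {"title": None, "authors": [], "year": None}
    for el in root.iter():
        name = _local(el.tag)
        if name == "title" and record["title"] is None:
            # Document order puts the article title (analytic) before the journal title.
            record["title"] = (el.text or "").strip()
        elif name == "persName":
            forenames = [(c.text or "").strip() for c in el if _local(c.tag) == "forename"]
            surnames = [(c.text or "").strip() for c in el if _local(c.tag) == "surname"]
            surname = " ".join(surnames)
            record["authors"].append(
                f"{surname}, {' '.join(forenames)}" if forenames else surname
            )
        elif name == "date" and record["year"] is None and el.get("when"):
            record["year"] = el.get("when")[:4]
    return record


if __name__ == "__main__":
    req = build_citation_request("Doe, J. (2020). An Example Article. Example Journal.")
    # urllib.request.urlopen(req) would send this to a running GROBID instance.
    print(req.get_method(), req.full_url)
    print(parse_bibl_struct(SAMPLE_TEI))
```

Parsing the TEI namespace-agnostically keeps the helper robust whether or not the returned fragment declares the TEI namespace.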

On Digitizing the Materiality of Medieval Objects: A Report from the History of Science Workshop

Introduction: In this blog post, Michael Schonhardt explores and evaluates a range of freely available open source tools (Inkscape, Blender, Stellarium, SketchUp) that enable digital 3D modelling of medieval scholarly objects. These diverse tools offer easily implementable solutions both for analysing and for communicating the results of object-related cultural studies, and they are especially suitable for projects with small budgets.

When history meets technology. impresso: an innovative corpus-oriented perspective.

Historical newspapers, already available in many digitized collections, can be a significant source of information for reconstructing events and their backgrounds, enabling historians to cast new light on facts and phenomena and to advance new interpretations. Run jointly by partners in Lausanne, the University of Zurich and C2DH Luxembourg, the ‘impresso – Media Monitoring of the Past’ project aims to offer an advanced, corpus-oriented answer to the growing need to access and consult collections of digitized historical newspapers.
[…] Thanks to a suite of computational tools for data extraction, linking and exploration, impresso aims to move beyond the traditional keyword-based approach by applying advanced techniques, ranging from lexical processing to semantically deepened n-grams and from data modelling to interoperability.
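
As a toy contrast between keyword lookup and phrase-level (n-gram) access, and explicitly not impresso's own pipeline, the sketch below counts word n-grams across a few invented newspaper snippets. A bare keyword such as "league" matches every snippet, while the trigram "league of nations" singles out the relevant ones. The semantic enrichment the project describes is not reproduced here, and all function names are hypothetical.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"\w+")


def tokens(text: str) -> list[str]:
    """Lower-cased word tokens; real OCR text would need much more normalisation."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def ngrams(toks: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-token windows, in order."""
    return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]


def ngram_counts(docs: list[str], n: int) -> Counter:
    """Corpus-wide frequency of every n-gram."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(ngrams(tokens(doc), n))
    return counts


def keyword_hits(docs: list[str], keyword: str) -> list[int]:
    """Indices of documents containing the single keyword anywhere."""
    kw = keyword.lower()
    return [i for i, doc in enumerate(docs) if kw in tokens(doc)]


def phrase_hits(docs: list[str], phrase: str) -> list[int]:
    """Indices of documents containing the phrase as a contiguous n-gram."""
    target = tuple(tokens(phrase))
    return [i for i, doc in enumerate(docs) if target in ngrams(tokens(doc), len(target))]


# Invented snippets standing in for digitized newspaper text.
DOCS = [
    "The League of Nations met in Geneva.",
    "Geneva hosts the League of Nations again.",
    "A football league opens in Zurich.",
]

if __name__ == "__main__":
    print(keyword_hits(DOCS, "league"))             # every snippet matches
    print(phrase_hits(DOCS, "League of Nations"))   # only the first two
    print(ngram_counts(DOCS, 3)[("league", "of", "nations")])
```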

Research COVID-19 with AVOBMAT

Introduction: Our guidelines for nominating content explicitly exclude databases. This database is an exception, however, not because of the pressing topic of COVID-19 but because of the exemplary variety of digital humanities methods with which its data can be processed. AVOBMAT makes it possible to process 51,000 articles with almost every conceivable approach (topic modeling, network analysis, an n-gram viewer, KWIC analyses, gender analyses, lexical diversity metrics, and so on) and is thus much more than a simple database: it is a welcome stage for the Who's Who (or What's What?) of OpenMethods.
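
Two of the methods listed, KWIC (keyword in context) concordances and lexical diversity, are simple enough to sketch. The following is an illustration in plain Python, not AVOBMAT's implementation; it uses the type-token ratio as the most basic lexical diversity metric, and AVOBMAT may well offer others. The sample text is invented.

```python
import re

WORD_RE = re.compile(r"\w+")


def kwic(text: str, keyword: str, width: int = 3) -> list[tuple[str, str, str]]:
    """Return (left context, hit, right context) for every case-insensitive match."""
    words = WORD_RE.findall(text)
    target = keyword.lower()
    lines = []
    for i, word in enumerate(words):
        if word.lower() == target:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append((left, word, right))
    return lines


def type_token_ratio(text: str) -> float:
    """Distinct word forms divided by total words (0.0 for empty text)."""
    toks = [w.lower() for w in WORD_RE.findall(text)]
    return len(set(toks)) / len(toks) if toks else 0.0


# Invented sample text.
TEXT = ("The virus spread quickly. Researchers studied the virus in labs, "
        "and the virus mutated.")

if __name__ == "__main__":
    for left, hit, right in kwic(TEXT, "virus", width=2):
        print(f"{left:>15} [{hit}] {right}")
    print(round(type_token_ratio(TEXT), 3))
```

The raw type-token ratio falls as texts get longer, which is why corpus tools often complement it with length-corrected measures.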

Automatic annotation of incomplete and scattered bibliographical references in Digital Humanities papers

The reviewed article presents the BILBO project and illustrates how several machine-learning techniques can be applied to building suitable reference corpora and constructing efficient annotation models. In this way, it proposes solutions to the problem of extracting and processing useful information from bibliographic references in digital documents, regardless of their bibliographic style. It demonstrates the usefulness and high accuracy of Conditional Random Field (CRF) techniques, which involve finding the most effective set of features (of three types: input, local and global features) for a given corpus of well-structured bibliographical data (with labels such as surname, forename or title). Moreover, the approach is not only efficient on such traditional, well-structured data sets; it also makes an original contribution to processing more complicated, less structured references, such as those contained in footnotes, by applying support vector machines (SVM) with new features for sequence classification.

[Click ‘Read more’ for the full post.]
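
To make the three feature types more concrete, here is a minimal sketch of per-token feature extraction for one reference string, in the dictionary-per-token form that linear-chain CRF toolkits such as sklearn-crfsuite accept. The specific features (capitalisation, initials, a year pattern, a coarse position in the reference) are illustrative assumptions, not BILBO's published feature set, and no model is trained here.

```python
import re

# Initials like "J." first, then words, then single punctuation marks.
TOKEN_RE = re.compile(r"[A-Z]\.|\w+|[^\w\s]")


def token_features(tokens: list[str], i: int) -> dict:
    """Hypothetical feature dict for token i, grouped by the review's three types."""
    tok = tokens[i]
    n = len(tokens)
    return {
        # Input feature: the token itself (lower-cased).
        "word": tok.lower(),
        # Local features: the surface shape of the token.
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isalpha() and tok.isupper() and len(tok) > 1,
        "is_digit": tok.isdigit(),
        "is_year": bool(re.fullmatch(r"1[5-9]\d\d|20\d\d", tok)),
        "is_initial": bool(re.fullmatch(r"[A-Z]\.", tok)),
        "is_punct": bool(re.fullmatch(r"[^\w\s]+", tok)),
        # Global feature (assumed): coarse position inside the whole reference.
        "position": "start" if i == 0 else "end" if i == n - 1 else "in",
        # Neighbouring tokens give the CRF some left/right context.
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < n - 1 else "<EOS>",
    }


def reference_features(reference: str) -> list[dict]:
    """One feature dict per token: a single training or tagging sequence."""
    tokens = TOKEN_RE.findall(reference)
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for feats in reference_features("Doe, J. (2020). An Example Article."):
        print(feats["word"], feats["position"], feats["is_year"], feats["is_initial"])
```

Each sequence of dicts would be paired with a label sequence (e.g. surname, forename, title) for training; choosing which features to keep is exactly the feature-selection problem the article addresses.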