LoGaRT and RISE: Two multilingual tools from the Max Planck Institute for the History of Science

Introduction: This post introduces two tools developed by the Max Planck Institute for the History of Science, LoGaRT and RISE, with a focus on Asia and Eurasia. […] The concept behind LoGaRT, treating local gazetteers as “databases” in themselves, is an innovative and pertinent way to articulate the essence of the platform: it provides opportunities for multi-level analysis, from close reading of the sources (using, for example, the carousel mode) to a large-scale, “bird’s eye view” of the materials across geographical and temporal boundaries. Local gazetteers are predominantly textual sources, and LoGaRT’s key capabilities reflect this: data search (using Chinese characters), collection and analysis, as well as tagging and dataset comparison. That said, LoGaRT also offers integrated visualization tools and supports extending the collection and tagging features to the images found in a number of gazetteers. The ability to smoothly combine these visual and textual collections with Chinese historical maps (see CHMap) is an added, and very welcome, advantage of the tool, which helps to develop sophisticated and multifaceted analyses.

Collaborative Digital Projects in the Undergraduate Humanities Classroom: Case Studies with Timeline JS

OpenMethods introduction to: Collaborative Digital Projects in the Undergraduate Humanities Classroom: Case Studies with Timeline JS. Blog post, Marinella Testori, 2022-05-11 (https://openmethods.dariah.eu/2022/05/11/open-source-tool-allows-users-to-create-interactive-timelines-digital-humanities-at-a-state/). Categories: Creation, Data, Designing, Digital Humanities, English, Methods…

GitHub – CateAgostini/IIIF

Introduction: In this resource, Caterina Agostini, PhD in Italian from Rutgers University and Project Manager at The Center for Digital Humanities at Princeton, shares two handouts from workshops she organized and co-taught on the International Image Interoperability Framework (IIIF). They provide a gentle introduction to IIIF and a clear overview of its features (displaying, editing, annotating, sharing and comparing images according to universal standards), along with examples and resources. The handouts could be useful to anyone interested in the design and teaching of Open Educational Resources on IIIF.

BERT for Humanists: a deep learning language model meets DH

Introduction: Awarded Best Long Paper at the 2019 NAACL (North American Chapter of the Association for Computational Linguistics) Conference, the contribution by Jacob Devlin et al. presents “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” (https://aclanthology.org/N19-1423/).

As the authors highlight in the abstract, BERT is a “new language representation model”, and in the past few years it has become widespread in various NLP applications; for example, one project exploiting it is CamemBERT (https://camembert-model.fr/), which targets French.

In June 2021, a workshop organized by David Mimno, Melanie Walsh and Maria Antoniak (https://melaniewalsh.github.io/BERT-for-Humanists/workshop/) showed how to use BERT in digital humanities projects for tasks such as word similarity and classification, relying on the Python-based HuggingFace transformers library (https://melaniewalsh.github.io/BERT-for-Humanists/tutorials/). A further advantage of this training resource is that it has been written with its target audience in mind: it provides a gentle introduction to the complexities of language models for scholars whose education and background lie outside Computer Science.

Along with the Tutorials, the same blog includes introductions to BERT in general and to its use in a Google Colab notebook, as well as a constantly updated bibliography and a glossary of the main terms (‘attention’, ‘Fine-Tune’, ‘GPU’, ‘Label’, ‘Task’, ‘Transformers’, ‘Token’, ‘Type’, ‘Vector’).
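To make the word-similarity task mentioned above more concrete, here is a minimal sketch of the comparison step only. It assumes that each word occurrence has already been turned into a vector by a language model; the three 4-dimensional vectors below are made-up toy values, not BERT output, and the names `toy_vectors` and `rank_by_similarity` are hypothetical. The workshop tutorials themselves may proceed differently.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Assumption: invented toy vectors standing in for model embeddings.
# Real BERT vectors have hundreds of dimensions.
toy_vectors = {
    "river_bank": [0.9, 0.1, 0.0, 0.2],
    "shore": [0.8, 0.2, 0.1, 0.1],
    "savings_bank": [0.1, 0.9, 0.7, 0.0],
}


def rank_by_similarity(query, vectors):
    """Rank every other entry by cosine similarity to the query entry."""
    scores = [
        (name, cosine_similarity(vectors[query], vec))
        for name, vec in vectors.items()
        if name != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_similarity("river_bank", toy_vectors)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

With these toy values, "shore" ranks above "savings_bank" for the query "river_bank", which is the kind of context-sensitive distinction that contextual embeddings are meant to capture.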

The First of May in German Literature

Introduction by OpenMethods Editor (Erzsébet Tóth-Czifra): Research on date extraction from literature brings us closer to answering the big question of “when literature takes place”. As Frank Fischer’s blog post “First of May in German literature” shows, beyond mere quantification, this line of research also yields insights into the cultural significance of certain dates. In this case, the significance of the 1st of May in German literature (as reflected in the “Corpus of German-Language Fiction” dataset) was determined with the help of a freely accessible dataset and the open access tool HeidelTime. The brief description of the workflow is a smart demonstration of the potential of open DH methods and sustainable data sharing.

Bonus one: the post opens by briefly touching on some of Frank’s public humanities activities.

Bonus two: it mentions the Tiwoli (“Today in World Literature”) app, a fun side product built on top of the date extraction research.
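As a rough illustration of what counting mentions of a single date across a corpus involves, the sketch below uses a plain regular expression. This is a deliberately simplified stand-in and not how HeidelTime works: a temporal tagger handles far more surface forms and normalizes them. The pattern, the function name `count_first_of_may` and the two sample sentences are all assumptions made for the example.

```python
import re

# Assumption: only two surface forms are covered, "1. Mai" and
# "erste/ersten/erster/erstes/erstem Mai". The leading \b keeps
# "21. Mai" from matching; the trailing \b excludes "Mais".
FIRST_OF_MAY = re.compile(r"\b(?:1\.|[Ee]rste[nrms]?)\s+Mai\b")


def count_first_of_may(texts):
    """Map each text id to the number of 1st-of-May mentions found."""
    return {
        text_id: len(FIRST_OF_MAY.findall(text))
        for text_id, text in texts.items()
    }


# Invented sample sentences, not taken from the corpus used in the post.
sample_texts = {
    "a": "Am 1. Mai zogen sie hinaus. Der erste Mai war kalt.",
    "b": "Es war der 21. Mai, und der Mais stand hoch.",
}

counts = count_first_of_may(sample_texts)
print(counts)
```

A regex like this misses written-out variants and compounds such as "Maifeiertag", which is one reason a dedicated temporal tagger is the better choice for real corpora.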

What Counts as Culture? Part I: Sentiment Analysis of The Times Music Reviews, 1950-2009 – train in the distance

Introduction: This blog post by Lucy Havens presents a sentiment analysis of over 2000 Times Music Reviews using freely available tools: defoe for building the corpus of reviews, VADER for sentiment analysis and Jupyter Notebooks to provide rich documentation and to connect the different components of the analysis. The description of the workflow comes with critical reflections on tools and methods, including an outlook on how to improve and extend the analysis to obtain better and more results.
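For readers unfamiliar with lexicon-based sentiment analysis, the general idea can be sketched in a few lines. The example below is not VADER itself and not the workflow from the post: the word list and its valence values are invented for illustration, and the negation handling is a crude sign flip. Only the final normalization step is modelled on the form VADER uses for its compound score, which squashes the summed valence into the range -1 to 1.

```python
import math
import re

# Assumption: a tiny invented lexicon; VADER ships a much larger,
# human-rated one and applies many additional heuristics.
TOY_LEXICON = {
    "brilliant": 3.0,
    "superb": 3.0,
    "lively": 1.5,
    "dull": -2.0,
    "tedious": -2.5,
    "disappointing": -2.5,
}
NEGATIONS = {"not", "never", "hardly"}


def normalize(score, alpha=15.0):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def compound_score(text):
    """Sum toy valences over the tokens, flipping the sign after a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, token in enumerate(tokens):
        valence = TOY_LEXICON.get(token)
        if valence is None:
            continue
        if i > 0 and tokens[i - 1] in NEGATIONS:
            valence = -valence  # simplification: plain sign flip
        total += valence
    return normalize(total)


for review in ("A brilliant and lively performance",
               "A dull, tedious evening",
               "Not brilliant"):
    print(f"{review!r}: {compound_score(review):+.2f}")
```

A sketch like this also makes the method criticism in the post easier to follow: the result depends entirely on which words are in the lexicon and on how context such as negation is handled.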

Novels in distant reading: the European Literary Text Collection (ELTeC).

Introduction: Among the most recent, currently ongoing projects exploiting distant reading techniques is the European Literary Text Collection (ELTeC), one of the main elements of the Distant Reading for European Literary History project (COST Action CA16204, https://www.distant-reading.net/). Thanks to the contributions of four Working Groups (dealing respectively with Scholarly Resources, Methods and Tools, Literary Theory and History, and Dissemination: https://www.distant-reading.net/working-groups/), the project aims to provide at least 2,500 novels written in ten European languages, together with a range of Distant Reading computational tools and methodological strategies for approaching them from various perspectives (textual, stylistic, topical, and the like). A full description of the objectives of the Action and of ELTeC can be found in the Memorandum of Understanding for the implementation of the COST Action “Distant Reading for European Literary History” (DISTANT-READING) CA16204, available at https://e-services.cost.eu/files/domain_files/CA/Action_CA16204/mou/CA16204-e.pdf

The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models

Introduction: NLP models and the tasks they perform are becoming an integral part of our daily realities (everyday or research). A central concern of NLP research is that, for many of their users, these models still largely operate as black boxes, with limited insight into why a model makes certain predictions, how its usage is skewed towards certain content types, what the underlying social and cultural biases are, etc. The open source Language Interpretability Tool aims to change this for the better and brings transparency to the visualization and understanding of NLP models. The pre-print describing the tool comes with rich documentation (including case studies of different kinds) and gives us an honest SWOT analysis of it.

Programmable Corpora: Introducing DraCor, an Infrastructure for the Research on European Drama

Introduction: The DraCor ecosystem encourages various approaches to browsing and consulting the data collected in the corpora, such as those detailed in the Tools section: the Shiny DraCor app (https://shiny.dracor.org/), along with the SPARQL query and Easy Linavis interfaces (https://dracor.org/sparql and https://ezlinavis.dracor.org/ respectively). The project thus aims to create a suitable digital environment for developing an innovative way of approaching literary corpora, potentially open to collaboration and interaction with other initiatives thanks to its ontology and its Linked Open Data-based nature.