LoGaRT and RISE: Two multilingual tools from the Max Planck Institute for the History of Science

Introduction: This post introduces LoGaRT and RISE, two tools with a focus on Asia and Eurasia developed by the Max Planck Institute for the History of Science. […] The concept of LoGaRT – treating local gazetteers as “databases” in themselves – is an innovative and pertinent way to articulate the essence of the platform: providing opportunities for multi-level analysis, from the close reading of the sources (using, for example, the carousel mode) to the large-scale, “bird’s eye view” of the materials across geographical and temporal boundaries. Local gazetteers are predominantly textual sources, and this characteristic of the collection is reflected in the capabilities of LoGaRT: data search (using Chinese characters), collection and analysis, tagging, and dataset comparison. That said, LoGaRT also offers integrated visualization tools and extends its collection and tagging features to the images used in a number of gazetteers. The opportunity to smoothly intertwine these visual and textual collections with Chinese historical maps (see CHMap) is an added, and much welcome, advantage of the tool, which helps to develop sophisticated and multifaceted analyses.
[Click ‘Read more’ for the full post!]

Collaborative Digital Projects in the Undergraduate Humanities Classroom: Case Studies with Timeline JS

OpenMethods introduction to: Collaborative Digital Projects in the Undergraduate Humanities Classroom: Case Studies with Timeline JS, a blog post introduced by Marinella Testori (2022-05-11). https://openmethods.dariah.eu/2022/05/11/open-source-tool-allows-users-to-create-interactive-timelines-digital-humanities-at-a-state/

Getting started with OpenRefine – Digital Humanities 201

Introduction: OpenRefine, the freely accessible successor to Google Refine, is an ideal tool for cleaning up data series and thus obtaining more sustainable results. Entries can be browsed in alphabetical order or sorted by frequency, so that typing errors or slightly different variants can be easily found and adjusted. For example, with the help of the software, I discovered two such discrepancies in my Augustinian Correspondence Database, which I am now able to correct with one click in the programme. I was shown that I had noted “As a reference to Jerome’s letter it’s not counted” five times and “As a reference to Jerome’s letter, it’s not counted” three times. Consequently, if I searched the database for this expression, I would not see all the results. A second discrepancy was between the entry “continuing reference (marked by Nam)” and the entry “continuing reference (marked by nam)”. Thanks to OpenRefine, such errors can be avoided entirely in the future.
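The kind of near-duplicate detection described above can be sketched in a few lines of Python. This is only an illustrative analogue of key-collision clustering (similar in spirit to OpenRefine's default "fingerprint" method), not OpenRefine's actual implementation; the sample entries are the ones mentioned in the post.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value: str) -> str:
    """Key-collision fingerprint: normalize, strip punctuation,
    lowercase, and sort the unique tokens."""
    value = unicodedata.normalize("NFKD", value)
    value = re.sub(r"[^\w\s]", "", value.lower()).strip()
    return " ".join(sorted(set(value.split())))

def cluster(values):
    """Group entries whose fingerprints collide; return only the
    groups that contain more than one distinct spelling."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(set(g)) > 1]

entries = [
    "As a reference to Jerome's letter it's not counted",
    "As a reference to Jerome's letter, it's not counted",
    "continuing reference (marked by Nam)",
    "continuing reference (marked by nam)",
]
for group in cluster(entries):
    print(group)  # each group holds variants that should be merged
```

Both discrepancies from the post collide under this fingerprint, which is exactly why they surface as one-click merge candidates in the tool.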

The tutorial by Miriam Posner is a useful introduction to the software. However, the first step, the installation, is already out of date: version 3.1 was the latest when the tutorial was published, whereas the current release is 3.5.2. Under Windows, you can now choose between a version that requires Java and a version with embedded OpenJDK Java, which I found very pleasing.

If needed, there are links at the end of the tutorial to other introductions that go into more depth.

Annotation Guidelines for narrative levels, time features, and subjective narration styles in fiction (SANTA 2)

Introduction: If you are looking for ways to translate narratological concepts into annotation guidelines for tagging or marking up your texts for both qualitative and quantitative analysis, then Edward Kearns’s paper “Annotation Guidelines for narrative levels, time features, and subjective narration styles in fiction” is for you! The tag set is designed to be used in XML, but it can be flexibly adapted to other working environments too, including, for instance, CATMA. The use of the tags is illustrated on a corpus of modernist fiction.
The guidelines have been published in a special issue of The Journal of Cultural Analytics (vol. 6, issue 4) entirely devoted to the Systematic Analysis of Narrative levels Through Annotation (SANTA) project, which serves as the broader intellectual context for the guidelines. All articles in the special issue are open peer reviewed, open access, and available in both PDF and XML formats.
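To give a flavour of what working with such an XML tag set can look like downstream, here is a minimal Python sketch that counts nested narrative levels in a toy passage. The tag and attribute names (`level`, `depth`) are illustrative assumptions, not the actual tag set from the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A toy passage marked up in the spirit of the guidelines; the tag and
# attribute names here are invented for illustration.
snippet = """
<text>
  <level depth="1">She opened the letter.
    <level depth="2">Dear reader, it began, imagine a storm at sea.</level>
  </level>
</text>
"""

root = ET.fromstring(snippet)
# Tally how many narrative levels of each depth were annotated.
depths = Counter(lvl.get("depth") for lvl in root.iter("level"))
print(depths)
```

Once texts are marked up consistently, tallies like this feed directly into the quantitative side of the analysis the guidelines are designed to support.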
[Click ‘Read more’ for the full post!]

GitHub – CateAgostini/IIIF

Introduction: In this resource, Caterina Agostini, PhD in Italian from Rutgers University and Project Manager at The Center for Digital Humanities at Princeton, shares two handouts from workshops she organized and co-taught on the International Image Interoperability Framework (IIIF). They provide a gentle introduction to IIIF and a clear overview of its features (displaying, editing, annotating, sharing and comparing images along universal standards), as well as examples and resources. The handouts could be of interest to anyone involved in the design and teaching of Open Educational Resources on IIIF.
[Click ‘Read more’ for the full post!]

Topic-specific corpus building: A step towards a representative newspaper corpus on the topic of return migration using text mining methods – Journal of Digital History

Introduction: In this post, we highlight a new publication venue for digital historians, the Journal of Digital History, where digital scholarship is presented in three layers: narrative, epistemic, and data. These publications are therefore complex digital scholarly outputs that open a bigger window onto DH research and enable readers to follow the whole research process and to execute, or eventually even reproduce, certain steps. We showcase this innovative publication method through a methodology paper from the first issue, Sarah Oberbichler and Eva Pfanzelter’s “Topic-specific corpus building: A step towards a representative newspaper corpus on the topic of return migration using text mining methods”.

[Click ‘Read more’ for the full post!]

Find research data repositories for the humanities – the data deposit recommendation service

Introduction: Finding suitable research data repositories that best match the technical or legal requirements of your research data is not always an easy task. This paper, authored by Stephan Buddenbohm, Maaike de Jong, Jean-Luc Minel and Yoann Moranville, showcases the demonstrator instance of the Data Deposit Recommendation Service (DDRS), an application built on top of the re3data database specifically for scholars working in the Humanities domain. The paper also highlights further directions for developing the tool, many of which implicitly bring sustainability issues to the table.

BERT for Humanists: a deep learning language model meets DH

Introduction: Awarded Best Long Paper at the 2019 NAACL (North American Chapter of the Association for Computational Linguistics) Conference, the contribution by Jacob Devlin et al. introduces “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” (https://aclanthology.org/N19-1423/).

As the authors highlight in the abstract, BERT is a “new language representation model”, and in the past few years it has become widespread in various NLP applications; for example, CamemBERT (https://camembert-model.fr/) is a project that applies it to French.

In June 2021, a workshop organized by David Mimno, Melanie Walsh and Maria Antoniak (https://melaniewalsh.github.io/BERT-for-Humanists/workshop/) showed how to use BERT in digital humanities projects to deal with word similarity and classification, relying on the Python-based Hugging Face Transformers library (https://melaniewalsh.github.io/BERT-for-Humanists/tutorials/). A further advantage of this training resource is that it has been written with its target audience in mind: it provides a gentle introduction to the complexities of language models for scholars with an education and background other than Computer Science.
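A core ingredient of the word-similarity experiments mentioned above is comparing embedding vectors with cosine similarity. The sketch below uses toy stand-in vectors; in a real workflow, the vectors would be contextual token embeddings taken from a BERT model via the Hugging Face Transformers library.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 4-dimensional stand-ins for contextual embeddings of the word
# "bank" in two different sentences, plus "shore" for comparison.
# (Real BERT embeddings have hundreds of dimensions.)
bank_river = [0.9, 0.1, 0.3, 0.0]
bank_money = [0.1, 0.8, 0.0, 0.4]
shore = [0.8, 0.2, 0.4, 0.1]

print(cosine_similarity(bank_river, shore))  # high: similar contexts
print(cosine_similarity(bank_money, shore))  # lower: different contexts
```

Because BERT's embeddings are contextual, the same surface word gets different vectors in different sentences, which is what makes comparisons like this possible in the first place.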

Along with the tutorials, the same blog includes introductions to BERT in general and to its specific usage in a Google Colab notebook, as well as a constantly updated bibliography and a glossary of the main terms (‘attention’, ‘Fine-Tune’, ‘GPU’, ‘Label’, ‘Task’, ‘Transformers’, ‘Token’, ‘Type’, ‘Vector’).

Voluntad y deseo en la filosofía moderna: un acercamiento computacional

Introduction: In this study, Cebral Loureda analyzes how will and desire are conveyed in Spinoza’s Ethics, Hegel’s Phenomenology of Spirit, Schopenhauer’s The World as Will and Representation, and Nietzsche’s Thus Spoke Zarathustra. With the idea of determining these texts’ degree of cohesion, the author follows a computational, quantitative methodology to compare and contrast them, as well as to assess their internal contradictions. A normalized corpus, statistics and visualizations are employed to evaluate the terminology, topoi and sentimentality of these works. Regarding terminology, the author’s findings reveal that Nietzsche uses a vocabulary highly differentiated from that of the other philosophers, adding marked emotional connotations to his discourse. Visualizations show the terminological commonalities between Hegel and Schopenhauer and shed light on the former bearing the highest number of semantic connections with the other philosophers. As for topoi, results show a clear dichotomic tension between conceptual and vital experience in the studied documents. Redefining this dualism, however, Cebral Loureda observes that the concrete is always intertwined with the abstract and vice versa. Regarding the sentimental dimension of these works, the examination reveals that Nietzsche’s carries the greatest negative sentimental load, while Spinoza’s is the most emotionally balanced. With all this, Cebral Loureda shows that there is a high degree of cohesion among these philosophical works, which link reason and emotion to will, time and spirit, core notions of modern philosophy and society.
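One simple way to quantify the kind of terminological commonality the study visualizes is vocabulary overlap. The sketch below computes a Jaccard index over tiny invented snippets standing in for the normalized corpora; it is an illustrative analogue of comparing vocabularies, not the author's actual method.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard index of the two texts' vocabularies:
    |intersection| / |union| of their word sets."""
    va, vb = set(a.lower().split()), set(b.lower().split())
    return len(va & vb) / len(va | vb)

# Toy snippets standing in for the normalized corpora of each work.
hegel = "spirit will negation of the will and spirit"
schopenhauer = "the world as will and representation of the will"
nietzsche = "thus spoke zarathustra of eternal noon and laughter"

print(jaccard(hegel, schopenhauer))  # higher: shared terminology
print(jaccard(hegel, nietzsche))     # lower: differentiated vocabulary
```

Even on these toy snippets, the Hegel-Schopenhauer overlap exceeds the Hegel-Nietzsche overlap, mirroring the study's finding that Nietzsche's vocabulary is the most differentiated.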

The First of May in German Literature

Introduction by OpenMethods Editor (Erzsébet Tóth-Czifra): Research on date extraction from literature brings us closer to answering the big question of “when literature takes place”. As Frank Fischer’s blog post The First of May in German Literature shows, beyond mere quantification, this line of research also yields insights into the cultural significance of certain dates. In this case, the significance of the First of May in German literature (as reflected in the “Corpus of German-Language Fiction” dataset) was determined with the help of a freely accessible dataset and the open access tool HeidelTime. The brief description of the workflow is a smart demonstration of the potential of open DH methods and data sharing in sustainable ways.
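HeidelTime emits TimeML markup, in which temporal expressions are wrapped in TIMEX3 tags carrying normalized date values. The Python sketch below shows how such output could be tallied by month and day; the miniature annotated snippet is invented for illustration, not taken from the actual corpus.

```python
import re
from collections import Counter

# A hypothetical fragment of HeidelTime's TimeML output; in the real
# study the tool annotates the full Corpus of German-Language Fiction.
timeml = """
Es war der <TIMEX3 tid="t1" type="DATE" value="XXXX-05-01">erste Mai</TIMEX3>.
Am <TIMEX3 tid="t2" type="DATE" value="1890-05-01">1. Mai 1890</TIMEX3> begann es.
Der <TIMEX3 tid="t3" type="DATE" value="XXXX-12-24">Heilige Abend</TIMEX3> kam.
"""

# Pull out every normalized DATE value and count month-day pairs,
# so that "XXXX-05-01" and "1890-05-01" both count toward May 1st.
values = re.findall(r'<TIMEX3[^>]*type="DATE"[^>]*value="([^"]+)"', timeml)
month_day = Counter(v[-5:] for v in values if re.search(r"\d\d-\d\d$", v))
print(month_day.most_common())  # [('05-01', 2), ('12-24', 1)]
```

Aggregating such counts over a whole corpus is what makes it possible to say which calendar dates literature favours, and the First of May turns out to be one of them.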

Bonus one: the post starts out by briefly touching upon some of Frank’s public humanities activities.

Bonus two: a mention of the Tiwoli (“Today in World Literature”) app, a fun side product built on top of the date extraction research.