“Creating specialized corpora from digitized historical newspaper archives: An iterative bootstrapping approach”


Most scholars in the digital humanities and social sciences have probably faced the challenge of consulting large digital newspaper archives to extract detailed information about a topic. Computationally oriented methods and tools can undoubtedly help; however, applying them can be difficult, especially when dealing with large collections of items.

“Multilingual Research Projects: Non-Latin Script Challenges for Making Use of Standards, Authority Files, and Character Recognition”.

All of us are accustomed to reading academic contributions written in the Latin alphabet, for which standard characters and formats already exist. But what about texts written in languages that use different, ideograph-based writing systems (for example, Chinese and Japanese)? Which recognition techniques and metadata are needed to represent them in a digital context?

“Document Enrichment as a Tool for Automated Interview Coding”


Following our last post focusing on Critical Discourse Analysis, today we highlight a document enrichment pipeline for automated interview coding, proposed by Ajda Pretnar Žagar, Nikola Đukić, and Rajko Muršič in their paper presented at the Conference on Language Technologies & Digital Humanities, Ljubljana 2022. As described in the “Essential Guide to Coding Qualitative Data” (https://delvetool.com/guide), ethnography is one of the main fields of application of such a procedure, though not the only one.

Qualitative data coding enriches texts by adding labels and descriptions to specific passages, which are generally pinpointed by means of computer-assisted qualitative data analysis software (CAQDAS). This holds for several fields of application, from the humanities to biology, from sociology to medicine.
In their paper, Pretnar Žagar, Đukić and Muršič illustrate how relying on two taxonomies (or ontologies) already known in anthropological studies can help to automate and speed up the process of data labelling. These taxonomies are the Outline of Cultural Materials (OCM) and the ETSEO (acronym for Ethnological Topography of Slovenian Ethnic Territory) systematics. Both were developed and applied in ethnographic research in order to organize and better analyze concepts and categories related to human cultures and traditions.
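To make the idea of taxonomy-driven labelling more concrete, here is a minimal Python sketch. It is not the authors' pipeline: it simply assumes that each taxonomy category comes with a list of keywords and labels a passage with every category whose keywords occur in it. The category names, keywords and sample sentences are invented for illustration and are not real OCM or ETSEO entries.

```python
import re
from collections import defaultdict

# Hypothetical mini-taxonomy: labels and keywords are invented for this
# illustration and are NOT actual OCM or ETSEO entries.
TAXONOMY = {
    "food": {"bread", "harvest", "meal", "kitchen"},
    "kinship": {"mother", "father", "grandmother", "family"},
    "festivals": {"carnival", "wedding", "procession"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def label_passage(passage, taxonomy=TAXONOMY, min_hits=1):
    """Return (label, hit_count) pairs for categories whose keywords occur
    in the passage, sorted by descending hit count, then by label."""
    hits = defaultdict(int)
    for token in tokenize(passage):
        for label, keywords in taxonomy.items():
            if token in keywords:
                hits[label] += 1
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return [(label, count) for label, count in ranked if count >= min_hits]


# Invented interview passages.
interview = [
    "My grandmother baked bread for the whole family after the harvest.",
    "The carnival procession went past our house every year.",
    "We talked about the weather.",
]

for passage in interview:
    print(label_passage(passage), "<-", passage)
```

Plain keyword matching is only the simplest possible stand-in; a real pipeline would likely need lemmatization and some handling of ambiguous terms before its labels could be trusted.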

[Click ‘Read more’ for the full post!]


SPARQL for music: when melodies meet ontology


Introduction: Developed in the context of the EU H2020 Polifonia project, the investigation deals with the potential of SPARQL Anything to extract musical features, at both the metadata and the symbolic level, from MusicXML files. The paper describes the procedure that has been applied, starting from an overview of the application of ontologies to music and of the so-called ‘façade-based’ approach to knowledge graphs, which is at the core of the SPARQL Anything software. It then illustrates the steps involved (i.e., melody extraction, N-grams extraction, N-grams analysis and exploitation of the Music Notation Ontology). Finally, it offers some considerations on the results of the experiment in terms of query performance. In conclusion, the authors highlight how further studies in the field may shed more light on the application of semantic-oriented methods and techniques to computational musicology.
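For readers unfamiliar with melodic N-grams, the following short Python sketch shows the general idea. It is an illustration only: the paper works with SPARQL Anything queries over MusicXML, whereas this example uses plain Python on an invented list of MIDI pitch numbers, and it assumes that N-grams are built over pitch intervals (one common choice, not necessarily the authors').

```python
from collections import Counter


def intervals(pitches):
    """Turn a pitch sequence into the sequence of steps between neighbours."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def ngrams(seq, n):
    """Return all contiguous subsequences of length n as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


# Hypothetical melody given as MIDI pitch numbers (60 = middle C).
melody = [60, 62, 64, 60, 62, 64, 65, 67]

steps = intervals(melody)            # semitone steps between notes
counts = Counter(ngrams(steps, 2))   # frequency of each interval bigram

for gram, freq in counts.most_common():
    print(gram, freq)
```

Working on intervals rather than absolute pitches means that a pattern is counted as the same N-gram even when it is transposed to another key.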
[Click ‘Read more’ for the full post!]

What is PixPlot? (DH Tools) – YouTube


Introduction: This short video teaser summarizes the main characteristics of PixPlot, a Python-based tool for clustering images and analyzing them from a numerical perspective, as well as its pedagogical relevance where machine learning is concerned.

The paper “Visual Patterns Discovery in Large Databases of Paintings”, presented at the Digital Humanities 2016 Conference held in Poland, can be considered the foundational text for the development of the PixPlot Project at Yale University.
[Click ‘Read more’ for the full post!]

Collaborative Digital Projects in the Undergraduate Humanities Classroom: Case Studies with Timeline JS


OpenMethods introduction by Marinella Testori, 2022-05-11 (https://openmethods.dariah.eu/2022/05/11/open-source-tool-allows-users-to-create-interactive-timelines-digital-humanities-at-a-state/). Categories: Blog post, Creation, Data, Designing, Digital Humanities, English, Methods…

Annotation Guidelines for narrative levels, time features, and subjective narration styles in fiction (SANTA 2).


Introduction: If you are looking for ways to translate narratological concepts into annotation guidelines for tagging or marking up your texts for both qualitative and quantitative analysis, then Edward Kearns’s paper “Annotation Guidelines for narrative levels, time features, and subjective narration styles in fiction” is for you! The tag set is designed to be used in XML, but it can be flexibly adapted to other working environments too, including for instance CATMA. The use of the tags is illustrated on a corpus of modernist fiction.
The guidelines have been published in a special issue of The Journal of Cultural Analytics (vol. 6, issue 4) entirely devoted to the Systematic Analysis of Narrative levels Through Annotation (SANTA) project, which provides the broader intellectual context for the guidelines. All articles in the special issue are open peer reviewed, open access, and available in both PDF and XML formats.
[Click ‘Read more’ for the full post!]

BERT for Humanists: a deep learning language model meets DH


Introduction: Awarded Best Long Paper at the 2019 NAACL (North American Chapter of the Association for Computational Linguistics) Conference, the contribution by Jacob Devlin et al. presents “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” (https://aclanthology.org/N19-1423/).

As highlighted by the authors in the abstract, BERT is a “new language representation model” and, in the past few years, it has become widespread in various NLP applications; one project building on it, for example, is CamemBERT (https://camembert-model.fr/), a model for French.

In June 2021, a workshop organized by David Mimno, Melanie Walsh and Maria Antoniak (https://melaniewalsh.github.io/BERT-for-Humanists/workshop/) showed how to use BERT in digital humanities projects to deal with word similarity and classification, relying on the Python-based HuggingFace transformers library (https://melaniewalsh.github.io/BERT-for-Humanists/tutorials/). A further advantage of this training resource is that it has been written with its target audience in mind: it provides a gentle introduction to the complexities of language models for scholars whose education and background lie outside Computer Science.
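As a rough illustration of what ‘word similarity’ means once words are represented as vectors, the sketch below compares vectors with cosine similarity, a common measure for this purpose. It is not taken from the workshop materials: the three-dimensional vectors are invented toy values, whereas real BERT vectors come from the model itself and have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(word, vectors):
    """Return the other word whose vector is closest to that of `word`."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


# Invented toy vectors, NOT produced by BERT.
vectors = {
    "king": [0.9, 0.1, 0.3],
    "queen": [0.8, 0.2, 0.35],
    "river": [0.1, 0.9, 0.0],
}

print(most_similar("king", vectors))
```

With a contextual model such as BERT, the same word receives a different vector in each sentence, so in practice the comparison is made between word occurrences rather than between dictionary entries.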

Along with the Tutorials, the same site includes introductions to BERT in general and to its specific usage in a Google Colab notebook, as well as a constantly updated bibliography and a glossary of the main terms (‘attention’, ‘Fine-Tune’, ‘GPU’, ‘Label’, ‘Task’, ‘Transformers’, ‘Token’, ‘Type’, ‘Vector’).

TAO IC Project: the charm of Chinese ceramics.


Introduction: Among the nominees in the ‘Best DH Dataset’ category of the DH Awards 2020, the TAO IC Project (http://www.dh.ketrc.com/index.html) takes us on a fascinating journey through the world of Chinese ceramics. The project, which is developed collaboratively at the Knowledge Engineering & Terminology Research Center of Liaocheng (http://ketrc.com/), uses an onto-terminology-based approach to build an e-dictionary of Chinese vessels. Do you want to know every detail about a ‘Double-gourd Vase I’? If you consult ‘Class’ in the ‘Ontology’ section (http://www.dh.ketrc.com/class.html), you can discover its components, its function, what such a vessel is made of, and how it is fired. If you also wish to see what the vase looks like, under ‘Individuals’ in the same section you can read a full description of it and see a picture (http://www.dh.ketrc.com/class.html). All this information is collected in the e-dictionary for each beautiful item belonging to the Ming and Qing dynasties.

[Click ‘Read more’ for the full post!]