“Creating specialized corpora from digitized historical newspaper archives: An iterative bootstrapping approach”

Most scholars in the digital humanities and social sciences have probably faced the challenge of consulting large digital newspaper archives in order to extract detailed information about a topic. The computational methods and tools currently available can undoubtedly make a great contribution; however, applying them can pose several difficulties, especially when dealing with large collections of items.

Spanish Paleography Digital Teaching and Learning Tool

The Spanish Paleography tool (http://spanishpaleographytool.org) helps to bridge this gap for those interested in learning the paleography of the early modern Spanish period, covering the late 15th to the 18th centuries. The tool is intended to help users learn how to decipher and read handwriting in documents of this era. Full transcriptions of the documents can be viewed in a facing-page format, or users can highlight individual words. It could also be used in teaching to introduce students to paleography.

Mediate: A Collaborative Time-Based Media Annotation Tool for the Web

Mediate is a time-based media annotation tool for the web that can be used both individually and collaboratively for synchronous and asynchronous digital annotation. Among its key features are accessibility and customization, i.e. the ability to customize the schema that forms the basis of the analysis to suit the purpose of the project.

An Engaging Environment for Ancient Chinese Texts: An Introduction to ctext.org

The Chinese Text Project is a well-established resource in Sinology, providing open access to a large number of ancient Chinese texts. As a digital medium, it uses crowdsourcing, linked data, knowledge graphs and other computational technologies to provide an interactive interface for users who are interested in ancient Chinese texts. Beyond its main aim of providing open access to Chinese literary and philosophical texts, the project features an integrated Chinese character dictionary, images of scanned source texts, a search function for parallel passages, and much more. In terms of structured data, the project’s data wiki contains a wealth of records on entities such as persons, locations, and works.

Closing the Gap in Non-Latin-Script Data: A tool for building and navigating collections of DH research projects

The Closing the Gap in Non-Latin-Script Data project aims to map the field of digital humanities projects outside and beyond the anglosphere, with a particular focus on non-Latin scripts such as Arabic or Chinese, in both machine-actionable and human-readable form. The urgency and value of such a survey have been highlighted in recent discussions around global, decolonial, and multilingual digital humanities.

“Multilingual Research Projects: Non-Latin Script Challenges for Making Use of Standards, Authority Files, and Character Recognition”

All of us are accustomed to reading academic contributions written in the Latin alphabet, for which standard characters and formats already exist. But what about texts written in languages with different, ideograph-based writing systems (for example, Chinese and Japanese)? What kinds of recognition techniques and metadata are needed to represent them in a digital context?

“Document Enrichment as a Tool for Automated Interview Coding”

Following our last post, which focused on Critical Discourse Analysis, today we highlight a document enrichment pipeline for automated interview coding, proposed by Ajda Pretnar Žagar, Nikola Đukić, and Rajko Muršič in their paper presented at the Conference on Language Technologies & Digital Humanities, Ljubljana 2022. As described in the “Essential Guide to Coding Qualitative Data” (https://delvetool.com/guide), one of the main fields of application of such a procedure is ethnography, though it is not the only one.

Qualitative data coding makes it possible to enrich texts by adding labels and descriptions to specific passages, which are generally pinpointed by means of computer-assisted qualitative data analysis software (CAQDAS). This holds for several fields of application, from the humanities to biology, from sociology to medicine.
In their paper, Pretnar Žagar, Đukić and Muršič illustrate how relying on two taxonomies (or ontologies) already known in anthropological studies can help to automate and speed up the process of data labelling. These taxonomies are the Outline of Cultural Materials (OCM) and the ETSEO (acronym for Ethnological Topography of Slovenian Ethnic Territory) systematics. Both are taxonomies developed and applied in ethnographic research in order to organize and better analyze concepts and categories related to human cultures and traditions.

Tools for Critical Discourse Analysis – and introduction to tool criticism

In this video, Drs. Stephanie Vie and Jennifer deWinter explain some of the tools digital humanists can use for critical discourse analysis and for visualizing data collected from social media platforms. Although not all the tools they mention are open source, the majority of them have free-to-use or freemium versions, including AntConc, a free-to-use concordancing tool, and several Twitter data visualisation tools such as Tweeps map or Tweetstats.

Even though the video does not provide equally good open source alternatives to Atlas.ti or MAXQDA (a shortcoming that is recurrently discussed on OpenMethods), it sets an excellent example of how to introduce tool criticism in the classroom alongside an introduction to certain Digital Humanities tools. After briefly touching upon the advantages and disadvantages of each tool, they encourage their audience (students in Digital Humanities study programs) to pilot each of them using the same dataset, and not only to compare their results but also to reflect on the epistemic processes in between.

Sharing the video on Humanities Commons with stable archiving, DOI and rich metadata is among the best things that could happen to teaching resources of all kinds.

SPARQL for music: when melodies meet ontology

Introduction: Developed in the context of the EU H2020 Polifonia project, the investigation deals with the potential of SPARQL Anything to extract musical features, at both the metadata and symbolic levels, from MusicXML files. The paper describes the procedure that has been applied, starting from an overview of the application of ontologies to music, as well as of the so-called ‘façade-based’ approach to knowledge graphs, which is at the core of the SPARQL Anything software. Then, it moves on to an illustration of the steps involved (i.e., melody extraction, N-grams extraction, N-grams analysis and exploitation of the Music Notation Ontology). Finally, it provides some considerations regarding the result of the experiment in terms of the effectiveness of the queries’ performance. In conclusion, the authors highlight how further studies in the field may cast an increasingly brighter light on the application of semantic-oriented methods and techniques to computational musicology.
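To give a rough idea of what N-grams extraction from a melody involves, here is a minimal sketch in plain Python. It is not the paper's method, which relies on SPARQL Anything queries over MusicXML; the melody, the use of MIDI pitch numbers, and the helper names `intervals` and `ngrams` are all assumptions made purely for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical melody given as MIDI pitch numbers (C4 D4 E4 C4 D4 E4 G4).
# In the paper the notes would come from MusicXML files instead.
melody: List[int] = [60, 62, 64, 60, 62, 64, 67]


def intervals(pitches: Sequence[int]) -> List[int]:
    """Return the semitone differences between successive notes."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def ngrams(items: Sequence[int], n: int) -> List[Tuple[int, ...]]:
    """Return every contiguous window of length n as a tuple."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Work on intervals rather than pitches so that transposed copies
# of the same motif produce identical N-grams.
steps = intervals(melody)
counts = Counter(ngrams(steps, 2))

# Print each interval bigram with its frequency, most frequent first.
for pattern, frequency in counts.most_common():
    print(pattern, frequency)
```

Counting N-grams over intervals instead of absolute pitches is one common design choice for melodic pattern analysis, since it makes recurring motifs comparable regardless of key.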