Undogmatic Literary Annotation with CATMA in: Annotations in Scholarly Editions and Research

Undogmatic Literary Annotation with CATMA in: Annotations in Scholarly Editions and Research

Introduction: Digital Literary Studies has long engaged with the challenges in representing ambiguity, contradiction and polyvocal readings of literary texts. This book chapter describes a web-based tool called CATMA which  promises a “low-threshold” approach to digitally encoded text interpretation. CATMA has a long trajectory based on a ‘standoff’ approach to markup, somewhat provocatively described by its creators as “undogmatic”, which stands in contrast to more established systems for text representation in digital scholarly editing and publishing such as XML markup, or the Text Encoding Initiative (TEI). Standoff markup involves applying numbers to each character of a text and then using those numbers as identifiers to store interpretation externally. This approach allows for “multiple, over-lapping and even taxonomically contradictory annotations by one or more users” and avoids some of the rigidity which other approaches sometimes imply. An editor working with CATMA is able to create multiple independent annotation cycles, and to even specify which interpretation model was used for each. And the tool allows for an impressive array of analysis and visualization possibilities.

Recent iterations of CATMA have developed approaches which aim to bridge the gap between ‘close’ and ‘distant’ reading by providing scalable digital annotation and interpretation involving “semantic zooming” (which is compared to the kind of experience you get from an interactive map). The latest version also brings greater automation (currently in German only) to grammatical tense capture, temporal signals and part-of-speech annotation, which offer potentially significant effort savings and a wider range of markup review options. Greater attention is also paid to different kinds of interpretation activities through the three CATMA annotation modes of ‘highlight’, ‘comment’ and ‘annotate’, and to overall workflow considerations. The latest version of the tool offers finely grained access options mapping to common editorial roles and workflows.

I would have welcome greater reflection in the book chapter on sustainability – how an editor can port their work to other digital research environments, for use with other tools. While CATMA does allow for export to other systems (such as TEI), quite how effective this is (how well its interpretation structures bind to other digitally-mediated representation systems) is not clear.

What is most impressive about CATMA, and the work of its creator – the forTEXT research group – more generally, is how firmly embedded the thinking behind the tool is in humanities (and in particular literary) scholarship and theory. The group’s long-standing and deeply reflective engagement with the concerns of literary studies is well captured in this well-crafted and highly engaging book chapter.

[Click ‘Read more’ for the full post!]

Programmable Corpora: Introducing DraCor, an Infrastructure for the Research on European Drama

Programmable Corpora: Introducing DraCor, an Infrastructure for the Research on European Drama

Introduction: The DraCor ecosystem encourages various approaches to the browsing and consultation of the data collected in the corpora, like those detailed in the Tools section: the Shiny DraCor app (https://shiny.dracor.org/), along with the SPARQL queries and the Easy Linavis interfaces (https://dracor.org/sparql and https://ezlinavis.dracor.org/ respectively). The project, thus, aims at creating a suitable digital environment for the development of an innovative way to approach literary corpora, potentially open to collaborations and interactions with other initiatives thanks to its ontology and Linked Open data-based nature.
[Click ‘Read more’ for the full post!]

GROBID: when data extraction becomes a suite

GROBID: when data extraction becomes a suite

Introduction: GROBID is an already well-known open source tool in the field of Digital Humanities, originally built to extract and parse bibliographical metadata from scholarly works. The acronym stands for GeneRation Of BIbliographic Data.
Shaped by use cases and adoptions to a range of different DH and non-DH settings, the tool has been progressively evolved into a suite of technical features currently applied to various fields, like that of journals, dictionaries and archives.
[Click ‘Read more’ for the full post!]

A World of Possibilities: a corpus-based approach to the diachrony of modality in Latin

A World of Possibilities: a corpus-based approach to the diachrony of modality in Latin

Introduction: Hosted at the University of Lausanne, “A world of possibilities. Modal pathways over an extra-long period of time: the diachrony in the Latin language” (WoPoss) is a project under development exploiting a corpus-based approach to the study and reconstruction of the diachrony of modality in Latin.
Following specific annotation guidelines applied to a set of various texts pertaining to the time span between 3rd century BCE and 7th century CE, the work team lead by Francesca Dell’Oro aims at analyzing the patterns of modality in the Latin language through a close consideration of lexical markers.