What Counts as Culture? Part I: Sentiment Analysis of The Times Music Reviews, 1950-2009

Introduction: In this blog post, Lucy Havens presents a sentiment analysis of over 2,000 music reviews from The Times using freely available tools: defoe for building the corpus of reviews, VADER for sentiment analysis, and Jupyter Notebooks to provide rich documentation and to connect the different components of the analysis. The description of the workflow comes with tool and method criticism reflections, including an outlook on how the analysis could be refined and extended.
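To give a concrete sense of the VADER step of such a workflow, here is a minimal sketch (not the author's actual notebook code) that scores a couple of invented review snippets with the vaderSentiment package; the compound score is the usual single-number summary of a text's sentiment.

```python
# Minimal VADER sketch: score short review snippets.
# The review texts below are invented examples, not actual Times reviews.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

reviews = [
    "A luminous, unforgettable performance of the concerto.",
    "The orchestra sounded tired and the tempi dragged badly.",
]

analyzer = SentimentIntensityAnalyzer()
for review in reviews:
    scores = analyzer.polarity_scores(review)  # keys: neg, neu, pos, compound
    print(f"{scores['compound']:+.3f}  {review}")  # compound lies in [-1, 1]
```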

OpenMethods Spotlights #3 Keeping a smart diary of research processes with NeMO and the Scholarly Ontology

In the next episode, we look behind the scenes of two ontologies, NeMO and the Scholarly Ontology (SO), with Panos Constantopoulos and Vayianos Pertsas, who tell us the story behind these ontologies and explain how they can ease or upcycle your daily work as a researcher. We discuss the value of knowledge graphs, how NeMO and SO connect with the emerging DH ontology landscape and beyond, why Open Access is a precondition for populating them, the Greek DH landscape …and much more!

Undogmatic Literary Annotation with CATMA in: Annotations in Scholarly Editions and Research

Introduction: Digital Literary Studies has long engaged with the challenges of representing ambiguity, contradiction and polyvocal readings of literary texts. This book chapter describes a web-based tool called CATMA which promises a “low-threshold” approach to digitally encoded text interpretation. CATMA has a long trajectory based on a ‘standoff’ approach to markup, somewhat provocatively described by its creators as “undogmatic”, which stands in contrast to more established systems for text representation in digital scholarly editing and publishing such as XML markup, or the Text Encoding Initiative (TEI). Standoff markup involves assigning a number to each character of a text and then using those numbers as identifiers to store interpretation externally. This approach allows for “multiple, over-lapping and even taxonomically contradictory annotations by one or more users” and avoids some of the rigidity which other approaches sometimes imply. An editor working with CATMA can create multiple independent annotation cycles, and even specify which interpretation model was used for each. The tool also allows for an impressive array of analysis and visualization possibilities.
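To illustrate the standoff principle (as a toy sketch only, not CATMA's internal data model), the following snippet stores annotations outside the text as character offsets, which is what lets overlapping and even contradictory readings coexist over the same untouched source:

```python
# Toy standoff markup: the text is never modified; annotations reference it
# by character offsets, so overlapping or contradictory spans coexist freely.
text = "The rain in Spain stays mainly in the plain."

# Each annotation: (start, end, tag, annotator). Offsets index into `text`.
annotations = [
    (0, 17, "subject", "reader-1"),
    (4, 17, "weather-motif", "reader-2"),  # overlaps reader-1's span
    (4, 17, "irony", "reader-3"),          # contradictory reading, same span
]

for start, end, tag, annotator in annotations:
    print(f"{annotator}: [{tag}] {text[start:end]!r}")
```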

Recent iterations of CATMA have developed approaches which aim to bridge the gap between ‘close’ and ‘distant’ reading by providing scalable digital annotation and interpretation involving “semantic zooming” (which is compared to the kind of experience you get from an interactive map). The latest version also brings greater automation (currently for German only) to the capture of grammatical tense, temporal signals and parts of speech, which offers potentially significant effort savings and a wider range of markup review options. Greater attention is also paid to different kinds of interpretation activities through the three CATMA annotation modes of ‘highlight’, ‘comment’ and ‘annotate’, and to overall workflow considerations. The latest version of the tool offers finely grained access options mapping to common editorial roles and workflows.

I would have welcomed greater reflection in the book chapter on sustainability – how an editor can port their work to other digital research environments, for use with other tools. While CATMA does allow for export to other systems (such as TEI), quite how effective this is (how well its interpretation structures bind to other digitally mediated representation systems) is not clear.

What is most impressive about CATMA, and the work of its creator – the forTEXT research group – more generally, is how firmly embedded the thinking behind the tool is in humanities (and in particular literary) scholarship and theory. The group’s long-standing and deeply reflective engagement with the concerns of literary studies is well captured in this well-crafted and highly engaging book chapter.


Novels in distant reading: the European Literary Text Collection (ELTeC).

Introduction: Among the most recent, currently ongoing projects exploiting distant reading techniques is the European Literary Text Collection (ELTeC), one of the main elements of the Distant Reading for European Literary History COST Action (CA16204, https://www.distant-reading.net/). Thanks to the contributions of four Working Groups (dealing respectively with Scholarly Resources, Methods and Tools, Literary Theory and History, and Dissemination: https://www.distant-reading.net/working-groups/), the project aims to provide a collection of at least 2,500 novels written in ten European languages, together with a range of Distant Reading computational tools and methodological strategies for approaching them from various perspectives (textual, stylistic, topical, et similia). A full description of the objectives of the Action and of ELTeC can be found in the Memorandum of Understanding for the implementation of the COST Action “Distant Reading for European Literary History” (DISTANT-READING) CA16204, available at https://e-services.cost.eu/files/domain_files/CA/Action_CA16204/mou/CA16204-e.pdf
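As a small taste of what a distant-reading operation over such a collection can look like, the sketch below parses a single TEI-encoded ELTeC novel (the file path is a hypothetical local download) and computes simple token and type counts; the analyses pursued in the Action are of course far richer:

```python
# Minimal distant-reading sketch over one TEI-encoded novel.
# The file path is hypothetical; an ELTeC text must be downloaded first.
import re
from xml.etree import ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

tree = ET.parse("ELTeC-eng/ENG0001.xml")  # hypothetical local path
body = tree.getroot().find(f".//{TEI_NS}body")
text = " ".join(body.itertext())

tokens = re.findall(r"[^\W\d_]+", text.lower())
print(f"tokens: {len(tokens)}, types: {len(set(tokens))}")
print(f"type-token ratio: {len(set(tokens)) / len(tokens):.3f}")
```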


The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models

Introduction: NLP models and the tasks they perform are becoming an integral part of our daily realities (everyday or research). A central concern of NLP research is that, for many of their users, these models still largely operate as black boxes, with limited insight into why a model makes certain predictions, how its usage is skewed towards certain content types, or what underlying social and cultural biases it encodes. The open source Language Interpretability Tool aims to change this for the better and brings transparency to the visualization and understanding of NLP models. The pre-print describing the tool comes with rich documentation (including case studies of different kinds) and gives us an honest SWOT analysis of the tool.

Cultural Ontologies: the ArCo Knowledge Graph.

Introduction: Standing for ‘Architecture of Knowledge’, ArCo is an open set of resources developed and managed by Italian institutions such as MiBAC (the Italian Ministry of Cultural Heritage and Activities) with its ICCD (Institute of the General Catalogue and Documentation), and the CNR (Italian National Research Council). Built through the application of the eXtreme Design (XD) methodology, ArCo consists of an ontology network comprising seven modules (arco, core, catalogue, location, denotative description, context description, and cultural event) and a set of LOD data comprising a huge number of linked entities referring to Italian national cultural resources, properties and events. Under constant refinement, ArCo represents an example of a “robust Semantic Web resource” (Carriero et al., 11) in the field of cultural heritage, alongside other projects such as Google Arts & Culture (https://artsandculture.google.com/) or the Smithsonian American Art Museum (https://americanart.si.edu/about/lod).
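To show how such a knowledge graph can be consumed, here is a minimal SPARQL sketch using Python's SPARQLWrapper; the endpoint URL and the arco:CulturalProperty class IRI follow ArCo's public documentation, but both should be treated as assumptions to verify rather than guaranteed interfaces:

```python
# Minimal sketch: list a few cultural properties from ArCo's LOD.
# Endpoint URL and class IRI are assumptions based on ArCo's documentation.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dati.beniculturali.it/sparql")  # assumed endpoint
sparql.setQuery("""
    PREFIX arco: <https://w3id.org/arco/ontology/arco/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?property ?label WHERE {
      ?property a arco:CulturalProperty ;
                rdfs:label ?label .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["property"]["value"], "-", row["label"]["value"])
```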


Fragmentarium: a Model for Digital Fragmentology

Introduction: One of the major challenges of digital data workflows in the Arts and Humanities is that resources that belong together, in extreme cases like this particular one even parts of dismembered manuscripts, are hosted and embedded in different geographical and institutional silos. Combining IIIF with a MySQL database, Fragmentarium provides a user-friendly but also standardized, open workspace for the virtual reconstruction of medieval manuscript fragments. Lisa Fagin Davis’s blog post gives contextualized insights into the potential of Fragmentarium and shows how, as she writes, “technology has caught up with our dreams”.
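Because Fragmentarium builds on IIIF, fragments it hosts can in principle be retrieved and recombined programmatically. The sketch below (with a placeholder manifest URL, not a real Fragmentarium address) fetches a IIIF Presentation API v2 manifest and lists its canvases, i.e. the individual fragment images:

```python
# Fetch a IIIF Presentation API (v2) manifest and list its canvases.
# The manifest URL is a placeholder, not a real Fragmentarium address.
import json
from urllib.request import urlopen

MANIFEST_URL = "https://example.org/iiif/fragment/manifest.json"  # placeholder

with urlopen(MANIFEST_URL) as response:
    manifest = json.load(response)

# In Presentation API v2, images hang off sequences -> canvases.
for sequence in manifest.get("sequences", []):
    for canvas in sequence.get("canvases", []):
        print(canvas.get("label"), "->", canvas["@id"])
```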