Humanities Data Analysis: Case Studies with Python

Introduction: Folgert Karsdorp, Mike Kestemont and Allen Riddell's interactive book Humanities Data Analysis: Case Studies with Python was written to equip humanities students and scholars who work with textual and tabular resources with practical, hands-on knowledge. The aim is to help them understand the potential of the data-rich, computer-assisted approaches offered by the Python ecosystem, and ultimately to apply and integrate these approaches in their own research projects.

The first part introduces "data carpentry", a collection of essential techniques for gathering, cleaning, representing, and transforming textual and tabular data. This sets the stage for the second part, which consists of five case studies (Statistics Essentials: Who Reads Novels?; Introduction to Probability; Narrating with Maps; Stylometry and the Voice of Hildegard; A Topic Model of United States Supreme Court Opinions, 1900–2000) showcasing how to draw meaningful insights from data using quantitative methods. Each chapter contains executable Python code and ends with exercises ranging from easy drills to more creative and complex tasks that invite readers to adapt and apply the newly acquired knowledge to their own research problems.

The book exhibits best practices for making digital scholarship available in an open, sustainable and digital-native manner, in several layers that are firmly interlinked with each other. It was published by Princeton University Press in 2021 and hard copies are available, but more importantly, the digital version is an Open Access Jupyter notebook that can be read in multiple environments and formats (.md and .pdf). The documentation, code and data are available on Zenodo (https://zenodo.org/record/3560761#.Y3tCcn3MJD9). The authors also took care to select packages that are mature and actively maintained.

La poética dramática desde una perspectiva cuantitativa: la obra de Calderón de la Barca (Dramatic poetics from a quantitative perspective: the work of Calderón de la Barca)

Introduction: In this paper, Ehrlicher et al. take a quantitative approach to uncover possible structural parallels between 13 comedies and 10 autos sacramentales written by Calderón de la Barca. The comedies are analyzed within a comparative framework that sets them against the precepts of the Spanish comedia nueva and of French comédie. The authors use the DramaAnalysis tool together with statistical methods, focusing on word frequency per subgenre, the average number of characters, character variation and the distribution of discourse among characters, among other indicators. The autos sacramentales are evaluated through the same indicators. Regarding the comedies, Ehrlicher et al.'s results show that Calderón (a) plays with the unities of place and time according to creative and dramatic needs, (b) does not follow French conventions of character intervention or linkage, but (c) does adhere to the French concept of structural symmetry. As for the autos sacramentales, the authors find that they are similar to the comedies in length and character variation. However, they also identify one difference: in the autos, Calderón uses character co-presence to reinforce the message conveyed. On this basis, the authors confirm that Calderón's comedies depart from classical ideals of theatre, both Aristotelian and French. With respect to the autos sacramentales, they consider further evaluation necessary to verify the ideas put forward and to identify other structural patterns.
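
Indicators such as the number of characters, the distribution of discourse among characters and character co-presence can be illustrated with a minimal Python sketch. It does not reproduce DramaAnalysis or the authors' pipeline. It assumes a hypothetical toy play represented as a list of scenes, each a list of (speaker, utterance) pairs, and it approximates co-presence as two characters speaking in the same scene, so silent on-stage characters are ignored.

```python
from collections import Counter
from itertools import combinations

# Hypothetical representation (an assumption): a scene is a list of (speaker, utterance) pairs.
Scene = list[tuple[str, str]]


def characters(scenes: list[Scene]) -> set[str]:
    """All characters who speak at least once."""
    return {speaker for scene in scenes for speaker, _ in scene}


def speech_shares(scenes: list[Scene]) -> dict[str, float]:
    """Share of all spoken tokens (whitespace-split) uttered by each character."""
    tokens = Counter()
    for scene in scenes:
        for speaker, utterance in scene:
            tokens[speaker] += len(utterance.split())
    total = sum(tokens.values())
    if total == 0:
        return {}
    return {speaker: count / total for speaker, count in tokens.items()}


def copresence(scenes: list[Scene]) -> Counter:
    """For each pair of characters, count the scenes in which both speak."""
    pairs = Counter()
    for scene in scenes:
        speakers = sorted({speaker for speaker, _ in scene})
        pairs.update(combinations(speakers, 2))
    return pairs


if __name__ == "__main__":
    # Toy data with placeholder names, not taken from any Calderón play.
    play = [
        [("A", "one two three"), ("B", "four five")],
        [("A", "six"), ("C", "seven eight"), ("B", "nine ten")],
    ]
    print(len(characters(play)))  # 3
    print(speech_shares(play))    # {'A': 0.4, 'B': 0.4, 'C': 0.2}
    print(copresence(play))       # ('A', 'B') share 2 scenes; the other pairs share 1
```

Counting co-presence per scene rather than per utterance keeps the measure independent of how much each character says, which is closer to the structural question of who shares the stage.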

Novels in distant reading: the European Literary Text Collection (ELTeC).

Introduction: Among the most recent ongoing projects exploiting distant reading techniques is the European Literary Text Collection (ELTeC), one of the main components of the COST Action Distant Reading for European Literary History (CA16204, https://www.distant-reading.net/). Drawing on the contributions of four Working Groups (dealing respectively with Scholarly Resources, Methods and Tools, Literary Theory and History, and Dissemination: https://www.distant-reading.net/working-groups/), the project aims to provide at least 2,500 novels written in ten European languages, together with a range of distant reading computational tools and methodological strategies for approaching them from various perspectives (textual, stylistic, topical, and so on). A full description of the objectives of the Action and of ELTeC can be found in the Memorandum of Understanding for the implementation of the COST Action "Distant Reading for European Literary History" (DISTANT-READING) CA16204, available at https://e-services.cost.eu/files/domain_files/CA/Action_CA16204/mou/CA16204-e.pdf

Research COVID-19 with AVOBMAT

Introduction: Our guidelines for nominating content explicitly exclude databases. This database is an exception, not because of the urgency of COVID-19, but because of the exemplary variety of digital humanities methods with which its data can be processed. AVOBMAT makes it possible to process 51,000 articles with almost every conceivable approach (topic modeling, network analysis, an n-gram viewer, KWIC analysis, gender analysis, lexical diversity metrics, and so on) and is thus much more than a simple database: it is a welcome stage for the Who's Who (or What's What) of OpenMethods.
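
Two of the methods listed, keyword-in-context (KWIC) concordances and lexical diversity metrics, are simple enough to sketch in a few lines of Python. This is not AVOBMAT's implementation: the tokenizer is a naive regular expression, the sample sentence is invented, and the type-token ratio stands in as just one example of a lexical diversity metric.

```python
import re


def tokenize(text: str) -> list[str]:
    """Naive lowercase tokenizer: runs of word characters (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens: list[str], keyword: str, window: int = 3) -> list[tuple[str, str, str]]:
    """Return (left context, keyword, right context) for every occurrence of keyword."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


def type_token_ratio(tokens: list[str]) -> float:
    """Distinct word types divided by running tokens (0.0 for an empty text)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    tokens = tokenize("The virus spread. The virus mutated quickly.")
    # Print each hit with the left context right-aligned, concordance style.
    for left, kw, right in kwic(tokens, "virus", window=2):
        print(f"{left:>15} [{kw}] {right}")
    print(round(type_token_ratio(tokens), 3))  # 5 types / 7 tokens = 0.714
```

The raw type-token ratio falls as texts get longer, so it is only meaningful when comparing texts of similar length; length-corrected diversity measures exist for that reason.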

‘Voyant Tools’

Introduction: Digital humanists looking for tools to visualize and analyze texts can rely on 'Voyant Tools' (https://voyant-tools.org), a software package created by S. Sinclair and G. Rockwell. Several online resources teach how to use Voyant. In this post, we highlight two of them: "Using Voyant-Tools to Formulate Research Questions for Textual Data" by Filipa Calado (GC Digital Fellows) and the tutorial "Investigating texts with Voyant" by Miriam Posner.

Not All Character N-grams Are Created Equal: A Study in Authorship Attribution – ACL Anthology

Introduction: Character n-grams are today a classic choice of feature in authorship attribution. Although the optimal length of these n-grams has been discussed, we still have few clues about which specific types of n-grams are most helpful for efficiently identifying the author of a text. This paper partly fills that gap by showing that most of the information gained from character n-grams comes from affixes and punctuation.
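
To make the idea of n-gram types concrete, the following sketch extracts character trigrams and assigns each one a coarse category based on its position relative to word boundaries and punctuation. The category names and rules are a simplified assumption for illustration; they only loosely echo the affix, word and punctuation groupings discussed in the paper and are not its exact definitions.

```python
import string
from collections import Counter

PUNCT = set(string.punctuation)


def typed_char_ngrams(text: str, n: int = 3) -> list[tuple[str, str]]:
    """Return (n-gram, category) pairs for every character n-gram in text.

    Simplified, hypothetical categories: 'punct', 'multi-word',
    'whole-word', 'prefix', 'suffix', 'mid-word'.
    """
    result = []
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        if any(ch in PUNCT for ch in gram):
            category = "punct"
        elif any(ch.isspace() for ch in gram):
            category = "multi-word"  # spans a word boundary
        else:
            # Look one character outside the window to detect word boundaries.
            before = text[i - 1] if i > 0 else " "
            after = text[i + n] if i + n < len(text) else " "
            starts_word = not before.isalnum()
            ends_word = not after.isalnum()
            if starts_word and ends_word:
                category = "whole-word"
            elif starts_word:
                category = "prefix"
            elif ends_word:
                category = "suffix"
            else:
                category = "mid-word"
        result.append((gram, category))
    return result


if __name__ == "__main__":
    pairs = typed_char_ngrams("the cats, sat.")
    for gram, category in pairs:
        print(repr(gram), category)
    # punct: 4, multi-word: 4, whole-word: 2, prefix: 1, suffix: 1
    print(Counter(category for _, category in pairs))
```

Grouping features this way makes it possible to train an attribution model on one category at a time and compare how much each group contributes.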

Attributing Authorship in the Noisy Digitized Correspondence of Jacob and Wilhelm Grimm | Digital Humanities

Introduction: Besides its encouraging conclusion that authorship attribution methods are fairly robust to the noise (transcription errors) introduced by optical character recognition and handwritten text recognition, this article offers a comprehensive account of how sophisticated computational techniques can be applied for testing and validation in a data curation process.
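
One way to get a feel for why character-level features can tolerate transcription errors is to corrupt a text artificially and measure how much its n-gram profile changes. This sketch is not the article's experimental setup: it assumes uniformly random letter substitutions as a crude stand-in for OCR/HTR errors, uses an invented sample sentence, and compares character trigram frequency profiles with cosine similarity.

```python
import math
import random
import string
from collections import Counter


def add_noise(text: str, rate: float, seed: int = 0) -> str:
    """Replace each letter with a random lowercase letter with probability rate.

    A crude, assumed stand-in for OCR/HTR transcription errors.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < rate:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)


def trigram_profile(text: str) -> Counter:
    """Frequency profile of overlapping character trigrams."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two frequency profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Invented sample sentence; a real test would use full, longer documents.
    clean = "Dear brother, I received your letter and thank you warmly for it."
    base = trigram_profile(clean)
    for rate in (0.0, 0.05, 0.2):
        noisy = add_noise(clean, rate, seed=42)
        print(rate, round(cosine(base, trigram_profile(noisy)), 3))
```

Each substituted letter can only disturb the few trigrams that overlap it, which gives some intuition for why profiles built from many n-grams degrade gradually rather than collapse.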