Every one of us is accustomed to reading academic contributions written in the Latin alphabet, for which we already have standard characters and formats. But what about texts written in languages with different, ideogram-based writing systems (for example, Chinese and Japanese)? What kinds of recognition techniques and metadata are needed to represent them in a digital context?
Following our last post, which focused on Critical Discourse Analysis, today we highlight a document enrichment pipeline for automated interview coding, proposed by Ajda Pretnar Žagar, Nikola Đukić and Rajko Muršič in their paper presented at the Conference on Language Technologies & Digital Humanities, Ljubljana 2022. As described in the “Essential Guide to Coding Qualitative Data” (https://delvetool.com/guide), one of the main fields of application of such a procedure is ethnography, though it is not the only one.
Qualitative data coding makes it possible to enrich texts by adding labels and descriptions to specific passages, which are generally pinpointed by means of computer-assisted qualitative data analysis software (CAQDAS). This holds for several fields of application, from the humanities to biology, from sociology to medicine.
In their paper, Pretnar Žagar, Đukić and Muršič illustrate how relying on two taxonomies (or ontologies) already known in anthropological studies can help automate and speed up the process of data labelling. These taxonomies are the Outline of Cultural Materials (OCM) and the ETSEO (Ethnological Topography of Slovenian Ethnic Territory) systematics. Both were developed and applied in ethnographic research in order to organize and better analyze concepts and categories related to human cultures and traditions.
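To make the idea of taxonomy-driven labelling more concrete, here is a minimal Python sketch. It is not the authors' pipeline: it assumes the simplest possible approach, exact keyword matching, and the `TAXONOMY` dictionary, its labels and keywords, and the `label_passages` helper are all hypothetical illustrations rather than actual OCM or ETSEO entries.

```python
import re

# Hypothetical mini-taxonomy: labels and keywords are illustrative only,
# not actual OCM or ETSEO categories.
TAXONOMY = {
    "food": {"bread", "meal", "cooking"},
    "kinship": {"mother", "father", "family"},
    "work": {"job", "factory", "farm"},
}


def label_passages(passages, taxonomy=TAXONOMY):
    """Pair each passage with the sorted labels whose keywords occur in it."""
    results = []
    for passage in passages:
        # Lowercase and split into word tokens so matching ignores case
        # and punctuation.
        tokens = set(re.findall(r"\w+", passage.lower()))
        labels = sorted(
            label for label, keywords in taxonomy.items() if tokens & keywords
        )
        results.append((passage, labels))
    return results


interview = [
    "My mother baked bread every Saturday.",
    "After school I went to work at the factory.",
    "We rarely talked about it.",
]

for passage, labels in label_passages(interview):
    print(labels, "-", passage)
```

A passage that matches no keyword simply receives an empty list of labels. A real pipeline would likely need lemmatization and a far richer vocabulary per category, especially for a morphologically rich language such as Slovenian.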
Introduction: Folgert Karsdorp, Mike Kestemont and Allen Riddell’s interactive book, Humanities Data Analysis: Case Studies with Python, was written to equip humanities students and scholars who work with textual and tabular resources with practical, hands-on knowledge. It aims to help them better understand the potential of the data-rich, computer-assisted approaches that the Python ecosystem offers, and eventually to apply and integrate those approaches into their own research projects.
The first part introduces “Data carpentry”, a collection of essential techniques for gathering, cleaning, representing, and transforming textual and tabular data. This sets the stage for the second part, which consists of five case studies (Statistics Essentials: Who Reads Novels?; Introduction to Probability; Narrating with Maps; Stylometry and the Voice of Hildegard; A Topic Model of United States Supreme Court Opinions, 1900–2000) showcasing how to draw meaningful insights from data using quantitative methods. Each chapter contains executable Python code and ends with exercises, ranging from easier drills to more creative and complex opportunities for readers to adapt and apply the newly acquired knowledge to their own research problems.
The book exhibits best practices in how to make digital scholarship available in an open, sustainable and digital-native manner, coming in different layers that are firmly interlinked with each other. Published with Princeton University Press in 2021, it is available in hard copy, but more importantly, the digital version is an Open Access Jupyter notebook that can be read in multiple environments and formats (.md and .pdf). The documentation, code and data materials are available on Zenodo (https://zenodo.org/record/3560761#.Y3tCcn3MJD9). The authors also made sure to select and use packages that are mature and actively maintained.
Introduction: In this paper, Ehrlicher et al. follow a quantitative approach to unveil possible structural parallelisms between 13 comedies and 10 autos sacramentales written by Calderón de la Barca. The comedies are analyzed within a comparative framework, setting them against the precepts of the Spanish comedia nueva and the French comédie. The authors employ the DramaAnalysis tool and statistics for their examination, focusing on word frequency per subgenre, the average number of characters, their variation, discourse distribution, and so on. The autos sacramentales are evaluated with the same indicators. Regarding the comedies, Ehrlicher et al.’s results show that Calderón: a) plays with the unities of space and time depending on creative and dramatic needs, b) does not follow French comédie conventions of character intervention or linkage, but c) does abide by its concept of structural symmetry. As for the autos sacramentales, their findings show that these are similar to the comedies in length and character variation. However, they also identified the following difference: Calderón uses character co-presence in them to reinforce the message conveyed. Considering all this, the authors confirm that Calderón’s comedies depart from classical ideals of theatre, both Aristotelian and French. With respect to the autos sacramentales, they believe further evaluation is needed to verify the ideas put forward and to identify other structural patterns.
Introduction: In our guidelines for nominating content, databases are explicitly excluded. However, this database is an exception, not because of the burning issue of COVID-19, but because of the exemplary variety of digital humanities methods with which its data can be processed. AVOBMAT makes it possible to process 51,000 articles with almost every conceivable approach (Topic Modeling, Network Analysis, N-gram viewer, KWIC analyses, gender analyses, lexical diversity metrics, and so on) and is thus much more than just a simple database. Rather, it is a welcome stage for the Who is Who (or What is What?) of OpenMethods.
Introduction by OpenMethods Editor (Christopher Nunn): Information visualizations are helpful in detecting patterns in large amounts of text and are often used to illustrate complex relationships. Not only can they show descriptive phenomena that could be revealed in other ways, albeit more slowly and laboriously, but they can also heuristically generate new knowledge. The authors of this article did just that. The focus here is, fortunately, on narratological approaches, which have so far hardly been combined with digital text analyses but are ideally suited to them. A variety of interactive visualizations were created for eight German novellas, all of which show that combining digital methods with narratological interest can provide great returns for work in Literary Studies. After reading this article, it pays to think ahead in this field.
Introduction: Ted Underwood tests a new language representation model called “Bidirectional Encoder Representations from Transformers” (BERT) and asks whether humanists should use it. Given its high degree of difficulty and its limited success (e.g. in questions of genre detection), he concludes that this approach will be important in the future but is not something humanists need to engage with at the moment. An important caveat worth reading.
Introduction: This article introduces a novel way to unfold and discover patterns in complex texts, at the intersection between macro and micro analytics. This technique, called TIC (Transcendental Information Cascades), allows analysis of how a cast of characters is generated and managed dynamically over the duration of a text.
Introduction: This software paper in Polish describes “Magik” (Magician), a tool for textual scholars which allows for comparisons of different variants of the same text.
Introduction: This post highlights digital methods and standards for an efficient analysis of historical data.