“Document Enrichment as a Tool for Automated Interview Coding”
Following our last post on Critical Discourse Analysis, today we highlight a document enrichment pipeline for automated interview coding, proposed by Ajda Pretnar Žagar, Nikola Đukić and Rajko Muršič in their paper presented at the Conference on Language Technologies & Digital Humanities, Ljubljana 2022. As described in the “Essential Guide to Coding Qualitative Data” (https://delvetool.com/guide), ethnography is one of the main fields of application of such a procedure, though not the only one.
Qualitative data coding enriches texts by adding labels and descriptions to specific passages, which are generally identified with computer-assisted qualitative data analysis software (CAQDAS). The approach applies across many fields, from the humanities to biology and from sociology to medicine.
In their paper, Pretnar Žagar, Đukić and Muršič illustrate how relying on two taxonomies (or ontologies) already established in anthropological studies can help automate and speed up data labelling. These taxonomies are the Outline of Cultural Materials (OCM) and the ETSEO systematics (ETSEO is an acronym for Ethnological Topography of Slovenian Ethnic Territory). Both were developed and applied in ethnographic research to organize and better analyze concepts and categories related to human cultures and traditions.
The authors detail the steps, from data preprocessing to assigning labels to topical segments, that lead to an effective pipeline for semantic text enrichment. In more detail, these steps are: data collection and processing through the pipeline, identification of semantically relevant segments, enrichment of the segments through ontologies, identification of the most relevant segments and, finally, comparison with the taxonomies mentioned above.
In this way, the analysis and classification of collected data can be improved in cultural-comparative studies as well as in other areas of investigation.
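To make the idea of labelling segments against a taxonomy more concrete, here is a minimal Python sketch. It is not the authors' implementation: plain keyword overlap stands in for the semantic analysis techniques used in the paper, the taxonomy entries are invented for illustration and are not actual OCM or ETSEO categories, and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical miniature taxonomy (label -> descriptive terms).
# These entries are invented for illustration; they are NOT real
# OCM or ETSEO categories.
TAXONOMY = {
    "Food preparation": {"bread", "bake", "oven", "cook", "flour"},
    "Agriculture": {"field", "harvest", "plough", "crop", "wheat"},
    "Kinship": {"mother", "father", "family", "grandmother", "uncle"},
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def split_segments(transcript):
    """Split a transcript into segments at blank lines."""
    return [s.strip() for s in re.split(r"\n\s*\n", transcript) if s.strip()]


def label_segment(segment, taxonomy, min_hits=2):
    """Return (label, score) pairs for one segment, best match first.

    The score is the number of token occurrences that match a label's
    terms; labels scoring below min_hits are dropped.
    """
    counts = Counter(tokenize(segment))
    scored = []
    for label, terms in taxonomy.items():
        score = sum(counts[term] for term in terms)
        if score >= min_hits:
            scored.append((label, score))
    # Highest score first; ties are broken alphabetically by label.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def code_interview(transcript, taxonomy=TAXONOMY):
    """Pair every segment of the transcript with its suggested labels."""
    return [(seg, label_segment(seg, taxonomy)) for seg in split_segments(transcript)]
```

Calling code_interview on a transcript returns each segment together with its ranked candidate labels, which a researcher could then review rather than assign from scratch. Exact word matching is a deliberate simplification here: inflected forms such as "baked" would not match "bake" without lemmatization.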
While widely used in social sciences and the humanities, qualitative data coding remains a predominantly manual task. With the proliferation of semantic analysis techniques, such as keyword extraction and ontology enrichment, researchers could use existing taxonomies and systematics to automatically label text passages with semantic labels. We propose and test an analytical pipeline for automated interview coding in anthropology, using two existing taxonomies, Outline of Cultural Materials and ETSEO systematics. We show it is possible to quickly, efficiently and automatically annotate text passages with meaningful labels using current state-of-the-art semantic analysis techniques.
Sources:
Ajda Pretnar Žagar, Nikola Đukić, Rajko Muršič. Document Enrichment as a Tool for Automated Interview Coding. Conference on Language Technologies & Digital Humanities, Ljubljana 2022, 169-176.
https://nl.ijs.si/jtdh22/pdf/JTDH2022_PretnarZagar-et-al_Document-Enrichment-as-a-Tool-for-Automated-Interview-Coding.pdf
“Essential Guide to Coding Qualitative Data” https://delvetool.com/guide
Original date of publication: 2022
Internet Archive link: https://web.archive.org/web/20221010113852/https://nl.ijs.si/jtdh22/pdf/JTDH2022_PretnarZagar-et-al_Document-Enrichment-as-a-Tool-for-Automated-Interview-Coding.pdf