Every one of us is accustomed to reading academic contributions in the Latin alphabet, for which standard characters and formats already exist. But what about texts written in languages with different, ideographic writing systems (for example, Chinese and Japanese)? What recognition techniques and metadata must be adopted in order to represent them in a digital context?
Introduction: What are the essential data literacy skills in (Digital) Humanities? How can good data management practices be translated to humanities disciplines, and how can more and more humanists be engaged in such conversations? Ulrike Wuttke’s reflections on the “Vermittlung von Data Literacy in den Geisteswissenschaften” (Teaching Data Literacy in the Humanities) barcamp at the DHd 2020 conference not only make us heartily nostalgic for scholarly meetings happening face to face but also give in-depth, contextualized insights into the questions above. The post comes with rich documentation (including links to the barcamp’s metapad, tweets, photos, and follow-up posts) and also serves as a guide for future barcamp organizers.
Introduction: Named Entity Recognition (NER) is used to identify textual elements that name things, such as persons, locations, and organisations. In this study, four different NER tools are evaluated on a corpus of modern and classic fantasy and science fiction novels. Since NER tools have typically been developed for the news domain, it is interesting to see how they perform in a very different one. The article comes with a very detailed methodological section, and the accompanying dataset is also made available.
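To illustrate the kind of evaluation such a study involves, here is a minimal sketch of span-level precision, recall, and F1 for one NER tool against a gold standard. The sample sentence and entity spans are invented for illustration; they are not taken from the article’s corpus or its actual evaluation setup.

```python
# Sketch: exact-match, span-level evaluation of NER output against gold
# annotations. A predicted (start, end, label) triple counts as correct
# only if it appears verbatim among the gold annotations.

def evaluate_ner(gold, predicted):
    """Return (precision, recall, f1) for exact span-and-label matches."""
    gold_set = set(gold)
    pred_set = set(predicted)
    tp = len(gold_set & pred_set)  # true positives: exact matches
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical annotations for one sentence of a fantasy novel:
gold = [(0, 7, "PERSON"), (24, 30, "LOC")]       # e.g. a hero and a place
predicted = [(0, 7, "PERSON"), (24, 30, "ORG")]  # tool mislabels the place

p, r, f = evaluate_ner(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# → precision=0.50 recall=0.50 f1=0.50
```

Exact matching is the strictest option; evaluations of NER in fiction often also report partial-match scores, since entity boundaries in narrative text are frequently ambiguous.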
Introduction: Apart from its encouraging conclusion that authorship attribution methods are rather robust to the noise (transcription errors) introduced by optical character recognition and handwritten text recognition, this article also offers a comprehensive read on applying sophisticated computational techniques for testing and validation in a data curation process.
Introduction: Processing XML workflows has traditionally been a complicated affair, and XProc was designed to standardise and simplify it by using declarative XML pipelines to manage operations. This blog post by Gioele Barabucci presents conclusions from a late-2017 meeting of the XProc 3.0 working group, exploring the latest emerging version of the standard and the kinds of challenges it will overcome.
Introduction: The article discusses how letters are being used across the disciplines, identifying similarities and differences in transcription, digitisation and annotation practices. It is based on a workshop held after the end of the project Digitising experiences of migration: the development of interconnected letters collections (DEM). The aims were to examine issues and challenges surrounding digitisation, build capacity relating to correspondence mark-up, and initiate the process of interconnecting resources to encourage cross-disciplinary research. Subsequent to the DEM project, TEI templates were developed for capturing information within and about migrant correspondence, and visualisation tools were trialled with metadata from a sample of letter collections. Additionally, as a demonstration of how the project’s outputs could be repurposed and expanded, the correspondence metadata that was collected for DEM was added to a more general correspondence project, Visual Correspondence.
Introduction: This post presents the programme and video of a seminar on software for 3D geographical data capture and visualization.
Introduction: This paper explores some of the new toolchains offered by the Open Web Platform, along with alternatives to consider in daily editing workflows.
Introduction: This post highlights the current and future projects, methods and tools in digital Assyriology.
Introduction: This post highlights digital methods and standards for the efficient analysis of historical data.