Introduction: Issues around sustaining digital project outputs after their funding period is a recurrent topic on OpenMethods. In this post, Arianna Ciula introduces the King’s Digital Lab’s solution, a workflow around their CKAN (Comprehensive Knowledge Archive Network) instance, and uncovers the many questions around not only maintaining a variety of legacy resources from long-running projects, but also opening them up for data re-use, verification and integration beyond siloed resources.
Introduction: In our guidelines for nominating content, databases are explicitly excluded. However, this database is an exception, which is not due to the burning issue of COVID-19, but to its exemplary variety of digital humanities methods with which the data can be processed.AVOBMAT makes it possible to process 51,000 articles with almost every conceivable approach (Topic Modeling, Network Analysis, N-gram viewer, KWIC analyses, gender analyses, lexical diversity metrics, and so on) and is thus much more than just a simple database – rather, it is a welcome stage for the Who is Who (or What is What?) of OpenMethods.
Introduction: The article illustrates the application of a ‘discourse-driven topic modeling’ (DDTM) to the analysis of the corpus ChronicItaly comprising several newspapers in Italian language, appeared in the USA during the time of massive migration towards America between the end of the XIX century and the first two decades of the XX (1898-1920).
The method combines both Text Modelling (™) and the discourse-historical approach (DHA) in order to get a more comprehensive representation of the ethnocultural and linguistic identity of the Italian group of migrants in the historical American context in crucial periods of time like that immediately preceding the eruption and that of the unfolding of World War I.
This short blog post by Laure Barbot, Frank Fischer, Yoan Moranville, and Ivan Pozdniakov from 2019 sheds some light on the old question which DH-tools are actually used in research and which are especially popular.
Introduction: In this blog post, James Harry Morris introduces the method of web scraping. Step by step from the installation of the packages, readers are explained how they can extract relevant data from websites using only the Python programming language and convert it into a plain text file. Each step is presented transparently and comprehensibly, so that this article is a prime example of OpenMethods and gives readers the equipment they need to work with huge amounts of data that would no longer be possible manually.
Introduction: This white paper is an outcome of a DH2019 workshop dedicated to foster closer collaboration among technology-oriented DH researchers and developers of tools to support Digital Humanities research. The paper briefly outlines the most pressing issues in their collaboration and addresses topics such as: good practices to ease mutual understanding between scholars and researchers; software development and academic career and recognition; or sustainability and funding.
Introduction: The indispensable Programming Historian comes with an introduction to Term Frequency – Inverse Document Frequency (tf-idf) provided by Matthew J. Lavin. The procedure, concerned with specificity of terms in a document, has its origins in information retrieval, but can be applied as an exploratory tool, finding textual similarity, or as a pre-processing tool for machine learning. It is therefore not only useful for textual scholars, but also for historians working with large collections of text.
Introduction: In this article, Alejandro Bia Platas and Ramón P. Ñeco García introduce TEIdown, an extension of the Markdown syntax that aims at creating XML-TEI documents, and transformation programs. TEIdown helps editors to validate and find errors in TEI documents.
Introduction: There is a postulated level of anthropomorphism where people feel uncanny about the appearance of a robot. But what happens if digital facsimiles and online editions become nigh indistinguishable from the real, yet materially remaining so vastly different? How do we ethically provide access to the digital object without creating a blindspot and neglect for the real thing. A question that keeps digital librarian Dot Porter awake and which she ponders in this thoughtful contribution.
Introduction: Computer scientists and humanists at the University of Würzburg have jointly developed a new and promising OCR tool to simplify text recognition in historical prints. “OCR4all” is freely available and works very reliably. The article describes its development and functions and leads to a well documented github repository to test the tool for yourself.