Introduction: Digital text analysis depends on one important thing: text that can be processed with little effort. Working with PDFs often leads to great difficulties, as Zeyd Boukhers, Shriharsh Ambhore and Steffen Staab describe in their paper. Their goal is to extract references from PDF documents. The highlight of their workflow is its very impressive precision rates. The paper thereby encourages further development of the process and its application as a “method” in the humanities.
Introduction: Issues around sustaining digital project outputs after their funding period are a recurrent topic on OpenMethods. In this post, Arianna Ciula introduces the King’s Digital Lab’s solution, a workflow around their CKAN (Comprehensive Knowledge Archive Network) instance, and uncovers the many questions around not only maintaining a variety of legacy resources from long-running projects, but also opening them up for data re-use, verification and integration beyond siloed resources.
Introduction: In this article, Alejandro Bia Platas and Ramón P. Ñeco García introduce TEIdown, an extension of the Markdown syntax for creating XML-TEI documents, together with its transformation programs. TEIdown helps editors to validate TEI documents and find errors in them.
Introduction: The explore! project tests computer simulation and text mining on autobiographic texts as well as the reusability of the approach in literary studies. To facilitate the application of the proposed method in broader contexts and to new research questions, the text analysis is performed by means of scientific workflows that allow for the documentation, automation, and modularization of the processing steps. By enabling the reuse of proven workflows, the project aims to enhance the efficiency of data analysis in similar projects and to further advance collaboration between computer scientists and digital humanists.
Introduction: In the context of medieval and early Tudor texts scholarship, this paper discusses the methodological use of the database not simply to store information, but to clarify points of tension between the questions asked and the information provided in order to find answers.
Introduction: How do we improve the quality of the fledgling practice of Web archeology, so much needed now that a first decade of Web information threatens to disappear as current interest wanes, even though its contemporaneous cultural value is undisputed? A National Library of the Netherlands scientific report investigates.
Introduction: Now that sources for research are increasingly digital, how do we establish the quality of such sources?
Introduction: This paper explores some of the new toolchains offered by the Open Web Platform and alternatives to be considered in daily editing workflows.
Introduction: This software paper describes ‘stylo’ – an R package for stylometric research and text processing.
Introduction: This post highlights digital methods and standards for an efficient analysis of historical data.