Introduction: In this blog post, James Harry Morris introduces the method of web scraping. Step by step, starting with the installation of the packages, readers are shown how to extract relevant data from websites using only the Python programming language and convert it into a plain text file. Each step is presented transparently and comprehensibly, which makes this article a prime example of OpenMethods and gives readers the equipment they need to work with amounts of data too large to process manually.
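To give a rough idea of the extract-and-convert step described above, here is a minimal sketch that uses only Python's standard library. It does not reproduce the code or packages of the original post: the sample HTML, the class `ParagraphExtractor`, and the helpers `extract_paragraphs` and `save_as_text` are hypothetical names chosen for illustration, and the page is an inline string rather than a downloaded website.

```python
from html.parser import HTMLParser
from pathlib import Path


class ParagraphExtractor(HTMLParser):
    """Collects the text found inside <p> elements (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> is still part of the paragraph.
        if self._in_paragraph:
            self.paragraphs[-1] += data


def extract_paragraphs(html):
    """Return the non-empty paragraph texts of an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return [p.strip() for p in parser.paragraphs if p.strip()]


def save_as_text(paragraphs, path):
    """Write one paragraph per line to a plain text file."""
    Path(path).write_text("\n".join(paragraphs) + "\n", encoding="utf-8")


# Hypothetical sample page standing in for a downloaded website.
SAMPLE_HTML = (
    "<html><body><h1>Title</h1>"
    "<p>First <b>paragraph</b>.</p>"
    "<p>Second paragraph.</p>"
    "</body></html>"
)
```

Calling `save_as_text(extract_paragraphs(SAMPLE_HTML), "output.txt")` would write the two paragraphs to a plain text file. In practice the HTML would first be fetched from a website, and a dedicated parsing library may be more convenient than the standard-library parser used here.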
Introduction: Hosted at the University of Lausanne, “A world of possibilities. Modal pathways over an extra-long period of time: the diachrony in the Latin language” (WoPoss) is a project under development exploiting a corpus-based approach to the study and reconstruction of the diachrony of modality in Latin.
Following specific annotation guidelines applied to a varied set of texts spanning the period between the 3rd century BCE and the 7th century CE, the team led by Francesca Dell’Oro aims to analyze the patterns of modality in the Latin language through a close consideration of lexical markers.
Introduction: In this post, you can find a thoughtful and encouraging selection and description of reading, writing and organizing tools. It guides you through a whole discovery-management-writing-publishing workflow, from the creation of annotated bibliographies in Zotero, through a useful Markdown syntax cheat sheet, to versioning, storage and backup strategies, and shows how everybody’s research can profit from open digital methods even without sophisticated technological skills. What I particularly like in Tomislav Medak’s approach is that all these tools, practices and tricks are filtered through and tested against his own everyday scholarly routine. It would make perfect sense to create a visualization from this inventory in a similar fashion to these workflows.
Stanford CoreNLP aims to provide a complete Java-based set of tools for various aspects of language analysis, from annotation to dependency parsing and from lemmatization to coreference resolution. It thus provides a range of tools that can potentially be applied to languages other than English.
Among the languages to which Stanford CoreNLP has been applied is Italian, for which the Tint pipeline has been developed, as described in the paper “Italy goes to Stanford: a collection of CoreNLP modules for Italian” by Alessio Palmero Aprosio and Giovanni Moretti.
On the Tint webpage the whole pipeline can be found and downloaded: it comprises tokenization and sentence splitting, morphological analysis and lemmatization, part-of-speech tagging, named-entity recognition and dependency parsing, including wrappers under construction.
Introduction: Introduction by OpenMethods Editor (Christopher Nunn): Information visualizations are helpful in detecting patterns in large amounts of text and are often used to illustrate complex relationships. Not only can they show descriptive phenomena that could be revealed in other ways, albeit more slowly and laboriously, but they can also heuristically generate new knowledge. The authors of this article did just that. The focus here is, fortunately, on narratological approaches, which have so far hardly been combined with digital text analyses but are ideally suited to them. A variety of interactive visualizations were created for eight German novellas, all of which show that the combination of digital methods with narratological interest can provide great returns for work in Literary Studies. After reading this article, it pays to think ahead in this field.
Introduction: This white paper is an outcome of a DH2019 workshop dedicated to fostering closer collaboration between technology-oriented DH researchers and developers of tools that support Digital Humanities research. The paper briefly outlines the most pressing issues in their collaboration and addresses topics such as good practices to ease mutual understanding between scholars and researchers; software development and academic career and recognition; and sustainability and funding.
Introduction: Sustainability questions, such as how to maintain digital project outputs after the funding period or how to keep aging code and infrastructure that are important for our research up to date, are among the major challenges DH projects face today. This post gives us a sneak peek into the solutions and working practices of the Center for Digital Humanities at Princeton. In their approach to building capacity for sustaining DH projects and preserving access to data and software, they view projects as collaborative and process-based scholarship. Their focus is therefore on implementing project management workflows and documentation tools that can be flexibly applied to projects of different scopes and sizes and that allow for further refinement in due course. By sharing these resources together with their real-life use cases in DH projects, they aim to benefit other scholarly communities and sustain a broader conversation about these tricky issues.
Introduction: The indispensable Programming Historian comes with an introduction to Term Frequency – Inverse Document Frequency (tf-idf) provided by Matthew J. Lavin. The procedure, which is concerned with the specificity of terms in a document, has its origins in information retrieval, but can be applied as an exploratory tool, as a way of finding textual similarity, or as a pre-processing tool for machine learning. It is therefore useful not only for textual scholars, but also for historians working with large collections of text.
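As a rough illustration of what tf-idf measures, the following minimal sketch scores each term by its raw count in a document multiplied by the natural logarithm of (number of documents / number of documents containing the term). This is a common textbook variant and an assumption made here; the lesson itself may use a different weighting or smoothing. The function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Assumed variant: score = raw term count * ln(N / document frequency).
    """
    n_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        scores.append(
            {
                term: count * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return scores


# Toy corpus (hypothetical): three already-tokenized "documents".
DOCS = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
SCORES = tf_idf(DOCS)
```

In this toy corpus, "the" occurs in every document and therefore scores zero everywhere, while "dog", which occurs in only one document, receives the highest score in the second document. This is the sense in which tf-idf captures how specific a term is to a document.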
Introduction: Linked Data and Linked Open Data are attracting increasing interest and finding applications in many fields. An experiment conducted in 2018 at Furman University illustrates and discusses, from a pedagogical perspective, some of the challenges posed by Linked Open Data applied to research in the historical domain.
“Linked Open Data to navigate the Past: using Peripleo in class” by Chiara Palladino describes the use of the search engine Peripleo to reconstruct the past of four archaeologically relevant cities. Many databases, comprising various types of information, were consulted, and the results, as highlighted in Palladino’s contribution, show both the advantages and the limitations of a Linked Open Data-oriented approach to historical investigations.
Introduction: Ted Underwood tests a new language representation model called “Bidirectional Encoder Representations from Transformers” (BERT) and asks whether humanists should use it. Due to its high degree of difficulty and its limited success (e.g. in questions of genre detection), he concludes that this approach will be important in the future but is not something humanists need to engage with at the moment. An important caveat worth reading.