Introduction: Digital text analysis depends on one important thing: text that can be processed with little effort. Working with PDFs often leads to great difficulties, as Zeyd Boukhers, Shriharsh Ambhore and Steffen Staab describe in their paper. Their goal is to extract references from PDF documents. The highlight of the workflow they describe is its very impressive precision rates. The paper thereby encourages further development of the process and its application as a “method” in the humanities.
OpenMethods Spotlights showcase the people and epistemic reflections behind Digital Humanities tools and methods. Here you can find brief interviews with the creator(s) of the blogs or tools highlighted on OpenMethods, to humanize and contextualize them. In the first episode, Alíz Horváth talks with Hilde de Weerdt at Leiden University about MARKUS, a tool that offers a variety of functionalities for the markup, analysis, export, linking, and visualization of texts in multiple languages, with a special focus on Chinese and now Korean as well.
East Asian studies are still largely underrepresented in digital humanities. Part of the reason is the relative lack of tools and methods that can be used smoothly with non-Latin scripts. MARKUS, developed by Brent Ho within the framework of the Communication and Empire: Chinese Empires in Comparative Perspective project led by Hilde de Weerdt at Leiden University, is a comprehensive tool that helps mitigate this issue. Selected as a runner-up in the category “Best tool or suite of tools” in the DH2016 awards, MARKUS offers a variety of functionalities for the markup, analysis, export, linking, and visualization of texts in multiple languages, with a special focus on Chinese and now Korean as well.
Historical newspapers, already available in many digitized collections, may represent a significant source of information for the reconstruction of events and backgrounds, enabling historians to cast new light on facts and phenomena, as well as to advance new interpretations. Lausanne, University of Zurich and C2DH Luxembourg, the ‘impresso – Media Monitoring of the Past’ project wishes to offer an advanced corpus-oriented answer to the increasing need of accessing and consulting collections of historical digitized newspapers.
[…] Thanks to a suite of computational tools for data extraction, linking and exploration, impresso aims at overcoming the traditional keyword-based approach by means of the application of advanced techniques, from lexical processing to semantically deepened n-grams, from data modelling to interoperability.
Introduction: In our guidelines for nominating content, databases are explicitly excluded. However, this database is an exception, not because of the burning issue of COVID-19, but because of the exemplary variety of digital humanities methods with which its data can be processed. AVOBMAT makes it possible to process 51,000 articles with almost every conceivable approach (Topic Modeling, Network Analysis, N-gram viewer, KWIC analyses, gender analyses, lexical diversity metrics, and so on) and is thus much more than just a simple database. Rather, it is a welcome stage for the Who is Who (or What is What?) of OpenMethods.
Introduction: This blog, curated by Andreas W. Müller from Halle University, provides insight into qualitative data analysis (QDA) techniques for conducting research in the field of Digital Humanities. The field is currently dominated by quantitative research methods and still lacks digital analysis derived from qualitative approaches. The author suggests that QDA is not a method but a set of techniques that can be used with different analysis methods, for instance Content Analysis or Discourse Analysis. He also outlines how QDA combines qualitative data with qualitative analysis, both elements being fundamental.
The paper illustrates the features of an innovative tool in the field of data visualization: the framework RAWGraphs, available in open access at https://rawgraphs.io/. The framework makes it possible to connect data coming from various applications (from Microsoft Excel to Google Spreadsheets) with their visualization in several layouts.
As detailed in the video guide available in the ‘Learning section’ (https://rawgraphs.io/learning), it is possible to load one's own data through a simple ‘copy and paste’ command and then select a chart-based layout from those provided: contour plot, beeswarm plot, hexagonal binning, scatterplot, treemap, bump chart, Gantt chart, multiple pie charts, alluvial diagram and bar chart. The platform also makes it possible to unstack data according to a wide and a narrow format.
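The difference between a wide and a narrow data format can be illustrated with a short sketch. This is not RAWGraphs code: the function name `wide_to_narrow`, the column names and the sample figures are all invented for illustration, and the platform's own unstacking may work differently.

```python
def wide_to_narrow(rows, id_key):
    """Reshape wide rows (one row per entity, one column per variable)
    into narrow rows (one row per entity/variable pair)."""
    narrow = []
    for row in rows:
        for column, value in row.items():
            if column == id_key:
                continue  # the identifier is repeated on every narrow row
            narrow.append({id_key: row[id_key], "variable": column, "value": value})
    return narrow


# Hypothetical sample data: one row per city, one column per year.
wide = [
    {"city": "Leiden", "2019": 12, "2020": 15},
    {"city": "Paris", "2019": 30, "2020": 28},
]

narrow = wide_to_narrow(wide, "city")
for record in narrow:
    print(record)
```

In the narrow form, every observation sits on its own row, which is the shape many chart layouts expect when a single column is mapped to a visual dimension.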
RAWGraphs, ideal for those working in design but useful well beyond that field, is kept as an open-source resource thanks to an Indiegogo crowdfunding campaign (https://rawgraphs.io/blog).
Introduction: In this article, Spanish scholars Pablo Ruiz Fabo and Helena Bermúdez Sabel present two case studies on the application of Natural Language Processing (NLP) technologies, entity linking, and Computational Linguistics methods to create corpus navigation interfaces. The authors also focus on how these technologies for automatic text analysis allow us to enrich scholarly digital editions. They include interesting points of view on analogue and digital editions and their relation to ecdotic practice.
Introduction: In this article, José Calvo Tello offers a methodological guide to data curation for creating a literary corpus for quantitative analysis. This brief tutorial covers all stages of the curation and creation process and guides the reader through practical cases from Hispanic literature. The author deals with every single step in the creation of such a corpus: from digitization, metadata, and automatic processes for cleaning and mining the texts, to licenses, publishing and archiving/long-term preservation.
Introduction: Given in French by Mathieu Jacomy, also known for his work on Gephi, this seminar presentation gives a substantial introduction to Hyphe, an open-source web crawler designed by a team at the Sciences Po Medialab in Paris. Devised specifically for researchers' use, Hyphe helps to collect and curate a corpus of web pages through an easy-to-handle interface.