Met-Hodos: (Re)considering the road of research and analysis

Met-Hodos: (Re)considering the road of research and analysis

Introduction: This blog, curated by Andreas W. Müller from Halle University, provides an insight on qualitative data analysis (QDA) techniques to conduct research in the field of Digital Humanities. The field is currently dominated by quantitative research methods, and is still lacking digital analysis derived from qualitative approaches. The author implies that QDA is a not a method, but a set of techniques that can be used with different analysis methods, for instance Content Analysis or Discourse Analysis. He also outlines how QDA deals with qualitative data combined with qualitative analysis, being both elements fundamental.
[Click ‘Read more’ for the full post!]

Evaluating Deep Learning Methods for Word Segmentation of Scripta Continua Texts in Old French and Latin

Evaluating Deep Learning Methods for Word Segmentation of Scripta Continua Texts in Old French and Latin

Introduction: Thibault Clérice reports on the successfulness of recognizing word boundaries in scripta continua (typically late classic and early medieval Latin). This will not be easy reading for many a philologist and classicist, but it is well worth trying to bridge the gap. Next to explaining and evaluating Thibault Clérice releases the software Boudams used for his research.

[Click ‘Read more’ for the full post!]

Automatic annotation of incomplete and scattered bibliographical references in Digital Humanities papers

Automatic annotation of incomplete and scattered bibliographical references in Digital Humanities papers

The reviewed article presents the project BILBO and illustrates the application of several appropriate machine-learning techniques to the constitution of proper reference corpora and the construction of efficient annotation models. In this way, solutions are proposed for the problem of extracting and processing useful information from bibliographic references in digital documentation whatever their bibliographic styles are. It proves the usefulness and high degree of accuracy of CRF techniques, which involve finding the most effective set of features (including three types of features: input, local and global features) of a given corpus of well-structured bibliographical data (with labels such as surname, forename or title). Moreover, this approach has not only been proven efficient when applied to such traditional, well-structured bibliographical data sets, but it also originally contributes to the processing of more complicated, less-structured references such as the ones contained in footnotes by applying SVM with new features for sequence classification.

[Click ‘Read more’ for the full post.]

RAWGraphs: A Visualization Platform to Create Open Outputs

RAWGraphs: A Visualization Platform to Create Open Outputs

The paper illustrates the features of the innovative tool in the field of data visualization: it is the framework RAW Graphs, available in an open access format at the website https://rawgraphs.io/. The framework permits to establish a connection between data coming from various applications (from Microsoft Excel to Google Spreadsheets) and their visualization in several layouts.

As detailed in the video guide available in the ‘Learning section’ (https://rawgraphs.io/learning), it is possible to load own data through a simple ‘copy and past’ command, and then select a chart-based layout among those provided: contour plot, beeswarm plot, hexagonal binnings, scatterplot, treemap, bump chart, Gantt chart, multiple pie charts, alluvial diagram and barchart. The platform permits also to unstack data according to a wide and a narrow format.

RAWGraphs, ideal for those working in the field of design but not only, is kept as an open-source resource thanks to an Indiegogo crowdfunding campaign (https://rawgraphs.io/blog).
[click ‘Read’ for more]

Mining ethnicity: Discourse-driven topic modelling of immigrant discourses in the USA, 1898–1920

Mining ethnicity: Discourse-driven topic modelling of immigrant discourses in the USA, 1898–1920

Introduction: The article illustrates the application of a ‘discourse-driven topic modeling’ (DDTM) to the analysis of the corpus ChronicItaly comprising several newspapers in Italian language, appeared in the USA during the time of massive migration towards America between the end of the XIX century and the first two decades of the XX (1898-1920).

The method combines both Text Modelling (™) and the discourse-historical approach (DHA) in order to get a more comprehensive representation of the ethnocultural and linguistic identity of the Italian group of migrants in the historical American context in crucial periods of time like that immediately preceding the eruption and that of the unfolding of World War I.

Web Scraping with Python for Beginners | The Digital Orientalist

Web Scraping with Python for Beginners | The Digital Orientalist

Introduction: In this blog post, James Harry Morris introduces the method of web scraping. Step by step from the installation of the packages, readers are explained how they can extract relevant data from websites using only the Python programming language and convert it into a plain text file. Each step is presented transparently and comprehensibly, so that this article is a prime example of OpenMethods and gives readers the equipment they need to work with huge amounts of data that would no longer be possible manually.

A World of Possibilities: a corpus-based approach to the diachrony of modality in Latin

A World of Possibilities: a corpus-based approach to the diachrony of modality in Latin

Introduction: Hosted at the University of Lausanne, “A world of possibilities. Modal pathways over an extra-long period of time: the diachrony in the Latin language” (WoPoss) is a project under development exploiting a corpus-based approach to the study and reconstruction of the diachrony of modality in Latin.
Following specific annotation guidelines applied to a set of various texts pertaining to the time span between 3rd century BCE and 7th century CE, the work team lead by Francesca Dell’Oro aims at analyzing the patterns of modality in the Latin language through a close consideration of lexical markers.

Digital scholarship workflows

Digital scholarship workflows

Introduction:  In this post, you can find a thoughtful and encouraging selection and description of reading, writing and organizing tools. It guides you through a whole discovery-magamement-writing-publishing workflow from the creation of annotated bibliographies in Zotero,  through a useful Markdown syntax cheat sheet  to versioning, storage and backup strategies, and shows how everybody’s research can profit by open digital methods even without sophisticated technological skills. What I particularly like in Tomislav Medak’s approach is that all these tools, practices and tricks are filtered through and tested again his own everyday scholarly routine. It would make perfect sense to create a visualization from this inventory in a similar fashion to these workflows.