When history meets technology. impresso: an innovative corpus-oriented perspective.

When history meets technology. impresso: an innovative corpus-oriented perspective.

Historical newspapers, already available in many digitized collections, may represent a significant source of information for the reconstruction of events and backgrounds, enabling historians to cast new light on facts and phenomena, as well as to advance new interpretations. Lausanne, University of Zurich and C2DH Luxembourg, the ‘impresso – Media Monitoring of the Past’ project wishes to offer an advanced corpus-oriented answer to the increasing need of accessing and consulting collections of historical digitized newspapers.
[…] Thanks to a suite of computational tools for data extraction, linking and exploration, impresso aims at overcoming the traditional keyword-based approach by means of the application of advanced techniques, from lexical processing to semantically deepened n-grams, from data modelling to interoperability.
[Click ‘Read more’ for the full post!]

Audio-Dateien zusammenführen und konvertieren in Audacity (Windows)

Audio-Dateien zusammenführen und konvertieren in Audacity (Windows)

Introduction: As online became the default means of teaching globally, the thoughtful use of online technologies will play an even more critical role in our everyday life. In this post, Christopher Nunn guides you through how to publish your lectures as podcasts as MP3 with the help of the open source tool, Audacity. The tutorial had been published as a guest post on Mareike Schuhmacher’s blog, Lebe lieber literarisch.

Met-Hodos: (Re)considering the road of research and analysis

Met-Hodos: (Re)considering the road of research and analysis

Introduction: This blog, curated by Andreas W. Müller from Halle University, provides an insight on qualitative data analysis (QDA) techniques to conduct research in the field of Digital Humanities. The field is currently dominated by quantitative research methods, and is still lacking digital analysis derived from qualitative approaches. The author implies that QDA is a not a method, but a set of techniques that can be used with different analysis methods, for instance Content Analysis or Discourse Analysis. He also outlines how QDA deals with qualitative data combined with qualitative analysis, being both elements fundamental.
[Click ‘Read more’ for the full post!]

Evaluating Deep Learning Methods for Word Segmentation of Scripta Continua Texts in Old French and Latin

Evaluating Deep Learning Methods for Word Segmentation of Scripta Continua Texts in Old French and Latin

Introduction: Thibault Clérice reports on the successfulness of recognizing word boundaries in scripta continua (typically late classic and early medieval Latin). This will not be easy reading for many a philologist and classicist, but it is well worth trying to bridge the gap. Next to explaining and evaluating Thibault Clérice releases the software Boudams used for his research.

[Click ‘Read more’ for the full post!]

Automatic annotation of incomplete and scattered bibliographical references in Digital Humanities papers

Automatic annotation of incomplete and scattered bibliographical references in Digital Humanities papers

The reviewed article presents the project BILBO and illustrates the application of several appropriate machine-learning techniques to the constitution of proper reference corpora and the construction of efficient annotation models. In this way, solutions are proposed for the problem of extracting and processing useful information from bibliographic references in digital documentation whatever their bibliographic styles are. It proves the usefulness and high degree of accuracy of CRF techniques, which involve finding the most effective set of features (including three types of features: input, local and global features) of a given corpus of well-structured bibliographical data (with labels such as surname, forename or title). Moreover, this approach has not only been proven efficient when applied to such traditional, well-structured bibliographical data sets, but it also originally contributes to the processing of more complicated, less-structured references such as the ones contained in footnotes by applying SVM with new features for sequence classification.

[Click ‘Read more’ for the full post.]

RAWGraphs: A Visualization Platform to Create Open Outputs

RAWGraphs: A Visualization Platform to Create Open Outputs

The paper illustrates the features of the innovative tool in the field of data visualization: it is the framework RAW Graphs, available in an open access format at the website https://rawgraphs.io/. The framework permits to establish a connection between data coming from various applications (from Microsoft Excel to Google Spreadsheets) and their visualization in several layouts.

As detailed in the video guide available in the ‘Learning section’ (https://rawgraphs.io/learning), it is possible to load own data through a simple ‘copy and past’ command, and then select a chart-based layout among those provided: contour plot, beeswarm plot, hexagonal binnings, scatterplot, treemap, bump chart, Gantt chart, multiple pie charts, alluvial diagram and barchart. The platform permits also to unstack data according to a wide and a narrow format.

RAWGraphs, ideal for those working in the field of design but not only, is kept as an open-source resource thanks to an Indiegogo crowdfunding campaign (https://rawgraphs.io/blog).
[click ‘Read’ for more]

Mining ethnicity: Discourse-driven topic modelling of immigrant discourses in the USA, 1898–1920

Mining ethnicity: Discourse-driven topic modelling of immigrant discourses in the USA, 1898–1920

Introduction: The article illustrates the application of a ‘discourse-driven topic modeling’ (DDTM) to the analysis of the corpus ChronicItaly comprising several newspapers in Italian language, appeared in the USA during the time of massive migration towards America between the end of the XIX century and the first two decades of the XX (1898-1920).

The method combines both Text Modelling (™) and the discourse-historical approach (DHA) in order to get a more comprehensive representation of the ethnocultural and linguistic identity of the Italian group of migrants in the historical American context in crucial periods of time like that immediately preceding the eruption and that of the unfolding of World War I.