When history meets technology. impresso: an innovative corpus-oriented perspective.

Historical newspapers, already available in many digitized collections, can be a significant source of information for reconstructing events and their contexts, enabling historians to cast new light on facts and phenomena and to advance new interpretations. With partners in Lausanne, at the University of Zurich and at C2DH Luxembourg, the ‘impresso – Media Monitoring of the Past’ project aims to offer an advanced corpus-oriented answer to the growing need to access and consult collections of digitized historical newspapers.
[…] Thanks to a suite of computational tools for data extraction, linking and exploration, impresso aims to move beyond the traditional keyword-based approach by applying advanced techniques, from lexical processing to semantically deepened n-grams, and from data modelling to interoperability.

Content Analysis

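Navegación de corpus a través de anotaciones lingüísticas automáticas obtenidas por Procesamiento del Lenguaje Natural: de anecdótico a ecdótico [Corpus navigation through automatic linguistic annotations obtained by Natural Language Processing: from anecdotal to ecdotic]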

Posted on March 18, 2020
by Gimena Del Rio

Introduction: In this article, Spanish scholars Pablo Ruiz Fabo and Helena Bermúdez Sabel present two case studies on applying Natural Language Processing (NLP) technologies, entity linking, and Computational Linguistics methods to create corpus navigation interfaces. The authors also focus on how these technologies for automatic text analysis can enrich scholarly digital editions. They offer interesting perspectives on analogue and digital editions and their relation to ecdotic practice.

Analysis

Do humanists need BERT?

Posted on August 12, 2019; updated August 13, 2019
by Christopher Nunn

Introduction: Ted Underwood tests a new language representation model called “Bidirectional Encoder Representations from Transformers” (BERT) and asks whether humanists should use it. Because of its high degree of difficulty and its limited success (e.g. in questions of genre detection), he concludes that this approach will be important in the future but is not something humanists need to deal with at the moment. An important caveat worth reading.

Content Analysis

Not All Character N-grams Are Created Equal: A Study in Authorship Attribution – ACL Anthology

Posted on November 19, 2018; updated December 11, 2018
by Florian CAFIERO

Introduction: Studying character n-grams is now a classic choice in authorship attribution. While the optimal length of these n-grams has been discussed, we still have few clues about which specific types of n-grams are most helpful for efficiently identifying the author of a text. This paper partly fills that gap by showing that most of the information gained from studying character n-grams comes from affixes and punctuation.

OpenMethods

HIGHLIGHTING DIGITAL HUMANITIES METHODS AND TOOLS

Tag: Computational linguistics
