Analysis

“Multilingual Research Projects: Non-Latin Script Challenges for Making Use of Standards, Authority Files, and Character Recognition”.

Posted on July 8, 2023; updated November 13, 2023
by Marinella Testori

All of us are accustomed to reading academic contributions written in the Latin alphabet, for which standard characters and formats already exist. But what about texts in languages that use different, ideograph-based writing systems (for example, Chinese and Japanese)? What recognition techniques and metadata are needed to represent them in a digital context?

Digital Humanities

Worthäufigkeiten als Quelle für die Geschichtswissenschaft? – Einblicke in die Digital Humanities [Word Frequencies as a Source for Historical Scholarship? Insights into the Digital Humanities]

Posted on January 7, 2021; updated January 8, 2021
by Christopher Nunn

Introduction: Especially for humanities scholars (not only historians) who have not yet had any contact with the Digital Humanities, Silke Schwandt offers a motivating and vivid introduction to the potential of this approach, using the analysis of word frequencies as an example. With the help of Voyant Tools and Nopaque, she gives her listeners the tools they need to work quantitatively with their corpora. Schwandt’s presentation, to which the following report by Maschka Kunz, Isabella Stucky and Anna Ruh refers, can also be viewed at https://www.youtube.com/watch?v=tJvbC3b1yPc.
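As a rough illustration of what counting word frequencies involves, here is a minimal Python sketch. It does not show how Voyant Tools or Nopaque work internally; the function name `word_frequencies`, the naive tokenization, and the sample sentence are assumptions made up for this example.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=5):
    """Return the top_n most common lowercase word tokens in text."""
    # \w+ matches runs of Unicode letters, digits and underscores,
    # so German umlauts stay inside their words.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens).most_common(top_n)


# Hypothetical sample sentence, not taken from any real corpus.
sample = "Die Quelle nennt die Stadt, und die Stadt nennt die Quelle."
print(word_frequencies(sample, top_n=3))
```

A real analysis would normally add steps this sketch leaves out, such as removing stop words (here the article "die" dominates the counts) and normalizing historical spelling variants.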

Content Analysis

When history meets technology. impresso: an innovative corpus-oriented perspective.

Posted on July 15, 2020; updated July 17, 2020
by Marinella Testori

Historical newspapers, already available in many digitized collections, can be a significant source of information for reconstructing events and their background, enabling historians to cast new light on facts and phenomena and to advance new interpretations. […] Lausanne, University of Zurich and C2DH Luxembourg, the ‘impresso – Media Monitoring of the Past’ project aims to offer an advanced corpus-oriented answer to the growing need to access and consult collections of digitized historical newspapers.
[…] Thanks to a suite of computational tools for data extraction, linking and exploration, impresso aims to go beyond the traditional keyword-based approach by applying advanced techniques, from lexical processing to semantically deepened n-grams, from data modelling to interoperability.

German

Modernes Tool für alte Texte [A Modern Tool for Old Texts]

Posted on June 12, 2019; updated June 17, 2019
by Stefan Karcher

Introduction: Computer scientists and humanities scholars at the University of Würzburg have jointly developed a promising new OCR tool to simplify text recognition in historical prints. “OCR4all” is freely available and works very reliably. The article describes its development and functions and links to a well-documented GitHub repository where you can test the tool yourself.

OpenMethods

HIGHLIGHTING DIGITAL HUMANITIES METHODS AND TOOLS

Tag: OCR
