All of us are accustomed to reading academic contributions written in the Latin alphabet, for which standard characters and formats already exist. But what about texts in languages with different writing systems, such as the logographic scripts used for Chinese and Japanese? What kinds of recognition techniques and metadata are needed to represent them in a digital context?
Tag: OCR
Introduction: Especially for humanities scholars (not only historians) who have not yet had any contact with the Digital Humanities, Silke Schwandt offers a motivating and vivid introduction to the potential of this approach, using the analysis of word frequencies as an example. With the help of Voyant Tools and Nopaque, she gives her listeners the tools they need to work quantitatively with their corpora. Schwandt’s presentation, to which the following report by Maschka Kunz, Isabella Stucky and Anna Ruh refers, can also be viewed at https://www.youtube.com/watch?v=tJvbC3b1yPc.
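To make the idea of word-frequency analysis concrete, here is a minimal sketch in Python. It is not how Voyant Tools or Nopaque work internally. The function name `word_frequencies`, the sample sentence and the tiny stopword list are assumptions made purely for illustration.

```python
import re
from collections import Counter


def word_frequencies(text, stopwords=frozenset()):
    """Count how often each lowercased word occurs, skipping stopwords."""
    # \w+ matches runs of letters, digits and underscores (Unicode-aware),
    # so punctuation is dropped during tokenization.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)


# Hypothetical sample text and stopword list, for illustration only.
sample = "The press reports the news, and the news shapes the press."
freqs = word_frequencies(sample, stopwords={"the", "and"})

# List the two most frequent remaining words with their counts.
print(freqs.most_common(2))
```

On a real corpus, one would typically read the text from files and use a fuller, language-specific stopword list. The counting step itself stays the same.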
Historical newspapers, already available in many digitized collections, can be a significant source of information for reconstructing events and their backgrounds, enabling historians to cast new light on facts and phenomena and to advance new interpretations. A collaboration involving partners in Lausanne, the University of Zurich and C2DH Luxembourg, the ‘impresso – Media Monitoring of the Past’ project aims to offer an advanced corpus-oriented answer to the growing need to access and consult collections of digitized historical newspapers.
[…] Thanks to a suite of computational tools for data extraction, linking and exploration, impresso aims to move beyond the traditional keyword-based approach by applying advanced techniques, from lexical processing to semantically enriched n-grams, from data modelling to interoperability.
Introduction: Computer scientists and humanists at the University of Würzburg have jointly developed a new and promising OCR tool to simplify text recognition in historical prints. “OCR4all” is freely available and works very reliably. The article describes its development and functions, and links to a well-documented GitHub repository where you can test the tool for yourself.