Introduction: In this article, José Calvo Tello offers a methodological guide to data curation for creating a literary corpus for quantitative analysis. This brief tutorial covers all stages of the curation and creation process and points the reader towards practical cases from Hispanic literature. The author deals with every step in creating such a corpus: from digitization, metadata, and automatic processes for cleaning and mining the texts, to licenses, publishing, and archiving/long-term preservation.
Category: Cleanup
Data cleanup involves improving the quality of an existing digital object. This can include correcting errors in a written text or in OCR results, debugging code, or improving the quality of video, audio, or image files.
Introduction: Given in French by Mathieu Jacomy (also known for his work on Gephi), this seminar presentation gives a substantial introduction to Hyphe, an open-source web crawler designed by a team at the Sciences Po Medialab in Paris. Devised specifically for researchers, Hyphe helps them collect and curate a corpus of web pages through an easy-to-use interface.
Introduction: Apart from its buoyant conclusion that authorship attribution methods are rather robust to noise (transcription errors) introduced by optical character recognition and handwritten text recognition, this article also offers a comprehensive read on the application of sophisticated computational techniques for testing and validation in a data curation process.
Introduction: Ecologists are much aided by historical sources of information on human-animal interaction. But how does one cope with the plethora of different descriptions for the same animal in the historical record? A Dutch research group reports on how to aggregate ‘Bunzings’, ‘Ullingen’, and ‘Eierdieven’ (‘Egg-thieves’) into a useful historical ecology knowledge base.
Introduction: This post analyses text-image sequence alignment and the quality of manuscript transcriptions.
Introduction: This post presents stereotypes about research methods in Egyptology, along with current and new projects and tools in the field.
Introduction: This post highlights how illuminated manuscripts were analysed in art history before and after the arrival of digital methods and tools.
Introduction: This post presents an anthropology project on digital humanities methods and research.
Introduction: This post highlights a new multipurpose platform in epigraphy.
Introduction: This post outlines Giorgio Caviglia’s work on the interaction between visualization tools and the humanities, and its consequences for the research process and its results.