Introduction: In this blog post, James Harry Morris introduces the method of web scraping. Step by step from the installation of the packages, readers are explained how they can extract relevant data from websites using only the Python programming language and convert it into a plain text file. Each step is presented transparently and comprehensibly, so that this article is a prime example of OpenMethods and gives readers the equipment they need to work with huge amounts of data that would no longer be possible manually.
Introduction: In this article, José Calvo Tello offers a methodological guide on data curation for creating literary corpus for quantitative analysis. This brief tutorial covers all stages of the curation and creation process and guides the reader towards practical cases from Hispanic literature. The author deals with every single step in the creation of a literary corpus for quantitative analysis: from digitization, metadata, automatic processes for cleaning and mining the texts, to licenses, publishing and achiving/long term preservation.
Introduction: Given in French by Mathieu Jacomy – also known for his work on Gephi, this seminar presentation gives a substantial introduction to Hyphe, an open-source web crawler designed by a team of the Sciences Po Medialab in Paris. Specifically devised for the researchers’ use, Hyphe helps collecting and curating a corpus of web pages, through an easy to handle interface.
Introduction: This article assesses the issue of personalisation in internet research, raising important issues of how should we interpret users’ choices and how to account for the potential platform-design influence in your research workflow.
Introduction: With Web archives becoming an increasingly more important resource for (humanities) researchers, it also becomes paramount to investigate and understand the ways in which such archives are being built and how to make the processes involved transparent. Emily Maemura, Nicholas Worby, Ian Milligan, and Christoph Becker report on the comparison of three use cases and suggest a framework to document Web archive provenance.
Introduction: The rperseus package provides classicists and other people interested in ancient philology and exegesis with corpora of texts from the ancient world (based on the Perseus Digital Library), combined with a toolkit designed to compare passages and selected words with parallels where the same expressions or words occur.
Introduction: This article explains the concept, the uses and the procedural steps of text mining. It further provides information regarding available teaching courses and encourages readers to use the OpenMinTeD platform for the purpose.
Introduction: The article discusses how letters are being used across the disciplines, identifying similarities and differences in transcription, digitisation and annotation practices. It is based on a workshop held after the end of the project Digitising experiences of migration: the development of interconnected letters collections (DEM). The aims were to examine issues and challenges surrounding digitisation, build capacity relating to correspondence mark-up, and initiate the process of interconnecting resources to encourage cross-disciplinary research. Subsequent to the DEM project, TEI templates were developed for capturing information within and about migrant correspondence, and visualisation tools were trialled with metadata from a sample of letter collections. Additionally, as a demonstration of how the project’s outputs could be repurposed and expanded, the correspondence metadata that was collected for DEM was added to a more general correspondence project, Visual Correspondence.
Introduction: How do we improve the quality of the fledgling practice of Web archeology, so much needed now that a first decade of Web information threatens to disappear as current interest wanes but contemporaneous cultural value is undisputed. A National Library of the Netherlands scientific report investigates.
Introduction: In 2013 due to the phenomenal success of Facebook, the until then unrivalled social media hub Hyves went off line, now it needs to be archived.