If These Crawls Could Talk: Studying and Documenting Web Archives Provenance

OpenMethods introduction, published 2018-08-15: https://openmethods.dariah.eu/2018/08/15/if-these-crawls-could-talk-studying-and-documenting-web-archives-provenance/
Original article: https://tspace.library.utoronto.ca/handle/1807/82840
Type: Blog post. Language: English. Topics: Analysis, Archiving, Capture, Content Analysis, Gathering, Information Retrieval, Meta-Activities, Research Activities, Research Objects, Research Techniques, Storage, Web Crawling.
Introduction by OpenMethods Editor (Joris van Zundert): As Web archives become an increasingly important resource for (humanities) researchers, it is paramount to investigate and understand how such archives are built and to make the processes involved transparent. Emily Maemura, Nicholas Worby, Ian Milligan, and Christoph Becker compare three use cases and propose a framework for documenting Web archive provenance. (Pre-print of the article that appeared in the Journal of the Association for Information Science and Technology.)

By comparing how three different web archive collections were created and documented, we investigate how curatorial decisions interact with technical and external factors, and we identify their commonalities and differences.
The findings reveal the need to understand both the social and technical context that shapes those decisions and the ways in which these individual decisions interact.

Source: If These Crawls Could Talk: Studying and Documenting Web Archives Provenance

Authors: Emily Maemura, Nicholas Worby, Ian Milligan, and Christoph Becker

Drs. Joris J. van Zundert (1972) is a senior researcher and developer in humanities computing. He holds a research position in the department of literary studies at the Huygens Institute for the History of The Netherlands, a research institute of The Netherlands Royal Academy of Arts and Sciences (KNAW). His main interests as a researcher and developer are the possibilities of computational algorithms for the analysis of literary and historical texts, and the nature and properties of humanities information and data modeling. His current research focuses on the interaction between computer science and the humanities, and on the tensions between hermeneutics and ‘big data’ approaches.