Document ALL the things! | The Center for Digital Humanities at Princeton

Introduction: Sustainability questions, such as how to maintain digital project outputs after the funding period or how to keep aging code and infrastructure up to date, are among the major challenges DH projects face today. This post gives us a sneak peek into the solutions and working practices of the Center for Digital Humanities at Princeton. In their approach to building capacity for sustaining DH projects and preserving access to data and software, they view projects as collaborative, process-based scholarship. Their focus is therefore on implementing project management workflows and documentation tools that can be applied flexibly to projects of different scopes and sizes and refined further where needed. By sharing these resources together with their real-life use cases in DH projects, they aim to benefit other scholarly communities and to sustain a broader conversation about these tricky issues.

Exploring the internet with Hyphe

Introduction: Delivered in French by Mathieu Jacomy, also known for his work on Gephi, this seminar presentation gives a substantial introduction to Hyphe, an open-source web crawler designed by a team at the Sciences Po Medialab in Paris. Devised specifically for researchers, Hyphe helps to collect and curate a corpus of web pages through an easy-to-handle interface.

Analyzing Documents with TF-IDF | Programming Historian

Introduction: The indispensable Programming Historian comes with an introduction to Term Frequency – Inverse Document Frequency (tf-idf) by Matthew J. Lavin. The procedure, which weighs how specific a term is to a particular document, has its origins in information retrieval, but it can also be applied as an exploratory tool, as a measure of textual similarity, or as a pre-processing step for machine learning. It is therefore useful not only for textual scholars but also for historians working with large collections of text.
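
To give a flavour of the technique, here is a minimal sketch (not Lavin's own code; the toy documents are invented for illustration) using scikit-learn's TfidfVectorizer:

```python
# A minimal tf-idf sketch with scikit-learn; the documents are invented
# placeholders, not material from Lavin's lesson.
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "the trial began in the winter",
    "the winter was long and cold",
    "the verdict ended the trial",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(documents)  # rows: documents, columns: terms

# Show the highest-weighted term per document: terms frequent in one
# document but rare across the corpus score highest.
terms = vectorizer.get_feature_names_out()
for i, row in enumerate(tfidf.toarray()):
    top = row.argmax()
    print(f"doc {i}: '{terms[top]}' (tf-idf = {row[top]:.3f})")
```

The output illustrates the core idea: common words like "the" are down-weighted, while words distinctive of a single document rise to the top.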

Evaluating named entity recognition tools for extracting social networks from novels

Introduction: Named Entity Recognition (NER) is used to identify textual elements that name things, such as persons, places, or organisations. In this study, four different NER tools are evaluated on a corpus of modern and classic fantasy and science fiction novels. Since NER tools have typically been created for the news domain, it is interesting to see how they perform in a totally different one. The article comes with a very detailed methodological part, and the accompanying dataset is also made available.
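
As a quick illustration of what NER output looks like, here is a sketch using spaCy as a generic stand-in (it is not necessarily one of the four tools evaluated in the study, and the sentence is invented):

```python
# A generic NER sketch with spaCy; PERSON entities are the raw material
# for building character networks from novels.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

text = "Frodo left the Shire with Sam and travelled towards Mordor."
doc = nlp(text)

# Print each recognised entity with its predicted label.
for ent in doc.ents:
    print(ent.text, ent.label_)
```

Invented names like these are precisely where news-trained models tend to struggle, which is what motivates an evaluation on fiction.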

The Uncanny Valley and the Ghost in the Machine

Introduction: There is a postulated level of anthropomorphism at which people feel uncanny about the appearance of a robot. But what happens when digital facsimiles and online editions become nigh indistinguishable from the real thing while remaining so vastly different in material terms? How do we ethically provide access to the digital object without creating a blind spot and neglect for the real thing? These are questions that keep digital librarian Dot Porter awake and which she ponders in this thoughtful contribution.

From Hermeneutics to Data to Networks: Data Extraction and Network Visualization of Historical Sources

Introduction: This lesson by Marten Düring on the Programming Historian website gently introduces novices to the topic of network visualisation of historical sources. Using a case study, it covers not only the general advantages of network visualisation for humanists but also a step-by-step explanation of the process, from the extraction of the data to the visualisation (using the Palladio tool). The lesson has also been translated into Spanish and includes many useful references for further reading.
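
The extraction-to-network step can also be done in code. The lesson itself uses the browser-based Palladio tool; the sketch below uses networkx instead, with invented example records of actors co-occurring in a source:

```python
# A minimal extraction-to-network sketch with networkx (Palladio stands
# outside code; this is a code-based equivalent with invented data).
import networkx as nx

# Example records: pairs of historical actors co-occurring in a source.
relations = [
    ("Alice", "Bob"),
    ("Alice", "Carol"),
    ("Bob", "Carol"),
    ("Carol", "Dan"),
]

G = nx.Graph()
G.add_edges_from(relations)

# Simple measures that often accompany a first visualisation.
print("nodes:", G.number_of_nodes(), "edges:", G.number_of_edges())
print("degree centrality:", nx.degree_centrality(G))
```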

If These Crawls Could Talk: Studying and Documenting Web Archives Provenance

Introduction: With web archives becoming an ever more important resource for (humanities) researchers, it also becomes paramount to investigate and understand the ways in which such archives are built and to make the processes involved transparent. Emily Maemura, Nicholas Worby, Ian Milligan, and Christoph Becker report on a comparison of three use cases and suggest a framework for documenting web archive provenance.
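
One low-level trace of crawl provenance lives in the WARC files themselves. As a small sketch (the filename is a placeholder, and this is only one of the many provenance sources the article discusses), the warcio library can surface when and from where each record was captured:

```python
# Inspect basic capture metadata in a WARC file with warcio.
from warcio.archiveiterator import ArchiveIterator

with open("example-crawl.warc.gz", "rb") as stream:  # placeholder filename
    for record in ArchiveIterator(stream):
        if record.rec_type == "response":
            # Each response record carries the captured URL and timestamp.
            print(record.rec_headers.get_header("WARC-Target-URI"),
                  record.rec_headers.get_header("WARC-Date"))
```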

Transkribus & Magazines: Transkribus’ Transcription & Recognition Platform (TRP) as Social Machine…

Transkribus & Magazines: Transkribus’ Transcription & Recognition Platform (TRP) as Social Machine…

Introduction: This article proposes establishing a collaboration between FactMiners and the Transkribus project, one that would help the Transkribus team to evolve the "sustainable virtuous" ecosystem they describe as a Transcription & Recognition Platform: a Social Machine for Job Creation & Skill Development in the 21st Century.

Attributing Authorship in the Noisy Digitized Correspondence of Jacob and Wilhelm Grimm | Digital Humanities

Introduction: Apart from its encouraging conclusion that authorship attribution methods are rather robust to the noise (transcription errors) introduced by optical character recognition and handwritten text recognition, this article also offers a comprehensive read on the application of sophisticated computational techniques for testing and validation in a data curation process.
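
To make the setting concrete, here is a generic authorship-attribution sketch (not the article's exact pipeline): character n-gram frequencies with a linear classifier, a common stylometric setup that tends to tolerate transcription noise because it does not depend on any single word being spelled correctly. The texts and author labels are invented placeholders:

```python
# A generic stylometric attribution sketch: character n-grams plus a
# linear classifier. Training texts and labels are invented placeholders,
# not data from the Grimm correspondence study.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "Es war einmal ein armer Mann, der hatte sieben Soehne.",
    "Die Grammatik der deutschen Sprache ist reich an Formen.",
]
train_labels = ["Wilhelm", "Jacob"]  # hypothetical labels for illustration

model = make_pipeline(
    # Character 2- to 4-grams: sub-word features degrade gracefully
    # when OCR/HTR corrupts individual characters.
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    LogisticRegression(),
)
model.fit(train_texts, train_labels)

print(model.predict(["Es war einmal eine Koenigstochter."]))
```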