Getting started with OpenRefine – Digital Humanities 201

OpenMethods introduction to: Getting started with OpenRefine – Digital Humanities 201

Published: 2022-04-27 13:26:00

URL: https://openmethods.dariah.eu/2022/04/27/getting-started-with-openrefine-digital-humanities-201/
Introduced by: Christopher Nunn

Source URL: http://miriamposner.com/classes/dh201w19/tutorials-guides/data-cleaning-and-manipulation/getting-started-with-openrefine/

Source type: Blog post

Keywords: Atlantic City; British Library; Create Project; Data management software; dropdown menu; Extract, transform, load tools; free software; Google Refine; Packt Publishing; popup window; Regular Expressions; via bookmarklet; web browser

Introduction by OpenMethods Editor (Christopher Nunn):

OpenRefine, the freely accessible successor to Google Refine, is an ideal tool for cleaning up data series and thus obtaining more sustainable results. Entries can be listed alphabetically or sorted by frequency, so that typing errors and slightly different variants of the same value are easy to find and adjust. For example, with the help of the software, I discovered two such discrepancies in my Augustinian Correspondence Database, which I am now able to correct with one click in the programme. It showed me that I had entered “As a reference to Jerome’s letter it’s not counted” five times and “As a reference to Jerome’s letter, it’s not counted” three times. Consequently, if I searched the database for either exact wording, I would miss the entries recorded under the other. A second discrepancy was between the entry “continuing reference (marked by Nam)” and the entry “continuing reference (marked by nam)”. Thanks to OpenRefine, such inconsistencies can be found and corrected systematically in the future.
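To make the idea concrete, here is a minimal Python sketch of how such variants can be grouped. It is not OpenRefine's own code: it only imitates the general approach of reducing each value to a normalised key and counting which spellings share that key. The function names `fingerprint` and `find_variant_clusters` are made up for this illustration, and because the text gives no counts for the "Nam"/"nam" pair, one occurrence of each is assumed.

```python
import string
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Simplified normalisation key (an illustration, not OpenRefine's exact method):
    trim, lowercase, drop ASCII punctuation, then sort the unique tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def find_variant_clusters(values):
    """Group distinct spellings that share a key; return only groups with 2+ variants."""
    counts = Counter(values)  # frequency of each exact spelling
    clusters = defaultdict(dict)
    for value, n in counts.items():
        clusters[fingerprint(value)][value] = n
    return [variants for variants in clusters.values() if len(variants) > 1]


# Sample data: the first two counts come from the text above;
# the single occurrences of the "Nam"/"nam" entries are assumed.
notes = (
    ["As a reference to Jerome's letter it's not counted"] * 5
    + ["As a reference to Jerome's letter, it's not counted"] * 3
    + ["continuing reference (marked by Nam)"]
    + ["continuing reference (marked by nam)"]
)

for cluster in find_variant_clusters(notes):
    # Most frequent spelling first, as when sorting by frequency.
    for value, n in sorted(cluster.items(), key=lambda item: -item[1]):
        print(f"{n:>3}  {value}")
    print()
```

Each printed group lists spellings that differ only in punctuation or capitalisation, with the most frequent one first. That is usually a sensible candidate for the form to keep when merging.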

The tutorial by Miriam Posner is a useful first introduction to the software. However, the first step of the installation is already out of date. While version 3.1 was still the latest when the tutorial was published, the current version is now 3.5.2. Under Windows, you can now choose between a version that requires Java and a version with embedded OpenJDK Java, which I found very pleasing.

If needed, there are links at the end of the tutorial to other introductions that go into more depth.

Source: Getting started with OpenRefine – Digital Humanities 201

Original date of publication: Winter 2019.

InternetArchive link: https://web.archive.org/web/20210921091626/http://miriamposner.com/classes/dh201w19/tutorials-guides/data-cleaning-and-manipulation/getting-started-with-openrefine/