Closing the Gap in Non-Latin-Script Data: A tool for building and navigating collections of DH research projects

OpenMethods introduction to: Closing the Gap in Non-Latin-Script Data: A tool for building and navigating collections of DH research projects
https://openmethods.dariah.eu/2023/08/09/closing-the-gap-in-non-latin-script-data-a-tool-for-building-and-navigating-collections-of-dh-research-projects/
2023-08-09 09:54:48
Ulrike Wuttke
Blog post, Crowdsourcing, Discovering, English, Spatial Analysis, minimal computing, non-Latin script

Introduction by OpenMethods guest editors (DH2023, Graz) Jacob Hart, Till Grallert, Jose Hernandez

The Closing the Gap in Non-Latin-Script Data project aims to map the field of digital humanities projects outside and beyond the anglosphere, with a particular focus on non-Latin scripts such as Arabic or Chinese, in both machine-actionable and human-readable form. The urgency and value of such a survey have been highlighted in recent discussions around global, decolonial, and multilingual digital humanities. The project itself relies on minimal computing principles: it gathers data as one JSON file per project and produces from these files a static website hosted on GitHub Pages. Beyond the project team's own data collection, anyone can submit data through either a basic form or GitHub issues and pull requests.
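The "one JSON file per project" layout can be illustrated with a minimal Python sketch. It is not the project's actual build code: the function names `load_projects` and `build_index`, and the idea of merging the records into a single index file for a static page, are assumptions made for illustration only.

```python
import json
from pathlib import Path


def load_projects(data_dir):
    """Read every *.json file in data_dir into a list of dicts, ordered by file name."""
    projects = []
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            projects.append(json.load(handle))
    return projects


def build_index(projects, out_file):
    """Write all records into one JSON file that a static page could load.

    ensure_ascii=False keeps Arabic, Chinese, etc. readable in the output
    instead of turning every character into a \\uXXXX escape.
    """
    text = json.dumps(projects, ensure_ascii=False, indent=2)
    Path(out_file).write_text(text, encoding="utf-8")
```

Because each project lives in its own small file, a contribution touches exactly one file, which keeps pull requests easy to review.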

The dataset includes information on project titles, aims, time span, disciplines, and, most importantly, project languages. The website provides multiple ways of accessing the data:

  • Text search allows the user to find a specific project, or to search projects by metadata, and to view a human-readable rendering of the underlying JSON data.
  • Faceted browsing allows users to select projects by language and keyword based on a custom tagging scheme.
  • A map gives the user a geographical overview of all of the projects in the database. This can be useful for assessing diversity and for identifying research hotspots in the field. Note that a project's location is where the research is carried out, not where the language under study is used.
  • A timeline offers another composite view of the projects in the database. It shows when projects ran in relation to one another and, notably, which projects are still active.
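
The faceted browsing and the timeline's "still active" view both amount to simple filters over the project records. The following Python sketch shows the idea under stated assumptions: the field names `languages`, `keywords`, and `time_span`, and the two placeholder records, are hypothetical and do not reflect the project's actual schema or data.

```python
# Placeholder records; field names are assumptions, not the real schema.
SAMPLE = [
    {"title": "Example A", "languages": ["Arabic"], "keywords": ["OCR"],
     "time_span": {"start": 2015, "end": 2019}},
    {"title": "Example B", "languages": ["Chinese", "Arabic"], "keywords": ["corpus"],
     "time_span": {"start": 2020, "end": None}},  # end=None: still running
]


def facet_filter(projects, language=None, keyword=None):
    """Keep projects matching every facet that was given (None means 'any')."""
    return [
        p for p in projects
        if (language is None or language in p.get("languages", []))
        and (keyword is None or keyword in p.get("keywords", []))
    ]


def active_projects(projects, year):
    """Keep projects whose time span covers the given year."""
    result = []
    for p in projects:
        span = p["time_span"]
        if span["start"] <= year and (span["end"] is None or span["end"] >= year):
            result.append(p)
    return result
```

For instance, `facet_filter(SAMPLE, language="Arabic")` keeps both placeholder records, while `active_projects(SAMPLE, 2021)` keeps only the open-ended one.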

This database is a result of community action and as such has all the strengths and drawbacks of community-dependent projects. Its strength lies in a very intuitive way of browsing the data, which gives users a quick overview of the state of the field. On the data level, however, the project depends on contributions from the community through GitHub issues and pull requests. This might not deter the tech-savvy from contributing but might prove a step too far for many humanists and the general public. For now, although entries can be created through the GUI, doing so only produces a JSON file that is downloaded to the user’s computer. This file must then be uploaded by opening a GitHub issue. A user who wishes to modify an entry must interact with the git repo (creating pull requests, logging issues, etc.). On the upside, the simple data structure prevents lock-in to a specific technology stack and eases transition to different infrastructures.
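
Since a contributor ends up holding a downloaded JSON file before opening an issue, a small local check can catch obvious problems first. The sketch below is an assumption-laden illustration, not part of the project's tooling: the function `check_entry` and the required field names (derived loosely from the metadata listed above: title, aims, time span, disciplines, languages) are hypothetical.

```python
import json

# Hypothetical field names; the project's real schema may differ.
REQUIRED_FIELDS = ("title", "aims", "time_span", "disciplines", "languages")


def check_entry(raw_json):
    """Return a list of problems found in one entry; an empty list means none."""
    try:
        entry = json.loads(raw_json)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg}"]
    if not isinstance(entry, dict):
        return ["top-level value must be a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
    # Languages are the key metadata, so an empty list is flagged as well.
    if "languages" in entry and not entry["languages"]:
        problems.append("languages must not be empty")
    return problems
```

A contributor could run `check_entry(open("my-project.json", encoding="utf-8").read())` on the downloaded file (the file name here is only an example) and fix any reported problems before submitting it.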

Even with these technological hurdles and concerns in mind, the current iteration of the website and the database itself perform an essential service for those in the digital humanities who work with non-Latin scripts. By raising awareness of current projects, the site lets more researchers engage not only with those projects' results but also with the unique challenges they face.

This system has great potential as a template for many other use cases, be it further collections of research projects or collections of other digital objects. The JSON data format is flexible enough to describe many other kinds of records. There is still work to be done to make the integration with the underlying git repo more user-friendly. Even so, Closing the Gap is a great resource for researchers and teams looking for a streamlined and simple solution for maintaining field information.

As we began gathering data on digital projects dealing with Arabic or similar languages, we thought about how to provide this data in a way that commits to OpenScience principles. So we chose a public Git repository as our main data store, offering the data as JSON in a way that should be as straightforward as possible. Everyone who is interested should be able to contribute without having to deal with too much of a technology stack.

Source: https://m-l-d-h.github.io/Closing-The-Gap-In-Non-Latin-Script-Data/about/
Screenshot from https://m-l-d-h.github.io/Closing-The-Gap-In-Non-Latin-Script-Data/map/

Original content: Closing the Gap Database interactive website; Capture of the site on Internet Archive (14.07.2022)

Website of parent project: Closing the Gap in Non-Latin Script Data • Berlin University Alliance; Capture of the site on Internet Archive
