FactGrid is both a database as well as a wiki. This project operated by the Gotha Research Centre and the data lab of the University of Erfurt. It utilizes MediaWiki and a Wikidata’s “wikibase” extension to collect data from historic research. With FactGrid you can create a knowledge graph, giving information in triple statements. This knowledge graph can be asked with SPARQL. All data provided by FactGrid holds a CC0-license.
Category: Annotating
Annotating refers to the activity of making information about a digital object explicit by adding, e.g., comments, metadata or keywords to a digitized representation or to an annotation file associated with it. This can be in the form of annotations that comment on or contextualize a passage (explanatory annotations) in order to make structural or linguistic information explicit (structural/linguistic annotation), as linked open data making the relationships between objects machine-readable, or, in the case of general metadata, adding information about the object as a whole. Encoding is a technique associated with annotating, as are POS-Tagging, Tree-Tagging, and Georeferencing.
Every scholar in digital humanities and/or social sciences has probably already faced the challenge posed by consulting large digital newspaper archives in order to extract detailed information about a topic. It is beyond any doubt that computational-oriented methods and tools currently available may provide a great contribution; however, applying such methods and tools could pose several difficulties, especially in dealing with large ensembles of items.
Mediate is a collaborative time-based media annotation tool for the web that can be used both individually and collaboratively for synchronous and asynchronous digital annotation. One of its highlighting features is accessibility and customization, i.e. the ability to customize the schema that forms the basis of the analysis or the purpose of the project.
The Chinese Text Project is a well-established resource in Sinology, providing open access to a large number of ancient Chinese texts. As a digital medium, it utilizes crowdsourcing, linked data, knowledge graph and other computational technologies to provide an interactive interface for users who are interested in ancient Chinese texts. Beyond its main aim of providing open access to Chinese literature and philosophy texts, the project features an integrated Chinese character dictionary tool, images of scanned source texts, a search function for parallel passages, and much more. In terms of structured data, the project’s data wiki contains a wealth of records on entities such as persons, locations, and works.
Everyone of us is accustomed to reading academic contributions using the Latin alphabet, for which we have already standard characters and formats. But what about texts written in languages featuring different, ideographic-based alphabets (for example, Chinese and Japanese)? What kind of recognition techniques and metadata are necessary to adopt in order to represent them in a digital context?
Following our last post focusing on Critical Discourse Analysis, today we highlight an automated document enrichment pipeline for automated interview coding, proposed by Ajda Pretnar Žagar, Nikola Ðukic´, Rajko Muršic in their paper presented at the Conference on Language Technologies & Digital Humanities, Ljubljana 2022. As described in the “Essential Guide to Coding Qualitative Data” (https://delvetool.com/guide), one of the main field of application of such a procedure is Ethnography, but not only.
Thanks to qualitative data coding it is possible to enrich texts through adding labels and descriptions to specific passages, that are generally pinpointed by means of computer-assisted qualitative data analysis softwares (CAQDAS). This can be valid for several fields of applications, from the humanities to biology, from sociology to medicine.
In their paper, Pretnar Žagar, Ðukic´ and Muršicˇ illustrate how relying on a couple of taxonomies (or onthologies) already known in anthropological studies may represent an asset to automatize and hasten the process of data labelling. These taxonomies are the Outline of Cultural Materials (OCM) and the ETSEO (acronym for Ethnological Topography of Slovenian Ethnic Territory) systematics. In both cases we deal with taxonomies elaborated and applied in ethnographic research in order to organize and better analyze concepts and categories related to human cultures and traditions.
[Click ‘Read more’ for the full post!]
In this video, Drs. Stephanie Vie and Jennifer deWinter explain some of the tools digital humanists can use for critical discourse analysis and visualization of data collected from social media platforms. Although not all the tools they mention are open source, the majority of them have free to use or freemium versions, including AntConc, a free-to-use concordancing tool, or several Twitter data visualisation tools such as Tweeps map or Tweetstats.
Even though the video does not provide just-as-good open source alternatives to Atlas.ti or MAXQDA (an obviously a recurrent question or shortcoming that is recurrently discussed on OpenMethods), it sets an excellent example for how to introduce tool criticism in the classroom alongside introduction to certain Digital Humanities Tools. After briefly touching upon both advantages and disadvantages of each tool, they encourage their audience (students in Digital Humanities study programs) to pilot each of them by using the same data-set and not only compare their results but also reflect on the epistemic processes in-between.
Sharing the video on Humanities Commons with stable archiving, DOI and rich metadata is among the best things that could happen to teaching resources of all kinds.
Introduction: This short video teaser summarizes the main characteristics of PixPlot, a Python-based tool for clustering images and analyzing them from a numerical perspective as well as its pedagogical relevance as far as
machine learning is concerned.
The paper “Visual Patterns Discovery in Large Databases of Paintings”, presented at the Digital Humanities 2016 Conference held in Poland,
can be considered the foundational text for the development of the PixPlot Project at Yale University.
[Click ‘Read more’ for the full post!]
In this post, we reach back in time to showcase an older project and highlight its impact on data visualization in Digital Humanities as well as its good practices to make different layers of scholarship available for increased transparency and reusability.
Developed at Stanford with other research partners (‘Cultures of Knowledge’ at Oxford, the Groupe d’Alembert at CNRS, the KKCC-Circulation of Knowledge and Learned Practices in the 17th-century Dutch Republic, the DensityDesign ResearchLab), the ‘Mapping of the Republic of Letters Project’ aimed at digitizing and visualizing the intellectual community throughout the XVI and XVIII centuries known as ‘Republic of Letters’ (an overview of the concept can be found in Bots and Waquet, 1997), to get a better sense of the shape, size and associated intellectual network, its inherent complexities and boundaries.
Below we highlight the different, interrelated
layers of making project outputs available and reusable on the long term (way before FAIR data became a widespread policy imperative!): methodological reflections, interactive visualizations, the associated data and its data model schema. All of these layers are published in a trusted repository and are interlinked with each other via their Persistent Identifiers.
[Click ‘Read more’ for the full post!]
Introduction: This post introduces two tools developed by the Max Planck Institute for the History of Science, LoGaRT and RISE with a focus on Asia and Eurasia. […]The concept of LoGaRT – treating local gazetteers as “databases” by themselves – is an innovative and pertinent way to articulate the essence of the platform: providing opportunities for multi-level analysis from the close reading of the sources (using, for example, the carousel mode) to the large-scale, “bird’s eye view” of the materials across geographical and temporal boundaries. Local gazetteers are predominantly textual sources – this characteristic of the collection is reflected in the capabilities of LoGaRT as well, since some of its key capabilities include data search (using Chinese characters), collection and analysis, as well as tagging and dataset comparison. That said, LoGaRT also offers integrated visualization tools and supports the expansion of the collection and tagging features to the images used in a number of gazetteers. The opportunity to smoothly intertwine these visual and textual collections with Chinese historical maps (see CHMap) is an added, and much welcome, advantage of the tool, which helps to develop sophisticated and multifaceted analyses.
[Click ‘Read more’ for the full post!]