Humanities Data Analysis: Case Studies with Python — Humanities Data Analysis: Case Studies with Python

Humanities Data Analysis: Case Studies with Python — Humanities Data Analysis: Case Studies with Python

Introduction: Folgert Karsdorp, Mike Kestemont and Allen Riddell ‘s  interactive book, Humanities Data Analysis: Case Studies with Python had been written with the aim in mind to equip humanities students and scholars working with textual and tabular resources with practical, hands-on knowledge to better understand the potentials of data-rich, computer-assisted approaches that the Python framework offers to them and eventually to apply and integrate them to their own research projects.

The first part introduces a “Data carpentry”, a collection of essential techniques for gathering, cleaning, representing, and transforming textual and tabular data. This sets the stage for the second part that consists of 5 case studies (Statistics Essentials: WhoReads Novels? ; Introduction to Probability ; Narrating with Maps ; Stylometry and the Voice of Hildegard ; A Topic Model of United States Supreme Court Opinions, 1900–2000 ) showcasing how to draw meaningful insights from data using quantitative methods. Each chapter contains executable Python codes and ends with exercises ranging from easier drills to more creative and complex possibilities to adapt the apply and adopt the newly acquired knowledge to their own research problems.

The book exhibits best practices in how to make digital scholarship available in an open, sustainable ad digital-native manner, coming in different layers that are firmly interlinked with each other. Published with Princeton University Press in 2021, hardcopies are also available, but more importantly, the digital version is an  Open Access Jupyter notebook that can be read in multiple environments and formats (.md and .pdf). The documentation, coda and data materials are available on Zenodo (https://zenodo.org/record/3560761#.Y3tCcn3MJD9). The authors also made sure to select and use packages which are mature and actively maintained.

What is PixPlot? (DH Tools) – YouTube

What is PixPlot? (DH Tools) – YouTube

Introduction: This short video teaser summarizes the main characteristics of PixPlot, a Python-based tool for clustering images and analyzing them from a numerical perspective as well as its pedagogical relevance as far as
machine learning is concerned.

The paper “Visual Patterns Discovery in Large Databases of Paintings”, presented at the Digital Humanities 2016 Conference held in Poland,
can be considered the foundational text for the development of the PixPlot Project at Yale University.
[Click ‘Read more’ for the full post!]

LoGaRT and RISE: Two multilingual tools from the Max Planck Institute for the History of Science

LoGaRT and RISE: Two multilingual tools from the Max Planck Institute for the History of Science

Introduction: This post introduces two tools developed by the Max Planck Institute for the History of Science, LoGaRT and RISE with a focus on Asia and Eurasia. […]The concept of LoGaRT – treating local gazetteers as “databases” by themselves – is an innovative and pertinent way to articulate the essence of the platform: providing opportunities for multi-level analysis from the close reading of the sources (using, for example, the carousel mode) to the large-scale, “bird’s eye view” of the materials across geographical and temporal boundaries. Local gazetteers are predominantly textual sources – this characteristic of the collection is reflected in the capabilities of LoGaRT as well, since some of its key capabilities include data search (using Chinese characters), collection and analysis, as well as tagging and dataset comparison. That said, LoGaRT also offers integrated visualization tools and supports the expansion of the collection and tagging features to the images used in a number of gazetteers. The opportunity to smoothly intertwine these visual and textual collections with Chinese historical maps (see CHMap) is an added, and much welcome, advantage of the tool, which helps to develop sophisticated and multifaceted analyses.
[Click ‘Read more’ for the full post!]

TAO IC Project: the charm of Chinese ceramics.

TAO IC Project: the charm of Chinese ceramics.

Introduction: Among the Nominees in the ‘Best DH Dataset’ of the DH Awards 2020, the TAO IC Project (http://www.dh.ketrc.com/index.html) leads us in a fascinating journey through the world of Chinese ceramics. The project, which is developed in a collaborative way at the Knowledge Engineering & Terminology Research Center of Liaocheng (http://ketrc.com/), exploits an onto-terminology-based approach to build an e-dictionary of Chinese vessels. Do you want to know every detail about a ‘Double-gourd Vase I’? If you consult ‘Class’ in the ‘Ontology’ section (http://www.dh.ketrc.com/class.html), you can discover the component, the function, from what such a vessel is made of, and what is the method to fire it. If you also wish to see how the vase appears, under ‘Individuals’ of the same section you can read a full description of it and, also, see a picture (http://www.dh.ketrc.com/class.html). All this information is collected in the e-dictionary for each beautiful item belonging to the Ming and Qing dynasties.

[Click ‘Read more’ for the full post!]

What Counts as Culture? Part I: Sentiment Analysis of The Times Music Reviews, 1950-2009 – train in the distance

What Counts as Culture? Part I: Sentiment Analysis of The Times Music Reviews, 1950-2009 – train in the distance

Introduction: This blog post by Lucy Havens presents a sentiment analysis of over 2000 Times Music Reviews using freely available tools: defoe for building the corpus of reviews, VADER for sentiment analysis and Jupiter Notebooks to provide a rich documentation and to connect the different components of the analysis. The description of the workflow comes with tool and method criticism reflections, including an outlook how to improve and continue to get better and more results.

OpenMethods Spotlights #3 Keeping a smart diary of research processes with NeMO and the Scholarly Ontology

OpenMethods Spotlights #3 Keeping a smart diary of research processes with NeMO and the Scholarly Ontology

In the next episode, we are looking behind the scenes of two ontologies: NeMO and the Scholarly Ontology (SO) with Panos Constantopoulos and Vayianos Pertsas who tell us the story behind these ontologies and explain how they can be used to ease or upcycle your daily works as a researcher. We discuss the value of knowledge graphs, how NeMO and SO connect with the emerging DH ontology landscape and beyond, why Open Access is a precondition of populating them, the Greek DH landscape …and many more!

Novels in distant reading: the European Literary Text Collection (ELTeC).

Novels in distant reading: the European Literary Text Collection (ELTeC).

Introduction: Among the most recent, currently ongoing, projects exploiting distant techniques reading there is the European Literary Text Collection (ELTeC), which is one of the main elements of the Distant Reading for European Literary History (COST Action CA16204, https://www.distant-reading.net/). Thanks to the contribution provided by four Working Groups (respectively dealing with Scholarly Resources, Methods and Tools, Literary Theory and History, and Dissemination: https://www.distant-reading.net/working-groups/ ), the project aims at providing at least 2,500 novels written in ten European languages with a range of Distant Reading computational tools and methodological strategies to approach them from various perspectives (textual, stylistic, topical, et similia). A full description of the objectives of the Action and of ELTeC can be found and read in the Memorandum of Understanding for the implementation of the COST Action “Distant Reading for European Literary History” (DISTANT-READING) CA 16204”, available at the link  https://e-services.cost.eu/files/domain_files/CA/Action_CA16204/mou/CA16204-e.pdf

[Click ‘Read more’ for the full post!]

The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models

The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models

Introduction: NLP modelling and tasks performed by them are becoming an integral part of our daily realities (everyday or research). A central concern of NLP research is that for many of their users, these models still largely operate as black boxes with limited reflections on why the model makes certain predictions, how their usage is skewed towards certain content types, what are the underlying social, cultural biases etc. The open source Language Interoperability Tool aim to change this for the better and brings transparency to the visualization and understanding of NLP models. The pre-print describing the tool comes with rich documentation and description of the tool (including case studies of different kinds) and gives us an honest SWOT analysis of it.

Cultural Ontologies: the ArCo Knowledge Graph.

Cultural Ontologies: the ArCo Knowledge Graph.

Introduction: Standing for ‘Architecture of Knowledge’, ArCo is an open set of resources developed and managed by some Italian institutions, like the MiBAC (Minister for the Italian Cultural Heritage) and, within it, the ICCD – Institute of the General Catalogue and Documentation), and the CNR – Italian National Research Council. Through the application of eXtreme Design (XD), ArCO basically consists in an ontology network comprising seven modules (the arco, the core, the catalogue, the location, the denotative description, the context description, and the cultural event) and a set of LOD data comprising a huge amount of linked entities referring to the national Italian cultural resources, properties and events. Under constant refinement, ArCo represents an example of a “robust Semantic Web resource” (Carriero et al., 11) in the field of cultural heritage, along with other projects like, just to mention a couple of them, the Google Arts&Culture (https://artsandculture.google.com/) or the Smithsonian American Art Museum (https://americanart.si.edu/about/lod).

[Click ‘Read more’ for the full post!]