This short blog post by Laure Barbot, Frank Fischer, Yoan Moranville, and Ivan Pozdniakov from 2019 sheds some light on the old question which DH-tools are actually used in research and which are especially popular.
The StandforCore NLP wishes to represent a complete Java-based set of tools for various aspects of language analysis, from annotation to dependency parsing, from lemmatization
to coreference resolution. It thus provides a range of tools which
can be potentially applied to other languages apart from English.
Among the languages to which the StandfordCore NLP is mainly applied there is Italian, for which the Tint pipeline has been developed as described in the paper “Italy goes to Stanford: a collection of CoreNLP modules for Italian” by Alessio Palmero Apostolo and Giovanni Moretti.
On the Tint webpage the whole pipeline can be found and downloaded: it comprises tokenization and sentence splitting, morphological analysis and lemmatization, part-of-speech tagging, named-entity recognition and dependency parsing, including wrappers under construction. [Click ‘Read more’ for the whole post.]
Introduction: Ted Underwood tests a new language representation model called “Bidirectional Encoder Representations from Transformers” (BERT) and asks if humanists should use it. Due to its high degree of difficulty and its limited success (e.g. in questions of genre detection) he concludes, that this approach will be important in the future but it’s nothing to deal with for humanists at the moment. An important caveat worth reading.
Introduction: Digital humanists looking for tools in order to visualize and analyze texts can rely on ‘Voyant Tools’ (https://voyant-tools.org), a software package created by S.Sinclair and G.Rockwell. Online resources are available in order to learn how to use Voyant. In this post, we highlight two of them: “Using Voyant-Tools to Formulate Research Questions for Textual Data” by Filipa Calado (GC Digital Fellows and the tutorial “Investigating texts with Voyant” by Miriam Posner.
Introduction: The Research Software Directory of the Netherlands eScience Institute provides easy access to software, source code and its documentation. More importantly, it makes it easy to cite software, which is highly advisable when using software to derive research results. The Research Software Directory positions itself as a platform that eases scientific referencing and reproducibility of software based research—good peer praxis that is still underdeveloped in the humanities.
Introduction: Standards are best explained in real life use cases. The Parthenos Standardization Survival Kit is a collection of research use case scenarios illustrating best practices in Digital Humanities and Heritage research. It is designed to support researchers in selecting and using the appropriate standards for their particular disciplines and workflows. The latest addition to the SSK is a scenario for creating a born-digital dictionary in TEI.
Introduction: The explore! project tests computer stimulation and text mining on autobiographic texts as well as the reusability of the approach in literary studies. To facilitate the application of the proposed method in broader context and to new research questions, the text analysis is performed by means of scientific workflows that allow for the documentation, automation, and modularization of the processing steps. By enabling the reuse of proven workflows, the goal of the project is to enhance the efficiency of data analysis in similar projects and further advance collaboration between computer scientists and digital humanists.
Introduction: This article explains the concept, the uses and the procedural steps of text mining. It further provides information regarding available teaching courses and encourages readers to use the OpenMinTeD platform for the purpose.
Introduction: This article describes the possibilities offered by the ggplot2 package for network visualization. This R package enables the user to use a wide variety of graphic styles, and to include supplementary information regarding vertices and edges.
Introduction: This article introduces a novel way to unfold and discover patterns in complex texts, at the intersection between macro and micro analytics. This technique is called TIC (Transcendental Information Cascades) allows analysis of how a cast of characters is generated and managed dynamically over the duration of a text.