Introduction: GROBID is an already well-known open source tool in the field of Digital Humanities, originally built to extract and parse bibliographical metadata from scholarly works. The acronym stands for GeneRation Of BIbliographic Data.
Shaped by use cases and adaptations to a range of DH and non-DH settings, the tool has progressively evolved into a suite of technical features now applied in various fields, such as journals, dictionaries and archives.
Category: Research Process
Introduction: In this post, you can find a thoughtful and encouraging selection and description of reading, writing and organizing tools. It guides you through a whole discovery-management-writing-publishing workflow, from the creation of annotated bibliographies in Zotero, through a useful Markdown syntax cheat sheet, to versioning, storage and backup strategies, and shows how everybody’s research can profit from open digital methods even without sophisticated technological skills. What I particularly like in Tomislav Medak’s approach is that all these tools, practices and tricks are filtered through and tested against his own everyday scholarly routine. It would make perfect sense to create a visualization from this inventory in a similar fashion to these workflows.
The Stanford CoreNLP suite aims to offer a complete Java-based set of tools for various aspects of language analysis, from annotation to dependency parsing, from lemmatization to coreference resolution. It thus provides a range of tools which can potentially be applied to languages other than English.
Italian is one of the main languages to which Stanford CoreNLP is applied: the Tint pipeline has been developed for it, as described in the paper “Italy goes to Stanford: a collection of CoreNLP modules for Italian” by Alessio Palmero Aprosio and Giovanni Moretti.
On the Tint webpage the whole pipeline can be found and downloaded: it comprises tokenization and sentence splitting, morphological analysis and lemmatization, part-of-speech tagging, named-entity recognition and dependency parsing, including wrappers under construction.
Introduction: The explore! project tests computer simulation and text mining on autobiographic texts as well as the reusability of the approach in literary studies. To facilitate the application of the proposed method in broader contexts and to new research questions, the text analysis is performed by means of scientific workflows that allow for the documentation, automation, and modularization of the processing steps. By enabling the reuse of proven workflows, the project aims to enhance the efficiency of data analysis in similar projects and to further advance collaboration between computer scientists and digital humanists.
Introduction: This is a comprehensive account of a workshop on research data in the study of the past. It introduces a broad spectrum of aspects and questions related to the growing relevance of digital research data and methods for this discipline, and to the methodological and conceptual consequences involved, especially the need for a shared understanding of standards.
Introduction: This blog post not only presents a technique of measuring poetic meter and using it to plot distances between poets, but it also provides an insight into the theoretical and empirical process leading to those results.
Introduction: In the context of scholarship on medieval and early Tudor texts, this paper discusses the methodological use of the database not simply to store information, but to clarify points of tension between the questions asked and the information provided in order to find answers.
Introduction: Open Access has made an impact on the business strategies of major publishing companies, but the effects may turn out to be perverse. Pressed by Open Access to find new revenue models, publishing houses have moved to acquire ownership and dominance of academic data infrastructures. This article investigates Elsevier's strategy for extracting renewed economic gain from academic work.
Introduction: This post presents a new book whose purpose is to help researchers appropriate digital research methodologies and tools.
Introduction: This software paper in Polish describes “Magik” (Magician), a tool for textual scholars which allows for comparisons of different variants of the same text.