Introduction: Digital text analysis depends on one thing above all: text that can be processed with little effort. Working with PDFs often makes this difficult, as Zeyd Boukhers, Shriharsh Ambhore and Steffen Staab describe in their paper. Their goal is to extract references from PDF documents. A highlight of the workflow they describe is its very impressive precision rates. The paper thereby encourages further development of the process and its application as a “method” in the humanities.
Introduction: GROBID is a well-known open-source tool in the field of Digital Humanities, originally built to extract and parse bibliographical metadata from scholarly works. The acronym stands for GeneRation Of BIbliographic Data.
Shaped by use cases and by its adoption in a range of DH and non-DH settings, the tool has progressively evolved into a suite of technical features that are currently applied in various fields, such as journals, dictionaries and archives.
The reviewed article presents the BILBO project and illustrates how several machine-learning techniques can be applied to build proper reference corpora and efficient annotation models. It thereby proposes solutions to the problem of extracting and processing useful information from bibliographic references in digital documents, whatever their bibliographic style. It demonstrates the usefulness and high accuracy of CRF (conditional random field) techniques, which involve finding the most effective set of features (of three types: input, local and global) for a given corpus of well-structured bibliographical data (with labels such as surname, forename or title). Moreover, the approach has not only proven efficient on such traditional, well-structured bibliographical data sets; it also makes an original contribution to the processing of more complicated, less-structured references, such as those contained in footnotes, by applying SVMs (support vector machines) with new features for sequence classification.
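To make the idea of input, local and global features more concrete, here is a minimal Python sketch of per-token feature extraction for a single reference string. It is an illustration only: the function names, the tokenizer, the feature names and the choice of features are assumptions made for this example and are not taken from BILBO. The resulting list of feature dictionaries is the kind of input a CRF toolkit typically expects, one dictionary per token.

```python
import re

# Hypothetical helpers for illustration; not BILBO's actual feature set.
YEAR_RE = re.compile(r"(1[5-9]|20)\d{2}")      # four-digit years 1500-2099
INITIAL_RE = re.compile(r"[A-Z]\.")            # e.g. "J."
PUNCT_RE = re.compile(r"[^\w\s]+")             # punctuation-only token
# Initials keep their dot; other words and punctuation marks are split apart.
TOKEN_RE = re.compile(r"[A-Z]\.|\w+|[^\w\s]")


def token_features(tokens, i):
    """Return a feature dictionary for the token at position i."""
    tok = tokens[i]
    return {
        # Input feature: the (lower-cased) token itself.
        "input.lower": tok.lower(),
        # Local features: properties of this single token.
        "local.is_capitalized": tok[:1].isupper(),
        "local.is_year": bool(YEAR_RE.fullmatch(tok)),
        "local.is_initial": bool(INITIAL_RE.fullmatch(tok)),
        "local.is_punct": bool(PUNCT_RE.fullmatch(tok)),
        # Global features: properties of the whole reference.
        "global.position": round(i / max(len(tokens) - 1, 1), 2),
        "global.has_year": any(YEAR_RE.fullmatch(t) for t in tokens),
    }


def reference_features(reference):
    """Tokenize a reference string and compute features for every token."""
    tokens = TOKEN_RE.findall(reference)
    return tokens, [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # An invented reference string, used purely as sample input.
    tokens, feats = reference_features("Doe, J. (2011). A sample title.")
    for tok, f in zip(tokens, feats):
        print(tok, f)
```

In a real setting, each token would additionally carry a label such as surname, forename or title, and a sequence model would be trained on many such labelled references.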
Introduction: In this article, Nicolás Quiroga reflects on the fundamental place of note-taking in the work of historians. The article also reviews some tools for classifying information (which do not substantially affect the note-taking activity) and suggests how the use of these tools can create new digital approaches for historians.
Introduction: This post explains the benefits of using BEACON for data enrichment and increased visibility, using the example of the Bibliografie deutsch-jüdische Geschichte Nordrhein-Westfalen.
Introduction: A review of the book BITECA: Bibliografia de textos antics catalans, valencians i balears: Biblioteques i Arxius Valencians, by Beltran, Avenoza & Soriano (2013), which serves as an occasion to explain the technologies used to work on the first Dictionary of the Old Spanish Language (DOSL) and other versions at the Hispanic Seminary of Medieval Studies (HSMS).
Introduction: This post presents a new book whose purpose is to help researchers adopt digital research methodologies and tools.
Introduction: This conference report highlights a tool that supports the preservation of oral archives and the research process built on them.
Introduction: This post provides an update on the use of this award-winning platform.
Introduction: This Italian-language post (accompanied by English slides) highlights the use of the Zotero software for the research process and its results.