Analysis

Pipelines for languages: not only Latin! The Italian NLP Tool (Tint)

Posted on December 16, 2019December 16, 2019
by Marinella Testori

The StandforCore NLP wishes to represent a complete Java-based set of tools for various aspects of language analysis, from annotation to dependency parsing, from lemmatization
to coreference resolution. It thus provides a range of tools which
can be potentially applied to other languages apart from English.

Among the languages to which the StandfordCore NLP is mainly applied there is Italian, for which the Tint pipeline has been developed as described in the paper “Italy goes to Stanford: a collection of CoreNLP modules for Italian” by Alessio Palmero Apostolo and Giovanni Moretti.

On the Tint webpage the whole pipeline can be found and downloaded: it comprises tokenization and sentence splitting, morphological analysis and lemmatization, part-of-speech tagging, named-entity recognition and dependency parsing, including wrappers under construction. [Click ‘Read more’ for the whole post.]

Analysis

Analyzing Documents with TF-IDF | Programming Historian

Posted on September 15, 2019September 16, 2019
by Rombert Stapel

Introduction: The indispensable Programming Historian comes with an introduction to Term Frequency – Inverse Document Frequency (tf-idf) provided by Matthew J. Lavin. The procedure, concerned with specificity of terms in a document, has its origins in information retrieval, but can be applied as an exploratory tool, finding textual similarity, or as a pre-processing tool for machine learning. It is therefore not only useful for textual scholars, but also for historians working with large collections of text.

OpenMethods

HIGHLIGHTING DIGITAL HUMANITIES METHODS AND TOOLS

Tag: lemmatization

Pipelines for languages: not only Latin! The Italian NLP Tool (Tint)

Analyzing Documents with TF-IDF | Programming Historian