Pipelines for languages: not only Latin! The Italian NLP Tool (Tint)

OpenMethods introduction to: Pipelines for languages: not only Latin! The Italian NLP Tool (Tint)
By Marinella Testori, 2019-12-16 13:21:50
https://openmethods.dariah.eu/2019/12/16/pipelines-for-languages-not-only-latin-the-italian-nlp-tool-tint/

A recent contribution published here about the Classical Language Toolkit (https://openmethods.dariah.eu/2019/10/02/the-classical-language-toolkit-cltk-at-the-forefront-of-digital-philology-for-historical-languages/) mentioned Stanford CoreNLP (https://stanfordnlp.github.io/CoreNLP/index.html) among the pipelines currently available for Natural Language Processing.

As detailed on its webpage, Stanford CoreNLP aims to be a complete Java-based set of tools for various aspects of language analysis, from annotation to dependency parsing, from lemmatization to coreference resolution. It thus provides a range of tools that can potentially be applied to languages other than English.
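To make the idea of selectable analysis steps more concrete, here is a minimal sketch of how a client might ask a CoreNLP server for a chosen set of annotators. It assumes that a CoreNLP server has already been started locally on port 9000, and the helper names `build_corenlp_request` and `annotate` are hypothetical, introduced only for this illustration. The sketch builds the request without sending it, so the live call is untested here.

```python
import json
import urllib.parse
import urllib.request


def build_corenlp_request(text, annotators, base_url="http://localhost:9000/"):
    """Build (but do not send) a POST request for a CoreNLP server.

    base_url is an assumption: a server running locally on port 9000.
    """
    # The server reads its settings from a JSON object in the
    # "properties" query parameter; the text travels in the POST body.
    props = {"annotators": ",".join(annotators), "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return urllib.request.Request(
        base_url + "?" + query,
        data=text.encode("utf-8"),
        method="POST",
    )


def annotate(text, annotators):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_corenlp_request(text, annotators)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Build a request for several of the analysis steps named in the post.
request = build_corenlp_request(
    "Stanford University is in California.",
    ["tokenize", "ssplit", "pos", "lemma", "ner", "depparse"],
)
print(request.full_url)
```

Listing fewer annotators in the call should give a lighter analysis, which is the practical sense in which the toolkit is a set of tools rather than a single fixed program.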

Italian is among the languages to which Stanford CoreNLP is mainly applied: the Tint pipeline has been developed for it, as described in the paper “Italy goes to Stanford: a collection of CoreNLP modules for Italian” by Alessio Palmero Aprosio and Giovanni Moretti (preprint arXiv:1609.06204).

The whole pipeline can be found and downloaded on the Tint webpage (http://tint.fbk.eu/). It comprises tokenization and sentence splitting, morphological analysis and lemmatization, part-of-speech tagging, named-entity recognition and dependency parsing; it also includes wrappers that are still under construction.
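The way such a pipeline chains its steps can be illustrated with a toy sketch. This is not Tint's code: the two stages below are deliberately naive stand-ins for the first step in the list (tokenization and sentence splitting), and every name (`split_sentences`, `tokenize`, `run_pipeline`) is hypothetical. The point is only that each stage reads the layers added by the previous ones and adds its own to a shared document.

```python
import re
from typing import Callable, Dict, List

# A document is a plain dict that accumulates annotation layers.
Document = Dict[str, object]
Annotator = Callable[[Document], None]


def split_sentences(doc: Document) -> None:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    text = doc["text"]
    doc["sentences"] = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(doc: Document) -> None:
    """Naive tokenizer: a word (with a trailing apostrophe, as in "L'")
    or a single punctuation mark. Relies on the "sentences" layer."""
    pattern = re.compile(r"\w+'?|[^\w\s]")
    doc["tokens"] = [pattern.findall(s) for s in doc["sentences"]]


def run_pipeline(text: str, annotators: List[Annotator]) -> Document:
    """Apply each annotator in order to one shared document."""
    doc: Document = {"text": text}
    for annotate in annotators:
        annotate(doc)
    return doc


doc = run_pipeline("L'Italia è bella. Roma è la capitale!", [split_sentences, tokenize])
for sentence_tokens in doc["tokens"]:
    print(sentence_tokens)
```

Later steps such as lemmatization or part-of-speech tagging would be further functions appended to the list, each relying on the token layer. Order matters: in this sketch, running `tokenize` before `split_sentences` would fail because the sentence layer would not exist yet.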

Tint thus aims to answer the need for NLP tools for Italian, a language for which, as Palmero Aprosio and Moretti highlight in the introduction to their paper, “there is a lack of this kind of resources”.

Resources consulted:

Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60.

Italy goes to Stanford: a collection of CoreNLP modules for Italian
By Alessio Palmero Aprosio and Giovanni Moretti.
eprint arXiv:1609.06204.

Stanford CoreNLP – Natural Language Software

https://stanfordnlp.github.io/CoreNLP/index.html

Tint (The Italian NLP Tool)

http://tint.fbk.eu/
