Introduction by OpenMethods Editor (Marinella Testori):
Awarded Best Long Paper at the 2019 NAACL (North American Chapter of the Association for Computational Linguistics) conference, the paper by Jacob Devlin et al., “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” (https://aclanthology.org/N19-1423/), introduces the BERT language model.
As the authors highlight in the abstract, BERT is a “new language representation model”; in the past few years it has become widespread in NLP applications. One project that builds on it is CamemBERT (https://camembert-model.fr/), a language model for French.
In June 2021, a workshop organized by David Mimno, Melanie Walsh and Maria Antoniak (https://melaniewalsh.github.io/BERT-for-Humanists/workshop/) showed how to use BERT in digital humanities projects for word similarity and classification, relying on the Python-based HuggingFace transformers library (https://melaniewalsh.github.io/BERT-for-Humanists/tutorials/). A further advantage of this training resource is that it is written with its target audience in mind: it offers a gentle introduction to the complexities of language models for scholars whose education and background lie outside Computer Science.
Along with the Tutorials, the same blog includes Introductions to BERT in general and to its use in a Google Colab notebook, as well as a constantly updated bibliography and a glossary of the main terms (‘Attention’, ‘Fine-Tune’, ‘GPU’, ‘Label’, ‘Task’, ‘Transformers’, ‘Token’, ‘Type’, ‘Vector’).
BERT is a state-of-the-art NLP method trained on a very large dataset of texts, namely the entirety of English-language Wikipedia (2,500 million words) and a corpus of English-language books (800 million words). Thanks to this large amount of training data and its unique neural network architecture, BERT, and subsequent methods like it (e.g., GPT-2), can understand human language significantly better than previous NLP methods. For example, BERT can identify whether a sentence expresses positive or negative sentiment, predict what sentence should come next in a paragraph, and disambiguate between multivalent words with never-before-seen levels of accuracy.
(Excerpt from Melanie Walsh’s BERT for Humanists training resource)
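The word-similarity and disambiguation uses mentioned above rest on comparing vectors, one of the glossary terms. As a rough sketch only, the snippet below compares toy vectors with cosine similarity. The three 4-dimensional vectors and the helper names `cosine_similarity` and `most_similar` are invented for illustration; real BERT vectors are much longer and would be produced by a model, for instance through the HuggingFace transformers library used in the tutorials.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical vectors, made up for this example (not BERT output).
# The two senses of "bank" get different vectors, as a contextual
# model would assign them in different sentences.
toy_vectors = {
    "bank (river)": [0.9, 0.1, 0.0, 0.2],
    "shore": [0.8, 0.2, 0.1, 0.1],
    "bank (money)": [0.1, 0.9, 0.7, 0.0],
}


def most_similar(query, vectors):
    """Return the other key whose vector is closest to the query's vector."""
    candidates = [key for key in vectors if key != query]
    return max(
        candidates,
        key=lambda key: cosine_similarity(vectors[query], vectors[key]),
    )


print(most_similar("bank (river)", toy_vectors))
```

With these toy values, the river sense of "bank" lands nearer to "shore" than to the money sense, which is the kind of distinction the excerpt describes as disambiguating multivalent words.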
Jacob Devlin et al., “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” (https://aclanthology.org/N19-1423/) in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, NAACL: Minneapolis, Minnesota, pp. 4171-4186.
BERT for Humanists