Introduction by OpenMethods Editor (Delphine Montoliu): The podcast of Caroline Sporleder’s talk, given in English at the École Normale Supérieure (ENS) in Paris, France, is now available.
Talk by Caroline Sporleder: “Computational Linguistics and Digital Humanities: Chances and Challenges”
Digital Humanities (DH) is a field that has grown immensely in recent years. It is also a very diverse field, covering (in its broadest definition) everything from corpus linguistics, computational philology, and quantitative history to computational archaeology.
Because the field originated in corpus linguistics and computational philology, and because data in the Humanities and Social Sciences are often (but not always) textual, digital text representation, processing, and mining are a major area of attention. Computational linguistics (CL) has a lot to contribute here, both at the lower end of the scale (e.g., tools for OCR error correction and preprocessing) and at the higher end (e.g., sophisticated text mining tools). Computational linguistics can also benefit from evaluating its algorithms and tools on data from the Humanities, as these data are often difficult to process, for example because of non-standard language and spelling, missing sentence boundaries, noisy input, and domains that differ from those typically considered in CL. Hence, CL for DH requires the development of very robust methods that work well on noisy data and do not require large amounts of training data. In this talk, I will address some of the chances and challenges that arise when applying computational linguistic methods to data from the Humanities and Social Sciences.
Original publication date: 23/06/2017.