OpenMethods Spotlights showcase the people and epistemic reflections behind Digital Humanities tools and methods. Here you can find brief interviews with the creator(s) of the blogs or tools that are highlighted on OpenMethods, to humanize and contextualize them.
In the first episode, Alíz Horváth talks with Hilde De Weerdt at Leiden University about MARKUS, a tool that offers a variety of functionalities for the markup, analysis, export, linking, and visualization of texts in multiple languages, with a special focus on Chinese and now Korean as well.
Before reading the interview below, it is worth taking a look at the tool description here.
Now let’s see who is behind the tool:
Hilde De Weerdt (ORCID 0000-0002-9670-674X) is Professor of Chinese History at Leiden University. She studied Chinese and Chinese History at KU Leuven (BA) and Harvard University (PhD) and taught history at the University of Tennessee at Knoxville, Oxford University, and King’s College London before becoming chair of Chinese History at Leiden in 2013. She is currently working on a longue-durée global history of Chinese political advice literature. She has published three volumes on medieval Chinese political culture and intellectual history (Competition over Content: Negotiating Standards for the Civil Service Examinations in Imperial China (1127-1276), 2007; Information, Territory, and Networks: The Crisis and Maintenance of Empire in Song China, 2015; Knowledge and Text Production in an Age of Print: China, 900-1400, ed., 2011). Her latest publications include an edited translation titled The Essentials of Governance (Cambridge University Press, 2020) and a comparative history of European and Chinese political culture (Political Communication in Chinese and European History, 1000-1600, ed., Amsterdam University Press, 2020). She maintains an active interest in designing and developing digital research methods for East Asian languages. With Brent Ho she co-designed MARKUS, and with Mees Gelein COMPARATIVUS. On the history and concept behind these and related digital research projects, see “Creating, Linking, and Analyzing Chinese and Korean Datasets: Digital Text Annotation in MARKUS and COMPARATIVUS” (Journal of Chinese History 2020).
Hi Hilde, and thanks for joining us! Let me start by asking: what is the key challenge that MARKUS helps alleviate, and what motivated you to create it?
When Brent Ho and I first conceived MARKUS, we aimed to create an intuitive way to digitally annotate information in Chinese texts, extract the annotated data, link it to external databases, and perform further analysis on the resulting dataset. I had devised a method for analysing communication networks by annotating the digital texts of notebooks (biji) based on a TEI schema, but this required multiple steps in different software packages. Given that rich textual, biographical, and geographical databases as well as online dictionaries already existed for Chinese texts, we thought that rather than working on another visualisation platform for spatial and network analysis (my original plan), it made more sense to generalise the annotation process, link to existing visualisation platforms, and provide export to standard file formats for further analysis of the resulting data. Automated and manual annotation of default and customised entities was the original goal, but soon after we launched this, we discovered that there was more that we and other researchers wanted.
What makes this tool special compared to other markup tools?
When we first conceived MARKUS back in 2013 there were not many other options around, and none that were designed to take advantage of the Chinese digital resources available. By now there are some more options: some designed for European languages but open to general use, such as Pelagios and Recogito, and some that emerged more recently in Taiwan and the PRC, based on the MARKUS idea. I think we are still different from these latter projects in two respects: we are researcher-driven, and we have from the beginning sought to link up compatible projects with different strengths. As to the first point, we continue to develop functionality that we and those who contact us would like to have (much more slowly since 2017, when the initial funding had been used up), and we try to develop it in a way that fits the research flows of humanities researchers. For example, we recently added relational markup, a Korean version, text comparison, and text overlap markup because students here at Leiden University and I wanted these for our research projects. As for the second point, there are areas in which we cannot measure up to others: we do not provide texts, we do not have databases, we have not yet implemented group annotation and version control, etc. Some of these things have been on our list for a while (not text provision, as others are already doing this), but we have also consciously chosen not to reinvent the wheel but to link up to projects that do things that we think would be useful and to add features that other projects lack.
Do you need specific previous DH experience to make the most out of MARKUS?
Probably; I am giving this a tentative yes. We have tried to make MARKUS as intuitive as possible, but from conducting workshops I have learned that it is very important to have a basic understanding of what markup is before diving in. One does not necessarily need previous DH experience for this, but one does have to spend some time considering the pros and cons of various methods in order to design a research process that suits one’s questions. Learning more about methodology and the theoretical assumptions on which methodologies are based also helps one interpret the data and the text(s) from which the data are drawn.
What is the biggest strength of MARKUS and what do you think could be further improved in it?
Difficult question. I think the biggest strength is that it focuses on one thing (digital text annotation using the best available scholarly resources) but provides a wide range of options (various methods of annotation and linked reference works, for example) within that framework.
As for weaknesses and room for improvement, there are many areas: machine learning, server-based annotation, a standalone version, better integration of metadata on multiple textual levels, further text comparison options, more languages, and more linked resources and platforms, to name but a few on our list. I take comfort in the thought that there is more to be done, but it is frustrating that time is limited. The odd thing is that this was very much intended as a limited side project of a larger collaborative research project funded by the European Research Council, but it has kept expanding…
How do you see the role and significance of MARKUS in the context of DH in Asian studies?
Also a difficult question! Perhaps this is one for others to answer? I have been pleasantly surprised by the enthusiasm with which MARKUS has been welcomed by scholars and students from very different fields ever since we showed the first version at the annual Association for Asian Studies conference in 2014: history, literature, religious studies, law, medical history, anthropology, sociology, communication studies, librarianship, digital humanities, etc. On the whole, I think there is a limited lifespan for such projects. As we keep making progress in developing digital research services in Asian languages, I expect (and hope) this will be superseded sometime soon. I therefore see mainly one broader significance for this project: it is an example of how researchers can contribute to developing the research services that they wish to see. We tended in the past to rely on commercial providers for this, and what they have come up with (and the speed with which they develop research services) has not been optimal. This requires us to think outside of the box, and one can go much further in this regard than MARKUS has gone. It also requires us to be willing to invest in setting up shared services for the scholarly and wider community. This point is not to be underestimated. Just depositing the code of something one has designed is at this time not sufficient to facilitate broader use in the community of Asian Studies scholars. I invested personal research funds to keep MARKUS running as a service (and I am not the only early bird who has done this; think of Donald Sturgeon’s amazing ctext project), and as long as there is still some use for it, I will keep doing that. However, I also hope we can come up with more sustainable and structural means to fund researcher-driven digital research services in Asian languages.
What are your plans for the future in terms of further developments? For example, are you planning to extend its coverage to Japanese as well since it is already available in Chinese and Korean?
We have a long list of to-dos; much depends on funding for new development. We currently have plans to ensure that Korean place names can also be mapped in DOCUSKY and DOCUGIS (projects based at National Taiwan University’s Centre for Digital Humanities), to develop event markup that will allow one to establish hierarchical relations between different MARKUS tags (which I designed for a project on the history of material infrastructures), and to add a module on the comparison of editions of texts. We remain interested in expanding to other languages. A lot of our functionality works in all languages (manual and keyword markup, batch markup, text comparison, and text overlap markup), but in order to set up automated markup and web references we need partners with open scholarly datasets. For Japanese we have not yet been able to locate suitable partners.
In a broader sense, how do you see the current DH scene in Europe and beyond? Are there distinct similarities or differences in how DH is practiced in Europe and elsewhere (for example in the context of Asian studies)?
I find it difficult to generalise about scholarly traditions and trends on national or continental scales. Digital Humanities appears to me to have been from the very beginning an international undertaking as far as academics are concerned. Nevertheless, it seems that differences in the organisation of research and in research funding can lead to different paths in some areas. European funding agencies and national research organisations in European countries such as the Netherlands have heavily invested in large-scale projects that, to my mind at least, have led to a more top-down approach and low uptake on much of what they have produced. Most of these projects (with the exception of some linguistics projects) are focused on a very narrow set of “European” languages. That is a shame; I have been trying to make the case that Asian languages are also European languages, in the sense that languages such as Arabic, Hebrew, Hindi, and Chinese are widely spoken, read, and published in Europe.