OpenMethods Spotlights #3 Keeping a smart diary of research processes with NeMO and the Scholarly Ontology
Interview with Panos Constantopoulos and Vayianos Pertsas
OpenMethods Spotlights showcase people and epistemic reflections behind Digital Humanities tools and methods. Here you can find brief interviews with the creator(s) of the blogs or tools that are highlighted on OpenMethods, to humanize and contextualize them.
In the next episode, we look behind the scenes of two ontologies, NeMO and the Scholarly Ontology (SO), with Panos Constantopoulos and Vayianos Pertsas, who tell us the story behind these ontologies and explain how they can be used to ease or upcycle your daily work as a researcher. We discuss the value of knowledge graphs, how NeMO and SO connect with the emerging DH ontology landscape and beyond, why Open Access is a precondition of populating them, the Greek DH landscape, and much more!
Panos Constantopoulos is Professor in the Department of Informatics, Director of the MSc Program in Digital Methods for the Humanities and former Dean of the School of Information Sciences and Technology, Athens University of Economics and Business. He also heads the Digital Curation Unit, Information Management Systems Institute, “Athena” Research Centre. Previously, Professor and Chairman in the Department of Computer Science, University of Crete, and Head of the Information Systems Laboratory and the Centre for Cultural Informatics, Institute of Computer Science, Foundation for Research and Technology – Hellas. He has been principal investigator in ca. 40 R&D projects. Currently he coordinates “APOLLONIS”, the Greek Infrastructure for Digital Arts, Humanities and Language Research and Innovation. His scientific interests include knowledge representation and conceptual modelling, ontology engineering, semantic information access, process mining, knowledge management systems, decision support systems, cultural informatics, digital libraries, digital curation and preservation.
Vayianos Pertsas holds a Dipl. Eng. Degree in Electrical and Computer Engineering from the University of Patras and a PhD degree in Informatics from the Athens University of Economics and Business. His research interests revolve around conceptual modeling and ontology population using information extraction techniques that mainly focus on leveraging linked data and its applications, along with NLP and machine learning techniques. He has worked in the development of the NeDiMAH Methods Ontology in the ESF-funded Network for Digital Methods in the Arts and Humanities and in its evolution and operationalization in the form of the Scholarly Ontology. He has authored papers appearing in IJDL, ISWC, TPDL and co-tutored various workshops. He is currently affiliated with Athens University of Economics and Business as a Post-doctoral Research Associate in the Department of Informatics.
Hi Panos and Vayianos, and thanks for joining us! Could you start off by telling us a bit about what motivated the creation of NeMO and SO and which communities have been involved in their creation?
Panos: The prime motivation came from the observation that the information needs of researchers are determined not only by their subject matter, but also by the way they work. This reflects the research questions pursued and the resources and tools being developed in each domain. In addition, today research gets increasingly data-driven and computationally intensive. In Digital Humanities, as well as in other areas, this often requires interdisciplinary collaboration (for instance, with IT professionals) and relies on large scale data management and knowledge extraction from databases, text, images, and the Web. A researcher needs to establish an overview and gain access to an expanding universe of information resources, but also of relevant tasks, methods, tools, goals, agents, and projects. So, we decided to design an ontology for capturing the different, yet interconnected, facets of research work.
Based on this, the next step was to streamline the creation of descriptions of the process and the outcome of research works in the form of knowledge graphs, capable of supporting complex information searches and reasoning. The development of NeMO, an acronym for NeDiMAH Methods Ontology, took place in the context of the ESF Network for Digital Methods in the Arts and Humanities (NeDiMAH) project. Using previous work of the Digital Curation Unit on information needs and behaviours in the humanities (in the context of DARIAH, EHRI and Europeana Cloud) as stepping stones, the NeMO team carried out several interviews and workshops with digital humanists to ensure that NeMO has adequate empirical grounding, matches established information needs and is quite straightforward to understand. Activity on validation and refinement of NeMO continued within the DARIAH DiMPO Working Group and was informed by the extensive trans-European survey of digital methods and practices carried out in DiMPO.
Later, the ontology was expanded to and tested in domains outside the humanities, such as Medicine, Biology, Computer Science, and Electrical Engineering. As a result, a domain-neutral generalization has been developed, which is called the Scholarly Ontology (SO). So now, NeMO can be thought of as a specialization of SO in the humanities.
What are the main aims of NeMO and the Scholarly Ontology?
Panos: With NeMO initially and with the Scholarly Ontology (SO) later, our main goal was to conceptualize and document the research process in a systematic and formal way so that different aspects of scholarly research behaviour can be covered and available resources, possibly from different fields, be interconnected. Although the ontology started out as a conceptual model that describes research processes in DH, it became apparent that we were facing a problem with a very similar manifestation in different research fields and, very interestingly, that the modeling approach originally used for DH could be employed for other disciplines as well. This led to the creation of an even more generic model, named Scholarly Ontology, that covers scholarly work in general and, through its modular architecture, can incorporate discipline-specific models like NeMO.
We wanted SO and NeMO not only to provide a schema for manual documentation of one’s work, but also to drive the automatic extraction of information from text feeding directly into a knowledge graph. This brings us to the issue of the source of information: currently, discovery practices related to research activities are mostly limited to traversing author / citation graphs or searching other metadata fields of publications in scholarly information databases such as Google Scholar. What we propose instead is the use of annotations created by ontology-driven extraction of information from the body of scholarly publications. The result is a knowledge graph holding a structured representation of the account of research work extracted from the texts.
Which kinds of input did you use for the creation of this knowledge graph?
Vayianos: NeMO and Scholarly Ontology offer a conceptual schema that can help researchers document their work in a systematic way – common across disciplines – and detailed enough so that research work can be modeled even at a fine level of granularity (describing methods, goals, activities etc.). The knowledge graph comprises schema and instances, in the form of an RDF/RDFS graph. The instances in the database currently highlight potential use cases and workflows for documenting research work. For the creation of the database, we used input from i) the documented work of scholars that we received from various workshops, ii) various papers that we used as indicative examples of how research processes described in a paper could be transformed into instances of the schema; iii) findings of the DiMPO trans-European survey; and iv) existing DH taxonomies.
A subquestion to that: in order to extract and model scholarly research processes in humanities research, you created a knowledge base called Research Spotlight. It was developed to extract information from research articles and enrich it with relevant information from other Web sources. Which databases did you consult for this and what were the biggest challenges in the process? Did you encounter issues related to open access to scholarly databases?
Vayianos: Initially, populating the knowledge base was an interactive exercise for testing the model. But documenting research work by hand can be quite time-consuming and may also lead to inconsistencies between different (human) annotators. A viable process would be to populate the knowledge base automatically and use manual procedures only for certain curation tasks. So, we developed Research Spotlight, a system that automates and enhances the process of creating the knowledge base. This is done using various machine learning algorithms that we have developed, along with algorithms that gather data and infer information from various Web sources. Specifically, we use APIs to retrieve research articles from various publishers (e.g., Springer, Elsevier); we access DBpedia to gather information about various concepts of Scholarly Ontology (research methods, research topics and tools) to train our ML models; and we connect with ORCID for author disambiguation. For sources where we cannot access an API, we developed our own scrapers to collect information from their websites. To avoid licensing conflicts, we used only Open Access material. This alone highlights the importance of open access to the scholarly corpus as a precondition of doing state-of-the-art, computational research. Of course, combining information from such different sources and in different formats, let alone transforming it into meaningful datasets that can be used for machine learning, is a technical challenge in itself. Nevertheless, we haven’t yet encountered any insurmountable obstacles that could irrevocably hinder our efforts.
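To make the author disambiguation step more concrete, here is a minimal sketch of one simple way name variants could be grouped under a shared ORCID iD. It is not the Research Spotlight implementation: the record layout, the function name group_by_orcid and the sample names and iDs are all hypothetical, and no Web service is called.

```python
from collections import defaultdict

# Hypothetical author records as they might come out of article metadata.
# The names and ORCID iDs below are made-up placeholders, not real identifiers.
RECORDS = [
    {"name": "A. Researcher", "orcid": "0000-0000-0000-0001"},
    {"name": "Alex Researcher", "orcid": "0000-0000-0000-0001"},
    {"name": "B. Scholar", "orcid": "0000-0000-0000-0002"},
    {"name": "C. Author", "orcid": None},
]


def group_by_orcid(records):
    """Group name variants under their ORCID iD.

    Records without an iD cannot be disambiguated this way, so they are
    returned separately for manual curation.
    """
    grouped = defaultdict(set)
    unresolved = []
    for record in records:
        orcid = record.get("orcid")
        if orcid:
            grouped[orcid].add(record["name"])
        else:
            unresolved.append(record["name"])
    return dict(grouped), unresolved


grouped, unresolved = group_by_orcid(RECORDS)
for orcid, names in sorted(grouped.items()):
    print(orcid, sorted(names))
print("needs manual curation:", unresolved)
```

The point of the sketch is only that a persistent identifier lets two spellings of a name be treated as one agent in the knowledge graph, while records without an identifier are left for the manual curation tasks mentioned above.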
How do NeMO and SO support scholarly discovery e.g. enabling scholars to find workflows, methods and tools relevant to their research questions?
Panos: In contrast to other available vocabularies and taxonomies for documenting scholarly work, NeMO and Scholarly Ontology are ontological models. This allows us not only to create various concepts (such as activities, methods, goals, tools, etc.), possibly with synonyms and hierarchical relations, but also, and most importantly, to associate them semantically in a way that gives them context, forming a knowledge graph. For example, using concepts and relations of Scholarly Ontology such as Activity, follows(Activity1, Activity2), Method, and employs(Activity, Method), we can model workflows comprising specific sequences of activities that were conducted during an experiment or a study, along with the methods that were employed to perform the activities. Moreover, since these are instances in a knowledge base, they can be easily retrieved by queries such as “How has a particular experiment that uses a specific method been conducted?”, “Which steps were involved?”, “In which order?”, etc.
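The workflow example in this answer can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the actual NeMO/SO data: plain tuples stand in for RDF triples, the "so:" and "ex:" prefixes and all instance names are hypothetical, and follows(Activity1, Activity2) is read as "Activity1 was carried out after Activity2".

```python
# Plain (subject, predicate, object) tuples stand in for RDF triples.
# The "so:" prefix and every "ex:" instance are hypothetical, for illustration only.
TRIPLES = {
    ("ex:Digitization", "rdf:type", "so:Activity"),
    ("ex:Transcription", "rdf:type", "so:Activity"),
    ("ex:Annotation", "rdf:type", "so:Activity"),
    ("ex:Transcription", "so:follows", "ex:Digitization"),
    ("ex:Annotation", "so:follows", "ex:Transcription"),
    ("ex:Digitization", "so:employs", "ex:ImageScanning"),
    ("ex:Transcription", "so:employs", "ex:OCR"),
    ("ex:Annotation", "so:employs", "ex:NamedEntityRecognition"),
}


def ordered_workflow(triples):
    """Answer "Which steps, in which order?" for a single linear chain."""
    activities = {s for s, p, o in triples if p == "rdf:type" and o == "so:Activity"}
    # Assumed reading: (A, so:follows, B) means A was carried out after B.
    successor = {o: s for s, p, o in triples if p == "so:follows"}
    # The first step is the activity that does not follow any other one.
    first = sorted(activities - set(successor.values()))
    steps = []
    current = first[0] if first else None
    while current is not None and current not in steps:  # guard against cycles
        steps.append(current)
        current = successor.get(current)
    return steps


def methods_of(triples, activity):
    """Return the methods a given activity employs."""
    return sorted(o for s, p, o in triples if s == activity and p == "so:employs")


def activities_employing(triples, method):
    """Answer "Which activities use this specific method?"."""
    return sorted(s for s, p, o in triples if p == "so:employs" and o == method)


for step in ordered_workflow(TRIPLES):
    print(step, "employs", methods_of(TRIPLES, step))
```

In practice such questions would more likely be asked as SPARQL queries against an RDF store. The tuple version is used here only to keep the example self-contained.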
How does NeMO achieve interoperability and connectivity with other ontologies and taxonomies such as CIDOC CRM or TaDiRAH (especially version 2 of TaDiRAH)?
Vayianos: NeMO and SO were derived from CIDOC CRM. All their classes and properties are specializations of CIDOC CRM concepts and properties. Hence, they are compatible with CIDOC CRM. As for TaDiRAH, we have developed specific mappings from their vocabulary to ours under the general “Type” concept (inherited from CIDOC CRM). In general, this technique allows incorporating as well as coordinating different vocabularies in an ontology. For example, TaDiRAH terms denoting research activities could be incorporated as instances of the ActivityTypes concept in NeMO (or SO) and therefore inherit all the properties from that class. This way, not only can various works that have been documented using TaDiRAH be easily imported into NeMO (or SO), but researchers can also access the new modelling capabilities that the NeMO/SO ontological framework offers (such as the semantic relations that associate activity types with other concepts like Methods, Goals, etc.).
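The "vocabulary terms as instances of a type class" idea can be illustrated with a small sketch. The identifiers "so:ActivityType", "so:hasType" and the "tadirah:" prefix are assumptions made for this example, not names taken from the published ontologies, and the three term labels are illustrative and have not been checked against the TaDiRAH vocabulary.

```python
# Hypothetical identifiers throughout: "so:ActivityType", "so:hasType", "tadirah:".
# The term labels are illustrative only.
TADIRAH_TERMS = ["Capture", "Enrichment", "Analysis"]


def import_vocabulary(terms, prefix="tadirah:", type_class="so:ActivityType"):
    """Turn each external vocabulary term into an instance of the type class."""
    return {(prefix + term, "rdf:type", type_class) for term in terms}


def classify_activity(graph, activity, term,
                      prefix="tadirah:", type_class="so:ActivityType"):
    """Attach an imported term to a documented activity.

    Raises ValueError if the term was never imported, so that typos do not
    silently create new activity types.
    """
    type_node = prefix + term
    if (type_node, "rdf:type", type_class) not in graph:
        raise ValueError(f"unknown activity type: {type_node}")
    return graph | {(activity, "so:hasType", type_node)}


graph = import_vocabulary(TADIRAH_TERMS)
graph = classify_activity(graph, "ex:ScanLetters", "Capture")
for triple in sorted(graph):
    print(triple)
```

Once the terms are instances of a class in the ontology, an activity tagged with one of them can take part in the other relations of the model (methods employed, goals pursued, and so on), which a flat tag list cannot express.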
How can the knowledge graph be integrated in specific research projects? Do you know re-use cases or specific implementations of NeMO?
Vayianos: Researchers can use the knowledge graph (NeMO or Scholarly Ontology) to document their work. We provide definitions for our terms and use case scenarios that can help them achieve consistent documentation. I like to think of this process the same way as “keeping a smart diary”. The difference with a “normal diary” though, would be that the output in our case is a machine readable and understandable object (e.g., an RDF file). This not only ensures that every single piece of information is retrievable, but also that – if a person or a group are using the same model to produce their “smart diaries” – all these RDF files can be combined into a bigger knowledge graph where information can be matched, and new knowledge can be inferred. Interactive documentation of research work using NeMO or SO can be easily done using any modelling tool, such as Protégé, where the ontology can be imported and then used by a researcher for documenting her/his project. Along this line, we also have developed a prototype that incorporates NeMO and provides a friendly GUI for entering instances and querying the knowledge base. This was presented in a workshop at DH 2016 in Krakow where researchers used it to model their own work. Currently we are working on the development of a social network tool (like non-commercial alternatives to ResearchGate or academia.edu) which leverages the automatically parsed instances from various research papers (created by Research Spotlight) and provides a Twitter-like UI where researchers will be able to browse for methods, activities, research topics, etc.
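As a rough illustration of how several "smart diaries" could be combined, the sketch below merges two small sets of triples and looks for methods that appear in more than one diary. The diaries, the "so:employs" property name and all instance names are hypothetical. Real diaries would be RDF files, and merging them would normally be done with an RDF library or triple store.

```python
# Two hypothetical "smart diaries", each a set of (subject, predicate, object) tuples.
DIARY_A = {
    ("a:TranscribeLetters", "so:employs", "ex:OCR"),
    ("a:TranscribeLetters", "so:employs", "ex:ManualCorrection"),
}
DIARY_B = {
    ("b:DigitizeNewspapers", "so:employs", "ex:OCR"),
}


def merge(*diaries):
    """Merge diaries; for triples without blank nodes this is a plain set union."""
    merged = set()
    for diary in diaries:
        merged |= diary
    return merged


def shared_methods(graph):
    """Find methods employed by more than one activity across the merged graph."""
    users = {}
    for subject, predicate, obj in graph:
        if predicate == "so:employs":
            users.setdefault(obj, set()).add(subject)
    return {method: sorted(acts) for method, acts in users.items() if len(acts) > 1}


merged = merge(DIARY_A, DIARY_B)
print(shared_methods(merged))
```

The match only works because both diaries refer to the method with the same identifier, which is the practical reason for everyone using the same model.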
How can you sustain and ensure the coherence of the ontology in the long-term?
Panos: The ontology is maintained by the team in the Digital Curation Unit of the Information Management Systems Institute, Athena Research Centre. Any additions and expansions to “neighboring” fields such as academic publishing or other specific disciplines, are carefully studied and mapped to existing classes so as to assure coherence. Once an update is ready, we publish it and release it as an RDF file along with relevant documentation.
Finally, we know that a great deal of work around NeMO had been carried out in DARIAH-GR. To finish the interview, could you also tell us a bit about the special flavours of Digital Humanities in Greece?
Panos: As you know, Greece is a member of DARIAH as well as of CLARIN, with DARIAH-GR and CLARIN-EL being active for quite some time. DARIAH-GR, or ΔΥΑΣ by its Greek name, has launched the development of a range of registries, terminological thesauri, best practice guides, tools for collections and terminology management, and an extensive monitoring, charting and modelling of DH practices. The development of NeMO and SO is linked to the latter line of action. In late 2017, DARIAH-GR and CLARIN-EL joined forces to develop a united national infrastructure, APOLLONIS, the Greek Infrastructure for Digital Arts, Humanities and Language Research and Innovation. Currently, APOLLONIS includes a permanent, stable infrastructure for accessing language resources and language processing web services; a collaborative workspace for language processing application development; curated digital resources and services for the development, analysis and visualization of data; best practice guidelines; and dissemination and training activities on the use of digital methods and tools in the Humanities. To put the collaborative workflows of humanists in the spotlight and to showcase the joint use of services and resources spanning the entire spectrum of the infrastructure, a thematic action is undertaken involving archival material from different sources and in various forms concerning a specific period of modern Greek history, the 1940s. The SO is the modelling framework underlying the joint APOLLONIS catalogue of resources and services, and the documentation of aggregation and curation processes. The Greek 1940s thematic action provides a demanding use case for validating these otherwise generic processes. The recent Covid-19 pandemic has sparked a reflection on DH practices and an assessment of the possibly permanent impact this disruptive condition may have.
+1 Is multilingualism a relevant concept to NeMO? Is multilingualism reflected in it somehow and are you planning to have the ontology available in multiple languages?
Vayianos: Since NeMO and SO are implemented as knowledge graphs using RDF technology, the display of the concepts and their instances in other languages is easy to implement and can be achieved by using an RDF attribute such as xml:lang. This is one of the reasons we decided to go with this technology in the first place. Embracing the values of an open Web society, and highlighting how different aspects (disciplines or even languages for that matter) can be interrelated and combined so that new knowledge can be inferred, is embedded in the core of our work.
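A minimal sketch of how language-tagged labels could be looked up is given below. The dictionary plays the role of labels carrying an xml:lang tag. The concept identifiers and the Greek label are illustrative assumptions for this example and are not taken from the published ontology files.

```python
# Labels keyed by language tag, standing in for RDF literals with xml:lang.
# The identifiers and translations are illustrative, not from NeMO/SO itself.
LABELS = {
    "so:Activity": {"en": "Activity", "el": "Δραστηριότητα"},
    "so:Method": {"en": "Method"},
}


def label(concept, lang, fallback="en"):
    """Return the label in the requested language.

    Falls back to the default language, and finally to the bare identifier,
    so that a missing translation never hides a concept.
    """
    by_lang = LABELS.get(concept, {})
    return by_lang.get(lang) or by_lang.get(fallback) or concept


for concept in ("so:Activity", "so:Method", "so:Goal"):
    print(concept, "->", label(concept, "el"))
```

Because the language lives on the label rather than on the concept, adding another language means adding literals, while the graph structure itself stays unchanged.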