Semantic Concepts in Textbooks (SemKoS)
As part of the Semantic Concepts in Textbooks (SemKos) project, the DIFI department is creating a prototype of a digital annotation tool that learns and improves continually through use and is based on a comprehensive needs and requirements analysis. The prototype enables texts to be marked up and classified collaboratively, directly in the digitised material. Words and phrases are linked with entries in the Integrated Authority File (GND); these links can be compared, exported and visualised. The tool has been created in response to the rapidly growing number of digital and digitised textbooks, and it supports academic research by providing new methods and approaches.
-
Aims
The SemKos project arose from the need to develop an instrument for the digital humanities that is oriented towards the specific needs of researchers working with educational media, and that can be used effectively in practice. Close collaboration between humanities researchers and computer scientists is therefore essential while the tool is being built.
-
Methodology
The humanities part of the project examined how researchers work with digitised materials in practice, in order to design a tool that supports this practice as well as possible. The IT implementation consequently focussed on mapping the research process digitally in great detail, in order to minimise any concerns and reservations researchers may have about moving to a digital process. An institute-wide survey confirmed the need for a digital annotation tool. One central requirement of researchers working with digitised source material was, for example, the ability to take full advantage of the benefits of digital materials (such as searchable texts) without losing, or losing track of, visual information from the original, such as page layouts and highlighted typefaces.
In addition to its potential for research in the humanities, the project is also intended to open up research opportunities in computer and information science. The digital links that the tool supports between information in the digitised material and external knowledge bases maintained by experts serve the interests of both disciplines. While humanities research is supported by the tool itself, informatics and information science research benefits from the resulting mark-up, which enables research on fully automated entity-linking approaches, for example, and on word sense disambiguation.
-
Results
The SemKos tool can be employed in many ways. The content mark-up in SemKos is aimed predominantly at statistical evaluation based on the meaning of terms. In traditional research processes, researchers mark words and groups of words in a text, index the text according to keywords or write notes on the paper. SemKos enables researchers to work in a similar way, directly on the digital image of the source, the digitised data. Because terms in the text can be linked with the linked open data cloud, implicit knowledge becomes explicit. The resulting research data, such as annotations, links and categorisations, is visible in the digital material and is also saved in a digital format together with the digital text. It can then, for example, be searched, combined with other information or used for illustrative purposes. In addition, transitive links within the linked open data cloud (e.g. father-son, x’s place of birth is y, is part of, etc.) enable complex analyses that reveal and visualise relationships and factual information that go beyond the information contained in the source itself.
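The effect of transitive links can be illustrated with a minimal Python sketch. It is not the SemKos implementation: the `PART_OF` table, the function names and the mention counts are all made up for illustration and stand in for relations that would come from the linked open data cloud. The sketch follows "is part of" links upwards, so that places annotated in a text can also be counted under the regions that contain them.

```python
from collections import Counter

# Hypothetical "is part of" links; in practice these would come from
# linked open data rather than a hand-written table.
PART_OF = {
    "Braunschweig": ["Lower Saxony"],
    "Hanover": ["Lower Saxony"],
    "Lower Saxony": ["Germany"],
    "Leipzig": ["Saxony"],
    "Saxony": ["Germany"],
}


def ancestors(entity, part_of):
    """Return every entity reachable from `entity` via "is part of" links."""
    seen = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        for parent in part_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def mentions_by_region(mention_counts, part_of):
    """Add each entity's mention count to itself and to all of its ancestors."""
    totals = Counter()
    for entity, count in mention_counts.items():
        for region in {entity} | ancestors(entity, part_of):
            totals[region] += count
    return totals


# Made-up annotation counts for places marked up in a text.
mention_counts = {"Braunschweig": 2, "Hanover": 1, "Leipzig": 4}
totals = mentions_by_region(mention_counts, PART_OF)
```

Although "Lower Saxony" and "Germany" never appear in the made-up annotations, `totals` contains counts for both, which is the kind of information "beyond the source itself" that the paragraph above describes.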
In SemKos it is possible to link terms in the text with relevant keywords from the Integrated Authority File (GND). Because foreign terms, different spellings and inflected forms can all be linked with the keyword that carries the same meaning, later analysis does not depend on an individual language or spelling. The linkable keywords encompass not only people, places, institutions and events, but also general keywords and terms. This terminology enables links to be established between religious terms, well-known occupations or kinds of animal, for example.
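A short Python sketch shows why linking to a shared keyword makes analysis independent of language and spelling. The `Annotation` record, its fields and the identifiers are assumptions made for illustration; the identifiers are placeholders, not real GND records, and the actual SemKos data model may differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    surface: str  # the word or phrase exactly as it appears in the text
    page: int     # page of the digitised source
    gnd_id: str   # placeholder identifier, NOT a real GND record


# Made-up annotations: three different surface forms share one keyword.
annotations = [
    Annotation("Lehrer", 12, "gnd:PLACEHOLDER-TEACHER"),
    Annotation("Lehrers", 14, "gnd:PLACEHOLDER-TEACHER"),
    Annotation("teacher", 3, "gnd:PLACEHOLDER-TEACHER"),
    Annotation("Pferd", 20, "gnd:PLACEHOLDER-HORSE"),
]


def count_by_keyword(items):
    """Count annotations per linked keyword, ignoring how the term was spelled."""
    return Counter(a.gnd_id for a in items)


def surface_forms(items):
    """Collect the distinct spellings that were linked to each keyword."""
    forms = {}
    for a in items:
        forms.setdefault(a.gnd_id, set()).add(a.surface)
    return forms


counts = count_by_keyword(annotations)
forms = surface_forms(annotations)
```

Counting by `gnd_id` rather than by `surface` groups the German base form, its genitive and the English term under one keyword, while `surface_forms` keeps the original spellings available for inspection.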
-
Outlook/Development
When completed, the SemKos tool will be integrated into the Institute’s infrastructure, and interfaces will be created so that its components can be used in future as modules in the Edumeres Toolbox. The tool itself will be the subject of a usability study conducted by the mobile usability lab, which will check whether the tool’s user interface or user experience can be improved. The project will also produce data sets that can contribute to informatics and information science research. The modules will continue to be developed as part of the Institute’s infrastructure in current and future projects and partnerships.