Data curation for historical educational media research
Through GEI Digital the Georg Eckert Institute makes full texts of historical source materials freely available online to researchers of historical educational media. At the same time, other GEI projects test, develop and use tools for the computer-assisted processing and analysis of digitally available texts. Four problem areas for researchers have been identified:
- Historical research projects frequently include source material of varying origin and data quality.
- Automatic text recognition systems do not always satisfactorily render historical source material into full texts.
- Precise searches and statistical evaluations of historical texts require that their structural features (page numbers, footnotes, headings, etc.) first be annotated.
- In order for different digital tools to be used in synergy they must be interoperable.
In this project, funded through the GEI Seed funds competition, researchers from the Research Library and the Digital Information and Research Infrastructure (DIRI) department will work together to find solutions for these problem areas. They will employ use scenarios based on existing research into the origins of a textbook by Johann Friedrich Wiberg.
Aims
The aim of this project is to better support historical researchers by providing improved access to digital source material. Use cases will help the team develop processes and technology that will enable flexible curation, processing and expansion of the data in GEI Digital.
Methodology
The project works with a selection of historical educational media examples of varying provenance: works that already exist in a digitised format, with or without OCR-generated full texts, that are held by the GEI or other libraries, as well as works that have, to date, only been available in the analogue collections of the GEI or other libraries. The OCR4all software suite and LAREX will be used, and adapted if necessary, to optimise full text recognition and structural mark-up of the source materials. By creating the necessary interfaces, these tools and the appropriate quality assurance guidelines are to be made available to libraries and individual researchers as part of the Edumeres toolbox.
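As an illustration of what a quality check on OCR-generated full text might look like, the following minimal sketch computes a character error rate (CER) by comparing OCR output against a manually transcribed ground-truth line. The project description does not state which quality measures are used, so the choice of CER, the function names and the sample strings are all assumptions made for illustration; this is not the API of OCR4all or LAREX.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b
    (minimum number of insertions, deletions and substitutions)."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(ground_truth: str, ocr_output: str) -> float:
    """Edit distance divided by the length of the ground-truth text."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ground_truth, ocr_output) / len(ground_truth)


# Hypothetical sample line: a long s misread as "f" and a lost umlaut.
ground_truth = "Lesebuch für Schulen"
ocr_output = "Lefebuch fur Schulen"
print(f"CER: {character_error_rate(ground_truth, ocr_output):.2f}")
```

A measure of this kind could be applied to a small sample of manually corrected pages per work to decide whether the recognition model needs to be adapted before the whole work is processed.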
Results
The project provides a selection of optimally prepared data for users of the GEI infrastructures, as well as documentation and instructions for independent preparation or subsequent use.
The Research Library and the Digital Information and Research Infrastructure department are optimising and expanding their processes, their repertoire of methods and their cooperation with external libraries and digital humanities researchers in order to be prepared for digitisation projects planned for the future.