OCR4all libraries

Full-text recognition for historic collections

This cooperative project between the Research Library at the Leibniz Institute for Educational Media | Georg Eckert Institute (GEI), the Würzburg Zentrum für Philologie und Digitalität 'Kallimachos' (ZPD) and the Chair of Human-Computer Interaction (HCI) at the University of Würzburg is funded by the German Research Foundation (DFG) under its funding line for implementing OCR-D software for full-text digitisation (Implementierung der OCR-D-Software zur Volltextdigitalisierung). The seventeenth- and eighteenth-century textbooks digitised by the GEI’s Research Library serve as the project's use case. OCR quality varies widely across this digital collection, in part because complex layouts and irregular typography still present significant obstacles to high-quality text recognition.

To improve specific aspects of OCR quality, the project employs a generic process that allows full-text recognition to be organised by collection, with each collection grouping materials of a similar kind. OCR4all, the open source software developed by the ZPD, combines different solutions for optical character recognition into one standardised workflow. A graphical user interface enables all users, regardless of their technical expertise, to process complex materials independently and to a high standard of quality. To keep the increasingly complex OCR solution user-friendly, the graphical user interface will be adapted and further developed under the guidance of, and in close cooperation with, the HCI.

  • Publications

    • Anke Hertling, Sebastian Klaes (2022): Volltexterkennung für die Forschung: OCR partizipativ, iterativ und on demand. In: o-bib. Das offene Bibliotheksjournal. (in preparation)

Project Team
