Project Details
OPERANDI: OCR-D PERFORMANCE OPTIMISATION AND INTEGRATION
Applicants
Zeki Mustafa Dogan, since 1/2024; Professor Dr. Ramin Yahyapour
Subject Area
Theoretical Computer Science
Term
since 2021
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 460609319
The goal of the project is to develop an OCR-D-based software package for mass digitisation that offers higher performance and better quality. The implementation will be available to other projects and institutions with similar requirements. During the pilot stage, two scenarios were identified. In the first scenario, image digitisation has already been completed, so OCR is a downstream process that follows mass digitisation. In the second scenario, digitisation has not yet started, and OCR is one step within the overall digitisation workflow. To serve both scenarios, the project will create a scalable high-performance system for mass digitisation. This system will run on a high-performance computer and support parallel processing. It will also provide data handling, task management and prioritisation, error handling, synchronous and asynchronous interprocess communication, load distribution, authentication and authorisation. The focus is on the parallel processing of performance-critical workflows and on the integration of the OCR-D software into the system. Finally, the requirements of the VD partner libraries, the SUB, the other projects from the third phase of OCR-D, and the Goobi/Kitodo community will be taken into consideration.
DFG Programme
Research data and software (Scientific Library Services and Information Systems)
Co-Investigator
Professor Dr. Philipp Wieder
Former Applicant
Professor Dr. Wolfram Horstmann, until 12/2023