Project Details

OPERANDI: OCR-D PERFORMANCE OPTIMISATION AND INTEGRATION

Subject Area Theoretical Computer Science
Term since 2021
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 460609319
 
The goal of the project is to develop an OCR-D-based software package for mass digitisation that delivers higher performance and better quality. The implementation will be available to other projects and institutions with similar requirements. During the pilot stage, two scenarios were identified. In the first scenario, image digitisation has already been completed, so OCR is a downstream process that follows mass digitisation. In the second scenario, digitisation has not yet started, and OCR is one step in the overall digitisation workflow. To serve both scenarios, the project will create a scalable, high-performance system for mass digitisation. This system will run on a high-performance computer and support parallel processing. It will also include supporting features such as data handling, task management and prioritisation, error handling, synchronous and asynchronous interprocess communication, load distribution, authentication and authorisation. The focus is on the parallel processing of performance-critical workflows and on the integration of the OCR-D software into the system. Finally, the requirements of the VD partner libraries, the SUB, the other projects from the third phase of OCR-D, and the Goobi/Kitodo community will be taken into consideration.
DFG Programme Research data and software (Scientific Library Services and Information Systems)
Former Applicant Professor Dr. Wolfram Horstmann, until 12/2023
 
 
