Project Details

Development of a Repository for OCR Models and an Automatic Font Recognition tool OCR-D

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Early Modern History
Modern and Contemporary History
Theatre and Media Studies
Term from 2018 to 2020
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 394448308
 
The project addresses the problem of strongly fluctuating OCR recognition rates for historical prints from the 16th to 18th centuries, which limits the full-text digitization of material created by the VD16, VD17, and VD18 programs. Recognition models trained on modern corpora that lack the specifics of historical prints, or on historical material without thorough bibliographic analysis, yield lower recognition rates than the accuracy now routinely achieved for scans of modern prints.

Creating font-specific corpora by manual tagging is unrealistic: it requires non-trivial knowledge of printing history, and it would not scale. Because the task is repetitive, such an approach is also very error-prone. The project will enable the humanities to use OCR in a font-specific manner with limited effort. To achieve this, the project has three main objectives:

1. Development of an online training infrastructure that allows models to be trained for specific font groups and, at the same time, for different OCR software.
2. Development of a tool for the automatic recognition of fonts in digitizations of historical prints. First, an algorithm for recognizing fonts in incunabula is trained using the ground truth found in the Typenrepertorium der Wiegendrucke. In a second step, the fonts are grouped according to their similarity in order to obtain as few groups as possible while maintaining OCR accuracy.
3. Provision of a model repository in which the font-specific OCR models that have been developed are made available to the public.
DFG Programme Research data and software (Scientific Library Services and Information Systems)
Former Applicant Professor Gregory R. Crane, Ph.D., until 11/2019
