Project Details

Diverse storage and applications - Optimizing a sustainable repository for audiovisual research data

Subject Area General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Term since 2023
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 519472872
 
Audiovisual research data play an increasingly important role in numerous humanities disciplines, including linguistics, musicology, ethnology, and oral history. At the University of Cologne, the Data Center for the Humanities (DCH), the Institute for Linguistics (IfL), and the Regional Computing Center (RRZK) have cooperated closely for years on linguistic research data. One result of this collaboration is the Cologne Center for Archiving and Analysing of Audiovisual Data (KA³). The KA³ repository is the technical core and main offering of the center. The Language Archive Cologne (LAC) uses the services provided by the KA³ repository for annotated audiovisual language data. The repository stores audiovisual research data in a structured way using the Oxford Common File Layout (OCFL). Currently, 1.3 TB of data have been uploaded, with another 3 TB in various stages of the ingest pipeline. In the medium term, 0.5 to 1 TB of new data per year is expected. The goal of this project is to investigate how robustly OCFL works across different storage technologies and in interaction with audiovisual research data, and thus whether it provides a sustainable data structure for this data type and its associated annotations. The key concepts here are storage diversity and application diversity. Storage diversity: At university IT centers, such as that of the University of Cologne, a new storage system appears approximately every five to ten years, each with its own specific possibilities and restrictions for the storage of research data (quota, mapping of hierarchies, etc.). Application diversity: The information relevant for the sustainable availability of research data is managed and accessed over time by very different applications (repositories, subject-specific portals, etc.). Each migration from one technical system to another carries the risk of losing information.
The main goal of the project described here is to show that the combination of OCFL and object storage is an efficient approach to counteract the aging process of technical systems. To validate this approach, data from the application areas of oral history, linguistics, and anthropology will be transferred to the system, and an evaluation will show whether the data have survived the change of technical system undamaged and with their functionality intact. A major part of the scientific work plan relates to the evaluation of technical solutions for specific data needs in disciplines working with audiovisual data.
DFG Programme Research data and software (Scientific Library Services and Information Systems)
 
 

