Project Details
Concepts, tools and support for managing, archiving, mobilizing and integrating taxonomic data
Applicants
Dr. Ivaylo Kostadinov; Professor Dr. Miguel Vences
Subject Area
Evolution, Anthropology
Term
since 2020
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 447018505
Taxonomy is becoming a big data science, with an increasing pace of digitization of many of the three billion specimens in collections, and the publication of millions of data entries for images and for phenotypic and genetic traits. In the first period of the Taxon-Omics priority program it became clear, and was suggested by the review panel and the DFG senate, that data management and integration is to become a crucial and central part of this consortium. In the first period of the SPP, PIs Renner and Vences worked with all projects to identify which kinds of data are generated and which archiving strategies are followed, organized two workshops, published a first opinion paper emphasizing the importance of data storage and data re-use in taxonomy, and undertook a vast literature survey to identify the requirements for data management in alpha taxonomy. Here we propose a targeted support project to accompany the second period of the SPP. It will ensure, from the very beginning, support for all SPP participants in conceptual and practical aspects of data management, data archiving and data integration, and will develop this important field further for the entire taxonomic community, including the practical implementation of tailored front-end solutions for submitting and searching specimen-based data packages in alpha taxonomy. For this purpose, we will improve and implement table-based submission templates for diverse taxonomy-related data, and develop a submission interface that automatically reads in this template and the submitted data package and performs an initial plausibility check to identify objective errors in the data, such as misspelled scientific names or file names. As a central task, we will advise and assist all projects within the SPP in the development and implementation of data management plans, and provide hands-on support for all SPP1991 projects during data submission to repositories, preferably via the GFBio portal.
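The kind of plausibility check described above can be illustrated with a minimal sketch. It is not the project's actual interface: the column names `scientific_name` and `image_file`, the function `check_submission`, and the short reference list of names are all assumptions made for illustration. A real check would presumably compare names against a curated checklist or a taxonomic name service.

```python
import csv
import difflib
import io

# Hypothetical reference list of accepted names, for illustration only;
# a real check would use a curated checklist or a name service.
KNOWN_NAMES = ["Mantella aurantiaca", "Mantella baroni", "Boophis viridis"]


def check_submission(template_text, package_files, known_names=KNOWN_NAMES):
    """Return a list of (row_number, message) plausibility warnings.

    template_text: contents of a table-based template in CSV form, with the
                   assumed columns 'scientific_name' and 'image_file'.
    package_files: names of the files actually present in the data package.
    """
    warnings = []
    files = set(package_files)
    reader = csv.DictReader(io.StringIO(template_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        # Flag names that are not in the reference list and, if a similar
        # accepted name exists, suggest it as a likely correction.
        name = row["scientific_name"].strip()
        if name not in known_names:
            close = difflib.get_close_matches(name, known_names, n=1, cutoff=0.8)
            hint = f"; did you mean '{close[0]}'?" if close else ""
            warnings.append((row_number, f"unknown scientific name '{name}'{hint}"))
        # Flag file names listed in the template but missing from the package.
        file_name = row["image_file"].strip()
        if file_name and file_name not in files:
            warnings.append(
                (row_number, f"file '{file_name}' not found in data package")
            )
    return warnings
```

Called with the template text and the list of files in the package, the function returns one warning per suspicious cell, so that a submitter could correct the template before the data are forwarded to a repository. The similarity cutoff of 0.8 is an arbitrary choice for this sketch.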
As part of the networking and data integration activities within this project, we will also collect wet lab as well as analytical work protocols from the projects and, with the participation of the collaborating projects, compile and publish these as open-access primers and best-practice recommendations. In connection with a work package carried over from the first project phase, we will furthermore initiate the development of machine-learning software tools that mirror the actual workflow of integrative taxonomy. These tools are to combine probabilistic species delimitation based on genetic divergences with sympatry tests and coalescence-based approaches, and to handle the diversity of data used by taxonomists in this priority program.
DFG Programme
Priority Programmes