Project Details
Unraveling the human 3p-ome regulation landscape at single-cell resolution
Applicant
Professor Dr. Andreas Gruber
Subject Area
Bioinformatics and Theoretical Biology
General Genetics and Functional Genome Biology
Term
since 2023
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 524608588
The 3’ end of transcripts is generated by endonucleolytic cleavage and polyadenylation (poly(A)), a process that is mediated by the so-called 3’ end processing complex. The complex interacts with specific sequence motifs that are found in the vicinity of so-called poly(A) sites, at which the nascent RNA is cleaved and subsequently polyadenylated. Alternative cleavage and polyadenylation (APA) of transcripts gives rise to 3’ end transcript isoforms that differ in their protein-coding sequence and/or their 3’ untranslated regions (3’UTRs). The latter harbor cis-regulatory elements that can regulate the stability, translation and localization of the transcript as well as the localization of the encoded protein. Accordingly, APA events play central roles in regulating cellular states, and dysfunctional 3’ end processing plays key roles in various diseases, including cancer as well as neurological, immunological and haematological diseases. In humans, the usage of 3’ end transcript isoforms varies substantially across tissues. In testis, ovary and embryonic stem cells, the expressed 3’UTRs are short, whereas the longest 3’UTRs are generated in neurons. The set of all 3’ end transcript isoforms present in an individual or a population of cells can be referred to as their ‘3p-ome’. Importantly, genome-scale sequencing of transcript 3’ ends has revealed that the majority of poly(A) sites are not represented in current gene annotations and that these not yet annotated transcript 3’ ends are much more tissue-specific than currently annotated 3’ ends. This is likely the case because previous transcript annotation efforts were largely based on bulk sequencing data of standard cell lines and a rather limited set of cell states and tissues. However, studying cell identity and function in particular requires a comprehensive gene model that covers tissue-specific transcripts.
Because current research is to a large extent ‘blind’ to cell-specific 3’ end transcript isoforms, the ambitious goal of this project is to largely complete the annotation of human 3’ end transcript isoforms and to characterize their expression patterns at single-cell resolution, fostering research on this fundamental molecular process. To this end, we will use thousands of single-cell sequencing datasets covering a large array of cell types to identify and annotate previously unknown 3’ end transcript isoforms. With this vastly extended 3’ end transcript isoform annotation in hand, we will search for previously unknown subpopulations of cells based on similarities in transcript isoform expression. We will further identify cell population-specific biomarkers and characterize the impact of isoform switching events on protein domain expression and on the presence of cis-regulatory elements encoded in the 3’UTRs of transcript isoforms.
DFG Programme
Research Grants