Project Details
Large-scale investigation of short read archives for disruption of transcription termination and circular splicing
Applicant
Professor Dr. Caroline Friedel
Subject Area
Bioinformatics and Theoretical Biology
Term
from 2019 to 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 417339504
We and others recently reported that lytic herpes simplex virus 1 (HSV-1) infection, as well as salt and heat stress, lead to a disruption of transcription termination (DoTT) for the majority of human genes. This results in massive transcription downstream of genes (DoG transcription). Stress-induced transcriptional changes have already been studied intensively, so it is surprising that DoTT/DoG transcription was discovered only so recently. The most likely reason is that standard RNA-seq analysis focuses only on known transcripts and thus ignores transcription outside of annotated genes. However, most journals now require that raw sequencing data be submitted to short read archives such as the NCBI SRA before a manuscript is published. Thus, even if the original studies did not investigate DoTT/DoG transcription, the raw data is available for reanalysis. The sheer size of the RNA-seq data in short read archives previously made a large-scale reanalysis of all published data prohibitive, but recent developments now make it feasible. On the one hand, these developments include novel alignment-free methods for short read indexing and searching, as well as for fast transcript quantification. On the other hand, the recount2 database now provides genome-wide read coverage and splice junction data for the majority of human RNA-seq experiments in the SRA. In this project, we propose to leverage these new tools to address specific questions arising from our previous analysis of HSV-1 infection and heat and salt stress. First, we plan to characterize the prevalence and similarity of DoTT/DoG transcription across different conditions and its conservation between species. Second, we will study the de novo production of circular RNAs and investigate their induction relative to linear RNAs in different conditions.
For this purpose, we will reanalyze the recount2 data to quickly characterize DoTT/DoG transcription across a wide range of human conditions. We will combine this with alignment-free approaches to extend the analysis to the remaining SRA data and to circular splicing. In summary, this project not only aims to address important biological questions on DoTT/DoG transcription and circular splicing in an SRA-wide manner, but will also serve as a proof of concept that the large-scale investigation of SRAs for specific transcription events is both feasible and useful.
DFG Programme
Research Grants