Project Details

A Flexible and Efficient System for the Detection of RNA Sequence/Structure Motifs

Co-Applicant Professor Gad Landau
Subject Area Theoretical Computer Science
Bioinformatics and Theoretical Biology
Term from 2009 to 2022
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 110058642
 
Non-coding RNAs (ncRNAs) are involved in many regulatory processes of the cell. Whole-transcriptome analyses based on next-generation sequencing (NGS) reveal an overwhelming number of ncRNAs, most of which are not annotated. The functional analysis of ncRNAs relies heavily on sequence-structure similarities. Due to their high computational complexity, however, tools for finding sequence-structure similarities are currently not used for annotating newly identified ncRNAs. Current computational genome-wide ncRNA analysis often consumes enormous computing resources (tens to hundreds of computer years). Our goal is to set up a system for analyzing and annotating ncRNAs that consists of a set of efficient algorithms and tools for detecting sequence-structure similarity. We will provide an easy-to-use web-based interface that allows biologists to perform ncRNA annotation tasks and to analyze an ncRNA in its genomic context using personalized genome browser tracks. The combined system will support two major tasks: (1) To search other NGS data or ncRNA databases for annotated ncRNAs or ncRNA transcripts that are structurally similar to a newly detected ncRNA. We will consider both structured small ncRNAs and long non-coding RNAs (lncRNAs), for which globally conserved structure has not yet been found. (2) To cluster a set of new ncRNAs in order to determine structural classes, which is a prerequisite for the functional annotation of new ncRNA classes. In particular, this involves a global clustering of complete transcripts. We will also work on the problem of local clustering, based on local alignments, to find regulatory motifs that are embedded in longer transcripts (e.g., cis-regulatory elements such as the iron response element (IRE) or the internal ribosome entry site (IRES)). This problem is currently hard to solve with automated tools.
Our first major objective is to directly improve the efficiency and quality of our sequence-structure alignment approach by using advanced algorithmic techniques. Currently, the best exact algorithmic approaches are not efficient enough for routinely scanning for the hundreds (if not thousands) of ncRNAs typically found in transcriptome data. To make our tools applicable in practice, and thus to fulfill the needs of our cooperation partners, we have to design fast and sensitive filters that significantly reduce the number of expensive sequence-structure comparisons. Previous approaches used sequence-based filtering, which only works for ncRNAs with high sequence similarity. It is known, however, that conserved ncRNAs may have very low sequence conservation. Therefore, our second major objective is to develop fast sequence-structure-based filtering methods based on our efficient graph-kernel approach.
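The limitation of sequence-based filtering described above can be illustrated with a small sketch. This is not the project's graph-kernel filter; it is a toy k-mer filter written only for illustration, and all names (`kmer_jaccard`, `passes_sequence_filter`, `forms_hairpin`), the two example sequences, and the threshold of 0.3 are assumptions. The two sequences fold into the same hairpin shape (five base pairs closing a four-nucleotide loop), yet compensatory mutations leave them with almost no shared k-mers, so a purely sequence-based filter would discard the pair before any sequence-structure comparison is run.

```python
# Toy illustration: two RNAs with the same stem-loop shape but little
# sequence similarity. All names and parameters here are hypothetical.

# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def kmers(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_jaccard(a, b, k=3):
    """Jaccard similarity of the k-mer sets of two sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def passes_sequence_filter(query, candidate, k=3, threshold=0.3):
    """Sequence-only prefilter: keep the candidate if k-mer overlap is high."""
    return kmer_jaccard(query, candidate, k) >= threshold


def forms_hairpin(seq, stem_len, loop_len):
    """Check that the first stem_len bases pair with the last stem_len bases."""
    if len(seq) != 2 * stem_len + loop_len:
        return False
    return all((seq[i], seq[-1 - i]) in PAIRS for i in range(stem_len))


# Same shape (5 bp stem, 4 nt loop), different sequence.
query = "GGGAC" + "UUCG" + "GUCCC"
candidate = "CCAUG" + "GAAA" + "CAUGG"

same_shape = forms_hairpin(query, 5, 4) and forms_hairpin(candidate, 5, 4)
kept = passes_sequence_filter(query, candidate)
print(same_shape, kept, round(kmer_jaccard(query, candidate), 2))
```

With these inputs both sequences satisfy the hairpin check while the k-mer filter rejects the pair, which is the situation a structure-aware filter is meant to handle.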
DFG Programme Research Grants
International Connection Israel
 
 

