Extending the theory of Algebraic Dynamic Programming for applications in bioinformatics
Final Report Abstract
In the course of the project we investigated how algebraic dynamic programming can be extended for tool building in the field of RNA bioinformatics. Addressing research questions in biology frequently makes it necessary to develop error-free, high-performance bioinformatic applications within a short time-frame. The Algebraic Dynamic Programming framework saves developer time by enabling a precise formulation of the algorithm and avoiding the implementation of repetitive code. We have added several new contributions in the area of ADP with Bellman’s GAP and with introducing sparsity in ADP. Moreover we published multiple novel tools in the field of RNA bioinformatics. CARNA uses information about the whole secondary structure ensemble, instead of a single thermodynamically stable secondary structure to compute high-quality sequence-structure alignments. ExpaRNA-P uses a data-driven sparsification approach to enable matching of secondary structure motifs for the whole structure ensemble of two RNAs. SPARSE is a sequence-structure alignment tool, based on LocARNA, that provides a substantial speed-up by using the sparsification approach of ExpaRNA-P. RNAscclust identifies homologous sequences in a set of input orthologs by clustering them with a graph-kernel approach. It can leverage the speedup introduced in SPARSE to align and cluster hundreds of potentially homologous sequences, allowing to annotate large transcriptomic datasets. We plan to use the novel extensions in the ADPfusion framework for future projects together with our collaboration partners in Leipzig Bioinformatics group. The availability of ADP-framework aids the development of novel tools in the field of bioinformatics. We learned that for gaining the best performance in practical scenarios, the DP improvements need to be combined with efficient algorithms at the top level of the analysis. For example, in the case of RNAscclust where we opted for a graph-kernel machine learning based approach to decide where to apply dynamic programming algorithms.
Publications
- Sparsification in algebraic dynamic programming. In Proceedings of the German Conference on Bioinformatics (GCB 2011), 2011
Mathias Möhl, Christina Schmiedl, and Shay Zakov
- CARNA - alignment of RNA structure ensembles. Nucleic Acids Res, 40(W1):W49–W53, 2012
Dragos A. Sorescu, Mathias Möhl, Martin Mann, Rolf Backofen, and Sebastian Will
(See online at https://doi.org/10.1093/nar/gks491) - Exact pattern matching for RNA structure ensembles. In Proceedings of the 16th International Conference on Research in Computational Molecular Biology (RECOMB 2012), volume 7262 of LNCS, pages 245–260. Springer-Verlag, 2012
Christina Schmiedl, Mathias Möhl, Steffen Heyne, Mika Amit, Gad M. Landau, Sebastian Will, and Rolf Backofen
(See online at https://doi.org/10.1007/978-3-642-29627-7_27) - Bellman’s GAP–a language and compiler for dynamic programming in sequence analysis. Bioinformatics, 29(5):551–60, 2013
Georg Sauthoff, Mathias Möhl, Stefan Janssen, and Robert Giegerich
(See online at https://doi.org/10.1093/bioinformatics/btt022) - ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs. BMC Bioinformatics, 15(1):6602, 2014
Christina Otto, Mathias Möhl, Steffen Heyne, Mika Amit, Gad M. Landau, Rolf Backofen, and Sebastian Will
(See online at https://doi.org/10.1186/s12859-014-0404-0) - SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics. Bioinformatics, 31(15):2489– 2496, 2015
Sebastian Will, Christina Otto, Milad Miladi, Mathias Möhl, and Rolf Backofen
(See online at https://doi.org/10.1093/bioinformatics/btv185) - RNAscClust: clustering RNA sequences using structure conservation and graph based motifs. Bioinformatics, 33(14):2089–2096, 2017
Milad Miladi, Alexander Junge, Fabrizio Costa, Stefan E. Seemann, Jakob Hull Havgaard, Jan Gorodkin, and Rolf Backofen
(See online at https://doi.org/10.1093/bioinformatics/btx114)