A major goal is to produce a multiple sequence alignment (i.e. a series of homology statements) that agrees well with a known secondary structure, using time-saving algorithms and programs while sacrificing as little precision as possible. While motif-finding software does this to some degree, its underlying goal is not always phylogenetic analysis of the end result: for example, it may trade off pattern-finding utility against confidence in homology, or the output may not be formatted in a manner usable by tree-reconstruction utilities. Also, structure-based motif- and pattern-finding software may effectively handle only a small number of nucleotides (e.g. 20-200), far fewer than required for the rRNA regions commonly used in phylogenetic analyses (e.g. V4 of 18S or D2-D3 of 28S). There are presently very few (if any) single programs that are able to produce a multiple sequence alignment and consensus structure mask from unaligned sequences with the ease of producing an alignment from programs like ClustalW. This page highlights some of the ongoing efforts that might lead (sometimes indirectly) to this goal, and will hopefully ultimately include our own efforts in this regard. Our general approach will be to integrate existing tools via script-based interfaces.
All links below are to external pages/projects.
Alignment of multiple sequences (>2)
This page by Ian Holmes is a good starting point.
Clustal can be used to align sequences to an existing alignment, and can use a mask statement. Here is how.
Cutting edge. Uses a scripted approach to tie together a number of new and existing programs.
While HMMER does not take secondary structure information into account per se, training the HMMER model with an existing secondary-structure-based alignment is an effective way to obtain a first-pass estimate of the structure of unaligned sequences.
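The HMMER workflow described above can be sketched as two commands: build a profile from the existing structure-based alignment, then align the new sequences to that profile. The sketch below is only an illustration and assumes the HMMER 3 command-line tools (hmmbuild and hmmalign) are installed; the function names and all file names are hypothetical, and the reference alignment is assumed to be in a format hmmbuild can read (e.g. Stockholm).

```python
import subprocess


def hmmer_commands(reference_alignment, unaligned_fasta,
                   model="reference.hmm", output="aligned.sto"):
    """Return the two command lines for a build-then-align pass.

    All file names are hypothetical placeholders.
    """
    # Step 1: train a profile HMM on the existing structure-based alignment.
    build = ["hmmbuild", model, reference_alignment]
    # Step 2: align the unaligned sequences to that profile; -o names the output file.
    align = ["hmmalign", "-o", output, model, unaligned_fasta]
    return [build, align]


def run_hmmer(reference_alignment, unaligned_fasta):
    """Run both steps in order, stopping if either command fails."""
    for command in hmmer_commands(reference_alignment, unaligned_fasta):
        subprocess.run(command, check=True)
```

Calling run_hmmer("reference.sto", "new_sequences.fasta") would run both steps. The resulting alignment inherits its columns from the reference alignment, which is why it can serve only as a first pass at the structure.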
In practice it functions much like HMMER, but it also takes covariance information into account. See here.
Bayesian multiple alignment.
Promising, but note the size restrictions: "The max. sequence length should not exceed 500 nt. The number of sequences depends on sequence lengths. The total sum of all sequence lengths is restricted to 10000 nt, e.g., you can paste 20 sequences with lengths of 500 nt each, or 100 sequences with lengths of 100 nt, each."
Consensus structures and others
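Because these limits are simple arithmetic, a data set can be checked before it is pasted into the server. The following is a minimal sketch using only the two numbers quoted above (500 nt per sequence, 10000 nt in total); the function name and the dictionary-of-sequences input are assumptions made for illustration, not part of the server's interface.

```python
MAX_SEQ_LEN = 500      # per-sequence limit quoted above (nt)
MAX_TOTAL_LEN = 10000  # limit on the summed length of all sequences (nt)


def check_size_limits(sequences):
    """Return a list of problems; an empty list means the input fits.

    `sequences` maps a sequence name to its nucleotide string.
    """
    problems = []
    # Check each sequence against the per-sequence limit.
    for name, seq in sequences.items():
        if len(seq) > MAX_SEQ_LEN:
            problems.append(f"{name}: {len(seq)} nt exceeds {MAX_SEQ_LEN} nt")
    # Check the summed length against the overall limit.
    total = sum(len(seq) for seq in sequences.values())
    if total > MAX_TOTAL_LEN:
        problems.append(f"total {total} nt exceeds {MAX_TOTAL_LEN} nt")
    return problems
```

With this check, 20 sequences of 500 nt each pass, whereas a 21st sequence of the same length pushes the total to 10500 nt and is reported.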
From the homepage: "comRNA ... predicts common RNA secondary structure motifs in a group of related sequences, developed by Yongmei Ji, Xing Xu and Gary Stormo at Washington University in Saint Louis."