Towards the use of morphological characters based on RNA secondary structure in revisionary systematics

Yoder, Matthew, Texas A&M University, College Station, TX, mjyoder [at] tamu.edu; Gillespie, Joseph J., pvittata [at] hotmail.com


Abstract

    The secondary structure of commonly sequenced mitochondrial and nuclear rRNA-encoding genes can be informative both during multiple-sequence alignment and during the subsequent phylogenetic analysis of these alignments.  Use of secondary structure in these cases is analogous in many ways to the definition of morphological characters as commonly practiced in most revisionary cladistic works.  Parallels between traditional morphological characters and characters based on the secondary structure of rRNA molecules are examined.  Example alignments and the tools to parse, characterize, and analyze them are provided.

Overview
    [Figure 1]
    Molecular-based phylogenies are now routinely used to make revisionary-systematic decisions, both under the traditional Linnean nomenclatural system and under more recently proposed systems such as the Phylocode.  Character data supporting these phylogenies are usually derived from nucleotide (DNA/RNA) or amino acid (protein) sequences and are often combined with morphological, behavioral, and other data.  Recently, important steps have been made towards characterizing the secondary structure of popularly used rRNA molecules, such as the 28S D2 and D3 expansion segments (Gillespie et al., submitted: see greyed area of Fig. 1) and the variable region 4 of the 18S rRNA (Wuyts et al., 2000).  These predicted structures of the more variable regions of rRNA extend over a decade of refinements to structure predictions for the mitochondrial 16S and 12S genes and the nuclear 28S, 18S, 5.8S and 5S genes (e.g., Cannone et al. 2002).  In a phylogenetic context, these characterizations may be seen as a hybrid between DNA and morphological characters.  In practice, both traditional single-column (nucleotide) characters and multi-nucleotide structures defined as single states may be derived from multiple-sequence alignments that are based on secondary structure models.
     Two types of multi-nucleotide characters can be derived from structural alignments:
  1. Those that represent novel structures unique to a particular taxon (as illustrated in Results, 1-3).
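  2. Those that represent variations, across an alignment, in the length or composition of a defined homologous region, for example regions of alignment ambiguity (RAAs), regions of slipped-strand compensation (RSCs), or regions of expansion and contraction (RECs) (as discussed in Result 4).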
    Characters of type 1 may be re-coded as presence/absence or multi-state characters, with each state representing a "fixed" morphological form.  Characters of type 2 are somewhat akin to continuous characters.  They can be coded as fixed states for fragment-level analyses (e.g. in either POY or INAASE).  Both types of characters are usable (see phylogeny) in phylogenetic reconstruction and therefore, ultimately, in taxonomic revision.  Characters based on structure (see Results 1-3) are perhaps more intriguing since: 1) when corroborated, they suggest that a "manual" approach to sequence alignment for rRNA is justified; and 2) they represent "complex" characters, which traditionally are thought to be less likely to represent convergences or parallelisms (homoplasy).
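    As a rough illustration of the two coding schemes, the following Python sketch scores a type 1 character as presence/absence and a type 2 character by ungapped region length.  The function names, taxon names, and sequences are hypothetical and are not taken from the alignments or scripts described here; the gap symbols are likewise an assumption.

```python
def code_presence_absence(taxa_with_structure, all_taxa):
    """Type 1: score a novel structure as present (1) or absent (0) per taxon."""
    return {taxon: int(taxon in taxa_with_structure) for taxon in all_taxa}


def code_region_length(region_by_taxon, gap_chars="-."):
    """Type 2: score the ungapped length of a homologous region per taxon."""
    lengths = {}
    for taxon, region in region_by_taxon.items():
        # Count only real bases; gap symbols (assumed to be '-' or '.') are skipped.
        lengths[taxon] = sum(1 for base in region if base not in gap_chars)
    return lengths


# Hypothetical data, invented purely for illustration.
all_taxa = ["taxon_A", "taxon_B", "taxon_C"]
helix_present = {"taxon_A", "taxon_B"}
region = {"taxon_A": "AUU--G", "taxon_B": "AUUCAG", "taxon_C": "A----G"}

presence = code_presence_absence(helix_present, all_taxa)
lengths = code_region_length(region)
```

    Length is only one possible summary of a type 2 region; composition could be scored in a similar way, and how such values are best binned into discrete states is left open here.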
   

Materials and Methods
    RNA sequences from Galerucinae (Coleoptera: Chrysomelidae), Evaniidae (Hymenoptera: Evanioidea), Ichneumonoidea (Hymenoptera) and Strepsiptera were aligned following a method modified from Kjer (1995).  The end product (e.g. Ichneumonoidea) is a modified Nexus-legal matrix and an associated block index (e.g. Ichneumonoidea).  To facilitate analysis of "type 2" characters (Result 4), Perl scripts (Jrna.pm) were written that parse the alignment and return a range of reports and input files (see output formats) for subsequent use with popular phylogenetic analysis programs.  Methods are being further formalized at the Jrna homepage.
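    The kind of parsing involved can be sketched as follows.  This is a minimal Python illustration, not the Jrna.pm code itself: it assumes that each variable block in an aligned row is enclosed in square brackets and that gaps are written as "-", and the function name and example row are hypothetical.

```python
import re

# Assumed convention: a block is any run of characters between '[' and ']'.
BLOCK_PATTERN = re.compile(r"\[([^\[\]]*)\]")


def extract_blocks(aligned_row, gap_chars="-"):
    """Return the ungapped content of each bracketed block in one aligned row."""
    blocks = []
    for match in BLOCK_PATTERN.finditer(aligned_row):
        content = match.group(1)
        # Drop gap symbols and whitespace so only the bases in the block remain.
        cleaned = "".join(
            ch for ch in content if ch not in gap_chars and not ch.isspace()
        )
        blocks.append(cleaned)
    return blocks


# Hypothetical aligned row with three bracketed blocks.
row = "GGCU[AU--A]CCGA[----]UCGG[GCA-U]"
blocks = extract_blocks(row)
```

    A block consisting only of gaps is returned as an empty string, so that the position of every block in the row is preserved for later comparison across taxa.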


Results

    Four results, exemplifying both types of characters discussed above, are presented.  Results 1-3 represent "type 1" characters, while Result 4 represents a "type 2" character.

1. Figure 2 illustrates (in blue) several major modifications to the expansion segment D2 of the 28S rRNA for myrmecolacid Strepsiptera.  Red bases represent substitutions between populations of Caenocholax fenyesi from Texas and Mexico.  Gross modifications on this scale have rarely been found to date.
[Figure 2]
2. Figure 3 illustrates a novel helix, 2c-2, apparently synapomorphic for a clade of basal evaniids.  Though phylogenetic analyses are ongoing, preliminary evidence suggests that this character supports a putatively monophyletic clade that is further supported by wing venation characters (Deans, pers. comm.).  This structure is not known in other insects.
[Figure 3]
3. Figure 4 illustrates a region of slipped-strand compensation that results in a novel structure for a clade of Acalymma spp. sensu stricto (striped cucumber beetles).  Here regions of slipped-strand compensation bounded by helices 2e and 2f form in two unique ways (represented by blue and green boxes).  A deletion in RSC (1’) in Acalymma results in a distinct structure not found in any other sampled diabroticine beetles (Gillespie et al. 2004b).
[Figure 4]
4. For a fourth alignment, pertaining to ichneumonoids (Gillespie, Yoder, and Wharton, submitted), characters derived from differences in composition and length for RAAs, RSCs, and RECs (bracketed blocks in the alignment) were analyzed under a parsimony framework in POY and PAUP* following recoding in INAASE (results not shown).  Partial results of these analyses are found in figure 10b (note in particular the white squares, which represent branches supported when the content of each block for each taxon was treated as a single character).  While topologies derived from these characters were generally less strongly supported (bootstraps, consensus topologies), these characters were found to support numerous clades recovered using alignments not based on secondary structure.
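    One simple way to treat the content of a block as a single character is sketched below in Python.  It assigns the same state to taxa with identical block content and a new state to each distinct content.  This is an illustrative assumption about what such a recoding could look like, not a description of how INAASE or POY recode or weight these regions; the function name and data are hypothetical.

```python
def recode_block(contents_by_taxon, missing="?"):
    """Map each distinct block content to a state number, by first appearance."""
    state_of = {}  # block content -> state number
    coded = {}     # taxon -> state symbol
    for taxon, content in contents_by_taxon.items():
        if content is None:
            # Block not sequenced for this taxon: score as missing data.
            coded[taxon] = missing
            continue
        if content not in state_of:
            state_of[content] = len(state_of)
        coded[taxon] = str(state_of[content])
    return coded, state_of


# Hypothetical block contents for four taxa.
block = {"taxon_A": "AUA", "taxon_B": "AUA", "taxon_C": "AUUA", "taxon_D": None}
coded, state_of = recode_block(block)
```

    States produced this way are unordered and carry no information about how similar two block contents are; any step costs between states would have to be supplied separately.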

Discussion
    Producing a "secondary-structure based" alignment is only possible through a long iterative process of "checking and rechecking" the data, a process originally identified by Hennig (1950) as critical to recovering and subsequently testing phylogenetic signal from morphological characters.  The underlying process of deriving multiple-sequence alignments in this manner is therefore in many ways equivalent to the exploration of "traditional" morphological character suites.  For morphological characters, each iteration of this process tends to further clarify concepts of homology.  We argue that the same fundamental process can be applied to rRNA sequence data.
    If the full breadth of potential phylogenetic information is to be extracted from rRNA sequences, then approaches beyond those most frequently used to analyze these variable regions (optimization alignment in POY, or gap-cost methods as implemented in Clustal) are required.  The common approach, which broadly delimits conserved and hypervariable regions in rRNA molecules, excludes the latter while submitting the former to algorithmic alignment.  This approach explicitly eliminates phylogenetic signal embodied in the morphological restrictions inherent in secondary structure.  The extreme opposite approach (manual alignment "by-brain" according to the method described here) is not without problems.  Alignments resulting from this approach are sometimes critiqued as "subjective", and their phylogenetic utility has not been fully explored, in part due to the relative paucity of methods available for their analysis.  The continued formalization of the alignment process, and the exploration of the phylogenetic utility of the end-product alignment, under the recognition that some approaches may be shown to be untenable, is one of the primary goals of the Jrna web pages.
    With respect to revisionary systematics, one major advantage granted by the "by-brain" approach is the potential to tie synapomorphies representing complex characters to a given clade.  This practice is not currently available via methods based on algorithmic alignments.  Revisions tend to start by identifying monophyletic groups and treating the members therein.  In this light, characters (synapomorphies) that can be hypothesized as supporting the monophyly of the in-group are of particular value.  While topologies alone (e.g. those from distance, minimum evolution, or likelihood methods) can be used to define the in-group to be revised, they do not inherently provide hypotheses of synapomorphy at each node.  Successful revisionary approaches depend on both topologies and the hypothesized synapomorphies associated with them.  As such, we believe novel methods that can generate putative synapomorphies are always welcome.
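    To make the idea of tying a character to a clade concrete, the Python sketch below applies one deliberately strict criterion: a structural character is flagged as a candidate synapomorphy for a clade if its derived state occurs in every member of the clade and in no taxon outside it.  The function name, state symbols, and data are hypothetical, and the check ignores reversals, missing data, and character optimization on the tree.

```python
def is_candidate_synapomorphy(states, clade, derived="1"):
    """True if the derived state is found in all clade members and nowhere else."""
    clade = set(clade)
    inside = [state for taxon, state in states.items() if taxon in clade]
    outside = [state for taxon, state in states.items() if taxon not in clade]
    return (
        bool(inside)                                 # clade must have scored members
        and all(state == derived for state in inside)
        and all(state != derived for state in outside)
    )


# Hypothetical presence/absence scores for a novel helix.
helix = {"taxon_A": "1", "taxon_B": "1", "taxon_C": "0", "taxon_D": "0"}

supports_ab = is_candidate_synapomorphy(helix, {"taxon_A", "taxon_B"})
supports_abc = is_candidate_synapomorphy(helix, {"taxon_A", "taxon_B", "taxon_C"})
```

    In this invented example the helix would be a candidate synapomorphy for the clade of taxon_A and taxon_B, but not for the larger group that also includes taxon_C.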


References Cited
    Full citations for references cited can be found here.  Note that not all references listed there are cited herein.

Acknowledgements
     This project was supported in part by NSF PEET grants DEB-0328922, DEB-0328920, and DEB-9978150.  Special thanks to Andy Deans for providing the evaniid data.  Numerous helpful editorial comments were provided by Amy Bader and Robert Wharton, both TAMU.