contact

Matt Yoder and Joe Gillespie are responsible for these pages.

citing this project

Cite these web pages as: Yoder, M., and J. Gillespie. 2004-present. jRNA. Exploring insect phylogeny using RNA secondary structure. Web pages at http://purl.org/NET/rRNA/jRNA.

Cite use of Jrna.pm as: Yoder, M., and J. Gillespie. 2004-present. Jrna, Perl scripts for use in analyses and documentation of secondary-structure based multiple sequence alignments. Available at http://purl.org/NET/rRNA/jRNA.

using Jrna.pm

Perl users should note that Jrna.pm does not yet conform to the standard Perl module format. The code is fairly well commented for those interested in how it works. Anyone starting to use this approach should note that the jRNA scripts are no longer being actively developed, though in some cases they provide functionality not yet replicated elsewhere (e.g. web-page generation). Contact Matt regarding access to the new Psy scripts.

guide

James Munro has spent a lot of time working with the scripts. He has summarized his experience in a guide that is available as a PDF. Thanks, James!

getting ready

To run the scripts, at minimum you'll have to have done the following:

  1. Install Perl. Chances are you already have it installed; if not, many websites describe how to do this (see links).
  2. Install the Statistics::Descriptive module from CPAN.
  3. Test your system and make sure you can execute simple scripts that do not use Jrna.pm.
  4. If you're going to use the web-output functionality (recommended), download the CSS style folder and place it in the directory you will run the scripts from (hereafter the "home directory").
  5. Download the Jrna.pm and Jrna_objs.pm files here and place them in the home directory. You may have to configure your PATH variable. Check for new versions frequently; the code is developing rapidly. The version number is given at the head of the Jrna.pm file. NO GUARANTEE OF BACKWARDS COMPATIBILITY IS MADE.

input files

Two input files are required: an alignment and an index defining the helices.

  1. The alignment file. See here and this example. Note also the following:
    • All files must be text files with Unix or DOS line endings. It appears that only Unix line endings are supported on OS X at this time; this should be fixable in the future.
    • The alignment begins with a line containing "matrix" and nothing else, and ends with a line containing ";" and nothing else. It is Nexus-compatible.
    • The matrix may be broken into any number of interleaves; the taxon name must be given in each interleave. Each interleave gets its own HTML page in the reports, so breaking the data into logical interleaves makes the HTML pages easy to navigate.
    • NO tabs are allowed anywhere between the "matrix" and ";".
    • Three lines are required above each interleave:
      1. a [Block_ ] line. The "Block_" is required; everything else is optional.
      2. a [ some comments here ] line. This is currently required only as a space filler and will likely be deprecated eventually.
      3. a [mask ... ] line. This is the Phase mask line. See the Phase manual.
    • Blocks of data are delimited by whitespace (spaces, NOT tabs). All taxa must have exactly the same number of delimited blocks.
    • Variable length blocks (generally unaligned) are delimited with brackets.

  2. The helix index file (see example). This is a text file with two columns: the first contains the block index number, starting at zero; the second contains the helix name. The two halves of a stem are distinguished by a single character, the single quote (e.g. myhelixname and myhelixname'). Blocks need not be named; unnamed blocks take "?" as a filler.
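The layout rules for the alignment file can be checked before the scripts are run. What follows is a minimal Python sketch, not part of Jrna.pm (which is Perl), and the function name `check_matrix` is hypothetical. It assumes that each interleave begins at its [Block_ ] line, that blank lines may separate interleaves, and that each taxon line is the taxon name followed by its space-delimited blocks.

```python
def check_matrix(text):
    """Return a list of layout problems in a jRNA-style alignment (hypothetical helper)."""
    lines = text.splitlines()  # handles both Unix and DOS line endings
    stripped = [ln.strip() for ln in lines]
    try:
        start = stripped.index("matrix")
        end = stripped.index(";", start + 1)
    except ValueError:
        return ["missing 'matrix' line or closing ';' line"]

    problems = []
    body = lines[start + 1:end]
    # No tabs are allowed anywhere between "matrix" and ";".
    for lineno, ln in enumerate(body, start=start + 2):
        if "\t" in ln:
            problems.append(f"line {lineno}: tab character")

    # Assumption: each interleave begins at its [Block_ ] line.
    interleaves = []
    for ln in body:
        s = ln.strip()
        if not s:
            continue
        if s.startswith("[Block_"):
            interleaves.append([s])
        elif interleaves:
            interleaves[-1].append(s)
        else:
            problems.append(f"text before the first [Block_ ] line: {s}")

    for i, rows in enumerate(interleaves):
        header, taxa = rows[:3], rows[3:]
        # Expected header lines: [Block_ ], [ comments ], [mask ... ].
        if len(header) < 3 or not header[1].startswith("[") or not header[2].startswith("[mask"):
            problems.append(f"interleave {i}: missing comment or mask line")
            continue
        # Every taxon must have the same number of space-delimited blocks.
        counts = {row.split()[0]: len(row.split()) - 1 for row in taxa}
        if len(set(counts.values())) > 1:
            problems.append(f"interleave {i}: unequal block counts {counts}")
    return problems
```

An empty list means none of these particular rules was broken; it does not guarantee that Jrna.pm will accept the file.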
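The helix index convention (block number, helix name, a trailing single quote marking the second half of a stem, "?" for unnamed blocks) can be read and paired up as in the Python sketch below. It is illustrative only, not the Jrna.pm parser; the helper names are hypothetical, and it assumes the two columns are separated by whitespace. Following the stems report, the non-prime name is treated as the half where the stem starts.

```python
def read_helix_index(text):
    """Parse a two-column helix index: block number (from 0) and helix name."""
    names = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        block, name = line.split(None, 1)  # assumption: whitespace-separated columns
        names[int(block)] = name.strip()
    return names


def pair_stems(names):
    """Pair each non-prime half with its primed partner, e.g. H1 with H1'."""
    by_name = {name: block for block, name in names.items() if name != "?"}
    return {
        name: (block, by_name[name + "'"])
        for name, block in by_name.items()
        if not name.endswith("'") and name + "'" in by_name
    }
```

For an index listing blocks 0 to 4 as H1, ?, H2, H2', H1', `pair_stems` returns H1 paired with blocks (0, 4) and H2 with (2, 3); the "?" block is skipped.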

running the scripts

Though many different approaches to running the scripts are possible, it is likely easiest to run them as follows.

navigating model reports

The jRNA scripts generate a series of descriptive web pages from the input files. Interpretation and navigation of these pages are described below. Each report page has a navigation bar: "Home" takes you to the interleave index, a single arrow moves forward one interleave, and double arrows move to the first or last interleave.

the interleave index

An overview of the datafile is viewable by following the interleave index link on each model homepage. This index reports some basic statistics and lists all the interleaves in the dataset. Links to the various reports for each interleave are found to the right of each interleave line (see below).

block composition (bc)

The composition of each block over the full dataset is reported here. For reporting purposes, all characters other than A, U, G, C, N, "-" and "?" are summarized collectively as "o". NOTE: at this time there appears to be a bug in the variance calculations.
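The "o" lumping rule can be illustrated with a short Python sketch that tallies each block's characters over all taxa. This is not the Jrna.pm code, and the input structure (taxon name mapped to a list of block strings) and function names are hypothetical. It also assumes that the brackets around variable-length blocks are not data and that lower-case bases count as upper-case; the source does not say how either case is handled.

```python
from collections import Counter

REPORTED = set("AUGCN-?")  # characters reported individually; all others become "o"


def block_composition(taxa_blocks):
    """Per-block character counts over all taxa (hypothetical input structure)."""
    n_blocks = len(next(iter(taxa_blocks.values())))
    comps = [Counter() for _ in range(n_blocks)]
    for blocks in taxa_blocks.values():
        for i, block in enumerate(blocks):
            # Assumptions: strip brackets of variable-length blocks; ignore case.
            for ch in block.strip("[]").upper():
                comps[i][ch if ch in REPORTED else "o"] += 1
    return comps


def as_percentages(comp):
    """Convert one block's counts to percentages of its total characters."""
    total = sum(comp.values())
    return {ch: 100 * n / total for ch, n in comp.items()}
```

Here an ambiguity code such as R falls into "o", while N, gaps and "?" are kept separate, matching the categories named above. Variance is deliberately left out.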

matrix (mx)

The original alignment for all taxa. Blocks and columns are indexed along the bottom of the data.

column composition (cc)

This is essentially a bar graph reporting the composition of each column. The graph units are somewhat disproportionate in two ways: any base present at more than zero percent but at less than one unit is still given a single unit, and every bar is equalized to a fixed length by adding single units, one at a time, to the most frequent base until the bar reaches that length.
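The two scaling rules for the column graph are easier to see in code. The Python sketch below is one reading of them, not the Jrna.pm routine, and the function name `column_bar` and the default length of 20 units are hypothetical. It assumes that units are first computed by rounding down, then raised to one for any base that is present, and that the bar is finally padded on the most frequent base.

```python
def column_bar(counts, length=20):
    """Scale one column's base counts to a fixed-length bar of whole units."""
    total = sum(counts.values())
    if total == 0:
        return {}
    units = {}
    for base, n in counts.items():
        if n == 0:
            continue
        whole = (n * length) // total  # assumption: round down first
        units[base] = max(whole, 1)    # rare but present bases still get one unit
    # Pad the most frequent base, one unit at a time, up to the fixed length.
    top = max(units, key=lambda b: counts[b])
    while sum(units.values()) < length:
        units[top] += 1
    return units
```

For counts of A 5, C 5 and G 1 over 10 units, this gives A 5, C 4, G 1: G is raised from zero to one unit and A receives the padding unit. When many rare bases each claim a unit, the total can exceed the fixed length; the description above does not say how that case is handled, so the sketch leaves it alone.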

stems (st)

For each interleave, stems that start in that interleave (i.e. whose non-prime ends lie there) are expanded and shown here. The type of bond formed is indicated by underlining: red, fully compensated; blue, partially compensated; green, not compensated.

base-pair frequency (bpf)

This table reports the base-pair frequencies occurring in each stem region. Reported percentages are based on the total number of taxa given under "# seqs", which is calculated as the total number of taxa minus those with "??". At this point taxa with "--" are included, though this clearly distorts the report in cases where "--" indicates missing data. Values highlighted in red indicate that the base pair is co-varying; blue values are not co-varying, according to the algorithm described below. Darker shades simply correspond to higher percentages.

For display purposes, covariation is defined as follows. For a given base pair to be co-varying, there must be at least one other base pair that carries a different nucleotide on both sides. Only base pairs present at 3% or greater are considered. For example, given the base pairs CG, CU and UC, each at 3% or higher, all three would be considered co-varying. Given CG, UG and GG, none would be co-varying, because the second position is fixed as G.
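The "# seqs" denominator and the display definition of covariation can be sketched in Python as follows. This is an illustration of the rule as described, not the Jrna.pm implementation; the function name and input (one two-character base pair per taxon, e.g. "CG") are hypothetical. One further assumption: only pairs made of A, C, G and U take part in the covariation test, so a "--" pair is counted in the percentages but never makes another pair co-varying.

```python
from collections import Counter

NUCLEOTIDES = set("ACGU")  # assumption: only these count as "alternate nucleotides"


def covarying_pairs(pairs, threshold=0.03):
    """Return (# seqs, {pair: (fraction, is_covarying)}) for one stem position."""
    considered = [p for p in pairs if p != "??"]  # "??" excluded; "--" kept
    n = len(considered)
    if n == 0:
        return 0, {}
    freqs = {p: c / n for p, c in Counter(considered).items()}
    # Only nucleotide pairs present at the threshold (3%) or more are considered.
    common = [p for p, f in freqs.items() if f >= threshold and set(p) <= NUCLEOTIDES]
    result = {}
    for p, f in freqs.items():
        # Co-varying: another common pair differs from p on both sides.
        covaries = p in common and any(q[0] != p[0] and q[1] != p[1] for q in common)
        result[p] = (f, covaries)
    return n, result
```

With ten taxa each of CG, CU and UC, all three pairs are reported as co-varying; with CG, UG and GG none are, since no pair differs from another at the fixed second position.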

FAQ

usage/viewing

These pages, and the HTML generated by the scripts, make extensive use of CSS styling. They are W3C CSS-compliant and have been tested with the free, W3C standards-compliant Mozilla/Firefox (see link below) and OS X Safari browsers. They will not render correctly in non-compliant browsers such as Internet Explorer, and we do not plan to remedy this in the foreseeable future. Some pages, particularly those in the model section, are large and may take time to download and render. A screen resolution of at least 1024x768 is recommended.