Matt Yoder and Joe Gillespie are responsible for these pages.
Cite these web pages as: Yoder, M., and J. Gillespie. 2004-present. jRNA. Exploring insect phylogeny using RNA secondary structure. Web pages at http://purl.org/NET/rRNA/jRNA.
Cite use of Jrna.pm as: Yoder, M., and J. Gillespie. 2004-present. Jrna, Perl scripts for use in analyses and documentation of secondary-structure based multiple sequence alignments. Available at http://purl.org/NET/rRNA/jRNA.
Perl users should note that Jrna.pm does not yet conform to the standard Perl module format. The code is fairly well commented for those interested in how it works. Those interested in starting to use this approach should note that the jRNA scripts are no longer being actively developed, though in some cases (e.g. web-page generation) they provide functionality that is not yet replicated elsewhere. Contact Matt regarding access to the new Psy scripts.
James Munro has spent a lot of time working with the scripts. He has summarized his experience in a guide that is available as a PDF. Thanks, James!
To run the scripts, at minimum you'll have to have done the following:
- Install Perl. Many websites describe how to do this (see links); chances are you already have it installed.
- Install the Statistics::Descriptive module from CPAN.
- Test your system and make sure you can execute simple scripts that are unrelated to Jrna.pm.
- If you're going to use the web-output functionality (recommended), download the CSS style folder and place it in the directory you will run the scripts from (the home directory).
- Download the Jrna.pm and Jrna_objs.pm files here and place them in the home directory. You may have to configure your PATH variable. Check for new versions frequently; the code is developing rapidly. The version number is given at the head of the Jrna.pm file. NO GUARANTEE OF BACKWARDS COMPATIBILITY IS MADE.
Two input files are required: an alignment and an index defining helices.
- All files must be text files with Unix or DOS line endings. On OS X, only Unix line endings appear to be supported at this time. This should be fixable in the future.
- The alignment begins with a line containing "matrix" and nothing else, and ends with a line containing ";" and nothing else. It is Nexus-compatible.
- The matrix may be broken into any number of interleaves; the taxon name is required for each interleave. Each interleave gets its own HTML page for the reports, so breaking the data into logical interleaves makes the HTML pages easy to navigate.
- NO tabs are allowed anywhere between the "matrix" and ";".
- Three lines are required above the top of each interleave.
- a [Block_ ] line. The "Block_" is required, everything else is optional.
- a [ some comments here ] line. This is currently required only as a space filler and will likely be deprecated eventually.
- a [mask ... ] line. This is the Phase mask line. See the Phase manual.
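The layout rules above can be checked mechanically before the scripts are run. The following is a minimal Python sketch, not part of Jrna.pm: the function name check_matrix_text and the sample text are hypothetical, and only the rules stated explicitly above are checked (a "matrix" line, a closing ";" line, no tabs between them, and at least one [Block_ line). It matches "matrix" and ";" exactly as quoted, which is an assumption about how strict the scripts are.

```python
def check_matrix_text(text):
    """Return a list of problems found in the text of an alignment file.

    Hypothetical helper, not part of Jrna.pm. It checks only the rules
    listed above and matches "matrix" and ";" exactly as quoted.
    """
    problems = []
    # Accept Unix or DOS line endings by normalizing before splitting.
    lines = text.replace("\r\n", "\n").split("\n")

    try:
        start = lines.index("matrix")
    except ValueError:
        return ['no line containing only "matrix"']
    try:
        end = lines.index(";", start + 1)
    except ValueError:
        return ['no line containing only ";" after "matrix"']

    body = lines[start + 1:end]
    # Report 1-based line numbers; body[0] is file line start + 2.
    for number, line in enumerate(body, start=start + 2):
        if "\t" in line:
            problems.append(f"line {number}: tab character found")
    if not any(line.startswith("[Block_") for line in body):
        problems.append('no "[Block_" line found in the matrix')
    return problems


# Made-up sample for illustration only; real taxon lines and the Phase
# mask line will look different (see the Phase manual).
sample = (
    "matrix\n"
    "[Block_1]\n"
    "[ some comments here ]\n"
    "[mask ... ]\n"
    "taxon_a AUGC\n"
    "taxon_b AUGC\n"
    ";\n"
)
print(check_matrix_text(sample))  # prints []
```

An empty list means none of the checked rules were violated; it does not guarantee that the scripts will accept the file.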
running the scripts
Though many different approaches to implementing the scripts are possible, it is likely easiest to run them as follows.
- Modify an existing template file (example) to point to your data.
- In the template comment out all but the "init" lines.
- Execute the template script. Some error checking is done and written to a file in the err folder. If there are errors you'll get warnings (or it will look ugly!).
- If your files pass, try running the out_web function. This will generate the web reports, allowing you to visualize your matrices etc., and is a good way to check that everything is working. Once this passes, methods to generate files for analyses can generally be safely called.
- Output is written to the "analyses", "models", and "err" folders, which are created in the home directory.
The jrna scripts generate a series of descriptive web pages given the input files. Interpretation and navigation of these pages are described below. Each report page has a navigation bar. "Home" will take you to the interleave index, a single arrow moves forward one interleave, double arrows move to the last or first interleave.
the interleave index
An overview of the datafile is viewable by following the interleave index link on each model homepage. This index reports some basic statistics and lists all the interleaves in the dataset. Links to the various reports for each interleave are found to the right of each interleave line (see below).
block composition (bc)
The composition of each block, over the full dataset, is reported here. For purposes of reporting, all characters other than A, U, G, C, N, "-" and "?" are collectively summarized as "o". NOTE: at this time there appears to be a bug in the variance calculations.
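As an illustration of the "o" category, here is a small Python sketch of such a tally. It is not the Jrna.pm code: the function name block_composition is hypothetical, and treating lowercase letters as "other" is an assumption, since the text does not say how case is handled.

```python
from collections import Counter

# Characters reported individually; everything else is lumped as "o".
REPORTED = set("AUGCN-?")


def block_composition(sequences):
    """Count the characters in one block across all taxa.

    Hypothetical helper. Case-sensitive by assumption, so a lowercase
    "a" would be counted as "o".
    """
    counts = Counter()
    for seq in sequences:
        for ch in seq:
            counts[ch if ch in REPORTED else "o"] += 1
    return counts


# Made-up block of two taxa; R and Y are ambiguity codes counted as "o".
print(block_composition(["AUGC-", "AURY?"]))
```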
The original alignment for all taxa. Blocks and columns are indexed on the bottom of the data.
column composition (cc)
This is essentially a bar graph reporting the composition of each column. The units of the graph are somewhat disproportionate in two ways: a base that accounts for more than zero percent of the column but less than one unit is still given a single unit, and graph length is equalized by adding single units, one at a time, to the most abundant base until the graph reaches the fixed length.
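The unit allocation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Jrna.pm implementation: the function name bar_units, the fixed length of 20 units, and rounding down are all assumptions. The text also does not say what happens when the one-unit minimum pushes a bar past the fixed length, so the sketch leaves such bars as they are.

```python
def bar_units(counts, length=20):
    """Allocate graph units among the characters of one column.

    `counts` maps each character to its number of occurrences.
    Hypothetical helper; `length` and rounding down are assumptions.
    """
    total = sum(counts.values())
    units = {}
    for base, n in counts.items():
        if n == 0:
            continue
        # Any base that is present gets at least one unit.
        units[base] = max(1, n * length // total)
    if not units:
        return units
    # Pad the bar by adding single units to the most abundant base.
    top = max(units, key=lambda b: counts[b])
    while sum(units.values()) < length:
        units[top] += 1
    return units


print(bar_units({"A": 90, "G": 9, "U": 1}))  # {'A': 18, 'G': 1, 'U': 1}
```

In this made-up column, U makes up 1% of the characters, which is less than one 5% unit, yet it still receives a full unit on the graph.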
For each interleave, stems starting (i.e. with their non-prime ends) in that interleave are expanded and shown here. The type of bond formed is shown as an underlying color: red = fully compensated; blue = partially compensated; green = not compensated.
base-pair frequency (bpf)
This table reports the base-pair frequencies occurring in each stem region. Percentage values are based on the total number of taxa reported under "# seqs". This value is calculated as the total number of taxa minus the taxa with "??". At this point taxa with "--" are included, though this makes the reports misleading in cases where "--" indicates missing data. Red highlighted values indicate that the base pair is co-varying; blue values indicate that it is not co-varying, according to the algorithm described below. Darker values simply correspond to higher percentages.
For display purposes covariation is defined as follows. For any given base pair there must be at least one other base pair that contains an alternate nucleotide on both sides. Only base pairs present at 3% or greater are considered. For example, given the presence of these base pairs at 3% or higher: CG, CU, UC, we would say all three base pairs are co-varying. Given CG, UG, GG, none would be co-varying because the second position is fixed as G.
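The covariation rule can be written out as a short Python sketch. It is an illustration of the rule as stated here, not the code from Jrna.pm: the function name covarying_pairs and its parameters are hypothetical, and leaving pairs that contain "-" or "?" out of the covariation test is an assumption, since the text speaks only of nucleotides.

```python
from collections import Counter


def covarying_pairs(pairs, min_percent=3):
    """Flag each common base pair at one stem position as co-varying or not.

    `pairs` holds one two-character string per taxon, such as "CG".
    Returns {base_pair: True/False} for pairs at or above `min_percent`.
    Hypothetical helper, not part of Jrna.pm.
    """
    # "# seqs": all taxa except those scored "??" ("--" is still counted).
    scored = [p for p in pairs if p != "??"]
    n_seqs = len(scored)
    counts = Counter(scored)

    # Keep pairs present at 3% or more. Skipping pairs that contain "-"
    # or "?" is an assumption of this sketch, not stated in the text.
    common = [
        bp for bp, c in counts.items()
        if c * 100 >= min_percent * n_seqs and not set(bp) & set("-?")
    ]

    # A pair co-varies if some other common pair differs on BOTH sides.
    return {
        bp: any(o[0] != bp[0] and o[1] != bp[1] for o in common)
        for bp in common
    }


# The two examples from the text, with made-up frequencies.
print(covarying_pairs(["CG"] * 50 + ["CU"] * 30 + ["UC"] * 20))
print(covarying_pairs(["CG"] * 50 + ["UG"] * 30 + ["GG"] * 20))
```

The first call marks all three base pairs as co-varying; the second marks none, because the second position is always G.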
- What can the Jrna scripts do?
- The scripts are translation tools. They take a simply formatted matrix and an associated index file and translate them into other formats such as FASTA, MrBayes, PHASE, POY, and numerous others. Wherever possible, information regarding secondary structure is preserved in this translation.
- The scripts generate HTML pages. Numerous reports are generated which describe the data in question.
- Align your data (in development). We're working on translating the alignments into HMMs and other models (see links) that will align your data. If you really want to try this, contact us.
- What characters can I put in my bracketed blocks?
- While only certain characters are presently used in calculations etc., all characters except whitespace (space, tab) are allowed in bracketed blocks. Whitespace can only be used to delimit blocks.
These pages, and the HTML generated by the scripts, make extensive use of CSS styling. They are W3C CSS compliant and have been tested using the free, W3C standards-compliant Mozilla/Firefox (see link below) and OS X Safari browsers. They will not render correctly under non-compliant browsers such as Internet Explorer, and we do not plan to remedy this in the foreseeable future. Some pages, particularly those in the model section, are large and may take time to download and render. A resolution of at least 1024x768 is recommended.