Skip Navigation


MBE Advance Access originally published online on June 4, 2008
Molecular Biology and Evolution 2008 25(8):1778-1787; doi:10.1093/molbev/msn130
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
25/8/1778    most recent
msn130v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Knies, J. L.
Right arrow Articles by Burch, C. L.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Knies, J. L.
Right arrow Articles by Burch, C. L.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2008. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Compensatory Evolution in RNA Secondary Structures Increases Substitution Rate Variation among Sites

Jennifer L. Knies*,{dagger},2,3, Kristen K. Dang{ddagger},3, Todd J. Vision*, Noah G. Hoffman§,1, Ronald Swanstrom|| and Christina L. Burch*

* Department of Biology, University of North Carolina, Chapel Hill
{dagger} Curriculum in Genetics and Molecular Biology, University of North Carolina, Chapel Hill
{ddagger} Department of Biomedical Engineering, University of North Carolina, Chapel Hill
§ Department of Microbiology and Immunology, University of North Carolina, Chapel Hill
|| Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill
The UNC Center for AIDS Research, University of North Carolina at Chapel Hill

E-mail: Jennifer_Knies{at}brown.edu.


    Abstract
 TOP
 Abstract
 Introduction
 Kimura's Model of Compensatory...
 Methods
 Results
 Discussion
 Funding
 Acknowledgements
 References
 
There is growing evidence that interactions between biological molecules (e.g., RNA–RNA, protein–protein, RNA–protein) place limits on the rate and trajectory of molecular evolution. Here, by extending Kimura'sGo model of compensatory evolution at interacting sites, we show that the ratio of transition to transversion substitutions ({kappa}) at interacting sites should be equal to the square of the ratio at independent sites. Because transition mutations generally occur at a higher rate than transversions, the model predicts that {kappa} should be higher at interacting sites than at independent sites. We tested this prediction in 10 RNA secondary structures by comparing phylogenetically derived estimates of {kappa} in paired sites within stems ({kappa}p) and unpaired sites within loops ({kappa}u). Eight of the 10 structures showed an excellent match to the quantitative predictions of the model, and 9 of the 10 structures matched the qualitative prediction {kappa}p > {kappa}u. Only the Rev response element from the human immunovirus (HIV) genome showed the reverse pattern, with {kappa}p < {kappa}u. Although a variety of evolutionary forces could produce quantitative deviations from the model predictions, the reversal in magnitude of {kappa}p and {kappa}u could be achieved only by violating the model assumption that the underlying transition (or transversion) mutation rates were identical in paired and unpaired regions of the molecule. We explore the ability of the APOBEC3 enzymes, host defense mechanisms against retroviruses, which induce transition mutations preferentially in single-stranded regions of the HIV genome, to explain this exception to the rule. Taken as a whole, our findings suggest that {kappa} may have utility as a simple diagnostic to evaluate proposed secondary structures.

Key Words: molecular evolution • RNA secondary structure • compensatory evolution • transition–transversion ratio


    Introduction
 TOP
 Abstract
 Introduction
 Kimura's Model of Compensatory...
 Methods
 Results
 Discussion
 Funding
 Acknowledgements
 References
 
Compensatory mutations, or mutations that are individually deleterious but neutral or beneficial in combination, permit deleterious mutations to be fixed in populations without causing a net fitness loss (Poon and Otto 2000Go). Experimental evidence from laboratory populations shows that most deleterious mutations can be compensated by numerous mutations at alternative sites (Burch and Chao 1999Go; Poon and Chao 2005Go) and that fitness recovery following the fixation of a deleterious mutation most often occurs via compensatory rather than back mutation (Schrag et al. 1997Go; Maisnier-Patin et al. 2002Go; Hoffman et al. 2005Go).

Kimura developed a population genetics model in which deleterious and compensatory mutations can arise within a single genome and occasionally drift to fixation as a pair (Kimura 1985Go). Because genomes containing single deleterious mutations are not required to become fixed in this process, pairs of compensatory mutations can be fixed at an appreciable rate even if their individual deleterious effects are large and even in large populations. However, because the waiting time for both mutations to arise in the same genome is longer than the waiting time for an independent neutral mutation to arise, the rate of molecular evolution is predicted to be lower at compensatory sites than at independently evolving neutral sites.

The contribution of compensatory mutations to molecular evolution in natural populations has been most thoroughly investigated in regions of RNA secondary structure. RNA secondary structure offers a convenient model for investigating compensatory evolution because individual sites are readily identified as independently evolving (unpaired sites) or involved in a compensatory interaction (stems or paired sites). Compensatory evolution clearly plays a role in the evolution of these regions because mutations in stems are generally accompanied by compensating mutations that maintain base pairing in the stem (Kirby et al. 1995Go; Wilke et al. 2003Go; Kern and Kondrashov 2004Go). As predicted by Kimura's compensatory neutral model, the rates of substitution at compensatory sites are decreased relative to those at independent sites in RNA secondary structures (Stephan 1996Go; Innan and Stephan 2001Go), with few exceptions (Wang and Hickey 2002Go). In fact, this difference in substitution rates at compensatory and independent sites has been used to predict secondary structure (Muse 1995Go; Pedersen et al. 2004Go).

In addition to the slowdown in molecular evolution predicted at compensatory sites, inherent differences in rates are also expected to be exaggerated at paired sites in RNA. Specifically, we expect the ratio of transitions to transversions to be exaggerated in the pairing regions. This expectation is derived from the biochemical constraints of base pairing, where a transition mutation can only be compensated by another transition and likewise for transversions. Because transitions occur more frequently than transversions (reviewed in Wakeley 1996), transitions will be compensated more quickly than transversions, resulting in an elevated transition:transversion rate ratio ({kappa}) at paired sites.

Here, by extending Kimura's model specifically to molecular evolution in RNA secondary structures, we make the prediction that the transition to transversion substitution rate ratio ({kappa}) at paired sites should be the square of that for unpaired sites, all other factors being equal. We tested this prediction in 10 functionally and taxonomically diverse RNA molecules and found common, but not universal, quantitative agreement with the model. The prediction that we test may be useful in increasing the accuracy of methods of RNA secondary structure prediction.


    Kimura's Model of Compensatory Neutral Evolution
 TOP
 Abstract
 Introduction
 Kimura's Model of Compensatory...
 Methods
 Results
 Discussion
 Funding
 Acknowledgements
 References
 
Following Kimura (1985)Go, we consider the substitution process at a pair of loci involved in a compensatory interaction in a diploid population of size N (equivalent results are obtained for a haploid population of size 2N). Let µ represent the rate of mutation from the wild type to the mutated allele at both loci and ignore back mutation by assuming that selection is sufficiently strong to keep both mutations at a low enough frequency that back mutations are improbable. Selection is assumed to act equivalently on mutations at both sites, so that the fitness of genomes containing either of the 2 mutations is 1 s. Because the 2 mutations are involved in a compensatory interaction, however, the fitness of genomes that contain both mutations is equivalent to the wild type (i.e., fitness = 1). Finally, we assume that the loci are sufficiently close together that recombination between them can be ignored. Recombination is minimal in RNA secondary structures of small to moderate size (Parsch et al. 2000Go). Moreover, although recombination is known to slow down the rate of compensatory evolution (Higgs 1998Go), it does not measurably affect the predictions we lay out below (data not shown: Using data from table 1 in Innan and Stephan [2001Go], we constructed rate ratios of time for compensatory evolution at 2 different classes of nucleotide sites that had different mutation rates but experienced equal amounts of recombination and found that recombination had a minimal affect on this ratio). In Kimura's model, the deleterious mutations are assumed to be present initially at an equilibrium frequency determined by the balance between mutation and selection. At this equilibrium, the expected number of alleles carrying either of the 2 deleterious mutations is 4Nµ/s. Compensating mutations are assumed to arise in genomes that already carry an initial deleterious mutation at rate µ, and the probability that the newly arisen linked pair of mutations drifts to fixation in the population is 1/2N. Combining these effects, the rate at which compensatory substitutions are introduced at a pair of sites is

Formula (1)
Equation (1) is equivalent to equation (8b) of the bidirectional, symmetric model of Stephan (1996Go). We can compare this compensatory substitution rate to the expectation at independently evolving neutral sites (Kimura 1985Go):

Formula (2)


View this table:
[in this window]
[in a new window]

 
Table 1 RNA Secondary Structures Analyzed

 
Now we consider evolution in 2 classes of sites—one class in which both sites participating in the compensatory interaction mutate at a faster rate µ1 and one class in which both sites mutate at a slower rate µ2. We find that the ratio of the substitution rate between the 2 classes of sites:

Formula (3)
is the square of the ratio at independently evolving neutral sites

Formula (4)

To adapt this scenario to the evolution of RNA secondary structures, we make use of the widespread observation that transition mutations (purine-to-purine or pyrimidine-to-pyrimidine) are generally more common than transversion mutations. Furthermore, we note that a transition mutation in one side of an RNA stem structure can only be compensated by another transition mutation and likewise for transversions. Thus, we expect 2 rates of compensatory evolution, one for transitions (Formula) and another for transversions (Formula). By assuming that selection against both types of deleterious intermediates acts with the same strength, we predict that the rate ratio of transition to transversion substitutions ({kappa}) in paired regions of RNA secondary structure (stems) should be:

Formula (5)
which is again the square of the rate ratio in unpaired regions (loops):

Formula (6)


    Methods
 TOP
 Abstract
 Introduction
 Kimura's Model of Compensatory...
 Methods
 Results
 Discussion
 Funding
 Acknowledgements
 References
 
To test the predictions of the model, we selected 10 RNA molecules with well-documented secondary structures for which a large number of diverse sequences are available. The structures used here were predicted from comparative sequence analyses (12S rRNA, 5S rRNA, and tRNAs), experimental evidence (Rev response element [RRE]), or both (RNase P, internal ribosome entry site [IRES], cis-acting replication element [CRE], 16S, and 23S rRNA). For each set of sequences, an alignment and phylogeny were inferred. The value of {kappa} was then estimated for paired versus unpaired sites, and a test was performed to determine if these 2 values differed significantly.

Sources of Sequence Data and Secondary Structures
Sequences were obtained from a variety of sources, as listed in table 1. For each molecule, sequence positions were classified as either paired or unpaired. In most cases, we used structures reported in the literature: RRE (Phuphuakrat and Auewarakul 2003Go), CRE (Tuplin et al. 2002Go, 2004Go), and 12S rRNA (Springer et al. 1995Go). Structures for IRES, 5S rRNA (Fox and Woese 1975Go), and RNase P (Haas et al. 1991Go) were obtained, respectively, from the Viral RNA Structure Database (Thurner et al. 2004Go), the 5S Ribosomal RNA Database (Szymanski et al. 2002Go), and the RNase P database (http://www.mbio.ncsu.edu/RNaseP/home.html) (Brown 1999Go; Harris et al. 2001Go). The 16S and 23S rRNA structures were obtained from the comparative RNA Web site (http://www.rna.ccbb.utexas.edu), and sites were assigned secondary structure positions according to the reference sequence—Escherichia coli (J01695 [GenBank] ) (Cannone et al. 2002Go). Finally, tRNA structures were obtained using Mfold v3.0 (http://bioweb.pasteur.fr/seqanal/interfaces/mfold-simple.html) (Zuker 2003Go) with manual adjustments to fit the canonical model described in Sprinzl et al. (1998)Go.

Alignment and Phylogenetic Inference
The sequences were either obtained having already been aligned or aligned de novo in ClustalW (Chenna et al. 2003Go) or MAFFT (Katoh et al. 2005Go). We constructed phylogenies in MrBayes v3.1.2 (Huelsenbeck and Ronquist 2001Go). In order to constrain parameter values as little as possible, we used the General Time Reversible (GTR) + gamma + invariant model of nucleotide substitution and set the remaining parameters at their default value. For IRES, 5S rRNA, and RNase P alignments 100,000 generations with a 10% (10,000 generations) burn-in was sufficient for convergence. For the 16S and 23S rRNA alignments, convergence was achieved in 250,000 generations with a 50% burn-in. For the 12S rRNA alignment and the mitochondrial alignments, convergence was achieved in 500,000 generations with a 10% and 20% burn-in, respectively. For the CRE and RRE alignments, convergence was achieved in 1 million generations with a 20% and 50% burn-in, respectively. Three independent runs were conducted for each alignment, and the log likelihood values of these runs were compared to confirm that the chains converged on the same posterior distribution. The consensus tree was obtained by majority rule. There was no evidence of saturation as the distance from root to tip was <1 in all phylogenies.

In 4 cases, we were unable to use all the available sequences because either the phylogenetic method did not converge (RRE), the resulting phylogenetic tree was poorly supported (5S rRNA), or the time required for the phylogenetic method was excessive (16S and 23S rRNA). In the case of RRE, we used an arbitrary subset of 199 of the available sequences. In the case of 5S rRNA, we chose sequences from 2 genera (Bacillus and Clostridium) because the phylogeny built from these sequences achieved convergence. For 16S and 23S rRNA, we trimmed the sequence alignment to 94 and 100 sequences, respectively, by choosing every 10th sequence from the highly refined seed alignment downloaded from the comparative RNA Web site.

The alignments and trees for each molecule have been uploaded to Dryad (http://hdl.handle.net/10255/dryad.162).

Estimation of Substitution Rate Parameters
We used the program HyPhy (Kosakovsky Pond et al. 2005Go) to estimate {kappa} separately for paired and unpaired regions of each molecule. We incorporated rate heterogeneity with a discretized gamma distribution of mutation rates (4 rate classes) and used the HKY85 model of nucleotide substitution (Hasegawa et al. 1985Go), which allows for unequal base frequencies, one substitution rate for all transitions, and one for all transversions. We chose this model in order to obtain a single estimate for the transition–transversion rate ratio ({kappa}) with reasonably small confidence intervals (CIs), even though it is not necessarily the best fit for each alignment (see below). For each alignment, we report {kappa} and the approximate 95% CIs derived from the Fisher information matrix. Parameter estimates of {kappa} are expected to be robust to minor errors in tree topology (Hillis 1999Go).

We also investigated whether the HKY85 model, which allows a single rate for all transitions and a single rate for all transversions, gives a reasonable approximation of the observed substitution patterns. For each alignment, we found the best nucleotide substitution model using the model-testing procedure implemented in HyPhy (Kosakovsky Pond et al. 2005Go), which is based on that of ModelTest (Posada and Crandall 2001Go). We fit models with discrete gamma distributed rate variation and a fraction of invariant sites.

Tests of Predictions
Likelihood ratio tests were used to decide whether the data support estimation of a separate {kappa} for paired and unpaired regions. Formally,

Formula
where {kappa} is the transition–transversion ratio estimated from the entire molecule, {kappa}p is the transition–transversion ratio estimated from an analysis of paired positions only, and {kappa}u is the transition–transversion ratio estimated from an analysis of unpaired positions only. We calculated a test statistic, {lambda}, for each alignment in the following way:

Formula (7)
The statistical significance of {lambda} was evaluated assuming a {chi}2 distribution with one degree of freedom. For each molecule (or alignment), the likelihood values were calculated by holding constant the phylogenetic tree and the rate variation parameter {alpha} (estimated from the complete molecule). Only the parameter values of the nucleotide substitution model (HKY85) were allowed to vary.


    Results
 TOP
 Abstract
 Introduction
 Kimura's Model of Compensatory...
 Methods
 Results
 Discussion
 Funding
 Acknowledgements
 References
 
We tested the prediction that compensatory evolution in regions of RNA secondary structure should result in the transition to transversion rate ratio ({kappa}) at paired sites being the square of that at unpaired sites by examining 10 different RNA secondary structures. These molecules were selected from a diverse group of organisms: viruses, bacteria, amphibians, and mammals (table 1). In order to accurately estimate {kappa}, we used alignments that had at least 20 variable paired and 20 variable unpaired sites and for which the secondary structure was known to be conserved and functionally important. The alignments had varying amounts of sequence diversity (table 2) and complexity of secondary structures (fig. 1) ranging from relatively simple tRNAs with 3 stem-loops to the 12S rRNA with approximately 20 stem-loops. In addition, 2 of the alignments (16S and 23S rRNA) showed significantly more diversity among paired sites (stems) than unpaired sites (loops), consistent with previous studies that reported lineage-specific elevations of substitution rates in paired as compared with unpaired regions among archaea and bacteria RNA secondary structures (Smit et al. 2007Go).


View this table:
[in this window]
[in a new window]

 
Table 2 Description of Variation in Sequence Alignments

 

Figure 1
Figure 1
View larger version (34K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 1.— RNA secondary structures. Representative (A) mitochondrial tRNA for asparagine (from mammalian); (B) mitochondrial tRNA for glutamine (from amphibian); (C) 5s rRNA; (D) RNaseP (shown is the sequence of Bacillus brevis)—black bars represent a pseudoknot; (E) RRE from HIV-1; (F) IRES; (G) CRE; (H) 12s rRNA (shown is the sequence of Bos taurus). The secondary structures of 16S and 23S rRNA can be found in Cannone et al. (2002)Go.

 
Testing Kimura's Model of Compensatory Evolution
Our approach assumes that transition substitution rates (G {leftrightarrow} A, C {leftrightarrow} T) are more similar to each other than they are to transversion substitution rates, and vice versa, and that the 2 rates differ substantially. To examine these assumptions, we used a model-testing procedure that allowed any combination of the 6 time-reversible substitution rates (see Methods) to find the best-fitting nucleotide substitution model for each alignment. The best-fit models are shown in figure 2. Although the HKY85 model, which specifies one rate for transitions and one rate for transversions, was not statistically the best-fit model for any alignment (as determined by Akaike Information Criterion scores), the best-fit models did show differences between transition and transversion rates, and transition rates were almost universally more similar to each other than to transversion rates. Only in the case of the 5S rRNA, alignment was one of the transversion rates (A {leftrightarrow} T) similar to the transition rates. Thus, it is justifiable to use the HKY85 model as a reasonable approximation for the purposes of comparing estimates of {kappa}.


Figure 2
View larger version (21K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 2.— Best-fit nucleotide substitution models for each alignment. Shown is a cartoon illustration of the rate categories of the best-fit nucleotide substitution models for each molecule. Within a molecule, rates were scaled to the maximum rate (black). Diagonal lines depict transitions; the edges of the square depict transversions. The HKY85 model, which was used for the rate ratios reported throughout this article, is shown for comparison on the right.

 
We estimated {kappa} in 2 ways. First, we considered the molecule as a whole and estimated a single {kappa} to describe all sites (paired and unpaired). Second, we divided the molecule into paired and unpaired sites and estimated {kappa}p from all paired sites and {kappa}u from all unpaired sites. Nine of the 10 structures showed {kappa}p > {kappa}u (fig. 3), qualitatively supporting the prediction of Kimura's model. Only the RRE structure showed {kappa}p < {kappa}u. Likelihood ratio tests (see Methods for details) led to rejection of HO: {kappa} = {kappa}p = {kappa}u in favor of HA: {kappa}p != {kappa}u for all 10 molecules at P < 0.0001 (table 3).


Figure 3
View larger version (9K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 3.— Transition–transversion rate ratios ({kappa}) for each alignment. The dotted line represents a 1:1 relationship between {kappa}p and {kappa}u. The solid line represents the predicted relationship Formula. Note that the CRE data point is from the analysis of 4-fold degenerate sites in paired and unpaired regions.

 

View this table:
[in this window]
[in a new window]

 
Table 3 Transition–Transversion Rate Ratios ({kappa}p)

 
We also examined the quantitative fit of the estimates to the model predictions. A visual inspection of figure 3 confirms the close match of most structures to the expectation that Formula. For 8 structures (12S rRNA, 5S rRNA, 16S rRNA, 23S rRNA, amphibian tRNAs, RNase P, IRES, and CRE), estimates of {kappa}p cannot be statistically distinguished from Formula, that is, 95% CIs estimated from the Fisher Information matrix overlap the Formula line. Only the human immunovirus (HIV) RRE structure, and to a lesser extent the mammalian tRNAs, deviate significantly from the model prediction.

Substitution Patterns in RRE
We examined 3 possible explanations for the surprising result that {kappa}p < {kappa}u in RRE. First, because both the RRE and CRE secondary structures occur within coding regions, we examined the possibility that the difference between {kappa}p and {kappa}u is diminished by selection on the protein sequence. We recalculated {kappa}p and {kappa}u for both molecules using only data from 4-fold degenerate sites in paired and unpaired regions. In CRE, the presence of codons affects the estimates in the predicted direction (4-fold degenerate sites: Formula; all sites: Formula), though the 4-fold sites overshoot the predicted pattern. We had less power to compare 4-fold degenerate sites at the paired and unpaired sites of RRE because there were too few 4-fold degenerate unpaired sites, and there was insufficient sequence variability at these sites. However, the 4-fold degenerate paired sites did show a higher {kappa}p ({kappa}p = 7.61 with 95% CI [4.79–18.48]) than the paired sites as a whole ({kappa}p = 4.21 with 95% CI [3.51–5.28]). This suggests that the presence of protein-coding constraints does impede compensatory evolution at paired sites in RNA secondary structures, although it does not explain why {kappa}u would be "greater" than {kappa}p in RRE.

Second, we examined the possibility that we had used a nonrepresentative sample of RRE sequences. To confirm that the observed substitution patterns in RRE were not specific to the particular set of HIV sequences we examined (which were all derived from subtype B), we estimated {kappa}p and {kappa}u from 2 additional RRE alignments of sequences drawn from higher taxonomic levels: sequences from different subtypes (1 sequence each from A, B, C, F, G, H, J, and K) and sequences from different groups (1–2 sequences each from M, N, and O) of HIV. In both these alignments, the results were qualitatively similar to those for subtype B: {kappa}u was significantly higher than {kappa}p (table 4).


View this table:
[in this window]
[in a new window]

 
Table 4 Transition–Transversion Rate Ratios ({kappa}) for RRE alignments at Different Taxonomic Levels

 
Third, we considered whether the RRE estimates were disproportionately influenced by a portion of the molecule that experiences a type of selection that differs from the molecule as a whole. We systematically removed each stem-loop of RRE and reestimated {kappa}p and {kappa}u for the resulting partial structures. The {kappa}p and {kappa}u estimates were qualitatively similar for all these partial structures (table 5).


View this table:
[in this window]
[in a new window]

 
Table 5 Transition–Transversion Rate Ratios ({kappa}) for Partial RRE Structures

 

    Discussion
 TOP
 Abstract
 Introduction
 Kimura's Model of Compensatory...
 Methods
 Results
 Discussion
 Funding
 Acknowledgements
 References
 
We have extended Kimura's population genetics model of compensatory evolution to make the prediction that substitution rate variation due to underlying mutation rate differences is exaggerated by compensatory interactions. We made use of the fact that transition rates generally exceed transversion rates to test this prediction in regions of RNA secondary structure. Specifically, we predict that {kappa}, the ratio of transition to transversion substitutions, should be higher in paired than in unpaired regions. Nine of the 10 RNA secondary structures we examined confirmed this qualitative prediction. Moreover, 8 of the 10 structures (RNase P, 5S rRNA, 16S rRNA, 20S rRNA, IRES, tRNA A, 12S rRNA, and CRE) closely matched the quantitative prediction that the ratio in paired regions should be the square of the ratio in unpaired regions (i.e., Formula).

Remarkably, we observed a close quantitative match even in sequence alignments where we had an a priori reason to believe that model assumptions were violated. Alignments that showed more diversity among paired sites (stems) than unpaired sites (loops), in which selection may have been acting on the RNA primary sequence to constrain evolution in loops (Smit et al. 2007Go), nonetheless showed a close match to the model prediction. Finally, the model prediction appears robust to the amount of signal in the sequence data, in that sequence alignments with both low and high amounts of sequence diversity performed equally well in our analysis.

Implications for RNA Secondary Structure Prediction
The close match to the Formula expectation in most structures confirms the role of compensatory interactions in the molecular evolution of RNA secondary structures and suggests the use of the transition to transversion rate ratio, {kappa}, as a simple diagnostic to evaluate proposed secondary structures. In addition, the observed large deviations from the neutral expectation Formula should make substitution rate variation a useful tool for structure prediction. Indeed, recently developed methods for predicting RNA secondary structure from sequence data capitalize on the expectation that substitution rates should be lower at paired than unpaired sites (Muse 1995Go; Pedersen et al. 2004Go). These methods have been shown to be as strong or stronger in validating known structures and predicting new ones than previous methods (Muse 1995Go; Parsch et al. 2000Go; Pedersen et al. 2004Go).

Our results can be used to refine these methods for secondary structure prediction by providing an exact expectation for the ratio of different classes of substitution at unpaired and paired sites. In particular, we find that Formula because transversions are suppressed in stems to a greater extent than transitions (i.e., Formula). By contrast, the secondary structure prediction method of Muse (1995Go) assumes that transition and transversion rates are equally (multiplicatively) reduced in stems. The secondary structure prediction method of Pedersen et al. (2004Go) uses empirically measured substitution patterns in known stems to estimate transition and transversion rates; that is, it allows Formula, but the measure of {kappa}p is specified identically for all structures in all organisms.

Structures not Explained by the Model
Although most structures showed a good fit to the model predictions, 2 structures deviated either quantitatively (mammalian tRNAs) or qualitatively (RRE). We were particularly puzzled by the observation that the relative magnitude of the rate ratios in RRE was reversed from the model prediction, so that {kappa}p < {kappa}u. The only feature that the 2 molecules share is a high {kappa}u value compared with the others. However, a high ratio of the underlying mutation rates µTi to µTv does not, in itself, violate the model's assumptions. Here, we consider how the assumptions of the model could be violated in a way that preferentially elevates {kappa}u, with a particular focus on mechanisms that could explain the observation in RRE that Formula.

The model assumes that 1) the only factor that differs among sites is their paired or unpaired status within the RNA secondary structure, 2) recombination is infrequent, and 3) the underlying mutation rate µTi is the same in paired and unpaired regions and likewise for µTv. Violations of the first 2 assumptions appear to have had only minor effects on our estimates of {kappa}u and {kappa}p. Coding regions cause additional selective differences among sites, and the presence of codons was shown to affect the estimates of {kappa} for CRE and RRE, but the size of the effect was not sufficient to explain a reversal of the relative magnitudes of {kappa}u and {kappa}p in RRE. Recombination between HIV genomes of the same or very closely related genotypes will minimally affect {kappa}u and {kappa}p. Recombination between divergent genotypes may have a greater effect on {kappa}u and {kappa}p, but the frequency of this process is controversial and is believed to be low (Smith et al. 2005Go).

In contrast, violation of the third assumption that the underlying mutation rate µTi is the same in paired and unpaired regions (and likewise for µTv) may provide a plausible explanation even for the observation that {kappa}p < {kappa}u. In RRE, we identified a potential mechanism for raising the transition mutation rate in unpaired regions of the genome—the host cytosine deaminating enzyme family APOBEC3. After entering a host cell, the HIV RNA genome is reverse transcribed into single-stranded DNA prior to incorporation into the host's genome. During this process, the unpaired regions in the DNA genome (including RRE) are vulnerable to the action of APOBEC3G/F, which preferentially deaminates C to T (a transition) within specific motifs in unpaired regions of retroviral DNA. These mutations are observed as G to A transitions on the resulting plus-strand genome of HIV (Harris and Liddament 2004Go). The unpaired motifs are, thus, predicted to experience an elevated µTi, which would explain the reversal of {kappa}u and {kappa}p in RRE. Our strongest test of this hypothesis comes from an examination of the sequence alignments in which adenine is overrepresented in unpaired regions (~40%) of RRE compared with paired regions (~19%), consistent with the preferential action of APOBEC3 on unpaired sites. In contrast, we did not find evidence that the APOBEC3 F and G target motifs GA and GG were underrepresented in unpaired regions or that the G {leftrightarrow} A substitution rate was elevated in unpaired regions compared with the C {leftrightarrow} T rate (data not shown). However, the power of the latter tests was limited by the small statistical power associated with assessing dinucleotide frequencies and the inability to specify the G -> A substitution rate separately from the A -> G rate in the GTR substitution model used here. Together, we take these results to provide at least some support for the hypothesis that the action of the APOBEC enzymes on the minus single-stranded DNA containing some secondary structure caused an elevation of µTi in unpaired sites, explaining the observation in RRE that {kappa}p < {kappa}u. However, a rigorous test of this hypothesis would require a more detailed analysis of molecular evolution across the HIV genome.

Implications for Secondary Structure Evolution
Extending Kimura's model of compensatory evolution allowed the successful quantitative prediction of substitution rates (Stephan 1996Go; Innan and Stephan 2001Go) and rate variation (our study) in RNA secondary structures. The success of these predictions confirms the existence of strong constraints on both nucleotide substitutions and structural evolution in these molecules. If nucleotide substitutions in RNA secondary structures were not constrained by compensatory interactions, but often proceeded by a neutral process, we would expect the difference between estimates of {kappa}u and {kappa}p to be smaller than the model prediction Formula and to approach the neutral expectation {kappa}p = {kappa}u. Similarly, structural evolution would cause some sites that are paired in the reference sequence to be unpaired in nonreference sequences, and vice versa, resulting in a smaller difference between estimates of {kappa}u and {kappa}p than predicted by the model. Thus, the close quantitative agreement with the model for most molecules confirms that substitution rates in secondary structures have been governed by a history of compensatory evolution and suggests that there has been little structural evolution in these molecules even over long evolutionary time periods.


    Funding
 TOP
 Abstract
 Introduction
 Kimura's Model of Compensatory...
 Methods
 Results
 Discussion
 Funding
 Acknowledgements
 References
 
National Institutes of Health (NIH) (grants GM067940 to C.L.B. and AI44667 to R.S.); National Science Foundation (grant DBI-0227314 to T.J.V.); NIH Training Grant (T32-AI07001 to J.L.K. and T32-AI007419 to K.K.D.).


    Acknowledgements
 TOP
 Abstract
 Introduction
 Kimura's Model of Compensatory...
 Methods
 Results
 Discussion
 Funding
 Acknowledgements
 References
 
The authors thank Drs Stefanie Hartmann, Sergei Kosakovsky Pond, Derrick Zwickl, and Daniel Weinreich for advice and technical assistance.


    Footnotes
 
1 Present address: Department of Laboratory Medicine, University of Washington. Back

2 Present address: Department of Ecology and Evolutionary Biology, Brown University. Back

3 J.L.K. and K.K.D. contributed equally to this work. Back

Jeffrey Thorne, Associate Editor


    References
 TOP
 Abstract
 Introduction
 Kimura's Model of Compensatory...
 Methods
 Results
 Discussion
 Funding
 Acknowledgements
 References
 

    Brown J. The Ribonuclease P Database. Nucleic Acids Res (1999) 27:314.[Abstract/Free Full Text]

    Burch C, Chao L. Evolution by small steps and rugged landscapes in the RNA virus phi6. Genetics (1999) 151:921–927.[Abstract/Free Full Text]

    Cannone JJ, Subramanian S, Schnare MN, Pande N, Shang Z, Yu N, Gutell RR. The comparative RNA web (CRW) site: an online database of comparative sequence and structural information for ribosomal, intron, and other RNAs. BMC Bioinformatics (2002) 3:2.[CrossRef][Medline]

    Chenna R, Sugawara H, Koike T, Lopez R, Gibson T, Higgins D, Thompson J. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res (2003) 31:3497–3500.[Abstract/Free Full Text]

    Fox GE, Woese CR. 5S-Rna secondary structure. Nature (1975) 256:505–507.[CrossRef][Medline]

    Haas ES, Morse DP, Brown JW, Schmidt FJ, Pace NR. Long-range structure in ribonuclease-P Rna. Science (1991) 254:853–856.[Abstract/Free Full Text]

    Harris J, Haas E, Williams D, Frank D, Brown J. New insight into RNase P RNA structure from comparative analysis of the archaeal RNA. RNA (2001) 7:220–232.[Abstract]

    Harris RS, Liddament MT. Retroviral restriction by APOBEC proteins. Nature Rev Immunol (2004) 4:868–877.[CrossRef][Web of Science][Medline]

    Hasegawa M, Kishino H, Yano T. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol (1985) 22:160–174.[CrossRef][Web of Science][Medline]

    Higgs P. Compensatory neutral mutations and the evolution of RNA. Genetics (1998) 102/103:91–101.

    Hillis DM. Phylogenetics and the study of HIV. In: The evolution of HIV—Crandall KA, ed. (1999) Baltimore (MD): Johns Hopkins University Press. 504.

    Hoffman NG, Schiffer CA, Swanstrom R. Covariation of amino acid positions in HIV-1 protease. Virology (2005) 331:206–207.[CrossRef][Web of Science]

    Huelsenbeck J, Ronquist F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics (2001) 17:754–755.[Abstract/Free Full Text]

    Innan H, Stephan W. Selection intensity against deleterious mutations in RNA secondary structures and rate of compensatory nucleotide substitutions. Genetics (2001) 159:389–399.[Abstract/Free Full Text]

    Katoh K, Kuma K, Toh H, Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res (2005) 33:511–518.[Abstract/Free Full Text]

    Kern AD, Kondrashov FA. Mechanisms and convergence of compensatory evolution in mammalian mitochondrial tRNAs. Nat Genetics (2004) 36:1207–1212.[CrossRef][Web of Science][Medline]

    Kimura M. The role of compensatory neutral mutations in molecular evolution. J Genet (1985) 64:7–19.[CrossRef]

    Kirby DA, Muse SV, Stephan W. Maintenance of pre-messenger-Rna secondary structure by epistatic selection. Proc Natl Acad Sci USA (1995) 92:9047–9051.[Abstract/Free Full Text]

    Kosakovsky Pond SL, Frost SDW, Muse SV. HyPhy: hypothesis testing using phylogenies. Bioinformatics (2005) 21:676–679.[Abstract/Free Full Text]

    Maisnier-Patin S, Berg OG, Liljas L, Andersson DI. Compensatory adaptation to the deleterious effect of antibiotic resistance in Salmonella typhimurium. Mol Microbiol (2002) 46:355–366.[CrossRef][Web of Science][Medline]

    Muse SV. Evolutionary analyses of DNA sequences subject to constraints on secondary structure. Genetics (1995) 139:1429–1439.[Abstract]

    Parsch J, Braverman J, Stephan W. Comparative sequence analysis and patterns of covariation in RNA secondary structure. Genetics (2000) 154:909–921.[Abstract/Free Full Text]

    Pedersen J, Meyer I, Forsberg R, Simmonds P, Hein J. A comparative method for finding and folding RNA secondary structures within protein-coding regions. Nucleic Acids Res (2004) 32:4925–4936.[Abstract/Free Full Text]

    Phuphuakrat A, Auewarakul P. Heterogeneity of HIV-1 Rev response element. Aids Res Hum Retroviruses (2003) 19:569–574.[CrossRef][Web of Science][Medline]

    Poon A, Chao L. The rate of compensatory mutation in the DNA bacteriophage {phi}X174. Genetics (2005) 170:989–999.[Abstract/Free Full Text]

    Poon A, Otto S. Compensating for our load of mutations: freezing the meltdown of small populations. Evol Int J Org Evol (2000) 54:1467–1479.

    Posada D, Crandall KA. Selecting the best-fit model of nucleotide substitution. Syst Biol (2001) 50:580–601.[Abstract/Free Full Text]

    Schrag SJ, Perrot V, Levin BR. Adaptation to the fitness costs of antibiotic resistance in Escherichia coli. Proc R Soc Lond B Biol Sci (1997) 264:1287–1291.[Medline]

    Smit S, Widmann J, Knight R. Evolutionary rates vary among rRNA structural elements. Nucleic Acids Res (2007) 1–16.

    Smith DM, Richman DD, Little SJ. HIV superinfection. J Infect Dis (2005) 192:438.[CrossRef][Web of Science][Medline]

    Springer M, Hollar L, Burk A. Compensatory substitutions and the evolution of the mitochondrial 12S rRNA gene in mammals. Mol Biol Evol (1995) 12:1138–1150.[Abstract]

    Sprinzl M, Horn C, Brown M, Ioudovitch A, Steinberg S. Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res (1998) 26:148–153.[Abstract/Free Full Text]

    Stephan W. The rate of compensatory evolution. Genetics (1996) 144:419–426.[Abstract]

    Szymanski M, Barciszewska M, Erdmann V, Barciszewski J. 5S Ribosomal RNA Database. Nucleic Acids Res (2002) 30:176–178.[Abstract/Free Full Text]

    Thurner C, Witwer C, Hofacker IL, Stadler PF. Conserved RNA secondary structures in Flaviviridae genomes. J Gen Virol (2004) 85:1113–1124.[Abstract/Free Full Text]

    Tuplin A, Evans D, Simmonds P. Detailed mapping of RNA secondary structures in core and NS5B-encoding region sequences of hepatitis C virus by RNase cleavage and novel bioinformatic prediction methods. J Gen Virol (2004) 85:3037–3047.[Abstract/Free Full Text]

    Tuplin A, Wood J, Evans D, Patel A, Simmonds P. Thermodynamic and phylogenetic prediction of RNA secondary structures in the coding region of hepatitis C virus. RNA (2002) 8:824–841.[Abstract]

    Wakely J. The excess of transitions among nucleotide substitutions: new methods of estimating transition bias underscore its significance. Trends in Ecology & Evolution (1996) 11:158–162.[CrossRef][Web of Science]

    Wang H, Hickey D. Evidence for strong selective constraint acting on the nucleotide composition of 16S ribosomal RNA genes. Nucleic Acids Res (2002) 30:2501–2507.[Abstract/Free Full Text]

    Wilke C, Lenski R, Adami C. Compensatory mutations cause excess of antagonistic epistasis in RNA secondary structure folding. BMC Evol Biol (2003) 3:3.[CrossRef][Medline]

    Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res (2003) 31:3406–3415.[Abstract/Free Full Text]

Accepted for publication May 30, 2008.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
25/8/1778    most recent
msn130v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Knies, J. L.
Right arrow Articles by Burch, C. L.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Knies, J. L.
Right arrow Articles by Burch, C. L.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?