MBE Advance Access originally published online on October 16, 2007
Molecular Biology and Evolution 2008 25(1):52-61; doi:10.1093/molbev/msm226

© The Author 2007. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Modeling Sites of RNA Editing as a Fifth Nucleotide State Reveals Progressive Loss of Edited Sites from Angiosperm Mitochondria

Jeffrey P. Mower

Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland

E-mail: mowerj{at}tcd.ie.


    Abstract
RNA editing is a type of nucleic acid modification found in many eukaryotic lineages. In plants, RNA editing occurs by the site-specific conversion of cytidines to uridines in mitochondrial and plastid transcripts. To quantify the rates of edit site gain and loss in angiosperm mitochondrial genes, a nonreversible maximum likelihood model was developed that treats sites of RNA editing as a fifth nucleotide state. The rate of loss of editing, either by genomic replacement with a thymidine or by loss of recognition by the editing complex, was found to be significantly higher than the rate of gain. Furthermore, the frequency of editing is not at equilibrium in angiosperm mitochondrial sequences; there is a strong tendency for the number of edited sites to decrease over time. These results indicate that selection plays a key role in driving the higher rate of edit site loss relative to gain and suggest that the strength of selection against editing has become increasingly stringent over the course of angiosperm evolution. The model described here should be easily adaptable to other systems that involve nucleic acid modifications.

Key Words: RNA editing • angiosperm mitochondria • maximum likelihood • nonreversible model • fifth state • substitution rate


    Introduction
RNA editing is a process that alters the information encoded in the genome by inserting, deleting, or modifying nucleotides in transcripts (Brennicke et al. 1999; Gott and Emeson 2000). In plants, RNA editing occurs by the conversion of specific cytidines to uridines and, at least in some lineages, uridines to cytidines in organellar transcripts. Current hypotheses on the mechanism of RNA editing in plants suggest that one or more general factors are needed to catalyze the reaction, although no enzymes have yet been identified (Shikanai 2006; Salone et al. 2007). Editing specificity is likely achieved with associated factors that recognize only one or a few editing positions. This is because the sequence context of edited sites is critical for providing specificity yet contains no obvious motifs shared by all sites (Choury et al. 2004; Neuwirt et al. 2005). Recently, the family of pentatricopeptide repeat proteins has emerged as a set of likely factors for the specific recognition of sites to be edited (Small and Peeters 2000; Lurin et al. 2004; Kotera et al. 2005; Okuda et al. 2006), and a subset of this family may catalyze the editing reaction (Salone et al. 2007).

Abundant editing data are currently available from 4 angiosperm mitochondrial genomes (Giege and Brennicke 1999; Notsu et al. 2002; Handa 2003; Mower and Palmer 2006). From these data, it is clear that sites of RNA editing are not found at random positions in plant mitochondrial transcripts (Mulligan et al. 2007). Most edited sites are located in the coding regions of known protein genes. The few studies on editing in other transcript regions (introns and untranslated regions) or transcript types (rRNAs, tRNAs, and unknown open reading frames) have found a reduced frequency of editing (Schuster et al. 1991; Giege and Brennicke 1999; Fey et al. 2001; Notsu et al. 2002; Handa 2003). Within a genome, the frequency of editing is known to vary substantially among different protein genes (Giege and Brennicke 1999; Notsu et al. 2002; Handa 2003; Mower and Palmer 2006), whereas editing frequency is generally consistent for a homologous gene when evaluated across species (Mower and Palmer 2006). With respect to codon position, over half of edited sites are at second positions, whereas only ~10% are at third positions (Giege and Brennicke 1999; Cummings and Myers 2004), with the result that most editing events alter the encoded amino acid sequence. The vast majority of these nonsilent edit sites increases protein conservation across species; this tendency was recognized upon discovery of editing in plants (Covello and Gray 1989; Gualberto et al. 1989; Hiesel et al. 1989) and is so prevalent that it can be used to accurately predict about 95% of nonsilent edits (Mower 2005).

The nonrandom distribution of sites of RNA editing in plant mitochondrial genes indicates that selection has played a major role in shaping this distribution. At the protein level, edited sites may be advantageous, deleterious, or neutral depending on the effect (or lack thereof) of the amino acid change on protein function. At the cellular level, the existence of the RNA editing system is probably deleterious. Shields and Wolfe (1997) showed that the loss of editing, by a genomic replacement of the edited C with a T, occurs 3–4 times faster than the rate of C-to-T transition at third positions. Because both events involve a C-to-T genomic change and are silent at the protein level, the increased rate of edit site loss relative to the silent transition rate suggests that selection is acting to remove edited sites from the mitochondrial genome. How could such a costly system have arisen during evolution, and why does it persist? Covello and Gray (1993) proposed that editing could have arisen by neutral processes via an initial appearance of a molecular apparatus for editing followed by the accumulation of edited sites by genetic drift. Once established, the sheer number of edited sites in plant mitochondrial transcripts prevents the simultaneous loss of all sites so that the process cannot be dispensed with entirely.

One caveat with the Shields and Wolfe (1997) study is that an edited C in one species and a genomic T at the homologous position in another species was implicitly assumed to indicate a loss of editing in the first species rather than a gain of editing in the second. On a mechanistic basis, one might expect that most of these apparent differences would be losses because a loss simply requires a genomic C-to-T transition, whereas a gain of editing requires a T-to-C transition coupled with the acquisition of a recognition signal so that the C is properly edited. Thus, their assumption may be generally accurate, but at least a fraction of sites should represent a gain of editing rather than a loss. It is unclear whether this might confound their results. If their results are correct and the editing process itself is indeed deleterious, then we might expect to find the opposite result for gains. That is, edited sites should be gained less frequently than expected by chance. However, this hypothesis has yet to be investigated.

To obtain a more accurate estimate of the rate of loss of sites of RNA editing and to simultaneously obtain the rate of gain, I have developed a maximum likelihood (ML) model of RNA editing. To account for any differences in behavior of edited and unedited cytidines, the general Markov model of nucleotide substitution was extended to incorporate sites of RNA editing as a fifth nucleotide state. This approach differs from that of Picardi and Quagliariello (2006), who instead simulated editing data by applying distinct nucleotide models to edited and unedited sites. However, their mixed-model approach is unlikely to accurately reflect RNA editing evolution for several reasons. First, the rate matrices of their models were assumed to be reversible (i.e., forward and reverse rates of substitution are equal), but this assumption is likely to be violated because of differences in the mechanism and selective cost of gaining and losing edited sites. Second, they did not distinguish between silent edited sites (found predominantly at third positions) and nonsilent edited sites (found at first and second positions only), which are likely to evolve at very different rates.

To examine the importance of these issues, the RNA editing model described in this paper was first subjected to a series of model-fitting analyses. The best fitting model was then used to address questions about the dynamics of edit site gain and loss across angiosperms. These tests indicate that edit sites are not in equilibrium. Sites of RNA editing are being lost in angiosperms much faster than expected by neutral processes, most likely due to an increased intensity of selection against sites of RNA editing.


    Materials and Methods
Sequence and RNA Editing Data Sources
For the complete mitochondrial genomes of Arabidopsis thaliana, Oryza sativa, Brassica napus, and Beta vulgaris, experimentally determined sites of RNA editing are available for all known protein-coding genes (Giege and Brennicke 1999; Notsu et al. 2002; Handa 2003; Mower and Palmer 2006). Abundant RNA editing data are also available from several species within the genera Oenothera and Triticum, with most data coming from the species Oenothera berteriana and Triticum aestivum. Relevant GenBank accession files were collected for these 6 lineages, protein-coding sequences were extracted, and sites of RNA editing were identified for each gene by using the edited site annotations present in each file (supplementary table S1, Supplementary Material online). In some cases, edited sites were not present or were incorrectly annotated in the GenBank files, and these sites were added or corrected as described in supplementary table S1 (Supplementary Material online). Two putative U-to-C edited sites were present in the data (one from Oenothera cob and one from Triticum cox3) but were not included in this study.
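As a rough illustration of how annotated edit sites can be turned into the fifth-state encoding used later in this paper, the sketch below replaces each annotated edited cytidine in a coding sequence with the letter E. The function name, the toy sequence, and the 1-based position convention are assumptions made for illustration only; they are not the scripts used in this study.

```python
def mark_edited_sites(cds, edit_positions):
    """Return cds with each annotated edited cytidine replaced by 'E'.

    cds: genomic coding sequence (string over A, C, G, T).
    edit_positions: positions annotated as C-to-U edit sites
    (1-based; this convention is an assumption of the sketch).
    """
    seq = list(cds.upper())
    for pos in edit_positions:
        base = seq[pos - 1]
        if base != "C":
            # An annotation that does not point at a genomic C is suspect
            # and would need manual correction.
            raise ValueError(f"position {pos} is {base}, not C")
        seq[pos - 1] = "E"
    return "".join(seq)


# Toy example (hypothetical sequence and positions).
toy_cds = "ATGTCACCGTTA"
print(mark_edited_sites(toy_cds, [5, 8]))
```

Raising an error when an annotation does not land on a genomic C mirrors, in spirit, the need described above to add or correct annotations by hand.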

All mitochondrial protein genes with sequence data and editing information from at least 5 of the 6 species were aligned using ClustalW version 1.83 (Thompson et al. 1994) and manually adjusted when necessary using BioEdit version 5.0.9 (Hall 1999). When present, nonhomologous 5' and 3' ends were trimmed from the alignments. These individual gene alignments were concatenated into a single alignment. Aligned columns with data missing from more than one species were excluded from further analysis. The final data set spans 26 genes, has an aligned length of 24,678 nucleotides, and includes 148,068 aligned characters, of which 10,474 are missing data (7.1%) and 2,327 are edited sites (1.6%). For most analyses, this data set was partitioned so that each column in the alignment was placed into one of 3 partitions based on the codon position occupied by the column.
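The data set summary above can be checked with a few lines of arithmetic, and the codon-position partitioning can be sketched in the same place. The sketch assumes that the concatenated alignment stays in frame, so that a column's codon position follows from its index modulo 3; the constant and function names are hypothetical.

```python
N_SPECIES = 6
ALIGNED_LENGTH = 24_678   # aligned nucleotide columns
MISSING = 10_474          # characters scored as missing data
EDITED = 2_327            # characters scored as edited sites

total_chars = N_SPECIES * ALIGNED_LENGTH
missing_pct = 100 * MISSING / total_chars
edited_pct = 100 * EDITED / total_chars


def partition_by_codon_position(n_columns):
    """Assign 0-based alignment columns to codon positions 1, 2, and 3.

    Assumes every concatenated gene alignment is in frame.
    """
    parts = {1: [], 2: [], 3: []}
    for col in range(n_columns):
        parts[col % 3 + 1].append(col)
    return parts


parts = partition_by_codon_position(ALIGNED_LENGTH)
print(total_chars, round(missing_pct, 1), round(edited_pct, 1))
print({pos: len(cols) for pos, cols in parts.items()})
```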

Definition of an ML Model for RNA Editing
To evaluate the dynamics of RNA editing evolution, an ML model of RNA editing was created. Sites of C-to-U RNA editing were represented in the data set as a fifth nucleotide state, denoted by the letter "E." To incorporate this fifth nucleotide state into an ML framework, the following nonreversible instantaneous rate matrix, QNrev, was created:

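\[
Q_{Nrev} =
\begin{pmatrix}
* & \rho_{AC}\pi_C & \rho_{AG}\pi_G & \rho_{AT}\pi_T & \rho_{AE}\pi_E \\
\rho_{CA}\pi_A & * & \rho_{CG}\pi_G & \rho_{CT}\pi_T & \rho_{CE}\pi_E \\
\rho_{GA}\pi_A & \rho_{GC}\pi_C & * & \rho_{GT}\pi_T & \rho_{GE}\pi_E \\
\rho_{TA}\pi_A & \rho_{TC}\pi_C & \rho_{TG}\pi_G & * & \rho_{TE}\pi_E \\
\rho_{EA}\pi_A & \rho_{EC}\pi_C & \rho_{EG}\pi_G & \rho_{ET}\pi_T & *
\end{pmatrix}
\tag{1}
\]
with rows and columns ordered A, C, G, T, E, where the parameter ρxy is the rate ratio for a substitution from nucleotide x to y, πx is the empirical frequency of nucleotide x in the data set, and the value of each diagonal (represented by an asterisk) is set to the negative sum of the nondiagonals in each row so that the sum of each row equals zero. The QNrev matrix is not time reversible because each member of the matrix, qxy, does not satisfy the following equality: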

\[
\pi_x \, q_{xy} = \pi_y \, q_{yx}
\tag{2}
\]
By enforcing equation (2) for each member of QNrev, a time-reversible matrix, QRev, was also created. Because QRev is time reversible, each forward rate ratio is equal to its corresponding reverse rate ratio (i.e., ρxy = ρyx). Additional instantaneous rate matrices were developed that constrain some aspect of QNrev. These constraints are defined in table 1.
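The construction of the rate matrix and the reversibility condition can be sketched in a few lines of Python. This is a minimal illustration, assuming that each off-diagonal entry is the product of a rate ratio and the frequency of the target state; the state order, frequencies, and rate ratios below are hypothetical values, not estimates from this study, and the function names are not part of HyPhy.

```python
STATES = "ACGTE"  # four nucleotides plus the edited-cytidine state E


def build_q(rho, pi):
    """Build a rate matrix with off-diagonals q_xy = rho_xy * pi_y.

    rho: dict mapping (x, y) to the rate ratio for x -> y.
    pi:  dict mapping each state to its frequency.
    Each diagonal is the negative sum of the off-diagonals in its row.
    """
    q = {}
    for x in STATES:
        row_sum = 0.0
        for y in STATES:
            if x != y:
                q[x, y] = rho[x, y] * pi[y]
                row_sum += q[x, y]
        q[x, x] = -row_sum
    return q


def is_reversible(q, pi, tol=1e-12):
    """Check pi_x * q_xy == pi_y * q_yx for every pair of distinct states."""
    return all(
        abs(pi[x] * q[x, y] - pi[y] * q[y, x]) <= tol
        for x in STATES for y in STATES if x != y
    )


# Hypothetical frequencies and rate ratios, for illustration only.
pi = {"A": 0.26, "C": 0.21, "G": 0.22, "T": 0.29, "E": 0.02}
rho_rev = {(x, y): 1.0 for x in STATES for y in STATES if x != y}
rho_nrev = dict(rho_rev)
rho_nrev["E", "T"] = 20.0  # loss of editing allowed to differ from gain

q_rev = build_q(rho_rev, pi)
q_nrev = build_q(rho_nrev, pi)
print(is_reversible(q_rev, pi), is_reversible(q_nrev, pi))
```

With symmetric rate ratios the check passes; letting a single forward ratio differ from its reverse is enough to break reversibility.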


Table 1 RNA Editing Models Used in This Study

 
ML models of RNA editing were generated by applying one of the instantaneous rate matrices described above to either the full or the partitioned data set (table 1). For each model, a tree topology of ((((Arabidopsis, Brassica), Oenothera), Beta), (Oryza, Triticum)) was enforced, which is in agreement with current hypotheses about phylogenetic relationships among these species (Soltis et al. 1999; Angiosperm Phylogeny Group 2003). In addition, rate heterogeneity was applied to each matrix by estimating the shape (α) of a gamma distribution with 20 rate categories. The models were implemented and the parameters were optimized in HyPhy version 0.99 Beta for Windows (Kosakovsky Pond et al. 2005) using custom batch files written in the HyPhy batch language. For each model, the overall likelihood and the number of parameters are listed in table 1. For partitioned data sets, the overall likelihood is the sum of the likelihoods from each partition.

All model fitting and testing analyses were performed using the likelihood ratio test (LRT; Felsenstein 1981). The LRT statistic (2ΔlnL) was assumed to follow a chi-squared distribution with the degrees of freedom set to the difference in the number of free parameters between the null and alternative models. A significance threshold of P < 0.05 was enforced after a Bonferroni correction for 19 comparisons.
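The testing procedure can be sketched as follows. This is a minimal standard-library illustration, not the HyPhy batch code used in this study; the function names are hypothetical, the log-likelihoods and parameter counts in the example are invented, and the Bonferroni correction is applied here by comparing each P value with 0.05/19.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df."""
    z = x / 2.0
    if df % 2:  # odd df: start from Q(1/2, z) = erfc(sqrt(z))
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:       # even df: start from Q(1, z) = exp(-z)
        a, q = 1.0, math.exp(-z)
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null, lnl_alt, k_null, k_alt,
                          n_tests=19, alpha=0.05):
    """Return (statistic, df, P value, significant after Bonferroni)."""
    stat = 2.0 * (lnl_alt - lnl_null)
    df = k_alt - k_null
    p = chi2_sf(stat, df)
    return stat, df, p, p < alpha / n_tests


# Invented log-likelihoods and parameter counts, for illustration only.
print(likelihood_ratio_test(-50010.0, -50000.0, k_null=30, k_alt=34))
```

The chi-squared tail probability is computed from the regularized incomplete gamma recurrence so that the sketch needs nothing beyond the `math` module.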

Estimation of Substitution Rates
From the parameters optimized by an ML model, a branch length in a phylogenetic tree can be calculated as

\[
t_{br} \sum_{x} \sum_{y \neq x} \pi_x \, q_{xy}
\tag{3}
\]
where tbr is a branch-scaling parameter for some branch br, x and y are nucleotide states, πx and πy are nucleotide frequencies, and qxy is defined by the instantaneous rate matrix. This value represents the expected number of substitutions per site along a particular branch. By substituting qxy for its corresponding value in the instantaneous rate matrix, the branch length can be rewritten as

\[
t_{br} \sum_{x} \sum_{y \neq x} \pi_x \, \rho_{xy} \, \pi_y
\tag{4}
\]
where ρxy is the rate ratio for a substitution from x to y. Thus, a branch length is simply the sum over all values:

\[
t_{br} \, \pi_x \, \rho_{xy} \, \pi_y
\tag{5}
\]
for all x and y (where y ≠ x). This value represents the expected number of x to y substitutions per site for some branch. Because this value is averaged over all nucleotide sites, it is not very useful for the purposes of this paper. Dividing this value by πx gives the expression

\[
t_{br} \, \rho_{xy} \, \pi_y
\tag{6}
\]
representing the expected number of x to y substitutions per x site for a particular branch. This value provides the rate that some specific nucleotide will be substituted to some other nucleotide along some branch, and it is relative to the rate of all other specific substitutions.
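The decomposition of a branch length into substitution-specific terms can be illustrated numerically. The sketch below uses a hypothetical three-state example (C, T, and E) with invented frequencies, rate ratios, and branch-scaling value; the function names are assumptions made for this illustration.

```python
import math


def branch_length_terms(t_br, rho, pi):
    """Expected x -> y substitutions per site: t_br * pi_x * rho_xy * pi_y."""
    return {
        (x, y): t_br * pi[x] * r * pi[y]
        for (x, y), r in rho.items() if x != y
    }


def per_x_site_rate(t_br, rho, pi, x, y):
    """Expected x -> y substitutions per x site: t_br * rho_xy * pi_y."""
    return t_br * rho[x, y] * pi[y]


# Invented values for a reduced three-state example.
pi = {"C": 0.40, "T": 0.55, "E": 0.05}
rho = {
    ("C", "T"): 1.0, ("T", "C"): 1.2,
    ("C", "E"): 0.1, ("E", "C"): 3.0,
    ("T", "E"): 0.05, ("E", "T"): 10.0,
}
t_br = 0.02

terms = branch_length_terms(t_br, rho, pi)
branch_length = sum(terms.values())  # total expected substitutions per site

# Dividing a per-site term by pi_x recovers the per-x-site rate.
loss_per_e_site = terms["E", "T"] / pi["E"]
print(branch_length, loss_per_e_site)
```

In this toy example the E-to-T term contributes little to the overall branch length because E sites are rare, yet the rate per E site is high, which is why the per-x-site quantity is the more informative one here.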


    Results
Model Fitting
An ML model was created to examine the dynamics of RNA editing evolution in angiosperm mitochondria. Sites of RNA editing were identified in 26 mitochondrial genes from 6 plants (supplementary table S1, Supplementary Material online), and these sites were represented in the genes as a fifth nucleotide state (denoted by the letter E). An instantaneous rate matrix was then developed to incorporate this fifth state into an ML framework. To identify the appropriate number of rate ratio parameters needed for the background rates (i.e., substitution rates between the nonedited states A, C, G, and T), a model was selected using Modeltest (Posada and Crandall 1998). For this analysis only, all data columns with edited sites were removed from the data set. Using either hierarchical LRTs or the Akaike information criterion (AIC) as the performance measure, the transversional model with a gamma distribution of rate variation among sites (TVM + G) was determined to be the best fit. However, the general time reversible plus gamma model (GTR + G) was within 2 AIC units of the TVM + G model, indicating that it also receives substantial support (Burnham and Anderson 2002). To prevent unduly constraining the model prior to further model fitting, the more general GTR + G model was chosen. Thus, a reversible RNA editing model (M1) was created by using an instantaneous rate matrix, QRev, that allows each of the reversible background rates to have a separate rate parameter (table 1). In this matrix, each of the reversible editing rates (i.e., substitution rates between the edited state E and any other state) is also allowed to have an independent parameter.
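The AIC comparison can be sketched as below, using the standard definition AIC = 2k − 2lnL. The log-likelihoods and parameter counts are invented for illustration and are not the Modeltest output from this study; the function names are hypothetical.

```python
def aic(ln_likelihood, n_params):
    """Akaike information criterion: 2k - 2 lnL."""
    return 2 * n_params - 2 * ln_likelihood


def delta_aic(models):
    """Return each model's AIC difference from the best (lowest) AIC.

    models: dict mapping a model name to (lnL, number of free parameters).
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in models.items()}
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}


# Invented values: GTR + G has one more free rate parameter than TVM + G.
candidates = {"TVM+G": (-51234.6, 8), "GTR+G": (-51234.1, 9)}
deltas = delta_aic(candidates)

# Models within 2 AIC units of the best are treated as well supported.
supported = sorted(name for name, d in deltas.items() if d <= 2.0)
print(deltas, supported)
```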

The reversible rate matrix QRev cannot be used to examine patterns of edit site gain and loss because reversible matrices, by definition, constrain forward and reverse rates to be equal. Therefore, a fully nonreversible matrix, QNrev, was also created that allows for separate forward and reverse parameters for editing and background rates. In addition, because the different codon positions may vary in the frequency of editing and in the effect of editing on protein sequence, a model of RNA editing may need to be fit separately for each codon position. To test the reversibility of the RNA editing model and the equality of the model across codon positions, a series of LRTs was performed. Model M1, which uses the QRev matrix and the entire data set, is the strictest model examined in this study and was defined as the null model in comparison with 3 alternatives: 1) model M2 that uses the QNrev matrix and the entire data set, 2) model M3 that uses a separate QRev matrix for each codon position, and 3) model M4 that uses a separate QNrev matrix for each codon position (table 1). For all 3 tests, there is a significant increase in likelihood for an editing model that is not reversible and not equal across codon positions (table 2).


Table 2 LRTs of RNA Editing Model Fit

 
The first 3 comparisons in table 2 demonstrate that, in general, a likelihood model that is both nonreversible and independent across codon positions significantly increases the fit to the data. However, these reversibility and codon position tests do not differentiate among the various contributions of the additional parameters for rate heterogeneity (α), nucleotide frequencies (π), branch scaling (t), and substitution rate ratios (ρ). Four models were generated to independently test the contribution of these 4 parameter classes: 1) model M5 constrains M4 to use the same gamma distribution of rate heterogeneity for all 3 codon positions, 2) model M6 constrains M4 to use the nucleotide frequencies across the entire data set (rather than codon-specific frequencies), 3) model M7 constrains M4 so that the branch-scaling parameters for all 3 codon positions are the same (which forces the proportionality of all branch lengths to be the same for all 3 positions), and 4) model M8 constrains M4 to use the same rate ratios for all codon positions (table 1). LRTs were performed that tested these 4 null models against the alternative model M4 (table 2). For the nucleotide and rate ratio parameters, there is a significant increase in likelihood for the alternative model, indicating that these parameters should be allowed to vary at each codon position. However, there was not a major benefit to allowing independent branch scaling or rate heterogeneity for each position (although branch scaling is marginally significant at the 0.05 threshold, this result does not remain significant after correction for multiple tests).

To incorporate both nonsignificant constraints into a single model, M9 was created that forces the branch scaling and rate heterogeneity parameters to be the same for all 3 codon positions (table 1). These constraints have only a minor effect on the shape of the phylogenetic tree. The proportionality of branch lengths resulting in the trees from M4 at each codon position is very similar to the corresponding branch lengths in model M9, in which the proportionality is forced to be the same for all 3 codon positions (fig. 1). Although some differences exist (most notably for internal branch lengths in the M4 position 3 tree relative to M9), they do not result in a significant change in likelihood (table 2). Therefore, M9 was used as the standard model to be manipulated in all subsequent analyses, and all rate estimates are derived from parameters optimized for this model (supplementary table S2, Supplementary Material online).


FIG. 1.— ML trees resulting from models M4 and M9. The ML trees from model M4 differ in both the proportionality and magnitude of branches among codon positions. For M9, although the length of a particular branch is allowed to vary among codon positions, the proportionality of branches to one another is fixed for all 3 positions. The scale bars apply to both models.

 
Rates of Gain and Loss of Edited Sites
From the parameter values optimized in the model (supplementary table S2, Supplementary Material online), a relative rate of substitution, representing the rate at which a particular nucleotide is substituted by some other nucleotide, can be calculated for any type of change. In model M9, the branch-scaling parameter for a particular branch (tbr) is the same for all x to y substitution rates at all codon positions. Thus, when making a comparison of substitution rates (tbr · ρxy · πy) within or between positions, the branch-scaling parameter can be factored out of the comparison and the same proportionality will hold by comparing instantaneous rates (ρxy · πy). This fact also allows general conclusions to be made without regard to any particular branch because the same instantaneous rate is applied to all branches.
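Written out for two arbitrary substitution types, x to y and u to v (the second pair of indices is notation introduced here for illustration), the cancellation of the shared branch-scaling parameter is

\[
\frac{t_{br} \, \rho_{xy} \, \pi_y}{t_{br} \, \rho_{uv} \, \pi_v}
= \frac{\rho_{xy} \, \pi_y}{\rho_{uv} \, \pi_v}
\]

so the ratio of two substitution rates on any branch equals the ratio of the corresponding instantaneous rates. When the two substitution types come from different codon positions, the rate ratios and frequencies are those estimated for the respective positions, whereas the branch-scaling parameter is the same in both and still cancels.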

A gain of editing was defined as the instantaneous rate at which a thymidine is substituted by an edited cytidine (T-to-E), whereas a loss of editing was defined as the substitution rate from an edited cytidine to a thymidine (E-to-T). Because an E state and a T state will both produce a U in the mature RNA molecule, T-to-E and E-to-T substitutions will not result in a change in protein sequence, and therefore, both types should be free of selection at the protein level. Using the parameters optimized in model M9, the instantaneous rates of edit site gain and loss were calculated and compared (fig. 2A). As can be seen, the E-to-T rate of edit site loss is much faster than the T-to-E gain rate. Losses occur 130–390 times more frequently than gains depending on codon position. This difference is significant because model M10 forces these rates to be equal (table 1) but is a significantly worse fit to the data than is model M9 (table 3).


FIG. 2.— Rate dynamics of RNA edit site gain and loss. Comparison of instantaneous rates for the gain and loss of (A) edited sites and (B) edit site recognition. Comparison of rate ratios for the gain and loss of (C) edited sites and (D) edit site recognition.

 

Table 3 LRTs of Rate Dynamics of RNA Edited Sites

 
The higher instantaneous rate of edit site loss relative to gain is not surprising. On a mechanistic basis, the rate of loss of editing is expected to be more frequent than the gain rate. This is because an E-to-T loss can occur through a C-to-T substitution at the genomic level, whereas a T-to-E gain requires not only a T-to-C substitution but also a gain of a recognition signal (i.e., a C-to-E change). However, the accelerated loss rate could also suggest that selection is acting to eliminate edited sites from the genome, consistent with previous hypotheses and analyses on RNA editing evolution (Covello and Gray 1993; Shields and Wolfe 1997). Because E-to-T and T-to-E changes do not alter the protein sequence, they can be considered as a type of synonymous substitution. If edited sites are not under selection, then the instantaneous rate of E-to-T loss (occurring through a genomic C-to-T substitution) might be expected to be equal to the rate of C-to-T substitution at third positions, which is also a synonymous change (Shields and Wolfe 1997). Extending this hypothesis to T-to-E gains is more complex because an edit site may be gained in 2 ways: by a genomic T-to-C transition followed by a C-to-E acquisition of a recognition motif or by a T-to-C transition only (which is then immediately recognized by the editing machinery). Thus, in the absence of selection, the rate of T-to-E gain might be equal to the product of the third position rates of T-to-C substitution and C-to-E substitution, to the third position T-to-C rate alone, or to some value in between.

To test these hypotheses, model M11 was created that constrained the E-to-T loss rate at each codon position to be equal to the C-to-T transition rate at third positions (table 1). Also, models were created (table 1) that constrained the T-to-E gain rate at each codon position to be equal to the product of the third position rates of T-to-C and C-to-E substitution (M12) or to the third position T-to-C rate only (M13). Comparison of model M11 with M9 shows that M11 is a significantly worse fit to the data, indicating that there are significant differences in the rates of editing loss and synonymous C-to-T changes (table 3). E-to-T losses of editing, via silent C-to-T changes, occur 15–35 times faster than silent C-to-T changes at the third positions (fig. 2A), which indicates that selection is acting to remove sites of RNA editing from the mitochondrial genome. In contrast, the T-to-E gain rate falls within the range of neutral expectations (fig. 2A) and is not significantly different from either extreme (M12 and M13 vs. M9 in table 3). Thus, selection does not have a detectable influence on the rate of edit site gain regardless of the confounding effect of the presence or absence of recognition motifs.

Despite the higher instantaneous rate of edit site loss relative to gain, the number of gains and losses of edited sites may still be in equilibrium because there are so few edited cytidine positions relative to thymidines. In other words, if E sites are rare but lost readily, whereas T sites are common but rarely converted to E sites, then the total numbers of sites undergoing E-to-T and T-to-E changes may be roughly equal. To test whether E-to-T losses are in equilibrium with T-to-E gains, model M14 was implemented that enforces the reversibility criterion of equation (2) (i.e., πE · ρET · πT = πT · ρTE · πE) at each codon position (table 1). Because the nucleotide frequency parameters can be canceled out from both sides of the equation, the equation was enforced in model M14 by setting forward and reverse rate ratio parameters equal at each codon position (i.e., ρET = ρTE). This model is a significantly worse fit to the data than model M9 (table 3), indicating that the observed differences between the forward and reverse rate ratios in model M9 are significant (fig. 2C). E-to-T losses have accumulated over 15 times more frequently than T-to-E gains regardless of codon position. These results demonstrate that the frequency of editing is still in flux in angiosperm mitochondrial sequences, with a strong tendency for the amount of editing to decrease over time.
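The distinction between the two comparisons can be made concrete with a small numerical sketch. The frequencies and rate ratios below are invented, chosen only to be of the same order as the ranges reported above; the function name is hypothetical.

```python
import math


def loss_gain_ratios(rho_et, rho_te, pi_e, pi_t):
    """Compare E-to-T loss with T-to-E gain in two ways.

    instantaneous: per-site rates, (rho_ET * pi_T) / (rho_TE * pi_E).
    flux: totals weighted by how many sites are available to change,
          (pi_E * rho_ET * pi_T) / (pi_T * rho_TE * pi_E),
          which simplifies to rho_ET / rho_TE.
    """
    instantaneous = (rho_et * pi_t) / (rho_te * pi_e)
    flux = (pi_e * rho_et * pi_t) / (pi_t * rho_te * pi_e)
    return instantaneous, flux


# Invented values for illustration only.
instantaneous, flux = loss_gain_ratios(rho_et=16.0, rho_te=1.0,
                                       pi_e=0.02, pi_t=0.30)
print(instantaneous, flux)
```

Because the frequencies cancel in the second ratio, equilibrium between gains and losses requires equal rate ratios; any excess of the loss rate ratio over the gain rate ratio implies a net decline in the number of edited sites.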

Rates of Gain and Loss of Edited Site Recognition
A site of RNA editing may also be gained or lost through the acquisition or degradation of a putative site-specific recognition motif in the sequence surrounding a cytidine. The rate of gain and loss of edit site recognition (C-to-E and E-to-C changes, respectively) can also be evaluated with the RNA editing model devised in this study. Comparison of instantaneous rates shows that a loss of recognition occurs approximately 15 times faster at first and second positions and over 40 times faster at third positions than a gain of recognition at these same positions (fig. 2B). Comparison between models M15, which enforces equality of the instantaneous rates of gain and loss (table 1), and M9 verifies that the difference between the gain and loss rate of recognition is significant (table 4). A higher rate of E-to-C loss relative to C-to-E gain is also not unexpected on a mechanistic basis. A loss of recognition of an edited cytidine is expected to occur more easily than a gain of recognition of an unedited cytidine because multiple nucleotide positions have been shown to be critical for edit site specification (Choury et al. 2004; Neuwirt et al. 2005). Thus, a single mutation at any of these critical nucleotides (or in the recognition factor's binding region) will reduce or abolish the interaction between the factor and the motif. In contrast, before editing could arise at a random unedited C, the transcript must acquire either a recognition motif that can be recognized by some preexisting recognition factor or a recognition factor capable of binding the preexisting sequence context.


Table 4 LRTs of Rate Dynamics of Editing Recognition Motifs

 
However, as in the previous section, a higher loss rate of edit site recognition relative to gain may also be indicative of selection against the presence of edited sites. For recognition rate dynamics, though, there is an added layer of selection not only on the editing process itself but also on the outcome of the process. An edited cytidine and an unedited cytidine will result in a different nucleotide being present in the mature RNA transcript, which will usually have consequences for the protein sequence (except when the cytidine occurs at a degenerate codon position, such as at any third position and some first positions). Because of this, it might be expected that the effect of selection at the protein level will differ depending on codon position. Indeed, this is precisely what is observed (fig. 2B). For both C-to-E and E-to-C substitutions, the rates are fastest at third positions (where a change in editing status is always synonymous) and slowest at second positions (where editing status changes are always nonsynonymous). The intermediate level of first position rates is probably due to the facts that these changes are sometimes synonymous and that when nonsynonymous they cause a smaller shift of the physicochemical properties of the amino acid than do second position changes. Models (table 1) that constrain either E-to-C losses (M16) or C-to-E gains (M17) to be equal across codon positions are a significantly worse fit to the data than M9 (table 4), indicating that these codon position–specific differences are important.

As expected, there are clear differences in recognition rate dynamics among codon positions, most likely due to selection acting on the codon position–specific outcomes of editing. However, there is no obvious reason to expect that this protein-level selection should also account for the observed difference between the rates of loss and gain of site recognition. To correct for any potential confounding effects of selection at the protein level, the instantaneous rates of E-to-C loss of recognition were compared with the rates of T-to-C substitution at the same codon position. Because both types of change will have the same effect on the protein sequence, any differences in rate should not be attributable to protein-level selection. As can be seen, the rates of recognition loss are 90, 63, and 400 times faster than the T-to-C rates at first, second, and third positions, respectively (fig. 2B). These differences are significant because model M18, which constrains the rates of E-to-C loss to equal the rates of T-to-C transition (table 1), is a significantly worse fit than model M9 (table 4). In contrast, there were no significant differences (M19 vs. M9 in table 4) between the C-to-E gain rates and the C-to-T rates at corresponding codon positions (fig. 2B).

Thus, there is a substantially higher rate of recognition loss, even after correcting for protein-level selection. This effect is most pronounced at third positions, which suggests that at least some of the difference between gain and loss rates is due to selection against the process of editing itself rather than mechanistic differences in gaining or losing an editing recognition signal. Selection to remove edited sites should be strongest at third positions because all third position edited sites are silent and can be removed without affecting the protein sequence, but there is no obvious reason why a loss of recognition should be easier at third positions than at first or second positions.

Regardless of the cause of the different rates of gain and loss of edited site recognition, this effect appears to have become more pronounced over the course of angiosperm history. Comparison of rate ratios for E-to-C losses and C-to-E gains reveals that the number of recognition site losses accumulates between 3 and 4 times faster than the number of gained recognition motifs (fig. 2D), and this disequilibrium is significant based on an LRT that examines the reversibility of these parameters (M20 vs. M9 in table 4). Thus, even after consideration of the fact that there are many fewer recognition motifs that can be lost than positions where a motif could be gained, there is still a tendency for the total number of recognition motifs to be decreased in angiosperm mitochondrial sequences over time.
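One way to read this comparison, assuming that the expected number of changes per unit time out of a state is its frequency multiplied by its instantaneous rate, is as a check of detailed balance: under a reversible process the E-to-C flux and the C-to-E flux would be equal. The sketch below illustrates that bookkeeping. The frequencies and rates are placeholders chosen for illustration, not estimates from this study, and the function names are hypothetical.

```python
def flux(freq_from, rate):
    """Expected changes per unit time out of a state: frequency times rate."""
    return freq_from * rate


def loss_to_gain_flux_ratio(freq_e, freq_c, rate_ec, rate_ce):
    """Ratio of E-to-C flux to C-to-E flux; 1.0 means the two are balanced."""
    return flux(freq_e, rate_ec) / flux(freq_c, rate_ce)


def balanced_frequency_ratio(rate_ec, rate_ce):
    """freq_E / freq_C at which losses and gains would exactly offset."""
    return rate_ce / rate_ec


# Placeholder inputs: edited sites are rarer than unedited cytidines,
# but each edited site is lost 15 times faster than an unedited C is gained.
ratio = loss_to_gain_flux_ratio(freq_e=0.02, freq_c=0.10,
                                rate_ec=15.0, rate_ce=1.0)
print("loss/gain flux ratio:", ratio)
print("balanced freq_E/freq_C:", balanced_frequency_ratio(15.0, 1.0))
```

With these placeholder values the flux ratio is about 3, even though edited sites are five times rarer than unedited cytidines. The edited fraction sits above the level at which gains and losses would cancel, so the number of edited sites would be expected to fall over time.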


    Discussion
 
In this paper, a novel ML model was created in order to examine the rate dynamics of RNA editing in flowering plants. The model performs well in that the estimated rates of edited site gain and loss agree with expectations on many levels. Of particular note, sites of RNA editing were found to be lost much more rapidly than gained (fig. 2). This was expected for several reasons. On a mechanistic basis, the loss of an edited site, either by genomic replacement or by loss of recognition, should be less complex than an edited site gain. Also, selection might be expected to favor the loss of RNA editing, as demonstrated previously (Shields and Wolfe 1997) and confirmed here. However, whereas Shields and Wolfe found a 3- to 4-fold increase in edit site loss relative to silent C-to-T transitions, the ML model used here reported a 15-fold increase at first and second positions and a 35-fold increase at third positions (fig. 2A). This discrepancy is likely due to the different methodologies used. Estimating rates in a phylogenetic framework enables the detection of repeated gains and losses along a single lineage as well as independent losses in different lineages; the pairwise analyses of Shields and Wolfe (1997) would miss these multiple hits. A number of factors could contribute to the selective cost of maintaining the complex process of RNA editing: the conservation of tens to hundreds of nuclear genes involved in the process and the manufacture of their protein products, the preservation of sequence contexts surrounding edited sites to ensure that the sites are recognized by the editing machinery, and the production of incorrect organellar proteins from incompletely edited transcripts and/or the regulation of these transcripts to ensure that they are not translated.

If selection favors the loss of edit sites, then it might also be expected that gains should be disfavored. Unfortunately, there is a complexity in testing this hypothesis because the fraction of gain sites with preexisting recognition motifs relative to those that must acquire them is not known; yet, these 2 types of gains are likely to happen at very different rates. This uncertainty in the presence or absence of recognition motifs is a violation of the memoryless property of Markov models because a site that has undergone an E-to-T loss will retain the recognition motif for some time and will therefore be more likely to revert to an edited site than a random T. To more precisely model the gain process, a hidden Markov model could be generated where the motif status is unknown for a given site. Using the simpler model described here, the observed gain rate could only be tested against the 2 extreme scenarios: all gain sites with preexisting motifs and all gain sites without. Surprisingly, the rate of editing gain was not found to be significantly different from either end of the spectrum of neutral rate possibilities, indicating that the gain rate is not under detectable levels of selection no matter what the fraction of sites with preexisting motifs turns out to be. The lack of selective constraints on editing gains suggests that many of these gains do not lead to an increase in the complexity of the editing process. For example, a new edit site may never arise unless it fortuitously acquires a recognition motif identical to a motif at some other site so that no new recognition factors are needed.

At equilibrium, the frequency of editing will be dictated by the relative rates of gain and loss of editing. Because of the combined effects of the mechanistic difficulties and selective costs in maintaining editing sites, loss rates are much higher than gain rates. Consequently, we can predict that the frequency of editing will never accumulate to a high level in plants unless the strength of one or both of these effects is reduced. However, the results of the reversibility tests indicate that the frequency of editing is still in flux and that the number of edited sites has decreased in angiosperm mitochondrial sequences over time (fig. 2C and D). The fact that edited sites are not in equilibrium is somewhat surprising because it suggests that the frequency of editing was higher in the common ancestor of flowering plants and hence that the mechanistic difficulties or selective costs of editing (or both) have become increasingly stringent during the course of angiosperm evolution. Interestingly, limited sampling of editing in gymnosperm sequences suggests that many species have higher numbers of edited sites overall (Glaubitz and Carlson 1992; Hiesel et al. 1994; Karpinska et al. 1995; Lu et al. 1998). The higher editing frequencies in gymnosperms may therefore represent a state that more closely resembles the ancestral condition. It is possible that the results are specific to the species examined here because they are from only 4 families (Brassicaceae, Myrtaceae, Amaranthaceae, and Poaceae) out of almost 500 across flowering plants (Angiosperm Phylogeny Group 2003). However, each family represents a diverse lineage spanning over 100 Myr of independent angiosperm evolution (Wikstrom et al. 2001). Although there may be angiosperms whose rate dynamics strongly deviate from the pattern observed here, they will most likely be the exception rather than the rule.
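The link between rates and equilibrium frequency can be illustrated with a deliberately reduced two-state model in which a site is either edited or editable but unedited. This is a simplification of the five-state model used in the study, the function names are hypothetical, and the rates and starting frequency below are placeholders. If gains occur at rate g and losses at rate l, the edited fraction f obeys df/dt = g(1 - f) - l f, which gives the equilibrium g / (g + l) and an exponential approach to it.

```python
import math


def equilibrium_edit_fraction(gain, loss):
    """Edited fraction at which gains and losses balance in the two-state model."""
    return gain / (gain + loss)


def edit_fraction_at(t, start, gain, loss):
    """Edited fraction after time t, starting from the fraction `start`."""
    eq = equilibrium_edit_fraction(gain, loss)
    return eq + (start - eq) * math.exp(-(gain + loss) * t)


# Placeholder rates: losses 15 times faster than gains, and an ancestral
# edited fraction (0.30) set above the equilibrium value for illustration.
gain, loss, ancestral = 1.0, 15.0, 0.30
print("equilibrium fraction:", equilibrium_edit_fraction(gain, loss))
for t in (0.0, 0.05, 0.1, 0.5):
    print(t, round(edit_fraction_at(t, ancestral, gain, loss), 4))
```

With a 15-fold excess of loss over gain, the equilibrium edited fraction in this toy model is 1/16. A lineage that starts above that value loses edited sites monotonically until it reaches it, which is the qualitative pattern that a frequency "still in flux" implies.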

Plants that might show unusual frequencies of editing are those with very high or low rates of synonymous substitutions because some studies have suggested a correlation between the rate of synonymous substitution and the frequency of editing (Parkinson et al. 2005; Lynch et al. 2006). Consistent with this hypothesis, a reduced frequency of editing has already been observed for several plants with extremely fast synonymous substitution rates (Cho et al. 2004; Parkinson et al. 2005). At the opposite end of the spectrum, a high frequency of editing was found for Magnolia and for several gymnosperms (Perrotta et al. 1996; Lu et al. 1998; Regina et al. 2002) with low mitochondrial rates of synonymous substitution (Mower et al. 2007). This correlation does not always hold, however. For instance, Silene noctiflora has a very high synonymous substitution rate but was not predicted to have a greatly reduced frequency of editing (Mower et al. 2007). Furthermore, mitochondrial genes can lose some or all of their edited sites through the reverse transcription of edited transcripts followed by reinsertion into the genome (Geiss et al. 1994; Bowe and dePamphilis 1996; Lu et al. 1998; Lopez et al. 2007), which might occur without respect to substitution rates. It will be interesting to reevaluate the correlation between editing frequency and substitution rate once more genomic and transcriptomic data are available for plants with unusually high or low synonymous rates.

This paper has shown that an ML model of RNA editing can quantitatively model numerous aspects of RNA editing frequencies and rates in plants and can provide additional information on the evolution of these rates over time. The success of this model for plant mitochondrial RNA editing indicates that adaptations to other systems where nucleotide modifications occur might be equally successful, such as for DNA methylation, tRNA modification, RNA editing in other organisms, and even posttranslational protein modifications. Additionally, this model might be useful in properly handling edited sites in phylogenetic analyses of plant mitochondrial and chloroplast genes. There is currently no consensus on the effect of RNA edited sites in molecular systematics (Bowe and dePamphilis 1996; Vangerow et al. 1999; Petersen et al. 2006; Qiu et al. 2006). Indeed, some researchers prefer to exclude columns with sites of RNA editing from an analysis, whereas others suggest that edited sites should increase phylogenetic signal. An ML model that treats edited sites separately will allow these sites to remain in all phylogenetic analyses and will at the same time allow them to play by their own rules in the analysis.
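Keeping edited sites in an alignment under a five-state model requires that they be marked as their own character. A minimal sketch of one possible recoding is given below, assuming that aligned genomic and transcript (cDNA) sequences of equal length are available for each gene. The function names and the choice of "E" as the symbol are illustrative, and this section does not describe how the input data were actually encoded for the analyses.

```python
def recode_edited_sites(genomic, transcript):
    """Return the genomic sequence with edited cytidines written as 'E'.

    A site is treated as edited when the genome has C and the transcript
    has T (or U). Both sequences must already be aligned to each other.
    """
    if len(genomic) != len(transcript):
        raise ValueError("sequences must be aligned to the same length")
    recoded = []
    for g, r in zip(genomic.upper(), transcript.upper().replace("U", "T")):
        recoded.append("E" if g == "C" and r == "T" else g)
    return "".join(recoded)


def split_by_codon_position(seq):
    """Split an in-frame coding sequence into first, second, third positions."""
    return seq[0::3], seq[1::3], seq[2::3]


# Toy example: the fourth nucleotide is a C in the genome and a T in the cDNA.
recoded = recode_edited_sites("ATGCCA", "ATGTCA")
print(recoded)
print(split_by_codon_position(recoded))
```

Splitting by codon position afterward mirrors the position-specific rates discussed above, so that each partition can be given its own set of gain and loss parameters.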


    Supplementary Material
 
Supplementary tables S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Acknowledgements
 
I thank Sergei Kosakovsky Pond for helpful advice with the HyPhy package and Gavin Conant, Marie Sémon, Ken Wolfe, and Greg Young for critical reading of the manuscript. This work was supported by a postdoctoral fellowship from the Irish Research Council for Science, Engineering, and Technology and by Science Foundation Ireland.


    Footnotes
 
Peter Lockhart, Associate Editor


    References
 

    Angiosperm Phylogeny Group. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG II. Bot J Linn Soc (2003) 141:399–436.

    Bowe LM, dePamphilis CW. Effects of RNA editing and gene processing on phylogenetic reconstruction. Mol Biol Evol (1996) 13:1159–1166.

    Brennicke A, Marchfelder A, Binder S. RNA editing. FEMS Microbiol Rev (1999) 23:297–316.

    Burnham KP, Anderson DR. Model selection and multimodel inference: a practical information-theoretic approach (2002) 2nd ed. New York: Springer.

    Cho Y, Mower JP, Qiu YL, Palmer JD. Mitochondrial substitution rates are extraordinarily elevated and variable in a genus of flowering plants. Proc Natl Acad Sci USA (2004) 101:17741–17746.

    Choury D, Farre JC, Jordana X, Araya A. Different patterns in the recognition of editing sites in plant mitochondria. Nucleic Acids Res (2004) 32:6397–6406.

    Covello PS, Gray MW. RNA editing in plant mitochondria. Nature (1989) 341:662–666.

    Covello PS, Gray MW. On the evolution of RNA editing. Trends Genet (1993) 9:265–268.

    Cummings MP, Myers DS. Simple statistical models predict C-to-U edited sites in plant mitochondrial RNA. BMC Bioinformatics (2004) 5:132.

    Felsenstein J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol (1981) 17:368–376.

    Fey J, Weil JH, Tomita K, Cosset A, Dietrich A, Small I, Marechal-Drouard L. Editing of plant mitochondrial transfer RNAs. Acta Biochim Pol (2001) 48:383–389.

    Geiss KT, Abbas GM, Makaroff CA. Intron loss from the NADH dehydrogenase subunit 4 gene of lettuce mitochondrial DNA: evidence for homologous recombination of a cDNA intermediate. Mol Gen Genet (1994) 243:97–105.

    Giege P, Brennicke A. RNA editing in Arabidopsis mitochondria effects 441 C to U changes in ORFs. Proc Natl Acad Sci USA (1999) 96:15324–15329.

    Glaubitz JC, Carlson JE. RNA editing in the mitochondria of a conifer. Curr Genet (1992) 22:163–165.

    Gott JM, Emeson RB. Functions and mechanisms of RNA editing. Annu Rev Genet (2000) 34:499–531.

    Gualberto JM, Lamattina L, Bonnard G, Weil JH, Grienenberger JM. RNA editing in wheat mitochondria results in the conservation of protein sequences. Nature (1989) 341:660–662.

    Hall TA. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser (1999) 41:95–98.

    Handa H. The complete nucleotide sequence and RNA editing content of the mitochondrial genome of rapeseed (Brassica napus L.): comparative analysis of the mitochondrial genomes of rapeseed and Arabidopsis thaliana. Nucleic Acids Res (2003) 31:5907–5916.

    Hiesel R, Combettes B, Brennicke A. Evidence for RNA editing in mitochondria of all major groups of land plants except the Bryophyta. Proc Natl Acad Sci USA (1994) 91:629–633.

    Hiesel R, Wissinger B, Schuster W, Brennicke A. RNA editing in plant mitochondria. Science (1989) 246:1632–1634.

    Karpinska B, Karpinski S, Hillgren JE. The genes encoding subunit 3 of NADH dehydrogenase and ribosomal protein S12 are co-transcribed and edited in Pinus sylvestris (L.) mitochondria. Curr Genet (1995) 28:423–428.

    Kosakovsky Pond SL, Frost SDW, Muse SV. HyPhy: hypothesis testing using phylogenies. Bioinformatics (2005) 21:676–679.

    Kotera E, Tasaka M, Shikanai T. A pentatricopeptide repeat protein is essential for RNA editing in chloroplasts. Nature (2005) 433:326–330.

    Lopez L, Picardi E, Quagliariello C. RNA editing has been lost in the mitochondrial cox3 and rps13 mRNAs in Asparagales. Biochimie (2007) 89:159–167.

    Lu MZ, Szmidt AE, Wang XR. RNA editing in gymnosperms and its impact on the evolution of the mitochondrial coxI gene. Plant Mol Biol (1998) 37:225–234.

    Lurin C, Andres C, Aubourg S, et al. Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell (2004) 16:2089–2103. (19 co-authors).

    Lynch M, Koskella B, Schaack S. Mutation pressure and the evolution of organelle genomic architecture. Science (2006) 311:1727–1730.

    Mower JP. PREP-Mt: predictive RNA editor for plant mitochondrial genes. BMC Bioinformatics (2005) 6:96.

    Mower JP, Palmer JD. Patterns of partial RNA editing in mitochondrial genes of Beta vulgaris. Mol Genet Genomics (2006) 276:285–293.

    Mower JP, Touzet P, Gummow JS, Delph LF, Palmer JD. Extensive variation in synonymous substitution rates in mitochondrial genes of seed plants. BMC Evol Biol (2007) 7:135.

    Mulligan RM, Chang KL, Chou CC. Computational analysis of RNA editing sites in plant mitochondrial genomes reveals similar information content and a sporadic distribution of editing sites. Mol Biol Evol (2007) 24:1971–1981.

    Neuwirt J, Takenaka M, van der Merwe JA, Brennicke A. An in vitro RNA editing system from cauliflower mitochondria: editing site recognition parameters can vary in different plant species. RNA (2005) 11:1563–1570.

    Notsu Y, Masood S, Nishikawa T, Kubo N, Akiduki G, Nakazono M, Hirai A, Kadowaki K. The complete sequence of the rice (Oryza sativa L.) mitochondrial genome: frequent DNA sequence acquisition and loss during the evolution of flowering plants. Mol Genet Genomics (2002) 268:434–445.

    Okuda K, Nakamura T, Sugita M, Shimizu T, Shikanai T. A pentatricopeptide repeat protein is a site recognition factor in chloroplast RNA editing. J Biol Chem (2006) 281:37661–37667.

    Parkinson CL, Mower JP, Qiu YL, Shirk AJ, Song K, Young ND, dePamphilis CW, Palmer JD. Multiple major increases and decreases in mitochondrial substitution rates in the plant family Geraniaceae. BMC Evol Biol (2005) 5:73.

    Perrotta G, Malek O, Heiser V, Brennicke A, Grohmann L, Quagliariello C. RNA editing in the cox3 mRNA of Magnolia is more extensive than in other dicot or monocot plants. Biochim Biophys Acta (1996) 1307:254–258.

    Petersen G, Seberg O, Davis JI, Stevenson DW. RNA editing and phylogenetic reconstruction in two monocot mitochondrial genes. Taxon (2006) 55:871–886.

    Picardi E, Quagliariello C. EdiPy: a resource to simulate the evolution of plant mitochondrial genes under the RNA editing. Comput Biol Chem (2006) 30:77–80.

    Posada D, Crandall KA. Modeltest: testing the model of DNA substitution. Bioinformatics (1998) 14:817–818.

    Qiu YL, Li L, Hendry TA, Li R, Taylor DW, Issa MJ, Ronen AJ, Vekaria ML, White AM. Reconstructing the basal angiosperm phylogeny: evaluating information content of mitochondrial genes. Taxon (2006) 55:837–856.

    Regina TM, Lopez L, Picardi E, Quagliariello C. Striking differences in RNA editing requirements to express the rps4 gene in magnolia and sunflower mitochondria. Gene (2002) 286:33–41.

    Salone V, Rudinger M, Polsakiewicz M, Hoffmann B, Groth-Malonek M, Szurek B, Small I, Knoop V, Lurin C. A hypothesis on the identification of the editing enzyme in plant organelles. FEBS Lett (2007) 581:4132–4138.

    Schuster W, Ternes R, Knoop V, Hiesel R, Wissinger B, Brennicke A. Distribution of RNA editing sites in Oenothera mitochondrial mRNAs and rRNAs. Curr Genet (1991) 20:397–404.

    Shields DC, Wolfe KH. Accelerated evolution of sites undergoing mRNA editing in plant mitochondria and chloroplasts. Mol Biol Evol (1997) 14:344–349.

    Shikanai T. RNA editing in plant organelles: machinery, physiological function and evolution. Cell Mol Life Sci (2006) 63:698–708.

    Small ID, Peeters N. The PPR motif—a TPR-related motif prevalent in plant organellar proteins. Trends Biochem Sci (2000) 25:46–47.

    Soltis PS, Soltis DE, Chase MW. Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature (1999) 402:402–404.

    Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Res (1994) 22:4673–4680.

    Vangerow S, Teerkorn T, Knoop V. Phylogenetic information in the mitochondrial nad5 gene of pteridophytes: RNA editing and intron sequences. Plant Biol (1999) 1:235–243.

    Wikstrom N, Savolainen V, Chase MW. Evolution of the angiosperms: calibrating the family tree. Proc R Soc Lond B Biol Sci (2001) 268:2211–2220.

Accepted for publication October 11, 2007.

