MBE Advance Access originally published online on January 3, 2008
Molecular Biology and Evolution 2008 25(3):568-579; doi:10.1093/molbev/msm284

© The Author 2008. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Mutation-Selection Models of Codon Substitution and Their Use to Estimate Selective Strengths on Codon Usage

Ziheng Yang* and Rasmus Nielsen†

* Department of Biology, Galton Laboratory, University College London, London, United Kingdom
† Department of Biology, University of Copenhagen, Copenhagen, Denmark

E-mail: z.yang@ucl.ac.uk.


    Abstract
Current models of codon substitution are formulated at the level of nucleotide substitution and do not explicitly consider the separate effects of mutation and selection. They are thus incapable of inferring whether mutation or selection is responsible for evolution at silent sites. Here we implement a few population genetics models of codon substitution that explicitly consider mutation bias and natural selection at the DNA level. Selection on codon usage is modeled by introducing codon-fitness parameters, which, together with mutation-bias parameters, predict optimal codon frequencies for the gene. The selective pressure may be for translational efficiency and accuracy or for fine-tuning translational kinetics to produce correct protein folding. We apply the models to compare mitochondrial and nuclear genes from several mammalian species. Model assumptions concerning codon usage are found to affect the estimation of sequence distances (such as the synonymous rate $d_S$, the nonsynonymous rate $d_N$, and the rate at the 4-fold degenerate sites $d_4$), as found in previous studies, but the new models produced estimates very similar to those from some old ones. We also develop a likelihood ratio test to examine the null hypothesis that codon usage is due to mutation bias alone, not influenced by natural selection. Application of the test to the mammalian data led to rejection of the null hypothesis in most genes, suggesting that natural selection may be a driving force in the evolution of synonymous codon usage in mammals. Estimates of selection coefficients nevertheless suggest that selection on codon usage is weak and most mutations are nearly neutral. The sensitivity of the analysis to the assumed mutation model is discussed.

Key Words: codon substitution model • codon usage • mutation • selection • synonymous substitution


    Introduction
 
In protein-coding genes, synonymous codons that code for the same amino acid do not appear at the same frequency (Ikemura 1981, 1985). Whether the origin and maintenance of such codon usage bias is due to biases in the mutation process or to natural selection has been a matter of much controversy (see, e.g., Duret 2002 for review). Mutation bias must play a role, but the significance of selection in driving the evolution of codon usage is less certain and may depend on the species. In fast-growing organisms with large population sizes, such as Escherichia coli, Saccharomyces cerevisiae, and Drosophila, codon usage is generally thought to be under selective pressure, as supported by several lines of evidence. First, codon frequencies are correlated with the cellular cognate tRNA concentrations (Ikemura 1981, 1985; Bennetzen and Hall 1982; Bulmer 1987; Sharp and Li 1987; Moriyama and Powell 1997). Preferential use of so-called major codons to match the most abundant tRNAs may enhance translational speed and improve translational accuracy (for reviews, see Akashi 1995; Sharp et al. 1995; Duret 2002). In addition, major codons may reduce the energetic cost of translation by reducing the chances of amino acid misincorporations and ribosomal drop-offs (Kurland 1992) and by freeing up the protein synthesis machinery through faster ribosomal elongation. Second, in both Drosophila and Caenorhabditis elegans, codon usage is correlated with gene expression, with highly expressed genes having strongly biased codon usage, presumably because of stronger selective pressure (Duret and Mouchiroud 1999; Castillo-Davis and Hartl 2002). Third, silent substitution rate (measured by the sequence distances $d_S$ or $d_4$ at the synonymous or 4-fold degenerate sites) is lower in genes with highly biased codon usage, implying stronger purifying selection on silent mutations in highly biased genes (e.g., Sharp and Li 1987). This correlation was nevertheless found to depend on the method used to estimate silent rates (Dunn et al. 2001; Bierne and Eyre-Walker 2003). Fourth, in Drosophila, codon usage is more biased for conserved amino acids than for nonconserved amino acids (Akashi 1994). This may be explained by selection for translational accuracy because highly conserved amino acids are expected to be functionally more important and less tolerant to misincorporations of wrong amino acids and are thus under stronger selective pressure.

In slowly growing organisms with small population sizes such as vertebrates, natural selection may be inefficient and indeed its effect on codon usage is controversial (see, e.g., Duret 2002 for a review). In contrast to results for bacteria, yeast, and Drosophila, strong evidence for selection on codon usage is lacking in vertebrates. For example, Kanaya et al. (2001) found a correspondence between codon bias and tRNA gene copy number (a proxy for tRNA concentration) in Schizosaccharomyces pombe and C. elegans but not in Xenopus laevis and Homo sapiens; in the latter species, highly expressed genes such as ribosomal genes and histone genes do not have strong codon bias. Some studies (e.g., Musto et al. 2001) found a correlation between codon bias and putative expression levels (as measured by expressed sequence tag frequencies), but this correlation could be explained by transcription-coupled repair (Duret 2002).

Besides selection for translational efficiency and accuracy, recent experimental work suggests that the selective pressure on codon usage may also be due to the need for optimal translation kinetics to ensure correct protein folding. Protein folding is thought to be cotranslational, occurring as the protein is translated from the mRNA (Frydman 2001). The use of preferred and unpreferred codons may affect the rate at which the protein is translated. The translation kinetics may be important in temporally separating folding events during protein synthesis on the ribosome, thus ensuring "beneficial" interactions and avoiding "unwanted" interactions within the growing peptide, to achieve a high yield of the correctly folded protein. Kimchi-Sarfaty et al. (2007) reported that certain synonymous mutations in the multidrug resistance 1 gene resulted in altered drug and inhibitor interactions. They found similar mRNA and protein levels but altered protein conformations between the "wild type" and mutant protein products and hypothesized that the incorporation of rare synonymous codons may have affected the timing of folding. This form of selection differs from translational selection in that preferred codons are not always advantageous if the optimal folding requires slow translation. It is unclear how important such selection for protein folding is to the evolutionary process of protein-coding genes.

A number of authors have studied population genetics models in which the proportions of synonymous codons are modeled as the product of interactions between mutation bias, natural selection, and genetic drift (Kimura 1983; Li 1987; Bulmer 1991; McVean and Charlesworth 1999). McVean and Vieira (1999) applied maximum likelihood (ML) to fit such a model to counts of synonymous codons for 2-fold amino acids in protein-coding genes in several Drosophila species, to estimate parameters of mutation bias and selective pressure. The analysis does not consider the evolutionary relationships among species, which may provide useful information concerning relative mutation rates between nucleotides. This model was extended by McVean and Vieira (2001) to analyze synonymous differences between different species, with nonsynonymous differences ignored. Nielsen et al. (2007) implemented a codon-substitution model in which a mutation is favored or disfavored by natural selection depending on whether it changes an unpreferred codon into a preferred one or vice versa. The model was applied to Drosophila protein-coding genes to obtain ML estimates of parameters measuring the strength of selection. This method requires a priori partitioning of synonymous codons into preferred and unpreferred categories and also assumes only one selection coefficient to accommodate selection on codon usage.

In this paper, we implement a few new models of codon substitution that relax those assumptions. Our motivations for this study are 2-fold. First, we devise a likelihood ratio test (LRT) of neutral evolution of codon usage to infer possible effects of natural selection. Whereas many previous studies have performed correlation analysis to test the various predictions of the mutation and selection theory of codon usage bias (see above), the LRT addresses this problem directly. Our model also provides direct measurements of selection acting on silent sites. Second, we examine the effects of model assumptions about codon usage on estimation of sequence distances such as $d_S$, $d_N$, and their ratio $\omega = d_N/d_S$. There has been considerable interest in the use of the $\omega$ ratio to detect positive selection affecting protein evolution, and some concerns have been expressed as to whether this inference is affected by natural selection acting on silent sites (Kreitman and Akashi 1995; Yang and Bielawski 2000). We analyze 2 sets of data to address these issues, the first consisting of the human and chimpanzee mitochondrial protein-coding genes and the second of 5,639 protein-coding genes from 5 mammalian species: human, chimpanzee, macaque, mouse, and rat.


    Theory
 
A Mutation-Selection Model of Codon Substitution
We construct a model of codon substitution by specifying the instantaneous rate of substitution from sense codon $I = i_1 i_2 i_3$ to sense codon $J = j_1 j_2 j_3$, where $i_1$ is the nucleotide at the first position in codon $I$, and so on. We assume that point mutations occur independently at nucleotide sites, so the rate is zero if $I$ and $J$ differ at 2 or 3 codon positions (Goldman and Yang 1994). Thus, we focus on the rate between 2 codons that differ at only one position, say position $k$, with $i_k \neq j_k$. We explicitly model the process by which one codon replaces another, that is, mutation, selection on the DNA (selection on codon usage), and selection on the protein.

Mutation Bias
Let the mutation rate from nucleotide $i$ to nucleotide $j$ be $\mu_{ij}$ per generation. The mutation model applies to all 3 codon positions, although the base compositions at the 3 positions may differ. We use the general time reversible (GTR or REV) model (e.g., Yang 1994) to describe the mutation process so that $\mu_{ij} = a_{ij}\pi^*_j$, with $a_{ij} = a_{ji}$ for all $i \neq j$. Here $\pi^*_j$ reflects mutation bias; if $\pi^*_T$ is large, mutations are biased toward T. One of the mutation-bias parameters is redundant, and we scale them so that $\pi^*_T + \pi^*_C + \pi^*_A + \pi^*_G = 1$. If the HKY mutation model (Hasegawa et al. 1985) is used, $a_{ij} = \kappa$ if $i$ and $j$ differ by a transition and $a_{ij} = 1$ if $i$ and $j$ differ by a transversion, where $\kappa$ is the transition/transversion rate ratio. Our analysis below is based mostly on the HKY model, but GTR is used in some analyses to examine the robustness of the results.
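The HKY parametrization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mutation rate is taken to be the exchangeability ($\kappa$ for transitions, 1 for transversions) times the mutation-bias parameter of the target nucleotide, the function name `hky_mutation_rates` is hypothetical, and the numerical values are arbitrary rather than estimates from the data.

```python
NUCLEOTIDES = "TCAG"
# Transitions are T<->C (pyrimidines) and A<->G (purines); all else are transversions.
TRANSITIONS = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}

def hky_mutation_rates(kappa, pi_star):
    """Return {(i, j): mu_ij} with mu_ij = a_ij * pi_star[j] for i != j.

    a_ij is kappa for transitions and 1 for transversions (assumed HKY form).
    """
    total = sum(pi_star[n] for n in NUCLEOTIDES)
    scaled = {n: pi_star[n] / total for n in NUCLEOTIDES}  # enforce sum = 1
    rates = {}
    for i in NUCLEOTIDES:
        for j in NUCLEOTIDES:
            if i == j:
                continue
            a_ij = kappa if (i, j) in TRANSITIONS else 1.0
            rates[(i, j)] = a_ij * scaled[j]
    return rates

# Illustrative values only (not estimates from the paper).
pi_star = {"T": 0.1, "C": 0.2, "A": 0.3, "G": 0.4}
mu = hky_mutation_rates(kappa=2.0, pi_star=pi_star)
```

Because the exchangeabilities are symmetric, the mutation process alone is reversible with stationary nucleotide frequencies equal to the mutation-bias parameters, which is why a large value for T biases mutations toward T.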

Selection on Codon Usage
We model selection on codon usage by introducing a fitness parameter $f_I$ for codon $I$. The selection coefficient for the mutation that changes the wild-type codon $I$ into a new mutant codon $J$ is thus $s_{IJ} = f_J - f_I$. The probability of fixation of the mutation is approximately $2s_{IJ}/(1 - e^{-2Ns_{IJ}})$, where $N$ is the effective chromosomal population size (Fisher 1930; Wright 1931; Kimura 1957). Let $F_I = 2Nf_I$ be the scaled fitness of codon $I$, and $S_{IJ} = 2Ns_{IJ} = 2N(f_J - f_I) = F_J - F_I$ be the scaled selection coefficient. As the number of $I \to J$ mutations in a generation is $N\mu_{i_k j_k}$, the substitution rate from codon $I$ to codon $J$ is given as

$$q_{IJ} = N\mu_{i_k j_k} \times \frac{2s_{IJ}}{1 - e^{-2Ns_{IJ}}} = \mu_{i_k j_k}\, h(S_{IJ}), \qquad (1)$$

where $h(S_{IJ}) = S_{IJ}/(1 - e^{-S_{IJ}})$ is the ratio of the fixation probability of the $I \to J$ mutation to the fixation probability of a neutral mutation, with $h(S_{IJ}) < 1$, $= 1$, and $> 1$ for deleterious mutations (with $S_{IJ} < 0$), neutral mutations ($S_{IJ} = 0$), and advantageous mutations ($S_{IJ} > 0$), respectively.
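A small numerical sketch may help make the behavior of the relative fixation probability concrete. It assumes the standard form $h(S) = S/(1 - e^{-S})$, which equals 1 at $S = 0$ by continuity; the guards for very small and very negative $S$ are implementation choices of this sketch, not part of the model.

```python
import math

def h(S):
    """Relative fixation probability S / (1 - exp(-S)); h(0) = 1 by continuity."""
    if abs(S) < 1e-10:
        return 1.0 + S / 2.0          # first-order Taylor expansion near S = 0
    if S < -700.0:
        return 0.0                    # exp(-S) would overflow; h is effectively 0
    return S / (-math.expm1(-S))      # -expm1(-S) = 1 - exp(-S), accurate for small |S|

# Deleterious, neutral, and advantageous mutations (illustrative S values).
examples = {S: h(S) for S in (-2.0, 0.0, 2.0)}
```

Two identities follow from the formula and are useful as checks: $h(S) - h(-S) = S$ and $h(S)/h(-S) = e^{S}$. The second is what makes the resulting substitution process reversible.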

When the model is applied to sequence data from different species, we have in this study assumed that the effective population size $N$ and the selection coefficients are the same among lineages. Those assumptions can be relaxed at the expense of including more parameters (McVean and Vieira 2001; Nielsen et al. 2007).

Selection on the Protein
To describe selection on the protein, we multiply the substitution rate by $\omega$ if and only if the mutation is nonsynonymous (Goldman and Yang 1994; Yang and Nielsen 1998). Thus, $\omega$ is the nonsynonymous/synonymous substitution rate ratio. The use of one single $\omega$ to describe selection on the protein is very simplistic. However, previous models that incorporate amino acid chemical properties to specify codon substitution rates achieved only moderate (although statistically significant) improvements to the model's fit to data, and furthermore, such models produced estimates of mutation parameters rather similar to those from the simple model of one $\omega$ ratio (Goldman and Yang 1994; Yang et al. 1998). Here our focus is on the effect of selection on synonymous codon usage. We also implement the site models that assume variable $\omega$ ratios among codons in the gene (Nielsen and Yang 1998; Yang et al. 2000).

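To summarize, the substitution rate from codon $I$ to codon $J$ is specified as

$$q_{IJ} = \begin{cases} 0, & \text{if } I \text{ and } J \text{ differ at 2 or 3 codon positions}, \\ \mu_{i_k j_k}\, h(S_{IJ}), & \text{if } I \text{ and } J \text{ differ by a synonymous change at position } k, \\ \omega\, \mu_{i_k j_k}\, h(S_{IJ}), & \text{if } I \text{ and } J \text{ differ by a nonsynonymous change at position } k. \end{cases} \qquad (2)$$

The diagonals of the rate matrix $Q = \{q_{IJ}\}$ are determined by the requirement that each row in the matrix sums to zero. As only the difference $S_{IJ} = F_J - F_I$ enters the probability calculation under the model, we fix one of the 61 $F_I$'s to zero and estimate 60 free parameters for the universal genetic code. The model thus includes the following parameters in the substitution rate matrix $Q$: 8 parameters in the GTR mutation model (or 4 parameters in HKY: $\kappa$, $\pi^*_T$, $\pi^*_C$, and $\pi^*_A$), 60 scaled fitness parameters, and $\omega$. The sequence distance $t$ or branch lengths on the tree are additional parameters to be estimated from the data.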
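The construction of the rate matrix can be sketched as follows. This is a minimal illustration, not the CODEML implementation: it assumes the HKY mutation model, the relative fixation probability $h(S) = S/(1 - e^{-S})$, and the universal genetic code; the function name `build_q` and all parameter values are hypothetical.

```python
import math

NUCS = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
ALL_CODONS = [a + b + c for a in NUCS for b in NUCS for c in NUCS]
CODE = dict(zip(ALL_CODONS, AMINO))
SENSE = [c for c in ALL_CODONS if CODE[c] != "*"]  # 61 sense codons
TRANSITIONS = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}

def h(S):
    """Relative fixation probability, with h(0) = 1 (moderate |S| assumed)."""
    return 1.0 if abs(S) < 1e-10 else S / (-math.expm1(-S))

def build_q(kappa, pi_star, F, omega):
    """Return the 61 x 61 rate matrix as a list of rows, indexed like SENSE."""
    n = len(SENSE)
    Q = [[0.0] * n for _ in range(n)]
    for a, I in enumerate(SENSE):
        for b, J in enumerate(SENSE):
            diff = [k for k in range(3) if I[k] != J[k]]
            if len(diff) != 1:
                continue  # same codon, or 2-3 differences: rate stays 0
            i, j = I[diff[0]], J[diff[0]]
            mu = (kappa if (i, j) in TRANSITIONS else 1.0) * pi_star[j]
            rate = mu * h(F[J] - F[I])
            if CODE[I] != CODE[J]:
                rate *= omega  # nonsynonymous change
            Q[a][b] = rate
        Q[a][a] = -sum(Q[a])  # diagonal makes the row sum to zero
    return Q

# Illustrative parameter values only; F for TTT is fixed at 0 as the reference.
pi_star = {"T": 0.1, "C": 0.2, "A": 0.3, "G": 0.4}
F = {c: 0.0 for c in SENSE}
F["CTG"], F["CTA"] = 0.8, -0.5
Q = build_q(kappa=2.0, pi_star=pi_star, F=F, omega=0.1)
nonzero_offdiag = sum(1 for a in range(61) for b in range(61) if a != b and Q[a][b] > 0)
```

With positive mutation rates and $\omega > 0$, every pair of sense codons that differ at exactly one position has a nonzero rate, so the number of nonzero off-diagonal entries depends only on the genetic code.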

After the $Q$ matrix is constructed, the stationary distribution of the Markov chain, $\pi = \{\pi_1, \pi_2, \ldots, \pi_{61}\}$, is given by the system of linear equations $\pi Q = 0$, subject to the constraint that the $\pi_J$'s sum to one. This distribution can also be calculated directly (see eq. 4 below). The matrix is then multiplied by a constant so that the "average" rate is one: $-\sum_I \pi_I q_{II} = 1$. The transition probability matrix $P(t) = e^{Qt}$ is calculated following standard theory. (Note that we have used $\pi_J$, where the subscript $J$ is a codon, to indicate the equilibrium frequency of codon $J$, and $\pi^*_j$, where the subscript $j$ is a nucleotide, to represent the mutation-bias parameter in the HKY or GTR mutation models.)

The Markov model of codon substitution specified by equation (2) is time reversible. To show this, it is sufficient to write the rate matrix as a product of a symmetrical matrix and a diagonal matrix (e.g., Yang 2006, p. 33–34). The rate $q_{IJ}$ in equation (2) for a synonymous change can be rewritten as

$$q_{IJ} = \left[ \frac{a_{i_k j_k}}{\prod_{l \neq k} \pi^*_{j_l}} \cdot \frac{F_J - F_I}{e^{F_J} - e^{F_I}} \right] \times \left( \pi^*_{j_1} \pi^*_{j_2} \pi^*_{j_3}\, e^{F_J} \right). \qquad (3)$$

Here $\prod_{l \neq k} \pi^*_{j_l}$ is the product of the mutation-bias parameters for the 2 unchanged nucleotides (i.e., $\pi^*_T \pi^*_C$ if $I$ = TCA and $J$ = TCG). The quantity in the square brackets, denoted $A_{IJ}$, satisfies $A_{IJ} = A_{JI}$ for all $I \neq J$, whereas the quantity in the parentheses is a function of $J$ only. The rate $q_{IJ}$ when the $I \to J$ substitution is nonsynonymous can be written in this form as well. Thus, the rate matrix $Q = \{q_{IJ}\}$ can be written as a product of a symmetrical matrix $\{A_{IJ}\}$ and a diagonal matrix so that the Markov process is time reversible, with the stationary frequency for codon $J$ given as

$$\pi_J \propto \pi^*_{j_1} \pi^*_{j_2} \pi^*_{j_3}\, e^{F_J}, \qquad (4)$$

normalized so that the frequencies of the sense codons sum to one. For example, the equilibrium frequency of codon TCG is proportional to $\pi^*_T \pi^*_C \pi^*_G\, e^{F_{TCG}}$. This result makes it clear that the stationary codon frequencies are determined by both mutation bias (represented by $\pi^*_{j_1} \pi^*_{j_2} \pi^*_{j_3}$) and selection on codon usage (represented by $e^{F_J}$). The model is referred to below as the FMutSel model. It may also be noted that instead of the codon fitness parameters ($F_J$), one may use the codon frequencies ($\pi_J$) as parameters. The latter parametrization is convenient for an approximate implementation to be described below.
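The reversibility argument can be checked numerically. The sketch below assumes the HKY mutation model, $h(S) = S/(1 - e^{-S})$, and stationary frequencies proportional to the product of the three mutation-bias parameters times $e^{F_J}$; it draws arbitrary codon fitnesses and verifies detailed balance, $\pi_I q_{IJ} = \pi_J q_{JI}$, for every codon pair. The helper names and parameter values are hypothetical.

```python
import math
import random

NUCS = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in NUCS for b in NUCS for c in NUCS]
CODE = dict(zip(CODONS, AMINO))
SENSE = [c for c in CODONS if CODE[c] != "*"]
TRANSITIONS = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}

def h(S):
    return 1.0 if abs(S) < 1e-10 else S / (-math.expm1(-S))

def rate(I, J, kappa, pi_star, F, omega):
    """Off-diagonal substitution rate q_IJ for I != J."""
    diff = [k for k in range(3) if I[k] != J[k]]
    if len(diff) != 1:
        return 0.0
    i, j = I[diff[0]], J[diff[0]]
    a_ij = kappa if (i, j) in TRANSITIONS else 1.0
    w = 1.0 if CODE[I] == CODE[J] else omega
    return w * a_ij * pi_star[j] * h(F[J] - F[I])

def stationary(pi_star, F):
    """Codon frequencies proportional to pi*_j1 pi*_j2 pi*_j3 exp(F_J)."""
    raw = {J: pi_star[J[0]] * pi_star[J[1]] * pi_star[J[2]] * math.exp(F[J])
           for J in SENSE}
    total = sum(raw.values())
    return {J: v / total for J, v in raw.items()}

random.seed(1)  # arbitrary fitnesses, for illustration only
pi_star = {"T": 0.1, "C": 0.2, "A": 0.3, "G": 0.4}
F = {J: random.uniform(-2.0, 2.0) for J in SENSE}
pi = stationary(pi_star, F)
max_violation = max(
    abs(pi[I] * rate(I, J, 2.0, pi_star, F, 0.3)
        - pi[J] * rate(J, I, 2.0, pi_star, F, 0.3))
    for I in SENSE for J in SENSE if I != J
)
```

Detailed balance implies that `pi` is the stationary distribution, so it could be used in place of solving $\pi Q = 0$ numerically.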

An LRT of Selection on Codon Usage
We implement a special case of the mutation-selection model of codon substitution (eq. 2), in which all synonymous codons (codons that encode the same amino acid) have the same fitness. Thus, instead of 60 (=61 – 1) codon fitness parameters for the universal genetic code, only 19 (=20 – 1) amino acid fitness parameters are used. The model assumes that the amino acid frequencies are determined by the functional requirements of the protein, but there is no fitness difference among the synonymous codons. From the theory above (eq. 4), the relative frequencies of synonymous codons are determined solely by the mutational-bias parameters. This model is referred to as FMutSel0.

An LRT can be constructed by comparing model FMutSel0 against FMutSel. Twice the log-likelihood difference between the 2 models is compared with the $\chi^2$ distribution with degrees of freedom $60 - 19 = 41$ for the universal code (or 40 for the vertebrate mitochondrial code). This constitutes a test of the null hypothesis that codon usage is due to mutation bias alone and not to selection acting at silent sites.
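As a worked illustration of the test, the sketch below computes the $\chi^2$ tail probability with the standard library only, using the series expansion of the lower incomplete gamma function. The function names and the two log-likelihood values are hypothetical placeholders, not results from the paper.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df d.f.

    Uses the series gamma(a, z) = z^a e^{-z} * sum_n z^n / (a (a+1) ... (a+n)).
    Adequate for moderate x; far-tail values lose relative precision.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def lrt(lnl_null, lnl_alt, df=41):
    """Return (2 * delta_lnL, p-value) for FMutSel0 (null) vs. FMutSel."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods for one gene under the universal code.
stat, p_value = lrt(lnl_null=-1234.5, lnl_alt=-1200.0)
```

For the vertebrate mitochondrial code one would pass `df=40` instead of the default.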

Measurements of Selection on Codon Usage
As our model explicitly separates mutation bias from selection affecting codon usage, we devise a few measures of the strength of natural selection on codon usage. Imagine observing the Markov process of codon substitution at any site (any codon triplet) for an infinitely long time. In a proportion $\pi_I$ of the time, the wild-type codon at the site in the population is codon $I$. The mutation (from codon $I$) to codon $J$, which changes the nucleotide $i_k$ into $j_k$ at codon position $k$ and which has scaled selection coefficient $S_{IJ} = F_J - F_I$, occurs at the rate $\mu_{i_k j_k}$. Averaged over time, the proportion of the $I \to J$ mutation among all mutations is

$$m_{IJ} = \frac{\pi_I\, \mu_{i_k j_k}}{\sum_{I \neq J} \pi_I\, \mu_{i_k j_k}}, \qquad (5)$$

where the sum in the denominator is over all pairs of codons $I$ and $J$ with $I \neq J$.

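One may then calculate the proportion of advantageous mutations among all mutations as

$$P_+ = \sum_{I \neq J} m_{IJ}\, \mathbb{1}_{S_{IJ} > 0}, \qquad (6)$$

where the indicator function $\mathbb{1}_{S_{IJ} > 0} = 1$ if $S_{IJ} > 0$ and $= 0$ otherwise. Similarly, the proportion of deleterious mutations among all mutations is

$$P_- = \sum_{I \neq J} m_{IJ}\, \mathbb{1}_{S_{IJ} < 0}. \qquad (7)$$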

The strength of positive selection on an average advantageous mutation may be measured by

$$\bar{S}_+ = \sum_{I \neq J,\ S_{IJ} > 0} m^+_{IJ}\, S_{IJ}, \qquad (8)$$

where

$$m^+_{IJ} = \frac{m_{IJ}}{P_+} \qquad (9)$$

is the proportion of the $I \to J$ mutation among all advantageous mutations. Here $m^+_{IJ}$ is defined only if the $I \to J$ mutation is advantageous, with $S_{IJ} > 0$. Similarly, the strength of negative selection may be measured by the average $S_{IJ}$ among deleterious mutations with $S_{IJ} < 0$, denoted $\bar{S}_-$.

One may also calculate the proportion of advantageous mutations among all "substitutions," that is, among those mutations that have passed the filtering by natural selection. This can be calculated using equation (6), with the proportion $m_{IJ}$ calculated using equation (5) but with $\pi_I \mu_{i_k j_k}$ replaced by $\pi_I \mu_{i_k j_k} h(S_{IJ})$ or $\pi_I q_{IJ}$ (eq. 2). Because the substitution process is reversible, the flux of every advantageous $I \to J$ substitution is matched by the flux of the deleterious $J \to I$ substitution, so the proportion of advantageous mutations among substitutions is exactly $1/2$.
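The summary measures above can be sketched numerically as follows. The code assumes the HKY mutation model, $h(S) = S/(1 - e^{-S})$, and stationary codon frequencies proportional to the product of the mutation-bias parameters times $e^{F_J}$; the function name `selection_summary`, the dictionary keys, and the randomly drawn fitnesses are all hypothetical.

```python
import math
import random

NUCS = "TCAG"
STOPS = {"TAA", "TAG", "TGA"}  # universal code
SENSE = [a + b + c for a in NUCS for b in NUCS for c in NUCS
         if a + b + c not in STOPS]
TRANSITIONS = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}

def h(S):
    return 1.0 if abs(S) < 1e-10 else S / (-math.expm1(-S))

def selection_summary(kappa, pi_star, F, filtered=False):
    """Proportions and mean S of advantageous/deleterious single-step changes.

    filtered=False weights each change by pi_I * mu (all mutations);
    filtered=True multiplies by h(S), i.e. after selection on codon usage.
    """
    raw = {J: pi_star[J[0]] * pi_star[J[1]] * pi_star[J[2]] * math.exp(F[J])
           for J in SENSE}
    norm = sum(raw.values())
    adv_w = del_w = neu_w = adv_ws = del_ws = 0.0
    for I in SENSE:
        for J in SENSE:
            diff = [k for k in range(3) if I[k] != J[k]]
            if len(diff) != 1:
                continue
            i, j = I[diff[0]], J[diff[0]]
            S = F[J] - F[I]
            w = (raw[I] / norm) * (kappa if (i, j) in TRANSITIONS else 1.0) * pi_star[j]
            if filtered:
                w *= h(S)
            if S > 0:
                adv_w += w
                adv_ws += w * S
            elif S < 0:
                del_w += w
                del_ws += w * S
            else:
                neu_w += w
    total = adv_w + del_w + neu_w
    return {"P_plus": adv_w / total, "P_minus": del_w / total,
            "S_plus": adv_ws / adv_w, "S_minus": del_ws / del_w}

random.seed(7)  # arbitrary fitnesses, for illustration only
pi_star = {"T": 0.1, "C": 0.2, "A": 0.3, "G": 0.4}
F = {J: random.uniform(-1.0, 1.0) for J in SENSE}
among_mutations = selection_summary(2.0, pi_star, F)
among_substitutions = selection_summary(2.0, pi_star, F, filtered=True)
```

Among mutations, each advantageous change carries less weight than its deleterious reverse (by a factor $e^{-S}$), so the advantageous proportion is below one half whenever any fitnesses differ; after filtering by $h(S)$ the two weights are equal, which gives the value of one half.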

An Approximate Implementation
In the FMutSel and FMutSel0 models, the codon fitness and amino acid fitness parameters are estimated by numerical optimization under ML. We also implement approximate versions of these models by fixing the predicted codon or amino acid frequencies to the observed frequencies in the sequence data. These are referred to as "FMutSel-F" and "FMutSel0-F," respectively. This strategy reduces the number of parameters to be estimated by numerical iteration by 60 under FMutSel-F for the universal genetic code and by 19 under FMutSel0-F. Early models concerning codon usage, such as F1×4, F3×4, and Fcodon, were all implemented using the observed base or codon frequencies as parameter estimates (Yang 1997). For fair comparison, they are now also implemented using proper numerical optimization of the frequency parameters. Models implemented using the approximation are referred to using the suffix "-F" (e.g., F1×4-F).
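One way to read the FMutSel-F approximation is that, once the codon frequencies are fixed at their observed values, the scaled fitnesses follow from the mutation-bias parameters by inverting the stationary-frequency relation. The sketch below assumes that relation is $\pi_J \propto \pi^*_{j_1}\pi^*_{j_2}\pi^*_{j_3} e^{F_J}$, requires all observed frequencies to be positive, and picks an arbitrary reference codon; it is an illustration of the algebra, not the CODEML code.

```python
import math
import random

NUCS = "TCAG"
STOPS = {"TAA", "TAG", "TGA"}  # universal code
SENSE = [a + b + c for a in NUCS for b in NUCS for c in NUCS
         if a + b + c not in STOPS]

def fitness_from_frequencies(codon_freqs, pi_star, reference="TTT"):
    """Solve pi_J ∝ pi*_j1 pi*_j2 pi*_j3 exp(F_J) for F_J, with F_reference = 0.

    The choice of reference codon is an assumption of this sketch.
    """
    raw = {J: math.log(codon_freqs[J]) - sum(math.log(pi_star[n]) for n in J)
           for J in codon_freqs}
    return {J: raw[J] - raw[reference] for J in raw}

def predicted_frequencies(pi_star, F):
    """Forward map: stationary codon frequencies implied by pi_star and F."""
    raw = {J: pi_star[J[0]] * pi_star[J[1]] * pi_star[J[2]] * math.exp(F[J])
           for J in F}
    total = sum(raw.values())
    return {J: v / total for J, v in raw.items()}

# Made-up "observed" codon frequencies, for illustration only.
random.seed(3)
counts = {J: random.randint(1, 50) for J in SENSE}
n_codons = sum(counts.values())
observed = {J: counts[J] / n_codons for J in SENSE}
pi_star = {"T": 0.25, "C": 0.30, "A": 0.25, "G": 0.20}

F_hat = fitness_from_frequencies(observed, pi_star)
round_trip = predicted_frequencies(pi_star, F_hat)
max_error = max(abs(round_trip[J] - observed[J]) for J in SENSE)
```

In this reading, only the mutation parameters, $\omega$, and the branch lengths remain to be optimized numerically, which is where the saving of 60 parameters comes from.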


    Analysis of Real Data
 
We analyze 2 sets of data. The first consists of the mitochondrial genes of the human (GenBank accession number D38112) and the chimpanzee (D38113) of Horai et al. (1995). The 12 protein-coding genes on the same strand of the genome are concatenated into one "supergene," with 3,569 codons in the alignment. The data were analyzed previously by Hasegawa et al. (1998). We fit both the new models implemented in this paper and many old models implemented in the CODEML program (Yang 1997). Several distances between the 2 sequences are calculated under different models, and our objective in this analysis is to examine the impact of model assumptions concerning codon usage on distance estimation.

The second set of data consists of the 5,639 human–chimpanzee–macaque–mouse–rat quintet alignments of orthologous genes from the macaque genome-sequencing project (Rhesus Macaque Genome Sequencing and Analysis Consortium 2007). Codons that had alignment gaps in at least one species are removed. The data were analyzed as the primate pair of human and macaque genes, the rodent pair of mouse and rat genes, as well as the quintet including all 5 species. Our objectives in those analyses are to conduct the LRT of neutral evolution at silent sites and to estimate the coefficients of selection acting on codon usage.

Effects of the Model of Codon Usage on Distance Estimation
The log-likelihood values and estimates of sequence distances are shown in table 1 for the human and chimpanzee mitochondrial data set. The assumed mutation model is HKY, but different models are used concerning codon usage. The F1×4, F3×4, and Fcodon models specify the codon-substitution rate to be proportional to the frequency of the target codon, with the codon frequencies calculated using the 4 nucleotide frequencies (F1×4), the nucleotide frequencies at the 3 codon positions (F3×4), or with all codon frequencies treated as free parameters (Fcodon) (Yang 1997). The F1×4MG model was proposed by Muse and Gaut (1994) and assumes that the codon-substitution rate is proportional to the frequency of the target nucleotide. F3×4MG is an extension of F1×4MG and uses different base frequencies at the 3 codon positions. F1×4MG and F1×4 predict the same equilibrium codon frequencies, as do F3×4MG and F3×4.


Table 1 Estimates of Parameters between the Human and Chimpanzee Mitochondrial Genes under Different Models

 
The new FMutSel model has a much higher log-likelihood value than all the old models, indicating better fit to the data. Note that except for F1×4MG, which is equivalent to FMutSel with all codons having the same fitness, none of the other old models are nested within FMutSel and the $\chi^2$ distribution cannot be used to compare them. However, use of the Akaike information criterion (Akaike 1974) leads to clear preference of FMutSel over all old models (table 1). Besides the better fit, we emphasize the better explanatory power of the new model.

We are interested in whether model assumptions concerning codon usage affect estimation of the distances between 2 protein-coding genes. The familiar nonsynonymous and synonymous distances $d_N$ and $d_S$ are calculated according to Goldman and Yang (1994). Previous studies have found that those distances are sensitive to assumptions about codon usage (e.g., Yang and Nielsen 1998, 2000). Estimates of $d_N$ are very similar among models, but estimates of $d_S$ vary considerably. Estimates of the $\omega$ ratio differ by up to 2-fold among models. Nevertheless, the new FMutSel model produced estimates that are within the range of the old estimates. The estimates of $\omega$ under the commonly used F3×4 and Fcodon models are slightly smaller than that under FMutSel.

Distances $d_N^*$ and $d_S^*$ are the number of nonsynonymous substitutions per nonsynonymous site and the number of synonymous substitutions per synonymous site, respectively, based on the "physical site" definition of sites (Yang 2006: eq. 2.20). These distances are more stable across models, as noted previously. $d_{3B}$ is the number of nucleotide substitutions per site at the third codon position before selection on the protein, whereas $d_4$ is the number of nucleotide substitutions per 4-fold degenerate site, estimated from the codon model under ML (Yang 2006, p. 63–64). Distances $d_{3B}$ and $d_4$ are very similar to each other and their estimates are also similar among different models of codon usage (table 1). See Yang (2006) and Bierne and Eyre-Walker (2003) for a discussion of those distances in analysis of codon usage bias.

Overall, estimates of sequence distances and the $\omega$ ratio under the old models, especially models F3×4 and Fcodon, are similar to estimates under the new FMutSel model. We also note that FMutSel produced almost identical results to FMutSel-F, indicating that the approximation of fixing the equilibrium codon frequencies at their observed values worked well in this data set. FMutSel-F has a big computational advantage and may be useful in real data analysis.

Test of Selection on Synonymous Codon Usage
We applied the LRT of neutral evolution of codon usage to nuclear genes from the mammalian species. The FMutSel and FMutSel0 models are fitted to each of the 5,639 genes for the human–macaque pair, the mouse–rat pair, and the 5-species quintet. The histograms of the log-likelihood difference between the 2 models ($\Delta\ell$) are shown in figure 1. Table 3 lists the number and proportion of genes in which the LRT is significant. At the 5% level, the null hypothesis of neutral evolution is rejected in 87%, 90%, and 94% of genes for the primate pair, the rodent pair, and the quintet, respectively. The differences in the proportions appear to reflect the information content in the data sets rather than any real biological differences between primates and rodents. The mouse–rat pair is more divergent than the human–macaque pair so that the data are more informative and the test has higher power. Similarly, the quintet data are most informative so that the null hypothesis is rejected in the greatest number of genes. The analysis thus provides statistical evidence that synonymous codon usage in most genes is influenced by natural selection. Nevertheless, the LRT may be sensitive to the mutation model assumed in the FMutSel and FMutSel0 models, and we suggest that caution be exercised in interpreting those results (see Discussion).


 
FIG. 1. Histograms of the log-likelihood difference ($\Delta\ell$) for test of selection on codon usage for (a) the human–macaque genes, (b) the mouse–rat genes, and (c) the quintet of all 5 species. Values greater than 150 are grouped into the last bin. As $2\Delta\ell$ is asymptotically distributed as $\chi^2_{41}$ under the null model, the critical values for $\Delta\ell$ are 28.47 and 32.48 at the 5% and 1% levels, respectively.

 

Table 3 Number and Percentage (in Parentheses) of Mammalian Genes for Which the Null Model of Neutral Evolution at Silent Sites Is Rejected

 
We also conducted the LRT by comparing FMutSel0-F against FMutSel-F, using the approximation of fixing equilibrium codon frequencies at their observed values. This approximate test produced very similar results to those of figure 1. The test statistics ($\Delta\ell$) calculated using the 2 procedures are plotted against each other in figure 2 for the quintet data sets.


 
FIG. 2. The log-likelihood difference ($\Delta\ell$) for test of selection on codon usage when the codon frequency parameters are estimated by ML iteration (exact) or by fixing them at the observed values. The 5-species mammalian genes are analyzed.

 
The Distribution of Selection Coefficients
We used the FMutSel model to calculate the proportions of mutations with different selective coefficients ($S$), generating an estimate of the distribution of $S$ among new mutations. For this analysis, we use 4 large data sets: the concatenated mitochondrial genes from the human and chimpanzee and the concatenated nuclear genes for the human–macaque pair, the mouse–rat pair, and the quintet. We used both model M0 (1-ratio), which assumes the same $\omega$ ratio for all codons, and M3 (discrete), which assumes 2 site classes in proportions $p_0$ and $p_1$ with different $\omega$ ratios $\omega_0$ and $\omega_1$ (Yang et al. 2000). The results are shown in table 2. The log-likelihood values under models M0 (1-ratio) and M3 (discrete) are hugely different, indicating that the $\omega$ ratio is highly variable among codons. Nevertheless, estimates of the mutation bias parameters ($\pi^*_j$) and codon fitness parameters (not shown) are very similar between the 2 models in each of the 4 large data sets (table 2).


Table 2 Parameter Estimates under the Mutation-Selection (FMutSel) Model in 4 Concatenated Data Sets

 
We used parameter estimates obtained under model M0 (1-ratio) to calculate the scaled selective coefficients ($S$) for mutations that involve 2 codons differing at exactly one position and thus have nonzero rates. Those are the possible mutations allowed by the model, and their probabilities of occurrence are given by equation (5). There are 526 and 508 such mutations (codon pairs) for the universal and mitochondrial codes, respectively. The $S$ values for those mutations were binned into 21 bins to generate a histogram, with the mid value in each bin used as the representative for that bin and with the proportion for the bin calculated as the sum of proportions ($m_{IJ}$ in eq. 5) of all mutations falling into that bin. The results are shown in figure 3a. The proportion ($P_+$) of advantageous mutations among all mutations is shown in table 2, as well as the average selective coefficients of advantageous and deleterious mutations ($\bar{S}_+$ and $\bar{S}_-$). Because preferred codons with higher fitness are more common and most mutations lead to unpreferred codons with lower fitness, the distribution of $S$ among new mutations is skewed to the left, with the proportion $P_+ < 1/2$. The proportion of advantageous mutations among substitutions is higher than $P_+$ because an advantageous mutation has a higher fixation probability and makes a greater contribution to substitutions than does a deleterious mutation. Indeed, the proportion of advantageous mutations among substitutions is $1/2$, due to the reversibility of the substitution model.
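The counts of 526 and 508 can be reproduced by enumerating ordered pairs of sense codons that differ at exactly one position, and the binning step can be sketched alongside. The helper names are hypothetical, and the bin range passed to `weighted_histogram` is an assumption of this sketch, since the range used for figure 3 is not stated in the text.

```python
NUCS = "TCAG"
CODONS = [a + b + c for a in NUCS for b in NUCS for c in NUCS]
# Amino acid strings in TTT, TTC, TTA, TTG, TCT, ... order; '*' marks stop codons.
GENETIC_CODES = {
    "universal": "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    "vertebrate_mito": "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",
}

def single_step_pairs(amino):
    """Ordered pairs (I, J) of sense codons differing at exactly one position."""
    sense = [c for c, aa in zip(CODONS, amino) if aa != "*"]
    return [(I, J) for I in sense for J in sense
            if sum(x != y for x, y in zip(I, J)) == 1]

def weighted_histogram(values, weights, lo, hi, nbins=21):
    """Sum the weights of values falling into nbins equal-width bins on [lo, hi].

    Values outside the range are placed in the first or last bin.
    """
    width = (hi - lo) / nbins
    hist = [0.0] * nbins
    for v, w in zip(values, weights):
        idx = min(nbins - 1, max(0, int((v - lo) / width)))
        hist[idx] += w
    return hist

n_universal = len(single_step_pairs(GENETIC_CODES["universal"]))
n_mito = len(single_step_pairs(GENETIC_CODES["vertebrate_mito"]))

# Toy S values and proportions, only to show the binning (not real estimates).
toy_hist = weighted_histogram([-1.0, 0.0, 1.0], [0.2, 0.3, 0.5], lo=-1.05, hi=1.05)
```

In a real analysis the values would be the $S_{IJ}$ for each pair and the weights the corresponding proportions $m_{IJ}$.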


FIG. 3.— Estimated distributions of selection coefficient S = 2Ns from 4 data sets: concatenated human–chimpanzee mitochondrial genes, concatenated human–macaque nuclear genes, concatenated mouse–rat nuclear genes, and concatenated data for all 5 mammalian species. The histograms show the proportion of mutations with scaled selection coefficient S (a) among all mutations, (b) after filtering by natural selection on codon usage, and (c) after filtering by selection on both codon usage and on amino acid replacements. Model M0 (1-ratio) is used, with the same ω ratio for all nonsynonymous changes. Parameter estimates are shown in table 2.

 
The estimates of Formula and Formula are greater and thus selection on silent sites is stronger in the mitochondrial genes than in the nuclear genes (table 2). In the former, ~31% of new mutations are advantageous, whereas in the latter, the proportion is 37–40%. The much lower ω ratios in the mitochondrial genes than in the nuclear genes indicate that the mitochondrial proteins are under much stronger selective constraint than the nuclear proteins. The difference is more striking when one considers the fact that the effective population size for mitochondrial genes is ~1/4 that of the nuclear genes and that selection is less efficient in smaller populations. The higher efficiency of selection in mtDNA, with respect to both codon usage and protein evolution, may be due to the fact that the haploid mitochondrial genome makes it easy to remove recessive mutations, whereas they may remain hidden in the heterozygous state in nuclear genes. Another possible explanation is the hypothesis of selection for translational accuracy, which predicts stronger selection on codon usage on highly conserved proteins or on highly conserved amino acids in a protein because the fitness cost of translational misincorporation should depend on how the amino acid change affects protein function (Akashi 1994). If mitochondrial genes perform crucial biological functions and are more highly expressed than nuclear genes, this hypothesis may explain both the stronger selection on protein evolution and the stronger selection on codon usage.

It should be noted that in our model, all S values are nonzero, and P+ in table 2 includes mutations with S only very slightly positive, the evolutionary dynamics of which may be indistinguishable from that of neutral mutations. For example, mutations with |S| > 2 are rare in all data sets. The estimated proportions of mutations with S > 2 and S < –2 are 0.2% and 1.7%, respectively, for the mitochondrial genes, 0.2% and 1.6% for the human–macaque pair, 0.1% and 1.0% for the mouse–rat pair, and 0.1% and 0.9% for the quintet. Thus, although the LRT rejects the null model of neutral evolution of silent sites, selection on codon usage is mostly weak, and most mutations appear to be nearly neutral with respect to selection on codon usage.
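To give a sense of scale for |S| = 2, the sketch below evaluates the fixation probability of a mutation relative to that of a neutral mutation. It assumes the usual form h(S) = S/(1 – e^(–S)) with S = 2Ns, which is consistent with the statement in the Discussion that h(S) = 1 for S = 0; the function name is hypothetical.

```python
import math


def relative_fixation_rate(S):
    """Fixation probability of a mutation with scaled coefficient S = 2Ns,
    relative to a neutral mutation: h(S) = S / (1 - exp(-S)), with h(0) = 1.

    This functional form is an assumption of the sketch.
    """
    if abs(S) < 1e-8:
        return 1.0 + S / 2.0  # first-order Taylor expansion near S = 0
    return S / -math.expm1(-S)  # -expm1(-S) equals 1 - exp(-S)


for S in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"S = {S:+.1f}  h(S) = {relative_fixation_rate(S):.3f}")
```

Under this assumption, a mutation with S = 2 fixes at roughly 2.3 times the neutral rate and one with S = –2 at roughly 0.3 times the neutral rate. The great majority of mutations in all 4 data sets have |S| below this threshold.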

We are also interested in how natural selection on codon usage changes the fitness distribution of mutations, that is, how mutations of different fitness contribute to substitutions. A histogram of S after filtering by natural selection on codon usage can be generated using the same procedure as described above, except that the proportion m_IJ is calculated using equation (5), with Formula replaced by Formula. The resulting histograms (fig. 3b) show the proportion of mutations with scaled fitness S that has survived natural selection on codon usage. Similarly, if we replace Formula by π_I q_IJ in equation (5), the resulting histograms (fig. 3c) represent the proportion of mutations with fitness S among observed substitutions, that is, among mutations that have passed the filtering by selection both on codon usage bias and on amino acid replacements. Because of the detailed balance condition of the reversible Markov model of substitution, the distributions in figure 3b and c are all symmetrical. Note that here the distinction between selection on codon usage and selection on amino acid replacements is more conceptual than temporal, with no implication that one necessarily occurs before the other.
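The symmetry that follows from detailed balance can be illustrated on a toy model. The sketch below is not the 61-codon model of the paper: it uses 4 hypothetical states with made-up mutation-bias and fitness values, an HKY-style mutation rate, the assumed fixation factor h(S) = S/(1 – e^(–S)), and equilibrium frequencies assumed proportional to the mutation-bias parameter times e^F.

```python
import math
from itertools import permutations


def h(S):
    """Relative fixation rate S / (1 - exp(-S)), with h(0) = 1 (assumed form)."""
    if abs(S) < 1e-8:
        return 1.0 + S / 2.0
    return S / -math.expm1(-S)


# Hypothetical 4-state toy model; all numbers are made up for illustration.
STATES = "TCAG"
PI_STAR = {"T": 0.20, "C": 0.30, "A": 0.35, "G": 0.15}  # mutation-bias parameters
F = {"T": 0.0, "C": 0.8, "A": -0.4, "G": 0.3}  # scaled fitnesses
KAPPA = 2.0  # transition/transversion rate ratio
TRANSITIONS = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}


def mu(i, j):
    """HKY-style mutation rate from state i to state j."""
    return (KAPPA if (i, j) in TRANSITIONS else 1.0) * PI_STAR[j]


# Equilibrium frequencies, assumed proportional to pi*_i * exp(F_i).
weights = {i: PI_STAR[i] * math.exp(F[i]) for i in STATES}
total = sum(weights.values())
pi = {i: w / total for i, w in weights.items()}

pairs = list(permutations(STATES, 2))
S = {(i, j): F[j] - F[i] for i, j in pairs}
mutation_flux = {(i, j): pi[i] * mu(i, j) for i, j in pairs}
substitution_flux = {(i, j): mutation_flux[i, j] * h(S[i, j]) for i, j in pairs}


def proportion_advantageous(flux):
    """Share of the total flux carried by changes with S > 0."""
    return sum(v for p, v in flux.items() if S[p] > 0) / sum(flux.values())


p_plus_mutations = proportion_advantageous(mutation_flux)
p_plus_substitutions = proportion_advantageous(substitution_flux)
print(p_plus_mutations, p_plus_substitutions)
```

Because the substitution flux from I to J equals that from J to I while S changes sign, every advantageous change is matched by a deleterious change of equal weight, so the proportion among substitutions is 1/2 and the histogram is symmetrical. Among mutations the proportion is below 1/2 in this toy model.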


    Discussion
Mechanistic Models of Codon Usage and Protein Evolution
A number of authors have studied the frequencies of synonymous codons for 2-fold degenerate amino acids as the result of interactions between mutation, genetic drift, and natural selection (Kimura 1983; Li 1987; Bulmer 1991; McVean and Charlesworth 1999). Let the 2 alleles be 1 (preferred codon) and 0 (unpreferred codon), with the mutation rate from 0 to 1 being µ1 and that in the reverse direction being µ0. Suppose that the 2 alleles have fitness f0 and f1 so that the selection coefficient of the 0 → 1 mutation in the allele-0 population is s = f1 – f0 and that of the 1 → 0 mutation in the allele-1 population is –s. At mutation-selection-drift equilibrium, the probability density of the frequency p of allele 1 is given as

Formula (10)
(Wright 1931). This theory can be used to analyze codon usage in a single species, under the assumption that one of the alleles is fixed. The probability that the population is fixed at the preferred codon can be obtained by integrating the density f(p) from 1 – 1/N to 1 (e.g., Li 1987), as

Formula (11)
where the proportionality constant is determined to ensure that π0 + π1 = 1. If we assume that the same selective pressure applies to synonymous codons for all 2-fold degenerate amino acids in a gene, π1 will be the proportion of preferred codons in the gene. The contributions of mutation and selection to the equilibrium frequencies of synonymous codons are apparent from equation (11). This may also be considered a special case of equation (4), which gives the equilibrium distribution of the codon-substitution process.
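As a rough numerical illustration of how mutation and selection jointly determine the proportion of preferred codons, the sketch below assumes the commonly cited two-allele result in which the ratio of the 2 frequencies equals (µ1/µ0)e^S, with S the scaled selection coefficient. The exact scaling of S in terms of N and s depends on ploidy conventions and is not taken from the source; the function name is hypothetical.

```python
import math


def preferred_codon_frequency(mu0, mu1, S):
    """Equilibrium proportion of the preferred codon (allele 1).

    Assumes pi1 / pi0 = (mu1 / mu0) * exp(S), where mu1 is the 0 -> 1 mutation
    rate, mu0 the 1 -> 0 rate, and S the scaled selection coefficient in favour
    of allele 1. This form is an assumption of the sketch.
    """
    w1 = mu1 * math.exp(S)
    return w1 / (mu0 + w1)


# Mutation biased 2:1 against the preferred codon, increasing selection.
for S in (0.0, 0.5, 1.0, 2.0):
    print(f"S = {S:.1f}  pi1 = {preferred_codon_frequency(2.0, 1.0, S):.3f}")
```

With S = 0 the frequency reduces to µ1/(µ0 + µ1), the value expected under mutation bias alone, and it increases monotonically with S.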

McVean and Vieira (1999) used equation (11) to analyze observed counts of preferred codons for 2-fold degenerate amino acids in several Drosophila species, fitting binomial models by ML. The analysis used information on codon usage but ignored differences between species. McVean and Vieira (2001) implemented a population genetics model that is very similar to equation (2) to describe substitutions between synonymous codons between species. The authors analyzed between-species synonymous differences to estimate the strength of natural selection on synonymous codon usage, with nonsynonymous differences ignored. The FMutSel models extend the work of McVean and Vieira to a full codon substitution model, which is suitable for comparative analysis of protein-coding genes from multiple species.

Previous models of codon substitution (Goldman and Yang 1994; Muse and Gaut 1994) aim to describe nucleotide substitutions and do not explicitly accommodate mutation bias and natural selection acting on the DNA level. The models may thus be ill suited for studying the forces and mechanisms of the evolutionary process at silent sites. The mutation-selection models implemented in this paper address this drawback, by introducing parameters that explicitly describe mutation bias and natural selection acting on codon usage. We suggest that such models, with the easy interpretation of the model parameters, may be very useful for studying the process of molecular sequence evolution.

There has been considerable interest in incorporating fitness effects of new mutations in constructing substitution models for phylogenetic analysis. Halpern and Bruno (1998) considered a codon-substitution model in which at every amino acid site in the protein, different amino acids have different fitness and thus different equilibrium frequencies. The model was developed for distance calculation but is not practical for real data analysis due to its use of too many parameters. Moses et al. (2003) adapted the theory to describe nucleotide substitutions and to estimate site-specific substitution rates in noncoding regulatory elements such as transcription factor–binding sites. Note that from equation (4), we have

Formula (12)
from which equation (9) of Halpern and Bruno (1998) can be seen to equal h(S_IJ) in equation (1), with h(S_IJ) = 1 for S_IJ = 0. Thus, the underlying population genetics theory is the same although the applications are very different. Note that given a reversible mutation model such as HKY or GTR, reversibility of codon substitution is a natural property of the model and not an additional assumption, as made by Halpern and Bruno (1998) and Moses et al. (2003).

The FMutSel model also has similarities to the site-class models of amino acid replacement implemented by Koshi et al. (1999), which assume that different site classes have different amino acid frequencies and different substitution patterns and that in each site class, every amino acid J has its own "propensity" F_J. Koshi et al. (1999, eq. 4) applied a truncation on the substitution rate, equivalent to fixing h(S_IJ) = 1 whenever the difference in propensity S_IJ = F_J – F_I > 0. Like FMutSel, this model is also time reversible, with the same equilibrium distribution, where the frequency of amino acid J is proportional to Formula. Except for the truncation mentioned above, the model of Koshi et al. (1999) can be given a population genetics interpretation, with the propensity interpreted as the scaled fitness F_J. However, the truncation of rates means that the model assumes that an advantageous mutation is fixed at the same rate as a neutral mutation, which is unrealistic biologically. A similar criticism was made by Thorne et al. (2007).

More recent work by Yu and Thorne (2006), Thorne et al. (2007), and Choi et al. (2007) assigned a fitness to the sequence when they developed mutation-selection models to describe the evolution of RNA or protein sequences. An advantage of those models is that they allow dependence among sites due to RNA or protein structural constraints.

We note that there has been some debate in the literature concerning whether use of the ω ratio to detect natural selection acting on the protein (for reviews, see Yang and Bielawski 2000; Yang 2002) requires the assumption of neutral evolution at silent sites. Many authors take it for granted that this assumption is needed. A concern is that if selection acts on codon usage, codon models may be misled to produce an ω ratio greater than one because selection on silent sites has reduced dS and not because positive selection has elevated dN. From the mutation-selection models implemented in this paper, it is clear that the assumption is not necessary and it is possible to use the ω ratio to detect positive selection acting on the protein even if silent sites are under natural selection, as assumed in FMutSel. Comparison between dS and dN is a contrast between the rates before and after the action of selection on the protein (Yang 2006, eq. 2.19) so that the comparison is valid whether evolution at silent sites is driven by mutation or selection. In this regard, selection on silent sites may be more accurately described as selection on the DNA level as it affects both silent and replacement sites.

Sensitivity of the LRT to the Mutation Model
The mutation-selection model of codon substitution makes many simplistic assumptions about the evolutionary process. For our purpose of testing for selection acting on silent sites, the most worrying assumptions appear to be those concerning the mutation process as the mutation-bias and codon-fitness parameters are expected to be highly correlated in such an analysis. Indeed, the effects of the 2 would be virtually impossible to separate if we had used only information on codon frequencies (see eqs. 4 and 11).
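The confounding of the 2 sets of parameters in codon frequencies alone can be made concrete with a small sketch. It assumes that the equilibrium frequency of a codon is proportional to the product of the mutation-bias parameters of its 3 nucleotides times e^F, which is our reading of equation (4); the parameter values and function name are hypothetical.

```python
import math
from itertools import product

NUCS = "TCAG"
STOPS = {"TAA", "TAG", "TGA"}  # universal code
SENSE = [c for c in ("".join(t) for t in product(NUCS, repeat=3)) if c not in STOPS]


def codon_frequencies(pi_star, F):
    """Equilibrium codon frequencies, assumed proportional to
    pi*_{n1} * pi*_{n2} * pi*_{n3} * exp(F_codon)."""
    w = {
        c: pi_star[c[0]] * pi_star[c[1]] * pi_star[c[2]] * math.exp(F[c])
        for c in SENSE
    }
    total = sum(w.values())
    return {c: v / total for c, v in w.items()}


# Parameter set A: mutation bias only, no selection (all fitnesses equal).
pi_star_a = {"T": 0.1, "C": 0.4, "A": 0.2, "G": 0.3}
F_a = {c: 0.0 for c in SENSE}

# Parameter set B: no mutation bias; codon fitnesses chosen to mimic set A.
pi_star_b = {n: 0.25 for n in NUCS}
F_b = {c: sum(math.log(pi_star_a[n] / 0.25) for n in c) for c in SENSE}

freq_a = codon_frequencies(pi_star_a, F_a)
freq_b = codon_frequencies(pi_star_b, F_b)
max_diff = max(abs(freq_a[c] - freq_b[c]) for c in SENSE)
print(max_diff)
```

The 2 parameter sets give the same codon frequencies up to rounding error, so frequencies by themselves cannot tell mutation bias from selection. Information to separate them has to come from the substitution process, where the 2 sets of parameters enter the rates differently.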

To examine the impact of the assumed mutation model on the LRT of selection on codon usage, we implemented the GTR mutation model (e.g., Yang 1994). The codon frequency parameters are estimated using the observed frequencies rather than by ML iteration. Application of the LRT under the GTR model to the mammalian data produced results very similar to those obtained under HKY. The proportions of genes for which the LRT is significant under GTR (table 3) are slightly lower (by 1–2%) than under HKY. Figure 4 plots the test statistic (Δℓ) for the 2 mutation models for the quintet data sets. The results suggest that the LRT may not be very sensitive to the assumed mutation model.


FIG. 4.— The log-likelihood difference (Δℓ) for test of selection on codon usage when the assumed mutation model is HKY or GTR. The 5-species mammalian genes are analyzed.

 
However, the estimates of codon-fitness parameters for the concatenated data under the 2 mutation models are very different (results not shown). This is the case even though both mutation models predicted very similar codon frequency parameters, which closely match the observed frequencies. Our estimates of the selection coefficients are therefore affected by the mutation model. In summary, the LRT is fairly insensitive to the assumed mutation model, but the estimates of the codon-fitness parameters are sensitive to it.

Both HKY and GTR assume independent mutations at nucleotide sites. There is considerable evidence suggesting that the mutation rate of a nucleotide may depend on neighboring nucleotides (e.g., Bulmer 1986; Hwang and Green 2004; Siepel and Haussler 2004). One well-known example of such context effects is the high mutation rate of CpG dinucleotides in mammalian genomes. As the cytosine in CpG is prone to methylation and deamination, CpG dinucleotides have a very high rate of mutating into TpG (Scarano et al. 1967). With such mutational context effects, both the null and alternative hypotheses (FMutSel0 and FMutSel) in the LRT are violated, but the 2 models may not be affected to the same extent, in which case the violation of assumptions may cause the test to generate excessive false positives. For example, FMutSel0 predicts that the relative frequencies of 4-fold degenerate codons encoding the same amino acid are given by the mutation-bias parameters (Formula), independent of the encoded amino acid. If the mutation rate and pattern at the third codon position depend on the nucleotides at the first and second positions, FMutSel0 may fit the data poorly, but FMutSel may still achieve a reasonable fit because of its use of a separate codon fitness parameter F_J for each target codon J. Although both FMutSel0 and FMutSel make use of information from nonsynonymous differences as well as synonymous differences, the test may nevertheless be sensitive to such mutational context effects. It has also been suggested that one mutation event may affect multiple nucleotides and the assumption of independent mutations may be unrealistic (e.g., Yang et al. 1998; Whelan and Goldman 2004). However, those studies typically analyze substitutions instead of mutations, and the apparent double or triple substitutions may reflect artifacts of the inadequate substitution model rather than true double or triple mutations. The models developed here concern the mutation process, and it would appear that double or triple mutations, if not rare, should affect the 2 models in similar ways. At any rate, the sensitivity of the LRT to violations of the assumed mutation model is not well understood and merits further research.
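The FMutSel0 prediction for 4-fold degenerate codons can be sketched as follows, assuming that codon frequencies are proportional to the product of the 3 mutation-bias parameters times e^F and that under FMutSel0 synonymous codons share one fitness value. The parameter values and function name are hypothetical; alanine (GCN) and proline (CCN) are used as example families.

```python
import math

NUCS = "TCAG"


def family_frequencies(prefix, pi_star, fitness):
    """Relative frequencies of the codons prefix+N within one 4-fold family,
    assuming codon frequency proportional to pi*_{n1} pi*_{n2} pi*_{n3} exp(F)."""
    weights = {
        n: pi_star[prefix[0]] * pi_star[prefix[1]] * pi_star[n]
        * math.exp(fitness[prefix + n])
        for n in NUCS
    }
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


# Hypothetical parameter values, chosen only for illustration.
pi_star = {"T": 0.1, "C": 0.4, "A": 0.2, "G": 0.3}

# FMutSel0-style fitness: one value per amino acid (Ala = GCN, Pro = CCN).
f0 = {"GC" + n: 0.5 for n in NUCS}
f0.update({"CC" + n: -0.2 for n in NUCS})

# FMutSel-style fitness: a separate value for every codon.
f1 = dict(f0)
f1["GCC"] += 0.6

ala_null = family_frequencies("GC", pi_star, f0)
pro_null = family_frequencies("CC", pi_star, f0)
ala_alt = family_frequencies("GC", pi_star, f1)
print(ala_null, pro_null, ala_alt)
```

Under the shared-fitness assumption, the third-position distribution is the same for both amino acids and equals the mutation-bias parameters, whereas a codon-specific fitness shifts it for one family only. A context-dependent mutation process could produce the second pattern without any selection, which is why the test may be sensitive to such effects.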

We consider several strategies that may alleviate the confounding effect of mutation and selection. The first is to make certain assumptions concerning either the mutation or the selection process. For example, the method of Nielsen et al. (2007) required prior knowledge of preferred and unpreferred codons and also assumed the same selective strength acting on all codons. The latter assumption may be unrealistic in some data sets. A second strategy is to analyze pseudogenes or noncoding DNA to estimate mutation parameters and then use them in the mutation-selection model of codon substitution to analyze coding genes. Similarly, one may analyze coding and neighboring noncoding regions jointly, with the same mutation-bias parameters applied to both regions and the selection parameters applied to the coding regions only. This requires that the same mutation process operates in both coding and noncoding regions, an assumption that may be violated due to translation-coupled repair (Duret 2002). A third strategy, suitable for joint analysis of many genes from the same set of species, is to assume that the mutation parameters are shared among genes or at least among genes with similar codon usage bias or GC content at the third codon positions, whereas the strengths of selection on codon usage may differ among genes. In this paper, we analyzed the 5,639 mammalian genes separately, fitting 66 or more parameters to each gene, so that the model is rather parameter-rich. Finally, developing models that explicitly accommodate mutational context effects may also be very useful in improving the realism of the models implemented here. In this regard, our likelihood model provides a natural framework for such extensions.


    Program Availability
The new FMutSel and FMutSel0 models developed in this paper were implemented independently by the 2 authors for error checking. All models described in this paper are implemented in the CODEML program in PAML 4 (Yang 2007).


    Acknowledgements
We thank 3 referees for many useful comments. This study is supported by a grant from the Biotechnological and Biological Sciences Research Council to Z.Y. and grants from FNU (Danish Natural Science Research Council) and Danmarks Grundforskningsfond to R.N.


    Footnotes
 
Jeffrey Thorne, Associate Editor


    References

    Akaike H. A new look at the statistical model identification. IEEE Trans Automat Contr (1974) 19:716–723.

    Akashi H. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics (1994) 136:927–935.

    Akashi H. Inferring weak selection from patterns of polymorphism and divergence at "silent" sites in Drosophila DNA. Genetics (1995) 139:1067–1076.

    Bennetzen JL, Hall BD. Codon selection in yeast. J Biol Chem (1982) 257:3026–3031.

    Bierne N, Eyre-Walker A. The problem of counting sites in the estimation of the synonymous and nonsynonymous substitution rates: implications for the correlation between the synonymous substitution rate and codon usage bias. Genetics (2003) 165:1587–1597.

    Bulmer M. Neighboring base effects on substitution rates in pseudogenes. Mol Biol Evol (1986) 3:322–329.

    Bulmer M. Coevolution of codon usage and transfer RNA abundance. Nature (1987) 325:728–730.

    Bulmer MG. The selection-mutation-drift theory of synonymous codon usage. Genetics (1991) 129:897–907.

    Castillo-Davis CI, Hartl DL. Genome evolution and developmental constraint in Caenorhabditis elegans. Mol Biol Evol (2002) 19:728–735.

    Choi SC, Hobolth A, Robinson DM, Kishino H, Thorne JL. Quantifying the impact of protein tertiary structure on molecular evolution. Mol Biol Evol (2007) 24:1769–1782.

    Dunn KA, Bielawski JP, Yang Z. Substitution rates in Drosophila nuclear genes: implications for translational selection. Genetics (2001) 157:295–305.

    Duret L. Evolution of synonymous codon usage in metazoans. Curr Opin Genet Dev (2002) 12:640–649.

    Duret L, Mouchiroud D. Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, Arabidopsis. Proc Natl Acad Sci USA (1999) 96:4482–4487.

    Fisher R. The distribution of gene ratios for rare mutations. Proc R Soc Edinb (1930) 50:205–220.

    Frydman J. Folding of newly translated proteins in vivo: the role of molecular chaperones. Annu Rev Biochem (2001) 70:603–647.

    Goldman N, Yang Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol (1994) 11:725–736.

    Halpern AL, Bruno WJ. Evolutionary distances for protein-coding sequences: modeling site-specific residue frequencies. Mol Biol Evol (1998) 15:910–917.

    Hasegawa M, Cao Y, Yang Z. Preponderance of slightly deleterious polymorphism in mitochondrial DNA: replacement/synonymous rate ratio is much higher within species than between species. Mol Biol Evol (1998) 15:1499–1505.

    Hasegawa M, Kishino H, Yano T. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol (1985) 22:160–174.

    Horai S, Hayasaka K, Kondo R, Tsugane K, Takahata N. Recent African origin of modern humans revealed by complete sequences of hominoid mitochondrial DNAs. Proc Natl Acad Sci USA (1995) 92:532–536.

    Hwang DG, Green P. Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution. Proc Natl Acad Sci USA (2004) 101:13994–14001.

    Ikemura T. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes. J Mol Biol (1981) 146:1–21.

    Ikemura T. Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol (1985) 2:13–34.

    Kanaya S, Yamada Y, Kinouchi M, Kudo Y, Ikemura T. Codon usage and tRNA genes in eukaryotes: correlation of codon usage diversity with translation efficiency and with CG-dinucleotide usage as assessed by multivariate analysis. J Mol Evol (2001) 53:290–298.

    Kimchi-Sarfaty C, Oh JM, Kim IW, Sauna ZE, Calcagno AM, Ambudkar SV, Gottesman MM. A "silent" polymorphism in the MDR1 gene changes substrate specificity. Science (2007) 315:525–528.

    Kimura M. Some problems of stochastic processes in genetics. Ann Math Stat (1957) 28:882–901.

    Kimura M. The neutral theory of molecular evolution (1983) Cambridge: Cambridge University Press.

    Koshi JM, Mindell DP, Goldstein RA. Using physical-chemistry-based substitution models in phylogenetic analyses of HIV-1 subtypes. Mol Biol Evol (1999) 16:173–179.

    Kreitman M, Akashi H. Molecular evidence for natural selection. Annu Rev Ecol Syst (1995) 26:403–422.

    Kurland CG. Translational accuracy and the fitness of bacteria. Annu Rev Genet (1992) 26:29–50.

    Li W-H. Models of nearly neutral mutations with particular implications for nonrandom usage of synonymous codons. J Mol Evol (1987) 24:337–345.

    McVean GA, Charlesworth B. A population genetics model for the evolution of synonymous codon usage: patterns and predictions. Genet Res (1999) 74:145–158.

    McVean GA, Vieira J. The evolution of codon preferences in Drosophila: a maximum-likelihood approach to parameter estimation and hypothesis testing. J Mol Evol (1999) 49:63–75.

    McVean GA, Vieira J. Inferring parameters of mutation, selection and demography from patterns of synonymous site evolution in Drosophila. Genetics (2001) 157:245–257.

    Moriyama EN, Powell JR. Codon usage bias and tRNA abundance in Drosophila. J Mol Evol (1997) 45:514–523.

    Moses AM, Chiang DY, Kellis M, Lander ES, Eisen MB. Position specific variation in the rate of evolution in transcription factor binding sites. BMC Evol Biol (2003) 3:19.

    Muse SV, Gaut BS. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol Biol Evol (1994) 11:715–724.

    Musto H, Cruveiller S, D'Onofrio G, Romero H, Bernardi G. Translational selection on codon usage in Xenopus laevis. Mol Biol Evol (2001) 18:1703–1707.

    Nielsen R, Bauer DuMont VL, Hubisz MJ, Aquadro CF. Maximum likelihood estimation of ancestral codon usage bias parameters in Drosophila. Mol Biol Evol (2007) 24:228–235.

    Nielsen R, Yang Z. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics (1998) 148:929–936.

    Rhesus Macaque Genome Sequencing and Analysis Consortium. Evolutionary and biomedical insights from the Rhesus macaque genome. Science (2007) 316:222–234.

    Scarano E, Iaccarino M, Grippo P, Parisi E. The heterogeneity of thymine methyl group origin in DNA pyrimidine isostichs of developing sea urchin embryos. Proc Natl Acad Sci USA (1967) 57:1394–1400.

    Sharp PM, Averof M, Lloyd AT, Matassi G, Peden JF. DNA sequence evolution: the sounds of silence. Philos Trans R Soc Lond B Biol Sci (1995) 349:241–247.

    Sharp PM, Li WH. The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias. Mol Biol Evol (1987) 4:222–230.

    Siepel A, Haussler D. Phylogenetic estimation of context-dependent substitution rates by maximum likelihood. Mol Biol Evol (2004) 21:468–488.

    Thorne JL, Choi SC, Yu J, Higgs PG, Kishino H. Population genetics without intraspecific data. Mol Biol Evol (2007) 24:1667–1677.

    Whelan S, Goldman N. Estimating the frequency of events that cause multiple-nucleotide changes. Genetics (2004) 167:2027–2043.

    Wright S. Evolution in Mendelian populations. Genetics (1931) 16:97–159.

    Yang Z. Estimating the pattern of nucleotide substitution. J Mol Evol (1994) 39:105–111.

    Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci (1997) 13:555–556.

    Yang Z. Inference of selection from multiple species alignments. Curr Opin Genet Dev (2002) 12:688–694.

    Yang Z. Computational molecular evolution (2006) Oxford (UK): Oxford University Press.

    Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol (2007) 24:1586–1591.

    Yang Z, Bielawski JP. Statistical methods for detecting molecular adaptation. Trends Ecol Evol (2000) 15:496–503.

    Yang Z, Nielsen R. Synonymous and nonsynonymous rate variation in nuclear genes of mammals. J Mol Evol (1998) 46:409–418.

    Yang Z, Nielsen R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol (2000) 17:32–43.

    Yang Z, Nielsen R, Goldman N, Pedersen A-MK. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics (2000) 155:431–449.

    Yang Z, Nielsen R, Hasegawa M. Models of amino acid substitution and applications to mitochondrial protein evolution. Mol Biol Evol (1998) 15:1600–1611.

    Yu J, Thorne JL. Dependence among sites in RNA evolution. Mol Biol Evol (2006) 23:1525–1537.

Accepted for publication December 19, 2007.

