Molecular Biology and Evolution 17:32-43 (2000)
© 2000 Society for Molecular Biology and Evolution
Estimating Synonymous and Nonsynonymous Substitution Rates Under Realistic Evolutionary Models

*Department of Biology, University College London, England;
and
Department of Organismic and Evolutionary Biology, Harvard University
| Abstract |
|---|
Approximate methods for estimating the numbers of synonymous and nonsynonymous substitutions between two DNA sequences involve three steps: counting of synonymous and nonsynonymous sites in the two sequences, counting of synonymous and nonsynonymous differences between the two sequences, and correcting for multiple substitutions at the same site. We examine complexities involved in those steps and propose a new approximate method that takes into account two major features of DNA sequence evolution: transition/transversion rate bias and base/codon frequency bias. We compare the new method with maximum likelihood, as well as several other approximate methods, by examining infinitely long sequences, performing computer simulations, and analyzing a real data set. The results suggest that when there are transition/transversion rate biases and base/codon frequency biases, previously described approximate methods for estimating the nonsynonymous/synonymous rate ratio may involve serious biases, and the bias can be both positive and negative. The new method is, in general, superior to earlier approximate methods and may be useful for analyzing large data sets, although maximum likelihood appears to always be the method of choice.
| Introduction |
|---|
Estimation of synonymous and nonsynonymous substitution rates is important in understanding the dynamics of molecular sequence evolution (Kimura 1983).
The simplest problem in this regard is estimation of the numbers of synonymous (dS) and nonsynonymous (dN) substitutions per site between two sequences. In the past two decades, a number of intuitive methods have been suggested for this estimation. They involve ad hoc treatments that cannot be justified rigorously, and they will be referred to here as approximate methods. In common, they involve three steps. First, the numbers of synonymous (S) and nonsynonymous (N) sites in the sequences are counted. Second, the numbers of synonymous and nonsynonymous differences between the two sequences are counted. Third, a correction for multiple substitutions at the same site is applied to calculate the numbers of synonymous (dS) and nonsynonymous (dN) substitutions per site between the two sequences. Here, we use the notation of Nei and Gojobori (1986, referred to later as "NG"), which appears to be the most commonly used approximate method; definitions of symbols are given in Table 1. The reader is referred to Ina (1995, 1996) for a recent discussion of important concepts. While the above strategy appears simple, well-known features of DNA sequence evolution, such as unequal transition and transversion rates and unequal nucleotide or codon frequencies, make it a real challenge to count sites and differences correctly.
Miyata and Yasunaga (1980)
A maximum-likelihood (ML) method for estimating dS and dN between two sequences was developed by Goldman and Yang (1994) based on an explicit model of codon substitution. The ML method does not involve ad hoc approximations. Furthermore, the ML method is flexible in that knowledge of the substitution process such as transition/transversion bias, codon usage biases, and even chemical differences between amino acids can easily be incorporated into the model.
Relying on insights gained through previous methods, particularly ML estimation, we propose in this paper an approximate method for estimating dS and dN that accounts for two major features of DNA sequence evolution: the transition/transversion bias and the base (codon) frequency bias. We examine the similarities and differences among the ML method, the approximate method of this paper, and two other approximate methods by a consistency analysis of infinitely long sequences and computer simulation of finite data. A real data set is also analyzed using different estimation methods.
| Methods for Estimating Synonymous and Nonsynonymous Rates |
|---|
Maximum-Likelihood Estimation
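First, we describe the ML method of Goldman and Yang (1994). The instantaneous substitution rate from codon i to codon j (i ≠ j) is given by

$$
q_{ij} = \begin{cases}
0, & \text{if } i \text{ and } j \text{ differ at more than one position},\\
\pi_j, & \text{for a synonymous transversion},\\
\kappa\pi_j, & \text{for a synonymous transition},\\
\omega\pi_j, & \text{for a nonsynonymous transversion},\\
\omega\kappa\pi_j, & \text{for a nonsynonymous transition}.
\end{cases} \tag{1}
$$
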
Parameter κ is the (mutational) transition/transversion rate ratio, and ω = dN/dS is the nonsynonymous/synonymous rate ratio. The equilibrium codon frequencies (π_j) are calculated using the nucleotide frequencies at the three codon positions; that is, codon frequencies are proportional to the products of nucleotide frequencies at the three codon positions. This approach was found to produce results similar to those obtained using all codon frequencies as free parameters, although codon frequencies are often quite different from those expected from nucleotide frequencies at codon positions (e.g., Goldman and Yang 1994; Pedersen, Wiuf, and Christiansen 1998; Yang and Nielsen 1998). We note that different assumptions about codon frequencies can be easily incorporated into the ML method and the approximate method of this paper. Equation (1) is similar to the simulation model of Gojobori (1983) and the likelihood model of Muse and Gaut (1994), although those authors did not consider the transition/transversion rate bias or different base frequencies at the three codon positions.
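The rate rule just described can be illustrated with a minimal Python sketch. It assumes the universal genetic code and a caller-supplied equilibrium frequency for the target codon; the function and variable names (`codon_rate`, `CODE`, `is_transition`) are hypothetical and are not taken from any published implementation.

```python
# Illustrative sketch only: relative substitution rate between two codons
# under a kappa/omega codon model of the kind described in the text.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Universal genetic code: codon -> amino acid ('*' marks a stop codon).
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
PYRIMIDINES = set("TC")


def is_transition(x, y):
    """True if both bases are pyrimidines or both are purines."""
    return (x in PYRIMIDINES) == (y in PYRIMIDINES)


def codon_rate(i, j, kappa, omega, pi_j):
    """Relative rate from sense codon i to sense codon j (i != j)."""
    diffs = [(x, y) for x, y in zip(i, j) if x != y]
    if len(diffs) != 1 or CODE[i] == "*" or CODE[j] == "*":
        return 0.0  # more than one change, or a stop codon is involved
    rate = pi_j
    if is_transition(*diffs[0]):
        rate *= kappa  # transitions are scaled by kappa
    if CODE[i] != CODE[j]:
        rate *= omega  # nonsynonymous changes are scaled by omega
    return rate


# TTT -> TTC is a synonymous transition (Phe -> Phe).
example_rate = codon_rate("TTT", "TTC", kappa=2.0, omega=0.5, pi_j=0.1)
```

Keeping the frequency of the target codon as an explicit argument leaves the choice of codon-frequency model (products of position-specific base frequencies, or free codon frequencies) to the caller.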
The diagonal elements of the rate matrix, Q = {q_ij}, are determined by the mathematical requirement (e.g., Grimmett and Stirzaker 1992, p. 241) that the row sums are zero:

$$
q_{ii} = -\sum_{j \neq i} q_{ij}. \tag{2}
$$
Because time and rate are confounded, we multiply the rate matrix by a scaling factor so that the expected number of nucleotide substitutions per codon is one:

$$
-\sum_{i} \pi_i q_{ii} = \sum_{i} \sum_{j \neq i} \pi_i q_{ij} = 1. \tag{3}
$$
This scaling means that time t (or, equivalently, branch length or sequence distance) is measured by the expected number of (nucleotide) substitutions per codon.
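The two requirements on the rate matrix (zero row sums and an expected rate of one substitution per codon) can be sketched as follows. This is a toy illustration on a four-state matrix rather than the 61-state codon matrix; `finalize_rate_matrix` and the example frequencies are assumptions made here for illustration.

```python
def finalize_rate_matrix(offdiag, pi):
    """Fill in diagonals so rows sum to zero, then rescale the matrix so
    that the expected substitution rate, -sum_i pi_i * q_ii, equals one."""
    n = len(pi)
    q = [row[:] for row in offdiag]
    for i in range(n):
        q[i][i] = 0.0
        q[i][i] = -sum(q[i])  # row sums become zero
    total_rate = -sum(pi[i] * q[i][i] for i in range(n))
    return [[x / total_rate for x in row] for row in q]


# Toy four-state example: off-diagonal rate i -> j proportional to pi[j].
toy_pi = [0.1, 0.2, 0.3, 0.4]
toy_offdiag = [[0.0 if i == j else toy_pi[j] for j in range(4)]
               for i in range(4)]
toy_q = finalize_rate_matrix(toy_offdiag, toy_pi)
```

With this scaling, a branch length t can be read directly as the expected number of substitutions per state (per codon, in the codon model).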
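The transition probability matrix is calculated as

$$
P(t) = \{p_{ij}(t)\} = e^{Qt}, \tag{4}
$$

where p_ij(t) is the probability that codon i becomes codon j after time t. The probability of observing codons i and j at a site in the two sequences is

$$
f_{ij}(t) = \pi_i p_{ij}(t). \tag{5}
$$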
The log-likelihood function is given by the multinomial probability

$$
\ell(t, \kappa, \omega) = \sum_{i} \sum_{j} n_{ij} \log\{\pi_i p_{ij}(t)\}, \tag{6}
$$
where n_ij is the number of sites occupied by codons i and j in the two sequences. The codon frequencies π_i are estimated using the observed nucleotide frequencies at the three codon positions in the two sequences. Parameters t, κ, and ω are estimated by maximizing the likelihood function numerically, and the estimates are used to calculate dS and dN. Specifically, the proportions of synonymous and nonsynonymous substitutions are given as

$$
\rho_S^{*} = \sum_{i \neq j,\; aa_i = aa_j} \pi_i q_{ij} \tag{7}
$$

and ρ*_N = 1 − ρ*_S, respectively. The summation is taken over all codon pairs i and j (i ≠ j) that code for the same amino acid, and aa_i is the amino acid encoded by codon i. The numbers of synonymous and nonsynonymous substitutions per codon are then tρ*_S and tρ*_N, respectively. The proportions of synonymous and nonsynonymous sites are defined as the proportions of synonymous and nonsynonymous "mutations" before the operation of natural selection at the amino acid level (Goldman and Yang 1994; Ina 1995). These can be calculated in a manner similar to equation (7) as ρ¹_S and ρ¹_N (equivalent to ρ_S and ρ_N in Goldman and Yang 1994), using the ML estimate of κ but with ω = 1 fixed. The numbers of synonymous and nonsynonymous sites per codon are 3ρ¹_S and 3ρ¹_N, respectively. The numbers of synonymous and nonsynonymous substitutions per site are then dS = tρ*_S/(3ρ¹_S) and dN = tρ*_N/(3ρ¹_N), respectively (see Table 1).
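The final conversion from the model quantities to dS and dN is simple arithmetic, sketched below. The function name `ds_dn` and its arguments are hypothetical; the inputs are the distance t (substitutions per codon), the proportion of synonymous substitutions, and the proportion of synonymous sites obtained with the rate ratio fixed at 1.

```python
def ds_dn(t, rho_star_s, rho1_s):
    """Return (dS, dN) per site from per-codon distance t.

    rho_star_s: proportion of substitutions that are synonymous.
    rho1_s: proportion of sites that are synonymous (selection switched off).
    """
    rho_star_n = 1.0 - rho_star_s
    rho1_n = 1.0 - rho1_s
    d_s = t * rho_star_s / (3.0 * rho1_s)
    d_n = t * rho_star_n / (3.0 * rho1_n)
    return d_s, d_n


# For a neutral gene the two proportions coincide, so dS = dN = t / 3.
neutral_ds, neutral_dn = ds_dn(t=1.0, rho_star_s=0.25, rho1_s=0.25)
```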
A New Approximate Method
We suggest an approximate method for estimating dS and dN using the strategy adopted by previous authors: counting sites, counting differences, and correcting for multiple hits. In all three steps, we take into account the transition/transversion rate bias and the base (codon) frequency bias. Approximately, our method is based on the HKY85 nucleotide mutation (substitution) model (Hasegawa, Kishino, and Yano 1985). Although this is not the most general model available, it accounts for the two most important features of the mutation process, that is, the transition/transversion bias and unequal base frequencies. Previous results (e.g., Yang 1994a) suggest that adding further complication is often unnecessary.
Estimating the Transition/Transversion Rate Ratio (κ)
We use the fourfold-degenerate sites at the third codon positions and the nondegenerate sites to estimate κ. Mutations at the fourfold-degenerate sites do not change the amino acid, and thus the transition/transversion rate bias at those sites should reflect the mutational bias. Mutations at nondegenerate sites all lead to amino acid changes and can also be used to estimate κ (see eq. 1). Here, we assume that different nonsynonymous substitutions have the same rate irrespective of the pair of amino acids involved, although the assumption is unrealistic (Yang, Nielsen, and Hasegawa 1998). We calculate an average of κ, weighted by the numbers of nucleotide sites in the two site classes. Since no simple formula is available for estimating κ under the HKY85 model, we use the formula for the F84 model (Yang 1994b) instead, relying on the similarity of the two models. We calculate

$$
\begin{aligned}
a &= -\log\left(1 - \frac{T}{2(\pi_T\pi_C/\pi_Y + \pi_A\pi_G/\pi_R)} - \frac{(\pi_T\pi_C\pi_R/\pi_Y + \pi_A\pi_G\pi_Y/\pi_R)\,V}{2(\pi_T\pi_C\pi_R + \pi_A\pi_G\pi_Y)}\right),\\
b &= -\log\left(1 - \frac{V}{2\pi_Y\pi_R}\right),
\end{aligned} \tag{8}
$$

where T and V are proportions of transitional and transversional differences, respectively, and π_Y = π_T + π_C and π_R = π_A + π_G. Then,

$$
\kappa_{F84} = a/b - 1, \tag{9}
$$

$$
t = \left[2\pi_T\pi_C(1 + \kappa_{F84}/\pi_Y) + 2\pi_A\pi_G(1 + \kappa_{F84}/\pi_R) + 2\pi_Y\pi_R\right] b \tag{10}
$$

(Yang 1994b). The estimated κ_F84 is then transformed to κ (i.e., κ_HKY85) using the following formula (see Goldman 1993):

$$
\kappa_{HKY85} = 1 + \frac{\pi_T\pi_C/\pi_Y + \pi_A\pi_G/\pi_R}{\pi_T\pi_C + \pi_A\pi_G}\,\kappa_{F84}. \tag{11}
$$
For data of multiple sequences, we suggest estimating a common κ by averaging estimates from all pairwise comparisons and using the combined estimate of κ in the calculation of pairwise dN and dS rates.
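As a rough illustration of this step, the sketch below computes the F84 quantities from the proportions of transitional (T) and transversional (V) differences and the base frequencies, then converts the F84 rate ratio to the HKY85 scale. It uses the standard F84 distance algebra written with un-halved logarithms, which is an assumption about notation; the function name `kappa_from_f84` and the dictionary layout are hypothetical.

```python
import math


def kappa_from_f84(T, V, pi):
    """Return (kappa_F84, kappa_HKY85, distance t) from proportions of
    transitional (T) and transversional (V) differences.

    pi maps each base 'T', 'C', 'A', 'G' to its frequency.
    """
    pT, pC, pA, pG = pi["T"], pi["C"], pi["A"], pi["G"]
    pY, pR = pT + pC, pA + pG          # pyrimidine and purine totals
    tc, ag = pT * pC, pA * pG
    a = -math.log(1.0
                  - T / (2.0 * (tc / pY + ag / pR))
                  - V * (tc * pR / pY + ag * pY / pR)
                  / (2.0 * (tc * pR + ag * pY)))
    b = -math.log(1.0 - V / (2.0 * pY * pR))
    kappa_f84 = a / b - 1.0
    t = (2.0 * tc * (1.0 + kappa_f84 / pY)
         + 2.0 * ag * (1.0 + kappa_f84 / pR)
         + 2.0 * pY * pR) * b
    kappa_hky = 1.0 + (tc / pY + ag / pR) / (tc + ag) * kappa_f84
    return kappa_f84, kappa_hky, t


# Check with equal base frequencies, where the model reduces to K80.
# Choose a = 0.6 and b = 0.2, then derive the matching T and V.
equal_pi = {"T": 0.25, "C": 0.25, "A": 0.25, "G": 0.25}
V_demo = (1.0 - math.exp(-0.2)) / 2.0
T_demo = (1.0 - V_demo - math.exp(-0.6)) / 2.0
k_f84, k_hky, t_demo = kappa_from_f84(T_demo, V_demo, equal_pi)
```

The sketch does not guard against inapplicable cases (for example, V = 0 or a non-positive argument to the logarithm), which a practical implementation would need to handle.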
Counting Synonymous and Nonsynonymous Sites
Ina's (1995) Table 1 for counting synonymous and nonsynonymous sites in each codon is correct for mutation models more general than that of Kimura (1980), although Ina's table involves minor errors for codons that can change to stop codons in one step. In general, synonymous and nonsynonymous sites can be counted as in the ML method discussed above for any codon-substitution model (Goldman and Yang 1994). We count sites using codons in the two compared sequences, rather than the equilibrium codon frequencies expected from the model (see discussion in Yang and Nielsen 1998). The numbers of sites (S and N) are scaled so that S + N = 3Lc, where Lc is the number of codons. As there should be about 4% loss of sites due to mutations to stop codons, this scaling means that we are slightly underestimating dS and dN, although the ω ratio is not affected (Yang and Nielsen 1998). Nucleotide frequencies at synonymous and nonsynonymous sites are recorded and used later for multiple-hit corrections.
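To make the idea of rate-weighted site counting concrete, the following simplified sketch counts synonymous and nonsynonymous sites for a single codon. It is not the full procedure: it assumes equal base frequencies, the universal genetic code, and a per-codon rescaling to three sites, and the name `count_sites` is hypothetical.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
PYRIMIDINES = set("TC")


def is_transition(x, y):
    """True if both bases are pyrimidines or both are purines."""
    return (x in PYRIMIDINES) == (y in PYRIMIDINES)


def count_sites(codon, kappa):
    """Return (S, N) for one sense codon, scaled so that S + N = 3.

    Each single-base mutation is weighted by kappa (transition) or 1
    (transversion); mutations to stop codons are disregarded.
    Simplification: base frequencies are assumed equal.
    """
    syn = nonsyn = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            target = codon[:pos] + base + codon[pos + 1:]
            if CODE[target] == "*":
                continue
            rate = kappa if is_transition(codon[pos], base) else 1.0
            if CODE[target] == CODE[codon]:
                syn += rate
            else:
                nonsyn += rate
    total = syn + nonsyn
    return 3.0 * syn / total, 3.0 * nonsyn / total


# With kappa = 1, TTT has one synonymous neighbour out of nine: S = 1/3.
# With kappa = 2, the synonymous change (a transition) gains weight: S = 1/2.
s_equal, n_equal = count_sites("TTT", 1.0)
s_biased, n_biased = count_sites("TTT", 2.0)
```

The TTT example shows the direction of the effect discussed in the text: transition bias increases the count of synonymous sites.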
Counting Synonymous and Nonsynonymous Differences
Observed nucleotide differences between the two sequences are classified into four categories: synonymous transitions, synonymous transversions, nonsynonymous transitions, and nonsynonymous transversions. When the two compared codons differ at one position, the classification is obvious. When they differ at two or three positions, there will be two or six parsimonious pathways along which one codon could change into the other, and all of them should be considered. Since different pathways may involve different numbers of synonymous and nonsynonymous changes, they should be weighted differently. Miyata and Yasunaga (1980) and Li, Wu, and Luo (1985) made valuable attempts to weight pathways. They also pointed out that equal weighting of pathways, which is used in later methods such as those of Nei and Gojobori (1986) and Ina (1995), biases estimates of ω toward 1; that is, equal weighting tends to overestimate ω when ω < 1 and to underestimate ω when ω > 1.
The most appropriate weights are the relative probabilities of pathways, which we will use in the new approximate method. The probabilities depend on the parameters being estimated: the sequence divergence level (t), the transition/transversion rate ratio (κ), and the dN/dS ratio (ω). For given values of t, κ, and ω, it is easy to construct the rate matrix Q and calculate the transition probability matrix P(t) (see eqs. 1 and 4). We use the Taylor expansion in this case for its fast speed.

$$
P(t) = e^{Qt} = I + Qt + \frac{(Qt)^2}{2!} + \frac{(Qt)^3}{3!} + \cdots \tag{12}
$$

The number of terms used is determined by a preset accuracy level. The weight, that is, the probability for each pathway, is calculated as the product of the probabilities of all changes involved in the pathway. Pathways involving stop codons are given weight 0. If all pathways involve stop codons (for example, between AAG and TGG in the mammalian mitochondrial code), ad hoc decisions have to be made.
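A Taylor-series evaluation of the transition probability matrix, truncated at a preset accuracy, might look like the sketch below. It is written in plain Python for a small toy matrix; the names `mat_mul` and `transition_matrix`, the tolerance, and the term limit are illustrative assumptions, and a real implementation for 61 codons would normally use a numerical library.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, tol=1e-12, max_terms=200):
    """Approximate P(t) = exp(Qt) by summing Taylor terms (Qt)^k / k!
    until the largest entry of the newest term falls below tol."""
    n = len(q)
    qt = [[q[i][j] * t for j in range(n)] for i in range(n)]
    p = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in p]  # current term, starts at the identity
    for k in range(1, max_terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, qt)]
        p = [[p[i][j] + term[i][j] for j in range(n)] for i in range(n)]
        if max(abs(x) for row in term for x in row) < tol:
            break
    return p


# Two-state check: for Q = [[-1, 1], [1, -1]], p_00(t) = 0.5 + 0.5*exp(-2t).
toy_p = transition_matrix([[-1.0, 1.0], [1.0, -1.0]], t=0.5)
expected_p00 = 0.5 + 0.5 * math.exp(-2.0 * 0.5)
```

The series converges quickly when Qt is small, which is the regime of moderate sequence divergence; for large t a scaling-and-squaring step or an eigendecomposition would be the safer choice.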
An example is given in Figure 1 using a pair of codons in the mitochondrial genes of the human and the orangutan. The concatenated sequence of 12 genes on the H-strand has 3,331 codons (see below). Between the two sequences, 1,198 codons are different at one position, 151 are different at two positions, and 21 are different at all three positions. The 329th codon is TTA in the human and CTC in the orangutan. Transition probabilities for changes involved in each of the two pathways are given in Figure 1, calculated using the estimates obtained by the new method (t = 0.873, κ = 10.61, and ω = 0.057). The probability for the first path (TTA → TTC → CTC) is then p_TTA,TTC(t) × p_TTC,CTC(t) = 0.00541 × 0.04974 = 0.00027, while that for the second path is 0.29219 × 0.08679 = 0.02536. Weights for the two pathways are thus 0.011 and 0.989, and there are 0.022 nonsynonymous and 1.978 synonymous differences between the two codons. Since the nonsynonymous rate is much lower than the synonymous rate (ω < 1), the first path is much less likely. Equal weighting of pathways would give one synonymous and one nonsynonymous difference between the two codons. Other codon pairs may not be so extreme, as the different pathways may involve the same numbers of different types of changes.
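The arithmetic of this example can be reproduced with a few lines of Python. The helper `pathway_weights` is a hypothetical name; the step probabilities are the four values quoted in the text, and the first path is taken to involve two nonsynonymous changes and the second path none. Because the products are not rounded before normalising, the weights agree with the quoted figures only to about three decimal places.

```python
def pathway_weights(path_step_probs):
    """Weight each pathway by the product of its step probabilities,
    normalised so that the weights sum to one."""
    path_probs = []
    for steps in path_step_probs:
        prob = 1.0
        for p in steps:
            prob *= p
        path_probs.append(prob)
    total = sum(path_probs)
    return [p / total for p in path_probs]


# Step probabilities quoted in the text for the two pathways.
paths = [(0.00541, 0.04974), (0.29219, 0.08679)]
nonsyn_changes_per_path = [2, 0]  # path 1: two nonsynonymous steps

weights = pathway_weights(paths)
nonsyn_diffs = sum(w * n for w, n in zip(weights, nonsyn_changes_per_path))
syn_diffs = 2.0 - nonsyn_diffs  # the codons differ at two positions
```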
Correcting for Multiple Hits Using Estimated Numbers of Sites and Differences
We use the distance formula (eq. 10) for the F84 model (Tateno, Takezaki, and Nei 1994
The Algorithm
Our method for estimating dS and dN can be summarized in the following iterative algorithm.
1. Estimate κ from the fourfold-degenerate sites and the nondegenerate sites under the HKY85-F84 model using base (codon) frequencies from the real data. The estimated κ is used in later steps.
2. Count the numbers of synonymous and nonsynonymous sites (S and N, respectively) using the estimated κ and the observed base (codon) frequencies.
3. Choose starting values for t and ω (e.g., using estimates from the NG method).
4. Count the numbers of synonymous and nonsynonymous differences (both transitions and transversions) using κ, the codon frequencies, and the current values of t and ω. The transition probability matrix P(t) is calculated by equation (12) and used to weight pathways when the two codons differ at more than one position. This step generates the proportions of transitional (T) and transversional (V) differences for each of the synonymous and nonsynonymous site classes.
5. Correct for multiple hits to calculate dS and dN using counts of sites and differences and base frequencies at synonymous and nonsynonymous sites. This step updates t and ω: t = dS × 3S/(S + N) + dN × 3N/(S + N), and ω = dN/dS.
6. Repeat steps 4 and 5 until the algorithm converges.
In general, two or three rounds of iteration are sufficient. Some variations of the above algorithm are possible. For example, one may use an estimate of κ obtained externally. Furthermore, no iteration is needed if pathways are weighted equally when counting differences.
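The update in step 5 and the repeat-until-convergence logic of step 6 can be sketched as below. Only the update formula comes from the text; the `iterate` helper, its tolerance, and the caller-supplied `step` function (standing in for steps 4 and 5 combined) are hypothetical scaffolding.

```python
def update_t_omega(d_s, d_n, S, N):
    """Step 5 update: distance per codon and the dN/dS ratio."""
    t = d_s * 3.0 * S / (S + N) + d_n * 3.0 * N / (S + N)
    omega = d_n / d_s
    return t, omega


def iterate(step, t, omega, tol=1e-6, max_rounds=50):
    """Apply `step` (a function mapping (t, omega) to updated values)
    until both values change by less than tol between rounds."""
    for _ in range(max_rounds):
        new_t, new_omega = step(t, omega)
        if abs(new_t - t) < tol and abs(new_omega - omega) < tol:
            return new_t, new_omega
        t, omega = new_t, new_omega
    return t, omega


# Neutral example: dS = dN = 1/3 gives t = 1 and omega = 1 for any S, N.
t_neutral, omega_neutral = update_t_omega(1.0 / 3.0, 1.0 / 3.0, S=75.0, N=225.0)

# Contrived contraction with fixed point (1.0, 0.3), used only to
# exercise the convergence loop.
t_fixed, omega_fixed = iterate(lambda t, w: (0.5 * t + 0.5, 0.5 * w + 0.15),
                               t=2.0, omega=1.0)
```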
| Comparison of Methods for Estimating dS and dN |
|---|
We examine the performance of the following four methods for estimating dS and dN: ML (Goldman and Yang 1994
Two approaches are taken to evaluate the methods. The first examines infinitely long sequences and may be termed a "consistency analysis." Instead of the observed codons in the two sequences, the data consist of the expected frequencies (f_ij) of all 61 × 61 codon "site patterns," calculated using equation (5) for given parameters t, κ, ω, and codon frequencies π_j. ML estimates are known to be statistically consistent when the model is correct (Stuart, Ord, and Arnold 1999, chapter 18). Since the dN and dS rates (and their ratio) are defined as functions of parameters t, κ, ω, and π_j, ML estimates of dN and dS will also be consistent. Approximate methods, including the method of this paper, involve ad hoc approximations and in general do not give the true values as estimates. They are statistically inconsistent. However, a good approximate method should not deviate too far from the truth with an infinite amount of data. The second approach we take is computer simulation. Finite data sets are generated by simulation and then analyzed by different methods to examine their biases and sampling variances.
We examine effects of the transition/transversion rate ratio (κ), base (codon) frequencies, and the selective pressure on the gene reflected in parameter ω. We initially fix t = 1 nucleotide substitution per codon, although the effect of sequence divergence is examined later. For a neutral gene (ω = 1), this translates to 1/3 synonymous and 1/3 nonsynonymous substitutions per site. Three values of ω are considered: ω = 1 (no selection), ω = 0.3 (purifying selection), and ω = 3 (positive selection). Estimates of ω (dN/dS) from real data vary widely from gene to gene, and ω = 0.3 appears to represent moderate purifying selection (see, e.g., Ohta 1995; Li 1997; Yang and Nielsen 1998; Eyre-Walker and Keightley 1999). There are not many genes under positive selection, but estimates at about ω = 3 are found in real data (e.g., Lee, Ota, and Vacquier 1995; Messier and Stewart 1997). Three sets of base frequencies at codon positions are used. The first set has equal base (codon) frequencies. The second set is from primate mitochondrial protein-coding genes and has very biased base frequencies. The third set is from HIV env genes (Table 2). The universal genetic code is used in both the consistency analysis and the computer simulation.
Consistency Analysis Using Infinite Data
Consistency is the property that the estimate converges to the true value of the parameter as the amount of data approaches infinity. While consistency is a weak requirement, the approximate methods examined here are all inconsistent. It is nevertheless interesting to examine which steps of the approximate methods (i.e., counting sites, counting differences, and correcting for multiple hits) cause the bias. This is relatively easy since infinite data do not involve sampling errors, and estimates of sites (S and N) and rates (dS and dN) can be directly compared with the correct values.
Estimates of ω by different methods are plotted against the transition/transversion rate ratio κ for different values of the dN/dS ratio (ω). Results for the three sets of codon frequencies are shown in Figure 2A–I.
Equal Codon Frequencies
We consider the NG method first. When base (codon) frequencies are equal and transition and transversion rates are equal (κ = 1), assumptions of the NG method are largely satisfied. In this case, NG indeed gives estimates close to the true values. Estimates of ω given by NG when κ = 1 are 1.001, 0.318, and 2.523 for the true ω = 1, 0.3, and 3, respectively (Fig. 2A–C). The method is biased toward 1 when the true ω ≠ 1 due to its use of equal weighting of pathways when counting sites. When there is transition bias (κ > 1), NG underestimates the ω ratio, and the bias is more serious when the transition bias is more extreme. This bias is mainly generated in the step of counting sites.
The case of κ = 10 is explored in Table 3, which shows that NG substantially underestimates ω and gives 0.669, 0.216, and 1.812 for ω = 1, 0.3, and 3, respectively. In this case, the proportion of synonymous sites (S%) should be 33%, but NG (assuming κ = 1) gives 26% (Yang and Nielsen 1998, Fig. 3). Use of equal weighting (assuming ω = 1) in counting differences by NG tends to bias the estimate of ω toward 1. However, compared with the bias in counting sites, the bias in counting differences is much less important because there may not be many pairs of codons that are different at two or three positions and because different pathways may involve the same numbers of synonymous and nonsynonymous changes. As mentioned above, NG underestimates S considerably (25.5/32.5 = 0.785) when κ = 10, and almost all of this underestimation is translated into the overestimation of dS (0.3333/0.4134 = 0.806 for ω = 1). For similar reasons, the underestimation of ω by the NG method is more serious for ω = 3 than for ω = 0.3 (Fig. 2B and C). In the latter case, equal weighting assuming ω = 1 in the NG method counterbalances the effect of ignoring κ in counting sites, while in the former case, the two biases are in the same direction (Table 3).
Ina's (1995)
for large values of κ (Fig. 2A–C). For small values of κ, the method tends to overestimate ω when ω < 1 and to underestimate ω when ω > 1. This pattern appears to be due to the use of equal weighting of pathways in counting differences. The method of this paper underestimates ω slightly when the transition/transversion bias is weak (that is, when κ is close to 1) and when ω ≠ 1 (Fig. 2B–C).
Primate Mitochondrial Codon Frequencies
Results obtained using base frequencies at the three codon positions from the primate mitochondrial genes (see Table 2) are shown in Figure 2D–F. Estimates of S, dS, and dN for κ = 10 are shown in Table 3. The results are very different from those of Figure 2A–C under equal codon frequencies. Except for small values of κ (κ < 2), Ina's method performs more poorly than NG. The two methods give very different counts of sites (S). While transition bias always leads to more synonymous sites, the effect of base frequency bias is more complicated. Extreme codon usage bias can cause the proportion of synonymous sites to range from 0% (e.g., when only codons TTC and TTA are present in the sequences) to 100% (e.g., when only codons CTT, CTC, CTA, and CTG are present). There tend to be more synonymous sites if the two most frequent nucleotides at third positions are both purines or both pyrimidines.
For the mitochondrial genes, S is much smaller than expected under equal base frequencies, causing NG to overestimate rather than underestimate S. For example, NG gives S% = 26.7%, which is higher than the correct value at either κ = 1 (23.1%) or κ = 10 (23.7%) (Table 3). The overestimation of S caused by ignoring the base frequency bias more than compensates for the underestimation caused by ignoring the transition bias. As a result, NG overestimates ω when ω = 1 (with estimates from 1.1 to 1.3; Fig. 2D) or ω = 0.3 (with estimates from 0.42 to 0.46; Fig. 2E). When ω = 3, equal weighting of pathways (assuming ω = 1) in counting differences combined with the assumption of no transition bias (κ = 1) in counting sites cancels the effect of the base frequency bias in counting sites, such that NG produces a quite reliable estimate of ω (Fig. 2F). Ina's (1995) method, by considering the transition bias alone and ignoring the base/codon frequency bias, substantially overestimates the proportion of synonymous sites and overestimates ω (Table 3 and Fig. 2D–F). Nevertheless, it should be noted that the observed pattern depends on the particular set of codon frequencies. For frequencies from other data sets, NG may be considerably worse than Ina's method.
The method of this paper is slightly better than NG for ω = 1, although the two methods have opposite biases. When ω = 0.3, the new method has little bias. When ω = 3, the new method underestimates ω, with estimates from 2.5 to 2.6. Since the new method counts sites correctly (see Table 3), the bias must be due to counting of differences and correction for multiple hits. Table 3 suggests that the new method underestimates both dS and dN, but the underestimation of dN is more serious, leading to underestimation of the ω ratio. Apart from the case in which ω = 3, the new method is better than both NG and Ina's method.
HIV env Codon Frequencies
Figure 2G–I shows estimates of ω when base/codon frequencies from the HIV envelope genes (see Table 2) are used. Base frequencies in this gene are less biased than are those in the mitochondrial genes, and the effect of ignoring the base frequency bias is minor. For example, the correct proportion of synonymous sites at κ = 1 is 21.9%, while NG gives 24.0%, with very slight overestimation. Patterns in Figure 2G–I are quite similar to those for equal codon frequencies (Fig. 2A–C). Exactly at κ = 1, NG gives the estimates 1.105, 0.371, and 2.554 when the true ω = 1, 0.3, and 3, respectively. The estimates are biased toward 1, mainly due to the use of equal weighting in counting differences. When κ = 10, NG underestimates the proportion of synonymous sites (24.0% vs. the correct value, 28.5%). The bias is not as extreme as that in the case of equal codon frequencies, as unequal base/codon frequencies appear to counterbalance the effect of transition bias to some extent (Tables 3 and 4).
Ina's (1995) method overestimates the ω ratio because it ignores the base frequency bias and thus overestimates the number of synonymous sites. The bias is not as extreme as it is for mitochondrial genes. The new method gives estimates very close to the true values for ω = 1 and ω = 0.3. When ω = 3, the new method slightly underestimates the ratio, as in the case of mitochondrial codon frequencies.
Computer Simulations
The data consist of a pair of codon sequences and are simulated by sampling codon site patterns from the multinomial distribution specified by the site pattern probabilities f_ij (eq. 5). The sequence has Lc = 100 or 500 codons. Three values of κ are used: 1 (no bias), 2 (small bias), and 20 (large bias). Most estimates of κ from nuclear genes are in the range (1.5, 5), so a value of 2 is typical. Estimates from mitochondrial genes vary considerably among data sets, from 2 or 3 to over 100.
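Sampling site patterns from a multinomial distribution can be sketched with the Python standard library as follows. The pattern probabilities below are a made-up toy table of three codon pairs, not values from the model, and `simulate_site_patterns` is a hypothetical name.

```python
import random


def simulate_site_patterns(pattern_probs, n_codons, seed=None):
    """Draw n_codons site patterns (codon pairs) with the given
    probabilities and return the observed counts per pattern."""
    rng = random.Random(seed)
    patterns = list(pattern_probs)
    weights = [pattern_probs[p] for p in patterns]
    draws = rng.choices(patterns, weights=weights, k=n_codons)
    counts = {}
    for pattern in draws:
        counts[pattern] = counts.get(pattern, 0) + 1
    return counts


# Toy probabilities for three site patterns (assumed values).
toy_probs = {("TTT", "TTT"): 0.7, ("TTT", "TTC"): 0.2, ("TTC", "TTT"): 0.1}
toy_counts = simulate_site_patterns(toy_probs, n_codons=100, seed=1)
```

In a full simulation the dictionary would hold all 61 × 61 pattern probabilities computed from the codon model, and the resulting counts would play the role of the observed data analyzed by each method.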
The averages of the ω estimates among simulated replicates are listed in Table 4 for the three sets of codon frequencies. Standard errors for the ML estimates are also presented, while those for other methods (not shown) are very small due to the use of many more replicates. Averages of dS and dN are calculated for all methods but not shown. We note that the simulation results are highly consistent with those found for infinite data, discussed above. For example, if a method gives estimates smaller than the true value in infinite data, it tends to have negative biases in finite samples as well.
ML estimates are known to be often biased in small samples. Table 4 shows that MLEs of ω are nearly unbiased when ω = 1 or 0.3 for all three sets of codon frequencies and for all values of κ. However, they are biased toward larger values when ω = 3. Although the bias is small in large genes (with 500 codons), it can be quite large for small genes (Lc = 100), especially when κ is small.
The NG method has little bias if codon frequencies are equal and if there is no transition/transversion bias (κ = 1). When ω ≠ 1, the method tends to bias toward 1 due to its use of equal weighting in counting sites. The bias is nevertheless small. These results agree well with previous simulations by Ota and Nei (1994) and Muse (1996), who used similar simple models to examine the performance of NG. However, NG is biased in most other parameter combinations. The biases in general agree with findings of the consistency analysis (Fig. 2). In particular, ignoring the transition/transversion bias leads to underestimates of ω and ignoring unequal base frequencies leads to overestimates of ω. For equal codon frequencies and no selection (ω = 1), NG gives severe underestimates of ω when κ is large. The effects of the transition/transversion rate bias and base frequency biases tend to cancel each other, such that NG has smaller biases than ML when ω = 3 for the mitochondrial codon frequencies. In almost all other cases, ML has smaller biases than NG.
Ina's (1995) method has small biases when base frequencies are equal. The method tends to overestimate ω when ω < 1 and to underestimate ω when ω > 1, probably due to its use of equal weighting in counting sites. This is the same pattern as that found in infinite data (Fig. 2A–C). Ina's method considerably overestimates ω for all values of ω under the mitochondrial codon frequencies, probably because it overestimates the number of synonymous sites. For the HIV env codon frequencies, the method overestimates ω when ω ≤ 1 and underestimates ω when ω > 1, as noted for infinite data (Fig. 2G–I).
The new method appears to have little bias over most of the parameter space examined (Table 4). When ω ≤ 1, it is less biased than NG or Ina's (1995) method. When ω = 3, it tends to overestimate ω in small samples, like the ML method, but the bias seems smaller than that of ML. The new method appears to provide a close approximation of ML over the range of parameter values examined.
Since all methods are biased for at least some parameter combinations, the mean squared error (MSE) may be an appropriate criterion by which to compare methods. The MSE of a parameter estimator $\hat{\theta}$ is defined as $\mathrm{MSE}(\hat{\theta}) = E(\hat{\theta} - \theta)^2$, where θ is the true value. Since $\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + [E(\hat{\theta}) - \theta]^2$, this measures both bias and variance. The square root of the MSE is plotted in Figure 3 against the sequence length (number of codons). Two parameter combinations are considered. The first is for mitochondrial genes with a strong transition bias (κ = 20) and purifying selection (ω = 0.3), and the second is for the HIV env gene with moderate transition bias (κ = 2) and positive selection (ω = 3). In the first case (Fig. 3A), ML and the method of this paper performed very similarly, and both have the smallest MSEs, while Ina's (1995) method has very large MSEs due to the positive bias of the method (Fig. 2E). In the second case (Fig. 3B), ML performed much worse than other methods for short genes (Lc < 300 codons) due to its large positive bias at ω = 3, while for large genes (Lc > 500), ML and the new method are better than NG and Ina's method. In both cases, the new method lies between NG and ML.
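The decomposition of the MSE into variance and squared bias can be checked numerically on any set of replicate estimates. The sketch below uses four made-up estimates and a true value of 0.3 purely for illustration; `mse_decomposition` is a hypothetical name.

```python
def mse_decomposition(estimates, true_value):
    """Return (MSE, variance, bias) of replicate estimates, using the
    population (divide-by-n) variance so that MSE = variance + bias**2."""
    n = len(estimates)
    mean = sum(estimates) / n
    variance = sum((x - mean) ** 2 for x in estimates) / n
    bias = mean - true_value
    mse = sum((x - true_value) ** 2 for x in estimates) / n
    return mse, variance, bias


# Made-up replicate estimates around a true value of 0.3.
demo_mse, demo_var, demo_bias = mse_decomposition([0.28, 0.35, 0.31, 0.40], 0.3)
demo_rmse = demo_mse ** 0.5  # the square root is what is plotted
```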
We also performed a small-scale simulation to examine the effect of sequence divergence level (t). The results are shown in Figure 4. We examine two sets of parameter values, with the sequence length fixed at Lc = 500. In the first case (Fig. 4A), equal codon frequencies are used with κ = 2 and ω = 0.3. The new method of this paper overestimates ω at small divergences but underestimates ω at large divergences. Other methods are insensitive to sequence divergence level. ML and the new method are less biased than NG and Ina's method. In the second case (Fig. 4B), mitochondrial codon frequencies are used with κ = 20 and ω = 0.3. In this case, ML and the new method have little bias over the whole range of the sequence divergence level. Note that the synonymous rate is quite high, with dS = 0.71 at t = 1, and dS = 1.1 at t = 1.5. NG and Ina's method involve positive biases, and the biases become more serious when the sequences are more divergent. Although average estimates of both dN and dS by NG increase with t, dN increases at a faster rate, such that the ω ratio increases with the increase of t. Muse (1996) discussed the fact that at high sequence divergences, NG does not produce distance estimates linear with time.
| Comparison of Human and Orangutan Mitochondrial Genes |
|---|
The concatenated sequences of the 12 protein-coding genes on the H-strand of the mitochondrial genome from the human (Homo sapiens, GenBank accession number D38112) and the orangutan (Pongo pygmaeus p., GenBank accession number D38115) are compared using different methods. The results are shown in Table 5. We also included the method of Li (1993)
κ estimated) increased the proportion of synonymous sites from 25% to 33%, and increased the dN/dS ratio from 0.093 to 0.130. Accounting further for the codon usage bias (F60, κ estimated) decreased the proportion of synonymous sites to 28%, with an estimate of ω = 0.045. The pattern is the same as that in the consistency analysis and the computer simulation discussed above. The results demonstrate that the estimation method is important for estimating dN and dS (and their ratio ω) from real data and that methods accounting for both the transition bias and base/codon frequency bias should be used. This conclusion is consistent with Ina (1995)
Furthermore, we note that estimates from the NG method are similar to those of ML assuming no transition bias (κ = 1 fixed) and no base frequency bias, and estimates obtained from Li (1993)
estimated) and no base frequency bias. Ina's (1995)
| Discussion |
|---|
|
|
|---|
The approximate method of this paper accounts for two major features of DNA sequence evolution: transition bias and base/codon frequency bias. The consistency analysis of infinite data and the computer simulation of finite data suggest that the new method has smaller biases than either NG or Ina's (1995)
The ML method for pairwise comparison is less biased and has a lower MSE than the approximate methods for almost all parameter combinations. Only for short sequences and high ω ratios does it involve a positive bias and perform more poorly than some of the approximate methods. We suggest that, in general, the ML method, which accounts for both the transition bias and the codon usage bias, should be the preferred method for estimating dS and dN between two sequences. Only in the case of very short sequences may it be advantageous to use simpler models. In the course of this study, we realized that correcting for biases involved in the NG method is extremely complicated, despite the fact that the method is well known for its simplicity. In contrast, ML is conceptually much simpler, mainly because the probability theory employed by the method takes care of the difficult tasks of weighting evolutionary pathways and correcting for multiple hits, with no need for ad hoc approximations. Specifically, the Chapman-Kolmogorov theorem (e.g., Grimmett and Stirzaker 1992, p. 239) states that $p_{ij}(t) = \sum_k p_{ik}(s)\,p_{kj}(t - s)$ for any $0 \le s \le t$; that is, the probability that codon i changes to codon j over time t is a sum over all possible codons (k) at any intermediate time point s. This obvious result ensures that the likelihood calculation (eqs. 4–6) accounts for all possible pathways of changes between the two codons, weighting them appropriately according to their relative probabilities of occurrence.
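The Chapman-Kolmogorov property can be checked numerically on a small example. The following is a minimal sketch under stated assumptions: it swaps in a 4-state nucleotide rate matrix with a transition/transversion rate ratio of 2 for the codon rate matrix used in this paper, leaves the rates unscaled, and computes the matrix exponential with a truncated Taylor series. All function names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, terms=60):
    """P(t) = exp(Qt) by truncated Taylor series (fine for small Q and moderate t)."""
    n = len(q)
    qt = [[q[i][j] * t for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        # term becomes (Qt)^m / m!
        term = [[x / m for x in row] for row in mat_mul(term, qt)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def two_rate_matrix(kappa):
    """Nucleotide rate matrix (order T, C, A, G): transitions kappa, transversions 1."""
    order = "TCAG"
    transitions = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(order):
        for j, y in enumerate(order):
            if i != j:
                q[i][j] = kappa if (x, y) in transitions else 1.0
        q[i][i] = -sum(q[i])  # rows of a rate matrix sum to zero
    return q


q = two_rate_matrix(kappa=2.0)
t, s = 0.5, 0.2
p_t = mat_exp(q, t)
p_composed = mat_mul(mat_exp(q, s), mat_exp(q, t - s))
max_gap = max(abs(p_t[i][j] - p_composed[i][j])
              for i in range(4) for j in range(4))
print(f"largest entrywise difference: {max_gap:.2e}")
```

Each entry of `p_composed` is a sum over the intermediate state k, which is the weighting of pathways that the theorem describes; the same identity holds for a codon rate matrix, only with more states.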
The major advantage of ML appears to lie in its flexibility in simultaneous comparison of multiple sequences, taking into account their phylogenetic relationship. Hypotheses concerning variable dN/dS ratios among lineages (Yang 1998; Yang and Nielsen 1998) or among sites (Nielsen and Yang 1998) can be tested using the likelihood ratio test. The ML model can easily be extended to include important features of DNA sequence evolution such as the dependence of nonsynonymous rates on the chemical properties of the amino acids (Yang, Nielsen, and Hasegawa 1998).
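A likelihood ratio test of this kind can be sketched in a few lines. The example below is a minimal sketch, assuming two nested models that differ by one free parameter, so that twice the log-likelihood difference is compared with a chi-square distribution with 1 degree of freedom. The function name and the log-likelihood values are hypothetical and are not results from this paper.

```python
import math


def lrt_df1(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns the statistic 2 * (lnL_alt - lnL_null) and its p-value under a
    chi-square distribution with 1 degree of freedom.
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat < 0.0:
        raise ValueError("the alternative model should fit at least as well as the null")
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical maximized log-likelihoods (illustration only).
stat, p_value = lrt_df1(lnl_null=-1043.80, lnl_alt=-1040.10)
print(f"2*delta lnL = {stat:.2f}, p = {p_value:.4f}")
```

Models that differ by more than one parameter need the chi-square distribution with the corresponding degrees of freedom, which this sketch does not cover.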
| Program Availability and Performance |
|---|
|
|
|---|
A C program implementing the approximate method of this paper will be included in the PAML package, available at http://abacus.gene.ucl.ac.uk/software/paml.html. On a fast Pentium II, each pairwise comparison takes about 10–20 s by ML and a few seconds by the method of this paper. If pathways are weighted equally in counting differences in the new method, iteration will not be needed, and the method will be about as fast as other approximate methods such as NG, which seem to finish instantaneously.
| Acknowledgements |
|---|
|
|
|---|
We thank Hinrich Schulenburg, the two referees Keith Crandall and Spencer Muse, and Associate Editor Caro-Beth Stewart for many constructive comments. We thank X. Xia for the analysis using the method of Li (1993)
| Footnotes |
|---|
Caro-Beth Stewart, Reviewing Editor
1 Keywords: synonymous rate, nonsynonymous rate, approximate methods, maximum likelihood, molecular evolution, adaptive evolution, positive selection.
2 Address for correspondence and reprints: Ziheng Yang, Department of Biology, 4 Stephenson Way, London NW1 2HE, England. E-mail: z.yang{at}ucl.ac.uk
| References |
|---|
|
|
|---|
Akashi, H. 1995. Inferring weak selection from patterns of polymorphism and divergence at "silent" sites in Drosophila DNA. Genetics 139:1067–1076.
Comeron, J. M. 1995. A method for estimating the numbers of synonymous and nonsynonymous substitutions per site. J. Mol. Evol. 41:1152–1159.
Crandall, K. A., and D. M. Hillis. 1997. Rhodopsin evolution in the dark. Nature 387:667–668.
Eyre-Walker, A., and P. D. Keightley. 1999. High genomic deleterious mutation rates in hominoids. Nature 397:344–347.
Gillespie, J. H. 1991. The causes of molecular evolution. Oxford University Press, Oxford, England.
Gojobori, T. 1983. Codon substitution in evolution and the "saturation" of synonymous changes. Genetics 105:1011–1027.
Goldman, N. 1993. Statistical tests of models of DNA substitution. J. Mol. Evol. 36:182–198.
Goldman, N., and Z. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725–736.
Grimmett, G. R., and D. R. Stirzaker. 1992. Probability and random processes. 2nd edition. Clarendon Press, Oxford, England.
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174.
Ina, Y. 1995. New methods for estimating the numbers of synonymous and nonsynonymous substitutions. J. Mol. Evol. 40:190–226.
Ina, Y. 1996. Pattern of synonymous and nonsynonymous substitutions: an indicator of mechanisms of molecular evolution. J. Genet. 75:91–115.
Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21–123 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York.
Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.
Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge, England.
Lee, Y. H., T. Ota, and V. D. Vacquier. 1995. Positive selection is a general phenomenon in the evolution of abalone sperm lysin. Mol. Biol. Evol. 12:231–238.
Lewontin, R. 1989. Inferring the number of evolutionary events from DNA coding sequence differences. Mol. Biol. Evol. 6:15–32.
Li, W.-H. 1993. Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J. Mol. Evol. 36:96–99.
Li, W.-H. 1997. Molecular evolution. Sinauer, Sunderland, Mass.
Li, W.-H., C.-I. Wu, and C.-C. Luo. 1985. A new method for estimating synonymous and non-synonymous rates of nucleotide substitutions considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2:150–174.
Messier, W., and C.-B. Stewart. 1997. Episodic adaptive evolution of primate lysozymes. Nature 385:151–154.
Miyata, T., and T. Yasunaga. 1980. Molecular evolution of mRNA: a method for estimating evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences and its applications. J. Mol. Evol. 16:23–36.
Moriyama, E. N., and J. R. Powell. 1997. Synonymous substitution rates in Drosophila: mitochondrial versus nuclear genes. J. Mol. Evol. 45:378–391.
Muse, S. V. 1996. Estimating synonymous and nonsynonymous substitution rates. Mol. Biol. Evol. 13:105–114.
Muse, S. V., and B. S. Gaut. 1994. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to chloroplast genome. Mol. Biol. Evol. 11:715–724.
Nei, M., and T. Gojobori. 1986. Simple methods for estimating the number of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418–426.
Nielsen, R., and Z. Yang. 1998. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929–936.
Ohta, T. 1995. Synonymous and nonsynonymous substitutions in mammalian genes and the nearly neutral theory. J. Mol. Evol. 40:56–63.
Ota, T., and M. Nei. 1994. Variance and covariances of the numbers of synonymous and nonsynonymous substitutions per site. Mol. Biol. Evol. 11:613–619.
Pamilo, P., and N. O. Bianchi. 1993. Evolution of the Zfx and Zfy genes: rates and interdependence between the genes. Mol. Biol. Evol. 10:271–281.
Pedersen, A.-M. K., C. Wiuf, and F. B. Christiansen. 1998. A codon-based model designed to describe lentiviral evolution. Mol. Biol. Evol. 15:1069–1081.
Perler, F., A. Efstratiadis, P. Lomedica, W. Gilbert, R. Kolodner, and J. Dodgson. 1980. The evolution of genes: the chicken preproinsulin gene. Cell 20:555–566.
Stuart, A., K. Ord, and S. Arnold. 1999. Kendall's advanced theory of statistics. Vol. 2a, 6th edition. Arnold, London.
Tateno, Y., N. Takezaki, and M. Nei. 1994. Relative efficiencies of the maximum-likelihood, neighbor-joining, and maximum-parsimony methods when substitution rate varies with site. Mol. Biol. Evol. 11:261–277.
Yang, Z. 1994a. Estimating the pattern of nucleotide substitution. J. Mol. Evol. 39:105–111.
Yang, Z. 1994b. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39:306–314.
Yang, Z. 1998. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15:568–573.
Yang, Z. 1999. Phylogenetic analysis by maximum likelihood (PAML). Version 2. University College London, England.
Yang, Z., and R. Nielsen. 1998. Synonymous and nonsynonymous rate variation in nuclear genes of mammals. J. Mol. Evol. 46:409–418.
Yang, Z., R. Nielsen, and M. Hasegawa. 1998. Models of amino acid substitution and applications to mitochondrial protein evolution. Mol. Biol. Evol. 15:1600–1611.
||||
![]() |
S. L. Kosakovsky Pond and S. D. W. Frost Not So Different After All: A Comparison of Methods for Detecting Amino Acid Sites Under Selection Mol. Biol. Evol., May 1, 2005; 22(5): 1208 - 1222. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||



























