MBE Advance Access originally published online on October 19, 2005
Molecular Biology and Evolution 2006 23(2):372-379; doi:10.1093/molbev/msj043
Research Article
A New Method for Estimating Nonsynonymous Substitutions and Its Applications to Detecting Positive Selection
Department of Ecology and Evolution, University of Chicago
E-mail: ciwu{at}uchicago.edu.
| Abstract |
|---|
|
|
|---|
The standard methods for computing the number of nonsynonymous substitutions (Ka) lump all amino acid changes into one single class, even though their rates of substitution vary by at least 10-fold (Tang et al., 2004). Classifying these changes by their physicochemical properties has not been suitably effective in isolating the fastest evolving classes of changes. We now propose to use the Universal index U of Tang et al. (2004) to classify the 75 elementary amino acid changes (codons differing by 1 bp) by their evolutionary exchangeability. Let Ki denote the Ka value of each class (i = 1, ..., 75 from the most to the least exchangeable). The cumulative Ki for the top 10 classes, denoted Kh (for high-exchangeability types), has two important properties: (1) Kh usually accounts for 25%–30% of total amino acid changes and (2) when the observed number of amino acid substitutions is large, Kh is predictably twice the value of Ka. This shall be referred to as the twofold approximation. The new method for estimating Kh is applied to the comparisons between human and macaque and between mouse and rat. The twofold approximation holds well in these data sets, and the signature of positive selection can be more easily discerned using the Kh statistic than using Ka. Many genes with Ka/Ks > 0.5 can now be shown to have Kh/Ks > 1 and to have evolved adaptively, at least for the high-exchangeability group of amino acid changes.
Key Words: amino acid substitution; evolutionary index; positive selection
| Introduction |
|---|
|
|
|---|
The estimation of nucleotide changes is fundamental to molecular evolutionary studies (Li, 1997).
There are many ways to decompose nonsynonymous changes into individual classes. For example, there have been attempts at classifying the 20 amino acids into groups according to their charge, polarity, volume, and so on. Amino acid substitutions within groups are considered conservative, whereas those between groups are radical (Zhang, 2000). Several other measures such as Grantham's distance have also been proposed to quantify the differences between amino acids (Grantham, 1974). However, it does not appear that amino acid changes so classified would have comparable evolutionary dynamics (Rand, Weinreich, and Cezairliyan, 2000; Zhang, 2000). Many amino acid changes classified as conservative in fact evolved slowly, whereas others so classified evolved much more rapidly (Yang, Nielsen, and Hasegawa, 1998; Rand, Weinreich, and Cezairliyan, 2000). The lack of consistency arose from the nonempirical nature of amino acid classifications.
An empirical system of classifying amino acid changes by their evolutionary exchangeability has recently been developed by Tang et al. (2004). Among the 20 amino acids, there are 190 possible changes, of which only 75 kinds can be substituted with a 1-bp change in the codons. Each of these 75 kinds of amino acid mutations is referred to as an elementary amino acid change. The remaining 115 kinds are composites of two or three elementary changes. In Tang et al. (2004), we estimated the evolutionary index (EI(i), i = 1, ..., 75) for each of the 75 kinds of changes. EI is the equivalent of the Ka/Ks ratio for each kind. Between closely related species, EIs can be accurately computed when a large number of DNA sequences are available.
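The counts of 190, 75, and 115 can be checked by enumerating the standard genetic code. The following is a minimal sketch, not the authors' code; the names `CODE` and `elementary_changes` are illustrative, and pairs are treated as unordered with stop codons excluded.

```python
from itertools import combinations, product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def elementary_changes(code=CODE):
    """Unordered amino acid pairs connected by a single-base codon change."""
    pairs = set()
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                aa2 = code[mutant]
                # Skip synonymous changes and changes to stop codons.
                if aa2 != "*" and aa2 != aa:
                    pairs.add(frozenset((aa, aa2)))
    return pairs


amino_acids = sorted(set(AMINO) - {"*"})
all_pairs = list(combinations(amino_acids, 2))
elementary = elementary_changes()
print(len(all_pairs), len(elementary), len(all_pairs) - len(elementary))
```

The three printed numbers are the total number of amino acid pairs, the number of elementary changes, and the number of composite changes.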
This method of EI for amino acid changes differs from earlier systems in three important ways. First, EI is codon based, whereas earlier methods such as the PAM matrix (Dayhoff, Schwartz, and Orcutt, 1978) were based on amino acid sequences. Second, EI is computed between closely related species, hence requiring a large number of DNA sequences. Third, EIs among different gene sets from diverse taxa have been shown to be highly correlated. This has led to the proposal of a universal measure of amino acid exchangeability, U. For any large data set, we only need to know the mean Ka/Ks in order to compute the expected EIs, which are linearly correlated with the constant scale, U.
One of the many reasons for separating synonymous and nonsynonymous changes (and for classifying amino acid substitutions) is to detect positive selection. If the Ka/Ks ratio is significantly greater than 1, the sequence evolution is usually interpreted to be driven by positive selection, on the assumption that synonymous changes are the proxy of neutral changes (Li, 1997). Although synonymous changes are generally not neutral (Akashi, 1995; Hellmann et al., 2003; Lu and Wu, 2005; discussed later), the Ka/Ks ratio remains relatively free of assumptions for inferring the action of natural selection.
The Ka/Ks > 1 test is perhaps overly stringent as it requires the acceleration in amino acid substitutions, due to positive selection at some sites, to overcompensate for the retardation at other sites due to negative selection. In this study, we demonstrate how grouping amino acid changes by their U-index and computing the associated Ka/Ks values can substantially augment the power of detecting the signature of positive selection. Moreover, we establish the statistical criteria for isolating the high-exchangeability amino acid pairs from the rest. The average Ka value of this group will be referred to as Kh, which generally accounts for 25%–30% of total amino acid changes. We show that Kh on average is about twice the value of Ka at the genomic level. There are many cases where Kh/Ks > 1, indicating positive selection, whereas the traditional Ka/Ks does not reveal adaptive evolution.
| Materials and Methods |
|---|
|
|
|---|
DNA Sequences
The 2,369 pairs of orthologs between human and the macaque monkey were aligned using the DNASTAR Megalign program. The 1,306 orthologs between mouse and rat were taken from those used in Tang et al. (2004).
Evaluation of
in U Data Sets
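In any U data set, Ki is assumed to equal exactly

$$K_i = c\,U_i,$$

where c is a scaling factor equal to the average Ka of the data set. The cumulative Ki* for the first i classes is

$$K_i^* = \frac{\sum_{j=1}^{i} L_j K_j}{\sum_{j=1}^{i} L_j},$$

where Lj is the estimated length for jth-type amino acid changes in the data set. Var(Ki) can be approximated as

$$\mathrm{Var}(K_i) \approx \frac{p_i (1 - p_i)}{L_i \left(1 - \frac{4}{3} p_i\right)^2},$$

where $p_i = \frac{3}{4}\left(1 - e^{-4K_i/3}\right)$ (Li, 1997). Between closely related species, Ki << 1 for i = 1, 2, ..., 75. Thus, Var(Ki) can be further approximated as

$$\mathrm{Var}(K_i) \approx \frac{K_i}{L_i},$$

which then gives rise to

$$\mathrm{Var}(K_i^*) \approx \frac{\sum_{j=1}^{i} L_j K_j}{\left(\sum_{j=1}^{i} L_j\right)^2} = \frac{K_i^*}{\sum_{j=1}^{i} L_j}.$$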
An optimal i (number of classes) can be found by maximizing the quantity
(where
)
![]() |
The optimal i at which
is maximized does not depend on the exact values of Li and c. In figure 1, we plot
versus i by choosing c = 0.05 (equivalently, the average Ka is equal to 0.05) and
for convenience because the general shapes are not affected by their values. Also, Ki*/Ks versus i is plotted in the same figure by choosing c/Ks = 0.5 (i.e., the average Ka/Ks of the data set is set to be 0.5). In our calculations, we used the genome-wide frequencies of sites for the 75 kinds of elementary nonsynonymous changes for rodents, which are included in Table S1 in the Supplementary Material online. Because mammalian genome codon frequencies are highly correlated and genome-wide ts/tv biases are similar, the exact genomes used for this purpose do not matter much.
| Results |
|---|
|
|
|---|
Computing Nonsynonymous Substitutions of the ith Kind, Ki (and Ki*)
The Ka value of a gene represents the average rate of amino acid substitution for that gene. Nonsynonymous substitutions between some types of amino acid changes may take place much faster than the average rate. Our objective is to isolate the highly exchangeable types of amino acid changes from the rest, based on our previous analysis of two species of yeast and two species of rodents (Tang et al., 2004). The 75 kinds of changes are ranked by exchangeability, with i = 1 the most exchangeable ([...] ↔ Thr in this case) and i = 75 the least exchangeable (Asp ↔ Tyr) kind. Generally, the high-exchangeability pairs are more conservative in their physicochemical properties. For comparison, we also ranked the changes by (1) the increasing order of Grantham's distance and (2) rankings by random permutations.
We show below how the Ka value for the ith class of amino acid changes, referred to as Ki, is calculated. Ki/Ks is the EI of the ith class, EI(i), as defined by Tang et al. (2004). We also define Ki* as the cumulative Ki value of the first i classes. Ki* is thus the weighted average of Kj for j from 1 to i. Note that K75* is equivalent to Ka in the conventional calculation. The estimation we will introduce here is a simple counting method, as a first attempt. More elaborate statistical methods will have to be developed and evaluated at a later time.
Counting the Differences in the Synonymous Class and Each of the 75 Elementary Amino Acid Change Classes
We first obtain the observed differences for the ith-type elementary amino acid change, Ni (i = 1, 2, ..., 75), as well as the total number of synonymous differences Ns. For those codons differing by 2 bp, there are two pathways; for example, TTT → TTC → TCC or TTT → TCT → TCC. We use the genome-wide EI (Tang et al., 2004) to assign the probability for each of the two possible pathways. We disregard the codons differing by 3 bp because they are very rare between closely related species.
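The two pathways between codons that differ at two positions can be enumerated mechanically. The sketch below is illustrative only: the function names are hypothetical, and because the text does not give the exact rule for turning EI values into pathway probabilities, `pathway_probabilities` simply normalizes whatever per-pathway weights the caller supplies (an assumption).

```python
def differing_positions(codon1, codon2):
    """Indices (0-2) at which two codons differ."""
    return [i for i in range(3) if codon1[i] != codon2[i]]


def two_step_pathways(codon1, codon2):
    """Both single-base pathways between codons differing at exactly 2 positions."""
    diff = differing_positions(codon1, codon2)
    if len(diff) != 2:
        raise ValueError("codons must differ at exactly two positions")
    paths = []
    for first in diff:
        # Change one of the two differing positions first, then the other.
        intermediate = codon1[:first] + codon2[first] + codon1[first + 1:]
        paths.append([codon1, intermediate, codon2])
    return paths


def pathway_probabilities(weights):
    """Normalize hypothetical per-pathway weights (e.g., derived from EI values)."""
    total = sum(weights)
    return [w / total for w in weights]


paths = two_step_pathways("TTT", "TCC")
for path in paths:
    print(" -> ".join(path))
```

For TTT and TCC this lists the same two pathways given in the text, one through TCT and one through TTC.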
Counting the Synonymous and Nonsynonymous Sites
For genes with length L, we count the total number of synonymous sites, Ls, and sites for each type of elementary amino acid changes, Li (i = 1, 2, ..., 75). We assume mutations happen randomly on the sequences and obtain the frequency of changes from amino acid j to amino acid k (j = k or j ≠ k); mutations to the stop codons are excluded. Given these frequencies, we can then calculate Ls and Li. The counting method is similar to that of Wyckoff, Wang, and Wu (2000) or Zhang (2000), except that we count the number of sites for each of the 75 elementary changes separately.
In enumerating the changes, the mutation pattern is important. We consider only the difference between transition (ts) and transversion (tv). Dagan, Talmor, and Graur (2002) showed that the ts/tv ratio has a profound effect on the estimates of radical versus conservative amino acid substitutions. Fortunately, Rosenberg, Subramanian, and Kumar (2003) recently showed the ts/tv ratio to be relatively constant in each genome. Therefore, the genome-wide ts/tv ratios (2.4 for primates and 1.7 for rodents) estimated from the fourfold-degenerate sites were used in the estimation.
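One way to picture the site counting is to distribute each codon position over its three possible single-base mutations, giving transitions more weight than transversions. The sketch below is a simplified illustration under stated assumptions, not the authors' procedure: `kappa` is a hypothetical parameter giving the weight of a transition relative to each transversion (how it maps onto the reported ts/tv ratios is left open), and the weight of mutations to stop codons is simply dropped.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
TRANSITIONS = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}


def count_sites(codons, kappa=1.0):
    """Return (synonymous sites, {amino acid pair: sites}) for a list of sense codons."""
    syn_sites = 0.0
    pair_sites = defaultdict(float)
    for codon in codons:
        aa = CODE[codon]
        for pos in range(3):
            others = [b for b in BASES if b != codon[pos]]
            weights = [kappa if (codon[pos], b) in TRANSITIONS else 1.0
                       for b in others]
            total = sum(weights)
            for base, weight in zip(others, weights):
                aa2 = CODE[codon[:pos] + base + codon[pos + 1:]]
                if aa2 == "*":
                    continue  # mutations to stop codons are excluded
                if aa2 == aa:
                    syn_sites += weight / total
                else:
                    pair_sites[frozenset((aa, aa2))] += weight / total
    return syn_sites, dict(pair_sites)


syn, pairs = count_sites(["TTT"], kappa=1.0)
print(round(syn, 3), round(pairs[frozenset("FL")], 3))
```

Summing the per-pair site counts over a whole gene would give the Li values for each of the elementary changes present, and `syn_sites` would play the role of Ls.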
Estimation of Ks, Ki, and Ki*
Given the observed changes (Ns and Ni, i = 1, ..., 75) and the estimated number of sites (Ls and Li), we can estimate Ks and Ki as Ns/Ls and Ni/Li, respectively. To account for multiple hits, we use the method of Jukes and Cantor (1969). Note that the method developed in this paper is intended for closely related species with less than 20% sequence divergence; thus, the following formulas were used:

$$K_s = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\,\frac{N_s}{L_s}\right),$$

$$K_i = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\,\frac{N_i}{L_i}\right).$$

Ki*, the cumulative rate for the first i kinds of amino acid changes, is

$$K_i^* = \frac{\sum_{j=1}^{i} L_j K_j}{\sum_{j=1}^{i} L_j}.$$
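The estimation step can be sketched in a few lines, assuming the standard Jukes-Cantor correction and the site-weighted average described in the text. The function names and the toy counts below are hypothetical and serve only to show the arithmetic.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion of differences p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def class_rates(n_obs, n_sites):
    """K_i for each class from observed differences N_i and sites L_i."""
    return [jukes_cantor(n / l) for n, l in zip(n_obs, n_sites)]


def cumulative_rate(rates, n_sites, i):
    """Site-weighted mean of the first i class rates (K_i* in the text)."""
    numerator = sum(l * k for k, l in zip(rates[:i], n_sites[:i]))
    return numerator / sum(n_sites[:i])


# Toy data for three classes, ordered from most to least exchangeable.
n_obs = [10, 5, 2]
n_sites = [100.0, 100.0, 200.0]
rates = class_rates(n_obs, n_sites)
cumulative = [cumulative_rate(rates, n_sites, i) for i in (1, 2, 3)]
print([round(k, 4) for k in rates])
print([round(k, 4) for k in cumulative])
```

With classes ordered by decreasing exchangeability, the cumulative rate falls as more classes are included, and the value for all classes corresponds to the conventional Ka.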
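Variance of Ks, Ki, and Ki*
Between species with a sequence divergence of 20% or less, we suggest using the variance formulas of the method of Jukes and Cantor (1969).

$$\mathrm{Var}(K_s) = \frac{p_s (1 - p_s)}{L_s \left(1 - \frac{4}{3} p_s\right)^2},$$

$$\mathrm{Var}(K_i) = \frac{p_i (1 - p_i)}{L_i \left(1 - \frac{4}{3} p_i\right)^2},$$

$$\mathrm{Var}(K_i^*) = \frac{\sum_{j=1}^{i} L_j^2\, \mathrm{Var}(K_j)}{\left(\sum_{j=1}^{i} L_j\right)^2},$$

where $p_s = N_s/L_s$ and $p_i = N_i/L_i$.
For larger data sets, we recommend bootstrapping as an empirical means for estimating the variance of Ki, using codons as the resampling units.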
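A codon-level bootstrap for a single class might look like the following sketch. It assumes, for illustration only, that each codon has already been reduced to a pair (differences of class i at that codon, sites of class i at that codon); this record format, the function names, and the toy data are hypothetical.

```python
import math
import random


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion of differences p."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def bootstrap_variance(codon_records, n_boot=1000, seed=0):
    """Bootstrap variance of K_i, resampling codons with replacement."""
    rng = random.Random(seed)
    n = len(codon_records)
    estimates = []
    for _ in range(n_boot):
        sample = [codon_records[rng.randrange(n)] for _ in range(n)]
        diffs = sum(d for d, _ in sample)
        sites = sum(s for _, s in sample)
        # Skip replicates where the correction is undefined.
        if sites == 0 or diffs / sites >= 0.75:
            continue
        estimates.append(jukes_cantor(diffs / sites))
    mean = sum(estimates) / len(estimates)
    return sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)


# Toy gene: 100 codons, each contributing 0.5 sites of the class; 5 carry a difference.
records = [(1, 0.5)] * 5 + [(0, 0.5)] * 95
variance = bootstrap_variance(records, n_boot=200, seed=1)
print(variance > 0.0)
```

The same resampled codon sets could be reused to recompute Ks and Ki* so that all statistics share one set of bootstrap replicates.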
Application of the Ki Method to the Generalized U Data Sets
One may have expected Ki to vary from one data set to another (say, yeast vs. mammals), and the application of this estimation method to different data sets would yield different patterns. Fortunately, there is a strong correlation among Ki's. The salient finding of Tang et al. (2004) is that EI(i)'s (= Ki/Ks) for different sets of genes from different taxa are highly correlated. Hence, a Universal index, U, was proposed (Table S1, Supplementary Material online) such that EIs from any data set would be highly correlated with U. For any data set with more than 20,000 amino acid changes, EI(i) can be approximated by

$$\mathrm{EI}(i) \approx U_i \times \frac{\bar{K}_a}{\bar{K}_s},$$

where $\bar{K}_a$ and $\bar{K}_s$ are, respectively, the (weighted) mean nonsynonymous and mean synonymous substitution rates of the entire data set. Ui's are given in Table S1 (Supplementary Material online). For such data sets, the correlation coefficient (r) between the observed EI(i) and the expected value is usually greater than 0.95. However, even smaller data sets with only 2,500 amino acid changes would still yield an r value of >0.85 (Tang et al., 2004).
We shall refer to a collection of generalized data sets as U sets, for which
![]() | (1) |
may differ from one data set to another. In any U set (Tang et al., 2004
In figure 1, we show the decrease in
as i increases; for example,
(>1.1) is more than 2.2 times the value of
(=0.5 in fig. 1). The result suggests, not surprisingly, that the high-exchangeability classes of amino acid changes should be more revealing of positive selection than the entire set of changes. The question "what demarcates high- and low-exchangeability classes?" is addressed in the next section.
Defining Nonsynonymous Substitutions for the High-Exchangeability Class (Kh)
For small i's, Ki* in a generalized data set is much higher than the standard Ka, but the standard deviation (SD) associated with the estimate of Ki* is also relatively large. If we include more classes of amino acid changes (say, i > 30), the estimation error would be smaller but Ki* would also decrease. There is a trade-off in the demarcation of high- and low-exchangeability classes. An optimal number of classes in this trade-off is about 10–12, as shown below.
In figure 1, we plot
against i where
is the SD of
Note that
increases and reaches a plateau between i = 10 and i = 25. (The general shape and position of the plateau in figure 1 are not affected much by the total length of genes because we are comparing the relative values among different
) One may wish to maximize both
and
To do so, we recommend the i value to be between 10 and 12, corresponding to where the curve in figure 1 reaches the plateau.
In general, the top 10–12 classes account for 25%–30% of the total number of amino acid differences observed. K10*/Ks or K12*/Ks is slightly above or below twice the conventional Ka/Ks. We shall refer to this pattern as the "twofold approximation," which will be tested further by using published genomic data. K10* will be designated Kh (h for high exchangeability) from now on.
Estimation of High-Exchangeability Substitutions, Kh, in Primates and Rodents
We shall now apply the Ki (or Kh) method to the genomic sequences of primates (human vs. macaque monkey) and rodents (mouse vs. rat).
The Distribution of Ka/Ks in Primates and Rodents
In figures 2a and 2b, we show the distributions of the Ka/Ks ratios for primates and rodents. The weighted means are 0.176 and 0.148 for the primate and rodent comparisons, respectively. Only 21 out of 2,369 genes in the primate data show a Ka/Ks ratio greater than 1 (including 7 genes with Ks = 0). In rodents, 7 out of 1,306 genes have Ka/Ks > 1. In none of these comparisons is Ka > Ks significant. In other words, the conventional Ka/Ks analysis reveals little signature of positive selection in these data sets.
Ka versus Kh Among the Fastest Evolving Genes in Primates and Rodents
In each data set, we first analyzed the top 100 genes with the highest Ka values as shown in figure 3. The fastest evolving 100 genes in primates and rodents have a mean Ka/Ks ratio of 0.72 and 0.60, respectively. In figure 3, the cumulative Ki*/Ks values for the concatenated sequences are plotted against i (thick black line). The curve in figure 3 decreases monotonically as i increases.
Among the conservative changes in primates (i < 20) (fig. 3a), the cumulative Ki*/Ks ratios are greater than 1, hinting at the action of positive selection. (Note that the ranking by i was independent of the data of figure 3a; it was determined in Tang et al. [2004].) As i increases, the Ki*/Ks value approaches 0.72, which is the mean Ka/Ks. The inclusion of more radical classes of amino acid changes, on which negative selection operates strongly, masks the signature of positive selection. We use Kh to designate K10*, as noted; Kh/Ks is 1.494, about twice the Ka/Ks ratio of 0.72.
To show the significance of the difference between Kh/Ks and Ka/Ks, we randomly shuffled the i ranking 1,000 times and recomputed the cumulative Ki*/Ks values under each shuffled ranking. For each i value, the highest 5% in Ki*/Ks as well as the means are plotted in figure 3a. The dashed line also decreases monotonically because the SD of Ki*/Ks decreases when i becomes larger with more samples. At each i, the observed Ki*/Ks is always bigger than the highest 5% value in the randomized ranking scenarios. The observed excess is thus statistically significant for every i rank.
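The shuffling test described above can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: the per-class rates, site counts, and Ks are made-up toy values, the function names are hypothetical, and the "highest 5%" threshold is taken as a simple empirical 95th percentile.

```python
import random


def cumulative_ratios(rates, sites, ks):
    """Cumulative site-weighted rate of the first i classes, divided by Ks."""
    ratios, numerator, denominator = [], 0.0, 0.0
    for k, l in zip(rates, sites):
        numerator += k * l
        denominator += l
        ratios.append(numerator / denominator / ks)
    return ratios


def shuffled_upper_bound(rates, sites, ks, n_perm=1000, q=0.95, seed=0):
    """Per-rank q-quantile of the cumulative ratios under random class rankings."""
    rng = random.Random(seed)
    order = list(range(len(rates)))
    curves = []
    for _ in range(n_perm):
        rng.shuffle(order)
        curves.append(cumulative_ratios([rates[j] for j in order],
                                        [sites[j] for j in order], ks))
    bounds = []
    for i in range(len(rates)):
        column = sorted(curve[i] for curve in curves)
        bounds.append(column[min(len(column) - 1, int(q * len(column)))])
    return bounds


# Toy data: 75 classes with steadily decreasing rates and equal site counts.
toy_rates = [0.2 - 0.002 * j for j in range(75)]
toy_sites = [1.0] * 75
toy_ks = 0.1
observed = cumulative_ratios(toy_rates, toy_sites, toy_ks)
bounds = shuffled_upper_bound(toy_rates, toy_sites, toy_ks, n_perm=200)
print(observed[9] > bounds[9])
```

At the last rank every ordering pools all classes, so the observed and shuffled curves coincide there; the informative comparison is at small i.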
We also tested other ranking methods in the literature. The long-dashed line presents Ki*/Ks calculations based on the ranking by Grantham's distance (Grantham, 1974). The line moves up and down but never goes above 1. This irregular pattern confirms what has been reported: amino acid changes determined to be conservative by Grantham's distance are often pairs of low exchangeability (Yang, Nielsen, and Hasegawa, 1998). Grantham's distance thus does not provide a suitable ranking of the substitution rate between amino acids. We obtained the same pattern for the top 100 genes in rodents (fig. 3b).
Ka versus Kh Among All Genes in Primates and Rodents
Although the Ki method is most useful when a large number of substitutions can be analyzed, Kh can be applied to individual genes as well. For each individual gene, we calculated the Kh, Ka, and Ks values. In Table 1, we present 20 such examples, 10 from each data set. These are examples of longer genes and hence smaller stochastic fluctuations. In general, Kh is indeed larger than Ka for longer genes, but the variance for each individual gene is substantial. The new method is more informative when many genes are analyzed simultaneously, as shown below.
The scatter plot of Kh/Ks versus Ka/Ks for the 1,948 genes between human and macaque (excluding those with Ka = 0 or Ks = 0) is shown in figure 4a. Among those genes, only 14 have a Ka/Ks ratio greater than 1. In contrast, Kh/Ks is larger than 1 in 174 genes. Most data points in figure 4a are well above the 45° line; data points below the 45° line are generally genes with small Ka. We also analyzed 1,241 orthologs between rat and mouse, as shown in figure 4b. In this comparison, 7 genes show Ka/Ks greater than 1 but for Kh/Ks 92 genes do. In both panels, the slopes of the regression lines are above 2, indicating that Kh/Ks is at least twice the value of Ka/Ks on average.
It should be noted that Kh/Ks may often be larger than 1 due to the larger standard error (SE) of Kh, in addition to its larger mean. To remove, or at least reduce, the contribution of the larger SE, we examine the correlation between Kh − SE(Kh) and Ka − SE(Ka). Both data sets show that they are highly positively correlated, with a correlation coefficient larger than 0.75, and on average Kh − SE(Kh) is approximately 1.7 times the value of Ka − SE(Ka). Moreover, for those genes with Kh/Ks > 1, Kh − 2SE(Kh) is on average 1.29 times higher than Ka − 2SE(Ka) for rodents and 1.13 times higher for primates. These results suggest that the larger Kh/Ks ratios in most cases are indeed due to the larger mean in Kh.
The "Twofold Approximation" of Kh versus Ka
In the generalized data sets, we show Kh ≈ 2Ka. To test the idea that genes of sufficient size will reliably yield the twofold relationship, Kh ≈ 2Ka, we concatenated genes with similar Ka values. Of the 2,369 orthologs between human and macaque, we sorted the 1,955 genes with Ka > 0 by the descending order of Ka and concatenated every 50 orthologs into a supersequence. A total of 39 supersequences were obtained. In figure 5a, we plot Kh/Ks against Ka/Ks for these supersequences. The correlation coefficient r is 0.993, and the slope of the regression line is 2.005. Therefore, the Kh/Ks ratio is very close to twice the value of Ka/Ks, as shown in figure 1 for the generalized data set. We applied the same procedure to 1,200 orthologs between mouse and rat, creating 24 supersequences (each again being a 50-gene contig). In figure 5b, the correlation coefficient between Kh/Ks and Ka/Ks is 0.996, and the slope is 1.943. Again, the twofold approximation holds quite well.
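The grouping and regression steps can be sketched as below. The gene records, field names, and function names are hypothetical, and the handling of leftover genes is an assumption: the text reports 39 supersequences from 1,955 genes without saying what happened to the remaining 5, so this sketch simply drops an incomplete final group.

```python
def make_supersequences(genes, size=50):
    """Sort genes by descending Ka and group every `size` genes; drop any remainder."""
    ordered = sorted(genes, key=lambda g: g["ka"], reverse=True)
    n_full = len(ordered) // size
    return [ordered[k * size:(k + 1) * size] for k in range(n_full)]


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Toy gene list with made-up Ka values, one record per gene.
toy_genes = [{"name": f"gene{k}", "ka": 0.0001 * k} for k in range(1, 1956)]
groups = make_supersequences(toy_genes)
print(len(groups), len(groups[0]))

# Toy regression: points lying exactly on y = 2x give a slope of 2.
print(ols_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

In an actual analysis, Kh, Ka, and Ks would be recomputed on each concatenated supersequence before regressing Kh/Ks on Ka/Ks.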
In summary, Kh in any sequence comparison would fluctuate but generally hovers around twice the corresponding Ka value. For longer sequences (or collections of sequences), the Kh ≈ 2Ka approximation holds well.
| Discussion |
|---|
|
|
|---|
The study of coding sequence evolution generally relies on the distinction between synonymous and nonsynonymous changes. The latter is a heterogeneous class driven by forces that both accelerate and retard the rate of molecular evolution. The signals of positive and negative selection can be better resolved when nonsynonymous changes are properly classified. The power of the method lies in the empirical means of finding the right set of conservative amino acid changes. The relative ranking of most to least exchangeable replacements applies equally well to yeasts, Drosophila, plants, and mammals (Tang et al., 2004).
The method proposed here requires genomic sequences from only two closely related species; hence, the influx of genomic data should make the method widely applicable. The downside of calculating Kh is the larger SE associated with only a subset of changes. (Although it is defined by 10 of the 75 types of elementary changes, Kh usually accounts for 25%–30% of the amino acid substitutions.) In this study, a simple counting method is introduced. While it may be sufficient to accurately estimate the divergence between closely related species, the potential for more sophisticated methods, such as maximum likelihood (ML) estimation under the Bayesian framework, should not be overlooked.
By examining the top 10 classes (or about 25%) of amino acid changes, it becomes much easier to find cases where nonsynonymous changes outpace synonymous ones both at the genomic and genic levels. Because Ka > Ks is taken to indicate the action of positive selection in most methods (discussed below), Kh > Ks should be interpreted in the same manner. Although Ks has been known to be lower than the neutral substitution rate, by as much as 20% in some taxa (Akashi, 1995; Hellmann et al., 2003; Lu and Wu, 2005), Ka (or Kh) > Ks remains a reasonable criterion for inferring positive selection. Without invoking positive selection on amino acid changes, the alternative interpretation would have to be that amino acid changes are subjected to weaker selective constraints than synonymous changes are. It seems more reasonable to assume that, on average, selective constraints on nonsynonymous changes are at least as strong as those on synonymous changes.
In figure 4 on mammalian genes, Kh fluctuates greatly among individual genes, but the regression corroborates the relationship of Kh ≈ 2Ka. A source of the fluctuation is codon composition. Different codon compositions yield different estimations due to the weight for each type of change. There are many other sources of variation, such as neighboring effect, gene structure, and selection constraints for different types of amino acid changes in different genes. Nevertheless, if each gene is sufficiently large, the codon composition will be close to the genome average and the relationship Kh ≈ 2Ka will be approached.
There are several other approaches to detecting positive selection. One of the most widely used methods is the site-by-site analysis of a set of DNA sequences (Nielsen and Yang, 1998; Suzuki and Gojobori, 1999). In this approach, the proportion of sites in the sequences under positive selection is estimated, often in the ML framework (Yang, 1998; Yang and Nielsen, 2002). Although the method has been applied to genomic sequences from as few as three species (Clark et al., 2003), it is probably more suited to cases where a much larger number of taxa can be used. Some recent studies have also raised the issue of a possible high rate of false positives in the more elaborate ML models (Zhang, 2004). More recently, Massingham and Goldman (2005) demonstrated an improved ML method, that is, sitewise likelihood ratio (SLR), for detecting nonneutral evolution. They showed that the SLR method can be more powerful, especially in difficult cases where the strength of selection is low. Most other approaches require additional functional (Hughes and Nei, 1988; Wyckoff, Wang, and Wu, 2000), chromosomal (Betancourt, Presgraves, and Swanson, 2002; Lu and Wu, 2005), or polymorphism data (McDonald and Kreitman, 1991; Fay, Wyckoff, and Wu, 2002) for inferring positive selection. A recent method that needs less information than our proposed method is the so-called "volatility measure" (Plotkin, Dushoff, and Fraser, 2004), which uses only sequences from one single genome. The utility of this method in detecting selection, however, remains to be demonstrated (Chen, Emerson, and Martin, 2005; Dagan and Graur, 2005; Hahn et al., 2005).
In conclusion, in addition to the conventional Ka and Ks estimates, the calculation of Kh may often yield additional information. This is especially true when one attempts to compare two genomic sequences. Although the statistical variation associated with Kh is larger than the conventional Ka, genes with high Kh/Ks ratios may reveal more evolutionary signature and can then be subjected to additional analysis (McDonald and Kreitman, 1991; Yang, 1998; Fay and Wu, 2000) or experimentation (Greenberg et al., 2003; Sun, Ting, and Wu, 2004).
| Supplementary Material |
|---|
|
|
|---|
Supplementary Table S1 is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
|
|
|---|
The authors wish to thank Hurng-Yi Wang for providing the alignments of primate orthologs used in this study. The authors also thank Alex Kondrashov, Wen-Hsiung Li, Martin Kreitman, and Jian Lu for comments and discussions. The work is supported by National Institutes of Health grants to C.-I.W.
| Footnotes |
|---|
Jianzhi Zhang, Associate Editor
| References |
|---|
|
|
|---|
Akashi, H. 1995. Inferring weak selection from patterns of polymorphism and divergence at "silent" sites in Drosophila DNA. Genetics 139:1067–1076.
Betancourt, A. J., D. C. Presgraves, and W. J. Swanson. 2002. A test for faster X evolution in Drosophila. Mol. Biol. Evol. 19:1816–1819.
Chen, Y., J. J. Emerson, and T. M. Martin. 2005. Evolutionary genomics: codon volatility does not detect selection. Nature 433:E6–E7 [discussion E7–E8].
Clark, A. G., S. Glanowski, R. Nielsen et al. (17 co-authors). 2003. Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 302:1960–1963.
Dagan, T., and D. Graur. 2005. The comparative method rules! Codon volatility cannot detect positive Darwinian selection using a single genome sequence. Mol. Biol. Evol. 22:496–500.
Dagan, T., Y. Talmor, and D. Graur. 2002. Ratios of radical to conservative amino acid replacement are affected by mutational and compositional factors and may not be indicative of positive Darwinian selection. Mol. Biol. Evol. 19:1022–1025.
Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1978. A model of evolutionary change in proteins. National Biomedical Research Foundation, Washington.
Fay, J. C., and C. I. Wu. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413.
Fay, J. C., G. J. Wyckoff, and C. I. Wu. 2002. Testing the neutral theory of molecular evolution with genomic data from Drosophila. Nature 415:1024–1026.
Grantham, R. 1974. Amino acid difference formula to help explain protein evolution. Science 185:862–864.
Greenberg, A. J., J. R. Moran, J. A. Coyne, and C. I. Wu. 2003. Ecological adaptation during incipient speciation revealed by precise gene replacement. Science 302:1754–1757.
Hahn, M. W., J. G. Mezey, D. J. Begun, J. H. Gillespie, A. D. Kern, C. H. Langley, and L. C. Moyle. 2005. Evolutionary genomics: codon bias and selection on single genomes. Nature 433:E5–E6 [discussion E7–E8].
Hellmann, I., S. Zollner, W. Enard, I. Ebersberger, B. Nickel, and S. Paabo. 2003. Selection on human genes as revealed by comparisons to chimpanzee cDNA. Genome Res. 13:831–837.
Hughes, A. L., and M. Nei. 1988. Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335:167–170.
Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. P. 21 in H. N. Munroe, ed. Mammalian protein metabolism. Academic Press, New York.
Li, W.-H. 1997. Molecular evolution. Sinauer Associates, Inc., Sunderland, Mass.
Li, W. H., C. I. Wu, and C. C. Luo. 1985. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2:150–174.
Lu, J., and C. I. Wu. 2005. Weak selection revealed by the whole-genome comparison of the X chromosome and autosomes of human and chimpanzee. Proc. Natl. Acad. Sci. USA 102:4063–4067.
Massingham, T., and N. Goldman. 2005. Detecting amino acid sites under positive selection and purifying selection. Genetics 169:1753–1762.
McDonald, J. H., and M. Kreitman. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.
Nei, M., and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418–426.
Nielsen, R., and Z. Yang. 1998. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929–936.
Plotkin, J. B., J. Dushoff, and H. B. Fraser. 2004. Detecting selection using a single genome sequence of M. tuberculosis and P. falciparum. Nature 428:942–945.
Rand, D. M., D. M. Weinreich, and B. O. Cezairliyan. 2000. Neutrality tests of conservative-radical amino acid changes in nuclear- and mitochondrially-encoded proteins. Gene 261:115–125.
Rosenberg, M. S., S. Subramanian, and S. Kumar. 2003. Patterns of transitional mutation biases within and among mammalian genomes. Mol. Biol. Evol. 20:988–993.
Sun, S., C. T. Ting, and C. I. Wu. 2004. The normal function of a speciation gene, Odysseus, and its hybrid sterility effect. Science 305:81–83.
Suzuki, Y., and T. Gojobori. 1999. A method for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 16:1315–1328.
Tang, H., G. J. Wyckoff, J. Lu, and C. I. Wu. 2004. A universal evolutionary index for amino acid changes. Mol. Biol. Evol. 21:1548–1556.
Wyckoff, G. J., W. Wang, and C. I. Wu. 2000. Rapid evolution of male reproductive genes in the descent of man. Nature 403:304–309.
Yang, Z. 1998. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15:568–573.
Yang, Z., and J. P. Bielawski. 2000. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15:496–503.
Yang, Z., and R. Nielsen. 2002. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19:908–917.
Yang, Z., R. Nielsen, and M. Hasegawa. 1998. Models of amino acid substitution and applications to mitochondrial protein evolution. Mol. Biol. Evol. 15:1600–1611.
Zhang, J. 2000. Rates of conservative and radical nonsynonymous nucleotide substitutions in mammalian nuclear genes. J. Mol. Evol. 50:56–68.
Zhang, J. 2004. Frequent false detection of positive selection by the likelihood method with branch-site models. Mol. Biol. Evol. 21:1332–1339.


10 are separately analyzed. Note that
is slightly greater than 1 or twice the value of the average Ka/Ks.












