Molecular Biology and Evolution 19:49-57 (2002)
© 2002 Society for Molecular Biology and Evolution
Codon-Substitution Models to Detect Adaptive Evolution that Account for Heterogeneous Selective Pressures Among Site Classes
Galton Laboratory, Department of Biology, University College London;
Department of Molecular Biology and Genetics, Cornell University;
Department of Biology, University of California, Riverside
| Abstract |
|---|
The nonsynonymous to synonymous substitution rate ratio (ω = dN/dS) provides a sensitive measure of selective pressure at the protein level, with ω values <1, =1, and >1 indicating purifying selection, neutral evolution, and diversifying selection, respectively. Maximum likelihood models of codon substitution developed recently account for variable selective pressures among amino acid sites by employing a statistical distribution for the ω ratio among sites. Those models, called random-sites models, are suitable when we do not know a priori which sites are under what kind of selective pressure. Sometimes prior information (such as the tertiary structure of the protein) might be available to partition sites in the protein into different classes, which are expected to be under different selective pressures. It is then sensible to use such information in the model. In this paper, we implement maximum likelihood models for prepartitioned data sets, which account for the heterogeneity among site partitions by using different ω parameters for the partitions. The models, referred to as fixed-sites models, are also useful for combined analysis of multiple genes from the same set of species. We apply the models to data sets of the major histocompatibility complex (MHC) class I alleles from human populations and of the abalone sperm lysin genes. Structural information is used to partition sites in MHC into two classes: those in the antigen recognition site (ARS) and those outside. Positive selection is detected in the ARS by the fixed-sites models. Similarly, sites in lysin are classified into the buried and solvent-exposed classes according to the tertiary structure, and positive selection is detected at the solvent-exposed sites. The random-sites models identified a number of sites under positive selection in each data set, confirming and elaborating the results of the fixed-sites models. The analysis demonstrates the utility of the fixed-sites models, as well as the power of previous random-sites models, which do not use the prior information to partition sites.
| Introduction |
|---|
The nonsynonymous/synonymous substitution rate ratio (ω = dN/dS) provides a measure of selective pressure at the amino acid level. An ω ratio greater than 1 means that nonsynonymous mutations offer fitness advantages and are fixed in the population at a higher rate than synonymous mutations. Positive selection can thus be detected by identifying cases where ω significantly exceeds 1. Previous studies have most often calculated synonymous (dS) and nonsynonymous (dN) rates by averaging over all codons (amino acids). As many amino acids in a functional protein may be under strong structural and functional constraints, the average dN is rarely higher than the average dS. As a result, this approach of averaging rates over the entire sequence has little power in detecting positive selection (e.g., Endo, Ikeo, and Gojobori 1996).
Recently Nielsen and Yang (1998) and Yang et al. (2000) extended the model of codon substitution of Goldman and Yang (1994) (see also Muse and Gaut 1994) to account for variable selective pressures among sites in the sequence. A statistical distribution is assumed for the ω ratios among sites. For example, the discrete model (M3) assumes three site classes, which have different ω ratios. The proportions and ω ratios for the site classes are estimated from the data by maximum likelihood. In such a model, we assume that there are several heterogeneous site classes but we do not know a priori which class each site is from. We refer to such models as random-sites models. Application of those models to real data sets has led to detection of positive selection in a number of genes, demonstrating the importance of accounting for variable selective pressures among sites (Zanotto et al. 1999; Bishop, Dean, and Mitchell-Olds 2000; Bielawski and Yang 2001; Fares et al. 2001; Ford 2001; Haydon et al. 2001; Peek et al. 2001; Swanson et al. 2001; see Yang and Bielawski 2000 for a review). Consistent with real data analysis, computer simulations also confirmed the power of those methods (Anisimova, Bielawski, and Yang 2001).
Sometimes prior information is available to partition sites into classes, which are expected to have different selective pressures and thus different ω ratios. In such cases, it is sensible to make use of such information and fit models that assign different ω ratios to the site classes. For example, Hughes and Nei (1988) tested the hypothesis that amino acid residues at the antigen-recognition site (ARS) of the major histocompatibility complex (MHC) identified by Bjorkman et al. (1987a, 1987b) might be under diversifying selection. In this case, residues in the MHC can be partitioned into two classes: those in the ARS region and those outside, and two independent ω ratios can be used. Another possible use of such models is the combined analysis of multiple protein-coding genes from the same set of species to test for their similarities and differences in the substitution pattern. The models then have similarities to the relative-ratio test developed by Muse and Gaut (1997).
In this paper, we implement models that account for the heterogeneity of different site partitions, and refer to them as the fixed-sites models. We apply the new models to two well-documented genes, the MHC class I gene (Hughes and Nei 1988, 1989; Hughes, Ota, and Nei 1990) and the abalone sperm lysin gene (Lee, Ota, and Vacquier 1995; Yang, Swanson, and Vacquier 2000).
| Theory |
|---|
As outlined by Yang (2001)
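ω ratio. The basic model of codon substitution specifies the relative substitution rate from codon i to codon j as

$$
q_{ij} = \begin{cases}
0, & \text{if } i \text{ and } j \text{ differ at two or three codon positions},\\
\pi_j, & \text{for a synonymous transversion},\\
\kappa \pi_j, & \text{for a synonymous transition},\\
\omega \pi_j, & \text{for a nonsynonymous transversion},\\
\omega \kappa \pi_j, & \text{for a nonsynonymous transition},
\end{cases}
$$

where κ is the transition/transversion rate ratio and πj is the equilibrium frequency of codon j, calculated using the empirical nucleotide frequencies observed at the three codon positions, with nine parameters used (Goldman and Yang 1994).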
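The rate rule of the basic codon model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the standard genetic code is assumed, the function and variable names (`rate`, `is_transition`, `SENSE`) are hypothetical and are not taken from PAML, and the codon frequency `pi_j` is supplied directly rather than computed from nucleotide frequencies at the three codon positions.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (assumed here); '*' marks stop codons.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODE = dict(zip(CODONS, AA))
SENSE = [c for c in CODONS if CODE[c] != "*"]  # the 61 sense codons

PURINES = {"A", "G"}


def is_transition(x, y):
    """True if two different nucleotides are both purines or both pyrimidines."""
    return (x in PURINES) == (y in PURINES)


def rate(ci, cj, kappa, omega, pi_j):
    """Relative off-diagonal rate q_ij from sense codon ci to sense codon cj.

    Returns 0 unless the codons differ at exactly one position; the diagonal
    (ci == cj) is not handled here and would be set so that rows sum to zero.
    """
    diffs = [(a, b) for a, b in zip(ci, cj) if a != b]
    if len(diffs) != 1:
        return 0.0
    a, b = diffs[0]
    q = pi_j
    if is_transition(a, b):
        q *= kappa          # transitions are scaled by kappa
    if CODE[ci] != CODE[cj]:
        q *= omega          # nonsynonymous changes are scaled by omega
    return q
```

For example, with κ = 2, ω = 0.5, and πj = 0.1, the change TTT to TTC (a synonymous transition) has rate κπj = 0.2, whereas TTT to TTA (a nonsynonymous transversion) has rate ωπj = 0.05.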
When we apply the model to data of partitioned sites, we use different ω ratios, and thus different Q matrices, for sites from different partitions. Similarly we can allow other parameters to differ between site partitions. These models are structurally similar to the models of nucleotide substitution of Yang (1996), which account for different transition/transversion rate ratios, different base frequencies, and different levels of among-site rate variation among prior partitions of sites, for example, the three codon positions. Here we also implement several models to accommodate different levels of site heterogeneity (table 1). The simplest model assumes that all sites in the sequence have the same substitution pattern with identical parameters (model A in table 1). Parameters in the model include the b branch lengths, the transition/transversion rate ratio κ, the nonsynonymous/synonymous rate ratio ω, and the nine parameters for the codon frequencies, with b + 11 parameters in total. The most complex model (model F in table 1) assumes that all site partitions have different substitution patterns with independent substitution parameters. This model is equivalent to analyzing data of different partitions as separate data sets and summing up the log-likelihood values. For g partitions, the model has g × (b + 11) parameters. Models B-E lie in between these two extremes, and assume proportional branch lengths among partitions. Branch lengths for partition k are rk times those for the first partition (r1 = 1). Thus b + (g - 1), instead of b × g, parameters are used to specify all branch lengths for the site partitions. Apart from the different substitution rates, model B (table 1) assumes homogeneity among partitions in the transition/transversion rate ratio κ, the nonsynonymous/synonymous rate ratio ω, and the codon frequencies. Model C assumes proportional branch lengths, identical κ and ω, but different codon frequencies among partitions. Model D assumes proportional branch lengths, different κ and ω, but identical codon frequencies among partitions. Model E assumes proportional branch lengths, different κ and ω, and different codon frequencies among partitions. These models are implemented in the PAML program package (Yang 1997); see table 1 for details.
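The parameter counts implied by these descriptions can be tabulated with a short sketch. The counts for models A and F are stated in the text; those for models B to E are inferred here from the verbal descriptions (b branch lengths, g - 1 rate ratios, one κ and one ω per set, nine codon-frequency parameters per set) rather than copied from table 1, and the function name `n_params` is hypothetical.

```python
def n_params(model, b, g):
    """Number of free parameters in fixed-sites models A-F.

    b: number of branch lengths in the tree; g: number of site partitions.
    Counts for B-E are inferred from the text's description of each model.
    """
    rates = g - 1  # r_2, ..., r_g (r_1 = 1 is fixed)
    counts = {
        "A": b + 2 + 9,                  # one kappa, one omega, one set of frequencies
        "B": b + rates + 2 + 9,          # adds proportional rates only
        "C": b + rates + 2 + 9 * g,      # partition-specific codon frequencies
        "D": b + rates + 2 * g + 9,      # partition-specific kappa and omega
        "E": b + rates + 2 * g + 9 * g,  # both of the above
        "F": g * (b + 11),               # fully separate analyses
    }
    return counts[model]
```

With b = 381 and g = 2 (the MHC data), models E and F differ by 380 parameters, and with b = 47 and g = 2 (the lysin data) by 46, which matches the degrees of freedom quoted for the comparisons of models E and F in the analyses of those data sets.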
The likelihood ratio test can be used to compare those models to test interesting hypotheses. For example, comparison between models A and B is a test of the hypothesis that the overall rate of nucleotide substitution is the same among partitions. The χ² distribution with d.f. = g - 1 can be used. Similarly, comparison of models C and E is a test of the hypothesis that κ and ω are identical across partitions. This comparison accounts for possible differences in codon usage among partitions.
| Analysis of Class I MHC Alleles and Abalone Sperm Lysin Genes |
|---|
We analyze two data sets to compare the fixed-sites models implemented in this paper and the random-sites models developed earlier (Nielsen and Yang 1998
25) sequences are available for both genes, permitting sensible phylogenetic comparisons, and crystal structures are available for representative proteins. Additionally, for the MHC, structural analyses have predicted sites that may be subjected to positive selection. These features allow for the amino acid sites in both proteins to be partitioned a priori, so that the models developed in this paper can be applied.
Class I MHC
The class I MHC glycoprotein recognizes and binds foreign peptides. The apparent selective force acting upon the MHC is to recognize and bind a large number of foreign peptides. Based on the crystal structure, different domains of the MHC have been characterized. The ARS is the cleft that binds foreign antigens (Bjorkman et al. 1987a, 1987b). The identification of the ARS enabled previous researchers to partition the data into ARS and non-ARS sites and to demonstrate positive selection in the ARS (Hughes and Nei 1988, 1989). Without partitioning the data, positive selection was not detected in pairwise comparisons averaging rates over the entire sequence. Therefore, the MHC makes an ideal test case for maximum likelihood analyses of partitioned data. We compiled and aligned 192 alleles of the human class I MHC from the A, B, and C loci. The alignment is available from the authors upon request. Alignment gaps were removed, with 270 codons left in each sequence. We used the maximum likelihood method to estimate pairwise distances under the codon-substitution model (Goldman and Yang 1994), and then used the neighbor-joining method (Saitou and Nei 1987) to construct a tree topology, which is used in later analysis. The tree topology was found to have little effect on the analysis in previous studies (e.g., Yang et al. 2000; Ford 2001), and in this paper we ignore the uncertainty of the tree topology.
First, we applied the random-sites models (Nielsen and Yang 1998; Yang et al. 2000) to the data. The results are presented in table 2. Model M0 assumes one ω ratio for all sites. The log likelihood is ℓ = -8225.16, with the estimate ω = 0.612. This is an average over all sites in the protein and all lineages in the tree, and indicates the dominating role of purifying selection in the evolution of the MHC. Model M1 (neutral) assumes two site classes in the sequence: the conserved sites with ω0 = 0 and the neutral sites with ω1 = 1. This model has the same number of parameters as M0 (one-ratio) but fitted the data much better, with a log likelihood ℓ = -7719.46. Model M2 (selection) adds another site class to M1 (neutral), with a free ω ratio estimated from the data, thus allowing for the possibility of positive selection. Parameter estimates suggest that about 10% of sites are under positive selection with ω2 = 8.1 (table 2). This model fits the data much better than the neutral model; the test statistic is 2Δℓ = 2 × (-7296.69 - (-7719.46)) = 845.54, compared with the χ² distribution with d.f. = 2. Model M3 (discrete) assumes three site classes with the proportions (p0, p1, p2) and ω ratios (ω0, ω1, ω2) estimated from the data. The estimates suggest that the majority of sites are under purifying selection with ω0 = 0.07, but about 9% of sites are under strong diversifying selection with ω2 = 6.0. M3 fits the data significantly better than any of the simpler models M0, M1, or M2. Model M7 (beta) assumes a beta distribution of ω over sites. The beta distribution can take a variety of shapes although it is limited to the interval (0, 1), so it provides a flexible null model for testing positive selection. The estimated distribution B(0.103, 0.354) has an extreme U shape, with most of the sites having ω close to either 0 or 1. Model M8 (beta & ω) adds an extra site class to M7 (beta) with a free ω ratio estimated from the data. The estimates suggest that about 10% of sites are under diversifying selection with ω = 5.1. The likelihood ratio test comparing M7 (beta) and M8 (beta & ω) has the statistic 2Δℓ = 2 × (-7232.68 - [-7498.97]) = 2 × 266.29 = 532.58, far exceeding the χ² critical values at d.f. = 2. Summing up, the random-sites models demonstrate extreme variability in selective pressure among sites in the MHC and the presence of a number of sites under diversifying selection. Sites inferred to be under positive selection are listed in table 2. The posterior probabilities and posterior means for sites are shown in figure 1. Inferred sites are also mapped onto the crystal structure in figure 2. It is noteworthy that the sites inferred to be under positive selection are scattered along the primary sequence, but are all clustered in the ARS in the crystal structure (fig. 2).
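The U shape of the estimated beta distribution B(0.103, 0.354) under M7 can be checked numerically. The sketch below evaluates the beta density at a few points using only the Python standard library; the helper name `beta_pdf` and the evaluation points are illustrative choices and are not part of the original analysis.

```python
import math


def beta_pdf(x, p, q):
    """Density of the beta(p, q) distribution at 0 < x < 1."""
    log_norm = math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)
    return math.exp(log_norm + (p - 1) * math.log(x) + (q - 1) * math.log(1 - x))


p, q = 0.103, 0.354           # estimates under M7 quoted in the text
mean_omega = p / (p + q)      # mean of the beta distribution

near_zero = beta_pdf(0.01, p, q)
middle = beta_pdf(0.50, p, q)
near_one = beta_pdf(0.99, p, q)
```

Because both shape parameters are below 1, the density rises toward both ends of the interval (0, 1), so `near_zero` and `near_one` both exceed `middle`.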
To apply the fixed-sites models of this paper, we partitioned amino acid sites in the MHC into two classes: those located outside the ARS and those within, based on structural studies of Bjorkman et al. (1987a, 1987b).
Table 3 lists results obtained under the fixed-sites models. The simplest model (model A in table 3) assumes no site heterogeneity and gives ℓA = -8225.16. Allowing for different substitution rates for the two partitions (model B in table 3) gave ℓB = -7790.10. This is a dramatic improvement of 435.06 log-likelihood units upon adding a single parameter (r2). The estimate of r2 indicates that the substitution rate in the ARS is 6.5 times as high as outside the ARS. Model C further allows for different codon frequencies for the two partitions, by using nine additional parameters for base frequencies at the three codon positions. The log likelihood increased by ℓC - ℓB = -7767.77 - (-7790.10) = 22.33. While statistically significant, this is not a very big improvement. Model D uses different κ and ω but the same codon frequencies for the two partitions. It has two more parameters than model B and fits the data much better; the likelihood ratio statistic is 2Δℓ = 2(ℓD - ℓB) = 2 × ([-7691.57] - [-7790.10]) = 197.06. Variation in κ and ω between the partitions is much more important to the fit of the model than variation in the codon frequencies. Model E assumes different κ and ω as well as different codon frequencies for the two partitions, and fits the data significantly better than any of the simpler models. Parameter estimates under model E are similar to those under model D. They all suggest that the ω ratio is very different in the two partitions. The non-ARS sites are under purifying selection with ω1 = 0.23, whereas the ARS sites are under diversifying selection with ω2 = 1.9. Like the comparison between models B and D, comparison between models C and E leads to rejection of model C, with 2Δℓ = 2(ℓE - ℓC) = 191.70, indicating that κ and ω are different between the partitions. Model F is the separate analysis. Despite its use of 381 × 2 branch lengths for the two partitions, many of which are zero, the model fits the data significantly better than models for combined analysis which assume proportional branch lengths (models B, C, D, and E). For example, the test statistic for comparing models E and F is 2Δℓ = 492.34, and P < 0.0001 with d.f. = 380. Nevertheless, estimates of parameters such as κ and ω are highly similar to those obtained in the combined analyses. The tree length, i.e., the sum of branch lengths along the tree, for the first partition (sites outside the ARS) is 1.957 nucleotide substitutions per codon, or dS = 1.789 synonymous substitutions per synonymous site and dN = 0.414 nonsynonymous substitutions per nonsynonymous site. At the ARS, the tree length is 12.087 nucleotide substitutions per codon, or dS = 2.317 and dN = 4.297. Therefore, the synonymous rates are similar between the two partitions, and the over sixfold difference in substitution rate between the two partitions is mainly caused by the accelerated nonsynonymous rate at the ARS.
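The arithmetic behind this conclusion is easy to reproduce from the tree lengths quoted above. The following sketch simply forms the relevant ratios; the dictionary layout and variable names are illustrative.

```python
# Tree lengths under model F for the MHC data, as quoted in the text.
partitions = {
    "non-ARS": {"tree_length": 1.957, "dS": 1.789, "dN": 0.414},
    "ARS": {"tree_length": 12.087, "dS": 2.317, "dN": 4.297},
}

# omega = dN/dS within each partition
omega = {name: v["dN"] / v["dS"] for name, v in partitions.items()}

# Ratios of the ARS partition to the non-ARS partition
rate_ratio = partitions["ARS"]["tree_length"] / partitions["non-ARS"]["tree_length"]
dS_ratio = partitions["ARS"]["dS"] / partitions["non-ARS"]["dS"]
dN_ratio = partitions["ARS"]["dN"] / partitions["non-ARS"]["dN"]
```

The ratios come out at roughly 6.2 for the overall rate, 1.3 for dS, and 10.4 for dN, and the implied ω values are about 0.23 outside the ARS and 1.85 within it, in line with the estimates ω1 = 0.23 and ω2 = 1.9.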
To test whether the ω ratio at the ARS is significantly different from 1, we recalculated the log-likelihood values in models D, E, and F by fixing ω2 = 1. If the ARS sites only are analyzed under the one-ratio model (model F; Goldman and Yang 1994), the log likelihood is -3857.64 when ω is a free parameter and -3866.58 when ω = 1 is fixed. Thus the likelihood ratio test statistic is 2Δℓ = 2 × ([-3857.64] - [-3866.58]) = 17.88, with P = 2.4 × 10⁻⁵ at d.f. = 1 (table 4, model F). Models D and E analyze the two partitions as one combined data set. When ω2 = 1 is fixed in model D, the log likelihood is ℓ = -7702.55, so the test statistic 2Δℓ = 2 × ([-7691.57] - [-7702.55]) = 21.95. Under model E, the test statistic 2Δℓ = 2 × ([-7671.92] - [-7681.25]) = 18.66. All these tests, which make different assumptions about differences between the two site partitions, reject the null hypothesis and suggest that the ratio ω2 at the ARS is significantly greater than 1 (table 4).
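The likelihood ratio tests used here follow a simple recipe: twice the log-likelihood difference between nested models is compared with a χ² distribution whose degrees of freedom equal the difference in the number of parameters. A minimal sketch using only the Python standard library is given below; the function names `chi2_sf` and `lrt` are hypothetical, and the tail probability uses the closed-form series for integer degrees of freedom rather than any routine from PAML.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # exp(-y) * sum_{k=0}^{df/2 - 1} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{j=1}^{(df-1)/2} y^(j-1/2) / Gamma(j+1/2)
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) * math.exp(-y) / math.gamma(1.5)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= y / (j + 0.5)
    return total


def lrt(lnL_alt, lnL_null, df):
    """Return the statistic 2*(lnL_alt - lnL_null) and its chi-square P value."""
    stat = 2.0 * (lnL_alt - lnL_null)
    return stat, chi2_sf(stat, df)


# Test of omega = 1 for the ARS sites analyzed alone (model F), values from the text
stat_F, p_F = lrt(-3857.64, -3866.58, df=1)
```

Applied to the log-likelihood values quoted for the ARS sites under model F, this gives a statistic of 17.88 and a P value of about 2.4 × 10⁻⁵ at d.f. = 1.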
The fixed-sites and random-sites models are not nested and cannot be compared using a simple χ² distribution. Nevertheless, the log-likelihood values are comparable between the two classes of models. Note that in a fixed-sites model, the probability of observing data at a site is calculated using the ω ratio for the partition the site is from. In a random-sites model, the probability is calculated as an average over all site classes (Nielsen and Yang 1998). The fixed-sites model E (table 3) has ℓE = -7671.92, whereas the random-sites model M8 (beta & ω) in table 2 has 395 parameters but a much higher log-likelihood value, ℓ = -7232.68, with a difference of 439.24.
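The difference between the two ways of computing the probability of the data at a site can be sketched as follows. The conditional likelihoods for each ω class would in practice come from a likelihood calculation on the tree; here they are hypothetical numbers, and the class proportions and function names are likewise illustrative. The last function applies Bayes' theorem to obtain the posterior probability of each site class, which is the quantity used to identify sites under positive selection.

```python
def site_likelihood_fixed(partition, liks_by_partition):
    """Fixed-sites model: use the likelihood under the omega of the site's own partition."""
    return liks_by_partition[partition]


def site_likelihood_random(props, class_liks):
    """Random-sites model: average the class likelihoods, weighted by class proportions."""
    return sum(p * f for p, f in zip(props, class_liks))


def posterior_class_probs(props, class_liks):
    """Posterior probability of each site class given the data at the site."""
    total = site_likelihood_random(props, class_liks)
    return [p * f / total for p, f in zip(props, class_liks)]


# Hypothetical values for one site: three classes with increasing omega
props = [0.6, 0.3, 0.1]            # assumed class proportions
class_liks = [1e-6, 4e-6, 2e-5]    # assumed f(data | omega_k) for this site

mixture = site_likelihood_random(props, class_liks)
posterior = posterior_class_probs(props, class_liks)
```

With these made-up numbers, the class with the largest ω accounts for only 10% of sites a priori but has a posterior probability of about 0.53 at this site, because the data at the site are much more probable under that class.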
The poorer performance of the fixed-sites models appears to be mainly caused by inclusion of conserved sites in the list of the 57 ARS sites. We note that structural studies permit the identification of sites potentially involved in antigen binding, but do not expect all of them to be under diversifying selection in the data set examined. The random-sites model M8 (beta & ω) identified 25 sites to be under positive selection (table 2), out of which 22 are in the list of ARS sites. The three sites that are not in the list are 45M, 94T, and 113Y. These sites are located in the ARS domain, although not in the binding cleft, and might also be involved in specificity of binding foreign peptides. Previous studies demonstrated that antibody specificity can be mediated by both variable loops and substitutions on the protein framework that do not have direct contact with the antigen (Foote and Winter 1992). The results here suggest a similar process may be occurring at these sites in the MHC. There are 35 sites in the ARS partition that are not identified to be under positive selection by the random-sites models. Of them, site 73T has posterior probability P = 0.64 and posterior mean ω = 3.6, and is quite likely to be under positive selection (fig. 1). Sites 64T, 66K, 74H, 75R, 76V, and 171Y all have posterior means ω > 0.8 and are possibly under positive selection but not detected by the random-sites models because of lack of information in the data at these sites. Sites 5M, 22F, 26G, 57P, 72Q, 84Y, 146K, 154E, 159Y, 165V, and 169R have posterior probabilities close to zero and posterior mean ω < 0.1 (fig. 1). These sites are most likely to be under strong purifying selection. Indeed, sites 57P, 72Q, 154E, 165V, and 169R point away from the antigen binding cleft and were predicted not to be involved in direct antigen binding in the original MHC structural analysis (Bjorkman et al. 1987b).
Overall, these comparisons demonstrate the consistency of the fixed-sites and random-sites models and, in particular, the utility of the random-sites models even when structural information is available. They also highlight the power of predicting binding sites by incorporating both structural and evolutionary information.
It is also interesting to compare the results of table 2 (see also fig. 2) with those of Swanson et al. (2001), who applied the random-sites models to a data set of only six MHC alleles. The smaller data set included the signal sequence and additional C-terminal sequence, which were removed in this paper because these regions were not sequenced in all 192 alleles analyzed. Under the numbering system of this paper, this analysis identified 12 sites at the 50% level: 45M, 62G, 63E, 66K, 67V, 70H, 71S, 97R, 114H, 116Y, 151H, and 156L. All but one site (site 66K) are in the list of this paper (table 2). It is remarkable that all sites identified in both studies are clustered in the ARS domain. At the 95% level, only two sites (114H and 156L) were identified in the small data set, compared with 25 sites in this paper. This comparison demonstrates the dramatic improvement in the power of the method with the increase of the number of sequences used, consistent with the simulation study of Anisimova, Bielawski, and Yang (2001). We suggest that more sites might be under positive selection in the MHC than identified in this paper.
Abalone Sperm Lysin
Abalones are large marine gastropod mollusks that exhibit external fertilization, with sperm and eggs released directly into seawater where fertilization occurs. Despite many of the species having overlapping breeding seasons and habitats, the species remain distinct. One barrier to cross-species fertilization is the species-specific interaction of sperm and eggs, which can be quantitatively demonstrated in the laboratory (e.g., Lyon and Vacquier 1999). The molecules involved in the species-specific interaction have been characterized extensively (reviewed in Vacquier et al. 1999). Abalone sperm lysin is a 16-kDa protein localized in the sperm acrosome granule. Upon exocytosis, lysin dissolves a hole in the egg vitelline envelope (VE) in a nonenzymatic and species-specific manner. Lysin binds to and unravels the fibrous VE by disrupting hydrogen bonds and hydrophobic interactions of its receptor VERL (Swanson and Vacquier 1997, 1998). The crystal structures of lysin from the red (Haliotis rufescens) and green (H. fulgens) abalone have been determined (Shaw et al. 1995; Kresge, Vacquier, and Stout 2000a, 2000b). The sperm lysin genes of 25 abalone species were sequenced and analyzed by Lee, Ota, and Vacquier (1995), and strong diversifying selection was demonstrated at a number of amino acid sites in lysin, particularly in closely related sympatric species (Yang, Swanson, and Vacquier 2000). The sequence data used in this paper are the same as those analyzed by Lee, Ota, and Vacquier (1995) and Yang et al. (2000), except that an alignment gap between residues 133 and 134 in the original alignment is deleted in this paper, so that 134 codons are in each sequence. We use the phylogeny estimated by Lee, Ota, and Vacquier (1995).
Extensive analysis of the data under random-sites models was performed by Yang et al. (2000). In this paper, we present results obtained under models M7 (beta) and M8 (beta & ω) only (table 5). Parameter estimates are essentially identical to those in Yang et al. (2000), but the log-likelihood values are quite different, because of the removed site. Estimates under model M8 (beta & ω) suggest that many sites are highly conserved, but as many as 27% of sites are under diversifying selection with ω2 = 3.0. The likelihood ratio test comparing these two models suggests that the difference is statistically significant; the test statistic is 2Δℓ = 2(ℓ1 - ℓ0) = 2 × ([-4410.57] - [-4472.16]) = 123.18, compared with the χ² distribution with d.f. = 2. Sites inferred to be under positive selection are listed in table 5. The lysin structure of the red abalone (H. rufescens), with sites identified to be under positive selection mapped onto it, was presented in Yang, Swanson, and Vacquier (2000).
As lysin is a surface-active molecule, we hypothesize that solvent-exposed residues in lysin might be subjected to positive selection, whereas the buried residues would be conserved in order to maintain the protein structure. To test this hypothesis, we partitioned the 134 sites in lysin into two classes, the buried sites and the solvent-exposed sites. Solvent accessibility is calculated from the red abalone lysin structure (1LIS in Protein Data Bank) using the program GETAREA (http://www.scsb.utmb.edu/cgi-bin/get_a_form.tcl; Fraczkiewicz and Braun 1998).
The results obtained under the fixed-sites models are shown in table 6. Model A, which assumes the same parameters in the two partitions, gave ℓA = -4627.03. Model B allows the overall rates to differ and fits the data much better than model A; the likelihood ratio test statistic is 2Δℓ = 2 × ([-4549.99] - [-4627.03]) = 154.08, compared with the χ² distribution with d.f. = 1. The rate at the solvent-exposed sites is 2.8 times as high as at the buried sites (r1:r2 = 1:2.755). Model C allows further for different codon frequencies for the two partitions, determined by the nucleotide frequencies at the three codon positions. This model fits the data much better than model B (2Δℓ = 119.84, d.f. = 9), suggesting that the codon usage patterns are indeed different at the buried and exposed sites. Model D assumes the same codon frequencies but different transition/transversion rate ratio κ and nonsynonymous/synonymous rate ratio ω. This model fits the data better than model B (2Δℓ = 35.74, d.f. = 2). The estimates are κ1 = 1.7 and ω1 = 0.39 for the buried sites and κ2 = 1.5 and ω2 = 1.25 for the solvent-exposed sites (table 6). Whereas estimates of κ are similar between the partitions, estimates of ω are very different. As hypothesized, buried sites are under strong purifying selection, and solvent-exposed sites appear to be under diversifying selection. Unlike the MHC data set, allowing for different codon frequencies (model C) improves the fit of the model more than allowing for different κ and ω (model D). This pattern might be the result of different amino acid compositions at the buried and exposed sites. Model E allows different κ and ω as well as different codon frequencies between partitions, and fits the data better than any of the simpler models. The model gave parameter estimates similar to those under model D (table 6). Model F is equivalent to separate analysis of the two partitions. It is not significantly better than model E; the statistic is 2Δℓ = 38.06, and P = 0.79, with d.f. = 46. So it is acceptable to use 47 + 1 instead of 47 × 2 parameters for branch lengths in the two partitions. Parameter estimates under model F are similar to those obtained in the combined analyses (models B-E). The tree length for the buried sites is 3.96 nucleotide substitutions per codon, or dS = 2.20 synonymous substitutions per synonymous site and dN = 0.99 nonsynonymous substitutions per nonsynonymous site. The tree length for the solvent-exposed sites is 9.98, or dS = 2.76 and dN = 3.50. Thus the 2.5-fold rate difference between the two partitions is mainly caused by the accelerated nonsynonymous rate at the exposed sites.
To test whether the ω ratio at the solvent-exposed sites is significantly greater than 1, we recalculated the log-likelihood values in models D, E, and F by fixing ω2 = 1. In an analysis of the exposed sites only (model F), the log likelihood is -3517.01 when ω is estimated as a free parameter and -3519.62 when ω = 1 is fixed. Thus the likelihood ratio statistic for testing the null hypothesis ω2 = 1 is 2Δℓ = 5.23, with P = 0.022 at d.f. = 1 (table 7). Models D and E analyze the two partitions as one combined data set and give test statistics of 4.53 and 5.57, respectively (table 7). So, whatever our assumptions about possible differences between the two partitions, we reject the hypothesis ω2 = 1 at 1% < P < 5%, and conclude that the solvent-exposed sites in lysin are under diversifying selection with ω2 > 1 (table 7).
All sites predicted by the random-sites models to be under positive selection are located on the surface of lysin and are, therefore, included in the solvent-exposed class. Similar to the analysis of the MHC data set, the fixed-sites models fit the data more poorly than the random-sites models, judged by their log-likelihood values. The main reason for this difference appears to be that some exposed sites are under purifying rather than positive selection.
| Discussions |
|---|
Comparison of Fixed-Sites and Random-Sites Models
The analyses of both the MHC and the lysin data sets demonstrate the utility of the new fixed-sites models implemented in this paper. In both genes, the
ratio averaged over all sites in the sequence is less than 1. However, positive selection is detected when structural information is used to identify sites that might be expected to be under positive selection, and an independent
ratio is assigned to the partition of such sites in the likelihood model. Perhaps more remarkable is the power of the random-sites models, which do not use structural information to partition sites. In both genes, the random-sites models provided even better fit to the data than the fixed-sites models, indicated by their higher log-likelihood values. This discrepancy appears to be caused by the inclusion of conserved sites in the site partition expected to be under positive selection used in the fixed-sites models, so that there is still substantial variation in selective pressure among sites within the same partition. In terms of statistical significance for detecting positive selection, we suspect that the fixed-sites models will seldom be more powerful than the random-sites models. To obtain significant results about positive selection by the fixed-sites models, it will be necessary to have reliable information to partition sites and a number of sites in the partition under fairly strong positive selection. In such cases, the random-sites models are unlikely to fail.
We note that in the MHC data set, 22 of the 25 sites identified by the random-sites models to be under positive selection are in the list of sites in the ARS, whereas the other three sites are in the ARS domain. In the lysin data set, all sites identified by the random-sites models are in the partition of exposed sites. Such consistency between the two classes of models validates the biological hypothesis used to partition sites a priori and also the reliability of the random-sites models. We suggest that the random-sites models are useful whether or not prior information is available to partition sites in the sequence. However, it should be emphasized that identification of sites under positive selection using Bayes' theorem requires simultaneous inferences at all sites in the sequence. Whereas the accuracy at one site might be high as indicated by the posterior probability, it is very unlikely for all sites to be identified correctly. Furthermore, the empirical Bayes procedure we used does not account for the sampling errors in parameter estimates, and the posterior probability calculations might be sensitive to parameters in the ω distribution (Yang and Bielawski 2000). Those problems may be serious when the analyzed data set is small and contains only a few highly similar sequences, with little information to estimate parameters in the ω distribution. Thus we suggest that caution be exercised and the inferred sites be considered hypotheses to be verified by experimental investigation.
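The site-wise calculation behind these cautions can be illustrated with a minimal sketch. Under a discrete random-sites model, the empirical Bayes posterior probability that a site belongs to class k is proportional to the class proportion p_k multiplied by the likelihood of the data at that site given ω_k. The function names and all numbers below are hypothetical and are not taken from the analyses in this paper or from any program; the second function additionally assumes, purely for illustration, that sites are independent.

```python
import math

def site_class_posteriors(proportions, site_likelihoods):
    """Posterior probability of each site class at one site (Bayes' theorem).

    proportions[k]      : estimated proportion p_k of sites in class k
    site_likelihoods[k] : likelihood of the data at this site given omega_k
    Parameter estimates are treated as known, so their sampling errors
    are ignored, as in an empirical Bayes calculation.
    """
    weighted = [p * f for p, f in zip(proportions, site_likelihoods)]
    total = sum(weighted)
    return [w / total for w in weighted]

def prob_all_correct(posteriors):
    """Rough chance that every listed site is correctly identified,
    assuming (for illustration only) independence among sites."""
    return math.prod(posteriors)

# Hypothetical three-class example; the last class stands for omega > 1.
post = site_class_posteriors([0.80, 0.15, 0.05], [1e-6, 2e-6, 2e-5])
print([round(x, 3) for x in post])

# Hypothetical list of 25 sites, each with posterior probability 0.95.
print(round(prob_all_correct([0.95] * 25), 3))
```

In the second example, even when each of 25 sites has a posterior probability of 0.95, the chance that all 25 are correct is only roughly 0.28 under the independence assumption, which is why the inferred list is best treated as a set of hypotheses.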
Analysis of Data from Multiple Genes
We envisage that one major use of the fixed-sites models is to test for similarities and differences in the evolutionary process among different genes. When sequences from multiple protein-coding genes are available for the same set of species, they can be analyzed as a combined data set, with their differences in the substitution pattern accounted for. Interesting hypotheses concerning differences among genes in the selective pressure indicated by the ω ratio can then be tested. In this regard, some variations to the models we implemented here might be more interesting. For example, one such model might have a homogeneous synonymous substitution rate and variable nonsynonymous rates among genes. Another model might assume proportional branch lengths at the synonymous site and freely variable branch lengths at the nonsynonymous site among the genes. It might also be worthwhile to decouple κ and ω. In this paper, these two parameters are either both homogeneous or both different among genes. Analyses of this paper did not assume a molecular clock, so that the overall rate varies among branches. Models that enforce the molecular clock at the synonymous site but do not enforce the clock at the nonsynonymous site might be interesting. We note that some similar models have been developed by Muse and Gaut (1997) in their pioneering work, and further implementation of such likelihood models is straightforward.
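Hypotheses of this kind about nested models are typically examined with a likelihood ratio test. The sketch below uses hypothetical function names and made-up log-likelihood values. It compares a null model with one ω shared by all genes against an alternative with a separate ω for each gene, referring twice the log-likelihood difference to a χ² distribution with degrees of freedom equal to the number of extra parameters. The χ² approximation is an assumption that applies only to nested models under the usual regularity conditions.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with z = x / 2.
    """
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # df = 1
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, approximate p-value) for nested models."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)

# Hypothetical example: three genes, so gene-specific omega adds 2 parameters.
stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, df=2)
print(stat, p)
```

A small p-value would suggest that the genes differ in ω; the same function could be applied to any other pair of nested models, such as one that frees κ while holding ω homogeneous.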
| Footnotes |
|---|
Brandon Gaut, Reviewing Editor
Keywords: synonymous rate, nonsynonymous rate, positive selection, partitioned data, lysin, MHC, maximum likelihood, Bayes
Address for correspondence and reprints: Ziheng Yang, Department of Biology, 4 Stephenson Way, London NW1 2HE, UK. z.yang@ucl.ac.uk.
| References |
|---|
Akashi H., 1999. Within- and between-species DNA sequence variation and the footprint of natural selection. Gene 238:39-51.
Anisimova M., J. P. Bielawski, Z. Yang, 2001. The accuracy and power of likelihood ratio tests to detect positive selection at amino acid sites. Mol. Biol. Evol. 18:1585-1592.
Bielawski J. P., Z. Yang, 2001. Positive and negative selection in the DAZ gene family. Mol. Biol. Evol. 18:523-529.
Bishop J. G., A. M. Dean, T. Mitchell-Olds, 2000. Rapid evolution in plant chitinases: molecular targets of selection in plant-pathogen coevolution. Proc. Natl. Acad. Sci. USA 97:5322-5327.
Bjorkman P. J., M. A. Saper, B. Samraoui, W. S. Bennett, J. L. Strominger, D. C. Wiley, 1987a. The foreign antigen binding site and T cell recognition regions of class I histocompatibility antigens. Nature 329:512-518.
Bjorkman P. J., M. A. Saper, B. Samraoui, W. S. Bennett, J. L. Strominger, D. C. Wiley, 1987b. Structure of the class I histocompatibility antigen, HLA-A2. Nature 329:506-512.
Crandall K. A., C. R. Kelsey, H. Imamichi, H. C. Lane, N. P. Salzman, 1999. Parallel evolution of drug resistance in HIV: failure of nonsynonymous/synonymous substitution rate ratio to detect selection. Mol. Biol. Evol. 16:372-382.
Endo T., K. Ikeo, T. Gojobori, 1996. Large-scale search for genes on which positive selection may operate. Mol. Biol. Evol. 13:685-690.
Fares M. A., A. Moya, C. Escarmis, E. Baranowski, E. Domingo, E. Barrio, 2001. Evidence for positive selection in the capsid protein-coding region of the foot-and-mouth disease virus (FMDV) subjected to experimental passage regimens. Mol. Biol. Evol. 18:10-21.
Foote J., G. Winter, 1992. Antibody framework residues affecting the conformation of the hypervariable loops. J. Mol. Biol. 224:487-499.
Ford M. J., 2001. Molecular evolution of transferrin: evidence for positive selection in salmonids. Mol. Biol. Evol. 18:639-647.
Fraczkiewicz R., W. Braun, 1998. Exact and efficient analytical calculation of the accessible surface areas and their gradients for macromolecules. J. Comp. Chem. 19:319-333.
Gao G. F., J. Tormo, U. C. Gerth, J. R. Wyer, A. J. McMichael, D. I. Stuart, J. I. Bell, E. Y. Jones, B. K. Jakobsen, 1997. Crystal structure of the complex between human CD8alpha(alpha) and HLA-A2. Nature 387:630-634.
Goldman N., Z. Yang, 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725-736.
Haydon D. T., A. D. Bastos, N. J. Knowles, A. R. Samuel, 2001. Evidence for positive selection in foot-and-mouth-disease virus capsid genes from field isolates. Genetics 157:7-15.
Hughes A. L., M. Nei, 1988. Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335:167-170.
Hughes A. L., M. Nei, 1989. Evolution of the major histocompatibility complex: independent origin of nonclassical class I genes in different groups of mammals. Mol. Biol. Evol. 6:559-579.
Hughes A. L., T. Ota, M. Nei, 1990. Positive Darwinian selection promotes charge profile diversity in the antigen-binding cleft of class I major-histocompatibility-complex molecules. Mol. Biol. Evol. 7:515-524.
Kresge N., V. D. Vacquier, C. D. Stout, 2000a. 1.35 and 2.07 Å resolution structures of the red abalone sperm lysin monomer and dimer reveal features involved in receptor binding. Acta Crystallogr. 56:34-41.
Kresge N., V. D. Vacquier, C. D. Stout, 2000b. The high resolution crystal structure of green abalone sperm lysin: implications for species-specific binding of the egg receptor. J. Mol. Biol. 296:1225-1234.
Lee Y.-H., T. Ota, V. D. Vacquier, 1995. Positive selection is a general phenomenon in the evolution of abalone sperm lysin. Mol. Biol. Evol. 12:231-238.
Lyon J. D., V. D. Vacquier, 1999. Interspecies chimeric sperm lysins identify regions mediating species-specific recognition of the abalone egg vitelline envelope. Dev. Biol. 214:151-159.
Muse S. V., B. S. Gaut, 1994. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol. Biol. Evol. 11:715-724.
Muse S. V., B. S. Gaut, 1997. Comparing patterns of nucleotide substitution rates among chloroplast loci using the relative ratio test. Genetics 146:393-399.
Nielsen R., Z. Yang, 1998. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929-936.
Peek A. S., V. Souza, L. E. Eguiarte, B. S. Gaut, 2001. The interaction of protein structure, selection, and recombination on the evolution of the type-1 fimbrial major subunit (fimA) from Escherichia coli. J. Mol. Evol. 52:193-204.
Saitou N., M. Nei, 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.
Sharp P. M., 1997. In search of molecular Darwinism. Nature 385:111-112.
Shaw A., P. A. Fortes, C. D. Stout, V. D. Vacquier, 1995. Crystal structure and subunit dynamics of the abalone sperm lysin dimer: egg envelopes dissociate dimers, the monomer is the active species. J. Cell Biol. 130:1117-1125.
Swanson W. J., V. D. Vacquier, 1997. The abalone egg vitelline envelope receptor for sperm lysin is a giant multivalent molecule. Proc. Natl. Acad. Sci. USA 94:6724-6729.
Swanson W. J., V. D. Vacquier, 1998. Concerted evolution in an egg receptor for a rapidly evolving abalone sperm protein. Science 281:710-712.
Swanson W. J., Z. Yang, M. F. Wolfner, C. F. Aquadro, 2001. Positive Darwinian selection in the evolution of mammalian female reproductive proteins. Proc. Natl. Acad. Sci. USA 98:2509-2514.
Vacquier V. D., W. J. Swanson, E. C. Metz, C. D. Stout, 1999. Acrosomal proteins of abalone spermatozoa. Adv. Dev. Biochem. 5:49-81.
Yang Z., 1996. Maximum-likelihood models for combined analyses of multiple sequence data. J. Mol. Evol. 42:587-596.
Yang Z., 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555-556.
Yang Z., 2001. Adaptive molecular evolution. Pp. 327-350 in D. Balding, M. Bishop, and C. Cannings, eds. Handbook of statistical genetics. Wiley, New York.
Yang Z., J. P. Bielawski, 2000. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15:496-503.
Yang Z., R. Nielsen, N. Goldman, A.-M. K. Pedersen, 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431-449.
Yang Z., W. J. Swanson, V. D. Vacquier, 2000. Maximum likelihood analysis of molecular adaptation in abalone sperm lysin reveals variable selective pressures among lineages and sites. Mol. Biol. Evol. 17:1446-1455.
Zanotto P. M., E. G. Kallas, R. F. Souza, E. C. Holmes, 1999. Genealogical evidence for positive selection in the nef gene of HIV-1. Genetics 153:1077-1089.