MBE Advance Access originally published online on April 18, 2006
Molecular Biology and Evolution 2006 23(7):1348-1356; doi:10.1093/molbev/msk025
Research Article
The Rate of Adaptive Evolution in Enteric Bacteria

* Centre for Study of Evolution, School of Life Sciences, University of Sussex, Brighton, United Kingdom; and
National Evolutionary Synthesis Center, Durham, USA
E-mail: a.c.eyre-walker{at}sussex.ac.uk.
| Abstract |
Here we estimate the rate of adaptive substitution in a set of 410 genes that are present in 6 Escherichia coli and 6 Salmonella enterica genomes. We estimate that more than 50% of amino acid substitutions in this set of genes have been fixed by positive selection between the E. coli and S. enterica lineages. We also show that the proportion of adaptive substitutions is uncorrelated with the rate of amino acid substitution or gene function but that it may be correlated with levels of synonymous codon usage bias.
Key Words: adaptive evolution; enteric bacteria; McDonald-Kreitman test
| Introduction |
The rate at which adaptive substitutions occur has long been an interesting evolutionary question (Nei 2005
11% of genes show evidence of adaptive evolution (Clark et al. 2003
Although the above studies suggest that adaptive evolution is reasonably common, they do not give us an estimate of the proportion of evolutionary change that is due to positive selection. However, this quantity can be estimated using an extension of the test of McDonald and Kreitman (1991) that was first suggested by Charlesworth (1994). By combining data from both between and within species, Fay et al. (2001) estimated that ~35% of amino acid substitutions were due to adaptive evolution in primates, but recent studies have found little evidence of positive selection in hominids (Chimpanzee Sequencing and Analysis Consortium 2005; Zhang and Li 2005). In contrast, estimates of the rate of adaptive substitution in Drosophila have been consistently high. However, the precise estimate depends upon the methods used. Using approaches similar to those employed in the analysis of primate data, the proportion of amino acid substitutions that are adaptive (α) is estimated to be between 25% and 45% in Drosophila (Smith and Eyre-Walker 2002; Bierne and Eyre-Walker 2004). This rises to over 90% under the method used by Sawyer et al. (2003).
The fact that some genes show evidence of adaptive evolution whereas others do not suggests that adaptive evolution may be restricted to, or more prevalent in, some genes. This is certainly what one might expect; for example, genes involved in the immune system would be expected to undergo more adaptive evolution than housekeeping genes. However, the only analysis to formally test whether the proportion of substitutions driven by positive selection varies between genes found no evidence of variation in a set of Drosophila simulans genes (Bierne and Eyre-Walker 2004) (though see Fay et al. 2002, for contrary evidence using a less formal approach). This was striking given that the rate of nonsynonymous substitution varied by several orders of magnitude in the genes studied. However, it is unclear whether this apparent constancy was real or due to a lack of power in the method.
Here we attempt to estimate the rate of adaptive amino acid substitution in enteric bacteria. We also investigate whether the proportion of adaptive substitutions varies between genes. We do this using the method of Bierne and Eyre-Walker (2004); that is, by comparing models in which the proportion of adaptive substitutions is allowed to vary between genes with a model in which all genes share a single proportion of adaptive substitutions. However, we also investigate whether the proportion of adaptive substitutions is correlated with the rate of amino acid substitution, gene function, or codon bias. This latter analysis was motivated by the fact that there is a strong negative correlation between the nonsynonymous substitution rate and codon bias in several taxa (Sharp 1991; Akashi 1994; Pal et al. 2001; Betancourt and Presgraves 2002; Rocha and Danchin 2004) and by the suggestion that one might be able to estimate rates of adaptive evolution using measures of codon usage bias (Plotkin et al. 2004; Stoletzki et al. 2005). We investigate these questions using a large set of genes sampled from several complete-genome sequences of both E. coli and S. enterica.
| Materials and Methods |
Data
Genes were extracted from 6 complete E. coli genomes: E. coli K12, O157:H7, O157:H7 EDL933, CFT073 (GenBank accession numbers: U00096, BA000007, AE006174, AE014075, respectively), 042, and E2348 (these sequence data were produced by the E. coli and Shigella Comparative Sequencing Group at the Sanger Institute and can be obtained from ftp://ftp.sanger.ac.uk/pub/pathogens/Escherichia_Shigella) and 6 complete S. enterica genomes: S. enterica LT2, Choleraesuis, Paratyphi A, Typhi CT18, and Typhi Ty2 (GenBank accession numbers: AE006468, AE017220, CP000026, AL513382, AE014613, respectively), PT4, Salmonella typhimurium DT104 and S. typhimurium SL1344 (these sequence data were produced by the Salmonella Comparative Sequencing Group at the Sanger Institute and can be obtained from ftp://ftp.sanger.ac.uk/pub/pathogens/Salmonella). Note that S. typhimurium is a strain of S. enterica. Sequences were downloaded from GenBank (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi) and from the Sanger ftp sites as cited.
All protein-coding sequences longer than 100 bp were extracted from the E. coli K12 and S. typhimurium LT2 genomes, and StandAlone Blast (Altschul et al. 1997) was used to query each gene against databases containing the remaining available genomes for each species plus the outgroup genome sequence (S. typhimurium LT2 for E. coli and E. coli K12 for S. enterica). An alignment for a gene was accepted if 85% of the coding sequence was present in all strains, and no premature stop codons were detected. Genes which were present in all strains but incomplete were aligned by hand using SeAl v. 2.0a11 Carbon (Rambaut 1996). In a large number of cases, variants were found in a single sequence that appeared to be single base pair insertions or deletions. It was not possible to tell whether these were real indels or sequencing errors. In order to be certain that no genuine pseudogenes were accidentally included in the analysis, any genes containing such single base pair indels were excluded from the final data set; in addition, our analysis excluded singleton polymorphisms (i.e., polymorphisms present in a single sequence) to reduce the effect of sequencing errors. Our final data set contained 410 genes present in all 6 E. coli and all 6 S. enterica genomes.
Analysis
The relationship between strains was examined by concatenating 200 randomly chosen genes and constructing a Neighbor-Joining tree using synonymous divergence between strains as estimated by the Nei-Gojobori method I (Nei and Gojobori 1986). This analysis was performed in Mega3 (Kumar et al. 2004). We also estimated overall synonymous diversity between the strains using Watterson's θ (Watterson 1975) (θs). θs was calculated for all 410 genes and averaged across genes.
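As a rough illustration of the diversity statistic, the sketch below computes Watterson's estimator per site and averages it across genes. It assumes the usual form θ = S / (a_n L), with S the number of segregating sites, L the number of sites examined, and a_n the sum of 1/i for i from 1 to n − 1. The function names and the way synonymous sites are counted are assumptions for illustration, not the software used in the study.

```python
def watterson_theta(segregating_sites, n_sequences, n_sites):
    """Watterson's theta per site for a sample of n_sequences sequences."""
    if n_sequences < 2 or n_sites <= 0:
        raise ValueError("need at least 2 sequences and a positive site count")
    # a_n = 1 + 1/2 + ... + 1/(n - 1)
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / (a_n * n_sites)


def mean_theta(per_gene_counts, n_sequences):
    """Average theta over genes; per_gene_counts holds (S, L) pairs."""
    thetas = [watterson_theta(s, n_sequences, l) for s, l in per_gene_counts]
    return sum(thetas) / len(thetas)
```

With 6 sequences per species, a_n is 137/60, so a hypothetical gene with 137 segregating synonymous sites out of 1,000 would have θs of 0.06.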
Polymorphisms were counted using software written for this purpose. Counts were identical to those produced by DnaSP (Rozas J and Rozas R 1999), except where we resolved codons containing multiple mutations, which DnaSP omits. We adopted the following algorithm. For all pairs of codons present at a site, we estimated the number of synonymous and nonsynonymous differences between them using the method of Nei and Gojobori (1986); that is, we took an unweighted average of the paths. We identified for each codon the codon it was most closely related to, and then we summed the numbers of nonsynonymous and synonymous polymorphisms across these pairs of codons. This seems a reasonable approach for species, such as E. coli and S. enterica, in which recombination is sufficiently rare that it will not often introduce a recombination breakpoint within a codon (Maynard Smith et al. 1993). As we had no information about which sequences were ancestral, a polymorphism segregating at a frequency of i out of n sequences was indistinguishable from a polymorphism segregating in n − i sequences. Numbers of nonsynonymous and synonymous polymorphisms were denoted by Pn and Ps, respectively. Polymorphism frequencies were defined as singletons (a single polymorphic sequence), doubletons (2 strains have a variant), and tripletons (3 strains have a variant).
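The folded frequency classes can be sketched in a few lines of Python. The sketch assumes each polymorphic site is represented as the list of states (nucleotides or codons) observed in the 6 strains and that only two-state sites are classified; these representations and the function names are hypothetical simplifications and do not reproduce the codon-path resolution described above.

```python
from collections import Counter

# With 6 sequences and no ancestral information, the minor-variant count is 1, 2, or 3.
CLASS_NAMES = {1: "singleton", 2: "doubleton", 3: "tripleton"}


def minor_variant_count(column):
    """Folded count: a variant in i of n sequences is treated like n - i."""
    counts = Counter(column)
    if len(counts) != 2:
        return None  # monomorphic, or more than two states
    return min(counts.values())


def frequency_class(column):
    """Name of the folded frequency class, or None if not a two-state site."""
    return CLASS_NAMES.get(minor_variant_count(column))
```

For example, a site at which one strain carries G and five carry A falls in the same class as a site at which five carry G and one carries A.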
The numbers of substitutions between E. coli and S. enterica were estimated using the sequences from the 2 strains E. coli K12 and S. typhimurium LT2; substitutions were counted using the "Codeml" program in the PAML package (Yang 2002), with the F3x4 model of codon frequencies. We follow Dunn et al. (2001) and denote the number of nonsynonymous and synonymous substitutions by Dn and Ds, respectively, with the number of substitutions per site denoted by dN and dS. We restricted the estimate of divergence to those sites for which we had polymorphism data. Since this varies slightly between E. coli and S. enterica (i.e., a gene may be present in both species, but the exact length of alignable sequence might differ between species), comparisons in the 2 directions between the species yielded slightly different estimates for the numbers of synonymous and nonsynonymous substitutions (Dn and Ds) for some genes.
We tested for recombination within the E. coli and S. enterica strains separately using Maynard Smith's "maxchi" test (Maynard Smith 1992). We ran the test with the adjustment suggested by Piganeau and Eyre-Walker (2004), in which chi-square values were ignored if they were generated by contingency tables with any expected value less than 2. This prevents very high chi-square values being produced when there is little data. Maynard Smith's test was originally designed to test for recombination in bacteria and has proved to be one of the most powerful tests of recombination in general (Posada and Crandall 2001). The maxchi test will detect both recombinations between strains within the sample being considered and between strains from different species. We differentiated these 2 possibilities by testing whether the level of polymorphism was different either side of the breakpoint; we tested this using a chi-square test of independence where the cell entries in the 2 x 2 table are number of polymorphic and nonpolymorphic sites before the breakpoint and number of polymorphic and nonpolymorphic sites after the breakpoint. The breakpoint was considered to be midway between the single-nucleotide polymorphisms (SNPs) flanking the breakpoint. We considered it likely that the recombinant was from outside the set of strains if the chi square was significant at the 5% level.
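The breakpoint comparison can be illustrated with a minimal Python sketch of a 2 x 2 chi-square test of independence. It assumes the uncorrected Pearson statistic (no continuity correction, which the text does not specify) and uses the standard 5% critical value of 3.841 for 1 degree of freedom. The function names and example counts are hypothetical.

```python
CRITICAL_5PCT_1DF = 3.841  # chi-square critical value, 1 degree of freedom


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]] without correction."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def diversity_differs(poly_before, nonpoly_before, poly_after, nonpoly_after):
    """True if polymorphism levels differ across the breakpoint at the 5% level."""
    stat = chi_square_2x2(poly_before, nonpoly_before, poly_after, nonpoly_after)
    return stat > CRITICAL_5PCT_1DF
```

For instance, 30 polymorphic and 70 nonpolymorphic sites before a breakpoint against 5 and 95 after it gives a statistic of about 21.6, which would be read as evidence that the recombinant fragment came from outside the set of strains.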
The proportion of amino acid substitutions due to positive selection, α, was calculated using the maximum likelihood method of Bierne and Eyre-Walker (2004). We investigated whether there was evidence for variation in α using 2 approaches. First, we tested for correlations between α and the rate of amino acid evolution (dN), codon bias, or functional category. Second, we applied the method of Bierne and Eyre-Walker (2004), using software written by Welch (2006), to investigate whether models which allowed α to vary fit the data better than models that assumed a constant value of α for all genes. To investigate the relationship between dN and α, we split each gene into 2 equal-sized sets of mutually exclusive codons by randomly selecting codons without replacement. One set was used to estimate dN, whereas the other set was used to estimate Dn and α. By using different sets of codons to measure dN and α, we ensured that α and dN were statistically independent and that any correlation between the 2 was not due to using the same sites to measure dN and the Dn count used to calculate α. To reduce sampling error, we divided the data set into 9 groups in order of increasing dN, each containing an approximately equal number of genes. Within each group, we summed Dn, Ds, Pn, and Ps and used these summed counts to calculate α using the second set of codons.
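α was calculated by the simple equation:
$$\alpha = 1 - \frac{D_s P_n}{D_n P_s} \qquad (1)$$
The rate of nonsynonymous substitution was divided into a part due to adaptive evolution and a part due to neutral evolution:
$$dN_{adaptive} = \alpha \, dN \qquad (2a)$$
$$dN_{neutral} = (1 - \alpha) \, dN \qquad (2b)$$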
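As an illustration of the calculation, the following Python sketch computes α from summed counts. It assumes that equation (1) takes the standard McDonald-Kreitman form α = 1 − (Ds Pn)/(Dn Ps) and that equations (2a) and (2b) partition dN as α dN and (1 − α) dN. The function names and example counts are hypothetical and are not taken from the data set.

```python
def alpha_from_counts(dn, ds, pn, ps):
    """Proportion of adaptive substitutions from summed Dn, Ds, Pn, Ps."""
    if dn == 0 or ps == 0:
        raise ValueError("Dn and Ps must be nonzero")
    return 1.0 - (ds * pn) / (dn * ps)


def partition_dn(dn_rate, alpha):
    """Split a per-site rate dN into (adaptive, neutral) components."""
    return alpha * dn_rate, (1.0 - alpha) * dn_rate
```

With hypothetical summed counts Dn = 100, Ds = 200, Pn = 10, and Ps = 40, the neutral expectation for Dn is Ds × Pn/Ps = 50, so half of the observed nonsynonymous substitutions are attributed to positive selection (α = 0.5).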
We performed a similar analysis to test for a correlation between codon usage bias and rates of adaptive evolution. Codon usage bias was measured using the codon adaptation index (CAI) (Sharp and Li 1987). Optimal codons were assumed to be the same in both species, and we used the codon fitness values defined for a set of E. coli genes by Sharp and Li (1987). Genes were divided into 17 groups of approximately equal size by ascending CAI values; α was calculated for each group using the summation method given above.
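The grouping and summation step can be sketched as follows. The sketch assumes each gene is a dictionary holding its CAI and its Dn, Ds, Pn, and Ps counts, and that groups are formed by sorting and cutting into near-equal slices; the field names, function names, and slicing rule are assumptions for illustration.

```python
def split_into_groups(genes, key, n_groups):
    """Sort genes by key and cut them into n_groups of near-equal size."""
    ordered = sorted(genes, key=key)
    base, extra = divmod(len(ordered), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)  # first `extra` groups get one more
        groups.append(ordered[start:start + size])
        start += size
    return groups


def pooled_alpha(group):
    """Alpha from counts summed over all genes in a group."""
    dn = sum(gene["Dn"] for gene in group)
    ds = sum(gene["Ds"] for gene in group)
    pn = sum(gene["Pn"] for gene in group)
    ps = sum(gene["Ps"] for gene in group)
    return 1.0 - (ds * pn) / (dn * ps)
```

Under this slicing rule, 410 genes split into 17 groups gives two groups of 25 genes and fifteen groups of 24. Summing counts before computing α avoids undefined per-gene values when a single gene has no synonymous polymorphisms.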
To investigate the relationship between α and gene function, we divided our genes into functional categories by searching the database of information on E. coli genes available at http://coli.berkeley.edu/cgi-bin/ecoli/coli_entry.pl. We used the functional categories defined by Riley and Labedan (1996). Genes were sorted by functional category, and categories containing 20 or more genes were investigated further. In order to control for the correlation between α and codon bias, we first looked at the mean CAI of each group; analysis of variance (ANOVA) was used to test whether any group had a significantly different CAI. To test for variation in α among genes belonging to different functional groups, we compared the variance in α between different groups of genes with the variance obtained by randomly permuting genes between groups, preserving the same number of genes in each group as in the real functional categories.
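A permutation test of this kind can be sketched in Python as below. The sketch assumes that α for each category is computed from counts summed over its genes, that each gene is a (Dn, Ds, Pn, Ps) tuple, and that the P value is the fraction of label shuffles whose between-group variance is at least as large as the observed one. The function names, number of permutations, and seed are hypothetical choices.

```python
import random
from statistics import pvariance


def pooled_alpha(genes):
    """Alpha from counts summed over (Dn, Ds, Pn, Ps) tuples."""
    dn = sum(g[0] for g in genes)
    ds = sum(g[1] for g in genes)
    pn = sum(g[2] for g in genes)
    ps = sum(g[3] for g in genes)
    return 1.0 - (ds * pn) / (dn * ps)


def variance_between_groups(genes, labels):
    """Variance of pooled alpha across the groups defined by labels."""
    groups = {}
    for gene, label in zip(genes, labels):
        groups.setdefault(label, []).append(gene)
    return pvariance([pooled_alpha(members) for members in groups.values()])


def permutation_test(genes, labels, n_permutations=1000, seed=1):
    """Return (observed variance, permutation P value)."""
    rng = random.Random(seed)
    observed = variance_between_groups(genes, labels)
    shuffled = list(labels)  # shuffling labels keeps every group size fixed
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if variance_between_groups(genes, shuffled) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_permutations + 1)
```

Shuffling the category labels rather than the genes is what preserves the number of genes per category, as the procedure requires.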
| Results |
Data
We extracted all the protein-coding sequences which were readily identifiable as being common to 6 E. coli and 6 S. enterica genomes (i.e., present in all 12 genomes under consideration). This yielded 410 genes, about one-tenth of the genes present in E. coli K12. The number of genes in our data set is much smaller than the total number of genes in any individual microbial genome, for several reasons. First, many genes are species specific; the fraction of S. enterica LT2 genes having a homologue in 8 other enterobacterial genomes (including several E. coli and S. enterica strains) is estimated to be only ~55% (McClelland et al. 2001
Species
The McDonald-Kreitman framework relies on the distinction between inter- and intraspecies processes because neutral and advantageous mutations behave differently in these 2 settings: advantageous mutations contribute relatively more to interspecies differences (i.e., substitutions) than they do to intraspecies differences (i.e., polymorphism) when compared to neutral mutations. However, species boundaries are notoriously difficult to infer in prokaryotes, although species, in a population genetic sense, do exist; that is, there are collections of strains which undergo genetic drift together (Hey 2001). To proceed, we took a pragmatic approach to the analysis and determined levels of synonymous divergence between our strains by concatenating 200 randomly selected genes and constructing a phylogenetic tree. The strains are closely related to each other within both E. coli and S. enterica; the maximum synonymous divergence between any 2 strains is less than 10% within both species (fig. 1). Furthermore, the average values of Watterson's θ across all genes are 0.05 and 0.02 for S. enterica and E. coli, respectively, whereas the average synonymous divergence as measured by dS is 0.76. The values of Watterson's θ are very similar to those estimated in several Drosophila species (Moriyama and Powell 1997; Andolfatto 2001). It is therefore plausible, though by no means proved, that the E. coli and S. enterica strains used here act as distinct species in the population genetic sense.
To further investigate whether the strains were part of a single species, we tested for recombination between them. In both E. coli and S. enterica, we observed substantial numbers of genes that had apparently undergone recombination (104/334 = 31% of genes in E. coli and 64/276 = 23% of genes in S. enterica were significant at the 5% level using the maxchi test) (table 1). In 85% of E. coli genes and 82% of S. enterica genes in which we had evidence of recombination, there was no evidence of an increase in the diversity in the recombinant region, suggesting that recombination is occurring between strains within a species and not between species. Almost all strains in both E. coli and S. enterica appear to have exchanged genetic material; however, a couple of pairs appear to have undergone more recombination than others, for example, E. coli strains K12 and O157:H7 (table 1). The presence of recombination does not prove that the strains form species at all loci, but it does suggest that the strains are acting as a single unit. It should also be appreciated that we will underestimate the level of adaptive evolution if some of the strains are actually separate species; this is because we will count some substitutions as polymorphisms.
Slightly Deleterious Mutations
Many nonsynonymous SNPs appear to be slightly deleterious in bacterial species because they segregate at lower frequencies than synonymous SNPs (Hughes 2005
Estimating α
It is possible to partially remove the effects of these slightly deleterious mutations by removing mutations segregating at low frequencies (Fay et al. 2001
excluding low-frequency variants, we find that α increases as the frequency of the polymorphisms being considered increases. Unfortunately, there is no tendency for the α value to approach an asymptote as the frequency of the mutations being considered increases (table 3). This is not unexpected when the number of alleles sampled is small. Nevertheless, it makes it difficult to estimate the rate of adaptive evolution. The best we can do is to estimate a lower bound for the rate of adaptive evolution using SNPs from the highest frequency class, which in this case is tripletons. Using polymorphism data from E. coli, this is 56%, with a lower confidence interval (CI) of 45%. Using polymorphism data from S. enterica, it is 34%, with a lower CI of 14%. Because slightly deleterious nonsynonymous mutations seem to be more prevalent in S. enterica than in E. coli, we take the estimate of α using the E. coli polymorphism data as our best minimum estimate.
Variation in α Across the Genome
It has been suggested that adaptive evolution may be more prevalent in fast evolving genes (Fay et al. 2002
is shared across genes, rather than models where α is allowed to vary according to a specified distribution. Here, we investigated variation in α by testing whether α was correlated with a number of parameters and by model comparisons. To test whether α correlates with the rate of evolution, as suggested by Fay et al. (2002)
= 0.618). One set of codons was used to estimate α, dNadaptive and dNneutral, and the other set was used to estimate dN; in this way we made our measures of dN and our measures of α, dNadaptive and dNneutral, statistically independent. To reduce sampling errors, we grouped genes into 9 classes of roughly equal size and used both doubleton and tripleton polymorphisms from E. coli to estimate the level of adaptive evolution; the results were qualitatively similar if we just used tripletons. Not surprisingly, dNadaptive and dNneutral were correlated with dN (Spearman's ρ = 0.972 for dNadaptive and 0.968 for dNneutral); however, there was no significant correlation between dN and α (Spearman's ρ = 0.167, P = 0.668; fig. 2).
Previous studies have found that the rate of nonsynonymous substitution is correlated with codon bias in a variety of organisms (Sharp 1991
ρ = 0.674, P = 0.001 and ρ = 0.783, P = 0.000 for E. coli and S. enterica, respectively). To investigate the relationship further, we divided the rate of nonsynonymous substitution into 2 components, a part apparently due to positive selection (dNadaptive) and a part apparently due to neutral evolution (dNneutral); it should be appreciated that dNadaptive is probably underestimated, and dNneutral overestimated, because of the segregation of slightly deleterious nonsynonymous mutations. As described above, we divided the data into 17 groups of genes, this time by their CAI value, using doubleton and tripleton polymorphisms from E. coli to estimate the level of adaptive evolution. We found that dNneutral was significantly negatively correlated with codon bias (ρ = −0.594, P = 0.006), whereas dNadaptive and α were not significantly correlated (ρ = 0.402, P = 0.079). The negative correlation between dNneutral and CAI results in a positive correlation between α and CAI (ρ = 0.596, P = 0.012, fig. 3). This correlation could potentially arise through selection on synonymous codon use: as selection on synonymous codon use gets stronger, codon bias increases, the ratio dS/pS decreases, and α consequently increases. To investigate this further, we divided our data set of genes into 3 groups according to their CAI values and calculated the average allelic frequency of synonymous and nonsynonymous SNPs in genes with high and low codon bias (table 2). As expected, α varies between these groups (low = 0.39, med = 0.59, high = 0.71). However, although synonymous polymorphisms segregate at slightly lower allelic frequencies in high than low CAI genes, as expected, the frequency of synonymous mutations is not significantly lower than that of nonsynonymous mutations in the groups with high or low CAI in either species (table 2). This suggests that while selection against synonymous polymorphisms is stronger in high bias genes, there is also stronger selection on nonsynonymous polymorphisms.
Finally, we tested whether α varied between genes with different functions by grouping the E. coli genes according to functional category. All our genes belonged to the general category of housekeeping genes, genes of unknown function excepted (Riley and Labedan 1996
and CAI, we first investigated the variation in CAI between functional groups. We find a significant difference in CAI between genes belonging to different functional groups (table 4; ANOVA, F = 27.24, P = 0). Tukey's test showed ribosomal components to be the significantly different group, so we excluded this group from our investigation of variation in α. One might intuitively expect this group of genes to have a high CAI; ribosomal components are possibly the most essential of genes and have long been known to be highly expressed and have high codon bias in bacteria (Gouy and Gautier 1982
for each of the remaining functional categories represented by our data and found no variation between categories (figure 4).
| Discussion |
We estimate that at least 50% of amino acid substitutions have been driven by positive selection between the enteric bacteria E. coli and S. enterica in our data set of 410 genes. Unfortunately, we can only estimate a lower bound because some of the nonsynonymous polymorphisms appear to be slightly deleterious, and it is difficult to remove their effects when so few alleles have been sampled. If the sequences had been sampled randomly, then we might have been able to estimate the strength of selection acting upon these polymorphisms and hence estimate α more accurately (see Fay et al. [2001]
Artifacts
Although highly statistically significant, the evidence of adaptive evolution could be artifactual for 2 reasons. First, it could be caused by a combination of slightly deleterious nonsynonymous mutations and expanding population size. It is difficult to rule out this possibility completely. Although E. coli and S. enterica are found in a broad variety of mammals and other animals (Selander et al. 1996), many of the complete genome sequences appear to be of human-specific pathogens (e.g., E. coli O157:H7; CFT073; S. enterica Typhi; Choleraesuis). Their population sizes may therefore have expanded with human populations. However, although the human census population size has increased, estimates of the effective population size in humans are either very similar to or smaller than estimates in other hominids, including chimpanzees (Eyre-Walker et al. 2002), gorillas (Yu et al. 2004), and the ancestor of humans and chimpanzees (Rannala and Yang 2003). This suggests that most recent demographic changes have not increased the effective population size of humans greatly. So unless the diversity in E. coli and S. enterica is much younger than the diversity in humans, we would also not expect the diversity in them to reflect recent increases in human population sizes. The nucleotide mutation rate has been estimated to be ~5 × 10^-10 in E. coli (Drake 1991). If we accept that the E. coli strains we have analyzed belong to the same species, then the effective population size is estimated to be 50 × 10^6 using the estimate of synonymous diversity given above. A similar calculation yields an estimate of 20 × 10^6 for S. enterica. If we accept that E. coli goes through a generation every 2 days (Savageau 1983; Selander et al. 1987), then the diversity in E. coli is on average 275,000 years old. Thus, the diversity in E. coli is older than the diversity in human mitochondria but a little younger than the age of diversity in nuclear genes (Jorde et al. 1998). We currently do not know the average generation time for S. enterica and so cannot estimate the age of the diversity.
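The arithmetic behind these estimates can be sketched as follows. The sketch assumes the haploid relation θ = 2Neμ, a mutation rate of 5 × 10^-10 per site per generation, and an age of diversity of roughly Ne generations. These assumptions are one reading that reproduces the quoted figures, not a statement of the exact formulas used, and the function names are hypothetical.

```python
def effective_population_size(theta, mu):
    """Ne under the haploid neutral expectation theta = 2 * Ne * mu."""
    return theta / (2.0 * mu)


def diversity_age_years(ne, generation_days):
    """Approximate age of diversity, taken here as Ne generations."""
    return ne * generation_days / 365.0


MU = 5e-10  # per site per generation (assumed reading of the quoted rate)

ne_high = effective_population_size(0.05, MU)  # about 50 x 10^6
ne_low = effective_population_size(0.02, MU)   # about 20 x 10^6
age = diversity_age_years(ne_high, generation_days=2)
```

Under these assumptions, synonymous diversity values of 0.05 and 0.02 correspond to effective population sizes of 50 × 10^6 and 20 × 10^6, and 50 × 10^6 generations of 2 days each amounts to roughly 274,000 years, in line with the figure of about 275,000 years.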
Second, the evidence of adaptive evolution could be due to the systematic underestimation of dS either because synonymous sites are approaching saturation between E. coli and S. enterica or because there is substantial variation in the strength of selection acting upon synonymous sites. It is difficult to rule out this possibility completely; however, if we divide our data set into 2 halves according to dS (dS < 0.75, 191 genes and dS > 0.75, 219 genes), we find the estimates of α to be almost identical in the 2 data sets: 0.51 (0.31, 0.78) and 0.58 (0.45, 0.72), respectively. And yet the mean and median values of dS are substantially different between the 2 data sets: 0.438 and 0.516 for dS < 0.75 and 1.06 and 1.03 for dS > 0.75. If α is an overestimate because dS has been underestimated, it would seem remarkable that this bias is the same in genes which have different values of dS.
Comparisons with Other Species
Although we can only set a lower limit on the proportion of substitutions driven by adaptive evolution, it does appear to be higher than that found in either primates (0% to ~35%; Fay et al. 2001; Chimpanzee Sequencing and Analysis Consortium 2005; Zhang and Li 2005) or Drosophila (~25%; Bierne and Eyre-Walker 2004); the very high level of adaptive substitution (~94%) in Drosophila inferred by Sawyer et al. (2003) may have been a consequence of the method they used; their method assumed that the amino acid mutations contributing to polymorphism and divergence are drawn from the same normal distribution. This may not be the case. The high proportion of adaptive substitution we have found in E. coli and S. enterica is possibly not surprising given the very large effective population sizes of these bacteria: estimates (see above) of their effective population sizes are at least an order of magnitude greater than Drosophila (Begun and Aquadro 1993; Andolfatto 2001) and 3 orders of magnitude greater than primates (Eyre-Walker et al. 2002; Rannala and Yang 2003; Yu et al. 2004). Such large effective population sizes will decrease the probability that slightly deleterious mutations are fixed and increase the probability that advantageous mutations are fixed.
Genomic Rates
Bacteria can potentially adapt to their environment in 2 distinct ways; evolution can occur via mutation in the genes they already have, or they may acquire novel genes through horizontal transfer events (Ochman et al. 2000). The full sequences of bacterial genomes have made it evident that many genes are acquired and lost during bacterial evolution: many genes are not shared between strains belonging to a single species (e.g., ~500 genes found in E. coli K12 are not present in E. coli O157:H7) (Perna et al. 2001). Much of this difference in gene content may be adaptive. We have estimated the level of adaptive evolution for a set of genes which are present in all the E. coli and S. enterica genomes we surveyed; one might think of these as the core genes. On average, these core genes are ~310 amino acids long (data used here and Blattner et al. [1997]) and differ between E. coli and S. enterica by ~0.05 amino acid substitutions per codon. So given that there are 4,288 genes in E. coli K12 (Blattner et al. 1997) and a minimum of 55% of those are present in S. enterica, we estimate that these 2 species differ by at least 37,000 amino acid differences. Because we estimate that at least 50% of these helped either E. coli or S. enterica adapt to their environment, this means they differ by a minimum of 18,500 adaptive amino acid substitutions in these core genes. It has been estimated that these 2 species diverged ~100 MYA, which means they have gone through at least one adaptive substitution every 11,000 years. This rate of adaptive substitution is far lower than that in Drosophila (1 every 45 years [Smith and Eyre-Walker 2002]); this is possibly due to the smaller genome size of these bacteria.
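The genomic-rate arithmetic above can be checked with a short Python sketch. The inputs are the values quoted in the text; counting elapsed time along both lineages (2 × 100 million years) is an assumption made here because it reproduces the figure of one adaptive substitution every 11,000 years. The variable and function names are hypothetical.

```python
GENES_K12 = 4288            # genes in E. coli K12
SHARED_FRACTION = 0.55      # minimum fraction also present in S. enterica
MEAN_LENGTH_CODONS = 310    # average core gene length
AA_SUBS_PER_CODON = 0.05    # amino acid substitutions per codon
ALPHA_MIN = 0.5             # minimum proportion of adaptive substitutions
DIVERGENCE_YEARS = 100e6    # time since the 2 species diverged


def total_aa_differences():
    """Expected amino acid differences across the shared genes."""
    return GENES_K12 * SHARED_FRACTION * MEAN_LENGTH_CODONS * AA_SUBS_PER_CODON


def years_per_adaptive_substitution(n_adaptive, lineages=2):
    """Years per adaptive substitution, summing time over both lineages."""
    return lineages * DIVERGENCE_YEARS / n_adaptive


total = total_aa_differences()       # about 36,555, quoted as ~37,000
adaptive = ALPHA_MIN * 37000         # 18,500, using the rounded total
interval = years_per_adaptive_substitution(adaptive)
```

With the rounded total of 37,000 differences, the interval comes to roughly 10,800 years, which the text reports as one adaptive substitution every 11,000 years.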
Variation in α
We have also investigated whether α varies between genes. Surprisingly, the proportion of nonsynonymous substitutions which are adaptive does not seem to be correlated to the overall rate of nonsynonymous substitution. This is contrary to the results of Fay et al. (2002), who found that α was higher in faster evolving Drosophila genes, but agrees with the results of Bierne and Eyre-Walker (2004), who found no evidence of variation in α between genes, also in Drosophila. The lack of a correlation between α and dN might be due to the fact that our data set is biased toward conserved "core" genes; however, there is more than an order of magnitude variation in dN within our data.
Although we found no correlation between α and dN, we did find a highly significant correlation between α and the level of codon bias. This seems largely to be a consequence of a correlation between dNneutral and codon bias rather than dNadaptive. This is as expected if the correlation between dN and codon bias is a result of selection on translational accuracy because dNneutral is probably a better indicator of how many of the amino acid sites are critical for function and therefore how strong selection for translational accuracy is likely to be. Adaptive substitutions might occur at both critical and noncritical sites. If this interpretation is correct, then this means that although codon bias might give us some information about the rate of amino acid substitution, it is not likely to be useful in identifying genes undergoing a high rate of adaptive evolution as suggested by Plotkin et al. (2004). We also find no evidence that α varies between genes of different function, with the caveat that the majority of genes represented by our data set are core genes.
Although Bierne and Eyre-Walker (2004) found no evidence of variation in α in Drosophila, it remained unclear whether this was a genuine result or a failure of power. Our bacterial data allowed us to investigate the matter further because we found a significant correlation between α and codon bias; that is, evidence that there is significant variation in α between genes within these bacteria. When we ran the method of Bierne and Eyre-Walker (2004) on our data set, we found no evidence of variation in α. Models in which α was beta distributed converged upon a single spike with very little variance and were not favored over models in which α was assumed to be constant across genes. Likewise, a constant α model was favored over a model in which 2 categories of genes were assumed to have different α values; this model again converged on a single spike where α was extremely close to our maximum likelihood estimate. This unfortunately suggests that Bierne and Eyre-Walker's result in Drosophila must be viewed with caution. Furthermore, it implies that it will generally be difficult to test for heterogeneity in α, unless correlates of α can be found. One option may be to constrain the model further as Bustamante et al. (2005) have done.
| Acknowledgements |
|---|
The authors are grateful for the comments of several referees, John Welch, Brian Charlesworth, and Deborah Charlesworth. The authors are supported by the Biotechnology and Biological Sciences Research Council and the National Evolutionary Synthesis Center.
| Footnotes |
|---|
William Martin, Associate Editor
| References |
|---|
Akashi H. 1994. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136:927-35.
Akey JM, Eberle MA, Rieder MJ, Carlson CS, Shriver MD, Nickerson DA, Kruglyak L. 2004. Population history and natural selection shape patterns of genetic variation in 132 genes. PLoS Biol 2:e286.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389-402.
Andolfatto P. 2001. Contrasting patterns of X-linked and autosomal nucleotide variation in Drosophila melanogaster and Drosophila simulans. Mol Biol Evol 18:279-90.
Begun D, Aquadro CF. 1993. African and North American populations of Drosophila melanogaster are very different at the DNA level. Nature 365:548-50.
Betancourt AJ, Presgraves DC. 2002. Linkage limits the power of natural selection in Drosophila. Proc Natl Acad Sci USA 99:13616-20.
Bierne N, Eyre-Walker A. 2004. The genomic rate of adaptive amino acid substitution in Drosophila. Mol Biol Evol 21:1350-60.
Blattner FR, Plunkett G III, Bloch CA et al. (17 co-authors). 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453-74.
Bustamante CD, Fledel-Alon A, Williamson SH et al. (14 co-authors). 2005. Natural selection on protein-coding genes in the human genome. Nature 437:1153-57.
Charlesworth B. 1994. The effect of background selection against deleterious mutations on weakly selected, linked variants. Genet Res 63:213-27.
Chimpanzee Sequencing and Analysis Consortium. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69-87.
Clark AG, Glanowski S, Nielsen R et al. (17 co-authors). 2003. Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 302:1960-3.
Drake JW. 1991. A constant rate of spontaneous mutation in DNA-based microbes. Proc Natl Acad Sci USA 88:7160-4.
Dunn KA, Bielawski JP, Yang Z. 2001. Substitution rates in Drosophila nuclear genes: implications for translational selection. Genetics 157:295-305.
Endo T, Ikeo K, Gojobori T. 1996. Large-scale search for genes on which positive selection may operate. Mol Biol Evol 13:685-90.
Eyre-Walker A, Keightley PD, Smith NG, Gaffney D. 2002. Quantifying the slightly deleterious mutation model of molecular evolution. Mol Biol Evol 19:2142-9.
Fay J, Wyckoff GJ, Wu C-I. 2001. Positive and negative selection on the human genome. Genetics 158:1227-34.
Fay J, Wyckoff GJ, Wu C-I. 2002. Testing the neutral theory of molecular evolution with genomic data from Drosophila. Nature 415:1024-6.
Glinka S, Ometto L, Mousset S, Stephan W, De Lorenzo D. 2003. Demography and natural selection have shaped genetic variation in Drosophila melanogaster: a multi-locus approach. Genetics 165:1269-78.
Gouy M, Gautier C. 1982. Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res 10:7055-74.
Hey J. 2001. Genes, categories, and species. New York: Oxford University Press.
Hughes AL. 2005. Evidence for abundant slightly deleterious polymorphisms in bacterial populations. Genetics 169:533-8.
Jorde LB, Bamshad M, Rogers AR. 1998. Using mitochondrial and nuclear DNA markers to reconstruct human evolution. Bioessays 20:126-36.
Kumar S, Tamura K, Nei M. 2004. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform 5:150-63.
Maynard Smith JM. 1992. Analyzing the mosaic structure of genes. J Mol Evol 34:126-9.
Maynard Smith JM, Smith NH, O'Rourke M, Spratt BG. 1993. How clonal are bacteria? Proc Natl Acad Sci USA 90:4384-8.
McClelland M, Sanderson KE, Spieth J et al. (26 co-authors). 2001. Complete genome sequence of Salmonella enterica serovar Typhimurium LT2. Nature 413:852-6.
McDonald JH, Kreitman M. 1991. Adaptive evolution at the Adh locus in Drosophila. Nature 351:652-4.
Moriyama EN, Powell JR. 1997. Synonymous substitution rates in Drosophila: mitochondrial versus nuclear genes. J Mol Evol 45:378-91.
Nei M. 2005. Selectionism and neutralism in molecular evolution. Mol Biol Evol 22:2318-42.
Nei M, Gojobori T. 1986. Simple methods for estimating the number of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol 3:418-26.
Nielsen R. 2001. Statistical tests of selective neutrality in the age of genomics. Heredity 86:641-7.
Ochman H, Lawrence JG, Groisman EA. 2000. Lateral gene transfer and the nature of bacterial innovation. Nature 405:299-304.
Orengo DJ, Aguade M. 2004. Detecting the footprint of positive selection in a European population of Drosophila melanogaster: multilocus pattern of variation and distance to coding regions. Genetics 167:1759-66.
Pal C, Papp B, Hurst LD. 2001. Highly expressed genes in yeast evolve slowly. Genetics 158:927-31.
Perna NT, Plunkett G III, Burland V et al. (28 co-authors). 2001. Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 409:529-33.
Piganeau G, Gardner MJ, Eyre-Walker A. 2004. A broad survey of recombination in animal mitochondrial DNA. Mol Biol Evol 21:2319-25.
Plotkin JB, Dushoff J, Fraser HB. 2004. Detecting selection using a single genome sequence of M. tuberculosis and P. falciparum. Nature 428:942-5.
Posada D, Crandall KA. 2001. Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc Natl Acad Sci USA 98:13757-62.
Rambaut A. 1996. Se-Al: Sequence Alignment Editor. http://www.evolve.zoo.ox.ac.uk. Accessed Jan 10, 2004.
Rannala B, Yang Z. 2003. Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics 164:1645-56.
Riley M, Labedan B. 1996. Escherichia coli gene products: physiological properties and common ancestries. In: Neidhardt FC, editor. Escherichia coli and Salmonella typhimurium: cellular and molecular biology. Washington, DC: ASM Press. p 2118-202.
Rocha EP, Danchin A. 2004. An analysis of determinants of amino acids substitution rates in bacterial proteins. Mol Biol Evol 21:108-16.
Ronald J, Akey JM. 2005. Genome-wide scans for loci under selection in humans. Hum Genomics 2:113-25.
Rozas J, Rozas R. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174-5.
Savageau MA. 1983. Escherichia coli habitats, cell types and molecular mechanisms of gene control. Am Nat 122:732-44.
Sawyer SA, Kulathinal RJ, Bustamante CD, Hartl DL. 2003. Bayesian analysis suggests that most amino acid replacements in Drosophila are driven by positive selection. J Mol Evol 57(Suppl 1):S154-64.
Selander RK, Caugant DA, Whittam TS. 1987. Genetic structure and variation in natural populations of Escherichia coli. In: Neidhardt FC, editor. Escherichia coli and Salmonella typhimurium: cellular and molecular biology. Washington, DC: ASM Press. p 2708-20.
Selander RK, Li J, Nelson K. 1996. Evolutionary genetics of Salmonella enterica. In: Neidhardt FC, editor. Escherichia coli and Salmonella typhimurium: cellular and molecular biology. Washington, DC: ASM Press. p 2691-707.
Sharp PM. 1991. Determinants of DNA sequence divergence between Escherichia coli and Salmonella typhimurium: codon usage, map position, and concerted evolution. J Mol Evol 33:23-33.
Sharp PM, Li W-H. 1987. The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias. Mol Biol Evol 4:222-30.
Smith NGC, Eyre-Walker A. 2002. Adaptive protein evolution in Drosophila. Nature 415:1022-4.
Stoletzki N, Welch J, Hermisson J, Eyre-Walker A. 2005. A dissection of volatility in yeast. Mol Biol Evol 22:2022-6.
Watterson GA. 1975. On the number of segregating sites. Theor Popul Biol 7:256-76.
Welch JJ. 2006. Estimating the genome-wide rate of adaptive protein evolution in Drosophila. Genetics. Forthcoming.
Welch RA, Burland V, Plunkett G III et al. (19 co-authors). 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci USA 99:17020-4.
Wright SI, Bi IV, Schroeder SG, Yamasaki M, Doebley JF, McMullen MD, Gaut BS. 2005. The effects of artificial selection on the maize genome. Science 308:1310-4.
Yang Z. 2002. Phylogenetic analysis by maximum likelihood (PAML), version 3.13. United Kingdom: University College.
Yu N, Jensen-Seaman MI, Chemnick L, Ryder O, Li WH. 2004. Nucleotide diversity in gorillas. Genetics 166:1375-83.
Zhang L, Li W-H. 2005. Human SNPs reveal no evidence of frequent positive selection. Mol Biol Evol 22:2504-07.