MBE Advance Access originally published online on September 29, 2005
Molecular Biology and Evolution 2006 23(2):240-244; doi:10.1093/molbev/msj026
Research Article |
Apparent Trends of Amino Acid Gain and Loss in Protein Evolution Due to Nearly Neutral Variation
Department of Biological Sciences, University of Delaware
E-mail: mcdonald{at}udel.edu.
| Abstract |
|---|
|
|
|---|
It has recently been claimed that certain amino acids have been increasing in frequency in all living organisms for most of the history of life on earth, while other amino acids have been decreasing in frequency. Three lines of evidence have been offered for this assertion, but each has a more plausible alternative interpretation. Here I show that unequal patterns of gains and losses for particular pairs of amino acids (such as more leucine → phenylalanine than phenylalanine → leucine substitutions in humans and chimpanzees since they split from a common ancestor) are consistent with a simple neutral model at equilibrium amino acid frequencies. Unequal numbers of gains and losses for particular amino acids (such as more gains than losses of cysteine) are shown by simulations to be consistent with a model of nearly neutral evolution. Unequal numbers of gains and losses for particular amino acids in human polymorphism data are shown by simulations to be explainable by the nearly neutral model as well. In a comparison of protein sequences from four strains of Escherichia coli, polarized by one outgroup strain of Salmonella, the disparity in number of gains and losses for particular amino acids is strong in terminal branches but weaker or nonexistent in internal branches, which is inconsistent with the universal trend model but as expected under the nearly neutral model.
Key Words: protein evolution, Markov chain, nearly neutral model, Escherichia coli
| Introduction |
|---|
|
|
|---|
Jordan et al. (2005)
| Substitutional Asymmetry of Pairs of Amino Acids |
|---|
|
|
|---|
Jordan et al. (2005)
In a two-allele system at equilibrium, there must be equal numbers of substitutions in each direction; a significant difference in the number of substitutions would indeed indicate that the allele frequencies were changing. However, there are 20 possible amino acids at any site. For a site with more than two possible alleles, equal numbers of substitutions in each direction would only be expected if the evolutionary process were a time-reversible Markov process with stationary amino acid frequencies (Liò and Goldman 1998). In a reversible process, by definition the number of A → B substitutions per unit time is equal to the number of B → A substitutions, so that the process would look the same whether time was running forward or backward. Reversibility is usually assumed when inferring a matrix of substitution rates from pairwise comparisons of sequences (Dayhoff, Schwartz, and Orcutt 1978; Jones, Taylor, and Thornton 1992; S. Henikoff and J. G. Henikoff 2000; Müller and Vingron 2000; Goldman and Whelan 2002; Veerassamy, Smith, and Tillier 2003; Kosiol and Goldman 2005). However, this assumption is made purely for mathematical convenience; there is no biological evidence for reversibility (Liò and Goldman 1998).
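For reference, the difference between stationarity and reversibility discussed above can be written compactly. The symbols $\pi_i$ (equilibrium frequency of amino acid $i$) and $q_{ij}$ (substitution rate from $i$ to $j$) are notation assumed for this sketch and do not appear in the original text.

```latex
\begin{align*}
\text{stationarity:}\quad & \sum_{j \neq i} \pi_j \, q_{ji} \;=\; \pi_i \sum_{j \neq i} q_{ij}
  && \text{for every amino acid } i,\\
\text{reversibility:}\quad & \pi_i \, q_{ij} \;=\; \pi_j \, q_{ji}
  && \text{for every pair } i \neq j.
\end{align*}
```

Reversibility implies stationarity, but with more than two states the converse need not hold: total flux into and out of each amino acid can balance even when the fluxes between a particular pair do not. With only two states the two conditions coincide.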
With more than two alleles, it is easy to create neutral models of protein evolution that are not reversible. For example, consider a three-allele model in which the substitution rate from B to C ($P_{BC}$) is $4 \times 10^{-5}$ substitutions per generation and $P_{AB} = P_{AC} = P_{BA} = P_{CA} = P_{CB} = 1 \times 10^{-5}$. Using this rate matrix in a Markov chain model until the amino acid frequencies become stationary, the frequencies of A, B, and C are $f_A = 0.333$, $f_B = 0.167$, and $f_C = 0.500$. Because there are twice as many A as B amino acids and $P_{AB} = P_{BA}$, there will be twice as many A → B as B → A substitutions per unit time. Clearly, differences in the numbers of forward and reverse substitutions for pairs of amino acids are consistent with a simple neutral model at equilibrium and do not necessarily indicate changing amino acid frequencies.
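The three-allele example can be checked numerically. The following is a minimal sketch, not the author's program: it iterates the chain from equal starting frequencies using the rates given in the text. The function names and the time-rescaling factor `dt` are choices made for this illustration; rescaling time speeds up convergence without changing the stationary frequencies.

```python
# Substitution rates per generation from the three-allele example in the text.
RATES = {
    ("A", "B"): 1e-5, ("A", "C"): 1e-5,
    ("B", "A"): 1e-5, ("B", "C"): 4e-5,
    ("C", "A"): 1e-5, ("C", "B"): 1e-5,
}
STATES = ("A", "B", "C")


def stationary_frequencies(rates, states, iterations=10_000):
    """Iterate the chain from equal frequencies until it settles.

    dt rescales time so each step moves a sizeable share of the probability
    mass (an assumption of this sketch, made only for speed).
    """
    out_rate = {i: sum(rates[(i, j)] for j in states if j != i) for i in states}
    dt = 0.5 / max(out_rate.values())
    freq = {i: 1.0 / len(states) for i in states}
    for _ in range(iterations):
        freq = {
            j: freq[j] + dt * (
                sum(freq[i] * rates[(i, j)] for i in states if i != j)
                - freq[j] * out_rate[j]
            )
            for j in states
        }
    return freq


def flux(freq, rates, source, target):
    """Expected substitutions per generation from source to target."""
    return freq[source] * rates[(source, target)]


FREQ = stationary_frequencies(RATES, STATES)
for state in STATES:
    print(state, round(FREQ[state], 3))
print("A->B flux:", flux(FREQ, RATES, "A", "B"))
print("B->A flux:", flux(FREQ, RATES, "B", "A"))
```

The frequencies no longer change from one step to the next, yet the A → B and B → A fluxes differ, which is the point of the example: pairwise asymmetry does not by itself imply changing composition.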
| Unequal Numbers of Gains and Losses of Amino Acids |
|---|
|
|
|---|
The second line of evidence offered by Jordan et al. (2005)
One problem with using parsimony to infer gains and losses of amino acids is that the inferences are sometimes incorrect; when an outgroup and one ingroup have amino acid B and the other ingroup has A, there may have been two A → B substitutions, not one B → A. This is particularly likely if A is more common than B; there will be more inferred common → rare than rare → common substitutions, even when the actual number of substitutions in each direction is equal (Collins, Wimberger, and Naylor 1994; Perna and Kocher 1995). Jordan et al. (2005) attempted to avoid this problem by using closely related species, but there can be substantially more inferred common → rare substitutions despite fairly small amounts of divergence (Eyre-Walker 1998).
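The three-taxon parsimony rule described above can be sketched in a few lines. This is an illustrative reading of the rule, not code from Jordan et al. (2005) or from this study; the function names and the example sites are hypothetical.

```python
from collections import Counter


def infer_substitution(outgroup, ingroup1, ingroup2):
    """Return (lost, gained) as inferred by parsimony, or None.

    The state shared by the outgroup and one ingroup is taken as ancestral;
    the differing ingroup is assumed to carry the derived state. As the text
    notes, this inference can be wrong (two parallel substitutions would
    produce the same pattern).
    """
    if ingroup1 == ingroup2:
        return None                      # no difference between the ingroups
    if outgroup == ingroup1:
        return (outgroup, ingroup2)
    if outgroup == ingroup2:
        return (outgroup, ingroup1)
    return None                          # three different states: ambiguous


def tally_gains_and_losses(sites):
    """Count inferred gains and losses per amino acid over aligned sites."""
    gains, losses = Counter(), Counter()
    for outgroup, ingroup1, ingroup2 in sites:
        inferred = infer_substitution(outgroup, ingroup1, ingroup2)
        if inferred is not None:
            lost, gained = inferred
            losses[lost] += 1
            gains[gained] += 1
    return gains, losses


# Hypothetical sites, each given as (outgroup, ingroup 1, ingroup 2).
EXAMPLE_SITES = [("L", "L", "F"), ("L", "F", "L"), ("F", "F", "F"), ("A", "L", "F")]
GAINS, LOSSES = tally_gains_and_losses(EXAMPLE_SITES)
print("gains:", dict(GAINS))
print("losses:", dict(LOSSES))
```

Each inferred substitution is counted once as a loss of the ancestral amino acid and once as a gain of the derived one, so any systematic error in polarization shifts both tallies at the same time.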
Sites which fit the nearly neutral model of protein evolution (Ohta 1992) may be particularly likely to show more inferred substitutions in one direction than the other. Consider a site at which amino acid A is favored by natural selection. When a mutation to amino acid B occurs, it is selected against. If the selection against B is weak enough, B may remain present for a while before being replaced by A again. Under this model, there is a chance that when comparing sequences from an outgroup and two ingroup species, one of the ingroups will have the mildly deleterious B, while the other two species have the preferred A. This would be interpreted as a gain of B. On the other hand, an apparent loss of B, in which the outgroup and one of the ingroup species would both have the deleterious B while only one of the ingroup species has the preferred A, would be quite unlikely; it would require either two independent substitutions of the deleterious B, one in the outgroup and one in an ingroup, or a deleterious B that survived in two lineages since the common ancestor of all three species. Under this nearly neutral model, there could be many more apparent gains of B than losses, even if the ancestral state at sites that differ among the taxa is always inferred correctly. This may seem paradoxical, but the "missing" losses of B would occur at sites where a common ancestor of two species happened to have the mildly deleterious B, which was replaced by the favored A in both daughter lineages (fig. 1).
To illustrate how nearly neutral evolution could produce a difference in the number of gains and losses, I wrote a computer program to simulate mutation, drift, and selection for a two-allele locus, with allele A being preferred and allele B being deleterious. The population size was 25 diploid individuals, and the fitnesses of genotypes AA, AB, and BB were $1$, $1 - s$, and $1 - 2s$, respectively. The A → B and B → A mutation rates, $\mu_{AB}$ and $\mu_{BA}$, were 0.0001 per generation. For each replicate, a single population was started with the initial state either fixed for A or fixed for B. The initial probability of being fixed for B, $P_B$, was determined by setting the number of A → B substitutions equal to the number of B → A substitutions, $P_A \mu_{AB} u_B = P_B \mu_{BA} u_A$, solving for $P_B$ (with $P_A = 1 - P_B$ and $\mu_{AB} = \mu_{BA}$) to give $P_B = u_B/(u_A + u_B)$, and then using equation (10) of Kimura (1962) for the fixation probabilities $u_A$ and $u_B$.
When $2Ns = 0$ (the neutral model), the equilibrium frequency of B is 0.5; as the selection coefficient against B increases, the equilibrium frequency of B declines (fig. 2). The final frequency is the same as the initial frequency, indicating that there is no trend of changing allele frequencies in these simulations. The number of gains of B initially increases and then declines as the selection coefficient against B increases (fig. 3). The increase is presumably due to the increasing frequency of A (and thus increasing frequency of sites where B can be gained). The number of losses of B declines more rapidly; as selection against B intensifies and the average frequency of B decreases, it becomes unlikely that both the outgroup and one of the ingroups would have B at the end of the simulated generations (the pattern that is interpreted as a loss of B). As a result, nearly neutral evolution produces many more apparent gains than losses of a mildly deleterious allele. Although this simple two-allele model could be made more elaborate and realistic, the results seem sufficient to demonstrate that unequal numbers of gains and losses do not necessarily indicate changing allele frequencies.
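The initial probability of being fixed for B can be sketched as follows. This is an illustration under stated assumptions rather than the author's code: it assumes that equation (10) of Kimura (1962) takes the familiar diffusion form for additive (genic) selection, that the effective population size equals the census size of 25 diploids, and that each new mutant starts as a single copy. The function names are hypothetical.

```python
import math


def fixation_probability(n_diploid, s):
    """Fixation probability of a single new mutant with additive effect s.

    Assumed form: u = (1 - exp(-4*N*s*p)) / (1 - exp(-4*N*s)), p = 1/(2N).
    Negative s describes a deleterious mutant; s = 0 gives the neutral 1/(2N).
    """
    p0 = 1.0 / (2 * n_diploid)
    if s == 0:
        return p0
    return (1 - math.exp(-4 * n_diploid * s * p0)) / (1 - math.exp(-4 * n_diploid * s))


def prob_fixed_for_b(n_diploid, s):
    """P_B = u_B / (u_A + u_B) when the two mutation rates are equal."""
    u_b = fixation_probability(n_diploid, -s)   # deleterious B invading A
    u_a = fixation_probability(n_diploid, s)    # favored A invading B
    return u_b / (u_a + u_b)


N_DIPLOID = 25
for two_ns in (0, 1, 2, 5, 10):
    s = two_ns / (2 * N_DIPLOID)
    print(two_ns, round(prob_fixed_for_b(N_DIPLOID, s), 4))
```

Under these assumptions the ratio $u_B/u_A$ simplifies to $e^{2s - 4Ns}$, so $P_B$ equals 0.5 when $s = 0$ and falls steadily as selection against B strengthens, in line with the behaviour described in the text.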
| Nearly Neutral Human Polymorphisms |
|---|
|
|
|---|
The third line of evidence for a universal trend offered by Jordan et al. (2005)
To test whether nearly neutral evolution could produce more gains than losses for polymorphism data, I simulated evolution using the same model described above. The original population evolved for 1,000 generations, then it was duplicated and the outgroup and ingroup species evolved for 1,000 more generations. If the ingroup was polymorphic, one allele was sampled at random from the outgroup to infer which allele was gained and which was lost. The simulations were replicated 10,000 times for each selection coefficient.
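The polymorphism simulation described above can be sketched as a small Wright-Fisher model. This is not the author's program: the order of events within a generation (mutation, then selection, then binomial drift), the deterministic treatment of mutation and selection, and all function and parameter names are assumptions made for illustration. The starting probability `p_b_initial` must be supplied by the caller, and the demonstration run uses far fewer replicates and generations than the text so that it finishes quickly.

```python
import random


def next_count(k, n_alleles, s, mu, rng):
    """Advance the number of B alleles by one generation."""
    q = k / n_alleles
    q = q * (1 - mu) + (1 - q) * mu                    # symmetric mutation
    p = 1 - q
    mean_w = p * p + 2 * p * q * (1 - s) + q * q * (1 - 2 * s)
    q = (p * q * (1 - s) + q * q * (1 - 2 * s)) / mean_w   # selection
    return sum(1 for _ in range(n_alleles) if rng.random() < q)  # drift


def evolve(k, generations, n_alleles, s, mu, rng):
    for _ in range(generations):
        k = next_count(k, n_alleles, s, mu, rng)
    return k


def simulate_polymorphisms(s, p_b_initial, replicates=10_000, n_diploid=25,
                           mu=1e-4, generations=1_000, seed=0):
    """Return (gains of B, losses of B) among polymorphic ingroup samples."""
    rng = random.Random(seed)
    n_alleles = 2 * n_diploid
    gains = losses = 0
    for _ in range(replicates):
        start = n_alleles if rng.random() < p_b_initial else 0
        ancestor = evolve(start, generations, n_alleles, s, mu, rng)
        outgroup = evolve(ancestor, generations, n_alleles, s, mu, rng)
        ingroup = evolve(ancestor, generations, n_alleles, s, mu, rng)
        if 0 < ingroup < n_alleles:                    # ingroup is polymorphic
            outgroup_allele_is_b = rng.random() < outgroup / n_alleles
            if outgroup_allele_is_b:
                losses += 1     # B inferred ancestral, so B is being lost
            else:
                gains += 1      # A inferred ancestral, so B was gained
    return gains, losses


# Reduced settings for a quick demonstration only.
DEMO = simulate_polymorphisms(s=0.02, p_b_initial=0.2, replicates=20, generations=200)
print("gains of B, losses of B:", DEMO)
```

With the settings given in the text (10,000 replicates and 1,000 generations per phase) a pure-Python loop like this is slow; the sketch is meant to show the bookkeeping, not to reproduce fig. 4.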
The simulations show that for a broad range of selection coefficients, nearly neutral evolution produces a large difference between the numbers of gains and losses in polymorphism data (fig. 4). The results demonstrate that the unequal numbers of gains and losses in human polymorphism data found by Jordan et al. (2005) do not necessarily indicate changing allele frequencies but may merely add to the evidence that many protein polymorphisms in humans are mildly deleterious, an interpretation that is amply supported by evidence from mitochondrial (Nachman et al. 1996; Hasegawa, Cao, and Yang 1998; Moilanen and Majamaa 2003; Elson, Turnbull, and Howell 2004; Ho et al. 2005) and nuclear genes (Cargill et al. 1999; Fay, Wyckoff, and Wu 2001; Sunyaev et al. 2001, 2003; Hughes et al. 2003; Williamson et al. 2005).
| Nearly Neutral Polymorphisms in Escherichia coli |
|---|
|
|
|---|
One way to distinguish between the universal trend model and the nearly neutral model is to examine the pattern of gains and losses on a phylogeny with more than three taxa. If the long-term directional trend postulated by Jordan et al. (2005)
The data of Jordan et al. (2005) and the unique substitutions show similar patterns; all differences between gains and losses are in the same direction, and 19 of the 20 amino acids show a larger difference for unique substitutions than for the data of Jordan et al. (2005) (table 1). However, there is a marked difference between unique substitutions and shared substitutions in the patterns of gain and loss. For 18 amino acids, the shared substitutions have a normalized difference between gains and losses that is either smaller than or in the opposite direction to that seen at unique sites. Only 8 amino acids have a significant difference between gains and losses for shared substitutions, while 19 amino acids have a significant difference for unique substitutions. The stronger bias seen for unique gains and losses is inconsistent with the universal trend postulated by Jordan et al. (2005), which would predict the same amount of bias on all branches of a phylogeny, but it is expected under the nearly neutral model. This adds to the evidence that many protein polymorphisms in E. coli are mildly deleterious (Sawyer, Dykhuizen, and Hartl 1987; Hughes 2005). The bias for shared gains and losses seen for some amino acids may result from universal trends in amino acid composition, but it may also reflect mildly deleterious alleles that are so nearly neutral that they have survived in more than one strain. The bias in shared gains and losses may also result from directional changes in amino acid composition due to positive selection, as has been repeatedly shown for related species of prokaryotes living in different environments (Haney et al. 1999; McDonald, Grasso, and Rejto 1999; McDonald 2001; Nishio et al. 2003; Di Giulio 2005; Methe et al. 2005).
| Acknowledgements |
|---|
|
|
|---|
I thank H. Akashi, B. C. Verrelli, and B. J. Wolpert for helpful comments on the manuscript.
| Footnotes |
|---|
William Martin, Associate Editor
| References |
|---|
|
|
|---|
Altschul, S. F., T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.
Blattner, F. R., G. Plunkett, C. A. Bloch et al. (16 co-authors). 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453–1474.
Cargill, M., D. Altshuler, J. Ireland et al. (17 co-authors). 1999. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat. Genet. 22:231–238.
Collins, T. M., P. H. Wimberger, and G. J. P. Naylor. 1994. Compositional bias, character-state bias, and character-state reconstruction using parsimony. Syst. Biol. 47:482–496.
Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1978. A model of evolutionary change in proteins. Pp. 345–352 in M. O. Dayhoff, ed. Atlas of protein sequence and structure. Volume 5, Supplement 2. National Biomedical Research Foundation, Washington, D.C.
Di Giulio, M. 2005. A comparison of proteins from Pyrococcus furiosus and Pyrococcus abyssi: barophily in the physicochemical properties of amino acids and in the genetic code. Gene 346:1–6.
Elson, J. L., D. M. Turnbull, and N. Howell. 2004. Comparative genomics and the evolution of human mitochondrial DNA: assessing the effects of selection. Am. J. Hum. Genet. 74:229–238.
Eyre-Walker, A. 1998. Problems with parsimony in sequences of biased base composition. J. Mol. Evol. 47:686–690.
Fay, J. C., G. J. Wyckoff, and C.-I. Wu. 2001. Positive and negative selection on the human genome. Genetics 158:1227–1234.
Goldman, N., and S. Whelan. 2002. A novel use of equilibrium frequencies in models of sequence evolution. Mol. Biol. Evol. 19:1821–1831.
Guttman, D. S., and D. E. Dykhuizen. 1994. Clonal divergence in Escherichia coli as a result of recombination, not mutation. Science 266:1380–1383.
Haney, P. J., J. H. Badger, G. L. Buldak, C. I. Reich, C. R. Woese, and G. J. Olsen. 1999. Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species. Proc. Natl. Acad. Sci. USA 96:3578–3583.
Hasegawa, M., Y. Cao, and Z. H. Yang. 1998. Preponderance of slightly deleterious polymorphism in mitochondrial DNA: nonsynonymous/synonymous rate ratio is much higher within species than between species. Mol. Biol. Evol. 15:1499–1505.
Henikoff, S., and J. G. Henikoff. 2000. Amino acid substitution matrices. Adv. Protein Chem. 54:73–97.
Ho, S. Y. W., M. J. Phillips, A. Cooper, and A. J. Drummond. 2005. Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Mol. Biol. Evol. 22:1561–1568.
Hughes, A. L. 2005. Evidence for abundant slightly deleterious polymorphisms in bacterial populations. Genetics 169:533–538.
Hughes, A. L., B. Packer, R. Welch, A. W. Bergen, S. J. Chanock, and M. Yeager. 2003. Widespread purifying selection at polymorphic sites in human protein-coding loci. Proc. Natl. Acad. Sci. USA 100:15754–15757.
Jin, Q., Z. H. Yuan, J. G. Xu et al. (32 co-authors). 2002. Genome sequence of Shigella flexneri 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157. Nucleic Acids Res. 30:4432–4441.
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275–282.
Jordan, I. K., F. A. Kondrashov, I. A. Adzhubei, Y. I. Wolf, E. V. Koonin, A. S. Kondrashov, and S. Sunyaev. 2005. A universal trend of amino acid gain and loss in protein evolution. Nature 433:633–638.
Kimura, M. 1962. On the probability of fixation of mutant genes in a population. Genetics 47:713–719.
Kosiol, C., and N. Goldman. 2005. Different versions of the Dayhoff rate matrix. Mol. Biol. Evol. 22:193–199.
Liò, P., and N. Goldman. 1998. Models of molecular evolution and phylogeny. Genome Res. 8:1233–1244.
McDonald, J. H. 2001. Patterns of temperature adaptation in proteins from the bacteria Deinococcus radiodurans and Thermus thermophilus. Mol. Biol. Evol. 18:741–749.
McDonald, J. H., A. M. Grasso, and L. K. Rejto. 1999. Patterns of temperature adaptation in proteins from Methanococcus and Bacillus. Mol. Biol. Evol. 16:1785–1790.
McGraw, E. A., J. Li, R. K. Selander, and T. S. Whittam. 1999. Molecular evolution and mosaic structure of α, β, and γ intimins of pathogenic Escherichia coli. Mol. Biol. Evol. 16:12–22.
Methe, B. A., K. E. Nelson, J. W. Deming et al. (24 co-authors). 2005. The psychrophilic lifestyle as revealed by the genome sequence of Colwellia psychrerythraea 34H through genomic and proteomic analyses. Proc. Natl. Acad. Sci. USA 102:10913–10918.
Moilanen, J. S., and K. Majamaa. 2003. Phylogenetic network and physicochemical properties of nonsynonymous mutations in the protein-coding genes of human mitochondrial DNA. Mol. Biol. Evol. 20:1195–1210.
Müller, T., and M. Vingron. 2000. Modeling amino acid replacement. J. Comput. Biol. 7:761–776.
Nachman, M. W., W. M. Brown, M. Stoneking, and C. F. Aquadro. 1996. Nonneutral mitochondrial DNA variation in humans and chimpanzees. Genetics 142:953–963.
Nishio, Y., Y. Nakamura, Y. Kawarabayasi et al. (11 co-authors). 2003. Comparative complete genome sequence analysis of the amino acid replacements responsible for the thermostability of Corynebacterium efficiens. Genome Res. 13:1572–1579.
Ohta, T. 1992. The nearly neutral theory of molecular evolution. Annu. Rev. Ecol. Syst. 23:263–286.
Parkhill, J., G. Dougan, K. D. James et al. (40 co-authors). 2001. Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature 413:848–852.
Perna, N. T., and T. D. Kocher. 1995. Unequal base frequencies and estimation of substitution rates. Mol. Biol. Evol. 12:359–361.
Perna, N. T., G. Plunkett, V. Burland et al. (27 co-authors). 2001. Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 409:529–533.
Sawyer, S. A., D. E. Dykhuizen, and D. L. Hartl. 1987. Confidence interval for the number of selectively neutral amino acid polymorphisms. Proc. Natl. Acad. Sci. USA 84:6225–6228.
Sunyaev, S., V. Ramensky, I. Koch, W. Lathe, A. S. Kondrashov, and P. Bork. 2001. Prediction of deleterious human alleles. Hum. Mol. Genet. 10:591–597.
Sunyaev, S., F. A. Kondrashov, P. Bork, and V. Ramensky. 2003. Impact of selection, mutation rate and genetic drift on human genetic variation. Hum. Mol. Genet. 12:3325–3330.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL-W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
Veerassamy, S., A. Smith, and E. R. M. Tillier. 2003. A transition probability model for amino acid substitutions from blocks. J. Comput. Biol. 10:997–1010.
Welch, R. A., V. Burland, G. Plunkett et al. (18 co-authors). 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc. Natl. Acad. Sci. USA 99:17020–17024.
Williamson, S. H., R. Hernandez, A. Fledel-Alon, L. Zhu, R. Nielsen, and C. D. Bustamante. 2005. Simultaneous inference of selection and population growth from patterns of variation in the human genome. Proc. Natl. Acad. Sci. USA 102:7882–7887.
Zuckerkandl, E., J. Derancourt, and H. Vogel. 1971. Mutational trends and random processes in the evolution of informational macromolecules. J. Mol. Biol. 59:473–490.