MBE Advance Access originally published online on June 25, 2008
Molecular Biology and Evolution 2008 25(9):1943-1953; doi:10.1093/molbev/msn143
Research Articles
Frequent and Widespread Parallel Evolution of Protein Sequences

* Department of Biological Sciences, Vanderbilt University
Howard Hughes Medical Institute, R. M. Bock Laboratories, University of Wisconsin–Madison
E-mail: sbcarrol{at}wisc.edu.
| Abstract |
|---|
Understanding the patterns and causes of protein sequence evolution is a major challenge in evolutionary biology. One of the critical unresolved issues is the relative contribution of selection and genetic drift to the fixation of amino acid sequence differences between species. Molecular homoplasy, the independent evolution of the same amino acids at orthologous sites in different taxa, is one potential signature of selection; however, relatively little is known about its prevalence in eukaryotic proteomes. To quantify the extent and type of homoplasy among evolving proteins, we used phylogenetic methodology to analyze 8 genome-scale data matrices from clades of different evolutionary depths that span the eukaryotic tree of life. We found that the frequency of homoplastic amino acid substitutions in eukaryotic proteins was more than 2-fold higher than expected under neutral models of protein evolution. The overwhelming majority of homoplastic substitutions were parallelisms that involved the most frequently exchanged amino acids with similar physicochemical properties and that could be reached by a single-mutational step. We conclude that the role of homoplasy in shaping the protein record is much larger than generally assumed, and we suggest that its high frequency can be explained by both weak positive selection for certain substitutions and purifying selection that constrains substitutions to a small number of functionally equivalent amino acids.
Key Words: homoplasy; positive selection; selective constraint; protein; independent evolution
| Introduction |
|---|
Since the proposal of the neutral theory of molecular evolution (Kimura 1968
Furthermore, inferences of widespread positive selection have raised the issue of the relative magnitude of the selective effects of most individual fixed substitutions (Nei 2005). Ample biochemical evidence (reviewed in DePristo et al. [2005]) and population genetic data (Drake 2006) indicate that most substitutions at most sites are deleterious. For the remaining minority of sites where substitutions are nondeleterious, typical estimates suggest that the strength of selection fixing most substitutions is very weak, such that 1 < Nes < 10 (Eyre-Walker 2006; Sawyer et al. 2007), where Ne is the effective population size and s the selection coefficient; such substitutions could be regarded as nearly neutral. Therefore, the physicochemical significance of most amino acid substitutions and the role of positive selection in their fixation remain unclear.
One alternative potential signature of positive selection is the independent evolution of identical molecular character states (nucleotide or amino acid residues) in different branches of a phylogenetic tree that are not directly inherited from a common ancestor (Zhang and Kumar 1997; Zhang 2006; Jost et al. 2008). Such homoplastic character states may be derived from different ancestral states (convergent changes), from the same ancestral character state (parallel changes), or via the reversal of a derived character state to the ancestral one (reverse or back changes) (Li 1997; Page and Holmes 1998).
Homoplasy may arise due to the action of positive or balancing selection (Wells 1996). Theoretical studies have suggested that the probability of parallel evolution under natural selection is nearly twice as large as that under neutrality (Orr 2005). A variety of experimental studies have provided evidence that natural selection on the proteome can be manifested as homoplasy. For example, several experimental evolution studies examining the adaptations of viral populations on bacterial or eukaryotic hosts have uncovered striking examples of parallelism (Wichman et al. 2000; Hughes et al. 2001; Pinel-Galzi et al. 2007), convergence (Bull et al. 1997; Fares et al. 2001), and reversal (Crill et al. 2000; Depristo et al. 2007). One study which examined the adaptation of 2 closely related phages to laboratory culture conditions found that a remarkable 62% of all substitutions were parallel (Wichman et al. 2000). Great numbers of parallel and convergent homoplastic substitutions have also been uncovered in genes belonging to the major histocompatibility complex (Yeager and Hughes 1999).
Alternatively, molecular homoplasy may occur simply by chance, through the action of neutral evolutionary processes (Zhang and Kumar 1997). Because sequence evolution is a stochastic process and each site has a finite number of possible states (4 for nucleotide and 20 for amino acid residues), it is "expected" that independent evolutionary lineages will occasionally acquire the same character states independently (Zhang and Kumar 1997). Given time, homoplasy (and divergence) is expected to increase up to some level of saturation, which is determined by a variety of factors such as mutational bias (Smith JM and Smith NH 1996; Baer et al. 2007) and rates of evolution (Felsenstein 1978; Pupko and Galtier 2002).
There is, however, yet another potential cause of homoplasy that may be underappreciated: the action of purifying selection. In nucleic acids or proteins, the number of possible states may frequently be smaller than 4 or 20, and the frequency of homoplasy correspondingly higher, because certain sites may be constrained by purifying selection such that only a fraction of all possible residues are allowed, such as only those amino acids sharing the same physicochemical properties (Naylor et al. 1995). For example, it has been argued that the hydrophobic nature of most mitochondrial proteins has effectively constrained the character state space of their second codon positions to 1 of 2 states, C or T (Naylor et al. 1995). Homoplasy, then, can also be the result of substitution constraints imposed by purifying selection.
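The effect of a restricted state space on chance homoplasy can be illustrated with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that a site changes to each of its other permitted residues with equal probability; real exchange rates are far from uniform, and the function name is hypothetical.

```python
from fractions import Fraction


def chance_of_identical_change(n_allowed_states: int) -> Fraction:
    """Probability that two independent substitutions from the same
    ancestral residue yield the same derived residue.

    ASSUMPTION (illustrative only): every other allowed residue is an
    equally likely outcome of a substitution.
    """
    if n_allowed_states < 2:
        raise ValueError("a site needs at least 2 allowed states to change")
    return Fraction(1, n_allowed_states - 1)


# Unconstrained amino acid site, unconstrained nucleotide site, and a
# site constrained to 2 states (e.g., C or T).
for k in (20, 4, 2):
    print(k, chance_of_identical_change(k))
```

Under this simplification, shrinking the set of permitted residues from 20 to a handful raises the chance of an identical independent change severalfold, which is the intuition behind constraint-driven homoplasy.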
Beyond particular case studies and the expectation of homoplasy due to random chance, the actual levels of homoplasy among evolving proteins have not been well characterized. For example, in an experiment very similar in design to the phage studies cited above, the evolution of 12 initially identical Escherichia coli populations for 20,000 generations under the same selective pressure produced very low levels of molecular homoplasy (Woods et al. 2006). Thus, the results of case studies may not be the best basis for predicting and assessing the levels of molecular homoplasy expected in natural populations, for reasons including but not limited to the near identity of selective environments used in experimental studies, the strength and continuity of selection (Bull et al. 1997; Hughes 2007), and the genome size and complexity of the organisms studied (Woods et al. 2006).
Although molecular homoplasy has long been appreciated in evolution studies and much effort has been invested into understanding its causes and providing corrections for them (Felsenstein 2003), relatively few studies have utilized homoplasy as a source of evidence about evolutionary hypotheses. For example, examination of patterns of amino acid variation at the Gpdh locus across Drosophila species identified 4 sites exhibiting such high levels of homoplasy that they accounted for approximately half of the substitutions "observed" (Wells 1996). We too noted surprisingly high levels of homoplasy in an analysis of phylogenetic bushes (Rokas and Carroll 2006), and studies examining noncoding and coding DNA have similarly revealed high levels of homoplasy (O'hUigin et al. 2002; Bazykin et al. 2007). For example, Bazykin et al. (2007) found an elevated rate of parallel nonsynonymous substitutions in the genomes of mammals, Drosophila, and yeasts. Importantly, the underlying causes of this excess homoplasy have been attributed to several different factors including differences in mutation rates (O'hUigin et al. 2002), the action of purifying selection (Wells 1996; Rokas and Carroll 2006), or the action of weak positive selection (Rokas and Carroll 2006; Bazykin et al. 2007).
In this study, we revisit the questions of the prevalence and underlying causes of homoplasy using a different methodological framework. Specifically, we employed a phylogenetic approach to conduct a systematic survey of the occurrence of homoplasy across 8 clades of the tree of life. Given evolutionary trees for these 8 clades, we measured the extent of observed homoplasy on each clade and compared the observed values with the homoplasy expected based on simulation analyses of the same trees. We also measured the types of amino acid substitutions that generate homoplasy, using an index (evolutionary index [EI], Tang et al. 2004) that captures the evolutionary trends of amino acid exchangeability. We found that across these 8 clades protein sequences underwent more than twice as many homoplastic substitutions as expected under neutral processes alone. The overwhelming majority of homoplastic amino acid substitutions were between amino acids with similar physicochemical properties. We suggest that these results are likely to be the evolutionary product of 2 different types of selection: weak positive selection for certain substitutions and purifying selection that constrains substitutions to a small number of functionally equivalent alternatives.
| Materials and Methods |
|---|
Data Matrix Generation
Data matrices from 8 representative clades of the eukaryotic tree of life were used to evaluate the observed and expected levels of homoplasy. Information about the clades and taxa is shown in table 1. All data matrices contained sequences from 4 taxa. The mammalian data matrix contained mitochondrial genes, whereas the other 7 data matrices contained nuclear genes. Two data matrices (Saccharomyces yeasts and land plants) were obtained from previously published studies (Rokas et al. 2003
Homoplasy Estimation Methodology
We estimated observed and expected homoplasy for all the data matrices using a modified version of a previously published methodology (Takezaki et al. 2004
For example, examination of the mammalian data matrix showed that pattern AABB was displayed by the largest number of amino acid sites (49) thus providing support for a grouping of whales with hippopotamuses (fig. 1B and C), a clade whose existence has been independently corroborated by multiple lines of phylogenetic evidence (Nikaido et al. 1999
To calculate the homoplasy expected under neutral conditions, the best-fit model of amino acid evolution for each of the 8 data matrices was selected using ProtTest (Abascal et al. 2005). Given the amino acid sequence alignment for each data matrix, model selection was performed by evaluating the fit of 12 different alternative models of amino acid evolution according to the Akaike information criterion (Abascal et al. 2005). Models of amino acid evolution are typically derived by counting observed amino acid substitutions in large sequence databases, making allowance for multiple substitutions and the phylogeny of the species used (Whelan et al. 2001). Because the history of the proteins in the sequence databases used to construct models has been shaped by the effects of both selection and mutational biases (Thorne 2007), the simulation conditions may not be strictly neutral. Thus, levels of expected homoplasy may actually be overestimated with our approach.
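Model selection by the Akaike information criterion can be sketched in a few lines. The example below is not ProtTest itself; it only shows the criterion (AIC = 2k - 2 lnL, lower is better) applied to hypothetical log-likelihoods and parameter counts, and the model names and numbers are invented placeholders rather than values from this study.

```python
def aic(log_likelihood: float, n_free_params: int) -> float:
    """Akaike information criterion: 2k - 2*lnL (lower is better)."""
    return 2 * n_free_params - 2 * log_likelihood


def best_model(fits: dict) -> str:
    """Return the name of the model with the lowest AIC.

    `fits` maps a model name to a (log_likelihood, n_free_params) pair.
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# HYPOTHETICAL fits for three candidate models (placeholders only).
example_fits = {
    "model_A": (-10250.0, 5),
    "model_B": (-10240.0, 6),
    "model_C": (-10239.5, 25),
}

for name, (lnl, k) in example_fits.items():
    print(name, aic(lnl, k))
print("best:", best_model(example_fits))
```

In this toy case the richest model has the highest likelihood but is penalized for its extra parameters, so the intermediate model is preferred.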
Using the parameters obtained from ProtTest, the maximum likelihood phylogeny was generated using Phyml (Guindon and Gascuel 2003). Importantly, the phylogenetic relationships for all data matrices used in this study are unambiguous and our analyses confirm the results of several previous studies. To test whether accurate knowledge of the topology is a prerequisite for our analyses, we also examined 2 data matrices in which the true topology is ambiguous, 1 from vertebrates (Takezaki et al. 2004) and 1 from Paenungulata mammals (Nishihara et al. 2005), selecting as the "correct" topology the one that minimized the number of inferred homoplastic substitutions.
To calculate the expected average values for the 15 amino acid patterns under simulation, we generated 100 data sets of the same size as the original set using the amino acid evolution parameters and the maximum likelihood tree as inputs into the simulation software Evolver (part of the PAML software package, Yang 1997). By calculating the number of expected homoplastic substitutions and dividing it by the number of expected total substitutions, we estimated the expected homoplasy for all the data matrices under study. For example, the expected homoplasy calculated from simulation analysis of the mammalian data matrix was 5% (fig. 1C and D). Thus, the fold difference between the observed homoplasy (12.3%) and that expected from simulation analysis (5.0%) was 2.5 (12.3%/5.0%).
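The bookkeeping behind these numbers can be sketched as follows. The code assigns each column of a 4-taxon alignment to one of the 15 possible site patterns and computes the fold difference between observed and expected homoplasy. The function names and the toy alignment are hypothetical, gapped columns are skipped as a simplifying assumption, and the rules for converting pattern counts into numbers of substitutions are not reproduced here.

```python
from collections import Counter
from itertools import product


def site_pattern(column) -> str:
    """Canonical pattern of a 4-taxon column, e.g. ('V','V','I','I') -> 'AABB'."""
    labels = {}
    for residue in column:
        if residue not in labels:
            labels[residue] = "ABCD"[len(labels)]
    return "".join(labels[residue] for residue in column)


def pattern_counts(seqs) -> Counter:
    """Tally site patterns over 4 aligned sequences (gapped columns skipped)."""
    if len(seqs) != 4 or len({len(s) for s in seqs}) != 1:
        raise ValueError("expected 4 aligned sequences of equal length")
    return Counter(site_pattern(col) for col in zip(*seqs) if "-" not in col)


def fold_difference(observed_pct: float, expected_pct: float) -> float:
    """Ratio of observed to expected homoplasy."""
    return observed_pct / expected_pct


# Enumerating every column over 4 symbols recovers the 15 distinct patterns.
ALL_PATTERNS = sorted({site_pattern(c) for c in product("ABCD", repeat=4)})
print(len(ALL_PATTERNS), ALL_PATTERNS)

# Toy alignment (hypothetical residues), one string per taxon.
toy = ["VVDK", "VVEK", "IVDR", "IIER"]
print(pattern_counts(toy))

# Mammalian example from the text: 12.3% observed versus 5.0% expected.
print(round(fold_difference(12.3, 5.0), 1))
```

For simulated data, the same tally would be run on each replicate and the counts averaged before the expected homoplasy is computed.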
Assessing the Relative Contribution of Convergence, Parallelism, and Reversal to Homoplasy
Parallelisms, reversals, and convergences cannot be distinguished from one another in 4-taxon data matrices. Thus, to assess the relative contribution of parallel, convergent, and reverse substitutions to homoplasy, we expanded 3 of the 10 four-taxon data matrices used in this study (Cetartiodactyl mammals, Saccharomyces yeasts, and Aspergillus filamentous ascomycetes) for which the species phylogenetic relationships are well supported (Nikaido et al. 1999; Rokas et al. 2003; Rokas and Galagan 2008) by adding sequence data from several additional species. Using the parsimony criterion, we counted all substitutions occurring in all parsimony-informative sites that generated homoplasy via parallelisms, convergences, and reversals in these enlarged data matrices. All substitutions whose state in the most immediate ancestor was ambiguous were ignored. In the rare cases where the same substitution contributed to 2 homoplastic types (e.g., a reversal and a parallelism), the substitution was counted as an equal split between the 2 types of events.
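The distinctions among the three types of homoplasy, and the equal-split rule, can be expressed compactly. The sketch below assumes that ancestral states have already been reconstructed by parsimony and are supplied as input; the function names and data layout are hypothetical and are not the procedure actually used for the counts reported here.

```python
from collections import Counter


def classify_pair(change_1, change_2):
    """Classify two changes on different branches that end in the same residue.

    Each change is an (ancestral, derived) pair. Returns 'parallelism' when
    the ancestral residues match, 'convergence' when they differ, and None
    when the derived residues are not identical (no homoplasy).
    """
    (anc1, der1), (anc2, der2) = change_1, change_2
    if der1 != der2:
        return None
    return "parallelism" if anc1 == anc2 else "convergence"


def is_reversal(lineage_states) -> bool:
    """True if the last change along a lineage restores an earlier state.

    `lineage_states` lists residues at successive nodes from an ancestor
    toward a tip, e.g. ['V', 'I', 'V'].
    """
    if len(lineage_states) < 3 or lineage_states[-1] == lineage_states[-2]:
        return False
    return lineage_states[-1] in lineage_states[:-2]


def tally(events) -> Counter:
    """Sum event types; a substitution with 2 types is split equally."""
    totals = Counter()
    for types in events:
        for event_type in types:
            totals[event_type] += 1 / len(types)
    return totals


print(classify_pair(("V", "I"), ("V", "I")))   # same ancestral state
print(classify_pair(("V", "I"), ("L", "I")))   # different ancestral states
print(is_reversal(["V", "I", "V"]))
print(tally([("parallelism",), ("reversal", "parallelism")]))
```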
Any potential differences in the relative contribution of parallel, convergent, and reverse substitutions to homoplasy across the 3 clades could be the consequence of differences in the shape of each topology or, alternatively, due to bias of the excess homoplastic events toward specific types of homoplasy. To discriminate between the 2 alternatives, we conducted simulation analyses on the 3 expanded data matrices. We counted all substitutions occurring in the first 1,000 parsimony-informative sites from each simulated data matrix that generated homoplasy via parallelisms, convergences, and reversals. For both simulation calculations as well as for the counting of types of homoplasy, we used the same methodologies as above.
Classification of Homoplastic Amino Acid Substitutions according to the EI
Of the 190 possible interchanges among the 20 amino acids, only 75 can be attained via a single-nucleotide substitution. The remaining 115 substitutions require 2 or 3 single-nucleotide mutational steps. Among these 190 amino acid interchanges, some can be achieved much more easily than others, the ease of substitution being largely dependent on the mutational distance (determined by the genetic code and mutational biases) and the physicochemical distance between amino acids (Yang et al. 1998; Tang et al. 2004). To better understand which amino acid substitutions most commonly contribute to homoplasy, we employed the EI devised by Tang et al. (2004) to classify all parsimony-informative patterns (AABB, ABAB, and ABBA) into 4 categories: 1) sites exhibiting the 12 most frequent single-mutational step amino acid substitutions (top12), 2) sites exhibiting the middle 51 most frequent single-mutational step substitutions (middle51), 3) sites exhibiting the 12 least frequent single-mutational step substitutions (bottom12), and 4) sites exhibiting the 115 amino acid substitutions requiring 2 or 3 mutational steps (multiple115). The EI was chosen because it has been shown to perform better than other measures such as PAM and Grantham's distance, and because its predictions hold well across genes and organisms (Tang et al. 2004).
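The split between single-step and multiple-step interchanges follows directly from the genetic code and can be checked by enumeration. The sketch below assumes the standard nuclear genetic code (mitochondrial codes differ slightly) and uses hypothetical function names. It does not reproduce the EI ranking that separates the top12, middle51, and bottom12 categories, which comes from Tang et al. (2004).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_step_pairs() -> set:
    """Unordered amino acid pairs linked by one nucleotide substitution."""
    pairs = set()
    for codon, aa in CODON_TABLE.items():
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                neighbour = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
                # Ignore stop codons and synonymous changes.
                if "*" not in (aa, neighbour) and neighbour != aa:
                    pairs.add(frozenset((aa, neighbour)))
    return pairs


SINGLE_STEP = single_step_pairs()


def is_single_step(aa1: str, aa2: str) -> bool:
    """True if aa1 and aa2 can be interchanged by one nucleotide change."""
    return frozenset((aa1, aa2)) in SINGLE_STEP


print(len(SINGLE_STEP), 190 - len(SINGLE_STEP))
print(is_single_step("V", "I"), is_single_step("D", "E"), is_single_step("W", "D"))
```

Any pair for which `is_single_step` returns False belongs to the multiple115 category; assigning the remaining pairs to top12, middle51, or bottom12 requires the published EI values.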
| Results |
|---|
Data Matrices
The 8 data matrices that we assembled to measure the extent of homoplasy in molecular data sets included a wide range of gene numbers and amino acid sites (table 2). The smallest set, the mitochondrial mammalian data matrix, was composed of just 12 genes and slightly more than 3,500 amino acid sites, whereas the largest sets were the metazoan and eukaryotic phyla data matrices containing 239 genes each and the 200-gene Aspergillus data matrix containing 99,204 amino acid sites (table 2). The percentages of variable and parsimony-informative sites also varied, ranging from 14% to 62% for variable sites and from 0.5% to 6.4% for parsimony-informative sites (table 2). The maximum likelihood phylogenetic tree for each clade is shown in the supplementary figure S1 (Supplementary Material online) and is in agreement with the published literature (Rokas et al. 2003
Levels of Observed and Excess Homoplasy
The observed, expected, and excess homoplasy values for the 8 data matrices are shown in table 3. Values of observed homoplasy ranged from 2.3% (for the land plant data matrix) to 12.3% (for the Cetartiodactyl mammalian data matrix), whereas the levels of expected homoplasy ranged from 0.9% to 5.0% for the same data matrices resulting in an excess homoplasy of 1.3–7.4% (table 3). Thus, the observed homoplasy among all data sets was consistently 1.9- to 3.2-fold greater than expected from the simulation analyses (table 3). Similar fold differences in homoplasy were observed in the data sets from clades whose phylogeny is ambiguous (Paenungulata mammals and vertebrates) (table 3).
Importantly, the excess homoplasy was not the result of a larger number of substitutions but the result of a specific increase of homoplastic substitutions. Comparison of the parsimony-informative substitutions that support the correct topology (i.e., examination of just the AABB sites in fig. 1) from both observed and expected sites across all data matrices reveals that the number of parsimony-informative substitutions in both cases is very similar (table 4). Similarly, comparison of the observed and expected sites for the remaining 12 nonparsimony-informative patterns revealed very small disagreements (data not shown). Only in the case of the Paenungulata data matrix was the excess of substitutions very large, whereas in the case of the land plant data matrix, the expected number of parsimony-informative substitutions was actually slightly larger than the observed number (table 4). Examination of data sets from clades whose phylogeny is ambiguous revealed similar results (tables 1–5
Lack of Correlation between Homoplasy and the Amount of Evolution
It is widely held that levels of homoplasy are positively correlated with the total amount of evolution (Kallersjo et al. 1999
The Relative Contribution of Convergence, Parallelism, and Reversal to Homoplasy
The large number of homoplastic sites in these 4-taxon data sets presented the opportunity to assess which types of mutational events contributed to homoplasy. Because it was impossible to distinguish parallelisms from reversals and convergences in 4-taxon trees, we expanded 3 of the 10 data matrices by adding taxa. Previous experimental and simulation studies have indicated that parallelisms and reversals are much more frequent than convergences (Wells 1996
We noted that the relative contribution of each type of homoplastic event varied widely across the 3 data sets. For example, whereas 41% of homoplastic substitutions in the Saccharomyces yeasts were due to reversals, only 1% were due to reversals in Aspergillus filamentous ascomycetes (fig. 3). Comparison of the fractions of parallelisms, reversals, and convergences observed with those expected from the simulation analyses for each of the 3 data matrices did not reveal any major differences, with the possible exception of the smaller than expected fraction of reversals and the larger than expected fraction of parallelisms in the Aspergillus data matrix (fig. 3). Similarly, the expected fractions of homoplastic substitutions classified to different classes according to the EI (Tang et al. 2004
Excess Homoplasy Is Largely due to Substitutions between Frequently Exchangeable Amino Acids
To determine if certain amino acid substitutions contribute more often to homoplasy, we first classified all sites that exhibited parsimony-informative patterns into 1 of 4 substitution groups according to the EI (Tang et al. 2004) (table 5). We found that, on average, 43.4% of parsimony-informative sites exhibited substitutions belonging to the frequently exchangeable top12 category and 76% of sites involved substitutions from the top12 and middle51 categories combined, whereas just 0.6% of sites exhibited substitutions between the rarely exchangeable amino acids in the bottom12 category. The remaining 22.8% of sites exhibited substitutions between amino acids 2 or 3 mutational steps away that belong to the multiple115 category.
We then examined the most common homoplastic amino acid substitutions in the 3 data matrices shown in figure 3. Strikingly, 65% of homoplasies involved substitutions from just the top12 category and 96% of all homoplastic substitutions belonged to the top12 and middle51 categories (fig. 3). The finding that substitutions in the top12 category were more numerous than substitutions in all other categories combined demonstrates that a very large fraction of all homoplastic substitutions are between amino acids with very similar physicochemical properties that can be reached via a single-mutational step.
| Discussion |
|---|
Understanding the extent and causes of homoplasy is important for understanding the processes that have sculpted the protein record. We used phylogenetic methodology to quantify the extent and type of homoplasy present in the eukaryotic proteome. We found that the frequency of homoplastic amino acid substitutions in eukaryotic proteins was on average 2.4-fold higher than would be expected under widely accepted models of protein evolution. Remarkably, this ratio is relatively stable across clades that differ by more than an order of magnitude in time of origin. In light of the diversity of proteins and taxa sampled, this consistency suggests that the levels of homoplasy observed reflect fundamental and general aspects of protein evolution. Indeed, we found that the majority of these homoplastic substitutions were between frequently exchangeable amino acids that are only one mutational step away and that only an extremely small fraction of substitutions involved rarely exchangeable amino acids. These results bear on our understanding of the role of selection in shaping protein evolution and the biological significance of amino acid sequence differences between species.
The Underlying Causes of Molecular Homoplasy
Two major explanations that have been proposed to account for the elevated levels of homoplasy are mutational rate differences (O'hUigin et al. 2002) and selection (Wells 1996; Bazykin et al. 2007). For example, an examination of 51 primate loci revealed a weak positive correlation between the rate of substitution at individual genes and the degree of homoplasy, which argues for a mutation-based explanation of homoplasy (O'hUigin et al. 2002). However, this trend did not hold for all loci (e.g., slowly evolving genes did not fit the pattern) and the inference rests on the assumption that substitution rates are a good proxy for mutation rates (O'hUigin et al. 2002). Examination of our results from several different eukaryote clades revealed the opposite trend, with homoplasy actually decreasing slightly as the degree of substitution increased (fig. 2). The remarkably similar levels of excess homoplasy, examined in the light of orders-of-magnitude differences in mutation rates across the genes and organisms used in this study (Baer et al. 2007), make it highly unlikely that mutational rate differences are the principal explanation of excess homoplasy.
There are substantial grounds for considering the role of 2 modes of selection, both "positive" and "purifying" selection, in the generation of homoplasy. Theoretical work suggests that the probability of parallel evolution approximately doubles under positive selection, relative to neutral expectations (Orr 2005), and experimental work has identified several genetic loci in which positive selection has resulted in the parallel evolution of identical amino acid residues in different lineages (Bull et al. 1997; Yeager and Hughes 1999; Jost et al. 2008). Support for the role of weak positive selection in generating excess homoplasy was obtained in a recent study of coding sequences from mammals, yeasts, and Drosophila (Bazykin et al. 2007). Comparison of the ratio of nonsynonymous (dN) to synonymous (dS) substitution rates with the ratio of nonsynonymous parallel (dNP) to synonymous parallel (dSP) substitution rates revealed that the dNP/dSP ratio was approximately 5-fold higher than the dN/dS ratio (Bazykin et al. 2007). By assuming that dSP and dS, the rates of synonymous parallel and synonymous substitutions, respectively, were good proxies for the rates of selectively neutral substitutions in the lineages examined, the elevated rates observed for dNP relative to dN could be attributed in part to the action of weak positive selection (Bazykin et al. 2007).
But positive selection is not the only selective explanation for excess homoplasy. An alternative, but not mutually exclusive, explanation is raised by considering the effect of purifying selection. Purifying selection constrains the amino acid residues permitted at variable sites in protein sequences (Kimura 1983; Wells 1996; Naylor and Brown 1997; Bazykin et al. 2007). Some of the most common parallel substitutions in our data matrices involve amino acids with similar physicochemical properties, for example, valine and isoleucine (both hydrophobic and aliphatic) or aspartate and glutamate (both negatively charged and polar). For sites in protein sequences that can accept only amino acids with specific physicochemical properties, the substitution of one amino acid for its equivalent may be functionally neutral, whereas substitutions for all other, nonequivalent, amino acids will be deleterious. Thus, purifying selection on such sites makes the fixation of homoplastic substitutions more likely and frequent relative to other sites that tolerate a wider range of substitutions.
A large body of biophysical studies of protein structure and activity suggests that such constrained sites constitute a substantial fraction of all residues within, and are widely dispersed throughout, most proteins (reviewed in Pakula and Sauer [1989]). Most phenotypically defective missense mutants do not affect protein activity directly but do so indirectly (Pakula and Sauer 1989). Systematic replacement of amino acids within a variety of proteins has revealed that many mutations at many positions outside of active sites affect properties such as protein folding, stability, and aggregation (Pakula and Sauer 1989; DePristo et al. 2005). These functional studies, and our observation of the bias for parallel replacement of physicochemically similar amino acids, suggest that strong selective constraints operate not only upon the most conserved parts of proteins but also upon their nonconserved parts.
The Functional and Biological Significance of Molecular Homoplasy
What is the functional meaning of this abundance of homoplasy in eukaryotic proteomes? Although several cases of conservative amino acid substitutions resulting in parallel adaptation have been identified (Stewart et al. 1987; Deeb et al. 2003; Zhang 2006), it is highly unlikely that the majority of parallel amino acid substitutions observed in the protein record has been driven by large selection coefficients. Several recent population genetic studies of model organisms (mainly Drosophila) have estimated that a significant fraction of amino acid substitutions in protein-coding genes (Smith and Eyre-Walker 2002; Sawyer et al. 2003; Begun et al. 2007) has been driven by positive selection but that the magnitude of their selective effects is nearly neutral (Sawyer et al. 2007). This statistical inference is supported by some lines of experimental evidence. For example, Saccharomyces cerevisiae strains differing by a handful to even scores of amino acid substitutions in a variety of proteins exhibit no significant differences in fitness in the most sensitive assays devised to date (Williams BL, Carroll SB, in preparation). Thus, our finding that a large fraction of homoplastic substitutions are conservative is consistent with emerging statistical and experimental data and suggests that many of these substitutions are either functionally equivalent or have been driven by very small selection coefficients and are thus unlikely to contribute to adaptation.
Excess homoplasy and the contribution of different modes of selection to its generation bear important implications for studies in phylogenetics and molecular evolution. On the one hand, excess homoplasy raises novel statistical challenges for the analysis of molecular data because a general lack of correspondence between an underlying model and actual evolutionary processes can lead to the failure of a statistical methodology (Naylor and Brown 1998; Rokas and Carroll 2006). On the other hand, the finding that the majority of homoplastic substitutions are conservative in nature signifies that, although statistically important, most of these homoplastic substitutions are not biologically meaningful in terms of shaping molecular function or organismal diversity (Nei 2005).
| Supplementary Material |
|---|
Supplementary figure S1 is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
We thank Barry L. Williams for insightful comments on the manuscript. Research in AR's laboratory is supported by the Searle Scholars Program and Vanderbilt University. SBC is an investigator of the Howard Hughes Medical Institute.
| Footnotes |
|---|
Kenneth Wolfe, Associate Editor
| References |
|---|
Abascal F, Zardoya R, Posada D. ProtTest: selection of best-fit models of protein evolution. Bioinformatics (2005) 21:2104–2105.
Baer CF, Miyamoto MM, Denver DR. Mutation rate variation in multicellular eukaryotes: causes and consequences. Nat Rev Genet (2007) 8:619–631.
Bazykin GA, Kondrashov FA, Brudno M, Poliakov A, Dubchak I, Kondrashov AS. Extensive parallelism in protein evolution. Biol Direct (2007) 2:20.
Begun DJ, Holloway AK, Stevens K, et al. (13 co-authors). Population genomics: whole-genome analysis of polymorphism and divergence in Drosophila simulans. PLoS Biol (2007) 5:e310.
Bull JJ, Badgett MR, Wichman HA, Huelsenbeck JP, Hillis DM, Gulati A, Ho C, Molineux IJ. Exceptional convergent evolution in a virus. Genetics (1997) 147:1497–1507.
Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol (2000) 17:540–552.
Cracraft J, Donoghue MJ. Assembling the tree of life (2004) Oxford: Oxford University Press. 576.
Crill WD, Wichman HA, Bull JJ. Evolutionary reversals during viral adaptation to alternating hosts. Genetics (2000) 154:27–37.
Deeb SS, Wakefield MJ, Tada T, Marotte L, Yokoyama S, Marshall Graves JA. The cone visual pigments of an Australian marsupial, the tammar wallaby (Macropus eugenii): sequence, spectral tuning, and evolution. Mol Biol Evol (2003) 20:1642–1649.
DePristo MA, Hartl DL, Weinreich DM. Mutational reversions during adaptive protein evolution. Mol Biol Evol (2007) 24:1608–1610.
DePristo MA, Weinreich DM, Hartl DL. Missense meanderings in sequence space: a biophysical view of protein evolution. Nat Rev Genet (2005) 6:678–687.
Drake JW. Chaos and order in spontaneous mutation. Genetics (2006) 173:1–8.
Eyre-Walker A. The genomic rate of adaptive evolution. Trends Ecol Evol (2006) 21:569–575.
Fares MA, Moya A, Escarmis C, Baranowski E, Domingo E, Barrio E. Evidence for positive selection in the capsid protein-coding region of the foot-and-mouth disease virus (FMDV) subjected to experimental passage regimens. Mol Biol Evol (2001) 18:10–21.
Felsenstein J. Cases in which parsimony and compatibility methods will be positively misleading. Syst Zool (1978) 27:401–410.
Felsenstein J. Inferring phylogenies (2003) Sunderland (MA): Sinauer.
Gatesy J, O'Leary MA. Deciphering whale origins with molecules and fossils. Trends Ecol Evol (2001) 16:562–570.
Gaut BS, Lewis PO. Success of maximum likelihood phylogeny inference in the 4-taxon case. Mol Biol Evol (1995) 12:152–162.
Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol (2003) 52:696–704.
Hughes AL. Looking for Darwin in all the wrong places: the misguided quest for positive selection at the nucleotide sequence level. Heredity (2007) 99:364–373.
Hughes AL, Westover K, da Silva J, O'Connor DH, Watkins DI. Simultaneous positive and purifying selection on overlapping reading frames of the tat and vpr genes of simian immunodeficiency virus. J Virol (2001) 75:7966–7972.
Jost MC, Hillis DM, Lu Y, Kyle JW, Fozzard HA, Zakon HH. Toxin-resistant sodium channels: parallel adaptive evolution across a complete gene family. Mol Biol Evol (2008) 25:1016–1024.
Kallersjo M, Albert VA, Farris JS. Homoplasy increases phylogenetic structure. Cladistics (1999) 15:91–93.
Kimura M. Evolutionary rate at the molecular level. Nature (1968) 217:624–626.
Kimura M. The neutral theory of molecular evolution (1983) Cambridge: Cambridge University Press.
King JL, Jukes TH. Non-Darwinian evolution. Science (1969) 164:788–798.
Koonin EV. Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet (2005) 39:309–338.
Li W-H. Molecular evolution (1997) Sunderland (MA): Sinauer.
Naylor GJP, Brown WM. Structural biology and phylogenetic estimation. Nature (1997) 388:527–528.
Naylor GJP, Brown WM. Amphioxus mitochondrial DNA, chordate phylogeny, and the limits of inference based on comparisons of sequences. Syst Biol (1998) 47:61–76.
Naylor GJP, Collins TM, Brown WM. Hydrophobicity and phylogeny. Nature (1995) 373:565–566.
Nei M. Selectionism and neutralism in molecular evolution. Mol Biol Evol (2005) 22:2318–2342.
Nikaido M, Rooney AP, Okada N. Phylogenetic relationships among cetartiodactyls based on insertions of short and long interspersed elements: hippopotamuses are the closest extant relatives of whales. Proc Natl Acad Sci USA (1999) 96:10261–10266.
Nishihara H, Satta Y, Nikaido M, Thewissen JGM, Stanhope MJ, Okada N. A retroposon analysis of Afrotherian phylogeny. Mol Biol Evol (2005) 22:1823–1833.
O'hUigin C, Satta Y, Takahata N, Klein J. Contribution of homoplasy and of ancestral polymorphism to the evolution of genes in anthropoid primates. Mol Biol Evol (2002) 19:1501–1513.
Orr HA. The probability of parallel evolution. Evol Int J Org Evol (2005) 59:216–220.
Page RDM, Holmes EC. Molecular evolution: a phylogenetic approach (1998) Oxford (UK): Blackwell Science.
Pakula AA, Sauer RT. Genetic analysis of protein stability and function. Annu Rev Genet (1989) 23:289–310.
Pinel-Galzi A, Rakotomalala M, Sangu E, et al. (14 co-authors). Theme and variations in the evolutionary pathways to virulence of an RNA plant virus species. PLoS Pathog (2007) 3:e180.
Pollard DA, Iyer VN, Moses AM, Eisen MB. Widespread discordance of gene trees with species tree in Drosophila: evidence for incomplete lineage sorting. PLoS Genet (2006) 2:e173.
Pupko T, Galtier N. A covarion-based method for detecting molecular adaptation: application to the evolution of primate mitochondrial genomes. Proc R Soc Lond Ser B Biol Sci (2002) 269:1313–1316.
Rogozin IB, Thomson K, Csuros M, Carmel L, Koonin EV. Homoplasy in genome-wide analysis of rare amino acid replacements: the molecular-evolutionary basis for Vavilov's law of homologous series. Biol Direct (2008) 3:7.
Rokas A, Carroll SB. Bushes in the tree of life. PLoS Biol (2006) 4:e352.
Rokas A, Galagan JE. The Aspergillus nidulans genome and a comparative analysis of genome evolution in Aspergillus. In: The aspergilli: genomics, medical applications, biotechnology, and research methods—Osmani SA, Goldman GH, eds. (2008) Boca Raton (FL): CRC Press. 43–55.
Rokas A, Williams BL, King N, Carroll SB. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature (2003) 425:798–804.
Sanderson MJ, Driskell AC, Ree RH, Eulenstein O, Langley S. Obtaining maximal concatenated phylogenetic data sets from large sequence databases. Mol Biol Evol (2003) 20:1036–1042.
Sawyer SA, Kulathinal RJ, Bustamante CD, Hartl DL. Bayesian analysis suggests that most amino acid replacements in Drosophila are driven by positive selection. J Mol Evol (2003) 57(Suppl 1):S154–S164.
Sawyer SA, Parsch J, Zhang Z, Hartl DL. Prevalence of positive selection among nearly neutral amino acid replacements in Drosophila. Proc Natl Acad Sci USA (2007) 104:6504–6510.
Smith JM, Smith NH. Synonymous nucleotide divergence: what is "saturation"? Genetics (1996) 142:1033–1036.
Smith NG, Eyre-Walker A. Adaptive protein evolution in Drosophila. Nature (2002) 415:1022–1024.
Stewart CB, Schilling JW, Wilson AC. Adaptive evolution in the stomach lysozymes of foregut fermenters. Nature (1987) 330:401–404.
Takahata N. Allelic genealogy and human evolution. Mol Biol Evol (1993) 10:2–22.
Takezaki N, Figueroa F, Zaleska-Rutczynska Z, Takahata N, Klein J. The phylogenetic relationship of tetrapod, coelacanth, and lungfish revealed by the sequences of 44 nuclear genes. Mol Biol Evol (2004) 21:1512–1524.
Tang H, Wyckoff GJ, Lu J, Wu CI. A universal evolutionary index for amino acid changes. Mol Biol Evol (2004) 21:1548–1556.
The Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature (2005) 437:69–87.
Thompson JD, Higgins DG, Gibson TJ. Clustal-W—improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res (1994) 22:4673–4680.
Thorne JL. Protein evolution constraints and model-based techniques to study them. Curr Opin Struct Biol (2007) 17:337–341.
Wells RS. Excessive homoplasy in an evolutionarily constrained protein. Proc R Soc Lond Ser B Biol Sci (1996) 263:393–400.
Whelan S, Lio P, Goldman N. Molecular phylogenetics: state-of-the-art methods for looking into the past. Trends Genet (2001) 17:262–272.
Wichman HA, Scott LA, Yarber CD, Bull JJ. Experimental evolution recapitulates natural evolution. Philos Trans R Soc Lond Ser B Biol Sci (2000) 355:1677–1684.
Woods R, Schneider D, Winkworth CL, Riley MA, Lenski RE. Tests of parallel molecular evolution in a long-term experiment with Escherichia coli. Proc Natl Acad Sci USA (2006) 103:9107–9112.
Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci (1997) 13:555–556.
Yang Z, Nielsen R, Hasegawa M. Models of amino acid substitution and applications to mitochondrial protein evolution. Mol Biol Evol (1998) 15:1600–1611.
Yeager M, Hughes AL. Evolution of the mammalian MHC: natural selection, recombination, and convergent evolution. Immunol Rev (1999) 167:45–58.
Zhang J. Parallel adaptive origins of digestive RNases in Asian and African leaf monkeys. Nat Genet (2006) 38:819–823.
Zhang J, Kumar S. Detection of convergent and parallel evolution at the amino acid sequence level. Mol Biol Evol (1997) 14:527–536.
Zhang L, Li WH. Human SNPs reveal no evidence of frequent positive selection. Mol Biol Evol (2005) 22:2504–2507.