
Molecular Biology and Evolution 17:835-838 (2000)
© 2000 Society for Molecular Biology and Evolution


Letter to the Editor

How Molecules Evolve in Eubacteria

Peter J. Lockhart*, Daniel Huson‡, Uwe Maier‡, Martin J. Fraunholz‡, Yves Van de Peer§, Adrian C. Barbrook||, Christopher J. Howe|| and Mike A. Steel

*Institute of Molecular BioSciences, Massey University, Palmerston North, New Zealand;
†Program in Applied and Computational Mathematics, Princeton University;
‡Fachbereich Biologie, Zellbiologie und Angewandte Botanik, Philipps-Universität Marburg, Marburg, Germany;
§Fakultät Biologie, Evolutionsbiologie, Universität Konstanz, Konstanz, Germany;
||Department of Biochemistry and Cambridge Centre for Molecular Recognition, University of Cambridge, Cambridge, England;
¶Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand.

A fundamental assumption in building evolutionary trees is that processes of change are constant across the tree of life (Li and Gu 1996; Swofford et al. 1996). Despite this universal view, it is now clear that nucleotide compositions, amino acid compositions (e.g., Lanave et al. 1984; Sueoka 1988; Hasegawa and Hashimoto 1993; Barbrook, Lockhart, and Howe 1998; Forster and Hickey 1999; Lockhart et al. 1999), and, as we demonstrate here for eubacterial sequences, the distribution of sites in sequences that can accept substitutions may change over time.

We investigated anciently diverged eubacterial sequences using a simple linear dissimilarity measure (dlcov) that is sensitive to the type of variable sequence evolution predicted by a covarion/covariotide model (a model of evolution in which the same sequence positions are free to substitute in some taxa but not in others). Since the tree-building properties of dlcov differ under covarion/covariotide and rates-across-sites models, dlcov allowed us to test for evidence of covarion/covariotide evolution in eubacterial sequences. Our analyses demonstrated that evolving distributions of variable sites in molecules provide support for deep-branching patterns in phylogenies reconstructed for eubacterial trees of life. This finding joins growing evidence supporting the covarion/covariotide evolution of sequences (Fitch and Markowitz 1970; Lockhart et al. 1996, 1998; Philippe and Laurent 1998; Germot and Philippe 1999; Lopez, Forterre, and Philippe 1999; Moreira, Guyader, and Philippe 1999; Philippe et al. 2000; Steel, Huson, and Lockhart 2000).

Given two monophyletic groups of taxa, the site patterns found in an alignment of sequences can be described in terms of five classes (Lockhart et al. 1998). Two of these are used in calculating dlcov. Let N3 denote the number of sites that are unvaried in the first group but varied in the second group, and let N4 denote the number of sites that are unvaried in the second group but varied in the first. Let N denote the total number of sites. Thus,

$$d_{lcov} = \frac{N_3 + N_4}{N}$$

is the proportion of sites varied in one group but not the other. We describe exactly the expected value of dlcov under two models: a model in which there is a distribution of rates across sites (RAS), and a covarion-style model of the type described and analyzed recently by Tuffley and Steel (1998). Under this latter model, the following nonlinear dissimilarity measure converges (with increasing sequence length) to an additive measure that is proportional to the evolutionary distance between the groups:


where N5 is the number of sites that are varied in both groups.
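As an illustration of the definitions above, the following minimal sketch counts the N3, N4, and N5 site patterns for two groups of aligned sequences and computes dlcov = (N3 + N4)/N. The function names `site_pattern_counts` and `d_lcov` are hypothetical, and treating every character (including gap symbols) as an ordinary state is an assumption of this sketch, not something the text specifies.

```python
def site_pattern_counts(group_a, group_b):
    """Count site-pattern classes for two groups of aligned sequences.

    N3: sites unvaried in group_a but varied in group_b
    N4: sites varied in group_a but unvaried in group_b
    N5: sites varied in both groups
    Assumption: every character, gaps included, counts as a state.
    """
    if not group_a or not group_b:
        raise ValueError("both groups need at least one sequence")
    n_sites = len(group_a[0])
    if n_sites == 0:
        raise ValueError("alignment has no sites")
    if any(len(seq) != n_sites for seq in list(group_a) + list(group_b)):
        raise ValueError("all sequences must have the same aligned length")

    counts = {"N": n_sites, "N3": 0, "N4": 0, "N5": 0}
    for site in range(n_sites):
        varied_a = len({seq[site] for seq in group_a}) > 1
        varied_b = len({seq[site] for seq in group_b}) > 1
        if varied_a and varied_b:
            counts["N5"] += 1
        elif varied_b:
            counts["N3"] += 1
        elif varied_a:
            counts["N4"] += 1
    return counts


def d_lcov(group_a, group_b):
    """Proportion of sites varied in one group but not in the other."""
    counts = site_pattern_counts(group_a, group_b)
    return (counts["N3"] + counts["N4"]) / counts["N"]


if __name__ == "__main__":
    first = ["ACGT", "ACGA"]   # varied only at the last site
    second = ["ACGT", "AGGT"]  # varied only at the second site
    print(site_pattern_counts(first, second))
    print(d_lcov(first, second))
```

In this toy alignment one site is varied only in the second group and one only in the first, so two of the four sites contribute to dlcov.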

At variable positions under both the RAS and the covarion-style models, we assume that the underlying mechanism of nucleotide substitution is described by the Kimura 3ST model (or some submodel). The results are expected to be similar under other models of nucleotide substitution but somewhat more difficult to analyze. Under either the RAS or the covarion-style model, the expected value of Nk/N is pk - pij, where pk is the probability that the site is varied among the sequences in group k ∈ {i, j}, and where pij is the probability that the site is varied among the sequences in both groups. Thus, if we let eij denote the expected value of dlcov (under either model), then eij = pi + pj - 2pij. Consequently, under an RAS model, we have:


where P[Ek | λ] is the probability that the sequences in group k are varied at a site evolving at rate λ, and the integration is performed with respect to the distribution of rates across sites. Note that if the sites all evolve at the same rate, pij = pipj.
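The relations stated in the text can be written out as a short worked derivation. This is only a sketch: the symbol F for the distribution of rates across sites is notation introduced here, and the factorization of the joint probability given the rate is an assumption (it is consistent with the note that pij = pipj when all sites share one rate), so the integral form may not match the published equation (1) term for term.

```latex
\begin{align*}
e_{ij} &= p_i + p_j - 2\,p_{ij}
  && \text{(stated in the text)} \\
p_k &= \int P[E_k \mid \lambda]\, dF(\lambda), \qquad k \in \{i, j\}
  && \text{(RAS; $F$ is assumed notation)} \\
p_{ij} &= \int P[E_i \mid \lambda]\, P[E_j \mid \lambda]\, dF(\lambda)
  && \text{(assumed factorization given $\lambda$)} \\
e_{ij} &= \int \Bigl( P[E_i \mid \lambda]\,\bigl(1 - P[E_j \mid \lambda]\bigr)
        + P[E_j \mid \lambda]\,\bigl(1 - P[E_i \mid \lambda]\bigr) \Bigr)\, dF(\lambda) \\
\text{single rate:}\quad
e_{ij} &= p_i\,(1 - p_j) + p_j\,(1 - p_i)
  && \text{(since $p_{ij} = p_i p_j$)}
\end{align*}
```

None of these expressions contains the distance between the groups, which is the property the argument relies on.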

For the covarion-style model described in Tuffley and Steel (1998), lemma 7 of that paper shows that

where b and c are positive constants (dependent only on the switching rates between "variable" and "invariable" states under the model), τij is the evolutionary distance between groups i and j, and xk = P[Ek | var] - P[Ek | inv], where P[Ek | var] (respectively, P[Ek | inv]) is the probability that a site is varied for the sequences in group k ∈ {i, j}, given that it is variable (respectively, invariable) at the root vertex of this group in the underlying tree.

In comparing formulae (1) and (2) for eij under the two models, we note that equation (1) does not involve the evolutionary distance τij between the groups. Hence, under an RAS model, we cannot expect dlcov to extract phylogenetic signal. However, eij increases monotonically with τij for the covarion-style model (eq. 2) and therefore is a (nonlinear) measure of the phylogenetic distance between the groups. Thus, to a first approximation, the dlcov values are expected to fit a star phylogeny under an RAS model. Under a suitable covarion/covariotide-style model (and with τij small and monophyletic groups of similar diversity), the expectation is that dlcov will fit the underlying bifurcating tree.

We tested whether dlcov would allow the recovery of tree shapes similar to the model tree when sequences evolved under a non–covarion/covariotide model. For sequences of finite length (c = 100, 200, 300, 400, and 500), we simulated the evolution of five groups of sequences (each containing four sequences) on a bifurcating tree under Jukes-Cantor and RAS models (gamma law distribution of rates with shape parameters 0.5, 1, and 1.5), where the numbers of expected substitutions per site were set to 0.2 for the internal edges and to 0.1 for the external ones. For all combinations of parameters, we generated 100 different data sets. For each such data set we then reconstructed splitsgraphs (Bandelt and Dress 1992; Huson 1998) using (1) dlcov and (2) traditional distance measures, corrected according to the model used to simulate the data. Unlike the model transformation, dlcov tended to produce a splitsgraph that did not favor a particular bifurcating tree.

Next, we applied dlcov and split decomposition to five different eubacterial tree of life data sets. For the analyses carried out, sequences were sampled from eubacterial groups (e.g., oxygenic photosynthesis, low G+C gram positives, etc.) so as to cover as much of the genetic diversity of each group as possible yet also maintain a hierarchical structure within each group. These steps were carried out in an attempt to identify diverse sequences showing the most conserved group structure. Sequences whose presence produced unresolved trifurcations between basal lineages within groups were excluded, since these perturbed the treelike properties of both dlcov and dcov (i.e., the splitsgraphs became boxlike). Groups that were poorly sampled with shallow divergences were also avoided. The list of taxa used, along with the alignments, is available from http://www.massey.ac.nz/~imbs/Research/MolEvol/Farside/ Plants.html.
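The simulation design described above can be sketched roughly as follows. This is not the authors' code: the tree shape within and between groups is not given in the text, so the balanced four-taxon subtree, the restriction to two groups rather than five, and all function names are assumptions made for illustration. Substitution follows Jukes-Cantor, optionally with gamma-distributed site rates of mean 1.

```python
import math
import random

BASES = "ACGT"


def evolve(seq, rates, branch_length, rng):
    """Evolve a sequence along one edge under Jukes-Cantor.

    branch_length is the expected number of substitutions per site
    for a site with relative rate 1; rates scales it per site.
    """
    out = []
    for base, rate in zip(seq, rates):
        # JC probability that the end state differs from the start state
        p_change = 0.75 * (1.0 - math.exp(-4.0 * rate * branch_length / 3.0))
        if rng.random() < p_change:
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
    return "".join(out)


def simulate_group(ancestor, rates, rng, internal=0.2, external=0.1):
    """Four sequences on an assumed balanced subtree ((1,2),(3,4))."""
    group = []
    for _ in range(2):
        node = evolve(ancestor, rates, internal, rng)
        group.extend(evolve(node, rates, external, rng) for _ in range(2))
    return group


def simulate_two_groups(n_sites, gamma_shape=None, seed=0):
    """Simulate two groups of four sequences diverging from one root.

    gamma_shape=None gives equal rates (plain Jukes-Cantor); otherwise
    site rates are drawn from a gamma distribution with mean 1.
    """
    rng = random.Random(seed)
    if gamma_shape is None:
        rates = [1.0] * n_sites
    else:
        # shape k and scale 1/k give a mean rate of 1
        rates = [rng.gammavariate(gamma_shape, 1.0 / gamma_shape)
                 for _ in range(n_sites)]
    root = "".join(rng.choice(BASES) for _ in range(n_sites))
    return [simulate_group(evolve(root, rates, 0.2, rng), rates, rng)
            for _ in range(2)]


if __name__ == "__main__":
    groups = simulate_two_groups(500, gamma_shape=0.5, seed=1)
    for group in groups:
        print([seq[:20] for seq in group])
```

Repeating such a simulation over many seeds and computing a group dissimilarity for each replicate would mirror the 100 data sets per parameter combination mentioned in the text.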

For each data set, figure 1 shows (1) unweighted bootstrap neighbor-joining trees (obtained using PAUP, version 4; Swofford 1999) recovered using uncorrected (Hamming) distances and (2) split decomposition graphs (obtained using Splitstree, version 3.1; Huson 1997) recovered using dlcov. Since split decomposition makes no assumption that data fit a bifurcating tree, it provides a conservative test for identifying covarion/covariotide support for splits which occur in the neighbor-joining trees.



Fig. 1.—Neighbor-joining trees (left) and splitsgraphs (right) for five eubacterial data sets. Total sequence (Hamming) differences were used in construction of the neighbor-joining trees, and group dissimilarity measures were used in reconstructing the splitsgraphs. With protein sequences, dij = dlcov = (N3 + N4)/N. For 16S rDNA, dij = (xN2 + N3 + N4)/N, where weightings for x = 2–4 gave the bifurcating graph shown. Gamma proteobacterial groups 1 and 2 correspond to strongly supported splits in the Hamming distance/neighbor-joining trees.

 
Comparisons of the neighbor-joining trees and splitsgraphs for protein sequences indicate that the distributions of N3 and N4 patterns in the different data sets give rise to treelike distances for dlcov and splits that correspond to those recovered most strongly in the neighbor-joining trees (e.g., the splits between the α and γ proteobacteria and between the proteobacteria and other groups). These observations are explained if sequences belonging to the different monophyletic groups differ in their distributions of variable sites and if these differences provide support for the treelike structures recovered by tree-building algorithms such as neighbor joining.

Less support is provided by N3 + N4 patterns in the 16S rDNA sequences studied here. With these data, the expected neighbor-joining phylogenetic tree for 16S rDNA is recovered only if we include in our dissimilarity measure an additional pattern class N2 (i.e., sites at which the character states are different between the two groups and unvaried within each group). The evolution of these patterns is equally well described by covarion and noncovarion models. Thus, with rDNA, while there is evidence for covarionlike patterns of evolution in this molecule (Lockhart et al. 1998), the extent to which these contribute to the inferred phylogenetic relationship between major eubacterial groups is less clear.
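The weighted measure given in the figure 1 caption, dij = (xN2 + N3 + N4)/N, can be sketched in the same way. The function name `weighted_group_dissimilarity` and the default weight are hypothetical; the N2 class is counted as defined in the paragraph above (states differ between the groups and are unvaried within each).

```python
def weighted_group_dissimilarity(group_a, group_b, x=2.0):
    """Compute (x*N2 + N3 + N4) / N for two groups of aligned sequences.

    N2: unvaried within each group, but the two groups differ
    N3: unvaried in group_a, varied in group_b
    N4: varied in group_a, unvaried in group_b
    """
    n_sites = len(group_a[0])
    if n_sites == 0:
        raise ValueError("alignment has no sites")
    n2 = n3 = n4 = 0
    for site in range(n_sites):
        states_a = {seq[site] for seq in group_a}
        states_b = {seq[site] for seq in group_b}
        varied_a = len(states_a) > 1
        varied_b = len(states_b) > 1
        if not varied_a and not varied_b:
            if states_a != states_b:
                n2 += 1  # fixed difference between the groups
        elif varied_b and not varied_a:
            n3 += 1
        elif varied_a and not varied_b:
            n4 += 1
    return (x * n2 + n3 + n4) / n_sites


if __name__ == "__main__":
    first = ["ACGT", "ACGA"]
    second = ["TCGT", "TGGT"]
    # one N2 site, one N3 site, one N4 site out of four
    print(weighted_group_dissimilarity(first, second, x=2.0))
```

Because N2 sites are weighted by x, the value is a dissimilarity score rather than a proportion and can exceed 1 for large weights.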

It is reassuring that the strongest splits recovered in our protein splitsgraphs reconstructed using dlcov are found with different eubacterial data sets and are also recovered using the nonlinear covarion transform dcov (figures not shown), suggesting a common evolutionary history for these different molecules. However, the extent to which asymmetric processes of change may be convergent (and potentially misleading for phylogeny reconstruction) across more widely sampled groups in trees of life is a question that requires further study. Biased amino acid and nucleotide compositions can be convergent (Barbrook, Lockhart, and Howe 1998; Forster and Hickey 1999; Lockhart et al. 1999), and they are known to cause a problem for phylogeny reconstruction when sequences accepting biased substitutions also share similar distributions of varying sites (Lockhart et al. 1998). Although changes in distributions of variable sites may help to "fossilize" phylogenetic history in sequences (Lopez, Forterre, and Philippe 1999), some changes may cause problems for tree building. This can occur if the proportion of variable sites in sequences increases independently in different lineages (e.g., Lockhart et al. 1998; Philippe and Laurent 1998; Germot and Philippe 1999; Steel, Huson, and Lockhart 2000). In this case, the data can be described by the type of inconsistency phenomena discussed by Felsenstein (1978). Such processes have been suggested to mislead outgroup placement with duplicated genes (Lockhart et al. 1996; Philippe and Forterre 1999) and also to mislead the divergence order of eukaryotes (Germot and Philippe 1999; Philippe et al. 2000). These results and those we report here highlight the need for improving our understanding of the biochemical basis for processes of asymmetrical change in sequence evolution. This knowledge would surely help provide confidence in the phylogenetic inference of ancient divergences.

A final point is that we do not propose dlcov as an additive distance measure for building evolutionary trees. The measure is not expected to extract all the useful information present in the sequences, and, as we have pointed out, observations on diverse data sets suggest that the evolution of some sequences occurs by covarion processes which are nonstationary. This is a phenomenon which is difficult to model.


    Acknowledgements
We acknowledge support from the Alexander von Humboldt Foundation, the Deutsche Forschungsgemeinschaft, the New Zealand Marsden Fund, the New Zealand/German co-operation agreement, the Broodbank Fund, and the BBSRC.


    Footnotes
 
Masami Hasegawa, Reviewing Editor

1 Keywords: covarion, covariotide, nonstationarity, split decomposition

2 Address for correspondence and reprints: Peter J. Lockhart, Institute of Molecular BioSciences, Massey University, Palmerston North, New Zealand.


    literature cited

    Bandelt, H. J., and A. W. M. Dress. 1992. Split decomposition: a new and useful approach to phylogenetic distance data. Mol. Phylogenet. Evol. 1:242–252.

    Barbrook, A. C., P. J. Lockhart, and C. J. Howe. 1998. Phylogenetic analysis of plastid origins based on SecA sequences. Curr. Genet. 34:336–341.

    Felsenstein, J. 1978. Cases in which parsimony and compatibility methods will be positively misleading. Syst. Zool. 27:401–410.

    Fitch, W. M., and E. Markowitz. 1970. An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochem. Genet. 4:579–593.

    Forster, P. G., and D. A. Hickey. 1999. Compositional bias may affect both DNA-based and protein-based phylogenetic reconstructions. J. Mol. Evol. 48:284–290.

    Germot, A., and H. Philippe. 1999. Critical analysis of eukaryotic phylogeny: a case study based on the HSP70 family. J. Eukaryot. Microbiol. 46:116–124.

    Hasegawa, M., and T. Hashimoto. 1993. Ribosomal RNA trees misleading? Nature 361:23.

    Huson, D. 1998. SplitsTree: a program for analyzing and visualizing evolutionary data. Bioinformatics 14:68–73.

    Lanave, C., G. Preparata, C. Saccone, and G. J. Serio. 1984. A new method for calculating evolutionary substitution rates. J. Mol. Evol. 20:86–93.

    Li, W. H., and X. Gu. 1996. Estimating evolutionary distances between DNA sequences. U.K. edition, London.

    Lockhart, P. J., A. W. D. Larkum, M. A. Steel, P. J. Waddell, and D. Penny. 1996. Evolution of chlorophyll and bacteriochlorophyll: the problem of invariant sites in sequence analysis. Proc. Natl. Acad. Sci. USA 93:1930–1934.

    Lockhart, P. J., M. A. Steel, A. C. Barbrook, D. H. Huson, and C. J. Howe. 1998. A covariotide model describes the evolution of oxygenic photosynthesis. Mol. Biol. Evol. 15:1183–1188.

    Lockhart, P. J., C. J. Howe, A. C. Barbrook, A. W. D. Larkum, and D. Penny. 1999. Spectral analysis, systematic bias, and the evolution of chloroplasts. Mol. Biol. Evol. 16:573–576.

    Lopez, P., P. Forterre, and H. Philippe. 1999. The root of the tree of life in the light of the covarion model. J. Mol. Evol. 49:496–508.

    Moreira, D., H. L. Guyader, and H. Philippe. 1999. Unusually high evolutionary rate of the elongation factor 1a genes from the ciliophora and its impact on the phylogeny of eukaryotes. Mol. Biol. Evol. 16:234–245.

    Philippe, H., and P. Forterre. 1999. The rooting of the universal tree of life is not reliable. J. Mol. Evol. 49:509–523.

    Philippe, H., and J. Laurent. 1998. How good are deep phylogenetic trees? Curr. Opin. Genet. Dev. 8:616–623.

    Philippe, H., P. Lopez, H. Brinkman, K. Budin, A. Germot, J. Laurent, D. Moreira, M. Müller, and H. LeGuyader. 2000. Tree reconstruction and the phylogeny of the eukaryotes. Proc. Natl. Acad. Sci. USA (in press).

    Steel, M. A., D. Huson, and P. J. Lockhart. 2000. Syst. Biol. (in press).

    Sueoka, N. 1988. Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. USA 85:2653–2657.

    Swofford, D. L. 1999. PAUP. Version 4.65. Sinauer, Sunderland, Mass.

    Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996. Phylogenetic inference. Pp. 407–514 in D. M. Hillis, C. Moritz, and B. K. Mable, eds. Molecular systematics. Sinauer, Sunderland, Mass.

    Tuffley, C., and M. A. Steel. 1998. Modeling the covarion hypothesis of nucleotide substitution. Math. Biosci. 147:63–91.

Accepted for publication January 18, 2000.

