Skip Navigation


MBE Advance Access originally published online on May 19, 2008
Molecular Biology and Evolution 2008 25(8):1677-1682; doi:10.1093/molbev/msn117
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
25/8/1677    most recent
msn117v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Lake, J. A.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Lake, J. A.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2008. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Reconstructing Evolutionary Graphs: 3D Parsimony

James A. Lake1

Department of Molecular, Cellular, and Developmental Biology, University of California, Los Angeles; Molecular Biology Institute, University of California, Los Angeles; Department of Human Genetics, University of California, Los Angeles; and UCLA Astrobiology Institute, University of California, Los Angeles

E-mail: lake{at}mbi.ucla.edu.


    Abstract
 TOP
 Abstract
 Introduction
 Theory
 Discussion
 Acknowledgements
 References
 
The increasing recognition that symbioses have greatly altered evolution through genome fusions is creating a need for algorithms that can reliably detect and reconstruct fusions. Here, we generalize the bootstrappers gambit algorithm (a quartet method) in order to permit it to analyze both bifurcations and fusions under a single mathematical model, and thereby detect past genomic branchings and endosymbioses. This new method, 3-dimensional parsimony, can be applied to aligned sequences, such as gene, indel, or other genomic presence/absence sequences. It also provides a statistical measure of support for each possible graph. The usefulness of this method is demonstrated by applying it to the ring of life.

Key Words: 3D parsimony • ring of life • eukaryotes • fusion digraphs • evolutionary graphs


    Introduction
 TOP
 Abstract
 Introduction
 Theory
 Discussion
 Acknowledgements
 References
 
Given the enormous interest in discovering the topology of the net-, the web-, the ring-, the tree-, or the non–tree-of-life (Hilario and Gogarten 1993Go; Doolittle 1999Go; Jain et al. 2003Go; Rivera and Lake 2004Go; Konstantinidis and Tiedje 2005Go; Dagan and Martin 2006Go; McInerney and Pisani 2007Go; Sorek et al. 2007Go), there is a growing need for phylogenetic algorithms that can reconstruct trees, rings, and other graphs from genomic data. The conditioned reconstruction algorithm (Lake and Rivera 2004Go) reconstructs graphs from genomes by converting gene presences and absences to sequences, determines phylogenetic trees from alignments of these sequences, and thereby detects graphs through the analyses of cyclically related phylogenetic trees.

Here, we provide a generalized method, 3-dimensional (3D) parsimony, to reconstruct genomic graphs. 3D parsimony analyzes graphs, including trees, under a single mathematical model using the presence and absence patterns observed for genes, indels, or other character data. It also provides an improved theoretical framework for understanding graph reconstruction. We demonstrate this method by applying it to the Ring of Life.


    Theory
 TOP
 Abstract
 Introduction
 Theory
 Discussion
 Acknowledgements
 References
 
Tree Reconstruction
3D parsimony is a generalization of parsimony-based logic for detecting graphs and directed graphs from aligned genomic sequences. It combines aspects of traditional parsimony (Fitch 1971Go), conditioned reconstructions (Lake and Rivera 2004Go), quartet methods (Strimmer and VonHaeseler 1996Go)—specifically bootstrappers gambit (Lake 1995Go)—and top-down indel rooting (Lake et al. 2007Go).

To appreciate how parsimony can be applied to graphs, it is useful to review how the bootstrappers gambit algorithm parsimoniously analyzes trees. Consider the 5-taxon, unrooted tree shown at the top of figure 1, the ((a, b), c, (d, e)) tree in Newick notation. Gambit analyzes 5-taxon trees by systematically removing 1 taxon at a time in order to generate 5, four-taxon sequence alignments of gene presences, +’s, and absences, –’s. It then analyzes the 4-taxon alignments by counting which parsimoniously informative character state patterns, that is, the patterns containing 2 +’s and 2 –’s, occur most frequently. These 5 most parsimonious, 4-taxon subtrees are then graphically combined into a single 5-taxon tree.


Figure 1
View larger version (15K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 1.— A gambit analysis of a labeled, 5-taxon tree. In the first column of the figure, the tree is deconstructed into 5, four-taxon trees. The corresponding set of parsimonious patterns obtained when each of the 4-taxon trees is evaluated using the 2 character states, gene presence (+) and gene absence (–), is shown at the right of each 4-taxon tree (note that –’s are shown as blanks). The sites of parsimonious character state changes are marked on the trees in the first column by open rectangles labeled with the identifying numbers of the character state patterns (shown in the third column). When analyzed by gambit, the boxed Xs in the second column, may have either character state + or –, but when analyzed by conditioned reconstruction, the character state of X may be only +. All possible informative character state patterns are shown in the third column. The numbered boxes on the 4-taxon trees indicate the sites of the parsimonious changes and their patterns. Note that those informative character state patterns used for conditioning contain 3 +’s and 2 –’s and correspond to patterns 1–10. For the labeled tree analyzed in this figure, conditioned patterns 1, 2, 5, 8, 9, and 10 are most parsimonious and are predicted to be supported by high numbers of gene counts when data are generated by this tree.

 
A gambit analysis is illustrated in figure 1, assuming that the genomes evolve according to the tree shown at the top of the figure. When taxon e is eliminated from the analysis, shown by a dashed e leaf in the tree at the top of the first column, the resulting subtree has topology ((a, b), (c, d)).

Most parsimoniously this subtree produces gene counts for the supported patterns, (+, +, , , X) and ( , ,+, +, X) for taxa (a, b, c, d, e) respectively, shown at the top of the second column of figure 1. The 4 nonparsimonious patterns are not shown. (Note also that –’s are shown as blanks for visual clarity.) The Xs in the patterns indicate that the genomic sequence state for taxon e may be either a plus or a minus because the character states of the missing taxa are unknown. The other 4 panels in the second column of figure 1 show the allowed sets for the remaining choices of missing taxa. Note that paired patterns for each of the 5 subtrees are symmetric, that is, (+, +, –, –) and (–, –, +, +), when the taxon containing X is not shown. This is true only for trees and not generally the case for graphs where the corresponding patterns may be (+, +, –, –) and (+, –, +, –), for example. Once the best supported character state sets are identified by counting the number of genes that support each pattern, one can infer the most parsimonious 4-taxon trees and subsequently combine them into 5-taxon trees. This logic can be iteratively applied to more taxa as needed.

Gambit-like analyses of conditioned data sets proceed in related ways. As previously described (Lake and Rivera 2004Go), conditioning is used when analyzing complete genomes in order to define rigorously the genes that support each character state pattern. (Without a conditioning genome, the gene missing state, –, may be defined in multiple ways.) Conditioning is accomplished by referencing the analyses to a predefined set of genes, such as all those genes present, +, in a reference taxon. The reference taxon is then used to define the gene "universe" for the analysis but is not directly used in the phylogenetic reconstruction. By utilizing each taxon in turn as the reference genome, one can avoid deciding on a particular reference genome and simultaneously maximize the informational content of the analysis (Spencer et al. 2007Go).

Thus, for a conditioned 5-taxon analysis, one analyzes all 5 four-taxon trees, as in the second column of figure 1, except that one only counts patterns in which X corresponds to character state +. Under this model, the 2 patterns that support the ((a, b), (c, d)) tree, shown at the top of the second column, are 5 (+, +, , ,+) and 10 (, , +, +, +), corresponding to taxa (a, b, c, d, e), respectively. Similarly, the remaining 4-taxon trees are analyzed by conditioning on each of the 4 remaining taxa. Thus, conditioning can be naturally integrated into the gambit algorithm.

At this point, it may be useful to step back and consider how a bootstrappers gambit analysis differs from a direct parsimony analysis of 5-taxon trees. The parsimoniously informative character sets for conditioned 4-taxon analyses consist of all sets that contain 3 +’s and 2 –’s (1 + for conditioning and 2 +’s and 2 –’s for analyzing the 4-taxon trees). This restricts the analysis to the 10 possible combinations of 3 +’s and 2 –’s listed in the third column of figure 1, numbered 1–10. Depending on the conditioning taxon, patterns 1, 2, 5, 8, 9, and 10 are most parsimonious with the corresponding 4-taxon trees derived from tree ((a, b), c, (d, e)). In contrast, a parsimonious analysis of the entire 5-taxon tree, that is, a nonbootstrappers analysis, permits a smaller set of patterns. Specifically, patterns 1, 10, 11, and 20 are parsimonious. Because patterns 11 and 20 contain 3 –’s and 2 +’s, they are not informative if conditioned. Thus, 3D parsimony utilizes 4 terms, 2, 5, 8, and 9, that do not support the 5-taxon tree in direct, traditional parsimony analyses. When the method is used to reconstruct graphs, these additional terms provide the new information that is needed to move out of 2 dimensional tree space and into 3D graph space, as explained in the next section.

Reconstructing Graphs
The procedures described in this section are appropriate for labeled, connected graphs generated under a general evolutionary model. Under this model, each internal node connects 3 edges/leaves and may be produced by several processes of evolutionary diversification: fusion, bifurcation, or directed transfer. Fusion nodes are surrounded by 1 directed edge exiting the node (the fusion organism) and 2 directed edges entering it (the fusion donors), unrooted bifurcation nodes are surrounded by 3 edges of unknown directionality, and unrooted directed transfer donors and recipients are surrounded by 1 edge of known directionality. An example of a directed transfer would be the significant transfer of genes from the chloroplast to the nucleus but not in the reverse direction. (In graphical terminology [Buckley and Harary 1990Go], the indegrees and outdegrees, respectively, of a fusion node are 2 and 1, those of a rooted bifurcation node are 1 and 2, those of an unrooted bifurcation node are 3 and 3, those of an unrooted directed donor are 2 and 1, and those of an unrooted directed recipient are 1 and 2.) In evolutionarily realizable graphs, the fusion, directed, and bifurcation nodes must be arranged so that at all points are reachable from at least 1 point on the graph (the root).

A 3D parsimony analysis of a 5-taxon, unicycle (Semple and Steel 2006Go) graph is illustrated in figure 2. In this graph, shown at the top of the figure, genomes from 2 donor taxa, labeled a and d, fuse to create a new taxon, c. Genome fusions have novel properties that make them initially more difficult to understand than bifurcations. To appreciate this, let us focus on the ring that is conditioned on taxon e, shown at the top of the left column in figure 2, and consider the number of character state changes necessary to obtain the pattern (+, –, +, –) for taxa (a, b, c, d). The gene present in taxon a can be transferred to taxon c without a character state change, as indicated by the arrowhead directed toward c from a. Because the gene is absent in taxon b and present in a, it must either be lost or gained between a and b as indicated by the open rectangular box marked 6. Because the gene absent in b is also absent in d, no changes along the path between b and d are required (ignore rectangular box 10 because it applies to a different pattern). Also no gene state changes are required between d and c. Even though c and d have different states, d is a "–" and c is a "+," no character state changes are required because c has already inherited the gene from a, so that it is not necessary to inherit a second copy from d. Thus, only a single gene character state change is needed to explain pattern 6.


Figure 2
View larger version (13K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 2.— A gambit analysis of a labeled, 5-taxon evolutionary graph. In the first column of the figure, the graph is deconstructed into 5, four-taxon subgraphs. The corresponding set of parsimonious patterns obtained when each of the 4-taxon subgraphs is evaluated using the 2 character states, gene presence (+) and gene absence (–), is shown in the second column at the right of each 4-taxon subgraph (note that –’s are shown as blanks). The sites of parsimonious character state changes are marked on the graphs in the first column by open rectangles labeled with the identifying numbers of the character state patterns that this site generates. When analyzed by gambit and conditioned on the Xs, the character state of X may be only +. Hence, only patterns 1–10, consisting of 3 +'s and 2 –'s are allowed. The numbered boxes on the 4-taxon graphs indicate the sites of the parsimonious changes and their patterns. For the labeled graph analyzed here, conditioned patterns 1, 3, 4, 6, and 10 are most parsimonious and predicted to be supported by high numbers of gene counts when data are generated by this graph.

 
The fusion property differentiates the character state patterns generated by trees from those of other graphs. Trees, conditioned on a given taxon, create 2 four-taxon patterns that are symmetrically related. The top tree in figure 1, conditioned on taxon e, generates patterns 5 (+, +, –, –) and 10 (–, –, +, +) for taxa (a, b, c, d). These patterns are symmetrically related with respect to an interchange of +’s and –’s. In contrast, graphs are not restricted to this symmetry. For example, the graph at the top of figure 2 produces patterns (+, –, +, –) and (–, –, +, +) for taxa (a, b, c, d). These 2 patterns are not symmetrically related because the fusion provides 2 pathways for genes to enter fusion organism, c. A gene may enter c through a, producing pattern (+, –, +, –), or a gene may enter c through d, producing pattern (–, –, +, +) for taxa (a, b, c, d). Thus, fusion events can dramatically increase the complexity of allowed phylogenetic patterns.

The patterns allowed by the tree in figure 1, and by the ring in figure 2, are compared side-by-side in figure 3. The disallowed patterns, corresponding to low gene counts, are shaded, whereas the allowed patterns, corresponding to high gene counts, are not shaded. The allowed patterns for the ((a, b), c, (d, e)) tree analyzed in figure 1 (patterns 1, 2, 5, 8, 9, and 10) are shown in the second column, and the allowed patterns for the unicycle graph analyzed in figure 2 (patterns 1, 3, 4, 6, and 10) are shown in the fourth column. By comparing the experimentally observed pattern with those predicted by various graphs, one can determine the graph that best fits the data. Determining which 5-taxon graph is supported by the data then reduces to asking how many counts support each of the 10 possible, 5-taxon character state patterns, just as computing 4-taxon trees corresponds to measuring parsimony counts for the 3 possible E, F, and G trees.


Figure 3
View larger version (32K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 3.— A comparison of the patterns allowed by the labeled 5-taxon trees and 5-taxon graphs. The allowed patterns for the ((a, b), c, (d, e)) tree analyzed in figure 1 (patterns 1, 2, 5, 8, 9, and 10 shown in the second column) are compared with allowed patterns for the unicycle graph analyzed in figure 2 (patterns 1, 3, 4, 6, and 10, shown in the fourth column). The disallowed patterns correspond to low gene counts and are shaded, whereas the allowed patterns correspond to high gene counts and are not shaded. The numbered rectangular boxes shown on the conditioned, 4-taxon subtrees (column 1) and subgraphs (column 2) show the sites of parsimonious changes. As described in the text, these computed patterns are compared with the experimentally observed patterns to determine which graphs best fit the data.

 
Intuitively, a ring is a more complex object than a tree leading to the suspicion that one would always favor a ring over a tree. But this is not the case. Because all 5-taxon graphs (including trees) are analyzed as 4-taxon subtrees and because these subtrees are evaluated using the same parsimony criteria (namely all 2 plus and 2 minus patterns can be explained by a maximum of 2 character state changes or a minimum of 1 change), this ensures that identical selection criteria are applied to both trees and graphs.

The 3D Parsimony Analyses of the Ring of Life Data
Conditioned reconstructions have previously been used to study the origin of the eukaryotes and provided evidence for a Ring of Life in which the eukaryotes are the fusion organism and the 2 fusion partners are prokaryotes (Rivera and Lake 2004Go). That study concluded that 1 fusion partner branches from deep within an ancient photosynthetic clade and is "either a proteobacterium or a member of a larger photosynthetic clade that included the Cyanobacteria and the Proteobacteria," and "the other is related to the archaeal prokaryotes." Here, gene presence/absence data from the Ring of Life are reanalyzed using 3D parsimony, to demonstrate the use of this new tool.

Parsimoniously informative, conditioned character state sets, containing 3 gene presences, +’s, and 2 gene absences, –’s, were extracted from the aligned gene absence/presence sequences of the gamma proteobacterium Xylella fastidiosa (X), the bacillus Staphylococcus aureus MW2 (S), the eukaryotes (Y) Schizosaccharomyces pombe and Saccharomyces cerevisiae, the archaeal eocyte Sulfolobus tokodaii (E), and the halobacterium Halobacterium sp. NRC-1 (H). The extracted counts for the informative character state sets are shown in table 1.


View this table:
[in this window]
[in a new window]

 
Table 1 Parsimony Counts for the Ring of Life

 
The first column of table 1 shows the 10 parsimoniously informative, character state patterns with gene presences indicated by +’s and gene absences indicated by blanks. The gene counts observed when either S. pombe, Yp, or S. cerevisiae, Yc, is included in the analyses are shown in the fourth and fifth columns of table 1, respectively. Note that similar count distributions are found when either S. pombe or S. cerevisiae are used as the eukaryote in the analyses, as expected for 2 closely related eukaryotes. The second column shows the number of parsimony changes needed for each pattern if the ((X, S), Y, (E, H)) tree is correct. The third column shows the number of parsimony changes needed if the graph shown in figure 4 is correct. Thus, we calculate the total number of gene changes required by multiplying the number of parsimony changes predicted by the tree or the graph times the number of genes supporting each pattern. Assuming the most parsimonious tree, ((X, S), Y, (E, H)), is correct requires 1,264 or 1,301 character state changes, when S. pombe or S. cerevisiae is used as the eukaryote, respectively. Assuming the graph in figure 4 is correct, requires 1,049 or 1,096 character state changes when S. pombe or S. cerevisiae, is used as the eukaryote, respectively. Thus, the ring of life requires about 210 fewer changes than the best tree.


Figure 4
View larger version (8K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 4.— The 5-taxon evolutionary graph derived from the analyses of the character state counts obtained from the ring of life data set. The character state set data used to derive this graph are shown in table 1. The taxa used in the analyses are referred to using the following abbreviations: the gamma proteobacterium Xylella fastidiosa (X), the bacillus Staphylococcus aureus MW2 (S), the eukaryotic yeast Schizosaccharomyces pombe (Y = Sp) and Saccharomyces cerevisiae (Y = Sc), the archaeal eocyte Sulfolobus tokodaii (E), and the halobacterium Halobacterium sp. NRC-1 (H). The graph in which S and H are switched is equally parsimonious.

 
The parsimony changes calculated from the best tree and the best graph are marked with stars when a character set pattern is most parsimonious (1 change). Six character set patterns are most parsimonious for the tree, and 5 are most parsimonious for the graph. Only 2 of the starred tree patterns match the top 6 gene counts, indicating poor agreement between the best tree and the observed counts. In contrast, the parsimony changes calculated from the 5-taxon unicycle graph in figure 4 (column 3) are in full agreement with the observed counts. Specifically, the 5 starred character set patterns computed to be most parsimonious (1 change) correspond to the top 5 highest scoring patterns for S. pombe and S. cerevisiae (shown in bold), and the 5 nonparsimonious patterns (2 changes) correspond to the bottom 5 scoring patterns. The 5 allowed patterns for the ring of life explain 601 (72.8%) of the Pfam counts or 624 (72.6%) of the Pfam counts when either S. pombe or S. cerevisiae are used as the eukaryote in the analyses.

To assess whether the gene counts provide statistically significant support for the graph in figure 4, we calculate the number of bootstrap analyses that support it and the number of bootstrap analyses that support any other graphs. The conditional probability supporting the graph in figure 3 is then the number of those bootstrap replicates that support the graph in figure 3 divided by the number of replicates that support it or any other graphs (Lake 1995Go). (For discussions of the interpretations of bootstraps, see Felsenstein and Kishino (1993)Go; Hillis and Bull (1993)Go; Efron et al. (1996)Go.) Using the fractions of the total counts supporting each of the 10 patterns to estimate the probability under a decavariate multinomial model, we calculate 10,000 bootstrap replicates of the character state pattern distributions. For the S. pombe data, 96.02% of the replicates contain the same 5 top patterns as those in table 1 and therefore support the graph in figure 4. For the S. cerevisiae data, 99.80% of the bootstraps support the graph in figure 4. We conclude that both 5-taxon graphs are statistically supported by these data, consistent with the ring reconstructed in Rivera and Lake (2004)Go.


    Discussion
 TOP
 Abstract
 Introduction
 Theory
 Discussion
 Acknowledgements
 References
 
Graphs are assuming increased importance in studies of genomic evolution. Today genome fusions through symbioses are a central feature in evolution and interest in fusions continues to increase (Margulis 1970Go; Lake and Rivera 1994Go; Moran 1996Go; Martin and Muller 1998Go; Embley 2002Go). Fusions are unique because they can fundamentally alter evolution in ways that bifurcations and simple gene transfers cannot. For example, a single endosymbiosis can introduce many more duplicate copies of genes into a host organism than other mechanisms and can dramatically restructure cellular architecture, as in eukaryotes.

Fusions have properties that allow them to reveal details of past evolutionary events. Genome fusions carry partial rooting information, whereas bifurcations do not. Fusions can thus exclude the root from parts of the graph of life, whereas bifurcations cannot. Fusions, in contrast to tree bifurcations, bring 2 different fusion partners together at the same "time" and in the same "environment" (place). This can provide important clues about the nature of the relationship between the partners and about the environment they lived in at the time of the fusion and can help place time markers on the graph of life. Yet, few phylogenetic algorithms can directly detect fusions.

There is considerable interest and discussion about the biological bases for genome fusions and the time spans over which they occur (Martin and Embley 2004Go; Bapteste and Walsh 2005Go; Lester et al. 2005Go; McInerney and Wilkinson 2005Go). Mathematically, fusions correspond to the directed transfer of statistically significant numbers of genes from 2 donor taxa to generate a new taxon. This is a process constrained by time, space, and phylogenetic relationships. When a fusion is detected, its biological interpretation depends in part upon how accurately one can experimentally answer the following questions. What are the best phylogenetic estimates of the identity of the fusion donor taxa? Is the resolution of the phylogenetic reconstruction sufficiently high that the donors can be localized to a few taxa, or can they only be identified as a large taxon? How accurately can we identify the fusion taxon? Are multiple events involved? Gaining such knowledge depends upon obtaining the necessary phylogenetic information. In this sense, the Ring of Life is only starting to inform us about eukaryotic origins. We would like to know if the ring is the result of observing the flow of genes from mitochondrion and chloroplast to the nucleus. Or is the ring representing an earlier phylogenetic relationship? 3D parsimony now provides us with another tool to help us better answer the fascinating questions related to eukaryotic evolution.

3D parsimony is only a first step toward analyzing evolutionary graphs, and many additional modifications are needed to improve and extend it. For example, it has long been known, theoretically (Felsenstein 1978Go) and practically (Lake 1991Go), that parsimony is subject to long-branch attraction. Thus, the results obtained here using 3D parsimony provide a starting point for improved graphical reconstructions and leave much room for improvement and modification. There is also considerable progress being made in developing circular splits and other algorithms useful for graphical reconstructions (Baroni et al. 2006Go; Grunewald et al. 2006Go). In theory, the results presented here allow one to reconstruct 5-taxon graphs from character state patterns by enumerating all possible 5-taxon evolutionary graphs and calculating their corresponding character state patterns. But no algorithm is available yet that allows us to determine 5-taxon graphs directly from the character state patterns. Also unsolved are the precise details of how 3D parsimony analyses can be extended from 5-taxon graphs to N-taxon graphs. The necessary procedures, probably like those used to extend 4-taxon trees to N-taxon trees in gambit and other algorithms, remain to be developed.

3D parsimony shows great promise for understanding the evolution of life and further developing this method to extend its usefulness represents an exciting scientific challenge.


    Acknowledgements
 TOP
 Abstract
 Introduction
 Theory
 Discussion
 Acknowledgements
 References
 
This paper is dedicated to Walter Fitch who taught me parsimony and to Mike Steel who showed me how to write the Bayesian-Jeffreys’ Bootstrap theorem. I thank Jackie Servin, Ryan Skophammer, and Craig Herbold for constant advice and help. Supported by grants from National Science Foundation and the UCLA NASA Astrobiology Institute to J.A.L.


    Footnotes
 
1 Present address: 232 Boyer Hall, 611 South Young Drive, University of California, Los Angeles Back

Martin Embley, Associate Editor


    References
 TOP
 Abstract
 Introduction
 Theory
 Discussion
 Acknowledgements
 References
 

    Bapteste E, Walsh DA. Does the ‘Ring of Life’ ring true? Trends Microbiol (2005) 13:256–261.[CrossRef][Web of Science][Medline]

    Baroni M, Semple C, Steel M. Hybrids in real time. Syst Biol (2006) 55:46–56.[CrossRef][Web of Science][Medline]

    Buckley F, Harary F. Distance in graphs (1990) Addison-Wesley. Redwood City (CA).

    Dagan T, Martin W. The tree of one percent. Genome Biol (2006) 7:118.[CrossRef][Medline]

    Doolittle WF. Phylogenetic classification and the universal tree. Science (1999) 284:2124–2128.[Abstract/Free Full Text]

    Efron B, Halloran E, Holmes E. Bootstrap confidence levels for phylogenetic trees. Proc Natl Acad Sci USA (1996) 93:13429–13434.[Abstract/Free Full Text]

    Embley TM. Anaerobic eukaryotes and their archaebacterial endosymbionts. Environ Microbiol (2002) 4:15–16.[CrossRef][Medline]

    Felsenstein J. Cases in which parsimony or compatibility methods will be positively misleading. Syst Zool (1978) 27:401–410.[Abstract/Free Full Text]

    Felsenstein J, Kishino H. Is there something wrong with the bootstrap on phylogenies—a reply. Syst Biol (1993) 42:193–200.[Free Full Text]

    Fitch WM. Toward defining the course of evolution: minimum change for a specific tree topology. Syst Zool (1971) 20:406–416.[Abstract]

    Grunewald S, Forslund K, Dress A, Moulton V. QNet: an agglomerative method for the construction of phylogenetic networks from weighted quartets. Mol Biol Evol (2006) 24:532–538.[CrossRef][Web of Science][Medline]

    Hilario E, Gogarten JP. Horizontal transfer of ATPase genes—the tree of life becomes a net of life. Biosystems (1993) 31:111–119.[CrossRef][Web of Science][Medline]

    Hillis DM, Bull JJ. An empirical-test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst Biol (1993) 42:182–192.[Abstract/Free Full Text]

    Jain R, Rivera MC, Moore JE, Lake JA. Horizontal gene transfer accelerates genome innovation and evolution. Mol Biol Evol (2003) 20:1598–1602.[Abstract/Free Full Text]

    Konstantinidis KT, Tiedje JM. Towards a genome-based taxonomy for prokaryotes. J Bacteriol (2005) 187:6258–6264.[Abstract/Free Full Text]

    Lake JA. Tracing origins with molecular sequences: metazoan and eukaryotic beginnings. Trends Biochem Sci (1991) 16:46–50.[CrossRef][Web of Science][Medline]

    Lake JA. Calculating the probability of multitaxon evolutionary trees—bootstrappers Gambit. Proc Natl Acad Sci USA (1995) 92:9662–9666.[Abstract/Free Full Text]

    Lake JA, Herbold CW, Rivera MC, Servin JA, Skophammer RG. Rooting the tree of life using non-ubiquitous genes. Mol Biol Evol (2007) 24:130–136.[Abstract/Free Full Text]

    Lake JA, Rivera MC. Was the nucleus the 1st endosymbiont? Proc Natl Acad Sci USA (1994) 91:2880–2881.[Free Full Text]

    Lake JA, Rivera MC. Deriving the genomic tree of life in the presence of horizontal gene transfer: conditioned reconstruction. Mol Biol Evol (2004) 21:681–690.[Abstract/Free Full Text]

    Lester L, Meade A, Pagel M. The slow road to the eukaryotic genome. Bioessays (2005) 28:57–64.[Web of Science]

    Margulis L. Origin of the eukaryotic cells (1970) New Haven (CT): Yale University Press.

    Martin W, Embley TM. Early evolution comes full circle. Nature (2004) 431:134–136.[CrossRef][Medline]

    Martin W, Muller M. The hydrogen hypothesis for the first eukaryote. Nature (1998) 392:37–41.[CrossRef][Medline]

    McInerney JO, Pisani D. Paradigm for life. Science (2007) 318:1390–1391.[Abstract/Free Full Text]

    McInerney JO, Wilkinson M. New methods ring changes for the tree of life. Trends Ecol Evol (2005) 20:105–107.[CrossRef][Medline]

    Moran NA. Accelerated evolution and Muller's ratchet in endosymbiotic bacteria. Proc Natl Acad Sci USA (1996) 93:2873–2878.[Abstract/Free Full Text]

    Rivera MC, Lake JA. The ring of life: evidence for a genome fusion origin of eukaryotes. Nature (2004) 431:152–155.[CrossRef][Medline]

    Semple C, Steel MA. Unicyclic networks: compatibility and enumeration. IEEE/ACM Trans Comput Biol Bioinform (2006) 3:84–91.[CrossRef]

    Sorek R, Zhu Y, Creevery CJ, Francino MP, Bork P, Rubin EM. Genome-wide experimental determination of barriers to horizontal gene transfer. Science (2007) 318:1449–1452.[Abstract/Free Full Text]

    Spencer M, Bryant D, Susko E. Conditioned genome reconstruction: how to avoid choosing the conditioning genome. Syst Biol (2007) 56:25–43.[CrossRef][Web of Science][Medline]

    Strimmer K, VonHaeseler A. Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies. Mol Biol Evol (1996) 13:964–969.[Web of Science]

Accepted for publication May 10, 2008.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Phil Trans R Soc BHome page
J. A. Lake, R. G. Skophammer, C. W. Herbold, and J. A. Servin
Genome beginnings: rooting the tree of life
Phil Trans R Soc B, August 12, 2009; 364(1527): 2177 - 2185.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
25/8/1677    most recent
msn117v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Lake, J. A.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Lake, J. A.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?