MBE Advance Access originally published online on October 16, 2007
Molecular Biology and Evolution 2008 25(1):83-91; doi:10.1093/molbev/msm229
Research Articles |
Alternative Methods for Concatenation of Core Genes Indicate a Lack of Resolution in Deep Nodes of the Prokaryotic Phylogeny
* Canadian Institute for Advanced Research and Genome Atlantic, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
Genome Atlantic, Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
E-mail: eric.bapteste{at}snv.jussieu.fr.
| Abstract |
|---|
It has recently been proposed that a well-resolved Tree of Life can be achieved through concatenation of shared genes. There are, however, several difficulties with such an approach, especially in the prokaryotic part of this tree. We tackled some of them using a new combination of maximum likelihood-based methods, developed to make concatenation as safe and careful as possible. First, we used the application concaterpillar on carefully aligned core genes. This application uses a hierarchical likelihood-ratio test framework to assess both the topological congruence between gene phylogenies (i.e., whether different genes share the same evolutionary history) and branch-length congruence (i.e., whether genes that share the same history share the same pattern of relative evolutionary rates). We thus tested whether these core genes can be concatenated or should instead be categorized into different incongruent sets. Second, we developed a heatmap approach to study how the phylogenetic support for different bipartitions evolves as the number of sites of different phylogenetic quality in the concatenation increases. These heatmaps allow us to follow which phylogenetic signals increase or decrease as the concatenation progresses and to detect emerging artifactual groupings, that is, groups that gain more and more support as more and more homoplastic sites are added to the analysis. We showed that, as far as 7 major prokaryotic lineages are concerned, only 22 core genes can be said to be congruent and can be safely concatenated. This number is even smaller than the number of genes retained to reconstruct a "Tree of One Per Cent." Furthermore, the concatenation of these 22 markers leads to an unresolved tree, as the only groupings in the concatenation tree seem to reflect emerging artifacts. Using concatenated core genes as a valid framework to classify uncharacterized environmental sequences can thus be misleading.
Key Words: phylogeny, concatenation, simultaneous analysis, Tree of Life, prokaryotes
| Introduction |
|---|
Recently, in Science, Ciccarelli et al. (2006)
However, as Dagan and Martin (2006) recently pointed out, this analysis can be interpreted in 2 very different ways. If the topology obtained by Ciccarelli et al. (2006) is, indeed, the one phylogeneticists have been searching for since Darwin, evolutionists embracing tree thinking ("positivists" [Dagan and Martin 2006]) can celebrate a major achievement: their principles are demonstrably capable of building the TOL. On the other hand, the victory appears suddenly incredibly fragile and vacuous for evolutionists interested in describing the diversity of evolutionary processes ("microbialists" [Dagan and Martin 2006]) as this Tree is simply the "Tree of One Per Cent" of the genome content. Its practical value and predictive powers for studying the genomic make-up and the similarities and the differences of various living beings appear almost shockingly limited.
In the analyses reported here, we have used 2 new powerful independent methodologies to test what can be safely assumed about the prokaryotic portion of such a tree and its meaning. We have dissected the behavior of the phylogenetic signal at various stages of the concatenation process and have shown that, for prokaryotes, 1) only 0.7% of the genome can be safely used to build a tree and 2) that the apparent resolution of this 0.7% tree is probably artifactual. In other words, for these taxa and these core markers, tree-thinking logic would lead to the conclusion—unfortunately for positivists and not surprisingly so for microbialists—that it is safer to assume a comb-like topology of life (a soft polytomy due to a lack of resolution in deep nodes) rather than a tree-like one. A few consequences of this result for phylogenetics are also briefly discussed.
Toward an Automated TOL
Noting that "reconstructing the phylogenetic relationships among all living organisms is one of the fundamental challenges in biology," Ciccarelli et al. (2006) have worked hard to produce an almost automatic reconstruction of the TOL. This problem is not an easy one to solve. As they observe, "even under the assumption of a tree of life, numerous groupings and taxonomic entities remain heavily debated, and the advent of molecular and genomic data has increased the variety of classifications rather than reducing the problem" (Ciccarelli et al. 2006). Furthermore, a fair comparison of these conflicting results would be extremely difficult. In such a context, Ciccarelli et al. argued that what makes their own analysis especially valuable was that it demonstrated "the feasibility of the tree construction." They grounded this conclusion on the observation that the selective exclusion of lateral gene transfer (LGT) (for which they identified 7 candidates among 31 orthologues from 191 species) increased the robustness of the phylogenetic signal and that the remaining ubiquitous orthologous markers were of sufficient length to be conclusive in supporting a single reasonably well-resolved phylogenetic tree. Put another way, they took the resolution of their final concatenated tree as the evidence for their claim about the feasibility of tree construction from concatenated core genes. Had their tree been only weakly supported they would have been obliged to conclude that no TOL can be obtained with the current methodology. Because this was fortunately not the case, they presented several of their robust groupings, such as the monophyly of all major divisions and some confirmatory branching orders to convince the readers of the strength of their conclusion.
However, high statistical support in a concatenation is not really independent evidence. This method does not test the fundamental premise of the existence of a tree; as we discussed more rigorously elsewhere, it is verificationist (Bucknam et al. 2006). That is, it works by accumulating data that are compatible with the null hypothesis of a common tree without having a chance to refute it, even for data of poor phylogenetic quality. It thus remains necessary to explore the origin of the resolution in the tree, and to test whether the support is genuine or artifactual, before the feasibility of the TOL can be proven. After all, the application of a phylogenetic reconstruction method will result in a tree, whether or not it is the true tree, and several nodes will likely be supported if the analysis was based on enough data. Notably, in the bacterial domain, where the branching order of lineages is unknown, it is difficult to evaluate how trustworthy the resolution is. The deep branching of the firmicutes, for instance, is in agreement with a proposed Gram-positive ancestor for all bacteria (but see Cavalier-Smith [2006] for different views) but could as well be due to some long-branch attraction (LBA) between the numerous fast-evolving Mycoplasmas included in the analysis and the archaeal and eukaryotic outgroups. Artifacts resulting from LBA could also be responsible for the sister grouping between Nanoarchaeota and Crenarchaeota, and the authors themselves cited the basal position of Giardia lamblia in Eukaryotes as such a possible difficulty (Ciccarelli et al. 2006).
Importantly too, the resolution of such a tree was no guarantee that this topology reflected the history of any of the individual genes that were used to build it. It may be the case that this tree emerged from this collection of data, but that another different and equally robust output could have been obtained, were a slightly different site selection considered (Brinkmann and Philippe 1999). Site selection could prove to be a significant issue as the length of concatenation of Ciccarelli et al. (2006) (8,090 sites for 191 species in 31 markers) suggests that their site selection was not particularly strict (Dagan and Martin [2006] even claimed that "only 1,212 sites would have been retained had gapped sites been excluded," and our own careful concatenation of a similar number of markers, comprising taxa of the 3 life domains and manually edited, was only 5,808 positions long). In addition, it is well known that concatenating data tends to produce strongly supported trees, but these trees depend on the model of evolution (Phillips et al. 2004; Keane et al. 2006). Thus, the very promising analysis of Ciccarelli et al. did not show much evidence that the concatenated tree it proposed was more than a central tendency of the phylogenetic signal, nor that its apparent support and topology did not result from some possible biases present in a few genes. Indeed, as these authors admitted, "independent tests carried out on individual gene trees revealed that, although they are not identical, they share similarities with both the obtained tree of life and with each other." More precisely, an average distance of 23 subtree pruning and regrafting operations (SPRs) separated any individual gene tree from the concatenated topology, a distance that is almost half what a random marker would be expected to present (49 SPR) (Dagan and Martin 2006). Because the distance from a concatenated topology for real genes will be smaller than the corresponding distances for random markers if any portion of the tree has some resolution, this intermediate value may not indicate much about the conservation of the global and deeper structure of the tree. It may well be that it is only local relationships within groups (the tips of the tree) that are better resolved than they would be with random markers.
Ciccarelli et al. (2006) nevertheless concluded very honestly that "although it may be possible to reject the null hypothesis of each of these tests without much difficulty, their combined evidence suggests that the gene trees have a cohesive phylogenetic signal," taking the resolution of the concatenation as the validation of the relevance of the concatenation approach. Yet, when we focused our own attention on the relationships between 7 of the major prokaryotic groups of interest to us, we observed a different result: that the resolution was not particularly strong (see supplementary material 1, Supplementary Material online). In order to critically assess the feasibility of the tree, at least for these 7 groups and when following a similar approach of automated concatenation, we applied 2 new phylogenetic methods to control the process of concatenation analysis. The first one (the progressive reconstruction method) tracks the evolution of the global phylogenetic signal when increasing proportions of sites of lesser phylogenetic quality are introduced in the concatenation. It notably identifies the emergence of artifactual groupings and the disappearance of genuine relationships due to the addition of too many noisy sites. The second one (the concaterpillar analysis) tests the compatibility of individual markers using thorough statistics and evaluates whether multiple separate analyses, rather than a single simultaneous analysis of the data (concatenation), are statistically warranted, taking into account the branch lengths of the individual gene trees.
Although inspired by the study of Ciccarelli et al. discussed before, our present paper is not a reanalysis of the same data. For this reason, it should not be interpreted as a direct rebuttal of the specific topology of Ciccarelli et al. Rather, it seeks to clarify some of the difficulties associated with the concatenation of phylogenetic markers. More precisely, we present an alternative set of automated concatenation methods allowing evaluation of the resolution of the best tree proposed in a concatenation, which could also be applied to many phylogenetic issues other than the supertree of core life genes. Nonetheless, in this case, our conclusion is at odds with several claims of Ciccarelli et al. It suggests that the concatenation approach should only be applied for 22 markers at most (seven-tenths of a percent of an average genome), and then, it yields a comb (polychotomous or star phylogeny) rather than a tree for these 7 major prokaryotic lineages.
| Materials and Methods |
|---|
Constitution of the Data Sets
The data set was constituted as described in Bucknam et al. (2006)
Phylogenetic Analyses
All 31 gene alignments were concatenated, producing an alignment of 5,808 amino acid characters. Additionally, a concatenation of a subset of 22 topologically congruent genes (dnaG, fusA, gcp, infB, ksgA, nusA, nusG, rplA, rplC, rplE, rplF, rplK, rplN, rpoB, rpsB, rpsC, rpsD, rpsG, rpsH, secY, tufA, ychF) was prepared. These concatenated alignments, as well as the remaining 9 individual genes (argS, gltX, hisS, leuS, metG, serS, thrS, trpS, valS), were analyzed under the Whelan and Goldman model, using IQPNNI v.3.0.1 (Vinh le and Von Haeseler 2004) with the default options, with a few exceptions. First, the stopping rule was used, but with the default minimum number of iterations (82 in all cases). Second, variation of rates across sites was modeled by a discretised gamma distribution (4 substitution rate categories) with the shape parameter estimated from the data. Statistical support for the relationships implied in the trees inferred from all alignments was assessed by nonparametric bootstrapping (100 replicates) using IQPNNI with the same options, except that the stopping rule was not used, because it occasionally results in failure to converge on a tree.
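The nonparametric bootstrap mentioned above resamples alignment columns with replacement. The following is a minimal sketch in Python, assuming the alignment is held as a dictionary mapping taxon names to equal-length sequences. The function name `bootstrap_alignment` and the toy data are hypothetical; tree inference on each replicate (done with IQPNNI in this study) is not shown.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns drawn with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    n_sites = lengths.pop()
    # Draw n_sites column indices with replacement; the same draw is
    # applied to every taxon so that each column stays intact.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in cols)
            for taxon, seq in alignment.items()}

# Hypothetical toy alignment (3 taxa, 5 amino acid sites).
toy = {"taxonA": "MKVLA", "taxonB": "MRVLG", "taxonC": "MKILA"}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
```

Each replicate has the same dimensions as the original alignment; support for a relationship is then the fraction of replicate trees in which it appears.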
Progressive Reconstruction Analysis
The progressive reconstruction analysis is inspired by the methodologies of Brinkmann and Philippe (1999) and Brochier and Philippe (2002), which were based on estimates of maximum parsimony to split sites into categories with different degrees of homoplasy. By contrast, our method is based on estimates of maximum likelihood. It was implemented by a script (available from E.B. upon request) that performs the following tasks. 1) The largest alignment for all species is split into several small alignments of the same length for predefined subgroups of species. Here, the predefined groups were the same as described in supplementary material 2 (Supplementary Material online; the Archaea, the Spirochaetes, the Chlamydiales, the actinobacteria, the Proteobacteria, the Cyanobacteria, and the firmicutes). 2) The best PHYML tree for each subgroup of size larger than 4 operational taxonomic units (OTUs) is estimated (but a user-defined tree can be provided for groups with 3 members or fewer). 3) The likelihood of each position under the best PHYML topology is calculated for each subgroup using Tree-puzzle 5.1 (option -wsl). 4) To find the global likelihood of a position, the likelihoods at each position are summed together over all the subgroups. 5) Constant sites are removed, and a set of smaller, nested alignments is produced from the large alignment such that, for instance, alignment 1 contains only sites with calculated global likelihoods between –10 and –20, alignment 2 those sites with global likelihoods between –10 and –30, alignment 3 those sites with global likelihoods between –10 and –40, etc. 6) All the global topologies presenting all possible relationships among these subgroups of OTUs are generated. 7) The exhaustive list of tree topologies is used as input trees in Tree-puzzle 5.1 (option -wsl) for each alignment of a given category. 8) Sitewise likelihoods from Tree-puzzle 5.1 are evaluated by CONSEL (Shimodaira and Hasegawa 2001; Shimodaira 2002), which runs the approximately unbiased (AU) test, and trees that are not rejected at the 5% level are retained.
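Steps 4 and 5 above can be sketched as follows. This is an illustration only, not the authors' script: it assumes that the per-site values are log-likelihoods (one list per subgroup, as could be parsed from sitewise likelihood output), and the function names, the toy data, and the default bounds (taken from the example in the text) are hypothetical.

```python
def global_site_lnl(per_group_lnl):
    """Step 4: sum the per-site values over all subgroups."""
    n_sites = len(per_group_lnl[0])
    if any(len(group) != n_sites for group in per_group_lnl):
        raise ValueError("every subgroup must cover the same sites")
    return [sum(group[i] for group in per_group_lnl) for i in range(n_sites)]

def nested_site_sets(alignment, site_lnl, upper=-10.0,
                     lower_bounds=(-20.0, -30.0, -40.0)):
    """Step 5: drop constant sites, then build nested site categories.

    Returns one list of site indices per lower bound; each category
    contains all sites of the previous one.
    """
    taxa = list(alignment)
    variable = [i for i in range(len(site_lnl))
                if len({alignment[t][i] for t in taxa}) > 1]
    return [[i for i in variable if lower <= site_lnl[i] <= upper]
            for lower in lower_bounds]

# Hypothetical toy data: 3 taxa, 5 sites, 2 subgroups.
toy_alignment = {"a": "MKVLA", "b": "MRVLG", "c": "MKILA"}
per_group = [[-2.0, -6.0, -12.0, -3.0, -20.0],
             [-3.0, -7.0, -10.0, -4.0, -15.0]]
site_lnl = global_site_lnl(per_group)
categories = nested_site_sets(toy_alignment, site_lnl)
```

In this toy case sites 0 and 3 are constant and are dropped, and the three categories grow from one site to three as the lower bound is relaxed.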
The splits of the retained trees are then studied using the statistical analysis package R, and their distribution is represented on a heatmap (script available from E. Susko upon request). More precisely, each cell of the heatmap gives a measure of support for a particular choice of splits and a particular choice of category of sites. Light colors indicate high support, whereas dark colors indicate low support. The measure used is the proportion of topologies, among those in a 95% confidence region for the category of sites under consideration, that had the split present. If a split is well supported, it should appear in a large number of these topologies. For any given heatmap, attention is restricted to splits of the 7 main groups that appeared in at least one of the topologies in a 95% confidence region; a consequence is that the first row of heatmaps in figures 1 and 3 may represent different splits.
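The support measure described above (the proportion of retained topologies that contain a given split) can be illustrated with a short Python sketch. It assumes each retained topology is represented as a set of splits, and each split as a frozenset of the group names on one side of the bipartition; this representation, the function name, and the toy topologies are hypothetical, and the authors' own implementation is an R script.

```python
def split_support(retained_topologies):
    """Map each split to the fraction of retained topologies containing it."""
    if not retained_topologies:
        return {}
    counts = {}
    for splits in retained_topologies:
        for split in splits:
            counts[split] = counts.get(split, 0) + 1
    n = len(retained_topologies)
    return {split: count / n for split, count in counts.items()}

# Hypothetical confidence set of four topologies for one category of sites.
SC = frozenset({"Spirochaetes", "Chlamydiales"})
PS = frozenset({"Proteobacteria", "Spirochaetes"})
PSC = frozenset({"Proteobacteria", "Spirochaetes", "Chlamydiales"})
confidence_set = [{SC, PSC}, {SC}, {PS}, {SC}]
support = split_support(confidence_set)
```

One such dictionary per category of sites gives the columns of a heatmap, with one row per split seen in at least one retained topology.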
Concaterpillar Analysis
Topological congruence between the 31 genes was assessed using Concaterpillar (Leigh et al. 2007
Concaterpillar was also used to determine which of those genes found to be topologically congruent should be combined by concatenation (i.e., which of these genes share compatible branch lengths). A similar hierarchical approach is used for this test, but using a ratio between the likelihood of pairs of genes forced to share the same branch lengths, or each allowed its own set of branch lengths (but forced to share the same topology in both cases). In the branch-length congruence test, the statistical significance of the likelihood ratio is evaluated by the chi-square test.
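The branch-length congruence test described above compares two nested fits with a likelihood ratio evaluated against a chi-square distribution. The sketch below shows the generic form of such a test in Python. It is not Concaterpillar's code: the degrees of freedom are an assumption (here, the number of extra branch lengths freed when a second gene gets its own set on an unrooted tree of 41 taxa, 2 × 41 − 3 = 79), and the log-likelihood values are invented for illustration.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Power series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    p_lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - p_lower)

def branch_length_lrt(lnl_separate, lnl_shared, extra_params):
    """Likelihood-ratio statistic and chi-square p-value for nested models."""
    stat = 2.0 * (lnl_separate - lnl_shared)
    return stat, chi2_sf(stat, extra_params)

# Hypothetical log-likelihoods for a pair of genes sharing one topology.
stat, p_value = branch_length_lrt(lnl_separate=-10000.0,
                                  lnl_shared=-10060.0,
                                  extra_params=2 * 41 - 3)
reject_shared_branch_lengths = p_value < 0.05
```

A small p-value indicates that forcing the two genes to share branch lengths fits significantly worse, so they would not be combined in the same concatenation.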
| Results and Discussion |
|---|
We analyzed the phylogenetic information of 31 core life genes, for 41 species, distributed in 7 prokaryotic groups, with a new phylogenetic method: the progressive reconstruction method (see Materials and Methods). In the case of genuine congruence between markers, this technique could help improve the resolution of the best tree based on careful site selection. It could also assist in interpretation of the support obtained in concatenations. Briefly, we used it to split the unambiguously aligned positions of each individual molecule (as well as their concatenation) into 6 nested categories of sites. The first category comprised the fewest homoplastic sites (of each individual gene or of their concatenation), the second category was slightly longer because it contained in addition some slightly more homoplastic sites, and so on, until the sixth category which corresponded to the whole individual gene or the whole concatenation (with undistinguished homoplastic and good sites). The phylogenetic information of each of these partitions could thus be compared, permitting testing of the hypothesis that the sites of better phylogenetic quality suggest different relationships than the sites of mixed phylogenetic quality and so on, and investigation of how the phylogenetic message of the whole unpartitioned sequence was affected by the presence of sites of poor phylogenetic quality.
The distinction between better and worse sites (see Materials and Methods) was based on the notion that, in the context of a TOL, the terminal monophyletic groups were well accepted but that finding a better resolution of their deeper relationships was the challenge, so that sites that were able to resolve the monophyly of the terminal groups with a better likelihood were likely less homoplastic than sites that already failed to support these accepted terminal groups. We feel that lower categories (as defined above) are thus less likely to be susceptible to processes that cause phenomena like LBA.
When applied to individual genes, this method led to the interesting observation that, for 22 of them, the partition into sites of different qualities had no effect on our ability to identify which true (resolved) relationships these markers were supporting, when asked to elect which were their favorite trees within the exhaustive list of test topologies involving these 7 monophyletic groups. Phylogenetic signal was simply too weak to reject the majority of the test trees, and partitions were either too short or unable to favor some splits over some other groupings (i.e., to identify some splits present in a majority of the nonrejected test trees according to the AU test, 5% level). The typical output for these "weak" phylogenetic markers is represented on a heatmap (fig. 1A). More precisely, each cell of the heatmap gives a measure of support for a particular choice of splits and a particular choice of category of sites. Light colors indicate high support, whereas dark colors indicate low support. The measure used is the proportion of topologies, among those in a 95% confidence region for the category of sites under consideration that had the split present. If a split is well supported it should appear in a large number of these topologies. Clearly, for "weak" genes, no split receives any significant support, regardless of which category of sites is used. By contrast, the 9 remaining markers were deemed "stronger" because different categories of sites displayed different patterns of support for them, suggesting that, perhaps, parts of their true gene history could be defined, once the most homoplastic sites were dismissed, or that some of their apparent support was to be doubted if it came from sites of poor phylogenetic quality.
Interestingly, the progressive reconstruction of the concatenation of the 31 markers showed a very different pattern than the typical heatmap of the individual "weak" genes as some phylogenetic signal emerged from the association of all the markers. Figure 1B presents the heatmap for this concatenation. Blue arrows point to emerging phylogenetic signals that were not supported by the sites of the best categories but received support when more and more noisy sites were being thrown in the analysis. Red arrows indicate vanishing phylogenetic signals that were supported by the better sites but then got obscured and were finally gone in the final concatenation. An interesting example of such a dramatic change in trends can be observed when sites of category 3 and above are considered. The split indicated by a "*," indicating a late emergence of Spirochaetes and Chlamydiales, suddenly loses its support, whereas the grouping of these 2 taxa with the Archaea at the base of the tree (indicated by +) immediately gains it. A detailed analysis shows that the support for such an early branching of these taxa occurs in fact in every single tree but has hardly any support in categories 1 and 2. Interestingly, this split does not receive much support in the individual strong genes (with the exception of the fifth and sixth category for trpS, accounting, however, only for 87 and 90 positions, respectively, in the 4,989 long concatenation), which indicates that the apparent overall strong support for this partition obscures what is actually diversity when viewed in terms of genes.
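The emerging and vanishing signals marked by arrows on the heatmap were identified by inspection. As a rough illustration of the idea, the following Python sketch labels a split from its support values across the nested categories (best sites first, full alignment last). The thresholds, the function name, and the example trajectories are all hypothetical and are not taken from the figures.

```python
def classify_trend(support, low=0.2, high=0.8):
    """Label a split from its support in the first and last site categories."""
    first, last = support[0], support[-1]
    if first <= low and last >= high:
        return "emerging"   # unsupported by the best sites, supported at the end
    if first >= high and last <= low:
        return "vanishing"  # supported by the best sites, lost at the end
    return "other"

# Hypothetical support trajectories over six nested categories.
trajectories = {
    "split_1": [0.05, 0.10, 0.60, 0.90, 1.00, 1.00],
    "split_2": [0.90, 0.85, 0.30, 0.10, 0.00, 0.00],
    "split_3": [0.40, 0.50, 0.45, 0.50, 0.55, 0.50],
}
labels = {name: classify_trend(values) for name, values in trajectories.items()}
```

An "emerging" label corresponds to the pattern expected of an artifactual grouping, because its support comes only from the noisier sites.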
There are thus multiple lessons to take from this heatmap. First, there is an effect of concatenation: some increase in resolution (by contrast to that observed in individual markers) is expected when more positions are added. However, second, the observed resolution resulting from concatenation can be artifactual and sometimes masks phylogenetic signal of better quality. Our concatenation of 31 markers thus appears to be the contingent outcome of the number of sites considered rather than the confirmation of a clear phylogenetic trend. The maximum likelihood tree based on all core genes (fig. 2A) is thus doubtful, and general resolution in concatenation cannot be taken as evidence per se of the feasibility of the prokaryotic part of the TOL, based on multiple markers. Rather it suggests that traditional concatenation methods can yield artifactual results for deep nodes. Thus, for deep-level analyses, the choice of model might be very important, and its efficacy deserves to be critically assessed (for instance, by internal phylogenetic criteria when not much external evidence is available).
We then investigated, using a totally different method, whether the potentially risky concatenation of our 31 markers was statistically acceptable. We used the recently published concaterpillar application (Leigh et al. 2007
In fact, the tree issued from the concatenation of these 22 markers was poorly resolved (fig. 3A), although it showed strong support for the monophyly of the accepted terminal groups. It presented a topology where the branching pattern between major phyla was only partly resolved, which we will call the semi-comb of life. On this semi-comb of life, one deep node was supported: the grouping of (Proteobacteria and [Chlamydiae + Spirochaetes]) or (P,(S,C)). The other groups appear to emerge simultaneously in a basal polytomy, a global result which, at first glance, would suggest that an unescapable erosion of the ancient phylogenetic signal might be the cause of our inability to resolve most of the deepest prokaryotic branches. Hence, one could argue that the observed partial resolution does not speak in favor of the feasibility of the tree by concatenation approaches. We will in fact even argue against its feasibility because the study of the progressive reconstruction of this concatenation allows us to decide how trustworthy (or not) the only resolved relationship of the semi-comb of life is. Figure 3B presents a fascinating result in that regard. This heatmap displays 2 notable features: 1) a vanished relationship, as indicated by the red arrow, which was supported by the best sites only and got lost in the concatenation process, whereas 2) another relationship, indicated by a blue arrow, gained some support only when more and more noisy sites were added, as would be expected for the emergence of an artifactual grouping.
More precisely, the relationship erased when more data were added, as if a possibly true relationship were masked in the process of concatenation, corresponded to a sister grouping between the Spirochaetes and the Proteobacteria, a relationship not observed on the semi-comb of life. Importantly, this relationship appeared due to the concatenation of the best sites of all markers, and its support was not found in any specific marker. It could thus reflect a genuine amplification of common information, rather than the strong influence of a marker over the rest of the data set. Yet, if indeed this sister group relationship favored by the better sites (P,S) were true, the relationship present in the semi-comb of life (P,(S,C)) is deemed to be incorrect. This last claim is further reinforced by the fact that the possible artifactual grouping emerging with the progressive addition of the worse sites was actually the alternative association of the Spirochaetes with the Chlamydiales (S,C), observed in the semi-comb of life. In other words, our analysis suggests that the grouping of (S,C) is wrong, casting convergent doubts about the validity of the only relationship supported in the semi-comb of life. Overall concatenation results are thus not supported in this rather large pool of concatenated genes, the only ones that concaterpillar supported concatenating, and the semi-comb of life would be in fact no more than an artifactual resolution of the deep nodes of the prokaryotic phylogeny. In that case, the common tree produced by the concatenation analysis of multiple genes should not be trusted.
This conclusion seems very sensible to us as the second-level analysis built in concaterpillar suggested that no unique evolutionary model should be applied simultaneously to the 22 compatible markers, which displayed 2 different branch lengths. Instead, concaterpillar recommended that separate analyses should be preferred over a single concatenation. At the 5% level, concaterpillar identified only 4 possible groups of genes that could be investigated collectively under a unique model: (rplE + ychF), (infB + rplN + rpsG), (rplF + rpsH), and (gcp + rplC). Hence, although Ciccarelli et al. insisted that an important feature of their TOL was that the branch lengths derived from the concatenation of identical markers could be directly compared across the entire tree, we would recommend caution when building any interpretation based on the branch lengths of such a concatenation, at least in its prokaryotic part. This, unfortunately, removes one of the TOL's last major heuristic interests.
| Conclusion |
|---|
Dagan and Martin underlined how tedious and potentially unsatisfactory the quest of a Tree of One Per Cent could be for biological inferences. Our analyses suggest that unfortunately even less than that is accessible to positivists. For prokaryotes, it may be only a comb of seven-tenths of a percent that could be obtained if one follows the canonical phylogenetic logic of concatenating congruent core markers. The traditional quest for the TOL, once achieved, is going to be far less informative than it seemed, and we should not trust a tree resulting from a concatenation simply because it appears to be well resolved. In fact, the resulting topology from a concatenation of core genes could very well be the true tree of no gene at all, in the sense that no single marker need have followed this emerging path. Our concatenated tree based on 31 genes is a good example of that: at least 9 individual gene trees were strongly different, yet the concatenation still produced a well-resolved best tree (although with different traces of artifacts when we look at it in more detail). What such a tree does show then is only a central tendency: it does not give access to any essential identity of the taxa or to the precise history of their properties.
This realization should have several major implications, of which we will only mention 2 briefly. First, in contrast to the proposal of Ciccarelli et al., it is probably generally not justified to add DNA fragments of unknown phylogenetic origin (i.e., environmental sequences) to the matrix used to build such an emerging average tree in the hope of identifying species, because the coupling between the individual gene histories and the topology of the concatenation has no reason to be exact. Second, because there is no necessary identity relationship between gene trees and concatenated trees (often incorrectly qualified as "organismal phylogenies"), we might be willing to adjust our phylogenetic practices. We would like to propose that, in the future, for prokaryotes at least, in order to learn about the "origin of species," rather than looking for a concatenated tree made of core genes (a result only apparently impressive), phylogeneticists might be better off first developing new methodologies to investigate which sets of markers present a true evolutionary cohesion. Without having an a priori unique phylogenetic hierarchy in mind, they could study the various modes of inheritance of each of these cohesive sets of genes, their clustering and disruption and, from the bottom up, gain some knowledge on the processes that have made up lineages and sometimes kept them sufficiently stable to be identified by us. Critical progressive reconstruction analyses and the development of more reliable tests of congruence able to identify multiple sets of individual markers, as introduced here, may prove useful in that task and are nowadays certainly more and more necessary in developing a pluralistic approach to phylogenetics (Doolittle and Bapteste 2007).
| Supplementary Material |
|---|
Supplementary materials 1 and 2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
We thank Céline Brochier for critical discussions on the issue of gene concatenation. E.B. was supported by a Canadian Institutes of Health Research grant MOP4467 to W.F.D. J.L. was supported by a Canadian Institutes of Health Research grant MOP-62809 to A. Roger. E.S. was supported by a Discovery Grant awarded by the Natural Sciences and Engineering Research Council of Canada.
| Footnotes |
|---|
1 Present address: UPMC UMR 7138, 7 quai Saint-Bernard, Bâtiment A, 4ème étage, 75005 Paris, France.
2 E.B. and E.S. contributed equally to this article.
William Martin, Associate Editor
| References |
|---|
Baldauf SL. A search for the origins of animals and fungi: comparing and combining molecular data. Am Nat (1999) 154:S178–S188.
Bapteste E, Brinkmann H, Lee JA. (11 co-authors). The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc Natl Acad Sci USA (2002) 99(3):1414–1419.
Brinkmann H, Philippe H. Archaea sister group of bacteria? Indications from tree reconstruction artifacts in ancient phylogenies. Mol Biol Evol (1999) 16(6):817–825.
Brochier C, Forterre P, Gribaldo S. An emerging phylogenetic core of Archaea: phylogenies of transcription and translation machineries converge following addition of new genome sequences. BMC Evol Biol (2005) 5(1):36.
Brochier C, Philippe H. Phylogeny: a non-hyperthermophilic ancestor for bacteria. Nature (2002) 417(6886):244.
Bucknam J, Boucher Y, Bapteste E. Refuting phylogenetic relationships. Biol Direct (2006) 1:26.
Cavalier-Smith T. Rooting the tree of life by transition analyses. Biol Direct (2006) 1(1):19.
Charlebois RL, Doolittle WF. Computing prokaryotic gene ubiquity: rescuing the core from extinction. Genome Res (2004) 14(12):2469–2477.
Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P. Toward automatic reconstruction of a highly resolved tree of life. Science (2006) 311(5765):1283–1287.
Dagan T, Martin W. The tree of one percent. Genome Biol (2006) 7:118.
Doolittle WF, Bapteste E. Pattern pluralism and the Tree of Life hypothesis. Proc Natl Acad Sci USA (2007) 104(7):2043–2049.
Fast NM, Xue L, Bingham S, Keeling PJ. Re-examining alveolate evolution using multiple protein molecular phylogenies. J Eukaryot Microbiol (2002) 49(1):30–37.
Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol (2003) 52(5):696–704.
Keane TM, Creevey CJ, Pentony MM, Naughton TJ, McInerney JO. Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evol Biol (2006) 6:29.
Leigh J, Susko E, Baumgartner M, Roger AJ. Testing congruence in phylogenomic analysis. Syst Biol (2007). Forthcoming.
Lerat E, Daubin V, Moran NA. From gene trees to organismal phylogeny in prokaryotes: the case of the gamma-Proteobacteria. PLoS Biol (2003) 1(1):E19.
Phillips MJ, Delsuc F, Penny D. Genome-scale phylogeny and the detection of systematic biases. Mol Biol Evol (2004) 21(7):1455–1458.
Shimodaira H. An approximately unbiased test of phylogenetic tree selection. Syst Biol (2002) 51(3):492–508.
Shimodaira H, Hasegawa M. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics (2001) 17(12):1246–1247.
Vinh le S, Von Haeseler A. IQPNNI: moving fast through tree space and stopping in time. Mol Biol Evol (2004) 21(8):1565–1571.