MBE Advance Access originally published online on September 29, 2006
Molecular Biology and Evolution 2007 24(1):6-9; doi:10.1093/molbev/msl137
Letters
Lack of Resolution in the Animal Phylogeny: Closely Spaced Cladogeneses or Undetected Systematic Errors?

* Canadian Institute for Advanced Research and Département de Biochimie, Université de Montréal, Montréal, Québec, Canada
Département des Sciences de la Vie, Université de Liège, Liège, Belgium
E-mail: herve.philippe{at}umontreal.ca.
| Abstract |
|---|
A recent phylogenomic study reported that the animal phylogeny was unresolved despite the use of 50 genes. This lack of resolution was interpreted as "a positive signature of closely spaced cladogenetic events." Here, we propose that this lack of resolution is rather due to the mutual cancellation of the phylogenetic signal (historical) and the nonphylogenetic signal (due to systematic errors) that results from inadequate taxon sampling and/or model of sequence evolution. Starting with a data set of comparable size, we use 3 different strategies to reduce the nonphylogenetic signal: 1) increasing the number of species; 2) replacing a fast-evolving species by a slowly evolving one; and 3) using a better model of sequence evolution. In all cases, the phylogenetic resolution is markedly improved, in agreement with our hypothesis that the originally reported lack of resolution was artifactual.
Key Words: animal evolution; phylogenomics; phylogenetic resolution; systematic error; nonphylogenetic signal
Recently, Rokas, Kruger, and Carroll (2005) (RKC) were unable to resolve most nodes of the phylogenetic tree of animals despite the use of 50 genes (12,060 amino acid positions) from 17 animal species. After dismissing potential alternative explanations (e.g., missing data, long-branch attraction, compositional bias, and taxon sampling), they concluded that "the lack of resolution is a positive signature of closely spaced cladogenetic events." This conclusion is at odds with several similar studies reporting a good resolution for the animal tree (Philippe, Lartillot, et al. 2005; Delsuc et al. 2006; Marletaz et al. 2006; Matus et al. 2006). Although phylogenomic approaches are effective at reducing the stochastic (sampling) error, they have been shown to be sensitive to systematic errors (Delsuc et al. 2005). Systematic errors stem from the inaccuracy of tree reconstruction methods, which, in a probabilistic framework, is directly related to model misspecifications (Felsenstein 2004; Philippe, Zhou, et al. 2005). In opposition to the genuine phylogenetic signal, the signal that results from systematic errors is called nonphylogenetic (Ho and Jermiin 2004; Philippe, Delsuc, et al. 2005). Interestingly, RKC trees have markedly lower statistical support with Parsimony than with maximum likelihood (ML). Because Parsimony is known to be more sensitive to systematic errors than ML (Felsenstein 2004), we propose that the lack of resolution observed by Rokas et al. (2005) is due to the presence of a similar amount of conflicting phylogenetic and nonphylogenetic signal. The nonphylogenetic signal results from the incorrect interpretation, by the reconstruction method, of multiple substitutions occurring at a given position over the phylogeny (e.g., convergent acquisition of an A by 2 unrelated AT-rich species) (Olsen 1987). Consequently, it will be stronger for ancient and fast-evolving lineages. Given the age of the metazoan radiation, any alignment of animal sequences is expected to contain a sizable amount of nonphylogenetic signal. This is especially true for the RKC data set that included several fast-evolving species (e.g., Caenorhabditis and platyhelminths). If our interpretation is correct, using strategies that reduce the nonphylogenetic signal without altering the genuine phylogenetic signal should increase resolution. As abundantly discussed, reduction of the nonphylogenetic signal can be achieved either by improving the detection of multiple substitutions or by minimizing their number. Accordingly, we use 3 complementary approaches to test our hypothesis: 1) increasing the number of species (improved detection; Hendy and Penny 1989; Hillis 1996); 2) replacing a fast-evolving species by a slowly evolving species (minimization; Aguinaldo et al. 1997; Graybeal 1998); and 3) using a better model of sequence evolution (improved detection; Olsen 1987; Whelan and Goldman 2001; Lartillot and Philippe 2004).
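The link between multiple substitutions and fast-evolving or ancient lineages can be illustrated with a minimal sketch. It assumes a deliberately simple equal-rates model with 20 amino acid states (a Poisson-type model, not the WAG or CAT models used in this work), and the function name `expected_p_distance` is hypothetical. Under that assumption, the expected fraction of observably different sites levels off as the true number of substitutions per site grows, so an increasing share of the changes along long branches is hidden from direct observation.

```python
import math

def expected_p_distance(d: float, n_states: int = 20) -> float:
    """Expected fraction of differing sites between two sequences separated by
    d substitutions per site, under an equal-rates n-state model (assumption)."""
    plateau = (n_states - 1) / n_states  # value reached at full saturation
    return plateau * (1.0 - math.exp(-d / plateau))

# Compare the true amount of change with what remains directly observable.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    p = expected_p_distance(d)
    print(f"true d = {d:3.1f}  observed p = {p:.3f}  hidden = {d - p:.3f}")
```

The gap between the two quantities is what a reconstruction method has to infer from the model and from the other sequences in the alignment, which is why denser taxon sampling, slowly evolving representatives, and better models all address the same problem.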
The scarce taxon sampling of the RKC data set prevented its use in this work. Instead, we updated our previous alignments (Delsuc et al. 2006) and assembled a data set of 133 genes (31,089 amino acid positions) from 57 species. To reduce the computational burden, only the 12,942 positions determined for at least 75% of the species were taken, yielding a data set with 12% of missing positions. Representative groups for the 3 clades currently recognized within Bilateria (Halanych 2004) were chosen as follows: Deuterostomia (tunicates and vertebrates), Ecdysozoa (arthropods, nematodes, and tardigrades), and Lophotrochozoa (annelids, mollusks, and platyhelminths), the latter 2 groups forming the Protostomia. The multilayered outgroup was composed of fungi, choanoflagellates, sponges, and cnidarians. Four variants of this data set were then constructed to explore the effects of taxon sampling (see below). Because of the potential limitations of the current heuristic algorithms (Philippe, Lartillot, et al. 2005), trees were inferred using a WAG + Γ4 model (Yang 1993; Whelan and Goldman 2001) with 3 different heuristics (TREEFINDER [Jobb et al. 2004], PHYML [Guindon and Gascuel 2003] and PHYML with SPR moves [Hordijk and Gascuel 2005]) and an exhaustive analysis with constraints (see Supplementary Material online). All heuristics behaved similarly (fig. S1, Supplementary Material online) and were somewhat biased toward the artifactual attraction of nematodes and platyhelminths (see Supplementary Material online). Therefore, only PHYML–SPR bootstrap values are presented, except for the 4 internal nodes within protostomes for which results from the exhaustive analysis are also provided. In the following, the average bootstrap support for these 4 nodes (AB4N) will be used to estimate the level of resolution. As usual, high bootstrap supports are indicative of robustness (data consistency) and not necessarily of accuracy (correct reconstruction of a clade).
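The position filter described above (keeping only positions determined for at least 75% of the species) can be sketched as follows. This is an illustration on a toy alignment, not the procedure actually used to build the data set: the function names, the toy sequences, and the choice of `-`, `?`, and `X` as symbols for undetermined positions are all assumptions.

```python
def filter_by_occupancy(alignment, min_fraction=0.75, missing="-?X"):
    """Keep only columns in which at least `min_fraction` of the sequences
    carry a determined residue (any character not listed in `missing`)."""
    names = list(alignment)
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_cols = lengths.pop()
    threshold = min_fraction * len(names)
    keep = [
        col for col in range(n_cols)
        if sum(alignment[name][col] not in missing for name in names) >= threshold
    ]
    return {name: "".join(alignment[name][col] for col in keep) for name in names}

def missing_fraction(alignment, missing="-?X"):
    """Fraction of undetermined cells over the whole alignment."""
    total = sum(len(seq) for seq in alignment.values())
    if total == 0:
        return 0.0
    return sum(ch in missing for seq in alignment.values() for ch in seq) / total

# Toy alignment (hypothetical data): 4 species, 5 positions.
toy = {"sp1": "MK-LV", "sp2": "MKAL-", "sp3": "M--LV", "sp4": "MR-L?"}
kept = filter_by_occupancy(toy)
print(kept)
print(f"missing after filtering: {missing_fraction(kept):.1%}")
```

A threshold of this kind trades alignment length against completeness: positions are discarded, but the retained matrix contains a bounded proportion of missing cells.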
Trees inferred with our data set using a taxon sampling similar to RKC (23 species, fig. 1A) received marginally better support values than those obtained with the RKC alignment. In particular, we strongly recovered Bilateria and Protostomia. The low resolution reported by RKC for these nodes could be caused by an unfortunate selection of genes, including some of questionable orthology (i.e., cytosolic HSP70 [Martin and Burg 2002] and tubulins [Philippe, Lartillot, et al. 2005]). Alternatively, it could stem from a slightly different species sampling (especially hexactinellid and calcareous poriferans). We will thus focus on protostome relationships for which the resolution of both data sets is similar. Inside Protostomia, the artifactual grouping of the fast-evolving nematode Caenorhabditis with platyhelminths (Philippe, Lartillot, et al. 2005) confirms the presence of a strong nonphylogenetic signal.
To test our hypothesis that the lack of resolution is due to the mutual cancellation of phylogenetic and nonphylogenetic signal, we 1) added 33 animal species to improve the detection of multiple substitutions by breaking long branches (Hendy and Penny 1989
A third way to overcome the nonphylogenetic signal is to use better evolutionary models (Whelan et al. 2001; Felsenstein 2004; Philippe, Delsuc, et al. 2005). Hence, to further test our hypothesis, we used the category (CAT) + Γ model (Lartillot and Philippe 2004) in a Markov chain Monte Carlo framework, as implemented in PhyloBayes (http://www.lirmm.fr/mab/). The CAT model explicitly handles the heterogeneity of the substitution process across positions. In particular, it ensures a better detection of multiple substitutions at positions displaying only 2 or 3 different amino acids (Lartillot et al. 2006). Instead of evaluating the resolution with posterior probabilities, we performed a bootstrap analysis to allow the comparison with the ML approach used so far. Despite the use of Caenorhabditis (fig. 1B and Supplementary Material online), the CAT model yielded better support values than the WAG model (for 23 species, AB4N of 66% vs. 58% and for 56 species, 91% vs. 76%). In addition, a more careful examination (fig. 1B) shows that 3 out of 4 nodes (Lophotrochozoa, Ecdysozoa, and nematodes + tardigrades) received 100% bootstrap support, whereas the last node corresponding to the position of platyhelminths within Lophotrochozoa received relatively low bootstrap support (65%). This raises an interesting point concerning the connection between phylogenetic/nonphylogenetic signal and the observed resolution. As proposed by our hypothesis, the lack of resolution can be due to the mutual cancellation of both signals, even when the genuine phylogenetic signal is strong (e.g., Ecdysozoa). In contrast, the apparent resolution of the basal position of platyhelminths within Lophotrochozoa is likely caused by a strong nonphylogenetic signal (not surprising given their long branch). When the CAT model is used, the support for this artifactual position seriously decreases, which suggests that the genuine phylogenetic signal for positioning platyhelminths is weak. Interestingly, the CAT model is less sensitive to taxon sampling than the WAG model and is therefore a valuable step toward the elaboration of an optimal model that would have the desirable property of drawing inferences insensitive to taxon sampling.
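The AB4N summary statistic is simply the arithmetic mean of the bootstrap supports of the 4 internal protostome nodes. As a worked check, the short sketch below averages the 4 node supports quoted above for the CAT analysis; the function name `ab4n` and the dictionary labels are illustrative, and it is an assumption here that these 4 values are the ones underlying the reported 56-species figure. Their mean, (100 + 100 + 100 + 65) / 4 = 91.25%, is consistent with the reported AB4N of 91%.

```python
from statistics import mean

def ab4n(supports):
    """Average bootstrap support (in %) over the 4 internal protostome nodes."""
    if len(supports) != 4:
        raise ValueError("AB4N is defined over exactly 4 nodes")
    return mean(supports.values())

# Node supports quoted in the text for the CAT model (labels are illustrative).
cat_supports = {
    "Lophotrochozoa": 100,
    "Ecdysozoa": 100,
    "nematodes + tardigrades": 100,
    "platyhelminths within Lophotrochozoa": 65,
}
print(f"AB4N = {ab4n(cat_supports):.2f}%")
```

Because a single weakly supported node lowers the average noticeably, AB4N is best read together with the individual node supports, as done in the text.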
In conclusion, the results from 3 independent approaches have confirmed our hypothesis that the lack of resolution in the animal phylogeny observed by RKC and ourselves was due to a strong nonphylogenetic signal and does not constitute "a positive signature of closely spaced cladogenetic events." Our results are congruent with many previous studies underlining the prime importance of taxon sampling, either empirically or through simulation (e.g., Wheeler 1992; Lecointre et al. 1993; Adachi and Hasegawa 1995; Hillis 1996; Graybeal 1998; Hedtke et al. 2006). Although the emphasis is often put on the accuracy of phylogenetic inference, the present work demonstrates that its resolving power can also be drastically affected by taxon sampling. More generally, our opinion is that it is no longer worthwhile to argue on the relative benefits of gene versus taxon sampling (Graybeal 1998; Rosenberg and Kumar 2001; Rokas and Carroll 2005; Hedtke et al. 2006) but that progress in sequencing technology will lead to data sets rich in both genes and taxa (Philippe and Telford 2006). Instead, phylogeneticists should put most of their efforts into developing better models of sequence evolution (Lanave et al. 1984; Galtier and Gouy 1995; Tuffley and Steel 1998; Whelan and Goldman 2001; Robinson et al. 2003; Blanquart and Lartillot 2006), which improve both accuracy and resolution, as shown here with the CAT model.
| Acknowledgements |
|---|
This work was supported by operating funds from Genome Québec and Natural Sciences and Engineering Research Council. H.P. is a member of the Program in Evolutionary Biology of the Canadian Institute for Advanced Research, whom we thank for interaction support, and is grateful to the Canada Research Chairs Program and the Canadian Foundation for Innovation for salary and equipment support. D.B. is a postdoctoral researcher of the Fonds National de la Recherche Scientifique (FNRS) at the University of Liège (Belgium) and is gratefully indebted to the FNRS for the financial support of his stay at the University of Montréal. The authors want to thank Nicolas Lartillot, Didier Casane, Béatrice Roure, and Naiara Rodríguez-Ezpeleta for their comments on a previous version of the manuscript, as well as 2 anonymous reviewers for useful suggestions. Nicolas Lartillot is also gratefully acknowledged for his help with the PhyloBayes analyses.
| Footnotes |
|---|
Scott Edwards, Associate Editor
| References |
|---|
Adachi J and Hasegawa M. (1995) Phylogeny of whales: dependence of the inference on species sampling. Mol Biol Evol 12:177–179.
Aguinaldo AM, Turbeville JM, Linford LS, Rivera MC, Garey JR, Raff RA, Lake JA. (1997) Evidence for a clade of nematodes, arthropods and other moulting animals. Nature 387:489–493.
Blanquart S and Lartillot N. (2006) A bayesian compound stochastic process for modelling non-stationary and non-homogeneous sequence evolution. Mol Biol Evol 23:2058–2071.
Delsuc F, Brinkmann H, Chourrout D, Philippe H. (2006) Tunicates and not cephalochordates are the closest living relatives of vertebrates. Nature 439:965–968.
Delsuc F, Brinkmann H, Philippe H. (2005) Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet 6:361–375.
Felsenstein J. (2004) Inferring phylogenies (Sinauer Associates, Inc, Sunderland (MA)).
Galtier N and Gouy M. (1995) Inferring phylogenies from DNA sequences of unequal base compositions. Proc Natl Acad Sci USA 92:11317–11321.
Goldstein B and Blaxter M. (2002) Tardigrades. Curr Biol 12:R475.
Graybeal A. (1998) Is it better to add taxa or characters to a difficult phylogenetic problem? Syst Biol 47:9–17.
Guindon S and Gascuel O. (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52:696–704.
Halanych KM. (2004) The new view of animal phylogeny. Annu Rev Ecol Evol Syst 35:229–256.
Hedtke SM, Townsend TM, Hillis DM. (2006) Resolution of phylogenetic conflict in large data sets by increased taxon sampling. Syst Biol 55:522–529.
Hendy MD and Penny D. (1989) A framework for the quantitative study of evolutionary trees. Syst Zool 38:297–309.
Hillis DM. (1996) Inferring complex phylogenies. Nature 383:130–131.
Ho SY and Jermiin L. (2004) Tracing the decay of the historical signal in biological sequence data. Syst Biol 53:623–637.
Hordijk W and Gascuel O. (2005) Improving the efficiency of SPR moves in phylogenetic tree search methods based on maximum likelihood. Bioinformatics 21:4338–4347.
Jobb G, von Haeseler A, Strimmer K. (2004) TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evol Biol 4:18.
Lanave C, Preparata G, Saccone C, Serio G. (1984) A new method for calculating evolutionary substitution rates. J Mol Evol 20:86–93.
Lartillot N, Brinkmann H, Philippe H. Forthcoming. (2006) Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol Biol.
Lartillot N and Philippe H. (2004) A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol 21:1095–1109.
Lecointre G, Philippe H, Van Le HL, Le Guyader H. (1993) Species sampling has a major impact on phylogenetic inference. Mol Phylogenet Evol 2:205–224.
Marletaz F, Martin E, Perez Y, et al. (12 co-authors). (2006) Chaetognath phylogenomics: a protostome with deuterostome-like development. Curr Biol 16:R577–R578.
Martin AP and Burg TM. (2002) Perils of paralogy: using HSP70 genes for inferring organismal phylogenies. Syst Biol 51:570–587.
Matus DQ, Copley RR, Dunn CW, Hejnol A, Eccleston H, Halanych KM, Martindale MQ, Telford MJ. (2006) Broad taxon and gene sampling indicate that chaetognaths are protostomes. Curr Biol 16:R575–R576.
Olsen G. (1987) Earliest phylogenetic branching: comparing rRNA-based evolutionary trees inferred with various techniques. Cold Spring Harb Symp Quant Biol LII:825–837.
Philippe H, Delsuc F, Brinkmann H, Lartillot N. (2005) Phylogenomics. Annu Rev Ecol Evol Syst 36:541–562.
Philippe H, Lartillot N, Brinkmann H. (2005) Multigene analyses of bilaterian animals corroborate the monophyly of ecdysozoa, lophotrochozoa, and protostomia. Mol Biol Evol 22:1246–1253.
Philippe H and Telford MJ. (2006) Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol 21:614–620.
Philippe H, Zhou Y, Brinkmann H, Rodrigue N, Delsuc F. (2005) Heterotachy and long-branch attraction in phylogenetics. BMC Evol Biol 5:50.
Robinson DM, Jones DT, Kishino H, Goldman N, Thorne JL. (2003) Protein evolution with dependence among codons due to tertiary structure. Mol Biol Evol 28:28.
Rokas A and Carroll SB. (2005) More genes or more taxa? The relative contribution of gene number and taxon number to phylogenetic accuracy. Mol Biol Evol 22:1337–1344.
Rokas A, Kruger D, Carroll SB. (2005) Animal evolution and the molecular signature of radiations compressed in time. Science 310:1933–1938.
Rosenberg MS and Kumar S. (2001) Incomplete taxon sampling is not a problem for phylogenetic inference. Proc Natl Acad Sci USA 98:10751–10756.
Tuffley C and Steel M. (1998) Modeling the covarion hypothesis of nucleotide substitution. Math Biosci 147:63–91.
Wheeler WC. (1992) Extinction, sampling, and molecular phylogenetics. In Novacek MJ and Wheeler QD (Eds.). Extinction and phylogeny (Columbia University Press, New York) pp. 205–215.
Whelan S and Goldman N. (2001) A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol 18:691–699.
Whelan S, Lio P, Goldman N. (2001) Molecular phylogenetics: state-of-the-art methods for looking into the past. Trends Genet 17:262–272.
Yang Z. (1993) Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol Biol Evol 10:1396–1401.