

MBE Advance Access originally published online on October 5, 2007
Molecular Biology and Evolution 2007 24(12):2594-2597; doi:10.1093/molbev/msm218

Published by Oxford University Press 2007.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Letter

Analysis of Rare Amino Acid Replacements Supports the Coelomata Clade

Igor B. Rogozin, Yuri I. Wolf, Liran Carmel and Eugene V. Koonin

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland

E-mail: koonin{at}ncbi.nlm.nih.gov.


    Abstract
 
The recent analysis of a novel class of rare genomic changes, RGC_CAMs (after conserved amino acids–multiple substitutions), supported the Coelomata clade of animals as opposed to the Ecdysozoa clade (Rogozin et al. 2007). A subsequent reanalysis, with the sequences from the sea anemone Nematostella vectensis included in the set of outgroup species, suggested that this result was an artifact caused by reverse amino acid replacements and claimed support for Ecdysozoa (Irimia et al. 2007). We show that the internal branch connecting the sea anemone to the bilaterian animals is extremely short, resulting in weak statistical support for the Coelomata clade. Direct estimation of the level of homoplasy, combined with taxon sampling with different sets of outgroup species, reinforces the support for Coelomata, whereas the effect of reversals is shown to be relatively minor.

Key Words: phylogenetic analysis • cladistics • rare genomic changes • coelomata • ecdysozoa

As the set of sequenced genomes from diverse taxa rapidly grows, phylogenetic analysis is entering a new era when the reconstruction of the evolutionary history of organisms on the basis of full-scale comparison of their genomes becomes the strategy of choice. In addition to more traditional, genome-wide analysis of alignments, rare genomic changes (RGCs) that are likely to comprise derived shared characters of individual clades are increasingly used in genome-wide phylogenetic studies (Rokas and Holland 2000; Nei and Kumar 2001; Rokas et al. 2003).

We have recently proposed a new type of RGCs designated RGC_CAMs (after conserved amino acids–multiple substitutions), which are inferred using a genome-scale analysis of protein and underlying nucleotide sequence alignments (Rogozin et al. 2007). The RGC_CAM approach utilizes amino acid residues that are conserved in the major lineages within an analyzed taxonomic division (e.g., eukaryotes), with the exception of a few species comprising a putative clade. In addition, to reduce the effect of homoplasy, only those amino acid replacements that require 2 or 3 nucleotide substitutions are employed for phylogenetic inference. The RGC_CAM analysis has been combined with a procedure for rigorous statistical testing of competing phylogenetic hypotheses and shown to be robust to branch-length differences and taxon sampling. When applied to animal phylogeny, the RGC_CAM approach significantly supports the coelomate clade that unites chordates with arthropods as opposed to the ecdysozoan (molting animals) clade that encompasses arthropods and nematodes (Rogozin et al. 2007). This conclusion is compatible with some previous genome-wide phylogenetic analyses (Mushegian et al. 1998; Blair et al. 2002; Stuart and Berry 2004; Wolf et al. 2004; Philip et al. 2005) but not others (Copley et al. 2004; Dopazo and Dopazo 2005; Philippe et al. 2005) and runs against the view of animal evolution that is currently prevailing in the evolutionary developmental biology (evo-devo) community (Aguinaldo et al. 1997; Adoutte et al. 2000; Telford and Copley 2005).
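
The requirement that a replacement need 2 or 3 nucleotide substitutions can be illustrated with a minimal sketch. It is a simplification: it takes the minimum codon distance under the standard genetic code, whereas the analysis described above works from the underlying nucleotide sequence alignments. The function names are hypothetical and are not taken from the original study.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa):
    """Return all codons that encode the given one-letter amino acid."""
    return [codon for codon, residue in CODON_TABLE.items() if residue == aa]


def min_nt_substitutions(aa1, aa2):
    """Smallest number of nucleotide differences between any codon pair (0 to 3)."""
    c1, c2 = codons_for(aa1), codons_for(aa2)
    if not c1 or not c2:
        raise ValueError("unknown amino acid code")
    return min(sum(x != y for x, y in zip(a, b)) for a in c1 for b in c2)


def is_multiple_substitution(aa1, aa2):
    """Hypothetical filter: keep only replacements needing at least 2 substitutions."""
    return min_nt_substitutions(aa1, aa2) >= 2
```

Under this sketch a Phe/Leu difference would be discarded because a single nucleotide change suffices, whereas a Met/Trp difference would be retained.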

Irimia et al. (2007) have further explored the RGC_CAM approach, after adding proteins from 2 recently sequenced animal genomes, the cnidarian (sea anemone) Nematostella vectensis and the nematode Brugia malayi, to the original data set of Rogozin et al. (2007). The analysis of the resulting alignments has suggested that the apparent support for the coelomate clade resulted from the rapid rate of evolution in the nematodes (Irimia et al. 2007). There are 2 types of errors that have the potential to distort the results obtained with the RGC_CAM approach, namely, reversals and parallel changes (fig. 1). Irimia et al. (2007) emphasize the effect of reversals but, effectively, ignore parallel changes; furthermore, they do not report any rigorous statistical analysis of the results.


 
FIG. 1. The animal phylogeny employed in this study. The node connecting Deuterostomes, nematodes, and insects is shown as a trifurcation. Branch lengths were calculated in RGC_CAM units (Rogozin et al. 2007), and the respective value is given above each branch. Reversals are shown in red, and parallel changes are shown in blue. Am, Apis mellifera; Ag, Anopheles gambiae; At, Arabidopsis thaliana; Bm, Brugia malayi; Cb, Caenorhabditis briggsae; Ce, Caenorhabditis elegans; Dm, Drosophila melanogaster; Hs, Homo sapiens; Mb, Monosiga brevicollis; Mm, Mus musculus; Nv, Nematostella vectensis; Pf, Plasmodium falciparum; Sc, Saccharomyces cerevisiae; Sp, Schizosaccharomyces pombe; St, Strongylocentrotus purpuratus.

 
Here we report a reanalysis of animal evolution with the RGC_CAM method, with special attention to the sources of potential artifacts, using a further amended data set. The adopted animal phylogeny is shown in figure 1, and the results of the RGC_CAM analysis of the set of 15 species are shown in table 1 (top row). Only one RGC_CAM supported the coelomate clade, and 2 RGC_CAMs supported the ecdysozoan clade (table 1). Thus, considering the lengths of the respective branches, the coelomate clade still had weak statistical support (table 1; see Methods for the details of the statistical test) under the assumption of the basal position of N. vectensis (the branch separating N. vectensis from the rest of the Bilateria is only 3 RGC_CAMs long [fig. 1], with no reversals). We further explored the support for different topologies provided by RGC_CAMs by performing taxon sampling of the outgroup species. All combinations of 10–15 species, that is, including from 1 to 6 outgroup species (63 combinations altogether), were analyzed. Of the 63 combinations, in 29 combinations of species, the raw number of RGC_CAMs compatible with the coelomate topology was greater than the number of RGC_CAMs compatible with the ecdysozoa topology, whereas the reverse was true of 32 combinations, with the remaining 2 combinations showing the same number of RGC_CAMs for both topologies (table 1). Considering the respective branch lengths, for 57 (91%) combinations of species, there was statistical support for the coelomate clade (table 1), whereas with the rest of the combinations (9%), none of the topologies received statistical support. Thus, the results of this extensive RGC_CAM analysis indicate support for the Coelomata topology but no significant support for the Ecdysozoa. As indicated by the results in table 1, the test loses most of its power when N. vectensis is included in the outgroup species set due to the very short branch connecting this species to the rest of the animals. The problem could be caused by compressed cladogenesis at the base of the animal clade (Rokas et al. 2005; Rokas and Carroll 2006) although an alternative explanation, such as a whole-genome duplication with subsequent differential loss of paralogs, cannot be ruled out (Rogozin et al. 2007).
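
The count of 63 species sets follows from taking every nonempty subset of the 6 outgroup species, which a short enumeration can reproduce. The split into 9 ingroup and 6 outgroup species is an inference from the text and the figure 1 abbreviations, not a list given explicitly in the paper, and the function name is hypothetical.

```python
from itertools import combinations

# Assumed split of the 15 species (abbreviations as in figure 1).
INGROUP = ["Hs", "Mm", "St", "Dm", "Ag", "Am", "Ce", "Cb", "Bm"]
OUTGROUPS = ["At", "Pf", "Sc", "Sp", "Mb", "Nv"]


def outgroup_samples(ingroup=INGROUP, outgroups=OUTGROUPS):
    """Yield every species set made of the full ingroup plus 1..6 outgroup species."""
    for k in range(1, len(outgroups) + 1):
        for subset in combinations(outgroups, k):
            yield list(ingroup) + list(subset)


samples = list(outgroup_samples())
with_nv = [s for s in samples if "Nv" in s]
print(len(samples), min(map(len, samples)), max(map(len, samples)), len(with_nv))
```

Each species set would then be passed through the RGC_CAM counting and the statistical test; those steps are not shown here.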



 
Table 1 RGC_CAM Analysis of the Coelomata–Ecdysozoa Problem with Sampling of the Outgroup Species

 
There are 2 types of evolutionary events that have the potential to produce artifacts in the RGC_CAM analysis, namely, parallel changes and reversals (fig. 1) (Irimia et al. 2007; Rogozin et al. 2007). Parallel changes are taken into account in the statistical test that was applied as part of the original RGC_CAM analysis (Rogozin et al. 2007) (table 1). However, reversals might present a substantial problem for the RGC_CAM method (Irimia et al. 2007). The RGC_CAM approach provides for the possibility to estimate the level of homoplasy directly. To obtain an estimate of the number of reversals, we employed the scheme shown in figure 2. We required the same amino acid to be shared by a pair of closely related nematodes (the 2 Caenorhabditis species) and outgroup species but not the rest of the animals (fig. 2). A reversal is the most parsimonious explanation for this pattern, assuming that the tree topology in the node leading to Deuterostomes, insects, and worms is a true trifurcation, and such reversals were invoked by Irimia et al. (2007) to explain the observed RGC_CAM support for the coelomate clade. If the tree topology in the node leading to Deuterostomes, insects, and worms is not a true trifurcation, 2 parallel changes, one in the internal branch leading to the coelomate clade and the other one in the B. malayi branch, also might explain the observed pattern. Thus, the obtained estimates give the upper bound of the number of reversals. The branches leading to the 2 Caenorhabditis species and to the 3 nematodes both comprise 63 RGC_CAMs (fig. 1). Thanks to this coincidence, the homoplasy level that is determined here can be directly compared with the results of the RGC_CAM analysis of the Coelomata–Ecdysozoa problem (figs. 1 and 2). For 50 of the 63 species sets obtained by sampling (see above), the number of RGC_CAMs supporting the coelomate topology is greater than the number of reversals (table 1). This excess is sufficient to reject the hypothesis that the RGC_CAMs supporting the coelomate topology are reversals with a high statistical significance (Student's t-test, P = 4 × 10⁻⁷). Thus, the support for the coelomate clade obtained using the RGC_CAM method is not explained solely by reversals.
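
The pattern used to count candidate reversals can be expressed as a check on a single alignment column. This is only a sketch under stated assumptions: the column is a hypothetical mapping from species abbreviation to residue, the remaining animals are assumed to share one residue Y, and the requirement of 2 or 3 nucleotide substitutions is not applied here.

```python
CAENORHABDITIS = ("Ce", "Cb")
OUTGROUPS = ("At", "Pf", "Sc", "Sp", "Mb", "Nv")  # assumed outgroup set


def is_reversal_candidate(column):
    """True if Ce, Cb and the sampled outgroups share residue X while all
    other animals in the column share a different residue Y."""
    sampled_out = [sp for sp in OUTGROUPS if sp in column]
    if not sampled_out or any(sp not in column for sp in CAENORHABDITIS):
        return False
    shared = {column[sp] for sp in CAENORHABDITIS} | {column[sp] for sp in sampled_out}
    if len(shared) != 1:
        return False  # Caenorhabditis and the outgroups disagree
    x = next(iter(shared))
    others = {aa for sp, aa in column.items()
              if sp not in CAENORHABDITIS and sp not in OUTGROUPS}
    return len(others) == 1 and x not in others


# Hypothetical column, not taken from the real alignments.
example = {"Ce": "K", "Cb": "K", "Sc": "K", "At": "K",
           "Hs": "W", "Mm": "W", "Dm": "W", "Bm": "W"}
print(is_reversal_candidate(example))
```

Because 2 parallel changes can produce the same column, a count obtained this way is an upper bound on the number of reversals, as noted above.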


 
FIG. 2. Direct determination of the number of reversals. X and Y denote 2 amino acids found in a particular position. The reversals are shown in red. The tree is the same as in figure 1, but the species names are omitted for simplicity.

 
In summary, the results of RGC_CAM analysis reported here reinforce the support for the Coelomata clade observed with this approach in the original study (Rogozin et al. 2007) and additionally emphasize the importance of the analysis of multiple outgroups for obtaining reliable results in the study of deep phylogenies. Of course, the definitive solution to the coelomate–ecdysozoa conundrum will require a much larger set of complete genome sequences representing diverse animal taxa.


    Methods
 
Each of the 694 protein alignments constructed from selected eukaryotic orthologous groups (KOGs) (Koonin et al. 2004) analyzed here included orthologous genes from 10 eukaryotic species with completely sequenced genomes: Homo sapiens, Caenorhabditis elegans, Drosophila melanogaster, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Arabidopsis thaliana, Anopheles gambiae, Plasmodium falciparum, Caenorhabditis briggsae, and Mus musculus (Rogozin et al. 2007). Amino acid sequence alignments are available at ftp://ftp.ncbi.nlm.nih.gov/pub/koonin/RGC_CAM/. To these KOGs, probable orthologs from 5 other animal genomes, namely, those of N. vectensis, B. malayi, Apis mellifera, Strongylocentrotus purpuratus, and Monosiga brevicollis, were added using the COGNITOR method (Tatusov et al. 1997). Briefly, all the protein sequences from the new genomes are compared with the protein sequences previously included in the KOGs; a protein is assigned to a KOG when 2 genome-specific best hits to members of the given KOG are detected. To minimize misalignment problems, only conserved, unambiguously aligned regions of the alignments constructed using the MUSCLE program (Edgar 2004) were included in the further analysis. Specifically, all positions containing a deletion or insertion in at least one sequence were removed from the protein sequence alignment together with 5 adjacent positions (Rogozin et al. 2007).
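
The gap-filtering step can be sketched as follows. The text does not say whether the 5 adjacent positions are counted on each side of a gapped column; the sketch assumes each side and exposes this as a parameter. The function names and the toy alignment are hypothetical.

```python
def conserved_columns(alignment, flank=5, gap="-"):
    """Return indices of columns kept after dropping every gapped column
    together with `flank` neighbouring columns on each side (assumption)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    drop = set()
    for i in range(length):
        if any(seq[i] == gap for seq in alignment):
            drop.update(range(max(0, i - flank), min(length, i + flank + 1)))
    return [i for i in range(length) if i not in drop]


def filter_alignment(alignment, flank=5, gap="-"):
    """Apply the column filter to every sequence of the alignment."""
    keep = conserved_columns(alignment, flank, gap)
    return ["".join(seq[i] for i in keep) for seq in alignment]


# Toy alignment with a single gap at column 8 (0-based).
toy = ["ABCDEFGHIJKLMNOP",
       "ABCDEFGH-JKLMNOP"]
print(filter_alignment(toy))
```

With `flank=5`, one gapped column removes 11 columns in total, so the filter is deliberately conservative.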

The statistical test of phylogenetic hypotheses is based on a null model under which, in a comparison of 2 alternative hypotheses, for example, ([X–Y],Z) versus ([X–Z],Y), the number of RGC_CAMs that are shared by 2 lineages due to chance (N_XY and N_XZ) is proportional to the length of the branch the position of which differs between the 2 hypotheses, that is, Y and Z, respectively. The significance of the difference between normalized numbers of RGC_CAMs was estimated using Fisher’s exact test (Rogozin et al. 2007).
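
One possible reading of this test is a 2 × 2 table whose rows are the two hypotheses and whose columns are the RGC_CAM count and the corresponding branch length in RGC_CAM units. The sketch below implements a two-sided Fisher's exact test on such a table. The table layout is an assumption, the numbers in the usage line are purely illustrative, and the exact construction used in Rogozin et al. (2007) may differ.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]:
    sum the probabilities of all tables (fixed margins) that are no more
    likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def compare_hypotheses(n_xy, n_xz, len_y, len_z):
    """Assumed layout: under the null model N_XY : N_XZ follows L_Y : L_Z."""
    return fisher_exact_two_sided(n_xy, len_y, n_xz, len_z)


# Illustrative values only, not data from table 1.
print(compare_hypotheses(n_xy=12, n_xz=2, len_y=20, len_z=60))
```

A small P value would indicate that the counts depart from the ratio of the branch lengths, that is, one topology has more RGC_CAMs than chance sharing would predict.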


    Acknowledgements
 
We thank Scott Roy for providing his manuscript prior to publication and Miklos Csuros for helpful discussions. The sequence data for Monosiga brevicollis were produced by the US Department of Energy Joint Genome Institute (http://www.jgi.doe.gov/). The B. malayi sequencing effort (http://www.tigr.org) is part of the International Brugia Genome Sequencing Project and is supported by an award from the National Institute of Allergy and Infectious Diseases, National Institutes of Health. This work was supported in part by the Intramural Research Program of the National Library of Medicine at National Institutes of Health/Department of Health and Human Services.


    Footnotes
 
Lauren McIntyre, Associate Editor


    References
 

    Adoutte A, Balavoine G, Lartillot N, Lespinet O, Prud'homme B, de Rosa R. The new animal phylogeny: reliability and implications. Proc Natl Acad Sci USA (2000) 97:4453–4456.

    Aguinaldo AM, Turbeville JM, Linford LS, Rivera MC, Garey JR, Raff RA, Lake JA. Evidence for a clade of nematodes, arthropods and other moulting animals. Nature (1997) 387:489–493.

    Blair JE, Ikeo K, Gojobori T, Hedges SB. The evolutionary position of nematodes. BMC Evol Biol (2002) 2:7.

    Copley RR, Aloy P, Russell RB, Telford MJ. Systematic searches for molecular synapomorphies in model metazoan genomes give some support for Ecdysozoa after accounting for the idiosyncrasies of Caenorhabditis elegans. Evol Dev (2004) 6:164–169.

    Dopazo H, Dopazo J. Genome-scale evidence of the nematode-arthropod clade. Genome Biol (2005) 6:R41.

    Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res (2004) 32:1792–1797.

    Irimia M, Maeso I, Penny D, Garcia-Fernandez J, Roy SW. Rare coding sequence changes are consistent with Ecdysozoa, not Coelomata. Mol Biol Evol (2007) 24:1604–1607.

    Koonin EV, Fedorova ND, Jackson JD, et al. (18 co-authors). A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biol (2004) 5:R7.

    Mushegian AR, Garey JR, Martin J, Liu LX. Large-scale taxonomic profiling of eukaryotic model organisms: a comparison of orthologous proteins encoded by the human, fly, nematode, and yeast genomes. Genome Res (1998) 8:590–598.

    Nei M, Kumar S. Molecular evolution and phylogenetics (2001) Oxford: Oxford University.

    Philip GK, Creevey CJ, McInerney JO. The Opisthokonta and the Ecdysozoa may not be clades: stronger support for the grouping of plant and animal than for animal and fungi and stronger support for the Coelomata than Ecdysozoa. Mol Biol Evol (2005) 22:1175–1184.

    Philippe H, Lartillot N, Brinkmann H. Multigene analyses of bilaterian animals corroborate the monophyly of Ecdysozoa, Lophotrochozoa and Protostomia. Mol Biol Evol (2005) 22:1246–1253.

    Rogozin IB, Wolf YI, Carmel L, Koonin EV. Ecdysozoan clade rejected by genome-wide analysis of rare amino acid replacements. Mol Biol Evol (2007) 24:1080–1090.

    Rokas A, Carroll SB. Bushes in the tree of life. PLoS Biol (2006) 4:e352.

    Rokas A, Holland PW. Rare genomic changes as a tool for phylogenetics. Trends Ecol Evol (2000) 15:454–459.

    Rokas A, King N, Finnerty J, Carroll SB. Conflicting phylogenetic signals at the base of the metazoan tree. Evol Dev (2003) 5:346–359.

    Rokas A, Kruger D, Carroll SB. Animal evolution and the molecular signature of radiations compressed in time. Science (2005) 310:1933–1938.

    Stuart GW, Berry MW. An SVD-based comparison of nine whole eukaryotic genomes supports a coelomate rather than ecdysozoan lineage. BMC Bioinformatics (2004) 5:204.

    Tatusov RL, Koonin EV, Lipman DJ. A genomic perspective on protein families. Science (1997) 278:631–637.

    Telford MJ, Copley RR. Animal phylogeny: fatal attraction. Curr Biol (2005) 15:R296–R299.

    Wolf YI, Rogozin IB, Koonin EV. Coelomata and not Ecdysozoa: evidence from genome-wide phylogenetic analysis. Genome Res (2004) 14:29–36.

Accepted for publication September 27, 2007.





