MBE Advance Access published online on May 19, 2008
Molecular Biology and Evolution, doi:10.1093/molbev/msn115
Research Article
Signature Genes as a Phylogenomic Tool
1 Center for Molecular and Biomolecular Informatics / Nijmegen Center for Molecular Life Sciences, Radboud University Nijmegen Medical Centre
2 Bioinformatics group, Department Biology and Academic Biomedical Centre, Utrecht University
3 Department of Molecular Evolution, Evolutionary Biology Centre, Uppsala Universitet
4 Corresponding author. Address: Center for Molecular and Biomolecular Informatics / Nijmegen Center for Molecular Life Sciences, Radboud University Nijmegen Medical Centre. Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands. Email: dutilh{at}cmbi.ru.nl. Phone: (0)24-3619797. Fax: (0)24-3619395
Received for publication January 25, 2008. Revision received April 18, 2008. Accepted for publication May 10, 2008.
Gene content has been shown to contain a strong phylogenetic signal, yet its use for phylogenetic questions is hampered by horizontal gene transfer and parallel gene loss, and has until now required completely sequenced genomes. Here, we introduce an approach that allows the phylogenetic signal in gene content to be applied to any set of sequences, using signature genes for phylogenetic classification. The hundreds of publicly available genomes allow us to identify signature genes at various taxonomic depths, and we show how the presence of signature genes in an unspecified sample can be used to characterize its taxonomic composition.
We identify 8,362 signature genes specific for 112 prokaryotic taxa. We show that these signature genes can be used to address phylogenetic questions on the basis of gene content in cases where classic gene content or sequence analyses provide an ambiguous answer, such as for Nanoarchaeum equitans, and even in cases where complete genomes are not available, such as for metagenomics data. Cross-validation experiments leaving out up to 30% of the species show that 92% of the signature genes correctly place the species in a related clade. Analyses of metagenomics data sets with the signature gene approach are in good agreement with the previously reported species distributions based on phylogenetic analysis of marker genes. In summary, signature genes can complement traditional sequence-based methods in addressing taxonomic questions.
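The idea of taxon-specific signature genes can be illustrated with a minimal sketch. The abstract does not state the exact selection criterion, so the rule used here (a gene family present in every genome of a clade and in no genome outside it) is an assumption made purely for illustration, and the function names `find_signature_genes` and `profile_sample`, as well as the toy data, are hypothetical.

```python
from collections import Counter


def find_signature_genes(genomes, clades):
    """Return, per taxon, the gene families found in all member genomes
    and in no genome outside the taxon (illustrative criterion only).

    genomes: dict mapping genome name -> set of gene family identifiers
    clades:  dict mapping taxon name  -> set of member genome names
    """
    signatures = {}
    for taxon, members in clades.items():
        inside = [genomes[g] for g in members]
        outside = [genes for g, genes in genomes.items() if g not in members]
        shared = set.intersection(*inside) if inside else set()
        seen_outside = set().union(*outside)
        signatures[taxon] = shared - seen_outside
    return signatures


def profile_sample(sample_genes, signatures):
    """Count how many signature genes of each taxon occur in a sample."""
    hits = Counter()
    for taxon, sig in signatures.items():
        n = len(sig & sample_genes)
        if n:
            hits[taxon] = n
    return hits


# Toy data (hypothetical): three genomes, two clades.
genomes = {
    "A1": {"g1", "g2", "g5"},
    "A2": {"g1", "g3", "g5"},
    "B1": {"g4", "g5"},
}
clades = {"cladeA": {"A1", "A2"}, "cladeB": {"B1"}}

signatures = find_signature_genes(genomes, clades)
sample_profile = profile_sample({"g1", "g4", "g9"}, signatures)
```

In this toy case `g5` is shared by all genomes and therefore carries no clade-specific signal, while `g1` and `g4` each point to a single clade. A sample containing both would be profiled as a mixture of the two.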
Key Words: signature genes, metagenomics, phylogenomics, gene content, slow-fast