


MBE Advance Access published online on May 30, 2003

Molecular Biology and Evolution, doi:10.1093/molbev/msg115
Molecular Biology and Evolution © Society for Molecular Biology and Evolution 2003; all rights reserved
All versions of this article: 20/7/1036 (most recent); msg115v1

Accepted January 29, 2003
© 2003 Society for Molecular Biology and Evolution

Original Articles

Obtaining Maximal Concatenated Phylogenetic Data Sets from Large Sequence Databases

Michael J. Sanderson 1*, Amy C. Driskell 1, Richard H. Ree 1, Oliver Eulenstein 2, Sasha Langley 1

1 Section of Evolution and Ecology, University of California, Davis, California, 95616 USA
2 Department of Computer Science, Iowa State University, Ames, IA 50011, USA

* To whom correspondence should be addressed. E-mail: mjsanderson{at}ucdavis.edu.


Abstract

To improve the accuracy of tree reconstruction, phylogeneticists are extracting increasingly large multi-gene data sets from sequence databases. Determining whether a database contains at least k genes sampled from at least m species is an NP-complete problem. However, the skewed distribution of sequences in these databases permits all such data sets to be obtained in reasonable computing times even for large numbers of sequences. We developed an exact algorithm for obtaining the largest multi-gene data sets from a collection of sequences. The algorithm was then tested on a set of 100,000 protein sequences of green plants and used to identify the largest multi-gene ortholog data sets having at least 3 genes and 6 species. The distribution of sizes of these data sets forms a hollow curve, and the largest are surprisingly small, ranging from 62 genes by 6 species, to 3 genes by 65 species, with more symmetrical data sets of around 15 taxa by 15 genes. These upper bounds to sequence concatenation have important implications for building the tree of life from large sequence databases.

Key Words: biclique, NP-complete, sequence concatenation, phylogeny, optimization
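
The biclique formulation named in the key words can be illustrated with a small sketch. Treat species and genes as the two sides of a bipartite graph, with an edge wherever a gene has been sequenced for a species; a complete concatenated data set is then a biclique, and the largest ones are the maximal bicliques meeting the thresholds of at least k genes and at least m species. The code below is a brute-force illustration on toy data, not the exact algorithm developed in the article: the function name `maximal_bicliques`, its parameters, and the species and gene labels are all hypothetical, and the enumeration is exponential in the number of genes, which is consistent with the NP-completeness noted in the abstract.

```python
from itertools import combinations


def maximal_bicliques(presence, min_genes, min_species):
    """Enumerate maximal species-by-gene bicliques by brute force.

    presence: dict mapping a species name to the set of genes sampled for it.
    Returns a set of (species, genes) pairs of frozensets, each complete
    (every species has every gene) and maximal (nothing can be added).
    Illustrative only: runtime grows exponentially with the number of genes.
    """
    all_genes = sorted(set().union(*presence.values())) if presence else []
    found = set()
    for size in range(min_genes, len(all_genes) + 1):
        for combo in combinations(all_genes, size):
            wanted = set(combo)
            # Every species that has all genes in this candidate gene set.
            species = frozenset(s for s, g in presence.items() if wanted <= g)
            if len(species) < max(min_species, 1):
                continue
            # Close the gene set: all genes shared by exactly those species.
            shared = frozenset(set.intersection(*(presence[s] for s in species)))
            found.add((species, shared))
    return found


# Toy presence data (hypothetical species labels and gene names).
toy = {
    "A": {"rbcL", "atpB", "matK"},
    "B": {"rbcL", "atpB", "matK"},
    "C": {"rbcL", "atpB"},
    "D": {"rbcL"},
}

result = maximal_bicliques(toy, min_genes=2, min_species=2)
for species, genes in sorted(result, key=lambda pair: -len(pair[0])):
    print(sorted(species), sorted(genes))
```

On this toy input the sketch finds two data sets: three species sharing two genes, and two species sharing three genes. That small trade-off between the number of species and the number of genes mirrors the range of data set shapes the abstract reports.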




