Molecular Biology and Evolution 19:554-562 (2002)
© 2002 Society for Molecular Biology and Evolution
A Comprehensive Vertebrate Phylogeny Using Vector Representations of Protein Sequences from Whole Genomes
Department of Life Sciences, Indiana State University;
Department of Mathematics, Rose-Hulman Institute of Technology
We recently developed a method for producing comprehensive gene and species phylogenies from unaligned whole genome data using singular value decomposition (SVD) to analyze character string frequencies. This work provides an integrated gene and species phylogeny for 64 vertebrate mitochondrial genomes composed of 832 total proteins. In addition, to provide a theoretical basis for the method, we present a graphical interpretation of both the original frequency matrix and the SVD-derived matrix. These large matrices describe high-dimensional Euclidean spaces within which biomolecular sequences can be uniquely represented as vectors. In particular, the SVD-derived vector space describes each protein relative to a restricted set of newly defined, independent axes, each of which represents a novel form of conserved motif, termed a correlated peptide motif. A quantitative comparison of the relative orientations of protein vectors in this space provides accurate and straightforward estimates of sequence similarity, which can in turn be used to produce comprehensive gene trees. Alternatively, the vector representations of genes from individual species can be summed, allowing species trees to be produced.
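The pipeline summarized in the abstract can be illustrated with a minimal sketch: count peptide strings per protein, reduce the frequency matrix with SVD, compare protein vectors by the cosine of the angle between them, and sum protein vectors per species. Everything specific here is an assumption made for illustration and is not taken from the article: the peptide length `k=2`, the retained rank of 2, the four toy sequences, the species labels, and the helper names `kmer_matrix`, `reduced_vectors`, `cosine_similarity`, and `species_vectors`.

```python
import itertools
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def kmer_matrix(seqs, k=2):
    """Peptide-by-protein count matrix: one row per k-mer, one column per protein."""
    kmers = ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    m = np.zeros((len(kmers), len(seqs)))
    for j, seq in enumerate(seqs):
        for i in range(len(seq) - k + 1):
            row = index.get(seq[i:i + k])
            if row is not None:  # skip k-mers containing non-standard residues
                m[row, j] += 1
    return m

def reduced_vectors(m, rank):
    """Protein vectors in the space spanned by the top `rank` singular directions."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (s[:rank, None] * vt[:rank]).T  # one row per protein

def cosine_similarity(vectors):
    """Pairwise cosine of the angle between row vectors."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T

def species_vectors(vectors, labels):
    """Sum the protein vectors that belong to each species."""
    names = sorted(set(labels))
    summed = np.array([
        vectors[[i for i, lab in enumerate(labels) if lab == name]].sum(axis=0)
        for name in names
    ])
    return names, summed

# Hypothetical toy data: two protein families, each present in two species.
proteins = ["MKTAYIAKQR", "MKTAYLAKQR", "GGSWLLPPHH", "GGSWLIPPHH"]
labels = ["sp1", "sp2", "sp1", "sp2"]

vectors = reduced_vectors(kmer_matrix(proteins, k=2), rank=2)
sim = cosine_similarity(vectors)                 # basis for a gene tree
names, summed = species_vectors(vectors, labels)
sp_sim = cosine_similarity(summed)               # basis for a species tree
```

In this sketch, a distance such as `1 - sim` could be passed to any standard distance-based tree-building method; that step is left out because the abstract does not specify which one is used.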




