MBE Advance Access published online on April 17, 2009
Molecular Biology and Evolution, doi:10.1093/molbev/msp077
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Research Article |
FastTree: Computing Large Minimum-Evolution Trees with Profiles instead of a Distance Matrix
Physical Biosciences Division, Lawrence Berkeley ational Lab, 1 Cyclotron Road Mailstop 922, Berkeley California 94720, USA
All authors: hysical Biosciences Division, Lawrence Berkeley National Lab, Berkeley, USA; Virtual Institute of Microbial Stress and Survival, Lawrence Berkeley National Lab, Berkeley California, USA; A.P.A.: Department of Bioengineering, University of California, Berkeley, California, USA
Corresponding Author Morgan Price, Lawrence Berkeley National Lab, 1 Cyclotron Road Mailstop 922, Berkeley California 94720, USA, Phone: 510-643-3722, Fax: 510-643-3721, E-mail: morgannprice{at}yahoo.com
Received for publication February 9, 2009. Revision received April 9, 2009. Accepted for publication April 11, 2009.
Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest-neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters, a distance matrix requires O(N2) space and O(N2L) time, but FastTree requires just O(
) memory and O(
) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 hours and 2.4 gigabytes of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 hours and 50 gigabytes of memory. In simulations, FastTree was slightly more accurate than neighbor joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.
Key Words: minimum evolution neighbor joining large phylogenies
![]()
CiteULike
Connotea
Del.icio.us What's this?
This article has been cited by other articles:
![]() |
R. D. Finn, J. Mistry, J. Tate, P. Coggill, A. Heger, J. E. Pollington, O. L. Gavin, P. Gunasekaran, G. Ceric, K. Forslund, et al. The Pfam protein families database Nucleic Acids Res., November 17, 2009; (2009) gkp985v1. [Abstract] [Full Text] [PDF] |
||||
