MBE Advance Access originally published online on April 17, 2009
Molecular Biology and Evolution 2009 26(7):1641-1650; doi:10.1093/molbev/msp077
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Research Articles |
FastTree: Computing Large Minimum Evolution Trees with Profiles instead of a Distance Matrix


,
* Physical Biosciences Division, Lawrence Berkeley National Laboratory
Virtual Institute of Microbial Stress and Survival, Lawrence Berkeley National Laboratory
Department of Bioengineering, University of California, Berkeley
E-mail: morgannprice{at}yahoo.com.
Accepted for publication April 11, 2009.
Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement Neighbor-Joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters, a distance matrix requires O(N2) space and O(N2L) time, but FastTree requires just O(NLa + N
) memory and O(N
log (N)La) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 h and 2.4 GB of memory. Just computing pairwise Jukes–Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 h and 50 GB of memory. In simulations, FastTree was slightly more accurate than Neighbor-Joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.
Key Words: minimum evolution Neighbor-Joining large phylogenies
While this paper was under review, we implemented tree-comparison in O(N) space and approximately O(N) time(http://www.microbesonline.org/fasttree/ treecmp.html). This makes it possible to use the traditional bootstrap with tens of thousands of sequences.
Koichiro Tamura, Associate Editor
![]()
CiteULike
Connotea
Del.icio.us What's this?
This article has been cited by other articles:
![]() |
R. D. Finn, J. Mistry, J. Tate, P. Coggill, A. Heger, J. E. Pollington, O. L. Gavin, P. Gunasekaran, G. Ceric, K. Forslund, et al. The Pfam protein families database Nucleic Acids Res., November 17, 2009; (2009) gkp985v1. [Abstract] [Full Text] [PDF] |
||||
