MBE Advance Access originally published online on April 17, 2009
Molecular Biology and Evolution 2009 26(7):1641-1650; doi:10.1093/molbev/msp077
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Research Articles |
FastTree: Computing Large Minimum Evolution Trees with Profiles instead of a Distance Matrix


,
* Physical Biosciences Division, Lawrence Berkeley National Laboratory
Virtual Institute of Microbial Stress and Survival, Lawrence Berkeley National Laboratory
Department of Bioengineering, University of California, Berkeley
E-mail: morgannprice{at}yahoo.com.
Accepted for publication April 11, 2009.
Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement Neighbor-Joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters, a distance matrix requires O(N2) space and O(N2L) time, but FastTree requires just O(NLa + N
) memory and O(N
log (N)La) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 h and 2.4 GB of memory. Just computing pairwise Jukes–Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 h and 50 GB of memory. In simulations, FastTree was slightly more accurate than Neighbor-Joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.
Key Words: minimum evolution Neighbor-Joining large phylogenies
While this paper was under review, we implemented tree-comparison in O(N) space and approximately O(N) time(http://www.microbesonline.org/fasttree/ treecmp.html). This makes it possible to use the traditional bootstrap with tens of thousands of sequences.