MBE Advance Access published online on July 25, 2007
Molecular Biology and Evolution, doi:10.1093/molbev/msm144
Research Article
On Reduced Amino Acid Alphabets for Phylogenetic Inference
1 Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia
2 Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada B3H 4H7
Corresponding Author: Edward Susko, Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada B3H 3J5; Phone: (902) 494-8865; Fax: (902) 494-5130; E-mail: susko{at}mathstat.dal.ca
Received for publication April 18, 2007. Revision received July 11, 2007. Accepted for publication July 12, 2007.
We investigate the use of Markov models of evolution for reduced amino acid alphabets, that is, bins of amino acids. Reduced amino acid alphabets can ameliorate the effects of model misspecification and saturation. We present algorithms for two ways of automating the construction of bins: minimizing criteria based on properties of rate matrices, and minimizing criteria based on properties of alignments. Simulations show that, in the absence of model misspecification, the loss of information due to binning is insubstantial, and Markov models at the binned level are almost as effective as the more appropriate missing-data approach. Applying these approaches to real datasets in which compositional heterogeneity and/or saturation appear to bias tree estimation, we find that binning can improve topological estimation in practice.
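As a rough illustration of what working "at the binned level" involves, the sketch below does two things. First, it recodes a protein sequence into bin labels. Second, it aggregates a 20-state-style rate matrix into a binned rate matrix by weighting with stationary frequencies, using Q_IJ = (1/π_I) Σ_{i∈I} Σ_{j∈J} π_i q_ij. Both the grouping and the aggregation rule are assumptions chosen for illustration. The grouping is the commonly used Dayhoff six-group recoding, not the bins constructed by the algorithms described here. The aggregation formula is one simple approximation, not necessarily the binned model used in this work. The names `recode` and `lump_rate_matrix` are hypothetical.

```python
# Hypothetical illustration only: Dayhoff-6 grouping, NOT the bins derived in this work.
DAYHOFF6 = {
    "AGPST": "0",
    "DENQ": "1",
    "HKR": "2",
    "ILMV": "3",
    "FWY": "4",
    "C": "5",
}

# Map each one-letter amino acid code to its bin label.
AA_TO_BIN = {aa: label for group, label in DAYHOFF6.items() for aa in group}


def recode(seq: str, gap_chars: str = "-?X") -> str:
    """Recode a protein sequence into bin labels; gaps and unknowns become '-'."""
    out = []
    for aa in seq.upper():
        if aa in AA_TO_BIN:
            out.append(AA_TO_BIN[aa])
        elif aa in gap_chars:
            out.append("-")
        else:
            raise ValueError(f"unexpected character {aa!r}")
    return "".join(out)


def lump_rate_matrix(q, pi, bins):
    """Aggregate a rate matrix over bins (an assumed, approximate scheme).

    q:    dict-of-dicts, q[i][j] is the instantaneous rate from state i to j (i != j)
    pi:   dict of stationary frequencies for each state
    bins: list of strings; each string lists the states in one bin (bins disjoint)

    Returns (q_binned, pi_binned), where off-diagonal entries are
    Q_IJ = sum_{i in I, j in J} pi_i * q_ij / pi_I and rows sum to zero.
    """
    pi_b = [sum(pi[i] for i in b) for b in bins]
    n = len(bins)
    qb = [[0.0] * n for _ in range(n)]
    for a, bin_i in enumerate(bins):
        for c, bin_j in enumerate(bins):
            if a != c:
                flow = sum(pi[i] * q[i][j] for i in bin_i for j in bin_j)
                qb[a][c] = flow / pi_b[a]
        # Diagonal entry makes each row of the generator sum to zero.
        qb[a][a] = -sum(qb[a][c] for c in range(n) if c != a)
    return qb, pi_b


if __name__ == "__main__":
    print(recode("ACDH-"))  # alanine, cysteine, aspartate, histidine, gap
    # Toy 4-state equal-rates model lumped into two bins.
    states = "ACGT"
    q = {i: {j: 1 / 3 for j in states if j != i} for i in states}
    pi = {s: 0.25 for s in states}
    print(lump_rate_matrix(q, pi, ["AG", "CT"]))
```

Because information within a bin is discarded, substitutions between amino acids in the same bin become invisible. This is the intended trade-off: compositional differences and rapid within-bin changes contribute less to tree estimation.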
Key Words: protein evolution, amino acid alphabets, Markov models, compositional heterogeneity