MBE Advance Access published online on March 21, 2008
Molecular Biology and Evolution, doi:10.1093/molbev/msn066
Research Article
An Information-Theoretic Method for the Treatment of Plural Ancestry in Phylogenetics
1 Center for Computational Immunology, Department of Biostatistics and Bioinformatics, Duke University Medical Center
2 Computational Biology and Bioinformatics PhD Program, Institute for Genome Sciences and Policy, Duke University
3 Department of Immunology, Department of Statistical Science, Duke University
Corresponding Author, Dr. Thomas B. Kepler, Box 2734 DUMC, 2424 Erwin Road, Hock Plaza G035, Durham NC 27705, TEL: +1 919 681 0620; FAX: +1 919 668-5888, kepler{at}duke.edu
Received for publication August 20, 2007. Revision received January 31, 2008. Accepted for publication March 16, 2008.
In the presence of recombination and gene conversion, a given genomic segment may inherit information from two distinct immediate ancestors. The importance of this type of molecular inheritance has become increasingly clear over the years, and the potential for erroneous inference when it is not accounted for in the statistical model is well documented. Yet the inclusion of plural ancestry in phylogenetic analysis is still not routine. This omission is due to the greater difficulty of phylogenetic inference on general acyclic graphs compared with that on trees, and to the accompanying computational burden. We have developed a technique for phylogenetic inference in the presence of plural ancestry based on the principle of minimum description length, which assigns a cost, the description length, to each network topology given the observed sequence data. The description length combines the cost of poor data fit and the cost of model complexity, both measured in units of information. This device allows us to search through network topologies to minimize the total description length. By comparing the best models obtained with and without plural ancestry, one can determine whether recombination has played an active role in the evolution of the genes under investigation, identify those genes that appear to be mosaic, and infer the phylogenetic network that best represents the history of the alignment. We show that the method performs well on simulated data and demonstrate its application to HIV env gene sequence data from 8 human subjects. The software implementation of the method is available upon request.
Key Words: recombination; MDL; gene conversion; phylogenetic networks
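The trade-off described in the abstract, where a description length sums a model-complexity cost and a data-misfit cost, can be illustrated with a deliberately small sketch. This is not the authors' software or their coding scheme. The function names (`mismatch_bits`, `single_parent_dl`, `two_parent_dl`), the mismatch-based code, the single-breakpoint mosaic model, and the toy sequences are all assumptions made for illustration only.

```python
import math

def mismatch_bits(child, parent):
    """Bits to describe `child` given an equal-length `parent` (assumed toy code):
    the mismatch count, which sites mismatch, and the substituted base at each."""
    n = len(child)
    k = sum(c != p for c, p in zip(child, parent))
    return math.log2(n + 1) + math.log2(math.comb(n, k)) + k * math.log2(3)

def single_parent_dl(child, parents):
    """Description length when the child has one parent.
    Model cost: which parent. Data cost: mismatches against that parent."""
    model_bits = math.log2(len(parents))
    return min(model_bits + mismatch_bits(child, p) for p in parents)

def two_parent_dl(child, parents):
    """Description length when the child is a mosaic of two distinct parents
    joined at one breakpoint. Model cost: ordered parent pair plus breakpoint."""
    n, m = len(child), len(parents)
    model_bits = math.log2(m * (m - 1)) + math.log2(n - 1)
    best = math.inf
    for i, left in enumerate(parents):
        for j, right in enumerate(parents):
            if i == j:
                continue
            for b in range(1, n):  # breakpoint falls between sites b-1 and b
                data_bits = (mismatch_bits(child[:b], left[:b])
                             + mismatch_bits(child[b:], right[b:]))
                best = min(best, model_bits + data_bits)
    return best

# Hypothetical toy data: two parents that differ at every site.
parent_a = "ACGT" * 10
parent_b = "TGCA" * 10
mosaic_child = parent_a[:20] + parent_b[20:]   # recombinant
clonal_child = parent_a                        # non-recombinant

for name, child in [("mosaic", mosaic_child), ("clonal", clonal_child)]:
    one = single_parent_dl(child, [parent_a, parent_b])
    two = two_parent_dl(child, [parent_a, parent_b])
    choice = "plural ancestry" if two < one else "single ancestry"
    print(f"{name}: one parent {one:.2f} bits, two parents {two:.2f} bits -> {choice}")
```

In this toy setting the mosaic child is described more cheaply with two parents, because the extra bits spent on the breakpoint and second parent are repaid by a much smaller mismatch cost. The clonal child is cheaper with one parent, since the additional model complexity buys no improvement in fit. The method in the paper applies the same principle to whole network topologies rather than to a single sequence.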