MBE Advance Access published online on February 7, 2008
Molecular Biology and Evolution, doi:10.1093/molbev/msn030
Research Article
Influence Function for Robust Phylogenetic Reconstructions


* MAP5, Univ. René Descartes, 45 rue des Saints-Pères, 75270 PARIS cedex 06
Univ. Paris-Sud Bât 425, Dept. de maths 91405 Orsay Cedex, France
Univ. de Rennes I, UMR 6553 Ecobio, 35042 Rennes, France
E-mail: avner{at}math-info.univ-paris5.fr
Received for publication September 27, 2007. Revision received January 10, 2008. Revision received January 29, 2008.

Based on the computation of the influence function, a tool to measure the impact of each piece of sampled data on the statistical inference of a parameter, we propose to analyze the support of the maximum likelihood tree for each site. We provide a new tool for filtering datasets (nucleotides, amino acids and others) in the context of maximum likelihood phylogenetic reconstructions. Because different sites support different phylogenetic topologies in different ways, outlier sites, i.e. sites with a very negative influence value, are important: they can drastically change the topology resulting from the statistical inference. Therefore, these outlier sites must be clearly identified and their effects accounted for before drawing biological conclusions from the inferred tree.
A matrix containing 158 fungal terminals, all belonging to Chytridiomycota, Zygomycota and Glomeromycota, is analyzed. We show that removing the strongest outlier from the analysis strikingly modifies the maximum likelihood topology, with a loss of as many as 20% of the internal nodes. Consequently, re-estimating the topology on the filtered dataset yields a tree with enhanced bootstrap support. From this analysis, the polyphyletic status of the fungal phyla Chytridiomycota and Zygomycota is reinforced, suggesting the need to revisit the systematics of these fungal groups. We show the ability of the influence function to generate new evolutionary hypotheses.
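The idea of a per-site influence value can be illustrated with a minimal sketch. It is not the authors' implementation (their scripts were written in R and are not reproduced here) and rests on several assumptions made only for illustration: the influence of a site is approximated by a leave-one-out difference in the maximized mean log-likelihood, scaled by n - 1; the model is a two-sequence Jukes-Cantor (JC69) alignment instead of a full tree; and the function names `jc69_max_mean_loglik`, `empirical_influence` and `strongest_outliers` are hypothetical.

```python
import math


def jc69_max_mean_loglik(sites):
    """Maximized mean per-site log-likelihood of a two-sequence alignment (JC69).

    `sites` is a list of (base_in_seq1, base_in_seq2) pairs. Assumes the observed
    mismatch proportion is below 3/4, so the maximum has a closed form.
    """
    n = len(sites)
    k = sum(1 for a, b in sites if a != b)  # number of mismatching sites
    p = k / n  # maximum likelihood estimate of the mismatch probability
    loglik = 0.0
    if k < n:
        loglik += (n - k) * math.log((1.0 - p) / 4.0)  # identical pairs
    if k > 0:
        loglik += k * math.log(p / 12.0)  # specific mismatching pairs
    return loglik / n


def empirical_influence(sites, statistic):
    """Leave-one-out influence of each site on `statistic` (an assumed definition).

    A negative value means the statistic increases when the site is removed,
    i.e. the site fits the inferred model poorly.
    """
    n = len(sites)
    full = statistic(sites)
    return [
        (n - 1) * (full - statistic(sites[:i] + sites[i + 1:]))
        for i in range(n)
    ]


def strongest_outliers(influence, count=1):
    """Indices of the `count` sites with the most negative influence values."""
    return sorted(range(len(influence)), key=lambda i: influence[i])[:count]


# Toy alignment: 20 sites, two of which disagree between the sequences.
sites = [("A", "A")] * 20
sites[3] = ("A", "G")
sites[11] = ("C", "T")

influence = empirical_influence(sites, jc69_max_mean_loglik)
outliers = strongest_outliers(influence, count=2)
print(outliers)
```

In this toy setting the two mismatching sites receive negative influence values and are flagged first. In a real analysis the statistic would be the maximized likelihood of the inferred tree, and the topology would be re-estimated after the flagged sites are filtered out.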
Key Words: Influence function; Phylogenetic; Maximum likelihood; Tree stability
1 All scripts, written in R, are available upon request from ABH.