MBE Advance Access published online on January 29, 2008
Molecular Biology and Evolution, doi:10.1093/molbev/msn018
Research Article
A Site- and Time-Heterogeneous Model of Amino-Acid Replacement
Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, UMR 5506, CNRS-Université de Montpellier 2, 161, rue Ada, 34392 Montpellier Cedex 5, France
1 Corresponding author: samuel.blanquart{at}lirmm.fr
Received for publication August 14, 2007. Revision received December 17, 2007. Revision received January 10, 2008. Accepted for publication January 20, 2008.
We combined the CAT mixture model (Lartillot and Philippe 2004) and the non-stationary BP model (Blanquart and Lartillot 2006) into a new model, CAT-BP, accounting for variations of the evolutionary process both along the sequence and across lineages. As in CAT, the model implements a mixture of distinct Markovian processes of substitution distributed among sites, thus accommodating site-specific selective constraints induced by protein structure and function. Furthermore, as in BP, these processes are non-stationary, and their equilibrium frequencies are allowed to change along lineages in a correlated way, through discrete shifts in global amino acid composition distributed along the phylogenetic tree.
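To make the structure of such a model concrete, the following is a minimal sketch of how a site-specific profile and a lineage-specific composition shift could jointly determine the substitution process along one branch. It rests on several assumptions that are not stated in this abstract: a toy four-state alphabet instead of 20 amino acids, a Poisson (F81-like) process in which the rate of change to state j is proportional to its equilibrium frequency, and an elementwise product (renormalized) as the rule for combining the site profile with the lineage modulator. All function names are hypothetical, and the actual CAT-BP parameterization may differ.

```python
import math

def normalize(values):
    """Rescale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

def combine(site_profile, lineage_modulator):
    # ASSUMPTION: site and lineage effects combine multiplicatively.
    # This rule is illustrative only; it is not taken from the source.
    return normalize([p * m for p, m in zip(site_profile, lineage_modulator)])

def poisson_transition(pi, t):
    """Transition matrix P(t) for the process Q = 1*pi^T - I.

    Closed form: P(t) = exp(-t) * I + (1 - exp(-t)) * 1*pi^T.
    """
    decay = math.exp(-t)
    n = len(pi)
    return [[decay * (i == j) + (1.0 - decay) * pi[j] for j in range(n)]
            for i in range(n)]

def propagate(dist, P):
    """Push a state distribution through one transition matrix."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def evolve_along_branch(start, site_profile, segments):
    """Evolve a state distribution along a branch cut by breakpoints.

    segments is a list of (length, lineage_modulator) pairs; each
    breakpoint starts a new segment with a new modulator, so the
    process is non-stationary along the branch.
    """
    dist = list(start)
    for length, modulator in segments:
        pi = combine(site_profile, modulator)
        dist = propagate(dist, poisson_transition(pi, length))
    return dist
```

For example, `evolve_along_branch([1, 0, 0, 0], [0.7, 0.1, 0.1, 0.1], [(0.5, [1, 1, 1, 1]), (2.0, [1, 4, 1, 1])])` evolves one site across a single breakpoint. Under these assumptions, sites assigned to different profiles respond differently to the same compositional shift, which is the kind of interaction a joint site- and time-heterogeneous model is meant to capture.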
We implemented the CAT-BP model in a Bayesian Markov Chain Monte Carlo framework and compared its predictions with those of three simpler models, BP, CAT, and the site- and time-homogeneous GTR model, on a concatenation of four mitochondrial proteins from 20 arthropod species. GTR, BP and CAT all display a phylogenetic reconstruction artefact that positions the bees Apis m. and Melipona b. among chelicerates; in contrast, the CAT-BP model recovers the monophyly of insects. Using posterior predictive tests, we further show that the CAT-BP combination better predicts site- and taxon-specific amino acid frequencies, and that it better accounts for the homoplasies that are responsible for the artefact.
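The general logic of a posterior predictive test can be sketched as follows: a summary statistic is computed on the observed alignment and on alignments simulated under parameter values drawn from the posterior, and the model is judged by how often the simulated values are at least as extreme as the observed one. The two statistics below (mean number of distinct residues per site, and the largest deviation of one taxon's composition from the overall composition) are assumed stand-ins for site- and taxon-specific measures; the abstract does not specify the exact statistics used, and the function names are hypothetical. The replicate alignments are assumed to come from a separate simulation step that is not shown.

```python
from collections import Counter

def mean_site_diversity(alignment):
    """Mean number of distinct residues per column, ignoring gaps."""
    counts = [len(set(column) - {"-"}) for column in zip(*alignment)]
    return sum(counts) / len(counts)

def residue_frequencies(sequence):
    """Relative residue frequencies of one string, ignoring gaps."""
    counts = Counter(c for c in sequence if c != "-")
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

def max_taxon_composition_deviation(alignment):
    """Largest squared distance between a taxon's composition and the
    composition of the whole alignment."""
    overall = residue_frequencies("".join(alignment))
    worst = 0.0
    for sequence in alignment:
        own = residue_frequencies(sequence)
        deviation = sum((own.get(aa, 0.0) - f) ** 2
                        for aa, f in overall.items())
        worst = max(worst, deviation)
    return worst

def posterior_predictive_pvalue(observed, replicates, statistic):
    """Fraction of replicates whose statistic is >= the observed value.

    Which tail signals a model violation depends on the statistic, so
    the result should be read with the direction of the test in mind.
    """
    observed_value = statistic(observed)
    hits = sum(1 for rep in replicates if statistic(rep) >= observed_value)
    return hits / len(replicates)
```

With the site-diversity statistic, a value close to 1 means that nearly all replicates show at least as many distinct residues per site as the real data, suggesting that the model underestimates site-specific constraints; with the composition statistic, a value close to 0 means that the model rarely reproduces the compositional heterogeneity observed across taxa.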
Altogether, our results show that the joint modelling of heterogeneities across sites and along time results in a synergistic improvement of the phylogenetic inference, indicating that it is essential to disentangle the combined effects of both sources of heterogeneity, in order to overcome systematic errors in protein phylogenetic analyses.
Key Words: phylogeny; MCMC; non-stationary; mixture; posterior predictive; model violation; LBA