MBE Advance Access published online on October 12, 2009
Molecular Biology and Evolution, doi:10.1093/molbev/msp248
Research Article
A Dirichlet process covarion mixture model and its assessments using posterior predictive discrepancy tests
1 Département de Biochimie, Centre Robert-Cedergren, Université de Montréal, CP 6128 – Succursale Centre-Ville, Montréal (Québec) H3C 3J7, Canada
2 Department of Biology, University of Ottawa, 30 Marie Curie, Ottawa ON, K1N 6N5, Canada
Corresponding author: Hervé Philippe, Département de Biochimie, Université de Montréal, Succursale Centre-Ville, Montréal, Québec H3C3J7, Canada, Email: herve.philippe{at}umontreal.ca
Received for publication May 29, 2009. Revision received September 18, 2009. Revision received October 5, 2009. Accepted for publication October 5, 2009.
Heterotachy, the variation of the substitution rate at a site over time, is prevalent in nucleotide and amino acid alignments and may mislead probabilistic phylogenetic inference. The covarion model describes a special case of heterotachy in which sites switch between an "ON" state (allowing substitutions according to any particular model of sequence evolution) and an "OFF" state (prohibiting substitutions). In current implementations, the switch rates between the ON and OFF states are homogeneous across sites, a hypothesis that has never been tested. In this study, we developed an infinite mixture model, called the covarion mixture (CM) model, which allows the covarion parameters to vary across sites under the control of a Dirichlet process prior. Moreover, we combined the CM model with other approaches: a second, independent Dirichlet process models the heterogeneity of amino acid equilibrium frequencies across sites (the CAT model), and general rate-across-site heterogeneity is modeled by a gamma distribution. Application of the CM model to several large alignments demonstrates that the covarion parameters are significantly heterogeneous across sites. We describe posterior predictive discrepancy tests and use them to demonstrate the importance of these different elements of the models.
Key Words: heterotachy, covarion, phylogenetics, model violations, posterior predictive discrepancy
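
The ON/OFF switching described in the abstract can be illustrated with a small sketch of a covarion generator matrix. This is not the authors' implementation; it assumes the usual construction in which the state space is doubled (each character is either ON or OFF), substitutions occur only among ON states, and switching keeps the character unchanged. The function names, the Jukes-Cantor-style substitution matrix, and the numeric switch rates are all hypothetical choices made for illustration.

```python
def jukes_cantor(n=4):
    """Hypothetical n-state substitution matrix with equal off-diagonal rates."""
    off = 1.0 / (n - 1)
    return [[-1.0 if i == j else off for j in range(n)] for i in range(n)]


def covarion_generator(q, s_on_to_off, s_off_to_on):
    """Build a 2n x 2n covarion generator from an n x n substitution matrix q.

    Rows/columns 0..n-1 are the ON copies of each character,
    rows/columns n..2n-1 are the OFF copies (an assumed ordering).
    """
    n = len(q)
    big = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                big[i][j] = q[i][j]      # substitutions happen only while ON
        big[i][n + i] = s_on_to_off      # ON -> OFF, character unchanged
        big[n + i][i] = s_off_to_on      # OFF -> ON, character unchanged
    # Set each diagonal entry so that every row sums to zero.
    for r in range(2 * n):
        big[r][r] = -sum(big[r][c] for c in range(2 * n) if c != r)
    return big


# One pair of switch rates shared by all sites (the homogeneous case).
G = covarion_generator(jukes_cantor(), s_on_to_off=0.5, s_off_to_on=0.2)
```

In the homogeneous covarion model, a single pair of switch rates such as the one above applies to every site. Under the CM model described in the abstract, each site would instead take its switch rates from one component of a mixture governed by a Dirichlet process prior, so `covarion_generator` would be called with site-specific values.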