MBE Advance Access originally published online on April 21, 2009
Molecular Biology and Evolution 2009 26(7):1663-1676; doi:10.1093/molbev/msp078
Research Articles
Computational Methods for Evaluating Phylogenetic Models of Coding Sequence Evolution with Dependence between Codons



* Department of Biology, Center for Advanced Research in Environmental Genomics, University of Ottawa, Ottawa, Ontario K1N 6N5, Canada
Département de Biochimie, Centre Robert Cedergren, Université de Montréal, C.P. 6821, Succursale Centre-ville, Montréal, Québec H3C 3J7, Canada
E-mail: nicolas.rodrigue{at}uottawa.ca.
Accepted for publication April 14, 2009.
In recent years, molecular evolutionary models formulated as site-interdependent Markovian codon substitution processes have been proposed as means of mechanistically accounting for selective features over long-range evolutionary scales. Under such models, site interdependencies are reflected in the use of a simplified protein tertiary structure representation and predefined statistical potential, which, along with mutational parameters, mediate nonsynonymous rates of substitution; rates of synonymous events are solely mediated by mutational parameters. Although theoretically attractive, the models are computationally challenging, and the methods used to manipulate them still do not allow for quantitative model evaluations in a multiple-sequence context. Here, we describe Markov chain Monte Carlo computational methodologies for sampling parameters from their posterior distribution under site-interdependent codon substitution models within a phylogenetic context and allowing for Bayesian model assessment and ranking. Specifically, the techniques we expound here can form the basis of posterior predictive checking under these models and can be embedded within thermodynamic integration algorithms for computing Bayes factors. We illustrate the methods using two data sets and find that although current forms of site-interdependent models of codon substitution provide an improved fit, they are outperformed by the extended site-independent versions. Altogether, the methodologies described here should enable a quantified contrasting of alternative ways of modeling structural constraints, or other site-interdependent criteria, and establish if such formulations can match (or supplant) site-independent model extensions.
Key Words: Markov chain Monte Carlo, data augmentation, auxiliary variables, posterior predictive checking, Bayes factors, protein tertiary structure
Jeffrey Thorne, Associate Editor
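
The abstract refers to thermodynamic integration as a route to Bayes factors. As a rough illustration of the general idea only, and not of the site-interdependent codon models or the samplers developed in the article, the sketch below estimates a log marginal likelihood along a power-posterior path for a toy model: i.i.d. Normal observations with unit variance and a Normal prior on the mean. The toy model, the data values, the uniform temperature grid, and all function names are assumptions made for this example. The power posterior is conjugate here, so it is sampled exactly rather than by Markov chain Monte Carlo, and the estimate can be checked against the closed-form answer.

```python
import math
import random


def log_likelihood(theta, data):
    """Log-likelihood of i.i.d. Normal(theta, 1) observations."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (y - theta) ** 2 for y in data)


def sample_power_posterior(beta, data, prior_var, n_draws, rng):
    """Draw theta from p_beta(theta), proportional to L(theta)^beta * prior(theta).

    With a Normal(0, prior_var) prior this is again Normal, so draws are exact.
    """
    precision = 1.0 / prior_var + beta * len(data)
    mean = beta * sum(data) / precision
    sd = math.sqrt(1.0 / precision)
    return [rng.gauss(mean, sd) for _ in range(n_draws)]


def thermodynamic_integration(data, prior_var=1.0, n_temps=41, n_draws=2000, seed=0):
    """Estimate log p(data) as the integral over beta in [0, 1] of E_beta[log L]."""
    rng = random.Random(seed)
    betas = [k / (n_temps - 1) for k in range(n_temps)]
    expected_log_lik = []
    for beta in betas:
        draws = sample_power_posterior(beta, data, prior_var, n_draws, rng)
        expected_log_lik.append(sum(log_likelihood(t, data) for t in draws) / n_draws)
    # Trapezoidal rule over the temperature grid.
    return sum(
        0.5 * (expected_log_lik[k] + expected_log_lik[k + 1]) * (betas[k + 1] - betas[k])
        for k in range(n_temps - 1)
    )


def exact_log_marginal(data, prior_var=1.0):
    """Closed-form log p(data): data is Normal(0, I + prior_var * 11^T)."""
    n = len(data)
    s = sum(data)
    ss = sum(y * y for y in data)
    quad = ss - prior_var * s * s / (1.0 + n * prior_var)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(1.0 + n * prior_var) - 0.5 * quad


# Hypothetical toy observations, chosen only for illustration.
data = [0.3, -0.5, 1.2, 0.8, 0.1]
ti_estimate = thermodynamic_integration(data)
exact_value = exact_log_marginal(data)
print(f"thermodynamic integration: {ti_estimate:.3f}")
print(f"closed form:               {exact_value:.3f}")
```

A log Bayes factor between two models is the difference of their log marginal likelihoods, so in this toy setting one could call `thermodynamic_integration` once per model (for instance, with two different `prior_var` values) and subtract. For models without a conjugate power posterior, the exact draws would be replaced by MCMC samples at each temperature.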