MBE Advance Access published online on September 28, 2007
Molecular Biology and Evolution, doi:10.1093/molbev/msm190
© 2007 The Authors
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Research Article
Insights from Modeling Protein Evolution with Context-Dependent Mutation and Asymmetric Amino Acid Selection
1 Department of Genome Sciences, Box 355065, University of Washington, Seattle, WA 98195, USA
2 Howard Hughes Medical Institute, Seattle, WA 98105, USA
3 To whom correspondence should be addressed: ctsa{at}u.washington.edu (C.T.S.) or phg{at}u.washington.edu (P.G.)
Received for publication May 20, 2007. Revision received August 30, 2007. Accepted for publication September 6, 2007.
We develop an approximate maximum likelihood method to estimate flanking nucleotide context-dependent mutation rates and amino acid exchange-dependent selection in orthologous protein coding sequences, and use it to analyze genome-wide coding sequence alignments from mammals and yeast. Allowing context-dependent mutation provides a better fit to coding sequence data than simpler (context-independent, or CpG hotspot) models, and significantly affects selection parameter estimates. Allowing asymmetric (nonreciprocal) selection on amino acid exchanges gives a better fit than simple dN/dS or symmetric selection models. Relative selection strength estimates from our models show good agreement with independent estimates derived from human disease-causing and engineered mutations. Selection coefficients depend on local protein structure, showing expected biophysical trends in helical vs. non-helical regions, and increased asymmetry on polar-hydrophobic exchanges with increased burial. The more stringent selection that has previously been observed for highly expressed proteins is primarily concentrated in buried regions, supporting the notion that such proteins are under stronger than average selection for stability. Our analyses indicate that a highly parameterized model of mutation and selection is computationally tractable, and is a useful tool for exploring a variety of biological questions concerning protein and coding sequence evolution.
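To make the comparison among the simple dN/dS, symmetric, and asymmetric selection models more concrete, the following minimal sketch counts how many selection parameters each parameterization might carry. It assumes, purely for illustration, that exchange parameters are defined only for amino acid pairs connected by a single nucleotide change in the standard genetic code; the abstract does not state how the authors define their parameter set, and the names `single_step_exchanges` and `param_counts` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons listed in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_step_exchanges():
    """Return ordered (from, to) amino acid pairs reachable by one nucleotide change.

    Illustrative assumption: stop codons are excluded and synonymous changes
    are ignored. This is not taken from the paper.
    """
    ordered = set()
    for codon, aa_from in CODON_TABLE.items():
        if aa_from == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                neighbor = codon[:pos] + base + codon[pos + 1:]
                aa_to = CODON_TABLE[neighbor]
                if aa_to != "*" and aa_to != aa_from:
                    ordered.add((aa_from, aa_to))
    return ordered


ordered = single_step_exchanges()
unordered = {frozenset(pair) for pair in ordered}

# One parameter for dN/dS, one per unordered pair for symmetric selection,
# and one per ordered (directional) pair for asymmetric selection.
param_counts = {
    "dN/dS": 1,
    "symmetric": len(unordered),
    "asymmetric": len(ordered),
}
print(param_counts)
```

Under these assumptions the asymmetric model has exactly twice as many exchange parameters as the symmetric one, which is why a formal comparison of fit has to account for the extra parameters rather than compare raw likelihoods.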