MBE Advance Access published online on August 16, 2007
Molecular Biology and Evolution, doi:10.1093/molbev/msm169
Research Article
Combining Models of Protein Translation and Population Genetics to Predict Protein Production Rates from Codon Usage Patterns
Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville, TN, USA
Email: mikeg{at}utk.edu
Received for publication April 18, 2007. Revision received July 24, 2007. Accepted for publication August 1, 2007.
Genes are often biased in their codon usage, and the degree of bias often changes with expression level and intragenic position. Numerous indices, such as the codon adaptation index, have been developed to measure this bias. Although a gene's expression level and its index values are often correlated, the heuristic nature of these metrics limits their ability to explain this relationship. As an alternative approach, this study integrates mechanistic models of cellular and population processes in a nested manner to develop a stochastic evolutionary model of a protein's production rate (SEMPPR). SEMPPR assumes that the evolution of codon bias is driven by selection to reduce the cost of nonsense errors, and that this selection is counteracted by mutation and drift. Through the application of Bayes' theorem, SEMPPR generates a posterior probability distribution for the protein production rate of a given gene. Conceptually, SEMPPR's predictions are based on how well the gene's codon usage pattern is adapted to reduce the cost of nonsense errors. As an illustration, SEMPPR was parameterized using the Saccharomyces cerevisiae genome, and its predictions were tested against available empirical data. The results indicate that SEMPPR's predictions are more reliable and more easily interpreted than those of index-based approaches. By assuming that codon usage patterns are primarily driven by selection to reduce the cost of nonsense errors, SEMPPR illustrates how sequence data can be used to make reliable, quantitative predictions about a gene's protein production rate.
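The Bayesian step described above can be illustrated with a minimal toy sketch. This is not SEMPPR's actual likelihood. It assumes a single amino acid with two synonymous codons, a hypothetical per-codon nonsense-error cost (`ETA`), a hypothetical mutation bias (`MU`), and a single lumped scaling constant (`SCALE`) standing in for population-genetic parameters. It also assumes that codon frequencies follow mutation bias weighted by `exp(-SCALE * eta * phi)` and that the prior on the production rate `phi` is log-normal. Every name and value here is an illustrative assumption. The sketch only shows the mechanics: the posterior is the likelihood of the observed codon counts times a prior, normalized over a grid of candidate production rates.

```python
import math

# Hypothetical parameters for ONE amino acid with two synonymous codons.
# These are illustrative assumptions, not values or formulas from SEMPPR.
ETA = [0.0, 1.0]  # assumed relative nonsense-error cost of each codon
MU = [0.6, 0.4]   # assumed relative mutation bias toward each codon
SCALE = 1.0       # assumed lumped constant for population-genetic scaling


def codon_probs(phi):
    """Toy selection-mutation-drift balance: weight_i = MU_i * exp(-SCALE * ETA_i * phi)."""
    weights = [m * math.exp(-SCALE * e * phi) for m, e in zip(MU, ETA)]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(counts, phi):
    """Multinomial log-likelihood of observed codon counts (constant term dropped)."""
    probs = codon_probs(phi)
    return sum(n * math.log(p) for n, p in zip(counts, probs))


def lognormal_log_prior(phi, mu=0.0, sigma=1.0):
    """Log-density of an assumed log-normal prior on phi, up to an additive constant."""
    z = (math.log(phi) - mu) / sigma
    return -math.log(phi) - 0.5 * z * z


def posterior(counts, phi_grid, log_prior=lognormal_log_prior):
    """Bayes' theorem on a grid: posterior proportional to likelihood times prior, summing to 1."""
    log_post = [log_likelihood(counts, phi) + log_prior(phi) for phi in phi_grid]
    peak = max(log_post)  # subtract the maximum before exponentiating, for numerical stability
    unnorm = [math.exp(lp - peak) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


if __name__ == "__main__":
    grid = [0.1 * k for k in range(1, 101)]  # candidate phi values from 0.1 to 10
    post = posterior([40, 5], grid)          # gene that strongly favors the low-cost codon
    mode = grid[post.index(max(post))]
    print(f"posterior mode of phi: {mode:.1f}")
```

In this toy setup, a gene whose codon counts match the mutation bias alone (for example `[27, 18]`) gets a posterior concentrated at low `phi`. A gene that strongly favors the low-cost codon gets a posterior shifted toward higher `phi`. This mirrors the qualitative logic of the abstract: stronger adaptation against nonsense-error costs implies a higher inferred production rate.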
Key Words: codon bias; nonsense errors; fitness landscape; synonymous substitution; evolution