MBE Advance Access first published online on August 16, 2007
This version published online on September 6, 2007
Molecular Biology and Evolution, doi:10.1093/molbev/msm169
Research Article
Combining Models of Protein Translation and Population Genetics to Predict Protein Production Rates from Codon Usage Patterns using SEMPPR
Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville, TN, USA
Email: mikeg{at}utk.edu
Received for publication April 20, 2007. Revision received June 4, 2007. Accepted for publication August 1, 2007.
Genes are often biased in their codon usage. The degree of bias displayed often changes with expression level and intra-genic position. Numerous indices, such as the codon adaptation index, have been developed to measure this bias. While a gene's expression level and its index values are correlated, the heuristic nature of these metrics limits their ability to explain this relationship. As an alternative approach, this study integrates mechanistic models of cellular and population processes in a nested manner to develop a stochastic evolutionary model of a protein's production rate (SEMPPR). SEMPPR assumes that the evolution of codon bias is driven by selection to reduce the cost of nonsense errors and that this selection is counteracted by mutation and drift. Through the application of Bayes' theorem, SEMPPR generates a posterior probability distribution for the protein production rate of a given gene. Conceptually, SEMPPR's predictions are based on the degree of adaptation to reduce the cost of nonsense errors observed in the codon usage pattern of the gene. As an illustration, SEMPPR was parameterized using the Saccharomyces cerevisiae genome, and its predictions were tested using available empirical data. The results indicate that SEMPPR's predictions are as reliable as index-based ones. In addition, SEMPPR's output is more easily interpreted, and its predictions could be improved through refinements of the models upon which it is built. By assuming that codon usage patterns are primarily driven by selection to reduce the cost of nonsense errors, SEMPPR illustrates how sequence data can be used to make reliable, quantitative predictions about a gene's protein production rate.
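The abstract's central inference step, turning a gene's codon counts into a posterior distribution for its protein production rate via Bayes' theorem, can be illustrated with a toy grid approximation. This sketch is not SEMPPR's actual model. The exponential form of the codon probabilities, the per-codon `costs` (a stand-in for the cost of nonsense errors), the `mut_bias` weights (a stand-in for mutation), the `scale` parameter (loosely absorbing drift and the conversion of cost into fitness), and all function names are hypothetical assumptions, chosen only to show how prior times likelihood yields a posterior for a production rate `phi`.

```python
import math
from typing import Callable, List, Sequence, Tuple

# One amino acid's data (all HYPOTHETICAL): observed codon counts,
# an assumed cost per codon, and an assumed mutation-bias weight per codon.
AminoAcidData = Tuple[Sequence[int], Sequence[float], Sequence[float]]


def codon_log_likelihood(counts: Sequence[int], costs: Sequence[float],
                         mut_bias: Sequence[float], phi: float,
                         scale: float) -> float:
    """Multinomial log-likelihood (up to a phi-independent constant) of one
    amino acid's codon counts under the ASSUMED form
    P(codon j | phi) proportional to mut_bias[j] * exp(-scale * phi * costs[j])."""
    weights = [m * math.exp(-scale * phi * c) for m, c in zip(mut_bias, costs)]
    total = sum(weights)
    return sum(n * math.log(w / total) for n, w in zip(counts, weights))


def posterior_on_grid(gene: List[AminoAcidData], phi_grid: Sequence[float],
                      log_prior: Callable[[float], float],
                      scale: float = 1.0) -> List[float]:
    """Bayes' theorem on a grid: posterior is proportional to prior times
    likelihood, then normalized so the grid probabilities sum to 1."""
    log_post = []
    for phi in phi_grid:
        lp = log_prior(phi)
        for counts, costs, mut_bias in gene:
            # Amino acids are treated as independent given phi (an assumption).
            lp += codon_log_likelihood(counts, costs, mut_bias, phi, scale)
        log_post.append(lp)
    peak = max(log_post)  # subtract the maximum before exponentiating to avoid underflow
    unnorm = [math.exp(v - peak) for v in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def posterior_mean(phi_grid: Sequence[float], post: Sequence[float]) -> float:
    """Expected value of phi under the grid posterior."""
    return sum(phi * p for phi, p in zip(phi_grid, post))


def flat_prior(phi: float) -> float:
    """Uniform prior over the grid (log density constant in phi)."""
    return 0.0


if __name__ == "__main__":
    grid = [i * 0.01 for i in range(501)]  # phi from 0.0 to 5.0
    # Hypothetical gene: one two-codon amino acid whose cheap codon
    # (cost 0) is used 90 of 100 times; equal mutation bias.
    gene = [([90, 10], [0.0, 1.0], [0.5, 0.5])]
    post = posterior_on_grid(gene, grid, flat_prior)
    mode = grid[max(range(len(grid)), key=post.__getitem__)]
    print(f"posterior mode = {mode:.2f}, mean = {posterior_mean(grid, post):.2f}")
```

For the hypothetical two-codon gene above, the assumed model reproduces the observed 90% use of the cheap codon when phi = ln 9 ≈ 2.20, so the posterior mode under the flat prior falls at that grid point. A gene with balanced codon usage would instead concentrate its posterior mass near phi = 0, mirroring the abstract's idea that stronger adaptation in codon usage implies a higher production rate.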
Key Words: Codon Bias, Nonsense Errors, Fitness Landscape, Synonymous Substitution, Evolution
Figure 4 and all references to figure 4 have been corrected in this version of the article.