MBE Advance Access published online on October 8, 2009
Molecular Biology and Evolution, doi:10.1093/molbev/msp232
Research Article
Estimates of the effect of natural selection on protein-coding content
1 Department of Statistics and Applied Probability, National University of Singapore, Singapore
2 John Curtin School of Medical Research, Australian National University, Australia
* Corresponding author: Dr Gavin Huttley, John Curtin School of Medical Research, Building 54, The Australian National University, Canberra ACT 0200 Australia, T: 61 2 6125 7961, F: 61 2 6125 2499, M: 0404 004 919, E-mail: gavin.huttley{at}anu.edu.au
Received for publication August 21, 2009. Revision received September 24, 2009. Accepted for publication September 24, 2009.
Analysis of natural selection is key to understanding many core biological processes, including the emergence of competition, co-operation, and complexity, and has important applications in the targeted development of vaccines. Selection is hard to observe directly but can be inferred from molecular sequence variation. For protein-coding nucleotide sequences, the ratio of nonsynonymous to synonymous substitutions (ω) distinguishes neutrally evolving sequences (ω = 1) from those subjected to purifying (ω < 1) or positive Darwinian (ω > 1) selection. We show that current models used to estimate ω are substantially biased by naturally occurring sequence compositions. We present a novel model that weights substitutions by conditional nucleotide frequencies and which escapes these artefacts. Applying it to the genomes of pathogens causing malaria, leprosy, tuberculosis and Lyme disease gave significant discrepancies in estimates with 10-30% of genes affected. Our work has substantial implications for how vaccine targets are chosen and for studying the molecular basis of adaptive evolution.
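The interpretation of the nonsynonymous-to-synonymous substitution ratio (dN/dS) described above can be illustrated with a minimal Python sketch. The function name `classify_omega`, the parameter name `omega`, and the tolerance `tol` are hypothetical and introduced only for illustration; they are not part of the authors' model or software. In practice, departure from neutrality would be assessed with a statistical test on a fitted codon model rather than a fixed numerical tolerance.

```python
def classify_omega(omega: float, tol: float = 1e-9) -> str:
    """Map a dN/dS ratio to the selection regime named in the abstract.

    `tol` is an illustrative tolerance for treating the ratio as exactly 1;
    it is an assumption of this sketch, not a statistical criterion.
    """
    if omega < 0:
        raise ValueError("a substitution-rate ratio cannot be negative")
    if abs(omega - 1.0) <= tol:
        return "neutral"        # ratio equal to 1
    if omega < 1.0:
        return "purifying"      # ratio below 1
    return "positive"           # ratio above 1


if __name__ == "__main__":
    # Hypothetical ratios, chosen only to exercise each branch.
    for value in (0.2, 1.0, 2.5):
        print(value, classify_omega(value))
```

Because the classification depends only on which side of 1 the estimate falls, a bias in the estimate of the ratio can move a gene from one regime to another, which is why the composition artefacts discussed in the abstract matter.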
Key Words: codon substitution models; maximum-likelihood; dN/dS; natural selection; molecular evolution