MBE Advance Access published online on October 28, 2009
Molecular Biology and Evolution, doi:10.1093/molbev/msp260
Research Article
Evolutionary Fingerprinting of Genes
1 Department of Medicine, University of California, San Diego, CA, USA.
2 Department of Mathematical Sciences, University of Stellenbosch, Stellenbosch, South Africa.
3 School of Medicine, University of Swansea, Swansea, Wales.
4 Department of Pathology, University of California, San Diego, CA, USA.
5 Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.
*To whom correspondence should be addressed; e-mail: spond{at}ucsd.edu.
Received for publication August 25, 2009. Accepted for publication October 20, 2009.
Over time, natural selection molds every gene into a unique mosaic of sites evolving rapidly or resisting change: an evolutionary fingerprint of the gene. Aspects of this evolutionary fingerprint, such as the site-specific ratio of nonsynonymous to synonymous substitution rates (dN/dS), are commonly used to identify genetic features of potential biological interest; however, no framework exists for comparing evolutionary fingerprints between genes. We hypothesize that protein-coding genes with similar protein structure and/or function tend to have similar evolutionary fingerprints, and that comparing evolutionary fingerprints can be useful for discovering similarities between genes in a way that is analogous to, but independent of, discovery of similarity via sequence-based comparison tools such as BLAST.
To test this hypothesis, we develop a novel model of coding sequence evolution that uses a general bivariate discrete parameterization of the evolutionary rates. We show that this approach provides a better fit to the data with fewer parameters than existing models. Next, we use the model to represent evolutionary fingerprints as probability distributions and present a methodology for comparing these distributions in a way that is robust against variations in data set size and divergence. Finally, using sequences of three rapidly evolving RNA viruses (HIV-1, Hepatitis C virus, and Influenza A virus), we demonstrate that genes within the same functional group tend to have similar evolutionary fingerprints. Our framework provides a sound statistical foundation for efficient inference and comparison of evolutionary rate patterns in arbitrary collections of gene alignments, for clustering homologous and non-homologous genes, and for investigating biological and functional correlates of evolutionary rates.
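To make the idea of a fingerprint as a probability distribution concrete, the following is a minimal sketch, not the model or the distance measure used in this article. It assumes a fingerprint can be summarized as a handful of rate classes, each a hypothetical (synonymous rate, nonsynonymous rate, weight) triple, and it compares two fingerprints with a generic Gaussian-kernel distance on the log-rate scale, swapped in purely for illustration. All function names, the bandwidth value, and the example numbers are assumptions.

```python
import math

# A fingerprint is assumed here to be a list of rate classes:
# (synonymous rate, nonsynonymous rate, weight). Rates must be positive.

def normalize(classes):
    """Rescale the class weights so that they sum to 1."""
    total = sum(w for _, _, w in classes)
    return [(s, n, w / total) for s, n, w in classes]

def mean_omega(classes):
    """Weighted mean of the per-class dN/dS ratio."""
    return sum(w * n / s for s, n, w in classes)

def kernel_overlap(f, g, bandwidth=0.25):
    """Gaussian-kernel inner product of two discrete distributions,
    with distances measured between log-transformed rates."""
    total = 0.0
    for s1, n1, w1 in f:
        for s2, n2, w2 in g:
            d2 = (math.log(s1) - math.log(s2)) ** 2 \
               + (math.log(n1) - math.log(n2)) ** 2
            total += w1 * w2 * math.exp(-d2 / (2.0 * bandwidth ** 2))
    return total

def fingerprint_distance(f, g, bandwidth=0.25):
    """Illustrative kernel distance (not the article's measure):
    sqrt(k(f,f) + k(g,g) - 2 k(f,g)), which is 0 for identical inputs."""
    sq = (kernel_overlap(f, f, bandwidth)
          + kernel_overlap(g, g, bandwidth)
          - 2.0 * kernel_overlap(f, g, bandwidth))
    return math.sqrt(max(sq, 0.0))

# Made-up example fingerprints for two hypothetical genes.
gene_a = normalize([(0.5, 0.1, 0.6), (1.0, 1.0, 0.3), (1.5, 3.0, 0.1)])
gene_b = normalize([(0.6, 0.1, 0.5), (1.0, 0.8, 0.4), (1.4, 2.5, 0.1)])
gene_c = normalize([(1.0, 2.0, 0.7), (2.0, 6.0, 0.3)])

if __name__ == "__main__":
    print("mean dN/dS of gene_a:", mean_omega(gene_a))
    print("distance a-b:", fingerprint_distance(gene_a, gene_b))
    print("distance a-c:", fingerprint_distance(gene_a, gene_c))
```

Because the comparison operates on the weights and rate values of the distribution rather than on the aligned sequences, two genes with no sequence similarity can still be compared. A matrix of such pairwise distances could then be passed to any standard clustering routine.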