MBE Advance Access originally published online on May 10, 2006
Molecular Biology and Evolution 2006 23(7):1450-1454; doi:10.1093/molbev/msl012
Research Article
Cytosine Usage Modulates the Correlation between CDS Length and CG Content in Prokaryotic Genomes



Department of Biology, University of Ottawa, Ottawa, Ontario, Canada;
Center for Advanced Research in Environmental Genomics, University of Ottawa, Ottawa, Ontario, Canada;
Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia; and
Biology Department, Concordia University, Montreal, Quebec, Canada
E-mail: xxia{at}uottawa.ca.
Previous studies have argued that, given the AT-rich nature of stop codons, the length and CG% of coding sequences (CDSs) should be positively correlated. This prediction is generally supported empirically by prokaryotic genomes. However, the correlation is weak for a number of species, with 4 species showing a negative correlation. Here we formulate a more general hypothesis incorporating selection against cytosine (C) usage to explain the lack of a strong positive correlation between the length and CG% of CDSs. Two factors contribute to the selection against C usage in long CDSs. First, C is the least abundant nucleotide in the cell, and a long CDS should have fewer Cs to increase transcription efficiency. Second, C is prone to mutation to U/T, and selection for increased reliability should reduce C usage in long CDSs. Empirical data from prokaryotic genomes lend strong support to this new hypothesis.
Key Words: selection against cytosine; differential nucleotide availability; genomic evolution; CDS length; CG content
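
The two quantities contrasted in the abstract, the correlation of CDS length with CG% and with C usage, can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the use of Pearson's correlation coefficient and the helper names `base_fraction`, `pearson`, and `length_composition_correlations` are assumptions made here for illustration only.

```python
from math import sqrt


def base_fraction(seq, bases):
    """Fraction of positions in seq occupied by any nucleotide in bases."""
    seq = seq.upper()
    return sum(seq.count(b) for b in bases) / len(seq)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists.

    Assumes both lists have nonzero variance; otherwise the
    denominator is zero and a ZeroDivisionError is raised.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def length_composition_correlations(cds_list):
    """Return (r_length_vs_CG, r_length_vs_C) for a list of CDS strings.

    Hypothetical helper: the source does not specify how its
    correlations were computed.
    """
    lengths = [len(s) for s in cds_list]
    cg = [base_fraction(s, "CG") for s in cds_list]
    c_only = [base_fraction(s, "C") for s in cds_list]
    return pearson(lengths, cg), pearson(lengths, c_only)
```

Applied to all CDSs of one genome, a positive first value would be in line with the stop-codon argument, while a negative second value would be in line with selection against C usage in long CDSs. Comparing the two per genome is one way to see how C usage could weaken the overall length and CG% correlation.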