MBE Advance Access originally published online on May 5, 2006
Molecular Biology and Evolution 2006 23(7):1345-1347; doi:10.1093/molbev/msl009
© The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Letter

Accounting for Background Nucleotide Composition When Measuring Codon Usage Bias: Brilliant Idea, Difficult in Practice

Anders Fuglsang*,†

* Danish University of Pharmaceutical Sciences, 2 Universitetsparken, Copenhagen Ø, Denmark; and † Norwegian Medicines Agency, Sven Oftedals Vei 6-8, Oslo, Norway

E-mail: anfu@dfuni.dk.


    Abstract
 
The effective number of codons used in a gene is a commonly used measure of codon usage. It varies between 20 and 61 (standard genetic code) and indicates the degree to which the entire genetic code is used. A drawback of this method is that it does not take background composition into account. This led Novembre to introduce a variant called Nc' (Novembre JA. 2002. Accounting for background nucleotide composition when measuring codon usage bias. Mol Biol Evol 19:1390–4). In this letter, its properties are examined closely, with special emphasis on phenomena relating to codon homozygosity. A theoretical misunderstanding regarding this estimator is explained in detail, notably that Nc' varies between 0 and 61 instead of 20 and 61 (with the standard genetic code). Practical examples from the genome of Pseudomonas aeruginosa are given, which demonstrate that the problem is not just theoretical.

Key Words: codon bias • Pseudomonas aeruginosa • genetic code

Frank Wright's widely used unidimensional estimator of synonymous codon usage bias, Nc, and its descendants (Banerjee et al. 2005, Fuglsang 2006) suffer from one potential problem: their calculation does not take background composition into consideration. Novembre (2002) suggested a variant called Nc', in which Wright's original thinking is combined with the contrast of observed versus expected codon frequencies, the latter being determined by the background composition. The concept is very interesting because it emphasizes that codon usage bias is a relative phenomenon. To exemplify this in a simple manner, consider glycine codon usage bias in a gene where we expect the codons GGA, GGC, GGG, and GGT to occur in the relative fractions 0.14, 0.36, 0.36, and 0.14, respectively. If we observe 6 of each, then the resulting bias (rounded to one decimal) is 3.6. If we observe the codons at counts GGA = 2, GGC = 10, GGG = 12, and GGT = 0, then the result is once again (rounded to one decimal) 3.6. So the same level of bias measured by Novembre's method can be a consequence of either a more uniform distribution of codons than expected or a more skewed distribution of codons than expected. There is no problem with this; rather, it is an elegant proposal that forces researchers to rethink the whole concept of codon usage bias.
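The glycine figures can be checked numerically. The sketch below assumes that the homozygosity in Novembre's eq. 5 has the form F' = (χ² + n − k) / (k(n − 1)), with χ² = Σ n(p_i − e_i)²/e_i, where n is the codon count for the amino acid, k its degeneracy, and p_i and e_i the observed and expected codon fractions. This form is not spelled out in this letter, and the function name is illustrative only, but it reproduces the value 3.6 for both sets of counts.

```python
def effective_codons(counts, expected):
    """Effective number of codons for one amino acid, 1/F'.

    Assumes F' = (chi2 + n - k) / (k * (n - 1)), taken here to be the
    form of Novembre's eq. 5 (an assumption, not quoted in this letter).
    """
    n = sum(counts)          # number of codons for the amino acid
    k = len(counts)          # degeneracy of the amino acid
    chi2 = sum(n * (c / n - e) ** 2 / e for c, e in zip(counts, expected))
    f_prime = (chi2 + n - k) / (k * (n - 1))
    return 1.0 / f_prime


# Expected fractions for GGA, GGC, GGG, GGT as given in the text.
expected = [0.14, 0.36, 0.36, 0.14]

uniform = effective_codons([6, 6, 6, 6], expected)    # more uniform than expected
skewed = effective_codons([2, 10, 12, 0], expected)   # more skewed than expected

print(round(uniform, 1), round(skewed, 1))  # 3.6 3.6
```

Both count vectors give the same rounded value although they deviate from the expectation in opposite directions.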

Novembre's approach, however, introduces a problem in the methodology that Wright's original method does not have: Novembre's way of estimating codon homozygosity is based on a $\chi^2$-statistic (see eq. 5 in Novembre's paper). Because $\chi^2 \in [0; \infty[$, it follows that the estimated homozygosity $\hat{F}'$ is not bounded above by 1, and consequently, Novembre's estimator may give values $\hat{N}_c' < 20$ instead of $\hat{N}_c' \in [20; 61]$. In practice, $\hat{F}' > 1$ does occur; see later. Overall, we can thus obtain the values $\hat{N}_c' \in\ ]0; 61]$ (standard genetic code; the upper limit is 62 in the case of the mitochondrial and mycoplasma genetic codes) instead of $\hat{N}_c' \in [20; 61]$. Therefore, the basic property of Wright's Nc (its range) cannot readily be extrapolated to Novembre's Nc'. This fact was not given attention in the original work by Novembre, and consequently, some readers of Novembre's paper have apparently been under the impression that the range of Nc' is similar to that of Wright's Nc (Rocha 2004, Dean and Ballard 2005, Qin et al. 2004).
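A small numerical sketch shows how a χ²-based homozygosity can exceed 1. It assumes the same form of Novembre's eq. 5 as the one that reproduces the glycine example, F' = (χ² + n − k) / (k(n − 1)). The 2-fold degenerate amino acid, its expected fractions, and its counts are hypothetical.

```python
def homozygosity(counts, expected):
    """Chi-square-based codon homozygosity F' for one amino acid.

    Assumed form: F' = (chi2 + n - k) / (k * (n - 1)).
    """
    n = sum(counts)
    k = len(counts)
    chi2 = sum(n * (c / n - e) ** 2 / e for c, e in zip(counts, expected))
    return (chi2 + n - k) / (k * (n - 1))


# Hypothetical 2-fold amino acid with strongly skewed expected fractions.
expected = [0.9, 0.1]

f_rare_only = homozygosity([0, 20], expected)   # only the unexpected codon is used
f_even = homozygosity([10, 10], expected)       # even usage against a skewed expectation

print(f"{f_rare_only:.2f} {1 / f_rare_only:.2f}")  # 5.21 0.19
print(f_even > 1.0)                                # True
```

In both hypothetical cases the effective number of codons for the amino acid, 1/F', falls below 1, which is what pulls the summed estimate below 20.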


    A Practical Example
 
The matter described above is not just a theoretical phenomenon.

In the following, the genes from the fully sequenced Pseudomonas aeruginosa PAO1 (GenBank accession number NC_002516) have been analyzed. All annotated genes were extracted from the flatfile and analyzed if they had correct start and stop codons, no internal stop codons, and no undetermined nucleotides.

All cases where an individual $\hat{F}'$ (eq. 7) was above 1.0, corresponding to a $1/\hat{F}'$ below 1.0, were counted. One has to decide what criteria are appropriate for the expected nucleotide fractions. Ideally, these should somehow reflect the mutational pressure. The composition at the third letter of synonymous codons (in the following called "3s composition") has often been used for that purpose (this is the principle of Wright's [1990] Nc-plot). The analysis was carried out in 2 variations on this theme. In the first analysis, each gene had its own 3s composition used for calculation of the expected nucleotide fractions, and in the second analysis, the 3s composition from all genes pooled was applied to each individual gene in order to calculate the nucleotide fractions. In practice, the choice of method should depend on whether we are interested in the mutational forces acting locally or genome-wide.
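The difference between the two variations can be sketched as follows. The letter does not give the formula used to turn a 3s composition into expected codon fractions, so the sketch makes a simplifying assumption: within a synonymous family, each codon is expected in proportion to the 3s frequency of its third nucleotide. The function name and both compositions are hypothetical.

```python
def expected_fractions(codons, comp3s):
    """Expected fraction of each synonymous codon.

    Assumption: a codon's expectation is proportional to the 3s
    frequency of its third nucleotide, normalized within the family.
    """
    weights = [comp3s[codon[2]] for codon in codons]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical compositions at third positions of synonymous codons.
pooled_3s = {"A": 0.14, "C": 0.36, "G": 0.36, "T": 0.14}  # all genes pooled
local_3s = {"A": 0.05, "C": 0.45, "G": 0.45, "T": 0.05}   # one gene on its own

lysine = ["AAA", "AAG"]
print(expected_fractions(lysine, pooled_3s))  # genome-wide expectation
print(expected_fractions(lysine, local_3s))   # gene-specific expectation
```

The same observed counts are then contrasted with different expectations, so a gene whose own composition departs from the pooled one will look more biased under the pooled variant.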

Figure 1a shows a histogram of the number of times $1/\hat{F}'$ apparently is below 1.0 when genes have their own 3s composition used for calculation of the expected nucleotide fractions. There are more than 1900 such cases; that is, the phenomenon is common. It can be seen that it happens most often with 2-fold degenerate amino acids. Figure 1b shows a similar plot, but where the overall 3s composition has been used to calculate expected nucleotide fractions in each gene. Here, the total is over 3000. Figure 1c illustrates that Nc' depends somewhat on whether we apply local 3s composition or overall (average) 3s composition in the estimator. The local approach does not result in estimates below 20.0, but 22 such cases are found when the overall (average) approach is used. One example is the wzz gene, for which the Nc' is 16.6. A codon usage table for this gene, including expected nucleotide fractions, observed and expected codon counts, and n-, k-, and $\chi^2$-values, is available as supplementary material online to this letter. It can be seen that, with 348 codons in the gene, the problem is not caused by low codon counts but stems from the mere fact that for some of the 2-fold degenerate amino acids, and for isoleucine, the codon counts deviate quite a bit from the expected values.


FIG. 1. (a) Number of times $\hat{F}'$ exceeds 1 (corresponding to $1/\hat{F}'$ below 1) for genes in Pseudomonas aeruginosa. Each gene's own composition at third letters of synonymous codons was used to derive the expected codon counts (see text). (b) Same as (a), but here the overall genomic composition at the third letter of synonymous codons was applied to each individual gene in order to calculate expected codon counts (see text). (c) Scatterplot of the Nc'-values calculated using either each gene's own 3s composition (vertical axis) or the average 3s composition across the entire genome (horizontal axis) to derive the expected nucleotide fractions.

 
Would it then be desirable to change the method so that all $\hat{F}'$ values that exceed 1 are automatically corrected to 1? I think this would be sensible from the point of view that it would make the algorithm return values in the desired interval (20–61, standard genetic code). On the other hand, it would result in a loss of valuable information, and it is in principle not scientifically intuitive to make these adjustments because they are necessary only when observed codon counts are too different from the expected counts. After all, the purpose of the estimator was to take this difference into account. For that reason, I tend to believe that this kind of adjustment is not acceptable.

Another, less severe methodological problem is the numerator in Novembre's equation (5), which may become negative when the observed and expected counts are closely similar and when the number of codons for the given amino acid is less than the degeneracy. The work-around suggested by Novembre (2002) is to skip the calculation when there are fewer than 5 (later corrected to 6) codons for the amino acid. On the positive side, we can then be absolutely sure that the homozygosity turns out above 0, and random phenomena in the counts will have less effect. But on the negative side, we have potentially discarded some codon bias information. For simplicity, suppose that we have 2 hypothetical genes in which there are 4 glycine codons (in addition to codons encoding other amino acids). In gene 1, these glycine codons consist exclusively of GGA, whereas in gene 2, there is one each of GGA, GGC, GGG, and GGT. For both genes, the calculation of homozygosity would be skipped and $\hat{F}'$ for glycine would be set equal to the average codon homozygosity of the other 4-fold degenerate amino acids. If, furthermore, the genes differ only in their glycine codons, then the 2 genes would turn out to have the exact same codon usage bias "unconditionally." The term unconditionally is used here because these 2 situations might signify equal bias (depending on the expected codon fractions, as explained in the glycine example given earlier), but they might also be very different.
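The skipping rule and the negative numerator can be sketched in a few lines. The code assumes that eq. 5 has the form F' = (χ² + n − k) / (k(n − 1)). The function name, the `min_codons` parameter, and the expected fractions are hypothetical choices for illustration; the threshold of 5 follows the text.

```python
def homozygosity_or_skip(counts, expected, min_codons=5):
    """Return F' for one amino acid, or None when the calculation is skipped.

    Assumed form: F' = (chi2 + n - k) / (k * (n - 1)). A skipped amino
    acid would later receive the average F' of its degeneracy class.
    """
    n = sum(counts)
    k = len(counts)
    if n < max(min_codons, 2):   # too few codons (n - 1 must also be positive)
        return None
    chi2 = sum(n * (c / n - e) ** 2 / e for c, e in zip(counts, expected))
    return (chi2 + n - k) / (k * (n - 1))


uniform = [0.25, 0.25, 0.25, 0.25]   # hypothetical expectation for GGA, GGC, GGG, GGT
gene1 = [4, 0, 0, 0]                 # four GGA codons
gene2 = [1, 1, 1, 1]                 # one of each glycine codon

# Both genes are skipped, so their very different glycine usage is ignored.
print(homozygosity_or_skip(gene1, uniform), homozygosity_or_skip(gene2, uniform))

# Without the threshold, fewer codons than the degeneracy can give F' < 0.
skewed = [0.4, 0.4, 0.1, 0.1]        # hypothetical expected fractions
print(homozygosity_or_skip([1, 1, 0, 0], skewed, min_codons=2) < 0)  # True
```

Returning `None` rather than a number keeps the skipped case explicit, so the caller has to decide how to substitute the class average.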


    Conclusion
 
Novembre's method for calculating codon bias clearly does measure some degree of codon bias, but its range is by nature different from that of Wright's Nc. This probably does not affect the conclusions in the papers by Rocha (2004), Dean and Ballard (2005), and Qin et al. (2004) because the conclusions drawn in these works rely on correlations.

In cases where the absolute value of the estimator is crucial and where it is desirable that the estimator has the natural range of [20; 61], it might be satisfactory to apply the principles underlying Wright's (1990) Nc-plot instead and somehow use the vertical distance between an observed point and its theoretical value as an indicator of the codon bias that cannot be explained by mutation alone. One argument against this is that the interpretation of a vertical distance of, say, 3.0 codons may depend on the corresponding location on the horizontal axis (the Nc-plot is bell shaped; the highest possible vertical difference has a global maximum). There is therefore a need for further work in this field.
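The vertical-distance idea can be sketched as follows. The code assumes the usual form of the expected curve in Wright's Nc-plot, Nc = 2 + s + 29 / (s² + (1 − s)²), where s is the G+C content at synonymous third positions. This formula is not given in this letter, and the function names and the observed value are illustrative.

```python
def expected_nc(gc3s):
    """Expected Nc under mutation alone, using the assumed Nc-plot curve."""
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def vertical_distance(observed_nc, gc3s):
    """Distance from an observed Nc down from the expected curve."""
    return expected_nc(gc3s) - observed_nc


print(expected_nc(0.5))              # 60.5
print(vertical_distance(57.5, 0.5))  # 3.0

# The largest possible distance (observed Nc = 20) depends on the position
# along the horizontal axis, so a distance of 3.0 is not comparable everywhere.
for s in (0.1, 0.5, 0.9):
    print(s, expected_nc(s) - 20.0)
```

The loop illustrates the objection raised above: the room available below the curve is greatest near the middle of the plot and shrinks toward either end.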


    Supplementary Material
 
Supplementary table is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Footnotes
 
Kenneth Wolfe, Associate Editor


    References
 

    Banerjee T, Gupta SK, Ghosh TC. 2005. Towards a resolution on the inherent methodological weakness of the "effective number of codons used by a gene." Biochem Biophys Res Commun 330:1015–8.

    Dean MD, Ballard JW. 2005. High divergence among Drosophila simulans mitochondrial haplogroups arose in midst of long term purifying selection. Mol Phylogenet Evol 36:328–37.

    Fuglsang A. 2006. Estimating the "effective number of codons": the Wright way of determining codon homozygosity leads to superior estimates. Genetics 172:1301–7.

    Novembre JA. 2002. Accounting for background nucleotide composition when measuring codon usage bias. Mol Biol Evol 19:1390–4.

    Qin H, Wu WB, Comeron JM, Kreitman M, Li W-H. 2004. Intragenic spatial patterns of codon usage bias in prokaryotic and eukaryotic genomes. Genetics 168:2245–60.

    Rocha EP. 2004. Codon usage bias from tRNA's point of view: redundancy, specialization, and efficient decoding for translation optimization. Genome Res 14:2279–86.

    Wright F. 1990. The ‘effective number of codons’ used in a gene. Gene 87:23–9.

Accepted for publication April 19, 2006.

