

MBE Advance Access originally published online on August 6, 2008
Molecular Biology and Evolution 2008 25(11):2279-2291; doi:10.1093/molbev/msn173

© The Author 2008. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Coevolution of Codon Usage and tRNA Genes Leads to Alternative Stable States of Biased Codon Usage

Paul G. Higgs and Wenqi Ran

Department of Physics and Astronomy, McMaster University, Hamilton, Ontario, Canada

E-mail: higgsp@mcmaster.ca.


    Abstract
 TOP
 Abstract
 Introduction
 Estimating the Strength of...
 Translational Kinetics and...
 Coevolution with Fixed Total...
 Coevolution with Variable Total...
 Comparison of Theory with...
 Discussion
 Acknowledgements
 References
 
The typical number of tRNA genes in bacterial genomes is around 50, but this number varies from under 30 to over 120. We argue that tRNA gene copy numbers evolve in response to translational selection. In rapidly multiplying organisms, the time spent in translation is a limiting factor in cell division; hence, it pays to duplicate tRNA genes, thereby increasing the concentration of tRNA molecules in the cell and speeding up translation. In slowly multiplying organisms, translation time is not a limiting factor, so the overall translational cost is minimized by reducing the tRNAs to only one copy of each required gene. Translational selection also causes a preference for codons that are most rapidly translated by the current tRNAs; hence, codon usage and tRNA gene content will coevolve to a state where each is adapted to the other. We show that there is often more than one stable coevolved state. This explains why different combinations of tRNAs and codon bias can exist for different amino acids in the same organism. We analyze a set of 80 complete bacterial genomes and show that the theory predicts many of the trends that are seen in these data.

Key Words: codon usage • translational efficiency • tRNA genes • translational kinetics


    Introduction
 
It has long been known that synonymous codons in many species are not used with equal frequency. One of the major causes of this is selection for translational efficiency, that is, an organism prefers to use codons that are more rapidly translated in order to reduce the time and effort spent on translation (Sharp et al. 1988, 2005; Akashi 2003; Dos Reis et al. 2003). In many organisms, the preferred codons are the ones that match the tRNAs that are most abundant in the cell (Ikemura 1981, 1985; Percudani et al. 1997; Duret 2000) because there is more rapid translation of codons whose matching tRNA concentration is higher. In cases where there is only one type of tRNA for an amino acid, the preferred codon is usually the one that is more rapidly translated by that tRNA. A signal of the presence of selection for translational efficiency is that relative codon frequencies in high- and low-expression genes in the same genome are different. Measures such as the codon adaptation index (Sharp and Li 1987) and the effective number of codons (Wright 1990) usually indicate that codon usage is more biased in the highly expressed genes. It is usually supposed that frequencies of codons in low-expression genes are principally determined by mutation rates, whereas those in high-expression genes are influenced by selection as well as mutation. By comparing the two, it is possible to estimate the strength of selection acting on the high-expression genes (Sharp et al. 2005).

Previous population genetics theories have calculated the way codon usage should depend on selection, mutation, and drift (SMD; Li 1987; Shields 1990; Bulmer 1991). We will extend and improve these theories by taking more careful account of the way that the selective advantage of one codon over another should depend on the set of tRNA genes in the genome. This is a problem of coevolution. For a given set of tRNA genes, codon usage will evolve to an equilibrium in which there is a bias toward codons that are rapidly translated by the current set of tRNAs. However, tRNA copy numbers can change due to gene deletions and duplications or anticodon mutations, as we show below. Therefore, copy numbers can evolve to match the codon usage. In general, we expect to find genomes in a stable coevolved state in which neither codon usage nor tRNA gene content can change without decreasing the efficiency of translation. The theory that we give below predicts which combinations of codon usage and tRNA genes will be stable. The theory will be tested by comparison with observations in a large set of bacterial genomes.

The basis of the theory is that within a family of synonymous codons, the more rapidly translated codon will be preferred. We suppose that the selective advantage of one codon over another is proportional to the difference in the mean times taken to translate the two codons. We use a simple model of translation kinetics that considers the rate at which a given tRNA pairs with a given codon during translation of an mRNA by a ribosome. It is important to realize that, due to wobble pairing between codon and anticodon, many tRNAs can pair with more than one codon and that there is often more than one tRNA that can pair with a given codon. The wobble position of a tRNA is the first position of the anticodon and it pairs with the third position of the codon. The simplest case is that of amino acids with two-codon families ending in U and C. In most genomes, there is a single type of tRNA with a G base at the wobble position that pairs with both C and U. In a large number of bacterial genomes, the C codon is selectively favored (Sharp et al. 2005), presumably because the wobble-G pairs more rapidly with the C codon than the U. In Escherichia coli, this has been tested experimentally by Curran and Yarus (1989), who showed that the C codons are indeed translated more rapidly for amino acids with U + C codon families.

The next simplest case is codon families ending in A and G. Here, the types of tRNAs differ for different A + G families in the same genome and for the same codon family in different genomes. In many cases, there is a single type of tRNA with a U base at the wobble position that can pair with both A and G. In such cases, the A codon is usually preferred, presumably because the wobble-U pairs more rapidly with the A than the G. An example in this category is the Glu codon family in E. coli. Sorensen and Pedersen (1991) tested this case and found that the A codon was translated roughly three times faster than the G codon. However, in other A + G families, there can be a second type of tRNA with a wobble-C. It is thought that the C pairs only with the G codon. When both types of tRNA are present, translation of the G codon is more rapid because both tRNAs pair with the G codon but only the wobble-U tRNA pairs with the A codon. Selection therefore prefers the G codon. The Gln codons in E. coli are a particular example in this category. This case was also tested by Curran and Yarus (1989), who found that the G codon is indeed translated more rapidly.

The examples above confirm the basic idea that more rapidly translated codons are selectively favored. A complicating factor is that there are many tRNAs in which the base at the wobble position is modified to a nonstandard base. For simplicity, in what follows, we will refer to the unmodified base at the wobble position because the nature of the modification in different organisms may be different and because it is not directly apparent from the genome sequence whether a wobble base is modified. Base modification potentially affects the relative rates of pairing of an anticodon with alternative codons. In some cases, modification may prevent pairing with undesired codons. For example, in A + G families, the wobble-U may be modified so that it only pairs with A and G and not U and C. In four-codon families, there is evidence that an unmodified U at the wobble position is able to pair with all four codons. This is the standard situation in animal mitochondrial genomes (Jia and Higgs 2008), where there is only one tRNA per four-codon family. It is also the case in a number of bacteria, although more usually for four-codon families in bacteria, there is a wobble-G tRNA as well as a wobble-U, and sometimes there is a wobble-C tRNA in addition to both of these. In the present paper, we will consider only U + C and A + G families, but we hope to extend the theory to four-codon families in the future.

The smallest bacterial genomes have under 30 tRNA genes, all of which are distinct, whereas larger bacterial genomes can have over 120 tRNA genes, many of which are duplicate copies. Cell division times in bacteria differ from minutes to days. Rocha (2004) has shown that more rapidly multiplying bacteria tend to have larger numbers of tRNA gene copies. The likely explanation is that increasing the number of gene copies leads to increased tRNA concentration in the cell, which allows more rapid translation. In species where concentrations have been measured, they do increase roughly in proportion to the number of genes (Kanaya et al. 1999). Rapid translation is a significant advantage in rapidly multiplying organisms for which the time spent in protein synthesis is a significant limiting factor on the cell division time. In this paper, we incorporate this qualitative observation into a quantitative theory. We show that in organisms for which the time cost of translation is significant (i.e., rapidly multiplying organisms), it is beneficial to duplicate tRNA genes. Selection on codon usage will also be stronger in these same organisms. Thus, more strongly biased codon usage should occur in genomes with larger numbers of tRNAs.

The fact that tRNA copy numbers can vary by duplication and deletion is evident from the observation that different organisms have different total copy numbers. However, anticodon mutations are also important in tRNA evolution because they can potentially turn one type of tRNA into another. A good example of this is with tRNA Leu genes in mitochondrial genomes (Higgs et al. 2003), where several cases are known of interchange between genes with UAG and UAA anticodons. Lavrov and Lang (2005) have also found cases of anticodon mutations that convert a mitochondrial tRNA into a gene for a different amino acid. Anticodon mutations are also linked to several changes in the mitochondrial genetic code (Sengupta et al. 2007). The reassignment of the UGA stop codon to Trp occurs via a mutation in the anticodon of the Trp tRNA from CCA to UCA. The reassignment of AGY to Gly in urochordates occurs by duplication of a standard tRNA Gly with UCC anticodon, followed by mutation of one of the anticodons to UCU. Anticodon mutations have also been reported in bacterial tRNAs (Saks et al. 1998). When one compares bacterial genomes that are not too distantly related, it is often possible to find more than one distinct set of orthologous tRNA genes for the same amino acid. This suggests that anticodon substitutions are not too frequent. However, it is less clear whether this is true in more distantly related species because it becomes difficult to distinguish orthologues and paralogues in short tRNA sequences that contain a limited amount of phylogenetic information. The important point is that if a mutation occurs in an anticodon, the mutant sequence is likely to still be functional; therefore, anticodon mutations are a potential means of evolution of the tRNA gene content of a genome. A recent study compared tRNAs in several complete genomes of E. coli and related species (Withers et al. 2006). They were able to distinguish a core set of tRNAs that has been present in these genomes for over a hundred million years from additional tRNAs that had been inserted by recent horizontal transfer in E. coli O157:H7 and Shigella flexneri.

In the following section, we summarize the existing population genetics theory of codon usage and show how this can be used to estimate the strength of selection acting on different codons. In the subsequent sections, we present a new theory that shows the way that translational kinetics and translational selection should depend on the number of tRNA gene copies. We show that coevolution of tRNA genes and codon usage can lead to more than one possible stable state. This explains the observation that different amino acids in the same organism can have different preferred codons, as discussed above for Glu and Gln in E. coli. Finally, we carry out an analysis of a large number of bacterial genomes in order to test the predictions of the theory regarding the relationship between codon usage and tRNA copy number.


    Estimating the Strength of Translational Selection from Sequence Data
 
The theory of Li (1987), Shields (1990), and Bulmer (1991) predicts the way that codon usage depends on SMD. Consider a site at the third position in a codon belonging to a U + C two-codon family, such as that for Phe. Let $\theta$ be the GC content of the genome that would arise from mutation alone, that is, let the rate of mutation from U to C be $u\theta$ and the rate of the reverse mutation be $u(1 - \theta)$. In the absence of selection, the relative frequencies of U:C will be $(1 - \theta):\theta$. In the presence of selection, we will define the fitness of the U codon as $w = 1$, and let the fitness of the C codon be $w = 1 + s$, measured relative to this. The magnitude of genetic drift is controlled by the effective population size, $N_e$. The theory considers the limit where $N_e u$ is small, so that for most of the time, the population is dominated by one nucleotide or the other, but it occasionally makes transitions between the two.

In a population where almost all individuals have a U at a particular third position site, the net rate of creation of C mutations is $u\theta N_e$. The probability that any one of these mutations becomes fixed in the population is

$$P_{\mathrm{fix}} = \frac{1 - e^{-2s}}{1 - e^{-2N_e s}} \qquad (1)$$

The net rate of transition of the population from the U to the C state is

$$R_{UC} = u\theta N_e P_{\mathrm{fix}} \approx u\theta F(S) \qquad (2)$$

where

$$F(S) = \frac{S}{1 - e^{-S}} \qquad (3)$$

Here, it has been assumed that selection on synonymous sites is weak ($s \ll 1$), so that the numerator of equation (1) is approximately $2s$. In this case, the effectiveness of selection is controlled by the parameter $S = 2N_e s$ and the parameters $s$ and $N_e$ do not appear separately in the equations. The net rate of transition from the C to the U state is $R_{CU} = u(1 - \theta)F(-S)$. Let the relative frequencies of U and C in the presence of selection be $(1 - \phi(S)):\phi(S)$. At equilibrium under SMD we have

$$\frac{\phi(S)}{1 - \phi(S)} = \frac{R_{UC}}{R_{CU}} = \frac{\theta F(S)}{(1 - \theta)F(-S)} = \frac{\theta}{1 - \theta}\,e^{S} \qquad (4)$$

Note that the dependence on $S$ reduces to a very simple exponential function in the last step above, because $F(S)/F(-S) = e^{S}$. Rearranging this gives

$$\phi(S) = \frac{\theta e^{S}}{\theta e^{S} + (1 - \theta)} \qquad (5)$$

In U + C families, the C codon is almost always preferred, that is, $S > 0$, and $\phi(S) > \theta$.
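The relations above can be checked numerically. The following is a minimal Python sketch, assuming the forms $P_{\mathrm{fix}} = (1 - e^{-2s})/(1 - e^{-2N_e s})$ and $F(S) = S/(1 - e^{-S})$ implied by the surrounding definitions. The function names and the numerical inputs are illustrative choices, not taken from the article.

```python
import math

def fixation_probability(s, n_e):
    """Fixation probability of a new mutant with advantage s (assumed form of eq. 1)."""
    if s == 0.0:
        return 1.0 / n_e  # neutral limit of the ratio below
    # expm1(x) = exp(x) - 1; the two minus signs cancel in the ratio
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * n_e * s)

def F(S):
    """F(S) = S / (1 - exp(-S)), with F(0) = 1 by continuity (assumed form of eq. 3)."""
    if abs(S) < 1e-12:
        return 1.0
    return -S / math.expm1(-S)

def phi(S, theta):
    """Equilibrium frequency of the C codon under SMD (eq. 5)."""
    w = theta * math.exp(S)
    return w / (w + (1.0 - theta))

# Illustrative inputs (hypothetical values chosen only to exercise the functions)
S = 1.5
print("F(S)/F(-S) =", F(S) / F(-S), " exp(S) =", math.exp(S))
print("phi(0, 0.3) =", phi(0.0, 0.3), " phi(1.5, 0.3) =", phi(1.5, 0.3))
```

The first printed pair illustrates why the last step of equation (4) is a pure exponential, and the second shows that the codon frequency equals the mutational value when S = 0 and exceeds it when S > 0.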

We expect that translational selection is significant principally on a relatively small number of genes that are much more highly expressed than average. Genes such as ribosomal proteins and elongation factors are presumed to be highly expressed in all organisms, and codon frequencies in these genes should be indicative of those in genes in which translational selection is operating. On the other hand, the levels of expression of the majority of genes in the genome are much lower and the strength of selection on the majority of genes may be negligible in comparison to that on the small number of very highly expressed genes. Using these assumptions, Sharp et al. (2005) estimated the strength of selection acting in U + C codon families in bacterial genomes. We will use the same method. Let $n_U^{\mathrm{low}}$ and $n_C^{\mathrm{low}}$ be the number of U and C codons in a U + C codon family in the whole genome. These are labeled "low" because the majority of genes are assumed to be expressed at a low level. Let $n_U^{\mathrm{high}}$ and $n_C^{\mathrm{high}}$ be the numbers of codons in a small set of genes that are presumed to be highly expressed. If selection is negligible on the low-expression genes, we expect codon usage to depend only on the mutation parameter $\theta$:

$$\frac{n_C^{\mathrm{low}}}{n_U^{\mathrm{low}}} = \frac{\theta}{1 - \theta} \qquad (6)$$

In the high-expression genes, codon usage depends on $\theta$ and $S$ via the $\phi(S)$ function:

$$\frac{n_C^{\mathrm{high}}}{n_U^{\mathrm{high}}} = \frac{\phi(S)}{1 - \phi(S)} \qquad (7)$$

Rearranging equation (4), we have

$$S = \ln\!\left(\frac{\phi(S)\,(1 - \theta)}{\theta\,(1 - \phi(S))}\right) = \ln\!\left(\frac{n_C^{\mathrm{high}}\, n_U^{\mathrm{low}}}{n_U^{\mathrm{high}}\, n_C^{\mathrm{low}}}\right) \qquad (8)$$

Thus, both $\theta$ and $S$ can be estimated by simple counting of codons. In what follows, we also consider A + G codon families. The equations are equivalent with subscript G replacing C and subscript A replacing U. In this case, $S$ is the selective advantage of the G codon with respect to the A. $S$ is positive when G is preferred and negative when A is preferred.
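The counting procedure can be sketched in a few lines of Python. This is only an illustration of equations (6) to (8) as they follow from equation (4). The function name is hypothetical and the codon counts are invented for the example, not taken from any genome.

```python
import math

def estimate_theta_and_S(n_u_low, n_c_low, n_u_high, n_c_high):
    """Estimate theta, phi and S for one U + C codon family from codon counts.

    'low' counts come from the whole genome, 'high' counts from the
    highly expressed gene set. All four counts must be nonzero.
    """
    theta = n_c_low / (n_u_low + n_c_low)       # from eq. 6
    phi = n_c_high / (n_u_high + n_c_high)      # from eq. 7
    S = math.log((n_c_high * n_u_low) / (n_u_high * n_c_low))  # eq. 8
    return theta, phi, S

# Hypothetical counts, invented purely for illustration
theta, phi, S = estimate_theta_and_S(n_u_low=6000, n_c_low=4000,
                                     n_u_high=100, n_c_high=300)
print(f"theta = {theta:.3f}, phi = {phi:.3f}, S = {S:.3f}")
```

For an A + G family the same function applies with A counts in place of U and G counts in place of C, so a negative result indicates that the A codon is preferred.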

Tables 1 and 2 show several examples of this method. In table 1, we consider codons for Asn and Asp, two examples of U + C codon families, and in table 2, we consider codons for Gln and Glu, two examples of A + G codon families. Four bacterial species and three eukaryotes have been chosen as illustrative examples of the patterns of tRNA copy numbers that occur in the A + G families. The bacteria were chosen from among the 80 bacterial species previously analyzed by Sharp et al. (2005). The high-expression gene set includes elongation factors Tu, Ts, and G and 37 ribosomal protein genes, as described by Sharp et al. (2005). For comparison, the tables also include three eukaryotes where translational selection is thought to be important. We calculated these data using the sequences of the ribosomal proteins obtained from the ribosomal protein gene database (Nakao et al. 2004) as the high-expression gene set.



 
Table 1 Examples of Codon Usage, Estimated Selection Strength, and tRNA Gene Copy Numbers in Representative Species for Two Amino Acids with U + C Codon Families

 


 
Table 2 Examples of Codon Usage, Estimated Selection Strength, and tRNA Gene Copy Numbers in Representative Species for Two Amino Acids with A + G Codon Families

 
In table 1, $N_G$ denotes the number of tRNA gene copies for the amino acid, all of which have G at the wobble position. In table 2, $N_U$ and $N_C$ denote the number of tRNA gene copies for the amino acid that have U or C at the wobble position. In table 1, we see that $S$ is positive for Asn and Asp in each species. This agrees with our expectations because the wobble-G pairs better with C. In table 2, we see examples of both positive and negative $S$ for Gln and Glu and the direction of selection depends on the tRNA copy numbers. Cases where $N_C = 0$ (i.e., $N_U$:$N_C$ of 1:0, 2:0, and 4:0) all have negative $S$. This shows that when the only tRNA has wobble-U, it interacts more efficiently with the A codon and the A codon is preferred. In cases where $N_U > N_C > 0$ (such as 2:1 and 4:2), $S$ is still negative and A is still preferred. In cases where $N_U = N_C$ or $N_U < N_C$, $S$ is positive and G is preferred. This shows that if there are a sufficient number of wobble-C tRNAs relative to wobble-U tRNAs, the direction of selection on codon usage is reversed. This agrees with our expectation that wobble-C tRNAs interact with G codons but not A codons.

These examples are intended as motivation for the theory that follows. The bacterial species chosen are illustrative of the trends in the larger data set of 80 bacterial species that we considered and many other examples could have been chosen. We will give a statistical analysis of the full data set after presenting the theory. The three eukaryotes are included because these species are among the eukaryotes whose codon usage has been studied in most detail. The theory below is intended principally as a theory of codon usage in bacteria, but it also applies well to these three eukaryotes. For some other multicellular eukaryotes, codon usage does not seem to be dominated by translational selection in the same way. Thus, we will not discuss eukaryotes further in this paper.

An interesting feature of table 2 is that there are several species where selection goes in opposite directions for Gln and Glu, that is, where G is preferred for one amino acid and A is preferred for the other. The coevolution of codon usage and tRNA gene content has led to different stable states in the same organism. We now wish to show how this situation can arise using a simple model for translation kinetics.


    Translational Kinetics and Selection
 
When considering how to incorporate translational kinetics into a theory for codon usage, we need to account for two principal facts. First, preferred codons correspond to tRNAs that are more frequent; thus, we expect that the rate of translation of a codon should increase with the frequency of its cognate tRNAs. Second, we know that selection occurs between codons that are translated by the same tRNA; thus, translation rates must depend on the individual anticodon–codon pair. The simplest assumption that incorporates these factors is to suppose that the rate at which tRNAs of type $i$ translate codons of type $j$ can be written as $r_{ij} = C_i k_{ij}$, where $C_i$ is the tRNA concentration and $k_{ij}$ is a rate constant specific to the codon–anticodon pair. In the case that more than one tRNA translates the same codon, the rate of translation of codon $j$ is $r_j = \sum_i C_i k_{ij}$, where the sum is over all types of tRNA that translate codon $j$. This amounts to the assumption that there is a single dominant step in the translation kinetics that is codon dependent and that this step is first order in the tRNA concentration. The same assumption has been made in previous theories for codon usage (e.g., Shields 1990; Bulmer 1991; Solomovici et al. 1997). For simplicity, we will assume that tRNA concentrations are proportional to tRNA gene numbers, which is approximately true experimentally (Kanaya et al. 1999), that is, $C_i = c_0 N_i$, where $N_i$ is the number of copies of the corresponding tRNA gene in the genome and $c_0$ is the concentration arising from one gene copy. The rate constant for tRNA $i$ with codon $j$ will be written as $k_{ij} = k_0 b_{XY}$, where $k_0$ is an overall rate constant for translation, X is the base at the wobble position of the tRNA, Y is the base at the third position of the codon, and $b_{XY}$ is a constant of order 1 that measures the relative rate of translation of the XY combination.

The translation rates for the U and C codons in a two-codon U + C family are therefore

$$r_U = k_0 c_0 N_G b_{GU}, \qquad r_C = k_0 c_0 N_G b_{GC} \qquad (9)$$

The mean translation times for these codons are $t_U = 1/r_U$ and $t_C = 1/r_C$. We propose that $S = \sigma(t_U - t_C)$, where $\sigma$ is an organism-specific constant. This means that $S$ is a function of $N_G$:

$$S = \frac{K}{N_G}\left(\frac{1}{b_{GU}} - \frac{1}{b_{GC}}\right) \qquad (10)$$

where $K = \sigma/(k_0 c_0)$, which combines all the previous constants into a single parameter that quantifies the cost of translation in an organism. If $b_{GC} > b_{GU}$, $S$ will always be positive in equation (10).

In an A + G family, the rates of translation of the two codons are

$$r_A = k_0 c_0 N_U b_{UA}, \qquad r_G = k_0 c_0 \left(N_U b_{UG} + N_C b_{CG}\right) \qquad (11)$$

We have assumed that the A codon can only be translated by the U tRNA and the G codon can be translated by both tRNAs. The mean times for G and A codons are $t_G = 1/r_G$ and $t_A = 1/r_A$, and the selection coefficient is

$$S = \sigma(t_A - t_G) = K\left(\frac{1}{N_U b_{UA}} - \frac{1}{N_U b_{UG} + N_C b_{CG}}\right) \qquad (12)$$

This can be positive or negative, depending on the $b$ parameters and the numbers of tRNAs.
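The two selection coefficients can be written as small Python functions. This is a sketch under the assumption that equations (10) and (12) take the forms $S = (K/N_G)(1/b_{GU} - 1/b_{GC})$ and $S = K[1/(N_U b_{UA}) - 1/(N_U b_{UG} + N_C b_{CG})]$, which follow from $S = \sigma \times$ (difference in mean translation times) with $K = \sigma/(k_0 c_0)$. The function names and the $b$ values used in the calls are placeholders chosen for illustration.

```python
def S_uc_family(K, N_G, b_GU, b_GC):
    """Advantage of the C codon over the U codon in a U + C family (eq. 10)."""
    if N_G < 1:
        raise ValueError("at least one wobble-G tRNA gene is required")
    return (K / N_G) * (1.0 / b_GU - 1.0 / b_GC)

def S_ag_family(K, N_U, N_C, b_UA, b_UG, b_CG):
    """Advantage of the G codon over the A codon in an A + G family (eq. 12)."""
    if N_U < 1:
        raise ValueError("at least one wobble-U tRNA gene is required to read A codons")
    t_A = 1.0 / (N_U * b_UA)                # only the wobble-U tRNA reads A
    t_G = 1.0 / (N_U * b_UG + N_C * b_CG)   # both tRNAs read G
    return K * (t_A - t_G)

# Placeholder rate constants (hypothetical, for illustration only)
print(S_uc_family(K=1.0, N_G=1, b_GU=0.5, b_GC=1.0))   # positive: C preferred
print(S_uc_family(K=1.0, N_G=2, b_GU=0.5, b_GC=1.0))   # halves when N_G doubles
print(S_ag_family(K=1.0, N_U=2, N_C=0, b_UA=1.0, b_UG=0.5, b_CG=1.0))  # negative
print(S_ag_family(K=1.0, N_U=1, N_C=1, b_UA=1.0, b_UG=0.5, b_CG=1.0))  # positive
```

Keeping K as a separate argument mirrors the separation in the text between the strength of selection (K) and the factors that set its direction (the N and b parameters).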

In equations (10) and (12), we have grouped the factors that affect the strength of selection into a single parameter $K$ and separated these from the factors that influence the direction of selection (i.e., the $N$ and $b$ parameters). One of our aims below is to compare the cost of translation in different organisms. $K$ is a useful parameter because it should be a property of an organism that depends on its lifestyle. For example, in rapidly multiplying organisms, the time taken in protein synthesis should be a significant fraction of the total cell division time. Hence, there should be significant selection to speed up translation and $K$ should be large. In slowly multiplying organisms, the time for translation may not be a limiting factor. Hence, $K$ may be small. In estimating selection strength for an organism, it may be useful to average over codon families in order to reduce the statistical error from counting small numbers of codons in each family. Sharp et al. (2005) averaged all the estimated $S$ values for the U + C codon families. However, equation (10) predicts that $S$ depends on $N_G$, which is different for different amino acids. Moreover, $N_G$ can evolve in response to translational selection, so it is useful to separate the driving force, $K$, from the response. Therefore, it seems more useful to use $K$ than $S$ to compare organisms. We will also consider A + G families, which were not considered by Sharp et al., and in this case, the difference between $K$ and $S$ is even more clear. An organism with a high $K$ may have positive or negative $S$ or may even have $S$ close to zero if the $N$ and $b$ factors happen to almost cancel out in equation (12). Therefore, it would not be useful to average $S$ over amino acids in this case.


    Coevolution with Fixed Total tRNA Copy Number
 
In this section, we will consider an A + G codon family with tRNA copy numbers $N_U$ and $N_C$. We suppose that the total number of tRNAs is fixed, but that the number of copies of each type can vary. If the total number of copies is four, as is the case for Gln and Glu in E. coli, then the possible combinations of $N_U$:$N_C$ are 4:0, 3:1, 2:2, and 1:3. The combination 0:4 is forbidden because there must be at least one U tRNA to translate the A codons. The situation of fixed total copy number is discussed first because it is the simplest. We generalize to variable total number in the following section. However, the fixed total copy number case is similar to the situation considered by Bulmer (1987) and Shields (1990), who supposed that there are two tRNAs with fixed total concentration. We will discuss the differences that arise between our theory and the previous one at the end of this section.

Let $\phi^*(N_U, N_C)$ denote the frequency of the G codon in an A + G family that would arise under SMD with specified tRNA copy numbers, that is, $\phi^*(N_U, N_C) = \phi(S(N_U, N_C))$, which can be calculated from equations (5) and (12). Figure 1a shows $\phi^*$ as a function of $K$ for each possible combination of $N_U$:$N_C$. As the $b$ parameters are relative rates, we can set one of them to 1 by definition. We will use the U tRNA + A codon combination as a reference, that is, $b_{UA} = 1$. In the examples used in this paper, we will also set $b_{CG} = 1$, for simplicity. Based on strengths of RNA base pair interaction, we would expect that $b_{UG} < 1$ because UG pairs are weaker than Watson–Crick pairs. We will set $b_{UG} = 0.4$ in these examples. We will show below that these parameter values appear to be close to optimal for interpreting the codon usage in bacterial genomes. For these parameter choices, $S(4,0)$ and $S(3,1)$ are negative, but $S(2,2)$ and $S(1,3)$ are positive; therefore, $\phi^*(4,0)$ and $\phi^*(3,1)$ decrease with $K$ but $\phi^*(2,2)$ and $\phi^*(1,3)$ increase with $K$, as shown in figure 1a.
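The sign pattern described above can be reproduced with a short Python sketch. It uses the parameter values stated in the text ($b_{UA} = 1$, $b_{CG} = 1$, $b_{UG} = 0.4$) together with $\theta = 0.5$, the value the article gives for figure 1. The function names and the grid of $K$ values are illustrative assumptions, and the forms of equations (5) and (12) are those implied by the surrounding definitions.

```python
import math

B_UA, B_UG, B_CG = 1.0, 0.4, 1.0   # relative pairing rates used in the text

def S_ag(K, N_U, N_C):
    """Selection on G relative to A for copy numbers N_U:N_C (assumed form of eq. 12)."""
    return K * (1.0 / (N_U * B_UA) - 1.0 / (N_U * B_UG + N_C * B_CG))

def phi_star(K, N_U, N_C, theta=0.5):
    """Expected G-codon frequency under SMD for the given tRNA genes (eq. 5 with eq. 12)."""
    w = theta * math.exp(S_ag(K, N_U, N_C))
    return w / (w + (1.0 - theta))

combos = [(4, 0), (3, 1), (2, 2), (1, 3)]
for K in (0.0, 2.0, 5.0, 10.0):     # illustrative grid of K values
    row = ", ".join(f"{n_u}:{n_c} -> {phi_star(K, n_u, n_c):.3f}" for n_u, n_c in combos)
    print(f"K = {K:4.1f}: {row}")
```

All four curves start at $\theta$ when $K = 0$; the 4:0 and 3:1 curves then fall with increasing $K$ while the 2:2 and 1:3 curves rise.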


FIG. 1. (a) Expected frequency $\phi^*$ of the G codon in A + G families as a function of translational cost parameter $K$. Four different combinations of tRNA gene copy numbers $N_U$:$N_C$ are shown. Thick black lines indicate regions of stability. (b) Mean translation time per codon as a function of $\phi$ for each tRNA gene combination.

 
In genes where the frequencies of G and A codons are $\phi$ and $1 - \phi$, the mean time per codon is

$$\bar{t}(\phi, N_U, N_C) = \phi\, t_G + (1 - \phi)\, t_A = \frac{1}{k_0 c_0}\left(\frac{\phi}{N_U b_{UG} + N_C b_{CG}} + \frac{1 - \phi}{N_U b_{UA}}\right) \qquad (13)$$

Figure 1b shows $\bar{t}$ as a function of $\phi$ for each combination of tRNAs. We have set $k_0 c_0 = 1$ in the figure because this is simply a multiplying factor. Selection is acting to minimize $\bar{t}$. The gradients of these lines are proportional to the selection coefficients. The 4:0 and 3:1 lines slope down toward $\phi = 0$, corresponding to negative $S$, and the 2:2 and 1:3 lines slope down toward $\phi = 1$, corresponding to positive $S$. The codon frequency that would optimize translation time is always either 0 or 1, depending on the direction of the slope. However, the frequency that occurs is $\phi^*$, which is between 0 and 1 because it is also influenced by mutation and drift, not just by selection.
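Because the mean time per codon is linear in $\phi$, the lines for different tRNA combinations can be compared directly. The Python sketch below assumes the mean time is $\phi\, t_G + (1 - \phi)\, t_A$ with $k_0 c_0 = 1$ and uses the article's example values $b_{UA} = b_{CG} = 1$ and $b_{UG} = 0.4$. The function names, including `crossover`, are hypothetical helpers introduced for illustration.

```python
B_UA, B_UG, B_CG = 1.0, 0.4, 1.0   # relative pairing rates used in the text

def mean_time(phi, N_U, N_C):
    """Mean translation time per codon with k0*c0 = 1 (assumed form of eq. 13)."""
    t_A = 1.0 / (N_U * B_UA)
    t_G = 1.0 / (N_U * B_UG + N_C * B_CG)
    return phi * t_G + (1.0 - phi) * t_A

def crossover(combo_1, combo_2):
    """G-codon frequency at which two straight mean-time lines intersect."""
    a1, a2 = mean_time(0.0, *combo_1), mean_time(0.0, *combo_2)   # intercepts
    m1 = mean_time(1.0, *combo_1) - a1                            # slopes
    m2 = mean_time(1.0, *combo_2) - a2
    return (a2 - a1) / (m1 - m2)

combos = [(4, 0), (3, 1), (2, 2), (1, 3)]
for left, right in zip(combos, combos[1:]):
    print(f"{left} and {right} give equal mean time at phi = {crossover(left, right):.3f}")
```

With these parameter values the three intersections fall at roughly 0.33, 0.63, and 0.89, so each combination gives the shortest mean time over its own interval of $\phi$.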

In a real genome, the anticodons of the tRNAs can adapt to the codon usage at the same time as the codon usage adapts to the tRNAs. A straightforward mutation at the wobble position can convert a U tRNA to a C tRNA, as discussed in the introduction. Thus, an organism can jump between the lines on figure 1b. If the codon frequencies are at SMD balance with specified $N_U$ and $N_C$, the mean time per codon is $\bar{t}(\phi^*(N_U, N_C), N_U, N_C)$. If this combination of tRNAs is stable to anticodon mutation, then this time must be less than the time that would occur at the same codon frequencies with any other copy numbers $M_U$ and $M_C$ such that $M_U + M_C = N_U + N_C$:

$$\bar{t}(\phi^*(N_U, N_C), N_U, N_C) < \bar{t}(\phi^*(N_U, N_C), M_U, M_C) \qquad (14)$$

In other words, if a tRNA gene combination is stable, then the $\bar{t}$ line for this combination must be the lowest line of those on figure 1b. For each of the $N_U$:$N_C$ combinations, there is a range of $\phi$ where the corresponding time is the lowest. The boundaries between these regions are indicated by dotted vertical lines in figure 1b. The same boundaries are indicated by dotted horizontal lines in figure 1a. In order for the $N_U$:$N_C$ combination to be stable to anticodon mutations, $\phi^*(N_U, N_C)$ must lie within the corresponding boundaries. Stable regions are indicated by thick black lines in figure 1a. In these regions, the tRNAs and codons are coadapted. In the regions of the curves drawn with thin lines, the codons are adapted to the tRNAs but the tRNAs are not adapted to the codons. Hence, the configuration is not stable to anticodon mutations.

For the example illustrated in figure 1, the GC content arising from mutation is $\theta = 0.5$. Thus all curves tend to 0.5 as $K$ tends to zero. This lies in the interval where the 3:1 combination is stable. If $K$ increases, the other three combinations become stable within certain intervals of $K$. For moderate or large $K$, there is always more than one stable solution. This is the central point of this paper: In species where there is significant translational selection, alternative stable states of codon usage exist where codons and tRNAs are coadapted. It is therefore possible for the codon usage in codon families for different amino acids to be biased in different directions, even if they are subject to the same mutation and selection processes, as we saw in the examples in table 2.
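The stability criterion of equation (14) can be evaluated directly for a fixed total of four tRNA genes. The following Python sketch assumes the forms of equations (5), (12), and (13) implied by the definitions in the text, with $b_{UA} = b_{CG} = 1$, $b_{UG} = 0.4$, $\theta = 0.5$, and $k_0 c_0 = 1$. The function names and the particular $K$ values are illustrative choices, not part of the original analysis.

```python
import math

B_UA, B_UG, B_CG = 1.0, 0.4, 1.0   # relative pairing rates used in the text

def times(N_U, N_C):
    """Mean translation times (t_A, t_G) with k0*c0 = 1."""
    return 1.0 / (N_U * B_UA), 1.0 / (N_U * B_UG + N_C * B_CG)

def phi_star(K, N_U, N_C, theta):
    """SMD equilibrium frequency of the G codon for the given tRNA genes."""
    t_A, t_G = times(N_U, N_C)
    w = theta * math.exp(K * (t_A - t_G))
    return w / (w + (1.0 - theta))

def mean_time(phi, N_U, N_C):
    """Mean time per codon at G-codon frequency phi."""
    t_A, t_G = times(N_U, N_C)
    return phi * t_G + (1.0 - phi) * t_A

def stable_combos(K, total=4, theta=0.5):
    """Combinations N_U:N_C whose own codon usage makes them the fastest option."""
    combos = [(n_u, total - n_u) for n_u in range(total, 0, -1)]  # N_U >= 1 always
    stable = []
    for combo in combos:
        p = phi_star(K, *combo, theta)
        t_here = mean_time(p, *combo)
        if all(t_here < mean_time(p, *other) for other in combos if other != combo):
            stable.append(combo)
    return stable

for K in (0.0, 1.0, 5.0, 10.0):     # illustrative K values
    print(f"K = {K:4.1f}: stable combinations {stable_combos(K)}")
```

Under these assumptions only 3:1 is stable for small $K$, whereas several combinations are simultaneously stable once $K$ is moderate or large, which is the behavior described in the text.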

In this theory, we are treating mutations in anticodons in a different way from those in codons. For synonymous mutations in codons, we assume that selection is balanced by mutation and drift, so that at any synonymous site, it is possible for either an optimal or nonoptimal codon to occur with some probability. In contrast, if a favorable mutation occurs at the wobble position of the anticodon, we assume that it is always selected. This is because a mutation at a synonymous site affects the translation time of only one codon, whereas a mutation in the anticodon affects the translation time of all the codons for that amino acid. Selection on the wobble position is therefore orders of magnitude stronger than on a synonymous site. If selection is large enough to cause a significant bias at synonymous sites, then selection on the wobble position should be very large indeed, and it is reasonable to assume that the optimal state of the anticodons will always be fixed in the population. This point has also been made by Jia and Higgs (2008) with regard to the evolution of tRNAs and codon usage in mitochondrial genomes.

As we have assumed that the concentration of tRNAs is proportional to the number of gene copies and that the total number of gene copies is fixed, this case is similar to previous theories that considered concentrations of two tRNAs with a fixed total concentration, C1 + C2 = 1. However, there are important differences. The selection parameter used by Bulmer (1987) and Shields (1990) is Formula which may be compared with our equation (12). This has the problem that it becomes infinite if either C1 or C2 goes to zero. It does not capture the way anticodon–codon interactions work because it assumes that two separate tRNAs translate the two different codons, whereas in reality, wobble-G tRNAs can translate both codons in U + C families and wobble-U tRNAs can translate both codons in A + G families. In A + G families, it is possible for NC to be zero when NU is nonzero because the U tRNA translates both codons. The selection strength in equation (12) can never be infinite in this case. On the other hand, NU cannot be zero because then it would be impossible to translate A codons. Shields (1990) found that, if tRNA concentrations were treated as continuous variables, there would be a synergistic increase of tRNA concentrations and codon frequencies leading to ever-increasing biases. He thus concluded that organisms would exhibit either no codon bias or complete codon bias. We disagree with this: The bias is always finite according to our theory because the selection parameter is always finite.

Interestingly, the problem of ever-increasing biases in Shields’ theory applies only to the case where all genes are assumed to have equal expression levels. He also considered a model with separate classes of high- and low-expression genes, assuming that a small proportion of highly expressed genes accounts for half the total gene expression and assuming a lower selection strength in the low-expression genes. In our theory, we have made the more extreme assumption that the translational effort of the cell is dominated by the high-expression genes. Therefore, only the mean time per codon for the high-expression genes is relevant for the translational cost, and we only need to consider selection on one class of genes. It would be possible to modify our theory to include a reduced level of selection on low-expression genes and a parameter to control the fraction of the total translational effort of the cell that goes into high- and low-expression genes. However, we do not want to make the theory more complicated with additional parameters at this point. We emphasize that it is not necessary to consider genes with two different selection levels in order to avoid the problem of ever-increasing biases. The problem is avoided by a more realistic choice of the selection parameter in our model.


    Coevolution with Variable Total tRNA Copy Number
 
In reality, tRNA copy numbers can change by gene duplication and deletion as well as anticodon mutation. Therefore, the total tRNA copy number for an amino acid is not fixed. In organisms where translational selection is strong, it will pay to make duplicate copies of a tRNA gene so that the tRNA concentration will increase and translation will be faster. However, there is some cost to the organism for duplicating this gene. Bacteria do not often retain redundant duplicate genes because they are under selection for rapid replication and, therefore, their genome size tends to be minimized. Duplicating the tRNA therefore has a cost in terms of increased DNA replication time. It also has a cost in terms of transcription. When the gene is present, the organism will expend time and energy in making tRNA molecules by transcribing this gene. This would be disadvantageous to the organism if the extra tRNAs were not beneficial for translation. We expect the total cost of translation of codons of a given amino acid to be

Formula (15)
where fa is the frequency of codons for the amino acid in high-expression genes, Na is the number of tRNA genes for the amino acid and g is the cost per tRNA gene.

The U + C families become of greater interest when the number of tRNAs can vary. For a U + C family, the translational cost T is a function of the frequency {phi} of the C codon and the number of wobble-G tRNAs. From equations (9, 10, and 15), we have

Formula (16)
At the SMD equilibrium for a fixed NG, the codon frequency is {phi}*(NG) from equations (5 and 10). If this solution is stable against duplication or deletion of tRNAs, we must have

$$T(\phi^*(N_G), N_G) < T(\phi^*(N_G), N_G \pm 1) \qquad (17)$$

Figure 2 shows the {phi}* curves as a function of K for each value of NG. Regions that are stable according to equation (17) are indicated by thick black lines. In this example, bUA = bGC = 1, bGU = 0.4, and {theta} = 0.5. If K is small enough, T is minimized by setting NG = 1, irrespective of the other parameters. Above a certain value of K, the NG = 1 solution becomes unstable. As K increases, each successive value of NG becomes stable in a range that partially overlaps the previous value of NG, so once again, there can be more than one stable state for a given K. The ranges of {phi} covered by the stable regions of the curves are quite broad. Thus, there might be a considerable amount of variation in the degree of codon bias observed in organisms with the same number of tRNA genes. Paradoxically, because the strength of selection varies inversely with NG, the degree of codon bias actually decreases after a tRNA gene duplication. However, the stable range of {phi} shifts toward more biased codon usage as NG increases; therefore, on average, we expect to see stronger codon bias in organisms with more tRNA genes.


 
FIG. 2.— Frequency of the C codon in U + C families as a function of translational cost parameter K for varying numbers NG of tRNA copies. Thick black lines show regions of stability.

 
The regions of stability depend on the ratio g/fa, which is set to 0.3 in this example. If this ratio is lower, the regions of stability slide down the curves to lower ranges of K, although the positions of the curves themselves do not depend on g/fa. Thus, it is more favorable to add an extra tRNA if the cost per tRNA gene is lower or if the frequency of the amino acid is higher.
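
The trade-off between translation time and the cost per gene can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' calculation: it assumes that the cost in equation (15), divided by fa, takes the additive form K times the mean time per codon plus (g/fa) NG, that each codon is translated at rate NG b (k0c0 set to 1), and that {phi} is held fixed as an input instead of being recomputed from the SMD equilibrium. All function names are hypothetical.

```python
def scaled_cost_uc(phi, n_g, k, g_over_fa=0.3, b_gu=0.4, b_gc=1.0):
    """Cost per unit amino acid frequency in a U + C family (assumed form).

    phi is the frequency of the C codon and n_g the number of wobble-G
    tRNA genes, which translate both the U and the C codon.
    """
    mean_time = (1.0 - phi) / (n_g * b_gu) + phi / (n_g * b_gc)
    return k * mean_time + g_over_fa * n_g


def stable_to_copy_change(phi, n_g, k, g_over_fa=0.3):
    """True if neither one duplication nor one deletion lowers the cost."""
    here = scaled_cost_uc(phi, n_g, k, g_over_fa)
    neighbours = [n_g + 1] + ([n_g - 1] if n_g > 1 else [])
    return all(here < scaled_cost_uc(phi, m, k, g_over_fa) for m in neighbours)


if __name__ == "__main__":
    for k in (0.1, 1.0, 3.0):
        stable = [n for n in range(1, 9) if stable_to_copy_change(0.5, n, k)]
        print(k, stable)
```

Because {phi} is held fixed here, the sketch cannot reproduce the overlapping stable ranges in figure 2, which arise from the dependence of {phi}* on NG. It only shows the qualitative effect described above: a single gene copy is favored at small K, and lowering g/fa makes an extra copy favorable at a smaller K.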

We now return to A + G codon families and consider both anticodon mutations and tRNA duplications and deletions at the same time. In the same way as above, the translational cost is

Formula (18)
The stability criterion is

$$T(\phi^*(N_U,N_C), N_U, N_C) < T(\phi^*(N_U,N_C), M_U, M_C) \qquad (19)$$
and this must apply for all combinations MU:MC ≠ NU:NC.

Figure 3 shows an example with bUA = bCG = 1, bUG = 0.4, {theta} = 0.5, and g/fa = 0.3. For clarity, only the stable regions of the curves are shown. As with figure 2, the solution with only one tRNA is stable at very low K because the time-dependent term in the cost function becomes smaller than the term involving the cost per gene, so the total cost is minimized by minimizing the number of genes. Solutions with larger numbers of tRNAs become more stable as K increases. This is different from the situation in figure 1, where the 3:1 solution was stable at low K. The parameters in figures 1 and 3 are the same, with the exception of the addition of g/fa in figure 3. The 3:1 solution is stable against anticodon mutations at low K but not against tRNA deletion.
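
The contrast between the 3:1 and 1:0 solutions at low K can be checked with a small sketch. It is not taken from the paper and rests on assumptions: the cost divided by fa is taken to be K times the mean time per codon plus (g/fa)(NU + NC), the A codon is read at rate NU bUA and the G codon at rate NU bUG + NC bCG (k0c0 set to 1), and {phi} is treated as a fixed input. The function names and the cap on the total gene number are hypothetical.

```python
def scaled_cost_ag(phi, n_u, n_c, k, g_over_fa=0.3,
                   b_ua=1.0, b_ug=0.4, b_cg=1.0):
    """Cost per unit amino acid frequency in an A + G family (assumed form)."""
    mean_time = ((1.0 - phi) / (n_u * b_ua)
                 + phi / (n_u * b_ug + n_c * b_cg))
    return k * mean_time + g_over_fa * (n_u + n_c)


def cheapest_combination(phi, k, max_genes=6, g_over_fa=0.3):
    """Return the (n_u, n_c) pair with the lowest cost among all pairs
    with n_u >= 1 and n_u + n_c <= max_genes."""
    combos = [(n_u, n_c)
              for n_u in range(1, max_genes + 1)
              for n_c in range(0, max_genes - n_u + 1)]
    return min(combos,
               key=lambda c: scaled_cost_ag(phi, c[0], c[1], k, g_over_fa))


if __name__ == "__main__":
    for k in (0.1, 3.0):
        print(k, cheapest_combination(0.5, k))
```

With these assumed values, a single U tRNA gives the lowest cost at K = 0.1 even though 3:1 has the lowest mean time among four-gene combinations, which mirrors the point that 3:1 is stable against anticodon mutation but not against deletion at low K.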


 
FIG. 3.— Stable solutions for codon frequency as a function of translational cost parameter K in A + G families, where the GC content of the mutational process is {theta} = 0.5. Labels indicate tRNA gene numbers NU:NC.

 
There are some combinations (e.g., 1:2, 2:2, 3:2, and 2:3) that do not appear on figure 3 because there is no stable region of the corresponding curve for any K. The combinations that have a stable range depend on the other parameters, and in particular, they are influenced by {theta}, the GC content specified by the mutation rates. In figure 4, {theta} = 0.7, rather than 0.5 as in figure 3. In this case, 1:2, 2:2, 3:2, and 2:3 all have a stable region, but there is no stable region for 2:0, 3:0, and 4:0.


 
FIG. 4.— As figure 3 except that {theta} = 0.7.

 
As {theta} has an important influence on the stability of the different solutions, it is interesting to consider the way codon frequencies and tRNA combinations are likely to vary with {theta} when K is fixed. Figure 5 shows two examples with K = 0.5 and 3.0. The other parameters are as before. For K = 0.5, only low tRNA number solutions are stable (1:0, 2:0, and 1:1). Each of these is stable within an interval of {theta}. As selection is weak in this example, {phi} does not differ much from {theta} on any of these curves (all the curves lie close to the diagonal). For K = 3.0, solutions with larger tRNA numbers are stable (3:0, 4:0, 3:1, 2:2, and 1:3). There is significant codon bias in this example: {phi} can be considerably higher or lower than {theta}, especially for the more uneven tRNA combinations.


 
FIG. 5.— Variation of codon usage {phi}* with the GC content of the mutation process {theta} shown for two different values of translational cost parameter K. Stable solutions with different tRNA gene combinations are shown.

 
Figure 5 gives an idea of what might be expected to happen if there is a gradual change in the GC content of the genome of an organism with time due to changing of the relative mutation rates between the different bases. A GC rich organism with {theta} = 0.9 might initially lie on the 1:3 curve. If {theta} is gradually reduced, the GC content in the majority of genes in the genome will follow this, but the GC content in the high-expression genes will follow the 1:3 curve and will therefore remain very high until this curve becomes unstable at {theta} close to 0.5. At this point, there will be a shift in the tRNA content and a sudden change in the codon usage in the highly expressed genes. A similar behavior would also occur according to the theory of Shields (1990).


    Comparison of Theory with Bacterial Codon Usage Data
 
We now return to the analysis of the 80 bacterial genomes. Sharp et al. (2005) calculated an average of S in U + C amino acids. As discussed above, the cost of translation is determined by K. Although the strength of selection S is proportional to K, it is better to average K over amino acids than to average S because S is affected by the number of gene copies, which is variable. For a U + C amino acid a, the estimated selection strength is Sa (determined from eq. 8), and the estimated cost parameter is Formula where NG is known from the genome. The factor involving the b parameters is simply a multiplying factor that we assume is the same for each amino acid. For consistency with the examples above, we have used bUA = bGC = 1 and bGU = 0.4, but the relative values of K for each organism do not depend on this choice of b parameters. Thus, as a measure of the cost of translation in an organism, we will use

Formula (20)
where fa is the frequency of amino acid a in the high-expression protein sequences. Sharp et al. used the four amino acids Phe, Ile, Tyr, and Asn in their average. We will also include His and Asp, making six U + C amino acids in total because these two behave the same way as the other four. We do not include Cys and Ser(AGY) because there are usually fewer codons for these amino acids and the statistics are less reliable.

Rocha (2004) estimated the minimum doubling times for different bacteria. He showed that the number of tRNA genes in the genome decreases with doubling time and increases with codon bias (estimated from comparing effective numbers of codons in high- and low-expression genes). Our theory explains why this occurs. Our estimate of K is a measure of the time-dependent cost of translation and hence of the benefit to be gained by optimizing translation. Figure 6a shows the relationship between doubling time and the estimated K for 77 of the 80 species considered here. Three species where no doubling time estimate was available were excluded (Aquifex aeolicus, Mycoplasma penetrans, and Wigglesworthia glossinidia). There is a strong negative correlation between doubling time and the estimated K because translational selection is strongest in fast growing organisms that need to synthesize proteins very rapidly. Figure 6b shows that the total number of tRNA genes increases as a function of the estimated K. Our theory explains why tRNA duplication is favorable in species with high K and therefore explains why tRNA numbers increase with K and decrease with doubling time.


 
FIG. 6.— (a) Relationship between the minimum doubling time of bacteria and the value of the translational cost parameter K estimated from U + C codon families. (b) Dependence of the total number of tRNA genes in bacterial genomes (circles) and the total number of different anticodons among the tRNA genes (triangles) on the estimated K.

 
In contrast to this, the diversity of tRNAs (the number of distinct anticodons) appears to decrease slightly with the estimated K, as shown in figure 6b. A similar point was noted by Rocha (2004), who showed that tRNA diversity is lower in the fast growing species than the slow growing ones. A possible explanation of this is that when translational selection is strong, codon bias is strong, and so it is possible for the organism to optimize its tRNAs by specializing to the preferred codons and deleting the tRNAs that would best match the rare codons. To consider this argument more fully, it will be necessary to extend our theory to deal with tRNAs in four-codon families, which we will do in a future paper. One point that is apparent from the theory of A + G codon families above is that tRNA diversity will be influenced substantially by GC content. When {theta} is small and K is large, tRNA combinations such as 3:0 and 4:0 are expected to be stable (fig. 5), whereas when {theta} is large and K is large, combinations like 1:3 will be stable, which requires two distinct tRNAs instead of one. Thus, we expect that tRNA diversity will be larger in high-GC organisms. In fact, it has already been shown that this is the case in bacterial genomes (see figure 4D of Kanaya et al. 1999). Although the downward trend of tRNA diversity with the estimated K appears to be significant in figure 6b, this result should be treated with caution until a more complete analysis of four-codon families has been made, and GC content has been controlled for effectively.

We will now consider the way the observed tRNA gene combinations depend on the estimated K. There are 14 of the 80 bacterial species for which the estimated K is negative: Borrelia burgdorferi, Buchnera aphidicola (Ap, Bp, and Sg), Chlamydophila pneumoniae, Chlorobium tepidum, Neisseria meningitidis, Nitrosomonas europaea, Pseudomonas aeruginosa, Rickettsia conorii, Rickettsia prowazekii, Treponema pallidum, Tropheryma whipplei, and Xylella fastidiosa. With the exception of C. tepidum, these species also have negative Formula . Sharp et al. (2005) give two possible explanations of negative Formula values: either base composition may be skewed between strands and high-expression genes may be predominantly on the leading strand or there may be islands of unusual base composition arising from horizontal transfer. These species are all slow-growing organisms with low tRNA numbers. It is likely that translational selection is very weak in these species and that the small effects mentioned above are more important than translational selection. As tRNA-dependent translational selection is not the dominant effect in these organisms, they are a poor test of this theory; therefore, we exclude these species from the following analysis. The remaining 66 species cover a wide range of estimated K, doubling time, and total tRNA number and provide a good test set.

Table 3 summarizes information from six U + C amino acids (Phe, Ile, Tyr, His, Asn, and Asp) in the 66 species. Table 4 summarizes information from three A + G amino acids (Gln, Lys, and Glu) in the same species. Each row is a category corresponding to a given number of tRNA genes. Nobs is the number of observations in each category. In table 3, the case of Ile in Clostridium acetobutylicum has been excluded because there are no tRNAs for Ile annotated on the genome, which is presumably an error in tRNA identification. The sum of the Nobs column is therefore 395 (= 6 × 66 − 1). In table 4, the case of Gln in Streptomyces coelicolor is excluded because the annotated tRNA combination is NU:NC = 0:2, which should be impossible according to our assumption that NU must be at least one. This could represent a very unusual tRNA combination, but more likely, it is an error in tRNA annotation. The sum of the Nobs column is therefore 197 (= 3 × 66 − 1). The "mean K" is the mean value of the estimated K for all the observations in each category. In table 3, the mean K increases with NG. In table 4, the mean K increases with NU in the categories from 1:0 to 7:0, and also from 1:1 to 3:1, and from 1:2 to 7:2. These results confirm the prediction of the above theory that larger numbers of tRNAs should be found in organisms with higher K. Note that only the U + C amino acids were used to estimate K for each species. However, these results show that the estimated K is also a predictor of what happens in the A + G amino acids. If selection is strong in an organism, it causes higher numbers of tRNAs to arise for both U + C and A + G amino acids.



 
Table 3 Comparison of Theory for U + C Codon Families with Observations in Bacteria

 


 
Table 4 Comparison of Theory for A + G Codon Families with Observations in Bacteria

 
We also wish to test the sign of the selective effect. The tables show the times of translation of the two codons (with k0c0 factor set to 1) and the difference in the times. The sign of the selective effect should be the same as the sign of the time difference. The mean S column of the tables is the mean of Sa for each of the observations in the category. In table 3, the mean S is positive in every case, as expected. In table 4, the mean S can be either positive or negative, but it has the same sign as the time difference in every case except for the 7:0 category, in which there are only two observations. We also calculated Nsign, the number of observations for which Sa has the same sign as the time difference. In table 3, almost all the observations in each category have the correct sign. In table 4, a majority of observations have the correct sign but by no means all. One reason for this is probably statistical, because Sa is estimated from relatively small numbers of codons in the high-frequency genes. The statistical error will be more likely in categories where the true selective effect is smaller. The true effect is small when K is small, such as the 1:0 category or for categories that are close to the balance point where the two codons have equal times, such as the 2:1 category.

It should be remembered that the predicted sign of the effect depends on the b parameters. Throughout this paper, we have assumed that bUA = bCG = bGC = 1 and bUG = bGU = 0.4. If the parameters are shifted too much from these values, the agreement with the observations becomes worse. For example, there is a clear majority of observations with positive selection in the 1:1 category. Thus from equation (12), we have the inequality bUG + bCG > bUA. There is also a clear majority of observations with negative selection in the 3:1 category; hence, 3bUG + bCG < 3bUA. The observations in the 2:1 category are more evenly split, which suggests that there is little difference in the times for the two codons when there is a 2:1 ratio of tRNAs. We are currently carrying out a more complete evaluation of the effect of varying the b parameters in order to determine the values that give the best explanation of the data. Data from four-codon families can also be included. These results will be presented elsewhere. Preliminary results suggest that the values we have used in this paper are not far from optimal in terms of maximizing the number of observations for which the sign is correctly predicted.
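
The two inequalities above can be checked directly against the b values assumed in this paper. The sketch below is illustrative only: it assumes that, with the k0c0 factor set to 1, the translation time of the A codon is 1/(NU bUA) and that of the G codon is 1/(NU bUG + NC bCG), and it takes a positive sign to mean that the G codon is the faster one. The function names are hypothetical.

```python
# b values assumed throughout the paper for A + G families.
B = {"UA": 1.0, "UG": 0.4, "CG": 1.0}


def codon_times_ag(n_u, n_c, b=B):
    """Return (time_A, time_G) for an NU:NC combination (assumed form)."""
    time_a = 1.0 / (n_u * b["UA"])
    time_g = 1.0 / (n_u * b["UG"] + n_c * b["CG"])
    return time_a, time_g


def predicted_sign(n_u, n_c, b=B):
    """+1 if the G codon is faster, -1 if the A codon is faster, 0 if equal."""
    time_a, time_g = codon_times_ag(n_u, n_c, b)
    diff = time_a - time_g
    return (diff > 0) - (diff < 0)


if __name__ == "__main__":
    for combo in [(1, 0), (1, 1), (2, 1), (3, 1)]:
        time_a, time_g = codon_times_ag(*combo)
        print(combo, round(time_a - time_g, 4), predicted_sign(*combo))
```

With these values the predicted sign is positive for 1:1 and negative for 3:1, and the time difference for 2:1 is smaller in magnitude than for 3:1, consistent with the more even split of observations in the 2:1 category.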

One of the reasons the sign is not correctly predicted in 100% of the cases may be that the rates depend on the other two positions in the anticodon, not just the wobble position. In that case, the relative rates of translating the two codons may not be the same for each amino acid, so no single set of b parameters would make correct predictions for Gln, Lys, and Glu at the same time. It is also possible that there is variation in the level of transcription arising from different tRNA gene copies, so that the tRNA concentrations are not exactly proportional to the gene copy numbers. This could also lead to a shift in the expected sign of the selective effect. Furthermore, it should be remembered that the wobble positions of some tRNAs are changed to modified bases. We have referred to the unmodified bases only because the modifications are not known directly from the tRNA gene sequences. However, if different modifications occur in different organisms or in different tRNAs in the same organism, this could also change the b parameters.


    Discussion
 
We have used a very simple assumption about kinetics of translation in this paper, namely that the rate of translation of a codon by a given tRNA is proportional to the tRNA concentration and a rate constant that depends on the codon–anticodon combination. Experimental studies of the translation process have shown that there are many steps to the translation of each codon (Rodnina and Wintermeyer 2001; Blanchard et al. 2004; Daviter et al. 2006) and complex models of the translation process have been developed (Heyd and Drew 2003; Ninio 2006). There is not yet complete agreement as to how to define the different steps, and most of the measurements focus on differences in rates between cognate and noncognate codons, whereas it is differences in rates between alternative cognate codons that are relevant for codon bias. The fact that codon bias occurs in a large number of bacterial genomes means that these rates must indeed differ. Theories like ours can therefore be used to predict which codons should be translated most rapidly and thus give suggestions for future experiments. In this paper, we have only dealt with two-codon families, and it is obviously of interest to generalize this to four-codon families. The SMD theory for four-codon families can be done straightforwardly and a kinetic model can also be specified in the same way. The predictions of which codons are preferred will depend strongly on the b parameters that determine the rates of binding of different anticodon–codon combinations. We are in the process of determining what values of these rate parameters best predict the codon usage data in four-codon families. This should give predictions for the relative rates of translation of different codons that could be measured in future experiments.

The theory in this paper has been based on selection for translation speed. However, another type of translational selection that cannot be ruled out by this work is selection for translational accuracy. A test for translational accuracy is that preferred codons are sometimes more frequent at conserved sites than variable sites in the same gene (Akashi 1994; Stoletzki and Eyre-Walker 2007). If mistranslation occurs (i.e., a codon is translated by a noncognate tRNA), the resulting protein may misfold. Drummond et al. (2005) have argued that misfolded proteins are toxic to the cell, and hence that selection will favor use of codons for which the probability of mistranslation is lowest.

Some of the codon usage effects discussed in this paper can be explained by both translational accuracy and speed. If speed is important, selection should be strongest on highly expressed genes because more time will be saved by changing a codon that is translated more often. However, if there is a small probability of mistranslation each time a protein is made, more mistranslated proteins will result from genes that are translated more often; therefore, selection for accuracy would also be stronger in highly expressed genes. The relative importance of speed and accuracy could differ among organisms, and it would not be surprising if accuracy were more important than speed in larger multicellular organisms where cell division is slow. On the other hand, in bacteria, which we discuss here, the fact that codon bias is strongest in rapidly multiplying cells seems a strong indicator to us that translation speed is the key factor. Nevertheless, we acknowledge the possible argument suggested by a reviewer that rapidly multiplying organisms would produce mistranslated proteins at a higher rate and would stand a greater chance of overloading their apparatus for degradation of misfolded proteins, thus producing stronger selection for accuracy in rapidly multiplying cells.

The probability, p, that a codon is mistranslated may be written

$$p = \frac{m}{r + m} \qquad (21)$$
where r is the rate of translation by the cognate tRNA and m is the sum of the rates of mistranslation by all the noncognate tRNAs. The most accurate codon is the one for which m/r is smallest. The fact that rapidly multiplying bacteria have more duplicated tRNAs can evidently be explained by selection for translation speed. It is also true that if one tRNA is duplicated, this will increase the accuracy of its cognate codon because it will increase r but not m. However, if all the tRNAs are duplicated, this makes no difference to the accuracy, according to equation (21), so it is not clear that a general increase in tRNA numbers would arise from selection for accuracy. Also, we have explained the direction of the selective effect above in terms of speed. This requires parameters in the theory to specify the relative rates of translation of the alternative cognate codon–anticodon combinations. A theory based on translational accuracy would also have to specify rates of mistranslation of all the noncognate codon–anticodon combinations in order to determine m/r for each codon. It is not obvious that the fastest codon would always be the most accurate, that is, there may be cases where accuracy and speed could be distinguished because they act in opposite directions. Some progress in measuring mistranslation rates experimentally has been made recently (Kramer and Farabaugh 2007).
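
The argument about duplication and accuracy can be made concrete with a short numerical sketch. It assumes that equation (21) has the form p = m/(r + m), which is what the definitions of r and m suggest. The rate values are arbitrary illustrative numbers, not measurements, and the function name is hypothetical.

```python
import math


def mistranslation_probability(r, m):
    """Probability that a codon is mistranslated, assuming p = m / (r + m).

    r is the cognate translation rate; m is the summed noncognate rate.
    """
    return m / (r + m)


if __name__ == "__main__":
    r, m = 10.0, 0.1  # arbitrary illustrative rates

    base = mistranslation_probability(r, m)
    one_duplicated = mistranslation_probability(2 * r, m)      # cognate tRNA only
    all_duplicated = mistranslation_probability(2 * r, 2 * m)  # every tRNA

    print("baseline:", base)
    print("cognate tRNA duplicated:", one_duplicated)
    print("all tRNAs duplicated:", all_duplicated)
    print("unchanged when all duplicated:", math.isclose(base, all_duplicated))
```

Doubling only r lowers p, whereas doubling r and m together leaves the ratio m/r, and hence p, unchanged.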

Here, we mention two other recent theories that consider the relative ability of different kinds of anticodon–codon combinations to pair. Dos Reis et al. (2004) have developed a tRNA adaptation index to assess the degree to which codon usage in a gene is adapted to the tRNA content of the genome. Xia (2008) has considered coevolution between codon usage and tRNA anticodons in fungal mitochondria. Both papers use parameters that have some relation to our bij parameters, although they are defined in different ways.

A simplifying assumption made in our theory is that tRNA concentrations are directly proportional to gene copy number. For some organisms, information is available about the concentrations of tRNAs in the cell. It would therefore be possible to use these concentrations explicitly in the theory. We have not done this because it would then be impossible to carry out the statistical survey of large numbers of organisms. In cases where it has been measured, a rough proportionality between concentration and gene number exists, for example, in Bacillus subtilis (Kanaya et al. 1999) and Saccharomyces cerevisiae (Percudani et al. 1997). In E. coli more detailed information about regulation of tRNA gene expression is available (Dong et al. 1996). When E. coli is grown at a variety of different growth rates, it is found that the concentration of tRNAs cognate to the most frequent codons increases as growth rate increases, although not dramatically, and the concentrations of tRNAs cognate to less-frequent codons remain unchanged with growth rate. This suggests some degree of regulation of tRNA gene expression. One factor causing regulation of tRNA genes is the positioning of genes within the genome. Genes close to the origin of replication may be present in a double dose, whereas those that are further away will be replicated later in the cycle and are less likely to be present in a double dose. This should lead to corresponding variation in the tRNA concentrations. Ardell and Kirsebom (2005) have investigated the dosage effect and also the effect on expression of transcription of tRNAs in operons of several genes.

Although these complex details are interesting, we should not forget that the simplest way to regulate the concentration of tRNAs is to duplicate or delete the gene. We presume that duplications and deletions occur randomly with respect to the type of tRNA but selection operates among genome variants with different gene contents. In organisms with strong translational selection, the tRNA gene content is important to the organism and there will be significant selective differences among genome variants with different tRNA copy numbers. Genomes with efficiently coevolved sets of tRNA genes will tend to replace those with less efficient sets. Although there have been many previous studies of codon usage, ours is the first that gives a theory explicitly describing the coevolution of codon usage with tRNA gene content and that carries out a large-scale survey of the trends in many bacterial genomes that are caused by this coevolution.


    Acknowledgements
 
We are indebted to Paul Sharp for supplying the codon usage data in 80 bacterial species, Eduardo Rocha for supplying information on bacterial growth times and tRNA gene copy numbers, and Hiroshi Akashi for helpful comments on codon usage data in Saccharomyces cerevisiae. This work was funded by the Canada Research Chairs organization and by the Natural Sciences and Engineering Research Council of Canada.


    Footnotes
 
Jeffery Thorne, Associate Editor


    References
 

    Akashi H. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics (1994) 136:927–935.

    Akashi H. Translational selection and yeast proteome evolution. Genetics (2003) 164:1291–1303.

    Ardell DH, Kirsebom LA. The genomic pattern of tDNA operon expression in E. coli. PLoS Comput Biol (2005) 1(1):e12.

    Blanchard SC, Gonzalez RL Jr, Kim HD, Chu S, Puglisi JD. tRNA selection and kinetic proofreading in translation. Nat Struct Mol Biol (2004) 11:1008–1014.

    Bulmer M. Coevolution of codon usage and transfer RNA abundance. Nature (1987) 325:728–730.

    Bulmer M. The selection-mutation-drift theory of synonymous codon usage. Genetics (1991) 129:897–907.

    Curran JF, Yarus M. Rates of aminoacyl-tRNA selection at 29 sense codons in vivo. J Mol Biol (1989) 209:65–77.

    Daviter T, Gromadski KB, Rodnina MV. The ribosome's response to codon-anticodon mismatches. Biochimie (2006) 88:1001–1011.

    Dong H, Nilsson L, Kurland CG. Co-variation of tRNA abundance and codon usage in Escherichia coli at different growth rates. J Mol Biol (1996) 260:649–663.

    Dos Reis M, Savva R, Wernisch L. Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res (2004) 32:5036–5044.

    Dos Reis M, Wernisch L, Savva R. Unexpected correlations between gene expression and codon usage bias from microarray data for the whole Escherichia coli K-12 genome. Nucleic Acids Res (2003) 31:6976–6985.

    Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH. Why highly expressed proteins evolve slowly. Proc Natl Acad Sci USA (2005) 102:14338–14343.

    Duret L. tRNA gene number and codon usage in the C. elegans genome are co-adapted for optimal translation of highly expressed genes. Trends Genet (2000) 16:287–289.

    Heyd A, Drew DA. A mathematical model for elongation of a peptide chain. Bull Math Biol (2003) 65:1095–1109.

    Higgs PG, Jameson D, Jow H, Rattray M. The evolution of tRNA-Leucine genes in animal mitochondrial genomes. J Mol Evol (2003) 57:435–445.

    Ikemura T. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes. J Mol Biol (1981) 151:389–409.

    Ikemura T. Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol (1985) 2:13–34.

    Jia W, Higgs PG. Codon usage in mitochondrial genomes: distinguishing context-dependent mutation from translational selection. Mol Biol Evol (2008) 25:339–351.

    Kanaya S, Yamada Y, Kudo Y, Ikemura T. Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs. Gene (1999) 238:143–155.

    Kramer EB, Farabaugh PJ. The frequency of translational misreading errors in E. coli is largely determined by tRNA competition. RNA (2007) 13:87–96.

    Lavrov DV, Lang BF. Transfer RNA gene recruitment in mitochondrial DNA. Trends Genet (2005) 21:129–133.

    Li WH. Models of nearly neutral mutation with particular implications for nonrandom usage of synonymous codons. J Mol Evol (1987) 24:337–345.

    Nakao A, Yoshihama M, Kenmochi N. RPG: the Ribosomal Protein Gene database. Nucleic Acids Res (2004) 32:D168–D170.

    Ninio J. Multiple stages in codon-anticodon recognition: double trigger mechanisms and geometric constraints. Biochimie (2006) 88:963–992.

    Percudani R, Pavesi A, Ottonello S. Transfer RNA gene redundancy and translational selection in Saccharomyces cerevisiae. J Mol Biol (1997) 268:322–330.

    Rocha EPC. Codon usage bias from the tRNA's point of view: redundancy, specialization, and efficient decoding for translational optimization. Genome Res (2004) 14:2279–2286.

    Rodnina MV, Wintermeyer W. Fidelity of aminoacyl-tRNA selection on the ribosome: kinetic and structural mechanisms. Annu Rev Biochem (2001) 70:415–435.

    Saks ME, Sampson JR, Abelson J. Evolution of a transfer RNA gene through a point mutation in the anticodon. Science (1998) 279:1665–1667.

    Sengupta S, Yang X, Higgs PG. The mechanisms of codon reassignments in mitochondrial genetic codes. J Mol Evol (2007) 64:662–688.

    Sharp PM, Bailes E, Grocock RJ, Peden JF, Sockett RE. Variation in the strength of selected codon usage bias among bacteria. Nucleic Acids Res (2005) 33:1141–1153.

    Sharp PM, Cowe E, Higgins DG, Shields DC, Wolfe KH, Wright F. Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens; a review of the considerable within species diversity. Nucleic Acids Res (1988) 16:8207–8211.

    Sharp PM, Li WH. The codon adaptation index—a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res (1987) 15:1281–1295.

    Shields DC. Switches in species-specific codon preferences: the influence of mutation biases. J Mol Evol (1990) 31:71–80.

    Solomovici J, Lesnik T, Reiss C. Does Escherichia coli optimize the economics of the translation process? J Theor Biol (1997) 185:511–521.

    Sorensen MA, Pedersen S. Absolute in vivo translation rates of individual codons in Escherichia coli. The two glutamic acid codons GAA and GAG are translated with a threefold difference in rate. J Mol Biol (1991) 222:265–280.

    Stoletzki N, Eyre-Walker A. Synonymous codon usage in Escherichia coli: selection for translational accuracy. Mol Biol Evol (2007) 24:374–381.

    Withers M, Wernisch L, Dos Reis M. Archaeology and evolution of transfer RNA genes in the Escherichia coli genome. RNA (2006) 12:933–942.

    Wright F. The effective number of codons used in a gene. Gene (1990) 87:23–29.

    Xia X. The cost of wobble translation in fungal mitochondrial genomes: integration of two traditional hypotheses. BMC Evol Biol (2008) 8:211.

Accepted for publication July 7, 2008.

