MBE Advance Access originally published online on August 6, 2008
Molecular Biology and Evolution 2008 25(11):2279-2291; doi:10.1093/molbev/msn173
Research Articles |
Coevolution of Codon Usage and tRNA Genes Leads to Alternative Stable States of Biased Codon Usage
Department of Physics and Astronomy, McMaster University, Hamilton, Ontario, Canada
E-mail: higgsp{at}mcmaster.ca.
| Abstract |
|---|
The typical number of tRNA genes in bacterial genomes is around 50, but this number varies from under 30 to over 120. We argue that tRNA gene copy numbers evolve in response to translational selection. In rapidly multiplying organisms, the time spent in translation is a limiting factor in cell division; hence, it pays to duplicate tRNA genes, thereby increasing the concentration of tRNA molecules in the cell and speeding up translation. In slowly multiplying organisms, translation time is not a limiting factor, so the overall translational cost is minimized by reducing the tRNAs to only one copy of each required gene. Translational selection also causes a preference for codons that are most rapidly translated by the current tRNAs; hence, codon usage and tRNA gene content will coevolve to a state where each is adapted to the other. We show that there is often more than one stable coevolved state. This explains why different combinations of tRNAs and codon bias can exist for different amino acids in the same organism. We analyze a set of 80 complete bacterial genomes and show that the theory predicts many of the trends that are seen in these data.
Key Words: codon usage, translational efficiency, tRNA genes, translational kinetics
| Introduction |
|---|
It has long been known that synonymous codons in many species are not used with equal frequency. One of the major causes of this is selection for translational efficiency, that is, an organism prefers to use codons that are more rapidly translated in order to reduce the time and effort spent on translation (Sharp et al. 1988
Previous population genetics theories have calculated the way codon usage should depend on selection, mutation, and drift (SMD; Li 1987; Shields 1990; Bulmer 1991). We will extend and improve these theories by taking more careful account of the way that the selective advantage of one codon over another should depend on the set of tRNA genes in the genome. This is a problem of coevolution. For a given set of tRNA genes, codon usage will evolve to an equilibrium in which there is a bias toward codons that are rapidly translated by the current set of tRNAs. However, tRNA copy numbers can change due to gene deletions and duplications or anticodon mutations, as we show below. Therefore, copy numbers can evolve to match the codon usage. In general, we expect to find genomes in a stable coevolved state in which neither codon usage nor tRNA gene content can change without decreasing the efficiency of translation. The theory that we give below predicts which combinations of codon usage and tRNA genes will be stable. The theory will be tested by comparison with observations in a large set of bacterial genomes.
The basis of the theory is that within a family of synonymous codons, the more rapidly translated codon will be preferred. We suppose that the selective advantage of one codon over another is proportional to the difference in the mean times taken to translate the two codons. We use a simple model of translation kinetics that considers the rate at which a given tRNA pairs with a given codon during translation of an mRNA by a ribosome. It is important to realize that, due to wobble pairing between codon and anticodon, many tRNAs can pair with more than one codon and that there is often more than one tRNA that can pair with a given codon. The wobble position of a tRNA is the first position of the anticodon and it pairs with the third position of the codon. The simplest case is that of amino acids with two-codon families ending in U and C. In most genomes, there is a single type of tRNA with a G base at the wobble position that pairs with both C and U. In a large number of bacterial genomes, the C codon is selectively favored (Sharp et al. 2005), presumably because the wobble-G pairs more rapidly with the C codon than the U. In Escherichia coli, this has been tested experimentally by Curran and Yarus (1989), who showed that the C codons are indeed translated more rapidly for amino acids with U + C codon families.
The next simplest case is codon families ending in A and G. Here, the types of tRNAs differ for different A + G families in the same genome and for the same codon family in different genomes. In many cases, there is a single type of tRNA with a U base at the wobble position that can pair with both A and G. In such cases, the A codon is usually preferred, presumably because the wobble-U pairs more rapidly with the A than the G. An example in this category is the Glu codon family in E. coli. Sorensen and Pedersen (1991) tested this case and found that the A codon was translated roughly three times faster than the G codon. However, in other A + G families, there can be a second type of tRNA with a wobble-C. It is thought that the C pairs only with the G codon. When both types of tRNA are present, translation of the G codon is more rapid because both tRNAs pair with the G codon but only the wobble-U tRNA pairs with the A codon. Selection therefore prefers the G codon. The Gln codons in E. coli are a particular example in this category. This case was also tested by Curran and Yarus (1989), who found that the G codon is indeed translated more rapidly.
The examples above confirm the basic idea that more rapidly translated codons are selectively favored. A complicating factor is that there are many tRNAs in which the base at the wobble position is modified to a nonstandard base. For simplicity, in what follows, we will refer to the unmodified base at the wobble position because the nature of the modification in different organisms may be different and because it is not directly apparent from the genome sequence whether a wobble base is modified. Base modification potentially affects the relative rates of pairing of an anticodon with alternative codons. In some cases, modification may prevent pairing with undesired codons. For example, in A + G families, the wobble-U may be modified so that it only pairs with A and G and not U and C. In four-codon families, there is evidence that an unmodified U at the wobble position is able to pair with all four codons. This is the standard situation in animal mitochondrial genomes (Jia and Higgs 2008), where there is only one tRNA per four-codon family. It is also the case in a number of bacteria, although more usually for four-codon families in bacteria, there is a wobble-G tRNA as well as a wobble-U, and sometimes there is a wobble-C tRNA in addition to both of these. In the present paper, we will consider only U + C and A + G families, but we hope to extend the theory to four-codon families in the future.
The smallest bacterial genomes have under 30 tRNA genes, all of which are distinct, whereas larger bacterial genomes can have over 120 tRNA genes, many of which are duplicate copies. Cell division times in bacteria differ from minutes to days. Rocha (2004) has shown that more rapidly multiplying bacteria tend to have larger numbers of tRNA gene copies. The likely explanation is that increasing the number of gene copies leads to increased tRNA concentration in the cell, which allows more rapid translation. In species where concentrations have been measured, they do increase roughly in proportion to the number of genes (Kanaya et al. 1999). Rapid translation is a significant advantage in rapidly multiplying organisms for which the time spent in protein synthesis is a significant limiting factor on the cell division time. In this paper, we incorporate this qualitative observation into a quantitative theory. We show that in organisms for which the time cost of translation is significant (i.e., rapidly multiplying organisms), it is beneficial to duplicate tRNA genes. Selection on codon usage will also be stronger in these same organisms. Thus, more strongly biased codon usage should occur in genomes with larger numbers of tRNAs.
The fact that tRNA copy numbers can vary by duplication and deletion is evident from the observation that different organisms have different total copy numbers. However, anticodon mutations are also important in tRNA evolution because they can potentially turn one type of tRNA into another. A good example of this is with tRNA Leu genes in mitochondrial genomes (Higgs et al. 2003), where several cases are known of interchange between genes with UAG and UAA anticodons. Lavrov and Lang (2005) have also found cases of anticodon mutations that convert a mitochondrial tRNA into a gene for a different amino acid. Anticodon mutations are also linked to several changes in the mitochondrial genetic code (Sengupta et al. 2007). The reassignment of the UGA stop codon to Trp occurs via a mutation in the anticodon of the Trp tRNA from CCA to UCA. The reassignment of AGY to Gly in urochordates occurs by duplication of a standard tRNA Gly with UCC anticodon, followed by mutation of one of the anticodons to UCU. Anticodon mutations have also been reported in bacterial tRNAs (Saks et al. 1998). When one compares bacterial genomes that are not too distantly related, it is often possible to find more than one distinct set of orthologous tRNA genes for the same amino acid. This suggests that anticodon substitutions are not too frequent. However, it is less clear whether this is true in more distantly related species because it becomes difficult to distinguish orthologues and paralogues in short tRNA sequences that contain a limited amount of phylogenetic information. The important point is that if a mutation occurs in an anticodon, the mutant sequence is likely to still be functional; therefore, anticodon mutations are a potential means of evolution of the tRNA gene content of a genome. A recent study compared tRNAs in several complete genomes of E. coli and related species (Withers et al. 2006). They were able to distinguish a core set of tRNAs that has been present in these genomes for over a hundred million years from additional tRNAs that had been inserted by recent horizontal transfer in E. coli O157:H7 and Shigella flexneri.
In the following section, we summarize the existing population genetics theory of codon usage and show how this can be used to estimate the strength of selection acting on different codons. In the subsequent sections, we present a new theory that shows the way that translational kinetics and translational selection should depend on the number of tRNA gene copies. We show that coevolution of tRNA genes and codon usage can lead to more than one possible stable state. This explains the observation that different amino acids in the same organism can have different preferred codons, as discussed above for Glu and Gln in E. coli. Finally, we carry out an analysis of a large number of bacterial genomes in order to test the predictions of the theory regarding the relationship between codon usage and tRNA copy number.
| Estimating the Strength of Translational Selection from Sequence Data |
|---|
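The theory of Li (1987) considers a single third-position site in a U + C codon family, at which either U or C can occur. Let $\pi$ be the GC content of the genome that would arise from mutation alone, that is, let the rate of mutation from U to C be $u\pi$ and the rate of the reverse mutation be $u(1 - \pi)$. In the absence of selection, the relative frequencies of U:C will be $(1 - \pi):\pi$. In the presence of selection, we will define the fitness of the U codon as $w_U = 1$, and let the fitness of the C codon be $w_C = 1 + s$, measured relative to this. The magnitude of genetic drift is controlled by the effective population size, $N_e$. The theory considers the limit where $N_e u$ is small, so that for most of the time, the population is dominated by one nucleotide or the other, but it occasionally makes transitions between the two.
In a population where almost all individuals have a U at a particular third position site, the net rate of creation of C mutations is $u\pi N_e$. The probability that any of these mutations becomes fixed in the population is

$$P_{\mathrm{fix}} = \frac{1 - e^{-2s}}{1 - e^{-2N_e s}} \approx \frac{2s}{1 - e^{-S}}, \tag{1}$$

where $S = 2N_e s$ and the approximation holds for small $s$. The rate of substitution from U to C is therefore

$$u\pi N_e P_{\mathrm{fix}} = u\pi F(S), \tag{2}$$

where

$$F(S) = \frac{S}{1 - e^{-S}}. \tag{3}$$

Similarly, the rate of substitution from C to U is $u(1 - \pi)F(-S)$. Let the relative frequencies of U and C in the presence of selection be $(1 - \phi(S)):\phi(S)$. At equilibrium under SMD we have

$$(1 - \phi(S))\,\pi F(S) = \phi(S)\,(1 - \pi) F(-S), \tag{4}$$

and because $F(S)/F(-S) = e^{S}$, this gives

$$\phi(S) = \frac{\pi e^{S}}{\pi e^{S} + (1 - \pi)}. \tag{5}$$

If $S > 0$, the C codon is favored and $\phi(S) > \pi$.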
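The balance between selection, mutation, and drift described above can be explored numerically. The following is a minimal sketch, assuming the standard two-state result in which the equilibrium frequency of the C codon depends only on the mutational GC content and the scaled selection coefficient S. The names `F`, `phi`, and `pi_gc` are illustrative choices, not identifiers from the original analysis.

```python
import math

def F(S):
    """Relative fixation rate S / (1 - exp(-S)); equals 1 in the neutral limit."""
    if abs(S) < 1e-12:
        return 1.0
    return S / (1.0 - math.exp(-S))

def phi(S, pi_gc):
    """Equilibrium frequency of the C codon under selection, mutation, and drift.

    S     -- scaled selective advantage of C over U (assumed definition)
    pi_gc -- frequency of C expected from mutation alone
    """
    up = pi_gc * F(S)              # U -> C substitution rate, in units of u
    down = (1.0 - pi_gc) * F(-S)   # C -> U substitution rate, in units of u
    return up / (up + down)
```

With S = 0 the function returns the mutational expectation, and a positive S pushes the frequency of C above it.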
We expect that translational selection is significant principally on a relatively small number of genes that are much more highly expressed than average. Genes such as ribosomal proteins and elongation factors are presumed to be highly expressed in all organisms, and codon frequencies in these genes should be indicative of those in genes in which translational selection is operating. On the other hand, the levels of expression of the majority of genes in the genome are much lower and the strength of selection on the majority of genes may be negligible in comparison to that on the small number of very highly expressed genes. Using these assumptions, Sharp et al. (2005) estimated the strength of selection acting in U + C codon families in bacterial genomes. We will use the same method. Let $n_U^{\mathrm{low}}$ and $n_C^{\mathrm{low}}$ be the numbers of U and C codons in a U + C codon family in the whole genome. These are labeled "low" because the majority of genes are assumed to be expressed at a low level. Let $n_U^{\mathrm{high}}$ and $n_C^{\mathrm{high}}$ be the numbers of codons in a small set of genes that are presumed to be highly expressed. If selection is negligible on the low-expression genes, we expect codon usage to depend only on the mutation parameter $\pi$:

$$\frac{n_C^{\mathrm{low}}}{n_U^{\mathrm{low}}} = \frac{\pi}{1 - \pi}. \tag{6}$$

In the high-expression genes, codon usage depends on both $\pi$ and S via the $\phi(S)$ function:

$$\frac{n_C^{\mathrm{high}}}{n_U^{\mathrm{high}}} = \frac{\phi(S)}{1 - \phi(S)} = \frac{\pi e^{S}}{1 - \pi}. \tag{7}$$

Dividing equation (7) by equation (6) gives

$$S = \ln\left(\frac{n_C^{\mathrm{high}}\, n_U^{\mathrm{low}}}{n_U^{\mathrm{high}}\, n_C^{\mathrm{low}}}\right). \tag{8}$$

Thus, both $\pi$ and S can be estimated by simple counting of codons. In what follows, we also consider A + G codon families. The equations are equivalent with subscript G replacing C and subscript A replacing U. In this case, S is the selective advantage of the G codon with respect to the A. S is positive when G is preferred and negative when A is preferred.
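The counting procedure can be sketched in a few lines of Python. This is an illustration only: it assumes that the low-expression counts reflect mutation alone and that the high-expression counts are at selection-mutation-drift balance, so that S is the log ratio of the two C:U odds. The function name `estimate_pi_and_S` and the example counts are hypothetical and are not taken from any genome.

```python
import math

def estimate_pi_and_S(n_u_low, n_c_low, n_u_high, n_c_high):
    """Estimate the mutational C frequency and the scaled selection strength S.

    All four counts must be positive; a zero count makes the estimate undefined.
    """
    pi_gc = n_c_low / (n_u_low + n_c_low)        # C frequency from mutation alone
    odds_low = n_c_low / n_u_low                 # C:U odds without selection
    odds_high = n_c_high / n_u_high              # C:U odds with selection
    S = math.log(odds_high / odds_low)           # positive when C is preferred
    return pi_gc, S

# Hypothetical counts, for illustration only.
pi_gc, S = estimate_pi_and_S(n_u_low=6000, n_c_low=4000, n_u_high=100, n_c_high=300)
```

For an A + G family the same function applies with A counts in place of U and G counts in place of C.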
Tables 1 and 2 show several examples of this method. In table 1, we consider codons for Asn and Asp, two examples of U + C codon families, and in table 2, we consider codons for Gln and Glu, two examples of A + G codon families. Four bacterial species and three eukaryotes have been chosen as illustrative examples of the patterns of tRNA copy numbers that occur in the A + G families. The bacteria were chosen from among the 80 bacterial species previously analyzed by Sharp et al. (2005). The high-expression gene set includes elongation factors Tu, Ts, and G and 37 ribosomal protein genes, as described by Sharp et al. (2005). For comparison, the tables also include three eukaryotes where translational selection is thought to be important. We calculated these data using the sequences of the ribosomal proteins obtained from the ribosomal protein gene database (Nakao et al. 2004) as the high-expression gene set.
In table 1, NG denotes the number of tRNA gene copies for the amino acid, all of which have G at the wobble position. In table 2, NU and NC denote the number of tRNA gene copies for the amino acid that have U or C at the wobble position. In table 1, we see that S is positive for Asn and Asp in each species. This agrees with our expectations because the wobble-G pairs better with C. In table 2, we see examples of both positive and negative S for Gln and Glu and the direction of selection depends on the tRNA copy numbers. Cases where NC = 0 (i.e., 1:0, 2:0, and 4:0) all have negative S. This shows that when the only tRNA has wobble-U, it interacts more efficiently with the A codon and the A codon is preferred. In cases where NU > NC > 0 (such as 2:1 and 4:2), S is still negative and A is still preferred. In cases where NU = NC or NU < NC, S is positive and G is preferred. This shows that if there are a sufficient number of wobble-C tRNAs relative to wobble-U tRNAs, the direction of selection on codon usage is reversed. This agrees with our expectation that wobble-C tRNAs interact with G codons but not A codons.
These examples are intended as motivation for the theory that follows. The bacterial species chosen are illustrative of the trends in the larger data set of 80 bacterial species that we considered and many other examples could have been chosen. We will give a statistical analysis of the full data set after presenting the theory. The three eukaryotes are included because these species are among the eukaryotes whose codon usage has been studied in most detail. The theory below is intended principally as a theory of codon usage in bacteria, but it also applies well to these three eukaryotes. For some other multicellular eukaryotes, codon usage does not seem to be dominated by translational selection in the same way. Thus, we will not discuss eukaryotes further in this paper.
An interesting feature of table 2 is that there are several species where selection goes in opposite directions for Gln and Glu, that is, where G is preferred for one amino acid and A is preferred for the other. The coevolution of codon usage and tRNA gene content has led to different stable states in the same organism. We now wish to show how this situation can arise using a simple model for translation kinetics.
| Translational Kinetics and Selection |
|---|
When considering how to incorporate translational kinetics into a theory for codon usage, we need to account for two principal facts. First, preferred codons correspond to tRNAs that are more frequent; thus, we expect that the rate of translation of a codon should increase with the frequency of its cognate tRNAs. Second, we know that selection occurs between codons that are translated by the same tRNA; thus, translation rates must depend on the individual anticodon–codon pair. The simplest assumption that incorporates these factors is to suppose that the rate at which tRNAs of type i translate codons of type j can be written as $r_{ij} = C_i k_{ij}$, where $C_i$ is the tRNA concentration and $k_{ij}$ is a rate constant specific to the codon–anticodon pair. In the case that more than one tRNA translates the same codon, the rate of translation of codon j is

$$r_j = \sum_i C_i k_{ij},$$

where the sum is over all types of tRNA that translate codon j. This amounts to the assumption that there is a single dominant step in the translation kinetics that is codon dependent and that this step is first order in the tRNA concentration. The same assumption has been made in previous theories for codon usage (e.g., Shields 1990).
The translation rates for the U and C codons in a two-codon U + C family are therefore

$$r_U = C_G k_{GU}, \qquad r_C = C_G k_{GC}. \tag{9}$$

We take the tRNA concentration to be proportional to the number of gene copies, $C_G = c_0 N_G$, and write the rate constants as $k_{ij} = k_0 b_{ij}$, where the $b_{ij}$ are relative rates. The mean times to translate the two codons are $t_U = 1/r_U$ and $t_C = 1/r_C$, and the selective advantage of the C codon is $s = \alpha(t_U - t_C)$, where $\alpha$ is an organism-specific constant. This means that S is a function of $N_G$:

$$S(N_G) = \frac{K}{N_G}\left(\frac{1}{b_{GU}} - \frac{1}{b_{GC}}\right), \tag{10}$$

where $K$ is a constant, proportional to $N_e\alpha/(k_0 c_0)$, which combines all the previous constants into a single parameter that quantifies the cost of translation in an organism. If $b_{GC} > b_{GU}$, S will always be positive in equation (10).
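In an A + G family, the rates of translation of the two codons are

$$r_A = C_U k_{UA}, \qquad r_G = C_U k_{UG} + C_C k_{CG}, \tag{11}$$

because the wobble-U tRNA translates both codons whereas the wobble-C tRNA translates only the G codon. With tRNA concentrations proportional to the gene copy numbers $N_U$ and $N_C$, the selective advantage of the G codon over the A codon is

$$S(N_U, N_C) = K\left(\frac{1}{N_U b_{UA}} - \frac{1}{N_U b_{UG} + N_C b_{CG}}\right). \tag{12}$$
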
In equations (10 and 12), we have grouped the factors that affect the strength of selection into a single parameter K and separated these from the factors that influence the direction of selection (i.e., the N and b parameters). One of our aims below is to compare the cost of translation in different organisms. K is a useful parameter because it should be a property of an organism that depends on its lifestyle. For example, in rapidly multiplying organisms, the time taken in protein synthesis should be a significant fraction of the total cell division time. Hence, there should be significant selection to speed up translation and K should be large. In slowly multiplying organisms, the time for translation may not be a limiting factor. Hence, K may be small. In estimating selection strength for an organism, it may be useful to average over codon families in order to reduce the statistical error from counting small numbers of codons in each family. Sharp et al. (2005) averaged all the estimated S values for the U + C codon families. However, equation (10) predicts that S depends on NG, which is different for different amino acids. Moreover, NG can evolve in response to translational selection, so it is useful to separate the driving force, K, from the response. Therefore, it seems more useful to use K than S to compare organisms. We will also consider A + G families, which were not considered by Sharp et al., and in this case, the difference between K and S is even more clear. An organism with a high K may have positive or negative S or may even have S close to zero if the N and b factors happen to almost cancel out in equation (12). Therefore, it would not be useful to average S over amino acids in this case.
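The separation between the strength parameter K and the direction of selection can be illustrated with a short Python sketch. It assumes that S is K times the difference between the translation times of the two codons, with each time inversely proportional to the summed copy number times relative rate of the tRNAs that read that codon. The function names `S_uc` and `S_ag` are hypothetical, and the default relative rates (1 for Watson–Crick-like pairs, 0.4 for the G–U and U–G wobble pairs) are example values only.

```python
def S_uc(K, n_g, b_gu=0.4, b_gc=1.0):
    """Scaled advantage of the C codon over the U codon in a U + C family.

    A single wobble-G tRNA type (n_g gene copies) reads both codons.
    """
    if n_g < 1:
        raise ValueError("at least one wobble-G tRNA gene is required")
    return (K / n_g) * (1.0 / b_gu - 1.0 / b_gc)

def S_ag(K, n_u, n_c, b_ua=1.0, b_ug=0.4, b_cg=1.0):
    """Scaled advantage of the G codon over the A codon in an A + G family.

    Wobble-U tRNAs (n_u copies) read A and G; wobble-C tRNAs (n_c copies) read only G.
    """
    if n_u < 1:
        raise ValueError("at least one wobble-U tRNA gene is required to read A codons")
    time_a = 1.0 / (n_u * b_ua)                  # relative time to translate an A codon
    time_g = 1.0 / (n_u * b_ug + n_c * b_cg)     # relative time to translate a G codon
    return K * (time_a - time_g)
```

Under these example rates, `S_ag` is negative for copy-number combinations such as 4:0 and 2:1 and positive for 1:1 and 2:2, so the same K can produce selection in either direction depending on the tRNA set.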
| Coevolution with Fixed Total tRNA Copy Number |
|---|
In this section, we will consider an A + G codon family with tRNA copy numbers NU and NC. We suppose that the total number of tRNAs is fixed, but that the number of copies of each type can vary. If the total number of copies is four, as is the case for Gln and Glu in E. coli, then the possible combinations of NU:NC are 4:0, 3:1, 2:2, and 1:3. The combination 0:4 is forbidden because there must be at least one U tRNA to translate the A codons. The situation of fixed total copy number is discussed first because it is the simplest. We generalize to variable total number in the following section. However, the fixed total copy number case is similar to the situation considered by Bulmer (1987).
Let $\phi(N_U, N_C)$ denote the frequency of the G codon in an A + G family that would arise under SMD with specified tRNA copy numbers, that is, $\phi(N_U, N_C) = \phi(S(N_U, N_C))$, which can be calculated from equations (5 and 12). Figure 1a shows $\phi(N_U, N_C)$ as a function of K for each possible combination of NU:NC. As the b parameters are relative rates, we can set one of them to 1 by definition. We will use the U tRNA + A codon combination as a reference, that is, bUA = 1. In the examples used in this paper, we will also set bCG = 1, for simplicity. Based on strengths of RNA base pair interaction, we would expect that bUG < 1 because UG pairs are weaker than Watson–Crick pairs. We will set bUG = 0.4 in these examples. We will show below that these parameter values appear to be close to optimal for interpreting the codon usage in bacterial genomes. For these parameter choices, S(4,0) and S(3,1) are negative, but S(2,2) and S(1,3) are positive; therefore, $\phi(4,0)$ and $\phi(3,1)$ decrease with K but $\phi(2,2)$ and $\phi(1,3)$ increase with K, as shown in figure 1a.
In genes where the frequencies of G and A codons are $\phi$ and $1 - \phi$, the mean time per codon is

$$\tau(\phi, N_U, N_C) = \frac{1}{k_0 c_0}\left(\frac{1 - \phi}{N_U b_{UA}} + \frac{\phi}{N_U b_{UG} + N_C b_{CG}}\right). \tag{13}$$

Figure 1b shows $\tau$ as a function of $\phi$ for each combination of tRNAs. We have set k0c0 = 1 in the figure because this is simply a multiplying factor. Selection is acting to minimize $\tau$. The 4:0 and 3:1 lines slope down toward $\phi$ = 0, corresponding to negative S, and the 2:2 and 1:3 lines slope down toward $\phi$ = 1, corresponding to positive S. The codon frequency that would optimize translation time is always either 0 or 1, depending on the direction of the slope. However, the frequency that occurs is $\phi(N_U, N_C)$, which is between 0 and 1 because it is also influenced by mutation and drift, not just by selection.
In a real genome, the anticodons of the tRNAs can adapt to the codon usage at the same time as the codon usage adapts to the tRNAs. A straightforward mutation at the wobble position can convert a U tRNA to a C tRNA, as discussed in the introduction. Thus, an organism can jump between the lines on figure 1b. If the codon frequencies are at SMD balance with specified NU and NC, the mean time per codon is $\tau(\phi(N_U, N_C), N_U, N_C)$. If this combination of tRNAs is stable to anticodon mutation, then this time must be less than the time that would occur at the same codon frequencies with any other copy numbers MU and MC such that MU + MC = NU + NC:

$$\tau(\phi(N_U, N_C), N_U, N_C) < \tau(\phi(N_U, N_C), M_U, M_C). \tag{14}$$

For each combination of tRNAs, there is a range of $\phi$ where the corresponding time is the lowest. The boundaries between these regions are indicated by dotted vertical lines in figure 1b. The same boundaries are indicated by dotted horizontal lines in figure 1a. In order for the NU:NC combination to be stable to anticodon mutations, $\phi(N_U, N_C)$ must lie within the corresponding boundaries. Stable regions are indicated by thick black lines in figure 1a. In these regions, the tRNAs and codons are coadapted. In the regions of the curves drawn with thin lines, the codons are adapted to the tRNAs but the tRNAs are not adapted to the codons. Hence, the configuration is not stable to anticodon mutations.
For the example illustrated in figure 1, the GC content arising from mutation is $\pi$ = 0.5. Thus all curves tend to 0.5 as K tends to zero. This lies in the interval where the 3:1 combination is stable. If K increases, the other three combinations become stable within certain intervals of K. For moderate or large K, there is always more than one stable solution. This is the central point of this paper: In species where there is significant translational selection, alternative stable states of codon usage exist where codons and tRNAs are coadapted. It is therefore possible for the codon usage in codon families for different amino acids to be biased in different directions, even if they are subject to the same mutation and selection processes, as we saw in the examples in table 2.
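The stability argument for a fixed total of four tRNA genes can be checked with a small Python sketch. It is an illustration under stated assumptions, not the authors' code: the G codon frequency is taken from the standard selection-mutation-drift balance, the selection strength and the mean time per codon are taken to depend on copy numbers through the summed relative rates of the tRNAs reading each codon, and k0c0 is set to 1. The parameter values are the example values quoted in the text (bUA = bCG = 1, bUG = 0.4, mutational GC content 0.5); all function and constant names are hypothetical.

```python
import math

B_UA, B_UG, B_CG = 1.0, 0.4, 1.0   # example relative pairing rates
PI_GC = 0.5                        # G codon frequency expected from mutation alone

def selection(K, n_u, n_c):
    """Scaled advantage of the G codon over the A codon (assumed form)."""
    return K * (1.0 / (n_u * B_UA) - 1.0 / (n_u * B_UG + n_c * B_CG))

def g_freq(K, n_u, n_c):
    """G codon frequency at selection-mutation-drift balance."""
    w = PI_GC * math.exp(selection(K, n_u, n_c))
    return w / (w + 1.0 - PI_GC)

def mean_time(freq_g, n_u, n_c):
    """Mean translation time per codon, in units where k0*c0 = 1."""
    return (1.0 - freq_g) / (n_u * B_UA) + freq_g / (n_u * B_UG + n_c * B_CG)

def stable_states(K, total=4):
    """Copy-number combinations that no anticodon mutation can improve on."""
    combos = [(n_u, total - n_u) for n_u in range(total, 0, -1)]  # n_u >= 1 always
    stable = []
    for n_u, n_c in combos:
        f = g_freq(K, n_u, n_c)            # codon usage adapted to this tRNA set
        own = mean_time(f, n_u, n_c)
        others = [mean_time(f, m_u, m_c) for m_u, m_c in combos if (m_u, m_c) != (n_u, n_c)]
        if all(own < t for t in others):
            stable.append((n_u, n_c))
    return stable
```

Calling `stable_states` with a very small K leaves only the 3:1 combination, in line with the statement above, whereas a larger value such as K = 3 returns several coexisting stable combinations.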
In this theory, we are treating mutations in anticodons in a different way from those in codons. For synonymous mutations in codons, we assume that selection is balanced by mutation and drift, so that at any synonymous site, it is possible for either an optimal or nonoptimal codon to occur with some probability. In contrast, if a favorable mutation occurs at the wobble position of the anticodon, we assume that it is always selected. This is because a mutation at a synonymous site affects the translation time of only one codon, whereas a mutation in the anticodon affects the translation time of all the codons for that amino acid. Selection on the wobble position is therefore orders of magnitude stronger than on a synonymous site. If selection is large enough to cause a significant bias at synonymous sites, then selection on the wobble position should be very large indeed, and it is reasonable to assume that the optimal state of the anticodons will always be fixed in the population. This point has also been made by Jia and Higgs (2008) with regard to the evolution of tRNAs and codon usage in mitochondrial genomes.
As we have assumed that the concentration of tRNAs is proportional to the number of gene copies and that the total number of gene copies is fixed, this case is similar to previous theories that considered concentrations of two tRNAs with a fixed total concentration, C1 + C2 = 1. However, there are important differences. The selection parameter used by Bulmer (1987) and Shields (1990) is

$$S \propto \frac{1}{C_1} - \frac{1}{C_2},$$

which may be compared with our equation (12). This has the problem that it becomes infinite if either C1 or C2 goes to zero. It does not capture the way anticodon–codon interactions work because it assumes that two separate tRNAs translate the two different codons, whereas in reality, wobble-G tRNAs can translate both codons in U + C families and wobble-U tRNAs can translate both codons in A + G families. In A + G families, it is possible for NC to be zero when NU is nonzero because the U tRNA translates both codons. The selection strength in equation (12) can never be infinite in this case. On the other hand, NU cannot be zero because then it would be impossible to translate A codons. Shields (1990) found that, if tRNA concentrations were treated as continuous variables, there would be a synergistic increase of tRNA concentrations and codon frequencies leading to ever-increasing biases. He thus concluded that organisms would exhibit either no codon bias or complete codon bias. We disagree with this: The bias is always finite according to our theory because the selection parameter is always finite.
Interestingly, the problem of ever-increasing biases in the Shields theory applies only to the case where all genes are assumed to have equal expression levels. He also considered a model with separate classes of high- and low-expression genes, assuming that a small proportion of highly expressed genes accounts for half the total gene expression and assuming a lower selection strength in the low-expression genes. In our theory, we have made the more extreme assumption that the translational effort of the cell is dominated by the high-expression genes. Therefore, only the time $\tau$ for the high-expression genes is relevant for the translational cost, and we only need to consider selection on one class of genes. It would be possible to modify our theory to include a reduced level of selection on low-expression genes and a parameter to control the fraction of the total translational effort of the cell that goes into high- and low-expression genes. However, we do not want to make the theory more complicated with additional parameters at this point. We emphasize that it is not necessary to consider genes with two different selection levels in order to avoid the problem of ever-increasing biases. The problem is avoided by a more realistic choice of the selection parameter in our model.
| Coevolution with Variable Total tRNA Copy Number |
|---|
In reality, tRNA copy numbers can change by gene duplication and deletion as well as anticodon mutation. Therefore, the total tRNA copy number for an amino acid is not fixed. In organisms where translational selection is strong, it will pay to make duplicate copies of a tRNA gene so that the tRNA concentration will increase and translation will be faster. However, there is some cost to the organism for duplicating this gene. Bacteria do not often retain redundant duplicate genes because they are under selection for rapid replication and, therefore, their genome size tends to be minimized. Duplicating the tRNA therefore has a cost in terms of increased DNA replication time. It also has a cost in terms of transcription. When the gene is present, the organism will expend time and energy in making tRNA molecules by transcribing this gene. This would be disadvantageous to the organism if the extra tRNAs were not beneficial for translation. We expect the total cost of translation of codons of a given amino acid to be

$$T = f_a K k_0 c_0\,\tau + gN, \tag{15}$$

where $\tau$ is the mean time per codon, $f_a$ is the frequency of the amino acid, $N$ is the total number of tRNA genes for this amino acid, and $g$ is the cost per tRNA gene. The first term is the time-dependent cost, whose weight increases with K, and the second term is the cost of maintaining the genes.
The U + C families become of greater interest when the number of tRNAs can vary. For a U + C family, the translational cost T is a function of the frequency $\phi$ of the C codon and the number of wobble-G tRNAs. From equations (9, 10, and 15), we have

$$T(\phi, N_G) = \frac{f_a K}{N_G}\left(\frac{1 - \phi}{b_{GU}} + \frac{\phi}{b_{GC}}\right) + gN_G. \tag{16}$$

At SMD balance, the codon frequency is $\phi(N_G) = \phi(S(N_G))$ from equations (5 and 10). If this solution is stable against duplication or deletion of tRNAs, we must have

$$T(\phi(N_G), N_G) < T(\phi(N_G), N_G \pm 1), \tag{17}$$

where the deletion case applies only when $N_G > 1$.
Figure 2 shows the $\phi(N_G)$ curves as a function of K for each value of NG. Regions that are stable according to equation (17) are indicated by thick black lines. In this example, bUA = bGC = 1, bGU = 0.4, and $\pi$ = 0.5. If K is small enough, T is minimized by setting NG = 1, irrespective of the other parameters. Above a certain value of K, the NG = 1 solution becomes unstable. As K increases, each successive value of NG becomes stable in a range that partially overlaps the previous value of NG, so once again, there can be more than one stable state for a given K. The ranges of $\phi$ covered by the stable regions of the curves are quite broad. Thus, there might be a considerable amount of variation in the degree of codon bias observed in organisms with the same number of tRNA genes. Paradoxically, because the strength of selection varies inversely with NG, the degree of codon bias actually decreases after a tRNA gene duplication. However, the stable range of $\phi$ shifts toward more biased codon usage as NG increases; therefore, on average, we expect to see stronger codon bias in organisms with more tRNA genes.
The regions of stability depend on the ratio g/fa, which is set to 0.3 in this example. If this ratio is lower, the regions of stability slide down the curves to lower ranges of K, although the positions of the curves themselves do not depend on g/fa. Thus, it is more favorable to add an extra tRNA if the cost per tRNA gene is lower or if the frequency of the amino acid is higher.
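The trade-off between faster translation and the cost of extra genes can be sketched numerically for a U + C family. The cost function used here is an assumption made for illustration: a time-dependent term that scales with K and falls as 1/NG, plus a per-gene term weighted by the ratio g/fa. The parameter values (bGC = 1, bGU = 0.4, mutational GC content 0.5, g/fa = 0.3) follow the example in the text, and all function and constant names are hypothetical.

```python
import math

B_GU, B_GC = 0.4, 1.0   # example relative pairing rates for the wobble-G tRNA
PI_GC = 0.5             # C codon frequency expected from mutation alone
G_OVER_FA = 0.3         # cost per tRNA gene relative to amino acid frequency

def c_freq(K, n_g):
    """C codon frequency at selection-mutation-drift balance with n_g tRNA genes."""
    S = (K / n_g) * (1.0 / B_GU - 1.0 / B_GC)
    w = PI_GC * math.exp(S)
    return w / (w + 1.0 - PI_GC)

def cost(K, freq_c, n_g):
    """ASSUMED cost per unit amino acid frequency: time term plus gene term."""
    time_term = (K / n_g) * ((1.0 - freq_c) / B_GU + freq_c / B_GC)
    return time_term + G_OVER_FA * n_g

def is_stable(K, n_g):
    """True if neither duplicating nor deleting one tRNA gene lowers the cost."""
    f = c_freq(K, n_g)                       # codon usage adapted to n_g copies
    here = cost(K, f, n_g)
    if here >= cost(K, f, n_g + 1):          # duplication would pay
        return False
    if n_g > 1 and here >= cost(K, f, n_g - 1):   # deletion would pay
        return False
    return True
```

Under these assumptions a single gene copy is the only stable state at small K, one copy becomes unstable at larger K, and there is an intermediate range of K in which one and two copies are both stable.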
We now return to A + G codon families and consider both anticodon mutations and tRNA duplications and deletions at the same time. In the same way as above, the translational cost is
|
| (18) |
|
| (19) |
NU:NC.
Figure 3 shows an example with bUA = bCG = 1, bUG = 0.4,
= 0.5, and g/fa = 0.3. For clarity, only the stable regions of the curves are shown. As with figure 2, the solution with only one tRNA is stable at very low K because the time-dependent term in the cost function becomes smaller than the term involving the cost per gene, so the total cost is minimized by minimizing the number of genes. Solutions with larger numbers of tRNAs become more stable as K increases. This is different from the situation in figure 1, where the 3:1 solution was stable at low K. The parameters in figures 1 and 3 are the same, with the exception of the addition of g/fa in figure 3. The 3:1 solution is stable against anticodon mutations at low K but not against tRNA deletion.
|
There are some combinations (e.g., 1:2, 2:2, 3:2, and 2:3) that do not appear on figure 3 because there is no stable region of the corresponding curve for any K. The combinations that have a stable range depend on the other parameters, and in particular, they are influenced by
, the GC content specified by the mutation rates. In figure 4,
= 0.7, rather than 0.5 as in figure 3. In this case, 1:2, 2:2, 3:2, and 2:3 all have a stable region, but there is no stable region for 2:0, 3:0, or 4:0.
|
As
has an important influence on the stability of the different solutions, it is interesting to consider the way codon frequencies and tRNA combinations are likely to vary with
when K is fixed. Figure 5 shows two examples with K = 0.5 and 3.0. The other parameters are as before. For K = 0.5, only low tRNA number solutions are stable (1:0, 2:0, and 1:1). Each of these is stable within an interval of
. As selection is weak in this example,
does not differ much from
on any of these curves (all the curves lie close to the diagonal). For K = 3.0, solutions with larger tRNA numbers are stable (3:0, 4:0, 3:1, 2:2, and 1:3). There is significant codon bias in this example:
can be considerably higher or lower than
, especially for the more uneven tRNA combinations.
|
Figure 5 gives an idea of what might be expected to happen if there is a gradual change in the GC content of the genome of an organism with time due to changing of the relative mutation rates between the different bases. A GC rich organism with
= 0.9 might initially lie on the 1:3 curve. If
is gradually reduced, the GC content in the majority of genes in the genome will follow this, but the GC content in the high-expression genes will follow the 1:3 curve and will therefore remain very high until this curve becomes unstable at
close to 0.5. At this point, there will be a shift in the tRNA content and a sudden change in the codon usage in the highly expressed genes. A similar behavior would also occur according to the theory of Shields (1990)
| Comparison of Theory with Bacterial Codon Usage Data |
|---|
We now return to the analysis of the 80 bacterial genomes. Sharp et al. (2005)
where NG is known from the genome. The factor involving the b parameters is simply a multiplying factor that we assume is the same for each amino acid. For consistency with the examples above, we have used bUA = bGC = 1 and bGU = 0.4, but the relative values of K for each organism do not depend on this choice of b parameters. Thus, as a measure of the cost of translation in an organism, we will use
|
| (20) |
Rocha (2004)
estimated the minimum doubling times for different bacteria. He showed that the number of tRNA genes in the genome decreases with doubling time and increases with codon bias (estimated from comparing effective numbers of codons in high- and low-expression genes). Our theory explains why this occurs. Our estimate of
is a measure of the time-dependent cost of translation and hence of the benefit to be gained by optimizing translation. Figure 6a shows the relationship between doubling time and
for 77 of the 80 species considered here. Three species where no doubling time estimate was available were excluded (Aquifex aeolicus, Mycoplasma penetrans, and Wigglesworthia glossinidia). There is a strong negative correlation between doubling time and
because translational selection is strongest in fast growing organisms that need to synthesize proteins very rapidly. Figure 6b shows that the total number of tRNA genes increases as a function of
. Our theory explains why tRNA duplication is favorable in species with high K and therefore explains why tRNA numbers increase with K and decrease with doubling time.
|
In contrast to this, the diversity of tRNAs (the number of distinct anticodons) appears to decrease slightly with
is small and K is large, tRNA combinations such as 3:0 and 4:0 are expected to be stable (fig. 5), whereas when
is large and K is large, combinations like 1:3 will be stable, which requires two distinct tRNAs instead of one. Thus, we expect that tRNA diversity will be larger in high-GC organisms. In fact, it has already been shown that this is the case in bacterial genomes—see figure 4D of Kanaya et al. (1999)
We will now consider the way the observed tRNA gene combinations depend on
. There are 14 of the 80 bacterial species for which
is negative: Borrelia burgdorferi, Buchnera aphidicola (Ap, Bp, and Sg), Chlamydophila pneumoniae, Chlorobium tepidum, Neisseria meningitidis, Nitrosomonas europaea, Pseudomonas aeruginosa, Rickettsia conorii, Rickettsia prowazekii, Treponema pallidum, Tropheryma whipplei, and Xylella fastidiosa. With the exception of C. tepidum, these species also have negative
. Sharp et al. (2005)
give two possible explanations of negative
values: either base composition may be skewed between strands and high-expression genes may be predominantly on the leading strand or there may be islands of unusual base composition arising from horizontal transfer. These species are all slow-growing organisms with low tRNA numbers. It is likely that translational selection is very weak in these species and that the small effects mentioned above are more important than translational selection. As tRNA-dependent translational selection is not the dominant effect in these organisms, they are a poor test of this theory; therefore, we exclude these species from the following analysis. The remaining 66 species cover a wide range of
, doubling time and total tRNA number and provide a good test set.
Table 3 summarizes information from six U + C amino acids (Phe, Ile, Tyr, His, Asn, and Asp) in the 66 species. Table 4 summarizes information from three A + G amino acids (Gln, Lys, and Glu) in the same species. Each row is a category corresponding to a given number of tRNA genes. Nobs is the number of observations in each category. In table 3, the case of Ile in Clostridium acetobutylicum has been excluded because no tRNAs for Ile are annotated in the genome, which is presumably an error in tRNA identification. The sum of the Nobs column is therefore 395 (= 6 × 66 − 1). In table 4, the case of Gln in Streptomyces coelicolor is excluded because the annotated tRNA combination is NU:NC = 0:2, which should be impossible according to our assumption that NU must be at least one. This could represent a very unusual tRNA combination, but more likely it is an error in tRNA annotation. The sum of the Nobs column is therefore 197 (= 3 × 66 − 1). The "mean K" is the mean value of
for all the observations in each category. In table 3, the mean K increases with NG. In table 4, the mean K increases with NU in the categories from 1:0 to 7:0, and also from 1:1 to 3:1, and from 1:2 to 7:2. These results confirm the prediction of the above theory that larger numbers of tRNAs should be found in organisms with higher K. Note that only the U + C amino acids were used in order to estimate
for each species. However, these results show that
is also a predictor of what happens in the A + G amino acids. If selection is strong in an organism, it causes higher numbers of tRNAs to arise for both U + C and A + G amino acids.
|
|
We also wish to test the sign of the selective effect. The tables show the times of translation of the two codons (with k0c0 factor set to 1) and the difference in the times. The sign of the selective effect should be the same as the sign of the time difference. The mean S column of the tables is the mean of Sa for each of the observations in the category. In table 3, the mean S is positive in every case, as expected. In table 4, the mean S can be either positive or negative, but it has the same sign as the time difference in every case except for the 7:0 category, in which there are only two observations. We also calculated Nsign, the number of observations for which Sa has the same sign as the time difference. In table 3, almost all the observations in each category have the correct sign. In table 4, a majority of observations have the correct sign but by no means all. One reason for this is probably statistical, because Sa is estimated from relatively small numbers of codons in the high-frequency genes. The statistical error will be more likely in categories where the true selective effect is smaller. The true effect is small when K is small, such as the 1:0 category or for categories that are close to the balance point where the two codons have equal times, such as the 2:1 category.
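The per-category quantities described above (Nobs, mean K, mean S, and Nsign) could be computed along the following lines. This is a sketch only: the function name, the tuple layout, and the example values are hypothetical and are not the data or code used for tables 3 and 4.

```python
from collections import defaultdict


def summarize_by_category(observations, time_difference):
    """Group observations by tRNA combination and compute Nobs, mean K, mean S, Nsign.

    observations: iterable of (category, K, S) tuples, one per species and amino acid.
    time_difference: dict mapping category -> predicted time difference of the two codons.
    """
    groups = defaultdict(list)
    for category, k, s in observations:
        groups[category].append((k, s))

    summary = {}
    for category, rows in groups.items():
        dt = time_difference[category]
        n_obs = len(rows)
        summary[category] = {
            "Nobs": n_obs,
            "mean_K": sum(k for k, _ in rows) / n_obs,
            "mean_S": sum(s for _, s in rows) / n_obs,
            # An observation counts when S and the time difference are both
            # positive or both negative; a zero never counts as agreement.
            "Nsign": sum(1 for _, s in rows if s * dt > 0),
        }
    return summary


# Illustrative values only; the K and S values here are made up.
example = [("1:1", 0.8, 0.30), ("1:1", 1.2, -0.10), ("3:1", 2.5, -0.40)]
predicted = {"1:1": 0.29, "3:1": -0.12}
print(summarize_by_category(example, predicted))
```

Counting sign agreement per observation, rather than only comparing the sign of the mean, shows how many individual species follow the prediction, which matters in categories where the predicted time difference is small.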
It should be remembered that the predicted sign of the effect depends on the b parameters. Throughout this paper, we have assumed that bUA = bCG = bGC = 1 and bUG = bGU = 0.4. If the parameters are shifted too much from these values, the agreement with the observations becomes worse. For example, there is a clear majority of observations with positive selection in the 1:1 category. Thus from equation (12), we have the inequality bUG + bCG > bUA. There is also a clear majority of observations with negative selection in the 3:1 category; hence, 3bUG + bCG < 3bUA. The observations in the 2:1 category are more evenly split, which suggests that there is little difference in the times for the two codons when there is a 2:1 ratio of tRNAs. We are currently carrying out a more complete evaluation of the effect of varying the b parameters in order to determine the values that give the best explanation of the data. Data from four-codon families can also be included. These results will be presented elsewhere. Preliminary results suggest that the values we have used in this paper are not far from optimal in terms of maximizing the number of observations for which the sign is correctly predicted.
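The two inequalities can be checked numerically for the b values used in this paper. The sketch below assumes the rate structure implied by those inequalities: the A codon is read only by the wobble-U tRNA, the G codon is read by both the wobble-U tRNA (through the U:G wobble pair) and the wobble-C tRNA, rates are proportional to gene copy number, and the overall prefactor is 1. The function name is hypothetical.

```python
def ag_family_times(n_u, n_c, b_ua=1.0, b_ug=0.4, b_cg=1.0):
    """Translation times of the A and G codons for a tRNA combination NU:NC.

    Assumption: rate_A = NU*bUA and rate_G = NU*bUG + NC*bCG, prefactor set to 1.
    """
    if n_u < 1:
        raise ValueError("NU must be at least one so that the A codon can be read")
    rate_a = n_u * b_ua
    rate_g = n_u * b_ug + n_c * b_cg
    return 1.0 / rate_a, 1.0 / rate_g


for n_u, n_c in [(1, 1), (2, 1), (3, 1)]:
    t_a, t_g = ag_family_times(n_u, n_c)
    print(f"{n_u}:{n_c}  t_A={t_a:.3f}  t_G={t_g:.3f}  t_A - t_G={t_a - t_g:+.3f}")
```

With bUA = bCG = 1 and bUG = 0.4, the G codon is faster for 1:1 (rate 1.4 versus 1), the A codon is faster for 3:1 (rate 3 versus 2.2), and the two rates are close for 2:1 (2 versus 1.8), matching the pattern of signs described above.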
One of the reasons the sign is not correctly predicted in 100% of the cases may be that the rates depend on the other two positions in the anticodon, not just the wobble position. In that case, the relative rates of translating the two codons may not be the same for each amino acid, so no single set of b parameters would make correct predictions for Gln, Lys, and Glu at the same time. It is also possible that there is variation in the level of transcription arising from different tRNA gene copies, so that the tRNA concentrations are not exactly proportional to the gene copy numbers. This could also lead to a shift in the expected sign of the selective effect. Furthermore, it should be remembered that the wobble positions of some tRNAs are changed to modified bases. We have referred to the unmodified bases only because the modifications are not known directly from the tRNA gene sequences. However, if different modifications occur in different organisms or in different tRNAs in the same organism, this could also change the b parameters.
| Discussion |
|---|
We have used a very simple assumption about kinetics of translation in this paper, namely that the rate of translation of a codon by a given tRNA is proportional to the tRNA concentration and a rate constant that depends on the codon–anticodon combination. Experimental studies of the translation process have shown that there are many steps to the translation of each codon (Rodnina and Wintermeyer 2001
The theory in this paper has been based on selection for translation speed. However, another type of translational selection that cannot be ruled out by this work is selection for translational accuracy. A test for translational accuracy is that preferred codons are sometimes more frequent at conserved sites than variable sites in the same gene (Akashi 1994
; Stoletzki and Eyre-Walker 2007
). If mistranslation occurs (i.e., a codon is translated by a noncognate tRNA), the resulting protein may misfold. Drummond et al. (2005)
have argued that misfolded proteins are toxic to the cell, and hence that selection will favor use of codons for which the probability of mistranslation is lowest.
Some of the codon usage effects discussed in this paper can be explained by both translational accuracy and speed. If speed is important, selection should be strongest on highly expressed genes because more time will be saved by changing a codon that is translated more often. However, if there is a small probability of mistranslation each time a protein is made, more mistranslated proteins will result from genes that are translated more often; therefore, selection for accuracy would also be stronger in highly expressed genes. The relative importance of speed and accuracy could differ among organisms, and it would not be surprising if accuracy were more important than speed in larger multicellular organisms where cell division is slow. On the other hand, in bacteria, which we discuss here, the fact that codon bias is strongest in rapidly multiplying cells seems a strong indicator to us that translation speed is the key factor. Nevertheless, we acknowledge the possible argument suggested by a reviewer that rapidly multiplying organisms would produce mistranslated proteins at a higher rate and would stand a greater chance of overloading their apparatus for degradation of misfolded proteins, thus producing stronger selection for accuracy in rapidly multiplying cells.
The probability, p, that a codon is mistranslated may be written
|
| (21) |
Here, we mention two other recent theories that consider the relative ability of different kinds of anticodon–codon combinations to pair. Dos Reis et al. (2004)
have developed a tRNA adaptation index to assess the degree to which codon usage in a gene is adapted to the tRNA content of the genome. Xia (2008)
has considered coevolution between codon usage and tRNA anticodons in fungal mitochondria. Both papers use parameters that have some relation to our bij parameters, although they are defined in different ways.
A simplifying assumption made in our theory is that tRNA concentrations are directly proportional to gene copy number. For some organisms, information is available about the concentrations of tRNAs in the cell. It would therefore be possible to use these concentrations explicitly in the theory. We have not done this because it would then be impossible to carry out the statistical survey of large numbers of organisms. In cases where it has been measured, a rough proportionality between concentration and gene number exists, for example, in Bacillus subtilis (Kanaya et al. 1999
) and Saccharomyces cerevisiae (Percudani et al. 1997
). In E. coli more detailed information about regulation of tRNA gene expression is available (Dong et al. 1996
). When E. coli is grown at a variety of different growth rates, it is found that the concentration of tRNAs cognate to the most frequent codons increases as growth rate increases, although not dramatically, and the concentrations of tRNAs cognate to less-frequent codons remain unchanged with growth rate. This suggests some degree of regulation of tRNA gene expression. One factor causing regulation of tRNA genes is the positioning of genes within the genome. Genes close to the origin of replication may be present in a double dose, whereas those that are further away will be replicated later in the cycle and are less likely to be present in a double dose. This should lead to corresponding variation in the tRNA concentrations. Ardell and Kirsebom (2005)
have investigated the dosage effect and also the effect on expression of transcription of tRNAs in operons of several genes.
Although these complex details are interesting, we should not forget that the simplest way to regulate the concentration of tRNAs is to duplicate or delete the gene. We presume that duplications and deletions occur randomly with respect to the type of tRNA but selection operates among genome variants with different gene contents. In organisms with strong translational selection, the tRNA gene content is important to the organism and there will be significant selective differences among genome variants with different tRNA copy numbers. Genomes with efficiently coevolved sets of tRNA genes will tend to replace those with less efficient sets. Although there have been many previous studies of codon usage, ours is the first that gives a theory explicitly describing the coevolution of codon usage with tRNA gene content and that carries out a large-scale survey of the trends in many bacterial genomes that are caused by this coevolution.
| Acknowledgements |
|---|
We are indebted to Paul Sharp for supplying the codon usage data in 80 bacterial species, Eduardo Rocha for supplying information on bacterial growth times and tRNA gene copy numbers, and Hiroshi Akashi for helpful comments on codon usage data in Saccharomyces cerevisiae. This work was funded by the Canada Research Chairs organization and by the Natural Sciences and Engineering Research Council of Canada.
| Footnotes |
|---|
Jeffery Thorne, Associate Editor
| References |
|---|
Akashi H. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics (1994) 136:927–935.
Akashi H. Translational selection and yeast proteome evolution. Genetics (2003) 164:1291–1303.
Ardell DH, Kirsebom LA. The genomic pattern of tDNA operon expression in E. coli. PLoS Comput Biol (2005) 1(1):e12.
Blanchard SC, Gonzalez RL Jr, Kim HD, Chu S, Puglisi JD. tRNA selection and kinetic proofreading in translation. Nat Struct Mol Biol (2004) 11:1008–1014.
Bulmer M. Coevolution of codon usage and transfer RNA abundance. Nature (1987) 325:728–730.
Bulmer M. The selection-mutation-drift theory of synonymous codon usage. Genetics (1991) 129:897–907.
Curran JF, Yarus M. Rates of aminoacyl-tRNA selection at 29 sense codons in vivo. J Mol Biol (1989) 209:65–77.
Daviter T, Gromadski KB, Rodnina MV. The ribosome's response to codon-anticodon mismatches. Biochimie (2006) 88:1001–1011.
Dong H, Nilsson L, Kurland CG. Co-variation of tRNA abundance and codon usage in Escherichia coli at different growth rates. J Mol Biol (1996) 260:649–663.
Dos Reis M, Savva R, Wernisch L. Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res (2004) 32:5036–5044.
Dos Reis M, Wernisch L, Savva R. Unexpected correlations between gene expression and codon usage bias from microarray data for the whole Escherichia coli K-12 genome. Nucleic Acids Res (2003) 31:6976–6985.
Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH. Why highly expressed proteins evolve slowly. Proc Nat Acad Sci USA (2005) 102:14338–14343.
Duret L. tRNA gene number and codon usage in the C. elegans genome are co-adapted for optimal translation of highly expressed genes. Trends Genet (2000) 16:287–289.
Heyd A, Drew DA. A mathematical model for elongation of a peptide chain. Bull Math Biol (2003) 65:1095–1109.
Higgs PG, Jameson D, Jow H, Rattray M. The evolution of tRNA-Leucine genes in animal mitochondrial genomes. J Mol Evol (2003) 57:435–445.
Ikemura T. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes. J Mol Biol (1981) 151:389–409.
Ikemura T. Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol (1985) 2:13–34.
Jia W, Higgs PG. Codon usage in mitochondrial genomes: distinguishing context-dependent mutation from translational selection. Mol Biol Evol (2008) 25:339–351.
Kanaya S, Yamada Y, Kudo Y, Ikemura T. Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs. Gene (1999) 238:143–155.
Kramer EB, Farabaugh PJ. The frequency of translational misreading errors in E. coli is largely determined by tRNA competition. RNA (2007) 13:87–96.
Lavrov DV, Lang BF. Transfer RNA gene recruitment in mitochondrial DNA. Trends Genet (2005) 21:129–133.
Li WH. Models of nearly neutral mutation with particular implications for nonrandom usage of synonymous codons. J Mol Evol (1987) 24:337–345.
Nakao A, Yoshihama M, Kenmochi N. RPG: the Ribosomal Protein Gene database. Nucleic Acids Res (2004) 32:D168–D170.
Ninio J. Multiple stages in codon-anticodon recognition: double trigger mechanisms and geometric constraints. Biochimie (2006) 88:963–992.
Percudani R, Pavesi A, Ottonello S. Transfer RNA gene redundancy and translational selection in Saccharomyces cerevisiae. J Mol Biol (1997) 268:322–330.
Rocha EPC. Codon usage bias from the tRNA's point of view: redundancy, specialization, and efficient decoding for translational optimization. Genome Res (2004) 14:2279–2286.
Rodnina MV, Wintermeyer W. Fidelity of aminoacyl-tRNA selection on the ribosome: kinetic and structural mechanisms. Annu Rev Biochem (2001) 70:415–435.
Saks ME, Sampson JR, Abelson J. Evolution of a transfer RNA gene through a point mutation in the anticodon. Science (1998) 279:1665–1667.
Sengupta S, Yang X, Higgs PG. The mechanisms of codon reassignments in mitochondrial genetic codes. J Mol Evol (2007) 64:662–688.
Sharp PM, Bailes E, Grocock RJ, Peden JF, Sockett RE. Variation in the strength of selected codon usage bias among bacteria. Nucleic Acids Res (2005) 33:1141–1153.
Sharp PM, Cowe E, Higgins DG, Shields DC, Wolfe KH, Wright F. Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens; a review of the considerable within species diversity. Nucleic Acids Res (1988) 16:8207–8211.
Sharp PM, Li WH. The codon adaptation index—a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res (1987) 15:1281–1295.
Shields DC. Switches in species-specific codon preferences: the influence of mutation biases. J Mol Evol (1990) 31:71–80.
Solomovici J, Lesnik T, Reiss C. Does Escherichia coli optimize the economics of the translation process? J Theor Biol (1997) 185:511–521.
Sorensen MA, Pedersen S. Absolute in vivo translation rates of individual codons in Escherichia coli. The two glutamic acid codons GAA and GAG are translated with a threefold difference in rate. J Mol Biol (1991) 222:265–280.
Stoletzki N, Eyre-Walker A. Synonymous codon usage in Escherichia coli: selection for translational accuracy. Mol Biol Evol (2007) 24:374–381.
Withers M, Wernisch L, Dos Reis M. Archaeology and evolution of transfer RNA genes in the Escherichia coli genome. RNA (2006) 12:933–942.
Wright F. The effective number of codons used in a gene. Gene (1990) 87:23–29.
Xia X. The cost of wobble translation in fungal mitochondrial genomes: integration of two traditional hypotheses. BMC Evol Biol (2008) 8:211.