MBE Advance Access originally published online on October 4, 2006
Molecular Biology and Evolution 2007 24(1):122-129; doi:10.1093/molbev/msl139
Research Articles
Selective Constraints on Codon Usage of Nuclear Genes from Arabidopsis thaliana

* Department of Biological Science, Barnard College, Columbia University
Department of Biology, York University, Toronto, Canada
E-mail: bmorton{at}barnard.edu.
Abstract
Highly expressed nuclear genes from Arabidopsis thaliana show an increased frequency of codons that match abundant tRNAs, and it has been suggested that this reflects selective pressure to increase translation efficiency. Here we explore the possibility that the difference in codon usage between highly expressed genes and other Arabidopsis genes is not the result of selection but instead arises from mutation biases. Specifically, we ask whether an influence of transcription level on mutational properties, coupled with context dependency of mutations (both of which have been observed in various organisms), contributes to variation in codon-usage bias across genes. Using noncoding sites immediately flanking both high- and low-expression coding sequences to infer context-dependent composition biases, we analyze codon-usage bias across genes. The data show that mutation bias cannot explain codon usage of high-expression genes in Arabidopsis and, surprisingly, also indicate that even low-expression genes are under selective constraints. In addition, the data indicate that the general preference for certain codons is context dependent: the identity of the 3' flanking nucleotide, that is, the first position of the next codon, is correlated with which codon is found at an increased frequency in highly expressed genes. This context dependency indicates that selective pressure on codon usage is more complex than previously thought. Overall, the study supports previous suggestions that selection plays a significant role in determining codon usage of nuclear genes in A. thaliana.
Key Words: codon bias, Arabidopsis, dinucleotide, gene expression
Introduction
Analysis of codon bias is typically based on the selection–mutation–drift hypothesis (Bulmer 1991
The contribution, if any, of variation in mutation bias across genes to variation in codon bias has received less attention than the contribution of selection. However, if we wish to assert that variation in codon usage is a result of selection, particularly in cases such as Arabidopsis, where the variation in codon bias across genes is statistically significant but small, we need to exclude the possibility that mutation bias is the determining factor. Several lines of evidence suggest that complex mutation dynamics may affect patterns of codon usage. Analyses of chloroplast DNA (Morton 2003) and, more recently, nuclear DNA (Morton et al. 2006) from grasses have shown that mutation processes can be strongly context dependent, and although mutation patterns in plant nuclear genomes have not been studied in the same detail, the codon usage of plant nuclear genes has been shown to be context dependent (Fedorov et al. 2002). Additionally, evidence from other organisms indicates that the mutation process can be influenced by transcription, particularly through transcription-coupled repair systems (Francino et al. 1996; Francino and Ochman 1997; Beletskii and Bhagwat 1998; Klapacz and Bhagwat 2002; Ochman 2003). Overall, this emerging evidence about the complexity of mutation biases raises the question of whether variation among genes in mutation bias, particularly context-dependent bias, is correlated with transcription level and thus potentially with expression level. If so, differences in codon usage between low- and high-expression genes could arise through differences in mutation processes rather than through selection. In that case, the observed correlation between codon usage and tRNA abundance would not reflect selection on codon usage but might simply be the result of tRNA levels adjusting to codon usage.
To assess whether mutation bias could explain the differences in codon usage between low- and high-expression genes in A. thaliana, we use the nucleotide composition patterns of the noncoding regions flanking both sets of genes. Given the evidence for context dependency in related plants (Morton 2003; Morton et al. 2006), we use trinucleotide frequencies to control for context effects in order to 1) compare composition patterns of noncoding DNA flanking high- and low-expression genes and 2) use noncoding DNA to predict codon-usage patterns of both high- and low-expression genes. The data show that there are differences in nucleotide composition bias between regions flanking low- and high-expression genes. However, even when controlling for context, these differences can explain neither the difference in codon usage between low- and high-expression genes nor the general codon usage of low-expression genes. We infer that selection is responsible for the codon usage of highly expressed genes, as was proposed earlier (Wright et al. 2004; Kliman and Henry 2005). Additionally, the data show patterns in the codon usage of low-expression genes that cannot be explained by mutation alone, indicating that selection, probably for several different biological features, has a more general influence on codon usage in Arabidopsis than previously supposed.
Materials and Methods
High- and low-expression genes in this study were those that were defined previously for Arabidopsis based on MPSS data across tissue types (Wright et al. 2004
For the intergenic sequences, we determined mononucleotide, dinucleotide, and trinucleotide composition. Dinucleotide signatures were calculated as described by Campbell et al. (1999) for intergenic sequences as well as for the dinucleotides formed by the second- and third-codon positions of the coding sequences. These signatures consist of a frequency for each dinucleotide that has been normalized by the frequency expected under random base association. Comparisons of nucleotide composition between sets of regions, such as between regions flanking low-expression genes and regions flanking high-expression genes, were made by performing a region resample. For a comparison of n1 sequences from one set and n2 sequences from a second set, all N = n1 + n2 sequences were pooled. The sequences were then randomly partitioned into 2 sets of n1 and n2 sequences each. A distance measure was then calculated for base composition, dinucleotide composition, and dinucleotide signature; in this case, we used the square root of the sum of squared differences between the arrays of the 2 sets, which is the Euclidean distance between the 2 arrays, or vectors, in n-dimensional space (Grossman 1984). Three distances were calculated for each comparison: 1 for the base frequency array, 1 for the dinucleotide frequency array, and 1 for the dinucleotide signature array. This pooling and random partitioning was repeated over 10,000 trials to generate an expected distribution of distances under the assumption that the 2 sets of sequences are drawn from a single pool with the same composition bias, and the number of these 10,000 trials yielding a distance greater than the observed value was counted.
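The region-resample test can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Java package: the function names are hypothetical, sequences are assumed to be uppercase strings in which characters other than A, C, G, and T are skipped, and frequencies are pooled over all sequences in a set. Any function that maps a list of sequences to a composition array, for example one returning the 16 dinucleotide-signature values, can be passed as `freq_fn`.

```python
import math
import random
from collections import Counter

BASES = "ACGT"
DINUCLEOTIDES = [x + y for x in BASES for y in BASES]


def base_frequencies(seqs):
    """Cumulative A, C, G, T frequencies over a set of sequences."""
    counts = Counter(c for s in seqs for c in s if c in BASES)
    total = sum(counts.values())
    return [counts[b] / total for b in BASES]


def dinucleotide_frequencies(seqs):
    """Cumulative frequencies of the 16 dinucleotides (overlapping windows)."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - 1):
            pair = s[i:i + 2]
            if pair[0] in BASES and pair[1] in BASES:
                counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total for d in DINUCLEOTIDES]


def euclidean(u, v):
    """Square root of the summed squared differences between two arrays."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def region_resample(set1, set2, freq_fn, trials=10_000, seed=0):
    """Pool both sets, repartition them at random into sets of the original
    sizes, and count the trials whose distance exceeds the observed one."""
    rng = random.Random(seed)
    observed = euclidean(freq_fn(set1), freq_fn(set2))
    pooled = list(set1) + list(set2)
    n1 = len(set1)
    exceed = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        if euclidean(freq_fn(pooled[:n1]), freq_fn(pooled[n1:])) > observed:
            exceed += 1
    return observed, exceed
```

A call such as `region_resample(downstream_high, downstream_low, base_frequencies)` (both input names hypothetical) returns the observed distance and the number of resamples exceeding it; a count of 0 out of 10,000 corresponds to an empirical P below 1/10,000.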
Calculations of expected codon usage in the absence of selection were based on the nucleotide composition of specified sets of intergenic sequences. For any one calculation, we first defined the set of intergenic sequences to be used as the basis for the calculation, for example, the sequences immediately downstream from high-expression genes. The cumulative trinucleotide composition of these intergenic sequences was calculated and then used to determine context-dependent base frequency arrays. To generate these arrays, we collapsed the 64 trinucleotides into 16 groups of 4, each group consisting of the trinucleotides that share the same 2 external bases. The 4 trinucleotides within a group therefore differ only in the internal nucleotide, and their counts give the number of occurrences of each of the 4 bases within the context defined by the 2 external bases. The relative frequencies of the 4 trinucleotides were then calculated, and these represent the base frequency array for that context. For example, the relative frequencies of GGT, GCT, GTT, and GAT make up 1 composition array {G, C, T, A}, given a 5' G and a 3' T as flanking bases (designated G_T). Using context-dependent base frequency arrays from specific sets of intergenic sequences allows us to control, at least in part, for context dependency and for transcription-coupled mutation effects (the latter by using, e.g., UTR sequences).
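The collapsing of trinucleotides into 16 context-dependent arrays can be sketched as below. This is an illustrative reconstruction from the description above, not the authors' code: the function names are hypothetical, overlapping trinucleotide windows are assumed, and returning `None` for a context that never occurs is our own choice.

```python
from collections import Counter
from itertools import product

ARRAY_ORDER = "GCTA"  # order of the composition array {G, C, T, A} used in the text


def trinucleotide_counts(seqs):
    """Cumulative counts of overlapping trinucleotides over a set of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - 2):
            tri = s[i:i + 3]
            if all(c in ARRAY_ORDER for c in tri):
                counts[tri] += 1
    return counts


def context_arrays(tri_counts):
    """Collapse 64 trinucleotides into 16 contexts 'X_Z' (5' base X, 3' base Z).

    Each value is the relative frequency of the internal base in G, C, T, A
    order, or None if the context never occurs in the data.
    """
    arrays = {}
    for five, three in product(ARRAY_ORDER, repeat=2):
        group = [tri_counts[five + mid + three] for mid in ARRAY_ORDER]
        total = sum(group)
        arrays[f"{five}_{three}"] = [n / total for n in group] if total else None
    return arrays
```

For instance, counts of 2 GGT, 1 GCT, 1 GTT, and 0 GAT give the G_T array [0.5, 0.25, 0.25, 0.0].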
Given the 16 context-dependent base frequency arrays, we then calculated expected codon usage as a function of context, that is, of the 5' and 3' flanking bases. Expected codon usage is given by the expected base frequency at the third-codon position; the three 6-fold degenerate codon groups were therefore broken down into separate 4-fold and 2-fold degenerate groups, each of which varies only at the third position. The genetic code defines the nucleotide flanking the third position on the 5' side (the second-codon position), whereas the 3' flanking nucleotide is the first-codon position of the downstream codon, which can vary. Expected context-dependent codon usage therefore consists of 4 separate codon arrays, one for each 3' flanking base. The calculation for a particular synonymous codon group upstream from a specific 3' base (the downstream base) was performed in 2 steps. First, we counted the total number of codons from the group occurring upstream from that base in the gene sequence(s) being analyzed. Second, the intergenic base frequency array for the appropriate flanking context (defined by the second-codon position and the specified downstream base) was multiplied by this total to generate expected usage at the third position. For example, consider the synonymous group GGN (glycine) upstream from the base T. The context of the third-codon position is G_T (a 5' G defined by the genetic code and a 3' T). We first count the total number of GGN|T codons, where "|" represents the codon boundary, in the gene sequences under analysis; call this number M. The base frequency array {G, C, T, A} for the G_T context from the intergenic regions is then multiplied by M to yield the expected numbers of GGG|T, GGC|T, GGT|T, and GGA|T codons.
This was repeated for GGN with each of the other 3 downstream bases to generate the overall expected usage of GGN codons as a function of the downstream base. For 2-fold degenerate groups, M was multiplied by a reduced array ({G, A} or {C, T}) in which the 2 relevant frequencies were extracted from the full array and then normalized to sum to 1. Repeating this calculation for all synonymous groups yields a full context-dependent codon usage for the set of genes being studied, predicated on the intergenic sequences used to generate the base frequency arrays. All sequence and codon-usage calculations were performed using a Java package written by one of the authors (B.R.M.), which is available upon request.
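The two-step expectation for one codon group and one downstream base can be sketched as follows. This is a hypothetical helper, not the authors' Java implementation. It assumes context arrays keyed as 'X_Z' with values in {G, C, T, A} order, and it uses the parameter `twofold` ('R' for NNR groups, 'Y' for NNY groups) to select the reduced, renormalized array.

```python
ARRAY_ORDER = "GCTA"  # composition arrays are {G, C, T, A}, as in the text


def expected_usage(arrays, prefix, downstream, m, twofold=None):
    """Expected counts of prefix+N|downstream codons, given M such codons.

    arrays: dict mapping context 'X_Z' to [fG, fC, fT, fA] from intergenic DNA
    prefix: first two codon positions, e.g. 'GG' for glycine (GGN)
    downstream: first base of the following codon
    twofold: None for a 4-fold group, 'R' for an NNR group, 'Y' for an NNY group
    """
    # The third position is flanked by codon position 2 (5') and the next base (3').
    freqs = dict(zip(ARRAY_ORDER, arrays[f"{prefix[1]}_{downstream}"]))
    if twofold is None:
        return {f"{prefix}{b}|{downstream}": m * freqs[b] for b in ARRAY_ORDER}
    allowed = "GA" if twofold == "R" else "CT"
    total = sum(freqs[b] for b in allowed)  # renormalize the reduced array to sum to 1
    return {f"{prefix}{b}|{downstream}": m * freqs[b] / total for b in allowed}
```

With a G_T array of [0.5, 0.25, 0.25, 0.0] and M = 100 GGN|T codons, the sketch predicts 50 GGG|T, 25 GGC|T, 25 GGT|T, and 0 GGA|T. Repeating the call for each group and each of the 4 downstream bases yields the full context-dependent expectation.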
Results and Discussion
A number of recent studies on a diverse array of taxa provide evidence for selective constraints on noncoding DNA (Dubchak et al. 2000
Composition of Noncoding Regions Flanking High- and Low-Expression Genes
We tested for a contribution of transcription-coupled mutation and/or repair processes to composition bias by comparing the nucleotide composition patterns of noncoding sequences immediately flanking high- and low-expression genes. To do this, we generated 2 cumulative composition data sets: 1 from the 100 nt immediately downstream from the coding regions (i.e., from the stop codons) of high-expression genes and 1 from the 100 nt immediately downstream from the coding regions (i.e., from the stop codons) of low-expression genes.
Overall, the 100 nt downstream of high- and low-expression genes are similar in cumulative composition, with an A + T content of 68.0% (G = 15.9%; A = 34.4%; T = 33.5%; C = 16.1%) for regions flanking high-expression genes and 68.2% (G = 16.3%; A = 31.7%; T = 36.4%; C = 15.6%) for regions flanking low-expression genes. Although the difference in A + T content is not significant by ANOVA (F = 0.105, P > 0.1), the resampling test indicates a general difference in base composition between the 2 sets of sequences: in a distribution of composition distances generated from 10,000 resamples (see Materials and Methods), none was larger than the observed distance. This is probably due largely to the skew toward T over A in the low-expression set.
We also examined whether the 2 sets of regions differ in dinucleotide composition. A comparison of dinucleotide signatures (i.e., dinucleotide frequencies divided by the frequencies expected under random base association) for downstream regions flanking high- and low-expression genes is shown in figure 1. The signatures of high- and low-expression genes are very similar to one another, and both resemble commonly observed dinucleotide signatures: there is a significant underrepresentation, defined as a signature value of less than 0.78 (Campbell et al. 1999), of CG and TA dinucleotides, as well as a higher than random representation of RR and YY dinucleotides, all of which are features common across taxa (Campbell et al. 1999). The underrepresentation of CG dinucleotides is probably due either to selection against these potential methylation sites or to a high rate of spontaneous mutation resulting from cytosine deamination (Morton et al. 2006). The latter is probably not a full explanation, because the deamination products (CA and TG) are not always overrepresented. Interestingly, flanking regions of high-expression genes do not show as low a CG signature as regions flanking low-expression genes (fig. 1), which might reflect a difference in CpG methylation between the 2 gene sets.
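The dinucleotide signature and the 0.78 underrepresentation cutoff can be sketched as below. The function names are hypothetical. The `both_strands` flag reflects our understanding that Campbell et al. (1999) compute signatures on a sequence together with its reverse complement; whether the present analysis symmetrized strands is not stated here, so the flag can be switched off.

```python
from collections import Counter

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def dinucleotide_signature(seqs, both_strands=True):
    """Return rho_XY = f(XY) / (f(X) * f(Y)) for all 16 dinucleotides.

    With both_strands=True each sequence is counted together with its
    reverse complement (symmetrized signature).
    """
    seqs = list(seqs)
    if both_strands:
        seqs += [s.translate(COMPLEMENT)[::-1] for s in seqs]
    mono, di = Counter(), Counter()
    for s in seqs:
        mono.update(c for c in s if c in BASES)
        for i in range(len(s) - 1):
            if s[i] in BASES and s[i + 1] in BASES:
                di[s[i:i + 2]] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    signature = {}
    for x in BASES:
        for y in BASES:
            expected = (mono[x] / n_mono) * (mono[y] / n_mono)
            observed = di[x + y] / n_di
            signature[x + y] = observed / expected if expected else float("nan")
    return signature


def underrepresented(signature, cutoff=0.78):
    """Dinucleotides whose signature falls below the cutoff used in the text."""
    return sorted(d for d, rho in signature.items() if rho < cutoff)
```

Applied to the pooled downstream regions, `underrepresented(dinucleotide_signature(regions))` would list the dinucleotides, such as CG and TA above, whose signature falls below 0.78.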
Although the dinucleotide signatures appear to be similar, the resampling test again showed that the 2 sets of sequences do in fact have significantly different dinucleotide compositions; the observed distance between both the frequency arrays and the signature arrays of the 2 sets of regions was larger than every distance generated from 10,000 resamples. Similar overall results were found for the upstream flanking regions as well as for tests using 25 and 500 nt flanking each gene (data not shown).
There is also a difference between upstream and downstream composition (fig. 2). Upstream regions are slightly less A + T rich (64.2% for high-expression and 65.3% for low-expression genes, compared with 68.0% and 68.2% for the downstream sequences given above) and also tend to have a higher representation of GG and CC dinucleotides than do downstream regions. This overrepresentation of GG and CC appears to occur at the expense of TG and CA dinucleotides and may reflect requirements for translation initiation or gene regulation.
The results indicate that, in general, downstream regions flanking low- and high-expression genes have similar composition patterns, but there are some significant differences between them. This suggests the possibility that transcription-coupled processes result in a slight composition difference between low- and high-expression genes, although it is also possible that differences in selective pressure on the different noncoding regions are responsible. Regardless of the underlying reason, the difference in composition means that we need to test the possibility that differences in codon-usage bias between the 2 sets of genes are simply a result of these more general composition differences.
Codon Usage of High- and Low-Expression Genes
To assess whether the variation in composition bias can account for the differences in codon usage between high- and low-expression Arabidopsis genes, context-dependent nucleotide frequency arrays were determined as described in Materials and Methods from the cumulative composition of the 100 nt downstream from high-expression genes. A second set of arrays was generated in the same manner for low-expression genes. These arrays are our estimates of the context-dependent neutral base composition and were used to predict codon usage as a function of the 3' nucleotide, also described in Materials and Methods. Expectations were generated for high- and low-expression genes using the appropriate flanking base composition, and the expected and observed codon usages were then compared.
The comparisons are shown in table 1 for the 4-fold degenerate codon groups and in table 2 for the 2-fold degenerate codon groups. In most cases, a G test shows highly significant deviations from expected codon usage (data not shown), so to assess the influence of selection we analyzed general trends in the difference between expected and observed codon usage. To do so, we used a cutoff of a 25% difference between observed and expected usage (either 25% more or 25% less than expected) and highlighted these codons in tables 1 and 2. We refer to these as "overused" and "underused" codons, although the terms are not meant to imply statistical significance. We will argue that the overall pattern of "overused" codons provides strong evidence that composition bias cannot account for the codon usage of Arabidopsis nuclear genes. Our assertion rests largely on the general observation that preference for major codons, that is, codons that match the tRNA population, often depends on the 3' flanking base. As we will argue, this preference for major codons only in certain contexts requires a strong bias toward a very different nucleotide than the one predicted by the dinucleotide signature, so that we are forced to explain the observation either by selective constraints on codon usage or by significant constraints across a large proportion of intergenic sites.
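The 25% cutoff and the G test can be sketched for one synonymous group in one context as follows. The function names are hypothetical. Treating exactly 25% as meeting the cutoff is our reading of the text. A P value would come from comparing G with a chi-squared distribution with k − 1 degrees of freedom for a group of k codons, which this sketch does not compute.

```python
import math


def classify(observed, expected, cutoff=0.25):
    """Label codons 'overused', 'underused', or '' by their observed/expected ratio."""
    labels = {}
    for codon, obs in observed.items():
        exp = expected[codon]
        if exp == 0:
            labels[codon] = "overused" if obs > 0 else ""
        elif obs >= (1 + cutoff) * exp:
            labels[codon] = "overused"
        elif obs <= (1 - cutoff) * exp:
            labels[codon] = "underused"
        else:
            labels[codon] = ""
    return labels


def g_statistic(observed, expected):
    """G = 2 * sum(O * ln(O / E)) over the codons of one synonymous group."""
    g = 0.0
    for codon, obs in observed.items():
        if obs == 0:
            continue  # the limit of O * ln(O / E) as O -> 0 is 0
        exp = expected[codon]
        if exp == 0:
            return math.inf  # a codon was observed that was expected never to occur
        g += obs * math.log(obs / exp)
    return 2 * g
```

For example, with 100 expected copies of each GGN codon, observing 130 GGT and 70 GGC labels GGT "overused" and GGC "underused".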
High-Expression Genes
We start by considering 4-fold degenerate codon groups from high-expression genes. Across a total of 32 contexts (8 codon groups and 4 flanking bases), we find that a major codon is "overused" in 25 of them (table 1), 24 of which involve a codon that is used more frequently in high-expression genes than in low-expression genes (see Wright et al. 2004
In the 2-fold degenerate codon groups (table 2), there is a discernible pattern in the high-expression genes. Across these codon groups, the NNG or NNC codon, depending on whether the group is NNR or NNY, is the major codon (Wright et al. 2004), and of the 48 contexts for the 12 codon groups, 46 have a major codon that is "overused." More importantly, composition bias predicts a strong bias toward NNA or NNT in every context, yet 32 of these 46 contexts show an actual bias toward NNG or NNC, not just a higher than expected frequency.
The data in both tables 1 and 2 demonstrate that context is an important factor with respect to codon usage, as has been observed in other organisms (Yarus and Folley 1985; Shpaer 1986; Morton and So 2000; Fedorov et al. 2002). One example from table 1 is GGT, which is a major codon for GGN: GGT|G is observed more frequently than GGA|G, but GGT|A is observed less frequently than GGA|A. Another is GTT|G, which is found much more frequently than GTC|G, whereas GTT|A is observed less frequently than GTC|A. This sort of pattern, in which 1 codon is used more often than a second codon in one context but less often in another, is common. A fairly general feature of this pattern is that when both NNT and NNC are major codons, NNT is more common with a 3' G or C, whereas NNC is more frequent with a 3' A or T.
The importance of context is also apparent in table 2. In the contexts that do not show a bias toward the major codon, we instead observe a bias toward NNT or NNA. This means that the major codon "preference" is context dependent. In particular, 9 of the 15 contexts that do not have a bias toward major codons involve a 3' flanking G. For example, consider the major codon GAC. When there is a 3' G, only 24.3% of the GAY codons are GAC, but when there is a 3' A, 63.0% of the GAY codons are GAC. A similar pattern is observed with CAY, AAY, TTY, AGY, and TGY. Overall, if we compare NNT and NNC usage, we observe 37.8% NNC when there is a 3' G but 75.9% NNC when there is a 3' A. The tRNA gene sequences for Arabidopsis (http://lowelab.ucsc.edu/GtRNAdb/Athal/) show that the 3' flanking base in the mRNA would essentially always interact with a U in the tRNA, and this interaction may be an underlying factor in the context dependency of codon usage. We note that this general preference for the NNC codon as a function of the 3' flanking base is strikingly similar to what is observed in codon usage of chloroplast DNA from flowering plants (Morton and So 2000), which suggests a common aspect of codon preferences.
These context effects indicate that the 3' flanking base is an important factor to consider when defining "major" codons in analyses of codon usage and selection. They also provide very strong evidence for selective pressure on codon usage of highly expressed genes. For example, the bias toward NNC|A codons in 2-fold degenerate groups, which is in essence a bias toward a CA dinucleotide over other NA dinucleotides, is not predicted from the composition bias of flanking noncoding regions: in the regions downstream from high-expression genes, we observe 1212 GA, 2794 AA, 1805 TA, and 1298 CA dinucleotides. More generally, tables 1 and 2 contain many cases of a strong bias toward certain codons in specific contexts where this bias runs counter to what the intergenic sequence composition predicts. This pattern of a context-dependent bias toward major codons is consistent with selection on codon usage and is difficult to explain solely as a product of mutation bias.
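The arithmetic behind the CA argument can be made explicit with the dinucleotide counts quoted above. This is a dinucleotide-level approximation that ignores the base at the second-codon position, so it only approximates the trinucleotide-context expectation described in Materials and Methods. The variable and function names are illustrative.

```python
# Dinucleotides ending in A, counted in the 100-nt regions downstream of
# high-expression genes (the counts quoted in the text).
na_counts = {"GA": 1212, "AA": 2794, "TA": 1805, "CA": 1298}


def share(counts, key, pool):
    """Fraction of `key` among the dinucleotides listed in `pool`."""
    return counts[key] / sum(counts[k] for k in pool)


ca_among_na = share(na_counts, "CA", na_counts)   # C versus any base before A
c_among_y = share(na_counts, "CA", ["CA", "TA"])  # C versus T before A (NNY|A case)
print(f"CA among NA: {ca_among_na:.3f}")          # CA among NA: 0.183
print(f"C among C/T before A: {c_among_y:.3f}")   # C among C/T before A: 0.418
```

On this rough basis, noncoding composition predicts C at only about 42% of NNY|A third positions, a slight bias toward T. Compare this with the 75.9% NNC reported above for a 3' A in highly expressed genes.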
One last point to consider here is that, because silent substitutions at 2-fold degenerate sites are limited to transitions, a difference between the composition of noncoding DNA and these sites can arise by mutation bias alone if GC→AT pressure differs strongly between transitions and transversions (Morton 2001). Analyses of single nucleotide polymorphisms (SNPs) from Arabidopsis should help shed light on whether this is a contributing factor.
Low-Expression Genes
In the 32 contexts for 4-fold degenerate groups in low-expression genes, there are 10 in which a major codon is "overused," whereas 18 have a nonmajor codon that is "overused" (table 1). However, 15 of the latter are NCG codons, which suggests a more general overrepresentation of CG dinucleotides in coding relative to noncoding sequences. This could indicate a lower level of CpG methylation in coding sequences, resulting in a lower spontaneous mutation rate away from CG dinucleotides and, consequently, a higher CG frequency in coding sequences. If so, it points to a significant problem that can arise from using noncoding DNA to estimate neutral composition bias even when the assumption of no selective constraints on noncoding DNA is true. This problem of CpG methylation is discussed further below.
For the 2-fold degenerate codon groups, there are 48 contexts. In 20 of these, the major codon (NNG and NNC as noted above) is "overused," and in 16 of these 20, there is not just a higher than expected frequency but an actual bias toward NNC or NNG. Even more notable is that in 17 of the 24 contexts with a 3' A or 3' T, the major codon is "overused," with 13 being a true bias toward NNG or NNC. For example, if we compare the relative usage of TTT : TTC, we find a ratio of 3608:1956 (35.2% TTC) with a 3' G or C but a ratio of 2126:3243 (60.4% TTC) with a 3' A or T.
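The context-dependent TTC percentages quoted above can be checked, and the tally extended to other sequences, with a short sketch. Both helper names are hypothetical. The sketch assumes an in-frame coding sequence and skips the final codon, whose downstream base lies outside the coding region.

```python
from collections import Counter


def codon_context_counts(cds):
    """Count (codon, next base) pairs along an in-frame coding sequence."""
    counts = Counter()
    for i in range(0, len(cds) - 3, 3):
        counts[(cds[i:i + 3], cds[i + 3])] += 1
    return counts


def pooled_counts(counts, prefix, next_bases):
    """Total NNT and NNC codons for a two-base prefix, pooling the given 3' bases."""
    n_t = sum(counts[(prefix + "T", b)] for b in next_bases)
    n_c = sum(counts[(prefix + "C", b)] for b in next_bases)
    return n_t, n_c


def percent_nnc(n_t, n_c):
    """Percentage of NNC among NNT + NNC codons."""
    return 100 * n_c / (n_t + n_c)


# TTT:TTC totals in low-expression genes, pooled by 3' flanking base (from the text)
g_or_c = percent_nnc(3608, 1956)
a_or_t = percent_nnc(2126, 3243)
print(f"{g_or_c:.1f}% TTC with a 3' G or C")  # 35.2% TTC with a 3' G or C
print(f"{a_or_t:.1f}% TTC with a 3' A or T")  # 60.4% TTC with a 3' A or T
```

Given per-gene counts from `codon_context_counts`, a call such as `percent_nnc(*pooled_counts(counts, "TT", "GC"))` gives the same kind of summary for any NNY group.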
As noted above for high-expression genes, this context-dependent variation in codon bias strongly suggests a role for selection. One possible explanation is that, because the genes we define as low expression are those with a low average expression across tissues (Wright et al. 2004), some of them might be highly expressed in a specific cell type or at a specific developmental time, and their codon usage might be under selective constraints as a result. However, when we removed those genes with high expression in 1 cell type, the results did not change qualitatively (data not shown). Because data from high-expression genes show that major codon preference is context dependent, if selective pressure is stronger in certain contexts, then the effect of selection could be observed even in low-expression genes in those contexts. Thus, for the 2-fold degenerate groups, it could be that translation of NNT|A and NNT|T codons is too inefficient or inaccurate for any gene, although the effect is not strong enough to generate an overall bias toward NNC codons in low-expression genes (see Wright et al. 2004). The observation of a preference for NNC|A and NNC|T in the 4-fold degenerate groups suggests that there could be a general preference against T|T and T|A pairings. Although we have been focusing on translation efficiency, it is likely that several other factors, such as mRNA structural constraints and splicing efficiency (Chamary et al. 2006), contribute to this pattern of codon usage. Most importantly, as discussed for high-expression genes, the pattern of context-dependent bias toward certain major codons is consistent with selective pressure, not mutation bias, and indicates that even low-expression genes have significant selective constraints on codon usage.
Conclusions
Although we observe statistically significant compositional differences between noncoding sequences flanking low- and high-expression genes, these differences cannot account for the differences in codon usage between low- and high-expression genes. The data also show that the codon usage of high-expression genes is not consistent with the context-dependent composition patterns of noncoding DNA. The same results are obtained whether we perform the analysis using the composition of UTR sequences (25 nt flanking the coding sequence) or the composition of nontranscribed intergenic sequences. In general, we find that major codon preference depends on the identity of the 3' flanking base, and this is observed in both high- and low-expression genes.
We conclude that the data support the previous suggestion that selection influences the codon usage of highly expressed nuclear genes from A. thaliana (Wright et al. 2004). Given that highly expressed genes show an increased frequency of major codons (codons that match the tRNA population), this selection probably includes selective pressure for increased translation efficiency. In addition, we conclude that the data indicate a more general influence of selection on codon usage across all nuclear genes from Arabidopsis. This general influence probably involves a number of factors, including translation efficiency, mRNA stability, and splicing (Chamary et al. 2006). Even given the potential problems of using noncoding DNA to infer neutral composition bias, the context-dependent pattern of codon usage in these genes is not consistent with mutation bias alone. Most telling is codon usage in the 2-fold degenerate codon groups. These groups show a strong bias toward the major codon NNC (C at the third position) when the next base is A or T, in both high- and low-expression genes, but a strong bias toward NNT when the next base is G (in low-expression genes, this bias toward NNT is also observed with a 3' C). Given the bias toward A and T in all contexts in noncoding sequences, which are roughly 65% A + T, a neutral explanation for this pattern of codon-usage bias would require extremely strong selective constraints across a significant proportion of noncoding sequences, something that can be tested in the future using SNP data. We consider it more likely that the data reflect selective constraints on the codon usage of a majority of nuclear genes from Arabidopsis.
Our inference about selection on codon usage across a broad range of nuclear genes in Arabidopsis contrasts with the recent conclusion that there has been a general decrease in the effectiveness of selection in A. thaliana as a result of a shift in mating system (Bustamante et al. 2002). This decreased effectiveness, though, appears to have occurred following the divergence of A. thaliana and A. lyrata, which means that it is probable that codon bias is not at equilibrium and that our results reflect ancestral, rather than current, selective pressures. Again, SNP data from different Arabidopsis lineages should allow us to assess this possibility.
All of our conclusions are based on using intergenic sequences to infer neutral context-dependent base composition. A growing body of data indicates that there are, not surprisingly, selective constraints on some intergenic sites. However, unless selective constraints on intergenic sites in Arabidopsis are extremely strong and widespread, our general results should not be affected. None of the intergenic regions shows a general bias toward G or C in any context, yet such a bias is observed at some third-codon positions in specific contexts. Further, this bias is stronger in highly expressed genes, although such a difference between high- and low-expression genes is not predicted from intergenic sequences. The data require either selection on codon usage or very strong constraints on noncoding DNA. One factor that does appear to have an effect is CpG methylation. As noted above, several "overused" codons in low-expression genes involve a CG dinucleotide. This could reflect the fact that our predicted CG frequencies are too low because of a high rate of spontaneous deamination at these sites in noncoding DNA. If methylation levels are lower within coding sequences, then the CG frequency in intergenic DNA is not an accurate measure of neutral levels in coding sequences, and this would affect all dinucleotide frequencies. Although this raises a serious issue about the absolute accuracy of comparing coding and noncoding composition, it could not explain the context-dependent pattern described above, in particular the strong bias toward NNC|A and NNC|T codons. Therefore, our general conclusions regarding selection should not be affected. Furthermore, a recent study suggested that CpG methylation may in fact be elevated within genes compared with intergenic sequence that is not composed of transposable elements (Tran et al. 2005). Given our results, this would provide further evidence for selective constraint acting on synonymous sites in both high- and low-expression genes.
Acknowledgements
We thank Brandon Gaut, in whose lab this work was initiated, and 2 anonymous reviewers for numerous helpful comments. S.W. is supported by a Sloan Research Fellowship and a Natural Sciences and Engineering Research Council grant.
Footnotes
Jianzhi Zhang, Associate Editor
References
Akashi H and Eyre-Walker A. (1998) Translational selection and molecular evolution. Curr Opin Genet Dev 8:688–693.
Andolfatto P. (2005) Adaptive evolution of non-coding DNA in Drosophila. Nature 437:1149–1152.
Beletskii A and Bhagwat AS. (1998) Correlation between transcription and C to T mutations in the non-transcribed DNA strand. Biol Chem 379:549–551.
Bulmer M. (1991) The selection-mutation-drift theory of synonymous codon usage. Genetics 129:897–907.
Bustamante CD, Nielsen R, Sawyer SA, Olsen KM, Purugganan MD, Hartl DL. (2002) The cost of inbreeding in Arabidopsis. Nature 416:531–534.
Campbell A, Mrazek J, Karlin S. (1999) Genome signature comparisons among prokaryote, plasmid, and mitochondrial DNA. Proc Natl Acad Sci USA 96:9184–9189.
Chamary JV, Parmley JL, Hurst LD. (2006) Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat Rev Genet 7:98–108.
Chiapello H, Lisacek F, Caboche M, Henaut A. (1998) Codon usage and gene function are related in sequences of Arabidopsis thaliana. Gene 209:GC1–GC38.
Dubchak I, Brudno M, Loots GG, Pachter L, Mayor C, Rubin EM, Frazer KA. (2000) Active conservation of noncoding sequences revealed by three-way species comparisons. Genome Res 10:1304–1306.
Duret L. (2000) tRNA gene number and codon usage in the C. elegans genome are co-adapted for optimal translation of highly expressed genes. Trends Genet 16:287–289.
Duret L and Mouchiroud D. (1999) Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila and Arabidopsis. Proc Natl Acad Sci USA 96:4482–4487.
Fedorov A, Saxonov S, Gilbert W. (2002) Regularities of context-dependent codon bias in eukaryotic genes. Nucleic Acids Res 30:1192–1197.
Francino MP, Chao L, Riley MA, Ochman H. (1996) Asymmetries generated by transcription-coupled repair in enterobacterial genomes. Science 272:107–109.
Francino MP and Ochman H. (1997) Strand asymmetries in DNA evolution. Trends Genet 13:240–245.
Frazer KA, Sheehan JB, Stokowski RP, Chen X, Hosseini R, Cheng J-F, Fodor SPA, Cox DR, Patil N. (2001) Evolutionarily conserved sequences on human chromosome 21. Genome Res 11:1651–1659.
Grossman S. (1984) Elementary linear algebra. 2nd ed. Belmont (CA): Wadsworth Publishing Co.
Halligan DL, Eyre-Walker A, Andolfatto P, Keightley PD. (2004) Patterns of evolutionary constraints in intronic and intergenic DNA of Drosophila. Genome Res 14:273–279.
Ikemura T. (1985) Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol 2:13–35.
Klapacz J and Bhagwat AS. (2002) Transcription-dependent increase in multiple classes of base substitution mutations in Escherichia coli. J Bacteriol 184:6866–6872.
Kliman RM and Henry AN. (2005) Inference of codon preferences in Arabidopsis thaliana. Int J Plant Sci 166:3–11.
Ko CH, Brendel V, Taylor D, Walbot V. (1998) U-richness is a defining feature of plant introns and may function as an intron recognition signal in maize. Plant Mol Biol 36:573–583.
Morton BR. (1997) Influence of neighboring base composition on substitutions at four-fold degenerate sites of chloroplast coding sequences. Mol Biol Evol 14:189–194.
Morton BR. (1998) Selection on the codon bias of chloroplast and cyanelle genes in different plant and algal lineages. J Mol Evol 46:449–459.
Morton BR. (2001) Selection at the amino acid level can influence synonymous codon usage: implications for the study of codon adaptation in plastid genes. Genetics 159:347–358.
Morton BR. (2003) The role of context-dependent mutations in generating compositional and codon usage bias in grass chloroplast DNA. J Mol Evol 56:616–629.
Morton BR, Bi IV, McMullen MD, Gaut BS. (2006) Variation in mutation dynamics across the maize genome as a function of regional and flanking base composition. Genetics 172:569–577.
Morton BR and So BG. (2000) Codon usage in plastid genes is correlated with context, position within the gene and amino acid content. J Mol Evol 50:184–193.
Ochman H. (2003) Neutral mutations and neutral substitutions in bacterial genomes. Mol Biol Evol 20:2091–2096.
Rose AB. (2002) Requirements for intron-mediated enhancement of gene expression in Arabidopsis. RNA 8:1444–1453.
Shpaer EG. (1986) Constraints on codon context in Escherichia coli genes. Their possible role in modulating the efficiency of translation. J Mol Biol 188:555–564.
Tran RK, Henikoff JG, Zilberman D, Ditt RF, Jacobsen SE, Henikoff S. (2005) DNA methylation profiling identifies CG methylation clusters in Arabidopsis genes. Curr Biol 15:154–159.
Wright SI, Yau CBK, Looseley M, Meyers BC. (2004) Effects of gene expression on molecular evolution in Arabidopsis thaliana and Arabidopsis lyrata. Mol Biol Evol 21:1719–1726.
Yarus M and Folley LS. (1985) Sense codons are found in specific contexts. J Mol Biol 182:529–540.