

MBE Advance Access originally published online on September 28, 2007
Molecular Biology and Evolution 2007 24(12):2755-2762; doi:10.1093/molbev/msm210

© The Author 2007. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Evidence for a Trade-Off between Translational Efficiency and Splicing Regulation in Determining Synonymous Codon Usage in Drosophila melanogaster

Tobias Warnecke and Laurence D. Hurst

Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, United Kingdom

E-mail: l.d.hurst{at}bath.ac.uk.


    Abstract
In Drosophila melanogaster, synonymous codons corresponding to the most abundant cognate tRNAs are used more frequently, especially in highly expressed genes. Increased use of such "optimal" codons is considered an adaptation for translational efficiency. Need it always be the case that selection should favor the use of a translationally optimal codon? Here, we investigate one possible confounding factor, namely, the need to specify information in exons necessary to enable correct splicing. As expected from such a model, in Drosophila many codons show different usage near intron–exon boundaries than in exon core regions. However, this finding is in principle also consistent with Hill–Robertson effects modulating usage of translationally optimal codons. Several results nonetheless support the splice model over the translational selection model: 1) the trends in codon usage are strikingly similar to those in mammals, in which codon usage near boundaries correlates with abundance in exonic splice enhancers (ESEs), 2) codons preferred near boundaries tend to be enriched for A and avoid C (conversely, those avoided near boundaries prefer C rather than A), as expected were ESEs involved, and 3) codons preferred near boundaries are typically not translationally optimal. We conclude that usage of translationally optimal codons is compromised in the vicinity of splice junctions in intron-containing genes, with the effect that we observe higher usage of translationally optimal codons at the center of exons. At the gene level, however, after controlling for known correlates of codon bias, the impact on codon usage patterns is quantitatively small. These results have implications for inferring aspects of the mechanism of splicing given nothing more than a well-annotated genome.

Key Words: splicing • codon usage bias • Hill–Robertson interference • ESE • Drosophila melanogaster


    Introduction
In a wide range of genomes analyzed to date, synonymous codons are not used with equal frequency despite coding for the same amino acid. Rather, codon usage is typically biased towards certain codons, reflecting a balance between mutational biases, drift, and selective forces (Ikemura 1985; Duret 2002; Bierne and Eyre-Walker 2006). This balance varies not only between species but also between genes within the same organism. Notably, in a variety of species, including Drosophila, some synonymous codons are used more frequently in highly expressed genes (Duret and Mouchiroud 1999). These preferred codons have been termed "optimal" because their use is thought to minimize the time of ribosomal occupancy and/or the rate of amino acid misincorporation relative to alternative synonymous codons (Akashi 1994; Duret 2002). We refer to selection for either rate or accuracy of translation generically as selection on translational "efficiency." The translational efficiency hypothesis is supported by the observation that in many, often distantly related, species optimal codons correspond to the most abundant cognate tRNAs (Ikemura 1985; Kanaya et al. 2001).

The genomic signature of this relationship is less pronounced in the fruitfly than in some other eukaryotes (Kanaya et al. 2001). However, this is likely owing to weakened selection following a recent reduction in population size (Akashi 1995; McVean and Vieira 2001) rather than to qualitatively different selective forces operating on codon usage in Drosophila. Additional factors contributing to skewed codon usage are usually analyzed within this translational efficiency framework. For example, the degree of selective constraint on the encoded protein (Bierne and Eyre-Walker 2006), mutational biases (e.g., biased gene conversion) (Kliman and Hey 1994; Duret 2002; Bierne and Eyre-Walker 2006), recombination rate (Hey and Kliman 2002), Hill–Robertson interference (Hey and Kliman 2002; Bierne and Eyre-Walker 2006), and protein length (Duret and Mouchiroud 1999) are considered to correlate with or modulate selection for translationally optimal codon usage.

Need it always be the case that selection, if unrestricted, should favor the use of a translationally optimal codon? Here, we investigate one possible confounding factor, namely, the need to specify information in exons necessary to enable correct splicing. This can include binding sites in exons for serine-arginine–rich (SR) type proteins (Blencowe 2000). These binding sites, known as exonic splice enhancers (ESEs), are critical for the faithful removal of introns from pre-mRNA transcripts, especially in species with a complex intron–exon structure where regulated splicing may require weak splice sites (Ast 2004; Dewey et al. 2006; Garg and Green 2007; Ram and Ast 2007).

Efforts to characterize ESEs on a genome-wide scale have been made for human and mouse (Fairbrother et al. 2002; Fairbrother, Yeo, et al. 2004), zebrafish (Yeo et al. 2004), Caenorhabditis elegans (Robinson 2005), and recently Arabidopsis thaliana (Pertea et al. 2007). To our knowledge, a comprehensive, genome-wide survey of ESE motifs in Drosophila has yet to be undertaken. However, the available evidence suggests that ESEs are important in Drosophila and function like those characterized in mammals. First, in genes where exonic splicing regulation has been examined in some detail, notably ‘doublesex’ and ‘fruitless,’ purine-rich elements have been attributed key roles (Lynch and Maniatis 1996; Heinrichs et al. 1998; Labourier et al. 1999), just as they have in mammals (Blencowe 2000). Second, Drosophila ESEs interact with members of the SR protein family (Labourier et al. 1999; Kim et al. 2003), which is strongly associated with ESEs in vertebrates (Blencowe 2000).

With a genome-wide characterization of ESEs currently lacking in Drosophila, we use the enrichment of codons near intron–exon boundaries as a possible surrogate for the involvement of codons in splice-regulatory elements. Although patterns of enrichment may be caused by splice-related factors other than ESEs, for example, the avoidance of cryptic splice sites (Eskesen et al. 2004), prior evidence suggests that ESE involvement is the best predictor (Chamary and Hurst 2005a; Parmley et al. 2007).

We find that, in Drosophila, certain codons are indeed significantly enriched or avoided near intron–exon boundaries. Aside from splice-related constraints, there is, however, a qualitatively different explanation for such deviations, namely, that they reflect stronger selection for translational efficiency owing to reduced Hill–Robertson interference (for a recent explanation of the Hill–Robertson effect, see Comeron et al. 2007). The finding that in intronless Drosophila genes usage of translationally optimal codons is reduced in the center of the gene is consistent with such a force (Comeron and Kreitman 2002). As applied to patterns within exons, this model rests on the presumption that selection is weaker in introns than in exons; hence, codons near intron–exon boundaries have strong selection on only one side of them, whereas those in exon cores are flanked by sites under selection in both 5' and 3' directions. In this paper, we ask in part whether the trends we observe are better explained by selection for splicing than by Hill–Robertson effects modulating the use of translationally optimal codons.

To this end, we ask whether the trends in codon usage near intron–exon boundaries concord with those seen in species (notably mouse) in which ESEs have been described and in which ESE involvement accounts for much of the pattern of codon usage near intron–exon boundaries (Parmley and Hurst 2007). In addition, as codons participating in ESE motifs, characterized in some detail in a number of vertebrates, were found to be A-rich and C-poor (Blencowe 2000; Fairbrother, Yeo, et al. 2004; Parmley et al. 2007), we ask whether preferred codons tend to be rich in A and avoid C (and conversely whether those codons avoided near boundaries are more commonly rich in C rather than A). Third, we ask whether there is an incongruity between synonymous codons preferred near boundaries and those identified as translationally optimal. We report that in all 3 tests, splice control is a better explanation for the trends than Hill–Robertson effects. Finally, we attempt to quantify to what extent, in intron-containing genes, the need to accommodate splicing-related sequence compromises optimal codon usage. To this end, we quantify the deviation from translational optimality introduced by the presence of introns and ask whether it is greater for genes with a higher proportion of sequence in the vicinity of splice sites and how it compares with known correlates of codon usage bias.


    Material and Methods
Expression Data
Organism-wide gene expression data for adult Drosophila melanogaster were obtained from Flyatlas (www.flyatlas.org) using the FlyMean track. Nonspecific hybridization and other factors can lead to transcripts being incorrectly identified as expressed when, in fact, they are not. In order to reduce the number of such false positives, especially among the set of genes expressed at low levels, only transcripts that show significant expression in at least 3 out of 4 replicates, as computed by Affymetrix software (FlyCall ≥ 3), were retained. In addition, we excluded all transcripts of genes present in the codon usage training set of Carbone et al. (2003) (141 genes, available at http://www.ihes.fr/~materials/genomes/Dmelanogaster/refset.txt) as well as all transcripts annotated as ribosomal by the Gene Ontology Consortium (GO:0005840); many of these show high homology and/or coexpression and might, as a result, have biased regression analyses by forming a cluster of high leverage at the upper end of the expression range.

Sequence Data
For all transcripts in the reduced Flyatlas data set, sequences with annotated intron–exon structure were downloaded from the University of California Santa Cruz genome browser (http://genome.ucsc.edu/cgi-bin/hgTables) using the Flybase gene track (April 2004 assembly). Genes were discarded in any of the following cases: 1) transcripts had no conventional start (ATG) or termination codon (TAA, TAG, TGA), 2) transcripts had an internal in-frame stop codon, 3) exonic sequence was not a multiple of 3 nt and hence unlikely to be coding for a protein product, or 4) one or more introns in the gene had other than canonical splice sites (GT–AG). Furthermore, given the analytical importance of distinguishing whether or not exonic sequence is proximal to intronic sequence and therefore possibly involved in splicing regulation, of the 1,873 apparently intronless transcripts we excluded those for which alternative intron-containing splice products were annotated in Flybase (119) or which had more than one exon annotated in matching RefSeq entries (11).

The final data set for which adequate and reliable information was available for both expression and sequence characteristics comprises 9,745 transcripts, including 1,703 intronless transcripts. Supplementary table 3 (Supplementary Material online) contains by-gene information about relevant sequence characteristics and codon usage biases.

Codon Abundance
Exons were trimmed to contain only full codons. The first and last full codons were discarded given their known involvement in the splice site consensus. For each codon separately, relative abundance near the intron–exon boundary was determined for the first 34 codon positions across all trimmed exons, separately for the 5' and 3' ends of exons (for details, see Parmley et al. 2007).
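The per-position counting described above can be sketched as follows. This is a simplified illustration, not the procedure of Parmley et al. (2007): it assumes each exon is supplied with a hypothetical `offset` (the number of leading nucleotides belonging to a codon split by the upstream intron), and it defines relative abundance, as an assumption, as a codon's frequency at a position divided by its mean frequency over all positions. Only the 5' end is shown; the 3' end would be handled by counting codons backwards from the exon end.

```python
def codons_from_5prime(exon: str, offset: int, n_positions: int = 34) -> list[str]:
    """Full codons read from the 5' end of an exon, first and last removed.

    `offset` (hypothetical) is the number of leading nucleotides that belong
    to a codon split by the upstream intron.
    """
    trimmed = exon[offset:]
    trimmed = trimmed[: len(trimmed) - len(trimmed) % 3]  # keep full codons only
    codons = [trimmed[i:i + 3] for i in range(0, len(trimmed), 3)]
    # Drop the first and last full codons (splice-site consensus).
    return codons[1:-1][:n_positions]


def relative_abundance(exons: list[tuple[str, int]], codon: str,
                       n_positions: int = 34) -> list[float]:
    """Frequency of `codon` at each position, scaled by its mean frequency (assumed definition)."""
    hits = [0] * n_positions
    totals = [0] * n_positions
    for seq, offset in exons:
        for pos, c in enumerate(codons_from_5prime(seq, offset, n_positions)):
            totals[pos] += 1
            hits[pos] += int(c == codon)
    # Positions that no exon reaches (all exons shorter than 34 codons) are dropped.
    freqs = [h / t for h, t in zip(hits, totals) if t > 0]
    mean = sum(freqs) / len(freqs) if freqs else 0.0
    return [f / mean if mean else 0.0 for f in freqs]
```

The list returned for each codon, paired with distances 1, 2, 3, ... from the boundary, is the kind of input a per-codon regression of abundance on distance would take.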

Codon Adaptation Index
The codon adaptation index (CAI) measures the extent to which a gene uses synonymous codons thought to be translationally favorable because they are more abundant in very highly expressed genes. Values range from 0 to 1, with 1 indicating perfect adaptation. CAI is highly correlated with some other commonly used measures of codon bias (Hey and Kliman 2002) and provides an accurate description of codon bias even for relatively short sequences (Comeron and Aguadé 1998). CAI for full and partial coding sequences was computed using the codonW program (J. Peden), supplying D. melanogaster–specific CAI adaptiveness values as determined by Carbone et al. (2003) (http://www.ihes.fr/~materials/genomes/Dmelanogaster/wv.txt).
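For readers unfamiliar with the index, CAI as defined by Sharp and Li (1987) is the geometric mean of the relative adaptiveness values w of the codons in a sequence. The sketch below is not codonW; it assumes a dictionary of positive weights (in the analysis above these came from the Carbone et al. [2003] table) and simply skips codons absent from that dictionary, such as those for Met, Trp, and stops. The weights in the example are made up.

```python
import math


def cai(cds: str, weights: dict[str, float]) -> float:
    """Codon adaptation index: geometric mean of relative adaptiveness values.

    `weights` maps codons to positive adaptiveness values w (0 < w <= 1).
    Codons without an entry (e.g. ATG, TGG, stop codons) are skipped.
    """
    log_sum, n = 0.0, 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        w = weights.get(cds[i:i + 3].upper())
        if w is None:
            continue
        log_sum += math.log(w)  # raises ValueError if a weight is 0
        n += 1
    if n == 0:
        raise ValueError("no codons with adaptiveness values")
    return math.exp(log_sum / n)


# Made-up weights for proline codons, for illustration only.
toy_weights = {"CCC": 1.0, "CCA": 0.25, "CCT": 0.5, "CCG": 0.5}
print(cai("ATGCCCCCATAA", toy_weights))  # geometric mean of 1.0 and 0.25, about 0.5
```

Working in log space avoids underflow when multiplying many weights for long genes.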

Conflict Resolution Index
Conflict resolution index (CRI) was computed as follows. Codons were assigned to 1 of 3 classes: "favoring translation efficiency" (coded c = 1; 19 codons, black background in fig. 2), "favoring splicing regulation" (coded c = 2; 18 codons, white background), or "uninformative" (ignored; 22 codons, gray background). Each informative codon was assigned a weight representing the conflict-relevant degeneracy of the associated amino acid. For example, the 4-fold degenerate (d = 4) amino acid proline (P) has 1 codon that resolves the conflict in favor of translation efficiency (CCC, s = 1) and 2 codons preferred near the boundary and hence assumed to resolve the conflict in favor of ESEs (CCA, CCT, s = 2). The weight of a codon is then simply the degeneracy divided by the number of codons resolving the conflict in the same direction (w = d/s), that is, 4/1 = 4 for CCC and 4/2 = 2 for CCA or CCT. CRI is then computed as the sum of weighted codes divided by the sum of weights over all informative codons.


Figure 2
FIG. 2.— Information on optimal codons and relative codon abundance near intron–exon boundaries for all degenerately coded amino acids in Drosophila melanogaster. A codon is marked as preferred or avoided near the intron–exon boundary when there is a significant correlation between distance from the boundary and relative codon abundance after correction for multiple testing (see supplementary table 3, Supplementary Material online). Significant correlations obtained using a random 50% sample of the original set of genes are marked (+). Note that ‘preferred’ codons and optimal codons form almost perfectly exclusive groups and that, moreover, optimal codons are frequently classified as ‘avoided.’ See Material and Methods for the relevance of differential codon shading.

 
Supplementary table 4 (Supplementary Material online) contains a full list of codes and weights for all informative codons.
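The weighting scheme can be made concrete with the proline example given above. The sketch below is an illustration under stated assumptions: only the proline codes (CCC coded 1; CCA and CCT coded 2; CCG uninformative) are taken from the text, the helper names are hypothetical, and a full analysis would load the codes for all amino acids from supplementary table 4.

```python
from collections import Counter


def family_weights(degeneracy: int, codes: dict[str, int]) -> dict[str, float]:
    """Weight w = d / s for each informative codon of one amino acid.

    `codes` maps informative codons to 1 (translation) or 2 (splicing);
    s is the number of codons in the family sharing that code.
    """
    s = Counter(codes.values())
    return {codon: degeneracy / s[c] for codon, c in codes.items()}


def cri(cds: str, codes: dict[str, int], weights: dict[str, float]) -> float:
    """Sum of weighted codes divided by the sum of weights (informative codons only)."""
    num = den = 0.0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in codes:  # uninformative codons are ignored
            num += weights[codon] * codes[codon]
            den += weights[codon]
    return num / den if den else float("nan")


pro_codes = {"CCC": 1, "CCA": 2, "CCT": 2}   # CCG is uninformative
pro_weights = family_weights(4, pro_codes)   # {'CCC': 4.0, 'CCA': 2.0, 'CCT': 2.0}
print(cri("CCCCCACCTCCG", pro_codes, pro_weights))  # (4*1 + 2*2 + 2*2) / 8 = 1.5
```

The index ranges from 1 (only translation-favoring codons) to 2 (only splicing-favoring codons); the weights only balance the contributions of families with different numbers of codons on each side.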


    Results
Trends in Codon Usage near Intron–Exon Boundaries Are Well Conserved and Reflect ESE Nucleotide Content
In human and mouse, relative amino acid abundances change as one approaches the intron–exon boundary, and these changes are well predicted by the involvement of the underlying codons in ESEs (Parmley et al. 2007). A subsequent analysis revealed that equivalent patterns exist in Drosophila exons and that amino acids preferred or avoided near intron–exon junctions correspond almost perfectly to those observed in vertebrates (Warnecke T, Parmley JL, Hurst LD, unpublished data). We now confirm that this correspondence extends to the codon level, just as it does in mammals (Parmley and Hurst 2007).

We fitted linear regression models to describe each trend in codon abundance (relative codon usage vs. distance from intron–exon boundary). A negative slope indicates a codon preferred near boundaries. We then compared the slope coefficients (β), as a measure of both the direction and strength of preference trends, for all degenerate codons between Drosophila and vertebrates. We found them to be very highly correlated (fig. 1). This indicates a striking level of conservation of patterns of codon usage across metazoa in the vicinity of exon–intron boundaries. Moreover, given that vertebrate patterns can be accounted for in large part by the need to specify SR-binding motifs (i.e., ESEs) (Parmley and Hurst 2007; Parmley et al. 2007), this strongly suggests that ESE coding might also explain abundance trends in the vicinity of intron–exon boundaries in Drosophila.
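The two-step comparison above (a least-squares slope per codon, then a rank correlation of slopes across codons between species) can be sketched without any statistics library. The function names are hypothetical, and tied values receive average ranks, which is one standard way of computing Spearman's coefficient; the software actually used by the authors is not specified here.

```python
import math


def ols_slope(x: list[float], y: list[float]) -> float:
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with ties given the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a: list[float], b: list[float]) -> float:
    """Spearman's coefficient as the Pearson correlation of average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    mean = (len(ra) + 1) / 2  # mean rank is (n + 1) / 2 even with ties
    num = sum((x - mean) * (y - mean) for x, y in zip(ra, rb))
    den = math.sqrt(sum((x - mean) ** 2 for x in ra) * sum((y - mean) ** 2 for y in rb))
    return num / den


# Hypothetical toy data: relative abundance at distances 1..5 from the boundary.
beta = ols_slope([1, 2, 3, 4, 5], [1.4, 1.2, 1.1, 1.0, 0.9])  # negative: enriched near boundary
# With one slope per codon and species, spearman(fly_slopes, mouse_slopes)
# would compare the two species across codons.
```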


Figure 1
FIG. 1.— Relationship between slope coefficients (β) of linear regression models fitted to codon abundance patterns across (A) the 5' ends and (B) the 3' ends of mouse and Drosophila internal exons. Each data point represents one degenerate codon. Negative values indicate that the respective codon is relatively more frequent near the intron–exon boundary. The steeper the negative/positive slope, the more dramatic the preference/avoidance trend (for details on particular codons, see supplementary table 1 [Supplementary Material online]). Slope coefficients are highly correlated (βDrosophila ~ βhuman 5': Spearman's r = 0.74, 3': r = 0.77; βDrosophila ~ βmouse 5': r = 0.74, 3': r = 0.71; all: P < 2.2E–16, N = 59), that is, similar codons are preferred or avoided at exon termini in mouse and Drosophila.

 
Are the codons preferred near boundaries rich in A and depleted for C, as expected if owing to selection on ESEs? We find this to be so: codons significantly enriched near the boundary are unusually rich in A (43% of nucleotides) but depleted in C (10%), whereas the reverse is observed in codons that are avoided (A: 8%, C: 27%; chi-square test statistic = 7.6, P < 0.006), a pattern also characteristic of ESEs in vertebrates (Parmley et al. 2007).
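The composition test can be illustrated by pooling the nucleotides of the preferred and of the avoided codons into a 2 × 2 table of A versus C counts and applying Pearson's chi-square test with 1 degree of freedom. This is a sketch under assumptions: the text does not state exactly how the table was built or whether a continuity correction was used (none is applied here), and the codon lists in the example are hypothetical, not the sets shown in fig. 2.

```python
import math


def base_counts(codons: list[str]) -> tuple[int, int]:
    """Number of A and C nucleotides across a list of codons."""
    seq = "".join(codons)
    return seq.count("A"), seq.count("C")


def chi_square_2x2(table: list[list[int]]) -> tuple[float, float]:
    """Pearson chi-square statistic and P value (1 df, no continuity correction)."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p


# Hypothetical codon sets, for illustration only.
preferred = ["GAA", "AAA", "CCA"]
avoided = ["GAC", "CCC", "AAC"]
table = [list(base_counts(preferred)), list(base_counts(avoided))]  # rows: preferred, avoided
stat, p = chi_square_2x2(table)
```

The identity P(χ²₁ > x) = erfc(√(x/2)) lets the 1-df P value be computed with the standard library alone.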

Translationally Optimal Codons Are Not Splice Optimal Codons
Were the translational selection model correct, we should expect that those codons preferred near splice sites would, owing to weaker Hill–Robertson interference, be the translationally optimal ones, just as such codons are enriched at the periphery of intronless genes. Are, then, codons favored near boundaries translationally optimal? Figure 2 shows significant codon abundance trends near the boundary across internal exons in Drosophila (applying Bonferroni-corrected significance thresholds; see supplementary table 1 [Supplementary Material online] for statistics for boundary-proximal trends for all codons) alongside information on which codon is translationally optimal for each degenerately coded amino acid. We find that, with the exception of CGT, codons putatively involved in ESEs (preferred near the boundary) are never translationally optimal. Furthermore, translationally optimal codons are frequently avoided near the boundary. It follows that a majority of synonymous codons (37/59 = 62.71%) can be reconciled with either exonic splicing regulation or translation efficiency but not both, whereas only a single codon caters for both needs (CGT: 1/59 = 1.69%), with the remaining codons not attributable to either group (21/59 = 35.59%).

Adaptation for Translation Efficiency Is Lower in Exonic Sequence Flanking Introns
Given the above results, we should expect that exon cores should be enriched in translationally optimal codons compared with exon flanks. Moreover, we should expect that the difference between cores and flanks should be more marked for highly expressed genes. To address these issues, we compiled a set of 9,745 nuclear Drosophila genes for which reliable expression and sequence information were available (see Material and Methods). The majority of primary transcripts (8,042/9,745 = 83%) are interrupted by at least one intron. In the absence of comprehensive protein abundance data for Drosophila, we approximated translation levels by transcription levels in adult fruitfly as determined by microarray analysis. We use the CAI (Sharp and Li 1987) as a measure of adaptation for translational efficiency (see Material and Methods).

As previously reported (Duret and Mouchiroud 1999; Bierne and Eyre-Walker 2006), quantitative differences in expression correlate with CAI, explaining approximately 9% of the variance in CAI (supplementary fig. 1, Supplementary Material online). To test the splice constraint model, we examined the difference in CAI between sequence in the center of exons (cores) and sequence proximal to introns (flanks) for individual genes, ΔCAI = (CAIcore – CAIflank)/((CAIcore + CAIflank)/2). Flanks were defined as sequence within 48 nt of an intron–exon boundary. This figure was chosen because the vast majority of functional ESEs can be assumed to fall within this region, especially as ESEs are known to function in a position-dependent manner and to catalyze splicing less efficiently with increasing distance from the splice site (Graveley et al. 1998).
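The flank/core partition and ΔCAI can be sketched as below. To sidestep reading-frame bookkeeping, this sketch assumes each exon is already given as a list of full codons, together with hypothetical flags saying whether an intron lies upstream and/or downstream; a 48-nt flank then corresponds to 16 codons. Exon termini that abut the start or stop of the coding sequence rather than an intron are treated as core, and exons too short to have a core are assigned wholly to the flanks; both are assumptions, not details taken from the text.

```python
Exon = tuple[list[str], bool, bool]  # (codons, intron upstream?, intron downstream?)


def split_flanks_core(exons: list[Exon], flank_codons: int = 16) -> tuple[list[str], list[str]]:
    """Concatenate codons within `flank_codons` of an intron (flanks) and the rest (cores)."""
    flanks: list[str] = []
    cores: list[str] = []
    for codons, up, down in exons:
        start = flank_codons if up else 0
        end = len(codons) - flank_codons if down else len(codons)
        if start >= end:  # exon too short to have a core
            flanks.extend(codons)
            continue
        flanks.extend(codons[:start])
        cores.extend(codons[start:end])
        flanks.extend(codons[end:])
    return flanks, cores


def delta_cai(cai_core: float, cai_flank: float) -> float:
    """Difference in CAI between cores and flanks, scaled by their mean."""
    return (cai_core - cai_flank) / ((cai_core + cai_flank) / 2)


# Hypothetical two-intron gene: flanks total 4 x 16 = 64 codons (192 nt).
gene = [(["AAA"] * 20, False, True), (["CCC"] * 40, True, True), (["GGG"] * 20, True, False)]
flanks, cores = split_flanks_core(gene)
```

CAI would then be computed separately on the concatenated flank and core codons and passed to `delta_cai`.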

For each gene, we concatenated all flanks and all cores, respectively, trimmed so that they only contained complete codons. Only genes with a minimum of 192 nt in each category were considered in analyses relating to flank/core differences. This effectively excludes genes with fewer than 2 introns (each intron contributes two 48-nt flanks, so N = 2 introns yield 48 × 2 × 2 = 192 nt) but was considered prudent to avoid misleading CAI values for short sequences. We find that, as expected, for the average gene, adaptation towards translation efficiency is higher in exon cores (median [ΔCAI] = 0.06993, P = 0, Wilcoxon signed-rank test; N = 5,529). The deviation is even stronger when considering individual internal exons (≥192 nt), regardless of whether we define cores to be the total exonic sequence minus flanks (≥96 nt; median [ΔCAI] = 0.073, P = 0, Wilcoxon signed-rank test; N = 12,026) or the center-most portion of an exon of equal length to the flanks (=96 nt; median [ΔCAI] = 0.086, P = 0, Wilcoxon signed-rank test; N = 12,026).
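A paired test of this kind can be sketched with the large-sample normal approximation to the Wilcoxon signed-rank statistic, applied to per-gene ΔCAI values to ask whether their median differs from 0. This is a sketch only: zero differences are dropped, tied absolute values receive average ranks, and the variance is not corrected for ties; dedicated implementations such as `scipy.stats.wilcoxon` handle these details and small samples more carefully.

```python
import math


def wilcoxon_signed_rank(diffs: list[float]) -> tuple[float, float]:
    """W+ statistic and two-sided P value (normal approximation, no tie correction)."""
    nz = [d for d in diffs if d != 0]  # zero differences carry no sign
    n = len(nz)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(nz[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(nz[order[j + 1]]) == abs(nz[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank for ties
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, nz) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (w_plus - mean) / math.sqrt(var)
    return w_plus, math.erfc(abs(z) / math.sqrt(2))
```

For instance, ten positive differences give W+ = 55 and a two-sided P of roughly 0.005, while perfectly symmetric differences give P = 1.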

Also as predicted, the difference between cores and flanks is more pronounced in highly expressed genes (Spearman's r = 0.04986, P = 0.0002, N = 5,529), albeit marginally so, suggesting that the leverage of selection to produce translationally well-adapted sequence is somewhat lower in regions flanking intron–exon boundaries.

The above results, although certainly supportive of the role of selection for splice efficiency near intron–exon boundaries, fail to explicitly consider the dual demands on selection for translationally optimal codons and for splice optimal codons. To examine this, we developed the CRI to measure to what extent degenerate amino acids in a given sequence are specified by either splice optimal or translationally optimal codons (see Material and Methods). CRI values closer to 1 indicate that there is a greater tendency to encode amino acids with translationally optimal codons. In the current analysis, we examined intragenic differences so that controlling for regional nucleotide background was not considered imperative.

When we computed gene-specific differences in conflict resolution between exon cores and flanks (ΔCRI) for the same set of genes (N = 5,529), we found that, on average, exon cores have lower CRI values (median [ΔCRI] = –0.0285, P = 0, Wilcoxon signed-rank test; N = 5,529), indicating, as expected, that the conflict is resolved in favor of translation efficiency more frequently in exon cores than in exon flanks. We obtained qualitatively equivalent results when we determined codon abundance trends from a random 50% sample of genes (indicated in parentheses in fig. 2) and used those trends to calculate CRI in the remaining 50% of genes.

Like ΔCAI, ΔCRI shows a weak association with expression in the expected direction (Spearman's r = –0.0385, P = 0.004, N = 5,592). These results support the conclusions drawn from ΔCAI values but tie them more cogently to both conflicting coding demands. Redefining flanks to be shorter only strengthens the results (flank: 21 nt, minimum concatenated flank 84 nt: ΔCAI = 0.109, ΔCRI = –0.0452, N = 5,498; flank: 30 nt, minimum concatenated flank 120 nt: ΔCAI = 0.091, ΔCRI = –0.038; N = 5,809; all P << 0.0001), presumably because shorter flanks can be expected to have higher average ESE density (Fairbrother, Holste, et al. 2004).

Genes with a Higher Proportion of Coding Sequence near the Boundary Exhibit Lower CAI, but the Effect Is Weak
The above results all strongly argue against the translational selection/Hill–Robertson model and for the splice constraint model as an explanation for altered codon usage near exon–intron boundaries in Drosophila. Assuming this, we can then ask how much selection for translationally optimal codons is underestimated if a gene has introns. Making the assumption that CAI in core regions more adequately reflects the degree to which codon usage has been optimized for translation efficiency, we can estimate the error introduced when looking at the entire coding sequence of a gene. Figure 3 plots the proportional deviation of core CAI from whole-gene CAI. The median of the distribution is shifted to the left (median = –0.0151, P = 0, Wilcoxon signed-rank test; N = 5,529) suggesting that whole-gene estimates of CAI will on average underestimate true adaptation by 1.5% in comparison to intronless genes where CAIcore = CAIwhole-gene. Thus, the average effect of eliminating exon flanks when calculating CAI is very modest in quantitative terms. However, for an appreciable proportion of genes, CAI is underestimated rather more substantially (see fig. 3).


Figure 3
FIG. 3.— Distribution of proportional deviations of CAIcore from CAIwhole-gene for Drosophila genes with a minimum of 192 nt in concatenated flank as well as core regions (N = 5,529). The dashed line indicates the median (median deviation = –0.0151, P = 0, Wilcoxon signed-rank test). Dotted lines and associated labels indicate that CAIcore values for 10% (2.5%) of genes are at least 9% (16%) larger than CAI values computed including sequence flanking intron–exon boundaries.

 
The above results suggest that genes with a high proportion of coding sequence near (e.g., within 50 nt of) intron–exon boundaries should, other things being equal, show less optimal adaptation for translational efficiency. But, at the gene level, how strong is any such effect compared with other predictors of CAI? Given that the proportion of sequence near the boundary (proportion of sequence within 50 nt of an intron–exon boundary [Prop50]) is correlated with known predictors of codon usage bias, notably protein length (Spearman's r = –0.5609, P = 0, N = 5,529), often in a nonlinear fashion, we employed an ordinal logistic regression model to tease out any independent contribution of Prop50.
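Prop50 can be computed from the lengths of a gene's coding exons alone. The sketch below assumes exon lengths are given in transcript order and restricted to coding sequence, and counts each coding nucleotide once if it lies within 50 nt of any internal intron–exon junction; the function name is hypothetical.

```python
def prop50(exon_lengths: list[int], window: int = 50) -> float:
    """Fraction of coding nucleotides within `window` nt of an intron-exon junction."""
    total = sum(exon_lengths)
    if total == 0:
        raise ValueError("empty coding sequence")
    last = len(exon_lengths) - 1
    near = 0
    for idx, length in enumerate(exon_lengths):
        up = idx > 0        # an intron precedes this exon
        down = idx < last   # an intron follows this exon
        if up and down:
            near += min(length, 2 * window)  # both ends count; short exons count once
        elif up or down:
            near += min(length, window)
    return near / total
```

For exons of 100, 300, and 100 nt, 50 + 100 + 50 = 200 of 500 nt lie near a junction, so Prop50 = 0.4; an intronless gene scores 0.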

CAIwhole-gene values were partitioned into bins containing an equal number of genes and used as the dependent variable. Prop50 was entered alongside other variables (table 1) as a potential predictor. The variance explained by such a model is necessarily small because we lose prodigious amounts of information by considering ordinal bins. However, we can nonetheless gain an insight into whether Prop50 makes an independent contribution, its relative size and direction. We recover (in order of relative contribution) expression level, protein length, total length of intronic sequence (compare Comeron and Kreitman 2002), and also Prop50 as independent predictors of CAI. Table 1 contains the results of a mixed stepwise ordinal logistic regression model using 10 bins, but the results are robust for a range of bin sizes (supplementary table 2, Supplementary Material online).
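Partitioning CAI values into bins holding equal numbers of genes amounts to binning by rank. A minimal sketch follows; ties are broken by input order (an assumption), and the resulting 0-based bin labels would serve as the ordinal response in a proportional-odds model, for example `OrderedModel` in statsmodels. The stepwise selection of predictors is not shown.

```python
def equal_frequency_bins(values: list[float], n_bins: int = 10) -> list[int]:
    """Assign each value a bin label 0..n_bins-1 so that bins hold (nearly) equal counts."""
    if n_bins < 1 or not values:
        raise ValueError("need at least one bin and one value")
    # sorted() is stable, so tied values keep their input order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(values)
    return labels
```

With 20 genes and 10 bins, each bin receives exactly 2 genes; when the number of genes is not a multiple of the number of bins, bin sizes differ by at most one.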


Table 1 Results from an Ordinal Logistic Regression (10 Bins, Stepwise Mixed Parameter Selection)

 
The relative contribution of Prop50 is small, consistently less than 5% of the variance explained by expression level, but significant and in the expected direction, that is, genes with a higher proportion of sequence near the boundary show lower CAI. The number of introns makes no independent contribution when Prop50 is included but features among the significant predictors when Prop50 is not considered (data not shown). We also find a positive correlation between CAI and the total length of intronic sequence, which might be explained by Hill–Robertson effects, with long interspersed introns reducing selection interference between loci within the same gene (Comeron and Kreitman 2002).

Might Stronger Hill–Robertson Effects near Intron–Exon Junctions Explain Observed Trends?
In drawing conclusions about the relative importance of splice-related selection over selection on translational efficiency in determining codon usage near intron–exon boundaries, we make the assumption that interference is weaker in coding regions flanking introns than in exon cores. The inverse scenario, namely, that Hill–Robertson interference is stronger in sites flanking introns, would provide an alternative explanation of reduced codon bias at intron–exon junctions but appears unparsimonious for 3 reasons.

First, although there is evidence that Drosophila intronic sequence is frequently under greater selective constraint than synonymous sites (Andolfatto 2005), we would still expect coding sequence, two-thirds of which consists of typically much more highly constrained nonsynonymous sites (Andolfatto 2005), to exhibit higher levels of interference. This expectation is confirmed by empirical evidence from Drosophila that the presence of intronic sequence does in fact ameliorate rather than intensify Hill–Robertson interference (Comeron and Kreitman 2002).

Second, such a model fails, for example, to explain why the observed trends should both match those observed in mice and accord with the predicted overrepresentation of A and underrepresentation of C. Finally, the model is inconsistent with data from long exons. If introns impose greater Hill–Robertson interference than exons, then we expect the core regions of very large exons to show the greatest difference in CAI compared with exon flanks, as they would be most distant from the strongly interfering sites. By contrast, if coding sequence imposes stronger interference, we expect core parts of long exons to show lower CAI and less difference between center and flanks. Analysis of long individual exons (upper 5% of the exon length distribution, i.e., exons longer than 1,218 nt) supports the second possibility: exon cores show only very weakly higher codon adaptation (median [ΔCAI] = 0.01, P = 0.028, Wilcoxon signed-rank test; N = 1,070), and the difference disappears when cores are defined as centrally located sequence of the same length as the flanking regions (=96 nt, median [ΔCAI] = –0.003, P = 0.683, Wilcoxon signed-rank test; N = 1,070). We conclude that our assumption of weaker Hill–Robertson interference in exon flanks is robust.


    Discussion
Selection for translationally optimal codons is phylogenetically widespread but heterogeneous within genomes and even within individual genes, reflecting a complex interplay of neutral and selective forces. In addition, it has become increasingly apparent that selection on synonymous sites is as mechanistically diverse as it is frequent (Chamary and Hurst 2005a, 2005b; Chamary et al. 2006; Resch et al. 2007). Indeed, we are not the first to point out that multiple selection pressures can conflict over which synonymous codon to use. For example, the need to encode ribosome-binding motifs has been shown to bring about translationally suboptimal codon choice in Escherichia coli (Smith and Eyre-Walker 2001). Likewise, Carlini et al. (2001) showed for some highly transcribed Drosophila genes that optimal codons are avoided because they would generate adverse mRNA secondary structures. Furthermore, the 5' and 3' regions of genes can show markedly reduced frequencies of optimal codons, likely owing to the presence of regulatory elements (Qin et al. 2004).

Similarly, aside from the splice-related selection that we have described here, several other forces may contribute to intragenic variation in codon usage. Qin et al. (2004) showed for some prokaryotes and budding yeast that codon usage bias tends to increase towards the 3' end of a gene. This is consistent with purifying selection against nonsense errors, which are more costly the more of the partial protein has already been produced (Bulmer 1988; Eyre-Walker and Bulmer 1993). Systematic intragenic variation is also associated with differences in the domain-specific functional importance of amino acid residues (Lin et al. 2003), with trinucleotide repeats (Desai et al. 2004), and with the origin and differential expression history of gene parts (e.g., the chimeric jingwei gene in Drosophila; Zhang et al. 2005). The correlation between codon usage and whether sequence lies in alternative or constitutive exons (Iida and Akashi 2000) may reflect either expression-related or splice-related selection.

These findings and the current study highlight that, to understand both intra- and interlocus variation in codon usage, we need to be aware that competing demands on synonymous sites exist and that selection can modify codon usage on a very fine spatial scale. Codon bias in a larger sequence is unlikely to be the result of forces acting homogeneously across the sequence range but rather constitutes the combined effect of regional sequence characteristics and locally resolved conflicting selection pressures.

A further important corollary of our work is that one should not extrapolate findings from single-exon genes to single exons within genes. Whereas codon usage bias in Drosophila single-exon genes follows a U-shaped trajectory (bias elevated towards both ends), attributed to Hill–Robertson interference (Comeron and Kreitman 2002; Qin et al. 2004; Comeron and Guthrie 2005), the opposite is true within individual exons. Although Hill–Robertson forces might still be present (they may indeed make selection on splice efficiency less subject to interference), they do not leave their mark as an enrichment in translationally optimal codons in the vicinity of intron–exon boundaries.

That splice-related selection dominates over translational selection at exon flanks has at least 2 further important implications. First, determining which sequences function as ESEs is typically labor intensive and requires considerable experimentation. If we assume that patterns of codon usage near intron–exon boundaries reflect selection to preserve ESEs, rather than selection for translationally optimal codons, this opens up the possibility of inferring which sequences have a high probability of functioning as ESEs, given nothing more than a well-annotated genome. Codons with negative slopes (usage declining with distance from the boundary) are more likely to be involved, those with positive slopes less likely. Translating this possibility into a robust method is beyond the scope of this paper and is left to future work.
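The slope criterion can be made concrete with a rough Python sketch. It only illustrates the kind of statistic involved and is not the method used in this study: it pools in-frame codons from exons (assumed to begin in phase 0), computes each codon's frequency relative to its synonymous family in bins of distance from the nearest exon end, and fits an ordinary least-squares slope against the bin midpoints. The codon-to-amino-acid table is supplied by the caller, and the bin width (`bin_nt`) and distance cutoff (`max_dist`) are arbitrary assumptions; the function name is hypothetical.

```python
from collections import defaultdict


def boundary_slopes(exons, codon_to_aa, bin_nt=30, max_dist=150):
    """OLS slope of within-family codon frequency vs distance (nt) to the
    nearest exon end, pooled over exons.

    Assumes every exon starts in phase 0. Codons absent from `codon_to_aa`
    (e.g. stops) are ignored, as are codons >= `max_dist` nt from both ends.
    Returns {codon: slope}; negative = enriched near boundaries.
    """
    counts = defaultdict(lambda: defaultdict(int))     # bin -> codon -> n
    aa_totals = defaultdict(lambda: defaultdict(int))  # bin -> amino acid -> n
    for exon in exons:
        length = len(exon)
        for i in range(0, length - 2, 3):
            codon = exon[i:i + 3]
            if codon not in codon_to_aa:
                continue
            dist = min(i, length - (i + 3))  # nt to the nearest exon end
            if dist >= max_dist:
                continue
            b = dist // bin_nt
            counts[b][codon] += 1
            aa_totals[b][codon_to_aa[codon]] += 1

    slopes = {}
    for codon, aa in codon_to_aa.items():
        xs, ys = [], []
        for b in sorted(counts):
            total = aa_totals[b][aa]
            if total:
                xs.append(b * bin_nt + bin_nt / 2)  # bin midpoint
                ys.append(counts[b][codon] / total)  # share of its family
        if len(xs) < 2:
            continue  # too few bins to fit a slope
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        slopes[codon] = sxy / sxx
    return slopes
```

Sorting the result with `sorted(slopes, key=slopes.get)` would list the most boundary-enriched codons, the putative ESE candidates, first. A robust version would at minimum need significance testing and controls for confounders such as local base composition.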

Second, given that Drosophila's patterns of codon usage near intron–exon boundaries correlate so well with those in mammals, it may be possible to infer from sequence alone whether a species uses SR proteins bound to ESEs in the splicing process. If we find the same A-rich and C-poor codons preferred near boundaries, we may, with no further information, conclude that the species in question employs SR protein–based mechanisms for intron removal.
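The A/C criterion can be phrased as a simple directional check, sketched below in Python under the assumption that a mapping from each codon to its boundary slope is already available (negative meaning enriched near intron–exon boundaries). It compares the mean A and C content of boundary-preferred and boundary-avoided codons and reports whether the preferred set is both A-richer and C-poorer. This is a heuristic only, not a statistical test, and not by itself evidence of SR protein involvement; the function names are hypothetical.

```python
def ac_signature(slopes):
    """Mean A and C fractions for boundary-preferred (slope < 0) and
    boundary-avoided (slope > 0) codons. Returns {"preferred": (A, C),
    "avoided": (A, C)}; NaN where a group is empty."""
    def content(codons, base):
        if not codons:
            return float("nan")
        return sum(c.count(base) for c in codons) / (3 * len(codons))

    preferred = [c for c, s in slopes.items() if s < 0]
    avoided = [c for c, s in slopes.items() if s > 0]
    return {
        "preferred": (content(preferred, "A"), content(preferred, "C")),
        "avoided": (content(avoided, "A"), content(avoided, "C")),
    }


def looks_ese_like(slopes):
    """True if boundary-preferred codons are both A-richer and C-poorer
    than boundary-avoided ones (direction only, no significance test)."""
    sig = ac_signature(slopes)
    (a_pref, c_pref), (a_avoid, c_avoid) = sig["preferred"], sig["avoided"]
    return a_pref > a_avoid and c_pref < c_avoid
```

Counting over whole codons is one choice; restricting the count to third positions would be an equally plausible variant.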


    Supplementary Material
Supplementary tables 1–4 and figure 1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Acknowledgements
T.W. is funded by the Medical Research Council, United Kingdom.


    Footnotes
 
Dan Graur, Associate Editor


    References

Akashi H. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics (1994) 136:927–935.

Akashi H. Inferring weak selection from patterns of polymorphism and divergence at silent sites in Drosophila DNA. Genetics (1995) 139:1067–1076.

Andolfatto P. Adaptive evolution of non-coding DNA in Drosophila. Nature (2005) 437:1149–1152.

Ast G. How did alternative splicing evolve? Nat Rev Genet (2004) 5:773–782.

Bierne N, Eyre-Walker A. Variation in synonymous codon use and DNA polymorphism within the Drosophila genome. J Evol Biol (2006) 19:1–11.

Blencowe BJ. Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem Sci (2000) 25:106–110.

Bulmer M. Codon usage and intragenic position. J Theor Biol (1988) 133:67–71.

Carbone A, Zinovyev A, Képès F. Codon Adaptation Index as a measure for dominating codon bias. Bioinformatics (2003) 19:2005–2015.

Carlini DB, Chen Y, Stephan W. The relationship between third-codon position nucleotide content, codon bias, mRNA secondary structure and gene expression in the drosophilid alcohol dehydrogenase genes Adh and Adhr. Genetics (2001) 159:623–633.

Chamary JV, Hurst LD. Biased codon usage near intron-exon junctions: selection on splicing enhancers, splice-site recognition or something else? Trends Genet (2005a) 21:256–259.

Chamary JV, Hurst LD. Evidence for selection on synonymous mutations affecting stability of mRNA secondary structure in mammals. Genome Biol (2005b) 6:R75.

Chamary J-V, Parmley JL, Hurst LD. Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat Rev Genet (2006) 7:98–108.

Comeron JM, Aguade M. An evaluation of measures of synonymous codon usage bias. J Mol Evol (1998) 47:268–274.

Comeron JM, Guthrie TB. Intragenic Hill-Robertson interference influences selection intensity on synonymous mutations in Drosophila. Mol Biol Evol (2005) 22:2519–2530.

Comeron JM, Kreitman M. Population, evolutionary and genomic consequences of interference selection. Genetics (2002) 161:389–410.

Comeron JM, Williford A, Kliman RM. The Hill-Robertson effect: evolutionary consequences of weak selection and linkage in finite populations. Heredity (2007) doi:10.1038/sj.hdy.6801059.

Desai D, Zhang K, Barik S, Srivastava A, Bolander ME, Sarkar G. Intragenic codon bias in a set of mouse and human genes. J Theor Biol (2004) 230:215–225.

Dewey CN, Rogozin IB, Koonin EV. Compensatory relationship between splice sites and exonic splicing signals depending on the length of vertebrate introns. BMC Genomics (2006) 7:311.

Duret L. Evolution of synonymous codon usage in metazoans. Curr Opin Genet Dev (2002) 12:640–649.

Duret L, Mouchiroud D. Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, Arabidopsis. Proc Natl Acad Sci USA (1999) 96:4482–4487.

Eskesen ST, Eskesen FN, Ruvinsky A. Natural selection affects frequencies of AG and GT dinucleotides at the 5' and 3' ends of exons. Genetics (2004) 167:543–550.

Eyre-Walker A, Bulmer M. Reduced synonymous substitution rate at the start of enterobacterial genes. Nucleic Acids Res (1993) 21:4599–4603.

Fairbrother WG, Holste D, Burge CB, Sharp PA. Single nucleotide polymorphism-based validation of exonic splicing enhancers. PLoS Biol (2004) 2:E268.

Fairbrother WG, Yeh RF, Sharp PA, Burge CB. Predictive identification of exonic splicing enhancers in human genes. Science (2002) 297:1007–1013.

Fairbrother WG, Yeo GW, Yeh R, Goldstein P, Mawson M, Sharp PA, Burge CB. RESCUE-ESE identifies candidate exonic splicing enhancers in vertebrate exons. Nucleic Acids Res (2004) 32:W187–W190.

Garg K, Green P. Differing patterns of selection in alternative and constitutive splice sites. Genome Res (2007) 17:1015–1022.

Graveley BR, Hertel KJ, Maniatis T. A systematic analysis of the factors that determine the strength of pre-mRNA splicing enhancers. EMBO J (1998) 17:6747–6756.

Heinrichs V, Ryner LC, Baker BS. Regulation of sex-specific selection of fruitless 5' splice sites by transformer and transformer-2. Mol Cell Biol (1998) 18:450–458.

Hey J, Kliman RM. Interactions between natural selection, recombination and gene density in the genes of Drosophila. Genetics (2002) 160:595–608.

Iida K, Akashi H. A test of translational selection at ‘silent’ sites in the human genome: base composition comparisons in alternatively spliced genes. Gene (2000) 261:93–105.

Ikemura T. Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol (1985) 2:13–34.

Kanaya S, Yamada Y, Kinouchi M, Kudo Y, Ikemura T. Codon usage and tRNA genes in eukaryotes: correlation of codon usage diversity with translation efficiency and with CG-dinucleotide usage as assessed by multivariate analysis. J Mol Evol (2001) 53:290–298.

Kim S, Shi H, Lee DK, Lis JT. Specific SR protein-dependent splicing substrates identified through genomic SELEX. Nucleic Acids Res (2003) 31:1955–1961.

Kliman RM, Hey J. The effects of mutation and natural-selection on codon bias in the genes of Drosophila. Genetics (1994) 137:1049–1056.

Labourier E, Allemand E, Brand S, Fostier M, Tazi J, Bourbon HM. Recognition of exonic splicing enhancer sequences by the Drosophila splicing repressor RSF1. Nucleic Acids Res (1999) 27:2377–2386.

Lin K, Tan SB, Kolatkar PR, Epstein RJ. Nonrandom intragenic variations in patterns of codon bias implicate a sequential interplay between transitional genetic drift and functional amino acid selection. J Mol Evol (2003) 57:538–545.

Lynch KW, Maniatis T. Assembly of specific SR protein complexes on distinct regulatory elements of the Drosophila doublesex splicing enhancer. Genes Dev (1996) 10:2089–2101.

McVean GAT, Vieira J. Inferring parameters of mutation, selection and demography from patterns of synonymous site evolution in Drosophila. Genetics (2001) 157:245–257.

Parmley JL, Hurst LD. Exonic splicing regulatory elements skew synonymous codon usage near intron-exon boundaries in mammals. Mol Biol Evol (2007) 24:1600–1603.

Parmley JL, Urrutia AO, Potrzebowski L, Kaessmann H, Hurst LD. Splicing and the evolution of proteins in mammals. PLoS Biol (2007) 5:e14.

Pertea M, Mount SM, Salzberg SL. A computational survey of candidate exonic splicing enhancer motifs in the model plant Arabidopsis thaliana. BMC Bioinformatics (2007) 8:159.

Qin H, Wu WB, Comeron JM, Kreitman M, Li WH. Intragenic spatial patterns of codon usage bias in prokaryotic and eukaryotic genomes. Genetics (2004) 168:2245–2260.

Ram O, Ast G. SR proteins: a foot on the exon before the transition from intron to exon definition. Trends Genet (2007) 23:5–7.

Resch AM, Carmel L, Marino-Ramirez L, Ogurtsov AY, Shabalina SA, Rogozin IB, Koonin EV. Widespread positive selection in synonymous sites of mammalian genes. Mol Biol Evol (2007) 24:1821–1831.

Robinson RM. Splicing signals in Caenorhabditis elegans: candidate exonic splicing enhancer motifs (2005) Washington (DC): University of Washington.

Sharp PM, Li WH. The codon adaptation index--a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res (1987) 15:1281–1295.

Smith NG, Eyre-Walker A. Why are translationally sub-optimal synonymous codons used in Escherichia coli? J Mol Evol (2001) 53:225–236.

Yeo G, Hoon S, Venkatesh B, Burge CB. Variation in sequence and organization of splicing regulatory elements in vertebrate genes. Proc Natl Acad Sci USA (2004) 101:15700–15705.

Zhang J, Long M, Li L. Translational effects of differential codon usage among intragenic domains of new genes in Drosophila. Biochim Biophys Acta (2005) 1728:135–142.

Accepted for publication September 24, 2007.




This article has been cited by other articles:

K. Zeng and B. Charlesworth. Estimating Selection Intensity on Synonymous Codon Usage in a Nonequilibrium Population. Genetics, October 1, 2009; 183(2): 651–662.

T. Zhou, M. Weems, and C. O. Wilke. Translationally Optimal Codons Associate with Structurally Sensitive Sites in Proteins. Mol. Biol. Evol., July 1, 2009; 26(7): 1571–1580.

V. L. Bauer DuMont, N. D. Singh, M. H. Wright, and C. F. Aquadro. Locus-Specific Decoupling of Base Composition Evolution at Synonymous Sites and Introns along the Drosophila melanogaster and Drosophila sechellia Lineages. Gen Biol Evol, June 22, 2009; 2009(0): 67–74.

W. Haerty and B. Golding. Similar Selective Factors Affect Both between-Gene and between-Exon Divergence in Drosophila. Mol. Biol. Evol., April 1, 2009; 26(4): 859–866.

U. Friberg and W. R. Rice. Cut Thy Neighbor: Cyclic Birth and Death of Recombination Hotspots via Genetic Conflict. Genetics, August 1, 2008; 179(4): 2229–2238.

