MBE Advance Access originally published online on May 9, 2007
Molecular Biology and Evolution 2007 24(8):1744-1751; doi:10.1093/molbev/msm093
Research Articles
Dual Modes of Natural Selection on Upstream Open Reading Frames
Microbial Analysis Group, Broad Institute of MIT and Harvard, Cambridge, Massachusetts
E-mail: neafsey{at}broad.mit.edu.
Abstract
Upstream open reading frames (uORFs) are common features of eukaryotic genes, occurring in 10%–25% of 5' leader sequences. Upstream ORFs that have been subjected to experimental analysis have generally been found to decrease the translational efficiency of the downstream coding sequence. Previous investigations of uORFs in mammals and yeast have detected uORFs conserved over long evolutionary distances, prompting speculation about the nature and cause of the natural selection underlying such conservation. We have analyzed uORFs in the basidiomycetous fungal pathogen Cryptococcus neoformans to discern the properties of this purifying selection. We find that uORFs in the Cryptococcus species complex are conserved at twice the expected rate, and we report 122 uORFs that are conserved among all four sequenced Cryptococcus strains. A significantly greater proportion of uORF losses than expected occurs via direct mutation of the uORF start codon. This observation suggests that mutational disruption of a uORF that leaves the start codon intact may be selectively disadvantageous, perhaps because of the risk of premature translation initiation. Accounting for this constrained mode of loss and comparing the relative conservation of uORFs between the 5' leader and control sequences enables us to calculate that at least a third of uORFs may be conserved for their effects on translational efficiency. The remaining fraction may be conserved either by chance or as a result of selective pressure to prevent premature translation initiation from the uORF start codon. We find that the majority of conserved uORFs exhibit neither codon usage bias nor conservation at the amino acid level, and they are therefore unlikely to encode bioactive peptides. Our analysis suggests that uORFs are an important and underappreciated mechanism of post-transcriptional gene regulation in eukaryotes.
Key Words: uORF, uAUG, conservation, translation, Cryptococcus
Introduction
Microarrays have given the biological community abundant genome-wide data on rates of DNA transcription. The relative ease with which microarray data can now be acquired should not obscure the fact that transcription is not synonymous with expression. Indeed, there is growing evidence of significant variation in mRNA transcript half-life (Wang et al. 2002
Short open reading frames in the 5' leader sequences of genes, called upstream open reading frames (uORFs), are known to affect the translational efficiency of many eukaryotic genes (Morris and Geballe 2000; Meijer and Thomas 2002; Vilela and McCarthy 2003). Upstream ORFs are common genomic features, with estimated incidences as high as 25% in mammalian genes (Crowe, Wang, and Rothnagel 2006) and 10%–22% in fungal genes (Galagan et al. 2005). Although some uORFs may augment expression by obscuring other cis-acting inhibitory elements (Geballe and Sachs 2000), most experimentally tested eukaryotic uORFs are translational repressors. Upstream ORFs have been shown to reduce translational efficiency through a variety of means, including ribosome blocking by the encoded peptide, ribosome stalling at the uORF termination codon, induction of the nonsense-mediated decay (NMD) pathway, and failure of the ribosome to reinitiate at the genic translation start site after disengaging from the uORF (Gaba et al. 2001). Upstream ORFs that have been experimentally tested through cell-free translation assays or other means have been found to decrease the rate of translation by up to 20-fold (Hinnebusch 2005), although some uORFs appear to have little impact, or a variable impact, on translation rates (e.g., Wang and Rothnagel 2004).
In accordance with the scanning model of translation initiation (Kozak 1994), it has been suggested that some uORFs may be conserved to prevent deleterious premature translation initiation from upstream AUG (uAUG) triplets (Iacono, Mignone, and Pesole 2005; Lynch, Scofield, and Hong 2005; Lynch 2006). Premature translation initiation leading to genic read-through would, at best, add extraneous amino acids to the N-terminus of the encoded protein if the uAUG were in the same reading frame as the genic ORF. At worst, it would cause a frameshift-induced nonsense mutation and entirely eliminate translation of the genic ORF. A stop codon between a uAUG and the genic start site prevents such read-through. Therefore, even if the resulting uORF decreases the translation rate of the adjacent genic sequence, its phenotypic effect may be less severe than that of premature initiation at a uAUG lacking a downstream in-frame stop codon, which results in the ribosome's reading through the genic translation start site. This hypothesis is supported by the observation that uAUGs are significantly under-represented in 5' leader sequences in mammals, yeast, and prokaryotes (Saito and Tomita 1999; Hahn et al. 2003; Churbanov et al. 2005; Iacono, Mignone, and Pesole 2005).
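The three possible fates of a ribosome that initiates at a uAUG, as described above, can be illustrated with a short Python sketch. It assumes a transcript written in the DNA alphabet (T for U), 0-based coordinates, and a hypothetical helper name, classify_uaug. It ignores leaky scanning, reinitiation and initiation context, so it illustrates only the frame logic and is not a model of translation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA alphabet (T in place of U)


def classify_uaug(transcript: str, uaug: int, cds_start: int) -> str:
    """Classify what happens if a ribosome initiates at an upstream AUG.

    transcript: 5' leader followed by the coding sequence (DNA alphabet).
    uaug:       0-based index of the upstream AUG.
    cds_start:  0-based index of the genic start codon.
    """
    if transcript[uaug:uaug + 3] != "ATG" or transcript[cds_start:cds_start + 3] != "ATG":
        raise ValueError("both positions must hold an ATG triplet")
    if not 0 <= uaug < cds_start:
        raise ValueError("the uAUG must lie upstream of the genic start codon")
    # Walk codon by codon in the uAUG frame until the genic start is reached.
    for pos in range(uaug, cds_start, 3):
        if transcript[pos:pos + 3] in STOP_CODONS:
            return "insulated"  # an in-frame stop closes a uORF first
    if (cds_start - uaug) % 3 == 0:
        return "in-frame extension"  # read-through adds N-terminal residues
    return "out-of-frame read-through"  # genic ORF is read in the wrong frame


print(classify_uaug("GGATGCCCTAAGG" + "ATGGCTGCTTAA", 2, 13))
```

In this sketch, only the "insulated" case corresponds to a uORF. The other two outcomes are the N-terminal extension and the frameshift that the insulation hypothesis proposes selection avoids.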
We have investigated the genomic distribution and conservation of uORFs in four recently sequenced strains of the fungal pathogen Cryptococcus neoformans. The Cryptococcus system is well suited for the analysis of uORF evolution. Phylogenetic analysis indicates a total synonymous divergence among strains comparable to that observed between human and mouse. This means that the sequenced strains have diverged enough to exhibit turnover in their uORF complements, but not so much that their 5' leader sequences cannot be accurately aligned. In addition, approximately 23,000 full-length cDNAs have been sequenced for one of the strains (Loftus et al. 2005), permitting conservative estimation of the minimum extent of thousands of 5' leader and 3' trailer sequences.
We find that although uORFs are less common in 5' leader sequences than expected by chance, they are conserved at twice the expected rate. In addition, we report that uORFs in Cryptococcus exhibit little evidence of selection for length, codon bias, or amino acid content, and they are most likely conserved only at the level of the open reading frame. Analysis of the nature of the mutations by which uORFs have been lost in individual lineages allows us to estimate that although many uORFs may be conserved to insulate uAUGs and prevent premature translation initiation, at least a third of conserved uORFs are maintained because of their impact on translation efficiency. These observations suggest that uORFs are a widespread and important mechanism of post-transcriptional regulation in eukaryotes.
Methods
Genome Alignment and 5' Leader Mapping
The genome assemblies of four strains of C. neoformans were obtained from the Web sites of the sequencing centers that produced them (strain JEC21: TIGR; strain WM276: Michael Smith Genome Center; strains H99 and R265: Broad Institute). Whole genome alignments were created using a multistep process, with strain JEC21 as a reference. First, pairwise alignments between JEC21 and the other sequenced strains were created using PatternHunter (Ma, Tromp, and Li 2002
Analysis of uORF Incidence and Conservation
For the purposes of this analysis, a uORF was defined as an AUG triplet followed by at least one intervening codon and a stop codon. Upstream ORFs were permitted to overlap each other, and each was either contained entirely in the 5' leader sequence or overlapped the downstream coding ORF by a single base pair. Upstream ORFs were considered to be conserved if, in the multiple alignment of orthologous leader sequences, all strains exhibited a start codon and a stop codon in the same position, and those start and stop codons were in the same frame relative to each other. Losses were inferred when a uORF was conserved in all strains but one, and gains were inferred when a uORF was present in only a single strain. This simple mode of parsimony inference is likely to underestimate the absolute loss/gain rate ratio because parallel loss events in multiple lineages can be mistaken for gains. However, it does not influence the conclusions we draw from comparing the relative rates of loss/gain observed between the 5' leader sequence and control sequences, or the relative rates of different modes of uORF loss and gain.
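The uORF definition and the presence/absence parsimony rules above can be sketched in Python as follows. The sketch makes three assumptions:

- uORF detection works on ungapped sequences in the DNA alphabet.
- For the parsimony step, uORFs are already expressed as (start column, stop column) pairs in the multiple alignment.
- The function names and example coordinates are hypothetical, and the mapping from sequence to alignment coordinates is omitted.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(transcript: str, cds_start: int) -> list[tuple[int, int]]:
    """Return (start, stop) indices of uORFs: an ATG, at least one intervening
    codon, and the first in-frame stop, which may overlap the genic ATG by 1 bp."""
    uorfs = []
    for i in range(cds_start - 2):  # start codon lies wholly in the leader
        if transcript[i:i + 3] != "ATG":
            continue
        for s in range(i + 3, cds_start - 1, 3):  # a stop may begin at cds_start - 2
            if transcript[s:s + 3] in STOP_CODONS:
                if s >= i + 6:  # require at least one codon between start and stop
                    uorfs.append((i, s))
                break  # only the first in-frame stop terminates the ORF
    return uorfs


def classify_presence(uorfs_by_strain: dict[str, set[tuple[int, int]]]):
    """Label aligned uORFs as conserved (all strains), lost (absent from exactly
    one strain) or gained (present in exactly one strain); assumes >= 3 strains."""
    n = len(uorfs_by_strain)
    counts = Counter(u for uorfs in uorfs_by_strain.values() for u in uorfs)
    conserved, lost, gained = set(), {}, {}
    for u, c in counts.items():
        if c == n:
            conserved.add(u)
        elif c == n - 1:
            lost[u] = next(s for s, us in uorfs_by_strain.items() if u not in us)
        elif c == 1:
            gained[u] = next(s for s, us in uorfs_by_strain.items() if u in us)
    return conserved, lost, gained
```

With four strains, a uORF present in exactly two strains falls into none of the three classes. This matches the simple parsimony rule described above.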
Modes of uORF loss were divided into several categories for analysis. Mutations that disrupted the AUG start codon were tallied separately from mutations that disrupted the stop codon or reading frame. This latter category of non-AUG mutations was further refined to include only those mutations that created an "uninsulated" AUG with no stop codon between it and the coding sequence, and to exclude the non-AUG mutations that destroyed the original uORF but did not create an uninsulated uAUG owing to the presence of a downstream "backup" stop triplet. Two categories of uORF gain were tallied: cases in which a mutation created a new AUG codon upstream and in-frame with an existing stop codon, and cases in which a mutation put a pre-existing uAUG triplet in the context of an ORF. Rates of uORF loss were calculated by dividing the number of observed, diagnosable loss events by the number of conserved uORFs in the same sequence class.
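The loss categories above can be expressed as a small classifier. This Python sketch assumes that the ancestral and derived leaders are aligned without indels (substitutions only), so coordinates carry over directly. That simplification, the function names and the category labels are assumptions for illustration. Frame-shifting indels would need alignment-aware handling.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_inframe_stop(seq: str, start: int, cds_start: int):
    """First stop in frame with `start` that begins at or before cds_start - 2."""
    for s in range(start + 3, cds_start - 1, 3):
        if seq[s:s + 3] in STOP_CODONS:
            return s
    return None


def loss_mode(derived: str, start: int, stop: int, cds_start: int) -> str:
    """Classify what happened to an ancestral uORF (start, stop) in a derived leader."""
    if derived[start:start + 3] != "ATG":
        return "start-disrupting"
    s = first_inframe_stop(derived, start, cds_start)
    if s is None:
        return "uninsulated uAUG"  # no stop left between the uAUG and the CDS
    if s == stop:
        return "intact"
    # A backup stop downstream (or a new upstream stop) still insulates the uAUG.
    return "other stop (excluded)"


def loss_rate(modes: list[str], n_conserved: int) -> float:
    """Diagnosable losses (start-disrupting or uninsulated) per conserved uORF."""
    tally = Counter(modes)
    return (tally["start-disrupting"] + tally["uninsulated uAUG"]) / n_conserved
```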
The expected incidence of uORFs in 5' leader and control sequence classes was calculated from the observed incidence of potential start (AUG) and stop (UAA/UAG/UGA) triplets in individual leader sequences. For each leader sequence, the numbers of empirically observed start and stop triplets were tallied, and then their relative order and frame were randomly permuted 1,000 times to generate a distribution from which an expected number of ORFs was derived.
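The permutation procedure for the expected number of ORFs can be sketched as below. The text does not spell out how order and frame were randomized, so this Python sketch assumes one simple scheme. The observed numbers of start and stop triplets in a leader are placed at random distinct positions along it. ORFs are then counted with the same rule as the uORF definition: the first in-frame stop must begin at least 6 nt after the AUG, leaving at least one intervening codon. The function names and the seed are illustrative.

```python
import random


def count_orfs(starts: list[int], stops: list[int]) -> int:
    """Count start triplets whose first downstream in-frame stop is >= 6 nt away."""
    ordered_stops = sorted(stops)
    n = 0
    for a in starts:
        first = next((s for s in ordered_stops if s > a and (s - a) % 3 == 0), None)
        if first is not None and first - a >= 6:  # at least one intervening codon
            n += 1
    return n


def expected_orfs(n_start: int, n_stop: int, length: int,
                  n_perm: int = 1000, seed: int = 0) -> float:
    """Mean ORF count when the observed triplets are placed at random positions."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_perm):
        # Random distinct positions randomize both the order and the frame of triplets.
        positions = rng.sample(range(length - 2), n_start + n_stop)
        total += count_orfs(positions[:n_start], positions[n_start:])
    return total / n_perm
```

Summing the per-leader expectations and comparing them with the observed ORF counts gives an observed/expected ratio like the one reported in the Results.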
Initiation/Termination Context
The initiation context of conserved and nonconserved uORFs was compared to the initiation context of genic ORFs exhibiting high and low codon bias. Codon bias was evaluated with the ENC' statistic (Novembre 2002). "High" and "low" bias gene sets correspond, respectively, to the genes at least two standard deviations above and below the average codon bias across all genes. Ten nucleotide positions immediately upstream and seven nucleotide positions immediately downstream of the initiation codon were examined in pairwise comparisons among the different ORF classes from strain JEC21 (conserved vs. nonconserved uORFs; high vs. low codon bias genic ORFs). A heterogeneity chi-squared test was used to determine the significance of differences in nucleotide usage at each position.
The sequence context of uORF stop codons was also compared between conserved and nonconserved uORFs in the manner described above.
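The per-position heterogeneity test can be written with the Python standard library alone, as sketched below. For tables larger than 2×2, scipy.stats.chi2_contingency computes the same Pearson statistic. The chi-squared tail here uses the standard closed-form recurrence for integer degrees of freedom. The function names and any counts passed to them are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        total, k = math.exp(-x / 2), 2  # Q(x; 2)
    else:
        total, k = math.erfc(math.sqrt(x / 2)), 1  # Q(x; 1)
    while k < df:
        # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        total += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return total


def heterogeneity_chi2(table: list[list[int]]) -> tuple[float, int, float]:
    """Pearson chi-squared test of homogeneity for an r x c table of counts,
    e.g. rows = ORF classes, columns = A/C/G/U counts at one position."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            if expected > 0:  # empty columns contribute nothing
                stat += (observed - expected) ** 2 / expected
    used_cols = sum(1 for c in col_sums if c > 0)
    df = (len(table) - 1) * (used_cols - 1)
    p = chi2_sf(stat, df) if df > 0 else 1.0
    return stat, df, p
```

For a 2×4 table of nucleotide counts at one position, df is 3. Note that scipy applies a continuity correction by default only when df is 1.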
Conservation of uORF-Encoded Peptides
Conserved uORFs were evaluated for conservation at the amino acid level. The ratio of nonsynonymous to synonymous substitution rates (Ka/Ks) was computed for all conserved uORFs. It was also computed for a subset of 12 conserved uORFs that were 20–99 codons in length (after Crowe, Wang, and Rothnagel 2006). For each set, all uORFs were trimmed of their start and stop codons and concatenated into a single sequence for each Cryptococcus strain to obtain an overall estimate of Ka/Ks across the set. The Ka/Ks ratio for each set was calculated with codeml model M0 in the PAML 3.14 package (Yang 1997). Codon usage bias for strain JEC21 was measured in each concatenated uORF set with the ENC' statistic (Novembre 2002).
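Trimming and concatenating the conserved uORFs before running codeml can be sketched as below. The Python helper names are hypothetical. The sketch assumes each strain's uORFs are already aligned codon-to-codon and listed in the same order. It writes a sequential PHYLIP-style layout that PAML reads: a header with the numbers of sequences and sites, then each name followed by at least two spaces. The codeml control file for model M0 is not shown.

```python
def concatenate_uorfs(aligned: dict[str, list[str]]) -> dict[str, str]:
    """Drop the start and stop codon of every aligned uORF and join the
    remainders into one sequence per strain."""
    joined = {}
    for strain, uorfs in aligned.items():
        parts = []
        for seq in uorfs:
            if len(seq) % 3 != 0 or len(seq) < 9:
                raise ValueError(f"{strain}: uORF length must be a multiple of 3 and >= 9")
            parts.append(seq[3:-3])  # remove ATG ... stop
        joined[strain] = "".join(parts)
    if len({len(s) for s in joined.values()}) != 1:
        raise ValueError("concatenated sequences differ in length")
    return joined


def to_phylip(seqs: dict[str, str]) -> str:
    """Sequential PHYLIP-style text: 'n_seqs n_sites', then 'name  sequence' lines."""
    n_sites = len(next(iter(seqs.values())))
    lines = [f"{len(seqs)} {n_sites}"]
    lines += [f"{name}  {seq}" for name, seq in seqs.items()]
    return "\n".join(lines) + "\n"
```

Concatenation pools the few codons of many short uORFs into one alignment. This gives codeml enough sites for a single, set-wide Ka/Ks estimate.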
Evaluation of Annotation Accuracy
The accuracy of predicted translation start sites in the TIGR annotation was measured in two ways. First, the conservation of the predicted genic translation initiation codon was compared to the conservation rate of the first AUG triplet encountered upstream and the first two AUG triplets encountered downstream of the predicted translation start site. By this comparative measure, the predicted translation start sites in the TIGR gene calls were largely verified. The AUG codon at the predicted translation start site was conserved 80.8% of the time (4,118/5,095). The first and second internal AUG codons were perfectly conserved at rates of 78.8% (4,019 of 5,095 genes) and 80.2% (4,090/5,095), respectively. In contrast, in leader sequences that contained a uAUG not in the context of a uORF, the first uAUG triplet was conserved only 52.5% of the time (84/161; χ2 test; P = 1.1E-18). This suggests that this class of triplets is much less likely to be part of the genic coding sequence.
Conservation of the intervals between AUG triplets was also used to evaluate annotation accuracy. The ratio of nonsynonymous to synonymous divergence (Ka/Ks) was computed for all of the AUG–AUG intervals defined by the triplets scored for conservation in the previous analysis. Ka/Ks was calculated with codeml model M0 in the PAML 3.14 package (Yang 1997). Supplementary Figure 1 is a frequency histogram of Ka/Ks estimates for three intervals: first upstream AUG to the predicted translation initiation codon (TIC), TIC to the first internal coding AUG, and second internal coding AUG to the third internal coding AUG. The short length of some of these sequence intervals generates noise in the Ka/Ks statistic. Even so, most AUG–AUG intervals within the genic coding region clearly have lower Ka/Ks ratios than the AUG–TIC intervals upstream of the predicted coding sequence. This result again indicates that the predicted translation start sites are correct in the majority of C. neoformans genes. A small number of genes exhibiting Ka/Ks ratios < 0.30 in their upstream AUG–AUG intervals were excluded from further analysis.
Results
Upstream ORF Incidence and Conservation
The overall incidence and conservation of uORFs in non-spliced, cDNA-defined 5' leader sequences of strain JEC21 are illustrated in figure 1. Three control sequence classes were used for measuring uORF incidence and conservation, because randomly occurring ORFs in these strands are presumably selectively neutral. These were the antisense strand of 5' leader sequences (5'–) and the sense and antisense strands of 3' trailer sequences (3'+; 3'–). A total of 249 of 2,167 (11.5%) 5' leader sequences contained at least one uORF. This rate is at least fourfold lower than the incidence of uORFs in any of the control sequences [(5'–): 1,040/2,167 = 48%; (3'+): 1,660/2,698 = 61.5%; (3'–): 1,635/2,698 = 60.6%]. This result is lower than the 36% incidence reported by Iacono, Mignone, and Pesole (2005)
The reduced incidence of uORFs in 5' leader sequences may be explained in large part by the reduced incidence of AUG triplets. Figure 2 indicates that the observed incidence of AUG triplets is significantly lower than expected given the nucleotide composition of the 5' leader sequences, similar to what has been found in mammals and yeast (Churbanov et al. 2005
Despite this reduced incidence of AUG triplets, uORFs are more common in the 5' leader sequence than expected by chance, given the observed incidence of potential start and stop triplets. Figure 3 illustrates that, after allowing for the shortage of AUG triplets, uORFs are 1.6 times more common than expected in 5' leader sequences (216 observed; 134 expected). The expectation was determined by simulated shuffling of observed potential start and stop triplets. This means that the potential start and stop triplets in the 5' leader are in the correct orientation (start upstream of stop) and in the same reading frame much more frequently than expected by chance. Upstream ORFs are less common than expected in the sense strand of the 3' UTR, perhaps because of a greater than expected incidence of potential stop triplets in those sequences (data not shown). Averaging across the control sequences, the observed/expected ratio of uORF abundance was 0.95 (2,421 observed; 2,545 expected).
Calculating the Fraction of Conserved Regulatory uORFs
These observations of conservation suggest that uORFs may be functional repressors of translation, but they may also be conserved to prevent premature translation initiation from uAUGs. The stop codon of such a uORF would effectively act as insulation between the uAUG and the genic coding sequence.
We sought to determine the proportion of uORFs conserved for uAUG insulation rather than for a potential impact on translation. To do so we note that conservation of a uORF for an insulatory purpose would predict purifying selection on the stop codon and reading frame, but not on the uORF start codon (a uAUG). Mutations disrupting the start codons of such uORFs would presumably be free to drift to fixation. Conservation of a uORF for its effect on translational efficiency, however, requires purifying selection on all parts of the uORF.
Given these different expectations regarding conservation, we tested for the prevalence of insulatory uORFs by analyzing modes of uORF loss. We identified instances of uORFs recently disrupted by mutation and recorded whether or not the mutation disrupted the uORF start codon (Table 1). As expected for uORFs acting as uAUG insulators, a greater proportion of uORFs than expected were lost because of start-disrupting mutations (5'+: 35 of 41 losses; control ORFs: 632 of 973 losses; χ2 test; P = 0.0062). We assume that uORFs and control ORFs are subject to a similar profile of mutations. On that basis, we infer that purifying selection has filtered out mutations that solely disrupted uORF stop codons or reading frames, reducing the overall uORF loss rate by approximately 24% (observed losses = 41; expected losses = 54).
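The expected-loss figure follows from a short calculation. Suppose 5' leader uORFs lost their start codons in the same proportion as control ORFs (632/973). Then the 35 observed start-disrupting losses would imply about 54 losses in total. A minimal Python check, with a hypothetical function name:

```python
def expected_total_losses(start_losses: int, ctrl_start: int, ctrl_total: int) -> float:
    """Total losses expected if start-disrupting losses made up the same
    fraction of all losses as in the control sequences."""
    return start_losses / (ctrl_start / ctrl_total)


observed = 41
expected = expected_total_losses(35, 632, 973)
reduction = 1 - observed / expected
print(f"expected = {expected:.1f}, reduction = {reduction:.0%}")  # expected = 53.9, reduction = 24%
```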
We measured the rate of diagnosable uORF losses in the 5' leader and control sequence classes as 0.17 and 0.37, respectively, by dividing the observed number of losses by the number of conserved uORFs. This twofold difference in loss rates between 5' leader and control uORFs indicates that the 24% reduction in loss rate caused by selection to prevent premature translation initiation is insufficient to explain the degree of conservation observed for uORFs in the 5' leader sequence. We infer that the balance of the difference in loss rates may be due to selection to preserve the effects of the 5' leader uORFs on translation efficiency of the downstream genic ORF.
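The reasoning in this paragraph can be made concrete with a simplified two-component mixture. For illustration only, assume two things. First, insulatory uORFs are lost at the control rate reduced by the 24% insulation effect (L_ins). Second, uORFs conserved for their translational effects are lost at a rate L_reg, which is zero by default. The observed leader loss rate is then L_obs = f·L_reg + (1 − f)·L_ins, which can be solved for the regulatory fraction f. This illustrative model is built from the rates reported above and is not necessarily the formulation used for the estimates in this study. The Python names are hypothetical.

```python
def regulatory_fraction(leader_rate: float, control_rate: float,
                        insulation_reduction: float, regulatory_rate: float = 0.0) -> float:
    """Solve leader_rate = f * regulatory_rate + (1 - f) * insulatory_rate for f."""
    insulatory_rate = control_rate * (1 - insulation_reduction)
    return (insulatory_rate - leader_rate) / (insulatory_rate - regulatory_rate)


f = regulatory_fraction(0.17, 0.37, 0.24)
print(f"regulatory fraction = {f:.2f}")  # regulatory fraction = 0.40
```

Under this model, giving regulatory uORFs a nonzero loss rate (kept below L_obs) raises f. The zero-loss case therefore acts as a lower bound.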
To determine the fraction of uORFs conserved for their impact on translation, we assume that the observed rate of uORF loss (Lo; 0.334 lost/conserved) is a mixture of two loss rates:
Initiation/Termination Context
We compared the nucleotide usage 10 bp upstream and 7 bp downstream of the translation initiation codon between genes exhibiting high and low codon bias. Miyasaka (1999) detected a correlation between the degree of codon bias and the optimality of the initiation context in yeast genes, suggesting that selection can operate on both features simultaneously to affect translation efficiency. Further, Crowe, Wang, and Rothnagel (2006) found evidence of selection to optimize uORF initiation context for uORFs conserved between human and mouse. In Cryptococcus, genes exhibiting high codon bias show significantly greater usage of adenine nucleotides at positions –3 (heterogeneity χ2 test; P = 0.001) and –1 (heterogeneity χ2 test; P = 0.001) relative to the first site of the translation initiation codon. Interestingly, conserved uORFs exhibit significantly greater usage of cytosine at position –6 relative to uORFs recently lost or gained in a given C. neoformans lineage (heterogeneity χ2 test; P = 0.05). However, there was no significant difference in nucleotide usage between high and low codon bias genes at position –6, indicating that this result for conserved uORFs may be spurious. Thus we did not find evidence of strong selection to generate optimal or nonoptimal contexts for translation initiation at conserved uORFs in Cryptococcus.
In a similar manner, we also compared nucleotide usage in the vicinity of uORF stop codons between conserved and nonconserved uORFs. Grant and Hinnebusch (1994) found that ribosome reinitiation frequency following uORF translation at the GCN4 locus in yeast was strongly associated with the A/U richness of the final uORF codon and of the 10 bp following the stop codon. For Cryptococcus, we found little difference in nucleotide composition between the termination regions of conserved and nonconserved uORFs. Uracil nucleotides were significantly more common in the third position of the last codon of nonconserved uORFs relative to conserved uORFs (heterogeneity χ2 test; P = 0.01), but adenine nucleotides were the most common at this position in conserved uORFs. We conclude that there is no strong selective pressure to modulate nucleotide composition in uORF termination regions in Cryptococcus.
Upstream AUG Conservation
We failed to find evidence that uAUGs are conserved outside the context of uORFs. Among uAUGs in the 5' leader sequences that were not associated with uORFs, we found 23 instances of unambiguous loss and 82 conserved instances, yielding a loss rate of 23/82 = 0.28. We found 1,359 cases of unambiguous loss and 3,410 conserved uAUGs across the control sequences, for a loss rate of 0.39. These rates were not significantly different (χ2 test; P = 0.13), indicating that AUG codons are not selectively maintained outside the context of uORFs in Cryptococcus 5' leader sequences. Previous analyses have reported conservation of uAUG triplets in mammals and yeast (Churbanov et al. 2005; Iacono, Mignone, and Pesole 2005), but they did not determine whether this signal was independent of selection to maintain the integrity of uORFs.
Selection on uORF Length
We compared the size distribution of conserved and nonconserved uORFs from the 5' leader sequences and from the 5'– control sequence. Churbanov et al. (2005) and Iacono, Mignone, and Pesole (2005) report that observed uORFs in fungal and mammalian sequences are significantly shorter than expected. In Cryptococcus, nonconserved uORFs in both the sense and antisense strands of the 5' leader sequence are longer on average (63 and 43 bp, respectively) than conserved uORFs in either strand. The average length of conserved uORFs, by contrast, is almost identical in the sense and antisense strands (30.5 bp and 29.1 bp, respectively). One would predict that conserved uORFs might be shorter than nonconserved uORFs by chance, because longer uORFs are more likely to incur an indel that puts the start and stop codons out of frame. We did find, however, that conserved uORFs in the 5' leader were significantly enriched in the 15–27 bp range relative to nonconserved uORFs or to conserved uORFs in the antisense strand (nonparametric bootstrapping; P = 2.7E-4). We tested whether uORF length was selectively maintained in the presence of potential "backup" stop triplets. To do so, we tallied instances in which the original stop codon of a uORF was or was not mutated in the presence of a downstream, in-frame stop. We observed a lower ratio of utilized to unutilized backup stops in the 5' leader sequence (30 utilized/50 unutilized = 0.60) than in the control sequences (830 utilized/954 unutilized = 0.87), but this difference is not significant (χ2 test; P = 0.11). This result suggests a lack of selective pressure to keep uORFs at their present size when there is an easy mutational path to lengthen them. We conclude that uORFs may be under selection for shorter length in Cryptococcus, but the evidence is weak.
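The length-enrichment comparison can be illustrated with a one-sided resampling test. The text does not detail the bootstrap design, so this Python sketch assumes one common scheme:

1. Pool the two sets of lengths.
2. Repeatedly draw a sample the size of the focal set, with replacement.
3. Count how often the sample's fraction in the 15–27 bp window reaches the observed fraction.

The function names, the seed and the add-one correction are choices made for illustration.

```python
import random


def window_fraction(lengths: list[int], lo: int = 15, hi: int = 27) -> float:
    """Fraction of uORF lengths (bp) falling inside [lo, hi]."""
    return sum(lo <= x <= hi for x in lengths) / len(lengths)


def bootstrap_enrichment_p(focal: list[int], reference: list[int],
                           n_boot: int = 10000, seed: int = 1) -> float:
    """One-sided P value that `focal` is enriched in the window relative to the
    pooled null distribution formed from both sets."""
    rng = random.Random(seed)
    pooled = list(focal) + list(reference)
    observed = window_fraction(focal)
    hits = 0
    for _ in range(n_boot):
        sample = rng.choices(pooled, k=len(focal))  # resample with replacement
        if window_fraction(sample) >= observed:
            hits += 1
    return (hits + 1) / (n_boot + 1)  # add-one correction avoids P = 0
```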
Conservation of uORF-Encoded Peptides
Crowe, Wang, and Rothnagel (2006) report evidence of conservation at the amino acid level for mammalian uORFs greater than 20 amino acids in length, and they suggest that many uORFs may encode bioactive peptides. To test whether conserved Cryptococcus uORFs may also be subject to such selective constraint, we estimated the ratio of nonsynonymous to synonymous substitution rates (Ka/Ks) in two concatenations. The first comprised all 124 conserved Cryptococcus uORFs; the second comprised 12 "long" conserved uORFs that were at least 20 amino acids in length. A Ka/Ks ratio close to 1 indicates neutral evolution, whereas a Ka/Ks ratio close to 0 suggests purifying selection.
We measured a Ka/Ks ratio of 0.82 for all conserved uORFs and a ratio of 0.42 for 12 conserved uORFs that were greater than 20 amino acids in length. For comparison, the average Ka/Ks ratio for Cryptococcus genic sequences is 0.18, and only 5.4% of Cryptococcus genes have a Ka/Ks ratio greater than or equal to the value observed for the set of long conserved uORFs (see Methods).
Using the ENC' statistic (Novembre 2002), we also measured relative codon usage bias in each concatenated uORF set to detect selection for translational efficiency or accuracy at the codon level. The set of 12 long uORFs exhibits a codon usage bias close to the median value for Cryptococcus genes (long uORF ENC' = 57.3; genic median ENC' = 55.8, where higher values signify lower codon bias). However, only 8.7% of Cryptococcus genes exhibit codon usage bias as weak as or weaker than that observed across all conserved uORFs (ENC' = 59.3).
These results suggest that there may be weak purifying selection on the set of 12 conserved uORFs greater than 20 amino acids in length, but that the majority of conserved uORFs do not experience selection at the amino acid level or translational level and therefore are not likely to encode bioactive peptides.
Discussion
The uORFs present in the 5' leader sequences of Cryptococcus and most other eukaryotes are conserved for at least two significant reasons. Approximately 40% of uORFs are maintained in the genome because their inhibitory effects on translation efficiency presumably increase organismal fitness. Others are conserved over short time periods but beyond neutral expectation as a result of "insulation selection." These latter uORFs may be subject to purifying selection only until they experience an AUG-inactivating mutation that drifts to fixation. Zhang and Dietrich (2005)
The level at which this purifying selection acts appears in most cases to be the open reading frame itself, rather than individual uORF codons. Several recent manuscripts have focused on uAUG triplets as a unit of selection and/or conservation (Churbanov et al. 2005; Crowe, Wang, and Rothnagel 2006), but we find no evidence that uAUG triplets are conserved outside the context of uORFs. Indeed, we find that when uORFs are lost via mutation, they are disproportionately destroyed through mutation of the uAUG that initiates their translation. Further, we find no strong evidence that uORF length is under selection. Upstream ORFs that are perfectly conserved among the four sequenced Cryptococcus strains are shorter than uORFs that are not conserved. This effect most likely derives from the increased probability that longer uORFs will incur a frame-shifting indel mutation over time. Nor do we detect strong evidence that the majority of conserved uORFs are under selection at the peptide-coding or translational-efficiency level. A set of 12 conserved uORFs greater than 20 amino acids in length exhibited codon usage comparable to that observed in genic sequences. This set also showed a Ka/Ks ratio that suggests purifying selection, albeit weaker than in about 95% of annotated Cryptococcus genes. Most conserved uORFs are shorter than 20 amino acids, however, and show no evidence of purifying selection or codon usage bias, even when concatenated to enhance signal strength. So, while some longer uORFs may encode functional, bioactive peptides, most uORFs are conserved only so far as to maintain their open reading frame.
Some thoroughly studied uORFs, such as those in the 5' leader of yeast GCN4, facilitate context-dependent post-transcriptional gene regulation (Gaba et al. 2001; Arava et al. 2005). Most uORFs may enjoy an extended residence time in the genome because of their protective role in preventing premature translation initiation. This could increase the likelihood that some uORFs will be exapted into regulatory mechanisms and conserved permanently (Lynch 2006). Such a hypothesis is difficult to test directly, but it is possible to imagine a context-dependent regulatory uORF evolving in a stepwise process under this scenario. A new uORF might initially fix in a population as a consequence of a neutral or nearly neutral impact on the translational efficiency of the downstream gene. Because of the need to keep the uAUG of the uORF insulated, this uORF would enjoy a residence time in the genome longer than typically expected for a neutral genomic feature. One could imagine a compensatory mutation occurring during this extended residence time (e.g., a strengthened transcription factor binding site) that would make any decrease in translational efficiency caused by the uORF beneficial. Subsequent mutations might then tune the uORF's impact on translation to an optimal level or make its effects on translation context dependent. In vitro translation assays to determine the precise effects of individual uORFs are already underway using uORFs from Cryptococcus. Such assays will likely be necessary to determine which features of a uORF determine its impact on translation in this organism.
Changes in translational efficiency may also be compensated for by changes in transcription rates for constitutively expressed genes. Transcriptional regulatory elements have been observed to exhibit redundancy and turnover (Tanay, Regev, and Shamir 2005). Translational regulatory elements may likewise undergo flux over evolutionary time periods, leading to a complex interplay within and between transcriptional and translational factors. We detected no selection on uORFs at the coding level or for initiation/termination sequence context. This suggests that promoter elements and uORFs may evolve in close proximity in 5' leader sequences with low interference.
The degree to which translational efficiency affects organismal fitness is unclear. Empirical evidence from yeast indicates that there is a great deal of variation in translational efficiency among genes (MacKay et al. 2004). Variation in codon bias among genes suggests that selection differentially influences translational efficiency among genes, not just in yeast but in a host of organisms (Akashi 2001; Duret 2002; Chamary, Parmley, and Hurst 2006). Many analyses of codon bias suggest that a significant fraction of the genes in every genome do not experience strong selection for optimal translational efficiency. If so, the abundance of uORFs in eukaryotic genomes may perhaps be explained by viewing uORFs as selectively neutral features despite their impact on translation. Studies examining variation in translational efficiency at the population level would cast much light on this question.
The existence of uORFs conserved over deep evolutionary time (Iacono, Mignone, and Pesole 2005; Zhang and Dietrich 2005), however, suggests that many uORFs are not selectively neutral features. Instead, they may be conserved precisely because of their potential to reduce translational efficiency. Post-transcriptional gene regulation may not be efficient in terms of cellular resources and energy, but it may sometimes offer a more expedient mechanism of changing gene expression than transcriptional modulation. Certain classes of genes might benefit from features such as uORFs that limit their ultimate expression level. These include genes prone to aggregation when overexpressed (DePristo, Weinreich, and Hartl 2005) and oncogenes (Mehta, Trotta, and Peltz 2006). Genes whose expression level must be precisely regulated might also benefit from reduced translational efficiency. It has been demonstrated that a high rate of transcription coupled with a low rate of translation can minimize noise in eukaryotic gene expression (McAdams and Arkin 1997; Blake et al. 2003; Fraser et al. 2004).
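The noise argument can be illustrated with the standard two-stage birth-death model of gene expression, in which transcription is followed by translation. This model is a textbook idealization and is not part of the analysis reported here. All rates below are hypothetical. In this linear model the steady-state protein Fano factor (variance/mean) is 1 + k_p/(γ_m + γ_p). For a fixed mean protein level, then, making the protein from many mRNAs that are each translated slowly gives less relative noise (CV²) than making it from a few mRNAs translated rapidly. This is the sense in which repressing translation, for example by a uORF, could reduce noise.

```python
from typing import NamedTuple


class TwoStageParams(NamedTuple):
    k_m: float      # transcription rate (mRNAs per unit time)
    gamma_m: float  # mRNA decay rate (per unit time)
    k_p: float      # translation rate (proteins per mRNA per unit time)
    gamma_p: float  # protein decay rate (per unit time)


class ProteinStats(NamedTuple):
    mean: float  # steady-state mean protein copy number
    fano: float  # variance / mean
    cv2: float   # squared coefficient of variation, variance / mean**2


def protein_stats(p: TwoStageParams) -> ProteinStats:
    """Steady-state protein moments of the linear two-stage expression model."""
    mean_mrna = p.k_m / p.gamma_m
    mean_protein = p.k_p * mean_mrna / p.gamma_p
    # Exact for this linear model: Fano = 1 + k_p / (gamma_m + gamma_p).
    fano = 1.0 + p.k_p / (p.gamma_m + p.gamma_p)
    return ProteinStats(mean_protein, fano, fano / mean_protein)


if __name__ == "__main__":
    # Hypothetical rates chosen so both strategies give ~1000 proteins on average.
    high_translation = TwoStageParams(k_m=0.1, gamma_m=0.1, k_p=10.0, gamma_p=0.01)
    low_translation = TwoStageParams(k_m=1.0, gamma_m=0.1, k_p=1.0, gamma_p=0.01)
    for label, params in [("few mRNAs, fast translation", high_translation),
                          ("many mRNAs, slow translation", low_translation)]:
        s = protein_stats(params)
        print(f"{label}: mean={s.mean:.0f}, Fano={s.fano:.1f}, CV^2={s.cv2:.4f}")
```

With these illustrative rates, both parameter sets yield a mean of about 1000 proteins. CV² falls from roughly 0.092 to roughly 0.010 when the same output is achieved by ten times more transcription and ten times less translation.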
Upstream ORFs are common, easily identified features of eukaryotic genes. The subtle, dual nature of the selective forces underpinning their conservation underscores the need for clusters of related genome sequences in deciphering functional noncoding elements. Further, the emerging universality of uORFs and other mechanisms of post-transcriptional gene regulation highlights the need for full-length cDNA libraries and other resources to identify 5' leader sequences and 3' untranslated regions in newly sequenced genomes. Given the ubiquity of uORFs in genomes, understanding how they affect translation will ultimately be essential to understanding eukaryotic gene expression.
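As an illustration of how readily uORFs can be identified, the Python sketch below scans an annotated 5' leader for ATG triplets and follows each one in frame to its first stop codon. It assumes a simple working definition that may differ from the criteria used in this study:

- an upstream ATG lies wholly within the leader;
- only the standard stop codons count;
- input uses the DNA alphabet;
- coordinates are 0-based.

The function name `find_uorfs` and the example sequence are hypothetical. A hit whose stop falls beyond the main start codon (`in_leader=False`) overlaps the coding sequence. If it is in frame with the CDS and has no intervening stop, it amounts to an N-terminal extension rather than a separate uORF.

```python
from typing import List, NamedTuple, Optional

# Standard stop codons (assumed; alternative genetic codes are ignored).
STOP_CODONS = {"TAA", "TAG", "TGA"}


class UORF(NamedTuple):
    start: int               # 0-based index of the upstream ATG
    stop_end: Optional[int]  # index just past the in-frame stop codon, or None if none found
    in_leader: bool          # True if the stop codon ends at or before the main start codon


def find_uorfs(transcript: str, cds_start: int) -> List[UORF]:
    """Find upstream ATGs in a 5' leader and their first in-frame stop codon.

    `transcript` is an mRNA sequence in the DNA alphabet; `cds_start` is the
    0-based index of the annotated main start codon, so transcript[:cds_start]
    is the 5' leader.
    """
    seq = transcript.upper()
    uorfs = []
    # Only ATGs lying entirely within the leader count as upstream starts.
    for i in range(max(0, cds_start - 2)):
        if seq[i:i + 3] != "ATG":
            continue
        stop_end = None
        # Walk codon by codon in this ATG's reading frame until a stop is found.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                stop_end = j + 3
                break
        in_leader = stop_end is not None and stop_end <= cds_start
        uorfs.append(UORF(i, stop_end, in_leader))
    return uorfs


if __name__ == "__main__":
    # Hypothetical transcript: leader "GGATGAAATAGCC", main ATG at index 13.
    print(find_uorfs("GGATGAAATAGCCATGGCTTAA", 13))
```

For the hypothetical transcript, the scan reports one uORF: `UORF(start=2, stop_end=11, in_leader=True)`, that is, ATG-AAA-TAG entirely within the leader.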
Supplementary Material
Supplementary Table 1 and Figure 1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgements
This work was supported in part by funding from the National Science Foundation and the National Institute of Allergy and Infectious Diseases. We thank Matt Sachs, Scott Roy, and two anonymous reviewers for helpful comments on this manuscript.
Footnotes
Laura Katz, Associate Editor
References
Akashi H. Gene expression and molecular evolution. Curr Opin Genet Dev. (2001) 11:660–666.
Arava Y, Boas FE, Brown PO, Herschlag D. Dissecting eukaryotic translation and its control by ribosome density mapping. Nucleic Acids Res. (2005) 33:2421–2432.
Blake WJ, Kaern M, Cantor CR, Collins JJ. Noise in eukaryotic gene expression. Nature. (2003) 422:633–637.
Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, Green ED, Sidow A, Batzoglou S. LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. (2003) 13:721–731.
Chamary JV, Parmley JL, Hurst LD. Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat Rev Genet. (2006) 7:98–108.
Churbanov A, Rogozin IB, Babenko VN, Ali H, Koonin EV. Evolutionary conservation suggests a regulatory function of AUG triplets in 5'-UTRs of eukaryotic genes. Nucleic Acids Res. (2005) 33:5512–5520.
Crowe ML, Wang XQ, Rothnagel JA. Evidence for conservation and selection of upstream open reading frames suggests probable encoding of bioactive peptides. BMC Genomics. (2006) 7:16.
DePristo MA, Weinreich DM, Hartl DL. Missense meanderings in sequence space: a biophysical view of protein evolution. Nat Rev Genet. (2005) 6:678–687.
Duret L. Evolution of synonymous codon usage in metazoans. Curr Opin Genet Dev. (2002) 12:640–649.
Fraser HB, Hirsh AE, Giaever G, Kumm J, Eisen MB. Noise minimization in eukaryotic gene expression. PLoS Biol. (2004) 2:e137.
Gaba A, Wang Z, Krishnamoorthy T, Hinnebusch AG, Sachs MS. Physical evidence for distinct mechanisms of translational control by upstream open reading frames. EMBO J. (2001) 20:6453–6463.
Galagan JE, Calvo SE, Cuomo C, et al. (50 co-authors). Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae. Nature. (2005) 438:1105–1115.
Geballe AP, Sachs MS. Translational control by upstream open reading frames. In: Sonenberg N, Hershey JWB, Mathews MB, eds. Translational Control of Gene Expression. (2000) Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press. 595–614.
Grant CM, Hinnebusch AG. Effect of sequence context at stop codons on efficiency of reinitiation in GCN4 translational control. Mol Cell Biol. (1994) 14:606–618.
Hinnebusch AG. Translational regulation of GCN4 and the general amino acid control of yeast. Annu Rev Microbiol. (2005) 59:407–450.
Iacono M, Mignone F, Pesole G. uAUG and uORFs in human and rodent 5' untranslated mRNAs. Gene. (2005) 349:97–105.
Kent WJ. BLAT—the BLAST-like alignment tool. Genome Res. (2002) 12:656–664.
Kozak M. Determinants of translational fidelity and efficiency in vertebrate mRNAs. Biochimie. (1994) 76:815–821.
Loftus BJ, Fung E, Roncaglia P, et al. (54 co-authors). The genome of the basidiomycetous yeast and human pathogen Cryptococcus neoformans. Science. (2005) 307:1321–1324.
Lynch M. The origins of eukaryotic gene structure. Mol Biol Evol. (2006) 23:450–468.
Lynch M, Scofield DG, Hong X. The evolution of transcription-initiation sites. Mol Biol Evol. (2005) 22:1137–1146.
Ma B, Tromp J, Li M. PatternHunter: faster and more sensitive homology search. Bioinformatics. (2002) 18:440–445.
MacKay VL, Li X, Flory MR, et al. (12 co-authors). Gene expression analyzed by high-resolution state array analysis and quantitative proteomics: response of yeast to mating pheromone. Mol Cell Proteomics. (2004) 3:478–489.
McAdams HH, Arkin A. Stochastic mechanisms in gene expression. Proc Natl Acad Sci USA. (1997) 94:814–819.
Mehta A, Trotta CR, Peltz SW. Derepression of the Her-2 uORF is mediated by a novel post-transcriptional control mechanism in cancer cells. Genes Dev. (2006) 20:939–953.
Meijer HA, Thomas AA. Control of eukaryotic protein synthesis by upstream open reading frames in the 5'-untranslated region of an mRNA. Biochem J. (2002) 367:1–11.
Miyasaka H. The positive relationship between codon usage bias and translation initiation AUG context in Saccharomyces cerevisiae. Yeast. (1999) 15:633–637.
Morris DR, Geballe AP. Upstream open reading frames as regulators of mRNA translation. Mol Cell Biol. (2000) 20:8635–8642.
Novembre JA. Accounting for background nucleotide composition when measuring codon usage bias. Mol Biol Evol. (2002) 19:1390–1394.
Serikawa KA, Xu XL, MacKay VL, Law GL, Zong Q, Zhao LP, Bumgarner R, Morris DR. The transcriptome and its translation during recovery from cell cycle arrest in Saccharomyces cerevisiae. Mol Cell Proteomics. (2003) 2:191–204.
Tanay A, Regev A, Shamir R. Conservation and evolvability in regulatory networks: the evolution of ribosomal regulation in yeast. Proc Natl Acad Sci USA. (2005) 102:7203–7208.
Vilela C, McCarthy JE. Regulation of fungal gene expression via short open reading frames in the mRNA 5' untranslated region. Mol Microbiol. (2003) 49:859–867.
Wang XQ, Rothnagel JA. 5'-untranslated regions with multiple upstream AUG codons can support low-level translation via leaky scanning and reinitiation. Nucleic Acids Res. (2004) 32:1382–1391.
Wang Y, Liu CL, Storey JD, Tibshirani RJ, Herschlag D, Brown PO. Precision and functional specificity in mRNA decay. Proc Natl Acad Sci USA. (2002) 99:5860–5865.
Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. (1997) 13:555–556.
Zhang Z, Dietrich FS. Identification and characterization of upstream open reading frames (uORF) in the 5' untranslated regions (UTR) of genes in Saccharomyces cerevisiae. Curr Genet. (2005) 48:77–87.