MBE Advance Access originally published online on March 20, 2007
Molecular Biology and Evolution 2007 24(5):1140-1148; doi:10.1093/molbev/msm045

© The Author 2007. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Evolutionary Conservation of UTR Intron Boundaries in Cryptococcus

Scott William Roy*, David Penny* and Daniel E. Neafsey†

* Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand
† Microbial Analysis Group, Broad Institute of MIT and Harvard University

E-mail: scottwroy@gmail.com.


    Abstract
 TOP
 Abstract
 Introduction
 Methods
 Results
 Discussion
 Concluding Remarks
 Supplementary Material
 Acknowledgements
 References
 
Despite significant progress, the general functional and evolutionary significance of the untranslated regions (UTRs) of eukaryotic transcripts remains mysterious. Particularly mysterious is the common occurrence of spliceosomal introns in transcript UTRs, because UTR splicing is not necessary for restoration of transcript coding sequence. In general, it is not known to what extent such splicing performs an important function or merely represents spliceosomal "noise." We conducted the first analysis of evolutionary conservation of UTR splicing. Among 4 species from the Cryptococcus neoformans species complex, we find high levels of conservation of UTR intron boundary sequences, strongly suggesting that UTR intron splicing is conserved by purifying selection. We estimate that 50–90% of splice boundaries are maintained by selection. Donor site sequences are more highly conserved than acceptor sequences, and splicing boundaries are more conserved in 5' UTRs than in 3' UTRs. In addition, we report a variety of differences between patterns of UTR splicing in Cryptococcus and corresponding patterns in animals and plants. These results focus attention on the functional roles of eukaryotic UTRs and deepen the mystery of UTR intron splicing.

Key Words: untranslated regions • genome evolution • purifying selection


    Introduction
Spliceosomal introns are sequences in eukaryotic genomes that are removed from RNA transcripts by the spliceosome prior to nuclear export and translation. The apparent absence of a general function for introns, as well as their peculiar phylogenetic distribution across eukaryotic lineages, makes them a central mystery of genome evolution (for recent reviews, see Rogozin et al. 2005; Jeffares et al. 2006; Rodríguez-Trelles et al. 2006). A wealth of work over the past few years has delineated the contours of intron evolution within coding sequences over different timescales (e.g., Fedorov et al. 2002; Rogozin et al. 2003; Collins and Penny 2005; Roy and Gilbert 2005; Slamovits and Keeling 2006).

However, the vast majority of theoretical and empirical work on spliceosomal intron evolution has focused on introns that interrupt coding sequences, overlooking spliceosomal introns present in both 5' and 3' untranslated regions (UTRs) of protein-coding transcripts for many species (Pesole et al. 2001; Chung et al. 2006; Hong et al. 2006). Such UTR introns are found across a wide variety of eukaryotic lineages and reach large numbers in at least some plants and animals (Chung et al. 2006; Hong et al. 2006). The existence and broad phylogenetic distribution of UTR introns remain quite mysterious; whereas the removal of introns from coding regions is clearly necessary for accurate translation of full-length proteins, the necessity of splicing of noncoding regions is less obvious. In particular, the relatively high frequency of introns in 3' UTRs is surprising, as the presence of stop codons upstream of these intron boundaries might be expected to cause these transcripts to be targeted for degradation by the nonsense-mediated decay (NMD) pathway (Hentze and Kulozik 1999). One possibility is that UTR intron presence affects posttranscriptional expression level, as recently found for the EF1α-A3 gene of Arabidopsis (Chung et al. 2006).

There is an increasing appreciation of the posttranscriptional regulatory effects of UTRs. In the 5' UTR, short open reading frames (ORFs) upstream of the translation initiation site are known to regulate translation levels (Morris and Geballe 2000; Meijer and Thomas 2002; Vilela and McCarthy 2003). In general, ATG triplets lying upstream of the true translation initiation site have been shown to be preferentially conserved (Churbanov et al. 2005; Zhang and Dietrich 2005; Crowe et al. 2006), indicating a functional role for these sites. One of the few general proposed functions of 5' UTR splicing to date is minimization of UTR length in order to avoid mutation to ATG codons, which could cause repression of translation (Hong et al. 2006). The 3' UTR sometimes contains targets for miRNAs, allowing for specific posttranscriptional regulation (Lai 2002; Bartel 2004; Lall et al. 2006).

A central reason that the evolution of UTR introns has largely escaped consideration to date is the rapid sequence evolution of UTRs (Larizza et al. 2002; Shabalina et al. 2004). In general, UTR sequences evolve more rapidly than coding sequences, and UTR sequences lack a clear general organizing principle such as coding frame, making alignment of UTR sequences over even moderate evolutionary distances difficult. In addition, computational prediction of UTR introns is plagued by similar uncertainties, and the 3' bias of cDNA libraries obscures splicing pattern in 5' regions of transcripts. However, the increasing availability of closely related clusters of well-characterized full-genome sequences and large numbers of full-length cDNAs from intron-rich species finally allows for the study of UTR splicing.

We studied the evolution of UTR splicing in 4 species of the Cryptococcus neoformans species complex (Xu et al. 2000; Loftus et al. 2005). Pairwise synonymous divergence within the clade has been previously estimated as ranging from 11% to 37% (Stajich JE, Neafsey DE, unpublished observations), allowing for alignment of many noncoding regions. We studied all C. neoformans UTR introns for which 4-way orthologous sequences were known: 442 introns in 334 5' UTRs and 180 introns in 126 3' UTRs. We determined levels of conservation at intron splice sites across the clade.

Our major findings include the following: 1) UTR intron boundaries are generally highly conserved; 2) 5' UTR intron boundaries are more highly conserved than 3' UTR intron boundaries; 3) intron donor sites are more highly conserved than intron acceptor sites in both 5' and 3' UTRs; 4) we found no cases of (near) exact intron loss/gain; 5) there is a relatively high density of introns beginning with a GC dinucleotide, as opposed to the canonical GT, particularly in 3' UTRs; 6) we find significant interconversion of GC and GT donor sites; and 7) patterns of intron density and length in Cryptococcus vary significantly from previously reported patterns in plants and animals. In total, these results indicate that purifying selection maintains the majority of splice sites for UTR introns, particularly in 5' UTRs. One possible explanation for conservation of 5' UTR splicing could be the presence of alternative transcription or translation initiation sites. These results underscore the importance of UTRs in gene evolution.


    Methods
Data Set
We compared orthologous UTRs between 4 members of the C. neoformans species complex. The genome assemblies of 4 strains of C. neoformans were obtained from the websites of the sequencing centers that produced them (strain JEC21: TIGR; strain WM276: Michael Smith Genome Center; and strains H99 and R265: Broad Institute). Whole-genome alignments were created in a multistep process with strain JEC21 as the reference. First, pairwise alignments between JEC21 and the other sequenced strains were created using PatternHunter (Ma et al. 2002). Blocks of 4-way homologous contigs were then identified using a hierarchical synteny clustering algorithm. Multiple alignments of homologous regions were generated using Multi-LAGAN (Brudno et al. 2003). The 5' and 3' UTRs in the alignments were identified using a full-length cDNA library for JEC21 produced by TIGR (available at http://www.tigr.org/tdb/e2k1/cna1/). Each cDNA was matched to a region of the JEC21 genome using BLAT (Kent 2002). The 5' and 3' UTRs were inferred when the matching region of a cDNA sequence extended beyond the boundaries of a coding sequence identified in the TIGR annotation of the JEC21 strain. In cases where multiple cDNA BLAT matches partially overlapped coding regions, 5' and 3' UTR boundaries correspond to the most distal cDNA match. In the few (~10) cases in which use of an alternative in-frame upstream ATG extended the coding sequence relative to the TIGR annotation, we used this upstream start site. (Thus we conservatively defined 5' UTR introns. If in fact a downstream ATG is the true translation start site, this will lead to an actual UTR intron being identified as a coding sequence intron and excluded from the analysis.)
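The UTR inference rule described above can be illustrated with a minimal Python sketch. It is not the authors' code (the analyses in the paper were done in Perl); the function name `infer_utrs`, the half-open coordinate convention, and the strand handling are assumptions made here for illustration. The returned intervals are genomic spans, so they still include any UTR introns.

```python
def infer_utrs(cds_start, cds_end, cdna_matches, strand="+"):
    """Return (five_prime_utr, three_prime_utr) as half-open genomic intervals.

    cds_start, cds_end: half-open interval of the annotated coding sequence.
    cdna_matches: list of (start, end) half-open intervals of cDNA alignments.
    Either element of the result is None when no cDNA extends past the CDS.
    """
    # Only cDNA matches that overlap the coding sequence are informative.
    overlapping = [(s, e) for s, e in cdna_matches if s < cds_end and e > cds_start]
    if not overlapping:
        return None, None
    # Take the most distal match on each side, as described in the text.
    left = min(s for s, _ in overlapping)
    right = max(e for _, e in overlapping)
    upstream = (left, cds_start) if left < cds_start else None
    downstream = (cds_end, right) if right > cds_end else None
    # On the minus strand the 5' UTR lies at higher genomic coordinates.
    return (upstream, downstream) if strand == "+" else (downstream, upstream)


# Hypothetical example: two cDNAs overlapping a CDS at positions 100-400.
five, three = infer_utrs(100, 400, [(60, 300), (80, 450)])
```

In this hypothetical example the 5' UTR is taken from the cDNA that starts furthest upstream and the 3' UTR from the one that ends furthest downstream.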

We extracted the 334 5' UTR and 126 3' UTR 1:1 ortholog sets showing evidence of UTR splicing in C. neoformans. In total, there were 442 5' and 180 3' introns. UTR intron sequences were checked for canonical GT...AG or GC...AG boundaries. Visual inspection yielded no cases in which there was evidence for precise excision/insertion of an intron (i.e., with most/all of the intron removed with fewer than 10 bp on either side). The analyses described here were performed with custom Perl programs.
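As a small illustration of the boundary check mentioned above, the following Python sketch tests whether an intron sequence has GT...AG or GC...AG boundaries. The function name is hypothetical and the sketch assumes the intron is supplied 5' to 3' on the transcribed strand.

```python
def has_canonical_boundaries(intron_seq):
    """True if the intron begins with GT or GC and ends with AG."""
    seq = intron_seq.upper()
    # Require at least the two boundary dinucleotides.
    return len(seq) >= 4 and seq[:2] in ("GT", "GC") and seq[-2:] == "AG"
```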

We used BlastN to map all 23,000 available full-length JEC21 cDNAs against the genomic JEC21 sequence to determine variation in the position of the transcription start site.

Definition of Conserved Regions and Levels of Conservation
We utilized a simple definition of conserved regions as nucleotide positions without gaps or uncertain nucleotides (e.g., N). For each single nucleotide or dinucleotide in the alignment, we determined conservation across all 4 species. In addition, we tried a variety of other definitions of "conserved" nucleotide positions. We defined conserved positions as those in which flanking sequence (using total windows of 10, 14, or 20 bp, excluding the position in question) showed at least a threshold level of conservation (50% or 75%). Estimated levels of conservation were relatively constant for all definitions used (69–75%), as were estimated fractions of conserved intronic boundary sites. Therefore, we used the simple and straightforward criterion of all ungapped positions.
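The simple criterion above (all ungapped, unambiguous positions) can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not the authors' program: the function name `conservation` and the parameter `k` (1 for single nucleotides, 2 for dinucleotides) are introduced here, and any character other than A, C, G, or T is treated as a gap or uncertain base.

```python
def conservation(alignment, k=1):
    """Fraction of usable k-mers (k adjacent columns) identical in all sequences.

    alignment: list of equal-length aligned strings, one per species.
    A window is usable only if it has no gap or ambiguous base in any sequence.
    """
    seqs = [s.upper() for s in alignment]
    length = min(len(s) for s in seqs)
    usable = conserved = 0
    for i in range(length - k + 1):
        words = [s[i:i + k] for s in seqs]
        if any(ch not in "ACGT" for w in words for ch in w):
            continue  # skip windows containing gaps or uncertain nucleotides
        usable += 1
        conserved += len(set(words)) == 1
    return conserved / usable if usable else float("nan")


# Hypothetical 4-way alignment block.
block = ["ACGT-A", "ACGTNA", "ACCTAA", "ACGTAA"]
single = conservation(block)       # single-nucleotide conservation
dinuc = conservation(block, k=2)   # dinucleotide conservation
```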


    Results
Patterns of UTR Splicing in Cryptococcus
The patterns of UTR splicing in C. neoformans are summarized in table 1. Mean and median intron lengths were 157.0 and 75 bp in 5' UTRs and 63.1 and 57 bp in 3' UTRs; thus, intron lengths in 5' but not 3' UTRs showed pronounced rightward skew. Length of 5' UTR introns was negatively correlated with exonic distance from the translation initiation site (r = –0.108, P = 0.015). However, introns very near the translation start site (within 12 bp) showed similar mean (176.1 bp) and median (70 bp) lengths to all 5' UTR introns.


Table 1 Summary of Introns in UTRs and Coding Sequences in Cryptococcus neoformans

 
Mean and median exon lengths were 132.8 and 91 bp in 5' UTRs and 164.5 and 113 bp in 3' UTRs. Terminal 5' UTR exons (the sequence between the 3'-most intron and the translation start site) tended to be shorter, with a mean length of 108.3 bp and a median length of 62 bp, and there was a much higher frequency of short exons (31.3% of terminal exons had lengths less than 30 bp vs. 15.7% for nonterminal exons; P = 1.2 × 10⁻⁸ by a Fisher Exact test).

Introns constituted a much higher fraction of total UTR length in 5' UTRs (38.4%) than in 3' UTRs (17.2%). The vast majority of intron-containing 5' UTRs (87.6%) and 3' UTRs (80.6%) had a single intron; only 2.3% of 5' and 5.6% of 3' UTRs had more than 2. In total, there were 3.38 introns per kilobase of exonic sequence on average in intron-containing 5' UTRs and 2.56 per kilobase in 3' UTRs. Including both intron-containing and intronless UTRs, there were 1.01 introns per exonic kilobase in 5' UTRs and 0.29 in 3' UTRs. By comparison, there are 3.31 introns per exonic kilobase in coding sequences of C. neoformans.

Intronic and Exonic UTR Sequence Conservation
Levels of sequence conservation at ungapped positions were similar across different classes of sites. A total of 70.9% of all 5' UTR sites and 72.3% of all 3' UTR sites were conserved across all 4 species. In the 5' UTR, 71.6% of exonic sites and 69.7% of intronic sites were conserved across species; in the 3' UTR, 72.9% of exonic sites and 69.4% of intronic sites were conserved. For purposes of direct comparison with levels of conservation of intron boundary dinucleotides, we directly calculated levels of conservation of dinucleotides. In the 5' UTR, 55.3% of exonic dinucleotides and 51.0% of intronic dinucleotides were conserved across species (fig. 1a), slightly higher than expected from levels of single nucleotide conservation (71.6% × 71.6% = 51.3% and 69.7% × 69.7% = 48.5%, respectively). In the 3' UTR, 56.0% of exonic dinucleotides and 51.1% of intronic dinucleotides were conserved (vs. expected 53.1% and 48.2%, respectively).
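The expected dinucleotide values quoted above follow from treating adjacent positions as independent, so that the expected dinucleotide conservation is the square of the single-nucleotide conservation. A short Python sketch of that arithmetic is given here; the function name is hypothetical, and small differences in the last digit can arise because the published percentages are already rounded.

```python
def expected_dinucleotide_conservation(single_site_pct):
    """Expected % of conserved dinucleotides if adjacent sites evolve independently."""
    p = single_site_pct / 100.0
    return 100.0 * p * p


# Exonic sites, values taken from the text.
exp_5utr_exonic = expected_dinucleotide_conservation(71.6)
exp_3utr_exonic = expected_dinucleotide_conservation(72.9)
```

Observed dinucleotide conservation above this expectation (e.g., 55.3% observed for 5' UTR exonic dinucleotides) indicates that conserved positions are somewhat clustered rather than independent.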


FIG. 1.— Evolutionary conservation of UTR intron boundaries. (A) Percent conservation of intron boundary dinucleotides in 5' (light) and 3' (dark) UTRs. Donor G(T/C) values were calculated excluding donor sites that are either GT or GC for all 4 species. (B) Nucleotide conservation near UTR intron boundaries. Data for 5' UTRs (above) and 3' UTRs (below) are shown. Dashed lines indicate the total exonic and intronic averages over the corresponding region.

 
Conservation of Intron Boundaries
In general, intron donor site dinucleotides were more highly conserved than acceptor sites (fig. 1a). In all, 88.1% (281/319) of donor dinucleotides in 5' UTRs and 83.6% (97/116) in 3' UTRs were conserved across all 4 species. Excluding 18 sites that varied between GC and GT in the 4 species, 91.5% (281/307) of 5' donors and 88.2% (97/110) of 3' donors were conserved. Acceptor dinucleotides showed 81.7% (268/328) conservation in 5' UTRs and 71.4% (85/119) conservation in 3' UTRs. The difference in conservation between all donor and all acceptor sites is significant at the P = 0.0023 level by a Fisher Exact test. The difference in conservation between all 5' and all 3' UTR intron boundary dinucleotides is significant at the P = 0.011 level.
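For readers who want to reproduce this kind of comparison, here is a minimal, standard-library Python sketch of a two-sided Fisher exact test on a 2 × 2 table. It is an illustration only: the text does not state exactly how the counts were pooled, so the table below (donors 281 + 97 conserved of 319 + 116; acceptors 268 + 85 conserved of 328 + 119) is an assumption, as is the common "sum of tables no more probable than the observed one" definition of the two-sided P value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties).
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Assumed pooling of 5' and 3' UTR counts from the text:
# rows = donor, acceptor; columns = conserved, not conserved.
donor_vs_acceptor_p = fisher_exact_two_sided(378, 57, 353, 94)
```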

Sequence conservation at the donor site extended beyond the GT/GC dinucleotide (fig. 1b). For 5' and 3' UTRs, 84.5% and 80.7% of nucleotides in positions 3–6 were conserved, respectively. No intronic conservation was found in the acceptor site beyond the AG dinucleotide: positions –3 to –6 showed slightly lower conservation than intronic sites overall. Exonic sites flanking the donor site showed slightly elevated levels of conservation; exonic sites flanking the acceptor site did not (fig. 1b).

Estimated Fraction of Intron Boundary Conservation
The lower level of sequence divergence at intron boundaries strongly suggests selective conservation. In the simplest model, if a fraction f of intron boundaries were strictly conserved by selection while other boundaries were unconstrained by selection, we would expect that the level of intron boundary divergence would be (1 – f) times the neutral level of divergence. Using general intron dinucleotide conservation as the benchmark for the neutral divergence rate (51.0% conservation in 5' UTRs, i.e., 49.0% divergence), we get (1 – f) × 49.0% = 11.9% for 5' UTR donor sites, yielding f = 75.7%. Excluding polymorphic GT/GC donor sites, we estimate f = 82.7%. These estimates for all classes of sites are given in table 2.
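The estimate of f can be reproduced with a few lines of Python. This sketch assumes, as the model above does, that the neutral divergence is 100% minus the intronic dinucleotide conservation (51.0% in 5' UTRs, reported earlier in the Results); the function name is hypothetical.

```python
def fraction_under_selection(boundary_conserved_pct, neutral_conserved_pct):
    """Solve (1 - f) * neutral_divergence = boundary_divergence for f (in %)."""
    boundary_div = 100.0 - boundary_conserved_pct
    neutral_div = 100.0 - neutral_conserved_pct
    return 100.0 * (1.0 - boundary_div / neutral_div)


# 5' UTR values from the text; 51.0% is intronic dinucleotide conservation.
f_donor_all = fraction_under_selection(88.1, 51.0)        # all donor sites
f_donor_no_gtgc = fraction_under_selection(91.5, 51.0)    # excluding GT/GC sites
f_acceptor = fraction_under_selection(81.7, 51.0)         # acceptor sites
```

Rounded to one decimal place, these inputs give 75.7%, 82.7%, and 62.7%, respectively, matching the 5' UTR values quoted in the text.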


Table 2 Level of Conservation of Intron Boundary Dinucleotides and Estimated Frequency of Boundaries that Are Conserved by Selection (in Parentheses), as Described in the Text

 
Donor Boundary Sequences
In C. neoformans, 2.5% (8/319) of 5' UTR introns and 8.6% (10/116) of 3' UTR introns began with "GC" (see example in fig. 2a). In total, there were 16 5' sites and 14 3' sites that were GC in at least one species and GC or GT in all 4 species. There were 11 cases of conversion in 5' UTRs, of which 7 appear to be GT → GC transitions and one appears to be a GC → GT conversion by parsimony (i.e., one dinucleotide was common to 3 species; see example in fig. 2b). One of the conversions is from an ancestral GT to a GC in species JEC21. In the 3' UTRs, there was one case of GC → GT transition and one additional case of GT → GC, again in species JEC21.
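The parsimony criterion used above (one dinucleotide shared by 3 of the 4 species is taken as ancestral) can be sketched as follows. This is a simplified illustration that ignores the species tree; the function name and the dictionary input format are assumptions.

```python
from collections import Counter


def infer_donor_conversion(donors):
    """Infer the direction of a GT/GC donor conversion by simple majority.

    donors: dict mapping strain name -> donor dinucleotide.
    Returns (ancestral, derived, strain_with_change), or None when the site
    is not a GT/GC polymorphism or the direction is ambiguous (e.g., 2:2).
    """
    counts = Counter(d.upper() for d in donors.values())
    if set(counts) != {"GT", "GC"}:
        return None
    (major, n_major), (minor, _) = counts.most_common(2)
    if n_major != len(donors) - 1:
        return None  # no single odd-one-out, so the direction is unresolved
    strain = next(s for s, d in donors.items() if d.upper() == minor)
    return major, minor, strain


# Hypothetical site where only JEC21 carries GC.
call = infer_donor_conversion({"JEC21": "GC", "H99": "GT", "R265": "GT", "WM276": "GT"})
```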


FIG. 2.— GC donor sequences in UTR introns. (A) An example of a GC...AG intron in the 3'UTR sequence of CNC01100 and orthologs. Upper/lowercase indicates exonic/intronic sequence. Translation stop codon is underlined. (B) Apparent interconversion from GT to GC splice boundary in the 5'UTR of CNG04230 ortholog in R265. Translation start codon is underlined.

 
Intron Conservation Near Coding Sequence Boundaries
The boundaries of 5' UTR introns near the start codon (within 12 bp) were particularly highly conserved, with 95% of donor sites (98% excluding 2 GT/GC sites) and 93.7% of acceptor sites conserved across species (P = 0.045 compared with all 5' UTR donors by a 1-tailed Fisher Exact test for donor sites; P = 0.011 for acceptor sites; figs. 2b and 3a). One case of a mutated acceptor site near the ATG start site is shown in figure 3b (in gene CNA05760). Interestingly, a G indel directly before the ATG start codon provides a possible nearby alternative acceptor site. By contrast, boundaries of 3' UTR introns within 12 bp of the stop codon were nonsignificantly less conserved than other 3' UTR introns, with 7/11 donor and 6/11 acceptor sites conserved.


FIG. 3.— 5' UTR introns near translation start codon. (A) Conservation of intron boundaries directly before ATG start codon in 5' UTR of CNA03330. (B) Lack of conservation of acceptor site for Cryptococcus neoformans neoformans intron 4 bases before translation start codon (underlined) in the 5' UTR of CNA05760. In R265 and WM276, which have a CG dinucleotide instead of the canonical AG dinucleotide, a G indel (arrow) provides a possible acceptor site directly before the start codon.

 
Large UTR Deletions Involving Introns
We found no evidence for (near) exact insertion/loss of the entire intron sequence. However, we did observe large genomic deletions in which most or all of an intron was removed, presumably altering splicing patterns (fig. 4). In the 5' UTR of gene CNA07270, a large genomic deletion has removed the entirety of the intronic sequence as well as flanking exonic sequence (fig. 4a). In the 5' UTR of CNA01200, a large genomic deletion spans most of an intron as well as adjacent exonic sequence (fig. 4b). In the 5' UTR of CNA01680, a 19 bp deletion is associated with the creation of an intron acceptor site in strain JEC21 (the A...G spanning the deletion; fig. 4c).


FIG. 4.— Large indels involving UTR introns. (A) A large deletion has removed nearly all of the 5' UTR intron along with adjacent coding sequence in the H99 ortholog to CNA07270. Note that WM276 has experienced a GT → GG mutation as well as a change in translation initiation site (underlined). (B) A large deletion has removed nearly all of a 5' UTR intron along with adjacent coding sequence in the WM276 ortholog to CNA01200. (C) A 19 bp deletion in CNA01680 leaves an AG dinucleotide, which now functions as an acceptor site for a 5' UTR intron. The nearby AG dinucleotide in the ancestral sequence is a possible ancestral acceptor.

 
Alternative Promoters and 5' UTR Splicing
Alternative splicing of 5' UTRs has been shown to be associated with alternative promoters in some cases. To determine whether transcription start sites were more variable in intron-containing 5' UTRs, for each gene, we calculated the variance of transcription start site (among available full-length cDNAs), normalized (i.e., divided) by total UTR length for each gene (Supplementary Materials online). Intron-containing UTRs had significantly higher normalized variances than UTRs without introns (P < 1 × 10⁻⁷, Mann-Whitney U test), consistent with 5' UTR splicing being associated with use of alternative promoters.
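The per-gene statistic described above can be sketched in Python as follows. The text does not say whether a population or sample variance was used, so the population variance here is an assumption, as are the function name and the choice to skip genes with fewer than 2 cDNAs. The Mann-Whitney U comparison between intron-containing and intronless UTRs is not shown.

```python
from statistics import pvariance


def normalized_tss_variance(tss_positions, utr_length):
    """Variance of transcription start site positions divided by UTR length.

    tss_positions: genomic start coordinates of the full-length cDNAs of a gene.
    utr_length: total 5' UTR length for that gene, in bp.
    Returns None when the statistic is undefined (fewer than 2 cDNAs).
    """
    if len(tss_positions) < 2 or utr_length <= 0:
        return None
    # Population variance is an assumption; the source does not specify.
    return pvariance(tss_positions) / utr_length


# Hypothetical gene with two cDNAs starting 20 bp apart and a 100 bp UTR.
example = normalized_tss_variance([90, 110], 100)
```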


    Discussion
We report the first genome-wide study of the evolutionary conservation of intron splicing in UTRs. We find that boundary sequences of UTR introns are preferentially conserved, suggesting conservation of intron splicing by selection. Intron conservation is particularly pronounced in 5' UTRs near the translation initiation site.

UTR Evolution
The evolutionary and functional significance of UTRs of eukaryotic transcripts remains a mystery (Churbanov et al. 2005; Lynch et al. 2005; Hong et al. 2006). Most prokaryotes and some eukaryotes have extremely short or nonexistent UTRs; thus, UTRs are not essential for transcription or translation (discussed in Lynch 2006). One function and/or liability of UTRs concerns the presence of upstream ATG codons in 5' UTRs (premature start codons [PSCs]). In general, 5' UTRs are depleted for PSCs, suggesting that PSCs are often disfavored (Hahn et al. 2003). However, those PSCs that are present appear to be preferentially conserved, often paired with a nearby in-frame stop codon to yield a short upstream ORF (or uORF; Churbanov et al. 2005; Iacono et al. 2005). Such uORFs have been shown to affect expression levels, and their preferential evolutionary conservation suggests that they have been utilized as functional posttranscriptional regulators of gene expression (Churbanov et al. 2005; Neafsey DE, Galagan JE, unpublished results). On the other hand, the potential for random mutation to deleterious PSCs makes 5' UTRs a liability, one that could potentially be acted upon by selection (Lynch et al. 2005).

Patterns of UTR Splicing across Eukaryotic Lineages
The pattern of UTR splicing in Cryptococcus varies considerably from previously reported patterns in other species (Hong et al. 2006). Intron density in 5' UTRs (1.0 per kilobase) was only 3.5 times higher than in 3' UTRs, less than in plants (10 times) and much less than in animals (30–300 times). The median intron size in 5' UTRs was only 31.6% larger than in 3' UTRs and 35.0% larger than in coding sequences, less than for previously studied animals and plants (103–289% greater than 3' UTRs, 98–713% higher than coding sequences; Hong et al. 2006). The 3' UTR intron length distribution showed very little rightward skew (mean/median = 1.1), contrary to previously studied animals and plants (ratio from 2.8 to 9.2).

Other patterns of Cryptococcus UTR splicing mirrored results for other species (Hong et al. 2006). Intron density in 5' UTRs was 3.5 times lower than in coding regions, similar to the previous maximum known value of 2.9 in Arabidopsis thaliana. Intron length in 5' UTRs was significantly negatively correlated with distance from the ATG start codon. As with previous species, intron densities in 5' UTRs were higher than in 3' UTRs and lower than in coding sequences, and intron lengths in 3' UTRs were lower than in 5' UTRs and similar to lengths in coding sequences. As in plants and animals, most UTRs have zero or one intron, and very few have more than two.

Evolutionary Conservation of UTR Splicing
We show here that boundaries of intron sequences are evolutionarily conserved in Cryptococcus. We estimate that 50–90% of boundaries are conserved. This strongly suggests not only that selection is retaining general UTR splicing but also that the exact boundaries of the spliced intronic sequence are retained (i.e., that intron boundaries are not rapidly shifting through evolution).

We find that 5' UTR intron boundaries are particularly highly conserved, with an estimated 82.7% of donor and 62.7% of acceptor boundaries having been maintained by selection. Evolutionary conservation is even more pronounced near the ATG translation initiation site, where an estimated 96.6% of donor boundaries and an estimated 86.1% of acceptor boundaries have been preferentially conserved by selection. These very high levels of splice site preservation indicate an important role for 5' UTR splicing. Recently, a role in posttranscriptional regulation was shown for a 5' UTR intron in Arabidopsis (Chung et al. 2006), which provides one possible general role for 5' UTR splicing. The particularly high levels of 5' conservation near the translation start site could indicate an even greater influence of sequences and splice boundaries at the end of the UTR on posttranscriptional regulation.

The presence of 3' UTR introns is surprising, as such introns are expected to subject the transcript to degradation by the NMD pathway. However, we find that 3' UTR splice site boundaries are preferentially retained through selection, indicating a functional role for 3' UTR splicing. One possibility is that alternative splicing of 3' UTR introns could utilize NMD as a mechanism for posttranscriptional expression regulation. An even more interesting possibility is that sequences of 3' UTR introns could contain targets for miRNAs, in which case alternative splicing of these introns could utilize RNAi as a mechanism for posttranscriptional regulation (Stark et al. 2003).

Lack of Intron Loss and Gain in UTRs
We found no evidence for loss or gain of exact or nearly exact intron sequences in the studied species. For Cryptococcus, this result is not surprising as a previous study of intron loss/gain in Cryptococcus coding regions found almost no intron loss/gain (Stajich and Dietrich 2006). We have previously suggested that in such cases of stasis, a lack of spontaneous mutations leading to intron loss/gain alleles is likely to explain the dearth of change (Roy and Gilbert 2006; Roy and Hartl 2006; Roy and Penny 2006). In this case, the lack of intron loss/gain in UTRs is not surprising. It would be interesting to compare rates of intron loss/gain in UTRs and coding sequences in species with a much larger amount of intron loss/gain.

Longer Introns in 5' UTRs
As found previously for plants and animals (Hong et al. 2006), introns in C. neoformans 5' UTRs tend to be longer than in coding regions or 3' UTRs. Hong et al. (2006) previously suggested that this might reflect selection against exonic ATGs in 5' UTRs. According to their model, expansion of introns (i.e., movement of the splicing boundary into adjacent sequence) could be positively selected based on the incorporation of such ATGs into the intronic sequence.

The current results do not support this model well. First, if selection for intron boundary movement were an important factor in the evolution of 5' UTR splicing, we might expect to see frequent movement of intron boundaries. Instead, we show here that the boundaries of UTR introns, and particularly of 5' UTR introns, tend to be conserved through evolution. Second, if in fact 5' introns were longer due to incorporation of previously exonic ATGs, ATGs should be more common near intron boundaries than in the middle of introns. However, no such trend is seen; in fact, there is a slight trend toward ATGs being less common at the boundaries of 5' UTR introns (fig. 5).


FIG. 5.— ATG frequency in 5'UTR introns. Frequency of ATGs per triplet divided by 1/64 (expected frequency assuming equal nucleotide frequencies) along the intron length for 5'UTR introns in bins of 0.05 are given. Error bars give the standard deviation assuming a binomial distribution. The trace shows the best fit to the data for a second-order polynomial.
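The quantity plotted in figure 5 can be approximated with the following Python sketch: ATG frequency per triplet, divided by the 1/64 expectation, in bins of relative position along the intron (20 bins correspond to a bin width of 0.05). The source does not specify how triplets were counted, so the sliding window over every offset, the binning by triplet start position, and the function name are all assumptions made for illustration.

```python
def atg_enrichment_by_position(introns, n_bins=20):
    """Per-bin ATG frequency per triplet, relative to the 1/64 expectation.

    introns: iterable of intron sequences (5' to 3').
    Returns a list of n_bins ratios; a bin with no triplets yields None.
    """
    atg = [0] * n_bins
    total = [0] * n_bins
    for seq in introns:
        seq = seq.upper()
        n_triplets = len(seq) - 2
        for i in range(n_triplets):
            # Relative position of the triplet start along the intron, in [0, 1).
            b = min(int(i / n_triplets * n_bins), n_bins - 1)
            total[b] += 1
            atg[b] += seq[i:i + 3] == "ATG"
    return [(a / t) * 64 if t else None for a, t in zip(atg, total)]


# Toy example with one short sequence and two bins.
toy = atg_enrichment_by_position(["ATGCCC"], n_bins=2)
```

A value of 1.0 in a bin would mean ATGs occur at the frequency expected under equal nucleotide frequencies; the model of Hong et al. would predict values rising toward the first and last bins.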

 
Moreover, although the overall deficit of ATGs in exonic regions of 5' UTRs presumably reflects selection against many or most new PSCs, previous results (Churbanov et al. 2005; Neafsey DE, Galagan JE, unpublished data) show that PSCs that are present in 5' UTRs (particularly those of uORFs) are conserved through evolution, suggesting that their presence in the spliced UTR is favored by natural selection. Thus, we would expect that an intron expansion event that led to the removal of an exonic 5' UTR ATG from the transcript might in fact be disfavored.

An Alternative Explanation for Longer 5' UTR Introns
We propose instead that the greater intron length in 5' UTRs reflects the fundamentally different function of 5' UTR introns. The 5' UTRs are often alternatively spliced, with alternative forms sometimes being associated with alternative promoters or translation initiation sites (Garvin et al. 1988; Mironov et al. 1999; Loftus et al. 2005; Kimura et al. 2006). In such cases, we might expect some interference between the 2 alternative elements. In the case of alternative promoters, transcription enhancers/repressors acting at one promoter might interfere with transcription regulation at the alternative promoters. In the simplest case of alternative translation initiation sites, an upstream site would be utilized in spliced transcripts but a downstream site would be utilized in unspliced transcripts. In this case, in unspliced transcripts, the alternative (upstream) ATG would essentially be a PSC. In either case, the interference between sites might be decreased by increasing the length of genomic sequence between sites, in which case there would be direct selection for increased intron length. These forces might be stronger closer to the beginning of the coding sequence, explaining the inverse relationship between distance from the coding sequence and intron length (Hong et al. 2006).

Splicing Conservation in UTRs and Coding Regions
These results follow a wealth of recent results demonstrating conservation of introns in coding sequences for a variety of lineages (e.g., Roy and Hartl 2006; Roy and Penny 2006; Stajich and Dietrich 2006). However, "conservation" in the 2 cases denotes different things. The previous studies assessed intron loss/gain, that is, loss or gain of the genomic sequence corresponding to the intron, processes that usually do not alter the sequence of the eventual transcript. Here, we show intron boundary conservation, indicating conservation of splicing of the sequence; in this case, lack of conservation would indicate lack of splicing or a difference in splice boundaries, leading to an alteration in the eventual transcript sequence. In particular, whereas the lack of intron loss/gain in coding sequences could simply reflect a dearth of the necessary mutations (e.g., Roy and Hartl 2006), in this case, the conservation of intron splicing boundaries indicates purifying selection maintaining intron splicing. These differences notwithstanding, both sets of results attest to the generally slow rate of change of splicing patterns in eukaryotic transcripts.


    Concluding Remarks
UTRs of transcripts have sometimes been treated as largely neutral features. However, the present results indicate the general importance of UTR splicing, adding to the growing appreciation of the functional importance of UTRs. It seems most likely that UTR splicing is important for proper gene expression; however, the precise role of UTR splicing is not clear. Experimental studies should probe the importance of UTR splicing for regulation of transcription, nuclear export, and translation.


    Supplementary Material
Supplementary materials are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Acknowledgements
We thank Manuel Irimia for helpful comments on the manuscript. S.W.R. thanks Wen Wang and his laboratory for stimulating conversations and hospitality and for keeping him out from underneath the tires of Chinese buses. This work was supported in part by funds from the National Science Foundation and the National Institute of Allergy and Infectious Diseases (D.N.).


    Footnotes
 
Aoife McLysaght, Associate Editor


    References

    Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell (2004) 116:281–297.

    Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, Green ED, Sidow A, Batzoglou S. LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res (2003) 13:721–731.

    Chung BYW, Simons C, Firth AE, Brown CM, Hellens RP. Effects of 5' UTR introns on gene expression in Arabidopsis thaliana. BMC Genomics (2006) 7:120.

    Churbanov A, Rogozin IB, Babenko VN, Ali H, Koonin EV. Evolutionary conservation suggests a regulatory function of AUG triplets in 5'-UTRs of eukaryotic genes. Nucleic Acids Res (2005) 33:5512–5520.

    Collins L, Penny D. Complex spliceosomal organization ancestral to extant eukaryotes. Mol Biol Evol (2005) 22:1053–1066.

    Crowe ML, Wang XQ, Rothnagel JA. Evidence for conservation and selection of upstream open reading frames suggests probable encoding of bioactive peptides. BMC Genomics (2006) 7:16.

    Fedorov A, Merican AF, Gilbert W. Large-scale comparison of intron positions among animal, plant, and fungal genomes. Proc Natl Acad Sci USA (2002) 99:16128–16133.

    Garvin AM, Pawar S, Marth JD, Perlmutter RM. Structure of the murine lck gene and its rearrangement in a murine lymphoma cell line. Mol Cell Biol (1988) 8:3058–3064.

    Hahn MW, Stajich JE, Wray GA. The effects of selection against spurious transcription factor binding sites. Mol Biol Evol (2003) 20:901–906.

    Hentze MW, Kulozik AE. A perfect message: RNA surveillance and nonsense-mediated decay. Cell (1999) 96:307–310.

    Hong X, Scofield DG, Lynch M. Intron size, abundance, and distribution within untranslated regions of genes. Mol Biol Evol (2006) 23:2392–2404.

    Iacono M, Mignone F, Pesole G. uAUG and uORFs in human and rodent 5'untranslated mRNAs. Gene (2005) 349:97–105.

    Jeffares DC, Mourier T, Penny D. The biology of intron gain and loss. Trends Genet (2006) 22:16–22.

    Kent WJ. BLAT—the BLAST-like alignment tool. Genome Res (2002) 12:656–664.

    Kimura K, Wakamatsu A, Suzuki Y, Ota T, Nishikawa T, Yamashita R, Yamamoto J, Sekine M, Tsuritani K, Wakaguri H, et al. Diversification of transcriptional modulation: large-scale identification and characterization of putative alternative promoters of human genes. Genome Res (2006) 16:55–65. (32 co-authors).

    Lai E. Micro RNAs are complementary to 3' UTR sequence motifs that mediate negative post-transcriptional regulation. Nat Genet (2002) 30:363–364.

    Lall S, Grun D, Krek A, Chen K, Wang YL, Dewey CN, Sood P, Colombo T, Bray N, Macmenamin P, et al. A genome-wide map of conserved microRNA targets in C. elegans. Curr Biol (2006) 16:460–471. (15 co-authors).

    Larizza A, Makalowski W, Pesole G, Saccone C. Evolutionary dynamics of mammalian mRNA untranslated regions by comparative analysis of orthologous human, artiodactyl and rodent gene pairs. Comput Chem (2002) 26:479–490.

    Loftus BJ, Fung E, Roncaglia P, Rowley D, Amedeo P, Bruno D, Vamathevan J, Miranda M, Anderson IJ, Fraser JA, et al. The genome of the Basidiomycetous yeast and human pathogen Cryptococcus neoformans. Science (2005) 307:1321–1324. (54 co-authors).

    Lynch M. The origins of eukaryotic genome structure. Mol Biol Evol (2006) 23:450–468.

    Lynch M, Scofield DG, Hong X. The evolution of transcription-initiation sites. Mol Biol Evol (2005) 22:1137–1146.

    Ma B, Tromp J, Li M. PatternHunter: faster and more sensitive homology search. Bioinformatics (2002) 18:440–445.

    Meijer HA, Thomas AA. Control of eukaryotic protein synthesis by upstream open reading frames in the 5'-untranslated region of an mRNA. Biochem J (2002) 367:1–11.

    Mironov AA, Fickett JW, Gelfand MS. Frequent alternative splicing of human genes. Genome Res (1999) 9:1288–1293.

    Morris DR, Geballe AP. Upstream open reading frames as regulators of mRNA translation. Mol Cell Biol (2000) 20:8635–8642.

    Pesole G, Mignone F, Gissi C, Grillo G, Licciulli F, Liuni S. Structural and functional features of eukaryotic mRNA untranslated regions. Gene (2001) 276:73–81.

    Rodríguez-Trelles F, Tarrío R, Ayala FJ. The origins and evolution of spliceosomal introns. Annu Rev Genet (2006) 40:47–76.

    Rogozin IB, Sverdlov AV, Babenko VN, Koonin EV. Analysis of evolution of exon-intron structure of eukaryotic genes. Brief Bioinform (2005) 6:118–134.

    Rogozin IB, Wolf YI, Sorokin AV, Mirkin BG, Koonin EV. Remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution. Curr Biol (2003) 13:1512–1517.

    Roy SW, Gilbert W. Complex early genes. Proc Natl Acad Sci USA (2005) 102:1986–1991.

    Roy SW, Gilbert W. The evolution of spliceosomal introns: patterns, puzzles, and progress. Nat Rev Genet (2006) 7:211–221.

    Roy SW, Hartl DL. Very little intron loss/gain in Plasmodium: intron loss/gain mutation rates and intron number. Genome Res (2006) 16:750–756.

    Roy SW, Penny D. Large-scale intron conservation and order-of-magnitude variation in intron loss/gain rates in apicomplexan evolution. Genome Res (2006) 16:1270–1275.

    Shabalina SA, Ogurtsov AY, Rogozin IB, Koonin EV, Lipman DJ. Comparative analysis of orthologous eukaryotic mRNAs: potential hidden functional signals. Nucleic Acids Res (2004) 32:1774–1782.

    Slamovits CH, Keeling PJ. A high density of ancient spliceosomal introns in oxymonad excavates. BMC Evol Biol (2006) 6:34.

    Stajich JE, Dietrich FS. Evidence of mRNA-mediated intron loss in the human-pathogenic fungus Cryptococcus neoformans. Eukaryotic Cell (2006) 5:789–793.

    Stark A, Brennecke J, Russell RB, Cohen SM. Identification of Drosophila microRNA targets. PLoS Biol (2003) 1:E60.

    Vilela C, McCarthy JE. Regulation of fungal gene expression via short open reading frames in the mRNA 5'untranslated region. Mol Microbiol (2003) 49:859–867.

    Xu J, Vilgalys R, Mitchell TG. Multiple gene genealogies reveal recent dispersion and hybridization in the human pathogenic fungus Cryptococcus neoformans. Mol Ecol (2000) 9:1471–1481.

    Zhang Z, Dietrich FS. Identification and characterization of upstream open reading frames (uORF) in the 5' untranslated regions (UTR) of genes in Saccharomyces cerevisiae. Curr Genet (2005) 48:77–87.

Accepted for publication February 16, 2007.





