MBE Advance Access originally published online on March 8, 2007
Molecular Biology and Evolution 2007 24(7):1447-1457; doi:10.1093/molbev/msm048

© The Author 2007. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

A Very High Fraction of Unique Intron Positions in the Intron-Rich Diatom Thalassiosira pseudonana Indicates Widespread Intron Gain

Scott William Roy and David Penny

Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand

E-mail: scottwroy{at}gmail.com.


    Abstract
 
Although spliceosomal introns are present in all characterized eukaryotes, intron numbers vary dramatically, from only a handful in the entire genomes of some species to nearly 10 introns per gene on average in vertebrates. For all previously studied intron-rich species, significant fractions of intron positions are shared with other widely diverged eukaryotes, indicating that 1) large numbers of the introns date to much earlier stages of eukaryotic evolution and 2) these lineages have not passed through a very intron-poor stage since early eukaryotic evolution. By the same token, among species that have lost nearly all of their ancestral introns, no species is known to harbor large numbers of more recently gained introns. These observations are consistent with the notion that intron-dense genomes have arisen only once over the course of eukaryotic evolution. Here, we report an exception to this pattern, in the intron-rich diatom Thalassiosira pseudonana. Only 8.1% of studied T. pseudonana intron positions are conserved with any of a variety of divergent eukaryotic species. This implies that T. pseudonana has both 1) lost nearly all of the numerous introns present in the diatom–apicomplexan ancestor and 2) gained a large number of new introns since that time. In addition, that so few apparently inserted T. pseudonana introns match the positions of introns in other species implies that insertion of multiple introns into homologous genic sites in eukaryotic evolution is less common than previously estimated. These results suggest the possibility that intron-rich genomes may have arisen multiple times in evolution. These results also provide evidence that multiple intron insertion into the same site is rare, further supporting the notion that early eukaryotic ancestors were very intron rich.

Key Words: intron gain • genome evolution • eukaryotic evolution • phylogenetic reconstruction


    Introduction
 
Spliceosomal introns are sequences that interrupt coding sequences in eukaryotes and are removed from RNA transcripts by the spliceosome. Introns are specific and common to eukaryotes; however, genomic intron number varies by more than 4 orders of magnitude, from hundreds of thousands of introns per genome in many metazoans and plants to fewer than 100 introns in a wide variety of protists, algae, and fungi (compiled in Logsdon 1998; Jeffares et al. 2006; Roy and Gilbert 2006; Rodríguez-Trelles et al. 2006b). The timing, causes, and mechanisms of intron origin have long been matters of debate (e.g., Hickey and Benkel 1986; Stoltzfus 1994; Elder 1991, 2000; Giroux et al. 1994; Poole et al. 1998; Venkatesh et al. 1999; Lynch 2002; Fedorov et al. 2003; Mourier and Jeffares 2003; de Roos 2004; Fedorov and Fedorova 2004; Roy 2004; Sverdlov, Babenko, et al. 2004; Collins and Penny 2005; Niu et al. 2005; Lin and Zhong 2005; Perumal et al. 2005; Fedorov and Fedorova 2006; Knowles and McLysaght 2006; Logsdon 1991).

Recent genomic studies have shown 2 commonalities among diverse intron-rich species. First, in all studied intron-rich species (arbitrarily defined hereafter as at least 0.5 introns per gene on average), a significant fraction of intron positions match the position of introns in distantly related species (Fedorov et al. 2002; Rogozin et al. 2003). For instance, 60% of introns in the fission yeast Schizosaccharomyces pombe were found to match an intron position in one or more species out of 6 studied nonfungal species (Rogozin et al. 2003). Around a quarter of intron positions in conserved coding regions are conserved between the flowering plant Arabidopsis thaliana and humans (Rogozin et al. 2003). This pattern indicates that at least a significant fraction of introns in these species are retained from much earlier stages of eukaryotic evolution. Second, genome-wide comparisons of closely related species in a wide variety of intron-rich lineages have shown a surprising dearth of recent intron gains (Roy et al. 2003; Nielsen et al. 2004; Stajich and Dietrich 2006; Roy and Hartl 2006; Lin et al. 2006; Roy and Penny 2006a, 2006b, 2007; Roy et al. 2006). Rates of intron gain in the past tens to hundreds of millions of years in mammals, Plasmodium, Theileria, basidiomycete and euascomycete fungi, Entamoeba, and land plants have been very low, equating to less than 1 intron gain per gene per 2 billion years. Two apparent exceptions involve nematodes and Oikopleura urochordates. In both taxa, large numbers of lineage-specific intron positions suggest higher average rates of intron gain over deeper evolutionary distances (i.e., >100 MYA); however, both lineages retain at least around 15% of their ancestral introns, and rates of gain in the past 100 Myr of Caenorhabditis also appear to be low (Seo et al. 2001; Guiliano et al. 2002; Edvardsen et al. 2004; Coghlan and Wolfe 2004; Raible et al. 2005; Roy and Penny 2006b).

Recent studies have also confirmed that all known extremely intron-poor lineages have arisen by massive loss of relatively large numbers of ancestral introns. For example, a large fraction of S. pombe’s roughly 6,000 total introns were present in the fungal ancestor (because 60% of introns in conserved regions share positions with introns in nonfungi), implying that Saccharomyces cerevisiae’s much smaller number of introns (~300 in the entire genome) reflects massive intron loss (Rogozin et al. 2003). Similar conclusions can be drawn for all other known extremely intron-poor lineages. Recently, Slamovits and Keeling (2006) confirmed that some species of excavates, a diverse and potentially very early branching group of eukaryotes that includes a variety of extremely intron-poor lineages, contain large numbers of ancestral intron positions, indicating massive (and likely recurrent) intron loss in very intron-poor excavates (Archibald et al. 2002). Thus, 1) all previously studied intron-rich eukaryotes retain significant numbers of ancestral introns and thus have never gone through a nearly intronless stage, whereas 2) all species that have lost nearly all of their ancestral introns have not replaced these ancestral introns with more recently gained introns but instead remain nearly intronless.

Here, we report an exception to this pattern in the diatom Thalassiosira pseudonana (Armbrust et al. 2004). Thalassiosira pseudonana genes contain an average of 1.4 introns per gene, of which only around 8.1% were found to match intron positions in any of a variety of distantly related eukaryotes. Thalassiosira pseudonana has thus lost the vast majority of the numerous introns present in the chromalveolate ancestor and has also gained a large number of introns. These results have important implications for the ongoing debate on the relative importance of intron gain and loss (Rogozin et al. 2003, 2005; Babenko et al. 2004; Qiu et al. 2004; Csurös et al. 2005; Nguyen et al. 2005; Roy and Gilbert 2005a, 2005b), and support the notion that early eukaryotic ancestors were very intron rich, with intron losses outnumbering intron gains through subsequent evolution over a wide variety of eukaryotic lineages (Roy and Gilbert 2005a, 2005b).


    Materials and Methods
 
Gene Sequences and Intron Positions
Predicted coding sequences and corresponding intron positions for full genome sequences of the studied chromalveolates were downloaded from the following sources: T. pseudonana (http://www.plasmodb.org/plasmo/home.jsp; version 1), Plasmodium falciparum (PlasmoDB.org; version 2), Theileria parva (National Center for Biotechnology Information [NCBI]; accession number AAGK01000001.1), Toxoplasma gondii (http://www.toxodb.org/toxo/home.jsp; version 4.0).

Intron Prediction in T. pseudonana
We studied the predicted introns in the T. pseudonana genome. In all, 98.7% of introns begin with the canonical GT (and 99.2% begin with G[T/C]); 99.5% end with the canonical AG. However, we found 2 patterns suggesting that some introns in T. pseudonana are in fact coding sequences that have incorrectly been predicted to be intronic. First, there is a pronounced excess of introns that are a multiple of 3 bases (3n; 9,571 total) versus introns that are one more than a multiple of 3 bases (3n + 1; 3,037) or 2 more than a multiple of 3 bases (3n + 2; 3,029). Second, a much larger fraction of 3n introns (75.2%) lack in-frame stop codons than of 3n + 1 (29.1%) or 3n + 2 (28.6%) introns.

We can make separate estimates of the number of predicted introns that are instead protein encoding (i.e., not spliced out) from these 2 observations. First, assuming that there should be equal numbers of introns of the 3 different classes (an assumption which works very well for 3n + 1 and 3n + 2, which have 3,037 and 3,029 predicted introns, respectively), there are 6,540 "too many" 3n introns. Second, assuming that equal fractions of introns in the 3 classes should lack stop codons (which again works well for 3n + 1 and 3n + 2 introns, which have 29.1% and 28.4% stop-lacking introns, respectively), we estimate based on the 2,368 stop-containing 3n introns that there should be 972 stop-lacking introns, 6,233 fewer than currently predicted. The similarity of these 2 estimates suggests that around 6,200 to 6,600 (~90%) of the 7,205 predicted 3n in-frame stop codon–lacking introns are in fact coding sequence. We therefore excluded all 3n introns that lack in-frame stop codons from further consideration.
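The two estimates above can be retraced with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the variable names are illustrative, and it assumes the 3n + 1 stop-lacking fraction (29.1%) is the reference value used for the second estimate.

```python
# Intron counts by length class, as reported in the text.
n_3n, n_3n1, n_3n2 = 9571, 3037, 3029

# Estimate 1: if the three length classes were equally common, the 3n class
# should be about as large as the average of the other two classes.
expected_3n = (n_3n1 + n_3n2) / 2
excess_by_length = n_3n - expected_3n

# Estimate 2: if 3n introns lacked in-frame stops at the same rate as
# 3n + 1 introns (29.1%; assumed reference value), the 2,368 stop-containing
# 3n introns imply how many stop-lacking 3n introns to expect.
stop_containing_3n = 2368
predicted_stop_lacking_3n = 7205
frac_lacking = 0.291
expected_stop_lacking = stop_containing_3n * frac_lacking / (1 - frac_lacking)
excess_by_stops = predicted_stop_lacking_3n - expected_stop_lacking

print(f"excess 3n introns (length classes): {excess_by_length:.0f}")
print(f"expected stop-lacking 3n introns:   {expected_stop_lacking:.0f}")
print(f"excess stop-lacking 3n introns:     {excess_by_stops:.0f}")
```

The first figure lands close to the 6,540 quoted above and the second close to 6,233, which is why the two approaches are described as giving similar answers.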

Fortunately, problems with intron prediction are unlikely to yield gene structures in which flanking coding sequences on both sides of the intron show strong sequence similarity to corresponding sequences in orthologous genes because incorrect intron prediction will lead to frameshifts and/or large alignment gaps (e.g., Roy et al. 2003; Roy and Hartl 2006). Indeed, exclusion of the excess 3n introns did not substantially change the total numbers of introns in conserved regions for the studies reported here or cause large changes in the results. In particular, the lack of gaps in nearby coding sequence underscores that predicted intron positions in conserved regions are true intron positions: for each of the 3 pairwise comparisons with apicomplexan species, more than 80% of species-specific intron positions in conserved regions were in alignment regions with no gap within 5 codons.

Comparison with Apicomplexans
For each apicomplexan species, BlastP searches against T. pseudonana yielded best reciprocal hits. Each protein sequence pair was then aligned in ClustalW using default parameters and intron positions mapped onto resultant alignments. We defined "conserved regions" of the alignment as at least 33% amino acid identity over windows of 25 amino acid alignment positions (including gaps) both upstream and downstream of the intron position, as previously described (Roy and Hartl 2006).
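The conserved-region criterion can be sketched as a small filter over a pairwise alignment. This is an illustrative reading of the definition rather than the authors' implementation: the function names are hypothetical, gap columns are counted as non-identical window positions, and intron positions closer than one full window to either end of the alignment are rejected (an assumption the text does not address).

```python
def window_identity(seq_a, seq_b, start, end):
    """Fraction of alignment columns in [start, end) with identical residues.

    Gap columns ('-') count toward the window length but never as matches.
    """
    columns = list(zip(seq_a[start:end], seq_b[start:end]))
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return matches / len(columns)


def in_conserved_region(seq_a, seq_b, intron_col, window=25, min_identity=0.33):
    """True if both flanking windows of an intron position pass the threshold.

    intron_col is the alignment column immediately downstream of the intron.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if intron_col - window < 0 or intron_col + window > len(seq_a):
        return False  # assumed handling: not enough flanking alignment
    upstream = window_identity(seq_a, seq_b, intron_col - window, intron_col)
    downstream = window_identity(seq_a, seq_b, intron_col, intron_col + window)
    return upstream >= min_identity and downstream >= min_identity


# Toy alignment: identical upstream of column 30, unrelated downstream.
a = "A" * 60
b = "A" * 30 + "C" * 30
print(in_conserved_region(a, a, 30))  # both flanks identical
print(in_conserved_region(a, b, 30))  # downstream flank fails
```

Requiring both flanks to pass is what makes the filter robust to mispredicted introns, since a wrong splice boundary typically destroys similarity on at least one side.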

Comparison with 684 Eukaryotic Orthologous Groups
We downloaded intron positions and sequences from the 684 sets of orthologous groups (Rogozin et al. 2003) from the NCBI Web site (ftp.ncbi.nlm.nih.gov/pub/koonin/intron_evolution). For each of the 8 species from the original data set, we performed BlastP searches of the 684 proteins against the predicted T. pseudonana proteome. Putative T. pseudonana orthologs were assigned in cases in which the orthologous gene from all the 8 species had a best hit to the same T. pseudonana gene. This yielded 493 putative sets of orthologs. Protein sequences were then aligned in ClustalW using default parameters and intron positions were mapped onto the corresponding alignments. Following Rogozin et al. (2003), we excluded alignment positions that contained gaps in one or more species, as well as flanking alignment positions.
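The ortholog assignment rule (all 8 species must have the same best T. pseudonana hit) amounts to a unanimity check. The sketch below assumes the BlastP best hits have already been parsed into a dictionary; the function name, species labels, and gene identifiers are hypothetical placeholders, not names from the original data set.

```python
def assign_ortholog(best_hits, required_species):
    """Return the shared best-hit gene, or None if the species disagree.

    best_hits maps species name -> best-hit T. pseudonana gene ID
    (a missing species means no hit was found for that species).
    """
    hits = {best_hits.get(species) for species in required_species}
    if None in hits or len(hits) != 1:
        return None
    return hits.pop()


# Hypothetical example with three of the species for brevity.
species = ["H. sapiens", "A. thaliana", "S. pombe"]
unanimous = {"H. sapiens": "Tp_0001", "A. thaliana": "Tp_0001", "S. pombe": "Tp_0001"}
split = {"H. sapiens": "Tp_0001", "A. thaliana": "Tp_0002", "S. pombe": "Tp_0001"}

print(assign_ortholog(unanimous, species))  # Tp_0001
print(assign_ortholog(split, species))      # None
```

A strict unanimity rule trades coverage for confidence, which is consistent with 493 of the 684 sets receiving a putative ortholog.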

Estimation of Ancestral Intron Numbers
We downloaded Csurös’ intron evolution program from his Web site (www.iro.umontreal.ca/~csuros/introns/) and applied his method (Csurös 2005) as well as that of Roy and Gilbert (2005a, 2005b) to the 493 sets of eukaryotic orthologs among the 9 species.

Branch Length Estimation
Branch lengths for the eukaryote phylogeny were inferred in PAUP* 4.0b10 (Swofford 2002) under the minimum evolution criterion. The distance matrix was determined under the JTT model of protein evolution (Jones et al. 1992) in Splitstree 4.0 (Huson 1998) with gapped sites excluded. Importantly, the relative pattern of branch lengths among the taxa was either maintained or exaggerated when nonhomogeneous rate parameters (invariant sites and gamma distribution) were included, regardless of whether the tree topology was constrained or estimated from the data. This was also the case when various alternative models of protein evolution were used.


    Results
 
Phylogenetic Relationships of Species Studied
Figure 1 shows the most likely phylogenetic relationship between the species studied. It is unknown whether plants are more closely related to chromalveolates or to fungamals (fungi + animals) (Sogin 1991; Philippe et al. 2000; Stechmann and Cavalier-Smith 2002).


 
FIG. 1.— Likely phylogenetic relationships between species studied.

 
Pairwise Comparison of T. pseudonana to 3 Apicomplexan Species
We compared intron positions in conserved regions of alignment between T. pseudonana and available genomic sequences from 3 apicomplexans (the closest relatives to diatoms for which full genome sequences were available): P. falciparum, T. parva, and T. gondii. Only between 1.9% and 4.2% of T. pseudonana intron positions were shared with each species (table 1).



 
Table 1 Summary of Pairwise Comparisons between Thalassiosira pseudonana and 3 Apicomplexan Species

 
Analysis of 493 Sets of Eukaryotic Orthologs
We next studied 684 sets of eukaryotic orthologs between 8 eukaryotic species previously compiled by Rogozin et al. (2003). We identified putatively orthologous T. pseudonana genes for 493 of the 684 sets. There were 321 T. pseudonana introns in conserved regions of alignment. Only 26/321 T. pseudonana introns (8.1%) were found at positions shared with introns from any of the other 8 species. This is a much smaller fraction than for each of the other 8 species (fig. 2). For instance, a 4 times higher fraction of P. falciparum intron positions (31.6%, 92/291) were shared with another species (this distribution is different from the T. pseudonana distribution at the P = 8.9 × 10^-9 level by a Fisher Exact test). Figure 3 summarizes the pattern of intron position sharing with nonchromalveolates for T. pseudonana and P. falciparum.


 
FIG. 2.— Fraction of shared introns. For each species, the fraction of intron positions that are shared with a member of another group (chromalveolates, plants, and fungamals) is shown for 493 sets of eukaryotic orthologs.

 

 
FIG. 3.— Patterns of intron conservation for Plasmodium falciparum and Thalassiosira pseudonana in 493 sets of eukaryotic orthologs. Percentages show the fraction of P. falciparum/T. pseudonana intron positions that are shared with fungamals and/or plants.

 
Level of Intron Loss in T. pseudonana
Comparison of levels of intron loss/gain highlights the discrepancy between T. pseudonana and other lineages. For instance, of the 51 introns shared between P. falciparum and both fungamals and plants (and thus very likely to have been present in the T. pseudonana–P. falciparum ancestor), only 4 (7.8%) are shared with T. pseudonana; by contrast, more than a quarter (4/14) of T. pseudonana introns shared with both fungamals and plants are shared with P. falciparum (P = 8.0 × 10^-8 by a Fisher Exact test). Assuming a plant–chromalveolate sister relationship, A. thaliana and T. pseudonana are equally closely related to fungamals. However, A. thaliana introns are 4 times more likely to be shared with a fungamal species (32.7% in A. thaliana vs. 7.5% in T. pseudonana). Similarly, 51/80 P. falciparum introns shared with fungamals are present in A. thaliana, a fraction 13 times higher than the fraction shared with T. pseudonana, despite the fact that T. pseudonana is more closely related to P. falciparum.

Shared T. pseudonana Intron Positions Are Broadly Shared
Are the few introns that are shared between T. pseudonana and other eukaryotes ancestral? Intron positions common to T. pseudonana and other species could either be ancestral introns retained in T. pseudonana or could be introns that have been gained in T. pseudonana at positions corresponding to intron positions in the other species (parallel insertion). If the shared T. pseudonana introns are due to parallel insertion, they would not be expected to show any particular phylogenetic distribution. For instance, an otherwise Homo sapiens-specific intron position would be about as likely to be shared with T. pseudonana as would an intron position common to many species. If instead the shared T. pseudonana intron positions largely reflect ancestral introns, we would expect the intron positions to be broadly shared, reflecting their ancestral nature.

Among the 4,961 intron positions that are present in at least one of the nondiatom species, 704 (14.2%) are shared between 2 or more broad phylogenetic groups (plants, fungamals, and apicomplexans; table 2 and fig. 4). If the intron positions that are shared between T. pseudonana and other species were attributable to parallel intron gain, we would therefore expect 14.2% of these introns to be shared with at least 2 groups. Instead, fully half (13/26) of T. pseudonana introns that are shared with another species are shared with at least 2 groups (P = 1.7 × 10^-5 by a Fisher Exact test). This pattern is not expected if the shared intron positions between T. pseudonana and other species are due to secondary insertion in T. pseudonana, and it suggests that perhaps half or more of the shared T. pseudonana introns are retained ancestral introns.
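The comparison of 13/26 against 704/4,961 can be illustrated with a two-sided Fisher exact test written from the hypergeometric distribution. This is a sketch under stated assumptions: the exact 2 × 2 table the authors tested is not given, so the table below (13 broadly shared vs. 13 narrowly shared T. pseudonana introns, against 704 broadly shared vs. 4,257 other nondiatom positions) is one plausible reconstruction, and the helper names are illustrative.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def log_p(k):
        return log_comb(row1, k) + log_comb(row2, col1 - k) - log_comb(total, col1)

    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    log_observed = log_p(a)
    p = sum(
        math.exp(log_p(k))
        for k in range(k_min, k_max + 1)
        if log_p(k) <= log_observed + 1e-9  # tolerance for float rounding
    )
    return min(1.0, p)


# Assumed table: rows = shared with T. pseudonana / all nondiatom positions,
# columns = shared across >= 2 groups / not.
p_value = fisher_exact_two_sided(13, 13, 704, 4961 - 704)
print(f"P = {p_value:.2g}")
```

Under this reconstruction the P value should come out on the order of 10^-5, the same order as the value reported above; small differences would be expected if the authors tabulated the counts differently.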



 
Table 2 Thalassiosira pseudonana Intron Positions That Are Shared with At least 2 Broad Phylogenetic Groups (apicomplexans, plants, and fungamals)

 

 
FIG. 4.— Shared T. pseudonana intron positions are broadly shared. The pattern of sharing between eukaryotic groups for all introns (left) and for only those introns that are shared with T. pseudonana (right) are shown. For each set, the fraction of introns that are shared between multiple groups (excluding T. pseudonana) is given.

 
Further, the 13 T. pseudonana introns that are shared with multiple groups also show a very high degree of conservation across species within those groups. For instance, whereas only 19.2% (126/656) of all intron positions in the data set that are shared between H. sapiens and a plant/fungus/apicomplexan (and thus presumably present in the animal ancestor) are retained in Drosophila melanogaster, 6/11 of these introns that are also shared with T. pseudonana are retained in D. melanogaster (P = 0.010 by a Fisher Exact test). Whereas only 13.3% (90/675) of introns shared between an animal and A. thaliana and/or P. falciparum are retained in S. pombe, fully three-fifths (8/13) of these introns that are also shared with T. pseudonana are retained in S. pombe (P = 9.1 × 10^-5).

These patterns are consistent with these few ancestral introns having been selectively retained in T. pseudonana because they encode some important function or otherwise experience lower rates of intron loss. The 13 T. pseudonana introns that are shared with species from only one major phylogenetic group may be shared ancestral introns that have been lost from other species or may represent cases of parallel insertion.

Patterns of Shared T. pseudonana Intron Positions
The 13 intron positions shared broadly between T. pseudonana and other species are not randomly distributed across the 493 studied genes. Instead, 2 genes contain 2 of the broadly shared introns each (table 2, fig. 5). In a 3rd case, one of the broadly shared T. pseudonana intron positions is found in the same gene as one of the 13 T. pseudonana intron positions that is more narrowly shared (in this case between T. pseudonana and H. sapiens). This pattern is expected if T. pseudonana has largely retained the ancestral intron–exon structure in a small number of genes, but is not predicted from parallel insertion.


 
FIG. 5.— Two genes each contain 2 shared T. pseudonana introns. Intron positions are indicated by gray boxes. (A) KOG2031, small nuclear ribonucleoprotein F. (B) KOG4519, prefoldin.

 
Low Rate of Parallel Insertion in T. pseudonana
These data inform the more general debate about the incidence of parallel intron insertion (Tarrío et al. 2003; Qiu et al. 2004; Sadusky et al. 2004; Stoltzfus 2004; Csurös 2005; Nguyen et al. 2005; Sverdlov et al. 2005). Central to this debate has been analysis of 488 kb of conserved coding sequence in 684 sets of orthologous genes compiled by Rogozin et al. (2003). The most sophisticated estimates come from 2 independent analyses, which reconstructed intron loss and gain evolution in the data set under the assumption that there was a constant number of fixed possible intron insertion sites. Under this assumption, both analyses estimated that there were approximately 35,000–41,000 possible intron sites (Csurös 2005; Nguyen et al. 2005). Because there are 7,236 total intron positions in the data set, this would predict that approximately 18–21% (7,236 divided by the number of sites) are occupied by an intron in the data set. Therefore, we would expect that 18–21% of new intron insertions would match an intron position from a species in this data set. Instead, we found that only 8.1% of introns in orthologous genes from T. pseudonana match intron positions from another species, suggesting that parallel insertion is less frequent than the previous analyses estimated (Csurös 2005; Nguyen et al. 2005). Because some half of these introns are broadly shared intron positions that are likely ancestral (see above), it is very likely that the rate of parallel insertion is lower still. There are 13 shared T. pseudonana intron positions that are not broadly shared (and thus are stronger candidates for parallel insertions), and 295 that are specific to T. pseudonana. Allowing that all 13 of the not-broadly shared introns represent parallel insertions, this would imply that some 4.2% of new intron insertions match positions of other introns, some 5-fold lower than expected from the previous analyses. We note that these estimates are more in keeping with an earlier analysis by Sverdlov et al. (2005), which suggested that parallel insertion accounted for perhaps 5% of observed shared intron positions.
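The expected and observed rates of position matching in this argument follow from simple ratios. A minimal sketch, using only the counts quoted in the text (the variable names are illustrative):

```python
# Occupied intron positions in the 684-gene data set and the estimated
# range for the number of possible insertion sites (both from the text).
occupied_positions = 7236
sites_low, sites_high = 35_000, 41_000

# Expected fraction of new insertions that land on an occupied position,
# if insertions are spread over a fixed pool of possible sites.
expected_min = occupied_positions / sites_high
expected_max = occupied_positions / sites_low

# Observed: all shared T. pseudonana positions, then only the 13 narrowly
# shared positions against the 295 T. pseudonana-specific ones.
observed_all_shared = 26 / 321
observed_narrow_only = 13 / (13 + 295)

print(f"expected match rate:      {expected_min:.1%} to {expected_max:.1%}")
print(f"observed, all shared:     {observed_all_shared:.1%}")
print(f"observed, narrowly shared: {observed_narrow_only:.1%}")
```

The last ratio treats every narrowly shared position as a parallel insertion, so it is an upper bound under the reasoning given in the text.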

Phase of T. pseudonana Introns
Introns may fall in phase 0 (between 2 codons), phase 1 (between the 1st and 2nd bases of a codon), or phase 2 (between the 2nd and 3rd bases of a codon). A wide variety of studied species exhibit an excess of phase 0 introns over introns in the other 2 phases (Fedorov et al. 1992; Long et al. 1995; Ruvinsky et al. 2005). However, it is not known whether this is due to partial retention of a phase-biased ancestral set of introns or to phase-biased intron insertion (Dibb and Newman 1989; Fedorov et al. 1992; Long et al. 1995, 1998; Souza et al. 1998; Long and Rosenberg 2000; Paquette et al. 2000; Fedorov et al. 2001; Roy et al. 2001; Wolf et al. 2001; Lynch 2002; Fedorov et al. 2003; Coghlan and Wolfe 2004; Qiu et al. 2004; Sverdlov, Rogozin, et al. 2004; Roy and Gilbert 2005c; Ruvinsky et al. 2005; Vibranovski et al. 2005; Ruvinsky and Ward 2006; Nguyen et al. 2006). These questions relate to the fundamental questions about the ultimate origin of introns (e.g., Stoltzfus et al. 1994; Logsdon et al. 1995; Rzhetsky et al. 1997; Roy et al. 1999; Roy 2003; de Souza 2003; Elder 1991, 2000; de Roos 2004; Koonin 2006). Previous studies have shown that putatively recently gained introns also exhibit a bias toward insertion into phase 0; however, in many cases the methods by which these "recently inserted" introns were identified may be unable to confidently distinguish between recently gained and multiply lost introns (Krzywinski and Besansky 2002; Kiontke et al. 2004; Roy and Penny 2006b). Because the vast majority of T. pseudonana introns appear to have been gained since early eukaryotic evolution, the phase distribution of T. pseudonana introns is informative about patterns of intron insertion and the possibility of phase-biased intron gain.

Among the entire set of T. pseudonana genes, there were 3,394 (31.5%) phase 0 introns, 4,296 (39.9%) phase 1 introns, and 3,081 (28.6%) phase 2 introns (these are different at a P << 10^-5 level by a chi-square test). Thus, although T. pseudonana shows a phase bias, supporting the notion of biased intron insertion, the pattern does not follow the general eukaryotic pattern in which the largest number of introns are in phase 0. Nonetheless, these data support the notion that introns may insert differentially into the 3 phases, in turn supporting the notion that the observed phase 0 bias in other species could largely or completely reflect insertional biases.
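The phase counts can be checked against a uniform expectation with a chi-square goodness-of-fit test. The sketch below assumes the null hypothesis is equal numbers of introns in the 3 phases, which the text does not state explicitly; with 2 degrees of freedom the chi-square tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

# Intron counts by phase, as reported in the text.
phase_counts = {0: 3394, 1: 4296, 2: 3081}

total = sum(phase_counts.values())
expected = total / len(phase_counts)  # assumed null: equal counts per phase

# Pearson chi-square statistic against the uniform expectation.
chi2 = sum((obs - expected) ** 2 / expected for obs in phase_counts.values())

# For 2 degrees of freedom, P(X >= x) = exp(-x / 2) exactly.
p_value = math.exp(-chi2 / 2)

for phase, count in phase_counts.items():
    print(f"phase {phase}: {count} ({count / total:.1%})")
print(f"chi-square = {chi2:.1f}, P = {p_value:.2g}")
```

Under this null the statistic is a little over 220, giving a P value far below the 10^-5 threshold quoted above.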

Rates of Sequence and Intron Evolution
What explains the higher rates of intron loss and gain evolution in the T. pseudonana lineage? One possibility is that high rates of intron loss/gain in T. pseudonana simply reflect a generally higher rate of molecular evolution (a possibility suggested by an anonymous reviewer). We investigated this possibility by looking at 2 sources of data. First, we used ungapped positions in the 493 sets of orthologs among 9 species. Using standard models of protein sequence evolution, we estimated branch lengths (see Supplementary Material online). The estimated length of the branch leading to P. falciparum was roughly 1.5 times longer than that leading to T. pseudonana. Thus, if these estimates are accurate, rates of protein evolution in the lineage leading to T. pseudonana have been slower than in that leading to P. falciparum, not faster.

However, branch length estimation for such deeply diverged species with such sparse sampling of taxa poses a host of difficulties. Therefore, we undertook a survey of a variety of previous phylogenetic studies that included both taxa, analyzing branch lengths from published trees (e.g., Budin and Philippe 1998; Saldarriaga et al. 2001; Bapteste et al. 2002; Fast et al. 2002; Kuvardina et al. 2002; Harper et al. 2005). This literature survey showed no clear bias toward longer branch length in either apicomplexans or stramenopiles, even in instances where the much greater sampling density of stramenopiles is expected to lead to artificially longer estimated branch lengths. Thus, although the long evolutionary distances involved still make it difficult to make conclusive statements about relative sequence rate evolution, there does not seem to be evidence that the rapid rate of intron loss/gain evolution in T. pseudonana corresponds to a generally rapid rate of sequence evolution.

In general, we see no a priori reason that rates of intron loss/gain evolution should correlate with rates of protein evolution. The 2 types of evolution are dependent on entirely different sets of mutations (recombination with reverse-transcribed mRNAs for intron loss and possibly intron gain, possibly repetitive element insertion for intron gain, vs. point mutation for sequence evolution) and are likely to be governed by fundamentally different selective forces. Nonetheless, previous results have found that metazoan lineages that have experienced high rates of intron loss have also experienced high rates of protein sequence change (Raible et al. 2005). In addition, many of the eukaryotic lineages that were originally placed at the base of the eukaryotic tree, apparently in part due to long-branch attraction arising from high rates of sequence change along these branches, also have experienced high levels of intron loss (see, for instance, Logsdon 1998). The correspondence of sequence evolution to other types of genome evolution including intron loss/gain clearly requires further exploration.


    Discussion
 
We show that the vast majority of introns in T. pseudonana have been gained since early eukaryotic evolution, unlike previously studied species. In all previously studied intron-rich eukaryotes, significant fractions of the intron positions are shared with distantly related species, implying that most of these introns have been retained from early eukaryotic ancestors (Rogozin et al. 2003; Slamovits and Keeling 2006). Here, we show that the diatom T. pseudonana constitutes a deviation from this pattern. In pairwise comparisons, no more than 4.2% of T. pseudonana’s introns were shared with each of 3 other chromalveolate species, the apicomplexans P. falciparum, T. parva, and T. gondii. Analysis of 493 sets of eukaryotic orthologs among plant, animal, fungal, and apicomplexan species further showed that only 8.1% of T. pseudonana introns in conserved coding regions are shared with any of the other species. Of these, half are broadly shared across widely diverged eukaryotic groups, suggesting that they are retained ancestral introns.

Large-Scale Intron Loss and Gain in T. pseudonana
Both T. pseudonana and P. falciparum belong to the broad phylogenetic group chromalveolates. In all, 31.3% of P. falciparum’s intron positions in conserved regions of 493 sets of putatively orthologous genes are shared with a nonchromalveolate species; most of these introns were presumably present in the chromalveolate ancestor. Of these, only 5% (5/91) are also shared with T. pseudonana; thus, T. pseudonana has apparently lost the vast majority of the ancestral chromalveolate introns.

On the other hand, only 8.1% of T. pseudonana introns are shared with any of the other species. That 31.3% of P. falciparum intron positions are shared with a nonchromalveolate implies that a significant number of introns present in the chromalveolate ancestor have both 1) been retained in P. falciparum (or more generally in apicomplexans) and 2) remained represented in nonchromalveolates. If a significant fraction of T. pseudonana introns had been present in the chromalveolate ancestor, we would therefore expect a significant fraction to be shared with both P. falciparum and the studied nonchromalveolate species. That only a small fraction of T. pseudonana introns are shared with nondiatoms thus strongly suggests that most have been gained since the chromalveolate ancestor. Thus, the current data indicate that T. pseudonana has experienced a large amount of both intron loss and gain since the chromalveolate ancestor.

Transition from Intron-Poor to Intron-Rich Genomes?
All previously studied intron-rich eukaryotic genomes share significant fractions of their intron positions with widely diverged species. This implies both that a large fraction of their introns are retained from early eukaryotic ancestors and, thus, that these lineages have never passed through a very intron-poor phase. Similarly, all known very intron-poor eukaryotic species are now known to be related to much more intron-rich taxa (which have retained large numbers of ancestral introns) and are thus very intron poor due to nearly complete secondary loss of ancestral introns. These previous results imply that among previously studied taxa, "intron richness" arose only once (though subsequent fluctuation in intron number within intron-rich lineages is possible).

The current results open up the possibility of a 2nd emergence of an intron-rich genome from an ancestrally intron-poor genome. There are 2 possibilities for the history of intron number in the T. pseudonana lineage (fig. 6). The lineage could have remained relatively intron rich since the chromalveolate ancestor, with ongoing intron loss and gain gradually replacing the ancestral introns with more recently gained introns. Alternatively, the lineage could have experienced massive loss of ancestral introns leading to a nearly intronless genome (as implied for various known nearly intronless modern taxa [Rogozin et al. 2003]) with subsequent widespread intron insertion leading to modern intron richness. Further study of additional diatom species (some of which have ongoing genome projects) should help to answer this question.


 
FIG. 6.— Two possible evolutionary histories for T. pseudonana. The near-complete divergence of intron positions in T. pseudonana could be explained either by ongoing intron loss and gain leading to nearly complete turnover of introns (dashed trace) or by nearly complete loss of introns followed by subsequent intron gain (solid trace).

 
It is indeed striking that all previously studied lineages that have lost all of their ancestral introns remain very intron poor, suggesting that in most cases the transition from an intron-rich to an intron-poor genome is not easily reversible. Why should this be? One possibility is that large-scale intron loss permits changes in the spliceosome that prevent it from recognizing new gene-interrupting insertions. In this case, the reduced spliceosomal flexibility would cause intron-poor taxa to experience low levels of successful intron insertion and thus to remain intron poor.

Intron-Rich Ancestral Eukaryotes
Previous analyses of the Rogozin et al. data set have disagreed as to the ancestral density of introns in early eukaryotic ancestors and as to the relative importance of subsequent intron loss and gain (reviewed in Rodríguez-Trelles et al. 2006b). Roy and Gilbert (2005a, 2005b) estimated that the fungamal ancestor and the plant–fungamal ancestor had nearly as many introns as the most intron-rich modern species and that subsequent eukaryotic evolution has been dominated by an excess of intron loss over intron gain. By contrast, Csurös (2005) and Nguyen et al. (2005) independently concluded that ancestral intron densities were more modest and that the history of intron number had been more mixed, with some lineages experiencing dramatic increases in intron number (more gain than loss) while others experienced equally dramatic decreases (more loss than gain).

The difference in conclusions between these analyses reflects their different assumptions about the frequency of parallel insertion. Roy and Gilbert (2005a, 2005b) assumed that all intron positions that were shared between species reflected ancestral introns (essentially assuming that parallel insertions were infrequent enough to be ignored), whereas Csurös (2005) as well as Nguyen et al. (2005) assumed that there were a certain number of possible intron insertion sites, which (presumably to make the problem more tractable) were assumed to be constant through evolution. The various impacts of the models’ assumptions have not been thoroughly tested.

These results provide an opportunity to put an upper limit on the importance of parallel intron gain in intron evolution. In all, 4.2% (13/307) of putative intron insertions in T. pseudonana coincide with intron positions from another species. Following the previous models in assuming a constant number of possible intron insertion sites, this would suggest that the 7,236 intron positions represented across the 8 studied species (excluding T. pseudonana) occupy around 4.2% of the possible sites, suggesting that there are roughly 171,000 "possible intron insertion sites" (= 307/13 × 7,236) in the 488 kb of studied sequence. Even assuming that all 26 shared T. pseudonana introns (26/320) represent parallel insertions suggests roughly 89,000 sites (= 320/26 × 7,236), still more than twice the previous estimates.
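The site-count arithmetic above can be reproduced with a short sketch. It assumes, as the constant-site models do, that each insertion lands uniformly at random among a fixed pool of possible sites, so the fraction of insertions that coincide with known positions estimates the fraction of the pool those positions occupy. The function and variable names are illustrative and are not taken from the original analysis.

```python
def estimate_possible_sites(known_positions, insertions, coincident):
    """Estimate the size of the pool of possible intron sites.

    Assumes uniform insertion over a fixed pool, so that
    coincident / insertions approximates known_positions / pool_size.
    """
    if coincident <= 0:
        raise ValueError("need at least one coincident insertion")
    return known_positions * insertions / coincident


# Values quoted in the text; the names are illustrative.
KNOWN_POSITIONS = 7236  # intron positions across the 8 other species

# 13 of 307 putative insertions coincide with a known position.
sites_if_13_parallel = estimate_possible_sites(KNOWN_POSITIONS, 307, 13)

# Treating all 26 of 320 shared introns as parallel insertions.
sites_if_26_parallel = estimate_possible_sites(KNOWN_POSITIONS, 320, 26)

print(f"{sites_if_13_parallel:,.0f}")  # roughly 171,000
print(f"{sites_if_26_parallel:,.0f}")  # roughly 89,000
```

Both figures depend entirely on the uniform, constant-site assumption; if insertion sites are unevenly favored, the same counts would imply a different pool size.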

Underestimating the total number of sites will lead to overestimation of the probability that 2 inserted introns will occupy the same site (parallel insertion), presumably leading to underestimation of the number of truly ancestral introns. We used these directly estimated numbers of "possible intron sites" to reconstruct intron evolution of the 684 groups of eukaryotic orthologs using the method of Csurös. The results are shown in figure 7. For each ancestral node, the estimated number of ancestral introns is much closer to that previously estimated by Roy and Gilbert (2005b) than to that estimated by Csurös (2005) or Nguyen et al. (2005). This suggests that the methods of Csurös and Nguyen et al. may have greatly overestimated the importance of parallel insertion, and that Roy and Gilbert's method, which ignores parallel insertion, may give more accurate estimates.
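To illustrate why the assumed number of sites matters, the sketch below computes the chance that a single new insertion lands on an already occupied site under the uniform, constant-site assumption discussed above. The smallest pool size is a hypothetical value chosen only for illustration; it is not an estimate from any of the cited studies.

```python
def parallel_insertion_probability(occupied_sites, possible_sites):
    """Chance that one uniformly placed insertion hits an occupied site."""
    if possible_sites <= 0 or not 0 <= occupied_sites <= possible_sites:
        raise ValueError("require 0 <= occupied_sites <= possible_sites, pool > 0")
    return occupied_sites / possible_sites


OCCUPIED = 7236  # intron positions across the 8 other species (from the text)

pool_sizes = {
    "13 parallel (text)": 171_000,
    "26 parallel (text)": 89_000,
    "hypothetical smaller pool": 40_000,  # illustrative only, not from the source
}

probabilities = {
    label: parallel_insertion_probability(OCCUPIED, size)
    for label, size in pool_sizes.items()
}

for label, p in probabilities.items():
    print(f"{label}: {p:.1%}")
```

Under this simple model the probability scales inversely with the pool size, so halving the assumed number of sites doubles the expected share of coincident insertions.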


 
FIG. 7.— Estimated ancestral intron densities (introns per kilobase) in 684 sets of orthologous genes. "Current" values were estimated using Csurös’ method with estimated numbers of "possible intron sites," assuming that either 13 or 26 of the shared T. pseudonana intron positions represent parallel insertions, as explained in the text. "Previous" values give corresponding previous estimates by Roy and Gilbert (2005a; RG), Csurös (2005), and Dollo parsimony (as used by Rogozin et al. 2003). Bold text indicates the previous estimate that is closest to the current estimates.

 

    Conclusions
 
We show that, unlike in all previously studied intron-rich eukaryotic species, the vast majority of introns in the intron-rich diatom T. pseudonana are found at unique positions, suggesting that T. pseudonana has lost nearly all of its ancestral introns and gained a large number of new introns. These results open up the possibility that genomes rich in spliceosomal introns may have arisen multiple times in eukaryotic evolution; however, further study of other diatom species will be necessary. The finding of a very low level of parallel insertion supports the notion that early eukaryotic ancestors were very intron rich and that most (or even all) eukaryotic lineages have since experienced a reduction in intron number.


    Supplementary Material
 
Supplementary material is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Acknowledgements
 
We thank Manuel Irimia for helpful discussions and constructive comments, Jason Stajich for help with the ancestral intron density estimates, and Matt Phillips for phylogenetic analyses, branch length estimation, and general wise pronouncements.


    Footnotes
 
Arndt von Haeseler, Associate Editor


    References
 

    Archibald JM, O'Kelly CJ, Doolittle WF. The chaperonin genes of jakobid and jakobid-like flagellates: implications for eukaryotic evolution. Mol Biol Evol. (2002) 19:422–431.

    Armbrust EV, Berges JA, Bowler C, et al. (45 co-authors). The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism. Science (2004) 306:79–86.

    Babenko VN, Rogozin IB, Mekhedov SL, Koonin EV. Prevalence of intron gain over intron loss in the evolution of paralogous gene families. Nucleic Acids Res. (2004) 32:3724–3733.

    Bapteste E, Brinkmann H, Lee JA, et al. (11 co-authors). The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc Natl Acad Sci USA (2002) 99:1414–1419.

    Budin K, Philippe H. New insights into the phylogeny of eukaryotes based on ciliate hsp70 sequences. Mol Biol Evol. (1998) 15:943–956.

    Coghlan A, Wolfe K. Origins of recently gained introns in Caenorhabditis. Proc Natl Acad Sci USA (2004) 101:11362–11367.

    Collins L, Penny D. Complex spliceosomal organization ancestral to extant eukaryotes. Mol Biol Evol. (2005) 22:1053–1066.

    Csurös M. Likely scenarios of intron evolution. In Comparative Genomics, Vol. 3678 of LNBI. McLysaght A, Huson DH, editors; Heidelberg, Germany: Springer-Verlag p. 47–60 (2005).

    de Roos ADG. Origins of introns based on the definition of exon modules and their conserved interfaces. Bioinformatics (2004) 21:2–9.

    de Souza SJ. The emergence of a synthetic theory of intron evolution. Genetica (2003) 118:117–121.

    de Souza SJ, Long M, Klein RJ, Roy S, Lin S, Gilbert W. Toward a resolution of the introns early/late debate: only phase zero introns are correlated with the structure of ancient proteins. Proc Natl Acad Sci USA (1998) 95:5094–5099.

    Dibb NJ, Newman AJ. Evidence that introns arose at protosplice sites. EMBO J (1989) 8:2015–2021.

    Edvardsen RB, Lerat E, Maeland AD, Flat M, Tewari R, Jensen MF, Lehrach H, Reinhardt R, Seo HC, Chourrout D. Hypervariable and highly divergent intron/exon organizations in the chordate Oikopleura dioica. J Mol Evol. (2004) 59:448–457.

    Elder D. Evolution of split genes. J Theor Biol. (1991) 152:427–428.

    Elder D. Split gene origin and periodic introns. J Theor Biol. (2000) 207:455–472.

    Fast NM, Xue L, Bingham S, Keeling PJ. Re-examining alveolate evolution using multiple protein molecular phylogenies. J Eukaryot Microbiol. (2002) 49:30–37.

    Fedorov A, Cao X, Saxonov S, de Souza SJ, Roy SW, Gilbert W. Intron distribution difference for 276 ancient and 131 modern genes suggests the existence of ancient introns. Proc Natl Acad Sci USA (2001) 98:13177–13182.

    Fedorov A, Fedorova L. Introns: mighty elements from the RNA world. J Mol Evol. (2004) 59:718–721.

    Fedorov A, Fedorova L. Where is the difference between the genomes of humans and annelids? Genome Biol. (2006) 7:203.

    Fedorov A, Merican AF, Gilbert W. Large-scale comparison of intron positions among animal, plant, and fungal genes. Proc Natl Acad Sci USA (2002) 99:16128–16133.

    Fedorov A, Roy S, Fedorova L, Gilbert W. Mystery of intron gain. Genome Res. (2003) 13:2236–2241.

    Fedorov A, Suboch G, Bujakov M, Fedorova L. Analysis of nonuniformity in intron phase distribution. Nucleic Acids Res. (1992) 20:2553–2557.

    Giroux M, Clancy M, Baier J, Ingham L, McCarty D, Hannah L. De novo synthesis of an intron by the maize transposable element dissociation. Proc Natl Acad Sci USA (1994) 91:12150–12154.

    Guiliano D, Hall N, Jones S, Clark L, Corton C, Barrell B, Blaxter M. Conservation of long-range synteny and microsynteny between the genomes of two distantly related nematodes. Genome Biol. (2002) 3:RESEARCH0057.

    Harper JT, Waanders E, Keeling PJ. On the monophyly of chromalveolates using a six-protein phylogeny of eukaryotes. Int J Syst Evol Microbiol. (2005) 55:487–496.

    Hickey D, Benkel B. Introns as relict retrotransposons: implications for the evolutionary origin of eukaryotic mRNA splicing mechanisms. J Theor Biol. (1986) 121:283–291.

    Huson D. Splitstree—a program for analyzing and visualizing evolutionary data. Bioinformatics (1998) 14:68–73.

    Jeffares DC, Mourier T, Penny D. The biology of intron gain and loss. Trends Genet. (2006) 22:16–22.

    Jones DT, Taylor WR, Thornton JM. Rapid generation of mutation data matrices from protein sequences. Bioinformatics (1992) 3:275–282.

    Kiontke K, Gavin NP, Raynes Y, Roehrig C, Piano F, Fitch DHA. Caenorhabditis phylogeny predicts convergence of hermaphroditism and extensive intron loss. Proc Natl Acad Sci USA (2004) 101:9003–9008.

    Koonin EV. The origin of introns and their role in eukaryogenesis: a compromise solution to the introns-early versus introns-late debate? Biol Direct (2006) 1:22.

    Knowles DG, McLysaght A. High rate of recent intron gain and loss in simultaneously duplicated Arabidopsis genes. Mol Biol Evol. (2006) 23:1548–1557.

    Krzywinski J, Besansky NJ. Frequent intron loss in the white gene: a cautionary tale for phylogeneticists. Mol Biol Evol. (2002) 19:362–366.

    Kuvardina ON, Leander BS, Aleshin VV, Myl'nikov AP, Keeling PJ, Simdyanov TG. The phylogeny of colpodellids (Alveolata) using small subunit rRNA gene sequences suggests they are the free-living sister group to apicomplexans. J Eukaryot Microbiol. (2002) 49:498–504.

    Lin H, Zhu W, Silva J, Gu X, Buell CR. Intron gain and loss in segmentally duplicated genes in rice. Genome Biol. (2006) 7:R41.

    Lin K, Zhong DY. The excess of 5' introns in eukaryotic genomes. Nucleic Acids Res. (2005) 33:6522–6527.

    Logsdon JM Jr. The recent origins of spliceosomal introns revisited. Curr Opin Genet Dev. (1998) 8:637–648.

    Logsdon JM Jr, Tyshenko MG, Dixon C, D-Jafari J, Walker VK, Palmer JD. Seven newly discovered intron positions in the triose-phosphate isomerase gene: evidence for the introns-late theory. Proc Natl Acad Sci USA (1995) 92:8507–8511.

    Long M, de Souza SJ, Rosenberg C, Gilbert W. Relationship between "proto-splice sites" and intron phases: evidence from dicodon analysis. Proc Natl Acad Sci USA (1998) 95:219–223.

    Long M, Rosenberg C. Testing the "proto-splice sites" model of intron origin: evidence from analysis of intron phase correlations. Mol Biol Evol. (2000) 17:1789–1796.

    Long M, Rosenberg C, Gilbert W. Intron phase correlations and the evolution of the intron/exon structure of genes. Proc Natl Acad Sci USA (1995) 92:12495–12499.

    Lynch M. Intron evolution as a population-genetic process. Proc Natl Acad Sci USA (2002) 99:6118–6123.

    Mourier T, Jeffares DC. Eukaryotic intron loss. Science (2003) 300:1393.

    Nguyen HD, Yoshihama M, Kenmochi N. New maximum likelihood estimators for eukaryotic intron evolution. PLoS Comput Biol (2005) 1:e79.

    Nguyen HD, Yoshihama M, Kenmochi N. Phase distribution of spliceosomal introns: implications for intron origin. BMC Evol Biol. (2006) 6:69.

    Nielsen C, Friedman B, Birren B, Burge C, Galagan J. Patterns of intron gain and loss in fungi. PLoS Biol (2004) 2:e422.

    Niu DK, Hou WR, Li SW. mRNA-mediated intron losses: evidence from extraordinarily large exons. Mol Biol Evol. (2005) 22:1475–1481.

    Palmer JD, Logsdon JM. The recent origin of introns. Curr Opin Genet Dev. (1991) 1:470–477.

    Paquette SM, Bak S, Feyereisen R. Intron-exon organization and phylogeny in a large superfamily, the paralogous cytochrome P450 genes of Arabidopsis thaliana. DNA Cell Biol. (2000) 19:307–317.

    Perumal BS, Sakharkar KR, Chow VT, Pandjassarame K, Sakharkar MK. Intron position conservation across eukaryotic lineages in tubulin genes. Front Biosci (2005) 10:2412–2419.

    Philippe H, Lopez P, Brinkmann H, Budin K, Germot A, Laurent J, Moreira D, Müller M, Le Guyader H. Early-branching or fast-evolving eukaryotes? An answer based on slowly evolving positions. Proc R Soc Lond B Biol Sci. (2000) 267:1213–1221.

    Poole AM, Jeffares DC, Penny D. The path from the RNA world. J Mol Evol. (1998) 46:1–17.

    Qiu WG, Schisler N, Stoltzfus A. The evolutionary gain of spliceosomal introns: sequences and phase preferences. Mol Biol Evol. (2004) 21:1252–1263.

    Raible F, Tessmar-Raible K, Osoegawa K, et al. (12 co-authors). Vertebrate-type intron-rich genes in the marine annelid Platynereis dumerilii. Science (2005) 310:1325–1326.

    Rodríguez-Trelles F, Tarrío R, Ayala FJ. Models of spliceosomal intron proliferation in the face of widespread ectopic expression. Gene (2006a) 366:201–208.

    Rodríguez-Trelles F, Tarrío R, Ayala FJ. Origin and evolution of spliceosomal introns. Annu Rev Genet. (2006b) 40:47–76.

    Rogozin IB, Wolf YI, Sorokin AV, Mirkin BG, Koonin EV. Remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution. Curr Biol. (2003) 13:1512–1517.

    Rogozin IB, Sverdlov AV, Babenko VN, Koonin EV. Analysis of evolution of exon-intron structure of eukaryotic genes. Brief Bioinform (2005) 6:118–134.

    Roy SW. Recent evidence for the exon theory of genes. Genetica (2003) 118:251–266.

    Roy SW. The origin of recent introns: transposons? Genome Biol. (2004) 5:251.

    Roy SW, Fedorov A, Gilbert W. Large-scale comparison of intron positions in mammalian genes shows intron loss but no gain. Proc Natl Acad Sci USA (2003) 100:7158–7162.

    Roy SW, Gilbert W. Complex early genes. Proc Natl Acad Sci USA (2005a) 102:1986–1991.

    Roy SW, Gilbert W. Rates of intron loss and gain: implications for early eukaryotic evolution. Proc Natl Acad Sci USA (2005b) 102:5773–5778.

    Roy SW, Gilbert W. The pattern of intron loss. Proc Natl Acad Sci USA (2005c) 102:713–718.

    Roy SW, Gilbert W. The evolution of spliceosomal introns: patterns, puzzles and progress. Nat Rev Genet. (2006) 7:211–221.

    Roy SW, Hartl DL. Very little intron loss/gain in Plasmodium: intron loss/gain mutation rates and intron number. Genome Res. (2006) 16:750–756.

    Roy SW, Irimia M, Penny D. Very little intron gain in Entamoeba histolytica genes laterally transferred from prokaryotes. Mol Biol Evol. (2006) 23:1824–1827.

    Roy SW, Lewis BP, Fedorov A, Gilbert W. Footprints of primordial introns on the eukaryotic genome. Trends Genet. (2001) 17:496–499.

    Roy SW, Nosaka M, de Souza SJ, Gilbert W. Centripetal modules and ancient introns. Gene (1999) 238:85–91.

    Roy SW, Penny D. Large-scale intron conservation and order-of-magnitude variation in intron loss/gain rates in apicomplexan evolution. Genome Res. (2006a) 16:1270–1275.

    Roy SW, Penny D. Smoke without fire: most reported cases of intron gain in nematodes instead reflect intron losses. Mol Biol Evol. (2006b) 23:2259–2262.

    Roy SW, Penny D. Patterns of intron loss and gain in plants: intron loss-dominated evolution and genome-wide comparison of O. sativa and A. thaliana. Mol Biol Evol. (2007) 24:171–181.

    Ruvinsky A, Eskesen ST, Eskesen FN, Hurst LD. Can codon usage bias explain intron phase distributions and exon symmetry? J Mol Evol. (2005) 60:99–104.

    Ruvinsky A, Ward W. A gradient in the distribution of introns in eukaryotic genes. J Mol Evol. (2006) 63:136–141.

    Rzhetsky A, Ayala FJ, Hsu LC, Chang C, Yoshida A. Exon/intron structure of aldehyde dehydrogenase genes supports the "introns-late" theory. Proc Natl Acad Sci USA (1997) 94:6820–6825.

    Sadusky T, Newman AJ, Dibb NJ. Exon junction sequences as cryptic splice sites: implications for intron origin. Curr Biol. (2004) 14:505–509.

    Saldarriaga JF, Taylor FJR, Keeling PJ, Cavalier-Smith T. Dinoflagellate nuclear SSU rRNA phylogeny suggests multiple plastid losses and replacements. J Mol Evol. (2001) 53:204–213.

    Seo H-C, Kube M, Edvardsen RB, et al. (11 co-authors). Miniature genome in the marine chordate Oikopleura dioica. Science (2001) 294:2506.

    Slamovits CH, Keeling PJ. A high density of ancient spliceosomal introns in oxymonad excavates. BMC Evol Biol. (2006) 6:34.

    Sogin ML. Early evolution and the origin of eukaryotes. Curr Opin Genet Dev. (1991) 1:457–463.

    Stajich JE, Dietrich FS. Evidence of mRNA-mediated intron loss in the human-pathogenic fungus Cryptococcus neoformans. Eukaryot Cell. (2006) 5:789–793.

    Stechmann A, Cavalier-Smith T. Rooting the eukaryote tree by using a derived gene fusion. Science (2002) 297:89–91.

    Stoltzfus A. Origin of introns—early or late? Nature (1994) 369:526–527.

    Stoltzfus A. Molecular evolution: introns fall into place. Curr Biol. (2004) 14:R351–R352.

    Stoltzfus A, Spencer DF, Zuker M, Logsdon JM, Doolittle WF. Testing the exon theory of genes: the evidence from protein structure. Science (1994) 265:202–207.

    Sverdlov AV, Babenko VN, Rogozin IB, Koonin EV. Preferential loss and gain of introns in 3' portions of genes suggests a reverse-transcription mechanism of intron insertion. Gene (2004) 338:85–91.

    Sverdlov AV, Rogozin IB, Babenko VN, Koonin EV. Reconstruction of ancestral protosplice sites. Curr Biol. (2004) 14:1505–1508.

    Sverdlov AV, Rogozin IB, Babenko VN, Koonin EV. Conservation versus parallel gains in intron evolution. Nucleic Acids Res. (2005) 33:1741–1748.

    Swofford DL. PAUP*: phylogenetic analysis using parsimony (* and other methods). Version 4.0b8. (2002) Sunderland (MA): Sinauer Associates.

    Tarrío R, Rodríguez-Trelles F, Ayala FJ. A new Drosophila spliceosomal intron position is common in plants. Proc Natl Acad Sci USA (2003) 50:123–130.

    Venkatesh B, Ning Y, Brenner S. Late changes in spliceosomal introns define clades in vertebrate evolution. Proc Natl Acad Sci USA (1999) 96:10267–10271.

    Vibranovski MD, Sakabe NJ, de Oliveira RS, de Souza SJ. Signs of ancient and modern exon-shuffling are correlated to the distribution of ancient and modern domains along proteins. J Mol Evol. (2005) 61:341–350.

    Wolf YI, Kondrashov FA, Koonin EV. Footprints of primordial introns on the eukaryotic genome: still no clear traces. Trends Genet. (2001) 17:499–501.

    Yoshihama M, Nakao A, Nguyen HD, Kenmochi N. Analysis of ribosomal protein gene structures: implications for intron evolution. PLoS Genet. (2006) 2:e25.

Accepted for publication February 26, 2007.

