MBE Advance Access originally published online on March 1, 2007
Molecular Biology and Evolution 2007 24(5):1093-1096; doi:10.1093/molbev/msm037
Letters
The Evolution of Spliceosomal Introns in Alveolates
Frontier Science Research Center, University of Miyazaki, Kiyotake, Miyazaki, Japan
E-mail: kenmochi{at}med.miyazaki-u.ac.jp.
| Abstract |
|---|
Many issues concerning the evolution of spliceosomal introns remain poorly understood. In this respect, the reconstruction of the evolution of introns in deep branching species such as alveolates is of special significance. In this study, we inferred the intron evolution in alveolates using 3,368 intron positions in 162 orthologs from 10 species (9 alveolates and 1 outgroup, Homo sapiens). We found that although very few intron gains and losses have occurred in Theileria and Plasmodium recently, many intron gains and losses have occurred in the evolution of alveolates. Thus, the rates of intron gain and loss in alveolates have varied greatly across time and lineage. Our results seem to support the notion that massive intron gains and losses have occurred during short episodes, perhaps coinciding with major evolutionary events.
Key Words: intron evolution; intron gain; intron loss; spliceosomal intron; alveolate evolution; apicomplexan evolution
| Introduction |
|---|
Eukaryotic genes are often interrupted by extra DNA sequences, which must be spliced out of the pre-mRNA by the spliceosome. These extra sequences are called spliceosomal introns (or introns for short) as opposed to exons, the parts of genes that encode proteins. Many issues concerning the evolution of spliceosomal introns, for example, the intron density of the last eukaryotic common ancestor, the evolutionary forces underlying intron evolution, and the evolutionary significance of introns, remain poorly understood. Reconstruction of intron evolution in deep branching species, such as in the alveolate kingdom, is of special interest for the clarification of these issues.
Most studies about intron evolution have so far been limited to the crown group (i.e., containing metazoans, fungi, and plants) (Rogozin et al. 2003; Roy et al. 2003; Nielsen et al. 2004; Qiu et al. 2004; Yoshihama et al. 2007). Two recent studies focused on alveolates, but only at a relatively late stage (Roy and Hartl 2006; Roy and Penny 2006). These studies showed that Plasmodium and Theileria underwent very low rates of intron gain and loss during the last 100 Myr. The evolution of introns during the early stage of these 2 lineages as well as in other alveolates remains unknown.
This study aims to reconstruct the dynamics of intron gain and loss throughout alveolate evolution. We compiled 162 gene orthologs from 9 alveolates and an outgroup, Homo sapiens. The 9 alveolates are: Tetrahymena thermophila, Perkinsus marinus, Cryptosporidium parvum, Toxoplasma gondii, Babesia bigemina, Theileria parva, Theileria annulata, Plasmodium falciparum, and Plasmodium vivax. Figure 1 shows the evolution of 3,368 intron positions in 162,939 bp of alignment regions. The intron density in the last common ancestor of alveolates is about one third of that in humans and is roughly the same as that in T. thermophila. Since then, the evolutionary paths leading to P. marinus, T. gondii, and Theileria were rich in intron gains, whereas those leading to C. parvum, B. bigemina, and Plasmodium were rich in intron losses. The total number of intron losses slightly outnumbered that of intron gains (2,766 vs. 2,242).
Introns are distributed differently within genes depending on the intron richness of their genomes. In intron-poor genomes, introns are often located within the 5' portions of genes, whereas they are evenly distributed in intron-rich genomes. The mRNA-mediated intron loss has been suggested to be the cause of this pattern (Mourier and Jeffares 2003
On the one hand, our results show that very few intron gains and losses have occurred recently in Plasmodium and Theileria. Out of the 3,368 intron positions we investigated, we found no intron gains or losses in either P. vivax or P. falciparum since they diverged from each other. Similarly, we found only one intron gain and one intron loss in T. parva and no intron gains or losses in T. annulata since they diverged from each other. These results are consistent with previous results obtained using a greater number of gene orthologs but fewer species (Roy and Hartl 2006; Roy and Penny 2006), suggesting that trends of intron gain and loss can be revealed by using a moderate number of gene orthologs.
On the other hand, our results also show that many intron gains and losses occurred at early periods during the evolutionary paths leading to Plasmodium and Theileria. For example, many intron gains occurred during the period from the time of divergence of P. marinus to the apicomplexa ancestor, whereas many intron losses occurred during the following period. In addition, the evolutionary path leading to P. marinus was very rich in intron gains, whereas the evolutionary path leading to C. parvum was very rich in intron losses. Thus, the rates of intron gain and loss in alveolates have varied greatly across time and lineage (fig. 2 and Supplementary Material 2, Supplementary Material online).
Although certain other considerations, such as saturation of sequence changes and the amino acid substitution model may affect the estimated times of divergence (Roger and Hug 2006
32,000) taken from highly conserved sequence alignments weakens the potential impact of these considerations. Secondly, the variations in estimated times of divergence are quite small (often less than 2-fold; Roger and Hug 2006
Our method (Nguyen et al. 2005) inferred that 269 (22%) of the 1,217 introns in H. sapiens are shared with 471 introns in the alveolata ancestor. Thus, using parsimony, at least 22% of the introns in H. sapiens must date back to the last common ancestor of H. sapiens with alveolates. Assuming an intron-free ancestral genome at the time of mitochondrial endosymbiosis, one or several large-scale intron gain events must have occurred during the period between the time of endosymbiosis and the emergence of the last common ancestor of metazoans and alveolates. One such event may have happened right after endosymbiosis and may have been the cause of nucleus-cytosol compartmentalization, a major step in the emergence of the first eukaryote (Koonin 2006; Martin and Koonin 2006). Because none of the sequenced genomes that diverged before the last common ancestor of metazoans and alveolates has so far been shown to be intron rich, other large-scale intron insertion events may have also occurred during the period between the last eukaryotic common ancestor and the last common ancestor of metazoans and alveolates.
The study of intron gains and losses in paralogous gene families from the crown group has suggested that large-scale intron gains and losses may have occurred during transitional evolutionary epochs (Babenko et al. 2004). In this study, we have shown further that the evolution of introns in an early branching kingdom, the alveolates, also follows this pattern. The large-scale episodic occurrence of intron gain and loss can be accounted for by 2 models. The neutralist model proposes that transitional evolutionary episodes were often associated with population bottlenecks, which would have weakened purifying selection and facilitated the fixation of otherwise deleterious mutations, such as intron gains and losses (Lynch and Conery 2003; Babenko et al. 2004). Alternatively, large-scale intron gains and losses may be associated with occasional invasions of transposable elements, whose reverse transcriptases may have determined the rates of intron gain and loss (Roy and Hartl 2006). The rigorous testing of these models will be the subject of future research.
| Methods |
|---|
Compilation of the Data Set
Supplementary Material 3 (Supplementary Material online) shows the sources of the genome sequences and annotations of the 10 species studied. We used the same method for collecting gene orthologs as published elsewhere (Nguyen et al. 2006
Because there were no annotations for P. marinus and B. bigemina, the protein sequences of T. annulata were used as inputs for the Blast program (Altschul et al. 1997) to find the DNA regions in these 2 genomes that best matched those of T. annulata. The regions were then extracted and their gene structures were manually constructed based on sequence similarity with T. annulata and other species. In the end, we were able to obtain 162 (full or partial) orthologs from all 10 species (Supplementary Materials 4 and 5, Supplementary Material online).
Construction of Phylogenetic Trees and Intron Evolution
Multiple sequence alignments of each of these orthologs were built using MUSCLE (Edgar 2004). An ad hoc program was written in C to map the intron positions on these alignments and extract an intron presence/absence matrix of all intron positions in the conserved regions. We also examined each alignment manually to add other intron positions, which were not in the conserved regions and clearly not misaligned, to the intron presence/absence matrix (Supplementary Material 6, Supplementary Material online). All conserved regions having a length
50 amino acids were extracted and concatenated together, and the SEQBOOT (100 replicates), PROTDIST, NEIGHBOR, and CONSENSE programs of the PHYLIP package (Felsenstein and Churchill 1996) were used to build a phylogenetic tree for the species (Supplementary Material 7, Supplementary Material online). Finally, the intron presence/absence matrix and the phylogenetic tree (with H. sapiens as the outgroup) were used as inputs for our maximum likelihood method (Nguyen et al. 2005) to infer intron evolution.
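The mapping step described above can be illustrated with a minimal Python sketch. It is not the authors' C program: the function names, the input layout (aligned protein strings plus intron positions given as 0-based residue index and phase), and the 0/1 encoding are all assumptions made for illustration, and the sketch does not restrict sites to conserved regions.

```python
from __future__ import annotations


def residue_to_column(aligned_seq: str) -> list[int]:
    """Return, for each ungapped residue index, its alignment column."""
    return [col for col, ch in enumerate(aligned_seq) if ch != "-"]


def presence_absence(
    alignment: dict[str, str],
    introns: dict[str, set[tuple[int, int]]],
) -> tuple[list[tuple[int, int]], dict[str, list[int]]]:
    """Build an intron presence/absence matrix (hypothetical layout).

    alignment: species -> aligned protein sequence ('-' marks a gap)
    introns:   species -> set of (residue index, phase) intron positions
    Returns the sorted list of sites as (alignment column, phase) and,
    per species, a 0/1 row over those sites.
    """
    mapped: dict[str, set[tuple[int, int]]] = {}
    for species, seq in alignment.items():
        cols = residue_to_column(seq)
        # Translate each intron from sequence coordinates to alignment columns.
        mapped[species] = {
            (cols[residue], phase) for residue, phase in introns.get(species, set())
        }
    sites = sorted(set().union(*mapped.values()))
    matrix = {
        species: [1 if site in present else 0 for site in sites]
        for species, present in mapped.items()
    }
    return sites, matrix
```

Two introns count as the same site here only when they fall in the same alignment column with the same phase; a real analysis would also need a rule for species that have a gap at a given column.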
Estimation of the Times of Divergence
A simple algorithm was used to construct a linearized tree from the phylogenetic tree. Let $h_X$ be the height of node $X$ and $l_{XY}$ be the length of branch $XY$. In our algorithm, the tree is traversed in postorder so that at each internal node $T$, the heights of its 2 child nodes $U$ and $V$ are already known. The algorithm first computes $h_1 = h_U + l_{TU}$ and $h_2 = h_V + l_{TV}$. If $h_1 \geq h_2$, $h_T$ is assigned the value of $h_1$ and the heights of all nodes on the subtree rooted at $V$ are scaled up by the ratio $h_1/h_2$. Otherwise, the roles of $U$ and $V$ are exchanged when computing $h_T$. The advantage of our algorithm as compared with the one proposed by Takezaki et al. (1995) is that none of the branches on the linearized tree will have a length of zero. Finally, we calibrated the times of divergence using the assumption that the ciliate T. thermophila branched off the tree 800 MYA (Douzery et al. 2004).
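The linearization procedure described above can be sketched in Python as follows. This is an illustrative reading of the text, not the authors' code: the `Node` class, the function names, and the choice to scale the branch leading to V together with its subtree (so that both paths from T reach the same height) are assumptions, as is the requirement that branch lengths be positive.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    length: float = 0.0  # length of the branch leading to the parent
    children: list[Node] = field(default_factory=list)
    height: float = 0.0  # distance from this node down to its leaves


def scale_subtree(node: Node, factor: float) -> None:
    """Multiply the node's parent branch, its height, and everything below by factor."""
    node.length *= factor
    node.height *= factor
    for child in node.children:
        scale_subtree(child, factor)


def linearize(t: Node) -> Node:
    """Postorder linearization of a binary tree, following the text above."""
    if not t.children:
        t.height = 0.0
        return t
    u, v = t.children
    linearize(u)
    linearize(v)
    h1 = u.height + u.length
    h2 = v.height + v.length
    if h1 < h2:  # exchange the roles of U and V
        u, v, h1, h2 = v, u, h2, h1
    t.height = h1
    if h2 > 0:
        scale_subtree(v, h1 / h2)  # stretch the shorter side up to h1
    else:
        v.length = h1  # degenerate case: zero-length path to a leaf
    return t


def calibrate(root: Node, node: Node, age: float) -> None:
    """Rescale the whole linearized tree so that node.height equals age."""
    scale_subtree(root, age / node.height)
```

For the tree ((A:1,B:3)X:1,C:2), `linearize` stretches the branch to A to length 3 and the branch to C to length 4, so every leaf ends up at distance 4 from the root. A call such as `calibrate(root, x, 800.0)`, where `x` is the chosen calibration node, then converts heights to time units.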
| Supplementary Materials |
|---|
Supplementary Materials are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
The authors would like to thank Dr Tetsuo Hashimoto for useful comments. This study was supported by Grants-in-Aid for Scientific Research (14035103, 15310135, 188093, and 17770207) from the Ministry of Education, Culture, Sports, Science and Technology and the Japan Society for the Promotion of Science (JSPS). H.D.N. is a research fellow of the JSPS (17-05174). Preliminary sequence data of Toxoplasma gondii and Plasmodium vivax, Perkinsus marinus, and Babesia bigemina were obtained from ToxoDB (http://www.toxoDB.org), PlasmoDB (http://www.plasmoDB.org), the Institute for Genomic Research (http://www.tigr.org/tdb/e2k1/pmg/), and the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/Projects/B_bigemina/), respectively. Sequencing of T. gondii, P. marinus, and B. bigemina is funded by the National Institute of Allergy and Infectious Diseases, the National Science Foundation, and the Wellcome Trust Sanger Institute, respectively. Sequencing of P. vivax is funded by the National Institute of Allergy and Infectious Diseases, the US Department of Defense, and the Burroughs Wellcome Fund.
| Footnotes |
|---|
Takashi Gojobori, Associate Editor
| References |
|---|
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 25:3389–3402.
Babenko VN, Rogozin IB, Mekhedov SL, Koonin EV. Prevalence of intron gain over intron loss in the evolution of paralogous gene families. Nucleic Acids Res (2004) 32:3724–3733.
Douzery EJP, Snell EA, Bapteste E, Delsuc F, Philippe H. The timing of eukaryotic evolution: does a relaxed molecular clock reconcile proteins and fossils? Proc Natl Acad Sci USA (2004) 101:15386–15391.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res (2004) 32:1792–1797.
Felsenstein J, Churchill GA. A hidden Markov model approach to variation among sites in rate of evolution. Mol Biol Evol (1996) 13:93–104.
Koonin EV. The origin of introns and their role in eukaryogenesis: a compromise solution to the introns-early versus introns-late debate? Biol Direct (2006) 1:22.
Lynch M, Conery JS. The origin of genome complexity. Science (2003) 302:1401–1404.
Martin W, Koonin EV. Introns and the origin of nucleus-cytosol compartmentalization. Nature (2006) 440:41–45.
Mourier T, Jeffares DC. Eukaryotic intron loss. Science (2003) 300:1393.
Nguyen HD, Yoshihama M, Kenmochi N. New maximum likelihood estimators for eukaryotic intron evolution. PLoS Comput Biol (2005) 1:e79.
Nguyen HD, Yoshihama M, Kenmochi N. Phase distribution of spliceosomal introns: implications for intron origin. BMC Evol Biol (2006) 6:69.
Nielsen CB, Friedman B, Birren B, Burge CB, Galagan JE. Patterns of intron gain and loss in fungi. PLoS Biol (2004) 2:e422.
Qiu WG, Schisler N, Stoltzfus A. The evolutionary gain of spliceosomal introns: sequence and phase preferences. Mol Biol Evol (2004) 21:1252–1263.
Roger AJ, Hug LA. The origin and diversification of eukaryotes: problems with molecular phylogenetics and molecular clock estimation. Philos Trans R Soc Lond B Biol Sci (2006) 361:1039–1054.
Rogozin IB, Wolf YI, Sorokin AV, Mirkin BG, Koonin EV. Remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution. Curr Biol (2003) 13:1512–1517.
Roy SW, Fedorov A, Gilbert W. Large-scale comparison of intron positions in mammalian genes shows intron loss but no gain. Proc Natl Acad Sci USA (2003) 100:7158–7162.
Roy SW, Hartl DL. Very little intron loss/gain in Plasmodium: intron loss/gain mutation rates and intron number. Genome Res (2006) 16:750–756.
Roy SW, Penny D. Large-scale intron conservation and order-of-magnitude variation in intron loss/gain rates in apicomplexan evolution. Genome Res (2006) 16:1270–1275.
Sverdlov AV, Babenko VN, Rogozin IB, Koonin EV. Preferential loss and gain of introns in 3' portions of genes suggests a reverse-transcription mechanism of intron insertion. Gene (2004) 338:85–91.
Takezaki N, Rzhetsky A, Nei M. Phylogenetic test of the molecular clock and linearized trees. Mol Biol Evol (1995) 12:823–833.
Yoshihama M, Nakao A, Nguyen HD, Kenmochi N. Analysis of ribosomal protein gene structures: implications for intron evolution. PLoS Genet (2006) 2:e25.
Yoshihama M, Nguyen HD, Kenmochi N. Intron dynamics in ribosomal protein genes. PLoS ONE (2007) 2:e141.