MBE Advance Access originally published online on May 23, 2006
Molecular Biology and Evolution 2006 23(8):1548-1557; doi:10.1093/molbev/msl017
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Research Article |
High Rate of Recent Intron Gain and Loss in Simultaneously Duplicated Arabidopsis Genes
Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin, Ireland
E-mail: aoife.mclysaght{at}tcd.ie.
| Abstract |
|---|
|
|
|---|
We examined the gene structure of a set of 2563 Arabidopsis thaliana paralogous pairs that were duplicated simultaneously 2060 MYA by tetraploidy. Out of a total of 23,164 introns in these genes, we found that 10,004 pairs have been conserved and 578 introns have been inserted or deleted in the time since the duplication event. This intron insertion/deletion rate of 2.7 x 103 to 9.1 x 104 per site per million years is high in comparison to previous studies. At least 56 introns were gained and 39 lost based on parsimony analysis of the phylogenetic distribution of these introns. We found weak evidence that genes undergoing intron gain and loss are biased with respect to gene ontology terms. Gene pairs that experienced at least 2 intron insertions or deletions show evidence of enrichment for membrane location and transport and transporter activity function. We do not find any relationship of intron flux to expression level or G + C content of the gene. Detection of a bias in the location of intron gains and losses within a gene depends on the method of measurement: an intragene method indicates that events (specifically intron losses) are biased toward the 3' end of the gene. Despite the relatively recent acquisition of these introns, we found only one case where we could identify the mechanism of intron originthe TOUCH3 gene has experienced 2 tandem, partial, internal gene duplications that duplicated a preexisting intron and also created a novel, alternatively spliced intron that makes use of a duplicated pair of cryptic splice sites.
Key Words: intron gain and loss Arabidopsis thaliana intron evolution genome evolution
| Introduction |
|---|
|
|
|---|
The origin and evolution of introns in eukaryotic genomes has been hotly debated for many years. Central to these arguments is the question of how abundant intron gains and losses are. The evolution of introns is influenced by both mutation bias and selection. Intron length and intron number often appear to be affected independently. Mutation biases may cause positional biases of introns within a gene (Mourier and Jeffares [2003]
Most previous studies of intron gain and loss have focused on identifying the prototypic gene structure in early eukaryotes and have thus examined this phenomenon in very distantly related eukaryotic genomes (some recent examples of large-scale studies include: Rogozin et al. 2003
; Qiu et al. 2004
; Rogozin et al. 2005
; Roy and Gilbert 2005a
). These broadly similar studies have returned strikingly different conclusions from an intron-rich ancestor with a preponderance of intron loss (Roy and Gilbert 2005a
) to a less intron-dense animalplant ancestor, with gains outnumbering losses (Rogozin et al. 2003
). The different outcomes are probably due to differing assumptions about the properties of intron gain sites (Nguyen et al. 2005
; Rogozin et al. 2005
) or to different patterns of evolution in different lineages (Nielsen et al. 2004
; Roy and Gilbert 2006
). Indeed, a reanalysis of the Rogozin et al. (2003)
data by Roy and Gilbert (2005b)
using maximum likelihood methods instead of parsimony concluded that intron loss, and not gain, had dominated their evolution. Recently, Roy and Gilbert (2005a)
estimated the rate of intron loss and gain to be 2 x 103 to 2 x 104 per million years and 6 x 107 to 4 x 106 per site per million years, respectively, based on comparisons across diverse eukaryotic lineages.
Studies of intron gain and loss in more recently diverged genomes include mammals (Roy et al. 2003
), Caenorhabditis (Coghlan and Wolfe 2004
), and fungi (Nielsen et al. 2004
). In a comparison of human and rodent introns, Roy et al. (2003)
uncovered only loss events. The Coghlan and Wolfe (2004)
study searched only for gain events and found evidence for 122 newly inserted introns that originated in the 80110 Myr that separate Caenorhabditis elegans and Caenorhabditis briggsae. Nielsen et al. (2004)
examined the patterns of intron evolution in fungi and uncovered a combination of intron loss and gain events.
The genome of the model plant Arabidopsis thaliana provides an ideal data set for examining intron gain and loss. Mounting evidence supports the occurrence of at least one, and likely multiple, whole-genome duplication events in the Arabidopsis lineage (Arabidopsis Genome Initiative 2000
; Blanc et al. 2000
; Paterson et al. 2000
; Vision et al. 2000
; Simillion et al. 2002
; Blanc et al. 2003
; Bowers et al. 2003
; Blanc and Wolfe 2004
). The most recent of these genome duplication events is the most unequivocal, having generated a set of large "blocks" of duplicated genes that cover almost the entire genome with no overlap between blocks (Blanc et al. 2003
; Bowers et al. 2003
). Subsequent to the whole-genome duplication, many duplicated genes were lost and only approximately 2000 genes remain in duplicate today. The genes retained in duplicate are not a random sample of all genes and are biased for genes with a function in transcriptional regulation (Seoighe and Gehring 2004
).
Here we analyze the set of paralogous pairs of genes generated by this recent genome duplication for evidence of intron gain and loss in the period since the duplication event. These genes were all duplicated simultaneously and by the same mechanism. At the time of duplication, both paralogs had identical gene structures. This is not necessarily the case for paralogs that have duplicated by other means, for example, retrocopied genes (which are generated by the reverse transcription of mRNA and insertion of the cDNA into the genome) are usually completely devoid of introns at the time of duplication. We estimate the rates of intron loss and gain and test for a relationship with other properties of the genes concerned, such as expression level, G + C content, intragenic location, and function.
| Methods |
|---|
|
|
|---|
Duplicated Arabidopsis Genes
The sequences of the set of genes duplicated in the most recent whole-genome duplication as described by Blanc et al. (2003)
Identification of Nonconserved Introns
A total of 2563 Arabidopsis paralogs generated by a recent whole-genome duplication were aligned at the protein level using T-Coffee version 1.32 with default parameters (Notredame et al. 2000
). For each of the pair, we identified the positions in the alignment corresponding to the intron splice site locations of each of the 23,164 introns in these genes.
The quality of the alignment around the intron splice site was evaluated by examining 10 alignment positions on each side of the splice site following the methods of Coghlan and Wolfe (2004)
. An unambiguous alignment region was defined as one with at least 5 conserved amino acids and no alignment gaps in the 10 alignment positions on each side of the splice site (20 positions in total). An intron was conserved if the location and phase were identical in the alignment of the two paralogs and if there were no other introns within 5 amino acids of this position on either side. An intron was classified as nonconserved if there was no intron in the paralog in an identical position or within 5 amino acids in the alignment. Cases where the alignment was ambiguous, intron location but not phase was conserved, or where there was another intron within 5 amino acids on either side of the splice site were ambiguous and were excluded from further analysis.
Detection and Alignment of Plant Homologous Sequences
All Arabidopsis sequence pairs with at least one nonconserved intron were used as queries in a Blast search against genomic DNA of the Viridiplantae division of GenBank. The database was searched using TblastN with an expectation (e) value threshold of 1 x 104 and only retaining hits with an e value within a range of 1 x 105 from the top non-A. thaliana hit. Some of the retrieved hits were very long (e.g., entire chromosomes) which may feasibly contain more than one genuine homolog. For each hit, all of the high scoring pairs with e values below the threshold were selected.
The Blast search returned many sequence fragments that did not align with the whole Arabidopsis gene or with the region surrounding the intron. These short fragments negatively affect the quality of the sequence alignment produced by automated methods. We implemented an iterative protocol to remove poorly aligned sequences and sequences that did not span the area of interest (i.e., the region of the intron) as follows. Retrieved similar sequences were initially aligned to the already aligned Arabidopsis pair using T-Coffee. In the first iteration, retrieved sequences that did not have at least 5 aligned bases in 30 bp on either side of the intron splice site were removed, and the remaining sequences were realigned. The resulting alignment was reexamined, sequences with fewer than 10 aligned bases within 30 bp of the intron were removed and the sequences were realigned. In the third iteration, only sequences with 15 aligned bases within 30 bp of the intron position were retained. In each case, the 30-bp window is offset by 10 bp on each side to avoid the immediate region of the splice site that has a tendency to align poorly if there is an intron in one of the sequences (i.e., the splice site region aligns completely to one side or the other and not partially on each side of the intron as would be expected). A final alignment was produced for each of the nonconserved Arabidopsis introns and remaining homologous sequences.
Identification of Homologous Introns
An homologous intron was identified from an alignment as a stretch of at least 40 bp aligned between the 6 and +6 Arabidopsis intron splice site nucleotides and aligned with gaps in the Arabidopsis gene lacking the intron, that is, requiring that the intron is at least 28 bp long. We required that 10 base pairs on both sides of the splice site region (from 15 to 5 and from +5 to +15) should be aligned without gaps in order to unambiguously declare the presence or absence of a homologous intron. If one of the aligned sequences had gaps in this region, it was removed from the alignment. These "cleaned" alignments were used to construct a neighbor-joining tree for each nonconserved intron with ClustalW (Thompson et al. 1994
) using Kimura's correction for multiple hits and ignoring positions with gaps.
Similarity of Introns and Other Regions of the Arabidopsis Genome
We used Blast to search with the sequence of all nonconserved introns against the genome of A. thaliana without filtering low complexity regions and with an e value threshold of 1. In order to recover any hits that might be missed by the Blast method, we also used SSearch with the threshold set to 0.1 and default parameters (Pearson 1996
). We discarded the self-hits and those hits with a length of less than 50% of the query sequence, this removed most of the hits due to repeats in the sequence. In order to remove those hits that were due to a large-scale duplication (whole-gene duplication or segmental genome duplication), we removed hits where the similarity extended for long regions outside the intron sequence.
Expression Level Differences Between Genes with Gained and Lost Introns
Affymetrix data from 11 microarrays corresponding to expression levels in leaf (3), stem (4), and root (4) for growth in two different conditionsgreenhouse and growth chamberwere downloaded from the Geo Web site (http://www.ncbi.nlm.nih.gov/projects/geo). All the genes for which we had expression data were classified into 10 equal-sized expression categories. The data from the same tissue in the equal growth conditions were pooled before analysis.
Using only those genes in which all introns had been classified as gained, lost, or conserved, we examined if those genes with gained or lost introns were more abundant in certain expression categories using a chi-squared test.
Distribution of Gains and Losses Along the Coding Sequence
Intergene Method
All introns were classified into 10 different location categories according to their relative position along the cds of the gene. Category 1 indicates that the intron was in a position between 0% and 10% of the length of the gene, category 2 indicates that the intron was in a position 1120% along the gene, etc. We examined if there was any significant difference in the distribution along the coding sequence between gained, lost, and conserved introns using a chi-squared test.
Intragene Method
The intragenic location of intron gains and losses was also examined on a per gene basis as per the method of Lin and Zhang (2005)
. Each gene was classified as one of unbiased, 5' biased, or 3' biased based on the relative number of introns in the 5' or the 3' half of the gene. The null expectation is that the number of genes with a 5' bias should equal the number with a 3' bias, and this was tested using a chi-squared test. This was done separately for all nonconserved introns (in 486 genes), for gained introns, and for lost introns.
Examination of Functional Bias in Gene with Nonconserved Introns
GOslim annotation data for the genes in the Arabidopsis genome were downloaded from The Arabidopsis Information Resource (TAIR) Web site (Berardini et al. 2004
) on 10 December 2005. Each gene pair was assigned the combined GOslim terms of each of its genes.
For the purposes of this analysis, we excluded 636 gene pairs that contained no nonconserved introns and at least one ambiguous intron because we cannot be sure whether these are cases of conserved or nonconserved gene structure. This resulted in a set of 1927 gene pairs which we could definitively say did or did not experience an intron indel.
The expected frequencies of GOslim terms among the 281 gene pairs with at least one nonconserved intron were determined using simulations. We randomly sampled 281 gene pairs from the 1927 paralogous genes in our data set and noted the distribution of GOslim terms. This was repeated 100,000 times. The mean and standard deviation (SD) of the frequency of each GOslim term was calculated for the simulations and compared with the observed data. This procedure was repeated for the gene pairs with at least 2 and with at least 3 nonconserved introns. Correction for multiple tests was done in two alternative ways: Bonferroni correction and BenjaminiHochberg correction (Benjamini and Hochberg 1995
).
We performed another set of 100,000 simulations for genes with at least 1, 2, and 3 nonconserved introns correcting for number of introns in the gene pair. In the simulations, for each of the original 281 gene pairs, a gene pair was selected randomly from the list of pairs with at least one member of the selected pair having the same number of introns as one member of the original pair.
| Results |
|---|
|
|
|---|
Recent Changes in Arabidopsis Gene Structure
We examined 2563 paralogous Arabidopsis gene pairs as identified by Blanc et al. (2003)
We identified 10,004 pairs of introns that have been conserved in both Arabidopsis paralogs since the genome duplication, 578 nonconserved introns (Supplementary Table 2, Supplementary Material online), and 2578 ambiguous cases. The 578 nonconserved introns are the results of either intron gain into one paralog or loss from the other. We found 281 genes having one nonconserved intron each. An additional 115 gene pairs have experienced multiple intron indels in the time since duplication, 37 of which had 3 or more intron gains or losses (table 1).
|
Identification of Intron Gains and Losses
To distinguish intron gains from losses, we required genomic sequence data from homologous plant genes. We searched the Viridiplantae division of GenBank for similar flowering plant genomic DNA sequences spanning the intron position and aligned them to the Arabidopsis pair (see Methods). We again employed very stringent criteria on the quality of the sequence alignments. The most important criterion was the exclusion of alignments where there were gaps close to the intron position. These gaps may be indicative of poor alignment quality, thus making it impossible to confidently discern the presence or absence of an intron at the site of interest.
An intron was inferred to have originated in the common ancestor of all genes containing the intron. An intron gain was scored when the nonconserved intron was present only in one A. thaliana genome-duplication paralog and other paralogs of this gene that duplicated after the tetraploidy event (e.g., fig. 1A). Evidence for intron loss comes from the presence of an intron in the same location in any earlier diverging flowering plant gene (e.g., fig. 1B). We could confidently assign 56 intron gain events and 39 intron loss events (Supplementary Table 2, Supplementary Material online).
|
Intragenic Location of Intron Indels
Conflicting studies say that intron loss is (Mourier and Jeffares 2003
When we examined the intron indels identified in this study using an intergene method similar to that of Nielsen et al. (2004)
, we did not find evidence for a bias in the intragenic location of gain and loss events, though they do appear to be more common in the middle of genes (see Supplementary Figures, Supplementary Material online). We also tested for bias in the location of intron indels using the intragene method of Lin and Zhang (2005)
and found an excess of nonconserved introns in the 3' end of genesonly 189 genes display a 5' bias in the location of nonconserved introns compared with 273 genes that display a 3' bias; P
0.001. The distribution of gained introns alone showed no significant bias. There were significantly more genes with a 3' bias of lost introns compared with a 5' bias (23 and 11 genes, respectively; P
0.05). However, if we exclude genes with at least one ambiguous intron (i.e., in a poorly aligned region or close to another intron), then there is no bias in the intragenic location of intron indels. The differences in the results from the two methods may be due to a greater robustness of the Lin and Zhang (2005)
method to large variation in gene size because it only splits each gene into 2 location categories.
Relationship to Gene Expression and G + C Content
Previous studies have indicated that intron evolution is correlated with other genic and genomic features. Selection for transcriptional efficiency has led to the reduction in length (but not frequency) of introns in Arabidopsis and other eukaryotes (Castillo-Davis et al. 2002
; Seoighe et al. 2005
), and G + Crich regions of vertebrate genomes have shorter introns on average (Duret et al. 1995
). We examined whether these phenomena known to influence intron length also influence intron gain and loss. We searched for evidence of a relationship between intron gain or loss and gene expression level based on microarray data and found no difference between genes with gained, lost, or conserved introns. Similarly, we found no significant difference in the G + C content of genes containing introns with different fates (see Supplementary Figures, Supplementary Material online).
Function of Genes experiencing Intron Indels
We compared the function of pairs of genes with nonconserved introns with those that only contained conserved introns (and no ambiguous introns) using the GOslim Gene Ontology classifications from TAIR (Berardini et al. 2004
). A summary of the results is presented in table 2. We did not consider gain and loss events separately because of low statistical power.
|
Results uncorrected for multiple testing indicate that gene pairs that experienced at least one intron indel are enriched for gene ontology (GO) terms involving cytosol and hydrolase activity while transcription factor activity as well as unknown molecular function and biological processes are underrepresented; gene pairs that experienced at least 2 intron indels are enriched for "other membranes," transport, and transporter activity; gene pairs that experienced at least 3 intron indels are enriched for nucleotide-binding functions and signal transduction (all significant at the 1% level). When we repeated the simulations correcting for number of introns, the results were not significantly different. Coghlan and Wolfe (2004)
There are 47 GOslim categories in this analysis. Because of multiple testing, if we consider each of these categories to be independent, we would expect just less than 2.5 categories to falsely appear significant at the 5% level and less than 0.5 categories to falsely appear significant at the 1% level. We observe more categories with P values
0.05 and
0.01, respectively, which indicates that most of these results are true positives but does not indicate which ones. When we correct for multiple testing using the Bonferroni correction, no GO terms are overrepresented (table 2). However, Bonferroni correction is extremely strict, especially in cases where there may be some dependence between categories (as is the case with GO terms). When we use BenjaminiHochberg correction (Benjamini and Hochberg 1995
), which aims to minimize the false discovery rate (FDR; i.e., the fraction of significant results that are actually false positives), genes with at least one intron indel are enriched for the terms cytosol, hydrolase activity, kinase activity, nucleotide binding, transferase activity, and transport at the 5% FDR level; genes with at least 2 intron indels are enriched for the terms "other membranes," transport, and transporter activity also at the 5% FDR level (table 2). No terms remain significant for genes with at least 3 intron indels, which may be caused by low statistical power due to the small numbers of genes.
Search for Origins of Introns
The mechanisms of intron gain remain enigmatic (Fedorov, Roy, Fedorova et al. 2003
). Possible modes of intron origin almost all involve the incorporation of copies of other genomic DNA into a gene as an intron, be it directly through DNA duplication or indirectly through RNA intermediates (Roy and Gilbert 2006
). We searched the entire Arabidopsis genome for DNA sequences with significant similarity to any of the nonconserved introns we identified using Blast and SSearch. We excluded self-hits, short hits (less than 50% of the query length), and any hits that were due to a large duplication (e.g., complete gene duplication, segmental chromosome duplication).
We could successfully identify the origin of only one recently gained intron. The mechanically inducible TOUCH3 gene (TCH3; At2g41100) has gained an intron with respect to its paralog the Calmodulin 3 gene (CAM3; At3g56800) that only contains one intron. TCH3 also contains one conserved intron and one intron in an ambiguous portion of the alignment with respect to CAM3. The sequence similarity search revealed that the new TCH3 intron is significantly similar to the conserved intron within TCH3 and to an intron in an adjacent paralogous gene in the chromosome (At2g41090). Inspection of the TCH3 gene sequence indicated that this intron was duplicated by a partial, internal gene duplication that also duplicated exonic sequence (fig. 2). The 5 homologous sequences we identified in Solanum tuberosum (potato, 3 homologs), Arachis hypogaea (peanut), and Oryza sativa (rice) resembled the CAM3 gene in gene structure and length. The dotplot of the TCH3 gene against itself (fig. 2B) indicates that much of the original gene was in fact duplicated twice (Sistrunk et al. 1994
), generating the 2 new introns (fig. 3), but one of these (intron 2) was excluded by our alignment quality criteria during the assignment of conserved and nonconserved introns. Interestingly, TCH3 has an alternative splice form (supported by cDNA evidence; GenBank NC_003071) that uses a pair of cryptic splice sites (AGGT) fortuitously present in the original duplicated gene segment as the ends of a new intron (fig. 3). This mechanism of intron gain was originally proposed over 15 years ago (Rogers 1989
).
|
|
| Discussion |
|---|
|
|
|---|
The work reported here examines the dynamics of intron gain and loss on a much more recent scale than any previous studies. We observe a rate of gain and loss of introns of 2.7 x 103 to 9.1 x 104 events per intron site per million years (578 indel events out of 10 582 characterized intron locations in the 2060 Myr since the genome duplication). This rate is higher than found in most previous studies stretching over broader evolutionary periods. If we extrapolate the amount of intron gain and loss to the whole data set (i.e., 60% of nonconserved introns are gains), the rate of intron gain (2.0 x 1012 to 5.9 x 1012 gains per site per year; based on 2 873 004 possible insertion sites that pass alignment quality criteria) is similar to that found by Roy and Gilbert (2005a)
1500 humanmouse orthologs. Additionally, there may be some lineage-specific intron indel acceleration due to neutral drift to fixation of gain and loss polymorphisms facilitated by the tiny effective population size imposed by A. thaliana's self-fertilization lifestyle.
Intron gain and loss in paralogous genes has been previously studied in a broad range of eukaryotic genomes (Babenko et al. 2004
; Castillo-Davis et al. 2004
; Qiu et al. 2004
). Two studies examining this phenomenon in very old duplicate genes both found an excess of intron gain events (Babenko et al. 2004
; Qiu et al. 2004
). Analysis of introns in duplicated genes in Plasmodium malaria parasites of human and mouse indicated that intron indels are very frequent in paralogous genes, although they did not distinguish between gain and loss (Castillo-Davis et al. 2004
). One of the problems with these analyses lies in the estimation of the intron/exon structure at the time of duplication. The paralogous genes studied are likely to have been duplicated at widely different timesmaking rate estimation problematicand by different mechanisms, including retrocopying of the gene via an mRNA intermediate that usually removes all introns from the gene. By contrast, the genes selected for analysis here were all duplicated at the same time and preserving gene structures.
All of these analyses of intron gain and loss find high rates of gene structure evolution in paralogs. Special features of paralogous genes that may give rise to higher intron flux include a possible contribution from subfunctionalization of alternative splice variants (Su et al. 2006
) which may involve changes in gene structure. However, this is more likely to involve the loss than the gain of an intron because it proposes the loss of alternative splicing by at least one of the duplicate genes. Some of the paralogous gene pairs in this study experienced multiple intron indels (table 1 and Supplementary Table 1, Supplementary Material online), and it is not clear if there is something special about these genes. A GO term analysis indicates that the group of gene pairs with 2 or more nonconserved introns is enriched for functions involved in transport, transporter activity, and other membranes with respect to the entire group of paralogous genes. However, it is not clear why this should be the case.
Intron gain may be overestimated when there has been a parallel intron loss in the outgroup sequences. The Dollo Parsimony method we employed here does not attempt to correct for this, unlike likelihood methods. However, likelihood methods require an estimate of the rate of intron loss in order to estimate parallel loss events, and these estimates are not readily available for all lineages. A compromise has often been to assume constant rates on all lineages, which may not be biologically realistic. Our analysis uncovered approximately 1·5 times more intron gains than losses during recent Arabidopsis evolution (although the difference is not significant based on a chi-squared test). If the actual frequencies are equal in this data set, then that would imply a 9% chance of parallel loss in all outgroup sequences. If there is just a single outgroup from rice, this equates to a rate of intron loss of 4·5 x 104 per intron per million years (assuming a monocotdicot divergence date of 200 MYA), which is comparable to the rate estimated by Roy and Gilbert (2005a)
. Where there are more numerous or more closely related outgroups, the rate of loss must be much higher to create this pattern of parallel intron loss. A tendency for parallel intron loss of particular introns (over random intron loss) has been observed in diverse Caenorhabditis genes and in the White gene of animals (Krzywinski and Besansky 2002
; Cho et al. 2004
). If this phenomenon holds in plant genomes or more generally if intron loss is more frequent in rice, then the parsimony method used here to infer intron gain and loss will be even more susceptible to the overassignment of intron gains due to parallel loss in the outgroup.
Arabidopsis thaliana has a famously small genome. One might have therefore naïvely predicted an excess of recent intron loss events, which we do not observe (though the amount of intron loss is high). However, the broad correlation between genome size and intron size in vertebrates (McLysaght et al. 2000
) is not generally apparent in plant genomes (Wendel et al. 2002
). This uncoupling of genome size and intron size is mirrored here by an uncoupling of genome reduction and intron loss. The yeast Cryptococcus neoformans has a similar uncoupling of these phenomena in its small, yet intron-dense, genome (Loftus et al. 2005
). In future research, it will be interesting to investigate whether the high rate of intron flux in paralogous genes can be related to subfunctionalization or neofunctionalization following gene duplication.
| Supplementary Material |
|---|
|
|
|---|
Supplementary Tables 1 and 2 and Supplementary Figures are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
|
|
|---|
The authors would like to thank Ken Wolfe and Kirsten Bratke for helpful discussions. This work is supported by Science Foundation Ireland.
Funding to pay the Open Access publication charges for this article was provided by the Smurfit Institute of Genetics.
| Footnotes |
|---|
William Martin, Associate Editor
| References |
|---|
|
|
|---|
Arabidopsis Genome Initiative. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796815.[CrossRef][Medline]
Babenko VN, Rogozin IB, Mekhedov SL, Koonin EV. 2004. Prevalence of intron gain over intron loss in the evolution of paralogous gene families. Nucleic Acids Res 32:372433.
Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B 85:289300.
Berardini TZ, Mundodi S, Reiser R, et al. (18 co-authors). 2004. Functional annotation of the Arabidopsis genome using controlled vocabularies. Plant Physiol 135:111.
Blanc G, Barakat A, Guyot R, Cooke R, Delseny M. 2000. Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell 12:1093101.
Blanc G, Hokamp K, Wolfe KH. 2003. A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res 13:13744.
Blanc G, Wolfe KH. 2004. Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell 16:166778.
Bowers JE, Chapman BA, Rong J, Paterson AH. 2003. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422:4338.[CrossRef][Medline]
Castillo-Davis CI, Bedford TB, Hartl DL. 2004. Accelerated rates of intron gain/loss and protein evolution in duplicate genes in human and mouse malaria parasites. Mol Biol Evol 21:14227.
Castillo-Davis CI, Mekhedov SL, Hartl DL, Koonin EV, Kondrashov FA. 2002. Selection for short introns in highly expressed genes. Nat Genet 31:4158.[Web of Science][Medline]
Cho S, Jin SW, Cohen A, Ellis RE. 2004. A phylogeny of caenorhabditis reveals frequent loss of introns during nematode evolution. Genome Res 14:120720.
Coghlan A, Wolfe KH. 2004. Origins of recently gained introns in Caenorhabditis. Proc Natl Acad Sci USA 101:113627.
Duret L, Mouchiroud D, Gautier C. 1995. Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. J Mol Evol 40:30817.[CrossRef][Web of Science][Medline]
Fedorov A, Roy S, Cao X, Gilbert W. 2003. Phylogenetically older introns strongly correlate with module boundaries in ancient proteins. Genome Res 13:11557.
Fedorov A, Roy S, Fedorova L, Gilbert W. 2003. Mystery of intron gain. Genome Res 13:223641.
Jeffares DC, Mourier T, Penny D. 2006. The biology of intron gain and loss. Trends Genet 22:1622.[CrossRef][Web of Science][Medline]
Krzywinski J, Besansky NJ. 2002. Frequent intron loss in the white gene: a cautionary tale for phylogeneticists. Mol Biol Evol 19:3626.
Lin K, Zhang DY. 2005. The excess of 5' introns in eukaryotic genomes. Nucleic Acids Res 33:65227.
Loftus BJ, Fung E, Roncaglia P, et al. (54 co-authors). 2005. The genome of the basidiomycetous yeast and human pathogen Cryptococcus neoformans. Science 307:13214.
Lynch M. 2002. Intron evolution as a population-genetic process. Proc Natl Acad Sci USA 99:611823.
Lynch M. 2006. The origins of eukaryotic gene structure. Mol Biol Evol 23:45068.
McLysaght A, Enright AJ, Skrabanek L, Wolfe KH. 2000. Estimation of synteny conservation and genome compaction between pufferfish (Fugu) and human. Yeast 17:2236.[CrossRef][Web of Science][Medline]
Mourier T, Jeffares DC. 2003. Eukaryotic intron loss. Science 300:1393.
Nguyen HD, Yoshihama M, Kenmochi N. 2005. New maximum likelihood estimators for eukaryotic intron evolution. PLoS Comput Biol 1:e79.[CrossRef][Medline]
Nielsen CB, Friedman B, Birren B, Burge CB, Galagan JE. 2004. Patterns of intron gain and loss in fungi. PLoS Biol 2:e422.[CrossRef][Medline]
Notredame C, Higgins DG, Heringa J. 2000. T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 302:20517.[CrossRef][Web of Science][Medline]
Paterson AH, Bowers JE, Burow MD, et al. (11 co-authors). 2000. Comparative genomics of plant chromosomes. Plant Cell 12:152340.
Pearson WR. 1996. Effective protein sequence comparison. Methods Enzymol 266:22758.[Web of Science][Medline]
Qiu WG, Schisler N, Stoltzfus A. 2004. The evolutionary gain of spliceosomal introns: sequence and phase preferences. Mol Biol Evol 21:125263.
Rogers JH. 1989. How were introns inserted into nuclear genes? Trends Genet 5:2136.[CrossRef][Web of Science][Medline]
Rogozin IB, Sverdlov AV, Babenko VN, Koonin EV. 2005. Analysis of evolution of exon-intron structure of eukaryotic genes. Brief Bioinform 6:11834.
Rogozin IB, Wolf YI, Sorokin AV, Mirkin BG, Koonin EV. 2003. Remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution. Curr Biol 13:15127.[CrossRef][Web of Science][Medline]
Roy SW, Fedorov A, Gilbert W. 2003. Large-scale comparison of intron positions in mammalian genes shows intron loss but no gain. Proc Natl Acad Sci USA 100:715862.
Roy SW, Gilbert W. 2005a. Rates of intron loss and gain: implications for early eukaryotic evolution. Proc Natl Acad Sci USA 102:57738.
Roy SW, Gilbert W. 2005b. Complex early genes. Proc Natl Acad Sci USA 102:198691.
Roy SW, Gilbert W. 2006. The evolution of spliceosomal introns: patterns, puzzles and progress. Nat Rev Genet 7:21121.[CrossRef][Web of Science][Medline]
Seoighe C, Gehring C. 2004. Genome duplication led to highly selective expansion of the Arabidopsis thaliana proteome. Trends Genet 20:4614.[CrossRef][Web of Science][Medline]
Seoighe C, Gehring C, Hurst LD. 2005. Gametophytic selection in Arabidopsis thaliana supports the selective model of intron length reduction. PLoS Genet 1:e13.[CrossRef][Medline]
Simillion C, Vandepoele K, Van Montagu MC, Zabeau M, Van de Peer Y. 2002. The hidden duplication past of Arabidopsis thaliana. Proc Natl Acad Sci USA 99:1362732.
Sistrunk ML, Antosiewicz DM, Purugganan MM, Braam J. 1994. Arabidopsis TCH3 encodes a novel Ca2+ binding protein and shows environmentally induced and tissue-specific regulation. Plant Cell 6:155365.[Abstract]
Su Z, Wang J, Yu J, Huang X, Gu X. 2006. Evolution of alternative splicing after gene duplication. Genome Res 16:1829.
Thompson JD, Higgins DG, Gibson TJ. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:467380.
Vision TJ, Brown DG, Tanksley SD. 2000. The origins of genomic duplications in Arabidopsis. Science 290:21147.
Wendel JF, Cronn RC, Alvarez I, Liu B, Small RL, Senchina DS. 2002. Intron size and genome size in plants. Mol Biol Evol 19:234652.
![]()
CiteULike
Connotea
Del.icio.us What's this?
This article has been cited by other articles:
![]() |
A. D. Cutter, A. Dey, and R. L. Murray Evolution of the Caenorhabditis elegans Genome Mol. Biol. Evol., June 1, 2009; 26(6): 1199 - 1234. [Abstract] [Full Text] [PDF] |
||||
![]() |
Z. Xie, X. Li, B. J. Glover, S. Bai, G.-Y. Rao, J. Luo, and J. Yang Duplication and Functional Diversification of HAP3 Genes Leading to the Origin of the Seed-Developmental Regulatory Gene, LEAFY COTYLEDON1 (LEC1), in Nonseed Plant Genomes Mol. Biol. Evol., August 1, 2008; 25(8): 1581 - 1592. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. Tarrio, F. J. Ayala, and F. Rodriguez-Trelles From the Cover: Alternative splicing: A missing piece in the puzzle of intron gain PNAS, May 20, 2008; 105(20): 7223 - 7228. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. Carmel, Y. I. Wolf, I. B. Rogozin, and E. V. Koonin Three distinct modes of intron dynamics in the evolution of eukaryotes Genome Res., July 1, 2007; 17(7): 1034 - 1044. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. W. Roy and D. Penny A Very High Fraction of Unique Intron Positions in the Intron-Rich Diatom Thalassiosira pseudonana Indicates Widespread Intron Gain Mol. Biol. Evol., July 1, 2007; 24(7): 1447 - 1457. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. W. Roy and D. Penny Patterns of Intron Loss and Gain in Plants: Intron Loss-Dominated Evolution and Genome-Wide Comparison of O. sativa and A. thaliana Mol. Biol. Evol., January 1, 2007; 24(1): 171 - 181. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. W. Roy, M. Irimia, and D. Penny Very Little Intron Gain in Entamoeba histolytica Genes Laterally Transferred from Prokaryotes Mol. Biol. Evol., October 1, 2006; 23(10): 1824 - 1827. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||





