MBE Advance Access originally published online on February 21, 2008
Molecular Biology and Evolution 2008 25(5):821-830; doi:10.1093/molbev/msn013
© The Author 2008. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Near Intron Positions Are Reliable Phylogenetic Markers: An Application to Holometabolous Insects

Veiko Krauss*, Christian Thümmler*, Franziska Georgi*, Jörg Lehmann†, Peter F. Stadler†,‡ and Carina Eisenhardt*

* Department of Genetics, Institute of Biology II, University of Leipzig, Leipzig, Germany
† Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany
‡ Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany; RNomics Group, Fraunhofer Institute for Cell Therapie and Immunology, Leipzig; Institute for Theoretical Chemistry, University of Vienna, Wien, Austria; Santa Fe Institute, Santa Fe, NM

E-mail: krauss@rz.uni-leipzig.de.


    Abstract
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Supplementary Material
 Acknowledgements
 References
 
Today, the reconstruction of the organismal evolutionary tree is based mainly on molecular sequence data. However, sequence data are sometimes insufficient to reliably resolve deep branches in particular. Thus, it is highly desirable to find novel, more reliable types of phylogenetic markers that can be derived from the wealth of genomic data. Here, we consider the gain of introns close to older preexisting ones. Because correct splicing is impeded by very small exons, nearby pairs of introns very rarely coexist, that is, the gain of the new intron is nearly always associated with the loss of the old intron. Both events may even be directly connected as in cases of intron migration. Therefore, it should be possible to identify one of the introns as ancient (plesiomorphic) and the other as novel (derived or apomorphic). To test the suitability of such near intron pairs (NIPs) as a marker class for phylogenetic analysis, we undertook an analysis of the evolutionary positions of bees and wasps (Hymenoptera) and beetles (Coleoptera) in relation to moths (Lepidoptera) and dipterans (Diptera) using recently completed genome project data. By scanning 758 putatively orthologous gene structures of Apis mellifera (Hymenoptera) and Tribolium castaneum (Coleoptera), we identified 189 pairs of introns, one from each species, which are located less than 50 nt from each other. A comparison with genes from 5 other holometabolan and 9 metazoan outgroup genomes resulted in 22 shared derived intron positions found in the beetle as well as in butterflies and/or dipterans. This strongly supports a basal position of hymenopterans in the holometabolous insect tree. In addition, we found 31 and 12 intron positions apomorphic for A. mellifera and T. castaneum, respectively, which seem to represent changes inside these branches. Another 12 intron pairs indicate parallel intron gains or extraordinarily small exons.
In conclusion, we show here that the analysis of phylogenetically nested, nearby intron pairs is suitable to identify evolutionarily younger intron positions and to determine their relative age, which should be of equal importance for the understanding of intron evolution and the reconstruction of the eukaryotic tree.

Key Words: intron evolution • molecular phylogenetics • near intron pair (NIP) • intron gain • insect phylogeny


    Introduction
The overwhelming majority of the literature in molecular phylogenetics is based upon the analysis of nucleic acid and amino acid sequences. Despite decades of effort by molecular systematists, evolutionary trees of eukaryotes still remain partly unresolved or inconsistent with each other (for review, see Roger and Hug 2006). This difficulty in resolving divergences is mainly due to the rarity of reliable synapomorphies (common derived characters) among deep branches. Single- and few-gene sequence analyses did not resolve such issues because of the large number of homoplasies in nucleic acid and protein data and the potential failure of sequence-based phylogenetic methods in cases of rate variation between lineages (Yang 1996; Savard, Tautz, and Lercher 2006). Recent genome-scale analyses demonstrated that even a large amount of data might not be sufficient to recover the true phylogeny (Jeffroy et al. 2006; Longhorn et al. 2007).

Therefore, so-called genome-level characters (for review, see Boore 2006) may have great potential for resolving crucial relationships for which no other data seem to be promising. The best available example for such characters appears to be the insertion of transposable elements in mammalian introns. Recently, work based on this character type has precisely resolved the relationships of placental mammals (Kriegs et al. 2006; Nishihara et al. 2006). In relation to sequence analysis, the signal-to-noise ratio of these analyses is outstanding. Boore (2006) summarized that only one out of 128 analyzed mammalian retrotranspositions was found to be homoplastic. However, among ecdysozoans, a similar transposon insertion analysis will be severely impeded by the longer time of divergence and the generally higher sequence substitution rate.

Spliceosomal intron positions have been introduced as another promising class of phylogenetic markers for robustly resolving divergence (Venkatesh et al. 1999; Rokas and Holland 2000; Nguyen et al. 2005; Roy and Gilbert 2005; Zheng et al. 2007). Introns have a slow rate of insertion and loss and evolve largely unaffected by the coding sequence (CDS) (Lynch and Richardson 2002; Roy et al. 2003; Yandell et al. 2006). However, some other studies (e.g., Cho et al. 2004; Krauss et al. 2005) found that identical introns were frequently lost independently in separate evolutionary lineages. On the other hand, recent analyses (Sverdlov et al. 2005; Yoshihama et al. 2006) estimated that parallel intron gains at orthologous positions in different evolutionary lineages account for 1.3–3.0% of all intron positions. Under these conditions, methods to improve the reliability of intron markers are highly desirable (Krzywinski and Besansky 2002; Wada et al. 2002).

Therefore, we were interested in efficiently excluding homoplastic (parallel, reverse, or convergent) changes of intron positions from analysis. During our study of eIF2γ intron evolution (Krauss et al. 2005), we documented several cases of successive losses and gains of introns at only slightly different positions. When mapped onto the gene tree, this results in a phylogenetically nested distribution of evolutionarily newer introns (fig. 1). Therefore, we introduce the term "near intron pair" for 2 introns which exist in orthologous genes of different genomes at nearby locations less than 50 nt away from each other. Normally, 2 introns cannot coexist that close in 1 gene because exons smaller than about 50 nt are relatively rare (Saeys et al. 2007) and functionally disadvantageous (Hwang and Cohen 1997; Carlo et al. 2000). This is also consistent with a study by Lynch and Kewalramani (2003), which shows that exon sizes are more uniform than expected under a random insertion model. Therefore, changes of intron positions over distances of less than 50 nt should represent reliable synapomorphic character states.


FIG. 1.— Synapomorphic distribution of a NIP. This distribution allows the derivation of 1) a plesiomorphic intron present in at least one outgroup and in A and 2) a synapomorphic intron with a known time frame of gain, which supports a common ancestor of B and the ingroup under exclusion of A. Note that the intron loss and gain events might have occurred simultaneously by intron migration.

 
Here, as a proof of concept, we have analyzed the relationships of major holometabolous insect groups. Accounting for more than 50% of all animal species (Kristensen 1999), holometabolous insects (Endopterygota or Holometabola) form the most diverse and successful group of multicellular organisms. Holometabola comprises 11 orders, 4 of which (Coleoptera, Hymenoptera, Diptera, and Lepidoptera) contain over 97% of the species diversity of this group (Grimaldi and Engel 2005). The monophyly of each order of the Endopterygota is relatively well supported. In contrast, the interordinal relationships of holometabolous insects are uncertain (Beutel and Pohl 2006). Recently, Savard et al. (2006) analyzed expressed sequence tag (EST) sequence data and concluded that the Hymenoptera are the most basal group of the Holometabola. A contrasting, widely accepted former hypothesis supposed that the Neuropteriformia (Coleoptera + Neuropterida) are the basal group (e.g., Kristensen 1999; Grimaldi and Engel 2005). Here, we present evidence that the Hymenoptera diverged from the common ancestor of the Lepidoptera and the Diptera significantly before the Coleoptera. Thereby, we show that NIPs are suitable for phylogenetic analysis and that such an analysis can determine the relative age of introns at the same time.


    Materials and Methods
Compilation of the Data Set
We downloaded all 7180 predicted protein sequences of the Apis mellifera genome (Build 2.1, Honeybee Genome Sequencing Consortium 2006) from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov) and excluded from this set doublets, shorter isoforms, and peptides smaller than 140 amino acids. The remaining 6487 proteins were used for TBlastN (Altschul et al. 1997) searches against the Beetle Base (Tribolium castaneum scaffold, Wang et al. 2007). This resulted in 2119 Apis proteins with exactly one corresponding CDS in Tribolium that 1) covers at least 50% of the Apis sequence with significantly higher amino acid identity than any other Tribolium CDS and 2) contains at least one sequence insertion of more than 42 nt in Tribolium which is surrounded by a gapless region of the Apis sequence in the TBlastN alignment. As such putative intronic sequence insertions, we identified both 1) gaps inside 1 alignment and 2) intervals between 2 successive alignments in every TBlastN or BlastX search which was performed using genomic sequences during this study. In the next step, these 2119 Apis sequences were used for TBlastN searches against the human genome database (reference only) to find at least one corresponding CDS in Homo that 1) covers at least 50% of the Apis sequence with significantly higher amino acid identity than any other Homo CDS and 2) contains at least one sequence insertion of more than 60 nt in Homo, which is surrounded by a gapless region of the Apis sequence in the TBlastN alignment.
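The second kind of putative intronic insertion (an interval between 2 successive alignments) can be illustrated with a minimal sketch. It is not the procedure actually used in this study: the `Hsp` record, its coordinate convention (1-based, inclusive, plus strand), and the function name are assumptions made for illustration only, and gaps inside a single alignment are not handled.

```python
from dataclasses import dataclass

MIN_INSERTION_NT = 42  # Tribolium threshold from the text (60 nt was used for Homo)


@dataclass(frozen=True)
class Hsp:
    """One alignment segment (hypothetical record; 1-based inclusive coordinates)."""
    q_start: int  # first aligned residue of the Apis protein
    q_end: int    # last aligned residue of the Apis protein
    s_start: int  # first aligned nucleotide of the genomic subject (plus strand)
    s_end: int    # last aligned nucleotide of the genomic subject


def putative_insertions(hsps, min_nt=MIN_INSERTION_NT):
    """Find intervals between successive HSPs that look like intron insertions.

    Returns (last_query_residue_before_gap, insertion_length_nt) tuples for
    neighbouring HSPs whose query ranges abut without a gap while the subject
    skips more than min_nt nucleotides.
    """
    ordered = sorted(hsps, key=lambda h: h.q_start)
    found = []
    for left, right in zip(ordered, ordered[1:]):
        query_gap = right.q_start - left.q_end - 1
        subject_gap = right.s_start - left.s_end - 1
        if query_gap == 0 and subject_gap > min_nt:
            found.append((left.q_end, subject_gap))
    return found


# Three made-up segments: a 150-nt interval after residue 50, a 10-nt one after 120.
example = [Hsp(1, 50, 1000, 1149), Hsp(51, 120, 1300, 1509), Hsp(121, 150, 1520, 1609)]
print(putative_insertions(example))
```

In this toy input only the first interval exceeds the 42-nt threshold; real alignments often overlap slightly at segment ends, which this sketch deliberately ignores.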

Next, the CDS of all Apis proteins for which we identified at least one such corresponding human gene were downloaded. Again, genes were discarded if the Apis CDS contained no intron. At the end of this selection, we obtained 758 putatively orthologous groups each consisting of 1 gene of A. mellifera, 1 gene of T. castaneum, and at least 1 gene of Homo sapiens. Occasional gene duplications in the human lineage are supposed to have occurred after the divergence of the deuterostomian from the protostomian lineage (Holland 2003; Putnam et al. 2007), which allows more than one orthologous gene in vertebrates.

Subsequently, annotated versions of the identified Tribolium gene sequences were downloaded or, alternatively, generated by manual alignment of open reading frame translations to the Apis protein sequence. To find the exact position of a putative Tribolium intron, which corresponds to alignment gaps in Apis larger than 42 nt, we used the similarity of the coded amino acids together with 5' and 3' splice position weight matrices for U2-type introns of Drosophila (Sheth et al. 2006). These manual alignments and annotations were done using the program MacVector 7.2 (Accelrys, San Diego, CA). The resulting gene structures remained sometimes incomplete because Tribolium intron positions located in unalignable regions of Apis could not be determined. Identified intron positions of Apis and Tribolium were named according to the homologous Apis triplet and the position inside this triplet (e.g., 127-2) and sampled. We assumed that all introns with identical positions in putatively orthologous genes are homologous to each other, although the corresponding sequences were generally too diverged to be aligned. The orthology of each identified Tribolium gene structure containing a near intron to an intron of Apis (see below) was tested by a reciprocal Blast search (BlastX) to the Apis protein set (Tatusov et al. 1997).
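The naming scheme for intron positions (Apis triplet, then position inside the triplet) lends itself to a small worked example. The sketch below is illustrative only; the function names are hypothetical, and the conversion to a nucleotide offset assumes that position 0 lies directly in front of the first nucleotide of the named triplet. Distances between 2 positions do not depend on that assumption.

```python
def parse_intron_position(name):
    """Split a name such as "127-2" into (triplet, position_in_triplet)."""
    triplet_text, phase_text = name.split("-")
    triplet, phase = int(triplet_text), int(phase_text)
    if triplet < 1 or phase not in (0, 1, 2):
        raise ValueError(f"not a valid intron position name: {name!r}")
    return triplet, phase


def nucleotide_offset(name):
    """Number of coding nucleotides in front of the intron (assumed convention)."""
    triplet, phase = parse_intron_position(name)
    return 3 * (triplet - 1) + phase


# Distance between two intron positions of the same alignment, in nucleotides.
distance = nucleotide_offset("339-0") - nucleotide_offset("331-0")
print(distance)
```

For the positions 331-0 and 339-0 this gives a distance of 8 triplets, that is, 24 nt.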

Analysis of NIPs
Based on the sampled intron positions of Apis and Tribolium, we plotted the lengths of internal exons, that is, of exons residing between 2 introns inside the CDS. In human cells, the critical exon size seems to be 50 nt, suggested by the finding that the few smaller exons have developed specific mechanisms to increase their inclusion into the mRNA (Hwang and Cohen 1997; Carlo et al. 2000). A similar critical size was revealed by our data, which showed that small exons were more abundant than neighboring intron pairs (each consisting of an Apis-specific intron and a Tribolium-specific intron) only above a size of 50 nt (fig. 2B). Accordingly, it is likely that many of these intron pairs with a distance greater than 50 nt have evolved by the differential loss of introns bordering former, smaller exons and will be phylogenetically uninformative. Therefore, we limited our analysis to the 189 intron pairs whose positions differed by between 1 and 49 nt. In the following, these intron pairs will be named NIPs.
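The selection criterion (an Apis-specific and a Tribolium-specific intron whose positions differ by between 1 and 49 nt) can be sketched as follows. This is a hedged illustration, not the code used for the analysis: positions are given as hypothetical (triplet, position-in-triplet) tuples relative to the Apis CDS, and the function names are assumptions.

```python
def nt_offset(position):
    """Convert (triplet, position_in_triplet) into a nucleotide coordinate."""
    triplet, phase = position
    return 3 * triplet + phase


def near_intron_pairs(apis_positions, tribolium_positions, max_distance=49):
    """Pair species-specific introns lying 1..max_distance nt apart.

    Positions shared by both species are removed first, because a NIP
    consists of one Apis-specific and one Tribolium-specific intron.
    """
    apis_only = set(apis_positions) - set(tribolium_positions)
    tribolium_only = set(tribolium_positions) - set(apis_positions)
    pairs = []
    for a in sorted(apis_only):
        for t in sorted(tribolium_only):
            distance = abs(nt_offset(a) - nt_offset(t))
            if 1 <= distance <= max_distance:
                pairs.append((a, t, distance))
    return pairs


# Made-up gene: one shared intron, one pair 24 nt apart, one pair 60 nt apart.
apis = [(100, 0), (331, 0), (400, 1)]
tribolium = [(100, 0), (339, 0), (420, 1)]
print(near_intron_pairs(apis, tribolium))
```

Only the pair at 331-0/339-0 qualifies in this toy gene; the pair separated by 60 nt is treated like a possible former small exon and is left out.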


FIG. 2.— (A) Abundances of internal (located between 2 introns inside of the CDS) exon sizes in 758 putatively orthologous gene pairs of Apis mellifera (4798 exons) and Tribolium castaneum (1863 exons). (B) Distribution of the distances of 266 neighboring intron pairs and the sizes of 170 small exons up to 70 nt.

 
For each NIP, we constructed an alignment of putatively orthologous metazoan proteins using BlastP search results (Altschul et al. 1997) from the corresponding Apis protein. We included in this alignment only the single protein with the lowest E value for each target species (see below). Proteins associated with a higher E value than the next-related (paralogous) Apis protein were also excluded. In addition, we searched for arthropod EST sequences covering the NIPs to support the intron annotation and the expression of the corresponding mRNA. During this step, we used TBlastN (Altschul et al. 1997) and the NCBI EST database and found that more than 95% of the NIPs were covered by ESTs of at least one arthropod species. At this stage, some NIPs were excluded from analysis due to insufficient sequence conservation around the NIP or due to apparent gene duplications during metazoan evolution.

Subsequently, the phylogenetic distribution of each intron of a pair was evaluated using TBlastN and manual annotation of the adjacent, putatively orthologous exons in the arthropod genomes of Drosophila melanogaster (or Drosophila pseudoobscura), Anopheles gambiae, Aedes aegypti, Bombyx mori, Nasonia vitripennis, Pediculus humanus, Acyrthosiphon pisum, and Daphnia pulex; in the Deuterostomia genomes of Gallus gallus, Danio rerio, Ciona intestinalis (or Ciona savignyi), and Strongylocentrotus purpuratus (all at NCBI, http://www.ncbi.nlm.nih.gov); in the Platyhelminthes genome of Schistosoma mansoni (Sanger Institute, http://www.sanger.ac.uk/Projects/S_mansoni); and in the Cnidaria genome of Nematostella vectensis (Joint Genome Institute, http://genome.jgi-psf.org). For this purpose, we retrieved the as yet unassembled genome sequences of Acyrthosiphon and Daphnia from the NCBI trace site by discontiguous megablast and assembled the sequences of adjoining exons. For every species, we selected the hit containing the fragments with highest amino acid or nucleotide identity. To find the exact position of a putative intron, we used the similarity of the coded amino acids together with 5' and 3' splice position weight matrices for U2-type introns (Sheth et al. 2006). All genome fragments whose intron positions could not be associated with a corresponding Apis triplet were excluded from analysis. Finally, orthology of all obtained gene fragments was tested by the reciprocal best Blast hit method (Tatusov et al. 1997). If the corresponding BlastX comparison of a gene fragment to Apis proteins failed, it was discarded. Altogether, we excluded 54 of the initial 189 NIPs from further analyses because of insufficient sequence conservation or gene duplications.
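The reciprocal best hit criterion used for the orthology test can be stated compactly. The sketch below assumes that the best hit in each search direction has already been extracted from the Blast results; the dictionary layout, the identifiers, and the function name are hypothetical.

```python
def reciprocal_best_hits(forward_best, reverse_best):
    """Keep query/fragment pairs that are each other's best hit.

    forward_best: Apis protein id -> best-scoring fragment in the target genome
    reverse_best: fragment id -> best-scoring protein in the Apis protein set
    """
    return {
        protein: fragment
        for protein, fragment in forward_best.items()
        if reverse_best.get(fragment) == protein
    }


# Made-up identifiers: fragment "frag_B" points back to a paralog and is discarded.
forward = {"apis_1": "frag_A", "apis_2": "frag_B"}
reverse = {"frag_A": "apis_1", "frag_B": "apis_9"}
print(reciprocal_best_hits(forward, reverse))
```

A fragment whose reverse search fails, or returns a different Apis protein, simply drops out of the result, which mirrors the discarding step described above.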

Construction of Intron Distribution Matrices
From the 135 remaining intron pairs, an intron position matrix and a corresponding intron pair matrix were manually created and analyzed in MacClade 4.0 (Maddison DR and Maddison WP 2005) (supplementary material 1 and 2, Supplementary Material online). Within the intron position matrix each intron of a pair is coded as "1," each empty position as "0," and no data as "?". Within the intron pair matrix, the upstream intron is coded as "1" and the downstream intron as "2," whereas intronless pair positions and no data are coded as "?".
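The 2 coding schemes can be written down as a short sketch. The matrices in this study were created manually, so the functions below are only an illustration with hypothetical names. Observations are given as True (intron present), False (position empty), or None (no data). How a species carrying both introns of a pair should be coded in the intron pair matrix is not stated in the text; the `both_code` parameter is a placeholder assumption.

```python
def position_codes(upstream, downstream):
    """Codes for the intron position matrix: one character per intron position."""
    def one(state):
        if state is None:
            return "?"
        return "1" if state else "0"
    return one(upstream), one(downstream)


def pair_code(upstream, downstream, both_code="?"):
    """Code for the intron pair matrix: one character per near intron pair."""
    if upstream is None or downstream is None:
        return "?"            # no data
    if upstream and downstream:
        return both_code      # assumption: not specified in the text
    if upstream:
        return "1"            # upstream intron present
    if downstream:
        return "2"            # downstream intron present
    return "?"                # intronless pair position


# One made-up pair observed in four species.
observations = {"sp_A": (True, False), "sp_B": (False, True),
                "sp_C": (False, False), "sp_D": (None, None)}
for species, (up, down) in observations.items():
    print(species, position_codes(up, down), pair_code(up, down))
```

Note that the intron pair matrix does not distinguish an intronless site from missing data, whereas the intron position matrix does.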

Automated Analysis
To control and complement the manual analysis described above, we performed the following computational tasks. The identified 135 NIPs, which were obtained from 118 different genes, were automatically evaluated by constructing multiple alignments of the extracted CDS. For this purpose, DIALIGN2 (Morgenstern et al. 2006) was used in translated mode. For some data sets, we also used transAlign (Bininda-Emonds 2005), which translates CDSs to amino acid sequences, creates a ClustalW (Chenna et al. 2003) protein alignment, and back-translates it to aligned DNA sequences. The procedure included a check for correct splice site dinucleotides (GT/AG and GC/AG) and an automatic adaptation of CDS annotations. The intron positions of the different sequences were determined according to the homologous Apis triplets using translated BLAT (Kent 2002). For all 135 NIPs, we found no differences between the results of the automatic and the manual analysis. At the sites of introns, the resulting CDS alignments were supplemented with the first 8 and the last 14 intronic nucleotides. These intron-containing nucleotide alignments were complemented with a schematic picture for each NIP (supplementary material 3, Supplementary Material online).
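The splice site check and the extraction of intron flanks can be illustrated with a minimal sketch. It is not the pipeline used here; the function names are hypothetical, and the intron is assumed to be given as a plain nucleotide string in 5' to 3' orientation.

```python
ACCEPTED_SPLICE_SITES = {("GT", "AG"), ("GC", "AG")}  # donor/acceptor pairs from the text


def has_accepted_splice_sites(intron_seq):
    """True if the intron starts and ends with an accepted dinucleotide pair."""
    seq = intron_seq.upper()
    if len(seq) < 4:
        return False
    return (seq[:2], seq[-2:]) in ACCEPTED_SPLICE_SITES


def intron_flanks(intron_seq, head=8, tail=14):
    """Return the first `head` and the last `tail` intronic nucleotides."""
    return intron_seq[:head], intron_seq[-tail:]


# Made-up 60-nt intron with a GT donor and an AG acceptor.
intron = "GTAAGT" + "T" * 40 + "TTTCTTTTTTGCAG"
print(has_accepted_splice_sites(intron), intron_flanks(intron))
```

Rare intron classes with other terminal dinucleotides would be rejected by this check, which is consistent with restricting the analysis to GT/AG and GC/AG introns.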


    Results
Near Intron Positions with Distances Less than 50 nt Are More Abundant than Small Exons
Based on 2 NIPs in the eIF2γ gene, we had preliminarily suggested that Coleoptera and not Hymenoptera is the sister group of the Mecopterida, which contains the Diptera and the Lepidoptera (Krauss et al. 2005). Here, we performed a systematic genome-scale analysis of intron positions to 1) estimate the phylogenetic reliability of NIPs and to 2) reconstruct the relatedness of Coleoptera and Hymenoptera to the Mecopterida. To this end, we built putatively orthologous groups of intron-containing genes of A. mellifera, T. castaneum, and humans (Materials and Methods). Subsequently, we determined the intron positions inside the corresponding Apis and Tribolium genes according to the Apis triplet position and compiled the resulting sizes of internal exons.

The distribution of exon sizes shows, for both species, a maximum between 120 and 220 nt, a long tail of relatively few large exons and a few small exons (fig. 2). Only 39 of 4798 Apis exons and 8 of 1863 Tribolium exons are smaller than 50 nt. This distribution prompted us to search for phylogenetically informative intron position pairs whose positions differ from each other only by between 1 and 49 nt (Materials and Methods). This search resulted initially in 189 NIPs, each consisting of a specific Apis intron and a specific Tribolium intron. After confirming the corresponding intron positions using metazoan protein and arthropod EST sequences, we evaluated their phylogenetic distributions. This was done by the identification and annotation of the adjacent, putatively orthologous exons in 14 metazoan species (Materials and Methods). We used Drosophila, Anopheles, Aedes, and Bombyx as ingroup taxa; Nasonia as sister group to Apis; and Acyrthosiphon, Pediculus, Daphnia, Danio, Gallus, Ciona, Strongylocentrotus, Schistosoma, and Nematostella as outgroup. We did not include other vertebrate and Drosophila genomes because of the overwhelming similarity of occupied intron positions between the genomes used and those not used.

Near Intron Positions Are Information-Rich Phylogenetic Markers
For 135 out of 189 identified NIPs, we obtained a gapless alignment of putatively orthologous CDSs around the 2 intron positions in at least some of the outgroup and ingroup species. On average, genomic regions corresponding to the 135 NIPs could be identified in 97.2% of the ingroup and 84.9% of the outgroup species (supplementary material 4, Supplementary Material online). Only 26.1% (ingroup) and 54.9% (outgroup) of these orthologous genomic regions contained at least one intron out of the analyzed pair. The relatively low abundance of introns in the ingroup is due to the intron-poor genomes of the dipterans. For the analyzed intron pairs, an intron position matrix and a corresponding intron pair matrix were manually created and analyzed in MacClade 4.0 (Maddison DR and Maddison WP 2005). The collected intron data were mapped onto a tree (fig. 3B) according to the commonly accepted phylogeny of metazoans and the hypothesis that hymenopterans diverged at the base of radiation of holometabolous insects as supported by this analysis (see below) as well as previous ones (Krauss et al. 2005; Savard et al. 2006).


FIG. 3.— Example of a NIP distribution in metazoans. The NIP of the transmembrane 9 superfamily member 2 gene suggests plesiomorphy of intron 331-0 in relation to the synapomorphic intron 339-0 of Tribolium and Bombyx. Note that 331-0 appears to be a synapomorphy of arthropods. (A) Partial alignment of the nearby intron positions 331-0, 335-0 and 339-0. Uppercase sequences correspond to exons. Both Strongylocentrotus and Danio contain one triplet fewer than all other species. Therefore, their intron positions were excluded from cladistic analysis. (B) Cladogram, based on analysis in MacClade 4 (Maddison DR and Maddison WP 2005). Introns are symbolized by white (335-0), gray (331-0), or black (339-0) squares. At least 2 independent intron loss events (in the Pediculus and in the dipteran lineages) do not interfere with the analysis.

 
Using this tree, 102 out of 135 NIPs could be differentiated into a plesiomorphic (outgroup shared) and an apomorphic (derived) intron position relative to the separation of the evolutionary lineages of Apis and Tribolium (fig. 4). In all, 22 of these NIPs contain common derived (synapomorphic) introns of Tribolium with Diptera and/or Lepidoptera species (table 1). In contrast, no common derived intron of Apis with Diptera or Lepidoptera species could be found. This result supports both the suitability of NIP distributions for phylogenetic analysis and the existence of a monophyletic group consisting of Diptera + Lepidoptera + Coleoptera to the exclusion of the Hymenoptera.


FIG. 4.— Results of the NIP analysis between Apis mellifera and Tribolium castaneum. Using a simplified cladogram which shows Apis, Tribolium, the united ingroups (Drosophila, Anopheles, Aedes, and Bombyx), and the united outgroups (Acyrthosiphon, Pediculus, Daphnia, Schistosoma, Gallus, Danio, Ciona, Strongylocentrotus, and Nematostella), the distributions of the apomorphic intron positions are classified. The abundances of the resulting classes are given. Note that circles which contain more than 1 color indicate the possible occurrence of Apis, Tribolium, and/or no introns in the out- or the ingroup species. If both Apis and Tribolium introns of one NIP were detected in outgroup species, apomorphic and plesiomorphic introns could not be derived. Such cases were classified as inconsistent intron distributions.
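The classification logic described for NIPs can be summarized in a short sketch. It is a simplified reading of the text, not the MacClade-based procedure itself: the function name and its boolean inputs are hypothetical, and each input states only whether the Apis-type or the Tribolium-type intron of a pair was found in at least one species of the respective group.

```python
def classify_nip(apis_in_outgroup, tribolium_in_outgroup,
                 apis_in_ingroup=False, tribolium_in_ingroup=False):
    """Classify one near intron pair from outgroup and ingroup occurrences."""
    if apis_in_outgroup and tribolium_in_outgroup:
        # Both introns occur outside Holometabola: polarity cannot be derived.
        return "inconsistent"
    if not apis_in_outgroup and not tribolium_in_outgroup:
        # No outgroup intron: neither position can be called plesiomorphic.
        return "undetermined"
    if apis_in_outgroup:
        # The Apis intron is plesiomorphic, so the Tribolium intron is derived.
        if tribolium_in_ingroup:
            return "Tribolium intron apomorphic, shared with ingroup"
        return "Tribolium intron apomorphic"
    # Otherwise the Tribolium intron is plesiomorphic and the Apis intron derived.
    if apis_in_ingroup:
        return "Apis intron apomorphic, shared with ingroup"
    return "Apis intron apomorphic"


print(classify_nip(True, False, tribolium_in_ingroup=True))
print(classify_nip(True, True))
```

A pair of the first kind corresponds to a derived intron of Tribolium shared with Diptera and/or Lepidoptera; a pair of the second kind is counted as an inconsistent distribution.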

 

Table 1 Synapomorphic Intron Positions of Endopterygota Excluding Hymenoptera

 
Sources of Homoplasy in NIP Characters
The quality (the information content) of a phylogenetic marker is mirrored by the fraction of homoplastic character distributions along the tree. Concerning NIPs, inconsistent distributions of intron positions along the tree are an easily detectable form of homoplasy which may be caused by 1) ancient small exons which have been fused independently with neighboring exons in different lineages through the loss of different bordering introns or by 2) combined intron loss and gain, occurring independently at the same positions, in different lineages. Actually, whereas 102 NIPs appear phylogenetically informative (fig. 4), 12 NIPs revealed an inconsistent distribution, that is, both intron positions were found in at least one of the outgroup species (table 2). One of these inconsistencies (GI: 66546088) was surely caused by an ancient small exon, which was found, bordered by both introns of this NIP, in Nematostella. Three other cases also implicate an ancient small exon whose bordering introns have been lost differently in the analyzed lineages because otherwise the intron positions of those pairs had to have changed more than twice (GIs: 66512196, 66499842, and 48120807). However, in the remaining 8 cases, the inconsistency was caused by the occurrence of the putative apomorphic intron in only one of the outgroups. Therefore, we wondered whether the apparently apomorphic intron positions of these pairs might have been gained in parallel in holometabolous insects and in those outgroup lineages. We tested this possibility using additional intron positions, which were found during the sampling of NIPs less than 50 nt away from one or both introns of an analyzed pair in out- and ingroup species (supplementary material 3, Supplementary Material online).
We collected these independent NIPs in a separate presence/absence matrix (supplementary material 5, Supplementary Material online) and analyzed all intron position changes, which could be unambiguously attributed to one branch of the tree (fig. 5).


Table 2 Inconsistently Distributed Near Intron Pairs

 

FIG. 5.— Apomorphic introns of NIPs mapped onto the phylogenetic tree. Downward arrows denote intron gain into each lineage as unambiguously supported by intron distribution of out- and ingroup taxa. Filled arrows mark the types of intron changes for which we searched. Empty arrows mark phylogenetically nested intron changes, which were found nearby to introns of Apis/Tribolium NIPs between other species. The tree branches are scaled (Gaunt and Miles 2002; Douzery et al. 2004; Grimaldi and Engel 2005; Berney and Pawlowski 2006).

 
Our result confirmed earlier reports by Raible et al. (2005) and Putnam et al. (2007) that the speed of intron evolution varies strongly between the evolutionary lineages of metazoans. Schistosoma and Ciona represent the terminal branches in which most novel introns evolved (6 and 5, respectively). In contrast, during the same time as those introns emerged in Ciona, the vertebrates Danio and Gallus have not experienced any intron position changes. We noted that Schistosoma and Ciona alone contributed 2 and 4 introns, respectively, to the 8 single, inconsistently distributed introns (table 2). Thus, at least those 6 introns (0.55% of altogether 1086 detected introns in 135 NIPs) were most likely caused by parallel intron loss and gain.
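The quoted fraction follows directly from the counts given above, as the following short calculation shows. Only the variable names are introduced here; all numbers are taken from the text.

```python
# Inconsistently distributed single introns contributed by the two fast-evolving outgroups.
schistosoma_introns = 2
ciona_introns = 4
total_detected_introns = 1086  # all introns detected in the 135 NIPs

parallel_candidates = schistosoma_introns + ciona_introns
percent = 100 * parallel_candidates / total_detected_introns
print(f"{parallel_candidates} of {total_detected_introns} introns = {percent:.2f}%")
```

The result rounds to 0.55%, which is below the 1.3–3.0% range of parallel gains cited in the Introduction, although the two figures were obtained with different methods and are not directly comparable.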

In summary, we found evidence for both suspected types of homoplasy. However, the amount of homoplasy is clearly too small to interfere with the phylogenetic analysis. It is important to add that both the reliability of such analyses and the numbers of inconsistent distributions critically depend on the number and kind of outgroups used. The usage of fewer outgroups would have resulted in fewer inconsistent distributions, but probably also in some contrary, seemingly synapomorphic evidence. In addition, outgroups with a relatively fast intron evolution (such as Ciona and Schistosoma) will boost the amount of homoplasy.

Interestingly, we detected no example for the opposite type of inconsistent intron distribution, that is, we never found both intron positions in at least one of the ingroup species. This might be due to the much smaller evolutionary divergence of the ingroup in relation to the outgroup lineages, which is represented by the sum of corresponding branch lengths (fig. 5).


    Discussion
In this study, we show that the determination of a plesiomorphic/apomorphic NIP is suitable 1) to reveal intron gain events, 2) to assign a relative age to apomorphic (gained) introns, and 3) to use combined intron loss and gain events to evaluate phylogenetic hypotheses. Whereas intron losses are common evolutionary events, only relatively few intron gains have been convincingly demonstrated (Rodriguez-Trelles et al. 2006; Roy and Penny 2006, 2007). Altogether, we identified 66, 36, and 19 novel introns in Tribolium, in Apis, and in other analyzed species, respectively (figs. 4 and 5). The maximum age of the younger of these apomorphic intron positions (which are restricted to holometabolous insects) is about 320 Myr (Grimaldi and Engel 2005).

Inside these novel introns, we have not found any significant sequence conservation to other introns, to transposons, or to exons (data not shown). Consequently, the origin of these introns could not be determined. It remains possible that at least some of these novel intron positions resulted from intron migration (intron sliding) and did not involve the insertion of novel introns. However, because of the gapless conservation of the CDS around the NIPs, this would require some convergent base substitutions and, thus, appears rather unlikely especially for larger distances between the introns of one pair. Based on the structure of splice sites and available evidence, intron sliding processes might have occurred most probably at positions spaced by 1 (Rogozin et al. 2000) or 3 nt (Krauss et al. 2005; Hiller et al. 2007). Interestingly, whereas NIPs with 1 nt distance are not overrepresented in our data (4 of 189), NIPs with 3 nt distance appear significantly more abundant than those with other spacings (13 of 189, P = 0.0003 by Student's t-test). Thus, intron migration might be a pathway to the emergence of some NIPs; however, in our opinion, successive losses and gains of introns might have contributed more to the data. These hypothetical pathways to NIPs will be very difficult to dissect because of the high abundance of independent intron losses (see Roy and Penny 2007 and references therein), which renders the detection of an intronless NIP site, a potential intermediate of the second pathway, worthless.
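One way to get a feeling for the overrepresentation of the 3-nt spacing is a one-sided binomial calculation. This is an alternative illustration and explicitly not the t-test reported above. It rests on an assumption that is not part of this study: that, under the null hypothesis, each of the 49 possible distances (1 to 49 nt) is equally likely for any of the 189 NIPs. It also ignores the fact that the 3-nt class was singled out from 49 classes.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Probability of observing k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_nips = 189            # all NIPs with distances of 1..49 nt
n_distance_classes = 49
observed_at_3nt = 13
observed_at_1nt = 4

expected_per_class = n_nips / n_distance_classes
p_3nt = binomial_upper_tail(observed_at_3nt, n_nips, 1 / n_distance_classes)
p_1nt = binomial_upper_tail(observed_at_1nt, n_nips, 1 / n_distance_classes)
print(f"expected per distance class: {expected_per_class:.2f}")
print(f"P(>= 13 NIPs at 3 nt) = {p_3nt:.2e}")
print(f"P(>= 4 NIPs at 1 nt)  = {p_1nt:.2f}")
```

Under this simple null model about 4 NIPs are expected per distance class, so 4 NIPs at 1 nt are unremarkable, whereas 13 NIPs at 3 nt are a rare outcome.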

Consistent with other studies (Carmel et al. 2007 and references therein), we observed highly divergent rates of intron evolution. According to a hypothesis on the evolution of genomic complexity (Lynch and Conery 2003), intron gain would have resulted in slightly deleterious alleles and thus may have occurred mainly during limited time spans, being driven by decreased population sizes. This is consistent with our data if we compare the 22 apomorphic intron positions that were gained in the ancestral line of Diptera, Lepidoptera, and Coleoptera (but not Hymenoptera) with only 12 intron positions that emerged later in the evolutionary line of Tribolium (fig. 5). According to the fossil record (Grimaldi and Engel 2005), such a common ancestor may not have existed longer than about 40 Myr (from the start of the late Carboniferous to 285 MYA), whereas the beetle line is at least 280 Myr old. The identified numbers of lineage-specific intron changes also differ significantly from the protein divergence rates between holometabolous insects (Zdobnov and Bork 2007). Specifically, these authors found that the common branch of Diptera, Lepidoptera, and Coleoptera is about 20 times shorter than the Tribolium branch. In addition, we did not find any NIP-based differences between Apis and Nasonia, although the corresponding evolutionary lines separated at the latest 150 MYA (Grimaldi and Engel 2005). Taken together, these findings indicate that NIPs may offer phylogenetic information complementary to sequence analyses.
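
The rate contrast described above can be made explicit with a short calculation. The following is only an illustrative sketch: it uses the counts and time bounds quoted in the text (22 gains within at most about 40 Myr; 12 gains over at least 280 Myr), and the variable names are chosen here for illustration. Because the first duration is an upper bound and the second a lower bound, the resulting ratio should be read as an approximate lower bound on the difference in gain rates, not as a measured value.

```python
# Counts and time bounds as quoted in the text (fig. 5; Grimaldi and Engel 2005).
gains_common_branch = 22      # gains on the shared Diptera/Lepidoptera/Coleoptera line
max_duration_common_myr = 40  # that line lasted at most about 40 Myr
gains_tribolium = 12          # later gains on the Tribolium line
min_duration_tribolium_myr = 280  # the beetle line is at least 280 Myr old

# A maximum duration gives a minimum rate; a minimum duration gives a maximum rate.
min_rate_common = gains_common_branch / max_duration_common_myr
max_rate_tribolium = gains_tribolium / min_duration_tribolium_myr

# Lower bound on how much faster introns were gained on the common branch.
min_rate_ratio = min_rate_common / max_rate_tribolium

print(f"common branch: at least {min_rate_common:.3f} gains per Myr")
print(f"Tribolium line: at most {max_rate_tribolium:.3f} gains per Myr")
print(f"ratio: at least {min_rate_ratio:.1f}-fold")
```

Under these bounds, the per-Myr gain rate on the short common branch exceeds that on the Tribolium line by more than an order of magnitude, which is the sense in which intron gain appears concentrated in a limited time span.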

Based on altogether 24 synapomorphic NIP distributions from the current analysis and the preliminary investigation of the eIF2γ gene (Krauss et al. 2005), our study supports, without any contrary evidence, a more basal position of the hymenopterans in relation to the beetles within the tree of holometabolous insects. This result contradicts most former phylogenetic hypotheses (for review, see Beutel and Pohl 2006), but it is backed by paleontologists (Rohdendorf and Rasnitsyn 1980), morphologists (Ross 1965; Kukalová-Peck and Lawrence 2004), sequence analysis of ESTs (Savard et al. 2006), genomic sequences (Zdobnov and Bork 2007), and application of mixed DNA/RNA models to 18S data (Misof et al. 2007). Because this study is based exclusively on the analysis of intron position differences between Apis and Tribolium, we could not test the possibility of a basal position of Lepidoptera or Diptera inside the Holometabola. However, corresponding trees would be less parsimonious than our tree (fig. 5). Specifically, a Holometabola tree placing the Diptera, the Lepidoptera, or the Mecopterida (Diptera + Lepidoptera) in a basal position would result in 8, 20, or 22 additional inconsistent NIP distributions, respectively (table 1), and would resolve none of the 12 other inconsistent distributions (table 2). In addition, to our knowledge, corresponding hypotheses have never been proposed (see, e.g., Beutel and Pohl 2006). It appears more important to expand the genome-scale studies of holometabolous insects to include species of smaller groups, such as the Strepsiptera, Neuropterida, and Mecoptera. Even without genome projects for such species, it remains possible to use the apomorphic introns determined in this study to resolve the phylogenetic position of these groups by sequencing the corresponding gene fragments.
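
The notion of a NIP distribution being consistent or inconsistent with a candidate tree can be illustrated with a minimal sketch. This is not the procedure used in the study. It assumes a simplified criterion in which an apomorphic intron position counts as a synapomorphy only if the taxa carrying it form exactly one clade of the tree, with no secondary losses or parallel gains. The tree encoding (nested tuples of order names) and the function names are hypothetical and introduced here for illustration only.

```python
def clades(tree):
    """Collect the leaf set below every node of a tree given as nested tuples."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def is_synapomorphy(taxa_with_intron, tree):
    """True if the taxa sharing the derived intron position form a clade."""
    return frozenset(taxa_with_intron) in clades(tree)


# Two candidate topologies for the four orders discussed in the text.
hymenoptera_basal = ("Hymenoptera", ("Coleoptera", ("Lepidoptera", "Diptera")))
coleoptera_basal = ("Coleoptera", ("Hymenoptera", ("Lepidoptera", "Diptera")))

# An intron position gained in the common line of Diptera, Lepidoptera, and
# Coleoptera but absent from Hymenoptera, as described for fig. 5.
shared = {"Coleoptera", "Lepidoptera", "Diptera"}

print(is_synapomorphy(shared, hymenoptera_basal))
print(is_synapomorphy(shared, coleoptera_basal))
```

Under this simplified criterion, such a distribution fits the tree with Hymenoptera in a basal position and would be counted as inconsistent on the tree with Coleoptera in a basal position. Counting inconsistent distributions across alternative topologies in this way is the kind of parsimony comparison summarized in tables 1 and 2.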

Finally, although our search was aimed only at resolving the Coleoptera–Hymenoptera–Mecopterida trifurcation, we found 19 apomorphies on other branches of the tree used (fig. 5). This points to a general usability of near intron positions as novel phylogenetic markers in metazoans and, hopefully, in all eukaryotes.


    Supplementary Material
 
Supplementary materials 1–5 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Acknowledgements
 
We gratefully acknowledge the sequencing of the yet unpublished genomes of Danio rerio, Strongylocentrotus purpuratus, Schistosoma mansoni, Daphnia pulex, Acyrthosiphon pisum, Pediculus humanus, Nasonia vitripennis, and Tribolium castaneum. We thank 3 anonymous reviewers for their insightful comments that helped to improve the manuscript. This work was supported by the Deutsche Forschungsgemeinschaft (KR2065/2-1 to V.K. and STA850/6-1 to P.F.S.). P.F.S. holds external affiliations with the Santa Fe Institute, the Institute for Theoretical Chemistry of the University of Vienna, and the Fraunhofer Institute for Cell Therapy and Immunology.


    Footnotes
 
Barbara Ruth Holland, Associate Editor


    References
 

    Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 25:3389–3402.

    Berney C, Pawlowski J. A molecular time-scale for eukaryote evolution recalibrated with the continuous microfossil record. Proc Biol Sci (2006) 273:1867–1872.

    Beutel RG, Pohl H. Endopterygote systematics—where do we stand and what is the goal (Hexapoda, Arthropoda)? Syst Entomol (2006) 31:202–219.

    Bininda-Emonds ORP. transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences. BMC Bioinformatics (2005) 6:156.

    Boore JL. The use of genome-level characters for phylogenetic reconstruction. Trends Ecol Evol (2006) 21:439–446.

    Carlo T, Sierra R, Berget SM. A 5' splice site-proximal enhancer binds SF1 and activates exon bridging of a microexon. Mol Cell Biol (2000) 20:3988–3995.

    Carmel L, Wolf YI, Rogozin IB, Koonin EV. Three distinct modes of intron dynamics in the evolution of eukaryotes. Genome Res (2007) 17:1034–1044.

    Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD. Multiple sequence alignment with the clustal series of programs. Nucleic Acids Res (2003) 31:3497–3500.

    Cho S, Jin SW, Cohen A, Ellis RE. A phylogeny of Caenorhabditis reveals frequent loss of introns during nematode evolution. Genome Res (2004) 14:1207–1220.

    Douzery EJ, Snell EA, Bapteste E, Delsuc F, Philippe H. The timing of eukaryotic evolution: does a relaxed molecular clock reconcile proteins and fossils? Proc Natl Acad Sci USA (2004) 101:15386–15391.

    Gaunt MW, Miles MA. An insect molecular clock dates the origin of the insects and accords with palaeontological and biogeographic landmarks. Mol Biol Evol (2002) 19:748–761.

    Grimaldi D, Engel MS. The evolution of the insects (2005) New York: Cambridge University Press.

    Hiller M, Nikolajewa S, Huse K, Szafranski K, Rosenstiel P, Schuster S, Backofen R, Platzer M. TassDB: a database of alternative tandem splice sites. Nucleic Acids Res (2007) 35:D188–D192.

    Holland PW. More genes in vertebrates? J Struct Funct Genomics (2003) 3:75–84.

    Honeybee Genome Sequencing Consortium. Insights into social insects from the genome of the honeybee Apis mellifera. Nature (2006) 443:931–949.

    Hwang DY, Cohen JB. U1 small nuclear RNA-promoted exon selection requires a minimal distance between the position of U1 binding and the 3' splice site across the exon. Mol Cell Biol (1997) 17:7099–7107.

    Jeffroy O, Brinkmann H, Delsuc F, Philippe H. Phylogenomics: the beginning of incongruence? Trends Genet (2006) 22:225–231.

    Kent WJ. BLAT–the BLAST-like alignment tool. Genome Res (2002) 12:656–664.

    Krauss V, Pecyna M, Kurz K, Sass H. Phylogenetic mapping of intron positions: a case study of translation initiation factor eIF2γ. Mol Biol Evol (2005) 22:74–84.

    Kriegs JO, Churakov G, Kiefmann M, Jordan U, Brosius J, Schmitz J. Retroposed elements as archives for the evolutionary history of placental mammals. PLoS Biol (2006) 4:e91.

    Kristensen NP. Phylogeny of endopterygote insects, the most successful lineage of living organisms. Eur J Entomol (1999) 96:237–253.

    Krzywinski J, Besansky NJ. Frequent intron loss in the white gene: a cautionary tale for phylogeneticists. Mol Biol Evol (2002) 19:362–366.

    Kukalová-Peck J, Lawrence JF. Relationships among coleopteran suborders and major endoneopteran lineages: evidence from hind wing characters. Eur J Entomol (2004) 101:95–144.

    Longhorn SJ, Foster PG, Vogler AP. The nematode–arthropod clade revisited: phylogenomic analyses from ribosomal protein genes misled by shared evolutionary biases. Cladistics (2007) 23:130–144.

    Lynch M, Conery JS. The origins of genome complexity. Science (2003) 302:1401–1404.

    Lynch M, Kewalramani A. Messenger RNA surveillance and the evolutionary proliferation of introns. Mol Biol Evol (2003) 20:563–571.

    Lynch M, Richardson AO. The evolution of spliceosomal introns. Curr Opin Genet Dev (2002) 12:701–710.

    Maddison DR, Maddison WP. MacClade 4.08 (2005) Sunderland (MA): Sinauer Associates.

    Misof B, Niehuis O, Bischoff I, Rickert A, Erpenbeck D, Staniczek A. Towards an 18S phylogeny of hexapods: accounting for group-specific character covariance in optimized mixed nucleotide/doublet models. Zoology (Jena) (2007) 110:409–429.

    Morgenstern B, Prohaska SJ, Pöhler D, Stadler PF. Multiple sequence alignment with user-defined anchor points. Algorithms Mol Biol (2006) 1:6.

    Nguyen HD, Yoshihama M, Kenmochi N. New maximum likelihood estimators for eukaryotic intron evolution. PLoS Comput Biol (2005) 1:e79.

    Nishihara H, Hasegawa M, Okada N. Pegasoferae, an unexpected mammalian clade revealed by tracking ancient retroposon insertions. Proc Natl Acad Sci USA (2006) 103:9929–9934.

    Putnam NH, Srivastava M, Hellsten U. (19 co-authors). Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science (2007) 317:86–94.

    Raible F, Tessmar-Raible K, Osoegawa K. (12 co-authors). Vertebrate-type intron-rich genes in the marine annelid Platynereis dumerilii. Science (2005) 310:1325–1326.

    Rodriguez-Trelles F, Tarrio R, Ayala FJ. Origins and evolution of spliceosomal introns. Annu Rev Genet (2006) 40:47–76.

    Roger AJ, Hug LA. The origin and diversification of eukaryotes: problems with molecular phylogenetics and molecular clock estimation. Phil Trans R Soc B (2006) 361:1039–1054.

    Rogozin IB, Lyons-Weiler J, Koonin EV. Intron sliding in conserved gene families. Trends Genet (2000) 16:430–432.

    Rohdendorf BB, Rasnitsyn AP. Historical development of the class Insecta (1980) Moscow (Russia): Nauka Press.

    Rokas A, Holland PWH. Rare genomic changes as a tool for phylogenetics. Trends Ecol Evol (2000) 15:454–459.

    Ross HH. A textbook of entomology (1965) New York: Wiley.

    Roy SW, Fedorov A, Gilbert W. Large-scale comparison of intron positions in mammalian genes shows intron loss but no gain. Proc Natl Acad Sci USA (2003) 100:7158–7162.

    Roy SW, Gilbert W. Resolution of a deep animal divergence by the pattern of intron conservation. Proc Natl Acad Sci USA (2005) 102:4403–4408.

    Roy SW, Penny D. Smoke without fire: most reported cases of intron gain in nematodes instead reflect intron losses. Mol Biol Evol (2006) 23:2259–2262.

    Roy SW, Penny D. Patterns of intron loss and gain in plants: intron loss-dominated evolution and genome-wide comparison of O. sativa and A. thaliana. Mol Biol Evol (2007) 24:171–181.

    Saeys Y, Rouze P, Van de Peer Y. In search of the small ones: improved prediction of short exons in vertebrates, plants, fungi and protists. Bioinformatics (2007) 23:414–420.

    Savard J, Tautz D, Lercher MJ. Genome-wide acceleration of protein evolution in flies (Diptera). BMC Evol Biol (2006) 6:e7.

    Savard J, Tautz D, Richards S, Weinstock GM, Gibbs RA, Werren JH, Tettelin H, Lercher MJ. Phylogenomic analysis reveals bees and wasps (Hymenoptera) at the base of the radiation of Holometabolous insects. Genome Res (2006) 16:1334–1338.

    Sheth N, Roca X, Hastings ML, Roeder T, Krainer AR, Sachidanandam R. Comprehensive splice-site analysis using comparative genomics. Nucleic Acids Res (2006) 34:3955–3967.

    Sverdlov AV, Rogozin IB, Babenko VN, Koonin EV. Conservation versus parallel gains in intron evolution. Nucleic Acids Res (2005) 33:1741–1748.

    Tatusov RL, Koonin EV, Lipman DJ. A genomic perspective on protein families. Science (1997) 278:631–637.

    Venkatesh B, Ning Y, Brenner S. Late changes in spliceosomal introns define clades in vertebrate evolution. Proc Natl Acad Sci USA (1999) 96:10267–10271.

    Wada H, Kobayashi M, Sato R, Satoh N, Miyasaka H, Shirayama Y. Dynamic insertion-deletion of introns in deuterostome EF-1α genes. J Mol Evol (2002) 54:118–128.

    Wang L, Wang S, Li Y, Paradesi MSR, Brown SJ. BeetleBase: the model organism database for Tribolium castaneum. Nucleic Acids Res (2007) 35:D476–D479.

    Yandell M, Mungall CJ, Smith C, Prochnik S, Kaminker J, Hartzell G, Lewis S, Rubin GM. Large-scale trends in the evolution of gene structures within 11 animal genomes. PLoS Comput Biol (2006) 2:e15.

    Yang Z. Among-site rate variation and its impact on phylogenetic analyses. Trends Ecol Evol (1996) 11:367–372.

    Yoshihama M, Nakao A, Nguyen HD, Kenmochi N. Analysis of ribosomal protein gene structures: implications for intron evolution. PLoS Genet (2006) 2:e25.

    Zdobnov EM, Bork P. Quantification of insect genome divergence. Trends Genet (2007) 23:16–20.

    Zheng J, Rogozin IB, Koonin EV, Przytycka TM. Support for the coelomata clade of animals from a rigorous analysis of the pattern of intron conservation. Mol Biol Evol (2007) 24:2583–2592.

Accepted for publication January 7, 2008.

