MBE Advance Access originally published online on July 14, 2004
Molecular Biology and Evolution 2004 21(11):2022-2033; doi:10.1093/molbev/msh207
Research Article
Retroelement Dynamics and a Novel Type of Chordate Retrovirus-like Element in the Miniature Genome of the Tunicate Oikopleura dioica



* Biofuture Research Group, Physiologische Chemie I, Biozentrum, University of Würzburg, am Hubland, Würzburg, Germany;
Institute for Molecular Genetics, Berlin, Germany; and
Sars Centre for Marine Molecular Biology, Bergen High Technology Centre, Bergen, Norway
E-mail: volff{at}biozentrum.uni-wuerzburg.de.
| Abstract |
|---|
Retrotransposable elements have played an important role in shaping eukaryotic DNA, and their activity and turnover rate directly influence the size of genomes. With approximately 15,000 genes within 65-75 megabases, the marine tunicate Oikopleura dioica, a nonvertebrate chordate, has the smallest and most compact genome ever found in animals. Consistent with a massive elimination of retroelements, only one apparently novel clade of non-long terminal repeat (non-LTR) retrotransposons was detected within 41 megabases of nonredundant genomic sequences. In contrast, at least six clades of non-LTR elements were identified in the less compact genome of the tunicate Ciona intestinalis. Unexpectedly, Ty3/gypsy-related Tor LTR retrotransposons presented an astonishing level of diversity in O. dioica. They were generally poorly or apparently not corrupted, indicating recent activity. Both Tor3 and Tor4b families bore an envelope-like open reading frame, suggesting possible horizontal acquisition through infection. The Tor4b envelope-like gene might have been obtained from a paramyxovirus (RNA virus). Tor3 and Tor4b are phylogenetically clearly distinct from vertebrate retroviruses (Retroviridae) and are more reminiscent of certain insect and plant sequences. Tor elements potentially represent a so far unknown, ancient type of infectious retroelement in chordates. Their distribution and transmission dynamics in tunicates and other chordates deserve further study.
Key Words: Oikopleura; retroelement; retrovirus; compact genome; reverse transcriptase; envelope
| Introduction |
|---|
An important part of eukaryotic genomes has been generated by the copying of RNA molecules into DNA through a process called reverse transcription (Brosius 1999, 2003; Boeke 2003). For example, at least 40% of the human genome consists of retroposed sequences, and even more extreme situations have been described in plants and other organisms (Kazazian and Moran 1998; SanMiguel et al. 1998; International Human Genome Sequencing Consortium 2001; Liu et al. 2003). Reverse transcription is generally catalyzed in vivo by reverse transcriptases encoded by autonomous endogenous retrotransposons. A simple classification, based on structural and mechanistic features and supported by the molecular phylogeny of reverse transcriptase sequences, makes a distinction between autonomous retroelements with long terminal repeats (LTR retrotransposons sensu lato, including all known retroviruses), non-LTR retrotransposons (also called LINEs), and Penelope-like elements (Volff, Hornung, and Schartl 2001; Eickbush and Malik 2002; Arkhipova et al. 2003). Short interspersed nuclear elements (SINEs) and other categories of nonautonomous sequences can be mobilized in trans, sometimes very efficiently, by the retrotransposition machinery of reverse transcriptase-encoding retroelements (Esnault, Maestre, and Heidmann 2000; Kajikawa and Okada 2002; Dewannieux, Esnault, and Heidmann 2003). Some retrotransposons can also retrotranspose 3' flanking genomic sequences (Moran, DeBerardinis, and Kazazian 1999; Pickeral et al. 2000).
The impact of retrotransposons on genome size depends on both their rate of retrotransposition and their frequency of elimination (Petrov 2001; Kidwell 2002). Retrotransposition efficiency is determined by the number of retrotransposition-competent elements in the genome and by their level of activity. Genomes and transposable elements have both developed strategies to control transposition (Ketting et al. 1999; Jensen, Gassama, and Heidmann 1999; Tabara et al. 1999; Bestor 2003; Sundararajan, Lee, and Garfinkel 2003), which has allowed their cosurvival over extremely long periods of evolution (Burke et al. 1998). Elimination of retrotransposable elements can occur either through different molecular mechanisms or by natural selection against individual insertions, against genomic rearrangements mediated by ectopic homologous recombination between nonallelic copies, and/or against retrotransposition itself (Eickbush and Furano 2002 and references therein). Finally, having a larger or a smaller genome might be adaptive depending on the circumstance (Petrov 2001), natural selection acting here on the balance between retrotransposition and elimination.
Distinct organisms can respond to retrotransposons in different ways, possibly as the result of differential evolutionary constraints (Petrov 2001; Eickbush and Furano 2002). Species with a compacted genome are of particular interest to explore the sequences, molecular mechanisms, and selective pressures involved in genome size evolution. For example, the genomes of the smooth pufferfishes Takifugu rubripes (Fugu) and Tetraodon nigroviridis, which are about eight times smaller than the human genome, are the most compact genomes described to date in vertebrates (Brenner et al. 1993; Crollius et al. 2000; Aparicio et al. 2002). Smooth pufferfish genomes contain a low percentage of repetitive DNA but, surprisingly, a high diversity of autonomous retrotransposable elements, some of them having been recently active (Crollius et al. 2000; Aparicio et al. 2002; Bouneau et al. 2003; Neafsey and Palumbi 2003; Volff et al. 2003). We now analyze retrotransposon evolutionary behavior in the genome of the tunicate Oikopleura dioica, which represents an even more extreme context of compaction.
Chordates are divided into urochordates (tunicates), cephalochordates, and vertebrates. Urochordates such as Oikopleura dioica (larvacean) and the sea squirt Ciona intestinalis (ascidian) form the sister group of cephalochordates and vertebrates and are therefore instrumental for comparisons of chordates with nonchordates. Important for this study, O. dioica has the smallest animal genome reported so far (65-75 megabases [Mb]), in which partial sequencing has revealed a strong compaction of all noncoding regions (Seo et al. 2001). Gene density in O. dioica is approximately two to three times higher than in Ciona intestinalis (with a similar set of genes; Dehal et al. 2002) and Takifugu rubripes (Aparicio et al. 2002) and about 20 times higher than in the human genome (International Human Genome Sequencing Consortium 2001). Whereas most of the genome of C. intestinalis has been recently sequenced and assembled (Dehal et al. 2002), we have generated 41 Mb of nonredundant sequence data from O. dioica through whole-genome shotgun sequencing. Alignments of this data set with sequences of twenty BAC inserts and 2,000 expressed sequence tags showed a very dense and uniform coverage of the genome with numerous small gaps (data not shown).
Here we report a detailed analysis of the reverse transcriptase retrotransposon complement with respect to the degree of genome compaction in chordates, using genomic information from Ciona, Oikopleura, and vertebrates. In addition, this study was expected to provide new hints concerning the evolution of retrotransposable elements near the transitions between chordates and nonchordates, on the one hand, and between invertebrate chordates and vertebrates on the other hand. A particularly interesting question concerns the possible presence and nature of (endogenous) retroviruses in Oikopleura, since vertebrate retroviruses are phylogenetically clearly distinct from those found in insects, nematodes, and plants.
| Materials and Methods |
|---|
Genome Sequencing and Assembling
Briefly, sperm DNA was sonicated, size-fractionated around 2 kilobases, and ligated into the pUC19 vector. Approximately 180,000 ends of plasmid inserts were sequenced. The average read length taken into account was rather short (422 bp) to avoid low-quality sequences. Reads displaying 50 bp identity were assembled, leading to a data set of 44,797 contigs representing 40.983 megabases of nonredundant sequences. Alignments of 800 nonredundant expressed sequence tags (ESTs) with this shotgun data set showed an average coverage of 65%, with 83% of ESTs matched on more than one-quarter of their length and 38% covered on at least three-quarters of it. Sequence data have been submitted to GenBank/EMBL under accession numbers AY634216-AY634229.
Sequence Analysis
Sequence analysis was performed using the GCG Wisconsin package (Version 10.3, Accelrys Inc., San Diego, Calif.). Multiple sequence alignments were generated using PileUp from the GCG Wisconsin package and ClustalX (Thompson et al. 1997). Phylogenetic analyses were performed on amino acid alignments using maximum likelihood based on Bayesian inference with MrBayes v3.0b4 (Ronquist and Huelsenbeck 2003), using the Neighbor-Joining method (Saitou and Nei 1987; 1,000 pseudosamples) as implemented in PAUP* (Swofford 1998), and using quartet-based maximum likelihood with Tree-Puzzle (Schmidt et al. 2002). No potential artifact due to long branch attraction was detected. Blast analysis was essentially performed using sequence databases accessible from the National Center for Biotechnology Information (NCBI) server (www.ncbi.nlm.nih.gov/BLAST/).
Other Web sites used are: the server for the prediction of signal peptides (http://bioinformatics.leeds.ac.uk/prot_analysis/Signal.html); the Takifugu rubripes Blast server at the U.K. Human Genome Mapping Project Resource Centre, Hinxton, Cambridge (http://fugu.hgmp.mrc.ac.uk/blast); the protein and nucleic acid sequence motif search server of the GenomeNet Database Service at the Bioinformatics Center, Institute for Chemical Research, Kyoto University (http://motif.genome.ad.jp); the Pfam database of protein families at the Washington University in St. Louis (http://pfam.wustl.edu/index.html); the Ciona intestinalis Blast server at NCBI (www.ncbi.nlm.nih.gov/BLAST/Genome/cionaWGSBlast.html); the conserved protein domain search server at NCBI (www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi); a server for prediction of coiled-coils in proteins (www.russell.embl.de/cgi-bin/coils-svr.pl); a server for the prediction of transmembrane domains in proteins (www.sbc.su.se/~miklos/DAS/); the simple modular architecture research tool server at the EMBL (Heidelberg; http://smart.embl-heidelberg.de).
| Results |
|---|
Paucity of Non-LTR Retrotransposons in the Genome of O. dioica
We searched for the presence of reverse transcriptase retrotransposons in different sequence databases including those of both urochordate genomes using the TBlastN program (protein queries against six-frame translations of a nucleotide database; Altschul et al. 1990). At least 14 major clades of non-LTR retrotransposons have been described so far in eukaryotes (Malik, Burke, and Eickbush 1999; Lovsin, Gubensek, and Kordis 2001; Eickbush and Malik 2002). Clades contain elements that are grouped together with ample phylogenetic support and date back to the Precambrian era, i.e., clades are at least 600 Myr old (Malik, Burke, and Eickbush 1999; Volff et al. 2003). Strikingly, none of the known clades of non-LTR retrotransposons could be identified in O. dioica, whereas at least six of them were clearly detected in C. intestinalis (table 1 and fig. 1; see also Simmen and Bird 2000; Permanyer, Gonzalez-Duarte, and Albalat 2003; Kojima and Fujiwara 2004 for C. intestinalis retrotransposons). These C. intestinalis clades included the LOA and R2 clades, which thus far have not been found outside arthropods. As observed in arthropods, R2 retrotransposons are also integrated in 28S rDNA in C. intestinalis (Kojima and Fujiwara 2004); 28S ribosomal RNA genes were identified in the O. dioica database, but they were not associated with any R2 sequence. The four other clades (NeSL, L1, I, and L2) were present in nonchordates, C. intestinalis, and vertebrates but were not detected in Oikopleura. These observations suggested extinction, or at least strong copy number reduction, of major clades of non-LTR retrotransposons in the Oikopleura lineage.
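To illustrate what a TBlastN-style search compares a protein query against, the six-frame translation of a nucleotide sequence can be sketched as follows. This is only a minimal sketch and not the NCBI implementation: the function names (`reverse_complement`, `translate`, `six_frame_translations`) are hypothetical, the standard genetic code is assumed, and the example sequence is invented.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumption: standard nuclear code).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate in frame 1; unknown codons become 'X', stops become '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return the three forward (+1..+3) and three reverse (-1..-3) frames."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Invented toy sequence, for illustration only.
frames = six_frame_translations("ATGGCCTAA")
```

A protein query such as a reverse transcriptase domain would then be aligned against each of the six conceptual translations; stop codons (`*`) within a hit are one indication of a corrupted open reading frame.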
Odin, a Novel Family of Non-LTR Retrotransposons in the Genome of O. dioica
Nevertheless, reverse transcriptase sequences presenting preferential similarities to sequences from non-LTR retrotransposons were detected in the O. dioica database (fig. 2). These sequences, forming a unique family containing divergent elements, were named Odin (for Oikopleura dioica non-LTR; figs. 1-3).
About 80 Odin reverse transcriptase sequences were detected by TBlastN analysis in the 41 Mb genome assembly of O. dioica using a 335 amino acid reverse transcriptase sequence as a query (we cannot exclude that some nonoverlapping short hits might belong to the same partially sequenced element). About 30% of Odin endonuclease and reverse transcriptase-encoding sequences were corrupted by stop or frameshift mutations, suggesting that an important proportion of Odin elements is not functional. The degree of nucleotide sequence identity between Odin elements ranged from less than 60% up to 90%. The low nucleotide identity between certain copies indicated that Odin is an ancient family of non-LTR retrotransposons in the genome of O. dioica.
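The identity values quoted for pairs of copies can be understood as the fraction of matching positions in a pairwise alignment. The following is a minimal sketch under stated assumptions: the sequences are already aligned to equal length, columns containing a gap (`-`) in either sequence are ignored, and the function name `percent_identity` and the example sequences are hypothetical. It is not necessarily how the GCG tools used in this study define identity.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over ungapped columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Invented toy alignment: 8 of 10 positions match.
identity = percent_identity("ACGTACGTAC", "ACGTTCGTAA")
```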
To improve phylogenetic resolution, the apurinic-apyrimidinic endonuclease and reverse transcriptase amino acid sequences were concatenated and submitted to different types of analysis (fig. 3). This allowed visualizing phylogenetic relationships between some clades that were not detected using reverse transcriptase sequences alone (fig. 1). All methods used indicated that Odin was more closely related to evolutionarily younger clades of non-LTR retrotransposons (Jockey, I, LOA, Tad1, R1, CR1, Rex1, and L2) than to more divergent clades (L1 and Rte; Malik, Burke, and Eickbush 1999; Eickbush and Malik 2002). This confirmed the results of the Blast analysis of public databases using Odin sequences as queries, which were always more related to evolutionarily younger clades than to the L1 and Rte clades (data not shown). On the other hand, none of the different methods used revealed any preferential phylogenetic relationship of Odin with any particular clade of non-LTR retrotransposons (fig. 3). Even if we cannot exclude that Odin is related to a known clade but that our methodology failed to detect this relationship, our results suggest that Odin represents the first known member of a new clade of non-LTR retrotransposons.
Diversity and Probable Recent Activity of Ty3/gypsy LTR Retrotransposons in the Genome of O. dioica
An unexpected variety of LTR retrotransposons from the Ty3/gypsy group was identified in the genome of O. dioica. They were classified into four major families named Tor-1 to Tor-4 (for Ty3/gypsy Oikopleura retrotransposons) according to their phylogenetic position (figs. 4 and 5). Tor elements appeared very diverse and fairly divergent from retrotransposons from other organisms (fig. 5). In particular, none of the four clades of Ty3/gypsy elements identified in the tunicate C. intestinalis (Simmen and Bird 2000; Goodwin and Poulter 2002; this study) was found to be closely related to the Tor retrotransposons of O. dioica.
Tor retrotransposons generally presented the classical structural features found in most other Ty3/gypsy retrotransposons (Levin 2002), with two overlapping open reading frames called gag (encoding a structural nucleic acid-binding protein with one or several putative zinc fingers) and pol (encoding protease, reverse transcriptase, RnaseH, and integrase; fig. 4). In contrast to other Tor families, no frameshift was observed in Tor3.1 retrotransposons between gag and pol, which were separated by a stop codon at the same position in divergent elements (less than 60% nucleotide identity). Such a structure has already been reported for the Moloney murine leukemia virus, in which about 5% of ribosomes translating the gag gene read through the UAG terminator and translate the in-frame pol gene to produce the gag-pol fusion polyprotein (Wills, Gesteland, and Atkins 1991 and references therein).
The structure and number of zinc fingers were variable in the Gag proteins of Tor retrotransposons (fig. 4). A classical CCHC motif (CX2CX4HX4C; Pfam accession PF00098; Smart accession SM00343) usually found in Gag sequences from retrotransposons and vertebrate retroviruses was detected in Tor1, Tor4, and some Tor3 retrotransposons, but the number of motifs was variable depending on the type of element (one, two, and three motifs in Tor1, Tor4, and Tor3, respectively). Interestingly, this generally conserved motif was not present in the Gag protein of Tor2 and of certain Tor3 retrotransposons. Instead, a CCCH putative zinc finger domain was identified in Tor2 (CX5/7CX5/7CX3H) and in Tor3 (CX8CX4/5CX3H). Such motifs are reminiscent of the CCCH zinc fingers (CX8CX5CX3H and similar motifs) that have been found in diverse proteins, some of them binding to RNA (Pfam accession PF00642; Smart accession SM00356; Blackshear 2002). CCCH motifs are also present in the matrix protein 2 of several paramyxoviruses from the Pneumovirinae subfamily (ssRNA negative-strand viruses) and in proteins encoded by the fowlpox and Chilo iridescent viruses (dsDNA viruses, no RNA stage). To the best of our knowledge, this is the first time that CCCH motifs have been reported in (putative) proteins encoded by retrotransposable elements.
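The spacing patterns given above translate directly into regular expressions, which is one simple way to scan a protein sequence for such motifs. The sketch below is illustrative only and is not the Pfam or SMART profile search used in this study. It assumes that X stands for any residue, that "X5/7" means five or seven residues, and that "X4/5" means four or five residues; the names `MOTIFS` and `find_motifs` and the test sequences are hypothetical.

```python
import re

# Spacing patterns taken from the text; '.' stands for any residue (X).
MOTIFS = {
    # Classical CCHC zinc finger: CX2CX4HX4C
    "CCHC": re.compile(r"C.{2}C.{4}H.{4}C"),
    # Tor2 CCCH pattern: CX5/7CX5/7CX3H (assumed: spacers of 5 or 7 residues)
    "CCCH_Tor2": re.compile(r"C(?:.{5}|.{7})C(?:.{5}|.{7})C.{3}H"),
    # Tor3 CCCH pattern: CX8CX4/5CX3H (assumed: spacer of 4 or 5 residues)
    "CCCH_Tor3": re.compile(r"C.{8}C.{4,5}C.{3}H"),
}


def find_motifs(protein):
    """Return (motif name, start index, matched segment) for each hit."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(protein):
            hits.append((name, match.start(), match.group()))
    return hits


# Synthetic sequences built only to exercise the patterns.
cchc_example = "GG" + "C" + "AA" + "C" + "AAAA" + "H" + "AAAA" + "C" + "GG"
ccch_example = "C" + "A" * 8 + "C" + "A" * 5 + "C" + "A" * 3 + "H"
cchc_hits = find_motifs(cchc_example)
ccch_hits = find_motifs(ccch_example)
```

Such pattern matching only flags candidate motifs; a profile-based search against Pfam or SMART, as used here, also takes the surrounding residues into account.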
About 180 Tor reverse transcriptase sequences were detected in the 41 Mb genome assembly (about 50 sequences for Tor2, Tor3, and Tor4, but only two for Tor1; the remaining sequences were too short to be classified unambiguously). The degree of nucleotide identity within the Tor2, Tor3, and Tor4 families ranged from less than 60% up to 98-99% (the two Tor1 elements showed 70% nucleotide identity). On average, open reading frame-corrupting mutations were observed every 7.5 kb, 2.8 kb, and 16 kb for Tor2, Tor3, and Tor4, respectively. In addition, elements with apparently intact gag and pol open reading frames were identified for all four Tor families. These observations, added to the high degree of nucleotide identity between certain copies, strongly suggested that some Tor elements were recently or even are still active. The high level of sequence divergence that was observed even within the same family demonstrated the ancient origin of the Tor elements.
A Novel Diverse Group of Putative Chordate Retroviruses in the Genome of O. dioica
Interestingly, multiple Tor elements phylogenetically belonging to the Ty3/gypsy group contained a third open reading frame (orf3) in addition to the gag and pol regions (figs. 4-7). A third open reading frame has already been identified in several plant and insect Ty3/gypsy retrotransposons including Gypsy (also called errantiviruses; Boeke et al. 1999), in some plant Ty1/copia elements, and in nematode nonchordate BEL retrotransposons (Lerat and Capy 1999; Malik, Henikoff, and Eickbush 2000 and references therein). It was found to encode a transmembrane protein with structural similarities to the envelope (Env) protein of vertebrate retroviruses. Env proteins are required for virus infection and transmission, and some insect errantiviruses can indeed form infective viral-like particles (Kim et al. 1994; Song et al. 1994).
Even though orf3 putative products from Tor elements were very divergent, they displayed typical features of envelope proteins such as predicted transmembrane domains, coiled-coil-like motifs, and conserved cysteines (figs. 4-7).
Possible Acquisition of the Tor4b Envelope-like Gene from a Paramyxovirus
To trace back the origin of the env-like gene of Tor3 and Tor4b elements, sequence databases were analyzed using different search methods (see Malik, Henikoff, and Eickbush 2000). No protein strongly homologous to the Env-like sequences of Tor elements could be identified. Nevertheless, fusion glycoprotein sequences from paramyxoviruses (negative-sense genome single-stranded RNA viruses), which mediate membrane fusion, were systematically present among the five best hits using a Tor4b Env-like protein as a query. For instance, 50% similarity (26% identity) over 109 amino acids was detected with the fusion protein of the Human parainfluenza virus 1 (E = 0.31). Even if the expected value E was relatively modest, it was more than 1,000 times more significant than those obtained in some comparisons between Env-like proteins within the Tor3 group (data not shown), which can possibly be explained by the rapid evolution of the env gene in retroelements. When the human parainfluenza virus 1 fusion protein was used as a query against our O. dioica database, a Tor4b Env-like protein gave the best hit (E = 0.01). Sequence comparison indicated conserved positions for the putative canonical fusion tripeptide F-X-G (Misseri et al. 2003 and references therein), for cysteine residues, for the predicted transmembrane domain, and for one coiled-coil (the Tor4b Env-like protein has a second predicted coiled-coil; fig. 7). Hence, these results suggested that the Tor4b family might have acquired its env-like gene from a paramyxovirus.
Other Reverse Transcriptase Retrotransposons
Among other groups of LTR retrotransposons, only DIRS1-like elements were found in Oikopleura. Sequences available in the database only allowed the reconstruction of a 2.4 kb open reading frame encoding the reverse transcriptase and RnaseH domains, but neither the gag open reading frame nor the LTRs could be identified. In the only two elements for which the sequence could be extended in the 3' direction, only one additional conserved open reading frame could be found about 2 kb downstream from the reverse transcriptase/RnaseH-encoding region. We could not establish without ambiguity that the 420 amino acid conceptual translation product of this open reading frame corresponded to the lambda-like recombinase found in other DIRS1-related elements (Goodwin and Poulter 2002), since some very conserved residues were missing (data not shown). About 70 reverse transcriptase sequences from DIRS1-related elements were detected in the 41 Mb genome assembly, with degrees of nucleotide identity ranging from less than 60% up to 90%. DIRS1-related elements were apparently absent from the sea squirt genome and would therefore represent the only type of autonomous retroelement lost in C. intestinalis but present in O. dioica. Our results were also consistent with a possible extinction of Ty1/copia and BEL retrotransposons, which were not detected in Oikopleura (table 1). No "vertebrate" retroviruses (Retroviridae) were found in either tunicate, confirming that they might indeed be specific for the vertebrate lineage.
Penelope-like retroelements were detected in the genome of both O. dioica and C. intestinalis. In O. dioica these elements formed a diverse monophyletic group presenting no obvious preferential relationship with any particular Penelope-like retrotransposon from other organisms (data not shown). Complete elements with an apparently intact unique open reading frame encoding a reverse transcriptase and a C-terminal YIG endonuclease (Volff, Hornung, and Schartl 2001; Arkhipova et al. 2003) were identified, suggesting recent activity. This was confirmed by the presence of very similar elements presenting 98-99% nucleotide identity. However, the high divergence between some copies (less than 60% nucleotide identity) showed the diversity and ancient origin of the group of Penelope-like retrotransposons found in O. dioica. About 60 reverse transcriptase sequences were detected in the 41 Mb genome assembly. We did not determine if the 5' untranslated region of Penelope-like elements of O. dioica contains an intron, as observed in flies and bdelloid rotifers (Arkhipova et al. 2003).
| Discussion |
|---|
In this study we have analyzed the autonomous retrotransposons of O. dioica, a chordate with the smallest and most compact genome reported to date in the animal kingdom. Even though our shotgun sequencing data set densely and rather evenly covers the genome of Oikopleura, we cannot exclude that some elements representing conserved retrotransposon families are present but were not detected in this study, particularly because they are difficult to clone. In particular, heterochromatic regions rich in repetitive sequences might be more refractory to shotgun sequencing and might be underrepresented in our data set. Genome size estimation based on the degree of coverage of expressed sequence tags by the shotgun data set indicated a total size of 65 Mb for "clonable" DNA, a value very close to the 72 Mb obtained for the complete genome using flow cytometry (Seo et al. 2001). This suggested that "nonclonable" heterochromatic regions, if any, do not represent an important fraction of the genome of Oikopleura.
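One simple way to see how a coverage-based size estimate of this kind can arise is to divide the length of the nonredundant assembly by the fraction of known sequence it covers. The sketch below is an assumption about the general approach, not the calculation actually performed in this study: it plugs in the 40.983 Mb assembly and the 65% average EST coverage reported in Materials and Methods, and the function name `estimate_genome_size_mb` is hypothetical.

```python
def estimate_genome_size_mb(assembled_mb, coverage_fraction):
    """Naive estimate: assembled length divided by the fraction covered."""
    if not 0 < coverage_fraction <= 1:
        raise ValueError("coverage_fraction must be in (0, 1]")
    return assembled_mb / coverage_fraction


# Values taken from the text: ~41 Mb of nonredundant sequence,
# 65% average coverage of expressed sequence tags.
estimate = estimate_genome_size_mb(40.983, 0.65)
print(f"Estimated clonable genome size: {estimate:.1f} Mb")
```

This naive ratio gives roughly 63 Mb, in the same range as the 65 Mb reported above; the small difference presumably reflects details of the original calculation that are not given in the text.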
Taken together, our observations strongly support a massive elimination of major groups of autonomous retrotransposable elements in the O. dioica lineage. This is particularly true for non-LTR retrotransposons, for which only one possibly novel clade of elements with apurinic/apyrimidinic endonuclease could be detected. In contrast, the genome of C. intestinalis contains at least six clades of non-LTR retrotransposons (Simmen and Bird 2000; Permanyer, Gonzalez-Duarte, and Albalat 2003; Kojima and Fujiwara 2004; this study).
Ty1/copia and BEL LTR retrotransposons might also have been lost. This "purification" possibly coincided with an intense process of genome compaction. Hence, our analysis of non-LTR retrotransposons in O. dioica supports an association between genome compaction and a drastic reduction of both copy number and diversity of some autonomous retroelements. This situation differs from that observed in the (less) compact genome of the pufferfish Takifugu rubripes, where a relatively high level of retrotransposon diversity has been maintained despite a strong reduction of repetitive DNA content (Crollius et al. 2000; Aparicio et al. 2002; Bouneau et al. 2003; Neafsey and Palumbi 2003; Volff et al. 2003). Whether genome compaction in O. dioica is related to the exceptionally short life cycle of this organism (generation time of 4-5 days at 20°C, 1-2 days in the tropics) is a relevant but unsolved question, relevant because a correlation between genome size and longevity has been proposed in other animals (Monaghan and Metcalfe 2000).
Among LTR retrotransposons, DIRS1 and four diverse families of Ty3/gypsy-related elements (Tor1-Tor4) were detected in O. dioica, but neither Ty1/copia nor BEL retrotransposons could be found. We estimated that about 400 reverse transcriptase (pseudo)genes are present in the 41 Mb genome data set, suggesting a total number of autonomous retrotransposons around 600. This number is similar to that observed in the slime mold Dictyostelium discoideum and the nematode Caenorhabditis elegans but higher than that observed in the three times larger genome of the fly Drosophila melanogaster (Kidwell 2002). Oikopleura apparently has a much lower number of retrotransposons than the pufferfish Takifugu rubripes and than other vertebrates with a less compact genome. Retroelements without reverse transcriptase region including very truncated non-LTR retrotransposons, SINE elements, or nonautonomous LTR retrotransposons have not been analyzed here. The genome of O. dioica is not devoid of DNA transposons (class II), since four copies of a Tc1/mariner element could also be identified (data not shown).
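The extrapolation from the sampled data set to the whole genome can be retraced with simple arithmetic. The sketch below sums the approximate reverse transcriptase counts reported in Results (about 80 Odin, 180 Tor, 70 DIRS1-related, and 60 Penelope-like sequences) and scales them linearly to the 65 Mb and 72 Mb genome size estimates. The linear scaling, which assumes a uniform density of elements across the genome, and the names `RT_COUNTS` and `extrapolate` are assumptions made for illustration; the exact calculation behind the figure of about 600 is not given in the text.

```python
# Approximate reverse transcriptase counts in the 41 Mb data set (from Results).
RT_COUNTS = {
    "Odin": 80,
    "Tor": 180,
    "DIRS1-related": 70,
    "Penelope-like": 60,
}


def extrapolate(count, sampled_mb, genome_mb):
    """Scale a count linearly, assuming uniform density across the genome."""
    return count * genome_mb / sampled_mb


total_in_sample = sum(RT_COUNTS.values())
low_estimate = extrapolate(total_in_sample, 41, 65)   # clonable DNA estimate
high_estimate = extrapolate(total_in_sample, 41, 72)  # flow cytometry estimate

print(f"{total_in_sample} sequences in 41 Mb")
print(f"about {low_estimate:.0f} to {high_estimate:.0f} genome-wide")
```

The sum of the per-group counts (390) matches the stated figure of about 400, and the linear extrapolation lands in the range of roughly 620 to 680 copies, the same order as the estimate of around 600.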
A major question to be answered concerns the localization of retrotransposons in the compact genome of Oikopleura. In the pufferfish Tetraodon nigroviridis, retroelements are accumulating in specific heterochromatic regions of the genome together with other types of repeats (Dasilva et al. 2002; Bouneau et al. 2003). We found that an incomplete copy of a Tor3 element is integrated 1 kb away from a stretch of 19 tandem telomeric repeats, and several Tor4a insertions are located within tandem clusters of a 58 bp repeat (data not shown). Additional studies are required to determine if repetitive DNA is compartmentalized in the compact genome of O. dioica.
Common characteristics of Ty3/gypsy-related Tor families were a low level of sequence corruption, suggesting a rather recent retrotranspositional activity, and a modest copy number. In the case of a vertical transmission (inheritance), this would indicate a rapid turnover of Tor retrotransposons, i.e., frequent retrotransposition and subsequent elimination maintaining a low number of functional copies. However, the identification of an envelope-like open reading frame in Tor3 and Tor4b strongly suggests that these elements might have been introduced more recently into the genome of Oikopleura through infectious horizontal transfer. Nevertheless, envelope-mediated infectivity remains to be demonstrated for the Tor elements as well as for the great majority of invertebrate retrovirus-like sequences; infection has been shown to date only for the Gypsy element of Drosophila (Kim et al. 1994; Song et al. 1994). Furthermore, cases of horizontal transmission of LTR retrotransposons can also occur in the apparent absence of any detectable env-like gene, as demonstrated for the transfer of a Copia element between two species of Drosophila (Jordan, Matyunina, and McDonald 1999). Hence, we cannot exclude that other Tor elements without any obvious orf3 might also have been acquired more recently by the genome of O. dioica.
According to the phylogenetic analysis of their reverse transcriptase and integrase domain, Tor3 and Tor4b belong to the group of Ty3/gypsy-related retrotransposons. From the phylogenetic point of view, they are therefore clearly different from vertebrate retroviruses (Retroviridae, represented by the Human Immunodeficiency Virus 1 and the Rous Sarcoma Virus in fig. 5; Varmus and Brown 1989; Eickbush and Malik 2002). Ty3/gypsy-related Env-encoding retrovirus-like elements have been described in plants (Peterson-Burch et al. 2000; Vicient, Kalendar, and Schulman 2001) and insects (Kim et al. 1994; Pelisson et al. 1994; Song et al. 1994; Leblanc et al. 1997; Pantazidis, Labrador, and Fontdevila 1999), but we could not find any evidence for a preferential phylogenetic relationship between these elements and those identified in O. dioica.
Invertebrate retroviruses have probably obtained their env genes independently from distinct origins (Malik, Henikoff, and Eickbush 2000). Different nematode BEL-like retroviruses might have acquired their env gene from phleboviruses (single ambisense-stranded RNA viruses) and Herpesviridae (double-stranded DNA viruses) and insect Gypsy-related errantiviruses from baculoviruses (double-stranded DNA viruses). Our results suggest that a paramyxovirus (negative-sense genome single-stranded RNA virus) provided the Tor4b env gene. Interestingly, restricted sequence similarities have also been recently reported between the Env protein of Gypsy and paramyxovirus fusion proteins (Misseri et al. 2003). There is no indication about the origin of the env gene in the Tor3 family. Either env has been acquired in a common ancestor of Tor3 and Tor4 and subsequently lost in the Tor4a subfamily, or env-like genes have been gained independently in Tor3 and Tor4b. Another open question concerns the origin of the atypical CCCH zinc finger domains in Tor2 and certain Tor3 elements. These motifs might have evolved from a preexisting canonical CCHC domain. Alternatively, they might have been acquired from other coding sequences, for example from paramyxovirus matrix genes.
The function of the Env-like sequences in Oikopleura Tor elements remains to be determined. Nevertheless, Tor3 and Tor4b potentially represent, besides vertebrate retroviruses, a second and so far unidentified family of infectious retroelements in the chordate phylum. The extreme degree of divergence between Tor retrovirus-like elements suggests that they belong to an evolutionarily ancient group of infectious retroelements. They might therefore not be restricted to Oikopleura but instead be much more widespread. In the wake of SARS, experts are suggesting that surveillance for emerging diseases should extend to sampling and characterizing the entire panoply of viruses that circulate not only in people but also in animals (see Nature 424:113, 2003). Future prospects include the study of the distribution, transmission dynamics, and infection mechanisms of the Tor retrovirus-like elements in tunicates and other chordates, as well as the assessment of their potential as tools for transgenesis.
| Acknowledgements |
|---|
We are grateful to Manfred Schartl for discussions and encouragement. J.N.V. is supported by the Biofuture program of the German Bundesministerium für Bildung und Forschung (BMBF).
| Footnotes |
|---|
William Jeffery, Associate Editor
| References |
|---|
Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410.
[Anonymous]. 2003. We have been warned. Nature 424:113.
Aparicio, S., J. Chapman, E. Stupka et al. (41 co-authors). 2002. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297:1301–1310.
Arkhipova, I. R., K. I. Pyatkov, M. Meselson, and M. B. Evgen'ev. 2003. Retroelements containing introns in diverse invertebrate taxa. Nat. Genet. 33:123–124.
Bestor, T. H. 2003. Cytosine methylation mediates sexual conflict. Trends Genet. 19:185–190.
Blackshear, P. J. 2002. Tristetraprolin and other CCCH tandem zinc-finger proteins in the regulation of mRNA turnover. Biochem. Soc. Trans. 30:945–952.
Boeke, J. D. 2003. The unusual phylogenetic distribution of retrotransposons: a hypothesis. Genome Res. 13:1975–1983.
Boeke, J. D., T. H. Eickbush, S. B. Sandmeyer, and D. F. Voytas. 1999. Metaviridae. Pp. 349–357 in F. A. Murphy, ed. Virus taxonomy: ICTV VIIth report. Springer-Verlag, New York.
Bouneau, L., C. Fischer, C. Ozouf-Costaz, A. Froschauer, O. Jaillon, J.-P. Coutanceau, C. Körting, J. Weissenbach, A. Bernot, and J.-N. Volff. 2003. An active non-LTR retrotransposon with tandem structure in the compact genome of the pufferfish Tetraodon nigroviridis. Genome Res. 13:1686–1695.
Brenner, S., G. Elgar, R. Sandford, A. Macrae, B. Venkatesh, and S. Aparicio. 1993. Characterization of the pufferfish (Fugu) genome as a compact model vertebrate genome. Nature 366:265–268.
Brosius, J. 1999. Genomes were forged by massive bombardments with retroelements and retrosequences. Genetica 107:209–238.
Brosius, J. 2003. The contribution of RNAs and retroposition to evolutionary novelties. Genetica 118:99–116.
Burke, W. D., H. S. Malik, W. C. Lathe 3rd, and T. H. Eickbush. 1998. Are retrotransposons long-term hitchhikers? Nature 392:141–142.
Craigie, R. 2002. Retroviral DNA integration. Pp. 613–630 in N. Craig, R. Craigie, M. Gellert, and A. Lambowitz, eds. Mobile DNA II. American Society of Microbiology Press, Washington, D.C.
Crollius, H. R., O. Jaillon, C. Dasilva et al. (12 co-authors). 2000. Characterization and repeat analysis of the compact genome of the freshwater pufferfish Tetraodon nigroviridis. Genome Res. 10:939–949.
Dasilva, C., H. Hadji, C. Ozouf-Costaz, S. Nicaud, O. Jaillon, J. Weissenbach, and H. R. Crollius. 2002. Remarkable compartmentalization of transposable elements and pseudogenes in the heterochromatin of the Tetraodon nigroviridis genome. Proc. Natl. Acad. Sci. USA 99:13636–13641.
Dehal, P., Y. Satou, R. K. Campbell et al. (87 co-authors). 2002. The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science 298:2157–2167.
Dewannieux, M., C. Esnault, and T. Heidmann. 2003. LINE-mediated retrotransposition of marked Alu sequences. Nat. Genet. 35:41–48.
Eickbush, T. H., and A. V. Furano. 2002. Fruit flies and humans respond differently to retrotransposons. Curr. Opin. Genet. Dev. 12:669–674.
Eickbush, T. H., and H. S. Malik. 2002. Origins and evolution of retrotransposons. Pp. 1111–1144 in N. Craig, R. Craigie, M. Gellert, and A. Lambowitz, eds. Mobile DNA II. American Society of Microbiology Press, Washington, D.C.
Esnault, C., J. Maestre, and T. Heidmann. 2000. Human LINE retrotransposons generate processed pseudogenes. Nat. Genet. 24:363–367.
Feng, Q., J. V. Moran, H. H. Kazazian Jr., and J. D. Boeke. 1996. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87:905–916.
Goodwin, T. J., and R. T. Poulter. 2002. A group of deuterostome Ty3/gypsy-like retrotransposons with Ty1/copia-like pol-domain orders. Mol. Genet. Genomics 267:481–491.
International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409:860–921.
Jensen, S., M. P. Gassama, and T. Heidmann. 1999. Taming of transposable elements by homology-dependent gene silencing. Nat. Genet. 21:209–212.
Jordan, I. K., L. V. Matyunina, and J. F. McDonald. 1999. Evidence for the recent horizontal transfer of long terminal repeat retrotransposon. Proc. Natl. Acad. Sci. USA 96:12621–12625.
Kajikawa, M., and N. Okada. 2002. LINEs mobilize SINEs in the eel through a shared 3' sequence. Cell 111:433–444.
Kazazian, H. H. Jr., and J. V. Moran. 1998. The impact of L1 retrotransposons on the human genome. Nat. Genet. 19:19–24.
Ketting, R. F., T. H. Haverkamp, H. G. van Luenen, and R. H. Plasterk. 1999. Mut-7 of C. elegans, required for transposon silencing and RNA interference, is a homolog of Werner syndrome helicase and RNaseD. Cell 99:133–141.
Kidwell, M. G. 2002. Transposable elements and the evolution of genome size in eukaryotes. Genetica 115:49–63.
Kim, A., C. Terzian, P. Santamaria, A. Pelisson, N. Prud'homme, and A. Bucheton. 1994. Retroviruses in invertebrates: the gypsy retrotransposon is apparently an infectious retrovirus of Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 91:1285–1289.
Kojima, K. K., and H. Fujiwara. 2004. Cross-genome screening of novel sequence-specific non-LTR retrotransposons: various multicopy RNA genes and microsatellites are selected as targets. Mol. Biol. Evol. 21:207–217.
Leblanc, P., S. Desset, B. Dastugue, and C. Vaury. 1997. Invertebrate retroviruses: ZAM a new candidate in D. melanogaster. EMBO J. 16:7521–7531.
Lerat, E., and P. Capy. 1999. Retrotransposons and retroviruses: analysis of the envelope gene. Mol. Biol. Evol. 16:1198–1207.
Levin, H. L. 2002. Newly identified retrotransposons of the Ty3/gypsy class in fungi, plants and vertebrates. Pp. 684–701 in N. Craig, R. Craigie, M. Gellert, and A. Lambowitz, eds. Mobile DNA II. American Society of Microbiology Press, Washington, D.C.
Liu, G., S. Zhao, J. A. Bailey, S. C. Sahinalp, C. Alkan, E. Tuzun, E. D. Green, and E. E. Eichler. 2003. Analysis of primate genomic variation reveals a repeat-driven expansion of the human genome. Genome Res. 13:358–368.
Lovsin, N., F. Gubensek, and D. Kordis. 2001. Evolutionary dynamics in a novel L2 clade of non-LTR retrotransposons in Deuterostomia. Mol. Biol. Evol. 18:2213–2224.
Malik, H. S., W. D. Burke, and T. H. Eickbush. 1999. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 16:793–805.
Malik, H. S., and T. H. Eickbush. 1999. Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J. Virol. 73:5186–5190.
Malik, H. S., S. Henikoff, and T. H. Eickbush. 2000. Poised for contagion: evolutionary origins of the infectious abilities of invertebrate retroviruses. Genome Res. 10:1307–1318.
Misseri, Y., G. Labesse, A. Bucheton, and C. Terzian. 2003. Comparative sequence analysis and predictions for the envelope glycoproteins of insect endogenous retroviruses. Trends Microbiol. 11:253–256.
Monaghan, P., and N. B. Metcalfe. 2000. Genome size and longevity. Trends Genet. 16:331–332.
Moran, J. V., R. J. DeBerardinis, and H. H. Kazazian Jr. 1999. Exon shuffling by L1 retrotransposition. Science 283:1530–1534.
Neafsey, D. E., and S. R. Palumbi. 2003. Genome size evolution in pufferfish: a comparative analysis of diodontid and tetraodontid pufferfish genomes. Genome Res. 13:821–830.
Pantazidis, A., M. Labrador, and A. Fontdevila. 1999. The retrotransposon Osvaldo from Drosophila buzzatii displays all structural features of a functional retrovirus. Mol. Biol. Evol. 16:909–921.
Pelisson, A., S. U. Song, N. Prud'homme, P. A. Smith, A. Bucheton, and V. G. Corces. 1994. Gypsy transposition correlates with the production of a retroviral envelope-like protein under the tissue-specific control of the Drosophila flamenco gene. EMBO J. 13:4401–4411.
Permanyer, J., R. Gonzalez-Duarte, and R. Albalat. 2003. The non-LTR retrotransposons in Ciona intestinalis: new insights into the evolution of chordate genomes. Genome Biol. 4:R73.
Peterson-Burch, B. D., D. A. Wright, H. M. Laten, and D. F. Voytas. 2000. Retroviruses in plants? Trends Genet. 16:151–152.
Petrov, D. A. 2001. Evolution of genome size: new approaches to an old problem. Trends Genet. 17:23–28.
Pickeral, O. K., W. Makalowski, M. S. Boguski, and J. D. Boeke. 2000. Frequent human genomic DNA transduction driven by LINE-1 retrotransposition. Genome Res. 10:411–415.
Ronquist, F., and J. P. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574.
Saitou, N., and M. Nei. 1987. The Neighbor-Joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.
SanMiguel, P., B. S. Gaut, A. Tikhonov, Y. Nakajima, and J. L. Bennetzen. 1998. The paleontology of intergene retrotransposons of maize. Nat. Genet. 20:43–45.
Schmidt, H. A., K. Strimmer, M. Vingron, and A. von Haeseler. 2002. Tree-Puzzle: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502–504.
Seo, H. C., M. Kube, R. B. Edvardsen et al. (11 co-authors). 2001. Miniature genome in the marine chordate Oikopleura dioica. Science 294:2506.
Simmen, M. W., and A. Bird. 2000. Sequence analysis of transposable elements in the sea squirt, Ciona intestinalis. Mol. Biol. Evol. 17:1685–1694.
Song, S. U., T. Gerasimova, M. Kurkulos, J. D. Boeke, and V. G. Corces. 1994. An env-like protein encoded by a Drosophila retroelement: evidence that gypsy is an infectious retrovirus. Genes Dev. 8:2046–2057.
Sundararajan, A., B. S. Lee, and D. J. Garfinkel. 2003. The Rad27 (Fen-1) nuclease inhibits Ty1 mobility in Saccharomyces cerevisiae. Genetics 163:55–67.
Swofford, D. L. 1998. PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer Associates, Sunderland, Mass.
Tabara, H., M. Sarkissian, W. G. Kelly, J. Fleenor, A. Grishok, L. Timmons, A. Fire, and C. C. Mello. 1999. The rde-1 gene, RNA interference, and transposon silencing in C. elegans. Cell 99:123–132.
Tatusova, T. A., and T. L. Madden. 1999. Blast 2 sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiol. Lett. 174:247–250.
Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 24:4876–4882.
Varmus, H., and P. Brown. 1989. Retroviruses. Pp. 53–108 in D. E. Berg and M. M. Howe, eds. Mobile DNA. American Society of Microbiology Press, Washington, D.C.
Vicient, C. M., R. Kalendar, and A. H. Schulman. 2001. Envelope-class retrovirus-like elements are widespread, transcribed and spliced, and insertionally polymorphic in plants. Genome Res. 11:2041–2049.
Volff, J.-N., L. Bouneau, C. Ozouf-Costaz, and C. Fischer. 2003. Diversity of retrotransposable elements in compact pufferfish genomes. Trends Genet. 19:674–678.
Volff, J.-N., U. Hornung, and M. Schartl. 2001. Fish retroposons related to the Penelope element of Drosophila virilis define a new group of retrotransposable elements. Mol. Genet. Genomics 265:711–720.
Volff, J.-N., C. Körting, K. Sweeney, and M. Schartl. 1999. The non-LTR retrotransposon Rex3 from the fish Xiphophorus is widespread among teleosts. Mol. Biol. Evol. 16:1427–1438.
Wills, N. M., R. F. Gesteland, and J. F. Atkins. 1991. Evidence that a downstream pseudoknot is required for translational read-through of the Moloney murine leukemia virus gag stop codon. Proc. Natl. Acad. Sci. USA 88:6991–6995.
Yang, J., H. S. Malik, and T. H. Eickbush. 1999. Identification of the endonuclease domain encoded by R2 and other site-specific, non-long terminal repeat retrotransposable elements. Proc. Natl. Acad. Sci. USA 96:7847–7852.