MBE Advance Access originally published online on November 2, 2005
Molecular Biology and Evolution 2006 23(2):411-420; doi:10.1093/molbev/msj046
© The Author 2005. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Article

The Trypanosoma cruzi L1Tc and NARTc Non-LTR Retrotransposons Show Relative Site Specificity for Insertion

Frédéric Bringaud*, Daniella C. Bartholomeu†, Gaëlle Blandin†, Arthur Delcher†, Théo Baltz*, Najib M. A. El-Sayed†,‡ and Elodie Ghedin†,‡

* Laboratoire de Génomique Fonctionnelle des Trypanosomatides, UMR-5162 Centre National de la Recherche Scientifique, Université Victor Segalen Bordeaux 2, Bordeaux Cedex, France; † The Institute for Genomic Research, Rockville; and ‡ Department of Microbiology and Tropical Medicine, George Washington University

E-mail: bringaud@u-bordeaux2.fr.


    Abstract
The trypanosomatid protozoan Trypanosoma cruzi contains long autonomous (L1Tc) and short nonautonomous (NARTc) non–long terminal repeat retrotransposons. NARTc (0.25 kb) probably derived from L1Tc (4.9 kb) by 3'-deletion. It has been proposed that their apparently random distribution in the genome is related to the L1Tc-encoded apurinic/apyrimidinic endonuclease (APE) activity, which repairs modified residues. To address this question, we used the T. cruzi (CL-Brener strain) genome data to analyze the distribution of all the L1Tc/NARTc elements present in contigs larger than 10 kb. This data set, which represents 0.91x sequence coverage of the haploid nuclear genome (~55 Mb), contains 419 elements, including 112 full-length L1Tc elements (14 of which are potentially functional) and 84 full-length NARTc. Approximately half of the full-length elements are flanked by a target site duplication, which in most cases (87%) is 12 bp long. Statistical analyses of sequences flanking the full-length elements show the same highly conserved pattern upstream of both the L1Tc and NARTc retrotransposons. The two most conserved residues are a guanine and an adenine, which flank the site where first-strand cleavage is performed by the element-encoded endonuclease activity. This analysis clearly indicates that the L1Tc and NARTc elements display relative site specificity for insertion, which suggests that the APE activity is not responsible for first-strand cleavage of the target site.

Key Words: apurinic/apyrimidinic endonuclease • L1Tc • NARTc • non-LTR retrotransposon • retroposon • site specificity • Trypanosoma cruzi


    Introduction
Retrotransposons are ubiquitous mobile genetic elements that transpose through an RNA intermediate and are found in the genome of most eukaryotes (Capy et al. 1998). They can be divided into two lineages that utilize completely different mechanisms of integration. Those elements with long terminal repeats (LTR), called LTR-retrotransposons, are similar both in structure and retrotransposition mechanism to retroviruses (Whitcomb and Hughes 1992). Those elements that lack LTR, called non-LTR retrotransposons or retroposons, use a simpler mechanism of transposition. The current model for transposition of non-LTR retrotransposons was proposed based on the analysis of the insect R2 element (Luan et al. 1993). This model predicts that an element-encoded endonuclease performs a single-strand nick of the target DNA, generating an exposed 3'-hydroxyl that serves as a primer for reverse transcription of the element's RNA. The complementary strand of the new DNA copy of the element is thus directly synthesized onto the chromosome by the element-encoded reverse transcriptase. The second single-strand nick is carried out on the other strand, a few base pairs downstream of the first nick, by the same element-encoded endonuclease, generating a primer for the second-strand synthesis of the retroelement. Consequently, the non-LTR retroelements are flanked by a direct repeat corresponding to the sequence between the two single-strand nicks performed by the element-encoded endonuclease, called target site duplication (TSD). They also have a variable length poly(A) or A-rich 3'-tail due to the involvement of an RNA intermediate.

Non-LTR retroelements are very diverse in structure, can insert into a wide variety of DNA target sequences, and have been divided into five groups based on phylogenetic analyses (Eickbush and Malik 2002). Members of the R2 group integrate within very specific sequences, such as rDNA genes (R2 and R4) and the spliced leader (SL) RNA genes (NeSL-1, SLACS, CZAR, CRE1, and CRE2) (Craig 1997). The site specificity is due to the element-encoded integrase-like domain, which presents characteristics of restriction enzymes (Yang, Malik, and Eickbush 1999; Volff et al. 2001). In contrast, most of the non-LTR retroelements constituting the four other groups, as exemplified by the human L1 element, are considered to be randomly distributed in the genome. All these retroelements encode an endonuclease domain homologous to apurinic/apyrimidinic endonucleases (APE), not related to the integrase-like domain of the R2 group. However, the observed bias in the base composition at the L1 insertion site correlates with the relative sequence specificity of the L1-encoded APE-like domain, indicating that the distribution of these retroelements is not random (Feng et al. 1996; Jurka 1997; Cost and Boeke 1998; Tatout, Lavie, and Deragon 1998).

Trypanosomatids are unicellular protists including human pathogens responsible for Chagas' disease (Trypanosoma cruzi), African sleeping sickness (Trypanosoma brucei), and leishmaniasis (Leishmania spp.). Recently, the genome sequence of these three trypanosomatid parasites has been completed (Berriman et al. 2005; El-Sayed et al. 2005; Ivens et al. 2005). The non-LTR retrotransposons constitute the most abundant mobile elements described in the genome of T. cruzi and T. brucei (~3% of nuclear genome), while no potentially active mobile elements have been characterized so far in Leishmania major. The few T. cruzi CZAR (7.25 kb) and T. brucei SLACS (6.3 kb) are site-specific retroelements only found in the SL RNA genes (Aksoy et al. 1987; Villanueva et al. 1991). However, the most abundant non-LTR elements are L1Tc and NARTc in T. cruzi (Martin et al. 1995; Bringaud et al. 2002) and ingi and RIME in T. brucei (Hasan, Turner, and Cordingley 1984; Kimmel, Ole-MoiYoi, and Young 1987; Murphy et al. 1987), with 320 (L1Tc), 133 (NARTc), 115 (ingi), and 86 (RIME) copies per haploid genome (El-Sayed et al. 2005). In T. cruzi, the first 250 bp of the autonomous L1Tc (4.9 kb) and the nonautonomous NARTc (0.25 kb) elements share the first 78 residues and other conserved blocks (fig. 1A), suggesting that NARTc was derived from L1Tc by a 3'-deletion. Similarly, the nonautonomous T. brucei RIME (0.5 kb) appears as a truncated version of the autonomous T. brucei ingi (5.25 kb) by deletion of the central 4.7 kb fragment. The potentially functional L1Tc and ingi encode a large single protein (1,574 and 1,657 amino acids, respectively) responsible for their retrotransposition. This protein contains the central reverse transcriptase (Garcia-Perez et al. 2003) and RNase H (Olivares et al. 2002) domains, a C-terminal DNA-binding domain (Pays and Murphy 1987), and an N-terminal APE-like domain (Olivares, Alonso, and Lopez 1997) (fig. 1B). In contrast, the nonautonomous NARTc and RIME elements presumably use the L1Tc- or ingi-encoded enzymatic activities for their own transposition, as previously shown for the nonautonomous human Alu and eel UnaSINE1 elements, which take advantage of the L1 and UnaL2 machinery, respectively (Kajikawa and Okada 2002; Dewannieux, Esnault, and Heidmann 2003).



FIG. 1. Schematic representation and comparison of the L1Tc and NARTc non-LTR retrotransposons present in the Trypanosoma cruzi genome. Panel A shows a comparison of the schematic nucleotide sequences of an L1Tc (4.9 kb) and a NARTc (0.25 kb) contained in BAC62 (ACC: AF208537) and BAC52 (ACC: AF215898) sequences, respectively (Olivares et al. 2000). The L1Tc retroelement contains a single long ORF (4,722 bp), from position 137 (ATG codon) to position 4859 (TAG codon), which encodes a 1,574-amino acid protein. The first 78 bp of the L1Tc retroelements are identical to the beginning of the NARTc retroelements (gray boxes), whereas the following 172 bp of L1Tc are 54% identical with the corresponding sequence of NARTc (white boxes). The last 13 bp upstream of the poly(dA) terminal sequence (black box at the end of both maps) are also conserved between L1Tc and NARTc (85% identity). Panel B is a schematic map of the single protein encoded by the autonomous L1Tc element (1,574 amino acids). Localization of known domains is indicated on the map.

 
The T. cruzi L1Tc/NARTc and T. brucei ingi/RIME were considered to be randomly distributed in the genome (S. Bhattacharya, Bakre, and A. Bhattacharya 2002). However, it has recently been observed that the T. brucei ingi and RIME elements display a relative site specificity for insertion (Bringaud et al. 2004). Here, we show that T. cruzi L1Tc and NARTc are inserted downstream of a highly conserved motif. According to the current model for retrotransposition of non-LTR retrotransposons, this conserved motif is probably the binding site of the L1Tc-encoded APE-like domain.


    Materials and Methods
Database Mining
To identify all the L1Tc/NARTc elements in T. cruzi assembled genome segments larger than 10 kb (or contigs >10 kb), we used complementary Blast approaches. A BlastN search was performed on contigs >10 kb with the NARTc nucleotide sequence (250 bp), which shares with L1Tc the first 78 bp and the last 13 bp (plus the poly(A) tail) (fig. 1), to identify all NARTc and full-length or 3'-truncated L1Tc. To detect the 5'-truncated L1Tc and confirm the other L1Tc elements previously detected, a TBlastN search was performed with the full-length L1Tc protein sequence as query (1,574 amino acids).

Statistical Analysis
To quantify the degree of conservation at each column in the sequence multialignments, a chi-square score was computed comparing the observed distribution of ACGTs in the column to the distribution in the entire genome. The background ACGT distribution for the genome was obtained by counting the occurrences of each base in the set of all shotgun reads used for the assembly of the genome. Then in each of the four multialignments at each column, the chi-square score was computed as:

$$\chi^2 = \sum_{i \in \{A, C, G, T\}} \frac{(o_i - e_i)^2}{e_i}$$

where o_i is the observed number of occurrences of character i in the given column, and e_i is the expected number of occurrences of character i computed as the proportion of character i in all reads times the number of sequences in that column of the multialignment. Using 3 degrees of freedom, a χ² value of 16.3 corresponds to a significance level of P < 0.001.
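As an illustration of the per-column score described above, the following is a minimal Python sketch, not the code used in the study. The function names, the uniform background frequencies, and the toy alignment are hypothetical; in the analysis the background was estimated from the shotgun reads.

```python
from collections import Counter

BASES = "ACGT"

def column_chi_square(column, background):
    """Chi-square score of one alignment column against background base proportions."""
    counts = Counter(b for b in column.upper() if b in BASES)
    n = sum(counts.values())  # number of sequences contributing a base to this column
    if n == 0:
        return 0.0
    score = 0.0
    for base in BASES:
        expected = background[base] * n   # e_i: background proportion times column depth
        observed = counts.get(base, 0)    # o_i: observed count of this base
        score += (observed - expected) ** 2 / expected
    return score

def alignment_chi_square(sequences, background):
    """Score every column of a gap-free multialignment of equal-length sequences."""
    return [column_chi_square("".join(col), background) for col in zip(*sequences)]

# Hypothetical background and toy alignment (assumptions for illustration only).
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
toy_alignment = ["GATC", "GACC", "GAGT", "GAAA", "GATG", "GACT", "GAGA", "GAAC"]

scores = alignment_chi_square(toy_alignment, background)
# Columns exceeding the 16.3 threshold (P < 0.001, 3 degrees of freedom).
significant = [i for i, s in enumerate(scores) if s > 16.3]
```

In this toy example only the two fully conserved columns exceed the threshold, which mirrors how strongly conserved positions stand out against the genomic background.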

The consensus sequence located upstream of the trypanosome retroelements was also shown using a graphic representation called "sequence logo" (Schneider and Stephens 1990; Crooks et al. 2004). This analysis was performed on the same data set as for the chi-square test with the online program WebLogo (http://weblogo.berkeley.edu/logo.cgi).
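For readers unfamiliar with sequence logos, the height of each stack reflects the information content of the column, measured in bits. The sketch below is a simplified Python illustration under stated assumptions: it assumes a uniform background over the four bases and omits the small-sample correction that logo programs may apply, and the function name is hypothetical.

```python
import math
from collections import Counter

def information_content(column):
    """Information content (bits) of a DNA column: 2 minus the Shannon entropy."""
    counts = Counter(b for b in column.upper() if b in "ACGT")
    n = sum(counts.values())
    if n == 0:
        return 0.0
    # Shannon entropy of the observed base frequencies, in bits.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # log2(4) = 2 bits is the maximum for a four-letter alphabet.
    return 2.0 - entropy
```

A fully conserved column carries 2 bits, whereas a column with the four bases in equal proportions carries none.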


    Results
Identification of All the L1Tc and NARTc in the T. cruzi Genome
The nuclear genome of T. cruzi (CL-Brener strain) was recently completed using a whole-genome shotgun strategy (El-Sayed et al. 2005). The current draft of the T. cruzi genome assembly at 15x sequence coverage consists of 784 scaffolds greater than 5 kb built by 3,954 contigs, totaling 60 Mb. The longest contig is 256 kb (the smallest is 215 bp) and the contig N50 size is only 25.8 kb (i.e., half of the nucleotides incorporated into contigs are in contigs greater than 25.8 kb) due to the repetitive and hybrid nature of the CL-Brener genome (El-Sayed et al. 2005). Because the L1Tc non-LTR retrotransposon is ~4.9 kb long, contigs longer than 10 kb have been retained for this analysis. This data set is composed of 1,701 contigs, totaling 49.9 Mb (83% of the annotated data set) and representing approximately 0.91x sequence coverage of the haploid nuclear genome (~55 Mb). To identify all the L1Tc (4.9 kb) and NARTc (0.25 kb) retroelements in this data set, we have used Blast approaches, as described in Materials and Methods. We identified 419 L1Tc and NARTc elements, including 24 truncated elements, which could be either L1Tc or NARTc because they only contain the 78-bp 5'-terminal conserved sequence (table 1). Among the 296 identified L1Tc elements, 59 are not complete due to their location at one extremity of the contigs. Approximately one half of the completely sequenced L1Tc are truncated at their 5', 3', or both extremities (118, 4, and 3 elements, respectively). Interestingly, most of the incomplete L1Tc are 5'-truncated, as observed for other non-LTR retrotransposons as a result of the low processivity of the element-encoded reverse transcriptase (George, Burke, and Eickbush 1996; Kazazian and Moran 1998). The other half is full-length L1Tc (112 elements). Although there is no certainty as to what represents a functional L1Tc element, we consider that the few elements which contain a single long open reading frame (ORF) encoding a 1,574-amino acid protein are probably functional (14 elements). Contigs larger than 10 kb also contain 98 NARTc elements, most of them being full length (84 elements), the others being 5'-truncated (7 elements) or 3'-truncated (7 elements). This data set contains an additional NARTc element truncated because of its location at the extreme end of the contig. The analyzed data set containing 419 elements represents 0.91x sequence coverage of the haploid nuclear genome, indicating that the T. cruzi haploid genome (1x) contains ~460 non-LTR retrotransposons (~345 L1Tc and ~115 NARTc), including ~15 L1Tc which potentially code for functional retrotransposition enzymes. These data are consistent with our previous estimate based on the analysis of the T. cruzi genome survey sequences (GSS) database (286 L1Tc and 140 NARTc per haploid genome) (Bringaud et al. 2002). The genome contains three times more L1Tc than NARTc and about half of the elements are full length. It is noteworthy that the genome of the hybrid T. cruzi CL-Brener strain contains two distinct diploid haplotypes (El-Sayed et al. 2005); consequently, the number of non-LTR retrotransposons per nuclear genome would range between 1,000 and 1,500.
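The extrapolation from the analyzed contigs to the haploid genome is simple proportional scaling. The short Python sketch below reproduces that arithmetic from the figures quoted above; the function names are hypothetical, and the calculation assumes that element density in the analyzed contigs is representative of the rest of the genome.

```python
def sequence_coverage(dataset_mb, haploid_genome_mb):
    """Fraction of the haploid genome represented by the analyzed data set."""
    return dataset_mb / haploid_genome_mb

def extrapolate_copy_number(observed_elements, coverage):
    """Scale an observed element count to 1x haploid coverage."""
    return observed_elements / coverage

# Figures taken from the text: 49.9 Mb of contigs >10 kb, ~55 Mb haploid genome.
coverage = sequence_coverage(49.9, 55.0)

# 419 elements observed, 14 of them potentially functional L1Tc, at 0.91x coverage.
total_per_haploid = extrapolate_copy_number(419, 0.91)
functional_per_haploid = extrapolate_copy_number(14, 0.91)
```

Rounded, these values correspond to the ~0.91x coverage, ~460 elements, and ~15 potentially functional L1Tc reported for the haploid genome.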


Table 1. Number of L1Tc and NARTc Non-LTR Retrotransposons Contained in the Trypanosoma cruzi Contigs Larger than 10 kb

 
Most L1Tc and NARTc Are Flanked by a 12-bp Target Site Duplication
According to the current model for retrotransposition of non-LTR retrotransposons, the target-primed reverse transcription (TPRT) process is initiated by the element-encoded endonuclease, which performs a single-strand cleavage (Luan et al. 1993). The same endonuclease also cleaves the other strand, a few nucleotides downstream of the first cleavage site. Consequently, the sequence located between both single-strand cleavages is duplicated to form the target site duplication (TSD) flanking the newly retrotransposed element. To study the insertion site of the T. cruzi retroelements, we compared the 5'- and 3'-adjacent sequence of all full-length L1Tc (112 elements) and NARTc (84 elements) identified in contigs larger than 10 kb. We have also included in this analysis all full-length NARTc (24 elements) present in the other contigs (smaller than 10 kb). Based on their flanking regions (120 bp upstream and downstream of each element), these 220 full-length elements (112 L1Tc and 108 NARTc) form 124 groups of nearly identical sequences (fig. 2). Eight groups contain both L1Tc and NARTc elements, suggesting that both elements are using the same retrotransposition machinery, as previously observed for the human L1/Alu elements (Dewannieux, Esnault, and Heidmann 2003) and the UnaL2/UnaSINE1 elements from eel (Kajikawa and Okada 2002). Among the 220 full-length elements analyzed, 117 are flanked by a TSD (fig. 2A). Interestingly, the TSD is composed of 12 bp for 105 L1Tc/NARTc (87%), and the other 12 elements (13%) show a 13-bp TSD. The same situation was observed for the T. brucei ingi/RIME non-LTR retrotransposons, that is, 34 of 36 analyzed TSD are 12 bp long (94.4%) and the two others are 11 bp long (5.6%) (Bringaud et al. 2004). As far as we know, the size conservation of the TSD (12 bp) is unique to trypanosome retroelements. Indeed, all of the other nonsite-specific non–LTR retrotransposons characterized so far have polymorphic flanking direct repeats, as exemplified by human L1, Alu, and rodent ID elements, whose sizes range between 4 and 26 bp (Jurka 1997). The size of the TSD primarily depends on the position of the second-strand cleavage. The mechanism of the second-strand cleavage at the downstream site is poorly understood; however, it is commonly accepted that the element-encoded endonuclease is responsible for the first and second single-strand nicks of the target DNA. Thus, it is tempting to propose that the conservation of the TSD size, resulting from the retrotransposition of the T. brucei and T. cruzi retroelements, is due to mechanistic properties shared by the ingi- and L1Tc-encoded endonuclease.
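To make the notion of a TSD concrete, the Python sketch below looks for a direct repeat formed by the end of the 5'-adjacent sequence and the start of the 3'-adjacent sequence of an element. It is a hypothetical illustration rather than the procedure used in this study: the function name, the length bounds, the mismatch tolerance, and the example sequences are all assumptions, and the 3'-flank is assumed to start immediately after the poly(A) tail.

```python
def find_tsd(upstream, downstream, min_len=8, max_len=20, max_mismatches=0):
    """Return the longest direct repeat shared by the end of `upstream`
    and the start of `downstream`, or None if no repeat is found."""
    longest = min(max_len, len(upstream), len(downstream))
    # Try the longest candidate first so that the full duplication is reported.
    for k in range(longest, min_len - 1, -1):
        left = upstream[-k:].upper()
        right = downstream[:k].upper()
        mismatches = sum(a != b for a, b in zip(left, right))
        if mismatches <= max_mismatches:
            return left
    return None

# Hypothetical flanks carrying a 12-bp duplication on both sides of an element.
upstream_flank = "CCGTTAGG" + "ATGACCTGAAGT"
downstream_flank = "ATGACCTGAAGT" + "TTCAGGCA"
tsd = find_tsd(upstream_flank, downstream_flank)
```

Raising `max_mismatches` would tolerate TSDs containing nonconserved residues, at the cost of more spurious matches.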




FIG. 2. Comparison of the 5'- and 3'-adjacent sequences flanking the L1Tc/NARTc retroelements identified by Trypanosoma cruzi genome sequencing. Only one representative of each group of elements flanked, or not, by nearly identical sequences is shown in this figure. Only full-length L1Tc/NARTc elements are considered in panels A–C, while 5'-truncated elements are shown in panel D. The retroelements flanked by a TSD are shown in panels A and D, while those that are not flanked by a TSD are shown in panels B and C. Panel B shows those which have 5'-adjacent sequences identical to elements described in panel A. In the right margin the name of L1Tc and/or NARTc elements is indicated. The number with six characters corresponds to the end of the contig (ct) number, the last number indicates the position of the retroelement in the contig, and the accession number of the corresponding contig is indicated in the right margin ("ACC n°"). The "L" and "N" columns indicate the number of L1Tc and NARTc analyzed elements flanked by nearly identical sequences, respectively. The alignment of all the selected sequences was based on the retroelement sequences (gray column headed "L1Tc/NARTc") from which only the first and the last 6 bp, separated by the type of retroelement (L1Tc or NARTc), are shown. The number of residues truncated at the 5'-extremity of L1Tc/NARTc elements is indicated in brackets in panel D. The potentially functional L1Tc elements that code for a full-length protein (1,574 amino acids) are indicated by white characters on a black background. In panels A, B, and D, the TSD flanking the retroelements (called "TSD") is indicated by boldfaced and capital characters, while, in panels C and B, the equivalent sequences of the TSD-less retroelements are indicated as "putative TSD." Lowercase characters in the TSD (panels A and D) correspond to nonconserved residues. The numbers in column "(n)" indicate the 5'-adjacent sequence flanking the L1Tc/NARTc shared by elements in panels A, B, and D. In some cases, the analyzed retroelements are preceded by another retroelement sequence (indicated by the word "L1Tc," shaded in gray and identified in parentheses). Residues within the TSD and 5'-flanking sequences that match the consensus are indicated with white characters on a black background.

 
About half of the L1Tc/NARTc elements are truncated, most of them showing a 5'-deletion (~80% of the truncated elements) (table 1). In humans, greater than 95% of L1 elements are variably 5'-truncated during retrotransposition by TPRT (Kazazian and Moran 1998). These 5'-truncated L1 elements, as well as full-length elements, are flanked by a TSD (Morrish et al. 2002). To determine whether the same feature is present in trypanosomatid non-LTR retrotransposons, we searched for TSD flanking all of the 3'- and/or 5'-truncated L1Tc/NARTc elements. None of the 3'-truncated elements have a TSD; however, all the 5'-truncated NARTc (7 elements) and a few 5'-truncated L1Tc (five out of the 118 elements) are flanked by a 12-bp TSD (11 elements) or 11-bp TSD (1 element) (table 1 and fig. 2D), as observed for the full-length elements.

Among the 220 full-length elements analyzed, 103 are not flanked by a TSD (fig. 2B and C). However, 32 of these TSD-less elements are preceded by a sequence found upstream of L1Tc/NARTc flanked by a TSD (14.5% of the full-length elements) (fig. 2B); they were probably generated by homologous recombination between retroelements, which produces chimeric retrotransposons flanked by unrelated regions. Homologous recombination events imply that an equivalent number of the TSD-less elements should share their 3'-flanking region with elements flanked by a TSD. This was indeed observed because 33 TSD-less elements are followed by a sequence found downstream of L1Tc/NARTc flanked by a TSD (15% of the full-length elements) (data not shown). Because the probability of homologous recombination events increases with the size of the homologous sequences, our hypothesis implies that the proportion of elements flanked by a TSD should be higher for the short NARTc elements (0.25 kb) than for the long L1Tc elements (4.9 kb). Indeed, approximately three times more full-length NARTc are flanked by a TSD, as compared to the full-length L1Tc (79.6% vs. 27.7%). Similarly, 100% of 5'-truncated NARTc and 4% of 5'-truncated L1Tc are flanked by a TSD (see table 1 and fig. 2). The relatively low abundance of 5'-truncated L1Tc flanked by a TSD, as compared to full-length L1Tc flanked by a TSD (4% vs. 27.7%), is probably related to the expansion of two groups of 31 and 55 elements lacking the first 178 and 3,560 residues, respectively, which are located in the same genomic environment (data not shown). If we consider that these 31 and 55 related elements result from duplication and are representative of two different 5'-truncated L1Tc, the proportion of L1Tc flanked by a TSD is in the same range for 5'-truncated and full-length elements (16% vs. 27.7%). We previously observed the same features with the T. brucei ingi/RIME elements, that is, one-tenth of the T. brucei full-length elements are flanked by a TSD and an unknown sequence, and 85% and 48% of RIME and ingi, respectively, are flanked by a TSD (Bringaud et al. 2004).

L1Tc and NARTc Are Preceded by a Conserved Motif
To further study the insertion site of the T. cruzi retroelements, we determined the conservation of nucleotides flanking the full-length L1Tc and NARTc elements flanked by a TSD. For this analysis, only a single sequence of each group of nearly identical sequences (67 groups) has been considered (fig. 2A). The sequence downstream of the retroelements does not show a conserved pattern. However, a well-conserved motif is located in the vicinity of the first-strand cleavage (fig. 3A). The most conserved residues are a guanine and an adenine (91% and 99% of conservation, respectively) that flank the first-strand cleavage (positions –12 and –13 upstream of the retroelements). In addition, nine other residues between positions –14 and –31 show more than 50% of conservation. To determine whether the consensus pattern present upstream of the L1Tc/NARTc retroelements is statistically significant, we performed a chi-square test on the same data set (fig. 4A). This analysis clearly demonstrates that a conserved motif (GAxxAxGaxxxxxtxTATG↑Axxxxxxxxxxx; the arrow indicates the first-strand cleavage site) precedes both the NARTc and L1Tc retrotransposons. The presence of the same conserved pattern upstream of both L1Tc (fig. 4B) and NARTc (fig. 4C) confirms that both elements use the same machinery for their retrotransposition.
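The way per-position base frequencies translate into a consensus string such as the one above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the toy sequences are invented, and the cutoffs used to choose between uppercase, lowercase, and "x" are assumptions, since the exact thresholds behind the published motif notation are not stated here.

```python
from collections import Counter

def base_percentages(sequences):
    """Percentage of T, C, A, and G at each position of equal-length sequences."""
    table = []
    for column in zip(*sequences):
        counts = Counter(base.upper() for base in column)
        depth = len(column)
        table.append({b: 100.0 * counts.get(b, 0) / depth for b in "TCAG"})
    return table

def consensus(table, strong=70.0, weak=50.0):
    """Uppercase at or above `strong`, lowercase above `weak`, otherwise 'x'.
    Both cutoffs are assumed values chosen for illustration."""
    letters = []
    for freqs in table:
        base, pct = max(freqs.items(), key=lambda item: item[1])
        if pct >= strong:
            letters.append(base)
        elif pct > weak:
            letters.append(base.lower())
        else:
            letters.append("x")
    return "".join(letters)

# Hypothetical upstream sequences, one per group of nearly identical flanks.
toy_upstream = ["GACT", "GAGT", "GATT", "GCAC", "GCCA"]
table = base_percentages(toy_upstream)
motif = consensus(table)
```

Using one representative per group, as in the analysis above, prevents recently duplicated loci from inflating the apparent conservation.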



FIG. 3. Base frequencies at different positions of the TSD and adjacent regions of the L1Tc/NARTc retrotransposons identified in contigs. In panel A, the frequencies have been analyzed in the TSD (12 bp) and the 5'- and 3'-adjacent sequences (27 and 16 bp, respectively) of 67 different full-length L1Tc and NARTc elements flanked by a TSD (fig. 2A). The L1Tc/NARTc sequences and downstream TSD have not been analyzed. In panel B, the same analysis was performed as in panel A, except that the 32 different full-length L1Tc and NARTc elements analyzed are not flanked by a TSD (fig. 2C). The first column (called "pos") indicates the nucleotide position: for the 5'-flanking region of both panels (from position –01 to –40) the numbering starts before the 5'-extremity of the retroelement; for the 3'-flanking region of panel A (from position +01 to +16) the numbering starts after the 3'-extremity of the downstream TSD. For the retroelements displaying a TSD (panel A) or putative TSD (panel B) with 13 residues (12 for panel A and 4 for panel B), the first residue located upstream of the retroelement was not considered. The values in columns "T," "C," "A," and "G" represent the percentage of the T, C, A, and G residues, respectively, at individual positions. Values above 50% are indicated: 50%–60% (underlined), 60%–70% (underlined and boldfaced), 70%–80% (underlined and gray shaded), 80%–90% (underlined, boldfaced, and gray shaded), and 90%–100% (white characters on a black background). The last column (named "cons") shows the conserved residues. An arrow in the right margin indicates, in panel A, the position of the first single-strand cleavage.

 


FIG. 4. The χ² values for individual positions of the TSD and adjacent regions. The analysis was performed on one representative sequence for each group of L1Tc and NARTc (panel A), L1Tc (panel B), or NARTc (panel C) elements flanked by nearly identical sequences. The χ² values were calculated as described in Materials and Methods from the set of 67 L1Tc/NARTc (figs. 2A and 3A), 31 L1Tc, and 53 NARTc sequences. For the L1Tc and NARTc elements displaying a TSD with 13 residues, the first residue located upstream of the retroelement was not considered (12, 5, and 8 sequences, in panels A, B, and C, respectively). The base composition of the whole Trypanosoma cruzi genome sequence was used to determine the background base distribution. The χ² values above the broken horizontal line (16.3) correspond to significance levels of P < 0.001 for 3 degrees of freedom. The TSD is boxed by a gray line and the first- and second-strand cleavages are indicated by "FIRST" and "SECOND," respectively.

 
Approximately half of the T. cruzi L1Tc/NARTc elements are not flanked by a TSD (103 out of 220) (fig. 2). As discussed above, about one-third of these TSD-less elements are preceded by a sequence found upstream of L1Tc/NARTc flanked by a TSD (fig. 2B) and may result from homologous recombination. The other 71 sequences are flanked by unrelated and unknown sequences (fig. 2C). However, as observed for the retroelements flanked by TSD, the latter group of elements representing approximately one-third of all the full-length L1Tc/NARTc is preceded by the same conserved pattern (fig. 3B). This indicates that most, if not all, NARTc and L1Tc elements are preceded by the conserved motif described above. Similarly, among the 76 analyzed full-length T. brucei elements, 40 (52.6%) are TSD-less and most of them show the upstream conserved sequence (Bringaud et al. 2004). Altogether, these observations demonstrate that the T. cruzi, as well as the T. brucei, non-LTR retrotransposons are not randomly distributed in the genome as previously proposed, but instead show a relative site specificity probably dictated by the retroelement-encoded endonuclease.


    Discussion
The non-LTR retrotransposons, L1Tc and NARTc, of the protozoan parasite T. cruzi were thought to be randomly distributed in the nuclear genome. Here we show that these elements present a relative insertion site specificity. Indeed, the 220 full-length L1Tc/NARTc elements identified in the sequenced T. cruzi genome (CL-Brener strain) are preceded by a conserved pattern (GAxxAxGaxxxxxtxTATG↑Axxxxxxxxxxx), which may be the binding site of the element-encoded endonuclease. According to the current model, the retroelement-encoded endonuclease domain dictates whether the site of insertion of non-LTR retrotransposons is specific or not (Luan et al. 1993). Most of the nonsite-specific non–LTR retrotransposons contain an APE-like domain, which is thought to determine the site of retroelement insertion. Olivares et al. (2003) showed that the T. cruzi L1Tc APE-like domain contains an APE activity (Olivares, Alonso, and Lopez 1997). Furthermore, overexpression of the L1Tc APE-like domain protects T. cruzi against DNA damaging stresses (Olivares et al. 2003), indicating that the L1Tc-encoded APE-like domain is active in vivo and may have a protective role. Consequently, it has been proposed that this APE-repair activity could act as a signal for new retrotransposition events. APE recognizes modified purine and pyrimidine residues, which are randomly generated in the genome. After excision of the damaged DNA base by APE, the DNA repair machinery replaces the excised residue by the equivalent unmodified nucleotide (Mol, Hosfield, and Tainer 2000). This implies that, if the APE activity determines the site of insertion, the T. cruzi non-LTR retrotransposons should be randomly distributed and flanked by nonconserved sequences. This hypothesis is not in agreement with the relative site specificity of insertion we observed for most, if not all, T. cruzi retroelements. For example, the first-strand cleavage is performed between two highly conserved G and A residues (91% and 99% of conservation, respectively), which is not compatible with insertion at apurinic/apyrimidinic sites via APE-mediated repair. Furthermore, it has been observed that the human L1-encoded APE-like domain shows no preference for apurinic/apyrimidinic sites and preferentially cleaves unmodified DNA molecules within AT-rich sequences (Feng et al. 1996). This preferred experimentally defined integration site is similar to the target consensus sequence located upstream of the Alu elements, which use the L1 machinery for retrotransposition (Jurka 1997; Dewannieux, Esnault, and Heidmann 2003). Altogether this indicates that, as observed for the L1 and Alu elements, the apurinic/apyrimidinic sites are not the preferred insertion site of the T. cruzi L1Tc and NARTc elements.

To tentatively explain the presence of a consensus sequence upstream of the L1Tc and NARTc retroelements, two alternative hypotheses should be considered. First, it has been shown that mobilization of the human L1 elements can be mediated by endonuclease-independent retrotransposition to repair double-strand DNA breaks (Morrish et al. 2002). This hypothesis cannot be retained because endonuclease-independent retrotransposition generates retroelements that lack the TSD and are not preceded by a conserved motif. Second, it has also been proposed that the T. brucei ingi element encodes multiple endonuclease functions, including the N-terminal APE domain and a C-terminal domain homologous to integrases and histidine-asparagine-histidine endonucleases (McClure, Donaldson, and Corro 2002). However, this endonuclease-like C-terminal domain is not present in the L1Tc element. Consequently, the L1Tc element encodes a single endonuclease domain, the N-terminal APE-like domain, known to be responsible for target site recognition in other non-LTR retrotransposons.

Trypanosoma cruzi and T. brucei non-LTR retrotransposon pairs (L1Tc/NARTc and ingi/RIME, respectively), composed of long autonomous (L1Tc and ingi) and short nonautonomous (NARTc and RIME) elements, share many characteristics (Bringaud et al. 2004), such as (1) copy numbers per haploid genome in the same range, (2) conservation of the 5'-extremity (78 bp for T. cruzi and 250 bp for T. brucei), (3) conservation of the TSD size (12 bp), (4) an equivalent proportion of TSD-less elements, (5) a relative site specificity for insertion, and (6) the same conserved motif preceding the autonomous and nonautonomous elements, suggesting that they share the same retrotransposition machinery. However, there is a major difference between the consensus sequences observed upstream of T. cruzi (GAxxAxGaxxxxxtxTATG↑Axxxxxxxxxxx) and T. brucei (AxxxxxxxTtgxGTxGGxTxxx↑tTxTxxTxxxxxx) non-LTR retrotransposons (fig. 5). This target site difference is probably related to the divergent L1Tc and ingi APE-like domains, which are only 23.8% identical.
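As an illustration of what a conserved 12-bp TSD means in practice, the sketch below checks whether an annotated element is flanked by a direct repeat. It is a simplified example and not the method used in this study. The function name, the 0-based half-open coordinates, and the assumption that the duplicated sequence lies immediately adjacent to both ends of the element are all hypothetical choices made for the example.

```python
def flanking_tsd(genome, start, end, size=12, max_mismatches=0):
    """Return the upstream copy of a putative TSD, or None if none is found.

    genome         : DNA string
    start, end     : 0-based, half-open coordinates of the element
    size           : TSD length to test (12 bp for the elements discussed here)
    max_mismatches : tolerated differences between the two copies

    Assumption for this sketch: the two copies of the duplication sit
    directly against the 5' and 3' ends of the element.
    """
    if start < size or end + size > len(genome):
        return None  # not enough flanking sequence to compare
    upstream = genome[start - size:start]
    downstream = genome[end:end + size]
    mismatches = sum(a != b for a, b in zip(upstream, downstream))
    return upstream if mismatches <= max_mismatches else None
```

Elements for which the function returns None at every plausible size would fall into the TSD-less category. In real data the 3' boundary of an element is often uncertain, so a practical search would probably need to scan a small window rather than test a single position.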



FIG. 5.— Comparison of Trypanosoma cruzi and Trypanosoma brucei consensus sequences. The consensus sequence located upstream of the T. brucei (ingi and RIME) or T. cruzi (L1Tc and NARTc) retroelements is shown using the "sequence logo" graphic representation (Schneider and Stephens 1990; Crooks et al. 2004). For T. cruzi, the analysis was performed on the same data set as used in figures 2A, 3A, and 4A. For T. brucei, the analysis was performed on the GSS data set used in figure 6 in Bringaud et al. (2004). For the 12 T. cruzi elements (from the 67 elements constituting the data set) displaying a TSD with 13 residues, the first residue located upstream of the retroelement was not considered. The logo representation displays the frequencies of bases at each position, as the relative heights of letters, along with the degree of sequence conservation as the total height of a stack of letters, measured in bits of information. The vertical scale is in bits, with a maximum of 2 bits possible at each position. For the numbering, position +1 corresponds to the first residue of the retroelement. The retroelement sequence (CCCTGGC...) and the TSD are indicated.
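The stack heights in such a logo can be approximated with a short calculation. The Python sketch below computes, for each column of a set of aligned sequences, the information content as the maximum of 2 bits for DNA minus the Shannon entropy of the observed base frequencies. It is a simplified illustration: the function name is hypothetical, and the small-sample correction applied by logo-generating software is deliberately omitted.

```python
import math
from collections import Counter


def information_content(aligned_seqs, alphabet="ACGT"):
    """Per-column information content in bits.

    For each column: log2(len(alphabet)) minus the Shannon entropy of the
    observed residue frequencies. No small-sample correction is applied
    (a simplification for this sketch).
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    max_bits = math.log2(len(alphabet))  # 2 bits for a 4-letter alphabet
    bits = []
    for column in zip(*aligned_seqs):
        n = len(column)
        counts = Counter(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        bits.append(max_bits - entropy)
    return bits
```

A fully conserved position yields 2 bits and a position where the four bases are equally frequent yields 0 bits. The height of each letter within a stack is then its frequency multiplied by the information content of the column.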

 

    Acknowledgements
We thank Bill Wickstead for sharing information. F.B. and T.B. were supported by the Centre National de la Recherche Scientifique, the Conseil Régional d'Aquitaine, and the Ministère de l'Education Nationale de la Recherche et de la Technologie. N.M.A.E.S. and colleagues were supported by National Institutes of Health grants AI43062 and AI45038.


    Footnotes
 
Pierce Capy, Associate Editor


    References

    Aksoy, S., T. M. Lalor, J. Martin, L. H. Van der Ploeg, and F. F. Richards. 1987. Multiple copies of a retroposon interrupt spliced leader RNA genes in the African trypanosome, Trypanosoma gambiense. EMBO J. 6:3819–3826.

    Bhattacharya, S., A. Bakre, and A. Bhattacharya. 2002. Mobile genetic elements in protozoan parasites. J. Genet. 81:73–86.

    Berriman, M., E. Ghedin, C. Hertz-Fowler et al. (90 co-authors). 2005. The genome of the African trypanosome, Trypanosoma brucei. Science 309:416–422.

    Bringaud, F., N. Biteau, E. Zuiderwijk, M. Berriman, N. M. El-Sayed, E. Ghedin, S. E. Melville, N. Hall, and T. Baltz. 2004. The ingi and RIME non-LTR retrotransposons are not randomly distributed in the genome of Trypanosoma brucei. Mol. Biol. Evol. 21:520–528.

    Bringaud, F., J. L. García-Pérez, S. R. Heras, E. Ghedin, N. M. El-Sayed, B. Andersson, T. Baltz, and M. C. Lopez. 2002. Identification of non-autonomous non-LTR retrotransposons in the genome of Trypanosoma cruzi. Mol. Biochem. Parasitol. 124:73–78.

    Capy, P., C. Bazin, D. Higuet, and T. Langin. 1998. Dynamics and evolution of transposable elements. Landes Bioscience, Austin, Tex.

    Cost, G. J., and J. D. Boeke. 1998. Targeting of human retrotransposon integration is directed by the specificity of the L1 endonuclease for regions of unusual DNA structure. Biochemistry 37:18081–18093.

    Craig, N. L. 1997. Target site selection in transposition. Annu. Rev. Biochem. 66:437–474.

    Crooks, G. E., G. Hon, J. M. Chandonia, and S. E. Brenner. 2004. WebLogo: a sequence logo generator. Genome Res. 14:1188–1190.

    Dewannieux, M., C. Esnault, and T. Heidmann. 2003. LINE-mediated retrotransposition of marked Alu sequences. Nat. Genet. 35:41–48.

    Eickbush, T. H., and H. S. Malik. 2002. Origins and evolution of retrotransposons. Pp. 1111–1144 in A. G. Craig, R. Craigie, M. Gellert, and A. M. Lambowitz, eds. Mobile DNA II. ASM Press, Washington, D.C.

    El-Sayed, N. M. A., P. Myler, D. C. Bartholomeu et al. (76 co-authors). 2005. The genome sequence of Trypanosoma cruzi, etiological agent of Chagas' disease. Science 309:409–415.

    Feng, Q., J. V. Moran, H. H. Kazazian, and J. D. Boeke. 1996. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87:905–916.

    Garcia-Perez, J. L., C. I. Gonzalez, M. C. Thomas, M. Olivares, and M. C. Lopez. 2003. Characterization of reverse transcriptase activity of the L1Tc retroelement from Trypanosoma cruzi. Cell. Mol. Life Sci. 60:2692–2701.

    George, J. A., W. D. Burke, and T. H. Eickbush. 1996. Analysis of the 5' junctions of R2 insertions with the 28S gene: implications for non-LTR retrotransposition. Genetics 142:853–863.

    Hasan, G., M. J. Turner, and J. S. Cordingley. 1984. Complete nucleotide sequence of an unusual mobile element from Trypanosoma brucei. Cell 37:333–341.

    Ivens, A. C., C. Peacock, E. A. Worthey et al. (102 co-authors). 2005. The genome of the kinetoplastid parasite, Leishmania major. Science 309:436–442.

    Jurka, J. 1997. Sequence patterns indicate an enzymatic involvement in integration of mammalian retroposons. Proc. Natl. Acad. Sci. USA 94:1872–1877.

    Kajikawa, M., and N. Okada. 2002. LINEs mobilize SINEs in the eel through a shared 3' sequence. Cell 111:433–444.

    Kazazian, H. H. Jr., and J. V. Moran. 1998. The impact of L1 retrotransposons on the human genome. Nat. Genet. 19:19–24.

    Kimmel, B. E., O. K. Ole-MoiYoi, and J. R. Young. 1987. Ingi, a 5.2-kb dispersed sequence element from Trypanosoma brucei that carries half of a smaller mobile element at either end and has homology with mammalian LINEs. Mol. Cell. Biol. 7:1465–1475.

    Luan, D. D., M. H. Korman, J. L. Jakubczak, and T. H. Eickbush. 1993. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72:595–605.

    Martin, F., C. Maranon, M. Olivares, C. Alonso, and M. C. Lopez. 1995. Characterization of a non-long terminal repeat retrotransposon cDNA (L1Tc) from Trypanosoma cruzi: homology of the first ORF with the ape family of DNA repair enzymes. J. Mol. Biol. 247:49–59.

    McClure, M. A., E. Donaldson, and S. Corro. 2002. Potential multiple endonuclease functions and a ribonuclease H encoded in retroposon genomes. Virology 296:147–158.

    Mol, C. D., D. J. Hosfield, and J. A. Tainer. 2000. Abasic site recognition by two apurinic/apyrimidinic endonuclease families in DNA base excision repair: the 3' ends justify the means. Mutat. Res. 460:211–229.

    Morrish, T. A., N. Gilbert, J. S. Myers, B. J. Vincent, T. D. Stamato, G. E. Taccioli, M. A. Batzer, and J. V. Moran. 2002. DNA repair mediated by endonuclease-independent LINE-1 retrotransposition. Nat. Genet. 31:159–165.

    Murphy, N. B., A. Pays, P. Tebabi, H. Coquelet, M. Guyaux, M. Steinert, and E. Pays. 1987. Trypanosoma brucei repeated element with unusual structural and transcriptional properties. J. Mol. Biol. 195:855–871.

    Olivares, M., C. Alonso, and M. C. Lopez. 1997. The open reading frame 1 of the L1Tc retrotransposon of Trypanosoma cruzi codes for a protein with apurinic-apyrimidinic nuclease activity. J. Biol. Chem. 272:25224–25228.

    Olivares, M., J. L. Garcia-Perez, M. C. Thomas, S. R. Heras, and M. C. Lopez. 2002. The non-LTR (long terminal repeat) retrotransposon L1Tc from Trypanosoma cruzi codes for a protein with RNase H activity. J. Biol. Chem. 277:28025–28030.

    Olivares, M., M. C. Lopez, J. L. Garcia-Perez, P. Briones, M. Pulgar, and M. C. Thomas. 2003. The endonuclease NL1Tc encoded by the LINE L1Tc from Trypanosoma cruzi protects parasites from daunorubicin DNA damage. Biochim. Biophys. Acta 1626:25–32.

    Olivares, M., M. C. Thomas, A. Lopez-Barajas, J. M. Requena, J. L. Garcia-Perez, S. Angel, C. Alonso, and M. C. Lopez. 2000. Genomic clustering of the Trypanosoma cruzi nonlong terminal L1Tc retrotransposon with defined interspersed repeated DNA elements. Electrophoresis 21:2973–2982.

    Pays, E., and N. B. Murphy. 1987. DNA-binding fingers encoded by a trypanosome retroposon. J. Mol. Biol. 197:147–148.

    Schneider, T. D., and R. M. Stephens. 1990. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 18:6097–6100.

    Tatout, C., L. Lavie, and J. M. Deragon. 1998. Similar target site selection occurs in integration of plant and mammalian retroposons. J. Mol. Evol. 47:463–470.

    Villanueva, M. S., S. P. Williams, C. B. Beard, F. F. Richards, and S. Aksoy. 1991. A new member of a family of site-specific retrotransposons is present in the spliced leader RNA genes of Trypanosoma cruzi. Mol. Cell. Biol. 11:6139–6148.

    Volff, J. N., C. Korting, A. Froschauer, K. Sweeney, and M. Schartl. 2001. Non-LTR retrotransposons encoding a restriction enzyme-like endonuclease in vertebrates. J. Mol. Evol. 52:351–360.

    Whitcomb, J. M., and S. H. Hughes. 1992. Retroviral reverse transcription and integration: progress and problems. Annu. Rev. Cell Biol. 8:275–306.

    Yang, J., H. S. Malik, and T. H. Eickbush. 1999. Identification of the endonuclease domain encoded by R2 and other site-specific, non-long terminal repeat retrotransposable elements. Proc. Natl. Acad. Sci. USA 96:7847–7852.

Accepted for publication October 19, 2005.





