

MBE Advance Access originally published online on March 14, 2008
Molecular Biology and Evolution 2008 25(6):1148-1157; doi:10.1093/molbev/msn061

© The Author 2008. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Mobility Pathways for Vertebrate L1, L2, CR1, and RTE Clade Retrotransposons

Kenji Ichiyanagi*,† and Norihiro Okada†

* Division of Human Genetics, Department of Integrated Genetics, National Institute of Genetics, Yata, Mishima, Shizuoka, Japan
† Department of Biological Sciences, Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Nagatsuta-cho, Midori-ku, Yokohama, Japan

E-mail: nokada@bio.titech.ac.jp


    Abstract
Autonomous non–long terminal repeat retrotransposons (NLRs) are ubiquitous mobile genetic elements that insert their DNA copies at new locations by retrotransposition. In vertebrates, there are 4 NLR clades, L1, L2, CR1, and RTE, which diverged in the Precambrian era. It has been demonstrated that retrotransposition of L1 and L2 members proceeds via coordinated reactions of targeted DNA cleavage and reverse transcription catalyzed by the NLR-encoded proteins, which are followed by the joining of the 5' (upstream) junction. However, the study on the mobility pathways for vertebrate NLRs is so far limited to L1 and L2. In this report, using target analysis of nested transposons for genomic copies, we studied retrotransposition pathways for a variety of vertebrate NLRs, including those of the L1, L2, CR1, and RTE clades in the human, cow, opossum, chicken, and zebrafish genomes. Thus, this study constitutes the first comprehensive analysis of NLR retrotransposition products in vertebrates. Our data revealed that these elements share similar mechanisms for the cleavages of the 2 target DNA strands and for the initiation of reverse transcription. Possible endonuclease-independent insertions were also identified. Overall, our results suggest the existence of multiple retrotransposition pathways that are conserved among the diverse NLR clades in various vertebrate hosts.

Key Words: LINE • transposition • DNA repair • endonuclease • reverse transcription


    Introduction
Autonomous non–long terminal repeat retrotransposons (NLRs; also known as long interspersed nuclear elements or LINEs) are eukaryotic retrotransposable elements. Their insertions can potentially cause mutations, and their copies can serve as a substrate for homologous recombination, leading to genomic rearrangements (Ostertag and Kazazian 2001a; Deininger et al. 2003). Moreover, NLR insertions can alter genome functions because they carry functional sequences, such as transcriptional promoters (Becker et al. 1993; Speek 2001), polyA signals (Perepelitsa-Belancio and Deininger 2003), RNA splicing donors and acceptors (Belancio et al. 2006; Tamura et al. 2007), and nuclear matrix attachment sites (Chimera and Musich 1985; Jordan et al. 2003). Therefore, NLR mobility has made a significant impact both on the structure and regulation of genomes during evolution.

NLRs can be subdivided into at least 13 clades based on the reverse transcriptase (RT) sequences that they encode, and the clades have been established as far back as the Precambrian era (Malik et al. 1999). NLRs of the L1, L2, CR1, and RTE clades are present in vertebrate genomes, although their current proliferative activities are strikingly different. In human and mouse, only the L1 clade is currently active; in contrast, genomic copies of all the other clades are heavily mutated, indicating their recent inactivity. In the chicken genome, only the CR1 clade contains copies that are not heavily mutated, a sign that they transposed recently (young copies). The cow genome carries young elements of the L1 and RTE clades, whereas young elements of the L1, L2, and RTE clades are present in the opossum genome. The zebrafish genome contains young elements of all 4 clades (see RepBase for these vertebrate NLRs; Jurka et al. 2005). NLRs of different clades in the same species are distantly related, whereas the NLRs of the same clade in different species are closely related (fig. 1A). For example, zebrafish L1 is much more closely related to human L1 than to zebrafish L2.


FIG. 1.— NLRs and their retrotransposition pathways. (A) The phylogeny of RTs encoded by the NLRs analyzed in this report. A tree was constructed by the Neighbor–Joining method computed on the MEGA3.1 program (Kumar et al. 2004) with RT sequences obtained from GenBank (DQ000238 for L1_BT) or from RepBase (Jurka et al. 2005). The tree was then slightly revised by referencing some previous analyses that had included a larger number of RT sequences to identify the clades (Malik et al. 1999; Volff et al. 2000; Lovsin et al. 2001; Zupunski et al. 2001; Kapitonov and Jurka 2003; Ohshima and Okada 2005; Sugano et al. 2006). The hosts and clades of the NLRs are indicated on the right. (B) Representative pathways for NLR retrotransposition. Arrows indicate reactions as follows: (1) transcription of NLR, (2) translation, (3) complex formation of the NLR RNA and EN/RT protein, (4) bottom-strand cleavage, (5a) reverse transcription initiated with annealing of the NLR RNA and target DNA, (5b) reverse transcription, (6a) top-strand cleavage at a site downstream of the site of bottom-strand cleavage, (6b) top-strand cleavage at a site upstream of the site of bottom-strand cleavage, (7) annealing of NLR cDNA and target DNA, (8) nontemplated DNA synthesis from the end of the upstream target DNA, (9) sense-strand synthesis and ligation at the 5' junction, (10) NLR RNA-independent DNA synthesis, (11) exonucleolytic degradation of 5'-overhanged DNAs, (12) endogenous double-strand break (DSB), and (13) exonucleolytic degradation of target DNA from the site of DSB. The major pathways generating a TSD (pathways A and B) are indicated by bold arrows. TSDs, MH regions, and extranucleotides are indicated on the retrotransposition products.

 
Studies on mammalian (L1 clade), fish (L2 clade), and insect (R1 and R2 clades) NLRs have revealed that NLRs are propagated by retrotransposition. During this process, the sequence of the original NLR copy is first transcribed into an RNA to produce a protein containing RT and endonuclease (EN) domains (fig. 1B, arrows 1 and 2). This protein forms a complex with its own RNA (fig. 1B, arrow 3). The protein nicks the target duplex DNA through the EN activity (fig. 1B, arrow 4). Using the resulting 3'-OH end as a primer, the RT domain of the protein initiates reverse transcription of the NLR RNA to synthesize the antisense DNA strand of a new NLR copy by a mechanism called target-primed reverse transcription or TPRT (fig. 1B, arrow 5a; Luan et al. 1993; Cost et al. 2002; Anzai et al. 2005). The second strand of the target duplex becomes cleaved during or after TPRT, possibly by the EN domain (fig. 1B, arrow 6a; Christensen and Eickbush 2005), detaching the upstream region of the target duplex from the downstream DNA. The sense-strand DNA synthesis and ligation of the NLR DNA and the upstream target DNA at the 5' junction complete retrotransposition (fig. 1B, arrows 7–9), although detailed mechanisms for these steps have been conjectural. Recently, we developed a bioinformatic method to analyze the target–NLR junctions of genomic NLR copies, named target analysis of nested transposons (TANT) (Ichiyanagi and Okada 2006; Ichiyanagi, Nakajima, et al. 2007). Using the TANT method, we selected for genomic NLR copies that reside within other transposons so that alterations of the target sequence by the NLR insertion could be inferred from the consensus sequence of the host transposon. This method revealed that genomic integrants show several distinct features in their target-site sequence and 5' and 3' junctions, suggesting the existence of multiple pathways for NLR retrotransposition.

Due to the lack of an appropriate cell culture–based system, mobility pathways for vertebrate NLRs other than human and rodent L1s and fish L2 are poorly studied. Therefore, the generality of the detailed mobility reactions deduced from the L1 and L2 studies remains a question. Because many vertebrate genomes have been sequenced, the TANT method is now applicable to NLRs in a wide variety of host genomes. Thus in this study, we analyzed retrotransposition products of the 4 NLR clades present in human, cow, opossum, chicken, and zebrafish by TANT. Our data support the generality of the retrotransposition pathways for a diverse set of NLRs and vertebrate hosts.


    Materials and Methods
Nomenclature
All NLR names were used according to the latest version of the RepBase RepeatMasker edition, repeatmaskerlibraries-20061006 (Jurka et al. 2005). The sole exception was ZfL2-2 (CR1-2_DR in RepBase) in figure 1A to avoid any confusion about the clade to which it belongs. The clade previously defined as CR1 (Malik et al. 1999; Kapitonov and Jurka 2003) has since been proposed to include 2 distinct clades, CR1 and L2 (Lovsin et al. 2001; Ogiwara et al. 2002; Sugano et al. 2006). Some have suggested that the CR1 clade may be further separated into 2 clades, CR1 and REX1 (Volff et al. 2000; Lovsin et al. 2001; Ohshima and Okada 2005). Whereas the monophyletic relationship of CR1- and REX1-related elements is supported by a high bootstrap value (Kapitonov and Jurka 2003), it remains uncertain if REX1-related elements constitute an authentic clade (Sugano et al. 2006). In this paper, therefore, we regarded zebrafish REX1-1_DR and chicken CR1 as CR1 clade elements and opossum L2_Mars and zebrafish ZfL2s were classified into the L2 clade.

TANT Method
The detailed method and criteria to identify genomic copies of NLRs that reside within other transposons have been described (Ichiyanagi and Okada 2006; Ichiyanagi, Nakajima, et al. 2007). To obtain genomic data, we downloaded RepeatMasker tables from the UCSC genome browser (Karolchik et al. 2004) for the genomes of Bos taurus (bosTau2, March 2005), Monodelphis domestica (monDom4, January 2006), Gallus gallus (galGal3, May 2006), and Danio rerio (danRer4, March 2006). Both the cow and opossum genomes contain many subfamilies of L1 clade NLRs; however, we analyzed only L1_BT for cow and L1_Mdo1 for opossum as these represent the youngest subfamilies in their respective genomes. The CR1 clade of NLRs in chicken includes subfamilies of CR1-A to CR1-I, CR1-X, and CR1-Y (Vandergon and Reitman 1994; Wicker et al. 2005). Because these subfamilies are closely related (65–85% identity among the RT sequences), we combined the data for these subfamilies for the statistical analysis. The divergence of the analyzed NLR copies from their respective consensus sequences was always comparable (0–15%). We only analyzed 5'-truncated insertions.

TANT Method Identifying TST-Associated Human L1 Integrants
We previously analyzed human L1 integrants by the TANT method (Ichiyanagi, Nakajima, et al. 2007). Because only a small number of target-site truncation (TST)–associated integrants were identified in that analysis, here we selectively identified TST integrants to study the junction structure and target-site sequence of these integrants. Candidates for L1 elements associated with a TST in the human genome (hg18, March 2006) were first screened using a Perl script (available upon request) that was modified from the commonly used TANT method script. To collect candidates with a TST, this script searched for integrants in which repStart of the downstream moiety of the host transposon was more than 1 bp downstream of repEnd of the upstream moiety (repStart and repEnd are RepeatMasker outputs indicating the start and end points, respectively, of transposons with respect to their consensus sequences). Then, we manually analyzed each candidate to determine the 5' and 3' junction features to identify a TST. We only collected integrants that retained the 3' polyA tail.
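The screening criterion described above can be illustrated with a minimal Python sketch. This is not the Perl script used in the study; the class HostMoiety and the functions tst_gap and is_tst_candidate are hypothetical names introduced here, and the sketch assumes that both moieties are reported on the same strand with repStart and repEnd already expressed as 1-based consensus coordinates.

```python
from dataclasses import dataclass


@dataclass
class HostMoiety:
    """One half of a host transposon split by an NLR insertion (hypothetical type)."""
    rep_start: int  # first consensus position covered by this moiety
    rep_end: int    # last consensus position covered by this moiety


def tst_gap(upstream: HostMoiety, downstream: HostMoiety) -> int:
    """Number of host consensus positions missing between the two moieties.

    A positive value suggests a truncation; zero means the moieties abut;
    a negative value means they overlap (as expected for a TSD).
    """
    return downstream.rep_start - upstream.rep_end - 1


def is_tst_candidate(upstream: HostMoiety, downstream: HostMoiety) -> bool:
    """True when repStart of the downstream moiety lies more than 1 bp
    downstream of repEnd of the upstream moiety."""
    return downstream.rep_start - upstream.rep_end > 1


if __name__ == "__main__":
    up = HostMoiety(rep_start=1, rep_end=100)
    down = HostMoiety(rep_start=106, rep_end=300)
    print(is_tst_candidate(up, down), tst_gap(up, down))
```

Candidates flagged in this way would still need the manual inspection of the 5' and 3' junctions described above, because a gap in consensus coordinates alone can also arise from alignment differences.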


    Results
Using the TANT method, we collected and analyzed genomic copies of cow L1 (L1_BT) and RTE (BovB); opossum L1 (L1_Mdo1), L2 (L2_Mars), and RTE (RTE_Mdo); chicken CR1; and zebrafish CR1 (REX1-1_DR) and RTE (Expander1_DR). Due to genomic rearrangements during evolution, genomic NLR copies, in particular ancient copies, may have lost some of their original features generated by retrotransposition. To minimize this possibility, we selected for relatively young elements (with less than 15% divergence from their respective consensus sequences) that contained a complete 3' end (polyA or microsatellite tail) because retrotransposition products in general carry a complete 3' region in contrast to a large 5' truncation. Then, their integration features were compared. To be comprehensive, our previous data for human L1 and zebrafish L1 and L2 elements were incorporated in the comparisons and are shown in the figures and table.

Target-Site Duplication Generated upon Retrotransposition
For all NLRs analyzed herein, more than half (50–90%) of the integrants carry a duplication of the target sequence of varying length at both target–NLR junctions (target-site duplication; TSD) (table 1, 5th row). These frequent TSD associations were also observed for human L1 and zebrafish L1 and L2 (table 1; Moran et al. 1996; Gilbert et al. 2002, 2005; Symer et al. 2002; Ichiyanagi and Okada 2006; Ichiyanagi, Nakajima, et al. 2007). A close inspection of these TSD lengths highlights an NLR-directed pattern. The majority of the TSDs for human, cow, opossum, and zebrafish L1 clade elements were 7–18 bp in length, with 13–15 bp as the most abundant for human, cow, and opossum L1s (fig. 2A). For opossum and zebrafish L2 clade elements, TSDs were generally ≤12 bp, with 4–6 bp TSDs as the most abundant (fig. 2B). The lengths of TSDs of chicken and zebrafish CR1 clade elements showed a similar distribution (fig. 2C). The majority of TSDs of RTE clade elements of cow, opossum, and zebrafish were 7–15 bp, with 10–12 bp as the most abundant (fig. 2D). The L2 and CR1 clade elements encode closely related proteins (Lovsin et al. 2001), whereas the L1, L2, and RTE clades are distantly related in the NLR phylogeny (Malik et al. 1999). Thus, the TSD lengths of given NLRs fall within clade-specific ranges regardless of their hosts, which indicates that the length of the TSD is dictated by the NLR-encoded proteins.


 
Table 1 Summary of LINE Integrants Analyzed by the TANT Method

 

FIG. 2.— Distribution of TSD lengths. TSD lengths were categorized by 3-bp intervals for integrants of NLRs of the L1 (A), L2 (B), CR1 (C), and RTE (D) clades. The data for human and zebrafish L1s and zebrafish L2s are from our previous reports (Ichiyanagi and Okada 2006; Ichiyanagi, Nakajima, et al. 2007). For each NLR, their hosts and the number of TSD-associated integrants analyzed (n) are indicated on the top left (A) or top right (B–D) corner.
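The 3-bp binning used for these distributions can be sketched as follows. The function name bin_tsd_lengths is hypothetical, and the sketch assumes that the intervals start at 1 bp (1–3, 4–6, and so on), which is consistent with the 4–6, 10–12, and 13–15 bp classes quoted in the text but is not stated explicitly.

```python
from collections import Counter


def bin_tsd_lengths(lengths, width=3):
    """Count TSD lengths (in bp) in consecutive intervals of `width` bp.

    Returns a dict mapping (low, high) interval bounds to counts,
    ordered by interval. Intervals are assumed to start at 1 bp.
    """
    counts = Counter()
    for n in lengths:
        if n < 1:
            raise ValueError("a TSD must be at least 1 bp long")
        low = ((n - 1) // width) * width + 1
        counts[(low, low + width - 1)] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Illustrative lengths only, not data from this study.
    print(bin_tsd_lengths([1, 3, 4, 14, 15]))
```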

 
Target Sequences for NLR Retrotransposition
Integrants associated with a TSD provide information on the target sequences for NLR insertion. If the upstream target sequence ends in 5'-ABC[DEFG]-3' and the downstream target sequence starts with 5'-[DEFG]HIJ-3' (the brackets indicate a TSD), the target sequence for the NLR insertion can be inferred as 5'-ABC↓DEFGHIJ-3', where the arrow indicates the site of insertion. We thus determined and compiled nucleotide sequences of the NLR insertion targets (fig. 3). The L1 insertions in cow (L1_BT) and opossum (L1_Mdo1) have a preference for 5'-NN↓AAAA-3' and 5'-TT↓AAAA-3', respectively (fig. 3A and D). This is consistent with the observation that human and fish L1s have a target preference for 5'-TT↓AAAA-3' (fig. 3G and Ichiyanagi, Nishihara, et al. 2007). The target-site specificity of human L1 is due to the activity of its encoded EN, which preferentially cleaves 5'-TTTT↓AA-3' on the complementary strand, as the target cleavage during retrotransposition depends on the EN activity (fig. 1B, arrow 4; Feng et al. 1996; Cost and Boeke 1998). The other L1-encoded ENs likely have similar substrate specificities. On the other hand, NLRs of the L2, CR1, and RTE clades showed only very limited, if any, conservation in their target sequences, although there seemed to be some bias toward AT-rich sequences (fig. 3J–O). Therefore, L2-, CR1-, and RTE-encoded ENs have very limited sequence specificity, whereas L1 clade NLRs (more specifically, the subclade M of the L1 clade; Ichiyanagi, Nishihara, et al. 2007) encode more sequence-specific ENs.
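The inference of a target sequence from its two flanks can be sketched in a few lines of Python. The function infer_target_site is a hypothetical helper, and the sketch simplifies the procedure by taking the longest exact match between the end of the upstream flank and the start of the downstream flank as the TSD; in the TANT method the duplication is judged against the consensus sequence of the host transposon, which this sketch does not model. The character "|" stands in for the arrow marking the insertion site.

```python
def infer_target_site(upstream: str, downstream: str):
    """Reconstruct the pre-insertion target from the two flanking sequences.

    `upstream` is the target sequence 5' of the NLR (ending with the TSD) and
    `downstream` is the target sequence 3' of the NLR (starting with the TSD).
    Returns (target_with_site_marker, tsd_length), or None if no duplicated
    sequence is shared by the two flanks.
    """
    max_len = min(len(upstream), len(downstream))
    # Try the longest possible duplication first.
    for k in range(max_len, 0, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream[:-k] + "|" + downstream, k
    return None


if __name__ == "__main__":
    # The symbolic example from the text: ABC[DEFG] and [DEFG]HIJ.
    print(infer_target_site("ABCDEFG", "DEFGHIJ"))
```

Because very short matches can arise by chance, a real analysis would need an independent reference for the unoccupied site, which is the role played by the host transposon consensus here.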


FIG. 3.— Compilation of target-site sequences for NLR insertions. Target-site sequences for integrants with a TSD (A, D, G, J–O), a short TST (B and H), or a long TST (C and I) are shown in a WebLogo graphical representation (Crooks et al. 2004). The level of base conservation is shown as a "bit," which ranges from 0 (each of the 4 bases occurs at a given position with 25% frequency; i.e., no conservation) to 2 (a single base occupies a given position in all target sequences; i.e., 100% conservation). The names of the NLRs are indicated at the top of each panel. The NLR insertion site is located between nucleotide positions –1 and 1. For TST-associated integrants, only sequences downstream of the insertion sites (nucleotide positions 1–5) were analyzed due to the lack of sequence information upstream. Due to the small sample size, individual sequences are shown for the L1_Mdo1 integrants associated with short (E) or long (F) TSTs. The data for (G) were obtained from Ichiyanagi, Nakajima, et al. (2007).
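The bit scale described in this legend corresponds to the information content of each aligned position, which for DNA is 2 minus the Shannon entropy of the base frequencies. The sketch below computes it for a set of equal-length target sequences. The function names are hypothetical, and the sketch deliberately omits any small-sample correction that logo software may apply, so its values need not match the published panels exactly.

```python
import math
from collections import Counter


def information_bits(column: str) -> float:
    """Information content (bits) of one aligned position of DNA sequences.

    0 bits: all 4 bases equally frequent; 2 bits: a single base in every sequence.
    """
    n = len(column)
    counts = Counter(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 2.0 - entropy


def logo_bits(sequences):
    """Per-position information content for equal-length sequences."""
    return [information_bits("".join(col)) for col in zip(*sequences)]


if __name__ == "__main__":
    # Illustrative sequences only, not data from this study.
    print(logo_bits(["AAAA", "AAAG", "AAAC", "AAAT"]))
```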

 
TSTs upon EN-Dependent and -Independent Retrotranspositions
In addition to TSDs, we also identified truncations of a target sequence (TST). For each of the NLRs analyzed herein, 10–45% of insertions were associated with a TST (table 1, 7th row). The lengths of these truncations ranged from 1 bp to 2.5 kb, with most of them ≤100 bp. Such truncation can result from retrotransposition in which the top-strand cleavage took place upstream of the site of the bottom-strand cleavage (fig. 1B, arrow 6b; Gilbert et al. 2002, 2005; Ichiyanagi, Nakajima, et al. 2007). It has also been proposed that a substantial fraction of TST-associated retrotransposition involves an EN-independent mechanism for target-site selection (pathway E in fig. 1B; Morrish et al. 2002, 2007; Anzai et al. 2005; Ichiyanagi, Nakajima, et al. 2007; Sen et al. 2007). As described above, TSD integrants of the L1 clade elements show consensus target sequences, a definite sign of EN-dependent retrotransposition. Therefore, we analyzed the target sequences for their TST integrants. Our previous study on zebrafish L2 clade elements (Ichiyanagi, Nakajima, et al. 2007) revealed a bimodal distribution of TST lengths, discriminating short and long TSTs (≤12 bp and ≥13 bp, respectively). We thus classified TST integrants according to this definition. The cow L1 (L1_BT) collection included 12 short TSTs and 5 long TSTs. Among the short TST integrants, the 5'-↓AAAA-3' sequence is conserved (fig. 3B), indicating EN-dependent retrotransposition. Interestingly, the long TST integrants showed a significantly reduced level of conservation (fig. 3C; P = 0.027 by a U test). These results suggest that short and long TSTs are generated by different mechanisms, with short TSTs EN dependent and long TSTs EN independent. The TST-associated opossum L1 (L1_Mdo1) integrants included 2 short and 4 long TSTs. Although the sample number was small, we noted that the target sequences of long TST integrants did not resemble 5'-↓AAAA-3' (fig. 3E and F), consistent with an EN-independent mechanism. The target sequences of short TST integrants were also dissimilar to 5'-↓AAAA-3'. Thus, L1_Mdo1 might have an EN-independent mechanism even for short target truncations; however, such a conclusion requires more examples. We also collected and analyzed 35 genomic human L1 copies associated with TSTs, including 14 short TSTs and 21 long TSTs, using the TANT method (see Materials and Methods) because our previous analysis identified only a small number of human L1 TST integrants (Ichiyanagi, Nakajima, et al. 2007). Thus, the current data set includes 17 short and 22 long TST integrants of human L1. The data showed that the majority of short TST integrants are inserted in sequences homologous to 5'-↓AAAA-3' (fig. 3H). In contrast, the long TST-associated integrants showed limited conservation (fig. 3I). Indeed, these sequences were a mixture of 2 types of sequences. Whereas 12 sequences were identical, or highly similar (with a single substitution of G for A), to 5'-↓AAAA-3', the other 10 sequences were dissimilar; for example, 5'-↓TCCC-3' and 5'-↓CCTG-3'. Therefore, some of the long TST integrants of human L1 are also likely products of EN-independent retrotransposition, which is consistent with a recent study by Sen et al. (2007) in which TST-associated human L1 copies were identified and characterized by comparison to closely related primate genomes. Altogether, our results underscore the generality of the 2 different (EN-dependent and -independent) targeting mechanisms, although EN-independent retrotransposition occurs less frequently.
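The short/long classification applied above reduces to a single length threshold, sketched here in Python. The names SHORT_TST_MAX_BP, classify_tst, and tally_tst_classes are hypothetical; only the 12-bp and 13-bp boundaries come from the text.

```python
from collections import Counter

# Boundary taken from the text: short TSTs are <= 12 bp, long TSTs are >= 13 bp.
SHORT_TST_MAX_BP = 12


def classify_tst(length_bp: int) -> str:
    """Label a target-site truncation as 'short' or 'long' by its length in bp."""
    if length_bp < 1:
        raise ValueError("a TST must remove at least 1 bp")
    return "short" if length_bp <= SHORT_TST_MAX_BP else "long"


def tally_tst_classes(lengths_bp):
    """Count short and long TSTs in a collection of truncation lengths."""
    return Counter(classify_tst(n) for n in lengths_bp)


if __name__ == "__main__":
    # Illustrative lengths only, not data from this study.
    print(tally_tst_classes([1, 12, 13, 2500]))
```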

Various Features at the 5' Junction
Features of the 5' junctions for the NLR integrants likely reflect the joining reactions that occur between the NLR and target DNA. Whereas it is well established that the covalent attachment at the 3' junction is guaranteed by the usage of the target DNA strand to prime reverse transcription by the NLR-encoded protein, the mechanism for 5' joining has been conjectural. Sequence analyses on human L1 and zebrafish L1 and L2 elements have revealed that, at the 5' junction, integrants either share some sequence with their target DNA (called microhomology or MH stretch) or carry insertion of extranucleotides of unknown origin, suggesting the presence of alternative mechanisms to ligate the target and NLR DNA (Moran et al. 1996; Gilbert et al. 2002, 2005; Symer et al. 2002; Martin et al. 2005; Zingler et al. 2005; Babushok et al. 2006; Ichiyanagi and Okada 2006; Ichiyanagi, Nakajima, et al. 2007).

In a fraction of the integrants analyzed here, the 5' junctions were joined via an MH stretch (33–68% of integrants; table 1, 8th row), whereas some integrants (9–61%; table 1, 10th row) carried extra 5' nucleotides and the rest were joined directly to their targets (0–26%; table 1, 9th row). Therefore, various joining patterns at the 5' junction are a conserved feature of these NLRs, likely due to plural pathways for ligation at the 5' junction.

To test if the pattern of the 5' junction is dictated by the host rather than the NLR clade (Ichiyanagi, Nakajima, et al. 2007), we compared the fractions of integrants with an insertion of extra 5' nucleotides (table 1, 10th row). For human L1 integrants, we used the data set of our previous collection (the left column of L1PA in table 1) and did not include the data collected in this work (the right column of L1PA in table 1) because the latter has a strong assortment bias to select for minor integrants that are associated with a TST. Whereas NLRs of the same clade in different hosts show varied fractions for extra 5' nucleotides (e.g., 9–25% for L1s and 22–61% for RTEs), there seems to be a trend that the fraction of integrants carrying extra 5' nucleotides is conserved among NLRs in the same host (e.g., ~27% for opossum L1 and RTE, and 52–61% for zebrafish L2, CR1, and RTE). However, there are exceptions (opossum L2 and zebrafish L1) having a fraction different from those of other NLRs in the same host. Thus, rigorous validation requires more data. Unfortunately, other NLRs in the representative hosts analyzed here are either absent or too ancient to be analyzed by the TANT method.

The 3' Junction: Footprints of the Reactions at Initiation of Reverse Transcription
The 3' ends of the NLRs analyzed consist of polyadenosine (L1s) or tandem repeats of 3- to 8-bp sequences (L2s, CR1s, and RTEs), such as (ACT)n for RTE_Mdo and (GATTCTAT)n for chicken CR1. For the majority of these NLR integrants (58–83% of those analyzed), the 3' terminal sequence (1 to several bp in length) of the NLRs overlaps with the end of the target sequence (table 1, 11th row), with just 1 exception, L2_Mars (only 28%). Therefore, the frequent overlaps between the NLR and target sequences at the 3' junctions in retrotransposon products, called 3' MH, are a conserved feature among these NLRs. The 3' MH is likely generated by retrotransposition reactions in which the NLR RNA becomes base paired with the EN-cleaved strand (i.e., the primer strand) of the target duplex DNA to facilitate the initiation of reverse transcription (fig. 1B, arrow 5a; Ostertag and Kazazian 2001b; Kulpa and Moran 2006; Ichiyanagi, Nakajima, et al. 2007).

Insertion of Extranucleotides at the 3' Junction and Its Correlation with TST
A portion (17–55%) of the NLR integrants carries an insertion of extranucleotides (1–156 bp) at the 3' junction (table 1 bottom). Some of these extra sequences contain repeats of mono-, di-, or tetranucleotides (data not shown). We suggest that these extra 3' nucleotides were likely generated by the NLR-encoded RTs rather than some sort of DNA synthesis by host-encoded DNA polymerases (fig. 1B, arrow 10) because it has been demonstrated that the L1- and R2-encoded RTs have an activity to append some nucleotides at the primer end before initiating canonical reverse transcription (Luan and Eickbush 1995; Cost et al. 2002).

We previously reported that TST integrants of zebrafish L2 NLRs are frequently associated with the insertion of extra 3' nucleotides (Ichiyanagi, Nakajima, et al. 2007; fig. 4F). The integrants of NLRs analyzed here also show such correlation (fig. 4A, C, and G–K), except for L1_BT, L1_DR, and L2_Mars (fig. 4B, D, and E). For example, of 53 TSD integrants of chicken CR1, only 4 (8%) carried extra 3' nucleotides, whereas significantly more TST integrants (17 out of 26; 65%) carried such extranucleotides (P = 5 × 10⁻⁸ by a χ² test; fig. 4G). These data suggest that the correlation between target truncation and insertion of extra 3' nucleotides is a general feature for NLR retrotransposition. As the extra 3' nucleotides are likely results of noncanonical DNA synthesis by the RT, such action could lead to subsequent top-strand cleavage at an unusual site (for short TSTs; pathways C and D in fig. 1B). It is also possible that extranucleotides are often synthesized when the NLR-encoded protein utilizes an endogenous DNA break as a primer (for long TSTs; pathway E in fig. 1B).
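The chicken CR1 comparison quoted above can be reproduced approximately with a χ² independence test on the 2 × 2 table of counts (4 of 53 TSD integrants and 17 of 26 TST integrants carrying extra 3' nucleotides). The sketch below uses only the Python standard library; the function name is hypothetical, and it assumes a Pearson χ² statistic with 1 degree of freedom and no continuity correction, a detail the text does not specify.

```python
import math


def chi2_independence_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test of independence for the 2 x 2 table [[a, b], [c, d]].

    Returns (chi2, p_value). No continuity correction is applied (assumption).
    For 1 degree of freedom the upper tail probability is erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Rows: TSD and TST integrants of chicken CR1.
    # Columns: with and without extra 3' nucleotides (counts from the text).
    chi2, p = chi2_independence_2x2(4, 53 - 4, 17, 26 - 17)
    print(f"chi2 = {chi2:.2f}, P = {p:.1e}")
```

Under these assumptions the P value is of the same order as the value reported in the text; a continuity-corrected test would give a somewhat larger P value.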


FIG. 4.— Two-dimensional matrix analysis for the interrelatedness of the target-site alteration and the 3' junction feature. Integrants of each NLR were first categorized as having a TSD or TST. These integrants were then further categorized according to the absence (white) or presence (black) of extra 3' nucleotides. The number of samples in each category is indicated within the rectangles. The stars on the right indicate the statistical significance with P values of 0.039 (single star), 0.012 (2 stars), or <0.001 (3 stars) by χ² independence tests. The names of the NLRs are indicated at the top of each panel with the NLR clades and the hosts in parentheses. For L1PA integrants (A), previously reported (Ichiyanagi, Nakajima, et al. 2007) and current data are combined. The data for (D) and (F) are from our previous studies (Ichiyanagi and Okada 2006; Ichiyanagi, Nakajima, et al. 2007).

 

    Discussion
Cultured cell-based studies as well as biochemical studies have afforded many mechanistic insights into NLR retrotransposition. However, application of these cell-based studies has been thus far limited to a small number of species. For vertebrate NLRs, only L1 clade elements in human and rodents have been extensively studied, whereas retrotransposition of other NLR clades and even of L1 elements present in other vertebrates remains poorly understood. The TANT method, which uses genomic sequence information, provides an opportunity to study retrotransposition products of a variety of NLRs in various species. Indeed, the study presented here constitutes the first comprehensive analysis of genomic integrants of all NLR clades present in fish, birds, marsupials, and eutherians and highlights the generality of reactions during retrotransposition.

Mechanism for the Top-Strand Cleavage
Our data revealed the profiles of TSD length in NLR retrotransposition products. The lengths of the TSDs flanking the genomic copies of L1PAs (Szak et al. 2002), BovB (Nijman et al. 2002), and RTE_Mdo (Gentles et al. 2007) were analyzed previously; however, these analyses did not include TSDs of ≤10 bp because the short direct repeats identified could not be statistically validated as authentic TSDs. One of the advantages of the TANT method is that TSDs as short as 1 bp can be identified, providing complete TSD length profiles. We found that the length distribution of TSDs is conserved among NLRs of the same clade, whereas distantly related NLRs show different length distributions even if they are present in the same host. These results argue in favor of the idea that TSD is generated by the action of NLR-encoded proteins.

During retrotransposition, both strands of the target duplex DNA must be cleaved to insert the NLR DNA, and the position of the top-strand cleavage (fig. 1B, arrow 6a) relative to the bottom-strand cleavage (fig. 1B, arrow 4) likely determines the length of the TSD. It has been demonstrated that the bottom strand is cleaved by the NLR-encoded EN during TPRT reactions (Luan et al. 1993; Cost et al. 2002). On the other hand, the mechanism of the top-strand cleavage is not well understood. However, it is conceivable that the top strand is also cleaved by the NLR-encoded EN. Indeed, biochemical results using the R2Bm protein (an insect NLR) have led to the proposal that the bottom- and top-strand cleavages are ordered reactions catalyzed by a homodimer of the EN/RT protein (Christensen and Eickbush 2005). In this model, 1 subunit is responsible for the bottom-strand cleavage and reverse transcription and the other subunit cleaves the top strand. Thus, the spatial distance between the 2 EN catalytic sites in the homodimer dictates the length of the TSDs. Such distance is likely well conserved among NLRs in the same clade, although there is a caveat: Some fraction of the cow L1 and RTE elements showed shorter TSDs than other NLRs of the same clade (fig. 2A and D). In cow, nucleolytic degradation might more often occur to shorten the single-stranded 3' overhang generated by the top-strand cleavage. In any event, the peaks in the TSD length distribution for these cow elements are the same as those for other NLRs of the same clade. Therefore, our data support the generality of the mechanism for the bottom- and top-strand cleavages, positions of which are determined by the configuration of the NLR-encoded protein.

EN-Independent Retrotransposition
Whereas the canonical retrotransposition pathway involves target DNA cleavage by the NLR-encoded ENs, cell-based studies of L1 (Morrish et al. 2002; Morrish et al. 2007) and R1Bm (an insect R1 clade NLR; Anzai et al. 2005), a study on noncanonical human L1 insertions (Sen et al. 2007), and our previous TANT study (Ichiyanagi, Nakajima, et al. 2007) have proposed that retrotransposition can proceed independently of the EN activity at low frequencies. Here, we identified several cow and opossum L1 integrants where the target sequences are dissimilar to the consensus sequence. Most of these integrants are associated with long (≥13 bp) TST, which is consistent with the idea that long TST is a result of EN-independent retrotransposition (Gilbert et al. 2005; Ichiyanagi, Nakajima, et al. 2007). The most likely mechanism that can substitute for the EN action in providing a TPRT primer is the utilization of an endogenous DNA break (fig. 1B, pathway E; Morrish et al. 2002; Ichiyanagi, Nakajima, et al. 2007; Sen et al. 2007). Indeed, repair of such breaks is often concomitant with DNA truncation (Lieber et al. 2003). In any event, our data provide genomic evidence for the presence of the EN-independent NLR retrotransposition pathway in various hosts.
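The 2 features discussed above, a target sequence dissimilar to the consensus and a long TST, can be flagged per integrant with a sketch like the one below. Only the ≥13 bp threshold comes from the text; the function names, the mismatch cutoff, and the consensus string used in the example are placeholders chosen for illustration, not values taken from this study.

```python
def count_mismatches(site, consensus):
    """Count positions at which two equal-length sequences differ."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(1 for a, b in zip(site.upper(), consensus.upper()) if a != b)


def flag_integrant(site, consensus, truncation_bp,
                   max_mismatches=1, long_truncation_bp=13):
    """Flag features compatible with an EN-independent insertion.

    max_mismatches is a hypothetical cutoff for "dissimilar to consensus";
    long_truncation_bp follows the >=13 bp definition of long TST.
    """
    return {
        "dissimilar_target": count_mismatches(site, consensus) > max_mismatches,
        "long_truncation": truncation_bp >= long_truncation_bp,
    }


# Placeholder consensus and invented target sites, for illustration only.
PLACEHOLDER_CONSENSUS = "TTAAAA"
canonical = flag_integrant("TTAAAA", PLACEHOLDER_CONSENSUS, truncation_bp=0)
candidate = flag_integrant("GCCGTA", PLACEHOLDER_CONSENSUS, truncation_bp=20)
```

The flags are kept separate rather than combined into a single call, because the text reports that most, not all, of the integrants with dissimilar targets carry a long TST.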

Multiple Pathways for NLR Retrotransposition
Although reverse transcription is central to the NLR retrotransposition process, there are several steps before and after it. These include target DNA selection and cleavage, priming of reverse transcription, and ligation at the 5' junction. Previous studies (Gilbert et al. 2005; Ichiyanagi, Nakajima, et al. 2007) have suggested that there are multiple mechanisms at each of these steps, giving rise to many distinct pathways for the mobility of human L1 and zebrafish L2 elements. However, the generality of such pathways remained in question. The present analysis revealed that the various integration features are largely conserved among diverse NLRs. The interrelationship between the addition of extra 3' nucleotides and target truncation is also conserved. These results provide genomic evidence that different clades of NLRs in various hosts share multiple strategies for their amplification.


    Acknowledgements
We thank Drs Kenji Kojima and Jun Suzuki for critical reading of the manuscript. This work was supported by a grant-in-aid to N.O. from the Ministry of Education, Culture, Sports, Science and Technology, Japan, and by the 21st Century Center of Excellence program of the Ministry.


    Footnotes
 
Naoko Takezaki, Associate Editor


    References

    Anzai T, Osanai M, Hamada M, Fujiwara H. Functional roles of 3'-terminal structures of template RNA during in vivo retrotransposition of non-LTR retrotransposon, R1Bm. Nucleic Acids Res (2005) 33:1993–2002.

    Babushok DV, Ostertag EM, Courtney CE, Choi JM, Kazazian HH Jr. L1 integration in a transgenic mouse model. Genome Res (2006) 16:240–250.

    Becker KG, Swergold GD, Ozato K, Thayer RE. Binding of the ubiquitous nuclear transcription factor YY1 to a cis regulatory sequence in the human LINE-1 transposable element. Hum Mol Genet (1993) 2:1697–1702.

    Belancio VP, Hedges DJ, Deininger P. LINE-1 RNA splicing and influences on mammalian gene expression. Nucleic Acids Res (2006) 34:1512–1521.

    Chimera JA, Musich PR. The association of the interspersed repetitive KpnI sequences with the nuclear matrix. J Biol Chem (1985) 260:9373–9379.

    Christensen SM, Eickbush TH. R2 target-primed reverse transcription: ordered cleavage and polymerization steps by protein subunits asymmetrically bound to the target DNA. Mol Cell Biol (2005) 25:6617–6628.

    Cost GJ, Boeke JD. Targeting of human retrotransposon integration is directed by the specificity of the L1 endonuclease for regions of unusual DNA structure. Biochemistry (1998) 37:18081–18093.

    Cost GJ, Feng Q, Jacquier A, Boeke JD. Human L1 element target-primed reverse transcription in vitro. EMBO J (2002) 21:5899–5910.

    Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res (2004) 14:1188–1190.

    Deininger PL, Moran JV, Batzer MA, Kazazian HH Jr. Mobile elements and mammalian genome evolution. Curr Opin Genet Dev (2003) 13:651–658.

    Feng Q, Moran JV, Kazazian HH Jr., Boeke JD. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell (1996) 87:905–916.

    Gentles AJ, Wakefield MJ, Kohany O, Gu W, Batzer MA, Pollock DD, Jurka J. Evolutionary dynamics of transposable elements in the short-tailed opossum Monodelphis domestica. Genome Res (2007) 17:992–1004.

    Gilbert N, Lutz-Prigge S, Moran JV. Genomic deletions created upon LINE-1 retrotransposition. Cell (2002) 110:315–325.

    Gilbert N, Lutz S, Morrish TA, Moran JV. Multiple fates of L1 retrotransposition intermediates in cultured human cells. Mol Cell Biol (2005) 25:7780–7795.

    Ichiyanagi K, Nakajima R, Kajikawa M, Okada N. Novel retrotransposon analysis reveals multiple mobility pathways dictated by hosts. Genome Res (2007) 17:33–41.

    Ichiyanagi K, Nishihara H, Duvernell DD, Okada N. Acquisition of endonuclease specificity during evolution of L1 retrotransposon. Mol Biol Evol (2007) 24:2009–2015.

    Ichiyanagi K, Okada N. Genomic alterations upon integration of zebrafish L1 elements revealed by the TANT method. Gene (2006) 383:108–116.

    Jordan IK, Rogozin IB, Glazko GV, Koonin EV. Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends Genet (2003) 19:68–72.

    Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase update, a database of eukaryotic repetitive elements. Cytogenet Genome Res (2005) 110:462–467.

    Kapitonov VV, Jurka J. The esterase and PHD domains in CR1-like non-LTR retrotransposons. Mol Biol Evol (2003) 20:38–46.

    Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ. The UCSC Table Browser data retrieval tool. Nucleic Acids Res (2004) 32:D493–D496.

    Kulpa DA, Moran JV. Cis-preferential LINE-1 reverse transcriptase activity in ribonucleoprotein particles. Nat Struct Mol Biol (2006) 13:655–660.

    Kumar S, Tamura K, Nei M. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform (2004) 5:150–163.

    Lieber MR, Ma Y, Pannicke U, Schwarz K. Mechanism and regulation of human non-homologous DNA end-joining. Nat Rev Mol Cell Biol (2003) 4:712–720.

    Lovsin N, Gubensek F, Kordis D. Evolutionary dynamics in a novel L2 clade of non-LTR retrotransposons in deuterostomia. Mol Biol Evol (2001) 18:2213–2224.

    Luan DD, Korman MH, Jakubczak JL, Eickbush TH. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell (1993) 72:595–605.

    Luan DD, Eickbush TH. RNA template requirements for target DNA-primed reverse transcription by the R2 retrotransposable element. Mol Cell Biol (1995) 15:3882–3891.

    Malik HS, Burke WD, Eickbush TH. The age and evolution of non-LTR retrotransposable elements. Mol Biol Evol (1999) 16:793–805.

    Martin SL, Li WL, Furano AV, Boissinot S. The structures of mouse and human L1 elements reflect their insertion mechanism. Cytogenet Genome Res (2005) 110:223–228.

    Moran JV, Holmes SE, Naas TP, DeBerardinis RJ, Boeke JD, Kazazian HH Jr. High frequency retrotransposition in cultured mammalian cells. Cell (1996) 87:917–927.

    Morrish TA, Gilbert N, Myers JS, Vincent BJ, Stamato TD, Taccioli GE, Batzer MA, Moran JV. DNA repair mediated by endonuclease-independent LINE-1 retrotransposition. Nat Genet (2002) 31:159–165.

    Morrish TA, Garcia-Perez JL, Stamato TD, Taccioli GE, Sekiguchi J, Moran JV. Endonuclease-independent LINE-1 retrotransposition at mammalian telomeres. Nature (2007) 446:208–212.

    Nijman IJ, van Tessel P, Lenstra JA. SINE retrotransposition during the evolution of the pecoran ruminants. J Mol Evol (2002) 54:9–16.

    Ogiwara I, Miya M, Ohshima K, Okada N. V-SINEs: a new superfamily of vertebrate SINEs that are widespread in vertebrate genomes and retain a strongly conserved segment within each repetitive unit. Genome Res (2002) 12:316–324.

    Ohshima K, Okada N. SINEs and LINEs: symbionts of eukaryotic genomes with a common tail. Cytogenet Genome Res (2005) 110:475–490.

    Ostertag EM, Kazazian HH Jr. Biology of mammalian L1 retrotransposons. Annu Rev Genet (2001a) 35:501–538.

    Ostertag EM, Kazazian HH Jr. Twin priming: a proposed mechanism for the creation of inversions in L1 retrotransposition. Genome Res (2001b) 11:2059–2065.

    Perepelitsa-Belancio V, Deininger P. RNA truncation by premature polyadenylation attenuates human mobile element activity. Nat Genet (2003) 35:363–366.

    Sen SK, Huang CT, Han K, Batzer MA. Endonuclease-independent insertion provides an alternative pathway for L1 retrotransposition in the human genome. Nucleic Acids Res (2007) 35:3741–3751.

    Speek M. Antisense promoter of human L1 retrotransposon drives transcription of adjacent cellular genes. Mol Cell Biol (2001) 21:1973–1985.

    Sugano T, Kajikawa M, Okada N. Isolation and characterization of retrotransposition-competent LINEs from zebrafish. Gene (2006) 365:74–82.

    Symer DE, Connelly C, Szak ST, Caputo EM, Cost GJ, Parmigiani G, Boeke JD. Human L1 retrotransposition is associated with genetic instability in vivo. Cell (2002) 110:327–338.

    Szak ST, Pickeral OK, Makalowski W, Boguski MS, Landsman D, Boeke JD. Molecular archeology of L1 insertions in the human genome. Genome Biol (2002) 3:research0052.

    Tamura M, Kajikawa M, Okada N. Functional splice sites in a zebrafish LINE and their influence on zebrafish gene expression. Gene (2007) 390:221–231.

    Vandergon TL, Reitman M. Evolution of chicken repeat 1 (CR1) elements: evidence for ancient subfamilies and multiple progenitors. Mol Biol Evol (1994) 11:886–898.

    Volff JN, Korting C, Schartl M. Multiple lineages of the non-LTR retrotransposon Rex1 with varying success in invading fish genomes. Mol Biol Evol (2000) 17:1673–1684.

    Wicker T, Robertson JS, Schulze SR. (11 co-authors). The repetitive landscape of the chicken genome. Genome Res (2005) 15:126–136.

    Zingler N, Willhoeft U, Brose HP, Schoder V, Jahns T, Hanschmann KM, Morrish TA, Lower J, Schumann GG. Analysis of 5' junctions of human LINE-1 and Alu retrotransposons suggests an alternative model for 5'-end attachment requiring microhomology-mediated end-joining. Genome Res (2005) 15:780–789.

    Zupunski V, Gubensek F, Kordis D. Evolutionary dynamics and evolutionary history in the RTE clade of non-LTR retrotransposons. Mol Biol Evol (2001) 18:1849–1863.

Accepted for publication March 6, 2008.

