MBE Advance Access originally published online on June 30, 2007
Molecular Biology and Evolution 2007 24(9):2009-2015; doi:10.1093/molbev/msm130
Research Articles
Acquisition of Endonuclease Specificity during Evolution of L1 Retrotransposon

* Department of Biological Sciences, Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Yokohama, Japan
Department of Biological Sciences, Southern Illinois University, Edwardsville
E-mail: nokada{at}bio.titech.ac.jp.
Abstract
L1 is the most proliferative autonomous retroelement that comprises about 20% of mammalian genomes. Why L1s have proliferated so extensively in mammalian genomes is an important yet unsolved question. L1 copies are amplified via retrotransposition, in which the DNA cleavage specificity by the L1-encoded endonuclease (EN) primarily dictates sites of insertion. Whereas mammalian L1s show target preference for 5'-TTAAAA-3', other L1-like elements exhibit various degrees of target specificity. To gain insights on diversification of the EN specificity during L1 evolution, ENs of zebrafish L1 elements were analyzed here. We revealed that they form 3 discrete clades, M, F, and Tx1, which is in stark contrast to a single L1 clade in mammalian species. Interestingly, zebrafish clade M elements cluster as a sister group of mammalian L1s and show target-site preference for 5'-TTAAAA-3'. In contrast, elements of the clade F, the immediate outgroup of the clade M, show little specificity. We identified certain clade-specific amino acid residues in EN, many of which are located in the cleft that recognizes the substrate, suggesting that these amino acid alterations have generated 2 types of ENs with different substrate specificities. The distribution pattern of the 3 clades suggests a possibility that the acquisition of target specificity by the L1 ENs improved the L1 fitness under the circumstances in mammalian hosts.
Key Words: cleavage specificity; endonuclease; L1; evolution; LINE; retrotransposon
Introduction
Mammalian genomes contain over half a million copies of long interspersed nuclear elements (LINE)-1 (L1), which account for 17–23% of the respective genomes (Lander et al. 2001
Because of this TPRT mechanism, the DNA cleavage specificity of the EN domain primarily determines sites of LINE insertion (Luan et al. 1993; Feng et al. 1996, 1998; Takahashi and Fujiwara 2002). Human L1 preferentially inserts at 5'-TT↓AAAA-3', where "↓" indicates the site of insertion (Moran et al. 1996; Gilbert et al. 2002, 2005; Morrish et al. 2002; Symer et al. 2002), and its EN cleaves the TpA bond in 5'-TTTTAA-3' on the complementary strand (Feng et al. 1996; Cost and Boeke 1998). The mammalian L1s belong to the L1 clade, which includes numerous LINEs of a variety of organisms (Malik et al. 1999). Phylogenetic analysis has suggested that ENs of L1-clade LINEs are the oldest of LINE-encoded AP endonuclease–like ENs (Malik et al. 1999), which share a common ancestral origin with cellular AP endonuclease and DNase I. Because L1-clade elements show varying degrees of target-site specificity (Zingler et al. 2005), important unresolved issues include how the L1-encoded ENs evolved to acquire the target specificity and whether such specificity is implicated in the explosive L1 proliferation in mammals.
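The relationship between the insertion consensus and the sequence cleaved on the complementary strand can be checked with a short sketch. The helper names below are hypothetical and are not part of the study; the sketch only illustrates that the reverse complement of 5'-TTAAAA-3' is 5'-TTTTAA-3', and that a gap between bases at position p on one strand corresponds to position (length - p) on the other.

```python
# Illustrative helpers (hypothetical names, not from the study).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mirror_site(seq: str, site: int) -> int:
    """Map a gap between bases on one strand to the same gap on the other strand."""
    return len(seq) - site


def mark(seq: str, site: int, marker: str = "|") -> str:
    """Insert a marker at the given gap position for display."""
    return seq[:site] + marker + seq[site:]


top = "TTAAAA"                    # insertion consensus, site after the second T
bottom = reverse_complement(top)  # sequence read 5'->3' on the complementary strand

print(mark(top, 2))                          # TT|AAAA
print(mark(bottom, mirror_site(top, 2)))     # TTTT|AA
```

The marked position on the complementary strand falls between the last T and the first A, which is the TpA bond described in the text.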
To better understand L1 dynamics and evolution, it is necessary to study L1s in the nonmammalian vertebrate classes as well as mammalian L1s. The zebrafish genome, for example, has a variety of L1 elements. These L1s form multiple clades, each of which contains retrotranspositionally active elements (Duvernell et al. 2004; Furano et al. 2004; Ichiyanagi and Okada 2006), in contrast to mammalian L1s, which generally form a single clade despite their enormous copy numbers (Smit et al. 1995; Furano 2000; Furano et al. 2004; Khan et al. 2006). Phylogenetic relationships among these fish L1 clades and the mammalian clade remain uncertain in analyses based on the RT sequences. In the present study, we analyzed the ENs of these zebrafish L1 elements. We revealed the presence of 3 clades, the phylogenetic relationships among these and mammalian L1s, their target-site specificity, and some clade-specific amino acid residues in the EN domain that might be involved in defining L1 target specificities. Based on these results, we discuss the impact of the acquisition of target specificity on L1 evolution.
Methods
Phylogenetic Analysis
The amino acid sequences of the EN domains of all LINEs analyzed were deduced from the representative nucleotide sequences available in RepBase (Jurka et al. 2005
Analysis of Target Sites of Genomic Zebrafish L1s
The locations and nucleotide sequences (with 60-bp flanking regions) of L1-1_DR to L1-10_DR and L1-Tx1-1_DR elements in the zebrafish genome (danRer3, May 2005) were downloaded from the University of California, Santa Cruz, genome browser at http://genome.ucsc.edu/ (Hinrichs et al. 2006). Along with these L1 copies, we selected L1_DRs that carry the complete 3' terminus. The genomic copies of L1-SW1_DR to L1-SW4_DR were previously reported (Duvernell et al. 2004). The target-site duplication (TSD) was identified using GENETYX-MAC 10.1.1 (Software Development) with the criteria: 1) the unit of direct repeat is ≥12 bp, 2) the divergence of the 2 units is <10% of their lengths, and 3) the repeated sequences are located within the 60-bp region flanking the LINE (in some cases, the TSD region overlapped the L1 sequence). Then, we inferred the target-site sequences for these L1 copies with a TSD. For instance, if an L1 sequence is flanked by 5'-ATCAGCTTAAAATAGAAGTTTTAG-3' at the 5' end and 5'-AAAATAGAAGTTTTAGTAAATTG-3' at the 3' end (the shared sequence AAAATAGAAGTTTTAG is the TSD), the inferred target sequence for this L1 insertion is 5'-ATCAGCTT↓AAAATAGAAGTTTTAGTAAATTG-3' (↓ indicates the site of insertion).
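The inference step in the worked example can be sketched in a few lines. This is a simplified illustration and not the GENETYX-MAC procedure used in the study: it assumes the TSD is an exact repeat that ends the 5' flank and begins the 3' flank, whereas the stated criteria also allow some divergence and a repeat anywhere within the 60-bp flanks. The function name and signature are hypothetical.

```python
def infer_target_site(flank5: str, flank3: str, min_tsd: int = 12):
    """Infer the pre-insertion target from the two flanks of an element.

    Simplifying assumption: the TSD is an exact repeat that ends flank5 and
    starts flank3. Returns (target_sequence, insertion_index), or None when
    no repeat of at least min_tsd bases is found.
    """
    longest = min(len(flank5), len(flank3))
    # Try the longest candidate repeat first, down to the minimum length.
    for k in range(longest, min_tsd - 1, -1):
        if flank5[-k:] == flank3[:k]:
            # Keep one copy of the TSD; the element sat between the two copies.
            # The insertion site is taken as the position just upstream of the TSD.
            return flank5 + flank3[k:], len(flank5) - k
    return None


# Flanking sequences from the example in the text.
result = infer_target_site("ATCAGCTTAAAATAGAAGTTTTAG", "AAAATAGAAGTTTTAGTAAATTG")
target, site = result
print(target[:site] + "|" + target[site:])  # ATCAGCTT|AAAATAGAAGTTTTAGTAAATTG
```

Searching from the longest candidate downward means that the full 16-bp repeat is reported rather than a shorter repeat nested inside it.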
Results and Discussion
The Endonuclease Phylogeny
It has been reported that the zebrafish genome harbors various L1 elements (Duvernell et al. 2004
Interestingly, the clade M also includes mammalian L1s. The sibling relationship of the mammalian and fish members of clade M has been seen in Neighbor-Joining trees of the RT sequences, but bootstrap supports were low (Furano et al. 2004
Compilation of Target Sites for Zebrafish L1 Insertions
Next, we tried to investigate cleavage specificities of the endonucleases encoded by the 3 clades of L1 elements. Our attempts to obtain recombinant endonucleases of the zebrafish L1s were not successful. Thus, we determined the sequences of their insertion sites because L1 insertion sites reflect the cleavage specificity of the element-encoded ENs (Feng et al. 1996; Cost et al. 2002). Our previous analysis (Ichiyanagi and Okada 2006) revealed that about half of the collected zebrafish L1 copies (clades M and F) have a TSD of 11–20 bp, although this analysis could identify only 20 target-site sequences. Thus, we further collected 200 genomic L1 copies and searched for a direct repeat of 12 bp or longer (allowing <10% divergence between the repeat units) in 60-bp regions flanking each end of an L1 copy. Target sequences were then inferred from the sequence information of genomic copies containing an obvious TSD. We could identify TSDs in 67 and 14 copies of the clades M and F, respectively, whereas no members of the clade Tx1 have an obvious TSD. Our subsequent analysis, therefore, focused on the clades M and F.
The compilation of the target-site sequences revealed striking differences between the clades M and F. All L1s in the clade M have a target preference for sites resembling 5'-TT↓AAAA-3' (fig. 2A), which is the consensus sequence for human L1 targets as well. On the other hand, L1s in the clade F exhibit only very weak specificity, if any, and thus do not show the preference for 5'-TTAAAA-3' (fig. 2B). The difference in the conservation among these target sequences was validated by a Mann–Whitney test for the numbers of nucleotides identical to 5'-TTAAAA-3' in a target sequence (P < 0.00001; fig. 2C). Therefore, L1s of the clades M and F have significantly different target preference for their insertion, implying different degrees of cleavage specificity of their ENs.
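The comparison described above can be illustrated with a minimal sketch: each target hexamer is scored by the number of positions identical to 5'-TTAAAA-3', and the two sets of scores are compared with the Mann–Whitney U statistic. The hexamers below are made up for illustration and are not data from the study; how the hexamer is extracted around each insertion site is also an assumption, and the function names are hypothetical.

```python
CONSENSUS = "TTAAAA"


def identity_score(hexamer: str, consensus: str = CONSENSUS) -> int:
    """Count positions at which the hexamer matches the consensus."""
    return sum(a == b for a, b in zip(hexamer.upper(), consensus))


def mann_whitney_u(x, y) -> float:
    """U statistic for sample x: each pair with x > y counts 1, each tie 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


# Made-up hexamers for illustration only; not data from the study.
group_a = ["TTAAAA", "TTAAGA", "CTAAAA"]   # stand-in for a sequence-specific clade
group_b = ["GCATCG", "TGACAT", "ATAGCA"]   # stand-in for a nonspecific clade

scores_a = [identity_score(h) for h in group_a]
scores_b = [identity_score(h) for h in group_b]
u = mann_whitney_u(scores_a, scores_b)
print(scores_a, scores_b, u)
```

The sketch stops at the U statistic. Obtaining a P value additionally requires the null distribution of U, which a statistics library such as SciPy (`scipy.stats.mannwhitneyu`) provides.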
It has been suggested that ENs of L1s are the oldest of the AP endonuclease-like ENs of LINEs, including the clades of L1, RTE, Tad, R1, LOA, I, CR1, L2, and Jockey (Malik et al. 1999
Clade-Specific Amino Acids on the DNA Recognition Cleft of the EN Domains
Human L1 EN specifically cleaves 5'-TTTT↓AA-3' on the complementary strand of the consensus target sequence, 5'-TTAAAA-3'. Rather than the simple base sequence, this enzyme has been proposed to recognize the special geometry of the 5'-TnAn-3' duplex, which has a narrow minor groove in the A tract and structural flexibility at the T-A step (Cost and Boeke 1998), where base stacking is minimal (Mack et al. 2001; Stefl et al. 2004). Based on the crystal structure of the free EN domain of human L1, a structural model for substrate recognition has been proposed (Weichenrieder et al. 2004). In the model, the adenosine downstream of the scissile bond is flipped out from the helix and the EN domain recognizes this extrahelical adenosine with Phe-193 and Ile-204 interacting with the sugar moiety, and Arg-155 and Ser-202 making hydrogen bonds to the base moiety.
To infer which amino acid alterations are ascribable to the difference in the degree of target-site specificity between the clades M and F, we aligned the amino acid sequences of their ENs (fig. 3). Whereas many residues are conserved among all or most ENs (fig. 3; gray-shaded residues), some residues are clade specific (fig. 3; residues shaded by light pink, pink, light blue, or blue; see caption for color codes). Interestingly, some of these clade-specific residues are clustered in the regions around residues Arg-155 and Phe-193 of human L1 EN. When mapped onto the crystal structure of human L1 EN, the conserved residues are clustered in the protein interior, catalytic center, and bottom of the DNA-binding cleft (fig. 4B and C). In contrast, the clade-specific residues constitute the wall of the DNA-binding cleft, which contains Arg-155 and Phe-193 (fig. 4C). In addition, the finger-like β hairpin (residues from Phe-194 to Tyr-201), which comprises a part of the wall (fig. 4C), carries 4 clade-specific residues and 2 amino acid additions in clade F (fig. 3). Therefore, its configuration is likely altered severely in the clade F ENs. These features argue in favor of the idea that the cumulative amino acid alterations (including indels) at the clade-specific positions are responsible for generating the clade M ENs with the unique target specificity.
Target Specificity and L1 Proliferation
Each zebrafish L1 subfamily seems to include currently active elements, as judged by the presence of genomic copies that are identical, or almost identical, to the consensus sequences. As discussed above, L1s diverged into the 3 clades in the common ancestor of fish and mammals. In the fish lineage, these L1s have retained their activity of proliferation regardless of the degree of target specificity, although the copy numbers are lower than in mammals (100–1,000 copies, fig. 1). Why fish have tolerated the proliferative activities of both sequence-specific and nonsequence-specific L1s is currently unknown. However, the tolerance for nonsequence-specific transposable elements seems to be a characteristic of zebrafish because it also harbors a total of 70,000 copies of L2-, CR1-, and RTE-clade LINEs, all of which exhibit nonsequence-specific insertions (Ichiyanagi et al. 2007
On the other hand, only the clade M elements have maintained their proliferative activity in the mammalian lineage to occupy a substantial fraction of the genomes (about 20%, >500,000 copies in total). Therefore, it is conceivable that the clade M members have gained much better fitness in mammalian hosts. Such better fitness should involve several cumulative factors, such as the regulation of L1 transcription, the efficiency of retrotransposition, and the neutralization of the harmful potential of insertions. We propose that the acquisition of target specificity, rather than a diminution thereof, by L1 endonucleases is one of the factors for the successful amplification of mammalian L1s because the moderate restriction of insertion targets provides a better chance for L1 to be tolerated by host genomes. For example, the preference for 5'-TTAAAA-3' substrates directs the L1 insertion toward noncoding regions, which partly neutralizes the mutagenic toxicity of the L1 insertion (Cost and Boeke 1998) and is thereby better tolerated than random insertion. It is also possible that the cleavage specificity for 5'-TTTT↓AA-3' on the primer strand generates a more efficient L1 retrotransposition machinery by providing a better probability of annealing of the target DNA and the polyA tail of the L1 RNA, which assists the initiation of reverse transcription (Ostertag and Kazazian 2001; Kulpa and Moran 2006). Consistently, zebrafish clade M members have higher copy numbers than clade F elements (fig. 1; P = 0.045 and 0.034 by U and t tests, respectively), although these numbers show relatively large variance.
Mammalian L1 elements have undergone multiple waves of amplification in the last >100 Myr (Furano 2000). After each wave, a new subfamily emerged from a preexisting active subfamily, then it predominated over the predecessor for the replication process, possibly by acquiring a different regulatory sequence in its 5' untranslated region and/or through the rapid evolution of the ORF1 protein (Martin et al. 1985; Adey et al. 1994; Furano 2000; Boissinot and Furano 2001; Khan et al. 2006). Because these L1s are all clade M members (fig. 1 and the supplementary figure, Supplementary Material online), the acquisition of target specificity had little impact on this phenomenon. Rather, as discussed above, our results suggest that the acquired target specificity played an important role in the predomination of clade M members over other clades at earlier stages. In closing, our study underscores that better understanding of the dynamics of nonmammalian vertebrate L1s will provide important information on why L1s have proliferated so extensively in the course of mammalian evolution.
Supplementary Material
In addition to elements shown in figure 1, all other human L1 subfamilies (except for L1PA17 and L1MA4 and 5) were included in the phylogenetic analysis. Multiple alignment, Neighbor-Joining tree construction, and bootstrap calculations were carried out as described in Methods. Supplementary figure is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgements
We thank Mr Mitsuhiro Nakamura for helpful comments on the manuscript. This work was supported by a Grant-in-Aid to N.O. from the Ministry of Education, Culture, Sports, Science and Technology of Japan and by the 21st Century Center of Excellence program of the ministry.
Footnotes
1 Present address: Division of Human Genetics, Department of Integrated Genetics, National Institute of Genetics, Mishima, Shizuoka, Japan.
William Martin, Associate Editor
References
Adey NB, Schichman SA, Graham DK, Peterson SN, Edgell MH, Hutchison CA 3rd. Rodent L1 evolution has been driven by a single dominant lineage that has repeatedly acquired new transcriptional regulatory sequences. Mol Biol Evol (1994) 11:778–789.
Boissinot S, Furano AV. Adaptive evolution in LINE-1 retrotransposons. Mol Biol Evol (2001) 18:2186–2194.
Cost GJ, Boeke JD. Targeting of human retrotransposon integration is directed by the specificity of the L1 endonuclease for regions of unusual DNA structure. Biochemistry (1998) 37:18081–18093.
Cost GJ, Feng Q, Jacquier A, Boeke JD. Human L1 element target-primed reverse transcription in vitro. EMBO J (2002) 21:5899–5910.
Duvernell DD, Pryor SR, Adams SM. Teleost fish genomes contain a diverse array of L1 retrotransposon lineages that exhibit a low copy number and high rate of turnover. J Mol Evol (2004) 59:298–308.
Feng Q, Moran JV, Kazazian HH Jr, Boeke JD. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell (1996) 87:905–916.
Feng Q, Schumann G, Boeke JD. Retrotransposon R1Bm endonuclease cleaves the target sequence. Proc Natl Acad Sci USA (1998) 95:2083–2088.
Furano AV. The biological properties and evolutionary dynamics of mammalian LINE-1 retrotransposons. Prog Nucleic Acid Res Mol Biol (2000) 64:255–294.
Furano AV, Duvernell DD, Boissinot S. L1 (LINE-1) retrotransposon diversity differs dramatically between mammals and fish. Trends Genet (2004) 20:9–14.
Gentles AJ, Wakefield MJ, Kohany O, Gu W, Batzer MA, Pollock DD, Jurka J. Evolutionary dynamics of transposable elements in the short-tailed opossum Monodelphis domestica. Genome Res (2007) 17:992–1004.
Gibbs RA, Weinstock GM, Metzker ML, et al, (230 co-authors). Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature (2004) 428:493–521.
Gilbert N, Lutz S, Morrish TA, Moran JV. Multiple fates of L1 retrotransposition intermediates in cultured human cells. Mol Cell Biol (2005) 25:7780–7795.
Gilbert N, Lutz-Prigge S, Moran JV. Genomic deletions created upon LINE-1 retrotransposition. Cell (2002) 110:315–325.
Hinrichs AS, Karolchik D, Baertsch R, et al, (27 co-authors). The UCSC genome browser database: update 2006. Nucleic Acids Res (2006) 34:D590–D598.
Hohjoh H, Singer MF. Cytoplasmic ribonucleoprotein complexes containing human LINE-1 protein and RNA. EMBO J (1996) 15:630–639.
Ichiyanagi K, Nakajima R, Kajikawa M, Okada N. Novel retrotransposon analysis reveals multiple mobility pathways dictated by hosts. Genome Res (2007) 17:33–41.
Ichiyanagi K, Okada N. Genomic alterations upon integration of zebrafish L1 elements revealed by the TANT method. Gene (2006) 383:108–116.
Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res (2005) 110:462–467.
Khan H, Smit A, Boissinot S. Molecular evolution and tempo of amplification of human LINE-1 retrotransposons since the origin of primates. Genome Res (2006) 16:78–87.
Kolosha VO, Martin SL. In vitro properties of the first ORF protein from mouse LINE-1 support its role in ribonucleoprotein particle formation during retrotransposition. Proc Natl Acad Sci USA (1997) 94:10155–10160.
Kulpa DA, Moran JV. Cis-preferential LINE-1 reverse transcriptase activity in ribonucleoprotein particles. Nat Struct Mol Biol (2006) 13:655–666.
Kumar S, Tamura K, Nei M. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform (2004) 5:150–163.
Lander ES, Linton LM, Birren B, et al, (255 co-authors). Initial sequencing and analysis of the human genome. Nature (2001) 409:860–921.
Lindblad-Toh K, Wade CM, Mikkelsen TS, et al, (236 co-authors). Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature (2005) 438:803–819.
Luan DD, Korman MH, Jakubczak JL, Eickbush TH. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell (1993) 72:595–605.
Mack DR, Chiu TK, Dickerson RE. Intrinsic bending and deformability at the T-A step of CCTTTAAAGG: a comparative analysis of T-A and A-T steps within A-tracts. J Mol Biol (2001) 312:1037–1049.
Malik HS, Burke WD, Eickbush TH. The age and evolution of non-LTR retrotransposable elements. Mol Biol Evol (1999) 16:793–805.
Martin SL, Bushman FD. Nucleic acid chaperone activity of the ORF1 protein from the mouse LINE-1 retrotransposon. Mol Cell Biol (2001) 21:467–475.
Martin SL, Cruceanu M, Branciforte D, Wai-Lun Li P, Kwok SC, Hodges RS, Williams MC. LINE-1 retrotransposition requires the nucleic acid chaperone activity of the ORF1 protein. J Mol Biol (2005) 348:549–561.
Martin SL, Voliva CF, Hardies SC, Edgell MH, Hutchison CA 3rd. Tempo and mode of concerted evolution in the L1 repeat family of mice. Mol Biol Evol (1985) 2:127–140.
Mikkelsen TS, Hillier LW, Eichler EE, et al, (67 co-authors). Initial sequence of the chimpanzee genome and comparison with the human genome. Nature (2005) 437:69–87.
Mikkelsen TS, Wakefield MJ, Aken B, et al, (235 co-authors). Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature (2007) 447:167–177.
Moran JV, Holmes SE, Naas TP, DeBerardinis RJ, Boeke JD, Kazazian HH Jr. High frequency retrotransposition in cultured mammalian cells. Cell (1996) 87:917–927.
Morrish TA, Gilbert N, Myers JS, Vincent BJ, Stamato TD, Taccioli GE, Batzer MA, Moran JV. DNA repair mediated by endonuclease-independent LINE-1 retrotransposition. Nat Genet (2002) 31:159–165.
Ostertag EM, Kazazian HH Jr. Twin priming: a proposed mechanism for the creation of inversions in L1 retrotransposition. Genome Res (2001) 11:2059–2065.
Smit AFA, Hubley R, Green P. 1996–2004. RepeatMasker Open-3.0 [Internet]. Available from: http://www.repeatmasker.org.
Smit AF, Toth G, Riggs AD, Jurka J. Ancestral, mammalian-wide subfamilies of LINE-1 repetitive sequences. J Mol Biol (1995) 246:401–417.
Stefl R, Wu H, Ravindranathan S, Sklenar V, Feigon J. DNA A-tract bending in three dimensions: solving the dA4T4 vs. dT4A4 conundrum. Proc Natl Acad Sci USA (2004) 101:1177–1182.
Symer DE, Connelly C, Szak ST, Caputo EM, Cost GJ, Parmigiani G, Boeke JD. Human L1 retrotransposition is associated with genetic instability in vivo. Cell (2002) 110:327–338.
Takahashi H, Fujiwara H. Transplantation of target site specificity by swapping the endonuclease domains of two LINEs. EMBO J (2002) 21:408–417.
Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res (1997) 25:4876–4882.
Waterston RH, Lindblad-Toh K, Birney E, et al, (222 co-authors). Initial sequencing and comparative analysis of the mouse genome. Nature (2002) 420:520–562.
Weichenrieder O, Repanas K, Perrakis A. Crystal structure of the targeting endonuclease of the human LINE-1 retrotransposon. Structure (2004) 12:975–986.
Zingler N, Weichenrieder O, Schumann GG. APE-type non-LTR retrotransposons: determinants involved in target site recognition. Cytogenet Genome Res (2005) 110:250–268.