MBE Advance Access originally published online on April 7, 2008
Molecular Biology and Evolution 2008 25(7):1395-1404; doi:10.1093/molbev/msn081

© The Author 2008. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Systematic Survey for Novel Types of Prokaryotic Retroelements Based on Gene Neighborhood and Protein Architecture

Kenji K. Kojima*,† and Minoru Kanehisa*

* Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
† Graduate School of Biosciences and Biotechnology, Tokyo Institute of Technology, Yokohama, Kanagawa, Japan

E-mail: kojima.k.ac@m.titech.ac.jp


    Abstract
Retroelements, elements encoding reverse transcriptase (RT), are ubiquitous in eukaryotes and have greatly influenced the evolution of our genome. Detailed information is available on eukaryotic retroelements; however, prokaryotic retroelements are poorly understood. Recently, new types of eukaryotic retroelements were characterized on the basis of their gene composition and their phylogenetic positions. Here we performed a systematic survey to identify novel types of prokaryotic retroelements by analyzing gene neighborhood and protein architecture. We found novel types of gene combination and examined whether they represent actual retroelements. We identified 5 monophyletic groups that were distinct from characterized prokaryotic retroelements, showed a specific gene combination, were patchily distributed, and included at least 1 example of recent integration. These results strongly indicate frequent horizontal transfer of these elements. One group encoded DNA polymerase A. A possible function of DNA polymerase A in the life cycle of these retroelements is to catalyze second-strand cDNA synthesis, that is, DNA polymerization on a DNA template rather than an RNA template. Another group encoded both a bacterial primase and a carbon–nitrogen hydrolase. The primase is likely to synthesize primers to initiate reverse transcription. Two other groups also encoded carbon–nitrogen hydrolase, as a fusion protein with RT. The role of the hydrolase in the life cycle of these retroelements is difficult to infer. The last group encoded 2 distinct RT proteins, which are likely to form heterodimers during replication. The protein sets of these 5 groups of prokaryotic retroelements were completely different from those of eukaryotic retroelements, indicating that prokaryotic elements face survival constraints distinct from those of eukaryotic elements.
These prokaryotic retroelements are likely maintained as extrachromosomal DNA or RNA, or integrated into genomes only accidentally. Our findings raise the possibility that many types of extrachromosomal prokaryotic retroelements remain to be characterized. In addition, we found that 8 RT genes were associated with clustered regularly interspaced short palindromic repeats (CRISPRs) of the CRISPR–Cas system. These RT genes are likely to function in immunity against RNA phages via cDNA synthesis.

Key Words: retroelement • reverse transcriptase • DNA polymerase • primase • CRISPR–Cas system


    Introduction
Reverse transcriptase (RT) is an enzyme that catalyzes DNA polymerization using an RNA template. RT is believed to have played a central role in the transition from the RNA world to the DNA world. At present, however, almost all RT genes are found in mobile genetic elements, for example, retroviruses and retrotransposons (Coffin et al. 1997). Elements that include an RT gene are called retroelements.

RT activity still has a great influence on eukaryotes. Retroviruses such as human immunodeficiency virus and human T-lymphotropic virus are major pathogens causing infectious diseases and cancers (Coffin et al. 1997). More than 40% of our genome is occupied by retrotransposed sequences (Lander et al. 2001), most of which were transposed by the RT of long interspersed nuclear element-1 (L1) (Esnault et al. 2000; Dewannieux et al. 2003). Both L1 insertion and recombination between 2 L1 copies at different loci cause genetic diseases (Deininger et al. 2003; Chen et al. 2005). However, RT activity is not always harmful. Telomerase RT is essential for the maintenance of chromosome termini (Blackburn 2000). L1 retrotransposition occasionally mediates DNA repair (Morrish et al. 2002). The mobilization of both L1 and non-L1 sequences contributes to genome evolution by altering gene expression (van de Lagemaat et al. 2003; Bejerano et al. 2006), forming processed pseudogenes (Esnault et al. 2000), and shuffling exons (Moran et al. 1999).

The RT gene originated in the RNA world and was transmitted to eukaryotes, where it has diversified. Is this gene also present in prokaryotes? Three groups of prokaryotic retroelements have been characterized: group II introns, retrons, and diversity-generating retroelements (DGRs). Group II introns are self-splicing introns that multiply via reverse transcription (Zimmerly et al. 1995; Cousineau et al. 2000). They are the only retroelement group distributed among all 3 domains of life. The RT of retrons produces a short single-stranded DNA that is covalently linked to a short single-stranded RNA; this DNA has been designated multicopy single-stranded DNA (msDNA) (Lampson et al. 2005). Rychlik et al. (2001) reported a retron-like RT that produces double-stranded DNA with single-stranded overhangs, termed sdsDNA. The functions of msDNA and sdsDNA are unknown. DGRs are a recently characterized group of retroelements that generate site-directed, adenine-specific mutations (Doulatov et al. 2004). DGRs consist of 1 RT gene, 1 template repeat (TR), and 1 or 2 regions of variability (VRs) in the coding region of lectin-fold proteins. It has been proposed that the DGR RT reverse transcribes the TR RNA and that the resulting mutagenized TR cDNA replaces the VR.

Recent progress in genome sequencing has led to the identification of many RT gene sequences in both eukaryotic and prokaryotic genomes, which in turn has led to the characterization of new groups of retroelements in eukaryotes (Goodwin and Poulter 2001; Lyozin et al. 2001; Volff et al. 2001; Poulter and Goodwin 2005). The existence of these new groups is supported not only by their phylogenetic positions, which are distant from those of other known retroelements, but also by gene compositions never observed in other retroelements. Dictyostelium intermediate repeat sequence 1 (DIRS-1) and related retrotransposons encode tyrosine recombinase domains downstream of the RT gene (Goodwin and Poulter 2001; Poulter and Goodwin 2005), implying that they integrate by means of a recombinase rather than an integrase. Comparative analyses of another type of retrotransposon, the Penelope-like elements, revealed that they encode GIY-YIG (also called Uri) endonucleases (Lyozin et al. 2001; Volff et al. 2001); this endonuclease activity has been demonstrated experimentally (Pyatkov et al. 2004). The presence of the endonuclease domain, together with frequent 5' truncations, further suggests that these elements use target-primed reverse transcription. These findings indicate that the genetic composition of a retroelement affects its life cycle.

In this study, to investigate the diversity of prokaryotic retroelements, we performed a systematic survey to identify novel genetic compositions of prokaryotic retroelements. We carefully examined whether novel types of gene combination were formed accidentally or represent mobile retroelements. Five monophyletic groups with the same gene combination were patchily distributed among bacteria and included at least 1 example of recent integration. These results strongly indicate that the RT genes and the RT-associated genes of these 5 groups were cotransferred horizontally. We discuss the possible functions of RT-associated genes in these retroelements. We also discuss the structure and function of RT genes associated with the clustered regularly interspaced short palindromic repeat (CRISPR)–Cas system, because RT genes appear to have become coupled with the CRISPR–Cas system on several occasions.


    Materials and Methods
Identification of RT Genes and Putative Retroelements
The Reference Sequence (RefSeq) collection of 26 July 2006 and the National Center for Biotechnology Information (NCBI) Blast 2.2.11 package were downloaded from the NCBI FTP server (ftp://ftp.ncbi.nih.gov/blast/). To find all RT genes in prokaryotic genomes, we searched the RefSeq protein database with position-specific iterated BLAST (PSI-Blast; Altschul et al. 1997), running 3 iterations with representative protein sequences of all characterized retroelement groups as queries (supplementary table S1, Supplementary Material online). Of the protein hits obtained, only proteins from prokaryotes were retained. Proteins with an E value higher than 5.0 in a HMMER 2.3.2 (http://hmmer.janelia.org/) search with RVT_1.hmm (PF00078) were removed. We repeated the PSI-Blast search, using all prokaryotic RT proteins found in the previous round as queries, until no new prokaryotic RT proteins were found. We then obtained the neighboring gene sequences of each RT gene and clustered all neighboring genes with the BLASTCLUST program in the NCBI Blast 2.2.11 package, with the similarity threshold set at 30% identity and the minimum length coverage set at 0.5. The strategies used to screen these clusters for mobile retroelements are described in Results and Discussion. We also manually examined whether RT proteins contain additional domains or motifs by using the NCBI Conserved Domain Database (CDD) search (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi).
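The neighbor-gene clustering step can be pictured as single-linkage clustering: 2 genes fall into the same cluster whenever a chain of qualifying pairwise hits connects them. The Python sketch below assumes BLASTCLUST-like behavior, takes precomputed pairwise hits as input (the `Hit` record and its fields are hypothetical), and assumes that both sequences of a pair must meet the 0.5 coverage threshold. It illustrates the idea only and is not the program used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record of one pairwise alignment between two neighboring genes."""
    query: str
    subject: str
    identity: float          # percent identity of the aligned region
    query_coverage: float    # aligned length / query length
    subject_coverage: float  # aligned length / subject length


def single_linkage_clusters(proteins, hits, min_identity=30.0, min_coverage=0.5):
    """Cluster proteins connected by any chain of qualifying pairwise hits."""
    parent = {p: p for p in proteins}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for h in hits:
        qualifies = (h.identity >= min_identity
                     and h.query_coverage >= min_coverage
                     and h.subject_coverage >= min_coverage)
        if qualifies:
            root_q, root_s = find(h.query), find(h.subject)
            if root_q != root_s:
                parent[root_s] = root_q

    groups = {}
    for p in proteins:
        groups.setdefault(find(p), []).append(p)
    # Largest clusters first; ties broken alphabetically.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


toy_hits = [
    Hit("a", "b", 45.0, 0.9, 0.8),
    Hit("b", "c", 31.0, 0.6, 0.7),
    Hit("c", "d", 25.0, 0.9, 0.9),  # below the identity threshold: no link
]
print(single_linkage_clusters(["a", "b", "c", "d"], toy_hits))
# [['a', 'b', 'c'], ['d']]
```

Note that "a" and "c" end up together although they were never compared directly; this chaining is the defining property of single linkage.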

Phylogenetic Analyses
We collected all RT sequences of putative retroelements, extracted their RT domains using HMMER with RVT_1.hmm followed by manual boundary adjustment, and aligned them with the characterized retroelements using MAFFT 5.6.4 (http://align.bmr.kyushu-u.ac.jp/mafft/software/). We selected 2 sequences for each class of group II introns proposed by Zimmerly et al. (2001). We used all characterized retroelements other than group II introns except DGR_B.thetaiotaomicron and DGR_N.7120_1, because these 2 sequences were truncated in the RT domain. To avoid uncertainty caused by alignment errors, we retained only the sites at which all sequences were aligned without gaps. After this extraction, 139 sites of 82 RT sequences remained. The alignment file is available upon request. We used ModelGenerator (http://bioinf.may.ie/software/modelgenerator/) to obtain the models and parameters for the likelihood analysis. We applied the model Blosum62 + I + G, which was selected by the Akaike information criteria AIC1 and AIC2 and by the Bayesian information criterion (Keane et al. 2006). The maximum likelihood (ML) tree was constructed with 500 bootstrap replicates by Treefinder (http://www.treefinder.de/). The Bayesian phylogenetic inference tree was constructed using MrBayes 3.1 (http://mrbayes.csit.fsu.edu/). The Markov chain Monte Carlo chain length was 500,000 generations with trees sampled every 100 generations (5,000 trees in total); the first 1,250 trees were discarded as burn-in. The Neighbor-Joining (NJ) tree was constructed with 1,000 bootstrap replicates by ClustalX (http://www.embl.de/~chenna/clustal/darwin/). The distance was computed as percent divergence. The phylogenetic trees were drawn with the aid of NJPLOT (http://pbil.univ-lyon1.fr/software/njplot.html) and FigTree (http://evolve.zoo.ox.ac.uk/software.html?id=figtree).
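The gap-free site extraction and the "percent divergence" distance can be sketched in a few lines of Python. This is a minimal illustration on a toy alignment, assuming that "percent divergence" means the uncorrected proportion of differing sites among the retained columns; the exact options used in ClustalX may differ, and the sequence names are hypothetical.

```python
def gap_free_columns(alignment, gap_chars="-."):
    """Keep only the alignment columns in which no sequence has a gap."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("aligned sequences must all have the same length")
    keep = [i for i in range(length)
            if all(alignment[n][i] not in gap_chars for n in names)]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


def percent_divergence(seq_a, seq_b):
    """Uncorrected distance: percentage of compared sites at which two sequences differ."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * differences / len(seq_a)


toy = {"rt1": "MK-LVD", "rt2": "MKALID", "rt3": "MRAL-D"}
trimmed = gap_free_columns(toy)
print(trimmed)                                              # {'rt1': 'MKLD', 'rt2': 'MKLD', 'rt3': 'MRLD'}
print(percent_divergence(trimmed["rt1"], trimmed["rt3"]))  # 25.0
```

Dropping every column that contains a gap in any sequence is strict; with 82 divergent RT sequences it shrinks the alignment sharply, which is consistent with only 139 sites remaining.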


    Results and Discussion
Gene Neighborhood and Protein Architectures of Prokaryotic RT Genes
Using an iterative PSI-Blast search and HMMER, we found 516 prokaryotic RT sequences in the RefSeq protein database (data not shown). Of these, 7 were highly similar to particular eukaryotic retroelements. Because these sequences likely represent either contamination or recent horizontal transfer into prokaryotic genomes, we excluded them from further analysis.

We collected the genes on both sides of the remaining 509 RT genes. After clustering these genes with a sequence identity threshold of >30%, we excluded clusters belonging to the following 4 categories: 1) clusters of transposase genes and their derivatives, 2) clusters of genes from the same genus, 3) clusters with <3 RT gene representatives, and 4) clusters with many similar genes not associated with RT genes. These clusters were excluded because they could have formed accidentally through independent transposon insertions near RT genes, a single ancient RT insertion event, or independent RT gene insertions near members of a large gene family. Finally, we manually examined all remaining clusters and found that 1 cluster resulted from a gene prediction error inside the catalytic RNA domains of group II introns and another from group II intron insertions into the same gene. We also searched for multidomain proteins containing RT using CDD (Marchler-Bauer et al. 2003) and HMMER. Apart from the X domain and/or HNH-type endonuclease domains encoded by group II introns, we found 13 RT proteins carrying additional domains shared with other RT proteins. We added 2 cas1-flanked RT genes (ZP_01359337 and ZP_00766283) to the set of putative retroelements because cas1 was fused to 6 other RT genes. In summary, we found 8 gene families flanked by or fused with 40 RT genes, and 3 examples of a combination of 2 RT genes (table 1). Below we describe the putative retroelements, indicated by the numbers in the left column of table 1.
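The 4 exclusion categories amount to a simple filter over cluster summaries. The Python sketch below is hypothetical throughout: the `NeighborCluster` fields are invented for illustration, and because the text gives no numeric cutoff for category 4, the rule "more unlinked homologs than linked RT genes" is an assumption, exposed as a parameter.

```python
from dataclasses import dataclass


@dataclass
class NeighborCluster:
    """Hypothetical summary of one cluster of genes found next to RT genes."""
    name: str
    is_transposase_like: bool   # transposase or transposase-derived genes
    rt_genes: frozenset         # RT genes that cluster members flank or are fused with
    host_genera: frozenset      # genera of the genomes contributing members
    unlinked_homologs: int      # similar genes elsewhere that are NOT next to an RT gene


def passes_screen(c, min_rt_genes=3, max_unlinked_per_rt=1.0):
    """Return False if the cluster falls into any of the 4 exclusion categories."""
    if c.is_transposase_like:                        # 1) transposases and derivatives
        return False
    if len(c.host_genera) < 2:                       # 2) all members from one genus
        return False
    if len(c.rt_genes) < min_rt_genes:               # 3) fewer than 3 RT representatives
        return False
    if c.unlinked_homologs > max_unlinked_per_rt * len(c.rt_genes):
        return False                                 # 4) mostly unlinked large family (assumed cutoff)
    return True


toy_clusters = [
    NeighborCluster("kept", False, frozenset({"rt1", "rt2", "rt3"}),
                    frozenset({"GenusA", "GenusB", "GenusC"}), 0),
    NeighborCluster("one-genus", False, frozenset({"rt4", "rt5", "rt6"}),
                    frozenset({"GenusA"}), 0),
    NeighborCluster("transposase", True, frozenset({"rt7", "rt8", "rt9"}),
                    frozenset({"GenusA", "GenusB"}), 0),
    NeighborCluster("big-family", False, frozenset({"rt1", "rt4", "rt7"}),
                    frozenset({"GenusA", "GenusB"}), 50),
]
print([c.name for c in toy_clusters if passes_screen(c)])  # ['kept']
```

The screen only removes likely accidental associations; as described above, the surviving clusters still required manual inspection.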


 
Table 1 Genes Flanked or Fused with RT

 
We classified the putative retroelements into groups based on their gene composition and structure (table 1). We divided the putative retroelements encoding hydrolase into 3 groups (E, F1, and F2). Group E encoded a bacterial primase in addition to the hydrolase, and the hydrolase gene was adjacent to the RT gene. In both groups F1 and F2, the hydrolase gene was fused with the RT gene; group F1 proteins were approximately 200 amino acids longer than group F2 proteins. We also divided the cas1-fused RT genes into 1 group and 2 singletons on the basis of the length of the encoded proteins and the presence of cas2 genes. In group H1, cas2 genes were located near the RT–cas1 fusion gene. The singleton H2 (putative retroelement no. 40) had no cas2 genes. The RT protein of the singleton H3 (no. 41) was longer than those of the group H1 retroelements and the singleton H2. Each of the remaining groups consisted of putative retroelements with the same gene composition.

Phylogenetic Analyses
It is important to determine whether the genes flanking or fused with RT genes are components of mobile retroelements. We examined this question by analyzing the phylogeny, distribution, and recent integration of these genes. First, we constructed phylogenetic trees of these putative retroelements and the characterized retroelements with 3 methods: ML, Bayesian phylogenetic inference, and NJ. Figure 1 shows the ML tree. Because the root of the RT protein family is still under debate (Eickbush and Malik 2002), the tree in figure 1 was arbitrarily rooted for convenient display and should be considered unrooted.


Figure 1
 
FIG. 1.— Phylogenetic tree of characterized retroelements and putative retroelements proposed in this study. The tree was arbitrarily rooted for convenient display, although it should be considered unrooted. Putative retroelements are labeled with the numbers shown in table 1, and names of characterized retroelements are shown in gray. Bootstrap values above 50% are shown at the nodes. For each group, the bootstrap value in the ML tree, the posterior probability in the Bayesian inference tree, and the bootstrap value in the NJ tree are shown in parentheses from left to right.

 
Because our goal was to characterize novel types of retroelements, we first investigated whether these putative retroelement groups belong to characterized retroelement groups. According to our phylogenetic analysis, group A retroelements represent 1 lineage of retrons. This assignment was supported by the finding that putative retroelements no. 1 and 2 correspond to retrons Mx65 and Mx162, respectively. Group B included DGR_N.punctiforme and DGR_N.7120-2; therefore, the retroelements in this group are DGRs. The phylogenetic analysis indicated that the members of group C also belong to DGRs, and we found TRs and VRs, the sequences characteristic of DGRs, in all 3 group C elements (data not shown). We therefore did not analyze groups A–C further.

If the genes flanking or fused with RT genes are components of mobile retroelements, the corresponding RT genes are expected to be monophyletic or paraphyletic. Our phylogenetic analysis supported the monophyly of the putative retroelements encoding DNA polymerase A (group D). Group G elements encoded 2 RT genes that were not similar to each other. The upstream RT genes (YP_107211, YP_049163, and YP_682737) of group G clustered together, and the downstream RT genes (YP_107210, YP_049164, and YP_682736) clustered at a position distant from the upstream RT genes. Putative retroelements encoding hydrolase were clustered into 3 lineages (groups E, F1, and F2) consistent with their structure. Although our phylogenetic trees also grouped these 3 lineages together, this clustering was not supported when additional RT sequences were included in the analysis (data not shown). The monophyly of each of groups D, E, F1, F2, and G was statistically supported by all 3 methods (fig. 1, numbers in parentheses). The phylogenetic analysis supported the monophyly of neither the cas1-encoding putative retroelements nor the group H1 retroelements, but only of putative retroelements no. 36–38. This result is consistent with the phylogenetic tree based on cas1 sequences, in which the cas1 genes of putative retroelements no. 39–41 were positioned in distinct lineages (Makarova et al. 2006). The phylogeny and structure of the cas1-encoding putative retroelements strongly imply that RT genes became associated with cas1 multiple times. Although the cas1-encoding putative retroelements were phylogenetically related to group II introns, we did not identify self-splicing intron motifs near these sequences.
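Because the trees are effectively unrooted, "monophyly" here means that a group forms one side of some split (bipartition) of the tree, which is the same as being a clade under some rooting. The Python sketch below checks that condition on an arbitrarily rooted tree written as nested tuples; the tree and labels are toy examples, and statistical support (bootstrap values or posterior probabilities) is deliberately ignored.

```python
def split_sides(tree):
    """Return (all leaves, leaf set below every edge) for a tree of nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, sides = frozenset(), []
    for child in tree:
        child_leaves, child_sides = split_sides(child)
        leaves |= child_leaves
        sides.extend(child_sides)
        sides.append(child_leaves)   # the edge above `child` separates these leaves
    return leaves, sides


def is_monophyletic_unrooted(tree, group):
    """True if `group` is one side of some split, i.e. a clade under some rooting."""
    all_leaves, sides = split_sides(tree)
    group = frozenset(group)
    if not group or not group < all_leaves:
        raise ValueError("group must be a non-empty proper subset of the leaves")
    complement = all_leaves - group
    return any(side == group or side == complement for side in sides)


# Arbitrarily rooted toy tree (taxon labels are illustrative only).
toy_tree = ((("no20", "no21"), "no22"), ("retronA", ("no33", "no34")))
print(is_monophyletic_unrooted(toy_tree, {"no20", "no21", "no22"}))             # True
print(is_monophyletic_unrooted(toy_tree, {"no21", "no22"}))                     # False
print(is_monophyletic_unrooted(toy_tree, {"no20", "no21", "no22", "retronA"}))  # True
```

The last call shows why the rooting matters: that group is not a clade of the displayed rooting, yet it is separated from {no33, no34} by a single edge, so it is a clade once the tree is rerooted.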

Patchy Distribution
We investigated the host organisms of each putative retroelement group. Retroelements are likely to have been horizontally transferred if an organism shares retroelements of a particular group not with closely related organisms but with distantly related ones. For example, the host organisms of group E were beta-proteobacteria (Bordetella), alpha-proteobacteria (Paracoccus), and green sulfur bacteria (Chlorobium). Because many proteobacterial genomes have been completely sequenced, the patchy distribution of group E elements is clear. Similarly, groups D, F1, F2, and G showed patchy distributions (table 1) and were therefore likely to have been horizontally transferred. There was no evidence for patchy distribution of the monophyletic cas1-encoding putative retroelements (no. 36–38), because their hosts, Chlorobium and Pelodictyon, are closely related green sulfur bacteria.
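One ingredient of this reasoning, how deeply the hosts of a group share their taxonomy, can be computed as the common prefix of their lineages. The Python sketch below uses simplified lineages written for illustration only (check them against a current taxonomy before reuse). A shallow shared lineage is necessary but not sufficient for patchiness: the argument also requires that sequenced close relatives of each host lack the element, which this sketch does not test.

```python
def shared_lineage(lineages):
    """Longest common prefix of taxonomic lineages written from the root downward."""
    common = []
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            common.append(ranks[0])
        else:
            break
    return common


# Simplified lineages for illustration: domain, phylum, class, order, family, genus.
group_e_hosts = [
    ["Bacteria", "Proteobacteria", "Betaproteobacteria", "Burkholderiales", "Alcaligenaceae", "Bordetella"],
    ["Bacteria", "Proteobacteria", "Alphaproteobacteria", "Rhodobacterales", "Rhodobacteraceae", "Paracoccus"],
    ["Bacteria", "Chlorobi", "Chlorobia", "Chlorobiales", "Chlorobiaceae", "Chlorobium"],
]
no36_38_hosts = [
    ["Bacteria", "Chlorobi", "Chlorobia", "Chlorobiales", "Chlorobiaceae", "Chlorobium"],
    ["Bacteria", "Chlorobi", "Chlorobia", "Chlorobiales", "Chlorobiaceae", "Pelodictyon"],
]
print(shared_lineage(group_e_hosts))  # ['Bacteria']
print(shared_lineage(no36_38_hosts))  # ['Bacteria', 'Chlorobi', 'Chlorobia', 'Chlorobiales', 'Chlorobiaceae']
```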

Analysis for Recent Integration Events
Comparing the genome sequences of closely related organisms can reveal recent integration events of putative retroelements. Using ~40-kb sequences surrounding each putative retroelement as queries, we performed BlastN searches against all genomes of the same genus on the NCBI BLAST with microbial genomes Web site (http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi). We found 7 integration events and classified them into 2 types (table 1, daggers and double daggers). Detailed information on these strain-specific inserts is shown in supplementary table S2 (Supplementary Material online).
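A strain-specific insert shows up as a stretch of the query that no alignment to the related genome covers. The Python sketch below finds such stretches from a list of hit coordinates; it assumes the hits have already been parsed into (start, end) query coordinates, and both the minimum insert length and the toy coordinates are hypothetical.

```python
def uncovered_intervals(query_length, hits, min_length=1000):
    """Return query regions (1-based, inclusive) not covered by any hit.

    `hits` holds (start, end) query coordinates of alignments to a related
    genome, given in either order.
    """
    spans = sorted((min(s, e), max(s, e)) for s, e in hits)
    gaps, next_uncovered = [], 1
    for start, end in spans:
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= query_length:
        gaps.append((next_uncovered, query_length))
    return [(s, e) for s, e in gaps if e - s + 1 >= min_length]


# Toy 40-kb query whose middle has no counterpart in the other strain.
print(uncovered_intervals(40_000, [(1, 11_000), (12_000, 10_500), (30_001, 40_000)]))
# [(12001, 30000)]
```

Overlapping hits are merged implicitly by tracking the first uncovered position, so the sketch runs in one pass after sorting.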

The first type of integration event (table 1 and supplementary table S2 [Supplementary Material online], daggers) involved retroelements that had hitchhiked on bacteriophages and were integrated into genomes as parts of prophages. We found 2 such phage integrations, and in both cases we were able to identify the target site duplications generated by phage integration. Both inserts encoded phage integrase family proteins. However, we were unable to determine the boundaries of the retroelements inside the phages. Retrons and DGRs have also been reported inside prophages (Doulatov et al. 2004; Lampson et al. 2005).
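A target site duplication is a short sequence found immediately on both sides of an insert, because the target site was copied during integration. The Python sketch below looks for the longest exact duplication at a pair of already-known insert boundaries; the flank sequences are toys, and the minimum and maximum lengths are assumptions. Real searches usually tolerate mismatches and must first locate the boundaries, which this sketch takes as given.

```python
def target_site_duplication(left_flank, right_flank, min_len=4, max_len=20):
    """Longest k with left_flank[-k:] == right_flank[:k] (case-insensitive), else ''."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    left, right = left_flank.upper(), right_flank.upper()
    upper_bound = min(max_len, len(left), len(right))
    for k in range(upper_bound, min_len - 1, -1):
        if left[-k:] == right[:k]:
            return right[:k]
    return ""


# Toy flanks: 'GATTACA' is present on both sides of the insertion point.
print(target_site_duplication("AAAACGATTACA", "GATTACATTTGG"))  # GATTACA
print(repr(target_site_duplication("AAAAAA", "CCCCCC")))        # ''
```

A minimum length above 1 or 2 matters: very short exact repeats occur by chance at almost any junction.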

In the other 5 integration events (table 1 and supplementary table S2 [Supplementary Material online], double daggers), retroelements were integrated at highly variable loci whose gene content differed between strains. For example, retroelement no. 21 was located in a 17,793-bp strain-specific sequence of Pseudomonas syringae pv. phaseolicola 1448A. The same locus of P. syringae pv. syringae B728a was occupied by the gene for the insecticidal toxin protein TcdA1, and P. syringae pv. tomato str. DC3000 contained 14 genes, including nonribosomal peptide synthetase genes, at this locus. In these 5 cases, we found no target site duplications, even though 2 of the integrants (those of retroelements no. 21 and 33) contained phage integrase genes. We also found 1 retron and 1 DGR inserted at highly variable loci without target site duplications (data not shown). These loci are likely hot spots for foreign DNA insertion, and the integrations were independent of retroelement machinery.

We characterized putative retroelements belonging to groups F1 (no. 26) and F2 (no. 30) as parts of phage integrations, and those belonging to groups D (no. 21), E (no. 23), F1 (no. 27), F2 (no. 32), and G (no. 33) as parts of strain-specific inserts at highly variable loci. These integration positions are similar to those of retrons and DGRs. These integrations indicate that these putative retroelements do not possess their own integration mechanisms and are integrated passively into the genomes of either phages or host bacteria. We could not identify the boundaries of any of the putative retroelements encoding cas1 (no. 36–43).

The phylogeny, patchy distribution, and recent integration into genomes strongly imply that the RT genes of groups D, E, F1, F2, and G and their neighboring genes are not separated but are transferred together. Therefore, it is reasonable to regard groups D, E, F1, F2, and G as additional groups of prokaryotic retroelements, although additional sequence information and experiments are required to confirm the mobility of these elements. The putative retroelements encoding cas1 do not appear to be typical retroelements, but the frequent coupling between RT and cas1 genes suggests that this coupling confers some advantage. Below we discuss the possible functions of the genes encoded by each retroelement group and by the cas1-encoding putative retroelements.

DNA Polymerase A (Group D)
The genes downstream of the RT genes in retroelements no. 20–22 contained the DNA polymerase A motif (CDD search E value, 9 × 10^−21) (figs. 2A and 3). The DNA polymerase A domain is found in prokaryotic DNA polymerase I, mitochondrial DNA polymerase gamma, and T-odd bacteriophage DNA polymerases (Patel et al. 2001). Our sequence alignments and phylogenetic analysis showed that the DNA polymerase A genes flanking RT genes constitute an independent lineage distinct from other DNA polymerases and are most closely related to DNA polymerase I (fig. 3; data not shown). The phylogeny of DNA polymerase A was consistent with that of RT (data not shown).


Figure 2
 
FIG. 2.— Schematic structures of retroelements. Each retroelement is represented by the number shown in table 1. Protein-coding genes are shown as boxed arrows. RT genes are shown in black, and genes associated with RT are shaded.

 

Figure 3
 
FIG. 3.— Alignment of DNA polymerase A conserved motifs. Asterisks indicate residues invariant among all DNA polymerase A sequences. Plus symbols indicate residues invariant among group D and DNA polymerase I, whereas minus symbols indicate residues invariant among all DNA polymerase A sequences other than group D. NAP1, Erythrobacter sp. NAP1; 1448A, Pseudomonas syringae pv. phaseolicola 1448A; CJ2, Polaromonas naphthalenivorans CJ2; K12, Escherichia coli K12; Yeast, Saccharomyces cerevisiae.

 
DNA polymerase I contains 3 domains (5'–3' exonuclease, 3'–5' exonuclease, and DNA polymerase A), whereas retroelement DNA polymerase A contains only a DNA polymerase A domain (data not shown). Because the 3'–5' exonuclease proofreads by removing mismatched nucleotides, retroelement DNA polymerase, which lacks this domain, is likely to introduce more substitutions than DNA polymerase I. Another observation also suggests that the retroelement DNA polymerases tolerate many substitutions: the 2 residues (N and D) that are invariant in conserved region 6 of the DNA polymerase A domain are substituted in the retroelement DNA polymerases (fig. 3, minus symbols). Mutation of the asparagine residue in Escherichia coli DNA polymerase I did not affect its RT activity but resulted in low fidelity (Minnick et al. 1999).
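The column marks used in figure 3 follow directly from which sets of sequences are invariant at each alignment position. The Python sketch below reproduces that notation on toy sequences (they are not the real motifs). It treats a gap as breaking invariance, which is an assumption. When at least 1 DNA polymerase I sequence is present, '+' and '-' cannot both apply to a column, because that column would then be invariant overall and marked '*'.

```python
def is_invariant(column, gap="-"):
    """A column is invariant if every sequence has the same, non-gap residue."""
    return gap not in column and len(set(column)) == 1


def _column(seqs, i):
    """Residues at alignment position i for a list of aligned sequences."""
    return [s[i] for s in seqs]


def conservation_line(group_d, pol_i, others):
    """Figure-legend style marks for each alignment column.

    '*' invariant in all sequences; '+' invariant in group D plus DNA polymerase I;
    '-' invariant in all sequences other than group D; ' ' otherwise.
    """
    all_seqs = group_d + pol_i + others
    length = len(all_seqs[0])
    if any(len(s) != length for s in all_seqs):
        raise ValueError("sequences must be aligned to the same length")
    marks = []
    for i in range(length):
        if is_invariant(_column(all_seqs, i)):
            marks.append("*")
        elif is_invariant(_column(group_d + pol_i, i)):
            marks.append("+")
        elif is_invariant(_column(pol_i + others, i)):
            marks.append("-")
        else:
            marks.append(" ")
    return "".join(marks)


# Toy alignment: group D carries S and D/E where all other sequences share N and D.
print(conservation_line(["KSDV", "KSEV"], ["KNDV"], ["RNDV"]))  # +--*
```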

One possible function of retroelement DNA polymerase A is synthesis of the second-strand cDNA after reverse transcription. Second-strand cDNA synthesis is a DNA-dependent DNA polymerization reaction, whereas first-strand cDNA synthesis is an RNA-dependent DNA polymerization reaction. Although retroviral RT can use DNA as a template and actually synthesizes double-stranded cDNA in vivo (Coffin et al. 1997), the RT encoded by group II introns was reported to have very weak activity on DNA templates (Smith et al. 2005). Retroelements that encode a DNA polymerase may therefore have the advantage of being able to synthesize stable double-stranded cDNA rapidly.

Bacterial Primase (Group E)
Three RT genes (no. 23–25) were positioned between bacterial primase genes and hydrolase genes (fig. 2B). The 3 host organisms had canonical primase genes in addition to the retroelement primase genes (supplementary fig. S1, Supplementary Material online). The retroelement primases could supply RNA primers for reverse transcription. Retroelements prepare primers in various ways, even though RT is potentially able to initiate DNA polymerization without primers (Wang and Lambowitz 1993). The retrovirus Rous sarcoma virus uses cellular tRNAs (Harada et al. 1975), the long terminal repeat (LTR) retrotransposon Tf1 uses the 3' end of its template RNA (Levin 1995), and the non-LTR retrotransposon R2 uses the 3' end of nicked target DNA as a primer (Luan et al. 1993). Protein priming and 2'-OH priming have also been reported (Wang and Seeger 1992; Lampson et al. 2005). With their own primase, group E retroelements may be able to prepare primers efficiently.

Hydrolase (Groups E, F1, and F2)
As described above, group E retroelements included predicted hydrolase genes (fig. 2B). In addition, we found 7 RT genes fused with hydrolase genes (table 1, no. 26–32). These hydrolase genes belonged to the same protein family, the carbon–nitrogen hydrolase family Pfam00795, which includes nitrilase, aliphatic amidase, biotinidase, and carbamylase (Bork and Koonin 1994; Pace and Brenner 2001). This hydrolase family has been classified into 13 branches (fig. 4), and the substrates of 9 branches have been characterized. These proteins hydrolyze carboxyamides (-CO-NH2), internal amide bonds (-CO-NH-) between methylene (-CH2-) chains, terminal carbamyl groups (-NH-CO-NH2), and cyano groups (-CN), and some have reverse hydrolase activities. Their functions are not restricted to metabolic pathways: some are thought to control protein degradation (N-terminal amidase), to suppress tumors (Nit and NitFhit), or to modify proteins posttranslationally (NB12). Because the retroelement hydrolases showed no close relationship to any particular branch of characterized hydrolases (fig. 4), their functions can only be inferred from the general features of the carbon–nitrogen hydrolase family. The known substrate specificities of this family indicate that peptide bonds are not suitable substrates. It is unlikely that retroelement hydrolases function in metabolic pathways. They may instead function in posttranslational protein modification, or in signal cascades by hydrolyzing amide bonds to regulate protein degradation.


Figure 4
 
FIG. 4.— Alignment of catalytic motifs of the carbon–nitrogen hydrolase family. Sequences of all retroelements and consensus sequences of all 13 branches are shown. Uppercase and lowercase letters in consensus sequences indicate residues conserved in >80% and >50%, respectively, of the sequences of each branch, and dots show sites with low conservation. Asterisks indicate the 3 invariant catalytic residues.
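The consensus notation in figure 4 can be generated mechanically from each branch's aligned sequences. The Python sketch below applies the stated thresholds (uppercase above 80%, lowercase above 50%, dot otherwise) to toy sequences. Treating a column dominated by gaps as a dot is an assumption, as is the exact handling of ties, which cannot affect the output here because a residue above 50% is always the unique most common one.

```python
from collections import Counter


def consensus_line(sequences, upper=0.8, lower=0.5, gap="-"):
    """Uppercase if the commonest residue exceeds `upper`, lowercase if it exceeds `lower`, else '.'."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    n = len(sequences)
    out = []
    for column in zip(*sequences):
        residue, count = Counter(column).most_common(1)[0]
        fraction = count / n
        if residue != gap and fraction > upper:
            out.append(residue.upper())
        elif residue != gap and fraction > lower:
            out.append(residue.lower())
        else:
            out.append(".")
    return "".join(out)


print(consensus_line(["CEK", "CEK", "CDR", "CDQ", "CER"]))  # Ce.
```

Because both thresholds are strict, a residue present in exactly 4 of 5 sequences (80%) is written in lowercase, not uppercase.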

 
RT (Group G)
Retroelements no. 33–35 comprised 2 RT genes (fig. 2C). The sequences of the upstream and downstream RT genes were distinct; therefore, the pair did not result from tandem duplication of 1 RT gene. These RTs may coordinate their functions during replication. Considering that dimerization of RT proteins is observed in retroelements such as retroviruses and non-LTR retrotransposons (Coffin et al. 1997; Christensen and Eickbush 2005), the 2 RT proteins encoded by group G retroelements may form a heterodimer.

Cas1 (Group H1, Singletons H2 and H3, and Doublet H4)
The 3 types of analyses performed in this study did not indicate that cas1-associated RT genes are mobile elements. We consider these RT genes to be accessory components of the CRISPR–Cas system, an acquired immune system in prokaryotes, because the cas1 gene is an essential component of this system (Makarova et al. 2006; Barrangou et al. 2007). The CRISPR–Cas system is composed of 3 essential components and many nonessential components. The 3 essential components are 2 genes (cas1 and cas2) and unusual repeats named CRISPRs. A CRISPR consists of a direct repeat of ~28 to 40 bp whose copies are separated by unique spacers of ~25 to 40 bp. A CRISPR–Cas system in Streptococcus thermophilus was shown to provide immunity against DNA phages carrying sequences identical to the spacers in the system (Barrangou et al. 2007). One characteristic of the CRISPR–Cas system is its heterogeneity. More than 20 protein families have been reported as components of the CRISPR–Cas system; however, only 2 components (cas1 and cas2) are common to almost all characterized CRISPR–Cas systems. In addition, the gene order in this system is highly variable. These facts indicate the redundancy of nonessential components and the existence of accessory pathways.
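The repeat-spacer layout of a CRISPR means that, once the repeat is known, the spacers can be read off as the sequences between consecutive repeat copies. The Python sketch below does this with exact matching on a toy array; real arrays often contain degenerate repeat copies (especially the last one), which would need approximate matching, and the 8-bp toy repeat is far shorter than real ~28 to 40 bp repeats.

```python
def split_crispr(array, repeat):
    """Return the spacers between consecutive exact, non-overlapping copies of `repeat`."""
    if not repeat:
        raise ValueError("repeat must be non-empty")
    positions = []
    start = array.find(repeat)
    while start != -1:
        positions.append(start)
        start = array.find(repeat, start + len(repeat))
    return [array[left + len(repeat):right]
            for left, right in zip(positions, positions[1:])]


# Toy array; real CRISPR repeats are ~28-40 bp and spacers ~25-40 bp.
toy_array = "AT" + "GTTTAAAC" + "ACGTACGT" + "GTTTAAAC" + "TTGGCCAA" + "GTTTAAAC" + "CC"
print(split_crispr(toy_array, "GTTTAAAC"))  # ['ACGTACGT', 'TTGGCCAA']
```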

We observed both cas2 genes and CRISPRs near putative retroelements no. 36–39 and 41–43, and CRISPRs near element no. 40 (supplementary fig. S2, Supplementary Material online). Because more than 90 CRISPRs have been reported from bacteria and archaea (Makarova et al. 2006) and most of them have no RT gene in their neighborhood, RT activity is unnecessary for CRISPR formation. Thus, RT genes are nonessential components of the CRISPR–Cas system.
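The neighborhood argument can be illustrated with a simple interval check. In this hypothetical Python sketch, genes and CRISPR arrays are intervals on a single linear replicon, RT genes are identified by a label, and "nearby" means within a fixed window of base pairs. The `Feature` type, the label "RT", and the 10-kb default window are assumptions for illustration, not the neighborhood criteria used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical genomic interval on one linear replicon (1-based, inclusive)."""
    name: str
    start: int
    end: int


def gap_bp(a: Feature, b: Feature) -> int:
    """Number of base pairs between two intervals; 0 if they overlap or abut."""
    return max(0, max(a.start, b.start) - min(a.end, b.end) - 1)


def crisprs_with_nearby_rt(crisprs: list[Feature], genes: list[Feature],
                           window: int = 10_000, rt_label: str = "RT") -> list[str]:
    """Names of CRISPR arrays with at least one RT gene within `window` bp.

    Circular replicons are not handled: distances use linear coordinates only.
    """
    rt_genes = [g for g in genes if g.name == rt_label]
    return [c.name for c in crisprs
            if any(gap_bp(c, rt) <= window for rt in rt_genes)]


if __name__ == "__main__":
    crisprs = [Feature("CRISPR1", 1_000, 2_000), Feature("CRISPR2", 500_000, 501_000)]
    genes = [Feature("cas1", 2_100, 3_000), Feature("RT", 3_100, 4_500),
             Feature("RT", 900_000, 901_000)]
    near = crisprs_with_nearby_rt(crisprs, genes)
    print(near, len(near) / len(crisprs))  # ['CRISPR1'] 0.5
```

Applied across many genomes, the fraction printed at the end corresponds to the observation in the text: if most arrays lack a nearby RT gene, RT cannot be required for CRISPR formation.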

One possible function of RT in the CRISPR–Cas system is synthesizing cDNA from RNA phage genomes. CRISPR–Cas systems without RT genes can acquire new spacers derived from DNA phages (Barrangou et al. 2007). If the RT in the CRISPR–Cas system synthesizes cDNA from RNA phage genomes, the other components of the system could insert new spacers derived from RNA phages as well as from DNA phages. If so, the RT could provide immunity against RNA phages, which would be advantageous. To investigate this possibility, we searched for spacers derived from RNA phages, but we identified no spacers derived from any mobile genetic element (data not shown). This is probably because few sequence data are available for phages infecting the bacteria in whose genomes we found RT genes associated with the CRISPR–Cas system. The actual function of RT in the CRISPR–Cas system remains to be elucidated.
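A spacer search of the kind described here can be sketched as a two-strand string match. In this hypothetical Python sketch, RNA phage genomes are rewritten in DNA letters (U replaced by T) so that they can be compared with spacer sequences, and only exact matches on either strand are reported. Practical searches usually rely on alignment tools such as BLASTN that allow mismatches; the procedure actually used in this study is not described in the text, and the function names are illustrative.

```python
# Complement table for DNA; N (unknown base) maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_protospacers(spacers: dict[str, str],
                      genomes: dict[str, str]) -> list[tuple[str, str, str]]:
    """Exact spacer matches on either strand of each (RNA or DNA) phage genome.

    Returns (spacer_id, genome_id, strand) tuples: '+' when the spacer itself
    occurs in the genome, '-' when its reverse complement does.
    """
    hits = []
    for genome_id, genome in genomes.items():
        # Write RNA genomes in DNA letters; searching both strands then also
        # covers matches to the complementary cDNA strand.
        dna = genome.upper().replace("U", "T")
        for spacer_id, spacer in spacers.items():
            s = spacer.upper()
            if s in dna:
                hits.append((spacer_id, genome_id, "+"))
            if reverse_complement(s) in dna:
                hits.append((spacer_id, genome_id, "-"))
    return hits


if __name__ == "__main__":
    phages = {"phageA": "AUGGCUACGGAUUCC"}  # hypothetical RNA phage fragment
    spacers = {"sp1": "GCTACGG", "sp2": "GGAATCC", "sp3": "TTTTTTT"}
    print(find_protospacers(spacers, phages))
    # [('sp1', 'phageA', '+'), ('sp2', 'phageA', '-')]
```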

Novel Types of Prokaryotic Retroelements
We identified 5 groups of prokaryotic retroelements (groups D, E, F1, F2, and G). None of their gene combinations has been found in eukaryotic retroelements, indicating that retroelements face different constraints in prokaryotes and eukaryotes. Prokaryotic genomes are much more compact than eukaryotic genomes; therefore, retrotransposons, the most common type of retroelement in eukaryotes, could not survive for long in prokaryotic genomes. Indeed, we found no multicopy RT sequences other than group II introns, and the highest copy number of group II introns was 11.

In eukaryotes, hepadnaviruses, caulimoviruses, and mitochondrial retroplasmids do not need to integrate into host genomes to replicate and are maintained extrachromosomally (Coffin et al. 1997; Eickbush and Malik 2002); they encode neither integrase nor endonuclease. Likewise, none of the 5 prokaryotic retroelement groups found in this study encodes an enzyme responsible for integration. On the basis of their patchy distribution and their passive integration into genomes, we consider that these prokaryotic retroelements exist mainly as extrachromosomal DNA or RNA. Their extrachromosomal existence would explain why we found only a few examples of each group. These retroelements may exist as short single-stranded RNA, single-stranded DNA, double-stranded DNA, or RNA–DNA hybrids. Many extrachromosomal prokaryotic retroelements may remain to be discovered.


    Supplementary Material
Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Acknowledgements
This work was supported by grants from the Ministry of Education, Culture, Sports, Science and Technology and the Japan Science and Technology Agency. Computational resources were provided by the Bioinformatics Center, Institute for Chemical Research, Kyoto University. K.K.K. is the recipient of a Grant-in-Aid from the Japan Society for the Promotion of Science for Young Scientists.


    Footnotes
 
Hervé Philippe, Associate Editor


    References

    Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 25:3389–3402.

    Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science (2007) 315:1709–1712.

    Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature (2006) 441:87–90.

    Blackburn EH. The end of the (DNA) line. Nat Struct Biol (2000) 7:847–850.

    Bork P, Koonin EV. A new family of carbon-nitrogen hydrolases. Protein Sci (1994) 3:1344–1346.

    Chen JM, Stenson PD, Cooper DN, Ferec C. A systematic analysis of LINE-1 endonuclease-dependent retrotranspositional events causing human genetic disease. Hum Genet (2005) 117:411–427.

    Christensen SM, Eickbush TH. R2 target-primed reverse transcription: ordered cleavage and polymerization steps by protein subunits asymmetrically bound to the target DNA. Mol Cell Biol (2005) 25:6617–6628.

    Coffin JM, Hughes SH, Varmus HE. Retroviruses (1997) Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press.

    Cousineau B, Lawrence S, Smith D, Belfort M. Retrotransposition of a bacterial group II intron. Nature (2000) 404:1018–1021.

    Deininger PL, Moran JV, Batzer MA, Kazazian HH Jr. Mobile elements and mammalian genome evolution. Curr Opin Genet Dev (2003) 13:651–658.

    Dewannieux M, Esnault C, Heidmann T. LINE-mediated retrotransposition of marked Alu sequences. Nat Genet (2003) 35:41–48.

    Doulatov S, Hodes A, Dai L, Mandhana N, Liu M, Deora R, Simons RW, Zimmerly S, Miller JF. Tropism switching in Bordetella bacteriophage defines a family of diversity-generating retroelements. Nature (2004) 431:476–481.

    Eickbush TH, Malik HS. Origins and evolution of retrotransposons. In: Craig NL, Craigie R, Gellert M, Lambowitz AM, eds. Mobile DNA II (2002) Washington (DC): American Society for Microbiology Press. 1111–1144.

    Esnault C, Maestre J, Heidmann T. Human LINE retrotransposons generate processed pseudogenes. Nat Genet (2000) 24:363–367.

    Goodwin TJ, Poulter RT. The DIRS1 group of retrotransposons. Mol Biol Evol (2001) 18:2067–2082.

    Harada F, Sawyer RC, Dahlberg JE. A primer ribonucleic acid for initiation of in vitro Rous sarcoma virus deoxyribonucleic acid synthesis. J Biol Chem (1975) 250:3487–3497.

    Keane TM, Creevey CJ, Pentony MM, Naughton TJ, McInerney JO. Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evol Biol (2006) 6:29.

    Lampson BC, Inouye M, Inouye S. Retrons, msDNA, and the bacterial genome. Cytogenet Genome Res (2005) 110:491–499.

    Lander ES, Linton LM, Birren B, et al. (100 co-authors). Initial sequencing and analysis of the human genome. Nature (2001) 409:860–921.

    Levin HL. A novel mechanism of self-primed reverse transcription defines a new family of retroelements. Mol Cell Biol (1995) 15:3310–3317.

    Luan DD, Korman MH, Jakubczak JL, Eickbush TH. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell (1993) 72:595–605.

    Lyozin GT, Makarova KS, Velikodvorskaja VV, Zelentsova HS, Khechumian RR, Kidwell MG, Koonin EV, Evgen'ev MB. The structure and evolution of Penelope in the virilis species group of Drosophila: an ancient lineage of retroelements. J Mol Evol (2001) 52:445–456.

    Makarova KS, Grishin NV, Shabalina SA, Wolf YI, Koonin EV. A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action. Biol Direct (2006) 1:7.

    Marchler-Bauer A, Anderson JB, DeWeese-Scott C, et al. (27 co-authors). CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res (2003) 31:383–387.

    Minnick DT, Bebenek K, Osheroff WP, Turner RM Jr., Astatke M, Liu L, Kunkel TA, Joyce CM. Side chains that influence fidelity at the polymerase active site of Escherichia coli DNA polymerase I (Klenow fragment). J Biol Chem (1999) 274:3067–3075.

    Moran JV, DeBerardinis RJ, Kazazian HH Jr. Exon shuffling by L1 retrotransposition. Science (1999) 283:1530–1534.

    Morrish TA, Gilbert N, Myers JS, Vincent BJ, Stamato TD, Taccioli GE, Batzer MA, Moran JV. DNA repair mediated by endonuclease-independent LINE-1 retrotransposition. Nat Genet (2002) 31:159–165.

    Pace HC, Brenner C. The nitrilase superfamily: classification, structure and function. Genome Biol (2001) 2:REVIEWS0001.

    Patel PH, Suzuki M, Adman E, Shinkai A, Loeb LA. Prokaryotic DNA polymerase I: evolution, structure, and "base flipping" mechanism for nucleotide selection. J Mol Biol (2001) 308:823–837.

    Poulter RT, Goodwin TJ. DIRS-1 and the other tyrosine recombinase retrotransposons. Cytogenet Genome Res (2005) 110:575–588.

    Pyatkov KI, Arkhipova IR, Malkova NV, Finnegan DJ, Evgen'ev MB. Reverse transcriptase and endonuclease activities encoded by Penelope-like retroelements. Proc Natl Acad Sci USA (2004) 101:14719–14724.

    Rychlik I, Sebkova A, Gregorova D, Karpiskova R. Low-molecular-weight plasmid of Salmonella enterica serovar Enteritidis codes for retron reverse transcriptase and influences phage resistance. J Bacteriol (2001) 183:2852–2858.

    Smith D, Zhong J, Matsuura M, Lambowitz AM, Belfort M. Recruitment of host functions suggests a repair pathway for late steps in group II intron retrohoming. Genes Dev (2005) 19:2477–2487.

    van de Lagemaat LN, Landry JR, Mager DL, Medstrand P. Transposable elements in mammals promote regulatory variation and diversification of genes with specialized functions. Trends Genet (2003) 19:530–536.

    Volff JN, Hornung U, Schartl M. Fish retroposons related to the Penelope element of Drosophila virilis define a new group of retrotransposable elements. Mol Genet Genomics (2001) 265:711–720.

    Wang GH, Seeger C. The reverse transcriptase of hepatitis B virus acts as a protein primer for viral DNA synthesis. Cell (1992) 71:663–670.

    Wang H, Lambowitz AM. The Mauriceville plasmid reverse transcriptase can initiate cDNA synthesis de novo and may be related to reverse transcriptase and DNA polymerase progenitor. Cell (1993) 75:1071–1081.

    Zimmerly S, Guo H, Perlman PS, Lambowitz AM. Group II intron mobility occurs by target DNA-primed reverse transcription. Cell (1995) 82:545–554.

    Zimmerly S, Hausner G, Wu X. Phylogenetic relationships among group II intron ORFs. Nucleic Acids Res (2001) 29:1238–1250.

Accepted for publication April 2, 2008.

