MBE Advance Access originally published online on March 10, 2006
Molecular Biology and Evolution 2006 23(6):1097-1100; doi:10.1093/molbev/msj122
Letter
Two Families of Rep-Like Genes That Probably Originated by Interspecies Recombination Are Represented in Viral, Plasmid, Bacterial, and Parasitic Protozoan Genomes




* School of Botany and Zoology, The Australian National University, Canberra, Australia;
Food Science, University of Wisconsin at Madison;
The Australian Centre for International and Tropical Health and Nutrition, Queensland Institute of Medical Research, Brisbane, Queensland, Australia; and
Department of Microbiology, Russian State Medical University, Moscow, Russia
E-mail: mark.gibbs{at}anu.edu.au.
| Abstract |
|---|
Two families of genes related to, and including, rolling circle replication initiator protein (Rep) genes were defined by sequence similarity and by evidence of intergene family recombination. The Rep genes of circoviruses were the best characterized members of the "RecRep1 family." Other members of the RecRep1 family were Rep-like genes found in the genomes of the Canarypox virus, Entamoeba histolytica, and Giardia duodenalis and in a plasmid, p4M, from the Gram-positive bacterium, Bifidobacterium pseudocatenulatum. The "RecRep2 family" comprised some previously identified Rep-like genes from plasmids of phytoplasmas and similar Rep-like genes from the genomes of Lactobacillus acidophilus, Lactococcus lactis, and Phytoplasma asteris. Both RecRep1 and RecRep2 proteins have a nucleotide-binding domain significantly similar to the helicases (2C proteins) of picorna-like viruses. On the N-terminal side of the nucleotide binding domain, RecRep1 proteins have a domain significantly similar to one found in nanovirus Reps, whereas RecRep2 proteins have a domain significantly similar to one in the Reps of pLS1 plasmids. We speculate that RecRep genes have been transferred from viruses or plasmids to parasitic protozoan and bacterial genomes and that Rep proteins were themselves involved in the original recombination events that generated the ancestral RecRep genes.
Key Words: interspecies recombination; gene family; circovirus; RecRep1; pLS1 plasmid; parasitic protozoan
Protein families often have complex origins and histories. Genes are created, duplicated, and lost, and similarly, gene subsequences that encode protein domains are duplicated, reordered, and lost. Operons and complete genes are also exchanged between different species by horizontal gene transfer or interspecies recombination. Until recently, there was no evidence that sequences that encoded only part of a protein, rather than a complete protein, could be exchanged between organisms to create new genes and, hence, proteins. Gibbs and Weiller (1999) reported that the replication initiator proteins (Reps) of circoviruses contained evidence of this process. These protein sequences had an N-terminal part (115 aa out of 312 aa) related to the Reps of nanoviruses and a C-terminal part (125 aa) related to the 2C proteins of picorna-like viruses. Nanoviruses and circoviruses have similar circular single-stranded DNA genomes (Meehan et al. 1997), but picorna-like viruses have linear single-stranded RNA genomes and do not produce a DNA form at any stage of their replication. Picorna-like viruses are not related in any way to nanoviruses and only through the 2C protein sequence to circoviruses. Hence, the circovirus Rep genes appeared to have originated through recombination, combining gene segments from unrelated viruses. In another report, Oshima et al. (2001) characterized a plasmid, pOYW, from the onion yellows phytoplasma, that had a Rep-like gene. The N-terminal half of the protein sequence (192 aa out of 377 aa) was related to the Rep proteins of pLS1 family plasmids and a C-terminal region (100 aa) was related to the Rep genes of circoviruses and the helicases of picorna-like viruses.
The Reps of circoviruses, nanoviruses, and the pLS1 family plasmids are probably functionally similar (Gruss and Ehrlich 1989; Hafner et al. 1997; del Solar et al. 1998; Cheung 2004). They initiate rolling circle replication by binding at an origin of replication sequence, catalyzing a break (nick) in the plus strand, from which a host-encoded DNA polymerase extends to copy the complementary circle. The Rep probably becomes linked to the nicked DNA and cuts and ligates the copied DNA to reform single-stranded circles. Since the initial analyses of the circovirus and pOYW Reps, the genomes of many more organisms have been sequenced. Here we report the presence of related Rep-like genes in the genomes of a disparate set of organisms and plasmids. We have analyzed the relationships of the newly discovered genes and found that they belong to two families with unusual recombinant relationships; we have also identified conserved motifs in the proteins they encode.
We identified Rep-like genes by searching the GenBank databases (releases from 2000 to mid-2005) using the BlastP and PSI-Blast programs to detect protein sequences translated from the genes (Altschul et al. 1997). The circovirus and pOYW Rep proteins were used as the query sequences, as were the protein sequences translated from each of the new Rep-like genes as they were identified. From these searches, 11 new Rep-like genes were identified (table 1). Related sequences detected in these searches included the 174 known circovirus Reps and four known Rep-like genes in plasmids from phytoplasma closely similar to pOYW. The new Rep-like genes included three apparently complete genes in each of the genomes of Entamoeba histolytica and Lactobacillus acidophilus. Single, apparently complete, Rep-like genes were also found in the genomes of Giardia duodenalis, Lactococcus lactis, Phytoplasma asteris, and the Canarypox virus and in a small double-stranded DNA plasmid, p4M, from B. pseudocatenulatum. Other open reading frames, five in all, were also detected that were most closely related to the complete new Rep-like genes but that lacked long regions of sequence at the 5' or 3' end. Such apparently truncated genes were found in all the listed genomes and one of the plasmids from a phytoplasma.
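The search strategy described here, in which each newly identified protein is fed back in as a query, amounts to a breadth-first closure over search hits. The following is a minimal sketch of that idea only. It assumes a hypothetical `search` callable standing in for a BlastP or PSI-Blast run that returns the identifiers of significant hits; the function names and the toy hit table are invented for illustration and are not data or software from this study.

```python
from collections import deque
from typing import Callable, Iterable, List

def expand_hits(seeds: Iterable[str],
                search: Callable[[str], Iterable[str]]) -> List[str]:
    """Collect every sequence reachable by re-querying with each new hit."""
    ordered = list(dict.fromkeys(seeds))  # de-duplicated seeds, order kept
    found = set(ordered)
    queue = deque(ordered)
    while queue:
        query = queue.popleft()
        for hit in search(query):
            if hit not in found:      # a new sequence becomes a new query
                found.add(hit)
                ordered.append(hit)
                queue.append(hit)
    return ordered

# Toy stand-in for a database search (hypothetical identifiers only).
TOY_HITS = {
    "seed_A": ["hit_1", "hit_2"],
    "hit_1": ["seed_A", "hit_3"],
    "hit_2": [],
    "hit_3": ["hit_1"],
}

def toy_search(query: str) -> List[str]:
    return TOY_HITS.get(query, [])

result = expand_hits(["seed_A"], toy_search)
```

In practice the `search` step would also apply a significance threshold to each hit before it is accepted as a new query; that filtering is left out of the sketch.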
Expect values (E values) obtained from BlastP database searches of the complete "nr" database (January 2005) were used to estimate the significance of similarities between the protein sequences translated from the Rep-like genes. This process was used to assess if the proteins, and therefore the genes, were homologues (Brenner, Chothia, and Hubbard 1998
Evidence of recombination was detected by searching for matching conserved domains using the Conserved Domains database (Marchler-Bauer and Bryant 2004
N-terminal sequences provided a different picture of distinct affinities. Significant similarities were detected between an N-terminal region (95–110 aa long) of the RecRep1 proteins and the conserved domain pfam02407 (table 3). Pfam02407 comprises Reps from nanoviruses and other related viruses in the Nanoviridae. Significant matches were also identified with individual nanovirus and circovirus Reps across the same N-terminal region. By contrast, when searches were made with N-terminal regions of RecRep2 proteins, no homology to pfam02407, pfam00910, or individual nanovirus, circovirus, or picorna-like virus sequences was detected. Instead, significant matches were detected between an N-terminal region (110–130 aa long) and pfam01719 and with individual Reps from pLS1 family plasmids. Pfam01719 comprises replication proteins of the pLS1 (type 2) family of plasmids from Gram-positive bacteria.
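The recombinant relationship between the two families can be summarized as a comparison of domain architectures: the same C-terminal domain paired with different N-terminal domains. The sketch below simply encodes the assignments stated in the text as descriptive strings. The dictionary layout and the `compare` helper are illustrative assumptions, not output from any domain database.

```python
# Domain assignments as described in the text (labels are descriptive only).
ARCHITECTURES = {
    "RecRep1": {
        "N-terminal": "pfam02407 (nanovirus Rep)",
        "C-terminal": "2C-like helicase (nucleotide-binding)",
    },
    "RecRep2": {
        "N-terminal": "pfam01719 (pLS1 family plasmid Rep)",
        "C-terminal": "2C-like helicase (nucleotide-binding)",
    },
}

def compare(family_a: str, family_b: str) -> dict:
    """Sort protein regions into those sharing a domain and those that differ."""
    a, b = ARCHITECTURES[family_a], ARCHITECTURES[family_b]
    shared = [region for region in a if a[region] == b[region]]
    distinct = [region for region in a if a[region] != b[region]]
    return {"shared": shared, "distinct": distinct}

summary = compare("RecRep1", "RecRep2")
```

A shared region combined with a region of different affinity is the pattern that the text interprets as evidence of recombination between otherwise unrelated lineages.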
Alignments of the protein sequences translated from the RecRep1 and RecRep2 genes (see Figures 1 and 2, Supplementary Material online) made using T-Coffee (Notredame, Higgins, and Heringa 2000) supported the evidence of recombination and were consistent with the groupings indicated by the Blast analysis. The alignments showed that all the protein sequences included an NTP-binding site (Walker A motif) within the C-terminal domain (Walker et al. 1982). Both families also appeared to have the "Walker B" motif, although some lacked the residue that starts the motif, that is, arginine or lysine, and the spacing of the elements of the motif was inconsistent. Ilyina and Koonin (1992) reported three other motifs common to plasmids and some DNA viruses that replicate by the rolling circle mechanism, and it was widely believed that the circovirus and nanovirus Reps had the three motifs (Hafner et al. 1997; Mankertz and Hillenbrand 2001). Our alignments of the RecRep1 family proteins, including the circovirus Reps, showed that only the last of the three motifs was present, generally as "EYCSKE." The regions that were previously identified as matching motifs deviated substantially from the motif descriptions. Nanovirus Reps also lack the first two motifs. Oshima et al. (2001) noted that the pOYW Rep-like gene encoded a protein with the motifs described by Ilyina and Koonin (1992). We found that only the second and third of the three motifs were conserved in the RecRep2 proteins. We did identify some other conserved amino acid motifs. They included a motif of the form "(I/L)H(D/N)KD" that lay on the N-terminal side of the two-His motif in the RecRep2 family and two moderately conserved motifs in the N-terminal domain of the RecRep1 family, that is, "(K/R)RWxFT(I/L)NN" and "IxGxEx4–5TPHLQG."
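Motif descriptions of this kind translate directly into regular expressions, which is one way a reader might scan other protein sequences for them. The sketch below is an assumption-laden illustration, not the procedure used in this study: it handles only bracketed alternatives such as "(I/L)" and a lone "x" for any single residue, and the example protein string is invented.

```python
import re
from typing import List, Tuple

def motif_to_regex(motif: str) -> str:
    """Convert motif notation to a regex: (A/B) -> [AB], x -> any residue."""
    parts = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "(":
            close = motif.index(")", i)            # end of the alternatives
            residues = motif[i + 1:close].replace("/", "")
            parts.append("[" + residues + "]")
            i = close + 1
        elif ch == "x":
            parts.append(".")                      # any single residue
            i += 1
        else:
            parts.append(re.escape(ch))            # literal residue
            i += 1
    return "".join(parts)

def find_motif(motif: str, protein: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping match."""
    pattern = re.compile(motif_to_regex(motif))
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]

# Invented sequence used only to exercise the functions.
toy_protein = "MAAIHDKDGG"
matches = find_motif("(I/L)H(D/N)KD", toy_protein)
```

Exact-pattern matching of this sort is stricter than an alignment: a "moderately conserved" motif will be missed wherever a sequence departs from the stated residues, so hits are best treated as candidates to be checked against an alignment.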
The distribution of RecRep genes in a variety of different plasmids and viruses and the existence of domain homologues in other extrachromosomal elements suggest that the RecRep gene lineages originated in such elements. The presence of RecRep genes in the genomes of a small number of disparate cellular organisms is probably explained by their spread with extrachromosomal elements, followed by integration. It seems likely that the RecRep proteins were themselves involved in the integration events, as they probably have DNA binding, cutting, and ligating activity. It is also possible that Rep proteins had some role in the recombination events that generated the first ancestral RecRep1 and RecRep2 genes. There are, of course, alternatives to all these speculations about origins, and we are unsure about the functions of most of the Rep-like proteins. It is surprising to find representatives of these gene families in the genomes of parasitic protozoans. No viruses or plasmid-like DNAs from the groups mentioned here have been found associated with protozoans, although the presence of the RecRep genes suggests that the protozoans have been exposed to extrachromosomal elements that carry them. A search for such elements should now be made.
| Supplementary Material |
|---|
Supplementary Figures 1 and 2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Footnotes |
|---|
Laura Katz, Associate Editor
| References |
|---|
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.
Brenner, S. E., C. Chothia, and T. J. Hubbard. 1998. Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc. Natl. Acad. Sci. USA 95:6073–6078.
Cheung, A. K. 2004. Detection of template strand switching during initiation and termination of DNA replication of porcine circovirus. J. Virol. 78:4268–4277.
del Solar, G., R. Giraldo, M. J. Ruiz-Echevarria, M. Espinosa, and R. Diaz-Orejas. 1998. Replication and control of circular bacterial plasmids. Microbiol. Mol. Biol. Rev. 62:434–464.
Gibbs, M. J., and G. F. Weiller. 1999. Evidence that a plant virus switched hosts to infect a vertebrate and then recombined with a vertebrate-infecting virus. Proc. Natl. Acad. Sci. USA 96:8022–8027.
Gruss, A., and S. D. Ehrlich. 1989. The family of highly interrelated single-stranded deoxyribonucleic acid plasmids. Microbiol. Rev. 53:231–241.
Hafner, G. J., M. R. Stafford, L. C. Wolter, R. M. Harding, and J. L. Dale. 1997. Nicking and joining activity of banana bunchy top virus replication protein in vitro. J. Gen. Virol. 78:1795–1799.
Ilyina, T. V., and E. V. Koonin. 1992. Conserved sequence motifs in the initiator proteins for rolling circle DNA replication encoded by diverse replicons from eubacteria, eucaryotes and archaebacteria. Nucleic Acids Res. 20:3279–3285.
Korf, I., M. Yandell, and J. Bedell. 2003. BLAST: an essential guide to the basic alignment search tool. O'Reilly and Associates, Sebastopol, Calif.
Mankertz, A., and B. Hillenbrand. 2001. Replication of porcine circovirus type 1 requires two proteins encoded by the viral rep gene. Virology 279:429–438.
Marchler-Bauer, A., and S. H. Bryant. 2004. CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 32:W327–W331.
Meehan, B. M., J. L. Creelan, M. S. McNulty, and D. Todd. 1997. Sequence of porcine circovirus DNA: affinities with plant circoviruses. J. Gen. Virol. 78:221–227.
Notredame, C., D. G. Higgins, and J. Heringa. 2000. T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302:205–217.
Oshima, K., S. Kakizawa, H. Nishigawa, T. Kuboyama, S. Miyata, M. Ugaki, and S. Namba. 2001. A plasmid of phytoplasma encodes a unique replication protein having both plasmid- and virus-like domains: clue to viral ancestry or result of virus/plasmid recombination? Virology 285:270–277.
Walker, J. E., M. Saraste, M. J. Runswick, and N. J. Gay. 1982. Distantly related sequences in the alpha- and beta-subunits of ATP synthase, myosin, kinases and other ATP-requiring enzymes and a common nucleotide binding fold. EMBO J. 1:945–951.