MBE Advance Access originally published online on August 22, 2006
Molecular Biology and Evolution 2006 23(11):2220-2233; doi:10.1093/molbev/msl092
© The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Molecular Evolution of the tprC, D, I, K, G, and J Genes in the Pathogenic Genus Treponema

RR Gray*, CJ Mulligan*, BJ Molini†, ES Sun‡, L Giacani†, C Godornes†, A Kitchen*, SA Lukehart†,‡ and A Centurion-Lara†,‡

* Department of Anthropology, University of Florida
† Department of Medicine, University of Washington
‡ Department of Pathobiology, University of Washington

E-mail: mulligan@anthro.ufl.edu.


    Abstract
We investigated the evolution of 6 genes from the Treponema pallidum repeat (tpr) gene family, which encode potential virulence factors and are assumed to have evolved through gene duplication and gene conversion events. The 6 loci (tprC, D, G, J, I, and K) were sequenced and analyzed in several members of the genus Treponema, including the 3 subspecies of human T. pallidum (T. pallidum subsp. pallidum, pertenue, and endemicum), Treponema paraluiscuniculi (rabbit syphilis), and the unclassified Fribourg-Blanc (simian) isolate. Phylogenetic methods, recombination analysis, and measures of nucleotide diversity were used to investigate the evolutionary history of the tpr genes. Numerous instances of gene conversion were detected by all 3 approaches, including both homogenizing gene conversion that involved the entire length of the sequence and site-specific conversions that affected smaller regions. We determined the relative age and directionality of the gene conversion events whenever possible. Our data are also relevant to the evolution of the treponemes themselves. Higher levels of variation exist between the human subspecies than within them, supporting the classification of the human treponemes into 3 subspecies. In contrast to published theories, the divergence and diversity of T. pallidum subsp. pertenue relative to the other subspecies do not support a much older origin of yaws at the emergence of modern humans, nor is the level of divergence seen in T. pallidum subsp. pallidum consistent with a very recent (<500 years) origin of this subspecies. In general, our results demonstrate that intragenomic recombination has played a significant role in the evolution of the studied tpr genes and emphasize that efforts to infer the evolutionary history of the treponemes can be misled if past recombination events are not recognized.

Key Words: Treponema • tpr genes • phylogeny • gene conversion • evolution • syphilis


    Introduction
The evolution of bacterial genomes has been heavily influenced by processes such as horizontal gene transfer and homologous recombination, both of which can accelerate adaptation through the generation of new alleles (Feavers et al. 1992; Baldo et al. 2006). Horizontal (or lateral) gene transfer is an intergenomic event in which genetic material is taken up from another genome; its mechanisms include transformation, conjugation, and transduction (Ochman et al. 2000). Homologous recombination, which is typically an intragenomic event, also occurs with high frequency in bacterial genomes (Maynard Smith et al. 1991; Feil and Spratt 2001; Feil et al. 2001). Several outcomes may arise from a recombination event, including translocations, deletions, duplications, inversions, and gene conversions (Hughes 2000). Gene conversions are intragenomic events that result from a nonreciprocal transfer of genetic information from a donor locus to a recipient locus, either through the permanent transfer of genetic material to the recipient locus or through the temporary use of the donor sequence as a template for DNA synthesis on the recipient strand (Santoyo and Romero 2005).

Gene conversion is especially important in the evolution of gene families (Slightom et al. 1980; Drouin et al. 1999; Lathe and Bork 2001; Noonan et al. 2004). Gene families are composed of paralogous genes, defined as two or more genes within the same genome that are so similar in DNA sequence that they are assumed to have originated from one ancestral gene (King and Stansfield 1997). The event that created a gene family was thus likely to be one or more duplications. The high sequence homology between paralogous genes that signals a past duplication event also sets the stage for future homologous recombination between them (Schimenti 1994; Posada et al. 2002). Orthologous genes, on the other hand, share sequence homology and are assumed to descend from a common ancestral gene but are present in different species (King and Stansfield 1997; Gogarten and Olendzenski 1999); such genes most likely arose through speciation rather than duplication. Recombination can significantly affect inferred phylogenetic relationships (Feil et al. 1999; Holmes et al. 1999; Feil and Spratt 2001; Worobey 2001). In gene families, gene conversion can cause paralogous genes to cluster more closely than orthologous genes, thus obscuring the order in which the organisms diverged (Drouin et al. 1999).

There are 2 seemingly opposite outcomes of gene conversion, concerted evolution and increased sequence diversity, which may be characteristic of different stages of multigene family evolution (Santoyo and Romero 2005). After a gene family has been generated by ancient duplication events, paralogous and orthologous comparisons should exhibit the same degree of divergence. If the paralogous comparisons are more similar, then the genes in a multigene family are evolving in a nonindependent manner, leading to homogenization of the genes, or concerted evolution (Ohta 1990; Howell-Adams and Seifert 2000; Liao 2000; Lathe and Bork 2001). This may be beneficial when a weakly advantageous point mutation arises in one gene and its effect is multiplied as the entire gene sequence is converted to other loci (Dover 2002). This is consistent with the proposal that purifying selection may operate on duplicated genes, on the assumption that a duplicated gene must confer an initial benefit on the organism and that its sequence must therefore be conserved (Lynch and Conery 2000; Kondrashov et al. 2002). As the sequences accumulate neutral diversity, though, gene conversion becomes less efficient. Over time, only small "islands" of homology remain, and a site-specific system of shorter gene conversions may take over, the outcome of which is increased sequence variation (Zhang et al. 1992, 1997; Zhang and Norris 1998; Santoyo and Romero 2005; Taguchi et al. 2005). This is consistent with Ohno (1970), who suggested that duplicated genes are under less selective pressure and may accumulate more mutations, leading to loss of the paralog or creation of a new function (Kimura and King 1979; Walsh 1995; Wagner 1998; Lynch and Force 2000). Thus, concerted evolution and increased sequence diversity may indicate earlier and later stages, respectively, in the evolution of gene families (Santoyo and Romero 2005).

In this study, we examine genes in the tpr (Treponema pallidum repeat) gene family in members of the genus Treponema (spirochete bacteria) to investigate the evolution of the gene family and, possibly, the evolution of the treponemes themselves. The tpr gene family consists of 12 paralogous genes that make up 2% of the T. pallidum genome and have probably evolved through gene duplication and gene conversion. These genes are related to the major outer sheath protein gene of Treponema denticola (TDE0405); however, T. denticola does not appear to have experienced a history of gene duplication and gene conversion at this locus, because it possesses only 1 tpr-like gene (Seshadri et al. 2004). The tpr genes of T. pallidum are believed to encode potential virulence factors and are divided into 3 subfamilies: Subfamily I (tprC, D, I, and F), Subfamily II (tprE, G, and J), and Subfamily III (tprA, B, H, K, and L). The gene products of Subfamilies I and II have conserved amino- and carboxyl-terminal sequences flanking unique central regions, whereas Subfamily III has scattered conserved and unique or variable regions (Centurion-Lara et al. 1999). Gene conversion has previously been reported in tprK (Centurion-Lara, Godornes, et al. 2000; Centurion-Lara et al. 2004). Seven variable regions within tprK were proposed to have been created by gene conversion using sequences from the flanking regions of tprD as donors (Centurion-Lara et al. 2004). The degree of diversity in these variable regions appears to increase in the presence of adaptive immune pressure, suggesting that one function of these gene conversions may be to create antigenic diversity (Centurion-Lara et al. 2004).

The pathogenic treponemes include 3 T. pallidum subspecies, Treponema carateum, Treponema paraluiscuniculi (rabbit syphilis), and the unclassified Fribourg-Blanc (simian) isolate. The 3 T. pallidum subspecies are pallidum, the causative agent of human venereal syphilis, and pertenue and endemicum, which cause yaws and bejel, respectively. Treponema carateum is the etiological agent of pinta, although no isolates of this organism are known to exist. None of these pathogenic treponemes can be propagated in vitro. The complete genome of T. pallidum subsp. pallidum (Nichols strain), which is considered the reference strain, was sequenced in 1998 (Fraser et al. 1998). Treponema denticola, considered a nonpathogenic treponeme, probably diverged from T. pallidum long ago, judging by the large differences between T. pallidum and T. denticola in GC content (52.8% and 37.9%, respectively) and genome size (1.14 Mb and 2.84 Mb, respectively) (Seshadri et al. 2004); the T. denticola sequence was therefore not considered in this study. Although lateral gene transfer has been identified as a probable evolutionary force in the genome of T. denticola, no evidence exists for lateral gene transfer in T. pallidum (Seshadri et al. 2004).

In this project, we examined 8 strains of T. pallidum subsp. pallidum and 2 strains each of T. pallidum subsp. pertenue and T. pallidum subsp. endemicum, representing all known propagated human strains (2 additional T. pallidum subsp. pertenue strains have recently been obtained and are under study) as well as 2 nonhuman strains, T. paraluiscuniculi and the simian isolate. Six tpr genes, representing all 3 subfamilies, were sequenced: tprC, D, G, J, I, and K. In order to investigate the evolution of these tpr genes, we utilized phylogenetic methods, general measures of nucleotide diversity, and specific methods to detect recombination events.


    Materials and Methods
Treponemal Strains and tpr Sequencing
All treponemal isolates used in this study were propagated in New Zealand White rabbits (Lukehart et al. 1980) with the approval of the University of Washington Institutional Animal Care and Use Committee. The Fribourg-Blanc strain was isolated from the popliteal lymph node of a baboon from a yaws-endemic area (Fribourg-Blanc and Mollaret 1969); a single report describes an experimental infection of humans with this strain (Smith et al. 1971). Strain designations and origins of the isolates are given in table 1. Organisms were extracted by mincing infected testicular tissue in 0.9% saline and were quantitated by dark-field microscopy. Treponemal suspensions were mixed with an equal volume of 2× DNA lysis buffer (20 mM Tris, pH 8; 0.2 M ethylenediaminetetraacetic acid, pH 8; 1.0% sodium dodecyl sulfate). DNA was extracted from the treponemes as previously described (LaFond et al. 2003).



 
Table 1 Treponema Isolates Used in This Study

 
Full-length open reading frames (ORFs) of 1,791–2,268 bp (table 2) from each strain were amplified, cloned, and sequenced as previously described (Giacani et al. 2004; Sun et al. 2004). The ORFs were amplified from T. pallidum strains by polymerase chain reaction (PCR) using primers (table 2) located in the flanking regions of the genes, cloned into the TOPO II vector (Invitrogen, Carlsbad, CA), and sequenced in both directions by primer walking as previously described (Centurion-Lara, Sun, et al. 2000). The exception was MexicoA tprG and J, which were amplified with primers internal to the start and stop codons, so these amplicons contain no flanking sequence. A minimum of 2 clones were sequenced for each amplicon, and ambiguities were resolved by sequencing a third clone from an independent PCR; for the Gauthier tprG, I, and J ORFs, however, a single clone of each ORF was sequenced in both directions. For most sequences, 5 clones were analyzed. The T. paraluiscuniculi sequences were described previously (Giacani et al. 2004). GenBank accession numbers for the sequences are as follows: tprC: NC_000919, AY536645-6, AY550204, AY542157, AY590560, AY550206, AY542153-5, AY685236, and DQ886671-73; tprD: AF217537-41, AF187952, AY685237, AE000520, AY533515, and AY542156; tprI: AY533508-14, NC_000919, and DQ886678-82; tprG/tprJ: NC_000919, AF073527, AY685239-40, and DQ886674-77; tprK: NC_000919, AY685248-50, and DQ886683-700.



 
Table 2 Treponema pallidum Primers Used in This Study

 
Evolutionary Analyses of Sequences
Six loci were considered in this analysis: tprC, D, G, I, J, and K. Sequences were aligned using ClustalX (Thompson et al. 1997) and then adjusted manually in BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html) to ensure proper amino acid alignment. Frameshift mutations in T. paraluiscuniculi (basepair 439 in tprC and tprD; basepair 653 in tprG1 and tprG2) and in T. pallidum subsp. pallidum Sea81-4 (basepair 1860 in tprG) were removed from the alignment, because they would otherwise have misaligned the amino acids for the rest of the sequence. Levels of nucleotide diversity within and between human treponemal subspecies (π and Dxy, respectively) were calculated using DnaSP v. 4.10.4 (Rozas et al. 2003). GC content was calculated with PAML (Yang 1997) using all available tpr sequences from human treponemes (see table 1). An analysis of molecular variance (AMOVA) was performed for tprC, I, and K using Arlequin version 3.0 (Excoffier et al. 2005).
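As an illustration of what the diversity statistics above measure, the following is a minimal sketch assuming gap-free, equal-length aligned sequences and uncorrected p-distances; it does not reproduce DnaSP's gap handling or any distance correction it may apply. Under these assumptions π is the mean per-site difference over all pairs within one group, and Dxy is the mean per-site difference over all pairs drawn from two different groups. The function names and toy sequences are hypothetical.

```python
from itertools import combinations, product
from statistics import mean


def diff_per_site(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(group: list[str]) -> float:
    """pi: mean per-site difference over all pairs within one group."""
    return mean(diff_per_site(a, b) for a, b in combinations(group, 2))


def dxy(group1: list[str], group2: list[str]) -> float:
    """Dxy: mean per-site difference over all between-group pairs."""
    return mean(diff_per_site(a, b) for a, b in product(group1, group2))


def gc_content(seqs: list[str]) -> float:
    """Fraction of G + C over all bases in the given sequences."""
    total = sum(len(s) for s in seqs)
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in seqs)
    return gc / total


# Hypothetical toy alignments standing in for two subspecies.
subsp_1 = ["ACGTACGTAC", "ACGTACGTAA"]
subsp_2 = ["TCGTACGTAC", "TCGTACGAAC"]
print(nucleotide_diversity(subsp_1))  # 0.1
print(dxy(subsp_1, subsp_2))
print(gc_content(subsp_1))  # 0.45
```

Comparing Dxy with the within-group π values is the kind of contrast behind the statement that variation is higher between subspecies than within them.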

Phylogenetic Analyses
Maximum likelihood (ML) methods were used to infer the phylogenetic relationships among the tested loci. First, the most appropriate substitution model for each locus was determined using Modeltest 3.06 (Posada and Crandall 1998). The following models were selected: HKY + I + Γ for tprC (without T. paraluiscuniculi), where the HKY model (Hasegawa et al. 1985) allows unequal base frequencies and separate transition and transversion rates, I adds a proportion of invariant sites, and Γ adds gamma-distributed rate variation among sites; HKY + Γ for tprD; GTR + Γ for tprG/J with Nichols J, where the general time reversible (GTR) model allows 6 different substitution rates, and HKY + Γ for tprG/J without Nichols J; HKY for tprI; HKY + Γ for tprK; and GTR + Γ for the combined tprC, D, and I data set. The HKY + Γ model was used for the phylogeny including all 12 Nichols tpr genes to reduce computational time, given the complexity of the data set. ML phylogenies were inferred using PAUP* 4.0b10 (Swofford 2002) under the indicated substitution model. Full heuristic searches with simple stepwise addition of sequences and tree bisection-reconnection (TBR) branch swapping were used to search tree space. Bootstrap analysis (1,000 ML replicates) was performed in PAUP* 4.0b10 to assess support for internal nodes. In a separate analysis, third codon positions were excluded to determine whether these positions had been subject to mutational saturation.
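To make the HKY description concrete, here is a minimal sketch of the HKY85 instantaneous rate matrix Q, assuming a given transition/transversion ratio κ (`kappa`) and base frequencies. It shows only the model structure: the +I and +Γ components and the parameter estimation performed by Modeltest and PAUP* are not reproduced, and the frequencies in the usage lines are hypothetical.

```python
BASES = "ACGT"
# A<->G (purines) and C<->T (pyrimidines) are transitions; all else transversions.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def hky_rate_matrix(kappa: float, freqs: dict[str, float]) -> list[list[float]]:
    """Normalized HKY85 rate matrix Q, rows/columns ordered A, C, G, T.

    Off-diagonal entries are pi_j for transversions and kappa * pi_j for
    transitions; each diagonal entry makes its row sum to zero. Q is then
    scaled so the expected substitution rate, -sum_i pi_i * Q_ii, equals 1.
    """
    if abs(sum(freqs.values()) - 1.0) > 1e-9:
        raise ValueError("base frequencies must sum to 1")
    q = [[0.0] * 4 for _ in range(4)]
    for i, bi in enumerate(BASES):
        for j, bj in enumerate(BASES):
            if i != j:
                is_transition = frozenset((bi, bj)) in TRANSITIONS
                q[i][j] = freqs[bj] * (kappa if is_transition else 1.0)
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this sums off-diagonals
    mu = -sum(freqs[b] * q[i][i] for i, b in enumerate(BASES))
    return [[rate / mu for rate in row] for row in q]


# Hypothetical base frequencies and kappa, for illustration only.
freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
Q = hky_rate_matrix(kappa=2.0, freqs=freqs)
for base, row in zip(BASES, Q):
    print(base, [round(rate, 3) for rate in row])
```

Setting `kappa=1.0` with equal frequencies reduces this matrix to the Jukes-Cantor case; GTR generalizes it by giving each of the 6 base pairs its own rate instead of the single κ.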

Detection of Recombination
The Recombination Detection Program (RDP2) package (Martin et al. 2005) was used to detect recombination. This program implements several nonparametric methods to identify recombinant and parental sequences and to estimate break point positions, which mark the limits of the recombinant DNA in the sequences (Martin et al. 2005). We used 4 methods implemented in RDP2: the RDP method, a phylogenetic method that infers recombination from discordant branching patterns; the Maximum Chi-squared (MaxChi) method (Smith 1992; Posada and Crandall 2001), which uses a sliding window along pairwise comparisons to identify discrepancies; the Chimera method (Posada and Crandall 2001), which is similar to MaxChi but uses triplets of sequences instead of pairs; and GENECONV, which compares fragments of sequence pairs (Padidam et al. 1999). The nondefault general settings were a window size of 100, linear sequences, a maximum P value of 0.01 or 0.001, and a Bonferroni correction; all events were listed. For the RDP method, internal and external reference sequences were used, the window size was set to 10, and a sequence identity range of 0–100% was used. For both the MaxChi and Chimera methods, the number of variable sites was set to 30, with 1,000 permutations and a maximum P value of 0.05. For GENECONV, the program was set to scan sequence triplets. In all cases, the same alignment files used in the phylogenetic analyses were used for the recombination analyses.
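The sliding-window idea behind MaxChi can be sketched for a single sequence pair as follows. This is a simplified illustration in the spirit of Smith (1992), not RDP2's implementation: it assumes that only alignment columns variable across the whole alignment are scored, that `window` plays the role of the "number of variable sites" setting, and that significance comes from shuffling site order. All function names are hypothetical.

```python
import random


def variable_site_matches(alignment, i, j):
    """(column index, agrees?) for sequences i and j at every column that is
    variable across the whole alignment."""
    result = []
    for col_index, column in enumerate(zip(*alignment)):
        if len(set(column)) > 1:
            result.append((col_index, column[i] == column[j]))
    return result


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def max_chi(agrees, window):
    """Slide a window of `window` variable sites, compare match/mismatch
    counts in its left and right halves, and return (maximum statistic,
    index of the first variable site right of the best break point)."""
    half = window // 2
    best, best_k = 0.0, None
    for k in range(half, len(agrees) - half + 1):
        left = sum(agrees[k - half:k])
        right = sum(agrees[k:k + half])
        stat = chi2_2x2(left, half - left, right, half - right)
        if stat > best:
            best, best_k = stat, k
    return best, best_k


def permutation_p_value(agrees, window, n_perm=1000, seed=1):
    """How often does shuffling site order give a maximum at least as large?"""
    rng = random.Random(seed)
    observed, _ = max_chi(agrees, window)
    shuffled = list(agrees)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_chi(shuffled, window)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical pair: agrees at the first 20 variable sites, disagrees after.
agrees = [True] * 20 + [False] * 20
stat, k = max_chi(agrees, window=20)
print(stat, k)  # 20.0 20
print(permutation_p_value(agrees, window=20, n_perm=200))
```

A sharp switch from agreement to disagreement, as in the toy pair, is the signature of a recombination break point that such tests look for; in a real analysis the variable-site index would be mapped back to an alignment column with `variable_site_matches`.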


    Results
We examined 6 genes of the tpr gene family (tprC, D, G, J, I, and K) in 3 human treponemal subspecies (T. pallidum subsp. pallidum, endemicum, and pertenue) and in 2 nonhuman treponemes (T. paraluiscuniculi and the simian isolate) (table 1). We were interested in the relationship of the genes and alleles to one another as well as evidence for recombination. Because of the well-documented evidence for gene conversion in gene families and because no evidence exists for lateral gene transfer in T. pallidum, we were specifically interested in identifying intragenomic recombination events, that is, gene conversion. In order to investigate the evolution of these tpr genes, we utilized 1) phylogenetic methods, 2) specific methods to detect recombination events, and 3) general measures of nucleotide diversity and composition.

Phylogenetic Analyses
To obtain an overall view of the genetic diversity at all of the studied loci, an ML tree was constructed from an alignment of 2,708 nt covering all 12 available tpr gene sequences of the T. pallidum subsp. pallidum Nichols strain (obtained from GenBank) (fig. 1a). Sequences from Subfamily I (tprC, D, I, and F) and Subfamily II (tprE, J, and G) fall into 2 separate clades, each clearly separated from the rest of the phylogeny. In contrast, Subfamily III (tprA, B, H, L, and K) sequences do not cluster with each other or with any other sequences and are distributed, with varying branch lengths, between the Subfamily I and II clades. These results are consistent with previous studies in which Subfamily III membership was less clearly defined than that of the other subfamilies (Centurion-Lara, Sun, et al. 2000).


Figure 1
 
FIG. 1.— ML phylogenies of multiple tpr loci. The specific tpr locus designation is appended to each strain name. Bootstrap values based on (a) 250 and (b) 1,000 replications are shown next to branches. (a) Unrooted ML phylogeny of 12 Nichols tpr nucleotide sequences based on an alignment of 2708 bp. (b) ML phylogeny of tprC, D, and I nucleotide sequences based on an alignment of 1,797 bp using midpoint rooting. Subspecies designations are indicated by vertical lines. Gray boxes indicate clades in which paralogous sequences group together.

 
Phylogenetic Analyses of Subfamily I
To focus on Subfamily I diversity, an ML phylogeny was generated for all available DNA sequences from the Subfamily I loci tprC, D, and I (fig. 1b). All of the tprI sequences cluster together, whereas the tprC and D sequences are interspersed such that there are no major monophyletic tprC or D clades. In 3 instances, all involving tprC and D, paralogous sequences cluster more closely than their orthologous counterparts: 1) the tprC and D sequences from 4 of the T. pallidum subsp. pallidum strains; 2) the tprC and D sequences from pertenue Gauthier (along with SamoaD tprC); and 3) the tprC and D sequences from T. paraluiscuniculi. The 8 pallidum sequences (tprC and D from each of the 4 strains) are identical, whereas the pertenue Gauthier and T. paraluiscuniculi tprC and D sequences differ by at most 1 and 3 point mutations, respectively, highlighting the paralogous relationship of tprC and D in these strains.

Individual ML trees were also constructed for each of the Subfamily I loci examined in this study (tprC, D, and I). The tprD phylogeny contains 2 distinct clades (fig. 2a). One clade comprises 4 identical T. pallidum subsp. pallidum sequences and T. paraluiscuniculi, all of which carry the D2 allele (using the terminology of Centurion-Lara, Sun, et al. 2000; sequences that differ by a few basepairs but share the same defining motif are considered the same allele). The second clade comprises the other 4 identical T. pallidum subsp. pallidum sequences, which carry the D allele, and the T. pallidum subsp. pertenue Gauthier strain, which carries the D3 allele, 95% homologous to the D allele (Centurion-Lara, Sun, et al. 2000). Although sequence data are unavailable, PCR analysis suggests that T. pallidum subsp. endemicum and non-Gauthier strains of T. pallidum subsp. pertenue would cluster in the D2 clade (Centurion-Lara, Sun, et al. 2000). The D and D2 alleles differ by a 330-bp central region at basepairs 855–1180 and by 3 smaller variable regions at basepairs 1275–1306, 1425–1503, and 1569–1626 (numbered relative to the Nichols strain sequence). When a contiguous stretch encompassing all 4 variable regions (basepairs 855–1626) was removed and the new alignment was used to build an ML tree, the 8 T. pallidum subsp. pallidum sequences formed a monophyletic clade (data not shown).
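One way to locate regions such as those separating the D and D2 alleles is to scan two aligned sequences for runs of differences. The sketch below is hypothetical and is not the procedure the authors used: it reports 1-based, inclusive spans where two aligned sequences differ, merging differences separated by no more than `max_gap` identical sites (an arbitrary tuning parameter assumed here).

```python
def divergent_regions(a: str, b: str, max_gap: int = 10) -> list[tuple[int, int]]:
    """1-based, inclusive (start, end) spans where aligned sequences a and b
    differ; differences separated by <= max_gap identical sites are merged."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    regions: list[list[int]] = []
    for pos, (x, y) in enumerate(zip(a, b), start=1):
        if x == y:
            continue
        if regions and pos - regions[-1][1] - 1 <= max_gap:
            regions[-1][1] = pos  # close enough: extend the current region
        else:
            regions.append([pos, pos])  # start a new region
    return [(start, end) for start, end in regions]


# Hypothetical alleles differing at 1-based positions 3, 5, and 15.
allele_1 = "A" * 20
allele_2 = list(allele_1)
for pos in (3, 5, 15):
    allele_2[pos - 1] = "C"
print(divergent_regions(allele_1, "".join(allele_2), max_gap=2))  # [(3, 5), (15, 15)]
```

Larger values of `max_gap` fuse nearby blocks, which mirrors the choice above to remove one contiguous stretch (855–1626) rather than 4 separate regions.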


Figure 2
 
FIG. 2.— Unrooted ML phylogenies based on nucleotide sequences of (a) tprD; (b) tprC; (c) tprI. Human subspecies are circled and labeled. Bootstrap values based on 1,000 replications are shown next to branches.

 
An initial tprC phylogeny included all strains (data not shown). It contained a very long branch leading to T. paraluiscuniculi, which increased the scale of the tree by an order of magnitude. This long branch, together with the paralogous grouping of the T. paraluiscuniculi tprC and D sequences in figure 1b, suggested a gene conversion event in T. paraluiscuniculi that replaced the ancestral tprC sequence with tprD (table 3 summarizes the proposed gene conversion events between tprC and D). The T. paraluiscuniculi sequence was therefore removed, and an alternative phylogeny was generated (fig. 2b). In this phylogeny, the 3 human subspecies cluster separately with strong bootstrap support (94–100%). The simian isolate falls within the T. pallidum subsp. pertenue clade, although it is distinct from the 2 T. pallidum subsp. pertenue sequences. The T. pallidum subsp. pallidum sequences form 3 well-supported clusters within a monophyletic clade. Four of the T. pallidum subsp. pallidum sequences are identical to each other as well as to the tprD sequences in the same strains and are considered to carry the D allele at both loci (see the paralogous clustering described above). The tprC alleles in the other 4 T. pallidum subsp. pallidum strains are highly similar to the D allele and are labeled D-like alleles (Centurion-Lara et al. 2004) (table 3). Because the paralogous tprC and D sequences in the strains carrying the D allele are more similar to each other (they are identical) than either is to its orthologs, a gene conversion event between tprC and tprD appears to have occurred in the D allele strains. Here, tprC is the likely donor, because homology is detectable among all of the subspecies at this locus, whereas the D and D2 alleles differ by a long central variable region. Furthermore, the tprC and tprD sequences in the T. pallidum subsp. pertenue Gauthier strain are identical, suggesting a third gene conversion event, again with tprC as the donor because of the detectable homology among the tprC orthologs (table 3).
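The reasoning above (paralogs within a strain more similar to each other than to their orthologs) can be expressed as a simple distance check. This is a hypothetical heuristic sketch using uncorrected p-distances on gap-free aligned toy sequences; the authors drew the inference from phylogenies, not from this function, and the strain names and sequences below are invented.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites between aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def paralog_closer_than_orthologs(seqs, locus1, locus2):
    """For each strain, True if its locus1/locus2 paralogs are closer to each
    other than either is to the same locus in any other strain."""
    flags = {}
    for strain, loci in seqs.items():
        paralog_d = p_distance(loci[locus1], loci[locus2])
        ortholog_ds = [
            p_distance(loci[locus], other[locus])
            for other_strain, other in seqs.items()
            if other_strain != strain
            for locus in (locus1, locus2)
        ]
        flags[strain] = paralog_d < min(ortholog_ds)
    return flags


# Hypothetical strains; in S1 the two paralogs are identical.
seqs = {
    "S1": {"tprC": "AAAA", "tprD": "AAAA"},
    "S2": {"tprC": "AAAT", "tprD": "GGGG"},
    "S3": {"tprC": "AATT", "tprD": "GGGC"},
}
print(paralog_closer_than_orthologs(seqs, "tprC", "tprD"))
```

A flagged strain is only a candidate for gene conversion; deciding which locus was the donor still requires the comparative argument made in the text.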



 
Table 3 Polymorphism at the tprC and tprD Loci among Pathogenic Treponemes

 
The tprI phylogeny includes the same isolates as the tprC phylogeny except T. paraluiscuniculi, which lacks a tprI locus (fig. 2c). It shows a relatively long branch between T. pallidum subsp. pallidum and the other treponemes, similar in length (0.016 substitutions/site) to the corresponding branch in the tprC phylogeny (0.014 substitutions/site). There is 100% bootstrap support for each of the 2 monophyletic clades consisting of T. pallidum subsp. pallidum (all 8 pallidum sequences are identical) and T. pallidum subsp. endemicum, moderate support (85%) for clustering of the simian isolate with T. pallidum subsp. pertenue SamoaD, and little support (62%) for a T. pallidum subsp. pertenue + simian clade, although the simian sequence clearly does not belong with the other 2 clades. The tprI phylogeny confirms the close relationship between the unclassified Fribourg-Blanc simian isolate and T. pallidum subsp. pertenue that was also evident in the tprC phylogeny. The phylogenetic clustering of these sequences suggests that there is no strong species boundary between them, a conclusion supported by the report that the simian treponeme can infect humans (Smith et al. 1971).

Phylogenetic Analyses of Subfamily II
The tpr Subfamily II consists of tprE, G, and J. Previous studies have shown that the T. pallidum subsp. pallidum Nichols tprG and J sequences are highly homologous at the 5' and 3' ends, whereas their central regions are extremely divergent (Giacani et al. 2005), specifically at 2 variable regions that are unlikely to have evolved through point mutation: V1 (sites 976–1510, with a small internal region of homology at sites 1168–1295) and the much smaller V2 (sites 1879–1947). Different V1 and V2 sequences are classified as "G" and "J" motifs, which define the G, J, and G/J alleles present at the tprG and J loci. The G allele has a "G motif" at both V1 and V2, the J allele has a "J motif" at both V1 and V2, and the G/J allele has a "G motif" at V1 and a "J motif" at V2. At the tprG locus, our alignment shows that 2 of the 3 pallidum strains analyzed (Nichols and Sea81-4) carry the G allele, whereas the third pallidum strain (MexicoA) and the pertenue strain (Gauthier) carry the G/J allele. At tprJ, Nichols and MexicoA carry the J allele, whereas Sea81-4 and pertenue Gauthier carry the G/J allele (data not shown). PCR analysis indicates that the other 5 pallidum strains discussed in this study also carry the J allele at tprJ, although sequence data do not exist for these strains. PCR analysis also indicates that the other T. pallidum subsp. pertenue strain (SamoaD) and a T. pallidum subsp. endemicum strain (IraqB) carry the G/J allele (A Centurion-Lara, unpublished data). In T. paraluiscuniculi, the positions corresponding to tprE and J contain 2 almost identical G/J allele sequences, designated the G2 and G1 alleles, respectively (Giacani et al. 2004). In the T. paraluiscuniculi G1 and G2 alleles, the second half of V1 differs somewhat from V1 in the human G/J allele, although it is still much more similar to the G/J allele than to the J allele. Furthermore, in T. paraluiscuniculi, tprG has recombined with tprI (Subfamily I) to form a single allele termed the "G/I" hybrid at the tprG locus (Giacani et al. 2004).
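The allele definitions above combine one motif call at V1 with one at V2. The sketch below encodes those definitions, assuming each variable region is assigned to whichever reference motif it is most identical to; the reference motif strings are short placeholders, not real tprG/J sequence, and the "unclassified" label for a J motif at V1 combined with a G motif at V2 is an assumption, since the text does not describe that combination.

```python
def identity(a: str, b: str) -> float:
    """Proportion of identical sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def assign_motif(region: str, references: dict[str, str]) -> str:
    """Label a variable region with the reference motif it most resembles."""
    return max(references, key=lambda label: identity(region, references[label]))


# (V1 motif, V2 motif) -> allele, following the definitions in the text.
ALLELE_BY_MOTIFS = {("G", "G"): "G", ("J", "J"): "J", ("G", "J"): "G/J"}


def classify_allele(v1: str, v2: str, v1_refs, v2_refs) -> str:
    motifs = (assign_motif(v1, v1_refs), assign_motif(v2, v2_refs))
    return ALLELE_BY_MOTIFS.get(motifs, "unclassified")


# Hypothetical placeholder motifs for V1 and V2.
V1_REFS = {"G": "AAAAAAAA", "J": "CCCCCCCC"}
V2_REFS = {"G": "GGGG", "J": "TTTT"}
print(classify_allele("AAAAAACC", "TTTG", V1_REFS, V2_REFS))  # G/J
```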

For our phylogenetic analysis, tprG and J sequences were grouped together because of the extensive gene conversion at and between these loci (fig. 3). In this phylogeny, a long branch with 100% bootstrap support leads to the Nichols and MexicoA tprJ sequences, whereas MexicoA tprG, Sea81-4 tprJ, and Gauthier tprG and J form a polytomy (no bootstrap support). The tprG sequences from Nichols and Sea81-4 form a highly supported clade (99%) and are clearly closer to the rest of the sequences than are Nichols and MexicoA tprJ. Because the J allele is found only in T. pallidum subsp. pallidum, whereas the G/J allele is found in T. pallidum subsp. pallidum, pertenue, and endemicum and in T. paraluiscuniculi, the G/J allele is most likely ancestral. The "J motif" at V2 may be the result of gene conversion or lateral gene transfer, although a search of public databases found no homologous sequence. The "G motif" at V2 in the G allele also occurs in Nichols tprE (data not shown) and may represent a small gene conversion event from tprE to tprG that replaced only V2 of the ancestral G/J allele in Nichols and Sea81-4 (although more tprE sequence data are needed to be certain). The Gauthier tprG and J sequences differ by only 2 bp and may also represent a paralogous clustering reflecting a gene conversion event, although the polytomy makes this difficult to confirm. The T. paraluiscuniculi clade is also highly supported (100%) and represents a paralogous clustering of closely related G/J alleles at tprE and J.


Figure 3
 
FIG. 3.— Unrooted ML phylogeny of tprG and tprJ. The specific tpr locus designation is appended to each strain name. Subspecies are circled and labeled. For Treponema paraluiscuniculi, G1 signifies the G/J allele at tprJ and G2 signifies the G/J at tprE. Bootstrap values based on 1,000 replications are shown next to branches.

 
Phylogenetic Analyses of Subfamily III
The tprK phylogeny includes multiple clones from every represented strain because the locus is highly variable and accumulates mutations within a single infection (fig. 4). Seven variable regions have been identified in tprK that are likely the result of gene conversion events, with the probable donor sites located in the 5' and 3' flanking regions of tprD (Centurion-Lara et al. 2004). These variable regions were removed from our analysis to focus on the nonrecombinant history of the locus (the regions were slightly enlarged to capture additional flanking sites: basepairs 132–180, 596–671, 749–834, 866–920, 963–1059, 1141–1215, and 1291–1390). Treponema paraluiscuniculi appears to be an appropriate outgroup for tprK, because the scale of the tree is of the same order of magnitude as those of the tprC and I trees. Bootstrap support is strong for the T. paraluiscuniculi (100%) and T. pallidum subsp. endemicum (97%) clades, as well as for a combined T. pallidum subsp. pallidum + T. pallidum subsp. pertenue clade (96%) within which these 2 subspecies are unresolved relative to each other. However, the variable regions in tprK appear to accumulate more variation in response to selective pressure (Centurion-Lara et al. 2004), and clones from single individuals show single nucleotide polymorphisms (SNPs) even after removal of the variable regions; together, these observations suggest that tprK may evolve differently from the other tpr loci.
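Removing the 7 tprK variable regions before tree building amounts to deleting fixed column ranges from every aligned sequence. A minimal sketch, assuming the coordinates listed above are 1-based and inclusive relative to the alignment; the 1,500-column placeholder in the usage line is arbitrary and is not the real tprK alignment length.

```python
# 1-based, inclusive tprK variable-region coordinates as listed in the text.
TPRK_VARIABLE_REGIONS = [
    (132, 180), (596, 671), (749, 834), (866, 920),
    (963, 1059), (1141, 1215), (1291, 1390),
]


def mask_columns(seq: str, regions) -> str:
    """Drop the 1-based, inclusive column ranges in `regions` from one sequence."""
    drop = set()
    for start, end in regions:
        drop.update(range(start - 1, end))  # 0-based, end-exclusive equivalent
    return "".join(base for idx, base in enumerate(seq) if idx not in drop)


def mask_alignment(alignment, regions):
    """Apply the same column mask to every sequence so columns stay aligned."""
    return [mask_columns(seq, regions) for seq in alignment]


# Placeholder sequence of arbitrary length, for illustration only.
masked = mask_columns("N" * 1500, TPRK_VARIABLE_REGIONS)
print(len(masked))  # 962
```

Masking by alignment column (rather than by each sequence's own ungapped position) keeps the remaining sites homologous across clones, which is what the downstream tree search assumes.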


FIG. 4.— ML phylogeny of tprK, rooted using CuniculiA strains. Subspecies designations are indicated by vertical lines and labeled. Bootstrap values based on 1,000 replications are shown next to branches.

 
Statistical Tests for Recombination
Four tests (RDP, MaxChi, Chimaera, and GENECONV) in the RDP2 package were used to investigate recombination events in the Subfamily I, II, and III genes (table 4). We used these methods to identify significant recombination and to locate recombinant break points, but we did not infer donor and recipient alleles: because these methods examine one locus at a time, any interlocus recombination, which is likely also occurring, would go undetected. In all cases, the alignment files from the phylogenetic analyses were also used for the recombination analyses. Our primary interest was to test for support of the putative regions of gene conversion identified in the phylogenetic analyses. The results of all 4 methods overlapped relatively strongly, and MaxChi, which was previously shown to be the most powerful test in the RDP2 package (Posada and Crandall 2001), generally found the most recombination events.
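To make the logic of MaxChi more concrete, the sketch below implements a simplified, whole-sequence version of the maximum chi-square idea for one pair of aligned sequences: each variable site is scored as matching or differing, every possible breakpoint splits these scores into a 2 × 2 table, and the breakpoint with the largest chi-square is reported. It is an illustration under stated assumptions, not the RDP2 implementation: RDP2 scans with sliding windows and applies its own significance settings, whereas this sketch considers all breakpoints at once and estimates significance by permuting the order of variable sites. The function names, permutation count, and seed are hypothetical choices.

```python
import random
from typing import List, Optional, Sequence, Tuple


def variable_sites(alignment: Sequence[str]) -> List[int]:
    """0-based alignment columns carrying more than one base (gaps ignored)."""
    return [
        col for col in range(len(alignment[0]))
        if len({seq[col] for seq in alignment if seq[col] != "-"}) > 1
    ]


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def _best_split(diffs: List[int]) -> Tuple[float, Optional[int]]:
    """Maximum chi-square over every split of a 0/1 difference vector."""
    n, total = len(diffs), sum(diffs)
    best_chi, best_k = 0.0, None
    left_diff = 0
    for k in range(1, n):  # split between diffs[k - 1] and diffs[k]
        left_diff += diffs[k - 1]
        right_diff = total - left_diff
        chi = chi_square_2x2(left_diff, k - left_diff,
                             right_diff, (n - k) - right_diff)
        if chi > best_chi:
            best_chi, best_k = chi, k
    return best_chi, best_k


def max_chi(seq1: str, seq2: str, sites: List[int],
            n_perm: int = 999, seed: int = 1) -> Tuple[float, Optional[int], float]:
    """Return (max chi-square, first column right of the best breakpoint, permutation P)."""
    # 1 where the pair differs at a variable site, 0 where it matches
    diffs = [int(seq1[i] != seq2[i]) for i in sites]
    observed, k = _best_split(diffs)
    breakpoint_col = sites[k] if k is not None else None
    # Null distribution: shuffling the site order destroys any spatial clustering
    rng = random.Random(seed)
    shuffled = list(diffs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if _best_split(shuffled)[0] >= observed:
            hits += 1
    return observed, breakpoint_col, (hits + 1) / (n_perm + 1)
```

In a toy alignment of `"A" * 20`, `"A" * 10 + "C" * 10`, and `"C" * 10 + "A" * 10`, the first pair matches on the left half and differs on the right half, so the best breakpoint falls at column 10 with a chi-square of 20 (a perfect split). Shuffled orders almost never reproduce that split, which yields a small permutation P value.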


 
Table 4 Recombinant Regions Identified by RDP2

 
In tprD, one region of recombination was identified in all of the T. pallidum subsp. pallidum D2 allele sequences, in pertenue Gauthier, and in T. paraluiscuniculi (see table 4 for the exact location of recombinant regions). These results are consistent with a recombination break point at site 855, which marks the beginning of the central variable region that differentiates the D and D2 alleles. In tprC, 2 regions of recombination were identified in the T. pallidum subsp. pallidum D allele sequences, at base pairs 137–889 and 1459–1728. This result is consistent with the clustering of these sequences, with 100% bootstrap support, within the T. pallidum subsp. pallidum clade (fig. 2b). Multiple recombination events were identified in MexicoA and Sea81-3, consistent with the branch leading to a monophyletic clade containing MexicoA and Sea81-3 in the tprC phylogeny (fig. 2b). In tprI, only one recombination event was identified, in pertenue SamoaD; however, the sequence has only 2 unique SNPs in this region, so point mutation seems a more likely evolutionary mechanism than recombination in this case.

In tprG and J, more than 40 recombination events were identified when the significance level was set to P = 0.01. Because this result could not be interpreted precisely, the analysis was repeated with a more stringent significance level of P = 0.001 and the requirement that a recombination event be identified by more than one method. Five sequences showed no evidence of recombination under these conditions: pallidum Sea81-4 tprJ (G/J allele), pertenue Gauthier tprG (G/J allele), pertenue Gauthier tprJ (G/J allele), pallidum Nichols tprJ (J allele), and pallidum MexicoA tprJ (J allele). However, all 4 methods identified recombination in the region containing V2 in the tprG sequences of both pallidum G alleles (Sea81-4 and Nichols) as well as of the pallidum G/J allele (MexicoA). Four polymorphisms between V1 and V2 are shared by the pallidum G alleles and MexicoA G/J but are not found in any of the G/J alleles from other subspecies; these shared polymorphisms may contribute to this result.
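The stricter acceptance rule described above (P = 0.001 and support from more than one method) amounts to a simple consensus filter. The sketch below shows one way to express it; it is hypothetical and not an RDP2 feature or export format. The `Detection` record, its fields, and the sequence names in the usage example are invented for illustration, and detections with identical (sequence, start, end) are assumed to describe the same event. A real analysis would need a rule for matching overlapping calls from different methods, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Set, Tuple


class Detection(NamedTuple):
    """One hypothetical method-level call for a putative recombinant region."""
    sequence: str
    start: int
    end: int
    method: str
    p_value: float


def consensus_events(detections: Iterable[Detection], alpha: float = 0.001,
                     min_methods: int = 2) -> List[Tuple[str, int, int]]:
    """Keep events that pass the P-value cutoff in at least `min_methods` distinct methods."""
    methods_by_event: dict = defaultdict(set)
    for d in detections:
        if d.p_value <= alpha:  # alpha is treated as the highest acceptable P value
            methods_by_event[(d.sequence, d.start, d.end)].add(d.method)
    # A set per event means repeated calls by the same method count only once
    return sorted(event for event, methods in methods_by_event.items()
                  if len(methods) >= min_methods)
```

With hypothetical calls where `seqA` is flagged by RDP and MaxChi at P < 0.001 and `seqC` by RDP and GENECONV at P of about 0.005, only `seqA` survives the default settings; relaxing `alpha` to 0.01 admits `seqC` as well, mirroring how the looser threshold inflated the event count.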

In tprK, with the extended variable regions excluded, no recombination events were found by any of the methods. These results agree with the phylogenetic analyses, which also do not indicate any recombination outside of the variable regions (although Giacani et al. [2004] did identify a putative region of recombination at tprK in CuniculiA between V5 and V6 that was not detected in any of our analyses).

The results of the tests for recombination were consistent with the phylogenetic analyses in the overall detection of a high level of recombination across the studied loci. The recombination tests also identified new regions of recombination, particularly at tprC. Overall, far more recombination was indicated at tprG and J than for any other locus studied here, and this result is consistent with our phylogenetic analyses that revealed multiple instances of paralogous clustering and the presence of multiple divergent alleles at tprG and J.

Analysis of Nucleotide Diversity and Composition
Additional measures, such as nucleotide diversity and GC content, can be used to investigate recombination events, with the caveat that other phenomena also affect these measures (Baldo et al. 2006). Within-subspecies genetic diversity is low for all 3 subspecies at tprC, I, and K (π = 0.0–0.0076) (table 5). At tprD and J, however, diversity within T. pallidum subsp. pallidum is very high (π = 0.101 and 0.0958, respectively), reflecting the intrasubspecies gene conversion events discussed above. Diversity at tprG within T. pallidum subsp. pallidum is intermediate (π = 0.0154) and notably lower than at tprJ, reflecting a smaller putative gene conversion event that affected only the V2 region.


 
Table 5 Levels of Nucleotide Diversity within and between Subspecies

 
The pattern of genetic diversity between subspecies of T. pallidum differs among loci, especially for T. pallidum subsp. pallidum. The nucleotide divergence (Dxy) between T. pallidum subsp. pertenue and T. pallidum subsp. endemicum is fairly consistent across tprC, I, and K (no tprD sequence data currently exist for T. pallidum subsp. endemicum). At tprC and I, the Dxy distance between T. pallidum subsp. pallidum and the other 2 subspecies is approximately double the distance between T. pallidum subsp. pertenue and T. pallidum subsp. endemicum, consistent with the long branches leading to T. pallidum subsp. pallidum in these phylogenies. However, at tprK, the distance between T. pallidum subsp. pallidum and T. pallidum subsp. pertenue is much smaller than the distance between T. pallidum subsp. endemicum and the others (0.0028 vs. 0.011 and 0.013), consistent with the clustering of T. pallidum subsp. pallidum and T. pallidum subsp. pertenue in the tprK phylogeny. At tprG, the distance between T. pallidum subsp. pallidum and T. pallidum subsp. pertenue is intermediate (Dxy = 0.0130), whereas at tprJ the distance between T. pallidum subsp. pallidum and T. pallidum subsp. pertenue is only slightly lower than between these same 2 subspecies at tprD (Dxy = 0.0962). Again, this agrees with the proposed gene conversion or horizontal gene transfer event that created the highly divergent J allele in most T. pallidum subsp. pallidum strains.
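The within-group diversity (π) and between-group divergence (Dxy) used above are both averages of pairwise distances; they differ only in which pairs are averaged. The sketch below makes that explicit. It assumes uncorrected p-distances with pairwise deletion of gapped columns; the text does not state which software or distance correction produced table 5, so values from this sketch need not match it exactly. The function names are hypothetical.

```python
from itertools import combinations, product
from typing import Sequence


def p_distance(s1: str, s2: str) -> float:
    """Uncorrected proportion of differing sites; columns with a gap in either sequence are skipped."""
    compared = differences = 0
    for a, b in zip(s1, s2):
        if a == "-" or b == "-":
            continue
        compared += 1
        differences += int(a != b)
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def nucleotide_diversity(group: Sequence[str]) -> float:
    """pi: mean p-distance over all pairs of sequences within one group."""
    pairs = list(combinations(group, 2))
    if not pairs:
        raise ValueError("pi needs at least 2 sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def dxy(group_x: Sequence[str], group_y: Sequence[str]) -> float:
    """Dxy: mean p-distance over all pairs with one sequence from each group."""
    pairs = list(product(group_x, group_y))
    if not pairs:
        raise ValueError("both groups need at least 1 sequence")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)
```

For toy groups `["AAAA", "AAAT"]` and `["TTTT"]`, π within the first group is 0.25 and Dxy between the groups is 0.875. A single divergent allele, such as the J allele within pallidum, raises π sharply because it contributes large distances to many within-group pairs.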

Previous studies have suggested that gene conversion events lead to increased GC content at third codon positions (Eyre-Walker 1993; Galtier et al. 2001; Galtier 2003; Noonan et al. 2004). Although the molecular mechanism is unknown, the increase may be due to a GC bias in mismatch repair, which is required to resolve conversion events (Galtier et al. 2001; Noonan et al. 2004). Third positions reflect this bias more strongly because they are under less selective constraint: base changes at this position are less likely to change the encoded amino acid. At each of the 6 tpr loci studied here, GC content was higher at the third position (GC3) than at the first and second positions combined (GC1 + 2) (table 6), although not as dramatically as reported in other systems (Noonan et al. 2004). This analysis supports our general finding of multiple gene conversion events at the studied loci.
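The GC1 + 2 versus GC3 comparison can be computed directly from an in-frame coding sequence. The sketch below is a minimal illustration; it assumes an ungapped sequence that starts at the first codon position (alignment gaps would shift the reading frame), skips ambiguity codes such as N, and ignores a trailing partial codon. The function name is hypothetical.

```python
from typing import Tuple


def gc_by_codon_position(cds: str) -> Tuple[float, float]:
    """Return (GC1+2, GC3) for an ungapped, in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    gc = [0, 0, 0]     # G+C counts at codon positions 1, 2, 3
    total = [0, 0, 0]  # unambiguous bases counted at each position
    for i in range(usable):
        base = cds[i]
        if base not in {"A", "C", "G", "T"}:
            continue  # skip ambiguity codes such as N
        pos = i % 3
        total[pos] += 1
        if base in {"G", "C"}:
            gc[pos] += 1
    if total[0] + total[1] == 0 or total[2] == 0:
        raise ValueError("no unambiguous codons to score")
    gc12 = (gc[0] + gc[1]) / (total[0] + total[1])
    gc3 = gc[2] / total[2]
    return gc12, gc3
```

For the toy sequence `ATGGCCAAG` (codons ATG, GCC, AAG), GC1 + 2 is 2/6 and GC3 is 3/3. Averaging these values over all sequences at a locus gives per-locus summaries of the kind reported in table 6.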


 
Table 6 Average GC Content at Combined First + Second (GC1 + 2) and Third Codon (GC3) Positions

 

    Discussion
Intragenomic homologous recombination appears to have been a major force in the evolution of the tpr gene family in the pathogenic Treponema species. Our phylogenetic analyses of tprC, D, I, G, J, and K suggest that, after the gene duplication events that created the gene family, the high levels of homology among the loci have supported multiple gene conversion events both within and between these tpr genes. Although lateral gene transfer can theoretically produce the genetic signatures we describe, this mechanism has not been reported in T. pallidum. (Lateral gene transfer has been identified as a probable evolutionary force in the genome of T. denticola because of the signature presence of phage-mediated integration events and of restriction-modification systems that may serve as a barrier against lateral gene transfer, but neither of these signatures is present in T. pallidum [Seshadri et al. 2004].) Gene conversion therefore appears more likely, particularly at tprC, D, G, and K, where donor regions can be identified within the same genome. No donor sequence was identified for the V1 region of the J allele at tprJ, so horizontal gene transfer cannot be definitively ruled out.

In Subfamily I, we propose 3 gene conversion events between loci tprC and tprD: 1) a tprC-to-tprD conversion that introduced the D allele into tprD in the pallidum strains that carry the D allele, 2) a tprC-to-tprD conversion in the T. pallidum subsp. pertenue Gauthier strain that introduced the D3 allele into tprD, and 3) a tprD-to-tprC conversion in T. paraluiscuniculi that introduced the D2 allele into tprC (table 3). At this point, the data are insufficient to determine the order of the 3 proposed gene conversion events. However, the tprC-to-tprD conversions (#1 and 2) clearly represent 2 distinct events: the pertenue Gauthier sequences differ by only 2 bp and the pallidum sequences are identical, but the pertenue Gauthier and pallidum sequences differ from each other by 55 bp. Furthermore, our results suggest that the tprC locus is likely to be older than the tprD locus because there is more variation among the pallidum D-like alleles at tprC than among the pallidum D2 sequences at tprD, which are identical (fig. 2a and b). At tprD, the D2 allele is most likely ancestral because it is found in multiple subspecies (i.e., pallidum, pertenue, and endemicum) as well as in T. paraluiscuniculi, whereas the D allele is found only in a subset of pallidum strains. The non-D2 alleles in the 4 pallidum strains and pertenue Gauthier are likely the result of 2 subsequent gene conversion events, as described above.

At tprC, a single gene conversion event (originating from tprD) is posited in T. paraluiscuniculi based on the phylogenetic analyses. The RDP2 recombination analysis identified several small, additional recombinant regions in all of the T. pallidum subsp. pallidum sequences, consistent with the higher diversity observed in the T. pallidum subsp. pallidum tprC sequences (fig. 2b, table 4). Close inspection of the tprC alignment (including pallidum and non-pallidum strains) revealed a high number of nonsynonymous mutations in the pallidum strains that were grouped in clusters rather than scattered throughout the alignment. The transition/transversion ratio was lower in these clusters, and an initial analysis revealed a significant signal for positive selection at tprC in T. pallidum subsp. pallidum (data not shown). However, when we examined an alignment of tprC, D, and I together, we found that most of the transversions and nonsynonymous mutations were unique to T. pallidum subsp. pallidum at tprC. The presence of clustered mutations with a high frequency of transversions argues against accumulated point mutations and instead suggests that multiple smaller, "site-specific" gene conversion events may have occurred at tprC in T. pallidum subsp. pallidum. This may be similar to the multiple variable regions in tprK, although the tprC regions do not appear to undergo rapid sequence variation as tprK does. These putative recombination events at tprC must have occurred before the major tprC-to-tprD gene conversion that replaced the D2 allele with the D allele at tprD, because the D alleles at tprC and D are identical (table 3). In proteins with antigenic relevance, recombination produces variation that may have an adaptive purpose.
However, it is not understood whether these proteins are more likely to undergo recombination or whether high variability simply increases the power to detect these events (Baldo et al. 2006). In the case of tprC, which may encode a cell-surface protein as predicted by PSORT analysis (data not shown), most of the variation appears to result from gene conversion events, supporting an adaptive explanation.
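To illustrate how a lowered transition/transversion ratio within a mutation cluster might be quantified, the sketch below counts transitions (A↔G, C↔T) and transversions (purine↔pyrimidine) between two aligned sequences within a window. The text does not state how the clusters were delimited, so the window bounds are hypothetical inputs (0-based, half-open), and gaps and ambiguity codes are simply skipped. The function name is also hypothetical.

```python
from typing import Optional, Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
NUCLEOTIDES = PURINES | PYRIMIDINES


def count_ts_tv(seq1: str, seq2: str, start: int = 0,
                end: Optional[int] = None) -> Tuple[int, int]:
    """Count (transitions, transversions) between two aligned sequences in [start, end)."""
    ts = tv = 0
    for a, b in zip(seq1[start:end].upper(), seq2[start:end].upper()):
        if a == b or a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue  # identical sites, gaps and ambiguity codes are ignored
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # A<->G or C<->T
        else:
            tv += 1  # purine <-> pyrimidine
    return ts, tv
```

Comparing the counts inside a putative cluster with those from the rest of the gene is one simple way to see the pattern described for tprC. It is a descriptive check only and does not replace a model-based test of selection or recombination.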

At the tprI locus, variation among the treponemes was scattered and did not highlight a specific region that might have undergone gene conversion as described above for other loci. However, the fact that all 8 of the T. pallidum subsp. pallidum tprI sequences were identical is intriguing, considering this was not the case at any other locus in our study. There are several possible explanations for the 100% sequence identity, including a significantly lower (point) mutation rate at tprI in T. pallidum subsp. pallidum, a more recent divergence of the T. pallidum subsp. pallidum tprI sequences, functional constraint at tprI in T. pallidum subsp. pallidum, or a gene conversion event that occurred before the T. pallidum subsp. pallidum tprI sequences diverged (if a sequence longer than the region in our data set were replaced, the event would go undetected by our recombination analysis, which looked for the end points of recombination events). Both a lower mutation rate and a more recent divergence of T. pallidum subsp. pallidum tprI seem unlikely because the branches leading to T. pallidum subsp. pallidum at tprI and tprC are of comparable length (0.016 and 0.014 substitutions/site, respectively). Using a tprI phylogeny, a test for selection on the branch leading to the 8 T. pallidum subsp. pallidum sequences indicated that the nonsynonymous/synonymous rate ratio was not significantly different from 1.0 (data not shown). Thus, functional constraint does not appear to explain the lack of mutations at this locus in the pallidum subspecies. No specific gene conversion events were identified at tprI in our analyses (although GC3 content was highest at tprI). The most likely explanation may be that the rate of point mutation is generally low at all tpr genes and that the pallidum tprI sequences have escaped (by chance) the frequent gene conversion events that are mainly responsible for the diversity seen at tprC and D in T. pallidum subsp. pallidum.

In Subfamily II, the various alleles that occur at tprG and J (and at tprE and J in T. paraluiscuniculi), that is, the G, J, and G/J alleles, are strongly suggestive of multiple gene conversion events, although the directionality of these events is difficult to determine because of the complexity of the DNA sequences at these loci. Because the G/J allele occurs in multiple subspecies and at multiple loci (fig. 3), it appears to be the ancestral sequence. The Nichols and MexicoA tprJ sequences have a divergent central region that suggests a gene conversion event (or horizontal gene transfer) that replaced the G/J allele (most likely only the V1 region was replaced, with a "J motif" V1). Unlike the scenario proposed above for gene conversions at tprC and D, no donor region is immediately apparent for the gene conversion that created the "J motif" V1 (BLAST searches did not identify any homology between the "J motif" at the V1 variable region and any other treponemal or nontreponemal sequences). Interestingly, the clustering of the pallidum strains is not consistent between loci. At tprJ, only Sea81-4 has apparently escaped the gene conversion that created the divergent J allele. At tprG, however, Nichols and Sea81-4 appear to have shared a gene conversion event creating the "G motif" at V2 to the exclusion of MexicoA. This contrasts with tprD, where the ancestral D2 allele was replaced by the D allele in Nichols but not in MexicoA or Sea81-4. A consistent history of the evolution of the subspecies pallidum strains, therefore, cannot be ascertained from these data.

Previous studies have demonstrated a high frequency of gene conversion events at the tprK locus in T. pallidum subsp. pallidum (Centurion-Lara et al. 2004). Seven variable regions have been identified at tprK that are likely the result of multiple gene conversion events, with the probable donor sites located in the 3' and 5' flanking regions of tprD. A multisite/multistep recombination process has been described for the accumulation of diversity within the variable regions. This diversity was shown to accumulate more dramatically in the presence of adaptive immune pressure, suggesting a mechanism to generate antigenic diversity in T. pallidum (Centurion-Lara et al. 2004). The gene conversion operating at tprK probably differs from that affecting the tprC and D loci. Gene conversion at the tprK locus generates diversity, and each event seems to affect a relatively small region (each variable region is 48–99 bases). In contrast, gene conversions at tprC and D appear to result in concerted evolution and to affect a larger portion of the gene, because the T. pallidum subsp. pallidum D alleles are identical at both tprC and D. These seemingly contradictory outcomes of gene conversion, concerted evolution and increased diversity, may be explained by differing stages of evolution in a multigene family (Santoyo and Romero 2005). In the first stage, after the initial gene duplication that creates the multigene family, homogenization or concerted evolution is likely the dominant force because the high sequence homology drives gene conversion at a faster rate than point mutation occurs. As point mutations accumulate over time, homologous recombination becomes less effective, and the rate of point mutation may surpass that of gene conversion.
At this stage of evolution of a multigene family, smaller scale "site-specific" gene conversion may become more significant, allowing concerted evolution to occur in small regions while antigenic variation is created throughout the gene (Santoyo and Romero 2005). This explanation may indicate a younger history for the tprD sequences (i.e., the concerted evolution stage) relative to the tprK (and possibly tprC) sequences, which are experiencing site-specific gene conversion events, possibly leading to increased antigenic variation.

Evolutionary History of the Treponemes
Several scenarios have been proposed for the evolution of the human treponemal species (Baker and Armelagos 1988; Powell and Cook 2005). A New World versus Old World origin of venereal syphilis has long been debated (for a recent review, see Powell and Cook 2005). The Columbian hypothesis originally suggested that venereal syphilis (T. pallidum subsp. pallidum) originated in the New World and was brought to Europe by Columbus' crews returning from the New World (Crosby 1969). This was based on the paucity of skeletal and historical evidence for treponemal disease in the Old World prior to the early 1500s. For example, Rothschild (2003) proposed that yaws (T. pallidum subsp. pertenue) was the most ancestral of the 3 T. pallidum subspecies and was present at least as far back as the origin of modern humans in Africa, and that the other 2 subspecies each derived from yaws, with T. pallidum subsp. pallidum evolving in the New World no more than ~2000 years ago. Baker and Armelagos (1988) have proposed an alternative Columbian hypothesis, which suggests that venereal syphilis evolved in Europe from a New World nonvenereal treponeme that was introduced to Europe by Columbus' crews. This hypothesis is based on the lack of specific evidence for venereal syphilis in the New World, despite the overwhelming evidence of treponemal disease. The Pre-Columbian hypothesis suggests that treponemal diseases, including venereal syphilis, existed in the Old World prior to Columbus' voyages but were diagnosed incorrectly. One scenario suggests that pinta was the original form present throughout the world during the Pleistocene, followed by the evolution of yaws (12,000 years ago), then endemic syphilis (9,000 years ago), and, finally, venereal syphilis (5,000 years ago) (Hackett 1963).
Finally, a Unitarian hypothesis, based on skeletal morphology data, has been advanced by Hudson (1965), who suggests that venereal syphilis, endemic syphilis, yaws, and pinta are not in fact distinct diseases but rather environmentally determined manifestations of the same disease. More recently, Armelagos et al. (2005) have reviewed the molecular literature on human treponemes and suggest a lack of molecular distinction between these subspecies.

Our molecular data suggest that the 3 subspecies are legitimately classified as distinct entities (fig. 2). The phylogenies for tprC and tprI show high bootstrap support for the separation of the 3 subspecies into separate clades, with relatively long branches leading to endemicum and pallidum. In tprD, pertenue appears artificially close to some pallidum sequences because of the gene conversion event that separates the pallidum D and D2 alleles. In the tprK phylogeny, there is no bootstrap support to separate pallidum and pertenue, but the tprK phylogeny is difficult to interpret because this locus has an exceedingly high mutation rate, as demonstrated by the fact that multiple tprK sequences exist in a single individual (even when the variable regions are removed). Furthermore, AMOVA results reveal a significant amount of among-subspecies variation (70–95%, P < 0.001) when analyzing tprC, I, and K from all 3 subspecies, further supporting the genetic distinctiveness of the subspecies (data not shown). It is clear that recombination has played a significant role in the evolution of the tpr genes and possibly in the evolution of the treponemes. Therefore, studies that look only at SNPs, as reviewed by Armelagos et al. (2005), will miss this high level of recombination, and it may appear that there are few subspecies-specific variants because recombination has frequently scrambled alleles within a subspecies, for example, the D and D2 alleles at tprD. Moreover, the fact that these recombination events are unique to a subspecies argues strongly for the genetic distinctiveness of the 3 subspecies.

Ascertaining the distinctiveness of the subspecies is a prerequisite to resolving their evolutionary history. In general, our analyses do not appear to support a dramatically older origin of yaws relative to venereal syphilis, contra Rothschild (2003); that is, we do not see greater levels of variation or longer tree branches for T. pallidum subsp. pertenue relative to T. pallidum subsp. pallidum (figs. 2b, 2c, and 4, table 4). Furthermore, our results do not clearly support current hypotheses that consider venereal syphilis to be the most recently evolved of the treponemal syndromes. For instance, multiple gene conversion events have occurred within subsets of T. pallidum subsp. pallidum strains at tprC, D, G, and J (figs. 2a, 2b, and 3), arguing for an older origin of the entire pallidum subspecies to allow sufficient time for these events to have occurred. The long branch leading to T. pallidum subsp. pallidum in the tprI phylogeny reflects a large number of point mutations, which are assumed to accumulate in a clock-like manner, suggesting that more evolution has occurred on the branch to pallidum than on the branches to the other 2 subspecies. Our results are generally consistent with a relatively coincident evolution of the 3 human treponemal subspecies, as proposed by Hackett (1963), but contra Rothschild (2003), who proposed dramatically different time frames for the evolution of yaws and venereal syphilis. Moreover, the T. pallidum subsp. pallidum sequences appear to carry too much variation to support the modified Columbian hypothesis of evolution of venereal syphilis within the past 500 years (Baker and Armelagos 1988). This is further supported by the fact that at least one T. pallidum subsp. pallidum strain (Nichols) was collected in the early 1900s and is identical at the loci examined to several other strains collected later in the 20th century, suggesting that the mutation rate is not high enough to have created variants within this time frame. Additional samples, for example, more representatives of T. pallidum subsp. endemicum and T. pallidum subsp. pertenue, and analysis of more loci will be needed to definitively answer questions concerning the origin and evolution of the treponemes. Moreover, the high levels of recombination revealed in our study suggest that the analysis of contiguous sequence data, as opposed to scattered SNPs, will be necessary to identify possible recombination events before the evolutionary history of the treponemes can be reconstructed.


    Acknowledgements
This work was supported in part by US Public Health Service grants AI 34616, AI 42143, and AI 63940.


    Footnotes
 
Jennifer Wernegreen, Associate Editor


    References

    Armelagos GJ, Harper KN, Ocampo PS. (2005) On the trail of the twisted Treponeme: searching for the origins of syphilis. Evol Anthropol: Issues News Rev 14:240–2.

    Baker BJ, Armelagos GJ. (1988) The origin and antiquity of syphilis: paleopathological diagnosis and interpretation. Curr Anthropol 29:703–38.

    Baldo L, Bordenstein S, Wernegreen JJ, Werren JH. (2006) Widespread recombination throughout Wolbachia genomes. Mol Biol Evol 23:437–49.

    Centurion-Lara A, Castro C, Barrett L, Cameron C, Mostowfi M, Van Voorhis WC, Lukehart SA. (1999) Treponema pallidum major sheath protein homologue Tpr K is a target of opsonic antibody and the protective immune response. J Exp Med 189:647–56.

    Centurion-Lara A, Godornes C, Castro C, Van Voorhis WC, Lukehart SA. (2000) The tprK gene is heterogeneous among Treponema pallidum strains and has multiple alleles. Infect Immun 68:824–31.

    Centurion-Lara A, LaFond RE, Hevner K, Godornes C, Molini BJ, Van Voorhis WC, Lukehart SA. (2004) Gene conversion: a mechanism for generation of heterogeneity in the tprK gene of Treponema pallidum during infection. Mol Microbiol 52:1579–96.

    Centurion-Lara A, Sun ES, Barrett LK, Castro C, Lukehart SA, Van Voorhis WC. (2000) Multiple alleles of Treponema pallidum repeat gene D in Treponema pallidum isolates. J Bacteriol 182:2332–5.

    Crosby A. (1969) The early history of syphilis: a reappraisal. Am Anthropol 71:218–27.