MBE Advance Access originally published online on March 6, 2007
Molecular Biology and Evolution 2007 24(5):1242-1258; doi:10.1093/molbev/msm039
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Research Articles |
Multiple Evolutionary Mechanisms Drive Papillomavirus Diversification



* Skin Cancer Center Charité, University Hospital of Berlin, Berlin, Germany
École Polytechnique Fédérale de Lausanne, School of Computer & Communication Sciences, Laboratory for Computational Biology and Bioinformatics, Lausanne, Switzerland
Deutsches Krebsforschungszentrum/German Cancer Research Centre, Infection and Cancer, Heidelberg, Germany
E-mail: i.bravo{at}dkfz.de.
| Abstract |
|---|
|
|
|---|
The circular, double-stranded 8-kb DNA genome of papillomaviruses (PVes) consists mainly of 4 large genes, E1, E2, L2, and L1. Approximately 150 papillomavirus genomes have been sequenced to date. We analyzed a representative sample of 53 PVes genomes using maximum likelihood, Bayesian inference, maximum parsimony, and distance-based methods both on nucleotide (nt) and on amino acid (aa) alignments. When the 4 genes were analyzed separately, aa-inferred phylogenies contradicted each other less than nt-inferred trees (judged by partition homogeneity tests). In particular, gene combinations including the L2 gene generated significant incongruence (P < 0.001). Combined analyses of the remaining genes E1E2L1 produced a well-supported phylogeny including supertaxon ß +
+
+
-PVes (infecting Artiodactyla, Carnivora, Primates, and Rodentia) and supertaxon
+
+ µ +
+
-PVes (infecting Carnivora, Lagomorpha, Primates, and Rodentia). Based on the tree topology, host-linked evolution appears plausible at shallow, rather than deeper, taxonomic levels. Diversification within PVes may also involve adaptive radiation establishing different niches (within a single-host species) and recombination events (within single-host cells). Heterogeneous groups of closely related PVes infecting, for example, humans and domestic animals such as hamster, dog, and cattle suggest multiple infections across species borders. Additional evolutionary phenomena such as strong codon usage preferences, and computational biases including reconstruction artifacts and insufficient taxon sampling, may contribute to the incomplete resolution of deep phylogenetic nodes. The molecular data globally supports a complex evolutionary scenario for PVes, which is driven by multiple mechanisms but not exclusively by coevolution with corresponding hosts.
Key Words: adaptive radiation coevolution high performance computing host interspecies transmission recombination virology zoonosis
| Introduction |
|---|
|
|
|---|
Papillomaviruses (PVes) infect stratified squamous epithelia of warm-blooded animals. Targets of the infection are undifferentiated keratinocytes in the basal cell layer. The progression of the virus infection depends on keratinocyte differentiation (Bedell et al. 1991
In the past years, the available number of complete PV genome sequences has increased substantially and comprises nearly 150 GenBank entries (November 2006). The PV genome is a single molecule of double-stranded DNA and comprises approximately 8,000 bp. Eight well-defined open reading frames (ORFs) are encoded, which are all transcribed from the same DNA strand with the same orientation. The translated proteins are classified as "early" (E) and "late" (L) based on their temporal expression. They include 3 regulatory genes involved in transcription and replication (E1, E2, and E4), 3 oncogenes (E5, E6, and E7), and 2 genes coding for self-assembling proteins that give rise to the viral capsid (L1 and L2; Münger and Howley 2002
). The complete L1 gene, or fragments of it, is commonly used for detecting PV infections and for typing PVes. For this reason, PV systematics have traditionally been inferred from the L1 gene, defining clear-cut nucleotide (nt) identity thresholds for the delimitation of higher taxonomic units such as "species" and "genera" (de Villiers et al. 2004
; Bernard 2005
).
PVes have been isolated from birds, marsupials, and placental mammals and are generally considered to be highly specific for their hosts. However, bovine PVes are able to cause nonproductive infections in horses and other only distantly related mammals (Thomas et al. 1964
; Lancaster et al. 1977
; Pfister et al. 1981
; Trenfield et al. 1985
; Otten et al. 1993
; Chambers et al. 2003
). Many viral taxa such as Alphapapillomavirus (
-PVes), Deltapapillomavirus (
-PVes), and Lambdapapillomavirus (
-PVes) roughly correspond to their mammalian host taxa, namely Primates, Artiodactyla, and Carnivora (Bernard et al. 1994
; Myers et al. 1994
; Chan et al. 1995
; Farmer et al. 1995
; de Villiers et al. 2004
; García-Vallvé et al. 2005). Furthermore, phylogenetic clusters of PV variants are congruent with the geographic origin, at least in some human PV (HPV) types such as HPV-16 and HPV-18 (Chan et al. 1992; Ho et al. 1993; Ong et al. 1993; Yamada et al. 1997
; Arias-Pulido et al. 2005
; Prado et al. 2005). This has led to the general assumption that "host-linked evolution" (Chan et al. 1995
, 1997
) is the driving force for the diversification of PVes (Halpern 2000
; Bernard et al. 2006
).
However, the evolutionary mechanisms of PVes are more complex. For example, infections across species borders termed zoonoses (WHO Expert Committee 1982
) may have contributed to the evolution of PVes (Myers et al. 1996
; Rector, Van Doorslaer, et al. 2005; Gottschling et al. 2007
). In addition, phylogenetic inconsistencies between early and late genes have been identified for some groups in the
-PVes (García-Vallvé et al. 2005; Narechania et al. 2005
). This group includes cervical cancer-associated human PVes (HPVes) and accounts for more than half of the complete PV genomes in GenBank. Evolutionary incongruence might arise from singular events in the past such as recombination, the establishment of new ecological niches, and/or asymmetric genome convergences driven by intense selection (Narechania et al. 2005
; Varsani et al. 2006
). However, well-supported contradicting tree topologies between early and late genes have not been found for Betapapillomaviruses (ß-PVes) (Gottschling et al. 2007
), which represent another important and diverse PV clade. This may indicate that concerted evolution of early and late genes is the rule in PVes.
Knowledge about viral evolution is still relatively poor compared with living organisms. However, a broad range of bioinformatics tools has been applied to analyze the complete PV genome (or at least properly alignable regions of it) during the past 2 years. The computation of confidence values for internal nodes allows for well-substantiated phylogenetic conclusions (Chen et al. 2005
; Rector, Van Doorslaer, et al. 2005; Schiffman et al. 2005). Appropriate outgroup choice enables the evaluation of evolutionary polarity in PVes (García-Vallvé et al. 2005; Narechania et al. 2005
; Gottschling et al. 2007
). Nonetheless, a comprehensive scenario of evolution and phylogenetic relationships within PVes has not yet been developed, especially with respect to the basal nodes of the tree. The usage of high performance computing techniques and platforms in combination with advanced maximum likelihood (ML) search algorithms such as RAxML (Stamatakis et al. 2005
; Stamatakis 2006b
) enables thorough ML-based phylogenetic analyses including a sufficiently large amount of 1,000 bootstrap replicates.
In this study, we aim to identify those PV sequences that perturb the reconstruction of a concerted phylogeny and to choose the optimal set of suitable genes for phylogenetic inference. We have calculated ML bootstrap values and compared them with alternative phylogenetic methods and criteria including Bayesian inference, maximum parsimony (MP), and distance-based methods. Partition homogeneity tests (PHTs) quantify, how and whether distinct individual genes can be combined into multigene alignments in order to infer a consensus phylogeny. We have applied various techniques to achieve the best-possible reduction of reconstruction artifacts. By application of these techniques, we provide the best-supported phylogenetic tree of PVes so far. It might serve as a basis for improved classifications, outgroup choice for internal phylogenetic analyses, and critical time estimates in future studies. Our results support a multicausal scenario of PV evolution including host-linked evolution, recombination, possible transmission across species borders, and potential adaptive radiation events acting together under mutual influence.
| Materials and Methods |
|---|
|
|
|---|
For each of the genes (E1, E2, L2, and L1), a representative set of 53 sequences covering the currently known diversity of PVes (table 1) was manually aligned at the amino acid (aa) level and back-translated into codon-aligned nt sequences with Se-Al v2.0a72 (Rambaut 2001
stamatak/index-Dateien/material/Alignment-Data.zip.
|
The "complete genome" matrix comprising the concatenated E1E2L2L1 sequences was partitioned into the 4 genes (supplementary table S1, Supplementary Material online) in order to investigate previously reported divergent gene evolution in PVes (Bravo and Alonso 2004; García-Vallvé et al. 2005; Narechania et al. 2005
0.001 were considered as indicators for significant incongruence between the partitions (Cunningham 1997
ML-based phylogenetic analyses were conducted using the parallel Message Passing Interface (MPI) version of RAxML-VI-HPC (Stamatakis 2006b
; freely available at http://icwww.epfl.ch/
stamatak). The analyses were executed on the Infiniband cluster at the Technical University of Munich (www.lrr.in.tum.de/Par/arch/infiniband), which is equipped with 136 AMD Opteron processors. Initially, the best-scoring aa substitution model was determined by optimizing branch lengths and model parameters on a fixed random stepwise addition sequence MP RAxML starting tree under the 20 distinct aa substitution models currently implemented in the program. Parameters were optimized on a fixed MP tree because ML model parameters do not change significantly when optimized on a reasonable (i.e., nonrandom) tree (Yang 1996
). For L2, the best-scoring aa model was WAG + F +
(WAG with empirical base frequencies and the
model of rate heterogeneity; Whelan and Goldman [2001]
) and rtREV + F +
(rtREV with empirical base frequencies and the
model of rate heterogeneity; Dimmic et al. [2002]
) for the E1, E2, and L1 genes (supplementary table S2, Supplementary Material online).
For DNA analyses, we used the GTR +
model of nt substitution (with 4 discrete rate categories) because RAxML only provides GTR +
and the GTR + CAT approximation (Stamatakis 2006a
) of rate heterogeneity for nt data. The rationale for this is that the shape of the topology has a significantly higher impact on final likelihood values than model details. Therefore, RAxML implements a technically highly optimized GTR likelihood function which allows for a more exhaustive exploration of the huge tree search space, and thus yields better results than competing ML programs on real data (Stamatakis et al. 2005
; Stamatakis 2006b
). Nonetheless, the usage of rate heterogeneity has a significant impact on final tree shapes. An estimate of the proportion of invariant sites is not implemented in RAxML due to statistical concerns regarding the simultaneous usage of
- and P-Invar, which are discussed in the RAxML manual. Finally, we did not use the significantly faster GTR + CAT approximation of rate heterogeneity because the alignments were relatively small with respect of the number of taxa, and thus, we were concerned about insufficient data for the estimation of per-site evolutionary rates. Moreover, trees inferred under GTR + CAT scored on average slightly worse (12 log likelihood units) under GTR +
than trees inferred entirely under GTR +
.
We analyzed all multigene alignments under both plain (one set of ML substitution parameters was estimated over the entire alignment) and mixed models (ML model parameters were estimated separately for each gene). In order to determine the best-known ML tree for each alignment/model combination, we executed 127 tree searches from distinct random stepwise addition sequence MP starting trees on 128 processors of the Infiniband cluster. Therefore, each central processing unit (CPU) executed one tree inference on a distinct starting tree, whereas the 128th CPU acted as master process for work distribution as previously described (Stamatakis 2006b
). Thereafter, we executed 1,000 nonparametric bootstraps for each alignment with RAxML, and the bootstrap values were drawn on the best-scoring ML-tree using the respective RAxML program option (see RAxML manual for details). In total, we executed over 10,000 nonparametric bootstraps and over 1,270 ML searches for best-known trees.
Bayesian phylogenetic analyses were performed with BEAST version 1.3 (Drummond et al. 2002
; Drummond and Rambaut 2003
; freely distributed by the authors at http://evolve.zoo.ox.ac.uk/beast/). For the WAG +
aa substitution model (Whelan and Goldman 2001
) with 4 discrete
rate categories as well as for the HKY +
nt substitution model (Hasegawa et al. 1985
) with 4 discrete categories, we used an uncorrelated relaxed clock. In such models, the rate for each branch of the tree is drawn independently and identically from the underlying exponential distribution (Drummond et al. 2006
). Parameter values were optimized via Markov Chain Monte Carlo methodology after repetitive short heuristic searches (50,000 iterations with 10,000 burn-in cycles). The unweighted pair group method with arithmetic mean was used to construct a starting tree for the BEAST analyses, and the final topology was estimated based on 1,000,000 iterations using 100,000 burn-in cycles and sampling every 1,000 iteration.
MP and distance-based calculations were run in PAUP*. Trees were generated by performing heuristic searches with tree bisection reconnection and starting trees obtained via random taxon addition with 10 replicates (parsimony) or Neighbor-Joining (distance measure: mean character difference), respectively. No upper limit for the number of equally parsimonious trees was specified. In addition, we assessed the performance of the parsimony ratchet (Nixon 1999
) in order to search all most-parsimonious tree (MPT) islands, despite the fact that the number of MPTs was low during heuristic searches with PAUP*. We used perlRat v.1.9a (Bininda-Emonds 2006
) to generate batch files for parsimony ratchet runs with PAUP*. Nonparametric bootstrap support was estimated based on 1,000 replicates using the same search strategy as in the tree searches. The best-fit substitution model for nt data was selected based on the Akaike Information Criterion as implemented in Modeltest 3.7 (Posada and Crandall 1998
) and was used for distance-based analyses (with ML settings). Gaps were treated as missing data in all analyses.
| Results |
|---|
|
|
|---|
The E1E2L1 Open Reading Frames of PVes Are Phylogenetically Congruent
Data on length and number of informative sites of the aa and nt alignments (calculated with the best-fit model, GTR +
+ I; number of substitution types, 6; number of distinct data patterns under this model, 4003 using the complete data matrix in PAUP* analyses) used in this study is provided in supplementary table S1 (Supplementary Material online). Overall, PHT values were low between gene pairs of nt sequence data (P
0.020; table 2) but were consistently higher for aa sequence data. With respect to aa sequence data, each gene pair that included the L2 gene yielded low PHT values (P < 0.010 taking into consideration the entire taxon sampling), whereas all other pairs rendered values above the threshold. PHT values increased, even in analyses including the L2 gene, when PVes with well-supported, contradicting phylogenetic positions (i.e., HPV-16, Mupapillomaviruses [µ-PVes], "PlPV"; see below) were excluded from the analyses and when well-defined, taxonomic subsets were separately investigated (e.g.,
-,
-PVes, and the supergroup comprising Kappapapillomaviruses [
-PVes], Nupapillomaviruses [
-PVes], Sigmapapillomaviruses [
-PVes],
-, and µ-PVes). The results shown in supplementary table S2 (Supplementary Material online) support the combination of the E1E2L1 ORFs at the aa level for a simultaneous phylogenetic analysis, but not the combination of the L2 ORF with any other gene. Therefore, we performed thorough phylogenetic analyses with each single gene and with the E1E2L1 gene combination.
|
Phylogenetic Relationships in PVes Have Largely Reliable Support
The various phylogenetic approaches explored in this study comprising different sequence data (nt vs. aa sequences), different partitions (separated genes vs. combined analyses), different models (mixed models vs. plain models), and different methodological criteria (ML, Bayesian inference, MP and distance) did not render overall congruent phylogenies. Nonetheless, the statistic support for many internal nodes was extraordinarily high. A series of major monophyletic assemblages could clearly be distinguished, 1) independently of whether the data was analyzed at aa or nt level (fig. 1); 2) independently of whether the data was analyzed simultaneously or in separate partitions (figs. 1 and 3
|
|
|
The PV genera, including Xipapillomaviruses (
-PVes), Gammapapillomaviruses (
-PVes),
-, ß-,
-,
-,
-, and µ-PVes were each confidently monophyletic under the different approaches investigated (bootstrap support values from ML analyses, maximum likelihood bootstrap support (MLBS); MP analyses, maximum parsimony bootstrap support (MPBS); and distance analyses, distance bootstrap support (DBS) each > 75 and Bayesian posterior probabilities, bayesian posterior probability (BPP) > 0.95), with the exceptions of
-PVes that were not well supported in the separate L2 and L1 analyses and a few other nodes in nt and in separate E2 and L2 aa analyses (fig. 1 and Supplementary fig. S1, Supplementary Material online). Monophyly of PV clades therefore corresponded to monophyly of some infected host taxa such as Primates (
-, ß-,
-, µ-PVes), Artiodactyla (
-,
-PVes), Lagomorpha (
-PVes), and Carnivora (
-PVes).
Deeper phylogenetic nodes showed similarly high confidence values in the various analyses (MLBS, MPBS, DBS > 75 and BPP > 0.95):
+
-PVes (both infecting Artiodactyla), HPV-101 + HPV-103 (infecting Primates),
+ McPV-2 (infecting Rodentia, only 71 MPBS in the L1 analysis), and
+
-PVes (infecting Primates and Rodentia). Furthermore, the 2 groupings super-
-PVes (including ChPV) and super-
-PVes (comprising
- and
-PVes, BPV-7, CfPV-2, HPV-101 and HPV-103) were well supported by the multigene analyses (figs. 12).
Figure 2 shows the best-scoring likelihood tree of the combined genes E1E2L1 that was calculated using the best-fit model (rtREV + F +
; supplementary table S2, Supplementary Material online) with the statistical support values for each of the 4 phylogenetic approaches used. The relationships between the major monophyletic assemblages described above and the remaining PVes were not fully resolved, but the following 7 groupings and individual viruses could be confidently stated at highest taxonomic level (fig. 2 and Supplementary fig. S2, Supplementary Material online):
- 1) A diverse and heterogeneous clade (with respect to the hosts) comprising the groups
,
, µ,
, and
(97 MLBS, 1.00 BPP, 82 MPBS, and 86 DBS).
-,
-, and
-PVes clustered together (94 MLBS, 1.00 BPP, and 72 MPBS) and were the sister group of
- and µ-PVes (97 MLBS, 1.00 BPP, 94 MPBS, and DBS 79). PVes in this
+
+ µ +
+
-supertaxon were isolated from Carnivora, Lagomorpha, Primates, and Rodentia.
- 2) Another diverse and heterogeneous clade (with respect to the hosts) comprised super-
-, super-
-, and ß-PVes (84 MLBS, 0.99 BPP, 86 MPBS, and 56 DBS), whereas the latter 2 appeared to be closely related by moderate statistical support (59 MLBS, 0.99 BPP, 56 MPBS, and 53 DBS). PVes in this ß +
+
+
-supertaxon infected Artiodactyla, Carnivora, Primates, and Rodentia.
- 3) Close relationship of the 2 PVes isolated from Cetacea: PsPV-1 and TtPV-2 (100 MLBS, 1.00 BPP, 93 MPBS, and 100 DBS).
- 4) Close relationship between the artiodactylan
+
-supertaxon with the equine PV (70 MLBS and 58 DBS), and both were closely allied to CPV-3 (62 MLBS).
- 2) Another diverse and heterogeneous clade (with respect to the hosts) comprised super-
However, relationships between these 4 groups and 5)
-PVes, 6) RaPV, and 7) TmPV only showed low statistical support.
-PVes and the 2 PVes isolated from Cetacea were closely related as inferred from the separate E1 gene analysis (
+ o-supertaxon: 87 MLBS, 1.00 BPP, 70 MPBS, and 75 DBS; Supplementary fig. S1, Supplementary Material online).
Some Nodes Show Well-Supported Phylogenetic Contradiction
The comparison of the 4 separate E1-, E2-, L2-, and L1-phylogenies identified 3 (groups of) PVes with highly supported, contradicting phylogenetic positions under the ML criterion (MLBS > 75; frequently supported also by alternative methods), namely HPV-16 within
-PVes, "PlPV" within
-PVes, and µ-PVes within the
+
+ µ +
+
-supertaxon (Supplementary fig. S1 and table S3, Supplementary Material online). In addition, a series of nodes contradicted close relationships identified in the 3-genes ML tree (fig. 2) with high statistical support from alternative methods (DBS, MPBS > 75 and BPP > 0.95; see supplementary table S4 [Supplementary Material online] for details).
Phylogenies of PVes and Their Hosts Are Partly Incongruent
Some of the PV groups recovered were largely congruent to the corresponding taxa of the mammals they infected (see above). However, a series of PV types did not cluster accordingly to the phylogeny of their hosts:
- PVes infecting Primates:
-, ß-,
-, µ-, and
-PVes never constituted a monophyletic group in any of our analyses, and not even 2 of them showed a close relationship.
- Non-human PVes infecting Primates: RhPV-1 (rhesus monkey) and CCPV + PCPV (chimpanzees) nested within
-PVes and appeared independently derived from the paraphyletic HPVes of this group.
- PVes infecting Bovidae: OPV-1, OPV-2, BPV-1, BPV-2, and BPV-5 were paraphyletic, and not monophyletic, as their mammalian host taxon; to the contrary, OPV-1 + OPV-2 showed a well-supported close relationship with PVes infecting Cervidae (i.e., DPV, EEPV, RPV; 94 MLBS, 1.00 BPP, 80 MPBS, 98 DBS).
- Other PVes infecting Artiodactyla: BPV-7 and ChPV +
-PVes were each only distantly related with the core artiodactylan
+
-PVes.
- PVes infecting Cetartiodactyla: Cetacean PVes (PsPV-1 + TtPV-2) did not cluster with any of the artiodactylan PVes, not even with the core
+
-PVes.
- PVes infecting Carnivora: neither CfPV-2 nor CPV-3 was closely related to the core carnivoran
-PVes, but alternatively with HPV-101 + HPV-103 (65 MLBS, 1.00 BPP, 65 MPBS, 94 DBS) and
+
+
-PVes (62 MLBS), respectively.
- PVes infecting Rodentia: McPV-2 +
-PVes, MnPV-1, and TmPV did not constitute an own clade.
| Discussion |
|---|
|
|
|---|
The E1E2L1 Gene Combination is Suitable for the Reconstruction of Papillomavirus Evolution
The importance of knowledge about the evolution of pathogens such as PVes has been increasingly acknowledged during the last years (Bernard 2005
So far, previous molecular studies have focused on particular ORFs such as the E1E2 genes (Bravo and Alonso 2007), the E5 gene (Bravo and Alonso 2004), the E6 gene (Van Ranst et al. 1995
), the E7 gene (Van Ranst et al. 1992
), and the L1 gene (Chan et al. 1995
; de Villiers et al. 2004
). Only few approaches have incorporated as much genetic information as possible to investigate various PV genes separately (García-Vallvé et al. 2005) and/or particular subordinate PV groups (e.g., HPVes: Van Ranst et al. [1995]
,
-PVes: Narechania et al. [2005]
, ß-PVes: Gottschling et al. [2007]
). Thus, we present the first comprehensive analysis on the internal phylogenetic relationships of PVes in this study using the 4 large genes and covering the currently known diversity.
We have aimed to minimize reconstruction artifacts by manual refinement of the alignment. The PHTs (table 2) indicate that the E1E2L1 ORF combination at aa level is well suited for simultaneous phylogenetic inference of the entire PV sequence data set. However, the inclusion of the L2 gene in analyses appears to be only justified when the reconstruction of PVes at a lower taxonomic level is addressed. Our study therefore identifies those parts of the PV genome that can confidently be combined to minimize the degree of data-inherent perturbation for phylogenetic analyses of PVes.
Based on extraordinarily high statistical support for many nodes, we confirm the existence of a series of PV clades that have been previously ranked as "genera" based on L1 gene analyses (de Villiers et al. 2004
; Bernard et al. 2006
). Our results reliably expand the knowledge about basic relationships between such groups of PVes, particularly by identifying the supertaxa ß +
+
+
- and
+
+ µ +
+
-PVes. This is a clear advantage with respect to the present formal listing of more than 15 equally ranked groups and may be of importance for future PV classification. Despite our extensive phylogenetic analyses, the precise positions of the supertaxa
+ o- and
+
-PVes as well as of a few isolated PVes including CPV-3, EcPV, MnPV-1, RaPV, and TmPV remain currently unresolved.
Host-linked Evolution Alone Cannot Explain the Molecular Trees of PVes
The apparent congruence between phylogenies of both PVes and their mammalian hosts has initially led to the assumption that host-linked evolution is the driving force for virus diversification (Bernard et al. 1994
; Myers et al. 1994
; Chan et al. 1995
; Halpern 2000
; de Villiers et al. 2004
; García-Vallvé et al. 2005; Bernard et al. 2006
). Despite their proven importance, in-depth investigations of the role of coevolutionary interactions in phylogenetic diversification of pathogens and host lineages are remarkably limited (Nunn 2004
). Coevolution is plausible if the phylogeny of a group of hosts is congruent with the phylogeny of a group of corresponding parasites, organelles, or pathogens. The presence of
-PVes on Primates,
-PVes on Artiodactyla, and
-PVes on Carnivora makes, for example, such an assumption plausible at a first glance.
However, our results challenge the view that host-linked evolution fully explains the molecular PV trees without alternative. Viral phylogeny is frequently incongruent to hominid phylogeny (Purvis 1995
) at a broad scale, and molecular data for PVes also contradicts the coevolutionary hypothesis: non-human PVes infecting Primates (RhPV-1 and CCPV+PCPV) do not have basal, but highly derived and polyphyletic positions within
-PVes, and are closely related to different HPV types (fig. 2). Concomitantly, the numerous HPVes are not monophyletic, though this would have been expected if strict coevolution between hominids and PVes had occurred. This is in agreement with previous studies that showed a large diversity of non-human PVes nesting within
- and ß-PVes in a polyphyletic pattern (Chan et al. 1997
; Antonsson and Hansson 2002
; Gottschling et al. 2007
).
Another instructive example for evolutionary incongruence between host- and PV-phylogenies is given by the monophyly of the Bovidae (Hernández Fernández and Vrba 2005
) and the paraphyly of the corresponding PVes from the
-group (fig. 2). Furthermore, bovine PVes infecting the same host (Bos taurus) are found in at least 3 only distantly related lineages. This is in agreement with a previous study that found a broad spectrum of only distantly related bovine PVes (Ogawa et al. 2004
). Other cases of incongruence between PV- and host-phylogenies comprise PVes that infect Cetartiodactyla, Rodentia, and Carnivora (see results section), for which the hypothesis of exclusively host-linked evolution in PVes is likewise rejected by the molecular trees.
Various Putative Interferences Including Ancient Recombination Events May Perturb the Reconstruction of Papillomavirus Evolution
The question arises whether phylogenetic incongruence between PVes and hosts reflects the natural history of the viruses or whether it is due to reconstruction artifacts, as suggested for metazoan phylogeny (Baurain et al. 2006
). Long branch attraction by rate heterogeneity among different parts of the tree (Philippe et al. 2005
) is frequently discussed as a reason for phylogenetic interferences, but may have played a minor role in our reconstructions. The branches of the trees are largely well balanced, with the only exception of some isolated PVes showing uncertain phylogenetic positions (e.g., CPV-3, EcPV, MnPV-1, RaPV, and TmPV). Particularly, the L2 phylogeny exhibits some prime outliers (e.g., RaPV and TtPV-2; Supplementary fig. S1, Supplementary Material online), and we have excluded this gene from our simultaneous analysis based on the PHT results. The limitation arising from gene exclusion might become negligible in phylogenomics because it is possible to discard more than half of the data, although still recovering highly supported and plausible trees substantially devoid of tree reconstruction artifacts (Delsuc et al. 2005
; Jeffroy et al. 2006
).
Evolutionary disturbance may account for conflicting tree topologies when different genetic regions are investigated separately (Bravo and Alonso 2004; García-Vallvé et al. 2005; Gottschling et al. 2007
). Recently, recombination, which necessarily had to occur within single-host cells, has been investigated more rigorously, and up to 7 such events have been reconstructed in PVes by bioinformatics approaches (Narechania et al. 2005
; Varsani et al. 2006
). Five of the possible recombination sites are located in the L2 gene, which is in agreement with our PHT results. They clearly indicate the phylogenetic incongruence between L2 and the remaining genes. The potential for evolutionary signal perturbation of L2 is underlined by the large amount of positions that may not be homologous, or may have been saturated by multiple substitutions, and have therefore been removed from our analyses (only 35% of the original length after GBlocks processing; supplementary table S1 [Supplementary Material online]). Derived both from the relatively low number of well-supported phylogenetic conflicts (Supplementary fig. S1 and tables S3S4, Supplementary Material online) and from the relative stability of the trees using multigene matrices, ancient recombination should be considered rather rare events in PVes.
Knowledge about diversity is still extremely sparse, not only but also especially for PVes, and insufficient taxon sampling can have a major impact on phylogenetic analyses (Jeffroy et al. 2006
). Thus, the present scattered and fragmentary collection of both human and non-human PVes might influence any reconstruction of PV evolution. The number of complete PV genomes available is extremely biased toward human sources due to the clinical focus on the association between HPV infections and various types of cancer. The insufficient sampling of non-human PVes is best illustrated by the historical distinction between "HPVes" and "animal PVes" as were they two distinct and unrelated entities (Bernard 2005
). Thus, increasing the taxon sampling by isolating and sequencing novel PVes from systematically selected hosts is of crucial importance. This may shed light on viral evolution and on the interactions with their hosts by breaking the long branches that lead to isolated viruses in the molecular trees.
Additional evolutionary phenomena at the molecular level such as strong codon usage bias between PVes and their hosts have been reported (Ong et al. 1997
; Zhou et al. 1999
; Zhao et al. 2003
; Mossadegh et al. 2004
; Bravo and Müller 2005
). Codon preferences might contribute to the weak PHT values when investigating PV sequence data at the nt level, even under exclusion of the 3rd-codon position in our analyses. We have aimed to avoid such phylogenetic interferences by the usage of sequence data at the aa level. However, the impact on the trees by persistence of ancestral polymorphisms ("incomplete lineage sorting": Maddison [1997]
; Maddison and Knowles [2006]
) remains to be determined in future studies.
Infections across Species Borders and Adaptive Radiations May Have Additionally Contributed to Papillomavirus Diversification
Given the assumption that the evolutionary incongruence essentially reflects the natural history of both PVes and their hosts, alternative explanations for the PV tree topologies should be discussed. Our molecular data suggest that PVes infect groups of organisms (lineages) rather than particular species, at least in geological times. As previously suggested (Myers et al. 1996
; Rector, Van Doorslaer, et al. 2005; Gottschling et al. 2007
), lateral gene transfer by infections across species borders may be relatively frequent within those groups of close relationship. The more the phylogenetic distance grows between the native and the putative new hosts, the more such zoonoses will become unlikely. For closely related PVes, the main obstacle for infection of novel hosts might usually be the absence of physical contact, exemplified by the exceptional case of a zookeeper who temporarily tested positive for a chimpanzee PV (Antonsson and Hansson 2002
). Furthermore, the bovine PV-1 is able to infect horses and to produce sarcoids (Pfister et al. 1981
; Otten et al. 1993
; Chambers et al. 2003
), and even roughly half of the healthy horses in contact with infected fellows carry bovine viruses in their skin (Bogaert et al. 2005
). Finally, the perpetuation of host specificity might have particular importance in the
+ o-supertaxon, where sexual intercourse is required for contagion.
The increasing proximity of human and animal populations has generally led to the increase of zoonotic transmission events, but the factors causing them are still poorly understood (Mahy and Brown 2000
). Multiple invasions of only distantly related mammals may explain the existence of clades such as the super-
-PVes infecting humans and domestic animals such as hamster, dog, and cattle. PV establishment on a new host has not yet been experimentally verified (Halpern 2000
; Bernard et al. 2006
), but endothermy of the hosts might be one of the licenses that allows the viruses to cross the species barrier. Furthermore, a low immune status may facilitate invasions into comparable ecological environments provided by putative new hosts as shown for influenza viruses (Weiss 2003
; Fislova and Kostolansky 2005
; Kaye and Pringle 2005
). However, the underlying mechanisms of host invasion have not been seriously addressed for PVes to date. To the contrary, their presence in humans has exclusively been regarded as old primate inheritance (Bernard et al. 2006
), without considering alternative explanations and without accounting for the topological inconsistencies described above.
The split between CCPV + PCPV and HPV-13 has been considered to reflect the speciation between Pan and Homo (Van Ranst et al. 1995
; Halpern 2000
), but such an assumption ignores the derived phylogenetic position of chimpanzee PVes within the
-PVes (García-Vallvé et al. 2005; Bravo and Alonso 2007). A similar polyphyletic pattern of non-human PVes nested within various HPV species has also been observed for the ß-PVes (Gottschling et al. 2007
). Despite the inferred importance of the evolutionary mechanisms discussed above, adaptive radiation in a PV ancestor (e.g., by establishment of new ecological niches) followed by temporally close, host-linked evolution (García-Vallvé et al. 2005; Bravo and Alonso 2007) may also explain the present tree topologies of
- and ß-PVes. Initial analyses for the identification of PVes in the normal skin of different animals have recovered hundreds of partial sequences from putative novel viruses (Forslund et al. 1999
; Antonsson and Hansson 2002
; Ogawa et al. 2004
). These results suggest that a puzzling diversity of PVes within the same host is the rule rather than the exception. However, this implies that host-linked evolution can be primarily reconstructed at shallow (such as the L1 gene-based "species," de Villiers et al. [2004]
) rather than at deeper taxonomic level ("genera").
| Conclusions |
|---|
|
|
|---|
Our data shows that nt alignments harbor more sequence heterogeneity than aa alignments, and we propose to exclusively use aa sequence data in future PV phylogenetic analyses despite the significantly larger computational cost. Comparing the single genes in separate analyses, only few nodes show well-supported phylogenetic contradictions. Particularly, the L2 gene appears to exhibit a high potential of biasing phylogeny reconstructions, which is a strong argument to use the E1E2L1 ORF combination for multigene analyses. Based on well-resolved molecular phylogenies using this gene combination, diversification within PVes cannot be explained monocausally but rather results from multiple evolutionary mechanisms. The relative frequencies of host-linked evolution, adaptive radiation, recombination, and lateral gene transfer (fig. 4) must therefore be quantified and their potential for reciprocal interaction be analyzed. Each single case may typically include components of each of those mechanisms. The most plausible explanation may challenge traditional views about the interactions between warm-blooded vertebrates and their colonizing PVes. Thus, we recommend the development and improvement of phylogenetic methods that detect and remove those parts of the data containing a high level of perturbing signals. Finally, the generation of phylogenetically representative full-genome PV sequences especially from nonhuman hosts is necessary in order to fill the numerous gaps in the current knowledge about PV evolution.
|
| Supplementary Material |
|---|
|
|
|---|
Supplementary figures S1 and S2 and tables S1S4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Footnotes |
|---|
Aoife McLysaght, Associate Editor
| References |
|---|
|
|
|---|
Ahola H, Bergman P, Ström AC, Moreno-Lopéz J, Pettersson U. Organization and expression of the transforming region from the European elk papillomavirus (EEPV). Gene (1986) 50:195205.[CrossRef][ISI][Medline]
Antonsson A, Hansson BG. Healthy skin of many animal species harbors papillomaviruses which are closely related to their human counterparts. J Virol (2002) 76:1253712542.
Arias-Pulido H, Peyton CL, Torrez-Martínez N, Anderson DN, Wheeler CM. Human papillomavirus type 18 variant lineages in United States populations characterized by sequence analysis of LCR-E6, E2, and L1 regions. Virology (2005) 338:2234.[CrossRef][ISI][Medline]
Baurain D, Brinkmann H, Philippe H. Lack of resolution in the animal phylogeny: closely spaced cladogeneses or undetected systematic errors? Mol Biol Evol (2007) 24:69.
Bedell MA, Hudson JB, Golub TR, Turyk ME, Hosken M, Wilbanks GD, Laimins LA. Amplification of human papillomavirus genomes in vitro is dependent on epithelial differentiation. J Virol (1991) 65:22542260.
Bernard H-U. The clinical importance of the nomenclature, evolution and taxonomy of human papillomaviruses. J Clin Virol (2005) 32:16.[ISI][Medline]
Bernard H-U, Calleja-Macias IE, Dunn ST. Genome variation of human papillomavirus types: phylogenetic and medical implications. Int J Cancer (2006) 118:10711076.[CrossRef][ISI][Medline]
Bernard H-U, Chan S-Y, Manos MM, Ong C-K, Villa LL, Delius H, Peyton CL, Bauer HM, Wheeler CM. Identification and assessment of known and novel human papillomaviruses by polymerase chain reaction amplification, restriction fragment length polymorphisms, nucleotide sequence, and phylogenetic algorithms. J Infect Dis (1994) 170:10771085.[ISI][Medline]
Bible JM, Mant C, Best JM, et al, (11 co-authors). Cervical lesions are associated with human papillomavirus type 16 intratypic variants that have high transcriptional activity and increased usage of common mammalian codons. J Gen Virol (2000) 81:15171527.
Bininda-Emonds ORP. Distributed by the author (2006) Jena (Germany): Friedrich-Schiller-Universität.
Bogaert L, Martens A, De Baere C, Gasthuys F. Detection of bovine papillomavirus DNA on the normal skin and in the habitual surroundings of horses with and without equine sarcoids. Res Vet Sci (2005) 79:253258.[CrossRef][ISI][Medline]
Boshart M, Gissmann L, Ikenberg H, Kleinheinz A, Scheurlen W, zur Hausen H. A new type of papillomavirus DNA, its presence in genital cancer biopsies and in cell lines derived from cervical cancer. EMBO J (1984) 3:11511157.[ISI][Medline]
Bravo IG, Alonso Á. Mucosal human papillomaviruses encode four different E5 proteins whose chemistry and phylogeny correlate with malignant or benign growth. J Virol (2004) 78:1361313626.
Bravo IG, Alonso Á. Phylogeny and evolution of papillomaviruses based on the E1 and E2 proteins. Virus Genes (2007) 34:249262.[CrossRef][ISI][Medline]
Bravo IG, Müller M. Codon usage in papillomavirus genes. Papillomavirus Rep (2005) 16:6372.[CrossRef]
Campo MS, Moar MH, Laird HM, Jarrett WF. Molecular heterogeneity and lesion site specificity of cutaneous bovine papillomaviruses. Virology (1981) 113:323335.[CrossRef][ISI][Medline]
Carney HC, England JJ, Hodgin EC, Whiteley HE, Adkison DL, Sundberg JP. Papillomavirus infection of aged Persian cats. J Vet Diagn Invest (1990) 2:294299.
Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol (2000) 17:540552.
Chambers G, Ellsmore VA, O'Brien PM, Reid SWJ, Love S, Campo MS, Nasir L. Association of bovine papillomavirus with the equine sarcoid. J Gen Virol (2003) 84:10551062.
Chan S-Y, Bernard H-U, Ratterree M, Birkebak TA, Faras AJ, Ostrow RS. Genomic diversity and evolution of papillomaviruses in rhesus monkeys. J Virol (1997) 71:49384943.[Abstract]
Chan S-Y, Delius H, Halpern AL, Bernard H-U. Analysis of genomic sequences of 95 papillomavirus types: uniting typing, phylogeny, and taxonomy. J Virol (1995) 69:30743083.[Abstract]
Chan S-Y, Ho L, Ong C-K, et al, (11 co-authors). Molecular variants of human papillomavirus type 16 from four continents suggest ancient pandemic spread of the virus and its coevolution with humankind. J Virol (1992) 66:20572066.
Chen EY, Howley PM, Levinson AD, Seeburg PH. The primary structure and genetic organization of the bovine papillomavirus type 1 genome. Nature (1982) 299:529534.[CrossRef][Medline]
Chen Z, Schiffman M, Herrero R, DeSalle R, Burk R. Human papillomavirus (HPV) types 101 and 103 isolated from cervicovaginal cells lack an E6 open reading frame (ORF) and are related to gamma-papillomaviruses. Virology (2007) 360:447453.[CrossRef][Medline]
Chen Z, Terai M, Fu L, Herrero R, DeSalle R, Burk RD. Diversifying selection in human papillomavirus type 16 lineages based on complete genome analyses. J Virol (2005) 79:70147023.
Christensen ND, Cladel NM, Reed CA, Budgeon LR, Welsh PA, Patrick SD, Kreider JW. Laboratory production of infectious stocks of rabbit oral papillomavirus. J Gen Virol (1996) 77:17931798.
Christensen ND, Cladel NM, Reed CA, Han R. Rabbit oral papillomavirus complete genome sequence and immunity following genital infection. Virology (2000) 269:451461.[CrossRef][ISI][Medline]
Cole ST, Danos O. Nucleotide sequence and comparative analysis of the human papillomavirus type 18 genome. Phylogeny of papillomaviruses and repeated structure of the E6 and E7 gene products. J Mol Biol (1987) 193:599608.[CrossRef][ISI][Medline]
Cunningham CW. Can three incongruence tests predict when data should be combined? Mol Biol Evol (1997) 14:733740.[Abstract]
Danos O, Katinka M, Yaniv M. Human papillomavirus 1a complete DNA sequence: a novel type of genome organization among papovaviridae. EMBO J (1982) 1:231236.[ISI][Medline]
Delius H, Hofmann B. Primer-directed sequencing of human papillomavirus types. Curr Top Microbiol Immunol (1994) 186:1331.[Medline]
de Villiers E-M, Fauquet C, Broker TR, Bernard H-U, zur Hausen H. Classification of papillomaviruses. Virology (2004) 324:1727.[CrossRef][ISI][Medline]
Delsuc F, Brinkmann H, Philippe H. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. (2005) 6:361375.[ISI][Medline]
Dimmic MW, Rest JS, Mindell DP, Goldstein RA. rtREV: an amino acid substitution matrix for inference of retrovirus and reverse transcriptase phylogeny. J Mol Evol (2002) 55:6573.[CrossRef][ISI][Medline]
Doorbar J. The papillomavirus life cycle. J Clin Virol (2005) 32:715.[ISI][Medline]
Drummond AJ, Ho SY, Phillips MJ, Rambaut A. Relaxed phylogenetics and dating with confidence. PLoS Biol (2006) 4:e88.[CrossRef][Medline]
Drummond AJ, Nicholls GK, Rodrigo AG, Solomonc W. Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. Genetics (2002) 161:13071320.
Drummond AJ, Rambaut A. BEAST 1.3. In: Bayesian evolutionary analysis sampling trees (2003) Oxford: Department of Zoology, University of Oxford.
Egawa K. Eccrine-centred distribution of human papillomavirus 63 infection in the epidermis of the plantar skin. Br J Dermatol (2005) 152:993996.[CrossRef][ISI][Medline]
Eg



