Molecular Biology and Evolution 19:1656-1671 (2002)
© 2002 Society for Molecular Biology and Evolution
Molecular Phylogeny of Living Xenarthrans and the Impact of Character and Taxon Sampling on the Placental Tree Rooting



Laboratoire de Paléontologie, Paléobiologie et Phylogénie, Institut des Sciences de l'Evolution, Université Montpellier II, Montpellier, France;
Biology and Biochemistry, Queen's University of Belfast;
Department of Biology, University of California;
Department of Biochemistry, University of Nijmegen, Nijmegen, The Netherlands;
Bioinformatics, GlaxoSmithKline, Pennsylvania;
Institute for Biodiversity and Ecosystems Dynamics, University of Amsterdam
Abstract
Extant xenarthrans (armadillos, anteaters, and sloths) are among the most derived placental mammals. South America was the cradle of their evolutionary history. During the Tertiary, while South America remained isolated from other continents, xenarthrans underwent an extraordinary radiation. The 13 living genera are relicts of this earlier diversification and represent one of the four major clades of placental mammals. Sequences of the three independent protein-coding nuclear markers
α2B adrenergic receptor (ADRA2B), breast cancer susceptibility (BRCA1), and von Willebrand Factor (VWF) were determined for 12 of the 13 living xenarthran genera. A comparison of the evolutionary dynamics of these nuclear exons within a likelihood framework revealed contrasting patterns of molecular evolution. All codon positions of BRCA1 were shown to evolve in a strikingly similar manner, and its third codon positions appeared less saturated within placentals than those of ADRA2B and VWF. Maximum likelihood and Bayesian phylogenetic analyses of a data set of 47 placental taxa rooted by three marsupial outgroups resolved the phylogeny of Xenarthra, with some evidence for two radiation events in armadillos, and provided a strongly supported picture of placental interordinal relationships. This topology was fully compatible with recent studies that divide placentals into the Southern Hemisphere clades Afrotheria and Xenarthra and a monophyletic Northern Hemisphere clade (Boreoeutheria) composed of Laurasiatheria and Euarchontoglires. Partitioned likelihood statistical tests of the position of the root, under different character partition schemes, identified three almost equally likely hypotheses for early placental divergences: a basal Afrotheria, an Afrotheria + Xenarthra clade, or a basal Xenarthra (Epitheria hypothesis). We took advantage of the extensive sampling achieved within Xenarthra to assess its impact on the location of the root on the placental tree. By resampling taxa within Xenarthra, we showed that the conservative Shimodaira-Hasegawa likelihood-based test of alternative topologies is sensitive to both character and taxon sampling.
Introduction
Living xenarthrans are represented by three morphologically distinct lineages: armored armadillos, toothless anteaters, and phyllophagous tree-sloths. The 30 living species of the order Xenarthra (Wetzel 1985
Resolving the phylogenetic position of the order Xenarthra within Mammalia is of primary importance for understanding the morphological and biogeographical processes that shaped the early stages of placental evolution. Indeed, despite their highly specialized morphology, xenarthrans retain anatomical and physiological characters thought to be primitive for placental mammals (McKenna 1975). This composite morphology, mixing ancestral and derived characters, has made the position of the order within placentals very difficult to assess (Engelmann 1985; Gaudin et al. 1996). On the basis of the retention of numerous archaic features, morphologists (Gregory 1910; McKenna 1975; Novacek 1992; Shoshani and McKenna 1998) have long proposed that Xenarthra is the sister group of all other eutherians, which are thereby named Epitheria. However, the morphological synapomorphies defining epitherians are weak, and their phylogenetic distribution is equivocal (Gaudin et al. 1996). Early studies of complete mitochondrial genomes did not support a basal position for Xenarthra (Arnason, Gullberg, and Janke 1997). More recent analyses, including a larger taxon sampling, suggested a sister-group relationship between the nine-banded armadillo (Dasypus novemcinctus) and representatives of the African clade (Waddell et al. 1999; Cao et al. 2000; Mouchaty et al. 2000a). However, complete mitochondrial genome analyses appear to be affected by insufficient taxon sampling, especially within Xenarthra, which is likely responsible for long-branch attraction and rooting artifacts (Waddell et al. 1999), and they suffer from saturation in the deepest parts of the tree (Cao et al. 2000; Springer et al. 2001).
By contrast, two recent independent analyses based on large concatenations of mainly nuclear genes for a broad taxon sampling of eutherian mammals have shown that the order Xenarthra on its own represents one of the major clades of placentals (Madsen et al. 2001; Murphy et al. 2001a, 2001b). These studies provided convincing evidence for an arrangement of placental orders into four major clades: (I) Afrotheria, (II) Xenarthra, (III) Euarchontoglires, and (IV) Laurasiatheria, emphasizing the crucial influence of tectonic events on their early differentiation. A close relationship between the Northern Hemisphere clades III and IV has been proposed (i.e., "Boreoeutheria"; Springer and de Jong 2001), with either Afrotheria or Xenarthra as the most basal clade, suggesting a Southern Hemisphere origin for eutherian mammals (Eizirik, Murphy, and O'Brien 2001; Madsen et al. 2001). In fact, the relationships among these four clades depend directly on the unstable position of the root. A recent application of the Bayesian approach (Yang and Rannala 1997; Huelsenbeck et al. 2001) to this problem, using a large character sampling, supported a basal position of Afrotheria (Murphy et al. 2001b). However, in that study Xenarthra was poorly represented (only three taxa) relative to the other three major clades, which included 8 (Afrotheria), 11 (Euarchontoglires), and 20 (Laurasiatheria) taxa.
Here we present a study including the broadest taxonomic representation so far considered in a molecular approach to xenarthran phylogeny. We constructed a supermatrix of 47 placental taxa and three marsupial outgroups, including 12 of the 13 living xenarthran genera, for three genetically independent protein-coding nuclear genes: α2B adrenergic receptor (ADRA2B), breast cancer susceptibility exon 11 (BRCA1), and von Willebrand Factor exon 28 (VWF), representing a total of 5,130 aligned nucleotide sites. These nuclear markers were chosen because they have been widely used to infer the phylogeny of placental mammals (Springer et al. 1997; Stanhope et al. 1998a, 1998b; Madsen et al. 2001), and their resolving power has been shown to be higher than that of mitochondrial markers at the mammalian interordinal level (Springer et al. 1999; Springer et al. 2001). Furthermore, because they are protein-coding, they allow the phylogenetic signal contained in both nucleotides and amino acids to be compared. The extensive sampling achieved within Xenarthra allows us to resolve intraordinal relationships and to investigate the impact of increased taxon sampling on the position of the root of the placental tree. Increased nucleotide sampling has a dominant impact on phylogenetic accuracy (Poe and Swofford 1999; Rosenberg and Kumar 2001), but increased taxon sampling has also been shown to facilitate phylogenetic inference (Lecointre et al. 1993; Hillis 1996; Rannala et al. 1998). Indeed, breaking potentially long branches by adding taxa within one of the two most basal clades might help stabilize the deepest parts of the placental ingroup topology. Using the maximum likelihood (ML) and Bayesian frameworks, this study aims to (1) compare the evolutionary dynamics and phylogenetic content of these three protein-coding nuclear markers, which evolve under different selective pressures, (2) resolve the phylogeny of living xenarthrans with special reference to armadillos, and (3) evaluate the effect of increased taxon sampling within Xenarthra on locating the root of the placental tree.
Materials and Methods
Taxonomic Sampling
Thirteen xenarthran species representing all living genera except the rare and cryptic subterranean genus Chlamyphorus were sampled. We assembled a data set encompassing all placental representatives sequenced to date for the three nuclear coding genes ADRA2B, BRCA1, and VWF. Three marsupials (Macropus, Didelphis, and Vombatus) were used as outgroups to locate the root of the placental tree (see Supplementary Material).
Data Acquisition
Xenarthran samples preserved in 95% ethanol were stored in the mammalian tissue collection of the Institut des Sciences de l'Evolution de Montpellier (Catzeflis 1991). Total DNA was extracted from D. novemcinctus (nine-banded armadillo), Dasypus kappleri (great long-nosed armadillo), Chaetophractus villosus (larger hairy armadillo), Euphractus sexcinctus (six-banded armadillo), Zaedyus pichiy (pichi), Tolypeutes matacus (southern three-banded armadillo), Cabassous unicinctus (southern naked-tailed armadillo), Priodontes maximus (giant armadillo), Cyclopes didactylus (pygmy anteater), Tamandua tetradactyla (collared anteater), Myrmecophaga tridactyla (giant anteater), Bradypus tridactylus (pale-throated three-toed sloth), and Choloepus didactylus (southern two-toed sloth).
The single-exon gene ADRA2B was amplified and sequenced using the primers designed by Springer et al. (1997) and the following additional forward (A) and reverse (B) primers: A4 (5'-GCCATCGCGGCNGYCRYCACCTTCCTCATC-3'), B4 (5'-GCTGCGYTTGGCAATCAGGTAGAGTCG-3'), and B5 (5'-GCGCCCAGGCTGTAGCTGAAGAAGAA-3'). Exon 28 of VWF was amplified according to Delsuc et al. (2001). PCR products for ADRA2B and VWF were purified from 1% agarose gels using Amicon Ultrafree-DA columns (Millipore) and sequenced on both strands by automated sequencing (Big Dye Terminator cycle kit) on an ABI 310 (PE Applied Biosystems). BRCA1 sequences were obtained as described elsewhere (Teeling et al. 2000; Madsen et al. 2001): the 2,900-bp region of exon 11 was amplified in three overlapping segments. PCR primers for each segment were as follows: [1] Forward UF1 (5'-GTTTCAAACTTGCATGTGGAGCC-3'), Reverse XR11 (5'-GCAGATTCTTTTTCCAATGATTCTG-3'); [2] Forward GF8 (5'-GGCCTTCATCCTGAGGATTTTATCAA-3'), Reverse R19 (5'-TGYAAATACTGAGTATCAAGTTCACT-3'); [3] Forward XBF17 (5'-TATGGCACTCARGAYAGTATCTCATT-3'), Reverse BRCA1B (5'-GTTGGAAGCAGGGAAGCTCTTCATC-3'). Sequencing was performed using the PCR primers and additional internal primers: in segment [1], Forward F4 (5'-GAAAGTTAATGAGTGGTTTTCCAGAA-3') and Reverse UR7 (5'-CTTCCTCCGATAGGTTTTCCCAA-3'); in segment [3], Forward F25 (5'-AACTAGGTAGAAACAGAGGRCCTA-3') and Reverse R26 (5'-TTAGGYCCTCTGTTTCTACCTAGTT-3').
The 26 xenarthran sequences new to this study have been deposited in the EMBL data bank. Taxonomy and accession numbers referring to all sequences used in this study are provided as Supplementary Material.
Sequence Alignment
Sequences were manually aligned with the ED editor of the MUST package (Philippe 1993). We excluded a glutamic acid repeat region of ADRA2B and a 21-bp region of BRCA1 that is repeated up to four times, according to Madsen et al. (2001). The concatenated placental data set is 5,130 bp long: ADRA2B (1,152 bp), BRCA1 (2,788 bp), and VWF (1,190 bp). All gaps introduced during alignment were treated as missing data in subsequent analyses. Alignments are available upon request.
Phylogenetic Analyses
All phylogenetic analyses were conducted under the ML and Bayesian approaches. ML was chosen because it is known to be less sensitive to long-branch attraction artifacts and because it takes the underlying molecular evolutionary process into account (Swofford et al. 2001). The Bayesian approach allows large phylogenetic data sets to be analyzed under complex evolutionary models (Huelsenbeck et al. 2001). Working in a likelihood framework also permits the statistical comparison of competing hypotheses and of topologies obtained under different partition and rooting schemes (Huelsenbeck and Rannala 1997; Whelan, Lio, and Goldman 2001).
Results from Modeltest 3.06 (Posada and Crandall 1998), based on the Akaike Information Criterion (AIC), indicated that the General Time Reversible model (GTR; Yang 1994) plus a gamma (Γ) distribution of shape parameter α (Yang 1996a) and a proportion I of invariable sites (GTR + Γ + I) was the best model for each of the three data sets. However, to keep ML analyses comparable between PAUP* version 4.0b8 (Swofford 1998) and PAML version 3.0c (Yang 1997), which does not allow the use of invariable sites, we chose the GTR + Γ8 model, in which eight discrete rate categories approximate the continuous Γ distribution.
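The eight-category approximation can be illustrated with a short, standard-library-only Python sketch of Yang's (1994) mean-rate discretization, assuming rates follow a gamma distribution with shape α and mean 1. This is not PAUP* or PAML code, and whether those programs were run with the mean or the median variant is not stated here; the helper names are hypothetical.

```python
import math
from typing import List


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x), computed from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def gamma_cut_x(a: float, p: float) -> float:
    """Bisection for x such that P(a, x) = p, where x = alpha * rate."""
    lo, hi = 0.0, max(50.0, 20.0 * a)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha: float, k: int = 8) -> List[float]:
    """Mean rate of each of k equal-probability categories of Gamma(shape=alpha, rate=alpha)."""
    # Category boundaries on the x = alpha * r scale (the i/k quantiles).
    cuts = [gamma_cut_x(alpha, i / k) for i in range(1, k)]
    # Because the mean rate is 1, the partial mean of r below a boundary is P(alpha + 1, x).
    cum = [0.0] + [reg_lower_gamma(alpha + 1.0, x) for x in cuts] + [1.0]
    return [k * (cum[i + 1] - cum[i]) for i in range(k)]


if __name__ == "__main__":
    # alpha = 0.5 is the value used for the initial NJ distances; eight categories.
    print([round(r, 4) for r in discrete_gamma_rates(0.5, 8)])
```

Each category carries probability 1/k and the returned rates average to 1, so a site's likelihood under Γ8 is the mean of its likelihoods computed with the eight rate multipliers.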
To avoid excessive computation time, ML analyses with PAUP* version 4.0b8 (Swofford 1998) were conducted using a loop approach to estimate the best tree and the optimal likelihood parameters. First, the substitution rate matrix and the among-site rate heterogeneity parameters were optimized on a neighbor-joining (NJ) topology derived from ML distances computed under a GTR + Γ8 model (α = 0.5). Second, an ML heuristic search with Tree Bisection Reconnection (TBR) branch swapping was conducted to identify the optimal tree under these GTR + Γ8 parameter estimates. Third, the likelihood parameters were reestimated on this new topology. Fourth, a new heuristic tree search was run under the reestimated GTR + Γ8 parameters. This loop was repeated until both topology and parameters had stabilized (after three cycles).
The Bayesian approach to phylogenetic reconstruction (Yang and Rannala 1997; Huelsenbeck et al. 2001) was implemented using MrBayes 2.01 (Huelsenbeck and Ronquist 2001). Metropolis-coupled Markov chain Monte Carlo (MCMCMC) sampling was performed with four chains run for 500,000 generations, using default model parameters as starting values.
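The alternating ML estimation loop described above can be written as a small fixed-point routine. This is a hedged Python sketch, not PAUP* code: `optimize_params` and `search_tree` are hypothetical callables standing in for parameter optimization on a fixed tree and for the TBR heuristic search under fixed parameters, and parameters are assumed to be a dict of named floats.

```python
from typing import Callable, Dict, Hashable, Tuple

Params = Dict[str, float]


def alternate_until_stable(
    start_tree: Hashable,
    optimize_params: Callable[[Hashable], Params],
    search_tree: Callable[[Params], Hashable],
    max_cycles: int = 10,
    tol: float = 1e-6,
) -> Tuple[Hashable, Params, int]:
    """Alternate parameter estimation and tree search until both stop changing."""
    tree = start_tree
    params = optimize_params(tree)  # step 1: fit parameters on the starting (e.g. NJ) tree
    for cycle in range(1, max_cycles + 1):
        new_tree = search_tree(params)          # steps 2 and 4: search with parameters fixed
        new_params = optimize_params(new_tree)  # step 3: re-estimate parameters on that tree
        same_tree = new_tree == tree
        same_params = all(abs(new_params[k] - params[k]) <= tol for k in params)
        tree, params = new_tree, new_params
        if same_tree and same_params:
            return tree, params, cycle
    return tree, params, max_cycles
```

The loop stops as soon as a new search returns the tree it started from and the re-estimated parameters move by less than `tol`; `max_cycles` is only a safety cap.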
The robustness of nucleotide-derived trees was estimated by bootstrap percentages (BP; Felsenstein 1985) computed by PAUP* using the optimal ML parameter estimates, with NJ starting trees and TBR branch swapping. The number of TBR rearrangements was unlimited for the Xenarthra data set (1,000 bootstrap replicates) and limited to 1,000 per replicate for the placental data set (100 bootstrap replicates). Bayesian posterior probabilities were obtained from the 50% majority-rule consensus of trees sampled every 20 generations, after removing the trees sampled before the chains reached apparent stationarity (burn-in determined by empirical checking of likelihood values).
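Both support measures reduce to clade frequencies over a set of trees: bootstrap replicates, or MCMC samples retained after burn-in. A minimal Python sketch, assuming each tree has already been parsed into its set of clades (frozensets of taxon names); tree parsing and the choice of burn-in are outside the sketch, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]


def clade_frequencies(
    trees: List[Set[Clade]], burn_in: int = 0, threshold: float = 0.5
) -> Dict[Clade, float]:
    """Frequency of each clade among retained trees, keeping those above `threshold`.

    With bootstrap trees (burn_in=0) the values are bootstrap proportions; with
    MCMC samples they estimate posterior clade probabilities.
    """
    kept = trees[burn_in:]
    if not kept:
        raise ValueError("no trees left after burn-in")
    counts = Counter(clade for tree in kept for clade in tree)
    n = len(kept)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}
```

Clades kept with `threshold=0.5` are exactly those of the 50% majority-rule consensus, and their frequencies are the values reported on its nodes.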
Evaluation of the Saturation of Nucleotide Substitutions
Nucleotide substitution saturation of the phylogenetic markers was evaluated with the graphical method used by Philippe and Forterre (1999). The inferred number of substitutions between each pair of sequences was estimated from the ML tree as the sum of the lengths of all branches linking the two sequences. The saturation level was estimated by plotting the number of observed differences as a function of the ML inferred number of substitutions for all 1,225 pairwise comparisons among the 50 sequences. In these plots, the straight line Y = X represents the situation in which the number of inferred substitutions equals the number of observed differences, i.e., there is no detected homoplasy in the data. Saturation is evidenced when the number of inferred substitutions increases while the number of observed differences remains constant (plateau shape).
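A minimal Python sketch of the quantities behind such plots. It assumes the ML tree is available as a parent map with branch lengths in substitutions per site (a hypothetical representation) and that observed differences are uncorrected p-distances with gaps treated as missing, as in the alignment. The regression is ordinary least squares with an intercept; the text does not specify the fitting method, so that choice is ours. For third codon positions one would pass `seq[2::3]` (assuming frame 1), together with a tree whose branch lengths were estimated from the same characters.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Rooted tree as {child: (parent, branch_length)}; the root has no entry.
ParentMap = Dict[str, Tuple[str, float]]


def distances_to_ancestors(node: str, parent: ParentMap) -> Dict[str, float]:
    """Path length from `node` to itself and to each of its ancestors."""
    out = {node: 0.0}
    dist = 0.0
    while node in parent:
        up, length = parent[node]
        dist += length
        node = up
        out[node] = dist
    return out


def patristic(a: str, b: str, parent: ParentMap) -> float:
    """Sum of branch lengths on the path linking leaves a and b (inferred substitutions)."""
    up_a = distances_to_ancestors(a, parent)
    node, dist = b, 0.0
    while node not in up_a:  # climb from b until reaching an ancestor of a
        up, length = parent[node]
        dist += length
        node = up
    return dist + up_a[node]


def p_distance(s1: str, s2: str, missing: str = "-?N") -> float:
    """Proportion of differing sites, ignoring gaps and ambiguous sites."""
    pairs = [(x, y) for x, y in zip(s1, s2) if x not in missing and y not in missing]
    return sum(x != y for x, y in pairs) / len(pairs)


def saturation_points(seqs: Dict[str, str], parent: ParentMap) -> List[Tuple[float, float]]:
    """(inferred, observed) for every pair; 50 sequences give 50 * 49 / 2 = 1,225 points."""
    return [(patristic(a, b, parent), p_distance(seqs[a], seqs[b]))
            for a, b in combinations(sorted(seqs), 2)]


def ols_slope(points: List[Tuple[float, float]]) -> float:
    """Least-squares slope of observed differences on inferred substitutions."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

A slope near 1 means the points track the Y = X line (little homoplasy); a slope near 0 over a range of inferred distances is the plateau that signals saturation.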
Statistical Tests of the Position of the Root: Impact of Character Partitions and Taxon Sampling
Extensive studies (Madsen et al. 2001; Murphy et al. 2001a, 2001b; this article) have identified four major clades of placentals: Afrotheria, Xenarthra, Euarchontoglires, and Laurasiatheria. Assuming the respective monophyly of these clades, as suggested by the ML analyses of the data, there are 15 possible ways to connect the four clades into a bifurcating topology rooted by marsupials. These 15 alternative topologies were compared with the ML test of Kishino and Hasegawa (1989), using the Shimodaira and Hasegawa (1999) correction for multiple comparisons of topologies defined a posteriori (SH test).
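The figure of 15 follows from the number of rooted bifurcating trees on n units, (2n - 3)!!, which is 5 x 3 x 1 = 15 for n = 4 clades. The Python sketch below enumerates them as nested tuples by grafting each clade onto every branch, root branch included; the helper names are ours. It also shows that only 3 of the 15 keep Euarchontoglires and Laurasiatheria together, i.e., the Boreoeutheria-compatible rootings (basal Afrotheria, basal Xenarthra, and Afrotheria + Xenarthra).

```python
from typing import List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def attach(tree: Tree, leaf: str) -> List[Tree]:
    """All trees obtained by grafting `leaf` onto each branch of `tree`, root branch included."""
    out: List[Tree] = [(tree, leaf)]  # leaf becomes sister of the whole (sub)tree
    if isinstance(tree, tuple):
        left, right = tree
        out += [(new, right) for new in attach(left, leaf)]
        out += [(left, new) for new in attach(right, leaf)]
    return out


def rooted_topologies(taxa: List[str]) -> List[Tree]:
    """Enumerate rooted bifurcating topologies; there are (2n - 3)!! of them."""
    trees: List[Tree] = [taxa[0]]
    for leaf in taxa[1:]:
        trees = [t for old in trees for t in attach(old, leaf)]
    return trees


def leaves(tree: Tree) -> Set[str]:
    return {tree} if isinstance(tree, str) else leaves(tree[0]) | leaves(tree[1])


def clades(tree: Tree) -> List[frozenset]:
    if isinstance(tree, str):
        return []
    return [frozenset(leaves(tree))] + clades(tree[0]) + clades(tree[1])


CLADES = ["Afrotheria", "Xenarthra", "Euarchontoglires", "Laurasiatheria"]
ALL_TREES = rooted_topologies(CLADES)
BOREO = frozenset({"Euarchontoglires", "Laurasiatheria"})
BOREO_TREES = [t for t in ALL_TREES if BOREO in clades(t)]

if __name__ == "__main__":
    print(len(ALL_TREES), "rooted topologies;", len(BOREO_TREES), "keep Boreoeutheria")
```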
To evaluate the impact of character sampling on the location of the root, we performed SH tests using different character partitions of the combined data set. Because ADRA2B, BRCA1, and VWF are located on different chromosomes and their protein products differ markedly in function, these nuclear genes likely evolve under different selective pressures. To accommodate the expected differences in molecular evolution, we followed the approach of Yang (1996b) for analyzing multiple genes. Thus, each codon position of each of the three genes was assigned an independent GTR + Γ8 model, for which a separate set of likelihood parameters (base frequencies, substitution rate parameters, the α parameter of the Γ distribution, and branch lengths) was estimated. This accounts for differences in substitution pattern and rate heterogeneity both among the three codon positions and among the three genes. This partitioned ML model was used in PAML to compute log-likelihoods and confidence probability values for the 15 topologies corresponding to the possible locations of the root among the four major clades of placentals. The following four sets of characters were analyzed: (1) first and second codon positions of the three genes (ABV 1 + 2; 3,421 sites; 6 partitions), (2) first and second codon positions of ADRA2B and VWF plus all codon positions of BRCA1 (A2BV2; 4,350 sites; 7 partitions), (3) all codon positions of the three genes (ABV 1 + 2 + 3; 5,130 sites; 9 partitions), and (4) concatenated amino acids of the three proteins (ABV AA; 1,709 sites). Because of computational time constraints, a single JTT (Jones, Taylor, and Thornton 1992) plus Γ8 model was assumed for the concatenated amino acid data set.
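The site counts of these character sets can be reproduced from the gene lengths given under Sequence Alignment, under an assumption of ours that is not stated in the text: each gene's alignment starts on a first codon position, and trailing partial codons are dropped for the amino acid set. A minimal Python sketch with hypothetical helper names:

```python
from typing import Dict, List, Tuple

# Aligned lengths (bp) from the Sequence Alignment section; reading frames are assumed.
GENE_LENGTHS = {"ADRA2B": 1152, "BRCA1": 2788, "VWF": 1190}


def codon_partitions(lengths: Dict[str, int]) -> Dict[Tuple[str, int], List[int]]:
    """Map (gene, codon position) to 0-based columns of the concatenated alignment.

    Assumes every gene starts on a first codon position.
    """
    parts: Dict[Tuple[str, int], List[int]] = {}
    offset = 0
    for gene, length in lengths.items():
        for i in range(length):
            parts.setdefault((gene, i % 3 + 1), []).append(offset + i)
        offset += length
    return parts


def n_sites(parts: Dict[Tuple[str, int], List[int]], keys: List[Tuple[str, int]]) -> int:
    return sum(len(parts[k]) for k in keys)


PARTS = codon_partitions(GENE_LENGTHS)
GENES = list(GENE_LENGTHS)
SCHEMES = {
    "ABV 1 + 2": [(g, p) for g in GENES for p in (1, 2)],
    "A2BV2": [(g, p) for g in GENES for p in ((1, 2, 3) if g == "BRCA1" else (1, 2))],
    "ABV 1 + 2 + 3": [(g, p) for g in GENES for p in (1, 2, 3)],
}

if __name__ == "__main__":
    for name, keys in SCHEMES.items():
        print(name, n_sites(PARTS, keys), "sites,", len(keys), "partitions")
    # Complete codons only for the amino acid set.
    print("ABV AA", sum(length // 3 for length in GENE_LENGTHS.values()), "sites")
```

Under these assumptions the script yields 3,421, 4,350, and 5,130 nucleotide sites in 6, 7, and 9 partitions, and 1,709 amino acid sites, matching the figures above.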
To assess the impact of increased taxon sampling within Xenarthra on the location of the root, we conducted SH tests after resampling xenarthran species from its three subgroups: sloths (two taxa), anteaters (three), and armadillos (eight). To do so, we built all taxon combinations with each one of the 13 available xenarthrans (13 combinations), with one xenarthran from each of the three subclades (48 combinations of three taxa), with two xenarthrans per subclade (84 combinations of six taxa), with three xenarthrans per subclade except for sloths (56 combinations of eight taxa), and with all xenarthrans (the original combination of 13 taxa). Because of computational time limitations, PAUP* was used to perform these SH tests assuming a single GTR + Γ8 model on the different concatenated data sets.
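The resampling scheme can be enumerated directly. The Python sketch below uses the 13 xenarthrans listed under Data Acquisition (genus-level labels are ours for brevity) and reproduces the combination counts given above; the function and variable names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product
from typing import Dict, List, Tuple

SUBCLADES: Dict[str, List[str]] = {
    "sloths": ["Bradypus", "Choloepus"],
    "anteaters": ["Cyclopes", "Tamandua", "Myrmecophaga"],
    "armadillos": ["D. novemcinctus", "D. kappleri", "Chaetophractus", "Euphractus",
                   "Zaedyus", "Tolypeutes", "Cabassous", "Priodontes"],
}
ALL_XEN = [t for taxa in SUBCLADES.values() for t in taxa]


def per_subclade(k: int) -> List[Tuple[str, ...]]:
    """All samples with k taxa from each subclade (all of them if it has fewer than k)."""
    pools = [list(combinations(taxa, min(k, len(taxa)))) for taxa in SUBCLADES.values()]
    return [tuple(t for group in pick for t in group) for pick in product(*pools)]


SAMPLES = ([(t,) for t in ALL_XEN]        # one xenarthran at a time
           + per_subclade(1)              # one per subclade
           + per_subclade(2)              # two per subclade
           + per_subclade(3)              # three per subclade (both sloths)
           + [tuple(ALL_XEN)])            # all 13

if __name__ == "__main__":
    print(sorted(Counter(len(s) for s in SAMPLES).items()))
    print(len(SAMPLES), "taxon sets;", 3 * len(SAMPLES), "trees for three rooting hypotheses")
```

This gives 202 taxon sets (13, 48, 84, 56, and 1 sets of 1, 3, 6, 8, and 13 taxa); combined with the three main rooting hypotheses, that is 606 trees.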
Results
Molecular Evolution of the Three Nuclear Markers
Base Compositions
The three nuclear genes exhibit marked differences in base composition. ADRA2B and VWF are similar in being GC-rich, with mean overall GC contents of 64.9% and 61.9%, respectively. By contrast, BRCA1 is rather AT-rich, with a mean overall AT content of 59.6%. As expected, these differences in base composition are most pronounced at third codon positions: ADRA2B and VWF show mean GC3 values of more than 80%, whereas this value is only 30% for BRCA1. This GC3 enrichment is particularly pronounced in xenarthrans, which show values of more than 90% for ADRA2B and VWF. As a consequence, the base composition stationarity assumed by ML is satisfied for BRCA1 but is violated by several taxa when all codon positions of ADRA2B and VWF are considered (P < 0.05; χ² test of TREE-PUZZLE 4.02; Strimmer and von Haeseler 1996).
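The composition statistics above can be sketched in Python as follows. GC3 assumes each sequence starts on a first codon position. The per-sequence χ² test (each sequence's base counts against pooled frequencies, 3 degrees of freedom) is a simplified test in the spirit of the TREE-PUZZLE composition test, not a reimplementation of it; function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Tuple

BASES = "ACGT"


def gc_content(seq: str) -> float:
    """Fraction of G + C among unambiguous bases (gaps and ambiguity codes ignored)."""
    bases = [b for b in seq.upper() if b in BASES]
    return sum(b in "GC" for b in bases) / len(bases)


def gc3(seq: str) -> float:
    """GC content at third codon positions, assuming seq starts on a first position."""
    return gc_content(seq[2::3])


def chi2_sf_df3(x: float) -> float:
    """Exact upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def composition_test(seqs: Dict[str, str]) -> Dict[str, Tuple[float, float]]:
    """Chi-square of each sequence's base counts against pooled frequencies (3 df)."""
    counts = {name: Counter(b for b in s.upper() if b in BASES) for name, s in seqs.items()}
    pooled: Counter = Counter()
    for c in counts.values():
        pooled.update(c)
    total = sum(pooled.values())
    freqs = {b: pooled[b] / total for b in BASES}  # assumes all four bases occur
    result = {}
    for name, c in counts.items():
        n = sum(c.values())
        stat = sum((c[b] - n * freqs[b]) ** 2 / (n * freqs[b]) for b in BASES)
        result[name] = (stat, chi2_sf_df3(stat))
    return result
```

A sequence whose P value falls below 0.05 departs from the pooled composition, which is the situation described above for several taxa at all codon positions of ADRA2B and VWF.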
Evolutionary Dynamics
Comparisons of the ML GTR + Γ8 parameters estimated for each codon position reveal a particular behavior of BRCA1 relative to ADRA2B and VWF (fig. 1). For the two latter genes, we observed almost the same pattern of substitutions at first and second codon positions. These positions are characterized by an excess of transitions over transversions, especially marked at second codon positions, but seem to exhibit comparable relative evolutionary rates. Furthermore, these positions share similarly small values of the gamma shape parameter (α < 0.65), indicating strong rate heterogeneity at these positions within each gene. As expected from the redundancy of the genetic code, third codon positions of ADRA2B and VWF evolve about three times faster than their respective first and second codon positions. High gamma shape parameter values (α > 2.92) indicate that these fast rates are rather homogeneously distributed along the third codon positions of these exons. In striking contrast to this typical coding-gene pattern, all codon positions of BRCA1 seem to behave similarly and evolve at about the same rate (fig. 1). Indeed, there is almost no difference among the three codon positions in terms of the transition-transversion ratio, the gamma shape parameter, and the relative evolutionary rate. They are all characterized by high values of the gamma shape parameter (α > 2.09) and by an evolutionary rate about three times higher than that of the slowest-evolving character partition, i.e., the second codon positions of ADRA2B.
Given the marked differences between the three genes and between their codon positions, we combined them following the partitioned likelihood approach of Yang (1996b)
Γ8 and branch length parameters is attributed to each codon position. This result indicates that the parameter-rich partitioned likelihood model, incorporating nine independent sets of 106 free parameters (three base frequencies, five GTR rates, one gamma shape, and 97 branch lengths), describes the underlying evolutionary process more appropriately than a single GTR + Γ8 model for the whole concatenation. Additional analyses based on the AIC (not shown) indicate that the better fit of the models partitioned according to genes and codon positions is due, in decreasing order of impact, to variable evolutionary rates along branches (cf. individual phylograms provided in the Supplementary Material), variable rates among sites (as measured by the Γ distribution), variable base compositions, and variable GTR substitution rates. Because searching for the best tree under partitioned likelihood models is computationally prohibitive, we conducted heuristic searches under a single GTR + Γ8 model in PAUP* but evaluated alternative rooting hypotheses under partitioned models with PAML.
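The parameter count quoted above can be checked with simple arithmetic: 50 sequences give 2 x 50 - 3 = 97 branches on an unrooted bifurcating tree. The Python sketch below (hypothetical function names) also derives the log-likelihood gain the nine-partition model needs to be preferred by AIC; that threshold follows from the counts alone and is not a result reported here.

```python
def free_params_per_partition(n_taxa: int) -> int:
    """Free parameters of one GTR + Gamma partition with its own branch lengths."""
    base_freqs = 3                    # four frequencies constrained to sum to 1
    gtr_rates = 5                     # six exchangeabilities, one fixed as reference
    gamma_shape = 1                   # alpha
    branch_lengths = 2 * n_taxa - 3   # unrooted bifurcating tree
    return base_freqs + gtr_rates + gamma_shape + branch_lengths


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion; lower values indicate a better fit/complexity trade-off."""
    return 2 * n_params - 2 * log_likelihood


K_SINGLE = free_params_per_partition(50)   # one model for the whole concatenation
K_PARTITIONED = 9 * K_SINGLE               # 3 genes x 3 codon positions
# AIC favors the partitioned model only if its lnL exceeds the single-model lnL by more than:
MIN_LNL_GAIN = K_PARTITIONED - K_SINGLE

if __name__ == "__main__":
    print(K_SINGLE, K_PARTITIONED, MIN_LNL_GAIN)
```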
Nucleotide Substitution Saturation Analyses
Saturation plots of the pairwise observed differences among the 50 sequences as a function of the pairwise number of substitutions inferred on the ML tree (Philippe and Forterre 1999) are presented in figure 2 for the third codon positions of each gene. The slope of the regression line between the numbers of observed differences and inferred substitutions indicates the relative saturation level of the characters considered: the greater the number of inferred substitutions (abscissa) for a given number of differences (ordinate), the higher the level of saturation. In this representation, third codon positions of BRCA1 (slope = 0.44) appear less saturated than third codon positions of ADRA2B (slope = 0.28) and VWF (slope = 0.19), for which pairwise comparison points are considerably dispersed. This pattern indicates that multiple substitutions are more frequent at the latter positions and that saturation begins even for comparisons between fairly closely related taxa. For the 141 pairwise comparisons between the three marsupials and the 47 placentals (triangles in fig. 2), the plots reach a plateau for ADRA2B and VWF (slopes of 0.004 and 0.05, respectively) but not for BRCA1 (slope = 0.31). Third codon positions of BRCA1 thus show only a weak tendency to form a plateau, being almost linearly distributed just below the Y = X line. These saturation analyses underline the peculiarity of BRCA1 third codon positions, which appear less saturated than those of ADRA2B and VWF. By contrast, saturation analyses of first and second codon positions of the three genes did not reveal significant saturation, their graphical patterns being very similar to that of BRCA1 third codon positions (data not shown). Therefore, third codon positions of BRCA1 do not seem to exhibit strong substitution saturation and are more likely than those of ADRA2B and VWF to have retained some deep phylogenetic signal.
Congruence Between Individual Genes
Topological congruence among the individual genes was evaluated by crossed SH tests in which the highest-likelihood topologies obtained with the individual and combined data sets were compared against each other (table 1). Each individual data set significantly rejects the ML topologies of the other two, with the exception of the BRCA1 ML topology under the ADRA2B data set. However, none of the three individual data sets significantly rejects the ML topology obtained with the combined data set (0.18 < P < 0.61), suggesting that combining the individual genes yields a phylogenetic estimate compatible with the signal contributed by each gene. ML trees derived from each of the three individual data sets are supplied as Supplementary Material. We therefore concatenated the three data sets in a total evidence approach designed to maximize the number of characters analyzed.
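The SH test used throughout can be sketched with the RELL approximation, in which per-site log-likelihoods are held fixed and only resampled. This Python sketch assumes the per-site log-likelihoods of each candidate topology are already available; whether PAUP* and PAML were run with RELL or with full re-optimization is not stated in this section, and the function name is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def sh_test(site_lnl: Sequence[Sequence[float]], n_boot: int = 1000,
            seed: int = 1) -> Tuple[List[float], List[float]]:
    """RELL-based Shimodaira-Hasegawa test.

    site_lnl[m][i] is the log-likelihood of site i under topology m.
    Returns the observed deltas (best lnL minus lnL of each topology) and P values.
    """
    rng = random.Random(seed)
    n_top, n_sites = len(site_lnl), len(site_lnl[0])
    totals = [sum(row) for row in site_lnl]
    observed = [max(totals) - t for t in totals]

    # RELL: resample sites with replacement and re-sum the fixed site log-likelihoods.
    boot = [[0.0] * n_boot for _ in range(n_top)]
    for b in range(n_boot):
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        for m in range(n_top):
            row = site_lnl[m]
            boot[m][b] = sum(row[i] for i in idx)

    # Center each topology's bootstrap distribution on zero (SH least-favorable null).
    for m in range(n_top):
        mean = sum(boot[m]) / n_boot
        boot[m] = [v - mean for v in boot[m]]

    # P value: how often the centered bootstrap delta reaches the observed delta.
    p_values = []
    for m in range(n_top):
        hits = 0
        for b in range(n_boot):
            best = max(boot[j][b] for j in range(n_top))
            if best - boot[m][b] >= observed[m]:
                hits += 1
        p_values.append(hits / n_boot)
    return observed, p_values
```

The best topology always receives P = 1. Because the centered deltas are compared with the maximum over all candidate topologies, the test becomes more conservative as more topologies are included, which is the multiple-comparison correction referred to in Materials and Methods.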
Phylogenetic Results
Xenarthrans
The ML phylogram depicting intraxenarthran relationships is presented in figure 3. It shows an almost fully resolved topology, with two notable exceptions within the armadillos (Dasypodidae). Some of the nodes (Folivora, Vermilingua, Cingulata, and Dasypus) are defined by exclusive amino acid deletions in the ADRA2B and BRCA1 proteins. Notably, all armadillos included in this study have a much reduced glutamic acid repeat region in ADRA2B compared with most other placental mammals (data not shown). Within Pilosa, the monophyly of sloths (Folivora), anteaters (Vermilingua), and Myrmecophaginae (Myrmecophaga + Tamandua) receives 100% ML bootstrap (BPML) support and a Bayesian posterior probability (Pbay) of 1.00. The concatenation of the three genes also strongly supports (BPML = 100; Pbay = 1.00) the existence of three distinct lineages within Cingulata: (Dasypus), (Priodontes, Cabassous, Tolypeutes), and (Chaetophractus, Euphractus, Zaedyus). The genus Dasypus, represented here by two divergent species (D. novemcinctus and D. kappleri), appears well separated from the two other groups, which are closely related to each other (BPML = 100; Pbay = 1.00). However, relationships within these two clades (Tolypeutinae and Euphractinae) are characterized by extremely short internal branches, low ML bootstrap support, and moderate Bayesian posterior probabilities.
Placentals
Figure 4 presents the ML phylogram of placental relationships obtained with the A2BV2 data set. Despite the short branches at its deepest nodes, this tree strongly supports (BPML > 98; Pbay = 1.00) four major placental clades: (I) Afrotheria, (II) Xenarthra, (III) Euarchontoglires, and (IV) Laurasiatheria. The grouping of superclades III and IV into Boreoeutheria is also robustly supported (BPML = 98; Pbay = 1.00). In this ML topology, Afrotheria appears as the first offshoot of the placental tree, with moderate support (BPML = 61; Pbay = 0.81). However, the Afrotheria + Xenarthra hypothesis is preferred when all third codon positions are removed (BPML = 57; Pbay = 0.46). Relationships among afrotherians are rather well resolved. Our results strongly support the division of Afrotheria into two main clades: Paenungulata (Proboscidea, Sirenia, and Hyracoidea; BPML = 100; Pbay = 1.00) and a clade grouping Tubulidentata, Macroscelidea, Chrysochloridae, and Tenrecidae (BPML = 99; Pbay = 1.00). Whereas the relationships among paenungulates remain unresolved, there is strong support for a sister-group relationship between Chrysochloridae and Tenrecidae (BPML = 94; Pbay = 1.00) and for the early emergence of Tubulidentata (BPML = 99; Pbay = 1.00). Within Boreoeutheria, our data robustly support the monophyly of rodents (BPML = 100; Pbay = 1.00) within superclade III, but their grouping with lagomorphs into Glires is only moderately supported (BPML = 60; Pbay = 0.97). The interrelationships of Primates, Scandentia, and Dermoptera are not resolved by our data. Within Laurasiatheria, we recovered the monophyly of Eulipotyphla (core insectivores), Cetartiodactyla, Perissodactyla, and Chiroptera (BPML = 100; Pbay = 1.00), as well as a robust sister-group relationship between Carnivora and Pholidota (BPML = 94; Pbay = 1.00). However, the relationships among these five monophyletic groups remain unclear, except for an early divergence separating Eulipotyphla from the others (BPML = 89; Pbay = 1.00).
Statistical Tests of the Position of the Root
Effect of Character Sampling
To evaluate the stability of the root of the placental tree in the three data sets, we compared the 15 possible bifurcating topologies depicting ingroup relationships among the four major clades using the SH test (table 2). The results of the partitioned SH tests depend on the characters analyzed. When only first and second codon positions (ABV 1 + 2) are considered, a basal Xenarthra + Afrotheria clade appears as the most likely hypothesis, whereas the early emergence of Afrotheria is favored when considering first and second codon positions of ADRA2B and VWF plus all codon positions of BRCA1 (A2BV2), all codon positions of the three genes (ABV 1 + 2 + 3), or amino acids (ABV AA). However, the first three hypotheses (basal Afrotheria, basal Afrotheria + Xenarthra, and Epitheria) are not significantly different from a statistical perspective and are almost indistinguishable on the basis of their likelihood scores when only positions 1 + 2 or amino acids are considered (table 2). All other alternative hypotheses but two are significantly worse at the 5% level whatever data set is considered (table 2). The two remaining alternatives break the monophyly of Boreoeutheria by placing either Euarchontoglires or Laurasiatheria at the base of a tree in which Xenarthra and Afrotheria group together. Although they are not significantly rejected by the SH test (P values ranging from 0.106 to 0.203), they involve a severe drop in log-likelihood relative to the three main hypotheses (table 2). The monophyly of Boreoeutheria is thus strongly suggested, and it is highly likely that the root falls along a branch connecting Afrotheria, Xenarthra, and Boreoeutheria.
Effect of Taxon Sampling
To test the effect of increasing taxon sampling within Xenarthra on the location of the root, we compared the results of SH tests for the three main competing hypotheses identified above (basal Afrotheria, Afrotheria + Xenarthra, and Epitheria) using all possible trees including 1 (13 combinations), 3 (48), 6 (84), 8 (56), and 13 (1) xenarthran taxa. Table 3 summarizes the percentage of times that each of these three alternatives appears as the best hypothesis. As observed previously (table 2), when all 13 xenarthrans are considered for the first and second codon positions of the three genes, the Afrotheria + Xenarthra hypothesis, rather than the basal Afrotheria hypothesis, appears as the best. This result depends on the number of species chosen to represent Xenarthra. Indeed, if only one xenarthran is considered, the basal Afrotheria hypothesis is the most likely in 77% of the comparisons based on positions 1 + 2 only. Increasing the xenarthran taxonomic representation inverted this tendency, and the results converged toward the Afrotheria + Xenarthra hypothesis (table 3). By contrast, the situation was clearer with the two longer data sets. In those cases, a basal position of Afrotheria was preferred in all comparisons but two (when using the sloths Choloepus or Bradypus with all codon positions). Interestingly, although Epitheria was not rejected at the 5% significance level, it was never the highest-likelihood hypothesis among all 606 topologies explored.
To appreciate the extent of likelihood variation between comparisons including different numbers of xenarthran taxa, we plotted the SH test P values obtained for the Epitheria hypothesis as a function of the number of xenarthrans considered (fig. 5). This graph illustrates the interacting effects of increasing both taxon and character sampling. Two effects are apparent. First, the longer the sequences, the lower the support for the Epitheria hypothesis. Second, the larger the species sampling, the narrower the range of P values. However, increasing the number of analyzed characters appears to widen the P value range by accentuating the differences between estimates. For example, when a single xenarthran is sampled for the all-codon-positions data set, the P values range from a clearly nonsignificant 0.631 with the sloth Bradypus to a borderline 0.054 with the armadillo Cabassous. Thus, including only a small number of selected xenarthran representatives can be misleading in hypothesis testing, especially when a large number of nucleotides is analyzed.
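The P value dispersion described above arises because each taxon subset yields different per-site log-likelihoods, which the SH test then resamples. The sketch below follows the standard RELL-based formulation of the SH test, assuming the per-site log-likelihoods of each candidate topology have already been computed by ML software (for a partitioned analysis, the per-partition site values concatenated). The function `sh_test` is hypothetical; it is not the program or exact settings used in this study.

```python
import random


def sh_test(site_lnl, n_boot=1000, seed=0):
    """Shimodaira-Hasegawa test using RELL bootstrap resampling.

    site_lnl[t][s] is the log-likelihood of site s under candidate topology t.
    Returns one P value per topology.
    """
    rng = random.Random(seed)
    n_top, n_sites = len(site_lnl), len(site_lnl[0])

    # Observed statistic: log-likelihood deficit relative to the best topology.
    totals = [sum(row) for row in site_lnl]
    best = max(totals)
    delta_obs = [best - total for total in totals]

    # RELL: resample sites with replacement and re-sum the fixed site values.
    boot = [[0.0] * n_boot for _ in range(n_top)]
    for b in range(n_boot):
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        for t in range(n_top):
            boot[t][b] = sum(site_lnl[t][i] for i in idx)

    # Centre each topology's replicates on zero (the SH null hypothesis).
    for t in range(n_top):
        mean_t = sum(boot[t]) / n_boot
        boot[t] = [x - mean_t for x in boot[t]]

    # P value: fraction of replicates whose centred deficit reaches the observed one.
    best_per_rep = [max(boot[t][b] for t in range(n_top)) for b in range(n_boot)]
    return [
        sum(best_per_rep[b] - boot[t][b] >= delta_obs[t] for b in range(n_boot)) / n_boot
        for t in range(n_top)
    ]
```

By construction the best-scoring topology always receives P = 1, and the test is conservative because every topology in the candidate set is compared against the maximum over the whole set; rerunning it on each xenarthran subset would yield one P value per subset, as plotted in fig. 5.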
Discussion
Comparative Evolutionary Dynamics and Phylogenetic Content of the Three Nuclear Markers
Detailed analyses of the evolutionary dynamics of the three nuclear protein-coding genes revealed marked differences in base composition and substitution patterns. Each of these genes is located on a different chromosome in the human genome (chromosomes 2, 12, and 17 for ADRA2B, VWF, and BRCA1, respectively), and the observed differences in base composition might be related to the isochore structure of the genome (Bernardi 2001). […] > 2.09). This unusual pattern of evolution might reflect peculiar selective constraints acting on this large-spectrum regulatory protein (Deng and Brodie 2000).
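Base-composition differences of this kind are commonly summarized as GC content at each codon position, GC3 (third positions) being the measure most often related to isochore composition. A minimal sketch, assuming an in-frame aligned coding sequence in which gaps keep their alignment columns; the function name is hypothetical and the toy sequences are not data from this study.

```python
def gc_by_codon_position(seq):
    """Return GC fractions at codon positions 1, 2 and 3 of an in-frame sequence.

    Gaps ('-') and ambiguity codes are skipped but still advance the reading
    frame, so alignment columns keep their codon position.
    """
    counts = [[0, 0], [0, 0], [0, 0]]  # [GC count, informative bases] per position
    for i, base in enumerate(seq.upper()):
        if base not in "ACGT":
            continue
        pos = i % 3
        counts[pos][1] += 1
        if base in "GC":
            counts[pos][0] += 1
    return tuple(gc / total if total else float("nan") for gc, total in counts)


print(gc_by_codon_position("ATGGCCTAA"))
print(gc_by_codon_position("ATG---GCC"))
```

Comparing the third value (GC3) across ADRA2B, VWF, and BRCA1 alignments would be one way to quantify the compositional contrasts discussed above.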
Placental Mammal Phylogeny
In many respects, the phylogenetic picture of placental mammal relationships that we obtained is fully compatible with the challenging studies of Madsen et al. (2001) and Murphy et al. (2001a, 2001b). Our study, based on detailed ML analyses at the DNA and amino acid levels of 5,130 bp from three nuclear protein-coding genes, yielded conclusions consistent with those obtained from 16,397 bp by Murphy et al. (2001b). Our results strongly confirm the existence of the four major placental clades Xenarthra, Afrotheria, Euarchontoglires, and Laurasiatheria and add support to the grouping of the Northern Hemisphere clades Euarchontoglires and Laurasiatheria in the so-called Boreoeutheria (Springer and de Jong 2001).
Xenarthra
We present here the first comprehensive study of xenarthran molecular phylogeny based on the analysis of three nuclear markers for 12 of the 13 living genera. Analyses of the placental data sets (fig. 4) unambiguously supported the monophyly of the order Xenarthra and its division into the two suborders Cingulata (armadillos) and Pilosa (anteaters and sloths). This result confirms those obtained from both morphological (Engelmann 1985; Patterson et al. 1992; Gaudin 1999) and molecular (van Dijk et al. 1999; Delsuc et al. 2001; Madsen et al. 2001; Murphy et al. 2001a) characters. Our results also yield a clear picture of xenarthran interrelationships, with almost all nodes robustly supported (fig. 3).
Relationships within anteaters and sloths (Pilosa) are fully congruent with previous analyses (Delsuc et al. 2001) and confirm both the respective monophyly of anteaters (Vermilingua) and sloths (Folivora) and the early emergence of the pygmy anteater (Cyclopes) within Vermilingua (fig. 3). These results emphasize the very deep split between the larger-bodied anteaters (Tamandua and Myrmecophaga) and the pygmy Cyclopes, which is reflected by numerous morphological peculiarities related to its strictly arboreal way of life (Gaudin and Branham 1998).
Regarding armadillos (Cingulata, Dasypodidae), our results clearly identify three distinct lineages corresponding to the subfamilies defined by McKenna and Bell (1997, pp. 82–102): Dasypodinae (D. kappleri and D. novemcinctus), Euphractinae (Chaetophractus, Euphractus, and Zaedyus), and Tolypeutinae (Priodontes, Cabassous, and Tolypeutes). The evidence strongly supports the early emergence of Dasypodinae, whereas Tolypeutinae and Euphractinae unequivocally cluster together (fig. 3). Such a relationship has already been proposed on the basis of spermatozoa (Cetica et al. 1998) and contradicts the early morphological assessment of Engelmann (1985). However, relationships within the subfamilies Euphractinae and Tolypeutinae remain unclear. Within Euphractinae, our results suggest a grouping of Euphractus and Zaedyus to the exclusion of Chaetophractus. This grouping is consistently retrieved in most of the analyses but receives only weak support in the total evidence ML tree (fig. 3). This lack of resolution is not unexpected because the three genera are morphologically so similar that their interrelationships have always been considered unresolved (Engelmann 1985; Patterson, Segall, and Turnbull 1989). More surprising is the trifurcation obtained within Tolypeutinae. A close relationship between Cabassous and Priodontes was expected on the basis of their comparable external morphologies (Engelmann 1985; Wetzel 1985) and spermatozoa (Cetica et al. 1998). By contrast, in our ML phylogeny, Priodontes emerges in a basal position relative to Cabassous and Tolypeutes, but with low support (fig. 3). Thus, relationships among these three genera are still unclear. However, Tolypeutes possesses a highly distinctive morphology shaped by the anatomical constraints of its ability to roll entirely into a ball, which makes the interpretation of homology in postcranial characters difficult. In this context, it is interesting to note that characters of the ear region tend to support a sister-group relationship between Tolypeutes and Cabassous, as proposed by our data (Patterson, Segall, and Turnbull 1989). The apparent lack of resolution within these two clades, despite the large number of nucleotides analyzed, raises the question of how rapidly the splitting events occurred. The short internal branches associated with low support values suggest that rapid cladogenesis left only short time intervals for molecular synapomorphies to accumulate in these two groups. It is hoped that considering more rapidly evolving markers, such as nuclear introns or mitochondrial genes, will help to resolve the only two remaining uncertainties in xenarthran phylogeny.
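The argument that short internodes leave little time for synapomorphies to accumulate can be made concrete with a simple Poisson approximation: across N sites, an internal branch of length b (expected substitutions per site) carries about N·b expected changes, and the probability that it carries none is exp(−N·b). The sketch below is illustrative only: the internode lengths are hypothetical, and it assumes equal rates across sites and ignores homoplasy, both deliberate simplifications.

```python
import math


def internode_expectation(n_sites, branch_length):
    """Expected substitutions on an internal branch and Poisson P(no change).

    branch_length: hypothetical internode length in expected substitutions per
    site. Assumes equal rates across sites (a deliberate simplification).
    """
    expected = n_sites * branch_length
    return expected, math.exp(-expected)


# Hypothetical internode lengths applied to the 5,130-bp concatenation.
for b in (0.0005, 0.002, 0.01):
    expected, p_none = internode_expectation(5130, b)
    print(f"b = {b}: {expected:.1f} expected changes, P(none) = {p_none:.3g}")
```

Under these assumptions, an internode of 0.0005 substitutions per site would carry only about 2.6 expected changes even across 5,130 sites, which is why very short internodes can remain poorly supported despite long alignments.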
Afrotheria
Despite its highly provocative nature for morphologists (Novacek 2001), there is at present little doubt about the naturalness of this African clade, whose support consistently increases as molecular evidence accumulates (de Jong, Leunissen, and Wistow 1993; Douzery and Catzeflis 1995; Lavergne et al. 1996; Springer et al. 1997, 1999; Stanhope et al. 1998a, 1998b; Mouchaty et al. 2000b; Madsen et al. 2001; Murphy et al. 2001a, 2001b; van Dijk et al. 2001). Our results provide strong ML bootstrap support for nodes within Afrotheria that were previously difficult to resolve (Madsen et al. 2001; Murphy et al. 2001a). Whereas the recovery of the well-defined Paenungulata (elephants, sirenians, and hyraxes) was expected, a second major afrotherian clade grouping Tubulidentata (aardvark), Macroscelidea (elephant shrews), Chrysochloridae (golden moles), and Tenrecidae (tenrecs) is robustly confirmed. Within this clade, the grouping of the two former insectivoran families Chrysochloridae and Tenrecidae is also well supported. Moreover, in agreement with Murphy et al. (2001b), our results support the early emergence of Tubulidentata, leading to the monophyly of an African insectivore-like clade (Macroscelidea + Afrosoricida). Finally, as in previous analyses of concatenated nuclear genes (Madsen et al. 2001; Murphy et al. 2001a, 2001b), relationships among paenungulates remain unsettled despite the large number of nucleotides analyzed. Resolving this tricky question and verifying the proposed sister-group relationship between Macroscelidea and Afrosoricida might benefit from expanded taxon sampling within Afrotheria.
Boreoeutheria
The association of Euarchontoglires and Laurasiatheria is strongly supported in all ML analyses (fig. 4), and its monophyly is suggested by most ML statistical tests (table 2). Such an arrangement, fundamentally opposing Northern and Southern placental groups, argues for a more important role of plate tectonic events in the early diversification of placental mammals than previously recognized (Eizirik, Murphy, and O'Brien 2001; Madsen et al. 2001; Murphy et al. 2001a, 2001b). It also reveals the occurrence of parallel adaptive radiations on separate continental masses, which are likely responsible for the long-standing difficulties that morphologists have encountered in reconstructing placental mammal evolution (Madsen et al. 2001; Scally et al. 2001).
The Euarchontoglires clade, containing the orders Primates, Scandentia (tree shrews), Dermoptera (flying lemurs), Lagomorpha (rabbits, hares, and pikas), and Rodentia (rodents), is consistently retrieved in all analyses with high support values (fig. 4). The monophyly of rodents is unambiguously supported, and their association with lagomorphs into Glires receives moderate support. This arrangement, long favored by morphological studies (Gregory 1910; Simpson 1945), has proved difficult to retrieve in molecular studies but was finally confirmed with larger taxon sampling (Murphy et al. 2001a; Huchon et al. 2002). However, it has been questioned whether the Murphy et al. (2001a) data set is adequate to resolve the relationships of rabbits, rodents, and primates or whether more genes with longer sequences are needed (Rosenberg and Kumar 2001).
Like Afrotheria and Xenarthra, the clade Laurasiatheria (Waddell, Okada, and Hasegawa 1999) always receives 100% bootstrap support or a Bayesian posterior probability of 1.00 in our analyses. The existence of this clade is corroborated by analyses of complete mitochondrial genomes, in which the basal position of the hedgehog is likely an artifact caused by its peculiar nucleotide composition (Cao et al. 2000; Mouchaty et al. 2000b; Nikaido et al. 2001). Within this clade, the proposed early divergence of Eulipotyphla from the remaining laurasiatherians (Waddell, Okada, and Hasegawa 1999; Murphy et al. 2001b) finds support in our data. The monophyly of Chiroptera is unambiguously supported by our data, as is the paraphyly of Microchiroptera (Hipposideros, Megaderma, Tonatia, Myotis, Tadarida), with its implications for the evolution of echolocation and flight in bats (Hutcheon, Kirsch, and Pettigrew 1998; Teeling et al. 2000, 2002; Springer et al. 2001). An interesting contribution of our data is the reliable support obtained for the sister-group relationship of Pholidota (pangolins) and Carnivora (fig. 4). This grouping, called Ferae by Shoshani and McKenna (1998), was suggested by early molecular studies (de Jong et al. 1985; Shoshani 1986; de Jong, Leunissen, and Wistow 1993). In light of our results and those of Murphy et al. (2001b), it seems likely that the well-developed ossified tentorium present in both carnivores and pangolins represents a true synapomorphy uniting these two orders (Shoshani and McKenna 1998). Cetartiodactyla is indubitably monophyletic according to our data (fig. 4) and multiple other sets of molecular evidence (Montgelard, Catzeflis, and Douzery 1997; Shimamura et al. 1997; Gatesy et al. 1999). In addition, the artiodactyl affinities of whales have been confirmed by recent fossil discoveries (Gingerich et al. 2001; Thewissen et al. 2001). Within Cetartiodactyla, relationships are entirely congruent with those obtained from other genes (Montgelard, Catzeflis, and Douzery 1997; Ursing and Arnason 1998; Gatesy et al. 1999; Madsen et al. 2001; Murphy et al. 2001a, 2001b) and from SINE insertions (Shimamura et al. 1997; Nikaido, Rooney, and Okada 1999). Nevertheless, relationships among the five major laurasiatherian clades remain unclear.
Character and Taxon Sampling and the Root of the Placental Tree
Despite extensive taxon sampling, our data did not allow the exact position of the root of the placental tree to be determined. Like Madsen et al. (2001) and Murphy et al. (2001a), we identified three almost equally likely root locations, corresponding to the following topological arrangements: basal Afrotheria, basal Xenarthra + Afrotheria, and basal Xenarthra (as sister group to Epitheria). Recently, Murphy et al. (2001b) proposed placing the root along the branch leading to Afrotheria. One of our goals was to evaluate to what extent increasing the taxon representation of a previously poorly sampled basal clade (Xenarthra) might help to stabilize the position of the root on the placental tree.
Although not statistically conclusive, a detailed examination of the impact of both character and taxon sampling on root location tests (SH tests) reveals clear trends. Notably, Epitheria never appears as the most likely hypothesis, suggesting that Xenarthra does not represent the earliest offshoot of the placental tree, as previously thought (McKenna 1975). The choice between the two remaining hypotheses is sensitive to the molecular characters sampled. The basal Afrotheria + Xenarthra hypothesis tends to be slightly favored when only first and second codon positions are used, whereas the basal Afrotheria solution is preferred when BRCA1 third codon positions, or the third codon positions of all three genes, are added. It is, however, not clear whether this results from the signal contributed by the growing number of sampled characters (3,421–5,130) or from noise arising from some saturated third codon positions. Moreover, it is striking that in amino acid analyses the three competing hypotheses for the position of the root (basal Afrotheria, Afrotheria + Xenarthra, and Epitheria) differ by no more than 0.75 log-likelihood units (table 2). Increasing the number of genes analyzed, together with a critical examination of different codon or amino acid partitions, will be important in further investigations of the rooting problem.
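The nucleotide data sets compared here differ only in which codon positions of each gene are retained. A minimal sketch of how such data sets could be assembled for one taxon, assuming in-frame aligned sequences keyed by gene name; the scheme dictionary encodes the data set definitions given in the Results, while the function names and toy sequences are hypothetical and not the authors' pipeline.

```python
def keep_codon_positions(seq, positions):
    """Keep the given 1-based codon positions of an in-frame aligned sequence."""
    return "".join(base for i, base in enumerate(seq) if i % 3 + 1 in positions)


# Data set definitions from the text: which codon positions of each gene are kept.
SCHEMES = {
    "ABV 1+2": {"ADRA2B": {1, 2}, "BRCA1": {1, 2}, "VWF": {1, 2}},
    "A2BV2": {"ADRA2B": {1, 2}, "BRCA1": {1, 2, 3}, "VWF": {1, 2}},
    "ABV 1+2+3": {"ADRA2B": {1, 2, 3}, "BRCA1": {1, 2, 3}, "VWF": {1, 2, 3}},
}


def build_dataset(genes, scheme):
    """Concatenate one taxon's gene sequences after codon-position filtering."""
    return "".join(keep_codon_positions(genes[gene], positions)
                   for gene, positions in scheme.items())


# Toy in-frame sequences for one hypothetical taxon (not real data).
toy = {"ADRA2B": "ATGAAA", "BRCA1": "GGGCCC", "VWF": "TTTAAA"}
for name, scheme in SCHEMES.items():
    print(name, build_dataset(toy, scheme))
```

Keeping the gene-to-positions mapping as data rather than code makes it straightforward to add further partition schemes when more genes are included; the amino acid data set (ABV AA) would instead require translating each gene before concatenation.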
Maintaining adequate taxon sampling is equally important. Our results show that including only a single xenarthran representative can be misleading in terms of SH test results, especially as the number of analyzed characters increases. Including at least one xenarthran per major lineage (armadillos, anteaters, and sloths) might help to stabilize their position in the tree. It is likely that increasing the sampling within Xenarthra helped to reduce the effect of potential base composition biases or long-branch attraction artifacts associated with the distantly related marsupial outgroups. We suggest that additional sampling among marsupials and the addition of a monotreme representative might also help to assess the root of the placental tree more reliably. Larger taxon sampling for different genes might help to find a reliable rooting of the ingroup by mitigating the p




