MBE Advance Access originally published online on July 28, 2007
Molecular Biology and Evolution 2007 24(10):2266-2276; doi:10.1093/molbev/msm156
Research Articles
Mapping Human Genetic Ancestry
* Center for Integrative Bioinformatics of Vienna, Max F. Perutz Laboratories, Vienna, Austria
Leibniz Institute for Age Research—Fritz Lipmann Institute, Jena, Germany
University of Vienna, Austria
Medical University of Vienna, Austria
|| University of Veterinary Medicine Vienna, Austria
E-mail: ingo.ebersberger{at}univie.ac.at.
| Abstract |
|---|
The human genome is a mosaic with respect to its evolutionary history. Based on a phylogenetic analysis of 23,210 DNA sequence alignments from human, chimpanzee, gorilla, orangutan, and rhesus, we present a map of human genetic ancestry. For about 23% of our genome, we share no immediate genetic ancestry with our closest living relative, the chimpanzee. This encompasses genes and exons to the same extent as intergenic regions. We conclude that about 1/3 of our genes started to evolve as human-specific lineages before the differentiation of humans, chimpanzees, and gorillas took place. This explains recurrent findings of very old human-specific morphological traits in the fossil record, which predate the recent emergence of the human species about 5–6 MYA. Furthermore, the sorting of such ancestral phenotypic polymorphisms in subsequent speciation events provides a parsimonious explanation for why evolutionarily derived characteristics are shared among species that are not each other's closest relatives.
Key Words: lineage sorting, species evolution, human speciation, homoplasy, fossils
| Introduction |
|---|
Reconstructing the evolutionary process that has molded contemporary humans out of the ancestors shared with their closest relatives, the great apes, is one of the key objectives in evolutionary research (Nuttal 1904
However, with both the amount of data and the number of studies increasing, the crux of the matter emerges. Regardless of the type of phylogenetically informative data chosen for analysis, the evolutionary history of humans is reconstructed differently with different sets of data (summarized in Ebersberger 2004). The dilemma is best exemplified when genetic distances are considered for evolutionary studies. The extent of DNA sequence divergence between humans and chimpanzees, in combination with various calibration points for the molecular clock, places the split of the 2 species around 6 million years before present (MYBP) (Glazko and Nei 2003; Patterson et al. 2006). This rather young age of the human species conflicts with at least part of the fossil record on early human evolution (Brunet et al. 2005; Patterson et al. 2006). Moreover, datings obtained from different regions of the human genome vary over a range of more than 4 Myr, with some placing the human-chimp split as recently as 4 MYBP (Patterson et al. 2006), and exons seem to support older splits than introns (Osada and Wu 2005). Finally, when genetic distances are used to infer the evolutionary relationships among humans and the great apes, even the species genetically most similar to humans varies with the locus under study (Ruvolo 1997; Satta et al. 2000; Chen and Li 2001; Patterson et al. 2006; Hobolth et al. 2007).
To understand why regions in the human genome can differ in their evolutionary history, it needs to be acknowledged that genetic lineages represented by DNA sequences in the extant species trace back to allelic variants in the shared ancestral species (Nei 1987) (fig. 1). There, these variants persist until they join in their most recent common ancestor (MRCA). Some genetic lineages, however, do not coalesce in the progenitor exclusively shared by humans and chimpanzees. They enter, together with the lineage descending from the gorilla, the ancestral population of all 3 species, where any 2 of the 3 lineages can merge first. Thus, in two-thirds of these cases, a genealogy results in which humans and chimpanzees are not each other's closest genetic relatives. The corresponding genealogies are incongruent with the species tree. In concordance with the experimental evidence, this implies that there is no such thing as a unique evolutionary history of the human genome. Rather, the genome resembles a patchwork of individual regions, each following its own genealogy.
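The lineage-sorting argument can be illustrated with a small simulation. This is a minimal sketch under the neutral three-lineage coalescent described in the paragraph above, not the authors' method; the function names and the parameter `tau` (lifetime of the human-chimpanzee ancestor in coalescent units) are illustrative assumptions.

```python
import math
import random


def simulate_incongruent_fraction(tau, n_loci=100_000, seed=1):
    """Simulated fraction of loci whose genealogy disagrees with ((H,C),G).

    tau is the time the human-chimpanzee ancestor persisted, measured in
    coalescent units (an assumed parameterisation for this sketch).
    """
    rng = random.Random(seed)
    incongruent = 0
    for _ in range(n_loci):
        # Waiting time until the human and chimpanzee lineages coalesce:
        # exponential with rate 1 in coalescent units.
        if rng.expovariate(1.0) < tau:
            continue  # merged inside the HC ancestor, so the tree is congruent
        # Otherwise H, C and G lineages all enter the common ancestor and
        # a uniformly chosen pair merges first.
        first_pair = rng.choice([("H", "C"), ("H", "G"), ("C", "G")])
        if first_pair != ("H", "C"):
            incongruent += 1
    return incongruent / n_loci


def expected_incongruent_fraction(tau):
    """Closed form: two-thirds of the loci that fail to coalesce within tau."""
    return (2.0 / 3.0) * math.exp(-tau)


if __name__ == "__main__":
    for tau in (0.0, 0.5, 1.0, 2.0):
        print(tau, simulate_incongruent_fraction(tau),
              round(expected_incongruent_fraction(tau), 4))
```

With `tau = 0` every locus reaches the three-species ancestor unresolved, so the simulated fraction approaches two-thirds; longer-lived ancestors leave fewer incongruent genealogies.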
The recent availability of a chimpanzee genome draft sequence and its comparison to the human genome has resulted in a genome-wide collection of genetic differences between the 2 species (The Chimpanzee Sequencing and Analysis Consortium 2005
In the present study, we focus on 3 aspects. First, what is the fraction of the human genome, and particularly of the genes therein, for which chimpanzees are not our closest relatives, and how are these regions distributed along human chromosomes? To this end, we compare a genome-wide collection of 23,210 human and chimpanzee DNA sequences with their homologs in gorilla, orangutan, and rhesus. Based on a likelihood approach, we first identify those sequence trees that significantly reject chimpanzees as our closest relatives, that is, sequence trees that are incongruent with the species tree. Second, we reestimate the splitting times for the human and great ape lineages and assess the ancestral population size of the ancient species shared by humans and chimpanzees from the fraction of incongruent sequence trees. Third, we determine the position of incongruent sequence trees relative to genes and exons in the human genome and discuss the consequences of the complex genetic ancestry of humans for our view on human and chimpanzee evolution.
| Materials and Methods |
|---|
Data
We downloaded 33,018 alignments of DNA sequences from human, chimpanzee, gorilla, orangutan, and rhesus, originating from a large-scale shotgun sample sequencing study of a western lowland gorilla (HCGOM data set [Patterson et al. 2006
reich.
Data Preprocessing
The HCGOM data set is available only as Arachne alignments (Jaffe et al. 2003) of DNA shotgun sequence reads from the 4 nonhuman primate species to the National Center for Biotechnology Information (NCBI) Build 34 human genome assembly. No information on the quality of the individual aligned sequence reads was provided. To utilize this data set nonetheless, a number of preprocessing steps were required. For each Arachne alignment, we first extracted the individual sequence reads from the gorilla and the orangutan. Sequences from the chimpanzee and rhesus were ignored at this step because we used the assembled genome sequences instead. For each sequence, we retrieved the corresponding source sequence and its quality information from the NCBI trace archive (http://www.ncbi.nlm.nih.gov/Traces). This step excluded 6,459 Arachne alignments from further analysis because for at least one orangutan sequence we could find no entry in the NCBI trace archive. We next identified the start and end, within the source sequence, of the subsequence that was used in the Arachne alignment. With this positional information, the corresponding base quality substring was extracted. Subsequently, we assembled the gorilla and orangutan sequences separately into contigs using Phrap (http://www.phrap.org). These contigs were then aligned with the human genome sequence (NCBI Build 36) using BLAT (Kent 2002) with the setting "minScore = 300." If more than one BLAT hit was obtained, the one with the highest score was retained. Human genome sequence positions covered by both gorilla and orangutan sequences were then identified, and the corresponding human DNA sequence was extracted with nibFrag (http://www.soe.ucsc.edu/~kent/src/unzipped/utils/nibFrag/) from the Build 36 genome assembly. To obtain the orthologous regions in the chimpanzee and rhesus genomes, we used the pairwise human–chimpanzee and human–rhesus genome assemblies provided at the Human Genome Browser Web site (Kent et al. 2002). The corresponding DNA sequences together with their base quality information were then extracted from the chimpanzee genome assembly (PanTro2) and the rhesus genome assembly (rheMac2). Subsequently, we aligned the DNA sequences from the 5 primate species with ClustalW (Thompson et al. 1994). From these multiple sequence alignments, we finally extracted those regions where sequences from all 5 species overlap. The resulting 30,112 multiple sequence alignments are available upon request.
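Two of the bookkeeping steps above, keeping the best-scoring hit per contig and finding human positions covered by both gorilla and orangutan, can be sketched as follows. This is an illustrative sketch only: the `Hit` record, its field names, and the example coordinates are hypothetical and do not reflect BLAT's output format or the authors' scripts.

```python
from collections import namedtuple

# Hypothetical, simplified hit record; coordinates are half-open [start, end).
Hit = namedtuple("Hit", "contig chrom start end score")


def best_hits(hits, min_score=300):
    """Keep the highest-scoring hit per contig, ignoring hits below min_score."""
    best = {}
    for h in hits:
        if h.score < min_score:
            continue
        if h.contig not in best or h.score > best[h.contig].score:
            best[h.contig] = h
    return best


def shared_intervals(a_hits, b_hits):
    """Human intervals covered by one hit from each of the two species."""
    shared = []
    for a in a_hits:
        for b in b_hits:
            if a.chrom != b.chrom:
                continue
            start, end = max(a.start, b.start), min(a.end, b.end)
            if start < end:
                shared.append((a.chrom, start, end))
    return sorted(shared)


if __name__ == "__main__":
    gorilla = [Hit("g1", "chr1", 100, 900, 700),
               Hit("g1", "chr2", 50, 400, 350),
               Hit("g2", "chr1", 2000, 2500, 250)]
    orangutan = [Hit("o1", "chr1", 500, 1200, 600)]
    g_best = best_hits(gorilla).values()
    o_best = best_hits(orangutan).values()
    print(shared_intervals(g_best, o_best))
```

The quadratic pairwise comparison is chosen for readability; a sorted sweep over intervals would be the usual choice for genome-scale input.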
Quality Screen
Positions with a Phred value below 20 were masked in the aligned chimpanzee, gorilla, orangutan, and rhesus sequences (Ewing and Green 1998; Taudien et al. 2006). Increasing the threshold to a Phred value of 40 had no significant effect on the outcome of the analysis and merely decreased the number of analyzed positions. Alignment regions with a clustering of masked positions are indicative of an overall low sequence quality. To identify and remove such regions, we penalized alignment columns that include a masked position with a minimum of –2 (one masked nucleotide) and a maximum of –6 (all nonhuman nucleotides masked) and rewarded unmasked columns with +1. Then we extracted the highest scoring subalignment. Only subalignments with a length ≥300 bp were further analyzed, leaving 26,909 alignments.
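The extraction of the highest scoring subalignment can be sketched as a maximum-sum scan over column scores. This is a minimal sketch, not the authors' implementation: the text specifies only the end points of the penalty (–2 and –6), so the linear interpolation for 2 or 3 masked bases is an assumption, as are the function names.

```python
def column_score(n_masked, n_nonhuman=4):
    """+1 for an unmasked column; -2 (one masked base) to -6 (all masked).

    Penalties for intermediate counts are linearly interpolated, which is
    an assumption of this sketch.
    """
    if n_masked == 0:
        return 1.0
    return -2.0 - 4.0 * (n_masked - 1) / (n_nonhuman - 1)


def best_subalignment(masked_counts):
    """Return (start, end, score) of the highest-scoring run of columns.

    masked_counts[i] is the number of masked nonhuman bases in column i;
    end is exclusive. Uses Kadane's maximum-subarray scan.
    """
    best = (0, 0, 0.0)
    run_start, run_score = 0, 0.0
    for i, n_masked in enumerate(masked_counts):
        if run_score <= 0:
            # A non-positive prefix can never help, so start a new run here.
            run_start, run_score = i, 0.0
        run_score += column_score(n_masked)
        if run_score > best[2]:
            best = (run_start, i + 1, run_score)
    return best


if __name__ == "__main__":
    print(best_subalignment([0, 0, 1, 0, 0, 0, 0, 4, 0, 0]))
```

A length filter such as `end - start >= 300` would then be applied to the returned coordinates.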
Phylogenetic Analysis
Phylogenetic tree reconstructions and molecular clock tests were performed with Tree-Puzzle (Schmidt et al. 2002) using the Hasegawa-Kishino-Yano model of DNA sequence evolution (Hasegawa et al. 1985) and 4 rate categories to model substitution rate heterogeneity among sites. Alignments for which the automatic root search implemented in Tree-Puzzle did not place the root on the rhesus branch were removed from the analysis (leaving 25,500 alignments), as were alignments for which the molecular clock was rejected at a significance level of 0.05 (leaving 23,210 alignments). Posterior probabilities for the 15 unrooted tree topologies were estimated from the individual likelihoods assuming a uniform prior distribution on the set of trees. The fraction of incongruent sequence trees was assumed to follow a binomial distribution. Ninety-five percent confidence intervals (CI) were assessed using the normal approximation to the binomial distribution.
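Under a uniform prior, the posterior probability of each topology is its likelihood divided by the sum of the likelihoods over all topologies. The following is a minimal sketch of that normalisation; the function name and the example log-likelihood values are hypothetical and are not Tree-Puzzle output.

```python
import math


def tree_posteriors(log_likelihoods):
    """Posterior probability of each topology under a uniform prior on trees."""
    top = max(log_likelihoods)
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    weights = [math.exp(ll - top) for ll in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical log-likelihoods for three competing topologies.
    post = tree_posteriors([-1000.0, -1004.0, -1004.5])
    print([round(p, 4) for p in post])
```

In this made-up example the first topology receives a posterior probability of about 0.97, so it would pass a 0.95 threshold.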
Dating of Speciation Events
To assess the splitting times of the 5 species, we first inferred the sequence tree together with its branch lengths from a concatenation of the alignments that did not reject the molecular clock and for which the root was placed on the rhesus branch. However, splitting times estimated from DNA sequence comparisons predate the historical separation of the species. To minimize this bias, we omitted alignments where the inferred sequence tree deviates significantly (posterior probability ≥ 0.95) from the species tree. The split of the orangutan 16 MYBP was used to calibrate the molecular clock.
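Calibrating a clock-like tree with one dated node amounts to rescaling all node heights by a common factor. The sketch below assumes node heights in substitutions per site; the node names and the numerical heights are hypothetical and are not taken from the study.

```python
def calibrate_clock(node_heights, calibration_node="orangutan_split",
                    calibration_age_myr=16.0):
    """Convert clock-like node heights into ages in Myr using one dated node."""
    scale = calibration_age_myr / node_heights[calibration_node]
    return {node: height * scale for node, height in node_heights.items()}


if __name__ == "__main__":
    # Hypothetical node heights (substitutions per site).
    heights = {"orangutan_split": 0.0160,
               "gorilla_split": 0.0080,
               "human_chimp_split": 0.0062}
    for node, age in calibrate_clock(heights).items():
        print(f"{node}: {age:.1f} MYBP")
```

Changing `calibration_age_myr` from 16 to 13 rescales every inferred split by the same factor of 13/16.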
Identification of Sequence Alignments Overlapping with Human Genes
To identify alignments that overlap with genes and exons in the human genome, we compared the alignment positions with the positions of "known genes" in the human genome as annotated in Ensembl release 39 (http://www.ensembl.org/Homo_sapiens/index.html).
Analysis of Gene Ontology Terms
To investigate whether gene function has an influence on the phylogeny of the gene sequence, we analyzed the representation of Gene Ontology (GO) terms in our data. To this end, we used FatiGO2 (Al-Shahrour et al. 2006) to screen for GO terms that are overrepresented in the fraction of genes for which at least one exon overlaps with an alignment that significantly supports an incongruent sequence tree. As a reference, we used those genes in our data set to which a sequence tree was mapped that does not differ significantly from the species tree.
| Results |
|---|
The finished human genome sequence (International Human Genome Sequencing Consortium 2004
Contrasting Phylogenetic Signals in the Data
We first assessed the fraction of sequence trees that is incongruent with the species tree. To this end, we calculated the likelihoods for all 15 unrooted tree topologies (Felsenstein 1981
40% of the alignments provide no clear support for a single branching pattern. Consistent with the prediction that the inclusion of alignments with no clear phylogenetic signal leads to an overestimation of the fraction of incongruent sequence trees (Yang 2002
To identify the subset of our data significantly supporting only a single phylogeny, we consider only sequence trees that are supported with a posterior probability of at least 95%. This leaves us with 11,945 phylogenetically informative alignments (tables 1 and 3). Among these, 23.0% (95% CI 22.2–23.8%) support a closer relationship of gorilla to either humans or chimpanzees, although they recover the monophyly of the 3 species. Trees where the gorilla is placed closer to the chimpanzee and trees with a human–gorilla sister group are observed equally often (1,369 and 1,361, respectively). Note that 0.6% (95% CI 0.4–0.7%) of the resolved sequence trees still place the orangutan within the human–chimp–gorilla subtree.
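The confidence interval quoted for the 23.0% figure can be checked with the normal approximation to the binomial distribution named in Materials and Methods. This is a minimal sketch; the function name is illustrative, and `z = 1.96` is the usual two-sided 95% quantile.

```python
import math


def binomial_ci_normal(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a binomial proportion."""
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


if __name__ == "__main__":
    low, high = binomial_ci_normal(0.230, 11_945)
    print(f"{100 * low:.1f}%-{100 * high:.1f}%")  # prints 22.2%-23.8%
```

With 11,945 informative alignments the half-width is about 0.75 percentage points, which matches the interval reported above.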
Subsequently, we checked whether the fraction of incongruent sequence trees varies among different subsets of the human genome. First, we considered alignments that overlap with genes and exons. No significant difference in the frequency of the individual tree topologies compared with the genome-wide average is seen (table 3). This picture changes when we assess the fraction of incongruent sequence trees separately for the individual chromosomes (table 1 and fig. 3A). Values range from a low of 9.9% (95% CI 7.6–12.8%) for human chromosome X to a high of 29.3% (95% CI 26.1–32.8%) for human chromosome 8. A 1-factorial analysis of variance and a subsequent Bonferroni posttest rejected the hypothesis that all autosomes show the same mean fraction of incongruent sequence trees (P < 0.01). However, in a subsequent pairwise comparison, only chromosomes 7 and 8, displaying the smallest and largest proportion of incongruent trees, respectively, differ significantly. Including the X chromosome in the analysis revealed that the fraction of incongruent sequence trees on the sex chromosome is significantly different from that observed on all human autosomes, except for chromosomes 7 and 22.
The Paleodemographic History of Humans and Chimpanzees
The intertwined genetic relationships between humans, chimpanzees, and gorillas allow conclusions about the paleodemographic histories of these species. Under a model of random genetic drift, the probability, p(H,C)G, of observing a congruent human–chimpanzee–gorilla sequence tree depends on the effective size, Ne(HC), of the ancient population from which humans and chimpanzees emerged, and on the time in years, T(HC), for which this progenitor species persisted. More precisely,
|
|
Ne(HC) = 49,000 (95% CI 48,000–51,000). However, Ne(HC) is substantially smaller when we use the X chromosomal data to assess p(H,C)G (Ne(HC) = 28,000; 95% CI = 24,000–32,000). An overview of the Ne(HC) variation among the individual human chromosomes is shown in figure 3B.
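As a hedged sketch of the dependence of p(H,C)G on Ne(HC) and T(HC), the standard neutral three-species coalescent gives the relation below, together with its inversion for the ancestral effective population size. The symbol g(HC), the generation time in years, follows the notation used in the Discussion; the exact form used by the authors may differ in detail.

```latex
p_{(H,C)G} \;=\; 1 - \frac{2}{3}\,
  \exp\!\left(-\frac{T_{(HC)}}{2\,N_{e(HC)}\,g_{(HC)}}\right)
\qquad\Longrightarrow\qquad
N_{e(HC)} \;=\; \frac{T_{(HC)}}{2\,g_{(HC)}\,
  \ln\!\left[\dfrac{2}{3\,\bigl(1 - p_{(H,C)G}\bigr)}\right]}
```

Here 1 − p(H,C)G is the fraction of incongruent sequence trees, so a smaller incongruent fraction at fixed T(HC) and g(HC) translates into a smaller estimated Ne(HC).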
The Evolutionary Age of Human Genetic Lineages
Subsequently, we identified those human genes in our data set for which exonic sequence overlapped with regions with an incongruent genealogy. A total of 125 genes were identified this way: 63 overlapped with regions placing chimpanzees closer to gorillas and 62 with regions supporting a human–gorilla grouping. A detailed list is given in supplementary table 1 (Supplementary Material online).
| Discussion |
|---|
The evolutionary history of humans and the genetic relationships to their closest relatives, the great apes, have been central to numerous studies in the past. Still, the picture of how and when humans emerged as a species from the common ancestor shared with chimpanzees remains fragmentary. In the present study, we have reanalyzed a data set of 23,210 alignments of human, chimpanzee, gorilla, orangutan, and rhesus DNA sequences from randomly chosen regions of the human genome (Patterson et al. 2006
A Map of Our Genetic Ancestry
In view of the random character of the sampling strategy, our results indicate that roughly one-quarter of our genome shares no immediate ancestry with chimpanzees. To get an impression of the spatial distribution of such regions, we mapped the resolved sequence trees onto the human chromosomes (fig. 5 and supplementary fig. 2 [Supplementary Material online]). Incongruent sequence trees are present on all autosomes as well as on the X chromosome and display no general tendency toward regional clustering. Thus, these chromosomes emerge by and large as random assemblies of regions, each with a distinct evolutionary relationship to the great apes. Reshuffling of parental chromosomal loci during meiosis has presumably acted to decouple the evolutionary histories of genetic regions located on the same DNA molecule (Paabo 2003; Hobolth et al. 2007). However, the probability of observing an incongruent sequence tree does not seem to be the same throughout our genome. When we assess the fraction of incongruent sequence trees for the individual human chromosomes, values range between 18% and 29% with a mean of 23.6% for the human autosomes, and the fraction is as low as 10% for the human X chromosome (cf. fig. 3A). For the autosomes, we can only speculate about the causes of the observed variation in the fraction of incongruent trees. Likely explanations fall into 2 categories. The first assumes that all autosomes share the same fraction of sites with an incongruent genealogy but that the power to detect these sites differs significantly among chromosomes. As one possible scenario, recombination patterns may have varied among chromosomes in the human–chimpanzee ancestor. As a consequence, sites following an incongruent genealogy might be more scattered, and thus more difficult to detect in our analysis, on one chromosome than on another. However, we find no correlation between the fraction of incongruent sequence trees and the fraction of unresolved trees for the individual autosomes. The second category of explanations assumes that the individual autosomes differ significantly in their fraction of sites with an incongruent genealogy. Such a deviation from the neutral model detailed in figure 1 could point toward the effect of selection. For example, regions under balancing selection (Charlesworth 2006) in the common ancestor of humans and chimpanzees have a higher probability of retaining ancestral polymorphisms and, thus, would show up as regions in the human genome with an increased fraction of incongruent sequence trees. In turn, selective sweeps (Voight et al. 2006) in the human–chimp ancestral species would remove any ancestral polymorphisms and, thus, result in islands in our genome where incongruent sequence trees are depleted. Notably, in figure 5 and supplementary figure 2 (Supplementary Material online), individual genomic regions can be seen that are conspicuously free of incongruent sequence trees.
In summary, whatever causes the observed variation in the fraction of incongruent sequence trees, we can think of no influence that would result in an overestimation of this fraction. From this perspective, our figure of 23.6% incongruent trees serves as a conservative baseline for the human autosomes as a whole. Any region deviating significantly from this figure is indicative of an evolutionary history that differs from the genome average.
The situation is, at first sight, different for the human X chromosome. So far, we have not taken into account its lower effective population size, which is only three-quarters that of the autosomes. Thus, a significantly lower frequency of incongruent sequence trees on the X chromosome than on the autosomes is not surprising.
Population Size of the Species Ancestral to Humans and Chimpanzees
From the observed frequency of incongruent sequence trees on the autosomes, we conclude that the species from which humans and chimpanzees emerged had an effective population size in the range of 49,000. This value appears to be in agreement with previous studies that chose different approaches to this question (Wall 2003; Hobolth et al. 2007). The estimate, however, depends essentially on the choice of the individual species generation times and the calibration of the molecular clock. For example, Hobolth et al. (2007) used a mean generation time as high as 25 years for all extant and ancestral species, a value that can safely be considered unrealistic. This choice implies that their data support a substantially higher effective population size of the predecessor species of humans and chimpanzees than their stated figure of 50,000. Our assumption of g(HC) = 20 years may still be too high and might reflect the generation time of contemporary humans more than that of the ancestral species we shared with chimpanzees. Reducing g(HC) to 15 years increases Ne(HC) to 67,000 (95% CI 65,000–70,000). On the other hand, our dating for the split of the orangutan lineage (16 MYBP), used for calibrating the molecular clock, might be too far back in time. A recent study estimating when the individual primate lineages emerged places the split of the orangutan lineage at 13 MYBP (Glazko and Nei 2003). Based on this splitting time, T(HC) reduces to 1.7 Myr. With 20 years as an estimate for g(HC), Ne(HC) equals 41,000 (95% CI 39,000–42,000). Taking into account our uncertainty about both g(HC) and T(HC), we conclude that the effective population size of the species ancestral to humans and chimpanzees was 49,000 with a range from 39,000 to 69,000. A comparison of these estimates with the effective population sizes assessed for the contemporary human and great ape populations (Kaessmann et al. 2001; Yu et al. 2001) reveals that comparatively little has changed in the demographic history of chimpanzees and gorillas. In contrast, an approximately 5-fold reduced effective population size is observed for extant humans and bonobos. This adds a further line of evidence that humans, as bonobos must have, experienced a severe demographic bottleneck in their recent evolutionary history (Kaessmann et al. 2001).
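The sensitivity of the population size estimate to T(HC) and g(HC) can be explored numerically. The sketch below assumes the standard neutral three-species coalescent relation, P(incongruent) = (2/3)·exp(−T/(2·Ne·g)), which is an assumption about the form of the authors' model; the function name is illustrative. The inputs (23.6% incongruent autosomal trees, T(HC) = 1.7 Myr, g(HC) = 20 years) are the values quoted in the text for the 13 MYBP calibration.

```python
import math


def ancestral_ne(frac_incongruent, t_hc_years, gen_time_years):
    """Effective size of the human-chimpanzee ancestor.

    Inverts P(incongruent) = (2/3) * exp(-T / (2 * Ne * g)), the standard
    neutral three-species coalescent relation (assumed here).
    """
    if not 0.0 < frac_incongruent < 2.0 / 3.0:
        raise ValueError("fraction must lie strictly between 0 and 2/3")
    tau = math.log((2.0 / 3.0) / frac_incongruent)  # equals T / (2 * Ne * g)
    return t_hc_years / (2.0 * gen_time_years * tau)


if __name__ == "__main__":
    ne = ancestral_ne(0.236, 1.7e6, 20)
    print(f"Ne(HC) is roughly {ne:,.0f}")
```

Under these assumptions the result is close to the 41,000 reported for the 13 MYBP calibration. Because Ne(HC) is inversely proportional to g(HC) and directly proportional to T(HC), a shorter generation time raises the estimate and a more recent calibration lowers it.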
The Peculiar Evolutionary History of the Human X Chromosome
The effective population size of the species ancestral to humans and chimpanzees estimated from the fraction of incongruent sequence trees observed on the X chromosome is 28,000 (95% CI 24,000–32,000). This value is significantly smaller than the expected Ne,X(HC) = 3/4 × 49,000 = 36,750, based on the analysis of the autosomal data. Accordingly, the reduced frequency of incongruent sequence trees on the X chromosome compared with the autosomes cannot be explained by its lower effective population size alone (binomial test: P < 0.0001). Moreover, when we assess the splitting times from the concatenated X chromosomal alignments, these results also differ from those obtained from the autosomal data (fig. 4B). The split of humans and chimpanzees is dated more recently (5.4 MYBP), and the time for which the common ancestor of humans and chimpanzees existed is about 700,000 years longer than the corresponding estimate from the autosomes. When we use the dating from the X chromosomal sequences to assess Ne,X(HC) from the fraction of incongruent sequence trees on the X chromosome, we estimate an ancestral population size of 37,000 (95% CI 32,000–43,000). This value is almost exactly the expected three-quarters of the population size estimated from the autosomes. Thus, the prolonged time span for the human–chimpanzee ancestor inferred for the X chromosome suffices to explain its reduced fraction of incongruent sequence trees. To date, we can only speculate about the underlying cause of the marked difference in the evolutionary history of the autosomes and the X chromosome. Rehybridization of humans and chimpanzees subsequent to their initial separation into 2 different species, as suggested in Patterson et al. (2006) and Osada and Wu (2005), might serve as one explanation. Alternatively, recurrent selective sweeps in the human–chimp ancestral population occurring more frequently on the X than on the autosomes (due to the direct exposure of recessive mutations to selection in males) could result in a similar picture. The removal of ancestral polymorphisms by selective sweeps has 2 effects. First, it reduces the fraction of incongruent sequence trees. Second, it reduces the coalescence time for human and chimpanzee genetic lineages. It thereby makes the human–chimp split inferred from their genetic distances appear more recent and thus prolongs the estimated time span for which the species ancestral to humans and chimpanzees existed. However, alternative explanations might exist. In this context, the distribution of incongruent sequence trees on the X chromosome might be informative. If selective sweeps repeatedly acted on the X chromosome, we would expect extended regions on the X that are depleted of incongruent sequence trees. Presumably, no such clustering is expected if hybridization between humans and chimpanzees had caused the observed scenario. Unfortunately, the 50 incongruent sequence trees we could map to the X chromosome are still too few to allow for this analysis.
The Evolutionary Age of the Human Genome
That twenty-three percent of our genome shares no immediate ancestry with chimpanzees has a further interesting implication. Necessarily, the corresponding genetic lineages must have split from their MRCA shared with any other species prior to the speciation of the gorilla. Thus, a substantial part of the human genome began to evolve in what is, from today's point of view, a "human-specific" way long before humans emerged as a species. Obviously, in our analysis, we observe only 2/3 of such "old" genetic lineages (fig. 1, dashed graphs) because the sequence trees of the remaining 1/3 agree with the species tree. Taking this into account, our results suggest that the ancestry of as much as 35% of our genome dates back to the ancient species we shared with the gorilla. What is the relevance for the evolution of functional sequences in the human genome? We observe incongruent sequence trees with frequencies around the genome-wide average in regions covered by genes and exons (cf. table 3 and supplementary table 1 [Supplementary Material online]). Furthermore, no overrepresentation of certain GO terms was observed in the list of genes for which an incongruent sequence tree overlapped with an exon (data not shown). Jointly, this suggests that the evolutionary relationships of human and chimpanzee sequences are by and large independent of the presence and function of genes and exons. From this, we conclude that roughly 1 out of 3 genes (35%) contains at least parts that already evolved in a human-specific way in the progenitor species of humans, chimpanzees, and gorillas. The consequences are intriguing.
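The step from the observed 23% to the inferred 35% is a short calculation. Writing f_obs for the observed fraction of incongruent sequence trees and f_old for the fraction of lineages that did not coalesce in the human–chimpanzee ancestor (both symbols are introduced here for illustration only):

```latex
f_{\mathrm{obs}} = \tfrac{2}{3}\, f_{\mathrm{old}}
\quad\Longrightarrow\quad
f_{\mathrm{old}} = \tfrac{3}{2}\, f_{\mathrm{obs}}
                 = \tfrac{3}{2} \times 0.23 \approx 0.345 \approx 35\%
```

The remaining third of these old lineages, about 11.5% of the genome, yields genealogies that happen to match the species tree and therefore cannot be distinguished from lineages that coalesced in the human–chimpanzee ancestor.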
Despite extensive studies on early human evolution, it is still unclear when in our evolutionary history we split from the ancestral species shared with the chimpanzees. Particularly puzzling in this context is the apparent discrepancy between the dating of this split based on genetic evidence and the age of fossils that have been, due to their display of certain human-specific characteristics, assigned to a hominid species, that is, a species more closely related to humans than to chimpanzees. The extent of genetic differences between humans and chimpanzees usually places the split of humans and chimpanzees around 5–6 MYBP (Chen and Li 2001; Glazko and Nei 2003; Kuroki et al. 2006), with a variation of at least 4 Myr across the genome (Barton 2006; Patterson et al. 2006). However, the current interpretation of the fossil record argues for the presence of hominids already 5.8 MYBP (Orrorin tugenensis, Senut et al. 2001, and Ardipithecus kadabba, WoldeGabriel et al. 2001) and presumably as early as 6.5–7.4 MYBP (Sahelanthropus tchadensis, Brunet et al. 2005). Hitherto, only a single attempt has been made to reconcile this discrepancy. Patterson et al. (2006) proposed a circuitous evolutionary scenario in which humans and chimpanzees separated initially prior to the emergence of S. tchadensis, explaining both the fossil record and the evolutionarily older parts of the human genome. A later gene flow between the 2 species, facilitated by rehybridization with the chimpanzee, was then proposed to explain the evolutionarily younger fraction of our genome. Our results, however, provide a far more parsimonious explanation.
The varying evolutionary ages of the human genome are to a large extent simply a consequence of the stochastic nature of the coalescent process determining the genealogy of human and chimpanzee genetic lineages (Barton 2006). More importantly, in view of the age of certain human genetic lineages (cf. supplementary table 1, Supplementary Material online), it seems mandatory to consider that a number of phenotypic characteristics nowadays judged as human-specific inventions (apomorphies) de facto already existed in the ancestral species of humans and chimpanzees. It is only because the corresponding genetic lineages were lost in our closest relatives that these characters became confined to humans. The unequivocal assignment of fossil remains to a species more closely related to humans than to chimpanzees based on the presence of certain human-specific apomorphies should, therefore, be taken with a grain of salt. A similar point can be made in the context of the recent discussion concerning the position of Australopithecus afarensis in the hominid phylogeny (Rak et al. 2007). This species has been proposed as the common ancestor of later hominines (Johanson and White 1979; White et al. 2006) including the genus Homo. However, the observation of an evolutionarily derived ramal morphology in A. afarensis, resembling the state in gorilla, whereas contemporary humans and chimpanzees display the ancestral state, was taken as evidence to exclude A. afarensis from human ancestry (Rak et al. 2007).
The problem in using apomorphies for the reconstruction of phylogenetic relationships, however, extends beyond the classification of fossils. A nonnegligible fraction of human genes is expected to share an immediate ancestry with the gorilla or with the common ancestor of chimpanzees and gorillas. Because gene products essentially define the phenotype, we can expect a certain proportion of derived morphological characters to support the sister grouping of humans and gorillas, or chimpanzees and gorillas. This expectation is corroborated by comparative morphological studies between humans, chimpanzees, and gorillas (Shoshani et al. 1996; Collard and Wood 2000). For a number of phenotypic characters, either humans or chimpanzees share the derived character state with other great apes, although the ancestral character state is still seen in the respective other species (supplementary table 2, Supplementary Material online). To water down the apparently contradictory character of such data, taxonomists proposing the human–chimp sister group status employed additional evolutionary scenarios; for example, the same derived character state might have arisen twice independently during evolution (Pilbeam 1986; Lockwood et al. 2004). Opponents of the human–chimp grouping, on the other hand, exploited such observations to promote alternative phylogenies of humans and the great apes (Schwartz 1984). However, our genome-wide comparison of DNA sequences between human and great ape species provides an alternative explanation, which easily resolves the discrepancy between the various schools favoring one phylogeny over the other. The random sorting of ancestral genetic polymorphisms that are associated with a phenotypic polymorphism can explain why synapomorphies can be shared among species that are not each other's closest relatives.
In summary, our study highlights the extent and implications of the intertwined genetic relationships between humans, chimpanzees, and gorillas. Clearly, a comprehensive understanding of how humans evolved their unique characteristics, which distinguish them from all other extant species, depends essentially on our knowledge of the evolutionary history of our genes. From this perspective, extensive sequencing of the gorilla genome will be required to make full use of the chimpanzee genome sequence on the way toward a map of our genetic ancestry.
| Supplementary Material |
|---|
Supplementary figures 1 and 2 and tables 1 and 2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
The authors wish to thank Heiko A. Schmidt and Vinh Le Sy for help with the maximum likelihood analyses and Tanja Gesell and Steffen Klaere for helpful comments on the manuscript. The work was supported in part by the German National Genome Network grant 01GR0105. Financial support from the Wiener Wissenschafts- und Technologie-Fond is also acknowledged.
| Footnotes |
|---|
Associate Editor
| References |
|---|
Al-Shahrour F, Minguez P, Tarraga J, Montaner D, Alloza E, Vaquerizas JM, Conde L, Blaschke C, Vera J, Dopazo J. BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments. Nucleic Acids Res. (2006) 34:W472–W476.
Barton NH. Evolutionary biology: how did the human species form? Curr Biol. (2006) 16:R647–R650.
Brunet M, Guy F, Pilbeam D, Lieberman DE, Likius A, Mackaye HT, Ponce de Leon MS, Zollikofer CP, Vignaud P. New material of the earliest hominid from the Upper Miocene of Chad. Nature (2005) 434:752–755.
Charlesworth D. Balancing selection and its effects on sequences in nearby genome regions. PLoS Genet. (2006) 2:e64.
Chen FC, Li WH. Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Am J Hum Genet. (2001) 68:444–456.
The Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature (2005) 437:69–87.
Chou HH, Hayakawa T, Diaz S, Krings M, Indriati E, Leakey M, Paabo S, Satta Y, Takahata N, Varki A. Inactivation of CMP-N-acetylneuraminic acid hydroxylase occurred prior to brain expansion during human evolution. Proc Natl Acad Sci USA (2002) 99:11736–11741.
Collard M, Wood B. How reliable are human phylogenetic hypotheses? Proc Natl Acad Sci USA (2000) 97:5003–5006.
Ebersberger I. Chimpanzee genome. In: Encyclopedia of Molecular Cell Biology and Molecular Medicine (2004) Weinheim, Germany: Wiley-VCH. 551–578.
Enard W, Przeworski M, Fisher S, Lai C, Wiebe V, Kitano T, Monaco A, Paabo S. Molecular evolution of FOXP2, a gene involved in speech and language. Nature (2002) 418:869–872.
Ewing B, Green P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. (1998) 8:186–194.
Felsenstein J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol. (1981) 17:368–376.
Gilad Y, Man O, Paabo S, Lancet D. Human specific loss of olfactory receptor genes. Proc Natl Acad Sci USA (2003) 100:3324–3327.
Glazko GV, Nei M. Estimation of divergence times for major lineages of primate species. Mol Biol Evol. (2003) 20:424–434.
Goodman M. Evolution of the immunologic species specificity of human serum proteins. Hum Biol. (1962) 34:104–150.
Goodman M, Porter CA, Czelusniak J, Page SL, Schneider H, Shoshani J, Gunnell G, Groves CP. Toward a phylogenetic classification of primates based on DNA evidence complemented by fossil evidence. Mol Phylogenet Evol. (1998) 9:585–598.
Hasegawa M, Kishino H, Yano T. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. (1985) 22:160–174.
Hobolth A, Christensen OF, Mailund T, Schierup MH. Genomic relationships and speciation times of human, chimpanzee, and gorilla inferred from a coalescent hidden Markov model. PLoS Genet. (2007) 3:e7.
Horai S, Satta Y, Hayasaka K, Kondo R, Inoue T, Ishida T, Hayashi S, Takahata N. Man's place in Hominoidea revealed by mitochondrial DNA genealogy. J Mol Evol. (1992) 35:32–43.
International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature (2004) 431:931–945.
Jaffe DB, Butler J, Gnerre S, Mauceli E, Lindblad-Toh K, Mesirov JP, Zody MC, Lander ES. Whole-genome sequence assembly for mammalian genomes: Arachne 2. Genome Res. (2003) 13:91–96.
Johanson DC, White TD. A systematic assessment of early African hominids. Science (1979) 203:321–330.
Kaessmann H, Wiebe V, Weiss G, Paabo S. Great ape DNA sequences reveal a reduced diversity and an expansion in humans. Nat Genet. (2001) 27:155–156.
Kent WJ. BLAT—the BLAST-like alignment tool. Genome Res. (2002) 12:656–664.
Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. The human genome browser at UCSC. Genome Res. (2002) 12:996–1006.
King MC, Wilson AC. Evolution at two levels in humans and chimpanzees. Science (1975) 188:107–116.
Klein J, Takahata N. Where do we come from? (2002) Berlin (Germany): Springer.
Kuroki Y, Toyoda A, Noguchi H, Taylor TD, et al, (19 co-authors). Comparative analysis of chimpanzee and human Y chromosomes unveils complex evolutionary pathway. Nat Genet. (2006) 38:158–167.
Li WH, Saunders MA. News and views: the chimpanzee and us. Nature (2005) 437:50–51.
Lockwood CA, Kimbel WH, Lynch JM. Morphometrics and hominoid phylogeny: support for a chimpanzee-human clade and differentiation among great ape subspecies. Proc Natl Acad Sci USA (2004) 101:4356–4360.
Nei M. Molecular evolutionary genetics (1987) New York: Columbia University Press.
Nuttal GHF. Blood immunity and blood relationship (1904) Cambridge: Cambridge University Press.
Osada N, Wu CI. Inferring the mode of speciation from genomic data: a study of the great apes. Genetics (2005) 169:259–264.
Paabo S. The mosaic that is our genome. Nature (2003) 421:409–412.
Pamilo P, Nei M. Relationships between gene trees and species trees. Mol Biol Evol. (1988) 5:568–583.
Patterson N, Richter DJ, Gnerre S, Lander ES, Reich D. Genetic evidence for complex speciation of humans and chimpanzees. Nature (2006) 441:1103–1108.
Pilbeam D. Hominoid evolution and hominoid origins. Am Anthropol. (1986) 88:295–312.
Rak Y, Ginzburg A, Geffen E. Gorilla-like anatomy on Australopithecus afarensis mandibles suggests Au. afarensis link to robust australopiths. Proc Natl Acad Sci USA (2007) 104:6568–6572.
Ruvolo M. Molecular phylogeny of the hominoids: inferences from multiple independent DNA sequence data sets. Mol Biol Evol. (1997) 14:248–265.
Satta Y, Klein J, Takahata N. DNA archives and our nearest relative: the trichotomy problem revisited. Mol Phylogenet Evol. (2000) 14:259–275.
Schmidt HA, Strimmer K, Vingron M, von Haeseler A. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics (2002) 18:502–504.
Schwartz JH. The evolutionary relationships of man and orang-utans. Nature (1984) 308:501–505.
Senut B, Pickford M, Gommery D, Mein P, Cheboi K, Coppens Y. First hominid from the Miocene (Lukeino Formation, Kenya). C R Acad Sci. Ser II A Earth Planet Sci. (2001) 332:137–144.
Shoshani J, Groves CP, Simons EL, Gunnell GF. Primate phylogeny: morphological vs. molecular results. Mol Phylogenet Evol. (1996) 5:102–154.
Sibley CG, Ahlquist JE. The phylogeny of the hominoid primates, as indicated by DNA-DNA hybridization. J Mol Evol. (1984) 20:2–15.
Strimmer K, von Haeseler A. Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proc Natl Acad Sci USA (1997) 94:6815–6819.
Takahata N, Satta Y. Evolution of the primate lineage leading to modern humans: phylogenetic and demographic inferences from DNA sequences. Proc Natl Acad Sci USA (1997) 94:4811–4815.
Taudien S, Ebersberger I, Glockner G, Platzer M. Should the draft chimpanzee sequence be finished? Trends Genet. (2006) 22:122–125.
Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. (1994) 22:4673–4680.
Varki A. A chimpanzee genome project is a biomedical imperative. Genome Res. (2000) 10:1065–1070.
Vigilant L, Paabo S. A chimpanzee millennium. Biol Chem. (1999) 380:1353–1354.
Voight BF, Kudaravalli S, Wen X, Pritchard JK. A map of recent positive selection in the human genome. PLoS Biol. (2006) 4:e72.
Wall JD. Estimating ancestral population sizes and divergence times. Genetics (2003) 163:395–404.
White TD, WoldeGabriel G, Asfaw B, et al, (22 co-authors). Asa Issie, Aramis and the origin of Australopithecus. Nature (2006) 440:883–889.
WoldeGabriel G, Haile-Selassie Y, Renne PR, Hart WK, Ambrose SH, Asfaw B, Heiken G, White T. Geology and palaeontology of the Late Miocene Middle Awash valley, Afar rift, Ethiopia. Nature (2001) 412:175–178.
Yang Z. Likelihood and Bayes estimation of ancestral population sizes in hominoids using data from multiple loci. Genetics (2002) 162:1811–1823.
Yu N, Zhao Z, Fu YX, et al, (11 co-authors). Global patterns of human DNA sequence variation in a 10-kb region on chromosome 1. Mol Biol Evol. (2001) 18:214–222.