MBE Advance Access originally published online on December 28, 2006
Molecular Biology and Evolution 2007 24(3):757-768; doi:10.1093/molbev/msl209
Research Articles |
Whole-mtDNA Genome Sequence Analysis of Ancient African Lineages
* Department of Biology, University of Maryland
Center for the Advanced Study of Hominid Paleobiology
Department of Anthropology, George Washington University
E-mail: tishkoff{at}umd.edu.
| Abstract |
|---|
|
|
|---|
Studies of human mitochondrial (mt) DNA genomes demonstrate that the root of the human phylogenetic tree occurs in Africa. Although 2 mtDNA lineages with an African origin (haplogroups M and N) were the progenitors of all non-African haplogroups, macrohaplogroup L (including haplogroups L0–L6) is limited to sub-Saharan Africa. Several L haplogroup lineages occur most frequently in eastern Africa (e.g., L0a, L0f, L5, and L3g), but some are specific to certain ethnic groups, such as haplogroup lineages L0d and L0k that previously have been found nearly exclusively among southern African "click" speakers. Few studies have included multiple mtDNA genome samples belonging to haplogroups that occur in eastern and southern Africa but are rare or absent elsewhere. This lack of sampling in eastern Africa makes it difficult to infer relationships among mtDNA haplogroups or to examine events that occurred early in human history. We sequenced 62 complete mtDNA genomes of ethnically diverse Tanzanians, southern African Khoisan speakers, and Bakola Pygmies and compared them with a global pool of 226 mtDNA genomes. From these, we infer phylogenetic relationships amongst mtDNA haplogroups and estimate the time to most recent common ancestor (TMRCA) for haplogroup lineages. These data suggest that Tanzanians have high genetic diversity and possess ancient mtDNA haplogroups, some of which are either rare (L0d and L5) or absent (L0f) in other regions of Africa. We propose that a large and diverse human population has persisted in eastern Africa and that eastern Africa may have been an ancient source of dispersion of modern humans both within and outside of Africa.
Key Words: genetics; mtDNA genomes; Africa; mtDNA haplogroups; Homo sapiens; Tanzania; Khoisan speakers
| Introduction |
|---|
|
|
|---|
Genetic analysis of mitochondrial (mt) DNA has been an important tool in understanding human evolution due to characteristics of mtDNA, such as high copy number, lack of recombination, high substitution rate, and a maternal mode of inheritance (Ballard and Whitlock 2004
Comprehensive studies of the human mtDNA genome have been carried out by analyzing single nucleotide polymorphisms (SNPs) determined by restriction fragment length polymorphism (RFLP) analysis and sequences of the first hypervariable region of the d-loop (Chen et al. 1995, 2000; Salas et al. 2002, 2004). These studies have demonstrated that human mtDNA is geographically structured and may be classified into groups of related haplotypes (i.e., haplogroups) (Chen et al. 1995; Wallace et al. 1999). Only 2 mtDNA macrohaplogroups (M and N) and their derivatives persisted in non-Africans after the migration of modern humans out of Africa. Macrohaplogroup L is geographically limited to sub-Saharan Africa and has been divided into haplogroups L0–L6 (Mishmar et al. 2003; Salas et al. 2004; Kivisild, Metspalu, et al. 2006). The phylogeny of macrohaplogroup L is largely based on d-loop sequence and RFLP analysis and is, therefore, not well resolved (fig. 1A and B), particularly at basal tree nodes (Kivisild, Metspalu, et al. 2006). In particular, African mtDNAs that belong to L0 and L1 fall into several distinctive subhaplogroups, but their history is complex and poorly understood (Pereira et al. 2001; Kivisild, Metspalu, et al. 2006).
Haplogroup L0 is divided into subhaplogroups L0a, L0d, L0f, and L0k (Salas et al. 2002
A global sample of complete mtDNA genome sequences has become publicly available, making it possible to draw more precise phylogenetic inferences and calculate divergence dates for these mtDNA haplogroups (Ingman et al. 2000; Torroni et al. 2001; Ingman and Gyllensten 2003; Mishmar et al. 2004; Ruiz-Pesini et al. 2004; Macaulay et al. 2005; Thangaraj et al. 2005; Kivisild, Metspalu, et al. 2006; Kivisild, Shen, et al. 2006). However, these previous analyses have included few samples representing many of the African mtDNA subhaplogroups (L0a, L0d, L0f, L0k, L1b, L1c, and L5), particularly from people residing in eastern Africa. East African populations may provide important clues toward understanding modern human origins. Both paleobiological and archeological data indicate that modern humans may have originated in eastern Africa (McBrearty and Brooks 2000; White et al. 2003), perhaps as early as 196,000 years ago (kya) (McDougall et al. 2005). In addition, the earliest migrations of modern humans out of Africa are thought to have originated from eastern Africa (Tishkoff et al. 1996; Quintana-Murci et al. 1999; Kivisild et al. 2004). Despite the paleobiological evidence that modern humans originated in eastern Africa, previous genetic studies have observed that L0k and L0d, which are found primarily among southern African Khoisan speakers (SAK), occur at the root of the human mtDNA gene tree (Chen et al. 1995; Ingman et al. 2000; Mishmar et al. 2003; Ruiz-Pesini et al. 2004; Kivisild, Metspalu, et al. 2006). However, the presence of click-speaking populations in Tanzania (the Hadza and Sandawe) as well as Y chromosome data from the Hadza (Knight et al. 2003), Ethiopian, and Sudanese populations (Underhill et al. 2000, 2001; Cruciani et al. 2002; Semino et al. 2002) indicate that the SAK may have originated in eastern Africa, although the divergence between populations from these regions was quite ancient. Until now, no genetic data existed for the Sandawe.
In this study, we compare several complete mtDNA genomes of Tanzanians with a global panel of mtDNA genomes to clarify the evolutionary history of the African mtDNA haplogroups and to better characterize the role that populations in East Africa played in the origin and dispersal of modern humans across Africa. Generally, Tanzanians appear to have a high level of mtDNA genome diversity that is distributed among several mtDNA haplogroups that originated at different times in modern human history. These data suggest that populations in Tanzania have played an important and persistent role in the origin and diversification of modern humans.
| Materials and Methods |
|---|
|
|
|---|
Sample Collection
All Tanzanian samples were obtained from blood samples collected with informed consent and Institutional Review Board approval. The Ju-speaking !Xun (also known as Vasekela) and Khoe-speaking Khwe samples were collected from individuals in the area of Schmidtsdrift in the northwest Cape of South Africa and were provided by Dr M. Kotze. We obtained additional SAK samples from the Human Genome Diversity Panel–Centre d'Etude du Polymorphisme Humain (HGDP–CEPH). We selected samples for this study in 3 ways. First, we chose a sample of Tanzanians from 5 linguistically and culturally diverse ethnic groups (language classification is listed within parentheses): Sandawe (Khoisan), Hadza (Khoisan), Burunge (Afro-Asiatic), Maasai (Nilo-Saharan), and Turu (Niger-Kordofanian). Samples were chosen to represent the relative frequencies of the mtDNA L2, L3, M, and N haplogroups present in a much larger sample of d-loop sequences and mtDNA SNP data collected from over 700 Tanzanians (Gonder MK, Mortensen H, Reed F, Tishkoff SA, unpublished data). Second, we sequenced a subset of 25 Tanzanian samples to represent the most ancient mtDNA L0, L1, and L5 haplogroups (L0a, L0d, L0f, L1c, and L5). Third, we sequenced 10 samples from the SAK, 9 of which belong to mtDNA haplogroups L0d (n = 7) and L0k (n = 2). In addition, we sequenced the mtDNA genomes of 4 Bakola Pygmies to expand our sample size of haplogroup L1c. All samples were combined for analysis with a global data set of 254 human sequences obtained from mtDB, the Human Mitochondrial Genome Database (http://www.genpat.uu.se/mtDB/), and from GenBank. Complete mtDNA genome sequences of a chimpanzee (Pan troglodytes) and a gorilla were used as outgroups to the human mtDNA genomes for phylogenetic analyses (GenBank accession numbers D38113 and X93347).
Sequencing
We amplified mtDNA genome sequences in two overlapping 8.5 kilobase (kb) fragments using a touchdown polymerase chain reaction (PCR) protocol (Don et al. 1991) and high-fidelity Platinum Taq polymerase following the manufacturer's protocol (Invitrogen Corporation, Carlsbad, CA). Sequencing was done using Big Dye Ready Reactions Kits using protocols specified by the manufacturer (Applied Biosystems, Inc., Foster City, CA). We processed 48 sequencing reactions for each individual on an ABI 3100 Genetic Analyzer, resulting in complete upstream and downstream mtDNA genome sequences (Rieder et al. 1998). PCR and sequencing primer sequences are given in table S1 (Supplementary Material online). We assembled sequences using Sequencher 4.1 (GeneCodes Corporation, Ann Arbor, MI) and annotated them according to the Cambridge Reference Sequence (Andrews et al. 1999). We prepared the mtDNA sequences for phylogenetic analysis in ClustalX (Thompson et al. 1997) by aligning them to the mtDNA genome sequences from the mtDB Web site. We improved the resulting alignment by visual inspection in MacClade version 4.05 (Maddison WP and Maddison DR 2000).
Statistical Tests of Genetic Diversity and Neutrality
Because the d-loop is prone to problems with homoplasy and has been shown to produce unreliable gene trees (Maddison et al. 1992; Ballard and Whitlock 2004), we excluded the d-loop from all analyses shown in this study (except fig. S1, Supplementary Material online). We included sites corresponding to basepairs 577–16,023 of the Cambridge Reference Sequence (Andrews et al. 1999) in our analysis. We calculated the following summary statistics using DnaSP version 3.99 (Rozas et al. 2003): numbers of sequences (n), segregating sites (S), nucleotide diversity (π), and average number of nucleotide differences (k) for various subsets of the mtDNA genome sequences (Rozas et al. 2003). We also tested for deviations from expectations of neutrality, including Tajima's D, D* of Fu and Li, and F* of Fu and Li using DnaSP.
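As an illustration of the summary statistics named above, the following minimal Python sketch computes n, S, π, and k from a toy alignment and then Tajima's D from the standard formula. It is not the DnaSP implementation: the function names and the example sequences are hypothetical, and the sketch assumes equal-length aligned sequences with no gaps or ambiguous bases.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(seqs):
    """Return (n, S, pi, k) for equal-length, gap-free aligned sequences."""
    n, length = len(seqs), len(seqs[0])
    # S: number of alignment columns that carry more than one nucleotide
    S = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    # k: average number of nucleotide differences over all sequence pairs
    pair_diffs = [sum(a != b for a, b in zip(s, t))
                  for s, t in combinations(seqs, 2)]
    k = sum(pair_diffs) / len(pair_diffs)
    # pi: nucleotide diversity, i.e. k expressed per site
    pi = k / length
    return n, S, pi, k


def tajimas_d(n, S, k):
    """Tajima's D from sample size n, segregating sites S, and k."""
    if S == 0:
        return float("nan")  # D is undefined without segregating sites
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between the two estimators of theta, scaled by its SD
    return (k - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


# Hypothetical 4-sequence toy alignment (not data from this study)
n, S, pi, k = diversity_stats(["AAAA", "AAAT", "AATT", "ATTT"])
print(n, S, round(pi, 4), round(k, 4), round(tajimas_d(n, S, k), 3))
```

A negative D indicates an excess of rare variants relative to neutral expectations, and a positive D an excess of intermediate-frequency variants; assessing significance requires the additional tests provided by programs such as DnaSP.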
Phylogenetic Analyses
We determined optimal models of nucleotide sequence evolution by log likelihood ratio tests (Huelsenbeck and Crandall 1997) as implemented by PAUP* version 4.0b10 (Swofford 2002). We calculated a distance matrix and Neighbor-Joining (NJ) tree for the 324 samples using the HKY85 substitution model, with gamma-distributed rates and 32 discrete nucleotide substitution rate categories. We subjected the resulting tree to 100,000 bootstrap replicates with resampling to provide statistical support for the basal branches of the mtDNA gene tree.
In order to better resolve the phylogenetic relationships of the African mtDNA lineages, we also analyzed a subset of the original data set using Bayesian analysis. Due to computational limitations, this data set included all mtDNA genome sequences from sub-Saharan Africans (n = 89), African representatives of the M and N haplogroups (n = 5), and a geographically diverse sample of mtDNAs of non-Africans derived from haplogroups M and N (n = 18). We completed phylogenetic analyses of this smaller data set in MrBayes 3.1 (Ronquist and Huelsenbeck 2003) using the optimal model of sequence evolution determined in PAUP* (i.e., HKY85). In MrBayes, we ran 4 chains simultaneously for one million generations until the standard deviation (SD) of split frequencies was less than 0.01 under the HKY85 model with gamma-distributed rates and 16 rate categories.
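For readers unfamiliar with the HKY85 model named above, the sketch below builds its instantaneous rate matrix in Python. HKY85 allows unequal base frequencies and a single transition/transversion rate ratio (kappa). The base frequencies and kappa used here are hypothetical placeholders, not estimates from this study, and the sketch omits the gamma-distributed among-site rate variation that the analyses also used.

```python
BASES = "ACGT"
# Transitions are purine<->purine (A<->G) and pyrimidine<->pyrimidine (C<->T)
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky85_rate_matrix(base_freqs, kappa):
    """Build the unnormalized 4x4 HKY85 rate matrix, rows/columns in ACGT order."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                # Rate x -> y is proportional to the frequency of the target
                # base, multiplied by kappa when the change is a transition
                rate = kappa if (x, y) in TRANSITIONS else 1.0
                q[i][j] = rate * base_freqs[y]
        # Diagonal makes each row sum to zero (it is 0.0 before this line)
        q[i][i] = -sum(q[i])
    return q


# Hypothetical parameter values, for illustration only
freqs = {"A": 0.3, "C": 0.3, "G": 0.15, "T": 0.25}
for row in hky85_rate_matrix(freqs, kappa=10.0):
    print([round(v, 3) for v in row])
```

In practice the matrix is rescaled so that branch lengths are in expected substitutions per site; that normalization is left out here to keep the structure of the model visible.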
Time to Most Recent Common Ancestor
We tested for nucleotide substitution rate heterogeneity for each gene tree using a likelihood ratio test (2lnΛ) comparing constrained (molecular clock enforced) versus unconstrained (no clock) trees (Huelsenbeck and Crandall 1997). Because we found significant rate heterogeneity in the NJ and Bayesian trees, we estimated divergence times for the mtDNA haplogroup clades using a penalized likelihood (PL) model as implemented in the program r8s 1.07 (Sanderson 1997, 2002, 2003) using the optimal smoothing value (S = 320) obtained by a cross-validation procedure in r8s.
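The clock test described above can be sketched as follows. The statistic is twice the difference in log likelihood between the unconstrained and clock-constrained trees, compared against a chi-square distribution with n - 2 degrees of freedom for n sequences, which matches the degrees of freedom reported in the Results. This is an illustration only: the log-likelihood values are hypothetical, the chi-square critical value uses the Wilson-Hilferty approximation so that only the Python standard library is needed, and the permutation procedure mentioned in the Results is not reproduced.

```python
from statistics import NormalDist


def chi2_critical(df, alpha=0.05):
    """Approximate upper-tail chi-square critical value (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1 - alpha)
    h = 2 / (9 * df)
    return df * (1 - h + z * h**0.5) ** 3


def clock_lrt(lnl_clock, lnl_free, n_taxa, alpha=0.05):
    """Likelihood ratio test of a molecular clock.

    lnl_clock: log likelihood with the clock enforced (constrained tree)
    lnl_free:  log likelihood without a clock (unconstrained tree)
    Returns (statistic, degrees of freedom, reject_clock).
    """
    stat = 2 * (lnl_free - lnl_clock)
    df = n_taxa - 2
    return stat, df, stat > chi2_critical(df, alpha)


# Hypothetical log likelihoods chosen only to illustrate the arithmetic
stat, df, reject = clock_lrt(lnl_clock=-1000.0, lnl_free=-825.25, n_taxa=114)
print(stat, df, reject)
```

When the clock is rejected, as it would be in this toy case, divergence times cannot be read directly from a strict-clock tree, which is the motivation for rate-smoothing approaches such as penalized likelihood.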
We estimated confidence intervals (CIs) for each tree node using a 100 replicate bootstrap resampling procedure (Baldwin and Sanderson 1998) that was implemented by Perl scripts in the r8s-bootkit provided by Torsten Eriksson at http://www.bergianska.se/index_forskning_soft.html. We generated 100 bootstrap replicate data sets from the tree obtained from MrBayes v. 3.1 using SEQBOOT in PHYLIP (Felenstein 1993). In order to determine the 95% CI, we used a log likelihood decline of 2.0 units, which is roughly equivalent to 2 SDs (Sanderson and Doyle 2001). We conducted all analyses using a discrete approximation of a gamma distribution to accommodate among-site rate heterogeneity. We calibrated our time to most recent common ancestor (TMRCA) estimates by assuming that the Pan and Homo lineages had separated from each other completely by 6 MYA (Kumar et al. 2005; Patterson et al. 2006) and added 500 ky for lineage sorting (Macaulay et al. 2005).
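The bootstrap step above rests on resampling alignment columns with replacement, so that each pseudoreplicate has the same number of sites as the original alignment. The Python sketch below shows that resampling idea only. It is not PHYLIP's SEQBOOT or the r8s-bootkit, and the function names, seed, and toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(seqs, rng):
    """Return one pseudoreplicate: columns resampled with replacement."""
    length = len(seqs[0])
    # Draw as many column indices as there are sites; repeats are allowed
    cols = [rng.randrange(length) for _ in range(length)]
    # Apply the same column draw to every sequence to keep sites aligned
    return ["".join(s[c] for c in cols) for s in seqs]


def bootstrap_replicates(seqs, n_reps=100, seed=1):
    """Generate n_reps pseudoreplicate alignments with a fixed seed."""
    rng = random.Random(seed)
    return [bootstrap_alignment(seqs, rng) for _ in range(n_reps)]


# Hypothetical toy alignment (not data from this study)
alignment = ["ACGT", "ACGA", "TCGA"]
replicates = bootstrap_replicates(alignment, n_reps=100)
print(len(replicates), replicates[0])
```

In a dating analysis, branch lengths would be re-estimated on a fixed topology for each pseudoreplicate and node ages recomputed, and the spread of those ages across replicates gives the interval for each node.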
Median-Joining Network Analysis
Networks of L0/L1 mtDNA genome haplogroups were constructed using Network 4.1.1.1 (Fluxus Technology Ltd., 2004 [Bandelt et al. 1999]) in order to provide a detailed analysis of nucleotide substitutions along branches. We prepared sequences for analysis in MacClade 4.06 OS X. We excluded all invariant nucleotide positions in our L0/L1 alignment. We found 486 variable sites in the coding region, which spanned basepairs 593–16,077 of the Cambridge Reference Sequence (Andrews et al. 1999). These 486 variable sites were each assigned equal weight in the analyses (for contrasting method see Finnila et al. 2001). Additionally, we constructed a median-joining (MJ) network including P. troglodytes as an outgroup (GenBank accession number D38113). Although Network 4.1.1.1 is not intended for interspecies comparison, the chimpanzee was included to root the network and to unambiguously infer branching patterns at the base of the human mtDNA network for comparisons to our phylogenetic analyses.
| Results |
|---|
|
|
|---|
We sequenced a total of 62 African complete mtDNA genomes for this study that have been assigned GenBank accession numbers EF184580–EF184641. These mtDNA genomes were from individuals belonging to several ethnic groups in Tanzania (n = 49), click-speaking !Xun and Khwe populations from South Africa (n = 10), and Bakola Pygmies from Cameroon (n = 4). These samples were selected in order to fully represent L0a, L0d, L0f, L0k, L1c, L5, L2, L3, M, and N haplogroup lineages (table S2, Supplementary Material online). These newly sequenced African mtDNA genomes were aligned and compared with a global assortment of 254 mtDNA genomes of peoples of diverse geographic origin. Table S3 (Supplementary Material online) lists the GenBank accession number, sampling provenance, and major geographic region for these 254 mtDNAs. Diversity statistics are given in table 1. The genetic diversity present in this sample was broadly consistent with previous studies (Ingman et al. 2000): diversity among Africans (3.92 × 10⁻³) and Tanzanians (3.80 × 10⁻³) was more than twice that among non-Africans (1.81 × 10⁻³). However, the level of variation in Africa may be artificially elevated to some extent by the selection of genomes for sequencing that would maximize haplogroup representation.
We detected significant departures from neutrality expectations (table 1), as measured by Tajima's D, in the global data set and the pooled African and non-African data sets, but not in Tanzanians. D* and F* statistics of Fu and Li revealed significant departures from neutrality in all populations. We also tested all mtDNAs belonging to subsets of haplogroups L0 and L1 for deviations from neutrality expectations (results not shown). None of these subsets significantly deviated from neutrality expectations, except for mtDNAs belonging to L5 (Tajima's D = −1.29, P < 0.001; D* of Fu and Li = 1.26, P < 0.05; F* of Fu and Li = 1.37, P < 0.05). Other studies of whole-mtDNA genome diversity in Africa did not report significant deviations from neutrality expectations (Ingman et al. 2000
Due to the high frequency of homoplasy in the mtDNA d-loop, we compared the topology of an NJ tree reconstructed from the complete mtDNA sequences of the 322 samples (fig. S1, Supplementary Material online) with a tree reconstructed using the mtDNA sequences excluding the d-loop (fig. 2). The topologies of the 2 trees were similar, but the basal branches of the complete mtDNA sequences had lower bootstrap values. In contrast, the NJ tree of the mtDNA genomes that excluded the d-loop had higher statistical support for the basal branches separating the haplogroups (L0, L1, L5, L2, L3, M, and N), with bootstrap values ranging from 61% to 91%.
There are several notable characteristics of the NJ tree shown in figure 2. First, the L0/L5/L1/L2/L3 haplogroups are African specific, as previously reported (Ingman et al. 2000
In order to better resolve the evolutionary history of the most ancient mtDNA haplogroup lineages using Bayesian maximum likelihood analyses, we next analyzed a smaller data set composed of all mtDNA genomes of people from sub-Saharan Africa and a subset of the samples obtained from GenBank. The subset of samples included a global panel representing all of the major non-African haplogroup lineages. The Bayesian tree is shown in figure 3. The overall tree topologies of the Bayesian tree and NJ tree were similar. Clade credibility scores, which are a measure of the posterior probability of the tree branching structure, ranged from 73% to 100%. MtDNAs of Africans belonging to haplogroups L0 and L1 form the most basal lineages of the human mtDNA gene tree. Within L0, L0d forms the most basal branch of the tree and also contains 2 reciprocally monophyletic clades composed of Tanzanians and SAK, respectively. L0k forms a clade with L0f and L0a, providing additional support of independent origins of the Khoisan-specific L0d and L0k haplogroup lineages. L1b and L1c form a clade that does not include L5. In contrast to the NJ trees (fig. 2 and fig. S1, Supplementary Material online), L5 occupies an intermediate phylogenetic position between L1 and L2, as has been previously reported (Shen et al. 2004).
Using a log likelihood test (Huelsenbeck and Crandall 1997
2lnΛ = 750.9, χ² df = 318, P < 0.05, 100 permutations) and for the smaller data set (n = 114; 2lnΛ = 349.5, χ² df = 112, P < 0.05, 100 permutations). Simulations have shown that it is difficult to root a phylogeny precisely when the outgroup is very distant relative to the ingroup, as is the case in the present study (Penny et al. 1995
2lnΛ = 433.94, χ² df = 317, P < 0.05, 100 permutations) and for the smaller data set (n = 113; 2lnΛ = 150.41, χ² df = 111, P < 0.05, 100 permutations).
Subsequent to our discovery that these data do not follow a clock-like model, we applied a PL algorithm to account for substitution rate heterogeneity among the mtDNA haplogroup clades to calculate TMRCAs for various nodes in the gene tree shown in figure 3. Table 2 lists these TMRCA dates and their 95% CIs. Our TMRCA estimate for the global mtDNA genome tree is 194.3 ± 32.55 kya, which is very close to the age of the earliest modern humans estimated from fossil data (McDougall et al. 2005) as well as some early studies of mtDNA diversity (e.g., Vigilant et al. 1991; Horai et al. 1995 [when corrected for a Pan/Homo split 6.5 MYA]). We also observe an origin of L0 (146.4 ± 25.1 kya) and L1 (140.4 ± 33 kya), slightly more recent than the appearance of modern humans based on the paleontological record (Clark et al. 2003; White et al. 2003; McDougall et al. 2005). The L0d mtDNAs have a TMRCA of 106 ± 20.2 kya. The TMRCA of mtDNAs of the SAK belonging to L0d is 90.4 ± 18.9 kya, whereas the TMRCA of L0d mtDNAs belonging to Tanzanians is more recent (30.6 ± 17.8 kya). The TMRCA of L0k, L0f, and L0a is 139.8 ± 24.6 kya. The TMRCA of the SAK L0k is 70.9 ± 19.7 kya. The TMRCA of L0f, which is observed only in eastern Africa, indicates that it is a relatively old lineage (94.9 ± 9.4 kya). The TMRCA of L0a (54.6 ± 5.7 kya) is more recent than the TMRCA of L0f, even though these mtDNA samples originate from diverse regions in Africa. We attribute the relatively old TMRCA (and highly negative Tajima's D) of L5 (129.4 ± 22.1 kya) to the divergent sequence of the L5 mtDNA from a single Tanzanian Mbugwe individual compared with the three L5 mtDNAs from the Tanzanian Sandawe that differed from each other by very few basepairs. The TMRCAs of L2 and L3 are more recent (96.7 ± 10.7 kya) compared with those of L0, L1, and L5. The age of the youngest node containing both African and non-African sequences (node S) is 94.3 ± 9.9 kya and represents an upper bound time estimate for an exodus out of Africa.
Phylogenetic analyses of mtDNA that assume a strict bifurcating tree topology may not be well suited to the study of human mtDNA (Bandelt et al. 1999
| Discussion |
|---|
|
|
|---|
Most analyses of the phylogenetic relationships among African mtDNA haplogroup lineages have been confined to the d-loop and/or RFLP haplotyping of the whole-mtDNA genome. Phylogenies and TMRCA estimates based on the d-loop and RFLPs may be problematic because of homoplasy and heterogeneous mutation rates (Maddison et al. 1992
Tanzania is the only region of Africa where populations speak languages classified as belonging to the 4 major language families present in Africa: Afro-Asiatic, Nilo-Saharan, Niger-Kordofanian, and Khoisan (Greenberg 1963). The Hadza and Sandawe, who speak a click language classified as Khoisan, are thought to be indigenous to Tanzania. However, populations speaking languages belonging to the other 3 language families are thought to have migrated into Tanzania from the Sudan (Nilotic Nilo-Saharan speakers), Ethiopia (Cushitic Afro-Asiatic speakers), and West Africa (Bantu Niger-Kordofanian speakers) within the past 5,000 years (Ambrose 1982; Newman 1995). Given the considerable ethnic and linguistic diversity present in Tanzania, it is not surprising that Tanzanians possess high mtDNA genetic diversity, comparable to the level of genetic diversity observed across continental sub-Saharan Africa. This genetic diversity is distributed among several mtDNA haplogroups that originated at different times in modern human history. The presence of very old mtDNA haplogroups (i.e., L0d, L0f, and L5) in Tanzanians that are rare or absent in other regions of Africa suggests populations in Tanzania may have had a large long-term effective population size and/or a large degree of long-term population structure, which has acted to preserve many divergent and rare mtDNA haplogroup lineages that appeared early in modern human history. The presence of these ancient lineages in Tanzania also suggests that eastern Africa might be the source of origin of many other African mtDNA haplogroup lineages. Our findings are consistent with other studies of mtDNA genetic diversity in African populations that have suggested populations in eastern Africa form a highly diverse gene pool (Watson et al. 1997; Chen et al. 2000; Watson and Penny 2003; Kivisild et al. 2004). In addition, the TMRCA of mtDNA haplogroup lineages L3, M, and N and their derivatives (94.3 ± 9.9 kya) is approximately half of the TMRCA of all modern humans (194.3 ± 32.55 kya), which supports models predicting that there was a significant period of time in which modern humans lived exclusively in Africa prior to the exodus of modern humans to other regions of the world (Penny et al. 1995). These observations are consistent with paleobiological and archeological data suggesting that eastern Africa may have been an ancient source of dispersion both within and outside of Africa. The earliest remains of transitional modern humans, dated as early as 196 kya, have been found in Ethiopia (Clark et al. 2003; White et al. 2003; McDougall et al. 2005). The earliest artifacts associated with modern humans are also found in eastern Africa (Foley 1998). Later Stone Age technology was established in several regions well before 40 kya in eastern Africa but not until 22 kya in southern Africa (Lahr and Foley 1994; Lahr 1996; Foley 1998).
Further, the reciprocally monophyletic phylogenetic relationship of L0d lineages in the Sandawe and the SAK at the root of the human mtDNA gene tree indicates an ancient, but unique, genetic connection between these populations (Tishkoff SA and Mountain JL, unpublished data). The oldest L0d lineages are observed in the SAK, but it is possible that the ancestral Khoisan population(s) originated in east Africa and subsequently migrated into southern Africa, and that ancient lineages have been lost in the Tanzanian Hadza and Sandawe populations due to genetic drift (Tishkoff SA and Mountain JL, unpublished data). These observations are consistent with linguistic data indicating similarities between the Sandawe and SAK languages (Ruhlen 1991; Ehret 2000; Traunmuller 2003) and with shared subsistence patterns (until recently, the Sandawe maintained a hunter-gatherer lifestyle). Our findings are also consistent with patterns of variation in the Y chromosome suggesting an ancient genetic connection between SAK and several East African populations (Cruciani et al. 2002; Semino et al. 2002). Additional data from other loci and additional populations from Tanzania will help resolve whether the connection between Khoisan speakers in eastern and southern Africa is due to divergence from a common ancestor or to ancient gene flow, and whether the ancestors of the Khoisan-speaking populations originated in eastern or southern Africa (Tishkoff SA and Mountain JL, unpublished data).
Finally, our limited genetic data from Tanzanians belonging to haplogroups M1, N1, and J suggest 2 alternatives that are not mutually exclusive. Populations in Tanzania may have been important in the migration of modern humans from Africa to other regions, as noted in previous studies of other populations in eastern Africa (Quintana-Murci et al. 1999). For example, mtDNAs of Tanzanians belonging to haplogroup M1 cluster with peoples from Oceania, whereas Tanzanian mtDNAs belonging to haplogroups N1 and J cluster with peoples of Middle Eastern and Eurasian origin. However, the presence of haplogroups N1 and J in Tanzania suggests "back" migration from the Middle East or Eurasia into eastern Africa, which has been inferred from previous studies of other populations in eastern Africa (Kivisild et al. 2004). These results are intriguing and suggest that the role of Tanzanians in the migration of modern humans within and out of Africa should be analyzed in greater detail after more extensive data collection, particularly from analysis of Y-, X-, and autosomal chromosome markers. Our analyses of African mtDNAs suggest populations in eastern Africa have played an important and persistent role in the origin and diversification of modern humans.
| Supplementary Material |
|---|
|
|
|---|
Supplementary figures S1–S4 and tables S1–S3 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
|
|
|---|
This study was funded by L.S.B. Leakey Foundation, Wenner Gren Foundation, NSF BCS-0196183, BSC-0552486, and Packard and Burroughs Wellcome Foundation grants to S.A.T. H.M.M. and A.de S. were funded by IGERT-9987590 grant to S.A.T. F.A.R. was supported by National Institutes of Health grant F32HG003801. We thank Kweli Powell, Maritha Koetze, and Alain Froment for assistance with DNA sample collection and Nigel Crawhall, Christopher Ehret, Alison Brooks, and Joanna Mountain for helpful discussion. We thank Godfrey Lema, Salum Juma Deo, Paschal Lufungulo, Waja Ntandu, Dr T.B. Nyambo at MUCHS, Dr Audax Mabulla at University of Dar es Salaam, Jeannette Hanby, and David Bygott for their assistance with field work in Tanzania. We thank African participants who generously donated DNA samples so that we might learn more about their population history.
| Footnotes |
|---|
Lisa Matisoo-Smith, Associate Editor
| References |
|---|
|
|
|---|
Ambrose SH. (1982) Archaeology and linguistic reconstructions of history in eastern Africa. In Ehret C and Posnansy M (Eds.). Archaeological and linguistic reconstruction of African history(University of California Press, Berkeley (CA)) pp. 104157.
Andrews R, Kubacka I, Chinnery P, Lightowlers R, Turnbull D, Howell N. (1999) Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat Genet 2:149.[Medline]
Baldwin BG and Sanderson MJ. (1998) Age and rate of diversification of the Hawaiian silversword alliance (Compositae). Proc Natl Acad Sci USA 95:94029406.
Ballard JW and Whitlock MC. (2004) The incomplete natural history of mitochondria. Mol Ecol 13:729744.[CrossRef][Medline]
Bandelt H, Forster P, Rohl A. (1999) Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol 16:3748.[Abstract]
Bandelt HJ, Kong QP, Richards M, Villems R, Macaulay V. (2006) Estimation of mutation rates and coalescence times: some caveats. Human mitochondrial DNA and the evolution of Homo sapiens (Springer-VerlagIn Bandelt HJ, Macaulay V, Richards M (Eds.). , Berlin (Germany))149179.
Cann RL, Stoneking M, Wilson AC. (1987) Mitochondrial DNA and human evolution. Nature 325:3136.[CrossRef]
Chen J, Sokal RR, Ruhlen M. (1995) Worldwide analysis of genetic and linguistic relationships of human populations. Hum Biol 67:595612.[ISI][Medline]
Chen YS, Olckers A, Schurr TG, Kogelnik AM, Huoponen K, Wallace DC. (2000) mtDNA variation in the South African Kung and Khwe and their genetic relationships to other African populations. Am J Hum Genet 66:13621383.[CrossRef][ISI][Medline]
Chen YS, Torroni A, Excoffier L, Santachiara-Benerecetti AS, Wallace DC. (1995) Analysis of mtDNA variation in African populations reveals the most ancient of all human continent-specific haplogroups. Am J Hum Genet 57:133149.[ISI][Medline]
Clark JD, Beyene Y, WoldeGabriel G, et al. (13 co-authors). (2003) Stratigraphic, chronological and behavioural contexts of Pleistocene Homo sapiens from Middle Awash, Ethiopia. Nature 423:747752.[CrossRef]
Cruciani F, Santolamazza P, Shen P, et al. (16 co-authors). (2002) A back migration from Asia to sub-Saharan Africa is supported by high-resolution analysis of human Y-chromosome haplotypes. Am J Hum Genet 70:11971214.[CrossRef][ISI][Medline]
Destro-Bisol G, Coia V, Boschi I, Verginelli F, Caglia A, Pascali V, Spedini G, Calafell F. (2004) The analysis of variation of mtDNA hypervariable region 1 suggests that Eastern and Western Pygmies diverged before the Bantu expansion. Am Nat 163:212226.
Don RH, Cox PT, Wainwright BJ, Baker K, Mattick JS. (1991) Touchdown PCR to circumvent spurious priming during gene amplification. Nucleic Acids Res 19:4008.
Ehret C. (2000) Language and history. In Heine B and Nurse D (Eds.). African languages: an introduction(Cambridge University Press, Cambridge (UK)).
Excoffier L and Yang Z. (1999) Substitution rate variation among sites in mitochondrial hypervariable region I of humans and chimpanzees. Mol Biol Evol 16:13571368.[Abstract]
Felenstein J. (1993) PHYLIP: phylogenetic inference package(Seattle (WA): Department of Genetics, University of Washington).
Finnila S, Lehtonen MS, Majamaa K. (2001) Phylogenetic network for European mtDNA. Am J Hum Genet 68:14751484.[CrossRef][ISI][Medline]
Foley R. (1998) The context of human genetic evolution. Genome Res 8:339347.
Greenberg J. (1963) The languages of Africa. (Indiana University Press, Bloomington (IN)).
Hammer MF, Garrigan D, Wood E, Wilder JA, Mobasher Z, Bigham A, Krenz JG, Nachman MW. (2004) Heterogeneous patterns of variation among multiple human X-linked loci: the possible role of diversity-reducing selection in non-Africans. Genetics 167:1841-1853.
Horai S, Hayasaka K, Kondo R, Tsugane K, Takahata N. (1995) Recent African origin of modern humans revealed by complete sequences of hominoid mitochondrial DNAs. Proc Natl Acad Sci USA 92:532-536.
Huelsenbeck JP and Crandall KA. (1997) Phylogeny estimation and hypothesis testing using maximum likelihood. Annu Rev Ecol Syst 28:437-466.
Ingman M and Gyllensten U. (2003) Mitochondrial genome variation and evolutionary history of Australian and New Guinean aborigines. Genome Res 13:1600-1606.
Ingman M, Kaessmann H, Paabo S, Gyllensten U. (2000) Mitochondrial genome variation and the origin of modern humans. Nature 408:708-713.
Jorde LB, Watkins WS, Bamshad MJ, Dixon ME, Ricker CE, Seielstad MT, Batzer MA. (2000) The distribution of human genetic diversity: a comparison of mitochondrial, autosomal, and Y-chromosome data. Am J Hum Genet 66:979-988.
Kivisild T, Metspalu M, Bandelt HJ, Richards M, Villems R. (2006) The world mtDNA phylogeny. In Bandelt HJ, Macaulay V, Richards M (Eds.). Human mitochondrial DNA and the evolution of Homo sapiens (Springer-Verlag, Berlin (Germany)) pp. 149-179.
Kivisild T, Reidla M, Metspalu E, Rosa A, Brehm A, Pennarun E, Parik J, Geberhiwot T, Usanga E, Villems R. (2004) Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am J Hum Genet 75:752-770.
Kivisild T, Shen P, Wall DP, et al. (17 co-authors). (2006) The role of selection in the evolution of human mitochondrial genomes. Genetics 172:373-387.
Knight A, Underhill PA, Mortensen HM, Zhivotovsky LA, Lin AA, Henn BM, Louis D, Ruhlen M, Mountain JL. (2003) African Y chromosome and mtDNA divergence provides insight into the history of click languages. Curr Biol 13:464-473.
Kumar S, Filipski A, Swarna V, Walker A, Hedges SB. (2005) Placing confidence limits on the molecular age of the human-chimpanzee divergence. Proc Natl Acad Sci USA 102:18842-18847.
Lahr MM. (1996) The evolution of modern human diversity. (Cambridge University Press, Cambridge (UK)).
Lahr MM and Foley RA. (1994) Multiple dispersals and modern human origins. Evol Anthropol 3:48-60.
Macaulay V, Hill C, Achilli A, et al. (21 co-authors). (2005) Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science 308:1034-1036.
Maddison DR, Ruvolo M, Swofford DL. (1992) Geographic origins of human mitochondrial DNA: phylogenetic evidence from control region sequences. Syst Biol 41:111-124.
Maddison WP and Maddison DR. (2000) MacClade: analysis of phylogeny and character evolution (Sinauer Associates, Inc, Sunderland (MA)).
McBrearty S and Brooks A. (2000) The revolution that wasn't: a new interpretation of the origin of modern human behavior. J Hum Evol 39:453-563.
McDougall I, Brown FH, Fleagle JG. (2005) Stratigraphic placement and age of modern humans from Kibish, Ethiopia. Nature 433:733-736.
Meyer S, Weiss G, von Haeseler A. (1999) Pattern of nucleotide substitution and rate heterogeneity in the hypervariable regions I and II of human mtDNA. Genetics 152:1103-1110.
Mishmar D, Ruiz-Pesini E, Brandon M, Wallace DC. (2004) Mitochondrial DNA-like sequences in the nucleus (NUMTs): insights into our African origins and the mechanism of foreign DNA integration. Hum Mutat 23:125-133.
Mishmar D, Ruiz-Pesini E, Golik P, et al. (13 co-authors). (2003) Natural selection shaped regional mtDNA variation in humans. Proc Natl Acad Sci USA 100:171-176.
Newman J. (1995) The peopling of Africa. (Yale University Press, New Haven (CT)).
Patterson N, Richter DJ, Gnerre S, Lander ES, Reich D. (2006) Genetic evidence for complex speciation of humans and chimpanzees. Nature 441:1103-1108.
Penny D, Steel M, Waddell PJ, Hendy MD. (1995) Improved analyses of human mtDNA sequences support a recent African origin for Homo sapiens. Mol Biol Evol 12:863-882.
Pereira L, Macaulay V, Torroni A, Scozzari R, Prata MJ, Amorim A. (2001) Prehistoric and historic traces in the mtDNA of Mozambique: insights into the Bantu expansions and the slave trade. Ann Hum Genet 65:439-458.
Posada D and Crandall KA. (2001) Intraspecific gene genealogies: trees grafting into networks. Trends Ecol Evol 16:37-45.
Ptak SE and Przeworski M. (2002) Evidence for population growth in humans is confounded by fine-scale population structure. Trends Genet 18:559-563.
Quintana-Murci L, Semino O, Bandelt HJ, Passarino G, McElreavey K, Santachiara-Benerecetti AS. (1999) Genetic evidence of an early exit of Homo sapiens sapiens from Africa through eastern Africa. Nat Genet 23:437-441.
Rieder MJ, Taylor SL, Tobe VO, Nickerson DA. (1998) Automating the identification of DNA variations using quality-based fluorescence re-sequencing: analysis of the human mitochondrial genome. Nucleic Acids Res 26:967-973.
Ronquist F and Huelsenbeck JP. (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572-1574.
Rosa A, Brehm A, Kivisild T, Metspalu E, Villems R. (2004) MtDNA profile of West Africa Guineans: towards a better understanding of the Senegambia region. Ann Hum Genet 68:340-352.
Rozas J, Sanchez-DelBarrio JC, Messeguer X, Rozas R. (2003) DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19:2496-2497.
Ruhlen MA. (1991) Guide to the world's languages. (Stanford University Press, Stanford (CA)).
Ruiz-Pesini E, Mishmar D, Brandon M, Procaccio V, Wallace DC. (2004) Effects of purifying and adaptive selection on regional variation in human mtDNA. Science 303:223-226.
Salas A, Richards M, De la Fe T, Lareu MV, Sobrino B, Sanchez-Diz P, Macaulay V, Carracedo A. (2002) The making of the African mtDNA landscape. Am J Hum Genet 71:1082-1111.
Salas A, Richards M, Lareu MV, Scozzari R, Coppa A, Torroni A, Macaulay V, Carracedo A. (2004) The African diaspora: mitochondrial DNA and the Atlantic slave trade. Am J Hum Genet 74:454-465.
Sanderson MJ. (1997) A nonparametric approach to estimating divergence times in the absence of rate constancy. Mol Biol Evol 14:1218-1231.



