- Split View
-
Views
-
Cite
Cite
Oddný Ósk Sverrisdóttir, Adrian Timpson, Jamie Toombs, Cecile Lecoeur, Philippe Froguel, Jose Miguel Carretero, Juan Luis Arsuaga Ferreras, Anders Götherström, Mark G. Thomas, Direct Estimates of Natural Selection in Iberia Indicate Calcium Absorption Was Not the Only Driver of Lactase Persistence in Europe, Molecular Biology and Evolution, Volume 31, Issue 4, April 2014, Pages 975–983, https://doi.org/10.1093/molbev/msu049
- Share Icon Share
Abstract
Lactase persistence (LP) is a genetically determined trait whereby the enzyme lactase is expressed throughout adult life. Lactase is necessary for the digestion of lactose—the main carbohydrate in milk—and its production is downregulated after the weaning period in most humans and all other mammals studied. Several sources of evidence indicate that LP has evolved independently, in different parts of the world over the last 10,000 years, and has been subject to strong natural selection in dairying populations. In Europeans, LP is strongly associated with, and probably caused by, a single C to T mutation 13,910 bp upstream of the lactase (LCT) gene (-13,910*T). Despite a considerable body of research, the reasons why LP should provide such a strong selective advantage remain poorly understood. In this study, we examine one of the most widely cited hypotheses for selection on LP—that fresh milk consumption supplemented the poor vitamin D and calcium status of northern Europe’s early farmers (the calcium assimilation hypothesis). We do this by testing for natural selection on -13,910*T using ancient DNA data from the skeletal remains of eight late Neolithic Iberian individuals, whom we would not expect to have poor vitamin D and calcium status because of relatively high incident UVB light levels. None of the eight samples successfully typed in the study had the derived T-allele. In addition, we reanalyze published data from French Neolithic remains to both test for population continuity and further examine the evolution of LP in the region. Using simulations that accommodate genetic drift, natural selection, uncertainty in calibrated radiocarbon dates, and sampling error, we find that natural selection is still required to explain the observed increase in allele frequency. We conclude that the calcium assimilation hypothesis is insufficient to explain the spread of LP in Europe.
Introduction
The intestinal enzyme lactase (lactase-phlorizin hydrolase) is required for digestion of lactose, the main carbohydrate in milk, into the readily absorbable monosaccharides glucose and galactose (Swallow and Harvey 1993). While lactase is present in infants, its production is usually downregulated following weaning and it is near absent in about 68% of adults worldwide (Itan et al. 2010). Continued production of lactase into adulthood—lactase persistence (LP)—is a genetically determined, Mendelian dominant trait that shows a highly structured global geographic distribution. It is common in Europe, particularly northwestern Europe, parts of the Indian subcontinent, as well as in many African and Middle Eastern pastoralist groups (Itan et al. 2010; Gallego Romero et al. 2012). It is associated with different allelic variants in different regions (Enattah et al. 2002; Ingram et al. 2007; Tishkoff et al. 2007; Enattah et al. 2008; Ingram et al. 2009) indicating convergent evolution. In Europe (Itan et al. 2009; Itan et al. 2010), the Indian subcontinent (Gallego Romero et al. 2012), and some African peoples (Mulcare et al. 2004), a C to T transition 13,910 bp upstream of the lactase gene (LCT), called -13,910*T (Enattah et al. 2002), predominates, whereas in the Middle East and most parts of Africa three other variants are commonly found in LP individuals (13,915*G, 13,907*G, and 14,010*C) (Ingram et al. 2007; Tishkoff et al. 2007). In addition, the presence of several additional alleles in Ethiopian LP pastoralists suggests the possibility of a recent soft selective sweep (Ingram et al. 2009).
In addition to evidence for the convergent evolution of LP, a low frequency or absence of -13,910*T in Early Neolithic central European farmers (Burger et al. 2007), Middle Neolithic Scandinavian hunter-gatherers (Malmstrom et al. 2010), Early Neolithic farmers from northeast Iberia (Lacan, Keyser, Ricaut, Brucato, Tarrus et al. 2011), and Late Neolithic farmers from southern France (Lacan, Keyser, Ricaut, Brucato, Duranthon et al. 2011), as well as long haplotypes surrounding LCT (Bersaglieri et al. 2004; Voight et al. 2006; Sabeti et al. 2007; Tishkoff et al. 2007; Gallego Romero et al. 2012; Schlebusch et al. 2013), low variation in closely linked microsatellites (Coelho et al. 2005), and demic (Gerbault et al. 2009) and spatially-explicit (Itan et al. 2009) simulations, all indicate a recent origin for, and strong natural selection acting on, LP-associated alleles.
There is a clear association between the presence of LP and a history of pastoralism in living populations (Holden and Mace 1997), and estimates of the ages of major LP-associated alleles—specifically -13,910*T in Europe and -14,010*C in Africa—bracket dates for the domestication of milkable animals (Peters et al. 1999; Vigne et al. 2001; Helmer et al. 2005; Peters et al. 2005; Bollongino et al. 2012) and evidence of dairying (Craig et al. 2005; Vigne 2006; Evershed et al. 2008) based in archaeological data.
The reasons why LP should provide such a strong selective advantage have been the subject of considerable speculation (Gerbault et al. 2011). The calcium assimilation hypothesis (Flatz and Rotthauwe 1973) focuses on key nutritional components in milk, specifically vitamin D and calcium. Vitamin D is essential for calcium absorption in the gut and can be obtained from a few dietary sources (e.g., fish and eggs) or by the action UVB light on the skin, which photoconverts 7-dehydrocholesterol into cholecalciferol (vitamin D3). At high latitudes, incident UVB levels are often too low for adequate vitamin D production (Jablonski and Chaplin 2010), so, in the absence of sufficient dietary sources of vitamin D, early northern European farmers may have been calcium deficient and in danger of developing rickets. Milk is a modest source of vitamin D but an excellent source of calcium, and the Calcium Assimilation Hypothesis (Flatz and Rotthauwe 1973) proposes that LP was selectively favorable in northern Europe as a means of avoiding poor calcium and vitamin D status.
While the Calcium Assimilation Hypothesis (Flatz and Rotthauwe 1973) may explain the high frequencies of LP observed in northern Europe, UVB levels in southern Europe, the Middle East, the Indian subcontinent, and Africa should be sufficient to avoid vitamin D deficiency (Jablonski and Chaplin 2010). Despite this, available estimates of selection coefficients for high-UVB non-European regions are also typically high (Bersaglieri et al. 2004; Tishkoff et al. 2007; Gallego Romero et al. 2012; Schlebusch et al. 2013). In modern Iberians, the -13,910*T allele is found at a frequency of around 37% (Rasinpera et al. 2005), consistent with an estimated LP frequency of 66% (Leis et al. 1997; Itan et al. 2010). If these frequencies are also the result of natural selection after the establishment of the Neolithic, then the Calcium Assimilation Hypothesis should not be invoked for this region of Europe.
Although a number of statistical methods that use contemporary data have been developed for detecting natural selection acting on novel alleles (Sabeti et al. 2006), all of these indirect approaches have poor sensitivity and temporal resolution, most are confounded by past demographic processes, and many are insensitive to selection acting on standing variation (Peter et al. 2012). In principle, none should be as powerful as direct comparison allele frequencies at different times in a continuous population. Given such data and conditions, any test for natural selection needs only discount genetic drift as the null explanation for allele frequency change. Furthermore, such an approach should permit temporal resolution of episodic selection given sufficient ancient DNA data. However, in central Europe and in Scandinavia, population continuity between Early Neolithic farming and modern (Bramanti et al. 2009) and between Late Neolithic hunter-gatherer and modern (Malmström et al. 2009) populations, respectively, has been rejected using ancient DNA data. While rejection of population continuity does not constitute evidence of complete population replacement, it clearly indicates that a degree of inward gene flow has taken place (Skoglund et al. 2012).
In this study, we report -13,910 C/T allele frequencies from Late Neolithic and Early Bronze Age northeastern Iberia. Using previously published mitochondrial data from Neolithic farmers from the same region (Sampietro et al. 2007; Lacan, Keyser, Ricaut, Brucato, Tarrus et al. 2011), as well as modern northeast Iberian sequences (Crespillo et al. 2000), we show by coalescent simulation that population continuity in the intervening period cannot be rejected. We then apply a novel forward simulation approach that accommodates genetic drift, natural selection, uncertainty in calibrated radiocarbon dates, and ancient and modern sampling errors to test whether natural selection is still required to explain that rise in Iberian -13,910*T allele frequencies over the last 7,000 years. Additionally, we perform the same analyses on previously published mtDNA and -13,910*T allele frequency data from southern France (Lacan, Keyser, Ricaut, Brucato, Duranthon et al. 2011).
Results
PCR and Sequencing
Of the 18 individual skeletal remains examined from the Iberian site of El Portalón, eight yielded reproducible sequences (see table 1). In total, we pyrosequenced these eight samples 60 times; each sample was sequenced between five and ten times (sequences are available from authors upon request). Out of these 60 genotypings, 53 indicated homozygozity for the LCT -13,910*C allele, and 7 indicated heterozygozity (in five of the eight samples). However, for all eight samples, the majority of genotypings consistently indicated homozygozity for the -13,910*C allele (the worst case was ATP6, where 25% of the genotypings indicated heterozygozity and 75% homozygozity for the -13,910*C allele). Previous studies on ancient cattle single-nucleotide polymorphisms using the pyrosequencing approach have demonstrated the presence of damaged bases (Svensson et al. 2007), and after numerous repeat sequencing of our samples we concluded that most of the few apparent heterozygote genotypes are due to postmortem cytosine deamination or type-two damage (Gilbert et al. 2003). Further support for this can be seen from the molecular behavior of considerably older cave bear (Ursus deningeri) (Dabney, Knapp et al. 2013) and hominin (Meyer et al. 2014) sequences recovered from the Sima de los Huesos cave, which is part of the Atapuerca cave system to which El Portalón forms the entrance. In both sets of bone material, nucleotide misincorporation rates in sequencing reads generated on Illumina’s HiSeq 2500 platform were elevated at the 3′ and 5′ ends, indicating authentic ancient DNA (Malmstrom et al. 2007; Dabney, Meyer et al. 2013). Negative controls used in batches showed no or negligible evidence of contamination. However, it should be noted that despite stringent anti-contamination measures, modern contamination can never be completely discounted (Gilbert et al. 2005; Malmstrom et al. 2005; Malmstrom et al. 2007; Malmström et al. 2009).
Sample . | C/C . | T/T . | C/T . |
---|---|---|---|
ATP 3 | 5 | ||
ATP 4 | 6 | 1 | |
ATP 6 | 6 | 2 | |
ATP 7 | 7 | ||
ATP 8a | 8 | 2 | |
ATP 8b | 8 | 1 | |
ATP 9 | 8 | ||
ATP 11 | 5 | 1 |
Sample . | C/C . | T/T . | C/T . |
---|---|---|---|
ATP 3 | 5 | ||
ATP 4 | 6 | 1 | |
ATP 6 | 6 | 2 | |
ATP 7 | 7 | ||
ATP 8a | 8 | 2 | |
ATP 8b | 8 | 1 | |
ATP 9 | 8 | ||
ATP 11 | 5 | 1 |
Sample . | C/C . | T/T . | C/T . |
---|---|---|---|
ATP 3 | 5 | ||
ATP 4 | 6 | 1 | |
ATP 6 | 6 | 2 | |
ATP 7 | 7 | ||
ATP 8a | 8 | 2 | |
ATP 8b | 8 | 1 | |
ATP 9 | 8 | ||
ATP 11 | 5 | 1 |
Sample . | C/C . | T/T . | C/T . |
---|---|---|---|
ATP 3 | 5 | ||
ATP 4 | 6 | 1 | |
ATP 6 | 6 | 2 | |
ATP 7 | 7 | ||
ATP 8a | 8 | 2 | |
ATP 8b | 8 | 1 | |
ATP 9 | 8 | ||
ATP 11 | 5 | 1 |
Population Continuity
Coalescence simulations were unable to reject population continuity in northeastern Iberia from the Neolithic to the present (see fig. 1a) using 18 ancient HVR1 mtDNA sequences (Sampietro et al. 2007; Lacan, Keyser, Ricaut, Brucato, Tarrus et al. 2011) and 118 modern sequences (Crespillo et al. 2000). Likewise, we were unable to reject population continuity in southeastern France between the Late Neolithic and today (see fig. 1b) using 29 ancient HVR1 mtDNA sequences (Lacan, Keyser, Ricaut, Brucato, Duranthon et al. 2011) and 109 modern sequences (Dubut et al. 2004).
Because our test for population continuity in northeastern Iberia from the Neolithic to the present involved using ancient mtDNA data from two independent studies (Sampietro et al. 2007; Lacan, Keyser, Ricaut, Brucato, Tarrus et al. 2011), raising the possibility of local structuring, we also performed coalescent simulations to test for continuity using those two samples independently (see supplementary fig. S2a and Supplementary Data, Supplementary Material online). As can be seen, for both samples we are still unable to reject population continuity over the vast majority of combinations of the NUP and NN parameters.
Selection on LCT -13,910*T
In all three comparisons of LCT -13,910*T allele frequency between ancient and modern samples, neutrality was rejected for all simulated values of ancestral population size (see fig. 2). We performed two separate analyses of LCT -13,910*T allele frequency change in northeastern Iberia, the first between the ancient sample presented here and a modern data set (Rasinpera et al. 2005) and the second between the ancient LCT data presented by Lacan et al (2011) and the same modern Iberian data set. It is notable that the selection values giving the highest P values for the observed allele frequency increase are greater for the ancient LCT data presented here (1.8%) than the data presented by Lacan et al. (2011) (1.2%). This is almost certainly because the LCT data presented here comes from individuals living more recently (mean cal. BP = 3,735 years) than those analyzed by Lacan et al. (2011) (7,000 years BP), and because the LCT -13,910*T allele was not observed in either ancient Iberian sample, the rise in frequency over a shorter period requires greater selection pressures. The selection value (2.4%) giving the highest P values for the observed allele frequency increase in southern France (Balkau 1996; Lacan, Keyser, Ricaut, Brucato, Duranthon et al. 2011) is somewhat higher than that for the Iberian data presented here (1.8%).
Discussion
We have formally tested for population discontinuity between a combined Early (Lacan, Keyser, Ricaut, Brucato, Tarrus et al. 2011) and Middle Neolithic (Sampietro et al. 2007) sample and a modern sample (Crespillo et al. 2000) from northeast Iberia. For virtually all combinations of NUP and NN, a significant proportion of simulated FST values were equal to or greater than that observed (fig. 1), and so we were unable to reject population continuity. These results are reminiscent of those presented by Gamba et al. (2012) except that we used a combined ancient sample. If the two ancient northeast Iberian samples considered were from a relatively unstructured population, then combining them should increase scope for rejecting continuity, thus making our test more conservative. However, to allow for the possibility that those samples were from differentiated populations, we also performed coalescent simulations with each ancient sample independently (see supplementary fig. S2a and Supplementary Data, Supplementary Material online) and were still unable to reject population continuity over the majority of combinations of the NUP and NN parameters. We were likewise unable to reject population continuity between Late Neolithic and modern populations in southeastern France, using a larger ancient DNA sample (Lacan, Keyser, Ricaut, Brucato, Duranthon et al. 2011). To examine the sensitivity of our continuity tests to the modern population size parameter, we performed identical coalescent simulations to those described earlier except that we set the modern effective population sizes to 0.1 times and 10 times the values assumed from census data (see supplementary fig. S1a–d, Supplementary Material online). Varying this parameter by an order of magnitude each way made little difference to our results and were still unable to reject population continuity across all combinations of the NUP and NN parameters.
Since continuity is the null hypothesis, failure to reject it does not constitute a demonstration of it. However, we note that population continuity has been rejected between Neolithic farmers and modern populations from central Europe (Bramanti et al. 2009) and between Late Neolithic hunter-gatherers and modern populations from Scandinavia (Malmström et al. 2009) using smaller ancient sample sizes. It thus seems likely that population turnover since or shortly after the Neolithic transition has been less severe in southwestern Europe than in central or northern Europe (Pinhasi et al. 2012).
Our test for neutrality of the LCT -13,910*T allele is contingent on population continuity and accounts for genetic drift, ancient and modern sampling errors, and calibrated radiocarbon date uncertainty. We believe that our accommodation of the latter is a first in evolutionary inference using serial DNA samples. An important caveat to the selection coefficients estimates presented here is the assumption that selection is constant and starts at the time of the ancient sample used. This is illustrated by the differences between the point estimate of S = 1.2% using the Early Neolithic data of Lacan et al. (2011) and the point estimate of S = 1.8% using the Late Neolithic/Early Bronze Age LCT data presented here; higher selection coefficients are required to increase allele frequency by the same amount but over a shorter period of time, because they start with zero observed -13,910*T alleles. Thus, our estimates of S should be considered conservative—the true values may be higher.
The LCT data presented here comes from an archaeological site where faunal remains from the same layers have a mean age of 3,735 years cal. BP. Although the sample is relatively small, using the classical formula p = 1 − α(1/n), where α is the probability of no occurrences of an allele in n observations (Burger et al. 2007), we estimate that the maximum frequency, p, of the LCT -13,910*T allele in this population should be no more than 0.17 with 95% confidence and no more than 0.25 with 99% confidence and could have been zero. If frequencies were very low then this is intriguing given the relatively recent date range and suggests a late introduction but rapid rise in frequency of LCT -13,910*T in Iberia. This, in turn, suggests that LCT -13,910*T was not introduced—or was only introduced at very low frequencies—with arrival of the Neolithic in Iberia. As the mtDNA analyses presented here and elsewhere (Izagirre and de la Rua 1999; Lacan, Keyser, Ricaut, Brucato, Tarrus et al. 2011; Gamba et al. 2012; Hervella et al. 2012; Pinhasi et al. 2012) indicate little population turnover in Iberia since at least the Middle Neolithic, it is difficult to account for the observed change in LCT -13,910*T allele frequency by gene flow from neighboring regions, unless mediated primarily by males, further supporting the strong selection coefficients estimated here.
If the selection coefficients estimated for southeastern France and northwestern Iberia are correct, or underestimates, it is unlikely that they are explained by the Calcium Assimilation Hypothesis (Flatz and Rotthauwe 1973) as incident UVB levels in both regions permit adequate vitamin D production (Jablonski and Chaplin 2010). It is not currently possible to compare the selection coefficients estimated here with those for northern Europe using the same selection detection approach because, although LCT -13,910*T allele frequency data are available for ancient samples from northern Europe and Scandinavia (Burger et al. 2007; Malmstrom et al. 2010), in both regions population continuity between Neolithic and modern populations has been rejected (Bramanti et al. 2009; Malmström et al. 2009; Skoglund et al. 2012). However, the selection coefficients estimated here are remarkably similar to those presented in a recent study (Peter et al. 2012) for the LCT -13,910*T allele (2.5%; 95% CI 0.4–20%) using a Finnish sample and approximate Bayesian computation (Beaumont et al. 2002), which also concluded that LCT selection acted on de novo, rather than standing variation. The selection coefficient estimate mode reported by Itan et al. (2009) was 9.53%, but this was assumed to apply to less than half of the population of Europe (dairying farmers) and to be spatiotemporally constant. To a first order of approximation, this could be interpreted as indicating a Europe-wide selection coefficient of 4% or 5%, which is about twice the levels we estimate here for southwestern Europe. Thus, it is conceivable that additional factors, possibly including calcium assimilation, led to different selection coefficients in Northern Europe. Alternatively, as suggested by Itan et al. (2009), higher frequencies of LCT -13,910*T in northwestern Europe may be the result of a combination of strong selection and allele surfing (Edmonds et al. 2004; Klopfstein et al. 2006) on a wave of advance of farmers spreading across the continent.
A temporal dimension to estimates of selection coefficients on LCT -13,910*T is still lacking, but it is certainly possible that selection on this allele was episodic (Itan et al. 2009), possibly in response to oscillations in food supply (Gerbault et al. 2011). While the data that we present is insufficient to detect episodic selection, the method we present here should be adequate for this purpose given sufficient serial sampled data.
Despite overwhelming evidence for strong selection on LP alleles, including that presented here, the reasons for that selection remain elusive (Gerbault et al. 2011). In addition to the calcium assimilation hypothesis (Flatz and Rotthauwe 1973) various other hypotheses have been proposed. Apart from the obvious nutrient value of milk in terms of protein, fat, calories, and various vitamin and mineral micronutrients, milk production may have offered significant economic advantages (Benecke 1994) and a good source of relatively uncontaminated fluid, particularly in arid environments (Cook and al-Torki 1975), although this may be less relevant in Europe (Gerbault et al. 2011). Others have suggested that a milk-rich diet may offer some protection against malaria (Anderson and Vullo 1994; Cordain et al. 2012) or that LP spread as prestige class behavior (Durham 1991). Alternatively, dairy-based economies are likely to have been more buffered against food supply fluctuations than cereal-based economies, both because of the boom-and-bust of growing seasons and the possibility of crop failure (Gerbault et al. 2011). In mixed economies when crops have failed, fermented milk products are an alternative source of nutrients to LP and non-LP individuals alike. However, as lower lactose content milk products are consumed, so high lactose content products (milk and yoghurt) would be left. In famine conditions, the consequences of high lactose food consumption in non-LP individuals (particularly diarrhea) would be more severe than in well-nourished non-LP individuals, perhaps leading to high but episodic selection differentials.
Material and Methods
Samples
We attempted to extract DNA from 18 human mixed skeletal remains, out of which 8 yielded usable DNA. The remains were excavated from the Holocene site of El Portalón, the present day entrance to the Cueva Mayor Karst complex, in the Sierra de Atapuerca, 15 km east of Burgos, Spain (Carretero et al. 2008). There are no direct radiocarbon dates for these samples, but 13 radiocarbon dates are available from horse and cattle remains from the same layers (Lira et al. 2010).
DNA Extraction
Samples were handled in a clean laboratory under positive air pressure, UV irradiated daily, and isolated from post-polymerase chain reaction (PCR) work. Personnel wore disposable facemasks, zip suits, and sterile latex gloves at all times and do not enter the facility if they have worked in a post-PCR environment the same day. Equipment, nonorganic buffers, and reagents were UV irradiated.
Bone samples were individually irradiated in a crosslinker (Ultra-Violet Products Ltd., Cambridge, UK) with 6.0 J/cm2. From each bone, a millimeter of approximately 1 cm surface area was removed by drilling. Two 0.10 g samples of bone powder were taken from each bone using a drill hood which was UV radiated between sample batches. DNA was then extracted from the bone powder using a modification (Malmstrom et al. 2005) of the silica spin method (Yang et al. 1998). One in every four samples included an extraction control (milli-Q water).
Genotyping
A 53 bp fragment containing the -13,910 C/T polymorphism was amplified using primers “GCTGGCAATACAGATAAGATAATG” and “GAGGAGAGTTCCTTTGAGGC” (Tag Copenhagen). The forward primer was biotinylated (Malmstrom et al. 2005). Amplification was performed using 5 µl of DNA extract, 300 nM primers, and two units of Smart Taq Hot Start (Naxo). One in every three PCRs was a negative control containing only milli-Q water. PCR amplification conditions were 15 min at 95 °C, followed by 50 cycles of 94 °C for 30 s, 55 °C for 30 s, and 72 °C for 30 s. The -13,910* alleles were identified through pyrosequencing using primer “CCTTTGAGGCCAGGG”, as previously described (Anderung et al. 2005). Dispensation order for the Pyrosequencer was CATCAGT/CGATG.
Data Analysis
To test for population continuity in northeastern Iberia from the Neolithic until present, we first calculated the molecular FST (Excoffier et al. 1992) between 118 modern mtDNA hypervariable region I (HVR1) sequences from northeastern Iberia (Crespillo et al. 2000) and a combined sample of 18 homologous sequences from Neolithic remains from comprising 11 sequences from Granollers (Catalonia, northeast Spain) dated to 5,500 years BP (Sampietro et al. 2007) and 7 sequences from remains excavated from a Spanish funeral cave dating to around 7,000 years BP (Lacan, Keyser, Ricaut, Brucato, Tarrus et al. 2011). The region of overlap for these three HVR1 samples was 310 bp (from 16,053 to 16,362 on the Cambridge reference sequence [Anderson et al. 1981]). The FST between the ancient and modern sequence samples (0.0101; P = 0.154) was calculated using the amova() function in the “R” (R Core Team 2012) library ade4 (Dray and Dufour 2007). P values for these FST estimates were calculated by permuting the data 100,000 times. We then examined whether this observed FST was greater than expected under a model of population continuity for a range of combinations of female effective population sizes at the Upper Palaeolithic transition in Europe (NUP) 45,000 years ago (Powell et al. 2009), and the Neolithic transition in Iberia (NN) 7,500 years ago (López de Pablo and Gómez Puche 2009), as described previously (Bramanti et al. 2009; Malmström et al. 2009), assuming exponential growth between 45,000 years ago and 7,500 years ago, and between 7,500 years ago and the present, and a generation time of 25 years. We assumed a modern female effective population size of 369,715 (one-tenth of the modern female population size of Catalonia and Aragon combined [census estimates for 1999, source: http://www.google.co.uk/publicdata/, last accessed February 4, 2014] for the region from which the modern sequences were sampled [Crespillo et al. 2000]). Ten thousand coalescent simulations were performed for each of all 10,000 combinations of 100 equally spaced values of NUP, ranging from 10 to 5,000, and 100 equally spaced values of NN, ranging from 1,000 to 100,000, using Fastsimcoal (Excoffier and Foll 2011). Simulated FST values were calculated using the same amova() function as used to calculate the observed FST values, and the proportion of simulated FST values equal to or greater than the observed FST were plotted for each combination of NUP and NN.
The above analysis was repeated for southeastern France using a combined modern sample of 106 mtDNA hypervariable region I sequences comprising 37 sequences from the department of Var in the region Provence-Alpes-Côte d'Azur in Provence and 69 sequences from the region of Périgord Limousin, straddling three western-central departments (northern Dordogne, western Corrèze, and southern Haute-Vienne) (Dubut et al. 2004), and 29 ancient sequences from a Late Neolithic necropolis in Treilles (Lacan, Keyser, Ricaut, Brucato, Duranthon et al. 2011) (FST = 0.00553; P = 0.201). The region of overlap was 308 bp, from 16,055 to 16,362 on the Cambridge reference sequence (Anderson et al. 1981). Simulations were performed as described above except that a modern effective female population size of 480,598 was assumed (one-tenth of the modern female population of the combined departments of Var, Hérault, Dordogne, Corrèze, Haute-Vienne, Bouches-du-Rhône, Gard, Lozere, Aveyron, Cantal, Lot, Alpes-de-Haute-Provence, Vaucluse, Ardèche, Drôme, Haute-Loire, Puy-de-Dôme, Creuse, Charente, Lot-et-Garonne, Tarn-et-Garonne, Tarn [source: 2006 Population estimates from http://www.citypopulation.de/, last accessed February 4, 2014]).
To test whether changes in the -13,910*T allele frequency can be explained by genetic drift, or whether natural selection needs to be invoked, and to estimate the strength of natural selection where appropriate, we used a forward simulation approach. For the Iberian ancient sample reported here, we first obtained 10,000 Markov chain Monte Carlo samples (chain length = 1,100,000, burn in = 100,000, every tenth sample retained) from the calibration process of 13 faunal samples from the site of El Portalo'n de Cueva Mayor (Sierra de Atapuerca, Burgos) that were found at the same layer as the human samples for which we report LCT -13,910*T allele frequency estimates here (Lira et al. 2010), using the R library Bchron (Haslett and Parnell 2008) and the IntCal09 calibration curve (Heaton et al. 2009). These MCMC samples form a random sample of the summed calibrated date distribution for all 13 faunal samples, permitting integration of date uncertainty into our forward simulations.
To reflect uncertainty in ancestral allele frequencies, assuming a uniform allele frequency prior, estimates were drawn from a random Beta(np + 1,nq + 1) distribution, where np and nq were equal to the number of -13,910*T alleles and the number of -13,910*C alleles in the ancient sample, respectively. Then, drift and natural selection were forward simulated by binomial sampling across generations, with N equal to the population size in each generation, plus a standard selection equation assuming dominance (Maynard Smith 1998), respectively. Exponential population growth was modeled from a range of values of Ne at the time the ancient sample (50 equally spaced values between 1,000 and 100,000) to a modern Ne of 681,332 (one-tenth census population size of Catalonia in 2004 = 6,813,319; source: http://www.idescat.cat/, last accessed February 4, 2014). The number of generations forward simulated was drawn at random from the pool of 10,000 MCMC samples from each of the 13 calibrated radiocarbon dates (Lira et al. 2010) assuming a generation time of 25 years. In the final generation, the simulated modern sample allele frequencies were picked from a random binomial with N equal to the modern sample size (n = 442 alleles) (Rasinpera et al. 2005) and the simulated distribution of frequencies compared with the observed frequencies using the formula 1 − 2 × |0.5 − P|, where P is the proportion of simulated modern allele frequencies that are greater than that observed. This yields a two-tailed empirical P value for the similarity between the observed modern allele frequency and the distribution of simulated allele frequencies, given the demographic and natural selection model parameters modeled (Voight et al. 2005).
The above analysis was repeated for published ancient LCT −13,910*T allele frequency data from seven Neolithic remains excavated from a Spanish funeral cave dating to around 7,000 years BP (Lacan, Keyser, Ricaut, Brucato, Tarrus et al. 2011), using the same northeast Iberian modern -13,910*T allele frequency data (Rasinpera et al. 2005), and from 26 human samples from a Late Neolithic necropolis in Treilles, southern France (Lacan, Keyser, Ricaut, Brucato, Duranthon et al. 2011) using a combined modern French -13,910*T allele frequency data set of 270 chromosomes from individuals unrelated at the second cousin level (Balkau 1996).
The Sequences obtained from the eight individuals from this study are available from the authors upon request.
Acknowledgments
O.O.S. is funded by an EU Marie Curie FP7 Framework Programme grant (LeCHE, grant ref. 215362-2) awarded to A.G., M.G.T., and others not involved in this study. The authors acknowledge the use of the UCL Legion High Performance Computing Facility (Legion@UCL), and associated support services, in the completion of this work. Atapuerca project is funded by Spanish Government project CGL2012-38434-C03-01 and field work at the Atapuerca sites by the Junta de Castilla y León. The authors thank David Balding and Douglas Speed for their helpful advice and all the participants of the D.E.S.I.R study—genotyping for the modern French LCT data was supported by the Conseil Régional Nord-Pas-de-Calais Fonds Européen de Développement Economique et Regional CPER axe Cartdiodiabète 2010–2011 grant to Nabila Bouatia-Naji.
References
Author notes
Associate editor: Beth Shapiro