MBE Advance Access originally published online on November 11, 2008
Molecular Biology and Evolution 2009 26(2):245-248; doi:10.1093/molbev/msn256
Letters
Accommodating the Effect of Ancient DNA Damage on Inferences of Demographic Histories



* Institute of Evolutionary Biology, University of Edinburgh, King's Buildings, Edinburgh, United Kingdom
Centre for Macroevolution and Macroecology, School of Botany and Zoology, Australian National University, Canberra ACT, Australia
Department of Computer Science, University of Auckland, Auckland, New Zealand
Department of Biology, The Pennsylvania State University
E-mail: a.rambaut{at}ed.ac.uk.
Abstract
DNA sequences extracted from ancient remains are increasingly used to generate large population data sets, often spanning tens of thousands of years of population history. Bayesian coalescent methods such as those implemented in the software package BEAST can be used to estimate the demographic history of these populations, sometimes resulting in complex scenarios of fluctuations in population size, which can be correlated with the timing of environmental events, such as glaciations. Recently, however, Axelsson et al. (Axelsson E, Willerslev E, Gilbert MTP, Nielsen R. 2008. The effect of ancient DNA damage on inferences of demographic histories. Mol Biol Evol 25:2181–2187.) claimed that many of these complex demographic trends are likely to be the result of postmortem DNA damage, a problem that they investigate by removing all sites involving transitions from ancient sequences prior to analysis. When this solution is applied to a previously published data set of Pleistocene bison, they show that the demographic signal of population expansion and decline disappears. Although some apparently segregating mutations in ancient sequences may be due to postmortem damage, we argue that discarding the data will result in loss of power to detect patterns of population change. Instead, to accommodate this problem, we implement a model in which sequences are the result of a joint process of molecular evolution and postmortem DNA damage within a probabilistic inference framework. Through simulation, we demonstrate the ability of this model to accurately recover evolutionary parameters, demographic history, and DNA damage rates. When this model is applied to the bison data set, we find that the rate of DNA damage is significant but low and that the reconstruction of population size history is nearly identical to previously published estimates.
Key Words: ancient DNA, demographic history, coalescent, Bayesian MCMC
Introduction
The development of techniques to extract and amplify DNA sequences from fossil remains has provided a novel perspective to molecular evolutionary analyses. Large ancient DNA (aDNA) data sets spanning nearly 60,000 years of population history have revealed dramatic fluctuations in genetic diversity across time and space, making it possible to test hypotheses about, for example, the long-term effects of climate change on large mammals (Shapiro et al. 2004
All phylogenetic methods make simplifying assumptions about the evolutionary process. Throughout the history of phylogenetics, violations of these assumptions have invariably arisen, eventually leading to the development of more sophisticated models that better reflect the evolutionary process. In this way, the field has evolved to deal with difficulties such as unequal base frequencies (Felsenstein 1981), variation in nucleotide substitution rates (Lanave et al. 1984), and violations of the molecular clock (Thorne et al. 1998).
In their recent paper, Axelsson et al. (2008) discuss a source of violations that is both specific to and ubiquitous in aDNA data sets: postmortem DNA damage. They correctly argue that DNA damage, which manifests as extraneous mutations along lineages leading to "ancient" specimens, has the potential to confound evolutionary and demographic analyses. Axelsson et al. then demonstrate through simulation how high rates of DNA damage can lead to the inference of artifactually complex demographic histories where none are required. The dominant form of postmortem single-base modification is the C to T change, which is indistinguishable from a G to A change because damage can occur on either strand. Axelsson et al. therefore investigate removing all sites containing transitions from alignments of aDNA sequences. Perhaps unsurprisingly, when all segregating transitions, regardless of their frequency in the population, are removed from a data set of Pleistocene and modern bison (Shapiro et al. 2004), the previously reported demographic signal disappears.
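To make the transition-removal strategy concrete, the following Python sketch drops every alignment column in which two different purines (A and G) or two different pyrimidines (C and T) co-occur, that is, every site at which at least one transition must have occurred. It is our own illustration, not Axelsson et al.'s code; the function names are hypothetical, and it assumes ungapped sequences containing only A, C, G, and T.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def has_transition(column):
    """True if the column holds two distinct purines or two distinct pyrimidines."""
    states = set(column)
    return len(states & PURINES) > 1 or len(states & PYRIMIDINES) > 1


def remove_transition_sites(alignment):
    """Return the alignment with every transition-bearing column removed.

    alignment: list of equal-length strings over A, C, G, T (no gaps or ambiguity codes).
    The filter ignores how often each variant occurs, as in the approach described above.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must have equal length")
    keep = [i for i in range(length)
            if not has_transition([seq[i] for seq in alignment])]
    return ["".join(seq[i] for i in keep) for seq in alignment]


aln = ["ACGTA",
       "GCGTC",
       "ACATA"]
print(remove_transition_sites(aln))
```

In this toy alignment, the A/G columns (sites 1 and 3) are discarded, while the invariant columns and the A/C transversion column are retained.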
Here, we argue that simply removing data is an inappropriate response, as it discards the majority of genuine signals of changes in effective population size, so that the retrieved demographic history will be governed mostly by the prior. Therefore, although the reconstruction of a population growth and crash for the bison data set is superficially similar to the artifact identified by Axelsson et al., this does not mean that damage is the cause. Indeed, changes in fossil abundance and diversity throughout the Late Pleistocene, as well as modern molecular evidence of a recent, severe population bottleneck, provide strong corroborating evidence for such an evolutionary history; it was the timing of these events that was being investigated by Shapiro et al. (2004). Furthermore, as we will show here, when the process of postmortem damage (PMD) is modeled appropriately, the signal of growth and decline remains.
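A back-of-the-envelope calculation illustrates why discarding transitions costs so much signal. Under the Kimura (1980) model, each base changes to its single transition partner at rate $\alpha$ and to each of its two transversion partners at rate $\beta$, so with $\kappa = \alpha/\beta$ the expected share of substitutions that are transitions is $\kappa/(\kappa + 2)$. Taking $\kappa = 10$, the value used in the simulations described in Methods, purely as an illustrative assumption rather than an estimate for the bison data:

$$
\frac{\alpha}{\alpha + 2\beta} = \frac{\kappa}{\kappa + 2} = \frac{10}{10 + 2} = \frac{5}{6} \approx 0.83
$$

With a transition bias of this size, removing transitions discards roughly five of every six substitutions; the exact share of segregating sites lost in a real data set depends on its own $\kappa$ and level of divergence.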
To avoid, or identify and correct, miscoding lesions in aDNA sequences, a variety of experimental protocols have been developed, most focusing on sequence replication. In producing the bison data set, overlapping fragments were amplified, cloned, and multiple polymerase chain reaction products were sequenced for >100 specimens. Replication of the entire extraction and amplification process was undertaken for nearly 15% of specimens at laboratories in Oxford, London, and Copenhagen. These measures will not necessarily avoid every incidence of PMD, but the results of these replication experiments all suggest remarkable preservation of the majority of specimens (as might be expected for permafrost-preserved remains), and thus, most of the observed segregating transitions are more likely to be real than the result of PMD.
To account for DNA damage in the bison and other aDNA data sets, we implemented in BEAST the model that Axelsson et al. (2008) used to simulate their test alignments, which we refer to as the PMD model. This model allows each observed state in an alignment to be, with some probability, the result of a PMD event. Because DNA damage accumulates through time, we assume that sequences derived from older specimens are more likely to contain miscoding lesions. To model this, the probability that any given nucleotide remains undamaged is assumed to decay exponentially with sample age. This assumption can be adjusted to reflect preservation more accurately, for example, by parameterizing the damage model with the thermal ages of the samples rather than their radiocarbon ages.
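The exponential-decay assumption can be made concrete with a short Python sketch. It computes the probability that a site is undamaged, exp(-r × age), and the expected number of damaged sites in a sequence. The damage rate and sequence length are borrowed from the simulation settings given in Methods for illustration only; the function names are hypothetical.

```python
import math


def p_undamaged(r, age):
    """Probability that a site in a sample of the given age (years) is undamaged: exp(-r * age)."""
    return math.exp(-r * age)


def expected_damaged_sites(r, age, n_sites):
    """Expected number of damaged sites in a sequence of n_sites under exponential decay."""
    return n_sites * (1.0 - p_undamaged(r, age))


# Illustrative values taken from the simulation settings in Methods.
R = 0.7e-7      # damage events per site per year
N_SITES = 606   # sequence length in nucleotides

for age in (0, 10_000, 30_000, 60_000):
    print(f"{age:>6} yr: P(undamaged) = {p_undamaged(R, age):.5f}, "
          f"expected damaged sites = {expected_damaged_sites(R, age, N_SITES):.2f}")
```

At these settings, a 606-nt sequence from a 60,000-year-old sample is expected to carry about 2.5 damaged sites. Using thermal rather than radiocarbon ages, as suggested above, would change only the `age` argument.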
Table 1 provides the results of a simulation study with 200 replicates (see Methods for details). Not accommodating damage when it is present results in overestimation of parameter values for the rate of evolution and the transition–transversion bias. As shown by Axelsson et al., PMD will also result in artifacts in demographic reconstruction (fig. 1a), where damage manifests as a period of steep population growth followed by decline. However, when the same simulated data are analyzed using the PMD model, the evolutionary parameters (table 1) and demographic reconstruction (fig. 1b) are recovered correctly.
Axelsson et al. suggest that the findings of Shapiro et al. (2004)
Methods
Previous studies investigating the effects of PMD in the context of coalescent analyses (Ho et al. 2007
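To model PMD, we assume that the probability that a site is undamaged decays exponentially with rate r. For each site in each sequence, the probability that the true state of the site, S, is nucleotide j, given that the observed state is i, is
$$
P(S = j \mid i) =
\begin{cases}
e^{-rt} & \text{if } j = i,\\
1 - e^{-rt} & \text{if } j \text{ and } i \text{ differ by a transition},\\
0 & \text{otherwise},
\end{cases}
$$
where t is the age of the sequence.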
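One way to see how such a damage model enters a phylogenetic likelihood is through the tip vectors used in Felsenstein's pruning algorithm. Instead of a 0/1 indicator for the observed base, each tip receives a vector over the four possible true states: exp(-rt) for the observed base itself, 1 - exp(-rt) for its transition partner, and 0 otherwise, assuming (as in the simulations described here) that damage replaces a base by its transition partner. The Python sketch below is a hypothetical illustration and does not reproduce BEAST's internal classes.

```python
import math

# Each nucleotide has exactly one transition partner (A<->G, C<->T).
TRANSITION_PARTNER = {"A": "G", "G": "A", "C": "T", "T": "C"}
STATES = "ACGT"


def tip_partials(observed, r, age):
    """Return the tip vector for true states A, C, G, T given one observed base.

    Assumption: with probability exp(-r * age) the base is undamaged; otherwise
    it was changed into its transition partner by postmortem damage.
    """
    p_undamaged = math.exp(-r * age)
    partials = []
    for j in STATES:
        if j == observed:
            partials.append(p_undamaged)          # observed base is genuine
        elif TRANSITION_PARTNER[j] == observed:
            partials.append(1.0 - p_undamaged)    # true base j was damaged into the observed base
        else:
            partials.append(0.0)                  # damage cannot produce a transversion
    return partials


print(tip_partials("T", r=0.7e-7, age=50_000))
```

For a modern sample (age 0) the vector reduces to the usual indicator, so only ancient tips are affected; an observed T from an old sample assigns a small weight to a true C.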
Within the Bayesian MCMC framework of BEAST (Drummond and Rambaut 2007), the rate of damage, r, is sampled to obtain an estimate of the marginal posterior probability density for this parameter.
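BEAST samples r by MCMC jointly with the genealogy and all other parameters. As a much simpler, hypothetical illustration of a marginal posterior for r, suppose that each sample's true (undamaged) sequence were known, so that each sample contributes a count of damaged sites. The likelihood is then a product of binomial terms. Under the Uniform(0, 1) prior described below, the posterior is proportional to that likelihood on any interval inside [0, 1]. The sketch evaluates the posterior on a grid as a deterministic stand-in for MCMC sampling; it is not BEAST's operator or likelihood.

```python
import math


def log_likelihood(r, data, n_sites):
    """Binomial log-likelihood of damage rate r, dropping constant terms.

    data: list of (age_in_years, n_damaged_sites) pairs, one per sample.
    Toy assumption: each sample's true (undamaged) sequence is known.
    """
    ll = 0.0
    for age, k in data:
        log_p_undamaged = -r * age                 # log of exp(-r * age)
        p_damaged = -math.expm1(log_p_undamaged)   # 1 - exp(-r * age), accurate for small r * age
        if k > 0:
            if p_damaged <= 0.0:
                return float("-inf")               # damage observed where none is possible
            ll += k * math.log(p_damaged)
        ll += (n_sites - k) * log_p_undamaged
    return ll


def grid_posterior(data, n_sites, r_max, n_grid=2000):
    """Normalized posterior weights for r at midpoints of a grid over (0, r_max).

    A Uniform(0, 1) prior is constant on this interval when r_max <= 1,
    so the posterior is proportional to the likelihood.
    """
    if not 0.0 < r_max <= 1.0:
        raise ValueError("r_max must lie in (0, 1]")
    grid = [r_max * (i + 0.5) / n_grid for i in range(n_grid)]
    log_post = [log_likelihood(r, data, n_sites) for r in grid]
    top = max(log_post)
    if top == float("-inf"):
        raise ValueError("the data are impossible for every r on the grid")
    weights = [math.exp(lp - top) for lp in log_post]  # subtract the max for numerical stability
    total = sum(weights)
    return grid, [w / total for w in weights]


# Hypothetical data: 2 damaged sites in a 606-nt sequence from a 50,000-year-old sample.
grid, post = grid_posterior([(50_000, 2)], n_sites=606, r_max=5e-7)
mean = sum(r * p for r, p in zip(grid, post))
mode = grid[max(range(len(post)), key=post.__getitem__)]
print(f"posterior mean of r: {mean:.3g}; posterior mode of r: {mode:.3g}")
```

With a single sample, the mode lies (to grid resolution) at the maximum-likelihood value ln(n/(n − k))/t, about 6.6 × 10^-8 here, and the posterior is broad. In the full model the true states are unknown and are integrated over via the tip vectors, which is why MCMC rather than a grid is needed.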
Conditioned on the sampling times assumed by Axelsson et al. (2008), we generated genealogies using a constant-size coalescent process (with the product of effective population size and generation time set to $10^5$). For each genealogy, sequences of length 606 nucleotides were simulated under the Kimura substitution model (Kimura 1980) with the transition–transversion parameter, $\kappa$, set to 10. The rate of substitution was set to $1.5 \times 10^{-7}$ substitutions/site/year. The process of PMD was simulated by inducing a transition at each site with probability $1 - e^{-rt}$, where t is the age of the sequence and the damage rate, r, was set to $0.7 \times 10^{-7}$ errors/site/year. Although the parameter values for the bison simulations are not reported by Axelsson et al. (2008), the values used here match those used for their figure 5 (Axelsson E, personal communication). The simulated data were analyzed using BEAST v1.5, both with and without the PMD model. For the PMD model, a uniform prior between 0 and 1 was assumed for the damage-rate parameter. The demographic function was modeled as a Bayesian skyline with 10 sampling intervals (Drummond et al. 2005). All other priors and settings match the defaults for BEAST 1.4.8.
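The substitution and damage steps of this simulation can be sketched for a single root-to-tip lineage. This hypothetical Python sketch omits the coalescent genealogy (branch durations are supplied directly) and uses the standard closed-form Kimura (1980) transition probabilities. It then applies damage by converting each site to its transition partner with probability 1 − exp(−rt). The substitution rate, κ, and r defaults are the values listed above; all function names are our own.

```python
import math
import random

TRANSITION_PARTNER = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}


def k80_probs(d, kappa):
    """Closed-form K80 probabilities (unchanged, transition, each transversion)
    for a branch of d expected substitutions per site."""
    beta_t = d / (kappa + 2.0)          # d = (alpha + 2 * beta) * t with alpha = kappa * beta
    alpha_t = kappa * beta_t
    e1 = math.exp(-4.0 * beta_t)
    e2 = math.exp(-2.0 * (alpha_t + beta_t))
    p_same = 0.25 + 0.25 * e1 + 0.5 * e2
    p_ts = 0.25 + 0.25 * e1 - 0.5 * e2
    p_tv = 0.25 - 0.25 * e1
    return p_same, p_ts, p_tv


def evolve_k80(seq, years, rate=1.5e-7, kappa=10.0, rng=random):
    """Evolve seq along a branch lasting `years` under K80."""
    p_same, p_ts, _ = k80_probs(rate * years, kappa)
    out = []
    for base in seq:
        u = rng.random()
        if u < p_same:
            out.append(base)
        elif u < p_same + p_ts:
            out.append(TRANSITION_PARTNER[base])
        else:
            out.append(rng.choice(TRANSVERSIONS[base]))  # both transversions equally likely
    return "".join(out)


def add_damage(seq, age, r=0.7e-7, rng=random):
    """Induce a transition at each site with probability 1 - exp(-r * age)."""
    p_damage = -math.expm1(-r * age)
    return "".join(TRANSITION_PARTNER[b] if rng.random() < p_damage else b for b in seq)


rng = random.Random(1)
root = "".join(rng.choice("ACGT") for _ in range(606))
# Hypothetical lineage: 40,000 years of evolution to a tip sampled 50,000 years ago.
tip = add_damage(evolve_k80(root, 40_000, rng=rng), age=50_000, rng=rng)
print(sum(a != b for a, b in zip(root, tip)), "differences from the root")
```

Keeping the damage step separate from the substitution step mirrors the structure of the PMD model: damage acts only on the observed tip sequence and depends on sample age, not on branch length.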
Acknowledgements
A.R. is funded by The Royal Society. S.Y.W.H. is funded by the Australian Research Council. We would like to thank Erik Axelsson and Thomas Gilbert for some constructive dialog.
Footnotes
Connie Mulligan, Associate Editor
References
Axelsson E, Willerslev E, Gilbert MTP, Nielsen R. The effect of ancient DNA damage on inferences of demographic histories. Mol Biol Evol (2008) 25:2181–2187.
Barnes I, Shapiro B, Lister A, Kuznetsova T, Sher A, Guthrie D, Thomas MG. Genetic structure and extinction of the woolly mammoth, Mammuthus primigenius. Curr Biol (2007) 17:1072–1075.
Debruyne R, Chu G, King CE, et al. (21 co-authors). Out of America: ancient DNA evidence for a new world origin of Late Quaternary woolly mammoths. Curr Biol (2008) 18:1–7.
Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol (2007) 7:214.
Drummond AJ, Rambaut A, Shapiro B, Pybus OG. Bayesian coalescent inference of past population dynamics from molecular sequences. Mol Biol Evol (2005) 22:1185–1192.
Edwards CJ, Bollongino R, Scheu A, et al. (40 co-authors). Mitochondrial DNA analysis shows a Near Eastern Neolithic origin for domestic cattle and no indication of domestication of European aurochs. Proc Roy Soc B Biol Sci (2007) 274:1377–1385.
Felsenstein J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol (1981) 17:368–376.
Felsenstein J. Inferring phylogenies. Sunderland (MA): Sinauer Associates (2004).
Finlay EK, Gaillard C, Vahidi SM, Mirhoseini SZ, Jianlin H, Qi XB, El-Barody MA, Baird JF, Healy BC, Bradley DG. Bayesian inference of population expansions in domestic bovines. Biol Lett (2007) 3:449–452.
Griffiths RC, Tavaré S. Simulating probability distributions in the coalescent. Theor Popul Biol (1994) 46:131–159.
Ho SY, Heupink TH, Rambaut A, Shapiro B. Bayesian estimation of sequence damage in ancient DNA. Mol Biol Evol (2007) 24:1416–1422.
Ho SY, Larson G, Edwards CJ, Heupink TH, Lakin KE, Holland PW, Shapiro B. Correlating Bayesian date estimates with climatic events and domestication using a bovine case study. Biol Lett (2008) 4:370–374.
Kimura M. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol (1980) 16:111–120.
Kingman J. The coalescent. Stoch Proc Applic (1982) 13:235–248.
Lanave C, Preparata G, Saccone C, Serio G. A new method for calculating evolutionary substitution rates. J Mol Evol (1984) 20:86–93.
Mateiu LM, Rannala B. Bayesian inference of errors in ancient DNA caused by post mortem degradation. Mol Biol Evol (2008) 25:503–511.
Shapiro B, Drummond AJ, Rambaut A, et al. (27 co-authors). Rise and fall of the Beringian steppe bison. Science (2004) 306:1561–1565.
Thorne JL, Kishino H, Painter IS. Estimating the rate of evolution of the rate of molecular evolution. Mol Biol Evol (1998) 15:1647–1657.



