Skip Navigation


MBE Advance Access originally published online on November 11, 2008
Molecular Biology and Evolution 2009 26(2):245-248; doi:10.1093/molbev/msn256
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
26/2/245    most recent
msn256v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Rambaut, A.
Right arrow Articles by Shapiro, B.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Rambaut, A.
Right arrow Articles by Shapiro, B.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2008. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Letters

Accommodating the Effect of Ancient DNA Damage on Inferences of Demographic Histories

Andrew Rambaut*, Simon Y.W. Ho{dagger}, Alexei J. Drummond{ddagger} and Beth Shapiro§

* Institute of Evolutionary Biology, University of Edinburgh, King's Buildings, Edinburgh, United Kingdom
{dagger} Centre for Macroevolution and Macroecology, School of Botany and Zoology, Australian National University, Canberra ACT, Australia
{ddagger} Department of Computer Science, University of Auckland, Auckland, New Zealand
§ Department of Biology, The Pennsylvania State University

E-mail: a.rambaut{at}ed.ac.uk.


    Abstract
 TOP
 Abstract
 Introduction
 Methods
 Acknowledgements
 References
 
DNA sequences extracted from ancient remains are increasingly used to generate large population data sets, often spanning tens of thousands of years of population history. Bayesian coalescent methods such as those implemented in the software package BEAST can be used to estimate the demographic history of these populations, sometimes resulting in complex scenarios of fluctuations in population size, which can be correlated with the timing of environmental events, such as glaciations. Recently, however, Axelsson et al. (Axelsson E, Willerslev E, Gilbert MTP, Nielsen R. 2008. The effect of ancient DNA damage on inferences of demographic histories. Mol Biol Evol 25:2181–2187.) claimed that many of these complex demographic trends are likely to be the result of postmortem DNA damage, a problem that they investigate by removing all sites involving transitions from ancient sequences prior to analysis. When this solution is applied to a previously published data set of Pleistocene bison, they show that the demographic signal of population expansion and decline disappears. Although some apparently segregating mutations in ancient sequences may be due to postmortem damage, we argue that discarding the data will result in loss of power to detect patterns of population change. Instead, to accommodate this problem, we implement a model in which sequences are the result of a joint process of molecular evolution and postmortem DNA damage within a probabilistic inference framework. Through simulation, we demonstrate the ability of this model to accurately recover evolutionary parameters, demographic history, and DNA damage rates. When this model is applied to the bison data set, we find that the rate of DNA damage is significant but low and that the reconstruction of population size history is nearly identical to previously published estimates.

Key Words: ancient DNA • demographic history • coalescent • Bayesian MCMC


    Introduction
 TOP
 Abstract
 Introduction
 Methods
 Acknowledgements
 References
 
The development of techniques to extract and amplify DNA sequences from fossil remains has provided a novel perspective to molecular evolutionary analyses. Large ancient DNA (aDNA) data sets spanning nearly 60,000 years of population history have revealed dramatic fluctuations in genetic diversity across time and space, making it possible to test hypotheses about, for example, the long-term effects of climate change on large mammals (Shapiro et al. 2004Go; Drummond et al. 2005Go; Barnes et al. 2007Go; Debruyne et al. 2008Go) and the timing and nature of domestication bottlenecks (Edwards et al. 2007Go; Finlay et al. 2007Go; Ho et al. 2008Go). To infer demographic history, each of these analyses used the software package BEAST (Drummond and Rambaut 2007Go), which provides a flexible framework for hypothesis testing with time-structured molecular sequence data. In BEAST, changes in effective population size through time are inferred according to the assumptions of coalescent theory (Kingman 1982Go; Griffiths and Tavare 1994Go), using parametric or nonparametric models of variable population size. Because BEAST employs Bayesian Markov chain Monte Carlo (MCMC) to average over all possible evolutionary histories, the results of demographic analyses can be summarized as plots of effective population size through time (Drummond et al. 2005Go).

All phylogenetic methods make simplifying assumptions about the evolutionary process. Throughout the history of phylogenetics, violations of these assumptions have invariably arisen, eventually leading to the development of more sophisticated models that more accurately reflect the evolutionary process. In this way, the field has evolved to deal with difficulties such as unequal base frequencies (Felsenstein 1981Go), variation in nucleotide substitution rates (Lanave et al. 1984Go), and violations of the molecular clock (Thorne et al. 1998Go). In their recent paper, Axelsson et al. (2008)Go discuss a source of violations that is both specific to and ubiquitous in aDNA data sets: postmortem DNA damage. They correctly argue that DNA damage, which manifests as extraneous mutations along lineages leading to "ancient" specimens, has the potential to confound evolutionary and demographic analyses. Axelsson et al. then demonstrate through simulation how high rates of DNA damage can cause the inference of artifactually complex demographic histories where none are required. The dominant form of postmortem single-base modifications are C to T changes, which will be indistinguishable from G to A changes because damage can occur on either strand. Thus, Axelsson et al. investigate removing all sites containing transitions from alignments of aDNA sequences. Perhaps unsurprisingly, when all segregating transitions, regardless of their frequency in the population, are removed from a data set of Pleistocene and modern bison (Shapiro et al. 2004Go), the previously reported demographic signal disappears.

Here, we argue that simply removing data is an inappropriate response, as it will remove the majority of genuine signals of changes in effective population size so that the retrieved demographic will be governed mostly by the prior. Therefore, although the reconstruction of a population growth and crash for the bison data set is superficially similar to the artifact identified by Axelsson et al., this does not mean damage is the cause. Indeed, changes in fossil abundance and diversity throughout the Late Pleistocene as well as modern molecular evidence of a recent, severe, population bottleneck provide strong corroborating evidence of such an evolutionary history; it was the timing of such events that was being investigated by Shapiro et al. (2004)Go. Furthermore, as we will show here, when the process of postmortem damage (PMD) is modeled appropriately, the signal of growth and decline remains.

To avoid, or identify and correct, miscoding lesions in aDNA sequences, a variety of experimental protocols have been developed, most focusing on sequence replication. In producing the bison data set, overlapping fragments were amplified, cloned, and multiple polymerase chain reaction products were sequenced for >100 specimens. Replication of the entire extraction and amplification process was undertaken for nearly 15% of specimens at laboratories in Oxford, London, and Copenhagen. These measures will not necessarily avoid every incidence of PMD, but the results of these replication experiments all suggest remarkable preservation of the majority of specimens (as might be expected for permafrost-preserved remains), and thus, most of the observed segregating transitions are more likely to be real than the result of PMD.

To account for DNA damage in the bison and other aDNA data sets, we implemented in BEAST the model that Axelsson et al. (2008)Go used to simulate their test alignments, which we will refer to as the PMD model. This model allows each observed state in an alignment to probabilistically be the result of a PMD event. As DNA damage will accumulate through time, we assume that it is more likely for sequences derived from older specimens to have miscoding lesions. To model this, the probability that any given nucleotide remained undamaged is assumed to decay exponentially with sample age. This assumption can be adjusted to more accurately reflect preservation, for example, by parameterizing the damage model with the thermal ages of the samples, rather than radiocarbon ages.

Table 1 provides the results of a simulation study with 200 replicates (see Methods for details). Not accommodating damage when it is present results in overestimation of parameter values for the rate of evolution and the transition–transversion bias. As shown by Axelsson et al., PMD will also result in artifacts in demographic reconstruction (fig. 1a), where damage manifests as a period of steep population growth followed by decline. However, when the same simulated data are analyzed using the PMD model, the evolutionary parameters (table 1) and demographic reconstruction (fig. 1b) are recovered correctly.


View this table:
[in this window]
[in a new window]

 
Table 1 The Result of the Simulation Study

 

Figure 1
View larger version (30K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 1.— Reconstructed demographic histories for 25 alignments simulated under a constant population size model with PMD at a rate of 0.7 x 10–7 errors per site per year (see Methods for details). Reconstructions were performed a) assuming no postmortem DNA damage and b) incorporating the effect of PMD using the PMD model. The dashed lines show the "true" simulated demographic history.

 
Axelsson et al. suggest that the findings of Shapiro et al. (2004)Go and Drummond et al. (2005)Go are due to the artifact induced by PMD. However, the PMD model applied to these data gives nearly identical results to those previously published, both in terms of estimated parameter values (data not shown) and the reconstructed demographic history (fig. 2). The estimated rate of damage was 1.96 x 10–8 damaged sites per year (95% highest posterior density [HPD] interval: 0.90 x 10–8, 2.99 x 10–8) about a factor of 3 less than the level used in the simulation studies. This provides an expectation that the oldest (hence most damaged) sequences contain only 0.74 damaged sites. This level of PMD produced no qualitative change in the demographic reconstruction or our conclusions from it (fig. 2). Although PMD should, and now can, be accommodated in aDNA analyses, extensive, additional sequence replication as proposed by Axelsson et al. is not necessary if standard aDNA protocols are followed. In addition to accurately accommodating damage, the PMD model can be used to estimate the extent of damage in a data set so as to determine whether additional replication is required. The PMD model has been implemented in BEAST and is freely available from http://beast.bio.ed.ac.uk/.


Figure 2
View larger version (19K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 2.— The effect on the reconstructed demographic history of bison using the PMD model. The dashed lines show the mean Bayesian Skyline reconstruction and 95% HPD intervals assuming no PMD. The solid line and intervals show the equivalent reconstruction including the PMD model.

 

    Methods
 TOP
 Abstract
 Introduction
 Methods
 Acknowledgements
 References
 
Previous studies investigating the effects of PMD in the context of coalescent analyses (Ho et al. 2007Go; Mateiu and Rannala 2008Go) modeled this process as a fixed amount of extra evolution on each external branch in the tree. Here, we implement a model that allows the observed sequence to probabilistically be the result of a PMD event. This makes use of the fact that for the standard nucleotide likelihood formulation (Felsenstein 1981Go), it is necessary to specify the probability of the four nucleotide states for each site on an external node in the tree. Assuming the sequence is known without error, the probability for the observed state is 1 and for the other states is 0, but uncertainty can be included in the actual state observed at each site. This approach has been previously proposed as a mechanism to accommodate errors in the process of sequencing (Felsenstein 2004Go).

To model PMD, we assume the probability that a site is undamaged decays exponentially with rate r. For each site in each sequence, the probability that the site, S, is nucleotide state j given the observed state is i is given as

Formula

Within the Bayesian MCMC framework, BEAST (Drummond and Rambaut 2007Go), the rate of damage, r, is sampled to obtain an estimate of the marginal posterior probability density for this parameter.

Conditioned on the sampling times assumed by Axelsson et al. (2008)Go, we generated genealogies using a constant-size coalescent process (with the product of effective population size and generation time set to 105). For each, sequences (of length 606 nucleotides) were simulated under a Kimura substitution model (Kimura 1980Go) with the transition–transversion parameter, kappa, set to 10. The rate of substitution was set to 1.5 x 10–7 substitutions/site/year. The process of PMD was simulated by inducing a transition at each site with a probability 1 – ert, where t is the age of the sequence and the damage rate, r, was assumed to be 0.7 x 10–7 errors per site per year. Although the parameter values for the bison simulations are not reported by Axelsson et al. (2008)Go, the values used here match those used for their figure 5 (Axelsson E, personal communication). The simulated data were analyzed using BEAST v1.5, both with and without the PMD model. For the PMD model, a uniform prior between 0 and 1 was assumed for the damage rate parameter. The demographic function was modeled as a Bayesian Skyline with 10 sampling intervals (Drummond et al. 2005Go). All other priors and settings match the defaults for BEAST 1.4.8.


    Acknowledgements
 TOP
 Abstract
 Introduction
 Methods
 Acknowledgements
 References
 
A.R. is funded by The Royal Society. S.Y.W.H. is funded by the Australian Research Council. We would like to thank Erik Axelsson and Thomas Gilbert for some constructive dialog.


    Footnotes
 
Connie Mulligan, Associate Editor


    References
 TOP
 Abstract
 Introduction
 Methods
 Acknowledgements
 References
 

    Axelsson E, Willerslev E, Gilbert MTP, Nielsen R. The effect of ancient DNA damage on inferences of demographic histories. Mol Biol Evol (2008) 25:2181–2187.[Abstract/Free Full Text]

    Barnes I, Shapiro B, Lister A, Kuznetsova T, Sher A, Guthrie D, Thomas MG. Genetic structure and extinction of the woolly mammoth, Mammuthus primigenius. Curr Biol (2007) 17:1072–1075.[CrossRef][Web of Science][Medline]

    Debruyne R, Chu G, King CE, et al, (21 co-authors). Out of America: ancient DNA evidence for a new world origin of Late Quaternary woolly mammoths. Curr Biol (2008) 18:1–7.[CrossRef][Web of Science][Medline]

    Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol (2007) 7:214.[CrossRef][Medline]

    Drummond AJ, Rambaut A, Shapiro B, Pybus OG. Bayesian coalescent inference of past population dynamics from molecular sequences. Mol Biol Evol (2005) 22:1185–1192.[Abstract/Free Full Text]

    Edwards CJ, Bollongino R, Scheu A, et al, (40 co-authors). Mitochondrial DNA analysis shows a Near Eastern Neolithic origin for domestic cattle and no indication of domestication of European aurochs. Proc Roy Soc B Biol Sci (2007) 274:1377–1385.[Abstract/Free Full Text]

    Felsenstein J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol (1981) 17:368–376.[CrossRef][Web of Science][Medline]

    Felsenstein J. Inferring phylogenies (2004) Sunderland (MA): Sinauer Associates.

    Finlay EK, Gaillard C, Vahidi SM, Mirhoseini SZ, Jianlin H, Qi XB, El-Barody MA, Baird JF, Healy BC, Bradley DG. Bayesian inference of population expansions in domestic bovines. Biol Lett (2007) 3:449–452.[Abstract/Free Full Text]

    Griffiths RC, Tavare S. Simulating probability distributions in the coalescent. Theor Popul Biol (1994) 46:131–159.[CrossRef][Web of Science]

    Ho SY, Heupink TH, Rambaut A, Shapiro B. Bayesian estimation of sequence damage in ancient DNA. Mol Biol Evol (2007) 24:1416–1422.[Abstract/Free Full Text]

    Ho SY, Larson G, Edwards CJ, Heupink TH, Lakin KE, Holland PW, Shapiro B. Correlating Bayesian date estimates with climatic events and domestication using a bovine case study. Biol Lett (2008) 4:370–374.[Abstract/Free Full Text]

    Kimura M. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol (1980) 16:111–120.[CrossRef][Web of Science][Medline]

    Kingman J. The coalescent. Stoch Proc Applic (1982) 13:235–248.[CrossRef]

    Lanave C, Preparata G, Saccone C, Serio G. A new method for calculating evolutionary substitution rates. J Mol Evol (1984) 20:86–93.[CrossRef][Web of Science][Medline]

    Mateiu LM, Rannala B. Bayesian inference of errors in ancient DNA caused by post mortem degradation. Mol Biol Evol (2008) 25:503–511.

    Shapiro B, Drummond AJ, Rambaut A, et al, (27 co-authors). Rise and fall of the Beringian steppe bison. Science (2004) 306:1561–1565.[Abstract/Free Full Text]

    Thorne JL, Kishino H, Painter IS. Estimating the rate of evolution of the rate of molecular evolution. Mol Biol Evol (1998) 15:1647–1657.[Abstract]

Accepted for publication October 30, 2008.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Genome ResHome page
X. Liu, Y.-X. Fu, T. J. Maxwell, and E. Boerwinkle
Estimating population genetic parameters and comparing model goodness-of-fit using DNA sequences with error
Genome Res., January 1, 2010; 20(1): 101 - 109.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
26/2/245    most recent
msn256v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Rambaut, A.
Right arrow Articles by Shapiro, B.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Rambaut, A.
Right arrow Articles by Shapiro, B.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?