MBE Advance Access originally published online on September 7, 2006
Molecular Biology and Evolution 2006 23(12):2271-2273; doi:10.1093/molbev/msl107
© The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Letters

Can Deleterious Mutations Explain the Time Dependency of Molecular Rate Estimates?

Michael Woodhams

Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, New Zealand

E-mail: m.d.woodhams{at}massey.ac.nz.


    Abstract
It has recently been observed by Ho et al. (Ho SYW, Phillips MJ, Cooper A, Drummond AJ. 2005. Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Mol Biol Evol. 22(7):1561–1568) that apparent rates of molecular evolution increase when measured over short timespans. I investigate whether the data are explainable purely by deleterious mutations. I derive an empirical approximation for the persistence of these mutations in a randomly mating population and, hence, derive lower limits on effective population sizes. These limits are high and get higher if additional reasonable assumptions are made. This casts doubt on whether deleterious mutations are able to explain the apparent rate acceleration.

Key Words: deleterious mutation • nearly neutral evolution • mutation rate • substitution rate • rate calibration

The molecular clock has a long, fruitful, and sometimes controversial history (Bromham and Penny 2003). Typically, a phylogenetic tree is constructed, the clock is calibrated via one or more internal nodes whose dates are fixed by paleontology, and the dates of remaining internal nodes are estimated.

Ho et al. (2005) have observed an acceleration of molecular evolution on short timescales (fig. 1). The rate against time graph resembles the letter "J" fallen on its side and hence has been named the "J-shaped curve" (Penny 2005, Fig. 1) or "the lazy J." When evolutionary rates follow such a curve, using a molecular clock calibrated on the long-term rate will overestimate the length of recent time intervals. For example, accounting for the J-shaped curve decreases the estimated date of Mitochondrial Eve from 135–176 thousand years ago (kya) to 76 kya (Ho et al. 2005).


FIG. 1. The J-shaped curve. Data points are for protein-coding mitochondrial genes in primates (Ho et al. 2005, Fig. 1b). The curve is the maximum likelihood fit for the model discussed in this paper. (Ng = 978,000, ψ = 3,395, ν = 0.00193.)

 
Discarding the unlikely proposition that mutation rates have been greatly increasing over the last million years or so, we are left needing an explanation of how mutations can be maintained over short periods of time, yet eliminated over long periods. Slightly deleterious mutations are an obvious candidate: over short timescales, they behave like neutral mutations, with prevalence dominated by genetic drift. Over long timescales, their deleteriousness greatly reduces their chance of becoming fixed, so they contribute little to the (long-term) substitution rate (Kimura 1983, Fig. 3.7). Ho et al. (2005) list 4 possible causes for the J curve, including the one considered here (purifying selection). They analyze the other 3 candidates (sequencing errors, saturation of mutational hotspots, and errors in calibration times) and reject them as insufficient to explain the observations.

Consider the simple model of a randomly mating haploid population of fixed size N, with nonoverlapping generations. At generation t = 0, a mutant allele arises in a single individual with selective advantage s. I define function F(N, s, t) to be the proportion of the population carrying the mutant allele at time t.

If N is large enough, we can treat the fraction of mutant alleles and the time as continuous variables. This is known as the diffusion approximation (Crow and Kimura 1970, Section 8.3). The solution is independent of N, up to scaling in the selection and time parameters. The scaled time is τ = t/N (Ewens 2004, Section 5.1) and scaled selection S = 2Ns (Kimura 1983, eq. 3.11). (My definition of S differs from Kimura's by a factor of 2, as I am considering a haploid population.) I define G(S, τ) to be the solution to the diffusion equation, with scaling given by:

$$F(N, s, t) \approx \frac{1}{N}\, G(2Ns,\; t/N) \qquad (1)$$

Consider how many mutations a randomly selected individual has accumulated compared with its ancestor of t0 generations ago. Let µ be the mean mutation rate (mutations/individual/generation) and assume all mutations have selective advantage s. Between times t and t + dt, we expect µN dt new mutations to arise in the population, each of which has probability F(N, s, t) of being present in the descendant. We integrate over time to find the expected number of differences between ancestor and descendant and divide by t0 to find the estimated mutation rate R:

$$R = \frac{\mu N}{t_0} \int_0^{t_0} F(N, s, t)\, dt \qquad (2)$$
Note that for neutral (s = 0) evolution, R = µ as F(N, 0, t) = 1/N.
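The rate calculation described above (integrate the persistence over time, multiply by µN, divide by t0) can be sketched numerically. This is a minimal illustration, not the author's code: the function name `apparent_rate`, the per-generation sampling of F, and the exponentially decaying persistence used in the second example are all assumptions made for the sketch.

```python
import numpy as np

def apparent_rate(F, mu, N):
    """Apparent rate R over t0 = len(F) generations.

    F[t] is the persistence F(N, s, t) sampled once per generation, so the
    time integral is approximated by a sum over generations (an assumption).
    """
    F = np.asarray(F, dtype=float)
    t0 = len(F)
    return mu * N * F.sum() / t0

N, mu = 1000, 1e-3

# Neutral case: F(N, 0, t) = 1/N for all t, so R should equal mu.
neutral_F = np.full(500, 1.0 / N)
neutral_rate = apparent_rate(neutral_F, mu, N)

# Hypothetical decaying persistence (illustrative only, not derived from the model).
decaying_F = np.exp(-np.arange(500) / 50.0) / N
rates = [apparent_rate(decaying_F[:t0], mu, N) for t0 in (10, 100, 500)]

print(neutral_rate, rates)
```

With any persistence that decays over time, the apparent rate falls as the measurement interval t0 grows, which is the qualitative behavior of the J-shaped curve.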

Now, we need to consider the distribution of the advantage parameter S. I take a model where all mutations are deleterious, with S following an exponential distribution with mean –ψ (i.e., the probability density function of S is e^{S/ψ}/ψ for S < 0). Define function R* to be the observed rate function integrated over the distribution of S:

$$R^{*} = \int_{-\infty}^{0} \frac{e^{S/\psi}}{\psi}\, R \, dS \qquad (3)$$
Examples of this function are plotted in figure 2c and d.
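Averaging a rate over an exponential distribution of S can be sketched with Gauss-Laguerre quadrature. Substituting S = –ψx turns the average into an integral of e^{–x} times the rate, which is the form this quadrature rule handles. The function name `average_over_selection` and the toy rate function are assumptions for illustration; the paper's own numerical method is not stated here, and this sketch does not truncate the range of S.

```python
import numpy as np
from numpy.polynomial.laguerre import laggauss

def average_over_selection(rate_given_S, psi, order=40):
    """Average rate_given_S(S) over S = -psi * x, with x ~ Exponential(1).

    Uses Gauss-Laguerre nodes x and weights w, for which
    sum(w * f(x)) approximates the integral of exp(-x) * f(x) over x >= 0.
    """
    x, w = laggauss(order)
    return float(np.sum(w * rate_given_S(-psi * x)))

psi = 100.0

# A constant rate is unchanged by averaging.
constant_average = average_over_selection(lambda S: np.ones_like(S), psi)

# The mean of S itself should come out as -psi.
mean_S = average_over_selection(lambda S: S, psi)

# Hypothetical toy rate exp(0.01 * S); its exact average is 1 / (1 + 0.01 * psi).
toy_average = average_over_selection(lambda S: np.exp(0.01 * S), psi)

print(constant_average, mean_S, toy_average)
```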


FIG. 2. (a) Persistence G(S, τ) of a mutation in the population. (b) Apparent mutation rate R for fixed selection parameter S. (c, d) Apparent mutation rate R* for exponential distribution of S. In (c), the curves are normalized to R*(S, 0) = 1, and in (d) they are normalized to R*(S, ∞) = 1. Note that the curves approach the asymptote faster for higher |S| or ψ.

 
F(N, s, t) can be calculated exactly for small N by modeling the number of mutant alleles in the population as a discrete Markov process. The Markov matrix M corresponding to a single discrete generation is readily calculable. The eigendecomposition of this matrix can be used to express F as a constant plus a sum of decaying exponentials (see Supplementary Material online). (There is an analytic solution for the diffusion approximation [Crow and Kimura 1970, Section 8.6], but it is not useful here as the solution is a series that converges poorly for large |S|.)
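The Markov-chain calculation can be sketched as follows. The exact transition model is given in the Supplementary Material, so this sketch assumes a standard haploid Wright-Fisher model in which a mutant has relative fitness 1 + s and the next generation is a binomial sample. It also iterates the matrix directly instead of using the eigendecomposition, which is simpler but gives the same F for small N. The function names are hypothetical.

```python
import math
import numpy as np

def wright_fisher_matrix(N, s):
    """M[i, j] = P(j mutant copies next generation | i copies now).

    Assumes binomial sampling with mutant fitness 1 + s, where s > -1.
    """
    j = np.arange(N + 1)
    log_binom = np.array([
        math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
        for k in range(N + 1)
    ])
    M = np.zeros((N + 1, N + 1))
    M[0, 0] = 1.0  # mutant lost: absorbing state
    M[N, N] = 1.0  # mutant fixed: absorbing state
    for i in range(1, N):
        p = i * (1.0 + s) / (i * (1.0 + s) + (N - i))
        M[i] = np.exp(log_binom + j * math.log(p) + (N - j) * math.log1p(-p))
    return M

def persistence(N, s, t_max):
    """F(N, s, t) for t = 0..t_max: expected mutant fraction from one copy."""
    M = wright_fisher_matrix(N, s)
    state = np.zeros(N + 1)
    state[1] = 1.0                      # a single mutant at generation 0
    fractions = np.arange(N + 1) / N
    F = np.empty(t_max + 1)
    for t in range(t_max + 1):
        F[t] = state @ fractions
        state = state @ M               # advance one generation
    return F

N = 50
neutral = persistence(N, 0.0, 100)
deleterious = persistence(N, -0.1, 100)
print(neutral[[0, 100]], deleterious[[0, 10, 100]])
```

In the neutral case the expected fraction stays at 1/N, while for s < 0 it decays from 1/N toward the (small) fixation probability.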

I use the exact solutions to F for –100 ≤ S ≤ 10 and N ≤ 500 to find an approximation for G. The approximation is a constant plus up to 3 decaying exponentials. For S > –30, the decay parameters are found by interpolation from the exact results for N = 500 (from eigendecomposition of the Markov matrix). For S < –30, the approximation is

Formula (4)

Now we can compare the predictions of the R* curve against the J-shaped curve graphs of Ho et al. (2005). Roughly speaking, the ratio between the y intercept of the curve (zero-time mutation rate) and the asymptote (long-term mutation rate) sets ψ, and the decay rate sets N.

Each data point from Ho et al. (2005) consists of a calibration time, mean rate, and 95% confidence limits on the rate. I have modeled the distribution of the rate estimate as a log normal distribution with mean equal to the given mean and 95% of the weight within the given confidence limits. This model allows me to calculate a likelihood for a given rate curve (parameterized by Ng, ψ, and ν, where g is the mean generation time and ν is the asymptotic mutation rate, i.e., the substitution rate, as opposed to µ, the instantaneous mutation rate).
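One way to turn a mean and 95% confidence limits into a log normal distribution is sketched below. This is an interpretation, not the author's stated procedure: it fixes the log-scale location so that the distribution's mean equals the reported mean, then finds the log-scale spread by bisection so that 95% of the probability lies between the limits. The function names and the numbers in the example data point are hypothetical.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def coverage(sigma, mean, lower, upper):
    """Probability that a log normal with the given mean lies in (lower, upper)."""
    m = math.log(mean) - 0.5 * sigma ** 2   # location chosen so that E[X] = mean
    return (normal_cdf((math.log(upper) - m) / sigma)
            - normal_cdf((math.log(lower) - m) / sigma))

def fit_lognormal(mean, lower, upper, target=0.95):
    """Return (m, sigma); requires lower < mean < upper."""
    lo, hi = 1e-6, 10.0                     # coverage is ~1 at lo and ~0 at hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if coverage(mid, mean, lower, upper) > target:
            lo = mid
        else:
            hi = mid
    sigma = 0.5 * (lo + hi)
    return math.log(mean) - 0.5 * sigma ** 2, sigma

def log_likelihood(predicted_rate, m, sigma):
    """Log density of the fitted log normal at the rate predicted by a curve."""
    z = (math.log(predicted_rate) - m) / sigma
    return -math.log(predicted_rate * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z

# Hypothetical data point: mean rate 0.05 with 95% limits (0.02, 0.10).
mean, lower, upper = 0.05, 0.02, 0.10
m, sigma = fit_lognormal(mean, lower, upper)
print(m, sigma, log_likelihood(0.04, m, sigma))
```

Summing `log_likelihood` over all data points, with the predicted rate taken from a candidate curve at each calibration time, would give a log likelihood for that curve under these assumptions.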

To simplify calculations, I have truncated the range of S to –1,000 < S < 0. (Note that we are not considering highly deleterious or lethal mutations. Even at S = –1,000 and N = 10,000, we have s = –0.05, just a 5% reduction in fitness.) The maximum likelihood fits of this model to the Ho et al. (2005) data are shown in table 1.
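The conversion from scaled selection back to a per-individual selection coefficient follows directly from S = 2Ns. A short check of the figure quoted above (the function name is chosen for illustration):

```python
def selection_coefficient(S, N):
    """Invert the haploid scaling S = 2 N s."""
    return S / (2.0 * N)

s = selection_coefficient(-1000, 10_000)
print(s)  # fractional change in fitness for S = -1,000 and N = 10,000
```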


Table 1. Maximum Likelihood Fits of the Model to the Data of Ho et al. Ng Is Population Size Times Generation Time, ψ Is a Measure of How Deleterious the Mutations Are, ν Is the Long-Term Mutation Rate

 
Notice that very high values of ψ are obtained (most mutations have large negative S), and effective populations are on the order of 10⁸ for birds and 10⁵ for primates. Figure 1 shows one of these maximum likelihood curves.

To estimate a rough lower limit for Ng, I set trial values of Ng and optimize on the other 2 variables. The point at which the log likelihood is less than the maximum likelihood by 2 is the point at which allowing Ng to vary significantly outperforms the fixed value (likelihood ratio test). This gives limits Ng ≥ 6.3 × 10⁶ (avian), Ng ≥ 5.0 × 10⁵ (primate protein), and Ng ≥ 4.9 × 10⁵ (primate d-loop). Hence, if the J-shaped curve is due to deleterious mutation, we can set approximate effective population limits of N > 1.6 million for bird species (assuming g = 4 years) and N ≥ 50,000 for primate species (assuming g = 10 years). These bounds are effective population sizes; census populations are typically 4–20 times larger (Frankham 1995). (The approximate generation times are justified in the Supplementary Material online.)
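The step from limits on Ng to limits on N is a division by the assumed generation time. The arithmetic, together with the implied census range using the 4–20 times factor quoted above, can be checked as follows (variable names are chosen for illustration):

```python
# Lower limits on N*g (in years) and assumed generation times g (years), from the text.
ng_lower_limits = {"avian": 6.3e6, "primate protein": 5.0e5, "primate d-loop": 4.9e5}
generation_time = {"avian": 4.0, "primate protein": 10.0, "primate d-loop": 10.0}

# Effective population size limit: N >= (N*g) / g.
n_limits = {name: ng / generation_time[name] for name, ng in ng_lower_limits.items()}

for name, n in n_limits.items():
    # Census populations are typically 4-20 times the effective size.
    print(f"{name}: N >= {n:,.0f}; census roughly {4 * n:,.0f} to {20 * n:,.0f}")
```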

The essence of these bounds is this: to explain (by deleterious mutation) the contrast between short-term and long-term rates, it is necessary to have many significantly deleterious mutations. (Note that for a population of 1 million, S = –1,000 implies s = –5 × 10⁻⁴, so these are still minor mutations from an individual's point of view.) The more deleterious the mutations, the quicker G(S, τ) decays to its asymptotic value (fig. 2a). The quicker the decay (in scaled time τ), the larger N must be to match the observed decay, due to the timescaling equation t = Nτ.

Note that these limits are conservative as they are based on the assumption that there are no strictly neutral mutations. For neutral mutations, F, G, and R are constant. If some fraction of the asymptotic rate is due to neutral mutations, the remainder (to be explained by deleterious mutations) has a larger contrast between short- and long-term rates and hence requires larger N. For example, the 95% confidence limit for the avian data is Ng ≥ 1.1 × 10⁷ if half of the fixation rate is due to strictly neutral mutations, an increase of 75%. If we are unwilling to accept high ratios of short- to long-term mutation rates and fix the ratio at 100:1 (i.e., set ψ = 163), then the avian data give a 95% confidence limit of Ng ≥ 2.2 × 10⁷.

I have not examined the effects of population or generation time varying over time or between lineages. The general rule is that the long-term effective population size is the harmonic mean of the short-term effective population sizes (Wright 1938), so bottlenecks have a disproportionate effect.
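The disproportionate effect of a bottleneck on the harmonic mean is easy to see numerically. The population sizes below are hypothetical and chosen only to illustrate the rule:

```python
def harmonic_mean(sizes):
    """Harmonic mean of a sequence of positive population sizes."""
    return len(sizes) / sum(1.0 / n for n in sizes)

# Hypothetical history: nine periods at 100,000 and one bottleneck at 1,000.
sizes = [100_000] * 9 + [1_000]

long_term = harmonic_mean(sizes)
arithmetic = sum(sizes) / len(sizes)
print(long_term, arithmetic)
```

A single period at 1% of the usual size pulls the harmonic mean to under a tenth of the usual size, while the arithmetic mean barely moves.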

These results should be taken as indicative only: large uncertainties remain, notably that the error bars in Ho et al. (2005) are large, generation times have not been accurately estimated, a simplistic model of the distribution of S was used, and the effects of variable population sizes were not accounted for.

Recently Bazin et al. (2006) have concluded that mitochondrial evolution is dominated by positive selection. If confirmed, this result invalidates my model and rules out deleterious mutations as an explanation for the J-shaped curve.

In conclusion, it appears that large effective populations are required to explain the J-shaped curve by deleterious mutations alone. The increase in observed rates in the short term (the height of the J-shaped curve) requires most mutations to be significantly deleterious and, hence, quickly lost from the population. Large populations are then required to maintain these mutations for the timescales over which the apparent rate is elevated (the length of the J curve's hook).


    Supplementary Material
Supplementary material elaborating on the mathematics and the derivation of the approximation (eq. 4) is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Acknowledgements
This research was inspired by discussions with Mike Hendy and David Penny, who also reviewed the manuscript. Simon Ho provided a copy of the data from Ho et al. (2005). I have had useful suggestions and criticisms from them and from Michael Baake, Alexei Drummond, Robert McLachlan, and Hamish Spencer.


    Footnotes
 
Peter Lockhart, Associate Editor


    References

    Bazin E, Glémin S, Galtier N. (2006) Population size does not influence mitochondrial genetic diversity in animals. Science 312(5773):570–572.

    Bromham L and Penny D. (2003) The modern molecular clock [historical article]. Nat Rev Genet 4(3):216–224.

    Crow JF and Kimura M. (1970) An introduction to population genetics theory. (Harper & Row, New York).

    Ewens WJ. (2004) Mathematical population genetics I. Theoretical introduction. (Springer-Verlag, New York).

    Frankham R. (1995) Effective population size/adult population size in wildlife: a review. Genet Res 66:95–107.

    Ho SYW, Phillips MJ, Cooper A, Drummond AJ. (2005) Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Mol Biol Evol 22(7):1561–1568.

    Kimura M. (1983) The neutral theory of molecular evolution. (Cambridge University Press, Cambridge).

    Penny D. (2005) Evolutionary biology: relativity for molecular clocks [news]. Nature 436(7048):183–184.

    Wright S. (1938) Size of a population and breeding structure in relation to evolution. Science 87:430–431.

Accepted for publication September 1, 2006.



