MBE Advance Access originally published online on September 7, 2006
Molecular Biology and Evolution 2006 23(12):2271-2273; doi:10.1093/molbev/msl107
Letters
Can Deleterious Mutations Explain the Time Dependency of Molecular Rate Estimates?
Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, New Zealand
E-mail: m.d.woodhams{at}massey.ac.nz.
| Abstract |
|---|
It has recently been observed by Ho et al. (Ho SYW, Phillips MJ, Cooper A, Drummond AJ. 2005. Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Mol Biol Evol. 22(7):1561-1568) that apparent rates of molecular evolution increase when measured over short timespans. I investigate whether the data are explainable purely by deleterious mutations. I derive an empirical approximation for the persistence of these mutations in a randomly mating population and, hence, derive lower limits on effective population sizes. These limits are high and get higher if additional reasonable assumptions are made. This casts doubt on whether deleterious mutations are able to explain the apparent rate acceleration.
Key Words: deleterious mutation, nearly neutral evolution, mutation rate, substitution rate, rate calibration
The molecular clock has a long, fruitful, and sometimes controversial history (Bromham and Penny 2003). Typically, a phylogenetic tree is constructed, the clock is calibrated via one or more internal nodes whose dates are fixed by paleontology, and the dates of remaining internal nodes are estimated.
Ho et al. (2005) have observed an acceleration of molecular evolution on short timescales (fig. 1). The rate against time graph resembles the letter "J" fallen on its side and hence has been named the "J-shaped curve" (Penny 2005, Fig. 1) or "the lazy J." When evolutionary rates follow such a curve, using a molecular clock calibrated on the long-term rate will overestimate the length of recent time intervals. For example, accounting for the J-shaped curve decreases the estimated date of Mitochondrial Eve from 135-176 thousand years ago (kya) to 76 kya (Ho et al. 2005).
Discarding the unlikely proposition that mutation rates have been greatly increasing over the last million years or so, we are left needing an explanation of how mutations can be maintained over short periods of time yet eliminated over long periods. Slightly deleterious mutations are an obvious candidate: over short timescales, they behave like neutral mutations, with prevalence dominated by genetic drift. Over long timescales, their deleteriousness greatly reduces their chance of becoming fixed, so they contribute little to the (long-term) substitution rate (Kimura 1983).
Consider the simple model of a randomly mating haploid population of fixed size N, with nonoverlapping generations. At generation t = 0, a mutant allele arises in a single individual with selective advantage s. I define function F(N, s, t) to be the proportion of the population carrying the mutant allele at time t.
If N is large enough, we can treat the fraction of mutant alleles and the time as continuous variables. This is known as the diffusion approximation (Crow and Kimura 1970, Section 8.3). The solution is independent of N, up to scaling in the selection and time parameters. The scaled time is τ = t/N (Ewens 2004, Section 5.1) and the scaled selection is S = 2Ns (Kimura 1983, eq. 3.11). (My definition of S differs from Kimura's by a factor of 2, as I am considering a haploid population.) I define G(S, τ) to be the solution to the diffusion equation, with scaling given by:
$$F(N, s, t) \approx \frac{1}{N}\, G\!\left(2Ns, \frac{t}{N}\right) \qquad (1)$$
Consider how many mutations a randomly selected individual has accumulated compared with its ancestor of t0 generations ago. Let µ be the mean mutation rate (mutations/individual/generation) and assume all mutations have selective advantage s. Between times t and t + dt, we expect µN dt new mutations to arise in the population, each of which has probability F(N, s, t) of being present in the descendant. We integrate over time to find the expected number of differences between ancestor and descendant and divide by t0 to find the estimated mutation rate R:
$$R = \frac{\mu N}{t_0} \int_0^{t_0} F(N, s, t)\, dt \qquad (2)$$
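The averaging described above can be sketched numerically. This is a minimal illustration, assuming discrete generations so that the integral becomes a sum; the function name `apparent_rate`, the parameter values, and the single-exponential form of `decaying` are hypothetical stand-ins, not the persistence function used in this letter.

```python
import math

def apparent_rate(F, mu, N, t0):
    """Discrete-generation form of the rate estimate:
    R = (mu * N / t0) * sum of F(t) for t = 0 .. t0 - 1."""
    return mu * N / t0 * sum(F(t) for t in range(t0))

N = 1000      # population size (illustrative)
mu = 1e-3     # mutations per individual per generation (illustrative)

def neutral(t):
    # A neutral mutation keeps an expected frequency of 1/N at all times.
    return 1.0 / N

def decaying(t, floor=0.01, T=50.0):
    # Hypothetical deleterious case: expected frequency starts at 1/N and
    # decays toward floor/N. The shape is assumed for illustration only.
    return (floor + (1.0 - floor) * math.exp(-t / T)) / N

for t0 in (10, 100, 1000):
    print(t0, apparent_rate(neutral, mu, N, t0), apparent_rate(decaying, mu, N, t0))
```

For the neutral case the estimate equals µ whatever the value of t0, while for the decaying case it falls from near µ toward the long-term floor as t0 grows, which is the qualitative shape of the J-shaped curve.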
Now, we need to consider the distribution of the advantage parameter S. I take a model where all mutations are deleterious, with -S following an exponential distribution with mean S̄ (i.e., the probability density function of S is e^(S/S̄)/S̄ for S < 0). Define function R* to be the observed rate function integrated over the distribution of S:

$$R^* = \int_{-\infty}^{0} R\, \frac{e^{S/\bar{S}}}{\bar{S}}\, dS \qquad (3)$$
F(N, s, t) can be calculated exactly for small N by modeling the number of mutant alleles in the population as a discrete Markov process. The Markov matrix M corresponding to a single discrete generation is readily calculable. The eigendecomposition of this matrix can be used to express F as a constant plus a sum of decaying exponentials (see Supplementary Material online). (There is an analytic solution for the diffusion approximation [Crow and Kimura 1970
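The exact calculation for small N can be sketched as follows. This is a minimal illustration, assuming standard Wright-Fisher binomial sampling in which a mutant has relative fitness 1 + s; the construction actually used is given in the Supplementary Material and may differ. The sketch iterates the Markov matrix directly instead of using its eigendecomposition, and the function names are hypothetical.

```python
from math import comb

def transition_matrix(N, s):
    """M[i][j] = probability of j mutant copies next generation given i now,
    assuming binomial sampling with mutant relative fitness 1 + s."""
    M = []
    for i in range(N + 1):
        p = i * (1 + s) / (i * (1 + s) + (N - i))
        M.append([comb(N, j) * p**j * (1 - p)**(N - j) for j in range(N + 1)])
    return M

def expected_fraction(N, s, t_max):
    """Expected mutant fraction F(N, s, t) for t = 0 .. t_max,
    starting from a single mutant copy at t = 0."""
    M = transition_matrix(N, s)
    dist = [0.0] * (N + 1)
    dist[1] = 1.0
    F = []
    for _ in range(t_max + 1):
        F.append(sum(j * q for j, q in enumerate(dist)) / N)
        # Advance the distribution of mutant counts by one generation.
        dist = [sum(dist[i] * M[i][j] for i in range(N + 1)) for j in range(N + 1)]
    return F

F_neutral = expected_fraction(20, 0.0, 30)
F_deleterious = expected_fraction(20, -0.1, 30)
print(F_neutral[0], F_neutral[30])
print(F_deleterious[0], F_deleterious[30])
```

Under these assumptions the neutral expectation stays at 1/N, while the deleterious expectation decays from 1/N toward a smaller asymptote.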
I use the exact solutions to F for -100 ≤ S ≤ 10 and N ≤ 500 to find an approximation for G. The approximation is a constant plus up to 3 decaying exponentials. For S > -30, the decay parameters are found by interpolation from the exact results for N = 500 (from eigendecomposition of the Markov matrix). For S < -30, the approximation is
|
| (4) |
Now we can compare the predictions of the R* curve against the J-shaped curve graphs of Ho et al. (2005). Roughly speaking, the ratio between the y intercept of the curve (zero-time mutation rate) and the asymptote (long-term mutation rate) sets S̄ (the mean magnitude of S), and the decay rate sets N.
Each data point from Ho et al. (2005) consists of a calibration time, mean rate, and 95% confidence limits on the rate. I have modeled the distribution of the rate estimate as a lognormal distribution with mean equal to the given mean and 95% of the weight within the given confidence limits. This model allows me to calculate a likelihood for a given rate curve, parameterized by Ng, S̄ (the mean magnitude of S), and the asymptotic mutation rate, where g is the mean generation time and the asymptotic mutation rate is the substitution rate, as opposed to µ, the instantaneous mutation rate.
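One way to turn a reported mean and 95% interval into a lognormal likelihood is sketched below. The fitting procedure is not described in this letter, so the bisection on the log-scale standard deviation, the function names, and the example numbers are all assumptions made for illustration; the sketch also assumes the reported mean lies inside the interval, so that the enclosed probability mass shrinks as the spread grows.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fit_lognormal(mean, lower, upper, coverage=0.95):
    """Return (m, sigma) for ln(rate) such that the lognormal has the given
    mean and `coverage` of its probability mass between lower and upper."""
    def mass_inside(sigma):
        m = math.log(mean) - 0.5 * sigma**2   # keeps the mean fixed at `mean`
        return (normal_cdf((math.log(upper) - m) / sigma)
                - normal_cdf((math.log(lower) - m) / sigma))
    lo, hi = 1e-6, 10.0
    for _ in range(200):                      # bisection on sigma
        mid = 0.5 * (lo + hi)
        if mass_inside(mid) > coverage:
            lo = mid                          # too narrow: widen
        else:
            hi = mid
    sigma = 0.5 * (lo + hi)
    return math.log(mean) - 0.5 * sigma**2, sigma

def log_likelihood(predicted_rate, m, sigma):
    """Log density of the fitted lognormal at the rate predicted by a curve."""
    z = (math.log(predicted_rate) - m) / sigma
    return -0.5 * z * z - math.log(predicted_rate * sigma * math.sqrt(2.0 * math.pi))

# Hypothetical data point (not taken from Ho et al. 2005).
m, sigma = fit_lognormal(mean=0.05, lower=0.02, upper=0.10)
print(m, sigma, log_likelihood(0.04, m, sigma))
```

Summing `log_likelihood` over all data points, with the predicted rate taken from a candidate rate curve at each calibration time, would give the log likelihood of that curve under these assumptions.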
To simplify calculations, I have truncated the range of S to -1,000 < S < 0. (Note that we are not considering highly deleterious or lethal mutations. Even at S = -1,000 and N = 10,000, we have s = -0.05, just a 5% reduction in fitness.) The maximum likelihood fits of this model to the Ho et al. (2005) data are shown in table 1.
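The conversion between scaled and per-individual selection follows from S = 2Ns. A short check of the figures quoted in the text, with S negative for deleterious mutations (the function name is illustrative):

```python
def selection_coefficient(S, N):
    """Invert the haploid scaling S = 2 * N * s."""
    return S / (2 * N)

print(selection_coefficient(-1000, 10_000))      # -0.05
print(selection_coefficient(-1000, 1_000_000))   # -0.0005
```

The same scaled selection therefore corresponds to a much weaker effect on the individual in a larger population.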
Notice that very high values of S̄ (the mean magnitude of S) are obtained (most mutations have large negative S), and effective populations are on the order of 10^8 for birds and 10^5 for primates. Figure 1 shows one of these maximum likelihood curves.
To estimate a rough lower limit for Ng, I set trial values of Ng and optimize over the other 2 variables. The point at which the log likelihood is less than the maximum log likelihood by 2 is the point at which allowing Ng to vary significantly outperforms the fixed value (likelihood ratio test). This gives limits Ng ≥ 6.3 × 10^6 (avian), Ng ≥ 5.0 × 10^5 (primate protein), and Ng ≥ 4.9 × 10^5 (primate d-loop). Hence, if the J-shaped curve is due to deleterious mutation, we can set approximate effective population limits of N > 1.6 million for bird species (assuming g = 4 years) and N ≥ 50,000 for primate species (assuming g = 10 years). These bounds are effective population sizes; census populations are typically 4-20 times larger (Frankham 1995). (The approximate generation times are justified in the Supplementary Material online.)
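The step from a limit on Ng to a limit on N is a single division by the assumed generation time. A small sketch using the figures quoted above (the dictionary and function names are illustrative):

```python
# Lower limits on N * g (in years) and assumed generation times g (in years),
# as quoted in the text.
limits_Ng = {"avian": 6.3e6, "primate protein": 5.0e5, "primate d-loop": 4.9e5}
generation_years = {"avian": 4, "primate protein": 10, "primate d-loop": 10}

def effective_size_bound(Ng, g):
    """Lower bound on effective population size N given a bound on N * g."""
    return Ng / g

bounds = {name: effective_size_bound(limits_Ng[name], generation_years[name])
          for name in limits_Ng}
for name, n in bounds.items():
    print(f"{name}: N >= {n:,.0f}")
```

A longer assumed generation time lowers the bound on N proportionally, which is why the generation time estimates matter for the conclusion.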
The essence of these bounds is this: to explain (by deleterious mutation) the contrast between short-term and long-term rates, it is necessary to have many significantly deleterious mutations. (Note that for a population of 1 million, S = -1,000 implies s = -5 × 10^-4, so these are still minor mutations from an individual's point of view.) The more deleterious the mutations, the quicker G(S, τ) decays to its asymptotic value (fig. 2a). The quicker the decay (in scaled time τ), the larger N must be to match the observed decay, due to the timescaling equation t = Nτ.
Note that these limits are conservative as they are based on the assumption that there are no strictly neutral mutations. For neutral mutations, F, G, and R are constant. If some fraction of the asymptotic rate is due to neutral mutations, the remainder (to be explained by deleterious mutations) has a larger contrast between short- and long-term rates and hence requires larger N. For example, the 95% confidence limit for the avian data is Ng ≥ 1.1 × 10^7 if half of the fixation rate is due to strictly neutral mutations, an increase of 75%. If we are unwilling to accept high ratios of short- to long-term mutation rates and fix the ratio at 100:1 (i.e., set S̄, the mean magnitude of S, to 163), then the avian data give the 95% confidence limit Ng ≥ 2.2 × 10^7.
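The link between a 100:1 ratio and a mean magnitude of S of 163 can be checked numerically. The sketch below is an illustration under stated assumptions: it takes the long-term persistence of a mutation, relative to a neutral one, to be the standard diffusion-theory fixation factor S/(1 - e^(-S)), which this letter does not state explicitly, and it averages that factor over the exponential distribution of S truncated to -1,000 < S < 0. The function names and the midpoint integration are illustrative choices.

```python
import math

def relative_fixation(S):
    """Fixation probability relative to a neutral mutation, S / (1 - exp(-S)),
    written to avoid overflow for large negative S."""
    if abs(S) < 1e-12:
        return 1.0
    if S < 0:
        a = -S
        return a * math.exp(-a) / (-math.expm1(-a))
    return S / (-math.expm1(-S))

def long_to_short_ratio(s_bar, s_min=-1000.0, steps=200_000):
    """Mean of relative_fixation over S in (s_min, 0), weighted by the
    exponential density exp(S / s_bar) / s_bar (midpoint rule)."""
    h = -s_min / steps
    total = 0.0
    weight = 0.0
    for k in range(steps):
        S = s_min + (k + 0.5) * h
        w = math.exp(S / s_bar) / s_bar
        total += relative_fixation(S) * w * h
        weight += w * h
    return total / weight   # renormalize for the truncation at s_min

ratio = long_to_short_ratio(163.0)
print(ratio, 1.0 / ratio)
```

Under these assumptions the long-term rate comes out at close to one hundredth of the zero-time rate, consistent with the 100:1 figure in the text.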
I have not examined the effects of population or generation time varying over time or between lineages. The general rule is that the long-term effective population size is the harmonic mean of the short-term effective population sizes (Wright 1938), so bottlenecks have a disproportionate effect.
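The disproportionate effect of a bottleneck on the harmonic mean is easy to see with made-up numbers. In this sketch the population sizes are purely hypothetical: nine periods at one million and a single period at one thousand.

```python
from statistics import harmonic_mean, mean

# Hypothetical short-term effective sizes: one bottleneck among ten periods.
sizes = [1_000_000] * 9 + [1_000]

long_term = harmonic_mean(sizes)
print(f"arithmetic mean: {mean(sizes):,.0f}")
print(f"harmonic mean:   {long_term:,.0f}")
```

The arithmetic mean stays near 900,000, but the harmonic mean drops below 10,000, so a single brief bottleneck can pull the long-term effective size far below the typical short-term size.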
These results should be taken as indicative only: large uncertainties remain, notably that the error bars in Ho et al. (2005) are large, generation times have not been accurately estimated, a simplistic model of the distribution of S was used, and the effects of variable population sizes were not accounted for.
Recently, Bazin et al. (2006) have concluded that mitochondrial evolution is dominated by positive selection. If confirmed, this result invalidates my model and rules out deleterious mutations as an explanation for the J-shaped curve.
In conclusion, it appears that large effective populations are required to explain the J-shaped curve by deleterious mutations alone. The increase in observed rates in the short term (the height of the J-shaped curve) requires most mutations to be significantly deleterious and, hence, quickly lost from the population. Large populations are then required to maintain these mutations for the timescales over which the apparent rate is elevated (the length of the J curve's hook).
| Supplementary Material |
|---|
Supplementary material elaborating on the mathematics and the derivation of the approximation (eq. 4) is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
This research was inspired by discussions with Mike Hendy and David Penny, who also reviewed the manuscript. Simon Ho provided a copy of the data from Ho et al. (2005). I have had useful suggestions and criticisms from them and from Michael Baake, Alexei Drummond, Robert McLachlan, and Hamish Spencer.
| Footnotes |
|---|
Peter Lockhart, Associate Editor
| References |
|---|
Bazin E, Glémin S, Galtier N. (2006) Population size does not influence mitochondrial genetic diversity in animals. Science 312(5773):570-572.
Bromham L and Penny D. (2003) The modern molecular clock [historical article]. Nat Rev Genet 4(3):216-224.
Crow JF and Kimura M. (1970) An introduction to population genetics theory. (Harper & Row, New York).
Ewens WJ. (2004) Mathematical population genetics I. Theoretical introduction. (Springer-Verlag, New York).
Frankham R. (1995) Effective population size/adult population size in wildlife: a review. Genet Res 66:95-107.
Ho SYW, Phillips MJ, Cooper A, Drummond AJ. (2005) Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Mol Biol Evol 22(7):1561-1568.
Kimura M. (1983) The neutral theory of molecular evolution. (Cambridge University Press, Cambridge).
Penny D. (2005) Evolutionary biology: relativity for molecular clocks [news]. Nature 436(7048):183-184.
Wright S. (1938) Size of a population and breeding structure in relation to evolution. Science 87:430-431.
[Figure caption, partially recovered] Note that the curves approach the asymptote faster for higher |S|.