Molecular Biology and Evolution 18:1134-1135 (2001)
© 2001 Society for Molecular Biology and Evolution
LETTER
Coalescent Simulations and Statistical Tests of Neutrality
Department of Ecology and Evolution, University of Chicago
Recently, Depaulis and Veuille (1998) proposed two new test statistics based on the number and frequency of distinct haplotypes. In a companion letter, Markovtsova, Marjoram, and Tavaré (2001) point out that Depaulis and Veuille (1998) do not use the standard implementation for their coalescent simulations. Standard coalescent simulations first produce random genealogies and then place mutations at constant rate θ/2 along each branch (Kingman 1982a, 1982b; Hudson 1990), where θ = 4Nµ is the population mutation parameter, N is the effective population size, and µ is the per-locus mutation rate per generation. Instead, Depaulis and Veuille (1998) generate the null distributions of their statistics by first constructing random genealogies and then placing S mutations on each tree, where S is the observed number of segregating sites. This "fixed S" method has been used before (e.g., Hudson 1993; Rozas and Rozas 1999), partly because it is easy to simulate and partly because S is observed, whereas θ must be estimated from the data (see, e.g., Fu 1996). In fact, it is not clear how to estimate θ independently of polymorphism data. Although the fixed S scheme does not use θ directly, Markovtsova, Marjoram, and Tavaré (2001) highlight that the actual distributions of the test statistics conditional on S are not independent of θ. In particular, knowing both θ and S changes the expected shape of a genealogy. For example, if S is unusually large given θ, we expect the genealogy to be longer than average. Thus, the critical values in Depaulis and Veuille (1998) might not be appropriate, since the actual rejection probabilities of their tests are functions of the unknown parameter θ.
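The two simulation schemes contrasted above can be sketched in Python as follows. This is our own minimal illustration, not the code used by Depaulis and Veuille (1998) or in this letter. It assumes a Kingman coalescent with no recombination, time measured in units of 2N generations (so mutations fall at rate θ/2 per unit of branch length), and the infinite-sites model; all function names are hypothetical.

```python
import random


def coalescent_branches(n, rng):
    """Random Kingman genealogy for n sequences (no recombination).

    Time is in units of 2N generations, so mutations fall at rate theta/2
    per unit of branch length. Returns one (carriers, length) pair per
    branch, where carriers is the frozenset of sample indices below it.
    """
    lineages = [(frozenset([i]), 0.0) for i in range(n)]  # (samples below, start time)
    branches = []
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)  # wait for the next coalescence
        i, j = sorted(rng.sample(range(k), 2), reverse=True)
        a = lineages.pop(i)  # pop the larger index first so j stays valid
        b = lineages.pop(j)
        branches.append((a[0], t - a[1]))
        branches.append((b[0], t - b[1]))
        lineages.append((a[0] | b[0], t))
    return branches


def poisson(lam, rng):
    """Poisson(lam) draw: count unit-rate exponential arrivals in [0, lam]."""
    count, clock = 0, rng.expovariate(1.0)
    while clock <= lam:
        count += 1
        clock += rng.expovariate(1.0)
    return count


def drop_mutations(branches, s, rng):
    """Place s infinite-sites mutations on branches, chosen in proportion to length."""
    carriers = [c for c, _ in branches]
    lengths = [length for _, length in branches]
    return rng.choices(carriers, weights=lengths, k=s)


def standard_sample(n, theta, rng):
    """Standard scheme: mutations arise at rate theta/2 along every branch."""
    branches = coalescent_branches(n, rng)
    total_length = sum(length for _, length in branches)
    s = poisson(theta / 2 * total_length, rng)
    return drop_mutations(branches, s, rng)


def fixed_s_sample(n, s, rng):
    """Fixed S scheme: exactly s mutations are placed on a random genealogy."""
    return drop_mutations(coalescent_branches(n, rng), s, rng)


def num_haplotypes(n, mutations):
    """K: the number of distinct haplotypes among the n sampled sequences."""
    haplotypes = {
        frozenset(m for m, carriers in enumerate(mutations) if i in carriers)
        for i in range(n)
    }
    return len(haplotypes)


rng = random.Random(2001)
sample = fixed_s_sample(50, 10, rng)
print(len(sample), num_haplotypes(50, sample))
```

Drawing one Poisson total with mean (θ/2) × (total branch length) and then assigning each mutation to a branch in proportion to its length is equivalent to independent Poisson counts on each branch, which is why both schemes can share `drop_mutations`. The only difference between them is whether the number of mutations is random (standard) or fixed at S (fixed S); in the fixed S scheme the genealogy is drawn from its unconditional distribution regardless of S.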
In this letter, we examined the type I error rates of several statistical tests by simulation, and we determined the actual rejection probability of the fixed S method when data were simulated under standard coalescent assumptions. We considered three test statistics: K (Strobeck 1987; Fu 1996; Depaulis and Veuille 1998), D (Tajima 1989), and D* (Fu and Li 1993). First, we ran 10⁵ replicates under the fixed S method to generate critical values (at the 5% level) for each possible value of S. Then, we ran 10⁵ standard coalescent simulations with a fixed mutation parameter θ. For each trial, acceptance or rejection was determined from the fixed S critical values for the observed value of S. We tabulated the proportion of trials that led to significantly low or significantly high values of each of the three test statistics. K is the number of distinct haplotypes in the sample, and D and D* are two commonly used test statistics that assess whether the frequencies of segregating mutations are consistent with the standard neutral model. An excess of low-frequency variants leads to negative D and D* values, while an excess of intermediate-frequency variants leads to positive D and D* values. To compare nominal and actual rejection probabilities more accurately, we used a randomized test (see, e.g., Lehmann 1986, p. 71). All simulations were run with no recombination.
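For reference, Tajima's D can be computed from the sample size n and the number of sequences carrying the mutant allele at each segregating site. The sketch below is our own transcription of the standard formula from Tajima (1989), not code from the letter; the function name is hypothetical, and D* (Fu and Li 1993) is not shown.

```python
from math import comb, sqrt


def tajimas_d(n, derived_counts):
    """Tajima's (1989) D from the sample size n and, for each segregating
    site, the number of the n sequences carrying the mutant allele."""
    s = len(derived_counts)
    if s == 0:
        return float("nan")  # D is undefined without segregating sites
    # Average number of pairwise differences between sequences (pi)
    pi = sum(k * (n - k) for k in derived_counts) / comb(n, 2)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between the pi-based and S-based estimates of theta, standardized
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


print(tajimas_d(10, [1, 1, 1, 1, 1]))  # all singletons
print(tajimas_d(10, [5, 5, 5, 5, 5]))  # all at intermediate frequency
```

The two usage lines reproduce the sign behavior described above: five singleton sites in a sample of 10 give a negative D, and five sites at frequency 5/10 give a positive D. Because each site contributes k(n − k), using minor-allele counts instead of mutant-allele counts gives the same value.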
Table 1 shows results for a sample size of n = 50 and θ = 3.0, 5.0, 10.0, and 15.0. The actual rejection probabilities for K were near 5%, while those for D and D* were between 5% and 5.4%. In all cases, the fractions of trials rejected in each tail (i.e., the proportions that were significantly too high or significantly too low) were roughly equal. If a nonrandomized test was used (as would be done in practice), the actual rejection probabilities for K and D* were substantially lower (≈3.6% for K and ≈4.4% for D*), while those for D were about the same. To check for an effect of sample size, we reran all of our simulations with n = 20. The actual (randomized) rejection probabilities were about the same as those in table 1: for 3.0 ≤ θ ≤ 20.0, rejection probabilities were ≤5% for K, ≤5.6% for D*, and ≤5.4% for D (results not shown).
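Why a nonrandomized test of a discrete statistic such as K rejects less often than the nominal level can be seen with a small sketch. A two-sided randomized test (Lehmann 1986) rejects outright beyond the critical values and, at a critical value itself, rejects with a probability γ chosen so that each tail has size exactly α/2; a nonrandomized test never rejects at the critical value. The code is our own illustration; the toy null distribution (Binomial(10, 1/2)) is a hypothetical stand-in for the fixed S null distribution of K, which the letter obtained by simulation, and all names are hypothetical.

```python
import random
from math import comb


def lower_cutoff(pmf, alpha_tail):
    """Lower tail: reject if x < c; at x == c, reject with probability gamma."""
    below = 0.0
    for x in sorted(pmf):
        if below + pmf[x] > alpha_tail:
            return x, (alpha_tail - below) / pmf[x]
        below += pmf[x]
    raise ValueError("alpha_tail must be below the total probability")


def upper_cutoff(pmf, alpha_tail):
    """Upper tail: reject if x > c; at x == c, reject with probability gamma."""
    above = 0.0
    for x in sorted(pmf, reverse=True):
        if above + pmf[x] > alpha_tail:
            return x, (alpha_tail - above) / pmf[x]
        above += pmf[x]
    raise ValueError("alpha_tail must be below the total probability")


def rejection_sizes(pmf, alpha=0.05):
    """Exact sizes under pmf of the two-sided randomized and nonrandomized tests.

    Assumes the two tails do not share a critical value.
    """
    lo, g_lo = lower_cutoff(pmf, alpha / 2)
    hi, g_hi = upper_cutoff(pmf, alpha / 2)
    nonrandomized = sum(p for x, p in pmf.items() if x < lo or x > hi)
    randomized = nonrandomized + g_lo * pmf[lo] + g_hi * pmf[hi]
    return randomized, nonrandomized


def reject(x, pmf, alpha, rng):
    """Randomized two-sided decision for one observed value x."""
    lo, g_lo = lower_cutoff(pmf, alpha / 2)
    hi, g_hi = upper_cutoff(pmf, alpha / 2)
    if x < lo or x > hi:
        return True
    if x == lo:
        return rng.random() < g_lo
    if x == hi:
        return rng.random() < g_hi
    return False


# Hypothetical discrete null distribution standing in for K given S.
toy_pmf = {k: comb(10, k) / 2**10 for k in range(11)}
print(rejection_sizes(toy_pmf))
print(reject(0, toy_pmf, 0.05, random.Random(0)))
```

For this toy distribution the randomized test has size exactly 0.05, while the nonrandomized version has size 22/1024 ≈ 0.021, a shortfall in the same direction as the one reported above for K. The coarser the null distribution is near its tails, the larger this gap tends to be.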
For K, we examined the relationship between θ and S more systematically by determining how different values of S affect the rejection probabilities when θ = 5.0. Table 2 shows the results of 10⁶ simulations with n = 50 and S grouped into seven classes. As expected, the actual rejection probability varied widely across values of S. When S was near its expected value, the actual rejection probability was below the nominal 5% level. However, rejection probabilities were far higher when S was very high or very low. When S was small, the underlying genealogy tended to be short; most mutations occurred on separate branches of the genealogy, leading to more haplotypes than expected and high K values. In contrast, when S was large, the oldest branches tended to be extremely long and carried many mutations. This often led to fewer distinct haplotypes than expected (i.e., low K values).
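A two-sequence case makes the link between S, θ, and genealogy length explicit. This worked example is our own illustration, not a calculation from the letter; it assumes n = 2 and time in units of 2N generations, so the coalescence time t₂ is exponential with rate 1 and the total branch length is T = 2t₂.

```latex
% n = 2: total branch length T = 2 t_2 with t_2 ~ Exp(1)
f(T) = \tfrac{1}{2} e^{-T/2}, \qquad S \mid T \sim \mathrm{Poisson}(\theta T / 2)

P(T \mid S, \theta) \propto \tfrac{1}{2} e^{-T/2}
  \frac{(\theta T/2)^{S} e^{-\theta T/2}}{S!}
  \propto T^{S} e^{-(1+\theta) T / 2}

\Rightarrow \quad T \mid S, \theta \sim \mathrm{Gamma}\!\left(S + 1,\ \text{rate} = \tfrac{1+\theta}{2}\right),
\qquad
E[T \mid S, \theta] = \frac{2(S+1)}{1+\theta}
```

Unconditionally E[T] = 2 and E[S] = θ, and the conditional mean equals 2 exactly when S = θ; a value of S above θ predicts a longer genealogy and a value below θ a shorter one. Because the conditional distribution of T depends on θ as well as S, it generally differs from the unconditional f(T) from which the fixed S scheme draws its genealogies.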
In summary, we find that for a given value of θ, the actual rejection probability (when critical values are determined from fixed S simulations) is close to the nominal rejection level. Thus, the fixed S scheme may still be appropriate, unless a researcher has a prior belief that the actual value of θ is far from Watterson's (1975) estimate θ_W (which is based on the observed value of S). In practice, if one knows that θ_W is inaccurate, one may also have independent evidence that the standard neutral model is false and should not be used. Finally, we point out that although the unknown parameter θ may not prevent the construction of an appropriate statistical test, uncertainty in the population recombination parameter C = 4Nr (where N is the effective population size and r is the per-locus recombination rate per generation) is much more problematic. The distributions of all three test statistics (especially K) are highly dependent on the unknown value of C (Wall 1999).
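Watterson's estimator, and the question of whether an observed S is far from what a hypothesized θ predicts, can be made concrete. In the sketch below, θ_W = S/a_n is Watterson's (1975) estimator; the z-score helper, which uses the neutral mean and variance of S without recombination, is our own illustrative addition rather than a procedure proposed in the letter, and all names are hypothetical.

```python
from math import sqrt


def a_n(n):
    """a_n = sum_{i=1}^{n-1} 1/i, so that E[S] = a_n * theta."""
    return sum(1.0 / i for i in range(1, n))


def b_n(n):
    """b_n = sum_{i=1}^{n-1} 1/i^2, which enters Var[S]."""
    return sum(1.0 / i**2 for i in range(1, n))


def watterson_theta(s, n):
    """Watterson's (1975) estimator theta_W = S / a_n."""
    return s / a_n(n)


def s_z_score(s, n, theta):
    """How many standard deviations S lies from its neutral expectation,
    using E[S] = a_n*theta and Var[S] = a_n*theta + b_n*theta^2
    (standard neutral model, no recombination)."""
    return (s - a_n(n) * theta) / sqrt(a_n(n) * theta + b_n(n) * theta**2)


print(watterson_theta(22, 50))
print(s_z_score(40, 50, 5.0))
```

A large positive or negative z-score for a θ value the researcher has independent reason to believe corresponds to the situation described above, in which fixed S critical values may be misleading.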
Acknowledgements
We thank Y.-X. Fu, P. Marjoram, M. Przeworski, and an anonymous reviewer for helpful comments and discussions. J.D.W. was supported by NIH grant 5 R01 H610847.
Footnotes
1 Present address: Department of Organismic and Evolutionary Biology, Harvard University.
2 Keywords: Coalescent theory, neutrality tests
3 Address for correspondence and reprints: Jeffrey D. Wall, Department of Organismic and Evolutionary Biology, Harvard University, 2102 Biological Laboratories, 16 Divinity Avenue, Cambridge, Massachusetts 02138. jwall@oeb.harvard.edu
Literature Cited
Depaulis, F., and M. Veuille. 1998. Neutrality tests based on the distribution of haplotypes under an infinite-site model. Mol. Biol. Evol. 15:1788–1790
Fu, Y.-X. 1996. New statistical tests of neutrality for DNA samples from a population. Genetics 143:557–570
Fu, Y.-X., and W.-H. Li. 1993. Statistical tests of neutrality of mutations. Genetics 133:693–709
Hudson, R. R. 1990. Gene genealogies and the coalescent process. Pp. 1–44 in D. Futuyma and J. Antonovics, eds. Oxford surveys in evolutionary biology. Vol. 7. Oxford University Press, New York
Hudson, R. R. 1993. The how and why of generating gene genealogies. Pp. 23–36 in N. Takahata and A. G. Clark, eds. Mechanisms of molecular evolution. Sinauer, Sunderland, Mass.
Kingman, J. F. C. 1982a. On the genealogy of large populations. J. Appl. Prob. 19A:27–43
Kingman, J. F. C. 1982b. The coalescent. Stochastic Processes Appl. 13:235–248
Lehmann, E. L. 1986. Testing statistical hypotheses. Wiley, New York
Markovtsova, L., P. Marjoram, and S. Tavaré. 2001. On a test of Depaulis and Veuille. Mol. Biol. Evol. 18:1132–1133
Rozas, J., and R. Rozas. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174–175
Strobeck, C. 1987. Average number of nucleotide differences in a sample from a single subpopulation: a test for population subdivision. Genetics 117:149–153
Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595
Wall, J. D. 1999. Recombination and the power of statistical tests of neutrality. Genet. Res. Camb. 74:65–79
Watterson, G. A. 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256–276