Skip Navigation

This Article
Right arrow Extract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (26)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Wall, J. D.
Right arrow Articles by Hudson, R. R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Wall, J. D.
Right arrow Articles by Hudson, R. R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

Molecular Biology and Evolution 18:1134-1135 (2001)
© 2001 Society for Molecular Biology and Evolution


LETTER

Coalescent Simulations and Statistical Tests of Neutrality

Jeffrey D. Wall and Richard R. Hudson

Department of Ecology and Evolution, University of Chicago

Recently, Depaulis and Veuille (1998)Citation proposed two new test statistics based on the number and frequency of different haplotypes. In a companion letter, Markovtsova, Marjoram, and Tavaré (2001) point out that Depaulis and Veuille (1998)Citation do not use the standard implementation for their coalescent simulations. Standard coalescent simulations first produce random genealogies, then place mutations at constant rate {theta}/2 ({theta} = 4Nµ is the population mutation parameter, where N is the effective population size and µ is the per-locus mutation rate per generation) along each of the branches (Kingman 1982a, 1982bCitation ; Hudson 1990Citation ). Instead, Depaulis and Veuille (1998)Citation generate distributions for their statistics by first constructing random genealogies, then placing S (the observed number of segregating sites) mutations on each tree. This "fixed S" method has been used before (e.g., Hudson 1993Citation ; Rozas and Rozas 1999Citation ), partly because it is easy to simulate, and partly because S is observed, while {theta} must be estimated from the data (see, e.g., Fu 1996Citation ). In fact, it is not clear how to estimate {theta} independent of polymorphism data. Although the fixed S scheme does not directly use {theta}, Markovtsova, Marjoram, and Tavaré (2001) highlight that the actual distributions of test statistics conditional on S are not independent of {theta}. In particular, knowing both {theta} and S changes the expected shape of a genealogy. For example, if S is unusually large given {theta}, we expect the genealogy to be longer than average. Thus, the critical values in Depaulis and Veuille (1998)Citation might not be appropriate, since the actual rejection probabilities for their tests are functions of the unknown parameter {theta}.

In this letter, we examined the type I errors of various statistical tests by simulation, and we determined what the actual rejection probability was for the fixed S method when data was simulated under standard coalescent assumptions. We considered three different test statistics: K (Strobeck 1987Citation ; Fu 1996Citation ; Depaulis and Veuille 1998Citation ), D (Tajima 1989Citation ), and D* (Fu and Li 1993Citation ). First, we ran 105 replicates under the fixed S method to generate critical values (at the 5% level) for each possible value of S. Then, we ran 105 standard coalescent simulations with fixed mutation parameter {theta}. For each trial, acceptance or rejection was determined from the fixed S critical values using the observed value of S. We tabulated what proportion of trials led to significantly low or significantly high values of each of the three test statistics. K is the number of distinct haplotypes in the sample, and D and D* are two commonly used test statistics that determine whether the frequencies of segregating mutations are consistent with the standard neutral model. An excess of low-frequency variants leads to negative D and D* values, while an excess of intermediate-frequency variants leads to positive D and D* values. To more accurately compare nominal and actual rejection probabilities, we used a randomized test (see, e.g., Lehmann 1986Citation , p. 71). All simulations were run with no recombination.

Table 1 shows results for a sample size of n = 50 and {theta} = 3.0, 5.0, 10.0, and 15.0. The actual rejection probabilities for K were near 5%, while those for D and D* were between 5% and 5.4%. In all cases, the fractions of trials rejected on each tail (i.e., the proportions that were significantly too high or significantly too low) were roughly equal. If a nonrandomized test was used (as would be done in practice), the actual rejection probabilities for K and D* were substantially lower (<=3.6% for K and <=4.4% for D*), while those for D were about the same. To check for the effect of sample size, we reran all of our simulations with n = 20. The actual (randomized) rejection probabilities were about the same as in those table 1 : for 3.0 <= {theta} <= 20.0, rejection probabilities were <=5% for K, <=5.6% for D*, and <=5.4% for D (results not shown).


View this table:
[in this window]
[in a new window]
 
Table 1 Actual Rejection Probabilities with Sample Size n = 50

 
For K, we examined the relationship between {theta} and S more systematically by determining how different values of S affect the rejection probabilities when {theta} = 5.0. Table 2 shows the results of 106 simulations with n = 50 and S grouped into seven different classes. As expected, the actual rejection probability varied widely for different values of S. When S was near its expected value, the actual rejection probability was less than the nominal 5% level. However, rejection probabilities were far higher when S was very high or very low. When S was small, the underlying genealogy tended to be short; most mutations occurred on separate branches of the genealogy, leading to more haplotypes than expected and high K values. In contrast, when S was large, the oldest branches tended to be extremely long, and many mutations lay on them. This often led to fewer than expected distinct haplotypes (i.e., low K values).


View this table:
[in this window]
[in a new window]
 
Table 2 Rejection Probabilities for Specific Classes of S, with n = 50 and {{theta}} = 5.0

 
In summary, we find that for a given value of {theta}, the actual rejection probability (when critical values are determined from fixed S simulations) is close to the nominal rejection level. Thus, the fixed S scheme may still be appropriate, unless a researcher has a prior belief that the actual value of {theta} is far from Watterson's (1975)Citation estimate {theta}W (which is based on the observed value of S). In practice, if one knows that {theta}W is inaccurate, one may also have independent evidence that the standard neutral model is false and should not be used. Finally, we point out that although the unknown parameter {theta} may not prevent the construction of an appropriate statistical test, uncertainty in the population recombination parameter C = 4Nr (where N is the effective population size and r is the per-locus recombination rate per generation) is much more problematic. The distributions of all three test statistics (especially K) are highly dependent on the unknown value of C (Wall 1999Citation ), making it extremely difficult to construct a test for nuclear sequence data with nominal rejection probability equal to the actual rejection probability.

Acknowledgements

We thank Y.-X. Fu, P. Marjoram, M. Przeworski, and an anonymous reviewer for helpful comments and discussions. J.D.W. was supported by NIH grant 5 R01 H610847.

Footnotes

Yun-Xin Fu, Reviewing Editor

1 Present address: Department of Organismic and Evolutionary Biology, Harvard University. Back

2 Keywords: Coalescent theory, neutrality tests Back

3 Address for correspondence and reprints: Jeffrey D. Wall, Department of Organismic and Evolutionary Biology, Harvard University, 2102 Biological Laboratories, 16 Divinity Avenue, Cambridge, Massachusetts 02138. jwall{at}oeb.harvard.edu Back

literature cited

    Depaulis, F., and M. Veuille. 1998. Neutrality tests based on the distribution of haplotypes under an infinite-site model. Mol. Biol. Evol. 15:1788–1790[Web of Science][Medline]

    Fu, Y.-X. 1996. New statistical tests of neutrality for DNA samples from a population. Genetics 143:557–570

    Fu, Y.-X., and W.-H. Li. 1993. Statistical tests of neutrality of mutations. Genetics 133:693–709

    Hudson, R. R. 1990. Gene genealogies and the coalescent process. Pp. 1–44 in D. Futuyma and J. Antonovics, eds. Oxford surveys in evolutionary biology. Vol. 7. Oxford University Press, New York

    Hudson, R. R. 1993. The how and why of generating gene genealogies. Pp. 23–36 in N. Takahata and A. G. Clark, eds. Mechanisms of molecular evolution. Sinauer, Sunderland, Mass

    Kingman, J. F. C. 1982a. On the genealogy of large populations. J. Appl. Prob. 19A:27–43

    ———. 1982b. The coalescent. Stochastic Processes Appl. 13:235–248

    Lehmann, E. L. 1986. Testing statistical hypotheses. Wiley, New York

    Markovtsova, L., P. Marjoram, and S. Tavaré. 2001. On a test of Depaulis and Veuille. Mol. Biol. Evol. 18:1132–1133[Free Full Text]

    Rozas, J., and R. Rozas. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174–175

    Strobeck, C. 1987. Average number of nucleotide differences in a sample from a single subpopulation: a test for population subdivision. Genetics 117:149–153

    Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595

    Wall, J. D. 1999. Recombination and the power of statistical tests of neutrality. Genet. Res. Camb. 74:65–79

    Watterson, G. A. 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256–276[Web of Science][Medline]

Accepted for publication January 29, 2001.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
GeneticsHome page
G. Achaz
Frequency Spectrum Neutrality Tests: One for All and All for One
Genetics, September 1, 2009; 183(1): 249 - 258.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
G. Achaz
Testing for Neutrality in Samples With Sequencing Errors
Genetics, July 1, 2008; 179(3): 1409 - 1424.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
A. Ramirez-Soriano, S. E. Ramos-Onsins, J. Rozas, F. Calafell, and A. Navarro
Statistical Power Analysis of Neutrality Tests Under Demographic Expansions, Contractions and Bottlenecks With Recombination
Genetics, May 1, 2008; 179(1): 555 - 567.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
K. Zeng, S. Shi, and C.-I Wu
Compound Tests for the Detection of Hitchhiking Under Positive Selection
Mol. Biol. Evol., August 1, 2007; 24(8): 1898 - 1908.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
K. Zeng, S. Mano, S. Shi, and C.-I Wu
Comparisons of Site- and Haplotype-Frequency Methods for Detecting Positive Selection
Mol. Biol. Evol., July 1, 2007; 24(7): 1562 - 1574.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
K. Zeng, Y.-X. Fu, S. Shi, and C.-I Wu
Statistical Tests for Detecting Positive Selection by Utilizing High-Frequency Variants
Genetics, November 1, 2006; 174(3): 1431 - 1439.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
H. Innan
Modified Hudson-Kreitman-Aguade Test and Two-Dimensional Evaluation of Neutrality Tests
Genetics, July 1, 2006; 173(3): 1725 - 1733.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
F. M. Jiggins and M. C. Tinsley
An Ancient Mitochondrial Polymorphism in Adalia bipunctata Linked to a Sex-Ratio-Distorting Bacterium
Genetics, November 1, 2005; 171(3): 1115 - 1124.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
S. C. Leman, Y. Chen, J. E. Stajich, M. A. F. Noor, and M. K. Uyenoyama
Likelihoods From Summary Statistics: Recent Divergence Between Species
Genetics, November 1, 2005; 171(3): 1419 - 1436.
[Abstract] [Full Text] [PDF]


Home page
Appl. Environ. Microbiol.Home page
G. Wichmann, D. Ritchie, C. S. Kousik, and J. Bergelson
Reduced Genetic Variation Occurs among Genes of the Highly Clonal Plant Pathogen Xanthomonas axonopodis pv. vesicatoria, Including the Effector Gene avrBs2
Appl. Envir. Microbiol., May 1, 2005; 71(5): 2418 - 2432.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
H. Innan, K. Zhang, P. Marjoram, S. Tavare, and N. A. Rosenberg
Statistical Tests of the Coalescent Model Based on the Haplotype Frequency Distribution and the Number of Segregating Sites
Genetics, March 1, 2005; 169(3): 1763 - 1777.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
S. Mousset, N. Derome, and M. Veuille
A Test of Neutrality and Constant Population Size Based on the Mismatch Distribution
Mol. Biol. Evol., April 1, 2004; 21(4): 724 - 731.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
N. Derome, K. Metayer, C. Montchamp-Moreau, and M. Veuille
Signature of Selective Sweep Associated With the Evolution of sex-ratio Drive in Drosophila simulans
Genetics, March 1, 2004; 166(3): 1357 - 1366.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
S. E. Ramos-Onsins and J. Rozas
Statistical Properties of New Neutrality Tests Against Population Growth
Mol. Biol. Evol., December 1, 2002; 19(12): 2092 - 2100.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
J. D. Wall, P. Andolfatto, and M. Przeworski
Testing Models of Selection and Demography in Drosophila simulans
Genetics, September 1, 2002; 162(1): 203 - 216.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
U. Lagercrantz, M. Kruskopf Osterberg, and M. Lascoux
Sequence Variation and Haplotype Structure at the Putative Flowering-Time Locus COL1 of Brassica nigra
Mol. Biol. Evol., September 1, 2002; 19(9): 1474 - 1482.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
M. Przeworski
The Signature of Positive Selection at Randomly Chosen Loci
Genetics, March 1, 2002; 160(3): 1179 - 1189.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
Y. Gilad, S. Rosenberg, M. Przeworski, D. Lancet, and K. Skorecki
Evidence for positive selection and population structure at the human MAO-A gene
PNAS, January 22, 2002; 99(2): 862 - 867.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
L. Markovtsova, P. Marjoram, and S. Tavare
On a Test of Depaulis and Veuille
Mol. Biol. Evol., June 1, 2001; 18(6): 1132 - 1133.
[Full Text]


Home page
Mol Biol EvolHome page
F. Depaulis, S. Mousset, and M. Veuille
Haplotype Tests Using Coalescent Simulations Conditional on the Number of Segregating Sites
Mol. Biol. Evol., June 1, 2001; 18(6): 1136 - 1138.
[Full Text]


This Article
Right arrow Extract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (26)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Wall, J. D.
Right arrow Articles by Hudson, R. R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Wall, J. D.
Right arrow Articles by Hudson, R. R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?