Molecular Biology and Evolution 18:1132-1133 (2001)
© 2001 Society for Molecular Biology and Evolution
LETTER
On a Test of Depaulis and Veuille
Department of Mathematics
Biostatistics Division, Department of Preventive Medicine
Program in Molecular Biology, Department of Biological Sciences, University of Southern California
In a recent letter to this journal, Depaulis and Veuille (1998) discussed two possible tests of neutrality, the "haplotype number test" and the "haplotype diversity test." In their tables 1 and 2 they present means and percentage points of the distributions of the number Kn of haplotypes and of the sample heterozygosity $H_n = 1 - \sum_i p_i^2$, where the $p_i$ are the relative frequencies of those haplotypes in a sample of size n, for different values of the number s of segregating sites observed in the data. They assume a neutral infinitely-many-sites model of mutation with no recombination. These percentage points were found by repeatedly simulating a random coalescent tree with n tips, randomly distributing s mutations on the tree, and calculating the observed values of Kn and Hn. See Hudson (1990) for a description of how such simulations can be performed.
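As a concrete illustration of the two statistics, the sketch below computes Kn and Hn from a sample of haplotypes written as 0/1 strings over the segregating sites. It assumes the unweighted form $H_n = 1 - \sum_i p_i^2$ given above (no n/(n - 1) correction), and the function name haplotype_stats is hypothetical.

```python
from collections import Counter


def haplotype_stats(haplotypes):
    """Return (K_n, H_n) for a sample of haplotypes.

    K_n is the number of distinct haplotypes; H_n = 1 - sum(p_i**2),
    where p_i are the relative frequencies of those haplotypes.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("empty sample")
    counts = Counter(haplotypes)  # haplotype -> number of copies in the sample
    k = len(counts)
    h = 1.0 - sum((c / n) ** 2 for c in counts.values())
    return k, h


# Four sequences, three distinct haplotypes with frequencies 1/2, 1/4, 1/4.
print(haplotype_stats(["0011", "0011", "1000", "0100"]))  # → (3, 0.625)
```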
Depaulis and Veuille (1998) present their method as though the simulated values had the conditional distributions of Kn and Hn given Sn = s. However, this is not true, as the following argument shows. Denote the coalescent tree by τ and the coalescence times by T = (Tn, ..., T2), so that Tj is the time during which the tree has exactly j distinct ancestors. For a given value of the scaled mutation rate θ, the conditional distribution of (Kn, Hn) given Sn = s can be represented as

$$P_\theta(K_n = k, H_n = h \mid S_n = s) = \int\int P(K_n = k, H_n = h \mid S_n = s, \tau, T = t)\, f_\theta(\tau, t \mid s)\, d\tau\, dt, \qquad (1)$$

where $f_\theta(\tau, t \mid s)$ is the joint conditional density of (τ, T) given Sn = s. Because of the Poisson nature of the mutation process, the first term under the integral signs in equation (1) does not depend on θ: given the tree and given that s mutations occurred, those mutations fall independently on the branches with probabilities proportional to branch length. It follows that to simulate observations from the joint conditional distribution of (Kn, Hn) given Sn = s, one first simulates (τ, T) from its conditional distribution given Sn = s, then randomly distributes the s mutations over the resulting tree and calculates the values of Kn and Hn. Notice that Depaulis and Veuille's (1998) method simulates (τ, T) from its unconditional distribution instead of from the conditional distribution of (τ, T) given Sn = s. The joint distribution of (Kn, Sn) in the case of constant population size is discussed by Griffiths (1982).
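The procedure described by Depaulis and Veuille can be sketched as follows. This is a minimal illustration assuming a constant-size Kingman coalescent, with time scaled so that each pair of lineages coalesces at rate 1; the function names are hypothetical. The step drop_mutations corresponds to the first factor under the integrals in equation (1) and involves no θ at all. The difficulty discussed above lies entirely in sample_tree, which draws (τ, T) from the unconditional coalescent rather than from its distribution given Sn = s.

```python
import random
from collections import Counter


def sample_tree(n, rng):
    """Draw a coalescent tree with n tips from the *unconditional* prior.

    T_j ~ Exponential(j(j-1)/2). Returns a list of branches, each a pair
    (frozenset of tips below the branch, branch length).
    """
    active = [(frozenset([i]), 0.0) for i in range(n)]  # (tips, time lineage began)
    now, branches = 0.0, []
    for j in range(n, 1, -1):
        now += rng.expovariate(j * (j - 1) / 2)  # waiting time T_j
        # Pop the larger index first so the smaller one stays valid.
        a, b = sorted(rng.sample(range(len(active)), 2), reverse=True)
        (tips_a, start_a), (tips_b, start_b) = active.pop(a), active.pop(b)
        branches += [(tips_a, now - start_a), (tips_b, now - start_b)]
        active.append((tips_a | tips_b, now))
    return branches


def drop_mutations(branches, n, s, rng):
    """Put s mutations (each a new site) on branches chosen with probability
    proportional to length; return the n haplotypes as 0/1 strings.
    The mutation rate theta plays no role in this step."""
    lengths = [length for _, length in branches]
    hit = rng.choices(branches, weights=lengths, k=s)
    return ["".join("1" if i in tips else "0" for tips, _ in hit) for i in range(n)]


def dv_style_draw(n, s, rng):
    """One draw of (K_n, H_n) with the tree taken from the unconditional coalescent."""
    haps = drop_mutations(sample_tree(n, rng), n, s, rng)
    counts = Counter(haps)
    return len(counts), 1.0 - sum((c / n) ** 2 for c in counts.values())


rng = random.Random(1)
draws = [dv_style_draw(20, 44, rng) for _ in range(1000)]
```

Replacing sample_tree with a sampler for (τ, T) given Sn = s, while keeping drop_mutations unchanged, gives the two-step procedure that equation (1) calls for.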
Depaulis and Veuille (1998) suggested that the percentage points of their statistics could be used as a test of neutrality: for a given sample size n and observed value of s, one compares the observed values of Kn and Hn with the tabulated 95% intervals, and values falling outside those intervals lead to rejection of neutrality. Because the true conditional distributions of Kn and Hn depend on the unknown mutation rate θ, the power of this test is likely to vary dramatically as a function of θ for given n and s. To assess this hypothesis, we simulated observations from the true joint conditional distribution using equation (1) and then estimated the probability that either statistic would fall outside the limits given by Depaulis and Veuille. This gave an empirical estimate of the probability that their test would reject neutrality for different values of θ.
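The empirical rejection probability described above is simply the fraction of simulated (Kn, Hn) pairs for which either statistic falls outside its interval. A minimal sketch, assuming the interval limits are inclusive (the text does not say) and using the hypothetical name rejection_rate; the limits in the example are made up for illustration and are not the DV table values.

```python
import math


def rejection_rate(samples, k_limits, h_limits):
    """Estimate the probability that K_n or H_n falls outside its interval.

    samples  -- (k, h) pairs drawn from the conditional distribution given S_n = s
    k_limits -- (lower, upper) nominal 95% limits for K_n, treated as inclusive
    h_limits -- (lower, upper) nominal 95% limits for H_n, treated as inclusive
    Returns the estimated rejection probability and its binomial standard error.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one sample")
    outside = sum(
        1
        for k, h in samples
        if not (k_limits[0] <= k <= k_limits[1])
        or not (h_limits[0] <= h <= h_limits[1])
    )
    p = outside / len(samples)
    return p, math.sqrt(p * (1 - p) / len(samples))


# Hypothetical limits, for illustration only.
print(rejection_rate([(5, 0.5), (2, 0.5), (5, 0.95), (6, 0.6)], (3, 8), (0.3, 0.9)))  # → (0.5, 0.25)
```

With 10,000 simulated values, the standard error of such an estimate is at most sqrt(0.25/10,000) = 0.005.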
We used a Markov chain Monte Carlo (MCMC) approach to simulate observations from the conditional distribution of (τ, T) given Sn = s. MCMC methods produce correlated samples, but these samples may be made approximately independent by sampling the output at widely spaced intervals. The results below were generated using the approach in Markovtsova, Marjoram, and Tavaré (2000); an alternative is the rejection algorithm of Tavaré et al. (1997). Table 1 shows the fraction of 10,000 observations that fell inside the DV nominal 95% intervals for three different scenarios. As noted by Fu and Li (1993), Sn is not a sufficient statistic for θ, so the fraction of observations that fall within the DV nominal 95% intervals varies greatly with θ. It appears that if the true θ value is well supported by the data, the test of neutrality based on the DV intervals will work well. However, if the true θ value is not well supported by the data, the test will be inaccurate, leading to incorrect rejections of neutrality. For further discussion, see Wall and Hudson (2001).
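The rejection idea mentioned above can be sketched for the coalescence times as follows. This is an illustration in the spirit of a rejection algorithm, not the MCMC method used for the results reported here, and it assumes a constant population size, time scaled so that pairs coalesce at rate 1, and mutations arising at rate θ/2 per lineage, so that Sn given the total branch length L is Poisson with mean θL/2. The function name is hypothetical.

```python
import math
import random


def sample_times_given_s(n, s, theta, rng, max_tries=1_000_000):
    """Draw coalescence times {j: T_j} (j = n, ..., 2) conditional on S_n = s.

    Proposals come from the constant-size coalescent, T_j ~ Exponential(j(j-1)/2).
    Given L = sum_j j*T_j, S_n ~ Poisson(theta*L/2), so a proposal is accepted
    with probability P(S_n = s | L) / max_L P(S_n = s | L); the maximum is
    reached when theta*L/2 = s.
    """
    for _ in range(max_tries):
        times = {j: rng.expovariate(j * (j - 1) / 2) for j in range(n, 1, -1)}
        lam = theta * sum(j * t for j, t in times.items()) / 2
        if s == 0:
            accept = math.exp(-lam)
        else:
            # Poisson(s; lam) / Poisson(s; s) = exp(s*log(lam/s) - (lam - s)) <= 1
            accept = math.exp(s * math.log(lam / s) - (lam - s))
        if rng.random() < accept:
            return times
    raise RuntimeError("no proposal accepted; theta is poorly supported by s")


rng = random.Random(1997)
times = sample_times_given_s(20, 44, 12.4, rng)
```

Because the Poisson count depends on the tree only through L, the topology τ is unaffected by the conditioning and can be built afterwards by joining random pairs of lineages at the accepted times. When θ is poorly supported by s (for example θ = 1 or θ = 100 with n = 20 and s = 44), the acceptance probability becomes very small.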
As we have seen, the power of the DV test depends on the unknown parameter θ. Even if the underlying mutation rate is known, the compound parameter θ still depends on the effective population size at the time of sampling, which is not known in practice. Depaulis and Veuille (1998) discuss a data set from the Su(H) locus in Drosophila melanogaster for which n = 20 and s = 44. The observed values of K20 and H20 were 7 and 0.76, respectively, and the nominal P values were estimated to be 0.011 and 0.017, respectively. We used the simulation approach outlined above, again with 10,000 simulated values, to find empirical estimates of these P values for different values of θ. For illustration we used θ = 1, 5, 12.4, 50, and 100; the value 12.4 corresponds to Watterson's (1975) segregating-sites estimator. Results are shown in table 2.
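The value 12.4 can be checked directly. Watterson's estimator divides s by $a_n = \sum_{i=1}^{n-1} 1/i$; for n = 20, $a_{20} \approx 3.548$, and 44/3.548 ≈ 12.4. The function name below is hypothetical.

```python
def watterson_theta(n, s):
    """Watterson's (1975) estimator of theta: s / a_n, a_n = sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1.0 / i for i in range(1, n))
    return s / a_n


# Su(H) data discussed by Depaulis and Veuille: n = 20, s = 44.
print(round(watterson_theta(20, 44), 1))  # → 12.4
```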
From table 2, we see that for θ in the range 12.4 or larger, the data are highly unlikely under a neutral scenario. However, for a range of smaller θ values, including, for example, θ = 5, the data become much more likely. Thus, the ability to reject the supposed neutral scenario depends on the true value of θ. It should be noted that one cannot reject neutrality as such on the basis of this test; rather, one can reject the model upon which the test is based. As Depaulis and Veuille (1998) […]
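Combining a rejection step for the coalescence times with the mutation-dropping step gives a rough way to estimate P values of the kind reported in table 2 for a chosen θ. This is a minimal sketch, not the authors' program: it uses rejection sampling rather than the MCMC method behind the reported results, assumes a constant population size, takes the P values to be the lower-tail probabilities P(K20 ≤ 7 | S20 = 44) and P(H20 ≤ 0.76 | S20 = 44) (the text does not state the direction), and uses hypothetical function names. No output is shown, and no claim is made that the Monte Carlo estimates reproduce table 2.

```python
import math
import random
from collections import Counter


def conditional_tree(n, s, theta, rng):
    """Coalescent tree with n tips drawn conditional on S_n = s.

    Times are accepted by rejection on total branch length L, using
    S_n | L ~ Poisson(theta*L/2). Returns (tips below branch, length) pairs.
    """
    while True:
        times = [rng.expovariate(j * (j - 1) / 2) for j in range(n, 1, -1)]  # T_n..T_2
        lam = theta * sum(j * t for j, t in zip(range(n, 1, -1), times)) / 2
        log_acc = -lam if s == 0 else s * math.log(lam / s) - (lam - s)
        if rng.random() < math.exp(log_acc):
            break
    # The topology is unaffected by conditioning on S_n: join random pairs.
    active = [(frozenset([i]), 0.0) for i in range(n)]
    now, branches = 0.0, []
    for t in times:
        now += t
        a, b = sorted(rng.sample(range(len(active)), 2), reverse=True)
        (ta, sa), (tb, sb) = active.pop(a), active.pop(b)
        branches += [(ta, now - sa), (tb, now - sb)]
        active.append((ta | tb, now))
    return branches


def draw_k_h(n, s, theta, rng):
    """One draw of (K_n, H_n) from their joint conditional distribution given S_n = s."""
    branches = conditional_tree(n, s, theta, rng)
    hit = rng.choices(branches, weights=[length for _, length in branches], k=s)
    haps = ["".join("1" if i in tips else "0" for tips, _ in hit) for i in range(n)]
    counts = Counter(haps)
    return len(counts), 1.0 - sum((c / n) ** 2 for c in counts.values())


def empirical_p_values(n, s, k_obs, h_obs, theta, reps, rng):
    """Lower-tail estimates of P(K_n <= k_obs | S_n = s) and P(H_n <= h_obs | S_n = s)."""
    draws = [draw_k_h(n, s, theta, rng) for _ in range(reps)]
    p_k = sum(k <= k_obs for k, _ in draws) / reps
    p_h = sum(h <= h_obs for _, h in draws) / reps
    return p_k, p_h


rng = random.Random(2001)
for theta in (5.0, 12.4):
    print(theta, empirical_p_values(20, 44, 7, 0.76, theta, 200, rng))
```

For values such as θ = 1 or θ = 100, which s = 44 supports poorly, the rejection loop accepts very rarely, so run times grow quickly.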
Computer programs that implement both the MCMC approach and the rejection method to generate observations from the joint conditional distribution of, for example, (Kn, Hn) given the value of Sn, for a given distribution for θ and under a variety of demographic scenarios, can be obtained from the authors.
Acknowledgements
We thank Y.-X. Fu and an anonymous reviewer for helpful comments. We were supported in part by NSF grant DBI95-04393 and NIH grant GM 58897.
Footnotes
1 Keywords: Markov chain Monte Carlo, coalescent
2 Address for correspondence and reprints: Simon Tavaré, Program in Molecular Biology, Department of Biological Sciences, SHS 172, University of Southern California, Los Angeles, California 90089-1340. stavare@gnome.usc.edu
Literature Cited
Depaulis, F., and M. Veuille. 1998. Neutrality tests based on the distribution of haplotypes under an infinite-site model. Mol. Biol. Evol. 15:1788-1790.
Fu, Y. X., and W. H. Li. 1993. Maximum likelihood estimation of population parameters. Genetics 134:1261-1270.
Griffiths, R. C. 1982. The number of alleles and segregating sites in a sample from the infinite-alleles model. Adv. Appl. Prob. 14:225-239.
Griffiths, R. C., and S. Tavaré. 1996. Monte Carlo inference methods in population genetics. Math. Comput. Modelling 23:141-158.
Hudson, R. R. 1990. Gene genealogies and the coalescent process. Pp. 1-44 in D. Futuyma and J. Antonovics, eds. Oxford surveys in evolutionary biology. Vol. 7. Oxford University Press, Oxford, England.
Markovtsova, L., P. Marjoram, and S. Tavaré. 2000. The effects of rate variation on ancestral inference in the coalescent. Genetics 156:1427-1436.
Tavaré, S., D. Balding, R. C. Griffiths, and P. Donnelly. 1997. Inferring coalescence times for molecular sequence data. Genetics 145:505-518.
Wall, J. D., and R. R. Hudson. 2001. Coalescent simulations and statistical tests of neutrality. Mol. Biol. Evol. 18:1134-1135.
Watterson, G. A. 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256-276.