Molecular Biology and Evolution 18:1136-1138 (2001)
© 2001 Society for Molecular Biology and Evolution
LETTER
Haplotype Tests Using Coalescent Simulations Conditional on the Number of Segregating Sites
Institute of Cell, Animal and Population Biology, University of Edinburgh, Edinburgh, Scotland
Ecole Pratique des Hautes Études and Centre National de la Recherche Scientifique Unité Mixte de Recherche 7625, Université Pierre et Marie Curie, Paris, France
Recently, we proposed two neutrality tests (Depaulis and Veuille 1998) based on haplotype number (K) and haplotype diversity (H). They relied on coalescent simulations conditional on the observed number of segregating sites (S), following the coalescent simulation procedure proposed by Hudson (1993). In a companion letter, Markovtsova, Marjoram, and Tavaré (2001) use an alternative approach, based on the joint distribution of K and S, and show that the corresponding tests are not independent of the population mutational parameter θ (θ = 4Neµ, where Ne is the effective population size and µ is the neutral mutation rate per generation). They use the classical procedure of coalescent simulations conditional on θ and restrict their distribution to the particular subset of genealogies consistent with a particular value of S. They show that if θ is extreme, the probability of rejection can be substantially different from 5%.
In another companion letter, Wall and Hudson (2001) show that the test based on K is reasonably robust in its original form. They perform coalescent simulations conditional on θ for a wide range of values. In contrast to the previous approach, they consider all the outcomes of the neutral simulations with various S values and look at the corresponding (various) confidence intervals given by our (Depaulis and Veuille 1998) procedure, regardless of the θ value used as input to their simulations. This latter approach may be a better representation of a neutral distribution of genealogies. They study various statistics, including K, and the resulting type I error given by the confidence interval of Depaulis and Veuille (1998) remains close to 5% once corrected for the discreteness of the statistics.
In practice, the exact value of θ is unknown, and we generally have no information on its value independent of a given data set. One should find a reliable procedure to account for uncertainty on θ. Replacing θ with an estimate would not be conservative. Rather than conditioning on this unknown parameter, we chose to condition directly on the observed value of S.
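Conditioning on S can be illustrated with a short Python sketch. It builds a neutral genealogy without recombination, places exactly S mutations on branches with probability proportional to branch length (infinite-site model), and counts the resulting haplotypes K. The function names, the number of replicates, and the use of simple empirical quantiles are illustrative assumptions; this is not the published program, and the interval construction of Depaulis and Veuille (1998) may differ in detail.

```python
import random

def haplotype_count_fixed_s(n, s, rng):
    """Return K for one neutral genealogy of n sequences carrying exactly s mutations."""
    t = 0.0
    active = [([i], 0.0) for i in range(n)]      # (leaves below lineage, time it appeared)
    branches, lengths = [], []
    while len(active) > 1:
        k = len(active)
        t += rng.expovariate(k * (k - 1) / 2)    # waiting time to the next coalescence
        i, j = sorted(rng.sample(range(k), 2))   # pair of lineages that coalesce
        right = active.pop(j)                    # pop the larger index first
        left = active.pop(i)
        for leaves, born in (left, right):       # both child branches end here
            branches.append(leaves)
            lengths.append(t - born)
        active.append((left[0] + right[0], t))
    haplotypes = [set() for _ in range(n)]
    # Each mutation defines a new site and is inherited by all leaves below its branch.
    for site, leaves in enumerate(rng.choices(branches, weights=lengths, k=s)):
        for leaf in leaves:
            haplotypes[leaf].add(site)
    return len({frozenset(h) for h in haplotypes})

def k_interval(n, s, reps=2000, alpha=0.05, seed=1):
    """Empirical central interval for K given S = s (assumed quantile rule)."""
    rng = random.Random(seed)
    ks = sorted(haplotype_count_fixed_s(n, s, rng) for _ in range(reps))
    return ks[int(reps * alpha / 2)], ks[int(reps * (1 - alpha / 2)) - 1]
```

An observed K falling outside the interval returned by `k_interval(n, s)` would count as a rejection in this sketch. Because K is discrete, the realized type I error of such an interval need not equal the nominal level exactly.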
In the present letter, we question the relevance of considering extreme θ values in addressing the robustness of the tests. We first show that those values of θ that lead to nonrobust neutrality tests with our procedure are highly unlikely given S under a neutral model. Second, we show with a Bayesian approach that our tests conditional on S are reliable, thus confirming Wall and Hudson's (2001) simulation results by an alternative approach.
Not all values of θ are equally likely given an observed value of S under the neutral model. Using Hudson's (1990) recursion, we computed the probability of obtaining an S value equal to or more extreme than a given value (the parameter values used by Markovtsova, Marjoram, and Tavaré [2001, tables 1 and 2]). Only when θ = 10 is the S value not highly unexpected (table 1). For this θ value, the tests are conservative according to Markovtsova, Marjoram, and Tavaré (2001, table 1). We also computed Watterson's estimate of θ given S and its confidence interval following Kreitman and Hudson's (1991) method. The 95% confidence interval for θ always shows a much smaller range (1.3–27) than the 1–100 range used by Markovtsova, Marjoram, and Tavaré (2001). As pointed out by Wall and Hudson (2001), the fact that the observed S value is highly unexpected given θ is a sufficient reason to reject the null Wright-Fisher neutral model, and there is no need to use any other neutrality test.
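The quantities used in this paragraph can be sketched in Python as follows. The recursion relies on the fact that, while k lineages remain, the next event is a mutation with probability θ/(θ + k − 1), so the number of mutations in that interval is geometric. Function names are hypothetical, and the interval below simply places 2.5% in each tail, which is an assumption and may not match Kreitman and Hudson's (1991) construction exactly.

```python
from math import fsum

def prob_s_table(n, theta, s_max):
    """P(S_n = j | theta) for j = 0..s_max under the neutral model without recombination."""
    probs = [1.0] + [0.0] * s_max                 # a single lineage carries no segregating site
    for k in range(2, n + 1):
        p_mut = theta / (theta + k - 1)           # next event is a mutation, not a coalescence
        q = [(1 - p_mut) * p_mut ** i for i in range(s_max + 1)]
        probs = [fsum(probs[j - i] * q[i] for i in range(j + 1))
                 for j in range(s_max + 1)]
    return probs

def upper_tail(n, theta, s):
    """P(S_n >= s | theta)."""
    return 1.0 if s == 0 else 1.0 - fsum(prob_s_table(n, theta, s - 1))

def lower_tail(n, theta, s):
    """P(S_n <= s | theta)."""
    return fsum(prob_s_table(n, theta, s))

def watterson(n, s):
    """Watterson's estimate: S divided by the sum of 1/i for i = 1..n-1."""
    return s / sum(1.0 / i for i in range(1, n))

def _bisect(g, lo, hi, tol=1e-6):
    """Root of an increasing function g on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return (lo + hi) / 2

def theta_interval(n, s, alpha=0.05, theta_max=1000.0):
    """Equal-tailed interval for theta given S = s (assumed rule; capped at theta_max)."""
    lower = _bisect(lambda t: upper_tail(n, t, s) - alpha / 2, 0.0, theta_max)
    upper = _bisect(lambda t: alpha / 2 - lower_tail(n, t, s), 0.0, theta_max)
    return lower, upper
```

For a given sample size and S, `upper_tail` or `lower_tail` gives the probability of an S value as extreme as the one observed for any chosen θ, and `theta_interval` brackets `watterson(n, s)`.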
The reliability of the test should be assessed within the confidence interval of θ. To do this, we used the rejection algorithm suggested by Markovtsova, Marjoram, and Tavaré (2001) for θ values at the bounds of its confidence interval given S (table 1). We found a good overall fit between the nominal value of the test and its frequency of rejection (<12% in any case). However, note that θ values should be weighted by their probabilities given the data. Indeed, a difficulty with the procedure used by Markovtsova, Marjoram, and Tavaré is that the values for θ were taken arbitrarily. The probability of obtaining the configuration of values used in each simulation is thus ignored, and we can hardly draw firm conclusions. Markovtsova, Marjoram, and Tavaré (2001) can only conclude that the confidence interval tends to narrow "as θ tends to zero or infinity."
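A rejection scheme of this kind can be sketched in Python: genealogies are simulated for a fixed θ, only those replicates whose number of segregating sites equals the observed s are kept, and K is recorded for the kept replicates. This is a minimal illustration under an infinite-site model without recombination; the function names, the `max_tries` cap, and the way interval limits are passed in are assumptions, not the implementation of Markovtsova, Marjoram, and Tavaré (2001).

```python
import random

def simulate_haplotypes(n, theta, rng, s_cap=None):
    """Simulate one sample; return (haplotypes, S), or None once S would exceed s_cap."""
    lineages = [[i] for i in range(n)]            # leaves below each ancestral lineage
    haplotypes = [set() for _ in range(n)]
    n_sites = 0
    while len(lineages) > 1:
        k = len(lineages)
        if rng.random() < theta / (theta + k - 1):    # next event is a mutation
            if s_cap is not None and n_sites == s_cap:
                return None                           # already too many sites: give up early
            for leaf in rng.choice(lineages):         # mutation hits a random lineage
                haplotypes[leaf].add(n_sites)
            n_sites += 1
        else:                                         # next event is a coalescence
            i, j = sorted(rng.sample(range(k), 2))
            merged = lineages.pop(j) + lineages.pop(i)
            lineages.append(merged)
    return [frozenset(h) for h in haplotypes], n_sites

def k_given_s(n, theta, s, n_keep, rng, max_tries=1_000_000):
    """Collect K from replicates simulated under theta that show exactly s sites."""
    kept = []
    for _ in range(max_tries):
        result = simulate_haplotypes(n, theta, rng, s_cap=s)
        if result is not None and result[1] == s:
            kept.append(len(set(result[0])))
            if len(kept) == n_keep:
                break
    return kept

def rejection_frequency(ks, lower, upper):
    """Fraction of kept replicates whose K falls outside the interval [lower, upper]."""
    return sum(1 for k in ks if k < lower or k > upper) / len(ks)
```

When θ is far from the values compatible with s, almost every replicate is discarded, which is another way of seeing how unlikely such parameter values are given the data.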
Following Markovtsova, Marjoram, and Tavaré's (2001) approach, one sensible alternative procedure would be to weight the probability of rejection for different θ values by f(θ | Sn = s), the probability density of θ
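given S. In the notation of Markovtsova, Marjoram, and Tavaré (2001),

$$P(\text{rejection} \mid S_n = s) = \int P(\text{rejection} \mid \theta, S_n = s)\, f(\theta \mid S_n = s)\, d\theta. \qquad (1)$$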
The density of θ given S was obtained using a Bayesian approach similar to that followed by Fu (1998). We used a uniform prior distribution between 0 and 100 encompassing the range of values considered by Markovtsova, Marjoram, and Tavaré (2001). For the posterior distribution, Bayes' theorem gives
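
$$f(\theta \mid S_n = s) = \frac{f(\theta)\, P(S_n = s \mid \theta)}{P(S_n = s)},$$

where f(θ) is the prior density of θ, P(Sn = s | θ) is obtained following Hudson (1990), and P(Sn = s) = ∫ f(θ) P(Sn = s | θ) dθ (integrated over the range of the prior distribution). The posterior probability density of θ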
given S was derived for a large number of θ values evenly distributed over the range of the prior distribution (fig. 1). For these θ values, we also computed the probability of rejection of the tests given by the rejection algorithm and weighted it by the posterior distribution according to equation (1). The posterior distribution was much narrower than the range considered by Markovtsova, Marjoram, and Tavaré (2001) and had negligible probability density for extreme values of θ (fig. 1). As a result, the corresponding rejection probability given by Depaulis and Veuille's (1998)
(results not shown). For another neutrality test, Kelly (1997)
, including the Bayesian approach, and concluded that conditioning on S was the most reliable one. Note that the latter procedure is close, in this respect, to that proposed by Nielsen (2000) for analyzing SNP data, which (1) conditions the probabilities on the fact that loci are variable, (2) treats θ as a nuisance parameter, and (3) eventually eliminates it.
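The Bayesian weighting can be sketched numerically in Python. With a uniform prior, the posterior density of θ on an evenly spaced grid is proportional to the likelihood P(Sn = s | θ), and the weighted rejection probability is the posterior-weighted average of the rejection probability at each grid point. The grid size, the function names, and the callable `reject_prob` (standing in for the output of a rejection algorithm at a given θ) are assumptions made for illustration.

```python
def prob_s(n, theta, s):
    """P(S_n = s | theta) under the neutral model without recombination."""
    probs = [1.0] + [0.0] * s
    for k in range(2, n + 1):
        p_mut = theta / (theta + k - 1)           # next event is a mutation, not a coalescence
        q = [(1 - p_mut) * p_mut ** i for i in range(s + 1)]
        probs = [sum(probs[j - i] * q[i] for i in range(j + 1)) for j in range(s + 1)]
    return probs[s]

def posterior_on_grid(n, s, lower=0.0, upper=100.0, m=400):
    """Posterior weights of theta on m grid midpoints for a uniform prior on [lower, upper]."""
    step = (upper - lower) / m
    thetas = [lower + (i + 0.5) * step for i in range(m)]
    likelihood = [prob_s(n, t, s) for t in thetas]
    total = sum(likelihood)                       # the constant prior cancels in the ratio
    return thetas, [like / total for like in likelihood]

def weighted_rejection(reject_prob, thetas, weights):
    """Posterior-weighted average of a theta-specific rejection probability."""
    return sum(w * reject_prob(t) for t, w in zip(thetas, weights))
```

In this sketch, grid points far from the values supported by s receive negligible weight, so the rejection probabilities obtained for extreme θ contribute almost nothing to the weighted average.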
Finally, as noted by Depaulis and Veuille (1998)
used by Markovtsova, Marjoram, and Tavaré (2001) would be even more unlikely under a model that includes recombination, since recombination tends to decrease the stochastic variance of estimates of θ (Hudson 1983).
Acknowledgements
We thank N. Barton, M. Cobb, Y. X. Fu, I. Gordo, A. Navarro, and S. Otto for helpful discussions and comments on earlier versions of this manuscript, and S. Tavaré for providing Markovtsova, Marjoram, and Tavaré's (2001) manuscript via his website. A computer program that implements Markovtsova, Marjoram, and Tavaré's (2001) rejection algorithm and the H and K haplotype tests conditional on either S or θ and on a value of the population recombination parameter is available from smousset@snv.jussieu.fr. F.D. was supported by NERC, and S.M. and M.V. were supported by Groupe de Recherche GDR 1928 of the Centre National de la Recherche Scientifique.
Footnotes
1 Keywords: coalescent theory, simulations, neutrality tests, haplotype distribution
2 Address for correspondence and reprints: Frantz Depaulis, Institute of Cell, Animal and Population Biology, Ashworth Laboratory, King's Buildings, West Mains Road, Edinburgh EH9 3JT, United Kingdom. frantz.depaulis@ed.ac.uk
Literature Cited
Depaulis, F., L. Brazier, and M. Veuille. 1999. Selective sweep at the Drosophila melanogaster Suppressor of Hairless locus and its association with the In(2L)t inversion polymorphism. Genetics 152:1017–1024.
Depaulis, F., and M. Veuille. 1998. Neutrality tests based on the distribution of haplotypes under an infinite-site model. Mol. Biol. Evol. 15:1788–1790.
Fu, Y. X. 1998. Probability of a segregating pattern in a sample of DNA sequences. Theor. Popul. Biol. 54:1–10.
Hudson, R. R. 1983. Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol. 23:183–201.
Hudson, R. R. 1990. Gene genealogies and the coalescent process. Pp. 1–44 in D. Futuyma and J. Antonovics, eds. Oxford surveys in evolutionary biology. Vol. 7. Oxford University Press, Oxford, England.
Hudson, R. R. 1993. The how and why of generating gene genealogies. Pp. 23–36 in N. Takahata and A. G. Clark, eds. Mechanisms of molecular evolution. Japan Scientific Societies Press, Tokyo.
Kelly, J. K. 1997. A test of neutrality based on interlocus associations. Genetics 146:1197–1206.
Kreitman, M., and R. R. Hudson. 1991. Inferring the evolutionary histories of the Adh and Adh-dup loci in Drosophila melanogaster from patterns of polymorphism and divergence. Genetics 127:565–582.
Markovtsova, L., P. Marjoram, and S. Tavaré. 2001. On a test of Depaulis and Veuille. Mol. Biol. Evol. 18:1132–1133.
Nielsen, R. 2000. Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics 154:931–942.
Wall, J. D., and R. R. Hudson. 2001. Coalescent simulations and statistical tests of neutrality. Mol. Biol. Evol. 18:1134–1135.
Watterson, G. A. 1975. On the number of segregating sites. Theor. Popul. Biol. 7:256–276.