Molecular Biology and Evolution 18:1136-1138 (2001)
© 2001 Society for Molecular Biology and Evolution


LETTER

Haplotype Tests Using Coalescent Simulations Conditional on the Number of Segregating Sites

Frantz Depaulis, Sylvain Mousset and Michel Veuille

Institute of Cell, Animal and Population Biology, University of Edinburgh, Edinburgh, Scotland
Ecole Pratique des Hautes Études and Centre National de la Recherche Scientifique Unité Mixte de Recherche 7625, Université Pierre et Marie Curie, Paris, France

Recently, we proposed two neutrality tests (Depaulis and Veuille 1998) based on haplotype number (K) and haplotype diversity (H). They relied on coalescent simulations conditional on the observed number of segregating sites (S), following the simulation procedure proposed by Hudson (1993). In a companion letter, Markovtsova, Marjoram, and Tavaré (2001) use an alternative approach, based on the joint distribution of K and S, and show that the corresponding tests are not independent of the population mutational parameter θ (θ = 4N_eµ, where N_e is the effective population size and µ is the neutral mutation rate per generation). They use the classical procedure of coalescent simulations conditional on θ and restrict the resulting distribution to the subset of genealogies consistent with a particular value of S. They show that if θ is extreme, the probability of rejection can differ substantially from 5%.

In another companion letter, Wall and Hudson (2001) show that the test based on K is reasonably robust in its original form. They perform coalescent simulations conditional on θ for a wide range of values. In contrast to the previous approach, they consider all outcomes of the neutral simulations, with their various S values, and examine the corresponding confidence intervals given by our procedure (Depaulis and Veuille 1998), regardless of the θ value used as input to their simulations. This latter approach may better represent a neutral distribution of genealogies. They study various statistics, including K, and the resulting type I error given by the confidence interval of Depaulis and Veuille (1998) remains close to 5% once corrected for the discreteness of the statistics.

In practice, the exact value of θ is unknown, and we generally have no information on its value independent of a given data set. One should find a reliable procedure to account for uncertainty about θ. Replacing θ with an estimate would not be conservative. Rather than conditioning on this unknown parameter, we chose to condition directly on the observed value of S.
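Conditioning on S can be illustrated with a minimal Python sketch. It assumes a standard neutral coalescent without recombination under the infinite-site model, places exactly S mutations on the genealogy with probability proportional to branch length (one reading of the fixed-S procedure attributed above to Hudson 1993), and takes H as 1 − Σ p_i² without a sample-size correction, which is an assumption made here. All function names are illustrative and are not those of the authors' program.

```python
import random
from collections import Counter

def simulate_tree(n, rng):
    """Return the branches of one coalescent genealogy as (mask, length) pairs.

    mask is a bitmask of the sampled sequences descending from the branch.
    """
    lineages = [[1 << i, 0.0] for i in range(n)]  # [descendant mask, length so far]
    branches = []
    while len(lineages) > 1:
        j = len(lineages)
        t = rng.expovariate(j * (j - 1) / 2.0)    # waiting time with j lineages
        for lin in lineages:
            lin[1] += t
        a, b = rng.sample(range(j), 2)            # pair that coalesces
        first, second = lineages[a], lineages[b]
        branches.append((first[0], first[1]))
        branches.append((second[0], second[1]))
        lineages = [lin for idx, lin in enumerate(lineages) if idx not in (a, b)]
        lineages.append([first[0] | second[0], 0.0])
    return branches

def haplotype_stats(n, s, rng):
    """Simulate one genealogy with exactly s segregating sites; return (K, H)."""
    branches = simulate_tree(n, rng)
    masks = [m for m, _ in branches]
    lengths = [length for _, length in branches]
    # Each mutation lands on a branch with probability proportional to its length.
    sites = rng.choices(masks, weights=lengths, k=s)
    haplotypes = [tuple((site >> i) & 1 for site in sites) for i in range(n)]
    counts = Counter(haplotypes)
    k = len(counts)
    h = 1.0 - sum((c / n) ** 2 for c in counts.values())
    return k, h

def null_distribution(n, s, replicates=2000, seed=1):
    """Empirical neutral distribution of (K, H) conditional on S = s."""
    rng = random.Random(seed)
    return [haplotype_stats(n, s, rng) for _ in range(replicates)]

def tail_probabilities(values, observed):
    """Fractions of simulated values <= observed and >= observed."""
    m = len(values)
    lower = sum(v <= observed for v in values) / m
    upper = sum(v >= observed for v in values) / m
    return lower, upper
```

For example, `tail_probabilities([k for k, _ in null_distribution(20, 40)], k_obs)` would give the two one-sided tail probabilities for a hypothetical observed haplotype number `k_obs` in a sample of 20 sequences with 40 segregating sites.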

In the present letter, we question the relevance of considering extreme θ values in addressing the robustness of the tests. We first show that those values of θ that lead to nonrobust neutrality tests with our procedure are highly unlikely given S under a neutral model. Second, we show with a Bayesian approach that our tests conditional on S are reliable, thus confirming Wall and Hudson's (2001) simulation results by an alternative approach.

Not all values of θ are equally likely given an observed value of S under the neutral model. Using Hudson's (1990) recursion, we computed the probability of obtaining an S value equal to or more extreme than a given value, for the parameter values used by Markovtsova, Marjoram, and Tavaré (2001, tables 1 and 2). Only when θ = 10 is the S value not highly unexpected (table 1). For this θ value, the tests are conservative according to Markovtsova, Marjoram, and Tavaré (2001, table 1). We also computed Watterson's (1975) estimate of θ given S and its confidence interval following Kreitman and Hudson's (1991) method. The 95% confidence interval for θ always shows a much smaller range (1.3–27) than the 1–100 range used by Markovtsova, Marjoram, and Tavaré (2001). As pointed out by Wall and Hudson (2001), the fact that the observed S value is highly unexpected given θ is a sufficient reason to reject the null Wright-Fisher neutral model, and there is no need to use any other neutrality test.
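The probability of S given θ can be computed along the following lines. This sketch does not reproduce Hudson's (1990) recursion as printed; it assumes the equivalent textbook construction in which, while j lineages remain, the number of mutations before the next coalescence is geometric with stopping probability (j − 1)/(j − 1 + θ), and S is the sum of these counts over j = 2, ..., n. Watterson's estimate is taken as S divided by the harmonic sum a_n = Σ 1/i for i = 1, ..., n − 1. Function names are illustrative.

```python
def prob_segregating_sites(n, theta, s_max):
    """Return [P(S_n = 0), ..., P(S_n = s_max)] under the neutral model."""
    dist = [1.0] + [0.0] * s_max          # a single lineage carries no segregating site
    for j in range(2, n + 1):
        stop = (j - 1) / (j - 1 + theta)  # next event is a coalescence
        cont = theta / (j - 1 + theta)    # next event is a mutation
        geom = [stop * cont ** m for m in range(s_max + 1)]
        # Convolve the running distribution with the geometric count for level j.
        dist = [sum(dist[s - m] * geom[m] for m in range(s + 1))
                for s in range(s_max + 1)]
    return dist

def tail_probability(n, theta, s):
    """Return (P(S_n <= s), P(S_n >= s)) for a given theta."""
    dist = prob_segregating_sites(n, theta, s)
    lower = sum(dist)
    upper = 1.0 - sum(dist[:s])
    return lower, upper

def watterson_theta(n, s):
    """Watterson's estimate of theta from s segregating sites in n sequences."""
    a_n = sum(1.0 / i for i in range(1, n))
    return s / a_n
```

With these definitions, `tail_probability(10, 100.0, 10)` returns a very small lower-tail probability, which is the sense in which an extreme θ is "highly unexpected" for a sample of 10 sequences with 10 segregating sites.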


Table 1. Rejection Probabilities of the Haplotype Tests and Probability of S Given Various θ Values

 
The reliability of the test should be assessed within the confidence interval of θ. To do this, we used the rejection algorithm suggested by Markovtsova, Marjoram, and Tavaré (2001) for θ values at the bounds of its confidence interval given S (table 1). We found a good overall fit between the nominal value of the test and its frequency of rejection (<12% in any case). However, note that θ values should be weighted by their probabilities given the data. Indeed, a difficulty with the procedure used by Markovtsova, Marjoram, and Tavaré (2001) is that the values of θ were chosen arbitrarily. The probability of obtaining the configuration of values used in each simulation is thus ignored, and we can hardly draw firm conclusions. Markovtsova, Marjoram, and Tavaré (2001) can only conclude that the confidence interval tends to narrow "as θ tends to zero or infinity."
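A rejection scheme of this kind can be sketched as follows, assuming no recombination and the infinite-site model. Genealogies are simulated conditional on θ by drawing events one at a time (with j lineages, the next event is a coalescence with probability (j − 1)/(j − 1 + θ), otherwise a mutation on a random lineage), and only genealogies with exactly s segregating sites are kept. The bounds `lower` and `upper` passed to `rejection_frequency` are hypothetical inputs standing for a confidence interval of K obtained conditional on S; names are illustrative and this is not the published program.

```python
import random

def simulate_k_and_s(n, theta, rng, s_limit):
    """One neutral genealogy given theta; return (K, S), or None once S > s_limit."""
    lineages = [1 << i for i in range(n)]   # bitmask of sampled descendants
    sites = []
    while len(lineages) > 1:
        j = len(lineages)
        if rng.random() < (j - 1) / (j - 1 + theta):
            a, b = rng.sample(range(j), 2)  # coalescence of a random pair
            merged = lineages[a] | lineages[b]
            lineages = [m for idx, m in enumerate(lineages) if idx not in (a, b)]
            lineages.append(merged)
        else:
            sites.append(rng.choice(lineages))  # mutation on a random lineage
            if len(sites) > s_limit:
                return None                 # cannot match the target S any more
    haplotypes = {tuple((site >> i) & 1 for site in sites) for i in range(n)}
    return len(haplotypes), len(sites)

def k_given_s_and_theta(n, s, theta, accepted=500, seed=1, max_tries=10**6):
    """Collect K values from simulations given theta that yield exactly S = s."""
    rng = random.Random(seed)
    ks = []
    for _ in range(max_tries):
        out = simulate_k_and_s(n, theta, rng, s)
        if out is not None and out[1] == s:
            ks.append(out[0])
            if len(ks) == accepted:
                break
    return ks

def rejection_frequency(ks, lower, upper):
    """Fraction of accepted genealogies whose K falls outside [lower, upper]."""
    return sum(k < lower or k > upper for k in ks) / len(ks)
```

The acceptance rate equals P(S_n = s | θ), so the loop becomes very slow for θ values under which the observed S is highly unlikely.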

Following Markovtsova, Marjoram, and Tavaré's (2001) approach, one sensible alternative procedure would be to weight the probability of rejection for different θ values by f(θ | S_n = s), the probability density of θ given S. In the notation of Markovtsova, Marjoram, and Tavaré (2001),

$$P(\text{rejection} \mid S_n = s) = \int_{\theta} P(\text{rejection} \mid S_n = s, \theta)\, f(\theta \mid S_n = s)\, d\theta. \tag{1}$$

The density of θ given S was obtained using a Bayesian approach similar to that followed by Fu (1998). We used a uniform prior distribution between 0 and 100, encompassing the range of values considered by Markovtsova, Marjoram, and Tavaré (2001). For the posterior distribution, Bayes' theorem gives

$$f(\theta \mid S_n = s) = \frac{P(S_n = s \mid \theta)}{P_{\text{prior}}(S_n = s)}, \tag{2}$$

where P(S_n = s | θ) is obtained following Hudson (1990) as described above and P_prior(S_n = s) = ∫_θ P(S_n = s | θ) dθ (integrated over the range of the prior distribution). The posterior probability density of θ given S was derived for a large number of θ values evenly distributed over the range of the prior distribution (fig. 1). For these θ values, we also computed the probability of rejection of the tests given by the rejection algorithm and weighted it by the posterior distribution according to equation (1). The posterior distribution was much narrower than the range considered by Markovtsova, Marjoram, and Tavaré (2001) and had negligible probability density for extreme values of θ (fig. 1). As a result, the corresponding rejection probabilities given by Depaulis and Veuille's (1998) confidence intervals were <3.6% for the K-test and <2.7% for the H-test (table 1, last column). In particular, the results of the Su(H) study (Depaulis, Brazier, and Veuille 1999) remained significant (P = 0.7% and 1.2%). Both tests appeared to be conservative using the Bayesian procedure (the fact that they were overly conservative was a consequence of the discreteness of the statistics). The effect of assuming a particular prior distribution is unknown, but this effect should be small provided the prior encompasses the confidence interval of θ (results not shown). For another neutrality test, Kelly (1997) tried several simulation schemes to account for the uncertainty about θ, including the Bayesian approach, and concluded that conditioning on S was the most reliable one. Note that the latter procedure is close, in this respect, to that proposed by Nielsen (2000) for analyzing SNP data, which (1) conditions the probabilities on the fact that loci are variable, (2) treats θ as a nuisance parameter, and (3) eventually eliminates it.
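The posterior density and the weighting step can be sketched numerically as below. The sketch assumes a uniform prior on (0, 100), evaluates P(S_n = s | θ) on an evenly spaced grid of midpoints, and normalizes by a simple Riemann sum. The likelihood is computed by convolving geometric mutation counts over coalescent levels, which is assumed here to be equivalent to the recursion cited in the text. The argument `reject_prob` is a hypothetical user-supplied function returning the rejection probability of a test for a given θ (for instance from a rejection algorithm); all names are illustrative.

```python
def prob_s_given_theta(n, theta, s):
    """P(S_n = s | theta) by convolving geometric counts over coalescent levels."""
    dist = [1.0] + [0.0] * s
    for j in range(2, n + 1):
        stop = (j - 1) / (j - 1 + theta)
        cont = theta / (j - 1 + theta)
        geom = [stop * cont ** m for m in range(s + 1)]
        dist = [sum(dist[t - m] * geom[m] for m in range(t + 1))
                for t in range(s + 1)]
    return dist[s]

def posterior_on_grid(n, s, theta_max=100.0, points=400):
    """Posterior density of theta given S = s under a uniform prior on (0, theta_max)."""
    step = theta_max / points
    grid = [(i + 0.5) * step for i in range(points)]   # interval midpoints
    like = [prob_s_given_theta(n, th, s) for th in grid]
    norm = sum(like) * step                            # numerical integral of the likelihood
    density = [value / norm for value in like]
    return grid, density, step

def weighted_rejection(reject_prob, grid, density, step):
    """Rejection probability averaged over the posterior density of theta."""
    return sum(reject_prob(th) * d for th, d in zip(grid, density)) * step
```

Because the uniform prior density is constant, it cancels between numerator and denominator, which is why only the likelihood appears in the normalization.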



Fig. 1. Posterior distribution of θ given S, obtained from equation (2) and assuming a uniform prior distribution of θ between 0 and 100. Parameters are identical to those in table 1: s = 10, n = 10 (dashed curve); s = 40, n = 20 (thin solid curve); s = 50, n = 50 (dotted curve); s = 44, n = 20 (bold solid curve).

 
Finally, as noted by Depaulis and Veuille (1998) and Wall and Hudson (2001), the distribution of haplotypes depends to a large degree on recombination, and the tests should be used with caution if recombination is not zero. The extreme values of θ used by Markovtsova, Marjoram, and Tavaré (2001) would be even more unlikely under a model that includes recombination, since recombination tends to decrease the stochastic variance of estimates of θ (Hudson 1983).

Acknowledgements

We thank N. Barton, M. Cobb, Y. X. Fu, I. Gordo, A. Navarro, and S. Otto for helpful discussions and comments on earlier versions of this manuscript, and S. Tavaré for providing Markovtsova, Marjoram, and Tavaré's (2001) manuscript via his website. A computer program that implements Markovtsova, Marjoram, and Tavaré's (2001) rejection algorithm and the H and K haplotype tests, conditional on either S or θ and on a value of the population recombination parameter, is available from smousset@snv.jussieu.fr. F.D. was supported by NERC, and S.M. and M.V. were supported by Groupe de Recherche GDR 1928 of the Centre National de la Recherche Scientifique.

Footnotes

Yun-Xin Fu, Reviewing Editor

1 Keywords: coalescent theory, simulations, neutrality tests, haplotype distribution

2 Address for correspondence and reprints: Frantz Depaulis, Institute of Cell, Animal and Population Biology, Ashworth Laboratory, King's Buildings, West Mains Road, Edinburgh EH9 3JT, United Kingdom. frantz.depaulis@ed.ac.uk

Literature Cited

    Depaulis, F., L. Brazier, and M. Veuille. 1999. Selective sweep at the Drosophila melanogaster Suppressor of Hairless locus and its association with the In(2L)t inversion polymorphism. Genetics 152:1017–1024.

    Depaulis, F., and M. Veuille. 1998. Neutrality tests based on the distribution of haplotypes under an infinite-site model. Mol. Biol. Evol. 15:1788–1790.

    Fu, Y. X. 1998. Probability of a segregating pattern in a sample of DNA sequences. Theor. Popul. Biol. 54:1–10.

    Hudson, R. R. 1983. Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol. 23:183–201.

    Hudson, R. R. 1990. Gene genealogies and the coalescent process. Pp. 1–44 in D. Futuyma and J. Antonovics, eds. Oxford surveys in evolutionary biology. Vol. 7. Oxford University Press, Oxford, England.

    Hudson, R. R. 1993. The how and why of generating gene genealogies. Pp. 23–36 in N. Takahata and A. G. Clark, eds. Mechanism of molecular evolution. Japan Scientific Societies Press, Tokyo.

    Kelly, J. K. 1997. A test of neutrality based on interlocus associations. Genetics 146:1197–1206.

    Kreitman, M., and R. R. Hudson. 1991. Inferring the evolutionary histories of the Adh and Adh-dup loci in Drosophila melanogaster from pattern of polymorphism and divergence. Genetics 127:565–582.

    Markovtsova, L., P. Marjoram, and S. Tavaré. 2001. On a test of Depaulis and Veuille. Mol. Biol. Evol. 18:1132–1133.

    Nielsen, R. 2000. Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics 154:931–942.

    Wall, J. D., and R. R. Hudson. 2001. Coalescent simulations and statistical tests of neutrality. Mol. Biol. Evol. 18:1134–1135.

    Watterson, G. A. 1975. On the number of segregation sites. Theor. Popul. Biol. 7:256–276.

Accepted for publication January 29, 2001.




