

MBE Advance Access originally published online on April 21, 2007
Molecular Biology and Evolution 2007 24(7):1562-1574; doi:10.1093/molbev/msm078

© The Author 2007. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Comparisons of Site- and Haplotype-Frequency Methods for Detecting Positive Selection

Kai Zeng*, Shuhei Mano†, Suhua Shi* and Chung-I Wu‡

* State Key Laboratory of Biocontrol and Key Laboratory of Gene Engineering of the Ministry of Education, Sun Yat-sen University, Guangzhou, China
† Graduate School of Natural Sciences, Nagoya City University, Nagoya, Japan
‡ Department of Ecology and Evolution, University of Chicago

E-mail: kzeng@uchicago.edu.


    Abstract
 
In this report, we use computer simulations to compare the power of various site- and haplotype-frequency tests to detect positive selection. Our results are the following. 1) Although haplotype-frequency tests that are conditional on the number of haplotypes (K) were developed for nonrecombining haplotypes, these tests are insensitive to recombination. Such tests, including the Ewens–Watterson (EW) test, can therefore be applied to recombining haplotypes. 2) Tests conditional on the number of segregating sites (S) become overly conservative in the presence of recombination. 3) The EW test is usually the most powerful test during the sweep phase, especially when the local recombination rate is high. 4) The "extended haplotype homozygosity" test relies heavily on prior knowledge of the target of selection. With that knowledge, it is the most powerful test, whereas in the absence of this prior information, the test has little power. We also study the sensitivities of the haplotype-frequency tests to background selection and various demographic forces. We find that these tests are sensitive to some forces other than positive selection. To alleviate the problem of low specificity, compound tests, such as the DH test (Zeng et al. 2006), may be a solution. In the companion paper (Zeng K, Shi S, Wu C-I, in preparation), we use the EW test to devise 2 compound tests, which are more powerful in detecting positive selection than DH, but are also relatively insensitive to demography.

Key Words: positive selection • haplotype-frequency tests • site-frequency tests


    Introduction
 
DNA sequence data are a rich source for detecting adaptive evolution. Many statistical methods have been proposed for this purpose (see Fay and Wu 2003; Nielsen 2005; Biswas and Akey 2006; Sabeti et al. 2006 for recent reviews). In this study, we focus on methods using within-species polymorphism data. These methods can be loosely classified into 3 categories: site-frequency, haplotype-frequency, and linkage disequilibrium (LD) methods.

Site-Frequency Methods
These methods require only frequencies of variants at polymorphic nucleotide sites. Linkage phase of these variants is neither required nor used. These methods are in general based on the infinite-site model (Kimura 1969; Watterson 1975) and utilize the site-frequency spectrum for detection (Fu 1995). Tajima's D test (1989; referred to as the D test) was the first test of this type. Many other D-like tests have since been proposed, including Fu and Li's tests (1993), Fu's tests (1996, 1997), Fay and Wu's H test (2000), and 2 newly derived ones, E and DH (Zeng et al. 2006). Many papers have examined the properties of these tests (Braverman et al. 1995; Simonsen et al. 1995; Przeworski 2002; Zeng et al. 2006). Other site-frequency methods represent a major departure in methodology from the tests mentioned above and are not considered in this study (e.g., Kim and Stephan 2002; Jensen et al. 2005).
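As an illustration of how a site-frequency statistic uses nothing but variant counts, the following is a minimal sketch of Tajima's D computed from an unfolded site-frequency spectrum. The function name `tajimas_d` and the input layout are assumptions made for this example; the constants follow the standard published definition of D, not code from this study.

```python
import math


def tajimas_d(sfs, n):
    """Tajima's D from an unfolded site-frequency spectrum.

    sfs[i - 1] is the number of segregating sites whose derived variant
    appears i times in a sample of n sequences (i = 1 .. n - 1).
    """
    if len(sfs) != n - 1:
        raise ValueError("sfs must have n - 1 entries")
    s = sum(sfs)
    if s == 0:
        raise ValueError("D is undefined without segregating sites")

    # Mean number of pairwise differences between sequences.
    pi = sum(c * 2.0 * i * (n - i) for i, c in enumerate(sfs, start=1))
    pi /= n * (n - 1)

    # Constants of the variance formula.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between two estimators of theta, scaled by its
    # estimated standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

A spectrum made only of singletons, such as `tajimas_d([5, 0, 0], 4)`, gives a negative value, which is the excess of low-frequency variants that the lower tail of the test picks up.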

Among these tests, the DH test is of some interest. Its proposal was motivated by the observation that Tajima's D test and Fay and Wu's H test are both powerful in detecting selection, but they are sensitive to different demographic factors and are affected by background selection to different degrees (Zeng et al. 2006). Therefore, by combining these 2 tests, the sensitivity of either test to a certain demographic factor (e.g., population growth) is counterbalanced by the insensitivity of the other test to the same factor. The compound test thus has high specificity to positive selection.

Haplotype-Frequency Methods
Methods in the second category require additional information on the linkage phase among variant sites and score a haplotype as an allele. They examine the level of haplotype polymorphism using simple summary statistics. The 3 widely used statistics are haplotype homozygosity (F), frequency of the most common haplotype (M), and configuration of haplotype frequencies (C). These measures are expressed either as statistics conditional on the number of haplotypes (K) or as statistics conditional on the number of segregating sites (S), elaborated below.

Statistics Conditional on the Number of Haplotypes (Alleles)
The statistics of F, M, or C conditional on K can be expressed in terms of Ewens' sampling distribution, which was derived under the no-recombination infinite-allele model (Ewens 1972; Karlin and McGregor 1972). One of the most remarkable results is the "invariant" property: the distributions of these conditional statistics (i.e., F|K, M|K, and C|K) are independent of the fundamental parameter of population genetics, θ (= 4Nu, where N is the effective population size of a diploid organism and u is the neutral mutation rate of the locus). This property is useful as the estimation of θ may introduce further uncertainties (Donnelly and Tavaré 1995). We shall refer to F|K (haplotype homozygosity conditional on the number of haplotypes) as the Ewens–Watterson (EW) test statistic (Watterson 1978). The other 2 similar test statistics, M|K and C|K, were proposed by Ewens (1973) and Slatkin (1994, 1996), respectively. All 3 test statistics are independent of θ and are in fact highly correlated when positive selection is in operation (see Results). Because the invariant property holds only when there is no intragenic recombination, the sensitivity to (or robustness against) recombination is crucial to the applicability of these 3 test statistics to DNA haplotype data and will be addressed in this report. There are several other test statistics that use the sampling distribution of K directly (Strobeck 1987; Fu 1996, 1997). The null distributions of these statistics are therefore dependent on θ. Properties of these tests have been studied elsewhere (Fu 1996, 1997) and are not considered here.
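The invariance can be seen directly from the Ewens sampling formula. The statement below is the standard textbook form rather than a reproduction from this article. The symbols a_j (the number of haplotypes observed exactly j times in the sample) and |S_n^{(k)}| (the unsigned Stirling number of the first kind) are notation introduced here for illustration.

```latex
\begin{align*}
P(a_1,\dots,a_n \mid \theta)
  &= \frac{n!}{\theta_{(n)}} \prod_{j=1}^{n} \frac{(\theta/j)^{a_j}}{a_j!},
  \qquad \theta_{(n)} = \theta(\theta+1)\cdots(\theta+n-1), \\
P(K = k \mid \theta)
  &= \frac{\lvert S_n^{(k)} \rvert \, \theta^{k}}{\theta_{(n)}}, \\
P(a_1,\dots,a_n \mid K = k)
  &= \frac{P(a_1,\dots,a_n \mid \theta)}{P(K = k \mid \theta)}
   = \frac{n!}{\lvert S_n^{(k)} \rvert} \prod_{j=1}^{n} \frac{1}{j^{a_j}\, a_j!},
  \qquad \text{since } \sum_{j} a_j = k .
\end{align*}
```

The factor θ^k and the rising factorial θ_(n) cancel in the ratio, so any statistic computed from the configuration (F, M, or C itself) has a null distribution free of θ once K is fixed.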

Statistics Conditional on the Number of Segregating Sites
These methods are based on the infinite-site model. Similar to the haplotype tests conditional on K mentioned above, these methods usually use F, M, and C to measure levels of variability, but conditional on the number of segregating sites S. Examples include the haplotype partition test of Hudson et al. (1994; referred to as Hudson's test), the haplotype diversity and haplotype number tests (Depaulis and Veuille 1998), and the full configuration test (Innan et al. 2005). Typically, the null distributions of these tests are obtained by doing coalescent simulation with the number of segregating sites fixed (Hudson 1993; Wall and Hudson 2001). Recombination is generally not considered in the null distribution.

LD Methods
Most methods in this category incorporate recombination in their null distributions. They use various statistics to summarize patterns of LD, including length of unrecombined segment (Slatkin and Bertorelle 2001), extent of haplotype sharing (Toomajian et al. 2003), and pairwise measure of LD (Kelly 1997; Wall 1999; Kim and Nielsen 2004). The most popular test in this category is probably the extended haplotype homozygosity (EHH) test (Sabeti et al. 2002) and its extensions (Hanchard et al. 2006; Voight et al. 2006; Wang et al. 2006). The principle underlying the EHH test is that, under neutrality, a new variant requires a long time to reach high frequency in the population. During this period, substantial recombination would have broken down the haplotype on which the mutation occurred. Therefore, long-range LD surrounding a site is considered as the signature of positive selection. Because the information on the target of selection is not easily available and is not used by other tests, our study of the EHH test is limited to the effect of this information on its power.

The main task of this report is to assess the power of haplotype- and site-frequency methods to detect positive selection (also referred to as genetic hitchhiking or selective sweep; Maynard Smith and Haigh 1974). Although several previous studies (Fu 1996, 1997; Depaulis et al. 2003, 2005) have addressed similar issues, there are important differences. First, in this study, we incorporate intragenic recombination and analyze its effects on the applicability and power of the tests. Second, we concentrate on the prefixation (or sweep) phase when the advantageous allele is sweeping through the population. In this phase, the patterns of polymorphism and LD show characteristics that are strongly associated with hitchhiking (Fay and Wu 2000; Stephan et al. 2006; Zeng et al. 2006). In contrast, the postfixation phase is characterized by an excess of low-frequency variants (i.e., accumulation of new mutations). Such excess is a general characteristic among many processes including population growth, background selection, and recovery from bottleneck (e.g., Fu 1997; Zeng et al. 2006). In practical terms, it may be difficult to distinguish the postfixation phase of sweep from those processes using the single-locus methods considered here. Third, we compare the sensitivities of the tests to other factors including background selection and demographic changes. We conclude that all tests except for the DH test are sensitive to some forces other than positive selection. In the companion study (Zeng K, Shi S, Wu C-I, in preparation), we develop new compound tests by using the novel properties of the EW test reported in this paper to circumvent the problem of low specificity.


    Methods
 
Statistical Tests
Site-Frequency Tests
Two site-frequency tests, Tajima's D test (1989) and the DH test (Zeng et al. 2006), were considered in the simulations. Fay and Wu's H test (2000) was not included because its behavior is quite similar to that of the DH test, but with lower specificity to positive selection (Zeng et al. 2006). To carry out the DH test, we define

$$f_S(p) = P\{\, d(X_S) \le d_p \ \text{and}\ h(X_S) \le h_p \,\} \qquad (1)$$
where X_S is a random variable representing random samples (of size n) under neutrality with S (S ≥ 1) segregating sites; d(X) is the value of the D statistic of sample X, and h(X) is that of the H statistic; d_p and h_p are critical values such that P{d(X_S) ≤ d_p} = P{h(X_S) ≤ h_p} = p. Note that we used the normalized H test here (Zeng et al. 2006). By definition, f_S(p) is the significance level of the compound DH test. In practice, for a prespecified significance level β, we solve f_S(p) = β numerically. We denote the solution as p*, and the d_p and h_p corresponding to p* as d* and h*, respectively. Then a sample with S segregating sites, denoted x_S, is called significant if d(x_S) ≤ d* and h(x_S) ≤ h*.
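The numerical solution for p* can be sketched as follows, assuming that paired null values of D and H have already been simulated with S fixed. The function names `calibrate_dh` and `dh_significant` are hypothetical, and using empirical quantiles with a bisection over ranks is one plausible way to solve the equation, not necessarily the procedure used in this study.

```python
def calibrate_dh(null_d, null_h, beta=0.05):
    """Find p*, d*, h* so that the joint lower-tail rejection rate is <= beta.

    null_d[i] and null_h[i] are the D and H values of the i-th neutral
    sample simulated with the observed number of segregating sites.
    """
    m = len(null_d)
    if m == 0 or m != len(null_h):
        raise ValueError("need equally long, non-empty null samples")
    sorted_d = sorted(null_d)
    sorted_h = sorted(null_h)

    def joint_rate(k):
        # Empirical f_S(p) at p = k / m: both marginal critical values
        # are the k-th smallest null value of their statistic.
        if k == 0:
            return 0.0
        d_k, h_k = sorted_d[k - 1], sorted_h[k - 1]
        hits = sum(1 for d, h in zip(null_d, null_h) if d <= d_k and h <= h_k)
        return hits / m

    # joint_rate is non-decreasing in k, so bisect for the largest k
    # whose joint rejection rate does not exceed beta.
    lo, hi = 0, m
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if joint_rate(mid) <= beta:
            lo = mid
        else:
            hi = mid - 1
    if lo == 0:
        raise ValueError("too few null samples for this significance level")
    return lo / m, sorted_d[lo - 1], sorted_h[lo - 1]


def dh_significant(d_obs, h_obs, d_star, h_star):
    """The compound test rejects only if both statistics are in the lower tail."""
    return d_obs <= d_star and h_obs <= h_star
```

When D and H are perfectly correlated under the null, p* equals the nominal level; the weaker their correlation, the larger p* becomes, because a joint rejection requires both marginal criteria to hold.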

Tests Based on the Ewens Sampling Formula
We focused primarily on the EW test in the simulations because this test shows better statistical properties than the other 2 tests in this category (see Results). The test statistic of the EW test is defined as

$$F \mid K, \qquad F = \sum_{i=1}^{K} \left( \frac{f_i}{n} \right)^{2} \qquad (2)$$
where F is the sample homozygosity; K is the number of haplotypes in a sample of size n; f_i is the number of occurrences of the i-th haplotype (Watterson 1978). Similarly, the test statistic of Ewens' test is defined as M|K, where M is the frequency of the most common haplotype (Ewens 1973), and the test statistic of Slatkin's exact test (1994, 1996) is C|K, where C = (f_1, ..., f_K) is the configuration of haplotype frequencies. Under the no-recombination infinite-allele model, Ewens (1972; see also Karlin and McGregor 1972) showed that, conditional on K, the distribution of C is independent of θ. Thus, F|K and M|K are also independent of θ. Note that our implementation of Slatkin's exact test (1996) was different from the original definition in that we defined the P value of an observed sample as the sum of probabilities of occurrence of configurations that are equal to or larger than that of the observed configuration. This change was made because when using the original definition of the test, we had little power to detect positive selection (unpublished data).
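For small samples, the null distribution of F conditional on K can be obtained by brute force, which may help make the test concrete. The sketch below enumerates every configuration of n genes into K haplotypes and weights each by the conditional Ewens probability; it is an illustration with hypothetical function names, not Slatkin's enumeration algorithm, and it becomes slow for samples as large as those simulated in this study.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def partitions(n, k, max_part=None):
    """Yield partitions of n into exactly k positive parts, largest first."""
    if max_part is None:
        max_part = n
    if k == 0:
        if n == 0:
            yield ()
        return
    for first in range(min(max_part, n - k + 1), 0, -1):
        if first * k < n:
            break  # remaining parts cannot be larger than `first`
        for rest in partitions(n - first, k - 1, first):
            yield (first,) + rest


def ewens_weight(config):
    """Unnormalized probability of a configuration given K (free of theta)."""
    weight = Fraction(1)
    for part in config:
        weight /= part
    for multiplicity in Counter(config).values():
        weight /= factorial(multiplicity)
    return weight


def homozygosity(config):
    n = sum(config)
    return Fraction(sum(c * c for c in config), n * n)


def ew_test(counts):
    """Return (F, upper-tail P value) for observed haplotype counts."""
    n, k = sum(counts), len(counts)
    f_obs = homozygosity(counts)
    total = tail = Fraction(0)
    for config in partitions(n, k):
        weight = ewens_weight(config)
        total += weight
        if homozygosity(config) >= f_obs:
            tail += weight
    return float(f_obs), float(tail / total)
```

For example, `ew_test([5, 1])` evaluates a sample of 6 genes carrying 2 haplotypes; the upper tail is used because a sweep is expected to inflate homozygosity.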

Haplotype Tests Conditional on the Number of Segregating Sites
The Hudson test rejects neutrality by investigating the probability of occurrence of a subset of sequences with low variation, that is, a major "haplotype class" (Hudson et al. 1994). Here, we used the simplified definition of the test statistic given in Innan et al. (2005):

Formula (3)
where S and K are, respectively, the numbers of segregating sites and haplotypes in the sample. The haplotype diversity test due to Depaulis and Veuille (1998) is similarly defined, with 1 − F in place of M in equation (3). But for consistency, we used F as the test statistic in our simulations. We will refer to this test as the DV test. The haplotype number test (K|S) proposed by Depaulis and Veuille (1998) did not outperform any tests in our simulations (Zeng K, unpublished data) and therefore is not discussed here. We did not consider the configuration test proposed by Innan et al. (2005) despite its similarity to Slatkin's exact test because its null distribution is difficult to simulate even for samples of moderate sizes.

The EHH Test
We used a single polymorphic site as the core in the EHH analyses. Because the goal was to look for signals of recent positive sweeps, we used the derived allele at the core as the core haplotype. The EHH statistic is defined as the probability that 2 randomly chosen genes carrying the core haplotype of interest are identical by descent for the entire interval from the core single nucleotide polymorphism to the point x (Sabeti et al. 2002). EHH values are therefore directional and can be calculated in both the 3' and 5' directions. Here, we adopted the definition of EHH proposed by Hanchard et al. (2006), that is, we calculated the EHH value at the variant farthest away from the core in the 3' direction and that at the farthest variant in the 5' direction and took the arithmetic mean of the 2 values.
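The calculation can be sketched as below. The sketch assumes phased haplotypes given as strings over the segregating sites only, with "1" marking the derived allele, and it uses identity by state over the interval as a stand-in for identity by descent. The function names and data layout are hypothetical and chosen for illustration.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, x, core_allele="1"):
    """EHH of the core allele from site index `core` out to site index `x`.

    Among chromosomes carrying `core_allele` at the core, this is the
    fraction of pairs that are identical at every segregating site
    between the core and x (both ends included).
    """
    carriers = [h for h in haplotypes if h[core] == core_allele]
    if len(carriers) < 2:
        raise ValueError("need at least 2 chromosomes carrying the core allele")
    lo, hi = min(core, x), max(core, x)
    classes = Counter(h[lo:hi + 1] for h in carriers)
    same_pairs = sum(comb(count, 2) for count in classes.values())
    return same_pairs / comb(len(carriers), 2)


def mean_two_sided_ehh(haplotypes, core, core_allele="1"):
    """Mean of EHH at the farthest variant on each side of the core."""
    last = len(haplotypes[0]) - 1
    left = ehh(haplotypes, core, 0, core_allele)
    right = ehh(haplotypes, core, last, core_allele)
    return 0.5 * (left + right)
```

With `haps = ["0110", "0110", "1110", "0011", "0000"]` and the third site as the core, four chromosomes carry the derived allele; one of the six pairs is identical out to the first site and three of six out to the last site.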

Determining Critical Values
In this study, all tests were one-sided and were conducted at the 5% significance level. For each test, we used the tail of the null distribution that maximizes the test's power to detect selection or minimizes its sensitivity to recombination. For example, for the EW and EHH tests, values falling in the upper 5% tail were considered significant; for D and H (performed jointly as the DH test), values falling in the lower tail were considered significant.

Site- and Haplotype-Frequency Tests
The null distributions of the EW and related tests were determined by Slatkin's exact enumeration algorithm (1994). Critical values of Tajima's D, the DH test, and haplotype tests conditional on S were obtained by coalescent simulation with the number of polymorphic sites fixed (Hudson 1993; Wall and Hudson 2001).

The EHH Test
For a sample with S segregating sites, to determine the level of significance of an EHH value at the given focal site (core), we need to generate neutral samples conditional on the frequency of the mutant allele at the core. To do this, first, we have to know the distribution of population frequency of the mutant allele at the core. Using the diffusion approximation, Griffiths (2003) has shown that, for a sample of size n, when the number of occurrences of the mutant allele at the core is b, the population frequency x (0 < x < 1) of the mutant allele has the following distribution:

Formula (4)
Second, we need to simulate frequency trajectories of the derived allele at the core, from the birth of the allele to the present. This can be done by utilizing the reversibility argument of the diffusion, that is, the backward diffusion process (which starts at present and goes backward to the time when the mutant allele arose) has the same distribution as the usual forward process conditional on absorption at zero (Griffiths 2003; Coop and Griffiths 2004; Ewens 2004). Properties of conditional diffusion were reviewed in Ewens (2004). Here, we used the pseudosampling device proposed by Kimura (1980) to simulate the diffusion process directly (see Griffiths [2003]; Coop and Griffiths [2004]; and Spencer and Coop [2004] for another way of simulating the conditional diffusion process).

In each replicate of the simulation, we 1) generated a population frequency x by sampling from equation (4); 2) simulated a trajectory that started at x and ended at 0 using the pseudosampling device; 3) constructed the genealogy by using the structured coalescent method with b lineages in the derived background and the rest in the ancestral background (Braverman et al. 1995; Zeng et al. 2006); 4) put the remaining S − 1 segregating sites on the genealogy according to the shape of the local gene trees; and 5) calculated EHH for this random sample. The critical values were then obtained by examining the empirical distribution produced above. In the simulation above, we also assumed that the true local recombination rate and haplotype phase were known.

Simulation Algorithms
We used the coalescent algorithms implemented in the software package "ms" (Hudson 2002) to generate random samples under the neutral model with intragenic recombination and under various demographic models. To simulate hitchhiking, the coalescent process with a selective phase was used (Kaplan et al. 1989; Braverman et al. 1995; Kim and Stephan 2002). This model assumes that the fitnesses of the 3 genotypes AA, Aa, and aa at the selected site are 1 + s, 1 + hs, and 1, respectively, where A is the derived allele, and s and h are the selection and dominance coefficients. The behavior of the selected allele is mainly governed by the scaled selective pressure α (= 2Ns) and h. Our implementation followed the description in Zeng et al. (2006) with the following modification: frequency trajectories of the selected allele were obtained by using the pseudosampling device described above rather than the approximate deterministic model. This modification takes into account the random effects in the early stage of a sweep. It also allows us to examine the effects of selective sweeps with an arbitrary level of dominance.
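A stochastic trajectory of the selected allele might be generated along the following lines. This is only a sketch under stated assumptions: time is measured in units of 2N generations, so the diffusion has drift α x(1 − x)[h + (1 − 2h)x] and variance x(1 − x) per unit time; the pseudosampling step is read here as replacing binomial sampling with a uniform variate of matching mean and variance; and the function name, step size `dt`, and starting frequency `x0` are hypothetical choices, not values taken from this study.

```python
import math
import random


def sweep_trajectory(alpha, h, x0, x_stop, dt=1e-4, rng=None):
    """Forward trajectory of a selected allele from x0 until it reaches x_stop.

    Trajectories in which the allele is lost are discarded and restarted,
    so the returned path is conditional on the allele reaching x_stop.
    """
    if not 0.0 < x0 < x_stop <= 1.0:
        raise ValueError("require 0 < x0 < x_stop <= 1")
    rng = rng or random.Random()
    half_width = math.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)]: mean 0, variance 1
    while True:
        x, path = x0, [x0]
        while 0.0 < x < x_stop:
            # Deterministic change due to selection with dominance h.
            drift = alpha * x * (1.0 - x) * (h + (1.0 - 2.0 * h) * x)
            # Random change due to genetic drift, via a uniform variate.
            noise = rng.uniform(-half_width, half_width)
            x += drift * dt + math.sqrt(x * (1.0 - x) * dt) * noise
            path.append(x)
        if x >= x_stop:
            path[-1] = x_stop  # stop the sweep exactly at the target frequency
            return path
        # Otherwise the allele was lost; start a fresh trajectory.
```

A call such as `sweep_trajectory(300, 0.5, 0.01, 0.95, rng=random.Random(1))` returns the list of frequencies visited; a coalescent with a selective phase would then be run backward along the reversed path.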

Background selection was simulated using the 2-locus model described in Hudson and Kaplan (1994). This model assumes that the deleterious locus is in mutation-selection equilibrium and is not recombining, but recombination can occur between the neutral locus and the deleterious locus. Here, we extended the Hudson and Kaplan model by allowing intragenic recombination within the neutral locus.


    Results
 
Effects of Intragenic Recombination on Haplotype-Frequency Tests
The haplotype-frequency tests considered in this study were derived under the no-recombination neutral model. To apply them to nuclear DNA sequence data, we need to make sure recombination does not inflate type I error. In table 1, we show the rejection probabilities of 3 tests, EW, Hudson's test, and Tajima's D test, generated with various combinations of θ and recombination rate (measured by ρ = 4Nr, where r is the recombination rate of the region per generation). Results of other tests are very similar and are not presented (e.g., the rejection rates of Ewens' test are very similar to those of the EW test). The most striking result is that the type I error rate of the EW test is insensitive to recombination. This conclusion is true over a wide range of combinations of parameter values (see supplementary table S1, Supplementary Material online), except for some rare occasions with extreme θ and ρ values, which are unlikely to cause problems in practice. When the EW test was conducted at other significance levels (e.g., 2%, 1%, and 0.5%), the conclusion still holds (unpublished data). In contrast, Tajima's D and Hudson's test become very conservative in the presence of intragenic recombination, in agreement with previous reports (Wall 1999; Depaulis et al. 2005). The difference between these 2 tests (Tajima's D and Hudson's test) and EW is largest when ρ/θ is large, in which case the rejection rates of Tajima's D and Hudson's test are effectively zero. This result may imply that, when the effect of recombination in the history of a sample is non-negligible, the EW test may be more powerful than Tajima's D and Hudson's test (see fig. 3B below). The difference in susceptibility to recombination between haplotype tests conditional on K and tests conditional on S may be due to the fact that K and S contain very different amounts of information about the local recombination rate. Multiple studies have shown that K is a good indicator of local recombinational pressure (Wall 2000; Innan et al. 2005). But S does not possess this property. In fact, in a simple regression analysis with ρ as the x axis and S as the y axis, no correlation was observed (unpublished data).


Table 1 The Actual Rejection Probabilities of the EW Test, Hudson's Test, and Tajima's D in the Presence of Recombination

 

FIG. 3.— Power of the tests to detect positive selection as a function of the size of the neutral region when the advantageous allele was at various frequencies (0.3, 0.6, 0.8, and 0.95 for each panel). Positive selection was simulated with α = 300, h = 0.5. Sample size was 90. Size of the neutral region was measured by θ, the population mutation rate. The selected site was placed in the middle of the neutral region. Intragenic recombination was included with 2 different intensities: (A) ρ/θ = 1 and (B) ρ/θ = 5, where ρ is the population recombination rate of the neutral region. Note that the scales on the y axis are different.

 
There is a difference between haplotype-frequency tests and site-frequency tests to which we need to pay special attention. For a site-frequency test, either tail of the null distribution responds to recombination in a very similar manner; however, this is not the case for haplotype tests conditional on S. For example, if we conduct the DV test as a 2-sided test, its type I error rate is significantly higher than the nominal significance level in the presence of recombination (Depaulis et al. 2005; supplementary fig. S1, Supplementary Material online).

Power of the Haplotype-Frequency Tests to Detect Positive Selection
In this section, we compare haplotype-frequency tests in their ability to detect positive selection. The following results allow us to select the more powerful tests to compare with the 2 site-frequency tests in the next section. We only discuss the prefixation phase of a selective sweep for reasons given in the Introduction. An example is given in figure 1. Several features can be observed. First, the power of all tests increases rapidly as a function of the frequency of the advantageous allele. When the frequency of the selected allele reaches 50%, all tests have obtained substantial power. Second, the power curves reach their peaks when the advantageous allele is at very high frequency and then start to decline before fixation. This is probably because at the time around fixation, levels of variation have been significantly reduced and thus, the samples tend to be uninformative. Third, tests conditional on K are generally more powerful than tests conditional on S, both before and after fixation (supplementary fig. S2, Supplementary Material online). This observation is true in most of our simulations (unpublished data; see Discussion).


FIG. 1.— Power of haplotype-frequency tests to detect positive selection before fixation. Selective sweep was modeled by the coalescent process described in Methods. We assumed the genic selection model (h = 0.5). The scaled selection coefficient α (= 2Ns, where N is the effective population size and s is the selection coefficient) of the selected allele was 300. Sample size (n) was 90. We assumed that the selected site was in the middle of an otherwise neutral region. The population mutation (θ) and recombination (ρ) rates of the neutral region were both 10.

 
Lastly, tests based on C, M, and F (full haplotype configuration, frequency of the most common haplotype, and haplotype homozygosity, respectively) yield very similar results. This is true whether they are conditional on K or S. The similarity is probably due to the presence of a predominant haplotype under hitchhiking (Barton 1998; Depaulis et al. 2003), resulting in a strong correlation among C, M, and F. Hereafter, we choose the EW test (i.e., F conditional on K) and the Hudson test (M conditional on S) to represent the 2 kinds of haplotype-frequency tests.

Note that the power curves in figure 1 were generated by assuming that the selected site was in the middle of an otherwise neutrally evolving region. It is well known that the strength of hitchhiking depends on the distance between the selected site and the hitchhiking variants (Maynard Smith and Haigh 1974; Stephan et al. 1992; Fay and Wu 2000; Kim and Stephan 2002). Even the LD pattern has been shown to depend on this spatial relationship (Kim and Nielsen 2004; Stephan et al. 2006). In the online supplementary material (supplementary fig. S3), we show that, while the power decreases when the distance between the selected site and the neutral region increases, the relative performances of these tests remain the same as in figure 1.

Contrasting the Power of Haplotype- and Site-Frequency Tests
In this section, we compare the 2 representative haplotype-frequency tests (EW and Hudson's test) with the 2 site-frequency tests, Tajima's D and the compound test, DH (Zeng et al. 2006). We first investigate the effects of intragenic recombination on the power of the tests by generating samples under hitchhiking with or without recombination (fig. 2; Tajima's D is less powerful than DH, and is not shown). The EW test is in fact more robust against recombination than the other 2 tests. In contrast, Hudson's test tends to suffer the most significant loss of power.


FIG. 2.— The effects of intragenic recombination on the power of the tests to detect positive selection. Positive selection was simulated with α = 150, h = 0.5. Sample size was 90. The selected site was placed in the middle of a neutral region. The population mutation rate θ of the neutral region was set to 10. Two different levels of intragenic recombination (ρ/θ = 0 or 1) were considered.

 
Figure 3 shows the power properties of the tests when the advantageous allele is at different frequencies. In general, haplotype-frequency tests are much more powerful than site-frequency tests before fixation. The EW test is usually the most powerful test, especially when recombination rate is much higher than mutation rate (fig. 3A vs. 3B). Site-frequency tests can outperform haplotype tests only when the advantageous allele is very close to fixation (e.g., the fourth panel in fig. 3A). In agreement with the previous report (Zeng et al. 2006), the DH test is more powerful than Tajima's D before fixation. In contrast, Hudson's test (representing haplotype tests conditional on S) is rarely the best test.

In figure 3A, all tests have good power to detect positive selection when f (frequency of the selected allele) is 0.95 (fourth panel). For this case, we further explore the effects of the distance between the selected site and the neutral region on the power (fig. 4). As expected, the power decreases when the scaled distance c_bet/s increases, where c_bet is the recombination distance between the site under selection and the left end of the neutral region (we assume the selected locus is on the left-hand side of the neutral region). Most interestingly, the 2 haplotype tests, especially the EW test, are more powerful than the 2 site-frequency tests over a large range of c_bet/s values. In practical terms, the EW test can detect positive selection further away from the site of selection than the other tests. This conclusion is generally true before fixation (unpublished data). The factor underlying the difference between haplotype- and site-frequency tests in the prefixation phase is likely LD. We know that the loss of diversity and the increase in LD are characteristic of this stage of hitchhiking (e.g., Stephan et al. 2006; Zeng et al. 2006). Site-frequency tests hence lose power because they do not consider the increase in the level of LD.


FIG. 4.— Power to detect positive selection as a function of the scaled distance between the selected site and the neutral region (c_bet/s) when the frequency of the advantageous allele was 95%. Positive selection was simulated with α = 300, h = 0.5. Sample size was 90. The population mutation (θ) and recombination (ρ) rates of the neutral region were both set to 10. We assumed the selected site was on the left-hand side of the neutral region. c_bet is the recombination distance between the left end of the neutral region and the selected site. Note that c_bet/s = 0 means the selected site is in the middle of the neutral region.

 
After fixation, the patterns of polymorphism and LD caused by hitchhiking will linger on for a short period of time, and what follows is a prolonged period when the population shows a deficit of intermediate- to high-frequency variants (Kim and Stephan 2002; Przeworski 2002; Kim and Nielsen 2004; Zeng et al. 2006). Tajima's D and the newly derived E test may be most powerful during this latter period (e.g., Zeng et al. 2006). Simulation results indicate a distinct point of transition between the 2 postfixation phases. In supplementary figure S4 (Supplementary Material online), it is around τ = 0.05 (τ is time after fixation, measured in units of 4N generations). However, this recovery phase when D and E are most powerful is in fact hardly distinguishable from the recovery phase of some demographic changes (e.g., population growth) or background selection (e.g., Fu 1997; Zeng et al. 2006). Effectively, the transition point may be the limit of our ability to detect the unique influence of selective sweep.

Power of the EHH Test
The main difference between the EHH test and the other tests mentioned above is that EHH requires the prior definition of the focal site (and core haplotype) and knowledge of the local recombination rate. In figure 5, we show power curves of the EHH test conducted in 2 different ways (i.e., with or without using the site under selection as the core). The power of the EHH test is indeed very high when the precise site under selection is used as the core (the top graph in fig. 5), but its power drops precipitously when the nearest segregating site is used as the core (the bottom graph). The power of the EW test lies between those of the 2 versions of the EHH test and is much closer to the one with the selected site as the core (fig. 5). Considering that the EW test does not directly model recombination and does not require knowledge of the location of the selected site, its performance is quite impressive. Two factors may result in the "core-dependent" property of the EHH test. First, the site under selection is the defining site for the special pattern of LD caused by the rapid fixation of the advantageous allele (Kim and Nielsen 2004; Stephan et al. 2006). That is, LD between 2 neutral loci can be eliminated by a sweep if they are separated by the selected site, whereas strong LD can be observed if the neutral loci are on the same side of the selected site. Hence, when the selected site is not used as the core, the signal of selection may be weakened. Second, ancestral alleles are much more common in the population than derived ones when a sweep starts. Thus, at the closest polymorphic site, the ancestral allele is more likely to hitchhike with the advantageous allele; that is, using the derived allele at this site causes a further loss in power. But, in the search for recent sweeps, there is no point in using ancestral alleles as the core haplotype. Recently, Voight et al. (2006) proposed an EHH-like test, the integrated haplotype score (IHS) test. They showed that this new test does not depend on precise knowledge of the location of the selected site. The study of other properties of the EHH test and the IHS test is beyond the scope of this paper.
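To make the core dependence concrete, the EHH statistic itself can be sketched in a few lines. The following is a minimal illustration, assuming the usual definition from Sabeti et al. (2002): among chromosomes carrying the core allele, EHH at a given marker is the probability that 2 randomly drawn chromosomes are identical at every site between the core and that marker. The function name `ehh`, the string encoding of haplotypes, and the toy sample are assumptions made for this illustration. The sketch computes only the statistic, not the critical values of the test described in Methods.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_site, core_allele, marker):
    """EHH of the core allele, extended from core_site out to marker (inclusive).

    haplotypes: list of equal-length strings, one character per segregating site.
    """
    lo, hi = sorted((core_site, marker))
    # Keep only the chromosomes that carry the chosen core allele.
    carriers = [h for h in haplotypes if h[core_site] == core_allele]
    if len(carriers) < 2:
        raise ValueError("need at least 2 chromosomes carrying the core allele")
    # Group the carriers by their extended haplotype over the interval.
    groups = Counter(h[lo:hi + 1] for h in carriers)
    # Fraction of carrier pairs that are identical over the whole interval.
    identical_pairs = sum(comb(c, 2) for c in groups.values())
    return identical_pairs / comb(len(carriers), 2)


# Hypothetical toy sample: 5 chromosomes, 4 segregating sites.
sample = ["1111", "1111", "1110", "0101", "0010"]
print(ehh(sample, 0, "1", 2))  # carriers are identical up to site 2
print(ehh(sample, 0, "1", 3))  # one carrier differs at site 3
```

Changing `core_site` or `core_allele` changes which chromosomes count as carriers, which is one way to see why the statistic depends so strongly on the choice of core.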


FIG. 5. Power of the EHH test to detect positive selection before fixation. Positive selection was simulated with α = 150, h = 0.5. Sample size was 90. The selected site was placed in the middle of the neutral region, whose population mutation and recombination rates were both 10. The EHH test was conducted in 2 different ways: EHH(selected site as core) meant the site under selection was used as the core, and EHH(site closest to selected site as core) meant the polymorphic site closest to the selected site was used as the core. We used the true population recombination rate (ρ = 10) to obtain the critical values of the EHH test (see Methods).

 
Sensitivities of the Tests to Other Driving Forces
All tests considered so far use the standard neutral model as the null hypothesis. This model makes a number of assumptions (for instance, constant population size). Violations of different assumptions can lead to similar test results (e.g., Zeng et al. 2006). In this section, we study the sensitivities of the haplotype- and site-frequency tests to background selection and various demographic factors. We will still use the EW test and Hudson's test to represent haplotype tests conditional on K and S, respectively.

Background Selection
Selection against linked deleterious mutations maintained by recurrent mutation is referred to as background selection (Charlesworth et al. 1993). Typically, background selection results in a reduction in effective population size; it also reduces the levels of polymorphism at linked neutral loci (Charlesworth et al. 1993; Hudson and Kaplan 1994). Previous studies suggest that positive selection and background selection can leave similar patterns of polymorphism at linked neutral loci (Charlesworth et al. 1995; Fu 1997). Thus, discrimination between these 2 very different modes of selection is important.

Here, we used the 2-locus model of Hudson and Kaplan (1994) to simulate background selection (see Methods). The recombination distance between the deleterious region and the left end of the neutral locus is denoted as Cbet. (We assume the deleterious region is on the left-hand side of the neutral region.) The conventional value sh = 0.02 was assumed in the simulations, where s and h are, respectively, the selection and dominance coefficients for mutant alleles. The results are shown in table 2. There are 2 general trends that apply to all tests except DH, which is essentially not affected by background selection. First, the sensitivity of the tests increases monotonically as U (the deleterious mutation rate per diploid genome) increases. When U is small (say, U < 0.05), the tests are not affected even if the population size is relatively small (say, N ≤ 5,000). However, as U gets larger, the tests exhibit inflated type I error rates. In this case, larger populations are affected to a lesser extent. The second trend is that, when the deleterious mutation rate is high, we can observe very large haplotype blocks with reduced variability. For example, when N is 10,000, U is 0.1, and Cbet is 5 × 10⁻⁴, the rejection rates of the EW test and Hudson's test are still 18.1% and 10.6%, respectively. More pronounced blocky haplotype structure can be seen when N is smaller (unpublished data).



 
Table 2 Sensitivity of Various Tests to Background Selection

 
Among the tests, the DH test has the lowest sensitivity. Hudson's test is more sensitive than DH, especially when the deleterious mutation rate is high. Tajima's D and the EW test are remarkably similar. The sensitivity of these 2 tests is due to the excess of low-frequency variants in the population (Fu 1997).
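Why an excess of low-frequency variants moves Tajima's D can be seen from the statistic itself. The sketch below assumes the standard definition of Tajima (1989) and takes an unfolded site-frequency spectrum as input. The function name `tajimas_d` and the 2 toy spectra are illustrative assumptions, not the simulation code used in this study.

```python
from math import sqrt


def tajimas_d(sfs, n):
    """Tajima's D from a site-frequency spectrum.

    sfs: dict mapping i (copies of the derived allele, 0 < i < n) to the
         number of segregating sites with that count.
    n:   sample size (number of sequences).
    """
    S = sum(sfs.values())
    if S == 0:
        raise ValueError("D is undefined without segregating sites")
    # Mean number of pairwise differences, computed per site and summed.
    pi = sum(c * 2 * i * (n - i) for i, c in sfs.items()) / (n * (n - 1))
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the 2 estimators of theta, scaled by its standard deviation.
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


# Hypothetical spectra for n = 10 sequences and 5 segregating sites.
only_singletons = {1: 5}   # excess of low-frequency variants
only_midfreq = {5: 5}      # excess of intermediate-frequency variants
print(tajimas_d(only_singletons, 10))  # negative
print(tajimas_d(only_midfreq, 10))     # positive
```

With the same number of segregating sites, a spectrum dominated by rare variants lowers the pairwise-difference estimator relative to Watterson's estimator and pushes D below zero, which is the direction produced by background selection, population growth, and the recovery phase after a sweep.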

Population Bottleneck
A population bottleneck is believed to be an important aspect of evolution for species that may have dispersed from their ancestral range (e.g., the "out-of-Africa" hypothesis of human evolution). Here, we model the bottleneck in the following way. The population size at present is N. Going backward in time, the population size decreases exponentially to βN. The length of this size-reducing period is tb (in units of 4N generations). The population size is then restored to N instantaneously (i.e., at time tb in the past). The effects of mild bottlenecks (say, β > 0.25) are not detectable by the tests considered (unpublished data). In figure 6, we present results for β = 0.05. The 4 tests can be divided into 2 groups according to their dynamics: DH and Hudson's test form the first group, and the other 2 tests form the second group. Tests in the first group are considerably more conservative than those in the second group. The DH test is the most conservative test; its rejection rate never goes above 12% (the peak is 11.7% at tb = 0.08), whereas the highest rejection rates of the other tests are all above 19%. The main difference between Tajima's D and the EW test is that the EW test tends to be more powerful in detecting more recent size reductions, whereas Tajima's D is better at detecting older ones.
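The bottleneck model above amounts to a simple piecewise function of time. The sketch below is one reading of that description, with time measured backward from the present in units of 4N generations; the function name `relative_size` and the handling of the boundary at exactly tb are assumptions made for illustration.

```python
def relative_size(t, beta, tb):
    """Population size at time t in the past, as a fraction of the present size N.

    Going backward in time, the size shrinks exponentially from N (t = 0)
    to beta * N (t = tb); further back than tb it is N again.
    """
    if t < 0 or tb <= 0 or not 0 < beta <= 1:
        raise ValueError("require t >= 0, tb > 0 and 0 < beta <= 1")
    if t <= tb:
        # Exponential decline: 1 at t = 0, beta at t = tb.
        return beta ** (t / tb)
    return 1.0


# Parameters discussed in the text: beta = 0.05, tb = 0.08.
for t in (0.0, 0.04, 0.08, 0.10):
    print(t, relative_size(t, beta=0.05, tb=0.08))
```

Multiplying the present-day θ by `relative_size` gives the population mutation rate at the same point in the past, so θ = 10 at present corresponds to θ1 = 0.5 at the deepest point of a bottleneck with β = 0.05.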


FIG. 6. The sensitivity of the tests to population bottleneck. We assumed that the population mutation rate θ at present was 10. Going backward in time, θ decreased exponentially to θ1 = 0.5 (i.e., β = θ1/θ = 0.05). The length of this size-reducing period is denoted as tb (measured in units of 4N generations, where N is the population size at present). Intragenic recombination was considered (ρ = θ = 10 at present). Sample size was 90.

 
Population Subdivision
In this section, we discuss simulation results obtained by using the symmetric finite island model (Wright 1931) with 2 or 3 subpopulations (demes). The level of differentiation is measured by the FST statistic, which can be calculated as

Formula (5)
where Ns is the number of breeding individuals per deme, m is the probability that each gene is an emigrant, and d is the number of demes (d = 2 or 3) (e.g., Slatkin 1991). In figure 7, we show the rejection rates of the tests as functions of FST. In both plots, we assume that all genes are sampled from one deme. In general, Tajima's D and the EW test behave in a similar manner and are rarely affected. The DH test is slightly less conservative than Tajima's D and the EW test, but its rejection rate rarely goes above 10%. Hudson's test behaves similarly to the DH test when d = 2 (fig. 7A), but when d = 3 it is not as conservative, and its rejection rate is higher than 10% for FST ≥ 0.2 (fig. 7B). Among the tests, Fay and Wu's H has the highest false-positive rate.
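Equation 5 is not reproduced in this text version. A commonly used expression for the symmetric finite island model, following Slatkin (1991) and written with the symbols defined above, is FST = 1 / (1 + 4 Ns m d^2 / (d − 1)^2). Assuming that form (it may differ in detail from the equation printed in the article), the sketch below converts between the migration rate and FST; the function names are illustrative.

```python
def fst_island(Ns, m, d):
    """FST for a symmetric finite island model (assumed form, after Slatkin 1991).

    Ns: breeding individuals per deme, m: migration probability per gene,
    d:  number of demes (d >= 2).
    """
    if d < 2:
        raise ValueError("the island model needs at least 2 demes")
    return 1.0 / (1.0 + 4.0 * Ns * m * d ** 2 / (d - 1) ** 2)


def migration_rate_for_fst(fst, Ns, d):
    """Invert fst_island: the migration rate m that yields a target FST."""
    if not 0 < fst < 1:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) * (d - 1) ** 2 / (4.0 * Ns * d ** 2)


# Hypothetical deme size; the target FST of 0.2 is the threshold discussed in the text.
Ns = 1000
for d in (2, 3):
    m = migration_rate_for_fst(0.2, Ns, d)
    print(d, m, fst_island(Ns, m, d))
```

Under this expression, FST depends on Ns and m only through the product Ns·m, so a target FST fixes the scaled migration rate once the number of demes is chosen.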


FIG. 7. Rejection rates of the tests as functions of FST when all sequences were sampled from one deme in a subdivided population with 2 or 3 demes. We used the symmetric finite island model with 2 (A) or 3 (B) demes. The population mutation rate of each deme (θs = 4Nsu, where Ns is the population size of a deme) was 2.5 for plot A (2 demes) or 1.67 for plot B (3 demes). That is, we fixed the total population mutation rate θT = 4Nsud = 5. Recombination occurred at rate ρs = θs in both plots. Sample size was 90.

 
When sequences sampled from different demes are pooled together in the analysis, the picture is somewhat more complicated. Our simulation results can be summarized as follows. 1) When sampling is extremely biased (i.e., the vast majority of the sequences are taken from one deme), the behaviors of the tests resemble those shown in figure 7. The tests, except for the H test, usually have rejection rates lower than 10% if FST ≤ 0.2. 2) If the sequences are sampled more or less uniformly across the demes, pooling samples does not have visible effects on the type I error rates of the tests (e.g., supplementary fig. S5, Supplementary Material online).

Joint Effects of Positive Selection and Demography
The results presented above were obtained by considering positive selection and demography separately. In reality, these 2 driving forces may have occurred concurrently. It is thus desirable to understand the effects of 2 interfering processes on the power of the tests. As an example, we assume that a selective sweep occurs in a population that has experienced a recent size expansion (fig. 8). The joint effects of sweep and growth on the EW test are quite complex (fig. 8A). The power curve of the EW test obtained under the interfering model (i.e., positive selection and growth together) often falls below those obtained under simpler models (i.e., only positive selection or only growth). In particular, there is a dip in power at tg ≈ 0.03. This is because the frequency of the selected allele at this time is moderately high (about 40%), and thus, linked variants are likely to have hitchhiked to intermediate frequency (due to size expansion, most of the variants initially linked to the selected allele are at low frequency). Note that the EW test is sensitive to both population growth and positive selection. Therefore, when EW rejects neutrality, it is hard to tell which factor causes the rejection. Similar results were obtained when other simple tests such as Tajima's D were considered (unpublished data). On the other hand, the DH test, which is sensitive to selection but insensitive to growth, is much less affected when both factors are present (fig. 8B). Based on these results and results obtained by considering different interfering models (unpublished data), we conclude that DH is usually able to distinguish between positive selection and demography.


FIG. 8. Power of the EW test (A) and the DH test (B) to detect positive selection when the population has experienced a recent size expansion. We assumed that the size of the population increased 10-fold instantly at time zero (tg = 0; in units of 4N generations, where N is the population size after growth) and that immediately after the expansion, an advantageous allele appeared (i.e., tg = 0). We showed the power of EW or DH to reject the neutral model, conditional on ultimate fixation of this advantageous allele (labeled "Selection + Growth"). For comparison, the power of the tests when only population expansion (labeled "Growth") or only positive selection (labeled "Selection") was in operation was also shown. In this figure, the selective sweep was simulated by using α = 150, h = 0.5. The dotted line showed the mean frequency of the selected allele tg time units after its birth. We assumed that the site under selection was in the middle of a neutral region where θ = ρ = 5 (scaled by using 4N).

 

    Discussion
The EW Test in the Presence of Recombination
In this study, we have shown that the EW test, albeit derived under the no-recombination neutral model, is applicable to DNA sequences with recombination. Previously, Hudson (1983) found that the conditional distributions of given values of K depend on θ and ρ in a complex way, but in general the effects of recombination do not seem to be very significant. Our results suggest that, averaging over all possible values of K, recombination does not affect the type I error rate of the EW test appreciably. Because K contains information on the local mutation and recombination rates, it may not be surprising that haplotype tests conditional on K have higher power to detect positive selection than those conditional on S. In particular, the results of figure 3B indicate that the EW and related tests may be useful for analyzing data from organisms like Drosophila, whose recombination rate is severalfold higher than the mutation rate.

Site-Frequency Tests versus Haplotype-Frequency Tests
By summarizing simulation results obtained under various combinations of parameter values (10 ≤ n ≤ 120, 0.5 ≤ θ ≤ 30, 0 ≤ ρ/θ ≤ 15, and 50 ≤ α ≤ 1000), we find that the most important factor determining the relative performance of the DH test (representing site-frequency tests) and the EW test (representing haplotype-frequency tests) is the absolute level of polymorphism of the sample (i.e., values of S and n). For example, when there is only one polymorphic site in the sample whose derived allele occurs in 48 of the 50 samples, the P value of the DH test is 0.005, but that of the EW test is 0.344. In other words, when S and/or n are small (say, S < 10 and/or n ≤ 10), the EW test may not be the most powerful test among the tests considered. In those cases, we recommend the DH test for 2 reasons. First, the DH test does not require the identification of linkage phase. Second, the DH test is relatively insensitive to forces other than directional selection.
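The single-site example can be checked by hand. Conditional on the number of haplotypes K, the Ewens sampling formula gives each unordered haplotype configuration a probability proportional to 1/∏ (j^a_j · a_j!), where a_j is the number of haplotypes present in j copies, and this no longer depends on θ. The sketch below enumerates all configurations for a small sample and sums the probability of those whose homozygosity is at least as large as the observed one. It assumes a one-sided test on high homozygosity; the function names are illustrative, and the implementation used in this study (see Methods) may differ.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def partitions(n, k, max_part=None):
    """Yield all partitions of n into exactly k positive parts (non-increasing)."""
    if max_part is None:
        max_part = n
    if k == 0:
        if n == 0:
            yield ()
        return
    smallest_first = -(-n // k)            # ceil(n / k): the largest part is at least this
    largest_first = min(max_part, n - (k - 1))
    for first in range(largest_first, smallest_first - 1, -1):
        for rest in partitions(n - first, k - 1, first):
            yield (first,) + rest


def ewens_weight(config):
    """Unnormalized probability of a configuration given K under the Ewens formula."""
    w = Fraction(1)
    for size, mult in Counter(config).items():
        w /= size ** mult * factorial(mult)
    return w


def ew_pvalue(counts):
    """P(homozygosity >= observed | n, K) by exact enumeration (small samples only)."""
    n, k = sum(counts), len(counts)
    # Sum of squared counts equals n^2 times the homozygosity, so it ranks identically.
    observed = sum(c * c for c in counts)
    total = tail = Fraction(0)
    for config in partitions(n, k):
        w = ewens_weight(config)
        total += w
        if sum(c * c for c in config) >= observed:
            tail += w
    return float(tail / total)


# One segregating site, derived allele in 48 of 50 sequences: 2 haplotypes.
print(round(ew_pvalue([48, 2]), 3))
```

For the configuration in the example (haplotype counts 48 and 2), the sketch gives a P value of about 0.344. Full enumeration grows quickly with n and K, so larger data sets would need a recursive or Monte Carlo evaluation of the same conditional distribution.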

The Effects of Other Factors—Dominance and Haplotype Inference
So far, we have assumed the genic selection model for the advantageous allele (i.e., h = 0.5). A recent study has shown that the level of dominance can have an important impact on the effect of hitchhiking (Teshima and Przeworski 2006). Based on some exploratory simulation studies with h = 0.1 or 0.9, we find that recessive advantageous alleles are indeed more difficult to detect than dominant ones, in agreement with the results of Teshima and Przeworski. Nevertheless, the relative performance of the tests reported above remains the same. Furthermore, when the neutral region is very close to the selected site, the effect of dominance tends to be weak (unpublished data). Our analysis also assumes that the haplotypes are known. But, in many nonmodel organisms, they can only be inferred. When the phase inference is incorrect, type I error rates of the haplotype tests may be inflated (unpublished data).

The Issue of Specificity
We summarize qualitatively the power (or sensitivity) of the tests to detect positive selection, background selection, and various demographic scenarios in table 3. It is clear that all simple tests can be sensitive to more than one driving force. In comparison, the compound test, DH, is affected by factors other than positive selection to a much lesser extent. This is because the 2 component tests of DH, Tajima's D and Fay and Wu's H, are both powerful in detecting selection but are sensitive to other driving forces in a somewhat mutually exclusive way (table 3; Zeng et al. 2006). Furthermore, figure 8B indicates that the DH test is reasonably powerful even when positive selection occurs in a nonequilibrium background. These results suggest that the compound test DH provides a novel way of separating positive selection from demography. On the other hand, we note that the EW test and Tajima's D are similar in many respects and that the EW test is very powerful in detecting positive selection. These observations suggest that, by combining the EW test and H or DH, we may be able to construct new compound tests with improved performance. We will explore this possibility in the companion paper (Zeng K, Shi S, Wu C-I, in preparation).



 
Table 3 A Qualitative Summary of the Power or Sensitivity of the Tests to Various Driving Forces

 

    Supplementary Material
Supplementary table S1 and figures S1, S2, S3, S4, and S5 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Acknowledgements
We thank Drs R. Hudson, T. Nagylaki, J. Pritchard, and 2 anonymous reviewers for their helpful comments. K.Z. is supported by Sun Yat-sen University and the Kaisi Fund. S.S. is supported by grants from the National Natural Science Foundation of China (30230030, 30470119, 30300033, and 30500049). C.-I.W. is supported by National Institutes of Health grants and an OOCS grant from the Chinese Academy of Sciences.


    Footnotes
 
Jianzhi Zhang, Associate Editor


    References

    Barton NH. The effect of hitch-hiking on neutral genealogies. Genet Res (1998) 72:123–133.

    Biswas S, Akey JM. Genomic insights into positive selection. Trends Genet (2006) 22:437–446.

    Braverman JM, Hudson RR, Kaplan NL, Langley CH, Stephan W. The hitchhiking effect on the site frequency spectrum of DNA polymorphisms. Genetics (1995) 140:783–796.

    Charlesworth B, Morgan MT, Charlesworth D. The effect of deleterious mutations on neutral molecular variation. Genetics (1993) 134:1289–1303.

    Charlesworth D, Charlesworth B, Morgan MT. The pattern of neutral molecular variation under the background selection model. Genetics (1995) 141:1619–1632.

    Coop G, Griffiths RC. Ancestral inference on gene trees under selection. Theor Popul Biol (2004) 66:219–232.

    Depaulis F, Mousset S, Veuille M. Power of neutrality tests to detect bottlenecks and hitchhiking. J Mol Evol (2003) 57(Suppl 1):S190–S200.

    Depaulis F, Mousset S, Veuille M. Detecting selective sweeps with haplotype tests. In: Nurminsky D, ed. Selective sweep. (2005) Georgetown (DC): Landes Bioscience. 33–54.

    Depaulis F, Veuille M. Neutrality tests based on the distribution of haplotypes under an infinite-site model. Mol Biol Evol (1998) 15:1788–1790.

    Donnelly P, Tavaré S. Coalescents and genealogical structure under neutrality. Annu Rev Genet (1995) 29:401–421.

    Ewens WJ. The sampling theory of selectively neutral alleles. Theor Popul Biol (1972) 3:87–112.

    Ewens WJ. Testing for increased mutation rate for neutral alleles. Theor Popul Biol (1973) 4:251–258.

    Ewens WJ. Mathematical population genetics (2004) Berlin (Germany): Springer-Verlag.

    Fay JC, Wu CI. Hitchhiking under positive Darwinian selection. Genetics (2000) 155:1405–1413.

    Fay JC, Wu CI. Sequence divergence, functional constraint, and selection in protein evolution. Annu Rev Genomics Hum Genet (2003) 4:213–235.

    Fu YX. Statistical properties of segregating sites. Theor Popul Biol (1995) 48:172–197.

    Fu YX. New statistical tests of neutrality for DNA samples from a population. Genetics (1996) 143:557–570.

    Fu YX. Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics (1997) 147:915–925.

    Fu YX, Li WH. Statistical tests of neutrality of mutations. Genetics (1993) 133:693–709.

    Griffiths RC. The frequency spectrum of a mutation, and its age, in a general diffusion model. Theor Popul Biol (2003) 64:241–251.

    Hanchard NA, Rockett KA, Spencer C, Coop G, Pinder M, Jallow M, Kimber M, McVean G, Mott R, Kwiatkowski DP. Screening for recently selected alleles by analysis of human haplotype similarity. Am J Hum Genet (2006) 78:153–159.

    Hudson RR. Properties of a neutral allele model with intragenic recombination. Theor Popul Biol (1983) 23:183–201.

    Hudson RR. The how and why of generating gene genealogies. In: Takahata N, Clark AG, eds. Mechanisms of molecular evolution. (1993) Sunderland (MA): Sinauer Associates. 23–36.

    Hudson RR. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics (2002) 18:337–338.

    Hudson RR, Bailey K, Skarecky D, Kwiatowski J, Ayala FJ. Evidence for positive selection in the superoxide dismutase (Sod) region of Drosophila melanogaster. Genetics (1994) 136:1329–1340.

    Hudson RR, Kaplan NL. Gene trees with background selection. In: Golding B, ed. Non-neutral evolution: theories and molecular data. (1994) London: Chapman & Hall. 140–153.

    Innan H, Zhang K, Marjoram P, Tavaré S, Rosenberg NA. Statistical tests of the coalescent model based on the haplotype frequency distribution and the number of segregating sites. Genetics (2005) 169:1763–1777.

    Jensen JD, Kim Y, DuMont VB, Aquadro CF, Bustamante CD. Distinguishing between selective sweeps and demography using DNA polymorphism data. Genetics (2005) 170:1401–1410.

    Kaplan NL, Hudson RR, Langley CH. The "hitchhiking effect" revisited. Genetics (1989) 123:887–899.

    Karlin S, McGregor J. Addendum to a paper of W. Ewens. Theor Popul Biol (1972) 3:113–116.

    Kelly JK. A test of neutrality based on interlocus associations. Genetics (1997) 146:1197–1206.

    Kim Y, Nielsen R. Linkage disequilibrium as a signature of selective sweeps. Genetics (2004) 167:1513–1524.

    Kim Y, Stephan W. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics (2002) 160:765–777.

    Kimura M. The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics (1969) 61:893–903.

    Kimura M. Average time until fixation of a mutant allele in a finite population under continued mutation pressure: studies by analytical, numerical, and pseudo-sampling methods. Proc Natl Acad Sci USA (1980) 77:522–526.

    Maynard Smith J, Haigh J. The hitch-hiking effect of a favourable gene. Genet Res (1974) 23:23–35.

    Nielsen R. Molecular signatures of natural selection. Annu Rev Genet (2005) 39:197–218.

    Przeworski M. The signature of positive selection at randomly chosen loci. Genetics (2002) 160:1179–1189.

    Sabeti PC, Reich DE, Higgins JM, et al. (17 co-authors). Detecting recent positive selection in the human genome from haplotype structure. Nature (2002) 419:832–837.

    Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES. Positive natural selection in the human lineage. Science (2006) 312:1614–1620.

    Simonsen KL, Churchill GA, Aquadro CF. Properties of statistical tests of neutrality for DNA polymorphism data. Genetics (1995) 141:413–429.

    Slatkin M. Inbreeding coefficients and coalescence times. Genet Res (1991) 58:167–175.

    Slatkin M. An exact test for neutrality based on the Ewens sampling distribution. Genet Res (1994) 64:71–74.

    Slatkin M. A correction to the exact test based on the Ewens sampling distribution. Genet Res (1996) 68:259–260.

    Slatkin M, Bertorelle G. The use of intraallelic variability for testing neutrality and estimating population growth rate. Genetics (2001) 158:865–874.

    Spencer CC, Coop G. SelSim: a program to simulate population genetic data with natural selection and recombination. Bioinformatics (2004) 20:3673–3675.

    Stephan W, Song YS, Langley CH. The hitchhiking effect on linkage disequilibrium between linked neutral loci. Genetics (2006) 172:2647–2663.

    Stephan W, Wiehe THE, Lenz MW. The effect of strongly selected substitutions on neutral polymorphism: analytical results based on diffusion-theory. Theor Popul Biol (1992) 41:237–254.

    Strobeck C. Average number of nucleotide differences in a sample from a single subpopulation: a test for population subdivision. Genetics (1987) 117:149–153.

    Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics (1989) 123:585–595.

    Teshima KM, Przeworski M. Directional positive selection on an allele of arbitrary dominance. Genetics (2006) 172:713–718.

    Toomajian C, Ajioka RS, Jorde LB, Kushner JP, Kreitman M. A method for detecting recent selection in the human genome from allele age estimates. Genetics (2003) 165:287–297.

    Voight BF, Kudaravalli S, Wen X, Pritchard JK. A map of recent positive selection in the human genome. PLoS Biol (2006) 4:e72.

    Wall JD. Recombination and the power of statistical tests of neutrality. Genet Res (1999) 74:65–79.

    Wall JD. A comparison of estimators of the population recombination rate. Mol Biol Evol (2000) 17:156–163.

    Wall JD, Hudson RR. Coalescent simulations and statistical tests of neutrality. Mol Biol Evol (2001) 18:1134–1135.

    Wang ET, Kodama G, Baldi P, Moyzis RK. Global landscape of recent inferred Darwinian selection for Homo sapiens. Proc Natl Acad Sci USA (2006) 103:135–140.

    Watterson GA. On the number of segregating sites in genetical models without recombination. Theor Popul Biol (1975) 7:256–276.

    Watterson GA. The homozygosity test of neutrality. Genetics (1978) 88:405–417.

    Wright S. Evolution in Mendelian populations. Genetics (1931) 16:97–159.

    Zeng K, Fu YX, Shi S, Wu CI. Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics (2006) 174:1431–1439.

Accepted for publication April 14, 2007.

