

MBE Advance Access originally published online on June 8, 2007
Molecular Biology and Evolution 2007 24(8):1898-1908; doi:10.1093/molbev/msm119

© The Author 2007. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Compound Tests for the Detection of Hitchhiking Under Positive Selection

Kai Zeng*,1, Suhua Shi* and Chung-I Wu†

* State Key Laboratory of Biocontrol and Key Laboratory of Gene Engineering of the Ministry of Education, Sun Yat-sen University, Guangzhou, China
† Department of Ecology and Evolution, University of Chicago

E-mail: kzeng@uchicago.edu.


    Abstract
 TOP
 Abstract
 Introduction
 Methods
 Results
 Discussion
 Conclusion
 Supplementary Material
 Acknowledgements
 References
 
Many statistical tests have been developed for detecting positive selection. Most of these tests draw conclusions based on significant deviations from the patterns of polymorphism predicted by the neutral model. However, many non-equilibrium forces may cause similar deviations, and thus the tests usually have low statistical specificity to positive selection. The main challenge is hence to construct test statistics that are reasonably powerful in detecting positive selection, but are relatively insensitive to other forces. Recently, Zeng et al. (2006) proposed a new test, DH, which is a compound of Tajima's D and Fay and Wu's H, and showed that DH has reasonably high statistical specificity to positive selection. In this report, we expand the idea of a compound test by combining Fay and Wu's H or DH with the Ewens-Watterson (EW) test. We refer to these 2 new tests as HEW and DHEW, respectively. Compared to the DH test, HEW and DHEW are more robust against the presence of recombination, and are also more powerful in detecting positive selection. Furthermore, the DHEW test, similar to DH, is also relatively insensitive to background selection and demography. The HEW test, on the other hand, tends to be somewhat less conservative than DH and DHEW in some cases.

Key Words: compound tests • positive selection • demography • background selection


    Introduction
Clarifying the roles of genetic drift and natural selection in the course of evolution is an issue of fundamental importance in the study of evolutionary biology (Kimura 1983; Gillespie 1991). A widely adopted approach is to derive statistical tests under neutrality (the null hypothesis), and use these tests to detect whether the data are significantly different from the neutral expectation. A significant deviation is usually considered to be a consequence of positive selection. (These tests have been reviewed extensively by many authors: Fay and Wu 2003; Nielsen 2005; Biswas and Akey 2006; Sabeti et al. 2006.) However, the neutral theory is a highly simplified mathematical model with many assumptions (e.g., random mating, constant population size). Thus, a significant deviation can usually be explained by more than one alternative model (Simonsen et al. 1995; Przeworski 2002; Zeng et al. 2006). To make the problem even more difficult, different processes may sometimes leave very similar footprints on genetic variation, and are therefore hard to distinguish. For example, after the fixation of an advantageous allele, the population tends to have an excess of low-frequency variants in linked neutral regions (Braverman et al. 1995; Zeng et al. 2006). However, a similar pattern can also be produced by a recent population expansion (Fu 1997; Zeng et al. 2006). With these complications, it is usually hard to interpret results obtained by these statistical tests. This raises the need for statistical tests that have high specificity to positive selection.

Many methods have been proposed for specifically detecting positive selection. Broadly speaking, these methods are derived by using one of the 4 general approaches described below. The first approach, loosely speaking, is to develop methods in the HKA framework (Hudson et al. 1987; Innan 2006). The underlying principle is that positive selection tends to affect only a small number of loci that are closely linked to the advantageous allele, but demographic changes affect all loci in the genome equally. Usually, demographic changes are directly modeled, and genomic data are then used to estimate the parameters. Genes are considered as targets of positive selection if their patterns of genetic variation are incompatible with those predicted by the estimated demographic parameters (Wall et al. 2002; Akey et al. 2004; Li and Stephan 2006; Thornton and Andolfatto 2006). The difficulty with this approach is that it requires prior knowledge of the demographic history of the species. But this information is usually not available, or is known only vaguely. Furthermore, it is not clear whether we can obtain the same results if we assume a different demographic model.

The second approach is more empirical. It scans patterns of variation over many loci, and considers loci in the tails of the empirical distribution as candidate targets of selection (Akey et al. 2002; Carlson et al. 2005; Voight et al. 2006; Wang et al. 2006). This approach is appealing in that it avoids specifying a complex demographic model. Nonetheless, the support for positive selection is usually obtained by comparing the empirical distribution with distributions obtained by simulating various demographic models. Furthermore, a recent study suggests that this kind of analysis may suffer from high rates of false positives and false negatives (Teshima et al. 2006).

The third approach relies on a composite likelihood method, which treats nucleotide sites as though they were statistically independent. Methods derived by this approach differ from those mentioned above in that the null hypothesis is not a specific population genetic model, but is derived from the background pattern of variation in the data itself (Jensen et al. 2005; Nielsen et al. 2005).

The fourth approach, which is the focus of this paper, has a simple structure. It requires no estimation of parameters, and is most suitable to data from a single locus. The first test derived by this approach is called the DH test (Zeng et al. 2006), which is a compound test combining Tajima's D (Tajima 1989) and Fay and Wu's H (Fay and Wu 2000). The underlying idea is that Tajima's D and Fay and Wu's H are both powerful in detecting positive selection, but are sensitive to other processes in a mutually exclusive manner (Zeng et al. 2006). Therefore, by properly combining these 2 tests, the resulting test (DH) obtains relatively high statistical specificity to positive selection.

The DH test, Tajima's D, and Fay and Wu's H only use the site-frequency spectrum (Fu 1995) for detection. It is known that these methods tend to be too conservative in the presence of intragenic recombination (Wall 1999; Przeworski et al. 2001; Zeng et al. 2007). In the companion study (Zeng et al. 2007), we have shown that these site-frequency methods are usually not as powerful in detecting positive selection as methods based on the haplotype-frequency spectrum such as the Ewens-Watterson (EW) test (Watterson 1978). Furthermore, we have shown that the EW test is insensitive to recombination, and is usually very powerful in detecting positive selection, especially when the recombination rate is high. In the light of these recent results, we expand the idea of a compound test by constructing new test statistics which make use of both site- and haplotype-frequency spectra.

In this paper, we propose two new tests by combining Fay and Wu's H or the DH test with the Ewens-Watterson (EW) test. We refer to these two tests as HEW (H + EW) and DHEW (DH + EW), respectively. The effect of positive selection is studied using the model first introduced by Maynard Smith and Haigh (1974). By doing extensive computer simulations, we try to address the following questions: (1) Are HEW and DHEW more robust against the presence of intragenic recombination than is the DH test? (2) Are HEW and DHEW more powerful than DH in detecting positive selection? (3) Are HEW and DHEW sensitive to background selection and demographic changes?


    Methods
Statistical Tests
We first describe 4 previously developed tests (Tajima's D, Fay and Wu's H, Ewens-Watterson's test, and the DH test), and then we describe 2 new tests, HEW and DHEW.

Tajima's D and Fay and Wu's H
Tajima's well-known test statistic is defined as:

$$D = \frac{\theta_{\pi} - \theta_{W}}{\sqrt{\mathrm{Var}(\theta_{\pi} - \theta_{W})}} \tag{1}$$

(Tajima 1989). In equation (1), θπ (Tajima 1983) and θW (Watterson 1975) are 2 unbiased estimators of the population mutation rate θ (= 4Nu, where N is the effective population size and u is the neutral mutation rate of the locus).

Fay and Wu's H contrasts the abundance of high-frequency variants with the abundance of intermediate-frequency ones (Fay and Wu 2000). Here we use the normalized H statistic, defined as:

$$H = \frac{\theta_{\pi} - \theta_{L}}{\sqrt{\mathrm{Var}(\theta_{\pi} - \theta_{L})}} \tag{2}$$

(Zeng et al. 2006). θL above is a new unbiased estimator of θ, and by using θL, the variance term Var(θπ − θL) can be derived (Zeng et al. 2006).
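As an illustration of equations (1) and (2), the following minimal Python sketch computes Tajima's D and the normalized H from an unfolded site-frequency spectrum. The function names and the dictionary representation of the spectrum are assumptions made here for illustration. The variance of θπ − θW uses the standard constants of Tajima (1989); the variance of θπ − θL is written as we understand the expression of Zeng et al. (2006) and should be checked against that paper before any serious use.

```python
from math import sqrt

def _harmonic_sums(n):
    # a1 = sum 1/i and a2 = sum 1/i^2 for i = 1..n-1
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    return a1, a2

def theta_estimators(sfs, n):
    """sfs maps derived-allele count i (1..n-1) to the number of sites xi_i."""
    S = sum(sfs.values())
    a1, _ = _harmonic_sums(n)
    pairs = n * (n - 1) / 2.0
    theta_w = S / a1
    theta_pi = sum(i * (n - i) * x for i, x in sfs.items()) / pairs
    theta_l = sum(i * x for i, x in sfs.items()) / (n - 1)
    return S, theta_pi, theta_w, theta_l

def tajima_d(sfs, n):
    S, theta_pi, theta_w, _ = theta_estimators(sfs, n)
    a1, a2 = _harmonic_sums(n)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Estimated variance of theta_pi - theta_W (Tajima 1989)
    return (theta_pi - theta_w) / sqrt(e1 * S + e2 * S * (S - 1))

def normalized_h(sfs, n):
    S, theta_pi, _, theta_l = theta_estimators(sfs, n)
    a1, a2 = _harmonic_sums(n)
    b_next = a2 + 1.0 / (n * n)           # sum of 1/i^2 for i = 1..n
    theta = S / a1                        # estimate of theta
    theta_sq = S * (S - 1) / (a1 ** 2 + a2)  # estimate of theta^2
    # Variance of theta_pi - theta_L (assumed form, after Zeng et al. 2006)
    var = (theta * (n - 2) / (6.0 * (n - 1))
           + theta_sq * (18 * n * n * (3 * n + 2) * b_next
                         - (88 * n ** 3 + 9 * n * n - 13 * n + 6))
           / (9.0 * n * (n - 1) ** 2))
    return (theta_pi - theta_l) / sqrt(var)
```

For example, a spectrum made only of singletons gives a negative D, whereas a spectrum made only of high-frequency derived variants gives a negative H, in line with the way the two statistics are used in the lower tail.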

The Ewens-Watterson test
The test statistic of the Ewens-Watterson (EW) test is:

$$F = \sum_{i=1}^{K} f_i^{2} \tag{3}$$

where F is the sample homozygosity, K is the number of haplotypes in a sample of size n, and f_i is the frequency of the ith haplotype (Watterson 1978). Under the no-recombination infinite-allele model, Ewens (1972; see also Karlin and McGregor 1972) showed that the distribution of F|K is independent of θ.
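The sample homozygosity is straightforward to compute from a list of haplotypes. The short Python sketch below is only an illustration; the function name and the representation of haplotypes as strings are assumptions, not part of the original method description.

```python
from collections import Counter

def ew_statistic(haplotypes):
    """Return (F, K): sample homozygosity and number of distinct haplotypes."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    K = len(counts)
    # F is the sum of squared haplotype frequencies
    F = sum((c / n) ** 2 for c in counts.values())
    return F, K
```

A sample of 10 sequences carrying 3 haplotypes at counts 5, 3, and 2 gives F = 0.25 + 0.09 + 0.04 = 0.38 with K = 3.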

The DH test
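We denote an observed sample with S (≥ 1) segregating sites as X_{S,obs}. We say X_{S,obs} is rejected by the DH test at the significance level p if

$$D(X_{S,\mathrm{obs}}) \le D_{S,\mathrm{cri}} \quad \text{and} \quad H(X_{S,\mathrm{obs}}) \le H_{S,\mathrm{cri}} \tag{4}$$

where D(X) and H(X) are, respectively, values of Tajima's D and Fay and Wu's H for a sample X, and D_{S,cri} and H_{S,cri} are critical values satisfying

$$P\big(D(X_{S,\mathrm{neu}}) \le D_{S,\mathrm{cri}} \ \text{and} \ H(X_{S,\mathrm{neu}}) \le H_{S,\mathrm{cri}}\big) = p \tag{5}$$

and

$$P\big(D(X_{S,\mathrm{neu}}) \le D_{S,\mathrm{cri}}\big) = P\big(H(X_{S,\mathrm{neu}}) \le H_{S,\mathrm{cri}}\big) = p^{*} \tag{6}$$

X_{S,neu} in the equations above represents random neutral samples with S segregating sites. The probabilities in equations (5) and (6) can be estimated by coalescent simulation with the number of segregating sites fixed (Hudson 1993; Wall and Hudson 2001).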

The HEW test
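Similar to the DH test, for a given significance level p, the rejection region (RR) for HEW is:

$$RR = \{X : H(X) \le H_{\theta,\mathrm{cri}} \ \text{and} \ EW(X) \ge EW_{\theta,\mathrm{cri}}\} \tag{7}$$

where H(X) and EW(X) are the values of Fay and Wu's H and of the test statistic of the EW test for a sample X, and the critical values (H_{θ,cri} and EW_{θ,cri}) satisfy

$$P\big(X_{\theta,\mathrm{neu}} \in RR\big) = p \tag{8}$$

and

$$P\big(H(X_{\theta,\mathrm{neu}}) \le H_{\theta,\mathrm{cri}}\big) = P\big(EW(X_{\theta,\mathrm{neu}}) \ge EW_{\theta,\mathrm{cri}}\big) = p^{*} \tag{9}$$

Here X_{θ,neu} represents random neutral samples generated with a given θ. These probabilities can be estimated by coalescent simulation (Hudson 1990). In practice θ is unknown. In our implementation, we set θ = θW (Watterson 1975).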

The DHEW test
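The rejection region (RR) for DHEW for a given significance level p is:

$$RR = \{X : D(X) \le D_{\theta,\mathrm{cri}}, \ H(X) \le H_{\theta,\mathrm{cri}}, \ \text{and} \ EW(X) \ge EW_{\theta,\mathrm{cri}}\} \tag{10}$$

Similarly, the critical values satisfy the following conditions:

$$P\big(X_{\theta,\mathrm{neu}} \in RR\big) = p \tag{11}$$

$$P\big(D(X_{\theta,\mathrm{neu}}) \le D_{\theta,\mathrm{cri}}\big) = P\big(H(X_{\theta,\mathrm{neu}}) \le H_{\theta,\mathrm{cri}}\big) = P\big(EW(X_{\theta,\mathrm{neu}}) \ge EW_{\theta,\mathrm{cri}}\big) = p^{*} \tag{12}$$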
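The critical values of a compound test can be found numerically once the component statistics have been computed on a large set of simulated neutral samples. The Python sketch below is one possible way to do this, not the authors' implementation: it searches for the largest common marginal level p* at which the joint rejection rate does not exceed p. The function name, the `tails` argument, and the use of empirical quantiles are assumptions; the neutral statistics themselves would have to come from a coalescent simulator, which is not shown.

```python
def compound_critical_values(samples, tails, p=0.05):
    """samples: list of tuples of component statistics from neutral simulations.
    tails: one character per component, '-' for lower tail, '+' for upper tail.
    Returns (p_star, critical_values)."""
    m = len(samples)
    signs = [1.0 if t == "-" else -1.0 for t in tails]
    # Orient every component so that small values are the extreme ones.
    oriented = [tuple(s * x for s, x in zip(signs, row)) for row in samples]
    columns = [sorted(col) for col in zip(*oriented)]

    def joint_rate(k):
        # Joint rejection rate when each critical value is the k-th smallest
        # value of its component (marginal level close to k / m).
        crit = [col[k - 1] for col in columns]
        hits = sum(all(x <= c for x, c in zip(row, crit)) for row in oriented)
        return hits / m

    # joint_rate is non-decreasing in k, so a bisection finds the largest k
    # whose joint rejection rate is still at most p.
    lo, hi = 0, m
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if joint_rate(mid) <= p:
            lo = mid
        else:
            hi = mid - 1
    if lo == 0:
        raise ValueError("too few simulated samples for this significance level")
    critical = [s * col[lo - 1] for s, col in zip(signs, columns)]
    return lo / m, critical
```

With `tails="--"` the function corresponds to a DH-type test, with `"-+"` to an HEW-type test (H in the lower tail, EW in the upper tail), and with `"--+"` to a DHEW-type test. For two independent components the returned p* is close to the square root of p, which is why p* is always at least as large as the nominal level of the compound test.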

Simulating the null distributions of the tests
We obtain the null distribution of the EW test by using Slatkin's algorithm to enumerate all possible haplotype-frequency configurations for a given K (Slatkin 1994). For D, H, HEW and DHEW, we conduct standard coalescent simulation without recombination (Hudson 1990) and with θ estimated by Watterson's estimator (Watterson 1975). Finally, the critical values of the DH test are determined using coalescent simulation with the number of segregating sites fixed (Hudson 1993; Wall and Hudson 2001).

All tests are one-sided and are conducted at the 5% significance level. The tail of the null distribution of a test statistic which can maximize its power to detect positive selection is used. Thus, for the EW test, values falling into the upper 5% tail are considered significant, and for Tajima's D and Fay and Wu's H, values falling into the lower 5% tail are considered significant.
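For small samples, the upper-tail p-value of the EW test can be obtained by direct enumeration. The following Python sketch is written in the spirit of the enumeration approach and is not a reproduction of Slatkin's program: it lists every partition of n into K haplotype counts, weights each by its probability under the Ewens sampling formula conditional on K (proportional to 1 / (Π n_j · Π a_i!), where a_i is the number of haplotypes with count i), and sums the weights of configurations whose homozygosity is at least the observed value. Function names are assumptions, and the enumeration becomes slow for large n.

```python
from collections import Counter
from math import factorial, prod

def partitions(n, k, max_part=None):
    """Yield partitions of n into exactly k positive parts (non-increasing)."""
    if max_part is None:
        max_part = n
    if k == 0:
        if n == 0:
            yield ()
        return
    for first in range(min(max_part, n - k + 1), 0, -1):
        if first * k < n:
            break  # k parts of size <= first can no longer sum to n
        for rest in partitions(n - first, k - 1, first):
            yield (first,) + rest

def ew_upper_tail_p(counts):
    """Return (F_obs, P(F >= F_obs | K)) for observed haplotype counts."""
    n, k = sum(counts), len(counts)
    f_obs = sum(c * c for c in counts) / n ** 2
    total = tail = 0.0
    for parts in partitions(n, k):
        multiplicities = Counter(parts).values()
        # Conditional Ewens weight of this unordered configuration
        w = 1.0 / (prod(parts) * prod(factorial(a) for a in multiplicities))
        total += w
        f = sum(c * c for c in parts) / n ** 2
        if f >= f_obs - 1e-12:
            tail += w
    return f_obs, tail / total
```

As a check, for n = 4 and K = 2 there are only 2 configurations, (3, 1) and (2, 2), with conditional probabilities 8/11 and 3/11, so an observed configuration (3, 1) has F = 0.625 and an upper-tail p-value of 8/11.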

Simulating evolutionary processes
We simulate various evolutionary processes using the coalescent method (Hudson 1983, 1990) and the infinite-site model (Kimura 1969). Intragenic recombination is included in all the simulations. The intensity of recombination is measured by ρ = 4Nr, where r is the recombination rate of the locus per generation. We use the coalescent process with a selective phase to generate random samples under the hitchhiking model. This model assumes that a beneficial mutation arises on a single chromosome, and that the fitnesses of the 3 genotypes (BB, Bb, and bb) are 1 + s, 1 + h·s, and 1, respectively, where B is the advantageous allele, b is the wild type allele, and s and h are the selection and dominance coefficients. The behavior of the selected allele is mainly determined by the scaled selection coefficient α (= 2Ns) and h. The trajectory of the advantageous allele is obtained by using the time-reversal property of the diffusion process (Ewens 2004), and the pseudo-sampling device (Kimura 1980). This procedure takes into account the stochastic effects on the change in frequency of the selected allele. It also allows us to explore selected alleles with different levels of dominance. (Only results obtained by assuming h = 0.5 have been presented. Other results obtained by assuming h = 0.1 or 0.9 are qualitatively similar.) The genealogy is then generated using the structured coalescent method (Braverman et al. 1995; Zeng et al. 2006). Background selection is simulated using the 2-locus model described in Hudson and Kaplan (1994). This model assumes that the deleterious locus is in mutation-selection equilibrium and is not recombining, but that recombination can occur between the neutral locus and the deleterious locus. Here we extend the Hudson and Kaplan model by allowing intragenic recombination within the neutral locus.
Finally, we use the algorithms implemented in the software package ms (Hudson 2002) to simulate various demographic scenarios.
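To make the selection model concrete, the Python sketch below iterates the standard deterministic recursion for the frequency of allele B under the genotype fitnesses 1 + s, 1 + h·s, and 1. This is a deliberate simplification for illustration only: the simulations described above use a stochastic trajectory obtained from the diffusion process, which this sketch does not attempt to reproduce. The function names, the starting frequency 1/(2N), and the stopping frequency are assumptions.

```python
def next_frequency(p, s, h):
    """One generation of deterministic selection on allele B at frequency p."""
    q = 1.0 - p
    w_BB, w_Bb, w_bb = 1.0 + s, 1.0 + h * s, 1.0
    mean_w = p * p * w_BB + 2.0 * p * q * w_Bb + q * q * w_bb
    return (p * p * w_BB + p * q * w_Bb) / mean_w

def deterministic_trajectory(N, alpha, h=0.5, p_end=0.8):
    """Frequencies of B from a single copy until p_end is reached.
    alpha is the scaled selection coefficient, alpha = 2 N s."""
    s = alpha / (2.0 * N)
    p = 1.0 / (2.0 * N)   # a single new mutant in a diploid population
    trajectory = [p]
    while p < p_end:
        p = next_frequency(p, s, h)
        trajectory.append(p)
    return trajectory
```

For instance, `deterministic_trajectory(10000, 300)` corresponds to s = 0.015 and h = 0.5 and stops once the selected allele reaches a frequency of 80%, the sampling frequency used in figure 1.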


    Results
In the companion study, we have shown that the EW test is more powerful than Tajima's D in detecting selection before and around the fixation of an advantageous allele, while under background selection or other demographic perturbations these 2 tests behave similarly (Zeng et al. 2007). Thus, we do not present results of the EW test, but will simply mention its contribution to the 2 new compound tests.

Null distributions of the compound tests
Statistically, a compound test transforms a sequence sample into a vector, and uses this vector as the test statistic. The elements of the vector are p-values of the component tests. An example of such a transformation is shown in figure 1 for the DH test, generated with n = 50 and S = 20. By definition, the 5% rejection region (the shaded area) is in the lower-left corner of the plot. Interestingly, under neutrality the vectors are largely uniformly distributed, suggesting that the p-values of D and H may be only weakly correlated (the left panel of figure 1). In contrast, when positive selection is in operation, the vectors tend to cluster in the lower-left corner, resulting in rejections of neutrality (the right panel of figure 1). In table 1, we show the significance levels of the component tests (i.e., the p* values defined by equations (6), (9) and (12)) of the 3 compound tests under various combinations of parameter values. In all cases, p* is several-fold higher than p, the nominal significance level of a compound test. The DHEW test has the highest p* value, because this test is constrained by 3 different tests. The actual rejection rates of the 3 compound tests under neutrality are presented in table 2. When there is no recombination (ρ/θ = 0), the actual rejection rates of the tests are very close to the nominal significance level (5% in table 2), suggesting that they are indeed legitimate statistical tests. When intragenic recombination is included, the DH test becomes overly conservative, in agreement with previous reports (Wall 1999; Zeng et al. 2007). However, the other 2 tests (and especially HEW) are affected by recombination to a much lesser extent. This gain in robustness is due to the EW test, whose rejection rate has been shown to be insensitive to a change in local recombination rate (Zeng et al. 2007).


FIG. 1. The distribution of the DH statistic under neutrality or positive selection. Sample size is 50, and each random sample has 20 segregating sites. Each sample is transformed to a vector by calculating the p-values of Tajima's D and Fay and Wu's H. Neutral samples are generated using the standard coalescent method. Positive selection is simulated using the coalescent process with a selective phase (see Methods). Here we assume that a selected site with scaled selection coefficient α = 300 and dominance coefficient h = 0.5 is located in the middle of a neutral region, and that the frequency of the selected allele at the time of sampling is 80%. No intragenic recombination is included in either model. The shaded region indicates the 5% rejection region of the DH test (see also table 1).

 


 
Table 1 Significance Levels, p*, of the Component Tests of the Three Compound Tests

 


 
Table 2 Actual Rejection Rates of the Compound Tests Under the Neutral Model

 
Power of the compound tests to detect positive selection
In this section we study the power of the compound tests to detect positive selection, and the differences between compound tests and simple site-frequency tests (i.e., Tajima's D and Fay and Wu's H). In the simulations, we incorporate intragenic recombination, and assume that α = 300, h = 0.5. Results obtained by using other parameter values are qualitatively comparable (Supplemental figure S1). Figure 2 shows the power of the tests as a function of the size of the neutral region when the advantageous allele is still segregating in the population. In this plot, the selected site is located in the middle of the neutral region. It is clear that HEW and DHEW are more powerful than the other 3 tests which rely solely on the site-frequency spectrum. This suggests that before fixation the haplotype-frequency spectrum contains useful information about the hitchhiking process, and that ignoring this information would cause a loss of power. At this stage of a sweep, the power of a test to detect positive selection decreases in the following order: HEW (the most powerful), DHEW, H, DH, and D. However, the difference between H and DH is not significant. Increasing the local recombination rate does not change this order, but does increase the advantage of HEW and DHEW over the other tests (figure 2 versus Supplemental figure S2). In general, recombination reduces the size of the region that a selective sweep affects, and lowers the power of the tests, especially that of Tajima's D (e.g., compare the fourth panels in figures 2 and S2). HEW is less powerful than the EW test when the selected allele is at low to intermediate frequency (e.g., less than 70%), but otherwise outperforms it.


FIG. 2. Power of the tests to detect positive selection as a function of the size of the neutral region when the advantageous allele is at various frequencies (0.3, 0.5, 0.7 and 0.9 for each panel). Size of the neutral region is measured by θ, the population mutation rate. Positive selection is simulated with α = 300, h = 0.5, and sample size 50. The selected site is located in the middle of the neutral region. Intragenic recombination is included with intensity ρ/θ = 1, where ρ is the population recombination rate of the neutral region. Note that the scales on the y-axes of the plots are different.

 
After the advantageous allele has been fixed, relative performances of the tests are not always the same as those before fixation. When the recombination rate is of the same order as the mutation rate (ρ/θ = 1; figure 3A), Tajima's D tends to be the most powerful test. In this case, the DHEW test outperforms the other 2 compound tests, and the HEW test is somewhat better than DH. However, if the recombination rate is high (ρ/θ = 5; figure 3B), the relative performances of the tests resemble those before fixation, i.e., HEW tends to be the most powerful, followed by DHEW, and then the site-frequency tests. Although the power of the H test decreases quickly after fixation (Kim and Stephan 2002; Przeworski 2002; Zeng et al. 2006), results in figure 3 suggest that the rates of decrease in power of the 3 compound tests are much lower than that of the H test. As τ (time after fixation, measured in units of 4N generations) becomes larger, the special patterns of polymorphism and LD (linkage disequilibrium) due to hitchhiking disappear, and the population exhibits an excess of low- and intermediate-frequency mutations (Zeng et al. 2006). At this stage, Tajima's D is usually the most powerful test (Kim and Stephan 2002; Przeworski 2002; Zeng et al. 2006). However, this pattern of polymorphism is hard to distinguish from that due to population growth, or more generally, from that observed in the recovery phase of a process that causes a loss of polymorphism (Fu 1997; Zeng et al. 2006, 2007). As a result, the compound tests have little power at this stage. (As we shall see later, the compound tests are sensitive primarily to positive selection, and are relatively insensitive to other perturbations.) For this reason, we do not further discuss the post-fixation stage here; some exploratory results are shown in Supplemental figure S3.


FIG. 3. Power of the tests to detect positive selection as a function of the size of the neutral region at different time points τ (measured in units of 4N generations) after fixation. Size of the neutral region is measured by θ, the population mutation rate. Positive selection is simulated with α = 300, h = 0.5, and sample size 50. The selected site is placed in the middle of the neutral region. Intragenic recombination is included with 2 different intensities: ρ/θ = 1 in plot A, and ρ/θ = 5 in plot B, where ρ is the population recombination rate of the neutral region. Note that the scales on the y-axes of the plots are different.

 
It has been shown that the patterns of polymorphism and LD caused by hitchhiking are highly dependent on the distance between the neutral region and the site under selection (Kim and Stephan 2002; Kim and Nielsen 2004; Stephan et al. 2006). To get a better understanding of the performances of the tests, we simulate a neutral locus with scaled mutation rate θ = 10, and place this locus in regions at various distances from the site under selection (figure 4). In this plot, the frequency of the selected allele is 90%, and the distance between the neutral locus and the selected locus is measured by Cbet/s, where Cbet is the recombination distance between the 2 loci, and s is the selection coefficient. The 2 new compound tests (i.e., HEW and DHEW) are in most cases more powerful than DH, Tajima's D, and Fay and Wu's H. In practical terms, HEW and DHEW are able to detect a sweep further away than are the other tests. A similar picture is observed shortly after fixation (Supplemental figure S4).


FIG. 4. Power to detect positive selection as a function of the scaled distance between the selected site and the neutral region (Cbet/s) when the frequency of the advantageous allele is 90%. Positive selection is simulated with α = 300, h = 0.5. Sample size is 50. The population mutation (θ) and recombination (ρ) rates of the neutral region are both 10. We assume the selected site is on the left-hand side of the neutral region. Cbet is the recombination distance between the left end of the neutral region and the selected site. Note that Cbet/s = 0 means that the selected site is in the middle of the neutral region.

 
Sensitivity to background selection
Purifying selection against linked deleterious mutations maintained by recurrent mutation is referred to as background selection (Charlesworth et al. 1993). It is argued that this kind of selection may be prevalent in many organisms; for example, D. melanogaster (Hudson and Kaplan 1995; Charlesworth 1996). Furthermore, background selection can skew the frequency spectrum and result in significant values of Tajima's D and the Ewens-Watterson test (Charlesworth et al. 1995; Fu 1997; Zeng et al. 2006, 2007). In table 3, we show the rejection rates of the tests under the background selection model. It is clear that none of the 3 compound tests are affected, despite the high levels of sensitivity of Tajima's D and the EW test (table 3; Zeng et al. 2007). This insensitivity is due to Fay and Wu's H, which is not sensitive to background selection, and therefore counteracts the effects of D and EW. Although background selection may be a common feature in the genome, it may not cause serious problems in practice. Several studies and the results from our own simulations suggest that background selection can produce significantly negative Tajima's D only under some special combinations of parameter values (Golding 1997; Przeworski et al. 1999). Furthermore, in some organisms, for example Drosophila, purifying selection may be too strong to generate significant Tajima's D (Andolfatto and Przeworski 2001).



 
Table 3 Sensitivities of the Tests to Background Selection

 
The effects of population demography
Demographic changes may be the most visible violation of the neutral assumptions. Therefore, discrimination between rejections due to demography and those due to positive selection is very important.

Population growth
We use the following model to study the effect of population growth: the population size at present is N; going backward in time, the population size decreases instantly to N0 at time tg (in units of 4N generations). In figure 5, we show the results obtained by assuming N0/N = 0.05. It is clear that the compound tests are largely unaffected. In contrast, the power of the D test can be as high as 60%. The EW test and Tajima's D behave similarly in this case (unpublished data).


FIG. 5. The sensitivities of the tests to population growth. We assume that the population mutation rate θ at the time of sampling is 5. Going backwards in time, θ decreases instantly at time tg (in units of 4N generations, where N is the population size at the time of sampling) to θ0 = 0.25 (i.e., the population size decreases 20-fold). Intragenic recombination is incorporated with rate ρ = 4Nr = 5. Sample size is 50.

 
Population bottleneck
Here we model a bottleneck in the following way. The population size at present is N. Going backward in time, the population size reduces exponentially to γN. The length of this size-reducing period is tb (in units of 4N generations). Then the population size is restored to N instantaneously (i.e., at time tb in the past). Summarizing the results of our simulations, we find the following general trends. (1) The DH and DHEW tests behave similarly, and they are in most cases the most conservative tests. (2) When the reduction in population size is mild (e.g., γ ≥ 0.25), none of the tests have a rejection rate higher than 10%. (3) For an intermediate level of size contraction (e.g., 0.001 < γ < 0.25), the compound tests may show limited sensitivity. In this case, HEW tends to be less conservative than DH and DHEW (e.g., figure 6A). However, Tajima's D, and to a lesser extent, Fay and Wu's H, are usually much more sensitive than the compound tests (figure 6A). (4) Severe bottlenecks (e.g., γ ≤ 0.001) have no significant effect on the compound tests, although Fay and Wu's H and Tajima's D are both highly sensitive (e.g., figure 6B). It has been argued that patterns of variation caused by a bottleneck may be hard to distinguish from those caused by hitchhiking (Barton 1998; Depaulis et al. 2003). The above results suggest that, under the model we assume, the compound tests, especially DH and DHEW, are more capable of discriminating between positive selection and a bottleneck than are the other tests.
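The growth and bottleneck models described above can be expressed as command lines for Hudson's ms program. The Python sketch below only assembles the argument lists; it is an illustration under stated assumptions, not the commands actually used for the simulations. It assumes the usual ms conventions: time is measured in units of 4N0 generations, `-G a` makes the population size N0·exp(−a·t) going backward in time, and `-eN t x` sets the size to x·N0 and the growth rate to zero at time t. The function names and the `nsites` argument (the number of sites between which recombination can occur, required by `-r`) are hypothetical choices.

```python
from math import log

def ms_growth_args(n, reps, theta, rho, nsites, t_g, ratio):
    """Instantaneous size change: going backward, N drops to ratio*N at t_g."""
    return ["ms", str(n), str(reps),
            "-t", str(theta),
            "-r", str(rho), str(nsites),
            "-eN", str(t_g), str(ratio)]

def ms_bottleneck_args(n, reps, theta, rho, nsites, t_b, gamma):
    """Going backward, N shrinks exponentially to gamma*N over t_b,
    then is restored to N instantaneously at time t_b."""
    # Solve gamma = exp(-a * t_b) for the exponential rate a (> 0 if gamma < 1)
    a = -log(gamma) / t_b
    return ["ms", str(n), str(reps),
            "-t", str(theta),
            "-r", str(rho), str(nsites),
            "-G", str(a),
            "-eN", str(t_b), "1.0"]
```

For example, `" ".join(ms_bottleneck_args(50, 1000, 5, 5, 1000, 0.1, 0.01))` gives a command with γ = 0.01 and tb = 0.1, for which the exponential rate is ln(100)/0.1 ≈ 46.05.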


FIG. 6. The sensitivities of the tests to population bottleneck. We assume that the population mutation rate θ at the time of sampling is 5. Going backwards in time, θ decreases exponentially to θ0 = 0.1 (i.e., γ = θ0/θ = 0.01; plot A) or θ0 = 0.01 (i.e., γ = θ0/θ = 0.001; plot B). The length of this size-reducing period is denoted as tb (measured in 4N generations, where N is the population size at the time of sampling). Intragenic recombination is incorporated with rate ρ = 4Nr = 5. Sample size is 50. Note that the scales on the x-axes are different.

 
Population subdivision
In this section, we study the effects of population structure on the tests using the simple symmetric island model (Wright 1931) with different numbers of subpopulations (demes). In table 4, we assume all sequences are taken from 1 deme in a subdivided population with different levels of differentiation. In general, DH and DHEW tend to behave similarly, but DHEW is slightly more conservative than DH. The rejection rates of these 2 tests are not affected by subdivision when FST ≤ 0.25 (this is also true under other sampling schemes; unpublished data). HEW, on the other hand, is not as robust as DH and DHEW, though it is affected to a lesser extent than Fay and Wu's H. Usually HEW's rejection rate is reasonably low when FST ≤ 0.2. In the Supplement, we explore the effects of pooling samples taken from different demes on the rejection rates of the tests (figure S5). We find that even when sampling is quite biased, the type I error rates of the tests are usually unaffected.


Table 4 Sensitivities of the tests to population subdivision when all samples are taken from one deme

 
Differences between compound tests and composite likelihood methods
The composite likelihood test due to Nielsen et al. (2005; referred to as the CL test) can be seen as an extension of several other similar methods (Kim and Stephan 2002; Kim and Nielsen 2004; Jensen et al. 2005). This test treats nucleotide sites as though they were statistically independent, and uses the background pattern of variation in the data itself as the null model (Nielsen et al. 2005). Exploratory simulation results (table 5) suggest that the CL test tends to have similar sensitivity to background selection and to several demographic models as the compound tests. However, the CL test is not as powerful as the compound tests in detecting positive selection unless θ becomes sufficiently large (e.g., θ = 50). Two factors may account for this difference in power. First, the performance of the CL test may rely heavily on the accurate inference of the background pattern of variation. However, this inference may require a large amount of data, which may not always be available. Second, the lower power of the CL test could also be explained by our not having incorporated recombination when simulating CL's null distribution. However, we have been cautious about including recombination in the simulation, since it may make the CL test less robust against demographic changes (Nielsen et al. 2005).
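To make the two ingredients named above concrete (sites treated as independent, and a null model taken from the data's own background variation), the following sketch estimates a background site-frequency spectrum and sums per-site log probabilities over a window. It covers only the null side of a composite likelihood and is not the CL test of Nielsen et al. (2005); the function names and the pseudocount are assumptions introduced for illustration.

```python
import math
from collections import Counter


def background_sfs(derived_counts, n, pseudocount=1.0):
    """Estimate P(derived count = i) for i = 1..n-1 from all sites in the data.

    derived_counts: derived-allele count at each polymorphic site.
    n: sample size. The pseudocount (an assumption) avoids log(0) for
    frequency classes that happen to be unobserved.
    """
    tally = Counter(derived_counts)
    total = len(derived_counts) + pseudocount * (n - 1)
    return {i: (tally.get(i, 0) + pseudocount) / total for i in range(1, n)}


def composite_log_likelihood(window_counts, sfs):
    """Sum log probabilities over sites, as if sites were independent."""
    return sum(math.log(sfs[i]) for i in window_counts)
```

Because the background spectrum is estimated from the data, a short data set yields a noisy null model, which is the first of the two factors suggested above for the lower power of the CL test.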


Table 5 Comparisons between the compound tests and the composite likelihood (CL) test

 

    Discussion
In this study, we develop 2 new compound tests, HEW and DHEW, by using the novel properties of the Ewens-Watterson (EW) test reported in the companion paper (Zeng et al. 2007). By carrying out extensive simulations, we find that the 3 compound tests (i.e., DH, HEW and DHEW) are reasonably powerful in detecting ongoing or recently completed selective sweeps, and are relatively insensitive to background selection and a number of demographic perturbations. In other words, the compound tests have relatively high specificity to positive selection. Among the 3 compound tests, HEW is usually the most powerful in detecting positive selection, whereas DH and DHEW are usually the most insensitive to background selection and demographic changes. Furthermore, because the EW test is robust against recombination (Zeng et al. 2007), HEW and DHEW are less sensitive to recombination than is the DH test. Having tests that are robust against recombination is desirable for 2 reasons. First, recombination usually reduces the power of the tests to detect positive selection. Second, although it is possible to incorporate recombination into the null model, estimating the recombination rate is usually difficult and the estimates are usually not very accurate (Wall 2000).
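This section does not restate how the component tests are combined into a compound test. One construction, assumed here for illustration, is to reject only when every component is significant at a shared marginal threshold, with that threshold calibrated on neutral simulations so that the joint rejection rate matches the nominal level. The sketch below does this for two components; the function names are hypothetical, and the inputs are assumed to be marginal p-values already computed for each simulated neutral sample.

```python
import math


def calibrate_joint_threshold(p_first, p_second, alpha=0.05):
    """Largest shared threshold whose joint rejection rate is <= alpha.

    p_first, p_second: marginal p-values of the two component tests for the
    same neutral replicates (same order). A replicate is jointly rejected at
    threshold p when both of its p-values are <= p.
    """
    worst = sorted(max(a, b) for a, b in zip(p_first, p_second))
    allowed = math.floor(alpha * len(worst) + 1e-9)
    # Step back over ties so the rejection rate does not exceed alpha.
    while 0 < allowed < len(worst) and worst[allowed] == worst[allowed - 1]:
        allowed -= 1
    # With no replicate allowed, fall back to a threshold of 0.
    return worst[allowed - 1] if allowed > 0 else 0.0


def compound_reject(p_first_obs, p_second_obs, threshold):
    """Reject only when both observed component p-values pass the threshold."""
    return max(p_first_obs, p_second_obs) <= threshold
```

Requiring joint significance is what gives a compound test its specificity: a force that distorts only one component statistic rarely pushes both below the shared threshold.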

Compared with Fay and Wu's H, the compound tests are better in the following ways: (1) HEW and DHEW are more robust against the presence of intragenic recombination. (2) HEW and DHEW are more powerful in detecting positive selection. (3) The power of all 3 compound tests decreases at a much lower rate after fixation. (4) The compound tests, especially DH and DHEW, are less sensitive to population bottleneck and population subdivision. Another important issue is that the use of Fay and Wu's H requires the identification of ancestral/derived alleles at each polymorphic site. This is usually done by using sequences from a closely related species. However, the inference may be inaccurate due to the violation of the infinite-site model. Incorrect inference of the ancestral state usually results in a significantly negative H statistic (Baudry and Depaulis 2003). The effect of an incorrect inference on the 3 compound tests is reduced due to the inclusion of Tajima's D and/or the EW test, neither of which depends on the separation of ancestral/derived alleles (table 6). Among the compound tests, the DHEW test seems to be the most robust. In general, if the probability of an incorrect inference per site is lower than 5%, none of the tests tends to be seriously affected.
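The effect of misorientation can be checked with a small calculation. The sketch below uses the unnormalized form H = θπ − θH (Fay and Wu 2000), where a site with derived count i in a sample of n contributes 2i(n − i)/(n(n − 1)) to θπ and 2i²/(n(n − 1)) to θH. Swapping ancestral and derived states turns i into n − i, which leaves θπ unchanged but inflates θH when a rare derived allele is misread as common. The function names and the example counts are illustrative assumptions.

```python
def theta_pi(derived_counts, n):
    """Average pairwise diversity from per-site derived-allele counts."""
    return sum(2 * i * (n - i) for i in derived_counts) / (n * (n - 1))


def theta_h(derived_counts, n):
    """Estimator weighted towards high-frequency derived alleles."""
    return sum(2 * i * i for i in derived_counts) / (n * (n - 1))


def fay_wu_h(derived_counts, n):
    """Unnormalized Fay and Wu's H."""
    return theta_pi(derived_counts, n) - theta_h(derived_counts, n)


def misorient(derived_counts, n, flipped_sites):
    """Swap ancestral and derived states at the given site indices."""
    flipped = set(flipped_sites)
    return [n - c if k in flipped else c for k, c in enumerate(derived_counts)]


# Illustrative sample: n = 10 sequences, 4 polymorphic sites.
true_counts = [1, 1, 2, 3]
bad_counts = misorient(true_counts, 10, [0])  # one singleton read as 9/10
```

With these counts, `fay_wu_h(true_counts, 10)` is positive while `fay_wu_h(bad_counts, 10)` is negative, even though `theta_pi` is identical for both. Statistics that do not depend on orientation are therefore unaffected, which is consistent with the compound tests being more robust than H alone.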


Table 6 The effect of incorrect inference of the ancestral state on the type I error rates of the tests

 
In the companion study (Zeng et al. 2007), we have shown that the EW test is insensitive to the presence of recombination, and is usually very powerful in detecting positive selection, especially when recombination is high. However, this test is also very sensitive to background selection, population growth, and population bottlenecks. The results regarding HEW and DHEW reported above suggest that by using the idea of a compound test we are able to use the merits of the EW test without being affected much by its sensitivity to other driving forces.
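For readers unfamiliar with the EW test, its statistic is the haplotype homozygosity of the sample, the sum of squared haplotype frequencies, which is then compared with its neutral distribution given the sample size and the number of distinct haplotypes. The sketch below computes only the statistic and the two conditioning quantities; the function name is hypothetical and the significance calculation is deliberately left out.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes):
    """Return (F, n, k): homozygosity, sample size, number of distinct haplotypes.

    haplotypes: one hashable entry (e.g. a string of alleles) per sampled
    sequence; phase must be known.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    counts = Counter(haplotypes)
    f_stat = sum((c / n) ** 2 for c in counts.values())
    return f_stat, n, len(counts)
```

A sample dominated by one haplotype, as expected shortly after a sweep, gives a value of F close to 1. The statistic needs phased haplotypes, which is why HEW and DHEW require more than site-frequency data.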

It is of some interest to compare the compound tests with tests based on patterns of linkage disequilibrium (e.g., Sabeti et al. 2002). In the companion paper (Zeng et al. 2007), we have shown that when the site under positive selection is known and is used as the core, the extended haplotype homozygosity (EHH) test (Sabeti et al. 2002) is very powerful, whereas when this prior information about the selected site is not available, the test has little power. Therefore, the compound tests should be more powerful than the EHH test, unless the site under selection is known. Recently, Voight et al. (2006) proposed the integrated haplotype score (iHS) test, which can be seen as an extension of the EHH test. Unlike the EHH test, the iHS test has power to detect positive selection when the site under selection is not included in the data set (Voight et al. 2006). However, some preliminary results suggest that the iHS test can be less conservative than the compound tests under certain demographic schemes. We are currently preparing a manuscript to further investigate the differences between LD-based methods and the compound tests.


    Conclusion
We have proposed 2 new compound tests, HEW and DHEW, by combining Fay and Wu's H or DH with the Ewens-Watterson (EW) test. Compared to the DH test, these 2 new tests are (1) more robust against recombination and (2) more powerful in detecting positive selection. Summarizing the results, we conclude that (1) the 3 compound tests (DH, HEW and DHEW) are most suitable for detecting ongoing or recently fixed selective sweeps, and (2) DH and DHEW tend to have higher specificity to positive selection than HEW. In practice, when only site-frequency data are available, we suggest using the DH test due to its high specificity. If haplotype phase is also known, the DHEW test may be a good choice since it automatically takes into account the effect of recombination, and also has high specificity to positive selection. The HEW test, however, should be used when the population does not show much geographical structure and is not greatly affected by recent bottleneck events.


    Supplementary Material
Supplementary figures S1 through S5 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org).


    Acknowledgements
We are very grateful to Dr. Richard Hudson for his help during the course of this study. We thank Dr. Thomas Nagylaki for his helpful comments. Thanks are also due to Drs Supriya Kumar, Graham Coop, Shuhei Mano, Eric Hungate, and 2 anonymous reviewers who helped us improve the manuscript. K.Z. is supported by Sun Yat-sen University and the Kaisi Fund. S.S. is supported by grants from the National Natural Science Foundation of China (30230030, 30470119, 30300033, and 30500049). C.W. is supported by National Institutes of Health grants and an OOCS grant from the Chinese Academy of Sciences.


    Footnotes
 
1 Present address: 1101 E 57th Street, Chicago, IL 60637

Jianzhi Zheng, Associate Editor


    References

    Akey JM, Eberle MA, Rieder MJ, Carlson CS, Shriver MD, Nickerson DA, Kruglyak L. Population history and natural selection shape patterns of genetic variation in 132 genes. PLoS Biol (2004) 2:e286.

    Akey JM, Zhang G, Zhang K, Jin L, Shriver MD. Interrogating a high-density SNP map for signatures of natural selection. Genome Res. (2002) 12:1805–1814.

    Andolfatto P, Przeworski M. Regions of lower crossing over harbor more rare variants in African populations of Drosophila melanogaster. Genetics (2001) 158:657–665.

    Barton NH. The effect of hitch-hiking on neutral genealogies. Genet Res. (1998) 72:123–133.

    Baudry E, Depaulis F. Effect of misoriented sites on neutrality tests with outgroup. Genetics (2003) 165:1619–1622.

    Biswas S, Akey JM. Genomic insights into positive selection. Trends Genet. (2006) 22:437–446.

    Braverman JM, Hudson RR, Kaplan NL, Langley CH, Stephan W. The hitchhiking effect on the site frequency spectrum of DNA polymorphisms. Genetics (1995) 140:783–796.

    Carlson CS, Thomas DJ, Eberle MA, Swanson JE, Livingston RJ, Rieder MJ, Nickerson DA. Genomic regions exhibiting positive selection identified from dense genotype data. Genome Res. (2005) 15:1553–1565.

    Charlesworth B. Background selection and patterns of genetic diversity in Drosophila melanogaster. Genet Res. (1996) 68:131–149.

    Charlesworth B, Morgan MT, Charlesworth D. The effect of deleterious mutations on neutral molecular variation. Genetics (1993) 134:1289–1303.

    Charlesworth D, Charlesworth B, Morgan MT. The pattern of neutral molecular variation under the background selection model. Genetics (1995) 141:1619–1632.

    Depaulis F, Mousset S, Veuille M. Power of neutrality tests to detect bottlenecks and hitchhiking. J Mol Evol (2003) 57(Suppl. 1):S190–200.

    Ewens WJ. The sampling theory of selectively neutral alleles. Theor Popul Biol. (1972) 3:87–112.

    Ewens WJ. Mathematical population genetics (2004) Berlin: Springer-Verlag. 188–191.

    Fay JC, Wu C-I. Hitchhiking under positive Darwinian selection. Genetics (2000) 155:1405–1413.

    Fay JC, Wu C-I. Sequence divergence, functional constraint, and selection in protein evolution. Annu Rev Genomics Hum Genet. (2003) 4:213–235.

    Fu Y-X. Statistical properties of segregating sites. Theor Popul Biol. (1995) 48:172–197.

    Fu Y-X. Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics (1997) 147:915–925.

    Gillespie JH. The cause of molecular evolution (1991) Oxford: Oxford University Press.

    Golding B. The effect of purifying selection on genealogies. In: Donnelly P, Tavaré S, eds. Progress in population genetics and human evolution (1997) New York, NY: Springer-Verlag. 271–285.

    Hudson RR. Properties of a neutral allele model with intragenic recombination. Theor Popul Biol. (1983) 23:183–201.

    Hudson RR. Gene genealogies and the coalescent process. In: Futuyma D, Antonovics J, eds. Oxford surveys in evolutionary biology (1990) New York, NY: Oxford University Press. 1–44.

    Hudson RR. The how and why of generating gene genealogies. In: Takahata N, Clark AG, eds. Mechanisms of molecular evolution (1993) Sunderland, MA: Sinauer. 23–36.

    Hudson RR. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics (2002) 18:337–338.

    Hudson RR, Kaplan NL. Gene trees with background selection. In: Golding B, ed. Non-neutral evolution: theories and molecular data (1994) London: Chapman & Hall. 140–153.

    Hudson RR, Kaplan NL. Deleterious background selection with recombination. Genetics (1995) 141:1605–1617.

    Hudson RR, Kreitman M, Aguade M. A test of neutral molecular evolution based on nucleotide data. Genetics (1987) 116:153–159.

    Innan H. Modified Hudson-Kreitman-Aguade test and two-dimensional evaluation of neutrality tests. Genetics (2006) 173:1725–1733.

    Jensen JD, Kim Y, DuMont VB, Aquadro CF, Bustamante CD. Distinguishing between selective sweeps and demography using DNA polymorphism data. Genetics (2005) 170:1401–1410.

    Karlin S, McGregor J. Addendum to a paper of W. Ewens. Theor Popul Biol. (1972) 3:113–116.

    Kim Y, Nielsen R. Linkage disequilibrium as a signature of selective sweeps. Genetics (2004) 167:1513–1524.

    Kim Y, Stephan W. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics (2002) 160:765–777.

    Kimura M. The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics (1969) 61:893–903.

    Kimura M. Average time until fixation of a mutant allele in a finite population under continued mutation pressure: Studies by analytical, numerical, and pseudo-sampling methods. Proc Natl Acad Sci USA (1980) 77:522–526.

    Kimura M. The neutral theory of molecular evolution (1983) Cambridge: Cambridge University Press.

    Li H, Stephan W. Inferring the demographic history and rate of adaptive substitution in Drosophila. PLoS Genet (2006) 2:e166.

    Maynard Smith J, Haigh J. The hitch-hiking effect of a favourable gene. Genet Res. (1974) 23:23–35.

    Nielsen R. Molecular signatures of natural selection. Annu Rev Genet. (2005) 39:197–218.

    Nielsen R, Williamson S, Kim Y, Hubisz MJ, Clark AG, Bustamante C. Genomic scans for selective sweeps using SNP data. Genome Res. (2005) 15:1566–1575.

    Przeworski M. The signature of positive selection at randomly chosen loci. Genetics (2002) 160:1179–1189.

    Przeworski M, Charlesworth B, Wall JD. Genealogies and weak purifying selection. Mol Biol Evol. (1999) 16:246–252.

    Przeworski M, Wall JD, Andolfatto P. Recombination and the frequency spectrum in Drosophila melanogaster and Drosophila simulans. Mol Biol Evol. (2001) 18:291–298.

    Sabeti PC, Reich DE, Higgins JM, et al. Detecting recent positive selection in the human genome from haplotype structure. Nature (2002) 419:832–837.

    Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES. Positive natural selection in the human lineage. Science (2006) 312:1614–1620.

    Simonsen KL, Churchill GA, Aquadro CF. Properties of statistical tests of neutrality for DNA polymorphism data. Genetics (1995) 141:413–429.

    Slatkin M. An exact test for neutrality based on the Ewens sampling distribution. Genet Res. (1994) 64:71–74.

    Stephan W, Song YS, Langley CH. The hitchhiking effect on linkage disequilibrium between linked neutral loci. Genetics (2006) 172:2647–2663.

    Tajima F. Evolutionary relationship of DNA sequences in finite populations. Genetics (1983) 105:437–460.

    Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics (1989) 123:585–595.

    Teshima KM, Coop G, Przeworski M. How reliable are empirical genomic scans for selective sweeps? Genome Res. (2006) 16:702–712.

    Thornton K, Andolfatto P. Approximate Bayesian inference reveals evidence for a recent, severe bottleneck in a Netherlands population of Drosophila melanogaster. Genetics (2006) 172:1607–1619.

    Voight BF, Kudaravalli S, Wen X, Pritchard JK. A map of recent positive selection in the human genome. PLoS Biol (2006) 4:e72.

    Wall JD. Recombination and the power of statistical tests of neutrality. Genet Res. (1999) 74:65–79.

    Wall JD. A comparison of estimators of the population recombination rate. Mol Biol Evol. (2000) 17:156–163.

    Wall JD, Andolfatto P, Przeworski M. Testing models of selection and demography in Drosophila simulans. Genetics (2002) 162:203–216.

    Wall JD, Hudson RR. Coalescent simulations and statistical tests of neutrality. Mol Biol Evol. (2001) 18:1134–1135. author reply 1136–1138.

    Wang ET, Kodama G, Baldi P, Moyzis RK. Global landscape of recent inferred Darwinian selection for Homo sapiens. Proc Natl Acad Sci USA (2006) 103:135–140.

    Watterson GA. On the number of segregating sites in genetical models without recombination. Theor Popul Biol. (1975) 7:256–276.

    Watterson GA. The homozygosity test of neutrality. Genetics (1978) 88:405–417.

    Wright S. Evolution in Mendelian populations. Genetics (1931) 16:97–159.

    Zeng K, Fu Y-X, Shi S, Wu C-I. Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics (2006) 174:1431–1439.

    Zeng K, Mano S, Shi S, Wu C-I. Comparisons of site- and haplotype-frequency methods for detecting positive selection. Mol Biol Evol. (2007) 24:1562–1574.

Accepted for publication June 4, 2007.

