

MBE Advance Access originally published online on February 24, 2007
Molecular Biology and Evolution 2007 24(5):1122-1129; doi:10.1093/molbev/msm032

© The Author 2007. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Genome Scans of Variation and Adaptive Change: Extended Analysis of a Candidate Locus Close to the phantom Gene Region in Drosophila melanogaster

Dorcas J. Orengo and Montserrat Aguadé

Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain

E-mail: dorcasorengo{at}ub.edu.


    Abstract
Nucleotide variation in populations originating from the recent range expansion of a species should reflect their adaptation to new habitats as well as their demographic history. A survey of nucleotide variation at 109 noncoding X-chromosome fragments in a European population of Drosophila melanogaster identified some candidate regions that may have been recently affected by positive selection. Adaptive changes leave a spatially differential footprint that can be used to discriminate among candidates by extending their study to neighboring regions. Here, we surveyed variation at a ~190-kb region spanning a locus exhibiting a significantly skewed frequency spectrum. A stretch of ~12 kb with reduced variation was detected within a continuously sequenced region that included the focal fragment. Moreover, the regions flanking this stretch exhibited an excess of high-frequency derived variants. Application of maximum likelihood ratio and goodness-of-fit tests suggested that the pattern of variation detected at the studied region (at cytological bands 17C–17D) might have been shaped by a recent selective change, most probably at or around the phantom gene that encodes CYP306A1, a cytochrome P450 enzyme in the ecdysteroidogenic pathway.

Key Words: Drosophila melanogaster • nucleotide polymorphism • positive selection


    Introduction
Natural selection is the evolutionary force underlying adaptation, with purifying selection maintaining existing adaptations through the removal of deleterious mutations, and positive selection being responsible for adaptive changes by driving advantageous mutations to fixation. At the molecular level, the action of positive selection at a particular nucleotide site affects the level and pattern of variation at linked neutral sites (hitchhiking effect; Maynard-Smith and Haigh 1974; Kaplan et al. 1989), with the extent of the effect being dependent on both the strength of selection (s) and the rate of recombination (r). Purifying selection also affects linked variation (background selection; Charlesworth et al. 1993), but its effect on patterns of variation is rarely noticeable in regions of nonrestricted recombination (Charlesworth et al. 1995). Evolutionary processes other than selection also affect levels and patterns of variation and can in some cases mimic the effect of positive selection (Aguadé et al. 2004).

Efforts to uncover the footprint left by selection on linked neutral variation have classically used either a candidate region approach (i.e., regions of restricted recombination where recurrent directional selection would leave a very extensive footprint of reduced variation; for initial examples of this approach in Drosophila, see Aguadé et al. 1989; Stephan and Langley 1989) or a candidate gene approach (i.e., genes involved in adaptive characters, given the narrow footprint expected around the target of selection in regions of nonrestricted recombination; see e.g., Kreitman and Hudson 1991; Aguadé et al. 1992). Whole-genome sequences of model species have led to the development of new multilocus approaches to uncover the footprint of positive selection and more specifically to distinguish the locus-specific footprint of adaptive events from the genome-wide footprint of demographic events. Genome scans of variation constitute one such approach that aims to characterize the empirical distribution of the level and pattern of variation in natural populations and therefore to identify outliers by means of different summary statistics (Glinka et al. 2003; Kauer et al. 2003; Orengo and Aguadé 2004; Ometto et al. 2005; Voight et al. 2005).

A recent survey of nucleotide variation in 109 noncoding DNA fragments across the X chromosome in a European population of Drosophila melanogaster (Orengo and Aguadé 2004) revealed that a simple bottleneck scenario could not fully explain the pattern of variation detected and therefore pointed to the action of positive selection in the out-of-Africa expansion of the species. Although the deviations detected in outliers may not reflect recent adaptive changes, outliers can be considered an enriched sample of candidate regions whose variation might have been modeled by natural selection rather than by demographic history. It is therefore important to extend the analysis of outliers to neighboring regions in order to distinguish the spatially differential footprint left by real positives (i.e., due to adaptive changes) from the more stochastic pattern associated with false positives.

Here, fragment 94 in Orengo and Aguadé (2004) is considered the focal fragment (henceforth fragment 0) in an extended study that includes 16 additional fragments spanning over ~190 kb (fig. 1). Fragment 0 corresponds to a 1,084-bp intergenic fragment on band 17D1, which is flanked at some distance (9.9 kb and 12.6 kb, respectively) by the annotated genes CG6696 and phantom (phm). The frequency spectrum of polymorphic variants in fragment 0, with 6 singletons and 1 doubleton, showed a significant excess of low-frequency variants (as revealed by Tajima's D statistic). Moreover, no variation was detected in the 3' half of the fragment. If these results were indeed indicative of a recent selective event, the location of fragment 0 in a rather large noncoding and normally recombining region and its distance to the neighboring coding regions would imply rather strong selection in a protein-coding region. Alternatively, it could reflect a similarly strong or even a milder event affecting a noncoding regulatory signal. The present study therefore aims to assess whether a frequency-spectrum outlier in a genome scan is a real or a false positive and, if possible, to identify a coding or regulatory advantageous mutation (Andolfatto 2005).


FIG. 1.— Schematic representation of the ~190-kb region studied. Horizontal lines across the figure represent the whole-genome physical (in kb, thin line) and cytological (band, thick line) scales. Solid bars with numbers correspond to the sequenced fragments. Fragments are named according to their position in kilobases relative to the focal fragment (i.e., fragment 0; see text). Arrowed bars indicate genes and their transcription direction. The box in the lower part of the figure is an enlarged representation of the 0/32 interval, which includes a 20-kb continuous sequenced fragment spanning the phm and Cyp18a1 genes. Fragments –120, –82, and 0 correspond to fragments 92, 93, and 94 in Orengo and Aguadé (2004).

 

    Materials and Methods
Drosophila Strains
The same 13 X-isochromosomal lines of Drosophila melanogaster and the one isofemale line of Drosophila simulans from the Sant Sadurní d'Anoia (Catalonia, Spain) population analyzed in Orengo and Aguadé (2004) were used in the present study. Moreover, available genome sequences from 5 D. simulans strains were used to generate a consensus sequence for all new fragments studied (http://www.dpgp.org/syntenic_assembly). The available Drosophila yakuba and Drosophila sechellia genome sequences (http://flybase.org) were also used.

Fragment Selection
Release 4 of the D. melanogaster genome sequence (http://flybase.org) was used to select 16 fragments (and to design the corresponding amplification and sequencing primers) at increasing distance from the focal fragment (i.e., fragment 0). The following criteria were used for choosing fragments and for designing polymerase chain reaction amplification oligonucleotides: 1) noncoding regions as determined by Flybase Genome Browser inspection (http://flybase.org), 2) amplification fragment ~800–900 bp, 3) absence of homonucleotide runs longer than 6 in the database sequence, and 4) unique sequence in the genome confirmed by Blast analysis (http://flybase.org). The list of the successful amplification primer pairs and conditions is available from the authors. Fragments were named according to their position in kilobases relative to fragment 0 with negative and positive values indicating upstream and downstream locations, respectively. Coordinate 1 corresponds to the midpoint nucleotide position in the flybase sequence of fragment 0 (site 18,506,613 in release 4).
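The naming convention above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and two details are assumptions not stated in the text, namely that upstream positions have lower release 4 coordinates and that labels are obtained by rounding to the nearest kilobase. Only the midpoint site (18,506,613) comes from the text.

```python
# Release 4 site that the text defines as coordinate 1 (midpoint of fragment 0).
FOCAL_MIDPOINT_R4 = 18_506_613


def relative_coordinate(site: int) -> int:
    """Position of a release 4 site relative to the focal midpoint (coordinate 1)."""
    return site - FOCAL_MIDPOINT_R4 + 1


def fragment_name(site: int) -> int:
    """Fragment label in kb; rounding to the nearest kb is an assumption."""
    return round(relative_coordinate(site) / 1000)


# A site 10 kb downstream of the midpoint would be labeled fragment 10,
# and a site 120 kb upstream would be labeled fragment -120.
print(fragment_name(FOCAL_MIDPOINT_R4 + 10_000))
print(fragment_name(FOCAL_MIDPOINT_R4 - 120_000))
```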

DNA Extraction, Amplification, and Sequencing
Genomic DNA was obtained using a quick DNA extraction procedure (protocol 48 in Ashburner [1989]). Amplicons for the different fragments were purified as described in Dean et al. (2003) and used directly as templates for sequencing with the ABI PRISM version 3.1 kit (Applied Biosystems, Foster City, CA) according to manufacturer's conditions. Sequencing reactions were ethanol precipitated and later separated on an ABI PRISM 3730 sequencer (PerkinElmer, Norwalk, CT). Fragments were sequenced on both strands. Chromatograms were in all cases visually inspected and all polymorphic sites checked both in each line and across lines. Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession numbers AM411681–AM411860.

Sequence Analysis
Sequences were assembled and multiple aligned using the DNASTAR (Madison, WI) software package, and the multiple alignments were later edited with the MacClade version 3.06 program (Maddison WP and Maddison DR 1992). The DnaSP version 4.10.1 program (Rozas et al. 2003) was used to estimate nucleotide diversity (π, Nei 1987) and different summary statistics (D, Tajima 1989; H, Fay and Wu 2000). Tests of neutrality were performed with the mlcoalsim program (Ramos-Onsins and Mitchell-Olds 2007). Statistical significance was in all cases obtained by computer simulations of the coalescent process under recombination and conditioning on theta. The recombination population parameter used (R = 2Nr, given the lack of recombination in Drosophila males) was based on the estimated rate of recombination for the phm gene at the 17D region (r = 2.27 × 10^-8 per base pair per generation; Hey and Kliman 2002) and N = 10^6 for D. melanogaster (as in Orengo and Aguadé 2004). Simulations were performed both under stationarity (standard neutral model, henceforth SNM), and under the 2 simple and possible bottleneck scenarios outlined in Orengo and Aguadé (2004): stepwise bottlenecks of intermediate severity (Sb = 0.33) and rather recent time of onset (Tb = 0.02 and 0.03, respectively, in units of 3N generations given the X-linked character of the region studied). The Hudson–Kreitman–Aguadé (HKA; Hudson et al. 1987) multilocus test was performed using the HKA program distributed by Jody Hey through http://lifesci.rutgers.edu/~heylab.
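As a quick check on the scaled parameters just described, the following Python sketch computes R = 2Nr per site and converts the bottleneck onset times from units of 3N generations into generations. The variable and function names are illustrative only. With the rounded r given in the text, the result is about 0.0454 per site, close to the value of 0.0455 per nt used for the clsw analysis, which suggests the latter was derived from an unrounded estimate of r (an inference, not something stated in the text).

```python
N = 1e6       # effective population size assumed in the text
r = 2.27e-8   # recombination rate per bp per generation (Hey and Kliman 2002)

# Population recombination parameter per site; 2Nr rather than 4Nr because
# recombination is absent in Drosophila males.
R_per_site = 2 * N * r


def bottleneck_generations(tb: float, n: float = N) -> float:
    """Convert a time expressed in units of 3N generations into generations."""
    return tb * 3 * n


print(f"R per site = {R_per_site:.4f}")
for tb in (0.02, 0.03):
    print(f"Tb = {tb}: about {bottleneck_generations(tb):,.0f} generations")
```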

Analysis of Selection and Sweep Localization
The composite likelihood method of Kim and Stephan (2002) for detecting positive selection along a recombining chromosome was applied to the entire 190,627-bp region surveyed using their clsw and ssw programs. In those cases with missing information for a particular strain and fragment (one strain in each of three fragments), analyses were performed considering that the polymorphic sites at these missing sequences had the ancestral state. This is a conservative assumption, as indicated by the similar results obtained when the 3 fragments with one missing sequence were excluded from the analysis (see Results and Discussion).

The clsw program was used to obtain the composite likelihood ratio (CLR) of the observed data set for test B (Kim and Stephan 2002), under the assumption of recombination (R = 0.0455 per nt). Availability of the D. simulans and D. yakuba sequences allowed using the likelihood ratio 1 (LR1) option of the program, which assumes that the derived state of a segregating site is known. In the few cases where the derived state could not be unambiguously established, either the D. sechellia sequence was used or the nucleotide with the higher frequency was assumed to be the ancestral one.

The ms program (Hudson 2002) was used to generate neutral genealogies of the region under study both under the SNM and the 2 bottleneck scenarios outlined above. Simulations were performed under recombination and conditioned on theta. The simulated data sets were run through clsw in order to obtain the corresponding P values of the observed data set. The low P values obtained (see Results) led us to perform a general goodness-of-fit (GOF) test (Jensen et al. 2005). The ssw program was used to generate genealogies under a selective sweep model based on the number of segregating sites in the empirical data set and using estimates of the location of the advantageous mutation and the strength of selection (α) obtained by the clsw program.
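The step of obtaining a P value from simulated data sets can be illustrated with a generic Monte Carlo sketch in Python. This is not the clsw or ms code: the function name is hypothetical, the numbers are toy values, and treating larger LR values as more extreme (a one-tailed comparison using "greater than or equal to") is an assumption about the procedure.

```python
def empirical_p_value(observed: float, simulated: list[float]) -> float:
    """Fraction of simulated statistics at least as large as the observed one."""
    if not simulated:
        raise ValueError("need at least one simulated value")
    exceed = sum(1 for value in simulated if value >= observed)
    return exceed / len(simulated)


# Toy example: LR values from four hypothetical neutral replicates.
null_lr = [1.0, 2.0, 3.0, 4.0]
print(empirical_p_value(3.0, null_lr))  # 0.5
```

In practice the null distribution would contain many thousands of replicates, one per simulated genealogy run through the likelihood program.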


    Results and Discussion
Levels of Polymorphism and Divergence at the 17C–17D Region
In order to investigate the level and pattern of nucleotide variation around fragment 0 (fragment 94 in Orengo and Aguadé 2004), which exhibited a significantly skewed frequency spectrum, we initially extended the study to include 17 fragments of noncoding DNA, spaced generally ~10 kb and spanning over ~190 kb (fig. 1 and table 1). According to Release 4.3 Mar 2006 of the Drosophila melanogaster genome sequence, 11 fragments were in intergenic regions, 1 in a 3' untranslated region (3'UTR), and 5 within introns. Three fragments exhibited no variation, whereas a total of 94 nucleotide polymorphic sites (46 with singletons) were detected in the 14 polymorphic fragments (supplementary fig. S1, Supplementary Material online). The average nucleotide diversity (π17 = 0.0017) was 2-fold lower than the average value for the 109 fragments of the X chromosome scan (π109 = 0.0038; Orengo and Aguadé 2004). Seven contiguous fragments (–41 to 22) exhibited a level of nucleotide polymorphism lower than π17. The average π value for the 7 fragments along this 63-kb region (π7 = 0.0006) was well below the empirical average from the genome scan (P = 0.091). The average divergence for these 7 fragments (K7 = 0.051) was similar both to the average for all 17 fragments in the present study (K17 = 0.047) and to the average value from the genome scan (K109 = 0.051; 95% confidence interval [CI], 0.019–0.086). The reduced polymorphism in the 7 fragments was therefore not the result of an especially low mutation rate in this region.


Table 1 Nucleotide Polymorphism and Divergence

 
Pattern of Polymorphism at the 17C–17D Region and Neutrality Tests
The initially salient feature of variation at the focal fragment was an excess of low-frequency variants. The Tajima D value (D = –1.77) was significantly negative (1-tailed test) under the SNM (P = 0.005) and also under the 2 simple bottleneck scenarios considered (P = 0.05 and P = 0.04, respectively). Comparisons with D. simulans allowed establishing that the 6 singletons in this fragment had the ancestral state, leading to an excess of high-frequency derived variants. The Fay and Wu's H value (H = –8.10) was significantly negative under the SNM (P < 0.001) and also significant or highly significant (P = 0.006 and P = 0.003, respectively) under the bottleneck scenarios.
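The reported Tajima's D for the focal fragment can be reproduced from the counts given in the Introduction (13 lines, 6 singletons, and 1 doubleton). The Python sketch below implements the standard formula of Tajima (1989) on minor-allele counts; it is an illustration rather than the DnaSP implementation, and it assumes that all 13 lines were scored at every polymorphic site of fragment 0.

```python
from math import sqrt


def tajimas_d(n: int, minor_counts: list[int]) -> float:
    """Tajima's D from the sample size and the minor-allele count at each site."""
    s = len(minor_counts)                      # number of segregating sites
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    pairs = n * (n - 1) / 2
    # Average number of pairwise differences over the whole fragment.
    k = sum(c * (n - c) for c in minor_counts) / pairs
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Fragment 0: 6 singletons and 1 doubleton in 13 lines.
d = tajimas_d(13, [1] * 6 + [2])
print(round(d, 2))  # -1.77
```

Because D depends only on the folded frequency spectrum, it is unaffected by whether the singletons carry the ancestral or the derived state; that distinction matters for Fay and Wu's H.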

In the extended analysis, the frequency spectrum in the 13 additional fragments surveyed that were polymorphic was generally skewed toward low-frequency ancestral variants, as indicated by the generally negative values of Fay and Wu's H statistics (table 1). In order to establish significance of individual values, simulations were performed under recombination and independently for each fragment both under the SNM and under the 2 simple bottleneck scenarios considered. Although most D and H values in the extended analysis were negative (11 out of 14), deviations were in most cases not significant or only significant under the SNM (table 1). Similar results were obtained when simulations were performed under recombination for a single ~190-kb region, and statistical significance established from the simulated genealogies for each fragment (results not shown).

Test statistics were also obtained for the concatenated data set, that is, considering all fragments together: Dc = –1.20 and Hc = –23.23. In this more conservative case, both statistics were significant (or highly significant) not only under the SNM (P < 0.001 in both cases) but also under the 2 bottleneck scenarios considered above (for Sb = 0.33 and Tb = 0.02, P(D) = 0.050 and P(H) < 0.001; for Sb = 0.33 and Tb = 0.03, P(D) = 0.030 and P(H) < 0.001). Therefore, the detected deviation in the frequency spectrum cannot be explained by the recent demographic history of the population, at least under plausible parameters for the 109 fragments analyzed in the genome scan (Orengo and Aguadé 2004).

Sweep Detection and Localization by Maximum Likelihood
The composite likelihood method of Kim and Stephan (2002) for detecting positive selection along a recombining chromosome was applied to the entire 190,627-bp sequence (–120/70 interval in table 2; fig. 1). The LR obtained yielded a highly significant result (table 2). A similar result was obtained when fragments –120, –110, and –95 with 1 missing sequence each (see Materials and Methods) were excluded from the analysis (table 2). Application of a GOF test (Jensen et al. 2005) revealed in both cases that the deviation detected by the CLR test was not the result of recent demographic changes (P = 0.145 and P = 0.140, respectively). For the 2 bottleneck scenarios considered (see above; Orengo and Aguadé 2004), the distribution of the GOF statistic probabilities for false positives (i.e., genealogies simulated under each bottleneck scenario that yielded a significant CLR test) was indeed very skewed, with over 88% of the distribution with P ≤ 0.05. These bottleneck scenarios cannot, therefore, account for the P values obtained with the CLR test from the observed data. In fact, the LR value for the –120/70 interval was highly significant not only under the SNM but also under the 2 bottleneck scenarios considered (with P values 0.009 and 0.010, respectively).


Table 2 CLR and GOF Tests for Positive Selection

 
The CLR test allows estimating the strength (α = 1.5 Ns, given the X-linked character of the studied region) as well as the location of the target of selection. The estimated selection parameter is high both when considering the complete –120/70 interval (α = 23977.8) and only the –82/70 interval (α = 21768.1). Assuming N = 10^6 (as in Orengo and Aguadé 2004), the estimated selection coefficient would be 1.6 × 10^-2 and 1.45 × 10^-2, respectively. The maximum LR value, and therefore the most likely location for the target of selection, lies in both cases within fragment 10 at positions 10011 and 10062, respectively. Because the likelihood ratio surface presented several other peaks, the CLR analysis for the –120/70 interval was repeated 17 times by removing in each case a different fragment. Results were in all cases similar to those of the initial analysis that included all 17 fragments (table 2), with LR values varying between 17.73 and 27.16 and GOF values varying between 1002.2 and 1195.2. In 16 of the 17 cases, the maximum LR value was within a 300-bp stretch of fragment 10. Although this could seem an indication of a rather narrow CI around the estimated target of selection, the sequence discontinuity of the data set and its possible impact on the CLR analysis as well as the limitations of the method itself should be taken into account (Kim and Stephan 2002; Pool et al. 2006).
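The conversion from the estimated selection parameter to a selection coefficient follows directly from α = 1.5 Ns. A minimal Python sketch, with a hypothetical function name and the N assumed in the text, recovers the two coefficients quoted above from the reported α values:

```python
def selection_coefficient(alpha: float, n: float = 1e6) -> float:
    """Invert alpha = 1.5 * N * s (X-linked scaling used in the text)."""
    return alpha / (1.5 * n)


estimates = {"-120/70": 23977.8, "-82/70": 21768.1}  # alpha values from the text
for interval, alpha in estimates.items():
    print(f"{interval}: s = {selection_coefficient(alpha):.4f}")
```

The results round to the reported 1.6 × 10^-2 and 1.45 × 10^-2. The estimate of s scales inversely with the assumed N, so a different effective population size would change s proportionally.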

Further Evolutionary Characterization of the 17D Region
The presence of 2 known genes (phm and Cyp18a1; fig. 1) close to the putative target of selection (at fragment 10) led us to completely sequence the region spanning from the distal (5') part of the Cyp18a1 gene to fragment 0 (i.e., from site –538 to site 19,863, henceforth named fragment 0–20; fig. 1). This fragment exhibited an overall low level of polymorphism (π = 0.0009) with singletons at most polymorphic sites (60 out of 87). Supplementary figure S2 (Supplementary Material online) gives a summary of polymorphism at the 0/32 interval that includes fragment 0–20 and fragments 22 and 32. Figure 2 reveals a trough of polymorphism between positions 6960 and 18960 (~12 kb) that could be extended to position 26700 at the midpoint between fragments 22 and 32 (~19.7 kb). The 2 flanks of this region depleted of variation exhibited an excess of high-frequency derived variants (fig. 2).


FIG. 2.— Distribution of nucleotide diversity (π) and of Fay and Wu's H statistic along the 0/32 interval. A scheme of the sequenced fragments (black bars) and genes (black arrowed bars) is given in the upper part of the figure. For the 0–20 fragment, the values given correspond to nonoverlapping 1,000-nt windows, whereas for the 22 and 32 fragments they correspond to the size of each fragment. Dashed lines in each panel correspond to the average nucleotide diversity for the 109 fragments in the genome scan (Orengo and Aguadé 2004) and to the expected H value under the SNM, respectively. Lines labeled as 12 and 19.7 kb indicate the size of the regions considered to harbor reduced polymorphism.
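The kind of windowed profile shown in figure 2 can be sketched as follows. This Python example computes per-site nucleotide diversity in nonoverlapping windows from an aligned set of sequences. It is a simplified illustration with hypothetical function names and toy data: it assumes equal-length sequences and treats every column as comparable, with no special handling of alignment gaps or missing data, which the DnaSP analysis would handle differently.

```python
from itertools import combinations


def pairwise_diffs(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def windowed_pi(seqs: list[str], window: int) -> list[float]:
    """Per-site nucleotide diversity in nonoverlapping windows of a given size."""
    length = len(seqs[0])
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    values = []
    for start in range(0, length, window):
        chunk = [s[start:start + window] for s in seqs]
        width = len(chunk[0])  # the last window may be shorter
        total = sum(pairwise_diffs(a, b) for a, b in combinations(chunk, 2))
        values.append(total / (n_pairs * width))
    return values


# Toy alignment: three sequences of 8 sites, with one polymorphic site
# in the second 4-site window.
toy = ["AAAAAAAA", "AAAAAAAT", "AAAAAAAT"]
print(windowed_pi(toy, 4))
```

For the real data the window would be 1,000 nt and the input would be the 13 aligned D. melanogaster sequences of fragment 0–20.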

 
The CLR analysis was performed again considering the 0/32 interval. Results of this analysis were rather similar to those previously performed considering the 17 fragments. Indeed, the estimated LR value (36.94) was again highly significant under the SNM (table 2), and the P value associated with the GOF statistic obtained also supported the selective scenario. The similar results obtained in the analyses of the –120/70 and the 0/32 intervals relative to the null hypothesis of no selection would indicate that at least in this case the significant result obtained under the SNM in the first analysis was not mainly due to sequence discontinuity (Pool et al. 2006). As in the –120/70 interval, the estimated LR value was also significant considering demographic history alone (P = 0.013 and P = 0.016 for the 2 bottleneck scenarios considered, respectively).

Under the selective scenario, the strength of selection and the location of the target of selection can be estimated through the CLR method (as mentioned above). The analysis considering the 0/32 interval yielded an approximately 4-fold lower estimate of the selection coefficient than the previous analyses considering the 17 fragments in the –120/70 interval, or the 14 fragments in the –82/70 interval (table 2). Indeed, the estimated selection coefficient would be 3.9 × 10^-3 as compared with 1.6 × 10^-2 and 1.45 × 10^-2, respectively. Also, the new analysis predicted a slightly different location for the target of selection: at site 13784 versus sites 10011 and 10062 previously (table 2).

In an effort to explore the effect of the level of discontinuity (measured as the percentage of the region not covered by sequenced fragments) on selection coefficient estimates, the observed size of the region of reduced variation (either 12 or 19.7 kb; fig. 2) was compared with that expected (Stephan et al. 1992) from each estimated s value: s = 1.6 × 10^-2 and s = 3.9 × 10^-3 from variation at the –120/70 (92% uncovered) and 0/32 (35% uncovered) intervals, respectively. The relative reduction of variation (i.e., the ratio observed to expected level of variation) was also obtained from the data. Two expected levels of variation were used to calculate relative reduction: 1) the average level of variation detected at the 0/32 interval (which would underestimate the expected level of variation in the interval); and 2) the level of variation predicted from the multilocus HKA test, which was performed considering the 0/32 interval that includes fragment 0 and the remaining 108 fragments in the genome scan. Under 1, the expected sizes for the 12-kb stretch were 76 and 24 kb for the stronger and milder selection, respectively (160 and 56 kb for the 19.7-kb stretch). Under 2, the corresponding sizes for the 12-kb stretch were 54 and 15 kb (140 and 48 kb for the 19.7-kb stretch). Estimates under milder selection were much closer to the observed values. It would seem therefore that the method behaves better the lower discontinuity is (i.e., the higher coverage is) and likely in estimating not only the strength of selection but also the location of the target of selection. Under this assumption, the putative target of selection would rather be around site 13784, which is located in the phm gene that encodes CYP306A1, a cytochrome P450 enzyme in the ecdysteroidogenic pathway (Warren et al. 2004). However, given the close proximity between both estimated locations and the error associated with selective site estimation (Li and Stephan 2005; Glinka et al. 2006) as well as the observed window of reduced variation (fig. 2), it seemed worthwhile searching for possible functional targets of selection not only at the phm transcriptional unit, but also at an approximately 7-kb stretch of its upstream noncoding region.

Comparison of the phm transcriptional unit between D. melanogaster and D. simulans using D. yakuba as the out-group revealed 47 differences that had been fixed in the D. melanogaster lineage (i.e., after the split from the D. simulans lineage): 2 in the 5'UTR, 27 in the coding region (6 nonsynonymous and 21 synonymous), 17 in the introns, and 1 in the 3'UTR. All amino acid replacements (I44L, D144E, R307L, I320S, P485A, and P532S) were located in the α domain of the protein and could, therefore, affect the enzyme activity (Werck-Reichhart and Feyereisen 2000). Also, some of the changes in the gene untranslated regions (UTRs) and introns might affect translation. Comparison of the 7-kb noncoding stretch revealed 88 fixed differences, with only 3 located in regions conserved in the 3 species compared as well as in Drosophila pseudoobscura. However, no known regulatory motifs were identified in any of these conserved noncoding regions (results not shown), providing no support for their possible functionality. The putative target of selection would therefore more likely be in the phm transcriptional unit. No clear candidate can, however, emerge from our in silico search given the lack of functional information both on the different protein variants and the putative and unknown regulatory motifs in the gene UTRs and introns.

Genome Scan Outliers and Selective Sweeps
In D. melanogaster, genome scans of variation in derived populations, which aimed to uncover the action of positive selection through its effect on levels and patterns of variation, identified some outlier loci as candidates to have been the subject of adaptive change (Glinka et al. 2003; Kauer et al. 2003; Orengo and Aguadé 2004). These scans also led to the first proposals of likely demographic scenarios (i.e., bottleneck parameters) for the out-of-Africa range expansion of the species (Glinka et al. 2003; Orengo and Aguadé 2004; Haddrill et al. 2005; Ometto et al. 2005; Thornton and Andolfatto 2006). For a specific short region of the genome (as generally used in genome scans; Glinka et al. 2003; Orengo and Aguadé 2004; Ometto et al. 2005), demographic changes and positive selection may result in a similar deviation in the level and/or pattern of variation. Extending the study of particular outliers to neighboring regions may allow discriminating between the 2 possibilities through the combined use of the CLR (Kim and Stephan 2002) and the GOF methods (Jensen et al. 2005). The CLR method could yield false positives for nonstationary populations given that its null model is the SNM. However, application of the GOF test constitutes an indirect way to validate positive results of the CLR method. Also, the use of simulations performed under plausible demographic scenarios can corroborate positive results (see above). In the 17C–17D region, as in other candidate regions previously analyzed (Beisswanger et al. 2006; Glinka et al. 2006), this procedure clearly supported the action of positive selection. However, it was recently shown that correcting for demography alone did allow further but not complete discrimination of false positives and that correction for any ascertainment bias should be adopted (Thornton and Jensen 2007).
Indeed, the focal fragment of these extended analyses was not randomly chosen among fragments of the genome scan, but because of its outlier character, which might greatly affect the results of the CLR analysis. In our case, the focal fragment (fragment 0) was chosen because it exhibited an excess of low-frequency variants, that is, a significantly negative Tajima's D value. In order to account for this outlier character of the focal fragment in the CLR analysis of the –120/70 interval, the analysis was performed again using data sets simulated under the bottleneck scenario, and whose focal fragment, based on its negative D value, would have been an outlier in the genome scan (Orengo and Aguadé 2004). The CLR analysis of the 0/32 interval was performed similarly, given that the nonrandomly chosen focal fragment (fragment 0) was included in this interval. Alternatively, the choice of the smaller interval for further analysis was accounted for by repeating the CLR analysis with data sets simulated under the bottleneck scenario, and with negative Fay and Wu's H values at the 2 flanking ~1-kb regions (fragments 0 and 32) at least as extreme as the less extreme of both values (Orengo and Aguadé, unpublished results). Both in the –120/70 and 0/32 interval analyses, the observed LR values lost significance when both the demographic history and the nonrandom choice of fragment were accounted for. In both cases, however, the conditional probabilities were rather low: 0.146 for the –120/70 interval, and 0.143 (or 0.151) for the 0/32 interval. Although at face value these results would preclude rejecting the null hypothesis of no selection at this region, different aspects of the CLR analysis performed need to be considered because they might affect these results and conclusion.
The possible impact of sequence discontinuity on the CLR analysis is not well established, even if it does not seem to have a major effect here given the similar results obtained for the –120/70 and 0/32 intervals. More important might be the limitations of the method itself (Kim and Stephan 2002) and the simple demographic model used in these extended analyses. Given the previous caveats, the results obtained might be considered suggestive of the recent action of positive selection in the phm gene region, although further theoretical and analytical work is needed for its final validation or rejection (i.e., being a real or a false positive). In addition, functional studies might also be required to finally establish (or not) the adaptive character of positives validated through the analysis of linked regions.

As discussed above, regions affected by the recent fixation of a new advantageous mutation, that is, real-positive regions, should be among outliers. However, their status as an enriched sample of such regions in the available genome-scan data sets would deserve further verification. In any case, the outlier approach would facilitate detecting only those recent adaptive changes causing important distortions in the level and/or pattern of neutral linked variation, which may be a rather small fraction of recent adaptive changes (Teshima et al. 2006).


    Supplementary Material
 
Supplementary figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Acknowledgements
 
We thank D. Salguero, A. Fernández-Robles, and G. Blasco for excellent technical assistance and Serveis Científico-Tècnics from Universitat de Barcelona for automated sequencing facilities. We also thank D. Alvarez-Ponce, P. Librado, and F. G. Vieira for help with computer program implementation. Special thanks are given to S. Ramos-Onsins for sharing the mlcoalsim software for multilocus tests of neutrality prior to its publication and to J. Rozas and C. Segarra for comments on the manuscript. This work was supported by grants QLRT-2001-00004 from the European Community; BFU2004-02253 from Comisión Interdepartamental de Ciencia y Tecnología, Spain; 2005SGR-00166 from Comissió Interdepartamental de Recerca i Innovació Tecnològica, Generalitat de Catalunya, Spain; and by special support (Distinció per la Promoció de la Recerca Universitària) from Generalitat de Catalunya to M.A.


    Footnotes
 
Marcy Uyenoyama, Associate Editor


    References
 

    Aguadé M, Miyashita N, Langley CH. Reduced variation in the yellow-achaete-scute region in natural populations of Drosophila melanogaster. Genetics (1989) 122:607–615.

    Aguadé M, Miyashita N, Langley CH. Polymorphism and divergence in the Mst26A male accessory gland gene region in Drosophila. Genetics (1992) 132:755–770.

    Aguadé M, Rozas J, Segarra C. Inferring the action of natural selection from DNA sequence comparisons: data from Drosophila. In: Moya A, Font E, eds. Evolution: from molecules to ecosystems (2004) Oxford: Oxford University Press. 11–19.

    Andolfatto P. Adaptive evolution of non-coding DNA in Drosophila. Nature (2005) 437:1149–1152.

    Ashburner M. Drosophila: a laboratory handbook (1989) Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press.

    Bauer DuMont V, Aquadro CF. Multiple signatures of positive selection downstream of Notch on the X chromosome in Drosophila melanogaster. Genetics (2005) 171:639–653.

    Beisswanger S, Stephan W, De Lorenzo D. Evidence for a selective sweep in the wapl region of Drosophila melanogaster. Genetics (2006) 172:265–274.

    Charlesworth B, Morgan MT, Charlesworth D. The effect of deleterious mutations on neutral molecular variation. Genetics (1993) 134:1289–1303.

    Charlesworth D, Charlesworth B, Morgan MT. The pattern of neutral molecular variation under the background selection model. Genetics (1995) 141:1619–1632.

    Dean MD, Ballard KJ, Glass A, Ballard JWO. Influence of two Wolbachia strains on population structure of East African Drosophila simulans. Genetics (2003) 165:1959–1969.

    Fay JC, Wu C-I. Hitchhiking under positive Darwinian selection. Genetics (2000) 155:1405–1413.

    Glinka S, De Lorenzo D, Stephan W. Evidence of gene conversion associated with a selective sweep in Drosophila melanogaster. Mol Biol Evol (2006) 23:1869–1878.

    Glinka S, Ometto L, Mousset S, Stephan W, De Lorenzo D. Demography and natural selection have shaped genetic variation in Drosophila melanogaster: a multi-locus approach. Genetics (2003) 165:1269–1278.

    Haddrill PR, Thornton KR, Charlesworth B, Andolfatto P. Multilocus patterns of nucleotide variability and the demographic and selection history of Drosophila melanogaster populations. Genome Res (2005) 15:790–799.

    Hey J, Kliman RM. Interactions between natural selection, recombination and gene density in the genes of Drosophila. Genetics (2002) 160:595–608.

    Hudson RR. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics (2002) 18:337–338.

    Hudson RR, Kreitman M, Aguadé M. A test of neutral molecular evolution based on nucleotide data. Genetics (1987) 116:153–159.

    Jensen JD, Kim Y, DuMont VB, Aquadro CF, Bustamante CD. Distinguishing between selective sweeps and demography using DNA polymorphism data. Genetics (2005) 170:1401–1410.

    Kaplan NL, Hudson RR, Langley CH. The "hitchhiking effect" revisited. Genetics (1989) 123:887–899.

    Kauer MO, Dieringer D, Schlötterer C. A microsatellite variability screen for positive selection associated with the "out of Africa" habitat expansion of Drosophila melanogaster. Genetics (2003) 165:1137–1148.

    Kim Y, Stephan W. Detecting the local signature of genetic hitchhiking along a recombining chromosome. Genetics (2002) 160:765–777.

    Kreitman M, Hudson RR. Inferring the evolutionary histories of the Adh and Adh-dup loci in Drosophila melanogaster from patterns of polymorphism and divergence. Genetics (1991) 127:565–582.

    Li H, Stephan W. Maximum-likelihood for detecting recent positive selection and localizing the selected site in the genome. Genetics (2005) 171:377–384.

    Maddison WP, Maddison DR. MacClade: analysis of phylogeny and character evolution (1992) Sunderland (MA): Sinauer Associates. version 3.

    Maynard Smith J, Haigh J. The hitch-hiking effect of a favourable gene. Genet Res (1974) 23:23–35.

    Nei M. Molecular evolutionary genetics (1987) New York: Columbia University Press.

    Ometto L, Glinka S, De Lorenzo D, Stephan W. Inferring the effects of demography and selection on Drosophila melanogaster populations from a chromosome-wide scan of DNA variation. Mol Biol Evol (2005) 22:2119–2130.

    Orengo DJ, Aguadé M. Detecting the footprint of positive selection in a European population of Drosophila melanogaster: multilocus pattern of variation and distance to coding regions. Genetics (2004) 167:1759–1766.

    Pool LE, Bauer DuMont V, Mueller JL, Aquadro CF. A scan of molecular variation leads to the narrow localization of a selective sweep affecting both Afrotropical and cosmopolitan populations of Drosophila melanogaster. Genetics (2006) 172:1093–1105.

    Ramos-Onsins SE, Mitchell-Olds T. mlcoalsim: multilocus coalescent simulations. Evolutionary Bioinformatics (2007) 2:41–44.

    Rozas J, Sánchez-DelBarrio JC, Meseguer X, Rozas R. DnaSP, DNA polymorphism analysis by the coalescent and other methods. Bioinformatics (2003) 19:2496–2497.

    Stephan W, Langley CH. Molecular genetic variation in the centromeric region of the X chromosome in three Drosophila ananassae populations. I. Contrasts between the vermilion and forked loci. Genetics (1989) 121:89–99.

    Stephan W, Wiehe T, Lenz MW. The effect of strongly selected substitutions on neutral polymorphism: analytical results based on diffusion theory. Theor Popul Biol (1992) 41:237–254.

    Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics (1989) 123:585–595.

    Teshima KM, Coop G, Przeworski M. How reliable are empirical genomic scans for selective sweeps? Genome Res (2006) 16:702–712.

    Thornton K, Andolfatto P. Approximate Bayesian inference reveals evidence for a recent, severe bottleneck in a Netherlands population of Drosophila melanogaster. Genetics (2006) 172:1607–1619.

    Thornton KR, Jensen JD. Controlling the false positive rate in multilocus genome scans for selection. Genetics (2007) 175:737–750.

    Voight BF, Adams AM, Frisse LA, Qian Y, Hudson RR, Di Rienzo A. Interrogating multiple aspects of variation in a full resequencing data set to infer human population size changes. Proc Natl Acad Sci USA (2005) 102:18508–18513.

    Warren JT, Petryk A, Marques G, et al. (12 co-authors). Phantom encodes the 25-hydroxylase of Drosophila melanogaster and Bombyx mori: a P450 enzyme critical in ecdysone biosynthesis. Insect Biochem Mol Biol (2004) 34:991–1010.

    Werck-Reichhart D, Feyereisen R. Cytochromes P450: a success story. Genome Biol (2000) 1:reviews3003.1–reviews3003.9.

Accepted for publication February 16, 2007.

