MBE Advance Access originally published online on August 21, 2006
Molecular Biology and Evolution 2006 23(11):2203-2213; doi:10.1093/molbev/msl094
Research Articles
Intron Length Evolution in Drosophila
Department of Molecular Biology and Genetics, Cornell University
E-mail: dvnp{at}mail.rochester.edu.
| Abstract |
|---|
I present data on the evolution of intron lengths among 3 closely related Drosophila species, D. melanogaster, Drosophila simulans, and Drosophila yakuba. Using D. yakuba as an outgroup, I mapped insertion and deletion mutations in 148 introns (spanning ~30 kb) to the D. melanogaster and D. simulans lineages. Intron length evolution in the 2 sister species has been different: in D. melanogaster, X-linked introns have increased slightly in size, whereas autosomal ones have decreased slightly in size; in D. simulans, both X-linked and autosomal introns have decreased in size. To understand the possible evolutionary causes of these lineage- and chromosome-specific patterns of intron evolution, I studied insertion-deletion (indel) polymorphism and divergence in D. melanogaster. Small insertion mutations segregate at elevated frequencies and enjoy elevated probabilities of fixation, particularly on the X chromosome. In contrast, there is no detectable X chromosome effect on fixations in D. simulans. These findings suggest X chromosome-specific selection or biased gene conversion/gap repair favoring insertions in D. melanogaster but not in D. simulans. These chromosome- and lineage-specific patterns of indel substitution are not easily explained by existing general population genetic models of intron length evolution. Genomic data from D. melanogaster further suggest that the forces described here affect introns and intergenic regions similarly.
Key Words: Drosophila melanogaster; Drosophila simulans; intron; insertion; deletion; indel
| Introduction |
|---|
The genome of Drosophila melanogaster contains over 48,000 introns that must be replicated, transcribed, spliced from precursor mRNAs, and enzymatically degraded (Misra et al. 2002).
Given that intron primary sequences are constrained, intron lengths might also be constrained. The distribution of intron lengths within and between species is certainly not easily explained by mutational bias in the relative rates of deletion versus insertion: if introns lacked length constraints, then the well-known deletion-biased mutation pressure (Petrov et al. 1996; Petrov and Hartl 1998; Blumenstiel et al. 2002) would cause introns to decay and ultimately disappear. Instead, intron lengths in D. melanogaster range between 44 bp and >70 kb, with a strong mode at 58 bp (Mount et al. 1992; Deutsch and Long 1999; Adams et al. 2000; Comeron and Kreitman 2000; Misra et al. 2002). Introns of ~45 bp are believed to represent a minimum size in D. melanogaster, below which splicing reactions may be compromised (Guo et al. 1993; Talerico and Berget 1994). Deletions that reduce intron length below this minimum size are thus believed to be strongly deleterious (Mount et al. 1992). In addition to direct purifying selection on minimum size, 2 findings suggest that selection influences intron size indirectly as a by-product of selection on functional elements in introns. First, intron sequence divergence is negatively related to intron length, as expected if longer introns comprise more functional elements (Haddrill et al. 2005; Marais et al. 2005). This correlation is strongest for first introns which, being nearest to transcription start sites, tend to be longer and to contain more cis-regulatory elements (Duret 2001; Marais et al. 2005). Second, deletions affecting Drosophila introns are less frequent and smaller than those affecting pseudogenes and "dead-on-arrival" transposons, consistent with functional constraints on intron content (Comeron and Kreitman 2000; Ptak and Petrov 2002; Ometto et al. 2005). Therefore, in addition to selection on minimum size, the presence of functional elements also constrains intron size.
But not all sites in introns encode functional elements. One can therefore ask: what population genetic forces influence the evolution of nonfunctional, presumably expendable, sequences in introns? Why, for instance, have some Drosophila species evolved longer introns than others (Moriyama et al. 1998)? It seems unlikely that such species differences exist because the density of functional elements (or the complexity of gene regulation) is greater in one species than in its close relatives. Instead, it seems more plausible that some intron sequence is expendable and that some variation in intron length is due to differences in the amount of such sequence.
Two findings may shed light on the evolution of expendable intron sequence. First, in humans, nematodes, and Drosophila, intron length is negatively correlated with gene expression level (Castillo-Davis et al. 2002; Urrutia and Hurst 2003; Marais et al. 2005), suggesting that selection eliminates expendable sequence in highly expressed genes to minimize the cost and/or rate of transcription (Carvalho and Clark 1999; Castillo-Davis et al. 2002; Urrutia and Hurst 2003). Second, in D. melanogaster, a negative correlation exists between local recombination rate and intron size (Carvalho and Clark 1999; Comeron and Kreitman 2000). This finding, presumably reflecting the distribution of expendable intron sequence, cannot be explained by variation in the insertion-deletion (indel) mutational profile, as the degree of deletion-biased mutation does not vary with recombination rate (Comeron and Kreitman 2000; Blumenstiel et al. 2002). Instead, the longer introns of low recombination regions have been interpreted as evidence for some form of weak selection on intron length (Carvalho and Clark 1999; Comeron and Kreitman 2000).
Three weak selection models have been offered. In the first 2, intron length evolves as a consequence of Hill-Robertson effects (Hill and Robertson 1966; Felsenstein 1974; Gordo and Charlesworth 2001). Hill and Robertson found that natural selection acting at one locus interferes with selection at linked loci (Hill and Robertson 1966; Birky and Walsh 1988; Hey 1999). Recombination reduces linkage among loci, thereby alleviating interference and increasing the efficacy of natural selection. Thus, natural selection may be less effective at loci in low recombination regions of the genome as these experience more interference than loci in high recombination regions (Kliman and Hey 1993; Betancourt and Presgraves 2002; Hey and Kliman 2002; Presgraves 2005). In the first model, Carvalho and Clark (1999) argue that genes in low recombination regions are less able to prevent the accumulation of weakly deleterious insertions. In the second model, however, Comeron and Kreitman (2000) argue that the known deletion-biased mutation pressure would drive introns to smaller sizes; thus, the fact that introns are longer in low recombination regions suggests that insertions enjoy a relative advantage in regions of low versus high rates of recombination. Comeron and Kreitman (2000) suggest that longer introns may be favored in low recombination regions as modifiers that increase recombination rates between adjacent exons and thus alleviate interference acting at many weakly selected sites, for example, synonymous sites (see also Comeron and Kreitman 2002; Qin et al. 2004; Comeron and Guthrie 2005). The third model is also a weak selection model, but it does not attempt to explain the correlation between recombination rate and intron length. By studying the long-term evolution of 15 Drosophila introns, Parsch (2003) inferred that small deletions are commonly fixed by mutation pressure, whereas larger, relatively rare insertions that restore optimal intron size are occasionally fixed by compensatory selection (see also Stephan et al. 1994). Each of these models makes distinct, testable predictions, for example, that insertions are weakly deleterious (Carvalho and Clark 1999), that insertions are weakly favorable in low recombination regions (Comeron and Kreitman 2000), or that optimal intron size is maintained by the long-term balance of mutation pressure and compensatory selection on indel mutations (Parsch 2003).
Here, I study indel polymorphism and divergence in a large collection of introns from D. melanogaster, Drosophila simulans, and Drosophila yakuba to better understand what forces have shaped intron length evolution. I find that lineage- and sex chromosome-specific forces, such as weak selection or biased gene conversion, act on insertions. These findings are not easily explained by a single general model of intron length evolution.
| Materials and Methods |
|---|
The Data
I gathered publicly available DNA sequences for 68 intron-containing genes with polymorphism data in D. melanogaster (genes, sample sizes, and references are provided in Supplementary Table 1, Supplementary Material online). In the analyses presented below, I included genes for which 6 or more chromosomes were sampled from the population. I also gathered homologous sequences from D. simulans and D. yakuba when available; when unavailable, I identified homologous sequences using Blast searches against the D. simulans and D. yakuba genome sequences. If multiple D. simulans and D. yakuba sequences were available, one from each species was chosen arbitrarily. To annotate sequences, I used GadFly version 4.3 of the D. melanogaster genome. All sequences were aligned using DIALIGN2 with default parameters (Morgenstern 1999).
94 kb of coding sequence from regions flanking the introns. The average number of chromosomes sampled from D. melanogaster is 21.5 (median = 20), with sample sizes ranging from n = 6 to 71. For some analyses, I focused on chromosomes sampled from African populations of D. melanogaster, where the average number of chromosomes sampled is 13.6 (median = 12) and sample sizes range from n = 6 to 25. Polymorphism and divergence data for nucleotide changes were tallied using DnaSP v. 4.0 (Rozas et al. 2003).
To investigate the effects of recombination rate on intron evolution, I used Kliman and Hey's (1993) "KH93" estimator of local recombination rate that, like other genome-wide estimators (Hey and Kliman 2002), is based on the relationship between the genetic and cytological maps of the D. melanogaster genome. The KH93 estimator is based on the fit of 4- and 5-term polynomials of the genetic and physical maps of each chromosome. KH93 is used here because, in a separate study, I found that it explained more variation in silent nucleotide variability for 98 loci scattered throughout the D. melanogaster genome than 5 alternative recombination rate estimators (Presgraves 2005). In the analyses below, I consider 3 recombination rate classes, low (0 to 1.663 × 10⁻⁸ rec/bp/gen), medium (1.664 to 3.327 × 10⁻⁸ rec/bp/gen), and high (3.328 to 4.999 × 10⁻⁸ rec/bp/gen), that span equal ranges of recombination rates in the genome (Hey and Kliman 2002).
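The three recombination classes can be sketched as a small lookup. This is only an illustration: it reads the class ranges as 0 to 1.663 × 10⁻⁸, 1.664 to 3.327 × 10⁻⁸, and 3.328 to 4.999 × 10⁻⁸ rec/bp/gen, and the names `CLASS_UPPER_BOUNDS` and `recombination_class` are hypothetical, not part of the original analysis.

```python
from typing import Optional

# Upper bounds (rec/bp/gen) of the low, medium and high classes as read
# from the text. The constant and function names are hypothetical.
CLASS_UPPER_BOUNDS = (
    ("low", 1.663e-8),
    ("medium", 3.327e-8),
    ("high", 4.999e-8),
)


def recombination_class(rate: float) -> Optional[str]:
    """Return 'low', 'medium' or 'high', or None if the rate is out of range.

    A rate that falls between two printed bounds (e.g. 1.6635e-8) is
    assigned to the higher class; that tie-breaking rule is an assumption.
    """
    if rate < 0:
        return None
    for name, upper in CLASS_UPPER_BOUNDS:
        if rate <= upper:
            return name
    return None
```

For example, `recombination_class(2.0e-8)` falls in the medium class under these bounds.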
Scoring Indels
The common ancestor of D. melanogaster and D. simulans diverged from D. yakuba ~5 to 12 MYA and subsequently split into D. melanogaster and D. simulans ~2.5 to 5 MYA (Li et al. 1999; Tamura et al. 2004). I used D. yakuba sequences as outgroups to map mutations onto the branches of the 3-species phylogeny using parsimony (Akashi 1994) and to classify indel events as either insertions or deletions. Only indels that could be unambiguously classified as insertions or deletions were used in the analyses (see fig. 1). Of 1,076 indel events identified, 951 (88.3%) could be unambiguously mapped onto the branches of the phylogeny as either polymorphic in D. melanogaster (135), fixed in D. melanogaster (183), fixed in D. simulans (136), or fixed in the branches connecting D. yakuba and the most recent common ancestor of D. melanogaster and D. simulans (540). In some cases, an indel could be classified as an insertion or deletion but, due to overlapping or contiguous indels, its length could not be unambiguously determined (fig. 1C). Therefore, only nonoverlapping indels were included in analyses involving indel length. To estimate intron lengths in each species, I calculated the gapless length of each intron; when intron lengths varied within species, I used the average length of the population sample.
Analysis of Polymorphic Mutation Frequencies
To compare the frequency spectra of different classes of mutation across loci, I first rescaled frequencies to a common sample size of n = 6 (i.e., the lowest common sample size in the data set), where 6 copies is equivalent to being fixed in the sample. Rescaling of frequencies for loci with n > 6 was done by first estimating a mutation's frequency from the entire sample and then multiplying by 6 to obtain the expected number in a random sample with n = 6. This procedure sacrifices information from larger samples (very high or low frequency variants are scored as fixed or absent, respectively, and thus are excluded from frequency analyses) while treating all mutation classes similarly and should therefore be conservative.
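The rescaling step can be sketched in a few lines. The text specifies multiplying the sample frequency by 6 and treating 0 and 6 copies as absent and fixed; how fractional expected counts were converted to whole copies is not stated, so rounding half up is an assumption here, as is the function name.

```python
import math
from typing import Optional


def rescale_count(count: int, n: int, target_n: int = 6) -> Optional[int]:
    """Rescale a mutation seen `count` times among `n` chromosomes to `target_n`.

    Returns the rescaled number of copies (1 .. target_n - 1), or None when
    the variant would be scored as absent (0 copies) or fixed (target_n
    copies) in the smaller sample and so drops out of frequency analyses.
    """
    if n < target_n or not 0 <= count <= n:
        raise ValueError("need n >= target_n and 0 <= count <= n")
    expected = target_n * count / n         # expected copies in a sample of target_n
    rescaled = math.floor(expected + 0.5)   # round half up (an assumption)
    if rescaled <= 0 or rescaled >= target_n:
        return None
    return rescaled
```

A variant present on 10 of 20 chromosomes rescales to 3 of 6, whereas a singleton in a sample of 20 rescales to 0 and is excluded, which is the loss of information the paragraph above describes.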
After rescaling, mutations were pooled across loci. I compared frequencies using nonparametric Mann-Whitney tests, and I compared the frequency spectra using Tajima's (1989) DTaj and Fu and Li's (1993) DFu. I estimated DTaj and DFu using Fu's web application (http://hgc.sph.uth.tmc.edu/fu/genealogy/test2/welcome.html). Deviations from the standard neutral model were evaluated by estimating the probability of a value of D more extreme than that observed from the distribution obtained by simulating 1,000 neutral genealogies without recombination. The no-recombination assumption makes these tests conservative.
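For readers who want to see what DTaj measures, the standard formula of Tajima (1989) can be written out directly. This is a textbook sketch, not the web application used above; the function name and the input format (one allele count per segregating site) are assumptions.

```python
import math
from typing import Sequence


def tajimas_d(n: int, allele_counts: Sequence[int]) -> float:
    """Tajima's D for a sample of n chromosomes.

    `allele_counts` holds, for each segregating site, the number of
    chromosomes carrying one of the two alleles (1 .. n - 1).
    """
    if n < 4:
        raise ValueError("the variance term is zero for n < 4")
    s = len(allele_counts)
    if s == 0 or any(not 0 < c < n for c in allele_counts):
        raise ValueError("need at least one site, each with 0 < count < n")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * c * (n - c) for c in allele_counts) / (n * (n - 1))
    # Compare pi with Watterson's estimator S / a1, scaled by its standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

With n = 6, a set of singletons gives a negative value (an excess of low-frequency variants), whereas variants at 3 of 6 copies give a positive value (an excess of intermediate-frequency variants).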
To test for heterogeneity in values of DTaj (and DFu) between 2 mutation classes (e.g., insertions versus deletions), I used the method and program of Hahn et al. (2002). The program performs neutral coalescent Monte Carlo simulations using a perl implementation of Hudson's (2002) make_tree program to generate gene genealogies. For each contrast, I used the program to generate 1,000 neutral genealogies with no recombination using 2 values of θW, one for each mutation class (Watterson 1975). For each genealogy simulated, D values were calculated for each of the 2 mutation classes and the difference, for example, ΔD = Dinsertion − Ddeletion, was saved. The distribution of ΔD from these 1,000 neutral genealogies was then used to estimate the probability of obtaining ΔD as great or greater than the ΔD observed, that is, P(ΔD ≥ ΔDobserved) (Hahn et al. 2002). Significant values of ΔD reveal that different processes affect the frequency spectra of the 2 mutation classes. All probabilities are 2-tailed.
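The logic of this heterogeneity test can be sketched with a small standard-coalescent simulation: both mutation classes are dropped onto the same neutral genealogy, each with its own θ, and the difference in Tajima's D is recorded. This is a minimal sketch under stated assumptions, not the Hahn et al. (2002) program: all function names are hypothetical, replicates in which either class has no segregating sites are skipped, and the 2-tailed probability is taken from absolute values.

```python
import math
import random
from typing import List, Optional


def tajimas_d(n: int, counts: List[int]) -> float:
    """Tajima's (1989) D from per-site allele counts (requires n >= 4)."""
    s = len(counts)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    c1 = (n + 1) / (3.0 * (n - 1)) - 1.0 / a1
    c2 = (2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
          - (n + 2) / (a1 * n) + a2 / a1 ** 2)
    pi = sum(2.0 * c * (n - c) for c in counts) / (n * (n - 1))
    var = (c1 / a1) * s + (c2 / (a1 ** 2 + a2)) * s * (s - 1)
    return (pi - s / a1) / math.sqrt(var)


def poisson(rng: random.Random, lam: float) -> int:
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_delta_d(n: int, theta_a: float, theta_b: float,
                     rng: random.Random) -> Optional[float]:
    """One neutral genealogy without recombination; returns D_a - D_b."""
    lineages = [1] * n                  # sampled descendants of each lineage
    counts_a: List[int] = []
    counts_b: List[int] = []
    while len(lineages) > 1:
        k = len(lineages)
        t = rng.expovariate(k * (k - 1) / 2)   # time to next coalescence
        for size in lineages:
            # Mutations arise at rate theta / 2 per lineage per unit time.
            counts_a.extend([size] * poisson(rng, theta_a / 2 * t))
            counts_b.extend([size] * poisson(rng, theta_b / 2 * t))
        i, j = sorted(rng.sample(range(k), 2))
        merged = lineages[i] + lineages[j]
        lineages.pop(j)                 # remove the larger index first
        lineages.pop(i)
        lineages.append(merged)
    if not counts_a or not counts_b:
        return None                     # D is undefined without segregating sites
    return tajimas_d(n, counts_a) - tajimas_d(n, counts_b)


def heterogeneity_p(observed: float, n: int, theta_a: float, theta_b: float,
                    reps: int = 1000, seed: int = 0) -> float:
    """Share of simulated |delta D| values at least as large as |observed|."""
    rng = random.Random(seed)
    draws = (simulate_delta_d(n, theta_a, theta_b, rng) for _ in range(reps))
    null = [d for d in draws if d is not None]
    return sum(abs(d) >= abs(observed) for d in null) / len(null)
```

Because both classes share one genealogy in each replicate, the null distribution of ΔD accounts for the correlation between them that linkage creates, which is the point of simulating the two classes jointly rather than separately.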
| Results |
|---|
Indel Polymorphism and Divergence
The data set consists of 148 introns from 68 genes. In D. melanogaster, 135 indel polymorphisms occurred in 43 introns from 36 genes. Table 1 shows that 73 indel polymorphisms are deletions and 62 are insertions. The small polymorphic deletion/insertion bias (PDB) of 1.18 is similar to those previously reported in Drosophila introns (Comeron and Kreitman 2000).
In the D. melanogaster lineage, indel fixations have occurred in 70 introns from 51 genes; in the D. simulans lineage, indel fixations have occurred in 52 introns from 41 genes. The relative numbers of deletion to insertion fixations in the D. melanogaster (0.76) and D. simulans (2.68) lineages are significantly different (G = 27.76, P < 10⁻⁶; table 1). I used Tajima's (1993) relative rates test to evaluate the neutral expectation that similar numbers of deletions and insertions, respectively, have been fixed in the 2 lineages. Although the 2 species have fixed similar numbers of deletions (χ² = 2.25, P = 0.134; table 1), D. melanogaster has fixed significantly more insertions than D. simulans (χ² = 31.84, P < 10⁻⁷; table 1). This finding suggests that either 1) the indel mutational spectrum differs between species, with more insertions arising in the D. melanogaster lineage, or 2) different population genetic forces act on insertions and deletions in the 2 lineages, with insertions having higher probabilities of fixation in the D. melanogaster lineage than in the D. simulans lineage.
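The relative rates statistic is simple enough to check by hand: for counts m1 and m2 of lineage-specific changes, it is (m1 − m2)² / (m1 + m2), compared with a chi-square distribution with 1 degree of freedom. The sketch below assumes the D. melanogaster counts of 79 deletions and 104 insertions given later in the text. The D. simulans counts of 99 deletions and 37 insertions are not stated in the text; they are back-calculated here from the 136 fixations and the deletion to insertion ratio of 2.68, so treat them as an assumption to be checked against table 1. The function name is hypothetical.

```python
import math
from typing import Tuple


def relative_rate_test(m1: int, m2: int) -> Tuple[float, float]:
    """Chi-square statistic (1 df) and P value for two lineage-specific counts."""
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of a chi-square with 1 df, via the complementary error function.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Deletions: 79 in D. melanogaster versus 99 (inferred) in D. simulans.
chi2_del, p_del = relative_rate_test(79, 99)
# Insertions: 104 in D. melanogaster versus 37 (inferred) in D. simulans.
chi2_ins, p_ins = relative_rate_test(104, 37)
```

With these inputs the statistics round to 2.25 for deletions and 31.84 for insertions, which agrees with the values reported above and lends some support to the inferred counts.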
Nonneutral Indel Evolution
The above findings suggest that insertions and deletions may evolve differently. To compare their evolution against a neutral (or nearly neutral) standard, I performed MK tests (McDonald and Kreitman 1991) contrasting the evolution of insertion and deletion mutations with silent (synonymous + intron) point mutations in D. melanogaster. The ratio of deletion to silent fixations in D. melanogaster does not differ from that for polymorphisms (G = 0.03, P = 0.860; table 1). In contrast, the ratio of insertion to silent fixations is significantly greater than that for polymorphisms (G = 8.25, P = 0.004; table 1). I performed similar contrasts using only synonymous changes as a rough neutral standard because there is increasing evidence that many intron sites are constrained (Andolfatto 2005; Marais et al. 2005) and because there is little evidence for current selection on preferred synonymous codons in the D. melanogaster lineage (Akashi 1995; Akashi 1996; McVean and Vieira 2001). As above, the ratio of deletions to synonymous mutations fixed does not differ from that for polymorphisms (G = 0.09, P = 0.765; table 1), but the ratio of insertions to synonymous mutations fixed is significantly greater than that for polymorphisms (G = 8.53, P = 0.004; table 1). Similar results hold in contrasts involving deletions and insertions, respectively, with noncoding point mutations in introns (deletions: G = 0.003, P = 0.960; insertions: G = 7.14, P = 0.008; table 1). Insertions in D. melanogaster introns thus appear to enjoy greater than neutral probabilities of fixation as if favored by some directional force, such as natural selection or biased gene conversion (Nagylaki 1983).
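The MK-style contrasts above are G-tests of independence on 2 × 2 tables of polymorphic versus fixed counts. A minimal sketch of that statistic follows, assuming no continuity or Williams correction (the text does not say whether one was applied). Because the silent-site counts are in table 1 and not in the text, the usage example instead contrasts insertions with deletions, using the counts that are stated (62 and 73 polymorphic, 104 and 79 fixed); that contrast is an illustration only and is not one of the tests reported here. The function name is hypothetical.

```python
import math
from typing import Tuple


def g_test_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """G statistic and P value (1 df) for the 2 x 2 table [[a, b], [c, d]]."""
    observed = ((a, b), (c, d))
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    total = a + b + c + d
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:                   # empty cells contribute nothing
                g += 2.0 * o * math.log(o * total / (rows[i] * cols[j]))
    # Upper tail of a chi-square with 1 df.
    p = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p


# Rows: insertions, deletions. Columns: polymorphic, fixed.
g_stat, p_value = g_test_2x2(62, 104, 73, 79)
```

The same function applies to any of the insertion versus silent or deletion versus silent contrasts once the corresponding counts from table 1 are supplied.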
Indel Evolution at X-linked versus Autosomal Introns
The findings above show that the ratio of deletions to insertions fixed in D. melanogaster is lower than that in D. simulans (table 1). This species difference holds for the 99 autosomal introns in the data set (G = 6.53, P = 0.011; table 2) but is especially strong for the 49 X-linked introns (G = 25.61, P = 7 × 10⁻⁷). Indeed, the ratio of deletions to insertions fixed in D. melanogaster is significantly lower for X-linked introns than autosomal ones (G = 8.42, P = 0.004; table 2). There is, however, no such X-autosome difference for indels fixed in the D. simulans lineage (G = 0.09, P = 0.761; table 2). These observations show that, in D. melanogaster, factors affecting either the mutational process or the substitution process (or both) differ between the X and autosomes.
Importantly, the PDB in D. melanogaster does not differ between the X and the autosomes (G = 1.22, P = 0.269; table 2), suggesting that the mutational spectra on the X and the autosomes are not different. Instead, the X and autosomes appear to differ in the probabilities of fixation of either deletions or insertions. There is no significant species difference in the numbers of deletions fixed at X-linked (χ² = 3.66, P = 0.056; table 2) or autosomal introns (χ² = 0.15, P = 0.700; table 2). However, D. melanogaster and D. simulans show strong differences in the numbers of insertions fixed at X-linked introns (χ² = 25.33, P = 5 × 10⁻⁷; table 2) and to a lesser but still significant extent at autosomal ones (χ² = 8.47, P = 0.004; table 2). In the D. melanogaster lineage, MK tests of insertions versus synonymous mutations show that the fixation probabilities of insertions are significantly elevated for X-linked introns (G = 8.32, P = 0.004; table 2) but not for autosomal ones (G = 1.35, P = 0.245; table 2). MK tests of deletions versus synonymous mutations reveal no evidence of selection on deletions at either X-linked (G = 0.001, P = 0.999; table 2) or autosomal introns (G = 0.14, P = 0.707; table 2). Thus, the elevated probabilities of fixation for insertions in D. melanogaster appear to be particularly strong for X-linked introns.
Indel Frequencies in African D. melanogaster Populations
To study the population frequencies of indels in D. melanogaster, I limited the analyses to samples obtained from African populations. This should minimize some of the confounding effects of the recent demographic expansion and heterogeneous sampling schemes of non-African samples (Andolfatto and Przeworski 2001; Glinka et al. 2003; Thornton and Andolfatto 2006). African data were available for 83 introns from 33 genes with at least 6 chromosomes sampled from the population. Because there are too few indel mutations per locus to perform meaningful locus-by-locus analysis, I pooled mutations across loci. Before pooling, all frequencies were rescaled to a common sample size of n = 6 (see Materials and Methods). Table 3 shows the frequencies of 5 classes of mutation. There is significant heterogeneity in frequency among classes (Kruskal-Wallis H = 18.02, P = 0.001), with nonsynonymous mutations segregating at the lowest frequencies, followed by deletion, noncoding, and synonymous mutations (table 3). Insertions segregate at significantly higher frequencies than all 4 other classes of mutation (table 3). Two previous surveys of indel frequencies in D. melanogaster also found higher population frequencies for insertions than deletions (Comeron and Kreitman 2000; Ometto et al. 2005).
Comparing the frequency spectra of the different classes of mutation yields similar results. Tajima's DTaj and Fu and Li's DFu for deletions are both negative, suggesting an excess of deletions segregating at low frequency (fig. 2 and table 3). In contrast, DTaj and DFu for insertions are both positive, suggesting an excess of insertions segregating at intermediate frequency (fig. 2 and table 3). Although none of the test statistics deviate significantly from the standard neutral model (P > 0.05), these tests have little power given small samples like those used here (Simonsen et al. 1995).
Given that there is a significant X-effect on the fixation probabilities of insertions, I performed separate frequency analyses for X-linked and autosomal mutations (table 4). X-linked mutations show significant heterogeneity in frequency among the 5 classes of mutation (Kruskal-Wallis H = 10.589, P = 0.031), with insertions segregating at significantly higher frequencies than deletion, nonsynonymous, noncoding, and synonymous mutations (Mann-Whitney PMW ≤ 0.015 for all contrasts). None of the other classes of mutation differed significantly from one another (PMW ≥ 0.104 for all contrasts). These X chromosome findings hold in comparisons of the frequency spectra using heterogeneity tests: insertions have significantly higher DTaj and DFu values than all other classes of mutation (P ≤ 0.046 for all contrasts), except one (PTaj = 0.068 for the contrast involving insertions and noncoding mutations). Autosomal mutations also show significant heterogeneity in frequency among the 5 mutational classes (Kruskal-Wallis H = 19.867, P = 0.0005), but the causes differ from those on the X. The only significant differences in frequency among autosomal mutations involve nonsynonymous changes: these segregate at a lower mean frequency than insertions (PMW = 0.003), synonymous mutations (PMW < 0.0001), and noncoding mutations (PMW = 0.0004). Comparing the full frequency spectra of insertions and deletions with the other classes of mutation reveals only one significant contrast: insertions and nonsynonymous mutations have significantly different frequency spectra (PTaj = 0.002 and PFu = 0.006).
The elevated frequencies found for segregating insertions are consistent with their elevated probabilities of fixation in the D. melanogaster lineage (see above). These data thus further suggest that insertions are currently favored in D. melanogaster, particularly on the X chromosome.
Indel Lengths
Only nonoverlapping, noncontiguous indel events were used to study indel lengths (see Materials and Methods). Figure 3 shows the distribution of lengths for polymorphic and fixed indels. The distributions are similar to those seen in previous studies of intronic indels (Comeron and Kreitman 2000; Bergman and Kreitman 2001; Parsch 2003; Ometto et al. 2005), with 89.8% of deletions and 92% of insertions being ≤10 bp long. None of the indels corresponds to new transposon insertions or excisions. Table 5 compares the lengths of insertions and deletions. For polymorphisms in D. melanogaster, the sizes of insertions and deletions do not differ (table 5). However, insertion fixations in D. melanogaster are significantly shorter than deletion fixations (table 5). In D. simulans, fixed insertions are also shorter than fixed deletions, although not significantly so (table 5). There are no differences in the lengths of indels fixed in D. melanogaster and D. simulans (insertions: PMW = 0.739; deletions: PMW = 0.132). There are no X-autosome differences in the lengths of polymorphic or fixed indels (PMW ≥ 0.288 for all contrasts).
Intron Size Evolution
For the 148 intron sequences studied here, those in D. simulans are significantly shorter than their homologous sequences in D. melanogaster (Wilcoxon signed rank test, P = 0.005; Comeron and Kreitman 2000).
I tested whether total intron length evolution is at equilibrium by combining information on the numbers and sizes of indels fixed in the D. melanogaster and D. simulans lineages across introns. (There are too few indel events per intron to perform a meaningful intron-by-intron analysis.) A lineage at equilibrium should show little or no net change in intron length, with the loss of intron sequence by deletions being balanced by the addition of intron sequence by insertions. In the rough calculations that follow, I assume that the lengths of nonoverlapping, noncontiguous indels are representative of all indels. Overall, the history of indel substitutions appears to conform to the equilibrium expectation in D. melanogaster, but not in D. simulans. In D. melanogaster, 79 deletions and 104 insertions have been fixed, eliminating (on average) 5.2 bp and adding 4.0 bp, respectively, thereby causing a net gain of ~6 bp of intron sequence. In D. simulans, the analogous calculation reveals a net loss of ~280 bp of intron sequence. However, these naïve calculations ignore the X-autosome differences in indel evolution in D. melanogaster. (There is no such X-autosome difference in D. simulans; see above.) On the X chromosome, deletions and insertions are 6.3 bp and 3.4 bp, respectively; on the autosomes, deletions and insertions are 4.5 bp and 4.8 bp, respectively. Repeating the above calculations separately for X-linked and autosomal introns shows that X-linked introns have gained ~32 bp and autosomal ones have lost ~13 bp.
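The rough equilibrium calculation for D. melanogaster can be reproduced from the numbers given in the text. The sketch below is only that arithmetic; the function name is hypothetical, and the mean lengths are the rounded values quoted above, so the result can differ slightly from a calculation on unrounded means.

```python
def net_length_change(n_insertions: int, mean_insertion_bp: float,
                      n_deletions: int, mean_deletion_bp: float) -> float:
    """Net change in total intron length (bp): sequence added minus sequence lost."""
    return n_insertions * mean_insertion_bp - n_deletions * mean_deletion_bp


# D. melanogaster lineage: 104 insertions of 4.0 bp and 79 deletions of 5.2 bp.
mel_change = net_length_change(104, 4.0, 79, 5.2)
```

This gives a net gain of about 5 bp (416 bp added against roughly 411 bp removed), in line with the small net gain reported for D. melanogaster. The X-linked and autosomal calculations would need the per-chromosome counts from table 2, which are not given in the text.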
In the lineages connecting D. yakuba to the common ancestor of D. melanogaster and D. simulans, 540 indels were fixed in 117 introns from 57 genes. Although these indels cannot be classified as deletions or insertions without an appropriate outgroup sequence, they can be classified as simply longer or shorter in D. yakuba than in D. melanogaster and D. simulans. Indels leaving D. yakuba with longer sequences (299) than D. melanogaster and D. simulans have been fixed significantly more often than those leaving D. yakuba with shorter sequences (241; χ² = 6.23, P = 0.013). Thus, D. yakuba has either fixed more insertions or fewer deletions than the D. melanogaster-D. simulans common ancestor. Indel events leaving D. yakuba with longer sequences (mean: 10.6 ± 2.10 bp; n = 280) do not differ in length from those leaving D. yakuba with shorter sequences (mean: 6.8 ± 0.96 bp; n = 194; PMW = 0.375; lengths estimated from nonoverlapping, noncontiguous indels). Combining information on the number and mean size of scoreable indel fixations shows that D. yakuba possesses ~521 bp of excess intron sequence relative to the inferred ancestor of D. melanogaster and D. simulans.
Noncoding DNA Sizes in the D. melanogaster Genome
If the indel fixation profile of D. melanogaster is representative of its deeper evolutionary history, then X-linked introns might be expected to be longer than autosomal ones. To test this possibility, I compared the intron lengths of all introns in Release 4.2.1 of the D. melanogaster genome (for loci with alternative transcripts, one was arbitrarily chosen for the analysis). As Figure 4 shows, X-linked introns (mean = 1,005 ± 51; median = 83; n = 5,950) are significantly longer than autosomal ones (mean = 844 ± 18; median = 70; n = 33,479; PMW < 2.2 × 10⁻¹⁶). To test whether the X-autosome difference in intron length extends to other noncoding DNA, I also compared the lengths of intergenic regions between the X and the autosomes of D. melanogaster. X-linked intergenic regions (mean = 5,779 ± 288; median = 1,169; n = 2,003) are significantly longer than autosomal ones (mean = 4,804 ± 111; median = 820; n = 9,946; PMW < 2.8 × 10⁻¹¹), and there is a good correlation among chromosome arms between the lengths of introns and intergenic regions (fig. 5). Interestingly, intron and intergenic region lengths also appear to covary among autosomal arms. These observations suggest that similar chromosome arm-specific forces influence the evolution of length variation in introns and intergenic regions (Comeron and Kreitman 2000; Ometto et al. 2005).
| Discussion |
|---|
Evolutionary geneticists have uncovered many differences between the genomes of D. melanogaster and D. simulans that suggest that D. simulans maintains a tidier genome. The D. simulans genome, for instance, features fewer transposable elements (Dowsett and Young 1982).
But at least for intron lengths, this effective size hypothesis cannot explain the data. The species difference in intron lengths is not attributable to differences in indel size (table 5) or to differences in the rate at which deletions are fixed (table 1) but is instead caused by differences in the rate at which insertions are fixed: D. melanogaster has fixed more insertions than D. simulans. Although small insertions might increase the cost, or slow the rate, of transcription (Castillo-Davis et al. 2002; Urrutia and Hurst 2003), there is no indication that such putatively deleterious insertions are invisible to natural selection in D. melanogaster. Rather, small insertions, particularly those on the X chromosome, appear moderately favorable in D. melanogaster. (It is important to distinguish this finding from previous ones that, using restriction map variability in D. melanogaster, detected evidence for selection against insertions [Golding et al. 1986; Aquadro et al. 1988; Tajima 1989]; these earlier studies, having limited indel size resolution [Aquadro et al. 1988], detected only large insertion events probably associated with transposable elements.) Similar evidence for erratic, species-specific shifts in indel fixations in noncoding regions has been noted before: Kreitman and Ludwig (1996) found striking evidence for lineage-specific nonneutral evolution of small indels in the stripe 2 enhancer of even skipped, reminiscent of the patterns seen here for introns.
Indel Evolution in D. melanogaster Introns
Two lines of evidence indicate that insertions are favored in the D. melanogaster lineage: 1) MK tests reveal greater than neutral probabilities of fixation for insertions and 2) polymorphic insertion frequencies are significantly elevated relative to other classes of mutation, including putatively neutral ones. The latter finding is in good agreement with those of Ometto et al. (2005), despite our use of different data sets. The elevated population frequencies of insertion mutations suggest that whatever forces drove the excess of insertion fixations in D. melanogaster's deeper past have continued into its more recent past.
For reasons that are not clear, the evidence that insertions are favored in D. melanogaster introns is particularly strong on the X chromosome. In principle, the X could differ from autosomes in mutational spectra, the efficacy of selection, and/or rates and biases in gene conversion/double-stranded break repair (Singh et al. 2005). The PDB provides some insight into the relative rates of deletion and insertion mutations: although the PDBs for X-linked (1.371) and autosomal introns (0.926) do not differ significantly, a mutation profile difference cannot be excluded from these limited data. However, even if there is a difference in indel mutation profiles, it cannot explain the significantly different ratios of deletions to insertions fixed at X-linked (0.466) and autosomal (1.130) introns. Therefore, some force appears to have enhanced the fixation probabilities of insertions on the X chromosome. If that force is natural selection, then the pattern could imply that the beneficial fitness effects of newly arising insertions are on average recessive (Charlesworth et al. 1987).
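To make the fixed deletion-to-insertion ratios easier to compare, they can be re-expressed as the share of indel fixations that are insertions. The short sketch below does only that arithmetic: a ratio D/I corresponds to an insertion fraction I/(D + I) = 1/(1 + D/I). The helper name is hypothetical; the two ratios are those given in the text.

```python
def insertion_fraction(del_to_ins_ratio):
    """Convert a deletion:insertion ratio D/I into the fraction I / (D + I)."""
    return 1.0 / (1.0 + del_to_ins_ratio)


# Ratios of deletions to insertions fixed, as reported in the text.
x_frac = insertion_fraction(0.466)   # X-linked introns
a_frac = insertion_fraction(1.130)   # autosomal introns

print(f"X-linked: {x_frac:.3f}, autosomal: {a_frac:.3f}")
```

On this scale, roughly 68% of indel fixations at X-linked introns are insertions, compared with roughly 47% at autosomal introns.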
Biased gene conversion is another force that may favor insertions, leaving patterns indistinguishable from those of weak selection (Nagylaki 1983; Marais 2003; Webster and Smith 2004; Galtier et al. 2006). In D. melanogaster, only one experimental study (as far as I am aware) has tested for indel-associated biases in gene conversion. By monitoring the repair of transposon-induced double-strand breaks (DSBs) in exon 6 of the white gene, Johnson-Schlitz and Engels found that sequences with insertions had substantially higher efficiency as templates for DSB repair than those with deletions and, remarkably, those with wild-type sequences: conversion frequencies for insertion, deletion, and wild-type templates were 40.3–52.7%, 6.3–8.5%, and 18.6%, respectively (Johnson-Schlitz and Engels 1993). In their experiments, then, insertions enjoyed a transmission advantage. Whether their results reflect properties general to DSB repair or specific to repair of transposon-induced breaks is not known. Nevertheless, this experimental work shows the potential for biased repair to act as a directional force influencing the fates of indel mutations and is consistent with the population genetic results presented above.
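Why biased conversion mimics weak selection can be seen in a simple deterministic sketch. It assumes random mating, an infinite population, and a hypothetical bias parameter b such that heterozygotes transmit the insertion-bearing allele with probability (1 + b)/2; the allele frequency then changes by b p(1 - p) per generation, which is close to the change under genic selection with coefficient s = b. The parameter values are illustrative and are not estimated from the experiments cited above.

```python
def next_freq_biased_conversion(p, b):
    """One generation of biased conversion favoring the insertion allele.

    Heterozygotes (frequency 2p(1-p)) transmit the insertion with
    probability (1 + b) / 2, so p' = p + b * p * (1 - p).
    """
    return p * p + 2.0 * p * (1.0 - p) * (1.0 + b) / 2.0


def next_freq_genic_selection(p, s):
    """One generation of genic selection with relative fitness 1 + s."""
    return p * (1.0 + s) / (1.0 + s * p)


def trajectory(step, p0, strength, generations):
    """Iterate a one-generation update and return the list of frequencies."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(step(freqs[-1], strength))
    return freqs


# Hypothetical bias and selection coefficient of equal size.
bgc = trajectory(next_freq_biased_conversion, 0.01, 0.02, 500)
sel = trajectory(next_freq_genic_selection, 0.01, 0.02, 500)
```

The two trajectories rise along nearly the same path, which is why frequency spectra and fixation counts alone cannot separate the two forces.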
Why might insertions on the D. melanogaster X chromosome enjoy increased probabilities of fixation compared with those on the autosomes or those in D. simulans? The many autosomal inversions segregating at appreciable frequencies in African populations of D. melanogaster, but not D. simulans (Lemeunier and Aulard 1992), could contribute to this species- and chromosome-specific pattern in 2 ways. First, in inversion heterozygotes, rates of crossing over are suppressed within inverted regions; second, the suppression of crossing over on the autosomes by inversions increases rates of crossing over on the X, the so-called interchromosomal effect (Lucchesi 1976). Together these effects lead to a relatively elevated rate of crossing over on the X chromosome (Begun 1996; Andolfatto 2001). If small insertions are for some reason weakly favorable in the D. melanogaster lineage, then the elevated rates of crossing over on the X chromosome could increase the efficacy of natural selection on the X. (For other causes of X/autosome differences in the efficacy of selection, see Charlesworth 2001.) Alternatively, if small insertions are weakly favored by a biased gene conversion/gap repair process whose rate is correlated with crossing over, then the elevated rates of crossing over on the X chromosome could enhance transmission of small insertions on the X. The latter scenario is appealing in that it could simultaneously explain the chromosome- and the lineage-specific effects.
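The notion of "efficacy of selection" used above can be illustrated with the standard diffusion approximation for the fixation probability of a new semidominant mutation. This is a textbook result rather than an analysis performed in this study: relative to a neutral mutation, a mutation with scaled selection coefficient S = 4Nes fixes with probability approximately S/(1 - e^(-S)). The function name and the S values in the usage note are hypothetical.

```python
from math import expm1


def relative_fixation_probability(scaled_s):
    """Fixation probability relative to neutrality, S / (1 - exp(-S)).

    scaled_s is S = 4 * Ne * s for a new semidominant mutation; the
    expression is the usual diffusion approximation for small s.
    """
    if abs(scaled_s) < 1e-12:
        return 1.0  # neutral limit
    return scaled_s / -expm1(-scaled_s)


# The same weak advantage s is more visible to selection where the local
# effective size (and hence S) is larger.
for S in (0.1, 1.0, 2.0):
    print(S, round(relative_fixation_probability(S), 3))
```

Under this approximation, a mutation that is nearly neutral where S is small fixes at well over twice the neutral rate once S reaches 2, so any factor that raises the effective strength of selection on the X, such as elevated crossing over, would enrich fixations of weakly favored insertions there.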
Testing Models of Intron Evolution
Carvalho and Clark (1999) and Comeron and Kreitman (2000) both document a negative correlation between intron length and recombination rate in the D. melanogaster genome but arrive at different interpretations for the pattern. Note that recombination rate does not appear to influence the indel mutation profile, as the deletion bias of indel polymorphisms does not differ among low (1.44), medium (1.47), and high (0.97) recombination environments (see also Comeron and Kreitman 2000; Blumenstiel et al. 2002). Under the first model (Carvalho and Clark 1999), insertions are weakly deleterious and hence more likely to go to fixation in regions of low recombination; however, the findings above show that insertions in D. melanogaster behave as moderately favorable, not deleterious. Under the second model (Comeron and Kreitman 2000), insertions are selectively favored as enhancers of recombination, particularly in low recombination regions of the genome; however, insertions in D. melanogaster appear to be favored in high, not low, recombination regions: MK tests for indels versus synonymous mutations show a significant excess of insertion fixations in high, and nearly so in medium, recombination regions (table 6). The elevated probabilities of fixation for insertions in higher recombination environments suggest that either: 1) insertions are specifically favored in these regions but not in low recombination regions; 2) insertions are weakly favored throughout the genome but effectively neutral in low recombination regions; or 3) gene conversion is correlated with rates of crossing over (Marais 2003; but see Langley et al. 2000; Andolfatto and Wall 2003), and gap repair proceeds more efficiently from insertion-bearing templates (Johnson-Schlitz and Engels 1993). Unfortunately, very little experimental information exists concerning the rates and biases of gene conversion processes as a function of rates of crossing over in the Drosophila genome. Indeed, recent findings suggest that, at least in Drosophila, rates of gene conversion are not correlated with rates of crossing over in the expected way (Langley et al. 2000; Andolfatto and Wall 2003).
Under the third model (Parsch 2003
| Summary and Conclusions |
|---|
|
|
|---|
There is a remarkable convergence among several disparate observations in D. melanogaster: the population genetic data show that X-linked insertions are favored (they segregate at elevated frequencies and enjoy elevated probabilities of fixation); experimental data show that DSB repair at an X-linked locus is biased in favor of insertions (Johnson-Schlitz and Engels 1993
| Supplementary Material |
|---|
|
|
|---|
Supplementary Table 1 is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
|
|
|---|
I thank Andrea Betancourt, Victoria Cattani, Andy Clark, Josep Comeron, John McDonald, Allen Orr, Lino Ometto, John Parsch, Todd Schlenke, Wolfgang Stephan, Kevin Thornton, Shanwu Tang, Marcy Uyenoyama, and 3 anonymous reviewers for helpful discussions and/or suggestions on the manuscript. I also thank the Genome Sequencing Center at Washington University School of Medicine for prepublication access to the D. yakuba and D. simulans genome sequence data. This work was supported by funds from a Ruth L. Kirschstein National Research Service Award Postdoctoral Fellowship from the National Institutes of Health while at Cornell University.
| Footnotes |
|---|
1 Present address: Department of Biology, University of Rochester
John H. McDonald, Associate Editor
| References |
|---|
|
|
|---|
Adams MD, Celniker SE, Holt RA, et al. (192 co-authors). (2000) The genome sequence of Drosophila melanogaster. Science 287:2185–95.
Akashi H. (1994) Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136:927–35.
Akashi H. (1995) Inferring weak selection from patterns of polymorphism and divergence at "silent" sites in Drosophila DNA. Genetics 139:1067–76.
Akashi H. (1996) Molecular evolution between Drosophila melanogaster and D. simulans: reduced codon bias, faster rates of amino acid substitution, and larger proteins in D. melanogaster. Genetics 144:1297–307.
Akashi H. (1999) Inferring the fitness effects of DNA mutations from polymorphism and divergence data: statistical power to detect directional selection under stationarity and free recombination. Genetics 151:221–38.
Amos W, Hutter CM, Schug MD, Aquadro CF. (2003) Directional evolution of size coupled with ascertainment bias for variation in Drosophila microsatellites. Mol Biol Evol 20:660–2.
Andolfatto P. (2001) Contrasting patterns of X-linked and autosomal nucleotide variation in Drosophila melanogaster and Drosophila simulans. Mol Biol Evol 18:279–90.
Andolfatto P. (2005) Adaptive evolution of non-coding DNA in Drosophila. Nature 437:1149–52.
Andolfatto P and Przeworski M. (2001) Regions of lower crossing over harbor more rare variants in African populations of Drosophila melanogaster. Genetics 158:657–65.
Andolfatto P and Wall JD. (2003) Linkage disequilibrium patterns across a recombination gradient in African Drosophila melanogaster. Genetics 165:1289–305.
Aquadro CF, Lado KM, Noon WA. (1988) The rosy region of Drosophila melanogaster and Drosophila simulans. I. Contrasting levels of naturally occurring DNA restriction map variation and divergence. Genetics 119:875–88.
Aulard S, Monti L, Chaminade N, Lemeunier F. (2004) Mitotic and polytene chromosomes: comparisons between Drosophila melanogaster and Drosophila simulans. Genetica 120:137–50.
Begun DJ. (1996) Population genetics of silent and replacement variation in Drosophila simulans and D. melanogaster: X/autosome differences? Mol Biol Evol 13:1405–7.
Bergman C and Kreitman M. (2001) Analysis of conserved noncoding DNA in Drosophila reveals similar constraints in intergenic and intronic sequences. Genome Res 11:1335–45.
Betancourt AJ and Presgraves DC. (2002) Linkage limits the power of natural selection in Drosophila. Proc Natl Acad Sci USA 99:13616–20.
Birky CW and Walsh JB. (1988) Effects of linkage on rates of molecular evolution. Proc Natl Acad Sci USA 85:6414–8.
Blumenstiel JP, Hartl DL, Lozovsky ER. (2002) Patterns of insertion and deletion in contrasting chromatin domains. Mol Biol Evol 19:2211–25.
Boulesteix M, Weiss M, Biemont C. (2005) Differences in genome size between closely related species: the Drosophila melanogaster species subgroup. Mol Biol Evol 23:162–7.
Cardazzo B, Bargelloni L, Toffolatti L, Patarnello T. (2003) Intervening sequences in paralogous genes: a comparative genomic approach to study the evolution of X chromosome introns. Mol Biol Evol 20:2034–41.
Carvalho AB and Clark AG. (1999) Intron size and natural selection. Nature 401:344.
Castillo-Davis CI, Mekhedov SL, Hartl DL, Koonin EV, Kondrashov FA. (2002) Selection for short introns in highly expressed genes. Nat Genet 31:415–8.
Charlesworth B. (2001) The effect of life-history and mode of inheritan




