MBE Advance Access originally published online on July 17, 2008
Molecular Biology and Evolution 2008 25(9):2043-2053; doi:10.1093/molbev/msn155
Research Articles
Evidence of Adaptive Evolution of Accessory Gland Proteins in Closely Related Species of the Drosophila repleta Group


* Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY
Department of Biology, New York University
E-mail: falmeida{at}amnh.org.
Abstract
Accessory gland proteins (Acps) are part of the seminal fluid of Drosophila species. These proteins have important reproductive functions, being responsible for the proper functioning of several steps of the fertilization process. Acps also contribute indirectly to the reproductive success of males by modulating female behavior. Evidence that Acps participate in sperm competition and sexual conflict includes findings that, on average, Acps have fast evolutionary rates, suggestive of adaptive evolution. This is especially true in species of the Drosophila repleta group. Nevertheless, robust statistical tests have only rarely been used to determine whether observed evolutionary rates are in fact due to positive selection on amino acid substitutions between related species. Here we apply maximum likelihood tests for positive selection on 14 Acps of the D. repleta group. To increase statistical robustness, we use at least 8 sequences, all belonging to species of the Drosophila mulleri complex, for each gene analyzed. We found significant evidence of adaptive evolution for 10 of the tested genes. Among these, the ones with a conserved protein domain had positively selected sites within the functional region of the sequence. We also detected one instance of lineage-specific adaptive evolution in a clade formed by 2 sister species.
Key Words: Acp, Drosophila repleta group, adaptive evolution
Introduction
Accessory gland proteins (Acps) form part of the seminal fluid of Drosophila and other dipterans. During insemination, Acps are transferred into the female reproductive tract, where they perform several functions that are essential for successful fertilization (for a review of Acp function, see Wolfner 2002
Interestingly, the findings that suggest that Acps are under sexual selection were met by evidence that many Acp genes have patterns of nucleotide variation compatible with evolution by positive selection. In fact, the phenomenon of rapid, adaptive evolution in reproductive molecules has been observed in a wide variety of animal taxa (Swanson and Vacquier 2002; Panhuis et al. 2006). In Drosophila, some of the Acps showing signs of adaptive evolution are the same that had been linked to functions involved in sperm competition and sexual conflict (Aguadé et al. 1992; Aguadé 1999; Begun et al. 2000; Swanson, Clark et al. 2001; Mueller et al. 2005; Wagstaff and Begun 2005; Schully and Hellberg 2006). The implication that coevolution by sexual conflict drives the adaptive evolution of Acps has been reinforced by evidence that some female reproductive tract proteins are under positive selection (Swanson et al. 2004; Kelleher et al. 2007). Similar evidence was observed in different Drosophila species groups, such as melanogaster, pseudoobscura, and repleta.
The main evidence used to suggest adaptive evolution in Acps is a high nonsynonymous substitution rate (dN) as compared with the synonymous substitution rate (dS). High dN/dS ratios (>0.5) were observed in most Acps described so far (Swanson, Clark et al. 2001; Wagstaff and Begun 2005; Haerty et al. 2007). Nevertheless, although a high dN/dS suggests that the gene in question may be under positive selection, it is simply an estimate, not a statistical test. In addition, relatively high rates of nonsynonymous substitutions (dN/dS ≈ 1) can also be due to a neutral selection regime. To better assess the evolutionary patterns of Acps, different statistical tests have been employed to determine whether Acp evolutionary rates depart from neutral expectations. Most Acps with high dN/dS tested so far, however, fail to show sequence variation patterns consistent with positive selection in neutrality tests based on population polymorphism data (Kern et al. 2004). Another way of testing for positive selection is to compare the relative number of nonsynonymous substitutions in fixed and polymorphic substitutions when population data are available for 2 closely related species, using the McDonald–Kreitman test. This test, with a few exceptions, also failed to provide support for positive selection for a number of Acps with high dN/dS estimates (Begun et al. 2000; Kern et al. 2004; Wagstaff and Begun 2005).
Here we apply a likelihood approach to test for positive selection in several Acp loci identified in an accessory gland cDNA library of Drosophila mayaguana (FC Almeida and R DeSalle, in review). The likelihood tests for positive selection rely on site-by-site estimates of dN/dS (Yang 1998). Because selection pressure can be heterogeneous across codons of a gene, likelihood tests have increased power to detect adaptive evolution. This is especially true when most codons of a gene are under purifying selection and only a few sites are under positive selection. However, the accuracy of these tests depends on assembling a large enough number of sequences for comparison. The power to detect sites under positive selection decreases considerably if the data set consists of 6 or fewer sequences (Anisimova et al. 2001).
Drosophila mayaguana is part of the Drosophila repleta group, whose Acps show extraordinarily high dN/dS ratios (Wagstaff and Begun 2005, 2007; Almeida and DeSalle, in review). In this group, McDonald–Kreitman (MK) tests showed significant results suggestive of positive selection for 3 Acp loci (Wagstaff and Begun 2005). We have taken advantage of the intermediate phylogenetic relationship between D. mayaguana and Drosophila mojavensis and the availability of a genome sequence for the latter species to design primers that are likely conserved across most species of the Drosophila mulleri complex, to which both species belong. This advance made possible the first tests for positive selection on Acps using a considerable number of species. We were able to include samples for all the known species of the mulleri and mojavensis clusters. These sister clusters include species that diverged possibly less than 10 MYA (Russo et al. 1995). Such phylogenetic sampling allows for a comprehensive understanding of the evolution of Acps in closely related species and provides clues as to how selection may have influenced species establishment and divergence.
Our approach allowed us to use robust tests for positive selection in 14 Acps of the D. repleta group. We found significant evidence of adaptive evolution in 10 of them. Among these, the 3 Acps with a conserved protein domain had positively selected amino acid sites identified within the functional domains. Three additional genes, for which tests were not applied because too few sequences were available, showed high dN/dS ratios suggestive of positive selection. We also identified one case of lineage-specific accelerated evolution in a protease gene.
Materials and Methods
Fly Samples
Ten species of the D. repleta group, all belonging to the D. mulleri complex (Durando et al. 2000
Loci and DNA Sequencing
Forty-three Acp candidates were selected from an accessory gland cDNA library of D. mayaguana. The selection was based on the existence of good quality sequences longer than 300 bp and a negative dot blot result when probed with female cDNA (Almeida and DeSalle, in review). Six of these loci, however, were not classified as Acp based on the commonly used Acp criteria (Swanson, Clark et al. 2001; Mueller et al. 2004). Drosophila mayaguana cDNA sequences were aligned with their best Blast hit in the D. mojavensis genome, and potentially conserved primer sites were identified. Conserved primers that allow for the amplification of fragments longer than 250 bp were designed from the alignments of 26 loci (supplementary table 2, Supplementary Material online). Polymerase chain reaction (PCR) amplification protocols were optimized for each locus (conditions available upon request). Sequences were obtained using a 3730XL automated sequencer and edited with Sequencher 4.5 (Genecodes, Ann Arbor, MI). For genes classified as Acp in both D. mayaguana (Almeida and DeSalle, in review) and D. mojavensis (Wagstaff and Begun 2005), the species prefix was omitted from the gene name.
Sequence Analyses
Alignments were performed with MAFFT 6.236b (Katoh et al. 2005), and trimming of noncoding regions (introns and 3' untranslated region [UTR]) was done in MacClade 4.08 (Maddison D and Maddison W 2000), using the D. mayaguana cDNA sequence to determine intron positions. Manual codon alignment (gap placement) was performed in MacClade 4.08 on alignments obtained with MAFFT. Presence of a signal peptide in the amplified sequence was checked using the program SignalP 3.0 (Bendtsen et al. 2004). Codon bias was estimated by the effective number of codons (ENC) and the proportion of G and C in the third codon position (G/C 3rd), both obtained with DNAsp (Rozas et al. 2003). Conserved domain alignments were obtained using the CD-search online program and CDD database (Marchler-Bauer and Bryant 2004; Marchler-Bauer et al. 2005). Relative rate tests were done with HyPhy (Pond et al. 2005) under the general reversible model. In these tests, only the species of the mulleri subcluster were included and D. mayaguana was used as outgroup.
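As an illustration of the G/C 3rd statistic described above, the following is a minimal sketch, not the DNAsp implementation. It assumes an in-frame coding sequence that starts at the first codon position; the function name `gc3` and the example sequences are hypothetical.

```python
def gc3(cds):
    """Proportion of G or C among third codon positions.

    Assumes `cds` is in frame (position 0 is a first codon position).
    Third positions that are gaps or ambiguous bases are ignored.
    """
    cds = cds.upper()
    # Third position of every complete codon.
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    called = [base for base in thirds if base in "ACGT"]
    if not called:
        raise ValueError("no unambiguous third codon positions")
    return sum(base in "GC" for base in called) / len(called)


if __name__ == "__main__":
    # Hypothetical 4-codon sequence: ATG GCC AAA TTC
    print(gc3("ATGGCCAAATTC"))
```

A value near 0.5, as reported for the Acps in this study, indicates little compositional skew at third positions.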
Tests for Positive Selection
The role of positive selection in the evolution of the coding region of Acp genes was assessed in cross-species comparisons, and statistical tests were applied when sequences of a minimum of 8 species were available. Tests for positive selection were carried out using the program codeml of the PAML 3.15 package (Yang 1997; Yang and Bielawski 2000; Yang and Nielsen 2002). The program codeml provides a number of nucleotide substitution models, and the likelihoods of these models given the data can be compared using the likelihood ratio test (LRT) with a chi-square distribution (Yang et al. 1998, 2005; Wong et al. 2004). Average dN/dS across codons was obtained for each gene using the maximum likelihood estimates (Yang et al. 1998), assuming homogeneous replacement rates across sites (M0). To assess whether some sites are under positive selection, we used 3 model comparisons: M1a versus M2a, M7 versus M8, and M8a versus M8 (Swanson et al. 2003; Wong et al. 2004; Yang et al. 2005). The first two comparisons set null models, which assume that site dN/dS ratios are distributed between 0 and 1 (M1a and M7), against alternative models, which assume that a few sites fall outside this distribution and have dN/dS > 1 (M2a and M8, respectively). M8a is another null hypothesis for model M8, in which dN/dS is fixed at 1 for the class of sites that has dN/dS > 1 in M8; this comparison is a robust test for positive selection (Swanson et al. 2003). Whereas M1a and M2a assume a discrete distribution of dN/dS classes, M7 and M8 assume a beta distribution. Branch models, which allow for different dN/dS in different branches of the tree, were run using codeml (model = 2, NSsites = 0), with the null hypothesis being that all branches have the same dN/dS (model = 0, NSsites = 0).
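The likelihood ratio test used in these comparisons can be sketched as follows. This is an illustrative calculation, not the authors' script: the log-likelihood values are hypothetical, and the degrees of freedom (2 for M1a versus M2a and M7 versus M8, 1 for M8a versus M8 and for the one-ratio versus two-ratio branch comparison) are an assumption based on the difference in free parameters between the models, which the text does not state.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Twice the log-likelihood difference, floored at zero."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf(x, df):
    """Upper-tail chi-square probability, closed forms for df = 1 or 2."""
    if x < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


def lrt(lnl_null, lnl_alt, df):
    """Return (statistic, P value) for a nested model comparison."""
    stat = lrt_statistic(lnl_null, lnl_alt)
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods for one gene under M7 (null) and M8.
    stat, p = lrt(lnl_null=-2450.3, lnl_alt=-2441.1, df=2)
    print(f"2*deltaLnL = {stat:.2f}, P = {p:.5f}")
```

Treating the M8a versus M8 statistic as chi-square with 1 degree of freedom is a simplification that, if anything, makes that test conservative.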
The phylogenetic relationships among species are very important in the likelihood estimation of these models. Because some genes, especially Acps, may have multiple copies in a genome (paralogs), the species phylogeny may not represent the relationship of the sequences analyzed here. For this reason, we obtained gene trees for each one of the genes analyzed. With minor exceptions, the species relationships recovered were largely congruent among Acps and very similar to what has been obtained using other genes, such as hunchback and 16S (Durando et al. 2000). For the genes with tree topologies discordant from the most supported one (fig. 1), positive selection tests were rerun using the "species tree" (i.e., the most supported topology). All phylogenetic analyses were done using the maximum likelihood algorithm and the general time reversible + I + Γ model as implemented in PAUP* (Swofford 2003), with parameters estimated from the data based on an initial tree obtained by maximum parsimony. Detailed results of the phylogenetic analyses will be described elsewhere (FC Almeida, S-O Kolokotronis and R DeSalle in preparation).
As a comparison, tests for positive selection were also conducted on the gene hunchback, the only nuclear, protein-coding gene for which sequences are available for all the species included here. For the genes with evidence of positive selection (alternative hypothesis with significantly higher likelihood than the null hypothesis), the probability of each site being subject to positive selection was estimated using a Bayes Empirical Bayes (BEB) approach also using codeml (Yang et al. 2005
Results and Discussion
PCR Results and Sequence Characterization
PCR Amplifications
Amplification was successful in at least 8 species for 16 genes (table 1). Two of these genes did not meet the Acp criteria, although they are expressed in the accessory glands of D. mayaguana. Among the remaining genes for which conserved primers were designed (9 loci), some could be amplified in different numbers of species (6 loci, table 1) and 3 of them did not work well even for D. mayaguana and/or D. mojavensis. Almost all genes that could be amplified in D. mayaguana and D. mojavensis were also amplified in their sister species, D. parisiena and D. arizonae, respectively (table 1). Considerably fewer genes could be amplified in D. propachuca (3) and D. spenceri (4), as expected due to their more distant relationships with the other species used in this study (Durando et al. 2000
Drosophila navojoa, among the species analyzed here, had a particularly high number of failures to amplify (7/16), as compared with other species similarly related to D. mayaguana and D. mojavensis (species used for primer design): D. wheeleri (2/16), D. aldrichi (1/16), and D. mulleri (1/16). This result could be related to particularly high evolutionary rates in D. navojoa, resulting in nonconserved primer sites or gene loss. We tested this hypothesis by comparing rates between species with the relative rate test in 9 loci for which D. navojoa sequences were available. The results did not support a general higher evolutionary rate in the lineage of D. navojoa. This species showed a significantly higher substitution rate in only 1 gene, may97, in 4 (out of 7) pairwise comparisons with other species. On the other hand, in Acp25, D. navojoa showed significantly lower rates in all possible comparisons.
Sequences
High-quality sequences were obtained for most PCR products, but some PCR products resulted in double sequences that suggest lineage-specific gene duplications. We cloned the PCR products of D. mulleri for the gene mayAcp74 and sequenced 20 colonies. The sequences revealed that, in fact, 2 sequences, with 82.5% similarity, were being amplified in that species. A phylogenetic analysis of all the sequences obtained for this gene did not support a species-specific duplication in D. mulleri (fig. 2). Instead, it suggests that the duplication occurred before the split between D. mulleri and D. nigrodumosa + D. huaylasi. If in fact the latter 2 species have only one copy of the gene, the results of the phylogenetic analysis imply gene loss in those species.
For one of the non-Acp genes, may83, even though amplification was successful for 12 species (table 1), it was not possible to find a common open reading frame (ORF) for all the species in the alignment. No ORF was found in D. propachuca, and the ORF found in D. mayaguana and D. parisiena was in reverse orientation and did not overlap with the one found in the remaining species. Moreover, the alignment of the ORF found in the largest number of species (9) revealed that D. arizonae and D. nigrodumosa had frameshift mutations a few bases downstream of the start codon, which did not allow for an accurate alignment of codons. The fragments sequenced probably represent pseudogenes or untranslated mRNA. This gene was not included in further analyses.
Most sequences had a high probability of carrying a signal peptide (P > 0.9), indicating that the coding regions analyzed were complete or almost complete at their 5' end (table 2). Most of the sequences were also complete at the 3' end of the coding region, as inferred from the presence of a stop codon. The presence of a signal peptide can also be interpreted as evidence that these genes are Acps in the species other than D. mayaguana, although this could only be confirmed by mRNA analyses.
The average codon bias for Acp genes was low, with ENC = 54.4 (ENC varies from 20 to 61, where 61 is no bias) and G/C 3rd = 0.48 (table 2). The codon bias, as measured by both ENC and G/C 3rd, of the only 2 non-Acp genes analyzed here, may97 and hunchback, was higher than that of the most biased Acp. Little is known about codon bias in the species of the repleta group. The only other gene for which there are data on codon bias for several species of the repleta group, including 4 of the species studied here, is the xanthine dehydrogenase locus (Begun and Whitley 2002
Extensive Positive Selection in Acps across Species Boundaries in the repleta Group
Substitution Rates
Synonymous (dS) and nonsynonymous (dN) substitution rates and the dN/dS ratios for genes with at least 8 sequences are shown in table 2. Overall, across sites, only Acp7 and Acp42 had dN/dS > 1. Another 11 Acps had dN/dS > 0.5, which is relatively high as compared with the average dN/dS of non-Acp genes in Drosophila (Mueller et al. 2005; Wagstaff and Begun 2005). The high dN/dS values observed could not be attributed to extraordinarily low dS because we found a positive correlation between dN and dS (Spearman rank correlation coefficient = 0.63, P = 0.01). A possible explanation for this correlation is related to codon bias; if genes with high dN/dS have low codon bias as previously found (Akashi 1996; Kim 2004), then low pressure for preferred codons could lead to high dS in these loci. In fact, we found a strong negative correlation between dN/dS and codon bias as measured by both ENC and G/C 3rd (Spearman rank correlation coefficient = 0.67, P = 0.006, and –0.75, P = 0.001, respectively; a higher ENC means lower bias, hence the positive sign for ENC).
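The Spearman rank correlation reported above can be computed as in the following minimal sketch, which ranks both variables (averaging tied ranks) and then takes the Pearson correlation of the ranks. The function names and the per-gene values are hypothetical, not data from table 2, and no P value is computed here.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-gene dN and dS estimates.
    dn = [0.05, 0.12, 0.20, 0.09, 0.31]
    ds = [0.10, 0.15, 0.22, 0.18, 0.30]
    print(round(spearman_rho(dn, ds), 3))
```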
Even though we chose not to apply tests for positive selection to genes with fewer than 8 sequences, we calculated the average overall dN/dS for 6 of them. Among these, only mayAcp68b showed dN/dS > 1, but 2 others (mayAcp73 and may82) had dN/dS > 0.7 (supplementary table 2, Supplementary Material online).
Tests for Positive Selection
Table 3 shows the results of the LRTs obtained in comparisons between M7 and M8 and between M8 and M8a conducted to examine positive selection in 17 genes. Results obtained in the comparison between M1a and M2a (data not shown) were in general agreement with the results of the other tests. The LRTs comparing M7 and M8 support the presence of sites under positive selection in 10 Acps with P < 0.01 (P < 0.001 was found in 7 genes). The same conclusion was reached by LRT in comparisons between M8a and M8 (table 3). The results strongly support the notion that positive selection, rather than simply relaxed selection, is a cause of the inflated dN/dS values. The occurrence of positive selection on amino acid substitutions was significant at the P < 0.01 level for all the genes classified as Acps in both D. mayaguana and D. mojavensis. These genes are those that are most likely Acps in the other species of the mulleri cluster. Sites under positive selection were detected in genes with average dN/dS as low as 0.635 (mayAcp58).
Of the 16 genes analyzed, mayAcp57, mayAcp63, mayAcp69a, mayAcp75, may97, and hunchback were the only genes for which the LRTs did not reject the null hypothesis. may97 belongs to a family of endoplasmic reticulum proteins (ERp29) that are highly expressed in secretory cells. Its likely ortholog in Drosophila melanogaster, windbeutel, is a gene involved in dorsal–ventral patterning whose sequence is conserved across distantly related Drosophila species. A low dN indicates that strong purifying selection is acting on this gene (table 2). Three of the 4 Acps analyzed here that had no codon under selection according to the likelihood tests also showed relatively low dN. mayAcp57 is part of a gene family that encodes proteins with a CRISP domain, frequently found among Drosophila Acps (Mueller et al. 2004
mayAcp69a, however, did show a high dN estimate. Nevertheless, dS estimates were also quite high for this gene, and we believe that this might reflect a paralogy problem. The D. mayaguana and D. parisiena sequences were very divergent from the rest as they contained many indels. For D. parisiena, the mayAcp69a sequence was considerably shorter than that of the remaining species analyzed due to a premature stop codon and, for this reason, was not included in the tests for positive selection. Although the gene tree obtained with the sequences of mayAcp69a was congruent with the most accepted relationships among species, the node leading to D. parisiena and D. mayaguana was exceptionally long. One explanation for this result is that we actually amplified a paralog of mayAcp69a in the remaining species. In fact, a paralog of this gene, mayAcp69b, was also found to be expressed in the accessory glands of D. mayaguana (Almeida and DeSalle, in review). The relative rate test showed that the evolutionary rate of D. mayaguana is significantly (P < 0.00001) higher than that of other species in all pairwise comparisons.
Indel Substitutions
The alignments of some genes analyzed here revealed many indels in their coding sequences. In particular, Acp7, Acp42, Acp45, and mayAcp69a had large numbers of indels (table 4). Although, in theory, indels can be under positive selection, tests to demonstrate it are complicated. Indels are ignored in the likelihood models used here and in most other tests available for positive selection. One way to test whether there is positive selection for indels is to compare the rate of indel appearance in a coding region with the same rate in noncoding sequences that are likely to be neutral, weighting for divergence time (Podlaha and Zhang 2003). This test has been used to show that indels are likely under positive selection in Acp26Aa in the Drosophila pseudoobscura group (Schully and Hellberg 2006). This approach, however, is not available for the species studied here because the rate of indels in noncoding sequences is unknown for the repleta group and there is no reliable estimate of divergence time for the species of the mulleri complex. Nevertheless, some of the genes used here have introns and 3' UTR sequences that allow for some rough comparisons. Although introns and 3' UTRs may have regulatory function and therefore cannot be assumed to be neutral (Healy et al. 1996; Rodriguez-Trelles et al. 2002, 2003), these regions are definitely less constrained than coding sequences. Indels in noncoding sequences are naturally less constrained because the survival of a new mutation does not depend on the number of sites involved, whereas in coding sequences, indels have to be in multiples of 3 to survive.
We compared the rates of indel substitutions per base pair between coding and noncoding sequences for 12 genes (table 4). Numbers of indels were calculated by taking into account the phylogeny and disregarding the number of base pairs included in the indel. In all comparisons, the frequency of indels is larger in noncoding sequences, suggesting that indels are mostly under negative selection in coding sequences. This comparison is very conservative, and it is not an appropriate test for selection on indels. Even genes with high dN/dS have regions of the coding sequence under strong purifying selection, which usually contain very few or no indels. To minimize the effect of these highly constrained regions, ratios of indels per base pair in coding sequences were recalculated for the 4 genes with the largest number of indels (Acp7, Acp42, Acp45, and mayAcp69a) using the number of base pairs (sequence length) equivalent to the proportion of codons in either class 1 (dN/dS = 1) or class 2 (dN/dS > 1) (table 5). In this way, gene regions under purifying selection are excluded, making comparisons with noncoding sequences more realistic. With this approach, the rate of indels in the coding sequence is higher than in noncoding sequences in Acp7 and mayAcp69a, and very similar to that of noncoding sequences in Acp42 and Acp45, suggesting that some indels in these loci are likely to be under selection.
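The recalculation described above can be illustrated with a short sketch. It assumes that the effective coding length is the sequence length multiplied by the combined proportion of codons in class 1 and class 2, which is one reading of the description; the function names and all numbers are hypothetical and are not taken from tables 4 or 5.

```python
def indel_rate(n_indels, length_bp):
    """Indel events per base pair."""
    if length_bp <= 0:
        raise ValueError("length must be positive")
    return n_indels / length_bp


def adjusted_coding_rate(n_indels, coding_bp, p_class1, p_class2):
    """Indel rate over the part of the coding sequence not under
    purifying selection.

    p_class1 and p_class2 are the estimated proportions of codons with
    dN/dS = 1 and dN/dS > 1; their sum scales the coding length.
    """
    fraction = p_class1 + p_class2
    if not 0 < fraction <= 1:
        raise ValueError("class proportions must sum to a value in (0, 1]")
    return indel_rate(n_indels, coding_bp * fraction)


if __name__ == "__main__":
    # Hypothetical gene: 6 indels in 600 coding bp, 3 indels in 200 noncoding bp.
    coding = indel_rate(6, 600)
    noncoding = indel_rate(3, 200)
    adjusted = adjusted_coding_rate(6, 600, p_class1=0.3, p_class2=0.2)
    print(coding, noncoding, adjusted)
```

In this made-up case the unadjusted coding rate is below the noncoding rate, while the adjusted rate exceeds it, which is the kind of reversal the text reports for Acp7 and mayAcp69a.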
Evolutionary and Functional Trends in Acps of the D. mulleri Complex
Positive Selection and Functional Categories
Among the 14 Acp genes analyzed in tests for positive selection, only 5 had a conserved protein domain (table 1). This sample is too small to assess whether adaptive evolution in Acps is related to function. Among these 5 Acps, 3 had highly significant results in the LRTs for positive selection: a serine protease (mayAcp74), a thiol reductase (mayAcp58), and a cysteine-rich secreted protein (CRISP, Acp19). Among the 9 Acps with unknown function, 7 showed highly significant results in the tests for positive selection. Acps without a known conserved domain often have hormonal activity and are involved in sperm competition, as has been shown for Acp70A, Acp26Aa, and Acp53Ea of the D. melanogaster group (Chen et al. 1988
The results obtained here suggest that positive selection in Acps is not directly related to function. The 2 genes containing a CRISP domain, Acp19 and mayAcp57, showed very different evolutionary patterns. These genes had similar synonymous substitution rates (dS), but the amino acid substitution rates were considerably different, leading to very different overall dN/dS ratios (table 2). Also, whereas Acp19 had highly significant results in the tests for positive selection, mayAcp57 had no support for the presence of positively selected amino acid substitutions. It is possible that the Acp function of mayAcp57 is restricted to D. mayaguana because we do not have cDNA evidence for the other species. On the other hand, there is no indication that mayAcp57 shows lineage-specific patterns in D. mayaguana. Codon bias of D. mayaguana is similar to those of the other species, and nonsignificant results were obtained when we tested for different dN/dS in D. mayaguana (using the branch model). Interestingly, Acp19 and mayAcp57 also diverge in the amount of codon bias. Acp19 has almost null bias, whereas mayAcp57 has the third highest codon bias among the genes analyzed here. This result raises the question of whether optimal codon selection can restrict adaptive evolution of a gene. Although the function of Acps containing the CRISP domain is not clear, this gene family includes genes involved in defense response in other organisms from plants to humans. Immune defense genes are often found to be under positive selection (Schlenke and Begun 2003; Vallender and Lahn 2004). In vertebrates, some proteins of the CRISP family are related to sperm binding at different stages of reproduction (Olson et al. 2001; Voight et al. 2006).
To examine the importance of positive selection in modulating changes in protein function, we analyzed the position of positively selected sites (class 2 BEB P > 0.90) in relation to the functional domain and active sites of 3 Acps. These were the Acp genes with a conserved functional domain and a significant result in tests for positive selection (Acp19, mayAcp58, and mayAcp74). In Acp19, the conserved CRISP domain encompassed 135 of the 230 amino acids in the coding region, starting at amino acid position 80. Five positively selected sites were found before the beginning and one after the end of the CRISP domain. The remaining 5 positively selected sites were within the domain, 2 of them at sites that are more or less conserved in the domain alignments. Similar results were obtained in a study on mammalian "fertilin," a reproductive protein that also carries a CRISP domain (Civetta 2003).
In mayAcp58 and mayAcp74, almost all the sites with BEB P > 0.90 of being in class 2 were within the conserved protein domains. These included 9 out of 10 sites in mayAcp58 and all 22 sites in mayAcp74. In D. mayaguana, the gene mayAcp74 had nonsynonymous substitutions in 1 of the 3 active sites and 1 of the 3 binding sites. It is possible that D. mayaguana carries a nonactive pseudogene, although the coding region was intact, without early stop codons or frameshift mutations. These mutated sites were not among the ones with high probability of being under positive selection.
Both of these genes have domains related to protein catalysis. mayAcp58 contains a GILT domain, which is present in gamma interferon–inducible lysosomal thiol reductase, whose function is to catalyze thiol bond reduction, denaturing proteins and facilitating the action of proteases (West et al. 1994). mayAcp74 is a trypsin-like serine protease, a functional category often found among Acps (Mueller et al. 2004). A third Acp protease analyzed here (mayAcp73) also showed a relatively high dN/dS, suggestive of adaptive evolution. The importance of proteolysis regulation in fertilization is ubiquitous. Proteases, reductases, and protease inhibitors are often present in the seminal fluid of a diversity of organisms. In Drosophila, proteolysis is implicated in the processing of other Acps, both before and after insemination (Ram et al. 2006). Acps involved in proteolysis regulation show pleiotropic and epistatic effects in sperm competition, and at least one Acp protease (CG6168) is involved in immune defense (Fiumera et al. 2007; Mueller et al. 2007). At least 2 Acp proteases show evidence of adaptive evolution in the D. melanogaster group (Wong et al. 2008).
Lineage-Specific Adaptive Evolution
So far, the models used here in tests for positive selection assume that the different lineages are affected by the same evolutionary forces, that is, dN/dS is assumed to be the same in all branches of the tree. Nevertheless, positive selection can be restricted to a node, a clade, or even to a single species. This hypothesis can be tested by selecting one or more branches (foreground) to have different dN/dS from that of the remaining branches (background) on a tree (using the branch model as implemented in codeml), obtaining the likelihood of this model, and comparing it to the likelihood of the null hypothesis, which is the model that assumes homogeneous dN/dS across branches. Although the branch model used here does not allow for testing positive selection, it can be used to show whether a certain branch has significantly faster evolution as compared with other branches of the tree. We used this approach in 2 cases where some of the results already discussed pointed to the possibility that a certain lineage might have particularly high amino acid substitution rate.
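Setting up a branch model requires marking the foreground branches in the tree file that codeml reads, which PAML does with a `#1` label after the branch name. The sketch below shows one way to add such labels to terminal branches of a Newick string; the helper `label_foreground`, the underscore-style taxon names, and the four-taxon tree are hypothetical and serve only as an illustration.

```python
import re


def label_foreground(newick, taxa, mark="#1"):
    """Append a PAML-style branch label to the named terminal branches.

    Each taxon must occur exactly once in the Newick string.
    """
    labeled = newick
    for taxon in taxa:
        # Match the taxon name only when it is not part of a longer name.
        pattern = r"(?<![\w.])" + re.escape(taxon) + r"(?![\w.])"
        labeled, count = re.subn(
            pattern, lambda m: m.group(0) + " " + mark, labeled
        )
        if count != 1:
            raise ValueError(f"{taxon!r} found {count} times, expected once")
    return labeled


if __name__ == "__main__":
    # Hypothetical tree with the two sister-species pairs named in the text.
    tree = "((D_mayaguana,D_parisiena),(D_mojavensis,D_arizonae));"
    print(label_foreground(tree, ["D_mayaguana", "D_parisiena"]))
```

Giving both sister species the same label makes them share a single foreground dN/dS, whereas using distinct labels (for example `#1` and `#2`) would let each vary independently.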
As evidenced by the relative rate tests, the low success in amplifying D. navojoa genes cannot be attributed to higher evolutionary rates in this species. One alternative explanation for the PCR results is related to the demographics of this species. Drosophila navojoa has a very limited geographic distribution and breeds exclusively in one species of cactus, which could, though not necessarily, lead to small population sizes (Ruiz et al. 1990). It has been proposed that selection is less efficient in small populations, leading to the accumulation of slightly deleterious mutations (Ohta 1973, 1993; Lynch and Conery 2003). This would lead to higher rates not only of gene loss but also of amino acid substitution. We used the branch model to test the hypothesis of a higher rate of amino acid substitution in D. navojoa. Drosophila navojoa had higher dN/dS than the average of the other species in only 2 genes, Acp7 and Acp25, but the difference was not statistically significant (supplementary table 4, Supplementary Material online).
The second case where we used the branch model was that of the gene mayAcp74. The presence of amino acid substitutions in the active and binding sites of the protein exclusively in D. mayaguana raised the question of whether this lineage experiences relatively high rates of nonsynonymous substitution. When allowed to vary, the estimate of dN/dS for D. mayaguana (1.712) was more than twice as large as the background dN/dS averaged across all the other branches (background dN/dS = 0.678). The likelihood of the model that allows for a different dN/dS in D. mayaguana in relation to that of the remaining species was significantly higher than the likelihood of the model that assumes homogeneity of dN/dS ratios across all species (table 6). It is possible, however, that the pattern observed was caused by positive selection acting on the ancestor of D. mayaguana and its sister species, D. parisiena. We tested this hypothesis by allowing for a different dN/dS ratio in the branch leading to the clade D. mayaguana + D. parisiena. The branch dN/dS (0.812) was not significantly different from the average across the other branches of the tree (0.767, table 6). Nevertheless, in a third model, where D. parisiena and D. mayaguana dN/dS ratios are set to be equal but independent from the dN/dS of the remaining branches, the average dN/dS (1.440) of these 2 species was higher than the background ratio (0.639, P < 0.01). Allowing D. mayaguana and D. parisiena to vary independently from each other and from the remaining species did not increase the model likelihood, as reflected by the high dN/dS ratios obtained for both species (D. mayaguana dN/dS = 1.657 and D. parisiena dN/dS = 1.151; in this model, background dN/dS = 0.640). The branch models discussed here are summarized in table 6. These results suggest that mayAcp74 could be involved in the development of reproductive isolation between the 2 sister species. Intraspecific patterns of nucleotide substitution in D. mayaguana and D. parisiena as compared with substitution patterns between species may provide further insight into the involvement of this gene in reproductive isolation.
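The alternative branch models compared for mayAcp74 differ only in which terminal branches are assigned their own dN/dS class. The sketch below shows one way such partitions could be written down, assuming the codeml convention of marking foreground branches with `#1`, `#2` in a Newick tree. The helper `label_tips`, the toy topology, and the placeholder taxa `sp_A` and `sp_B` are hypothetical; only the sister relationship between D. mayaguana and D. parisiena comes from the text. The model for the ancestral branch of the clade requires a label on an internal branch and is not covered here.

```python
import re

def label_tips(newick, marks):
    """Return the Newick string with ' #k' appended to each tip named in marks.

    Assumes tip names follow '(' or ',' directly and that the tree
    carries no branch lengths (a simplification for this sketch).
    """
    def add_mark(match):
        name = match.group(1)
        return f"{name} #{marks[name]}" if name in marks else name
    return re.sub(r"(?<=[(,])([A-Za-z_]\w*)", add_mark, newick)

# Toy topology for illustration only; not the tree used in this study.
TOY_TREE = "((mayaguana,parisiena),(sp_A,sp_B));"

# D. mayaguana alone as foreground.
one_species = label_tips(TOY_TREE, {"mayaguana": 1})
# Both sister species share a single foreground ratio.
shared_ratio = label_tips(TOY_TREE, {"mayaguana": 1, "parisiena": 1})
# Each sister species receives its own ratio.
separate_ratios = label_tips(TOY_TREE, {"mayaguana": 1, "parisiena": 2})

for tree in (one_species, shared_ratio, separate_ratios):
    print(tree)
```

Each labeled tree would then be fitted separately and compared with the unlabeled, homogeneous tree.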
One final point concerns the possibility that positive selection in mayAcp74 might be restricted to D. mayaguana and D. parisiena. To address this hypothesis, we reran the tests for positive selection on this gene without the sequences of the 2 most divergent species. The results showed that positive selection is not restricted to D. mayaguana and D. parisiena (LRT = 27.030, P < 0.001), although considerably fewer sites were found to be under positive selection than in the tests including all the species (4 as compared with 14 with class 2 BEB P > 0.95).
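As a quick check on the reported statistic, the p-value can be recomputed from the LRT value given above. This sketch assumes that the site-model comparison has 2 degrees of freedom, which is an assumption made here and is not stated in this passage. The function name is hypothetical.

```python
import math

def chi2_sf_df2(stat):
    """Upper-tail probability of a chi-square variable with df = 2.

    For df = 2 the survival function has the closed form exp(-x / 2).
    """
    return math.exp(-stat / 2.0)

# LRT statistic reported in the text for the reduced data set.
p_value = chi2_sf_df2(27.030)
print(f"p = {p_value:.2e}")
```

Under that assumption, the recomputed value is consistent with the reported P < 0.001.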
| Conclusions |
|---|
|
|
|---|
Our results suggest that positive selection is very likely acting on 10 of the 15 accessory gland–expressed genes tested here, all of them likely Acps (Acp1, Acp7, Acp11, Acp19, Acp25, Acp42, Acp45, mayAcp58, mayAcp65, and mayAcp74). These genes had overall (averaged across sites and lineages) dN/dS between 0.635 and 1.275. Genes with dN/dS < 0.60 had negative results in the test for positive selection, suggesting that 0.6 is perhaps a more precise and conservative cutoff than 0.5 when tests are not available or accurate (e.g., when too few sequences are available). We found that positive selection on Acps often acts within the conserved protein domains and may therefore cause divergence in activity and/or substrate specificity among species. The fact that all the species analyzed here belong to a recently diverged clade shows that adaptive evolution is responsible for gene sequence divergence in a relatively short time frame. As shown for mayAcp74, adaptive evolution may be involved in sister species divergence.
The results of this study confirm previous suggestions of a faster evolutionary rate in Acps of the D. repleta group as compared with the D. melanogaster and D. pseudoobscura groups (Wagstaff and Begun 2005; Almeida and DeSalle, in review). They show significant statistical evidence of positive selection for an additional 8 Acps of the repleta group. Together with the 3 Acps with positive selection detected by the MK test in Wagstaff and Begun (2005), 2 of which (Acp1 and Acp25) were confirmed here, the total number of repleta group Acps with adaptive evolution is 11. If we include evidence of positive selection in Acp gene families (in paralog divergence; Wagstaff and Begun 2007), this number is raised to 15. Similar results have been obtained for 3 protease gene families expressed in the female reproductive tract of D. arizonae, one of the species analyzed here (Kelleher et al. 2007). Positive selection on reproductive molecules of both sexes in species of the D. repleta group suggests a role of male × female antagonistic coevolution. Such selective pressure is expected given the high remating rates observed in species in the group (Markow 1996; Dorus et al. 2004). Another explanation is increased sperm competition due to multiple inseminations in a short period of time (Dorus et al. 2004).
| Supplementary Material |
|---|
|
|
|---|
Supplementary tables 1–4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
|
|
|---|
We thank Cymone Speed for her help in the laboratory and 2 anonymous reviewers for their helpful comments on an earlier version of the manuscript. Funds for this research were provided by the Sackler Institute for Comparative Genomics (American Museum of Natural History), the Cullman Program in Molecular Systematics, and the National Science Foundation (DEB 0129105 to R.D.). F.C.A. was supported by the Henry McCraken Fellowship (New York University).
| Footnotes |
|---|
Jody Hey, Associate Editor
| References |
|---|
|
|
|---|
Aguadé M. Positive selection drives the evolution of the Acp29AB accessory gland protein in Drosophila. Genetics (1999) 152:543–551.
Aguadé M, Miyashita N, Langley CH. Polymorphism and divergence in the Mst26A male accessory gland gene region in Drosophila. Genetics (1992) 132:755–770.
Akashi H. Molecular evolution between Drosophila melanogaster and D. simulans: reduced codon bias, faster rates of amino acid substitution, and larger proteins in D. melanogaster. Genetics (1996) 144:1297–1307.
Anisimova M, Bielawski JP, Yang Z. Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol Biol Evol (2001) 18:1585–1592.
Begun DJ, Whitley P. Molecular population genetics of Xdh and the evolution of base composition in Drosophila. Genetics (2002) 162:1725–1735.
Begun DJ, Whitley P, Todd BL, Waldrip-Dail HM, Clark AG. Molecular population genetics of male accessory gland proteins in Drosophila. Genetics (2000) 156:1879–1888.
Bendtsen JD, Nielsen H, von Heijne G, Brunak S. Improved prediction of signal peptides: signalP 3.0. J Mol Biol (2004) 340:783–795.
Chapman T, Davies SJ. Functions and analysis of the seminal fluid proteins of male Drosophila melanogaster fruit flies. Peptides (2004) 25:1477–1490.
Chapman T, Liddle LF, Kalb JM, Wolfner MF, Partridge L. Cost of mating in Drosophila melanogaster females is mediated by male accessory gland products. Nature (1995) 373:241–244.
Chen PS, Stumm-Zollinger E, Aigaki T, Balmer J, Bienz M, Böhlen P. A male accessory gland peptide that regulates reproductive behavior of female D. melanogaster. Cell (1988) 54:291–298.
Civetta A. Positive selection within sperm-egg adhesion domains of fertilin: an ADAM gene with a potential role in fertilization. Mol Biol Evol (2003) 20:21–29.
Civetta A, Clark AG. Correlated effects of sperm competition and postmating female mortality. Proc Natl Acad Sci USA (2000) 97:13162–13165.
Clark AG, Aguadé M, Prout T, Harshman LG, Langley CH. Variation in sperm displacement and its association with accessory gland protein loci in Drosophila melanogaster. Genetics (1995) 139:189–201.
Dorus S, Evans PD, Wyckoff GJ, Choi SS, Lahn BT. Rate of molecular evolution of the seminal protein gene SEMG2 correlates with levels of female promiscuity. Nat Genet (2004) 36:1326–1329.
Durando CM, Baker RH, Etges WJ, Heed WB, Wasserman M, DeSalle R. Phylogenetic analysis of the repleta species group of the genus Drosophila using multiple sources of characters. Mol Phylogenet Evol (2000) 16:296–307.
Fiumera AC, Dumont BL, Clark AG. Associations between sperm competition and natural variation in male reproductive genes on the third chromosome of Drosophila melanogaster. Genetics (2007) 176:1245–1260.
Haerty W, Jagadeeshan S, Kulathinal RJ, et al, (11 co-authors). Evolution in the fast lane: rapidly evolving sex-related genes in Drosophila. Genetics (2007) 177:1321–1335.
Harshman LG, Prout T. Sperm displacement without sperm transfer in Drosophila melanogaster. Evolution (1994) 48:758–766.
Healy MJ, Dumancic MM, Cao A, Oakeshott JG. Localization of sequences regulating ancestral and acquired sites of esterase6 activity in Drosophila melanogaster. Mol Biol Evol (1996) 13:784–797.
Heifetz Y, Lung O, Frongillo EA, Wolfner MF. The Drosophila seminal fluid protein Acp26Aa stimulates release of oocytes by the ovary. Curr Biol (2000) 10:99–102.
Katoh K, Misawa K, Toh H, Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res (2005) 33:511–518.
Kelleher ES, Swanson WJ, Markow TA. Gene duplication and adaptive evolution of digestive proteases in Drosophila arizonae female reproductive tracts. PLoS Genet (2007) 3:e148.
Kern AD, Jones CD, Begun DJ. Molecular population genetics of male accessory gland proteins in the Drosophila simulans complex. Genetics (2004) 167:725–735.
Kim Y. Effect of strong directional selection on weakly selected mutations at linked sites: implication for synonymous codon usage. Mol Biol Evol (2004) 21:286–294.
Lung O, Tram U, Finnerty CM, Eipper-Mains MA, Kalb JM, Wolfner MF. The Drosophila melanogaster seminal fluid protein Acp62F is a protease inhibitor that is toxic upon ectopic expression. Genetics (2002) 160:211–224.
Lynch M, Conery JS. The origins of genome complexity. Science (2003) 302:1401–1404.
Maddison D, Maddison W. MacClade 4: analysis of phylogeny and character evolution (2000) Sunderland (MA): Sinauer Associates.
Marchler-Bauer A, Anderson JB, Cherukuri PF, et al, (24 co-authors). CDD: a conserved domain database for protein classification. Nucleic Acids Res (2005) 33:D192–D196.
Marchler-Bauer A, Bryant SH. CD-Search: protein domain annotations on the fly. Nucleic Acids Res (2004) 32:W327–W331.
Markow TA. Evolution of Drosophila mating systems. Evol Biol (1996) 29:73–106.
Mueller JL, Page JL, Wolfner MF. An ectopic expression screen reveals the protective and toxic effects of Drosophila seminal fluid proteins. Genetics (2007) 175:777–783.
Mueller JL, Ram KR, McGraw LA, Bloch Qazi MC, Siggia ED, Clark AG, Aquadro CF, Wolfner MF. Cross-species comparison of Drosophila male accessory gland protein genes. Genetics (2005) 171:131–143.
Mueller JL, Ripoll DR, Aquadro CF, Wolfner MF. Comparative structural modeling and inference of conserved protein classes in Drosophila seminal fluid. Proc Natl Acad Sci USA (2004) 101:13542–13547.
O'Grady PM, Durando CM, Heed WB, Wasserman M, Etges W, DeSalle R. Genetic divergence within the Drosophila mayaguana subcluster, a closely related triad of Caribbean species in the repleta species group. Hereditas (2002) 136:240–245.
Ohta T. Slightly deleterious mutant substitutions in evolution. Nature (1973) 246:96–98.
Ohta T. Amino acid substitution at the Adh locus of Drosophila is facilitated by small population size. Proc Natl Acad Sci USA (1993) 90:4548–4551.
Olson JH, Xiang X, Ziegert T, Kittelson A, Rawls A, Bieber AL, Chandler DE. Allurin, a 21-kDa sperm chemoattractant from Xenopus egg jelly, is related to mammalian sperm-binding proteins. Proc Natl Acad Sci USA (2001) 98:11205–11210.
Panhuis TM, Clark NL, Swanson WJ. Rapid evolution of reproductive proteins in abalone and Drosophila. Philos Trans R Soc Lond B Biol Sci (2006) 361:261–268.
Podlaha O, Zhang J. Positive selection on protein-length in the evolution of a primate sperm ion channel. Proc Natl Acad Sci USA (2003) 100:12241–12246.
Pond SL, Frost SD, Muse SV. HyPhy: hypothesis testing using phylogenies. Bioinformatics (2005) 21:676–679.
Ram KR, Sirot LK, Wolfner MF. Predicted seminal astacin-like protease is required for processing of reproductive proteins in Drosophila melanogaster. Proc Natl Acad Sci USA (2006) 103:18674–18679.
Rodriguez-Trelles F, Tarrio R, Ayala FJ. A methodological bias toward overestimation of molecular evolutionary time scales. Proc Natl Acad Sci USA (2002) 99:8112–8115.
Rodriguez-Trelles F, Tarrio R, Ayala FJ. Evolution of cis-regulatory regions versus codifying regions. Int J Dev Biol (2003) 47:665–673.
Rozas J, Sanchez-DelBarrio JC, Messeguer X, Rozas R. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics (2003) 19:2496–2497.
Ruiz A, Heed WB, Wasserman M. Evolution of the mojavensis cluster of cactophilic Drosophila with descriptions of two new species. J Hered (1990) 81:30–42.
Russo CA, Takezaki N, Nei M. Molecular phylogeny and divergence times of drosophilid species. Mol Biol Evol (1995) 12:391–404.
Schlenke TA, Begun DJ. Natural selection drives Drosophila immune system evolution. Genetics (2003) 164:1471–1480.
Schully SD, Hellberg ME. Positive selection on nucleotide substitutions and indels in accessory gland proteins of the Drosophila pseudoobscura subgroup. J Mol Evol (2006) 62:793–802.
Swanson WJ, Aquadro CF, Vacquier VD. Polymorphism in abalone fertilization proteins is consistent with the neutral evolution of the egg's receptor for lysin (VERL) and positive Darwinian selection of sperm lysin. Mol Biol Evol (2001) 18:376–383.
Swanson WJ, Clark AG, Waldrip-Dail HM, Wolfner MF, Aquadro CF. Evolutionary EST analysis identifies rapidly evolving male reproductive proteins in Drosophila. Proc Natl Acad Sci USA (2001) 98:7375–7379.
Swanson WJ, Nielsen R, Yang Q. Pervasive adaptive evolution in mammalian fertilization proteins. Mol Biol Evol (2003) 20:18–20.
Swanson WJ, Vacquier VD. The rapid evolution of reproductive proteins. Nat Rev Genet (2002) 3:137–144.
Swanson WJ, Wong A, Wolfner MF, Aquadro CF. Evolutionary expressed sequence tag analysis of Drosophila female reproductive tracts identifies genes subjected to positive selection. Genetics (2004) 168:1457–1465.
Swofford DL. PAUP*. Phylogenetic analysis using parsimony (*and other methods) (2003) Sunderland (MA): Sinauer Associates.
Vallender EJ, Lahn BT. Positive selection on the human genome. Hum Mol Genet (2004) 13:R245–R254.
Voight BF, Kudaravalli S, Wen X, Pritchard JK. A map of recent positive selection in the human genome. PLoS Biol (2006) 4:e72.
Wagstaff BJ, Begun DJ. Molecular population genetics of accessory gland protein genes and testis-expressed genes in Drosophila mojavensis and D. arizonae. Genetics (2005) 171:1083–1101.
Wagstaff BJ, Begun DJ. Adaptive evolution of recently duplicated accessory gland protein genes in desert Drosophila. Genetics (2007) 177:1023–1030.
West MA, Lucocq JM, Watts C. Antigen processing and class II MHC peptide-loading compartments in human B-lymphoblastoid cells. Nature (1994) 369:147–151.
Wigby S, Chapman T. Sex peptide causes mating costs in female Drosophila melanogaster. Curr Biol (2005) 15:316–321.
Wolfner MF. The gifts that keep on giving: physiological functions and evolutionary dynamics of male seminal proteins in Drosophila. Heredity (2002) 88:85–93.
Wong A, Turchin MC, Wolfner MF, Aquadro CF. Evidence for positive selection on Drosophila melanogaster seminal fluid protease homologs. Mol Biol Evol (2008) 25:497–506.
Wong WS, Yang Z, Goldman N, Nielsen R. Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics (2004) 168:1041–1051.
Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci (1997) 13:555–556.
Yang Z. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol Biol Evol (1998) 15:568–573.
Yang Z, Bielawski JP. Statistical methods for detecting molecular adaptation. Trends Ecol Evol (2000) 15:496–503.
Yang Z, Nielsen R. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol Biol Evol (2002) 19:908–917.
Yang Z, Nielsen R, Hasegawa M. Models of amino acid substitution and applications to mitochondrial protein evolution. Mol Biol Evol (1998) 15:1600–1611.
Yang Z, Wong WS, Nielsen R. Bayes empirical bayes inference of amino acid sites under positive selection. Mol Biol Evol (2005) 22:1107–1118.