MBE Advance Access originally published online on July 19, 2006
Molecular Biology and Evolution 2006 23(10):1952-1965; doi:10.1093/molbev/msl062
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Research Article |
Geographic Variation and Positive Selection on M7 Lysin, an Acrosomal Sperm Protein in Mussels (Mytilus spp.)
Department of Biology, Duke University
E-mail: c.riginos{at}uq.edu.au.
| Abstract |
|---|
|
|
|---|
Successful fertilization in free-spawning marine organisms depends on the interactions between genes expressed on the surfaces of eggs and sperm. Positive selection frequently characterizes the molecular evolution of such genes, raising the possibility that some common deterministic process drives the evolution of gamete recognition genes and may even be important for understanding the evolution of prezygotic isolation and speciation in the marine realm. One hypothesis is that gamete recognition genes are subject to selection for prezygotic isolation, namely, reinforcement. In a previous study, positive selection on the gene coding for the acrosomal sperm protein M7 lysin was demonstrated among allopatric populations of mussels in the Mytilus edulis species group (M. edulis, Mytilus galloprovincialis, and Mytilus trossulus). Here, we expand sampling to include M7 lysin haplotypes from populations where mussel species are sympatric and hybridize to determine whether there is a pattern of reproductive character displacement (RCD), which would be consistent with reinforcement driving selection on this gene. We do not detect a strong pattern of RCD; neither are there unique haplotypes in sympatry nor is there consistently greater population structure in comparisons involving sympatric populations. One distinct group of haplotypes, however, is strongly affected by natural selection, and this group of haplotypes is found within M. galloprovincialis populations throughout the Northern Hemisphere concurrent with haplotypes common to M. galloprovincialis and M. edulis. We suggest that balancing selection, perhaps resulting from sexual conflicts between sperm and eggs, maintains old allelic diversity within M. galloprovincialis.
Key Words: balancing selection bindin gamete recognition reinforcement reproductive character displacement reproductive isolation
| Introduction |
|---|
|
|
|---|
One of the most exciting results to emerge from recent studies of molecular evolution and genomics is the observation that proteins with reproductive functions typically evolve rapidly and are often subject to positive selection (Civetta and Singh 1998
Indeed, some of the most intensively studied reproductive genes code for the sperm proteins of abalone and sea urchins, which are model organisms for fertilization biology. For example, in abalone, the sperm protein lysin nonenzymatically dissolves the egg vitelline membrane, permitting sperm to fertilize eggs (Vacquier et al. 1990
; Swanson and Vacquier 1997
). The abalone lysin gene has some of the highest ratios of nonsynonymous (dN) to synonymous (dS) substitutions between species of any known full-length gene (as great as 4.10, Lee and Vacquier 1992
; Lee et al. 1995
; Vacquier et al. 1997
). The observation of dN/dS > 1 and paucity of intraspecific polymorphisms (Lee et al. 1995
; Metz et al. 1998
) point to recent selective sweeps on this gene due to the adaptive fixation of nonsynonymous substitutions. Abalone acrosomes also contain a 18-kDa protein, and the gene coding for this protein similarly shows high ratios of dN/dS across the entire gene (as great as 4.67, Swanson and Vacquier 1995
; Vacquier et al. 1997
) and no intraspecific polymorphism (Metz et al. 1998
), also consistent with recent selective sweeps.
In sea urchins, the sperm protein bindin controls attachment and fusion of sperm to eggs (Vacquier et al. 1995
), a critical stage for determining whether male and female gametes are compatible (Palumbi and Metz 1991
; Metz et al. 1994
). Like lysin, bindin is subject to positive selection, although only one region of bindin has dN/dS > 1 (Metz and Palumbi 1996
). This "hotspot" is characterized by nonsynonymous substitutions both between species and also nonsynonymous polymorphisms within species of urchins (Metz and Palumbi 1996
; Biermann 1998
; Palumbi 1999
; Geyer and Palumbi 2003
). In contrast to abalone lysin, the nonsynonymous polymorphisms of bindin suggest that selection acts intraspecifically to maintain allelic variation.
In abalone and sea urchins, sperm receptor genes (expressed on egg surfaces) have been identified (Swanson and Vacquier 1997
; Galindo et al. 2002
, 2003
; Kamei and Glabe 2003
) and in some cases are also subject to positive selection (Galindo et al. 2003
; Mah et al. 2005
); if and how these female and male genes coevolve is not yet understood.
Other marine invertebrate sperm proteins similarly show evidence for positive selection, such as genes from teguline snails (Hellberg et al. 2000
) and blue mussels (Riginos and McDonald 2003
) but are less studied than either abalone lysin or sea urchin bindin.
Although positive selection on fertilization genes in broadcast-spawning marine invertebrates seems to be common, there is little consensus on the underlying source(s) of selection. A number of nonmutually exclusive hypotheses have been proposed. Two relevant categories of hypotheses are discussed briefly here; for a full treatment, reviews by Howard (1999)
, Swanson and Vacquier (2002b)
, and Coyne and Orr (2004)
are recommended.
One group of hypotheses reflects the differing interests of male and female gametes. For example, in high-density situations, sperms of broadcast spawners may be in direct competition with other males (Yund and McCartney 1994
; Yund 2000
; Levitan 2004
; Levitan and Ferrell 2006
), leading to sperm competition, where mutations conferring a fertilization advantage to certain males will be advantageous and subject to strong selection. Sperm competition could explain the rapid evolution, via selective sweeps, of male reproductive genes. At the same time, eggs may be under strong selection to delay sperm penetration in order to avoid polyspermy (Vacquier et al. 1997
; Yund 2000
; Levitan 2002
; Levitan and Ferrell 2006
), or some other aspect of female choice may be involved (Vacquier et al. 1997
; Palumbi 1999
). Conflicting pressures on males and females could theoretically drive positive selection on both partners and would maintain intraspecific polymorphisms at gamete recognition genes (Rice and Holland 1997
; Rice 1998
; Frank 2000
; Gavrilets 2000
; Haygood 2004
), with many possible outcomes (Gavrilets and Hayashi 2005
, 2006
), including allopatric (Rice 1998
; Gavrilets 2000
) and sympatric (Sander van Doorn et al. 2001
; Gavrilets and Waxman 2002
) speciation.
Reinforcement is a second hypothesis that could explain the rapid evolution of gamete recognition genes (e.g., Howard 1999
; Swanson and Vacquier 2002b
; Coyne and Orr 2004
). Under the process of reinforcement (Dobzhansky 1940
), there is a fitness advantage for mutations contributing to prezygotic isolation when taxa have some degree of postzygotic isolation yet hybrids are formed. An expected outcome of selection due to reinforcement is a pattern of reproductive character displacement (RCD; where reproductive differences are greater where taxa are sympatric as compared with where allopatric); however, other scenarios can also lead to RCD (Howard 1993
; Noor 1999
; Zigler and Lessios 2003
; Coyne and Orr 2004
). For within-species variation of reproductive genes, a strong pattern of RCD would be manifest as unique alleles found in sympatric populations or enhanced genetic divergence between sympatric and allopatric populations as compared with comparisons among allopatric populations (as in Geyer and Palumbi 2003
). The reinforcement hypothesis does not negate a role for female genes or sexual conflict: indeed, eggs are expected to be under stronger selection than sperm (due to relative energetic investments of the parents) and the observed evolution of sperm genes could be a response to the evolution at the receptor egg genes (Coyne and Orr 2004
), with reinforcement as a special outcome of conflicting sexual pressures on male and female gametes (Parker and Partridge 1998
).
An intraspecific pattern of RCD at the nucleotide level would support a reinforcement hypothesis, whereas antagonistic gamete interactions, in the absence of reinforcement, should act at the population level (irrespective of fitness costs from hybridization), and thus, patterns of polymorphism and selection within these populations should not be affected by the presence or absence of other closely related species.
Whether or not there is a pattern of RCD cannot be evaluated in abalone as there is virtually no within-species polymorphism for either lysin or the gene for the 18-kDa protein (Lee et al. 1995
; Metz et al. 1998
). Sea urchin bindin, however, is considerably polymorphic within species, and Geyer and Palumbi (2003)
found a pattern of RCD between the species of Echinometra oblonga and Echinometra sp. C. Specifically, within E. oblonga they described a unique clade of bindin haplotypes that were only found in populations sympatric with E. sp. C. Both the observation of RCD and prezygotic isolation in the form of conspecific sperm precedence between sympatric populations of these 2 species (Geyer and Palumbi 2005
) lend support to the hypothesis of reinforcement. Furthermore, RCD is broadly observed among sea urchin genera; in those genera that include many sympatric species, bindin evolves under positive selection, whereas in genera with allopatric species, bindin evolution is constrained (Biermann 1998
; Metz et al. 1998
; Zigler and Lessios 2003
; Zigler et al. 2003
; Zigler and Lessios 2004
; Palumbi and Lessios 2005
). These patterns of bindin evolution are also accompanied by a reduced ability for heterospecific sperm to fertilize eggs within sympatric genera as compared with allopatric genera (Minor et al. 1991
; Metz et al. 1994
, 1998
; Palumbi and Lessios 2005
).
Although the observation of RCD in bindin within and between urchin species is consistent with selection due to reinforcement, at the same time, there is also evidence for male-by-female interactions in the urchin Echinometra mathei based on bindin genotype, so that none of the clades of bindin haplotypes are universally superior (Palumbi 1999
). In the urchin Strongylocentrotus franciscanus, the fitness of bindin alleles is affected both by genotype frequency and local spawning density (Levitan and Ferrell 2006
). Thus, interactions between male and female urchins are likely to be important for the evolution of bindin.
In summary, most of what is known about the evolution of fertilization genes in broadcast-spawning marine invertebrates comes from sperm genes of urchins and abalone. The lack of intraspecific variation in male abalone genes is a major impediment to experimentally testing competing hypotheses regarding sources of selection (but see Panhuis et al. 2006
). Furthermore, in both groups, reproductive isolation is essentially complete, as evidenced by the rarity of hybrids in nature (Owen et al. 1971
; Palumbi and Metz 1991
; Lessios and Pearse 1996
), so that it is difficult to infer the importance of bindin and lysin evolution to speciation in urchins and abalone.
In contrast, natural hybridization is common among mussels in the Mytilus edulis species complex, presenting an alternative system in which to understand intraspecific selective pressures on fertilization genes and to investigate whether their evolution is important in the establishment of reproductive isolation. Previously, the acrosomal sperm protein gene M7 lysin was isolated (Takagi et al. 1994
) and examined from allopatric populations within the 3 closely related species of M. edulis, Mytilus galloprovincialis, and Mytilus trossulus (Riginos and McDonald 2003
). Although mussel lysins function in the same manner as abalone lysins (to dissolve the egg vitelline envelope), these genes exhibit no sequence similarity to abalone lysin and may not be homologous (or have evolved so rapidly that evidence for homology has been obscured). An excess of nonsynonymous substitutions between species (P < 0.05 in a McDonaldKreitman test) was found for M7 lysin along with considerable within-species polymorphism (Riginos and McDonald 2003
). This polymorphism (like that of bindin but unlike abalone lysin) allows us to examine intraspecific geographic variation of M7 lysin.
Mytilus edulis, M. galloprovincialis, and M. trossulus are genetically distinct (McDonald and Koehn 1988
; McDonald et al. 1991
; Rawson and Hilbish 1995
), and there is experimental evidence for both prezygotic (Bierne et al. 2002
; Rawson et al. 2003
) and postzygotic (Zouros et al. 1994
; Bierne et al. 2002
) isolation between species pairs (but see Beaumont et al. 2005
). Nevertheless, hybrid individuals are found where any of the 3 species overlap and are common (>10%) in each of these hybrid zones (fig. 1, and Skibinski et al. 1978
; Coustau et al. 1991
; Suchanek et al. 1997
; Comesaña et al. 1999
; Rawson et al. 1999
; Riginos and Cunningham 2005
). The age of secondary contact differs among zones, for example, contact between native Pacific M. trossulus and M. galloprovincialis probably has resulted from human introduction of M. galloprovincialis to the Pacific Ocean (McDonald and Koehn 1988
), whereas other Northern Hemisphere zones are probably older (Pleistocene). The geographic range of each species spans thousands of kilometers, with populations separated by continental landmasses and deep ocean waters. These barriers can minimize genetic exchange (Hilbish et al. 2000
; Riginos et al. 2004
), so that antagonistic gamete interactions between geographically disjunct populations might have different evolutionary outcomes (Gavrilets 2000
). Finally, mussels form extremely dense aggregations (C. Riginos, personnel observation), so that multiple gametes are likely to encounter each other in spawning events, increasing the likelihood of selection induced by competitive (within- or between-species) gamete interactions.
|
In the present study, we survey genetic variation of M7 lysin across the species ranges of M. edulis, M. galloprovincialis, and M. trossulus in the Northern Hemisphere. We investigate whether M7 lysin shows a pattern of strong RCD in the form of unique alleles restricted to sympatric populations and/or greater genetic differentiation between sympatric and allopatric populations as compared with differentiation among allopatric populations. In addition, we further characterize patterns of selection within species and among major haplotype groups.
| Materials and Methods |
|---|
|
|
|---|
Sampling Sites and Mussel Hybrid Zones
In the initial survey of M7 lysin variation (Riginos and McDonald 2003
The present study also includes regions where ranges of 2 of the 3 species overlap and where hybrid individuals are often found. Collecting sites are shown in figure 1. Tissues used were either frozen at 80 °C, preserved in RNAlater (Ambion, Austin, TX), or preserved in 95% ethanol. Species identity was established using the Glu 5' marker (Rawson et al. 1996
).
Polymerase Chain Reaction Amplification and DNA Sequencing of Genomic DNA
The M7 lysin gene is approximately 8 kb in length, and the 5 exons are separated by large (up to 3 kb) introns. These large introns make genomic amplification across the entire coding region impractical. (Complete details of the M7 lysin gene structure and how it was elucidated will be described elsewhere.) Here, we concentrate on the fifth exon (at the 3' end of the coding region) for an in-depth examination of variation within and among species. We chose this exon because Riginos and McDonald (2003)
found evidence for positive selection in this region of the gene and because the fifth exon is the largest exon of M7 lysin (210 bp). A forward primer targeting the intron upstream of the fifth exon (melex F2: 5'-CGG CTG GTC CAG TAA GCA GT-3') and a reverse primer matching 3' UTR (melex R2: 5'-TCA AAA TTA TTT CAG CAT GCA CTT TG-3') were used to amplify a
545-bp fragment, with Titanium Taq polymerase (Clontech, BD Biosciences, Palo Alto, CA).
Allelic diversity was sufficiently high and included many polymorphic indels in the intron and 3' UTR, so that direct sequencing frequently did not yield legible sequences. Therefore, each amplicon was cloned using a TOPO-TA cloning kit (Invitrogen, Carlsbad, CA). Individual colonies were used to prime polymerase chain reactions (PCRs) using M13 primers, and the resultant PCR products were cleaned using Eppendorf PerfectPrep Cleanup 96 plates (Westbury, NY). Amplified clonal inserts were sequenced with M13 primers, using the ABI PRISM BigDye Terminators ver. 3.1 Cycle Sequencing Kit (Foster City, CA). Sequencing reactions were electrophoresed on an ABI PRISM 3700 DNA Analyzer. Sequence chromatograms were visualized and edited using SEQUENCHER ver. 4.2 software (Gene Codes Corp., Ann Arbor, MI).
Because PCR amplification can introduce errors such as nucleotide substitutions and recombination, haplotypes were determined using consensuses across cloned inserts. Initially, 8 colonies were sequenced per individual mussel and additional colonies were screened when preliminary sequences were not sufficient to resolve specific haplotypes. Haplotypes were considered resolved when three or more insert sequences (from the same mussel) differed only by unique substitutions (presumably due to random incorporation of incorrect nucleotides) or in the rare instance where 2 sequences were exactly identical. Sequences from cloned inserts that could not be confirmed in this manner were discarded. When all 8 inserts recovered from a single individual were consistent with being sequences of an identical haplotype (e.g., differing only by unique substitutions), we assumed that the individual was homozygous, and this was also confirmed by direct sequencing. Because the intron and 3' UTR regions were difficult to align, for most analyses contigs were parsed to the region of the fifth exon and combined with alleles containing this region determined from Reverse Transcription (RT)PCR of mRNA. This data set will be referred to as the fifth exon.
RNA Extraction and RT-PCR
Individuals containing a divergent cluster of M. galloprovincialis haplotypes (see results) were also targeted for RT-PCR to determine the full coding sequence for this group of haplotypes. Total RNA was extracted from fresh-frozen tissue or tissue preserved in RNAlater using Qiagen's RNeasy Plant mini kits and following manufacturer's directions. The first-strand synthesis of cDNA and RT-PCR was performed as in Riginos and McDonald (2003)
but using Eppendorf cMaster RTplusPCR System for first-strand synthesis. This RT-PCR protocol yields the entire coding region of M7 lysin. These PCR products were cloned and screened for nucleotide variation as described above. New haplotypes from cDNA, containing the entire 540-bp coding region, were manually aligned to the 25 haplotypes from Riginos and McDonald (2003)
. We will refer to this data set as the M7 lysincoding region or whole-gene data set.
Tests of Selection
The issue of whether M7 lysin evolves under positive selection was examined in a genealogical framework. The program Modeltest (Posada and Crandall 1998
) was run with PAUP* 4.0b10 (Swofford 1998
) to determine the most appropriate model of sequence evolution for both the whole-gene and fifth-exon data sets. The most likely models of sequence evolution were implemented in PAUP*, using heuristic maximum likelihood searches with tree bisection-reconnection branch swapping.
Patterns of evolution within M7 lysin were examined using maximum likelihood methods developed by Yang and coworkers as implemented in CODEML in the PAML ver. 3.14 (Yang 1997
) package of programs. Taxa from the maximum likelihood genealogy estimated by PAUP* were pruned to remove all zero-length branches, and this genealogy was used as the input tree. First, we examined how models of evolution that allowed
(the ratio of nonsynonymous to synonymous rates of substitution) to vary among branches (branch models, Yang 1998
) fit the genealogy. Likelihood was estimated where
was held constant along all branches (M0) and where
was allowed to freely vary among branches. We also examined 2 discrete branch models for
: 1) a 2-rate model where
D was estimated for the species branch leading to the divergent group of M. galloprovincialis haplotypes and a second
0 for all remaining branches and 2) a 3-rate model where discrete rates were given to the divergent group of M. galloprovincialis haplotypes (
D) and to the remaining species branches (
S), and a third rate to within-species branches (
0). In addition, rates of
were also allowed to vary across codon sites while held constant across branches (site models, Nielsen and Yang 1998
; Yang et al. 2000
), under the nearly neutral (M1a), selection (M2), discrete (M3), beta (M7), and beta +
(M8) models. Finally, we also employed the models (A and B) of Yang and Nielsen (2002)
, which allow
for both sites and discrete branches to differ. For models A and B, the 2 branch rates (e.g.,
D and
0) were used and site parameters were estimated independently for those 2 branch types.
For all models described above, the F3 x 4 codon frequencies and Jones et al. (1992)
amino acid rates were used with no enforced molecular clock. The significance of each model was tested in a nested fashion using log-likelihood ratio tests as described in Yang (1998)
, where
is distributed as a chi-square distribution with degrees of freedom (df) equaling the difference between parameters estimated by the 2 models.
In addition to the maximum likelihood estimates of
along branches of the genealogy,
was estimated between all species pairs using a NeiGojobori distance with a JukesCantor correction, as implemented in MEGA (Kumar et al. 2004
). In all cases, comparisons were made between the individuals with the most basal haplotype relative to their species group, so as to estimate
along the species branches and not along intraspecific branches. To examine how
may differ in different gene regions, we used a sliding window analysis of NeiGojobori distances for dN and dS along the length of the gene and for the fifth exon (similar to the sliding windows in Metz and Palumbi 1996
; Biermann 1998
that identified the bindin hotspot). Z-tests and Fisher's exact tests were used to test whether dN was significantly greater than dS for each data partition.
Relative proportions of nonsynonymous and synonymous polymorphisms within species were also compared with the proportions of nonsynonymous and synonymous fixed differences between clades. These proportions were determined both for the whole coding region and the fifth exon data sets using McDonaldKreitman tests (1991)
, as implemented in DNAsp ver.4.10 (Rozas et al. 2003
).
Heterogeneity in polymorphism to divergence ratios along the gene was examined using McDonald's (1998)
KolmogorovSmirnov test, as described in Riginos and McDonald (2003)
. This test had previously identified a significant reduction in polymorphism within M. galloprovincialis in the 3' region of the gene. With the inclusion of additional M. galloprovincialis haplotypes in the current study, we wanted to test whether this earlier result was upheld.
Finally, genetic variation within species was examined in order to ascertain whether any recent selective sweeps have occurred at this locus. Genetic variation was estimated and tests of selection based on frequencies of polymorphic sites were performed for both the whole coding region and fifth exon for each species group, using DNAsp. To test for such deviations from the neutral, equilibrium frequency spectrum of polymorphisms, we used Tajima's (1989)
and Fu's (1996)
tests of selection. Haplotypes were also examined for evidence of past recombination, following the method of Hudson and Kaplan (1985)
.
Geographic Distribution of M7 Lysin Haplotypes
Analysis of the genomic DNA sequences containing the fifth exon revealed several fixed differences among major groups (see results), including the presence and absence of sites amenable to restriction enzyme digestion. Specifically, an EcoRV site was present in all haplotypes typical of M. galloprovincialis and M. edulis (but absent in M. trossulus haplotypes). Similarly, a BstEII cut site was present in single divergent cluster of M. galloprovincialis haplotypes (see results for full details). To determine the geographic distributions of specific allelic types, this region was PCR amplified as described above, and the PCR products were digested separately with EcoRV and BstEII (New England Biolabs, Beverly, MA), following manufacturer's directions. An excess of restriction enzyme was added to all reactions and allowed to digest the PCR products overnight so that the digestion proceeded to completion. Thus, individuals homozygous for a given cut site could be resolved from heterozygous individuals on an agarose gel (fig. 2).
|
Genotyping was also performed using the Glu 5' marker to determine species identity. Primers from Rawson et al. (1996)
To assess population structure within species, FST was estimated using Arlequin (ver. 2001, Schneider et al. 1997
), based on pairwise differences between full coding sequence haplotypes and using 10,000 permutations. FST was estimated for all sequenced haplotypes and then also for a subset of sequences where hybrid individuals (based on combined M7 lysin and Glu 5' genotypes) were excluded. FST based on the complete set of sequences includes potentially introgressive M7 lysin haplotypes, whereas some of those introgressive haplotypes were excluding by omitting hybrid individuals (subject to the limitations of resolving backcross hybrids from pure species individuals with only 2 loci). For homozygous individuals, we included 2 identical haplotypes in the FST estimation. In addition, within M. galloprovincialis, we estimated FST based on allele frequencies as determined by restriction digests with BstEII.
| Results |
|---|
|
|
|---|
Patterns of Selection and Within-Species Polymorphism
Both sequences from genomic DNA containing the fifth exon and whole-gene sequences from mRNA revealed a distinct cluster of highly divergent haplotypes (group D; fig. 3) that had not been found in the previous survey of M7 lysin variation (e.g., Riginos and McDonald 2003
on the D branch (
D) provided a better fit to the data than a model where all branches had the same single
(P < 0.020; table 2). In all models where
D was independent from other branches, it was estimated to be greater than 1.9, indicative of an increased rate of nonsynonymous substitutions along this branch, whereas
on other branches (both inter- and intraspecific branches) was always less than one. Models of evolution that allowed for
to vary among sites (M2, M3, M7 and models A and B) along the gene did not provide a substantially better fit to the data than the nearly neutral models (M1a and M7) of sequence evolution, although models A and B also found
D > 1 (for the foreground branch).
|
|
|
Previous analysis of variation along the M7 lysin gene had suggested that selection may be acting more strongly on the 3' end of the gene (Riginos and McDonald 2003
|
McDonaldKreitman tests, however, substantiated a greater rate of amino acidchanging substitutions between major groups relative to within-species polymorphism. Previous analyses had yielded a significant McDonaldKreitman test when polymorphism data for all 3 species were combined (Riginos and McDonald 2003
|
If new M7 lysin mutations are frequently adaptive, then there may be evidence for recent selective sweeps within some species or haplotype groups. Riginos and McDonald (2003)
None of the species groups had an excess of rare variants from Tajima's and Fu's tests of selection (for M. galloprovincialis, D = 0.11, F = 2.60; for M. edulis, D = 0.56, F = 3.24; and for M. trossulus, D = 0.08, F = 1.50), and nonsynonymous polymorphisms were found segregating among all species (13 for M. galloprovincialis, 1 for M. edulis, and 3 for M. trossulus; see fig. 4 and single additional nonsynonymous polymorphic sites were found in fifth exon sequences).
In the full coding region, within M. edulis and M. galloprovincialis haplotypes, 3 pairs of sites ([129, 138], [138, 191], [252, 510]) were found having all 4 possible gametic combinations, indicative of a minimum of 3 recombination events (Hudson and Kaplan 1985
), and there were no detectable recombination events within M. trossulus. Within the haplotypes from genomic DNA that include intron and 3' UTR, there were a minimum of 3 recombination events detected among all sequences but no recombination within any species or major group of haplotypes. Only one of these recombination events included the exon (basepair position 172 in the intron and basepair position 400 in the exon). When only the fifth exon was examined (and sequences derived from mRNA included), no recombination events were detected.
Geographic Distribution of M7 Lysin Haplotypes
Within M. edulis and M. galloprovincialis, there were 4 major haplotypes (figs. 1 and 3). Haplotype A (characteristic of M. edulis) was recovered from all populations with M. edulis in the Atlantic and from the allopatric M. galloprovincialis population in Greece (GR811A) but not from the Pacific La Jolla (M. galloprovincialis) population. Haplotype B was found in all populations that would be expected to have M. galloprovincialis haplotypes (Greece, La Jolla, and France), and 2 B haplotypes were also recovered from western Sweden (Tjärnö; TJ917A and TJ928A). Haplotype C was the least common and only found in Northern Atlantic populations that have M. edulis. D haplotype sequences were recovered from all populations that include M. galloprovincialis (Greece, France, and La Jolla). Restriction digests recovered the D allele in 18 out of 57 M. galloprovincialis individuals worldwide, but D was always heterozygous with an A, B, or C allele. This observed deficit of DD homozygotes was not a significant deviation from HardyWeinberg expectations across all M. galloprovincialis (
2 = 2.0, P = 0.16); however, a post hoc estimate of power for this test (assuming the observed proportions are the true population parameters) equaled only 29% (estimated with G*Power; Buchner et al. 1997
). The single major M. trossulus haplotype was found in all populations where M. trossulus was sampled (figs. 1 and 3) based both on sequencing and restriction digests, including sites in the Eastern Pacific (Alaska, Washington State), Northwest Atlantic (Nova Scotia and Newfoundland), and the Baltic Sea (Askö). All major haplotypes were recovered both from sympatric and allopatric populations. Among individuals sequenced for M7 lysin, only TJ914 and the 4 Askö individuals had hybrid (combined M7 lysin and Glu 5') genotypes.
Population structure of M7 lysin was not significant within either M. edulis or M. trossulus (table 4). Excluding the 2 TJ914 haplotypes did not substantially alter FST estimates (FST = 0.071 for Woods HoleTjärnö and FST = 0.099 for TjärnöMahone Bay). Within M. galloprovincialis, FST for the sympatric Samos, GreeceRoscoff, France, and La Jolla, CARoscoff population pairs was significant based on pairwise differences among fifth exon sequences (table 4). Among pairwise estimates based on BstEII restriction digests, only the Berkeley, CALa Jolla, CA comparison was significantly greater than zero (FST = 0.169, P = 0.03).
|
All sequences described here have been deposited in GenBank, with accession numbers DQ151462DQ151465 and DQ837305DQ837370.
| Discussion |
|---|
|
|
|---|
The acrosomal sperm protein gene, M7 lysin, has been subject to positive selection within the M. edulis species complex. Evidence for selection was apparent between all 3 species of mussels considered here but has acted particularly strongly on the D group of haplotypes. These haplotypes were only detected within populations of M. galloprovincialis. Within species, although there was considerable polymorphism, there were no obvious patterns of geographic differentiation among M7 lysin haplotypes. Major groups of haplotypes were found within all surveyed populations of each species and not restricted to areas of sympatry. In most cases, genetic differentiation between sympatric and allopatric populations was similar to differentiation between allopatric populations. Thus, there was no strong evidence for a pattern of RCD for M7 lysin, and therefore, reinforcement is an unlikely agent of selection on this gene.
| Patterns of Selection and Within-Species Polymorphism |
|---|
|
|
|---|
Positive selection has shaped the evolution of M7 lysin in 2 distinct ways. First, there has been an increase in the rate of nonsynonymous substitutions in the D lineage along the length of the gene (table 2). Second, the 3' end of the gene (encompassed by the fifth exon) has experienced a greater proportion of nonsynonymous to synonymous substitutions relative to the rest of the gene (fig. 4) and a significant reduction in polymorphism among group B haplotypes.
Maximum likelihood models of branch evolution all provided a significantly better fit to the data (P
0.02) when rates for the branch of the D lineage was allowed to differ from other branches and, in all cases, assigned
D
1.9 (table 2). Given that these maximum likelihood methods assume long branch lengths and have little power to detect positive selection on short branches (Anisimova et al. 2001
), it is surprising to have found such a strong result on an intraspecific branch. McDonaldKreitman tests, usually more sensitive to selection within species and among closely related species, showed a general pattern of positive selection among species for M7 lysin (as in Riginos and McDonald 2003
) and also found a high rate of nonsynonymous substitutions between D and B haplotypes within M. galloprovincialis (table 3).
In addition, it appears that selection may have acted more strongly on the 3' end of the gene across all M7 lysin haplotypes. In comparisons between taxa, dN/dS > 1 for all sliding windows between basepairs 363 and 510, a region encompassed by the fifth exon (fig. 3). Although none of these contrasts were statistically significant with a stringent Fisher's exact test (due to low power with few total substitutions), Z-tests did indicate that dN is significantly greater than dS in this region and, in general, for the fifth exon. Such Z-tests were similarly used to identify the sea urchin bindin hotspot (Metz and Palumbi 1996
; Biermann 1998
) and, thus, can pinpoint regions of biological interest even when more stringent exact tests are not significant. McDonaldKreitman tests on the fifth exon also showed an excess of nonsynonymous fixed differences (P
0.02) between each group (A, B, C, and D) of M. edulis and M. galloprovincialis haplotypes and M. trossulus. Remarkably, there were no synonymous fixed differences between species in this region, only nonsynonymous fixed differences and synonymous polymorphisms. Thus, significant McDonaldKreitman tests corroborate our contention that the 3' end of the gene has specifically been the target of selection.
Positive selection in the 3' end of the gene was particularly notable within group B haplotypes, which are characteristic of M. galloprovincialis. The previous study of M7 lysin variation had also identified this gene region as possibly being under increased selection pressure, based largely on the observation that polymorphism within M. galloprovincialis decreased in sliding window tests of polymorphism to divergence (Riginos and McDonald 2003
). The M. galloprovincialis haplotypes used in that test correspond to the B group of haplotypes (fig. 3). Here, we confirmed this reduction in polymorphism (in the 3' region) among B haplotypes when contrasted to divergence with group D (P < 0.01). Thus, the serine to alanine substitution at codon 144 remains a candidate for selective fixation, although it does not define M. galloprovincialis as a species group as previously described.
In the M7 lysin genealogy, the D clade of haplotypes was basal to clades A and B that include both M. galloprovincialis and M. edulis, yet no D haplotypes were recovered from any M. edulis individuals. Thus, D haplotypes appear to have been lost from M. edulis. Within M. galloprovincialis, A, B, and D haplotypes have been retained and there is evidence for positive selection on both B and D haplotype groups. The basal position of the D haplotypes, combined with evidence for positive selection acting on both B and D clades and lack of DD homozygous individuals (determined from BstEII restriction digests) points to some form of balancing selection maintaining polymorphisms within M. galloprovincialis.
| Geographic Patterns and Potential Sources of Selection |
|---|
|
|
|---|
Although positive selection has shaped the evolution of M7 lysin, the source of selection is unclear. If reinforcement were the main causal agent, we would expect to find RCD where species are sympatric. In the sea urchin E. oblonga, for example, a unique clade of bindin haplotypes was only found in populations that are sympatric with E. sp. C (Geyer and Palumbi 2003
In general, population structure was not enhanced in comparisons involving sympatric populations and most FST estimates were not significant (table 4). The only exceptions were within M. galloprovincialis (table 4). In the case of the Samos, GRRoscoff, FR and La Jolla, CARoscoff, FR comparisons (where Roscoff was considered a sympatric population), FST estimates were significant from sequenced haplotypes but not from allelic samples (determined by restriction digestion with BstEII) (table 4). In Roscoff, only one D haplotype (from 10 sequenced total) was recovered (as compared with 4 out of 14 from Samos and 2 out of 9 from La Jolla), whereas the frequency of D alleles in Roscoff as determined by restriction digests was 17% (from 36 allelic copies). Because the frequency of divergent D haplotypes/alleles strongly affected FST estimation, the smaller sampling sizes from sequenced samples could have contributed to sampling error that elevated FST values for pairwise comparisons involving Roscoff. FST estimates from allelic frequencies are more likely to accurately reflect true population parameters, in this case a lack of population structure. The significant FST value obtained for the La JollaBerkeley comparison, however, was less likely to have been affected by small sample sizes (16 alleles from La Jolla and 30 alleles from Berkeley), but the frequency of D alleles is less in the sympatric Berkeley population (7%) as compared with allopatric La Jolla (31%), contrary to the predicted RCD pattern. Although our data do not indicate a clear pattern of RCD for M7 lysin, given that the sampling design employed here is not sensitive to subtle differences in allele frequencies among populations, we cannot absolutely refute the possibility that there is a pattern of RCD in M. galloprovincialis along the California coastline (as suggested by Springer 2003
). Certainly, the geographic distribution of D alleles in M. galloprovincialis is noteworthy and should be investigated further.
Aside from the 3 aforementioned pairwise comparisons within M. galloprovincialis (none of which can be interpreted as unequivocal support for a RCD pattern), there are no other instances of significant population structure between sympatric and allopatric populations within any of the 3 species considered here (table 4). Thus, sympatry does not generally increase population structure and major haplotypes are not geographically restricted. We find no strong signal of RCD in M7 lysin such as that found for bindin in E. oblonga, despite the high frequencies of hybridization where mussel species are sympatric.
Rather, the presence of strongly selected D haplotypes in all M. galloprovincialis populations suggests that some intraspecific process (not reinforcement between species) is more likely to have been the causative agent of selection. In particular, in M. galloprovincialis the segregation of divergent B and D haplotypes points to balancing selection having maintained intraspecific variation. Similarly in bindin, within-species polymorphisms have been selectively maintained among Echinometra species (Metz and Palumbi 1996
; Palumbi 1999
; Geyer and Palumbi 2003
). The retention of allelic diversity within both sea urchin and M. galloprovincialis male gamete compatibility genes is consistent with sexual conflicts between male and female gametes causing selection to maintain intraspecific amino acid variation. If antagonistic interactions are important, the gamete recognition genotypes of spawning individuals should affect fertilization success, as in the sea urchins S. franciscanus, where the exact outcome of spermegg interactions depends both on male and female genotypes and spawning density (Levitan and Ferrell 2006
). Although little is known about natural spawning events for most marine invertebrates, mussels can be found in dense aggregations, and eggs have several mechanisms for preventing polyspermy (Togo et al. 1995
). The ecology of mussels, thus, suggests that polyspermy could be a problem for eggs and female resistance to polyspermy could exert strong selection on male gamete recognition genes. Such complex interactions among sperm and eggs could cause frequency-dependent selection (as with bindin in S. franciscanus) and could maintain B and D (and rare A) M7 lysin haplotypes within M. galloprovincialis.
Experimental assays of gamete compatibility in mussels also are consistent with fertilization depending on interactions between male and female genotypes. M. edulis females are variable in their compatibility with heterospecific M. trossulus males, indicative of polymorphism in permissiveness at an egg receptor (Rawson et al. 2003
). It is unknown whether M. edulis females also differ in their compatibility among conspecific males. Further examination of gamete compatibility patterns in Mytilus would help reveal specific sources of selection, as would the isolation and description of egg receptor genes and additional sperm genes (such as M6 and M3 lysins) involved in fertilization.
Although DNA sequences of M7 lysin show a strong signal of Darwinian selection, as demonstrated here, it is clear that a full understanding of why M7 lysin has been subject to positive selection will require molecular examinations of the receptor female gene(s) combined with experimental manipulations of gametes. Overall, the molecular evolution of Mytilus M7 lysin is strikingly similar to that of sea urchin bindin. Although both these genes mediate gamete compatibility, their specific functions are quite different. Nonetheless, similar forces due to the interactions between male and female gametes may underlie the selective maintenance of nonsynonymous polymorphisms within species.
| Acknowledgements |
|---|
|
|
|---|
Many thanks for comments and assistance from the Cunningham and Willis labs, J. H. McDonald, M. A. F. Noor, G. A. Wray, R. Haygood, D. R. Levitan, K. S. Zigler, A. L. Case, M. A. McCartney, P. D. Rawson, R. S. Burton, B. Ball, C. Riginos, M. Uyenoyama, and 2 anonymous reviewers. This work was supported by the Duke University program in Molecular Evolution and Comparative Genomics, the Professional Association of Diving Instructors Foundation, and the National Science Foundation.
| Footnotes |
|---|
Marcy Uyenoyama, Associate Editor
| References |
|---|
|
|
|---|
Anisimova M, Bielawski JP, Yang Z. 2001. Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol Biol Evol 18:158592.
Beaumont AR, Turner G, Wood AR, Skibinski DOF. 2005. Laboratory hybridizations between Mytilus species and performance of pure species and hybrid veliger larvae at lowered salinity. J Molluscan Stud 71:3036.
Biermann CH. 1998. The molecular evolution of sperm bindin in six species of sea urchin (Echinoida: Strongylocentrotidae). Mol Biol Evol 15:176171.[Abstract]
Bierne N, David P, Boudry P, Bonhomme F. 2002. Assortative fertilization and selection at larval stage in the mussels Mytlius edulis and M. galloprovincialis. Evolution 56:2928.[CrossRef][ISI][Medline]
Buchner A, Erdfelder E, Faul F. 1997. How to use G*Power [www document]. Available from: http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/how_to_use_gpower.html. Accessed on 2006.
Civetta A, Singh RS. 1998. Sex-related genes, directional sexual selection, and speciation. Mol Biol Evol 15:9019.[Abstract]
Comesaña AS, Toro JE, Innes DJ, Thompson RJ. 1999. A molecular approach to the ecology of a mussel (Mytilis edulis - M. trossulus) hybrid zone on the east coast of Newfoundland, Canada. Mar Biol 133:21321.
Coustau C, Renaus F, Delay B. 1991. Genetic characterization of the hybridization between Mytilus edulis and M. galloprovincialis on the Atlantic coast of France. Mar Biol 111:8793.[CrossRef]
Coyne JA, Orr HA. 2004. Speciation. Sunderland, MA: Sinauer.



model of evolution for the entire coding region determined from mRNA. As recombination has occurred between regions of this gene (see text), these trees do not estimate any true genealogy but are useful for understanding relationships among haplotypes. The populations from which haplotypes were sampled are indicated by the first 2 letters of each taxon name and correspond to abbreviations in 