MBE Advance Access originally published online on August 3, 2007
Molecular Biology and Evolution 2007 24(10):2286-2297; doi:10.1093/molbev/msm159
Research Articles
Positive Selection and Gene Conversion in SPP120, a Fertilization-Related Gene, during the East African Cichlid Fish Radiation
Lehrstuhl für Zoologie und Evolutionsbiologie, Department of Biology, University of Konstanz, Konstanz, Germany
E-mail: axel.meyer{at}uni-konstanz.de.
| Abstract |
|---|
The ability to infer historical natural selection from sequence data aids in finding genes that might be important in adaptation and the formation of new species. As the fastest evolving and largest known vertebrate radiation, the cichlid fish of the African Great Lakes exhibit a wide range of recent morphological diversification. We used DNA databases, mostly of expressed sequence tags, to find candidate orthologous coding sequences from 2 tribes of cichlids and, using an automated procedure, scanned these sequence pairs for high dN/dS, the signal of positive selection and protein adaptation. The results included vertebrate genes commonly found to be under selection (e.g., major histocompatibility complex [MHC] loci) as well as genes known to be important specifically in the cichlid radiation (e.g., long-wave-sensitive opsins). Further investigation focused on a gene encoding a fertilization-related protein, SPP120, which was previously known only from cichlids. Using maximum likelihood analysis on novel SPP120 cDNA sequences from a range of African cichlids, we demonstrate the influence of positive selection in a specific subregion of the protein. We also show that SPP120 is a tandemly arranged, multicopy gene evolving with occasional interlocus gene conversion. A search of the Medaka genome database also revealed a tandem arrangement of multiple SPP120 copies and evolutionary rate differences between Medaka gene subregions mirroring those found for cichlids. Combined, these results suggest that SPP120 has been under repeated diversifying selection for over 100 Myr.
Key Words: reproductive genes, natural selection, gene conversion, cichlid, SPP120
| Introduction |
|---|
It is a major goal of evolutionary genetics to match genetic differences between species or individuals with those phenotypic differences that, when seen by natural selection, drive adaptive evolution (Schluter 2001
In model organisms, it is possible to investigate candidate genes for certain species-specific phenotypes. However, due to constraints inherent to nonmodel organisms, this type of approach is often more difficult to apply. Candidate genes that are potentially causally related to the diversification of cichlids include those involved in variation of jaw morphology (Albertson et al. 2003) and in visual sensitivity to different wavelengths of light (Spady et al. 2005). Linkage analysis has also been successfully employed to find genomic loci influencing species differences (Albertson et al. 2003; Streelman et al. 2003), but the relatively long lifecycle and small brood size of most of the African cichlids (Fryer and Iles 1972), along with the low resolution of current genetic maps, make this approach tedious (Lee et al. 2005).
Alternatively, the expanding and diversifying sequence databases present an opportunity to scan large numbers of genes and compare their rates of evolution (Swanson et al. 2001). In an effort to identify novel genes of potential importance in the adaptive radiation of cichlids, we compared published coding sequences of Haplochromine and Tilapiine cichlids, taking advantage of 3 recently created expressed sequence tag (EST) libraries from Haplochromine species (Renn et al. 2004; Watanabe et al. 2004). After identifying probable orthologues, we automated the pairwise alignment of these sequences and calculated the ratio of the rates of substitution at nonsynonymous and synonymous sites (dN/dS) to identify genes evolving under positive Darwinian selection. Here, we present the results of this approach and the investigation of a reproductive protein, SPP120, that showed a very high dN/dS ratio but that was also found to be a multicopy gene evolving under the influence of gene conversion.
| Materials and Methods |
|---|
Scan for High dN/dS Candidate Genes
DNA sequences originating from 2 tribes of African cichlids were downloaded from GenBank. Not all available sequences from the Haplochromini, the extremely species-rich lineage of cichlid that makes up most of the adaptive radiations of cichlids in East African Great Lakes (Salzburger et al. 2005
After a screen for cloning vector sequences, approximately 400 of the Ptyochromis (Haplochromis) sp. redtail sheller sequences (Watanabe et al. 2004
10–15. The highest scoring significant Haplochromini sequence matching each query was then added to a list of sequences (992 in total), which were Blast searched back against the Tilapiini database. This produced a list of pairs of sequences that could be filtered to identify the reciprocal best matches from the Blast searches. Nonreciprocal or duplicate matches were removed. Additionally, all the Tilapiini sequences from the list of matches were Blast searched back against the Tilapiini database to find all matches that were higher scoring than the match found in the Haplochromini database (controlling for sequence length). These matches were sorted into families of highly similar sequences, and only the longest matching sequence was kept in the list to control for redundant sequencing, misannotation, and gene families where one sequence from one database was a top hit for many sequences from the other database. All sequences matching the full O. mossambicus mitochondrial genome (AY597335) were also removed from the data set (1,201 sequences).
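The reciprocal best-match filter described above can be illustrated with a minimal Python sketch. It is not the script used in this study (which was written in Perl), and it omits the paralogue-family and mitochondrial filters. The tuple layout (query, subject, score), the function names, and all identifiers and scores in the toy data are assumptions made for illustration only.

```python
def best_hits(hits):
    """Map each query to its single highest-scoring subject.

    `hits` is an iterable of (query, subject, score) tuples, a hypothetical
    stand-in for parsed Blast output.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_hits, reverse_hits):
    """Keep (a, b) only if b is a's best hit and a is b's best hit."""
    forward = best_hits(forward_hits)
    reverse = best_hits(reverse_hits)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy data with invented identifiers and scores.
tilapiini_vs_haplochromini = [
    ("til_1", "hap_A", 410.0),
    ("til_1", "hap_B", 120.0),
    ("til_2", "hap_B", 300.0),
    ("til_3", "hap_A", 250.0),
]
haplochromini_vs_tilapiini = [
    ("hap_A", "til_1", 405.0),
    ("hap_A", "til_3", 240.0),
    ("hap_B", "til_2", 298.0),
]

pairs = reciprocal_best_hits(tilapiini_vs_haplochromini,
                             haplochromini_vs_tilapiini)
print(pairs)
```

In the toy data, til_3 has hap_A as its best match, but hap_A matches til_1 more strongly, so that nonreciprocal pair is dropped.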
The following sequence manipulations were carried out using a Perl script with the aid of the BioPerl modules (Stajich et al. 2002). The sequences were aligned according to a forced amino acid translation of the longest open reading frames. If the coding sequence was annotated, it was used; otherwise, the sequences were aligned in 6 coding arrangements and the highest scoring protein alignment was used to align the nucleotide sequences (using needle and tranalign, EMBOSS; Rice et al. 2000). Under automation, pairwise dN/dS was calculated using the YN00 method of Yang and Nielsen (2000) implemented in PAML (Yang 1997). Because of the low number of orthologous pairs found, those with dS less than 0.2 and more than 200 bp of aligned coding sequence were realigned manually and dN/dS was recalculated in MEGA according to the "Modified Nei–Gojobori" method (Nei and Gojobori 1986).
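The step in which a protein alignment guides the nucleotide alignment can be sketched as follows. This is a simplified Python illustration of the general idea (each aligned residue is replaced by its codon, each protein gap by a three-base gap); it is not the EMBOSS tranalign implementation, and the function name and toy sequences are hypothetical.

```python
def thread_codons(aligned_protein, cds):
    """Project a gapped protein sequence onto its ungapped coding sequence.

    Each residue consumes one codon from `cds`; each gap character '-'
    in the protein becomes a three-base gap '---'.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is shorter than the protein implies")
    codons = []
    position = 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")
        else:
            codons.append(cds[position:position + 3])
            position += 3
    return "".join(codons)


# Invented toy pair: sequence A lacks one residue relative to sequence B.
aligned_a = "MK-LV"
aligned_b = "MKQLV"
cds_a = "ATGAAACTGGTT"        # M K L V
cds_b = "ATGAAGCAGCTGGTG"     # M K Q L V

nt_a = thread_codons(aligned_a, cds_a)
nt_b = thread_codons(aligned_b, cds_b)
print(nt_a)
print(nt_b)
```

Because gaps are always inserted in multiples of three, the resulting nucleotide alignment keeps codons in register, which is what codon-based dN/dS estimators require.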
Target Gene SPP120
The pairwise dN/dS is a weak estimate of positive selection because most nucleotide sites in most functional genes are under strong purifying selection (Makalowski and Boguski 1998). When dN/dS is averaged over all sites, any signal derived from the minority of adaptive substitutions is diluted by the pervasive signal of constraint (Li 1997). Greater resolution can be achieved by analyzing the accumulation of changes to a protein across a phylogenetic tree. The codeml program, part of the PAML package (Yang 1997), uses a maximum likelihood approach to contrast the ability of alternative, nested models of evolution to explain the available data. This method is able to discern the signal of adaptation from only some sites against a background of constraint and is a more powerful test for the presence of positive selection (Yang and Nielsen 2002).
We chose a gene, SPP120, for further study. SPP120 is expressed in the gonad, was previously only known from Tilapia, and was experimentally shown to encode a protein with sperm-binding affinity (Mochida et al. 1999). As such, it is a good candidate for positive selection (Swanson and Vacquier 2002). The maximum likelihood method requires, however, a divergent sampling of sequences from species within the framework of a strongly resolved phylogeny. Therefore, we amplified SPP120 from different African cichlid lineages using the phylogeny of Salzburger et al. (2005). The relationships of cichlids within Lakes Malawi and Victoria are weakly resolved, but, because the cichlid radiations of Lakes Victoria and Malawi are monophyletic (Meyer et al. 1990; Moran and Kornfield 1993; Sultmann et al. 1995; Verheyen et al. 2003; Salzburger et al. 2005), 2 species from each lake could be sampled as nearest neighbors. Novel SPP120 sequences were obtained from the following East African species: A. burtoni (a widespread nonendemic species), Pundamilia nyererei (Lake Victoria), Haplochromis sp. 44 (Lake Victoria), Melanochromis auratus (Lake Malawi), Pseudotropheus sp. Bicolor (Lake Malawi), and Pseudocrenilabrus multicolor (a widespread nonendemic species).
A single adult male of each species was anaesthetized with approximately 0.04% Tricaine (Sigma, Frankfurt, Germany) and then placed into ice-cold water for 5–10 min. The spinal column was severed before dissection. The paired testes were located running anterior–posterior from immediately beneath the swim bladder to near the proctodeum. RNA was extracted from one homogenized testis using Trizol reagent (Invitrogen, Karlsruhe, Germany) and stored at –80 °C. cDNA was reverse transcribed from the testis RNA using Superscript II Reverse Transcriptase (Invitrogen). SPP120 was polymerase chain reaction (PCR) amplified from the cDNA using primers designed on the single, full-length, published sequence from Tilapia (AB073751; Mochida et al. 2002). PCR products over 2 kb in length were cloned using the TOPO-TA Kit (Invitrogen), and several clones were checked by sequencing with M13 forward and reverse primers. SPP120-positive clones were partially sequenced using primers designed on the O. niloticus sequence and then completed by incremental PCR steps with additional primers. Sequences from all species were compiled into one GAP4 (Staden Package; Staden et al. 2000) database to simultaneously check for sequencing gaps or errors and to make multispecies alignments. All clones were fully sequenced in both directions. Table 2 lists the primers used to PCR amplify SPP120 from testis cDNA or from genomic DNA.
Genomic Sequences
SPP120 was amplified from O. niloticus genomic DNA using the Fermentas Long PCR Enzyme Mix with an extension time of 11 min for the first 10 cycles, then an additional 5 s for each of the remaining 25 cycles. The PCR products were cloned and sequenced. The raw sequences from the genomic clones were compiled into a GAP4 database along with the completed cDNA sequences to identify the exon/intron boundaries and so that new exonic conserved primers could be designed.
The lengths of the genes were variable and generally longer in haplochromine species than in O. niloticus. Therefore, a portion of the gene from exon 4 to exon 10 (primers 420F/1207R, table 2) was PCR amplified and cloned from all species. This region corresponds to amino acids 107–370 and contained 1,422–1,981 bp of intronic sequence. All novel sequences have been deposited in GenBank under accession numbers EF486251–EF486264 (cDNAs) and EF490572–EF490597 (genomic DNAs).
Screen of the Astatotilapia Bacterial Artificial Chromosome Library
To better estimate the number of copies of SPP120 in the cichlid genome, we conducted a hybridization screen of the recently developed bacterial artificial chromosome (BAC) library of A. burtoni following the method of Lang et al. (2006). As multicopy genes may include retrotransposed (spliced) copies of themselves, the screen was conducted using a mixture of genomic (intron containing) and cDNA (intronless)-derived probes spanning exons 7–12. Positive clones were screened by PCR targeting the same region from exon 4 to exon 10 (primers 420F/1207R, table 2).
Sequence Analyses
The full-length cDNA sequences were extracted, already aligned, from the GAP4 database and then analyzed using MEGA3 (Kumar et al. 2004). The coding frame and protein sequence from the published SPP120 sequence were used to check for frameshift and nonsense mutations in the sequences. Phylogenetic trees of the coding regions from the full-length cDNA sequences were constructed using the Neighbor-Joining and Minimum-Evolution methods implemented in MEGA3 using the Tamura–Nei distance (Tamura and Nei 1993).
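The check for frameshift and nonsense mutations can be illustrated with a short Python sketch. It assumes an ungapped coding sequence that starts in frame at the first base; the function names and the toy sequences are hypothetical and are not SPP120 data.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_premature_stops(cds):
    """Return 1-based nucleotide positions of in-frame stop codons that
    occur before the final codon of an ungapped coding sequence."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    return [
        i * 3 + 1
        for i in range(n_codons - 1)          # skip the terminal codon
        if cds[i * 3:i * 3 + 3] in STOP_CODONS
    ]


def frame_is_intact(cds):
    """A coding sequence whose length is not a multiple of 3 has gained or
    lost bases relative to the reading frame."""
    return len(cds) % 3 == 0


# Invented toy sequences.
intact = "ATGGCTTGGAAATAA"      # ATG GCT TGG AAA TAA
nonsense = "ATGGCTTGAAAATAA"    # G-to-A change turns TGG into TGA
shifted = "ATGGCTGGAAATAA"      # 1-bp deletion

print(find_premature_stops(intact))
print(find_premature_stops(nonsense))
print(frame_is_intact(shifted))
```

The length test only flags net frameshifts; in practice, comparison against a reference protein, as described above, is needed to locate the indel itself.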
Recombination
The presence of recombination among the cDNA sequences was suggested by an inspection of the variable sites along the sequence alignment. The program "permute" by McVean et al. (2002) was used to test for a significant correlation of linkage disequilibrium with physical distance along the alignment, a clear signal of recombination.
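The logic of this test can be sketched in Python: compute r² for every pair of biallelic sites, correlate it with the distance between the sites, and compare the observed correlation with correlations obtained after shuffling site positions. This is a simplified illustration of the idea only, not the "permute" program; all function names, the one-sided test direction, and the toy data are assumptions.

```python
import itertools
import random


def r_squared(site_a, site_b):
    """LD statistic r^2 for two biallelic sites (equal-length allele strings)."""
    n = len(site_a)
    a0, b0 = site_a[0], site_b[0]
    p_a = sum(x == a0 for x in site_a) / n
    p_b = sum(y == b0 for y in site_b) / n
    p_ab = sum(x == a0 and y == b0 for x, y in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def ld_distance_correlation(positions, sites):
    """Correlation between pairwise r^2 and pairwise physical distance."""
    pairs = list(itertools.combinations(range(len(positions)), 2))
    distances = [abs(positions[i] - positions[j]) for i, j in pairs]
    r2_values = [r_squared(sites[i], sites[j]) for i, j in pairs]
    return pearson(distances, r2_values)


def permutation_p_value(positions, sites, n_perm=999, seed=1):
    """One-sided P value: how often shuffled positions give a correlation
    at least as negative as the observed one."""
    rng = random.Random(seed)
    observed = ld_distance_correlation(positions, sites)
    shuffled = list(positions)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if ld_distance_correlation(shuffled, sites) <= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)


# Invented toy alignment: 4 variable sites scored in 6 sequences.
positions = [10, 50, 900, 1000]
sites = ["AAATTT", "AAATTT", "ATATAT", "ATATAT"]

observed, p_value = permutation_p_value(positions, sites)
print(observed, p_value)
```

A negative correlation means that nearby sites are in stronger linkage disequilibrium than distant ones, the pattern expected when recombination or gene conversion has broken up associations over longer distances.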
Selection
The relative abilities of different models to explain the evolution of the cDNA sequences were contrasted using the program codeml (PAML; Yang 1997). First, models M0 and M3 were compared to test for heterogeneity between codon sites in the value of the dN/dS ratio, ω. Second, a mixed model allowing for some sites under positive selection (ω > 1.0) and other sites under varying degrees of purifying selection (ω < 1.0) or neutrality (ω = 1.0), Model M8, was contrasted with a similar model with the same flexibility in ω below 1.0 but with the constraint that the ω at fast-evolving sites could not be greater than 1.0 (Model M8a; see Wong et al. 2004). This is effectively a test between neutrality and positive selection at fast-evolving sites against a background of constrained sites in both models (Wong et al. 2004). The Neighbor-Joining tree constructed from these sequences was used as input to codeml (see Results).
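Nested models of this kind are compared with a likelihood ratio test: twice the difference in log likelihood is referred to a chi-square distribution. The Python sketch below shows the arithmetic using only the standard library. The log-likelihood values are invented for illustration and are not the values obtained in this study, and the use of 4 degrees of freedom for M0 versus M3 is an assumption that holds when M3 is fitted with three site classes.

```python
import math


def lrt_statistic(lnl_null, lnl_alternative):
    """Likelihood ratio statistic, 2 * (lnL_alternative - lnL_null)."""
    return 2.0 * (lnl_alternative - lnl_null)


def chi2_upper_tail(x, df):
    """Upper-tail chi-square probability, using closed forms that exist
    for 1, 2, and 4 degrees of freedom."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 4:
        return math.exp(-x / 2.0) * (1.0 + x / 2.0)
    raise ValueError("only df = 1, 2, or 4 are handled in this sketch")


# Invented log likelihoods for a null (M8a-like) and alternative (M8-like) fit.
lnl_null = -5230.0
lnl_alternative = -5215.0

statistic = lrt_statistic(lnl_null, lnl_alternative)
p_conservative = chi2_upper_tail(statistic, df=2)
print(statistic, p_conservative)
```

Using a larger number of degrees of freedom than the models strictly differ by raises the critical value, which is why a test with 2 degrees of freedom is the more conservative choice for the M8 versus M8a comparison.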
Medaka Database Scan
The October 2006 release (#41) of the Medaka (Oryzias latipes) genome (http://www.ensembl.org/Oryzias_latipes; Kasahara et al. 2007) was Blast searched using the protein sequence of the O. niloticus (Tilapia) SPP120 sequence. Significant hits to the protein sequence were assigned to particular exons using the exon/intron structure that we determined for O. niloticus (see above).
| Results |
|---|
The database Blast searching and filtering produced 353 pairs of putative Haplochromine/Tilapiine orthologues of varying divergence and alignment quality. Of these, 82 sequence pairs produced uninterrupted coding sequence alignments of sufficient length to calculate dN/dS. The median dS value was 6.3%. Among the top 10 pairs (ranked by dN/dS) were 3 genes that were previously shown to be under selection in their coding regions during the radiation of cichlids indicating that this approach is able to identify positively selected genes (table 3, and supplementary table 1, Supplementary Material online). Two were major histocompatibility complex (MHC) Class II loci (Figueroa et al. 2000
The low number of putative orthologues meant that all 82 could be checked manually before further investigation of strong candidates of positive selection. The results of these manual alignments are shown as a plot of dN against dS in figure 1. Though the initial scan revealed several good candidates for further study, SPP120 was chosen because of the length of the alignment, the provenance of the tilapia cDNA sequence, and the functional studies linking this gene to cichlid reproduction (Mochida et al. 1999
Full-Length cDNA Sequences and Gene Trees
PCR amplification of SPP120 from cichlid testis cDNA revealed several unexpected properties of this gene. First, multiple, divergent transcribed copies of the gene were amplified from almost all cDNA samples. Three sequences from M. auratus and 2 from P. bicolor were full-length transcribed sequences that differed by more than 1% at the nucleotide level (fig. 2). We failed to amplify SPP120 from testis cDNA of P. multicolor but were able to amplify several distinct copies of the gene from its genomic DNA. Two divergent copies of this gene in this species included 95% of the coding sequence, conserved without nonsense mutations or frameshifts. Among the Lake Victoria haplochromine cichlids (including A. burtoni), most cDNA sequences differed by no more than 1% either between or within species, and it is difficult to distinguish distinct loci from alleles based on these data alone. The numbers of distinct sequences (those with divergences >0.3% within species) were 4, 2, and 2 for A. burtoni, P. nyererei, and Haplochromis sp. 44, respectively.
Inspection of the sequences revealed 2 independent mutations that likely represent a transition to a pseudogene of their respective sequences (marked in fig. 2). cDNA copy number 2 from Pseudotropheus sp. Bicolor contains a 1-bp frameshifting deletion at position 468. The cDNA-derived sequences from A. burtoni, P. nyererei, and Haplochromis sp. 44 were generally very similar. cDNA copy 4 from A. burtoni has a G–A transition, creating a premature TGA stop codon at position 2131 (losing 11% of the coding sequence, including half the "Zona Pellucida" [ZP] domain). The same substitution is also seen in individual clones from both of the Lake Victoria species (P. nyererei cDNA copy 2 and Haplochromis sp. 44 copy 2), but, interestingly, a single cDNA cloned from P. nyererei (copy 1 in fig. 2) featured a second, potentially compensatory substitution in the same codon that alters the sequence to CGA (Arginine). In addition, the sequences of M. auratus appear to be partly recombinant. This was confirmed by a significant negative correlation between linkage disequilibrium (r²) and distance along the sequence (–0.21, P < 0.002). It appears that copy 3 is chimaeric, sharing greatest similarity to copy 2 within the first (most 5') 1,800 bp but being closer to copy 1 in the final 600 bp. All of the "pseudogenes" listed above, the recombinant sequence 3, and any sequences sharing more than 99.5% identity were excluded from the maximum likelihood analysis. As the relationships among the Lake Victoria Haplochromine sequences were ambiguous, only 2 sequences were used, A. burtoni cDNA copy 3 and the more divergent Haplochromis sp. 44 cDNA copy 1.
The gene tree topology, estimated from cDNA and P. multicolor genomic (see below) SPP120 sequences (fig. 2A), was used as input for the maximum likelihood analysis because it better represents the evolutionary relationships of the gene copies than does the species phylogeny (fig. 2B; Salzburger et al. 2005). Maximum likelihood analysis of the remaining 8 sequences revealed strong heterogeneity in ω among the codons and favored a model of evolution including an excess of amino acid changing substitutions at some sites, consistent with the influence and signature of adaptive evolution.
The maximum likelihood results are summarized in table 4. First, the heterogeneity test (M3 vs. M0) revealed that ω is not uniform along the coding sequence of SPP120 (P < 10^-5). Second, when a mixed model of evolution including sites with ω > 1 (Model M8) was tested against the same model but with the upper class of ω constrained to 1.0 (Model M8a, reflecting neutral evolution at those sites), the former had a significantly greater likelihood (P < 10^-5) using a more conservative test with 2 degrees of freedom (Wong et al. 2004). In both tests, the more parameter-rich model was favored, and both gave very similar estimates of ω for the fast-evolving sites. Under Model M8, the value of ω predicted for the sites with adaptive substitutions is 5.92. The Bayes Empirical Bayes posterior probabilities (BEBpp; Yang et al. 2005) of sites with ω > 1.0 list 17 codon positions with BEBpp > 0.95, of which 13 are within the first third of the protein, the region with no detectable homology to other known proteins. The positions of codons predicted to be under the influence of strong positive selection are illustrated in figure 3.
Genomic Sequence from O. niloticus
A genomic PCR product containing the entire SPP120-coding region was amplified from Tilapia DNA, and overlapping sequences spanning most of the gene were amplified from P. multicolor. In Tilapia, primers SPP120_01F and SPP120_2520R were used, generating a 9.3-kb product, which, aligned to the published cDNA sequence, revealed the presence of 18 introns (fig. 4). Multiple copies were only as distinct as one might expect from alleles (0.24% between 5,000 bp sequenced in 2 different clones, Tamura–Nei distance). The coding sequence of the fully sequenced clone differed from the published sequence (Mochida et al. 2002
Amplification of a portion of SPP120 from genomic DNA of the same male individuals produced a sequence tree (fig. 5) with identical or highly similar genomic counterparts to most of the testis-derived sequences (except M. auratus cDNA copy 2). In addition, several novel and distinct (>1% divergence) copies were amplified from several species. Taken together, the cDNA-derived and genomic DNA-derived sequences (all from one male per species) bring the copy number for the exon 4 to exon 10 region to 5 for M. auratus, 4 for P. multicolor, and 3 each for P. bicolor, P. nyererei, and Haplochromis sp. 44.
BAC Library Results
The screen of the A. burtoni BAC library produced 8 clones positive for the mixed spliced/nonspliced SPP120 probe. Initial analysis by PCR revealed length and copy number differences between SPP120 forms on different clones. Seemingly full-length versions of SPP120 were found on 3 clones (149-C9, 164-G7, and 182-L14). Long-range PCR from these BAC clones gave single products of over 20 kb. PCR and sequencing of the subregion from exon 4 to exon 10 revealed that all 3 clones carried the same copy of SPP120 and that the coding region most closely matched the A. burtoni cDNA copy 4 (fig. 2). The other clones contained "spliced" forms of SPP120 lacking introns, curtailed forms containing only introns 8–10 with a long interspersed nuclear element (LINE) in the place of exons 5–7, or both of these forms. Two BAC clones (183-P24 and 165-O8) contained both the spliced and the truncated forms of SPP120. Sequencing of the BAC PCR products revealed that copies were divergent both between and within different clones. BAC clone 182-O1 contained a spliced form matching most closely cDNA copy 1 from Haplochromis sp. 44. Spliced forms from BAC clones 183-P24 and 165-O8 were similar to each other and to the alternative genomic copies from P. nyererei and Haplochromis sp. 44 (numbered genomic copy II in fig. 5). The nonspliced but curtailed forms of SPP120, however, did not share discriminatory synapomorphies with any of the previously sequenced Lake Victoria forms and were less similar to other A. burtoni sequences than the closest of those from the Lake Malawi species. These fragments of SPP120 may therefore be relics dating back to shortly before or during the diversification of the modern haplochromines.
Database Searches
When this project began, there were no obvious noncichlid orthologous sequences matching SPP120 in any genome database (Mochida et al. 2002). However, Blast searches using protein sequence revealed strong hits to SPP120 in the October 2006 release of the Medaka genome, expanding the taxonomic range of SPP120 to the ancestor of these lineages. In the Medaka genome, there are multiple strong matches (Blast P value < 10^-80 in most cases) to SPP120 exons, and these are all located within a 5-Mb region on chromosome 18. Other significant Blast hits (P value < 10^-05) are divergent paralogues matching the von Willebrand factor D (VWD) or ZP domains. There are multiple potential copies of the SPP120 gene in the chromosome 18 region, each with a similar level of divergence from the cichlid sequence. The relative positions and orientations of clusters of exon hits are shown in figure 6 and a Neighbor-Joining phylogeny of the more complete clusters in figure 7. It is notable that the clusters located in the same orientation along the chromosome ([1,2] or [5,7,8,10]) show greater similarity to each other than to clusters in the opposite orientation. The outgroup sequences shown in figure 7, from Gasterosteus aculeatus and Danio rerio, are not full-length copies of SPP120 and only correspond to the most 3' 1,100 nt of the coding region (mostly the ZP domain). More complete copies of the gene, if they exist, have yet to be found.
An alignment of the entire coding region of the cichlid SPP120 to 3 of the Medaka clusters (1, 2, and 8) contains numerous short, frame-preserving indels between the 4 sequences, the Kozak sequence (Kozak 1981
| Discussion |
|---|
We aimed to identify genes that might be important in the adaptive radiation of East African cichlids by scanning for genes with an atypical rate of amino acid evolution. The discovery of several genes that had been previously identified to have been under recent natural selection by this method provided assurance that this method had sufficient power to do so. The other genes with high dN/dS values were mostly reproductive or immune system genes in line with expectations for this statistic (Yang and Bielawski 2000
The molecular genetic evidence supporting widespread adaptive evolution of reproductive genes is matched only by that of genes involved in immune responses or their evasion. We chose to focus on a reproductive protein, SPP120, for further investigation because it was known to be functional and contained protein domains involved in binding to sperm and perhaps also to the ZP (or chorion) surrounding vertebrate oocytes (Mochida et al. 1999). The influence of positive Darwinian selection was confirmed by maximum likelihood analyses on novel cDNA sequences obtained from a variety of cichlids, but only after the influences of gene duplication, gene conversion, and pseudogenization were accounted for.
The excess of nonsynonymous substitutions in this gene during the evolution of the cichlids is better represented by the mixed model of evolution that includes some sites (15%) evolving under the influence of positive Darwinian selection. This signal remains one of the strongest yet observed in cichlids, even after a significant portion of the available data is discarded to remove any confounding influence of recombinant sequences.
From a structural perspective, the combination of the VWD domain and the ZP domain is a rare configuration, best known from alpha-tectorin, the major noncollagenous component of the tectorial membrane in mammalian ears (Legan et al. 1997). This gene has 4 VWD domains in tandem and a single C-terminal ZP domain. As alpha-tectorin seems to exist in teleosts (a predicted protein on Medaka Chr14 has 57% identity to human alpha-tectorin), SPP120 is not the orthologue of this gene but may be a paralogue specific to higher teleosts. It has recently been speculated that the ZP domain was a feature of the ancestral animal oocyte coating due to its presence in mammalian, bird, and teleost oocyte proteins as well as those of the invertebrate abalone (Mold et al. 2001; Smith et al. 2005). The closest teleost homologues of the mammalian proteins have been described in Zebrafish (Mold et al. 2001) and in a percomorph (Modig et al. 2006), and these loci map to Medaka chromosomes 6 and 24 (by Blast homology). The existence of the ZP domain in SPP120 is therefore intriguing because of its strong expression in testis. Has this domain been co-opted due to its affinity for sperm or has it just filled a need for an extracellular structural glycoprotein as in other proteins unrelated to reproduction?
The precise function of the SPP120 "genes" remains unknown. At least one copy has the potential to produce a 790-amino acid protein in each of the cichlid species studied and does so successfully in O. niloticus (Mochida et al. 2002). Within cichlids, the VWD and ZP domains have evolved under purifying selection, but the 200-amino acid N-terminal sequence has undergone adaptive diversifying evolution. A full-length coding sequence has also been conserved without stop codons or frameshift mutations for more than 100 Myr (Steinke et al. 2006) since the divergence of the cichlid and Medaka lineages (see below). Again, the divergence of SPP120, both between Medaka and cichlids and between different loci within Medaka, is accelerated in the same 200-amino acid region. This pattern hints at coevolution between SPP120 and other genes, perhaps as a conflict over the ability of sperm to fertilize eggs. The finding of Mochida et al. (2002) that what was thought to be a single copy gene was expressed in both ovaries and testes might stand against a sex-specific role for SPP120. However, the evidence from Mochida et al. (2002) and our own observations suggest that there could be quite a strong difference in expression between these 2 tissues.
Alternatively, the SPP120 genes may have altered their function or functions specifically during the cichlid radiation. Grier and Fishelson (1995) describe a distinction between mouth-brooding and substrate-spawning Tilapiine cichlids; the sperm of the former are packed, immotile, in a periodic acid-Schiff (PAS)-positive mucus (probably a glycoprotein) that is taken up by the female into her mouth where fertilization subsequently occurs. It might be speculated that 1 gene involved in this trait is SPP120. Though this particular trait is not seen outside Tilapiine cichlids (Wickler 1997), the fertilization and brooding strategies of African cichlids are diverse (Fryer and Iles 1972) and likely to have been under varied and perhaps repeated episodes of natural selection.
The evidence for occasional but repeated gene conversion during the evolution of this gene family is manifold. The recombinant cDNA sequences from a single male M. auratus individual suggested multiple divergent loci encoding different versions of SPP120, and this was confirmed for all species by genomic sequencing. The screen of the BAC library confirmed multiple loci, their tandem arrangement (for some at least) within 200 kb (the upper limit of A. burtoni BAC clone lengths), and the existence of spliced and retrotransposed gene copies. Finally, the recently released genome sequence of Medaka (Kasahara et al. 2007) showed that copies of SPP120 arranged in the same orientation along chromosome 18 exhibit greater similarity than copies lying in opposite orientations (fig. 6), even when the inverted copies are located much closer. This suggests that the Medaka SPP120 family underwent a history of gene conversion events dating back tens of millions of years, maintained after an inversion event separated clusters into forward- and reverse-orientated copies. Furthermore, it is possible that the homogenizing force of gene conversion has been operating on a tandem array of SPP120 duplicates since the most recent common ancestor of cichlids and Medaka dating back over 100 Myr (Steinke et al. 2006) and potentially of influence over more than 7,000 species of percomorph fish. The frequency of gene conversion events between SPP120 loci is, however, seemingly limited relative to the rate of cladogenesis (figs. 2 and 7).
Though the prevalence of gene conversion throughout the genome is hard to determine, it is intriguing that several genes and gene families that are thought to be under positive selection have also recently been shown to evolve concertedly. For example, the MHC Class II genes, a paradigm of coding region adaptation in vertebrates, show a high degree of allelic copy number variation in teleosts (Malaga-Trillo et al. 1998; Reusch et al. 2004), and it has been proposed that for some MHC loci in sticklebacks (G. aculeatus), gene conversion may be more important than mutation in generating diversity, the fuel of adaptive evolution (Reusch and Langefors 2005). Similar co-occurrences of diversifying selection and gene conversion have been observed in protocadherins (Noonan et al. 2004; Wu 2005) and the vitelline envelope forming genes of Abalone, another classic example of diversifying selection (Aagaard et al. 2006). In mammalian zonadhesin, a sperm surface protein that binds species specifically with the egg ZP, tandem VWD domain repeats have evolved both divergently and concertedly (Herlyn and Zischler 2006). It will be interesting to find out whether arrays of tandem gene copies have a greater propensity to undergo rapid, adaptive evolution or whether this correlation is just a by-product of the occasional neofunctionalization of duplicated gene copies (Stephens 1951; Ohno 1970).
| Supplementary Material |
|---|
|
|
|---|
Supplementary table 1 is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
|
|
|---|
We thank Walter Salzburger for help in the choice of species and their dissection and Klaus Zanker for some of the laboratory work. The constructive criticism of 2 anonymous reviewers improved an earlier version of this manuscript. D.T.G. was supported by a Research Fellowship from the Alexander von Humboldt Stiftung and A.M. by grants from the Deutsche Forschungsgemeinschaft. Medaka data—"The data have been provided freely by the National Institute of Genetics and the University of Tokyo for use in this publication/correspondence only."
| Footnotes |
|---|
Billie Swalla, Associate Editor
| References |
|---|
|
|
|---|
Aagaard JE, Yi X, MacCoss MJ, Swanson WJ. Rapidly evolving zona pellucida domain proteins are a major component of the vitelline envelope of abalone eggs. Proc Natl Acad Sci USA (2006) 103:17302–17307.
Albertson RC, Streelman JT, Kocher TD. Genetic basis of adaptive shape differences in the cichlid head. J Hered (2003) 94:291–301.
Colosimo PF, Hosemann KE, Balabhadra S, Villarreal G Jr, Dickson M, Grimwood J, Schmutz J, Myers RM, Schluter D, Kingsley DM. Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science (2005) 307:1928–1933.
Coyne JA, Orr HA. Speciation (2004) Sunderland (MA): Sinauer Associates.
Coyne JA, Orr HA. The evolutionary genetics of speciation. Philos Trans R Soc Lond B Biol Sci (1998) 353:287–305.
Figueroa F, Mayer WE, Sultmann H, O'hUigin C, Tichy H, Satta Y, Takezaki N, Takahata N, Klein J. Mhc class II B gene evolution in East African cichlid fishes. Immunogenetics (2000) 51:556–575.
Fryer G, Iles TD. The cichlid fishes of the great lakes of Africa: their biology and evolution. (1972) Edinburgh (UK): Oliver & Boyd.
Grier HJ, Fishelson L. Colloidal sperm-packaging in mouthbrooding tilapiine fishes. Copeia (1995) 1995:966–970.
Hamilton LC, Macpherson GR, Wright JM. Expressed sequence tags derived from brain tissue of Oreochromis niloticus. J Fish Biol (2000) 56:219–222.
Herlyn H, Zischler H. Tandem repetitive D domains of the sperm ligand zonadhesin evolve faster in the paralogue than in the orthologue comparison. J Mol Evol (2006) 63:602–611.
Ishimoto Y, Savan R, Endo M, Sakai M. Non-specific cytotoxic cell receptor (NCCRP)-1 type gene in tilapia (Oreochromis niloticus): its cloning and analysis. Fish Shellfish Immunol (2004) 16:163–172.
Kasahara M, Naruse K, Sasaki S, et al. The medaka draft genome and insights into vertebrate genome evolution. Nature (2007) 447:714–719.
Kocher TD. Adaptive evolution and explosive speciation: the cichlid fish model. Nat Rev Genet (2004) 5:288–298.
Kocher TD, Conroy JA, McKaye KR, Stauffer JR. Similar morphologies of cichlid fish in lakes Tanganyika and Malawi are due to convergence. Mol Phylogenet Evol (1993) 2:158–165.
Kozak M. Possible role of flanking nucleotides in recognition of the AUG initiator codon by eukaryotic ribosomes. Nucleic Acids Res (1981) 9:5233–5252.
Kumar S, Tamura K, Nei M. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform (2004) 5:150–163.
Lang M, Miyake T, Braasch I, Tinnemore D, Siegel N, Salzburger W, Amemiya CT, Meyer A. A BAC library of the East African haplochromine cichlid fish Astatotilapia burtoni. J Exp Zoolog B Mol Dev Evol (2006) 306:35–44.
Lee BY. Approach to the identification of sex-determining genes in the tilapia genome by genetic mapping and comparative positional cloning. Hubbard Center for Genome Studies (2004) Durham (NH): University of New Hampshire.
Lee BY, Lee WJ, Streelman JT, Carleton KL, Howe AE, Hulata G, Slettan A, Stern JE, Terai Y, Kocher TD. A second-generation genetic linkage map of tilapia (Oreochromis spp.). Genetics (2005) 170:237–244.
Legan PK, Rau A, Keen JN, Richardson GP. The mouse tectorins. Modular matrix proteins of the inner ear homologous to components of the sperm-egg adhesion system. J Biol Chem (1997) 272:8791–8801.
Li W-H. Molecular evolution (1997) Sunderland (MA): Sinauer Associates.
Makalowski W, Boguski MS. Evolutionary parameters of the transcribed mammalian genome: an analysis of 2,820 orthologous rodent and human sequences. Proc Natl Acad Sci USA (1998) 95:9407–9412.
Malaga-Trillo E, Zaleska-Rutczynska Z, McAndrew B, Vincek V, Figueroa F, Sultmann H, Klein J. Linkage relationships and haplotype polymorphism among cichlid Mhc class II B loci. Genetics (1998) 149:1527–1537.
McVean G, Awadalla P, Fearnhead P. A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics (2002) 160:1231–1241.
Meyer A. Phylogenetic relationships and evolutionary processes in East African cichlid fishes. Trends Ecol Evol (1993) 8:279–284.
Meyer A, Kocher TD, Basasibwaki P, Wilson AC. Monophyletic origin of Lake Victoria cichlid fishes suggested by mitochondrial DNA sequences. Nature (1990) 347:550–553.
Mochida K, Kondo T, Matsubara T, Adachi S, Yamauchi K. A high molecular weight glycoprotein in seminal plasma is a sperm immobilizing factor in the teleost Nile tilapia, Oreochromis niloticus. Dev Growth Differ (1999) 41:619–627.
Mochida K, Matsubara T, Andoh T, Ura K, Adachi S, Yamauchi K. A novel seminal plasma glycoprotein of a teleost, the Nile tilapia (Oreochromis niloticus), contains a partial von Willebrand factor type D domain and a zona pellucida-like domain. Mol Reprod Dev (2002) 62:57–68.
Modig C, Modesto T, Canario A, Cerda J, Hofsten Jv, Olsson P-E. Molecular characterization and expression pattern of zona pellucida proteins in gilthead seabream (Sparus aurata). Biol Reprod (2006) 75:717–725.
Mold DE, Kim IF, Tsai C-M, Lee D, Chang C-Y, Huang RCC. Cluster of genes encoding the major egg envelope protein of zebrafish. Mol Reprod Dev (2001) 58:4–14.
Moran P, Kornfield I. Retention of an ancestral polymorphism in the mbuna species flock (Teleostei: Cichlidae) of Lake Malawi. Mol Biol Evol (1993) 10:1015–1029.
Nei M, Gojobori T. Simple methods for estimating the number of synonymous and non-synonymous nucleotide substitutions. Mol Biol Evol (1986) 3:418–426.
Noonan JP, Grimwood J, Schmutz J, Dickson M, Myers RM. Gene conversion and the evolution of protocadherin gene cluster diversity. Genome Res (2004) 14:354–366.
Ohno S. Evolution by gene duplication (1970) New York: Springer-Verlag.
Renn SC, Aubin-Horth N, Hofmann HA. Biologically meaningful expression profiling across species using heterologous hybridization to a cDNA microarray. BMC Genomics (2004) 5:42.
Reusch TB, Langefors A. Inter- and intralocus recombination drive Mhc class IIB gene diversification in a teleost, the three-spined stickleback Gasterosteus aculeatus. J Mol Evol (2005) 61:531–541.
Reusch TB, Schaschl H, Wegner KM. Recent duplication and inter-locus gene conversion in major histocompatibility class II genes in a teleost, the three-spined stickleback. Immunogenetics (2004) 56:427–437.
Rice P, Longden I, Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet (2000) 16:276–277.
Salzburger W, Mack T, Verheyen E, Meyer A. Out of Tanganyika: genesis, explosive speciation, key-innovations and phylogeography of the haplochromine cichlid fishes. BMC Evol Biol (2005) 5:17.
Salzburger W, Meyer A. The species flocks of East African cichlid fishes: recent advances in molecular phylogenetics and population genetics. Naturwissenschaften (2004) 91:277–290.
Schluter D. The ecology of adaptive radiation (2000) Oxford: Oxford University Press.
Schluter D. Ecology and the origin of species. Trends Ecol Evol (2001) 16:372–380.
Seehausen O. African cichlid fish: a model system in adaptive radiation research. Proc Biol Sci (2006) 273:1987–1998.
Smith J, Paton IR, Hughes DC, Burt DW. Isolation and mapping the chicken zona pellucida genes: an insight into the evolution of orthologous genes in different species. Mol Reprod Dev (2005) 70:133–145.
Spady TC, Seehausen O, Loew ER, Jordan RC, Kocher TD, Carleton KL. Adaptive molecular evolution in the opsin genes of rapidly speciating cichlid species. Mol Biol Evol (2005) 22:1412–1422.
Staden R, Beal KF, Bonfield JK. The Staden package, 1998. Methods Mol Biol (2000) 132:115–130.
Stajich JE, Block D, Boulez K, et al. The Bioperl toolkit: Perl modules for the life sciences. Genome Res (2002) 12:1611–1618.
Steinke D, Salzburger W, Meyer A. Novel relationships among ten fish model species revealed based on a phylogenomic analysis using ESTs. J Mol Evol (2006) 62:772–784.
Stephens SG. Possible significances of duplication in evolution. Adv Genet (1951) 4:247–265.
Stiassny MLJ, Meyer A. Cichlids of the rift lakes. Sci Am (1999) 280:64–69.
Streelman JT, Albertson RC, Kocher TD. Genome mapping of the orange blotch colour pattern in cichlid fishes. Mol Ecol (2003) 12:2465–2471.
Sultmann H, Mayer WE, Figueroa F, Tichy H, Klein J. Phylogenetic analysis of cichlid fishes using nuclear DNA markers. Mol Biol Evol (1995) 12:1033–1047.
Swanson WJ, Clark AG, Waldrip-Dail HM, Wolfner MF, Aquadro CF. Evolutionary EST analysis identifies rapidly evolving male reproductive proteins in Drosophila. Proc Natl Acad Sci USA (2001) 98:7375–7379.
Swanson WJ, Vacquier VD. The rapid evolution of reproductive proteins. Nat Rev Genet (2002) 3:137–144.
Tamura K, Nei M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol (1993) 10:512–526.
Terai Y, Mayer WE, Klein J, Tichy H, Okada N. The effect of selection on a long wavelength-sensitive (LWS) opsin gene of Lake Victoria cichlid fishes. Proc Natl Acad Sci USA (2002) 99:15501–15506.
Terai Y, Morikawa N, Kawakami K, Okada N. Accelerated evolution of the surface amino acids in the WD-repeat domain encoded by the hagoromo gene in an explosively speciated lineage of East African cichlid fishes. Mol Biol Evol (2002) 19:574–578.
Terai Y, Morikawa N, Okada N. The evolution of the pro-domain of bone morphogenetic protein 4 (bmp4) in an explosively speciated lineage of East African cichlid fishes. Mol Biol Evol (2002) 19:1628–1632.
Ting CT, Tsaur SC, Sun S, Browne WE, Chen YC, Patel NH, Wu CI. Gene duplication and speciation in Drosophila: evidence from the Odysseus locus. Proc Natl Acad Sci USA (2004) 101:12232–12235.
Verheyen E, Salzburger W, Snoeks J, Meyer A. Origin of the superflock of cichlid fishes from Lake Victoria, East Africa. Science (2003) 300:325–329.
Watanabe M, Kobayashi N, Shin-i T, Horiike T, Tateno Y, Kohara Y, Okada N. Extensive analysis of ORF sequences from two different cichlid species in Lake Victoria provides molecular evidence for a recent radiation event of the Victoria species flock: identity of EST sequences between Haplochromis chilotes and Haplochromis sp. "Redtailsheller". Gene (2004) 343:263–269.
Wickler W. Sexually selected genital adornment and sperm packaging in species of Oreochromis (Teleostei: Cichlidae). Copeia (1997) 1997:188–190.
Wong WS, Yang Z, Goldman N, Nielsen R. Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics (2004) 168:1041–1051.
Wu Q. Comparative genomics and diversifying selection of the clustered vertebrate protocadherin genes. Genetics (2005) 169:2179–2188.
Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci (1997) 13:555–556.
Yang Z, Bielawski JP. Statistical methods for detecting molecular adaptation. Trends Ecol Evol (2000) 15:496–503.
Yang Z, Nielsen R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol (2000) 17:32–43.
Yang Z, Nielsen R. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol Biol Evol (2002) 19:908–917.
Yang Z, Wong WS, Nielsen R. Bayes empirical Bayes inference of amino acid sites under positive selection. Mol Biol Evol (2005) 22:1107–1118.
" denotes a recombinant sequence sharing similarity to sequences 1 and 2 from Melanochromis auratus. Asterisk denotes that cDNA sequence Pundamilia nyererei 1 features a "compensatory" substitution altering the premature stop codon to "CGA." (B) The known species phylogeny is based on Salzburger et al. 2005




