Molecular Biology and Evolution 18:1940-1951 (2001)
© 2001 Society for Molecular Biology and Evolution
Low Rates of Silent Substitution in Nuclear Genes of Two Distantly Related Scrophulariaceae (Antirrhinum and Verbascum)
Institute of Cell Animal and Population Biology, University of Edinburgh, Edinburgh, Scotland
Abstract
Low levels of genetic diversity and divergence at nuclear loci have previously been observed for cycloidea and fil1-like genes within and between several Antirrhinum species, and divergence at these loci is also low between species in genera at different levels of relatedness in the former family Scrophulariaceae (Digitalis and Verbascum). The low divergence values are surprising, because (based on the sequences of chloroplast loci) the Scrophulariaceae are thought to be polyphyletic, with two anciently diverged clades, and the species we compared belonged to the two different clades. Here, we extend our studies of sequence divergence to more nuclear genes: fil2, far, globosa, and Adh. Detailed studies revealed that in Antirrhinum these genes belong to gene families. Low levels of divergence between Antirrhinum and Verbascum were observed for four of the loci studied, fil2-1, fil2-2, far-S, and globosa, similar to our previous observations. We discuss hypotheses to explain these low synonymous divergence values. For Adh, no cases of very similar sequences were found; rather, our sequences from the three different genera (Antirrhinum, Digitalis, and Verbascum) were all very diverged. Repeated gene duplication and loss of elements in the Adh gene family are likely in these lineages, making it impossible to determine orthology of the Adh genes.
Introduction
In Antirrhinum, studies of DNA sequence diversity and divergence of genes of the cycloidea and fil1 gene families have revealed surprisingly little variation within species, as well as very low divergence between several Antirrhinum species and Digitalis purpurea (Vieira, Vieira, and Charlesworth 1999
In contrast to the low divergence of the genes we have sequenced in Antirrhinum and Verbascum, allozyme divergence has been found between populations within and between Antirrhinum species. In Antirrhinum lopesianum, Antirrhinum mollissimum, and Antirrhinum microphyllum, allozyme diversity (He) for the species ranged from 0.19 to 0.52, and Nei's genetic identity values between populations averaged less than 0.96, or D = 0.04 (Nei 1987), for A. mollissimum and only 0.91 (D = 0.09) for A. microphyllum; between the first species and the others, identities were 0.5 and 0.46, based on 14 loci (Mateu-Andres 1999). These D values are unusually high compared with those of other plant species (Crawford 1989). Even in two narrow endemic species, within-population diversity was almost 0.1 (Mateu-Andres and Segarra-Moragues 2000). Similar allozyme differentiation has been found within and between various other species of the former Scrophulariaceae (Elisens and Crawford 1988; Ritland 1989; Schoen and Brown 1991; Elisens 1992; Elisens and Nelson 1993). Allozyme loci may not be an unbiased sample of loci, since they may be chosen for having variants within species. Nevertheless, the fact that allozyme differences are found leads one to expect some silent or intron site differences, even between Antirrhinum species (although some species can hybridize; Mather 1947; Harrison and Darby 1955; Rothmaler 1956), and certainly between Antirrhinum and Verbascum.
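Nei's standard genetic distance is derived from the genetic identity as D = -ln I. The short Python sketch below is an illustration only, not the software used in the cited allozyme studies; it reproduces the conversions quoted above and applies the same formula to the between-species identities.

```python
import math


def nei_distance(identity: float) -> float:
    """Nei's standard genetic distance D = -ln(I) for a genetic identity 0 < I <= 1."""
    if not 0.0 < identity <= 1.0:
        raise ValueError("genetic identity must lie in (0, 1]")
    return -math.log(identity)


# Identity values quoted in the text (Mateu-Andres 1999).
for label, identity in [
    ("A. mollissimum populations", 0.96),
    ("A. microphyllum populations", 0.91),
    ("between species", 0.50),
    ("between species", 0.46),
]:
    print(f"{label}: I = {identity:.2f}, D = {nei_distance(identity):.2f}")
```

The between-species identities of 0.5 and 0.46 thus correspond to D values of about 0.69 and 0.78.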
Our aim in this study was to test whether the low divergence we found for several loci was general or an unusual feature of the fil1 and cyc gene families. We therefore sequenced additional genes that have been identified in Antirrhinum and compared sequences from one Antirrhinum species with sequences from Verbascum and, for some loci, Digitalis. The Antirrhinum genes studied here include Adh genes and three genes involved in flower development, fil2, far, and globosa. fil2 is a flower-specific gene that encodes a protein of the extracellular matrix with an LRR signal peptide at the N-terminus (Steinmayr et al. 1994). A BlastX search reveals that fil2 has more than 62% amino acid identity to polygalacturonase genes from several plants (Actinidia, AF263465; Prunus, AF020785, Z49063; Citrus, AB0162206, AB015356, AB016204; Vitis, AF305093). Farinelli (far) and globosa are floral homeotic genes that control petal and stamen development. far belongs to class C, and globosa to class B, of the MADS-box genes (Tröbner et al. 1992; Davies et al. 1999).
As we previously found for the fil1 and cyc genes, we show that in Antirrhinum all of these genes, including the alcohol dehydrogenase (Adh) genes, belong to gene families. As with the fil1 genes, four genes belonging to the fil2, far, and globosa gene families yielded Verbascum sequences identical or nearly identical to those from Antirrhinum, but the results from the Adh genes were quite different. Divergence of the Adh sequences was high, even between Antirrhinum and Digitalis, and was at least roughly consistent with the rbcL and ndhF results, although gene duplications and losses make it impossible to establish orthology between Adh gene family members in the different taxa studied.
As explained above, the low divergence between Antirrhinum and Verbascum sequences appears to conflict with the hypothesis of an ancient split between two clades of Scrophulariaceae. We therefore also searched GenBank for pairs of gene sequences that were similar in Antirrhinum and other species of Scrophulariaceae in order to find further potential orthologs whose divergence between species of Scrophulariaceae could be compared. As will be seen, all of the few pairs of sequences that we found were nonorthologous; i.e., the "same" gene was not found in two different species. This suggests that gene families, and perhaps the birth and death of genes, may be common in these plant species, making it impossible at present to estimate sequence divergence between different species in this family.
Materials and Methods
Plant Material and PCR Amplification
Leaves of Antirrhinum majus subsp. cirrhigerum (Ficalho) Franco were collected in the field at Aveiro, in the north of Portugal (Vieira, Vieira, and Charlesworth 1999
PCR primers (table 1) were designed based on the GenBank sequences of the A. majus genes fil2, farinelli (far), and globosa (accession numbers X76995, AJ239057, and X68831, respectively). The regions of each of the genes analyzed are shown in figures 1-3. fil2 is known to be a member of a gene family in a number of species, including tomato, melon, maize, and willow (Hadfield and Bennett 1998; Futamura et al. 2000). Homologs of far and globosa have also been described as belonging to gene families in several angiosperm species (Yu et al. 1999; Theissen et al. 2000), and there is evidence for both ancient and recent duplications of globosa-like genes (Kramer, Dorit, and Irish 1998). However, among Antirrhinum sequences in GenBank, only far shows nucleotide similarity to any other known Antirrhinum gene (in this case, plena; Davies et al. 1999).
We did not characterize the fil2, far, and globosa gene families in detail in either Antirrhinum or Verbascum. However, BlastX searches in the Arabidopsis Information Resource (TAIR) database with A. majus query sequences indicated large families in the Arabidopsis thaliana genome. We took a value of more than 50 amino acid identities in a region more than 100 amino acids long as our criterion for identifying homologous genes. Antirrhinum majus globosa has 34 A. thaliana homologs, including pistillata, Apetala3, and many agamous-like genes. The results are similar for the A. majus far gene (more than 50 homologs, many of them overlapping those found for globosa). Finally, there were more than 100 fil2 homologs, including polygalacturonases, disease resistance genes, and kinase-like receptors. In A. thaliana, all three of these types of genes are known to be large gene families (Rounsley, Ditta, and Yanofsky 1995
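The homology criterion above can be expressed as a simple filter on BLAST hits. The sketch below is hypothetical: it assumes the hits are available as BLAST+ tabular output (`-outfmt 6`), whose first four tab-separated columns are query ID, subject ID, percent identity, and alignment length, and it reads the criterion literally as a count of more than 50 identical residues in an alignment longer than 100 amino acids. The number of identities is recovered only approximately, from percent identity times alignment length.

```python
from typing import Iterable, List, Tuple


def passes_homology_criterion(pident: float, aln_length: int,
                              min_identities: int = 50,
                              min_length: int = 100) -> bool:
    """More than `min_identities` identical residues in an alignment
    longer than `min_length` amino acids."""
    identities = round(pident * aln_length / 100.0)
    return identities > min_identities and aln_length > min_length


def homologous_hits(tabular_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (query, subject) pairs that pass the criterion.

    Assumes BLAST+ tabular (-outfmt 6) lines; only columns 1-4 are used:
    query id, subject id, percent identity, alignment length.
    """
    kept = []
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        pident, length = float(fields[2]), int(fields[3])
        if passes_homology_criterion(pident, length):
            kept.append((query, subject))
    return kept
```

For example, a hit with 45% identity over 180 residues (about 81 identities) passes, whereas one with 60% identity over only 90 residues fails on length.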
Since these genes are clearly members of gene families in Antirrhinum also (see Results), additional primers were designed based on the new sequences obtained. To amplify single members of these gene families, seminested PCR (Cubas, Vincent, and Coen 1999) was carried out on the product of the initial PCR reactions using further internal primers.
Antirrhinum Adh sequences were not available in GenBank. Primers (adA and adB; see table 1) were therefore designed for 20-bp conserved regions identified from an alignment of the Adh1, Adh2, and Adh3 genes of other dicotyledon species (Solanum tuberosum, M25154, M25153, M25152; Leavenworthia stylosa, AF037564, AF037558, AF037560; Leavenworthia crassa, AF037563). These primers amplify a small region (392 bp) corresponding to part of exon 4 in A. thaliana (D63464).
Standard amplification conditions were 35 cycles of denaturation at 94°C for 30 s, primer annealing at 48°C for 30 s, and primer extension at 72°C for 2 min. Because we were working with very similar sequences, it was important to be extremely careful to avoid contamination. Standard negative controls (PCR cocktail without genomic DNA) were routinely included to ensure that similar sequences from species between which divergence was expected could not be attributed to contamination of the DNA samples or to the equipment used; these controls never yielded PCR products. Also, the PCR amplification was repeated with at least six A. majus individuals, and two of them were always sequenced. Since only one V. nigrum individual was used, DNA from several leaves was extracted independently and treated as different samples. The PCR of these samples always gave the same products.
All PCR amplification products were checked for homogeneity by digestion with several four-cutter restriction enzymes. If the number and/or size of the bands obtained after digestion was not compatible with that of the reference sequence (from which the primers were designed), the product was classified as heterogeneous. In such cases, we cloned the product and screened several colonies until several of each of the sequence types had been identified, and then we determined their DNA sequences. Cloning was performed using the TA cloning kit (Invitrogen). If more than one band was systematically obtained, we always cloned and sequenced all of them. Because differences can arise from nucleotide misincorporation during amplification, we determined the DNA sequences of plasmids from at least three different colonies (from the same PCR reaction) and obtained a consensus sequence. DNA sequencing was performed with an Applied Biosystems model 377 DNA sequencing system with the ABI PRISM BigDye cycle-sequencing kit (Perkin Elmer), using specific primers or the primers for the M13 forward and M13 reverse priming sites of the pCR 2.1 vector.
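The two quality checks above, comparing a digest with the pattern expected from the reference sequence and building a consensus from at least three clones of the same PCR reaction, can be sketched in Python. This is a minimal illustration, not the procedure or software actually used: the enzyme table holds only AluI (AG^CT) and RsaI (GT^AC), two of the four-cutters named in the Results, and the band-matching tolerance (`tol`) and minimum visible fragment size (`min_size`) are hypothetical parameters, not values from the study.

```python
import re
from collections import Counter
from typing import List

# Palindromic four-cutters: (recognition site, cut position within the site on the top strand).
# AluI cuts AG^CT and RsaI cuts GT^AC; both leave blunt ends.
ENZYMES = {"AluI": ("AGCT", 2), "RsaI": ("GTAC", 2)}


def digest(seq: str, enzyme: str) -> List[int]:
    """Predicted fragment lengths (5' to 3' order) for a linear sequence cut by one enzyme."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # A zero-width lookahead also catches overlapping occurrences of the site.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def compatible(observed: List[int], predicted: List[int],
               tol: int = 10, min_size: int = 50) -> bool:
    """True if the observed bands match the predicted fragments that are large
    enough to see on a gel (hypothetical min_size), each within tol bp."""
    expected = sorted(f for f in predicted if f >= min_size)
    seen = sorted(observed)
    return len(seen) == len(expected) and all(
        abs(o - e) <= tol for o, e in zip(seen, expected))


def consensus(clones: List[str]) -> str:
    """Majority-rule consensus of aligned, equal-length clone sequences.
    Columns without a strict majority are written as N."""
    if len({len(c) for c in clones}) != 1:
        raise ValueError("clone sequences must be aligned to the same length")
    out = []
    for column in zip(*clones):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count > len(clones) / 2 else "N")
    return "".join(out)
```

With three clones, a single misincorporated base in one clone is outvoted by the other two, which is the rationale for sequencing at least three colonies per PCR reaction. A product whose digest gives more, or differently sized, bands than `compatible` allows would be flagged as heterogeneous and cloned.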
Sequence Analyses
The DNA sequences were deposited in GenBank (accession numbers AF307068-AF307071 for fil2-1, AF307072-AF307075 for fil2-2, AF307063-AF307064 for far-L, AF307065-AF307067 for far-S, AF307076-AF307078 for globosa1, and AF307054-AF307062 for Adh). The nucleotide sequences to be compared were aligned using ClustalX, version 1.64b (Thompson et al. 1997), and minor manual adjustments were made using SeqPup, version 0.6f. Intron/exon boundaries within the genes were deduced by comparison with the Antirrhinum GenBank cDNA sequences corresponding to the genes from which the primers were designed. The numbers of synonymous and nonsynonymous differences between pairs of sequences were calculated using DnaSP software, version 3.0 (Rozas and Rozas 1999). Divergence estimates were corrected for multiple hits using the Jukes-Cantor correction (Jukes and Cantor 1969), and neighbor-joining trees were generated for the Adh genes with MEGA, version 1.01 (Kumar, Tamura, and Nei 1994).
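The Jukes-Cantor correction converts an observed proportion of differing sites, p, into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The Python sketch below illustrates only that formula; it does not reproduce DnaSP's counting of synonymous and nonsynonymous sites, which is taken as given. The helper name `p_distance` is hypothetical, introduced for the example, which uses the fil2-1 comparison reported in the Results (one difference in 318 intron and synonymous sites).

```python
import math


def p_distance(differences: int, sites: float) -> float:
    """Uncorrected proportion of differing sites."""
    if sites <= 0:
        raise ValueError("number of sites must be positive")
    return differences / sites


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3).

    The formula is undefined for p >= 0.75 (saturation); math.inf is returned
    in that case so saturated comparisons are easy to flag.
    """
    if p < 0:
        raise ValueError("p must be non-negative")
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# fil2-1, A. majus vs. Verbascum: one difference in 318 intron and synonymous sites.
d = jukes_cantor(p_distance(1, 318))
print(f"Ks = {d:.4f}")  # Ks = 0.0032
```

For small p the correction is negligible (here it changes 0.00314 to 0.00315), whereas d rises steeply as p approaches 0.75, so corrected values of about 1 or more carry large uncertainty.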
To find homologous genes for comparison between Antirrhinum and other species in the Scrophulariaceae, we used BlastX searches (Altschul et al. 1997), which compare translated nucleotide queries with protein sequences and therefore use only protein-coding regions; the queries were GenBank sequences from species of Scrophulariaceae other than Antirrhinum. Of the 62 non-Antirrhinum genes available, only six had amino acid homology with Antirrhinum sequences using the criterion described above. These, with their GenBank accession numbers, were as follows: ACS1, AF083814 (homology with a gene from Striga hermonthica, AF090351); PHYA, U08142 (homology with a gene from Digitalis lanata, AJ002525); GAPDH, X59517 (homology with a gene from Craterostigma plantagineum, X78307); MADS-box transcription factor, Y10750 (homology with a gene from Paulownia kawakamii, AF060880); Chs, X03710 (homology with a gene from Digitalis lanata, AJ002526); TFNS5, AB028151 (homology with a gene from Torenia hybrida, AB028152). One further sequence, SUT1 from Asarina barclaiana (AF191024), was similar to a non-Antirrhinum gene (Alonsoa meridionalis, AF191025). For comparative purposes, we also included a sequence of each gene from species in the related families Geraniaceae (Pelargonium hortorum, U17231), Solanaceae (Petunia, X60346; Solanum tuberosum, AF008651; Nicotiana tabacum, X82276), Fabaceae (Sophora affinis, U78835; Glycine max, D83968), and Lamiaceae (Perilla frutescens, AB002582).
Results
Evidence that fil2, far, globosa, and Adh Are Members of Gene Families
fil2
Primers Fil2A and Fil2B (table 1) amplified a PCR product of the predicted size (845 bp) from two A. majus plants. Digesting this band with several restriction enzymes revealed two different DNA sequences. We therefore cloned this PCR product and sequenced several clones corresponding to both types of sequence, from both individuals. Each individual had both sequence types, which we denote by fil2-1 and fil2-2, and each of these was identical in the two individuals. Between the fil2-1 and fil2-2 sequences, there were 17 nucleotide differences (9 nonsynonymous and 8 synonymous) and one 4-bp indel in the putative intron (fig. 1). Twelve A. majus plants were tested for the presence of the fil2-1 and fil2-2 sequences, and both sequences amplified from all individuals, strongly suggesting that they represent two different genes. Blast searches with these two types of sequences revealed high nucleotide sequence similarity to the Antirrhinum fil2 gene, but both fil2-1 and fil2-2 had several differences from the GenBank sequence. For fil2-1, there were 12 nonsynonymous and 10 synonymous differences in the coding sequences compared, along with five nucleotide differences plus eight indels in the putative intron; for fil2-2, there were five nonsynonymous and four synonymous differences in the coding sequence, along with five nucleotide differences plus seven indels in the intron (fig. 1). fil2-2 was similar to fil2-1 at the 5' end, while at the 3' end it was similar to the A. majus GenBank fil2 sequence (fig. 1). This mosaic pattern is not the result of PCR recombination, since identical results were obtained in four independent PCR reactions using three different sets of primers (Fil2A and Fil2B for A. majus, and Fil2A with fil2-1R or fil2-2R for V. nigrum and V. thapsus).
Based on two fixed differences between fil2-1 and fil2-2 at positions 493 and 495 (fig. 1), specific primers were designed to amplify each of these genes separately (table 1). Using these primers together with Fil2A, PCR amplification products were also obtained from genomic DNA of V. nigrum and V. thapsus, and their sequences were determined. Both sequences were present in Verbascum, supporting the conclusion that at least two separate fil2 genes exist. These primers were not expected to amplify the GenBank sequence, since it differs from fil2-1 and fil2-2 at the primer sites; further sequences might therefore also be present in these species. The V. nigrum and V. thapsus sequences were identical for both fil2-1 and fil2-2. In a comparison of the A. majus and Verbascum sequences, only one difference was found (in the intron) out of the 477 bp that could be compared for the fil2-1 gene (table 2 and fig. 1; Ks = 0.0032 based on 318 intron and synonymous sites), and none were found in the fil2-2 gene.
far
Two bands of different sizes, 905 bp (L) and 731 bp (S), were amplified from A. majus using primers FarF and FarR (table 1). Both bands were obtained in all six individuals tested. For two individuals, both L and S were cloned and sequenced in order to establish whether these bands were both specific amplification products. The corresponding sequences from these two individuals were identical. Both the L and the S sequences were similar to the GenBank far sequence (96% and 88% nucleotide identity, respectively). The GenBank A. majus far sequence was most similar to the L sequence, but there were several differences (five synonymous nucleotide differences, 18 nucleotide differences in the putative introns, and five intronic indels), and it is possible that the two are not allelic (fig. 2).
Between the putative coding regions of the L and S sequences, there were 15 silent-site and seven replacement-site differences (fig. 2). There were also length differences in the putative introns: intron 1 of the L sequences was 171 bp, versus 121 bp in the S sequences; the respective intron 3 lengths were 80 versus 97 bp, and those of intron 4 were 254 versus 106 bp. Because of these length differences, the intron sequences of L and S could not be reliably aligned.
To compare orthologs between V. nigrum and A. majus, a new forward primer based on the S sequence was designed in order to amplify this sequence specifically (again, this primer will not amplify the GenBank sequence). No attempt was made to amplify or further study the V. nigrum far-L sequence. The S-band-specific forward primer spans the end of exon 2 and the beginning of intron 3 (table 1 and fig. 1) and yields a fragment of 662 bp in both species. The V. nigrum and A. majus S-band sequences were identical in this region (table 2). This finding, together with the number of differences between the two A. majus band size types, strongly suggests that the L and S types of sequences represent two different genes (which we denote by far-L and far-S).
globosa
Primers GloF1 and GloR (table 1) amplified a PCR product in A. majus. From all six individuals studied from this species, the size of this amplification product was 994 bp, rather than the expected 864 bp based on the Antirrhinum GenBank sequence. We cloned and sequenced the product from two individuals; both were identical. Our sequence was similar to the GenBank globosa sequence in both the intron and the coding region (>95% nucleotide identity for both regions). The size difference was mainly due to a duplication of a 129-bp region of the intron of the GenBank sequence (fig. 3). Apart from this, out of the 826 sites compared, there were 17 nucleotide differences between our sequence and the one in GenBank (two synonymous differences in putative coding regions and 15 in the intron; fig. 3), plus five indels in the intron.
To test for the presence of a similar sequence in V. nigrum, we designed a new primer (Glogap; see table 1 and fig. 1) specific for the region of the gene that is duplicated relative to the GenBank sequence (see above). Since this region was present twice in the target sequence, two bands with different sizes (512 bp and 641 bp) were expected. Both were obtained and sequenced in both V. nigrum and A. majus using primers Glogap and GloR. As expected, there were no differences between the 512-bp band and the corresponding region of the 641-bp band. The 641-bp sequences were identical in V. nigrum and A. majus (table 2; there were no differences in the exon regions, but too few sites were compared to permit divergence estimates). Once again, gene duplication was indicated for the globosa gene: if there were only a single copy of this gene, it is very unlikely that a sequence from a distantly related species would be identical to our A. majus sequence but different from that in GenBank.
Adh
Primers adA and adB (table 1) were designed for sequences in the coding region that are conserved in the Adh-1, Adh-2, and Adh-3 genes of two distantly related genera (see Materials and Methods). PCR products of the expected size (392 bp) were obtained from species of all three genera studied, A. majus, D. purpurea, and V. thapsus. As the primers are based on a conserved region of several paralogous Adh genes, the PCR product is expected to be heterogeneous, and it should include all Adh genes of this family in our species (generally two or three for diploid angiosperms; Small and Wendel 2000). On average, 30 clones from each species (from several different PCR reactions) were therefore digested with various restriction enzymes (AciI, AluI, RsaI, and DdeI) to determine the different types of sequences amplified. There were three types of clones in A. majus, one in D. purpurea, and two in V. thapsus. For all species, at least four clones of each type were sequenced.
Five different sequences were obtained from a single A. majus individual. A Blast search revealed that all sequences shared the highest amino acid similarity (>62% amino acid identity) with an Adh-like sequence from S. tuberosum (accession number X92179). Since A. majus is a diploid species, the presence of five different sequences implies the presence of at least three genes. We denote them by Adhant1, Adhant2, and Adhant3; these numbers do not imply orthology with other plant Adh genes with the same numbers. The mean pairwise difference per synonymous site between these genes (Ks) ranges from 0.192 (between Adhant1 and Adhant2) to 1.41 (between Adhant2 and Adhant3). Despite the small portion of the Adh coding region sequenced, even the lowest divergence value (0.192, based on 83 synonymous sites) differed significantly (P < 0.001 by a 2 × 2 χ² test) from the value for even the shortest coding sequence region from the other genes (fil2-1, 0/53 sites; see table 2). The two Adhant1 sequences from the single individual studied were similar but not identical (Ks = 0.0247) and could be allelic. The same applies to the two Adhant2 sequences (Ks = 0.0244). These differences were much larger than the average synonymous site diversity for the cyc and fil1 genes of Antirrhinum species (Vieira, Vieira, and Charlesworth 1999; Vieira and Charlesworth 2001).
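The significance comparison above can be framed as a 2 × 2 contingency table, with the two gene regions as rows and, as columns, synonymous sites that differ and sites that do not. The Python sketch below computes the Pearson statistic for such a table and is illustrative only. The text does not state whether a continuity correction was used, so the `yates` option is an assumption, and the cell counts would have to be taken from the underlying sequence data (the text reports Ks for the Adh comparison, not raw counts).

```python
def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = False) -> float:
    """Pearson chi-square statistic (1 df) for the 2 x 2 table
         [[a, b],
          [c, d]]
    e.g. rows = two gene regions, columns = (differing sites, identical sites).
    yates=True applies the continuity correction.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / (row1 * row2 * col1 * col2)


# Critical value of chi-square with 1 degree of freedom for P = 0.001.
CRITICAL_1DF_P001 = 10.828
```

A statistic above `CRITICAL_1DF_P001` corresponds to P < 0.001 with 1 degree of freedom.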
In V. thapsus, the restriction enzyme digestions revealed only two types of clones. Sequences were obtained for each type from a single individual. The Ks value for these sequences was 0.453; they thus appear to represent paralogs from an ancient duplication. Ten D. purpurea clones were sequenced. Two different sequences (digitalis1 and digitalis2; fig. 4) were found, differing at four nucleotide positions, two of them nonsynonymous; the Ks estimate based on just these two sequences was 0.0243, and they could be allelic. The differences in copy number made it uncertain which (if any) of these genes were orthologous.
Divergence between the sequences from the different species was considerable. Synonymous-site divergence between A. majus and Verbascum (with Jukes-Cantor correction) appeared to be saturated, with values of 1 or greater for all comparisons. Even Ka values ranged from >6% to 19.5%. Figure 4 presents an unrooted gene tree showing the relationships among our Adh gene sequences from species of Scrophulariaceae, plus sequences from Solanaceae taken from GenBank. Given the nature of our adA and adB primers, together with the fact that they evidently amplify very divergent sequences, it is unlikely that the Antirrhinum genome contains further Adh genes more closely related to the Verbascum sequences than the ones amplified. Furthermore, a primer (v1) designed based on the verbascum1 sequence yielded no amplification product from A. majus with the reverse primer adB over a range of annealing temperatures down to 45°C, whereas a PCR product of the expected size was always obtained in Verbascum. Thus, for this locus, A. majus has no detectable gene with a sequence similar to verbascum1, in contrast to all the other nuclear genes studied. Adh copy numbers have thus probably changed between these species.
Discussion
The Antirrhinum fil2, far, globosa, and Adh Genes Are Members of Gene Families
All four kinds of genes studied here (fil2, far, globosa, and Adh genes) are clearly members of gene families in Antirrhinum. Our fil2, far, and globosa genes are probably not allelic with the A. majus sequences in GenBank (see Results). It is surprising that our primers did not amplify sequences identical to those in GenBank, since they were designed based on these sequences. A possible explanation is that the total copy numbers for these gene families, which are unknown in both Antirrhinum and Verbascum, may be very large. There is evidence that this is the case for A. thaliana (see Materials and Methods). In addition, studying genes that belong to gene families, even small ones, can produce artifactual sequences in several ways, including PCR recombination or the combination of partial sequence data from different clones (based on the assumption that a gene is single-copy). It is helpful to check that the entire sequence can be amplified from genomic DNA, but this is not always done. Because of their high sensitivity, PCR approaches are likely to reveal gene families even when Southern blotting suggests a single copy. Heterogeneity of the PCR product may not be immediately apparent (unless the region amplified includes length differences, only a band of the expected size will be seen) but can readily be detected by studying multiple clones, as described here.
Copy Number and Sequence Differences in the Adh Gene Family
The fact that the genes we have studied are members of gene families complicates molecular evolutionary analysis, as it is essential to determine orthology before comparing sequences. This is not a problem for the fil2, far, and globosa genes, as we found identical or nearly identical sequences in the different species. However, it is clear from the gene tree that duplications and losses have obscured orthology among the Adh genes of Antirrhinum, Verbascum, and Digitalis (fig. 4). This is consistent with our finding of different copy numbers in the three species studied.
The Adh sequences from the different species are highly diverged, suggesting that some of the gene duplication events are ancient. Sequences from the Solanaceae and Scrophulariaceae form separate clusters, so several gene duplications must have occurred after these two families split (fig. 4). A minimum of three duplications is required in the lineages of Scrophulariaceae analyzed here, one in each lineage (fig. 4). In the Solanaceae, at least five duplications must have occurred to explain the data shown in the tree (fig. 4). Phylogenetic analyses have revealed repeated gains and losses of Adh genes in other genera, even among closely related species (Gaut et al. 1996, 1999; Morton, Gaut, and Clegg 1996; Clegg, Cummings, and Durbin 1997; Small and Wendel 2000).
The uncertain orthology makes it unclear whether the divergence between the Adh genes of Antirrhinum, Verbascum, and Digitalis is truly more extensive than that for the other genes studied, as pairwise comparisons make it appear. For example, the Adhant3 sequence could be orthologous to those from Verbascum and Digitalis, with considerable divergence (see fig. 4). The Adh gene Ks values would then be consistent with the high synonymous-site divergence in chloroplast gene sequences between Antirrhinum and Verbascum (about 0.1 based on GenBank sequences of ndhF and trnL), assuming a fivefold faster nuclear gene substitution rate (Gaut 1998). A formal alternative is that none of our Adh sequences are orthologous. On closer examination, however, this alternative is less parsimonious, as follows.
Antirrhinum has two gene types (the Adhant3 type just mentioned and the two sequences Adhant1 and Adhant2, which are quite similar to one another). Since there are clearly three different types of sequences in the taxa studied, paralogy would, by definition, require two duplications before the taxa diverged, with each species having retained only one of the genes. The common ancestor of Antirrhinum and Digitalis must have had all three types of genes, since Antirrhinum has two of them and Digitalis has the third, so the losses must be recent (after the split from Antirrhinum). Digitalis must have lost all but the third type, and Antirrhinum and Verbascum must have lost the third type, during this period. The only alternative is that although the very divergent (on this hypothesis, paralogous) Digitalis sequence amplifies with our primers, more similar orthologs in the other species do not. This seems unlikely but, if true, implies at least some coding sequence difference, unlike our results for the other genes. We must also account for the absence of an ortholog of Adhant1 and Adhant2 in Verbascum (whose two sequences, presumably due to more recent duplication in this lineage, differ greatly from these). Therefore, either there must be a hypothetical ortholog that fails to amplify (i.e., has diverged), or this hypothesis requires yet another duplication in the Verbascum lineage (after its split from Antirrhinum/Digitalis; otherwise further gene losses are required), with additional loss of the Adhant3 ortholog in Verbascum. Moreover, this implies that the Verbascum sequence diverged from Adhant3 later than the divergence times for the other (orthologous) genes studied, so the much greater sequence divergence remains puzzling. Compared with the hypothesis that Adhant3 is orthologous to the Verbascum and Digitalis sequences, we thus require at least two more duplications, and three more gene losses (or failure to amplify). 
It is therefore likely that genuine sequence divergence has occurred in the Adh genes and that these genes behave differently from the other genes studied.
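The fivefold-rate expectation mentioned in this section can be written out explicitly. Under the stated assumption of a fivefold faster substitution rate in nuclear than in chloroplast genes (Gaut 1998), and taking the chloroplast synonymous divergence between Antirrhinum and Verbascum as about 0.1, the expected nuclear synonymous divergence is roughly

$$
K_s^{\mathrm{nuclear}} \approx 5 \times K_s^{\mathrm{chloroplast}} \approx 5 \times 0.1 = 0.5,
$$

that is, about half a synonymous substitution per site, far above the near-zero silent divergence found between Antirrhinum and Verbascum for the fil2, far, and globosa genes (table 2).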
Turnover in Plant Gene Families
Our results add to other data showing that gene families are common in plants (e.g., Clegg, Cummings, and Durbin 1997; Kramer, Dorit, and Irish 1998; Meyers et al. 1999; Durbin, McCaig, and Clegg 2000; Oberholzer, Durbin, and Clegg 2000; Pan et al. 2000; Theissen et al. 2000; Zhang, Pond, and Gaut 2001). The frequency of duplications is difficult to estimate. Few studies describe the total copy numbers for gene families in a group of species. Based on the average number of independent lineages inferred within Poaceae, Asteraceae, Fabaceae, and Solanaceae, Clegg, Cummings, and Durbin (1997) suggest a faster rate of duplication for the Chs and rbcS gene families than for the Adh gene family. These authors also suggest that new gene copies arise infrequently within families. However, this may be the result of poor species representation in each family. For instance, for Adh, only one genus was included for each of the families Malvaceae, Vitaceae, Asteraceae, and Pinaceae. Our data suggest duplications of Adh within the Scroph II clade similar to those inferred in the genus Gossypium (Small and Wendel 2000) and in the Brassicaceae (Koch, Haubold, and Mitchell-Olds 2000).
Orthology Among Scrophulariaceae GenBank Sequences
To obtain additional evidence on sequence divergence between plants related to our study species, we also searched GenBank for pairs of homologous nuclear genes in taxa from the former Scrophulariaceae. For any given gene, the divergence at synonymous sites, Ks (and perhaps also the nonsynonymous divergence, Ka), between orthologous loci should reflect the relationships between the species being compared: more distantly related species should have larger Ks and Ka values than more closely related ones. Only seven gene pairs were found (see Materials and Methods). The species from which the comparison sequences are available are not in the same clade as Antirrhinum and Digitalis (Veronicaceae, according to Olmstead et al. 2001). No phylogenetic data are available for Asarina or Craterostigma. Striga and Paulownia belong to lineages separate from both Scroph I and Scroph II (they are assigned, respectively, to Orobanchaceae and Paulowniae by Olmstead et al. 2001). Four of the genes compared between the different species (ACS1, PHYA, GAPDH, and the MADS-box transcription factor; fig. 5) clearly cannot be orthologs: their Ks values between different species of the former Scrophulariaceae are as high as, or higher than, those between these species and members of other plant families. For the other three genes (Chs, TFNS5, and SUT1), substitution at silent sites is saturated between the species compared, and Ks values are as high as those for two of the first four genes. It is therefore very unlikely that any of these pairs are orthologous. Again, these results indicate that gene families are important in the genomes of these species; otherwise, orthologs would be found.
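As a rough illustration of why saturated silent sites carry no usable signal, the sketch below applies the Jukes and Cantor (1969) correction, d = -(3/4) ln(1 - 4p/3), to a proportion p of differing sites. As p approaches 0.75, the value expected for unrelated sequences under this model, the corrected distance grows without bound, so pairs near saturation cannot be ranked by Ks. The helper `inconsistent_with_orthology` is a hypothetical encoding of the argument in the preceding paragraph (orthologs from two species of the same plant family should be less diverged than either is from a member of another family). Both function names and all input values are illustrative assumptions, not data or procedures from this study, and the correction is applied here to a raw proportion rather than through any particular Ks estimation method.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor (1969) distance for a proportion p of differing sites.

    Returns math.inf when 1 - 4p/3 <= 0, i.e. when the sites are saturated
    and no finite distance can be estimated.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:
        return math.inf
    return -0.75 * math.log(arg)


def inconsistent_with_orthology(ks_within_family, ks_to_other_family):
    """Hypothetical check: a within-family Ks at least as large as the Ks to
    a member of another plant family argues against orthology."""
    return ks_within_family >= ks_to_other_family


# Illustrative (made-up) proportions of differing silent sites
for p in (0.1, 0.5, 0.7, 0.74, 0.75):
    print(f"p = {p:.2f}  ->  d = {jukes_cantor(p):.3f}")
```

Running the loop shows d rising steeply as p nears 0.75 and becoming infinite at 0.75, which is why Ks values for saturated pairs say little about relatedness.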
Low Divergence Values Between the Distantly Related A. majus and Verbascum
Table 2 summarizes the new results presented here, together with our previous results for the same taxa, excluding the alcohol dehydrogenases, for which we were unable to determine orthology. In addition, we found identical sequences for five fil1-like genes in A. majus subsp. cirrhigerum and D. purpurea, as well as for one of the cyc-like genes in these two species (cyc4: one sequence from A. majus subsp. cirrhigerum and two from Misopates orontium were identical to the Digitalis cyc4 sequence; the cyc-like genes in which sequence differences were found may not be single loci; see Vieira, Vieira, and Charlesworth 1999).
Recent gene duplication cannot explain our findings of highly similar sequences in such distant relatives as Antirrhinum and Verbascum. Concerted evolution due to gene conversion can retard divergence among paralogous sequences within a genome. This could explain the similarities we find among some of the genes within species (Vieira, Vieira, and Charlesworth 1999) but, because conversion acts only between sequences in the same genome, it should not affect divergence of orthologous genes in different species (Ohta 1981, 1984; Nagylaki and Petes 1982; Arnheim 1983).
In the case of the intronless cyc-like genes, only the coding region was analyzed, and it was therefore possible that the low diversity and divergence observed were due to an unusually high level of selective constraint on the coding sequences (Vieira, Vieira, and Charlesworth 1999). An unusual level of purifying selection cannot, however, readily explain the low diversity and divergence of the genes studied here, since similar results were obtained for introns; nor can it explain the results for the fil1-like genes, for which our sequences include introns and the 3' noncoding regions (Vieira and Charlesworth 2001).
In Drosophila and other species, synonymous and nonsynonymous divergence rates are positively correlated (reviewed in Dunn, Bielawski, and Yang 2001), so our results could possibly be explained by strong constraints on the amino acid sequences of the genes studied, accompanied by correspondingly low divergence at synonymous sites. However, as already mentioned, the heterogeneity of divergence values between our plant taxa is much more extreme than that observed between Drosophila taxa whose synonymous-site divergence is similar to that expected from the chloroplast gene divergence between our study taxa.
Codon usage bias is also unlikely to explain the low divergence values we found. Most of the Antirrhinum genes we have studied, including those described here, have low codon usage bias: their ENC values (Wright 1990) are above 48 (Vieira and Charlesworth 2001), whereas ENC ranges from 20, when a single codon is used for each amino acid, to 61, when all synonymous codons are used equally. There is therefore no evidence suggesting severe constraints of any kind, although codon usage analyses will not detect every type of constraint. For the fil1 genes, we also found no evidence for constraints imposed by mRNA structure (Vieira and Charlesworth 2001).
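The ENC statistic cited above can be computed with Wright's (1990) formula. The sketch below is a minimal Python version under stated assumptions: it uses the standard genetic code (NCBI translation table 1), ignores stop codons and the single-codon amino acids Met and Trp, and returns None when a degeneracy class cannot be estimated. For each amino acid with n >= 2 observed codons, homozygosity is F = (n * sum(p_i^2) - 1)/(n - 1), where p_i are the frequencies of its synonymous codons; F values are averaged within the 2-, 3-, 4-, and 6-fold classes, and ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61. The names `SYN`, `homozygosity`, and `effective_number_of_codons` are illustrative; published programs handle rare amino acids and missing classes in somewhat different ways, and nothing here should be read as the procedure behind the values cited in the text.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases in
# T, C, A, G order, so the string lists the amino acids for TTT, TTC, TTA, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group sense codons by amino acid (stop codons '*' excluded)
SYN = {}
for codon, aa in CODE.items():
    if aa != "*":
        SYN.setdefault(aa, []).append(codon)


def homozygosity(counts):
    """Wright's F for one amino acid, or None if fewer than two codons were seen."""
    n = sum(counts)
    if n < 2:
        return None
    return (n * sum((c / n) ** 2 for c in counts) - 1) / (n - 1)


def effective_number_of_codons(seq):
    """Wright's (1990) ENC for an in-frame coding sequence (sketch)."""
    seq = seq.upper()
    usage = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    by_class = {2: [], 3: [], 4: [], 6: []}
    for codons in SYN.values():
        k = len(codons)
        if k == 1:  # Met and Trp carry no information on synonymous usage
            continue
        f = homozygosity([usage[c] for c in codons])
        if f is not None:
            by_class[k].append(f)
    mean_f = {k: sum(v) / len(v) for k, v in by_class.items() if v}
    # If Ile (the only 3-fold amino acid) is absent, average F2 and F4
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    # Give up if a class is missing or F is zero (too few codons to estimate)
    if len(mean_f) < 4 or any(f <= 0 for f in mean_f.values()):
        return None
    nc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(nc, 61.0)  # ENC is conventionally capped at 61
```

A sequence that uses a single codon for every amino acid gives the minimum of 20, whereas one that uses all synonymous codons equally often reaches the cap of 61.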
Another possible explanation for low divergence of nuclear gene sequences is hybridization and introgression in an ancestor of Antirrhinum and Digitalis, resulting in some nuclear genes of these species being similar to those of Verbascum, even though different chloroplast gene sequences are retained. The New World cotton Gossypium gossypioides has A genome (Old World) nuclear ribosomal DNA sequences, although chloroplast restriction sites and other evidence clearly group it with other New World species, suggesting such an event (Wendel, Schnabel, and Seelanan 1995); and in Heuchera and Tellima, populations have been found with similar allozymes but differing chloroplast genomes (Soltis and Kuzoff 1996). However, the chloroplast gene differences between our species are large compared with those between the taxa involved in these examples (estimated to be about 1% for New and Old World cottons; Wendel 1989) or in other documented plant hybrids (Sang, Crawford, and Stuessy 1995; Rieseberg and Carney 1998), making genetic exchange less plausible. Moreover, the low nuclear sequence divergence between Antirrhinum and Digitalis remains puzzling if the putative hybridization event is quite old. If this is the explanation for the incongruity, the relationships of these taxa may need to be reevaluated.
Another possible explanation for the low divergence observed for the fil1, fil2, far, and globosa genes between Antirrhinum and Verbascum is a low rate of nucleotide substitution. Substitution rate estimates for the cyc-like genes (Vieira, Vieira, and Charlesworth 1999) and fil1A genes (Vieira and Charlesworth 2001) are lower than most other estimates for nuclear genes in monocotyledons (Wolfe, Sharp, and Li 1989; Gaut et al. 1996) and dicotyledons (Small, Ryburn, and Wendel 1999; Small and Wendel 2000). Moreover, these estimates assumed implausibly recent divergence times, so they are, if anything, overestimates. On the same basis, at least four of the genes studied here must have similarly low substitution rates.
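The point about implausibly recent origins follows from the usual molecular-clock relation r = K/(2T), where K is the divergence per site between two lineages and T is the time since they split; the factor 2 reflects changes accumulating along both lineages. Because r is inversely proportional to T, assuming too recent a split inflates r. The function name and the Ks and T values below are hypothetical illustrations, not estimates from this study.

```python
def substitution_rate(k, t_years):
    """Substitutions per site per year, r = K / (2T), for two lineages split T years ago."""
    if t_years <= 0:
        raise ValueError("divergence time must be positive")
    return k / (2.0 * t_years)


# Hypothetical Ks of 0.05 under two assumed split times
ks = 0.05
for t in (20e6, 40e6):
    print(f"T = {t / 1e6:.0f} Myr  ->  r = {substitution_rate(ks, t):.2e}")
```

Halving the assumed split time doubles the implied rate, so a rate computed with a deliberately young split time is an upper bound on the true rate.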
At present, it remains unclear why such low divergence is found between Antirrhinum and Verbascum genes. Many of the genes we have studied are involved in flower development, and it is possible that they evolve much more slowly than other genes (Purugganan 1998). Alternatively, the low divergence may be related to their membership in large gene families. Our data on Adh differ from the results for the other loci: either there is more gene turnover, so that (unlike for the other loci) orthologs are rarely found, or the Adh sequences evolve faster than those of the other loci. More comparative studies using nondevelopmental genes, including allozyme loci, are needed to clarify the situation.
| Acknowledgements |
|---|
We thank Jorge Vieira for helpful comments on the manuscript. C.P.V. was supported by the Commission of the European Communities (Grant ERBFMBICT 972455), and D.C. was supported by an NERC Senior Research Fellowship.
| Footnotes |
|---|
Brandon Gaut, Reviewing Editor
1 Present address: Instituto de Biologia Molecular e Celular, Universidade do Porto, Porto, Portugal.
Keywords: Adh, Antirrhinum, far, fil2, globosa, Verbascum
2 Address for correspondence and reprints: Deborah Charlesworth, Institute of Cell Animal and Population Biology, University of Edinburgh, Ashworth Laboratories, King's Buildings, West Mains Road, Edinburgh EH9 3JT, United Kingdom. deborah.charlesworth@ed.ac.uk.
| References |
|---|
Altschul S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, D. J. Lipman, 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs Nucleic Acids Res 25:3389-3402
Arnheim N., 1983 Concerted evolution of multigene families Pp. 38-61 in M. Nei and R. K. Koehn, eds. Evolution of genes and proteins. Sinauer, Sunderland, Mass
Clegg M. T., M. P. Cummings, M. L. Durbin, 1997 The evolution of plant nuclear genes Proc. Natl. Acad. Sci. USA 94:7791-7798
Crawford D. J., 1989 Enzyme electrophoresis and plant systematics Pp. 146-164 in D. E. Soltis and P. S. Soltis, eds. Isozymes in plant biology. Dioscorides Press, Portland, Oreg
Cubas P., C. Vincent, E. Coen, 1999 An epigenetic mutation responsible for natural variation in floral symmetry Nature 401:157-161
Davies B., P. Motte, E. Keck, H. Saedler, H. Sommer, Z. Schwarz-Sommer, 1999 PLENA and FARINELLI: redundancy and regulatory interactions between two Antirrhinum MADS-box factors controlling flower development EMBO J 18:4023-4034
Dunn K. A., J. P. Bielawski, Z. Yang, 2001 Substitution rates in Drosophila nuclear genes: implications for translational selection Genetics 157:295-305
Durbin M. L., B. McCaig, M. T. Clegg, 2000 Molecular evolution of the chalcone synthase multigene family in the morning glory genome Plant Mol. Biol 42:79-92
Elisens W. J., 1992 Genetic divergence in Galvezia (Scrophulariaceae): evolutionary and biogeographic relationships among South American and Galapagos species Am. J. Bot 79:198-206
Elisens W. J., D. J. Crawford, 1988 Genetic variation and differentiation in the genus Mabrya (Scrophulariaceae-Antirrhineae): systematic and evolutionary inferences Am. J. Bot 75:85-96
Elisens W. J., A. D. Nelson, 1993 Morphological and isozyme divergence in Gambelia (Scrophulariaceae): species delimitation and relationships Syst. Bot 18:454-468
Futamura N., H. Mori, H. Kouchi, K. Shinohara, 2000 Male flower-specific expression of genes for polygalacturonase, pectin methylesterase and beta-1,3 glucanase in a dioecious willow (Salix gilgiana Seemen) Plant Cell Physiol 41:16-26
Gaut B. S., 1998 Molecular clocks and nucleotide substitution rates in higher plants Evol. Biol 30:93-120
Gaut B. S., B. R. Morton, B. C. McCaig, M. T. Clegg, 1996 Substitution rate comparisons between grasses and palms: synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL Proc. Natl. Acad. Sci. USA 93:10274-10279
Gaut B. S., A. S. Peek, B. R. Morton, M. T. Clegg, 1999 Patterns of genetic diversification within the Adh gene family in the grasses (Poaceae) Mol. Biol. Evol 16:1086-1097
Hadfield K. A., A. B. Bennett, 1998 Polygalacturonases: many genes in search of a function Plant Physiol 117:337-343
Harrison B. J., L. A. Darby, 1955 Unilateral hybridization Nature 176:982
Ingram G. C., S. Doyle, R. Carpenter, E. A. Schultz, R. Simon, E. S. Coen, 1997 Dual role for fimbriata in regulating floral homeotic genes and cell division in Antirrhinum EMBO J 16:6521-6534
Jukes T. H., C. R. Cantor, 1969 Evolution of protein molecules P. 21 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York
Koch M., B. Haubold, T. Mitchell-Olds, 2000 Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis and related genera (Brassicaceae) Mol. Biol. Evol 17:1483-1498
Kramer E. M., R. L. Dorit, V. F. Irish, 1998 Molecular evolution of genes controlling petal and stamen development: duplication and divergence within the APETALA3 and PISTILLATA MADS-box gene lineages Genetics 149:765-783
Kumar S., K. Tamura, M. Nei, 1994 MEGA: molecular evolutionary genetics analysis software for microcomputers Comput. Appl. Biosci 10:189-191
Mateu-Andres I., 1999 Allozymic variation and divergence in three species of Antirrhinum L. (Scrophulariaceae-Antirrhineae) Bot. J. Linn. Soc 131:187-199
Mateu-Andres I., J. G. Segarra-Moragues, 2000 Population subdivision and genetic diversity in two narrow endemics of Antirrhinum L Mol. Ecol 9:2081-2087
Mather K., 1947 Species crosses in Antirrhinum. I. Genetic isolation of the species majus, glutinosum, and orontium Heredity 1:175-186
Meyers B. C., A. W. Dickerman, R. W. Michelmore, S. Sivaramakrishnan, B. W. Sobral, N. D. Young, 1999 Plant disease resistance genes encode members of an ancient and diverse protein family within the nucleotide-binding superfamily Plant J 20:317-332
Morton B. R., B. S. Gaut, M. T. Clegg, 1996 Evolution of alcohol dehydrogenase genes in the palm and grass families Proc. Natl. Acad. Sci. USA 93:11735-11739
Nagylaki T., T. D. Petes, 1982 Intrachromosomal gene conversion and the maintenance of sequence homogeneity among repeated genes Genetics 100:315-337
Nei M., 1975 Molecular population genetics and evolution North Holland Press, Amsterdam
Oberholzer V., M. L. Durbin, M. T. Clegg, 2000 Comparative genomics of chalcone synthase and Myb genes in the grass family Genes Genet. Syst 75:1-16
Ohta T., 1981 Genetic variation in small multigene families Genet. Res 37:133-149
Ohta T., 1984 Some models of gene conversion for treating the evolution of multigene families Genetics 106:517-528
Olmstead R. G., C. W. DePamphilis, A. D. Wolfe, N. D. Young, W. J. Elisens, P. A. Reeves, 2001 Disintegration of the Scrophulariaceae Am. J. Bot 88:348-361
Olmstead R. G., P. Reeves, 1995 Evidence for the polyphyly of the Scrophulariaceae based on chloroplast rbcL and ndhF sequences Ann. Mo. Bot. Gard 82:176-193
Pan Q., Y.-S. Liu, O. Budai-Hadrian, M. Sela, L. Carmel-Goren, D. Zamir, R. Fluhr, 2000 Comparative genetics of nucleotide binding site-leucine rich repeat resistance gene homologues in the genomes of two dicotyledons: tomato and Arabidopsis Genetics 155:309-322
Purugganan M. D., 1998 The molecular evolution of development BioEssays 20:700-711
Rieseberg L. H., S. E. Carney, 1998 Plant hybridization New Phytol 140:599-624
Ritland K., 1989 Genetic differentiation, diversity and inbreeding in the mountain monkeyflower (Mimulus caespitosus) of the Washington Cascades Can. J. Bot 67:2017-2024
Rothmaler W., 1956 Taxonomische Monographie der Gattung Antirrhinum Feddes Rep 136:1-134
Rounsley S. D., G. S. Ditta, M. F. Yanofsky, 1995 Diverse roles for MADS box genes in Arabidopsis development Plant Cell 7:1259-1269
Rozas J., R. Rozas, 1999 DnaSP version 3.0: an integrated program for molecular population genetics and molecular evolution analysis Bioinformatics 15:174-175
Sang T., D. J. Crawford, T. F. Stuessy, 1995 Documentation of reticulate evolution in peonies (Paeonia) using internal transcribed spacer sequences of nuclear ribosomal DNA: implications for biogeography and concerted evolution Proc. Natl. Acad. Sci. USA 92:6813-6817
Schoen D. J., A. H. D. Brown, 1991 Intraspecific variation in population gene diversity and effective population size correlates with the mating system in plants Proc. Natl. Acad. Sci. USA 88:4494-4497
Small R. L., J. A. Ryburn, J. F. Wendel, 1999 Low levels of nucleotide diversity at homoeologous Adh loci in allotetraploid cotton (Gossypium L.) Mol. Biol. Evol 16:491-501
Small R. L., J. F. Wendel, 2000 Copy number lability and evolutionary dynamics of the Adh gene family in diploid and tetraploid cotton Genetics 155:1913-1926
Soltis D. E., R. K. Kuzoff, 1996 Discordance between nuclear and chloroplast phylogenies in the Heuchera group Evolution 49:727-742
Soltis P. S., D. E. Soltis, 2000 Contributions of plant molecular systematics to studies of molecular evolution Plant Mol. Biol 42:45-75
Soltis P. S., D. E. Soltis, M. W. Chase, 1999 Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology Nature 402:402-404
Soltis D. E., P. S. Soltis, D. L. Nickrent, et al. (16 co-authors) 1997 Angiosperm phylogeny inferred from 18S ribosomal DNA sequences Ann. Mo. Bot. Gard 84:1-49
Steinmayr M., P. Motte, H. Sommer, H. Saedler, Z. Schwarz-Sommer, 1994 FIL2, an extracellular Leucine-Rich Repeat protein, is specifically expressed in Antirrhinum flowers Plant J 5:459-467
Theissen G., A. Becker, A. Di Rosa, A. Kanno, J. T. Kim, T. Munster, K. U. Winter, H. Saedler, 2000 A short history of MADS-box genes in plants Plant Mol. Biol 42:115-149
Thompson J., T. J. Gibson, F. Plewniak, F. Jeanmougin, D. G. Higgins, 1997 The ClustalX window interface: flexible strategies for multiple sequence alignment aided by quality analysis tools Nucleic Acids Res 25:4876-4882
Torki M., P. Mandaron, F. Thomas, F. Quigley, R. Mache, D. Falconet, 1999 Differential expression of a polygalacturonase gene family in Arabidopsis thaliana Mol. Gen. Genet 261:948-952
Tröbner W., L. Ramirez, P. Motte, I. Hue, P. Huijser, W. E. Lonnig, H. Saedler, H. Sommer, Z. Schwarz-Sommer, 1992 GLOBOSA: a homeotic gene which interacts with DEFICIENS in the control of Antirrhinum floral organogenesis EMBO J 11:4693-4704
Vieira C. P., D. Charlesworth, 2001 Low diversity and divergence in the fil1 gene family of Antirrhinum (Scrophulariaceae) J. Mol. Evol 52:171-181
Vieira C. P., J. Vieira, D. Charlesworth, 1999 Evolution of the cycloidea gene family in Antirrhinum and Misopates Mol. Biol. Evol 16:1474-1483
Wendel J. F., 1989 New world tetraploid cottons contain old-world cytoplasm Proc. Natl. Acad. Sci. USA 86:4132-4136
Wendel J. F., A. Schnabel, T. Seelanan, 1995 An unusual ribosomal DNA-sequence from Gossypium gossypioides reveals ancient, cryptic, intergenomic introgression Mol. Phylogenet. Evol 4:298-313




