Molecular Biology and Evolution 19:1181-1197 (2002)
© 2002 Society for Molecular Biology and Evolution
Integrating Genomics, Bioinformatics, and Classical Genetics to Study the Effects of Recombination on Genome Evolution
Department of Ecology and Evolutionary Biology, University of Arizona, Tucson
| Abstract |
|---|
This study presents compelling evidence that recombination significantly increases the silent GC content of a genome in a selectively neutral manner, resulting in a highly significant positive correlation between recombination and "GC3s" in the yeast Saccharomyces cerevisiae. Neither selection nor mutation can explain this relationship. A highly significant GC-biased mismatch repair system is documented for the first time in any member of the Kingdom Fungi. Much of the variation in the GC3s within yeast appears to result from GC-biased gene conversion. Evidence suggests that GC-biased mismatch repair exists in numerous organisms spanning six kingdoms. This transkingdom GC mismatch repair bias may have evolved in response to a ubiquitous AT mutational bias. A significant positive correlation between recombination and GC content is found in many of these same organisms, suggesting that the processes influencing the evolution of the yeast genome may be a general phenomenon. Nonrecombining regions of the genome and nonrecombining genomes would not be subject to this type of molecular drive. It is suggested that the low GC content characteristic of many nonrecombining genomes may be the result of three processes (1) a prevailing AT mutational bias, (2) random fixation of the most common types of mutation, and (3) the absence of the GC-biased gene conversion which, in recombining organisms, permits the reversal of the most common types of mutation. A model is proposed to explain the observation that introns, intergenic regions, and pseudogenes typically have lower GC content than the silent sites of corresponding open reading frames. This model is based on the observation that the greater the heterology between two sequences, the less likely it is that recombination will occur between them. 
According to this "Constraint" hypothesis, the formation and propagation of heteroduplex DNA is expected to occur, on average, more frequently within conserved coding and regulatory regions of the genome. In organisms possessing GC-biased mismatch repair, this would enhance the GC content of these regions through biased gene conversion. These findings have a number of important implications for the way we view genome evolution and suggest a new model for the evolution of sex.
| Introduction |
|---|
The genomes of warm-blooded vertebrates consist of large regions (>300 kb) of relatively homogeneous GC content termed isochores (Bernardi 1986
The hypotheses proposed to explain the heterogeneity in GC content are those that favor selection (Bernardi 1986, 2000; Charlesworth 1994), regional mutational biases (Sueoka 1962; Filipski 1987; Wolfe, Sharp, and Li 1989; Gu and Li 1994; Francino and Ochman 1999), or biased gene conversion (BGC) (Brown and Jiricny 1989; Holmquist 1992; Eyre-Walker 1993; Charlesworth 1994; Galtier et al. 2001). This study presents evidence in favor of the BGC model, according to which GC-biased mismatch repair results in GC-biased gene conversion within the heteroduplexes formed during recombination. Over an evolutionary time scale, these processes result in a positive relationship between recombination and GC content.
Positive Correlation Between Recombination and GC Content
Positive correlations have been found between recombination and GC content in humans (Ikemura and Wada 1991; Eyre-Walker 1993; Eisenbarth et al. 2000, 2001; Fullerton, Bernardo Carvalho, and Clark 2001; Galtier et al. 2001; unpublished data), birds (Hurst, Brunton, and Smith 1999; Galtier et al. 2001; unpublished data), rodents (Williams and Hurst 2000), worms (Marais, Mouchiroud, and Duret 2001), insects (Marais, Mouchiroud, and Duret 2001; Takano-Shimizu 2001), and plants (unpublished data).
Recombination and GC Content in the Yeast S. cerevisiae
Gerton et al. (2000) used DNA microarrays, in an elegant and pioneering study, to map the relative rate of recombination throughout the S. cerevisiae genome at a resolution of about 12 kb. This study revealed a genomewide correlation between recombination and total GC content, a relationship that had previously been observed only on chromosome III (Sharp and Lloyd 1993). When the GC content within a 5-kb window was examined, there was a total of 221 GC peaks in which the GC content was >3% higher than the chromosome mean (Gerton et al. 2000). There was a total of 177 recombination hot spots. If these were distributed at random, one would expect 18 hot spots within 2.5 kb of a peak; however, there were 99 peak-associated hot spots (P < 0.001) (Gerton et al. 2000). The GC content of a 5-kb window with its center at the middle of each hot spot exceeded the mean GC content of the chromosome in 162 of 177 hot spots (P < 0.001). A significant correlation between the ranking of the hot spots and their GC content was also found (P < 0.002) (Gerton et al. 2000).
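To give a sense of how extreme the 162-of-177 result is, the following minimal sketch computes an exact one-sided sign test. It assumes, purely for illustration, that under the null hypothesis each hot spot is equally likely to lie above or below the chromosome mean; the text does not state which test Gerton et al. (2000) used, and the function name is hypothetical.

```python
from math import comb


def sign_test_upper_tail(successes: int, trials: int) -> float:
    """Exact one-sided binomial probability P(X >= successes) with p = 0.5."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    # Sum the binomial coefficients in the upper tail, then divide by 2**trials.
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return tail / 2 ** trials


# 162 of 177 hot spots had a window GC content above the chromosome mean.
p_value = sign_test_upper_tail(162, 177)
print(f"P(X >= 162 | n = 177, p = 0.5) = {p_value:.3g}")
```

Under this assumed null model the probability is far below the P < 0.001 threshold quoted above.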
Because the total GC content of a gene is not particularly sensitive to mutational or substitutional biases, an analysis designed to provide more power to determine the relationship between recombination and GC content was undertaken. This study made use of the fact that silent GC content ("GC3s") is the most sensitive measure of mutational or substitutional biases.
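As an illustration of the GC3s measure, the sketch below computes the G + C fraction at the third position of codons that have synonymous alternatives. It assumes the commonly used definition that excludes Met (ATG), Trp (TGG), and the three stop codons under the standard genetic code; the text itself only describes GC3s as silent GC content, so this definition and the function name are assumptions.

```python
# Codons with no synonymous alternative (Met, Trp) plus stop codons,
# assuming the standard genetic code.
EXCLUDED_CODONS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc3s(orf: str) -> float:
    """Fraction of G or C at the third position of synonymous codons."""
    seq = orf.upper().replace("U", "T")
    usable_length = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable_length, 3)]
    # Keep only unambiguous codons that have at least one synonym.
    usable = [c for c in codons
              if c not in EXCLUDED_CODONS and set(c) <= set("ACGT")]
    if not usable:
        raise ValueError("sequence contains no synonymous codons")
    return sum(c[2] in "GC" for c in usable) / len(usable)


# Toy ORF: ATG GCC AAA TGG TTC TAA -> usable codons GCC, AAA, TTC
print(gc3s("ATGGCCAAATGGTTCTAA"))
```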
| Methods |
|---|
Two different methods were used to analyze the relationship between recombination and GC content in the yeast genome. In the first, the correlation between recombination and GC content was determined directly using the data from Gerton et al. (2000)
khwolfe/yeast/), which was searched (using the Smith-Waterman algorithm [Pearson 1991
The second major approach used in this study was to analyze mismatch repair data from 12 studies to determine whether there was any repair bias. Six of these studies involved mismatch repair of heteroduplex plasmid DNA in mitotic wild-type strains. These studies used plasmids containing a reporter gene having a defined mismatch in a defined orientation. The correction of this mismatch could give rise to one of two visibly distinct colony phenotypes, depending upon the direction of correction. A proximally located nick can confer a very slight, though sometimes significant, bias in repair to the nicked strand (i.e., the base on the nicked strand is replaced) (Yang et al. 1999). If the nick is 3.5 kb or further away from the mismatch, it does not bias mismatch repair in favor of either strand (Bishop, Andersen, and Kolodner 1989; Yang et al. 1999). In five of the studies analyzed, the nick was positioned at least 3.5 kb away from the mismatch (Kramer et al. 1989; Kunz, Kang, and Kohalmi 1991; Kang and Kunz 1992; Yang et al. 1996, 1999). In the sixth study, the plasmid was ligated such that between 70% and 95% of all plasmids were covalently closed circles (Bishop, Andersen, and Kolodner 1989). Only corrections of heteromismatches (e.g., G/T, A/C, etc.) in wild-type strains were considered. Throughout this article, the terms "GC" and "AT" are used to refer to G/C or C/G and A/T or T/A base pairs, respectively. Mismatches involving the same two nucleotides in opposite orientations (e.g., G/T and T/G) were pooled within the same experiment, but the pooling of the observations between experiments was determined to be statistically inappropriate by means of a heterogeneity chi-square test (Zar 1984, pp. 49-52). Another six studies were analyzed to determine the direction of repair of meiotic-induced heteroduplexes in the HIS4 recombination hot spot. In this analysis, segregation ratios of 6:2, 2:6, 7:1, and 1:7 were counted as one conversion event, whereas segregation ratios of 8:0 and 0:8 were counted as two conversion events.
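The counting rule for meiotic segregation ratios can be sketched as a small lookup. The treatment of ratios not listed in the text (for example 4:4 or 5:3) as contributing zero events is an assumption made here for illustration, as are the names used.

```python
# Conversion events contributed by each segregation ratio, as stated in the
# text: 6:2, 2:6, 7:1, and 1:7 count once; 8:0 and 0:8 count twice.
EVENTS_PER_RATIO = {
    (6, 2): 1, (2, 6): 1, (7, 1): 1, (1, 7): 1,
    (8, 0): 2, (0, 8): 2,
}


def count_conversion_events(ratios):
    """Total conversion events across an iterable of (a, b) segregation ratios.

    Ratios not listed in EVENTS_PER_RATIO contribute nothing (an assumption).
    """
    return sum(EVENTS_PER_RATIO.get(tuple(ratio), 0) for ratio in ratios)


# Hypothetical tetrads: 6:2 (one event), 8:0 (two), 4:4 (none), 1:7 (one).
print(count_conversion_events([(6, 2), (8, 0), (4, 4), (1, 7)]))
```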
Global recombinational data were obtained from the data of Gerton et al. (2000) at http://derisilab.uscf.edu/hotspots/. The yeast coding and intergenic sequences were obtained from the Stanford Genome database at http://genome-www.stanford.edu/Saccharomyces/. The ORF nucleotide content was determined using the General Codon Usage Analysis package available at http://www.bioinf.org/vibe/software/gcua/download.html (McInerney 1998). The analysis of codon usage bias and intergenic and intronic GC contents was performed using the Molecular Evolutionary Analysis Package (Version 6/22/00) kindly provided by Etsuko Moriyama (emoriyama2@unlnotes.unl.edu).
The GenBank sequences of the S. cerevisiae intron containing ORFs along with the sequences of their introns were kindly provided by Francis Clarke at http://www.maths.uq.edu.au/
fc/datasets/. Redundant sequences were removed using CLEANUP (Grillo et al. 1996) available at http://bighost.area.ba.cnr.it/BIG/CleanUP/. The sequences of 697 yeast regulatory elements were obtained from the Promoter Database of Saccharomyces cerevisiae at http://cgsigma.cshl.org/jian/.
To examine the effects of BGC within a gene that has recently changed its recombinational environment, I analyzed substitutions within the Mus musculus Fxy gene. Coding sequences for human (AF035360), M. musculus (AF026565), M. spretus (AF186460), and Rattus norvegicus (AF186461) Fxy genes were aligned in Clustal X. The sequences were highly similar, and the alignments produced no gaps. Ancestral sequences were reconstructed by maximum likelihood using codeml, with the no-molecular-clock (unrooted tree) option, as implemented in PAML 3.0c (Yang 1997) (http://abacus.gene.ucl.ac.uk/software/paml.html). The number and direction of silent third position substitutions were compared with the expected number on the basis of the third position base composition of the inferred ancestral sequence.
All chi-square calculations used the Yates correction for continuity (Zar 1984, p. 48). Kolmogorov-Smirnov tests (Zar 1984, p. 91) were used to test for normality, and parametric tests were used when appropriate. The arcsine transformation was performed on proportional data (Zar 1984, p. 239). The measures of variance are standard errors unless otherwise stated. Tests of significance were two tailed, except for tests of correlation, for which one-tailed tests are appropriate (Zar 1984, p. 309). BLASTP searches (Altschul et al. 1997) for homologs of known GC-biased mismatch repair enzymes, such as Escherichia coli MutY protein and human TDG protein, were performed at http://www.ncbi.nlm.nih.gov/BLAST/, and tBLASTn searches were performed against a series of partially completed genomes at the TIGR BLAST site http://tigrblast.tigr.org/tgi/.
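For readers unfamiliar with the arcsine transformation applied to proportional data, a minimal sketch follows. It uses the usual form, the arcsine of the square root of the proportion, returned in radians; whether the original analysis worked in radians or degrees is not stated, and the function name is hypothetical.

```python
from math import asin, sqrt


def arcsine_transform(proportion: float) -> float:
    """Arcsine square-root transform of a proportion in [0, 1], in radians."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return asin(sqrt(proportion))


# Example: a GC3s of 37.0% expressed as a proportion before transformation.
print(arcsine_transform(0.37))
```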
| Results and Discussion |
|---|
Mismatch Repair Bias
To determine whether there is any evidence of a mismatch repair bias in S. cerevisiae, an analysis was made of the repair of heteroduplex DNA in mitotic cells. Of a total of 72,971 repaired heteromismatches, 42,242 (57.9%) were repaired to G/C or C/G, whereas only 30,729 (42.1%) were repaired to A/T or T/A (tables 1 and 2). This represents a highly significant GC bias according to a Wilcoxon test (performed on the number of mismatches corrected to GC vs. the number corrected to AT for all 50 experiments), Z = -5.09 (P = 3.6 × 10^-7). Out of the 50 experiments, 36 showed a significant repair bias toward GC, whereas only 3 showed a significant repair bias toward AT. This difference (36 to 3) is, itself, highly significant, χ² = 26.3 (P = 2.9 × 10^-7). The mean ratio of repair to GC versus AT for all 50 experiments is 1.48 ± 0.11 to 1. The GC repair bias was most pronounced for G/T mismatches, which exhibited a mean bias of 1.71 ± 0.338 to 1 (n = 15), followed by C/T mismatches (1.50 ± 0.14 to 1; n = 9), A/G mismatches (1.38 ± 0.13 to 1; n = 9), and A/C mismatches (1.31 ± 0.05 to 1; n = 17).
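The comparison of 36 GC-biased experiments against 3 AT-biased experiments can be reproduced with a Yates-corrected chi-square goodness-of-fit test. The sketch below assumes a null expectation of equal counts in the two classes and one degree of freedom, which is my reading of the test rather than something the text spells out; the function name is hypothetical.

```python
from math import erfc, sqrt


def yates_chi_square_two_classes(count_a: int, count_b: int):
    """Yates-corrected chi-square for two classes expected to be equal.

    Returns the statistic and its P value for 1 degree of freedom.
    """
    expected = (count_a + count_b) / 2
    chi2 = sum((abs(observed - expected) - 0.5) ** 2 / expected
               for observed in (count_a, count_b))
    # Survival function of the chi-square distribution with df = 1.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = yates_chi_square_two_classes(36, 3)
print(f"chi-square = {chi2:.1f}, P = {p_value:.2g}")

# The pooled proportion repaired to GC follows directly from the raw counts.
print(f"repaired to GC: {42242 / 72971:.1%}")
```

The statistic rounds to 26.3, in agreement with the value reported above, and the P value is of the same order as the one given.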
To determine whether distally located nicks (3.5 kb or further away from the mismatch) were responsible for the observed GC bias in repair, a signed rank test was performed on the number of mismatches corrected to GC versus AT for the 23 experiments having an A or a T on the nicked strand. This demonstrated a very significant GC bias, Z = -2.71 (P = 0.0067). The 23 experiments having a G or a C on the nicked strand had an even more significant GC bias, Z = -4.1 (P = 2.0 × 10^-5). Although there may be a slight repair bias to the strand possessing the distally located nick, it is clear that regardless of where the nick is located, there is a highly significant GC repair bias. In addition, there was no evidence of any significant difference in the relative efficiency of repair of different heteromismatches, as determined by a single factor ANOVA, F = 1.87, df = 7 (P = 0.10).
The mismatch repair studies analyzed in the preceding section (see Methods) all involved the repair of plasmid DNA in mitotically dividing cells. To determine whether similar repair biases exist in chromosomal DNA in meiotic cells, six studies involving a total of 2,148 informative gene conversion events were analyzed. Of these, 1,186 were corrected to GC or CG, whereas 962 were corrected to AT or TA (table 3). A comparison of the number of mismatches corrected in each direction revealed a significant GC bias, Wilcoxon Z = -2.04 (P = 0.041). The mean bias of all 15 experiments was 1.22 ± 0.10 to 1. Although the mean mitotic repair bias (1.48 ± 0.11) was greater than the mean meiotic repair bias (1.22 ± 0.10), the difference was not significant (Mann-Whitney Z = -1.76, P = 0.08).
These results provide the first evidence, in any fungus, of a significant GC mismatch repair bias. This bias is found in both meiosis and mitosis and suggests that S. cerevisiae may possess hitherto uncharacterized mismatch-specific thymine glycosylase and adenine glycosylase activities. Protein BLAST searches did not reveal any ORFs with significant homology to known mismatch-specific adenine or thymine glycosylases. I suggest that genes of as yet uncharacterized function may be responsible for the observed mismatch repair biases.
It is noteworthy that the relative repair biases associated with different mismatches in yeast (G/T > C/T > A/G > A/C) are the same as those found through the analysis of the simian mismatch repair data of Brown and Jiricny (1988) (i.e., G/T > C/T > A/G > A/C). The evolution of similar mismatch repair biases may have occurred as a response to similar underlying biological phenomena.
The fact that S. cerevisiae does not possess 5-methylcytosine (Proffitt et al. 1984) may be reflected in the differences between the relative repair biases of G/T mismatches in yeast and mammals. The mutagenic potential of 5-methylcytosine is well known (Coulondre et al. 1978; Duncan and Miller 1980), and mammals possess substantial quantities of 5-methylcytosine. Not surprisingly, mammalian cells have evolved very efficient mechanisms to repair G/T mismatches and show a highly significant bias in favor of GC over AT of 24 to 1 (Brown and Jiricny 1988). Compare this with the more modest 1.71 to 1 GC bias seen in yeast, which lacks 5-methylcytosine. I suggest that these differences may, in part, be attributable to the amount of 5-methylcytosine within the genomes of these two types of organisms and that, in general, GC mismatch repair biases will be found to be substantially greater in aerobic organisms possessing 5-methylcytosine.
Recombination versus GC Content
Within the 6,143 yeast ORFs analyzed, there is a highly significant positive correlation between silent GC content (GC3s) and recombination (fig. 1 and table 4). The mean GC content of first and second codon positions (GC1 + GC2)/2 also shows a significant, though much lower, correlation with recombination. The 100 hottest loci have a significantly greater GC3s (48.7% ± 0.80%) than the 100 coldest loci (34.85% ± 0.44%), Mann-Whitney Z = -10.554 (P = 4.8 × 10^-26).
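A rank correlation of the kind reported between GC3s and recombination can be sketched with the standard library alone. The implementation below computes Spearman's rho as the Pearson correlation of average ranks, which handles ties; the example values are invented for illustration and are not taken from the yeast data set, and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-ORF values: relative recombination rate and GC3s (%).
recombination = [0.8, 1.0, 1.1, 1.4, 2.3]
gc3s_percent = [34.1, 36.0, 35.2, 41.7, 48.9]
print(round(spearman_rho(recombination, gc3s_percent), 3))
```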
The GC3s is also highly correlated with recombination rate within the 10 sets of 50 loci (fig. 2) and increases monotonically with increasing levels of recombination for all groups except the last group, which contains the 50 hottest loci. This last group actually has a significantly lower GC3s than the preceding group of 50 (47.0 ± 1.13 vs. 50.4 ± 1.10, Mann-Whitney Z = -2.20, P = 0.028). If recombination is mutagenic and does have an AT bias, then in extremely recombinagenic loci, this mutational effect may slightly overcome the substitutional bias toward GC caused by BGC (see subsequent discussion).
The yeast genome contains hundreds of duplications, allowing a comparison of the GC3s of recombinationally hot loci with that of their recombinationally cool paralogs. This approach minimizes the effects of amino acid composition, selective constraints, and gene length on the observed GC3s and reveals that recombinationally hot loci have a significantly greater GC3s (45.5% ± 1.25%) than their nonhot paralogs (37.6% ± 0.85%), Wilcoxon Z = -4.889 (P = 1.0 × 10^-6). There is no significant difference, however, between the mean length of these coding sequences, Z = -0.645 (P = 0.52).
It is important to emphasize that the correlations described earlier in this article are not simply broad-ranging relationships seen over hundreds of kilobase pairs but rather occur at a fine scale. This can be visualized in figure 3, which shows a plot of GC3s versus recombination for chromosomes 1-3. As can be seen, GC3s frequently closely mirrors the relative recombination rates of individual ORFs, even over short distances encompassing two to four ORFs.
A significant positive correlation was found between the difference in recombinational activity of the hot loci and their nonhot paralogs versus the difference between the hot GC3s and the GC3s of their nonhot paralogs (fig. 4). The difference between the GC3s of the recombinationally active loci and their nonactive paralogs increases significantly as the GC3s of the recombinationally active locus increases (fig. 5). This relationship may reflect different lengths of time since duplication and different lengths of time spent in recombinationally hot and nonhot regions of the genome.
This second set of results demonstrates a highly significant, linear relationship between silent GC content and recombination in S. cerevisiae. There are four possible explanations for these observations: (1) GC content, per se, may stimulate recombination, (2) selection, (3) mutational bias, and (4) BGC. Each of these hypotheses will be discussed in turn, and evidence will be presented in favor of the fourth hypothesis.
GC Content May Stimulate Recombination
In yeast, most recombination-initiating double-strand breaks occur intergenically (Baudat and Nicholas 1997; Gerton et al. 2000), and it has been suggested that the location of these recombination hot spots may be determined, in part, by the GC richness of the adjacent ORFs (Gerton et al. 2000; Petes 2001). If the GC content of the ORFs drives recombination, then the total GC content of the ORFs should explain far more of the variation in recombination rates than GC3s alone. Contrary to this expectation, a Spearman correlation shows that GC3s explains 41% more of the variation in recombination rates and is 61 orders of magnitude more significant than the correlation between total GC content and recombination (table 4). This result is not compatible with a model in which the GC content of ORFs determines their recombination rates but is compatible with BGC.
Galtier et al. (2001) pointed out further evidence, derived from a study by Perry and Ashworth (1999), that recombination drives GC content and not the converse. In mammals, the pseudoautosomal region recombines at a high rate (Ellis and Goodfellow 1989; Blaschke and Rappold 1997). In humans, R. norvegicus, and M. spretus, the Fxy gene is located exclusively on the X chromosome (Perry and Ashworth 1999). In M. musculus, the Fxy gene was rearranged sometime within the past 3 Myr such that the 3' 1,248 nucleotides are now located in the pseudoautosomal region, leaving 756 nucleotides (GC3s = 63%) on the nonpseudoautosomal X (Ferris et al. 1983; Palmer et al. 1997). This rearrangement was followed by a dramatic increase in the GC content of the pseudoautosomally located Fxy (GC3s = 72%) (Perry and Ashworth 1999). The 5' region of this gene in both M. musculus and M. spretus is equally divergent from both the rat and human genes. However, in M. musculus, the recombinationally hot 3' end of this gene has experienced a 170-fold greater synonymous substitution rate than the homologous region of the same gene in M. spretus (Perry and Ashworth 1999).
To further demonstrate that recombination drives GC content, not the converse, I analyzed the direction of substitutions within the M. musculus and M. spretus Fxy genes. There were a total of 133 substitutions within M. musculus Fxy, and of these, 106 are silent third position changes. Of the 133 substitutions, 127 involved AT to GC substitutions, whereas only 2 involved GC to AT substitutions. In contrast, the M. spretus gene has a very low rate of substitution, with only one AT to GC and two GC to AT substitutions. A frequency distribution of the position of these substitutions (fig. 6) shows that the frequency of substitutions increases dramatically at the pseudoautosomal boundary. Within the pseudoautosomal region there were 105 silent third position changes. Of these, 102 were AT to GC, whereas none were in the opposite direction. On the basis of the ancestral GC3 content of the corresponding region (50.1%), the expected number of silent substitutions is 51 AT to GC and 51 GC to AT. The observed number is significantly different, χ² = 100.0 (P = 1.5 × 10^-23). These observations provide a vivid example of how an increase in the rate of recombination can dramatically increase silent GC content over an evolutionarily brief time span.
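The directional tally behind these counts can be sketched as follows. The code compares an inferred ancestral coding sequence with a derived one at third codon positions and classifies each difference by direction; it does not check whether a change is synonymous, which a full analysis of silent sites would require. The expected counts are taken as proportional to the ancestral third-position base composition, which is my reading of the procedure described in the Methods; all names and the toy sequences are hypothetical.

```python
def classify_third_position_changes(ancestral: str, derived: str):
    """Count third-position differences by direction (AT->GC, GC->AT, other)."""
    if len(ancestral) != len(derived) or len(ancestral) % 3 != 0:
        raise ValueError("sequences must be aligned, gap-free, and in frame")
    counts = {"AT_to_GC": 0, "GC_to_AT": 0, "other": 0}
    for i in range(2, len(ancestral), 3):  # third base of each codon
        old, new = ancestral[i].upper(), derived[i].upper()
        if old == new:
            continue
        if old in "AT" and new in "GC":
            counts["AT_to_GC"] += 1
        elif old in "GC" and new in "AT":
            counts["GC_to_AT"] += 1
        else:
            counts["other"] += 1  # A<->T, G<->C, or ambiguous characters
    return counts


def expected_direction_counts(total_changes: int, ancestral_gc3: float):
    """Expected AT->GC and GC->AT counts from ancestral GC3 (a proportion)."""
    at_to_gc = total_changes * (1.0 - ancestral_gc3)  # changes start at A or T
    gc_to_at = total_changes * ancestral_gc3          # changes start at G or C
    return at_to_gc, gc_to_at


# Toy alignment of three codons: A->G, A->G, and T->A at third positions.
print(classify_third_position_changes("GCAAAATTT", "GCGAAGTTA"))
# 102 directional changes with an ancestral GC3 of 50.1%, as in the text.
print(expected_direction_counts(102, 0.501))
```

With 102 directional changes and an ancestral GC3 of 50.1%, both expectations round to 51, matching the figures quoted above.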
Selection
It is very difficult to envision how selection could be responsible for the mouse Fxy data because it would imply an enormous mutational load (Galtier et al. 2001
= -0.022, Z = -1.73 (P = 0.42), or between recombination and CAI. A scatter plot reveals a distinctive L-shaped distribution, with genes having the highest CAI values being confined primarily to regions of lower recombination (fig. 7
). The 500 loci with the highest CAIs actually had lower mean array values than the 500 loci with the lowest CAIs (1.17 ± 0.014 vs. 1.22 ± 0.017), though this difference was not significant. These findings argue strongly against any of the selectionist hypotheses as an explanation for the correlation between GC3s content and recombination. The observation of a positive correlation between CAI and mRNA levels could be caused by the enhanced efficacy of selection brought about by a very large effective population size.
Mutational Bias
It has been suggested that the variation in the GC content could be explained by regional differences in mutational biases (Sueoka 1962
2 is 15.0 with a P-value of 0.0001 (J. A. Birdsell, unpublished data). Although the data are limited, they show a significant AT mutational bias. This is an important finding, especially if it turns out to be a general phenomenon. Such a bias would provide an even greater selective advantage to the evolution of the GC-biased mismatch repair systems. It must be pointed out that if recombination does have an AT mutational bias, this would in no way contradict the BGC model because the mutation rate is far lower than the rate of biased conversion.
I suggest that another, potentially serious, drawback to the suggestion that recombination causes a GC mutational bias is that in the presence of GC-biased mismatch repair, such a combination would have the potential to greatly increase the mutational load (Bengtsson 1990).
One final problem with the mutational bias hypotheses resides in the fact that the GC content of introns and intergenic regions is typically significantly lower than the GC3s of the genes in which they reside (Aota and Ikemura 1986; D'Onofrio et al. 1991; Clay et al. 1996; Hughes and Yeager 1997; Musto et al. 1999). This is incompatible with a mutational model (Hughes and Yeager 1997; Eyre-Walker 1999). The same holds true for yeast introns, which have a significantly lower GC content (33.8% ± 0.003%) than the GC3s of the corresponding ORFs (38.8% ± 0.005%), t = 10.4, df = 220 (P = 7.3 × 10^-21) (unpublished data). The mean size of these 228 introns (belonging to 221 ORFs) is 284 ± 14 bp. It is very unlikely that these very short introns could be subject to a different mutational pressure than the exons on either side of them. Identical arguments apply to the intergenic regions of yeast (averaging < 500 bp), which, for the genome as a whole, have an average GC content of 33.16% ± 0.063% as compared with a genomewide average GC3s of 37.00% ± 0.08%, Mann-Whitney Z = -31.6 (P = 3.7 × 10^-219).
Biased Gene Conversion
Brown and Jiricny (1989) pointed out that GC-biased mismatch repair could lead to an increase in the GC content of recombinationally active regions of the genome. The biased gene conversion (BGC) model does not attempt to explain the location or the reason for the occurrence of recombination initiation hot spots (i.e., the locations where double-strand breaks occur during meiosis). Rather, it explains the observation in numerous organisms of a significant positive correlation between recombination (i.e., regions of heteroduplex formation) and GC content.
The data presented in this paper are fully consistent with the operation of BGC in yeast. If BGC is a general phenomenon, then one should be able to demonstrate GC-biased mismatch repair in other organisms showing a positive relationship between recombination and GC content. On the basis of the evidence presented below, I suggest there is a transkingdom GC bias in mismatch repair systems. This would explain the correlations seen in numerous organisms between recombination and GC content. I have compiled evidence of GC-biased mismatch repair in organisms spanning six kingdoms (sensu Woese, Kandler, and Wheelis 1990; table 5) (unpublished data). It may not be surprising that these organisms appear to possess GC-biased mismatch repair because evidence suggests that many of them are subject to an AT-biased mutational pressure (J. A. Birdsell, unpublished data; table 5). Such a mutational pressure can result from a variety of fundamental processes, including the spontaneous deamination of cytosine to uracil or 5-methylcytosine to thymine (Coulondre et al. 1978; Duncan and Miller 1980), oxidative damage to cytosine (Kreutzer and Essigmann 1998) or guanine (Newcomb and Loeb 1998), or UV irradiation (Peng and Shaw 1996), all of which can result in GC to AT or TA mutations. The fact that virtually every organism in which a correlation has been found between recombination and GC content appears to possess a GC-biased mismatch repair system provides strong circumstantial evidence in favor of the BGC model. Recent articles by Eyre-Walker and Hurst (2001) and Galtier et al. (2001) provide additional support for the BGC model.
Potential Drawbacks of the BGC Model
There are several potential problems with the BGC model (Eyre-Walker 1999
The observation that introns have a lower GC content than the neighboring exons appears difficult to reconcile with the BGC model (Eyre-Walker 1999
The Constraint Model
The intergenic regions of many organisms have, on average, lower GC content than the silent GC content of ORFs on either side of them (Clay et al. 1996). Pseudogenes also usually have a lower GC content than their functional counterparts (Gojobori, Li, and Graur 1982; Li, Wu, and Luo 1984; Petrov and Hartl 1999). Here I propose a model, referred to as the Constraint hypothesis, which may explain the lower GC content of introns, intergenic regions, and pseudogenes. This model is based upon the observation that the greater the heterology between two sequences, the less likely it is that recombination will occur between them. This "antirecombinagenic" effect of sequence heterology has been well documented in prokaryotes (Shen and Huang 1989; Roberts and Cohan 1993; Vulic et al. 1997), yeast (Borts and Haber 1987; Datta et al. 1997; Chen and Jinks-Robertson 1999), and mammalian cells (Waldman and Liskay 1988; Lukacsovich and Waldman 1999), and mismatch repair enzymes have been shown to be responsible for preventing recombination between diverged sequences in a variety of organisms (Rayssiguier, Thaler, and Radman 1989; Borts et al. 1990; Chen and Jinks-Robertson 1999).
Nonregulatory, noncoding regions of the genome are under less selective constraint than regulatory or coding regions; therefore, they evolve more rapidly and have higher levels of polymorphism (Hughes and Yeager 1997; Shabalina et al. 2001). I suggest that, on average, heteroduplex formation and propagation should be expected to occur most frequently within conserved coding and regulatory regions of the genome. At the population level, these more conserved regions of the genome will possess lower levels of polymorphism, which, in an outcrossing organism, translates into less heterology within the individual. This could explain why the GC3s of coding regions and the GC content of regulatory regions (Babenko et al. 1999; unpublished data) are higher than the GC content of introns, intergenic regions, or pseudogenes. The analysis of sequences from 697 yeast regulatory elements (belonging to 99 different element types) shows that they have a significantly higher mean GC content (45.72 ± 0.66) than yeast intergenic regions as a whole (33.16 ± 0.06) (Mann-Whitney Z = -23.3; P = 4.4 × 10^-120). The mean GC content of these 99 types of regulatory element (47.47 ± 1.53) is also significantly greater than that of intergenic regions, Z = -11.9 (P = 1.2 × 10^-32). These 697 regulatory sequences also have a significantly higher GC content than the mean GC3s of 6,330 ORFs (37.00 ± 0.08), Z = -16.81 (P = 2.0 × 10^-63), as do the 99 types of element, Z = -9.1 (P = 9.0 × 10^-20).
The process proposed by the Constraint model would lead to a positive feedback loop in which selective constraint leads to increased rates of recombination, which in turn would enhance the efficacy of selection, thereby increasing the selective constraint. The Constraint hypothesis does not seek to explain the cause or location of recombination initiation hot spots. Rather, it seeks to point out that, given there is a recombination hot spot, heteroduplex formation and propagation will, on average, proceed from this hot spot into the conserved coding or regulatory regions more frequently than into nonconserved regions. An important implication of the Constraint hypothesis is that the large intergenic and intronic regions of organisms such as humans would not contribute proportionately to the genetic map size of such organisms. Further support for this Constraint model comes from a number of independent sources and will be presented in detail elsewhere.
| Conclusions |
|---|
A number of lines of evidence are presented in this paper in support of the BGC model. There is a highly significant positive correlation between recombination and silent GC content in the yeast S. cerevisiae. This relationship cannot be explained by any of the other models examined. Any model attempting to explain regional variations in GC content must explain not only the relationship between GC content and recombination but also the observation that GC3s is almost always higher than the GC content of introns, pseudogenes, or intergenic regions. The BGC model, in conjunction with the Constraint model, can do so. For the first time in any member of the fungal kingdom, a significant GC-biased mismatch repair system is found operating in both mitotic and meiotic cells. This repair bias may have evolved in response to the AT mutational bias to which S. cerevisiae is subjected. Much of the variation in the GC content within the yeast genome may therefore be a result of the interplay between AT-biased mutational pressure and GC-biased gene conversion.
Evidence suggests that a number of other organisms spanning several kingdoms may be subjected to similar processes. Virtually all organisms in which a correlation exists between recombination and GC content also appear to possess GC-biased mismatch repair. I suggest that this transkingdom GC bias in mismatch repair systems has evolved in response to a prevailing AT mutational bias resulting from fundamental properties of DNA.
Nonrecombining regions of the genome and nonrecombining genomes would not be subject to the molecular drive caused by BGC. I suggest that the low GC content, characteristic of nonrecombining genomes, may be the result of three processes: (1) a prevailing AT mutational bias, (2) random fixation of the most common types of mutation caused by genetic drift, and (3) the absence of GC-biased gene conversion which, in recombining organisms, would permit the reversal of the most common form of mutation.
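The interplay of these processes can be sketched with a standard two-state population genetic approximation, which is not derived in this paper. Under that approximation the equilibrium GC fraction at neutral sites is 1 / (1 + λe^(-B)), where λ is the ratio of GC-to-AT to AT-to-GC mutation rates and B is a scaled conversion bias favoring GC. The parameter names `at_bias` and `B` are hypothetical, and setting B to zero stands in for a nonrecombining genome.

```python
import math


def equilibrium_gc(at_bias, B=0.0):
    """Equilibrium GC fraction at neutral sites in a two-state model.

    at_bias: ratio of GC->AT to AT->GC mutation rates (>1 means AT-biased).
    B: scaled GC conversion bias (assumed parameter); 0 means no BGC,
       as expected for a nonrecombining region or genome.
    """
    return 1.0 / (1.0 + at_bias * math.exp(-B))


# Illustrative values only: a twofold AT mutational bias, with and without BGC.
gc_without_bgc = equilibrium_gc(2.0)        # mutation and drift alone
gc_with_bgc = equilibrium_gc(2.0, B=1.0)    # GC-biased conversion added
```

In this toy model an AT mutational bias alone drives GC content below 50%, and any positive B raises the equilibrium back toward GC, which is the qualitative behavior proposed above for recombining versus nonrecombining genomes.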
A model is presented to explain the observation that the GC content of introns, pseudogenes, and intergenic regions is almost always lower than the silent GC content of open reading frames. According to this Constraint model, heteroduplex formation and propagation are expected to occur, on average, more frequently within the regions of the genome that are under greater selective constraint, such as conserved regulatory and coding regions. The higher GC content of such conserved regions supports this view. In summary, I suggest that much of the variation in GC content seen in organisms spanning several kingdoms may be attributed, in part, to the interplay between a prevailing AT mutational pressure, recombination, GC-biased mismatch repair, and the antirecombinagenic effects of sequence heterology.
Because most point mutations are GC to AT events, recombination allows the most common form of mutation to be restored to wild type through the actions of GC-biased mismatch repair. In recombining organisms, mismatches occur through mutation as well as through recombination. In nonrecombining organisms, mismatches occur only through mutation. Recombination therefore provides mismatch repair enzymes multiple chances to repair the most common type of mutation. Nonrecombining organisms would not be afforded such opportunities. I suggest that this ability to resurrect wild-type alleles from mutant alleles would have powerful and immediate selective advantages through its potential to reduce both the number of mutations within the recombining genome and the mutational load of the outcrossing population. This Mutation Reversal model may explain, in part, the evolution of several forms of sexual recombination, including meiotic recombination and genetic transformation, and will be presented in detail elsewhere. For a comprehensive review of other contemporary models for the evolution of sex, see Birdsell and Wills (2001).
The findings presented here have implications for a variety of fields of research. With respect to DNA repair, it appears that S. cerevisiae may possess both a thymine and an adenine DNA glycosylase activity. No such enzymes have ever been characterized in this organism, and BLASTP searches of known thymine and adenine glycosylases against the S. cerevisiae genome have turned up no candidate loci, suggesting that genes of uncharacterized function are responsible for these mismatch repair activities.
Given the paucity of accurate data on the recombination rates in organisms such as humans, silent GC content may be a useful first-order approximation of the relative recombination rate of a locus. Algorithms used to calculate evolutionary parameters, such as Ka and Ks, may benefit by taking into account the recombinational background of loci as well as the effect and degree of BGC on estimates of Ka and Ks. I suggest that the theory of directional mutation pressure (Sueoka 1962) may require modification such that it applies only to selectively neutral regions of the genome which are not subject to BGC.
Phylogenetic models may also benefit by taking into consideration the recombinational background of the loci under investigation. Failure to do so may result in an underestimate of divergence in recombinationally hot loci because of mutation reversal as well as convergent evolution (i.e., if GC mutations have a greater chance of fixation, then two sequences may appear more similar because of a common form of molecular drive acting on them). Models of the evolution of sex may benefit by incorporating the possibility that recombination helps reverse the most common form of mutation through GC-biased gene conversion.
| Acknowledgements |
|---|
I would like to thank the following people for their assistance and helpful comments: Margaret Kidwell, Ken Wolfe, Eric Alani, Bruce Walsh, Rick Michod, Bill Birky, Bernard Kunz, Chris Wills, Dawn Birdsell, Megan McCarthy, Tassia Kolesnikow, Lillian Engel, Ted Weinert, Tom Petes, and two anonymous reviewers. I would also like to thank James McInerney, Ziheng Yang, and Etsuko Moriyama for kindly making their software available.
| Footnotes |
|---|
Ken Wolfe, Reviewing Editor
Keywords: Saccharomyces cerevisiae, recombination, GC content, biased gene conversion, GC-biased mismatch repair, evolution of isochores, evolution of sex
Address for correspondence and reprints: John A. Birdsell, Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85121. birdsell{at}email.arizona.edu.
| References |
|---|
Alani E., R. A. Reenan, R. D. Kolodner, 1994 Interaction between mismatch repair and genetic recombination in Saccharomyces cerevisiae Genetics 137:19-39
Altschul S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, D. J. Lipman, 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs Nucleic Acids Res 25:3389-3402
Aota S., T. Ikemura, 1986 Diversity in G+C content at the third position of codons in vertebrate genes and its cause Nucleic Acids Res 14:6345-6355
Au K. G., M. Cabrera, J. H. Miller, P. Modrich, 1988 Escherichia coli mutY gene product is required for specific A-G→C.G mismatch correction Proc. Natl. Acad. Sci. USA 85:9163-9166
Babenko V. N., P. S. Kosarev, O. V. Vishnevsky, V. G. Levitsky, V. V. Basin, A. S. Frolov, 1999 Investigating extended regulatory regions of genomic DNA sequences Bioinformatics 15:644-653
Baudat F., A. Nicholas, 1997 Clustering of meiotic double-strand breaks on yeast chromosome III Proc. Natl. Acad. Sci. USA 94:5213-5218
Bengtsson B. O., 1990 The effect of biased conversion on the mutation load Genet. Res 55:183-187
Bernardi G., 1986 Compositional constraints and genome evolution J. Mol. Evol 24:1-11
Bernardi G., 2000 Isochores and the evolutionary genomics of vertebrates Gene 241:3-17
Bhui-Kaur A., M. F. Goodman, J. Tower, 1998 DNA mismatch repair catalyzed by extracts of mitotic, postmitotic, and senescent Drosophila tissues and involvement of mei-9 gene function for full activity Mol. Cell. Biol 18:1436-1443
Bill C. A., W. A. Duran, N. R. Miselis, J. A. Nickoloff, 1998 Efficient repair of all types of single-base mismatches in recombination intermediates in Chinese hamster ovary cells: competition between long-patch and G-T glycosylase-mediated repair of G-T mismatches Genetics 149:1935-1943
Birdsell J. A., C. Wills, 2001 The evolutionary origin and maintenance of sexual recombination: a review of contemporary models Evol. Biol. (in press)
Bishop D. K., J. Andersen, R. D. Kolodner, 1989 Specificity of mismatch repair following transformation of Saccharomyces cerevisiae with heteroduplex plasmid DNA Proc. Natl. Acad. Sci. USA 86:3713-3717
Blaschke R. J., G. A. Rappold, 1997 Man to mouse: lessons learned from the distal end of the human X chromosome Genome Res 7:1114-1117
Borts R. H., J. E. Haber, 1987 Meiotic recombination in yeast: alteration by multiple heterozygosities Science 237:1459-1465
Borts R. H., W. Y. Leung, W. Kramer, B. Kramer, M. Williamson, S. Fogel, J. E. Haber, 1990 Mismatch repair-induced meiotic recombination requires the pms1 gene product Genetics 124:573-584
Bradnam K. R., C. Seoighe, P. M. Sharp, K. H. Wolfe, 1999 G+C content variation along and among Saccharomyces cerevisiae chromosomes Mol. Biol. Evol 16:666-675
Brown T. C., J. Jiricny, 1988 Different base/base mispairs are corrected with different efficiencies and specificities in monkey kidney cells Cell 54:705-711
Brown T. C., J. Jiricny, 1989 Repair of base-base mismatches in simian and human cells Genome 31:578-583
Casane D., S. Boissinot, B. H. Chang, L. C. Shimmin, W. Li, 1997 Mutation pattern variation among regions of the primate genome J. Mol. Evol 45:216-226
Charlesworth B., 1994 Patterns in the genome Curr. Biol 4:182-184
Chen W., S. Jinks-Robertson, 1999 The role of the mismatch repair machinery in regulating mitotic and meiotic recombination between diverged sequences in yeast Genetics 151:1299-1313
Clay O., S. Caccio, S. Zoubak, D. Mouchiroud, G. Bernardi, 1996 Human coding and noncoding DNA: compositional correlations Mol. Phylogenet. Evol 5:2-12
Coulondre C., J. H. Miller, P. J. Farabaugh, W. Gilbert, 1978 Molecular basis of base substitution hotspots in Escherichia coli Nature 274:775-780
Datta A., M. Hendrix, M. Lipsitch, S. Jinks-Robertson, 1997 Dual roles for DNA sequence identity and the mismatch repair system in the regulation of mitotic crossing-over in yeast Proc. Natl. Acad. Sci. USA 94:9757-9762
D'Onofrio G., D. Mouchiroud, B. Aissani, C. Gautier, G. Bernardi, 1991 Correlations between the compositional properties of human genes, codon usage, and amino acid composition of proteins J. Mol. Evol 32:504-510
de Jong P. J., A. J. Grosovsky, B. W. Glickman, 1988 Spectrum of spontaneous mutation at the APRT locus of Chinese hamster ovary cells: an analysis at the DNA sequence level Proc. Natl. Acad. Sci. USA 85:3499-3503
Detloff P., J. Sieber, T. D. Petes, 1991 Repair of specific base pair mismatches formed during meiotic recombination in the yeast Saccharomyces cerevisiae Mol. Cell. Biol 11:737-745
Detloff P., M. A. White, T. D. Petes, 1992 Analysis of a gene conversion gradient at the HIS4 locus in Saccharomyces cerevisiae Genetics 132:113-123
Duncan B. K., J. H. Miller, 1980 Mutagenic deamination of cytosine residues in DNA Nature 287:560-561
Duret L., L. D. Hurst, 2001 The elevated GC content at exonic third sites is not evidence against neutralist models of isochore evolution Mol. Biol. Evol 18:757-762
Eisenbarth I., A. M. Striebel, E. Moschgath, W. Vogel, G. Assum, 2001 Long-range sequence composition mirrors linkage disequilibrium pattern in a 1.13 Mb region of human chromosome 22 Hum. Mol. Genet 10:2833-2839
Eisenbarth I., G. Vogel, W. Krone, W. Vogel, G. Assum, 2000 An isochore transition in the NF1 gene region coincides with a switch in the extent of linkage disequilibrium Am. J. Hum. Genet 67:873-880
Ellis N., P. N. Goodfellow, 1989 The mammalian pseudoautosomal region Trends Genet 5:406-410
Eyre-W







