

MBE Advance Access originally published online on March 21, 2006
Molecular Biology and Evolution 2006 23(6):1203-1216; doi:10.1093/molbev/msk008

© The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Article

Strong Regional Biases in Nucleotide Substitution in the Chicken Genome

Matthew T. Webster, Erik Axelsson, and Hans Ellegren

Department of Evolution, Genomics and Systematics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden

E-mail: websterm{at}tcd.ie.


    Abstract
 TOP
 Abstract
 Introduction
 Methods
 Results
 Discussion
 Supplementary Material
 Acknowledgements
 References
 
Interspersed repeats have emerged as a valuable tool for studying neutral patterns of molecular evolution. Here we analyze variation in the rate and pattern of nucleotide substitution across all autosomes in the chicken genome by comparing the present-day CR1 repeat sequences with their ancestral copies and reconstructing nucleotide substitutions with a maximum likelihood model. The results shed light on the origin and evolution of large-scale heterogeneity in GC content found in the genomes of birds and mammals—the isochore structure. In contrast to mammals, where GC content is becoming homogenized, heterogeneity in GC content is being reinforced in the chicken genome. This is also supported by patterns of substitution inferred from alignments of introns in chicken, turkey, and quail. Analysis of individual substitution frequencies is consistent with the biased gene conversion (BGC) model of isochore evolution, and it is likely that patterns of evolution in the chicken genome closely resemble those in the ancestral amniote genome, when it is inferred that isochores originated. Microchromosomes and distal regions of macrochromosomes are found to have elevated substitution rates and a more GC-biased pattern of nucleotide substitution. This can largely be accounted for by a strong correlation between GC content and the rate and pattern of substitution. The results suggest that an interaction between increased mutability at CpG motifs and fixation biases due to BGC could explain increased levels of divergence in GC-rich regions.

Key Words: isochore • base composition • chicken • mutation • recombination


    Introduction
 
Base composition is heterogeneous within a wide variety of eukaryotic genomes, characterized by local similarities in GC content within genomic regions and significant differences between regions (Nekrutenko and Li 2000). In addition, GC content correlates with a number of important features of the genomic landscape such as gene density, patterns of gene expression, repeat element distribution, and recombination rate (Mouchiroud et al. 1991; Saccone et al. 1993; Caron et al. 2001; Fullerton, Bernardo Carvalho, and Clark 2001; Kong et al. 2002; Lercher et al. 2003). This indicates that base composition is a key component of genome organization. Understanding the forces that govern it is therefore an important step in elucidating the evolutionary significance and potential biological function of a variety of other aspects of genome organization. Patterns of base composition and patterns of nucleotide substitution have a complex and dynamic interrelationship: regional variation in patterns of nucleotide substitution affects base composition and variation in base composition is an important component in determining patterns of mutation and fixation (Smith, Webster, and Ellegren 2002; Ellegren, Smith, and Webster 2003; Meunier and Duret 2004). This study attempts to address two main questions: (1) what forces generate spatial heterogeneity in GC content? and (2) what are the major determinants of substitution rate variation?

Although the genomes of most eukaryotes exhibit some spatial heterogeneity in GC content, the genomes of mammals, birds, and reptiles exhibit more extreme variation in GC content commonly known as the isochore structure (Filipski, Thiery, and Bernardi 1973; Bernardi, Hughes, and Mouchiroud 1997; Hughes, Zelus, and Mouchiroud 1999; Hughes, Clay, and Bernardi 2002). These genomes were originally described as mosaics of isochores with different GC contents, comprising long regions (>300 kb) of relatively homogeneous GC content separated by distinct boundaries (Bernardi 2000). Subsequent whole-genome analyses have indicated that this structure does not exist in the strict sense (Nekrutenko and Li 2000; IHGSC 2001). Nonetheless, it is clear that the genomes of mammals, birds, and reptiles are highly heterogeneous in GC content and have acquired GC-rich regions. In this article, we refer to such regions as GC-rich isochores. The phylogenetic distribution of GC-rich isochores suggests that they were acquired in the amniote lineage after the split with amphibians (the genomes of Xenopus species appear uniformly AT rich; Bernardi 2000). This isochore structure could potentially be influenced by natural selection. As GC-rich isochores have been observed in both warm- and cold-blooded animals, it seems unlikely that selection for increased thermal stability is responsible, as initially suggested (Bernardi et al. 1985). However, it is possible that selection could act on variation in GC content to optimize genomic structure due to the effects of GC content on physical properties of DNA and chromatin-level effects on control of gene expression (Vinogradov 2003, 2005).

It is likely that variation in neutral processes has played a large part in generating variation in GC content. A number of hypotheses have been put forward to explain how patterns of mutation could be variable and lead to heterogeneous patterns of GC content, for example, related to replication timing (Wolfe, Sharp, and Li 1989) or variation in efficiency of repair (Filipski 1987; Sueoka 1988). However, much recent evidence in mammals points to a role for recombination in generating variation in GC content (Galtier et al. 2001; Birdsell 2002; Galtier 2003; Montoya-Burgos, Boursot, and Galtier 2003; Webster et al. 2005). In humans, the equilibrium GC content (GC*), defined as the stable GC content toward which a genomic region is evolving, was found to correlate with the crossover rate, indicating that recombination is a major factor influencing variation in substitution pattern (Meunier and Duret 2004). Many reports support the idea that recombination is mutagenic (Lercher and Hurst 2002; Hellmann et al. 2003), and it is possible that this mutagenic effect also produces a bias toward mutations that incorporate G:C nucleotide pairs. However, it is most likely that recombination facilitates the accumulation of GC nucleotides through biased gene conversion (BGC), which results in a bias toward fixation of G:C alleles at sites that are polymorphic for A:T and G:C alleles (Eyre-Walker 1993; Galtier et al. 2001; Birdsell 2002). It has been demonstrated that BGC has dynamics identical to weak directional selection (Nagylaki 1983). This process is believed to act either by biased repair of heteroduplexes formed by gene conversion or by meiotic drive (Marais 2003).

Fixation biases toward G:C alleles have been detected by comparison of patterns of mutational changes in divergence and polymorphism data in both human and mouse (Duret et al. 2002; Smith and Eyre-Walker 2002; Webster, Smith, and Ellegren 2003), using modified versions of the McDonald-Kreitman test for selective neutrality (McDonald and Kreitman 1991). Furthermore, some studies in which the direction of mutations giving rise to single-nucleotide polymorphisms could be determined have noted that mutations from A or T to G or C (AT -> GC) segregate at significantly higher frequencies than the opposite type (GC -> AT) (Duret et al. 2002; Lercher et al. 2002; Webster and Smith 2004; Webster et al. 2005), also indicating that AT -> GC mutations have an increased probability of fixation. It should be noted that the mechanisms described so far are not mutually exclusive: it is possible that biases in patterns of both mutation and fixation exist. Similarly, although it is unlikely that selection is acting on millions of single-nucleotide changes to alter GC content in vertebrates, it is plausible that selection acts on the processes creating this variation, such as recombination intensity (Otto and Lenormand 2002).

Perhaps surprisingly, a number of recent reports suggest that isochores in mammals are being homogenized (Duret et al. 2002; Smith, Webster, and Ellegren 2002; Arndt, Petrov, and Hwa 2003; Webster, Smith, and Ellegren 2003). This is particularly apparent in GC-rich regions, which are tending toward much lower GC contents. This was first suggested by the analysis of substitutions at synonymous sites in genes along lineages within three different mammalian orders (rodents, artiodactyls, and primates) using appropriate outgroups (Duret et al. 2002). Although the extent to which this homogenization is occurring in all mammalian lineages is unclear (Alvarez-Valin et al. 2004), a maximum likelihood (ML) analysis of 41 genes in up to 66 diverse mammals strongly supports this effect in early mammalian evolution (Belle et al. 2004). By analysis of patterns of substitution in human interspersed repeats of many different ages, Arndt, Petrov, and Hwa (2003) demonstrated a shift in the pattern of substitution that occurred around the time of mammalian radiation from isochore preserving to homogenization. A homogenization of GC content has also been observed in primate noncoding alignments (Smith, Webster, and Ellegren 2002; Webster, Smith, and Ellegren 2003; Meunier and Duret 2004) and analysis of patterns of substitution in Alu repeats (Webster et al. 2005). Note that a recent study arguing against the homogenization of GC content in primates and rodents (Antezana 2005) is based on a highly unreliable method of inferring substitution patterns (Duret 2006).

This trend toward homogenization of GC content suggests that the forces responsible for creating and maintaining isochores in the ancestral amniote have reduced in efficacy in mammals. Assuming that BGC is the main factor generating variation in GC content, at least three factors could be involved. Firstly, it is possible that the enzymes involved in heteroduplex repair have altered to change a preference for incorporating G:C pairs in mammalian lineages (a change in the repair bias). Secondly, because BGC leads to a bias in the fixation of certain alleles (comparable to natural selection), it is sensitive to changes in effective population size (Ne). Hence, a reduction in Ne would reduce the effects of BGC. Thirdly, chromosomal rearrangements could affect variation in GC by altering the intensity of recombination. In particular, there is likely to be a requirement for one crossover per meiosis per chromosome arm (Pardo-Manuel de Villena and Sapienza 2001), which results in smaller chromosomes having higher average rates of recombination (Meunier and Duret 2004). Hence, changes in chromosome size could affect recombination rate (and hence strength of BGC).

Chromosomes in the chicken genome (Gallus gallus) are variable in size, and the autosomes are classified into 5 large macrochromosomes (56–188 Mb), 5 intermediate chromosomes (21–34 Mb), and 28 small microchromosomes (<100 kb to 19 Mb). In contrast, mammalian chromosomes tend to be longer and more similar in size. For example, the human genome has 22 autosomes (47–246 Mb) (IHGSC 2001; ICGSC 2004). There is evidence that the ancestral amniote genome closely resembled the chicken, implying that microchromosomes fused together during the evolution of the premammalian karyotype. Several studies point to an extremely slow rate of chromosomal evolution in the avian lineage compared with mammals (Bush et al. 1977; Burt et al. 1999; Burt 2002). Recently, Bourque et al. (2005) estimated that the number of interchromosomal rearrangements between chicken and a putative mammalian ancestor only slightly exceeds the number inferred in the mouse lineage, although the evolutionary distance is more than fivefold greater.

Recombination varies over an eightfold range among chicken chromosomes, and microchromosomes have elevated recombination rates and GC content. This extreme variation makes the chicken an ideal model for understanding the effects of recombination and GC content on substitution rate. Furthermore, as the chicken karyotype is similar to the ancestral amniote karyotype, analysis of the forces affecting GC content in the chicken genome can shed light on the forces responsible for generating GC-rich isochores in birds, reptiles, and mammals.

Comparative genomic studies have suggested that mutation rates are elevated in microchromosomes and subtelomeric regions compared with the rest of the chicken genome (ICGSC 2004; Axelsson et al. 2005). However, as many genomic features are variable between micro- and macrochromosomes, the precise causes of these observations are unclear. For instance, recombination, GC content, and the number of CpG motifs are all higher on microchromosomes. One possibility is that recombination is directly mutagenic (Lercher and Hurst 2002; Hellmann et al. 2003; Filatov 2004; ICGSC 2004). Alternatively, GC-rich regions could be more mutable simply because the rate of GC -> AT mutation is high (Smith, Webster, and Ellegren 2002) or because they possess more hypermutable CpG sites (Hurst and Williams 2000). Fixation biases due to BGC can also alter substitution rates (Piganeau et al. 2002). One way to understand the relative contribution of these various processes is to analyze individual substitution frequencies separately. For example, if double-stranded breaks associated with recombination increase the rate of all types of transitions and transversions, then those mutations that do not alter GC content will also be affected (i.e., A <-> T or G <-> C transversions) (Filatov 2004). In contrast, BGC only affects mutations that do alter GC content (AT -> GC and GC -> AT).

Patterns of divergence in interspersed repeats can be used to examine variability in rates and patterns of substitution during evolution (Arndt, Petrov, and Hwa 2003; Arndt, Hwa, and Petrov 2005; Webster et al. 2005). Whereas a massive proportion of human and other mammalian genomes are made up of interspersed repeats (40%–50%), less than 9% of the chicken genome is classified as interspersed repeats (ICGSC 2004; Wicker et al. 2005). However, 80% of this is dominated by the CR1 element (6.4% of genome). CR1 is a long interspersed nuclear element with close similarity to the mammalian L1 element. So far, no intact copies closely resembling a CR1 master copy have been observed in the chicken genome, and only one full-length open reading frame was found in the initial chicken genome analysis, suggesting that CR1 is unlikely to be currently active in the chicken genome. A full-length CR1 is 4.5 kb, but the vast majority (99.4%) are truncated from their 5' end to around 1.2 kb. RepBase contains 22 CR1 master sequences, divided into 11 families (Jurka 2000; ICGSC 2004).

Here we reconstruct patterns of nucleotide substitution in a genome-wide sample of CR1 repeats by comparing each repeat sequence found in the chicken genome with its respective master copy, inferred to be its ancestral sequence. The results shed light on the causes of heterogeneity in GC content and variation in substitution rates observed in the chicken genome. In order to confirm our findings using an independent source of data, we also performed a comparative analysis of 34 intron sequences in chicken (G. gallus), turkey (Meleagris gallopavo), and Japanese quail (Coturnix japonica). As the ancestral sequence is not known in this case, we inferred patterns of substitution since the common ancestor of chicken and turkey using a parsimony approach (Meunier and Duret 2004).


    Methods
 
Inference of Substitution Patterns from Interspersed Repeats
We reconstructed patterns of substitution in the chicken genome by comparison of all copies of a particular repeat family with their inferred ancestral copies. Interspersed repeats are noncoding and should therefore be free from functional constraints. It is commonly assumed that after insertion in any genomic location, they become inactive and begin to accumulate neutral substitutions. As repetitive elements are abundant in vertebrate genomes, they can provide large amounts of raw data with which to estimate variation in patterns of substitution. The ancestral copy of each sequence can be estimated by using the master copy defined in RepBase (Jurka 2000), assuming that all subfamilies of the repeat class have been reliably identified.

Knowledge of the ancestral sequence permits use of an ML approach to estimate the substitution patterns (Arndt, Burge, and Hwa 2003). This reconstructs substitution frequencies, correcting for multiple hits, for the four transversions, two transitions, and CpG transitions. These seven rates comprise all possible mutational changes assuming strand complementarity and that there are no important context effects other than CpG mutability. As many repeats are highly diverged from their ancestral sequence, it is crucial to take into account multiple hits. This is particularly important at CpG sites where mutation rates are elevated up to 10 times due to methylated cytosine mutagenesis (Yang et al. 1996; Templeton et al. 2000). Indeed, it is impossible to account for the effects of CpG hypermutability by selectively removing sites because sites that are not associated with CpGs in the ancestral sequence may still be involved with CpG mutations during evolution due to mutations at neighboring sites and/or multiple hits. Furthermore, some mutations at CpG sites may not be due to CpG hypermutability.
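To make the seven strand-symmetric classes concrete, the following Python sketch labels each difference between an ancestral (master) sequence and a repeat copy. It is illustrative only: the names CLASS_OF, classify, and count_differences are hypothetical, and the code merely counts observed differences. It does not correct for multiple hits and is not the ML method of Arndt, Burge, and Hwa (2003).

```python
from collections import Counter

# Each substitution and its reverse-complement counterpart share one label,
# which is what "assuming strand complementarity" amounts to.
CLASS_OF = {
    ("A", "T"): "A:T->T:A", ("T", "A"): "A:T->T:A",
    ("C", "G"): "C:G->G:C", ("G", "C"): "C:G->G:C",
    ("A", "C"): "A:T->C:G", ("T", "G"): "A:T->C:G",
    ("G", "T"): "G:C->T:A", ("C", "A"): "G:C->T:A",
    ("A", "G"): "A:T->G:C", ("T", "C"): "A:T->G:C",
    ("G", "A"): "C:G->T:A", ("C", "T"): "C:G->T:A",
}


def classify(ancestral, derived, i):
    """Label the difference at aligned position i, or return None."""
    a, d = ancestral[i], derived[i]
    if (a, d) not in CLASS_OF:  # identical bases, gaps, or ambiguity codes
        return None
    # A transition counts as a CpG transition when the ancestral base
    # sits in a CpG dinucleotide (C followed by G, or G preceded by C).
    cpg_c = a == "C" and d == "T" and ancestral[i + 1:i + 2] == "G"
    cpg_g = a == "G" and d == "A" and i > 0 and ancestral[i - 1] == "C"
    if cpg_c or cpg_g:
        return "CpG transition"
    return CLASS_OF[(a, d)]


def count_differences(ancestral, derived):
    """Count raw differences per class (no multiple-hit correction)."""
    counts = Counter()
    for i in range(min(len(ancestral), len(derived))):
        label = classify(ancestral, derived, i)
        if label is not None:
            counts[label] += 1
    return counts
```

Because the ancestral context is read from the master sequence alone, this sketch shares the limitation noted above: sites that become CpG through neighboring mutations are not recognized.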

Alignment of CR1 Repeats with Master Sequences
We searched the draft assembly of the chicken genome (WASHUC1) using RepeatMasker (http://www.repeatmasker.org/) with the default settings. Generation and analysis of alignments were done using self-written Perl programs. We used a window of 5 Mb across chromosomes. To prevent the subtelomeric regions of the five macrochromosomes being excluded from the analysis, we first set aside a distal 5-Mb segment from each end of the macrochromosomes as a separate segment and divided the remainder into nonoverlapping 5-Mb segments. We concatenated the alignments made by RepeatMasker of each CR1 repeat with its identified master copy (taken from a library containing 22 CR1 master copies) within each segment. We made 11 such alignments for each segment by dividing initial alignments into the 11 CR1 families (shown in fig. 4 of ICGSC 2004). Hence, for each genomic segment, a set of 11 long pairwise alignments was produced, where one sequence consisted of concatenated master copies from a particular repeat family and the other of the concatenated repeat sequences from the particular genomic segment identified as descendants of those copies. We calculated the GC content and frequency of CpG motifs present in the nonrepetitive, nongenic sequence within each 5-Mb segment. This was done using the repeat-masked sequence from which genes were masked using the annotations from the initial genome sequence analysis. We also calculated the proportion of sites in exons within each segment using these annotations, which we term exon density.
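The per-segment GC content and CpG frequency described above can be sketched as follows. This is a minimal Python illustration, not the Perl programs used in the study. The function names are hypothetical, and it assumes that masked repeats and genes appear as N or lowercase characters, so that only uppercase A, C, G, and T are counted. Unlike the procedure described in the text, the window helper simply keeps a short final window rather than treating distal segments separately.

```python
def windows(length, size=5_000_000):
    """Yield (start, end) for consecutive nonoverlapping windows."""
    for start in range(0, length, size):
        yield start, min(start + size, length)


def gc_and_cpg(seq):
    """Return (GC content, CpG frequency per unmasked base), or None.

    Only uppercase A, C, G, T are treated as unmasked sequence; this
    masking convention is an assumption of the sketch.
    """
    unmasked = sum(seq.count(base) for base in "ACGT")
    if unmasked == 0:
        return None
    gc = (seq.count("G") + seq.count("C")) / unmasked
    cpg = seq.count("CG") / unmasked  # "CG" cannot overlap itself
    return gc, cpg
```

For a chromosome sequence held in a string chrom, the per-window values could then be collected with [gc_and_cpg(chrom[s:e]) for s, e in windows(len(chrom))].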


FIG. 4.— Relationship between GC content and the seven substitution frequencies inferred from CR1 repeats. (A) Transversion substitution frequencies that do not affect GC content (A:T -> T:A and C:G -> G:C). (B) Transversion substitution frequencies that do affect GC content (A:T -> C:G and G:C -> T:A). (C) Transition substitution frequencies (A:T -> G:C and C:G -> T:A). (D) CpG transition substitution frequency. In each case, the quadratic fit is also presented.

 
Estimation of Regional Substitution Pattern in the Chicken Genome
Alignments with fewer than 5,000 bases were excluded from the analysis of substitution rates. Estimates of the frequencies of the seven different substitution events (see above) were obtained using the ML approach described by Arndt, Burge, and Hwa (2003). In order to obtain a time-averaged estimate for each nucleotide substitution frequency in each genomic region, we corrected for the age of each of the 11 CR1 families across the entire genome. This was done using the weighted sum of each individual substitution frequency relative to the genome-wide average transversion frequency of the repeat (all four frequencies are similar) as described by Arndt, Hwa, and Petrov (2005). The substitution frequencies were also used to calculate the equilibrium GC (GC*) of each region, taking into account the time-averaged substitution pattern using a forward simulation as described by Arndt, Burge, and Hwa (2003).
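The idea behind GC* can be illustrated with a deliberately simplified two-state model. The Python sketch below ignores CpG hypermutability, so it is not the dinucleotide model of Arndt, Burge, and Hwa (2003) and would tend to overestimate GC* where CpG transitions matter. Function and parameter names are hypothetical. The closed form and the forward iteration give the same limit, which shows why GC* depends only on the relative pattern of substitution and not on the overall rate.

```python
def gc_star_no_cpg(ts_at_to_gc, tv_at_to_cg, ts_cg_to_ta, tv_gc_to_ta):
    """Equilibrium GC content of a two-state (A:T vs G:C) model.

    gain: substitutions per A:T site that create a G:C pair
    loss: substitutions per G:C site that create an A:T pair
    CpG transitions are ignored (simplifying assumption).
    """
    gain = ts_at_to_gc + tv_at_to_cg
    loss = ts_cg_to_ta + tv_gc_to_ta
    return gain / (gain + loss)


def forward_gc(gc0, gain, loss, steps):
    """Iterate GC content forward in time from a starting value gc0."""
    gc = gc0
    for _ in range(steps):
        gc = gc + (1.0 - gc) * gain - gc * loss
    return gc
```

Starting the iteration from either a low or a high GC content leads to the same equilibrium, which is the sense in which GC* is the value a region is evolving toward.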

Each substitution frequency represents the relative frequency of each event per potentially mutable site (e.g., the G:C -> A:T rate is the frequency of this type of substitution at positions which are G or C in the sequence). In order to estimate the relative contribution of each substitution frequency to the present-day substitution rate, we multiplied the time-averaged substitution frequency in each region by the proportion of A:T or G:C base pairs in the noncoding, nonrepetitive sequence of each particular 5-Mb genomic window. For example, the relative contribution of G:C -> A:T changes to the expected substitution rate in a particular genomic region is equal to the G:C -> A:T substitution frequency in that region multiplied by its GC content. To calculate the CpG rates, we multiplied the relevant CpG rate by the number of CpG sites in the present-day flanking sequence (this assumes that all CpG sites are methylated). We refer to these as net predicted rates.
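The weighting described in this paragraph can be written out as a short Python sketch. The function name and dictionary keys are hypothetical, and the numbers in the usage example are invented purely for illustration. One assumption is made explicit: the CpG rate is weighted here by the per-base CpG frequency of the flanking sequence, so that all seven values are on a per-site scale, whereas the text refers to the number of CpG sites.

```python
AT_CLASSES = ("A:T->T:A", "A:T->C:G", "A:T->G:C")  # occur at A:T pairs
GC_CLASSES = ("C:G->G:C", "G:C->T:A", "C:G->T:A")  # occur at G:C pairs


def net_predicted_rates(freq, gc, cpg_freq):
    """Weight each substitution frequency by its share of mutable sites.

    freq     : dict of time-averaged frequencies per mutable site
    gc       : GC content of the flanking sequence (0 to 1)
    cpg_freq : CpG sites per base in the flanking sequence (assumption:
               a per-base frequency, with all CpG sites methylated)
    """
    rates = {name: freq[name] * (1.0 - gc) for name in AT_CLASSES}
    rates.update({name: freq[name] * gc for name in GC_CLASSES})
    rates["CpG"] = freq["CpG"] * cpg_freq
    return rates


# Invented example values, for illustration only.
example = net_predicted_rates(
    {"A:T->T:A": 0.01, "A:T->C:G": 0.01, "A:T->G:C": 0.03,
     "C:G->G:C": 0.01, "G:C->T:A": 0.01, "C:G->T:A": 0.05, "CpG": 0.40},
    gc=0.40, cpg_freq=0.01)
```

Summing the values of the returned dictionary gives a predicted overall substitution rate for the window under these assumptions.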

Analysis of Substitution Pattern in Human Alu Repeats
In order to compare the relationship between GC* and GC content in the chicken and human genomes, we reanalyzed a comparable data set of human repeats presented in Webster et al. (2005). In this data set, concatenated alignments were made of Alu repeats with the human genome divided into segments with the boundaries halfway between genetic markers with known crossover frequencies (average length of segments was 595 kb). We used the same correction for repeat element age described above for CR1 repeats (Arndt, Hwa, and Petrov 2005) to correct for the age of each Alu repeat using the average transversion frequencies of AluJ, AluS, and AluY repeats. The resulting estimates of each of the substitution frequencies were then used to calculate GC* for each genomic segment using forward simulation (Arndt, Burge, and Hwa 2003).

Analysis of Pattern of Substitution in Intron Alignments
Sequence data from 34 orthologous introns in chicken and turkey, spread over the genome, were previously presented by Axelsson et al. (2005). For the purpose of this study, the orthologous intron sequence in Japanese quail was obtained using the same laboratory methods. Alignment of orthologous sequences was performed using ClustalW (Thompson, Higgins, and Gibson 1994) under the default settings and then checked manually. Details of all alignments are presented in Supplementary Table 1 (Supplementary Material online). We first performed a pairwise analysis of substitutions between all three species. This indicated that divergence between quail and either chicken or turkey was ~20% higher than between chicken and turkey. We therefore considered quail to be the outgroup of chicken and turkey, as also indicated by previous studies (Dimcheff, Drovetski, and Mindell 2002).

We analyzed substitutions along the chicken and turkey lineages using a parsimony approach. In order to minimize misinference caused by homoplasy at hypermutable CpG sites, we followed the protocol developed by Meunier and Duret (2004). Accordingly, we considered three classes of sites: (1) CpG free, (2) CpG ancestral, and (3) all other sites. We estimated the four transversion and two transition rates from the first site class. The CpG transition rate was estimated using the second class. We used these seven substitution rates to derive the GC* for each alignment using the sequence evolution model of Arndt, Burge, and Hwa (2003).
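The basic parsimony step, with quail as the outgroup, can be sketched in Python as follows. This is an illustration under simple assumptions and the function name is hypothetical. It assigns a change to the chicken or turkey lineage only when the quail base matches one of the two ingroup bases, and it does not implement the three CpG site classes of Meunier and Duret (2004), whose exact definitions are not reproduced here.

```python
def lineage_changes(chicken, turkey, quail):
    """Infer substitutions on the chicken and turkey lineages.

    Returns a list of (lineage, position, ancestral_base, derived_base).
    The three input strings are rows of one alignment (equal length).
    """
    changes = []
    for i, (c, t, q) in enumerate(zip(chicken, turkey, quail)):
        # Skip gaps and ambiguity codes, and sites with no ingroup change.
        if any(base not in "ACGT" for base in (c, t, q)) or c == t:
            continue
        if q == t:
            # Turkey and outgroup agree: change placed on chicken lineage.
            changes.append(("chicken", i, t, c))
        elif q == c:
            # Chicken and outgroup agree: change placed on turkey lineage.
            changes.append(("turkey", i, c, t))
        # If all three bases differ, the ancestral state is ambiguous
        # under parsimony and the site is left uncounted.
    return changes
```

The inferred (ancestral, derived) pairs could then be tallied into substitution classes per lineage.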

We performed simulations of molecular evolution to estimate the error expected by using parsimony for estimating substitution rates from our intron data set. We simulated evolution along the phylogenetic relationship inferred between the three species using the observed chicken-turkey and chicken-quail divergences of 10% and 12%, respectively. In each simulation, the transition/transversion ratio was set to 2.75, and the CpG transition rate was 10 times greater than other transitions. We then introduced different biases in the relative rates of GC -> AT and AT -> GC substitutions, resulting in four parameter sets. The stationary dinucleotide base composition corresponding to each set of substitution rate parameters was obtained using the sequence evolution model of Arndt, Burge, and Hwa (2003). The four parameter sets corresponded to GC* values of 36%, 42%, 52%, and 60%. The stationary dinucleotide frequencies were used to generate random sequences, which served as a starting point for the simulations. Substitutions were subsequently allowed to accumulate on the chicken-turkey-quail phylogenetic tree, using the same substitution rate parameters (i.e., assuming that GC content remains at equilibrium). We then compared the parameter estimates obtained by applying the parsimony approach to the simulated sequences with the real parameter values to ascertain the accuracy of the parsimony approach.

Statistics
All statistical analyses were performed in R (http://www.r-project.org). Confidence intervals (CIs) were produced by bootstrapping. In order to determine the CIs for average transition frequencies for each individual CR1 family, we resampled with replacement concatenated alignments corresponding to each family from the entire chicken genome with 10,000 replicates. CIs for the rate and pattern of substitution in different chromosome classes were derived in a similar way by randomly resampling time-averaged estimates of each individual substitution frequency in each chromosomal class. CIs for correlation coefficients were also calculated by bootstrap. To describe the relationship between flanking GC content and the individual substitution frequencies, we fitted quadratic equations. In order to correct for the correlation with GC content when calculating the difference in substitution rate on different chromosome classes, we used the residuals of the fitted curve between GC and the substitution pattern.
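As an illustration of the bootstrap CIs for correlation coefficients, the following Python sketch resamples paired observations with replacement and reads off percentile bounds. The analyses in this study were performed in R. The percentile method, the fixed seed, and the function names are assumptions of the sketch rather than details given in the text.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def bootstrap_ci(xs, ys, replicates=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Pearson's r (pairs resampled jointly)."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    for _ in range(replicates):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(pearson_r([xs[i] for i in idx],
                                   [ys[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample with zero variance
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lower, upper
```

Resampling index pairs jointly preserves the pairing between GC content and the substitution measure in each window, which is what a CI for their correlation requires.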


    Results
 
We divided the chicken genome into 5-Mb segments, resulting in a total of 191 blocks. For each block, we constructed 11 concatenated alignments corresponding to repeats within each major CR1 repeat family aligned with their corresponding master sequence. In total, 2,101 concatenated alignments were produced (191 blocks × 11 families), which reduced to 1,881 when those under 5,000 bp were removed from the data set. The GC content of the CR1 master copies in RepBase ranges from 52.9% to 56.9%. Both full-length master copies and truncated CR1 repeats are rich in CpG motifs (on average, full-length copies contain 128.8 CpG motifs and truncated copies contain 27.2), indicating that it is crucial to accurately consider the CpG mutation process. A summary of the data set is shown in table 1.


Table 1 Summary of Alignments

 
In general, there is a good correspondence between the genome-wide estimates of average transition and transversion substitution frequencies from the 11 CR1 families (fig. 1). However, different CR1 families have different ages. To measure the extent to which different genomic regions accumulate changes at different rates, we calculated a time-averaged estimate for each of the seven substitution frequencies in each 5-Mb block. To do this, we corrected each ML estimate of the individual substitution frequency with the genome-wide average transversion frequency of the particular CR1 family as described in Methods. These time-corrected estimates of each of the seven substitution frequencies in each 5-Mb genomic segment were used in all further analyses. Note, however, that this correction for differences in age does not affect the overall "pattern" of substitution (e.g., estimates of GC*) as all substitution frequencies are affected equally.


FIG. 1.— Relationship between average transition and average transversion frequency in 11 chicken CR1 families.

 
There is a significant correlation between the nonrepetitive nongenic GC content of each segment and GC* (fig. 2A; Pearson's r = 0.898; P < 10^-4; 95% CI 0.866–0.928 by bootstrap). A 1:1 relationship between GC content and GC* (shown on graph) is expected if base composition is stable along the chicken lineage. As the gradient of the linear regression line is significantly greater than one (1.39; P < 10^-4; 95% CI 1.27–1.55), the heterogeneity between genomic regions appears to be increasing. In order to make a comparison with the human lineage, we made a similar analysis of a data set from humans using Alu repeats (fig. 2B). A significant correlation between GC content and GC* is observed (r = 0.614; P < 10^-4; 95% CI 0.583–0.642). As has been previously demonstrated, the gradient of the regression line between GC content and GC* is much less than one (0.242; P < 10^-4; 95% CI 0.227–0.258), indicating that GC content is becoming homogenized on the human lineage (Webster et al. 2005). There is therefore a strong contrast between the evolution of GC content in humans and chicken.


FIG. 2.— Relationship between GC content and GC* in the chicken and human genomes estimated using interspersed repeats. (A) Chicken CR1 elements and (B) human Alu elements. Dashed lines indicate a 1:1 relationship and solid lines indicate the real linear regression. The gradient of the linear regression is significantly greater than one for the CR1 repeats, indicating that heterogeneity in GC content is reinforced along the chicken lineage. Conversely, the gradient of the linear regression for the Alu repeats is significantly less than one, demonstrating that GC content has been homogenized along the human lineage.

 
We also analyzed the pattern of substitution in 34 intron alignments from chicken, turkey, and quail. The total number of aligned bases was 22.5 kb. Figure 3 shows a significant correlation between the average GC content of each intron and GC* (r = 0.618; P < 10⁻⁴; 95% CI 0.374–0.790). The gradient of this line is not significantly different from one (0.837; 95% CI 0.451–1.24). However, the gradient is significantly greater than the gradient from Alu repeats in figure 2B (P = 0.006). This is therefore consistent with the data from chicken CR1 repeats, indicating that GC content is either stable or reinforced along the chicken lineage. We performed simulations to test the reliability of using parsimony to estimate GC* from the intron alignments. When the substitution parameters were set so that GC* remained at 36%, parsimony estimated GC* to be 0.4% higher. With values of 42%, 52%, and 60%, the parsimony method estimated GC* to be 0.6%, 1.4%, and 2.1% lower, respectively. This indicates that the expected error is small within the range of GC values used in this study. In addition, there is a trend toward parsimony leading to false inference of homogenization of GC content, as has been previously demonstrated (Eyre-Walker 1998). It is therefore unlikely that the use of parsimony could lead us to erroneously infer the close correspondence between GC and GC* observed in our data set.
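A parsimony-style estimate of GC* from a three-species alignment can be sketched as follows. The sketch makes simplifying assumptions that are not taken from the paper: the base shared by two of the three sequences is treated as ancestral, columns with gaps or three different bases are skipped, and GC* is computed from a two-state model as u / (u + v), where u and v are the AT -> GC and GC -> AT substitution frequencies per ancestral site. All names are hypothetical.

```python
WEAK = set("AT")
STRONG = set("GC")


def gc_star_from_alignment(seqs):
    """Estimate equilibrium GC from three aligned sequences of equal length."""
    at_sites = gc_sites = at_to_gc = gc_to_at = 0
    for column in zip(*seqs):
        if any(base not in "ACGT" for base in column):
            continue  # skip gaps and ambiguity codes
        states = set(column)
        if len(states) > 2:
            continue  # no majority base, ancestral state not inferable here
        ancestral = max(states, key=column.count)  # majority base (assumption)
        if ancestral in WEAK:
            at_sites += 1
        else:
            gc_sites += 1
        if len(states) == 2:
            derived = (states - {ancestral}).pop()
            if ancestral in WEAK and derived in STRONG:
                at_to_gc += 1
            elif ancestral in STRONG and derived in WEAK:
                gc_to_at += 1
    if at_sites == 0 or gc_sites == 0:
        raise ValueError("need both A/T and G/C ancestral sites")
    u = at_to_gc / at_sites  # AT -> GC frequency per ancestral A/T site
    v = gc_to_at / gc_sites  # GC -> AT frequency per ancestral G/C site
    if u + v == 0:
        raise ValueError("no GC-changing substitutions observed")
    return u / (u + v)


# Toy alignment: one AT -> GC change (column 1) and one GC -> AT change (column 5).
toy = ["AAAAGGGGAT", "AAAAGGGGAT", "GAAAAGGGAT"]
toy_gc_star = gc_star_from_alignment(toy)
```

In the toy alignment there are six ancestral A/T sites and four ancestral G/C sites, so u = 1/6, v = 1/4, and the estimate is 0.4.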


FIG. 3.— Relationship between GC content and GC* in substitution patterns inferred from alignments of 34 introns from the chicken, turkey, and quail genomes. The dashed line indicates a 1:1 relationship and the solid line indicates the linear regression. The gradient of the regression line is not significantly different from one.

 
Figure 4 shows the relationship between GC content and each of the seven individual substitution frequencies. We fitted quadratic equations to each graph to describe this relationship. Figure 4A shows this relationship for the two transversions that do not affect GC content (A:T -> T:A and C:G -> G:C). Neither of these rates exhibits much variation with GC content. However, for the transversions that do affect GC content (fig. 4B), the GC-increasing transversion (A:T -> C:G) shows a strong increase with GC content, whereas the opposite trend is shown by the GC-decreasing transversion (G:C -> T:A). A similar trend is exhibited by the two transitions (fig. 4C). The GC-increasing transition (A:T -> G:C) shows a strong increase with GC content, whereas the GC-decreasing one (C:G -> T:A) decreases with increasing GC. The CpG transition rate seems to increase in regions of higher GC content (fig. 4D). There is large variance in this measure, which could reflect difficulties in accurate estimation because there are fewer CpG sites. In a similar analysis of the human genome, Arndt, Hwa, and Petrov (2005) observed a strong (roughly twofold) increase in the C:G -> G:C and G:C -> T:A transversion frequencies in GC-poor regions (<35%). This is not observed in our data set.

In order to determine the predicted net effect of the estimated substitution pattern on substitution rate in each genomic segment, we multiplied each rate by the GC or AT content of the genomic segment (or CpG content in the case of the CpG rate). Figure 5A shows the calculated net predicted rate in each region due to the substitutions that do not affect GC content (A:T -> T:A or C:G -> G:C). This rate is virtually unchanged across regions of different GC contents. As seen from figures 4B and C, both of the GC-increasing (AT -> GC) substitution frequencies show a similar (positive) relationship with GC content, whereas the GC-decreasing (GC -> AT) substitution frequencies both show a negative relationship with GC content. Figure 5B shows the net predicted effect on the AT -> GC rate. Despite the fact that the number of A:T nucleotides is lower (by definition) in regions of high GC content, the substitution rate due to AT -> GC changes increases with GC content. Hence, the increased substitution frequency of AT -> GC changes in regions of high GC content is strong enough to counteract the paucity of A:T nucleotides. Figure 5C shows the net predicted effect on the GC -> AT rate. When the CpG transitions are also included, the relationship is almost identical to the AT -> GC relationship in figure 5B. This congruence of the relationships between both AT -> GC and GC -> AT with GC content is consistent with the relationship between GC content and GC* shown in figure 2A: the net effects of AT -> GC and GC -> AT substitutions in changing GC content cancel out, and GC content remains relatively stable. When CpG transitions are excluded, the net predicted effect on substitution rate of GC -> AT substitutions shows very little variation with GC content. This indicates that the presence of additional methylated CpG sites is an important factor increasing mutation rate in GC-rich regions in chicken.
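The weighting described above can be sketched in a few lines. This is a hedged illustration: the dictionary keys follow the substitution labels used in the text, but the function name, the treatment of CpG content as a fraction of sites, the additive handling of the CpG term, and all numbers are assumptions.

```python
def net_predicted_rates(freqs, gc, cpg):
    """Weight each substitution frequency by the content of its source base.

    freqs: per-site substitution frequencies keyed by substitution type
    gc:    G+C fraction of the segment (A+T fraction is 1 - gc)
    cpg:   fraction of sites that are CpG (assumed definition)
    """
    at = 1.0 - gc
    neutral = freqs["A:T->T:A"] * at + freqs["C:G->G:C"] * gc
    at_to_gc = (freqs["A:T->C:G"] + freqs["A:T->G:C"]) * at
    gc_to_at_no_cpg = (freqs["G:C->T:A"] + freqs["C:G->T:A"]) * gc
    gc_to_at = gc_to_at_no_cpg + freqs["CpG"] * cpg
    return {
        "GC-neutral": neutral,
        "AT->GC": at_to_gc,
        "GC->AT (no CpG)": gc_to_at_no_cpg,
        "GC->AT": gc_to_at,
        "total": neutral + at_to_gc + gc_to_at,
    }


# Made-up frequencies for a GC-poor and a GC-rich block.
poor = net_predicted_rates(
    {"A:T->T:A": 0.02, "C:G->G:C": 0.02, "A:T->C:G": 0.03, "A:T->G:C": 0.07,
     "G:C->T:A": 0.04, "C:G->T:A": 0.12, "CpG": 0.8},
    gc=0.36, cpg=0.005)
rich = net_predicted_rates(
    {"A:T->T:A": 0.02, "C:G->G:C": 0.02, "A:T->C:G": 0.05, "A:T->G:C": 0.11,
     "G:C->T:A": 0.03, "C:G->T:A": 0.09, "CpG": 0.8},
    gc=0.50, cpg=0.02)
```

With these toy values the AT -> GC term is larger in the GC-rich block (0.08 versus 0.064) even though it contains fewer A:T sites, which mirrors the argument made for figure 5B.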


FIG. 5.— Relationships between the net number of predicted substitutions inferred from CR1 repeats in each genomic block. (A) Substitutions that do not alter GC content. (B) AT -> GC substitutions. (C) GC -> AT substitutions (crosses represent the data set excluding CpG sites, and circles represent the data set including CpG sites). In each case, the quadratic fit is also presented.

 
When all seven substitution frequencies are used to estimate the net predicted substitution rate relative to the genomic average, there is a strong positive correlation with GC content (r = 0.832; P < 10⁻⁴; 95% CI 0.781–0.876). Figure 6 shows the linear regression fitted to these data. There is a more than twofold variation in these rates between 5-Mb blocks in genomic regions with low and high GC content, indicating that there is substantial variation in rates of single-nucleotide mutation and fixation across the chicken genome.


FIG. 6.— Significant positive correlation between GC content and the net predicted substitution rate relative to the genomic average net predicted rate, taking all substitution frequencies into account. Line represents linear regression.

 
To investigate the effect of genomic location on the rate and pattern of nucleotide substitution, we partitioned the 5-Mb blocks into those on microchromosomes, macrochromosomes, and intermediate chromosomes. The distal portions of macrochromosomes (defined as 5 Mb encompassing the subtelomeric region at each end of the chromosome) were considered separately. A shorter definition of subtelomeric regions was not used due to lack of data. As we have shown previously (Axelsson et al. 2005), microchromosomes have significantly higher predicted net substitution rates than macrochromosomes (fig. 7A; P < 10⁻⁴). Rates on intermediate chromosomes are significantly different from both of these classes and lie in between the two (P < 10⁻⁴ for both comparisons). The distal regions of macrochromosomes have significantly greater rates than the remainder of macrochromosomes (P < 10⁻⁴), which are similar to intermediate chromosomes. The average rate on microchromosomes is 19.7% higher than the genomic average. To understand how much of this variation can be explained by correlation with GC content, we plotted the residuals of the regression between GC and substitution rate (shown in fig. 6). The differences are greatly reduced (fig. 7B), although rates on macrochromosomes are still significantly lower than those on both microchromosomes (P = 0.006) and intermediate chromosomes (P < 10⁻⁴). There are no other significant differences. This indicates that GC content is able to explain the majority of variation between different chromosomal classes. After correcting for GC content, microchromosomes have net predicted rates only 3.15% higher than the genomic average.


FIG. 7.— (A) Average net predicted substitution rates relative to the genomic average in the three different chromosome classes and distal regions of macrochromosomes. There are significant differences between classes, with microchromosomes exhibiting a rate ~20% above the genomic average. (B) The residuals of the previous graph when correcting for the linear regression with GC content.

 
There are also significant differences between GC* estimated between different chromosomal regions (fig. 8A). The average GC* in microchromosomes is 47%, whereas for macrochromosomes it is 37%, with intermediate chromosomes about halfway between (42%). Pairwise comparisons between macrochromosomes, intermediate, and microchromosomes are all highly significant (P < 10⁻⁴) with distal regions of macrochromosomes significantly higher than the remainder of macrochromosomes (P < 10⁻⁴). The distal portions of macrochromosomes appear similar to microchromosomes in GC*. When we corrected for flanking GC content by examining the residuals of the linear regression between GC content and GC*, the difference in GC* between genomic regions almost completely disappears (fig. 8B). None of the pairwise comparisons are now significant except for that between the distal and nondistal regions of macrochromosomes (P < 0.009).


FIG. 8.— (A) Average estimates of GC* in the three different chromosome classes and the distal regions of macrochromosomes. Microchromosomes and distal regions of macrochromosomes tend toward significantly higher GC* than other regions. (B) The residuals of the previous graph when correcting for the linear regression with GC content.

 
Exon density has been previously examined as a potential correlate of the rate and pattern of nucleotide substitution (Arndt, Hwa, and Petrov 2005). There is a strong correlation between exon density and net predicted rates (r = 0.742; P < 10⁻⁴; 95% CI 0.636–0.821) and between exon density and GC* (r = 0.786; P < 10⁻⁴; 95% CI 0.718–0.846) in our data set. As predicted from the human genome analysis (IHGSC 2001), exon density strongly correlates with nonrepetitive, nongenic GC content in our data set (r = 0.866; P < 10⁻⁴; 95% CI 0.825–0.901). Hence, it is unclear which of these factors explains the greatest proportion of variation in the rate and pattern of substitution. There is no significant correlation between exon density and the residuals of the linear regression between GC content and net predicted substitution rate (r = 0.039; P = 0.369; 95% CI –0.193 to 0.277). However, a significant correlation remains between GC content and the residuals of the linear regression between exon density and net predicted substitution rate (r = 0.352; P = 8 × 10⁻⁴; 95% CI 0.186–0.509). This indicates that the correlation between net predicted rate and exon density can be fully accounted for by the relationship between net predicted rate and GC content. Additionally, there is no significant correlation between exon density and the residuals of the linear regression between GC content and GC* (r = 0.019; P = 0.464; 95% CI –0.200 to 0.250). However, a significant correlation remains between GC content and the residuals of the linear regression between exon density and GC* (r = 0.352; P = 8 × 10⁻⁴; 95% CI 0.186–0.509). Hence, the correlation between exon density and GC* can also be fully accounted for by variation in GC content.
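The logic of the residual comparisons can be illustrated with a toy example. The sketch below is not the analysis itself: the data are constructed so that rate depends on GC content only while exon density is merely correlated with GC content, and all function names and numbers are hypothetical.

```python
from itertools import product


def mean(values):
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals(xs, ys):
    """Residuals of the least-squares regression of ys on xs."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Toy blocks: a balanced design so that GC, the exon-density offset, and the
# rate noise are mutually uncorrelated by construction.
gc, exon, rate = [], [], []
for g, e, n in product((0.35, 0.40, 0.45, 0.50), (-0.02, 0.02), (-0.01, 0.01)):
    gc.append(g)
    exon.append(g + e)        # exon density tracks GC plus its own variation
    rate.append(2.0 * g + n)  # rate is driven by GC only

# Does exon density explain anything once GC is accounted for?
r_exon_given_gc = pearson_r(exon, residuals(gc, rate))
# Does GC still explain something once exon density is accounted for?
r_gc_given_exon = pearson_r(gc, residuals(exon, rate))
```

In this toy case the first correlation is zero up to rounding and the second remains clearly positive, which is the asymmetric pattern that points to GC content as the better predictor.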


    Discussion
 
Inferring Substitutions from CR1 Repeats
We reconstructed patterns of nucleotide substitution along the chicken lineage by comparing CR1 repeats with their inferred ancestral sequence. Calibrating the molecular clock with an estimate of substitution rate from Alu repeats in mammals suggests that an average transversion frequency of 0.1 corresponds to 35 MYA (Kapitonov and Jurka 1996; Arndt, Petrov, and Hwa 2003; ICGSC 2004). If we apply this estimate to CR1 repeats, this leads to a maximum age of repeats analyzed here of ~155 MYA, with the majority of repeats being inserted 50–125 MYA. Phylogenetic analysis of the ancestral sequences suggests that CR1 repeats are distantly related to mammalian L3 elements and that some chicken CR1 repeat families predate the chicken-turtle split (~210–230 MYA; Hedges and Poling 1999; ICGSC 2004). The reason for this discrepancy is unclear, but it could indicate that the rate of neutral nucleotide substitution has been slower on average on the chicken than on the human lineage since the chicken/human split, as observed in the rate of genome rearrangements (Bourque et al. 2005). Nevertheless, our analysis of CR1 elements should provide good estimates of average substitution patterns since the origin of avian lineages. As we have only used repeat elements from the CR1 family, differences in nucleotide substitution between genomic regions cannot be due to differences in the pattern of evolution between different classes of repeat element.
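The clock arithmetic can be made explicit with a small sketch. It assumes strict proportionality between transversion frequency and age (no correction for saturation), which is a simplification introduced here; only the calibration point of 0.1 per 35 million years comes from the text, and the function names are hypothetical.

```python
# Calibration point quoted in the text: transversion frequency 0.1 ~ 35 million years.
CALIBRATION_TRANSVERSION = 0.1
CALIBRATION_AGE_MY = 35.0


def age_from_transversion(tv_freq):
    """Approximate age in million years, assuming a linear clock."""
    return tv_freq / CALIBRATION_TRANSVERSION * CALIBRATION_AGE_MY


def transversion_from_age(age_my):
    """Transversion frequency implied by an age under the same linear clock."""
    return age_my / CALIBRATION_AGE_MY * CALIBRATION_TRANSVERSION


# Transversion frequency that would correspond to the ~155 MYA maximum age.
oldest_tv = transversion_from_age(155.0)
```

Under this linear assumption, the maximum age of ~155 MYA corresponds to an average transversion frequency of roughly 0.44.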

We used an ML method that can accurately reconstruct neutral substitution patterns, including the neighbor-dependent CpG rate, to infer patterns of evolution in CR1 repeats. This is important because it takes into account multiple hits and can model the effect of CpG mutations, which may also affect sites that are not ancestrally CpG. As with other analyses of this type, it is necessary to assume a star phylogeny, which implies that insertions of particular CR1 families occur in rapid bursts followed by inactivity. This is a generally accepted model for the evolution of vertebrate interspersed repeats (Kapitonov and Jurka 1996; Jurka 2000). We make the assumption that the master copy in RepBase identified by RepeatMasker at each insertion site is the true ancestral sequence and that all differences from the descendant sequences accumulated through neutral substitutions subsequent to insertion.

The set of 22 CR1 master copies currently available in RepBase was constructed by improving on a previously defined set of transposable elements using the program RECON (Bao and Eddy 2002) as part of the analysis of the completed chicken genome (ICGSC 2004). The RECON program has been demonstrated to recover the known families of transposable elements in the human genome with high accuracy (Bao and Eddy 2002). The master copies in RepBase should therefore correspond well to the ancestral sequences of the CR1 repeats in the chicken genome. However, we cannot rule out the possibility that as yet unidentified master copies have acted as secondary source elements, as may occur in human Alu repeats (Cordaux et al. 2004). Errors in identification of the correct master element would mean that some of the estimated substitutions actually occurred between the identified master copy and the true ancestor, rather than accumulating neutrally subsequent to insertion. This would result in genome-wide errors and would therefore lead to inference of a more homogenized substitution pattern. As we observe strong regional biases in the pattern of substitution, we argue that it is unlikely that the presence of unidentified master copies has strongly influenced the results.

Another potential problem with the use of repeat elements to infer patterns of substitution is that they may not be representative of noncoding sequence in general. In particular, CR1 elements are rich in CpG sites, which may experience higher degrees of methylation in transposable elements than in surrounding noncoding DNA (Meunier et al. 2005). This could cause rates of CpG mutability to be slightly elevated in CR1 elements. Furthermore, ectopic gene conversion could occur between CR1 repeats, which could bias the pattern of neutral substitution, possibly leading to elevated estimates of GC* (Galtier 2003). However, as the effects of CpG mutability and gene conversion are likely to influence patterns of substitution at all CR1 repeats in a similar fashion, we do not expect them to contribute to the significant regional biases in patterns of nucleotide substitution that we infer.

Evolution of Isochores
In contrast to mammals, where GC content is becoming homogenized, patterns of molecular evolution in CR1 repeats indicate that genomic heterogeneity in GC content along the chicken lineage is increasing. Our analysis of patterns of substitution along the chicken and turkey lineages using intronic alignments also indicates that the forces maintaining variation in GC content are much stronger than in mammals.

What are the forces responsible for variation in GC? As the phylogenetic distribution of GC-rich isochores indicates that they have a common origin in the amniote common ancestor (see Introduction), it is likely that similar processes govern the evolution of GC content in mammals and birds. In primate noncoding alignments, a strong correlation between recombination rate and GC* indicates that recombination drives the evolution of GC content (Meunier and Duret 2004). Unfortunately, fine-scale recombination maps are not available for the chicken genome. In humans, recombination is known to be highly variable and rapidly evolving, even on the kilobase scale (Kauppi, Jeffreys, and Keeney 2004; McVean et al. 2004; Ptak et al. 2004). As GC content is known to correlate with recombination in birds (Hurst, Brunton, and Smith 1999; Galtier et al. 2001; ICGSC 2004) and a variety of other organisms (Birdsell 2002), it is the best available measure for local rates of recombination. Insight into the effect of recombination on the pattern of nucleotide substitution can therefore be gained by examining the variation of individual substitution rates with GC content. Those substitutions that affect GC content (AT -> GC or GC -> AT) all show a strong relationship with GC content. The AT -> GC substitution frequency increases with GC content, whereas the GC -> AT substitution frequency decreases with GC content (fig. 4B and C). These findings are compatible with an increased bias toward fixation of G:C over A:T alleles in regions of higher recombination. This is consistent with a strong effect of BGC in regions of high recombination. It should, however, be noted that so far there is no direct evidence to suggest that BGC is an important process in birds.

What could be responsible for the differences in the evolution of GC content between mammals and birds? The karyotype of chicken is divided into macro- and microchromosomes and characterized by extreme variation in GC content and recombination rate. The BGC hypothesis suggests that these factors are linked because smaller chromosomes tend to have higher recombination rates and hence experience a higher intensity of BGC, which results in elevated GC content. There is good evidence from comparative genomics that the ancestral amniote karyotype was similar to the chicken genome (Burt et al. 1999; Burt 2002; ICGSC 2004; Bourque et al. 2005). This could suggest that a GC-reinforcing pattern of nucleotide substitution, as inferred in the chicken genome, was also present in the ancestral amniote genome. The ancestral bony vertebrate genome (450 MYA) has been estimated to have contained 12 chromosomes by comparison of human and Tetraodon (ICGSC 2004; Jaillon et al. 2004). Hence, it appears that the ancestral amniote karyotype evolved after this split. Heterogeneity in GC content could then have arisen due to increased variability in recombination rates.

Although it seems likely that the major trends for GC content are decay in mammals and reinforcement in birds, the detailed picture is probably far more complex. Chromosome number and genome size vary both within and between mammalian and avian orders, and extremes of recombination rate are therefore possible in a variety of species. In general, avian genomes have a large number of chromosomes (2n is usually 60–80) and genome sizes of roughly 1–2 billion base pairs. Relative to mammals, large-scale genome duplications and rearrangements are infrequent (Shetty, Griffin, and Graves 1999; Bourque et al. 2005). Mammalian genomes are roughly 2–4 billion base pairs in size but exhibit large variability in chromosome numbers (Gregory 2005). This large variation in karyotype suggests that many mammalian genomes may have regions with high recombination rates. It is therefore quite possible that GC-rich isochores are being preserved or reinforced in parts of some mammalian genomes. Likewise, they may be found in lineages outside amniotes: both a GC repair bias and a correlation between GC content and recombination have also been observed in many species unrelated to amniotes, including amphibians, plants, fish, yeast, and bacteria (Birdsell 2002). However, the broad picture that is emerging is one whereby strong variation in GC arose in the ancestor of birds, mammals, and reptiles and that this heterogeneity has been maintained or reinforced along some lineages, whereas others show a tendency for homogenization.

Determinants of Nucleotide Substitution Rate
The strong isochore structure of the chicken genome and the finding that this heterogeneity in GC content is being reinforced have important consequences in generating variation in mutation and substitution rate across the genome. Some recent studies have suggested that recombination is mutagenic in humans (Lercher and Hurst 2002; Hellmann et al. 2003). To examine this potential effect, Filatov (2004) analyzed rates and patterns of substitution in the highly recombining human p-arm pseudoautosomal region. In order to exclude the potential effects of BGC, which only affects AT -> GC and GC -> AT mutations, only A <-> T and G <-> C substitutions were considered. As these rates were elevated in the pseudoautosomal region, it was concluded that an additional factor such as a mutagenic effect of recombination was important. To examine this using the current data set, we studied variation with GC content of the same transversion frequencies (A <-> T and G <-> C). These substitution frequencies show little variation with GC content. Indeed, all the predicted net variation in substitution rates is due to AT -> GC and GC -> AT substitutions. If a process such as recombination increases all forms of mutation in GC-rich regions, we would expect substitutions that do not affect GC to also change. As this is not observed, recombination may not be mutagenic in chicken. Alternatively, it is possible that it does not cause A <-> T or G <-> C mutations or that some information is lost by using GC content as a proxy for recombination.

The main trends of variation in substitution frequencies are a strong increase in AT -> GC substitutions with GC content and corresponding decrease in the opposite GC -> AT type (fig. 4B and C). This is consistent with the action of BGC, which favors the fixation of G:C over A:T alleles in regions of higher recombination rate (and GC content). However, as shown in figure 5B and C, the net effect of these opposing substitutions on GC content is expected to cancel out (i.e., the relationship between GC content and the net predicted AT -> GC and GC -> AT substitution rates is roughly the same). This results in GC content remaining approximately stable, as evidenced by a good correspondence between GC* and GC content in fig. 2A (although the gradient is actually significantly greater than one). In concordance with the findings of Axelsson et al. (2005) and ICGSC (2004), the net predicted effect of the variation in substitution frequencies is a strong elevation of substitution rate in microchromosomes and other GC-rich regions, such as the distal portions of macrochromosomes encompassing the subtelomeric regions. Examination of the residuals of the correlations between GC content and the rate and pattern of substitution indicates that the majority of variation in nucleotide substitution between chromosome types and distal regions can be accounted for by GC content. We therefore have no evidence to suggest that there are any qualitative differences in the mode of evolution between these genomic regions.

What is the cause of higher substitution rates in GC-rich regions? It is likely that the patterns result from a complex interaction of factors influencing mutation and fixation. From figure 5C, it seems clear that there is an increased number of predicted changes due to CpG mutations in GC-rich regions. This is not unexpected, as CpG sites are highly mutable and more frequent in GC-rich regions. However, there is also a corresponding increase in AT -> GC substitutions (fig. 5B). It is likely that BGC is important in generating this increase as it favors the fixation of G:C alleles over A:T in regions of high recombination. Hence, this could lead to a dynamic situation where the high rate of decay of CpG sites is balanced by BGC, which creates new CpG sites in GC-rich regions. So far this situation has not been explicitly modeled, although good simulations exist where fixation biases are uniform across the genome (Piganeau et al. 2002). Notably, recombination and CpG motifs are strongly correlated in the human genome (Kong et al. 2002). This could be explained by BGC favoring the fixation of G:C alleles and thus generating new CpG sites. The correlation between divergence and recombination reported in humans and related species (Hellmann et al. 2003) could also be partly due to this effect. Further simulations would be helpful in understanding this dynamic process.


    Supplementary Material
 
Supplementary Table 1 is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Acknowledgements
 
This work was supported by the Swedish Research Council and Science Foundation Ireland. We thank Arian Smit and Robert Hubley for help with RepeatMasker and Ken Wolfe, Gavin Conant, and Marie Sémon for critical reading of the manuscript.


    Footnotes
 
1 Present address: Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin, Ireland.

Aoife McLysaght, Associate Editor


    References

    Alvarez-Valin, F., O. Clay, S. Cruveiller, and G. Bernardi. 2004. Inaccurate reconstruction of ancestral GC levels creates a "vanishing isochores" effect. Mol. Phylogenet. Evol. 31:788–793.[CrossRef][Web of Science][Medline]

    Antezana, M. A. 2005. Mammalian GC content is very close to mutational equilibrium. J. Mol. Evol. 61:834–836.[CrossRef][Web of Science][Medline]

    Arndt, P. F., C. B. Burge, and T. Hwa. 2003. DNA sequence evolution with neighbor-dependent mutation. J. Comput. Biol. 10:313–322.[CrossRef][Web of Science][Medline]

    Arndt, P. F., T. Hwa, and D. A. Petrov. 2005. Substantial regional variation in substitution rates in the human genome: importance of GC content, gene density, and telomere-specific effects. J. Mol. Evol. 60:748–763.[CrossRef][Web of Science][Medline]

    Arndt, P. F., D. A. Petrov, and T. Hwa. 2003. Distinct changes of genomic biases in nucleotide substitution at the time of mammalian radiation. Mol. Biol. Evol. 20:1887–1896.[Abstract/Free Full Text]

    Axelsson, E., M. T. Webster, N. G. Smith, D. W. Burt, and H. Ellegren. 2005. Comparison of the chicken and turkey genomes reveals a higher rate of nucleotide divergence on microchromosomes than macrochromosomes. Genome Res. 15:120–125.[Abstract/Free Full Text]

    Bao, Z., and S. R. Eddy. 2002. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12:1269–1276.[Abstract/Free Full Text]

    Belle, E. M., L. Duret, N. Galtier, and A. Eyre-Walker. 2004. The decline of isochores in mammals: an assessment of the GC content variation along the mammalian phylogeny. J. Mol. Evol. 58:653–660.[CrossRef][Web of Science][Medline]

    Bernardi, G. 2000. Isochores and the evolutionary genomics of vertebrates. Gene 241:3–17.[CrossRef][Web of Science][Medline]

    Bernardi, G., S. Hughes, and D. Mouchiroud. 1997. The major compositional transitions in the vertebrate genome. J. Mol. Evol. 44(Suppl. 1):S44–S51.

    Bernardi, G., B. Olofsson, J. Filipski, M. Zerial, J. Salinas, G. Cuny, M. Meunier-Rotival, and F. Rodier. 1985. The mosaic genome of warm-blooded vertebrates. Science 228:953–958.[Abstract/Free Full Text]

    Birdsell, J. A. 2002. Integrating genomics, bioinformatics, and classical genetics to study the effects of recombination on genome evolution. Mol. Biol. Evol. 19:1181–1197.[Abstract/Free Full Text]

    Bourque, G., E. M. Zdobnov, P. Bork, P. A. Pevzner, and G. Tesler. 2005. Comparative architectures of mammalian and chicken genomes reveal highly variable rates of genomic rearrangements across different lineages. Genome Res. 15:98–110.[Abstract/Free Full Text]

    Burt, D. W. 2002. Origin and evolution of avian microchromosomes. Cytogenet. Genome Res. 96:97–112.[CrossRef][Web of Science][Medline]

    Burt, D. W., C. Bruley, I. C. Dunn et al. (13 co-authors). 1999. The dynamics of chromosome evolution in birds and mammals. Nature 402:411–413.[CrossRef]

    Bush, G. L., S. M. Case, A. C. Wilson, and J. L. Patton. 1977. Rapid speciation and chromosomal evolution in mammals. Proc. Natl. Acad. Sci. USA 74:3942–3946.[Abstract/Free Full Text]

    Caron, H., B. van Schaik, M. van der Mee et al. (13 co-authors). 2001. The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science 291:1289–1292.[Abstract/Free Full Text]

    Cordaux, R., D. J. Hedges, M. A. Batzer, P. L. Deininger, J. V. Moran, and H. H. Kazazian Jr. 2004. Retrotransposition of Alu elements: how many sources? Trends Genet. 20:464–467.[CrossRef][Web of Science][Medline]

    Dimcheff, D. E., S. V. Drovetski, and D. P. Mindell. 2002. Phylogeny of Tetraoninae and other galliform birds using mitochondrial 12S and ND2 genes. Mol. Phylogenet. Evol. 24:203–215.[CrossRef][Web of Science][Medline]

    Duret, L. 2006. The GC content of primates and rodents genomes is not at equilibrium: a reply to Antezana. J. Mol. Evol. (in press).

    Duret, L., M. Semon, G. Piganeau, D. Mouchiroud, and N. Galtier. 2002. Vanishing GC-rich isochores in mammalian genomes. Genetics 162:1837–1847.[Abstract/Free Full Text]

    Ellegren, H., N. G. Smith, and M. T. Webster. 2003. Mutation rate variation in the mammalian genome. Curr. Opin. Genet. Dev. 13:562–568.[CrossRef][Web of Science][Medline]

    Eyre-Walker, A. 1993. Recombination and mammalian genome evolution. Proc. R. Soc. Lond. B Biol. Sci. 252:237–243.[Medline]

    ———. 1998. Problems with parsimony in sequences of biased base composition. J. Mol. Evol. 47:686–690.[CrossRef][Web of Science][Medline]

    Filatov, D. A. 2004. A gradient of silent substitution rate in the human pseudoautosomal region. Mol. Biol. Evol. 21:410–417.[Abstract/Free Full Text]

    Filipski, J. 1987. Correlation between molecular clock ticking, codon usage fidelity of DNA repair, chromosome banding and chromatin compactness in germline cells. FEBS Lett. 217:184–186.[CrossRef][Web of Science][Medline]

    Filipski, J., J. P. Thiery, and G. Bernardi. 1973. An analysis of the bovine genome by Cs2SO4-Ag density gradient centrifugation. J. Mol. Biol. 80:177–197.

    Fullerton, S. M., A. Bernardo Carvalho, and A. G. Clark. 2001. Local rates of recombination are positively correlated with GC content in the human genome. Mol. Biol. Evol. 18:1139–1142.

    Galtier, N. 2003. Gene conversion drives GC content evolution in mammalian histones. Trends Genet. 19:65–68.

    Galtier, N., G. Piganeau, D. Mouchiroud, and L. Duret. 2001. GC-content evolution in mammalian genomes: the biased gene conversion hypothesis. Genetics 159:907–911.

    Gregory, T. R. 2005. Animal genome size database. (http://www.genomesize.com).

    Hedges, S. B., and L. L. Poling. 1999. A molecular phylogeny of reptiles. Science 283:998–1001.

    Hellmann, I., I. Ebersberger, S. E. Ptak, S. Paabo, and M. Przeworski. 2003. A neutral explanation for the correlation of diversity with recombination rates in humans. Am. J. Hum. Genet. 72:1527–1535.

    Hughes, S., O. Clay, and G. Bernardi. 2002. Compositional patterns in reptilian genomes. Gene 295:323–329.

    Hughes, S., D. Zelus, and D. Mouchiroud. 1999. Warm-blooded isochore structure in Nile crocodile and turtle. Mol. Biol. Evol. 16:1521–1527.

    Hurst, L. D., C. F. Brunton, and N. G. Smith. 1999. Small introns tend to occur in GC-rich regions in some but not all vertebrates. Trends Genet. 15:437–439.

    Hurst, L. D., and E. J. Williams. 2000. Covariation of GC content and the silent site substitution rate in rodents: implications for methodology and for the evolution of isochores. Gene 261:107–114.

    [ICGSC] International Chicken Genome Sequencing Consortium. 2004. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432:695–716.

    [IHGSC] International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409:860–921.

    Jaillon, O., J. M. Aury, F. Brunet et al. (61 co-authors). 2004. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature 431:946–957.

    Jurka, J. 2000. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16:418–420.

    Kapitonov, V., and J. Jurka. 1996. The age of Alu subfamilies. J. Mol. Evol. 42:59–65.

    Kauppi, L., A. J. Jeffreys, and S. Keeney. 2004. Where the crossovers are: recombination distributions in mammals. Nat. Rev. Genet. 5:413–424.

    Kong, A., D. F. Gudbjartsson, J. Sainz et al. (16 co-authors). 2002. A high-resolution recombination map of the human genome. Nat. Genet. 31:241–247.

    Lercher, M. J., and L. D. Hurst. 2002. Human SNP variability and mutation rate are higher in regions of high recombination. Trends Genet. 18:337–340.

    Lercher, M. J., N. G. Smith, A. Eyre-Walker, and L. D. Hurst. 2002. The evolution of isochores. Evidence from SNP frequency distributions. Genetics 162:1805–1810.

    Lercher, M. J., A. O. Urrutia, A. Pavlicek, and L. D. Hurst. 2003. A unification of mosaic structures in the human genome. Hum. Mol. Genet. 12:2411–2415.

    Marais, G. 2003. Biased gene conversion: implications for genome and sex evolution. Trends Genet. 19:330–338.

    McDonald, J. H., and M. Kreitman. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.

    McVean, G. A., S. R. Myers, S. Hunt, P. Deloukas, D. R. Bentley, and P. Donnelly. 2004. The fine-scale structure of recombination rate variation in the human genome. Science 304:581–584.

    Meunier, J., and L. Duret. 2004. Recombination drives the evolution of GC-content in the human genome. Mol. Biol. Evol. 21:984–990.

    Meunier, J., A. Khelifi, V. Navratil, and L. Duret. 2005. Homology-dependent methylation in primate repetitive DNA. Proc. Natl. Acad. Sci. USA 102:5471–5476.

    Montoya-Burgos, J. I., P. Boursot, and N. Galtier. 2003. Recombination explains isochores in mammalian genomes. Trends Genet. 19:128–130.

    Mouchiroud, D., G. D'Onofrio, B. Aissani, G. Macaya, C. Gautier, and G. Bernardi. 1991. The distribution of genes in the human genome. Gene 100:181–187.

    Nagylaki, T. 1983. Evolution of a finite population under gene conversion. Proc. Natl. Acad. Sci. USA 80:6278–6281.

    Nekrutenko, A., and W. H. Li. 2000. Assessment of compositional heterogeneity within and between eukaryotic genomes. Genome Res. 10:1986–1995.

    Otto, S. P., and T. Lenormand. 2002. Resolving the paradox of sex and recombination. Nat. Rev. Genet. 3:252–261.

    Pardo-Manuel de Villena, F., and C. Sapienza. 2001. Female meiosis drives karyotypic evolution in mammals. Genetics 159:1179–1189.

    Piganeau, G., D. Mouchiroud, L. Duret, and C. Gautier. 2002. Expected relationship between the silent substitution rate and the GC content: implications for the evolution of isochores. J. Mol. Evol. 54:129–133.

    Ptak, S. E., A. D. Roeder, M. Stephens, Y. Gilad, S. Paabo, and M. Przeworski. 2004. Absence of the TAP2 human recombination hotspot in chimpanzees. PLoS Biol. 2:e155.

    Saccone, S., A. De Sario, J. Wiegant, A. K. Raap, G. Della Valle, and G. Bernardi. 1993. Correlations between isochores and chromosomal bands in the human genome. Proc. Natl. Acad. Sci. USA 90:11929–11933.

    Shetty, S., D. K. Griffin, and J. A. Graves. 1999. Comparative painting reveals strong chromosome homology over 80 million years of bird evolution. Chromosome Res. 7:289–295.

    Smith, N. G., and A. Eyre-Walker. 2002. The compositional evolution of the murid genome. J. Mol. Evol. 55:197–201.

    Smith, N. G., M. T. Webster, and H. Ellegren. 2002. Deterministic mutation rate variation in the human genome. Genome Res. 12:1350–1356.

    Sueoka, N. 1988. Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. USA 85:2653–2657.

    Templeton, A. R., A. G. Clark, K. M. Weiss, D. A. Nickerson, E. Boerwinkle, and C. F. Sing. 2000. Recombinational and mutational hotspots within the human lipoprotein lipase gene. Am. J. Hum. Genet. 66:69–83.

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.

    Vinogradov, A. E. 2003. DNA helix: the importance of being GC-rich. Nucleic Acids Res. 31:1838–1844.

    Vinogradov, A. E. 2005. Noncoding DNA, isochores and gene expression: nucleosome formation potential. Nucleic Acids Res. 33:559–563.

    Webster, M. T., and N. G. Smith. 2004. Fixation biases affecting human SNPs. Trends Genet. 20:122–126.

    Webster, M. T., N. G. Smith, and H. Ellegren. 2003. Compositional evolution of noncoding DNA in the human and chimpanzee genomes. Mol. Biol. Evol. 20:278–286.

    Webster, M. T., N. G. Smith, L. Hultin-Rosenberg, P. F. Arndt, and H. Ellegren. 2005. Male-driven biased gene conversion governs the evolution of base composition in human Alu repeats. Mol. Biol. Evol. 22:1468–1474.

    Wicker, T., J. S. Robertson, S. R. Schulze et al. (11 co-authors). 2005. The repetitive landscape of the chicken genome. Genome Res. 15:126–136.

    Wolfe, K. H., P. M. Sharp, and W. H. Li. 1989. Mutation rates differ among regions of the mammalian genome. Nature 337:283–285.

    Yang, A. S., M. L. Gonzalgo, J. M. Zingg, R. P. Millar, J. D. Buckley, and P. A. Jones. 1996. The rate of CpG mutation in Alu repetitive elements within the p53 tumor suppressor gene in the primate germline. J. Mol. Biol. 258:240–250.

Accepted for publication March 13, 2006.


This article has been cited by other articles:

D. E. Janes, T. Ezaz, J. A. Marshall Graves, and S. V. Edwards
Recombination and Nucleotide Diversity in the Sex Chromosomal Pseudoautosomal Region of the Emu, Dromaius novaehollandiae
J. Hered., March 1, 2009; 100(2): 125–136.

C. L. Organ, R. G. Moreno, and S. V. Edwards
Three tiers of genome evolution in reptiles
Integr. Comp. Biol., October 1, 2008; 48(4): 494–504.

M. Peifer, J. E. Karro, and H. H. von Grunberg
Is there an acceleration of the CpG transition rate during the mammalian radiation?
Bioinformatics, October 1, 2008; 24(19): 2157–2164.

K. Nam and H. Ellegren
The Chicken (Gallus gallus) Z Chromosome Contains at Least Three Nonlinear Evolutionary Strata
Genetics, October 1, 2008; 180(2): 1131–1136.

M. Brandstrom and H. Ellegren
Genome-wide analysis of microsatellite polymorphism in chicken circumventing the ascertainment bias
Genome Res., June 1, 2008; 18(6): 881–887.

G. Abrusan, H.-J. Krambeck, T. Junier, J. Giordano, and P. E. Warburton
Biased Distributions and Decay of Long Interspersed Nuclear Elements in the Chicken Genome
Genetics, January 1, 2008; 178(1): 573–581.

J. E. Mank, L. Hultin-Rosenberg, E. Axelsson, and H. Ellegren
Rapid Evolution of Female-Biased, but Not Male-Biased, Genes Expressed in the Avian Brain
Mol. Biol. Evol., December 1, 2007; 24(12): 2698–2706.

L. Gordon, S. Yang, M. Tran-Gyamfi, D. Baggott, M. Christensen, A. Hamilton, R. Crooijmans, M. Groenen, S. Lucas, I. Ovcharenko, et al.
Comparative analysis of chicken chromosome 28 provides new clues to the evolutionary fragility of gene-rich vertebrate regions
Genome Res., November 1, 2007; 17(11): 1603–1613.

M. Brandstrom and H. Ellegren
The Genomic Landscape of Short Insertion and Deletion Polymorphisms in the Chicken (Gallus gallus) Genome: A High Frequency of Deletions in Tandem Duplicates
Genetics, July 1, 2007; 176(3): 1691–1701.

J. E. Mank, E. Axelsson, and H. Ellegren
Fast-X on the Z: Rapid evolution of sex-linked genes in birds
Genome Res., May 1, 2007; 17(5): 618–624.

M. M. Hoffman and E. Birney
Estimating the Neutral Rate of Nucleotide Substitution Using Introns
Mol. Biol. Evol., February 1, 2007; 24(2): 522–531.

