MBE Advance Access originally published online on July 26, 2007
Molecular Biology and Evolution 2007 24(10):2196-2202; doi:10.1093/molbev/msm149
Research Articles |
Context-Dependent Mutation Rates May Cause Spurious Signatures of a Fixation Bias Favoring Higher GC-Content in Humans

* Biological Statistics and Computational Biology, Cornell University
Clinical Science, Cornell University
E-mail: cdb28{at}cornell.edu.
| Abstract |
|---|
Understanding the proximate and ultimate causes underlying the evolution of nucleotide composition in mammalian genomes is of fundamental interest to the study of molecular evolution. Comparative genomics studies have revealed that many more substitutions occur from G and C nucleotides to A and T nucleotides than the reverse, suggesting that mammalian genomes are not at equilibrium for base composition. Analysis of human polymorphism data suggests that mutations that increase GC-content tend to be at much higher frequencies than those that decrease or preserve GC-content when the ancestral allele is inferred via parsimony using the chimpanzee genome. These observations have been interpreted as evidence for a fixation bias in favor of G and C alleles due to either positive natural selection or biased gene conversion. Here, we test the robustness of this interpretation to violations of the parsimony assumption using a data set of 21,488 noncoding single nucleotide polymorphisms (SNPs) discovered by the National Institute of Environmental Health Sciences (NIEHS) SNPs project via direct resequencing of n = 95 individuals. Applying standard nonparametric and parametric population genetic approaches, we replicate the signatures of a fixation bias in favor of G and C alleles when the ancestral base is assumed to be the base found in the chimpanzee outgroup. However, upon taking into account the probability of misidentifying the ancestral state of each SNP using a context-dependent mutation model, the corrected distribution of SNP frequencies for GC-content increasing SNPs is nearly indistinguishable from the patterns observed for other types of mutations, suggesting that the signature of fixation bias is a spurious artifact of the parsimony assumption.
Key Words: ancestral misidentification; biased gene conversion; context dependence; GC-content; human; natural selection; single nucleotide polymorphism; site-frequency spectrum
| Introduction |
|---|
Thirty years ago, mammalian genomes were described as mosaics of isochores or long stretches of DNA with relatively homogeneous base composition (Macaya et al. 1976
310–350 MYA (Bernardi et al. 1997
Comparative and population genomic data suggest that mammalian genomes may not be at compositional equilibrium and predict that GC-rich isochores are being degraded by mutation (Galtier and Gouy 1998; Arndt et al. 2003; Belle et al. 2004; Meunier and Duret 2004). Likewise, several studies have analyzed human polymorphism data and found evidence for a fixation bias in favor of mutations that increase GC-content (i.e., from A or T to G or C, denoted AT→GC; Eyre-Walker 1999; Webster et al. 2003). Such a fixation bias could be caused by either natural selection or biased gene conversion (i.e., when nucleotide mismatches formed from the hybridization of 2 DNA strands during meiosis are repaired asymmetrically). Thus, the prevailing view is that mutation biases or compositional disequilibrium tend to erode GC-content and biased fixation rates tend to increase it.
A powerful approach for detecting a fixation bias is the analysis of the frequency distribution of SNPs (i.e., the "unfolded" site frequency spectrum [SFS]) (Akashi 1999; Bustamante et al. 2001; Nielsen et al. 2005). Several studies have utilized such tools with a variety of models and data summaries to suggest the presence of an AT→GC fixation bias in the human genome, especially in regions of high GC-content (Duret et al. 2002; Lercher et al. 2002; Webster and Smith 2004), with similar patterns observed in the proximal regions of recombination hot spots (Spencer 2006; Spencer et al. 2006). Many of these tests rely on using a parsimony assumption and outgroup sequence data to distinguish ancestral alleles from derived alleles (i.e., the segregating allele matching the outgroup allele is assumed to be ancestral), though Lercher et al. (2002) did not use an outgroup and Webster and Smith (2004) developed a weighted parsimony technique (discussed below).
In a recent article (Hernandez et al. 2007), we presented a flexible method for relaxing the parsimony assumption by using a context-dependent mutation model, which includes features such as elevated mutation rates at CpG dinucleotides, increased propensity for transitional versus transversional mutations, as well as other directional and contextual mutation biases inferred along the human lineage by Hwang and Green (2004). We found that even for species as closely related as human and chimpanzee, enough unobserved nucleotide substitutions could have occurred to make some population genetic analyses spuriously reject neutrality. The spurious signal of selection is due to misidentifying the ancestral state of some SNPs via the parsimony assumption. Because most derived mutations tend to be rare, ancestral misidentification will most often lead to mislabeling low-frequency variants as extremely high-frequency mutations (a characteristic that would be consistent with a fixation bias).
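The mislabeling effect described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one uniform misidentification probability (`error_rate`) applied to every SNP, whereas the method of Hernandez et al. (2007) assigns a context-dependent probability to each SNP. The function name, the sample size of 12 chromosomes, the neutral 1/i spectrum, and the value 0.033 are all illustrative assumptions.

```python
def misidentified_sfs(true_sfs, error_rate):
    """Return the SFS expected when a fraction of SNPs is polarized wrongly.

    true_sfs[i - 1] holds the number of SNPs whose derived allele is seen
    i times in a sample of n chromosomes (i = 1 .. n - 1). A misidentified
    SNP with derived count i is reported with derived count n - i.
    """
    m = len(true_sfs)  # m = n - 1 frequency classes
    observed = []
    for idx in range(m):
        mirror = m - 1 - idx  # index of the class with derived count n - i
        observed.append((1 - error_rate) * true_sfs[idx]
                        + error_rate * true_sfs[mirror])
    return observed


n = 12
# Hypothetical neutral spectrum: counts proportional to 1 / i.
neutral = [1000.0 / i for i in range(1, n)]
observed = misidentified_sfs(neutral, 0.033)

# The highest-frequency class gains SNPs at the expense of the rarest class.
print(round(neutral[-1], 1), round(observed[-1], 1))
```

Because rare variants greatly outnumber common ones, even a small error rate visibly inflates the highest-frequency classes, which is the pattern a fixation bias would also produce.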
Because the mutation rate from GC→AT tends to be approximately 2-fold higher than the reverse (Hwang and Green 2004), if an AT→GC substitution should occur during the divergence of 2 species, the site-specific mutation rate would immediately increase approximately 2-fold, thereby doubling the relative probability of another mutation at this site on the same lineage. Should a polymorphism arise at such a site, an allele that matches the outgroup would be due to homoplasy and not indicative of ancestry. A further complicating factor in the analysis of the SFS is that historical demographic effects can have a large impact on the underlying frequency distribution of derived mutations (Slatkin and Hudson 1991; Nielsen 2001). Without explicitly accounting for such demographic forces, population genetic tests of the SFS can lead either to a false rejection of selective neutrality or to a poor fitting model.
Here, we use 2 approaches to test the fixation bias hypothesis using a large noncoding human polymorphism data set with and without correcting for ancestral misidentification. We find that when ancestral misidentification is not taken into account, neutral population genetic models tend to fit the data very poorly and suggest strong evidence for nonneutral processes acting on GC-content in the human genome. After correcting for ancestral misidentification using the method of Hernandez et al. (2007), we find much of the statistical evidence for a fixation bias favoring G and C alleles from SNP data goes away, suggesting that the result is an artifact of ancestral misidentification.
| Materials and Methods |
|---|
Data
The data used in this study were retrieved from the NIEHS Environmental Genome Project Web site (http://egp.gs.washington.edu). Our final data set represents a collection of SNPs obtained through direct sequencing of 161 genes (along with flanking and intronic regions) in a sample of 95 individuals (190 chromosomes) from 5 worldwide populations (panel 2): 15 African-American, 12 African (Yoruba), 22 European, 22 Hispanic, and 24 Asian individuals (Livingston et al. 2004
Our analysis is based primarily on the frequency distribution of derived mutations at all observed SNPs (i.e., the SFS). The SFS is a random vector that represents the number of SNPs whose derived allele is observed at each frequency in our sample of chromosomes. Missing sequence data from some chromosomes can cause SNPs to have a sample size smaller than the total set of 190 chromosomes. Rather than discarding all such SNPs, we only removed SNPs that had a sample size less than 40 (4 SNPs) and performed our analysis on the expected SFS in a subsample of size 40 chromosomes (Marth et al. 2004; Nielsen et al. 2004).
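One way to compute an expected SFS in a smaller subsample is sketched below. It assumes that subsampling is modeled as drawing chromosomes without replacement (a hypergeometric projection); the function name and the input layout, a dictionary keyed by (sample size, derived count), are hypothetical and chosen only for illustration.

```python
from math import comb


def project_sfs(snp_counts, m):
    """Expected SFS in a subsample of m chromosomes.

    snp_counts maps (n, i) to the number of SNPs observed with derived
    count i among n sampled chromosomes. SNPs with n < m are skipped.
    Returns a list of length m - 1 for derived counts 1 .. m - 1.
    """
    projected = [0.0] * (m + 1)
    for (n, i), count in snp_counts.items():
        if n < m:
            continue  # too few chromosomes to subsample from
        lowest = max(0, m - (n - i))
        highest = min(i, m)
        for j in range(lowest, highest + 1):
            # Hypergeometric probability of drawing j derived alleles.
            prob = comb(i, j) * comb(n - i, m - j) / comb(n, m)
            projected[j] += count * prob
    # Classes 0 and m are monomorphic in the subsample and are dropped.
    return projected[1:m]


# Toy input: one SNP seen twice in 4 chromosomes, projected to m = 2.
toy = project_sfs({(4, 2): 1}, 2)
print(toy)  # single entry: probability the SNP is still polymorphic
```

Every SNP with at least m sampled chromosomes contributes fractionally to each frequency class, so SNPs with some missing data are retained rather than discarded.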
Below, we describe a population genetic model to infer the parameters of a demographic model that allows for a fixation bias favoring GC-content. Because our demographic model cannot accommodate the complex dynamics of the full data set, only the results of analyzing the African-American and Yoruban populations will be reported (though not shown, the model fits the other populations very poorly). To accommodate the missing data in the African-American and Yoruban populations, both were analyzed using the expected SFS in a subsample of size 12 chromosomes (as above).
Testing the Significance of a Fixation Bias Favoring GC-Content
Our interest is in identifying whether or not natural selection or biased gene conversion has been acting on GC-content in the human genome. To do so, we analyzed the data set pooled across populations using 2 nonparametric tests: the Mann–Whitney U test (MWU) and the Kolmogorov–Smirnov test (KS). We also analyzed the African-American and Yoruban populations individually using a population genetic model of demography and selection. We performed all tests before and after correcting for the probability of misidentifying the ancestral state of each SNP (as discussed in Hernandez et al. [2007]).
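As a rough illustration of the two nonparametric comparisons, the sketch below computes only the test statistics (not P values) directly from two site-frequency spectra given as count vectors. The function names and toy spectra are assumptions made for this example; a full analysis would typically rely on a statistics library for the null distributions.

```python
def ks_statistic(sfs_a, sfs_b):
    """Two-sample Kolmogorov-Smirnov D between two SFS count vectors."""
    total_a, total_b = sum(sfs_a), sum(sfs_b)
    cum_a = cum_b = 0.0
    d = 0.0
    for a, b in zip(sfs_a, sfs_b):
        cum_a += a / total_a
        cum_b += b / total_b
        d = max(d, abs(cum_a - cum_b))
    return d


def mann_whitney_u(sfs_a, sfs_b):
    """Mann-Whitney U for class A: number of (A, B) SNP pairs in which the
    A SNP has the higher derived frequency, with ties counted as 1/2."""
    u = 0.0
    b_below = 0.0  # number of B SNPs at strictly lower frequency
    for a, b in zip(sfs_a, sfs_b):
        u += a * (b_below + 0.5 * b)
        b_below += b
    return u


# Toy spectra over derived counts 1, 2, 3 (made-up numbers).
class_a = [1, 0, 1]
class_b = [0, 2, 0]
print(ks_statistic(class_a, class_b), mann_whitney_u(class_a, class_b))
```

Both statistics depend only on the relative ordering of allele frequencies, which is why they require no demographic model.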
Because recent demographic effects can confound inference of fixation biases using the SFS, applying population genetic techniques to infer the presence/strength of a fixation bias without taking into account the effect of demography may lead to highly biased results (Nielsen 2001; Williamson et al. 2005). We, therefore, adapted a recently proposed method for simultaneously inferring the demographic history of a population and the strength of a fixation bias for the analysis of both the African-American and Yoruban populations (Williamson et al. 2005). In its original form, the method pooled synonymous and noncoding SNPs (i.e., the putatively neutral or class 1 SNPs) to estimate the time back to a population size change event (tdem) that had magnitude ω = Na/Nc (the ratio of the ancestral to current population sizes). Then, the strength of selection acting on nonsynonymous SNPs (i.e., class 2) was inferred conditional on the nonstationary demographic model from synonymous and noncoding SNPs.
We consider 4 models of the SFS. The first model (MSNM) represents the standard neutral model (SNM) and has 0 free parameters. The second model (Mdem) represents a neutral demographic model and has 2 free parameters (tdem and ω). The third model (Mfix) assumes that the size of the population changed at some time tdem in the past with magnitude ω and that all mutations are selectively neutral except for some proportion of AT→GC mutations (p), which experience a common fixation bias denoted γ (a total of 4 free parameters). In this model, we consider the potential fixation bias favoring AT→GC mutations to be analogous to the population-scaled selective effect of a new mutation as in previous studies (Duret et al. 2002; Lercher et al. 2002; Webster and Smith 2004), which could be due to either natural selection or biased gene conversion.
Model Mfix is a modification of Williamson et al. (2005), which allows only a proportion (p) of nonlethal mutations to be subject to a fixation bias. Allowing only a proportion of AT→GC mutations to be subject to the fixation bias enables us to identify the effect even if it is restricted to small regions of the genome (e.g., regions of high GC-content [Duret et al. 2002; Lercher et al. 2002] or near recombination hot spots [Spencer 2006; Spencer et al. 2006]). To write down the likelihood function for the new model, we updated the distribution of allele frequencies for the AT→GC mutations (f2(·) in the notation of Williamson et al. [2005]). In our model, we assume that the fixation bias parameter of a nonlethal AT→GC mutation is either neutral (i.e., γ = 0) or nonneutral (i.e., γ ≠ 0), with probabilities 1 − p and p (respectively) and that all other types of mutations drift neutrally (i.e., in the absence of a fixation bias, γ = 0). This implies that the fixation bias of nonlethal AT→GC mutations comes from a mixture distribution, which can readily be incorporated into the new distribution of allele frequencies, f2(x | γ, p, tdem, ω). Namely,
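$$
f_2(x \mid \gamma, p, t_{dem}, \omega) = p\, f(x \mid \gamma, t_{dem}, \omega) + (1 - p)\, f(x \mid 0, t_{dem}, \omega), \tag{1}
$$

where f(x | γ, tdem, ω) derives from the numerical solution to the allele frequency distribution of a mutation with a fixation bias of strength γ in a nonstationary population found by Williamson et al. (2005). The probability of observing an AT→GC mutation at frequency i is P(i | γ, p, tdem, ω), which can be found by substituting our equation (1) into equation (9) of Williamson et al. (2005). Because mutations other than AT→GC drift neutrally, the probability of observing a non-AT→GC mutation at frequency i is P(i | 0, 0, tdem, ω), which is equivalent to equation (6) of Williamson et al. (2005). The likelihood function, L(γ, p, tdem, ω), is then written as

$$
L(\gamma, p, t_{dem}, \omega) = \prod_{i=1}^{n-1} P(i \mid \gamma, p, t_{dem}, \omega)^{K_{AT \to GC}(i)}\; P(i \mid 0, 0, t_{dem}, \omega)^{K_{other}(i)}, \tag{2}
$$

where K_{AT→GC}(i) is the number of SNPs from AT→GC at frequency i and Kother(i) is the number of SNPs at frequency i that either preserve or decrease GC-content. Note that we did not implement the probability of ancestral misidentification used in Williamson et al. (2005)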
The fourth model (Mmult) is a multinomial model, where the probability of observing a SNP of a given mutation class (i.e., AT→GC or other) at frequency i is given by the observed proportion of SNPs in that mutation class at frequency i. This is the most general model and has 2(n – 2) free parameters.
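The likelihood bookkeeping for the mixture model and the multinomial model can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the expected (unnormalized) spectra under the biased and neutral components are taken as given inputs rather than computed from the numerical solution of Williamson et al. (2005), both components are assumed to share the same mutation rate, multinomial coefficients are omitted because they cancel in likelihood ratios, and all function names and toy numbers are hypothetical.

```python
from math import log


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def mixture_probs(expected_biased, expected_neutral, p):
    """Mix the two unnormalized expected spectra with weight p on the
    biased component, then normalize to per-frequency probabilities."""
    mixed = [p * fb + (1 - p) * fn
             for fb, fn in zip(expected_biased, expected_neutral)]
    return normalize(mixed)


def multinomial_loglik(counts, probs):
    """Sum of K(i) * log P(i), skipping empty frequency classes."""
    return sum(k * log(q) for k, q in zip(counts, probs) if k > 0)


def loglik_fix(k_atgc, k_other, expected_biased, expected_neutral, p):
    """Log likelihood with a mixture for AT->GC SNPs and neutrality
    for every other SNP class."""
    probs_atgc = mixture_probs(expected_biased, expected_neutral, p)
    probs_other = normalize(expected_neutral)
    return (multinomial_loglik(k_atgc, probs_atgc)
            + multinomial_loglik(k_other, probs_other))


def loglik_mult(k_atgc, k_other):
    """Log likelihood when each class uses its own observed proportions."""
    return (multinomial_loglik(k_atgc, normalize(k_atgc))
            + multinomial_loglik(k_other, normalize(k_other)))


# Toy inputs for 12 chromosomes (made-up counts, placeholder spectra).
expected_neutral = [1.0 / i for i in range(1, 12)]
expected_biased = [w * (1.0 + 0.2 * i)
                   for i, w in enumerate(expected_neutral, start=1)]
k_atgc = [30, 14, 9, 7, 5, 4, 4, 3, 3, 3, 4]
k_other = [60, 28, 19, 14, 11, 9, 8, 7, 6, 5, 5]

l_fix = loglik_fix(k_atgc, k_other, expected_biased, expected_neutral, 0.5)
l_mult = loglik_mult(k_atgc, k_other)
print(l_fix, l_mult)
```

Because the multinomial model uses the observed proportions themselves, its log likelihood is an upper bound for any model of the same data, which is what makes it usable as the reference in a goodness of fit comparison.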
Our test for a fixation bias favoring GC-content involves 4 likelihood ratio tests (LRTs). The first test compares model MSNM with model Mdem. If the log likelihood of Mdem (denoted Ldem) is significantly larger than LSNM, we reject MSNM in favor of the neutral demographic model. Our second test compares the log likelihood of the neutral demographic model (Ldem) with the log likelihood of the demographic model with a fixation bias favoring AT→GC mutations (Lfix). If Lfix is significantly larger than Ldem, we reject the neutral demographic hypothesis in favor of the model with a fixation bias favoring AT→GC mutations. Finally, we perform a goodness of fit (GOF) test on both Mdem and Mfix by comparing Ldem and Lfix with Lmult (the log likelihood of model Mmult). Note that a P value larger than 0.05 for a goodness of fit test indicates that the model under consideration sufficiently explains the data.
For all LRTs, P values were estimated from 2,000 coalescent simulations of the LRT statistic. In order for our simulations to mimic the true data as much as possible, we accounted for the inferred demographic history of each population, linkage among sites, mutation rate variation, and the distribution of missing data that we observed. We first estimated the demographic parameters for each population independently (using model Mdem). To estimate the population scaled recombination rate (R), we applied a novel approach proposed by Zhu L, Feng F, Bustamante CD (in review). This method uses the variances and covariances of unphased SNPs at different frequencies to predict the local recombination rate by multiple linear regression and nonparametric bootstrap resampling. For the data in this paper, we first fit the regression model by simulating 1,000 replicate data sets under the inferred demographic model for each gene region in each population using the coalescent with R in the range {1, 2, 5, 10, 20, 50, 100, 200, 400, 1000}. Each replicate has the same number of sequences (n) and segregating sites (S) as in the observed data. We estimate the variances of the site frequencies for each replicate by nonparametric bootstrapping and use the mean of the variances over 1,000 replicates to fit the best linear regression on R. The R² values of the linear models are all above 90%. We then bootstrap to estimate the variances of the site frequencies for the observed gene region and use the linear relationship between the log of the variances and log R to predict the local recombination rate for each gene region. For gene regions with fewer than 10 SNPs, the recombination rate was assumed to be 0. Our estimate of the mutation rate for each gene region (independent for each population) was based on the observed number of segregating sites and the inferred demographic history.
After generating 2,000 coalescent simulations for each gene region in each population, we randomly assigned some SNPs to be of type AT→GC based on the proportion of SNPs that were observed to be of that type in our data. To account for the observed pattern of missing data, each simulated SNP was assigned to a new sample size according to the proportion of SNPs observed at each sample size. The frequency of the derived state in the reduced sample size follows a hypergeometric distribution. That is, if the frequency of the derived state of a SNP was i in the original sample size of n chromosomes, then the probability that j copies (j ≤ i) were observed in the reduced sample of size n' chromosomes was

$$
P(j \mid i, n, n') = \frac{\binom{i}{j}\binom{n - i}{n' - j}}{\binom{n}{n'}}.
$$
After generating the missing data for each of the coalescent simulations, we generated the expected SFS in a subsample of size 12 chromosomes using the same technique as in the observed data. Finally, we performed the LRTs described above on each coalescent simulation to approximate the distribution of the LRT statistic, from which we obtained our P values.
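The final step, turning simulated LRT statistics into a P value, can be sketched as below. The statistic is taken to be twice the difference in log likelihoods, and the P value to be the fraction of simulated statistics at least as large as the observed one; both conventions, the function names, and the toy numbers are assumptions for illustration, since the text does not spell out the exact estimator.

```python
def lrt_statistic(loglik_null, loglik_alt):
    """Likelihood ratio test statistic, 2 * (L_alt - L_null)."""
    return 2.0 * (loglik_alt - loglik_null)


def empirical_p_value(observed_stat, simulated_stats):
    """Fraction of simulated statistics at least as large as the observed."""
    exceed = sum(1 for s in simulated_stats if s >= observed_stat)
    return exceed / len(simulated_stats)


# Toy example with four simulated statistics (made-up values).
observed = lrt_statistic(-10.0, -8.5)
p_value = empirical_p_value(observed, [1.0, 2.0, 3.0, 4.0])
print(observed, p_value)
```

With 2,000 simulations the smallest nonzero P value such an estimator can report is 1/2,000, so very small P values are resolved only to that granularity.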
| Results and Discussion |
|---|
Under the assumption of parsimony, we identified the ancestral state of each SNP in our noncoding data set using the chimpanzee genome (see Data in Materials and Methods). We refer to this data set as the uncorrected data set. Shown in figure 1a are the normalized SFS for SNPs that decrease GC-content (i.e., GC→AT), increase GC-content (i.e., AT→GC), and preserve GC-content (i.e., other) for the uncorrected data set (pooled into bins of size 3). We found that although there is no statistical evidence that the frequency distribution of GC→AT mutations differs from GC-content preserving mutations (P value = 0.788, MWU; P value > 0.99, KS), the SFS for AT→GC mutations is significantly different from both GC→AT mutations (P value = 2.04 × 10⁻⁷, MWU; P value = 0.0004, KS) and GC-content preserving mutations (P value = 6.36 × 10⁻⁵, MWU; P value = 0.0100, KS).
However, because the uncorrected SFS were generated using orthologous chimpanzee sequences, ancestral misidentification of some SNPs could have occurred. We applied a correction for ancestral misidentification (Hernandez et al. [2007]
3.3%). These SNPs may actually have been GC-content decreasing SNPs, but due to ancestral misidentification, the orientation was swapped. After correcting for ancestral misidentification, we find no statistical evidence suggesting that the SFS for AT→GC SNPs differs from either GC→AT SNPs (P value = 0.0967, MWU; P value = 0.3260, KS) or GC-content preserving SNPs (P value = 0.5541, MWU; P value > 0.99, KS). This suggests that ancestral misidentification is an alternative explanation for the observed deviation of AT→GC polymorphisms from other types of polymorphisms.
Previous studies have fit population genetic models to observed SFS to assess the statistical evidence for positive selection (or biased gene conversion) acting on GC-content in the human genome using a chimpanzee outgroup (Duret et al. [2002]
We first tested whether this noncoding data set showed evidence for a nonstationary demographic history using an adapted version of the numerical technique developed by Williamson et al. (2005). This method assumes that the population experienced an instantaneous size change from the ancestral size of Na to the current size Nc (where ω = Na/Nc denotes the magnitude of the change) at a time tdem in the past. Table 2 shows the parameter estimates of the neutral demographic model (Mdem), and table 3 shows that the SNM can clearly be rejected both before and after correcting for ancestral misidentification in both populations.
We then tested for a fixation bias favoring AT→GC mutations by extending the model of Williamson et al. (2005) to allow a proportion p (0 ≤ p ≤ 1) of nonlethal AT→GC mutations to have a fixation bias with strength γ, whereas the remaining proportion of AT→GC mutations (as well as the other mutation classes) drift neutrally (see Materials and Methods). This model implicitly accounts for the possibility of a fixation bias that only acts in confined regions of the genome (e.g., in GC-rich regions or near recombination hot spots). Table 2 shows the parameter values obtained from each of the population genetic models we evaluated, and table 3 shows the P values obtained via simulation for each LRT. Before correcting for ancestral misidentification, we can clearly reject the neutral demographic model in both populations. However, in both populations, a goodness of fit test narrowly rejects model Mfix, suggesting that it cannot fully explain the data.
After correcting for ancestral misidentification, the evidence for a fixation bias favoring AT→GC mutations in the human genome nearly vanishes. In the African-American population, we cannot statistically reject the neutral demographic model in favor of a model allowing for a fixation bias (P = 0.643, table 3). Moreover, a goodness of fit test of the neutral demographic model in this population suggests that the neutral demographic model is sufficient to explain the data (P = 0.234, table 3).
The results for the Yoruban population are slightly more complicated. After correcting for ancestral misidentification, the neutral demographic model is narrowly rejected at the 0.05 significance level (P = 0.0192, table 3), suggesting that either there is slight evidence for an extremely weak fixation bias (γ = 0.5) acting on all AT→GC mutations or the simple 2-epoch demographic model is insufficient. Goodness of fit tests suggest that although the neutral demographic model can narrowly be rejected at the 0.05 significance level (P = 0.0383), there is marginal support for the model that allows for a fixation bias (P = 0.0963).
| Conclusion |
|---|
We found that after correcting for ancestral misidentification, much of the evidence for the fixation bias favoring G and C alleles disappeared. This is because 3.3% of SNPs identified to be of type AT→GC (most of which were at very high frequency) may actually have been of type GC→AT (and at very low frequency). Such an effect might be expected because the overall mutation rate from GC→AT tends to be roughly twice as large as the rate from AT→GC along primate lineages (Hwang and Green 2004). If an AT→GC substitution occurred along the human lineage, then the mutation rate at this site would, on average, double. This would, in turn, double the probability of subsequently sampling a polymorphism at the same site but with a misidentified frequency based on the simple parsimony assumption. We, therefore, conclude that much of the evidence for a recent fixation bias favoring GC-content in humans based on population genetic data may be a result of failing to account for multiple hits at rapidly evolving sites between humans and chimpanzees. However, a previous study developed a weighted parsimony method to account for ancestral misidentification of human SNPs using a chimpanzee outgroup without accounting for context effects (Webster and Smith 2004
GC SNPs at very high frequency remained after their correction. Given that the SFS we have observed in this data set is consistent with our neutral simulations of ancestral misidentification (Hernandez et al. 2007
We emphasize that neither eliminating hypermutable CpG sites from consideration in SNP studies nor restricting the analysis to those SNPs that have outgroup support from multiple species is sufficient to safeguard against this effect. Both of these techniques tend to require much of the data to be discarded (thereby leading to an ascertainment bias) without guaranteeing against further ancestral misidentification (Hernandez et al. 2007). Rather, we recommend employing a parametric model for the data that can account for uncertainty in the ancestral states of all SNPs as well as mutation rate heterogeneity because this approach, both theoretically and in simulations, appears to have proper type I (false positive) error rates (Hernandez et al. 2007).
| Acknowledgements |
|---|
We are grateful to D. G. Torgerson for helpful comments and assistance with obtaining orthologous chimpanzee sequences and 2 reviewers who provided very helpful comments. This work was funded by a National Science Foundation grant (0516310) to C.D.B.
| Footnotes |
|---|
John H. McDonald, Associate Editor
| References |
|---|
Akashi H. Inferring the fitness effects of DNA mutations from polymorphism and divergence data: statistical power to detect directional selection under stationary and free recombination. Genetics (1999) 151:221–238.
Arndt PF, Petrov DA, Hwa T. Distinct changes of genomic biases in nucleotide substitution at the time of mammalian radiation. Mol Biol Evol. (2003) 20:1887–1896.
Belle EMS, Duret L, Galtier N, Eyre-Walker A. The decline of isochores in mammals: an assessment of the GC content variation along the mammalian phylogeny. J Mol Evol. (2004) 58:653–660.
Bernardi G. Isochores and the evolutionary genomics of vertebrates. Gene (2000) 241:3–17.
Bernardi G, Hughes S, Mouchiroud D. The major compositional transitions in the vertebrate genome. J Mol Evol. (1997) 44:S44–S51.
Bustamante CD, Wakeley J, Sawyer S, Hartl DL. Directional selection and the site-frequency spectrum. Genetics (2001) 159:1779–1788.
Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature (2005) 437:69–87.
Duret L, Mouchiroud D, Gautier C. Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. J Mol Evol. (1995) 40:308–317.
Duret L, Semon M, Piganeau G, Mouchiroud D, Galtier N. Vanishing GC-rich isochores in mammalian genomes. Genetics (2002) 162:1837–1847.
Eyre-Walker A. Recombination and mammalian genome evolution. Proc Biol Sci. (1993) 252:237–243.
Eyre-Walker A. Evidence of selection on silent site base composition in mammals: potential implications for the evolution of isochores and junk DNA. Genetics (1999) 152:675–683.
Fryxell KJ, Zuckerkandl E. Cytosine deamination plays a primary role in the evolution of mammalian isochores. Mol Biol Evol. (2000) 17:1371–1383.
Fullerton SM, Bernardo Carvalho A, Clark AG. Local rates of recombination are positively correlated with GC content in the human genome. Mol Biol Evol. (2001) 18:1139–1142.
Galtier N, Gouy M. Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis. Mol Biol Evol. (1998) 15:871–879.
Hernandez RD, Williamson SH, Bustamante CD. Context dependence, ancestral misidentification, and spurious signatures of natural selection. Mol Biol Evol. (2007) 24:1792–1800.
Hwang DG, Green P. Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution. Proc Natl Acad Sci USA (2004) 101:13994–14001.
Jabbari K, Bernardi G. CpG doublets, CpG islands and Alu repeats in long human DNA sequences from different isochore families. Gene (1998) 224:123–127.
Kent WJ. BLAT—the BLAST-like alignment tool. Genome Res. (2002) 12:656–664.
Lander ES, Linton LM, Birren B, et al. Initial sequencing and analysis of the human genome. Nature (2001) 409:860–921.
Lercher MJ, Smith NGC, Eyre-Walker A, Hurst LD. The evolution of isochores: evidence from SNP frequency distributions. Genetics (2002) 162:1805–1810.
Livingston RJ, von Niederhausern A, Jegga AG, Crawford DC, Carlson CS, Rieder MJ, Gowrisankar S, Aronow BJ, Weiss RB, Nickerson DA. Pattern of sequence variation across 213 environmental response genes. Genome Res. (2004) 14:1821–1831.
Macaya G, Thiery JP, Bernardi G. An approach to the organization of eukaryotic genomes at a macromolecular level. J Mol Biol. (1976) 108:237–254.
Marth GT, Czabarka E, Murvai J, Sherry ST. The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations. Genetics (2004) 166:351–372.
Meunier J, Duret L. Recombination drives the evolution of GC-content in the human genome. Mol Biol Evol. (2004) 21:984–990.
Mouchiroud D, D'Onofrio G, Assani B, Macaya G, Gautier C, Bernardi G. The distribution of genes in the human genome. Gene (1991) 100:181–187.
Nielsen R. Statistical tests of selective neutrality in the age of genomics. Heredity (2001) 86:641–647.
Nielsen R, Hubisz MJ, Clark AG. Reconstituting the frequency spectrum of ascertained single-nucleotide polymorphism data. Genetics (2004) 168:2373–2382.
Nielsen R, Williamson S, Kim Y, Hubisz MJ, Clark AG, Bustamante C. Genomic scans for selective sweeps using SNP data. Genome Res. (2005) 15:1566–1575.
Slatkin M, Hudson RR. Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics (1991) 129:555–562.
Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. (1999) 9:657–663.
Spencer CCA. Human polymorphism around recombination hotspots. Biochem Soc Trans. (2006) 34:535–536.
Spencer CCA, Deloukas P, Hunt S, Mullikin J, Myers S, Silverman B, Donnelly P, Bentley D, McVean G. The influence of recombination on human genetic diversity. PLoS Genet. (2006) 2:e148.
Thiery JP, Macaya G, Bernardi G. An analysis of eukaryotic genomes by density gradient centrifugation. J Mol Biol. (1976) 108:219–235.
Webster MT, Smith NGC. Fixation biases affecting human SNPs. Trends Genet. (2004) 20:122–126.
Webster MT, Smith NGC, Ellegren H. Compositional evolution of noncoding DNA in the human and chimpanzee genomes. Mol Biol Evol. (2003) 20:278–286.
Williamson SH, Hernandez R, Fledel-Alon A, Zhu L, Nielsen R, Bustamante CD. Simultaneous inference of selection and population growth from patterns of variation in the human genome. Proc Natl Acad Sci USA (2005) 102:7882–7887.