MBE Advance Access originally published online on October 20, 2006
Molecular Biology and Evolution 2007 24(1):23-25; doi:10.1093/molbev/msl156

© The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Letters

Methylation-Dependent Transition Rates Are Dependent on Local Sequence Lengths and Genomic Regions

Zhongming Zhao and Cizhong Jiang

Virginia Institute for Psychiatric and Behavioral Genetics and Center for the Study of Biological Complexity, Virginia Commonwealth University

E-mail: zzhao@vcu.edu.


    Abstract
Recently, Fryxell and Moon (2005) examined methylation-dependent transition rates (5mC deamination rates), calculated as the difference between the CpG and GpC transition rates, using 4,437 transition mutations in CpG or GpC dinucleotides. They concluded that 5mC deamination rates were highly dependent on local GC content but not on the length of the local sequence over which GC content was calculated or on the genomic region in which the mutations occurred. Here, we reexamined these conclusions using 292,216 CpG->TpG/CpA and GpC->GpT/ApC mutations, about 66 times as many as in their data set. Contrary to Fryxell and Moon's conclusions, our analysis indicated that 5mC deamination rates in the human genome were dependent on both the local sequence length and the genomic region. We also offer some explanations for the discrepancy.

Key Words: CpG • GpC • mutation rate • single nucleotide polymorphisms • GC content • genomic regions

CpG dinucleotides are subject to global methylation in mammalian genomes. The rate of transition from methylated CpG (5mCpG) to TpG is 10- to 50-fold higher than that of other transitions (Sved and Bird 1990; Fryxell and Moon 2005). Because cytosines in GpC dinucleotides are not methylated in mammalian genomes (Razin and Riggs 1980), the difference between the CpG transition rate (the number of CpG->TpG/CpA transitions per CpG dinucleotide in a sequence) and the GpC transition rate (the number of GpC->GpT/ApC transitions per GpC dinucleotide) represents the methylation-dependent transition rate, or 5mC deamination rate. Applying this approach to human single nucleotide polymorphism (SNP) data, Fryxell and Moon (2005) showed that the 5mC deamination rate was exponentially dependent on local GC content. Importantly, in their plots of log10(5mC deamination rate) versus the GC content of the sequences flanking the SNPs, the linear regression slopes were close to –3.0, the ideal slope predicted from DNA melting as a function of base composition (Fryxell and Zuckerkandl 2000). They further concluded that this slope of –3.0 depended neither on the length of the DNA sequence over which GC content was measured nor on the genomic region (e.g., exons, introns, and CpG islands with differential methylation) in which the SNPs were located (Fryxell and Moon 2005, see their Conclusions).
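Written out, the rate definitions and the regression model described above take the following form. The symbols are our own shorthand rather than notation from either study, and we assume that GC content is expressed as a fraction between 0 and 1.

```latex
r_{\mathrm{CpG}} = \frac{N_{\mathrm{CpG \to TpG/CpA}}}{N_{\mathrm{CpG}}}, \qquad
r_{\mathrm{GpC}} = \frac{N_{\mathrm{GpC \to GpT/ApC}}}{N_{\mathrm{GpC}}}, \qquad
r_{\mathrm{5mC}} = r_{\mathrm{CpG}} - r_{\mathrm{GpC}}

\log_{10} r_{\mathrm{5mC}} = a + b \cdot \mathrm{GC}, \qquad b \approx -3.0

\frac{r_{\mathrm{5mC}}(\mathrm{GC} + 0.1)}{r_{\mathrm{5mC}}(\mathrm{GC})} = 10^{0.1\,b} = 10^{-0.3} \approx 0.50
```

Under this assumption, a slope of –3.0 implies that each 10-percentage-point increase in local GC content roughly halves the 5mC deamination rate.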

Our recent studies indicated that the sequence context patterns observed in the local environment of SNPs depend on many factors, such as genomic region, sequence length, and SNP type (Zhao and Zhang 2006a, 2006b). Fryxell and Moon's study used only 4,437 SNPs: 2,568 in intergenic regions, 1,222 in introns, 187 in exons, 260 in 5′ untranslated regions (UTRs), 52 in 3′ UTRs, 145 in CpG islands, and 3 others. Their general conclusions were based only on the full set of 4,437 SNPs and on the 2,568 intergenic SNPs, not on separate analyses of each genomic region. Further investigation is therefore warranted. Here, we reexamined this issue using a much larger data set extracted from the 8,353,499 SNPs recently released in the dbSNP database (ftp://ftp.ncbi.nih.gov/snp/, Build 124). To ensure high data quality, we selected only SNPs that were biallelic, not insertions/deletions, validated, and uniquely mapped to nonrepetitive human sequence, and whose ancestral alleles could be reliably inferred by comparison with homologous sequences in the chimpanzee genome (see Methods). This yielded 292,216 CpG->TpG/CpA and GpC->GpT/ApC SNPs, about 66 times as many as in Fryxell and Moon's sample. These SNPs were further categorized by genomic region.

We first examined whether the 5mC deamination rates depended on the length of the local sequence. Using the same method as Fryxell and Moon (2005), we calculated the rates of CpG->TpG/CpA per CpG dinucleotide and GpC->GpT/ApC per GpC dinucleotide in SNP sequences of 101, 201, 401, 601, and 1,001 nt (fig. 1A). In each length category, each SNP sequence consisted of the polymorphic site (1 nt) flanked by equal lengths of 5′ and 3′ sequence (e.g., 50 nt on each side in the 101-nt category). We then calculated the difference between the CpG and GpC transition rates (fig. 1B) and plotted log10(5mC deamination rate) versus GC content for each length category (fig. 1C). Linear regression analysis confirmed previous findings that 5mC deamination rates are inversely correlated with GC content (Bernardi et al. 1985; Bernardi 1995). However, our slope values varied with SNP sequence length: they became less negative as the sequence length decreased, rising from –3.1 at 1,001 nt to –1.2 at 101 nt (table 1). Notably, for the 601-nt sequences the slope (–2.8) matched the value observed by Fryxell and Moon, whose SNP sequences had a modal length of 564 nt.
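A minimal sketch of this length-category analysis follows. It assumes that each record holds an ancestral-state sequence, the 0-based index of the SNP within it, and a label ("CpG" or "GpC"); that dinucleotide counts are pooled over all SNP windows falling in the same GC bin; and that the fit is ordinary least squares on equal-width GC bins. The record layout, function names, and binning scheme are illustrative assumptions, not details taken from either study.

```python
import math
from collections import defaultdict


def count_dinucleotide(seq, dinuc):
    """Count occurrences of a dinucleotide (e.g. "CG") in seq."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)


def gc_content(seq):
    """Fraction of G or C bases in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def log_rate_by_gc_bin(records, half_width, n_bins=20):
    """Return (GC bin midpoint, log10 5mC deamination rate) points.

    records: iterable of (ancestral_seq, snp_index, kind), where kind is
    "CpG" for a CpG->TpG/CpA SNP and "GpC" for a GpC->GpT/ApC SNP.
    Each window spans 2 * half_width + 1 nt centred on the SNP.
    """
    mutations = {"CpG": defaultdict(int), "GpC": defaultdict(int)}
    dinucs = {"CpG": defaultdict(int), "GpC": defaultdict(int)}
    for seq, i, kind in records:
        lo, hi = i - half_width, i + half_width + 1
        if lo < 0 or hi > len(seq):
            continue  # flanks too short for this length category
        win = seq[lo:hi]
        b = min(int(gc_content(win) * n_bins), n_bins - 1)
        mutations[kind][b] += 1
        dinucs["CpG"][b] += count_dinucleotide(win, "CG")
        dinucs["GpC"][b] += count_dinucleotide(win, "GC")
    points = []
    for b in sorted(dinucs["CpG"]):
        n_cpg, n_gpc = dinucs["CpG"][b], dinucs["GpC"][b]
        if n_cpg == 0 or n_gpc == 0:
            continue  # a rate is undefined without any CpG or GpC sites
        rate = mutations["CpG"][b] / n_cpg - mutations["GpC"][b] / n_gpc
        if rate > 0:  # log10 needs a positive CpG-minus-GpC difference
            points.append(((b + 0.5) / n_bins, math.log10(rate)))
    return points
```

Calling `log_rate_by_gc_bin` with `half_width` set to 50, 100, 200, 300, and 500 gives the 101-, 201-, 401-, 601-, and 1,001-nt categories; passing the x and y values of the resulting points to `ols_slope` yields one regression slope per category. Bins in which the GpC rate equals or exceeds the CpG rate are dropped, because their logarithm is undefined.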


Figure 1
FIG. 1. 5mC deamination rates are dependent on local GC content and SNP sequence length. (A) Rates of CpG->TpG/CpA per CpG dinucleotide (solid line) and GpC->GpT/ApC per GpC dinucleotide (dashed line) in the SNP sequences over which GC content was calculated. (B) The 5mC deamination rate, calculated as the difference between the CpG and GpC transition rates in (A). (C) Plot of log10(5mC deamination rate) versus SNP sequence GC content. The slopes of the linear regression lines were –1.2 (101 nt), –2.0 (201 nt), –2.5 (401 nt), –2.8 (601 nt), and –3.1 (1,001 nt).

 

Table 1 Summary of Linear Regression among Genomic Regions with Different Sequence Lengths

We next examined whether the 5mC deamination rates depended on genomic region by repeating the above analysis separately for the SNPs in each region. The CpG->TpG/CpA rates in intergenic and intronic regions were nearly the same, and both were markedly higher than those in the other genomic regions. In contrast, the GpC->GpT/ApC rates were nearly the same in all genomic regions (supplementary fig. S1, Supplementary Material online). The slopes for intergenic regions and introns were similar at every sequence length (table 1), reflecting that these 2 noncoding regions are nearly selectively neutral and have similar GC content. Moreover, the slopes for the genome as a whole were similar to those for intergenic and intronic regions, largely because intergenic and intronic SNPs accounted for 85% of all SNPs. However, the slopes in CpG islands were much more negative (i.e., larger in absolute value) than those in intergenic or intronic regions in all length categories. This reflects the lack of methylation in CpG islands and the resulting suppression of 5mC deamination (Zhao and Zhang 2006a). The slope in exons was more negative than those in intergenic or intronic regions for the 101-nt category but similar for the remaining length categories (table 1). The lack of difference in the 201-, 401-, 601-, and 1,001-nt categories was likely due to the short length of exons (on average 145 bp in the human genome; Lander et al. 2001). Although the SNPs themselves were always located in exons, as the SNP sequence length increased, more of the sequence in which CpGs and GpCs were counted came from introns or UTRs. Therefore, for exons, the slope for the 101-nt category appears to be the only reliable estimate. Finally, the slopes in 5′ UTRs, which ranged from –1.7 (101 nt) to –3.7 (1,001 nt), were more negative than those in intergenic or intronic regions.
Overall, the slope values varied among genomic regions.
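The exon argument can be made concrete with a simple idealized calculation. The sketch below assumes a single isolated exon of exactly 145 bp (the average cited above), a SNP position drawn uniformly within it, and no neighboring exons, and reports the average fraction of a centred window that is exonic. These simplifications and the function name are ours, not part of the original analysis.

```python
def mean_exonic_fraction(exon_len, half_width):
    """Average fraction of a (2 * half_width + 1)-nt window, centred on a SNP
    placed uniformly at random within an isolated exon, that lies in the exon."""
    window_len = 2 * half_width + 1
    exonic_bases = 0
    for pos in range(exon_len):  # 0-based SNP position within the exon
        left = min(pos, half_width)                  # exonic bases 5' of the SNP
        right = min(exon_len - 1 - pos, half_width)  # exonic bases 3' of the SNP
        exonic_bases += left + 1 + right             # +1 for the SNP site itself
    return exonic_bases / (exon_len * window_len)


for half_width in (50, 100, 200, 300, 500):
    frac = mean_exonic_fraction(145, half_width)
    print(f"{2 * half_width + 1:>5} nt window: {frac:.2f} exonic")
```

Under these assumptions, about 83% of a 101-nt window is exonic, whereas only about 14% of a 1,001-nt window is, because every 1,001-nt window then contains the entire exon and the rest comes from flanking intron or UTR sequence.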

We now offer some explanations for why several conclusions of Fryxell and Moon (2005) differ from our observations. First, they stated that the slope of –3.0 did not depend on the length of sequence over which GC content was calculated, because the results of their 2 analyses, one based on SNP sequences (564 nt) and the other based on chimpanzee genomic contigs (163 kb), were essentially the same. However, large nucleotide composition biases have been found at the few sites immediately adjacent to SNPs, whereas smaller biases were detected within 200 nt on each flanking side (Zhao and Boerwinkle 2002). Because both of their lengths exceeded 400 nt (i.e., 200 nt on each side of the SNP), their 2 analyses could not reveal the difference in slopes. Second, after finding slopes close to –3.0 for SNPs in the genome as a whole and in intergenic regions, they implied that the slopes would remain unchanged regardless of whether exons, introns, or CpG islands were included or excluded. This generalization was not based on direct analysis of the SNPs in each specific region; rather, it reflected the fact that intergenic and intronic SNPs accounted for 85% of all SNPs in their study.

In summary, contrary to Fryxell and Moon's conclusions, our analysis indicated that the methylation-dependent transition rates (5mC deamination rates) in the human genome, measured as the difference between the CpG and GpC transition rates, were dependent on both local sequence length and genomic region.


    Methods
We used the SNP data set prepared in Jiang and Zhao (2006), whose Materials and Methods describe in detail the processing of SNP data, the inference of ancestral alleles, and the identification of SNPs in each genomic region. Here, we describe the procedures briefly. From the 8,353,499 human reference SNPs retrieved from the dbSNP database (ftp://ftp.ncbi.nih.gov/snp/, Build 124), we extracted 2,632,415 SNPs that were not insertions/deletions, were biallelic, were uniquely mapped in nonrepetitive sequences, were validated, and had at least 100 nt of flanking sequence on each side. For each of these SNPs, we obtained 500 nt of flanking sequence on each side using the SNP flanking sequences and the mapped contig sequences (Zhao and Zhang 2006a).
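The SNP selection criteria above can be expressed as a single predicate. The record type and its field names below are hypothetical stand-ins, not the dbSNP schema, and insertion/deletion alleles are assumed to be written as "-" or as multi-base strings.

```python
from dataclasses import dataclass


@dataclass
class SnpRecord:
    """Hypothetical SNP record; field names are illustrative, not dbSNP's."""
    rs_id: str
    alleles: tuple        # e.g. ("C", "T"); "-" would denote a deletion
    validated: bool
    n_mapped_loci: int    # number of genomic locations the SNP maps to
    in_repeat: bool       # True if the SNP lies in repeat-masked sequence
    left_flank: str
    right_flank: str


def passes_filters(snp, min_flank=100):
    """Apply the selection criteria described in the Methods."""
    single_base = all(len(a) == 1 and a in "ACGT" for a in snp.alleles)
    biallelic = len(set(snp.alleles)) == 2
    return (
        single_base                      # excludes insertions/deletions
        and biallelic
        and snp.validated
        and snp.n_mapped_loci == 1       # uniquely mapped
        and not snp.in_repeat            # nonrepetitive sequence only
        and len(snp.left_flank) >= min_flank
        and len(snp.right_flank) >= min_flank
    )
```

Keeping every criterion in one function makes it easy to report how many SNPs each filter removes, should that be of interest.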

The ancestral alleles of these SNPs were inferred by mapping the human SNPs to the chimpanzee genome with the MegaBlast program (Zhang et al. 2000). Under stringent criteria (Jiang and Zhao 2006), the ancestral alleles of 1,785,712 SNPs were inferred. Among these, 292,216 were transitions that occurred in ancestral CpG or GpC dinucleotides.
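Once the ancestral allele is known, each transition can be assigned to the CpG or GpC class from its ancestral neighbors. The sketch below assumes that the 5′ and 3′ neighboring bases are given on the same strand as the alleles and are themselves ancestral; SNPs reported on the opposite strand would need to be reverse-complemented first. A C in a GCG context matches both classes. The letter does not say how such sites were handled, so this hypothetical function simply reports every matching class.

```python
def cpg_gpc_contexts(left, ancestral, derived, right):
    """Classify a transition by its ancestral dinucleotide context.

    left/right: ancestral 5' and 3' neighbouring bases on the allele strand.
    Returns a list containing "CpG" and/or "GpC" (empty if neither applies).
    """
    left, anc, der, right = (b.upper() for b in (left, ancestral, derived, right))
    contexts = []
    if anc == "C" and der == "T":
        if right == "G":
            contexts.append("CpG")  # CpG -> TpG
        if left == "G":
            contexts.append("GpC")  # GpC -> GpT
    elif anc == "G" and der == "A":
        if left == "C":
            contexts.append("CpG")  # CpG -> CpA
        if right == "C":
            contexts.append("GpC")  # GpC -> ApC
    return contexts
```

Transversions and transitions outside these two contexts return an empty list and are simply excluded from the counts.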

To identify the transition mutations in intergenic, intronic, and exonic regions, we compared the SNP locations with the coordinates of each well-categorized (i.e., known) intergenic, intronic, and exonic region, based on human gene annotation from the ENSEMBL database (ftp://ftp.ensembl.org/pub/, version 32.35e) (Jiang and Zhao 2006). Because UTR annotations were not readily available in the ENSEMBL database, we used the nonredundant human UTR data from UTResource (http://www.ba.itb.cnr.it/UTR/, Release 21). Mutations in 5′ and 3′ UTRs were identified by comparing the SNP locations with the coordinates of 5′ and 3′ UTRs on the human chromosomes. SNPs in CpG islands were identified following the procedure in Jiang and Zhao (2006). We identified 176,518 transition mutations in intergenic regions, 71,456 in introns, 4,168 in exons, 8,027 in 5′ UTRs, 28,714 in 3′ UTRs, and 6,642 in CpG islands.
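Assigning a SNP to a genomic region amounts to an interval lookup. A minimal sketch, assuming the annotation coordinates have already been parsed into (start, end) pairs for a single chromosome (1-based, inclusive); the class and function names are hypothetical. Overlapping intervals, such as exons shared by alternative transcripts, are merged before searching so that a binary search suffices.

```python
import bisect


def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals (1-based, inclusive)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


class RegionIndex:
    """Sorted, merged intervals for one region type on one chromosome."""

    def __init__(self, intervals):
        self.intervals = merge_intervals(intervals)
        self.starts = [start for start, _ in self.intervals]

    def contains(self, pos):
        # Find the rightmost interval starting at or before pos, then check its end.
        i = bisect.bisect_right(self.starts, pos) - 1
        return i >= 0 and pos <= self.intervals[i][1]


def regions_for(pos, indexes):
    """Names of all region types (e.g. "exon", "5'UTR") that contain pos."""
    return [name for name, index in indexes.items() if index.contains(pos)]
```

Because UTR and CpG-island coordinates came from sources other than the gene annotation, a SNP may fall in more than one region type; returning a list leaves that decision to the caller.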


    Supplementary Material
A supplementary figure S1 is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Acknowledgements
We are grateful to Wen-Hsiung Li for comments on the manuscript. This study was supported by the Thomas F. and Kate Miller Jeffress Memorial Trust Fund.


    Footnotes
 
Naruya Saitou, Associate Editor


    References

    Bernardi G. (1995) The human genome: organization and evolutionary history. Annu Rev Genet 29:445–476.

    Bernardi G, Olofsson B, Filipski J, Zerial M, Salinas J, Cuny G, Meunier-Rotival M, Rodier F. (1985) The mosaic genome of warm-blooded vertebrates. Science 228:953–958.

    Fryxell KJ and Moon WJ. (2005) CpG mutation rates in the human genome are highly dependent on local GC content. Mol Biol Evol 22:650–658.

    Fryxell KJ and Zuckerkandl E. (2000) Cytosine deamination plays a primary role in the evolution of mammalian isochores. Mol Biol Evol 17:1371–1383.

    Jiang C and Zhao Z. (2006) Mutational spectrum in the recent human genome inferred by single nucleotide polymorphisms. Genomics 88:527–534.

    Lander ES, Linton LM, Birren B, et al. (255 co-authors). (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921.

    Razin A and Riggs AD. (1980) DNA methylation and gene function. Science 210:604–610.

    Sved J and Bird A. (1990) The expected equilibrium of the CpG dinucleotide in vertebrate genomes under a mutation model. Proc Natl Acad Sci USA 87:4692–4696.

    Zhang Z, Schwartz S, Wagner L, Miller W. (2000) A greedy algorithm for aligning DNA sequences. J Comput Biol 7:203–214.

    Zhao Z and Boerwinkle E. (2002) Neighboring-nucleotide effects on single nucleotide polymorphisms: a study of 2.6 million polymorphisms across the human genome. Genome Res 12:1679–1686.

    Zhao Z and Zhang F. (2006a) Sequence context analysis of 8.2 million single nucleotide polymorphisms in the human genome. Gene 366:316–324.

    Zhao Z and Zhang F. (2006b) Sequence context analysis in the mouse genome: single nucleotide polymorphisms and CpG island sequences. Genomics 87:68–74.

Accepted for publication October 17, 2006.

