MBE Advance Access originally published online on May 2, 2007
Molecular Biology and Evolution 2007 24(8):1622-1626; doi:10.1093/molbev/msm080
Research Articles
Inference of Expression-Dependent Negative Selection Based on Polymorphism and Divergence in the Human Genome
Department of Biomedical Resources, National Institute of Biomedical Innovation, Osaka, Japan
E-mail: nosada{at}nibio.go.jp.
| Abstract |
|---|
There is mounting evidence for a correlation between gene expression pattern and sequence divergence. However, little is known about the relationship between gene expression pattern and polymorphism. We compiled gene expression, polymorphism, and divergence data from the public databases of the human genome. The ratios of nonsynonymous (A) to synonymous (S) substitutions in polymorphism and divergence in the human genome were strongly influenced by the expression pattern and breadth of genes and were strongly correlated with each other. Among the tissues we analyzed, the brain-expressed genes have the smallest and the liver-expressed genes have the largest proportion of amino acid changes in both polymorphism and divergence. The analysis implies that negative selection is the primary factor affecting expression-dependent gene evolution and that slightly deleterious mutations are prevalent but nonuniformly distributed in the genome. Although the genes under relaxed negative selection evolved faster than the other genes, these genes are even more liable to carry slightly deleterious mutations in the population. On the other hand, nonneutral mutations in highly conserved genes, such as brain-expressed and housekeeping genes, are largely deleterious and are eliminated before they enter the population.
Key Words: gene expression; human genome; slightly deleterious mutations; natural selection; McDonald–Kreitman test
| Introduction |
|---|
Since the advent of genome sequencing of humans and closely related species, many studies using information based on polymorphisms have demonstrated that natural selection has played an important role in shaping our genome (Clark et al. 2003; Bustamante et al. 2005). Combined with divergence data, polymorphism data provide a more efficient method for measuring the intensity of selection on genes (McDonald and Kreitman 1991). Previous studies have attempted to estimate the distribution of the intensity of selection across the genome in many organisms, such as Arabidopsis, Drosophila, and humans (Bustamante et al. 2002; Fay et al. 2002; Smith and Eyre-Walker 2002; Piganeau and Eyre-Walker 2003; Lu and Wu 2005; Zhang and Li 2005). A few caveats should be considered when testing for selection on the genome using polymorphism and divergence data (e.g., the McDonald–Kreitman test). First, there is an assortment bias in current single-nucleotide polymorphism (SNP) data that were obtained without complete resequencing. Second, the test assumes that the selective constraint on a gene and the effective population size remain constant over time. Distortions arising from these factors are difficult to correct without contrasting different classes of genes within the genome.
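The comparison that underlies a McDonald–Kreitman style analysis can be sketched as a 2x2 table of counts. The following is a minimal illustration, not the procedure used in this study: the class name `MKTable`, its field names, and the counts are hypothetical, and the neutrality index (the A/S ratio in polymorphism divided by the A/S ratio in divergence) is a commonly used summary that is introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MKTable:
    """Counts for a McDonald-Kreitman style 2x2 table (hypothetical container)."""

    poly_a: int  # nonsynonymous (A) polymorphisms
    poly_s: int  # synonymous (S) polymorphisms
    div_a: int   # nonsynonymous fixed differences between species
    div_s: int   # synonymous fixed differences between species

    def as_polymorphism(self) -> float:
        """A/S ratio among polymorphic sites."""
        return self.poly_a / self.poly_s

    def as_divergence(self) -> float:
        """A/S ratio among fixed differences."""
        return self.div_a / self.div_s

    def neutrality_index(self) -> float:
        """Ratio of the two A/S ratios; 1 is the strictly neutral expectation."""
        return self.as_polymorphism() / self.as_divergence()


# Made-up counts, for illustration only.
table = MKTable(poly_a=30, poly_s=60, div_a=20, div_s=80)
print(table.as_polymorphism(), table.as_divergence(), table.neutrality_index())
```

A value above 1 indicates an excess of amino acid polymorphism relative to divergence, which is the pattern expected when slightly deleterious mutations segregate in the population; a value below 1 indicates an excess of amino acid divergence, as expected under frequent adaptive fixation.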
In general, a higher A/S ratio in divergence is explained by either the relaxation of functional constraint or positive selection on genes. However, distinguishing positive selection from relaxed constraint is often difficult because a single amino acid change can make the protein function beneficial without any other modifications in the gene. In particular, whether the fast evolution of reproductive proteins is due to positive selection is a matter of debate (Rooney and Zhang 1999; Osada et al. 2005). The generality of the prevalence of positive selection can be tested by investigating the amount of polymorphism relative to the amount of divergence in tissue-specific genes.
Here we compiled gene expression, polymorphism, and divergence data for the human genome and show that the pattern of polymorphism and divergence is strongly affected by the pattern of gene expression. The distribution of the intensity of selection on the human genome, which has a large impact on further studies in medical and evolutionary genetics, is also discussed.
| Materials and Methods |
|---|
Gene Expression
We compiled gene expression data for 18 human tissues from the Genomics Institute of the Novartis Research Foundation (GNF) Gene Expression Database (Su et al. 2004).
Sequence Polymorphism and Divergence
Polymorphism data on the human autosomes were downloaded from the HapMap database (version July 2006; The International HapMap Consortium 2003). SNPs of low quality in any of the 4 sampled populations (Yoruba, Han Chinese, Japanese, and Europeans) were filtered out. The annotation of SNPs was obtained from dbSNP v126 (http://www.ncbi.nlm.nih.gov/SNP/). We used all polymorphic sites in the HapMap sample for the following analysis. Excluding rare polymorphisms (<1% derived allele frequency) reduced the number of segregating sites to 85% of the total (from 7415 to 6297 sites) and lowered the A/S ratio in polymorphism, but this change in the definition of SNPs did not qualitatively alter our conclusions. The ancestry of polymorphisms was inferred using the human–chimpanzee genome alignment downloaded from the University of California, Santa Cruz genome browser (http://genome.ucsc.edu/). To estimate the divergence of genes between humans and chimpanzees, the nonredundant human RefSeq sequences on autosomes were aligned with the predicted chimpanzee cDNA sequences from Ensembl (PanTro2; http://www.ensembl.org/) using ClustalW (Thompson et al. 1994). Pairs showing synonymous divergence of more than 10% by the Li–Pamilo–Bianchi method (Li 1993; Pamilo and Bianchi 1993) were removed from the analysis to exclude paralogous comparisons. In total, we obtained 5159 genes that were expressed in at least one of the 18 analyzed tissues and had chimpanzee orthologs. We assume that the time of divergence is much greater than the age of polymorphisms and that we can therefore ignore any contribution that polymorphism makes to the apparent divergence. For each expression class and expression breadth, we resampled genes with replacement 100 times, keeping the sample size equal to that of the original data set, and estimated the 95% confidence intervals.
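The resampling step described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say whether the A/S ratio of a class was computed from pooled counts or averaged over genes, so pooled counts are assumed here; the percentile method for the interval is likewise an assumption; and the function names and toy counts are hypothetical.

```python
import random


def pooled_ratio(genes):
    """A/S ratio from counts pooled over genes; each gene is an (A, S) pair."""
    total_a = sum(a for a, _ in genes)
    total_s = sum(s for _, s in genes)
    if total_s == 0:
        raise ValueError("no synonymous changes in this sample")
    return total_a / total_s


def bootstrap_ci(genes, n_boot=100, alpha=0.05, seed=0):
    """Percentile confidence interval for the pooled A/S ratio.

    Genes are resampled with replacement, keeping the original sample size,
    n_boot times (100 in the text). The seed is fixed for reproducibility.
    """
    rng = random.Random(seed)
    stats = sorted(
        pooled_ratio(rng.choices(genes, k=len(genes))) for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up (A, S) counts per gene for one expression class.
toy_class = [(1, 3), (0, 2), (2, 4), (1, 1), (0, 5), (3, 6)]
print(pooled_ratio(toy_class), bootstrap_ci(toy_class))
```

With only 100 replicates the interval bounds are coarse; a larger `n_boot` would give smoother percentiles at little extra cost.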
| Results and Discussion |
|---|
We collected gene expression data for 18 human tissues from the GNF Gene Expression Database (Su et al. 2004).
As shown in figure 1, the A/S ratios in polymorphism and divergence for the expression classes show a significant, high correlation (R² = 0.673, P < 10⁻⁵). This correlation implies the prevalence of negative selection in both polymorphism and divergence in the human genome. For example, the genes expressed in the liver have the highest A/S ratio between human and chimpanzee. One may suspect that the high A/S ratio in the liver is due to positive selection on some set of genes that elevated the nonsynonymous substitution rate. However, in figure 1 and table 1, the liver-expressed genes also show the highest A/S ratio in polymorphism. According to population genetics theory, advantageous mutations contribute mainly to sequence divergence but much less to polymorphism because they are quickly fixed in populations (McDonald and Kreitman 1991).
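The correlation across expression classes can be illustrated with a short calculation of the squared Pearson correlation coefficient. This is a sketch only: the text does not specify how R² was computed, the function name `r_squared` is hypothetical, and the per-class values below are made up rather than taken from table 1.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (sxy * sxy) / (sxx * syy)


# Made-up A/S ratios for a handful of expression classes.
as_polymorphism = [0.55, 0.62, 0.70, 0.81, 0.90]
as_divergence = [0.30, 0.36, 0.38, 0.47, 0.52]
print(r_squared(as_polymorphism, as_divergence))
```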
Figure 1 reveals an additional interesting finding. The neutral theory of molecular evolution predicts that the A/S ratio should be the same for both polymorphism and divergence (Kimura 1983).
The A/S ratios in polymorphism and divergence also showed a strong correlation with the expression breadth of genes, that is, the number of tissues in which a gene is expressed. Figure 2 shows that the A/S ratios in both polymorphism and divergence decrease as the expression breadth increases, supporting the view that negative selection is the primary reason for the elevated A/S ratios in divergence for the tissue-specific genes. Even though the tissue-specific genes have evolved more rapidly than the widely expressed genes, the tissue-specific genes are heterogeneous and consist of both slowly and rapidly evolving genes. Therefore, it is not surprising that the increase in the excess of the A/S ratio in polymorphism with evolutionary rate, which we observed in figure 1, was not equally recapitulated in figure 2.
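Grouping genes by expression breadth can be sketched as below. The sketch assumes that each gene already comes with the set of tissues in which it is called expressed (the calling rule is not described in this section), and it pools A and S counts within each breadth class; the function name and the toy records are hypothetical.

```python
from collections import defaultdict


def as_ratio_by_breadth(genes):
    """Pooled A/S ratio for each expression breadth.

    genes: iterable of (tissues, a_count, s_count), where tissues is the set
    of tissues in which the gene is expressed. Breadth is len(tissues).
    Breadth classes with no synonymous changes are skipped.
    """
    totals = defaultdict(lambda: [0, 0])
    for tissues, a_count, s_count in genes:
        breadth = len(tissues)
        totals[breadth][0] += a_count
        totals[breadth][1] += s_count
    return {
        breadth: a / s
        for breadth, (a, s) in sorted(totals.items())
        if s > 0
    }


# Made-up records: (tissues where expressed, A count, S count).
toy_genes = [
    ({"brain"}, 2, 4),
    ({"liver"}, 1, 2),
    ({"brain", "liver"}, 1, 4),
]
print(as_ratio_by_breadth(toy_genes))
```

The same function can be applied separately to polymorphism counts and to divergence counts to obtain the two curves that are compared against breadth.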
The results may be affected by the definition of tissue-specific and housekeeping genes. We changed the criteria for tissue specificity and compared the results. In general, more stringent definitions enhanced the characteristics of the tissue-specific groups, and less stringent definitions weakened them. Under looser criteria, however, the number of genes in each category increased and the variance of the A/S ratio was reduced, so the correlation became more significant. In this report, we defined tissue-specific genes as those expressed in up to 5 tissues in order to obtain a sufficient number of genes in each category. However, the correlation between the A/S ratios in polymorphism and divergence was consistently significant regardless of the definition of tissue-specific genes. The correlation is significant even if we use only the genes exclusively expressed in each tissue (R² = 0.562, P < 10⁻³) or all genes expressed in each tissue (R² = 0.814, P < 10⁻⁶).
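The effect of the tissue-specificity criterion can be made concrete with a small sketch in which the maximum breadth is a parameter. It assumes that a gene expressed in at most `max_tissues` tissues is assigned to every tissue in which it is expressed; that reading of the definition, the function name, and the toy data are assumptions made for illustration.

```python
def genes_by_tissue(expression, max_tissues=5):
    """Assign genes to tissue classes under a breadth cutoff.

    expression: dict mapping gene ID -> set of tissues where it is expressed.
    max_tissues=1 keeps only exclusively expressed genes; setting it to the
    total number of tissues keeps every gene expressed in each tissue.
    """
    classes = {}
    for gene, tissues in expression.items():
        if 0 < len(tissues) <= max_tissues:
            for tissue in tissues:
                classes.setdefault(tissue, set()).add(gene)
    return classes


# Made-up expression calls.
toy_expression = {
    "g1": {"brain"},
    "g2": {"brain", "liver"},
    "g3": {"brain", "liver", "testis"},
}
for cutoff in (1, 2, 18):
    sizes = {t: len(g) for t, g in genes_by_tissue(toy_expression, cutoff).items()}
    print(cutoff, sorted(sizes.items()))
```

Loosening the cutoff enlarges every class, which is the trade-off noted above between sharper tissue-specific signal and smaller variance of the A/S ratio.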
The assortment bias in the HapMap data has probably resulted in slightly more nonsynonymous SNPs relative to synonymous SNPs than the neutral expectation because nonsynonymous SNPs have been sought more intensively. Furthermore, common SNPs tend to be overrepresented in the HapMap data. The overrepresented common SNPs would have a low A/S ratio compared with rare SNPs and would reduce the overall A/S ratio (Fay et al. 2002). These 2 biases work in opposite directions, and it is difficult to correct them accurately. In this report, we did not evaluate the absolute A/S ratios but contrasted the A/S ratios in different categories of genes, assuming that there is no systematic bias in the SNP discovery rate among genes with different expression patterns.
In conclusion, we found that the gene expression pattern is a strong determinant of the intensity of negative selection on the human genome. Although positive selection on some genes accelerates amino acid substitutions, the general pattern is mainly determined by the intensity of negative selection. While the genes under relaxed negative selection evolved faster than the other genes, these genes are even more liable to carry slightly deleterious mutations in the population. On the other hand, nonneutral mutations in highly conserved genes are largely deleterious and are eliminated before they enter the population.
| Supplementary Material |
|---|
Supplementary figure 1 is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
The author thanks Chung-I Wu, Hideki Innan, Jun Kusuda, and Katsuyuki Hashimoto for helpful comments and the anonymous reviewers for helpful suggestions. This study was supported by a grant from the International Council on Amino Acid Science and a Health Science Research grant from the Ministry of Health, Labor and Welfare of Japan.
| Footnotes |
|---|
Takashi Gojobori, Associate Editor
| References |
|---|
Bustamante CD, Fledel Alon A, Williamson S. (14 co-authors). Natural selection on protein-coding genes in the human genome. Nature (2005) 437:1153–1157.
Bustamante CD, Nielsen R, Sawyer SA, Olsen KM, Purugganan MD, Hartl DL. The cost of inbreeding in Arabidopsis. Nature (2002) 416:531–534.
Charlesworth B, Coyne JA, Barton NH. The relative rates of evolution of sex chromosomes and autosomes. Am Nat (1987) 130:113–146.
Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature (2005) 437:69–87.
Clark AG, Glanowski S, Nielsen R. (17 co-authors). Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science (2003) 302:1960–1963.
Comeron JM. Weak selection and recent mutational changes influence polymorphic synonymous mutations in humans. Proc Natl Acad Sci USA (2006) 103:6940–6945.
Duret L, Mouchiroud D. Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol Biol Evol (2000) 17:68–74.
Enard W, Khaitovich P, Klose J. (14 co-authors). Intra- and interspecific variation in primate gene expression patterns. Science (2002) 296:340–343.
Eyre-Walker A. Changing effective population size and the McDonald-Kreitman test. Genetics (2002) 162:2017–2024.
Fay JC, Wyckoff GJ, Wu CI. Testing the neutral theory of molecular evolution with genomic data from Drosophila. Nature (2002) 415:1024–1026.
Hastings KE. Strong evolutionary conservation of broadly expressed protein isoforms in the troponin I gene family and other vertebrate gene families. J Mol Evol (1996) 42:631–640.
Khaitovich P, Hellmann I, Enard W, Nowick K, Leinweber M, Franz H, Weiss G, Lachmann M, Paabo S. Parallel patterns of evolution in the genomes and transcriptomes of humans and chimpanzees. Science (2005) 309:1850–1854.
Kimura M. The neutral theory of molecular evolution (1983) Cambridge: Cambridge University Press.
Li WH. Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J Mol Evol (1993) 36:96–99.
Lu J, Wu CI. Weak selection revealed by the whole-genome comparison of the X chromosome and autosomes of human and chimpanzee. Proc Natl Acad Sci USA (2005) 102:4063–4067.
Maglott D, Ostell J, Pruitt KD, Tatusova T. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res (2005) 33:D54–D58.
McDonald JH, Kreitman M. Adaptive protein evolution at the Adh locus in Drosophila. Nature (1991) 351:652–654.
Nielsen R, Bustamante C, Clark AG. (13 co-authors). A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biol (2005) 3:e170.
Ohta T. Slightly deleterious mutant substitutions in evolution. Nature (1973) 246:96–98.
Osada N, Hirata M, Tanuma R. (11 co-authors). Substitution rate and structural divergence of 5'UTR evolution: comparative analysis between human and cynomolgus monkey cDNAs. Mol Biol Evol (2005) 22:1976–1982.
Pamilo P, Bianchi NO. Evolution of the Zfx and Zfy genes: rates and interdependence between the genes. Mol Biol Evol (1993) 10:271–281.
Piganeau G, Eyre-Walker A. Estimating the distribution of fitness effects from DNA sequence data: implications for the molecular clock. Proc Natl Acad Sci USA (2003) 100:10335–10340.
Rooney AP, Zhang J. Rapid evolution of a primate sperm protein: relaxation of functional constraint or positive Darwinian selection? Mol Biol Evol (1999) 16:706–710.
Smith NG, Eyre-Walker A. Adaptive protein evolution in Drosophila. Nature (2002) 415:1022–1024.
Su AI, Wiltshire T, Batalov S. (13 co-authors). A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA (2004) 101:6062–6067.
The International HapMap Consortium. The International HapMap Project. Nature (2003) 426:789–796.
Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res (1994) 22:4673–4680.
Wang HY, Chien HC, Osada N, Hashimoto K, Sugano S, Gojobori T, Chou CK, Tsai SF, Wu CI, Shen CK. Rate of evolution in brain-expressed genes in humans and other primates. PLoS Biol (2006) 5:e13.
Zhang L, Li WH. Mammalian housekeeping genes evolve more slowly than tissue-specific genes. Mol Biol Evol (2004) 21:236–239.
Zhang L, Li WH. Human SNPs reveal no evidence of frequent positive selection. Mol Biol Evol (2005) 22:2504–2507.