MBE Advance Access originally published online on July 25, 2007
Molecular Biology and Evolution 2007 24(10):2235-2241; doi:10.1093/molbev/msm152
Research Articles |
The Nonsynonymous/Synonymous Substitution Rate Ratio versus the Radical/Conservative Replacement Rate Ratio in the Evolution of Mammalian Genes


* Department of Ecology and Evolution, University of Chicago
Department of Plant Biology, Michigan State University
E-mail: whli{at}uchicago.edu.
| Abstract |
|---|
There are 2 ways to infer selection pressures in the evolution of protein-coding genes, the nonsynonymous and synonymous substitution rate ratio (KA/KS) and the radical and conservative amino acid replacement rate ratio (KR/KC). Because the KR/KC ratio depends on the definition of radical and conservative changes in the classification of amino acids, we develop an amino acid classification that maximizes the correlation between KA/KS and KR/KC. An analysis of 3,375 orthologous gene groups among 5 mammalian species shows that our classification gives a significantly higher correlation coefficient between the 2 ratios than those of existing classifications. However, there are many orthologous gene groups with a low KA/KS but a high KR/KC ratio. Examining the functions of these genes, we found an overrepresentation of functional categories related to development. To determine if the overrepresentation is stage specific, we examined the expression patterns of these genes at different developmental stages of the mouse. Interestingly, these genes are highly expressed in the early middle stage of development (blastocyst to amnion). It is commonly thought that developmental genes tend to be conservative in evolution, but some molecular changes in developmental stages should have contributed to morphological divergence in adult mammals. Therefore, we propose that the relaxed pressures indicated by the KR/KC ratio but not by KA/KS in the early middle stage of development may be important for the morphological divergence of mammals at the adult stage, whereas purifying selection detected by KA/KS occurs in the early middle developmental stage.
Key Words: positive selection; radical substitution; conservative substitution; classification of amino acids; development
| Introduction |
|---|
Selection pressure on protein-coding sequences is commonly estimated by the ratio of the nonsynonymous substitution rate (KA) to the synonymous substitution rate (KS) (Li and Gojobori 1983
In the present study, we searched for an amino acid classification that gives the best correlation between the 2 ratios. This amino acid classification is useful because the KR/KC ratio based on this classification can identify genes undergoing similar selection pressures inferred by the KA/KS ratio between distant protein-coding sequences.
Another issue is that the 2 ratios are unlikely to be completely correlated even when the amino acid classification that gives the maximum correlation between them is used. To address the differences between the selection pressures inferred by KA/KS and KR/KC in the evolution of mammalian genes, we used Gene Ontology (GO) categories and expression data of a representative mammal, the mouse, to examine the functions of genes for which the 2 ratios indicated different selection pressures.
| Materials and Methods |
|---|
Construction of Orthologous Groups
cDNA data of 5 mammalian species were retrieved from the Ensembl database (http://www.ensembl.org): Homo sapiens (NCBI35.may), Pan troglodytes (CHIMP1.may), Mus musculus (NCBIM33.may), Rattus norvegicus (RGSC3.4.may), and Canis familiaris (BROADD1.may). Reciprocal best hits between every combination of 2 species were identified with Blastp (Altschul et al. 1997
The orthologous gene groups in the 5 mammalian species were determined as follows. The orthologous gene data were carefully constructed to reduce errors in estimating nucleotide and amino acid substitutions. Only segments aligned among the 5 species without any gaps were used for the calculation of the KA/KS and KR/KC ratios.
Estimation of KA/KS and KR/KC in Each Orthologous Gene Set
A phylogenetic tree was reconstructed for each orthologous gene group by the neighbor-joining (NJ) method (Saitou and Nei 1987). The ancestral sequence was inferred at each node in the phylogenetic tree using the maximum likelihood method (Yang et al. 1995). The transition/transversion ratio was estimated in each orthologous group, and the ratio was then used to estimate KA and KS in all branches in the phylogenetic tree by the modified Nei–Gojobori method (Zhang et al. 1998). The sums of KA and KS of all branches were used to determine the KA/KS ratio in each orthologous gene group.
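The last step above, taking the ratio of branch sums rather than averaging per-branch ratios, can be sketched as follows. This is a minimal illustration only: the function name and the branch values are hypothetical, and the per-branch KA and KS estimates are assumed to have been computed already by some other tool.

```python
def ratio_of_sums(ka_by_branch, ks_by_branch):
    """Return sum(KA) / sum(KS) over all branches of one gene tree.

    Hypothetical helper: the inputs are per-branch estimates that are
    assumed to come from an earlier estimation step.
    """
    total_ks = sum(ks_by_branch)
    if total_ks == 0:
        # No synonymous change observed; the ratio is undefined.
        return float("nan")
    return sum(ka_by_branch) / total_ks


# Made-up values for a tree with three branches.
ka = [0.010, 0.020, 0.005]
ks = [0.100, 0.150, 0.100]
print(ratio_of_sums(ka, ks))
```

Summing before dividing keeps short branches with a KS near zero from dominating the result, which is one plausible reason for preferring it over a mean of per-branch ratios.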
Radical and conservative changes were defined by a classification (A) that gave the best correlation between KR/KC and KA/KS and also by 3 previous classifications with respect to the chemical properties: (B) polarity and volume, (C) charge and aromaticity, and (D) charge and polarity (Zhang 2000; Hanada et al. 2006) (table 1). These so-called physicochemical properties (aromaticity, charge, polarity, and volume) are thought to be relevant for the evolution of proteins (Grantham 1974; Miyata et al. 1979). Based on the ancestral sequences inferred at all nodes in the phylogenetic tree of each orthologous group, KR and KC were estimated in all branches in the phylogenetic tree by the Zhang method (Zhang 2000). The sums of branch lengths that reflected KR and KC were used to determine the KR/KC ratio in each orthologous group. Average KA, KS, KR, and KC in each branch of the species tree among 3,375 orthologous groups are given in Supplement A (Supplementary Material online).
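How a classification turns an amino acid replacement into a radical or a conservative change can be sketched as below. The grouping used here is a simple charge-only partition chosen for illustration; it is an assumption of this sketch and is not one of the classifications in table 1. The sketch only counts changes; estimating the rates KR and KC would additionally require normalizing by the numbers of potential radical and conservative sites, which is omitted here.

```python
# Illustrative grouping by side-chain charge only. This is an assumption
# for the sketch, NOT one of the paper's classifications (table 1).
CHARGE_GROUPS = {
    "positive": set("RHK"),
    "negative": set("DE"),
    "neutral": set("ACFGILMNPQSTVWY"),
}


def group_of(aa, groups=CHARGE_GROUPS):
    """Return the name of the group that contains amino acid `aa`."""
    for name, members in groups.items():
        if aa in members:
            return name
    raise ValueError(f"unknown amino acid: {aa!r}")


def is_radical(aa_from, aa_to, groups=CHARGE_GROUPS):
    """A replacement is radical if it crosses groups, conservative otherwise."""
    return group_of(aa_from, groups) != group_of(aa_to, groups)


def count_changes(replacements, groups=CHARGE_GROUPS):
    """Return (radical, conservative) counts for (from, to) pairs."""
    radical = sum(is_radical(a, b, groups) for a, b in replacements)
    return radical, len(replacements) - radical


# K->E crosses groups; A->G and D->E stay within a group.
print(count_changes([("K", "E"), ("A", "G"), ("D", "E")]))
```

Passing a different `groups` dictionary is enough to switch classifications, so the same counting code could be reused for each of the 4 classifications compared in the text.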
Construction of a New Amino Acid Classification
To estimate the average KA/KS ratio for each amino acid replacement, we collected from the orthologous gene groups the amino acid replacements that had occurred. The average KA/KS ratio for each type of amino acid replacement is defined to be the average KA/KS ratio in the collected orthologous gene groups. The average KA/KS ratios were estimated for each of the 75 kinds of amino acid replacement occurring by single nucleotide substitution. Because the amino acid replacement having a low (high) KA/KS ratio should tend to be a conservative (radical) change in the highly associated classification, radical and conservative scores were numbered for 75 types of amino acid replacement in descending (ascending) order of KA/KS (Supplement B, Supplementary Material online). Using the radical and conservative scores for the 75 types of amino acid replacement, we calculated the totals of radical and conservative scores for each amino acid classification. To find an amino acid classification that would give the maximum correlation between KR/KC and KA/KS, amino acids were classified into 2–5 groups in all possible combinations and we identified the classification with the highest score. The new classification is regarded as the amino acid classification that can more adequately characterize the relationship between KA/KS and KR/KC.
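The figure of 75 replacement types can be checked directly from the standard genetic code. The sketch below enumerates every unordered pair of distinct amino acids whose codons differ at exactly one nucleotide position, ignoring stop codons. The function and variable names are illustrative and do not come from the study.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_step_pairs():
    """Unordered amino acid pairs reachable by one nucleotide substitution."""
    pairs = set()
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                other = CODON_TABLE[mutant]
                # Skip nonsense and synonymous changes.
                if other != "*" and other != aa:
                    pairs.add(frozenset((aa, other)))
    return pairs


print(len(single_step_pairs()))  # 75
```

Each of these pairs is what the text treats as one kind of amino acid replacement when ranking replacements by their average KA/KS ratio.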
Functional Categories by GO
Orthologous gene groups with the top and bottom 10% KA/KS or KR/KC values were considered as relaxed selection groups and purifying selection groups, respectively. Under this classification, there are 4 possible combinations for the orthologous gene groups: 1) relaxed selection groups inferred by both KA/KS and KR/KC (a high KA/KS and a high KR/KC), 2) purifying selection groups inferred by both KA/KS and KR/KC (a low KA/KS and a low KR/KC), 3) relaxed and purifying selection groups inferred by KA/KS and by KR/KC (a high KA/KS and a low KR/KC), respectively, and 4) purifying selection and relaxed selection groups inferred by KA/KS and by KR/KC (a low KA/KS and a high KR/KC), respectively.
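The 4 combinations above can be sketched as a two-step labeling: mark the top and bottom 10% of each ratio, then pair the labels. The helper names are hypothetical, and how ties at the cutoff were handled in the study is not stated, so this sketch simply takes the first and last tenth of the sorted order.

```python
def tail_labels(values, fraction=0.10):
    """Label each value 'high' (top fraction), 'low' (bottom fraction), or None."""
    n = len(values)
    k = max(1, int(n * fraction))  # number of items in each tail
    order = sorted(range(n), key=lambda i: values[i])
    labels = [None] * n
    for i in order[:k]:
        labels[i] = "low"   # candidate purifying selection group
    for i in order[-k:]:
        labels[i] = "high"  # candidate relaxed selection group
    return labels


def combine(ka_ks, kr_kc, fraction=0.10):
    """Pair the (KA/KS, KR/KC) labels; None if a group is outside either tail."""
    a = tail_labels(ka_ks, fraction)
    r = tail_labels(kr_kc, fraction)
    return [(x, y) if x and y else None for x, y in zip(a, r)]


# Made-up ratios for 20 groups, with KR/KC running opposite to KA/KS.
ka_ks = [i / 10 for i in range(20)]
kr_kc = ka_ks[::-1]
labels = combine(ka_ks, kr_kc)
print(labels[0], labels[19], labels[5])
```

A result of `('low', 'high')` corresponds to combination 4 in the text (purifying selection by KA/KS, relaxed selection by KR/KC).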
GO assignments for the mouse genes were obtained from the mouse genome database (Hill et al. 2002). To simplify functional interpretation, we used the GO categories of biological processes from the top to the fourth level of the hierarchy. The expected proportion of mouse genes assigned to each GO category was compared, by the chi-square test, with the observed proportion among the mouse genes of orthologous gene groups undergoing different selection pressures. When the observed proportion was significantly higher than the expected proportion in a given GO category (P < 0.05), the hierarchical pathways from the root to the overrepresented GO category were drawn with the Graphviz software (http://www.graphviz.org).
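One way to carry out such a comparison is a Pearson chi-square test on a 2 x 2 table of gene counts (genes in the selected set versus the remaining genes, with versus without the GO category). The exact form of the test used in the study is not given, so the version below, without continuity correction, is an assumption, and the counts are made up.

```python
CHI2_CRITICAL_DF1_P05 = 3.841  # chi-square critical value, df = 1, P = 0.05


def chi2_2x2(set_with, set_without, rest_with, rest_without):
    """Pearson chi-square statistic for a 2 x 2 table (no continuity correction)."""
    a, b, c, d = set_with, set_without, rest_with, rest_without
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


def is_overrepresented(set_with, set_without, rest_with, rest_without):
    """True if the category is significantly enriched in the selected set."""
    observed = set_with / (set_with + set_without)
    expected = rest_with / (rest_with + rest_without)
    stat = chi2_2x2(set_with, set_without, rest_with, rest_without)
    return observed > expected and stat > CHI2_CRITICAL_DF1_P05


# Hypothetical counts: 30 of 100 selected genes vs 10 of 100 other genes.
print(chi2_2x2(30, 70, 10, 90))            # 12.5
print(is_overrepresented(30, 70, 10, 90))  # True
```

Because many GO categories are tested at once, a real analysis would probably also need some correction for multiple testing; the text does not say whether one was applied.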
The Expression Pattern at a Developmental Stage
The mouse expression data set covering various stages of mouse development (Ringwald et al. 2001) was used to determine the relationships between gene expression and the nature of selection pressure as determined by the KA/KS and KR/KC measures. Among different selection pressures, we compared the expression bias of genes at a developmental stage by the following equation:
|
|
| Results |
|---|
A New Classification of Amino Acids
To find a new classification that yields the maximum correlation between KA/KS and KR/KC, we first constructed all possible combinations in which the 20 amino acids can be classified into 2–5 groups. Second, a table representing the average KA/KS ratio for each type of amino acid replacement was constructed to see what kinds of amino acid replacements more adequately characterize the KA/KS ratio (Supplement B, Supplementary Material online). Based on the table, a new classification of amino acids with a higher correlation between the KA/KS ratio and the radical or conservative change was constructed (Classification A in table 1). In the new classification, amino acids are classified into basic, acidic, and neutral charges. The aromatic amino acids belong to the group of the basic charges because one of the aromatic amino acids has a basic charge. The amino acids with neutral charge are classified into small and large volumes that fall into distinct groups. Consequently, this new classification seems to be constructed with respect to the chemical properties of charge, aromaticity, and volume.
Correlation between KR/KC and KA/KS
Using 3 existing amino acid classifications and our new classification, we estimated 4 KR/KC ratios for each orthologous gene group. The 4 KR/KC ratios were significantly positively correlated with each other (P < 0.01) (table 2). In terms of the correlation between KR/KC and KA/KS, the correlation coefficient in the new classification (A, r = 0.48, table 2) was expected to be the highest among the 4 chemical classifications because the new classification (A) was constructed by the chemical properties associated with the KA/KS ratio. In fact, the correlation coefficient between KA/KS and KR/KC based on the new classification is significantly higher than those based on the other 3 classifications (P < 0.01), though the other 3 KR/KC ratios are also each positively correlated with the KA/KS ratio (P < 0.01) (fig. 2).
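The correlation coefficient between the 2 ratios across orthologous gene groups can be computed as sketched below. The text reports r without naming the estimator, so the use of Pearson's product-moment coefficient here is an assumption, and the input values are invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return sxy / sqrt(sxx * syy)


# Made-up per-group ratios.
ka_ks = [0.05, 0.10, 0.20, 0.35, 0.50]
kr_kc = [0.40, 0.55, 0.50, 0.80, 0.90]
print(round(pearson_r(ka_ks, kr_kc), 3))
```

Running the same function once per classification, with KR/KC recomputed under each grouping, would give the 4 coefficients that the text compares.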
However, even under the new classification, which gives the highest correlation between the 2 ratios, the correlation coefficient is less than 0.5, indicating that selective pressures inferred by the KR/KC ratio and by the KA/KS ratio differ substantially. In particular, there are many orthologous gene groups with a low KA/KS and a high KR/KC ratio (fig. 2). These orthologous gene groups have likely undergone relaxed selection in radical amino acid substitutions as indicated by the KR/KC ratio but experienced purifying selection in nonsynonymous changes as indicated by the KA/KS ratio.
Overrepresented Functional Categories Undergoing Opposite Selection Pressures Inferred by 2 Ratios
There are 4 types of selection pressures experienced by the orthologous gene groups. The number of orthologous gene groups that experienced relaxed or purifying selection pressures in the 2 ratios is shown in table 3, and the gene lists are given in Supplement C (Supplementary Material online). Because KA/KS was on the whole positively correlated with KR/KC in mammals, more groups underwent the same selection pressure in the 2 ratios than underwent opposite selection pressures. Groups with opposite selection pressures were found only in the combination of a high KR/KC and a low KA/KS ratio.
To assess the functions of groups that underwent different selection pressures, we examined significantly overrepresented GO categories of mouse genes in orthologous gene groups subject to each type of selection pressures (fig. 3; Supplement D, Supplementary Material online). The overrepresented functions of genes with a high KR/KC and a high KA/KS ratio are related to "response to stimulus" and "physiological process." In particular, several functions related to defense response can be clearly found in these genes. Because genes related to defense response are in general accepted as genes undergoing positive selection, these results seem biologically reasonable. On the other hand, the overrepresented functions of genes with a low KA/KS ratio are related to development. This result is also reasonable because most of the genes related to development are subject to purifying selection based on the KA/KS ratio between distantly related species (Powell et al. 1993
To further examine the different gene functions between the high and the low KR/KC ratios in mammalian development, we examined the expression of mouse genes with different selection pressures using the mouse expression data set covering various stages of development (fig. 4A and B). Genes subject to purifying selection based on both ratios are expressed at high levels at the early developmental stages (one cell egg to blastocyst). On the other hand, genes subject to purifying selection indicated by KA/KS but relaxed selection indicated by KR/KC were expressed predominantly in the early middle stage of development (blastocyst to amnion). The relaxed pressures indicated solely by the KR/KC ratio in the early middle stage of development may be important for the divergent evolution in mammals.
| Discussion |
|---|
The key finding of the present study is that a positive correlation between KA/KS and KR/KC at a genomic scale is observed in all amino acid classifications, indicating that the 2 tests of selection pressure give similar conclusions in mammalian evolution. In particular, the KR/KC ratio of the new classification is useful for estimating selection pressure between distantly related sequences (Gojobori 1983
However, a major limitation in substituting KR/KC for KA/KS is that, even when we used the new classification aimed at maximizing the correlation between KR/KC and KA/KS, the correlation between KR/KC and KA/KS is still less than 0.5. There are potentially 2 reasons why the 2 ratios are not highly correlated. One reason is biological. For some genes, KR/KC may not be related to the type of natural selection identified by KA/KS. The other reason is technical. In the computation of the KR/KC ratio, radical and conservative changes were defined as amino acid replacements between groups and within groups, respectively. In view of the fact that the radical and conservative changes are defined to be always "0" or "1," the KR/KC ratio may not fully represent the selection pressure of amino acid replacements.
We note that there are many orthologous gene groups with a low KA/KS and a high KR/KC as outliers. To address the opposite selection pressures, we examined the functions of mouse genes and found that functional categories related to development were overrepresented in these genes. We then examined the expression patterns of these genes at different developmental stages. The mouse genes that underwent such selection pressures tend to be overexpressed in the early middle developmental stages. Richardson (1999) proposed that the early middle developmental stages were important for speciation of mammals because these are the stages when many adult traits are specified, even if these stages were conservative at the morphological level. Therefore, we propose that the relaxed selection pressures indicated by KR/KC but not by KA/KS in the early middle developmental stages may be important for the morphological divergence of mammals at the adult stage, whereas purifying selection detected by KA/KS tends to occur in the early middle developmental stages. The differences in the selection pressures assessed by KA/KS and KR/KC indicate that, although genes involved in development have strong constraints on amino acid substitutions, radical changes among the substitutions that are permitted are likely important for developmental divergence of adult mammals. Thus, opposite selection pressures inferred by the 2 ratios might play an important role in the evolution of genes related to development in mammals.
In summary, we inferred 3,375 orthologous gene groups in 5 mammalian species in a stringent manner. KR/KC is positively correlated with KA/KS. The correlation was observed in each of 4 chemical classifications taking account of aromaticity, charge, polarity, or volume. In particular, the chemical classification for aromaticity, charge, and volume led to the highest correlation between these 2 ratios. Moreover, the genes with high KR/KC but low KA/KS were overrepresented with genes expressed at a high level in the early middle developmental stages. The selection pressures at these developmental stages may be important for the morphological diversification of mammals.
| Supplementary Materials |
|---|
Supplementary materials are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
We thank the members of our laboratories for valuable comments and discussion. This study was supported by a National Institutes of Health grant (GM30998) to W.-H. L. and a National Science Foundation (NSF) grant (DBI-0638591) to S.-H. S.
| Footnotes |
|---|
Takashi Gojobori, Associate Editor
| References |
|---|
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. (1997) 25:3389–3402.
Gojobori J, Tang H, Akey JM, Wu CI. Adaptive evolution in humans revealed by the negative correlation between the polymorphism and fixation phases of evolution. Proc Natl Acad Sci USA (2007) 104:3907–3912.
Gojobori T. Codon substitution in evolution and the "saturation" of synonymous changes. Genetics (1983) 105:1011–1027.
Grantham R. Amino acid difference formula to help explain protein evolution. Science (1974) 185:862–864.
Hanada K, Gojobori T, Li WH. Radical amino acid change versus positive selection in the evolution of viral envelope proteins. Gene (2006) 385:83–88.
Hill DP, Blake JA, Richardson JE, Ringwald M. Extension and integration of the gene ontology (GO): combining GO vocabularies with external vocabularies. Genome Res. (2002) 12:1982–1991.
Hughes AL, Nei M. Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature (1988) 335:167–170.
Hughes AL, Ota T, Nei M. Positive Darwinian selection promotes charge profile diversity in the antigen-binding cleft of class I major-histocompatibility-complex molecules. Mol Biol Evol. (1990) 7:515–524.
Li WH, Gojobori T. Rapid evolution of goat and sheep globin genes following gene duplication. Mol Biol Evol. (1983) 1:94–108.
Miyata T, Miyazawa S, Yasunaga T. Two types of amino acid substitutions in protein evolution. J Mol Evol. (1979) 12:219–236.
Powell JR, Caccone A, Gleason JM, Nigro L. Rates of DNA evolution in Drosophila depend on function and developmental stage of expression. Genetics (1993) 133:291–298.
Richardson MK. Vertebrate evolution: the developmental origins of adult variation. Bioessays (1999) 21:604–613.
Ringwald M, Eppig JT, Begley DA, Corradi JP, McCright IJ, Hayamizu TF, Hill DP, Kadin JA, Richardson JE. The mouse gene expression database (GXD). Nucleic Acids Res. (2001) 29:98–101.
Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. (1987) 4:406–425.
Slack JM, Holland PW, Graham CF. The zootype and the phylotypic stage. Nature (1993) 361:490–492.
Smith JM, Smith NH. Synonymous nucleotide divergence: what is "saturation"? Genetics (1996) 142:1033–1036.
Smith NG. Are radical and conservative substitution rates useful statistics in molecular evolution? J Mol Evol. (2003) 57:467–478.
Tang H, Wyckoff GJ, Lu J, Wu CI. A universal evolutionary index for amino acid changes. Mol Biol Evol. (2004) 21:1548–1556.
Thompson JD, Higgins DG, Gibson TJ. Clustal W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. (1994) 22:4673–4680.
Yang Z, Kumar S, Nei M. A new method of inference of ancestral nucleotide and amino acid sequences. Genetics (1995) 141:1641–1650.
Zhang J. Rates of conservative and radical nonsynonymous nucleotide substitutions in mammalian nuclear genes. J Mol Evol. (2000) 50:56–68.
Zhang J, Rosenberg HF, Nei M. Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc Natl Acad Sci USA (1998) 95:3708–3713.




