MBE Advance Access originally published online on August 10, 2006
Molecular Biology and Evolution 2006 23(11):2131-2133; doi:10.1093/molbev/msl086
© The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Systematically Assessing the Influence of 3-Dimensional Structural Context on the Molecular Evolution of Mammalian Proteomes

Sun Shim Choi*, Eric J. Vallender*,† and Bruce T. Lahn*

* Howard Hughes Medical Institute, Department of Human Genetics, University of Chicago
† Committee on Genetics, University of Chicago

E-mail: blahn@bsd.uchicago.edu.


    Abstract
The 3-dimensional (3D) structural context of amino acid residues in a protein could significantly impact the level of selective constraint on the residues. Here, by analyzing 767 mammalian proteins, we systematically investigate how various 3D structural contexts influence selective constraint. The structural contexts we examined include solvent accessibility, secondary structure, and intramolecular residue–residue interactions. Through this analysis, we offer quantitative information on how 3D structural contexts affect the level of selective constraint.

Key Words: proteome evolution • 3-dimensional structure • selective constraint

The functional specificity of a protein stems from the intricate 3-dimensional (3D) structure of the protein. As such, the local structural context of amino acid residues within the protein should significantly affect the level of selective constraint operating on the residues. Although this notion is readily assumed by many investigators (Bao and Cui 2005; Karchin et al. 2005; Porto et al. 2005), only a few studies have directly examined the influence of 3D structural context on selective constraint (Goldman et al. 1998; Bustamante et al. 2000; Mintseris and Weng 2005). Given that these studies typically utilized a small number of proteins, their conclusions are qualitative rather than quantitative and may not represent genome-wide patterns. In this study, we performed a combination of 3D structural analysis and phylogenetic analysis on a large number of mammalian proteins. This large-scale study allowed us, for the first time, to quantitatively assess the extent to which various 3D structural contexts affect levels of selective constraint on amino acid residues in proteins.

We collected all available human–rat orthologous pairs for which 3D structural information can be obtained for the human protein and for which there is at least one amino acid difference between human and rat orthologs. This resulted in a set of 767 human–rat orthologous pairs. We will only report results based on human structures because rat structures produced essentially the same findings.

We first investigated how solvent accessibility of amino acid residues influences selective constraint. We divided all the residues into 2 categories: likely buried and likely exposed. We found that, on average, buried residues have a 26% lower human–rat replacement rate than exposed residues, though both have much lower rates than the neutral expectation (fig. 1A). This finding shows that buried residues evolve under considerably stronger constraint than exposed residues.
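As a rough illustration of the arithmetic behind such comparisons, the Python sketch below computes a replacement rate as the fraction of residues in a class that differ between the two orthologs (the definition given in Materials and Methods) and then the relative reduction of one rate against another. The function names and the counts are hypothetical and chosen only to show the calculation; they are not the study's data.

```python
def replacement_rate(n_replaced, n_total):
    """Fraction of residues in a class that differ between the two orthologs."""
    if n_total <= 0:
        raise ValueError("a residue class must contain at least one residue")
    return n_replaced / n_total


def relative_reduction(rate_a, rate_b):
    """How much lower rate_a is than rate_b, expressed as a fraction of rate_b."""
    if rate_b <= 0:
        raise ValueError("reference rate must be positive")
    return 1.0 - rate_a / rate_b


# Hypothetical counts, for illustration only (not taken from the article).
buried = replacement_rate(740, 10_000)
exposed = replacement_rate(1_000, 10_000)
reduction = relative_reduction(buried, exposed)
print(f"buried={buried:.3f} exposed={exposed:.3f} reduction={reduction:.0%}")
```

No correction for multiple hits is applied here, in line with the uncorrected rates described in Materials and Methods.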


FIG. 1.— Effect of 3D structural context on protein sequence evolution. (A) Amino acid replacement rates under various 3D structural contexts. For each structural context, the number of replacements over the total number of residues is given in parentheses. Neutral expectation was calculated by computer simulation and error bars obtained from bootstrap resampling. (B) Breakdown of amino acid replacements into conservative, moderate, radical, and very radical groups. Solid bars represent amino acid replacements of interacting residues; open bars represent amino acid replacements of noninteracting residues. Error bars were obtained from bootstrap resampling.

 
We next examined the influence of secondary structural context on selective constraint. We considered 4 secondary structures: helix, strand, loop, and turn. Residues in helices and strands showed a significantly lower replacement rate than residues in loops and turns, with helices showing a slightly lower rate than strands (fig. 1A). As helices and strands are generally considered to be more ordered than loops and turns, this finding suggests that portions of proteins that are more likely to be ordered tend to evolve under greater constraint.

Protein function is critically dependent on 3D protein structure, and the structure of a protein, in turn, is critically dependent on the complex interactions among its amino acid residues. To consider how intramolecular residue–residue interactions affect selective constraint, we divided all the residues into those that are involved in some form of intramolecular interaction and those that are not. We found that interacting residues have a 29% lower amino acid replacement rate than noninteracting residues (fig. 1A), arguing that interacting residues are under much stronger constraint. We further noted that the rate difference between interacting and noninteracting residues is greater than that between buried and exposed residues or between any 2 classes of secondary structure. This argues that among all the 3D structural contexts considered thus far, intramolecular residue–residue interactions exert the strongest effect on selective constraint.

Using the Grantham classification (Grantham 1974), we divided amino acid replacements into 4 groups based on the physicochemical properties of the replacements: conservative, moderate, radical, and very radical. We found that interacting residues are strongly enriched for the conservative class of replacements relative to noninteracting residues and are deficient for the moderate, radical, and very radical classes (fig. 1B). This result demonstrates that not only is selective constraint stronger on interacting residues than on noninteracting residues, but the stronger constraint on interacting residues also biases amino acid replacements considerably toward conservative ones.
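The grouping of replacements by Grantham distance can be sketched as a simple binning step. The article does not state its cutoffs, so the boundaries used below (50, 100, and 150 on the Grantham scale) are an assumption made for illustration, as are the function names and the example distances.

```python
from collections import Counter

CLASSES = ("conservative", "moderate", "radical", "very radical")

# ASSUMED upper bounds (inclusive) for the first three classes; the article
# defers to "prior convention" and does not list the cutoffs it used.
ASSUMED_BOUNDS = ((50, "conservative"), (100, "moderate"), (150, "radical"))


def grantham_class(distance):
    """Map a Grantham distance to one of the four replacement classes."""
    if distance < 0:
        raise ValueError("Grantham distances are non-negative")
    for upper, label in ASSUMED_BOUNDS:
        if distance <= upper:
            return label
    return "very radical"


def class_fractions(distances):
    """Fraction of replacements falling into each class."""
    counts = Counter(grantham_class(d) for d in distances)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no replacements to classify")
    return {label: counts[label] / total for label in CLASSES}


# Hypothetical distances for two sets of replacements (illustration only).
interacting = [10, 22, 29, 45, 60, 110]
noninteracting = [15, 58, 94, 101, 160, 180]
print(class_fractions(interacting))
print(class_fractions(noninteracting))
```

Comparing the two dictionaries class by class gives the kind of enrichment or deficiency breakdown summarized in fig. 1B.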

We next investigated how chemical properties of residue–residue interactions influence levels of constraint. We categorized residue–residue interactions into 5 classes: ionic interactions, hydrophobic interactions, sidechain–sidechain hydrogen bonds, sidechain–backbone hydrogen bonds, and disulfide bonds. We then calculated the rate of amino acid replacements for each class (fig. 1A). It is noteworthy that cysteine residues in disulfide bonds have a replacement rate of 1.4%, far lower than the rates for residues in other types of interactions, demonstrating the extraordinary conservation of such residues. In contrast, of the 3,493 cysteines not involved in disulfide bonding, 227 (6.5%) show replacements.

The traditional metric for selective constraint on a gene is the ratio of the gene's nonsynonymous substitution rate (Ka) to its synonymous substitution rate (Ks) (Li 1993, 1997). Here, we developed an analogous metric based on the ratio of interacting residue replacement rate (Ri) to noninteracting residue replacement rate (Rn). Similar to the Ka/Ks ratio, the Ri/Rn ratio is below 1 for the great majority of genes. We plotted human–rat Ka/Ks against Ri/Rn, which showed a strong positive correlation between Ka/Ks and Ri/Rn for Ri/Rn values less than 1 (supplementary fig. S1, Supplementary Material online). A simple explanation for this correlation is that, similar to Ka/Ks, low Ri/Rn is consistent with strong constraint, whereas high Ri/Rn is consistent with weak constraint. In other words, as selective constraint becomes stronger, its effect in limiting the freedom of amino acid replacements tends to fall preferentially on interacting residues rather than on noninteracting residues. To better visualize the relationship between Ka/Ks and Ri/Rn, we binned genes based on Ri/Rn values into 25 genes per bin. This produced a remarkably neat, positive linear relationship between bin-average Ka/Ks and Ri/Rn for Ri/Rn values less than 1. However, this positive relationship breaks down for Ri/Rn values greater than 1. The reason for this breakdown is not immediately obvious. We noted that proteins with Ri/Rn values greater than 1 tend to have small numbers of amino acid replacements, suggesting that stochastic variance may have contributed to the high Ri/Rn values in these proteins. However, additional studies are needed to clarify the relationship between Ka/Ks and Ri/Rn.
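The Ri/Rn metric and the binning step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, genes with an undefined ratio (no noninteracting replacements) are simply skipped, and a final bin with fewer than 25 genes is kept as a smaller bin. The article does not say how either case was handled.

```python
from statistics import mean


def ri_rn(inter_replaced, inter_total, non_replaced, non_total):
    """Ratio of interacting (Ri) to noninteracting (Rn) replacement rates.

    Returns None when the ratio is undefined (an assumption of this sketch).
    """
    if inter_total == 0 or non_total == 0 or non_replaced == 0:
        return None
    ri = inter_replaced / inter_total
    rn = non_replaced / non_total
    return ri / rn


def bin_averages(pairs, bin_size=25):
    """Sort (Ri/Rn, Ka/Ks) pairs by Ri/Rn and average each consecutive bin."""
    ordered = sorted(pairs)
    bins = [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]
    return [
        (mean([x for x, _ in b]), mean([y for _, y in b]))
        for b in bins
    ]


# Hypothetical per-gene values (illustration only, not the study's data).
genes = [(0.2, 0.10), (0.8, 0.70), (0.4, 0.30), (0.6, 0.50)]
for avg_ri_rn, avg_ka_ks in bin_averages(genes, bin_size=2):
    print(f"mean Ri/Rn={avg_ri_rn:.2f}  mean Ka/Ks={avg_ka_ks:.2f}")
```

Averaging within bins smooths per-gene noise, which matters most for proteins with few replacements, where a single substitution can shift Ri/Rn substantially.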

The significantly greater selective constraint on interacting residues suggests that mutations in interacting residues may be more deleterious to fitness. To test this, we collected all disease-causing missense mutations occurring in our proteins. Of the 2,877 such mutations, 1,933 (67%) strike interacting residues, significantly higher than the 63% expected by chance (P < 0.00001).
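The article does not name the statistical test behind this P value. As one plausible check, the sketch below applies a one-sided normal approximation to a binomial test, using the counts reported above and the rounded 63% expectation; the choice of test and the function name are assumptions, so the result is only indicative.

```python
import math


def binomial_excess_test(successes, n, p0):
    """One-sided normal approximation to a binomial test for an excess over p0.

    Returns the z score and the upper-tail P value.
    """
    expected = n * p0
    sd = math.sqrt(n * p0 * (1.0 - p0))
    z = (successes - expected) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value


# Counts from the text: 1,933 of 2,877 mutations strike interacting residues;
# 63% is the (rounded) fraction expected by chance.
z, p = binomial_excess_test(1933, 2877, 0.63)
print(f"z = {z:.2f}, one-sided P = {p:.2e}")
```

Under these assumptions the P value falls below 0.00001, in agreement with the reported significance.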

Since the human and rat lineages diverged from their common ancestor, they have been subject to vastly different mutation rates, generation times, demographic histories, and selective regimes. It is, therefore, of interest to examine whether the manner by which 3D structural context affects evolutionary constraint is comparable between these 2 lineages. Using dog sequences as the outgroup, we assigned human–rat amino acid replacements onto either the human or the rat lineage. We then analyzed the extent to which various 3D structural contexts affected amino acid replacement rates in each lineage. We found that there is a remarkable symmetry between the 2 lineages (data not shown). Thus, the observations we made for the human and rat lineages may reflect broader patterns of protein evolution in other species.

In conclusion, our study provides compelling evidence that 3D structural context of amino acid residues in proteins exerts a strong influence on selective constraint. Although this conclusion should not come as a surprise, our study is the first to definitively demonstrate it on a genomic scale and to offer quantitative measurements on the extent to which various structural contexts affect constraint.


    Materials and Methods
As described previously (Choi et al. 2005), a set of 767 rat–human orthologs was compiled for which 3D structures were available for the human proteins, and there was at least one amino acid difference between the rat and human orthologs. Dog sequences were also obtained for these proteins. Human–rat amino acid replacements were parsed by parsimony onto either the human or the rat lineage (i.e., the identity of the human–rat ancestral residue is assumed to be the residue that occurs in 2 of the 3 species), with residues that differ in all 3 species excluded from the analysis. Structural analysis of the proteins and the classification of 3D structural contexts were as previously described (Choi et al. 2005).
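The parsimony rule described above can be sketched as follows. The function names are hypothetical, and the sketch assumes the three sequences are already aligned, of equal length, and free of gaps; alignment handling is not described in this article.

```python
from collections import Counter


def assign_lineage(human, rat, dog):
    """Assign one aligned site to a lineage using dog as the outgroup.

    Returns None if human and rat agree, "human" or "rat" for the lineage
    carrying the replacement, and "excluded" if all three residues differ.
    """
    if human == rat:
        return None          # no human-rat replacement at this site
    if dog == rat:
        return "human"       # ancestral residue = rat/dog residue
    if dog == human:
        return "rat"         # ancestral residue = human/dog residue
    return "excluded"        # ancestor cannot be inferred by parsimony


def count_by_lineage(human_seq, rat_seq, dog_seq):
    """Tally replacements per lineage over an aligned, gap-free triplet."""
    if not (len(human_seq) == len(rat_seq) == len(dog_seq)):
        raise ValueError("sequences must be aligned to the same length")
    calls = (assign_lineage(h, r, d)
             for h, r, d in zip(human_seq, rat_seq, dog_seq))
    return Counter(c for c in calls if c is not None)


# Toy aligned sequences (illustration only).
print(count_by_lineage("MKTAYL", "MRTSYV", "MKTSYI"))
```

In the toy example, site 2 is assigned to the rat lineage, site 4 to the human lineage, and site 6 is excluded because all 3 species differ.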

Ka and Ks between human and rat orthologs were calculated by the Li (1993) method. Amino acid replacement rate for a given class of residues (e.g., buried residues) was calculated as the fraction of residues in that class that differ between human and rat. We did not correct for multiple hits because they are likely rare given the low replacement rates. Error bars were generated by 1,000 bootstrap resamplings (with replacement) of all the proteins, taking the standard deviation of the resampled data. Neutral evolution was simulated by evolving the human sequence forward, allowing a random placement of mutations under the observed transition/transversion ratio until the number of simulated synonymous mutations was equal to the number of synonymous mutations observed between human and rat orthologs. This simulated sequence was then compared against the original human sequence to determine the neutral replacement rate. Grantham distances for amino acid replacements were obtained as previously described (Grantham 1974) and classified into conservative, moderate, radical, and very radical groups based on prior convention (Grantham 1974; Wyckoff et al. 2000; Choi and Lahn 2003; Choi et al. 2005).
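The bootstrap procedure for the error bars can be sketched as below: proteins are resampled with replacement, the pooled replacement rate for a residue class is recomputed for each resample, and the standard deviation of those rates is reported. The function name, the per-protein input layout, and the fixed random seed are assumptions of this sketch rather than details given in the article.

```python
import random
from statistics import stdev


def bootstrap_sd(proteins, n_resamples=1000, seed=0):
    """Standard deviation of the pooled replacement rate across resamples.

    proteins: list of (n_replaced, n_total) pairs, one per protein, for a
    single residue class. Proteins are the resampling unit.
    """
    rng = random.Random(seed)  # fixed seed only for reproducibility
    rates = []
    for _ in range(n_resamples):
        sample = rng.choices(proteins, k=len(proteins))  # with replacement
        replaced = sum(r for r, _ in sample)
        total = sum(t for _, t in sample)
        if total > 0:
            rates.append(replaced / total)
    return stdev(rates)


# Hypothetical per-protein counts for one residue class (illustration only).
buried_counts = [(1, 100), (10, 120), (5, 80), (20, 150), (3, 90)]
print(f"bootstrap SD = {bootstrap_sd(buried_counts):.4f}")
```

Resampling whole proteins rather than individual residues keeps residues from the same protein together, so the error bar reflects between-protein variation.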

Human disease-causing mutations were obtained from the Human Gene Mutation Database (http://www.hgmd.cf.ac.uk) (Stenson et al. 2003). We only considered missense mutations (i.e., mutations that result in amino acid changes) and disregarded other types of mutations such as nonsense mutations and deletions. This resulted in a total of 2,877 residues whose missense mutations are disease causing in humans.


    Supplementary Material
Supplementary figure S1 is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Footnotes
 
William Martin, Associate Editor


    References

    Bao L and Cui Y. (2005) Prediction of the phenotypic effects of non-synonymous single nucleotide polymorphisms using structural and evolutionary information. Bioinformatics 21:2185–90.

    Bustamante CD, Townsend JP, Hartl DL. (2000) Solvent accessibility and purifying selection within proteins of Escherichia coli and Salmonella enterica. Mol Biol Evol 17:301–8.

    Choi SS and Lahn BT. (2003) Adaptive evolution of MRG, a neuron-specific gene family implicated in nociception. Genome Res 13:2252–9.

    Choi SS, Li W, Lahn BT. (2005) Robust signals of coevolution of interacting residues in mammalian proteomes identified by phylogeny-aided structural analysis. Nat Genet 37:1367–71.

    Goldman N, Thorne JL, Jones DT. (1998) Assessing the impact of secondary structure and solvent accessibility on protein evolution. Genetics 149:445–58.

    Grantham R. (1974) Amino acid difference formula to help explain protein evolution. Science 185:862–4.

    Karchin R, Diekhans M, Kelly L, Thomas DJ, Pieper U, Eswar N, Haussler D, Sali A. (2005) LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources. Bioinformatics 21:2814–20.

    Li WH. (1993) Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J Mol Evol 36:96–9.

    Li WH. (1997) Molecular evolution (Sinauer Associates, Sunderland, MA).

    Mintseris J and Weng Z. (2005) Structure, function, and evolution of transient and obligate protein-protein interactions. Proc Natl Acad Sci USA 102:10930–5.

    Porto M, Roman HE, Vendruscolo M, Bastolla U. (2005) Prediction of site-specific amino acid distributions and limits of divergent evolutionary changes in protein sequences. Mol Biol Evol 22:630–8.

    Stenson PD, Ball EV, Mort M, Phillips AD, Shiel JA, Thomas NS, Abeysinghe S, Krawczak M, Cooper DN. (2003) Human Gene Mutation Database (HGMD): 2003 update. Hum Mutat 21:577–81.

    Wyckoff GJ, Wang W, Wu C-I. (2000) Rapid evolution of male reproductive genes in the descent of man. Nature 403:304–9.

Accepted for publication August 4, 2006.



