MBE Advance Access originally published online on November 23, 2006
Molecular Biology and Evolution 2007 24(2):349-351; doi:10.1093/molbev/msl181

© The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Letter

Quaternary Structure Constraints on Evolutionary Sequence Divergence

María Silvina Fornasari*, Gustavo Parisi* and Julián Echave†

* Centro de Estudios e Investigaciones, Universidad Nacional de Quilmes, Bernal, Argentina
† Instituto Nacional de Investigaciones Fisicoquímicas Teóricas y Aplicadas, Universidad Nacional de La Plata, La Plata, Argentina

E-mail: jechave@inifta.unlp.edu.ar.


    Abstract
The structurally constrained protein evolution (SCPE) model simulates protein divergence by considering protein structure explicitly. The model is based on the observation that protein structure is more conserved during evolution than the sequences encoding that structure. In previous work, the SCPE model considered only the tertiary structure. Here we show that the performance of the model is enhanced when the oligomeric structure is taken into account. Our results agree with recent evolutionary studies of oligomeric proteins, which show that conservation of the quaternary structure imposes additional constraints on sequence divergence. The incorporation of protein–protein interactions into protein evolution models may be important in the study of quaternary protein structures and complex protein assemblies.

Key Words: SCPE • quaternary structure • sequence divergence

A major constraint in protein sequence divergence is the conservation of protein structure. This constraint is related to the selective pressure involved in the conservation of protein cores, which involve a complex network of interresidue noncovalent interactions (Russell and Barton 1994; Sali and Overington 1994). This network of interactions is among the predominant factors involved in the protein-folding process to obtain a stable fold (Lim and Sauer 1991; Lattman et al. 1994; Xu et al. 1998; Vendruscolo et al. 2000). These observations explain the fact that the amino acid substitution pattern for a given site depends on its structural environment and also that residue substitutions in interacting sites are correlated (Overington et al. 1990; Overington 1992; Pollock and Taylor 1997).

Noncovalent interactions between amino acid side chains are important for the correct assembly of folded chains into multichain proteins. The protein–protein interactions in these complex proteins can be permanent or transient. In the first case, proteins exist only in their complexed form, which is usually very stable. On the other hand, transient complexes associate and dissociate in vivo according to the environment or to the presence of external factors and involve proteins that also exist as independent entities (Jones and Thornton 1996; Nooren and Thornton 2003). The emerging picture suggests that the residues involved in these protein–protein interactions are evolutionarily constrained because of the selective pressure to conserve the structure of the complex to ensure the conservation of biological activity (Ofran and Rost 2003; Caffrey et al. 2004; Halperin et al. 2004; Li et al. 2004; Mintseris and Weng 2005). It was also found that for close homologues (30–40% or higher sequence identity), the protein–protein interactions are invariably the same (Aloy et al. 2003).

To study how protein structure modulates sequence divergence, we developed the structurally constrained protein evolution (SCPE) model (Parisi and Echave 2001). The SCPE model simulates sequence divergence constrained by conservation of protein structure. Recently, we successfully applied the SCPE model to representatives of the 4 main classes of protein fold (alpha, beta, alpha + beta, and alpha/beta) (Parisi and Echave 2005). Using substitution matrices derived from SCPE simulations (Fornasari et al. 2002), we found that the SCPE model outperforms site-independent models such as JTT (Jones et al. 1992). In all these studies, the SCPE model considered only the tertiary structure of the protein. Here we extend the model to include protein–protein interactions and show that the performance of the model improves.

We briefly describe the algorithm of the SCPE model (a more detailed description can be found elsewhere [Parisi and Echave 2001; Fornasari et al. 2002; Parisi and Echave 2005]). In SCPE simulations, trial sequences are generated by introducing a random mutation in a reference sequence, which at the beginning of the simulation is equal to a sequence of known structure. Mutations are introduced using an amino acid mutational rate matrix Qmut derived using the HKY model of DNA evolution (Hasegawa et al. 1985) and the universal genetic code (see below). For each trial, a score Formula, which measures the structural perturbation, is calculated, where EFormula and EFormula are the mean-field energies of the reference and trial sequences. Mean-field energies are calculated using a contact map representation of the protein structure and an empirical contact potential (Berrera et al. 2003). Trial sequences are then accepted or rejected using an acceptance probability function (P) to generate in each round a new reference sequence,

Formula
where λ is the only parameter of the SCPE model that must be fit to the data for each homologous set; it is related to the degree of selection pressure for structural conservation.
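The accept/reject loop described above can be sketched as follows. This is only an illustration under several assumptions, because the score and acceptance formulas are not recoverable from this text: the score is taken to be a root-mean-square difference of per-site contact energies, the acceptance probability is taken to be exp(-λ · score), the contact potential is a random symmetric table rather than the empirical potential of Berrera et al., and mutations are proposed uniformly rather than through Qmut. All function names are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_toy_potential(rng):
    """Random symmetric contact-energy table (placeholder, not an empirical potential)."""
    pot = {}
    for i, a in enumerate(AMINO_ACIDS):
        for b in AMINO_ACIDS[i:]:
            e = rng.uniform(-1.0, 1.0)
            pot[(a, b)] = e
            pot[(b, a)] = e
    return pot


def site_energies(seq, contacts, pot):
    """Per-site energy: sum of contact energies over the contacts of each site."""
    energies = [0.0] * len(seq)
    for i, j in contacts:
        e = pot[(seq[i], seq[j])]
        energies[i] += e
        energies[j] += e
    return energies


def perturbation_score(ref, trial, contacts, pot):
    """Assumed score: RMS difference of per-site energies (illustrative form only)."""
    e_ref = site_energies(ref, contacts, pot)
    e_trial = site_energies(trial, contacts, pot)
    sq = sum((a - b) ** 2 for a, b in zip(e_ref, e_trial))
    return math.sqrt(sq / len(ref))


def simulate(start, contacts, pot, lam, n_trials, rng):
    """Run accept/reject rounds; return the final sequence and the accepted steps."""
    ref = list(start)
    accepted = []
    for _ in range(n_trials):
        site = rng.randrange(len(ref))
        new = rng.choice([a for a in AMINO_ACIDS if a != ref[site]])
        trial = ref.copy()
        trial[site] = new
        score = perturbation_score(ref, trial, contacts, pot)
        # Assumed acceptance rule: P = exp(-lam * score), a placeholder form.
        if rng.random() < math.exp(-lam * score):
            accepted.append((site, ref[site], new))
            ref = trial  # the accepted trial becomes the new reference
    return "".join(ref), accepted


# Toy usage: a 6-residue chain with a hand-written contact map.
rng = random.Random(0)
pot = make_toy_potential(rng)
contacts = [(0, 3), (1, 4), (2, 5), (0, 5)]
final, steps = simulate("ACDEFG", contacts, pot, lam=2.0, n_trials=200, rng=rng)
```

In this sketch a larger λ rejects more of the perturbing mutations, which mirrors the role of λ as a measure of selection pressure for structural conservation; with λ = 0 every trial is accepted.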

Site-specific substitution matrices are derived from a matrix of counts for each site: for $i \neq j$, $N^{p}_{ij}$ is half the number of mutational steps that result in either $i \to j$ or $j \to i$ amino acid replacements at site $p$, and $N^{p}_{ii}$ is the number of mutational steps for which amino acid $i$ remains constant. Then, the substitution rate matrix Qp is obtained using

Formula
and

Formula

Finally, to avoid numerical problems, each Qp is recalculated using pseudocounts as described previously (Parisi and Echave 2005).
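The count matrix for one site can be tallied as in the following sketch. It assumes that the simulation is available as a list of successive reference sequences (a hypothetical `trajectory`), one per mutational step; the conversion from counts to the rate matrix and the pseudocount correction are not shown, since their formulas are not given in this text.

```python
from collections import defaultdict


def site_counts(trajectory, site):
    """Symmetric replacement counts for one site from consecutive sequences."""
    counts = defaultdict(float)
    for before, after in zip(trajectory, trajectory[1:]):
        a, b = before[site], after[site]
        if a == b:
            counts[(a, a)] += 1.0  # amino acid a remained constant in this step
        else:
            counts[(a, b)] += 0.5  # each replacement adds one half to (a, b)
            counts[(b, a)] += 0.5  # and one half to (b, a)
    return dict(counts)


# Toy trajectory of 3-residue sequences (4 mutational steps).
trajectory = ["ACD", "ACD", "GCD", "GCE", "ACE"]
counts_site0 = site_counts(trajectory, 0)
counts_site1 = site_counts(trajectory, 1)
```

Splitting each replacement equally between the two directions makes the off-diagonal count equal to half the number of steps in either direction, as in the definition above.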

In this paper, a series of position-specific Qp matrices was calculated using 2 alternative models: one considers only the tertiary structure of a protein (SCPEt) and the other considers its quaternary structure (SCPEq). Seven homooligomeric protein families were used as test systems for model comparisons. These families adopt different quaternary structures and also belong to different fold classes (table 1). As described previously (Parisi and Echave 2004, 2005), for each family a set of homologous DNA sequences was collected and a maximum parsimony (MP) topology was inferred using DNAPARS (DNA parsimony program) (Felsenstein 1993). These sequences were aligned, and the Hasegawa-Kishino-Yano (HKY) parameters were estimated using HYPHY (hypothesis testing using phylogenies) (Pond et al. 2005). This alignment was translated using the universal genetic code to obtain a protein alignment. The alignment length was adjusted to fit the reference sequence length. With this protein alignment, an MP topology was obtained using PROTPARS (Felsenstein 1993). This protein alignment and the derived MP topology were used to evaluate the likelihood of the models, as described below.


Table 1. Protein Families Studied

 
For each test system, we used SCPEt and SCPEq simulations to obtain a set of Qp over a grid of λ values. Using maximum likelihood (ML) calculations, we obtained the ML for each λ in the grid. Then, both models were compared using parametric bootstrapping with a likelihood ratio test statistic (see Goldman 1993). In this evaluation, the best λ for each model was used; because the structural representation of the protein is not the same in SCPEt and SCPEq, the best λ values for the 2 models need not be the same. All the ML optimizations were performed independently for the different sites of the protein using the program HYPHY (Pond et al. 2005), with the protein alignment and the MP topology for each family described above.

For each representative set of sequences, the statistic $2\delta_{\mathrm{data}} = 2(\ln \mathrm{ML}_{\mathrm{SCPEq}} - \ln \mathrm{ML}_{\mathrm{SCPEt}})$ was calculated, using SCPEt as the null hypothesis and SCPEq as the alternative. To assess the significance of $2\delta_{\mathrm{data}}$, we simulated 300 data sets (parametric bootstrapping) under the null hypothesis to obtain the $2\delta$ reference distribution. Then, the significance of $2\delta_{\mathrm{data}}$ was evaluated by calculating a Z score as follows:

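$$Z = \frac{2\delta_{\mathrm{data}} - \langle 2\delta \rangle}{\sqrt{\langle (2\delta)^{2} \rangle - \langle 2\delta \rangle^{2}}}$$
where the averages are taken over the $2\delta$ distribution obtained by parametric bootstrapping.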
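As a small worked illustration, the Z score can be computed as below, assuming the usual definition (observed value minus the mean of the bootstrap distribution, divided by its standard deviation). The null values here are random toy numbers standing in for the 300 bootstrap replicates, not output of any real likelihood calculation, and `z_score` is a hypothetical helper name.

```python
import math
import random


def z_score(observed, null_values):
    """Z score of an observed statistic against a simulated null distribution."""
    n = len(null_values)
    mean = sum(null_values) / n
    # Population variance: average of squares minus square of the average.
    var = sum(v * v for v in null_values) / n - mean * mean
    if var <= 0.0:
        raise ValueError("null distribution has no spread")
    return (observed - mean) / math.sqrt(var)


# Toy stand-in for 300 parametric-bootstrap replicates of the statistic.
rng = random.Random(0)
null_distribution = [rng.gauss(0.0, 2.0) for _ in range(300)]
z = z_score(15.0, null_distribution)
```

A large positive Z indicates that the observed statistic lies far in the upper tail of the distribution expected under the null model.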

In the first column of table 2, we show the results of the comparison of both models using the likelihood ratio test. The SCPEq model outperforms the SCPEt model in all cases with high statistical significance ($P < 10^{-2}$). The main difference between the 2 models is that, because SCPEq takes into account the interactions between the chains in the oligomeric structure, the constraints imposed on the positions involved in intermonomer interactions differ from those of the SCPEt model, which does not consider these interactions. These positions, called quaternary positions (QP), are detected by the difference in the total number of contacts per position between the contact matrices obtained using the quaternary structure (SCPEq) or just the tertiary structure (SCPEt). Positions with the same number of contacts in the 2 models are called tertiary positions (TP). To study the reason for the enhanced performance of SCPEq over SCPEt, we studied the $2\delta_{\mathrm{data}}$ distributions for QP and TP. Using the Kolmogorov–Smirnov test, we found that these distributions are significantly different for all the protein families considered. This test was chosen because it has the advantage of making no assumption about the distribution of the data. Moreover, the average of $2\delta_{\mathrm{data}}$ for QP shows a bias toward positive values, whereas the corresponding TP averages are approximately centered around $2\delta_{\mathrm{data}} = 0$, as can be seen in table 2. We should note, however, that in 1 family (4-oxalocrotonate tautomerase, see table 2), the average $2\delta_{\mathrm{data}}$ for TP departs from zero more than would be expected, probably indicating that the TPs could be influenced by contacts with quaternary sites. This hypothesis requires additional study, which will be addressed in the future.
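The QP/TP classification described above can be sketched as follows, assuming that contacts are given as index pairs: `intra_contacts` within one chain and `inter_contacts` as (site on the reference chain, site on a neighboring chain). These names, the toy contact lists, and the per-site values are hypothetical; the Kolmogorov–Smirnov comparison itself is not shown.

```python
def classify_positions(n_sites, intra_contacts, inter_contacts):
    """Label a site 'QP' if it has more contacts in the oligomer, else 'TP'."""
    tertiary = [0] * n_sites
    for i, j in intra_contacts:
        tertiary[i] += 1
        tertiary[j] += 1
    quaternary = list(tertiary)
    for i, _neighbor_site in inter_contacts:
        quaternary[i] += 1  # extra contact made with another chain
    return ["QP" if q != t else "TP" for q, t in zip(quaternary, tertiary)]


def group_means(labels, per_site_values):
    """Average a per-site statistic separately over QP and TP sites."""
    groups = {"QP": [], "TP": []}
    for label, value in zip(labels, per_site_values):
        groups[label].append(value)
    return {k: sum(v) / len(v) for k, v in groups.items() if v}


# Toy example: 5 sites, with sites 0 and 4 touching a neighboring chain.
intra = [(0, 2), (1, 3), (2, 4)]
inter = [(0, 4), (4, 0)]
labels = classify_positions(5, intra, inter)
means = group_means(labels, [3.0, 0.5, -0.5, 0.0, 5.0])
```

With per-site values of the likelihood ratio statistic in place of the toy numbers, the two group averages correspond to the QP and TP averages discussed in the text.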


Table 2. Statistical analysis of QP and TP distributions

 
In summary, we have shown that the SCPEq model significantly outperforms SCPEt and that this improvement rests on the better modeling of QP when the quaternary structure is considered. This improvement shows the importance of the conservation of quaternary structure as one of the factors constraining sequence divergence during evolution. Thus, the oligomeric state of a protein should be taken into account to improve the quality of evolutionary models. Taking these constraints into account will improve our understanding of the forces involved in the formation and evolution of protein complexes.


    Acknowledgements
We thank Jeff Thorne and an anonymous reviewer for their useful remarks, which resulted in an improved version of the manuscript. This work was partially supported by grants from Universidad Nacional de Quilmes, Agencia Nacional de Promoción Científica y Tecnológica, and Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET). G.P. and J.E. are members of CONICET.


    Footnotes
 
Spencer V. Muse, Associate Editor


    References

    Aloy P, Ceulemans H, Stark A, Russell RB. (2003) The relationship between sequence and interaction divergence in proteins. J Mol Biol 332:989–998.

    Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. (2000) The Protein Data Bank. Nucleic Acids Res 28:235–242.

    Berrera M, Molinari H, Fogolari F. (2003) Amino acid empirical contact energy definitions for fold recognition in the space of contact maps. BMC Bioinformatics 4:8.

    Caffrey DR, Somaroo S, Hughes JD, Mintseris J, Huang ES. (2004) Are protein-protein interfaces more conserved in sequence than the rest of the protein surface? Protein Sci 13:190–202.

    Felsenstein J. (1993) PHYLIP (phylogeny inference package). Version 3.5c. Distributed by the author. Department of Genetics, University of Washington, Seattle (WA).

    Fornasari MS, Parisi G, Echave J. (2002) Site-specific amino acid replacement matrices from structurally constrained protein evolution simulations. Mol Biol Evol 19:352–356.

    Goldman N. (1993) Simple diagnostic statistical tests of models for DNA substitution. J Mol Evol 37:650–661.

    Halperin I, Wolfson H, Nussinov R. (2004) Protein-protein interactions; coupling of structurally conserved residues and of hot spots across interfaces. Implications for docking. Structure 12:1027–1038.

    Hasegawa M, Kishino H, Yano T. (1985) Dating of the human–ape splitting by molecular clock of mitochondrial DNA. J Mol Evol 22:160–174.

    Jones DT, Taylor WR, Thornton JM. (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8:275–282.

    Jones S and Thornton JM. (1996) Principles of protein-protein interactions. Proc Natl Acad Sci USA 93:13–20.

    Lattman EE, Fiebig KM, Dill KA. (1994) Modeling compact denatured states of proteins. Biochemistry 33:6158–6166.

    Li X, Keskin O, Ma B, Nussinov R, Liang J. (2004) Protein-protein interactions: hot spots and structurally conserved residues often locate in complemented pockets that pre-organized in the unbound states: implications for docking. J Mol Biol 344:781–795.

    Lim WA and Sauer RT. (1991) The role of internal packing interactions in determining the structure and stability of a protein. J Mol Biol 219:359–376.

    Mintseris J and Weng Z. (2005) Structure, function, and evolution of transient and obligate protein-protein interactions. Proc Natl Acad Sci USA 102:10930–10935.

    Murzin AG, Brenner SE, Hubbard T, Chothia C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247:536–540.

    Nooren IM and Thornton JM. (2003) Diversity of protein-protein interactions. EMBO J 22:3486–3492.

    Ofran Y and Rost B. (2003) Analysing six types of protein-protein interfaces. J Mol Biol 325:377–387.

    Overington J. (1992) Structural constraints on residue substitution. Genet Eng 14:231–249.

    Overington J, Johnson MS, Sali A, Blundell TL. (1990) Tertiary structural constraints on protein evolutionary diversity: templates, key residues and structure prediction. Proc Biol Sci 241:132–145.

    Parisi G and Echave J. (2001) Structural constraints and emergence of sequence patterns in protein evolution. Mol Biol Evol 18:750–756.

    Parisi G and Echave J. (2004) The structurally constrained protein evolution model accounts for sequence patterns of the LbetaH superfamily. BMC Evol Biol 4:41.

    Parisi G and Echave J. (2005) Generality of the structurally constrained protein evolution model: assessment on representatives of the four main fold classes. Gene 345:45–53.

    Pollock DD and Taylor WR. (1997) Effectiveness of correlation analysis in identifying protein residues undergoing correlated evolution. Protein Eng 10:647–657.

    Pond SL, Frost SD, Muse SV. (2005) HyPhy: hypothesis testing using phylogenies. Bioinformatics 21:676–679.

    Russell RB and Barton GJ. (1994) Structural features can be unconserved in proteins with similar folds. An analysis of side-chain to side-chain contacts secondary structure and accessibility. J Mol Biol 244:332–350.

    Sali A and Overington JP. (1994) Derivation of rules for comparative protein modeling from a database of protein structure alignments. Protein Sci 3:1582–1596.

    Vendruscolo M, Mirny LA, Shakhnovich EI, Domany E. (2000) Comparison of two optimization methods to derive energy parameters for protein folding: perceptron and Z score. Proteins 41:192–201.

    Xu J, Baase WA, Baldwin E, Matthews BW. (1998) The response of T4 lysozyme to large-to-small substitutions within the core and its relation to the hydrophobic effect. Protein Sci 7:158–177.

Accepted for publication November 8, 2006.

