MBE Advance Access originally published online on July 24, 2006
Molecular Biology and Evolution 2006 23(10):1937-1945; doi:10.1093/molbev/msl056
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Research Article |
A Simple Statistical Method for Estimating Type-II (Cluster-Specific) Functional Divergence of Protein Sequences
Department of Genetics, Development and Cell Biology, Center for Bioinformatics and Biological Statistics, Iowa State University
E-mail: xgu{at}iastate.edu.
| Abstract |
|---|
|
|
|---|
Predicting functional amino acid residues in silico is important for comparative genomics. In this paper, we focus on the issue of how to statistically identify cluster-specific amino acid residues that are related to the functional divergence after gene duplication. We approach this problem using a framework based on site-specific shift of amino acid property (type-II functional divergence), as opposed to site-specific shift of evolutionary rate (type-I functional divergence). An efficient statistical procedure is implemented to facilitate the development of phylogenomic database for cluster-specific residues of large-scale protein families. Our method has the following features: 1) statistical testing of the type-II functional divergence and 2) the site-specific Bayesian profile to measure how amino acid residues contribute to type-II (cluster-specific) functional divergence. Consequently, one may obtain the posterior probability for "functional" cluster-specific residues. Case studies are presented and indicate that radical cluster-specific residues are responsible for most of inferred type-II functional divergence, whereas conserved cluster-specific residues appear less than even those imperfect radical cluster-specific residues to this type of functional divergence.
Key Words: functional divergence gene family evolution cluster-specific sites
| Introduction |
|---|
|
|
|---|
Under the framework of phylogenomic annotation of gene function (Eisen and Fraser 2003
Furthermore, Gu (2001)
made a distinction between 2 types of functional divergence. Type-I functional divergence results in site-specific rate shift (Gu 1999
; Knudsen and Miyamoto 2001
; Landgraf et al. 2001
; Gaucher et al. 2002
; Lopez et al. 2002). A typical case is an amino acid residue that is highly conserved in a subset of homologous genes but highly variable in a different subset of those homologous genes. Alternatively, type-II functional divergence results in the shift of cluster-specific amino acid property (Lichtarge et al. 1996
; Gu 2001
). Such divergence is exemplified by a radical shift of amino acid property, for example, positive versus negative charge differences at a homologous site that is otherwise evolutionally conserved between subtrees within a phylogeny. Note that these 2 types of functional divergence may have other names. For instance, the basic evolutionary trace approach (Lichtarge et al. 1996
; Madabushi et al. 2004
) has mainly focused on cluster-specific residues related to type-II functional divergence. Gribaldo et al. (2003)
also looked at type-II functional divergence called as "constant-but-different." Meanwhile, the weighted evolutionary trace approach proposed by Landgraf et al. (2001)
was similar to type-I functional divergence (Gu 1999
).
Many studies have been published about the statistical significance of observed patterns, which is important as the research community tends to use them to infer functional divergence of proteins (e.g., Gu 1999
, 2001
; Knudsen and Miyamoto 2001
; Landgraf et al. 2001
; Gaucher et al. 2002
). Although several methods for type-I functional divergence are available, as well as the software (e.g., Gu and Vander Velden 2002
), the implementation of statistical testing for type II has not been well resolved.
In this paper, we develop a statistical method for type-II functional divergence. In a typical case of 2 gene clusters generated by a gene duplication event, type-II functional divergence results in site-specific shift of amino acid physiochemical property after the gene duplication. In this regard, "type II is also called cluster-specific functional divergence" (Lichtarge et al. 1996
). Cluster-specific amino acid residues have been widely used in functional assays of protein families. For instance, Sun et al. (2002)
identified essential amino acid changes in paired domain evolution of the PAX gene family. Given the growing diversity and number of protein sequences available, it is now practical and necessary to develop a phylogenetic database for cluster-specific amino acid residues of protein families. To this end, we have to address 2 related statistical issues. First, are the type-II (cluster-specific) changes statistically significant? And secondly, for observed cluster-specific amino acid residues, how can we statistically measure whether they are related to type-II functional divergence. We address these issues by developing the statistical method that is suitable for large-scale data analysis.
| Theory |
|---|
|
|
|---|
Type-II Functional Divergence (cluster-specific) in the Early Stage
In principle, the evolution of protein sequences of duplicate genes can be divided into 2 stages, the early (E) stage after gene duplication and the late (L) stage (fig. 1). We assume that functional divergence between duplicate genes has occurred in the E stage, whereas in the late (L) stage, the purifying selection plays a major role to maintain related but distinct functions of 2 duplicate genes (Ohno 1970
- (i) In the E stage, an amino acid residue can be in either of 2 states: F0 (type II unrelated) and F1 (type II related). The probability of a residue being under F1 is P(F1) =
II and that being under F0 is P(F0) = 1
II. To distinguish it from the type-I functional divergence (Gu 1999
), we call
II "the coefficient of type-II functional divergence."
- (ii) In the L stage, an amino acid residue is always under the state of F0, indicating no further type-II functional divergence. Amino acid substitutions in this stage are mainly under purifying selection.
- (ii) In the L stage, an amino acid residue is always under the state of F0, indicating no further type-II functional divergence. Amino acid substitutions in this stage are mainly under purifying selection.
|
Substitution Models under F0 and F1
The pattern of amino acid substitutions during evolution, or the substitution model, relies on the states of functional divergence (F0/F1). The F0-substitution model largely reflects the conserved evolution of protein sequences, which can be empirically determined by the Dayhoff model (Dayhoff et al. 1978
Secondly, we assume that, under state F0, the transition probability for a radical, conserved, or no substitution is given by
![]() |
![]() |
![]() | (1) |
is the substitution rate, and
R (or
C) is the proportion of radical (or conserved) substitutions in the total substitutions;
R +
C = 1. Apparently, equation (1) is an extended Poisson model of protein sequence evolution. Based on the Dayhoff PAM matrix, we empirically determined that
R = 0.312 and
C = 0.688. Indeed, without any functional divergence, conserved amino acid substitutions are more likely to occur, as expected by the theory of neutral evolution (Kimura 1983
Next we consider the transition probabilities under F1 in the early stage, denoted by P(Y|F1) for Y = N, R, C. It should be noted that, according to our model (see above), an amino acid residue that has no change in the early stage is essentially unrelated to the type-II functional divergence. This argument implies P(N|F1) = 0. Similar to Eq.(1), one may choose P(R|F1) and P(C|F1) in the same forms as
and a'C(1 e
t), respectively. Together, these arguments directly lead to
![]() |
![]() |
![]() | (2) |
R), as will be shown later.
Evolutionary Link between Early and Late Stages
The evolutionary link between early and late stages depends on the status of type-II (cluster-specific) functional divergence. Let
E and
L be the evolutionary rates in the E and L stages, respectively. The statistical framework we developed is under the following assumptions:
- (i) A random variable u, called the rate component, varies among sites according to a standard gamma distribution
![]() | (3) |
describes the strength of rate variation among sites, that is, a small value of
means a strong rate heterogeneity among sites, and
=
means no rate variation among sites (Gu et al. 1995- (ii) Under F0, the evolutionary rates in the early (
E) and late (
L) stages share the same rate component u. That is,
E = c1u and
L = c2u, where c1 and c2 are constant.
- (iii) F1-amino acid substitutions in the early stage are independent of the rate component u, as indicated by equation (2). In other words, F1-amino acid substitutions have escaped from the ancestral functional constraint on the protein sequence.
- (iii) F1-amino acid substitutions in the early stage are independent of the rate component u, as indicated by equation (2). In other words, F1-amino acid substitutions have escaped from the ancestral functional constraint on the protein sequence.
Two Clusters by Gene Duplication
Consider the typical case of 2 clusters generated by a gene duplication event, each of which consists of several orthologous genes (fig. 1). Let X be the amino acid pattern of the late stage, the column (site) of the multiple alignments of the sequences. Let Y = (a, b) be the amino acid pattern of the early stage, the ancestral sequences of 2 internal nodes a and b. From the assumption (ii), the joint probability of X and Y under F0 is given by
where P(Y|F0) is determined by equation (1) for Y = N, C, or R, respectively, P(X|Y) is the likelihood of the subtrees of the 2 clusters A and B, conditional on the ancestral states a and b, which can be constructed according to the Markov-chain property under a known phylogeny (Felseinstein 1981; Gu 2001
), and the gamma distribution density for rate variation among sites
(u) is given by equation (3). Similarly, from (iii), under F1 we have
where P(Y|F1) is given by equation (2). Remembering that the probability of a site being under F1 is given by P(F1) =
II, the coefficient of type-II functional divergence, we have the joint probability for X and Y as follows:
![]() | (4) |
Direct application of equation (4) for estimating
II may face some difficulties because the amino acid pattern of early stage (Y) is unobservable. A straightforward solution is to invoke the ancestral sequence inference, for example, Yang et al. (1995)
. Treating the ancestral sequences as inferred observations, the standard procedure for the likelihood analysis of protein sequence can be applied. In spite of nice statistical properties, this approach requires a detailed description of the model and sensitive to the statistical uncertainty in ancestral sequence inference. To solve this problem, we thus propose a simple but robust method that is computationally efficient, allowing genome-wide proteomic analysis.
| A Simple Robust Method |
|---|
|
|
|---|
Poisson Model in the Late Stage
Testing type-II functional divergence between 2 gene clusters (the early stage) utilizes the within-cluster amino acid patterns to examine the conservation in the late stage. Therefore, a Poisson-based model that counts the number (k) of substitutions may be sufficient for this purpose, where smaller values of k of substitutions in a gene cluster indicate high conservation. Formally, at a given amino acid residue, the number of substitutions in each cluster (A or B) follows a Poisson process, for example, for cluster A, we have
![]() | (5) |
A (or
B) is the evolutionary rate of cluster A (or B), respectively. Hence, the earlylate joint distribution can be specified as fij, Y = P(X = (i, j), Y), where i or j is the number of substitutions in cluster A or B. Under this model, P(X|Y) = pApB, which is independent of the early stage Y. Similar to the derivation of equation (4), we have
and
Together, one can show that the earlylate distribution under the Poisson-based model is given by
![]() | (6) |
Analytical Form of the EarlyLate Distribution
First we consider the late-stage distribution, fij, the probability for i and j substitutions in clusters A and B, respectively. From equation (6), one can show that
which is a specific version of bivariate negative binomial distribution,
![]() | (7) |
/(DA + DB +
), ZA = DA/(DA + DB +
), and ZB = DB/(DA + DB +
);
and
are the total branch lengths of clusters A and B, respectively, and
is the gamma shape parameter.
Next we consider the early-stage distribution fY, the frequencies of 3 early-stage amino acid patterns for Y = N, R, or C. Because
from equation (6) one can show
![]() | (8) |
![]() | (9) |
is the branch length of the early stage.
We define W =
/(DA + DB + d +
), WA = DA/(DA + DB + d +
), and WB = DB/(DA + DB + d +
). Finally, we have shown that the joint distribution of earlylate stages, fij, Y, can be expressed as follows:
![]() |
![]() |
![]() | (10) |
is given by
![]() | (11) |
Estimation
Based on the likelihood principle, we have implemented the following algorithms to estimate unknown parameters for testing type-II functional divergence. Here we always assume that the phylogenetic tree of the gene family is known or can be reliably inferred.
Late-Stage Likelihood
The distribution of late-stage Qij is the probability of a site being i and j substitutions in the 2 clusters. As shown by equation (7), Qij depends on 3 (late-stage) parameters DA, DB, and
. We thus modified the likelihood method of Gu and Zhang (1997)
to estimate them simultaneously, denoted by
and
respectively. Note that the algorithm of Gu and Zhang (1997)
corrected the parsimony bias in counting the number of substitutions.
Likelihood for Estimating Early-Stage Parameters
Let nij, Y be the number of site with the pattern X = (i, j) and Y = N, R, or C. After treating 3 late-stage parameters as known, we develop a simple likelihood to estimate early-stage parameters
II, aR/aC, and d. From equation (10), we have fij,S = fij,R + fij,C = Qij (1
II)Mij = Qij fij,N. Let nij,S = nij,R + nij,C. Thus, the log-likelihood function,
|
|
II and d. Let
is the total number of sites that have no change in the early stage. Under the p constraint of equation (9), the maximum likelihood estimate of
II is given by
where y is the solution of
![]() | (13) |
II). (Note that Mij depends on the parameter d, whereas Qij only depends on late-stage parameters that are treated as known). The iteration can start with the initial values of d(0) = ln(1 p) until convergence. Let L be the sequence length,
and
The sampling variance of
can be calculated as follows:
![]() | (14) |
When the estimates
and
are obtained, aR can be estimated from equation (8).
The proportion of amino acid differences between the internal nodes a and b represented by p can be computed as follows. First, we use the Bayesian algorithm (Zhang and Nei 1997
) to infer the ancestral sequences of Y, which is a simplified version of Yang et al. (1995)
in which the branch lengths of the phylogenetic tree are estimated using a least square method rather than the ML method. Then, we estimate p when each site in the inferred ancestral sequence receives the assignment of amino acid with the highest posterior probability. Simulations conducted by Zhang and Nei (1997)
showed that this approach for estimating p is almost unbiased.
The U-Likelihood
This method utilizes amino acid sites that are universally conserved in both clusters, that is, i = j = 0. Let n00Y be the number of sites with Y = N (the U type), R, or C. Let n00 = n00N + n00R + n00C and f00 = f00N + f00R + f00C. Then, the log of U-likelihood can be written as
|
|
Let
Similar to above, we have shown that the ML estimates of
II and d are given by
![]() |
![]() | (16) |
The sampling variance of the estimates
is Var(
II) = f00N(1 f00N)b2/N, where
Because the U method largely relies on the universally conserved sites, it seems robust against the inaccuracy of ancestral sequence inference and sequence alignment.
Predicting Critical Amino Acid Residues: Empirical Bayesian Approach
The identification of which sites are responsible for these type-II (cluster-specific) functional differences is of great interest, if the coefficient of functional divergence (
II) between early and late stages is significantly larger than 0. Here we develop a method of predicting such sites, which indeed can be further tested by experimentation, using molecular, biochemical, or transgenic approaches.
We wish to know the probability of state F1 in the early stage at a site, that is, P(F1|X, Y). According to the Bayesian law, we have
![]() | (17) |
II. Under the Poisson-based model, P(X = (i, j), Y|F1) and P(X = (i, j), Y|F0), and P(X = (i, j), Y) are given by equations (5) and (7), respectively. Noting that aY = 0 if Y = N, one can show
![]() |
![]() |
![]() | (18) |
One may find that it is simple to use the posterior probability ratio of F1 to F0, that is, R(F1|F0) = P(F1|X, Y)/P(F0|X, Y). After some algebras, we obtain
![]() | (19) |
).
Statistical Evaluation of Cluster-Specific Sites
An important result from equation (19) is that the posterior ratio R(F1|F0) reaches its maximum if there is no amino acid substitution in each gene cluster but the amino acid is different between them, that is, i = j = 0 and Y
N. As usually observed, and assuming that the proportion of radical changes under F1 is higher than that under F0 such that aR/aC >
R/
C, we have
![]() | (20) |
Hence, a typical cluster-specific site indeed will receive a highest score for the type-II functional divergence, consistent with the intuitive biological interpretation. However, it should also be indicated that a high score could be statistically meaningless if
II is not significantly larger than 0. Finally, we note that
if
This means that greater accuracy is achieved as more sequences are analyzed (i.e., increasing DA or DB). In practice, one may use this property to determine how many sequences are sufficient to achieve the statistical resolution of site prediction.
| Software and Examples |
|---|
|
|
|---|
The newly developed method for type-II functional divergence has been implemented in the software package DIVERGE2, which is available from our Web site http://xgu.gdcb.iastate.edu. We have distribution packages for both Microsoft Windows and Linux operating systems, including manual and example files. We have conducted several case studies to demonstrate its potential applications in understanding functional divergence of protein sequences.
Data Sets
We present case studies analyzing 3 gene families. 1) The cyclooxygenase (COX) enzymes catalyze a key step in the conversion of arachidonate to PGH2, the immediate substrate for a series of cell prostaglandin and thromboxane synthases. There are 2 tissue-specific isoforms in mammals: COX1 and COX2. Molecular cloning of COX2 led to a major investment by pharmaceutical companies in the development of selective inhibitors. The sequence alignment and the phylogenetic tree are the same as in Gu (2001)
; also see figure 1C. 2) The caspase gene family is important for apoptosis (programmed cell death) and cytokine maturation, which has been studied extensively for type-I functional divergence (Wang and Gu 2001
). And 3) we have also analyzed the duplicated isoforms (Gq and GS) of G-protein alpha subunits.
Testing the Significance of Type-II Functional Divergence
As expected, all these gene families show significant type-I functional divergence. Based on the phylogenetic tree in figure 1C, we estimated that the coefficient of type-II functional divergence between COX1 and COX2 duplicate genes
II = 0.159 ± 0.036, which is statistically significant (P < 0.001). We also analyzed the duplicated isoforms (Gq and GS) of G-protein alpha subunits and found a similar pattern of type-II functional divergence (table 1). In contrast to Wang and Gu's (2001
) finding for type-I functional divergence of caspase gene family, we found no evidence for type-II functional divergence between 2 major CED-3 and ICE subfamilies. This raises the question whether the relative importance of type-I and type-II functional divergences is associated with specific functional classes of the protein family.
|
Site-Specific Profiles of Type-II (cluster-specific) Functional Divergence
For illustration, figure 2 shows the site-specific ratio profile of type-II functional divergence between COX1 and COX2. For 583 aligned sites, 492 (84%) sites have received the ratio score < 1, indicating that most sites are predicted to be unrelated to the type-II functional divergence. Moreover, we identified 28 radical cluster-specific sites that receive the highest posterior ratio score, that is, R(F1|F0)max = 7.17. In other words, if we select these radical cluster-specific sites as candidates for type-II functional divergence, the posterior probability for them is PII = 7.17/(1 + 7.17) = 87.8%, indicating that the prediction error (false-positive rate) is 12.2%. Actually, it is impressive that among 111 radical amino acid substitutions in the early stage after the gene duplication, about 29/111
26% are potentially related to type-II functional divergence between COX1 and COX2.
|
|
Effect of Radical Substitutions in the Early Stage
For the COX gene family, we found that the radical substitutions for type-II functional divergence in the early stage is about 2.7-fold increasing (aR/
R = 2.7 in table 1). Consequently, an amino acid residue with a radical change between COX1 and COX2 may have a higher score than a conserved change for being type-II functional divergence related. As shown by table 2, the sites most likely exhibiting type-II behavior are the radical cluster-specific sites, whereas the conserved cluster-specific sites are less likely, as indicated by a low posterior probability (
0.35). This case study clearly shows the important role of statistical analysis, otherwise one cannot objectively justify whether one-less radical cluster-specific site (i.e., there is one amino acid substitution in the late stage) is more likely to be functional divergence related than conserved cluster-specific site. | Discussion |
|---|
|
|
|---|
The importance of cluster-specific amino acid residues for understanding functional divergence of a protein family has been well recognized (Lichtarge et al. 1996
We first test type-II functional divergence after the gene duplication. Rejection of the null hypothesis indicates that some conserved amino acid residues have experienced radical shift of amino acid property between the duplicate genes. The site-specific profile based on the posterior probabilities is useful to select residues that are type-II functional divergence related. Apparently, radical cluster-specific sites usually receive the highest scores for type-II functional divergence. However, previous computational methods have difficulty to rank ordering the conserved cluster-specific sites or imperfect radical cluster-specific sites (e.g., with one amino acid substitution within clusters) because the score system adopted was ad hoc, subject to arbitrary. Our software was developed to overcome these potential pitfalls. For instance, in the COX gene family (table 2), we found that the imperfect radical cluster-specific sites may be more important than conserved cluster-specific sites to understand the mechanism of functional divergence for this family.
As shown in tables 2 and 3, our case studies have demonstrated the critical role of amino acid classification in predicting type-II functional divergence as it determines radical or conserved change in the early stage. There have been extensive discussions about this issue (e.g., Atchley et al. 2005
). As a starting point, the current system classifies 20 amino acids into 4 groups (charge positive, charge negative, hydrophilic, and hydrophobic) to characterize major types of radical amino acid substitutions, which may be sufficient for most protein families. However, the rationale of classification may be questionable in some cases. For instance, we categorize histidine (H) as charge positive. In fact, the isoelectric point of H is close to physiological pH (
7.5) and the pKa is
6.0. So the positive charge on H is "very" sensitive to pH changes. As a result, in some areas of the cell, H is positively charged, whereas it is not the case in other areas. Indeed, the classification of 2 groups (positive and negatively charged residues) is somewhat exaggerated as, frequently, these residues are embedded in local environment that significantly modifies the pKa and make both groups of residues protonated. Consequently, they are equivalent to serve as H-bond donor or acceptor. Another example is site 419 in table 3: this site may be incorrectly highlighted as "radical" because glycine (G) and alanine (A) are the 2 smallest amino acids, if the functional constraints that act on this site require that only small amino acids occupy this site regardless of their physiochemical properties.
|
Given these sophisticated cases, the software DIVERGE2 has the option of amino acid classification for the user. It should be emphasized here that biological interpretations of type-II site predictions are based on the specified amino acid classification. One possible improvement in the future is to adopt the hierarchical amino acid classification, for example, Murphy et al. (2000)
Because our method relies on the ancestral sequence inference, the accuracy of ancestral state estimations may cause some potential problems. It should be noted that the statistical method we developed for estimating
II and other parameters only utilized the frequencies of amino acid differences in the early stage, rather than the inferred ancestral characters. Computer simulations conducted by Zhang and Nei (1997)
indicated that the frequencies of amino acid substitutions between ancestral nodes are almost unbiased, though the inferred amino acid residues at some sites may be incorrect. Moreover, in calculating the site-specific profile for prediction, we are more interested in radical/conserved cluster-specific sites or imperfect cluster-specific sites. In these cases, ancestral sequence inference is almost 100% correct. In addition, the number of amino acid substitutions in each cluster is estimated by Gu and Zhang's (1997)
algorithm, and this approach accounts for multiple hits. Further, we have used a modified U method without invoking the ancestral sequence inference: the early-stage distance (d) between internal nodes (a and b) is approximated by the branch length estimated by the Neighbor-Joining phylogeny. In the cases of COX and G-alpha gene families, the results are very similar (not shown).
Among many statistical problems that remain to be solved in the phylogeny-based functional analysis of protein sequences, one common issue is how to model functional divergence of insertion and deletions (indels), which is important because the protein structure is more strongly affected by indels than by point substitutions. We are developing a Poisson model to integrate indel events into the framework of type-II functional divergence. This simple approach will treat indel as a 21st character, and its evolutionary rate will be indel-length (k) dependent. Gu and Li (1995)
found a power law for the size distribution of indels. Thus, one may assume the rate of k indels can be written as µk
kß, where ß
2 (Gu and Li 1995
).
To accomplish the goal of phylogenomic annotation of gene functions, we need many so-called evolutionary models of protein function to link between the biochemicalphysiological perspective and the evolutionary pattern of sequence. Type-I and type-II functional divergences, for instance, are 2 special models for this purpose. The statistical method we have described in this article can be integrated into a relational protein database for cluster-specific amino acid residues of, for example, transcription factors, to help in the better understanding of protein function.
| Acknowledgements |
|---|
|
|
|---|
The author is grateful to Jie Liang, Eric Gaucher, and Kent Vander Velden for constructive comments. This work was partially supported by the National Institutes of Health grants.
| Footnotes |
|---|
Naoko Takezaki, Associate Editor
| References |
|---|
|
|
|---|
Atchley WR, Zhao J, Fernandes A, Drueke T. 2005. Solving the sequence metric problem. Proc Natl Acad Sci 102:6395400.
Dayhoff MO, Schwartz RM, Orcutt BC. 1978. A model of evolutionary change in proteins. In: Dayhoff MO, editor. Atlas of protein sequence structure. Volume 5. Suppl 3. Washington, DC: National Biomedical Research Foundation. p 34252.
Eisen MB, Fraser CM. 2003. Phylogenomics: intersection of evolution and genomics. Science 300:17067.
Felsenstein J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:36876.[CrossRef][ISI][Medline]
Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J. 1999. Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:153145.
Gao X, Vander Velden K, Voytas DF, Gu X. 2005. SplitTester: software to identify domains responsible for functional divergence in protein family. BMC Bioinformatics 6:137.[Medline]
Gaucher EA, Gu X, Miyamoto M, Benner S. 2002. Predicting functional divergence in protein evolution by site-specific rate shifts. Trend Biochem Sci 27:31521.[CrossRef][ISI][Medline]
Golding GB, Dean AM. 1998. The structural basis of molecular adaptation. Mol Biol Evol 15:35569.[Abstract]
Gribaldo S, Casane D, Lopez P, Philippea H. 2003. Functional divergence prediction from evolutionary analysis: a case study of vertebrate hemoglobin. Mol Biol Evol 20:17549.
Gu X. 1999. Statistical methods for testing functional divergence after gene duplication. Mol Biol Evol 16:166474.[Abstract]
Gu X. 2001. Maximum likelihood approach for gene family evolution under functional divergence. Mol Biol Evol 18:45364.
Gu X. 2003. Functional divergence in protein (family) sequence evolution. Genetica 118:13341.[CrossRef][ISI][Medline]
Gu X, Fu YX, Li WH. 1995. Maximum likelihood estimation of the heterogeneity of substitution rate among nucleotide sites. Mol Biol Evol 12:54657.[Abstract]
Gu X, Li WH. 1995. The size distribution of insertions and deletions in human and rodent pseudogenes suggests the logarithmic gap penalty for sequence alignment. J Mol Evol 40:46473.[CrossRef][ISI][Medline]
Gu X, Vander Velden K. 2002. DIVERGE: phylogeny-based analysis for functional-structural divergence of a protein family. Bioinformatics 18:5001.
Gu X, Zhang J. 1997. A simple method for estimating the parameter of substitution rate variation among sites. Mol Biol Evol 14:110613.[Abstract]
Jones DT, Taylor WR, Thornton JM. 1992. The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8:27582.
Jordan K, Bishop GR, Gonzalez DS. 2001. Sequence and structural aspects of functional diversification in class I
-mannosidase evolution. Bioinformatics 17:96576.
Kimura M. 1983. The neutral theory of molecular evolution. Cambridge: Cambridge University Press.
Knudsen B, Miyamoto MM. 2001. A likelihood ratio test for evolutionary rate shifts and functional divergence among proteins. Proc Natl Acad Sci 98:145127.
Landgraf R, Xenarios I, Eisenberg D. 2001. Three-dimensional cluster analysis identifies interfaces and functional residue clusters in proteins. J Mol Biol 307:1487502.[CrossRef][ISI][Medline]
Li X, Hu C, Liang J. 2003. Simplicial edge representation of protein structures and alpha contact potential with confidence measure. Proteins 53:792805.[CrossRef][ISI][Medline]
Lichtarge O, Bourne HR, Cohen FE. 1996. An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol 257:34258.[CrossRef][ISI][Medline]
Lopez P, Forterre P, Philippe H. 1999. The root of the tree of life in the light of the covarion model. J Mol Evol 49:496508.[CrossRef][ISI][Medline]
Madabushi S, Gross AK, Philippi A, Meng EC, Wensel TG, Lichtarge O. 2004. Evolutionary trace of G protein-coupled receptors reveals clusters of residues that determine global and class-specific functions. J Biol Chem 279:812632.
Murphy L, Wallqvist A, Levy R. 2000. Simplified amino acid alphabets for protein fold recognition and implications for folding. Protein 13:14952.
Ohno S. 1970. Evolution by gene duplication. Berlin: Springer-Verlag.
Rastogi S, Liberles D. 2005. Subfunctionalization of duplicated genes as a transition state to neofunctionalization. BMC Evol Biol 5:28.[CrossRef][Medline]
Sun H, Merugu S, Gu X, Kang Y, Dickinson DP, Callaerts P, Li W-H. 2002. Identification of essential amino acid changes in paired domain evolution using a novel combination of evolutionary analysis and in vitro and in vivo studies. Mol Biol Evol 19:1490500.
Tseng YY, Liang J. 2006. Estimation of amino acid residue substitution rates at local spatial regions and application in protein function inference: a Bayesian Monte Carlo approach. Mol Biol Evol 23:42136.
Wang Y, Gu X. 2001. Predicting functional divergence of caspase gene family. Genetics 158:131120.
Yang Z, Kumar S, Nei M. 1995. A new method of inference of ancestral nucleotide and amino acid sequences. Genetics 141:164150.[Abstract]
Zhang J, Nei M. 1997. Accuracies of ancestral amino acid sequence inferred by the parsimony, likelihood, and distance method. J Mol Evol 44:S13946.
![]()
CiteULike
Connotea
Del.icio.us What's this?
This article has been cited by other articles:
![]() |
C. Hernandez-Sanchez, A. Mansilla, F. de Pablo, and R. Zardoya Evolution of the Insulin Receptor Family and Receptor Isoform Expression in Vertebrates Mol. Biol. Evol., June 1, 2008; 25(6): 1043 - 1053. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. Courville, E. Urbankova, C. Rensing, R. Chaloupka, M. Quick, and M. F. M. Cellier Solute Carrier 11 Cation Symport Requires Distinct Residues in Transmembrane Helices 1 and 6 J. Biol. Chem., April 11, 2008; 283(15): 9651 - 9658. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. Ye, G. Vriend, and A. P. IJzerman Tracing evolutionary pressure Bioinformatics, April 1, 2008; 24(7): 908 - 915. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. Matsuo Rapid Evolution of Two Odorant-Binding Protein Genes, Obp57d and Obp57e, in the Drosophila melanogaster Species Group Genetics, February 1, 2008; 178(2): 1061 - 1072. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. Ye, K. Anton Feenstra, J. Heringa, A. P. IJzerman, and E. Marchiori Multi-RELIEF: a method to recognize specificity determining residues from multiple sequence alignments using a Machine-Learning approach for feature weighting Bioinformatics, January 1, 2008; 24(1): 18 - 25. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||

































