MBE Advance Access originally published online on August 4, 2006
Molecular Biology and Evolution 2006 23(11):2039-2048; doi:10.1093/molbev/msl081
Research Articles
Protein Evolution Is Faster Outside the Cell


* Division of Matrix Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
Stockholm Bioinformatics Center, SCFAB, Stockholm University, Stockholm, Sweden
Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, Lyngby, Denmark
E-mail: karinjul{at}sbc.su.se.
Abstract
Some proteins are highly conserved across all species, whereas others diverge significantly even between closely related species. Attempts have been made to correlate the rate of protein evolution to amino acid composition, protein dispensability, and the number of protein-protein interactions, but in all cases, conflicting studies have shown that the theories are hard to confirm experimentally. The only correlation that is undisputed so far is that highly/broadly expressed proteins seem to evolve at a lower rate. Consequently, it has been suggested that correlations between evolution rate and factors like protein dispensability or the number of protein-protein interactions could be just secondary effects due to differences in expression. The purpose of this study was to analyze mammalian proteins/genes with known subcellular location for variations in evolution rates. We show that proteins that are exported (extracellular proteins) evolve faster than proteins that reside inside the cell (intracellular proteins). We find weak, but significant, correlations between evolution rates and expression levels, percentage of tissues in which the proteins are expressed (expression broadness), and the number of protein interaction partners. More important, we show that the observed difference in evolution rate between extra- and intracellular proteins is largely independent of expression levels, expression broadness, and the number of protein-protein interactions. We also find that the difference is not caused by an overrepresentation of immunological proteins or disulfide bridge-containing proteins in the extracellular data set. We conclude that the subcellular location of a mammalian protein has a larger effect on its evolution rate than any of the other factors studied in this paper, including expression levels/patterns.
We observe a difference in evolution rates between extracellular and intracellular proteins for a yeast data set as well and again show that it is completely independent of expression levels.
Key Words: evolution rate, subcellular localization, expression level, protein connectivity, immunological protein, disulfide bridge
Introduction
Some proteins are highly conserved across all species, whereas others diverge significantly even between closely related species (Li and Graur 1991).
In a previous study, we examined mammalian mucin-type O-glycosylation sites aiming, among other things, to determine whether glycosylated serines and threonines are more likely to be evolutionarily conserved than other serines and threonines (Julenius et al. 2005). As a negative data set, we tried using serines and threonines present in nuclear proteins. Because mucin-type O-glycosylation is only found in extracellular proteins or extracellular domains of membrane proteins, we could be certain that the nuclear serines and threonines would not be glycosylated. During the collection of these data, it quickly became apparent that the serines and threonines of nuclear proteins were in fact much more conserved than the serines and threonines of the O-glycosylated proteins (regardless of whether or not they were glycosylated). In this serendipitous manner we became interested in investigating the rate of protein evolution in different subcellular compartments. Since then, others have found similar indications (Winter et al. 2004; Aris-Brosou 2005), although no one has yet made this the main topic of a study.
The aim of this study was to investigate whether there was a significant difference in the evolution rate of proteins with different subcellular localizations. We found for a mammalian data set that extracellular proteins evolve faster than intracellular proteins. We also found that the differences in evolution rates between extra- and intracellular proteins are independent of expression levels, number of protein interaction partners, presence of disulfide bridges, and whether or not the protein is involved in the immune response. We observe a difference in evolution rates between extra- and intracellular proteins for a yeast data set as well and show that it is again completely independent of expression levels.
Materials and Methods
Amino Acid Sequence Analysis
Mammalian proteins were extracted from Swiss-Prot, release 43 (Boeckmann et al. 2003).
For each alignment, the sequence diversity was quantified using Nei's sequence diversity measure, π (Nei and Li 1979). π is defined as the average fraction of differing sites between all possible pairwise comparisons of the sequences; a sequence diversity of 0.05 in a population of sequences thus means that on average any pair of sequences will be different at 5% of their sites. Signal peptides and propeptides (according to Swiss-Prot annotation) were not included in the analysis. The amino acid diversities of the extracellular, transmembrane, and cytosolic segments were calculated individually for transmembrane proteins. Information about membrane topology was extracted from Swiss-Prot annotation, although this was not experimentally verified for all entries.
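As an illustration of the diversity measure defined above, the following Python sketch computes the average pairwise fraction of differing sites for a single alignment. The function names and the handling of gap characters are assumptions made for this example; they are not taken from the software used in the study.

```python
from itertools import combinations


def pairwise_difference(seq_a, seq_b):
    """Fraction of aligned sites at which two sequences differ.

    Assumption: columns where either sequence has a gap ('-') are skipped;
    the text does not state how gaps were treated.
    """
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


def sequence_diversity(alignment):
    """Average pairwise difference over all pairs of aligned sequences."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two aligned sequences")
    return sum(pairwise_difference(a, b) for a, b in pairs) / len(pairs)
```

With three aligned sequences of ten residues, where the pairs differ at one, one, and two sites, the function returns the mean of 0.1, 0.1, and 0.2.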
Amino acid-specific measures of sequence conservation were also calculated. For each aligned position, the frequency of the most prevalent amino acid residue was determined. Sequence conservation for this individual amino acid residue was defined as the fraction of sequences with identical amino acid residue in the aligned position. For each protein alignment (and subcellular location in the case of membrane proteins), we calculated average degrees of conservation for the types of amino acid residues present in the sequence. Again, signal peptides and propeptides were excluded. For each of the subcellular categories, these averages were used to calculate an overall average as well as the standard deviation (SD). Assuming normal distribution, the SD was used to estimate a 95% confidence interval for the true overall average.
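The per-residue conservation measure can be sketched in Python as follows. This is a minimal example under stated assumptions: columns whose most prevalent character is a gap are skipped, ties are resolved by order of first appearance, and the confidence interval is taken as the mean plus or minus 1.96 standard errors (SD divided by the square root of the sample size). None of these details is specified in the text, and the function names are hypothetical.

```python
import math
import statistics
from collections import Counter, defaultdict


def residue_conservation(alignment):
    """Average conservation per residue type for one alignment.

    For each column, the most prevalent residue is found and the fraction
    of sequences carrying it is recorded; fractions are then averaged per
    residue type. Gap-majority columns are skipped (assumption).
    """
    per_type = defaultdict(list)
    n_seqs = len(alignment)
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        if residue == "-":
            continue
        per_type[residue].append(count / n_seqs)
    return {res: sum(vals) / len(vals) for res, vals in per_type.items()}


def mean_ci95(values):
    """95% confidence interval for the mean, assuming normality.

    Assumption: interval = mean +/- 1.96 * SD / sqrt(n).
    """
    m = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return m - half_width, m + half_width
```

In a category-level analysis, `residue_conservation` would be run once per alignment and `mean_ci95` applied to the resulting per-alignment averages for each residue type.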
DNA Sequence Analysis
DNA sequences were identified through cross-references to GenBank (Benson et al. 2004) in the Swiss-Prot entries. Only DNA sequences that were consistent with the corresponding amino acid sequence were accepted, leading to a slight reduction in data set size (table 1). The DNA sequences were aligned according to the existing protein alignment using RevTrans (Wernersson and Pedersen 2003), and neighbor-joining trees were created using ClustalW (Thompson et al. 1994). For membrane proteins, DNA sequences were divided into extracellular, transmembrane, and cytosolic fragments. The fragments were concatenated with fragments from the same cellular compartment and protein. Only membrane protein sequences of total length 80 aa or more after concatenation were accepted for further analysis. One dN/dS ratio per alignment was estimated using codeml in the phylogenetic analysis by maximum likelihood (PAML) software package (Yang 1997) (model M0).
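The fragment handling for membrane proteins can be illustrated with a short Python sketch. The input format (0-based, half-open amino acid coordinates with a compartment label) and the function names are hypothetical, and the 80 aa threshold is applied here to each concatenated compartment sequence, which is one possible reading of the text.

```python
MIN_LENGTH_AA = 80  # threshold stated in the text


def concatenate_by_compartment(dna, segments):
    """Join codon-aligned DNA fragments that share a compartment.

    dna      : coding sequence, three nucleotides per residue
    segments : (start, end, compartment) tuples in 0-based, half-open
               amino acid coordinates (hypothetical input format)
    """
    parts = {}
    for start, end, compartment in sorted(segments):
        fragment = dna[3 * start:3 * end]
        parts[compartment] = parts.get(compartment, "") + fragment
    return parts


def long_enough(parts, min_aa=MIN_LENGTH_AA):
    """Keep only concatenated fragments coding for at least min_aa residues."""
    return {c: s for c, s in parts.items() if len(s) >= 3 * min_aa}
```

The retained concatenated sequences would then be passed, per compartment, to the dN/dS estimation step.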
Permutation Test
We used permutation tests to investigate whether the means of two dN/dS distributions were significantly different. The aim is to compare the observed distributions with what could occur by chance. For the sake of clarity, we will call the distribution with the lower mean A and the distribution with the higher mean B. A and B were merged, and a new distribution with the same size as B was constructed by randomly drawing from the combined distribution. We repeated the drawing process 10,000 times, and if the mean of the randomly drawn distribution was equal to or larger than the mean of B fewer than 100 times out of the 10,000, the means of A and B were found to be significantly different with P < 0.01. Only one tail needs to be checked: the mean of all values (A + B) is a constant and remains the same for all permutations, so if the mean of a random distribution of size B is smaller than the mean of B, it follows that the mean of the remaining distribution must be larger than the mean of A.
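The procedure described above can be sketched in Python as follows. The function name, the `seed` parameter, and the choice to draw without replacement are assumptions of this example; drawing without replacement is consistent with the remark that the remaining values form the complementary distribution.

```python
import random


def permutation_test(a, b, n_draws=10_000, seed=None):
    """One-sided permutation test on the difference in means.

    Returns the fraction of random draws whose mean is equal to or larger
    than the mean of the distribution with the higher mean (B).
    """
    rng = random.Random(seed)
    # Label the distribution with the higher mean as B.
    if sum(a) / len(a) > sum(b) / len(b):
        a, b = b, a
    pooled = list(a) + list(b)
    observed = sum(b) / len(b)
    hits = 0
    for _ in range(n_draws):
        # Draw a sample of the same size as B from the merged data,
        # without replacement (assumption).
        sample = rng.sample(pooled, len(b))
        if sum(sample) / len(sample) >= observed:
            hits += 1
    return hits / n_draws
```

Under the criterion in the text, fewer than 100 hits in 10,000 draws corresponds to a returned value below 0.01.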
Expression Data Analysis
Expression data for the human and/or mouse genes were collected from GNF SymAtlas v 0.8.0 (Su et al. 2002). From all expression levels in different tissues, for different DNA probes and, if available, also different organisms (mouse and human), one mean was calculated and used as the expression level of that particular protein. The broadness of tissue expression was estimated by calculating the fraction of tissues in which the genes are positively expressed for every DNA probe individually and averaging over the probes of that protein. Positive expression in a tissue was defined as those cases where a gene displayed at least 20% of its maximum expression and at the same time had an absolute expression of at least 100.
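A minimal Python sketch of the two expression measures is given below. It assumes that each probe is represented as a list of per-tissue expression values, that the maximum is taken per probe, and that the overall expression level is the mean over all pooled values; the function names are hypothetical.

```python
def probe_broadness(levels, rel_cutoff=0.20, abs_cutoff=100.0):
    """Fraction of tissues in which one probe is positively expressed.

    A tissue counts as positive when the value is at least rel_cutoff
    times the probe's maximum and at least abs_cutoff in absolute terms.
    """
    peak = max(levels)
    positive = [x for x in levels if x >= rel_cutoff * peak and x >= abs_cutoff]
    return len(positive) / len(levels)


def expression_broadness(probes):
    """Average the per-probe broadness over all probes of a protein."""
    return sum(probe_broadness(p) for p in probes) / len(probes)


def expression_level(probes):
    """Single mean over all tissues and probes (pooled mean; assumption)."""
    values = [x for probe in probes for x in probe]
    return sum(values) / len(values)
```

For a protein with two probes measured in four tissues each, the broadness is the mean of the two per-probe fractions.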
Analysis of Number of Protein-Protein Interactions
In order to determine the number of protein-protein interactions for members of our data sets, we used a comprehensive database constructed in connection with another current project (Kasper Lage et al., unpublished data). Briefly, this database has been made by pooling human interaction data from a number of the largest databases and then increasing coverage by transferring data from model organisms. All interactions in the database have been assessed for trustworthiness using a score that relies on network topology and furthermore takes into account that interactions from large-scale experiments generally contain more false positives than the interactions from small-scale experiments and that interactions are more reliable if they have been reproduced in more than one independent interaction experiment.
Immunological Proteins
To investigate the contribution of immunological proteins involved in recognizing pathogens to the evolution rate of extracellular proteins, 225 Gene Ontology terms indicating possible involvement in immunological processes and pathogen binding were identified. The Swiss-Prot entries of all orthologs of every extracellular protein were searched for the presence of these terms and any one hit identified that particular protein as immunological. A total of 95 proteins for the amino acid sequence analysis and 90 proteins for the DNA sequence analysis were identified using this method.
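The flagging rule described above amounts to a set intersection over all orthologs of a protein. The Python sketch below assumes that the Gene Ontology annotations of each ortholog are available as a set of term identifiers; the function name is hypothetical, and the list of 225 terms used in the study is not reproduced here.

```python
def is_immunological(ortholog_go_terms, immune_terms):
    """Flag a protein if any ortholog carries any immune-related term.

    ortholog_go_terms : iterable of sets, one set of GO identifiers per
                        ortholog (hypothetical input format)
    immune_terms      : set of GO identifiers regarded as immune-related
    """
    return any(terms & immune_terms for terms in ortholog_go_terms)


# Illustrative call; GO:0006955 (immune response) stands in for the
# study's term list, which is not given in the text.
example_immune_terms = {"GO:0006955"}
example_orthologs = [{"GO:0005576"}, {"GO:0005576", "GO:0006955"}]
flagged = is_immunological(example_orthologs, example_immune_terms)
```

A single hit in any ortholog is sufficient, matching the statement that any one hit identified the protein as immunological.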
Yeast Data Set
Evolution rates (estimated by comparison to Saccharomyces bayanus, Saccharomyces mikatae, and Saccharomyces paradoxus) (Wall et al. 2005), gene dispensability data (Deutschbauer et al. 2005), and expression levels (Holstege et al. 1998) for Saccharomyces cerevisiae were downloaded from the electronic supplement of Drummond et al. (2005). As measures of gene dispensability, we used both average growth rates of the homozygous deletion strains and whether the gene was essential or not. Lists of yeast genes with the subcellular localizations "nucleus," "cytoplasm," "extracellular," and "cell wall" were downloaded from the Comprehensive Yeast Genome Database (Guldener et al. 2005). Because exported proteins were sparse, additional extracellular and cell wall proteins were identified using Saccharomyces Genome Database (Christie et al. 2004).
Results
Analysis of Diversity in Different Subcellular Compartments
From Swiss-Prot, we collected a total of 2,723 mammalian proteins where the subcellular location was known. Specifically, the following location categories were included: nuclear, cytoplasmic, transmembrane, and extracellular (table 1). The 2,723 groups of orthologs consisted of between 2 and 40 proteins each (median group size: 3; mean: 3.2). For every single group of orthologs, a multiple alignment was constructed, and Nei's sequence diversity
(Nei and Li 1979
For globular proteins we observe that the average diversity varies between the different subcellular compartments in the following order: nuclear < cytoplasmic < extracellular (fig. 1A). Interestingly, a similar trend is seen for different parts of transmembrane proteins where diversity is low in transmembrane regions, higher for cytosolic regions, and highest for extracellular regions (fig. 1A). The mean diversities for the above-mentioned categories are significantly different in all cases (table 2). To rule out the possibility that these results are biased due to uneven taxonomic coverage in the data sets for different subcellular compartments, we also analyzed the diversity in a set of alignments that all contained only human, mouse, and rat sequences (table 1). The results from this analysis are essentially identical to the results of the full data analysis (cf. figs. 1A and 1B).
In conclusion, the analysis of sequence diversity in globular and transmembrane proteins in a large mammalian data set shows that extracellular (parts of) proteins evolve more rapidly than intracellular (parts of) proteins.
Analysis of Conservation of Individual Amino Acids
In order to see if the differences are attributed to certain amino acid residues, we also calculated an amino acid-specific measure of sequence conservation. For each aligned position, the sequence conservation with respect to the most prevalent amino acid residue in that position was calculated. The results were averaged for each type of amino acid residue within each subcellular location category, and confidence levels were calculated. For the membrane proteins, the extracellular, the transmembrane, and the cytosolic parts were averaged individually. We observe a statistically significant difference where extracellular proteins show lower sequence conservation for every individual residue type except for cysteine as compared with intracellular proteins (fig. 2).
Analysis of Selective Pressure in Different Subcellular Compartments
In order to rule out that the differences in evolution rates between proteins from different subcellular compartments are merely due to differences in mutation rates, we also analyzed the corresponding DNA sequences. When analyzing DNA sequences for molecular evolution purposes, one distinguishes between the rate of synonymous mutations per synonymous site (dS, mutations that do not change the corresponding amino acid) and the rate of nonsynonymous mutations per nonsynonymous site (dN, mutations that do). The ratio of these rates (dN/dS) provides information about the selective pressure acting on the investigated set of sequences. If no selection is acting on the encoded protein, then the synonymous and nonsynonymous rates per site will be the same, leading to a dN/dS = 1. Similarly, dN/dS < 1 indicates that the protein has been under negative (purifying) selection, whereas dN/dS > 1 indicates the presence of positive, adaptive selection (Yang 1997).
Through cross-referencing from Swiss-Prot (Boeckmann et al. 2003) to GenBank (Benson et al. 2004), we gathered as many DNA sequences as possible corresponding to the protein data sets (table 1). Using PAML (Yang 1997), we estimated one dN/dS ratio per protein alignment. The resulting distributions of dN/dS ratios for proteins of different subcellular compartments are shown as a violin plot in figure 3. Analogous to the π results, we observe that the mean of the dN/dS ratio varies between the different cellular compartments in the following order: nuclear < cytoplasmic < extracellular for globular proteins and transmembrane regions < cytoplasmic regions < extracellular regions for cellular membrane proteins. The means of the distributions are significantly different in all cases (table 2). Most cytosolic and nuclear proteins have a low dN/dS ratio (<0.2), whereas extracellular proteins show a wider distribution of different dN/dS ratios with a higher overall mean. For membrane proteins we see the same tendency, although the differences are less distinct. The difference in dN/dS on either side of the cellular membrane shows that, on average, DNA coding for intracellular proteins is under more strict negative (purifying) selection than DNA coding for extracellular proteins.
Impact of Expression Level on Results
To rule out that the differences in evolution rates between extracellular and intracellular (cytoplasmic and nuclear) proteins that we have discovered are secondary effects from differences in expression level, we investigated the effects of gene expression levels on the rate of gene evolution in our data set. Expression data from a publicly available database, GNF SymAtlas v 0.8.0 (Su et al. 2002
, weak but significant correlations were found to both expression level (r = −0.11, r² = 0.012) and expression broadness (r = −0.27, r² = 0.073). The results indicate that, at most, 7% of the variation in evolution rate can be explained by differences in expression characteristics.
We see differences in expression in different subcellular compartments. Expression levels are higher on average in cytoplasmic proteins (1.1 × 10³–1.8 × 10³) as compared with nuclear (4.9 × 10²–6.2 × 10²) and extracellular proteins (5.0 × 10²–7.0 × 10²), and expression broadness is highest in nuclear proteins (0.40–0.44), followed by cytoplasmic (0.35–0.40) and extracellular proteins (0.21–0.26). To further test whether our results are independent of expression, we divided the proteins into three groups of approximately equal sizes depending on their expression levels. Within each category, the differences in means of dN/dS and π for intra- and extracellular proteins were statistically significant (table 3). The same result was obtained for division into three groups depending on the broadness of the tissue expression (table 3). Because the extracellular and cytosolic parts of a membrane protein have the same expression characteristics, the analysis was not performed for membrane proteins. We also performed the opposite experiments and divided the proteins into three categories depending on the subcellular localization (nuclear, cytoplasmic, and extracellular). Within each category, no statistically significant correlations were found between dN/dS and either expression levels or expression broadness.
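The stratified comparison can be sketched in Python as follows. The record layout (dictionaries with `expression`, `location`, and `dnds` keys), the function names, and the exact placement of the cut points are assumptions of this example; the text states only that the three groups were of approximately equal size.

```python
from statistics import mean


def split_into_tertiles(records, key):
    """Sort records by key and cut them into three near-equal groups."""
    ranked = sorted(records, key=key)
    n = len(ranked)
    cut1, cut2 = n // 3, 2 * n // 3
    return ranked[:cut1], ranked[cut1:cut2], ranked[cut2:]


def compare_within_strata(records):
    """Mean dN/dS of intra- and extracellular proteins per expression group.

    Returns one (intracellular_mean, extracellular_mean) tuple per group;
    a mean is None when the group has no protein of that kind.
    """
    results = []
    for stratum in split_into_tertiles(records, key=lambda r: r["expression"]):
        intra = [r["dnds"] for r in stratum if r["location"] != "extracellular"]
        extra = [r["dnds"] for r in stratum if r["location"] == "extracellular"]
        results.append((mean(intra) if intra else None,
                        mean(extra) if extra else None))
    return results
```

Each pair of group means would then be tested for significance with the permutation procedure described under Permutation Test.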
Impact of Number of Protein-Protein Interactions on Results
To see whether differences in protein connectivity for proteins in different subcellular compartments could be causing the differences in evolution rates between extracellular and intracellular (cytoplasmic and nuclear) proteins, we investigated the effects of protein connectivity on the rate of gene evolution in our data set. The numbers of protein-protein interactions for extracellular, cytoplasmic, and nuclear proteins were extracted from a database constructed in connection with another current project (Kasper Lage et al., unpublished data). The number of proteins for which protein-protein interaction data could be extracted is shown in table 1. The correlation between number of interaction partners and evolution rate was low but significant (r = −0.16, r² = 0.025) whether we use selective pressure, dN/dS, or sequence diversity, π, as a measure of evolution rate. The negative correlation shows that proteins with a high number of protein interaction partners (high connectivity in the network) have lower evolution rate, in agreement with previous reports (Fraser et al. 2002).
We see differences in protein connectivity in different subcellular compartments. Cytoplasmic proteins have, on average, highest connectivity (9.4–16.8), followed by nuclear (8.4–10.8) and extracellular proteins (0.85–1.4). This somewhat contradicts previous findings that proteins involved in transcription and replication (typically nuclear) are among the proteins with the highest average number of interaction partners (Kunin et al. 2004). To further test whether our results are largely independent of number of interaction partners, we divided the proteins into three groups depending on their number of interaction partners. The categories (of approximately equal size) were as follows: 1) no interaction partners, 2) one interaction partner, and 3) more than one interaction partner. Within each category, the differences in means of dN/dS and π for intra- and extracellular proteins were statistically significant (table 3). We conclude that the differences in evolution rate between proteins from different subcellular compartments are not caused by differences in the number of protein-protein interactions.
Impact of Immunoproteins on Results
Some extracellular proteins are involved in immunological processes and possibly in recognizing pathogens and therefore could be under positive, adaptive selection (Hurst and Smith 1999; Castillo-Davis et al. 2004). In order to check to what extent immunological proteins involved in recognizing pathogens contribute to the observed differences in evolution rate between intracellular and extracellular proteins, we removed proteins with possible roles in the immune response from the analysis of extracellular proteins using Gene Ontology information in the Swiss-Prot entries. The remaining extracellular proteins showed slightly slower evolution than the complete extracellular data set, but the distributions of evolution rates between cytoplasmic and extracellular proteins excluding immunological proteins are still significantly different (table 2). We conclude that selective pressure caused by interaction with pathogens is not the main cause for the observed phenomenon.
Impact of Disulfide Bridges on Results
It has been proposed that the existence of disulfide bridges in an extracellular protein allows for a faster rate of evolution by stabilizing the structure (Hegyi and Bork 1997). To investigate whether this could explain the observed difference in evolution rate between intra- and extracellular proteins, we looked for pairs of conserved cysteines, indicating the possible presence of disulfide bridges, in the set of extracellular proteins. Among the 485 extracellular proteins in our amino acid analysis, 101 do not have a pair of conserved cysteines, making the presence of disulfide bonds highly unlikely (83 out of 427 for the DNA analysis). Contrary to the expectation, these disulfide-free proteins show no evidence of being more constrained evolutionarily. In fact, the proteins in this group are evolving slightly faster than the remaining extracellular proteins (figs. 1 and 2). Therefore, our data do not support the theory that the presence of disulfide bridges leads to faster protein evolution. The distributions of evolution rates between cytoplasmic and extracellular proteins excluding those with possible disulfide bonds are still significantly different (table 2). We conclude that the presence of disulfide bonds does not explain the observed differences in evolution rate between intra- and extracellular proteins.
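The screen for possible disulfide bridges can be illustrated with a short Python sketch. It interprets "a pair of conserved cysteines" as at least two alignment columns in which every sequence has a cysteine; this interpretation and the function names are assumptions of the example.

```python
def conserved_cysteine_columns(alignment):
    """Indices of alignment columns where every sequence has a cysteine."""
    return [
        index
        for index, column in enumerate(zip(*alignment))
        if all(residue == "C" for residue in column)
    ]


def may_have_disulfide(alignment):
    """True when at least two fully conserved cysteine columns exist.

    Assumption: two conserved cysteines are taken as the minimum needed
    for a conserved disulfide bridge to be possible.
    """
    return len(conserved_cysteine_columns(alignment)) >= 2
```

Alignments for which `may_have_disulfide` returns `False` would correspond to the group in which disulfide bonds are considered highly unlikely.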
Analysis of Yeast Data
Because yeast is the most studied organism when it comes to evolution rates and their correlation to other factors, we also investigated whether we could find any differences in evolution rates between intracellular and extracellular proteins for a yeast data set. Evolution rates as measured by dN/dS ratios (Wall et al. 2005) for 48 extracellular and cell wall proteins, 1,568 cytoplasmic proteins, and 1,216 nuclear proteins are shown as violin plots in figure 4A. We see no significant difference between nuclear and cytoplasmic proteins, but the mean of dN/dS for the extracellular proteins is significantly higher (table 4). Expression levels (Holstege et al. 1998) for 108 extracellular and cell wall proteins, 2,713 cytoplasmic proteins, and 2,032 nuclear proteins are shown in figure 4B. The means of the expression levels are significantly different (table 4) in the following order: nuclear < cytoplasmic < extracellular proteins. It is known that there is a strong anticorrelation between expression levels and evolution rates in yeast (highly expressed genes evolve slower). The fact that exported proteins show faster evolution could only be explained by differences in expression levels if the expression levels were lower for the exported proteins, which they are not. Therefore, the difference in evolution rates between extra- and intracellular proteins in yeast is completely independent of expression levels.
Gene dispensability as measured by average growth rates of the homozygous deletion strains (Deutschbauer et al. 2005
Discussion
Analysis of Diversity in Different Subcellular Compartments
We have shown that the evolution rate of extracellular proteins is faster than the evolution rate of intracellular proteins. Specifically, this was shown by analyzing diversity in alignments of proteins from various subcellular compartments. Diversity can be used as a simple measure of evolution rate because it will, all other things being equal, be higher in alignments of rapidly evolving proteins. The observed difference in diversity was not an artifact caused by uneven taxonomic coverage in the different data sets as shown by analyzing data where all alignments contained only the sequences from man, mouse, and rat. It was also not caused by immunoproteins or proteins with disulfide bonds being overrepresented in the extracellular data. Interestingly, different parts of transmembrane proteins displayed a similar trend, with the extracellular parts evolving more rapidly than the intracellular and transmembrane parts.
Concerning the results, two details confirm previous knowledge and indicate that the approach is sound. 1) It has previously been observed that transmembrane regions of membrane proteins are highly conserved between species (Donnelly et al. 1993; Tourasse and Li 2000; Stevens and Arkin 2001). 2) Cysteine is the one amino acid residue that goes against the trend and is actually more highly conserved in the extracellular environment than in the intracellular environment. This can be explained by the fact that cysteines form structurally important disulfide bonds in extracellular proteins, but not in intracellular proteins due to the reducing environment inside a cell.
Analysis of Selective Pressure
The results of the diversity analysis were confirmed and elaborated upon by analyzing the selective pressure acting on genes encoding proteins from different subcellular compartments. Specifically, we estimated the dN/dS ratio (the ratio between the rate of nonsynonymous substitutions per nonsynonymous site and the rate of synonymous substitutions per synonymous site) for the genes encoding the previously examined proteins. We find that genes encoding extracellular proteins on average have a higher dN/dS ratio than genes encoding intracellular proteins. This means that amino acid-changing mutations are more likely to be accepted in extracellular proteins than in intracellular proteins and shows that the results of the amino acid-based analysis are caused by differences in selective pressure and not by differences in the mutation rate of genes from different compartments.
Effects of Gene Expression, Gene Dispensability, and Protein Connectivity
We find that expression levels, expression broadness, gene dispensability (yeast), and protein connectivity (mammals) are all correlated to the evolution rate, with weak, but significant correlations. We also find that expression levels, expression broadness, gene dispensability, and protein connectivity have different distributions in the different subcellular compartments. Protein connectivity, the number of essential genes, and gene expression broadness are all lower for extracellular proteins as compared with intracellular proteins, and growth rate of deletion strains is higher. However, only expression broadness may explain the observed differences in evolution rates between cytoplasmic and nuclear proteins. The gene expression is broader for nuclear as compared with cytoplasmic proteins in the mammalian data set, which correlates to a lower evolution rate. Dividing the data into different subsets based on expression level, expression broadness, or protein connectivity showed that none of these are major reasons for the differences found (table 3). Unfortunately, the available yeast data did not permit us to do a similar analysis for gene dispensability. On the other hand, gene dispensability (the very same data we used) has been shown to have little impact on evolution rate in yeast (Drummond et al. 2005), and the nuclear yeast proteins are not evolving slower than the cytoplasmic, something we would expect from the gene dispensability distributions.
It is also possible that the effects are additive, so that a combination of several of the mentioned factors is causing the differences. However, we do not believe gene expression to be an important cause for the following two reasons. 1) The yeast data show contradicting distributions. 2) The correlations between expression and evolution rate are higher when measured by sequence diversity than when measured by selective pressure. (This is probably due to the fact that preferred codons are more important in highly/broadly expressed genes, giving rise to an overall lower mutation rate.) If differences in expression were a contributing cause of the differences in evolution rate in different compartments, we would see the same difference for the two evolution rate measures, but we do not. As a comparison, the correlation between protein connectivity and evolution rate is also independent of the evolution rate measure.
Other Causes for Different Evolution Rates
Given the observed differences in selective pressure, the differences in evolution rate seem to be caused by extracellular proteins being less constrained than intracellular ones, possibly with an added component of positive selection for some extracellular proteins. In this context, it seems conceivable that intracellular proteins could be relatively constrained because of the complexity of the cellular chemistry. Moreover, intracellular pathways have probably been relatively stable during the evolution of the mammal species investigated here, whereas the systems for intercellular communication and organization are likely to have undergone considerable change in the same period of time. We have shown that the difference was not caused by the presence of immunoproteins or proteins containing disulfide bonds in the extracellular data set. However, extracellular proteins with no direct role in the immune response are also potential targets for infecting pathogens, and this could add an element of positive selection to any extracellular host protein. If there is an inverse relationship between evolution rate and age of a gene (Alba and Castresana 2005), this could be another reason for the differences between intra- and extracellular proteins because most extracellular proteins probably have evolved with multicellular organisms and therefore are younger on average (compare with the small number of exported proteins in yeast, which is a unicellular eukaryote).
Many extracellular proteins, as well as extracellular parts of membrane-bound proteins, consist of evolutionarily mobile sequence modules (Hegyi and Bork 1997). These mosaic proteins are generally believed to have evolved by exon shuffling (Kolkman and Stemmer 2001), and it is thought that this may have played a role in the evolution of multicellularity. Many modules in extracellular proteins contain disulfide bridges, which stabilize the fold and have therefore been suggested to allow for a faster mutation rate (Hegyi and Bork 1997). Although our data contradict this particular theory, it is possible that the modularity and the exon shuffling may have led to an increased rate of evolution in these proteins.
In recent years, the connection between protein folding/misfolding and evolution has been the focus of much interest (Dobson 1999, 2003; Depristo et al. 2005). A range of debilitating human diseases is associated with protein misfolding events, some of which are associated with protein aggregation resulting in insoluble agglomerates called amyloid plaques. Early-formed species from the aggregation process of proteins not otherwise associated with disease have been shown to be cytotoxic (Bucciantini et al. 2002), indicating that there is an inherent toxicity to the aggregates themselves. In agreement with this observation, there is evidence that evolutionary selection has tended to avoid amino acid sequences, such as alternating polar and hydrophobic residues, that favor a β-sheet structure of the type seen in amyloid fibrils (Broome and Hecht 2000). Numerous safety mechanisms are in place to protect the organism from misfolded proteins. These are slightly different in nature for intra- and extracellular proteins. Intracellular proteins reside in a crowded environment where a misfolded protein can be refolded with chaperones or marked for degradation with ubiquitin, whereas exported proteins come in contact with chaperones mainly in the endoplasmic reticulum and Golgi. Quality control is rigorous in the secretory pathway (Hammond and Helenius 1995), but once outside the cell, the environment is less crowded and the concentration of proteases is low. How these different control mechanisms affect the evolution of intra- and extracellular proteins is hard to predict, but it is possible that a perturbation in protein stability can be more easily accommodated in an extracellular protein. It seems likely that amyloid or other protein aggregates would be more harmful inside the cell than outside. Contradicting this is the fact that the typical amyloid diseases, Alzheimer's disease and Creutzfeldt–Jakob disease, both occur extracellularly. It is important to note, however, that both diseases occur at an age where the genes have already been passed on to the next generation, and they are therefore less likely to have affected evolution to any great extent (Dobson 2002).
The horizontal transfer of genes is an important evolutionary mechanism in bacterial genomes. A recent study shows that horizontally transferred genes are integrated at the periphery of the metabolic networks, whereas central parts remain evolutionarily stable (Pal et al. 2005). Our data confirm previous findings that proteins involved in a large number of different protein–protein contacts (central in the interactome) evolve at a lower rate than those that have fewer interaction partners (peripheral in the interactome). This is another example of genes that are central to a system on a molecular level undergoing slower evolution than genes that are peripherally involved in the system. An example of this on an organ/tissue level is the finding that members of the acetylcholine receptor family involved in the central nervous system evolve more slowly than the members involved in the peripheral nervous system (Miyata et al. 1994). On the level of the organism, neural- or brain-specific genes display lower evolution rates than the other members of the same gene families (Miyata et al. 1994), and the brain may be regarded as very central to the system of a mammalian organism. Correspondingly, on the cell level, proteins that are exported may be regarded as peripheral, whereas intracellular proteins may be regarded as central. We speculate that this may be a general rule that will be true on many levels.
We have shown that a major determinant of how rapidly a protein evolves is its subcellular location, with extracellular proteins evolving significantly faster than intracellular proteins. This is an important finding, especially because so many other plausible determinants have been difficult to prove relevant.
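As an illustration only, the comparison of evolution rates between two groups of proteins can be sketched as a two-sided permutation test on the difference in group means. This is a minimal sketch and not the procedure used in this study: the function name `permutation_test`, its parameters, and the rate values below are all hypothetical placeholders, not data from the paper.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns the observed difference (mean of a minus mean of b) and an
    estimated p-value. All names here are illustrative assumptions.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical evolution-rate values (arbitrary units), invented for this sketch.
extracellular = [0.21, 0.35, 0.18, 0.40, 0.27, 0.31]
intracellular = [0.08, 0.12, 0.05, 0.15, 0.10, 0.09]

observed, p_value = permutation_test(extracellular, intracellular)
print(f"difference in means: {observed:.3f}, p = {p_value:.4f}")
```

A permutation test makes no assumption about the shape of the rate distributions, which is one reason it is a common choice when comparing skewed quantities between groups of unequal size.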
| Acknowledgements |
|---|
|
|
|---|
We thank Jens Lagergren for guidance concerning the permutation test. This work was supported by The Danish National Research Foundation, the Danish Center for Scientific Computing, and Knut and Alice Wallenbergs Foundation.
| Footnotes |
|---|
Douglas Crawford, Associate Editor
| References |
|---|
|
|
|---|
Alba MM and Castresana J. (2005) Inverse relationship between evolution rate and age of mammalian genes. Mol Biol Evol 22:598–606.
Aris-Brosou S. (2005) Determinants of adaptive evolution at the molecular level: the extended complexity hypothesis. Mol Biol Evol 22:200–9.
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. (2004) GenBank: update. Nucleic Acids Res 32:D23–6.
Bloom JD and Adami C. (2003) Apparent dependence of protein evolution rate on number of interactions is linked to biases in protein-protein interactions data sets. BMC Evol Biol 3:21.
Boeckmann B, Bairoch A, Apweiler R, et al. (12 co-authors). (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 31:365–70.
Broome BM and Hecht MH. (2000) Nature disfavors sequences of alternating polar and non-polar amino acids: implications for amyloidogenesis. J Mol Biol 296:961–8.
Bucciantini M, Giannoni E, Chiti F, Baroni F, Formigli L, Zurdo J, Taddei N, Ramponi G, Dobson CM, Stefani M. (2002) Inherent toxicity of aggregates implies a common mechanism for protein misfolding diseases. Nature 416:507–11.
Castillo-Davis CI, Kondrashov FA, Hartl DL, Kulathinal RJ. (2004) The functional genomic distribution of protein divergence in two animal phyla: coevolution, genomic conflict, and constraint. Genome Res 14:802–11.
Depristo MA, Weinreich DM, Hartl DL. (2005) Missense meanderings in sequence space: a biophysical view of protein evolution. Nat Rev Genet 6:678–87.
Deutschbauer AM, Jaramillo DF, Proctor M, Kumm J, Hillenmeyer ME, Davis RW, Nislow C, Giaever G. (2005) Mechanisms of haploinsufficiency revealed by genome-wide profiling in yeast. Genetics 169:1915–25.
Dobson CM. (1999) Protein misfolding, evolution and disease. Trends Biochem Sci 24:329–32.
Dobson CM. (2002) Getting out of shape. Nature 418:729–30.
Dobson CM. (2003) Protein folding and misfolding. Nature 426:884–90.
Donnelly D, Overington JP, Ruffle SV, Nugent JH, Blundell TL. (1993) Modeling alpha-helical transmembrane domains: the calculation and use of substitution tables for lipid-facing residues. Protein Sci 2:55–70.
Drummond DA, Raval A, Wilke CO. (2005) A single determinant dominates the rate of yeast protein evolution. Mol Biol Evol 23:327–37.
Duret L and Mouchiroud D. (2000) Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol Biol Evol 17:68–74.
Elhaik E, Sabath N, Graur D. (2006) The "inverse relationship between evolution rate and age of Mammalian genes" is an artifact of increased genetic distance with rate of evolution and time of divergence. Mol Biol Evol 23:1–3.
Fraser HB and Hirsh AE. (2004) Evolution rate depends on number of protein-protein interactions independently of gene expression level. BMC Evol Biol 4:13.
Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW. (2002) Evolution rate in the protein interaction network. Science 296:750–2.
Graur D. (1985) Amino acid composition and the evolution rates of protein-coding genes. J Mol Evol 22:53–62.
Guldener U, Munsterkotter M, Kastenmuller G, et al. (20 co-authors). (2005) CYGD: the Comprehensive Yeast Genome Database. Nucleic Acids Res 33:D364–8.
Hahn MW and Kern AD. (2005) Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. Mol Biol Evol 22:803–6.
Hammond C and Helenius A. (1995) Quality control in the secretory pathway. Curr Opin Cell Biol 7:523–9.
Hegyi H and Bork P. (1997) On the classification and evolution of protein modules. J Protein Chem 16:545–51.
Hintze JL and Nelson RD. (1998) Violin plots: a box plot-density trace synergism. Am Stat 52:181–4.
Hirsh AE and Fraser HB. (2001) Protein dispensability and rate of evolution. Nature 411:1046–9.
Holstege FC, Jennings EG, Wyrick JJ, Lee TI, Hengartner CJ, Green MR, Golub TR, Lander ES, Young RA. (1998) Dissecting the regulatory circuitry of a eukaryotic genome. Cell 95:717–28.
Hurst LD and Smith NG. (1999) Do essential genes evolve slowly? Curr Biol 9:747–50.
Jordan IK, Marino-Ramirez L, Wolf YI, Koonin EV. (2004) Conservation and coevolution in the scale-free human gene coexpression network. Mol Biol Evol 21:2058–70.
Jordan IK, Rogozin IB, Wolf YI, Koonin EV. (2002) Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res 12:962–8.
Jordan IK, Wolf YI, Koonin EV. (2003) No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve slowly. BMC Evol Biol 3:1.
Julenius K, Molgaard A, Gupta R, Brunak S. (2005) Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites. Glycobiology 15:153–64.
Kimura M. (1979) The neutral theory of molecular evolution. Sci Am 241:98–100, 102, 108 passim.
Kolkman JA and Stemmer WP. (2001) Directed evolution of proteins by exon shuffling. Nat Biotechnol 19:423–8.
Kunin V, Pereira-Leal JB, Ouzounis CA. (2004) Functional evolution of the yeast protein interaction network. Mol Biol Evol 21:1171–6.
Li WH and Graur D. (1991) Fundamentals of molecular evolution. (Sinauer Associates, Sunderland, MA).
Miyata T, Kuma K, Iwabe N, Nikoh N. (1994) A possible link between molecular evolution and tissue evolution demonstrated by tissue specific genes. Jpn J Genet 69:473–80.
Nei M and Li WH. (1979) Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc Natl Acad Sci USA 76:5269–73.
Pal C, Papp B, Hurst LD. (2001) Highly expressed genes in yeast evolve slowly. Genetics 158:927–31.
Pal C, Papp B, Hurst LD. (2003) Genomic function: rate of evolution and gene dispensability. Nature 421:496–7, discussion 497–8.
Pal C, Papp B, Lercher MJ. (2005) Adaptive evolution of bacterial metabolic networks by horizontal gene transfer. Nat Genet 37:1372–5.
Rocha EP and Danchin A. (2004) An analysis of determinants of amino acids substitution rates in bacterial proteins. Mol Biol Evol 21:108–16.
Stevens TJ and Arkin IT. (2001) Substitution rates in alpha-helical transmembrane proteins. Protein Sci 10:2507–17.
Su AI, Cooke MP, Ching KA, et al. (14 co-authors). (2002) Large-scale analysis of the human and mouse transcriptomes. Proc Natl Acad Sci USA 99:4465–70.
Thompson JD, Higgins DG, Gibson TJ. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–80.
Tourasse NJ and Li WH. (2000) Selective constraints, amino acid composition, and the rate of protein evolution. Mol Biol Evol 17:656–64.
Wall DP, Hirsh AE, Fraser HB, Kumm J, Giaever G, Eisen MB, Feldman MW. (2005) Functional genomic analysis of the rates of protein evolution. Proc Natl Acad Sci USA 102:5483–8.
Wernersson R and Pedersen AG. (2003) RevTrans: multiple alignment of coding DNA from aligned amino acid sequences. Nucleic Acids Res 31:3537–9.
Winter EE, Goodstadt L, Ponting CP. (2004) Elevated rates of protein secretion, evolution, and disease among tissue-specific genes. Genome Res 14:54–61.
Yang J, Gu Z, Li WH. (2003) Rate of protein evolution versus fitness effect of gene deletion. Mol Biol Evol 20:772–4.
Yang Z. (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13:555–6.
Yang Z. (2002) Inference of selection from multiple species alignments. Curr Opin Genet Dev 12:688–94.
Zhang L and Li WH. (2004) Mammalian housekeeping genes evolve more slowly than tissue-specific genes. Mol Biol Evol 21:236–9.
[Figure legend fragment; the beginning of the legend and the plot symbols were lost in extraction.] Cytoplasmic proteins in black, extracellular proteins in blue (x), and membrane proteins: membrane part in green, cytosolic part in orange, and extracellular part in brown. The vertical bars show the 95% confidence interval of the means.

