MBE Advance Access originally published online on August 24, 2005
Molecular Biology and Evolution 2006 23(1):30-39; doi:10.1093/molbev/msi249
Research Article
Protein Function, Connectivity, and Duplicability in Yeast

* Committee on Genetics, University of Chicago; and
Department of Ecology and Evolution, University of Chicago
E-mail: whli{at}uchicago.edu.
Abstract
Protein-protein interaction networks have evolved mainly through connectivity rewiring and gene duplication. However, how protein function influences these processes and how a network grows in time have not been well studied. Using protein-protein interaction data and genomic data from the budding yeast, we first examined whether there is a correlation between the age and connectivity of yeast proteins. A steady increase in connectivity with protein age is observed for yeast proteins except for those that can be traced back to Eubacteria. Second, we investigated whether protein connectivity and duplicability vary with gene function. We found a higher average duplicability for proteins interacting with external environments than for proteins localized within intracellular compartments. For example, proteins that function in the cell periphery (mainly transporters) show a high duplicability but are lowly connected. Conversely, proteins that function within the nucleus (e.g., transcription, RNA and DNA metabolisms, and ribosome biogenesis and assembly) are highly connected but have a low duplicability. Finally, we found a negative correlation between protein connectivity and duplicability.
Key Words: protein interaction network; protein connectivity; gene duplicability; network evolution; protein localization
Introduction
Biological processes, which contribute to the phenotypes of living cells, are wired by interaction networks of various cellular components such as proteins, DNA, RNA, and metabolites. Such network data, especially protein-protein interactions in the budding yeast (Saccharomyces cerevisiae), can now be generated in a high-throughput manner, allowing large-scale analyses. We are interested in the yeast protein interaction network, which, like nonbiological networks, is organized into a small-world, scale-free topology (Barabasi and Oltvai 2004).
Barabasi and Albert (1999) proposed that growth of a network with a preferential attachment behavior is sufficient to explain the emergence of a scale-free network topology. This model requires that a new node preferentially connects to a well-connected node, predicting that old nodes should tend to have a higher connectivity than young ones. This prediction, however, was not supported by a recent analysis of the yeast protein network by Kunin, Pereira-Leal, and Ouzounis (2004), who therefore suggested that to understand the scale-free topology of the protein network, protein function should also be taken into account.
In this study, we use a larger data set, or one of better quality, than that of Kunin, Pereira-Leal, and Ouzounis (2004) to re-examine the prediction of the preferential attachment model by checking whether a correlation exists between the age and connectivity of yeast proteins. We also investigate whether protein connectivity and gene duplicability vary with gene function. Because yeast, a single-celled organism, inhabits a wide range of environmental niches, genetic diversity for proteins that are exposed to or interact with extracellular environments may confer benefits to the organism. As duplication may increase such diversity (or produce a new adaptive function, e.g., Francino 2005), we hypothesize a higher duplicability for proteins exposed to extracellular environments than for those localized to intracellular compartments. Moreover, because gene duplication plays a major role in network growth (e.g., Barabasi and Albert 1999; Pastor-Satorras, Smith, and Sole 2003) and, conversely, connectivity may affect gene duplicability, we investigate whether a relationship exists between protein connectivity and duplicability.
Materials and Methods
Protein-Protein Interaction Data
Protein-protein interaction pairs are collected from various high-throughput experiments (Fromont-Racine et al. 2000
a, where a is 5 or 7 (the two cutoff points give similar results). We show the results of analyses on SSBader and ALL_K but not the results on other data sets because they are essentially the same.
Classification of Proteins into Age Groups
For each yeast protein, we identified homologous proteins from other genomes that have been sequenced. These homologous groups of yeast proteins were obtained from KOG and COG (Tatusov et al. 2003
Identification of Duplicate and Singleton Genes
The whole set of S. cerevisiae protein sequences was downloaded from SGD (http://www.yeastgenome.org/). Duplicate genes were identified as described in Gu et al. (2003).
Protein Subcellular Localization and Biological Process
The protein localization profile for S. cerevisiae grown in synthetic medium (downloaded from http://yeastgfp.ucsf.edu; Huh et al. 2003) is combined with subcellular localization defined by the Gene Ontology (GO) classification (downloaded from SGD on April 5, 2005). Mislocalization of some proteins from Huh et al. (2003) is corrected according to the authors' supplementary data. The GO subcellular localization categories are translated to the subcellular localization categories of Huh et al. (2003) because GO subcellular localizations are at a deeper level than those from Huh et al. (2003) (e.g., GO distinguishes between membrane and lumen of mitochondrion, while Huh et al. [2003] do not). The GO extracellular category, which comprises a small number of proteins, is merged into the cell periphery category. A protein is associated with more than one localization category if it is found in multiple localizations (e.g., shuttle and transport proteins). Biological processes of each ORF are assigned according to the GO Slim, which classifies proteins to give a high-level view of their functions (downloaded from SGD on April 5, 2005).
Measures of Gene Duplicability
Similar to Marland et al. (2004), for each category (i.e., a subcellular localization category or a biological process) under study, the number of unique types of genes is defined as the number of singletons plus the number of duplicated gene types in that category. The number of duplications per gene (n) is the total number of genes divided by the total number of unique types of genes. The proportion of unduplicated genes (P) is the proportion of singletons in the total number of unique types of genes. While n roughly indicates the average number of paralogs per gene in the category, 1 − P denotes the proportion of gene types that have been duplicated. Both n and 1 − P can be used as measures of gene duplicability (Yang, Lusk, and Li 2003). In addition, we consider the proportion of duplicate genes in each category (Q). Q and n are less desirable than P because they can be strongly affected by the presence of large gene families.
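The three measures can be illustrated with a short Python sketch. It is not the code used in this study; the function name, the family labels, and the toy category are hypothetical, and the sketch assumes that each gene has already been assigned to a gene type (family), with a type that occurs once counting as a singleton.

```python
from collections import Counter

def duplicability_measures(gene_to_family):
    """Return (n, P, Q) for one functional category.

    gene_to_family maps each gene to a family label (hypothetical input
    format); genes sharing a label are treated as paralogs, and a label
    used only once marks a singleton.
    """
    family_sizes = Counter(gene_to_family.values())
    total_genes = sum(family_sizes.values())
    unique_types = len(family_sizes)  # singletons + duplicated gene types
    singletons = sum(1 for size in family_sizes.values() if size == 1)
    duplicate_genes = total_genes - singletons

    n = total_genes / unique_types     # number of duplications per gene
    P = singletons / unique_types      # proportion of unduplicated gene types
    Q = duplicate_genes / total_genes  # proportion of duplicate genes
    return n, P, Q

# Toy category: three singletons, one pair, and one family of three.
example = {"g1": "A", "g2": "B", "g3": "C",
           "g4": "D", "g5": "D",
           "g6": "E", "g7": "E", "g8": "E"}
n, P, Q = duplicability_measures(example)
print(n, P, Q)
```

For this toy category there are 8 genes in 5 unique types, so n = 1.6, P = 0.6, and Q = 0.625. Enlarging the family of three would raise n and Q while leaving P unchanged, which is why P is the less family-size-sensitive measure.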
Our statistical analyses are conducted in R (version 2.0.1, http://www.r-project.org/). The statistical tests used are Fisher's exact test and the Mann-Whitney test (also called the Wilcoxon rank sum two-sample test). In contrast to the parametric two-sample t test, the Mann-Whitney test is a nonparametric method that replaces the protein connectivity data by ranks, which reduces the influence of outliers. It is more appropriate than the t test here because protein connectivities are not normally distributed.
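To make the rank-based idea concrete, the following Python sketch computes the Mann-Whitney U statistic for two groups of connectivity values. It is an illustration only and not the R routine used for the analyses: the function names and toy values are hypothetical, and the two-sided P value comes from a tie-corrected normal approximation without continuity correction, which is an assumption of this sketch and may differ slightly from what R reports for small samples.

```python
import math
from collections import Counter

def average_ranks(values):
    """Rank values from 1 to N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided P value by normal approximation)."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    total = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    var_u = n1 * n2 / 12 * ((total + 1) - tie_term / (total * (total - 1)))
    if var_u == 0:  # all values identical, so no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))

# Toy connectivities (hypothetical): one large outlier barely moves the ranks.
k_group = [1, 2, 2, 3, 4]
k_rest = [3, 5, 6, 8, 120]
u, p = mann_whitney_u(k_group, k_rest)
print(u, p)
```

Because only ranks enter the statistic, replacing the value 120 by 9 in the second group gives exactly the same U, whereas a t test would change markedly.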
Results
Origins of Proteins and Their Connectivity
To determine whether the connectivity (k) correlates with the age of a protein, the mean and median k values for each age group are obtained. Young proteins (e.g., those found in yeasts only) have a lower mean k than proteins in the older age groups (e.g., Archaea and Plasmodium-Plants-Animals) for both the complete data set (ALL_K) and the high-confidence data set (SSBader) (table 2 and fig. 2A and B). However, proteins traceable to Eubacteria show a lower mean k and a slightly lower median k than those in the Archaea group (table 2 and fig. 2A and B). Further, the younger age groups have a lower proportion of hubs than the older age groups, except the Eubacteria group, which shows a lower proportion of hubs than the Archaea and the Plasmodium-Plants-Animals groups (fig. 2C).
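The per-group summary described here can be sketched in Python as follows. This is a hedged illustration rather than the analysis code: the function name, the input format, and the toy values are hypothetical, and the sketch assumes that a hub is a protein whose k is at or above a cutoff passed as `hub_cutoff` (the Methods mention cutoffs of 5 and 7, but the exact hub definition should be checked against the original article).

```python
from statistics import mean, median

def summarize_by_age(proteins, hub_cutoff=5):
    """Summarize connectivity per age group.

    proteins is an iterable of (age_group, k) pairs (hypothetical format).
    Returns {age_group: (mean k, median k, proportion of hubs)}, where a
    hub is assumed to be a protein with k >= hub_cutoff.
    """
    groups = {}
    for age, k in proteins:
        groups.setdefault(age, []).append(k)
    return {
        age: (mean(ks), median(ks), sum(k >= hub_cutoff for k in ks) / len(ks))
        for age, ks in groups.items()
    }

# Toy data only; these are not values from table 2.
toy = [("Eubacteria", 2), ("Eubacteria", 4), ("Eubacteria", 9),
       ("Archaea", 3), ("Archaea", 7), ("Archaea", 11),
       ("Yeasts only", 1), ("Yeasts only", 2)]
for age, summary in summarize_by_age(toy).items():
    print(age, summary)
```

Reporting the median alongside the mean matters because connectivity distributions are strongly skewed, so a few hubs can dominate the mean of a group.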
Performing the Mann-Whitney test on these data, we first ask whether two adjacent age groups have different connectivities. The test shows that the Eubacteria age group has a significantly lower k than Archaea in both data sets (P < 5 × 10⁻⁸; fig. 2A and B). The Archaea age group has a significantly higher k than the Plasmodium-Plants-Animals group in ALL_K (P = 2 × 10⁻⁴), though the significance is weaker in SSBader (P = 0.068). Second, we pick an age group as a pivot group and perform two tests: (1) between this pivot group and the older proteins and (2) between the pivot group and the younger proteins. The tests reveal that the Eubacteria group does not show a different k from the rest of the proteins in the network. The other groups show a significantly different k from their older and/or younger counterparts (P << 0.006; data not shown). Clearly, the oldest proteins (the Eubacteria group) do not have the highest k in the protein network, and for this reason there is no positive correlation between connectivity and age. However, a significant correlation is seen when the Eubacteria group is excluded.
Protein Function and Connectivity
In the following analysis, we consider protein localization and perform the Mann-Whitney test on both data sets; although we show only the results for SSBader, a similar pattern is observed for ALL_K. The mean k values for the proteins localized to nucleus and nucleolus are 6.85 and 8.81, respectively, which are significantly higher than the mean k (5.33) for the whole network (P < 5 × 10⁻⁶, table 3). Some other localization categories such as cytoplasm, mitochondrion, cell periphery, and endoplasmic reticulum show a significantly lower k than the other proteins (P < 0.003, table 3).
Similarly, when biological processes are considered, proteins involved in protein biosynthesis and catabolism, ribosome biogenesis and assembly, DNA and RNA metabolisms, and transcription show a significantly higher k than the proteins involved in other biological processes (mean and median k are greater than 5.33 and 3, respectively; P < 5 × 10⁻⁶, table 3). Although proteins involved in lipid, carbohydrate, and amino acid metabolisms and cellular respiration show a significantly lower k than the average in SSBader (table 3), only lipid metabolism proteins show a significantly lower k in ALL_K; nonetheless, the proteins in the other three categories still have a low k (data not shown).
Protein Function Versus Connectivity Within the Same Age Group
It is interesting to ask whether within the same age group the function of a protein affects its connectivity. To answer this question, we categorize proteins by their localization or biological processes for each protein age group and perform the Mann-Whitney test between a functional group of interest and the rest within the same age group (only mean k values for functional categories are shown in Supplementary Fig. 1S, Supplementary Material online). Proteins localized to nucleus and nucleolus show a significantly higher k in the Eubacteria and Archaea age groups; proteins localized to nucleus also show a significantly higher k in the Plasmodium-Plants-Animals and Microspora-Schizosaccharomyces pombe-Saccharomyces complex groups (P < 7 × 10⁻⁴). For biological processes, proteins involved in ribosome biogenesis and assembly, RNA metabolism, and protein catabolism are significantly more highly connected than other functions for the Eubacteria, Archaea, and Plasmodium-Plants-Animals groups. Although many younger age groups (IV and V) do not show a significant difference in connectivity among biological process categories (probably because of small sample sizes), proteins involved in transcription show a significantly higher k than those in the other biological processes in the Microspora-Schizosaccharomyces pombe-Saccharomyces complex age group. Proteins involved in carbohydrate and amino acid and derivative metabolisms show a significantly lower k than other proteins in the Eubacteria group, while proteins involved in cell wall and membrane organization and biogenesis are lowly connected in the Microspora-S. pombe-Saccharomyces complex group.
Protein Function and Duplicability
We investigate the proportion of unduplicated genes (P) for each localization category. A low P value indicates a high duplicability. The P values are significantly lower in cell periphery, bud, and vacuole categories but significantly higher in nucleus and nucleolus (P < 0.003, table 4); all tests for this section are Fisher's exact test. The categories with a significantly lower P value have a higher proportion of duplicate genes (Q) than that of the whole genome and vice versa (P < 0.003, table 4). Q also indicates a duplicability that differs significantly from the average in cytoplasm (higher) and spindle pole (lower). The significantly high duplicability in cell periphery is also reflected in the number of duplications per gene (n = 1.44, compared with n = 1.21 for the whole-genome average). Similarly, the n values are relatively low (between 1.03 and 1.11) for mitochondrion, nucleus, nucleolus, and spindle pole.
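As an illustration of how such a comparison can be set up, the Python sketch below runs a two-sided Fisher's exact test on a 2 x 2 table of singleton versus duplicated gene types inside and outside one category. It is a sketch under stated assumptions, not the R call used here: the function name and the counts are hypothetical, and the two-sided P value is obtained by summing the probabilities of all tables that are no more likely than the observed one, which is one common convention.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables at most as probable as the observed table;
    # the small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lowest, highest + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical counts: rows are (category, rest of genome),
# columns are (singleton gene types, duplicated gene types).
p_value = fisher_exact_two_sided(40, 30, 900, 250)
print(p_value)
```

An exact test is a natural choice when some categories contain few gene types, because it does not rely on large-sample approximations.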
When biological processes are considered, we find that 1/4 of yeast proteins are uncharacterized. Among the remaining proteins, duplicates in carbohydrate metabolism, generation of precursor metabolites and energy, protein biosynthesis and catabolism, transport, and response to stress are significantly overrepresented, whereas in DNA metabolism, RNA metabolism, transcription, and ribosome biogenesis and assembly, duplicates are significantly underrepresented (P < 0.002, table 5). Among all proteins annotated with their biological processes, those involved in transport, protein biosynthesis and catabolism, RNA metabolism, transcription, protein modification, and DNA metabolism are among the most highly represented (between 7% and 17%). Relative to the whole-proteome average, these categories show either a high or a low number of duplicates (table 5). Generally speaking, low P values are supported by high Q values. Duplicates in the unknown biological process category, however, are significantly underrepresented (P < 0.002).
Protein Connectivity and Duplicability
Figure 3A shows that P is positively correlated with both mean and median k for biological processes (R² = 0.35 and 0.45 for mean and median k, respectively, P < 0.002). A similar pattern is also observed when we consider only significant categories from table 3 (R² = 0.66 and 0.79) or table 5 (R² = 0.74 and 0.83 for mean and median k, respectively, all P < 0.008). Moreover, this pattern is also found when the proportion of hubs is used as a measure of connectivity (R² = 0.43, P = 0.0001; fig. 3B). In addition, we observe essentially the same results when using protein localization categories and/or the Q values (data not shown). Furthermore, nonhub proteins show, on average, 8% higher duplicability than hub proteins (P = 79% and 88% and Q = 30% and 22% for the nonhubs and hubs, respectively, P < 1 × 10⁻⁶). This pattern suggests that proteins with a lower connectivity have, on average, a higher gene duplicability.
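The category-level relationship between P and k can be illustrated with a simple least-squares fit. The Python sketch below is only an illustration of how a slope and an R² value of this kind are obtained; the function name and the toy values are hypothetical and do not reproduce the data behind figure 3.

```python
def linear_fit_r2(x, y):
    """Return (slope, intercept, R^2) of the least-squares line y = a*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation coefficient
    return slope, intercept, r2

# Toy values: mean k per category (x) against the proportion of
# unduplicated genes P per category (y).
mean_k = [3.1, 4.0, 5.2, 6.8, 8.5]
prop_unduplicated = [0.70, 0.78, 0.80, 0.88, 0.90]
slope, intercept, r2 = linear_fit_r2(mean_k, prop_unduplicated)
print(slope, intercept, r2)
```

A positive slope in such a fit corresponds to a negative relationship between connectivity and duplicability, because a high P means that few gene types in the category have been duplicated.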
A summary of protein connectivity and gene duplicability of nuclear, cytoplasmic, and external and cell peripheral proteins is shown in table 6. In general, nuclear proteins are highly connected but show a low duplicability, while external and cell peripheral proteins show a high duplicability but are lowly connected. The connectivity and gene duplicability of cytoplasmic proteins lie between those of the nuclear proteins and those of the external and cell peripheral proteins.
Discussion
Our finding that proteins in the oldest group (the Eubacteria group) do not exhibit higher connectivities (k) than proteins in the Archaea and Plasmodium-Plants-Animals groups is similar to that of Kunin, Pereira-Leal, and Ouzounis (2004).
The higher protein connectivity for the Archaea and Plasmodium-Plants-Animals age groups than the Eubacteria group could be due to connection gains through new gene creation (e.g., gene duplication or gene fusion). Possibly, during the early evolution of eukaryotic cells whose nucleus evolved from Archaea, proteins for eukaryotic cell formation might have increased in number, and some became hubs for such functional modules (e.g., fig. 2C). Moreover, domain shuffling and length extension (which increase protein complexity) of proteins in the Archaea and Plasmodium-Plants-Animals groups could have created new connections for these proteins.
A constraint by gene function may influence protein network evolution (Kunin, Pereira-Leal, and Ouzounis 2004). To investigate this, we defined protein function by both localization and biological processes according to the GO annotation. Because localization partly determines the function of a protein, a combination of localization and biological process increases confidence in our function classification. Proteins involved in transcription, RNA metabolism, protein biosynthesis and catabolism, and ribosome biogenesis and assembly tend to be highly connected. Although the majority of our results are consistent with those reported by Kunin, Pereira-Leal, and Ouzounis (2004), translational proteins (e.g., protein biosynthesis and catabolism) are highly connected, contrary to their finding. In support of our observation, the majority of these proteins localized to nucleus and nucleolus are highly connected. On the other hand, proteins localized to cell periphery and vacuole are lowly connected (tables 3 and 6).
It appears that protein function affects connectivity across protein age groups (see "Protein Function Versus Connectivity Within the Same Age Group"). This pattern, however, may have resulted from the emergence time of these highly connected protein functions because proteins that emerged in the same evolutionary period tend to interact with one another (Qin et al. 2003), and proteins with similar functions are likely clustered (von Mering et al. 2002). We find that the emergence time of a protein contributes partly to the high k for only some gene functions. For example, transport and RNA metabolism categories have comparable numbers of proteins (and prevalently emerged) in the Eubacteria and Plasmodium-Plants-Animals age groups, but transport proteins are not highly connected (Supplementary Table 1S and Fig. 1SB, Supplementary Material online). Biological processes with proteins that largely emerged in the Eubacteria group (e.g., carbohydrate, amino acid and derivative metabolisms, and generation of precursor metabolites and energy) are also relatively lowly connected (Supplementary Table 1S and Fig. 1SB, Supplementary Material online). Likewise, proteins localized in cell periphery, cytoplasm, endoplasmic reticulum, nucleus, and nucleolus largely emerged in the Eubacteria and Plasmodium-Plants-Animals age groups, but only those localized in nucleus and nucleolus are coincidentally highly connected (Supplementary Table 1S and Fig. 1SA, Supplementary Material online). This finding supports the view of Kunin, Pereira-Leal, and Ouzounis (2004) that age alone is not sufficient to explain the observed connectivities of proteins and that protein function also needs to be considered. Importantly, for almost all of the function categories, proteins in the Eubacteria group show a lower k than those in the Archaea and Plasmodium-Plants-Animals groups (Supplementary Fig. 1S, Supplementary Material online), which confirms our earlier finding.
The observed patterns of gene duplication suggest that duplicate genes in the yeast are unequally represented in both subcellular localization and biological process categorizations (tables 4–6). A higher duplicability is observed for proteins localized to cell periphery, bud, vacuole, and cytoplasm and for proteins involved in transport, carbohydrate metabolisms, protein biosynthesis and catabolism, response to stress, and generation of precursor metabolites and energy, but not for proteins in other subcellular compartments or biological processes. Some functions such as transcription, DNA and RNA metabolisms, and ribosome biogenesis and assembly have a low duplicability. From these observations, we suggest that gene function is a major determinant of gene duplicability in S. cerevisiae.
Duplicate genes of some functions may not have a good chance to confer selective advantages, leading to a low gene duplicability. Proteins involved in transcription, DNA and RNA metabolisms, and ribosome biogenesis and assembly may face such a constraint. For example, duplication of a global transcription regulator likely affects many downstream genes, presumably being deleterious in the majority of cases and leading to a slim chance of duplicate survival. These functions (e.g., ribosome biogenesis and assembly) may also be constrained by the dosage balance of protein complexes (Papp, Pal, and Hurst 2003; Yang, Lusk, and Li 2003). However, other factors may also affect gene duplicability, because the proportion of transcription proteins is higher in multicellular organisms than in yeast (Babu et al. 2004). Moreover, the pattern that yeast's duplicate genes, especially those retained from the whole-genome duplication, tend to have a higher gene complexity (measured by protein length, number of domains, or number of cis-regulatory elements) than other genes leads to the conclusion that gene complexity may contribute to duplicate retention (He and Zhang 2005). However, analyzing protein length in our data set, we find that in approximately half of the functional categories duplicates are longer than singletons, and in a few of these cases the difference is statistically significant (data not shown).
Our results (table 6) support the hypothesis that a higher duplicability for proteins interacting with fluctuating external environments may confer benefits to the organism. For example, in yeast, nutrient capture through the cell periphery is the first stage of cell growth, and so the chance that duplication of a gene in this process is beneficial is high. A high duplicability for proteins localized to cell periphery is also seen in fruit fly, nematode, mouse, and humans (unpublished data), along with an increase in the total numbers of these proteins from yeast to nematode and fruit fly (Hazkani-Covo et al. 2004). Moreover, the majority of highly duplicated genes in bacterial or multicellular eukaryotic genomes encode various types of membrane or secreted proteins such as membrane transporters, receptors, and secreted signaling molecules (Kondrashov et al. 2002). Together, these results support a higher duplicability for proteins that interact with external environments.
Living in a habitat where nutrients are often scarce, yeasts inevitably compete among themselves or with other species for limited nutrients. Therefore, duplication of a transport protein may be advantageous because it increases the efficiency of nutrient uptake. Similarly, substrate transport between subcellular compartments or even into or out of the cell is a basic requirement of eukaryotic cells. In addition to nutrient uptake, yeast transporters play diverse roles such as drug resistance, salt tolerance, control of cell volume, efflux of undesirable metabolites, and sensing of extracellular nutrients (Van Belle and Andre 2001). A high duplicability of transport proteins is also observed in bacterial genomes (Gevers et al. 2004). Therefore, duplication of such a protein may increase the chance of functional specialization or diversification.
Using transporter subfamilies characterized phylogenetically (De Hertogh et al. 2002), we find a unique set of transporters in mitochondrion but a shared set between cell periphery and vacuole. In cell periphery and vacuole, three subfamilies are present in high numbers: the yeast amino acid transporters (YATs), the drug H+ antiporters (DHAs), and the sugar porters (SPs). In particular, the DHAs directly interact with and protect the cell from a number of extracellular compounds that are growth inhibitory or unusual to natural environments (Sá-Correia and Tenreiro 2002). Most DHAs are typically characterized as nonessential due to their functional redundancy and specificity overlap (Rogers et al. 2001; Giaever et al. 2002). Furthermore, these genes are only activated by environmental stress factors. In general, DHAs and a large number of YATs and SPs are undetected under a normal growth condition. The SPs are usually involved in the first step in carbohydrate metabolism after di- and trisaccharides are hydrolyzed outside the cell. Therefore, the variability and efficiencies of transporters directly affect the metabolic and growth rate of yeast. Furthermore, a high duplicability in yeast metabolism, especially in the central metabolism and upstream of the central metabolism pathways, has been observed (Marland et al. 2004).
Although recent evidence of prevalence in partial duplications of yeast's protein complexes (i.e., a large fraction of protein complexes with a strong homology to others) lends support for functional specialization (Pereira-Leal and Teichmann 2005), how protein connectivity plays a role in gene duplicability is unclear. The preferential attachment model also does not suggest any bias in duplicability of a node type (hub vs. nonhub). Our results suggest that highly connected proteins (i.e., hubs) have a low duplicability (fig. 3 and table 6). Despite its high tolerance against random perturbation, the protein network integrity relies mainly on its hubs and is sensitive to a targeted hub removal (Albert, Jeong, and Barabasi 2000). Indeed, lethality increases threefold if a hub is deleted (Jeong et al. 2001; Han et al. 2004). Along with these observations, a slow evolutionary rate (Fraser 2005) and highly conserved orthologs (Wuchty 2004; Fraser 2005) for hubs suggest a strong selection pressure on them. Likely, duplication of a hub is deleterious because it affects a large number of proteins (i.e., a high pleiotropy), especially those with partners participating in different functions (an intermodule hub). However, the pleiotropy is likely reduced if such a hub is situated within a functional module (an intramodule hub). Recently, however, a greater constraint on intramodule than intermodule hubs was found (Fraser 2005). Below, we discuss this issue further.
A hub protein may be part of a large (stable) protein complex; in this case, a dosage increase by a single-gene duplication would likely affect the balance of complex formation (Veitia 2002). A larger proportion of the intramodule hubs (81%) are in a complex than that of the intermodule hubs (18%). Conversely, the majority of the intermodule hubs are mediators, regulators, or adapters (Han et al. 2004). These intermodule hubs globally integrate signals between functional modules and are likely to localize to various subcellular compartments. Duplication of an intermodule hub can destroy the network integrity and disrupt the informational flow because of a subsequent interaction change or misexpression of a duplicate. Using a small data set characterized by Han et al. (2004), we find that the intermodule hubs show a slightly lower duplicability (12.6%) than the intramodule hubs (16.3%). This is contrary to Fraser's (2005) observation. Further research is needed to find out whether duplicability of a hub is more constrained within or between functional modules. It is, however, clear that the survivability of duplication of an intramodule or an intermodule hub is usually lower than the average gene duplicability in the genome.
Supplementary Material
Supplementary Table 1S and Figure 1S are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgements
We thank V. Kunin for sending us data and R. Lusk and M. Chou for their help in the protein interaction data collection, Y.-W. Chang for her help in the gene function classification, and G. Morris, J. Yang, and Z. Gu for helpful discussions. We are grateful to two anonymous reviewers for their valuable comments. This study was supported by the International Balzan Foundation.
Footnotes
Takashi Gojobori, Associate Editor
References
Albert, R., and A. L. Barabasi. 2000. Topology of evolving networks: local events and universality. Phys. Rev. Lett. 85:52345237.[CrossRef][Web of Science][Medline]
Albert, R., H. Jeong, and A. L. Barabasi. 2000. Error and attack tolerance of complex networks. Nature 406:378382.[CrossRef][Medline]
Babu, M. M., N. M. Luscombe, L. Aravind, M. Gerstein, and S. A. Teichmann. 2004. Structure and evolution of transcriptional regulatory networks. Curr. Opin. Struct. Biol. 14:283291.[CrossRef][Web of Science][Medline]
Bader, J. S., A. Chaudhuri, J. M. Rothberg, and J. Chant. 2004. Gaining confidence in high-throughput protein interaction networks. Nat. Biotechnol. 22:7885.[CrossRef][Web of Science][Medline]
Barabasi, A. L., and R. Albert. 1999. Emergence of scaling in random networks. Science 286:509512.
Barabasi, A. L., and Z. N. Oltvai. 2004. Network biology: understanding the cell's functional organization. Nat. Rev. Genet. 5:101113.[CrossRef][Web of Science][Medline]
Cliften, P., P. Sudarsanam, A. Desikan, L. Fulton, B. Fulton, J. Majors, R. Waterston, B. A. Cohen, and M. Johnston. 2003. Finding functional features in Saccharomyces genomes by phylogenetic footprinting. Science 301:7176.
De Hertogh, B., E. Carvajal, E. Talla, B. Dujon, P. Baret, and A. Goffeau. 2002. Phylogenetic classification of transporters and other membrane proteins from Saccharomyces cerevisiae. Funct. Integr. Genomics 2:154170.[CrossRef][Medline]
Drees, B. L., B. Sundin, E. Brazeau et al. (22 co-authors). 2001. A protein interaction map for cell polarity development. J. Cell Biol. 154:549571.
Dujon, B., D. Sherman, G. Fischer et al. (19 co-authors). 2004. Genome evolution in yeasts. Nature 430:3544.[CrossRef][Medline]
Francino, M. P. 2005. An adaptive radiation model for the origin of new gene functions. Nat. Genet. 37:573577.[CrossRef][Web of Science][Medline]
Fraser, H. B. 2005. Modularity and evolutionary constraint on proteins. Nat. Genet. 37:351352.[CrossRef][Web of Science][Medline]
Fromont-Racine, M., A. E. Mayes, A. Brunet-Simon et al. (11 co-authors). 2000. Genome-wide protein interaction screens reveal functional networks involving Sm-like proteins. Yeast 17:95110.[CrossRef][Web of Science][Medline]
Gavin, A. C., M. Bosche, R. Krause et al. (38 co-authors). 2002. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415:141147.[CrossRef][Medline]
Gevers, D., K. Vandepoele, C. Simillon, and Y. Van de Peer. 2004. Gene duplication and biased functional retention of paralogs in bacterial genomes. Trends Microbiol. 12:148154.[CrossRef][Web of Science][Medline]
Ghaemmaghami, S., W. K. Huh, K. Bower, R. W. Howson, A. Belle, N. Dephoure, E. K. O'Shea, and J. S. Weissman. 2003. Global analysis of protein expression in yeast. Nature 425:737741.[CrossRef][Medline]
Giaever, G., A. M. Chu, L. Ni et al. (74 co-authors). 2002. Functional profiling of the Saccharomyces cerevisiae genome. Nature 418:387-391.
Gu, Z., L. M. Steinmetz, X. Gu, C. Scharfe, R. W. Davis, and W. H. Li. 2003. Role of duplicate genes in genetic robustness against null mutations. Nature 421:63-66.
Han, J. D., N. Bertin, T. Hao et al. (11 co-authors). 2004. Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature 430:88-93.
Hazkani-Covo, E., E. Y. Levanon, G. Rotman, D. Graur, and A. Novik. 2004. Evolution of multicellularity in Metazoa: comparative analysis of the subcellular localization of proteins in Saccharomyces, Drosophila and Caenorhabditis. Cell Biol. Int. 28:171-178.
He, X., and J. Zhang. 2005. Gene complexity and gene duplicability. Curr. Biol. 15:1016-1021.
Ho, Y., A. Gruhler, A. Heilbut et al. (20 co-authors). 2002. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415:180-183.
Huh, W. K., J. V. Falvo, L. C. Gerke, A. S. Carroll, R. W. Howson, J. S. Weissman, and E. K. O'Shea. 2003. Global analysis of protein localization in budding yeast. Nature 425:686-691.
Ito, T., T. Chiba, R. Ozawa, M. Yoshida, M. Hattori, and Y. Sakaki. 2001. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl. Acad. Sci. USA 98:4569-4574.
Jeong, H., S. P. Mason, A. L. Barabasi, and Z. N. Oltvai. 2001. Lethality and centrality in protein networks. Nature 411:41-42.
Kellis, M., N. Patterson, M. Endrizzi, B. Birren, and E. S. Lander. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423:241-254.
Kondrashov, F. A., I. B. Rogozin, Y. I. Wolf, and E. V. Koonin. 2002. Selection in the evolution of gene duplications. Genome Biol. 3:research0008.1-0008.9.
Kunin, V., J. B. Pereira-Leal, and C. A. Ouzounis. 2004. Functional evolution of the yeast protein interaction network. Mol. Biol. Evol. 21:1171-1176.
Marland, E., A. Prachumwat, N. Maltsev, Z. Gu, and W. H. Li. 2004. Higher gene duplicabilities for metabolic proteins than for nonmetabolic proteins in yeast and E. coli. J. Mol. Evol. 59:806-814.
Newman, J. R., E. Wolf, and P. S. Kim. 2000. A computationally directed screen identifying interacting coiled coils from Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 97:13203-13208.
O'Brien, K. P., M. Remm, and E. L. Sonnhammer. 2005. Inparanoid: a comprehensive database of eukaryotic orthologs. Nucleic Acids Res. 33(Database Issue):D476-D480.
Papp, B., C. Pal, and L. D. Hurst. 2003. Dosage sensitivity and the evolution of gene families in yeast. Nature 424:194-197.
Pastor-Satorras, R., E. Smith, and R. V. Sole. 2003. Evolving protein interaction networks through gene duplication. J. Theor. Biol. 222:199-210.
Pereira-Leal, J. B., and S. A. Teichmann. 2005. Novel specificities emerge by stepwise duplication of functional modules. Genome Res. 15:552-559.
Qin, H., H. H. Lu, W. B. Wu, and W. H. Li. 2003. Evolution of the yeast protein interaction network. Proc. Natl. Acad. Sci. USA 100:12820-12824.
Rogers, B., A. Decottignies, M. Kolaczkowski, E. Carvajal, E. Balzi, and A. Goffeau. 2001. The pleitropic drug ABC transporters from Saccharomyces cerevisiae. J. Mol. Microbiol. Biotechnol. 3:207-214.
Sá-Correia, I., and S. Tenreiro. 2002. The multidrug resistance transporters of the major facilitator superfamily, 6 years after disclosure of Saccharomyces cerevisiae genome sequence. J. Biotechnol. 98:215-226.
Tatusov, R. L., N. D. Fedorova, J. D. Jackson et al. (17 co-authors). 2003. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4:41.
Tong, A. H., B. Drees, G. Nardelli et al. (16 co-authors). 2002. A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules. Science 295:321-324.
Uetz, P., L. Giot, G. Cagney et al. (20 co-authors). 2000. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 403:623-627.
Van Belle, D., and B. Andre. 2001. A genomic view of yeast membrane transporters. Curr. Opin. Cell Biol. 13:389-398.
Veitia, R. A. 2002. Exploring the etiology of haploinsufficiency. Bioessays 24:175-184.
von Mering, C., R. Krause, B. Snel, M. Cornell, S. G. Oliver, S. Fields, and P. Bork. 2002. Comparative assessment of large-scale data sets of protein-protein interactions. Nature 417:399-403.
Wuchty, S. 2004. Evolution and topology in the yeast protein interaction network. Genome Res. 14:1310-1314.
Yang, J., R. Lusk, and W. H. Li. 2003. Organismal complexity, protein complexity, and gene duplicability. Proc. Natl. Acad. Sci. USA 100:15661-15665.