MBE Advance Access originally published online on September 15, 2006
Molecular Biology and Evolution 2006 23(12):2467-2473; doi:10.1093/molbev/msl121
Research Articles
Preferential Duplication in the Sparse Part of Yeast Protein Interaction Network
MOE Key Laboratory of Bioinformatics, State Key Laboratory of Biomembrane and Membrane Biotechnology, Department of Biological Sciences and Biotechnology, Tsinghua University, Beijing, China
E-mail: sunzhr{at}mail.tsinghua.edu.cn.
Abstract
Gene duplication is an important mechanism driving the evolution of biomolecular networks. Thus, it is expected that there should be a strong relationship between a gene's duplicability and the interactions of its protein product with other proteins in the network. We studied this question in the context of the protein interaction network (PIN) of Saccharomyces cerevisiae. We found that duplicates have, on average, a significantly lower clustering coefficient (CC) than singletons, and the proportion of duplicates (PD) decreases steadily with CC. Furthermore, using functional annotation data, we observed a strong negative correlation between PD and the mean CC for functional categories. By partitioning the network into modules and assigning each protein a modularity measure Qn, we found that the CC of a protein is a reflection of its modularity. Moreover, the core components of complexes identified in a recent high-throughput experiment, characterized by high CC, have lower PD than the attachments. Subsequently, 2 types of hub were identified by their degree, CC, and Qn. Although the PD of intramodular hubs is much lower than the network average, the PD of intermodular hubs is comparable to, or even higher than, the network average. Our results suggest that high CC, and thus high modularity, poses strong evolutionary constraints on gene duplicability, and that gene duplication occurs preferentially in the sparse part of PINs.
Key Words: gene duplicability, yeast, clustering coefficient, protein interaction network, modularity, network evolution
Introduction
Gene duplication has long been thought to be a primary source of material for the origin of evolutionary novelties (Taylor and Raes 2004).
It is known that duplicates, produced by gene duplication, are the outcome of a complicated process with multiple stages, including generation, fixation in the population, preservation, and further divergence. Several factors have been reported to be related to gene duplicability. Among them are gene complexity (He and Zhang 2005), gene function (Conant and Wagner 2002; Marland et al. 2004; Prachumwat and Li 2006), gene essentiality (Gu et al. 2003; He and Zhang 2006), dosage effects (Papp et al. 2003), mRNA expression level (Davis and Petrov 2004, 2005), and alternative splicing (Kopelman et al. 2005; Su et al. 2006).
Biological processes are rarely performed by single isolated molecules. Instead, they typically involve the coordinated activity of many molecules forming a neighborhood in biomolecular networks, and the function of a protein is defined in the context of its interactions with other proteins in the cell (Eisenberg et al. 2000). Many characteristics of proteins have been shown to be related to their topological features in biomolecular networks. Hence, it is reasonable to hypothesize that there is intensive interplay between the duplication of a gene and its environment in the protein interaction network (PIN). Although this question has been investigated in some recent studies (Wagner 2003; Hughes and Friedman 2005; Prachumwat and Li 2006), no consensus has been reached. There are 2 major difficulties in answering this question: the limited quality of protein interaction data on the one hand and the method of analysis on the other. In this study, we conducted detailed analyses of the possible influences of topological features on the propensity of a gene to duplicate in the PIN of Saccharomyces cerevisiae. Our results show that the mean clustering coefficient (CC) of duplicates is significantly lower than that of singletons in the networks. Furthermore, we present a hypothesis to explain the different behaviors of CC and degree in influencing gene duplicability from the perspective of the modular organization of the network.
Materials and Methods
Protein Interaction Networks
We used 4 data sets, including 1 combined data set, 2 data sets from small-scale experiments, and 1 data set from a high-throughput (HT) experiment, to study the relationship between gene duplicability and topological features separately. Information on the PIN of S. cerevisiae was obtained from the Database of Interacting Proteins (DIP) (Salwinski et al. 2004).
The degree (denoted by k) of a node (protein) in an interaction network is defined as the number of interactions of the node with other nodes in that network. For a node of degree k in the network, its clustering coefficient (CC) is defined as $2N/[k(k-1)]$, where N is the number of interactions between the node's k neighbors and $k(k-1)/2$ is the number of possible interactions between its neighbors. It measures the average interconnectivity among the neighbors of a node. A CC of 1 means that all the neighbors of a node are fully interconnected. The sparse part of the network is characterized by low CC. The distribution of CC is a measure of how clustered a network is. Because CC is not defined for nodes with a degree of 1, these nodes are excluded from the correlation analysis of CC and duplicability.
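The definition of CC can be illustrated with a minimal Python sketch. The function names and the toy edge list are hypothetical and are not taken from the study's data or code; the network is assumed to be undirected, with self-interactions ignored.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map (node -> set of neighbors)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # assumption: self-interactions are ignored
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj, node):
    """Return 2N / (k(k-1)) for `node`, or None when its degree is below 2."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return None  # CC is undefined for nodes of degree 1
    # N: number of interactions among the node's k neighbors
    n_links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * n_links / (k * (k - 1))


# Hypothetical toy network, not real interaction data
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
toy_adj = build_adjacency(toy_edges)
cc_values = {node: clustering_coefficient(toy_adj, node) for node in toy_adj}
```

In this toy network, node A has 3 neighbors with 1 interaction among them, so its CC is 2(1)/(3·2) = 1/3, whereas node E has degree 1 and is left undefined.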
Identification of Duplicates and Singletons
Saccharomyces cerevisiae protein sequences were downloaded from the Saccharomyces Genome Database (SGD) (http://www.yeastgenome.org/). Duplicate genes were identified following the procedure in Gu et al. (2002). An all-against-all FASTA search was conducted for the whole set of S. cerevisiae protein sequences. Those fulfilling the same criteria on alignable region and identity as used by Gu et al. (2002) were identified as duplicate genes. We refer to all other genes as singletons, each of which has only 1 copy in the genome. In total, 1,815 proteins were identified as duplicates, each of which has at least 1 homolog. Among them, 1,350 are present in DIP. After identifying the homologous genes, we used the single-linkage algorithm to cluster genes into gene families.
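Single-linkage clustering of homologous pairs amounts to finding connected components, which can be sketched with a union-find structure. This is only an illustration under that assumption; the gene names and homolog pairs below are hypothetical, and the FASTA search and filtering criteria of Gu et al. (2002) are not reproduced here.

```python
def single_linkage_families(genes, homolog_pairs):
    """Group genes into families: two genes share a family if they are
    connected by a chain of pairwise homology hits (single linkage).
    Every gene in `homolog_pairs` must also appear in `genes`."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the family representative (path halving)
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in homolog_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two families

    families = {}
    for g in genes:
        families.setdefault(find(g), set()).add(g)
    return list(families.values())


# Hypothetical example: g1-g2 and g2-g3 are hits, so g1, g2, g3 form one family
toy_genes = ["g1", "g2", "g3", "g4", "g5"]
toy_pairs = [("g1", "g2"), ("g2", "g3")]
toy_families = single_linkage_families(toy_genes, toy_pairs)
toy_duplicates = {g for fam in toy_families if len(fam) > 1 for g in fam}
toy_singletons = {g for fam in toy_families if len(fam) == 1 for g in fam}
```

Under this scheme, g1 and g3 fall into the same family even without a direct hit between them, which is the defining property of single linkage.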
Protein Function Annotations
To study the relation between gene function and the propensity of a gene to duplicate, we obtained annotations for S. cerevisiae from the MIPS Functional Catalogue (FunCat) database (funcat-2.0 data version 20062005) (Ruepp et al. 2004). FunCat consists of 28 main categories, of which 19 are available for S. cerevisiae. Each main functional branch is organized as a hierarchical, treelike structure. After excluding the category of "unclassified proteins" and the 2 categories with too few entries (fewer than 30), we proceeded with the analyses for the remaining 16 categories.
Fitness measurements were obtained from an HT study (Steinmetz et al. 2002) that measured the growth of each strain in a nearly complete collection of yeast single-gene deletion mutants. Following Gu et al. (2003), we used the lowest fitness value across 5 growth conditions (YPD, YPDGE, YPE, YPG, and YPL) for each strain. Haploinsufficient and haplosufficient genes were identified following the procedure of He and Zhang (2006).
Statistics
Our statistical analyses and plotting were conducted using R (version 2.2.0, http://www.r-project.org/). The statistical tests used in the study are the $\chi^2$ test and the 2-sample Wilcoxon rank sum test, which is also known as the "Mann-Whitney test." Unlike the parametric 2-sample t-test, the latter is a nonparametric method and is therefore more appropriate here, because topological quantities are generally not normally distributed.
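To make the rank-based nature of the test concrete, the following Python sketch computes the Mann-Whitney U statistic with a tie-corrected normal approximation. It is an illustration only: the study used R, this sketch applies no continuity correction (so P values may differ slightly from R's defaults), and the sample values are hypothetical.

```python
import math


def midranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (U, z, two-sided P) for samples x and y (normal approximation)."""
    pooled = list(x) + list(y)
    n1, n2, n = len(x), len(y), len(pooled)
    ranks = midranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u, 0.0, 1.0  # all values identical: no evidence of a shift
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    return u, z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical CC values for two small groups
u_stat, z_score, p_value = rank_sum_test([0.0, 0.1, 0.2], [0.3, 0.4, 0.5])
```

Because only the ranks enter the calculation, the result is unaffected by the skewed, zero-inflated shape that CC distributions typically have.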
Results
Proportion of Duplicates and CC
To determine whether topological features relate to the duplicability of genes, we compared the mean degree and CC of duplicates with those of singletons in the PINs (table 1). There is no significant difference between the mean degree of duplicates and that of singletons, consistent with a previous observation (Wagner 2003).
CC is a measure of the relative intensity of the interconnectedness of a node's neighbors. Its reliability and practical meaning vary with the degree k of the node for which the CC is calculated. For example, for a node with a degree of 2, CC = 1 means that there is 1 interaction between its 2 neighbors. However, a node with a degree of 10 and a CC of 1 has $10(10-1)/2 = 45$ interactions between its 10 neighbors. Although both nodes have equal CC, it is reasonable to consider the node with a degree of 10 to be more highly clustered and modular. Regarded as the mean of $k(k-1)/2$ measurements, the CC of a node with higher degree is also more reliable in the sense that it is less sensitive to false-positive and false-negative interactions in the network. Thus, we speculated that the correlation between CC and PD should increase with degree. We found this to be the case after binning nodes into groups according to their degree (fig. 2). The extent of the differences in PD as a function of CC increases with degree. Specifically, among the proteins with degree greater than 5, those with a CC of 0 have a probability of 37.4% of being a duplicate, whereas this number is 10.1% if their CC is greater than 0.5. Such a difference of nearly fourfold in PD at the 2 extremes of CC further emphasizes CC as an indicator of duplication. Thus, by reducing the confounding effects caused by nodes with low degree, this observation highlights the monotonic relationship between CC and PD.
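The comparison of PD at the 2 extremes of CC can be sketched as a simple filter-and-count step. The record layout (`degree`, `cc`, `is_duplicate`) and the toy records are hypothetical; they only illustrate the grouping and do not reproduce the reported percentages.

```python
def proportion_of_duplicates(proteins, min_degree, cc_filter):
    """PD among proteins with degree > min_degree whose CC satisfies cc_filter.
    Returns None when no protein falls into the group."""
    group = [
        p for p in proteins
        if p["degree"] > min_degree and p["cc"] is not None and cc_filter(p["cc"])
    ]
    if not group:
        return None
    return sum(1 for p in group if p["is_duplicate"]) / len(group)


# Hypothetical records, not taken from DIP
toy_proteins = [
    {"degree": 6, "cc": 0.0, "is_duplicate": True},
    {"degree": 8, "cc": 0.0, "is_duplicate": False},
    {"degree": 7, "cc": 0.6, "is_duplicate": False},
    {"degree": 9, "cc": 0.8, "is_duplicate": False},
    {"degree": 3, "cc": 0.0, "is_duplicate": True},   # excluded: degree <= 5
    {"degree": 10, "cc": 0.7, "is_duplicate": True},
]

pd_sparse = proportion_of_duplicates(toy_proteins, 5, lambda cc: cc == 0)
pd_dense = proportion_of_duplicates(toy_proteins, 5, lambda cc: cc > 0.5)
```

Raising `min_degree` restricts the comparison to nodes whose CC rests on more neighbor pairs, which is the rationale given above for binning by degree.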
PD and Mean CC of Neighbors
CC is a local measure of the interconnectedness of a node's immediate neighbors. It would be interesting to see whether the relation between PD and cliquishness extends to a longer range, so we calculated the mean CC of each node's neighbors. To avoid recounting the pairs already considered in calculating the node's own CC, these pairs were excluded from the calculation of the neighbors' CC. As expected, the neighbors of duplicates show a significantly lower mean CC than those of singletons (table 1). This further indicates that duplicates tend to appear in the sparse part of the PIN and that this effect is not limited to immediate neighborhoods.
PD versus CC for Functional Categories
Strong relationships between gene function and the propensity of a gene to undergo duplication have been reported in a variety of species (Conant and Wagner 2002; Blanc and Wolfe 2004; Prachumwat and Li 2006). Having revealed the relation between CC and PD at the topological level, we next checked whether this relation also exists at the level of functional categories. We investigated this question in the framework of the 16 functional categories of FunCat. As expected, we reproduced the previous observations on the uneven propensities of genes to duplicate for certain functions. For instance, genes involved in transcription have a lower PD than the network-wide average, whereas metabolism genes show markedly higher proportions of duplicates. Many function-specific arguments have been proposed to explain these phenomena (Conant and Wagner 2002; Blanc and Wolfe 2004; Prachumwat and Li 2006). By plotting PD against mean CC (fig. 3), we observed a strong negative correlation between them (Pearson's $r = -0.84$, $P = 4 \times 10^{-5}$, degrees of freedom = 14). Specifically, although the metabolism genes have higher duplicability than genes involved in transcription (45% vs. 23%), they are also characterized by a much smaller CC (0.09 vs. 0.19) (table 2).
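As a rough consistency check on the reported significance (a sketch, not part of the original analysis), a Pearson correlation can be converted to a t statistic with n − 2 degrees of freedom. Here n = 16 categories, and the coefficient of magnitude 0.84 is taken with a negative sign, in line with the negative correlation between PD and mean CC stated in the abstract.

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  = \frac{-0.84\,\sqrt{14}}{\sqrt{1-0.84^{2}}}
  \approx \frac{-3.143}{0.543}
  \approx -5.79
```

With 14 degrees of freedom, a |t| of about 5.8 corresponds to a two-tailed P value on the order of $10^{-5}$, which agrees with the reported value in order of magnitude.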
Furthermore, we compared the mean CC of duplicates and singletons for each functional category to see whether the difference in CC between duplicates and singletons is function specific. As shown in table 2, the difference qualitatively exists for nearly all the categories.
The same analysis was also conducted in another functional annotation scheme, GO Slim (downloaded from SGD on 2 October 2005), and similar results were obtained (data not shown).
CC and Duplicability
Due to potential bias introduced at the postduplication stage, the observed enrichment of duplicates with small CC does not necessarily mean higher duplicability of proteins with small CC. To disentangle the genuine factors influencing gene duplicability from this effect, we followed the strategy of He and Zhang (2006) and limited the following analyses to singletons in S. cerevisiae. In He and Zhang (2006), 2 groups of genes were identified according to whether or not their orthologs have duplicated in 4 related yeast species (Debaryomyces hansenii, Candida albicans, Yarrowia lipolytica, and Saccharomyces bayanus), and these were denoted as group D (standing for duplicate) and group S (standing for singleton). We found the mean CC of group D (0.087) to be significantly lower than that of group S (0.159) (P = 0.01). The limited significance is partly due to the small sample size of group D, which has only 44 proteins with degree greater than 1 in DIP. It should be noted that the extent of the difference between group D and group S is similar to the difference between duplicates and singletons, indicating consistency of the 2 observations.
Another way to discern the relation of duplicability and CC is to make a comparison at the level of functional categories. Because the majority of the proteins are singletons, the consequence of gene duplication is unlikely to influence qualitatively the difference in the mean CC among categories. Therefore, the relationship of PD and CC at functional category level largely remains unchanged upon gene duplications. The observed strong negative correlation between PD and mean CC for the functional categories thus can be considered as a support of the relation between gene duplicability and CC. Taken together, our observations suggest a negative correlation between gene duplicability and CC.
CC and Modularity
Modularity is a key feature of cellular systems (Hartwell et al. 1999). As a characteristic of network modularity, the CC has been shown to be much higher in the PIN than in random networks. So the influence of CC on gene duplicability might lie in the modular organization of the network.
To analyze the relationship between CC and modularity quantitatively, we applied a recent module identification method (Newman 2006) to DIP. The method calculates network modularity using matrix eigenvalues and eigenvectors to enable the division of networks into modules. An advantage of this method is that each node is assigned a measure Q, which is defined as the number of intramodular edges minus the expected value. By dividing Q by degree k, we obtained a scaled variable (denoted by Qn) representing the extent of modularity of a node. For nodes with similar degree, higher Qn means higher modularity. By plotting CC against Qn for nodes with degree greater than 4, we observed a strong correlation between them (Spearman's $\rho = 0.50$, $P < 10^{-15}$) (supplementary fig. 2, Supplementary Material online). Thus, although the network-average CC is a characteristic of network modularity, the CC of individual nodes can be considered a reflection of their topological modularity.
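The per-node measure can be sketched as follows, given a module assignment that has already been computed. This sketch assumes that the "expected value" is the configuration-model expectation used in Newman's modularity (k_i k_j / 2m summed over all nodes j in the same module, including the node itself); the text does not spell this out, so the exact form is an assumption. The function name and the toy network are hypothetical.

```python
def node_modularity(adj, module_of):
    """Return {node: (Q, Qn)}, where Q is the number of intramodular edges of
    the node minus its expected value and Qn = Q / degree.
    Assumption: expected value = k_i * (sum of degrees in the module) / (2m)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # twice the edge count
    module_degree = {}
    for node, nbrs in adj.items():
        mod = module_of[node]
        module_degree[mod] = module_degree.get(mod, 0) + len(nbrs)

    scores = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k == 0:
            continue  # Qn is undefined for isolated nodes
        intramodular = sum(1 for v in nbrs if module_of[v] == module_of[node])
        expected = k * module_degree[module_of[node]] / two_m
        q = intramodular - expected
        scores[node] = (q, q / k)
    return scores


# Hypothetical toy network: two triangles joined by the edge C-D
toy_net = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
toy_modules = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
toy_scores = node_modularity(toy_net, toy_modules)
```

In the toy network, node C bridges the two triangles and therefore receives a lower Qn than node A, whose interactions all stay within its module.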
A recent HT experiment (Gavin et al. 2006) identified 491 complexes in yeast and partitioned the proteins in complexes into 2 types: core components that are present in most isoforms and attachments that are present in only some of them. We mapped these proteins onto DIP and found that the mean CC of cores is 0.258, which is significantly higher than that of attachments (0.178, $P < 10^{-5}$, 1-tailed Wilcoxon rank sum test). What is more, the PD of cores is 14.3%, compared with 29.0% for attachments ($\chi^2 = 24.4$, $P < 10^{-6}$). Taken together, these observations suggest that the relation between CC and duplicability might be a reflection of the modular organization of the network. More analyses are given in the Discussion.
Discussion
In this work, we studied gene duplicability in the context of PINs and found a negative correlation between gene duplicability and CC. This relationship was accentuated for proteins with high connectivity. Such patterns suggest that the rate of gene duplication, and hence the duplication of protein products and the growth of the network, is inhomogeneous across the PIN. Specifically, gene duplication occurs preferentially in the sparse part of the PIN. Our results highlight the importance of studying the interactions among gene duplication, network topology, and evolution.
The Influence of Data Quality of PIN
It is known that HT methods of detecting protein-protein interactions may produce a significant fraction of false positives, and current knowledge of yeast protein-protein interactions is incomplete (i.e., the PIN contains false negatives) (Uetz and Finley 2005). Any result deduced from the analysis of the PIN may be influenced by this factor. To estimate the potential influence of incomplete and noisy protein interaction data on our findings, we mimicked the effects of false positives/negatives by adding/removing 20% of interactions between randomly selected protein pairs. We generated 100 randomly perturbed samples each for removal and addition. The trend of decreasing PD with increasing CC remains qualitatively the same (Supplementary figs. 3 and 4, Supplementary Material online). Moreover, the trend was observed not only in the data set containing results of both HT experiments and small-scale experiments but also in the data set containing small-scale experiments only. These 2 lines of evidence show the robustness of our results to noise in the data.
The yeast 2-hybrid assay is an important HT technology for detecting protein interactions. Although its fraction of false positives has been predicted to be high (Mrowka et al. 2001), it would be interesting to check whether our observations can be reproduced in data derived from this method. We therefore analyzed the result of an HT yeast 2-hybrid experiment (Ito et al. 2001). It contains information on 3,266 proteins, connected by 4,383 interactions. A major difference between this data set and the others mentioned above is that it has a mean CC of just 0.037, and only 7% of the nodes have nonzero CC. Such a distribution of CC makes it an inappropriate sample for the analysis. As a result, no correlation between CC and PD was observed in this data set. Although the distribution of CC makes this phenomenon not unexpected, it is a caveat for our results on CC and PD in the other data sets.
Why Preferential Duplication in the Sparse Part of PIN?
Several factors are known to influence gene duplicability. To understand why more highly clustered proteins have lower duplicability, we explored the possible contributions of several of these factors. First, it has been suggested that less important genes in yeast have a higher duplicability (He and Zhang 2006). In a study investigating the relation of essentiality to topological characteristics, essentiality was shown to be positively correlated with CC in the PIN of yeast (Yu et al. 2004). We detected a negative correlation between the essentiality of a gene, measured by the fitness reduction upon deletion, and its propensity to be a duplicate, consistent with Gu et al. (2003). After essentiality is controlled for, the remaining influence of CC on duplicability decreases (data not shown). However, the influence of fitness on the relation between CC and duplicability may be overestimated because of functional compensation between paralogs, which generally reduces the essentiality of duplicates. Furthermore, to examine the influence of essentiality on the relation between CC and duplicability, we divided all genes into 4 groups according to their fitness effects and compared the mean CC of duplicates and singletons for each group (table 3). If the relation between CC and PD were a byproduct of the relation between essentiality and PD, it would be expected to disappear within the groups. What we observed is that the difference in CC between duplicates and singletons was preserved for the groups of more important genes but became much less distinct for less important genes. As mentioned above, the fitness effect of a gene may change upon duplication. This effect is especially severe for duplicates measured as unimportant genes. Thus, it may explain the less distinct difference in CC between duplicates and singletons in the less essential groups. Taken together, our observation might be partly attributed to the influence of the fitness effect.
Second, it has been suggested that haploinsufficient genes have a higher duplicability than haplosufficient genes because doubling the gene dosage of haploinsufficient genes is more likely to be beneficial immediately after gene duplication (Kondrashov and Koonin 2004).
Third, the duplicability of a gene is inevitably influenced by its function. Duplicates are overrepresented in certain functional categories and underrepresented in others (Conant and Wagner 2002; Blanc and Wolfe 2004; Prachumwat and Li 2006). As mentioned above, we recovered a similar phenomenon. What is more, a strong negative correlation between mean CC and duplicability for functional categories was revealed, and this correlation exists not only at the level of functional categories but also at the level of the genes within nearly all the functional categories. Thus, rather than stating that our observation is influenced by the biased duplicability of some functional categories, we prefer to consider CC a good indicator of the gene duplicability of functional categories.
The above analyses suggest that the correlation between CC and duplicability can be explained only partly by the known factors influencing gene duplicability. Thus, it is worth seeking to understand this phenomenon from a network perspective. As shown in the Results, CC might influence gene duplicability as a measure of modularity. Modularity can promote or constrain duplication in different parts of the network because of the benefits or disadvantages introduced by the duplication.
From the perspective of topological features, we speculate that a modular network might contain 3 basic constituents: intramodular hubs, characterized by both high degree and high CC, representing the central elements of modules; intermodular hubs, with high degree and low CC, connecting nodes in different modules; and peripheral elements, characterized by moderate or small degree and CC. To give a snapshot of the duplicability of the different constituents, we used the following rough but reasonable cutoffs. Hubs are defined as proteins within the top 30% in the network as measured by degree. Among the hubs, those within both the top/bottom 30% as measured by CC and the top/bottom 30% as measured by Qn from the module identification method of Newman (2006) were identified as intra-/intermodular hubs, respectively. Consequently, we identified 244 intramodular and 190 intermodular hubs, with 15% and 39% of them being duplicates, respectively, compared with the network average of 28%. Clearly, the choice of thresholds is somewhat arbitrary, but the results remain qualitatively the same even when we used different cutoffs to define hubs. A caveat of our procedure is that filtering by degree, CC, and Qn may not be sufficient to identify the different hubs, which may have temporal features not captured in a static map of protein interactions. So there might be some false positives in our result.
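The cutoff scheme can be sketched in Python as follows. Two points are assumptions of this sketch: the CC and Qn percentiles are computed among the hubs rather than over the whole network, and ties at a cutoff are broken by sort order. The function names and the toy values are hypothetical.

```python
def extreme_names(names, values, percent, highest):
    """Names within the top (highest=True) or bottom `percent` % by `values`."""
    ranked = sorted(names, key=lambda name: values[name], reverse=highest)
    return set(ranked[: len(ranked) * percent // 100])


def classify_hubs(degree, cc, qn, percent=30):
    """Return (hubs, intramodular hubs, intermodular hubs)."""
    hubs = extreme_names(sorted(degree), degree, percent, highest=True)
    hub_list = sorted(hubs)
    # Intramodular: high CC and high Qn; intermodular: low CC and low Qn
    intra = (extreme_names(hub_list, cc, percent, True)
             & extreme_names(hub_list, qn, percent, True))
    inter = (extreme_names(hub_list, cc, percent, False)
             & extreme_names(hub_list, qn, percent, False))
    return hubs, intra, inter


# Hypothetical values for 20 proteins; degree equals the protein index
toy_degree = {f"P{i}": i for i in range(20)}
toy_cc = {name: 0.5 for name in toy_degree}
toy_qn = {name: 0.3 for name in toy_degree}
toy_cc["P19"], toy_qn["P19"] = 0.9, 0.8    # dense, modular neighborhood
toy_cc["P18"], toy_qn["P18"] = 0.1, -0.2   # sparse, bridging neighborhood

toy_hubs, toy_intra, toy_inter = classify_hubs(toy_degree, toy_cc, toy_qn)
```

Changing `percent` is a direct way to repeat the classification with different cutoffs, as in the sensitivity check mentioned above.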
In summary, different network constituents show distinct propensities for gene duplication. Our observations suggest the following picture of network growth by duplication. Intramodular hubs represent the most stable and conservative part of the network, with little chance of duplication. Intermodular hubs are among the sparse and dynamic regions of network evolution, not only because of a high rate of duplication but also in the sense that duplication of hubs may induce more interaction rewiring. Through duplications in these positions, the network evolves new cellular functions by reorganizing the connections among modules. The peripheral nodes constitute the major part of the network and grow at a moderate rate. The growth and modification of modules largely lie in this part because of its large population.
It is interesting to note that the result of another study (Fraser 2005), which focused on the evolutionary rates of the 2 types of hubs, is compatible with our observation in measuring the evolutionary conservation of hubs, although different data and strategies were used to identify the hubs.
The Roles of Degree and CC in Deciding Gene Duplicability
Taken together, the above results provide valuable clues, based on which we propose a hypothesis to explain the relation between topological characteristics and the conservation of proteins in the PIN. On the one hand, a high CC indicates high functional coherence and compactness between a node and its neighbors, which together form modules. On the other hand, a high CC, if accompanied by high degree, also indicates the central role of a node in the module. Thus, higher CC poses a more severe constraint on protein evolution, including deletion, duplication, and changes in sequence. The situation is quite different for degree. Degree correlates with pleiotropy, so that hubs experience more pleiotropic constraints. Nevertheless, higher degree is also a sign of higher intrinsic flexibility and thus of more potential to evolve adaptive functions upon mutation. Consequently, we speculate that degree might impose 2 opposite effects, which largely counteract each other, on the evolution of a protein. In the case of gene deletion and sequence divergence, the former effect may exceed the latter, leading to significantly reduced evolutionary rates of both types of hub. But in the case of duplication, pleiotropy may additionally facilitate the preservation of duplicates, as suggested by the subfunctionalization model (Force et al. 1999) and the "adaptive-conflict" model (Hughes 1994). Thus, the intermodular hubs, both free from the constraint imposed by CC and benefiting from pleiotropy, may be conferred a duplicability similar to, or even higher than, the network average and therefore play active roles in network evolution and contribute to the plasticity of biomolecular networks by organizing a limited number of modules to fulfill various cellular functions.
Supplementary Material
Supplementary figures 1-4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgements
We thank Dr. Jingdong Han for critical discussions, Dr. Mark Newman for providing the program of module identification, and Yun Zhou and Hu Chen for valuable comments. This work was supported by grants from the National Natural Science Foundation of China (No.90303017, No.90408019), Hi-Tech Research and Development 863 Program of China (No.2002AA234041), and Foundational Science Research Grant from the 973 project (No.2003CB715900).
Footnotes
Jianzhi Zhang, Associate Editor
References
Bader GD and Hogue CWV. (2002) Analyzing yeast protein-protein interaction data obtained from different sources. Nat Biotechnol 20:991-997.
Blanc G and Wolfe KH. (2004) Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell 16:1679-1691.
Conant GC and Wagner A. (2002) Genomehistory: a software tool and its application to fully sequenced genomes. Nucleic Acids Res 30:3378-3386.
Davis JC and Petrov DA. (2004) Preferential duplication of conserved proteins in eukaryotic genomes. PLoS Biol 2:e55.
Davis JC and Petrov DA. (2005) Do disparate mechanisms of duplication add similar genes to the genome? Trends Genet 21:548-551.
Eisenberg D, Marcotte EM, Xenarios I, Yeates TO. (2000) Protein function in the post-genomic era. Nature 405:823-826.
Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J. (1999) Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:1531-1545.
Fraser HB. (2005) Modularity and evolutionary constraint on proteins. Nat Genet 37:351-352.
Gao L-Z and Innan H. (2004) Very low gene duplication rate in the yeast genome. Science 306:1367-1370.
Gavin A-C, Aloy P, Grandi P, et al. (32 co-authors). (2006) Proteome survey reveals modularity of the yeast cell machinery. Nature 440:631-636.
Gu Z, Cavalcanti A, Chen F-C, Bouman P, Li W-H. (2002) Extent of gene duplication in the genomes of drosophila, nematode, and yeast. Mol Biol Evol 19:256-262.
Gu Z, Steinmetz LM, Gu X, Scharfe C, Davis RW, Li W-H. (2003) Role of duplicate genes in genetic robustness against null mutations. Nature 421:63-66.
Guldener U, Munsterkotter M, Oesterheld M, Pagel P, Ruepp A, Mewes H-W, Stumpflen V. (2006) MPact: the MIPS protein interaction resource on yeast. Nucleic Acids Res 34:D436-D441.
Hartwell LH, Hopfield JJ, Leibler S, Murray AW. (1999) From molecular to modular cell biology. Nature 402:C47-C52.
He X and Zhang J. (2005) Gene complexity and gene duplicability. Curr Biol 15:1016-1021.
He X and Zhang J. (2006) Higher duplicability of less important genes in yeast genomes. Mol Biol Evol 23:144-151.
Hughes AL. (1994) The evolution of functionally novel proteins after gene duplication. Proc R Soc Lond B Biol Sci 256:119-124.
Hughes AL and Friedman R. (2005) Gene duplication and the properties of biological networks. J Mol Evol 61:758-764.
Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y. (2001) A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA 98:4569-4574.
Kondrashov FA and Koonin EV. (2004) A common framework for understanding the origin of genetic dominance and evolutionary fates of gene duplications. Trends Genet 20:287-290.
Kopelman NM, Lancet D, Yanai I. (2005) Alternative splicing and gene duplication are inversely correlated evolutionary mechanisms. Nat Genet 37:588-589.
Lynch M and Conery JS. (2000) The evolutionary fate and consequences of duplicate genes. Science 290:1151-1155.
Marland E, Prachumwat A, Maltsev N, Gu N, Li W-H. (2004) Higher gene duplicabilities for metabolic proteins than for nonmetabolic proteins in yeast and E. coli. J Mol Evol 59:806-814.
Mrowka R, Patzak A, Herzel H. (2001) Is there a bias in proteome research? Genome Res 11:1971-1973.
Newman MEJ. (2006) Modularity and community structure in networks. Proc Natl Acad Sci USA 103:8577-8582.
Papp B, Pal C, Hurst LD. (2003) Dosage sensitivity and the evolution of gene families in yeast. Nature 424:194-197.
Prachumwat A and Li W-H. (2006) Protein function, connectivity, and duplicability in yeast. Mol Biol Evol 23:30-39.
Rubin GM, Yandell MD, Wortman JR, et al. (50 co-authors). (2000) Comparative genomics of the eukaryotes. Science 287:2204-2215.
Ruepp A, Zollner A, Maier D, et al. (11 co-authors). (2004) The funcat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Res 32:5539-5545.
Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D. (2004) The database of interacting proteins: 2004 update. Nucleic Acids Res 32:D449-D451.
Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M. (2006) Biogrid: a general repository for interaction datasets. Nucleic Acids Res 34:D535-D539.
Steinmetz LM, Scharfe C, Deutschbauer AM, et al. (11 co-authors). (2002) Systematic screen for human disease genes in yeast. Nat Genet 31:400-404.
Su Z, Wang J, Yu J, Huang X, Gu X. (2006) Evolution of alternative splicing after gene duplication. Genome Res 16:182-189.
Taylor JS and Raes J. (2004) Duplication and divergence: the evolution of new genes and old ideas. Annu Rev Genet 38:615-643.
Uetz P and Finley RLJ. (2005) From protein networks to biological systems. FEBS Lett 579:1821-1827.
Wagner A. (2003) How the global structure of protein interaction networks evolves. Proc R Soc Lond B Biol Sci 270:457-466.
Yu H, Greenbaum D, Lu HX, Zhu X, Gerstein M. (2004) Genomic analysis of essentiality within protein networks. Trends Genet 20:227-231.