MBE Advance Access originally published online on July 12, 2006
Molecular Biology and Evolution 2006 23(10):1931-1936; doi:10.1093/molbev/msl060
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Research Article |
Feed-Forward Loop Circuits as a Side Effect of Genome Evolution
Department of Theoretical Biology and Bioinformatics, University of Utrecht, Utrecht, The Netherlands
E-mail: o.x.corderosanchez{at}bio.uu.nl.
| Abstract |
|---|
|
|
|---|
In this article, we establish a connection between the mechanics of genome evolution and the topology of gene regulation networks, focusing in particular on the evolution of the feed-forward loop (FFL) circuits. For this, we design a model of stochastic duplications, deletions, and mutations of binding sites and genes and compare our results with yeast network data. We show that the mechanics of genome evolution may provide a mechanism of FFL circuit generation. Our simulations result in overrepresentation of FFL circuits as well as in their clustering around few regulator pairs, in concordance with data from transcription networks. The mechanism here proposed and the analysis of the yeast data show that regulator duplication could have played an important role in FFL evolution.
Key Words: genome evolution regulatory networks mutational dynamics network motifs feed-forward loop yeast
| Introduction |
|---|
|
|
|---|
Gene regulatory networks can be decomposed into several layers starting from their genomic constituents (Babu et al. 2004
k
), with most of the nodes having few connections and a few having many. Nodes in the tail of the power law distribution have an atypically high number of connections (degree) and can be regarded as hubs. These are regulatory proteins, binding to many target genes. The centralized organization imposed by regulatory hubs may be significant for cellular dynamics. For instance, relative to the cell cycle, condition-specific hubs may provide the capacity to facilitate shifts between different phases, whereas nontransient hubs possibly define an interface between cell cycle progression and housekeeping functions (Luscombe et al. 2004
Milo et al. (2002)
have shown that at the local circuitry level, some small subgraphs, called network motifs, are much more abundant than would be expected by chance. The most well-known example is the feed-forward loop (FFL) motif, which is thought to perform an important signaling role (Mangan and Alon 2003
; Mangan et al. 2003
) and whose overabundance has been attributed to incremental acquisition of adaptive single interactions (Conant and Wagner 2003
; Babu et al. 2004
). An example of FFL circuits in yeast is shown in figure 1.
|
The statistical significance of the FFL circuit abundance was calculated by comparing the number of FFLs in an observed network with the average number of FFLs in an ensemble of random networks (Milo et al. 2002
was established to test for overrepresentation. Other studies have shown that some variants of the preferential attachment rule (Baraba
i and Albert 1999
In this article, we set up a model of "neutral" genome mutational dynamics to study its effects on the topology of the gene regulation network. Such mutational dynamics are defined by duplication, deletion, and mutation of genes and BS but without any selection mechanism. The general question is whether the abundance of FFL circuits can be explained as a "signature" of the evolutionary mechanics. Furthermore, we use data of the yeast transcription network (Lee et al. 2002
) to establish differences and similarities with our model.
| Modeling Mutational Dynamics |
|---|
|
|
|---|
Van Noort et al. (2004)
As in Van Noort et al. (2004)
, we initialize a genome G = {X1, X2, ..., Xn}, where each gene Xi has a promoter region Ri = {bsi1, bsi2, ..., bsim}, which is composed of randomly chosen sites that may or may not be functional (bound by a protein). A protein Pi is also associated to each gene. The genome is evolved according to the following events (fig. 2AF): 1) gene duplication: a gene Xi is duplicated to Xj, together with its respective promoter Ri and protein Pi; 2) gene deletion: Xi is removed from G; 3) BS duplication: a gene Xi acquires a new BS bsjk copied from another gene Xj, so
and 4) BS deletion: a TF BS is deleted from the set Ri. In addition, 2 operations are related to divergence of promoter regions and proteins: 5) a BS bsik mutates into
and 6) protein Pi diverges into protein
Both cases imply a mutation of distance 1 in a linear space, introducing the chance of novel elements emerging in the genome. Finally, we simulate genome mutational dynamics in a probabilistic manner: For each gene in the genome, we decide whether to operate at the level of the gene or its BSs, with 5050 chance. In the former case, according to their corresponding probabilities, one of (1), (2), (6), or else no event, is chosen. In the latter, for each BS associated with the gene, one of (3), (4), or (5), or else no event, is chosen according to their probabilities. The default rates are gene dupdel, 1 x 103; BS dupdel, 8 x 103; TF mutation, 5 x 103; and BS mutation, 8 x 104.
|
The network is calculated as follows. A protein becomes a TF (regulator) when it matches a BS of any other gene. Initially, the number of different regulators in the network is determined by the number of different BSs that are randomly distributed among the genes. Binding occurs when the protein falls within an interval of a few (here 2) mutations away from a BS. The transcriptional regulatory network can be calculated (fig. 2G) by establishing a connection from gene Xi to gene Xj if protein Pi matches any bs
BSj, that is to say, if Pi is a TF binding to the promoter region of gene Xj. When a target gene becomes inactive through loss of its functional BSs, it is no longer considered in the network because its degree becomes zero. In turn, a target gene with nonfunctional sites can be reintegrated into the network by BS mutations. In our study, we have initialized simulations with 1,0002,500 genes, 510% regulators, and 13 BSs per gene.
The choice of parameters is based on the following considerations. Based on the idea that single-gene duplication is an ongoing process and that most recent duplicates are lost in small mutations (Kellis et al. 2004
), we assume equal gene duplicationdeletion rates for our simulations (e.g., 1 duplicationdeletion every 1 or 2 generations). BS duplication and deletion rates are also assumed to be in equilibrium, resulting in genomes with stable average size but constant changes in internal organization. It was shown (Van Noort et al. 2004
) that higher BS duplication and deletion rates relative to those of genes were required to obtain better approximations of yeast coexpression data. Previous studies have shown the flexibility of upstream regions and highlighted their importance for fast adaptation (Stone and Wray 2001
; Berg et al. 2004
; Tanay et al. 2005
) and divergence of duplicate gene pairs (Evangelisti and Wagner 2004
; Maslov et al. 2004
). Therefore, in a typical simulation run, BS duplication and deletion rates are from two- to eightfold higher than gene duplication and deletion rates. In turn, BS mutation rates are typically no more than twofold higher than gene duplicationdeletion rates. This allows nonfunctional sites to become functional and vice versa but keeping duplications as the most dominant process in network evolution. On the other hand, considering the small proportion of regulators in our simulations, the rate of protein divergence is two- to fivefold higher than gene duplication and deletion rates, which compensates for losses of regulator families and allows to keep enough TF diversity within the genome. The resulting effective rate of innovation is, in average, 1 novel TF appearing every 10 time steps and a nonfunctional site turning into a functional BS every 5 time steps. Within the ranges we have mentioned, different parameter configurations were considered without affecting our results and conclusions.
| Results |
|---|
|
|
|---|
Global Network Structure
Given the small proportion of regulators, our networks start with a high average number of outgoing connections per TF, for example, a network with 2,000 genes and 100 regulators needs at least an average of 20 BS per regulator. Mutations increase the number of different BS types within the genome, but most of these new site types are nonfunctional, that is, with no protein bound, keeping the actual number of different TF almost unchanged.
The initial setting produces a random graph. However, the mutational dynamics defined in our model transforms this network such that its distribution of connections follows a power law (see fig. 3). Intuitively, the consequence of duplication of target genes and BSs is that some regulators will gain more connections, elongating the tail of the degree distribution, whereas most of the genes keep a low (in-) degree. Figure 4 shows the transformation of a toy network with 100 genes as an effect of the mutational dynamics.
|
|
The exact value of the exponents depends on the proportion of different TFs in the genome. With 2,500 genes and 5% TFs, the exponent that minimizes the sum of squares error is approximately 2. This coincides well with the yeast data (P < 0.001) (Lee et al. 2002
5% and
10%, which keeps a scaling behavior consistent with the data. A more detailed study of the model's limiting distributions in the network and the effect of the number of regulators is left for further research.
Local Network Circuitry
We find a high abundance of FFL circuits in our simulated networks. We observe that after a period of drift around a mean value, the network undergoes "FFL motif avalanches": sudden increases in the number of FFL circuits (fig. 5).
|
An analysis of our simulated genomes reveals that after an avalanche, most of the circuits (
90%) are formed by a large target overlap between few interconnected regulatory hubs (e.g., regulators contributing with
5% of the total number of connections). The mechanism behind this phenomenon is as follows: before the avalanche of circuits, some regulators have undergone duplication and thus bind to an overlapping set of target genes (as shown in fig. 5). A new connection linking these 2 paralog regulators creates as many FFL circuits as the number of coregulated targets.
Local Circuitry in Yeast Network
We investigated whether such a mechanism of coregulation and cross talk between regulatory hubs is present in real regulatory networks and to what extent the FFL circuits can be explained in the same manner as in the model. For this, we looked specifically at the S. cerevisiae network (Lee et al. 2002
) as derived from microarray experiments testing the binding affinity of 106 well-documented regulators. An error model of binding in the data allows the assignment of a P value to the interactions. Our analysis is based on P < 0.001.
In accordance with our simulations, we find that a large number of FFL circuits can be associated with pairs of homolog regulators. Namely, more than 30% of the FFL circuits in yeast are formed by 5 pairs of regulators with significant homology at the protein level (fig. 6A). From these pairs, those that contribute to the most FFL circuits are the cell cycle regulators Swi6Swi4 (forming the SCB binding factor complex [Kumar et al. 2000
; Horak et al. 2002
]) and Mbp1Swi4, both with bidirectional PBlastE < 1x1012 on the S. cerevisiae database. In concordance with the definition of a FFL circuit presented in Milo et al. (2002)
, we do not include the homolog pair Cin5Yap6, connected with a bidirectional regulatory interaction and generating 47 (symmetric) FFL circuits.
|
In general, most of the FFL circuits in the yeast network are linked to a list of 16 regulatory hubs. These regulators bind each to more than 80 targets and together they are involved in almost 50% of all the interactions in the network. In 55% percent of the FFL circuits, both master and secondary regulators are in this list of hubs (fig. 6B), and in 90% of the circuits, at least one regulator belongs to this list. We also find that many of these regulatory hubs are part of a serial circuit of 8 cell cycle transcriptional regulators (Swi4, Swi6, Fkh2, Ace2, Swi5, Mbp1, Mcm1, and Ndd1) that regulate almost 30% of yeast genes (Simon et al. 2001
FFL Significance, Average In-Degree of Regulators and Evolution
When applying the significance test defined by Milo et al. (2004)
on the yeast network (P < 0.001), we find 334 FFL circuits with a Z score of 6.73. This randomization test uses an ensemble of random networks that preserves the number of outgoing and incoming connections in each node. According to Itzkovitz et al. (2003)
, the expected number of FFL circuits in an ensemble of random networks with arbitrary in-degree and out-degree sequence is given by
![]() | (1) |
term, we see that the expected number of FFL circuits depends on the co-occurrence of in-degree and out-degree, that is, it depends on the average in-degree of regulators.
In the yeast P < 0.001 network, the average in-degree of regulators is 0.915, whereas the average in-degree of the whole network is 1.79. This in-degree bimodality renders a low number of expected circuits (170 in comparison with the observed 334), leading to the conclusion that FFL circuits are overrepresented. However, a network with the same number of genes, FFL circuits, and out-degree per regulator, but without the in-degree bimodality, has no FFL circuit overrepresentation because the expected number of circuits increases to
300.
In our simulations, both the presence of circuits and in-degree bimodality evolve together, resulting in networks with FFL circuit overrepresentation. Figure 7 illustrates this. The bimodality here occurs in the same way as in yeast, that is to say, some regulators have zero in-degree (they are not regulated by any other TF) but do have connections to some targets.
|
The process generating the FFL circuits, as described previously, may increase the average connectivity through the duplication of highly connected regulatory hubs. It is also an inherently unstable process in the sense that the circuits are clustered around a few regulatory hubs and depend on a single cross-hub connection. This means that after a certain level of network organization has been reached, many FFL circuits can be gained or lost easily by targeted addition or deletion of single connections, whereas for most random modifications the FFL significance remains unchanged, for example, 20% of the edges added or removed at random has no effect on the significance, the same holding for yeast and E. coli networks (Milo et al. 2002
30% of the cases, in contrast with the 5% expected for random networks. We can understand this better by looking back at figure 4, where we see that evolved networks such as that in figure 4C develop a hierarchical structure that is very different from a random network. Finally, note that many other selection pressures other than maintaining network motifs can conserve the hubs or the interconnections producing the FFL circuits. | Discussion and Conclusions |
|---|
|
|
|---|
We show that a genome evolution model, based on duplications, deletions, and innovations of genes and BSs, provides possible explanations for the generation of FFL circuits and their overrepresentation. In our simulations, circuits are formed by interconnected hubs with overlapping sets of targets. This kind of topology is a general result of the mutational dynamics and does not depend on specific parameter values or simulation schemes.
Overrepresentation of FFL circuits occurs in many types of networks. Our model specifically applies to transcription regulation networks. The circuits generated in our simulations cluster around a few master and secondary regulator pairs with overlapping binding targets, as observed in transcription regulation networks of E. coli and yeast (Dobrin et al. 2004
) but, interestingly, not in other networks with overrepresentation of FFL, for example, the Caenorhabditis elegans neural network (Kashtan et al. 2004
).
The fact that FFL circuits appear in "avalanches" as a side effect of the mutational dynamics shows that selection on individual circuits (Conant and Wagner 2003
; Babu et al. 2004
) is not needed to explain their abundance. Previous tests of whole-motif duplication (Conant and Wagner 2003
) or target-gene duplication (Babu et al. 2004
) led to the conclusion that gene duplication was not responsible for the abundance of circuits. In contrast, our results suggest that gene duplication does play a major role but in relation to regulators, rather than to target genes or whole circuits. Indeed, in yeast, many FFL circuits are formed by a few pairs of highly connected homolog regulators.
Another new perspective on the structure of the transcription network is given by the dependence of the statistical overrepresentation of FFL circuits on the low average in-degree of regulators. We show in our simulations that the mechanics of evolution leads to the generation of FFL circuits as well as a bimodal in-degree distribution and an increase in connectivity. The combined effect is that FFL circuit overrepresentation often occurs. An important bottom line is that evolution is very different from a randomization test, where all but one property is fixed. Instead, evolution changes many properties simultaneously.
| Acknowledgements |
|---|
|
|
|---|
We thank Anton Crombach, Max Heywood, and Nynke Kramer for helpful suggestions concerning this article. We like to thank Daniel van der Post for his thorough review and comments. We are also in debt with the reviewers, whose valuable comments helped to improve this article.
Funding for the research and the Open Access publication charges was provided by the Netherlands Organization for Scientific Research.
| Footnotes |
|---|
Yoko Satta, Associate Editor
| References |
|---|
|
|
|---|
Artzy-Randrup Y, Fleishman SJ, BenTal N, Stone L. 2004. Comment on "network motifs: simple building blocks of complex networks" and "superfamilies of evolved and designed networks". Science 305:1107.
Babu MM, Luscombe NM, Aravind L, Gerstein M, Teichmann SA. 2004. Structure and evolution of transcriptional regulatory networks. Curr Opin Struct Biol 14:28391.[CrossRef][ISI][Medline]
Baraba
i AL, Albert R. 1999. Emergence of scaling in random networks. Science 286:50912.
Berg J, Willmann S, Lassig M. 2004. Adaptive evolution of transcription factor binding sites. BMC Evol Biol 4:42.[CrossRef][Medline]
Conant GC, Wagner A. 2003. Convergent evolution of gene circuits. Nat Genet 34:2646.[CrossRef][ISI][Medline]
Dobrin R, Beg QK, Barabasi AL, Oltvai ZN. 2004. Aggregation of topological motifs in the Escherichia coli transcriptional regulatory network. BMC Bioinformatics 5:10.[CrossRef][Medline]
Evangelisti AM, Wagner A. 2004. Molecular evolution in the yeast transcriptional regulation network. J Exp Zool B Mol Dev Evol 302:392411.[Medline]
Guelzim N, Bottani S, Bourgine P, Kepes F. 2002. Topological and causal structure of the yeast transcriptional regulatory network. Nat Genet 31:603.[CrossRef][ISI][Medline]
Horak CE, Luscombe NM, Qian J, Bertone P, Piccirrillo S, Gerstein M, Snyder M. 2002. Complex transcriptional circuitry at the G1/S transition in Saccharomyces cerevisiae. Genes Dev 16:301733.
Itzkovitz S, Milo R, Kashtan N, Ziv G, Alon U. 2003. Subgraphs in random networks. Phys Rev E Stat Nonlin Soft Matter Phys 68:026127.[Medline]
Kashtan N, Itzkovitz S, Milo R, Alon U. 2004. Topological generalizations of network motifs. Phys Rev E 70:031909.[CrossRef]
Kellis M, Birren BW, Lander ES. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428:61724.[CrossRef][Medline]
Kumar R, Reynolds DM, Shevchenko A, Goldstone SD, Dalton S. 2000. Forkhead transcription factors, Fkh1p and Fkh2p, collaborate with Mcm1p to control transcription required for M-phase. Curr Biol 10:896906.[CrossRef][ISI][Medline]
Lee TI, Rinaldi NJ, Robert F, et al. (21 co-authors). 2002. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298:799804.
Luscombe NM, Babu MM, Yu H, Snyder M, Teichmann SA, Gerstein M. 2004. Genomic analysis of regulatory network dynamics reveals large topological changes. Nature 431:30812.[CrossRef][Medline]
Mangan S, Alon U. 2003. Structure and function of the feed-forward loop network motif. Proc Natl Acad Sci USA 100:119805.
Mangan S, Zaslaver A, Alon U. 2003. The coherent feedforward loop serves as a sign-sensitive delay element in transcription networks. J Mol Biol 334:197204.[CrossRef][ISI][Medline]
Maslov S, Sneppen K, Eriksen KA, Yan KK. 2004. Upstream plasticity and downstream robustness in evolution of molecular networks. BMC Evol Biol 4:9.[CrossRef][Medline]
Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U. 2002. Network motifs: simple building blocks of complex networks. Science 298:8247.
Milo R, Itzkovtiz S, Kashtan N, Levitt R, Shen-Orr S, Ayzenshtat I, Sheffer M, Alon U. 2004. Superfamilies of evolved and designed networks. Science 303:153842.
Simon I, Barnett J, Hannett NA, et al. (11 co-authors). 2001. Serial regulation of transcriptional regulators in the yeast cell cycle. Cell 106:697708.[CrossRef][ISI][Medline]
Stone JR, Wray GA. 2001. Rapid evolution of cis-regulatory sequences via local point mutations. Mol Biol Evol 18:176470.
Tanay A, Regev A, Shamir R. 2005. Conservation and evolvability in regulatory networks: the evolution of ribosomal regulation in yeast. Proc Natl Acad Sci USA 102:72038.
Teichmann SA, Babu MM. 2004. Gene regulatory network growth by duplication. Nat Genet 36:4926.[CrossRef][ISI][Medline]
Van Noort V, Snel B, Huynen MA. 2004. The yeast coexpression network has a small-world, scale-free architecture and can be explained by a simple model. EMBO Rep 5:2804.[CrossRef][ISI][Medline]
![]()
CiteULike
Connotea
Del.icio.us What's this?
This article has been cited by other articles:
![]() |
O. X. Cordero, B. Snel, and P. Hogeweg Coevolution of gene families in prokaryotes Genome Res., March 1, 2008; 18(3): 462 - 468. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||








