MBE Advance Access originally published online on August 19, 2007
Molecular Biology and Evolution 2007 24(10):2344-2353; doi:10.1093/molbev/msm165
Research Articles |
A Long-Term Evolutionary Pressure on the Amount of Noncoding DNA





* Inserm, U571, Paris, France
Laboratoire d'InfoRmatique en Images et Systèmes d'Information, UMR CNRS 5205, INSA-Lyon/Université Claude Bernard Lyon 1, Villeurbanne, France
Laboratoire de Statistiques et Probabilités, INSA-Toulouse, Toulouse, France
Laboratoire de Biologie Fonctionnelle, Insectes et Interactions, UMR INRA/INSA 203 BF2I, INSA-Lyon, Villeurbanne, France
E-mail: guillaume.beslon{at}insa-lyon.fr.
| Abstract |
|---|
A significant part of eukaryotic noncoding DNA is viewed as the passive result of mutational processes, such as the proliferation of mobile elements. However, sequences lacking an immediate utility can nonetheless play a major role in the long-term evolvability of a lineage, for instance by promoting genomic rearrangements. They could thus be subject to an indirect selection. Yet, such a long-term effect is difficult to isolate either in vivo or in vitro. Here, by performing in silico experimental evolution, we demonstrate that, under low mutation rates, the indirect selection of variability promotes the accumulation of noncoding sequences: Even in the absence of self-replicating elements and mutational bias, noncoding sequences constituted an important fraction of the evolved genome because the indirectly selected genomes were those that were variable enough to discover beneficial mutations. On the other hand, high mutation rates lead to compact genomes, much like the viral ones, although no selective cost of genome size was applied: The indirectly selected genomes were those that were small enough for the genetic information to be reliably transmitted. Thus, the spontaneous evolution of the amount of noncoding DNA strongly depends on the mutation rate. Our results suggest the existence of an additional pressure on the amount of noncoding DNA, namely the indirect selection of an appropriate trade-off between the fidelity of the transmission of the genetic information and the exploration of the mutational neighborhood. Interestingly, this trade-off resulted robustly in the accumulation of noncoding DNA so that the best individual leaves one offspring without mutation (or only neutral ones) per generation.
Key Words: adaptive evolution; noncoding DNA; mutation rate; rearrangements; mutational variability; indirect selection
| Introduction |
|---|
Eukaryotic genomes contain many sequences that are not translated into proteins. Although some of these sequences bear the hallmark of natural selection and are thus presumed to be functional (Duret et al. 1993
Sequences acquired in such a nonadaptive way can then provide novel substrates for evolutionary innovations (Brosius and Gould 1992; Smit 1999; Lynch and Conery 2003). For instance, mRNA-derived retroposons can give rise to active genes (Brosius 2003). Furthermore, even when they remain nonfunctional, sequences present in several copies promote genomic rearrangements that can affect the phenotype (Hughes 1999; Kidwell 2002; Rocha 2003; Coghlan et al. 2005). Thus, sequences that are nonfunctional in a particular organism may nonetheless play a major role in the appearance of nonneutral mutations, leading to new phenotypes in the offspring of this organism.
Now the level of nonneutral genetic variation is a key element for the long-term evolutionary success of a lineage. On the one hand, variability is a prerequisite for evolvability, the ability to innovate (Wagner and Altenberg 1996; Kirschner and Gerhart 1998; Radman et al. 1999; Burch and Chao 2000; Wagner 2005). On the other hand, the long-term evolutionary success also requires that a sufficient proportion of the offspring keep the ancestral phenotype by bearing no mutation or only neutral ones (Van Nimwegen et al. 1999; Wilke 2001a, 2001b; Wilke et al. 2001). Indeed, if the ancestral fitness cannot be retained from one generation to the next because deleterious mutations are too frequent, the lineage will face a heavy mutational burden that can lead to extinction. Taken together, these considerations imply that competing organisms need to achieve not only a high fitness but also an appropriate level of nonneutral genetic variation, reflecting a trade-off between the exploration of new phenotypes and the reliable transmission of the current one.
As nonfunctional sequences are not under immediate selection, their number can easily vary, which could be a way to reach the appropriate level of nonneutral variation. If this hypothesis is correct, the amount of nonfunctional sequences may not just be the passive result of mutational processes. The long-term selection of an appropriate variability may exert a selective pressure on the amount of nonfunctional DNA. This selective pressure would be indirect because varying the amount of nonfunctional DNA would not change the immediate fitness of the organism but would rather modify the chance that its offspring retain it.
This fairly simple hypothesis is, however, hard to test for several reasons. The first is the indirect, long-term nature of this selective pressure: Many generations would be necessary to reveal its effect. A second and perhaps more serious obstacle is the difficulty of isolating this effect from the other evolutionary pressures acting on the amount of nonfunctional DNA, including mutational biases and direct selective constraints on genome size. Finally, the long-term selection of an appropriate frequency of nonneutral mutations can also act at levels other than genome compactness. It can lead, for example, to more or less robust topologies for regulatory networks and metabolic pathways. Thus, testing the hypothesis of an indirect selective pressure on nonfunctional DNA requires a specific approach, allowing us to isolate its effects. In silico experimental evolution of simple "organisms" is particularly useful in this context (Adami 2006). Direct selective pressures are controlled and mutational biases can be turned off. Moreover, the exact knowledge of lineages, ancestral sequences, and fixed mutations allows for a detailed analysis of the evolutionary mechanisms.
However, in previous in silico experiments designed to study long-term evolutionary forces, the effects on the genomic structure could not be predicted because only one gene was modeled (Eigen 1971) or because the genome representation did not explicitly include the notions of gene, gene product, and intergenic sequences (Eigen 1971; Wilke 2001b; Wilke et al. 2001). In other models involving a more realistic genome architecture (Wu and Lindsay 1996; Burke et al. 1998), the complexity of the phenotype was not allowed to evolve with the complexity of the genotype. As a consequence, unrealistic heuristics were used in the transition from genotype to phenotype, which introduced artifactual effects on the evolution of genome size.
Here, we study the evolution of artificial organisms where the genomic structure is biologically interpretable and where the complexity of the phenotype is allowed to evolve. This allows us to investigate the spontaneous evolution of genome size, that is, without either direct selection on genome size, or mutational biases, or self-replication of selfish elements. We show that in these conditions, the amount of noncoding sequences maintained in the genome, far from being random, is determined by the long-term selection of an appropriate level of nonneutral variation. This indirect selective pressure is at the origin of a strong relationship between the mutation rate and the amount of nonfunctional DNA contained in the artificial genomes.
| Materials and Methods |
|---|
These in silico experiments were performed on the "aevol" platform (Knibbe et al. 2007
General Principles
The simulated organisms have circular, double-strand binary genomes containing both coding and noncoding sequences (fig. 1). Each coding sequence encodes a "protein," able to either activate or inhibit a number of functions. The phenotype is defined as the set of functional abilities of the organism, resulting from the combination of all its proteins. Adaptation is then measured by comparing the functions the organism can achieve to the functions to be performed and to be avoided in the environment. During replication, genomes can undergo not only point mutations, small insertions and deletions but also genomic rearrangements, consisting of duplications, deletions, translocations, and inversions.
Detection of the Coding Sequences
Promoter and terminator signals define the boundaries of the transcribed regions. Within them, start and stop signals delimit the coding sequences. Promoters are sequences whose Hamming distance d with a predefined 28-bp consensus sequence satisfies d ≤ dmax, with dmax = 4 in this study. Terminator signals are sequences able to form a stem–loop secondary structure:
The expression level of a transcribed region is defined as
Translation and Phenotype Computation
A global set of feasible functions is defined as the real interval
The functional abilities of each gene product are represented by a fuzzy subset
The possibility distributions of these subsets are piecewise linear with "triangular" shapes, with a maximal possibility degree
for the function m (fig. 1). The 3 real parameters m, w, and h are encoded by the coding sequence. Each coding sequence is read codon by codon using the genetic code shown in figure 1. This genetic code is not degenerate, in order to prevent robustness at this level from interfering with the effect of the noncoding sequences. The run of codons m0 and m1 (respectively w0 and w1, h0 and h1) forms a Gray encoding of m (respectively w, h). The sign of h determines whether the gene product activates or inhibits the functions
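As a rough illustration of the Gray encoding mentioned above, the sketch below decodes a run of Gray-coded bits (most significant bit first) into an integer. The function name `gray_to_int` is hypothetical, and the way the platform rescales the resulting integer into the real parameters m, w, and h is not described here, so that step is deliberately left out.

```python
def gray_to_int(bits):
    """Decode a Gray-coded bit sequence (most significant bit first).

    Each decoded binary bit is the XOR of the current Gray bit and the
    previously decoded binary bit.
    """
    value = 0
    for bit in bits:
        value = (value << 1) | (bit ^ (value & 1))
    return value


# Hypothetical codon run: the bits carried by three successive m0/m1 codons.
decoded = gray_to_int([1, 1, 0])
```

A property of this encoding is that two consecutive integers differ by a single bit, so a single point mutation in such a run can move the parameter to an adjacent value.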
The functional abilities of the organism as a whole are the fuzzy set of functions that are activated and not inhibited by its proteins:
where Ai is the subset of the ith activator protein and Ij the subset of the jth inhibitor protein. Lukasiewicz fuzzy operators are used to compute the possibility distribution P(x) of this set, which represents the phenotype of the organism.
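The combination of activator and inhibitor proteins described above can be sketched as follows. This is a minimal illustration, not the platform's code: it assumes that m is the centre of each triangle, w its half-width, and |h| its peak height, and it uses the standard Lukasiewicz operators (bounded sum for union, max(0, a + b − 1) for intersection, 1 − a for negation). All function names are hypothetical.

```python
def triangle(x, m, w, h):
    """Possibility degree at x of a triangular fuzzy subset.

    Assumption: m is the centre, w the half-width, h the peak height (>= 0).
    """
    if w <= 0:
        return 0.0
    return h * max(0.0, 1.0 - abs(x - m) / w)


def luk_or(a, b):
    """Lukasiewicz union (bounded sum)."""
    return min(1.0, a + b)


def luk_and(a, b):
    """Lukasiewicz intersection."""
    return max(0.0, a + b - 1.0)


def phenotype(x, activators, inhibitors):
    """P(x): degree to which x is activated AND NOT inhibited."""
    act = 0.0
    for m, w, h in activators:
        act = luk_or(act, triangle(x, m, w, h))
    inh = 0.0
    for m, w, h in inhibitors:
        inh = luk_or(inh, triangle(x, m, w, h))
    return luk_and(act, 1.0 - inh)


# Hypothetical proteins, given as (m, w, |h|) triples.
activators = [(0.3, 0.1, 0.8), (0.35, 0.1, 0.5)]
inhibitors = [(0.3, 0.05, 0.4)]
p_at_peak = phenotype(0.3, activators, inhibitors)
```

With these operators, overlapping activators saturate at 1 rather than adding up indefinitely, and an inhibitor simply subtracts its possibility degree from the activated level.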
Adaptation Measure
The abilities required to survive in the environment are also modeled by a fuzzy set E, whose possibility distribution E(x) is shown in figure 2. Adaptation is then measured by the gap g between the possibility distributions E(x) and P(x). Note that although this adaptation measure penalizes both the under- and the overrealized functions, it does not prevent increases in gene number. There is indeed a constant need for new activator and inhibitory genes to refine the phenotypic distribution P(x).
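To make the idea of a gap that penalizes both under- and overrealized functions concrete, the sketch below assumes that the gap is the area between the two curves, that is, the integral of |E(x) − P(x)| over the interval of functions, taken here to be [0, 1]. Both the formula and the interval are assumptions made for illustration, and the name `gap` is hypothetical.

```python
def gap(env, phen, lo=0.0, hi=1.0, steps=1000):
    """Approximate the area between env(x) and phen(x) with the midpoint rule."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += abs(env(x) - phen(x)) * dx
    return total


def target(x):
    """Hypothetical flat environmental target."""
    return 0.5


def under(x):
    """A phenotype that realizes nothing."""
    return 0.0


def over(x):
    """A phenotype that overshoots the target everywhere."""
    return 0.8


gap_under = gap(target, under)
gap_over = gap(target, over)
```

Under this assumption, an organism that overshoots the target is penalized in the same way as one that falls short, which is why inhibitory genes can be as useful as activators for closing the gap.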
Mutations
Every time a genome is replicated, it can undergo point mutations, small indels (1–6 bp), inversions, translocations, large deletions, and duplications. The mutation algorithm proceeds as follows. When a genome of length L is replicated, we first draw the 4 numbers of rearrangements it will undergo. These 4 numbers all follow the binomial law B(L, urearr), where urearr is the per-base rate for the 4 types of rearrangement. Hence, the genome undergoes on average urearr L inversions, urearr L translocations, urearr L large deletions, and urearr L duplications (letting larger genomes undergo more rearrangements per replication is a simple way to account for the fact that they contain more repeated sequences, while avoiding a time-consuming similarity search). Then, all these rearrangements are performed in a random order. To perform, for instance, a large deletion, 2 breakpoints p1 and p2 are chosen randomly (uniformly) on the chromosome, and the segment ranging from p1 to p2 in the clockwise sense is excised. In a similar manner, the boundaries of the duplicated, inverted, and translocated segments, as well as the reinsertion points for the translocated and duplicated segments, are also chosen uniformly on the chromosome. Once all the rearrangements have been performed, the new chromosome length is called L' and we draw the 3 numbers of local mutations (point mutations, small insertions, and small deletions). They all follow the binomial law B(L', uloc), where uloc is the per-base rate for the 3 types of local mutations. We finally perform all the local events in a random order, the affected positions being again randomly chosen. All the mutation rates were first set to the same per-base pair value, uloc = urearr = u (with u = 5 × 10⁻⁶, 10⁻⁵, 2 × 10⁻⁵, 5 × 10⁻⁵, 10⁻⁴, or 2 × 10⁻⁴), in order not to give a priori more importance to a specific category of genetic change. Then, we ran additional simulations where the rate of the local events, uloc, was either smaller or larger than the rate of large-scale rearrangements, urearr.
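The rearrangement step can be sketched for one event type, the large deletion. The sketch assumes that the number of events is drawn from a binomial distribution with L trials and success probability urearr, which matches the stated mean of urearr L events per replication. The function names are hypothetical, the genome is represented as a plain string, and the convention that p1 = p2 removes nothing is an arbitrary choice made here.

```python
import random


def binomial(n, p, rng):
    """Draw from Binomial(n, p) by summing n Bernoulli trials (simple, not fast)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def excise(genome, p1, p2):
    """Remove the segment running clockwise from p1 to p2 on a circular genome."""
    if p1 <= p2:
        return genome[:p1] + genome[p2:]
    # The segment wraps past the origin: only the stretch from p2 to p1 remains.
    return genome[p2:p1]


def replicate_with_deletions(genome, u_rearr, rng):
    """Apply a binomially distributed number of large deletions to a copy."""
    n_deletions = binomial(len(genome), u_rearr, rng)
    for _ in range(n_deletions):
        if len(genome) == 0:
            break
        p1 = rng.randrange(len(genome))
        p2 = rng.randrange(len(genome))
        genome = excise(genome, p1, p2)
    return genome


rng = random.Random(0)
parent = "01" * 2500  # 5,000-bp binary genome
child = replicate_with_deletions(parent, 1e-4, rng)
```

Because both breakpoints are uniform on the chromosome, the expected length of the excised segment is proportional to the genome length, a property that matters for the analysis of neutral offspring in the Results.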
Initialization
To initialize each population, random genomes of 5,000 bp were tested until one was found whose phenotype narrows the gap g, due to at least one beneficial gene. The whole population was seeded with that single genome. The number of trials required to get a suitable genome can be used to estimate the probability of finding a functional gene by chance in a random sequence. On average, 610 genomes of 5,000 bp were tested before getting a suitable one, which means that a functional gene is found every 3,050,000 bp (on average) in a random sequence. This shows, albeit indirectly, that local mutations in the intergenic sequences have a low probability of creating new genes ex nihilo.
Evolution of the Population
The population size, N, is fixed and organisms reproduce asexually, according to their adaptation. To control the selective pressure, and to keep it constant throughout the evolution period (Whitley 1989), we used an exponential ranking selection scheme (Blickle and Thiele 1996): The expected number of offspring of a given organism is an exponential function of its rank in the population. Thus, at each generation, the N organisms were sorted from the least adapted to the best adapted. Their numbers of offspring then followed the multinomial law with N = 1,000 trials and reproduction probabilities that depend on r, the rank of the organism. The parameter c is the curvature of the relationship between the rank and the probability of reproduction; hence, it controls the efficiency of the selection. The closer c is to 1, the less efficient the selection. This selection scheme allows us to test various selection efficiencies while keeping the population size tractable. We tested 4 values for c (0.9900, 0.9950, 0.9980, and 0.9995). For each combination of u and c, we tested 3 populations of N = 1,000 organisms. Supplementary text S2 (Supplementary Material online) presents additional experiments that were performed under a more classical selection scheme, where the probability of reproduction of an individual directly depends on its adaptation measure g rather than on its rank in the population.
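The selection step can be sketched as follows, assuming the usual form of exponential ranking, in which the organism of rank r (1 for the least adapted, N for the best) reproduces with probability (c − 1) / (c^N − 1) × c^(N − r). This formula is an assumption made for the sketch; it is consistent with the expected numbers of reproductive trials quoted in the Results for c = 0.9900 and c = 0.9995. The function names are hypothetical.

```python
import random


def reproduction_probabilities(n, c):
    """Reproduction probability for ranks 1 (worst) to n (best).

    Assumed form: p_r = (c - 1) / (c**n - 1) * c**(n - r).
    """
    norm = (c - 1.0) / (c ** n - 1.0)
    return [norm * c ** (n - r) for r in range(1, n + 1)]


def draw_offspring_counts(probs, rng):
    """One multinomial draw: len(probs) reproductive trials spread over the ranks."""
    n = len(probs)
    counts = [0] * n
    for parent in rng.choices(range(n), weights=probs, k=n):
        counts[parent] += 1
    return counts


probs = reproduction_probabilities(1000, 0.99)
expected_best = 1000 * probs[-1]  # expected reproductive trials of the best organism
counts = draw_offspring_counts(probs, random.Random(1))
```

Because the probabilities depend only on ranks, the selective pressure stays the same whether the differences in the gap g between organisms are large or small.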
Estimates of the Fraction of Neutral Offspring
The theoretical estimates of the fraction of neutral offspring were computed for the final fittest organism by considering the transcribed regions (including their promoters and terminators) as the coding units, overlapping regions being merged into a single unit (see supplementary text S1, Supplementary Material online). Empirical estimates were obtained by generating 1,000 offspring for each final fittest organism, with the same mutation rate, u, as during the evolution period and by counting the number of offspring that retained the same gap g.
| Results |
|---|
To study the spontaneous evolution of the amount of nonfunctional DNA, we allowed 72 asexual populations to evolve for 20,000 generations under various mutation rates combined with various selection efficiencies.
Relation between the Mutation Rate and Genome Compactness
The initial genomes contained only one gene. In all cases, the very first generations were characterized by duplication-divergence events, allowing the organisms to acquire new functional capabilities and to reduce the gap g with the environment (fig. 2). Then, after a few thousand generations, both the gene number and the amount of noncoding sequences reached equilibrium (fig. 3A). The equilibrium values were independent of the initial genome size (data not shown) but strongly dependent on the mutation rate (figs. 2 and 3B). It has been suggested that as most mutations are deleterious, the per-base pair mutation rate can impose an upper limit to the number of genes (Eigen 1971; Maynard-Smith 1983; Hurst 1995; Pal and Hurst 2000) and this is indeed what happened here. The higher the mutation rate, the lower the number of genes at equilibrium (figs. 2 and 3B) and the higher the gap with the target (supplementary fig. S1, Supplementary Material online). However, more surprisingly, our experiments show that the mutational pressure also acted on the amount of noncoding sequences (figs. 2 and 3B). Under high mutation rates, the evolved genomes resembled viral ones, with overlapping genes and almost no noncoding sequences (fig. 2B). Under low mutation rates, the genomes contained high proportions of noncoding sequences (fig. 2A), up to 97% of the genome here. This implies that during adaptive evolution, large amounts of noncoding sequences can accumulate in the absence of self-replicating elements and without a predominance of insertions over deletions, if the per-base pair mutation rate is low. To further test this strong relationship between the mutation rate and the architecture of the genome, we changed the per-base pair mutation rate after 10,000 generations. This caused the genomes to evolve quickly toward the size corresponding to the new mutation rate (supplementary fig. S2, Supplementary Material online). We observed this tight coupling for the 4 selection efficiencies tested, the genomes being larger overall under stronger selection (fig. 3B).
Role of the Noncoding Sequences in the Mutational Variability of the Phenotype
To test whether the indirect selection of a specific level of variability could underlie this coupling, we investigated the role of genome compactness in the mutational variability of the phenotype. One indicator of this variability is the fraction of "neutral offspring" (Ofria et al. 2003). This fraction can be approximately calculated using the probability that no functional region mutates during replication

[Equation 1]

which involves, for each mutation type j, the probability that a random mutation of type j does not affect any transcribed region. This probability can be computed for each type of mutation:

[Equation 2]

It depends on the lengths of the intergenic sequences between consecutive functional regions i and i + 1 (see supplementary text S1, Supplementary Material online for the details of this derivation). These equations show that, for a given mutation rate, longer intergenic sequences lower the fraction of neutral offspring (fig. 4) and hence promote the exploration of new phenotypes. There are 2 reasons for this. The first is that when new noncoding bases are acquired, the genome undergoes more mutational events. The second is that longer intergenic sequences do not make duplications and large deletions more neutral (eq. 2 and fig. 4). Indeed, contrary to the other types of mutation, their deleterious effects are not concentrated on a few points. Here, the average length of the rearranged segments increases with genome length, which implies that a duplication or a large deletion is not more likely to be neutral when intergenic sequences grow. Longer genomes undergo, however, more duplications and deletions per replication. The net effect of longer intergenic sequences is that genes have a higher probability of being deleted or duplicated at each replication. In short, intergenic sequences promoting large deletions and duplications are mutagenic for the genes they surround. Thus, longer intergenic sequences tend to enhance the level of nonneutral variation, that is, the mutational variability of the phenotype.
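The argument for large deletions can be checked on a toy case. The sketch below is not the derivation of supplementary text S1; it is a simplified calculation under stated assumptions: n equally sized functional regions separated by equal intergenic gaps on a circular genome, two uniform breakpoints per deletion, a deletion counted as neutral only when the excised segment lies entirely within one gap, and a binomial number of deletions per replication. In that setting the chance that a single deletion is neutral is about n × gap² / (2 L²), which never exceeds 1/(2n) however long the gaps become, whereas the number of deletions grows with L. All names are hypothetical.

```python
def neutral_deletion_prob(n_genes, gene_len, gap_len):
    """Chance that one deletion with uniform breakpoints removes only gap bases.

    Both breakpoints must fall in the same gap, with p2 clockwise of p1
    (continuous approximation).
    """
    genome_len = n_genes * (gene_len + gap_len)
    return n_genes * gap_len ** 2 / (2.0 * genome_len ** 2)


def fraction_without_harmful_deletion(u, n_genes, gene_len, gap_len):
    """Probability that a replication undergoes no nonneutral large deletion.

    Each of the L positions triggers a deletion with probability u, and each
    deletion is nonneutral with probability 1 - nu.
    """
    genome_len = n_genes * (gene_len + gap_len)
    nu = neutral_deletion_prob(n_genes, gene_len, gap_len)
    return (1.0 - u * (1.0 - nu)) ** genome_len


# Same 10 genes of 500 bp, with short or long intergenic sequences.
f_short = fraction_without_harmful_deletion(1e-4, 10, 500, 100)
f_long = fraction_without_harmful_deletion(1e-4, 10, 500, 5000)
```

In this toy setting, lengthening the gaps from 100 to 5,000 bp lowers the fraction of replications that escape a harmful deletion, which is the sense in which intergenic sequences are mutagenic for the genes they surround.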
Indirect Selection of a Constant Level of Mutational Variability
In our model, noncoding sequences are not under direct selection; hence, their sizes can be easily increased and compensate for a low mutation rate or, conversely, decreased and compensate for a high mutation rate. To test whether this is what actually happened in our experiments, we calculated, for each run, the fraction of neutral offspring of all ancestors of the final best individual, using equations 1 and 2. As shown in supplementary figure S3 (Supplementary Material online), this fraction stabilizes quickly, after fewer than 5,000 generations. The final values are shown in figure 5A. We also computed empirical estimates of the final fraction (by simulating 1,000 independent replications of the final fittest individual, see Materials and Methods). These empirical values are shown in figure 5B. Both methods agree well and show that for a given selection efficiency, the evolved organisms exhibit roughly the same fraction of neutral offspring whatever the mutation rate (fig. 5A and B). Thus, for each of the 4 selection efficiencies, the same level of mutational variability was indirectly selected. Under a low (respectively high) mutation rate, the organisms that exhibited the selected level of variability were those with a large (respectively compact) genome. Hence, the indirect selection of 4 specific levels of variability drove the evolution of genome compactness in the 4 data sets and underlies the 4 observed relationships between the mutation rate and the amount of nonfunctional DNA.
To test the generality of this principle, we ran additional experiments, where the rate of local mutations uloc and the rate of rearrangements urearr were allowed to differ (all these experiments were run under the same selection intensity, c = 0.9980). As shown in table 1, the evolved fraction of neutral offspring is of the same order when uloc > urearr, when uloc < urearr, and when uloc = urearr. This suggests that the fraction of neutral offspring is a general criterion driving the spontaneous evolution of genome compactness.
Respective Roles of the Local Mutations and the Rearrangements
Table 1 also shows that uloc and urearr both influence genome compactness and that they do so in the same direction. A higher rate of local mutations leads to more compact genomes and so does a higher rate of rearrangements. Conversely, either a lower rate of local mutations or a lower rate of rearrangements leads to a larger genome. This means that if either uloc or urearr is changed, this is compensated for by changing the number of genes, NG, and the lengths of the intergenic sequences (see eqs. 1 and 2). These additional experiments raise an interesting point: Although the existence of duplications and large deletions is indispensable for the effect to take place (without them, changing the intergenic lengths would have no effect on the fraction of neutral offspring), it is not mandatory to change their own rate to get an effect on the amount of noncoding DNA. Changing only the local mutation rate suffices to induce an effect. By changing uloc, the left term in equation 1 is modified and it is compensated for in the right term by changing the intergenic lengths. This effect is, however, smaller than the effect induced by directly changing the rate of rearrangements (table 1).
What Determines the Selected Level of Variability?
There remains an important question: What determines the value of the selected level of variability and why does this value depend on the selection intensity? The intensity of the selection, c, sets the relative probability of reproduction of the best individuals compared with the least adapted. When c = 0.9900 (efficient selection), the best individual gets an average of W = 10 reproductive trials. Now, the lineage can persist only if at least one of these offspring retains the ancestral phenotype, that is, if W times the fraction of neutral offspring is at least 1. Hence, the fraction of neutral offspring must be greater than 1/W, which means here that at least 10% of the offspring must bear no mutation or only neutral ones. Let us consider now a weaker selection, where the best individuals do not get many more reproductive trials than the worst ones. If c = 0.9995, for example, the expected number of reproductive trials of the best individual is as low as W = 1.27; hence, the fraction of neutral offspring must be greater than 79%. These examples show that the intensity of the selection indirectly determines a lower bound for the fraction of neutral offspring, and hence, for a given mutation rate, an upper bound for the number of genes and for the amount of noncoding sequences (eqs. 1 and 2).
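The bound 1/W can be worked out for the 4 selection intensities tested. The sketch below assumes the usual exponential ranking formula, under which the top-ranked organism in a population of N reproduces with probability (1 − c) / (1 − c^N), so that W = N (1 − c) / (1 − c^N). This formula is an assumption made for the sketch; it reproduces the values W = 10 and W = 1.27 quoted above. The function name is hypothetical.

```python
def best_expected_offspring(n, c):
    """Expected number of reproductive trials W of the top-ranked organism.

    Assumed form: W = n * (1 - c) / (1 - c**n).
    """
    return n * (1.0 - c) / (1.0 - c ** n)


bounds = {}
for c in (0.9900, 0.9950, 0.9980, 0.9995):
    w = best_expected_offspring(1000, c)
    bounds[c] = (w, 1.0 / w)  # (W, lower bound on the fraction of neutral offspring)
    print(f"c = {c:.4f}  W = {w:.2f}  minimal neutral fraction = {1.0 / w:.2f}")
```

The weaker the selection (c closer to 1), the smaller W and the larger the share of offspring that must remain neutral for the lineage to persist.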
It is harder to explain why the evolved fraction of neutral offspring is actually always almost equal to its lower bound, 1/W (fig. 5). This means that all the 72 successful organisms share one property: When they reproduce, their progeny contains one neutral offspring, that is, the minimum ensuring the persistence in the following generation, but not more. In other words, for each of them, the genome is as large as possible given 1) the maximal number of reproductive trials it can expect and 2) the per-base pair mutation rate it undergoes. To make sure that this is not due to a hidden mutational bias toward genome growth, we monitored the evolution of genome size without any selection. In all cases, the genomes lost all their genes and shrank to less than 100 bp (supplementary fig. S4, Supplementary Material online), which suggests that in the standard runs the genome size is actively maintained by the selective pressure. This could basically be the direct selective pressure to close the gap g, which may tend to favor genomes with many genes and hence with a low fraction of neutral offspring. This could also reflect the indirect selection of the lineages that were variable enough to explore new phenotypes and to sometimes discover fitter ones (or rediscover a fit phenotype after a deleterious mutation): in the lineage of the final best individual, beneficial mutations keep occurring even in the last 1,000 generations (data not shown). Such a pressure would not only favor high gene numbers, but could also favor high amounts of noncoding sequences (figs. 2 and 4). Thus, the empirical "rule" of one neutral offspring as a key to long-term evolutionary success most likely reflects a trade-off between a sufficient fidelity of the transmission of the phenotype, a sufficient ability to explore new phenotypes, and a sufficient fitness.
To test whether this principle still applies under a more realistic selection scheme, we performed all experiments again under a "fitness-proportionate" selection scheme. In these experiments, the probability of reproduction of an individual directly depended on the absolute value of its gap g with the environment, rather than on its rank in the population (see Materials and Methods). The evolved fractions of neutral offspring were again of the order of 1/W, which means that the successful individuals were again those that produce one neutral offspring when they reproduce (see supplementary text S2, Supplementary Material online for more details). Besides, we obtained the same type of relationship between the mutation rate and the amount of noncoding positions (supplementary text S2, Supplementary Material online). This data set confirms that the previous results are not an artifact of the ranking selection scheme.
| Discussion |
|---|
Taken together, our experiments and the mathematical analysis show that a specific level of mutational variability is indirectly selected, which in turn induces the selection of a specific amount of noncoding sequences, depending on the mutation rate and the selection efficiency. This does not require the evolutionary process to be farsighted. Nor does it require that selection acts on a group level. In our experiments, selection acted only on the individuals. Individuals whose phenotypes are not robust enough undergo deleterious mutations and disappear, whereas individuals whose phenotypes are not variable enough are outcompeted by those that were able to discover innovations. In our experiments, the long-term evolutionary success requires that one of the offspring produced at each generation retains the phenotype of its progenitor, which reflects this trade-off between the exploration of new phenotypes and the reliable transmission of the current one.
What are the consequences of the selection of a specific variability level on genome compactness? It depends on the contribution of nonfunctional DNA to the variability level. Here, the simple mutational patterns we used allowed us to describe the relationship between genome structure and variability by simple equations (see eqs. 1 and 2 and fig. 4). In the tested situation, the fraction of neutral offspring decreases when additional noncoding bases are acquired because 1) more rearrangements occur and 2) the average size of the rearranged segments increases. The former effect is plausible if the number of repeated elements increases with genome size, which seems to hold for both bacterial and eukaryotic genomes (Achaz et al. 2001; Achaz et al. 2002). The latter is a consequence of the uniform distribution we assumed here for the size of the spontaneous rearrangements. Is such a distribution relevant for living species? Although comparative genomics approaches can reveal the size distribution of the fixed rearrangements, it is extremely difficult to assess the size distribution of all the spontaneous rearrangements that occur in evolving populations. Indeed, in living organisms, illegitimate recombination, site-specific recombination, general homologous recombination, gene amplification by retroposition, and whole-genome duplications all contribute to genome dynamics at different levels (Hughes 1999; Rocha 2003; Cannon et al. 2004; Dujon et al. 2004). Hence, one can expect that each species, depending on its mutational patterns, exhibits its own complex, probably multimodal, size distribution. Our choice of a uniform distribution basically reflects the lack of knowledge in this area. However, we expect that in qualitative terms, the dynamics of noncoding DNA does not depend on the specific distribution of segment size, provided that the average segment size increases with genome size.
Aside from genome structure and the variety of mutational patterns, many other factors can influence the level of mutational variability of a living organism. From the robustness of protein folding to the robustness of developmental pathways, a multitude of mechanisms modulate the fraction of neutral mutations (Wagner 2005). Recombination also plays a major role in variation and may have its own effect on the length of the noncoding sequences (Comeron 2001). Thus, the relationship between genome size and the fraction of neutral offspring can be more complex in a living species than in our experiments.
As a consequence, the link we underscore here between the mutation rate and genome compactness may be difficult to reveal experimentally in living species. It is, however, noteworthy that the relationship we obtained is qualitatively consistent with Drake's (1991) data, gathered for several microbial species from phage to fungi. Our results may provide an explanation for the constant genome-wide mutation rate he observed. It may reflect the indirect selection of the genome structure that allows for the best trade-off between a reliable transmission of the genetic information and the exploration of the mutational neighborhood. If the tested species share roughly the same mutational patterns, the same selective pressure and similar mutational robustness at other levels than the genome, then we can indeed expect them to appear on the same line on the log-log plot of genome size versus mutation rate (fig. 3). This pattern cannot be seen when unicellular and multicellular species are mixed (Lynch 2006), probably because the transition to multicellularity has introduced fundamentally new mechanisms of mutational robustness, like a cellular selection in the germ line.
| Conclusion |
|---|
|
|
|---|
These in silico experiments shed light on a long-term evolutionary pressure that can drive the loss or the accumulation of noncoding sequences. These results show that under low mutation rates, a large amount of noncoding sequences can be maintained despite the absence of mutational biases or proliferation of "selfish" elements. A forthcoming challenge is the design of in vitro or in vivo experiments that could assess the strength of this spontaneous, long-term evolutionary dynamics compared with more immediate pressures like the self-replication of transposable elements. Furthermore, with the evidence that indirect selective pressures shape genome structure in silico, it is relevant to search for the hallmark of such pressures at all levels between genotype and phenotype, from protein sequence to gene networks and developmental pathways.
| Supplementary Material |
|---|
|
|
|---|
Supplementary texts S1 and S2 and figures S1, S2, S3, and S4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
|
|
|---|
We thank F. Taddéi, E. Rocha, C. Adami, L. Duret, V. Daubin, J. Lobry, S. Mousset, H. Charles, and H. Soula for comments on the manuscript. This work is part of the Biologie des Systèmes et Modélisation Cellulaire project. It is supported by the Rhône-Alpes region, the Bioinformatics Program of INSA Lyon, and the Rhône-Alpes Complex Systems Institute.
| Footnotes |
|---|
Sudhir Kumar, Associate Editor
| References |
|---|
|
|
|---|
Achaz G, Netter P, Coissac E. Study of intrachromosomal duplications among the eukaryote genomes. Mol Biol Evol (2001) 18:2280–2288.
Achaz G, Rocha EPC, Netter P, Coissac E. Origin and fate of repeats in bacteria. Nucleic Acids Res (2002) 30:2987–2994.
Adami C. Digital genetics: unravelling the genetic basis of evolution. Nat Rev Genet (2006) 7:109–118.
Andolfatto P. Adaptive evolution of non-coding DNA in Drosophila. Nature (2005) 437:1149–1152.
Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, Mattick JS, Haussler D. Ultraconserved elements in the human genome. Science (2004) 304:1321–1325.
Blickle T, Thiele L. A comparison of selection schemes used in evolutionary algorithms. Evol Comput (1996) 4:361–394.
Brosius J. How significant is 98.5% junk in mammalian genomes? Bioinformatics (2003) 19:ii35.
Brosius J, Gould SJ. On "genomenclature": a comprehensive (and respectful) taxonomy for pseudogenes and other "junk DNA". Proc Natl Acad Sci USA (1992) 89:10706–10710.
Burch CL, Chao L. Evolvability of an RNA virus is determined by its mutational neighbourhood. Nature (2000) 406:625–628.
Burke DS, De Jong KA, Grefenstette JJ, Ramsey CL, Wu AS. Putting more genetics into genetic algorithms. Evol Comput (1998) 6:387–410.
Cannon SB, Mitra A, Baumgarten A, Young ND, May G. in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol (2004) 1:4–10.
Coghlan AG, Eichler EE, Oliver SG, Paterson AH, Stein L. Chromosome evolution in eukaryotes: a multi-kingdom perspective. Trends Genet (2005) 21:673–682.
Comeron JM. What controls the length of noncoding DNA? Curr Opin Genet Dev (2001) 11:652–659.
Denver DR, Morris K, Lynch M, Thomas WK. High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome. Nature (2004) 430:679–682.
Dermitzakis ET, Reymond A, Antonarakis SE. Conserved non-genic sequences—an unexpected feature of mammalian genomes. Nat Rev Genet (2005) 6:151–157.
Drake JW. A constant rate of spontaneous mutation in DNA-based microbes. Proc Natl Acad Sci USA (1991) 88:7160–7164.
Dujon B, Sherman D, Fischer G. (67 co-authors). Genome evolution in yeasts. Nature (2004) 430:35–44.
Duret L, Dorkeld F, Gautier C. Strong conservation of non-coding sequences during vertebrates evolution: potential involvement in post-transcriptional regulation of gene expression. Nucleic Acids Res (1993) 21:2315–2322.
Eigen M. Selforganization of matter and evolution of biological macromolecules. Naturwissenschaften (1971) 58:465–523.
Frazer KA, Sheehan JB, Stokowski RP, Chen X, Hosseini R, Cheng JF, Fodor SPA, Cox DR, Patil N. Evolutionarily conserved sequences on human chromosome 21. Genome Res (2001) 11:1651–1659.
Hughes D. Impact of homologous recombination on genome organization and stability. In: Organization of the Prokaryotic Genome—Charlebois RL, ed. (1999) Washington (DC): ASM Press. 109–128.
Hurst LD. The silence of the genes. Curr Biol (1995) 4:459–461.
Keightley PD, Kryukov GV, Sunyaev S, Halligan DL, Gaffney DJ. Evolutionary constraints in conserved nongenic sequences of mammals. Genome Res (2005) 15:1373–1378.
Kidwell MG. Transposable elements and the evolution of genome size in eukaryotes. Genetica (2002) 115:49–63.
Kirschner M, Gerhart J. Evolvability. Proc Natl Acad Sci USA (1998) 95:8420–8427.
Knibbe C, Mazet O, Chaudier F, Fayard J-M, Beslon G. Evolutionary coupling between the deleteriousness of gene mutations and the amount of non-coding sequences. J Theor Biol (2007) 244:621–630.
Lynch M. The origins of eukaryotic gene structure. Mol Biol Evol (2006) 23:450–468.
Lynch M, Conery JS. The origins of genome complexity. Science (2003) 302:1401–1404.
Maestre J, Tchenio T, Dhellin O, Heidmann T. mRNA retroposition in human cells: processed pseudogene formation. EMBO J (1995) 14:6333–6338.
Margulies EH, Blanchette M, Haussler D, Green ED. Identification and characterization of multi-species conserved sequences. Genome Res (2003) 13:2507–2518.
Maynard-Smith J. Models of evolution. Proc R Soc Lond B Biol Sci (1983) 219:315–325.
Mira A, Ochman H, Moran NA. Deletional bias and the evolution of bacterial genomes. Trends Genet (2001) 17:589–596.
Ofria C, Adami C, Collier TC. Selective pressures on genomes in molecular evolution. J Theor Biol (2003) 222:477–483.
Pal C, Hurst LD. The evolution of gene number: are heritable and non-heritable errors equally important? Heredity (2000) 84:393–400.
Petrov DA, Sangster TA, Johnston JS, Hartl DL, Shaw KL. Evidence for DNA loss as a determinant of genome size. Science (2000) 287:1060–1062.
Radman M, Matic I, Taddei F. Evolution of evolvability. Ann N Y Acad Sci (1999) 870:146–155.
Rocha EPC. An appraisal of the potential for illegitimate recombination in bacterial genomes and its consequences: from duplications to genome reduction. Genome Res (2003) 13:1123–1132.
Smit AFA. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev (1999) 9:657–663.
Van Nimwegen E, Crutchfield JP, Huynen M. Neutral evolution of mutational robustness. Proc Natl Acad Sci USA (1999) 96:9716–9720.
Wagner A. Robustness and evolvability in living systems (2005) Princeton (NJ): Princeton University Press.
Wagner GP, Altenberg L. Complex adaptations and the evolution of evolvability. Evolution (1996) 50:967–976.
Whitley D. The GENITOR algorithm and selection pressure: why rank-based allocation of reproductive trials is best. In: Proceedings of the 3rd International Conference on Genetic Algorithms—Schaffer JD, ed. (1989) San Mateo (CA): Morgan Kaufmann. 116–121.
Wilke CO. Adaptive evolution on neutral networks. Bull Math Biol (2001a) 63:715–730.
Wilke CO. Selection for fitness versus selection for robustness in RNA secondary structure folding. Evolution (2001b) 55:2412–2420.
Wilke CO, Wang JL, Ofria C, Lenski RE, Adami C. Evolution of digital organisms at high mutation rates leads to the survival of the flattest. Nature (2001) 412:331–333.
Wu AS, Lindsay RK. A comparison of the fixed and floating building block representation in the genetic algorithm. Evol Comput (1996) 4:169–193.
[...] (dotted line), inversions (dotted-dashed line), and translocations (dashed line) have a higher probability to be neutral. On the contrary, the proportions of neutral duplications and deletions (solid line) do not increase. (B) As a result, the theoretical fraction of neutral offspring 