

MBE Advance Access originally published online on March 9, 2006
Molecular Biology and Evolution 2006 23(6):1136-1143; doi:10.1093/molbev/msj121

© The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Article

Reorganization of Adjacent Gene Relationships in Yeast Genomes by Whole-Genome Duplication and Gene Deletion

Jake K. Byrnes1, Geoffrey P. Morris1 and Wen-Hsiung Li

Department of Ecology and Evolution, University of Chicago

E-mail: whli{at}uchicago.edu.


    Abstract
In Saccharomyces, an ancient whole-genome duplication (WGD) and widespread duplicate gene deletion resulted in extensive reorganization of adjacent gene relationships. We have studied the evolution of adjacent gene pairs' identity, orientation, and spacing following whole-genome duplication and deletion (WGD-D) using comparative genomic analyses and simulations. Surveying adjacent gene organization across the Saccharomyces species complex, we find a genome-wide bias toward divergently and convergently transcribed gene pairs in all species but a reduction in this bias in the species that underwent WGD-D. Among neutral models of WGD-D, only single-gene deletion can produce the appropriate reduction in orientation bias and recapitulate the pattern of short, highly dispersed deletions we observe in Saccharomyces cerevisiae. To characterize the dynamics of WGD-D, we trace the conservation and creation of adjacent gene pairs along the S. cerevisiae lineage. We find that newly created adjacencies have a tandem orientation bias, while adjacencies conserved from prior to WGD-D have the same divergent-convergent bias as found in the species that diverged before WGD. We also find that adjacent gene pairs produced by WGD-D gained greater intergenic spacing but that this is reduced in the older adjacencies. Given this, and the preponderance of short deleted blocks, we argue that the deletion phase of WGD-D occurred primarily by small inactivating mutations followed by numerous small deletions. Newly created adjacent gene pairs also have an initial increase in mean log2 expression ratios and maximal expression levels, suggesting that increased intergenic spacing caused a genome-wide reduction in transcriptional interference.

Key Words: whole-genome duplication • gene deletion • Saccharomyces • adjacent gene orientation • intergenic spacing • gene expression


    Introduction
Based on analyses of syntenic blocks of duplicate genes in the Saccharomyces cerevisiae genome, Wolfe and Shields (1997) proposed an ancient whole-genome duplication (WGD) in the budding yeast. This hypothesis has recently been confirmed by comparative genomic analyses of two yeast species that diverged from S. cerevisiae prior to WGD, Ashbya gossypii (Dietrich et al. 2004) and Kluyveromyces waltii (Kellis, Birren, and Lander 2004). There is now an excellent opportunity to study genome evolution following WGD using the many species in the Saccharomyces species complex (Kurtzman 2003) for which whole-genome sequence is available (Cliften et al. 2003; Kellis et al. 2003; Dujon et al. 2004). Those that diverged following WGD (post-WGD) include the Saccharomyces sensu stricto species, Saccharomyces mikatae, Saccharomyces kudriavzevii, Saccharomyces bayanus, and the more distantly related Saccharomyces castellii and Candida glabrata. Those that diverged from the S. cerevisiae lineage prior to WGD (non-WGD) include Saccharomyces kluyveri and Kluyveromyces lactis, in addition to A. gossypii and K. waltii.

Despite the initial doubling of genome content due to WGD, there are now only modest differences in genome size and gene number between "non-WGD" and "post-WGD" species. For example, the K. waltii genome has 10.7 million base pairs (Mbp) and ~5,200 genes, while the S. cerevisiae genome contains 12.5 Mbp and ~5,700 genes. Was the reduction of genome size concurrent with the reduction in gene number, with large deletions directly responsible for gene loss, or did much of the deletion follow prior pseudogenization? We may gain insight into the dynamics of this process by studying the organization of the remaining genes.

In fact, the process of WGD followed by deletion (WGD-D) left a complex pattern of large, interleaved syntenic blocks in S. cerevisiae. Because few duplicates remain (~10%; Kellis, Birren, and Lander 2004), these syntenic blocks only become obvious in a 2:1 alignment of S. cerevisiae syntenic blocks to the corresponding block in a non-WGD species (Dietrich et al. 2004; Kellis, Birren, and Lander 2004). Visual inspection of the interleaving pattern in these 2:1 alignments seems to suggest a preponderance of short, highly dispersed deletions, but the underlying deletion process has never been modeled. It is not known whether a simple model of short random deletions could produce this pattern or if deletions are more or less interleaved than would be expected by chance.

This interleaving is important because it may represent a rare opportunity for genome reorganization in yeast, given the apparent paucity of inversions and translocations (Fisher et al. 2000). Following WGD-D, genome location was mostly retained (albeit across two duplicate chromosomes), while adjacent gene relationships were largely altered. Therefore, we focused on the dynamics of genomic reorganization from the perspective of adjacent gene organization. A deletion may affect one or more of these aspects of adjacent gene organization: identity, orientation, and spacing. First, the identities of the genes in an adjacent pair change when one or more genes are lost and the flanking genes form a new adjacent pair. Second, after a deletion, the newly adjacent pair may have a different transcriptional orientation (tandem, convergent, or divergent; Cohen et al. 2000) than the old adjacent pairs. Finally, the spacing of adjacent genes may be reduced or increased by a deletion, depending on its boundaries.

There are a number of reasons to consider genome reorganization in terms of adjacent genes as opposed to chromosomal location. First, a preliminary examination of the S. cerevisiae synteny map suggests that most deletions are small, altering the local gene organization. In this case, adjacency may be the aspect of genome structure most affected by WGD-D. Second, there is mounting evidence for functional interactions between adjacent genes in eukaryotes. There are many well-documented instances of transcriptional interference for adjacent genes (Shearwin, Callen, and Egan 2005) and divergent transcription from bidirectional promoters, including GAL1-GAL10 in S. cerevisiae and prnD-prnB in Aspergillus nidulans (Lohr, Venkov, and Zlatanova 1995; Garcia et al. 2004). Genome-wide analyses have shown differing mean expression correlations for divergent, convergent, and tandem adjacent gene pairs in yeast (Cohen et al. 2000) and greater intergenic spacing for genes with higher expression in humans (Chiaromonte, Miller, and Bouhassira 2003). Adjacent gene orientation has also been linked to the localization of cohesin domains (Filipski and Mucha 2002) and hot spots for recombination (Gerton et al. 2000) in yeast. Finally, we study adjacent genes because several of the sequenced yeast genomes are currently available only as fragments such as contigs or supercontigs. By analyzing adjacent genes, we can investigate genome reorganization without knowing the entire genome structure.

In this study, we analyze genome sequences of post-WGD and non-WGD species and implement simulations of the WGD-D process to address the following questions. What is the typical scale of deletion and how does this inform our model of gene loss? Does the genome organization of post-WGD species (i.e., interleaving and adjacent gene relationships) provide evidence for selection shaping WGD-D or for functional consequences of WGD-D on expression patterns? We find that gene loss occurred by inactivating mutation, followed by numerous small deletions, and that the resulting increase in intergenic spacing led to a widespread, but largely neutral, reduction in transcriptional interference across the yeast genome.


    Materials and Methods
Genomic Analyses
The Saccharomyces Genome Database (SGD) annotation for S. cerevisiae and the Washington University annotations for S. mikatae, S. kudriavzevii, S. bayanus, S. castellii, and S. kluyveri were downloaded from SGD (http://www.yeastgenome.org). The K. waltii genome annotation was downloaded from the supplemental Web site for Kellis, Birren, and Lander (2004; http://www.broad.mit.edu/seq/YeastDuplication/). To avoid spurious open reading frames (ORFs), we only used K. waltii ORFs with homology to S. cerevisiae ORFs. The A. gossypii genome annotation was downloaded from the Ashbya Genome Database (http://agd.unibas.ch/). The genome annotations for C. glabrata and K. lactis were downloaded from Génolevures (http://cbi.labri.fr/Genolevures/). Orthology to S. cerevisiae was provided in the respective genome annotations. The lengths of post-WGD deletions were collected from the S. cerevisiae-A. gossypii alignment, provided in the supplemental materials from Dietrich et al. (2004). Orthologous genes with no connection to a syntenic block were not included in counts of deleted block lengths.

We define the quantity %DC, the percentage of adjacent pairs that are in divergent or convergent orientation. We use this summary statistic because of structural dependency between divergent and convergent adjacency. For any contiguous block of genes, the divergent and convergent counts can differ by at most one because divergent and convergent adjacencies are switching points between tracts of genes on opposite strands. Genome-wide %DC and intergenic length data were collected using PERL scripts and MySQL queries.
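The definition of %DC and the structural dependency between divergent and convergent counts can be illustrated with a minimal Python sketch. It assumes each gene is reduced to its strand ('+' or '-') in chromosomal order; the function names and the example block are hypothetical and are not taken from the PERL scripts used in the study.

```python
def classify_pair(left, right):
    """Orientation of two neighbouring genes given their strands ('+' or '-')."""
    if left == right:
        return "tandem"
    # '-' followed by '+': both genes are transcribed away from the shared
    # intergenic region (divergent); '+' followed by '-' is convergent.
    return "divergent" if left == "-" else "convergent"


def orientation_counts(strands):
    """Count tandem, divergent, and convergent adjacencies in one contiguous block."""
    pairs = [classify_pair(a, b) for a, b in zip(strands, strands[1:])]
    return {kind: pairs.count(kind) for kind in ("tandem", "divergent", "convergent")}


def percent_dc(strands):
    """Percentage of adjacent pairs that are divergent or convergent (%DC)."""
    counts = orientation_counts(strands)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least two genes to form an adjacency")
    return 100.0 * (counts["divergent"] + counts["convergent"]) / total


# Hypothetical block of eight genes.
block = list("++--+-++")
counts = orientation_counts(block)
print(counts, round(percent_dc(block), 1))
```

Because every divergent or convergent adjacency marks a switch between strands, the two kinds must alternate along a block, which is why their counts never differ by more than one.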

Evolutionary Analyses
We traced the origin of each pair of adjacent genes in S. cerevisiae, looking for conservation in successive outgroups along the S. cerevisiae lineage (including S. mikatae, S. kudriavzevii, S. bayanus, S. castellii, S. kluyveri, and A. gossypii). Because homoplasy is unlikely, we assume that any S. cerevisiae adjacency also found in an outgroup was present in their common ancestor regardless of its absence in intervening nodes. Tandem gene duplication also creates new adjacencies, skewing the distribution of orientations in younger adjacency classes. Indeed, we found that 14 out of 52 tandem duplicates in the full data sets are in the youngest class, so we removed all adjacent duplicate genes for subsequent analyses. We present results from analysis of all S. cerevisiae ORFs included in Harbison et al. (2004; 5,546 ORFs), though the patterns hold if we use only adjacencies where both ORFs are classified by SGD as "verified." We investigated whether other possible origins for S. cerevisiae–specific adjacencies (inversion, misannotation) could have biased our results but found no evidence for this. If we use only the S. cerevisiae–specific adjacencies that have unambiguous evidence of WGD-D origin (i.e., orthologs in non-WGD species are less than 10 genes apart), the results are the same.
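The age assignment described above can be sketched in Python as follows. This is an illustration only: it assumes each outgroup gene order has already been translated into S. cerevisiae ortholog names, uses the age-class labels from figure 3, and the gene names A to F and all function names are hypothetical.

```python
def adjacent_pairs(gene_order):
    """Unordered adjacent gene pairs along one chromosome or contig."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}


def classify_adjacencies(scer_order, outgroups):
    """Assign each S. cerevisiae adjacency to its oldest supporting age class.

    outgroups is a list of (label, gene_order) tuples ordered from the most
    distant outgroup to the closest one. An adjacency found in a distant
    outgroup is assigned to that class even if it is missing from closer
    outgroups, following the assumption that homoplasy is unlikely.
    """
    outgroup_pairs = [(label, adjacent_pairs(order)) for label, order in outgroups]
    ages = {}
    for pair in adjacent_pairs(scer_order):
        age = "Scer"  # default: specific to S. cerevisiae
        for label, pairs in outgroup_pairs:
            if pair in pairs:
                age = label
                break
        ages[tuple(sorted(pair))] = age
    return ages


# Hypothetical toy data, most distant outgroup first.
scer = ["A", "B", "C", "D", "E", "F"]
outgroups = [
    ("pre-WGD", ["A", "B", "X", "C", "D"]),
    ("Scas", ["B", "C", "D"]),
    ("stricto", ["B", "C", "D", "E"]),
]
for pair, age in sorted(classify_adjacencies(scer, outgroups).items()):
    print(pair, age)
```

Listing the outgroups from most distant to closest means the first match is also the oldest one, so no separate pass is needed to handle adjacencies that are absent in intervening nodes.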

Simulations
To examine the potential for reorganization of adjacent gene relationships by WGD-D, we used a simulation coded in PERL. Each chromosome was represented as an array of genes that recorded presence/absence and orientation. The input genomes had the chromosome number and gene number of A. gossypii, a non-WGD species. Gene orientations were either modeled after A. gossypii or constructed randomly within a set range of %DC. The genomes were then duplicated, and the deletion process occurred in three stages. First, we drew "attempted" deletions, which represent the underlying mutational process. For each attempted deletion event, the locus was randomly selected and the length (in number of genes) was drawn from a uniform distribution (maximum block size of one or two genes) or a Poisson distribution (mean block size of one or two genes). For the neutral simulation, the duplicate copy was chosen for deletion randomly. For the selective simulation, we weighted the probability that a duplicate copy was chosen based on the net gain of tandem adjacencies for a deletion of one copy versus the other. When defining the deletion boundaries, previously deleted genes were assumed to have zero length. Next, attempted deletions that removed only redundant gene copies became "accepted" deletions. This assumes that any deletion that removes a single-copy gene would be effectively lethal. Finally, overlapping and adjacent deletions were combined to form "apparent" deletions, equivalent to the blocks of deleted genes that can be observed in yeast. Therefore, it is the apparent deleted block length distributions from our simulations that we compare to the observed deleted block length distribution from S. cerevisiae. When we discuss "deleted blocks" from either simulation or data analysis, we will always be referring to apparent deleted blocks. In S. cerevisiae, Kellis, Birren, and Lander (2004) found that ~10% of duplicates remained from the WGD, so in our simulations we allowed deletions to continue until 10% of duplicate genes remained. All simulation results were generated using 10,000 iterations.
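A heavily simplified Python sketch of the neutral single-gene deletion case is given below; it is not the PERL simulation used in the study. The simplifying assumptions are: a single chromosome, attempted deletions of exactly one gene, a starting orientation generated by switching strand with a fixed probability, and a stopping rule interpreted as 10% of loci still retaining both copies. All function names and parameter values are hypothetical.

```python
import random


def random_strands(n_genes, p_switch, rng):
    """Strand list in which each neighbouring pair differs with probability p_switch."""
    strands = [rng.choice("+-")]
    for _ in range(n_genes - 1):
        prev = strands[-1]
        if rng.random() < p_switch:
            strands.append("-" if prev == "+" else "+")
        else:
            strands.append(prev)
    return strands


def simulate_single_gene_deletion(n_genes, retained=0.10, rng=random):
    """Duplicate a chromosome, then delete single genes until `retained` of loci keep both copies."""
    present = [[True] * n_genes, [True] * n_genes]  # two copies after WGD
    duplicated = n_genes
    target = round(retained * n_genes)
    while duplicated > target:
        locus = rng.randrange(n_genes)  # attempted deletion: random locus
        copy = rng.randrange(2)         # neutral model: random copy
        # Accept only if a redundant copy exists, so no locus loses both genes.
        if present[0][locus] and present[1][locus]:
            present[copy][locus] = False
            duplicated -= 1
    return present


def percent_dc_after(strands, present):
    """%DC over the adjacencies that remain on both duplicated chromosomes."""
    dc = total = 0
    for row in present:
        kept = [s for s, keep in zip(strands, row) if keep]
        for a, b in zip(kept, kept[1:]):
            dc += a != b
            total += 1
    return 100.0 * dc / total


rng = random.Random(0)
ancestor = random_strands(2000, 0.56, rng)
present = simulate_single_gene_deletion(len(ancestor), rng=rng)
print(percent_dc_after(ancestor, present))
```

Repeating the last three lines many times with different seeds would give a distribution of final %DC values, against which an observed genome-wide value could be compared.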

Gene Expression Analyses
Saccharomyces cerevisiae expression data from Affymetrix GeneChip (101 microarray experiments) were obtained from the National Center for Biotechnology Information's Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/). Only the hybridization intensities for the perfect match probes were used for further analyses. These data were background corrected and quantile normalized in R (http://www.r-project.org) using the Affy package from Bioconductor (http://www.bioconductor.org; Gautier et al. 2004). The probe-to-gene annotation was created from a MEGABLAST (Zhang et al. 2000) similarity search of the probe sequences against the most recent version of S. cerevisiae coding sequences. To avoid spurious signal due to cross-hybridization, probes that matched more than one gene with an E value < 10^-2 (at least 12 consecutive base pairs and 16/25 bp matching overall) were dropped from the analysis. The relative expression values used in subsequent analyses are mean intensities from the set of unique probes. To quantify expression coupling of the genes in an adjacent pair, we calculated the mean of the absolute value of log2 expression ratios for adjacent genes (abs[log2[adjacent gene #1 expression/adjacent gene #2 expression]]) across conditions and Pearson and Spearman correlations across conditions. To quantify maximal expression for an adjacent gene pair, we summed expression for pairs of adjacent genes within each condition, then took the maximum of this value across conditions (max[adjacent gene #1 expression + adjacent gene #2 expression]).
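The two summary statistics for an adjacent pair can be written out as a short Python sketch. It assumes each gene is represented by a list of positive relative expression values, one per condition and in the same condition order; the function names and the three-condition example values are hypothetical.

```python
import math


def mean_abs_log2_ratio(expr1, expr2):
    """Mean across conditions of abs(log2(gene #1 expression / gene #2 expression))."""
    ratios = [abs(math.log2(a / b)) for a, b in zip(expr1, expr2)]
    return sum(ratios) / len(ratios)


def maximal_expression(expr1, expr2):
    """Maximum across conditions of the summed expression of the two genes."""
    return max(a + b for a, b in zip(expr1, expr2))


# Hypothetical expression values for two adjacent genes in three conditions.
gene1 = [100.0, 400.0, 50.0]
gene2 = [100.0, 100.0, 200.0]
print(mean_abs_log2_ratio(gene1, gene2))  # (0 + 2 + 2) / 3
print(maximal_expression(gene1, gene2))   # 400 + 100 = 500
```

Taking the absolute value makes the ratio statistic independent of which gene is labeled #1, so a pair scores high whenever either gene is expressed much more strongly than its neighbor.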


    Results
Adjacent Gene Relationships in Extant Yeast Genomes
To investigate the evolution of adjacent gene relationships in yeast, we first surveyed the variation in adjacent gene organization in the genome sequences of 10 members of the Saccharomyces species complex (table 1). All species examined have a bias toward divergent and convergent adjacencies; however, the %DC values for genomes of post-WGD species are consistently lower (51.0%–53.0%) than the %DC found in the genomes of non-WGD species (54.0%–57.3%). This suggests that one effect of WGD-D may be to reduce genome-wide bias in adjacent gene orientation.


Table 1 Summary of Adjacent Gene Relationships for 10 species of the Saccharomyces Complex

 
The intergenic lengths of the non-WGD genomes are generally less than those of post-WGD species (table 1). Because intergenic length estimates are inflated by missing genes, it is most informative to compare genomes of similar annotation quality. Comparing the well-annotated non-WGD species A. gossypii (4,711 genes) with the post-WGD species S. cerevisiae (5,714 genes), we see that the intergenic spacing has increased regardless of orientation. The same is true in the comparison of the non-WGD species K. waltii (5,230 genes) and K. lactis (5,331 genes) with the post-WGD species C. glabrata (5,272 genes) and S. bayanus (4,716 genes).

Modeling Adjacent Gene Reorganization Due to WGD-D
We developed a simulation to test the ability of WGD-D to reorganize adjacent gene relationships (see Materials and Methods). Under random single-gene deletion, our model predicts that WGD-D will lead to extensive reorganization of adjacent gene orientations for a wide range of starting genome structures (fig. 1). In particular, the mean genome-wide orientation bias was always reduced in our simulations of neutral WGD-D. We also find that selection has the potential to shape the extent of reorganization, either in terms of %DC (fig. 1) or deleted block lengths (J. K. Byrnes and G. P. Morris, unpublished data). We present the results for strong selection on adjacency (i.e., duplicate copies are deterministically selected for deletions based on net differences in the adjacencies created) and moderate selection on adjacency (i.e., 40% adjustment in the probability of choosing the copy for deletion), either favoring or disfavoring tandem adjacencies. We can clearly distinguish the change in %DC expected under these selective models from that expected under the neutral model for a wide range of starting %DC (starting %DC > 20%).


FIG. 1.— The extent of reorganization for adjacent gene orientation depends on starting bias in orientation and selection on orientation. We created 51 starting genomes with %DC ranging from 0% to 100%, separated by increments of ~2%, based on the gene counts of Ashbya gossypii. We simulated WGD-D under a single-gene deletion model 10,000 times for each genome and here plot the mean change in %DC (+/– one standard deviation [SD]) for a neutral deletion (solid) or with strong (long dash) or moderate (short dash) selection favoring (dash) or disfavoring (dot-dash) deletions that result in a net gain of tandem adjacencies.

 
From our simulations, we can make quantitative predictions about the effect of WGD-D to compare with the pattern in extant yeast. We ran simulations using the A. gossypii genome organization as a proxy for the ancestral organization and used a range of attempted deletion distributions to account for the possibility of longer deletion tracts or clustering of deletions. Of the neutral deletion scenarios we examined, only single-gene deletion could reduce the %DC from 56.0% to 52.0%, the inferred reduction in the S. cerevisiae lineage (fig. 2A; P = 0.21 for single-gene deletion and P < 10^-4 for models with longer deletions).


FIG. 2.— Shorter deletions in WGD-D result in more reorganization of gene orientation and result in shorter deleted blocks. (A) We plot the frequency distribution of %DC for 10,000 simulations of WGD-D using the Ashbya gossypii genome structure as the starting point, with deletion lengths drawn from a uniform or Poisson distribution. Vertical lines indicate the genome-wide %DC values observed for Saccharomyces cerevisiae (solid) and A. gossypii (dotted). Smaller deletions lead to significantly greater mean change and variance in final %DC (P < 10^-15). The single-gene deletion model (i.e., uniform with max = 1; short dashed line) has a reduction in %DC from 56% to 52.5% ± 0.57% (mean ± SD), which is statistically indistinguishable from the S. cerevisiae value (P = 0.21). (B) We plot the distribution of deleted block lengths from the same set of simulations, with the mean (±SD) for each deletion model, against the distribution from S. cerevisiae. Again, only the single-gene deletion model (short dashed line) approximates the pattern observed in S. cerevisiae (solid line).

 
We can also use our simulations to derive the expected distribution of deleted block lengths under a variety of deletion models. We compared the distribution of deleted block lengths (in number of genes) from S. cerevisiae to the distribution from our simulations of uniform and Poisson-distributed attempted deletions. The distribution of deletions for S. cerevisiae is skewed toward small deletions relative to most neutral deletion models (fig. 2B; P < 0.0001), but a single-gene deletion model can approximate it well (P = 0.11).
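As an illustration of how apparent deleted block lengths can be tallied, the following Python sketch measures runs of missing genes on one duplicated chromosome. It assumes the chromosome is represented as a list of presence flags aligned to the ancestral gene order (True for a retained gene, False for a deleted one); the function name and the example row are hypothetical.

```python
from collections import Counter
from itertools import groupby


def deleted_block_lengths(present_row):
    """Lengths (in genes) of apparent deleted blocks on one duplicated chromosome.

    Neighbouring deleted genes are merged into a single block, because separate
    adjacent deletions cannot be told apart in the surviving genome.
    """
    return [
        sum(1 for _ in run)
        for deleted, run in groupby(present_row, key=lambda keep: not keep)
        if deleted
    ]


# Hypothetical copy of a 10-gene chromosome after deletion.
row = [True, False, False, True, False, True, True, False, False, False]
lengths = deleted_block_lengths(row)
print(lengths)           # [2, 1, 3]
print(Counter(lengths))  # frequency of each block length
```

This sketch also counts runs that reach the end of the list; whether such terminal blocks should be included is a choice the text does not specify.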

Evolutionary Analysis of Orientation and Intergenic Spacing
To gain insight into the dynamics of the adjacent gene reorganization, we traced the origin of each pair of adjacent genes in S. cerevisiae, looking for conservation in successive outgroups along the S. cerevisiae lineage. The oldest adjacencies, which predate WGD, have a %DC of 56.7% (fig. 3A), statistically indistinguishable from the ancestral %DC (P = 0.59). Therefore, the orientation bias in conserved adjacencies is a reflection of the ancestral bias, not a bias in the retention rate. Given that the ancestral genome has a divergent-convergent bias, we expect that random deletion will initially create more new tandem adjacencies and reduce the divergent-convergent bias (see fig. 1). As deletion proceeds, and the genome-wide divergent-convergent bias is reduced, the bias in newly created adjacencies should approach this genome-wide value (see fig. 1). Indeed, the adjacencies created immediately following WGD have a tandem bias (%DC = 46.5%) and the tandem bias is reduced for those created more recently (%DC = 49.3%–49.9%), though they are not significantly different from one another.


FIG. 3.— Origin of adjacencies in the Saccharomyces cerevisiae lineage. (A) %DC, (B) mean intergenic length, (C) mean log2 expression ratio of adjacent genes, and (D) maximal expression for adjacent gene pairs in each age class (Scer = S. cerevisiae specific, stricto = Saccharomyces sensu stricto specific, Scas = present in S. cerevisiae-S. castellii common ancestor, pre-WGD = conserved from before WGD). Bars with the same letter are not significantly different (P > 0.05), and error bars represent 95% confidence intervals. The number of genes in each adjacency age class is as follows (with the subset of genes with expression data [C, D] given in parentheses): Scer, 1,392 (312); stricto, 1,530 (1,454); Scas, 810 (774); and pre-WGD, 2,156 (1,984). The pattern for %DC and mean intergenic length is the same whether we use the full set of genes (A, B) or the subset of genes with expression data (G. P. Morris, unpublished data). The %DC is lowest for adjacencies created immediately following WGD-D and increases slightly in more recently created adjacencies. Intergenic lengths, log2 expression ratios, and maximal expression are greater for younger adjacencies.

 
The evolutionary analysis also shows a pattern of greater intergenic spacing in more recently created adjacencies (fig. 3B). Furthermore, the mean and variance of intergenic length decrease monotonically with age, as would be expected if genes are lost by small, inactivating mutations followed by successive small deletions. This effect is not due to differences in %DC across the adjacency age classes because the pattern holds when the data are partitioned into convergent, divergent, and tandem adjacencies (G. P. Morris, unpublished data). Even the adjacencies created immediately following WGD have greater mean intergenic length than the adjacencies that remained from before WGD, suggesting that WGD-D has had a lasting effect on intergenic spacing.

Expression Evolution in Adjacencies
Given that Cohen et al. (2000) found that adjacent gene coexpression was less likely for pairs with greater spacing, we asked whether the evolution of longer intergenic regions was associated with the decoupling of expression in adjacent genes. If new adjacencies are as strongly coupled in expression as conserved adjacencies, then the increase in intergenic spacing provides no overall expression decoupling. To characterize the extent of expression coupling, we determined the log2 expression ratio for pairs of adjacent genes in each age class, averaged across 101 published microarray experiments (fig. 3C). There is a significant increase in the mean expression ratio for younger adjacencies, suggesting an overall expression decoupling due to WGD-D (P < 10^-7). Pearson and Spearman correlation coefficients for expression of adjacent genes, which would detect finer scale coexpression patterns, show no relationship with age of adjacency (G. P. Morris, unpublished data). The higher mean log2 expression ratio for new adjacencies may not be due to decoupling at all levels of expression but could be driven by decoupling at high expression levels. Because the mean log expression ratio for adjacent genes is more sensitive to extreme values than correlation coefficients, which are bounded, the pattern in log expression ratio is likely driven by the increase in maximal expression. Therefore, we asked whether the evolution of longer intergenic regions was associated with greater maximal expression (i.e., maximum of the summed expression for an adjacent gene pair across conditions). In this case, the average maximal expression should be greater for the younger adjacency age classes. As predicted, greater maximal expression is observed in the adjacency age classes with greater intergenic spacing, that is, the S. cerevisiae and the Saccharomyces sensu stricto–specific adjacencies (fig. 3D; P < 10^-12).

Expression values derived from microarray hybridization signal may potentially be influenced by differences in probe affinities due to base composition, but this is unlikely to affect our results given the large number (~10^4–10^5) of probes per adjacency age class. Indeed, when we performed a multiple regression accounting for any effect of GC-content on the expression measures, the relationship between maximal expression and adjacency age class remained highly significant (P < 10^-6). Furthermore, this relationship is not due to a specific functional category of genes. While many of the highly expressed genes are ribosomal, the trend remains when we remove the 438 genes in our data set annotated by Gene Ontology to the "protein biosynthesis" biological process category.


    Discussion
Deletion and the Mechanism of Gene Loss
In the 2:1 alignments of the S. cerevisiae genome to the K. waltii or A. gossypii genome, there appear to be many multiple-gene deletion blocks. Kellis, Birren, and Lander (2004) noted that deleted blocks are small, with an average length of two genes, but they did not speculate whether these blocks represent individual multiple-gene deletion events or several single-gene losses beside one another. Our simulation demonstrates that long tracts of adjacent deletions are common under a random single-gene deletion model, and S. cerevisiae has no more long deleted blocks than expected under a model of random single-gene deletion (fig. 2B). Therefore, we find no evidence that multiple-gene deletion events played a significant role in WGD-D in yeast. Similarly, there is no evidence for clustering of deletions, as this would also lead to the appearance of an excess of multiple-gene deleted blocks. Because multiple-gene deleted blocks imply tracts of conserved adjacency on the paralogous chromosome, there is also no genome-wide tendency for conservation of gene clusters at the level of adjacencies. Because we only consider conservation of adjacency, this does not contradict window-based identification of conserved clusters of metabolic (Wong and Wolfe 2005) or essential genes (Pal and Hurst 2003).

While a single-gene deletion model was best able to approximate the effect of WGD-D in yeast, our gene-based simulation does not distinguish whether the loss of single genes occurs by gene-length deletion, smaller deletion, or inactivating mutation. However, the evolutionary analysis of intergenic lengths does provide information about the interplay between deletion and pseudogenization during the process of gene loss. For instance, if a functional duplicate gene is lost by a gene-length deletion event without an intermediate step of pseudogene formation, the intergenic length of the new adjacency may not increase significantly. This mode of gene loss is not likely because our analysis shows that adjacencies newly created by WGD-D have greater intergenic spacing (fig. 3B). This suggests that gene loss is initiated by inactivating mutations, such as small indels that cause frameshifts or substitutions that cause premature stop codons.

The monotonic decrease of intergenic spacing with increasing age of adjacency suggests that after pseudogenization the intergenic spacing for the newly formed adjacency is gradually reduced by many small deletions. This mean decrease of intergenic spacing cannot be due to rare large deletions because the variance is reduced along with the mean (fig. 3B). These data support a model for gene loss during the yeast WGD-D where small inactivating mutations (deletion or otherwise) are followed by a whittling down of the pseudogenic and intergenic sequence. Previous research has shown that small indels are biased toward deletion in a wide range of eukaryotes, from mammals and fish to insects and plants (Gregory 2004), but this bias has never been investigated in yeast. While it is possible that selection for genome compactness led to the reduction of intergenic space following WGD-D, it is likely that the yeast deletion pattern is due to the same mutational bias found across the eukaryotes.

In keeping with the idea of gradual deletion, it seems that the deletion phase was not complete in the common ancestor of the post-WGD species. If the deletion phase of WGD-D was very short, most new adjacencies would have been created by the time of the S. castellii-S. cerevisiae split, but we find that the majority (73%; fig. 3) of new adjacencies were created afterward. An incomplete annotation for the S. castellii genome could potentially lead us to underestimate the age of some adjacencies, spuriously dating them to the S. cerevisiae-Saccharomyces sensu stricto split. However, there are also many (721) S. cerevisiae–specific adjacencies, which are unlikely to have been missed in the genome sequencing of all Saccharomyces sensu stricto species and S. castellii. Therefore, we conclude that the process of gene loss and deletion has continued during the radiation of the Saccharomyces species complex. Because there are only a handful of pseudogenes in yeast (Harrison and Gerstein 2002) and none of these appear to be of WGD-D origin (G. P. Morris, unpublished data), little, if any, trace remains of the pseudogenes created by WGD-D.

Functional Consequences of WGD-D
A gradual process of gene loss may suggest that the deletion phase of WGD-D is neutral or even subject to negative selection. Unfortunately, the current annotation quality does not allow a reliable estimate of the gene loss rate, so we cannot determine whether the rate of gene loss is reduced due to negative selection conserving adjacencies or even increased by positive selection favoring the creation of new adjacencies. However, the signature of selection may appear as a bias in the conservation or creation of adjacent gene relationships or in the spatial patterning of deletions.

We find that random single-gene deletion can recapitulate the reorganization of adjacencies due to WGD-D in S. cerevisiae. Because random single-gene deletion is sufficient to produce the same genome-wide reduction in %DC (from ~56% to ~52%) as observed in yeast, WGD-D was largely neutral with respect to orientation (fig. 2A). The conservation and creation of adjacencies along the S. cerevisiae lineage is also consistent with a neutral reorganization of gene orientations (fig. 3A). Finally, the random single-gene deletion model was also able to produce the distribution of deleted block lengths observed in S. cerevisiae, with deletions neither excessively interleaved nor clustered (fig. 2B). Because the random single-gene deletion model of WGD-D reproduced the patterns of retention and turnover of adjacent gene relationships, there is no evidence that negative selection or positive selection shaped the WGD-D genome reorganization in yeast.
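A random single-gene deletion model of this kind can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the simulation used for fig. 2: the genome is a single chromosome, strand flips between neighbors are independent with probability 0.56 (taken from the ancestral %DC quoted above), exactly one randomly chosen copy of every gene is deleted, and no duplicates are retained. All function names are hypothetical.

```python
import random


def percent_dc(strands):
    """Share of adjacent pairs on opposite strands (divergent or convergent)."""
    pairs = list(zip(strands, strands[1:]))
    return sum(a != b for a, b in pairs) / len(pairs)


def ancestral_chromosome(n_genes, p_dc, rng):
    """Strand (+1/-1) per gene; each neighbor flips strand with probability p_dc."""
    strands = [rng.choice((1, -1))]
    for _ in range(n_genes - 1):
        flip = rng.random() < p_dc
        strands.append(-strands[-1] if flip else strands[-1])
    return strands


def wgd_then_single_gene_deletion(strands, rng):
    """Duplicate the chromosome, then keep one randomly chosen copy of each gene."""
    copy_a, copy_b = [], []
    for s in strands:
        survivor = copy_a if rng.random() < 0.5 else copy_b
        survivor.append(s)
    return copy_a, copy_b


rng = random.Random(0)
ancestor = ancestral_chromosome(100_000, 0.56, rng)
copy_a, copy_b = wgd_then_single_gene_deletion(ancestor, rng)

before = percent_dc(ancestor)
after = (percent_dc(copy_a) + percent_dc(copy_b)) / 2
print(f"%DC before WGD-D: {before:.3f}")
print(f"%DC after  WGD-D: {after:.3f}")
```

In this sketch every new adjacency joins genes that were two or more positions apart in the ancestor, so the ancestral excess of divergent and convergent pairs is diluted without any selection on orientation.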

Even though there is no evidence for selection shaping WGD-D in yeast, there is evidence that this reorganization had functional consequences for the genome. A major effect of the interleaved gene loss was to provide increased intergenic spacing for many genes (fig. 3B). We suggest that this increased intergenic spacing was responsible for the uncoupling of expression and increased maximal expression for adjacent genes by relieving transcriptional interference. Given that the ancestral genome was highly compact, the concurrent expression of adjacent genes may have been limited by transcriptional interference (Shearwin, Callen, and Egan 2005). Even in the relatively spacious genome of humans, there is evidence that transcriptional interference can explain an association between higher expression and greater intergenic spacing (Chiaromonte, Miller, and Bouhassira 2003).

How can we reconcile the gradual neutral reorganization of adjacent gene relationships with the evidence for widespread functional changes? For instance, while WGD maintains relative copy numbers of interacting partners, the dosage balance hypothesis (Veitia 2004) would predict that an asynchronous deletion phase would be deleterious. It may be that downstream regulatory mechanisms (i.e., feedback or translational regulation) compensate for most expression changes or that expression changes of the magnitude we see do not affect the function of most genes. Future improvements to the genome sequences of the species in the Saccharomyces complex will clarify the dynamics of WGD-D, but a full understanding of the functional consequences will require genome-wide expression data from more species, particularly the non-WGD outgroups.


    Acknowledgements
This research was supported by the Natural Sciences and Engineering Research Council of Canada (G.P.M.), the National Science Foundation (G.P.M.), the Department of Education's Graduate Assistance in Areas of National Needs Program (J.K.B. and G.P.M.), and National Institutes of Health grants (W.-H.L.). We thank K. Wolfe and the reviewers for helpful suggestions.


    Footnotes
 
1 These authors contributed equally to this work.

Kenneth Wolfe, Associate Editor


    References

    Chiaromonte, F., W. Miller, and E. E. Bouhassira. 2003. Gene length and proximity to neighbors affect genome-wide expression levels. Genome Res. 13:2602–2608.

    Cliften, P., P. Sudarsanam, A. Desikan, L. Fulton, B. Fulton, J. Majors, R. Waterston, B. A. Cohen, and M. Johnston. 2003. Finding functional features in Saccharomyces genomes by phylogenetic footprinting. Science 301:71–76.

    Cohen, B. A., R. D. Mitra, J. D. Hughes, and G. M. Church. 2000. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat. Genet. 26:183–186.

    Dietrich, F. S., S. Voegeli, S. Brachat et al. (14 co-authors). 2004. The Ashbya gossypii genome as a tool for mapping the ancient Saccharomyces cerevisiae genome. Science 304:304–307.

    Dujon, B., D. Sherman, G. Fischer et al. (67 co-authors). 2004. Genome evolution in yeasts. Nature 430:35–44.

    Filipski, J., and M. Mucha. 2002. Structure, function and DNA composition of Saccharomyces cerevisiae chromatin loops. Gene 300:63–68.

    Fischer, G., S. A. James, I. N. Roberts, S. G. Oliver, and E. J. Louis. 2000. Chromosomal evolution in Saccharomyces. Nature 405:451–454.

    Garcia, I., R. Gonzalez, D. Gomez, and C. Scazzocchio. 2004. Chromatin rearrangements in the prnD-prnB bidirectional promoter: dependence on transcription factors. Eukaryot. Cell 3:144–156.

    Gautier, L., L. Cope, B. M. Bolstad, and R. A. Irizarry. 2004. affy—analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 20:307–315.

    Gerton, J. L., J. DeRisi, R. Shroff, M. Lichten, P. O. Brown, and T. D. Petes. 2000. Global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 97:11383–11390.

    Gregory, T. R. 2004. Insertion-deletion biases and the evolution of genome size. Gene 324:15–34.

    Harbison, C. T., D. B. Gordon, T. I. Lee et al. (20 co-authors). 2004. Transcriptional regulatory code of a eukaryotic genome. Nature 431:99–104.

    Harrison, P. M., and M. Gerstein. 2002. Studying genomes through the aeons: protein families, pseudogenes and proteome evolution. J. Mol. Biol. 318:1155–1174.

    Kellis, M., B. W. Birren, and E. S. Lander. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428:617–624.

    Kellis, M., N. Patterson, M. Endrizzi, B. Birren, and E. S. Lander. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423:241–254.

    Kurtzman, C. P. 2003. Phylogenetic circumscription of Saccharomyces, Kluyveromyces and other members of the Saccharomycetaceae, and the proposal of the new genera Lachancea, Nakaseomyces, Naumovia, Vanderwaltozyma and Zygotorulaspora. FEMS Yeast Res. 4:233–245.

    Lohr, D., P. Venkov, and J. Zlatanova. 1995. Transcriptional regulation in the yeast GAL gene family: a complex genetic network. FASEB J. 9:777–787.

    Pal, C., and L. D. Hurst. 2003. Evidence for co-evolution of gene order and recombination rate. Nat. Genet. 33:392–395.

    Shearwin, K. E., B. P. Callen, and J. B. Egan. 2005. Transcriptional interference—a crash course. Trends Genet. 21:339–345.

    Veitia, R. A. 2004. Gene dosage balance in cellular pathways: implications for dominance and gene duplicability. Genetics 168:569–574.

    Wolfe, K. H., and D. C. Shields. 1997. Molecular evidence for an ancient duplication of the entire yeast genome. Nature 387:708–713.

    Wong, S., and K. H. Wolfe. 2005. Birth of a metabolic gene cluster in yeast by adaptive gene relocation. Nat. Genet. 37:777–782.

    Zhang, Z., S. Schwartz, L. Wagner, and W. Miller. 2000. A greedy algorithm for aligning DNA sequences. J. Comput. Biol. 7:203–214.

Accepted for publication March 1, 2006.





