Skip Navigation


MBE Advance Access originally published online on December 21, 2006
Molecular Biology and Evolution 2007 24(3):805-813; doi:10.1093/molbev/msl206
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow Supplementary Material
Right arrowOA All Versions of this Article:
24/3/805    most recent
msl206v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Google Scholar
Right arrow Articles by Homma, K.
Right arrow Articles by Nishikawa, K.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Homma, K.
Right arrow Articles by Nishikawa, K.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© 2006 The Authors
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (
http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Research Articles

Gene Cluster Analysis Method Identifies Horizontally Transferred Genes with High Reliability and Indicates that They Provide the Main Mechanism of Operon Gain in 8 Species of {gamma}-Proteobacteria

Keiichi Homma*, Satoshi Fukuchi*, Yoji Nakamura{dagger}, Takashi Gojobori*,{ddagger} and Ken Nishikawa*

* Center for Information Biology–DNA Data Bank of Japan, National Institute of Genetics, Research Organization of Information and Systems, Mishima, Shizuoka 411-8540, Japan
{dagger} Graduate School of Information and Technology, Hokkaido University, Kita 14, Nishi 9, Kita-ku, Sapporo 060-0814, Japan
{ddagger} Japan Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, AIST Tokyo Waterfront, 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan

E-mail: knishika{at}genes.nig.ac.jp


    Abstract
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Supplementary Material
 Acknowledgements
 References
 
The formation mechanism of operons remains unresolved: operons may form by rearrangements within a genome or by acquisition of genes from other species, that is, horizontal gene transfer (HGT). One hindrance to its elucidation is the unavailability of a method to accurately identify HGT, although it is generally considered to occur. It is critically important first to select horizontally transferred (HT) genes reliably and then to determine the extent to which HGT is involved in operon formation. For this purpose, we considered indels in terms of gene clusters instead of individual genes and chose candidates of HT genes in 8 species of Escherichia, Shigella, and Salmonella based on the minimization of indels. To select a benchmark set of positively HT genes against which we can evaluate the candidate set, we devised another procedure using intergenetic alignments. Comparison with the benchmark set demonstrated the absence of a significant number of false positives in the candidate set, showing the high reliability of the method. Analyses of Escherichia coli K-12 operons revealed that although ~20 operons were probably gained from the last common ancestor of the 8 {gamma}-proteobacteria, deletion of intervening genes accounts for the formation of no operons, whereas horizontal transfer expanded 2 operons and introduced 4 entire operons. Based on these observations and reasoning, we suggest that the main mechanism of operon gain is HGT rather than intragenomic rearrangements. We propose that genes with related essential functions tend to reside in conserved operons, whereas genes in nonconserved operons mostly confer slight advantage to the organisms and frequently undergo horizontal transfer and decay. HT genes constitute at least 5.5% of the genes in the 8 species and approximately 45% of which originate from other {gamma}-proteobacteria. Genes involved in viral functions and mobile and extrachromosomal element functions are HT more often than expected. This finding indicates frequent mediation of HGT by bacteriophages. On the other hand, not only informational genes (those involved in transcription, translation, and related processes) but also operational genes (those involved in housekeeping) are HT less frequently than expected.

Key Words: horizontal gene transfer • operon • evolution • proteobacteria


    Introduction
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Supplementary Material
 Acknowledgements
 References
 
With 3 major models having been proposed (Lawrence 1997Go), the evolutionary origin of operons in bacterial genomes still remains controversial. The Fisher (1930)Go model states that operons form because the proximity of the coadapted genes reduces the probability of obtaining unfavorable combinations of genes by recombination. Although this model explains operons whose constituent genes encode physically interacting proteins, the observed frequency of recombination is not high enough to justify the coadaptation of genes (Desjardins et al. 1995Go). The co-regulation model (Jacob and Monod 1962Go) instead hypothesizes that operons facilitate coordinated expression and regulation. Though supported by the tendency of functionally related genes to reside in the same operons (de Daruvar et al. 2002Go), this model fails to explain the existence of many operons containing genes of unrelated functions (Lawrence and Roth 1996Go). Finally, the selfish operon model (Lawrence and Roth 1996Go) postulates that operons allow propagation of functionally related genes via horizontal transfer. The model proposes that horizontal transfer of a gene cluster containing weakly selected genes followed by deletion of intervening genes leads to operon formation. Despite the attractiveness of the gradual operon formation mechanism, the prediction that operons frequently undergo horizontal transfer was not borne out by recent analysis (Price et al. 2005Go). Moreover, no definitive cases of operons that had formed by the mechanism have been reported. We test the last two models through examinations of real cases of operon gain.

All the hitherto available methods to identify horizontally transferred (HT) genes contain possible errors, making it impossible to draw definitive conclusions on the importance of horizontal gene transfer (HGT) in evolution in general and particularly in operon formation. Besides the probabilistic nature, methods based on nucleotide compositions are ineffective in selecting short HT genes, those from phylogenetically close species whose compositions do not appreciably differ and those whose compositions became indistinguishable from that of a host through amelioration (Lawrence and Ochman 1997Go; Nakamura et al. 2004Go). For the same reasons, compositional approach often leads to erroneous identification of HT genes (Koski et al. 2001Go; Ragan 2001Go; Wang 2001Go). Phylogenetic methods share the first 2 weaknesses with compositional methods and often produce false identification because the smallness in the number of genes one examines in HT gene selection often produces incorrect phylogenetic trees (Rokas et al. 2003Go), especially when the genes of close species are involved. The third type of methods determines HT genes from the presence/absence of genes and is based on the idea that two or more independent deletion events are less likely to occur than one insertion event and identify HT genes by the presence/absence of homologs in bacterial phyla (Hooper and Berg 2002Go; Ragan and Charlebois 2002Go; Daubin et al. 2003Go; Price et al. 2005Go). This type of methods arbitrarily sets the relative weights of deletion and insertion events and presumes the existence of a common reference tree. They may miss HT genes if there are paralogues and fail to assess the extent of false positives. Moreover, the assessment of indels in terms of individual genes often fails to correspond to real-world indels of gene clusters. We here develop a novel method using the assessment of gene cluster alterations to identify HT genes with high reliability, present evidence showing that HGT is the main mechanism of operon gain in 8 species of {gamma}-proteobacteria, and analyze characteristics of the identified HT genes.


    Materials and Methods
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Supplementary Material
 Acknowledgements
 References
 
Data and Statistical Analyses
All the sequence data used in this study were taken from those of the 3 domains of life in the Genome to Protein Structures and Functions (GTOP) database (Kawabata et al. 2002Go) (26 October 2005 version). The general policy of GTOP (http://spock.genes.nig.ac.jp/~genome/gtop-j.html) is to provide protein data of all the entirely sequenced organisms and contains 203 bacterial, 21 archaeal, and 50 eukaryotic species in the particular version. Among the Escherichia, Shigella, and Salmonella species in GTOP, we analyzed the following species: Escherichia coli K-12 (designated as ecol0), O157:H7 Sakai (ecol1), CFT073 (ecol3); Shigella flexneri 2a 301(sfle0), 2a 2457T (sfle1); Salmonella typhi CT18 (styp0), Salmonella enterica Typhi Ty2 (styp2), and Salmonella typhimurium LT2 (styp1). The 8 species are heretofore referred as the ESS species. The list of essential genes in ecol0 was obtained from http://www.shigen.nig.ac.jp/ecoli/pec/index.jsp, whereas functional assignments of the genes were taken from version 2.3 of the Comprehensive Microbial Resource (Peterson et al. 2001Go), disregarding the following categories: hypothetical proteins, unclassified, and unknown functions. The expectation value in each function is the product of the total number of all HT gene candidates with the functional annotation and the fraction of all genes with the functional annotation. We calculated the fraction of essential genes after excluding unclassified genes. All the statistical significances were evaluated by the chi-square test.

Identification of Orthologs
Syntenic regions were identified by a program employing the dynamic programming algorithm based on the similarity of chromosomally encoded gene (Fukuchi and Nishikawa 2004Go). We gave first preference to syntenic regions consisting of over 99 genes and second preference to syntenic regions containing between 10 and 99 genes. Syntenic regions multiply identified in the same preference category were neglected. Intergenetic regions with homology to genes in the other ESS species were searched by the Mummer program 3.0 (Kurtz et al. 2004Go) and designated as gene-homologous regions (GHRs). Genes and GHRs aligned in syntenic regions were considered as orthologs.

Selection of HT Gene Candidates
We regard a continuous stretch of genes with an identical presence/absence pattern among the ESS species as a reduced ortholog cluster (ROC) (genes of the same color in fig. 1a and b). The state of each ROC is expressed by 1 and 0 for the presence and absence of a gene or GHR in the species, respectively (fig. 1b and c). ROCs that exist in all the ESS species (colored orange and green in fig. 1b) are called conserved ROCs, whereas other ROCs are termed nonconserved ROCs. We analyze the nonconserved ROCs sandwiched by conserved ROCs (blue- and pink-tinted cells). At the top there is one nonconserved ROC before the first conserved ROC, and at the bottom there are 121 nonconserved ROCs after the last conserved ROC. We regarded these ROCs the same as those bounded at both ends by conserved ROCs and processed them accordingly. We represent the state of each species in a genome region with the exclusive use of bits (e.g., 01 for ecol1 in the given example). Due to the limitation in computational power, the maximum number of ROCs in one section, that is, the largest number of bits to describe states, is set at 8. The nonconserved ROCs between conserved orthologs are thus divided into sections of at most 8 ROCs, with an overlap of 4 from one to the next. The 3,313 nonconserved ROCs were thus divided into 1,342 sections. The probable insertion events were identified based on parsimony and strict consensus between sliding windows (for details, see Supplementary Method, Supplementary Material online). The best BlastP hit of each gene is defined as the one with the lowest e-value among the hits in GTOP other than those of orthologous genes with e-values below 0.001. A gene with at least one GHR but no other genes in the orthologous cluster in the ESS species is regarded as an over-annotated gene if it has no homologs in the GTOP database. There are 413 such genes. We exclude over-annotated genes from the set of inserted genes. The remaining inserted genes are regarded as HT genes if a majority of the best BlastP hits of the genes in an inserted ROC are of genes in non-ESS species.


Figure 1
View larger version (33K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 1.— Identification of inserted genes by the MinGeneCIDE method. (a) ROCs are represented by isochromatic arrows of proportional lengths presented in the ecol0 gene order with the arrowheads signifying orientations. Orthologs are vertically aligned. Lines and dotted lines, respectively, indicate intergenetic regions and the lack of corresponding sequences. (b) The same ROCs as in (a) in tables with cells colored according to the ROCs. (c) Under each node and species in the phylogenetic tree, all the possible states are shown in bit representation. Asterisks and crosses mark the states of the two minimal event pathways.

 
Operon Conservation
The operons we used are exclusively those in the Operon Data Library (Itoh et al. 1999Go), which lists verified operons from databases and literature. We examined the orthologs of the genes in each operon in the Operon Data Library and judged the operon conserved if and only if all the orthologs are present, contiguous, and in the same orientation in all the ESS species, neglecting over-annotated genes. An operon is considered to be possibly functional in a species other than ecol0 if there are multiple orthologs (possibly with inserted genes), but with gaps between the genes less than 300 nucleotides (Morreno-Hagelsieb and Collado-Vides 2002Go), and it is otherwise regarded as probably nonfunctional.

Phylogenetic Analyses
To carry out phylogenetic analysis of an ortholog cluster in the ESS species, we first identified homologs of the longest ortholog in each cluster. Among the BlastP hits in GTOP other than the ESS orthologs with e-values less than 0.001, we left only the hits whose aligned segments are more than or equal to 80% that of the query. We then chose the best hit, that is, the homolog with the shortest protein distance in each of the following 4 groups: {gamma}-proteobacteria, other proteobacteria, other bacteria, archaea and eukaryote. The protein distances were determined by the PROTDIST program after ClustalW alignments of the 10 BlastP hits with the smallest e-values. If the closest homologs exist in both groups in the cases of figure 2, we drew the phylogenetic tree using the protein distances of the 3 proteins using Dayhoff's PAM matrix after a new ClustalW alignment. If at least one BlastP hit exists in each of the 4 groups, then the best hit is available in each group, and there are 6 cases to carry out phylogenetic analyses. In each case, the phylogeny in figure 2a is regarded as compatible with vertical gene transfer, whereas that in figure 2b is judged to be consistent with HGT to ESS species from a group 2 species.


Figure 2
View larger version (26K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 2.— Scheme of phylogenetic analyses. (a) Phylogeny consistent with vertical gene transfer. (b) Phylogeny consistent with HGT from a group 2 species to ESS species.

 
Identification of the Most Probable Origin of the HT Genes in an Operon
We first identified species in GTOP other than ESS that have homologs of all the HT genes in an operon. We kept only the species that possess homologs in the same order as those in the operon with no intervening genes. If multiple candidates exist, we regard a species the most probable origin if the homologs are the closest to the genes in the operon by the same method as used in the phylogenetic analyses, using the concatenated protein sequences.


    Results and Discussion
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Supplementary Material
 Acknowledgements
 References
 
Clusters of Orthologous Genes
We consider it more natural to analyze genome alterations in terms of gene clusters than individual genes as genes frequently undergo insertion and deletion in clusters as implied by the existence of species-specific islands containing many genes revealed by comparison of 2 closely related species (Perna et al. 2001Go). For this purpose, we first determined orthologs by identifying syntenic regions in the ESS species. Though most genes were identified as orthologs, some genes show incomplete synteny or none at all (fig. 1a). A total of 52.7% of the genes were in regions syntenic to all the ESS species, 37.9% resided in regions having synteny with some but not all the ESS species, whereas 9.4% fell in regions with no synteny detected. To take regions unannotated as genes into consideration, we identified intergenetic regions with homology to genes and designated them as GHRs and presented examples in the figure. We identified 3,299 GHRs. GHRs include pseudogenes and genome fragments that are too short to be recognized as pseudogenes. Note that in this scheme, independent alterations of the constituent genes of an ROC (e.g., insertion of only the ygiT gene in ecol0) are considered less likely than indels of the entire ROC, not only because independent alterations inevitably involve more events but also because they can affect different sets of genes.

HT Gene Identification Method
We developed the Minimum Gene Cluster Indel (MinGeneCIDE) method to identify inserted genes through analyses of gene cluster alterations based on the following idea: if all the possible pathways to account for the current states in the ESS species in the minimum number of indels involve an insertion of an ROC, the insertion of the cluster after the last common ancestor of the ESS species (LCA) is probable. There are 798 conserved ROCs containing 2,451 clusters of orthologous genes and 3,313 nonconserved ROCs comprising 8,413 clusters of orthologous genes. We assume conserved ROCs not to have undergone indels after the LCA, in accordance with the minimum indel principle. The phylogenetic tree (fig. 1c, designated as tree S) is that inferred from genome conservation (Kunin et al. 2005Go) and genome rearrangement (inversion) distances (Belda et al. 2005Go) and is identical to the tree based on the 16S rRNA sequences. Methods based on ortholog sequences (Pal et al. 2005Go) generally generate the alternative tree with slfe0 and sfle1 nesting with ecol0 (termed tree A, Supplementary fig. 1, Supplementary Material online). As combined distances between genome rearrangements and ortholog sequences infer tree S (our unpublished observation), however, we regard tree S to represent the true phylogeny (see further discussion below). Using tree S we identified 6,285 insertion gene candidates including the pink ROC in figure 1 from the nonconserved ROCs by the MinGeneCIDE method. From the group of inserted genes, we excluded over-annotated genes.

We then thought about a way to remove inserted genes caused by translocation or duplication to get HT genes. Although inserted genes as a result of translocation or duplication generally have easily identifiable close homologs within the ESS species, some of the homologs may have been lost or undergone rapid evolution. We therefore attribute an inserted ROC to translocation or duplication if the best BlastP hits in the GTOP database (Kawabata et al. 2002Go) of a majority of genes in the ROC are in the ESS species and otherwise consider it to be HT from other species. We note that this procedure makes it impossible to detect HGT among the ESS species and leads to underestimation of HT genes. The remaining inserted ROCs were subjected to this Majority Rule screening to yield 2,016 HT gene candidates (5.5% of total, table 1 and Supplementary table 1, Supplementary Material online), including the pink-colored genes in figure 1, and 3,961 translocated or duplicated genes. (The more stringent requirement to remove clusters if at least one of the genes has the best hit belonging to an ESS species results in a 64% decrease in the number of HT gene candidates. As we found no difference in the reliability of the 2 sets of HT gene candidates [see below for methodology], however, we decided to hold on to the Majority Rule screening).


View this table:
[in this window]
[in a new window]

 
Table 1 Species distribution of HT genes

 
The average number of HT gene candidates per ROC is 4.5, justifying our procedure to identify HGT of gene clusters instead of individual genes. We note that the species breakdown of HT gene candidates may not faithfully reflect the real distribution, as the identifiable fraction by the MinGeneCIDE method is dependent on the tree topology. Notice that the Majority Rule conservatively rejects those that could have been translocated or duplicated. This is why only a few of the genes in the 2 pap pilus operon-containing pathogenicity islands present in ecol3, but absent in ecol1 (Welch et al. 2002Go), were classified as HT gene candidates, although all of them were selected as inserted genes.

Selection of Positively HT Genes
To test the reliability of the HT gene identification procedure described above, we selected a benchmark set of positively HT genes by the Intergenetic Region Alignment (INTEGRAL) method, an automated method based on intergenetic alignments. From gene clusters, we first identify those that could have been inserted (e.g., the red genes in fig. 3a) and examine if we can verify the insertion. Although simple comparison of 2 genomes yields gene clusters that are present in only one of them, one insertion in the species and one deletion in the other species explain the difference equally well. We therefore checked if we could verify insertions based on the conjecture that the boundaries of gene clusters must be different in general if 2 independent deletion events had occurred. If the intergenetic regions of spp. 2 and 3 can be aligned and nucleotides N2 and N3 are exactly matched (fig. 3b), then we consider that the occurrence of 2 independent deletions in the 2 lineages is highly unlikely and X-Y is probably an inserted gene cluster. (It is theoretically possible to explain the present state by postulating the presence of W-X-Y-Z orthologs at LCA; introduction of W-Z homologous segments to spp. 2 and 3 and subsequent homologous recombinations may delete the X-Y segment at the identical location in the 2 species [Omelchenko et al. 2003Go]. However, we consider the occurrence of such cases extremely rare; nearly identical sequences must be HT twice independently, once between node B and sp. 2 and another between node A and sp. 3, and the displaced homologous segments must be fixed in both species). Although most genuine insertions are presumably not verifiable by this method due to the generally high mutability of intergenetic regions, we could verify the insertions in 99 gene clusters consisting of 251 genes. They were then subjected to the Majority Rule screening to yield 24 gene clusters containing 116 positively HT genes, including the red ROC in figure 3a (Supplementary table 2, Supplementary Material online).


Figure 3
View larger version (29K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 3.— Verification of gene insertion by the INTEGRAL method. ROCs are drawn as in figure 1a with the genes in starred species depicted with their orientations reversed. (a) The regions corresponding to the ecol0 relA operon with the phylogenetic tree. (b) Conceptual scheme for verification shown with matching intergenetic regions aligned vertically. The intergenetic regions between W's and Z's are first aligned by BlastN and, if aligned, the alignments are extended by ClustalW. Insertion to sp. 1 is verified if sp. 3 forms an outgroup and N2 and N3 exactly line up. The intergenetic regions may have deletions.

 
Reliability of HT Gene Candidates
To estimate how many false positives the set of HT gene candidates contains, we investigated the probable origins. The species in which the best Blast hit of a HT gene candidate belongs to is regarded as the most likely origin. A gene that existed at the LCA is very likely to have orthologs in other {gamma}-proteobacteria. In fact, 91.7% of the genes in conserved ROCs, which we consider to be native genes, have best hits in {gamma}-proteobacteria (Supplementary table 3, Supplementary Material online). Therefore, if the set of HT gene candidates contains some false positives, the apparent proportion of those originating from {gamma}-proteobacteria must be higher than that of the set of positively HT genes. The fact that the observed fraction of HT gene candidates whose probable origins are {gamma}-proteobacteria (45.0%) is nearly identical to that of positively HT genes (45.7%; table 2) therefore shows that the set does not contain a significant number of non-HT genes. It is theoretically possible that the HT identification method selects HT genes from phylogenetically close species less effectively and therefore gives a lower fraction of HT genes from {gamma}-proteobacteria but incorporates some false positives to exactly compensate for the difference. However, the general similarity of the fraction distributions in table 2 (no statistical difference at P < 0.01) makes this interpretation unlikely. If we assume tree A (Supplementary fig. 1) instead of tree S, the set of HT gene candidates selected increases 2.37-fold in number and was found to contain significant errors (Supplementary table 3). We interpret both the drastic increase in the number of HT candidates and the inclusion of false positives as reflection of the wrong phylogeny; to account for the presence and absence of orthologs along the erroneous phylogenetic tree, many native genes were mistakenly classified as HT genes. Also, if we choose a set of HT genes without accepting GHRs, the set includes a significant fraction of false positives. Therefore, the inclusion of GHRs is crucial for reliable selection of HT genes.


View this table:
[in this window]
[in a new window]

 
Table 2 Probable origins of HT genes

 
To test the reliability of HT gene candidates, we also carried out phylogenetic analyses (fig. 2). Conserved ortholog clusters produced the phylogeny consistent with HGT to ESS species only 7.4% of the time, whereas 15.4% of the trees produced by positively HT genes were consistent with HGT (table 3). The latter fraction is not very high. The occurrence of HT from a group phylogenetically close to {gamma}-proteobacteria partly accounts for the finding. For example, HGT from an {alpha}-proteobacterium to ESS species is likely to produce the phylogeny consistent with HGT in case 1 only, that is, only one case out of a total of 6 if at least one BlastP hit exists in each of the 4 groups. Also, the fact that the phylogenetic analyses cannot detect HGT from {gamma}-proteobacterial species to ESS species accounts for the finding. This is because the cases involving {gamma}-proteobacteria (cases 1, 2, and 3 in the figure) result in the phylogeny of figure 2a, irrespective of the occurrence of such HGT. In any event, we regard 7.4% and 15.4% as the fractions of the groups of native and positively HT genes, respectively. The corresponding fraction of HT gene candidates, 17.1%, appears a little higher than the latter percentage, but the difference is statistically insignificant at P < 0.1. This result supports the conclusion that the set of HT gene candidates does not contain a significant number of false positives. Additionally, that the fraction of the group of nonconserved clusters (10.4%) falls between the corresponding figures of native and positively HT genes implies that the group is a mixture of native and HT genes.


View this table:
[in this window]
[in a new window]

 
Table 3 Phylogenetic analysis

 
Conservation of Operons
We then examined operon conservation. Out of a total of 256 experimentally verified operons in ecol0, 164 were found to be conserved in all, whereas 92 operons are not conserved in at least one of the ESS species (Supplementary table 4, Supplementary Material online). There are cases in which operons may be preserved in modified forms in both species despite the nonconservation of some constituent genes (Supplementary fig. 2, Supplementary Material online). In the example, the styp0 and styp2 regions corresponding to the ecol0 rhaD gene have deletions in the middle. Nevertheless, the conservation of the 2 upstream orthologs in the 2 species leaves the possibility that the remaining operon is still functional.

However, only 19.0% of the cases in which operons are not conserved in species X can possibly be functional in both ecol0 and X (Supplementary table 4), and most nonconserved operons are probably nonfunctional in species X. An operon gain in ecol0 or an operon loss in species X accounts for the latter equally well. As we have no reason to expect a particular prevalence of operons in ecol0 as compared with the other ESS species and the LCA, it is likely that approximately half, or ~20, of the operons nonconserved in Salmonella (Supplementary table 4) are attributable to operon gain between the LCA and ecol0, whereas the rest are cases of operon loss between the LCA and Salmonella. (Although ~20 cases of operon loss from the LCA to ecol0 and ~20 cases of operon gain from the LCA to Salmonella are also expected, these are not detectable by analyses based on operons currently present in ecol0). We observe that an operon is gained either by intragenomic rearrangements or by HGT. The frequency of the phylogeny consistent with HGT to ESS species produced by the nonconserved operons, 10.4%, is between the corresponding figures of conserved clusters and positively HT genes (table 3). This observation also indicates that the set of nonconserved operons contains both HT genes and native genes. Curiously, 17.2% of genes in the conserved operons are essential, whereas essential genes comprise only 7.4% in the nonconserved operons, nearly the same as the average frequency 8.2%. Thus, while essential genes are indeed preferentially found in operons overall (de Daruvar et al. 2002Go; Pal and Hurst 2004Go), the nonconserved operons do not contain essential genes more frequently than the average.

Involvement of HGT in Concrete Cases of Operon Gain
Can we identify concrete cases of operon gain by intragenomic arrangements as proposed by the selfish operon model? For this purpose, we examined the nonconserved operons for evidence of intervening gene deletion that formed new operons after the LCA (Lawrence and Roth 1996Go) (fig. 4a). Such genes must generally have orthologous genes in separate sections in ESS species other than ecol0, do not constitute a functional operon in the species, and the intervening genes should not be inserted genes because they must be orthologous to the genes deleted in the lineage leading to ecol0. We searched for such nonconserved operons and found only 5 cases: the rst, sap, purF, gat, and aga operons. In the rst operon (fig. 5a), the MinGeneCIDE method identified an insertion event between nodes B and E resulting in the purple ROC. The existence or absence of the red ROC in the LCA is ambiguous; it may have been present in the LCA and subsequently deleted between nodes A and B or absent in the LCA and inserted between nodes A and C. The latter is more probable as the orthologs of rstA and rstB are contiguous in a number of species, including a very close ESS relative, Yersinia pestis, whereas there are no species in GTOP containing the homologs of the red genes in between. Therefore, the rst operon was presumably not formed by the deletion of the purple or red ROC. Similar observations also make it unlikely that the sap, purF, and gat operons were formed by deletion of intervening genes. Lastly, the aga operon contains an extra red ROC (fig. 5b) in 4 species. The forward direction, Swiss-Prot annotation, and short intergenetic distances of the red genes all support the notion that they are part of the operon in these species. Furthermore, the report that agaW and agaA in ecol0 are pseudogenes with the C-terminal and N-terminal half deleted, respectively (Homma et al. 2002Go), supports the idea that the red ROC was deleted together with fragments of the upstream and downstream genes in ecol0 (red dotted line in fig. 5b). Thus, this is probably a case of an operon that existed at node B losing some of the constituent genes, rather than operon formation by shedding intervening genes. We therefore are left with no operons that were plausibly formed by deletion of intervening genes in the past ~100 Myr (Lawrence et al. 1991Go) from the divergence of Salmonella and Escherichia.


Figure 4
View larger version (28K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 4.— Models for operon gain and expansion. Genes are represented by arrows on circular chromosomes. (a), (b), and (d) are cases of operon gain, whereas (c) is a case of operon expansion.

 

Figure 5
View larger version (20K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 5.— Examples of nonconserved operons. Genes are drawn as in figure 3. All the ecol0 genes depicted reside in operons. (a) The genes in the rst operon and their orthologs together with the phylogenetic tree. (b) The genes in the aga operon and their orthologs. There are no orthologs in styp0, styp2, and styp1.

 
On the other hand, our procedure identified 24 HT genes of ecol0 in 6 operons: the entire fecABCDE, lac, glc, and cynTSX operons, the chpR and chpA genes in the relA operon, and the wbbK, wbbJ, wbbI, rfc, glf, and rfbX genes in the rfb operon (fig. 3a, Supplementary table 1). Though the number of cases is less than the expected number assuming proportionality, 40, their existence is undeniable, especially because the chpR and chpA are positively HT genes (Supplementary table 2). The horizontal transfer of operons as a whole (fig. 4b) was proposed previously (Omelchenko et al. 2003Go) and is a mechanism of an operon gain. The most likely origin of the fecABCDE operon is Photorhabdus luminescens, a {gamma}-proteobacterium, but those of the other 3 entirely HT operons could not be determined. We note that the lac operon was previously suggested to be HT (Starlinger 1977Go). On the other hand, the last 2 operons partially composed of HT genes are concrete cases of operon expansion by HGT (fig. 4c). The relA operon is probably a mixture of HT and non-HT genes because the relA gene is in a conserved ROC (fig. 3a) and therefore in all probability existed in the LCA. The HT portion of the relA operon most likely originated from Erwinia carotovora, a {gamma}-proteobacterial species, whereas the most likely origin of the HT genes in the rfb operon was indeterminable. In the relA operon, the relA gene located upstream in the operon encodes ATP:GTP 3'pyrophosphotransferase required in the stringent response to amino acid deprivation, whereas the downstream 2 genes, chpR and chpA, constitute an antitoxin/toxin system and were proposed to be responsible for programmed cell death (Aizenman et al. 1996Go). Both the disparate role the antitoxin/toxin system plays and the existence of an additional promoter between relA and chpR (Marianovsky et al. 2001Go) support the notion that the chpR and chpA are HT genes.

The gene cluster approach makes it possible to positively identify HT genes by insertion verification and thereby to test the reliability of selected HT genes. Only 4 cases of operon gain (the fecABCDE, lac, glc, and cynTSX operons) from the LCA to the present ecol0 species were uncovered and were all attributed to HGT of the entire operons, whereas approximately 20 operons probably were acquired in the same period. The method used to identify deletion of intervening genes overlooks few cases; such genes are not identified only if all the orthologs of the intervening genes had been lost in all the species. By contrast, the MinGeneCIDE method selects inserted genes conservatively due to the rejection of all ambiguous cases and may well have left many HT genes unidentified. Moreover, rejection of HGT within the ESS by the Majority Rule leads to an underestimation of HT genes, as mentioned above. Therefore, it is likely that HGT accounts for most operon gain in ecol0 from the LCA. We recognize the possibility that a large cluster of genes was HT but rapidly shed some of them to form an operon. This, however, can be regarded as a case in which HGT is first involved in such operon formation. As some genes were found to be inserted in existing operons, a gene may also be inserted just downstream of another gene, forming an operon de novo (fig. 4d). Possibly many operons were created by this mechanism early in bacterial evolution and have been maintained, giving rise to conserved operons currently observed. The conclusion that HGT is involved in most cases of operon gain from the LCA to ecol0 is at variance with a recent report (Price et al. 2005). The disagreement is partly attributable to their use of predicted operons in contrast to our exclusive usage of experimentally confirmed ones. A more crucial difference is that our procedure based on analyses of gene cluster alterations identifies HT genes without significant errors, whereas their HT identification method hinges on analyses of individual genes and provides no reliability test. We therefore consider our conclusion more probable.

Although the co-regulation model (Jacob and Monod 1962Go) explains conserved operons in general, it is incompatible with the insertion of seemingly unrelated genes in the relA operon. Although the apparent self-centeredness of nonconserved operons is consistent with the selfish operon model (Lawrence and Roth 1996Go), the expansion of an existing operon by insertion of functionally unrelated HT genes (fig. 4c) is not covered by the model. Considering the general importance of the genes and the lack of it in the conserved and nonconserved operons, respectively, we propose that genes with related essential functions tend to reside in conserved operons, whereas genes in nonconserved operons generally conferring slight advantage to the organisms undergo frequent horizontal transfer and decay.

Characteristics of HT Genes
Another look at the distribution of the probable origins of HT genes (table 2) reveals that HT genes preferentially originate from phylogenetically close species: more than 45% from {gamma}-proteobacteria other than the ESS species. The skew in the origin distribution is reasonable, as genes of similar species are more likely to function in host cells due to similarity in transcription factors, codon usage, protein repertoire, and other factors and therefore have higher fixation probability. This observation makes it likely that some cases of HGT occurred within the ESS species but were overlooked due to the Majority Rule.

None of the HT genes in ecol0 in our list of HT genes is classified as essential genes, whereas proportionally 10 are expected to be. The HT genes were classified into the main functional roles (Peterson et al. 2001Go) (fig. 6). The functional distribution is mostly independent of the origins of the HT genes (Supplementary table 5, Supplementary Material online). Intriguingly, genes involved in viral functions and mobile and extrachromosomal element functions are HT significantly more frequently than expected (P < 0.01), supporting the theory that HGT is frequently mediated by bacteriophages (Daubin and Ochman 2004Go). The complexity hypothesis (Jain et al. 1999Go) states that informational genes are HT at lower frequency than others, whereas those involved in housekeeping (operational genes) undergo HGT more frequently. Our data confirm that informational genes (daggered in fig. 6) are rarely HT. However, most of the categories corresponding to operational genes (starred) contain HT genes at lower frequency. Overall, both informational and operational genes are HT more infrequently than expected (P < 0.01).


Figure 6
View larger version (37K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
FIG. 6.— Functional distribution of HT genes. All the categories annotated to more than 0.5% of the ESS genes are shown. Black rectangles indicate that the fold increases of the categories are significantly higher or lower than unity at P < 0.01.

 
Characteristics of the HT Gene Selection Method
Our HT gene selection procedure differs from existing methods in that it analyzes alterations of gene clusters instead of those of individual genes. This approach makes it possible to incorporate GHRs, which are important vestiges of orthologs, in its analysis. According to this method, insertions or deletions of adjacent gene clusters are counted as one indel, as they should be. The 2 salient features cannot be incorporated in HT gene identification methods based on analyses of individual genes. Although the present procedure may miss some HT genes, it identifies HT genes with high reliability. We thus state that at least 5.5% of genes in ESS species are HT genes. By contrast, simple genome comparison often leads to overidentification of HT genes, although the fraction of false positives has never been assessed for lack of a procedure. (Previously identified HT genes can now be evaluated by our verification method). Approximately 47% of the HT genes we identified were also identified as HT genes by a method based on nucleotide compositions (Nakamura et al. 2004Go) (the last column in table 2). The coverage, however, depends on the origins; the shorter the phylogenetic distance between the ESS species and the species of origin, the less likely the HGT is identified by the nucleotide composition method. This result makes sense as our procedure is independent of the species of origin, whereas methods based on nucleotide compositions are prone to underidentify HT genes from phylogenetically close species (Nakamura et al. 2004Go).


    Supplementary Material
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Supplementary Material
 Acknowledgements
 References
 
Supplementary figures 1 and 2, method, and tables 1–5 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Acknowledgements
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Supplementary Material
 Acknowledgements
 References
 
We thank Dr T. Kawabata and Mr S. Sakamoto for constructing and constantly updating the GTOP database, Drs T. Horiike and N. Saitoh for help in phylogeny, and Drs K. Fukami-Kobayashi and Y. Minezaki for constructive discussions. This work was supported in part by a grant-in-aid from the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan.

Funding for the Open Access publication charges was provided by Management Expense Grant, MEXT, Japan.


    Footnotes
 
William Martin, Associate Editor


    References
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Supplementary Material
 Acknowledgements
 References
 

    Aizenman E, Engelberg-Kulka H, Glaser G. (1996) An Escherichia coli chromosomal "addiction module" regulated by guanosine 3',5'-bispyrophosphate: a model for programmed bacterial cell death. Proc Natl Acad Sci USA 93:6059–6063.[Abstract/Free Full Text]

    Belda E, Moya A, Silva FJ. (2005) Genome rearrangement distances and gene order phylogeny in gamma-proteobacteria. Mol Biol Evol 22:1456–1467.[Abstract/Free Full Text]

    Daubin V, Lerat E, Perriere G. (2003) The source of laterally transferred genes in bacterial genomes. Genome Biol 4:R57.[CrossRef][Medline]

    Daubin V and Ochman H. (2004) Start-up entities in the origin of new genes. Curr Opin Genet Dev 14:616–619.[CrossRef][Web of Science][Medline]

    de Daruvar A, Collado-Vides J, Valencia A. (2002) Analysis of the cellular functions of Escherichia coli operons and their conservation in Bacillus subtilis. J Mol Evol 55:211–221.[CrossRef][Web of Science][Medline]

    Desjardins P, Picard B, Kaltenbock B, Elion J, Denamur E. (1995) Sex in Escherichia coli does not disrupt the clonal structure of the population: evidence from random amplified polymorphic DNA and restriction-fragment-length polymorphism. J Mol Evol 41:440–448.[CrossRef][Web of Science][Medline]

    Fisher RA. (1930) The genetical theory of natural selection. (Oxford University Press, Oxford (UK)).

    Fukuchi S and Nishikawa K. (2004) Estimation of the number of authentic orphan genes in bacterial genomes. DNA Res 11:219–231.[Abstract]

    Homma K, Fukuchi S, Kawabata T, Ota M, Nishikawa K. (2002) A systematic investigation identifies a significant number of probable pseudogenes in the Escherichia coli genome. Gene 294:25–33.[CrossRef][Web of Science][Medline]

    Hooper SD and Berg OG. (2002) Gene import or deletion: a study of the different genes in Escherichia coli strains K12 and O157:H7. J Mol Evol 55:734–744.[CrossRef][Web of Science][Medline]

    Itoh T, Takemoto K, Mori H, Gojobori T. (1999) Evolutionary instability of operon structures disclosed by sequence comparisons of complete microbial genomes. Mol Biol Evol 16:332–346.[Abstract]

    Jacob F and Monod J. (1962) On the regulation of gene activity. Cold Spring Harb Symp Quant Biol 26:193–211.[Medline]

    Jain R, Rivera MC, Lake JA. (1999) Horizontal gene transfer among genomes: the complexity hypothesis. Proc Natl Acad Sci USA 96:3801–3806.[Abstract/Free Full Text]

    Kawabata T, Fukuchi S, Homma K, Ota M, Araki J, Ito T, Ichiyoshi N, Nishikawa K. (2002) GTOP: a database of protein structures predicted from genome sequences. Nucleic Acids Res 30:294–298.[Abstract/Free Full Text]

    Koski LB, Morton RA, Golding GB. (2001) Codon bias and base composition are poor indicators of horizontally transferred genes. Mol Biol Evol 18:404–412.[Abstract/Free Full Text]

    Kunin V, Ahren D, Goldovsky L, Janssen P, Ouzonis CA. (2005) Measuring genome conservation across taxa: divided strains and united kingdoms. Nucleic Acids Res 33:616–621.[Abstract/Free Full Text]

    Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL. (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12.[CrossRef][Medline]

    Lawrence JG. (1997) Selfish operons and speciation by gene transfer. Trends Microbiol 5:355–359.[CrossRef][Web of Science][Medline]

    Lawrence JG, Hartl DL, Ochman H. (1991) Molecular considerations in the evolution of bacterial genes. J Mol Evol 33:241–250.[CrossRef][Web of Science][Medline]

    Lawrence JG and Ochman H. (1997) Amelioration of bacterial genomes: rates of change and exchange. J Mol Evol 44:383–397.[CrossRef][Web of Science][Medline]

    Lawrence JG and Roth JR. (1996) Selfish operons: horizontal transfer may drive the evolution of gene clusters. Genetics 143:1843–1860.[Abstract]

    Marianovsky I, Aizenman E, Engelberg-Kulka H, Glaser G. (2001) The regulation of the Escherichia coli mazEF promoter involves an unusual alternating palindrome. J Biol Chem 276:5975–5984.[Abstract/Free Full Text]

    Morreno-Hagelsieb G and Collado-Vides J. (2002) A powerful non-homology method for the prediction of operons in prokaryotes. Bioinformatics 18:S329–S336.[Abstract]

    Nakamura Y, Itoh T, Matsuda H, Gojobori T. (2004) Biased biological functions of horizontally transferred genes in prokaryotic genomes. Nat Genet 36:760–766.[CrossRef][Web of Science][Medline]

    Omelchenko MV, Makarova KS, Wolf YI, Rogozin IB, Koonin EV. (2003) Evolution of mosaic operons by horizontal gene transfer and gene displacement in situ. Genome Biol 4:R55.[CrossRef][Medline]

    Pal C and Hurst LD. (2004) Evidence against the selfish operon theory. Trends Genet 20:232–234.[Web of Science][Medline]

    Pal C, Papp B, Lercher MJ. (2005) Adaptive evolution of bacterial metabolic networks by horizontal gene transfer. Nat Genet 37:1372–1375.[CrossRef][Web of Science][Medline]

    Perna NT, Plunket G III, Burland V, et al. (28 co-authors). (2001) Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 409:529–533.[CrossRef][Medline]

    Peterson JD, Umayam LA, Dickinson T, Hickey EK, White O. (2001) The comprehensive microbial resource. Nucleic Acids Res 29:123–125.[Abstract/Free Full Text]

    Price MN, Huang KH, Arkin AP, Alm EJ. (2005) Operon formation is driven by co-regulation and not by horizontal gene transfer. Genome Res 15:809–819.[Abstract/Free Full Text]

    Ragan MA. (2001) On surrogate methods for detecting lateral gene transfer. FEMS Microbiol Lett 201:187–191.[CrossRef][Web of Science][Medline]

    Ragan MA and Charlebois RL. (2002) Distributional profiles of homologous open reading frames among bacterial phyla: implications for vertical and lateral transmission. Int J Syst Evol Microbiol 52:777–787.[Abstract]

    Rokas A, Williams BL, King N, Carroll SB. (2003) Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.[CrossRef][Medline]

    Starlinger P. (1977) DNA rearrangements in prokaryotes. Annu Rev Genet 11:103–126.[CrossRef][Web of Science][Medline]

    Wang B. (2001) Limitations of compositional approach to identifying horizontally transferred genes. J Mol Evol 53:244–250.[CrossRef][Web of Science][Medline]

    Welch RA, Burland V, Plunkett G III, et al. (19 co-authors). (2002) Proc Natl Acad Sci USA. 99:17020–17024.

Accepted for publication December 19, 2006.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Mol Biol EvolHome page
N. Raghupathy and D. Durand
Gene Cluster Statistics with Gene Families
Mol. Biol. Evol., May 1, 2009; 26(5): 957 - 968.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
W. C. Lima, A. M. Varani, and C. F.M. Menck
NAD Biosynthesis Evolution in Bacteria: Lateral Gene Transfer of Kynurenine Pathway in Xanthomonadales and Flavobacteriales
Mol. Biol. Evol., February 1, 2009; 26(2): 399 - 406.
[Abstract] [Full Text] [PDF]


Home page
DNA ResHome page
R. H. Baran and H. Ko
Detecting Horizontally Transferred and Essential Genes Based on Dinucleotide Relative Abundance
DNA Res, October 1, 2008; 15(5): 267 - 276.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
M. J. Lercher and C. Pal
Integration of Horizontally Transferred Genes into Regulatory Interaction Networks Takes Many Million Years
Mol. Biol. Evol., March 1, 2008; 25(3): 559 - 567.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
P. Grayson, L. Han, T. Winther, and R. Phillips
From the Cover: Real-time observations of single bacteriophage {lambda} DNA ejections in vitro
PNAS, September 11, 2007; 104(37): 14652 - 14657.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow Supplementary Material
Right arrowOA All Versions of this Article:
24/3/805    most recent
msl206v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Google Scholar
Right arrow Articles by Homma, K.
Right arrow Articles by Nishikawa, K.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Homma, K.
Right arrow Articles by Nishikawa, K.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?