MBE Advance Access originally published online on June 19, 2008
Molecular Biology and Evolution 2008 25(9):1909-1921; doi:10.1093/molbev/msn136
Research Articles
Evolution of Closely Linked Gene Pairs in Vertebrate Genomes


* Biomolecular Chemistry, 271 Nijmegen Center of Molecular Life Science, Radboud University Nijmegen, Nijmegen, The Netherlands
Centre for Molecular and Biomolecular Informatics, NCMLS, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
E-mail: n.lubsen{at}science.ru.nl.
Abstract
The orientation of closely linked genes in mammalian genomes is not random: there are more head-to-head (h2h) gene pairs than expected. To understand the origin of this enrichment in h2h gene pairs, we have analyzed the phylogenetic distribution of gene pairs separated by less than 600 bp of intergenic DNA (gene duos). We show here that a lack of head-to-tail (h2t) gene duos is an even more distinctive characteristic of mammalian genomes, with the platypus genome as the only exception. In nonmammalian vertebrate and in nonvertebrate genomes, the frequency of h2h, h2t, and tail-to-tail (t2t) gene duos is close to random. In tetrapod genomes, the h2t and t2t gene duos are more likely to be part of a larger gene cluster of closely spaced genes than h2h gene duos; in fish and urochordate genomes, the reverse is seen. In human and mouse tissues, the expression profiles of gene duos were skewed toward positive coexpression, irrespective of orientation. The organization of orthologs of both members of about 40% of the human gene duos could be traced in other species, enabling a prediction of the organization at the branch points of gnathostomes, tetrapods, amniotes, and euarchontoglires. The accumulation of h2h gene duos started in tetrapods, whereas that of h2t and t2t gene duos only started in amniotes. The apparent lack of evolutionary conservation of h2t and t2t gene duos relative to that of h2h gene duos is thus a result of their relatively late origin in the lineage leading to mammals; we show that once they are formed h2t and t2t gene duos are as stable as h2h gene duos.
Key Words: head-to-head gene; bidirectional promoter; coordinate expression
Introduction
The textbook view of a eukaryote gene is a solitary functional entity, a monocistronic coding sequence of which the expression is controlled by an autonomous promoter. In fact, there is a significant clustering of genes in the mammalian genome where the genes in these clusters show coordinate expression (Hurst et al. 2004
The orientation of closely linked genes in the human genome is not random: there are more closely linked head-to-head (h2h) genes, usually defined as genes divergently transcribed from opposite strands separated by an intergenic region of 1 kb or less (Adachi and Lieber 2002; Trinklein et al. 2004; Li et al. 2006), than expected. The region between these h2h gene pairs is usually denoted as a bidirectional promoter. Formal experimental proof that expression of a h2h gene pair is regulated by a common and shared bidirectional element is available for only a few of such bidirectional promoters (see e.g., Hansen et al. 2003). However, close juxtaposition of 2 autonomous promoters does result in promoter cross talk (Hampf and Gossen 2007), unless an insulator is interposed (see e.g., Xie et al. 2007). It is therefore likely that the members of a closely linked h2h gene pair are no longer independently expressed. Indeed, most (Trinklein et al. 2004; Li et al. 2006; Lin et al. 2007; Yang et al. 2007), but not all (Takai and Jones 2004), expression analyses showed significant correlation, both negative and positive, between the expression of h2h gene pair members.
The usual explanation for the evolutionary origin of closely linked h2h pairs is that once such a pair is created by chance, it becomes difficult to separate, as insertion of intergenic DNA, such as a repetitive element, would disturb expression of both genes. H2h gene pairs would thus slowly accumulate during evolution. This explanation is supported by the higher than average evolutionary conservation of h2h pairs (Koyanagi et al. 2005; Li et al. 2006) and a lack of repetitive elements in the bidirectional promoter region between human h2h pairs (Takai and Jones 2004). It is curious, however, that enrichment in closely linked h2h pairs is reported to be limited to mammals (Koyanagi et al. 2005); one would expect them to accumulate in all evolutionary lineages. We have therefore examined the evolution of closely linked h2h gene pairs and compared the dynamics of the evolution of h2h gene pairs with that of closely linked convergently transcribed antisense gene pairs (tail-to-tail; t2t) and that of head-to-tail gene pairs (h2t; consecutive genes transcribed from the same strand). We show here that the enrichment in closely linked h2h gene pairs is not limited to mammalian genomes but is also seen in, for example, those of chicken and Xenopus tropicalis. A distinguishing feature of the mammalian gene organization, with the exception of platypus, as compared with that of other investigated vertebrates and lower eukaryotes, is the relative lack of closely linked h2t gene pairs. By tracing the emergence of the h2h, t2t, and h2t gene pairs closely linked in the human genome, we show that the accumulation of h2h pairs predates that of h2t and t2t pairs. However, once formed, the h2t and t2t pairs are as stable as the h2h pairs.
Methods
Data Sets and Gene Distribution
In Ensembl (version 40 and 46; Hubbard et al. 2007
For all species, the number of h2h, h2t, and t2t gene pairs was determined, together with the length of the intergenic region, by means of Python scripting on the Ensembl gene annotation files in Ensmart (script available from the authors). The intergenic region was defined as the number of base pairs between the beginnings and/or ends of the transcripts as annotated in Ensembl. In the better annotated genomes, this includes the 5' and 3' untranslated regions (UTR); in poorly annotated genomes, information about the UTRs may be incomplete, and the number of closely linked gene pairs could be underestimated.
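The pair classification described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' script: the `Gene` record, its field names, the function names, and the toy coordinates are hypothetical. Only the definitions follow the text: orientation is derived from the strands of two neighbouring genes, the intergenic distance is the number of base pairs between the annotated transcript boundaries, and overlapping genes are skipped.

```python
from collections import namedtuple

# Hypothetical record: start < end in chromosome coordinates, strand is +1 or -1.
Gene = namedtuple("Gene", "name chrom start end strand")

def classify_pair(left, right):
    """Orientation of two neighbouring genes; `left` has the smaller start coordinate."""
    if left.strand == right.strand:
        return "h2t"  # same strand: consecutive genes
    if left.strand == -1:
        return "h2h"  # left on -, right on +: divergent, 5' ends face each other
    return "t2t"      # left on +, right on -: convergent, 3' ends face each other

def adjacent_pairs(genes):
    """Yield (left, right, orientation, intergenic_bp) for neighbouring, non-overlapping genes."""
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)
    for chrom_genes in by_chrom.values():
        chrom_genes.sort(key=lambda g: g.start)
        for left, right in zip(chrom_genes, chrom_genes[1:]):
            distance = right.start - left.end - 1  # bases strictly between the two genes
            if distance < 0:
                continue  # overlapping genes are not taken into account
            yield left, right, classify_pair(left, right), distance

# Toy annotation (invented coordinates, for illustration only).
genes = [
    Gene("a", "1", 100, 500, -1),
    Gene("b", "1", 800, 1500, +1),
    Gene("c", "1", 5000, 6000, +1),
    Gene("d", "1", 6300, 7000, -1),
]

CUTOFF_BP = 600  # the "gene duo" cutoff used in this study
pairs = [(l.name, r.name, o, d) for l, r, o, d in adjacent_pairs(genes)]
duos = [p for p in pairs if p[3] < CUTOFF_BP]
```

In this toy example, `a`/`b` form an h2h duo and `c`/`d` a t2t duo, whereas `b`/`c` are an h2t pair that is too far apart to count as a duo. Sorting by start coordinate and comparing direct neighbours is a simplification; nested genes would need extra handling.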
Conservation and Dynamics of Gene Pairs
The cross species homology data (orthology files in Ensmart) were used to find orthologs of human gene pairs with an intergenic distance <600 bp (gene duos) in other species. When more than one possible ortholog was found, the most probable gene pair was chosen, that is, the one which most resembled the human situation in orientation and/or distance. For the orthologs of each member of a human gene duo, the location and organization in other species were determined with possible outcome h2h, h2t, t2t, or not linked. In case of location on the same chromosome, the intergenic distance was determined as well as whether or not the 2 genes were separated by other genes. By combining the data from different species, the most likely organization of the members of the Hs gene duos at the primate–rodent divergence was then inferred. Similarly, the putative organization of the orthologs could be inferred at the other branching points of a vertebrate tree consisting of Hs, Mm, Rn, Gg, Xt, Tn, Tr, Dr, Ol, and Ga, in which we placed Mm and Rn together in a rodent group and the 5 fish species in a fish group. In analyzing these data, the maximum parsimony principle was applied, assuming the least chromosomal rearrangements. Gene duos that were inferred to be closely linked at a branching point, yet separated in a descendant, were considered to be lost again (e.g., genes that are h2h gene duos in Hs, Mm, and Gg but not in Rn). We could trace orthologs of 47% (365) of the h2h, 27% (99) of the h2t, and 41% (185) of the t2t Hs gene duos; orthologs of the remainder of the Hs gene duos could not be found in a sufficient number of species.
Gene Expression
An expression data set consisting of a subset of normal human and mouse tissue samples from the Gene Logic BioExpress Database product (http://www.genelogic.com/genomics/bioexpress/) was used. The human data set consists of 115 tissue categories (compiled from 3,269 tissue samples) and 44,792 cDNA fragments; the mouse data set consists of 25 tissue categories (compiled from 859 tissue samples) and 36,701 cDNA fragments (Hulsen et al. 2006). First, the Pearson correlations between the expression profiles of all cDNA fragments in the human set and all genes in the mouse set were calculated (data available at http://www.cmbi.ru.nl/~timhulse/orthocomp/). A perfect correlation has a score of 1; a perfect anticorrelation has a score of -1. Second, the Affymetrix fragment IDs of the chip data were mapped to the Ensembl (version 40) IDs used in our study, using Ensembl-Affymetrix mapping files from the Ensembl FTP site (see also http://www.ensembl.org/info/about/docs/microarray_probe_set_mapping.html). When one Ensembl ID was mapped to multiple Affymetrix fragment IDs, the average of the multiple correlation coefficients was used. Of the 18,553 Ensembl Hs gene IDs mapped to 28,348 Affymetrix IDs, 10,497 map to a single ID, 4,854 to 2 IDs, 2,037 to 3 IDs, and 1,165 to 4 or more Affymetrix IDs. Of the 16,269 Ensembl Mm gene IDs mapped to 20,548 Affymetrix IDs, 11,146 map to a single ID, 3,588 to 2 IDs, 1,136 to 3 IDs, and 399 to 4 or more Affymetrix IDs. Finally, the correlation coefficients were mapped for human and mouse Ensembl (version 40) h2h, h2t, and t2t gene pairs and 3,249 (human) or 2,197 (mouse) randomly assembled gene pairs as control.
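The coexpression score of a gene pair, as described above, might be computed along the following lines. This is a hedged sketch: the function names, the dictionaries `profiles` and `gene_to_probes`, and the toy expression values are hypothetical, and averaging over every probe combination of the two genes is our assumption about how "the average of the multiple correlation coefficients" was taken.

```python
from itertools import product
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def gene_pair_correlation(gene_a, gene_b, gene_to_probes, profiles):
    """Mean Pearson r over all probe combinations of two genes (assumed averaging scheme)."""
    scores = [
        pearson(profiles[p], profiles[q])
        for p, q in product(gene_to_probes[gene_a], gene_to_probes[gene_b])
    ]
    return sum(scores) / len(scores)

# Toy data: one profile value per tissue category (invented numbers).
profiles = {
    "probe1": [1.0, 2.0, 3.0, 4.0],
    "probe2": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with probe1
    "probe3": [4.0, 3.0, 2.0, 1.0],   # perfectly anticorrelated with probe1
}
gene_to_probes = {"geneA": ["probe1"], "geneB": ["probe2", "probe3"]}

score = gene_pair_correlation("geneA", "geneB", gene_to_probes, profiles)
```

Here `geneB` is represented by two probes with opposite behaviour, so the averaged score for the pair is close to 0, which shows how multi-probe genes can dilute a correlation signal.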
Results
Organization of Closely Linked Gene Pairs
Ensembl (Hubbard et al. 2007
If gene organization is random, then the frequency of the 3 possible orientations of gene pairs should be 50% h2t, 25% h2h, and 25% t2t. This ratio is indeed more or less observed in most of the 34 investigated eukaryotic genomes (21 mammalian and 13 nonmammalian; table 1). However, some genomes, notably that of the hedgehog (Ee) but also, for example, those of Cp, Sa, and La, show considerably deviating frequencies. This is likely due to incomplete assembly of the genome; for reasons that we do not understand, incomplete assembly tends to result in a bias toward t2t pairs.
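The 50:25:25 expectation follows from enumerating the strand combinations of two neighbouring genes. The short sketch below makes this explicit, assuming that each gene lies on either strand with equal probability and independently of its neighbour; the function and variable names are illustrative only.

```python
from collections import Counter
from itertools import product

def orientation(left_strand, right_strand):
    """Orientation of a neighbouring gene pair given the strands of the left and right gene."""
    if left_strand == right_strand:
        return "h2t"                                 # (+,+) or (-,-)
    return "h2h" if left_strand == "-" else "t2t"    # (-,+) diverges, (+,-) converges

# All four equally likely strand combinations for two adjacent genes.
counts = Counter(orientation(l, r) for l, r in product("+-", repeat=2))
expected = {name: n / 4 for name, n in counts.items()}
```

Two of the four combinations place both genes on the same strand, which is why h2t pairs are expected twice as often as either h2h or t2t pairs.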
The mammalian genome has been reported to be enriched in closely linked h2h gene pairs (Trinklein et al. 2004; Koyanagi et al. 2005; Li et al. 2006). As the sequences of a number of other vertebrate genomes, both mammalian and nonmammalian, have become available in recent years, we reexamined whether enrichment in closely linked h2h gene pairs is indeed a characteristic of mammalian genomes only. The frequency of h2h, h2t, and t2t gene pairs relative to the intergenic distance between the members of those pairs in representatives of different vertebrate groups is shown in figure 1 (data for all species examined are shown in supplementary fig. S1 [Supplementary Material online]; note that we only selected protein-coding genes and that overlapping genes were not taken into account). With the notable exception of the opossum (Md) and platypus (Oa) genomes, all mammalian genomes do show an enrichment in closely linked h2h gene pairs and in some cases also in closely linked t2t pairs. It is unlikely that the difference in organization of the Md and Oa genomes is due to incomplete annotation as other mammalian genomes that also contain very few closely linked gene pairs do nevertheless show at least some enrichment in h2h pairs (see e.g., Cp and Et in supplementary fig. S2, Supplementary Material online). Enrichment in closely linked h2h pairs is also seen in the genome of G. gallus (Gg) and, to a lesser extent, in that of X. tropicalis (Xt) but not in any of the 5 fish genomes. The puffer fish genomes (Tn and Tr) show an enrichment in h2t as do the nonvertebrate genomes.
As these data indicate that genomes of different vertebrate clades might differ not only in the enrichment of closely linked h2h gene pairs but also in the frequency of closely linked h2t and t2t gene pairs, we determined the number of closely linked h2h, h2t, and t2t pairs (table 1). As the enrichment of h2h gene pairs in the mammalian genomes is seen for gene pairs with an intergenic distance of 600 bp or less (arrows in fig. 1), we used 600 bp, rather than the 1,000 bp used in other studies, as the cutoff for closely linked gene pairs. We will refer below to such closely linked gene pairs as gene duos. The percentages of the 3 possible orientations of gene duos relative to the total number of gene duos in different vertebrate and nonvertebrate species are plotted in figure 2. There is some variation in the pattern in tetrapods, but the overall trend is clear: there is an increase in h2h gene duos not only in all mammals, except Oa, but also in Gg and Xt. The t2t gene duos are also in excess in most mammals but not in Gg and Xt, which actually have fewer t2t gene duos than expected from a random distribution (note that the excess of t2t gene duos is a feature of both well [e.g., Rn] and poorly [e.g., Ee] assembled mammalian genomes and thus unlikely to be an assembly artifact). The most noticeable and consistent feature is the marked lack of h2t gene duos in mammalian genomes except again in that of Oa. As can be expected from their much longer divergence times, the organization of the 5 fish genomes is much more variable than that of the mammalian genomes: Dr has a mammalian-like distribution with a lack of h2t gene duos and an overrepresentation of h2h gene duos, whereas in the Tn and Tr genomes, as in nonvertebrate genomes, the organization of the gene duos is more random.
Clustering of Gene Duos
If the location of genes was random, then gene density must correlate with genome size and a more compact genome such as that of Dm or Ce is likely to have more gene duos, merely due to a higher gene density. There is also a significant correlation between the closely linked h2h gene pair ratio and gene density in the human genome (Li et al. 2006
Gene duplication often results in gene clusters; the β-globin gene cluster is a prime example. To determine whether gene duplication is a significant cause of gene duos, we determined how many genes in the human genome are adjacent to a paralog gene. Paralogous genes were identified via the paralogy link in Ensembl. As shown in table 3, about 10% of all human protein-coding genes have a paralog neighbor transcribed from the same strand, but only 6% of the h2t gene duos are paralogous. About 8% of the human protein-coding genes have a paralog neighbor transcribed from the opposite strand, with an equal occurrence of divergent or convergent transcription. However, for both the divergently and the convergently transcribed gene duos, only 1% consists of paralogs (table 3). The members of h2t gene duos are thus 3 times more likely to be paralogs than those of h2h or t2t gene duos, but gene duplication events have not significantly contributed overall to the formation of gene duos. To have some idea as to when the gene duplications occurred, we checked for the presence of orthologs of all the Hs paralogous gene duos in other species. As shown in table 3, 5 out of the 7 h2h paralogous gene duos likely predate the gnathostomes, whereas only 11 out of the 22 h2t and 2 of the 4 t2t paralogous gene duos do so (see also below). Overall, at least half of the Hs paralogous gene duos that could be traced back in the vertebrate tree are the result of a gene duplication early in vertebrate evolution.
Conservation of Gene Duos
If close apposition of genes has consequences for the regulation of expression of those genes, then one would expect conservation of gene organization. Previous studies have shown that h2h gene pairs are significantly more likely to have the same gene organization in other species than h2t pairs (Koyanagi et al. 2005
When only the orthologs of the human gene duos are considered, there is little difference between the human h2h, h2t, and t2t gene duos with respect to similarity of their organization in other species (fig. 3, panel b). However, when the number of those orthologs in a species relative to the number of gene duos in that species is also taken into account, a different picture emerges (fig. 3, panel c). As expected, the larger the evolutionary distance is from man, the fewer the orthologous gene duos found. When ortholog gene duos are found, they are mostly h2h and not h2t or t2t gene duos. Thus, in species more distant from man, there are relatively fewer orthologous h2t and t2t gene duos than h2h gene duos. Assuming that the evolutionary rate of generation of gene duos is the same for h2h, h2t, and t2t, there are 2 possible explanations for the preponderance of orthologous h2h gene duos. One is that h2h gene duos were generated earlier in evolution than h2t and t2t genes and are therefore more likely to be common; the other is that the close linkage of h2h genes, once generated, is better conserved during evolution than that of h2t and t2t genes.
Dynamics of Formation of Gene Duos
To determine when a particular human gene duo was formed during evolution, we need to trace the organization of the genes in the ancestral species. To that end, we used the data about the organization of the orthologs of the members of the human gene duos in other species. Of the human gene duos, 365 h2h, 99 h2t, and 185 t2t were phylogenetically informative, that is, the organization of the orthologs could be traced in a sufficient number of different vertebrate species to be able to infer the most likely organization of those orthologs at the branching points of the gnathostome, tetrapod, amniote, and euarchontoglires (primates and rodents) lineages (nodes A–D in fig. 4; see also Methods). The inferred rearrangements of the members of the human h2h, h2t, and t2t duos are outlined in figure 5. For example, in the genome of the common ancestor of the gnathostomes (fig. 5, node A), 58 of the 365 human h2h gene duos were already present as h2h gene duos, 153 were already gene pairs but with a larger intervening distance, 34 were separated by intervening genes, 45 were linked but in the wrong orientation, and 75 were dispersed. Between branching points A and B, of these 75 dispersed pairs, 6 became linked but in the wrong orientation, 4 became linked in the right orientation but separated by intervening genes, 38 became a gene pair separated by >600 bp, and 8 became a gene duo, thus leaving 19 as dispersed gene pairs at node B. Of the 45 gene pairs at node A, which were linked in the wrong orientation, 2 became linked in the right orientation but still separated by intervening genes, 22 became a gene pair separated by >600 bp, and 8 became a gene duo. This left 13 linked gene pairs, but together with the 6 gene pairs that now became linked, this yields a total of 19 at node B. Of the 34 gene pairs separated by intervening genes at node A, in 10 cases the intervening genes were lost but leaving an intergenic distance >600 bp, and in 4 cases a gene duo was formed. 
Together with the 2 + 4 gene pairs that were gained, this gives a total of 26 at node B. Of the 153 gene pairs separated by >600 bp at node A, 19 now became a gene duo, presumably due to loss of intergenic DNA, while 10 + 22 + 38 were added, which then yielded 204 gene pairs at node B. Finally, to the 58 gene duos at node A, 19 + 4 + 8 + 8 were added, giving 97 gene duos at node B.
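The bookkeeping for the h2h gene duos between nodes A and B can be checked with a small state-transition tally. The counts below are taken from the text; the state labels, the dictionary layout, and the function name are our own illustrative choices.

```python
# Inferred organization of the 365 informative human h2h gene duos at node A.
node_a = {
    "dispersed": 75,
    "wrong_orientation": 45,   # linked but in the wrong orientation
    "intervening_genes": 34,   # right orientation, separated by other genes
    "pair_gt_600bp": 153,      # gene pair with >600 bp of intergenic DNA
    "gene_duo": 58,
}

# transitions[source][target] = number of pairs changing state between nodes A and B.
transitions = {
    "dispersed": {"wrong_orientation": 6, "intervening_genes": 4,
                  "pair_gt_600bp": 38, "gene_duo": 8},
    "wrong_orientation": {"intervening_genes": 2, "pair_gt_600bp": 22, "gene_duo": 8},
    "intervening_genes": {"pair_gt_600bp": 10, "gene_duo": 4},
    "pair_gt_600bp": {"gene_duo": 19},
}

def apply_transitions(counts, transitions):
    """Move pairs between states and return the counts at the next node."""
    result = dict(counts)
    for source, targets in transitions.items():
        for target, n in targets.items():
            result[source] -= n
            result[target] += n
    return result

node_b = apply_transitions(node_a, transitions)
```

Applying the transitions reproduces the node B totals given in the text (19 dispersed, 19 in the wrong orientation, 26 separated by intervening genes, 204 pairs with >600 bp, and 97 gene duos), and the total stays at 365 because pairs only move between states.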
Figure 5 illustrates that the mode of formation of the gene duos, whether h2h, h2t, or t2t, is in general very similar: first genes happened to be rearranged such that they are linked in the proper orientation, then intergenic DNA was lost. For all 3 gene pair orientations, more than 80% of the pair members were already organized in the right orientation without an intervening gene in tetrapods (fig. 5, node B, and fig. 6, left panel). The most conspicuous difference between the h2h gene duos on the one hand and the h2t and t2t gene duos on the other hand is that about 50% of the h2h gene duos (172 out of 365) predate amniotes, whereas only 14% of the h2t (15 out of 99) and 28% of the t2t gene duos (52 out of 185) do so (see fig. 5, node C, and fig. 6, right panel). Formation of the human h2h gene duos thus started in early tetrapod evolution, whereas most of the human h2t and t2t gene duos were formed in amniotes (fig. 6).
We have also attempted to estimate the rate of loss of gene duos. In principle, the best measure is counting how many gene duos appeared to be lost again later in evolution (e.g., if orthologs form a gene duo in fish, Xt, rodents, and man but not in Gg, the inference is that the gene duo is lost in the Gg lineage). For that we need to know which gene duos were present in the ancestral genome. The numbers shown in figure 5 at a particular node are the sum of the gene duos present at the previous node plus the gene duos inferred to have been formed prior to divergence of the lineages deriving from that node. The latter gene duos thus represent those that are inferred to be present at that node because they are present in the descendant species. Hence, by definition, loss of those gene duos cannot be detected. That means that we can only determine whether a gene duo is lost after a particular node if that gene duo was present at the previous node. For example, the 172 h2h gene duos present at node C (fig. 5) should also be present at node D. Loss in the rodent lineage can then be inferred for those 172 h2h gene duos (table 4). The numbers are very small, particularly in the case of h2t gene duos, and the reliability is therefore not high. A rate of loss can also be calculated by combining the data presented in figures 3 and 5 and knowing how many orthologs of the human gene duos can be detected in other species (supplementary table S1, Supplementary Material online). From figure 5, it can be calculated how many of the orthologs of the human gene duos present in a particular species (supplementary table S1, Supplementary Material online) were likely to be already present at the nearest branch point; the loss then follows from the number of gene duos present in the genome of that species at this time (for sample calculation, see supplementary fig. S2, Supplementary Material online). 
For the h2h gene duos, these 2 approaches yield very similar rates: a loss between 11% and 14% per 100 Myr in the mouse lineage and between 8% and 10% per 100 Myr in the chicken lineage. For the h2t gene duos, the estimates for the mouse lineage are a loss between 6% and 15% per 100 Myr; in the chicken lineage, none would be lost. Finally, the loss of the t2t gene duos is estimated to be between 0% and 19% per 100 Myr in the mouse lineage and between 13% and 15% per 100 Myr in the chicken lineage. From these estimates, it appears that there is no major difference in stability of gene duos depending on orientation.
Gene Expression
The closer 2 genes are the more likely they are to be located in the same expression cluster (Sémon and Duret 2006
Discussion
Mammalian genomes have been reported to be enriched in divergently transcribed cis-antisense gene pairs with an intergenic distance of less than 1 kb, the h2h genes with a bidirectional promoter region. The exact number of such genes in the human genome is still a matter of debate; the reported numbers vary from 677 (Koyanagi et al. 2005
A potential problem with h2t gene pairs is transcriptional read-through from the upstream gene into the downstream gene. This can cause promoter occlusion, in which the elongating RNA polymerase could remove positive transcription factors; promoter activation, in which the elongating RNA polymerase could remove repressors (Callen et al. 2004; Leupin et al. 2005; Shearwin et al. 2005); or the synthesis of a read-through mRNA, which encodes a chimeric protein (Parra et al. 2006). To prevent transcriptional interference, a strong transcription termination signal is needed between the 2 genes. The mechanisms and sequence motifs that signal termination by polymerase II are still not fully understood (for recent reviews, see Buratowski 2005; Rosonina et al. 2006), but it is clear that the cleavage that precedes polyA addition is a prerequisite. In this respect, it is of interest to note that the polyA addition signaling in yeast is more complex than in mammalian cells (Zhao et al. 1999) and that the distance from the polyA addition site to transcription termination in yeast may be shorter (about 0.1 kb; Russo and Sherman 1989) than that in mammalian cells (>0.5 kb; Rosonina et al. 2006). Stringency of polyA addition and transcription termination could be a factor in the maintenance of h2t gene duos, and it would be of interest to examine if polyA addition signaling is also more complex in, for example, fugu or urochordates than in eutherian mammalian cells.
If transcription termination poses a problem for h2t gene pairs, why are t2t gene duos not depleted? One possible explanation is that collision could cause pausing of RNA polymerase II, which in turn enhances termination (Zhao et al. 1999; Buratowski 2005; Rosonina et al. 2006), thereby solving the termination problem. Another possibility is that the antisense transcripts serve a presumably regulatory, as yet unknown role. Antisense transcription abounds in the human genome (Yelin et al. 2003; Dahary et al. 2005; Sun et al. 2005; Engström et al. 2006) and over 40% of the human or mouse transcription units may have an antisense transcript, usually noncoding (Engström et al. 2006). The functional consequences of antisense transcription could be a factor in the selection against t2t gene duos.
The depletion of h2t gene duos is not quite unique to eutherian mammalian genomes; we saw this also in the genome of the zebrafish, although not in other genomes of lower eukaryotes. What distinguishes the eutherian mammalian genomes from that of zebrafish is a nonuniform distribution of the distance between h2h genes (fig. 1). A subset of h2h genes with a short intergenic region is also seen in the Gg and Xt genomes. This would suggest that the trend toward formation of such a subset of h2h gene pairs started in tetrapods, before the mammalian divergence. However, the earliest offshoots in mammalian evolution, the monotremes (platypus) and marsupials (opossum), lack this subset of h2h gene pairs. The opossum genome has very few gene duos, which could be the reason that this subset is not detected. The opossum genome otherwise shares the eutherian mammalian characteristic of depletion of h2t gene duos and enrichment in h2h gene duos. The lack of gene duos in the opossum genome does not appear to be a problem of assembly of the genome sequence as the number of gene pairs that can be formed is about equal to the number of protein-coding genes (table 1). The platypus genome assembly is not yet complete as the number of potential gene pairs is only half of the number of protein-coding genes. Although the paucity of gene duos may be the reason that a subset of h2h gene pairs is not seen, it cannot explain why the organization of the gene duos in the platypus genome is random.
Incompletely assembled eutherian mammalian genomes with only a few gene duos (see table 1) do show the typical depletion of h2t genes (fig. 2). If the platypus gene organization reflects that of the mammalian ancestor, then we must conclude that the formation of a subset of closely linked h2h genes in chicken and Xenopus is evolutionarily independent of the emergence of such a gene organization in mammalian genomes. The alternative is that large rearrangements have taken place in the platypus genome. The latter alternative is the most likely as almost all the gene duos likely to have been present in the last common ancestor are no longer gene duos in the platypus genome (data not shown). It is noteworthy that platypus is a typical mammal with respect to the density of repetitive elements in its genome; for at least one stretch, the density is even higher than in the human genome (Margulies et al. 2005; see also Warren et al. 2008). Continuous insertion of repetitive elements would tend to create gene-poor and gene-rich domains and has been suggested to be one of the factors driving genes together (e.g., Takai and Jones 2004). Whether the platypus genome indeed contains gene-poor and gene-rich domains awaits further analysis of that genome; if so, the organization of the gene duos in the platypus genome would then show that compaction to a gene-rich domain does not necessarily lead to an enrichment in closely linked h2h gene pairs. Amphibian and avian genomes are relatively poor in repetitive elements (Organ et al. 2007) but do contain a subclass of closely linked h2h genes. Hence, there is no strict correlation between density of repetitive elements and enrichment in closely linked h2h gene pairs. For the h2h gene duos, it has been suggested that the sharing of regulatory elements provides selection pressure to maintain the gene pair (Adachi and Lieber 2002; Trinklein et al. 2004; Lin et al. 2007; Yang et al. 2007). In the case of the rare eutherian h2t gene duo, it could be the transcriptional coupling or the chimeric gene product that is favorable; for the t2t gene duo, the antisense transcript could have a regulatory role (RIKEN Genome Exploration Research Group and Genome Science Group [Genome Network Project Core Group] and the FANTOM Consortium 2005). It could also just be chance that gene duos stay together: insertion of DNA in such a short intergenic region would be a rare event. In the case of h2h or h2t gene duos, the target area for DNA insertion would be even smaller as repetitive elements tend to be excluded from the first 300 bp of the promoter region (Takai and Jones 2004).
Supplementary Material
Supplementary table S1 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgements
The authors thank Gene Logic Inc. for the use of a subset of normal human and mouse tissue samples from the Gene Logic BioExpress Database product. This work was financially supported by the Netherlands Organization for Advancement of Pure Research (NWO).
Footnotes
1 Present address: Animal Breeding and Genetics Group, University of Wageningen, Wageningen, The Netherlands.
Kenneth Wolfe, Associate Editor
References
Adachi N, Lieber MR. Bidirectional gene organization: a common architectural feature of the human genome. Cell (2002) 109:807–809.
Buratowski S. Connections between mRNA 3' end processing and transcription termination. Curr Opin Cell Biol (2005) 17:257–261.
Callen BP, Shearwin KE, Egan JB. Transcriptional interference between convergent promoters caused by elongation over the promoter. Mol Cell (2004) 14:647–656.
Dahary D, Elroy-Stein O, Sorek R. Naturally occurring antisense: transcriptional leakage or real overlap? Genome Res (2005) 15:364–368.
Engström PG, Suzuki H, Ninomiya N, et al. (24 co-authors). Complex loci in human and mouse genomes. PLoS Genet (2006) 2:e47.
Gierman HJ, Indemans MHG, Koster J, Goetze S, Seppen J, Geerts D, van Driel R, Versteeg R. Domain-wide regulation of gene expression in the human genome. Genome Res (2007) 17:1286–1295.
Hampf M, Gossen M. Promoter crosstalk effects on gene expression. J Mol Biol (2007) 365:911–920.
Hansen J, Bross P, Westergaard M, Nielsen M, Eiberg H, Børglum A, Mogensen J, Kristiansen K, Bolund L, Gregersen N. Genomic structure of the human mitochondrial chaperonin genes: HSP60 and HSP10 are localised head to head on chromosome 2 separated by a bidirectional promoter. Hum Genet (2003) 112:71–77.
Hubbard TJP, Aken BL, Beal K, et al. (58 co-authors). Ensembl. Nucleic Acids Res (2007) 35:D610–D617.
Hulsen T, Huynen M, de Vlieg J, Groenen P. Benchmarking ortholog identification methods using functional genomics data. Genome Biol (2006) 7:R31.
Hurst LD, Pal C, Lercher MJ. The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet (2004) 5:299–310.
Koyanagi KO, Hagiwara M, Itoh T, Gojobori T, Imanishi T. Comparative genomics of bidirectional gene pairs and its implications for the evolution of a transcriptional regulation system. Gene (2005) 353:169–176.
Leupin O, Attanasio C, Marguerat S, Tapernoux M, Antonarakis SE, Conrad B. Transcriptional activation by bidirectional RNA polymerase II elongation over a silent promoter. EMBO Rep (2005) 6:956–960.
Li Y-Y, Yu H, Guo Z-M, Guo T-Q, Tu K, Li Y-X. Systematic analysis of head-to-head gene organization: evolutionary conservation and potential biological relevance. PLoS Comput Biol (2006) 2:e74.
Lin JM, Collins PJ, Trinklein ND, Fu Y, Xi H, Myers RM, Weng Z. Transcription factor binding and modified histones in human bidirectional promoters. Genome Res (2007) 17:818–827.
Margulies EH, NISC Comparative Sequencing Program, Maduro VVB, Thomas PJ, Tomkins JP, Amemiya CT, Luo M, Green ED. Comparative sequencing provides insights about the structure and conservation of marsupial and monotreme genomes. Proc Natl Acad Sci USA (2005) 102:3354–3359.
Organ CL, Shedlock AM, Meade A, Pagel M, Edwards SV. Origin of avian genome size and structure in non-avian dinosaurs. Nature (2007) 446:180–184.
Parra G, Reymond A, Dabbouseh N, Dermitzakis ET, Castelo R, Thomson TM, Antonarakis SE, Guigo R. Tandem chimerism as a means to increase protein complexity in the human genome. Genome Res (2006) 16:37–44.
Purmann A, Toedling J, Schueler M, Carninci P, Lehrach H, Hayashizaki Y, Huber W, Sperling S. Genomic organization of transcriptomes in mammals: coregulation and cofunctionality. Genomics (2007) 89:580–587.
RIKEN Genome Exploration Research Group and Genome Science Group (Genome Network Project Core Group) and the FANTOM Consortium. Antisense transcription in the mammalian transcriptome. Science (2005) 309:1564–1566.
Rosonina E, Kaneko S, Manley JL. Terminating the transcript: breaking up is hard to do. Genes Dev (2006) 20:1050–1056.
Russo P, Sherman F. Transcription terminates near the poly(A) site in the CYC1 gene of the yeast Saccharomyces cerevisiae. Proc Natl Acad Sci USA (1989) 86:8348–8352.
Sémon M, Duret L. Evolutionary origin and maintenance of coexpressed gene clusters in mammals. Mol Biol Evol (2006) 23:1715–1723.
Shearwin KE, Callen BP, Egan JB. Transcriptional interference—a crash course. Trends Genet (2005) 21:339–345.[CrossRef][Web of Science][Medline]
Sun M, Hurst LD, Carmichael GG, Chen J. Evidence for a preferential targeting of 3'-UTRs by cis-encoded natural antisense transcripts. Nucleic Acids Res (2005) 33:5533–5543.
Takai D, Jones PA. Origins of bidirectional promoters: computational analyses of intergenic distance in the human genome. Mol Biol Evol (2004) 21:463–467.
The FANTOM Consortium. The transcriptional landscape of the mammalian genome. Science (2005) 311:1709–1711.
Trinklein ND, Aldred SF, Hartman SJ, Schroeder DI, Otillar RP, Myers RM. An abundance of bidirectional promoters in the human genome. Genome Res (2004) 14:62–66.
Warren WC, Hillier LW, Marshall Graves JA, et al, (99 co-authors). Genome analysis of the platypus reveals unique signatures of evolution. Nature (2008) 453:175–183.[CrossRef][Web of Science][Medline]
Xie X, Mikkelsen TS, Gnirke A, Lindblad-Toh K, Kellis M, Lander ES. Systematic discovery of regulatory motifs in conserved regions of the human genome, including thousands of CTCF insulator sites. Proc Natl Acad Sci USA (2007) 104:7145–7150.
Yang MQ, Koehly LM, Elnitski LL. Comprehensive annotation of bidirectional promoters identifies co-regulation among breast and ovarian cancer genes. PLoS Comput Biol (2007) 3:e72.[CrossRef][Medline]
Yelin R, Dahary D, Sorek R, et al, (16 co-authors). Widespread occurrence of antisense transcription in the human genome. Nat Biotechnol (2003) 21:379–386.[CrossRef][Web of Science][Medline]
Zhao J, Hyman L, Moore C. Formation of mRNA 3' ends in eukaryotes: mechanism, regulation, and interrelationships with other steps in mRNA synthesis. Microbiol Mol Biol Rev (1999) 63:405–445.
= >100 k). The numbers of human gene pairs are, respectively, 599, 594, 1,638, and 624 for h2h; 247, 2,131, 3,513, and 1,015 for h2t; and 364, 1,328, 1,425, and 441 for t2t. The numbers of mouse gene pairs are, respectively, 401, 548, 1,213, and 355 for h2h; 145, 1,727, 2,574, and 608 for h2t; and 252, 1,088, 926, and 247 for t2t. For the random controls, 3,249 human and 2,197 mouse gene pairs were selected. (B) Distribution of Pearson correlation coefficients for the human and mouse gene duos for which microarray data are available (human: 599 h2h, 247 h2t, and 364 t2t; mouse: 401 h2h, 145 h2t, and 252 t2t). Data are based on Ensembl Version 40.
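The coexpression measure named in this legend can be illustrated with a minimal sketch of the Pearson correlation between the expression profiles of the two genes in a pair. The function name `pearson` and the expression values below are hypothetical and serve only as an illustration; they are not taken from the Gene Logic data set, and the sketch does not claim to reproduce the normalization or filtering applied in the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each tissue measurement from the gene's mean expression
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical expression values of three genes across five tissues
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.0, 4.0, 6.0, 8.0, 10.0]  # rises with gene_a (positive coexpression)
gene_c = [5.0, 4.0, 3.0, 2.0, 1.0]   # falls as gene_a rises (negative coexpression)

r_ab = pearson(gene_a, gene_b)
r_ac = pearson(gene_a, gene_c)
```

Computing this coefficient for every gene pair in an orientation class and binning the results would give a distribution of the kind plotted in panel B; values near +1 indicate positive coexpression and values near -1 indicate opposite expression patterns.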