MBE Advance Access originally published online on August 10, 2006
Molecular Biology and Evolution 2006 23(11):2134-2141; doi:10.1093/molbev/msl085
© 2006 The Authors
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Research Articles |
A Roadmap of Tandemly Arrayed Genes in the Genomes of Human, Mouse, and Rat
Department of Computer Science, Virginia Tech, Blacksburg, Virginia
E-mail: lqzhang{at}vt.edu.
| Abstract |
|---|
Tandemly arrayed genes (TAGs) play an important functional and physiological role in the genome. Most previous studies have focused on individual TAG families in a few species, yet a broad characterization of TAGs is not available. Here we identified all TAGs in the genomes of human, mouse, and rat and performed a comprehensive analysis of TAG distribution, TAG sizes, TAG orientations and intergenic distances, and TAG functions. TAGs account for about 14–17% of all genes in the genome and nearly one-third of all duplicated genes, highlighting the predominant role that tandem duplication plays in gene duplication. For all species, TAG distribution is highly heterogeneous along chromosomes and some chromosomes are enriched with TAG forests, whereas others are enriched with TAG deserts. The majority of TAGs are of size 2 for all genomes, similar to the previous findings in Caenorhabditis elegans, Arabidopsis thaliana, and Oryza sativa, suggesting that it is a rather general phenomenon in eukaryotes. The comparison with the genome patterns shows that TAG members have a significantly higher proportion of parallel gene orientation in all species, corroborating Graham's claim that parallel orientation is the preferred form of orientation in TAGs. Moreover, TAG members with parallel orientation tend to be closer to each other than all neighboring genes in the genome with parallel orientation. The analyses of Gene Ontology function indicate that genes with receptor or binding activities are significantly overrepresented among TAGs. Computer simulation reveals that random gene rearrangements have little effect on the statistics of TAGs for all genomes. Finally, the average proportion of TAGs tends to increase with family size, although the correlation between TAG proportions in individual families and family sizes is not significant.
Key Words: duplication, gene family, gene orientation, gene ontology, concerted evolution
| Introduction |
|---|
DNA duplication is the principal process by which the genetic raw material is provided for the origin of evolutionary novelties such as new gene function and expression patterns and is important in adaptive evolution (Wolfe and Li 2003).
The availability of complete genomic sequences makes it possible to investigate how genomes are structured by different mechanisms of gene duplication. In this paper, we focused on tandem duplication. Tandem duplication of related genes has been shown to act as the driving evolutionary force in the origin and maintenance of gene families (Reams and Neidle 2004) and has been a common mechanism of genetic adaptation to environmental challenges in organisms such as bacteria (Anderson and Roth 1977; Roth et al. 1996; Hastings et al. 2000), yeast (Brown et al. 1998), mosquitoes (Lenormand et al. 1998), plants (Harms et al. 1992; Shyr et al. 1992; Leister 2004), and humans and other mammals (Stark 1993).
Specifically, we identified all tandemly arrayed genes (TAGs) in the genomes of human, mouse, and rat and addressed the following issues: First, because duplicated genes can be arranged in tandem or dispersed on different chromosomes, we want to determine how many duplicated genes are in tandem arrays. This will shed light on the contribution of tandem duplication to gene duplication in the 3 mammalian genomes. Second, we are interested in examining the chromosomal distribution of TAGs to see whether there is significant clustering of TAGs on some chromosomal regions. Third, about 70% of the TAGs in the Arabidopsis thaliana genome have only 2 members in the array (Zhang and Gaut 2003), so do the 3 mammalian genomes show a similar pattern? Fourth, is there any nonrandom association between gene function defined by Gene Ontology (GO) categories and TAGs? We expect that genes with certain functions may prefer tandem arrangement over other types of more dispersed spatial arrangements as tandem arrangement can either entail high probability of generating more duplicated copies or promote a desired degree of diversity or homogeneity via concerted evolution (Ohno 1970). Fifth, it has been hypothesized that the preferred orientation of TAGs is parallel because locating on different strands is detrimental to the stability of the array (Graham 1995). We thus examined the orientations of array members and compared them with the genome pattern. Finally, is tandem duplication a preferred mechanism of duplication for large gene families? In other words, do we observe more TAGs in larger families than smaller ones?
| Materials and Methods |
|---|
The peptide sequences for human, mouse, and rat (version 35: October 2005) were obtained from the Ensembl Genome Browser: http://www.ensembl.org. There are 33,869 genes in the human genome, 36,471 genes in the mouse genome, and 32,543 genes in the rat genome. Sequences annotated as unknown, random, and mitochondrial were removed, and only genes with known chromosome location were kept. For genes that have overlapping chromosomal locations, we discarded all shorter genes and kept the longest ones. Similar methods have been used in previous studies (McLysaght et al. 2002).
Next, we applied TribeMCL with the default parameters to cluster genes into putative gene families. TribeMCL uses the Markov clustering algorithm for the assignment of proteins into families based on the similarity matrix generated from the all-against-all BlastP comparison of sequences (Enright et al. 2002).
TAGs are usually defined as genes that are duplicated tandemly on chromosomes. During evolution, mutations such as insertion of genes that are unrelated to the TAG members (i.e., not through duplication) can disrupt the tandem spatial arrangement of the original TAGs and thus make the TAG "imperfect." To be as broad as possible, we examined the effect of these "irrelevant genes" (defined hereafter as spacers) on the quantities of TAGs. We defined spacers as genes that have a Blast E value higher than 10⁻¹⁰ with the other members in the array and TAGs as duplicated genes with at most 0–10 spacers in between, and calculated the numbers of TAGs that satisfy each criterion (Zhang and Gaut 2003). Similar to the observation in A. thaliana (Zhang and Gaut 2003), we found that for all 3 species, the counts of TAGs increase as more spacers are allowed in the array and most dramatically when only one spacer is allowed in the array (fig. 1). Therefore, for the remainder of the study, we focused on TAGs with at most one spacer. Note that under this criterion, we do not consider tandem duplications that contain multiple spacer genes.
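The spacer criterion can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function name `find_tags`, the input layout (one chromosome as an ordered list of gene and family identifiers), and the toy gene names are all hypothetical, and family membership stands in for the Blast E value threshold, so any gene outside the family is treated as a spacer.

```python
from collections import defaultdict

def find_tags(chromosome, max_spacers=1):
    """Return tandem arrays on one chromosome.

    chromosome: list of (gene_id, family_id) in chromosomal order;
                family_id is None for genes without a family (assumed layout).
    Two consecutive family members are linked when at most `max_spacers`
    other genes lie between them; a linked run of >= 2 genes is one array.
    """
    positions = defaultdict(list)
    for idx, (_, family) in enumerate(chromosome):
        if family is not None:
            positions[family].append(idx)

    arrays = []
    for idxs in positions.values():
        run = [idxs[0]]
        for prev, cur in zip(idxs, idxs[1:]):
            if cur - prev - 1 <= max_spacers:   # number of intervening genes
                run.append(cur)
            else:
                if len(run) >= 2:
                    arrays.append([chromosome[i][0] for i in run])
                run = [cur]
        if len(run) >= 2:
            arrays.append([chromosome[i][0] for i in run])
    return arrays

# Toy chromosome (hypothetical data): family "A" has four members.
toy = [("g1", "A"), ("g2", "A"), ("g3", "B"), ("g4", "A"),
       ("g5", None), ("g6", None), ("g7", "A"), ("g8", "B")]

for s in (0, 1, 2):
    print(s, find_tags(toy, max_spacers=s))
```

On this toy input the array grows from two genes with zero spacers to three genes with one spacer, which mirrors the way TAG counts rise as the spacer allowance is relaxed.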
It is possible that, due to various genome rearrangements, some duplicated genes that did not originate through unequal crossover appear together as TAGs. To examine whether the amount of such nonreal TAGs can adversely affect the statistics of TAGs, we evaluated the likelihood that a random arrangement of duplicated genes produces apparent TAGs. We numbered all the genes in each genome, randomly picked the locations of all the duplicated genes, and computed the proportion of duplicated genes that appear to be TAGs. We repeated this process 10,000 times and obtained the distribution of the proportion of randomly placed duplicated genes that belong to TAGs (fig. 2).
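A minimal Python sketch of this kind of randomization follows. It makes simplifying assumptions that are not taken from the paper: the genome is treated as one linear list of gene slots (chromosome boundaries are ignored), the function names and family sizes are hypothetical, and a gene counts as a TAG member when a member of the same family lies within one spacer of it.

```python
import random

def simulated_tag_proportion(n_genes, family_sizes, max_spacers=1, rng=random):
    """Place all duplicated genes at random slots and return the fraction
    that end up adjacent (within `max_spacers` genes) to a family member."""
    n_dup = sum(family_sizes)
    slots = rng.sample(range(n_genes), n_dup)   # distinct random positions
    in_tag = 0
    start = 0
    for size in family_sizes:
        members = sorted(slots[start:start + size])
        start += size
        flagged = set()
        for a, b in zip(members, members[1:]):
            if b - a - 1 <= max_spacers:
                flagged.update((a, b))
        in_tag += len(flagged)
    return in_tag / n_dup

def null_distribution(n_genes, family_sizes, n_reps=1000, max_spacers=1, seed=0):
    """Repeat the random placement `n_reps` times with a fixed seed."""
    rng = random.Random(seed)
    return [simulated_tag_proportion(n_genes, family_sizes, max_spacers, rng)
            for _ in range(n_reps)]

# Hypothetical example: 5,000 gene slots and a handful of small families.
null = null_distribution(5000, [2] * 300 + [5] * 40 + [20] * 5, n_reps=200)
print(max(null))   # largest proportion of chance TAGs across replicates
```

An empirical P value for an observed TAG proportion can then be taken as the fraction of replicates whose simulated proportion is at least as large as the observed one.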
To formally investigate the heterogeneous distribution of TAGs along the chromosomes in the 3 species, we partitioned every chromosome into 5-Mb-long blocks and calculated the proportion of genes that are TAGs in each block and identified the blocks that are TAG "deserts" and "forests." TAG deserts are defined as regions where there are no TAGs; i.e., the proportion of genes that are TAGs in these regions is zero. TAG forests are regions with TAG proportions in the upper 10% of the distribution of TAG proportions. We asked the question: are there chromosomes that are either enriched or depleted with TAG forests/deserts? We applied hypergeometric tests to examine the enrichment/depletion of TAG forests/deserts in each chromosome. We also checked pericentromeric and subtelomeric regions for enrichment of TAG deserts/forests, and this analysis was limited to the human genome because it is the only species with information on location of these regions. We used the same definition of pericentromeric and subtelomeric regions as initially suggested by Bailey et al. (2001).
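As an illustration of the test described above, the sketch below computes one-sided hypergeometric P values for enrichment and depletion of forest (or desert) blocks on a single chromosome. The function names and the block counts in the example are hypothetical; the paper does not specify its implementation.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k) when n blocks are drawn without replacement from N blocks,
    K of which are forests (or deserts)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def enrichment_p(k, N, K, n):
    """Upper tail P(X >= k): chance of seeing k or more special blocks."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))

def depletion_p(k, N, K, n):
    """Lower tail P(X <= k): chance of seeing k or fewer special blocks."""
    lowest = max(0, n - (N - K))
    return sum(hypergeom_pmf(i, N, K, n) for i in range(lowest, k + 1))

# Hypothetical counts: 600 blocks genome-wide, 60 of them forests;
# one chromosome contributes 13 blocks, 6 of which are forests.
print(enrichment_p(6, N=600, K=60, n=13))
print(depletion_p(6, N=600, K=60, n=13))
```

Here N is the number of 5-Mb blocks in the genome, K the number of blocks labeled as forests (or deserts), n the number of blocks on the chromosome under test, and k the number of those that carry the label.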
To examine what GO functions are most likely to be overrepresented by TAGs across the 3 species, we used Onto-Express (Draghici et al. 2003). Hypergeometric tests were performed with the Bonferroni correction for multiple testing (Sokal and Rohlf 1994).
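For readers unfamiliar with the Bonferroni correction, a minimal Python sketch follows. The function name and the example P values are hypothetical; each raw P value is multiplied by the number of tests and capped at 1.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted P value, significant?) for each raw P value."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj <= alpha) for p_adj in adjusted]

# Hypothetical raw P values from three GO categories.
for p_adj, significant in bonferroni([0.001, 0.02, 0.5]):
    print(round(p_adj, 4), significant)
```

The correction is conservative: with many chromosomes or GO categories tested, only very small raw P values remain below the threshold.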
| Results |
|---|
There are 783 perfect TAG clusters (i.e., zero spacers) containing 2,150 genes in the human genome, 820 perfect TAGs with 2,832 genes in the mouse genome, and 727 perfect TAGs with 2,762 genes in the rat genome (table 1). The perfect TAGs account for up to 15% of the nonoverlapping genes in the 3 genomes. When 1 spacer is allowed in between TAGs, TAGs account for about 14%, 16%, and 17% of the total genes in the human, mouse, and rat genomes, respectively, suggesting that tandemly duplicated genes are a major feature of the mammalian genomes (fig. 1).
Although tandem duplication has been known as one of the mechanisms of gene duplication for 2 decades, we still do not know quantitatively how much tandem duplication has contributed to gene duplication in the genome. Here we calculated the proportions of TAGs after removing single-member clusters from the nonoverlapping gene data set for all 3 genomes (table 2). Approximately 13,000 human genes, 14,043 mouse genes, and 12,466 rat genes are likely products of gene duplication. Of these duplicated genes, more than 21%, 25%, and 25% in the human, mouse, and rat genomes, respectively, are TAGs (table 2), suggesting that tandem duplication is a predominant mechanism of gene duplication in these mammalian genomes.
Various genome rearrangements can create fortuitous TAGs from the existing dispersed members of duplicated genes. However, our computer simulation shows that the effect of nonreal TAGs on real TAG statistics is negligible (fig. 2 and Supplementary Material online): the maximum proportion of nonreal TAGs among the 10,000 simulated samples is only 1.3% in humans, 1.4% in mouse, and 2.1% in rat. The observed proportions of TAGs in the genomes are much higher than the values in the simulated samples (P value = 0), suggesting that the occurrence of TAGs is unlikely to be due to genome rearrangement and random distributions.
TAG size refers to the number of genes in an array. For all species, most of the TAGs are of size 2. There are altogether 902 TAG clusters in the human genome, and TAGs of size 2 (616) account for more than 68% of the TAGs. The mouse and rat genomes have 939 and 778 TAG clusters, respectively, of which about 60% are of size 2. The distribution of TAG sizes shows similar patterns across the 3 genomes with the majority of TAGs having only 2 members and far fewer larger TAGs (fig. 3). The mouse and rat genomes appear to have more large TAGs than the human genome. The average numbers of genes in TAGs are about 3.7 and 4.2 in mouse and rat, respectively, versus 3.1 in human.
The physical locations of all TAGs in the human genome are shown in fig. 4 (see supplementary figures for mouse and rat genomes, Supplementary Material online). In all the genomes, there is a great heterogeneity in the TAG distribution along the chromosomes. Hypergeometric tests show that in the human genome, there is significant depletion of TAG deserts in chromosomes 17 and 22, enrichment of TAG deserts in chromosomes 8 and 13, depletion of TAG forests in chromosome 10, and enrichment of TAG forests in chromosomes 9 and 19 (see supplementary tables, Supplementary Material online). Furthermore, we observed that in 12 chromosomes at least one subtelomeric region is a TAG desert. The remaining subtelomeric regions have either no genes or low TAG densities, suggesting that subtelomeric regions are not the preferred locations for TAGs. In contrast, TAGs appear to be concentrated in pericentromeric regions. The proportions of genes that are TAGs in these regions are high, and in fact, 13 chromosomes have at least one pericentromeric region that is a TAG forest.
In the mouse genome, there is significant depletion of TAG deserts in chromosomes 6 and 11 and enrichment of TAG deserts in chromosome 15. In the rat genome, there is significant depletion of TAG deserts in chromosomes 1, 4, and 10, enrichment of TAG deserts in chromosomes 2 and 5, depletion of TAG forests in chromosome 9, and enrichment of TAG forests in chromosomes 1 and 4 (see supplementary tables, Supplementary Material online). We note that first, no tests remain significant after the Bonferroni correction for multiple testing (Sokal and Rohlf 1994).
Table 3 shows the chromosome orientation of TAG members in the 3 genomes. The proportions of TAG pairs with parallel orientation (→→ or ←←) in humans, mouse, and rat are 68%, 76%, and 72%, respectively. Therefore, the majority of neighboring members in the TAGs are on the same strand. Interestingly, the proportion of gene pairs of convergent orientation (→←) is roughly the same as that of divergent orientation (←→) in all species. We also calculated the genome proportions of gene pairs for the 3 types of orientations and compared them with the observed orientation in TAGs. For all species, the proportion of gene pairs with parallel orientation is much higher in TAGs than in the genome, and the percentages of gene pairs with convergent or divergent orientations in TAGs are only about half of those in the genome. The chi-square "goodness of fit" test (Snedecor and Cochran 1989) shows that the distribution of different types of orientations in TAGs is significantly different from that of all genes in the genome for all species (df = 2, P value = 2.2 × 10⁻¹⁶).
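The orientation comparison can be sketched in Python as follows. The strand encoding ("+" and "-"), the function names, and all counts and proportions in the example are hypothetical and are not values from table 3. With 3 orientation classes the test has 2 degrees of freedom, for which the chi-square upper tail probability has the closed form exp(-x/2).

```python
from math import exp

def classify_pair(strand_left, strand_right):
    """Orientation of two neighboring genes, left gene first.
    '+' means transcribed left to right, '-' right to left."""
    if strand_left == strand_right:
        return "parallel"                      # same strand
    # Opposite strands: '+' then '-' point toward each other.
    return "convergent" if strand_left == "+" else "divergent"

def chi_square_gof(observed, expected_props):
    """Goodness-of-fit statistic of observed counts against expected proportions."""
    total = sum(observed)
    return sum((o - total * p) ** 2 / (total * p)
               for o, p in zip(observed, expected_props))

def p_value_df2(stat):
    """Upper tail probability of a chi-square variable with 2 degrees of freedom."""
    return exp(-stat / 2.0)

# Hypothetical data: TAG pair counts (parallel, convergent, divergent)
# tested against hypothetical genome-wide proportions.
tag_counts = [700, 150, 150]
genome_props = [0.50, 0.25, 0.25]
stat = chi_square_gof(tag_counts, genome_props)
print(stat, p_value_df2(stat))
```

The genome-wide proportions serve as the expected distribution, so a large statistic indicates that TAG pairs depart from the orientation pattern of neighboring genes in general.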
The pattern of the distribution of the TAG orientations is distinctly different from that of all genes in the genome, so do the physical distances between TAGs also show a different pattern from that of all genes in the genome? The cumulative distributions of the intergenic distances for both TAGs and all genes in the genome are shown in figure 6 (also supplementary figures, Supplementary Material online). In the human, mouse, and rat genomes, the gene pairs with convergent orientation tend to have shorter intergenic distances than those with parallel orientation, which in turn tend to have shorter distances than those with divergent orientation.
We compared GO categories for molecular function, biological process, and cellular components for TAGs. Since GO terms are hierarchical and there are many possible levels one can use to test for functional enrichment, we chose for simplicity only the top 10 most represented GO categories with known molecular function to examine functional associations in TAGs (table 4). The top 10 GO categories are similar among humans, mouse, and rat, except that the ranking of each category differs. For example, "olfactory receptor activity" is the most represented molecular function in rat, and it ranks fourth in humans and fifth in mouse. The results demonstrate that genes with the molecular function of either binding or receptor activity tend to be TAGs in these mammalian genomes. The analyses of biological process and cellular component also show a similar pattern (see supplementary tables, Supplementary Material online). Interestingly, for all 3 species, duplicated genes that are not TAGs show a ranking of GO categories very similar to TAGs (results not shown).
| Discussion |
|---|
Significance of Tandem Duplication
Previous and current studies all suggest that TAGs are a major component of the genome. The percentages of TAGs in different genomes of plants and animals span a narrow range of 10–17% (10% for Caenorhabditis elegans [Semple and Wolfe 1999
Studies on recent duplication in several mammalian genomes show that intrachromosomal duplications are more common than interchromosomal duplications (Cheung et al. 2003; Eichler and Sankoff 2003; Friedman and Hughes 2004; Zhang et al. 2005). Intrachromosomal duplication may include one or more genes and depending on the locations and the mechanisms of duplication, it can be tandem duplication. In fact, the number of intrachromosomal duplicated genes is significantly correlated with the number of TAGs for the 3 species (P value = 1 × 10⁻⁵, see Supplementary Material online).
Contribution of Tandem Duplication to Different Sized Gene Families
The research on TAGs has been largely confined to individual families of TAGs that serve important physiological functions, such as ribosomal RNA genes, histone genes, immunoglobulin genes, and MHC genes. In these large gene families, most of the members arose through tandem duplication. The unanswered question remains as to whether tandem duplication is a more favored duplication mechanism in large gene families than small ones. It is expected that the larger the family is, the more likely the family resorts to tandem duplication as an efficient way of creating more duplicated genes (Ohno 1970).
To examine this issue, we calculated the average proportion of TAGs in gene families of different sizes for all 3 genomes. The average proportion ranges from 15% to 33% in the 3 genomes and appears to be higher in large families than in small families (table 5). The average proportions of TAGs and family sizes are positively correlated (i.e., large families tend to have on average more TAGs than smaller ones) in humans (Spearman's rank correlation coefficient = 0.91, P value = 0.00047) and mouse (coefficient = 0.71, P value = 0.0275), but not in rat (coefficient = 0.53, P value = 0.1133). However, the proportions of TAGs in individual families and the family sizes do not show significant correlation (P value > 0.05). The observation that large families tend to have on average higher percentages of TAGs than small ones could be due to the possibility that large families have a high likelihood of being tandem through random arrangement. However, our simulation (fig. 2 and Supplementary Material online) shows that this is unlikely because random permutations of all the duplicated genes yield a tiny amount of nonreal TAGs, and for large gene families, which are a much smaller subset of all duplicated genes, the proportions of nonreal TAGs should be even smaller.
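For reference, Spearman's rank correlation is the Pearson correlation computed on ranks. The Python sketch below shows the calculation with tied values given their average rank; the function names and the example values are hypothetical and are not the entries of table 5.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0           # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical family size classes and average TAG proportions.
sizes = [2, 3, 4, 5, 10, 20, 50]
tag_props = [0.15, 0.17, 0.16, 0.21, 0.25, 0.24, 0.33]
print(spearman_rho(sizes, tag_props))
```

Because only ranks enter the calculation, the coefficient captures a monotonic trend between family size and TAG proportion without assuming a linear relationship.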
Distribution of TAGs on the Chromosomes
In all species, TAG distribution shows great heterogeneity along the chromosomes with some chromosomes enriched with TAGs and some depleted of TAGs (fig. 4). Using the definitions of TAG deserts and forests, we studied TAG enrichment and depletion with respect to individual chromosomes. Interestingly, the chromosomes that have greater than expected numbers of TAG forests tend to have less than expected numbers of TAG deserts (fig. 5 and Supplementary Material online), suggesting that TAGs have preferences for chromosomes. Furthermore, using the information on subtelomeric and pericentromeric regions in humans, we found that TAGs tend to be enriched in pericentromeric regions and thus have preference for specific locations as well. In this regard, it is worth noting that it has been shown that pericentromeric regions are enriched with recent segmental duplications in humans (Bailey et al. 2001).
An interesting question relevant to the TAG distribution is whether the regions that are enriched with TAGs are also rich in other non-TAG duplicates. Our analysis shows that in humans, the 2 regions that are statistically enriched in TAGs also show enrichment of other non-TAG duplicates, whereas in rat, the regions that are rich in TAGs show depletion of non-TAG duplicates (results not shown).
Distribution of TAG Sizes
It has been observed that the majority of TAGs have only 2 members in the array in many genomes such as A. thaliana (Zhang and Gaut 2003), C. elegans (Semple and Wolfe 1999), and rice (Yu et al. 2005), suggesting that it might be a rather general phenomenon in eukaryotes (fig. 3). The distribution of TAG sizes can be described by a power law distribution, a common type of distribution that appears in various biological quantities such as the distribution of gene family sizes in different eukaryotes (Enright et al. 2002) and prokaryotes (Huynen and van Nimwegen 1998).
Because most TAGs have only 2 members in the array, we expect that large families contain many small TAGs in order to achieve the large requirement of gene copies. Consistent with the expectation, we observed that large gene families tend to have many small TAGs located on different chromosomes instead of a handful of large tandem arrays. For example, the largest gene family in humans is a class that contains zinc finger-containing transcription factors. More than 300 genes of this family are members of 90 TAGs (with 218 genes in the arrays) located on 18 chromosomes. Of the 90 TAGs, 45 arrays are of size 2, 16 of size 3, 12 of size 4, and 17 of size ≥5. Similarly, the largest gene family in mouse is a class that contains olfactory receptor genes. Almost 800 genes in this family are products of tandem duplication and are located on 17 chromosomes. The largest TAG in this family has 55 members and is located on chromosome 10. Of the 68 TAGs in this family, 12 arrays are of size 2, 8 of size 3, 1 of size 4, and 47 of size ≥5.
This observation leads us to postulate that a large tandemly arrayed cluster is evolutionarily unstable. It is easy to imagine that once the array becomes large, various genome rearrangements such as insertions of transposable elements, inversions, and translocations can interrupt the array and reduce its size. Moreover, as the array size increases, the rate of unequal crossover might increase as well. Consequently, the rate of fluctuation in copy number will increase and so will the instability of the array. The instability of the large array may become deleterious at a certain threshold and be acted against by natural selection. Ohno (1970) discussed the possible deleterious effect that TAGs could generate due to the fluctuations in array size by unequal crossover and pointed out that genes in tandem array are not stable and have to be able to cope with the fluctuation in gene dosage. Unfortunately, this scenario is speculative because there have been no empirical studies on how or whether the rate of unequal crossover is affected by array size.
Another possible explanation is that large arrays are not the preferred form of arrangement that can satisfy the requirement of highly differentiated functions among array members. For example, a few gene families such as histone genes and ribosomal RNA genes in humans are in large TAGs due to high gene dosage requirements. However, for the majority of genes, diversity in function might be preferred over quantity, as clearly demonstrated by genes with binding and receptor activities and disease resistance functions (table 4).
TAG Orientations and Intergenic Distances
Graham defined "tandem arrays" as arrays in which a DNA segment is repeated head to tail, with all copies in the same orientation, and suggested that unequal recombination homogenizes head-to-tail tandem arrays but would cause arrays with oppositely oriented repeats to undergo disastrous duplication-deletion events, which results in these arrays being rare (Graham 1995). The definition of TAGs in the current study is somewhat different from Graham's. Nevertheless, the results show that compared with the genome, parallel orientation in TAGs appears to be more favored than divergent or convergent orientations (table 3), corroborating Graham's conjecture, at least in the 3 genomes. Furthermore, consistent with the great disparity between the proportion of parallel orientations in TAGs and that in the genomes, the intergenic distances of genes in parallel orientations also show the greatest disparity between TAGs and the genomes compared with the distances among genes in convergent and divergent orientations (see Supplementary Material online). This raises the question of the evolutionary significance of parallel orientations in TAGs: why do TAGs with parallel orientation show patterns distinct from those in the genome? Is there any adaptive significance to parallel orientation, or does the observed pattern simply reflect the pattern of unequal crossover? To our knowledge, there have been no previous studies examining the effect of parallel versus other types of orientations in TAGs other than Graham's hypothesis. More studies are needed to investigate the underlying mechanism and the nature of this phenomenon.
| Supplementary Material |
|---|
Supplementary tables and figures are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
We thank Brandon Gaut, Deng Pan, and Mark Lawson for valuable comments on the manuscript. The work was supported by a start-up fund and the A Support Program for Innovative Research Strategies (ASPIRES) grant at Virginia Tech to L.Z.
Funding to pay the Open Access publication charges for this article was provided by ASPIRES.
| Footnotes |
|---|
Aoife McLysaght, Associate Editor
| References |
|---|
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–402.
Anderson RP and Roth JR. (1977) Tandem genetic duplications in phage and bacteria. Annu Rev Microbiol 31:473–505.
Bailey JA, Yavor AM, Massa HF, Trask BJ, Eichler EE. (2001) Segmental duplications: organization and impact within the current human genome project assembly. Genome Res 11:1005–17.
Brown CJ, Todd K, Rosenzweig RF. (1998) Multiple duplications of yeast hexose transport genes in response to selection in a glucose-limited environment. Mol Biol Evol 15:931–42.
Cheung J, Wilson MD, Zhang J, et al. (12 co-authors). (2003) Recent segmental and gene duplications in the mouse genome. Genome Biol 4:R47.
Draghici S, Khatri P, Bhavsar P, Shah A, Krawetz SA, Tainsky MA. (2003) Onto-Tools, the toolkit of the modern biologist: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate. Nucleic Acids Res 31:3775–81.
Eichler E and Sankoff D. (2003) Structural dynamics of eukaryotic chromosome evolution. Science 301:793–7.
Enright AJ, Van Dongen S, Ouzounis CA. (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30:1575–84.
Friedman R and Hughes AL. (2003) The temporal distribution of gene duplication events in a set of highly conserved human gene families. Mol Biol Evol 20:154–61.
Friedman R and Hughes AL. (2004) Two patterns of genome organization in mammals: the chromosomal distribution of duplicate genes in human and mouse. Mol Biol Evol 21:1008–13.
Graham GJ. (1995) Tandem genes and clustered genes. J Theor Biol 175:71–87.
Harms CT, Armour SL, DiMaio JJ, Middlesteadt LA, Murray D, et al. (11 co-authors). (1992) Herbicide resistance due to amplification of a mutant acetohydroxyacid synthase gene. Mol Gen Genet 233:427–35.
Hastings PJ, Bull HJ, Klump JR, Rosenberg SM. (2000) Adaptive amplification: an inducible chromosomal instability mechanism. Cell 103:723–31.
Huynen MA and van Nimwegen E. (1998) The frequency distribution of gene family sizes in complete genomes. Mol Biol Evol 15:583–9.
Leister D. (2004) Tandem and segmental gene duplication and recombination in the evolution of plant disease resistance genes. Trends Genet 20:116–22.
Lenormand T, Guillemaud T, Bourguet D, Raymond M. (1998) Appearance and sweep of a gene duplication: adaptive response and potential for a new function in the mosquito Culex pipiens. Evolution 52:12.
McLysaght A, Hokamp K, Wolfe KH. (2002) Extensive genomic duplication during early chordate evolution. Nat Genet 31:200–4.
Nei M. (1987) Molecular evolutionary genetics. (Columbia University Press, New York).
Ohno S. (1970) Evolution by gene duplication. (Springer-Verlag, New York).
Reams AB and Neidle EL. (2004) Selection for gene clustering by tandem duplication. Annu Rev Microbiol 58:119–42.
Roth JR, Benson N, Galitski T, Haack K, Lawrence JG, Miesel L. (1996) Rearrangements of the bacterial chromosome: formation and applications. Escherichia coli and Salmonella cellular and molecular biology (ASM Press, Washington, DC) Volume 2: pp. 2256–76.
Semple C and Wolfe K. (1999) Gene duplication and gene conversion in the Caenorhabditis elegans genome. J Mol Evol 48:555–64.
Shyr YY, Hepburn AG, Widholm JM. (1992) Glyphosate selected amplification of the 5-enolpyruvylshikimate-3-phosphate synthase gene in cultured carrot cells. Mol Gen Genet 232:377–82.
Snedecor GW and Cochran WG. (1989) Statistical methods. (Iowa State University Press, New York).
Sokal RR and Rohlf FJ. (1994) Biometry. (W.H. Freeman, Ames, IA).
Stark GR. (1993) Regulation and mechanisms of mammalian gene amplification. Adv Cancer Res 61:87–113.
Wolfe KH and Li W-H. (2003) Molecular evolution meets the genomics revolution. Nature 33:255–65.
Wootton JC and Federhen S. (1993) Statistics of local complexity in amino acid sequences and sequence databases. J Comput Chem 17:149–63.
Yu J, Wang J, Lin W, et al. (117 co-authors). (2005) The genomes of Oryza sativa: a history of duplications. PLoS Biol 3:266–81.
Zhang L and Gaut BS. (2003) Does recombination shape the distribution and evolution of tandemly arrayed genes (TAGs) in Arabidopsis thaliana? Genome Res 13:2533–40.
Zhang L, Lu HHS, Chung W, Yang J, Li W-H. (2005) Patterns of segmental duplication in the human genome. Mol Biol Evol 22:135–41.