MBE Advance Access originally published online on June 29, 2006
Molecular Biology and Evolution 2006 23(9):1652-1655; doi:10.1093/molbev/msl048
Letter
Processed Pseudogenes Are More Abundant in Human and Mouse X Chromosomes than in Autosomes
Département de Biologie et Centre de Recherche Avancée en Génomique Environnementale, Université d'Ottawa, Ottawa, Ontario, Canada
E-mail: gdrouin{at}science.uottawa.ca.
| Abstract |
|---|
Two different hypotheses have been proposed to explain the observation that some genomes contain more processed pseudogenes than others. One predicts that processed pseudogene abundance is inversely proportional to the substrate specificity of the reverse transcriptase that generates processed pseudogenes. The other predicts that the amount of processed pseudogenes found in genomes is proportional to the length of oogenesis. Here, we test the oogenesis hypothesis by analyzing the data from 6 studies that described the number of pseudogenes on different chromosomes of the human and/or mouse genomes. Our results show a significant overabundance of processed pseudogenes in the X chromosomes and a significant underrepresentation of processed pseudogenes in the Y chromosome of the human genome. These observations support the hypothesis that the number of processed pseudogenes is proportional to the length of oogenesis.
Key Words: processed pseudogenes; X chromosomes; Y chromosomes; oogenesis; human genome; mouse genome
Processed pseudogenes are the result of the random integration of reverse-transcribed mature RNA molecules into genomes (Vanin 1985; Weiner et al. 1986). They are characterized by a lack of introns, the presence of a poly(A)-tail, and the presence of flanking direct repeats. Because mature RNA molecules do not contain promoter sequences, processed sequences are usually not expressed and they quickly accumulate frameshifts and/or premature stop codons and, in the vast majority of cases, become pseudogenes (Brosius 1999).
Processed pseudogenes are common in mammalian species but are much less abundant in other animal species. Whereas thousands of processed pseudogenes are present in the mouse and human genomes (Gonçalves et al. 2000; Zhang et al. 2002, 2003, 2004; Ohshima et al. 2003; Torrents et al. 2003; Khelifi et al. 2005; Bischof et al. 2006), the Caenorhabditis elegans genome contains only 208 processed pseudogenes, the chicken genome contains at most 51 processed pseudogenes, and the Drosophila melanogaster genome contains at most 34 processed pseudogenes (Harrison et al. 2001, 2003; Misra et al. 2002; International Chicken Genome Sequencing Consortium 2004). Recent results indicate that part of these differences in processed pseudogene abundance is likely due to the substrate specificity of the reverse transcriptase that generates processed pseudogenes (International Chicken Genome Sequencing Consortium 2004). However, differences in oogenesis are also likely to be responsible for these differences because the number of processed pseudogenes in animal species is proportional to the length of the lampbrush stage of oogenesis in these species (Weiner et al. 1986). One way to test this hypothesis is to examine whether processed pseudogenes are more abundant in X chromosomes, and less abundant in Y chromosomes, than in autosomes (Weiner et al. 1986; Graur and Li 2000). Whereas autosomes spend half their time in males and half in females, X chromosomes spend two-thirds of their time in females and only one-third of their time in males, and Y chromosomes are always in males (Miyata et al. 1987). Therefore, according to the oogenesis hypothesis, processed pseudogenes should be roughly 33% more abundant in X chromosomes than in autosomes (two-thirds versus one-half of the time spent in females) and absent from Y chromosomes. Here, we tested these predictions using data from 6 studies (Ohshima et al. 2003; Torrents et al. 2003; Zhang et al. 2003, 2004; Khelifi et al. 2005; Bischof et al. 2006).
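The 33% figure follows directly from the share of time each chromosome type spends in females. A minimal sketch of that arithmetic, assuming a 1:1 sex ratio and that processed pseudogenes arise only during oogenesis (the function and variable names are illustrative and do not come from the original study):

```python
from fractions import Fraction


def share_of_time_in_females(copies_per_female: int, copies_per_male: int) -> Fraction:
    """Fraction of a chromosome's copies carried by females, assuming a 1:1 sex ratio."""
    return Fraction(copies_per_female, copies_per_female + copies_per_male)


# Copies per individual: autosome (2 in each sex), X (2 in females, 1 in males),
# Y (0 in females, 1 in males).
autosome_share = share_of_time_in_females(2, 2)  # one-half
x_share = share_of_time_in_females(2, 1)         # two-thirds
y_share = share_of_time_in_females(0, 1)         # zero

# Expected abundance relative to autosomes if insertion happens only in females.
x_vs_autosome = x_share / autosome_share
y_vs_autosome = y_share / autosome_share

x_excess_percent = float((x_vs_autosome - 1) * 100)
print(f"X relative to autosomes: {x_vs_autosome} (about {x_excess_percent:.0f}% excess)")
print(f"Y relative to autosomes: {y_vs_autosome}")
```

The ratio of two-thirds to one-half is 4/3, which is the source of the "roughly 33%" prediction, and the zero share for the Y chromosome is why the hypothesis predicts no processed pseudogenes there.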
Whereas both mouse genome studies identified about 4,000 processed pseudogenes, the numbers of human processed pseudogenes identified by different groups are very different (table 1). In the human genome, these numbers range from 3,664 to 17,759, a difference of almost 5-fold. As discussed in Zhang and Gerstein (2004), these differences result from the criteria used by individual research groups to define processed pseudogenes. Groups using more stringent criteria identified fewer processed pseudogenes than those using less stringent criteria. For example, the much larger number of processed pseudogenes identified by Torrents et al. (2003) is due mainly to the fact that these authors did not limit the minimal size of processed pseudogenes (Zhang and Gerstein 2004).
Table 1 shows that processed pseudogenes are significantly more abundant in X chromosomes than in autosomes. All but 1 of the 11 processed pseudogene data sets show a significant excess of processed pseudogenes in the human and mouse X chromosomes (table 1). The only exception is the data set of human ribosomal protein processed pseudogenes, where such sequences are significantly less abundant than expected. Interestingly, the human genome data of Bischof et al. (2006)
Table 1 also shows that processed pseudogenes are significantly less abundant in Y chromosomes than in autosomes. The data from the studies of Ohshima et al. (2003), Khelifi et al. (2005), and Bischof et al. (2006) all show a clear and significant underrepresentation of processed pseudogenes on the human Y chromosome. The data from the study of Zhang et al. (2003) also support an underrepresentation of processed pseudogenes on the Y chromosome, but this support is entirely due to ribosomal protein processed pseudogene sequences. Again, the data of Bischof et al. (2006) are particularly interesting because of their internal control. In contrast to processed pseudogenes, there is a significant excess of nonprocessed pseudogenes on the Y chromosome. The fact that only processed pseudogenes are underrepresented on the Y chromosome supports the oogenesis hypothesis. The presence of some processed pseudogenes on the Y chromosome, where the oogenesis hypothesis predicts none, is likely due to the transfer of processed pseudogenes from X chromosomes to Y chromosomes by recombination in the pseudoautosomal region (Graur and Li 2000; Skaletsky et al. 2003). Finally, it is not clear why the data from Torrents et al. (2003) do not support the underrepresentation of processed pseudogenes on the human Y chromosome. However, the fact that this study did not identify any nonprocessed pseudogenes on either the X or the Y chromosome suggests that its data may be of limited use for testing the oogenesis hypothesis.
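To illustrate what a significant over- or underrepresentation on one chromosome can mean in practice, the sketch below runs a generic chi-square goodness-of-fit test with 1 degree of freedom. It is not necessarily the test used to produce table 1, whose procedure is not reproduced in this text. The counts and the 5% expected share are hypothetical placeholders, not values from any of the 6 studies, and the function name is an assumption made for this example.

```python
import math


def chi_square_one_chromosome(observed_on_chrom: int, observed_elsewhere: int,
                              expected_share: float) -> tuple[float, float]:
    """Chi-square goodness-of-fit test (1 degree of freedom) for one chromosome.

    expected_share is the fraction of pseudogenes expected on the chromosome
    under the null hypothesis, for example its share of the genome.
    """
    total = observed_on_chrom + observed_elsewhere
    expected_on_chrom = total * expected_share
    expected_elsewhere = total - expected_on_chrom
    chi2 = ((observed_on_chrom - expected_on_chrom) ** 2 / expected_on_chrom
            + (observed_elsewhere - expected_elsewhere) ** 2 / expected_elsewhere)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical data: 4,000 pseudogenes in total, 260 of them on a chromosome
# expected to carry 5% of them (200) under the null hypothesis.
chi2, p_value = chi_square_one_chromosome(260, 3740, 0.05)
direction = "excess" if 260 > 4000 * 0.05 else "deficit"
print(f"chi2 = {chi2:.2f}, p = {p_value:.2g}, direction = {direction}")
```

The test alone only says that the observed count departs from expectation. The direction of the departure (excess or deficit) has to be read from the counts, which is why the sketch reports it separately.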
In conclusion, data from 6 studies on the number of pseudogenes show a significant overabundance of processed pseudogenes in the X chromosomes of the human and mouse genomes and a significant underrepresentation of processed pseudogenes in the Y chromosome of the human genome. These observations support the hypothesis that the number of processed pseudogenes found in genomes is proportional to the length of oogenesis.
| Methods |
|---|
The data on human processed and nonprocessed pseudogenes of Bischof et al. (2006)
| Acknowledgements |
|---|
This work was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada.
| Footnotes |
|---|
Dan Graur, Associate Editor
| References |
|---|
Bischof JM, Chiang AP, Scheetz TE, Stone EM, Casavant TL, Sheffield VC, Braun TA. 2006. Genome-wide identification of pseudogenes capable of disease-causing gene conversion. Hum Mutat 27:545–52.
Brosius J. 1999. RNAs from all categories generate retrosequences that may be exapted as novel genes or regulatory elements. Gene 238:115–34.
Gonçalves I, Duret L, Mouchiroud D. 2000. Nature and structure of human genes that generate retropseudogenes. Genome Res 10:672–8.
Graur D, Li W-H. 2000. Fundamentals of molecular evolution. 2nd ed. Sunderland, MA: Sinauer Associates.
Harrison PM, Echols N, Gerstein MB. 2001. Digging for dead genes: an analysis of the characteristics of the pseudogene population in the Caenorhabditis elegans genome. Nucleic Acids Res 29:818–30.
Harrison PM, Milburn D, Zhang Z, Bertone P, Gerstein MB. 2003. Identification of pseudogenes in the Drosophila melanogaster genome. Nucleic Acids Res 31:1033–7.
International Chicken Genome Sequencing Consortium. 2004. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432:695–716.
Khelifi A, Duret L, Mouchiroud D. 2005. HOPPSIGEN: a database of human and mouse processed pseudogenes. Nucleic Acids Res 33:D59–66.
Misra S, Crosby MA, Mungall CJ, et al. (30 co-authors). 2002. Annotation of the Drosophila melanogaster euchromatic genome: a systematic review. Genome Biol 3:research0083.
Miyata T, Hayashida H, Kuma K, Mitsuyasu K, Yasunaga T. 1987. Male-driven molecular evolution: a model and nucleotide sequence analysis. Cold Spring Harbor Symp Quant Biol 52:863–7.
Ohshima K, Hattori M, Yada T, Gojobori T, Sakaki Y, Okada N. 2003. Whole-genome screening indicates a possible burst of formation of processed pseudogenes and Alu repeats by particular L1 subfamilies in ancestral primates. Genome Biol 4:R74.
Skaletsky H, Kuroda-Kawaguchi T, Minx PJ, et al. (40 co-authors). 2003. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature 423:825–37.
Torrents D, Suyama M, Zdobnov E, Bork P. 2003. A genome-wide survey of human pseudogenes. Genome Res 13:2559–67.
Vanin EF. 1985. Processed pseudogenes: characteristics and evolution. Annu Rev Genet 19:253–72.
Weiner AM, Deininger PL, Efstratiadis A. 1986. Nonviral retroposons: genes, pseudogenes, and transposable elements generated by the reverse flow of genetic information. Annu Rev Biochem 55:631–61.
Zhang Z, Carriero N, Gerstein MB. 2004. Comparative analysis of processed pseudogenes in the mouse and human genomes. Trends Genet 20:62–7.
Zhang Z, Gerstein MB. 2004. Large-scale analysis of pseudogenes in the human genome. Curr Opin Genet Dev 14:328–35.
Zhang Z, Harrison P, Gerstein MB. 2002. Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome. Genome Res 12:1466–82.
Zhang Z, Harrison PM, Liu L, Gerstein MB. 2003. Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome. Genome Res 13:2541–58.