

MBE Advance Access originally published online on December 7, 2007
Molecular Biology and Evolution 2008 25(1):131-143; doi:10.1093/molbev/msm251

© The Author 2007. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Analysis of Nuclear Receptor Pseudogenes in Vertebrates: How the Silent Tell Their Stories

Zhengdong D. Zhang*, Philip Cayting*, George Weinstock† and Mark Gerstein*,‡,§

* Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut
† Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, Texas
‡ Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut
§ Department of Computer Science, Yale University, New Haven, Connecticut

E-mail: mark.gerstein@yale.edu.


    Abstract
Transcription factor pseudogenes have not been systematically studied before. Nuclear receptors (NRs) constitute one of the largest groups of transcription factors in animals (e.g., 48 NRs in human). The availability of whole-genome sequences enables a global inventory of the NR pseudogenes in a number of vertebrate model organisms. Here we identify the NR pseudogenes in 8 vertebrate organisms and make our results available online at http://www.pseudogene.org/nr. The assignments reveal that NR pseudogenes as a group have characteristics related to generation and distribution contrary to expectations derived from previous large-scale pseudogene studies. In particular, 1) despite its large size, the NR gene family has only a very small number of pseudogenes in each of the vertebrate genomes examined; 2) despite the low transcription levels of NR genes, all but one of the NR pseudogenes identified in this study are retropseudogenes; and 3) no duplicated NR pseudogenes are found, even though the NR gene family was expanded through several waves of gene duplication events. Our analyses further reveal a number of interesting aspects of NR pseudogenes. First, through careful sequence analysis, we identify remnant introns in 2 mouse retropseudogenes, ψRev-erbβ and ψLRH1. Generated from partially processed pre-mRNAs, they appear to be rare examples of highly unusual "semiprocessed" pseudogenes. Second, by comparing the genomic sequences, we uncover a pseudogene that is unique to the human lineage relative to chimpanzee. Generated by a recent duplication of a segment in the human genome, this pseudogene is a "duplicated–processed" pseudogene, belonging to a new pseudogene species. Finally, FXRβ was nonfunctionalized in the human lineage and thus appears to be an example of a rare unitary pseudogene. By comparing orthologous sequences, we dated the FXR–FXRβ duplication and the nonfunctionalization of FXRβ in primates.

Key Words: nuclear receptor • pseudogene • nonfunctionalization • protein evolution


    Background
Nuclear receptors (NRs) regulate nuclear gene expression in response to various extracellular and intracellular signals and play a prominent role in a group of diverse and critical biological processes such as reproduction, differentiation, development, metabolism, metamorphosis, and homeostasis. Activated by binding of small hydrophobic molecules, they provide a direct link between ligands that signal different stages of those processes and cells' transcriptional responses. All NRs share a similar domain arrangement and, with a few exceptions, contain both the DNA-binding domain (DBD) and the ligand-binding domain (LBD), the 2 most conserved signature domains of this protein family. NRs have been specifically surveyed and studied in several species whose genomes have been fully sequenced, which include Ciona intestinalis (Dehal et al. 2002), Caenorhabditis elegans (Sluder et al. 1999), Drosophila melanogaster (Adams et al. 2000), human (Robinson-Rechavi et al. 2001; Zhang et al. 2004), mouse (Zhang et al. 2004), and rat (Zhang et al. 2004).

Pseudogenes (ψ) are nongenic DNA segments that exhibit a high degree of sequence similarity to functional genes but contain disruptive defects, such as premature stop codons, splice site mutations, and frameshift mutations, which prevent them from being expressed properly. Disruption of the promoter region of a gene can also result in its pseudogenization. Based on whether they have gone through RNA processing, pseudogenes can be classified into 2 categories: processed and unprocessed pseudogenes. Processed pseudogenes are generated by the integration of the reverse transcription products of processed mRNA transcripts into the genome. An unprocessed pseudogene has not gone through RNA processing and thus has retained the original exon–intron structure of the functional gene.
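To make the notion of a disruptive defect concrete, the following minimal Python sketch flags two of the defect types named above in a candidate coding sequence: in-frame premature stop codons and a length that is not a multiple of 3, which is used here only as a crude frameshift indicator. The function names and the toy sequences are hypothetical illustrations and are not part of the identification pipeline used in this study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(cds: str) -> list[int]:
    """Return 0-based nucleotide offsets of in-frame stop codons that
    occur before the final codon of `cds` (reading frame 0 is assumed)."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    hits = []
    # The final codon is excluded: a stop there is the normal terminator.
    for i in range(n_codons - 1):
        if cds[3 * i:3 * i + 3] in STOP_CODONS:
            hits.append(3 * i)
    return hits


def length_breaks_frame(cds: str) -> bool:
    """Crude frameshift indicator (an assumption of this sketch):
    the sequence length is not a multiple of 3."""
    return len(cds) % 3 != 0


intact = "ATGAAACCCTAA"        # ATG AAA CCC TAA
disrupted = "ATGAAATGACCCTAA"  # ATG AAA TGA CCC TAA
print(premature_stops(intact))     # no premature stop expected
print(premature_stops(disrupted))  # stop codon at offset 6
```

A real pipeline would compare the candidate against the parent protein by alignment instead of scanning a single frame, because a frameshift changes the reading frame downstream of the defect.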

Previous studies have identified 3 NR pseudogenes in human: ψERRα (Sladek et al. 1997), ψHNF4γ (Tchenio et al. 1993), and ψFXRβ (Maglich et al. 2001; Otte et al. 2003) (see table 1 for symbols and full names of NRs included in this study). Recently, several other NR pseudogenes were also identified in mice and rats (Zhang et al. 2004). However, the availability of 8 vertebrate genome sequences (Waterston et al. 2002; Gibbs et al. 2004; International Chicken Genome Sequencing Consortium 2004; International Human Genome Sequencing Consortium 2004; The Chimpanzee Sequencing and Analysis Consortium 2005; Lindblad-Toh et al. 2005) makes it possible to conduct a detailed study of the NR pseudogenes in both human and vertebrate model systems. Here we present a comprehensive survey of NR pseudogenes in these 8 vertebrate genomes and report their locations, sequences, and defects. Recently, pseudogenes in the entire human genome have been identified either in gene family–specific studies (Glusman et al. 2001; Zhang et al. 2002) or in comprehensive surveys (Ohshima et al. 2003; Torrents et al. 2003; Zhang et al. 2003). Based on the mechanisms for pseudogene generation and the observations reported in those large-scale studies, we expected that NR pseudogenes would be mostly duplicated pseudogenes (like olfactory receptor pseudogenes), with few processed ones, because NR genes were created by multiple gene duplication events and most NR genes have low expression levels. Our survey results, however, are in striking opposition to these initial expectations. The analysis of these pseudogenes affords unique insights into the evolution and dynamics of this gene family and the mammalian genomes at large.


Table 1 Symbols of NR Used in the Text

 

    Methods
The human, mouse, and rat genomic sequences used in this study were the human genome build of May 2004, the mouse genome build of May 2004, and the rat genome build of June 2003. Each of these 3 genomes was partitioned into 750-kb segments with 2-kb overlaps to take advantage of parallel computing. The DBD and LBD (designated as zf-C4 and hormone_rec in the Pfam database) were searched in the genomic sequences using GENEWISEDB. Predictions with frameshifts and premature stop codons that could not be credibly attributed to sequencing errors were retained and aligned with 62 representative NR protein sequences to reveal their identities, which were taken to be the best BlastP hits. The NR protein sequences identified in this way were then aligned to 10-kb genomic sequence intervals centered on the positions of these predictions using both GENEWISEDB and BLAT. The sequences, defects, and structures of the NR pseudogenes were constructed from the GENEWISEDB and BLAT alignments, which verified and complemented each other.
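The partitioning step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say whether the 750 kb includes the 2-kb overlap, so the sketch assumes each segment is at most 750 kb long and shares its last 2 kb with the start of the next one; the function name and coordinate convention (0-based, half-open) are likewise hypothetical.

```python
from typing import Iterator, Tuple


def partition(length: int,
              segment: int = 750_000,
              overlap: int = 2_000) -> Iterator[Tuple[int, int]]:
    """Yield 0-based half-open (start, end) intervals that cover
    [0, length). Consecutive intervals share `overlap` bases."""
    if overlap >= segment:
        raise ValueError("overlap must be smaller than the segment size")
    if length <= 0:
        return
    step = segment - overlap
    start = 0
    while True:
        end = min(start + segment, length)
        yield start, end
        if end == length:
            break
        start += step


# A toy 1.6-Mb chromosome yields three overlapping segments.
for start, end in partition(1_600_000):
    print(start, end)
```

The overlap ensures that a domain hit straddling a segment boundary is fully contained in at least one segment, provided the hit is shorter than the overlap.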

To estimate the date of the FXR–FXRβ duplication (TD), 4 homologous sequences, mouse FXR, mouse FXRβ, rat FXR, and rat FXRβ, were used (Li 1997). Because the numbers of synonymous substitutions per synonymous site (Ks) are large and thus cannot be estimated accurately, they are not used to calculate TD. As the equation below shows, only the numbers of nonsynonymous substitutions per nonsynonymous site (Ka) are used. TD is estimated by

$$T_D = \frac{2\,T_S\,\bar{K}_a}{K_a^{\mathrm{FXR}} + K_a^{\mathrm{FXR}\beta}}$$

where TS is the divergence time between mouse and rat, for which 41 Myr was used in the calculation (Hedges 2002); K̄a is the average of the 4 Ka values estimated from 4 pairwise comparisons (mouse FXR–mouse FXRβ, mouse FXR–rat FXRβ, rat FXR–mouse FXRβ, and rat FXR–rat FXRβ); and KaFXR and KaFXRβ are the numbers of nonsynonymous substitutions per nonsynonymous site in FXR and FXRβ, respectively (supplementary table 1, Supplementary Material online).
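The calculation can be sketched as follows, assuming the standard relation TD = 2 TS K̄a / (KaFXR + KaFXRβ). Under this assumption, each gene accumulates nonsynonymous substitutions at a rate of Ka/(2 TS) per lineage, where Ka is measured between the mouse and rat orthologs, and the paralog distance K̄a equals the sum of the two rates multiplied by TD. The function name and all input values below are hypothetical placeholders; they are not the values in supplementary table 1.

```python
def duplication_time(ka_paralog_pairs, ka_fxr, ka_fxrb, t_s=41.0):
    """Estimate the duplication date T_D (same time unit as `t_s`).

    ka_paralog_pairs: Ka for each FXR-vs-FXRbeta comparison
    ka_fxr, ka_fxrb:  Ka between the mouse and rat orthologs of each gene
    t_s:              mouse-rat divergence time (41 Myr in the text)
    """
    if ka_fxr + ka_fxrb <= 0:
        raise ValueError("ortholog distances must sum to a positive value")
    k_bar = sum(ka_paralog_pairs) / len(ka_paralog_pairs)
    return 2.0 * t_s * k_bar / (ka_fxr + ka_fxrb)


# Placeholder inputs chosen only to show the arithmetic.
example = duplication_time([0.5, 0.5, 0.5, 0.5], ka_fxr=0.02, ka_fxrb=0.08)
print(round(example, 1))  # 2 * 41 * 0.5 / 0.10
```

Because the estimate scales with the ratio of the paralog distance to the summed ortholog distances, small ortholog Ka values make the result sensitive to estimation error.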

To estimate the nonfunctionalization time (TN) of ψFXRβ in the primate lineage, we used the method devised by Chou et al. (2002), who describe it in detail. Briefly, it assumes that nonsynonymous mutations are selected against until the gene is inactivated; thereafter, mutations at both synonymous and nonsynonymous sites accumulate at the neutral mutation rate. Quantification of lineage-specific mutation rates at synonymous and nonsynonymous sites remote from the inactivating deletion provides the information necessary for the calculation. Four FXRβ sequences, from human, mouse, rat, and chicken, were used for the calculation (supplementary table 2, Supplementary Material online). We used the method proposed by Li et al. (1981) to estimate the nonfunctionalization time of all retropseudogenes identified in this study. Because they are "dead on arrival," we assumed that TN = TD.

Multiple FXR and FXRβ peptide sequences together with the human LXRα peptide sequence were aligned using MUSCLE (Edgar 2004). The phylogeny of FXR and FXRβ was constructed from this sequence alignment using an implementation of the Neighbor-Joining algorithm in the PAUP*4.0 software package with a bootstrap of 1,000 replicates. The tree was rooted by LXRα.


    Results
NR Pseudogenes in Vertebrate Model Organisms
By using manual annotation and a pseudogene identification pipeline, we assigned NR pseudogenes in human, chimpanzee, mouse, rat, dog, chicken, tetraodon, and zebrafish—8 vertebrate model organisms whose genomes have been sequenced. Our identification results are available at http://pseudogene.org/nr. We focused our analyses on NR pseudogenes in human, chimpanzee, mouse, and rat due to the incomplete genome annotation for the other vertebrate genomes, which prevents complete assignments and confident interpretation of pseudogenes identified in those genomes. However, as the annotation improves, we will update our NR pseudogene assignments and post the results online.

Overall, there are only a very small number of NR pseudogenes in each of the vertebrate genomes examined. Within the human, chimpanzee, mouse, and rat genomes, 4, 3, 5, and 3 NR pseudogenes were identified, respectively (table 2). The existence of the 3 previously reported pseudogenes in the human genome, ψERRα (Sladek et al. 1997), ψHNF4γ (Tchenio et al. 1993), and ψFXRβ (Maglich et al. 2001; Otte et al. 2003), was confirmed by our analysis. Except for 1 human NR pseudogene, ψFXRβ, which is unprocessed, all other NR pseudogenes identified are retropseudogenes. No duplicated NR pseudogenes were identified, a finding quite contrary to our expectation as described above and in the discussion; that is, because NR genes encode transcription factors and generally have low and restricted transcription profiles, we expected most of the NR pseudogenes to be created by duplication.


Table 2 Human and Rodent NR Pseudogenes

 
Two ψERRα Are in the Human Genome
Sladek et al. (1997) reported the isolation of a processed ERRα pseudogene mapped to human chromosome 13q12.1. In our study, however, 2 processed ψERRα (ψERRα+ and ψERRα–), immediately next to each other on opposite DNA strands, were identified in the same chromosome band (13q12.11). The genomic sequence interval between these 2 ψERRα, approximately 1.7 Mb, is well below the resolution limit of the conventional fluorescence in situ hybridization that Sladek et al. performed on metaphase chromosomes, which precluded the identification of both pseudogenes in their study.

These 2 human ψERRα sequences are very similar (but not identical, which rules out the possibility of a sequence assembly error): their Hamming distance, DH, which measures the proportion of site differences between 2 sequences, is only 3.65%, and the number of nucleotide substitutions per site between them, K, is 0.038 ± 0.006. The ψERRα on the forward strand contains 5 frameshifts, the ψERRα on the reverse strand has 4, and both have a premature stop codon at different positions. Of these defects in their sequences, 3 frameshifts are identical. Except for several internal deletions, both ψERRα are full-length and highly similar, albeit defunct, copies of the transcript of the functional gene, which suggests a young age (~38 MYA) for both of them.
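The relationship between the proportion of differing sites and the number of substitutions per site can be illustrated with a short Python sketch. The substitution model behind the reported K is not stated here, so the Jukes-Cantor correction below is an assumption made only for illustration, as are the function names and the choice to skip alignment columns that contain a gap.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor estimate of substitutions per site from p-distance."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(p_distance("ACGTACGT", "ACGTACGA"))  # 1 difference in 8 sites
print(round(jukes_cantor(0.0365), 4))      # correction applied to 3.65%
```

Under this assumed model, a proportion of 3.65% corresponds to roughly 0.037 substitutions per site, which lies within the reported interval of 0.038 ± 0.006; at such low divergence the correction for multiple hits is small.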

As expected, we identified a set of NR pseudogenes in chimpanzee similar to those in human. However, the chimpanzee ortholog of the human ψERRα+ is absent. This absence indicates that ψERRα– was created first, at least before the divergence of human and chimpanzee. At the same time, the high sequence similarity and the shared defects between human ψERRα+ and ψERRα– suggest that the former was created by the duplication of the latter in the human lineage after its divergence from chimpanzee. In fact, those 2 pseudogenes reside in 2 expansive (>14.6 kb) and highly similar (96% identical) sequence segments in human chromosome 13 that were created by a recent (<6 MYA), human-specific segmental duplication (Bailey et al. 2002; Cheng et al. 2005). Thus, human ψERRα+ is a duplication of a processed pseudogene. This "duplicated–processed" pseudogene belongs to a new category of pseudogenes, first noted in a study of the human cytochrome c pseudogenes (Zhang and Gerstein 2003), that differ from both duplicated and processed pseudogenes in terms of their underlying generating processes. The original processed pseudogene and the pseudogene duplicated from it both have little consequence for the fitness of the organism. Nevertheless, they are distinct pseudogene species. The distinction made between them is important for estimating the frequency of retrotransposition of mRNA transcripts. Clearly, such an estimate will be inflated if the duplicated–processed pseudogenes are not excluded, because they were generated by duplication, not retrotransposition, events.

Human ψFXRβ Is a Unitary Pseudogene with Multiple Nonfunctionalization Mutations
Previous studies (Maglich et al. 2001; Otte et al. 2003) have shown that human FXRβ is an unprocessed pseudogene with no functional counterpart (unitary pseudogene) in the human genome. This gene was also nonfunctionalized in other Old World primates studied so far but encodes a functional receptor in other mammals (see Otte et al. [2003] and below). The alignment of the mouse FXRβ protein sequence to the 3-frame translation of the human genomic sequence reveals that the "coding sequence" of the original human FXRβ gene was interrupted by at least 9 introns, and in the currently defunct gene, there are 11 disruptive defects, which consist of 4 frameshifts, 4 nonsense mutations, and 3 splice site mutations (fig. 1). These defects are equally distributed at the beginning and the end of this pseudogene.


FIG. 1. The gene structure of human ψFXRβ. The mouse FXRβ protein sequence and the translation of the human genomic sequence at the ψFXRβ locus are aligned. The identical and similar character states in the alignment are indicated by vertical lines and colons, respectively. The identified sequence defects in the human ψFXRβ locus are denoted in its translation by different symbols according to their types (see the figure key table) and also marked uniformly above the alignment. The human sequence coordinates indicate the distance of the nucleotide from the beginning of the genomic sequence from the sequencing clone RP11-350E19 (GenBank accession: AL358372.11).

 
Human ψFXRβ and its mouse ortholog are located in 2 expansive (>25 Mb) syntenic regions in the 2 genomes (fig. 2). The presence of the same set of genes, in an identical order and orientation, in the 2 genomic neighborhoods makes it unlikely that human FXRβ was inactivated by a chromosomal translocation or other genomic rearrangement processes. The comparison of the orthologous sequences from human, chimpanzee, and rhesus (fig. 3A) reveals both ancestral and lineage-specific sequence defects, 14 in all, in ψFXRβ from these 3 primates (fig. 3B). The disruptive mutations at the 1st, 2nd, and 14th positions in ψFXRβ are present in all 3 species and, hence, most likely arose in the common ancestor of human, chimpanzee, and rhesus. Because the mutation at the 14th position, a nonsense mutation, is at the very end of the coding sequence and thus had considerably less disrupting power, either of the other 2 common mutations, 1 frameshift mutation and 1 splice site mutation at the start of the reading frame, could be the mutation that pseudogenized FXRβ in these primates. The orthologous genomic sequences from other primate species would make it possible to pin down the silencing mutation.
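The reasoning used to place defects on the primate phylogeny can be illustrated with a short Python sketch: a defect shared by every species in a clade is assigned to the ancestor of that clade. The sets below contain only the defect positions whose distribution is stated in the text and in the legend of fig. 3; the lineage-specific defects are deliberately omitted because their assignments are not given in the text, so the sets are incomplete. The variable and function names are hypothetical.

```python
# Defect positions per species, restricted to those whose distribution
# is stated in the article; lineage-specific defects are left out.
DEFECTS = {
    "human": {1, 2, 3, 4, 5, 9, 10, 14},
    "chimpanzee": {1, 2, 3, 4, 5, 9, 10, 14},
    "rhesus": {1, 2, 14},
}


def shared_by_all(defects, species):
    """Defects present in every listed species."""
    return set.intersection(*(defects[s] for s in species))


def clade_specific(defects, clade, outgroup):
    """Defects shared by every species in `clade` and absent from
    every species in `outgroup`."""
    shared = shared_by_all(defects, clade)
    for s in outgroup:
        shared -= defects[s]
    return shared


primate_ancestor = shared_by_all(DEFECTS, ["human", "chimpanzee", "rhesus"])
human_chimp_ancestor = clade_specific(
    DEFECTS, ["human", "chimpanzee"], ["rhesus"])
print(sorted(primate_ancestor))       # [1, 2, 14]
print(sorted(human_chimp_ancestor))   # [3, 4, 5, 9, 10]
```

This simple intersection assumes that each defect arose once and was never reverted, which is the same parsimony assumption made in the argument above.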


FIG. 2. The genomic context of human and mouse ψFXRβ loci. The gene structure was constructed from the sequence alignment of mouse FXRβ protein sequence to the translated human genomic sequence. The approximate locations of the defects in human ψFXRβ are indicated by black dots above its enlarged gene structure. All exons, introns, and intergenic regions are drawn in proportion.

 

FIG. 3. Human, chimpanzee, and rhesus ψFXRβ. (A) Disruptive defects in ψFXRβ. Such sequence defects, including frameshifts, nonsense mutations, and splice site mutations, were found in the sequence alignment at 14 orthologous positions, which are numbered and accented in bold underlined letters. For clarity, the base letters in chimpanzee and rhesus ψFXRβ sequences identical to their corresponding ones in human ψFXRβ were replaced with dots. In this sequence alignment, "[ ]" marks the intron boundaries, "–" represents the gaps, and "~" the lost orthologous sequences. (B) Lineage specificity of disruptive defects in ψFXRβ. Defects specific to human, chimpanzee, and rhesus are shown at the corresponding leaf nodes. Defects that occurred in an ancestor, shown at a branching node, are found in all its descendants. Thus, defects 1, 2, and 14 are found in all 3 primate species, whereas defects 3, 4, 5, 9, and 10 are found in both human and chimpanzee but not in rhesus.

 
Based on 4 pairwise comparisons among the mouse and rat FXR and FXRβ sequences, our study dated the ancient gene duplication event that created this pair of paralogous genes to ~496 MYA, prior to the speciation events (~450 MYA) that ultimately gave rise to fishes and other vertebrates (fig. 4A). This estimate is supported by searches for FXR and FXRβ in the genomes of representative species, which show that both genes exist in human, chimpanzee, mouse, chicken, frog (Xenopus tropicalis), and fish (both zebrafish and puffer fish; supplementary fig. 1, Supplementary Material online). The phylogeny of FXR and FXRβ reveals that by the measure of branch length (data not shown), FXRβ is evolving at least 5.6 times faster than FXR in mammals, but a similar difference in evolutionary rate is not observed in nonmammalian vertebrates (fig. 4B; see supplementary fig. 2, Supplementary Material online, for the multiple sequence alignment). Based on human, mouse, rat, and dog FXRβ sequences, our calculation indicates that the silencing of FXRβ happened ~42 MYA.


FIG. 4. The evolution of FXR and FXRβ. (A) The relationships and divergence times of major groups of vertebrates (Hedges 2002). Both the FXR–FXRβ duplication and FXRβ inactivation events are dated and marked accordingly in the phylogeny. Branch lengths are not proportional to time. (B) Dendrogram of FXR and FXRβ. The evolution of FXR and FXRβ in mammals is juxtaposed and highlighted in the tree. The difference in their evolution speed is readily perceivable. Branch lengths are proportional to time. The dendrogram was tested with a bootstrap of 1,000 replications, and the bootstrap values in percentage are labeled by the branching points.

 
Intergenic Sequences Immediately Upstream and Downstream to Human ψFXRβ Are Conserved
Human ψFXRβ is a transcribed pseudogene: Real-time quantitative polymerase chain reaction detected relatively high levels of expression of its mRNA in testis (Maglich et al. 2001; Otte et al. 2003). This strongly suggests that the promoter and possibly other cis-acting elements that regulate the transcription of human ψFXRβ have remained largely intact and functional even long after the inactivation of ψFXRβ. Alignment of multiple genomic sequences from 14 vertebrates including human shows strong sequence conservation in the upstream noncoding regions of human ψFXRβ, where regulatory elements may reside. Three highly conserved sequence segments, each ~15 bp, were found within ~250 bp immediately upstream to the coding sequence of ψFXRβ (fig. 5A). Further upstream, ~4,500 bp away in an expansive (75 kb) intergenic region between SIKE and SYCP1, resides a ~250-bp sequence segment that is highly conserved across vertebrates between human and chicken (fig. 5B). This sequence segment has a high regulatory potential (>0.35, see King et al. [2005]), and its mouse orthologous sequence is only 100 bp upstream to the first (noncoding) exon of the mouse FXRβ.


FIG. 5. Conservation of intergenic sequence upstream to human ψFXRβ. (A) Three highly conserved sequence segments immediately upstream to the coding sequence of ψFXRβ and the alignment of orthologous sequences from 13 vertebrates in these 3 sequence segments. (B) A highly conserved ~250-bp sequence segment with a high regulatory potential 4.5 kb upstream to ψFXRβ and (C) a zoom-in view. Notice that this sequence segment has a high regulatory potential comparable to that of the transcription start site of the functional gene SYCP1.

 
Some NR Pseudogenes Were Derived from Semiprocessed RNA Transcripts
Most retropseudogenes were created from processed RNA transcripts. In this study, however, we found that 2 mouse NR pseudogenes contain remnant introns, which suggests that they were derived from semiprocessed RNA transcripts instead. Mouse ψRev-erbβ on chromosome 19 is such a "semiprocessed pseudogene," as the fifth of 7 introns of Rev-erbβ was largely retained (fig. 6A). Although its splice sites remain largely intact, this intron of ψRev-erbβ, containing 1,962 nt, is two-thirds the length of its homologous sequence in Rev-erbβ. Despite the length difference, these 2 introns share some sequence homology, mainly in their first 500 bases. A closer look also revealed another informative divergence: Although there is no interspersed repeat sequence present in the fifth intron of Rev-erbβ, the intron of ψRev-erbβ hosts 2 short interspersed nuclear elements and 1 long interspersed nuclear element (LINE).


FIG. 6. Detailed structures of 2 NR semiprocessed pseudogenes. (A) Correspondence between the gene structures of Rev-erbβ and ψRev-erbβ in the mouse genome. Mouse ψRev-erbβ is a semiprocessed pseudogene with a reduced intron, in which 2 short interspersed elements (the white arrows) and 1 LINE (the gray arrow) were found. These 3 interspersed repetitive sequences were not found in the intron at the same location in the functional paralogous gene. The similar sequences shared between the 2 introns, enlarged for clarity, are indicated by thicker line segments. In the picture, only the exons and the features in the 2 introns of interest were kept in proportion within each group. (B) The remnant intron in mouse ψLRH1 on chromosome 3. Sequence alignment shows that 2 sequence segments in this remnant intron have similar subsequences (86% and 100% identical, respectively) in the intron at the same location in LRH1. "[ ]" marks the intron boundaries, "*" represents a nonsense mutation, "!" a frameshift mutation, and "..." omitted sequences. The possible splicing sites, with a mutated donor site, are underlined.

 
There are 2 ψLRH1 in the mouse genome. Unlike ψLRH1 on chromosome 6, which is a processed pseudogene, ψLRH1 on chromosome 3 has a small intron, 86 bp long, in its sequence (fig. 6B). Sequence alignment located this intron at the same place as the third intron, which is over 3.5 kb long, in the coding sequence of LRH1. Although the 2 introns differ greatly in length, some limited sequence similarity is shared between them, which, in addition to their identical locations in the respective genes, suggests that the former originated from the latter and was shortened subsequently. However, the presence of both the additional 3 bases, ATT, before the donor site (GT) and the 24 bases that could not be found in the corresponding intron of LRH1 is yet to be explained.


    Discussion
NR Pseudogenes Are Scarce
Overall, there are only a very small number of NR pseudogenes in each of the vertebrate genomes examined. Surprisingly, we could not identify any duplicated NR pseudogenes. The absence of duplicated NR pseudogenes is highly unusual because the NR family was expanded through 2 rounds of gene duplications to recognize more ligands as environmental signals: one that gave rise to the various groups of receptors before the arthropod/vertebrate split and the vertebrate-specific one that diversified the constituents of each group by creating the paralogous versions of the various receptors (Laudet 1997). Compared with the human olfactory receptor family, which was expanded through recent gene duplications but contains 359 (53%) duplicated pseudogenes (Glusman et al. 2001), the absence of duplicated NR pseudogenes suggests that the duplications of the ancestral NR genes were tightly controlled: All NR genes newly created by duplication could successfully subfunctionalize and subsequently evolve into functionally different NR genes.

The number of processed NR pseudogenes is also unexpectedly small. In the human genome, ~8,000 processed pseudogenes, which originate from ~2,500 distinct functional genes, have been identified (Zhang et al. 2003); that is, about 3 processed pseudogenes for each functional gene that has been retrotransposed, an average well above that of the NR family observed here. Given the size of the NR family (48 in human, 48 expected in chimpanzee, 49 in mouse, and 49 in rat were found in a genome-wide survey, see Zhang et al. [2004]), the scarcity of NR retropseudogenes is further evinced by the comparison with the ribosomal protein-coding genes, which have more than 1,700 (Zhang et al. 2002) retropseudogenes. The scarcity of NR retropseudogenes reflects the overall low expression level and oftentimes restricted expression locale of the NR genes and could be a general feature of most transcription factor–coding genes.

The inheritance and fixation of processed pseudogenes in a genome require, as a necessary condition, gene expression in the germ line or cells of the early embryo that contribute to the germ line. It has been shown that the required reverse transcription machinery can be provided by LINEs (Esnault et al. 2000). In addition, endogenous retroviruses (ERVs) can also contribute to the creation of processed pseudogenes (Jamain et al. 2001) as several ERV families are predominantly expressed in germ cells (especially in male germ cells) and in embryonic tissues (Lower et al. 1996).

The existence of processed pseudogenes of HNF4γ, ERRα, Rev-erbβ, PNR, ERRβ, and LRH1 implies such an expression pattern for these NR genes. The expression of HNF4γ was detected in spermatocytes and spermatozoa of testis (Drewes et al. 1996; Taraviras et al. 2000). ERRα is expressed both in the developing embryo (Bonnelye et al. 1997) and broadly in adult tissues including testis (Giguere et al. 1988). A recent study shows that LRH1 is expressed in the zygote and early embryo in the blastocyst in the inner cell mass, which at gastrulation gives rise, in part, to the germ line (Pare et al. 2004). Although expression of Rev-erbβ and PNR in germ line and early embryo has not been reported, their processed pseudogenes strongly suggest such an expression pattern.

Nonfunctionalization of FXRβ Was a Rare Event that Happened in the Evolution of Anthropoids
The creation of FXRβ exemplifies an episode in the second series of duplication events that created the paralogous versions of various receptors in vertebrates (Laudet 1997). Unlike most other paralogous NR genes, however, FXR and FXRβ have been evolving very differently in mammals: FXRβ is evolving much faster than FXR in mammals, but a similar difference in evolutionary rate is not observed in nonmammalian vertebrates. It is known that both FXR and FXRβ regulate the biosynthesis of cholesterol (Goodwin et al. 2000; Lu et al. 2000; Otte et al. 2003). The accelerated evolution, a phenomenon also observed in many other new genes (Begun 1997; Johnson et al. 2001; Maston and Ruvolo 2002; Wang et al. 2002), is needed for FXRβ to be subfunctionalized as a receptor for lanosterol, a ligand different from the bile acids, which activate FXR.

Nonfunctionalization of FXRβ was a relatively recent event. Otte et al. (2003) studied FXRβ in human, chimpanzee, gorilla, orangutan, and rhesus monkey, all of which are Old World primates, and found in all of them telltale pseudogene defects similar to those in the human ortholog but not in the gene sequences from any other mammals. Our estimate of the date of FXRβ silencing indicates that this event postdated the separation of catarrhines and platyrrhines in the primate phylogeny and thus suggests that FXRβ is not a pseudogene in the New World monkeys, such as marmosets and squirrel monkeys. Given the ~496 Myr of evolution since its creation, FXRβ had probably already evolved to encode an NR different from FXR before its nonfunctionalization.

Because the loss of a single-copy gene is usually deleterious and unlikely to be fixed in a population, it remains unclear under what circumstances FXRβ was silenced, making it an exceedingly rare unitary pseudogene, and how its loss was tolerated and fixed in the ancestral anthropoid population. Two explanations, however, are possible. If the function that FXRβ provided became redundant in the ancient anthropoids under certain conditions, then ψFXRβ could have been fixed in the population by random genetic drift under those conditions: the loss of the FXRβ product did not constitute a disadvantage, and thus selection against the loss was weak. This release from selective pressure is believed to be how the nonfunctionalization of L-gulono-γ-lactone oxidase could be fixed in humans and guinea pigs (Koshizaka et al. 1988): it has been hypothesized that the guinea pig and human ancestors subsisted on a diet naturally rich in ascorbic acid, and therefore, the loss of the enzyme did not constitute a disadvantage. Alternatively, instead of being a neutral event, the silencing of FXRβ could have been advantageous to the anthropoid ancestors and consequently swept through the population to fixation. This kind of adaptive evolution is illustrated by the inactivation of the α-1,3-galactosyltransferase gene in catarrhines (Galili and Swanson 1991) and of the sarcomeric myosin gene (Stedman et al. 2004) and the CMP-N-acetylneuraminic acid hydroxylase gene (Chou et al. 2002) in humans, where there seems to be a correlation between pseudogenization and physiological or anatomical changes. To our knowledge, no such correlation has been investigated for FXRβ inactivation. Until more data become available and further analyses are carried out, it remains unclear which route, random genetic drift or positive selection, led to the fixation of ψFXRβ.
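To make the contrast between the two fixation routes concrete, the following Python sketch evaluates Kimura's standard diffusion approximation for the fixation probability of a single new mutation. It is an illustration only and was not part of our analysis: the function name `fixation_probability`, the population size, and the selection coefficients are hypothetical, and the formula assumes a diploid population whose effective size equals its census size N, with genic (additive) selection.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Fixation probability of one new mutant allele (initial frequency
    1/(2N)) with selection coefficient s in a diploid population of size N.

    Uses Kimura's diffusion approximation,
        u = (1 - exp(-2s)) / (1 - exp(-4Ns)),
    which tends to the neutral value 1/(2N) as s approaches 0.
    """
    if n <= 0:
        raise ValueError("population size must be positive")
    if abs(s) < 1e-12:
        # Neutral limit: fixation by random genetic drift alone.
        return 1.0 / (2 * n)
    # expm1 keeps precision when s is small.
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n * s)


if __name__ == "__main__":
    N = 10_000  # hypothetical population size, for illustration only
    for label, s in [("neutral loss", 0.0),
                     ("advantageous loss", 0.001),
                     ("deleterious loss", -0.001)]:
        print(f"{label:18s} s={s:+.3f}  P(fix)={fixation_probability(s, N):.3e}")
```

Under these assumed values, a neutral loss is fixed with probability 1/(2N), an advantageous loss with a probability close to 2s, and even a mildly deleterious loss is very unlikely to be fixed, which is why the two routes named above are the ones worth distinguishing.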

It is rather surprising to find that ψFXRβ is still transcribed in humans tens of millions of years after its pseudogenization. However, as recent studies have shown, transcription from pseudogenes may be a widespread cellular phenomenon (Harrison et al. 2005; Zheng et al. 2005, 2007). Like the transcription of functional genes, the transcription of pseudogenes should be initiated from their promoters and possibly regulated by other sequence elements, as they are transcribed by the same nuclear machinery. However, such cis-regulatory elements for pseudogenes have not been reported. The conserved noncoding sequences with high regulatory potential that we identified upstream of human ψFXRβ are possibly such "cryptic" promoters and other functional cis elements that initiate and regulate its transcription. The conservation of short regulatory cis elements, which enables the transcription of pseudogenes long after their nonfunctionalization, may imply that the transcribed pseudogenes and their regulatory cis elements together are under negative selection. This in turn suggests that the pseudogene transcripts may play certain functional roles.

Semiprocessed Pseudogenes Provide Insights into the RNA Splicing Process
A retropseudogene is a nonfunctionalized retrosequence, which is generated through a multistep biological process: The DNA is transcribed into pre-mRNA and then processed into mRNA; the mRNA is reverse transcribed into cDNA, which becomes integrated into the genomic DNA. Most retropseudogenes were derived from (fully) processed RNA transcripts, including ones derived from alternatively spliced transcripts (Shemesh et al. 2006), but in rare cases, retropseudogenes such as the mouse ψRev-erbβ and ψLRH1 found in this study were derived from semiprocessed RNA transcripts.
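The difference between a processed and a semiprocessed retrosequence can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the toy gene `TOY_GENE`, its made-up sequences, and the helper `splice` are assumptions introduced for illustration, and reverse transcription and integration are simplified to a faithful copy of the sense strand.

```python
from typing import List, Set, Tuple

# (kind, sequence); kind is "exon" or "intron". Sequences are invented.
Segment = Tuple[str, str]

TOY_GENE: List[Segment] = [
    ("exon", "ATGGCC"),
    ("intron", "GTAAGTTTTCAG"),
    ("exon", "GATTAC"),
    ("intron", "GTGAGTCCCTAG"),
    ("exon", "TTGTAA"),
]


def splice(gene: List[Segment], removed_introns: Set[int]) -> str:
    """Return the transcript left after removing the chosen introns.

    Introns are numbered from 0 in 5'-to-3' order. Retained introns are
    written in lower case so that they stand out in the result.
    """
    parts = []
    intron_index = 0
    for kind, seq in gene:
        if kind == "intron":
            if intron_index not in removed_introns:
                parts.append(seq.lower())
            intron_index += 1
        else:
            parts.append(seq)
    return "".join(parts)


if __name__ == "__main__":
    # All introns removed: template of a (fully) processed pseudogene.
    print("processed:    ", splice(TOY_GENE, {0, 1}))
    # Only the first intron removed: template of a semiprocessed pseudogene.
    print("semiprocessed:", splice(TOY_GENE, {0}))
```

In this toy model, a retrocopy made from the second transcript keeps one intron of the parent gene in place while lacking the other, which is the structural signature described for the semiprocessed retropseudogenes.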

It is conceivable that the semiprocessed pseudogene structure found in a genome could be generated through several different biological processes (fig. 7). Pseudogenes with (remnant) "introns" can be genuine semiprocessed pseudogenes generated from partially spliced premature mRNA (fig. 7A). Such a pseudogene structure could also be created by sequence insertion (fig. 7B) or deletion (fig. 7C); however, this is unlikely because the sequence alteration would have to be highly precise. A processed retropseudogene generated from an unobserved, low-level, alternatively spliced mRNA (fig. 7D) could also appear to be a semiprocessed pseudogene at first glance when compared with the known mRNA sequence. Sequence insertion could be slightly more probable than deletion or retrotransposition of an unobserved splice variant, as intron insertion at the splice site (intron gain) has been observed before (Roy and Gilbert 2006). Nevertheless, the exceedingly low probability of the latter 3 pseudogene generation processes and the sequence characteristics observed in mouse ψRev-erbβ and ψLRH1 argue strongly, if not conclusively, that these 2 pseudogenes are rare semiprocessed retropseudogenes.


FIG. 7.— Creation of the semiprocessed pseudogene structure. (A) Retrotransposition of partially spliced premature mRNA. (B) Insertion of intron-like sequences into a processed pseudogene. (C) Deletion of intron sequences from a duplicated pseudogene. (D) Retrotransposition of unobserved low-level alternatively spliced mRNA. The wavy lines represent the genomic DNA.

 
By the nature of the generating process, retrosequences should lose their function right at their creation. However, the murine preproinsulin I gene, a functional semiprocessed retrogene, is a rare, if not the sole, exception. In our study, we found no substantial sequence similarity between the regions (up to 5 kb) upstream from the "coding regions" of ψRev-erbβ and Rev-erbβ in mouse, which suggests that, unlike the murine preproinsulin I retrogene, ψRev-erbβ did not carry any of the Rev-erbβ promoter and regulatory sequences and thus was silenced on the spot after its retrotransposition. The simultaneity of the duplication and the nonfunctionalization of ψRev-erbβ, which freed its coding sequence from selective pressure immediately after retrotransposition, accounts for the similar sequence divergence in all its regions homologous to Rev-erbβ.

After being transcribed from the DNA, the primary transcripts undergo RNA splicing, a series of processing reactions mediated by the spliceosome to remove the intronic segments. The existence of semiprocessed pseudogenes signifies that the removal of introns is not a single continuous process proceeding from start to end. Instead, it is a collection of discrete splicing events: Each intron is removed by a spliceosome assembled at its splice sites. This discreteness makes it possible for a semiprocessed pre-mRNA to be "hijacked" and reverse transcribed into cDNA. However, given the rarity of semiprocessed pseudogenes, RNA splicing, although discrete, should be a sequence of very fast and efficient removals of all introns from primary RNA transcripts.
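A small Python sketch can make the discreteness argument explicit: if each intron is removed in its own event, a transcript with n introns can exist in 2^n splicing states, of which 2^n - 2 are partially spliced intermediates. The function names `retained_intron_sets` and `classify` are hypothetical, and the sketch assumes, purely for illustration, that introns can be removed in any order.

```python
from itertools import combinations
from typing import Iterator, Tuple


def retained_intron_sets(n_introns: int) -> Iterator[Tuple[int, ...]]:
    """Yield every possible set of introns still present in a transcript.

    Introns are numbered 1..n. The empty tuple is the fully spliced mRNA;
    the full tuple is the unspliced primary transcript.
    """
    for k in range(n_introns + 1):
        yield from combinations(range(1, n_introns + 1), k)


def classify(retained: Tuple[int, ...], n_introns: int) -> str:
    """Label a transcript by how many of its introns remain."""
    if not retained:
        return "processed"
    if len(retained) == n_introns:
        return "unspliced"
    return "semiprocessed"


if __name__ == "__main__":
    n = 3  # hypothetical intron count
    for state in retained_intron_sets(n):
        print(state, classify(state, n))
```

Only the intermediate states can give rise to a semiprocessed retrosequence, so their rarity among pseudogenes is consistent with such intermediates being short-lived.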


    Conclusions
 
We surveyed the NR pseudogenes in 8 vertebrate species whose complete genome sequences are currently available and provide a detailed study of NR pseudogenes in human, chimpanzee, mouse, and rat, giving a complete catalog of their locations, sequences, and defects. In contrast to some highly expressed gene families, such as ones encoding ribosomal proteins and olfactory receptors, NR pseudogenes are scarce in all surveyed genomes, reflecting the temporally and spatially restricted expression pattern of transcription factor–coding genes.

In striking contrast to the initial expectations derived from the mechanisms of pseudogene generation and previous large-scale pseudogene analyses, all but one of the NR pseudogenes identified in this study are retropseudogenes, and no duplicated NR pseudogenes were found. Through detailed sequence analysis of ψFXRβ, a previously identified unitary pseudogene in the Old World primates, we could both date its nonfunctionalization in the anthropoid lineage and identify the mutations that most likely caused its silencing. Comparing the noncoding sequence upstream of ψFXRβ in human with the orthologous sequences in other vertebrate genomes, we found conserved sequence segments with high regulatory potential. Such short sequences may be cryptic cis-regulatory elements that enable ψFXRβ, a human pseudogene, to be transcribed. Moreover, gene structure analysis revealed that 2 mouse NR pseudogenes contain remnant introns, which suggests that, unlike processed pseudogenes, they were derived from semiprocessed RNA transcripts. The finding of such rare semiprocessed pseudogenes indicates that RNA splicing is a sequence of fast and efficient but discrete removals of introns from primary RNA transcripts.


    Supplementary Material
 
Supplementary tables 1 and 2 and figures 1 and 2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Acknowledgements
 
Z.D.Z. thanks Deyou Zheng for helpful discussion. Z.D.Z. was funded by a National Institutes of Health (NIH) grant (T15 LM07056) from the National Library of Medicine. This work was supported by grants from NIH/National Human Genome Research Institute to G.W. and M.G.


    Footnotes
 
Dan Graur, Associate Editor


    References
 

    Adams MD, Celniker SE, Holt RA, et al. (190 co-authors). The genome sequence of Drosophila melanogaster. Science (2000) 287:2185–2195.

    Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, Myers EW, Li PW, Eichler EE. Recent segmental duplications in the human genome. Science (2002) 297:1003–1007.

    Begun DJ. Origin and evolution of a new gene descended from alcohol dehydrogenase in Drosophila. Genetics (1997) 145:375–382.

    Bonnelye E, Vanacker JM, Spruyt N, Alric S, Fournier B, Desbiens X, Laudet V. Expression of the estrogen-related receptor 1 (ERR-1) orphan receptor during mouse development. Mech Dev (1997) 65:71–85.

    Cheng Z, Ventura M, She X, et al. (12 co-authors). A genome-wide comparison of recent chimpanzee and human segmental duplications. Nature (2005) 437:88–93.

    The Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature (2005) 437:69–87.

    Chou HH, Hayakawa T, Diaz S, Krings M, Indriati E, Leakey M, Paabo S, Satta Y, Takahata N, Varki A. Inactivation of CMP-N-acetylneuraminic acid hydroxylase occurred prior to brain expansion during human evolution. Proc Natl Acad Sci USA (2002) 99:11736–11741.

    Dehal P, Satou Y, Campbell RK, et al. (87 co-authors). The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science (2002) 298:2157–2167.

    Drewes T, Senkel S, Holewa B, Ryffel GU. Human hepatocyte nuclear factor 4 isoforms are encoded by distinct and differentially expressed genes. Mol Cell Biol (1996) 16:925–931.

    Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res (2004) 32:1792–1797.

    Esnault C, Maestre J, Heidmann T. Human LINE retrotransposons generate processed pseudogenes. Nat Genet (2000) 24:363–367.

    Galili U, Swanson K. Gene sequences suggest inactivation of alpha-1,3-galactosyltransferase in catarrhines after the divergence of apes from monkeys. Proc Natl Acad Sci USA (1991) 88:7401–7404.

    Gibbs RA, Weinstock GM, Metzker ML, et al. (229 co-authors). Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature (2004) 428:493–521.

    Giguere V, Yang N, Segui P, Evans RM. Identification of a new class of steroid hormone receptors. Nature (1988) 331:91–94.

    Glusman G, Yanai I, Rubin I, Lancet D. The complete human olfactory subgenome. Genome Res (2001) 11:685–702.

    Goodwin B, Jones SA, Price RR, et al. (13 co-authors). A regulatory cascade of the nuclear receptors FXR, SHP-1, and LRH-1 represses bile acid biosynthesis. Mol Cell (2000) 6:517–526.

    Harrison PM, Zheng D, Zhang Z, Carriero N, Gerstein M. Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability. Nucleic Acids Res (2005) 33:2374–2383.

    Hedges SB. The origin and evolution of model organisms. Nat Rev Genet (2002) 3:838–849.

    International Chicken Genome Sequencing Consortium. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature (2004) 432:695–716.

    International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature (2004) 431:931–945.

    Jamain S, Girondot M, Leroy P, Clergue M, Quach H, Fellous M, Bourgeron T. Transduction of the human gene FAM8A1 by endogenous retrovirus during primate evolution. Genomics (2001) 78:38–45.

    Johnson ME, Viggiano L, Bailey JA, Abdul-Rauf M, Goodwin G, Rocchi M, Eichler EE. Positive selection of a gene family during the emergence of humans and African apes. Nature (2001) 413:514–519.

    King DC, Taylor J, Elnitski L, Chiaromonte F, Miller W, Hardison RC. Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences. Genome Res (2005) 15:1051–1060.

    Koshizaka T, Nishikimi M, Ozawa T, Yagi K. Isolation and sequence analysis of a complementary DNA encoding rat liver L-gulono-gamma-lactone oxidase, a key enzyme for L-ascorbic acid biosynthesis. J Biol Chem (1988) 263:1619–1621.

    Laudet V. Evolution of the nuclear receptor superfamily: early diversification from an ancestral orphan receptor. J Mol Endocrinol (1997) 19:207–226.

    Li WH. Molecular evolution (1997) Sunderland (MA): Sinauer Associates.

    Li WH, Gojobori T, Nei M. Pseudogenes as a paradigm of neutral evolution. Nature (1981) 292:237–239.

    Lindblad-Toh K, Wade CM, Mikkelsen TS, et al. (236 co-authors). Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature (2005) 438:803–819.

    Lower R, Lower J, Kurth R. The viruses in all of us: characteristics and biological significance of human endogenous retrovirus sequences. Proc Natl Acad Sci USA (1996) 93:5177–5184.

    Lu TT, Makishima M, Repa JJ, Schoonjans K, Kerr TA, Auwerx J, Mangelsdorf DJ. Molecular basis for feedback regulation of bile acid synthesis by nuclear receptors. Mol Cell (2000) 6:507–515.

    Maglich JM, Sluder A, Guan X, Shi Y, McKee DD, Carrick K, Kamdar K, Willson TM, Moore JT. Comparison of complete nuclear receptor sets from the human, Caenorhabditis elegans and Drosophila genomes. Genome Biol (2001) 2:RESEARCH0029.

    Maston GA, Ruvolo M. Chorionic gonadotropin has a recent origin within primates and an evolutionary history of selection. Mol Biol Evol (2002) 19:320–335.

    Ohshima K, Hattori M, Yada T, Gojobori T, Sakaki Y, Okada N. Whole-genome screening indicates a possible burst of formation of processed pseudogenes and Alu repeats by particular L1 subfamilies in ancestral primates. Genome Biol (2003) 4:R74.

    Otte K, Kranz H, Kober I, et al. (18 co-authors). Identification of farnesoid X receptor beta as a novel mammalian nuclear receptor sensing lanosterol. Mol Cell Biol (2003) 23:864–872.

    Pare JF, Malenfant D, Courtemanche C, Jacob-Wagner M, Roy S, Allard D, Belanger L. The fetoprotein transcription factor (FTF) gene is essential to embryogenesis and cholesterol homeostasis and is regulated by a DR4 element. J Biol Chem (2004) 279:21206–21216.

    Robinson-Rechavi M, Carpentier AS, Duffraisse M, Laudet V. How many nuclear hormone receptors are there in the human genome? Trends Genet (2001) 17:554–556.

    Roy SW, Gilbert W. The evolution of spliceosomal introns: patterns, puzzles and progress. Nat Rev Genet (2006) 7:211–221.

    Shemesh R, Novik A, Edelheit S, Sorek R. Genomic fossils as a snapshot of the human transcriptome. Proc Natl Acad Sci USA (2006) 103:1364–1369.

    Sladek R, Beatty B, Squire J, Copeland NG, Gilbert DJ, Jenkins NA, Giguere V. Chromosomal mapping of the human and murine orphan receptors ERRalpha (ESRRA) and ERRbeta (ESRRB) and identification of a novel human ERRalpha-related pseudogene. Genomics (1997) 45:320–326.

    Sluder AE, Mathews SW, Hough D, Yin VP, Maina CV. The nuclear receptor superfamily has undergone extensive proliferation and diversification in nematodes. Genome Res (1999) 9:103–120.

    Stedman HH, Kozyak BW, Nelson A, Thesier DM, Su LT, Low DW, Bridges CR, Shrager JB, Minugh-Purvis N, Mitchell MA. Myosin gene mutation correlates with anatomical changes in the human lineage. Nature (2004) 428:415–418.

    Taraviras S, Mantamadiotis T, Dong-Si T, Mincheva A, Lichter P, Drewes T, Ryffel GU, Monaghan AP, Schutz G. Primary structure, chromosomal mapping, expression and transcriptional activity of murine hepatocyte nuclear factor 4gamma. Biochim Biophys Acta (2000) 1490:21–32.

    Tchenio T, Segal-Bendirdjian E, Heidmann T. Generation of processed pseudogenes in murine cells. EMBO J (1993) 12:1487–1497.

    Torrents D, Suyama M, Zdobnov E, Bork P. A genome-wide survey of human pseudogenes. Genome Res (2003) 13:2559–2567.

    Wang W, Brunet FG, Nevo E, Long M. Origin of sphinx, a young chimeric RNA gene in Drosophila melanogaster. Proc Natl Acad Sci USA (2002) 99:4448–4453.

    Waterston RH, Lindblad-Toh K, Birney E, et al. (222 co-authors). Initial sequencing and comparative analysis of the mouse genome. Nature (2002) 420:520–562.

    Zhang Z, Burch PE, Cooney AJ, Lanz RB, Pereira FA, Wu J, Gibbs RA, Weinstock G, Wheeler DA. Genomic analysis of the nuclear receptor family: new insights into structure, regulation, and evolution from the rat genome. Genome Res (2004) 14:580–590.

    Zhang Z, Gerstein M. The human genome has 49 cytochrome c pseudogenes, including a relic of a primordial gene that still functions in mouse. Gene (2003) 312:61–72.

    Zhang Z, Harrison P, Gerstein M. Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome. Genome Res (2002) 12:1466–1482.

    Zhang Z, Harrison PM, Liu Y, Gerstein M. Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome. Genome Res (2003) 13:2541–2558.

    Zheng D, Frankish A, Baertsch R, et al. (16 co-authors). Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription and evolution. Genome Res (2007) 17:839–851.

    Zheng D, Zhang Z, Harrison PM, Karro J, Carriero N, Gerstein M. Integrated pseudogene annotation for human chromosome 22: evidence for transcription. J Mol Biol (2005) 349:27–45.

Accepted for publication October 25, 2007.

