MBE Advance Access originally published online on July 13, 2005
Molecular Biology and Evolution 2005 22(11):2135-2138; doi:10.1093/molbev/msi209
Letter
The Silencing of Pseudogenes
Evolutionary Genomics Group, Division of Microbiology, Miguel Hernandez University, Alicante, Spain
E-mail: alex.mira{at}umh.es.
| Abstract |
|---|
Pseudogenes are nonfunctional DNA sequences that can accumulate in the genomes of some bacterial species, especially those undergoing processes like niche change, host specialization, or weak selection strength. They may last for long evolutionary periods, raising the question of how the genome prevents expression of these degenerated or disrupted genes, which would presumably give rise to malfunctioning proteins. We have investigated ribosomal binding strength at Shine-Dalgarno sequences and the prevalence of σ70 promoter regions in pseudogenes across bacteria. We report that the RNA polymerase-binding sites and, more strongly, the ribosome-binding regions of pseudogenes are highly degraded, suggesting that transcription and translation are impaired in nonfunctional open reading frames. This would reduce the metabolic investment in faulty proteins because, although pseudogenes can persist for long time periods, they would be effectively silenced. It is unclear whether mutation accumulation in regulatory regions is neutral or whether it is accelerated by selection.
Key Words: pseudogene; Shine-Dalgarno; mutation accumulation; promoter; spacer; gene expression
When a bacterial gene undergoes a period of frequent mutations, stop codons can be introduced in the sequence. In addition, insertions or deletions can cause a shift in the reading frame or the removal of a vital section of the gene, giving rise to a pseudogene (J. O. Andersson and S. G. Andersson 1999). The finding of these functionless open reading frames (ORFs) was relatively rare and frequently linked to species undergoing episodes of genetic drift or low selection strength. Recently, the exponential increase of genomic data has shown that pseudogenes are more common than previously thought and that they can represent a significant fraction of the genome. The case of the obligate intracellular species Mycobacterium leprae is dramatic, containing over 1,100 recognizable pseudogenes (Cole et al. 2001). Other examples of bacteria with hundreds of pseudogenes are given by species that have reduced their host ranges, like Shigella flexneri or Salmonella typhi (Parkhill et al. 2001; Wei et al. 2003). It has been shown that pseudogenes can also be present in high numbers in free-living species such as Escherichia coli, where numerous pseudogenes had previously gone undetected (Lerat and Ochman 2004).
Not only are pseudogenes pervasive in bacterial genomes but they can also last for long periods (Mira, Ochman, and Moran 2001; van Ham et al. 2003). For example, the average half-life of pseudogenes in Buchnera has been estimated at 23.9 Myr (Gomez-Valero, Latorre, and Silva 2004), and a few strains of this intracellular symbiont share common deletions in a limited number of pseudogenes, suggesting that some of these nonfunctional ORFs may have lasted in the genome since their divergence, over 50 MYA (Mira, Ochman, and Moran 2001). Shigella shared an ancestor with the K12 strain of E. coli 35,000–270,000 years ago (Pupo, Lan, and Reeves 2000), and 14 common pseudogenes still remain in both species (Lerat and Ochman 2004). In M. leprae the average length of pseudogenes is still 82.3% that of functional genes. Thus, many of these truncated ORFs might in principle be transcribed and/or translated. If this were the case, a great deal of metabolic resources would be spent expressing degenerated ORFs that no longer code for proper functions. It is also likely that these malfunctioning proteins could interfere with other pathways, for example, by losing specificity and competing for other substrates. It is therefore important that mechanisms are deployed to prevent this potential danger and unnecessary waste of resources.
One of these mechanisms would be the elimination of intergenic spacers containing regulatory regions or the accumulation of mutations in ribosome-binding sites and transcription promoters. We studied the prevalence of these sequences in different bacteria with numerous pseudogenes. The Shine-Dalgarno (SD) region is a sequence complementary to a highly conserved region at the 3' end of the 16S rRNA gene (Shine and Dalgarno 1974). It is involved in the binding of the mRNA to the ribosome, and changes in the SD sequence severely restrict translation (Dunn, Buzash-Pollert, and Studier 1978). We have estimated the binding strength (free-energy values based on nucleotide pairing rules, see Methods) of the region preceding annotated pseudogenes with the equivalent 3' section of the 16S rRNA from each bacterium. When compared to the functional genes in the same species, pseudogenes appear to have largely lost the SD region (fig. 1). Functional homologs of the pseudogenes in related species had conserved SD regions, indicating that the degradation of these ribosome-binding sites occurred after the genes turned into pseudogenes. In M. leprae, where pseudogenes are only slightly shorter than functional genes (average length is 804 vs. 977 bp for functional genes), the ribosomal binding strength is dramatically diminished. The same pattern was found in virtually all fully sequenced bacteria examined (some examples are shown in fig. 1a–e, and statistics are included in supplementary table 1, Supplementary Material online). A few exceptions appeared in the species Yersinia pestis, S. typhi, and S. flexneri (the latter is shown in fig. 1f). It is interesting to note that these three species have only recently specialized in human hosts, after which they have probably undergone the pseudogenization expansion (Parkhill et al. 2001). This process is thought to have happened with the human population expansion that occurred in Neolithic times, no more than 10,000 years ago. The case of the causative agent of plague may be particularly recent, as Y. pestis could have emerged only 1,500–20,000 years ago (Achtman et al. 1999). Thus, the few cases in which the SD sequences of pseudogenes displayed close to normal binding strengths occurred in species that have undergone pseudogenization in recent evolutionary time.
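As a rough illustration of the comparison described above, the following Python sketch scores an upstream region by the longest ungapped stretch that could base-pair with the 3' tail of the 16S rRNA. This is a deliberately simplified stand-in: the study itself used free-energy values based on nucleotide pairing rules (see Methods), which this sketch does not compute. The tail sequence `ACCTCCTTA` (the E. coli 16S rRNA 3' end written as DNA), the function names, and the example sequences are assumptions made for illustration.

```python
def revcomp(dna):
    """Return the reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(dna.upper()))

# Assumption: 3' tail of the E. coli 16S rRNA, written as DNA (5'->3').
RRNA_TAIL = "ACCTCCTTA"
# The sense-strand sequence that would pair perfectly with that tail.
IDEAL_SITE = revcomp(RRNA_TAIL)  # "TAAGGAGGT"

def longest_pairing_run(upstream):
    """Length of the longest ungapped stretch of `upstream` that is
    perfectly complementary to part of the rRNA tail.

    A crude proxy for binding strength, not a free-energy estimate.
    """
    up = upstream.upper()
    best = 0
    for i in range(len(up)):
        for j in range(len(IDEAL_SITE)):
            k = 0
            while (i + k < len(up) and j + k < len(IDEAL_SITE)
                   and up[i + k] == IDEAL_SITE[j + k]):
                k += 1
            best = max(best, k)
    return best

# Hypothetical upstream regions: an intact site and a mutated one.
intact = "TTTAAGGAGGTTTT"
degraded = "TTTAACGATGTTTT"
print(longest_pairing_run(intact), longest_pairing_run(degraded))
```

Under this proxy, a pseudogene whose SD region has accumulated mutations would show a shorter pairing run than its functional homolog; a real analysis would replace the run length with a pairing free energy.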
The length patterns of intergenic spacers correlate with their flanking genes' potential for transcription initiation and termination (Rogozin et al. 2002).
To calculate a gene's transcription potential in silico is, however, complicated (Stormo 2000). [...] σ70 promoter candidates found on each pseudogene in E. coli CFT073 (the sequenced strain with the highest number of pseudogenes) and its corresponding functional equivalent in the strains K12 and O157 (42 and 38 putative functional homologs were found, respectively) were estimated. This data set is biased toward recently formed pseudogenes (i.e., recognizable by sequence similarity). As shown in figure 3, some promoters have similar scores in either strain, but most pseudogenes in the CFT073 strain show divergent promoters when compared with the homologous functional regions in the other two E. coli strains (P < 0.01, t-test). The analysis was repeated in the pseudogenes of Salmonella spp., but the difference was not significant (t = 1.94, P = 0.067). Thus, promoter degeneration may not be pervasive across bacteria, although the latter result could be due to different promoter sequences or too short a time since pseudogenization. When promoter-finding algorithms are improved, a better evaluation of the degeneration of regulatory regions will be possible on a wider range of bacteria. In addition, experimental studies of gene expression in pseudogenes will also be helpful.
The transcriptional and translational silencing of pseudogenes would add to the tagging system mediated by transfer and messenger RNA. This dual-nature RNA tags proteins translated from "broken mRNAs," i.e., mRNA sequences without stop codons, which are then degraded by proteases (Withey and Friedman 2003). [...]
The latter possibility can be tested by contrasting deletion rates between pseudogenes and their spacers. Thus, we compared the length of pseudogenes and their spacers in relation to the length of their functional homologs in related species (see supplementary fig. 2, Supplementary Material online). For some species, like B. mallei, Bartonella quintana, or Staphylococcus aureus, deletions seem to have happened preferentially on the spacers. However, in other bacteria such as B. pertussis, there are multiple cases where pseudogenes are significantly shorter (relative to their functional homologs) than their corresponding spacers, and other species show no trend. The results are also difficult to interpret because remnants of IS element insertions and eroded genes extend the length of some spacers and pseudogenes, making them longer than their functional counterparts. Furthermore, even if all cases in which ratios are larger than one (implying insertions of IS elements or misannotation of pseudogenes as part of intergenic spacers) are removed, the data do not show a uniform pattern across species. Thus, it is unclear whether selection is accelerating the degeneration of the regulatory regions of pseudogenes.
| Methods |
|---|
We used the software developed by Osada, Saito, and Tomita (1999)
We studied promoter preservation solely for E. coli strains, where the consensus sequence for transcription promoters is best known. Annotated pseudogenes in E. coli CFT073 (n = 94) were searched with BlastN (Altschul et al. 1997) against the strains K12 and O157:H7 in order to find functional homologs for comparison (E value < 10^-5). The DNA sequences of the corresponding spacers preceding these ORFs were extracted, and the "DNA Master" (http://cobamide2.bio.pitt.edu/computer.htm), developed by J. G. Lawrence, was used to calculate a score indicative of σ70 promoter strength. The score is obtained by calculating the geometric mean of DNA similarity at the -35 and -10 regions to the E. coli consensus sequence TTGACA-TATAAT, separated by a 15- to 19-bp segment. Deviations from the optimal 17-bp segment are weighted 10% of the score given by the two binding regions. Promoter scores were very similar for functional homologs of different E. coli strains (see additional fig. 3 in supplementary file, Supplementary Material online), indicating that the promoter consensus sequence is similar across strains and that the same genes have similar scores.
Spacer length of both pseudogenes and their corresponding functional homologs was measured as the number of nucleotides between their annotated start codons and the end of the preceding genes.
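The spacer measurement, and the length ratios used to compare pseudogenes and spacers with their functional homologs, amount to simple coordinate arithmetic. The Python sketch below assumes 1-based, inclusive annotation coordinates on the forward strand and treats overlapping genes as having a spacer of zero; the function names and these conventions are illustrative assumptions, not part of the original analysis.

```python
def spacer_length(prev_gene_end, start_codon_pos):
    """Nucleotides strictly between the end of the preceding gene and
    the first base of the annotated start codon.

    Assumes 1-based inclusive coordinates on the forward strand;
    overlapping genes are reported as a spacer of length 0.
    """
    return max(0, start_codon_pos - prev_gene_end - 1)

def relative_length(observed_len, functional_len):
    """Length of a pseudogene (or its spacer) relative to the
    corresponding functional homolog (or its spacer)."""
    return observed_len / functional_len

# Average M. leprae lengths quoted in the text: 804 bp for pseudogenes
# versus 977 bp for functional genes.
print(round(relative_length(804, 977), 3))  # 0.823
```

A ratio above one corresponds to the cases discussed in the text where insertions or misannotation make a pseudogene or spacer longer than its functional counterpart.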
Supplementary Material
Supplementary table 1 and supplementary figures 2 and 3 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
We thank Y. Osada and J. G. Lawrence for kindly providing the requested software. A.M. is recipient of a Ramón y Cajal research contract from MCyT. Support from European Union project GEMINI (QLK3-CT-2002-02056) is also acknowledged. We are grateful to P. L. Valdés for help with statistical analysis and three anonymous referees for constructive comments that improved the manuscript.
| Footnotes |
|---|
Jennifer Wernegreen, Associate Editor
| References |
|---|
Achtman, M., K. Zurth, G. Morelli, G. Torrea, A. Guiyoule, and E. Carniel. 1999. Yersinia pestis, the cause of plague, is a recently emerged clone of Yersinia pseudotuberculosis. Proc. Natl. Acad. Sci. USA 96:14043–14048.
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.
Andersson, J. O., and S. G. Andersson. 1999. Insights into the evolutionary process of genome degradation. Curr. Opin. Genet. Dev. 9:664–671.
Babu, M. M. 2003. Did the loss of sigma factors initiate pseudogene accumulation in M. leprae? Trends Microbiol. 11:59–61.
Cole, S. T., K. Eiglmeier, J. Parkhill et al. (41 co-authors). 2001. Massive gene decay in the leprosy bacillus. Nature 409:1007–1011.
Dunn, J. J., E. Buzash-Pollert, and F. W. Studier. 1978. Mutations of bacteriophage T7 that affect initiation of synthesis of the gene 0.3 protein. Proc. Natl. Acad. Sci. USA 75:2741–2745.
Gomez-Valero, L., A. Latorre, and F. J. Silva. 2004. The evolutionary fate of nonfunctional DNA in the bacterial endosymbiont Buchnera aphidicola. Mol. Biol. Evol. 21:2172–2181.
Knudsen, B., J. Wower, C. Zwieb, and J. Gorodkin. 2001. tmRDB (tmRNA database). Nucleic Acids Res. 29:171–172.
Lerat, E., and H. Ochman. 2004. Psi-Phi: exploring the outer limits of bacterial pseudogenes. Genome Res. 14:2273–2278.
Maaloe, O. 1979. Regulation of the protein-synthesizing machinery in ribosomes, tRNA, factors and so on. Pp. 487–542 in R. F. Goldberger, ed. Biological regulation and development, Vol. 1. Plenum Press, New York.
Mira, A., H. Ochman, and N. A. Moran. 2001. Deletional bias and the evolution of bacterial genomes. Trends Genet. 17:589–596.
Osada, Y., R. Saito, and M. Tomita. 1999. Analysis of base-pairing potentials between 16S rRNA and 5' UTR for translation initiation in various prokaryotes. Bioinformatics 15:578–581.
Parkhill, J., G. Dougan, K. D. James et al. (38 co-authors). 2001. Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature 413:848–852.
Pupo, G. M., R. Lan, and P. R. Reeves. 2000. Multiple independent origins of Shigella clones of Escherichia coli and convergent evolution of many of their characteristics. Proc. Natl. Acad. Sci. USA 97:10567–10572.
Rogozin, I. B., K. S. Makarova, D. A. Natale, A. N. Spiridonov, R. L. Tatusov, Y. I. Wolf, J. Yin, and E. V. Koonin. 2002. Congruent evolution of different classes of non-coding DNA in prokaryotic genomes. Nucleic Acids Res. 30:4264–4271.
Shine, J., and L. Dalgarno. 1974. The 3'-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. USA 71:1342–1346.
Stormo, G. D. 2000. DNA binding sites: representation and discovery. Bioinformatics 16:16–23.
Turner, D. H., N. Sugimoto, J. A. Jaeger, C. E. Longfellow, S. M. Freier, and R. Kierzek. 1987. Improved parameters for prediction of RNA structure. Cold Spring Harb. Symp. Quant. Biol. 52:123–133.
van Ham, R. C., J. Kamerbeek, C. Palacios et al. (16 co-authors). 2003. Reductive genome evolution in Buchnera aphidicola. Proc. Natl. Acad. Sci. USA 100:581–586.
Wei, J., M. B. Goldberg, V. Burland et al. (14 co-authors). 2003. Complete genome sequence and comparative genomics of Shigella flexneri serotype 2a strain 2457T. Infect. Immun. 71:2775–2786.
Withey, J. H., and D. I. Friedman. 2003. A salvage pathway for protein structures: tmRNA and trans-translation. Annu. Rev. Microbiol. 57:101–123.
Wosten, M. M. 1998. Eubacterial sigma-factors. FEMS Microbiol. Rev. 22:127–150.