MBE Advance Access originally published online on December 20, 2005
Molecular Biology and Evolution 2006 23(3):479-481; doi:10.1093/molbev/msj076
© The Author 2005. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Letter

The Contribution of LTR Retrotransposon Sequences to Gene Evolution in Mus musculus

Jeremy D. DeBarry, Eric W. Ganko1, Eugene M. McCarthy and John F. McDonald2

Department of Genetics, University of Georgia

E-mail: john.mcdonald{at}biology.gatech.edu.


    Abstract
Approximately 1.5% of mouse genes (Mus musculus) contain long terminal repeat retrotransposon sequences (LRS). Consistent with earlier findings in Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens, LRS are more likely to be associated with newly evolved genes. Evidence is presented that LRS are often recruited as novel exons or as spliced additions to existing exons. These novel gene configurations may be expressed initially as alternative transcripts providing an opportunity for the evolution of new gene function.

Key Words: LTR retrotransposon • gene evolution • Mus musculus • mouse


    Background
Once considered parasitic sequences of little or no functional significance (Doolittle and Sapienza 1980; Orgel and Crick 1980; Charlesworth, Sniegowski, and Stephan 1994), transposable elements (TE) are now widely recognized as significant contributors to gene evolution (McDonald 1993; Brosius 1999; Kidwell and Lisch 2001; Bowen et al. 2003; Makalowski 2003; Kazazian 2004). It has recently been reported that retrotransposon sequences contribute to ~4% of protein-coding regions (Nekrutenko and Li 2001), ~27% of untranslated regions (van de Lagemaat et al. 2003), and ~25% of promoter regions (Jordan et al. 2003) of human genes. We recently reported that long terminal repeat retrotransposon sequences (LRS) (defined as full-length elements or fragments of full-length elements) are present within the regulatory region and/or the transcription boundaries of 0.6% of Caenorhabditis elegans genes (Ganko et al. 2003) and 1.8% of Drosophila genes (Ganko et al. 2006). Here we report on the contribution of LRS to transcribed regions of genes in the sequenced genome of the mouse (Mus musculus).


    LRS Are Components of Many Mouse Genes
Recently, 21 families of long terminal repeat (LTR) retrotransposons have been identified in the mouse genome, including 13 not previously described (McCarthy and McDonald 2004). Genes with at least one fully sequenced mRNA from the mouse Unigene database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unigene) were Blasted against consensus sequences representing each of these 21 families of mouse LTR retrotransposons. Of the 18,374 Unigenes examined, 267 (~1.5%) were found to contain LRS. It has been reported that ~0.9% of genes in humans contain LRS (Nekrutenko and Li 2001). The higher number of mouse genes containing LRS may, in part, be due to the greater number of transpositionally active LTR retrotransposons in the mouse genome (Smit 1999; Waterston et al. 2002; Deininger et al. 2003).

As a rule, genes involved in basic cellular functions are relatively conserved across taxa, while more recently evolved, specialized genes are taxa specific (e.g., van de Lagemaat et al. 2003; Castillo-Davis et al. 2004). To determine if mouse LRS are more likely to be associated with newly evolved genes, we queried the ortholog information provided by the Homologene data set (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=homologene). Homologenes are genes associated with functions that are generally conserved among even phylogenetically diverse groups of species (Wheeler et al. 2003). Of the 18,374 Unigenes examined, 11,341 had a homolog assigned by the Homologene data set, and 88 (~0.8%) of these contained an LRS. Thus, LRS are associated with homologenes about half as frequently as they are with all mouse genes.
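The percentages quoted for Unigenes and homologenes can be rechecked from the reported counts. The following Python sketch does this arithmetic. The figures for genes without a Homologene assignment are derived here, not reported by the authors, and they assume that the 88 LRS-containing homologenes are a subset of the 267 LRS-containing Unigenes.

```python
# Counts reported in the text.
UNIGENES_TOTAL = 18_374
UNIGENES_WITH_LRS = 267
HOMOLOGENES_TOTAL = 11_341
HOMOLOGENES_WITH_LRS = 88


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


all_genes_pct = percent(UNIGENES_WITH_LRS, UNIGENES_TOTAL)
homologene_pct = percent(HOMOLOGENES_WITH_LRS, HOMOLOGENES_TOTAL)

# Derived, not reported: Unigenes without a Homologene assignment.
# Assumes the 88 LRS-containing homologenes are among the 267 Unigenes.
other_total = UNIGENES_TOTAL - HOMOLOGENES_TOTAL
other_with_lrs = UNIGENES_WITH_LRS - HOMOLOGENES_WITH_LRS
other_pct = percent(other_with_lrs, other_total)

print(f"all Unigenes:        {all_genes_pct:.2f}%")
print(f"homologenes:         {homologene_pct:.2f}%")
print(f"without a homolog:   {other_pct:.2f}%")
print(f"homologene/all ratio: {homologene_pct / all_genes_pct:.2f}")
```

Under the subset assumption, the ratio printed on the last line is what "about half as frequently" refers to.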


    Many LRS Are Located in Mouse Exons
Our results indicate that LRS are located within the coding regions of many mouse genes. Such associations are believed to have arisen directly by insertion of an LRS into an existing exon or indirectly by exon recruitment of an LRS from an adjacent intron or untranslated leader region (ULR) (Nekrutenko and Li 2001). To estimate the number of LRS located within the coding regions of mouse genes, a database of mouse exonic sequences was obtained from Ensembl (http://www.ensembl.org/Multi/martview). If the predicted transcripts or proteins display significant similarity to species-specific Swiss-Prot, RefSeq, or TrEMBL entries, Ensembl classifies them as "known" genes; otherwise, they are classified as "novel" genes. At the time of this analysis (September 2004), the Ensembl database was composed of 25,307 mouse genes. Of these, 20,166 were subclassified as known and 5,141 as novel genes. Many of the novel genes in the Ensembl database are unannotated. Thus, as a precaution against overestimating the contribution of LRS to mouse gene evolution, we limited analysis to well-annotated genes (Ensembl known genes).

The 20,166 annotated (known) genes in the Ensembl data set contain 186,823 exons. A total of 239 of these genes are associated with LRS located in 263 exons. Ten of these exons have two independently inserted LRS, and one has three (i.e., 275 associations). We found that 22 of these 275 associations (22/275 or 8.0%) involved exons composed of ≥95% LRS. Exons composed almost exclusively of LRS are considered to have been recruited as novel exons from LRS located in adjacent introns or ULRs (e.g., Nekrutenko and Li 2001). Of the remaining LRS associated with exons, 55 (55/275 or 20.0%) were located at the 5' or 3' exon boundaries. LRS located at exon boundaries are considered to have been added to preexisting exons due to the presence of appropriate splice acceptor or donor sites (e.g., Nekrutenko and Li 2001). The remaining 198 (72.0%) LRS associated with exons, including those in the 11 exons containing more than one LRS, are located within the body of the exon. The origins of these LRS are more difficult to reconstruct, but some are likely to represent insertions into preexisting exons.
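The three categories of LRS-exon association described above can be sketched in Python as follows. This is an illustration only, not the authors' pipeline: `classify_association` is a hypothetical helper, the ≥95% threshold comes from the text, and the rule used for "located at an exon boundary" (the overlap reaches the first or last base of the exon) is an assumption. The tally at the end simply rechecks the reported counts.

```python
NOVEL_EXON_MIN_FRACTION = 0.95  # ">=95% LRS" threshold quoted in the text


def classify_association(exon_start, exon_end, lrs_start, lrs_end):
    """Classify one LRS-exon association (inclusive coordinates).

    Hypothetical helper. The boundary rule below is an assumption:
    the overlap must reach the first or last base of the exon.
    """
    overlap_start = max(exon_start, lrs_start)
    overlap_end = min(exon_end, lrs_end)
    if overlap_start > overlap_end:
        raise ValueError("LRS does not overlap the exon")
    exon_length = exon_end - exon_start + 1
    fraction = (overlap_end - overlap_start + 1) / exon_length
    if fraction >= NOVEL_EXON_MIN_FRACTION:
        return "novel_exon"      # exon made up almost entirely of LRS
    if overlap_start == exon_start or overlap_end == exon_end:
        return "boundary"        # LRS at the 5' or 3' edge of the exon
    return "internal"            # LRS within the body of the exon


# Recheck of the tally reported in the text.
exons_with_lrs = 263
# Ten exons carry one extra LRS each; one exon carries two extra.
associations = exons_with_lrs + 10 * (2 - 1) + 1 * (3 - 1)
reported = {"novel_exon": 22, "boundary": 55}
reported["internal"] = associations - sum(reported.values())

for category, count in reported.items():
    print(f"{category:<11} {count:>4} ({100 * count / associations:.1f}%)")
```

For example, with an exon spanning positions 100 to 199, an LRS covering 100 to 199 is classified as `novel_exon`, one covering 100 to 139 as `boundary`, and one covering 120 to 159 as `internal`.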

The addition of LRS to preexisting exons may be expected to disrupt gene function and thus be eliminated by natural selection. One hypothesis to explain the maintenance of the relatively high number of LRS associated with exons is that they are tolerated by natural selection at loci encoding multiple alternative transcripts. Under such a scenario, an inserted sequence would, due to the presence of appropriate splice acceptor/donor sites, be associated with the generation of one or more novel alternative transcripts while the native transcript maintains the original gene function. Over evolutionary time, a novel transcript containing the LRS may evolve to encode a function favored by natural selection and thus be selectively maintained in conjunction with, or in place of, the original transcript. Under this hypothesis, alternative transcripts generated by TE insertions may provide an opportunity for the evolution of new gene functions in a manner similar to what has been proposed for gene duplications (Ohno 1970). Consistent with this view, we found that those mouse genes confirmed to encode alternative transcripts rarely contained LRS in all transcripts (table 1).


Table 1 LRS-Associated Mouse Genes Encoding Multiple Transcripts Rarely Contain LRS in All Alternative Transcripts

 

    LRS Are Preferentially Associated with Genes Encoding Metabolic Functions
Functional information from the Gene Ontology (GO) Consortium (http://www.geneontology.org/) was used to investigate possible functional trends among genes associated with LRS. GO networks are composed of three main functional classifications: molecular function, cellular component, and biological process. A number of subclasses are listed under each of these classifications. Based on the observed frequency of all experimentally verified genes in the Unigene database that group under each of the GO classifications, we computed the number of LRS-associated genes expected to group under each GO classification. This expected number was compared with the observed number to identify significant differences within the subclasses of each main classification. Only subclasses within the "biological process" classification demonstrated a significant difference between observed and expected numbers of associations (chi square = 30.05, df = 6, P ≤ 0.025).
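The reported chi-square value can be checked against the stated significance bound. The source does not say how the P value was obtained, so the Python sketch below is only one way to do it: it uses the closed-form upper-tail probability of the chi-square distribution, which is valid for even degrees of freedom, and the function names are hypothetical. A small helper for the goodness-of-fit statistic is included for completeness; the per-subclass counts are in table 2 and are not reproduced here.

```python
import math


def chi2_statistic(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for chi-square with even df.

    Closed form: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    terms = (half**i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)


# Values reported in the text: chi square = 30.05, df = 6.
p_value = chi2_sf_even_df(30.05, 6)
print(f"P = {p_value:.2e}")
print("consistent with P <= 0.025:", p_value <= 0.025)
```

The computed tail probability falls well below 0.025, so the reported bound is conservative for this statistic.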

The results presented in table 2 show that mouse genes grouped under the biological process classification that encode physiological functions are associated with LRS more frequently than expected (251 observed, 0.68 success probability, 307 trials, P ≤ 0.025), while genes encoding cellular processes are associated with LRS less frequently than expected (39 observed, 0.24 success probability, 307 trials, P ≤ 0.025). Significant deviations from expected numbers were also observed in two additional subclasses of physiological function: LRS-associated genes encoding cell growth and maintenance functions are less frequent than expected, while LRS-associated genes encoding metabolic functions are more frequent than expected.
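The observed counts, success probabilities, and trial numbers quoted above are the ingredients of a binomial test. The text does not state which form of the test was used, so the Python sketch below assumes a one-sided exact binomial tail in the direction of each reported deviation. The function names are hypothetical.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def lower_tail(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))


N_TRIALS = 307  # trials reported in the text

# Physiological functions: 251 observed, success probability 0.68.
physio_expected = N_TRIALS * 0.68
physio_p = upper_tail(251, N_TRIALS, 0.68)

# Cellular processes: 39 observed, success probability 0.24.
cell_expected = N_TRIALS * 0.24
cell_p = lower_tail(39, N_TRIALS, 0.24)

print(f"physiological: expected {physio_expected:.1f}, P(X >= 251) = {physio_p:.1e}")
print(f"cellular:      expected {cell_expected:.1f}, P(X <= 39)  = {cell_p:.1e}")
```

Under this assumption, both tail probabilities fall below the 0.025 bound, in the directions reported (more physiological-function genes and fewer cellular-process genes than expected).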


Table 2 Biological Process GO Terms for Mouse Genes Associated with LRS

 

    Summary and Conclusions
Consistent with earlier studies of Homo sapiens (Nekrutenko and Li 2001), Drosophila (Ganko et al. 2006), and C. elegans (Ganko et al. 2003) genomes, we found that LRS are contained within the coding regions of a significant proportion of mouse genes. Also consistent with earlier findings (Waterston et al. 2002; van de Lagemaat et al. 2003), our results indicate that LRS may be preferentially associated with more recently evolved genes. The mechanisms by which LRS have been incorporated into genes over evolutionary time are likely to be varied and complex. However, our results suggest that the recruitment of LRS as novel exons or as spliced additions to existing exons is likely a primary mechanism. We propose that these novel gene configurations may be expressed initially as alternative transcripts, providing an opportunity for the evolution of new gene functions.


    Footnotes
 
1 Present address: Department of Biology, University of North Carolina.

2 Present address: School of Biology, Georgia Institute of Technology.

Pierre Capy, Associate Editor


    References

    Bowen, N. J., I. K. Jordan, J. A. Epstein, V. Wood, and H. L. Levin. 2003. Retrotransposons and their recognition of pol II promoters: a comprehensive survey of the transposable elements from the complete genome sequence of Schizosaccharomyces pombe. Genome Res. 13:1984–1997.

    Brosius, J. 1999. Genomes were forged by massive bombardments with retroelements and retrosequences. Genetica 107:209–238.

    Castillo-Davis, C. I., F. A. Kondrashov, D. L. Hartl, and R. J. Kulathinal. 2004. The functional genomic distribution of protein divergence in two animal phyla: coevolution, genomic conflict, and constraint. Genome Res. 14:802–811.

    Charlesworth, B., P. Sniegowski, and W. Stephan. 1994. The evolutionary dynamics of repetitive DNA in eukaryotes. Nature 371:215–220.

    Deininger, P. L., J. V. Moran, M. A. Batzer, and H. H. Kazazian Jr. 2003. Mobile elements and mammalian genome evolution. Curr. Opin. Genet. Dev. 13:651–658.

    Doolittle, W. F., and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284:601–603.

    Ganko, E. W., V. Bhattacharjee, P. Schliekelman, and J. F. McDonald. 2003. Evidence for the contribution of LTR retrotransposons to C. elegans gene evolution. Mol. Biol. Evol. 20:1925–1931.

    Ganko, E. W., C. S. Greene, J. A. Lewis, V. Bhattacharjee, and J. F. McDonald. 2006. LTR retrotransposon-gene associations in Drosophila melanogaster. J. Mol. Evol. (in press).

    Jordan, I. K., I. B. Rogozin, G. V. Glazko, and E. V. Koonin. 2003. Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends Genet. 19:68–72.

    Kazazian, H. H. Jr. 2004. Mobile elements: drivers of genome evolution. Science 303:1626–1632.

    Kidwell, M. G., and D. R. Lisch. 2001. Perspective: transposable elements, parasitic DNA, and genome evolution. Evol. Int. J. Org. Evol. 55:1–24.

    Makalowski, W. 2003. Genomics. Not junk after all. Science 300:1246–1247.

    McCarthy, E. M., and J. F. McDonald. 2004. Long terminal repeat retrotransposons of Mus musculus. Genome Biol. 5:R14.

    McDonald, J. F. 1993. Evolution and consequences of transposable elements. Curr. Opin. Genet. Dev. 3:855–864.

    Nekrutenko, A., and W. H. Li. 2001. Transposable elements are found in a large number of human protein-coding genes. Trends Genet. 17:619–621.

    Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, New York.

    Orgel, L. E., and F. H. C. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284:604–607.

    Smit, A. F. 1999. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr. Opin. Genet. Dev. 9:657–663.

    van de Lagemaat, L. N., J. R. Landry, D. L. Mager, and P. Medstrand. 2003. Transposable elements in mammals promote regulatory variation and diversification of genes with specialized functions. Trends Genet. 19:530–536.

    Waterston, R. H., K. Lindblad-Toh, E. Birney et al. (222 co-authors). 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562.

    Wheeler, D. L., D. M. Church, S. Federhen et al. (11 co-authors). 2003. Database resources of the National Center for Biotechnology. Nucleic Acids Res. 31:28–33.

Accepted for publication December 12, 2005.





