MBE Advance Access originally published online on April 29, 2007
Molecular Biology and Evolution 2007 24(8):1579-1581; doi:10.1093/molbev/msm082

© The Author 2007. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Letters

On the Incidence of Intron Loss and Gain in Paralogous Gene Families

Scott William Roy and David Penny

Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand

E-mail: scottwroy{at}gmail.com.


    Abstract
Understanding gene duplication and gene structure evolution are fundamental goals of molecular evolutionary biology. A previous study by Babenko et al. (2004. Prevalence of intron gain over intron loss in the evolution of paralogous gene families. Nucleic Acids Res. 32:3724–3733) employed Dollo parsimony to infer spliceosomal intron losses and gains in paralogous gene families and concluded that there was a general excess of gains over losses. This result contrasts with patterns in orthologous genes, in which most lineages show an excess of intron losses over gains, suggesting the possibility of fundamentally different modes of intron evolution between orthologous and paralogous genes. We further studied the data and found a low level of intron position conservation with outgroups, and this led to problems with using Dollo parsimony to analyze the data. Statistical reanalysis of the data suggests, instead, that intron losses have outnumbered intron gains in paralogous gene families.

Key Words: gene duplication • gene families • genome evolution • parsimony • statistical inference


    Introduction
Two ongoing mechanisms of genomic and functional diversification in eukaryotes are duplication of genes and the gain and loss of spliceosomal introns, genomic sequences that are excised from RNA transcripts. A wealth of recent work has elucidated the evolution of spliceosomal introns through eukaryotic evolution; however, the ultimate causes and timing of intron origin remain the subject of debate (Fedorov et al. 2003; Fedorov and Fedorova 2004, 2006; Collins and Penny 2005; Martin and Koonin 2006; Wang, Feng, and Niu 2007). The numbers and positions of introns vary across homologous genes, implying the occurrence of intron loss and/or gain through evolution (Perler et al. 1980; Fedorov, Merican, and Gilbert 2002; Rogozin et al. 2003). Over the past few years, a growing body of work has shown that intron loss is a major mechanism of genome evolution, and that intron losses tend to outnumber intron gains in orthologous genes, a pattern observed across a wide array of diverse lineages (e.g., Robertson 1998; Roy, Fedorov, and Gilbert 2003; Mourier and Jeffares 2003; Niu et al. 2005; Lin and Zhang 2005; Lin et al. 2006; Roy and Gilbert 2005a; Stajich and Dietrich 2006; Roy and Hartl 2006; Roy and Penny 2007).

Interestingly, then, the largest study to date of intron evolution in gene duplicates came to the opposite conclusion. Babenko et al. (2004) studied intron evolution in lineage-specific expansions (LSEs; genes that have undergone one or more duplications following divergence from the other studied species; see example in figure 1) in six eukaryotic lineages. For each LSE they aligned paralogous genes with orthologous genes from available outgroups and mapped intron positions onto the alignment to identify shared intron positions. Using Dollo parsimony to infer intron loss/gain for each intron position (fig. 1), they reported an excess of gains over losses. This result stands as the most important large-scale analysis to find considerably more gain than loss. Intriguingly, this pattern suggests that the pronounced differences between the evolution of orthologous and paralogous genes (e.g., Katju and Lynch 2003; Kopelman et al. 2005; Wang, Yu, and Long 2004) include intron–exon structure evolution.


FIG. 1. Hypothetical example of a lineage-specific expansion (LSE). The LSE consists of four paralogs in one species (LSE genes 1–4) due to three gene duplications since the divergence from outgroup species (OUTG SP 1–3). There are six observed intron positions (introns A–F), with intron presence/absence data (+/–) for each homolog. We consider the second gene duplication (oval), and estimate numbers of intron losses and gains in the directly descendent branches (bold lines). Dollo parsimony infers one intron loss (intron C) and two gains (E and F). Intron D represents an event that occurred on a later branch (after the divergence of LSE genes 1 and 2); intron B represents at least two events, one before the duplication in question and one after the divergence of LSE genes 1 and 2. Of the two intron positions shared between both descendent lineages and thus likely present at the time of duplication (A and B), only one is present in an outgroup (A). This raises the possibility that apparent gains (E and F) may also have been present at the gene duplication and instead represent real intron losses. Employing the method described in the text, we have $l_2 = 2$ (introns A and B), $l_{2A} = 1$ (A), $l_1 = 3$ (C, E, and F), and $l_{1A} = 1$ (C). Thus we estimate that there are roughly two losses ($= l_{1A} l_2 / l_{2A}$) and one gain ($= l_1 - l_{1A} l_2 / l_{2A}$).

 
However, accumulating evidence for frequent intron loss suggests that Dollo parsimony may have trouble distinguishing intron loss and gain (Roy and Gilbert 2005b; Roy and Penny 2006). The data from the original paper (ftp://ftp.ncbi.nih.gov/koonin/intron_evolution/LSEs/) are summarized in table 1, and can be compared with the illustrative example given in figure 1. Dollo parsimony will only be accurate if intron presence/absence in available outgroups reflects presence/absence at the time of gene duplication. Intron positions shared between both descendent lineages of a gene duplication (e.g., introns A and B in figure 1) very likely represent introns present at the time of duplication. The presence of such introns in outgroups thus represents an important test of the utility of Dollo parsimony. Of a total of 3,809 such shared intron positions across 1,421 gene duplications in the data set, nearly half (45.9%) are not represented in any studied outgroups (available outgroup species as well as possible additional gene copies from the LSE; intron B in figure 1 is an example of such an intron). Thus Dollo parsimony frequently fails to correctly infer intron presence/absence at the time of duplication. This failure of outgroups is expected to lead to many actual intron losses being misidentified as intron gains: indeed, for the 45.5% of gene duplications for which there are no intron positions shared between both descendent lineages as well as outgroup(s), parsimony infers 4.1 gains per loss (101/248), compared to 2.4 (176/74) for duplications with one such shared intron and 1.2 (142/121) for duplications with more than one shared intron.
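The parsimony calls discussed above can be sketched in a few lines of Python. This is only an illustrative simplification for a single duplication, not the procedure used in the original study: the function name `dollo_call` and the presence/absence triples are hypothetical, and which descendent lineage carries introns C, E, and F is an assumption, since the caption of figure 1 does not say.

```python
from collections import Counter


def dollo_call(in_lineage_1: bool, in_lineage_2: bool, in_outgroup: bool) -> str:
    """Classify one intron position relative to a single gene duplication."""
    if in_lineage_1 and in_lineage_2:
        # Shared by both descendent lineages: taken as present at duplication.
        return "present at duplication"
    if in_lineage_1 or in_lineage_2:
        # Found in exactly one descendent lineage: the outgroup decides.
        return "loss" if in_outgroup else "gain"
    # Absent from both descendent lineages: uninformative for this duplication.
    return "not informative"


# (lineage 1, lineage 2, any outgroup) for the positions named in figure 1.
# The lineage assignment of C, E and F is assumed for illustration only.
positions = {
    "A": (True, True, True),
    "B": (True, True, False),
    "C": (True, False, True),
    "E": (False, True, False),
    "F": (True, False, False),
}

calls = {name: dollo_call(*state) for name, state in positions.items()}
counts = Counter(calls.values())
print(calls)
print(counts["loss"], "loss and", counts["gain"], "gains inferred")
```

Intron B shows the weakness at issue: it is shared by both lineages, so it was very likely present at duplication, yet no outgroup carries it. Had it survived in only one lineage, the same rule would have called it a gain.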


Table 1 Summary of the Data

 
How can we correct for this? Consider a pair of duplicates (which may represent either terminal branches or internal branches that themselves bifurcate at later duplication events; figure 1). For each duplication event, we first estimate $p$, the probability that an intron present at the time of duplication is represented in an available outgroup. If there are $l_2$ total intron positions that are shared between (descendents of) both duplicates, of which $l_{2A}$ are represented in an outgroup sequence, this suggests that $p$ is around $\hat{p} = l_{2A}/l_2$. Now, if there are $L$ introns that were present at the duplication event and subsequently lost, we likewise expect that only roughly a fraction $p$ would be present in outgroups. If there are $l_{1A}$ introns shared between exactly one descendent branch and outgroup(s) (i.e., losses inferred by parsimony), we have $\hat{p}\hat{L} = l_{1A}$; thus we estimate $\hat{L} = l_{1A}/\hat{p} = l_{1A} l_2 / l_{2A}$. If there are $l_1$ total introns that are present in descendents of only one duplicate (both those present and those absent in outgroups), the total estimated number of gains is thus $l_1$ minus the estimated number of losses: $l_1 - l_{1A} l_2 / l_{2A}$.
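As a minimal sketch of the correction just described, the following Python function computes the three estimates for one duplication from the four counts defined in the text. The function and argument names are hypothetical, and raising an error when no shared intron is seen in an outgroup is an assumption of this sketch (the text treats such duplications as uninformative).

```python
def estimate_losses_gains(l2: int, l2a: int, l1: int, l1a: int):
    """Return (p_hat, estimated losses, estimated gains) for one duplication.

    l2  : intron positions shared by both descendent lineages
    l2a : those of the l2 positions also represented in an outgroup
    l1  : intron positions present in exactly one descendent lineage
    l1a : those of the l1 positions also represented in an outgroup
    """
    if l2a == 0:
        # No shared intron is represented in an outgroup, so p cannot be
        # estimated; this sketch refuses rather than guessing.
        raise ValueError("duplication is uninformative: l2a must be > 0")
    p_hat = l2a / l2        # estimated chance an ancestral intron is in an outgroup
    losses = l1a / p_hat    # equals l1a * l2 / l2a
    gains = l1 - losses     # remaining single-lineage introns are counted as gains
    return p_hat, losses, gains


# Counts from the hypothetical example in figure 1.
p_hat, losses, gains = estimate_losses_gains(l2=2, l2a=1, l1=3, l1a=1)
print(p_hat, losses, gains)  # 0.5 2.0 1.0
```

For the figure 1 counts this reproduces the roughly two losses and one gain given in the caption, against one loss and two gains under Dollo parsimony.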

We applied this method to the data (Table 2). Contrary to the previous finding, we estimated that there are more total intron losses than gains (280 versus 233 in the entire data set; table 1). Losses are estimated to have outnumbered gains in three of five lineages, and to have been roughly equal to gains in a fourth.


Table 2 Estimated Intron Losses and Gains in Lineage Specific

 
However, we think that this measure, too, is likely to be biased toward intron gain. Consider a pair of gene duplicates with two shared introns, each with probability $p$ of being represented in outgroups. According to the binomial distribution, the probability that 0, 1, or 2 introns will be represented in outgroups is $(1-p)^2$, $2p(1-p)$, and $p^2$, respectively. If 0 are represented in outgroups, the gene will be excluded as uninformative. Thus the probabilities of 1 or 2 introns being present in the outgroup, given that at least one is, are $2(1-p)/(2-p)$ and $p/(2-p)$, respectively. If one or two introns are present in outgroups, we estimate that $\hat{p}$ is 0.5 or 1, respectively. Thus on average we estimate $\hat{p} = 0.5 \times 2(1-p)/(2-p) + 1 \times p/(2-p) = 1/(2-p)$, which is larger than $p$ since $1/(2-p) - p = (1-p)^2/(2-p) > 0$. In general, for a gene with $n$ introns, we will overestimate $p$ by on average $p(1-p)^n/[1-(1-p)^n]$. This overestimate of $p$ will cause us to underestimate the number of intron losses that are incorrectly inferred by parsimony to be intron gains, and thus it will lead us to underestimate intron losses and overestimate intron gains.
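The size of this bias can be checked numerically. The sketch below is not the authors' simulation. It computes the expected estimate of p exactly, by summing over the binomial outcomes with at least one shared intron represented in an outgroup, and compares the result with the closed-form overestimate quoted above. The function names are hypothetical.

```python
from math import comb


def expected_p_hat(n: int, p: float) -> float:
    """Expected value of k/n for k ~ Binomial(n, p), conditional on k >= 1."""
    p_informative = 1 - (1 - p) ** n  # chance that at least one intron is seen
    total = sum(
        (k / n) * comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(1, n + 1)
    )
    return total / p_informative


def predicted_overestimate(n: int, p: float) -> float:
    """Closed-form bias from the text: p(1-p)^n / [1 - (1-p)^n]."""
    return p * (1 - p) ** n / (1 - (1 - p) ** n)


p = 0.5
for n in (1, 2, 5, 10):
    bias = expected_p_hat(n, p) - p
    print(n, round(bias, 6), round(predicted_overestimate(n, p), 6))
```

The two columns agree for every n, and the overestimate shrinks as the number of shared introns grows, which is why duplications with few shared introns are the most affected.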

The case is worst for genes with a single intron shared between duplicates: either the intron is absent in outgroups (in which case the duplication is not considered) or it is present (in which case we estimate $\hat{p} = 1$), so that the average estimate is also $\hat{p} = 1$ regardless of the true value of $p$. The bias toward inference of intron gain was confirmed by simulations (data not shown). Correspondingly, the overall estimated ratio of intron losses to gains jumps from 1.2 (280/233) to 1.7 (228/134) when duplications with only a single shared intron (i.e., $l_{2A} = 1$) are excluded, and to 2.0 (174/87) among cases with $l_{2A} > 2$.
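The single-intron case follows directly from the general expression for the overestimate by setting $n = 1$. The short derivation below uses only the formulas already given in the text, with $k$ introduced here to denote the number of shared introns represented in outgroups.

```latex
\[
E[\hat{p} \mid k \ge 1] - p
  = \frac{p(1-p)^{n}}{1-(1-p)^{n}}
  \quad\Longrightarrow\quad
  \left. E[\hat{p} \mid k \ge 1] - p \,\right|_{n=1}
  = \frac{p(1-p)}{p} = 1 - p .
\]
```

With the estimate of $p$ fixed at 1, the estimated number of losses, $l_{1A} l_2 / l_{2A}$, reduces to $l_{1A}$, the number of losses inferred by parsimony. Such duplications therefore receive no correction at all.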

These results suggest that paralogous genes, like orthologous genes, have experienced an excess of intron loss over intron gain over most lineages. Further studies employing more (and more closely related) species, and accounting for the possibility of homoplastic intron insertion, will be necessary to finally resolve the issue. These results provide a further cautionary example against relying on parsimony to infer the direction of intron loss/gain events, and underscore the importance of using more sophisticated statistical methods and/or more closely related species for accurate inferences about genome evolution.


    Footnotes
 
Kenneth Wolfe, Associate Editor


    References

    Babenko VN, Rogozin IB, Mekhedov SL, Koonin EV. Prevalence of intron gain over intron loss in the evolution of paralogous gene families. Nucleic Acids Res (2004) 32:3724–3733.

    Csurös M. Likely scenarios of intron evolution. In: Third RECOMB Satellite Workshop on Comparative Genomics (2005) 47–60. Springer LNCS 3678.

    Fedorov A, Fedorova L. Introns: mighty elements from the RNA world. J Mol Evol (2004) 59:718–721.

    Fedorov A, Fedorova L. Where is the difference between the genomes of humans and annelids? Genome Biol (2006) 7:203.

    Fedorov A, Merican AF, Gilbert W. Large-scale comparison of intron positions among animal, plant, and fungal genes. Proc Natl Acad Sci USA (2002) 99:16128–16133.

    Fedorov A, Roy S, Fedorova L, Gilbert W. Mystery of intron gain. Genome Res (2003) 13:2236–2241.

    Katju V, Lynch M. The structure and early evolution of recently arisen gene duplicates in the Caenorhabditis elegans genome. Genetics (2003) 165:1793–1803.

    Kopelman NM, Lancet D, Yanai I. Alternative splicing and gene duplication are inversely correlated evolutionary mechanisms. Nat Genet (2005) 37:588–589.

    Lin H, Zhu W, Silva J, Gu X, Buell CR. Intron gain and loss in segmentally duplicated genes in rice. Genome Biol (2006) 7:R41.

    Lin K, Zhang D-Y. The excess of 5' introns in eukaryotic genomes. Nucleic Acids Res (2005) 33:6522–6527.

    Martin W, Koonin EV. Introns and the origin of nucleus-cytosol compartmentalization. Nature (2006) 440:41–45.

    Mourier T, Jeffares DC. Eukaryotic intron loss. Science (2003) 300:1393.

    Nguyen H, Yoshihama M, Kenmochi N. New maximum likelihood estimators for eukaryotic intron evolution. PLoS Comput Biol (2005) 1:e79.

    Niu DK, Hou WR, Li SW. mRNA-mediated intron losses: evidence from extraordinarily large exons. Mol Biol Evol (2005) 22:1475–1481.

    Perler F, Efstratiadis A, Lomedico P, Gilbert W, Kolodner R, Dodgson J. The evolution of genes: the chicken preproinsulin gene. Cell (1980) 20:555–566.

    Raible F, Tessmar-Raible K, Osoegawa K, et al. Vertebrate-type intron-rich genes in the marine annelid Platynereis dumerilii. Science (2005) 310:1325–1326.

    Robertson HM. Two large families of chemoreceptor genes in the nematodes Caenorhabditis elegans and Caenorhabditis briggsae reveal extensive gene duplication, diversification, movement, and intron loss. Genome Res (1998) 8:449–463.

    Rogozin IB, Wolf YI, Sorokin AV, Mirkin BG, Koonin EV. Remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution. Curr Biol (2003) 13:1512–1517.

    Roy SW, Fedorov A, Gilbert W. Large-scale comparison of intron positions in mammalian genes shows intron loss but no gain. Proc Natl Acad Sci USA (2003) 100:7158–7162.

    Roy SW, Gilbert W. The pattern of intron loss. Proc Natl Acad Sci USA (2005a) 102:713–718.

    Roy SW, Gilbert W. Complex early genes. Proc Natl Acad Sci USA (2005b) 102:1986–1991.

    Roy SW, Hartl DL. Very little intron loss/gain in Plasmodium: intron loss/gain mutation rates and intron number. Genome Res (2006) gr.4845406.

    Roy SW, Penny D. Smoke without fire: most reported cases of intron gain in nematodes instead reflect intron losses. Mol Biol Evol (2006) 23:2259–2262.

    Roy SW, Penny D. Patterns of intron loss and gain in plants: intron loss-dominated evolution and genome-wide comparison of O. sativa and A. thaliana. Mol Biol Evol (2007) 24:171–181.

    Stajich JE, Dietrich FS. Evidence of mRNA-mediated intron loss in the human-pathogenic fungus Cryptococcus neoformans. Eukaryotic Cell (2006) 5:789–793.

    Wang HF, Feng L, Niu D-K. Relationship between mRNA stability and intron presence. Biochem Biophys Res Commun (2007) [epub].

    Wang W, Yu H, Long M. Duplication-degeneration as a mechanism of gene fission and the origin of new genes in Drosophila species. Nat Genet (2004) 36:523–527.

Accepted for publication April 10, 2007.

