MBE Advance Access originally published online on April 29, 2007
Molecular Biology and Evolution 2007 24(8):1579-1581; doi:10.1093/molbev/msm082
Letters
On the Incidence of Intron Loss and Gain in Paralogous Gene Families
Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand
E-mail: scottwroy{at}gmail.com.
Abstract
Understanding gene duplication and the evolution of gene structure are fundamental goals of molecular evolutionary biology. A previous study by Babenko et al. (2004. Prevalence of intron gain over intron loss in the evolution of paralogous gene families. Nucleic Acids Res. 32:3724–3733) used Dollo parsimony to infer spliceosomal intron losses and gains in paralogous gene families and concluded that gains generally outnumbered losses. This result contrasts with patterns in orthologous genes, where most lineages show an excess of intron losses over gains, and raises the possibility that intron evolution proceeds in fundamentally different ways in orthologous and paralogous genes. On further study of the data, we found that intron positions are poorly conserved between the paralogs and their outgroups, a situation in which Dollo parsimony tends to misread losses as gains. Statistical reanalysis of the data suggests instead that intron losses have outnumbered intron gains in paralogous gene families.
Key Words: gene duplication, gene families, genome evolution, parsimony, statistical inference
Introduction
Two ongoing mechanisms of genomic and functional diversification in eukaryotes are gene duplication and the gain and loss of spliceosomal introns, genomic sequences that are excised from RNA transcripts. A wealth of recent work has elucidated the history of spliceosomal introns across eukaryotes; however, the ultimate causes and timing of intron origin remain the subject of debate (Fedorov et al. 2003
Interestingly, then, the largest study to date of intron evolution in gene duplicates came to the opposite conclusion. Babenko et al. (2004) studied intron evolution in lineage-specific expansions (LSEs; genes that have undergone one or more duplications following divergence from the other studied species; see example in figure 1) in six eukaryotic lineages. For each LSE they aligned paralogous genes with orthologous genes from available outgroups and mapped intron positions onto the alignment to identify shared intron positions. Using Dollo parsimony to infer intron loss or gain at each intron position (fig. 1), they reported an excess of gains over losses. Their study remains the most important large-scale analysis to find considerably more gain than loss. Intriguingly, this pattern suggests that the pronounced differences between the evolution of orthologous and paralogous genes (e.g., Katju and Lynch 2003; Kopelman et al. 2005; Wang, Yu, and Long 2004) extend to the evolution of intron–exon structure.
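To make the parsimony rule concrete, the sketch below classifies a single intron position at one duplication from its presence or absence in the two paralog lineages (A, B) and in the pooled outgroups. It is a minimal Python illustration under simplifying assumptions (one duplication, all outgroups pooled into a single present/absent flag), not Babenko et al.'s implementation; the names `dollo_call` and `Call` are hypothetical.

```python
"""Hypothetical sketch: Dollo-parsimony calls for one intron position at a
single duplication, with paralog lineages A and B and pooled outgroups."""

from enum import Enum


class Call(Enum):
    SHARED = "shared"  # in both paralogs: present before this duplication, no event at it
    LOSS = "loss"      # in one paralog and in an outgroup: read as a loss in the other paralog
    GAIN = "gain"      # in one paralog only: read as a gain on that paralog's branch
    ABSENT = "absent"  # in neither paralog: uninformative for this duplication


def dollo_call(in_a: bool, in_b: bool, in_outgroup: bool) -> Call:
    """Classify an intron position the way Dollo parsimony would.

    Dollo parsimony allows each intron position to be gained only once, so an
    intron seen in an outgroup must predate the duplication, and its absence
    from one paralog is read as a loss. An intron seen in only one paralog and
    in no outgroup is read as a gain on that paralog's branch.
    """
    if in_a and in_b:
        return Call.SHARED
    if in_a or in_b:
        return Call.LOSS if in_outgroup else Call.GAIN
    return Call.ABSENT


if __name__ == "__main__":
    # An intron lost from B that is also missing from every outgroup
    # is indistinguishable here from a genuine gain in A.
    print(dollo_call(True, False, False))
```

The weak point is visible in the last case: an intron present at the duplication, later lost from paralog B, and not represented in any outgroup gives `Call.GAIN`, so parsimony cannot separate it from a true gain.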
However, accumulating evidence for frequent intron loss suggests that Dollo parsimony may have trouble distinguishing intron loss from gain (Roy and Gilbert 2005b
How can we correct for this? Consider a pair of duplicates (which may represent either terminal branches or internal branches that themselves bifurcate at later duplication events; figure 1). For each duplication event, we first estimate $p$, the probability that an intron present at the time of duplication is represented in an available outgroup. If there are $l_2$ intron positions shared between (descendants of) both duplicates, of which $l_{2A}$ are represented in an outgroup sequence, this suggests

$$\hat{p} = \frac{l_{2A}}{l_2}.$$

Now, if $L$ introns were present at the duplication event and were subsequently lost from one duplicate, we likewise expect only a fraction of roughly $p$ of them to be present in outgroups. If there are $l_{1A}$ introns shared between exactly one descendant branch and outgroup(s) (i.e., losses inferred by parsimony), we have $\hat{L}\,\hat{p} = l_{1A}$; thus we estimate

$$\hat{L} = \frac{l_{1A}}{\hat{p}} = \frac{l_{1A}\, l_2}{l_{2A}}.$$

If there are $l_1$ introns present in descendants of only one duplicate (both those present and those absent in outgroups), the estimated number of gains is $l_1$ minus the estimated number of losses:

$$\hat{G} = l_1 - \frac{l_{1A}\, l_2}{l_{2A}}.$$

We applied this method to the data (Table 2). Contrary to the previous finding, we estimate more intron losses than gains overall (280 versus 233 in the entire data set; table 1). Losses are estimated to have outnumbered gains in three of five lineages and to have roughly equaled gains in a fourth.
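The estimator above takes only a few lines of Python. This is a sketch that assumes per-duplication counts have already been tabulated; `DuplicationCounts` and `estimate_losses_gains` are hypothetical names, and summing per-duplication estimates to obtain lineage or data-set totals is an assumption about how such totals would be aggregated.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DuplicationCounts:
    """Intron-position counts for one duplication event (hypothetical container;
    field names follow the symbols in the text)."""
    l2: int   # positions shared by descendants of both duplicates
    l2a: int  # ... of which represented in an outgroup
    l1: int   # positions found in descendants of only one duplicate
    l1a: int  # ... of which represented in an outgroup (parsimony "losses")


def estimate_losses_gains(c: DuplicationCounts) -> tuple[float, float]:
    """Return (L_hat, G_hat) for one duplication.

    p_hat = l2A / l2 estimates the chance that an intron present at the
    duplication is seen in an outgroup. Only about a fraction p of true losses
    is visible to parsimony, so L_hat = l1A / p_hat = l1A * l2 / l2A, and the
    remaining single-duplicate introns are counted as gains.
    """
    if c.l2a == 0:
        raise ValueError("no shared intron is represented in an outgroup; p cannot be estimated")
    p_hat = c.l2a / c.l2
    losses = c.l1a / p_hat
    gains = c.l1 - losses
    return losses, gains


if __name__ == "__main__":
    counts = DuplicationCounts(l2=4, l2a=2, l1=5, l1a=1)  # hypothetical counts
    print(estimate_losses_gains(counts))  # (2.0, 3.0)
```

With these hypothetical counts $\hat{p} = 0.5$, so parsimony's single inferred loss is scaled up to $\hat{L} = 2$ and the inferred gains drop from 4 to $\hat{G} = 3$. A single duplication can give a negative $\hat{G}$ when $l_{1A} l_2 / l_{2A} > l_1$, so the per-duplication values are best read in aggregate.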
However, we think that this measure, too, is likely to be biased toward intron gain. Consider a pair of gene duplicates with two shared introns, each with probability $p$ of being represented in outgroups. By the binomial distribution, the probabilities that 0, 1, or 2 introns are represented in outgroups are $(1-p)^2$, $2p(1-p)$, and $p^2$, respectively. If none is represented in outgroups, the gene is excluded as uninformative. Thus the probabilities that 1 or 2 introns are present in the outgroup, given that at least one is, are $2(1-p)/(2-p)$ and $p/(2-p)$, respectively. If one or two introns are present in outgroups, we estimate $\hat{p} = 0.5$ or $\hat{p} = 1$, respectively. Thus on average we estimate

$$E[\hat{p}] = 0.5 \times \frac{2(1-p)}{2-p} + 1 \times \frac{p}{2-p} = \frac{1}{2-p},$$

which is larger than $p$ because

$$\frac{1}{2-p} - p = \frac{(1-p)^2}{2-p} > 0.$$

In general, for a gene with $n$ shared introns, we overestimate $p$ by, on average,

$$E[\hat{p}] - p = \frac{p(1-p)^n}{1-(1-p)^n}.$$

This overestimate of $p$ causes us to underestimate the number of intron losses that parsimony incorrectly infers to be intron gains, and thus to underestimate losses and overestimate gains.
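The general-$n$ expression follows from one extra step: with $k \sim \mathrm{Binomial}(n, p)$ shared introns seen in outgroups and $\hat{p} = k/n$, the excluded case $k = 0$ contributes nothing to $E[k]$, so $E[\hat{p} \mid k \ge 1] = E[k/n] / P(k \ge 1) = p / [1-(1-p)^n]$, and subtracting $p$ gives the bias. The Python sketch below checks this closed form against direct enumeration in exact rational arithmetic; `expected_p_hat` and `bias_closed_form` are hypothetical helper names.

```python
from fractions import Fraction
from math import comb


def expected_p_hat(p: Fraction, n: int) -> Fraction:
    """E[p_hat | at least one of n shared introns is in an outgroup], by enumeration.

    k ~ Binomial(n, p) shared introns are seen in outgroups and p_hat = k / n;
    duplications with k = 0 are discarded as uninformative.
    """
    informative = 1 - (1 - p) ** n  # P(k >= 1)
    total = sum(Fraction(k, n) * comb(n, k) * p**k * (1 - p) ** (n - k)
                for k in range(1, n + 1))
    return total / informative


def bias_closed_form(p: Fraction, n: int) -> Fraction:
    """Average overestimate of p stated in the text: p (1 - p)^n / [1 - (1 - p)^n]."""
    return p * (1 - p) ** n / (1 - (1 - p) ** n)


if __name__ == "__main__":
    p = Fraction(1, 3)
    for n in range(1, 5):
        # Enumerated bias and closed-form bias agree exactly.
        print(n, expected_p_hat(p, n) - p, bias_closed_form(p, n))
```

For example, `expected_p_hat(Fraction(1, 3), 2)` returns `Fraction(3, 5)`, which is $1/(2-p)$ at $p = 1/3$, and for $n = 1$ the function returns 1 for every $p$: the single-shared-intron case in which $\hat{p}$ carries no information about $p$.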
The case is worst for genes with a single intron shared between duplicates: either the intron is absent in outgroups (in which case the duplication is not considered) or it is present (in which case we estimate $\hat{p} = 1$, leading to an overall estimate of $E[\hat{p}] = 1$ regardless of the true $p$). Simulations confirmed the bias toward inference of intron gain (data not shown). Correspondingly, the overall estimated ratio of intron losses to gains jumps from 1.2 (280/233) to 1.7 (228/134) when duplications with only a single shared intron (i.e., $l_{2A} = 1$) are excluded, and to 2.0 (174/87) among cases with $l_{2A} > 2$.
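Because the simulations are not shown, the following is only a toy Monte Carlo sketch, not the authors' simulation, of how the corrected estimator can still lean toward gain. Every setting in it is a hypothetical assumption: 1 to 3 shared introns per duplication, two true losses and one true gain per duplication, an outgroup-representation probability $p = 0.3$, no homoplastic gains, and totals formed by summing per-duplication estimates. The function name `simulate` is likewise hypothetical.

```python
from __future__ import annotations

import random


def simulate(n_dups: int = 20_000, p: float = 0.3, n_lost: int = 2,
             n_gained: int = 1, seed: int = 1) -> dict[str, float]:
    """Toy simulation of loss/gain inference (all settings are hypothetical).

    Assumptions:
      * each duplication has 1-3 shared introns (uniform), n_lost introns lost
        from one duplicate, and n_gained introns gained in one duplicate;
      * every intron present at the duplication is independently seen in an
        outgroup with probability p; gained introns never are (no homoplasy);
      * duplications with no shared intron seen in an outgroup are discarded.
    Returns loss:gain ratios that are true, inferred by Dollo parsimony, and
    estimated with L_hat = l1A * l2 / l2A.
    """
    rng = random.Random(seed)
    true_l = true_g = pars_l = pars_g = est_l = est_g = 0.0
    for _ in range(n_dups):
        l2 = rng.randint(1, 3)
        l2a = sum(rng.random() < p for _ in range(l2))
        if l2a == 0:
            continue  # uninformative: p cannot be estimated for this duplication
        l1a = sum(rng.random() < p for _ in range(n_lost))  # losses visible in outgroups
        l1 = n_lost + n_gained
        true_l += n_lost
        true_g += n_gained
        pars_l += l1a            # parsimony: only outgroup-supported losses
        pars_g += l1 - l1a       # everything else is called a gain
        losses = l1a * l2 / l2a  # corrected estimate for this duplication
        est_l += losses
        est_g += l1 - losses
    return {"true": true_l / true_g,
            "parsimony": pars_l / pars_g,
            "estimator": est_l / est_g}


if __name__ == "__main__":
    print(simulate())
```

Under these assumptions the ratios should order as parsimony < estimator < true. Scaling by $l_2/l_{2A} \ge 1$ always recovers at least as many losses as parsimony, but once duplications with $l_{2A} = 0$ are discarded, $l_2/l_{2A}$ tends to fall below $1/p$, so the correction falls short. Other settings (for example, large $p$) can behave differently, so this is illustrative only.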
These results suggest that paralogous genes, like orthologous genes, have experienced an excess of intron loss over intron gain in most lineages. Further studies employing more (and more closely related) species, and accounting for the possibility of homoplastic intron insertion, will be necessary to resolve the issue definitively. These results provide a further cautionary example of using parsimony to polarize intron loss/gain events, and they underscore the importance of more sophisticated statistical methods and/or more closely related species for accurate inferences about genome evolution.
Footnotes
Kenneth Wolfe, Associate Editor
References
Babenko VN, Rogozin IB, Mekhedov SL, Koonin EV. Prevalence of intron gain over intron loss in the evolution of paralogous gene families. Nucleic Acids Res (2004) 32:3724–3733.
Csurös M. Likely scenarios of intron evolution. In: Third RECOMB Satellite Workshop on Comparative Genomics (2005) 47–60. Springer LNCS 3678.
Fedorov A, Fedorova L. Introns: mighty elements from the RNA world. J Mol Evol (2004) 59:718–721.
Fedorov A, Fedorova L. Where is the difference between the genomes of humans and annelids? Genome Biol (2006) 7:203.
Fedorov A, Merican AF, Gilbert W. Large-scale comparison of intron positions among animal, plant, and fungal genes. Proc Natl Acad Sci USA (2002) 99:16128–16133.
Fedorov A, Roy S, Fedorova L, Gilbert W. Mystery of intron gain. Genome Res (2003) 13:2236–2241.
Katju V, Lynch M. The structure and early evolution of recently arisen gene duplicates in the Caenorhabditis elegans genome. Genetics (2003) 165:1793–1803.
Kopelman NM, Lancet D, Yanai I. Alternative splicing and gene duplication are inversely correlated evolutionary mechanisms. Nat Genet (2005) 37:588–589.
Lin H, Zhu W, Silva J, Gu X, Buell CR. Intron gain and loss in segmentally duplicated genes in rice. Genome Biol (2006) 7:R41.
Lin K, Zhang D-Y. The excess of 5' introns in eukaryotic genomes. Nucleic Acids Res (2005) 33:6522–6527.
Martin W, Koonin EV. Introns and the origin of nucleus-cytosol compartmentalization. Nature (2006) 440:41–45.
Mourier T, Jeffares DC. Eukaryotic intron loss. Science (2003) 300:1393.
Nguyen H, Yoshihama M, Kenmochi N. New maximum likelihood estimators for eukaryotic intron evolution. PLoS Comput Biol (2005) 1:e79.
Niu DK, Hou WR, Li SW. mRNA-mediated intron losses: evidence from extraordinarily large exons. Mol Biol Evol (2005) 22:1475–1481.
Perler F, Efstratiadis A, Lomedico P, Gilbert W, Kolodner R, Dodgson J. The evolution of genes: the chicken preproinsulin gene. Cell (1980) 20:555–566.
Raible F, Tessmar-Raible K, Osoegawa K, et al. Vertebrate-type intron-rich genes in the marine annelid Platynereis dumerilii. Science (2005) 310:1325–1326.
Robertson HM. Two large families of chemoreceptor genes in the nematodes Caenorhabditis elegans and Caenorhabditis briggsae reveal extensive gene duplication, diversification, movement, and intron loss. Genome Res (1998) 8:449–463.
Rogozin IB, Wolf YI, Sorokin AV, Mirkin BG, Koonin EV. Remarkable interkingdom conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic evolution. Curr Biol (2003) 13:1512–1517.
Roy SW, Fedorov A, Gilbert W. Large-scale comparison of intron positions in mammalian genes shows intron loss but no gain. Proc Natl Acad Sci USA (2003) 100:7158–7162.
Roy SW, Gilbert W. The pattern of intron loss. Proc Natl Acad Sci USA (2005a) 102:713–718.
Roy SW, Gilbert W. Complex early genes. Proc Natl Acad Sci USA (2005b) 102:1986–1991.
Roy SW, Hartl DL. Very little intron loss/gain in Plasmodium: intron loss/gain mutation rates and intron number. Genome Res (2006) gr.4845406.
Roy SW, Penny D. Smoke without fire: most reported cases of intron gain in nematodes instead reflect intron losses. Mol Biol Evol (2006) 23:2259–2262.
Roy SW, Penny D. Patterns of intron loss and gain in plants: intron loss-dominated evolution and genome-wide comparison of O. sativa and A. thaliana. Mol Biol Evol (2007) 24:171–181.
Stajich JE, Dietrich FS. Evidence of mRNA-mediated intron loss in the human-pathogenic fungus Cryptococcus neoformans. Eukaryotic Cell (2006) 5:789–793.
Wang HF, Feng L, Niu D-K. Relationship between mRNA stability and intron presence. Biochem Biophys Res Commun (2007) [epub].
Wang W, Yu H, Long M. Duplication-degeneration as a mechanism of gene fission and the origin of new genes in Drosophila species. Nat Genet (2004) 36:523–527.

