Molecular Biology and Evolution 19:256-262 (2002)
© 2002 Society for Molecular Biology and Evolution
Extent of Gene Duplication in the Genomes of Drosophila, Nematode, and Yeast
*Department of Ecology and Evolution
Department of Statistics, University of Chicago
| Abstract |
|---|
|
|
|---|
We conducted a detailed analysis of duplicate genes in three complete genomes: yeast, Drosophila, and Caenorhabditis elegans. For two proteins belonging to the same family we used the criteria: (1) their similarity is
I (I = 30% if L
150 a.a. and I = 0.01n + 4.8L-0.32(1 + exp(-L/1000)) if L < 150 a.a., where n = 6 and L is the length of the alignable region), and (2) the length of the alignable region between the two sequences is
80% of the longer protein. We found it very important to delete isoforms (caused by alternative splicing), same genes with different names, and proteins derived from repetitive elements. We estimated that there were 530, 674, and 1,219 protein families in yeast, Drosophila, and C. elegans, respectively, so, as expected, yeast has the smallest number of duplicate genes. However, for the duplicate pairs with the number of substitutions per synonymous site (KS) < 0.01, Drosophila has only seven pairs, whereas yeast has 58 pairs and nematode has 153 pairs. After considering the possible effects of codon usage bias and gene conversion, these numbers became 6, 55, and 147, respectively. Thus, Drosophila appears to have much fewer young duplicate genes than do yeast and nematode. The larger numbers of duplicate pairs with KS < 0.01 in yeast and C. elegans were probably largely caused by block duplications. At any rate, it is clear that the genome of Drosophila melanogaster has undergone few gene duplications in the recent past and has much fewer gene families than C. elegans. | Introduction |
|---|
|
|
|---|
It has been proposed that gene duplication is the most important step for the origin of genetic novelties (Ohno 1970
The rate of gene duplication in a genome is also of great interest. This type of study is possible only when the whole genome data is available. Lynch and Conery (2000)
estimated the gene duplication rates in the yeast, Drosophila, and C. elegans genomes using the synonymous site changes (KS) as the time scale. However, it is well known that codon usage is highly biased in some genes in these organisms (Ikemura 1982
; Akashi, Kliman, and Eyre-Walker 1998
). A negative correlation between synonymous rate (KS) and strength of codon usage bias in Drosophila suggests that in some genes synonymous changes are not neutral (Sharp and Li 1989
; Moriyama and Hartl 1993
), though Dunn, Bielawski, and Yang (2001)
argued against the existence of this correlation. Therefore, the KS value might not reflect the real age of a gene duplication. A combination of KS and the genetic distances in intron and flanking regions might be more informative.
The relatively good quality of genomic sequences and concomitant annotation for yeast, Drosophila, and C. elegans make it possible for us to investigate the above questions in these genomes. However, the presence of same genes with different names and the existence of alternative splicing forms in the database make it difficult to study the extent of gene duplication in a genome. Moreover, retrotranscriptase (RT) and protein parts derived from repetitive elements (REs) might mislead the identification of homologous proteins. For these reasons, it is important to clean the database. In this paper, after carrying out a detailed cleaning procedure for the protein databases of these three genomes, we asked two questions: How many gene families are there in each genome? How often has gene duplication occurred in the recent past in each organism? We defined two simple homology criteria by improving the criterion adopted by Rost (1999)
. Using the new criteria for identifying homologous genes and the single-linkage algorithm for clustering, we estimated the number of gene families in each of the three genomes. The frequency of recent gene duplication was investigated by using gene pairs with a small KS. We excluded the gene pairs with possible gene conversion and codon usage bias by comparing KS with the genetic distance in intron and flanking regions. The effect of codon usage on KS in yeast was further studied.
| Materials and Methods |
|---|
|
|
|---|
The protein data sets were obtained from the following websites:
Caenorhabditis elegans: http://www.sanger.ac.uk/Projects/C_elegans/wormpep/ Wormpep release 40 was used. There were 19,730 protein sequences in the database, of which 48 did not have genomic position information and 22 did not have corresponding coding sequences (cds). We used the rest of the 19,660 protein sequences in our analysis.
Yeast: ftp://ncbi.nlm.nih.gov/genbank/genomes/S_cerevisiae/ We used the NCBI October 2000 version, which was part of the Reference Sequence (RefSeq) project. The annotation for this version was based on the Saccharomyces Genome Database in the Stanford genomic resources (SGD, http://genome-www.stanford.edu/Saccharomyces/). A total of 6,297 protein sequences were in the database and used in our analysis. Information for block duplications and gene pairs within the blocks was downloaded from the website http://www.gen.tcd.ie/khwolfe/ (Wolfe and Shields 1997
).
Drosophila: ftp://ncbi.nlm.nih.gov/genbank/genomes/D_melanogaster/ Release 2, October 2000 from NCBI was used. A total of 14,335 protein sequences were in the database.
The corresponding cds and genomic sequences for all three genomes were also downloaded from the above websites.
The data for each genome was processed as described below.
First Round Grouping
In each of the three genomes studied, every protein was used as the query to search against all other proteins in the database using FASTA (E = 10). Our criteria for two proteins to form a link (i.e., to be in the same family) are (1) the FASTA-alignable region between the two proteins should be longer than 80% of the longer protein, and (2) the identity between the two proteins (I) should be I
30% if the alignable region is longer than 150 a.a. and I
0.01n + 4.8L-0.32(1 + exp(-L/1000)) (Rost 1999
) if otherwise, where L is the alignable length between the two proteins. Rost's formula was derived from an empirical study, which suggests that a higher I value was needed for shorter proteins. We use n = 6, which makes the formula continuous at L = 150. We call it a hit if two proteins form a link. The single-linkage algorithm was used to group proteins into clusters, i.e., if protein A hits protein B and protein B hits protein C, we group proteins A, B, and C together, regardless of whether protein A hits protein C or not.
Cleaning of Same Genes with Different Names
Occasionally, more than one name is assigned to the same gene and these names are presented as different genes in the database. Such a situation can be detected by comparing their sequence coordinates. In each of such cases, only one copy was kept in the analysis.
Isoform Cleaning
Based on gene and exon annotation, we define that two genes are isoforms if their shared coding region is longer than 20% of the entire coding region of the shorter gene. We delete one of the two isoforms from the database as follows: Delete the shorter one if both are singletons; delete the one that is a singleton if the other belongs to a multigene family; delete the shorter one if the two isoforms form a two-member cluster; delete the shorter one if both are from the same gene family with more than two members and they have the same hits; and delete the one with fewer hits if they belong to the same cluster but their hits are not all the same. We keep both proteins if they belong to different multigene families.
RE Cleaning
Each protein was used as the query to search against the RE database for the same organism using FASTAX (E = 10-5). We delete the whole protein sequence from the database if the part of the protein hit by an RE is longer than 80% of the protein itself. We delete only the part that was hit by an RE if it is shorter than 80% of the protein.
Second Round Grouping
We repeat the steps in the first round grouping with the cleaned database. The new clusters are regarded as gene families.
Synonymous-Nonsynonymous Substitution Calculation
If two proteins are linked to each other, the FASTA-alignable regions of the two proteins are realigned using clustalW, and the corresponding coding regions of the genes are aligned based on the protein alignment. The number of substitutions per synonymous site (KS) and nonsynonymous site (KA) are calculated using PAML (Yang and Nielsen 2000
; the default parameters are used in the calculation).
Genetic Distance in Intron and Flanking Regions
Sequences of entire introns and flanking 150 bp of cds (both upstream and downstream) are extracted using gene annotation data. ClustalW is used to do the alignment, and genetic distances are calculated using Kimura's (1980)
two-parameter method when the divergence is less than 5%, but Tamura and Nei's (1993)
six-parameter method is used for more divergent sequences.
Codon Usage Bias Calculation
The effective number of codons (ENC) is calculated for each gene using the CodonW package (ftp://molbiol.ox.ac.uk/cu/codonW.tar.Z) and is used as a criterion for the strength of codon usage bias: the smaller the ENC, the stronger the codon usage bias.
The results of the above analysis were stored in a MySQL database.
| Results |
|---|
|
|
|---|
Database Cleaning
Table 1 shows the number of cases of a gene with different names and the number of isoforms that were cleaned from each database used. In the Drosophila database, more than 400 cases of a gene with different names were found, suggesting a relatively poor annotation. In yeast, isoforms were found mainly as singletons (i.e., they do not belong to protein families), whereas in Drosophila and C. elegans the majority of isoforms were found in protein families. The number of proteins hit by REs is also listed in table 1 . In Yeast and C. elegans, a large number of sequences deleted belong to the RT families. On the other hand, there are much fewer RTs in the current Drosophila protein database. It will be interesting to see whether the low number of RTs in Drosophila is real or is caused by incomplete genome sequencing. We note that in many cases the part of the protein derived from a RE is less than 80% of the protein (9 in yeast, 57 in Drosophila, and 251 in C. elegans). Many of these proteins belong to non-RT protein families, although they include part(s) derived from a RE. This observation suggests that REs play an important role in protein evolution (Brosius 1999
|
The dramatic effect of database cleaning is shown in table 2 . For example, for Drosophila, before the database was cleaned, there were 660 gene pairs with KS < 0.01 within gene families having fewer than six members, but this number was reduced to two after database cleaning; 413 pairs were the same gene with different names and most of the rest (245 pairs) were isoforms.
|
Number of Gene Families
The numbers of protein families were estimated to be 530, 674, and 1,219 in yeast, Drosophila, and C. elegans, respectively (table 3 ). Our estimates are much smaller than those of Rubin et al. (2000)
10-6 but did not require any minimal length of the alignable region). If we count one gene family as one unique gene type, the numbers of unique gene types are estimated to be 5,298, 11,460, and 14,077 in yeast, Drosophila, and C. elegans, respectively. A striking point is that the number of gene families in Drosophila is somewhat larger than that in yeast but much smaller than that in C. elegans, although the genome size and total protein number in the whole genome are similar in Drosophila and C. elegans but much larger than those in yeast. The number of unique gene types in Drosophila is more similar to that in C. elegans than that in yeast. The top five gene families for each organism are listed in table 4 . The largest multiple gene family in yeast is the seripauperin gene family, whereas it switches to trypsin and olfactory receptor gene families in Drosophila and C. elegans, respectively (table 4 ).
|
|
Gene Duplication Rates
Table 5 lists the number of duplicate genes with KS < 0.01. We compared KS with the genetic distances in intron and flanking regions. Those pairs with both the genetic distance in intron and flanking regions (both up and down stream 150 bp) larger than 0.02 were excluded from the table. It is interesting to note that there are only six pairs of duplicate genes in Drosophila that have KS < 0.01, whereas this number is 55 in yeast and 147 in C. elegans (35 in yeast if we exclude the putative helicase protein family). Assuming similar synonymous mutation rates in these three organisms and using the estimated rate in Drosophila (15.6 substitutions per site per 109 years; Li 1997
|
| Discussion |
|---|
|
|
|---|
Grouping Strategy
Gene family clustering is a difficult problem for two reasons. First, domain shuffling, which is a common mode for protein evolution, might mislead the clustering of two nonhomologous proteins into the same family because of the shared domain alone. Second, deciding whether two proteins are homologous becomes difficult when their sequence identity is low. In this paper, we improved the criterion adopted by Rost (1999)
Importance of Database Cleaning
We conducted an extensive database cleaning before we did protein family grouping. The presence of same genes with different names and the existence of isoforms (mainly from alternative splicing) in the database inflate the counts of protein families, whereas protein parts derived from REs may cause nonhomologous genes to be clustered into the same family. The database cleaning was found to be very necessary. For example, the number of protein families in the original Drosophila database was 1,094, but it was reduced to 674 after database cleaning. The selection of the database for study is also of great importance. The protein and related cds database in NCBI for C. elegans was used at the beginning of our study. However, it was found that the database was full of redundant protein sequences, many gene annotations were not consistent with cds sequences, and the translation of many cds were not the same with the corresponding protein sequences. We switched to Wormpep for its better quality.
Paucity of Young Duplicate Genes in Drosophila
It is interesting that the gene duplication rate is very low in Drosophila. This conclusion comes from two lines of evidence: (1) the number of gene pairs with a small KS (KS < 0.01) in Drosophila is much smaller than those in yeast and C. elegans (table 5
), and (2) the number of gene families in Drosophila is only somewhat larger than that in yeast but much smaller than that in C. elegans (table 3
), despite the fact that Drosophila has a genome size and protein number much larger than those in yeast and similar to those in C. elegans. However, this conclusion should be taken with caution because the sequencing strategy for Drosophila was different from the other two organisms. It is possible that the shotgun strategy used to sequence the Drosophila genome could not distinguish between very recently duplicated genes, which have KS and KA equal or close to 0. The fact that we still see some gene pairs with KS = KA = 0 in Drosophila genome and that the number of gene pairs with an intermediate KS is still much smaller in Drosophila than in the other two organisms (table 2
) suggests that the sequencing strategy might not be a major factor. In other words, Drosophila might indeed have had a much lower duplication rate than yeast and C. elegans, at least in the recent past.
It has been shown that block duplication occurs more frequently in yeast and C. elegans than in Drosophila (Friedman and Hughes 2001
). This might largely account for the higher duplication rates in yeast and C. elegans. For example, we investigated the most recent gene duplications in yeast and found that among gene pairs with KS < 0.01, there are at least eight duplication events involving more than one gene pair with KS < 0.01. The largest block has seven embedded gene pairs with KS < 0.01. There are other possible explanations. For example, a higher evolutionary rate for synonymous site, a higher deletion rate for duplicate genes, or a lower gene conversion rate in Drosophila than the other two organisms might lead to the observed results too. Further investigation is necessary for distinguishing among these possibilities.
Our estimates of gene pairs with KS < 0.01 in the three genomes are different from those of Lynch and Conery (2000)
(table 5 ). Lynch and Conery (2000)
found 10 pairs with KS < 0.01 in Drosophila but only six pairs are indicated in their current website. Two of these six pairs are the same as ours. In each of the remaining four pairs in our result, either one or both genes do not exist in Release 1 of the fly database from NCBI, which was used by Lynch and Conery (2000)
. Among the remaining four pairs in their result, there are two pairs for each of which either one or both genes do not exist in Release 2 of the fly database from NCBI, and for each of the two other pairs, the two proteins have very different lengths: 69 a.a. and 220 a.a. for one pair and 210 a.a. and 580 a.a. for the other. For yeast, if we count only the gene pairs in small gene families (size <6), as was done in Lynch and Conery (2000)
, there are 29 pairs with KS < 0.01 in our result (32 pairs before excluding gene pairs with both genetic distances in intron and flanking regions >0.02). Lynch and Conery found 32 pairs with KS < 0.01. For C. elegans we used WormPep40 from the Sanger Centre instead of the database from NCBI. There are 76 and 77 pairs with KS < 0.01 in small (size <6) and large (size >5) gene families, respectively. After excluding gene pairs with both genetic distances in intron and flanking regions >0.02, these numbers became 73 and 74, respectively. In comparison, there were 164 pairs with KS < 0.01 in small gene families (size <6) in Lynch and Conery (2000)
. Although we used gene pairs with KS < 0.01 in all gene families to estimate the gene duplication rates, our estimates are lower than their results for Drosophila and C. elegans and similar for yeast if we exclude the putative helicase protein family.
Common Protein Families in the Three Organisms
It has been estimated that about 550 functional chemoreceptors (olfactory receptors) are scattered in the C. elegans genome (Robertson 1998
). These proteins help nematodes detect different kinds of chemicals. Chemoreceptors can be divided into different protein families, among which srh, str, stl, and srd are large ones. The srh gene family was estimated to have 214 members (Robertson 2000
), most of which fall into our second largest family in C. elegans, which has 181 members. Two closely related protein families, the str and stl (str-like protein) protein families, constitute our largest gene family which has 242 members (the sum of proteins in these two families was estimated to be 240, Robertson 1998
). The three other most common gene families in nematode (table 4
) are all hypothetical protein families. Their function and phylogenetic relationship need to be investigated in the future. It is interesting to note that among the five largest gene families in yeast there are two (seripauperins and putative helicases) located at the end of the chromosomes. Seripauperin genes encode serine-poor relatives of serine-rich proteins. Members in the putative helicase gene family are very similar to each other. Transcriptions of these genes are not detected under normal culture conditions (Viswanathan et al. 1994
; Yamada et al. 1998
). However, expression of putative helicases can be detected under some stress conditions (Yamada et al. 1998
). Another common gene family in yeast is also a stress response gene family: heat shock proteins. Most of the proteins within this family are cell stress chaperones. The remaining two of the five largest protein families are both membrane proteins. Although yeast has a gene duplication rate as high as that in C. elegans and much higher than that in Drosophila, there are no protein families in yeast that are as large as those in the other two organisms. Like olfactory receptors in C. elegans, cuticle proteins represent two out of the five largest protein families in Drosophila. The P450 proteins have been divided into two protein families (Tijet, Helvig, and Feyereisen 2001
), one of which is among the five largest protein families in Drosophila (table 4
), and the other is among the 10 most common protein families in flies (data not shown). The other two largest protein families in Drosophila are trypsin and GTP-binding proteins, which are involved in metabolic and regulatory pathways.
Nonneutral Evolution at Synonymous Sites in Yeast
As codon usage is strongly biased in many yeast genes (Ikemura 1982
), we now consider whether codon usage bias can affect the estimation of KS in yeast duplicate genes. We used gene pairs within the same duplicate block to investigate the relationship between KS and codon usage bias, which is negatively correlated with the ENC in a gene. In principle, gene pairs within a block should have the same age, but we noted that those pairs with a small ENC value (implying strong codon usage bias) tended to have a small KS value (data not shown). A similar pattern was found by Friedman and Hughes (2001)
. These results suggest that synonymous sites in genes with strong codon usage bias are under selective constraints. We now compare the KS values with the numbers of substitutions per site (K) in both the upstream- and downstream-flanking 150-bp regions. We consider only gene pairs with KS < 0.25 in small gene families (size <6) because it is easier to obtain reliable KS and K when they are not large and because if the family size is larger than six, there may be too many nonindependent pairs. The result is shown in figure 1
. For all gene pairs with average ENC
32, the KS values are much smaller than the average K values in the flanking regions, whereas for the majority of gene pairs with average ENC
32, the KS values are similar to the average K values in the flanking regions. This suggests that for a duplicate gene pair with strong codon usage bias, the KS value does not increase linearly with the age of the gene duplication. In the above study of recent gene duplication rates, we considered only gene pairs with KS < 0.01 and excluded those gene pairs whose genetic distances in intron and flanking regions were >0.02. This procedure reduced the effect of codon usage bias on KS (table 5
) and should give a more accurate estimate of the rate of gene duplication.
|
| Acknowledgements |
|---|
|
|
|---|
This work was supported by NIH grants GM30998, GM55759, and HD38287. A.C. was supported by CAPES-Brasilia/Brazil. We thank A. Nekrutenko, H. Wang, K. Thornton, and E. Stahl for discussion. The comments of the two reviewers and the computer support from R. Blocker are greatly appreciated.
| Footnotes |
|---|
Saitou Naruya, Reviewing Editor
Keywords: gene families
gene duplication rate
database cleaning
codon usage bias ![]()
Address for correspondence and reprints: Wen-Hsiung Li, Department of Ecology and Evolution, University of Chicago, 1101 East 57th street, Chicago, Illinois 60637. whli{at}uchicago.edu
. ![]()
| References |
|---|
|
|
|---|
Akashi H., R. M. Kliman, A. Eyre-Walker, 1998 Mutation pressure, natural selection, and the evolution of base composition in Drosophila Genetica 102/103:49-60
The Arabidopsis Genome Initiative, 2000 Analysis of the genome sequence of the flowering plant Arabidopsis thaliana Nature 408:796-815.[Medline]
Brosius J., 1999 Genome were forged by massive bombardments with retroelements and retrosequences Genetica 107:209-238[Web of Science][Medline]
Doolittle R. F., 1986 Of URFs and ORFs: a primer on how to analyze derived amino acid sequences University Science Book, Mill Valley, Calif.
. 1995 The multiplicity of domains in protein Annu. Rev. Biochem 64:287-314[Web of Science][Medline]
Dunn K. A., J. P. Bielawski, Z. Yang, 2001 Substitution rates in Drosophila nuclear genes: implications for translational selection Genetics 157:295-305
Friedman R., A. L. Hughes, 2001 Gene duplication and the structure of eukaryotic genome Genome Res 11:373-381
Ikemura T., 1982 Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes. Differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting transfer RNAs J. Mol. Biol 158:573-597[Web of Science][Medline]
Kimura M., 1980 A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences J. Mol. Evol 16:111-120[Web of Science][Medline]
Krogh A., M. Brown, I. S. Mianm, K. Sjolander, D. Haussler, 1994 Hidden markov models in computational biology: applications to protein modeling J. Mol. Biol 235:1501-1531[Web of Science][Medline]
Li W.-H., 1997 Molecular evolution Sinauer, Sunderland, Mass
Lynch M., J. S. Conery, 2000 The evolutionary fate and consequences of duplicate genes Science 290:1151-1155
Makalowski W., 2000 Genome scrap yard: how genomes utilize all that junk Gene 259:61-67[Web of Science][Medline]
Moriyama E. N., D. L. Hartl, 1993 Codon usage bias and base composition of nuclear genes in Drosophila Genetics 134:847-858[Abstract]
Nekrutenko A., W.-H. Li, 2001 Transposable elements are found in a large number of human protein coding regions Trends Genet 17:619-621[Web of Science][Medline]
Ohno S., 1970 Evolution by gene duplication Springer-Verlag, Berlin
Robertson H. M., 1998 Two large families of chemoreceptor genes in the nematodes Caenorhabditis elegans and Caenorhabditis briggsae reveal extensive gene duplication, diversification, movement, and intron loss Genome Res 8:449-463
. 2000 The large srh family of chemoreceptor genes in Caenorhabditis nematodes reveals processes of genome evolution involving large duplications and deletions and intron gains and losses Genome Res 10:192-203
Rost B., 1999 Twilight zone for protein sequences alignments Protein Eng 12:85-94
Rubin G. M., M. D. Yandell, J. R. Wortman, (54 co-authors) 2000 Comparative genomics of the eukaryotes Science 287:2204-2215
Seoighe C., K. H. Wolfe, 1999 Updated map of duplicated regions in the yeast genome Gene 238:253-261[Web of Science][Medline]
Sharp P. M., W.-H. Li, 1989 On the rate of DNA sequence evolution in Drosophila J. Mol. Evol 28:398-402[Web of Science][Medline]
Sonnhammer E. L. L., S. R. Eddy, R. Durbin, 1997 Pfam: a comprehensive database of protein domain families based on seed alignments Proteins 28:405-420[Web of Science][Medline]
Tamura K., M. Nei, 1993 Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees Mol. Biol. Evol 10:512-526[Abstract]
Tijet N., C. Helvig, R. Feyereisen, 2001 The cytochrome P450 gene superfamily in Drosophila melanogaster: annotation, intron-exon organization and phylogeny Gene 262:189-198[Web of Science][Medline]
Viswanathan M., G. Muthukumar, Y. S. Cong, J. Lenard, 1994 Seripauperins of Saccharomyces cerevisiae: a new multigene family encoding serine-poor relatives of serine-rich proteins Gene 148:149-153[Web of Science][Medline]
Wolfe K. H., D. C. Shields, 1997 Molecular evidence for an ancient duplication of the entire yeast genome Nature 387:708-713[Medline]
Yamada M., N. Hayatsu, A. Matsuura, F. Ishikawa, 1998 Y'-Help1, a DNA helicase encoded by the yeast subtelomeric Y' element, is induced in survivors defective for telomerase J. Biol. Chem 255:335-345
Yang Z., R. Nielsen, 2000 Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models Mol. Biol. Evol 17:32-43
![]()
CiteULike
Connotea
Del.icio.us What's this?
This article has been cited by other articles:
![]() |
A. D. Cutter, A. Dey, and R. L. Murray Evolution of the Caenorhabditis elegans Genome Mol. Biol. Evol., June 1, 2009; 26(6): 1199 - 1234. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Barriere, S.-P. Yang, E. Pekarek, C. G. Thomas, E. S. Haag, and I. Ruvinsky Detecting heterozygosity in shotgun genome assemblies: Lessons from obligately outcrossing nematodes Genome Res., March 1, 2009; 19(3): 470 - 480. [Abstract] [Full Text] [PDF] |
||||
![]() |
G. J. Binford, M. R. Bodner, M. H.J. Cordes, K. L. Baldwin, M. R. Rynerson, S. N. Burns, and P. A. Zobel-Thropp Molecular Evolution, Functional Variation, and Proposed Nomenclature of the Gene Family That Includes Sphingomyelinase D in Sicariid Spider Venoms Mol. Biol. Evol., March 1, 2009; 26(3): 547 - 566. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. Osada, S. Mano, and J. Gojobori Quantifying dominance and deleterious effect on human disease genes PNAS, January 20, 2009; 106(3): 841 - 846. [Abstract] [Full Text] [PDF] |
||||
![]() |
Q. Zhou, G. Zhang, Y. Zhang, S. Xu, R. Zhao, Z. Zhan, X. Li, Y. Ding, S. Yang, and W. Wang On the origin of new genes in Drosophila Genome Res., September 1, 2008; 18(9): 1446 - 1455. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Prachumwat and W.-H. Li Gene number expansion and contraction in vertebrate genomes with respect to invertebrate genomes Genome Res., February 1, 2008; 18(2): 221 - 232. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Irimia, J. L. Rukov, D. Penny, J. Garcia-Fernandez, J. Vinther, and S. W. Roy Widespread Evolutionary Conservation of Alternatively Spliced Exons in Caenorhabditis Mol. Biol. Evol., February 1, 2008; 25(2): 375 - 382. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Matsuya, R. Sakate, Y. Kawahara, K. O. Koyanagi, Y. Sato, Y. Fujii, C. Yamasaki, T. Habara, H. Nakaoka, F. Todokoro, et al. Evola: Ortholog database of all human genes in H-InvDB with manual curation of phylogenetic trees Nucleic Acids Res., January 11, 2008; 36(suppl_1): D787 - D792. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. R. Thornton The Neutral Coalescent Process for Recent Gene Duplications and Copy-Number Variants Genetics, October 1, 2007; 177(2): 987 - 1000. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Freeling, L. Rapaka, E. Lyons, B. Pedersen, and B. C. Thomas G-Boxes, Bigfoot Genes, and Environmental Response: Characterization of Intragenomic Conserved Noncoding Sequences in Arabidopsis PLANT CELL, May 1, 2007; 19(5): 1441 - 1457. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. G Bragg and A. Wagner Protein carbon content evolves in response to carbon availability and may influence the fate of duplicated genes Proc R Soc B, April 22, 2007; 274(1613): 1063 - 1070. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. Labbe, A. Berthomieu, C. Berticat, H. Alout, M. Raymond, T. Lenormand, and M. Weill Independent Duplications of the Acetylcholinesterase Gene Conferring Insecticide Resistance in the Mosquito Culex pipiens Mol. Biol. Evol., April 1, 2007; 24(4): 1056 - 1067. [Abstract] [Full Text] [PDF] |
||||
![]() |
B. C. Thomas, L. Rapaka, E. Lyons, B. Pedersen, and M. Freeling Arabidopsis intragenomic conserved noncoding sequence PNAS, February 27, 2007; 104(9): 3348 - 3353. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Guo, R. E. Weiss, X. Gu, and M. A. Suchard Time Squared: Repeated Measures on Phylogenies Mol. Biol. Evol., February 1, 2007; 24(2): 352 - 362. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. Li, Y. Huang, X. Xia, and Z. Sun Preferential Duplication in the Sparse Part of Yeast Protein Interaction Network Mol. Biol. Evol., December 1, 2006; 23(12): 2467 - 2473. [Abstract] [Full Text] [PDF] |
||||
![]() |
B. T. Naughton, E. Fratkin, S. Batzoglou, and D. L. Brutlag A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites Nucleic Acids Res., November 6, 2006; 34(20): 5730 - 5739. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. Wang, K. Chong, and T. Wang Divergence in spatial expression patterns and in response to stimuli of tandem-repeat paralogues encoding a novel class of proline-rich proteins in Oryza sativa J. Exp. Bot., August 1, 2006; 57(11): 2887 - 2897. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. H. Oakley, B. Ostman, and A. C. V. Wilson Repression and loss of gene expression outpaces activation and gain in recently duplicated fly genes PNAS, August 1, 2006; 103(31): 11637 - 11641. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. A. Gimelbrant and A. Chess An epigenetic state associated with areas of gene duplication Genome Res., June 1, 2006; 16(6): 723 - 729. [Abstract] [Full Text] [PDF] |
||||
![]() |
Y. Louzoun, L. Muchnik, and S. Solomon Copying nodes versus editing links: the source of the difference between genetic regulatory networks and the WWW Bioinformatics, March 1, 2006; 22(5): 581 - 588. [Abstract] [Full Text] [PDF] |
||||
![]() |
Z. Su, J. Wang, J. Yu, X. Huang, and X. Gu Evolution of alternative splicing after gene duplication Genome Res., February 1, 2006; 16(2): 182 - 189. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Yang, A. I. Su, and W.-H. Li Gene Expression Evolves Faster in Narrowly Than in Broadly Expressed Mammalian Genes Mol. Biol. Evol., October 1, 2005; 22(10): 2113 - 2118. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. L. Mueller, K. R. Ram, L. A. McGraw, M. C. Bloch Qazi, E. D. Siggia, A. G. Clark, C. F. Aquadro, and M. F. Wolfner Cross-Species Comparison of Drosophila Male Accessory Gland Protein Genes Genetics, September 1, 2005; 171(1): 131 - 143. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. Phadnis and J. D. Fry Widespread Correlations Between Dominance and Homozygous Effects of Mutations: Implications for Theories of Dominance Genetics, September 1, 2005; 171(1): 385 - 392. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Wagner Energy Constraints on the Evolution of Gene Expression Mol. Biol. Evol., June 1, 2005; 22(6): 1365 - 1374. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. H. Oakley, Z. Gu, E. Abouheif, N. H. Patel, and W.-H. Li Comparative Methods for the Analysis of Gene-Expression Evolution: An Example Using Yeast Functional Genomic Data Mol. Biol. Evol., January 1, 2005; 22(1): 40 - 50. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. L. Wong, L. V. Zhang, A. H. Y. Tong, Z. Li, D. S. Goldberg, O. D. King, G. Lesage, M. Vidal, B. Andrews, H. Bussey, et al. Combining biological networks to predict genetic interactions PNAS, November 2, 2004; 101(44): 15682 - 15687. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. J. Clauss and T. Mitchell-Olds Functional Divergence in Tandemly Duplicated Arabidopsis thaliana Trypsin Inhibitor Genes Genetics, March 1, 2004; 166(3): 1419 - 1436. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. Cobbe and M. M. S. Heck The Evolution of SMC Proteins: Phylogenetic Analysis and Structural Implications Mol. Biol. Evol., February 1, 2004; 21(2): 332 - 347. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. D. Makova and W.-H. Li Divergence in the Spatial Pattern of Gene Expression Between Human Duplicate Genes Genome Res., July 1, 2003; 13(7): 1638 - 1645. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. Betran and M. Long Dntf-2r, a Young Drosophila Retroposed Gene With Specific Male Expression Under Positive Darwinian Selection Genetics, July 1, 2003; 164(3): 977 - 988. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Yang, Z. Gu, and W.-H. Li Rate of Protein Evolution Versus Fitness Effect of Gene Deletion Mol. Biol. Evol., May 1, 2003; 20(5): 772 - 774. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Kaessmann, S. Zollner, A. Nekrutenko, and W.-H. Li Signatures of Domain Shuffling in the Human Genome Genome Res., November 1, 2002; 12(11): 1642 - 1650. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Ranson, C. Claudianos, F. Ortelli, C. Abgrall, J. Hemingway, M. V. Sharakhova, M. F. Unger, F. H. Collins, and R. Feyereisen Evolution of Supergene Families Associated with Insecticide Resistance Science, October 4, 2002; 298(5591): 179 - 181. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||










