MBE Advance Access originally published online on August 3, 2006
Molecular Biology and Evolution 2006 23(11):2072-2080; doi:10.1093/molbev/msl076
Research Articles
Impacts of Gene Essentiality, Expression Pattern, and Gene Compactness on the Evolutionary Rate of Mammalian Proteins

* Department of Ecology and Evolutionary Biology, University of Michigan
Department of Human Genetics, University of Michigan
E-mail: jianzhi{at}umich.edu.
| Abstract |
|---|
Understanding the determinants of the rate of protein sequence evolution is of fundamental importance in evolutionary biology. Many recent studies have focused on the yeast because of the availability of many genome-wide expressional and functional data. Yeast studies revealed a predominant role of gene expression level and a minor role of gene essentiality in determining the rate of protein sequence evolution. Whether these rules apply to complex organisms such as mammals is unclear. Here we assemble a list of 1,138 essential and 2,341 nonessential mouse genes based on targeted gene deletion experiments and report a significant impact of gene essentiality on the rate of mammalian protein evolution. Gene expression level has virtually no effect, although tissue specificity in expression pattern has a strong influence. Unexpectedly, gene compactness, measured by average intron size and untranslated region length, has the greatest influence. Hence, the relative importance of the various factors in determining the rate of mammalian protein evolution is gene compactness > gene essentiality ≈ tissue specificity > expression level. Our results suggest a considerable variation in rate determinants between unicellular organisms such as the yeast and multicellular organisms such as mammals.
Key Words: evolutionary rate; gene essentiality; tissue specificity; expression level; gene compactness; mammals
| Introduction |
|---|
What determines the rate of protein sequence evolution is a fundamental question in molecular evolution. It is well known that the evolutionary rates of different proteins in a genome vary by several orders of magnitude (Dayhoff 1972
Among the potential rate determinants, gene essentiality is perhaps the most studied and debated factor. Essential genes refer to those that cause lethality or infertility when deleted. Based on the neutral theory of molecular evolution (Kimura and Ohta 1974), it was predicted that essential genes are subject to stronger selective constraints and, therefore, evolve more slowly than nonessential genes (Wilson et al. 1977). However, Hurst and Smith (1999) failed to verify this prediction when they compared 67 essential genes and 108 nonessential genes of the mouse. Although subsequent analysis of bacterial and yeast genes found gene essentiality to be an important rate determinant (Hirsh and Fraser 2001; Jordan et al. 2002), these results were suggested to arise from a confounding factor of the gene expression level (Pal et al. 2003; Rocha and Danchin 2004). More recent analyses, however, showed that gene essentiality has a small, yet statistically significant, impact on the evolutionary rate of yeast proteins even when the gene expression level is controlled for (Wall et al. 2005; Zhang and He 2005). Nonetheless, despite the availability of many mouse strains produced in targeted gene deletion experiments, whether gene essentiality influences mammalian protein evolution remains unsolved due to the lack of a comprehensive list of essential and nonessential genes.
The importance of gene expression level in determining the protein evolutionary rate in yeasts and bacteria is well established (Pal et al. 2001a; Rocha and Danchin 2004; Zhang and He 2005; Drummond et al. 2006), although the molecular evolutionary mechanisms are unclear and debated (Akashi 2003; Drummond et al. 2005). Unlike unicellular organisms, mammalian cells are highly differentiated, and different types of cells turn on different sets of genes to maintain their identities and functions. Hence, both the expression level and tissue specificity of expression may be important in determining the rate of mammalian gene evolution. In fact, previous studies of mammalian genes showed higher evolutionary rates among lowly expressed genes than highly expressed genes (Subramanian and Kumar 2004) and higher rates among tissue-specific genes than housekeeping genes (Duret and Mouchiroud 2000; Winter et al. 2004; Zhang and Li 2004). However, because housekeeping genes tend to be highly expressed (Vinogradov 2004; Liao and Zhang 2006a), it is unknown whether expression level and tissue specificity have independent influences on the evolutionary rate.
A previous study of 363 mouse and rat genes showed a significant, but weak, negative correlation between protein length and the rate of protein sequence evolution (Zhang 2001). An opposite pattern, however, was found in the fruit fly (Lemos et al. 2005). Recent studies also showed that highly expressed genes tend to code for short proteins and have short introns (Castillo-Davis et al. 2002). Because highly expressed genes tend to have low rates of protein evolution, one would expect a positive correlation between protein (or intron) length and the rate of protein evolution. It is interesting to test this prediction.
In the present study, we first compile a list of 3,479 mouse genes with essentiality information derived from targeted gene deletion data. We then study the influences of gene essentiality, gene expression level, tissue specificity, and gene compactness (in terms of protein length, average intron length, and untranslated region (UTR) length) on the rate of mammalian protein evolution. We conduct a series of partial correlation analyses to disentangle the contributions of various factors and compare our results with findings from the yeast. Our results reveal a great variation in rate determinants between unicellular and multicellular organisms.
| Materials and Methods |
|---|
Mouse Essential and Nonessential Genes
Mouse genes subject to targeted deletion experiments were downloaded from Mouse Genome Database (MGD) (http://www.informatics.jax.org/). Only those genes having one corresponding Ensembl gene name were kept for subsequent analysis. These genes were classified into essential and nonessential genes based on their knockout phenotypic codes (MP numbers) provided by MGD. By definition, essential genes are those with the knockout phenotype of lethality or sterility. That is, those entries possessing embryonic lethality (MP: 0002080), prenatal lethality (MP: 0002081), survival postnatal lethality (MP: 0002082), premature death or induced morbidity (MP: 0002083), or reproductive system phenotype (MP: 0002161) were grouped as essential genes. All other genes associated with a phenotypic classification term, including those entries with a normal phenotype, were grouped as nonessential genes. The primary data set included 1,138 essential and 2,341 nonessential mouse genes.
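The classification rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the function name `classify_gene`, the record layout, the gene names, and the placeholder term `MP:9999999` are hypothetical, and treating a gene with no phenotypic term as unclassified is an assumption.

```python
# MP codes listed in the text as marking a lethal or sterile knockout phenotype.
ESSENTIAL_MP_TERMS = {
    "MP:0002080",  # embryonic lethality
    "MP:0002081",  # prenatal lethality
    "MP:0002082",  # survival postnatal lethality
    "MP:0002083",  # premature death or induced morbidity
    "MP:0002161",  # reproductive system phenotype
}


def classify_gene(mp_terms):
    """Return 'essential', 'nonessential', or None for a gene's knockout MP terms.

    None means the gene carries no phenotypic classification term and is
    left out of the data set (an assumption about how such records are handled).
    """
    terms = {t.replace(" ", "") for t in mp_terms}  # tolerate "MP: 0002080"
    if not terms:
        return None
    return "essential" if terms & ESSENTIAL_MP_TERMS else "nonessential"


# Hypothetical records: gene name -> MP terms annotated for its knockout.
# "MP:9999999" is a placeholder standing in for any non-lethal, non-sterile term.
knockouts = {
    "geneA": ["MP:0002080", "MP:9999999"],
    "geneB": ["MP:9999999"],
    "geneC": [],
}
labels = {gene: classify_gene(terms) for gene, terms in knockouts.items()}
print(labels)
```

A gene counts as essential as soon as any one of its annotated terms falls in the lethal or sterile set, which mirrors the "those entries possessing ..." wording of the definition.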
Gene Orthology and Evolutionary Rate
The homology information of mouse and rat genes was obtained from Ensembl EnsMart (http://www.ensembl.org/Multi/martview). There were several annotated homology relationships between mouse and rat genes by Ensembl. We only considered those pairs of genes annotated as UBRH (Unique Best Reciprocal Hit, meaning that they were unique reciprocal best hits in all-against-all BlastZ searches) to be orthologous. The number of synonymous substitutions per synonymous site (dS) and the number of nonsynonymous substitutions per nonsynonymous site (dN) between mouse and rat orthologs were estimated by the maximum likelihood method of Yang (1997) and retrieved from Ensembl EnsMart.
Structural and Functional Annotations of Mouse Genes
The structural and functional annotations of mouse genes were obtained from Ensembl version 31. Chromosomal positions, coding sequence (CDS) lengths, intron numbers, intron lengths, and 5' and 3' UTR lengths of mouse genes were retrieved from Ensembl EnsMart (http://www.ensembl.org/Multi/martview) (Kasprzyk et al. 2004). For alternatively spliced genes, we chose structural information of the splice form with the longest CDS. Genes having immune-related functions were identified from the Gene Ontology description (http://www.geneontology.org/) contained in Ensembl database. It should be noted that not all mouse genes in the preliminary data set have rat orthologs. After removing mouse genes without UBRH rat orthologs, 1,038 essential and 2,126 nonessential mouse genes were kept for subsequent analysis.
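As a rough illustration of how the structural measures could be derived per gene, the sketch below picks the splice form with the longest CDS and summarizes its compactness. The record fields (`cds_length`, `intron_lengths`, `utr5_length`, `utr3_length`), the function names, and the example values are hypothetical and do not reflect the EnsMart output format; returning `None` as the average intron size of an intronless transcript is likewise an assumption.

```python
def pick_longest_cds(transcripts):
    """Return the splice form (transcript record) with the longest CDS."""
    return max(transcripts, key=lambda t: t["cds_length"])


def compactness(transcript):
    """Summarize gene compactness from one transcript record (lengths in nt)."""
    introns = transcript["intron_lengths"]
    # Average size per intron; undefined (None) when the transcript has no intron.
    avg_intron = sum(introns) / len(introns) if introns else None
    return {
        "intron_number": len(introns),
        "avg_intron_size": avg_intron,
        "utr_length": transcript["utr5_length"] + transcript["utr3_length"],
    }


# Hypothetical gene with two splice forms.
gene_transcripts = [
    {"cds_length": 900, "intron_lengths": [1200, 800],
     "utr5_length": 150, "utr3_length": 600},
    {"cds_length": 1500, "intron_lengths": [1200, 800, 4000],
     "utr5_length": 200, "utr3_length": 1000},
]
chosen = pick_longest_cds(gene_transcripts)
summary = compactness(chosen)
print(summary)
```

Keeping intron number and average intron size as separate fields makes it straightforward to test each against dN on its own.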
The gene structure annotation of the yeast S. cerevisiae was also obtained from Ensembl EnsMart. Nucleotide substitution rates between S. cerevisiae and Saccharomyces bayanus orthologous genes were obtained from Zhang and He (2005).
Analysis of Gene Expression Pattern
The spatial expression information of mouse genes was obtained from the Gene Atlas V2 data set (http://symatlas.gnf.org/SymAtlas/). This data set was generated by hybridization of RNAs from 61 mouse tissues onto Affymetrix microarray chips (GNF1M) (Su et al. 2004). To assign expression data from probe sets to corresponding Ensembl mouse genes, we aligned probe sequences of each probe set to the Ensembl cDNA sequences (Mus_musculus.NCBIM33.feb.cdna.fa; http://www.ensembl.org/info/data/download.html) using BlastN (http://www.ncbi.nlm.nih.gov/blast/). Only those probe sets in which all 10 matching probes perfectly matched with the same Ensembl gene were considered to be valid. The expression level detected by each probe set was obtained as the signal intensity (S) computed from MAS 5.0 algorithm (MAS5) (Hubbell et al. 2002). The S values were averaged among replicates.
In the present study, we measured 2 properties of the mouse gene expression pattern: expression level (ExpLev) and tissue specificity (τ). ExpLev is defined as the average signal intensity (S) of a mouse gene across 61 examined tissues. The tissue specificity of a gene is defined as the heterogeneity of its expression level across all the tissues and is estimated by
$$\tau = \frac{\sum_{j=1}^{n} \left[ 1 - \log_2 S(j) / \log_2 S_{\max} \right]}{n - 1},$$
where n = 61 is the number of mouse tissues examined here, S(j) is the expression signal of the gene in tissue j, and Smax is the highest expression signal of the gene across all tissues (Yanai et al. 2005). To minimize the influence of noise from low intensities, we arbitrarily let S(j) be 100 if it is lower than 100 (Liao and Zhang 2006a). The τ value ranges from 0 to 1, with higher values indicating greater variations in expressional level across tissues and thus higher tissue specificity. The advantage of using τ rather than expression breadth, which requires an arbitrary cutoff to determine whether a gene is expressed in a given tissue, has been extensively discussed (Liao and Zhang 2006a). Some mouse genes are represented by more than one probe set on the microarray. Because it was not possible to tell which probe set provides the best expression measure of a target gene (Liao and Zhang 2006b), we computed ExpLev and τ by averaging the values derived from the different probe sets of the same gene. The final data set used in partial correlation analyses contained 2,575 mouse genes with knockout phenotypes, orthologous rat genes, and structural and expression data. Among them, 852 were essential and 1,723 were nonessential.
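The two expression measures can be sketched as follows. This is a minimal illustration, assuming the log2-scaled form of the tissue specificity index of Yanai et al. (2005), that is, the sum over tissues of 1 − log2 S(j)/log2 Smax divided by n − 1, with signals below 100 raised to 100. Whether the floor also applies when computing ExpLev is not stated in the text; the sketch averages the raw signals. The function names and example profiles are hypothetical.

```python
import math

FLOOR = 100.0  # signals below 100 are raised to 100 to damp low-intensity noise


def exp_lev(signals):
    """ExpLev: average signal intensity S across tissues (raw signals assumed)."""
    return sum(signals) / len(signals)


def tau(signals, floor=FLOOR):
    """Tissue specificity index; 0 = uniform expression, larger = more specific."""
    s = [max(x, floor) for x in signals]
    log_max = math.log2(max(s))  # > 0 because every signal is at least 100
    return sum(1.0 - math.log2(x) / log_max for x in s) / (len(s) - 1)


def gene_level(values_per_probe_set):
    """Average a measure over the probe sets that map to the same gene."""
    return sum(values_per_probe_set) / len(values_per_probe_set)


# Hypothetical profiles over n = 61 tissues.
housekeeping = [500.0] * 61            # same signal everywhere
specific = [50.0] * 60 + [6400.0]      # high in one tissue, background elsewhere

print(tau(housekeeping), tau(specific))
print(gene_level([tau(specific), tau(housekeeping)]))
```

Under this form, a gene expressed in a single tissue does not reach exactly 1, because the floor of 100 keeps every ratio of log signals above zero.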
| Results |
|---|
Nonessential Proteins Evolve Faster than Essential Proteins
We compiled a list of 1,138 essential and 2,341 nonessential genes using mouse targeted gene deletion data. Among them, 1,038 essential and 2,126 nonessential genes have orthologous genes in the rat. The number of synonymous substitutions per synonymous site (dS) and the number of nonsynonymous substitutions per nonsynonymous site (dN) were estimated for these genes using mouse and rat orthologs. We found a significant difference between essential and nonessential genes in dN (P < 10⁻¹⁴, Mann–Whitney U test; fig. 1a). On average, dN is 30% greater for nonessential genes than essential genes. We noticed that X-linked genes and immune system genes are slightly overrepresented in the nonessential group (3.3% and 7.0%), compared with the essential group (2.3% and 3.1%). Because X-linked mammalian genes may behave differently from autosomal genes due to differences in gene content, mutation rate, and selection intensity (Wang et al. 2001
3%) (fig. 1b). The average dN/dS ratio of nonessential genes is 24–34% greater than that of essential genes, depending on whether X-linked genes and immune system genes are considered or not (fig. 1c). Thus, there is a significantly negative correlation between gene essentiality and dN/dS (or dN) (table 1). These results indicate that gene essentiality affects the rate of mammalian protein evolution by influencing the selective constraint on the proteins.
Effects of Gene Expression Level and Tissue Specificity on the Rate of Protein Evolution
Two gene expression properties, expression level (Pal et al. 2001a
= 0.04) and only marginally significant (P = 0.041) in mammals. Similar results are obtained when essential and nonessential genes are analyzed separately. By contrast, the correlation between tissue specificity (τ) and dN is much stronger (ρ = 0.168, P < 10⁻¹⁶). We noticed that tissue-specific genes not only have greater dN/dS but also greater dS values (fig. 2), implying that faster protein evolution of tissue-specific genes may have resulted from both higher mutation rate and lower purifying selection. Because average dS does not exhibit the same magnitude of increase as average dN while τ becomes larger (~17% increase vs. ~90% increase), mutation rate bias is unlikely to be the main cause for high dN of tissue-specific genes. Our result is consistent with that of Zhang and Li (2004)
Because the expression level and tissue specificity may be correlated, we measured the partial correlation between ExpLev and dN by controlling for τ. Although the partial correlation becomes stronger and more significant (ρ = 0.061, P = 0.002), it is still not comparable to the partial correlation between τ and dN when ExpLev is controlled for (ρ = 0.174, P < 10⁻¹⁸). These results suggest that tissue specificity is much more important than average expression level in determining the rate of mammalian protein sequence evolution.
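For readers unfamiliar with partial correlation, the sketch below shows one common way to compute a first-order partial rank correlation between two variables while controlling for a third. It assumes Spearman rank correlations and the standard formula (r_xy − r_xz·r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)); the text does not spell out the exact procedure used, so this should be read as an illustration rather than the authors' implementation. All function names and the toy data are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Rank correlation between x and y after controlling for z."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data (hypothetical): dN, tissue specificity, and expression level for 6 genes.
d_n = [0.02, 0.05, 0.03, 0.09, 0.07, 0.04]
tissue_spec = [0.10, 0.40, 0.30, 0.80, 0.50, 0.20]
expr_level = [900.0, 300.0, 700.0, 150.0, 400.0, 800.0]
print(partial_spearman(tissue_spec, d_n, expr_level))
```

When the controlled variable is uncorrelated with both of the others, the partial correlation reduces to the ordinary rank correlation, which is a convenient sanity check.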
Compact Genes Have High Rates of Evolution
Although a significant positive correlation between the CDS length and dN was observed in fruit fly (Lemos et al. 2005) and a significant negative correlation was observed in a set of 363 mouse and rat genes (Zhang 2001), no significant correlation is found in our data (table 1). Surprisingly, we found a negative correlation between UTR length and dN (or dN/dS) (table 1 and fig. 3). For example, the mean dN of genes with a total UTR length of <300 nt is about twice that of genes with a total UTR length of >2,400 nt (fig. 3a). Similarly, we found a negative correlation between average size per intron in a gene (but not intron number) and dN (or dN/dS) of the gene (table 1 and fig. 4). The mean dN of genes with an average intron size of <1,000 nt is over 5 times that of genes with an average intron size of >8,000 nt (fig. 4a). The correlations between gene compactness and dN are of comparable or even higher magnitudes than that between tissue specificity (τ) and dN (table 1).
In the above analysis, we used the longest splice form for those genes that have alternative splicing. We repeated the above analysis by using the shortest splice form or by removing genes with alternative splicing. The results are essentially the same (supplementary tables 1 and 2, Supplementary Material online). There are also many overlapping (including nested) genes in the mouse genome (Veeramachaneni et al. 2004
Relative Impacts of Gene Essentiality, Expression Pattern, and Gene Compactness on the Evolutionary Rate
The above-examined factors are not completely independent in determining the rate of protein sequence evolution. For instance, genes with high expression levels tend to have small introns (ρ = −0.079, P < 10⁻⁴). In order to separate the contributions of multiple factors, we applied partial correlation analyses. Although a recent study suggested that principal component regression analysis is superior to partial correlation analysis for noisy data (Drummond et al. 2006
tissue specificity > gene expression level.
| Discussion |
|---|
In this work, we used statistical analysis to study the determinants of the rate of mammalian protein sequence evolution. Because there are potentially many rate determinants and because some measures of these determinants (e.g., gene expression level and tissue specificity) have large estimation errors (Wall et al. 2005
Based on an analysis of 175 mouse genes, Hurst and Smith (1999) found no significant correlation between gene essentiality and dN/dS. Zhang and He (2005) suggested that this negative result was likely due to an insufficient sample size. Indeed, when 3,164 mouse genes are analyzed here, essential genes show significantly lower dN/dS than nonessential genes. This difference remains highly significant even when we remove immune system genes and X-linked genes. Furthermore, the correlation between gene essentiality and dN (or dN/dS) is still significant after controlling for gene expression level, tissue specificity, UTR length, and intron length. We conclude that gene essentiality is an independent determinant of the rate of mammalian protein evolution. It is interesting to note that in yeasts, the average dN of nonessential genes is ~40% higher than that of essential genes (Zhang and He 2005), a number slightly greater than that observed for mammalian genes (30%). The rank correlation coefficient between gene essentiality and dN is ~0.2 in yeast, also slightly greater than that in mammals (0.14). After controlling for gene expression level, the correlation coefficient becomes 0.10–0.15 in yeast and 0.13 in mammals. Note that the yeast gene knockout data used by Zhang and He (2005) contained >90% of yeast genes, whereas the mouse gene knockout data used here contained only 15% of mouse genes. Because targeted gene deletion in mouse requires great effort, it is possible that researchers tend to study and report functionally important mouse genes that have human orthologs, thus reducing the variation in essentiality among the genes included in our data set. This reduction could potentially decrease the correlation coefficient between gene essentiality and dN. But, at any rate, gene essentiality and dN are significantly correlated in mammals. Thus, in all organisms so far examined (bacteria, yeasts, nematodes, and mammals), nonessential genes tend to evolve faster than essential genes. It is thus appropriate to conclude that the fundamental prediction of the neutral theory, that less important genes evolve faster than important genes, is universally supported by empirical data at the genomic level. However, it should be pointed out that the correlation between gene essentiality and dN, although statistically significant, is small in magnitude. This weak correlation contrasts with the strong belief of many biologists that functionally important DNA sequences evolve slowly, which is the basis of many successful bioinformatic methods such as Blast (Altschul et al. 1990) and phylogenetic footprinting (Gumucio et al. 1993). It is possible that the knockout phenotype observed in the lab only roughly reflects the amount of fitness reduction in the wild, which is expected to be a better rate determinant.
A previous study showed that human morbid genes (those known to cause diseases when mutated) evolve more slowly than nonmorbid genes (Kondrashov et al. 2004). Their analysis is not equivalent to a comparison between essential and nonessential genes because nonmorbid genes can have unidentifiable embryonic lethal phenotype or infertility phenotype when mutated. In other words, nonmorbid genes include both essential and nonessential genes, and thus, there is no clear prediction as to whether nonmorbid genes should evolve more rapidly or more slowly than morbid genes. In fact, Smith and Eyre-Walker (2003) also analyzed morbid and nonmorbid genes but obtained an opposite result.
We found that the rate of mammalian protein evolution is not, or is only weakly, correlated with the gene expression level, when gene essentiality is controlled for. In the future, it would be important to verify this finding for the entire genome as more gene knockout data become available. If our finding is generally true for mammals, it contrasts with that from the yeast, where the expression level explains about a quarter (ρ² ≈ 0.25) of the variation in dN (Zhang and He 2005). The reduction of the correlation in mammals may be due to smaller population sizes in mammals than in yeasts because the expression level becomes a weaker selective force as the population size decreases (Ohta 1992). However, although the correlations between various rate determinants and protein evolutionary rate in mammals may be weakened by smaller population sizes, the relative importance of these rate determinants should remain unchanged. Why are the influences of gene expression level on dN drastically different between yeast and mammals? To address this question, one has to understand why the gene expression level affects dN in yeast. However, no widely accepted explanation exists at this time. The recently proposed translational robustness hypothesis (Drummond et al. 2005) suggests that highly expressed proteins are prone to forming misfolded protein aggregates that could be toxic or pathogenic to the organism (Ellis and Pinheiro 2002). Thus, their coding regions are under intense selective pressure to maintain certain sequences that avoid misfolding in the presence of translational errors (Drummond et al. 2005). If this hypothesis is correct, our observation of no impact of expression level on dN in mammals may be due to a lowered probability of protein aggregation in mammalian cells. It is known that a misfolded protein may aggregate, particularly when it is in high concentration (Minton 2000). The cell volume of the mouse sperm (61–70 µm³) (Brotherton 1975), the smallest mouse cell, is similar to that of a haploid yeast cell (~70 µm³) (Sherman 1991). Generally speaking, other types of mammalian cells are much larger than the sperm cell (and the yeast cell). If the protein concentration (per gene) in a cell is generally lower in mammals than in yeast, the pressure of avoiding aggregation would also be lower in mammalian cells, making expression level a negligible factor in determining dN. Nonetheless, this explanation is built on 2 assumptions, the translational robustness hypothesis and a lower protein concentration per gene in mammalian cells than in yeast cells, both of which require further scrutiny. An alternative explanation is that the gene expression level of a unicellular organism and the average gene expression level across tissues of a multicellular organism are 2 different things and are not comparable. Interestingly, when using the gene expression level estimated from the mouse expressed sequence tags (ESTs) at an embryonic stage, Subramanian and Kumar (2004) found a significant impact of gene expression level on the rate of protein evolution. Because many genes are not expressed at the embryonic stage, the biological meaning of their observation is not immediately clear. It remains to be seen whether the correlation between gene expression level and protein evolutionary rate exists only among genes having similar functions or expression patterns (as in Subramanian and Kumar's study) but not among genes with diverse properties. Alternatively, the microarray gene expression data used in the present study may be too noisy to accurately reflect mRNA abundance compared with the EST data used by Subramanian and Kumar (2004). But, interestingly, the same microarray data revealed a strong correlation between τ and dN, suggesting that these data still contain a sufficient amount of expression information. We also examined the correlation between the dN of a gene and the maximum expression level of the gene across 61 tissues surveyed. Unexpectedly, a weak positive correlation was observed (ρ = 0.075, P = 1.4 × 10⁻⁴). It is unclear what caused this positive correlation.
A surprising finding of the present study is that compact genes (with short UTRs and introns) tend to evolve fast (figs. 3 and 4). Although the above finding was based on genes with knockout data, essentially the same result was obtained when the entire genome was analyzed (supplementary table 4, Supplementary Material online). Previous studies showed that highly expressed genes have short introns (Castillo-Davis et al. 2002) and evolve slowly (Subramanian and Kumar 2004). Thus, one expects that genes with short introns evolve slowly. But our observation is the opposite. The reason for this unexpected observation is not entirely clear. Of course, in our analysis, gene expression level and dN are virtually uncorrelated, and thus, the prediction that compact genes evolve slowly is invalid. Nevertheless, the observation that compact genes evolve fast is still surprising. Because UTRs and introns are noncoding regions of a gene and the majority of these sequences are more tolerant than coding regions to insertions and deletions, we consider the length variation of these noncoding sequences as a result of variation of local insertion and deletion rates (Vinogradov 2004). That is, we assume that the insertion/deletion rate ratio varies across genomic regions, making some genes more compact than others. It has been proposed that the presence and length of noncoding regions such as introns and intergenic regions can increase the frequency of recombination between adjacent exons and genes (Comeron and Kreitman 2002). Accordingly, for 2 genes with the same functional importance, same CDS length, same number of introns, but different intron sizes, purifying selection is expected to be more efficient for the gene with bigger introns than the one with smaller introns, as the former has a higher recombination rate (per gene) than the latter. This difference results in a lower expected dN for the gene with bigger introns, which is observed in this study. This recombination rate hypothesis also predicts a negative correlation between intron number and dN/dS, which is not observed (table 1). It is likely that for a given gene, intron number is much less changeable by mutation than average intron size and thus does not show the predicted correlation. Of course, recombination rate variation provides just one possible explanation of our observation; other possibilities cannot be excluded. Contrary to mammals, only 263 yeast protein-coding genes (~5%) contain intron(s). Thus, it is expected that gene compactness will not be an important factor in determining yeast protein evolution at the genomic level. However, among 86 intron-containing yeast (S. cerevisiae) genes that have S. bayanus orthologs, the average intron size and dN are negatively correlated (ρ = −0.282, P < 0.01), similar to the result obtained from mammalian genes. It would be interesting to examine whether the influence of gene compactness on protein evolutionary rate is as significant as in mammals for unicellular eukaryotes with high prevalence of introns (e.g., the green algae Chlamydomonas reinhardtii).
In summary, we find that the relative importance of various rate determinants in mammals is gene compactness > gene essentiality ≈ tissue specificity > gene expression level. This order differs substantively from that in yeasts or bacteria. For example, although the absolute magnitudes of the impact of gene essentiality are similar between the yeast and mammals, the relative impacts appear quite different because the gene expression level plays a much greater role in yeast than in mammals. It seems that the rules governing the rate of protein evolution need not be the same for all major clades of living organisms. Our results highlight the danger of applying findings from a single species, even based on a genome-wide analysis, to distantly related species and suggest reexamination of the roles of various rate determinants across a wide range of species, which is becoming feasible with the rapid advance of functional and comparative genomics.
| Supplementary Material |
|---|
Supplementary tables 1–4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
We thank Wendy Grus, Ondrej Podlaha, Peng Shi, and 2 anonymous reviewers for valuable comments. This work was supported by research grants from the University of Michigan and the National Institutes of Health to J.Z.
| Footnotes |
|---|
Koichiro Tamura, Associate Editor
| References |
|---|
Akashi H. (2003) Translational selection and yeast proteome evolution. Genetics 164:1291303.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. (1990) Basic local alignment search tool. J Mol Biol 215:40310.[CrossRef][Web of Science][Medline]
Brotherton J. (1975) The counting and sizing of spermatozoa from ten animal species using a Coulter counter. Andrologia 7:16985.[Web of Science][Medline]
Castillo-Davis CI and Hartl DL. (2003) Conservation, relocation and duplication in genome evolution. Trends Genet 19:5937.[CrossRef][Web of Science][Medline]
Castillo-Davis CI, Mekhedov SL, Hartl DL, Koonin EV, Kondrashov FA. (2002) Selection for short introns in highly expressed genes. Nat Genet 31:4158.[Web of Science][Medline]
Comeron JM and Kreitman M. (2002) Population, evolutionary and genomic consequences of interference selection. Genetics 161:389410.
Dayhoff MO. (1972) Atlas of protein sequence and structure(National Biomedical Research Foundation, Washington DC).
Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH. (2005) Why highly expressed proteins evolve slowly. Proc Natl Acad Sci USA 102:1433843.
Drummond DA, Raval A, Wilke CO. (2006) A single determinant dominates the rate of yeast protein evolution. Mol Biol Evol 23:32737.
Duret L and Mouchiroud D. (2000) Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol Biol Evol 17:6874.
Ellis RJ and Pinheiro TJ. (2002) Medicine: dangermisfolding proteins. Nature 416:4834.[CrossRef][Medline]
Fraser HB. (2005) Modularity and evolutionary constraint on proteins. Nat Genet 37:3512.[CrossRef][Web of Science][Medline]
Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW. (2002) Evolutionary rate in the protein interaction network. Science 296:7502.
Gumucio DL, Shelton DA, Bailey WJ, Slightom JL, Goodman M. (1993) Phylogenetic footprinting reveals unexpected complexity in trans factor binding upstream from the epsilon-globin gene. Proc Natl Acad Sci USA 90:601822.
Hahn MW and Kern AD. (2005) Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. Mol Biol Evol 22:8036.
Hastings KE. (1996) Strong evolutionary conservation of broadly expressed protein isoforms in the troponin I gene family and other vertebrate gene families. J Mol Evol 42:63140.[CrossRef][Web of Science][Medline]
He X and Zhang J. (2006) Toward a molecular understanding of pleiotropy. Genetics Forthcoming.
Hirsh AE and Fraser HB. (2001) Protein dispensability and rate of evolution. Nature 411:10469.[CrossRef][Medline]
Hubbell E, Liu WM, Mei R. (2002) Robust estimators for expression analysis. Bioinformatics 18:158592.
Hughes AL. (1999) Adaptive evolution of genes and genomes. (Oxford University Press, New York).
Hughes AL and Nei M. (1988) Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335:167-70.
Hurst LD and Smith NG. (1999) Do essential genes evolve slowly? Curr Biol 9:747-50.
Jordan IK, Rogozin IB, Wolf YI, Koonin EV. (2002) Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res 12:962-8.
Kasprzyk A, Keefe D, Smedley D, London D, Spooner W, Melsopp C, Hammond M, Rocca-Serra P, Cox T, Birney E. (2004) EnsMart: a generic system for fast and flexible access to biological data. Genome Res 14:160-9.
Kimura M. (1983) The neutral theory of molecular evolution. (Cambridge University Press, Cambridge).
Kimura M and Ohta T. (1974) On some principles governing molecular evolution. Proc Natl Acad Sci USA 71:2848-52.
Kondrashov FA, Ogurtsov AY, Kondrashov AS. (2004) Bioinformatical assay of human gene morbidity. Nucleic Acids Res 32:1731-7.
Lemos B, Bettencourt BR, Meiklejohn CD, Hartl DL. (2005) Evolution of proteins and gene expression levels are coupled in Drosophila and are independently associated with mRNA abundance, protein length, and number of protein-protein interactions. Mol Biol Evol 22:1345-54.
Li WH. (1997) Molecular evolution. (Sinauer Associates, Sunderland).
Liao BY and Zhang J. (2006a) Low rates of expression-profile divergence in highly-expressed genes and tissue-specific genes during mammalian evolution. Mol Biol Evol 23:1119-28.
Liao BY and Zhang J. (2006b) Evolutionary conservation of expression profiles between human and mouse orthologous genes. Mol Biol Evol 23:530-40.
Lu J and Wu CI. (2005) Weak selection revealed by the whole-genome comparison of the X chromosome and autosomes of human and chimpanzee. Proc Natl Acad Sci USA 102:4063-7.
Makino T and Gojobori T. (2006) The evolutionary rate of a protein is influenced by features of the interacting partners. Mol Biol Evol 23:784-9.
Malcom CM, Wyckoff GJ, Lahn BT. (2003) Genic mutation rates in mammals: local similarity, chromosomal heterogeneity, and X-versus-autosome disparity. Mol Biol Evol 20:1633-41.
Minton AP. (2000) Implications of macromolecular crowding for protein assembly. Curr Opin Struct Biol 10:34-9.
Nembaware V, Crum K, Kelso J, Seoighe C. (2002) Impact of the presence of paralogs on sequence divergence in a set of mouse-human orthologs. Genome Res 12:1370-6.
Ohta T. (1992) The nearly neutral theory of molecular evolution. Annu Rev Ecol Syst 23:263-86.
Pal C, Papp B, Hurst LD. (2001a) Highly expressed genes in yeast evolve slowly. Genetics 158:927-31.
Pal C, Papp B, Hurst LD. (2001b) Does the recombination rate affect the efficiency of purifying selection? The yeast genome provides a partial answer. Mol Biol Evol 18:2323-6.
Pal C, Papp B, Hurst LD. (2003) Genomic function: rate of evolution and gene dispensability. Nature 421:496-7.
Rocha EP and Danchin A. (2004) An analysis of determinants of amino acids substitution rates in bacterial proteins. Mol Biol Evol 21:108-16.
Sherman F. (1991) Getting started with yeast. Methods Enzymol 194:3-21.
Smith NG and Eyre-Walker A. (2003) Human disease genes: patterns and predictions. Gene 318:169-75.
Su AI, Wiltshire T, Batalov S, et al. (13 co-authors). (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA 101:6062-7.
Subramanian S and Kumar S. (2004) Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome. Genetics 168:373-81.
Veeramachaneni V, Makalowski W, Galdzicki M, Sood R, Makalowska I. (2004) Mammalian overlapping genes: the comparative perspective. Genome Res 14:280-6.
Vinogradov AE. (2004) Compactness of human housekeeping genes: selection for economy or genomic design? Trends Genet 20:248-53.
Wall DP, Hirsh AE, Fraser HB, Kumm J, Giaever G, Eisen MB, Feldman MW. (2005) Functional genomic analysis of the rates of protein evolution. Proc Natl Acad Sci USA 102:5483-8.
Wang PJ, McCarrey JR, Yang F, Page DC. (2001) An abundance of X-linked genes expressed in spermatogonia. Nat Genet 27:422-6.
Wilson AC, Carlson SS, White TJ. (1977) Biochemical evolution. Annu Rev Biochem 46:573-639.
Winter EE, Goodstadt L, Ponting CP. (2004) Elevated rates of protein secretion, evolution, and disease among tissue-specific genes. Genome Res 14:54-61.
Yanai I, Benjamin H, Shmoish M, et al. (12 co-authors). (2005) Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics 21:650-9.
Yang J, Gu Z, Li WH. (2003) Rate of protein evolution versus fitness effect of gene deletion. Mol Biol Evol 20:772-4.
Yang Z. (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13:555-6.
Zhang J. (2001) Protein-length distributions for the three domains of life. Trends Genet 16:107-9.
Zhang J and He X. (2005) Significant impact of protein dispensability on the instantaneous rate of protein evolution. Mol Biol Evol 22:1147-55.
Zhang L and Li WH. (2004) Mammalian housekeeping genes evolve more slowly than tissue-specific genes. Mol Biol Evol 21:236-9.







