MBE Advance Access originally published online on April 12, 2006
Molecular Biology and Evolution 2006 23(6):1107-1108; doi:10.1093/molbev/msk019
Letter
Average Gene Length Is Highly Conserved in Prokaryotes and Eukaryotes and Diverges Only Between the Two Kingdoms


* Laboratory of Population and Quantitative Genetics, School of Life Sciences and Institute of Biomedical Sciences, Fudan University, Shanghai, China; and
School of Biosciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom
E-mail: z.luo{at}bham.ac.uk.
Abstract
The average length of genes in eukaryotes is greater than in prokaryotes, suggesting that the evolution of complexity is related to changes in gene length. Here, we show that although the average gene lengths of prokaryotes and eukaryotes differ considerably, average gene length is highly conserved within each of the two kingdoms. This suggests that natural selection has set a strong limit on gene elongation within each kingdom. Furthermore, average gene size provides another distinct characteristic for discriminating between the two kingdoms of organisms.
Key Words: average gene length, prokaryote, eukaryote
Background
Gene elongation is recognized as one of the most important steps in the evolution of functional complexities of genes (Li 1997
Results and Discussion
We surveyed nearly all prokaryotic and eukaryotic species whose complete genome sequences were available and well annotated to date. These comprised 81 prokaryotes and 19 eukaryotes for which coding sequence predictions had been validated; they are listed in Tables 1 and 2 of the Supplementary Material online. For each species, the tables give the genome size (N) in kilobase pairs, the number of predicted genes (n), and the proportion of the genome sequence that is coding sequence (r), together with the key reference from which the data were taken. From these parameters, we calculated the mean length of genic coding sequence (MLGCS) for each species as L = N × r/n. For each of the two groups of species, we regressed the total coding sequence length (N × r) on the number of genes (n); the results are shown in figure 1.
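The per-species calculation of L and the regression of total coding length on gene number can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the function names are hypothetical, the genome values are made-up illustrative inputs rather than data from the Supplementary tables, and ordinary least squares with an intercept is assumed because the letter does not state which regression model was fitted. Because N is given in kilobase pairs, it is multiplied by 1,000 so that L comes out in base pairs.

```python
from statistics import mean


def mean_coding_length_bp(genome_kb: float, coding_fraction: float, n_genes: int) -> float:
    """L = N * r / n, converted to base pairs because N is given in kb."""
    return genome_kb * 1000.0 * coding_fraction / n_genes


def ols_slope_intercept(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y on x (with intercept); returns (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical genomes as (N in kb, r, n); illustrative values, not data from the letter.
genomes = [(1000.0, 0.9, 1000), (2000.0, 0.9, 2000), (4000.0, 0.9, 4000)]

per_species_L = [mean_coding_length_bp(N, r, n) for N, r, n in genomes]
total_coding_bp = [N * 1000.0 * r for N, r, _ in genomes]  # N x r, expressed in bp
gene_counts = [n for _, _, n in genomes]

# The slope of total coding length on gene number estimates the average coding length per gene.
slope, intercept = ols_slope_intercept(gene_counts, total_coding_bp)
print(per_species_L, slope, intercept)
```

When total coding length is exactly proportional to gene number, as in these toy inputs, the fitted slope equals the common per-species L (900 bp here) and the intercept is essentially zero; with real genomes the points scatter around the line.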
It can be seen from figure 1 that there is a perfect linear relationship between total coding sequence length and the number of genes in both prokaryotic and eukaryotic genomes. A similarly linear relationship holds between the number of genes and total genome size in prokaryotes, but not in eukaryotes, probably because of the introns, transposable elements, and junk DNA in eukaryotic genomes (data not shown). The mean (standard error), coefficient of skewness, and coefficient of kurtosis of MLGCS were estimated as 924 (9) bp, 0.1952, and 3.3501, respectively, for the prokaryotic group and as 1,346 (28) bp, 0.1723, and 2.5661 for the eukaryotic group. These analyses indicate that genic coding sequence has a relatively constant average length in both prokaryotes and eukaryotes, despite the remarkable variation in coding sequence length among individual genes within each genome. On average, the coding sequence of a eukaryotic gene is about 422 bp longer than that of a prokaryotic gene (1,346 versus 924 bp).
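The group summaries reported above can be computed from a list of per-species MLGCS values with a short Python sketch. The letter does not say which skewness and kurtosis estimators were used; this sketch assumes simple moment-based coefficients with non-excess kurtosis (the convention under which a normal distribution scores about 3), so its output may not match the published values to the last digit. The function names are hypothetical. The last lines compute the difference between the two reported group means and, assuming the two groups are independent, the standard error of that difference.

```python
from math import sqrt


def summarize(lengths: list[float]) -> dict[str, float]:
    """Mean, standard error, and moment-based skewness and kurtosis of per-species lengths.

    Assumed estimators: skewness = m3 / m2**1.5 and kurtosis = m4 / m2**2,
    where mk is the k-th central moment with divisor n. This kurtosis is not
    "excess" kurtosis, so a normal distribution gives a value near 3.
    """
    n = len(lengths)
    mu = sum(lengths) / n
    m2 = sum((x - mu) ** 2 for x in lengths) / n
    m3 = sum((x - mu) ** 3 for x in lengths) / n
    m4 = sum((x - mu) ** 4 for x in lengths) / n
    sample_var = m2 * n / (n - 1)  # divisor n - 1 for the standard error of the mean
    return {
        "mean": mu,
        "se": sqrt(sample_var / n),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


def difference_of_means(mean_a: float, se_a: float, mean_b: float, se_b: float) -> tuple[float, float]:
    """Difference of two group means and its standard error, assuming independent groups."""
    return mean_a - mean_b, sqrt(se_a ** 2 + se_b ** 2)


toy = summarize([1.0, 2.0, 3.0, 4.0, 5.0])  # toy input, not data from the letter
# Group means (standard errors) as reported in the text: eukaryotes 1,346 (28) bp, prokaryotes 924 (9) bp.
diff_bp, diff_se_bp = difference_of_means(1346.0, 28.0, 924.0, 9.0)
print(toy, diff_bp, round(diff_se_bp, 1))
```

Applied to the reported group means, the difference is 422 bp with a standard error of about 29.4 bp under the independence assumption.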
It is widely accepted that, on the one hand, natural selection favors shorter genic coding sequences because they allow more efficient transcription and protein synthesis and reduce the accumulation of deleterious mutations, whereas, on the other hand, evolution seems to improve protein function by elongating coding sequences (Li 1997; Zhang 2000; Akashi 2003; Claverie and Ogata 2003; Wang, Hsieh, and Li 2005). Schneider and Ebert (2004) have recently argued that the covariation between genome size and gene length is expected to be strongest in the smallest genomes and that selection for reduced gene length becomes progressively weaker as genomes become larger. Our observations suggest that there is a stringent structural constraint on the evolution of gene size at the genomic scale: species that diverged billions of years ago, whether within the prokaryotic (Prochlorococcus marinus) or the eukaryotic (Ashbya gossypii) group, share a relatively constant mean gene size. Mean gene size therefore adds another distinct characteristic for discriminating between the two kingdoms of organisms.
Wang, Hsieh, and Li (2005) observed a tendency toward length conservation among orthologous proteins in five eukaryotic genomes; this is perhaps not very surprising, given that the comparison was restricted to proteins that are highly conserved across those species. By considering the total coding sequence of each genome relative to its number of genes, the present study reveals a more general tendency in the evolution of gene length.
Questions may arise about using the coding sequence data and gene numbers predicted for eukaryotic genomes, because current de novo gene prediction from sequence data has various intrinsic limitations (Zhang 2002). The 19 eukaryotic genomes in this study, selected from 35 candidates, have gene annotations validated either by genome-wide comparison with cDNA and/or expressed sequence tag data or by comparing the gene predictions of one genome with those of a closely related species (see Supplementary Material online). The reliability of these criteria has been tested in three yeast species (Kellis et al. 2003). Moreover, all 19 eukaryotic genomes also pass the commonly recommended 70% accuracy hurdle (Bork 2000).
Supplementary Material
Tables 1 and 2 and the other supplementary material are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgements
We thank two anonymous reviewers for their constructively critical comments, which helped improve the presentation of this paper; we are particularly indebted to the reviewer who pointed out the perfect linear relationship between the number of genes and the total length of genome sequence in the prokaryotes. This study was supported by China's National Natural Science Foundation (30430380) and the Basic Research Program of China (2004CB518605). Z.W.L. is also supported by research grants from the Biotechnology and Biological Sciences Research Council and the Natural Environment Research Council of the United Kingdom.
Footnotes
William Martin, Associate Editor
References
Akashi, H. 2003. Translational selection and yeast proteome evolution. Genetics 164:1291-1303.
Bork, P. 2000. Powers and pitfalls in sequence analysis: the 70% hurdle. Genome Res. 10:398-400.
Claverie, J., and H. Ogata. 2003. The insertion of palindromic repeats in the evolution of protein. Trends Biochem. Sci. 28:75-80.
Kellis, M., N. Patterson, M. M. Endrizzi, B. Birren, and E. S. Lander. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423:241-254.
Li, W. H. 1997. Molecular evolution. Sinauer Associates Inc., Sunderland, Mass.
Long, M. Y., E. Betran, K. Thornton, and W. Wang. 2003. The origin of new genes: glimpses from the young and old. Nat. Rev. Genet. 4:865-875.
Schneider, A., and D. Ebert. 2004. Covariation of mitochondrial genome size with gene lengths: evidence for gene length reduction during mitochondrial evolution. J. Mol. Evol. 59:90-96.
Wang, D. Y., M. F. Hsieh, and W. H. Li. 2005. A general tendency for conservation of protein length across eukaryotic kingdom. Mol. Biol. Evol. 22:142-147.
Zhang, J. 2000. Protein length distributions for the three domains of life. Trends Genet. 16:107-109.
Zhang, M. Q. 2002. Computational prediction of eukaryotic protein-coding genes. Nat. Rev. Genet. 3:698-710.