MBE Advance Access originally published online on April 12, 2006
Molecular Biology and Evolution 2006 23(6):1107-1108; doi:10.1093/molbev/msk019

© The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Letter

Average Gene Length Is Highly Conserved in Prokaryotes and Eukaryotes and Diverges Only Between the Two Kingdoms

Lin Xu*, Hong Chen*, Xiaohua Hu*, Rongmei Zhang*, Ze Zhang† and Z. W. Luo*,†

* Laboratory of Population and Quantitative Genetics, School of Life Sciences and Institute of Biomedical Sciences, Fudan University, Shanghai, China; and † School of Biosciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom

E-mail: z.luo@bham.ac.uk.


    Abstract
 
The average length of genes in a eukaryote is greater than in a prokaryote, implying that the evolution of complexity is related to changes in gene length. Here, we show that although the average gene lengths of prokaryotes and eukaryotes differ considerably, average gene length is highly conserved within each of the two kingdoms. This suggests that natural selection has set a strong limit on gene elongation within each kingdom. Furthermore, average gene size provides another distinct characteristic for discriminating between the two kingdoms of organisms.

Key Words: average gene length • prokaryote • eukaryote


    Background
 
Gene elongation is recognized as one of the most important steps in the evolution of functional complexity of genes (Li 1997) and in the evolution of new genes (Long et al. 2003). Zhang (2000) calculated the mean and median protein lengths of 22 species whose genome sequences were available at the time, including several representative organisms such as Escherichia coli, yeast, nematode, Drosophila, humans, and Arabidopsis. He observed that orthologous genes are longer in eukaryotes than in prokaryotes and that eukaryote-specific proteins are, on average, longer than prokaryote-specific proteins. Wang, Hsieh, and Li (2005) analyzed orthologous protein data in detail by reconstructing the ancestral states among the eukaryotes under study. They found that proteins in yeast, nematode, Drosophila, humans, and Arabidopsis are, on average, longer than their orthologs in E. coli, and they observed conservation of protein length across eukaryotic kingdoms. Here we present a more general pattern in the coding-sequence length of prokaryotic and eukaryotic genes and show that the mean length of genic coding sequence (MLGCS) is highly conserved within prokaryotes and within eukaryotes but diverges between the two kingdoms.


    Results and Discussion
 
We surveyed nearly all prokaryotic and eukaryotic species whose complete genome sequences were available and well annotated at the time of this study. These comprise 81 prokaryotes and 19 eukaryotes whose coding-sequence predictions have been validated; they are listed in Tables 1 and 2 of the Supplementary Material online. For each species, the tables also give the genome sequence size (N) in kilobase pairs, the number of predicted genes (n), and the proportion of the genome sequence that is coding sequence (r), together with the key reference from which the data were collected. From these parameters, we calculated the MLGCS of each species as L = N × r/n, with N converted to base pairs. For each of the two groups of species, we regressed the estimated total coding sequence length (N × r) on the estimated number of genes; the analyses are shown in figure 1.
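The per-species calculation of L and the per-group regression can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the `Genome` record, its field names, and the toy values are hypothetical, and N is assumed to be in kilobase pairs as in the supplementary tables, so it is multiplied by 1,000 to give L in base pairs. The fit is ordinary least squares with an intercept; the text does not state whether the original regression included one.

```python
# Minimal sketch, not the authors' code: the Genome record, its field names,
# and the toy values in the usage section are hypothetical.
from dataclasses import dataclass


@dataclass
class Genome:
    name: str            # hypothetical species label
    size_kb: float       # N: genome sequence size in kilobase pairs
    n_genes: int         # n: number of predicted genes
    coding_ratio: float  # r: fraction of the genome that is coding sequence


def total_coding_bp(g: Genome) -> float:
    """Total coding sequence length N * r, converted from kb to bp."""
    return g.size_kb * 1000.0 * g.coding_ratio


def mlgcs_bp(g: Genome) -> float:
    """Mean length of genic coding sequence, L = N * r / n, in bp."""
    return total_coding_bp(g) / g.n_genes


def ols_fit(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit y = a + b * x; returns (a, b, r_squared)."""
    k = len(xs)
    mx = sum(xs) / k
    my = sum(ys) / k
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy * sxy / (sxx * syy)
    return intercept, slope, r_squared


if __name__ == "__main__":
    # Hypothetical toy genomes chosen so that every L equals 1,000 bp.
    toy = [
        Genome("toy_A", 1000.0, 900, 0.9),
        Genome("toy_B", 2000.0, 1800, 0.9),
        Genome("toy_C", 5000.0, 4500, 0.9),
    ]
    for g in toy:
        print(g.name, round(mlgcs_bp(g), 1))
    a, b, r2 = ols_fit([g.n_genes for g in toy], [total_coding_bp(g) for g in toy])
    print(f"intercept={a:.3f} slope={b:.3f} R^2={r2:.6f}")
```

The toy data illustrate why the regression is informative: if every genome in a group had exactly the same L, the points (n, N × r) would lie on a straight line through the origin with slope L, so the fit returns a slope of about 1,000 bp, an intercept near zero, and R² close to 1. A near-linear fit with a small intercept is therefore what a conserved MLGCS would predict.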


Figure 1
FIG. 1.— Analysis of regression of total coding sequence length on the number of genes in 81 prokaryotic species and in 19 eukaryotic species.

 
Figure 1 shows a perfect linear relationship between the total coding sequence length and the number of genes in both prokaryotic and eukaryotic genomes. A perfect linear relationship also holds between the number of genes and the total genome sequence length in prokaryotes, but not in eukaryotes, probably because of introns, transposable elements, and junk DNA in eukaryotic genomes (data not shown). The mean (standard error), coefficient of skewness, and coefficient of kurtosis of MLGCS were estimated as 924 (9) bp, 0.1952, and 3.3501, respectively, for the prokaryotic group and as 1,346 (28) bp, 0.1723, and 2.5661 for the eukaryotic group. These analyses indicate that the genic coding sequence has a relatively constant average length in both prokaryotes and eukaryotes, despite the remarkable variation in coding sequence length among individual genes within each genome. The coding sequence of a eukaryotic gene is, on average, about 422 bp (1,346 − 924 bp) longer than that of a prokaryotic gene.
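The group summaries above can be computed from a list of per-species L values with a sketch like the following. The paper does not say which estimators were used, so this assumes moment-based skewness m3/m2^1.5 and non-excess kurtosis m4/m2^2 (for which a normal distribution gives about 3, which would be consistent with the reported values near 3), and a standard error equal to the sample standard deviation divided by √k. The `mean_difference` helper is hypothetical and assumes the two groups are independent.

```python
# Minimal sketch, not the authors' code: estimator choices are assumptions.
import math


def summarize(values: list[float]) -> tuple[float, float, float, float]:
    """Return (mean, standard error, skewness, kurtosis) of per-species L values.

    Skewness = m3 / m2**1.5 and kurtosis = m4 / m2**2 use population central
    moments; kurtosis is not "excess", so a normal distribution gives about 3.
    The standard error is the sample standard deviation divided by sqrt(k).
    """
    k = len(values)
    mean = sum(values) / k
    dev = [v - mean for v in values]
    m2 = sum(d ** 2 for d in dev) / k
    m3 = sum(d ** 3 for d in dev) / k
    m4 = sum(d ** 4 for d in dev) / k
    sample_sd = math.sqrt(sum(d ** 2 for d in dev) / (k - 1))
    return mean, sample_sd / math.sqrt(k), m3 / m2 ** 1.5, m4 / m2 ** 2


def mean_difference(mean_a: float, se_a: float,
                    mean_b: float, se_b: float) -> tuple[float, float]:
    """Hypothetical helper: difference of two independent group means and its SE."""
    return mean_a - mean_b, math.sqrt(se_a ** 2 + se_b ** 2)


if __name__ == "__main__":
    # Group summaries as reported in the text: eukaryotes 1,346 (28) bp,
    # prokaryotes 924 (9) bp.
    diff, se = mean_difference(1346, 28, 924, 9)
    print(f"difference = {diff} bp, SE = {se:.1f} bp")
```

Applied to the reported group means, `mean_difference(1346, 28, 924, 9)` gives a difference of 422 bp with a standard error of √865 ≈ 29.4 bp.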

It is widely accepted that, on the one hand, natural selection favors shorter coding sequences, which allow higher transcriptional efficiency and more efficient protein synthesis and limit the accumulation of deleterious mutations, whereas, on the other hand, evolution seems to improve protein function by elongating coding sequences (Li 1997; Zhang 2000; Akashi 2003; Claverie and Ogata 2003; Wang, Hsieh, and Li 2005). Schneider and Ebert (2004) recently argued that the covariation between genome size and gene length is expected to be strongest in the smallest genomes and that selection for reduced gene length becomes progressively weaker as genomes become larger. Our observations suggest that there is a stringent structural constraint on the evolution of gene size at the genomic scale. Species that diverged more than a few billion years ago, within either the prokaryotic group (e.g., Prochlorococcus marinus) or the eukaryotic group (e.g., Ashbya gossypii), share a relatively constant mean gene size. Mean gene size provides another distinct characteristic for discriminating between the two kingdoms of organisms. Wang, Hsieh, and Li (2005) observed a tendency toward conservation of the length of orthologous proteins among five eukaryotic genomes, which is perhaps not very surprising given that the comparison was made between proteins that are highly conserved across species. The present study considers whole-genome coding sequence in relation to the number of genes in the genome and thus reveals a more general tendency in gene length evolution.

Questions may arise about using the coding sequence data and gene numbers predicted for eukaryotic genomes, because current de novo gene prediction from sequence data has various intrinsic limitations (Zhang 2002). The 19 eukaryotic genomes in this study, selected from 35 candidates, have gene annotations validated either by genome-wide comparison with cDNA and/or expressed sequence tags or by comparing gene predictions between the genome and that of a closely related species (see Supplementary Material online). The reliability of these criteria has been tested in three yeast species (Kellis et al. 2003). Moreover, all 19 eukaryotes also pass the commonly recommended 70% accuracy hurdle (Bork 2000).


    Supplementary Material
 
Tables 1 and 2 and the other supplementary material are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Acknowledgements
 
We thank two anonymous reviewers for their constructive criticism, which helped improve the presentation of this paper, and we are particularly indebted to the reviewer who pointed out the perfect linear relationship between the number of genes and the total length of genome sequence in prokaryotes. This study was supported by China's National Natural Science Foundation (30430380) and the Basic Research Program of China (2004CB518605). Z.W.L. is also supported by research grants from the Biotechnology and Biological Sciences Research Council and the Natural Environment Research Council of the United Kingdom.


    Footnotes
 
William Martin, Associate Editor


    References
 

    Akashi, H. 2003. Translational selection and yeast proteome evolution. Genetics 164:1291–1303.

    Bork, P. 2000. Powers and pitfalls in sequence analysis: the 70% hurdle. Genome Res. 10:398–400.

    Claverie, J., and H. Ogata. 2003. The insertion of palindromic repeats in the evolution of protein. Trends Biochem. Sci. 28:75–80.

    Kellis, M., N. Patterson, M. M. Endrizzi, B. Birren, and E. S. Lander. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423:241–254.

    Li, W. H. 1997. Molecular evolution. Sinauer Associates Inc., Sunderland, Mass.

    Long, M. Y., E. Betran, K. Thornton, and W. Wang. 2003. The origin of new genes: glimpses from the young and old. Nat. Rev. Genet. 4:865–875.

    Schneider, A., and D. Ebert. 2004. Covariation of mitochondrial genome size with gene lengths: evidence for gene length reduction during mitochondrial evolution. J. Mol. Evol. 59:90–96.

    Wang, D. Y., M. F. Hsieh, and W. H. Li. 2005. A general tendency for conservation of protein length across eukaryotic kingdom. Mol. Biol. Evol. 22:142–147.

    Zhang, J. 2000. Protein length distributions for the three domains of life. Trends Genet. 16:107–109.

    Zhang, M. Q. 2002. Computational prediction of eukaryotic protein-coding genes. Nat. Rev. Genet. 3:698–710.

Accepted for publication April 10, 2006.





