MBE Advance Access originally published online on September 20, 2006
Molecular Biology and Evolution 2006 23(12):2474-2479; doi:10.1093/molbev/msl128
Research Articles |
Large Global Effective Population Sizes in Paramecium


* Department of Biology, Indiana University
Molecular Evolution and Animal Systematics, Institute of Biology II, University of Leipzig, Leipzig, Germany
E-mail: milynch{at}indiana.edu.
| Abstract |
|---|
The genetic effective population size (Ne) of a species is an important parameter for understanding evolutionary dynamics because it mediates the relative effects of selection. However, because most Ne estimates for unicellular organisms are derived either from taxa with poorly understood species boundaries or from host-restricted pathogens and most unicellular species have prominent phases of clonal propagation potentially subject to strong selective sweeps, the hypothesis that Ne is elevated in single-celled organisms remains controversial. Drawing from observations on well-defined species within the genus Paramecium, we report exceptionally high levels of silent-site polymorphism, which appear to be a reflection of large Ne.
Key Words: ciliates; effective population size; genome evolution; mitochondrial DNA; Paramecium
| Introduction |
|---|
Through its inverse relationship with the power of random genetic drift, the genetic effective size of a population (Ne) plays a central role in evolution by mediating the efficiency of selection. Because the probabilities of fixation of beneficial and deleterious mutations, respectively, scale positively and negatively with Ne, species with smaller Ne are expected to accumulate more mildly deleterious and fewer beneficial mutations (Ohta 1973).
In principle, the long-term Ne of a species can be estimated from observations on genetic markers assumed to be neutral, although such measures are actually composite functions of both Ne and the substitutional mutation rate per nucleotide site per generation (µ). For a population in drift-mutation equilibrium, the average number of substitutions between neutral nucleotide sites in randomly sampled alleles (πs) is equal to the ratio of the rate of input of variation by mutation (2µ) to the rate of loss by gamete sampling (1/(2Ne)), or πs = 4Neµ in a diploid population. Thus, given an estimate of µ, Ne can be inferred indirectly from surveys of silent-site polymorphisms in protein-coding genes. Numerous observations of this sort suggest that Ne is negatively correlated with organism size over a wide phylogenetic range of species (Lynch and Conery 2003; Lynch 2006), although the scaling is much less pronounced than that between absolute population size and organism size (Finlay 2002).
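The equilibrium argument above can be written out as a short balance. This is only a sketch using the symbols already defined in the text, with π_s denoting silent-site diversity:

```latex
\underbrace{2\mu}_{\text{input by mutation}}
  \;=\;
\underbrace{\frac{1}{2N_e}}_{\text{loss by gamete sampling}} \,\pi_s
\quad\Longrightarrow\quad
\pi_s \;=\; \frac{2\mu}{1/(2N_e)} \;=\; 4N_e\mu
\quad\Longrightarrow\quad
N_e \;=\; \frac{\pi_s}{4\mu}.
```

The last form makes the dependence explicit: any error in the assumed µ propagates inversely into the inferred Ne.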
Despite the substantial evidence that free-living unicellular lineages generally harbor much higher levels of silent-site diversity than those of multicellular species and the fact that such estimates are most likely to be downwardly biased in high-Ne species (Lynch and Conery 2004; Lynch 2006), the idea that microbial species have elevated effective population sizes has been questioned (Daubin and Moran 2004; Katz et al. 2006). Contributing to the uncertainties are several fundamental difficulties with surveys of molecular variation in microbes: the shortage of geographically widespread samples, the absence of objective species boundaries defined by reproductive cohesion, and the tendency for investigators to rely on subjective interpretations of clade structures in assigning species status.
To avoid these problems, we have estimated global levels of nucleotide diversity at silent sites for species in the ciliate genus Paramecium. Paramecium are free-living, single-celled eukaryotes that can be assigned to species groups with simple laboratory tests for reproductive compatibility, although several of these species are morphologically cryptic (Sonneborn 1975; Coleman 2005). Paramecium also has intriguing genomic organization, including nuclear duality. The transcriptionally active macronucleus regulates vegetative activity, whereas the transcriptionally silent micronucleus acts as the germ line, leading to a situation analogous to the separation of germ line and somatic-cell genomes in multicellular species. Following sexual reproduction, each daughter cell acquires a new recombinant micronuclear genome by conventional syngamy as well as a secondarily processed macronuclear genome derived via amplification of micronuclear DNA, fragmentation, and elimination of noncoding DNA (Preer 1968). We show below that all species of Paramecium surveyed harbor very high levels of silent-site diversity and exhibit numerous other molecular features that are likely to be the consequences of high long-term Ne.
| Materials and Methods |
|---|
Strains
Paramecium aurelia cultures were gifts of Drs. John and Louise Preer (Indiana University, Bloomington, Indiana) and Ewa Przyboś (Institute of Systematics and Evolution of Animals, Krakow, Poland; Paramecium biaurelia strains PS, CS, and PW) or were ordered from the American Type Culture Collection (P. biaurelia strains 185 and 310). These cultures had been held in the laboratory for several years and were highly inbred, as supported by the homozygosity of all sequenced nuclear genes. Paramecium multimicronucleatum and Paramecium caudatum stocks were recently established lab strains collected from natural populations by S. F. Fokin and, although not necessarily inbred, were analyzed only for the haploid mitochondrial genome. Mating tests were performed to confirm that strains were assigned to the correct species. The geographic origins of all isolates are listed in supplementary table S1, Supplementary Material online.
PCR, Cloning, and Sequencing
Four strains of Paramecium primaurelia, 6 of Paramecium tetraurelia, 6 of P. biaurelia, 25 of P. caudatum, and 8 of P. multimicronucleatum were surveyed for DNA sequence variation. DNA was extracted from P. primaurelia, P. tetraurelia, and P. biaurelia following the DNAzol protocol or Qiagen DNeasy, starting with approximately 0.1 ml of Paramecium culture. DNAzol extractions were followed by phenol/chloroform purification. For DNA extraction from P. caudatum and P. multimicronucleatum, 4–6 cells from each clonal culture were washed in Eau Volvic and incubated overnight with 100 µl of 10% Chelex and 10 µl Proteinase K (10 mg/ml) at 56 °C. The mixture was then boiled for 20 min, and the supernatant was used for subsequent PCR reactions.
Gene fragments were amplified using the standard PCR technique and primers designed from sequences deposited at the National Center for Biotechnology Information or in the Paramecium Genome Survey (http://paramecium.cgm.cnrs-gif.fr). PCR products from P. primaurelia, P. biaurelia, and P. tetraurelia for tRNA synthetase for phenylalanine (Phe tRNA synthetase), cytochrome oxidase subunit 1 (cox1), cytochrome B (cytb), and NADH subunit 1 (nadh 1) were directly sequenced from both directions when possible using the PCR primers. PCR fragments that could not be sequenced directly and all kin241 and dihydrofolate reductase-thymidylate synthase (dhfrts) PCR fragments were subcloned into the pGEM T-easy Vector (Promega Corporation, Madison, WI) and sequenced from both directions using vector primers. All sequences for P. caudatum and P. multimicronucleatum were subcloned into a TOPO cloning vector (Invitrogen Corporation, Carlsbad, CA). None of these analyses yielded evidence of allelic variation or of cross-amplification of duplicated genes.
Sequence Analysis
Sequences were verified by comparing the forward and reverse reads and aligned using ClustalX 1.8 (Jeanmougin et al. 1998). The Kumar method in MEGA 3.0 (Kumar et al. 2004) was used to calculate levels of silent-site divergence among alleles, and the estimates of species-wide πs were calculated as the average pairwise synonymous divergences, multiplied by n/(n − 1), where n is the sample size, to correct for small-sample-size bias. Total nuclear and mitochondrial averages reported are weighted by the number of base pairs sequenced for each gene. Phylogenetic trees created in MEGA 3.0 (Neighbor-Joining) and PHYLIP (maximum likelihood; Felsenstein 2004) were tested by bootstrapping the data 1,000 times. Because the mitochondrial genome is inherited as a single unit, the gene genealogies for these genes were evaluated with single concatenated sequences.
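The diversity calculation described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the authors' actual workflow (which used MEGA 3.0); the function names and the input layout (a flat list of pairwise synonymous divergences) are assumptions made for the example.

```python
def corrected_silent_site_diversity(pairwise_ks, n):
    """Mean pairwise synonymous divergence, scaled by n/(n - 1).

    pairwise_ks: one divergence per unordered pair of the n sampled alleles
                 (hypothetical input layout chosen for this sketch).
    """
    if n < 2:
        raise ValueError("need at least two sequences")
    expected_pairs = n * (n - 1) // 2
    if len(pairwise_ks) != expected_pairs:
        raise ValueError(f"expected {expected_pairs} pairwise values")
    mean_ks = sum(pairwise_ks) / len(pairwise_ks)
    # Small-sample correction stated in the text.
    return mean_ks * n / (n - 1)


def length_weighted_mean(estimates, lengths_bp):
    """Average of per-gene estimates weighted by base pairs sequenced."""
    if len(estimates) != len(lengths_bp) or not estimates:
        raise ValueError("need one length per estimate")
    total_bp = sum(lengths_bp)
    return sum(e * bp for e, bp in zip(estimates, lengths_bp)) / total_bp
```

For example, three alleles with pairwise divergences 0.1, 0.2, and 0.3 have a raw mean of 0.2 and a corrected estimate of 0.2 × 3/2 = 0.3.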
Ka and Ks estimated in MEGA 3.0 following the Kumar method were used to calculate Tajima's D in Microsoft Excel following the original equations (Tajima 1989a, 1989b) and the method of Hughes (2005), with statistical significance being evaluated according to Tajima's (1989a, 1989b) confidence intervals. Ka/Ks ratios were calculated at the within-species and the net between-species levels using MEGA 3.0.
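For readers who want to reproduce the statistic, the standard Tajima (1989) formula can be sketched as below. This is a generic implementation from the sample size, the number of segregating sites, and the mean number of pairwise differences; it is not the authors' spreadsheet, and it does not include the site-class partitioning of Hughes (2005). The function name and argument names are assumptions for the example.

```python
import math


def tajimas_d(n, segregating_sites, mean_pairwise_diff):
    """Tajima's D from n sequences, S segregating sites, and mean
    number of pairwise differences k."""
    if n < 4:
        # For n = 2 or 3 the variance constants c1 and c2 are both zero.
        raise ValueError("Tajima's D needs at least 4 sequences")
    if segregating_sites < 1:
        raise ValueError("no segregating sites: D is undefined")
    s = segregating_sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    # Difference between the two estimators of theta, standardized.
    return (mean_pairwise_diff - s / a1) / math.sqrt(variance)
```

D is near zero when the two estimators agree, positive when intermediate-frequency variants are in excess (as expected under balancing selection), and negative when rare variants are in excess.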
| Results and Discussion |
|---|
Average levels of nucleotide diversity at silent sites in the nuclear genes of the 3 members of the P. aurelia complex are in the range of 0.093–0.203, with 4 of the 9 locus-specific estimates falling in the range of 0.155–0.488 (table 1). These estimates exceed the average from a broad survey of 26 other genera of unicellular eukaryotes (mostly pathogens), 0.051 (standard error [SE] = 0.014), and are well above the averages for vertebrates, invertebrates, and land plants (0.004, 0.026, and 0.015, respectively) (Lynch 2006).
Our results are dramatically different from those recently reported for another ciliate, Tetrahymena thermophila, which indicate an average silent-site diversity of just 0.003 (Katz et al. 2006). [...] πs appears to be quite exceptional as it is substantially smaller than any other estimate for a unicellular eukaryote (5% of the average unicellular-eukaryote estimate even when genetically depauperate parasites are included) and just 10% of the average estimate for invertebrates (Lynch 2006).
Although there is variation in estimates of πs among loci within Paramecium lineages (e.g., 1 nuclear locus within P. biaurelia exhibited no variation), such heterogeneity is not unexpected with the low amount of allelic sampling in this study. Substantial sampling error can arise at the level of nucleotides, individuals, and populations even under an entirely neutral situation (Lynch and Crease 1990), and episodes of selection on specific genes can cause reduced levels of neutral variation in chromosomally linked regions.
The consistently high average estimates of πs across 5 Paramecium species support the idea that members of this genus have exceptionally high Ne and/or µ. Additionally, phylogenetic observations are consistent with the hypothesis that these high estimates of πs reflect large Ne. Gene genealogies of the P. aurelia species do not form mutually exclusive clades (fig. 1). Because the coalescence time of a gene genealogy is approximately 4Ne generations (Hudson 1990), the retention of shared polymorphisms from the common ancestor of these 3 species is an expected reflection of historically large long-term effective population sizes. Although there is some possibility of introgression of genes between P. primaurelia and P. tetraurelia, mating tests between the strains involved in our analyses and in extensive earlier work (Sonneborn 1975; Coleman 2005) indicate a strong barrier to interspecies gene flow. In any event, should gene flow be occurring between the various named species, this would only reinforce the idea that the base of genetic diversity available to true biological species in the genus Paramecium is exceptionally high.
Are these high levels of silent-site variation experimental artifacts or consequences of sampling error? In principle, a false pattern of deep ancestral polymorphism could arise if paralogous members of duplicate genes had been amplified in different strains. To determine the copy number of our nuclear genes in the P. tetraurelia genome, we used the strain 51 (P. tetraurelia) sequences to perform a BLAT (Kent et al. 2002) [...]
However, without sequenced micronuclear and macronuclear genomes for all 3 species, we cannot definitively rule out the presence of macronuclear paralogs for any of the nuclear genes. In the extreme case, epigenetic modifications might result in the transmission of different micronuclear paralogs to the macronuclei of different strains within a species (Epstein and Forney 1984), which could artificially elevate estimates of silent-site polymorphism, but there is no evidence that this is the case in this study, and such an explanation cannot apply to the mitochondrial genome.
In addition, it appears unlikely that the nonreciprocally monophyletic phylogenies of P. primaurelia and P. tetraurelia reflect unusual pressures of balancing selection within these species. For example, 85% of the Tajima's D estimates are consistent with the hypothesis that the surveyed variation is neutral (table 3). The only Tajima's D that is significantly positive for a nuclear gene and, therefore, potentially consistent with balancing selection is that for dhfrts in P. primaurelia when evaluated for nonsynonymous sites alone. The only other circumstantial evidence of balancing selection based on Tajima's D derives from the mitochondrial cytochrome oxidase gene. Cox1 silent sites in P. primaurelia and replacement sites in P. multimicronucleatum have significant positive Tajima's D. However, because of the haploid nature of the mitochondrion, balancing selection via heterozygote superiority is not possible for its genes.
Further evidence against the hypothesis that balancing selection is responsible for maintaining the same polymorphisms across P. primaurelia and P. tetraurelia derives from the observation that between-species Ka/Ks is generally smaller than that within populations (table 2). Under balancing selection, within-population Ka/Ks is expected to be relatively low with old alleles (high Ks) maintaining relatively stable amino acid sequences. Although gene-wide analyses of Ka/Ks provide only a weak test for balancing selection if the latter is largely associated with a restricted domain of a protein, sliding-window analyses that we performed revealed no regions of unusually high replacement-site variation. Thus, taken together, neither estimates of Tajima's D nor Ka/Ks support the idea that balancing selection is encouraging the maintenance of molecular variation in Paramecium species. Moreover, the conditions necessary for the maintenance of large numbers of alleles by balancing selection are known to be very stringent (Lewontin et al. 1978).
Because the expected value of πs is a function of both Ne and µ, one might argue that the unusually high estimates of πs for Paramecium species are a consequence of an elevated mutation rate. However, although there is considerable need for more refined estimates, all existing information suggests that the per-generation mutation rate is substantially lower in unicellular than multicellular species. The average estimate of µ for substitution changes is 0.5 × 10⁻⁹ for 8 prokaryotes, 1.6 × 10⁻⁹ for 4 unicellular eukaryotes, 9.5 × 10⁻⁹ for the nematode Caenorhabditis elegans, and 23.2 × 10⁻⁹ for human (Lynch 2006). [...] 10⁻⁹ for Paramecium, our results suggest Ne in the range of 2.5 × 10⁷ to 7.5 × 10⁷ for nuclear genes. Similar analyses for invertebrates, vertebrates, annual plants, and trees yield average Ne estimates of 10⁶, 10⁴, 10⁶, and 10⁴, respectively (Lynch 2006).
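The conversion from silent-site diversity to Ne can be illustrated with a minimal Python sketch. The mutation rate of 1 × 10⁻⁹ per site per generation and the diversity values 0.10 and 0.30 used below are assumptions chosen for illustration, not values taken from the tables; the function name is likewise hypothetical.

```python
def effective_population_size(pi_s, mu, ploidy=2):
    """Solve pi_s = 2 * ploidy * Ne * mu for Ne.

    ploidy=2 gives the diploid relation pi_s = 4*Ne*mu;
    ploidy=1 gives the haploid relation pi_s = 2*Ne*mu.
    """
    if ploidy not in (1, 2):
        raise ValueError("ploidy must be 1 or 2")
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return pi_s / (2 * ploidy * mu)


ASSUMED_MU = 1e-9  # assumed per-site, per-generation rate (illustrative)

for pi_s in (0.10, 0.30):  # illustrative diversity values
    ne = effective_population_size(pi_s, ASSUMED_MU)
    print(f"pi_s = {pi_s:.2f} -> Ne ~ {ne:.2e}")
```

Under these assumed inputs the diploid relation gives Ne of roughly 2.5 × 10⁷ and 7.5 × 10⁷, the same order as the range quoted in the text. Halving the assumed µ would double both values.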
A recent survey suggests that nuclear and mitochondrial mutation rates are approximately equal in unicellular eukaryotes (Lynch et al. 2006). This issue can be evaluated with the members of the P. aurelia complex from estimates of the net interspecific divergences at silent sites (in excess of the variation within species). Under the assumption that these sites are neutral, the ratio of divergences for mitochondrial and nuclear genes provides an estimate of the ratio of the mutation rates. In accordance with previous results, this ratio (1.40, SE = 0.54) is not significantly different from one, providing further evidence that Paramecium is not unusual with respect to mutational features. Division of the ratio of within-population silent-site diversities for mitochondrial and nuclear genes by the ratio of mutation rates yields an estimate of the ratio of the effective number of genes per population per locus in the 2 genomes (Lynch et al. 2006): 0.28 for P. primaurelia, 0.15 for P. biaurelia, and 0.37 for P. tetraurelia. (The effective number of genes per locus is equivalent to the effective population size for a haploid genome and roughly twice that for a diploid genome in an outcrossing species.) These values, which are not significantly different from the average estimate available for other unicellular eukaryotes, 0.52 (0.19) (Lynch et al. 2006), indicate that the power of random genetic drift in the mitochondrial genome of Paramecium is approximately 3–6 times greater than that for nuclear genes.
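The arithmetic behind the mitochondrial-to-nuclear comparison can be sketched as follows. The function names are hypothetical, and the script simply takes the reciprocal of the three ratios reported in the text to show how they translate into relative drift.

```python
def effective_gene_number_ratio(pi_mt, pi_nuc, mutation_rate_ratio):
    """Ratio of effective gene numbers (mitochondrial / nuclear).

    Divides the ratio of within-population silent-site diversities by
    the ratio of mutation rates, as described in the text.
    """
    return (pi_mt / pi_nuc) / mutation_rate_ratio


def relative_drift_power(gene_number_ratio):
    """Drift scales inversely with the effective number of genes."""
    return 1.0 / gene_number_ratio


# Ratios reported in the text for the three P. aurelia species.
reported = {"P. primaurelia": 0.28, "P. biaurelia": 0.15, "P. tetraurelia": 0.37}
for species, ratio in reported.items():
    print(f"{species}: drift ~{relative_drift_power(ratio):.1f}x stronger in mtDNA")
```

The reciprocals come out at about 3.6, 6.7, and 2.7, which is the basis for the statement that drift is roughly 3 to 6 times more powerful in the mitochondrial genome.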
Because silent-site variation in species with very large Ne may be somewhat depressed below the neutral expectation by translation-associated selection (codon bias) or by selection on features that influence transcript processing, our estimates of 4Neµ may be somewhat downwardly biased. Thus, our results corroborate the hypothesis that unicellular eukaryotes have effective population sizes 2–4 orders of magnitude greater than those found in multicellular species. This magnitude of difference is quantitatively sufficient to have a substantial influence on the evolution of genomic architecture, encouraging the hypothesis that the considerable divergence in genome organization and gene structure that exists between multicellular and unicellular eukaryotes is largely a consequence of differences in Ne rather than of differences in cellular or physiological limitations (Lynch and Conery 2003; Lynch 2006). It is therefore notable that P. tetraurelia has the smallest recorded average intron size of any eukaryote, just 25 bp, as well as an extraordinarily low amount of intergenic DNA, just 2.1 kb/gene in macronuclear chromosomes (Zagulski et al. 2004).
| Supplementary Material |
|---|
Supplementary table S1 and figure S1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
We thank J. Preer and B. Rudman for significant technical guidance and donation of material to this study, E. Przyboś for donation of Paramecium strains CS, PS, and PW, S. F. Fokin for the donation of several P. caudatum and P. multimicronucleatum strains, and L. Sperling for assistance with access to the genomic sequence of P. tetraurelia. This work was supported by NIH grant R01 GM36827 to M.L., an NSF Predoctoral Graduate Fellowship to M.S.S., and Deutsche Forschungsgemeinschaft (DFG) Schwerpunktprogramm AQUASHIFT (BE 2299/3-1) to T.B.
| Footnotes |
|---|
Laura Katz, Associate Editor
| References |
|---|
Coleman AW. (2005) Paramecium aurelia revisited. J Eukaryot Microbiol 52:68–77.
Daubin V and Moran NA. (2004) Comment on "The origins of genome complexity". Science 306:978.
Epstein L and Forney J. (1984) Mendelian and non-mendelian mutations affecting surface antigen expression in Paramecium tetraurelia. Mol Cell Biol 4:1583–1590.
Felsenstein J. (2004) PHYLIP (phylogeny inference package). Version 3.6. Distributed by the author. (Department of Genome Sciences, University of Washington, Seattle (WA)).
Finlay BJ. (2002) Global dispersal of free-living microbial eukaryote species. Science 296:1061–1063.
Frankham R. (1995) Effective population size/adult population size ratios in wildlife: a review. Genet Res 66:95–107.
Gillespie JH. (2000) Genetic drift in an infinite population: the pseudohitchhiking model. Genetics 155:909–919.
Hudson RR. (1990) Gene genealogies and the coalescent process. Oxf Surv Evol Biol 7:1–44.
Hughes AL. (2005) Evidence for abundant slightly deleterious polymorphisms in bacterial populations. Genetics 169:533–538.
Jeanmougin F, Thompson J, Gouy M, Higgins D, Gibson T. (1998) Multiple sequence alignment with Clustal X. Trends Biochem Sci 23:403–405.
Katz LA, Snoeyenbos-West O, Doerder FP. (2006) Patterns of protein evolution in Tetrahymena thermophila: implications for estimates of effective population size. Mol Biol Evol 23:608–614.
Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. (2002) The human genome browser at UCSC. Genome Res 12:996–1006.
Krzywicka A, Beisson J, Keller AM, Cohen J, Jerka-Dziadosz M, Klotz C. (2001) Kin241: a gene involved in cell morphogenesis in Paramecium tetraurelia reveals a novel protein family of cyclophilin-RNA interacting proteins (CRIPs) conserved from fission yeast to man. Mol Microbiol 42:257–267.
Kumar S, Tamura K, Nei M. (2004) MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform 5:150–163.
Lewontin RC, Ginzburg LR, Tuljapurkar SD. (1978) Heterosis as an explanation for large amounts of genic polymorphism. Genetics 88:149–170.
Lynch M. (2006) The origins of eukaryotic gene structure. Mol Biol Evol 23:450–468.
Lynch M and Conery JS. (2003) The origins of genome complexity. Science 302:1401–1404.
Lynch M and Conery JS. (2004) Response to comment on "The origins of genome complexity". Science 306:978.
Lynch M and Crease T. (1990) The analysis of population survey data on DNA sequence variation. Mol Biol Evol 7:377–394.
Lynch M, Koskella B, Schaack S. (2006) Mutation pressure and the evolution of organelle genome structure. Science 311:1727–1730.
Ohta T. (1973) Slightly deleterious mutant substitutions in evolution. Nature 246:96–98.
Ohta T. (2002) Near-neutrality in evolution of genes and gene regulation. Proc Natl Acad Sci USA 99:16134–16137.
Preer JR Jr. (1968) Genetics of protozoa. (Pergamon Press, Oxford).
Schlichtherle IM, Roos DS, Van Houten JL. (1996) Cloning and molecular analysis of the bifunctional dihydrofolate reductase-thymidylate synthase gene in the ciliated protozoan Paramecium tetraurelia. Mol Gen Genet 250:665–673.
Sonneborn TM. (1975) The Paramecium aurelia complex of fourteen sibling species. Trans Am Microsc Soc 94:155–178.
Tajima F. (1989a) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595.
Tajima F. (1989b) The effect of change in population size on DNA polymorphism. Genetics 123:597–601.
Zagulski M, Nowak JK, Le Mouel A, et al. (14 co-authors). (2004) High coding density on the largest Paramecium tetraurelia somatic chromosome. Curr Biol 14:1397–1404.
