MBE Advance Access originally published online on March 20, 2006
Molecular Biology and Evolution 2006 23(6):1217-1231; doi:10.1093/molbev/msk006
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Research Article |
A Unique Recent Origin of the Allotetraploid Species Arabidopsis suecica: Evidence from Nuclear DNA Markers


,||

* Bioinformatics Program, Department of Human Genetics, University of Michigan;
Department of Cell and Organism Biology, Genetics, Lund University, Lund, Sweden;
Department of Biology, Linköping University, Linköping, Sweden;
Molecular and Computational Biology, University of Southern California; || Department of Oncology, University of Cambridge, Cambridge, United Kingdom; ¶ Department of Clinical Chemistry, Malmö University Hospital; # Department of Mathematics and Natural Sciences, Kristianstad University, Kristianstad, Sweden
E-mail: mjakob{at}umich.edu.
| Abstract |
|---|
|
|
|---|
A coalescent-based method was used to investigate the origins of the allotetraploid Arabidopsis suecica, using 52 nuclear microsatellite loci typed in eight individuals of A. suecica and 14 individuals of its maternal parent Arabidopsis thaliana, and four short fragments of genomic DNA sequenced in a sample of four individuals of A. suecica and in both its parental species A. thaliana and Arabidopsis arenosa. All loci were variable in A. thaliana but only 24 of the 52 microsatellite loci and none of the four sequence fragments were variable in A. suecica. We explore a number of possible evolutionary scenarios for A. suecica and conclude that it is likely that A. suecica has a recent, unique origin between 12,000 and 300,000 years ago. The time estimates depend strongly on what is assumed about population growth and rates of mutation. When combined with what is known about the history of glaciations, our results suggest that A. suecica originated south of its present distribution in Sweden and Finland and then migrated north, perhaps in the wake of the retreating ice.
Key Words: Arabidopsis suecica Arabidopsis thaliana polyploidy polymorphism speciation founders
| Introduction |
|---|
|
|
|---|
It has long been recognized that polyploidy has played an important role in speciation, especially in plants (Stebbins 1971
Arabidopsis suecica (Fr.) was first described as an allotetraploid (2n = 26) by Hylander (1957)
, who suggested Arabidpsis thaliana and Arabidopsis (Cardaminopsis) arenosa as its parental species, a suggestion since supported by molecular phylogenetics (Kamm et al. 1995
; O'kane, Schaal, and Al-Shehbaz 1996
). A. suecica is found in Sweden and southern Finland (Hultén 1971
) and it is highly, but not completely, selfing (Säll et al. 2004
). A. thaliana is a highly selfing, cosmopolitan weed, familiar as the leading model plant. It is diploid (2n = 10), although tetraploid A. thaliana individuals are occasionally found (L. Comai and M. Koornneef, personal communication). A. arenosa is a European species which is tetraploid (2n = 4x = 32) in most of its range and reportedly diploid (2n = 2x = 16) in a small area in eastern Europe (Mesicek 1970
). Our observations in the greenhouse indicate that A. arenosa is self-incompatible and thus an obligate outbreeder (Säll et al. 2004
). Several lines of evidence, including most recently cpDNA sequencing, have identified A. thaliana as the maternal parent of A. suecica (Mummenhoff and Hurka 1994
; Price, Al-Shebaz, and Palmer 1994;
Comai et al. 2000
; Säll et al. 2003
). Furthermore, when producing "artificial" A. suecica through crosses, Comai et al. (2000)
only succeeded in producing viable offspring when A. thaliana was the maternal parent. Given this, there are still several possible routes through which A. suecica could have originated (table 1). The most likely scenario would appear to involve a normal male gamete from a tetraploid A. arenosa fertilizing either a normal female gamete from a tetraploid A. thaliana (which are rare, but do exist), or an unreduced female gamete from a diploid A. thaliana.
|
An important aspect of the formation of a polyploid species is the number of independent origins. Many allopolyploid species are believed to have multiple origins (D. Soltis and P. Soltis 1993
In this paper we introduce a Bayesian multi-locus method based on coalescent theory that can be used to infer the number of origins and the time of the origin of a polyploid. The method is based on a rejection algorithm where sets of demographic parameters are simulated under a demographic model of two species (one "parent" and one allopolyploid "offspring" species). We utilize our method to analyze two different data sets: the first is 52 nuclear microsatellite markers from 8 A. suecica and 14 A. thaliana individuals; the second is a smaller data set of four (
0.5 kb each) sequenced fragments of nuclear DNA from each of four individuals of A. suecica, as well as from the two parental species A. thaliana and A. arenosa. We conclude that there has been a single origin of A. suecica between 12,000 and 300,000 years ago, although the time-of-origin estimates are sensitive to modeling assumptions.
| Materials and Methods |
|---|
|
|
|---|
Sequencing
DNA was extracted from fresh leaves using Qiagen miniprep kits. Four A. suecica individuals from four geographically distinct locations covering much of the species range were sampled (table 2). Four short fragments located on chromosome 4 in A. thaliana were polymerase chain reaction (PCR)-amplified in each of the four A. suecica individuals, using primers designed from the A. thaliana genome sequence (Table S1, Supplementary Material online). The primers had previously been used to amplify and sequence four fragments in 20 accessions of A. thaliana (see Nordborg and Bergelson (1999)
|
For the allotetraploid A. suecica individuals, the PCR products were cloned in order to allow individual alleles to be sequenced. Cloning was done with the pGEMT Easy Vector System using standard protocols. Clones were then selected randomly and used for an additional round of PCR amplification to provide template for direct sequencing of ten clones from each individual and fragment. Assuming no bias in cloning or PCR amplification, this gives a 99.9% chance of amplifying at least one copy of each of the two ancestral allelic types (A. arenosa and A. thaliana).
Once sequences from the parental genomes in A. suecica had been obtained by sequencing cloned fragments, we designed allele-specific primers to directly amplify and sequence each sequence type. This allowed us to distinguish variation introduced by cloning and PCR from natural variation.
Sequencing of cloned PCR fragments from A. suecica genomic DNA resulted in sequences that could easily be designated as thaliana-like or arenosa-like by comparison with sequences from the ancestral species. For each primer pair, there were always more clones with thaliana-like than with arenosa-like sequences, and for one fragment, no arenosa-like sequences at all were obtained even though an additional 10 clones were sequenced. The bias toward thaliana-like sequences was not surprising, as the primers used for PCR amplification were designed using the A. thaliana genome sequence.
Recombinant fragments were found in 6 out of the original 40 clones sequenced. Recombination between chromosomes of different parental origin is possible but seems unlikely, and allele-specific primers revealed that the recombinant fragments were indeed cloning artifacts.
In summary we successfully sequenced both the arenosa-like and the thaliana-like alleles for three out of the four fragments. For the fourth fragment, only a thaliana-like sequence was obtained (table 3). There was no variation within allelic classes, either within or between individuals.
|
Multiple alignments were done in Sequencher, with additional adjustments by hand. Genealogical trees were drawn using the phylogeny program PAUP 4.0.0b10 (Sinauer Associates, Sunderland, USA) or by hand (as the sequences were on the whole very similar). The trees were rooted using outgroup sequences when these were available. NCBI annotation for A. thaliana BAC clone F6N23 (accession number AF058919) was used to determine the positions of exons and introns in the fragments.
For the outgroup species, we used whichever species that was successfully amplified and sequenced. This turned out to be Cardamine hirsuta for two fragments and Capsella bursa-pastoris for a third. No outgroup was obtained for the fourth fragment (see table 3, below). The outgroup sequence was used to assign mutations to either the branch of A. thaliana or A. arenosa. Mutations that could not be assigned to any branch were divided evenly between the two branches. The mutation rates were then calculated by dividing the number of mutations on each branch by the product of the length of the sequence and the time since divergence of the two species.
Microsatellite Analysis
A total of eight A. suecica individuals from geographically distinct locations covering most of the species range were sampled in addition to 14 A. thaliana individuals (table 4). DNA was isolated from young greenhouse-grown offspring of the sampled plants using a Plant DNeasy kit from Qiagen according to the manufacturer's instructions. DNA integrity was checked by agarose gel electrophoresis and DNA concentration was determined using fluorometry with Pico Green (Molecular Probes, Carlsbad, Calif.). A total of 52 microsatellite loci were identified from the complete sequence of A. thaliana (version 2.0) using Tandem Repeat Finder (Benson 1999
) and appropriate primers (C. Lind-Halldén, C. Halldén, M. Jakobsson, and T. Säll, unpublished data) were designed with Oligo (v. 6.3). Primer sequences were subsequently analyzed for specificity using Blast. One primer for each pair was labeled with either HEX or 6-FAM. PCR assays were optimized with regard to annealing temperature and primer concentrations. The exact PCR conditions for each primer pair are available from the authors on request. PCR reactions contained 5 ng template DNA, 2 mM MgCl2, 100 µM dNTP (Amersham/Pharmacia, Uppsala, Sweden), 0.75 units AmpliTaq Gold polymerase (Applied Biosystems, Foster City, Calif.), 1 x PCR reaction buffer (Applied Biosystems), and 50900 nM of each primer in a final volume of 25 µl. PCR products were resolved using capillary electrophoresis run on ABI 310 or 3730 sequencers employing GeneScan and GeneMapper v.3.0 software (Applied Biosystems). The allele sizes of the microsatellite markers were determined in relation to a size marker, GeneScan-500 LIZ (Applied Biosystems). The 52 microsatellite loci were recruited from the whole nuclear A. thaliana genome with an overrepresentation of loci from chromosome 2.
|
Models and Rejection Algorithms
Because the sequence data set was much smaller, did not include a good sample of Swedish A. thaliana accessions, and no variation in A. suecica, we used slightly different models for the two data sets. For the sequence data we only consider polymorphism within A. suecica, and restrict ourselves to inferring the time-of-origin under the assumption of a single origin. Because no variation was detected in A. suecica the maximum likelihood estimate of the number of origins would be a single origin under any model. For the microsatellite data we model polymorphism within both species, and infer the number of founders as well.
Before we describe the explicit models used in this article let us introduce some notation. For a population with constant size N, the coalescent model states that the time Wj during which a sample of size n has j ancestors (
) is exponentially distributed with parameter j(j 1)/2. The times for different j are independent and we assume in the following that time is counted in units of N generations. The time to the most recent common ancestor T is given by
![]() | (1) |
![]() | (2) |
When the population size is variable, the times Wn, Wn1, ..., W2 are no longer independent. In the case when the population size grows exponentially (forward in time), the population size at time t (backwards in time) is
for some constant value of
. The conditional distribution of the time Wj for which there are exactly j ancestors, given that the time in which there are more than j ancestors is s, is (Tavaré et al. 1997
; eq. 21):
![]() | (3) |
in the case of exponential growth.
For a set of parameters
, the conditional probability density function of
given the data D is denoted f(
|D),
![]() | (4) |
(
) is the product of the probability density functions (pdf) of the prior distributions and P(G) is the probability of the genealogy that is given by the joint pdf (3). We will also assume that mutations occur independently in each branch with constant rate µ. Then, if a branch is of length w, the number of mutations, k, will be Poisson distributed with mean wNµ, Po(k, wNµ).
A Model of a Species Founded by One Individual, Model 1
For the sequence data we considered a model (Model 1) where A. suecica grew exponentially from a single individual
years (or generations: A. suecica appears to be annual) ago to a present effective size of N individuals (we tried different scenarios for growth, but they all gave similar results). Gene genealogies follow the standard coalescent distribution for a growing population (eq. 3). We assume that A. suecica is sufficiently highly selfing that the rate of coalescence is twice that of an equivalent outbreeding population (Nordborg and Donnelly 1997
). Neutral mutations were assumed to occur with probability µs per bp per generation. For
= {
, N, µs} and by substituting the data D with the number of segregating sites S, the conditional pdf (4) can be expressed as (Tavaré et al. 1997
):
![]() | (5) |
, T). The pdf
N(u) and 
(x) are the prior distribution of N, µs, and
. Because the sequenced fragments were located within a 70-kb region, the degree of selfing influences how we should treat the seven fragments (four thaliana-like and three arenosa-like). If A. suecica was completely selfing, then all seven fragments must have the same genealogy. If A. suecica was completely outcrossing, then the seven fragments would probably have almost independent genealogical histories depending on the recombination fraction between the fragments. We considered both extremes, complete selfing and complete outcrossing, but the results turned out to be almost identical so we only present the results from the former model here.
Estimating the Time-of-origin from Model 1
We assumed uniform prior distributions for the parameters. The posterior distribution given the data (eq. 5) was evaluated using a simple rejection algorithm (cf. Tavaré et al. 1997
):
Algorithm 1. Rejection algorithm for f(, N, µs|S = k).
- Simulate a parameter set
= {
, N, µs} from the prior distributions.
- Simulate a standard haploid (to reflect selfing) coalescent for a population that has undergone
generations of exponential growth from a single individual to reach a current size of N (eq. 3), and record the total length L* (eq. 2) of the branches up to the minimum of
and T (eq. 1) of the sample.
- Accept the parameter set with probability exp(NL*µsb), where b is the number of sequenced bases (thus only parameter sets without mutations are accepted), otherwise reject it.
The parameter values that result from accepted observations in this algorithm are then samples from the posterior distribution.
A Model of a Parent Species and an Offspring Species, Model 2
For the microsatellite data we considered a model (Model 2; fig. 1) of two species, t and s (A. thaliana and A. suecica), which have been completely separated from each other for
years (or generations). Both species have grown exponentially from 
years ago (0 <
1) to the present. The population size of species t was Nt(
) at time
(as well as at time 
) and has grown to a population size of Nt(0) at present. We define Ns(
) as the population size of species s a negligibly short time
s before (backwards in time) the actual founding event. This means that the n founders are allowed to increase to Ns(
) during
s, and the effect of drift during
s will be considered below. Before (backwards in time) this short initial phase drift is modeled using the coalescent. The population size of species s grows exponentially from Ns(
) to a population size of Ns(0) at present. After
years (backwards in time) the two populations were considered to be one population with a constant population size of Nt(
)+Ns(
). Gene genealogies from 0 to 
years ago follow the standard coalescent distribution for a growing population (eq. 3) and the gene genealogies after 
years follow the standard coalescent distribution with constant population size (see, e.g., Nordborg 2001
). We again assume that both species are sufficiently highly selfing that the rate of coalescence is twice that of an equivalent outbreeding population (Nordborg and Donnelly 1997
). Microsatellite loci were modeled according to a stepwise mutation model (Ohta and Kimura 1973
), that is, mutations were assumed to be Poisson distributed on each branch and a mutation is an addition or a subtraction of one repeat unit at a particular microsatellite locus. Mutations were assumed to occur with probability µm per generation and locus. Here we use another summary statistic, the average pairwise difference (S'), to simplify the data D. S' was computed using the following expression:
![]() | (6) |
![]() | (7) |
![]() | (8) |
was fixed to a predefined value between 0 and 1. For each of the parameters a uniform prior distribution was assumed.
|
Algorithm 2. Rejection algorithm for f(, Nt(0), Ns(0), Nt(
), Ns(
), µm|DS <
).
- Simulate a parameter set
= {
, Nt(0), Ns(0), Nt(
), Ns(
), µm} from the prior distributions.
- Repeat steps a and b for the total number of modeled loci, I:
- a. Simulate two standard haploid (to reflect selfing) coalescent for the two populations (species t and species s) until
years ago, where the population sizes were Nt(
) and Ns(
) (note that the population sizes were constant from
years to
years) that have undergone
generations of exponential growth (eq. 3) to reach a current sizes of Nt(0) and Ns(0).
- b. Simulate a standard haploid coalescent from
years with constant population size of Nt(
)+Ns(
) for all remaining lines at
.
- Accept the parameter set if DS <
, where
is a value > 0, otherwise reject it.
We chose
to equal ß(S't,obs+S's,obs) where ß was set to some small value (ß was chosen to be as small as possible and still accept reasonable number of samples from the algorithm in a reasonable amount of time. We used ß = 0.05, unless otherwise indicated). The parameter values that result from accepted observations in Algorithm 2 are then samples from the posterior distributions (8) of the parameters. Our model of an ancestral population that splits into two populations is relatively similar to the model of Hey (2005)
. The method described above is derived from Tavaré et al. (1997)
and Pritchard et al. (1999)
. Our method uses multi-locus data and infers the number of origins of a particular species, which are the major differences between our method and the method of Prichard et al. (1999). For additional similar approaches see Approximate Bayesian Computation methods (Beaumont, Zhang, and Balding 2002
).
A Model of n Founders, Model 2a
To estimate the number of founders we modified Model 2 to allow modeling a fixed number of founders, n slightly. The modified model, Model 2a, includes an extra parameter n and founders denoted j = 1, ..., n. All lines from species s that remain at time
are randomly assigned to the n founders with equal probability (1/n) for each locus i. We thus do not model drift during the establishment of the species during the first
s generations since the appearance of A. suecica (or, rather, the first A. suecica that could have been the ancestor of modern A. suecica: it is possible that earlier polyploidization events occurred that subsequently went extinct) explicitly. Our results can be viewed as being conditional on the composition of the species after it has become established. The maximum number of founders of species s is then n, but fewer founders are possible in this model. This is the only difference between Model 2 and Model 2a. The accepted parameter sets are then samples from the posterior distribution of n. We also used Model 2a to compare scenarios of different number of founders n, given the data, for fixed values of
, Nt(0), Ns(0), Nt(
), Ns(
), and
. Likelihoods were approximated by the acceptance rate of Algorithm 2.
An Unequal Contribution Model, Model 2b
Model 2a assumes that each founder j contributed equally to the founding of the species. A slightly more realistic model incorporates the possibility of unequal contributions of the n founders. We therefore made a modification of Model 2a that allows unequal contributions of different founders to the whole founding population. This was done by assigning the lineages of species s that remain at
to either of the j founders with probability pj, and
The probability pj can be considered to be the proportion that founder j contributed to the whole founding population. The founding population is then a common gene pool [of population size Ns(
)] at time
to which each founder j contributed pj. This founding population is then modeled in the same way as in Model 2. This model is henceforth known as Model 2b. We approximated the posterior distribution of pj given the data in the same way as for Model 2a.
| Results |
|---|
|
|
|---|
Sequence and Microsatellite Variation in A. thaliana, A. arenosa, and A. suecica
Sequence divergence between A. thaliana and A. arenosa is summarized in table 5. Fragment 1 contained a number of indels and could not be aligned unambiguously, so the alignment requiring the least number of indels was used. The exact number of indels and substitutions will depend on the alignment, but an alternative alignment seems unlikely to affect the general conclusions drawn. We also note that the substitution rates for this fragment are broadly in agreement with those for other fragments, suggesting that the alignment is reasonable. As expected, non-coding regions differed more than exons, in particular with respect to indels, which were completely absent from the coding regions. In the coding regions, the rate of synonymous substitution was higher than the non-synonymous rate (KS = 0.18 > KA = 0.04).
|
The divergence time between A. thaliana and A. lyrata has been estimated to be between 3.8 and 5.8 MYA by Kuittinen and Aguadé (2000)
The ancestral state of indels could be determined by comparison with the outgroup sequence, with the exception of a single bp indel on fragment 4 for which no outgroup was available, and several complex rearrangements on fragment 1. A. arenosa had only experienced a slight reduction in size, but the A. thaliana sequences appeared to be 4% shorter than the ancestral state, which is similar to the 5% reduction in the A. thaliana intron size compared to A. lyrata previously found by Wright, Lauga, and Charlesworth (2002)
.
A. suecica contains two nuclear genomes, one from A. thaliana and one from A. arenosa. Thus, when sequencing, we expected to find two quite different fragments. When we sequenced the four fragments in A. suecica, four thaliana-like sequences and three arenosa-like sequences were obtained (in total seven sequences; table 3). Figure 2 summarizes the observed pattern of variation as gene genealogies. The relationship between the A. suecica sequences and the ancestral species is evident. For each fragment, the four identical thaliana-like sequences fall within the cluster of actual A. thaliana sequences, and there are always A. thaliana sequences identical to the thaliana-like sequences. The arenosa-like sequences always cluster with A. arenosa, but they are different from the single A. arenosa allele we sequenced. It seems likely, however, that sequences identical to the four arenosa-like sequences would have been encountered, had we sequenced a larger sample of A. arenosa. Overall, A. suecica appears to have diverged little from its ancestral species, at least where homologous sequences exist. The fact that an arenosa-like allele was not successfully amplified from one of the four fragments could imply either that this allele does not amplify with our primers (although at least one A. arenosa allele does) or that the allele inherited from A. arenosa has been lost entirely for this fragment.
|
A total of 52 microsatellite loci were also typed for 14 A. thaliana and 8 A. suecica accessions. All 52 loci were variable within A. thaliana and 24 loci were variable in A. suecica. Because the microsatellites were chosen without any prior knowledge of variability in either of the two species, we can compare the variability of these two species using all microsatellite loci. A. thaliana is thus more variable than A. suecica considering all 52 loci (table 6). A. thaliana was still more variable, even if we only considered the 24 microsatellite loci that are variable within A. suecica. There were a total of 384 alleles among the 14 A. thaliana accessions and 106 among the eight A. suecica accessions. The comparable number of alleles for eight A. thaliana accessions would be 259.3 using a rarefaction correction (Kalinowski 2004
|
Number of Founders of A. suecica
We computed the likelihood of the Model 2a for 110 founders, using the microsatellite data. We also tested a wide range of values of the remaining parameters (µm, Ns(0), Ns(
), Nt(0), Nt(
),
, and
). However, because only two parameters,
and Ns(
), affected the likelihood for different number of founders significantly (results not shown), we limit our discussion to these parameters. The parameter
affected the likelihoods moderately. A small value of
(
< 1) resulted in a more evenly distributed likelihood as a function of
(a more detailed analysis of the effect of
follows below). However, the effect on the likelihood distributions was almost identical for any tested number of founders. Figure 3 shows the likelihood surfaces for 1, 2, and 10 founders using Model 2a with remaining parameters held constant. From fig. 3 it is clear that if Ns(
) is larger than about 500, we can be fairly certain that the species was founded by one event. However, there are clearly some parts of the [
, Ns(
)] parameter space in which the likelihoods are very similar for 1, 2, and 10 founders. If Ns(
) is less than about 500, the likelihoods are similar, except for small values of
. The number of founders cannot be determined in this part of the parameter space. The reason that we cannot infer the number of founders for small values of Ns (
) is that all A. suecica lines have an MRCA before (going backwards in time)
, regardless of the number of founders. Figure 4 shows the posterior distribution of the number of founders (n = 1, ..., 10) for fixed parameter values of the population sizes and a wide range of values of the time-of-origin
. Unless Ns(
) < 1000 there is strong support for one origin of A. suecica. There also appears to be little support for
> 40,000 for either value of n (see below for a more detailed estimation of the time-of-origin).
|
|
If there were more than one founder, these founders may have contributed unequal fractions to the founding population of A. suecica. We evaluated this scenario using Model 2b and Algorithm 2. Figure 5a shows the posterior distribution of the contributing fraction p2 from a second founder, given the model, for four different values of Ns(0) and Ns(
). The second founding event was arbitrarily chosen to be the founder that contributed less than 0.5 to the founding population. Note that this model is identical to Model 2a with n = 2 when p1 = p2 = 0.5, and it is identical to Model 2a with n = 1 when p2 = 0 (or p1 = 0). From fig. 5a we see that if Ns(
)
104, most of the weight is on small values of p2. It is thus unlikely that a second founder contributed more than 0.050.1 to the founding population of A. suecica given that Ns(
)
104. If Ns(
) = 103 we see the same tendency of smaller values of p2 being more likely than larger values of p2, whereas if Ns(
) = 102 we cannot distinguish small contributions to the founding population from a second founder. Figure 5b shows the posterior distribution of the contributing fraction p2 from a second founder given the data, and when the mutation rate is assumed to be between 104 and 103. From fig. 5b it is clear that the general conclusions about a second founding event are robust with regard to the mutation rate.
|
The complete lack of variation in the A. suecica sequence data (four thaliana-like fragments and three arenosa-like fragments) also suggests that our geographically diverse sample of A. suecica stem from a single polyploidization event. Given the level of variation in A. thaliana (the average pairwise difference per bp was 0.00719, 0.0736, 0.00157 and 0.00301 in the four fragments respectively; see also fig. 2), it is unlikely that two or more polyploidization events would have involved A. thaliana parents identical at all four fragments (although we note that some A. thaliana accessions studied are indeed identical in all four fragments).
Time-of-origin
The time-of-origin of the nuclear genome of A. suecica was estimated using Model 2, Algorithm 2, and the microsatellite data. We used uniform distributions for the six priors [µm,
, Nt(0), Ns(0), Nt(
), Ns(
)] and tested different types of growth scenarios (
= 1.0, 0.5, 0.25, or 0.1). The prior distribution of µm was uniform in the interval 105 to 103. These bounds were based on microsatellite mutation rate estimates from a variety of organisms (Schug, MacKay, and Aquadro 1997
; Vazquez et al. 2000
; Xu et al. 2000
; Vigouroux et al. 2002
). The origin of A. suecica is unlikely to be more recent than a thousand years ago, given its present geographical distribution and level of variation. It is also unlikely that the origin is more than a million years ago given a divergence between its parents of about 5 Myr. Thus, we used a uniform distribution between 103 and 106 as a prior for
. Strong evidence of population growth in A. thaliana has recently been reported (Nordborg et al. 2005
) and we used uniform priors for the past and present population size of A. thaliana to reflect this: Nt(0)
uniform(107, 109) and Nt(
)
uniform(105, 107). For the population sizes of A. suecica we used uniform priors, Ns(0)
uniform(105, 106) and Ns(
)
uniform(1, 104). In addition to these "reasonable" priors (runs 14 in table 7), several more or less unrealistic priors were tested (runs 511, table 7) to check whether the priors chosen above were justified or not as control.
|
Figure 6a shows the posterior distribution of
for four different growth scenarios and with a uniform prior distribution of µm between 105 and 103 using "reasonable" priors for all parameters (runs 14 in table 7). It is obvious from fig. 6a that the growth scenario affects the conclusions for the time-of-origin of A. suecica. If growth started close to the origin (
= 1) the posterior probability distribution of
is relatively narrow with a median of 45,000 years. If the growth started long after the origin (
= 0.25 or 0.1) the posterior distributions of
are quite wide with medians of 100,000 and 240,000. The probability that
< 12,000 is about 5% for
= 1 (where
is larger at the 5% limit for all
< 1, table 8). Figure 6b shows one minus the cumulative probability distribution for the same four growth scenarios. This figure illustrates that the upper limit of
also depends heavily on
. When
0.25 we find that the probability that
> 290,000 is about 5%. Figure 7 shows the heavy negative correlation of
and µm. For example, if µm > 5 x 105, then
is <100,000. Clearly, inference of
is heavily dependent on what is assumed about µm and the population growth, whereas the other parameter had much smaller effects on
.
|
|
|
Estimates of microsatellite mutation rates from plants are in the range of 104 to 103 mutations per generation (Vigouroux et al. 2002
would be much smaller;
> 44,000 is about 5% for
0.25 (fig. 8). Under this assumption on µm the probability distributions of
are shifted toward the lower end of the prior of
. For example, the median of
when
= 1, 0.5, and 0.25 is about 9,000, 13,000, and 22,000, respectively.
|
Given Model 1 described above, what does the fact that we observed no variation in the four sequenced fragments tell us about
, N, and µs? It is clear that the information in these data is limited: the maximum likelihood estimate of all three parameters is zero, and the best we can hope for is to rule out certain parameter combinations. A similar problem has been studied previously in the context of a human Y chromosome data set without variation (Tavaré et al. 1997
, N, and µs. To emphasize the fact that we can really only hope to rule out large values, one minus the cumulative distribution is shown. For example, the probability that
> 200,000 years is about 20% and the probability that
> 650,000 years is about 5%. It is clear that a wide range of values are plausible for all three parameters. It is also clear that there is almost no information about N: the cumulative posterior distribution is a straight line, which is the same as the prior. The only exception is for very small values of N. The reason for this is that, with the exception of very small N, observing no variation is extremely unlikely unless either
or µs is small enough. A small effective population size is not a good explanation for the data.
|
It is much more likely that the lack of variation reflects the relatively recent origin of A. suecica. It may seem disappointing that we cannot pin down this origin more narrowly than indicated in fig. 9. The reason for this is the uncertainty in µs. Similar to what was observed for the microsatellite model (fig. 7), the joint posterior density of
and µs show a very strong negative correlation between the two parameters (fig. 10). We originally assumed a uniform prior between 109 and 107 for µs based on mutation rates estimated above and earlier reports of mutation rates of Arabidopsis and Arabis species (e.g., Koch, Haubold, and Mitchell-Olds 2000
would change dramatically. As shown in fig. 11, the probability that
> 200,000 years is now about 7% instead of the 20% we obtained earlier. Figure 11 illustrates how strongly these kinds of evolutionary inferences depend on what is assumed. It is clear that with a high µs the estimate of
will be low and visa versa.
|
|
| Discussion |
|---|
|
|
|---|
Number of Origins
Unlike other forms of speciation, speciation by polyploidization is instantaneous. It is the result of a well-defined, discrete event (or events), and in contrast to, say, humans it is clearly meaningful to ask when and where the species arose. The new polyploid species is expected to be isolated from its parental species because crosses between a polyploid and its parental species are most likely infertile. However, repeated polyploidization events may result in a polyploid species with multiple origins. If alleles at a given locus of all existing individuals of a polyploid species have an MRCA before (going backwards in time) or at the time-of-origin, at least two scenarios are possible. These alleles either have an MRCA before the time-of-origin simply because of drift, or the species was founded by one polyploidization event. In addition to these two scenarios, the polyploid species may be the result of more that one polyploidization event of genetically very closely related parental individuals. This scenario would be very difficult if not impossible to distinguish genetically from a scenario of only one founding event. On the other hand, this case would lead to a polyploid species which, effectively, can be considered to be of a single origin because the founding individuals were genetically very similar. Distinguishing between single and multiple origins using polymorphism data is not trivial. In particular, the observation of monophyly at one or a small number of loci with respect to the parental species does not imply a single origin: alleles in the polyploid species could be monophyletic with respect to the parental species simply because they happen to coalesce more recently than the origins of the species (Rosenberg and Nordborg 2002
We investigated the number of origins of A. suecica using an explicit coalescent model. We found that the likelihood of different founder numbers was determined by two parameters: the time-of-origin,
, and the population size of A. suecica immediately after the founding, Ns(
). We concluded that there is power to infer a single founder if Ns(
)
500. It is worth clarifying what this means (cf. fig. 3). First, it is obvious that we cannot say anything about the number of founders if
is sufficiently large because all lineages will have coalesced with A. suecica. However, the low level of polymorphism in A. suecica argues against a very large
(unless the population size is tiny). Second, we are not interested in founder lines that went extinct in a small number of generations. To model multiple origins, we consider the composition of A. suecica immediately after the most recent founder event that resulted in an established population. We define the population size of A. suecica at this time Ns(
). Thus, the statement that there is power to infer a single founder if Ns(
)
500 means that we can reject multiple founders unless the total (effective) population size when multiple lineages last existed was less than 500. This seems unlikely given the high reproductive rate of Arabidopsis, and the fact polyploidization events are rare so that each founding event is likely to have been isolated (geographically) from other founding events for some time. An initial population size of 500 is quickly reached given the mode of reproduction of A. suecica (Säll et al. 2004
).
For simplicity, we do not explicitly model multiple origins at different points in time (our method tests for multiple origins at one point in time). Nonetheless, our results bear on this (rather more plausible scenario) as well. In particular, our "unequal contribution" model (Model 2b) can be thought of as modeling a situation where two founding populations were established at different points in time, and have reached different sizes. We then consider the future of the merged population. We believe that the general conclusions we reach about what can, and cannot be inferred about multiple founding events would be valid under more detailed models as well. However, an explicit model of multiple founding events at different points in time would require many assumptions about initial demography, and is beyond the scope of this paper.
In summary, then, we believe that the current genome of A. suecica stems from a single founding event. While it is possible that some variation was inherited from A. arenosa (and survived the original bottleneck that must have taken place), it seems likely that the genome inherited from A. thaliana was originally that of an homozygous inbred parent.
Time-of-origin
The coalescent method was also used to estimate the time-of-origin of A. suecica. This was done separately using microsatellite data and sequence data. The results were more sensitive to assumptions than those for the number of founders. The microsatellite data show that the origin probably is >12,000 years ago, unless the microsatellite mutation rate is high (if µm > 104, then we can only conclude that the origin most likely is >3,000 years ago). The upper limit of the origin is almost certainly <290,000 years ago (based on the microsatellite data). Figures 6, 8, and 11 illustrate how strongly these kinds of evolutionary inferences depend on the assumptions. If we are willing to assume that the mutation rates are in the higher end of the intervals (µm > 104 and µs > 108), then the upper bounds on the time-of-origins will be much lower,
< 44,000 for the microsatellites and
< 200,000 for the sequence data. An advantage of the Bayesian approach taken here is that it provides a natural way of making uncertainty about parameters explicit. Our estimate of µs based on divergence between A. thaliana and A. arenosa is about 6 x 109, but this estimate has considerable uncertainty. In our simulations, we assumed a uniform distribution for µs: we could have chosen some other distribution and obtained a different posterior distribution. This may seem arbitrary, but the correct conclusion is that there is not enough information in the sequence data. It should be remembered, however, that most estimates of divergence times in evolutionary biology treat the mutation rate as constant, completely ignoring the uncertainty.
There are strong genetic indications of growth for the A. thaliana population (Nordborg et al. 2005
) and the A. suecica population must necessarily have grown since its origin. It is plausible that the growth has occurred in connection with the retreat of the ice cover of the last glaciation because both these species are presently found in areas that were completely or partly covered by ice. For scenarios when the growth started relatively late (
= 0.25 or 0.1) we find that the time-of-origin (and thus the real starting time of growth) is further back in time than for scenarios when the growth started relatively early (
= 0.5 or 1; figs. 6 and 8). Thus the real starting time of the growth for the different growth scenarios tends to be shifted toward a common point in time.
Estimates of the age of the most recent common ancestor of the A. suecica chloroplast genome (Säll et al. 2003
; Jakobsson et al. 2005) are consistent with the estimates obtained here.
Place of Origin
Because the present range of A. suecica was covered by ice during the last glaciation (Andersen and Borns 1997
), and because we estimate the origin to be >12,000 years ago, the species have probably originated somewhere south of Sweden and Finland and then migrated to its present range. If this is the case, then, because A. suecica is not found outside Sweden and Finland today, the species must have become extinct in the areas where it originated.
The fragment genealogies in fig. 2 reveal that the thaliana-like sequence from A. suecica always falls within clades of sequences from A. thaliana. As expected given the level of linkage disequilibrium in A. thaliana (Nordborg et al. 2002
), the thaliana-like sequence clusters with different A. thaliana accessions in different fragments. However, it is always identical to the two accessions Lund and Vimmerby from southern Sweden as well as the accession Pu-2-8 from the Czech Republic. Studies of cpDNA have found that A. suecica clusters with A. thaliana originating from southern Sweden and central Europe (Säll et al. 2003
; Jakobsson et al. 2005
). In subsequent studies, we found many A. thaliana accessions from southern Sweden, as well as some from middle Sweden, and central and southern Europe, that carry a sequence identical to the thaliana-like sequences at fragments 1 and 4 (Hagenblad et al. 2004
). Moreover, all A. suecica accessions cluster together when the accessions in the microsatellite data are analyzed using a Neighbor-Joining algorithm (Saitou and Nei 1987
; data not presented). The A. thaliana accessions clustering closest to the A. suecica branch were from southern and central Sweden (T160, T340 and T350), Finland (T81), western Russia (T93), Poland (Lip-0), and Lithuania (Wil-0). These observations suggest that A. suecica migrated to Sweden and Finland in the wake of the retreating ice together with A. thaliana individuals related to its ancestor followed by extinction of the species outside Sweden and Finland.
| Supplementary Material |
|---|
|
|
|---|
The new sequences reported in this paper are available from GenBank with accession numbers AY319334AY319368. A. thaliana sequences are available from GenBank with accession numbers AY092437AY092456, AY092477AY092496, AY092517AY092536, and AY092637AY092656. The above data and Table S1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
|
|
|---|
The authors wish to thank H. Ellegren for letting us use his sequencing equipment at an early stage of this project and N. Arnheim for suggesting the use of allele-specific PCR. We would also like to thank D. Charlesworth, M. Uyenoyama, and two anonymous reviewers for comments on the manuscript. S.T. is a Royal Society Wolfson Research Merit Award holder.
| Footnotes |
|---|
Marcy Uyenoyama, Associate Editor
| References |
|---|
|
|
|---|
Andersen, B., and H. Borns Jr. 1997. The ice age world. Scandinavian University Press, Oslo.
Benson, G. 1999. Tandem repeats finder: a program to analyse DNA sequences. Nucleic Acids Res. 27:573580.
Beaumont, M. A., W. Y. Zhang, and D. J. Balding. 2002. Approximate Bayesian computation in population genetics. Genetics 162:20252035.
Chen, Z. J., L. Comai, and C. S. Pikaard. 1998. Gene dosage and stochastic effects determine the severity and direction of uniparental ribosomal RNA gene silencing (nucleolar dominance) in Arabidopsis allopolyploids. Proc. Natl. Acad. Sci. USA 95:1489114896.
Comai, L., A. Tyagi, K. Winter, S. Holmes-Davis, R. Reynolds, Y. Stevens, and B. Byers. 2000. Phenotypic instability and rapid gene silencing in newly formed Arabidopsis allotetraploids. Plant Cell 12:15511567.
Hagenblad, J., and M. Nordborg. 2002. Sequence variation and haplotype structure surrounding the flowering time locus FRI in Arabidopsis thaliana. Genetics 161:289298.
Hagenblad J., C. L. Tang, J. Molitor, J. Werner, K. Zhao, H. G. Zheng, P. Marjoram, D. Weigel, and M. Nordborg. 2004. Haplotype structure and phenotypic associations in the chromosomal regions surrounding two Arabidopsis thaliana flowering time loci. Genetics 168:16271638.
Heenan, P. B., A. D. Mitchell, and M. Koch. 2002. Molecular systematics of the New Zealand Pachycladon (Brassicaceae) complex: generic circumscription and relationships to Arabidopsis sens. lat. and Arabis sens. lat. N. Z. J. Bot. 40:543562.
Hey, J. 2005. On the number of new world founders: a population genetic portrait of the peopling of the Americas. PLoS Biol. 3:965975.
Hultgård, U. 1987. Parnassia palustris L. in Scandinavia. Symb. Bot. Ups. 28:1128.
Hultén, E. 1971. Atlas of the distribution of vascular plants in northwestern Europe. Generalstabens litografiska anstalts förlag, Stockholm.
Hylander, N. 1957. Cardaminopsis suecica (Fr.) Hiit., a northern amphidiploid species. Bull. Jard. Bot. Brux. 27:591604.
Jakobsson, M., T. Säll, C. Lind-Halldén, and C. Halldén. 2005. The evolutionary history of the common chloroplast genome of Arabidopsis thaliana and A. suecica. in Genome Divergence in Progress - a population genetic analysis of the allopolyploid Arabidopsis suecica and its maternal parent A. thaliana. Doctoral dissertation (M. Jakobsson), Lund University, Lund, Sweden.
Kalinowski, S. T. 2004. Counting alleles with rarefaction: private alleles and hiearchical sampling design. Conserv. Genet. 5:539543.[CrossRef]
Kamm, A., I. Galasso, T. Schmidt, and J. Heslop-Harrison. 1995. Analysis of a repetitive DNA family from Arabidopsis arenosa and relationships between Arabidopsis species. Plant Mol. Biol. 27:853862.[CrossRef][Web of Science][Medline]
Koch, M., B. Haubold, and T. Mitchell-Olds. 2000. Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis and related genera (Brassicaceae). Mol. Biol. Evol. 17:14831498.
Ku, H., T. Vision, J. Liu, and S. Tanksley. 2000. Comparing sequenced segments of the tomato and Arabidopsis genomes: largescale duplication followed by selective gene loss creates a network of synteny. Proc. Natl. Acad. Sci. USA 97:91219126.
Kuittinen, H., and M. Aguadé. 2000. Nucleotide variation at the Chalcone isomeras locus in Arabidopsis thaliana. Genetics 155:863872.
Mesicek, J. 1970. Chromosome counts in Cardaminopsis arenosa Agg. (Cruciferae). Preslia 42:225248.
Mummenhoff, K., and H. Hurka. 1994. Subunit polypeptide composition of Rubisco and the origin of allopolyploid Arabidopsis suecica (Brassicaceae). Biochem. Syst. Ecol. 22:807812.[CrossRef]
Nordborg, M. 1998. On the probability of Neanderthal ancestry. Am. J. Hum. Genet. 63:12371240.[CrossRef][Web of Science][Medline]
. 2001. Coalescent theory. Pp. 179212 in D. J. Balding, M. J. Bishop, and C. Cannings, eds. Handbook of statistical genetics. John Wiley & Sons, Inc., Chichester, UK.
Nordborg, M., and J. Bergelson. 1999. The effect of seed and rosette cold treatment on germination and flowering time in some Arabidopsis thaliana Brassicaceae ecotypes. Am. J. Bot. 86:470475.
Nordborg, M., J. Borevitz, J. Bergelson et al. (12 co-authors). 2002. The extent of linkage disequilibrium in Arabidopsis thaliana. Nat. Genet. 30:190193.[CrossRef][Web of Science][Medline]
Nordborg M., T. Hu, Y. Ishino et al. (23 co-authors). 2005. The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol. 3:12891299.
Nordborg M., and P. Donnelly. 1997. The coalescent process with selfing. Genetics 146:11851195.[Abstract]
O'kane, S., B. Schaal, and I. Al-Shehbaz. 1996. The origin of Arabidopsis suecica (Brassicaceae) as indicated by nuclear rDNA sequences. Syst. Bot. 21:559566.
Osborn, T., J. Pires, J. Birchler et al. (11 co-authors). 2003. Understanding mechanisms of novel gene expression in polyploids. TIG 19:141147.
Ohta, T., and M. Kimura. 1973. A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population. Genet. Res. 22:201204.[Web of Science][Medline]
Pontes O., N. Neves, M. Silva, M. S. Lewis, A. Madlung, L. Comai, W. Viegas, and C. S. Pikaard. 2004. Chromosomal locus rearrangements are a rapid response to formation of the allotetraploid Arabidopsis suecica genome. Proc. Natl. Acad. Sci. USA 101:1824018245.
Price, R. A., I. A. Al-Shebaz, and J. D. Palmer. 1994. Systematic relationships of Arabidopsis: a molecular and morphological perspective in Arabidopsis. Pp. 719 in E. Meyerowitz and C. Somerville, eds. Cold Spring Harbour Laboratory Press, New York.
Pritchard, J. K., M. T. Seielstad, A. Perez-Lezaun, and M. W. Feldman. 1999. Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol. Biol. Evol. 16:17911798.[Abstract]
Rosenberg, N. A., and M. Nordborg. 2002. Genealogical trees, coalescent theory, and the analysis of genetic polymorphisms. Nature Rev. Genet. 3:380390.[CrossRef][Web of Science][Medline]
Saitou, N., and M. Nei. 1987. The neighbor-joining methoda new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406425.[Abstract]
Säll, T., M. Jakobsson, C. Lind-Halldén, and C. Halldén. 2003. Chloroplast DNA indicates a single origin of the allotetraploid Arabidopsis suecica. J. Evol. Biol. 16:10191029.[CrossRef][Web of Science][Medline]
Säll, T., C. Lind-Halldén, M. Jakobsson, and C. Halldén. 2004. Mode of reproduction in Arabidopsis suecica. Hereditas 141:313317.[CrossRef][Web of Science][Medline]
Schug, M. D., T. F. C. MacKay, and C. F. Aquadro. 1997. Low mutation rates of microsatellite loci in Drosophila melanogaster. Nat. Genet. 15:99102.[CrossRef][Web of Science][Medline]
Soltis, D., and P. Soltis. 1993. Molecular data and the dynamic nature of polyploidy. Crit. Rev. Plant Sci. 12:243273.
Song, K., P. Lu, K. Tang, and T. C. Osborn. 1995. Rapid genome change in synthetic polyploids of Brassica and its implications for polyploid evolution. Proc. Natl. Acad. Sci. USA 92:77197723.
Stebbins, G. L. 1971. Chromosomal evolution in higher plants. E. Arnold, London.
Tavaré, S., D. J. Balding, R. C. Griffiths, and P. Donnelly. 1997. Inferring coalescence times from DNA sequence data. Genetics 145:505518.[Abstract]
Vazquez, F., T. Perez, J. Albornoz, and A. Dominguez. 2000. Estimation of the mutation rates in Drosophila melanogaster. Genet. Res. 76:232326.
Vigouroux Y, J. S. Jaqueth, Y. Matsuoka, O. S. Smith, W. F. Beavis, J. S. C. Smith, and J. Doebley. 2002. Rate and pattern of mutation at microsatellite loci in maize. Mol. Biol. Evol. 19:12511260.
Wendel, J. 2000. Genome evolution in polyploids. Plant Mol. Biol. 42:225249.[CrossRef][Web of Science][Medline]
Wright, S., B. Lauga, and D. Charlesworth. 2002. Rates and patterns of molecular evolution in inbred and outbred Arabidopsis. Mol. Biol. Evol. 19:14071420.
Xu, X., M. Peng, Z. Fang, and X. Xu. 2000. The direction of microsatellite mutations is dependent upon the allele length. Nat. Genet. 24:396399.[CrossRef][Web of Science][Medline]
Yang, Y.-W., K.-N. Lai, P.-Y. Tai, D.-P. Ma, and W.-H. Li. 1999. Molecular phylogenetic studies of Brassica, Rorippa, Arabidopsis and allied genera based on the internal transcribed spacer region of 18S25S rDNA. Mol. Phylogenet. Evol. 13:455462.[CrossRef][Web of Science][Medline]
![]()
CiteULike
Connotea
Del.icio.us What's this?
This article has been cited by other articles:
![]() |
M. Ha, E.-D. Kim, and Z. J. Chen Duplicate genes increase expression diversity in closely related species and allopolyploids PNAS, February 17, 2009; 106(7): 2295 - 2300. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. M. Hazzouri, A. Mohajer, S. I. Dejak, S. P. Otto, and S. I. Wright Contrasting Patterns of Transposable-Element Insertion Polymorphism and Nucleotide Diversity in Autotetraploid and Allotetraploid Arabidopsis Species Genetics, May 1, 2008; 179(1): 581 - 592. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. M. Fortune, N. Pourtau, N. Viron, and M. L. Ainouche Molecular phylogeny and reticulate origins of the polyploid Bromus species from section Genea (Poaceae) Am. J. Botany, April 1, 2008; 95(4): 454 - 464. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Chen, M. Ha, E. Lackey, J. Wang, and Z. J. Chen RNAi of met1 Reduces DNA Methylation and Induces Genome-Specific Changes in Gene Expression and Centromeric Small RNA Accumulation in Arabidopsis Allopolyploids Genetics, April 1, 2008; 178(4): 1845 - 1858. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. G. B. Blum and N. A. Rosenberg Estimating the Number of Ancestral Lineages Using a Maximum-Likelihood Method Based on Rejection Sampling Genetics, July 1, 2007; 176(3): 1741 - 1757. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. A. Koch and M. Matschinger Evolution and genetic differentiation among relatives of Arabidopsis thaliana PNAS, April 10, 2007; 104(15): 6272 - 6277. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. Slotte, A. Ceplitis, B. Neuffer, H. Hurka, and M. Lascoux Intrageneric phylogeny of Capsella (Brassicaceae) and the origin of the tetraploid C. bursa-pastoris based on chloroplast and nuclear DNA sequences Am. J. Botany, November 1, 2006; 93(11): 1714 - 1724. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||





















