

MBE Advance Access originally published online on September 21, 2007
Molecular Biology and Evolution 2007 24(11):2515-2524; doi:10.1093/molbev/msm197

© The Author 2007. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Population and Evolutionary Dynamics of Helitron Transposable Elements in Arabidopsis thaliana

Jesse D. Hollister and Brandon S. Gaut

Department of Ecology and Evolutionary Biology, University of California, Irvine

E-mail: bgaut{at}uci.edu.


    Abstract
Helitrons, a recently discovered superfamily of DNA transposons that capture host gene fragments, constitute up to 2% of the Arabidopsis thaliana genome. In this study, we identified 565 insertions of a family of nonautonomous Helitrons, known as Basho elements. We aligned subsets of these elements, estimated their phylogenetic relationships, and used branch lengths to yield insight into the age of each Basho insertion. The age distribution suggests that 87% of Bashos inserted within 5 Myr, subsequent to the divergence between A. thaliana and its sister species Arabidopsis lyrata. We screened 278 of these insertions for their presence or absence in a sample of 47 A. thaliana accessions. With both phylogenetic and population frequency data, we investigated the effects of gene density, recombination rate, and element length on Basho persistence. Our analyses suggested that longer Basho copies are less likely to persist in the genome, consistent with selection against the deleterious effects of ectopic recombination between Basho elements. Furthermore, we determined that 39% of Basho elements contain fragments of expressed protein-coding genes, but all of these fragments were explained by only 5 gene-capture events. Overall, the picture of A. thaliana Helitron evolution is one of rapid expansion, relatively few gene-capture events, and weak selection correlated with element length.

Key Words: helitron • exon shuffling • ectopic recombination


    Introduction
The abundance and distribution of transposable elements (TEs) in host genomes depend on several factors, including insertion-site specificity, transposition rate, and selection against deleterious effects (Jakubczak et al. 1991; Nuzhdin 1999; Devos et al. 2002). Selection has been proposed to limit TE copy numbers in at least 2 ways. The first is "gene disruption," in which strong selection acts against TE insertions into genes or regulatory regions (Finnegan 1992; Nuzhdin 1999). This model predicts that TEs should accumulate in gene-poor regions, which offer "safe havens" for insertion. Typically the gene-disruption model has been investigated by analysis of genomic data. For example, the abundance of TEs in Arabidopsis thaliana is negatively correlated with gene density, and this observation was interpreted to be consistent with selection against gene disruption (Wright et al. 2003). The second is the "ectopic exchange" model. In this case, selection acts against the deleterious effects of recombination between elements at nonhomologous sites on the same or different chromosomes. In addition to predicting that TEs should accumulate in regions of low recombination, this model also predicts that selection should act more strongly against large TEs and TEs at high copy number because both increase the probability of ectopic recombination (Montgomery et al. 1987; Montgomery et al. 1991; Dray and Gloor 1997; Petrov et al. 2003). The distribution of TEs in Drosophila melanogaster is consistent with the ectopic exchange model because the genomic recombination rate, rather than the density of genes, correlates negatively with both the genomic distribution of TEs and their population frequencies (Bartolomé et al. 2002; Maside et al. 2005).

Genome sizes and TE complements covary (Kidwell 2002). Accordingly, one might expect selection on the same TE family to differ substantially among host genomes. The self-fertilizing angiosperm A. thaliana and its outcrossing sister species Arabidopsis lyrata, which share a common ancestor ~5 MYA (Koch et al. 2000), provide a particularly interesting contrast. Population frequency data for Ac-like III TEs indicate that these elements are present at intermediate to high frequency within A. thaliana populations, suggesting that element insertions are not discernibly subject to purifying selection. In A. lyrata, however, Ac-like TEs are present in lower frequencies, on average, than expected under neutrality, consistent with purifying selection acting against new insertions. The differences between these species may be due to a decreased efficacy of selection in inbreeding A. thaliana (Wright et al. 2001). Alternatively, Ac-like elements may simply be less deleterious in A. thaliana because heterozygosity is low for selfing species, and deleterious ectopic recombination events are more likely to occur between heterozygous TE loci (Montgomery et al. 1991).

The study of Ac-like elements in A. thaliana and A. lyrata demonstrates the value of TE frequency data, but such data are rare. Here, we study the evolutionary dynamics of Helitron TEs in the A. thaliana genome. Helitrons are a recently discovered family of TEs that were identified computationally in the genomes of Caenorhabditis elegans, rice, and Arabidopsis (Kapitonov and Jurka 2001). A sample of nonautonomous Helitrons had been previously discovered in a 17-Mb segment of the Arabidopsis genome and were designated Basho elements (Li et al. 2000). Basho Helitrons comprise about 2% of the A. thaliana genome or one-fifth of the total transposable element DNA (Lai et al. 2005). Basho elements exhibit strong conservation of the 3' terminal sequence CTAG, which is preceded by an 18-bp palindromic sequence capable of producing a hairpin loop. The 5' terminus is less conserved but usually consists of the sequence CHH, where H denotes T, A, or C (Li et al. 2000).

Helitrons have been shown to capture gene fragments by an unknown mechanism possibly associated with their rolling circle (RC) replication process, and may be important agents of gene evolution by exon shuffling (Bennetzen 2005). Such shuffling can be extensive (Morgante et al. 2005); for example, gene movement by Helitrons is responsible for noncolinearity between maize accessions (Lai et al. 2005). However, thus far there is little evidence that Helitrons have contributed directly to gene function, and the rate of capture and proliferation of protein-coding sequence remains unclear.

In this study, we computationally identify 565 Basho insertions in the published A. thaliana genome sequence. At 278 of these loci, we screen a panel of 47 A. thaliana ecotypes for presence/absence of Basho elements. We integrate these population frequency data with genomic and phylogenetic information to identify factors influencing the pattern of proliferation of the Basho family of Helitrons in the A. thaliana genome. We investigate how gene distribution, recombination rate along chromosomes, and Basho insertion size affect both the spread of these TEs and their persistence over time. We also document gene capture, exon shuffling, and contribution to novel coding sequence by Basho insertions segregating in our panel, exemplifying their role as agents of genome evolution.


    Materials and Methods
Basho Sequence Retrieval and Phylogenetic Analysis
We used the sample of Basho sequences from Li et al. (2000) as a BlastN query against the A. thaliana genome (release 4). Following Li et al. (2000), the sample of Basho elements was divided into 7 groups or subfamilies, each consisting of elements sharing ~80% sequence identity. The Blast was performed with an E-value cutoff of 1E-20, with no filtering of repetitive sequences. If the Blast hits overlapped in location, they were merged into a single sequence, representing a discrete Helitron element. The Blast identified numerous sequences homologous to one or more of the Basho elements used in the query and possessing highly conserved 3' termini. When these were combined with the sequences in the query, they comprised a database of 565 sequences from distinct genomic locations. We recorded the start and end positions of each element on the chromosome in which it was identified and used this information to calculate the size of each element.
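The merging of overlapping Blast hits into discrete elements can be illustrated with a minimal Python sketch. It is not the procedure actually used in the study; the function name, the tuple layout, and the use of 1-based inclusive coordinates are assumptions made for illustration.

```python
from collections import defaultdict


def merge_hits(hits):
    """Merge overlapping hits on the same chromosome into single elements.

    hits: iterable of (chromosome, start, end) tuples, assumed here to use
    1-based inclusive coordinates (an assumption, not stated in the text).
    Returns a sorted list of (chromosome, start, end, size) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in hits:
        # Minus-strand hits may be reported with start > end; normalize them.
        by_chrom[chrom].append((min(start, end), max(start, end)))

    merged = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if cur_start is None:
                cur_start, cur_end = start, end
            elif start <= cur_end:
                # Overlap with the current element: extend it.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end, cur_end - cur_start + 1))
                cur_start, cur_end = start, end
        if cur_start is not None:
            merged.append((chrom, cur_start, cur_end, cur_end - cur_start + 1))
    return merged


# Hypothetical coordinates, for illustration only.
example_hits = [("chr1", 100, 500), ("chr1", 450, 900),
                ("chr1", 2000, 2300), ("chr2", 50, 10)]
elements = merge_hits(example_hits)
```

The last field of each tuple is the element size implied by its start and end positions, which is the quantity used later as insertion size.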

Sequences in each of the 7 Basho subfamilies were aligned using the ClustalW multiple alignment program. Alignments were manually adjusted using BioEdit version 7.0 (Hall 1999), and element lengths were confirmed by identification of conserved terminal sequences. We built Neighbor-Joining phylogenies (Kimura 2-parameter substitution method; 1,000 bootstrap replicates) for each subfamily, using the Molecular Evolutionary Genetics Analysis (MEGA) 3.1 software package (Kumar et al. 2004). Because Basho sequences are rife with insertions and deletions, pairwise deletion was used for phylogeny reconstruction. MEGA's "display branch length" feature was used to ascertain terminal branch lengths (TBLs) for each element.
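For readers unfamiliar with the distance measure, the Kimura 2-parameter distance with pairwise deletion can be sketched as follows. This is an illustrative reimplementation of the standard formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences; it is not MEGA's code, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Pairwise deletion: any alignment column in which either sequence has
    a gap or an ambiguous base is skipped for this pair only.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: drop this column for the pair
        compared += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1    # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical sequences: one transition in ten comparable sites.
d = k2p_distance("ACGTACGTAC", "ACGTACGTAT")
# Same pair with one gapped column, which pairwise deletion skips.
d_gapped = k2p_distance("AC-TACGTAC", "ACGTACGTAT")
```

With pairwise deletion, a gapped column reduces the number of compared sites for that pair only, which is why it suits indel-rich alignments such as these.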

Identification of Gene Fragments within Basho Elements
We used the complete Basho database in a BlastN query against 2 A. thaliana genomic databases. The first Blast was against the coding sequence (CDS) database from The Institute for Genomic Research A. thaliana genome release 5. This database consists of only protein-coding DNA sequences. The second Blast was against National Center for Biotechnology Information's UniGene Build #53, an assembly of all the expressed sequence tags from dbEST as of 21 May 2006 (http://www.ncbi.nlm.nih.gov/gquery/). Both Blasts were performed with an E-value cutoff of 1E-05. To ensure the validity of Blast results, hits between Helitrons and expressed sequence from the UniGene Blast were matched to their respective CDS and/or genomic sequences. The genomic locations of Basho elements containing host gene sequence were verified from our database and GenBank chromosome sequences.
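The bookkeeping behind this search, filtering hits by E-value and then asking how many distinct genes account for them, might look like the sketch below. It assumes hits are available as tab-delimited lines in the 12-column layout produced by BLAST+ with -outfmt 6 (query ID in column 1, subject ID in column 2, E-value in column 11); the study itself may have used a different output format, and all identifiers in the example are invented.

```python
from collections import defaultdict


def genes_hit_by_elements(tabular_lines, max_evalue=1e-5):
    """Group query elements by the subject gene they hit.

    tabular_lines: iterable of tab-delimited strings, assumed to follow the
    12-column BLAST+ tabular layout (an assumption for this sketch).
    Returns {subject_id: set of query_ids} for hits passing the cutoff.
    """
    genes = defaultdict(set)
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            genes[subject].add(query)
    return dict(genes)


# Invented identifiers and scores, for illustration only.
example_lines = [
    "elemA\tgeneX\t93.0\t120\t8\t0\t1\t120\t300\t419\t2e-30\t150",
    "elemB\tgeneX\t91.5\t118\t10\t0\t5\t122\t300\t417\t5e-25\t130",
    "elemC\tgeneY\t88.0\t40\t5\t0\t1\t40\t10\t49\t0.003\t35",
]
captured = genes_hit_by_elements(example_lines)
```

Counting the keys of the returned dictionary gives the number of distinct genes behind the hits, while the size of each set gives the number of elements carrying a fragment of that gene.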

Occupation Frequency Survey
We selected a population sample of 48 accessions from the Nordborg panel of 96 (Nordborg et al. 2005). Individuals were selected to represent a global sample of Arabidopsis populations (supplementary table S1, Supplementary Material online). Accessions were grown from seed obtained from The Arabidopsis Information Resource (www.arabidopsis.org). Genomic DNA was extracted from leaf tissue using the Qiagen DNeasy plant extraction kit (www.qiagen.com).

Using the primer3 program, we designed 3 primers for each Basho locus. Two were complementary to genomic sequences flanking the 3' and 5' termini of each element (Left-Flanking [LF] and Right-Flanking [RF] primers), and an additional internal primer was complementary to a sequence in the first 400 bp of the element itself (TE primer). We ensured that the polymerase chain reaction (PCR) products resulting from amplification of the LF and RF primers differed in size from those resulting from RF–TE amplification. The 3 primers were used in each PCR assay in each of the 47 accessions. The presence of a Basho element at each locus was indicated by at least 1 of 2 distinct PCR products, from LF–RF and RF–TE amplification, respectively. Absence of the Helitron was indicated by a single, smaller band from amplification of the flanking primers. We recorded the occupation frequency of Basho at a given locus as the fraction of individuals in the population possessing an element against the total number of successful PCRs (reflecting both element presence and absence). Columbia was included as a 48th member of the panel and treated as a positive control for Basho presence. Occupation frequency was calculated on the remaining 47 accessions. Note that the removal of Columbia does not fully correct for ascertainment bias caused by sampling only TEs present in the published A. thaliana Columbia genome. Ascertainment biases likely resulted in an undersampling of very low frequency insertions. Proper ascertainment correction would, however, require a realistic demographic model that invokes several assumptions about population history, but the demographic history of A. thaliana is unclear (Nordborg et al. 2005).
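The scoring rule and the occupation frequency calculation can be written out as a short sketch. The function names and the three boolean band indicators are hypothetical; in particular, treating an accession that shows both an element-specific product and the small empty-site band as "present" is an assumption of this sketch rather than a rule stated above.

```python
def score_accession(occupied_flank_band, te_band, empty_flank_band):
    """Score one accession at one locus from its PCR bands.

    occupied_flank_band: large LF-RF product spanning the element
    te_band:             RF-TE product from the internal primer
    empty_flank_band:    small LF-RF product from an empty site
    Returns True (present), False (absent), or None (failed PCR).
    """
    if occupied_flank_band or te_band:
        return True
    if empty_flank_band:
        return False
    return None  # no product at all: PCR failure, not scored


def occupation_frequency(calls):
    """Fraction of successfully scored accessions carrying the element."""
    scored = [c for c in calls if c is not None]
    if not scored:
        return None
    return sum(scored) / len(scored)


# Hypothetical band patterns for four accessions at one locus.
calls = [
    score_accession(False, True, False),   # present
    score_accession(True, True, False),    # present
    score_accession(False, False, True),   # absent
    score_accession(False, False, False),  # failed PCR
]
freq = occupation_frequency(calls)
```

Failed reactions are excluded from the denominator, matching the definition of occupation frequency as a fraction of successful PCRs.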

Gene Density, Recombination Rate, and Statistical Analysis
To estimate gene density around Basho loci, we calculated the number of genes in a 1.0-Mb window around each locus, using the positions (in base pairs) of all annotated genes from the A. thaliana GenBank chromosome files (release 4). We also calculated the distance from each Basho locus to the nearest protein-coding sequence. To estimate recombination rate, we used polynomial functions from Zhang and Gaut (2003) that describe the fit of physical to genetic distance for each of the 5 A. thaliana chromosomes. The first derivative of the polynomial function for each chromosome was used to obtain a point estimate of recombination rate at each Helitron locus based on its physical position (in megabase pairs).
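Both genomic covariates are simple to compute once the inputs are in hand, as the following sketch shows. The polynomial coefficients and gene positions are invented for illustration and are not those of Zhang and Gaut (2003); the sketch also assumes that the window is centered on the locus and that each gene is represented by a single coordinate.

```python
from bisect import bisect_left, bisect_right


def polynomial_derivative(coeffs):
    """Coefficients of the first derivative; coeffs[i] multiplies x**i."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def evaluate(coeffs, x):
    """Evaluate a polynomial at x using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def recombination_rate(coeffs, position_mb):
    """Slope of genetic distance (cM) against physical position (Mb)."""
    return evaluate(polynomial_derivative(coeffs), position_mb)


def genes_in_window(sorted_gene_positions, locus_bp, window_bp=1_000_000):
    """Count genes within a window centered on the locus (assumed layout)."""
    half = window_bp // 2
    lo = bisect_left(sorted_gene_positions, locus_bp - half)
    hi = bisect_right(sorted_gene_positions, locus_bp + half)
    return hi - lo


# Hypothetical fit: cM = 2.0 * x + 0.5 * x**2, with x in Mb.
rate = recombination_rate([0.0, 2.0, 0.5], 3.0)
# Hypothetical gene coordinates (bp) on one chromosome.
density = genes_in_window([100_000, 600_000, 1_200_000, 2_000_000], 1_000_000)
```

Because the rate is the slope of the fitted curve, its units are centimorgans per megabase at the position of the locus.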

To identify genomic correlates with population frequency of Basho insertions, we performed a multiple regression of population frequency on TBL, insertion size, distance to nearest gene, recombination rate, and gene density. We performed an additional regression of TBL on insertion size, distance to nearest gene, recombination rate, and gene density. Regressions were performed using the statistical program R version 2.2.1. The separate variance contributions of independent variables to the correlation coefficient were calculated by dividing the square of their standardized regression coefficients by the multiple r² value, to obtain a percentage of the variance in the dependent variable explained by each independent variable.
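The variance partition described here reduces to one line of arithmetic per predictor. The sketch below uses invented standardized coefficients and an invented multiple r² purely to show the calculation; none of these numbers come from the study.

```python
def variance_shares(standardized_coefficients, multiple_r_squared):
    """Percentage of the model's explained variance assigned to each predictor.

    Each share is the squared standardized regression coefficient divided by
    the multiple r-squared, expressed as a percentage.
    """
    if multiple_r_squared <= 0:
        raise ValueError("multiple r-squared must be positive")
    return {
        name: 100.0 * beta ** 2 / multiple_r_squared
        for name, beta in standardized_coefficients.items()
    }


# Hypothetical standardized coefficients and multiple r-squared.
shares = variance_shares({"TBL": 0.40, "insertion_size": -0.15}, 0.22)
```

The shares sum to 100% only when the predictors are uncorrelated with one another, so they are best read as approximate relative contributions.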


    Results
The Age of Basho Elements
We identified 565 putative Basho insertions within the published genome sequence of A. thaliana by Blast homology to a query set of 179 elements. This query set was identified by hand curation and represents 7 distinct subfamilies, designated Basho I–VII (Li et al. 2000). Of the 565 putative elements identified by our homology search, 422 were aligned. The remaining 143 were omitted from further analysis for various reasons, including 43 with sequences too short (<200 bp) for phylogeny reconstruction, 64 that lacked conserved termini, and 11 with interior sequences that could not be reliably aligned. In addition, 25 elements exhibited visual evidence of a chimeric origin between 2 Basho subfamilies; as chimeras, these could not be included in the alignment of any single subfamily. When the query elements were combined with the elements found in our genome scan, the 7 subfamilies (subfamily I to subfamily VII) contained 19, 239, 64, 20, 40, 33, and 7 sequences, respectively.

We constructed Neighbor-Joining phylogenies for each subfamily and used TBLs as a proxy for the time since an element's origin by replication of an ancestral copy (fig. 1; supplementary fig. S1, Supplementary Material online). Average TBLs among elements in Basho I–VII were 0.081, 0.094, 0.352, 0.068, 0.266, 0.112, and 0.268 mutations per site per branch, respectively. We converted TBLs into time estimates by dividing each TBL by the substitution rate of 1.05 × 10⁻⁸ substitutions per site per year estimated for intergenic regions (DeRose-Wilson and Gaut 2007). Overall, our estimate of the average age of Basho elements was 3.04 ± 0.32 Myr. Only 57 of 422 insertions (13%) had estimated ages exceeding the 5.0 Myr divergence date between A. thaliana and A. lyrata (Koch et al. 2000), suggesting that most Basho elements arose from replicative transposition after the thaliana–lyrata split (fig. 2).
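The conversion from branch length to age is a single division, illustrated in the sketch below. The substitution rate and the 5-Myr divergence date are taken from the text; the TBL values are hypothetical and chosen only to show the arithmetic, and the function names are not from the study.

```python
SUBSTITUTION_RATE = 1.05e-8   # substitutions per site per year (from the text)
DIVERGENCE_YEARS = 5.0e6      # A. thaliana / A. lyrata split (from the text)


def tbl_to_age(tbl, rate=SUBSTITUTION_RATE):
    """Estimated age in years of an insertion with terminal branch length tbl."""
    return tbl / rate


def fraction_younger_than(tbls, cutoff_years=DIVERGENCE_YEARS):
    """Fraction of insertions whose estimated age falls below the cutoff."""
    ages = [tbl_to_age(t) for t in tbls]
    return sum(age < cutoff_years for age in ages) / len(ages)


# Hypothetical TBLs (substitutions per site), for illustration only.
example_tbls = [0.01, 0.03, 0.08]
ages_myr = [tbl_to_age(t) / 1e6 for t in example_tbls]
young_fraction = fraction_younger_than(example_tbls)
```

Under this rate, a TBL of 0.0525 substitutions per site corresponds to the 5-Myr divergence date, so any element with a shorter terminal branch is inferred to postdate the split.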


FIG. 1.— Neighbor-Joining phylogeny of Basho V elements in the Arabidopsis thaliana Columbia genome sequence. Labels at each external node represent individual Basho insertions. Numbers in parentheses indicate occupation frequency of an insertion in our population sample (NA, not measured for that insertion). The gray background indicates elements containing sequence fragments homologous to exons from At4g22800 and At2g27070. The asterisks indicate insertions at At4g22800 and At2g27070 loci.

 

FIG. 2.— Frequency histogram of ages of Basho insertions (number of loci = 422). Ages were estimated by dividing TBLs from the phylogenetic analysis by the mutation rate of 1.05 × 10⁻⁸ substitutions/site/year (DeRose-Wilson L, Gaut BS, unpublished data). The x axis is in units of years before present. The estimated time of the speciation event giving rise to Arabidopsis thaliana and Arabidopsis lyrata is represented by the vertical line.

 
Investigation of Basho Contribution to Genes
Helitrons are known to capture gene fragments in maize (Lai et al. 2005). To investigate the evolutionary importance of Basho gene-capture events in A. thaliana, we used our database of 565 Basho insertions as a BlastN query against expressed and annotated sequences, as represented by UniGene and CDS databases, respectively. The Blast query revealed that 216 Basho elements (39% of identified insertions) contained sequences homologous to expressed and annotated protein-coding sequences. Surprisingly, these 216 hits to Basho elements could be traced to only 5 different genes (table 1). These genes shared >90% sequence identity with the fragments carried by Basho elements, suggesting recent capture and rapid proliferation. This phenomenon was not limited to annotated protein-coding sequence. For example, all 20 elements of the Basho IV subfamily contain a 104-bp fragment that is 90% identical to the expressed noncoding sequence of one gene, At1g49500.



 
Table 1 The Set of 5 Genes Showing Portions of Homology with Sequences Carried by 216 Basho Elements, the Subfamilies to which the Elements Belong, and Number of Elements Carrying Gene Fragments

 
Of the 5 genes with expressed protein-coding sequence homologous to Basho sequences, 2 overlapped with a Basho element. At1g13662 (a disease-resistance response/dirigent-related protein) contains a 3' 32-bp motif that overlaps with the 5' terminus of a Basho insertion (Basho II_587). A similar situation was observed at the 3' end of the coding region of At1g79150 and the 5' terminus of Basho II_85, which overlap for 36 bp. This 36-bp sequence differs at only 2 nucleotide sites from the 32-bp sequence shared by At1g13662 and Basho II_587. Apart from the sequences shared with Basho elements, At1g13662 and At1g79150 do not show significant Blast hits to one another.

In addition to the genes overlapping Basho elements, one of the 5 genes has a whole exon bounded by a Basho insertion (fig. 3). At4g22800 (a protein of unknown function) consists of 3 exons, the second of which is contained in a Basho element (Basho V_2080) with characteristic 5' and 3' Basho termini within the intronic DNA. This element is closely related to Basho V_210, which bounds the tenth exon of At2g27070 (a 2-component response regulator family protein). The exons within the Basho elements share 93% sequence identity between the 2 genes, but the genes are not homologous outside the bounds of the 2 Basho insertions. Sixteen other Basho V insertions contain sequences homologous to the exons bounded by Basho V_2080 and Basho V_210 and form a distinct clade in the Basho V phylogeny (fig. 1). Given this pattern, it appears that the shared exon sequence originated in At4g22800, or was incorporated into its coding sequence from elsewhere via transposition of Basho V_2080, and subsequently became integrated into the protein coded by At2g27070 via insertion of Basho V_210.


FIG. 3.— Putative exon shuffling event by chimeric Basho V insertions at At4g22800 and At2g27070 accounting for shared exon between 2 otherwise unrelated genes.

 
Genomic Correlates of Occupation Frequency and Element Age
To investigate the evolutionary dynamics of Basho on a population time scale, we screened a panel of 47 Arabidopsis accessions for presence/absence of 278 Basho loci (supplementary table S1, Supplementary Material online). The 278 loci were chosen independently of phylogenetic relationships. We used a PCR-based assay that yielded the frequency of occupation of each Basho element. Of 278 loci, 136 contained elements that were fixed (present in every individual) in our sample and 142 were polymorphic (fig. 4). The mean occupation frequency for all loci was 0.79 ± 0.03; excluding fixed loci, the mean was 0.60 ± 0.051. Fully 226 of the 278 loci were present in frequencies >50%. These values may be biased upwards by our sampling strategy, which underrepresents low frequency insertions (see Materials and Methods). However, our results are in qualitative agreement with those for Ac-like transposons (Wright et al. 2001), which were also found to be at high population frequencies. In any case, the high number of fixed and common insertions makes it unlikely that strong purifying selection acts to limit Basho elements in A. thaliana.


FIG. 4.— Frequency histogram of occupation frequency of Basho insertions from the population screen (population sample size = 47, number of loci = 278). The x axis represents the fraction of individuals harboring the Basho insertion.

 
Nonetheless, weak purifying selection and other genomic factors may still shape the frequency distribution of element insertions. To investigate the evolutionary forces acting on Basho elements more thoroughly, we compared occupation frequency and the age of each element (as measured by TBL) to a suite of genomic parameters, including sequence length, distance to nearest gene, recombination rate, and gene density. We first pooled genomic and TBL data across subfamilies and used them as independent variables in a multiple regression, with occupation frequency as the dependent variable. We present the results of the multiple regression considering the 142 polymorphic loci only; results using the full sample of 278 TEs are qualitatively equivalent. For the polymorphic loci, the Pearson's correlation coefficient (r = 0.47) was significant (analysis of variance P << 0.001). However, the separate contributions of the independent variables to the variance in occupation frequency were not equal. There was, for example, a significant correlation between occupation frequency and TBL (P < 0.001). TBL explains ~75% of the variance in occupation frequency accounted for by all the independent variables (fig. 5a). The size of the Basho sequence was also significantly negatively correlated with occupation frequency (P < 0.04). Sequence length explained ~10% of the variance accounted for by the model (fig. 5b). Distance to gene, recombination rate, and gene density explained <5% of the variance (fig. 5c and d) in the model. Thus, the age of an element, as approximated by TBL, appears to be the variable most closely tied to occupation frequency.


FIG. 5.— Scatterplots illustrating correlations between genomic, population, and phylogenetic parameters. Partial correlation coefficients and P values are from multiple regressions (see Results). Data are from polymorphic insertions only: (a) occupation frequency versus TBL, (b) occupation frequency versus Basho insertion size, (c) occupation frequency versus gene density in 1-Mb window, (d) occupation frequency versus recombination rate, and (e) TBL versus Basho insertion size. The "distance to gene" data are not shown.

 
Having established a strong correlation between occupation frequency and TBL, we wished to investigate the relationship between TBL and genomic factors (e.g., insertion size, distance to gene, recombination rate, and gene density), with particular emphasis on sequence length because of its negative correlation with occupation frequency. We performed a multiple regression analysis of TBL against sequence length, recombination rate, and gene density. An initial analysis revealed 6 outlier loci, with very long TBLs (>2 standard deviations from the mean TBL across subfamilies), which we removed from subsequent analyses. Again, we present the data from only polymorphic insertions, but equivalent results were obtained considering the full data set, or only fixed insertions. Most (85%) of the variance in this model was attributable to the effect of Basho length (r = –0.28; P < 0.001; fig. 5e). Thus, the main correlate of the age (TBL) of Basho elements is the size of insertions.

Patterns and Correlates of Insertion Size Variation
Multiple regressions indicate that small Basho insertions tend to persist for longer periods than large insertions. However, these correlations do not rule out an alternative explanation: it is possible that old insertions have, over time, simply accumulated more deletions than young insertions. The size distribution of elements does not support this alternative. Our analysis of Basho subfamilies allowed us to explicitly examine the pattern of size variation in a phylogenetic context. Closely related elements were similar in size, with small differences in size (~1–100 bp) probably arising by insertion and deletion events (fig. 6). Large size differences (~0.3–1.5 kb) were apparent between subfamilies, or groups within subfamilies, and appear to be inherited from ancestral elements. Groups of elements less than ~1,000-bp long tended to have both young and old representatives. In contrast, longer elements had primarily young representatives and were less abundant (fig. 5e). On average, lineages consisting of large elements have a shorter genomic lifespan than small-sized groups.


FIG. 6.— Boxplots of insertion sizes of clades of Basho elements. Clades represent either entire subfamilies (e.g., BI = Basho I) or monophyletic groups of similarly sized elements within a subfamily (e.g., BIIa–d = Basho II clade a–d). Gray and white backgrounds alternate between subfamilies. Widths of boxplots represent the square root of the number of observations.

 

    Discussion
Recent Expansion Features a Limited Number of Exon-Capture Events
Basho elements were described prior to the identification of Helitrons as a distinct and likely ubiquitous class of eukaryotic transposons (Li et al. 2000). They were later shown to be nonautonomous Helitrons, highly diverged from their autonomous parent elements but retaining the characteristic terminal sequences needed for replication via proteins produced by autonomous elements (Feschotte and Wessler 2001; Kapitonov and Jurka 2001). Our study began with a computational survey of Basho elements in the A. thaliana genome, using a previously defined query set as a starting point (Li et al. 2000). Computational surveys necessarily have a trade-off between sensitivity and accuracy. Sensitive approaches are more apt to uncover highly diverged copies, and thus provide a broader view of TE history. Sensitivity may, however, sacrifice accuracy. In this study, we have emphasized accuracy to facilitate sequence alignment. Even so, a subset of 143 of our 565 elements was either unalignable or lacked telltale Basho features, suggesting our balance between sensitivity and accuracy is reasonable.

We used Basho sequences to construct phylogenies and infer insertion times, but biological and methodological biases undoubtedly impact these time estimates. Missing elements strongly affect insertion time estimates because they represent missing branches on the phylogeny. If, for example, our search missed numerous highly divergent Basho elements, we may have underestimated the average age of Basho insertions. The magnitude of this underestimation depends critically on the shape of the tree; underestimation would only be substantial if highly diverged elements also tended to lack close relatives (i.e., they have long TBLs). Deleted elements also strongly affect insertion time estimates because they represent missing branches on the phylogeny. If recently inserted Bashos have been deleted, the result is longer TBLs for existing elements and systematic overestimation of insertion times. Population variation causes a similar overestimation phenomenon because low frequency TEs are unlikely to be represented in the phylogeny and/or closely related elements can be lost from the Columbia genome by recombination and drift. In contrast, only recent insertion times can be estimated if numerous old TEs have been deleted from the genome (turnover), thus providing an incomplete picture of element activity over evolutionary time. The latter phenomenon likely contributes to the now common, though perhaps not always correct, conclusion that TE families in extant genomes have undergone recent expansion (Zhang et al. 2000; Devos et al. 2002; Bennetzen et al. 2005; Silva et al. 2005).

Despite these caveats, 4 lines of evidence strongly suggest that Basho elements have undergone a lineage-specific expansion in the A. thaliana genome. First, our age estimates date the insertion of 87% of Basho elements to be more recent than 5 Myr, subsequent to the divergence of A. thaliana from its closest relative A. lyrata. Second, 2 different types of PCR-based evidence corroborate our claim of a lineage-specific expansion. Transposon display based on A. thaliana Basho primers fails to yield products in A. lyrata, whereas the same approach works quite well for a broad range of other element types (Lockton S, Gaut BS, unpublished data). In addition, PCR amplification of A. lyrata intergenic regions indicates that shared Basho elements are rare or absent between species, whereas shared elements from other TE families can be amplified and easily identified (DeRose-Wilson L, Gaut BS, unpublished data).

Third, draft sequence data from the ongoing A. lyrata sequencing project (www.jgi.doe.gov) are consistent with our hypothesis of lineage-specific expansion. We used a randomly chosen representative Basho element from each subfamily as a query in a Blast against the A. lyrata trace archive (http://trace.ensembl.org), a database of single-pass sequence reads from the A. lyrata sequencing project. Two of the Basho subfamilies (IV and VI) showed no similarity to A. lyrata genomic sequences, whereas the remaining subfamilies shared partial homology with a number of A. lyrata sequences, ranging from 7 hits for Basho V_2080 to 80 hits for Basho II_394. Importantly, of the 228 Blast hits to the A. lyrata archive, only 4 included the conserved 3' region characteristic of Basho elements in our data set. The remaining hits were to interior regions of the Basho sequences. It is possible that these hits represent Basho elements that have suffered deletion of their 3' sequences in A. lyrata, but we note that these 3' sequences are crucial to nonautonomous Helitron replication (Brunner et al. 2005). Given other sources of evidence for expansion and the propensity of Basho elements for acquiring host DNA (Brunner et al. 2005; Morgante et al. 2005), it seems likely that these hits represent sequences conserved between A. thaliana and A. lyrata that have been incorporated into Basho elements in A. thaliana.

Finally, the pattern of gene capture supports recent expansion. About 40% (216) of Basho insertions harbor sequences highly similar to A. thaliana protein-coding sequences, with these sequences ranging in size from 30 to 350 bp. Although prevalent, these sequences can be traced to just 5 distinct gene-capture events (table 1), and 2 of these 5 genes still have Basho elements associated with them. The 2 genes share sequence identity with 2 other genes only at a shared fragment or single exon, respectively. These shared sequences appear to be the result of Basho transpositions.

One of the genes, At3g13362, overlaps for 32 bp of its 3'-coding region (including the termination codon) with the 5' terminus of a Basho II element. The same phenomenon was observed at the 5' end of a second Basho II insertion and the 3' end of At1g79150. This 5' sequence motif is shared with 197 other Basho II insertions (42 Basho II insertions lack it), and it thus seems possible that this motif is derived from a gene. Interestingly, the Basho II insertion into At1g79150 is segregating at low frequency (present in only 1 out of 48 individuals other than Columbia), suggesting that this event is recent (estimated age = 17,000 years based on TBL).

The remaining gene (At4g22800) and another (At2g27070) are unrelated except for a shared exon that is located within Basho V elements (fig. 3). We cannot account for the origin of the shared exon as it does not demonstrate homology (based on a BlastN search) with any other expressed coding sequence in the A. thaliana genome. This sequence is also present in only a subset of Basho V elements, but Basho V_2080, which contains the chimeric exon in At4g22800, is phylogenetically basal to this Basho group (fig. 1), suggesting that it represents the oldest remaining, and perhaps original, version of this exon-capture event. The evidence that Basho V activity has contributed to the coding region of At2g27070 is particularly convincing. At2g27070 is a "type-B response regulator transcription factor" gene that shares ~90% identity with its closest paralog (At5g07210) across most of their sequences, but the 2 genes share no sequence identity in the chimeric exon. Thus, the insertion of Basho V_210 has added 20 amino acids to a preexisting transcription factor protein. The Basho V insertions into both At4g22800 and At2g27070 are segregating in our population sample at frequencies 0.83 and 0.24, respectively. These insertions will provide a unique opportunity to study gene structure variation that segregates within a species.

In summary, roughly 40% of the Basho elements in the A. thaliana genome harbor gene fragments, but these represent only 5 gene-capture events. Even fewer of these events (2 out of 216) contribute to the expressed coding sequence complement of the A. thaliana genome, suggesting the vast majority of fragment-bearing elements have had no impact on the evolution of protein-coding DNA. These observations do lead to 2 important conclusions, however. First, the close homology of such a small number of genes with so large a number of Basho sequence fragments is further testament to the rapid expansion of Basho elements in A. thaliana. Second, these observations suggest that some caution should be taken before estimating the contribution of Helitrons to the gene complement of other species. For example, Morgante et al. (2005) estimate, by extrapolation, that 10,000 nonshared gene fragments exist between 2 maize inbred lines. Our work leads us to speculate that the pattern in maize may reflect far fewer than 10,000 gene-capture events.
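The proportions quoted in this section follow from simple arithmetic on the reported counts (565 identified insertions, 216 carrying gene fragments, 5 capture events, 2 insertions contributing to expressed coding sequence). The short Python sketch below only recomputes those ratios; the variable names are illustrative and the "mean copies per capture event" figure is a derived average, not a number reported by the study.

```python
# Counts taken from the text; names are illustrative.
total_insertions = 565        # Basho insertions identified
with_gene_fragments = 216     # insertions carrying gene-like sequence
capture_events = 5            # distinct gene-capture events
coding_contributions = 2      # insertions contributing to expressed coding sequence

# Share of all insertions that carry a gene fragment.
share_with_fragments = with_gene_fragments / total_insertions

# Share of fragment-bearing insertions that affect expressed coding sequence.
share_coding = coding_contributions / with_gene_fragments

# Derived average: fragment-bearing copies per original capture event.
mean_copies_per_event = with_gene_fragments / capture_events

print(f"fragment-bearing insertions: {share_with_fragments:.1%}")
print(f"contributing to coding sequence: {share_coding:.1%}")
print(f"mean copies per capture event: {mean_copies_per_event:.1f}")
```

The large number of copies per capture event, relative to the handful of events, is the quantitative core of the argument that fragment counts overstate the number of independent captures.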

Evolutionary Forces Limiting the Distribution of Basho Elements
To learn more about the evolutionary forces governing the accumulation of Bashos, we estimated the occupation frequency of 278 elements. Most of the Basho elements in our study are either fixed or present at high frequencies; 81% of sampled loci were present in 50% or more of our panel (including fixed insertions). These results are in sharp contrast to data from Drosophila, where most copies of TEs are present in less than 5% of sampled individuals and only a few are at high frequency or fixed (Petrov et al. 2003; Maside et al. 2005). Basho occupation frequencies were also strongly correlated with age, as is expected for neutral alleles (Kimura 1983) (fig. 5a). Overall, these 2 observations (high frequency and relationship with time) suggest that selection against our sample of Basho insertions is not strong.
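As a minimal sketch of how an occupation frequency and the "present in 50% or more of the panel" summary can be computed from presence/absence calls, consider the Python example below. The locus names and calls are hypothetical and chosen only for illustration; they are not data from this study, and the function names are assumptions of the sketch.

```python
from typing import Dict, List


def occupation_frequency(calls: List[bool]) -> float:
    """Fraction of scored accessions in which the insertion is present."""
    if not calls:
        raise ValueError("no accessions scored for this locus")
    return sum(calls) / len(calls)


def share_at_or_above(freqs: Dict[str, float], threshold: float = 0.5) -> float:
    """Fraction of loci whose occupation frequency meets the threshold."""
    return sum(f >= threshold for f in freqs.values()) / len(freqs)


# Hypothetical presence (True) / absence (False) calls across 4 accessions.
panel = {
    "locus_A": [True, True, True, True],     # fixed in this toy panel
    "locus_B": [True, True, False, False],
    "locus_C": [True, False, False, False],
    "locus_D": [True, True, True, False],
}

freqs = {name: occupation_frequency(calls) for name, calls in panel.items()}
high_share = share_at_or_above(freqs)

print(freqs)
print(f"loci present in >= 50% of the panel: {high_share:.0%}")
```

In this toy panel three of four loci meet the threshold; the same tally over the 278 assayed loci is what yields the 81% figure quoted above.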

Nonetheless, 2 observations suggest that Bashos do not accumulate in a purely neutral fashion. First, the genomic distribution of Basho elements in A. thaliana is negatively correlated with gene density (Wright et al. 2003). This observation has been interpreted as evidence for selection against the disruption of genes. Second, both the population frequency and the age of Basho elements are negatively correlated with sequence length (fig. 5b and e). Note that this relationship is not simply an artifact of small deletions within elements over time (fig. 6; also note vertical canalization of sizes in fig. 5e). Rather, these observations suggest differential selection against elements as a function of size.
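A negative relationship between element length and population frequency can be illustrated with a rank correlation. The Python sketch below computes Spearman's rho from scratch; it is a stand-in for illustration, not the regression analysis reported in this study, and the lengths and frequencies are hypothetical values rather than measurements.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical element lengths (bp) and occupation frequencies.
lengths_bp = [300, 650, 900, 1400, 2100]
frequencies = [1.00, 0.90, 0.95, 0.40, 0.10]

rho = spearman(lengths_bp, frequencies)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based statistic is used here because it makes no assumption about the shape of the length-frequency relationship; a negative rho corresponds to longer elements tending to segregate at lower frequency.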

Given evidence of weak selection against Basho insertions, is selection more likely a function of gene disruption or ectopic recombination? To date, few studies have explicitly investigated the effects of recombination rate or the density of genes on TE population frequency (Charlesworth et al. 1992; Hoogland and Biemont 1996). Instead, studies have primarily relied on the distribution of TEs within a genome sequence, routinely revealing that TEs are rare in gene-rich regions and accumulate in pericentromeric regions (Arabidopsis Genome Initiative 2000; Wright et al. 2003; International Rice Genome Sequencing Project 2005; Maside et al. 2005). This distribution suggests that gene disruption is the major force countering TE expansion. However, we detect no correlation between gene density, or distance to nearest gene, and Basho occupation frequency (or age) and thus uncover no evidence suggesting that selection against gene disruption is a major factor governing the persistence of the TEs in our sample. One reason for this could be that severe gene-disruption events are likely strongly selected against and may not rise to appreciable population frequencies (Naito et al. 2006). Because our sampling method underestimates the number of low frequency insertions throughout the population (see Materials and Methods), severe gene-disruption events are unlikely to be included in our sample.

There is growing evidence that ectopic recombination is a major mechanism of genome size reduction in both plants (including selfing species) and animals and has the potential to create large-scale chromosomal aberrations with high accompanying fitness costs (Petrov et al. 1996; Bennetzen et al. 2005; Gaut et al. 2007). The formation of solo-Long Terminal Repeats (solo-LTRs) by ectopic recombination between retrotransposon copies has been observed in the genome of A. thaliana, where the ratio of solo-LTRs to intact elements is roughly 1:1 (Devos et al. 2002). Intriguingly, the occurrence of solo-LTRs seems to be dependent on LTR size (Shirasu et al. 2000; Vitte and Panaud 2003). This relationship also holds for non-LTR retrotransposons in Drosophila, in which the strength of selection against insertions is positively correlated with size (Petrov et al. 2003).

Basho elements have 3 features that make them likely candidates for ectopic pairing: they are abundant in the A. thaliana genome (2% of nuclear DNA), insertions can be quite large (>2 kb), and a given element generally shares a great deal of sequence similarity with many other elements, due to recent expansion of the family. We found initial evidence of recombination between Basho elements in a subset of sequences (25 of 565 elements), which appeared to be chimeras of sequences from different subfamilies. In addition, our regression analysis revealed that large insertions do not persist in the genome for as long as small insertions and segregate at lower population frequencies, indicating size-based selection on our sample of Basho elements. These results are consistent with the ectopic recombination model, and we hypothesize that ectopic recombination is an important force governing the persistence of not only LTR retrotransposons (Devos et al. 2002; Bennetzen et al. 2005) but also Basho elements. Further, the ectopic model asserts that heterozygous TE loci are more likely to ectopically pair than homozygous loci, resulting in stronger selection against heterozygous loci (Montgomery et al. 1991). Thus, selection against TEs should be stronger in an outcrossing species such as A. lyrata, due to higher heterozygosity, than in a selfer such as A. thaliana. This prediction is consistent with the higher frequency of TEs in A. thaliana (Wright et al. 2001), but additional contrasts between species may help elucidate the role of ectopic recombination in limiting expansion of TE families.
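The heterozygosity argument can be made concrete with the standard population-genetic expectation that, at a biallelic locus with insertion frequency p and inbreeding coefficient F, heterozygotes occur at frequency 2p(1 − p)(1 − F). The Python sketch below applies that textbook formula; the insertion frequency and the two inbreeding coefficients are hypothetical illustrations, not estimates for A. thaliana or A. lyrata.

```python
def heterozygote_frequency(p: float, inbreeding_f: float) -> float:
    """Expected heterozygote frequency 2p(1-p)(1-F) at a biallelic locus."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    if not 0.0 <= inbreeding_f <= 1.0:
        raise ValueError("inbreeding coefficient must lie in [0, 1]")
    return 2.0 * p * (1.0 - p) * (1.0 - inbreeding_f)


# Hypothetical values: same insertion frequency, contrasting mating systems.
insertion_freq = 0.2
outcrosser = heterozygote_frequency(insertion_freq, inbreeding_f=0.0)
selfer = heterozygote_frequency(insertion_freq, inbreeding_f=0.95)

print(f"outcrosser heterozygotes: {outcrosser:.3f}")
print(f"selfer heterozygotes:     {selfer:.3f}")
```

Under these assumed values the insertion is heterozygous far less often in the selfer, which is the sense in which ectopic pairing of heterozygous loci, and hence selection against them, is expected to be weaker under selfing.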


    Supplementary Material
Supplementary table and figure are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Acknowledgements
The authors would like to thank R. Gaut for population assays and lab members for useful discussion. This work was supported by National Science Foundation grants DEB-0426166 and DBI-0320683.


    Footnotes
 
Manolo Gouy, Associate Editor


    References

    Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature (2000) 408:796–815.

    Bartolomé C, Maside X, Charlesworth B. On the abundance and distribution of transposable elements in the genome of Drosophila melanogaster. Mol Biol Evol (2002) 19:926–937.

    Bennetzen JL. Transposable elements, gene creation and genome rearrangement in flowering plants. Curr Opin Genet Dev (2005) 15:621–627.

    Bennetzen JL, Ma J, Devos KM. Mechanisms of recent genome size variation in flowering plants. Ann Bot (2005) 95:127–132.

    Brunner S, Pea G, Rafalski AJ. Origins, genetic organization and transcription of a family of non-autonomous helitron elements in maize. Plant J (2005) 43:799–810.

    Charlesworth B, Lapid A, Canada D. The distribution of transposable elements within and between chromosomes in a population of Drosophila melanogaster. II: inferences on the nature of selection against elements. Genet Res (1992) 60:115–130.

    DeRose-Wilson L, Gaut BS. Transcription-related mutations and GC content drive variation in nucleotide substitution rates across the genomes of Arabidopsis thaliana and Arabidopsis lyrata. BMC Evol Biol (2007) 7:66.

    Devos KM, Brown JKM, Bennetzen JL. Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis. Genome Res (2002) 12:1075–1079.

    Dray T, Gloor G. Homology requirements for targeting heterologous sequences during P-induced gap repair in Drosophila melanogaster. Genetics (1997) 147:689–699.

    Feschotte C, Wessler SR. Treasures in the attic: rolling circle transposons discovered in eukaryotic genomes. Proc Natl Acad Sci USA (2001) 98:8923–8924.

    Finnegan D. Transposable elements. Curr Opin Genet Dev (1992) 2:861–867.

    Gaut BS, Wright S, Rizzon C, Dvorak J, Anderson LK. Recombination: an underappreciated factor in the evolution of plant genomes. Nat Rev Genet (2007) 8:8–14.

    Hall T. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser (1999) 41:95–98.

    Hoogland C, Biemont C. Chromosomal distribution of transposable elements in Drosophila melanogaster: test of the ectopic recombination model for maintenance of insertion site number. Genetics (1996) 144:197–204.

    International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature (2005) 436:793–800.

    Jakubczak JL, Burke WD, Eickbush TH. Retrotransposable elements R1 and R2 interrupt the rRNA genes of most insects. Proc Natl Acad Sci USA (1991) 88:3295–3299.

    Kapitonov V, Jurka J. Rolling-circle transposons in eukaryotes. Proc Natl Acad Sci USA (2001) 98:8714–8719.

    Kidwell MG. Transposable elements and the evolution of genome size in eukaryotes. Genetica (2002) 115:49–63.

    Kimura M. The neutral theory of molecular evolution (1983) Cambridge (MA): Cambridge University Press.

    Koch MA, Haubold B, Mitchell-Olds T. Comparative evolutionary analysis of the chalcone synthase and alcohol dehydrogenase loci among different lineages of Arabidopsis, Arabis and related genera (Brassicaceae). Mol Biol Evol (2000) 17:1483–1498.

    Kumar A, Tamura K, Nei M. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform (2004) 5:150–163.

    Lai J, Li Y, Messing J, Dooner HK. Gene movement by Helitron transposons contributes to the haplotype variability of maize. Proc Natl Acad Sci USA (2005) 102:9068–9073.

    Li Q, Wright S, Yu Z, Bureau T. Transposon diversity in Arabidopsis thaliana. Proc Natl Acad Sci USA (2000) 97:7376–7381.

    Maside X, Assimacopoulos S, Charlesworth B. Fixation of transposable elements in the Drosophila melanogaster genome. Genet Res (2005) 85:195–203.

    Montgomery EA, Charlesworth B, Langley CH. A test for the role of natural selection in the stabilization of transposable element copy number in a population of Drosophila melanogaster. Genet Res (1987) 49:31–41.

    Montgomery EA, Huang S-M, Langley CH, Judd BH. Chromosome rearrangement by ectopic recombination in Drosophila melanogaster: genome structure and evolution. Genetics (1991) 129:1085–1098.

    Morgante M, Brunner S, Pea G, Fengler K, Zuccolo A, Rafalski AJ. Gene duplication and exon shuffling by helitron-like transposons generate intraspecies diversity in maize. Nat Genet (2005) 37:997–1002.

    Naito K, Cho E, Yang G, Campbell MA, Yano K, Okumoto Y, Tanisaka T, Wessler SR. Dramatic amplification of a rice transposable element during recent domestication. Proc Natl Acad Sci USA (2006) 103:17620–17625.

    Nordborg M, Hu T, Ishino Y, et al. (24 co-authors). The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol (2005) 3:1289–1299.

    Nuzhdin SV. Sure facts, speculations, and open questions about the evolution of transposable element copy number. Genetica (1999) 107:129–137.

    Petrov DA, Aminetzach YT, Davis JC, Bensason D, Hirsh AE. Size matters: non-LTR retrotransposable elements and ectopic recombination in Drosophila. Mol Biol Evol (2003) 20:880–892.

    Petrov DA, Lozovskaya ER, Hartl DL. High intrinsic rate of DNA loss in Drosophila. Nature (1996) 384:346–349.

    Shirasu K, Schulman AH, Lahaye T, Schulze-Lefert P. A contiguous 66-kb barley DNA sequence provides evidence for reversible genome expansion. Genome Res (2000) 10:908–915.

    Silva JC, Bastida F, Bidwell SL, Carlton JM. A potentially functional Mariner transposable element in the Protist Trichomonas vaginalis. Mol Biol Evol (2005) 22:126–134.

    Vitte C, Panaud O. Formation of solo-LTRs through unequal homologous recombination counterbalances amplifications of LTR retrotransposons in rice Oryza sativa L. Mol Biol Evol (2003) 20:528–540.

    Wright SI, Agrawal N, Bureau T. Effects of recombination rate and gene density on transposable element distributions in Arabidopsis thaliana. Genome Res (2003) 13:1897–1903.

    Wright SI, Quang HL, Schoen DJ, Bureau TE. Population dynamics of an Ac-like transposable element in self- and cross-pollinating Arabidopsis. Genetics (2001) 158:1279–1288.

    Zhang L, Gaut BS. Does recombination shape the distribution and evolution of tandemly arrayed genes (TAGs) in the Arabidopsis thaliana genome? Genome Res (2003) 13:2533–2540.

    Zhang Q, Arbuckle J, Wessler SR. Recent, extensive, and preferential insertion of members of the miniature inverted-repeat transposable element family Heartbreaker into genic regions of maize. Proc Natl Acad Sci USA (2000) 97:1160–1165.

Accepted for publication August 30, 2007.





