MBE Advance Access originally published online on September 20, 2006
Molecular Biology and Evolution 2006 23(12):2505-2520; doi:10.1093/molbev/msl127
Research Articles
Transcription and Evolutionary Dynamics of the Centromeric Satellite Repeat CentO in Rice

* Department of Horticulture, University of Wisconsin-Madison
Institute of Plant Molecular Biology, Ceske Budejovice, Czech Republic
E-mail: jjiang1{at}wisc.edu.
| Abstract |
|---|
Satellite DNA is a major component of centromeric heterochromatin in most multicellular eukaryotes, where it is typically organized into megabase-sized tandem arrays. It has recently been demonstrated that small interfering RNAs (siRNAs) processed from centromeric satellite repeats can be involved in epigenetic chromatin modifications which appear to underpin centromere function. However, the structural organization and evolution of the centromeric satellite DNA are still poorly understood. We analyzed the centromeric satellite repeat arrays from rice chromosomes 1 and 8 and identified higher order structures and local homogenization of the CentO repeats in these 2 centromeres. We also cloned the CentO repeats from the CENH3-associated nucleosomes by a chromatin immunoprecipitation (ChIP)-based method. Sequence variability analysis of the ChIPed CentO repeats revealed a single variable domain within the repeat. We detected transcripts derived from both strands of the CentO repeats. The CentO transcripts are processed into siRNA, suggesting a potential role of this satellite repeat family in epigenetic chromatin modification.
Key Words: transcription, centromere, satellite repeat, siRNA
| Introduction |
|---|
It has long been known that centromeric regions in many complex eukaryotic species contain highly repetitive satellite DNAs. In several model eukaryotes, including humans, mouse, Drosophila melanogaster, and Arabidopsis thaliana, satellite repeats make up the bulk of the centromeric heterochromatin. The centromeric satellite repeats in these species are so abundant that they form the most dominant tandem repeat families in the genomes. It has recently been demonstrated in several plant and animal species that the functional centromeres, which are marked by a centromere-specific histone H3 variant, CENH3, are embedded within the centromeric satellite arrays (Henikoff et al. 2001
Human centromeres have been the most extensively studied centromeres among complex eukaryotic species. The main DNA component of human centromeres is the α satellite DNA that consists of AT-rich 171-bp monomers arranged in a tandem, head-to-tail configuration. The amount of the α satellite DNA in different centromeres varies from ~250 kb to >4 Mb (Wevrick and Willard 1989; Oakey and Tyler-Smith 1990). There are 2 major types of α satellite DNA: "monomeric" repeat and "higher order" repeat. Higher order α satellite DNA consists of several monomeric repeats that are amplified as a unit, with the multimeric units being arranged in a tandem head-to-tail configuration. The higher order repeats are highly homogeneous and are typically 97–100% identical, whereas monomeric repeats are on average ~70% identical (Rudd and Willard 2004). There are several lines of evidence indicating that the higher order α satellite DNA, not the monomeric α satellite DNA, is associated with the functional centromeres (Schueler et al. 2001; Ando et al. 2002; Ohzeki et al. 2002; Spence et al. 2002).
The centromeres of several plant species, including A. thaliana, rice, and maize, have been studied extensively in recent years. Centromere-specific satellite repeats were found in all 3 species (Ananiev et al. 1998; Heslop-Harrison et al. 1999; Cheng et al. 2002). The amount of satellite repeats among individual centromeres varies significantly, ranging from ~60 kb in rice chromosome 8 (Cheng et al. 2002) up to multimegabase arrays in several chromosomes among all 3 species (Kumekawa et al. 2000, 2001; Cheng et al. 2002; Jin et al. 2004). It has been demonstrated in both Arabidopsis and maize that only part of the megabase-sized satellite DNA arrays is incorporated into the "centromeric chromatin" that contains CENH3 (Jin et al. 2004; Shibata and Murata 2004; Jin et al. 2005; Lamb et al. 2005). However, it is not known if the satellite repeats associated with CENH3 are structurally unique compared with the satellite repeats in the pericentromeric domains.
In Schizosaccharomyces pombe, the tandem repeats located in the pericentromeric heterochromatin are transcribed and subject to RNA interference (RNAi) (Hall et al. 2002; Volpe et al. 2002). Mutation of genes associated with the RNAi pathway resulted in aberrant accumulation of complementary transcripts from the repeats, which was accompanied by loss of histone H3 lysine-9 methylation and impairment of centromere function (Volpe et al. 2002, 2003). Transcription and production of small interfering RNAs (siRNAs) from centromeric satellite repeats have recently been reported in several complex eukaryotic species (Fukagawa et al. 2004; Kanellopoulou et al. 2005; May et al. 2005; Zhang et al. 2005). However, the relationship between transcription of centromeric satellite repeats and centromeric silencing/centromere function is not clear in these species. It appears that if such relationships exist, they should be far more complex than those reported in S. pombe.
Rice (Oryza sativa) centromeres contain a 155-bp satellite repeat CentO (Dong et al. 1998). The presence of only limited amounts of CentO in some rice chromosomes (60–150 kb) (Cheng et al. 2002) facilitated development of bacterial artificial chromosome (BAC) contigs that span the entire centromeres, allowing full sequencing of these regions (Matsumoto et al. 2005). In contrast, several other rice centromeres contain CentO arrays that extend over megabases of DNA (Cheng et al. 2002), similar to the organization of the 178-bp satellite repeat in Arabidopsis centromeres. Thus, rice provides an excellent model system to study the organization of complete arrays of centromeric satellite repeats within specific centromeres. Here we report the structure and organization of the CentO satellite in the centromeres of rice chromosomes 1 and 8 (Cen1 and Cen8), which contain the largest and smallest CentO arrays, respectively, among the 12 rice chromosomes. We also isolated transcribed CentO repeats and CentO repeats from the CENH3-containing nucleosomes. We detected siRNAs cognate to the CentO repeats using gel-blot hybridization. Implications of these results on function and evolution of the CentO satellite repeat family are discussed.
| Materials and Methods |
|---|
ChIP Cloning and DNA Sequencing
Oryza sativa spp. japonica rice variety "Nipponbare" was used for chromatin immunoprecipitation (ChIP) cloning and transcription studies. The ChIP cloning experiments using a rice anti-CENH3 antibody were conducted as described previously (Lee et al. 2005
Sequence Analyses
The CentO repeats in Cen1 were extracted from the International Rice Genome Sequencing Project (IRGSP) sequence (version 3.0, 30 December 2004) (http://www.tigr.org/tdb/e2k1/osa1/pseudomolecules/info.shtml) by in silico restriction digestion using MAPDRAW (DNAstar, Madison, WI) and EMBOSS programs (Rice et al. 2000) (http://emboss.sourceforge.net). CentO tracts of Cen1 and Cen8 were determined using the dot plot program DOTTER (Sonnhammer and Durbin 1995) and a local Blast program. The CentO repeats from Cen1 and Cen8 were characterized as monomeric or higher order repeats using the DotPlot alignment tool of MegAlign (DNAstar). Groups of monomers that show a higher order structure by DotPlot (stringency of greater than or equal to 95% identical over a 100-bp window) were aligned by MegAlign, and percent identity among higher order repeats was determined. We used ClustalW version 1.83 to compute all pairwise alignments among CentO monomers. Pairwise similarities of monomers were extracted from ClustalW output using a Perl script and translated into particular color values as described by Macas et al. (2006). The CentO repeats from different sources were aligned using ClustalX and manually examined and edited using MacClade (http://macclade.org/macclade.html). We used PAUP* 4.0b10 (http://paup.csit.fsu.edu) to generate neighbor-joining trees. A neighbor-joining bootstrap of 100 replicates was performed using the Tajima and Nei method. Sequence periodicity analysis was based on the concept of nucleotide autocorrelation functions (Herzel et al. 1999) and expressed for a distance of k base pairs and nucleotide X as the difference $C_{XX}(k) = p_{XX}(k) - p_X \cdot p_X$, where $p_{XX}(k)$ is the observed frequency of identical nucleotides X at distance k and $p_X$ is the proportion of nucleotide X in the sequence. Thus, a positive value of $C_{XX}(k)$ implies that there are more X-X pairs at distance k than expected by chance. The analysis was implemented in a BioPerl program, and the results were visualized using Mgraph (Macas et al. 2006).
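The autocorrelation measure defined above can be illustrated with a short Python sketch. This is not the BioPerl program used in the study; the function name `autocorrelation` and the choice to normalize by the number of position pairs at distance k are assumptions made for illustration.

```python
def autocorrelation(seq, base, max_k):
    """Return {k: C_XX(k)} for k = 1..max_k, where C_XX(k) = p_XX(k) - p_X * p_X.

    Assumption: p_XX(k) is taken as the fraction of position pairs (i, i + k)
    in which both positions carry `base`.
    """
    seq = seq.upper()
    n = len(seq)
    p_x = seq.count(base) / n  # overall proportion of the nucleotide
    result = {}
    for k in range(1, max_k + 1):
        pairs = n - k
        if pairs <= 0:
            break
        hits = sum(1 for i in range(pairs)
                   if seq[i] == base and seq[i + k] == base)
        result[k] = hits / pairs - p_x * p_x
    return result


# Toy tandem repeat with a 4-bp period (hypothetical input, not rice data).
profile = autocorrelation("ACGT" * 10, "A", max_k=6)
strongest_period = max(profile, key=profile.get)
```

On a real CentO tract, peaks in such a profile would be expected at the monomer length and its multiples, which is how the monomer, dimer, and multimer periodicities are read from the graphs.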
Conserved and variable regions of a CentO monomer were defined by a sliding window analysis as described previously (Hall et al. 2003). The percent occurrence of the most frequent base at each site was calculated for CentO repeats; this was plotted with the average percent occurrence and standard deviation (SD). z-Scores of 10-bp windows were used to define regions of the CentO repeat sequences with significantly higher or lower variability, and the residual graphs of z-scores from the 10-bp window analysis are presented. Windows with z-scores beyond ±1 SD of the mean were considered significant.
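A minimal Python sketch of this kind of sliding window analysis is given below. The exact procedure of Hall et al. (2003) is not reproduced here: the function names are hypothetical, gap characters are counted like any other symbol, and window means are standardized against the distribution of all window means, all of which are assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def conservation_profile(aligned):
    """Percent occurrence of the most frequent character in each alignment column."""
    profile = []
    for col in range(len(aligned[0])):
        counts = Counter(s[col] for s in aligned)
        profile.append(100.0 * counts.most_common(1)[0][1] / len(aligned))
    return profile


def window_z_scores(profile, window=10):
    """z-score of each sliding-window mean relative to all window means."""
    means = [mean(profile[i:i + window])
             for i in range(len(profile) - window + 1)]
    mu, sd = mean(means), pstdev(means)
    if sd == 0:
        return [0.0] * len(means)
    return [(m - mu) / sd for m in means]


def significant_windows(z_scores, threshold=1.0):
    """0-based start positions of windows more than `threshold` SD from the mean."""
    return [i for i, z in enumerate(z_scores) if abs(z) > threshold]


# Toy alignment (hypothetical): only the fifth column is variable.
toy = ["AAAAAA", "AAAAAA", "AAAACA", "AAAAGA"]
toy_profile = conservation_profile(toy)
toy_hits = significant_windows(window_z_scores(toy_profile, window=2))
```

In the toy alignment the two windows that overlap the variable column are flagged, which mirrors how a variable domain appears as a run of low z-scores.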
Reverse Transcriptase–Polymerase Chain Reaction and 3' Rapid Amplification of cDNA Ends
RNA used for reverse transcriptase–polymerase chain reaction (RT–PCR) and 3' rapid amplification of cDNA ends (RACE) experiments was isolated using Trizol (Invitrogen) and treated with DNaseI (Ambion, Austin, TX). The SuperScript III First-Strand Synthesis System for RT–PCR kit (Invitrogen) was used for both RT–PCR and 3' RACE according to the manufacturer's protocol. Reverse transcription (cDNA synthesis) was carried out using 100 ng RNA and a mix of CentO strand-specific primers (RT–PCR) or the 3' RACE_oligoT primer (5'-GGC CAC GCG TCG ACT AGT ACT TTT TTT TTT TTT TTT TTV-3'; 3' RACE). The mix of forward CentO primers consisted of DNA oligonucleotides CentO_U (5'-TCATGTTTTGGTGCTTTTTG-3'), CentO_F1 (5'-CAATATGTCCAAAAANCATGTTT-3'), and CentO_F2 (5'-CGAACGCACCCAATACANT-3'). The mix of reverse CentO primers included DNA oligonucleotides CentO_L+ (5'-GNTTTTTGGACATATTGGAGTG-3'), CentO_R1 (5'-AAACATGNTTTTTGGACATATTG-3'), and CentO_R2 (5'-ANTGTATTGGGTGCGTTCG-3'). Reversely transcribed RNA was used as a template for PCR amplification. The PCR reaction mix (25 µl) consisted of 1x PCR buffer, 0.2 mM deoxynucleoside triphosphates, 0.2 µM primers, 1.5 mM MgCl2, 1 U of Platinum Taq polymerase (Invitrogen), and 5 ng of reversely transcribed RNA or an equal amount of reverse transcriptase–untreated RNA as a negative control. The reaction profile included 35 cycles of 30 s at 94 °C, 50 s at 55 °C, and 1–3 min at 72 °C, preceded by initial denaturation (3 min at 94 °C) and followed by a final extension step (10 min at 72 °C). Three combinations of CentO primers (CentO_U and CentO_L+, CentO_F1 and CentO_R2, CentO_F2 and CentO_R1) were used for RT–PCR amplification. Primer pairs including AUAP_3' RACE (5'-GGC CAC GCG TCG ACT AGT AC-3') and any one of the CentO primers were used for 3' RACE amplification. Sequences of cloned RT–PCR and RACE products were deposited in the GenBank expressed sequence tag (EST) database under accession numbers EB086891–EB086995.
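The strand relationships among the primers listed above can be checked with a few lines of Python. The sketch below uses the primer sequences exactly as given in the text; the helper `reverse_complement` is written for illustration, and the degenerate base N is treated as its own complement.

```python
# Complement table for the bases that occur in the primers (N pairs with N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Primer sequences copied from the text above.
PRIMERS = {
    "CentO_F1": "CAATATGTCCAAAAANCATGTTT",
    "CentO_F2": "CGAACGCACCCAATACANT",
    "CentO_R1": "AAACATGNTTTTTGGACATATTG",
    "CentO_R2": "ANTGTATTGGGTGCGTTCG",
}

f1_matches_r1 = reverse_complement(PRIMERS["CentO_F1"]) == PRIMERS["CentO_R1"]
f2_matches_r2 = reverse_complement(PRIMERS["CentO_F2"]) == PRIMERS["CentO_R2"]
```

Both comparisons evaluate to true, so CentO_F1/CentO_R1 and CentO_F2/CentO_R2 each cover the same site on opposite strands. This appears consistent with the primer combinations used for amplification, which pair CentO_F1 with CentO_R2 and CentO_F2 with CentO_R1.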
Detection of siRNA
The RNA enriched for short fragments was isolated using the mirVana miRNA isolation kit (Ambion). Approximately 10 µg RNA was resolved on a denaturing 15% polyacrylamide gel and then transferred electrophoretically to a Nytran SPC nylon membrane (Schleicher & Schuell BioScience, Keene, NH). Strand-specific probes were labeled using the MAXIscript kit for in vitro transcription labeling (Ambion). The template for in vitro transcription was prepared from RT–PCR clone ID124 (GenBank accession number EB086904). Promoter sequences for T7 polymerase were added to either end of the insert by PCR with primer pairs T7+CentO_U (5'-TAA TAC GAC TCA CTA TAG GGT CAT GTT TTG GTG CTT TTT G-3') and CentO_L+ (reverse probe) or T7+CentO_L+ (5'-TAA TAC GAC TCA CTA TAG GGN TTT TTG GAC ATA TTG GAG TG-3') and CentO_U (forward probe). To visualize marker RNA, 0.5 fmol of the marker-specific template was added to the labeling reactions. The hybridization was performed overnight in 125 mM sodium phosphate buffer (pH 7.2) containing 50% deionized formamide, 7% sodium dodecyl sulfate (SDS), and 250 mM sodium chloride at 42 °C. After the hybridization, membranes were washed 3 times in 2x standard saline citrate (SSC) and 0.1% SDS for 10 min, twice in 1x SSC and 0.1% SDS for 15 min, and finally once in 5x SSC and 0.5% SDS for 10 min at 50 °C. Signals were detected using a phosphoimager.
| Results |
|---|
Sequencing and Sequence Assembling of the CentO Repeats in Cen8
Rice Cen8 contains a single CentO block, named as CentO_8, in the CENH3-binding domain (fig. 1A and B) (Nagaki et al. 2004
The assembly of the CentO_8 sequences of a0038J12 was a challenging process. We constructed 2 shotgun libraries (average insert size 2–4 kb and 6–12 kb, respectively) for this BAC clone. The shotgun sequences (1,434 total) were assembled using The Institute for Genomic Research (TIGR) Assembler (Sutton et al. 1995
Cen8 was sequenced independently by Wu et al. (2004). The CentO_8 block within the 1.97 Mb Cen8 sequence reported by Wu et al. (2004) contains 77,772 bp of sequence (http://rgp.dna.affrc.go.jp/publicdata/cent8/download.html). However, the CentO_8 block within the most recent release of the chromosome 8 sequence (Build 4.0 pseudomolecules, August 2005) by the IRGSP contains only 76,175 bp of sequence (http://rgp.dna.affrc.go.jp/IRGSP/Build4/build4.html). The sizes of the CentO_8 block in both reports are longer than the 64- to 65-kb estimation by fiber-FISH. The size variation of CentO_8 from independent sequencing efforts shows that sequencing and assembly of a large block of highly homogenized satellite repeats is still a major technical challenge. Thus, we need to be cautious in analyzing such sequence data and in drawing biological conclusions solely based on the sequence data.
Structure and Organization of the CentO Repeats in Cen8
The CentO_8 sequences from both a0038J12 (hereafter the TIGR sequence) and the IRGSP chromosome 8 pseudomolecule (hereafter the IRGSP sequence) contain 3 subblocks of CentO repeats, CentO_8A, CentO_8B, and CentO_8C (fig. 1A). These 3 CentO blocks are separated by 2 centromeric retrotransposon (CRR)-related sequences (fig. 1A and B). We compared the 2 sequences by dot plot analysis and pairwise alignment. The 20,551 bp in the center of the 2 sequences are 100% identical. It is likely that the CRR-related sequences provided valuable anchoring sequences for sequence assembly. Short CentO fragments, 3,699 bp and 131 bp, respectively, located at the edges of the 2 sequences are also 100% identical (fig. 1A). The rest of the sequences cannot be perfectly aligned. These results again suggest that one or both sequences are not accurately assembled. BACs containing satellite repeats may not be stably maintained in Escherichia coli (Song et al. 2001), which can also cause the discrepancy between the 2 CentO_8 sequences.
The CentO_8A, CentO_8B, and CentO_8C subblocks are 18,342 bp, 7,617 bp, and 12,249 bp, respectively, in the TIGR sequence. We calculated base periodicities within individual CentO_8 subblocks and generated graphs of peaks showing the most frequent monomer, dimer, and multimer (fig. 1C). The graph of each CentO tract indicated that the most frequent monomer is 155 bp, but CentO_8B and CentO_8C contain small proportions of monomers with different sizes, 145 bp and 167 bp, respectively (fig. 1C). CentO_8A contained 115 units of the 155-bp monomer. CentO_8B contained 41 units of the 155-bp monomer and 8 units of the 145-bp CentO monomer that contains a 10-bp deletion (fig. 1D). CentO_8C consists of 67 units of the 155-bp monomer and 6 units of the 167-bp monomer that contains a 12-bp duplication at the 58th base position (fig. 1D). All the CentO monomers were tandemly ordered and uninterrupted in a head-to-tail arrangement within each subblock. CentO_8A and CentO_8B subblocks are in the same orientation, but the CentO_8C subblock is in the opposite orientation (fig. 1B). The CentO repeats within CentO_8 have an overall A + T content of 56.6%.
Using a combination of Blast and DotPlot alignment tools (see Materials and Methods), we found that the CentO repeats can be classified as either monomeric or higher order. The higher order CentO repeats contain at least 2 tandem copies of a multimeric unit. Such repeats were found in the CentO_8A subblock (fig. 2A and B), as well as the CentO_8B and CentO_8C subblocks (Supplementary Figure 1, Supplementary Material online). CentO_8A contains 2 multimeric units, HOR A and HOR B, each comprising eleven 155-bp monomers and another 95-bp partial sequence derived from the 155-bp monomer (fig. 2A and B). HOR A and HOR B are 99.2% identical and are separated by a 24-bp sequence. Individual monomers within each multimeric unit share 47.7–96.8% sequence similarity (70.5–96.8% similarity if the highly divergent first monomer is excluded). Phylogenetic trees of these individual monomers indicate that monomers located at equivalent positions in the duplicated multimeric units are highly homologous (fig. 2C).
Structure and Organization of the CentO Repeats in Cen1
Rice Cen1 contains ~1.4 Mb of CentO repeat, representing one of the largest CentO arrays among the 12 rice chromosomes (Cheng et al. 2002
Most of the CentO_1 blocks contain only heterogeneous CentO monomers that fail to show any evidence of higher order periodicity. These heterogeneous monomers within Cen1 are 67–100% identical. However, some higher order CentO repeats were found in Cen1 (fig. 4, Supplementary Figure 1, Supplementary Material online). For example, CentO_1D contains 2 different higher order CentO repeats that consist of 6 and 10 different monomers, respectively (fig. 4B and C). The equivalent monomers within the 2 higher order repeats share >97% and >99% sequence identities.
Local Homogenization of the CentO Repeats within Cen8 and Cen1
We investigated if homogenization of the CentO repeats occurred within a specific centromere. We first extracted all CentO monomers from all known higher order repeats within Cen1 and Cen8 and constructed a phylogenetic tree using neighbor-joining methods (fig. 5). The CentO repeats from Cen1 and Cen8, respectively, fall into 2 distinct clades. Most CentO monomers were grouped into subclades that can be associated with specific CentO_1 and CentO_8 subblocks (fig. 5). Similarly, the monomeric CentO repeats within Cen1 and Cen8 were also sorted into different subclades on the neighbor-joining tree (Supplementary Figure 2, Supplementary Material online). These results show that CentO repeats from the same centromere are more closely related to each other than to repeats from different centromeres, supporting a local homogenization model.
We then analyzed the percent identity scores for all the CentO repeats within Cen1 and Cen8. The CentO repeats from the same centromere are clearly more similar based on the plot of percent identity scores (fig. 6). The CentO monomers from Cen8 are more uniformly similar to each other than the CentO monomers from Cen1. This is partially due to the fact that some Cen1 CentO monomers differ significantly in size from the typical 155-bp and 165-bp CentO monomers. Notably, the CentO repeats from the short arm of the Cen1 (CentO_1A, CentO_1B, and CentO_1C) appear to be more similar compared with the CentO repeats from the long arm of the Cen1 (CentO_1D, CentO_1E, CentO_1F, CentO_1G, and CentO_1H) on the plot of percent identity scores (fig. 6), although the CentO_1I and 1J monomers are more similar to those in CentO_1 A, B, and C. Thus, these data support local homogenization of CentO repeats within Cen1 and Cen8.
We also calculated the means of mutual percent identities among CentO monomers within and between individual CentO subblocks from Cen1 and Cen8 (table 1). Within and between each of CentO_8A, CentO_8B, and CentO_8C, monomer percent identity was 87.6–90.9%. The percent identity of CentO monomers within and between each Cen1 CentO block was 72.7–90.4%. CentO_1H, which contains several significantly divergent monomers, has a particularly low mean of percent identity (72.7%) and high SD (15.3) (table 1). The overall means of percent identity among CentO monomers of Cen1 and Cen8 are 84.5% (SD 9.0) and 88.6% (SD 3.1), respectively, whereas the mean of percent identity between Cen1 and Cen8 is only 81.3% (SD 7.7). Thus, these data again indicate that similarity among CentO monomers is greatest within a centromere.
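The within-block and between-block summaries described above can be sketched in Python as follows. The study derived pairwise similarities from ClustalW alignments; this sketch substitutes a simple identity count over sequences that are assumed to be pre-aligned to equal length, uses the population SD, and introduces the hypothetical helper names `percent_identity` and `identity_summary`.

```python
from itertools import combinations, product
from statistics import mean, pstdev


def percent_identity(a, b):
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def identity_summary(group_a, group_b=None):
    """Mean and SD of pairwise identities within one group or between two groups."""
    if group_b is None:
        pairs = combinations(group_a, 2)   # all unordered pairs within a group
    else:
        pairs = product(group_a, group_b)  # every cross-group pair
    scores = [percent_identity(a, b) for a, b in pairs]
    return mean(scores), pstdev(scores)


# Toy monomers (hypothetical 10-bp sequences standing in for two blocks).
block_1 = ["AAAAAAAAAA", "AAAAAAAAAT"]
block_2 = ["CCCCCAAAAA", "CCCCCAAAAT"]
within_1 = identity_summary(block_1)
between = identity_summary(block_1, block_2)
```

A higher within-group mean than between-group mean, as in the toy data, is the pattern interpreted in the text as evidence of local homogenization.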
Cloning and Analysis of the CentO Repeats Located in CENH3-Binding Domains
If a centromere contains several megabases of centromeric satellite repeats, it is likely that only a portion of the satellite array is associated with CENH3 (Jin et al. 2004
We extracted a total of 78 complete and 235 partial CentO monomers from the 112 sequences. Multiple alignments were conducted to generate the consensus sequence of the complete CentO monomers (Supplementary Figure 3, Supplementary Material online). The CentO repeats from the ChIP-cloned data set are fairly consistent in length, consisting exclusively of 155-bp (46) and 165-bp (32) monomers. The sizes of some CentO monomers deviate slightly from the typical 155-bp (154–156 bp) and 165-bp (163–166 bp) monomers, indicating that insertion and deletion events occurred within these repeats. The 165-bp monomer contains a 10-bp insertion (ATGCCAATAT) from the 149- to 158-bp position. This 10-bp insertion showed >99% nucleotide identity among 32 units. Pairwise alignment of the sequences by the Clustal method revealed that the percent identity among 155-bp CentO monomers ranges from 76% to 100% and the percent identity among 165-bp CentO monomers from 86% to 100%. The CentO repeats derived from the CENH3-binding domains have an A + T content of 57.2%, similar to the 56.6% A + T content of CentO_8.
Sequence Variability of CentO Repeats
Multiple alignment analysis revealed differences in sequence conservation across the CentO monomer (Supplementary Figure 3, Supplementary Material online). To measure this variation precisely, we calculated the nucleotide occurrence frequency at each base. The percentage of occurrence for the most frequent nucleotide was subjected to a z-score analysis, computed over a sliding window of 10 bp (fig. 7). We first used all ChIPed CentO monomers in this analysis. The 10-bp insertion within the 165-bp monomers was marked as a gray box on the graph (fig. 7A), and these 10 bp were calculated independently (see Materials and Methods). Most nucleotides within the CentO monomer were conserved within 1 SD of the mean of 92.7 ± 8.3%. The CentO monomer contains 8 polymorphic sites in which the most common nucleotide is less than 3 times more frequent than any other nucleotide (fig. 7). Six of the eight polymorphic sites are located within a highly variable region at positions 111–135. The same highly variable domain was also identified in analyses using different sizes of sliding window (in the range of 5–18 bp, data not shown).
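The polymorphic-site criterion stated above can be expressed as a small Python sketch. The function name `polymorphic_sites` is hypothetical, and two readings are assumed: a site qualifies when the count of the most common nucleotide is strictly less than 3 times the count of the second most common one, and gap characters, if present, are counted like any other symbol.

```python
from collections import Counter


def polymorphic_sites(aligned, ratio=3.0):
    """1-based alignment positions where the top base is < `ratio` x the runner-up."""
    sites = []
    for col in range(len(aligned[0])):
        top_two = Counter(s[col] for s in aligned).most_common(2)
        # Invariant columns have a single entry and are never polymorphic.
        if len(top_two) == 2 and top_two[0][1] < ratio * top_two[1][1]:
            sites.append(col + 1)
    return sites


# Toy alignment (hypothetical): only column 3 has two equally frequent bases.
toy_alignment = ["ACGT", "ACGA", "ACTA", "AGTA"]
toy_sites = polymorphic_sites(toy_alignment)
```

In the toy alignment, columns 2 and 4 have a 3:1 split, which sits exactly at the threshold and is therefore not counted, whereas the 2:2 split in column 3 is.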
We then assessed the sequence variation of the CentO repeats from Cen8 and obtained similar results (fig. 7C and D). Base frequency analysis of all 155-bp CentO monomers from Cen8 revealed that most nucleotides were conserved within 1 SD from the mean of 93.2 ± 8.6%, including 8 polymorphic sites. The sliding window of z-scores of CentO monomers from Cen8 identified a single variable region that is located at a similar position to the highly variable domain of the ChIPed CentO monomers (fig. 7B and D). We also analyzed the sequence variation within the 155-bp monomers extracted from Cen1 (fig. 7E–H). The sliding window of z-scores of the CentO repeats extracted from the short arm of Cen1 (CentO_1A, 1B, and 1C) shows expanded variable domains at similar positions to those within the ChIPed CentO repeats (fig. 7B and F). Interestingly, the sliding window of z-scores of the CentO repeats extracted from the long arm of Cen1 (CentO_1D, 1E, 1F, 1G, 1H, and 1I) shows variable domains throughout the CentO monomers with a significantly different graph compared with those from Cen8 and from ChIPed DNA (fig. 7G and H). The sequence in the 45–60 bp region is particularly more variable than the same regions of the CentO in Cen8 and ChIPed DNA (fig. 7B, D, and H).
Transcription of the CentO Repeats
In order to investigate the transcription of the CentO repeats, we first searched the rice full-length cDNA (fl-cDNA) and EST databases using BlastN. One fl-cDNA (AK069198) and 2 ESTs (CF307961 and CK041480) were identified in the databases. The fl-cDNA AK069198 contains 3 monomers of CentO flanked by other repetitive sequences. It was mapped to a BAC clone OSJNBb0063C17 (AC146908; chromosome 11), which contains several clusters of CentO sequences intermingled with other sequences. The EST sequences CF307961 and CK041480 are composed of 2 CentO monomers preceded by a sequence of different origin and of 4 CentO monomers, respectively. CF307961 was mapped to a BAC clone OJ1058_D04 (AP006234, chromosome 1) and in-depth analysis of the genomic region showed that the transcribed CentO sequence is a part of a relatively small CentO cluster containing only 9 full-length monomers. The region located upstream of the CentO cluster was identical to 2 cDNA sequences (AK063242 and AK067469), which, however, terminated before the CentO region. These results suggest that the CentO sequence in CF307961 is possibly a result of read-through transcription from the upstream transcribed locus. The genomic locus for the EST sequence CK041480 was not found.
We then used an RT–PCR approach to examine the transcription of CentO in Nipponbare rice. CentO primers were designed from the most conserved regions identified within the alignment of ChIP-cloned CentO sequences (Supplementary Figure 3, Supplementary Material online). Strand specificity of the RT–PCR was ensured by use of strand-specific CentO primers for cDNA synthesis. Although transcripts derived from both strands were detected in all tissues tested, there were differences between reactions using different primers. Transcripts derived from both CentO strands were easily detected using CentO_U and CentO_L+ primers in all tissues (fig. 8A), whereas primers CentO_F1 and CentO_R2 detected CentO transcripts with lower efficiency and primers CentO_F2 and CentO_R1 did not detect CentO transcripts at all (data not shown). As all primers worked well on genomic DNA (data not shown), these differences were likely due to different levels of transcription of different variants of the CentO repeats. To confirm the transcription of the CentO repeats and to assess the variability of amplified sequences, products from 12 RT–PCR reactions were cloned and a few clones from each library were sequenced. A total of 102 CentO monomers were identified in 77 sequenced clones.
The 2 CentO transcripts identified in databases showed that transcripts containing CentO repeats can be terminated both inside (CF307961) and outside (AK069198) of the CentO clusters. To assess variability in 3' end positions (i.e., polyadenylation sites) of CentO transcripts, we conducted 3' RACE experiments using RNA isolated from root, leaf, and panicles. In order to reduce amplification of artifacts, we used different primers for reverse transcription (3' RACE_oligoT) and PCR amplification (AUAP_3' RACE). For PCR amplification, we tested 6 CentO primers (3 reverse and 3 forward) of which 4 were able to detect CentO transcripts using RT–PCR (see above). Although products were detected in all 6 reactions, hybridization with a CentO probe revealed that reactions using forward primers mostly resulted in amplification of sequences not related to CentO (fig. 8B and C). The negative controls, which were not treated with reverse transcriptase, did not yield any product (data not shown).
The 3' RACE products from reactions with a positive hybridization result were cloned, and several clones were randomly picked for sequencing. We sequenced a total of 25 clones of which 24 were derived from the reverse CentO strand. We identified 9 sites of polyadenylation within the reverse CentO strand (Supplementary Figure 4, Supplementary Material online). Only one 3' RACE CentO product (sequence 206 in Supplementary Figure 4, Supplementary Material online) was clearly extended into a downstream sequence of retrotransposon origin. Only 2 polyadenylated sequences (CF307961 and sequence 121 in this study) were derived from the forward CentO strand. The position of the polyA-tail in both of them was the same although these sequences were only 84% identical. The 3' RACE data show that the transcription of the CentO repeats can be terminated at different positions within the CentO monomers and can also be extended into the downstream regions.
CentO Transcripts Are Processed into siRNA
Because both strands of the CentO repeat are transcribed, the transcripts have a potential to form double-stranded RNA, a precursor of siRNA. In order to discover whether the CentO transcripts are processed into siRNA, we hybridized blots containing small RNA isolated from rice leaves with 2 strand-specific CentO probes. The probes were prepared from an RT–PCR clone ID124 (EB086904). We detected siRNAs from probes prepared from both forward and reverse strands of ID124. However, the sizes of the siRNAs detected by the 2 probes varied. While the forward CentO probe hybridized to 21- to 24-nt siRNA, the reverse probe hybridized to 23-nt siRNA only (fig. 9). In addition to the siRNA, both probes also hybridized to ~40-nt-long RNAs. As the hybridization stringency was optimized to allow hybridization of small RNA, it inevitably resulted in cross-hybridizations to longer and highly abundant RNA types such as tRNAs and 5S RNA (fig. 9).
We also searched miRNA and siRNA sequences recently described in rice (Sunkar, Girke, Jain, and Zhu 2005
| Discussion |
|---|
Organization of the CentO Satellite Repeats
Extensive studies on the α satellite DNA in human centromeres revealed highly homogenized higher order repeats and more divergent monomeric repeats (Rudd and Willard 2004
α satellite repeats have been identified in most human centromeres. Studies of the α satellite in the X chromosome centromere showed that the divergent monomeric repeats are located at the edge of the α satellite array, and the center of the array contains the highly homogenized higher order repeats (Schueler et al. 2001
α satellite DNA in other centromeres appears to be organized similarly to the X centromere (Rudd and Willard 2004
Rice Cen8 contains a ~750-kb region that is associated with CENH3 (Nagaki et al. 2004), including a single CentO array, CentO_8 (fig. 1B). Both monomeric and higher order CentO repeats are found in CentO_8 (fig. 2, Supplementary Figure 1, Supplementary Material online). The higher order repeats are separated into several domains within CentO_8. Similarly, we found short zones of higher order CentO repeats within Cen1 (Supplementary Figure 1, Supplementary Material online). The majority of the CentO array in Cen1 is not included in the current sequence map, and the composition of these missing sequences is unknown. The higher order CentO repeats within Cen8 and Cen1 are highly similar to the short zones of higher order α satellite repeats found in human centromeres (Rudd and Willard 2004). Such zones were predicted to arise via local homogenization events, which represent transition states in the early stages of sequence family homogenization (Smith 1976; Dover 1982).
In humans, only the higher order α satellite DNA is incorporated into CENP-A (human CENH3)-associated centromeric chromatin (Schueler et al. 2001; Ando et al. 2002; Ohzeki et al. 2002; Spence et al. 2002). There has been no evidence for the direct involvement of the monomeric α satellite DNA in centromere function. We demonstrate that the CentO repeats in Cen8 are largely monomeric (Supplementary Figure 1, Supplementary Material online). Thus, a higher order structure is not required for centromeric satellite DNA to become CENH3-associated centromeric chromatin. Analysis of the α satellite DNA in the X chromosomes from human and other primates showed that the X centromere evolved through repeated expansion events involving the central domain that may contain mainly higher order repeats (Schueler et al. 2005). Thus, the higher order structure may be the product of yet unknown mechanisms that drive the evolution of centromeric satellite DNA.
Homogenization of the CentO Satellite Repeats
Centromeric satellite DNA families are subject to concerted evolution. The α satellite repeats in primates show more sequence similarity within a species than between species (Willard and Waye 1987). The higher order α satellite repeats in humans have diverged into chromosome-specific subfamilies (Willard and Waye 1987). Local homogenization of the α satellite repeats has been well demonstrated in the centromeres of human chromosomes 17 and X (Schueler et al. 2005; Rudd et al. 2006). Higher rates of divergence among the higher order repeats as compared with the monomeric repeats were confirmed in both centromeres. Local homogenization was even associated with the monomeric α satellite repeats in centromere 17, although these repeats are more similar to the monomeric α satellite repeats from other centromeres than to the neighboring higher order α satellite repeats (Rudd et al. 2006).
Our analysis of the CentO repeats within Cen8 and Cen1 is also consistent with the model in which centromeric satellites are homogenized locally. Although the CentO repeats from both Cen8 and Cen1 are mostly monomeric, both dot plot and phylogenetic analyses revealed that the CentO repeats from the same centromere are more similar to each other than to those from a different centromere (figs. 5 and 6). Ma and Jackson (2006) compared 226 CentO monomers collected from 12 rice centromeres. The neighbor-joining tree derived from these 226 monomers showed that some monomers either within a single centromere or between different centromeres show very similar distances. It was concluded that the CentO satellites have undergone interchromosomal exchange and genome-wide homogenization (Ma and Jackson 2006). However, the CentO repeats from Cen1 and Cen8 are clearly separated into 2 distinct clusters (fig. 5, Supplementary Figure 2, Supplementary Material online). We also constructed a neighbor-joining tree using all 155-bp CentO monomers from the centromeres of rice chromosomes 1, 4, 8, and 11. Four distinct clusters were formed in the tree (data not shown). Thus, selection of a small number of CentO repeats in the phylogenetic analysis will mask the significance of the local homogenization of this repeat. Local homogenization of the centromeric satellite has also been demonstrated in A. thaliana and its related species (Hall et al. 2005). These observations support the view that the centromeric satellite repeats in plants have undergone intrachromosomal exchanges and local homogenization similar to those of the α satellite repeats in humans.
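The comparison of within-centromere and between-centromere similarity can be illustrated with a minimal sketch. It is not the analysis used in this study, which relied on dot plots and neighbor-joining trees. It assumes the monomers have already been aligned to equal length, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations, product


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where both sequences carry a gap.
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def mean_identity_within(group):
    """Mean pairwise identity among monomers from one array."""
    scores = [identity(a, b) for a, b in combinations(group, 2)]
    return sum(scores) / len(scores)


def mean_identity_between(group1, group2):
    """Mean pairwise identity between monomers from two different arrays."""
    scores = [identity(a, b) for a, b in product(group1, group2)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 10-nt "monomers" (hypothetical; real CentO monomers are ~155 bp).
    array_a = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"]
    array_b = ["ACGAACGAAC", "ACGAACGAAC", "ACGAACGTAC"]
    print("within A:", mean_identity_within(array_a))
    print("within B:", mean_identity_within(array_b))
    print("between A and B:", mean_identity_between(array_a, array_b))
```

Under local homogenization, the within-array means would be expected to exceed the between-array mean. With only a few monomers sampled per array, this difference may be hard to distinguish from noise.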
Functional Constraints on the Evolution of Centromeric Satellite Repeats
Satellite DNA families are subject to rapid changes in sequence and copy numbers (Smith 1976; Charlesworth et al. 1994). Most satellite repeats are preserved only in closely related species. However, the evolution of satellite repeats associated with CENH3-containing chromatin may be constrained by centromere function. CENH3 and CENP-C, another DNA-binding inner kinetochore protein, are undergoing rapid adaptive evolution (Malik and Henikoff 2001; Talbert et al. 2002; Cooper and Henikoff 2004; Talbert et al. 2004). These proteins may serve as adaptors that match rapidly evolving centromeric DNA to the well-conserved centromeric protein machinery (Cooper and Henikoff 2004), and their evolution is driven by selection to minimize the consequences of centromeric satellite changes, which may be inherently destabilizing for the genome (Malik and Henikoff 2002). A highly conserved sequence motif has been found in the centromeric satellite DNAs among distantly related grass species (Lee et al. 2005). Similarly, highly conserved satellite repeats have been found among animal species that diverged more than 50 Myr ago (de la Herran et al. 2001; Mravinac et al. 2005). These results support a functional constraint on the evolution of certain satellite repeat families.
The presence of conserved and/or variable domains in the centromeric satellite repeats suggests that the evolution of such sequences has been influenced by selective constraints. Such constraints may be related to their interaction with the centromeric proteins. Hall et al. (2003) were the first to demonstrate that the 178-bp centromeric satellite repeat in A. thaliana contains significantly conserved and variable domains. A single variable domain detected in the 178-bp repeat by Hall et al. (2003) is strikingly similar to the single variable domain observed in the CentO repeat from rice Cen8 (fig. 7D). A similar single but expanded variable domain was also observed in the CentO repeats isolated from the ChIPed CentO sequences (fig. 7B). Interestingly, a similar analysis of the CentO repeats from the long arm of rice chromosome 1 shows a significantly different graph, with variable domains distributed throughout the CentO sequence (fig. 7H). Because the entire CentO_8 block is located within the CENH3-binding domain, it was not surprising that the variability graphs of the CentO repeats from Cen8 and ChIPed CentO are similar to each other. However, CentO_1 represents one of the largest CentO arrays in rice. The CentO repeats collected from Cen1 of the current sequence map represent the sequences on the edges of the CentO_1 array, which are possibly not associated with CENH3. Such repeats may evolve differently from those associated with CENH3 and may be free from the constraints associated with centromere function. An expanded variable domain was also observed in the 178-bp repeats collected from the edges of the centromeric satellite arrays in A. thaliana (Hall et al. 2003).
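As an illustration of how a variable domain can be located in a set of aligned monomers, the following sketch computes a per-position variability score and smooths it with a sliding window. This is an assumed, simplified scoring scheme and not necessarily the method behind fig. 7 or that of Hall et al. (2003). The function names, window size, and toy sequences are hypothetical.

```python
from collections import Counter


def column_variability(aligned):
    """Per column, the fraction of sequences that differ from the most common base."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for column in zip(*aligned):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(1 - most_common_count / len(column))
    return scores


def sliding_mean(values, window):
    """Mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Toy 8-nt "monomers" (hypothetical) with one variable column.
    monomers = ["ACGTACGT", "ACGTTCGT", "ACGTGCGT", "ACGTACGT"]
    per_position = column_variability(monomers)
    print(per_position)
    print(sliding_mean(per_position, window=2))
```

A single variable domain would appear as one contiguous peak in the smoothed profile, whereas variability distributed throughout the monomer would give several separate peaks.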
It is not clear whether a highly conserved domain, a highly variable domain, or both are functionally significant. A highly conserved domain may be critical for protein binding. For example, one of the centromeric proteins in humans, CENP-B, recognizes a 17-bp motif in the α satellite repeat known as the CENP-B box (Masumoto et al. 1989). DNA motifs similar to the CENP-B box were reported in the centromeric repeats of various eukaryotes. Such motifs may have been maintained because of selective pressure for their interaction with centromeric proteins. Interestingly, the CENP-B box in the α satellite repeats is located within a highly variable domain (Hall et al. 2003). Because the CENP-B box is only located in subsets of the α satellite repeats, it was suggested that the polymorphism associated with the CENP-B box region may serve to phase CENP-B binding within the satellite array, which may be required for the assembly of higher order structure of the α satellite DNA (Choo 2000; Hall et al. 2003). It will be interesting to know whether the sequence in the single variable domain within the CentO repeat is specifically recognized by centromeric proteins in rice.
Transcription and siRNA Production from the CentO Satellite Repeats
Transcriptional activity of centromeric satellites has been reported in a number of species including both plants (Topp et al. 2004; May et al. 2005; Zhang et al. 2005) and animals (Baldwin and Macgregor 1985; Fukagawa et al. 2004; Kanellopoulou et al. 2005; Martens et al. 2005; Terranova et al. 2005). The structure of transcripts of satellite sequences is mostly unknown. It was shown that the transcription might be initiated from upstream promoters provided by mobile elements inserted within or near satellite DNA clusters (Topp et al. 2004; May et al. 2005). We show that both strands of the CentO repeat are transcribed. In at least some cases, the transcription was initiated from upstream non-CentO sequences, and the transcripts were terminated and polyadenylated within the CentO sequences. The CentO transcripts were detected in all 3 organs tested (root, leaf, and panicle), suggesting that the transcription is constitutive. However, the RT-PCR results from different primer sets indicate that only some subfamilies or certain specific loci of the CentO repeat are transcribed, whereas others are silent. The overall CentO transcription level is rather low because we were not able to detect unambiguous hybridization signals on a regular Northern blot (data not shown). In addition, only a few CentO transcripts were found in large collections of the rice fl-cDNA/EST databases.
Satellite DNAs located in the heterochromatic regions are often transcriptionally silent. However, it has recently become apparent that a low level of transcription is actually necessary for establishing the transcriptionally silent heterochromatin state through RNAi (reviewed in Bernstein and Allis 2005; Gendrel and Colot 2005). This process is initiated by transcription of both strands and formation of double-stranded RNA, which is processed by RNA-induced silencing complex into 20- to 26-nt-long siRNAs. The siRNAs are then recognized by the RNA-induced initiation of transcriptional gene silencing (RITS) complex, which is responsible for initiation of heterochromatin assembly and transcriptional silencing (Verdel et al. 2004). The role of the siRNA in RITS is to target this complex to specific chromosome regions by interaction with DNA or nascent transcripts. Our results showed that CentO transcripts are processed into 21- to 24-nt-long siRNA. This is in agreement with other studies where siRNAs derived from satellite DNA were identified either by cloning and sequencing (Aravin et al. 2003; Lu et al. 2005) or detected by hybridization (Fukagawa et al. 2004; May et al. 2005; Zhang et al. 2005). However, as no CentO sequences were found among miRNAs and siRNAs cloned from rice root, shoot, and inflorescence tissues (Sunkar, Girke, Jain, and Zhu 2005; Sunkar, Girke, and Zhu 2005), it seems that CentO siRNAs are not highly abundant in rice. This conclusion is also supported by the fact that we could not detect CentO siRNA with less-efficient probes labeled using alternative methods (5' end labeling, random priming), which were sufficient to detect some other small RNAs (data not shown). Interestingly, CentO probes also hybridized to ~40-nt-long RNA. It is not clear whether this RNA is a product or intermediate of some RNA-processing pathway or whether it is a short CentO transcript. RNA 40-900 nt in length derived from the centromeric satellite repeat CentC was detected in maize (Topp et al. 2004). This RNA was shown to be tightly bound within maize centromeric chromatin and was implied to contribute to initiation and stabilization of kinetochore chromatin structure. Thus, the transcripts from the centromeric satellite repeats in these species may play different roles, including contribution to epigenetic chromatin modifications via the RNAi pathway.
| Supplementary Material |
|---|
|
|
|---|
Supplementary Figures 1-4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
|
|
|---|
This research was supported by Department of Energy grant FG02-01ER15266 to J.J. and grant GA204/04/1207 to J.M. We thank Robin Buell for description and discussion of the sequencing effort involving BAC a0038J12 and Tim Langdon for his valuable comments on the manuscript.
| Footnotes |
|---|
1 These 2 authors contributed equally to this work.
Naoko Takezaki, Associate Editor
| References |
|---|
|
|
|---|
Ananiev EV, Phillips RL, Rines HW. (1998) Chromosome-specific molecular organization of maize (Zea mays L.) centromeric regions. Proc Natl Acad Sci USA 95:13073-13078.
Ando S, Yang H, Nozaki N, Okazaki T, Yoda K. (2002) CENP-A, -B, and -C chromatin complex that contains the I-type alpha-satellite array constitutes the prekinetochore in HeLa cells. Mol Cell Biol 22:2229-2241.








