MBE Advance Access originally published online on January 23, 2007
Molecular Biology and Evolution 2007 24(4):956-968; doi:10.1093/molbev/msm012
Research Articles
The Complete Chloroplast and Mitochondrial DNA Sequence of Ostreococcus tauri: Organelle Genomes of the Smallest Eukaryote Are Examples of Compaction



* Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), Ghent University, Ghent, Belgium
UMR 7628 CNRS, Université Paris VI, Laboratoire Arago, Banyuls sur Mer, France
Institut de Génétique Humaine, UPR CNRS 1142, Montpellier, France
E-mail: yves.vandepeer{at}psb.ugent.be.
| Abstract |
|---|
The complete nucleotide sequence of the mt (mitochondrial) and cp (chloroplast) genomes of the unicellular green alga Ostreococcus tauri has been determined. The mt genome assembles as a circle of 44,237 bp and contains 65 genes. With an overall average length of only 42 bp for the intergenic regions, this is the most gene-dense mt genome of all Chlorophyta. Furthermore, it is characterized by a unique segmental duplication, encompassing 22 genes and covering 44% of the genome. Such a duplication has not been observed before in green algae, although it is also present in the mt genomes of higher plants. The quadripartite cp genome forms a circle of 71,666 bp, containing 86 genes divided over a larger and a smaller single-copy region, separated by 2 inverted repeat sequences. Based on genome size and number of genes, the Ostreococcus cp genome is the smallest known among the green algae. Phylogenetic analyses based on a concatenated alignment of cp, mt, and nuclear genes confirm the position of O. tauri within the Prasinophyceae, an early branch of the Chlorophyta.
Key Words: chloroplast genome; mitochondrial genome; Chlorophyta; Ostreococcus tauri
| Introduction |
|---|
The so-called green lineage (Viridiplantae) is divided into 2 major divisions, namely, Streptophyta and Chlorophyta. Streptophyta contain all known land plants and their immediate ancestors, a group of algae known as "charophyte green algae" (e.g., Chaetosphaeridium globosum), whereas Chlorophyta contain the other green algae (e.g., Chlamydomonas reinhardtii) that form a monophyletic assemblage and are a sister group to the Streptophyta (Graham and Wilcox 2000).
The mt genomes of chlorophytes are usually small (25–90 kb), whereas in general a bigger genome size is observed for the streptophytes (from 68 kb for Chara vulgaris to around 400 kb for higher plants). The great majority of these genomes are circular, except for some species of Chlamydomonadales that have a linear genome (Vahrenholz et al. 1993
). The increase of the genome size observed within Streptophyta does not necessarily reflect an increase in coding capacity. Indeed, the transfer of mt genes to the nucleus over evolutionary time (Brennicke et al. 1993
), the enlargement and incorporation of new sequences within the mt intergenic spacers, the loss of genes, the increase of intron size, and the resulting decrease of the coding density are all characteristic for the mt genomes of higher land plants. In angiosperms, the most striking feature is the presence of a multipartite genome structure, which results in high-frequency recombination via repeated sequences in the genome (Fauron et al. 1995
), altering the genome copy number, which can result in different phenotypes (Kanazawa et al. 1994
; Janska et al. 1998
).
All cp (chloroplast) genomes that have been described for land plants have a very conserved genome size, usually around 150 kb covering about 70–80 genes. In contrast, the cp genomes of green algae, although having a rather similar genome size between 150 and 200 kb, show a tremendous variation in gene content, due to massive gene loss, genome erosion, and gene transfer to the nucleus (Grzebyk and Schofield 2003
). All cp genomes described so far are circular. Previous studies have shown that, although in green algae (e.g., C. reinhardtii) more genes have been transferred to the nucleus compared with land plants (e.g., tobacco), the rate of gene flow has subsequently slowed down dramatically and the transfer of DNA from cp to the nucleus is now very rare (Lister et al. 2003
). However, until very recently (Derelle et al. 2006
; this study), there was no chlorophyte that had its nuclear, cp, and mt genomes all published, and it therefore remained difficult to quantify precisely the extent of gene transfer from the organelles to the nucleus.
Ostreococcus tauri is a unicellular green alga that was discovered in the Mediterranean Thau lagoon (France) in 1994. With a size less than 1 µm, comparable to that of a bacterium, it is the smallest eukaryotic organism currently described (Courties et al. 1994
). Its cellular organization is rather simple with a relatively large nucleus with only 1 nuclear pore, a single chloroplast, 1 mitochondrion, 1 Golgi body, and a highly reduced cytoplasm compartment (Chrétiennot-Dinet et al. 1995
). A membrane surrounds the cells, but no cell wall can be observed. Apart from this simple cellular structure, the O. tauri nuclear genome is small (12.56 Mb) and is fragmented into 20 chromosomes (Derelle et al. 2006
). Phylogenetically, O. tauri belongs to the Prasinophyceae, an early branch of the Chlorophyta (Courties et al. 1998
). The presence of only 1 chloroplast and 1 mitochondrion and its basal position in the green lineage make this alga interesting for studying the structure and evolution of both genomes, whereas comparison with other members of the green lineage sheds light on the evolution of organelle genomes.
| Materials and Methods |
|---|
Sequencing
For the sequencing of the nuclear genome, cellular DNA was used for the preparation of the shotgun libraries (Derelle et al. 2006).
Gene Prediction and Annotation
All genes were annotated based on their similarity with cp and mt genes that were available in public databases and if necessary manually corrected using Artemis (Rutherford et al. 2000
). Homologous relationships between publicly available genes and the O. tauri genes were identified through Blast (Altschul et al. 1990
). Small and large ribosomal subunit RNA genes were also identified by Blast. Alignment and secondary structure annotation were done using the DCSE alignment editor (De Rijk and De Wachter 1993
). The secondary structure drawings were made using RnaViz (De Rijk et al. 2003
). tRNA genes were identified by tRNAscan-SE (Lowe and Eddy 1997
) using the option "search for organellar tRNAs (-O)". The 5S rRNA gene of the cp genome was identified using the CMSEARCH program from the INFERNAL package (Eddy 2002
) with the 5S rRNA covariance model (RF00001) from the RFAM database (Griffiths-Jones et al. 2005
).
Sequence Analyses
Pairwise comparison of gene permutations by inversions between different mt and cp genomes was obtained using the GRIMM web server (Tesler 2002
). The data sets used contained, respectively, 54 conserved mt and 82 conserved cp genes. As this tool cannot deal with duplicated genes, genes located in the inverted repeats (IRs) were counted only once.
Duplicated sequences within both genomes were identified using DOTTER (Sonnhammer and Durbin 1995
). For both genomes (but including only one of the IR sequences), short repeated sequences were identified with REPUTER 3.1 (Kurtz et al. 2001
), using the -p (palindromic), -f (forward), -l (minimum length), and -allmax parameters; and MUMMER 3.0 (Kurtz et al. 2004
), using the -l (minimum length) and -b (forward and reverse complement matches) options. PIPMAKER (Schwartz et al. 2000
) was used to visualize the location of the repeated sequences.
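The repeat search described above can be illustrated with a minimal sketch that lists exact forward (F) and palindromic (P, reverse-complement) repeats of a fixed seed length. This is not the REPUTER or MUMMER algorithm: both tools rely on suffix-based index structures and report maximal matches, whereas the hypothetical function `seed_repeats` below uses a plain k-mer index and reports fixed-length seeds only.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def seed_repeats(seq: str, min_len: int = 15):
    """List exact repeats of length min_len as (pos1, pos2, kind) tuples.

    Positions are 0-based and pos1 < pos2. kind is "F" when both windows
    are identical, and "P" when the window at pos2 is the reverse
    complement of the window at pos1. Illustrative sketch only.
    """
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - min_len + 1):
        index[seq[i:i + min_len]].append(i)

    hits = []
    for kmer, positions in index.items():
        # Forward repeats: the same k-mer at two different positions.
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                hits.append((positions[a], positions[b], "F"))
        # Palindromic repeats: the reverse complement occurs downstream.
        for j in index.get(reverse_complement(kmer), []):
            for i in positions:
                if i < j:
                    hits.append((i, j, "P"))
    return sorted(hits)
```

Overlapping seeds that belong to one longer repeat would still need to be merged, which is the step that dedicated tools perform when they report maximal repeats.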
Phylogenetic Analysis
Homologous genes of O. tauri cp and mt genes were searched for in the public databases (GenBank/EMBL/DDBJ) (Benson et al. 2002
; Stoesser et al. 2002
; Tateno et al. 2002
) using BlastP (Altschul et al. 1997
). Protein sequences were aligned with ClustalW (Thompson et al. 1994
). Two different data sets were built:
- Forty-seven cp protein sequences (atpA, atpB, atpE, atpF, atpH, clpP, petB, petG, psaA, psaB, psaC, psaJ, psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbN, psbT, psbZ, rbcL, rpl14, rpl16, rpl2, rpl20, rpl36, rpoA, rpoB, rpoC1, rpoC2, rps11, rps12, rps14, rps18, rps19, rps3, rps4, rps7, rps8, ycf3, and ycf4) from 14 different organisms (Chlorella vulgaris [Wakasugi et al. 1997
] [AB001684], Nephroselmis olivacea [Turmel et al. 1999a
] [AF137379], Pseudendoclonium akinetum [Pombert et al. 2005
] [AY835431], Stigeoclonium helveticum [Bélanger et al. 2006
] [DQ630521], Scenedesmus obliquus [de Cambiaire et al. 2006
] [DQ396875], Oltmannsiellopsis viridis [Pombert et al. 2006
a] [DQ291132], C. reinhardtii [Maul et al. 2002
] [BK000554], Mesostigma viride [Lemieux et al. 2000
] [AF166114], C. globosum [Turmel et al. 2002a] [AF494278], Marchantia polymorpha [Ohyama et al. 1986
] [M68929], Nicotiana tabacum [Shinozaki et al. 1986
] [Z00044], Pinus thunbergii [Wakasugi et al. 1994
] [D17510], Cyanophora paradoxa [Stirewalt et al. 1995
] [U30821], and O. tauri) were independently aligned and concatenated into a data set of 9,553 amino acids.
- A nuclear gene (small subunit [SSU] rRNA), 1 mt gene (nad5), and 2 cp genes (rbcL and atpB), encompassing 44 organisms, were combined into a data set of 5,053 nucleotides (based on Karol et al. 2001
)
PHYML 2.4.4 (Guindon and Gascuel 2003
) was used to compute maximum likelihood trees, using the cpREV45 model for cp sequences and the Hasegawa, Kishino and Yano (1985) model for the combined nucleic acid data set. Pairwise distance trees were obtained using TREECON (Van de Peer and De Wachter 1994
), based on Poisson (Zuckerkandl and Pauling 1965
; Dickerson 1971
) and Kimura (1983)
corrected distances for the protein alignment and Jukes and Cantor (1969) corrections for nucleic acid sequences. PHYLIP (the Phylogeny Inference Package; Felsenstein 1989
) was used for 1) computing pairwise distance trees using the Dayhoff PAM matrix (1979) for protein alignment and Jukes and Cantor (1969) for nucleic acid sequences and 2) obtaining maximum parsimony trees for both data sets. For each method, bootstrap analyses with 500 replicates were performed to test the significance of the nodes. Finally, MrBayes (500,000 generations and 4 chains) was used for Bayesian inference of phylogenetic trees (Huelsenbeck et al. 2001
), using a JTT +
substitution model (Jones et al. 1992
).
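The distance corrections named above can be written out as a short sketch. The formulas are the textbook forms of the Poisson, Kimura (1983), and Jukes and Cantor (1969) corrections applied to the observed proportion of differing sites p; the function names are assumptions made for this example and do not reproduce the TREECON or PHYLIP code.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(p: float) -> float:
    """Poisson correction for proteins: d = -ln(1 - p)."""
    return -math.log(1.0 - p)


def kimura_protein_distance(p: float) -> float:
    """Kimura (1983) empirical correction: d = -ln(1 - p - 0.2 p^2)."""
    return -math.log(1.0 - p - 0.2 * p * p)


def jukes_cantor_distance(p: float) -> float:
    """Jukes and Cantor (1969) for nucleotides: d = -(3/4) ln(1 - 4p/3)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

All three corrections return a value at least as large as p, because they estimate the number of substitutions per site including multiple hits at the same position.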
After manual improvement of the alignments using BIOEDIT (Hall 1999
), only unambiguously aligned positions were taken into account for tree construction. TREEVIEW was used to visualize the trees (Page 1996
).
| Results and Discussion |
|---|
Phylogenetic Analyses
Previous phylogenetic analyses based on the 18S rDNA sequence of different Chlorophyta suggested that O. tauri belongs to the Prasinophyceae, an early diverging group within the green plant lineage (Courties et al. 1998).
In our phylogenetic analyses, we have also included the unicellular freshwater alga, M. viride, whose phylogenetic position is still being discussed (previously referred to as the "enigma of Mesostigma" [McCourt et al. 2004]).
Structure and Gene Content of the mt Genome
The O. tauri mt genome assembles as a circle of 44,237 bp (fig. 2), with an overall GC content of 38%. This size is similar to the mt genome of another early branching chlorophyte N. olivacea (45,223 bp) (Turmel et al. 1999b
). However, in contrast to the N. olivacea genome, the O. tauri mt sequence contains a duplicated region, containing 22 genes and covering 44% (19,542 bp) of the genome (see below). Sixty-five genes (unique open reading frames [ORFs] were not taken into account, and duplicated genes were counted only once) are encoded on both strands, encompassing 93% of the genome, which makes the mt genome of O. tauri the most gene dense among the Chlorophyta. For comparison, both M. viride (Turmel et al. 2002b) and N. olivacea also have 65 genes, but these cover only 87% and 81% of their genomes, respectively (table 1). Among the 65 genes, 36 are protein-encoding genes, 26 are transfer RNAs, and 3 are rRNAs (see supplementary table S1, Supplementary Material online). Two predicted proteins (orf129 and orf153) coding for 129 and 153 amino acids, respectively, did not show any clear similarity to other known genes. The compactness of the O. tauri mt genome is further illustrated by the shortness of the intergenic regions, ranging from 1 to 475 bp, with an average of 42 bp. Only 5 intergenic regions exceed 100 bp, and these are all located in the duplicated region. In addition, there are 3 cases of overlapping genes (trnR1-rnpB, rps14-rpl5, and orf153-trnH). Lastly, in contrast to other members of the green lineage, neither group I nor group II type introns are present in any of the genes.
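Coding density and intergenic spacer lengths of the kind reported above can be derived from gene coordinates on a circular genome. The sketch below assumes 1-based inclusive `(start, end)` coordinates and genes that do not span the origin; the function names and the toy coordinates in the usage note are hypothetical and are not taken from the O. tauri annotation.

```python
def coding_density(genome_len: int, genes: list[tuple[int, int]]) -> float:
    """Fraction of positions covered by at least one gene.

    Overlapping genes are counted once per position.
    """
    covered = set()
    for start, end in genes:
        covered.update(range(start, end + 1))
    return len(covered) / genome_len


def intergenic_lengths(genome_len: int, genes: list[tuple[int, int]]) -> list[int]:
    """Lengths of the gaps between consecutive genes on a circular genome.

    Overlapping neighbours yield no gap. The last gene is compared with
    the first one shifted by one genome length to close the circle.
    """
    ordered = sorted(genes)
    first_start, first_end = ordered[0]
    wrapped = ordered[1:] + [(first_start + genome_len, first_end + genome_len)]
    gaps = []
    for (_, end_a), (start_b, _) in zip(ordered, wrapped):
        gap = start_b - end_a - 1
        if gap > 0:
            gaps.append(gap)
    return gaps
```

For a toy 100 bp circle with genes at (1, 40), (45, 80), and (78, 95), the density is 0.91 and the spacers are 4 and 5 bp long; the overlapping pair contributes no spacer.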
All 26 tRNAs fold into the conventional cloverleaf secondary structure and are able to decode all codons. The small subunit rRNA (SSU rRNA, rns in fig. 2) gene is fragmented into 2 parts, but retains its ability to fold into the normal secondary structure model (see supplementary fig. S2, Supplementary Material online). The fragmentation site is located near the hairpin loop of helix 29 (indicated by gray area) of the secondary structure model (Wuyts et al. 2004).
The most striking feature of the O. tauri mt genome is the presence of a large duplicated segment (19,542 bp; shaded box in fig. 2). This duplication is also observed in the partially sequenced mt genome of another Ostreococcus strain (O. lucimarinus; Palenik B, personal communication), thereby excluding erroneous genome assembly. Such a duplicated sequence has not been observed in any other member of the Chlorophyta, except for C. reinhardtii, whose mt genome is linear instead of circular and carries terminal IRs of approximately 500 bp (Vahrenholz et al. 1993
; table 1). No duplication is present in the mt genome of the charophyte Chara vulgaris (Turmel et al. 2003
). The only large repeated sequences previously reported are present in higher land plants (e.g., Arabidopsis thaliana: 366,924 bp, containing repeat sequences of 6.5 and 4.5 kb and Beta vulgaris: 368,799 bp, containing a repeat sequence of 6.2 kb) (Unseld et al. 1997
; Kubo et al. 2000
). These repeated regions in the mt genome of angiosperms gave rise to a multipartite genome structure (Fauron et al. 1995
) and lead to high-frequency intramolecular recombination. Indeed, a master circle, containing the complete genetic information, can lead to different subgenomic circles by homologous recombination via a repeated sequence motif (e.g., the tobacco mt genome can provide 6 different subgenomic circles by homologous recombination between the different repeated sequences) (Knoop 2004
; Sugiyama et al. 2005
). The presence of this multipartite genome structure enables them to change their gene and genome copy number, resulting in an altered plant phenotype (Kanazawa et al. 1994
; Janska et al. 1998
).
The unique repeated segment in the mt genome of O. tauri contains 5 protein-coding genes (cob, cox1, atp4, atp8, and nad4L), 2 tRNAs (trnMe and trnY), the SSU rRNA (rns), LSU rRNA (rnl), and 5S rRNA (rrn5) genes, and orf129. The 2 copies are 100% identical over a length of 9,771 bp and together cover 44% of the genome (see supplementary fig. S3, Supplementary Material online). Repeats that are 100% identical are not exceptional in mt genomes. For instance, Brassica napus has 2 repeats with 1 mismatch over 2,427 bp (Handa 2003
), B. vulgaris has 2 repeats (Kubo et al. 2000
), and A. thaliana has 3 repeats that are 100% identical (Unseld et al. 1997
).
Short dispersed repeats (SDRs) are also thought to play an important role in mt genome rearrangements, thereby altering the gene content and genome size. This is true not only for members of the Chlorophyta, but also for land plants, yeasts, and even animals, where they serve as hot spots for recombination (Pombert et al. 2004
). Short dispersed repeats have been described in all known members of the Chlorophyta, although their abundance is highly variable. All Chlorophyta members hold SDRs of at least 15 bp in their genome. This number is reduced to 52 repeats in N. olivacea (Pombert et al. 2006b
) and to only 11 in O. tauri. The largest repeats found in O. tauri and N. olivacea are rather short, being 34 bp and 42 bp, respectively. The GC content of the SDRs present in O. tauri does not differ much from the overall GC content of the mt genome (36% for the SDRs vs. 38% for the complete genome). In general, more derived lineages show an increase of the number of SDRs: O. viridis contains 1,206 (Pombert et al. 2006b
), Scenedesmus obliquus 4,086 (Nedelcu et al. 2000
), and P. akinetum 8,002 (Pombert et al. 2004
) SDRs at least 15 bp long. It seems that after the split of the Prasinophyceae, an increase of SDRs took place (with the exception of the Chlamydomonadales), and it is tempting to correlate this increase with the gene rearrangements that took place within the other members of the Chlorophyta (see below).
Comparison with Other mt Genomes
Comparison of the O. tauri mt genome with 9 other species of the Viridiplantae lineage (Cr: C. reinhardtii, No: N. olivacea, Ov: O. viridis, Pa: P. akinetum, So: S. obliquus, At: A. thaliana, Mp: M. polymorpha, Cg: C. globosum, and Mv: M. viride) unveiled only 9 genes (not including tRNAs), which are common to all these species (table 2). However, when removing C. reinhardtii (Michaelis et al. 1990
) and S. obliquus, 2 members of the Chlorophyceae, from this comparison, this number increases to 25 shared genes. When further removing the 2 ulvophyte green algae (O. viridis and P. akinetum), the number of conserved genes increases to 30, thus, representing the gene content conservation between the 2 prasinophytes and the land plants. However, when only considering the protein-coding genes of O. tauri, N. olivacea, and M. viride, 36 genes are shared, which represents 95% of the O. tauri and 92% of the M. viride protein-coding gene content. Apparently, the gene content conservation between these genomes, which are assumed to represent a more ancestral state, is still very high. One of the 7 protein-coding genes that are absent in the O. tauri mt genome, namely rpl2, could be uncovered in the nuclear genome (see supplementary table S3, Supplementary Material online).
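The comparison above amounts to intersecting gene repertoires while leaving out selected taxa. A minimal sketch, assuming each repertoire is held as a set of gene names; the function name `shared_genes` and the toy repertoires in the usage note are hypothetical and do not reproduce table 2.

```python
def shared_genes(repertoires: dict[str, set[str]], exclude: tuple[str, ...] = ()) -> set[str]:
    """Genes present in every repertoire that is not listed in exclude."""
    kept = [set(genes) for species, genes in repertoires.items()
            if species not in exclude]
    if not kept:
        return set()
    return set.intersection(*kept)
```

With toy repertoires `{"A": {"cob", "cox1", "nad5", "rpl5"}, "B": {"cob", "cox1", "nad5"}, "C": {"cob", "cox1"}}`, the shared set is `{"cob", "cox1"}`, and it grows to `{"cob", "cox1", "nad5"}` once the gene-poor taxon C is excluded, which mirrors how the shared set grows in the text when the Chlorophyceae are removed.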
Disregarding the unique ORFs and tRNAs (trnG[gcc] and trnL[gag] seem to have been lost in N. olivacea compared with O. tauri, whereas trnR[ucg] is lost in O. tauri compared with N. olivacea), the gene repertoires of O. tauri and N. olivacea are identical (table 2). Furthermore, there is a high degree of synteny between these 2 algae, with 5 gene clusters of at least 5 genes and 1 of 2 genes, which are almost identical in both mt genomes (genes denoted in black in fig. 3). However, when one considers gene polarities, synteny is limited to only 2 gene clusters (12 genes extending from rps11 to rps10 and 5 genes extending from atp6 to cox3). The major difference between both mt genomes is the duplication in O. tauri and the presence of 4 group I introns in N. olivacea (3 within the rnl and 1 in the cob gene) (Turmel et al. 1999b).
A certain degree of synteny can still be detected when adding C. globosum (charophyte) (Turmel et al. 2002a) and Marchantia polymorpha (streptophyte) (Oda et al. 1992).
Additionally, we estimated the number of gene inversions needed to transform the gene organization of one genome into another, thereby providing quantitative measurement of their evolutionary distances. Fifty-four conserved genes (duplicated genes were used only once) of 3 Chlorophyta (O. tauri, N. olivacea, and P. akinetum) and M. viride were used, showing that a minimum of 29 inversions are needed to transform the gene organization of O. tauri into that of N. olivacea. When comparing O. tauri with the other mt genomes, almost twice as many inversions are needed (50 for both P. akinetum and M. viride), again indicating the close relationship between the 2 Prasinophyceae.
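The idea of counting inversions between two gene orders can be illustrated with a breakpoint count. The sketch below is not the GRIMM algorithm, which computes the exact minimum number of inversions; it only gives a lower bound, using the fact that a single inversion can repair at most 2 breakpoints. Genes are encoded as signed integers (the sign is the strand) on circular genomes, and all function names are assumptions made for this example.

```python
def adjacencies(order: list[int]) -> set[tuple[int, int]]:
    """Signed adjacencies of a circular gene order.

    The pair (a, b) and its reverse-strand reading (-b, -a) are both
    stored, so a genome read from the opposite strand gives the same set.
    """
    n = len(order)
    adj = set()
    for i in range(n):
        a, b = order[i], order[(i + 1) % n]
        adj.add((a, b))
        adj.add((-b, -a))
    return adj


def breakpoints(genome_a: list[int], genome_b: list[int]) -> int:
    """Number of adjacencies of genome_a that are absent from genome_b."""
    if sorted(map(abs, genome_a)) != sorted(map(abs, genome_b)):
        raise ValueError("both genomes must contain the same genes")
    adj_b = adjacencies(genome_b)
    n = len(genome_a)
    return sum((genome_a[i], genome_a[(i + 1) % n]) not in adj_b
               for i in range(n))


def inversion_lower_bound(genome_a: list[int], genome_b: list[int]) -> int:
    """Lower bound on the inversion distance: ceil(breakpoints / 2)."""
    return (breakpoints(genome_a, genome_b) + 1) // 2
```

For example, `[1, 2, 3, 4, 5]` and `[1, -3, -2, 4, 5]` differ by one inversion of the segment 2..3, which creates 2 breakpoints and therefore a lower bound of 1.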
Structure and Gene Content of the cp Genome
With a circular cp genome 71,666 bp long (fig. 4), O. tauri contains the smallest cp genome known so far within the Viridiplantae (except for the parasite Helicosporidium sp. [de Koning and Keeling 2006
]). Cp genome size in green algae ranges from 118,360 bp in M. viride (Lemieux et al. 2000
) to 203,395 bp in C. reinhardtii (Maul et al. 2002
; table 3). The GC content (39.9%) of the O. tauri cp genome is close to that of N. olivacea (42.1%) (Turmel et al. 1999a
) and O. viridis (40.5%) (Pombert et al. 2006a
), but higher than that of other chlorophytes, such as Chlorella vulgaris (31.6%) (Wakasugi et al. 1997
), C. reinhardtii (34.6%), and M. viride (30.1%). Like all known members of the Chlorophyta, except C. vulgaris, the cp genome of O. tauri has a quadripartite structure containing 2 large IRs of 6,825 bp (covering 9.5% of the genome) separating a large single-copy (LSC) region (35,684 bp, covering 49.8%) and a small single-copy (SSC) region (22,332 bp, covering 31.2%) (fig. 4 and supplementary fig S4, Supplementary Material online). Despite the difference in size, both the LSC and the SSC contain 41 genes, whereas the IR sequences contain, next to psbA, the rRNA operon (rrs, trnI[gau], trnA[ugc], rrl, and rrf).
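The region sizes quoted above can be checked with a few lines of arithmetic. The lengths are those given in the text; the function name `region_shares` is an assumption made for this example.

```python
def region_shares(lsc: int, ssc: int, ir: int, n_ir: int = 2):
    """Return the total genome length and the percentage share of each region.

    The IR share refers to a single copy of the inverted repeat.
    """
    total = lsc + ssc + n_ir * ir
    shares = {
        "LSC": round(100 * lsc / total, 1),
        "SSC": round(100 * ssc / total, 1),
        "IR": round(100 * ir / total, 1),
    }
    return total, shares


# Lengths in bp as reported for the O. tauri cp genome.
total, shares = region_shares(lsc=35_684, ssc=22_332, ir=6_825)
```

The 4 regions add up to the reported genome length of 71,666 bp, each IR copy accounts for 9.5% of the genome, and the LSC region is only about 1.6 times larger than the SSC region.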
Besides its ultrasmall cp genome, the gene content is reduced to a minimum: 86 genes (unique ORFs were not taken into account, and duplicated genes were counted only once) were identified, including 25 tRNAs and the rRNA gene cluster (rrf, rrl, and rrs). Two predicted proteins (orf537 and orf1260) coding for 537 and 1,260 amino acids, respectively, show only weak similarity to the known genes ycf1 and ycf2 and will be indicated as orf537/ycf1 and orf1260/ycf2 (see supplementary table S4, Supplementary Material online). This gene repertoire is the smallest known to date among the green algae: C. reinhardtii has a slightly higher number of genes (94 genes, not including the duplicated genes and unique ORFs), but this number is also much lower than the number of genes present in other Chlorophyta (e.g., N. olivacea contains 127 genes and M. viride contains 135 genes) (table 3). Twenty-five tRNAs could be detected, a number that is low compared with that of other members of the green lineage (e.g., N. olivacea: 32, M. viride: 37, and A. thaliana: 37 [Sato et al. 1999]).
Comparison with Other cp Genomes
The gene repertoires of the cp genomes of 7 Chlorophyta (Cr: C. reinhardtii, Cv: C. vulgaris, No: N. olivacea, Ov: O. viridis, Pa: P. akinetum, So: S. obliquus, and Ot: O. tauri), 2 Streptophyta (At: A. thaliana and Nt: N. tabacum), and M. viride (Mv) were compared, and the results are shown in table 4. Fifty-three core genes are shared between both Chlorophyta and Streptophyta (bold gene names), whereas 4 additional core genes (ycf12, tufA, rpl5, and rps9) are present when only considering the Chlorophyta lineage. The 53 core cp genes are involved either in photosynthesis, energy metabolism, or some housekeeping functions. Gene loss and gene transfer to the nucleus are common features of cp genomes (Stegemann et al. 2003), and Grzebyk and Schofield (2003) reported the loss of 7 genes (rpl21, rpl22, rpl33, rps15, rps16, odpB, and ndhJ) at the base of the Chlorophyta lineage. These genes were also not detected in the O. tauri cp genome, but 5 of them are present in the nuclear genome (see supplementary table S3, Supplementary Material online).
In O. tauri, 34 genes are lost in the cp genome compared with other Chlorophyta: 1) the 10 homologs of the mt ndh genes, subunits of the NADH:ubiquinone oxidoreductase. None of these genes were present in the nuclear genome; 2) the genes chlB, chlI, chlL, and chlN involved in chlorophyll synthesis in the dark. In almost all known green algal cp genomes, these 4 genes are present, but not in O. tauri, where only chlI was found in the nuclear genome (on chromosome 2). The absence of chlB, chlL, and chlN in the cp or nuclear genome of O. tauri confirms the inability of this organism to produce chlorophyll in the dark (Derelle et al. 2006).
Despite these differences in gene content, 10 conserved blocks, ranging from 2 to 12 genes, are shared between O. tauri and N. olivacea, 11 between O. tauri and C. vulgaris, and 12 between O. tauri and M. viride. When aligning the 4 genomes together, 9 conserved blocks of at least 2 genes can be unveiled. However, when adding the cp genome of C. reinhardtii, whose genome is structurally the most comparable to that of O. tauri (see below), almost no conserved blocks shared by all species can be detected. Comparison of the cp genome of O. tauri with that of O. viridis, a member of the Ulvophyceae, also showed shared gene clusters. So in general, without considering C. reinhardtii, 9 conserved blocks of at least 2 genes can be unveiled between different members of the Chlorophyta, representing 33 genes (for O. tauri 37% of its gene content), indicating the importance of maintaining certain gene clusters throughout evolution. However, if we compare the gene order of the O. tauri cp genome with the 24 "ancestral" gene clusters present in N. olivacea and M. viride (de Cambiaire et al. 2006
), only 7 of them are completely present in O. tauri, indicating the loss of its ancestral characteristics.
The number of gene inversions necessary to transform the gene organization of one genome into another has been estimated for 4 Chlorophyta (O. tauri, N. olivacea, O. viridis, and C. vulgaris) and for M. viride. An average of 50 inversions is needed to transform the gene organization of O. tauri into that of any other of these cp genomes.
Although some genes and gene clusters are well conserved among green algae, the overall structure of the cp genomes can show remarkable differences. First, both the LSC and the SSC region of O. tauri cp genome contain 41 genes, in contrast to the cp genomes of other green algae (N. olivacea, M. viride, O. viridis, and P. akinetum), where most of the genes are located in the LSC region (Pombert et al. 2006a
). Second, the difference in length between the 2 single-copy regions is much smaller than in other Chlorophyta (e.g., in N. olivacea, the LSC region is 5.6 times larger than its SSC region) or even Streptophyta (e.g., in A. thaliana, the LSC region is 4.7 times larger than its SSC region) (table 3). In this respect, the cp genome of O. tauri is more similar to the cp genome of C. reinhardtii (Maul et al. 2002) for 2 reasons: 1) the single-copy regions have almost identical lengths and both contain an almost identical number of genes (81 and 78, respectively) and 2) the IRs, which in both cases cover almost 20% of the genome, contain exactly the same genes, orientated in the same direction.
The distribution of different genes over the LSC and SSC regions is highly conserved, not only in the entire streptophyte lineage (M. viride and land plant genomes share essentially the same gene partitioning), but also in the early diverging N. olivacea, indicating that the last common ancestor of all chlorophytes featured a gene partitioning very similar to that observed in land plants. In this respect, Pombert et al. (2006a)
created an ancestral cp genome based on the genomes of O. viridis and P. akinetum (both Chlorophyta, belonging to the Ulvophyceae) and compared that with the genome of N. olivacea, which is a prasinophyte and can be considered as ancestral to the 2 ulvophytes. They concluded that the LSC region of the ancestral genome of both Ulvophyceae contained only genes characteristic of the LSC region of N. olivacea and that the SSC region contained genes usually found in the SSC and LSC region of N. olivacea. However, in the O. tauri cp genome, the genes are scattered across the LSC and SSC region, and the previous assumption made by Pombert et al. (2006a) no longer holds true for O. tauri. Because the Prasinophyceae are not a monophyletic group, it is not surprising that the O. tauri cp genome differs significantly from the N. olivacea cp genome and that changes in gene partitioning have occurred independently in O. tauri from those observed in ulvophycean and chlorophycean algae. With the availability of more cp genomes it will become clearer whether O. tauri is an exception to the rule and has undergone specific genome reshuffling or whether different species all have their own independent evolutionary history regarding their cp genome structure.
We also looked for the presence of SDRs in the cp genome. Sixty-four repeats larger than 15 bp are present, but none of the detected repeats exceeds 25 bp in length. Almost all these SDRs are located in the coding region of 5 protein-coding genes (rpl23, psbD, psaB, psaA, and psbA) and 5 tRNAs (see supplementary fig. S5, Supplementary Material online). The GC content of the SDRs is comparable to the overall GC content of the cp genome (38% for the SDRs vs. 39.9% for the cp genome). The number of SDRs in N. olivacea is similar, but substantially differs from that in C. reinhardtii, whose cp genome is more similar to the O. tauri cp genome regarding its structure (see above). In the O. tauri cp genome, no direct link can be made between the major reshuffling that took place and the abundance of SDRs, whereas for C. reinhardtii the major rearrangements could be explained by the huge collection of SDRs present in its cp genome. Consequently, another mechanism is probably responsible for the large number of rearrangements present in the cp genome of O. tauri.
| Conclusion |
|---|
Ostreococcus tauri is the smallest eukaryotic organism known to date, and recently, its small (12.56 Mb), but gene dense nuclear genome has been described (Derelle et al. 2006).
The main difference between the O. tauri and the other Chlorophyta mt genomes is the presence of a unique duplication, previously unobserved in the Chlorophyta. On the other hand, the mt genome of O. tauri, which is the most gene dense among all known green algae, closely resembles that of Nephroselmis olivacea, another member of the Prasinophyceae. This is illustrated by a number of common characteristics: 1) the gene content is almost identical in both genomes; 2) there is a high degree of synteny between the 2 genomes, which is illustrated by the presence of a number of conserved gene blocks and by a low number of gene inversions necessary to transform the O. tauri gene structure into that of N. olivacea; and finally 3) Pombert et al. (2006b) showed that there is an increase in the number of short dispersed repeats (SDRs) when moving in the tree from N. olivacea to the more derived lineages within the Chlorophyta. This trend is confirmed by O. tauri, which contains even fewer SDRs than N. olivacea. All these data clearly show that the mt genome of O. tauri shares the "ancestral" pattern of evolution typified by the N. olivacea genome. This conclusion for N. olivacea representing an ancestral state (Turmel et al. 1999b
) was based on its basal phylogenetic position in the chlorophyte lineage, on the presence of 3 genes (nad10, rpl14, and rnpB) that had not been identified at that time in any other mt genome (today, rpl14 is also identified in P. akinetum), and on its ancestral organizational pattern. These arguments also hold for the O. tauri mt genome, and most likely, both the O. tauri and N. olivacea mt genome represent the most ancestral form known to date for the green lineage. Whether the unique duplication seen in Ostreococcus is restricted to this organism will hopefully become clear with the availability of more mt genomes of basal green algae (e.g., the one of Micromonas pusilla, another prasinophyte which is currently being sequenced; Worden A, personal communication).
The O. tauri cp genome is very compact, and both the genome size and the gene number are the smallest known among the green plants and green algae. In terms of gene content, the O. tauri cp genome has lost many genes compared with other prasinophyte green algae or with M. viride. This is well illustrated by the small number of ancestral gene clusters still present in the O. tauri cp genome: only 7 of the 24 Mesostigma/Nephroselmis gene clusters (de Cambiaire et al. 2006) could be recovered. Finally, although gene partitioning among LSC and SSC regions is well conserved in all Streptophyta and early-diverging Chlorophyta, the genes in the O. tauri cp genome are randomly distributed between both regions. Taken together, these data strongly suggest that, in contrast to its mt genome, the O. tauri cp genome has lost most of the ancestral features observed in the M. viride and N. olivacea genomes.
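The cluster comparison described above can be sketched in Python as a check of whether the genes of each ancestral cluster still sit next to each other on a circular gene map. The gene labels, the gene order, and the cluster list below are hypothetical placeholders rather than the actual Mesostigma/Nephroselmis clusters, the function name is an assumption, and requiring strict contiguity is a simplification chosen for this example, not necessarily the criterion used in the study.

```python
def is_contiguous(order, cluster):
    """True if all genes of `cluster` occupy one unbroken stretch of a
    circular gene order, in any internal arrangement."""
    genes = set(cluster)
    n, k = len(order), len(genes)
    if k == 0 or k > n:
        return False
    # Append the first k-1 genes so windows can wrap around the circle.
    doubled = order + order[:k - 1]
    return any(set(doubled[i:i + k]) == genes for i in range(n))


# Hypothetical circular gene order and candidate ancestral clusters.
gene_order = list("ABCDEFGH")
ancestral_clusters = [["A", "B", "C"], ["H", "A"], ["C", "E"], ["F", "G"]]

retained = [c for c in ancestral_clusters if is_contiguous(gene_order, c)]
print(f"{len(retained)} of {len(ancestral_clusters)} clusters retained")
```

In this toy case the cluster C-E counts as lost because gene D lies between its members, whereas H-A is retained because the map is circular.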
| Supplementary Material |
|---|
|
|
|---|
The genome data have been submitted to the European Molecular Biology Laboratory, www.embl.org (accession numbers CR954200 [mt genome] and CR954199 [cp genome]) or can be found at http://bioinformatics.psb.ugent.be/. Supplementary tables S1–S5 and figures S1–S5 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
|
|
|---|
We would like to thank Sasker Grootjans for his help in the phylogenetic analyses, Jeroen Raes for discussions, Yvan Saeys for help with the figures, and Igor Grigoriev, Brian Palenik, and the Joint Genome Institute for prior access to the O. lucimarinus data. S.R. is indebted to the Institute for the Promotion of Innovation by Science and Technology in Flanders for a predoctoral fellowship. This work was supported by the Génopole Languedoc-Roussillon and the French research ministry, and was conducted within the framework of the "Marine Genomics Europe" European Network of Excellence (GOCE-CT-2004-505403).
| Footnotes |
|---|
Peter Lockhart, Associate Editor
| References |
|---|
|
|
|---|
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. (1990) Basic local alignment search tool. J Mol Biol 215:403–410.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402.
Bélanger AS, Brouard JS, Charlebois P, Otis C, Lemieux C, Turmel M. (2006) Distinctive architecture of the chloroplast genome in the chlorophycean green alga Stigeoclonium helveticum. Mol Genet Genomics 276:464–477.
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Rapp BA, Wheeler DL. (2002) GenBank. Nucleic Acids Res 30:17–20.
Bhattacharya D, Weber K, An SS, Berning-Koch W. (1998) Actin phylogeny identifies Mesostigma viride as a flagellate ancestor of the land plants. J Mol Evol 47:544–550.
Brennicke A, Grohmann L, Hiesel R, Knoop V, Schuster W. (1993) The mitochondrial genome on its way to the nucleus: different stages of gene transfer in higher plants. FEBS Lett 325:140–145.
Chrétiennot-Dinet MJ, Courties C, Vaquer A, Neveux J, Claustre H, Lautier J, Machado MC. (1995) A new marine picoeukaryote: Ostreococcus tauri gen. et sp. nov. (Chlorophyta, Prasinophyceae). Phycologia 4:285–292.
Courties C, Perasso R, Chrétiennot-Dinet MJ, Gouy M, Guillou L, Troussellier M. (1998) Phylogenetic analysis and genome size of Ostreococcus tauri (Chlorophyta, Prasinophyceae). J Phycol 34:844–849.
Courties C, Vaquer A, Troussellier M, Lautier J, Chrétiennot-Dinet M-J, Neveux J, Machado MC, Claustre H. (1994) Smallest eukaryotic organism. Nature 370:255.
Dayhoff MO. (1979) Atlas of protein sequence and structure. (National Biochemical Foundation, Silver Spring (MD)) Vol. 5:suppl 3.
de Cambiaire JC, Otis C, Lemieux C, Turmel M. (2006) The complete chloroplast genome sequence of the chlorophycean green alga Scenedesmus obliquus reveals a compact gene organization and a biased distribution of genes on the two DNA strands. BMC Evol Biol 6:37.
de Koning AP and Keeling PJ. (2006) The complete plastid genome sequence of the parasitic green alga Helicosporidium sp. is highly reduced and structured. BMC Biol 21:12.
De Rijk P and De Wachter R. (1993) DCSE, an interactive tool for sequence alignment and secondary structure research. Comput Appl Biosci 9:735–740.
De Rijk P, Wuyts J, De Wachter R. (2003) RnaViz 2: an improved representation of RNA secondary structure. Bioinformatics 19:299–300.
Derelle E, Ferraz C, Rombauts S, et al. (26 co-authors). (2006) From the cover: genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features. Proc Natl Acad Sci USA 103:11647–11652.
Dickerson RE. (1971) The structures of cytochrome c and the rates of molecular evolution. J Mol Evol 1:26–45.
Eddy SR. (2002) A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinformatics 3:18.
Ewing B, Hillier L, Wendl MC, Green P. (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 8:175–185.
Fauron C, Casper M, Gao Y, Moore B. (1995) The maize mitochondrial genome: dynamic, yet functional. Trends Genet 11:228–235.
Felsenstein J. (1989) PHYLIP (phylogeny inference package). Version 3.2. Cladistics 5:164–166.
Graham LE and Wilcox LW. (2000) Green algae I – introduction and prasinophyceans. In Graham LE and Wilcox LW (Eds.). Algae (Prentice Hall, Upper Saddle River) pp. 397–419.
Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A. (2005) Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res 33:D121–D124.
Grzebyk D and Schofield O. (2003) The mesozoic radiation of eukaryotic algae: the portable plastid hypothesis. J Phycol 39:259–267.
Guindon S and Gascuel O. (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52:696–704.
Hall TA. (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41:95–98.
Handa H. (2003) The complete nucleotide sequence and RNA editing content of the mitochondrial genome of rapeseed (Brassica napus L.): comparative analysis of the mitochondrial genomes of rapeseed and Arabidopsis thaliana. Nucleic Acids Res 31:5907–5916.
Hasegawa M, Kishino H, Yano T. (1985) Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22:160–174.
Huelsenbeck JP and Ronquist F. (2001) MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17:754–755.
Janska H, Sarria R, Woloszynska M, Arrieta-Montiel M, Mackenzie SA. (1998) Stoichiometric shifts in the common bean mitochondrial genome leading to male sterility and spontaneous reversion to fertility. Plant Cell 10:1163–1180.
Jones DT, Taylor WR, Thornton JM. (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8:275–282.
Jukes TH and Cantor CR. (1969) Evolution of protein molecules. In Munro HN (Ed.). Mammalian protein metabolism (Academic Press, New York) pp. 21–123.
Kanazawa A, Tsutsumi N, Hirai A. (1994) Reversible changes in the composition of the population of mtDNAs during dedifferentiation and regeneration in tobacco. Genetics 138:865–870.
Karol KG, McCourt RM, Cimino MT, Delwiche CF. (2001) The closest living relatives of land plants. Science 294:2351–2353.
Kimura M. (1983) The neutral theory of molecular evolution. (Cambridge University Press, Cambridge).
Knoop V. (2004) The mitochondrial DNA of land plants: peculiarities in phylogenetic perspective. Curr Genet 46:123–139.
Kubo T, Nishizawa S, Sugawara A, Itchoda N, Estiati A, Mikami T. (2000) The complete nucleotide sequence of the mitochondrial genome of sugar beet (Beta vulgaris L.) reveals a novel gene for tRNA(Cys)(GCA). Nucleic Acids Res 28:2571–2576.
Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R. (2001) REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res 29:4633–4642.
Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL. (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12.
Lemieux C, Otis C, Turmel M. (2000) Ancestral chloroplast genome in Mesostigma viride reveals an early branch of green plant evolution. Nature 403:649–652.
Lister DL, Bateman JM, Purton S, Howe CJ. (2003) DNA transfer from chloroplast to nucleus is much rarer in Chlamydomonas than in tobacco. Gene 316:33–38.
Lowe TM and Eddy SR. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25:955–964.
Martin W, Rujan T, Richly E, Hansen A, Cornelsen S, Lins T, Leister D, Stoebe B, Hasegawa M, Penny D. (2002) Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc Natl Acad Sci USA 99:12246–12251.
Mattox KR and Stewart KD. (1984) Classification of the green algae: a concept based on comparative cytology. In Irvine DEG and John DM (Eds.). The systematics of green algae (Academic Press, London) pp. 29–72.
Maul JE, Lilly JW, Cui L, dePamphilis CW, Miller W, Harris EH, Stern DB. (2002) The Chlamydomonas reinhardtii plastid chromosome: islands of genes in a sea of repeats. Plant Cell 14:2659–2679.
McCourt RM, Delwiche CF, Karol KG. (2004) Charophyte algae and land plant origins. Trends Ecol Evol 19:661–666.
Melkonian M. (1989) Flagellar apparatus ultrastructure in Mesostigma viride (Prasinophyceae). Plant Syst Evol 164:93–122.
Michaelis G, Vahrenholz C, Pratje E. (1990) Mitochondrial DNA of Chlamydomonas reinhardtii: the gene for apocytochrome b and the complete functional map of the 15.8 kb DNA. Mol Gen Genet 223:211–216.
Nedelcu AM, Lee RW, Lemieux C, Gray MW, Burger G. (2000) The complete mitochondrial DNA sequence of Scenedesmus obliquus reflects an intermediate stage in the evolution of the green algal mitochondrial genome. Genome Res 10:819–831.
Nozaki H, Misumi O, Kuroiwa T. (2003) Phylogeny of the quadriflagellate Volvocales (Chlorophyceae) based on chloroplast multigene sequences. Mol Phylogenet Evol 29:58–66.
Oda K, Kohchi T, Ohyama K. (1992) Mitochondrial DNA of Marchantia polymorpha as a single circular form with no incorporation of foreign DNA. Biosci Biotechnol Biochem 56:132–135.
Ohyama K, Fukuzawa H, Kohchi T, et al. (13 co-authors). (1986) Chloroplast gene organization deduced from complete sequence of liverwort Marchantia polymorpha chloroplast DNA. Nature 322:572–574.
Page RD. (1996) TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci 12:357–358.
Petersen J, Teich R, Becker B, Cerff R, Brinkmann H. (2006) The GapA/B gene duplication marks the origin of Streptophyta (charophytes and land plants). Mol Biol Evol 23:1109–1118.



