MBE Advance Access originally published online on December 15, 2005
Molecular Biology and Evolution 2006 23(3):663-674; doi:10.1093/molbev/msj075
Research Article
Phylogenomic Analysis Identifies Red Algal Genes of Endosymbiotic Origin in the Chromalveolates
Department of Biological Sciences and Roy J. Carver Center for Comparative Genomics, University of Iowa
E-mail: debashi-bhattacharya{at}uiowa.edu.
| Abstract |
|---|
Endosymbiosis has spread photosynthesis to many branches of the eukaryotic tree; however, the history of photosynthetic organelle (plastid) gain and loss remains controversial. Fortuitously, endosymbiosis may leave a genomic footprint through the transfer of endosymbiont genes to the "host" nucleus (endosymbiotic gene transfer, EGT). EGT can be detected through comparison of host genomes to uncover the history of past plastid acquisitions. Here we focus on a lineage of chlorophyll c-containing algae and protists ("chromalveolates") that are postulated to share a common red algal secondary endosymbiont. This plastid is originally of cyanobacterial origin through primary endosymbiosis and is closely related among the Plantae (i.e., red, green, and glaucophyte algae). To test these ideas, an automated phylogenomics pipeline was used with a novel unigene data set of 5,081 expressed sequence tags (ESTs) from the haptophyte alga Emiliania huxleyi and genome or EST data from other chromalveolates, red algae, plants, animals, fungi, and bacteria. We focused on nuclear-encoded proteins that are targeted to the plastid to express their function because this group of genes is expected to have phylogenies that are relatively easy to interpret. A total of 708 genes were identified in E. huxleyi that had a significant Blast hit to at least one other taxon in our data set. Forty-six of the alignments that were derived from the 708 genes contained at least one other chromalveolate (i.e., besides E. huxleyi), red and/or green algae (or land plants), and one or more cyanobacteria, whereas 15 alignments contained E. huxleyi, one or more other chromalveolates, and only cyanobacteria. Detailed phylogenetic analyses of these data sets turned up 19 cases of EGT that did not contain significant paralogy and had strong bootstrap support at the internal nodes, allowing us to confidently identify the source of the plastid-targeted gene in E. huxleyi. A total of 17 genes originated from the red algal lineage, whereas 2 genes were of green algal origin. Our data demonstrate the existence of multiple red algal genes that are shared among different chromalveolates, suggesting that at least a subset of this group may share a common origin.
Key Words: chromalveolates Emiliania huxleyi endosymbiosis gene transfer phylogenomics
| Introduction |
|---|
An important challenge in evolutionary biology is to clarify the origin and spread of the photosynthetic organelle (plastid) in algae and plants. It is believed that plastids ultimately trace their origin to a single primary endosymbiosis (fig. 1A); that is, the acquisition and retention of a cyanobacterium by a heterotrophic eukaryote (e.g., Bhattacharya and Medlin 1995
2.5- to 5-million base pair genome of the prokaryote was reduced through outright gene loss and gene transfer (endosymbiotic gene transfer, EGT) to the "host" nucleus (Martin and Hermann 1998), resulting in a typical plastid genome that encodes between 100 and 200 genes. The products of the transferred genes that are involved in photosynthesis were retargeted to the plastid (i.e., plastid-targeted proteins) with the evolution of an N-terminal extension that allowed passage of the protein through the two plastid membranes (McFadden 1999
The primordial alga that resulted from the primary endosymbiosis diverged into three lineages (fig. 1B), the Rhodophyta (red algae), the Glaucophyta, and the Viridiplantae (green algae plus land plants), that together are referred to as the Plantae (Cavalier-Smith 2004
Following secondary endosymbiosis, the process of EGT also occurred from both the nuclear genome of the engulfed alga that contained the genes encoding plastid-targeted proteins required for plastid function and from the secondary plastid genome (McFadden 1999). The transferred genes evolved bipartite protein targeting signals for traversing the 3-4 membranes of the secondary plastid (McFadden 1999). In phylogenies, these nuclear-encoded (now secondary) plastid-targeted proteins are predicted to be nested within their donor taxa (red or green algae) that are sister to cyanobacterial homologs, as described above (see fig. 1D).
Here we used phylogenomics (e.g., Eisen and Fraser 2003; Huang et al. 2004) and a novel unigene data set of 5,081 expressed sequence tags (ESTs) from the haptophyte alga Emiliania huxleyi to study nuclear-encoded proteins that are targeted to the plastid in this species and in other related "chromalveolate" protists. The chromalveolates include the alveolates (dinoflagellates, apicomplexans, and ciliates) and chromists (cryptophytes, haptophytes, and stramenopiles) and are postulated to share a single red algal secondary endosymbiosis in their common ancestor (Cavalier-Smith 1999). Our analysis combined available genomic sequences (both complete genome and EST data) from plants, animals, fungi, bacteria, red algae, and chromalveolates. The preliminary trees identified with our phylogenomic pipeline were used as starting points for extensive database searches from which we prepared phylogenies that included the available sequence data. Our analyses resulted in 19 protein maximum likelihood trees that did not contain significant paralogy (allowing relative ease of interpretation) and had significant bootstrap support at the internal nodes to allow us to robustly identify the source of the EGT. A total of 17 genes were of the expected red algal origin, consistent with the chromalveolate hypothesis, whereas two genes had a green algal ancestry. Our data provide evidence for significant red algal EGT in chromalveolates and suggest that some of these taxa may share a monophyletic origin.
| Materials and Methods |
|---|
Phylogenomic Pipeline
We generated a set of 5,081 unique complementary DNAs (cDNAs) (J. D. Hackett, D. Bhattacharya, and M. B. Soares, unpublished data) from the haptophyte alga E. huxleyi CCMP 1280 (as in Hackett et al. 2004
The 5,081 DNA sequences were translated into the six possible open reading frames using the Transeq program in the Emboss package (http://emboss.sourceforge.net/). The final fasta file that included all of the data was formatted using the formatdb program in the Blast package (http://www.ncbi.nlm.nih.gov/BLAST/) and comprised the database for the PhyloGenie Blast search. We initially included a minimal set of 10 species in the local database comprising the dinoflagellates Alexandrium tamarense (Hackett et al. 2005) and Karenia brevis (Lidie et al. 2005), the diatom (stramenopile) Thalassiosira pseudonana (Armbrust et al. 2004), the apicomplexan Plasmodium falciparum, the green alga Chlamydomonas reinhardtii, the red alga C. merolae (Matsuzaki et al. 2004), Drosophila melanogaster, Saccharomyces cerevisiae, the cyanobacterium Nostoc sp. PCC 7120, and Escherichia coli. We set the minimum Expect "e value" for the Blast search of these data at 10. To run PhyloGenie, it was necessary to give the Java virtual machine the right to use up to 1,000 megabytes of memory via the "java -jar blammer.jar" command. Otherwise, the program did not run and returned the "java.lang.OutOfMemoryError" message. All hits with an e value better than 0.01 were then taken to build the hidden Markov model (hmm) alignments. All other parameters were kept as default. The program TreeView (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html) was used to visualize the resulting trees.
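The six-frame translation step can be illustrated with a minimal Python sketch. This is not the Transeq implementation: it assumes the standard genetic code, writes unknown codons as "X", and uses frame labels (+1 to +3, -1 to -3) chosen here for illustration, which need not match Transeq's own frame numbering.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: the
# standard code applies; the sketch does not handle alternative codes).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping a frame label to the translated protein string."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

Each EST would yield six candidate protein sequences in this scheme, all of which can be written to the fasta file that is searched with Blast.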
To simplify the search for plastid-targeted proteins in chromalveolates, we retained trees in which only one gene copy or a small number of closely related copies were found for each species in the alignment. Two classes of trees were retained for more extensive analysis. The first class included E. huxleyi and at least one other chromalveolate (usually the diatom T. pseudonana), a red or green alga (i.e., C. merolae and/or C. reinhardtii), and Nostoc. All trees containing D. melanogaster and/or S. cerevisiae or lacking the cyanobacteria were excluded from this group. The second class addressed the potential issue that plastid-targeted proteins of cyanobacterial origin may be too highly divergent in the red and/or green algae to be easily identified with our pipeline and therefore included trees that contained E. huxleyi and at least one other chromalveolate and Nostoc. To verify these results, the candidate genes from the initial run were used as input for a second run under PhyloGenie with the addition in the local database of the predicted proteins from the following eight genome data sets and the EST data set from Toxoplasma gondii (available from NCBI): eukaryotes (Arabidopsis thaliana, Giardia intestinalis, Guillardia theta [nucleomorph genome], Trypanosoma brucei) and prokaryotes (Halobacterium sp. NRC-1, Sulfolobus tokodaii, Synechococcus elongatus PCC 7942, Trichodesmium erythraeum). Trees that had complex patterns of gene family evolution such as deep paralogy with duplicated genes distributed across different eukaryotic lineages or had low bootstrap support at nodes (e.g., due to small protein size) were again discarded from subsequent analyses. This conservative approach most certainly resulted in an underestimation of the number of nuclear-encoded plastid-targeted genes in E. huxleyi and other chromalveolates but provided a manageable set of candidate trees for building the final in-depth alignments.
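The two retention classes can be expressed as a simple rule over the set of taxa present in a tree. The sketch below is hypothetical and covers only the initial 10-species database. It assumes that trees containing the animal or fungus are excluded from the second class as well (the text states this exclusion only for the first class), and it ignores E. coli, whose treatment is not specified. It does not model the paralogy or bootstrap criteria.

```python
# Taxon groups from the initial local database (labels are illustrative).
CHROMALVEOLATES = {
    "E. huxleyi", "A. tamarense", "K. brevis", "T. pseudonana", "P. falciparum",
}
PLANTAE = {"C. merolae", "C. reinhardtii"}
CYANOBACTERIA = {"Nostoc sp. PCC 7120"}
OPISTHOKONTS = {"D. melanogaster", "S. cerevisiae"}


def classify_tree(taxa):
    """Return 'class 1', 'class 2', or 'discard' for the taxa found in a tree."""
    taxa = set(taxa)
    other_chromalveolates = (taxa & CHROMALVEOLATES) - {"E. huxleyi"}
    # Both classes need E. huxleyi plus at least one other chromalveolate.
    if "E. huxleyi" not in taxa or not other_chromalveolates:
        return "discard"
    # Both classes need the cyanobacterium; animal or fungal hits are excluded
    # (assumed here for class 2 as well).
    if not (taxa & CYANOBACTERIA) or (taxa & OPISTHOKONTS):
        return "discard"
    # Class 1 also contains a red or green alga; class 2 does not.
    return "class 1" if taxa & PLANTAE else "class 2"
```

For example, a tree with E. huxleyi, T. pseudonana, C. merolae, and Nostoc falls in the first class, whereas the same tree without C. merolae falls in the second.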
Building the Final Alignments
All potential homologs of the candidate E. huxleyi sequences that were used to build the final alignments were identified using Blast searches (e value ≤ 10⁻¹⁰) against the GenBank nonredundant (nr) and Expressed Sequence Tag (dbEST) and other databases. In particular, we focused on red algal and chromalveolate data including Galdieria sulphuraria (Weber et al. 2004; Michigan State University Galdieria Database http://genomics.msu.edu/galdieria/sequence_data.html), Porphyra yezoensis (Asamizu et al. 2003; http://www.kazusa.or.jp/en/plant/porphyra/EST), and Phaeodactylum tricornutum (Scala et al. 2002; http://avesthagen.sznbowler.com/). Overlapping aa sequences from each taxon were aligned using ClustalW and adjusted manually under BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html).
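Selecting homologs by an e value cutoff amounts to a simple filter over the Blast hits. The following sketch assumes a cutoff of 10^-10 and treats the bound as inclusive, which is an assumption. The hit names are invented for illustration, and no Blast output format is parsed here.

```python
def filter_hits(hits, max_evalue=1e-10):
    """Keep (subject, evalue) pairs whose e value does not exceed the cutoff.

    `hits` is a list of (subject_name, evalue) tuples; the inclusive bound
    is an assumption of this sketch.
    """
    return [(subject, evalue) for subject, evalue in hits if evalue <= max_evalue]


# Hypothetical hits for one query sequence.
example_hits = [
    ("red_alga_homolog", 3e-45),
    ("diatom_homolog", 1e-10),
    ("weak_bacterial_hit", 2e-4),
]
retained = filter_hits(example_hits)
```

Here `retained` keeps the first two hits and drops the weak one.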
The intracellular destination of the proteins was inferred from the similarity to known plastid-targeted proteins (primarily in the annotated genomes of C. merolae and A. thaliana) and from analysis of the N-terminal extensions using the following transit peptide prediction programs: TargetP (plant version) (http://www.cbs.dtu.dk/services/TargetP), PlasmoAP (http://www.plasmodb.org/cgi-bin/plasmoap.cgi), and Prediction of Apicoplast-Targeted Sequences (PATS, http://gecco.org.chemie.uni-frankfurt.de/pats/pats-index.php). The length of the N-terminal extension predicted by TargetP was verified using the protein alignments. The annotated genome data from C. merolae were used to identify gene function in E. huxleyi and in other chromalveolates.
Phylogenetic Analysis
For each data set, a phylogeny was reconstructed under maximum likelihood (ML) using the PHYML V2.4.3 computer program (Guindon and Gascuel 2003) with the WAG + I + Γ evolutionary model and tree optimization. The alpha value for the gamma distribution was calculated using eight rate categories. To assess the stability of monophyletic groups in the ML trees, we calculated PHYML bootstrap (100 replicates) support values (Felsenstein 1985). In addition, we calculated bootstrap values (100 replications) using the neighbor joining (NJ) method with JTT + Γ distance matrices (PHYLIP V3.63, http://evolution.genetics.washington.edu/phylip.html). The NJ analysis was done with randomized taxon addition. Finally, we generated Bayesian posterior probabilities for nodes in the ML tree using MrBayes V3.0b4 (Huelsenbeck and Ronquist 2001) and the WAG + Γ model with Metropolis-coupled Markov chain Monte Carlo from a random starting tree. The Bayesian analyses were run for 1,000,000 generations with trees sampled each 1,000 cycles. Four chains were run simultaneously of which three were heated and one was cold, with the initial 500,000 cycles (500 trees) being discarded as the "burn in." A consensus tree was made with the remaining 500 phylogenies to determine the posterior probabilities at the different nodes.
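The tree counts in the Bayesian analysis follow from the run length, the sampling interval, and the burn-in, and a posterior probability for a node is the fraction of retained trees that contain the corresponding clade. The sketch below reproduces that bookkeeping. It is not MrBayes code: representing each sampled tree as a set of clades is a simplification introduced here, and the count ignores any tree sampled at generation 0.

```python
def mcmc_sample_counts(generations, sample_every, burn_in_generations):
    """Return (total sampled trees, discarded trees, retained trees)."""
    total = generations // sample_every
    discarded = burn_in_generations // sample_every
    return total, discarded, total - discarded


def clade_posterior(sampled_trees, clade, burn_in_trees):
    """Fraction of post-burn-in trees that contain `clade`.

    Each tree is modeled as a set of frozensets of taxon names (one per clade);
    this representation is an assumption of the sketch.
    """
    retained = sampled_trees[burn_in_trees:]
    hits = sum(1 for tree in retained if frozenset(clade) in tree)
    return hits / len(retained)


# Values stated in the text: 1,000,000 generations, sampling every 1,000,
# and the first 500,000 generations discarded.
counts = mcmc_sample_counts(1_000_000, 1_000, 500_000)
```

With the values from the text, `counts` is `(1000, 500, 500)`, matching the 500 discarded and 500 retained trees.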
| Results and Discussion |
|---|
The goal of our research was to identify the sources of nuclear-encoded plastid-targeted proteins in the haptophyte alga E. huxleyi and in the limited genome data available from other chromalveolates. To this end, we analyzed all genes of cyanobacterial origin in the nuclear genome of E. huxleyi that were found either only in chromalveolates or in chromalveolates and members of the Plantae. The expectation under the chromalveolate hypothesis, described later in detail, is that the majority of plastid-targeted genes of cyanobacterial origin in this group should be monophyletic and specifically related to the red algae (i.e., as sister to the Cyanidiales; Yoon et al. 2002
Chromalveolate Hypothesis
Chromalveolate monophyly would unify a broad assemblage of protists but remains strongly in question because of incomplete phylogenetic data. Analyses of plastid genes (e.g., Yoon et al. 2002; Hagopian et al. 2004; Bachvaroff, Sanchez Puerta, and Delwiche 2005) provide evidence that chromist plastids are closely related to each other and likely monophyletic, consistent with a single origin of the organelle in this group. The topology of plastid gene trees shows an early divergence of the cryptophytes with the haptophytes and stramenopiles forming a sister group. The highly divergent alveolate sequences are more difficult to place in plastid gene trees, but a recent analysis by Yoon et al. (2005) reveals that dinoflagellate secondary plastids have a weakly supported sister group relationship to the stramenopiles. Analysis of 10 plastid-encoded proteins from the dinoflagellate Amphidinium operculatum also places this species within (not sister to) the chromists (Bachvaroff, Sanchez Puerta, and Delwiche 2005). Plastid monophyly does not, however, prove the chromalveolate hypothesis because these organelles could potentially have resulted from multiple independent secondary endosymbioses involving closely related red algae or tertiary endosymbioses involving existing chromalveolates (e.g., a stramenopile origin of the dinoflagellate peridinin plastid).
Phylogenies of the host nuclear genes are equivocal with respect to chromalveolate monophyly. Trees inferred from concatenated nuclear data sets strongly support a sister group relationship between the stramenopiles and alveolates (e.g., Baldauf et al. 2000; Harper, Waanders, and Keeling 2005). However, the position of the cryptophytes and haptophytes remains uncertain, with a recent analysis of a six-protein data set providing weak support for their monophyly. This group was, however, distantly related to the stramenopile + alveolate clade in the trees (fig. 1E; Harper, Waanders, and Keeling 2005).
An alternative approach to assess chromalveolate monophyly that was taken here is to study the phylogenies of nuclear genes encoding plastid-targeted proteins. Because the red algae (as members of the Plantae) are distantly related to chromalveolates (fig. 1E; e.g., Baldauf et al. 2000; Rodríguez-Ezpeleta et al. 2005), nuclear genes shared among chromalveolates that have a well-supported sister group relationship to the red algae would most likely (barring multiple red algal horizontal transfers) have originated through EGT via the secondary endosymbiosis. These trees would not prove chromalveolate monophyly but rather test the prediction that genes of red algal origin are shared by the different members of this lineage. Such a finding would be most easily explained by a single origin of the genes in the chromalveolate common ancestor through EGT from a red algal endosymbiont. Nuclear-encoded proteins of red algal origin have been reported for several species of chromalveolates; e.g., ftsZ in stramenopiles and cryptophytes (Miyagishima et al. 2004), atpF and atpI in dinoflagellates (Hackett et al. 2004), and genes of red algal origin involved in the amylopectin pathway have been found in apicomplexans (Coppin et al. 2005). Analyses of the plastid-targeted glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and fructose-1,6-bisphosphate aldolase (FBA) also support chromalveolate monophyly (Fast et al. 2001; Patron, Rogers, and Keeling 2004). These compelling data are, however, based on unusual evolutionary histories. In the first case, the gene encoding plastid-targeted GAPDH has resulted from the duplication of the cytosolic gene of the secondary plastid host and the retargeting of one of the copies to the plastid, whereas in the second case, the plastid-targeted FBA arose from the retargeting of a class II FBA gene of uncertain origin.
Results of the Phylogenomic Analysis
The initial phylogenomic analysis returned a total of 708 genes (i.e., the inferred protein sequences) in the E. huxleyi ESTs that had a significant Blast hit to at least one other taxon in our local database. Of these alignments, completion of the second round of analysis showed that 46 contained at least one other chromalveolate (i.e., besides E. huxleyi), a Plantae member, and one or more cyanobacteria, whereas 15 alignments contained E. huxleyi, other chromalveolates, and only cyanobacteria. This list of 61 genes is shown in table S1 (Supplementary Material online). Detailed phylogenetic analyses of these data sets resulted in 19 trees (E. huxleyi shown in large boldface text in each phylogeny) that contained plastid-targeted proteins that did not contain significant paralogy and had strong ML bootstrap support at the internal nodes, allowing us to confidently identify their sources in E. huxleyi (see table 1). We note that many genes that are of cyanobacterial origin and encode plastid-targeted proteins (e.g., chlorophyll-binding proteins) gave rise to trees that were simply too complex for us to infer the history of each gene copy among the taxa and these were not considered further. In addition, by disposing of all trees that included non-Plantae taxa, we likely excluded genes or gene family members that encode plastid-targeted proteins, but the presence of taxa such as the opisthokonts suggests an ancient gene origin (or potentially ancient lateral transfer) that predates the Plantae. In our hands, the laborious process of gene selection was largely guided by the effect of taxon addition. The 19 genes considered here for in-depth analysis became more resolved and easier to interpret with the addition of taxa (e.g., the second round of phylogenomics), suggesting that they were useful phylogenetic markers with the available data. These genes were primarily annotated using the C. merolae genome data (see table 1) and a total of 17 originated from the red algal lineage, whereas 2 genes were of green algal origin. Although we do not have taxon sampling from each taxonomic group of chromalveolates in these trees, there are at least two different lineages represented in each phylogeny. In the following section, we describe in detail the putative function of several outstanding examples from this gene set and their inferred evolutionary histories (other trees [except geranylgeranyl diphosphate synthase] are found in figs. S2, S3, and S4 [Supplementary Material online]). We did not present the results of the analysis of PSBO and FTSZ because these have been previously reported (e.g., Ishida and Green 2002; Hackett et al. 2004; Miyagishima et al. 2004).
Thylakoid Lumen and Pentapeptide Proteins
Our analysis of the E. huxleyi ESTs revealed two conserved proteins that are members of the pentapeptide repeat protein family (PEP). From our Blast search, when using the haptophyte sequences as the query, significant hits were found to homologs in cyanobacteria, red algae, green algae, and plants (i.e., e value ≤ 10⁻¹⁰). One of these sequences showed a high similarity to the thylakoid lumen protein (TLP) that is annotated as plastid targeted in the dinoflagellate Heterocapsa triquetra (Patron et al. 2005
The TLP and PEP protein sequences were included in a single alignment for the phylogenetic analysis with the branch connecting these paralogs (see filled circle in fig. 2A) used to root each subtree (134 aa, fig. 2A). Although not all the nodes within subtrees are fully resolved, likely due to the small size of the data set, the ML tree is most easily interpreted as supporting a cyanobacterial origin of the TLP and PEP genes in plants and algae. The gene duplication that gave rise to these genes occurred in the cyanobacteria prior to their transfer into the Plantae nuclear genome following primary endosymbiosis. These genes entered the nucleus of chromalveolates (i.e., alveolates, haptophytes, and stramenopiles for TLP) via secondary EGT from a red alga. A similar evolutionary history is found for the duplicated mRNA-binding proteins (see fig. S3, Supplementary Material online). These topologies are predicted by the chromalveolate hypothesis (fig. 1D), although the data do not allow us to resolve the branching order within the chromalveolate clade. To test these findings, we concatenated the TLP and PEP protein sequences into a single alignment (269 aa) to gain more phylogenetic resolution. The A. tamarense TLP and H. triquetra PEP proteins were combined to create a "Dinoflagellate" sequence in this data set. The ML and NJ bootstrap support values supporting the cyanobacterial origin of TLP and PEP were 100% in the concatenated protein tree (fig. S1, Supplementary Material online), and the red algal origin of these genes in chromalveolates and the monophyly of the chromalveolate clade were supported by the Bayesian inference (prob = 0.95, 1.0) and by the ML (83%, 70%) and NJ (62%, 64%) bootstrap analyses, respectively.
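Concatenating two protein alignments, including the construction of a composite "Dinoflagellate" entry from two species, can be sketched as follows. The function names and toy sequences are hypothetical. The sketch assumes each alignment is a mapping from taxon label to an aligned sequence of equal length, and that only taxa present in both alignments are kept, which is a choice made here and not stated in the text.

```python
def relabel(alignment, new_names):
    """Return a copy of the alignment with selected taxon labels replaced."""
    return {new_names.get(taxon, taxon): seq for taxon, seq in alignment.items()}


def concatenate(first, second):
    """Join two alignments end to end for the taxa they share."""
    for alignment in (first, second):
        if len({len(seq) for seq in alignment.values()}) > 1:
            raise ValueError("sequences within an alignment must have equal length")
    shared = sorted(set(first) & set(second))
    return {taxon: first[taxon] + second[taxon] for taxon in shared}


# Toy alignments (invented residues, three columns each).
tlp = {"E. huxleyi": "MKT", "A. tamarense": "MRT", "C. merolae": "MKS"}
pep = {"E. huxleyi": "LLG", "H. triquetra": "LIG", "C. merolae": "LLA"}

# Merge the two dinoflagellate species under one composite label.
tlp = relabel(tlp, {"A. tamarense": "Dinoflagellate"})
pep = relabel(pep, {"H. triquetra": "Dinoflagellate"})
combined = concatenate(tlp, pep)
```

In this toy case `combined` holds three six-column sequences, with the "Dinoflagellate" entry built from the A. tamarense and H. triquetra rows.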
FKBP_C Protein
FKBP_C, or trigger factor-like protein, is a member of the FKBP family of immunophilins (He, Li, and Luan 2004
Our analysis shows that this gene is nuclear encoded in three chromalveolates: E. huxleyi and the two diatoms P. tricornutum and T. pseudonana. The ML tree provides moderate bootstrap support for the monophyly of chromalveolate FKBP_C (ML = 89%) and their close evolutionary relationship to homologs in the red algae (ML = 79%, NJ = 71%; as in fig. 1D). Within eukaryotes there is, as would be expected, a close relationship between the diatom sequences (i.e., P. tricornutum and T. pseudonana; ML = 100%, NJ = 100%) and a sister group relationship between chlorophyte (Ulva linza) and land plant FKBP_C (ML = 95%, NJ = 93%).
Hypothetical Plastid-Targeted Protein aaui170
Analysis of the E. huxleyi EST library with PhyloGenie revealed two closely related genes of a hypothetical nuclear-encoded protein that is clearly of cyanobacterial origin. We named these proteins aaui170, corresponding to a shortened version of the identification number of the E. huxleyi clone encoding one of these copies (UI-EH-HG2-aau-I-17-0-UI.s1, see table 1). The Blast searches against GenBank (BlastP against the nr database and TBlastN against the est_others database) and against our local database revealed homologs in plants (>30 species, many of these were annotated as seed maturation-like protein), photosynthetic protists (green algae, red algae, stramenopiles, haptophytes), and apicomplexans. According to TargetP, the plant (e.g., Asparagus officinalis [prob = 0.939; TPlen = 82], A. thaliana [prob = 0.869; TPlen = 58]) homologs contain N-terminal extensions for plastid targeting, whereas the sequence in C. merolae was predicted to have a similar targeting potential for either the mitochondrion (prob = 0.525; TPlen = 76) or the plastid (prob = 0.544; TPlen = 76). The two aaui170 homologs from T. pseudonana contain typical stramenopile bipartite plastid-targeting sequences that consist of a 33-aa-long N-terminal signal peptide followed by a 29-aa-long transit peptide. According to PlasmoAP, the significant N-terminal extensions in P. falciparum and Plasmodium yoelii aaui170 did not encode an apicoplast-targeting signal (3/5 tests returned positive). However, analysis of these sequences with PATS suggested the existence of full-length apicoplast-targeting signals in these taxa (P. falciparum, prob = 1.00; P. yoelii, prob = 0.996).
Phylogenetic analysis of this data set (192 aa) provides strong bootstrap (ML = 89%, NJ = 87%) and Bayesian support for a single origin of the aaui170 genes in chromalveolates from a red algal source (ML = 94%, NJ = 91%, fig. 2C). The sister group relationship between the red + chromalveolate and the chlorophyte (C. reinhardtii) + land plant clades (ML = 99%, NJ = 97%) and the close phylogenetic relationship of these eukaryotic sequences to homologs in cyanobacteria fit in well with the scenario shown in figure 1. These data argue strongly for the existence of a red algal endosymbiont in apicomplexans that is shared with the stramenopiles and haptophytes, consistent with the findings of Coppin et al. (2005).
Plastid-Specific 30S and L10 Ribosomal Proteins
Plastid-specific 30S ribosomal protein (PSRP-1) is a member of a family of proteins that are believed to bind to ribosomes and regulate protein translation in the plastid (Yamaguchi and Subramanian 2000). A likely homolog of PSRP-1 in E. coli (protein Y) has also been implicated in the regulation of translation during cold shock (for a review, see Wilson and Nierhaus 2004). Homologs of PSRP-1 are widespread in cyanobacteria, other eubacteria (known as Sigma 54 modulation protein or S30EA ribosomal protein in this group), and in sequenced land plant genomes. Analysis of the N-terminal extensions in plants such as Spinacia oleracea (prob = 0.975; TPlen = 64) and Lycopersicon esculentum (prob = 0.929; TPlen = 75) suggests strongly that these proteins are plastid targeted, whereas TargetP is unable to find a targeting signal for the C. merolae PSRP-1 homolog. Phylogenetic analyses confirm the suspected cyanobacterial origin of the nuclear gene encoding PSRP-1 (Johnson, Kruft, and Subramanian 1990) with the ML tree providing strong bootstrap support for the origin of haptophyte and stramenopile PSRP-1 from a red algal source (ML = 90%, NJ = 89%). PSRP-1 in the distantly related chlorarachniophyte amoeba Bigelowiella natans is monophyletic with the chromalveolate clade, but this likely represents an independent lateral transfer of a chromalveolate gene into this organism (see Archibald et al. 2003).
The ML tree inferred from the plastid-targeted L10 ribosomal protein also conforms to the expectations of the chromalveolate hypothesis (fig. 2E). This protein contains a strong signal for plastid targeting in plants (e.g., A. thaliana [prob = 0.887; TPlen = 40], O. sativa [prob = 0.763; TPlen = 50]) and in C. merolae (prob = 0.769; TPlen = 36). L10 in the cryptophyte G. theta has been annotated as being plastid targeted (GenBank CAH25357). The chromalveolates form a monophyletic group in the L10 tree with weak ML (69%) bootstrap support and with Bayesian (P = 1.0) support. The interrelationships of the chromalveolate taxa conform to the expectation from plastid gene trees with cryptophytes as sister to a clade defined by the haptophytes and stramenopiles (e.g., Yoon et al. 2004, 2005). However, this is the case only when the highly divergent C. merolae sequence is removed from the analysis. This red alga branches, without bootstrap support, as sister to E. huxleyi (see filled square in fig. 2E) when retained in the data set.
Magnesium-Chelatase Subunits CHLI and CHLD
Magnesium-chelatase is involved in chlorophyll synthesis; that is, Mg²⁺ is inserted into protoporphyrin IX to form Mg-protoporphyrin IX (Romano et al. 2005). This function is usually carried out by an association of three protein subunits: CHLD, CHLH, and CHLI (Jensen et al. 1999; Gibson et al. 1995). The genes for plastid-encoded CHLI and nuclear-encoded CHLD are related through a gene duplication and fusion event and share about 40% aa identity (Jensen et al. 1996). We identified one chlD sequence that has homologs in cyanobacteria, red algae, land plants, and chromalveolates. CHLD in all the studied land plants contains a plastid-targeting signal (e.g., Pisum sativum, prob = 0.831; TPlen = 51, A. thaliana, prob = 0.822; TPlen = 49), and this protein appears to be plastid targeted in C. merolae (prob = 0.641; TPlen = 56). ChlI is found in the plastid of plants and all algae (Jensen et al. 1996) except peridinin-containing dinoflagellates (Hackett et al. 2004; Bachvaroff et al. 2004). We analyzed CHLD and CHLI separately to compare the topology of these trees.
The CHLI (324 aa) tree (fig. 3A) is typical (i.e., fig. 1B) for plastid-encoded proteins (Yoon et al. 2005). The monophyly of the green and red + chromalveolate clades, their distant phylogenetic relationship to the glaucophyte Cyanophora paradoxa, as well as the origin of this gene from a cyanobacterial primary endosymbiont are supported (the latter only weakly) in the ML tree. The CHLD phylogeny (564 aa) provides essentially the same topology with moderate bootstrap support (ML = 92%, NJ = 64%) for the sister group relationship of chromalveolates and red algae (fig. 3B). Again, this result is consistent with the scenario shown in figure 1D and implies that the chlD gene had a cyanobacterial origin in the Plantae and that the chromalveolates most likely obtained this sequence through red algal EGT.
Genes of Green Algal Origin
Of the 20 protein ML trees that were analyzed in detail after the phylogenomics approach, two (chlorophyll a synthase, phosphoribulokinase) suggested a green algal rather than a rhodophyte ancestry for nuclear-encoded plastid-targeted genes in chromalveolates. A green algal contribution to alveolate nuclear genomes or the existence of a plastid of green algal origin in these taxa has been previously suggested and hotly debated (e.g., Funes et al. 2002
Chlorophyll a Synthase
Chlorophyll a synthase catalyzes the final step in chlorophyll biosynthesis, the introduction of the tetraprenyl side chain, and is also implicated in the regulation of photosynthesis (Schmid et al. 2001). This well-studied enzyme (included in the Pfam UbiA prenyltransferase family) is widespread in cyanobacteria and other eubacteria and in plant and algal nuclear genomes. In our Blast searches, the eukaryotic genes were more closely related to cyanobacterial orthologs (e values <10⁻¹⁰⁰) than to orthologs in other eubacteria (approximately 10⁻⁵⁰ to 10⁻²⁰), supporting their origin in algae/plants through primary endosymbiosis. The plant proteins in our alignment all contain a plastid-targeting signal (e.g., A. thaliana, prob = 0.849; TPlen = 57, Avena sativa, prob = 0.947; TPlen = 45) as does this protein from C. merolae (prob = 0.820; TPlen = 50). The protein ML tree of chlorophyll a synthase (fig. 4A) provides strong support for the monophyly of the plastid proteins (ML = 100%, NJ = 98%) relative to the cyanobacteria, and the NJ (74%) and Bayesian (prob = 1.0) methods suggest a green algal origin of the gene in chromalveolates and in B. natans.
Phosphoribulokinase
Class II phosphoribulokinase (PRK) is found in most photosynthetic organisms including cyanobacteria, photosynthetic algae, and land plants. Along with ribulose bisphosphate carboxylase/oxygenase, GAPDH, fructose-1,6-bisphosphatase, and sedoheptulose-1,7-bisphosphatase (SBPase), PRK is considered to be a key enzyme of the Calvin cycle (Graciet, Lebreton, and Gontero 2004
Phylogenetic analysis of PRK confirms its cyanobacterial origin in algae and plants (fig. 4B). The eukaryotic clade is divided into the red (NJ = 86%), chromalveolate (ML = 66%, NJ = 83%), and green lineages (ML = 96%, NJ = 94%). The Bayesian and bootstrap (ML = 100%, NJ = 93%) support for the monophyly of chromalveolate and "green" PRKs supports a green algal origin of this gene in at least the dinoflagellates (A. tamarense, Amphidinium carterae, Heterocapsa triquetra [see arrow in fig. 4B]), haptophytes, and stramenopiles. Within the chromalveolates, there is significant Bayesian (prob = 1.0) and bootstrap (ML = 100%) support for a close relationship between the haptophytes and dinoflagellates (see fig. S4, Supplementary Material online). This result may, however, reflect the highly variable PRK divergence rates among the different eukaryotic clades combined with a shared rate elevation in haptophytes and dinoflagellates that could lead to artifactual long-branch attraction.
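The notion of monophyly used in these tree comparisons can be made concrete with a short Python sketch. It assumes a rooted tree written as nested tuples with taxon names as leaves; the toy topology and the taxon labels are hypothetical, written by hand to resemble the relationships described in the text, and are not taken from figure 4B. A set of taxa is treated as monophyletic when some subtree contains exactly those taxa and no others.

```python
def leaf_set(tree):
    """Return the set of leaf names below a node.

    A leaf is a string; an internal node is a tuple of child nodes.
    """
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset()
    for child in tree:
        leaves |= leaf_set(child)
    return leaves


def is_monophyletic(tree, taxa):
    """Return True if some subtree contains exactly the given taxa."""
    taxa = frozenset(taxa)
    if leaf_set(tree) == taxa:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, taxa) for child in tree)


# Hypothetical rooted toy topology (illustration only).
toy_tree = (
    "cyanobacterium",
    ("red_alga",
     ("green_alga",
      (("haptophyte", "dinoflagellate"), "stramenopile"))),
)

if __name__ == "__main__":
    green_plus_chromalveolates = {
        "green_alga", "haptophyte", "dinoflagellate", "stramenopile"
    }
    print(is_monophyletic(toy_tree, green_plus_chromalveolates))
    print(is_monophyletic(toy_tree, {"red_alga", "haptophyte"}))
```

A check of this kind says nothing about support values or branch lengths, so it cannot by itself distinguish a genuine grouping from one produced by long-branch attraction.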
| Summary |
|---|
Understanding the role of EGT in shaping algal and plant genomes remains a significant challenge in comparative genomics (e.g., Martin et al. 2002
Although we suggest that the chromalveolate hypothesis is the most parsimonious explanation for our data, there are several caveats to our analysis. First, the cryptophytes are missing from all of our nuclear data sets except for the L10 ribosomal protein (fig. 2E) and glutamyl-transfer RNA reductase (fig. S2, Supplementary Material online) trees. In the latter case, the cryptophyte G. theta does not group with the other chromalveolates (with moderate bootstrap support). Whether this result reflects the poor resolution associated with a single-protein analysis or a potential independent origin of this plastid or the plastid-targeted gene remains to be determined. Clearly, cryptophytes need to be included in a larger number of nuclear gene trees to determine their position within the chromalveolates. Second, the relative positions of chromalveolate members are in conflict between the different trees. The existing nuclear gene trees (e.g., Baldauf et al. 2000; Harper, Waanders, and Keeling 2005) suggest that stramenopiles and alveolates should form a monophyletic group, whereas plastid gene/genome trees (e.g., Hagopian et al. 2004; Bachvaroff, Sanchez Puerta, and Delwiche 2005; Yoon et al. 2005) suggest a sister group relationship between haptophytes and stramenopiles. Both these results (as well as other topologies) are found in our trees (e.g., stramenopiles + alveolates; aaui170 [fig. 2C], dihydrolipoamide dehydrogenase [fig. 2S, Supplementary Material online]), presently making it impossible to ascertain the true interrelationships of chromalveolates. Taxon sampling from additional chromalveolates may address this issue, although any single tree may not unambiguously support a given topology due to a deficit of phylogenetic signal (see Yoon et al. 2005).
In conclusion, our results lead to three major insights into endosymbiosis: (1) uncovering sufficient and convincing examples of EGT will likely require a concerted comparative genomic approach (e.g., Martin et al. 2002) rather than reliance on anecdotal findings; (2) the present results suggest that only a small subset of candidate genes will likely be of sufficient length, conservation, and free of extensive paralogy to address ancient gene transfer events; and (3) although our data are consistent with the chromalveolate hypothesis, the often unresolved interrelationships of chromalveolates in our trees leave open the possibility that the transferred genes of red algal origin may have originated through independent horizontal transfers in different lineages or through a series of endosymbioses involving red algae or algae containing red algal endosymbionts. A resolved host tree of chromalveolates combined with a more detailed understanding of endosymbiotic events in this lineage will ultimately prove or modify this fascinating model of eukaryotic evolution. Finally, our analysis identified a novel putatively plastid-targeted protein (aaui170, according to PATS) that is conserved across photosynthetic eukaryotes and in apicomplexans. Phylogenomics therefore offers the opportunity to identify novel proteome components in algal/plant plastids and in the apicoplast. These proteins could be of potential importance in understanding organelle function.
| Supplementary Material |
|---|
Supplementary table S1, figures S1–S4, and figure 2S are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
This work was supported by grants from the National Science Foundation awarded to D.B. (MCB 02-36631, EF 04-31117). S.L. and T.N. were partially supported by an Avis E. Cone Research Fellowship from the University of Iowa. J.D.H. was supported by an Institutional NRSA grant (T 32 GM98629) from the National Institutes of Health. We are grateful to T. Frickey for assistance with the use of PhyloGenie.
| Footnotes |
|---|
1 These authors have contributed equally to the manuscript.
2 Present address: Biology Department, Woods Hole Oceanographic Institution.
Martin Embley, Associate Editor
| References |
|---|
Archibald, J. M., M. B. Rogers, M. Toop, K. Ishida, and P. J. Keeling. 2003. Lateral gene transfer and the evolution of plastid-targeted proteins in the secondary plastid-containing alga Bigelowiella natans. Proc. Natl. Acad. Sci. USA 100:7678–7683.
Armbrust, E. V., J. A. Berges, C. Bowler et al. (45 co-authors). 2004. The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism. Science 306:79–86.
Asamizu, E., M. Nakajima, Y. Kitade, N. Saga, Y. Nakamura, and S. Tabata. 2003. Comparison of RNA expression profiles between the two generations of Porphyra yezoensis (Rhodophyta), based on expressed sequence tag frequency analysis. J. Phycol. 39:923–930.
Bachvaroff, T. R., G. T. Concepcion, C. R. Rogers, E. M. Herman, and C. F. Delwiche. 2004. Dinoflagellate expressed sequence tag data indicate massive transfer of chloroplast genes to the nuclear genome. Protist 155:65–78.
Bachvaroff, T. R., M. V. Sanchez Puerta, and C. F. Delwiche. 2005. Chlorophyll c-containing plastid relationships based on analyses of a multigene data set with all four chromalveolate lineages. Mol. Biol. Evol. 22:1777–1782.
Baldauf, S. L., A. J. Roger, I. Wenk-Siefert, and W. F. Doolittle. 2000. A kingdom-level phylogeny of eukaryotes based on combined protein data. Science 290:972–977.
Bhattacharya, D., and L. Medlin. 1995. The phylogeny of plastids: a review based on comparisons of small-subunit ribosomal RNA coding regions. J. Phycol. 31:489–498.
Bhattacharya, D., H. S. Yoon, and J. D. Hackett. 2004. Photosynthetic eukaryotes unite: endosymbiosis connects the dots. BioEssays 26:50–60.
Bonaldo, M. F., G. Lennon, and M. B. Soares. 1996. Normalization and subtraction: two approaches to facilitate gene discovery. Genome Res. 6:791–806.
Cavalier-Smith, T. 1999. Principles of protein and lipid targeting in secondary symbiogenesis: euglenoid, dinoflagellate, and sporozoan plastid origins and the eukaryote family tree. J. Eukaryot. Microbiol. 46:347–366.
Cavalier-Smith, T. 2004. Only six kingdoms of life. Proc. Biol. Sci. 271:1251–1262.
Coppin, A., J. S. Varre, L. Lienard, D. Dauvillee, Y. Guerardel, M. O. Soyer-Gobillard, A. Buleon, S. Ball, and S. Tomavo. 2005. Evolution of plant-like crystalline storage polysaccharide in the protozoan parasite Toxoplasma gondii argues for a red alga ancestry. J. Mol. Evol. 60:257–267.
Delwiche, C. F. 1999. Tracing the thread of plastid diversity through the tapestry of life. Am. Nat. 154(S4):S164–S177.
Eisen, J. A., and C. M. Fraser. 2003. Phylogenomics: intersection of evolution and genomics. Science 300:1706–1707.
Fast, N. M., J. C. Kissinger, D. S. Roos, and P. J. Keeling. 2001. Nuclear-encoded, plastid-targeted genes suggest a single common origin for apicomplexan and dinoflagellate plastids. Mol. Biol. Evol. 18:418–426.
Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791.
Funes, S., E. Davidson, A. Reyes-Prieto, S. Magallon, P. Herion, M. P. King, and D. Gonzalez-Halphen. 2002. A green algal apicoplast ancestor. Science 298:2155.
Gibbs, S. P. 1993. The evolution of algal chloroplasts. Pp. 107–121 in R. A. Lewin, ed. Origins of plastids. Chapman and Hall, New York.
Gibson, L. C. D., R. D. Willows, C. G. Kannangara, D. von Wettstein, and C. N. Hunter. 1995. Magnesium-protoporphyrin chelatase of Rhodobacter sphaeroides: reconstitution of activity by combining the products of the bchH, -I, and -D genes expressed in Escherichia coli. Proc. Natl. Acad. Sci. USA 92:1941–1944.
Graciet, E., S. Lebreton, and B. Gontero. 2004. Emergence of new regulatory mechanisms in the Benson-Calvin pathway via protein-protein interactions: a glyceraldehyde-3-phosphate dehydrogenase/CP12/phosphoribulokinase complex. J. Exp. Bot. 55:1245–1254.
Graybeal, A. 1998. Is it better to add taxa or characters to a difficult phylogenetic problem? Syst. Biol. 47:9–17.
Guindon, S., and O. Gascuel. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696–704.
Hackett, J. D., H. S. Yoon, M. B. Soares, M. F. Bonaldo, T. L. Casavant, T. E. Scheetz, and D. Bhattacharya. 2005. Insights into a dinoflagellate genome through expressed sequence tag analysis. BMC Genomics 6:80.
Hackett, J. D., H. S. Yoon, M. B. Soares, M. F. Bonaldo, T. L. Casavant, T. E. Scheetz, T. Nosenko, and D. Bhattacharya. 2004. Migration of the plastid genome to the nucleus in a peridinin dinoflagellate. Curr. Biol. 14:213–218.
Hagopian, J. C., M. Reis, J. P. Kitajima, D. Bhattacharya, and M. C. Oliveira. 2004. Comparative analysis of the complete plastid genome sequence of the red alga Gracilaria tenuistipitata var. liui: insight on the evolution of rhodoplasts and their relationship to other plastids. J. Mol. Evol. 59:464–477.
Harper, J. T., E. Waanders, and P. J. Keeling. 2005. On the monophyly of chromalveolates using a six-protein phylogeny of eukaryotes. Int. J. Syst. Evol. Microbiol. 55:487–496.
He, Z., L. Li, and S. Luan. 2004. Immunophilins and parvulins. Superfamily of peptidyl prolyl isomerases in Arabidopsis. Plant Physiol. 134:1248–1267.
Hillis, D. M., D. D. Pollock, J. A. McGuire, and D. J. Zwickl. 2003. Is sparse sampling a problem for phylogenetic inference? Syst. Biol. 52:124–126.
Huang, J., N. Mullapudi, C. A. Lancto, M. Scott, M. S. Abrahamsen, and J. C. Kissinger. 2004. Phylogenomic evidence supports past endosymbiosis, intracellular and horizontal gene transfer in Cryptosporidium parvum. Genome Biol. 5:R88.
Huelsenbeck, J. P., and F. Ronquist. 2001. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17:754–755.
Ishida, K., and B. R. Green. 2002. Second- and third-hand chloroplasts in dinoflagellates: phylogeny of oxygen-evolving enhancer 1 (psbO) protein reveals replacement of a nuclear-encoded plastid gene by that of a haptophyte tertiary endosymbiont. Proc. Natl. Acad. Sci. USA 99:9294–9299.
Jensen, P. E., L. C. D. Gibson, K. W. Henningsen, and C. N. Hunter. 1996. Expression of the chlI, chlD, and chlH genes from the cyanobacterium Synechocystis PCC6803 in Escherichia coli and demonstration that the three cognate proteins are required for magnesium-protoporphyrin chelatase activity. J. Biol. Chem. 271:16662–16667.
Jensen, P. E., L. C. D. Gibson, F. Shephard, V. Smith, and C. N. Hunter. 1999. Introduction of a new branchpoint in tetrapyrrole biosynthesis in Escherichia coli by co-expression of genes encoding the chlorophyll-specific enzymes magnesium chelatase and magnesium protoporphyrin methyltransferase. FEBS Lett. 455:349–354.
Johnson, C. H., V. Kruft, and A. R. Subramanian. 1990. Identification of a plastid-specific ribosomal protein in the 30 S subunit of chloroplast ribosomes and isolation of the cDNA clone encoding its cytoplasmic precursor. J. Biol. Chem. 265:12790–12795.
Kieselbach, T., A. Mant, C. Robinson, and W. P. Schroder. 1998. Characterization of an Arabidopsis cDNA encoding a thylakoid lumen protein related to a novel pentapeptide repeat family of proteins. FEBS Lett. 428:241–244.
Köhler, S., C. F. Delwiche, P. W. Denny, L. G. Tilney, P. Webster, R. J. Wilson, J. D. Palmer, and D. S. Roos. 1997. A plastid of probable green algal origin in apicomplexan parasites. Science 275:1485–1489.
Lidie, K. L., J. C. Ryan, M. Barbier, and F. M. Van Dolah. 2005. Gene expression in the Florida red tide dinoflagellate Karenia brevis: analysis of an expressed sequence tag (EST) library and development of a DNA microarray. Mar. Biotechnol. 7:481–493.
Lopez-Juez, E., and K. A. Pyke. 2005. Plastids unleashed: their development and their integration in plant development. Int. J. Dev. Biol. 49:557–577.
Lupas, N. A., and T. Frickey. 2004. PhyloGenie: automated phylome generation and analysis. Nucleic Acids Res. 32:5231–5238.
Martin, W., and R. G. Herrmann. 1998. Gene transfer from organelles to the nucleus: how much, what happens, and why? Plant Physiol. 118:9–17.
Martin, W., T. Rujan, E. Richly, A. Hansen, S. Cornelsen, T. Lins, D. Leister, B. Stoebe, M. Hasegawa, and D. Penny. 2002. Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc. Natl. Acad. Sci. USA 99:12246–12251.
Matsuzaki, M., O. Misumi, I. T. Shin et al. (40 co-authors). 2004. Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D. Nature 428:653–657.


