

MBE Advance Access originally published online on February 22, 2008
Molecular Biology and Evolution 2008 25(4):643-654; doi:10.1093/molbev/msn034

© The Author 2008. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Integrating Markov Clustering and Molecular Phylogenetics to Reconstruct the Cyanobacterial Species Tree from Conserved Protein Families

Wesley D. Swingley*, Robert E. Blankenship† and Jason Raymond‡

* Institute of Low Temperature Science, Hokkaido University, Sapporo, Japan
† Departments of Biology and Chemistry, Washington University, St Louis, MO
‡ School of Natural Sciences, University of California, Merced

E-mail: jason.raymond@ucmerced.edu.


    Abstract
Attempts to classify living organisms by their physical characteristics are as old as biology itself. The advent of protein and DNA sequencing, most notably of 16S ribosomal RNA, defined a new level of classification that now forms our basic understanding of the history of life on earth. High-throughput sequencing currently provides DNA sequences at an unprecedented rate, not only providing a wealth of information but also posing considerable analytical challenges. Here we present comparative genomics–based methods for automating evolutionary analysis across any number of species. As a practical example, we applied our method to the well-studied cyanobacterial lineage. The 24 cyanobacterial genomes compared here occupy a wide variety of environmental niches and play major roles in global carbon and nitrogen cycles. By integrating phylogenetic data inferred for upward of 1,000 protein-coding genes common to all or most cyanobacteria, we have reconstructed an evolutionary history of the phylum, establishing a framework for resolving key issues regarding the evolution of their metabolic and phenotypic diversity. Greater resolution on individual branches can be attained by telescoping inward to the larger set of proteins conserved among fewer taxa. The construction of all individual protein phylogenies allows for quantitative tree scoring, providing insight into the evolutionary history of each protein family as well as probing the limits of phylogenetic resolution. The tools incorporated here are fast, computationally tractable, and easily extendable to other phyla, and they provide a scalable framework for contrasting and integrating the information present in thousands of protein-coding genes within related genomes.

Key Words: genomics • cyanobacteria • evolution • Markov clustering • phylogenomics


    Introduction
Although the 16S ribosomal RNA (rRNA) paradigm continues to provide a strong framework for understanding evolution, it represents only one small piece of an organism's history. The exponentially increasing number of genome sequencing projects is pushing our understanding of diversity well beyond the limitations of the single-gene proxy. Integrating the enormous wealth of genetic information—hundreds to tens of thousands of genes per genome—stands as one of the central challenges to biology in the 21st century. Ultimately, an evolutionary tree will be available for every (nonnovel) gene from every sequenced genome, providing a temporal and cross-species blueprint of how Darwinian evolution has brought these genes together into an organism able to thrive in its particular niche.

The goal of phylogenomics has recently been the subject of a number of novel and provocative approaches (Eisen 1998; Lerat et al. 2003; Rivera and Lake 2004; Delsuc et al. 2005; Snel et al. 2005). Although insightful, their results are often quite controversial; for example, some strongly support the canonical tree of life as deduced by 16S rRNA analysis, whereas others suggest striking rearrangements to this orthodoxy (Wolf et al. 2002; Charlebois et al. 2003; Doolittle 2005; Ciccarelli et al. 2006). Molecular phylogeny, perhaps the best developed and most rigorously tested of these methods, has been difficult to implement, primarily because of the computational challenge of constructing gene trees from very large data sets. Furthermore, single-gene phylogenies are inherently complex, often reflecting nonvertical evolution due to horizontal gene transfer, gene duplication (paralogy), and gene loss (Gogarten and Townsend 2005). Deep phylogenies are especially prone to poor resolution due to sequence divergence. In particular, although a given set of homologous genes or proteins may be quite useful in resolving species- or genus-level relationships, it might be quite poor at resolving phylum-level relationships due to poor conservation or short sequence length.

In this work, we take a new approach, integrating clustering and sequence analysis to resolve a phylogeny spanning multiple taxonomic levels within a single phylum. Using all genomes available from a single phylum, our approach combines rigorous (maximum likelihood) analysis of large numbers of orthologs, of concatenated sets of up to several hundred proteins representing a large fraction of some genomes, and of consensus phylogenies based on single-protein trees. The ultimate goal is to determine, given the known role of horizontal gene transfer (particularly in prokaryote evolution) as well as the difficulty in resolving deep phylogenies, whether a plurality phylogenetic signal exists that is both consistent with, and potentially explanatory toward, systematic and taxonomic information about a group of organisms.

This phylum-first approach is well suited to the >10³ ongoing genome projects, for several reasons. First, most phyla appear to be robustly defined both by molecular methods, especially 16S rRNA, and by traditional systematics. Organisms within a phylum typically share unique phenotypic traits that are variable enough to be both interesting and informative of the evolutionary process. Second, by focusing first on resolving the distribution and phylogeny of single proteins, it is possible to select for subsequent analysis those proteins that are potentially most useful in resolving relationships at different taxonomic levels. For example, many proteins are not common to all organisms within a clade and would be excluded from analyses of completely conserved, or "core," proteins, yet they might be useful for determining relationships between subsets of organisms. Additionally, depending on factors such as length and degree of conservation, some proteins give well-resolved trees only at some taxonomic levels. Ribosomal proteins, for instance, often share 100% amino acid identity between members of the same genus or species and are thereby phylogenetically uninformative at that level.

Working with a single phylum (as opposed to, say, all 3 domains of life) also prevents data sets from becoming computationally intractable, especially when employing maximum likelihood–based approaches. This methodology can also be naturally extended into different taxonomic levels. Whereas some subset of proteins may be useful for resolving relationships within phyla, when needed, additional proteins can be incorporated for reconstructing family-, class-, or genus-level relationships by selecting only those proteins conserved at these taxonomic levels. Understanding which proteins are adequate at resolving different taxonomic levels enables selection of proteins that are useful in determining relationships between phyla—an ultimate goal (and persistent shortcoming) in reconstructions of the tree of life.

As an introductory example, we focus on the phylum cyanobacteria, which is notable for sequencing projects covering a wide swath of its enormous diversity, for its evolutionary importance, and for the time constraints available on its early evolution. The most ancient diagnostic markers for any organism come in the form of chemical biomarkers argued to have been left by cyanobacterial ancestors some 2.7 billion years ago, and the global-scale effects of the oxygen produced during cyanobacterial photosynthesis are seen in rocks ~2.43 billion years old and younger (Summons et al. 1999; Farquhar et al. 2000; Knoll 2003; Kopp et al. 2005). Ongoing and completed sequencing projects include cyanobacteria from marine and freshwater environments, thermophiles, nitrogen fixers, and symbionts. In addition to illustrating the robust evolutionary resolution acquired using our method, we also seek to build a growing phylogenetic framework against which the evolution of this phenotypically diverse group of organisms can be interpreted.

The long history of cyanobacterial systematics has been confounded by morphology-based botanical classifications as well as by difficulties in resolving closely related species using 16S rRNA (Rippka et al. 1979; Fox et al. 1992; Castenholz 2001; Casamatta et al. 2005). Individual genes and proteins conserved across all organisms or specifically in all cyanobacteria have been used to build phylogenies (Woese 1987; Giovannoni et al. 1988; Honda et al. 1999; Hess et al. 2001; Seo and Yokota 2003; Henson et al. 2004). Some subsets of cyanobacteria have also been compared extensively, particularly within the (genomically) well-sampled Prochlorophyte clade (Hess 2004; Dufresne et al. 2005). However, only a few studies thus far have assembled cyanobacterial phylogenies based on a larger set of proteins conserved across all cyanobacteria. Martin et al. (2002) examined several thousand genes from the 3 then-available cyanobacterial genomes to determine the evolutionary history of nuclear genes from Arabidopsis thaliana, establishing the wide-scale impact that imported cyanobacterial genes have had on the evolution of photosynthetic eukaryotes, as well as plausible gene complements of chloroplast/cyanobacterial ancestors. A Blast-based comparison of 8 cyanobacterial genomes (Martin et al. 2003) revealed 181 signature genes that do not have homologs in other organisms, roughly 3/4 of which had no ascribable function yet are clearly important in some aspect of the cyanobacterial lifestyle. More recently, Sanchez-Baracaldo et al. (2005) developed a method based on multigene concatenation combined with morphological character analysis to construct a cyanobacterial species tree and map traits onto it. Additionally, a cyanobacterial phylogeny based on 31 proteins conserved across the entire tree of life was constructed as part of a large-scale tree construction (Ciccarelli et al. 2006), but this study used only 8 cyanobacterial taxa, and the ribosomal proteins used for tree construction did not resolve terminal branches. A cluster of orthologous groups (COG)–based analysis was used to determine the distribution of proteins in 15 complete cyanobacterial genomes, with a particular focus on understanding the origin of photosynthesis (Mulkidjanian et al. 2006). However, that analysis did not include phylogenetic reconstruction, either of individual protein families or of the phylum as a whole. Zhaxybayeva et al. (2006) have conducted the most extensive sampling of the phylum to date, reconstructing the histories of 1,128 protein-coding genes from 11 cyanobacterial genomes in order to build a plurality tree based on quartet analysis.

In addition to constructing maximum likelihood trees for a large number of orthologs from completed cyanobacterial genomes, we assembled concatenated alignments as a further test of phylogenetic robustness. Importantly, variations in the concatenated alignment used resulted in 2 distinct but very highly supported phylogenies, suggesting that even large, statistically well-supported concatenations can converge on very different trees. To further test phylogenetic robustness, we used a tree consensus method to build a single tree that best captures all single-protein phylogenies. Recent work (Gadagkar et al. 2005) has compared the effectiveness of concatenated versus consensus methods for phylogenetic inference in the face of incongruent signals (e.g., due to horizontal gene transfer, poor resolution, invalid model assumptions, or use of the same model for all data sets). The authors found that concatenated phylogenies outperform consensus phylogenies, though importantly both methods can converge on incorrect trees when systematic biases are present in individual trees, for example, when the evolutionary model used is a poor match to the data. However, our consensus tree agrees exactly with one of the trees inferred from concatenated alignments, draws on the results of multiple evolutionary models, and is also compatible with modern cyanobacterial classification schemes that integrate both systematic and molecular information.

To further test, and potentially increase, the resolution of individual nodes on our concatenated/consensus genome tree, we used a telescoping method whereby protein families conserved among a smaller number of very closely related taxa are taken into account. This proved particularly useful in resolving relationships between the very closely related marine Synechococcus and Prochlorococcus clades, which were clarified with exceptional support by analyzing protein families conserved between just these 2 groups. In cases such as these, the inverse relationship between the number of conserved protein families and the number of taxa tends to yield a roughly uniform total number of phylogenetically informative characters.

Finally, using these methods to model cyanobacterial speciation provides a framework for understanding and explaining the distribution of cyanobacterial protein families. Generating a robust "background" tree is crucial for framing key evolutionary events, such as the origin and evolution of pigment biosynthesis and of carbon and nitrogen fixation, and provides insight into fundamental evolutionary mechanisms such as niche adaptation, genome reduction, and horizontal gene transfer. This approach can be similarly extended to other phyla to provide a high-resolution framework, based on the totality of evolutionary information from many protein families, and these frameworks can be linked together to assemble the tree of life.


    Methods
All data are publicly available as completed or nearly complete genome sequences (table 1). The pipeline of methods used is diagrammed in figure 2. BlastP comparisons (E-value cutoff 10⁻⁴, BLOSUM62, standard settings for word size, gap opening/extension, and filtering) were made between all protein sequences from the genomes of 24 cyanobacteria and 2 non-cyanobacterial outgroups (see table 1), representing all complete cyanobacterial genomes plus a diverse set of nearly complete ones, with outgroups from well-sampled bacterial phyla (proteobacteria and Gram-positive bacteria). To generate first-pass protein families, Markov clustering (Enright et al. 2002) was performed iteratively on a matrix generated from Blast E-values. To optimize clustering results, inflation parameters ranging from 1.2 to 20.0 were tested, with the resulting protein family/cluster size distributions given in supplementary table S2 (Supplementary Material online). An inflation parameter of 2.8 yielded the highest number of protein families with single orthologs from all or most (>21) of the 24 cyanobacterial genomes (445 total), as well as families with no more than 2 paralogs (178 total). Note that the smallest cyanobacterial genome analyzed (Prochlorococcus sp. MED4) contains 1,809 proteins; this level of filtering captures nearly 34% of that genome for further phylogenetic analysis. Despite the large number of clusters involved, the Markov clustering method is quite fast (<10 min on a 32-bit/2 GHz AMD desktop PC) and has been argued to have advantages over, for example, COG-based protein family assignment (Harlow et al. 2004). Ultimately, the goal of these and other clustering methods is identical: to assemble proteins from complete genomes into groups of evolutionarily related orthologs. No matter which heuristic is used, curation is a necessary part of the process.
Following clustering, all families were multiply aligned using ClustalW (Gonnet protein weight matrix, default gap opening/extension penalties) on an MPI-enabled 18-processor AMD Athlon cluster (all pre- and postcurated protein family alignments, as well as scripts for translating complete genomes into protein families using the cluster results, are freely available on request).
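The Markov clustering step described above can be illustrated with a small pure-Python sketch. This is not the MCL program used in the study; the -log10(E) weighting, the weight cap, the self-loop convention, the convergence tolerance, and all function names are assumptions chosen for a toy example. The naive O(n³) matrix product is only practical for a handful of proteins.

```python
import math


def weight_matrix(n, hits, cutoff=1e-4, cap=200.0):
    """Symmetric similarity matrix from all-vs-all Blast hits.

    hits maps (query, subject) index pairs to E-values. Hits worse than the
    E-value cutoff are dropped; the rest are weighted by -log10(E), capped so
    that E = 0 does not give infinity (weighting and cap are assumptions).
    """
    w = [[0.0] * n for _ in range(n)]
    for (i, j), e in hits.items():
        if i == j or e > cutoff:
            continue
        score = cap if e <= 0 else min(cap, -math.log10(e))
        w[i][j] = w[j][i] = max(w[i][j], score)  # keep the stronger reciprocal hit
    return w


def normalize_columns(m):
    """Rescale each column in place so it sums to 1 (column-stochastic)."""
    for j in range(len(m)):
        total = sum(row[j] for row in m)
        if total > 0:
            for row in m:
                row[j] /= total
    return m


def mcl(w, inflation=2.8, max_iter=100, tol=1e-9):
    """Markov clustering: alternate expansion and inflation until the matrix is stable."""
    n = len(w)
    m = [row[:] for row in w]
    for i in range(n):
        m[i][i] = max(m[i]) or 1.0  # self-loop weighted like the node's strongest edge
    normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: two-step random-walk flow (matrix squaring).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: elementwise power, then re-normalize each column.
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that retain flow are attractors; the columns they attract form one cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Toy example: proteins 0-2 and 3-4 form two families; the 0-3 hit fails the cutoff.
hits = {(0, 1): 1e-50, (1, 2): 1e-50, (0, 2): 1e-50, (3, 4): 1e-30, (0, 3): 0.5}
print(mcl(weight_matrix(5, hits)))  # prints [[0, 1, 2], [3, 4]]
```

Raising the inflation parameter sharpens the flow and tends to split the graph into more, smaller clusters; 2.8 is the default here only because it is the value reported above.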



 
Table 1 Genomes Analyzed in This Study

 

Figure 2
 
FIG. 2.— Bayesian maximum likelihood tree for the full-concatenated data set, based on 230,415 aligned positions in 26 genomes. Note the strong agreement with the consensus tree from figure 2, as well as the presence of non-cyanobacterial outgroups that support Gloeobacter violaceus as an early-branching cyanobacterium. The scale bar indicates the number of substitutions per site. Shown at each bifurcation are the predicted core-genome (upper number) and pan-genome (lower number) sizes of an ancestor at that point. The core-genome represents the intersection of all protein families in all progeny of an ancestor, whereas the pan-genome represents the union of all protein families in those progeny (the 2 numbers converge at the tips of the tree).

 
Using the multiple alignments and corresponding Neighbor-Joining trees generated by ClustalW as a guide, protein families were then manually checked for poor alignments and/or long branches, and poorly aligned sequences and/or poorly assembled protein families were either corrected or removed. Most frequently, these problems involved inclusion of a paralog in a protein family, which can be easily detected from the number of homologs per organism or, often, from the presence of long branches in the phylogeny. As depicted in table 2, these curated protein families were then parsed using various filters, for example, selecting protein families present in all or most cyanobacteria or in any chosen subset of organisms, or selecting protein families that share a common function or annotation. The full protein family spreadsheet is available as supplementary table S2 (Supplementary Material online).



 
Table 2 Parsing Protein Families Based on Different Criteria*

 
In addition to the distance-based trees generated during multiple alignment, phylogenies were generated for every aligned single-protein family using 2 different maximum likelihood methods. The first approach used PHYLIP's ProML program with the following parameters: JTT probability model, one category of sites with a constant rate, and randomized input order (Felsenstein 1989). The second, a quartet-based maximum likelihood approach, used the parallelized version of iqpnni with the Whelan and Goldman (WAG) substitution model and a gamma parameter estimated with 4 rate categories (Minh et al. 2005). PHYLIP's CONSENSE program was used to generate extended majority rule consensus phylogenies for each separate set of phylogenies (distance and both ML runs; iqpnni run results shown in fig. 2).
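The extended majority rule can be illustrated on rooted trees written as nested tuples. This is a sketch of the idea, not PHYLIP's CONSENSE: treating clades as rooted leaf sets, the tie-breaking order, the tuple tree format, and the function names are all assumptions made to keep the example short.

```python
from collections import Counter


def clades(tree):
    """Non-trivial clades of a rooted nested-tuple tree, as frozensets of leaf names."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(walk(tree))  # the root clade is present in every tree
    return found


def compatible(a, b):
    """Two rooted clades can coexist in one tree if they are disjoint or nested."""
    return not (a & b) or a <= b or b <= a


def extended_majority_rule(trees):
    """Return {clade: fraction of input trees containing it} for the consensus tree."""
    counts = Counter(c for t in trees for c in clades(t))
    accepted = []
    # Most frequent clades first; ties broken by size, then by sorted leaf names.
    for clade, _ in sorted(counts.items(),
                           key=lambda kv: (-kv[1], len(kv[0]), sorted(kv[0]))):
        # Majority (>50%) clades are always mutually compatible, so they all enter;
        # the "extended" step then adds any remaining compatible minority clades.
        if all(compatible(clade, other) for other in accepted):
            accepted.append(clade)
    return {clade: counts[clade] / len(trees) for clade in accepted}


trees = [(("A", "B"), ("C", "D")), (("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))]
for clade, support in extended_majority_rule(trees).items():
    print(sorted(clade), round(support, 2))
```

The support fractions returned here are the same kind of count reported in the Results (e.g., a clade recovered in 421 of 438 single-protein trees).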

Concatenated multiple alignments were generated by end-to-end attachment of individual protein family alignments, using gaps as placeholders for species missing a particular ortholog. As an additional test of robustness, variable/uninformative positions were filtered out of these concatenated alignments using progressively more stringent Shannon information entropy (SIE) cutoffs (SIE 1.0–3.0), and positions with >50% gaps were also removed. The resulting concatenated alignments from all 26 genomes ranged from 28,281 (SIE 1.0) to 230,415 (full/unfiltered concatenation) aligned amino acid positions and contained up to 300,000 aligned positions in the case of the Prochlorococcus/Synechococcus-conserved protein families (fig. 4). PHYLIP ProML and Neighbor-Joining phylogenies were then constructed for each of these filtered concatenated alignments to determine the effect of removing gaps and progressively more variable sites (see, e.g., the discussion of differences in support for the Prochlorococcus/Synechococcus clade in the main text).
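A minimal sketch of the concatenation and gap-filtering steps, assuming each family alignment is held as a dictionary from taxon name to aligned sequence and that "-" is the gap symbol. The function names and the toy sequences are hypothetical.

```python
def concatenate(families, taxa, gap="-"):
    """Join per-family alignments end to end into one supermatrix.

    families: list of dicts mapping taxon -> aligned sequence (equal length within
    a family). A taxon missing from a family is padded with gaps of that length.
    """
    parts = {t: [] for t in taxa}
    for fam in families:
        length = len(next(iter(fam.values())))
        for t in taxa:
            parts[t].append(fam.get(t, gap * length))
    return {t: "".join(chunks) for t, chunks in parts.items()}


def drop_gappy_columns(aln, max_gap_fraction=0.5, gap="-"):
    """Remove columns in which more than max_gap_fraction of the sequences are gaps."""
    taxa = list(aln)
    length = len(aln[taxa[0]])
    keep = [j for j in range(length)
            if sum(aln[t][j] == gap for t in taxa) / len(taxa) <= max_gap_fraction]
    return {t: "".join(aln[t][j] for j in keep) for t in taxa}


fam1 = {"A": "MKV", "B": "MKI", "C": "MRV"}
fam2 = {"A": "LL"}  # hypothetical family found only in taxon A
supermatrix = concatenate([fam1, fam2], ["A", "B", "C"])
print(supermatrix)                      # {'A': 'MKVLL', 'B': 'MKI--', 'C': 'MRV--'}
print(drop_gappy_columns(supermatrix))  # {'A': 'MKV', 'B': 'MKI', 'C': 'MRV'}
```

Padding with gaps keeps every taxon's row the same length, so family boundaries stay aligned across the supermatrix even when orthologs are missing.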


Figure 4
 
FIG. 4.— Using phylogeny and the distribution of protein families in different genomes to infer ancestral characteristics. As illustrated in the diagram, each bifurcation represents an ancestor whose core-genome contains the protein families found in every one of its descendants (the intersection of descendant genomes), whereas the pan-genome contains all protein families found in any descendant (the union of descendant genomes).

 
Our final goal was to test the effect of correcting for site heterogeneity in concatenated alignments by incorporating a gamma parameter, rather than strictly filtering out variable regions of alignments. Given the size and associated memory requirements of inferring gamma-corrected phylogenies for these concatenated alignments, they were analyzed using MrBayes (Huelsenbeck and Ronquist 2001). MrBayes was run using the VT evolutionary model, incorporating a gamma parameter with 4 rate categories, with the substitution model analyzed over 20,000–30,000 generations in 4 separate runs with a 1,000-generation burn-in. Because of the large data set and number of free parameters in the model, MrBayes required a 64-bit dual-CPU system with 8 GB RAM. Although this limited the number of generations and discrete chains, in all cases topological convergence to the consensus phylogeny was achieved within 5,000–8,000 generations and was maintained for the remainder of every run. In addition, this topology was also recovered by Neighbor-Joining as implemented in MEGA v3.0, using multiple models and incorporating a gamma parameter (Kumar et al. 1994), and agreed topologically with the trees obtained using PHYLIP's ProML on entropy-filtered concatenated alignments, as discussed in the main text. All concatenated alignments and phylogenies are available upon request.

Tree comparisons used PHYLIP's consense, using the extended majority rule method and both the symmetric (Robinson–Foulds) and branch score distance metrics. Comparisons also included 50 trees composed of the same cyanobacterial taxa arranged in randomized topologies. As illustrated in figure 5, core- and pan-genome numbers are determined for a specific rooted phylogeny by 1) counting the number of protein families conserved within all descendants of a particular node in the tree (core) and 2) counting the total number of protein families present in the descendants of that node (pan).
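Both calculations in this paragraph reduce to set operations over rooted clades. The sketch below assumes trees written as nested tuples and genomes summarized as sets of protein family IDs; the function names are hypothetical. The distance here counts rooted clades, a simplification of the split-based symmetric difference commonly used for unrooted trees, and the branch score distance is not shown.

```python
def core_pan(tree, families):
    """Core (intersection) and pan (union) family sets for every node of a rooted tree.

    tree: nested tuples of leaf names; families: leaf name -> set of family IDs.
    Returns {frozenset of descendant leaves: (core, pan)}.
    """
    result = {}

    def walk(node):
        if isinstance(node, str):
            fams = frozenset(families[node])
            result[frozenset([node])] = (fams, fams)  # core and pan coincide at the tips
            return frozenset([node]), fams, fams
        kids = [walk(child) for child in node]
        leaves = frozenset().union(*(k[0] for k in kids))
        core = frozenset.intersection(*(k[1] for k in kids))
        pan = frozenset().union(*(k[2] for k in kids))
        result[leaves] = (core, pan)
        return leaves, core, pan

    walk(tree)
    return result


def leaf_set(node):
    """Leaves below a node of a nested-tuple tree."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def clade_set(tree):
    """Every clade with 2+ leaves in a rooted nested-tuple tree."""
    if isinstance(tree, str):
        return set()
    return {leaf_set(tree)}.union(*(clade_set(child) for child in tree))


def robinson_foulds(t1, t2):
    """Symmetric difference: number of clades present in exactly one of the two trees."""
    return len(clade_set(t1) ^ clade_set(t2))


tree = (("A", "B"), "C")
families = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 5}}
for leaves, (core, pan) in core_pan(tree, families).items():
    print(sorted(leaves), "core:", sorted(core), "pan:", sorted(pan))
print(robinson_foulds(tree, (("A", "C"), "B")))  # 2
```

Because core shrinks and pan grows toward the root, the 2 numbers converge only at the tips, as noted in the figure legends.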


Figure 5
 
FIG. 5.— (a) Possible scenarios for the distribution of nitrogen fixation in cyanobacteria, contrasting convergence versus gene loss as suggested by the protein family composition of core- and pan-ancestral genomes. The nitrogen fixation pathway is found in 7 genomes (black + next to species name). Pan-ancestral genome data posit that nitrogen fixation arose before the cyanobacterial common ancestor and was present in many of its descendants (black dots) but was subsequently lost in many lineages (black x's). Core composition of ancestral genomes instead suggests that the ability to fix nitrogen appeared 3 independent times (gray +'s; gray boxes indicate ancestral nodes where N2 fixation was present). (b) The phylogenetic tree from protein family 1574, the catalytic molybdenum–iron subunit of the nitrogenase complex (see e.g., table 2). Though many lineages are missing, the species present have a phylogeny similar to that of the species tree, suggesting largely vertical evolution with multiple gene losses. The exception is the distinct position of Trichodesmium erythraeum, which is not closely related to the Nostocales as in panel a, suggesting that horizontal gene transfer may have been important early in the evolution of cyanobacterial nitrogen fixation.

 

    Results and Discussion
The 24 genomes analyzed here represent all cyanobacteria with either complete or very nearly complete sequencing projects and encompass nearly 94,000 protein-coding genes. Homology-based Markov clustering resulted in 7,378 families of proteins present in more than 1 cyanobacterium (an additional 12,955 protein families were found only in a single cyanobacterium). Many of these families include multiple, often closely related paralogs. For example, the D1 and D2 proteins of the photosystem II reaction center complex are members of the same family, and ABC transporter and serine/threonine kinase paralogs are quite extensive even in the smallest cyanobacterial genomes. To avoid problems associated with including paralogs in phylogenies, initial analysis focused on families with few or no paralogs present in most or all cyanobacteria; these include housekeeping proteins common to most organisms as well as cyanobacterial-specific proteins that have been important during their evolution and early diversification.

Following the initial clustering, 613 protein families met the criteria of being absent from no more than 2 cyanobacteria and having no more than 2 paralogs in total across all organisms. Alignments and Neighbor-Joining phylogenies for all families were manually checked, and poorly aligned proteins (as well as those with disproportionately long branches; for details, see Methods) were removed from alignments, or else the family was removed from the analysis. A total of 583 protein families remained after this manual curation. Here we focus on a substantial number of relatively easily obtained families of orthologs, selected by a fast clustering approach that minimizes the number of paralogs while maximizing the total number of genomes represented in a given protein family (see supplementary table S1, Supplementary Material online).
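The selection criteria above, and the subset filters of table 2, amount to a single pass over the cluster output. A minimal sketch, assuming each family is stored as a list of (genome, protein ID) pairs and reading "paralogs in total" as extra copies beyond one per genome; both the data layout and that reading are assumptions, and the function name is hypothetical. Passing a `taxa` subset with `max_absent=0` gives the kind of clade-restricted selection used for telescoping.

```python
from collections import Counter


def select_families(families, genomes, max_absent=2, max_paralogs=2, taxa=None):
    """Return IDs of families that pass coverage and paralog filters.

    families: family ID -> list of (genome, protein ID) members.
    taxa: optional subset of genomes; if given, coverage and paralogs are
    evaluated only within that subset (useful for clade-restricted selection).
    Paralogs = extra copies beyond one per genome (an assumed definition).
    """
    scope = set(taxa) if taxa is not None else set(genomes)
    selected = []
    for fam_id, members in families.items():
        copies = Counter(g for g, _ in members if g in scope)
        absent = len(scope) - len(copies)
        paralogs = sum(n - 1 for n in copies.values())
        if absent <= max_absent and paralogs <= max_paralogs:
            selected.append(fam_id)
    return selected


genomes = ["g1", "g2", "g3", "g4"]
families = {
    "f1": [("g1", "p1"), ("g2", "p2"), ("g3", "p3"), ("g4", "p4")],  # 1 copy each
    "f2": [("g1", "p5")],                                            # absent from 3
    "f3": [("g1", "p6"), ("g1", "p7"), ("g1", "p8"), ("g2", "p9")],  # 2 extra copies
}
print(select_families(families, genomes))                            # ['f1', 'f3']
print(select_families(families, genomes, taxa=["g1"], max_absent=0))  # ['f1', 'f2', 'f3']
```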

Phylogenies for each of the 583 families were constructed using 2 different implementations of the maximum likelihood method (PHYLIP and quartet-based iqpnni; see Methods). A total of 438 of these families, those composed strictly of orthologs, were then used to generate a consensus phylogeny that portrays the bifurcations occurring most frequently across all trees (fig. 2). For example, both the marine Synechococcus/Prochlorococcus clade (11 organisms) and the Synechococcus sp. A and B' clusters are recovered in every tree generated, and the Nostocales clade is observed in 421 of 438 trees. Importantly, only minority support is observed for several nodes on the tree, especially among the cyanobacteria often argued to be among the earliest branching (Gloeobacter), a pattern that may indeed reflect asymmetric rates of evolution, as well as for some members of the Prochlorococcus lineages, which recent studies suggest may result from horizontal gene transfer (Beiko et al. 2005). The ability to detect this phylogenetic incoherence is a crucial step toward identifying the protein families and organisms responsible. An attractive iterative approach would take these into account by fine-tuning the parameters of the ascribed evolutionary models or by progressively removing "difficult" protein families from tree-building methods that rely on combined data sets.

This consensus phylogeny gives a straightforward method for finding putative horizontal gene transfer events and indicates that gene transfer "across" the tree, that is, between Prochlorococcus/marine Synechococcus and cyanophytes, is very rare among this particular subset of proteins. Note that because these proteins are common to almost all cyanobacteria, a very specific type of horizontal gene transfer must occur: orthologous gene replacement, whereby a newly transferred gene displaces a functional wild-type gene. Importantly, although recent evidence indeed supports an important role for horizontal gene transfer among cyanobacteria (Zhaxybayeva et al. 2006), simulations suggest that these phylogenetic signals are not self-reinforcing and that, even when corrections are not made for variations in evolutionary rate or composition, convergence to the true tree is frequently observed (Gadagkar et al. 2005). Indeed, Zhaxybayeva et al. (2006) obtained a plurality tree based on quartet reconstruction with which the consensus and concatenated trees presented here are consistent.

In addition to individual and consensus phylogenies, all alignments without paralogs were concatenated into a single large alignment of 230,415 positions spanning 26 organisms. Smaller alignments were generated from this full alignment using a filter based on Shannon information entropy (Reche and Reinherz 2003) to remove phylogenetically uninformative (too variable or too conserved) sites. Shannon entropy can be calculated for each position in an alignment and provides a more robust way to separate informative positions than simply culling positions that fall below a given percentage identity or similarity. For example, consider an alignment position at which half of the sequences carry 1 amino acid and the other half carry a different amino acid. Under a percentage-based cutoff, this position would have the same informative value as one at which half of the sequences carry 1 amino acid and the remaining sequences each carry a different amino acid, because in both cases the most common residue accounts for 50% of the column. The Shannon entropy scores of these 2 positions, however, are quite different, and the entropy measure is, furthermore, conceptually similar to maximum likelihood calculations, since both are built on the log-probabilities of the observed residues. Phylogenies for all concatenated alignments were generated as described in the Methods and showed overall agreement with one another, with one notable exception: different levels of filtering (Shannon entropy cutoff values ranging from 1 to 4, where 0 corresponds to an invariant site and 4.322, that is, log2 20, to a site at which all 20 amino acids are equally represented) resulted in 2 distinct trees that differ in the monophyly of the Prochlorococcus and Synechococcus clades. One of these trees, shown in figure 3, was converged upon in multiple MrBayes runs using the full, unfiltered data set. This tree has separate, monophyletic Prochlorales (the order containing the Prochlorococcus species) and marine Synechococcus clades, with Synechococcus sp. strain WH 5701 basal to both groups, a topology supported by previous single-gene trees (Rocap et al. 2002; Scanlan 2003). Notably, this tree agreed almost exactly with the consensus phylogeny generated from 438 trees, the only exception being the poorly supported Acaryochloris marina/Thermosynechococcus elongatus clade, which the concatenated tree resolves as 2 distinct lineages.
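The per-site entropy filter described above can be sketched as follows. This is an illustration under stated assumptions, not the filter actually used: gap and ambiguity characters (`-`, `X`, `?`) are simply ignored, entropy is measured in bits (base 2, so 20 equally frequent amino acids give log2 20 ≈ 4.322), and because the exact way the cutoffs were applied is not spelled out here, the hypothetical `filter_alignment` keeps columns whose entropy falls inside an inclusive [min, max] window. The two example columns reproduce the comparison in the text: both have a 50% majority residue, yet their entropies differ.

```python
import math
from collections import Counter
from typing import List

IGNORED = set("-X?")  # assumption: gaps/ambiguous residues do not contribute


def column_entropy(column: str) -> float:
    """Shannon entropy, in bits, of one alignment column."""
    residues = [aa for aa in column if aa not in IGNORED]
    n = len(residues)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in Counter(residues).values())


def filter_alignment(seqs: List[str], min_entropy: float, max_entropy: float) -> List[str]:
    """Keep only the columns whose entropy lies in [min_entropy, max_entropy]."""
    keep = [
        i for i in range(len(seqs[0]))
        if min_entropy <= column_entropy("".join(s[i] for s in seqs)) <= max_entropy
    ]
    return ["".join(s[i] for i in keep) for s in seqs]


if __name__ == "__main__":
    two_residues = "A" * 10 + "L" * 10      # half A, half L
    half_unique = "A" * 10 + "CDEFGHIKLM"   # half A, the rest all different
    print(round(column_entropy(two_residues), 3))            # 1.0
    print(round(column_entropy(half_unique), 3))             # 2.661
    print(round(column_entropy("ACDEFGHIKLMNPQRSTVWY"), 3))  # 4.322
```

A percentage-identity cutoff would treat the first two columns identically; the entropy score separates them.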


Figure 3
FIG. 3.— Maximum likelihood phylogeny of 848 concatenated orthologous protein families (~290,000 aligned amino acid positions) common to 11 Prochlorococcus and marine Synechococcus genomes. By incorporating a larger number of protein families shared by a smaller number of closely related organisms, we find strong support for 1 of the 2 topologies found in the 26-genome trees, effectively improving the resolution of the consensus tree. The alternative topological position of Synechococcus sp. WH5701, observed in some filtered concatenated trees as discussed in the text, is illustrated by the dashed line.

 
Although the convergence of 2 different approaches on a single tree lends support to it as the true tree, the fact that a different tree was inferred from some filtered concatenated alignments underscores the importance of using multiple methods of analysis to infer phylogenies. Shannon entropy provides a metric for pruning highly variable (less phylogenetically informative) positions from long alignments, making phylogenetic analysis more tractable. However, evolutionary models should be compared each time a data set is filtered, because the best-fitting model may change as positions are pruned from an alignment. Even character-rich data sets can be prone to error, particularly when they contain multiple phylogenetic signals or include highly divergent or deeply branching organisms (Mossel and Steel 2006).
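The point about re-selecting the substitution model after each filtering step can be made concrete with the Akaike information criterion, AIC = 2k − 2 ln L, which is one common way to compare models (which criterion was used is not stated here). In this sketch the log-likelihoods and parameter counts are invented for illustration, and k counts only each model's extra free parameters, on the assumption that all models are fitted to the same tree so that branch-length parameters cancel.

```python
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits: Dict[str, Tuple[float, int]]) -> str:
    """fits maps model name -> (maximized log-likelihood, number of free parameters)."""
    return min(fits, key=lambda name: aic(*fits[name]))


if __name__ == "__main__":
    # Hypothetical values: +G adds a gamma shape parameter, +I an invariant-site proportion.
    full_data = {"WAG+G": (-1000.0, 1), "WAG+I+G": (-990.0, 2)}
    filtered = {"WAG+G": (-500.0, 1), "WAG+I+G": (-499.8, 2)}
    print(best_model(full_data))  # WAG+I+G
    print(best_model(filtered))   # WAG+G
```

In this toy case the preferred model changes once filtering has removed most of the signal that the extra invariant-site parameter was capturing, which is the kind of shift the text warns about.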

As is evident in figures 3 and 4, the order Prochlorales shows anomalously long branches in both individual and concatenated phylogenies, which may account for the alternative topology seen in some filtered concatenated phylogenies (this alternative topology is illustrated by the dashed line in fig. 4). However, the same tree is recovered by both the concatenated and the consensus approaches, lending support to it as the true tree.

As a further test, and to demonstrate one of the advantages of our approach, we incorporated additional information from protein families that were excluded from the initial analysis because they are not present in most or all cyanobacteria. Specifically, 1,108 protein families are found in all Prochlorococcus and marine Synechococcus genomes (including WH 5701). Of these, 848 families have no paralogs within either clade, so individual and consensus/concatenated phylogenies can be generated for this Prochlorococcus/Synechococcus-specific subset of families. As shown in figure 4, the phylogeny based on these 848 concatenated protein families (287,466 aligned positions from 11 Prochlorococcus/Synechococcus genomes) supports a branching order that agrees with both the consensus and the fully concatenated data sets. Moreover, this phylogeny retains the relatively long branches characteristic of several members of the Prochlorales clade, suggesting that an accelerated substitution rate across many proteins has accompanied genome reduction. Analyses of Prochlorococcus genomes have also observed this long-branch effect, which is likely due to the loss of several DNA repair capabilities during genome reduction (Dufresne et al. 2005).
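The telescoping step, selecting families present in every genome of a smaller clade and then keeping those without paralogs in that clade, amounts to simple set filtering over the protein-family clusters. A minimal sketch, assuming each family (for example, one Markov clustering cluster) is stored as a list with one genome identifier per member protein; the family and genome identifiers below are placeholders, not data from this study.

```python
from typing import Dict, List, Set


def families_in_all(families: Dict[str, List[str]], genomes: Set[str]) -> List[str]:
    """Families with at least one member protein in every target genome."""
    return sorted(f for f, members in families.items() if genomes <= set(members))


def single_copy_in_all(families: Dict[str, List[str]], genomes: Set[str]) -> List[str]:
    """Families present in every target genome with exactly one copy each (no paralogs)."""
    selected = []
    for fam_id in families_in_all(families, genomes):
        # Count copies only inside the target clade; members elsewhere are ignored.
        in_target = [g for g in families[fam_id] if g in genomes]
        if len(in_target) == len(genomes):  # present everywhere, so one copy per genome
            selected.append(fam_id)
    return selected


if __name__ == "__main__":
    families = {
        "fam1": ["g1", "g2", "g3"],                  # single copy in every target genome
        "fam2": ["g1", "g1", "g2", "g3"],            # paralog in g1
        "fam3": ["g1", "g2"],                        # absent from g3
        "fam4": ["g1", "g2", "g3", "out1", "out1"],  # extra copies only outside the clade
    }
    target = {"g1", "g2", "g3"}
    print(families_in_all(families, target))     # ['fam1', 'fam2', 'fam4']
    print(single_copy_in_all(families, target))  # ['fam1', 'fam4']
```

Shrinking the target set of genomes generally enlarges both lists, which is what allows resolution to be increased on individual branches.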

The single phylogeny converged upon by the multiple methods used here also provides a framework for understanding the distribution of protein families at each ancestral node on the tree (Martin et al. 2002; Eisen and Fraser 2003; Lerat et al. 2003). As shown in figure 3, the common ancestor of all cyanobacteria is inferred to have had a conserved core of 361 protein families, as these are present in the full set of 26 genomes analyzed. A total of 675 protein families (within which the 361 are nested) are common to all 24 cyanobacterial genomes analyzed, though, as mentioned, many of these families contain paralogs and so were excluded from this analysis. These families represent a widely conserved core of housekeeping proteins found not only across known cyanobacterial diversity but also, to some extent, in non-cyanobacterial genomes. By contrast, the total diversity of modern cyanobacterial protein families (the union of all protein families in all descendants of an ancestor) is inferred to be just over 20,000 protein families for the cyanobacterial common ancestor and 25,292 when the non-cyanobacterial outgroups are included. This union is referred to as the cyanobacterial pan-genome; it must be emphasized that the pan-genome never actually existed but simply captures the extent of protein family variability across the phylum. It is illustrated along with the core-genome concept in figure 5. These pan- and core-genome numbers provide upper and lower bounds on protein family distributions at each node in a given phylogeny; they are not parsimony-based estimates of the true genetic content of ancient organisms.
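Because the core- and pan-genome of a node are simply the intersection and the union of its descendants' protein-family sets, both bounds can be computed for every node in a single pass over the tree. A minimal sketch, assuming a rooted tree written as nested tuples and each genome summarized as a set of family identifiers; the genomes and families are toy placeholders.

```python
from typing import Dict, FrozenSet, Set, Tuple, Union

Tree = Union[str, tuple]  # hypothetical format: nested tuples, leaves are genome names


def core_and_pan(tree: Tree, content: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Tuple[int, int]]:
    """Map each node (as its set of descendant genomes) to (core size, pan size)."""
    result: Dict[FrozenSet[str], Tuple[int, int]] = {}

    def walk(node: Tree) -> Tuple[FrozenSet[str], Set[str], Set[str]]:
        if isinstance(node, str):
            fams = set(content[node])
            result[frozenset([node])] = (len(fams), len(fams))
            return frozenset([node]), fams, fams
        parts = [walk(child) for child in node]
        leafset = frozenset().union(*(p[0] for p in parts))
        core = set.intersection(*(p[1] for p in parts))  # families in every descendant
        pan = set.union(*(p[2] for p in parts))          # families in any descendant
        result[leafset] = (len(core), len(pan))
        return leafset, core, pan

    walk(tree)
    return result


if __name__ == "__main__":
    content = {"g1": {"a", "b", "c"}, "g2": {"a", "b", "d"}, "g3": {"a", "e"}}
    bounds = core_and_pan((("g1", "g2"), "g3"), content)
    print(bounds[frozenset({"g1", "g2"})])        # (2, 4)
    print(bounds[frozenset({"g1", "g2", "g3"})])  # (1, 5)
```

As in the text, these numbers bound the family content of an ancestor; they are not themselves reconstructions of it.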

The core-genome at the base of the cyanobacterial phylum encompasses most of the major proteins of the photosynthetic apparatus, suggesting that oxygenic photosynthesis evolved prior to or early in the cyanobacterial radiation. This is in stark contrast with the ability to fix nitrogen, which is found paraphyletically throughout the cyanobacterial tree (illustrated in fig. 6a, where N₂-fixing lineages are denoted by "+"). The nodes at which nitrogen fixation is inferred (that is, nodes whose descendant lineages all fix nitrogen; gray squares in fig. 6a) occur at multiple points across the tree, so gene loss, horizontal gene transfer, or some combination of these processes must be invoked to explain the distribution of nitrogen fixation. The strength of having both combined and individual phylogenies lies in the ability to contrast the background tree of cyanobacterial speciation (figs. 2 and 3) with the evolutionary tree for nitrogenase. For example, based on the species tree, one plausible scenario is that nitrogenase was acquired independently on several occasions within cyanobacterial lineages (in which case horizontal gene transfers would be required at the gray +'s in fig. 6a), followed by largely vertical evolution to produce the observed distribution in the phylum. Alternatively (and arguably less parsimoniously), one could posit that the ancestor of all cyanobacteria was able to fix nitrogen but that the evolutionary history of nitrogenase has since been dominated by gene loss. This scenario begins with nitrogen fixation in the hypothetical pan-genome and is followed by multiple independent losses, shown as x's in figure 6a.
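The 2 scenarios above can be compared roughly by counting the events each requires on a fixed species tree. The sketch below is a simplification, not the analysis performed here: it scores a presence/absence trait under a gains-only model (each gain, for instance by horizontal transfer, covers a whole subtree and is never lost) and under a single-origin-plus-losses model, and it ignores mixed histories and the gene-level evidence discussed next. The tree format, taxon names, and trait states are hypothetical.

```python
from typing import Dict, List, Union

Tree = Union[str, tuple]  # hypothetical format: nested tuples, leaves are taxon names


def leaves(tree: Tree) -> List[str]:
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def maximal_uniform_clades(tree: Tree, states: Dict[str, bool], target: bool) -> int:
    """Count the largest subtrees in which every leaf has the given trait state."""
    if all(states[leaf] == target for leaf in leaves(tree)):
        return 1
    if isinstance(tree, str):
        return 0
    return sum(maximal_uniform_clades(child, states, target) for child in tree)


def gains_only(tree: Tree, states: Dict[str, bool]) -> int:
    """Scenario 1: trait absent at the root, acquired independently, never lost."""
    return maximal_uniform_clades(tree, states, True)


def root_gain_then_losses(tree: Tree, states: Dict[str, bool]) -> int:
    """Scenario 2: one ancestral origin followed only by independent losses."""
    return 1 + maximal_uniform_clades(tree, states, False)


if __name__ == "__main__":
    tree = (("t1", "t2"), ("t3", ("t4", "t5")))
    fixes_n2 = {"t1": True, "t2": True, "t3": False, "t4": True, "t5": False}
    print(gains_only(tree, fixes_n2), root_gain_then_losses(tree, fixes_n2))  # 2 3
```

Such event counts depend entirely on the assumed species tree, which is why the individual NifD phylogeny is needed to test them.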


Figure 6
FIG. 6.— Change in core- or pan-genome size with increasing evolutionary distance for the cyanobacterial tree in figure 3. The black dots (left axis) show core-genome size versus evolutionary distance for all pairwise combinations of the 26 genomes analyzed. The black line shows a single exponential fit (r² = 0.802). The gray dots (right axis) give the same information for pan-genome size.

 
By examining phylogenies of individual protein families, for example, that of the NifD (nitrogenase catalytic subunit) protein family shown in figure 6b, we can explore whether one of these scenarios is indeed more parsimonious than the other or whether some combination of the 2 is more likely. The NifD tree (fig. 6b) shows some congruence with the cyanobacterial species tree (fig. 6a) but provides an important example of the complex histories of protein families, which species trees often overlook or fail to capture accurately. As well as supporting numerous gene losses, the NifD tree shows evidence for several gene duplications and plausible horizontal gene transfer, as suggested by the position of Trichodesmium erythraeum as the earliest-branching cyanobacterium among NifD proteins (though poor bootstrap support makes it difficult to resolve this branch from the Synechococcus sp. A/B' divergence). At face value, this suggests that a combination of vertical evolution and gene loss accounts for the distribution of nitrogen fixation in cyanobacteria, with evidence for horizontal gene transfer as well as duplication in several lineages.

As this truth-is-in-between example illustrates, the cyanobacterial ancestor would have had a genome content somewhere between the core- and pan-genome extremes, with functions and capabilities that, as demonstrated above, can be understood by examining individual phylogenies. More broadly, the range established by ancestral core- and pan-genomes gives insight into the relative importance of genome reduction versus the evolution or acquisition of new genes, and it helps constrain the appearance of phenotypes specific to individual organisms or clades. This approach is extended to several other pathways of key importance to cyanobacterial evolution, such as carbon fixation and pigment biosynthesis, in Swingley et al. (2007). As shown in figure 7, the size of the core-genome shared by any 2 organisms shows a strong inverse correlation with their phylogenetic distance, whereas pan-genome size shows only a weak correlation. This weak correlation arises mainly from the novel/orphan genes that distinguish even closely related genomes, such as the 2 Synechococcus elongatus strains, which share 2,219 protein families.
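The pairwise comparison can be sketched as follows: shared (core) and combined (pan) family counts are computed for every pair of genomes, and a single exponential is fitted to core size against distance. The fitting procedure is an assumption, since the method behind the reported exponential fit is not described here: this sketch uses ordinary least squares on the logarithm of core size, and its r² refers to that log-scale fit, so it need not match a fit made on the original scale. Distances would come from the species tree; the inputs below are synthetic.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple


def pairwise_core_pan(content: Dict[str, Set[str]]) -> List[Tuple[str, str, int, int]]:
    """(genome 1, genome 2, shared families, union of families) for every pair."""
    return [
        (a, b, len(content[a] & content[b]), len(content[a] | content[b]))
        for a, b in combinations(sorted(content), 2)
    ]


def fit_exponential(distances: List[float], sizes: List[float]) -> Tuple[float, float, float]:
    """Fit size = a * exp(-b * d) by least squares on log(size); return (a, b, log-scale r2)."""
    ys = [math.log(s) for s in sizes]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(distances, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return math.exp(intercept), -slope, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    toy = {"g1": {"a", "b", "c"}, "g2": {"a", "b", "d"}, "g3": {"a", "e"}}
    print(pairwise_core_pan(toy))
    distances = [0.1, 0.2, 0.5, 1.0]                      # synthetic distances
    core = [2000 * math.exp(-1.5 * d) for d in distances]  # synthetic, noise-free
    a, b, r2 = fit_exponential(distances, core)
    print(round(a), round(b, 3), round(r2, 3))  # 2000 1.5 1.0
```

Applying the same fit to the pan-genome column of the pairwise table would show how much weaker its dependence on distance is.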


Figure 7
FIG. 7.— Diagram of the steps involved in going from complete genomes to phylogenetic analysis, as detailed in the Methods.

 
As mentioned above, the major elements of the cyanobacterial species tree find strong support in other analyses, from both systematics and molecular studies. These include the monophyly of the heterocystous diazotrophs, with the nonheterocystous diazotroph Trichodesmium erythraeum as an outgroup (Sanchez-Baracaldo et al. 2005); the sister relationship and monophyly of marine Synechococcus and Prochlorales (with Synechococcus sp. WH5701 branching basally) (Scanlan 2003) and a more deeply branching group of freshwater Synechococcus (PCC6301 and 7942) (Giovannoni et al. 1988; Honda et al. 1999); the cluster of Synechocystis sp. PCC6803 and Crocosphaera watsonii (Sanchez-Baracaldo et al. 2005); and evidence for Gloeobacter violaceus as an early-branching cyanobacterium (Nelissen et al. 1995), though, intriguingly, 2 thermophilic, N₂-fixing Synechococcus strains also branch very deeply (Ferris et al. 1996). Note that this approach, like any other, is subject to the biases of ongoing sequencing projects and therefore lacks several important cyanobacterial taxonomic groups; however, it establishes a framework for incorporating further genomic data and for expanding individual protein families with sequence data from public databases. It also provides a straightforward way to target sequencing toward the organisms that will most improve phylogenetic resolution.

The phylogenies presented here integrate a large amount of genomic data from all completed, as well as a few nearly complete, cyanobacterial genomes. The fact that concatenated and consensus phylogenies from as many as 583 proteins converge on nearly identical topologies that agree with earlier systematic and molecular approaches suggests that this tree represents an accurate, though averaged, history of cyanobacterial speciation. Moreover, phylogenies from individual protein families are retained and can be selected and contrasted based on overall resolution, taxonomic distribution, degree of orthology versus paralogy, or various functional or pathway-associated criteria (e.g., table 2). Though attempting to resolve organismal evolution as a single phylogenetic tree invariably ignores the rich histories of single genes, here we have emphasized how organismal history can be understood at one level by integrating the information present in diverse genes and on additional levels by contrasting that integrated tree with individual phylogenies.

This telescoping approach to phylogenetic reconstruction, which incorporates data from protein sequences at multiple taxonomic levels of conservation, can be used to refine evolutionary trees at different levels of phylogenetic resolution. Furthermore, the inference of robust phylogenies is a primary technique by which horizontal gene transfer can be detected (and then subtracted from consensus data sets). As genome data continue to fill out the branches of the tree of life, this approach will become increasingly useful because it provides a way to incorporate, compare, and contrast entire genomes' worth of sequence data without ignoring information from individual genes or proteins.


    Accession Numbers
Accession numbers for genomes used in this study are given in table 1.


    Supplementary Material
Supplementary figure S1 and tables S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


Figure 1
FIG. 1.— Consensus cyanobacterial phylogeny based on maximum likelihood trees for each of 438 orthologous protein families. The number at each bifurcation indicates the total number of trees in which that exact bifurcation/branching order is observed; for example, all 438 trees place Synechococcus sp. A and B' as closest neighbors and also cleanly distinguish marine Synechococcus and Prochlorococcus from all other cyanobacteria. Note that because this is a consensus tree, its topology is meaningful but its branch lengths are not (in contrast to fig. 3, where branch lengths are meaningful).

 


    Acknowledgements
The authors wish to thank Jeff Touchman and the DNA sequencing team at the Translational Genomics Institute for making available sequence data for Acaryochloris marina. The authors also acknowledge very helpful discussions and suggestions from Carrine Blank and Elbert Branscomb. The A. marina genome project is funded by grant 0412824 from the National Science Foundation Microbial Genome Sequencing Program (http://genomes.tgen.org/). R.B. acknowledges additional support from grant NNG04GK59G from the Exobiology Program at the National Aeronautics and Space Administration. J.R. acknowledges support through a Lawrence Postdoctoral Fellowship at Lawrence Livermore National Laboratory.


    Footnotes
 
Takashi Gojobori, Associate Editor


    References

Beiko RG, Harlow TJ, Ragan MA. Highways of gene sharing in prokaryotes. Proc Natl Acad Sci USA (2005) 102:14332–14337.

Casamatta DA, Johansen JR, Vis ML, Broadwater ST. Molecular and morphological characterization of ten polar and near-polar strains within the Oscillatoriales (cyanobacteria). J Phycol (2005) 41:421–438.

Castenholz RW. Phylum BX. Cyanobacteria. Oxygenic photosynthetic bacteria. In: Boone DR, Castenholz RW, eds. Bergey's manual of systematic bacteriology. Volume 1: the Archaea and deeply branching and phototrophic Bacteria (2001) New York: Springer-Verlag. 413–439.

Charlebois RL, Beiko RG, Ragan MA. Microbial phylogenomics: branching out. Nature (2003) 421:217.

Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P. Toward automatic reconstruction of a highly resolved tree of life. Science (2006) 311:1283–1287.

Delsuc F, Brinkmann H, Philippe H. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet (2005) 6:361–375.

Doolittle RF. Evolutionary aspects of whole-genome biology. Curr Opin Struct Biol (2005) 15:248–253.

Dufresne A, Garczarek L, Partensky F. Accelerated evolution associated with genome reduction in a free-living prokaryote. Genome Biol (2005) 6:R14.

Eisen JA. Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res (1998) 8:163–167.

Eisen JA, Fraser CM. Phylogenomics: intersection of evolution and genomics. Science (2003) 300:1706–1707.

Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res (2002) 30:1575–1584.

Farquhar J, Bao H, Thiemens M. Atmospheric influence of earth's earliest sulfur cycle. Science (2000) 289:756–759.

Felsenstein J. PHYLIP: Phylogeny inference package (Version 3.2). Cladistics (1989) 5:164–166.

Ferris MJ, Ruff-Roberts AL, Kopczynski ED, Bateson MM, Ward DM. Enrichment culture and microscopy conceal diverse thermophilic Synechococcus populations in a single hot spring microbial mat habitat. Appl Environ Microbiol (1996) 62:1045–1050.

Fox GE, Wisotzkey JD, Jurtshuk P Jr. How close is close: 16S rRNA sequence identity may not be sufficient to guarantee species identity. Int J Syst Bacteriol (1992) 42:166–170.

Gadagkar SR, Rosenberg MS, Kumar S. Inferring species phylogenies from multiple genes: concatenated sequence tree versus consensus gene tree. J Exp Zoolog B Mol Dev Evol (2005) 304:64–74.

Giovannoni SJ, Turner S, Olsen GJ, Barns S, Lane DJ, Pace NR. Evolutionary relationships among cyanobacteria and green chloroplasts. J Bacteriol (1988) 170:3584–3592.

Gogarten JP, Townsend JP. Horizontal gene transfer, genome innovation and evolution. Nat Rev Microbiol (2005) 3:679–687.

Harlow TJ, Gogarten JP, Ragan MA. A hybrid clustering approach to recognition of protein families in 114 microbial genomes. BMC Bioinformatics (2004) 5:45.

Henson BJ, Hesselbrock SM, Watson LE, Barnum SR. Molecular phylogeny of the heterocystous cyanobacteria (subsections IV and V) based on nifD. Int J Syst Evol Microbiol (2004) 54:493–497.

Hess WR. Genome analysis of marine photosynthetic microbes and their global role. Curr Opin Biotechnol (2004) 15:191–198.

Hess WR, Rocap G, Ting CS, Larimer F, Stilwagen S, Lamerdin J, Chisholm SW. The photosynthetic apparatus of Prochlorococcus: insights through comparative genomics. Photosynth Res (2001) 70:53–71.

Honda D, Yokota A, Sugiyama J. Detection of seven major evolutionary lineages in cyanobacteria based on the 16S rRNA gene sequence analysis with new sequences of five marine Synechococcus strains. J Mol Evol (1999) 48:723–739.

Huelsenbeck JP, Ronquist F. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics (2001) 17:754–755.

Knoll AH. The geological consequences of evolution. Geobiology (2003) 3–14.

Kopp RE, Kirschvink JL, Hilburn IA, Nash CZ. The paleoproterozoic snowball earth: a climate disaster triggered by the evolution of oxygenic photosynthesis. Proc Natl Acad Sci USA (2005) 102:11131–11136.

Kumar S, Tamura K, Nei M. MEGA: molecular evolutionary genetics analysis software for microcomputers. Comput Appl Biosci (1994) 10:189–191.

Lerat E, Daubin V, Moran NA. From gene trees to organismal phylogeny in prokaryotes: the case of the gamma-Proteobacteria. PLoS Biol (2003) 1:E19.

Martin KA, Siefert JL, Yerrapragada S, Lu Y, McNeill TZ, Moreno PA, Weinstock GM, Widger WR, Fox GE. Cyanobacterial signature genes. Photosynth Res (2003) 75:211–221.

Martin W, Rujan T, Richly E, Hansen A, Cornelsen S, Lins T, Leister D, Stoebe B, Hasegawa M, Penny D. Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc Natl Acad Sci USA (2002) 99:12246–12251.

Minh BQ, Vinh le S, von Haeseler A, Schmidt HA. pIQPNNI: parallel reconstruction of large maximum likelihood phylogenies. Bioinformatics (2005) 21:3794–3796.

Mossel E, Steel M. How much can evolved characters tell us about the tree that generated them? In: Gascuel O, ed. Mathematics of evolution and phylogeny (2006) Oxford: Oxford University Press. 384–412.

Mulkidjanian AY, Koonin EV, Makarova KS, et al. (12 co-authors). The cyanobacterial genome core and the origin of photosynthesis. Proc Natl Acad Sci USA (2006) 103:13126–13131.

Nelissen B, Van de Peer Y, Wilmotte A, De Wachter R. An early origin of plastids within the cyanobacterial divergence is suggested by evolutionary trees based on complete 16S rRNA sequences. Mol Biol Evol (1995) 12:1166–1173.

Reche PA, Reinherz EL. Sequence variability analysis of human class I and class II MHC molecules: functional and structural correlates of amino acid polymorphisms. J Mol Biol (2003) 331:623–641.

Rippka R, Deruelles J, Waterbury JB, Herdman M, Stanier RY. Generic assignments, strain histories and properties of pure cultures of cyanobacteria. J Gen Microbiol (1979) 111:1–61.

Rivera MC, Lake JA. The ring of life provides evidence for a genome fusion origin of eukaryotes. Nature (2004) 431:152–155.

Rocap G, Distel DL, Waterbury JB, Chisholm SW. Resolution of Prochlorococcus and Synechococcus ecotypes by using 16S-23S ribosomal DNA internal transcribed spacer sequences. Appl Environ Microbiol (2002) 68:1180–1191.

Sanchez-Baracaldo P, Hayes PK, Blank CE. Morphological and habitat evolution in the cyanobacteria using a compartmentalization approach. Geobiology (2005) 3:145–165.

Scanlan DJ. Physiological diversity and niche adaptation in marine Synechococcus. Adv Microb Physiol (2003) 47:1–64.

Seo PS, Yokota A. The phylogenetic relationships of cyanobacteria inferred from 16S rRNA, gyrB, rpoC1 and rpoD1 gene sequences. J Gen Appl Microbiol (2003) 49:191–203.

Snel B, Huynen MA, Dutilh BE. Genome trees and the nature of genome evolution. Annu Rev Microbiol (2005) 59:191–209.

Summons RE, Jahnke LL, Hope JM, Logan GA. 2-Methylhopanoids as biomarkers for cyanobacterial oxygenic photosynthesis. Nature (1999) 400:554–557.

Swingley WD, Blankenship RE, Raymond J. Insights into cyanobacterial evolution from comparative genomics. In: Herrero A, Flores E, eds. Genomics and molecular biology of cyanobacteria (2007) Norwich (UK): Horizon Scientific Press. 22–43.

Woese CR. Bacterial evolution. Microbiol Rev (1987) 51:221–271.

Wolf YI, Rogozin IB, Grishin NV, Koonin EV. Genome trees and the tree of life. Trends Genet (2002) 18:472–479.

Zhaxybayeva O, Gogarten JP, Charlebois RL, Doolittle WF, Papke RT. Phylogenetic analyses of cyanobacterial genomes: quantification of horizontal gene transfer events. Genome Res (2006) 16:1099–1108.

Accepted for publication December 26, 2007.



