MBE Advance Access originally published online on April 7, 2008
Molecular Biology and Evolution 2008 25(7):1321-1332; doi:10.1093/molbev/msn080
Research Articles
The Capsid of the T4 Phage Superfamily: The Evolution, Diversity, and Structure of Some of the Most Prevalent Proteins in the Biosphere
Laboratoire de Microbiologie et Génétique Moléculaires, Centre National de la Recherche Scientifique—Université Paul Sabatier-Toulouse III, Toulouse, France
E-mail: krisch{at}ibcg.biotoul.fr.
Abstract
The Escherichia coli bacteriophage T4 has served as a classic system in phage biology for more than 60 years. Only recently have phylogenetic analyses and genomic comparisons demonstrated the existence of a large, diverse, and widespread superfamily of T4-like phages in the environment. We report here on the T4-like major capsid protein (MCP) sequences that were obtained by targeted polymerase chain reaction (PCR) of marine environmental samples. This analysis was then expanded to include thousands of new sequences of T4-like capsid genes from the metagenomic data obtained during the Sorcerer II Global Ocean Sampling (GOS) expedition. This data compilation reveals that the diversity of the major and minor capsid proteins from the GOS metagenome follows the same general patterns as the sequences from cultured phage genomes. Interestingly, the new MCP sequences obtained by targeted PCR of environmental samples are more divergent (deeper branching) than the vast majority of the MCP sequences from other sources. The marine T4-like phage population appears to be largely dominated by T4-like cyanophages. Using ~1,400 T4-like MCP sequences from various sources, we mapped the degree of sequence conservation onto a structural model of the T4-like MCP. The results indicate that the T4 superfamily contains clear phylogenetic groups that differ in which domains of the MCP are more conserved and which are more variable. Such differences can be correlated with variations in capsid morphology, the arrangement of the MCP lattice, and the presence of different capsid accessory proteins among the subgroups of the T4 superfamily.
Key Words: bacteriophage T4; major capsid protein; evolution; structure; diversity; metagenomics
Introduction
Bacteriophage T4 is a remarkably complex nanomachine that parasitizes and proficiently kills Escherichia coli. The T4 virion is assembled from a series of modular components, including a large elongated head (~111 x 78 nm), a tail structure (~113 x 16 nm) that can be triggered to contract, and an intricate baseplate that incorporates the tail-triggering mechanism and also the 6 long tail fibers that specifically bind to receptors on the surface of the host bacteria (reviewed in Leiman et al. 2003).
Apart from T4 itself, a number of other distant T4-like phages have been studied in recent years, including Aeromonas spp. phages (Chow and Rouf 1983; Petrov et al. 2006; Comeau et al. 2007; Gibb and Edgell 2007), vibriophages (Matsuzaki et al. 1998, 1999, 2000; Miller, Heidelberg, et al. 2003), and cyanophages (Hambly et al. 2001; Mann et al. 2005; Sullivan et al. 2005; Weigele et al. 2007). These divergent T4-like phages infect hosts that are evolutionarily distant from the Enterobacteriaceae, the classical hosts of T4 and its closest relatives. They also differ somewhat from the classical T4 morphology: head length ranges from smaller isometric forms (~85 x 85 nm) in the cyanophages (Hambly et al. 2001) to more elongated forms (~140 x 80 nm) in some vibriophages and Aeromonas phages (Miller, Heidelberg, et al. 2003; Comeau et al. 2007), which contain larger genomes (>230 kb). Tail length varies as well, with the cyanophages having tails of up to ~180 nm (Hambly et al. 2001; Weigele et al. 2007). One of the criteria used to choose T4-like candidate phages for full genome sequencing was to ensure that such morphological variants were represented in the T4-like phage database and included in a full comparative analysis of the T4 superfamily (Nolan et al. 2006; Petrov et al. 2006; Comeau et al. 2007). Extensive phylogenetic analyses of these genomic sequences (Tétart et al. 2001; Desplats and Krisch 2003; Filée et al. 2006) have confirmed the existence of a large and extremely divergent superfamily of T4-like phages, a situation very similar to that now emerging for the unrelated T7 podovirus supergroup (Rohwer et al. 2000; Chen and Lu 2002; Hardies et al. 2003; Scholl et al. 2004). Until quite recently, the T4 superfamily was believed to contain only 4 subgroups: the "true" T-evens, which are coliphages very closely related to T4; the morphologically similar Pseudo T-evens, which are nonetheless phylogenetically diverged from T4 and infect a broader range of hosts; the Schizo T-evens, which are yet more divergent, morphologically distinguishable from T4, and infect Aeromonas and Vibrio spp.; and finally the Exo T-evens, which are extremely distant from T4, morphologically distinct, and infect cyanobacteria and thermophilic eubacteria (Desplats and Krisch 2003). However, this "simple" scenario had to be expanded as yet more T4-like phages were found in the marine environment (Filée et al. 2005). It became clear that the subgroups of T4 phages previously thought to be narrowly restricted in their environmental range actually have a much more widespread distribution. In addition, the examination of these marine sequences related to the T4 gp23 major capsid protein (MCP) has led to the identification of a much larger set of as yet uncharacterized subgroups of the T4 superfamily.
The environmental MCP sequencing project reported here targeted the most divergent T4-like phages. Simultaneously with this effort, a massive amount of untargeted metagenomic data became available from the Sorcerer II Global Ocean Sampling (GOS) expedition (Rusch et al. 2007). These metagenomic data gave us a unique opportunity to look at the diversity and evolution of an enormous, unselected compilation of MCP sequences. In addition, we could compare these metagenomic data with sequences obtained from our large-scale cultured phage genome sequencing project (Nolan et al. 2006; Petrov et al. 2006; Comeau et al. 2007) and with the >100 sequences collected by our earlier targeted analysis of MCP sequences in diverse ocean samples (Filée et al. 2005). The resulting comparative analysis of more than a thousand homologs of the T4 MCP has given us a more complete picture of the vast diversity within the T4 superfamily.
Materials and Methods
Aquatic Samples
Thirteen samples of the concentrated viral size fraction (Suttle et al. 1991
Environmental Polymerase Chain Reaction and Sequencing
Degenerate polymerase chain reaction (PCR) primers ScExoT-F (5'-CWC GTC AAY TGA AAG CTC AA-3'; positions 788–807 in T4 g23, NC_000866) and ScExoT-R (5'-AWT TKM AYA CCG TAR CGA GT-3'; positions 1423–1442) were designed on the basis of alignments of cultured T4 superfamily phages to preferentially target Schizo T-even and noncyanophage Exo T-even phages. In control PCR reactions, the resulting primers amplified Schizo T-even phages but neither T-even/Pseudo T-even phages nor cyanophages. PCR amplification, DNA purification, cloning, and sequencing of the amplified environmental sequences were carried out as described previously (Filée et al. 2005), except that the PCR cycling conditions were modified as follows: initial denaturation at 94 °C for 1 min; 35 cycles of denaturation at 94 °C for 30 s, annealing at 45 °C for 1 min, and extension at 72 °C for 1 min; and a final extension at 72 °C for 9 min. GenBank accession numbers for the novel sequences presented in this paper are EU236767–EU236786.
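The primer sequences above use IUPAC ambiguity codes, so each primer is really a mixture of oligonucleotides. As a hedged illustration (the helper names are hypothetical and not part of any tool used in this study), the Python sketch below counts and enumerates the oligos encoded by ScExoT-F and ScExoT-R, and computes the length of the product the pair would span on the T4 g23 template from the quoted coordinates, assuming both primer sites are included in the product.

```python
from itertools import product

# IUPAC nucleotide codes that can appear in degenerate primers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def clean(primer: str) -> str:
    """Remove the spaces used for readability in the published primer."""
    return primer.replace(" ", "").upper()

def degeneracy(primer: str) -> int:
    """Number of distinct oligos represented by a degenerate primer."""
    n = 1
    for base in clean(primer):
        n *= len(IUPAC[base])
    return n

def expand(primer: str) -> list[str]:
    """Enumerate every concrete oligo encoded by the degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in clean(primer)))]

SCEXOT_F = "CWC GTC AAY TGA AAG CTC AA"  # T4 g23 positions 788-807
SCEXOT_R = "AWT TKM AYA CCG TAR CGA GT"  # T4 g23 positions 1423-1442 (5'->3')

# Product length on the T4 template, counting both primer sites (assumption).
amplicon_t4 = 1442 - 788 + 1

print(degeneracy(SCEXOT_F), degeneracy(SCEXOT_R), amplicon_t4)  # 4 32 655
```

The degeneracy values are simply products of the number of bases allowed at each ambiguous position (W and Y in the forward primer; W, K, M, Y, and R in the reverse primer). For comparison, the environmental clones reported in Results range from 643 to 694 bp.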
Comparative Genomics and Phylogenetic Analyses
Environmental sequences were compared with complete (http://www.ncbi.nlm.nih.gov/genomes/static/phg.html) and draft (http://phage.bioc.tulane.edu) T4 superfamily genomes. The GOS data set (Rusch et al. 2007) was queried using the built-in Blast function of the CAMERA database (Seshadri et al. 2007). Sequence manipulations were carried out in BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html), and alignments of small data sets were conducted using ClustalW (Thompson et al. 1994). Large data sets could not be handled by traditional alignment programs and were aligned with MUSCLE (Edgar 2004) using a gap opening penalty of –3 and a gap extension penalty of –0.275. The degree of conservation in alignments (amino acid identity) was calculated using ProtSkin (http://www.mcgnmr.ca/ProtSkin). Protein secondary structure was predicted using NNPREDICT (Kneller et al. 1990) through the STRAP interface (Gille and Frommel 2001). Given the size of the data set, construction of the MCP phylogenetic tree was limited to protein distance and Neighbor-Joining methods, all carried out using the PHYLIP v3.66 package (http://evolution.genetics.washington.edu/phylip.html). The Jones-Taylor-Thornton (JTT) model was used for protein distance calculations. A nearly full-length alignment was used for MCP tree construction, given the availability of complete or near full-length sequences (for both cultured phages and the metagenome) and the sheer depth of coverage of overlapping fragments in the GOS data. However, 8 common gaps at final alignment positions 1–25 (N-terminus), 156–166, 185–212, 369–380, 390–394, 476–480, 511–520, and 624–651 (C-terminus) were removed before construction of the MCP tree (527-residue final length). During tree construction, sequences from 4 cultured phages (MV9a/b, MV12, and MV13) as well as 2 environmental sequences (CS43 and GOS-1131) had to be removed from the analysis because they produced unusually long branches, indicating possible PCR chimeras, which were verified using Bellerophon (Huber et al. 2004). The MCP tree was rooted using the RM378 sequence, not only for presentational reasons but also because this sequence was the most divergent characterized MCP sequence currently available.
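The tree itself was built with PHYLIP (JTT distances followed by Neighbor-Joining). As a hedged illustration only, the Python sketch below strips the 8 listed gap ranges from an alignment, checks that a 651-column alignment shrinks to the quoted 527 columns (the ranges remove 124 columns; the 651-column width is inferred from the last trimmed position), and runs a textbook Saitou–Nei Neighbor-Joining procedure on a small distance matrix. Uncorrected p-distances stand in for JTT distances here, and the toy taxa are hypothetical.

```python
# Sketch only: PHYLIP (protdist with JTT, then neighbor) was the real tool.
# Assumptions: 1-based inclusive column ranges; p-distances instead of JTT.

GAP_RANGES = [(1, 25), (156, 166), (185, 212), (369, 380),
              (390, 394), (476, 480), (511, 520), (624, 651)]

def trim_columns(aln, ranges):
    """Drop the listed alignment columns (1-based, inclusive) from every row."""
    drop = {c - 1 for lo, hi in ranges for c in range(lo, hi + 1)}
    return {name: "".join(ch for i, ch in enumerate(seq) if i not in drop)
            for name, seq in aln.items()}

def p_distance(a, b):
    """Share of differing sites among columns gap-free in both sequences."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)

def neighbor_joining(names, dist):
    """Saitou-Nei Neighbor-Joining; dist[i][j] symmetric, zero diagonal.

    Returns an unrooted tree as a Newick string with branch lengths.
    """
    nodes = list(names)
    d = {(i, j): dist[i][j] for i in nodes for j in nodes}
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(d[i, k] for k in nodes) for i in nodes}
        # Each unordered pair once (a < b); join the one minimizing
        # Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(((a, b) for a in nodes for b in nodes if a < b),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        li = d[i, j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i, j] - li
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[u, k] = d[k, u] = (d[i, k] + d[j, k] - d[i, j]) / 2
        d[u, u] = 0.0
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({a},{b}:{d[a, b]:.3f});"

# A 651-column alignment (width inferred from the last trimmed position).
trimmed = trim_columns({"demo": "A" * 651}, GAP_RANGES)
print(len(trimmed["demo"]))  # 527

# Toy additive distances taken from the tree ((A:2,B:3):1,(C:4,D:1)).
pairs = {("A", "B"): 5, ("A", "C"): 7, ("A", "D"): 4,
         ("B", "C"): 8, ("B", "D"): 5, ("C", "D"): 5}
taxa = ["A", "B", "C", "D"]
dist = {i: {j: 0 if i == j else pairs.get((i, j), pairs.get((j, i)))
            for j in taxa} for i in taxa}
print(neighbor_joining(taxa, dist))
# ((A:2.000,B:3.000),(C:4.000,D:1.000):1.000);
```

Because the toy distances are exactly additive, Neighbor-Joining recovers the generating tree and its branch lengths; real JTT distances for ~1,400 MCPs will not be additive, which is one reason long-branch artifacts such as the suspected chimeras had to be removed.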
MCP Homology Modeling
The homology model (Dunbrack 2006; Ginalski 2006) of the core of RB49 gp23 was constructed using the SWISS-MODEL server (Arnold et al. 2006) and the DeepView protein modeling/assessment program (Schwede et al. 2003), with the solved T4 gp24 structure (Protein Data Bank: 1YUE) (Fokine et al. 2005) as a template. Portions of the protein not present in the template were modeled using HMMSTR (Bystroff and Shao 2002) (extra N-terminal portion, A65–T84) and ModLoop (Fiser and Sali 2003) (novel loops L135–A151, E160–F171, N200–S213, G235–S250, T332–K336, R374–G381, and K418–A422). A portion of the RB49 sequence (loop L156–I180) was excluded from model building because of its very poor conservation among the ~1,400 gp23 sequences. Model quality was assessed within DeepView and with Verify3D (Eisenberg et al. 1997) and VADAR (Willard et al. 2003). The final model had 93.1% of residues within favorable Ramachandran areas (ignoring glycines), compared with 96.9% for the solved 1YUE structure, and had a final root-mean-square (RMS) distance to 1YUE of 2.00 Å across 353 corresponding residues. The final RMS distance between our model and a previous model (Protein Data Bank: 1Z1U) proposed by Fokine et al. (2005) was 3.55 Å across 304 corresponding residues (the "Insertion" domain was excluded because of slightly different predicted positions). Protein alignment conservation was mapped onto the model structure using ProtSkin (http://www.mcgnmr.ca/ProtSkin) and visualized in DeepView. Conservation groups were delimited from the phylogenetic tree as follows: the "Near T4" group (46 sequences) comprised all the cultured phages except the Exo T-evens, plus 13 environmental clones from Filée et al. (2005); the "Far T4" group (277 sequences) comprised RM378, 19 of the 20 new clones from this study, clone 37510 from Filée et al. (2005), and the 256 RM378-like GOS hits obtained using either RM378 or our clone ScExo373-21 as queries; finally, the "Cyano T4" group (1,077 sequences) comprised all cultured cyanophages, one of our clones (ScExo420-3), 70 clones from Filée et al. (2005), and the top 1,000 GOS hits against RB49 gp23.
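ProtSkin maps alignment conservation onto a structure; the sketch below is not ProtSkin's code but a minimal Python illustration of the same idea, under stated assumptions. Conservation is taken as the share of sequences carrying the most common residue in a column (gaps count against it), the first alignment row is the modelled protein, and the model's residues are numbered from 1 in the same order. That numbering is a simplification: a real model such as the RB49 one, whose modelled portion starts at A65, would need an offset. Scores are written into the B-factor (tempFactor) field, columns 61–66 of fixed-width PDB ATOM records, which structure viewers can use for colouring.

```python
from collections import Counter

def column_identity(aln):
    """Per column: share of all rows carrying the most common residue."""
    n = len(aln)
    scores = []
    for col in zip(*aln):
        counts = Counter(c for c in col if c != "-")  # gaps never count as a residue
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / n)
    return scores

def residue_scores(aln):
    """Map column scores onto residue numbers of the first (model) sequence.

    Assumes residue 1 of the model is the first non-gap residue of aln[0].
    """
    scores, resnum = {}, 0
    for ch, s in zip(aln[0], column_identity(aln)):
        if ch != "-":
            resnum += 1
            scores[resnum] = s
    return scores

def set_bfactors(pdb_lines, scores):
    """Overwrite tempFactor (columns 61-66) of ATOM records with 100 * score."""
    out = []
    for line in pdb_lines:
        if line.startswith("ATOM") and len(line) >= 66:
            resseq = int(line[22:26])  # resSeq occupies columns 23-26
            b = 100 * scores.get(resseq, 0.0)
            line = line[:60] + f"{b:6.2f}" + line[66:]
        out.append(line)
    return out

# Hypothetical three-sequence alignment; row 0 is the "model" sequence.
toy = ["MKV-A", "MKIQA", "MRV-A"]
print(residue_scores(toy))  # residue 4 (A) is fully conserved; Q column skipped
```

Columns where the model sequence has a gap (here the Q insertion) have no residue to carry a score, which is the same reason poorly aligned segments such as the excluded RB49 loop cannot be mapped.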
Results and Discussion
Capsid Genes in Cultured T4-Like Phage Genomes
The mature T4 capsid shell (or lattice) is composed essentially of only 4 proteins (fig. 1A): gp23* ("*" indicates the processed, mature form of the gene 23 protein), the MCP; gp24*, the vertex protein; highly immunogenic outer capsid (Hoc), an accessory protein that protrudes from the surface of the capsid; and small outer capsid (Soc), a small protein (80 aa) that binds to the junctions between gp23* hexamers, creating a protein "grid" on the surface of the capsid (reviewed in Leiman et al. 2003).
Only the MCP, which represents 55% of the mass of the capsid (Leiman et al. 2003
A survey (fig. 1B) of the completely sequenced T4 superfamily genomes reveals that only 6 of them (T4, T6, RB14, RB32, RB69, and JS98) have homologs of all 5 of the T4 capsid shell proteins. All these coliphages are very closely related to T4. Soc appears to be the shell protein least often present in T4-like phages. However, the small size of the Soc protein (~80 aa) makes it difficult to identify distant homologs. There are 10 Soc homologs in the databases, and among these, only the trio of RB14, RB69, and JS98 sequences has modestly diverged from T4 (at both termini). Like Soc, Hoc is a facultative protein that is frequently missing in T4-like phage genomes (fig. 1B), but it is retained in slightly more phages (12) than Soc. With the significant exception of the Exo T-even subgroup, the gp24 capsid vertex protein is found in all T4 superfamily phages for which we have sufficiently complete genomic sequence data to make an evaluation, including the recently completed JS98 (Zuber et al. 2007), which shows a duplication of its g24. The Exo T-even subgroup includes 4 cyanophage genomes (P-SSM2, P-SSM4, S-PM2, and Syn9) (Mann et al. 2005; Sullivan et al. 2005; Weigele et al. 2007) and the genome of phage RM378, which infects Rhodothermus marinus, a thermophilic eubacterium (Hjorleifsdottir et al. 2002; Blondal et al. 2003). These cyanophages and RM378 have isometric icosahedral capsids, unlike the prolate icosahedral form of the other T4-like phages. This important morphological difference may reflect the lattice composition of these phages, which apparently have only gp23* and gp20 homologs. Possibly because of a slight structural alteration in their gp23* subunits, they may have obviated the requirement for a distinct vertex protein (see above). Alternatively, these distant isometric T4-like phages may use a vertex protein analog instead of a gp24 homolog, as has been suggested by the cryoEM reconstruction and genomic analysis of Syn9 (Weigele et al. 2007).
All the T4 superfamily phages have both the MCP and the gp20 portal protein; the latter is involved in DNA packaging, also serves as the initiator complex for prohead formation, and connects the capsid to the tail structure (Leiman et al. 2003). Sequences in both of these highly conserved genes have been used for the PCR amplification of T4-like sequences in the environment (Zhong et al. 2002; Dorigo et al. 2004; Filée et al. 2005; Jia et al. 2007). The size of the MCPs from the cultured phages for which we have full-length sequences (27; supplementary fig. S1, Supplementary Material online) varies somewhat (from 412 to 560 aa; a 36% difference), but overall the protein is quite well conserved, with ~74% of the protein alignment showing at least 50% similarity. The average identity among the 27 cultured phages over the full length of the protein is ~53%, and there is only one major variable segment, located approximately between residues 210 and 290 (supplementary fig. S1, Supplementary Material online). This variable region is flanked by the primers we normally used to amplify environmental g23 sequences, which accounts for the variable-size PCR amplicons observed previously (Filée et al. 2005).
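The ~53% figure is an average over pairs of cultured-phage MCPs, and the 36% size difference is relative to the shortest protein, (560 - 412) / 412. The Python sketch below shows one plausible way to compute both, assuming identity is scored only over columns where neither sequence has a gap; the exact convention behind the quoted values is not stated, and the aligned fragments are toy data, not real gp23 sequences.

```python
from itertools import combinations

def pairwise_identity(a, b):
    """Identity over aligned columns where neither sequence has a gap."""
    shared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x == y for x, y in shared) / len(shared)

def mean_pairwise_identity(aln):
    """Average identity over all unordered pairs of aligned sequences."""
    pairs = list(combinations(aln, 2))
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)

def length_spread(shortest, longest):
    """Size difference expressed as a percentage of the shortest protein."""
    return 100 * (longest - shortest) / shortest

toy = ["MKVLA", "MKILA", "MRVLG"]  # hypothetical aligned fragments
print(round(mean_pairwise_identity(toy), 3))  # 0.6
print(round(length_spread(412, 560)))         # 36
```

With 27 sequences this averages over 351 pairs; a similarity-based score (as used for the ~74% figure) would additionally need a substitution-group table, which this sketch deliberately omits.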
Marine Diversity of T4-Like Capsid Genes
In an attempt to "fill in" the T4 superfamily phylogenetic tree, we designed a new set of degenerate MCP primers (ScExoT-F and ScExoT-R) with enhanced specificity for the Schizo T-even and noncyanophage Exo T-even subgroups. The tree had an unexplained paucity of culture-independent sequence representatives of these subgroups (Filée et al. 2005), which include phages such as RM378 of the thermophilic eubacterium R. marinus (Hjorleifsdottir et al. 2002). This new primer set was used in PCR reactions with, as template, the 10 marine samples of concentrated viral size fraction from our previous study (Filée et al. 2005) as well as 3 new samples from other marine habitats. Because some cultured Schizo T-even phage hosts (Aeromonas spp.) have been isolated from freshwater (Ackermann and Krisch 1997), we also attempted PCR amplification on 5 samples from 2 British Columbia lakes (Cultus and Chilliwack) and a canal in southeastern France (Canal du Midi). Of these 18 samples, only 3 were positive for PCR amplification (all marine; data not shown), and one of these showed a single band of the incorrect size that was later determined to be spurious. The PCR-amplified fragments from the remaining 2 positive samples (#373, Salmon Arm; #420, Jericho Pier) were cloned and analyzed. These gave 20 unique g23-like sequences ranging in size from 643 to 694 bp. Surprisingly, almost all of these sequences (19/20) were most closely related to phage RM378 (51–65% protein similarity), the exception being related to the cyanophage P-SSM2 (80% protein similarity). The 19 unique RM378-like nucleotide sequences were 91% identical among themselves, indicating a surprisingly low level of diversity in the 2 samples. Although we cannot exclude amplification or sampling artifacts, the small proportion of positive samples, and the absence of Schizo T-even sequences among them, raise the possibility that Schizo T-even phages are much rarer in nature than the other T4 superfamily subgroups. Alternatively, the few cultured Schizo T-even sequences used to design the primers may simply not be representative of the subgroup as a whole. This may well be the case, given that the hosts of these phages (Aeromonas and Vibrio spp.) are often extremely abundant in the culturable fraction of the bacterioplankton yet represent only a minor fraction of the total, culture-independent bacterial biomass (Thompson et al. 2004).
Having completed this focused environmental sequencing, we combined these additional data with the recently released Sorcerer II GOS expedition metagenomic data (Rusch et al. 2007). This massive amount of culture-independent sequence gave us a unique opportunity to go beyond a relatively modest targeted effort to fill in the denuded branches of our MCP phylogenetic tree, by permitting us to analyze the sequence diversity of literally thousands of gp23 homologs from the global marine sampling. An additional incentive to make this compilation came from the astonishing fact that 5 of the 6 most overrepresented protein families in the GOS metagenome were either T4 structural proteins or enzymes (with gp23 actually ranked first) (Yooseph et al. 2007). Simple Blast searches of the GOS metagenome using various T4 capsid proteins as query sequences (table 1) confirmed the omnipresence of the marine members of the T4 superfamily. Gp20 and gp23 homologs are by far the most abundant sequences in the database, with over 3,000 hits each (E < 10^-4), and are fairly evenly distributed among ~45 sampling sites. The gp24 vertex protein is the next most abundant capsid protein, with ~900 homologs, but with generally less significant (higher) E values than those obtained for the first 2 proteins. This presumably reflects a lower level of gp24 sequence conservation and the possible replacement of the vertex protein function in many phages (e.g., Exo T-evens) by analogs. The Hoc protein sequence had only ~110 database hits, and the Soc protein had no hits in the GOS metagenome, even with a very permissive cutoff (E < 10^-2). As mentioned previously, the small size of Soc may make it difficult to identify homologs; alternatively, its function may be supplied in distant phages by analogs. In summary, the abundance of the capsid shell proteins in the GOS metagenome follows the same pattern of gene preservation as in the cultured phage genomes: Soc is the least represented in the metagenome, followed by Hoc, then gp24, and finally gp23 and gp20, the most frequent sequences, which have nearly 4 times more hits in the metagenome than any of the others. A comparable situation occurs for the proteins located inside the T4 phage capsid, which either form part of the scaffold during prohead assembly or, once injected with the phage DNA during infection, serve functions in host takeover/DNA defense (Leiman et al. 2003; Comeau and Krisch 2005; Depping et al. 2005). The gp21 prohead protease, responsible for maturing many of the T4 capsid proteins as outlined above, and the essential gp22 scaffold protein (Black et al. 1994) both have metagenomic profiles strikingly different from those of the other internal capsid proteins. These 2 proteins are well conserved within the T4 superfamily, which correlates with their abundance (~1,400–1,800 hits) in the GOS metagenome, whereas many of the other internal capsid proteins have no hits in the GOS metagenome (table 1), probably because of their moderate to low conservation even within the known members of the T4 superfamily.
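These searches were run with CAMERA's built-in Blast, so the following is only an offline stand-in, sketched under assumptions: hits are in BLAST+ tabular format (-outfmt 6 with its 12 default columns, E-value in the 11th), and a "hit" is taken to mean a distinct subject read, so several HSPs on one read count once. It shows how the two cutoffs used in the text (E < 10^-4 and the permissive E < 10^-2) change per-query counts; the read IDs and values are hypothetical.

```python
from collections import defaultdict

EVALUE_COL = 10  # 0-based index of the E-value in the default -outfmt 6 layout

def count_hits(lines, max_evalue=1e-4):
    """Count distinct subject sequences per query with E-value < max_evalue."""
    subjects = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers (e.g. -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        if float(fields[EVALUE_COL]) < max_evalue:
            subjects[fields[0]].add(fields[1])
    return {query: len(hits) for query, hits in subjects.items()}

# Hypothetical hits: two HSPs on the same read must count only once.
demo = [
    "gp23\tread_001\t45.2\t300\t160\t3\t1\t300\t1\t300\t1e-40\t250",
    "gp23\tread_001\t38.0\t80\t50\t1\t320\t400\t310\t390\t2e-05\t60",
    "gp23\tread_002\t30.1\t120\t80\t2\t10\t130\t5\t125\t0.005\t42",
    "gp24\tread_003\t28.4\t150\t100\t4\t20\t170\t1\t150\t3e-06\t48",
]
print(count_hits(demo))                   # {'gp23': 1, 'gp24': 1}
print(count_hits(demo, max_evalue=1e-2))  # {'gp23': 2, 'gp24': 1}
```

Counting distinct reads rather than alignment lines avoids inflating totals for long, well-conserved queries such as gp23 and gp20, whose homologs often produce several HSPs per read.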
Diversity of the MCP (gp23)
The MCP is the foundation both of the T4-like phage's capsid structure and of the operant phylogeny of the T4 superfamily (Desplats and Krisch 2003
150–300 in the current alignment) was fully confirmed and, in addition, there is some sequence plasticity at both extremities of the protein. It is also evident from visual inspection that there are clear-cut sequence subgroups. First, there is separation of the cultured phage sequences (C) from nearly all the others, with the exception of a few of the Filée et al. (2005)
225) in the cultured phages being somewhat longer than T4-type phages isolated in the wild. There is also a striking cohesion among the new environmental sequences reported in this study (ScExo) with the 256 GOS hits (GOS lines immediately above ScExo) resulting from using either phage RM378 or the sequence ScExo373-21 as queries. Finally, there is also strong cohesion among the top 1,000 GOS sequences resulting from using RB49 gp23 as the query (GOS).
These clear general trends were reinforced in quantitative detail by a Neighbor-Joining phylogeny of all ~1,400 gp23 sequences (fig. 3). At the "top" of the tree, all cultured, non-Exo T-even (cyanophages, RM378,
SMB14) phages are well grouped together, along with about half of the Filée et al. (2005)
Structure and Evolution of the MCP (gp23)
Our further analysis of the MCPs in the T4 superfamily involved an investigation of the structure of the gp23 protein. A structural homology model (Dunbrack 2006
~20/33% identity/similarity vs. ~17/31% for T4 gp23) to T4 gp24, which served as the template structure because it is currently the only member of the gp23/24 family whose structure is known (Fokine et al. 2005
~11% and ~13% identity, respectively), given that structural homologies appear to be conserved much longer than the primary sequence. In fact, Fokine et al. (2005)
Our MCP model could then be used to map the location of sequence divergence in the 3-dimensional structure. Such "sequence divergence mapping" permits the easy visualization of the conserved, and hence the potentially most important, motifs within a protein structure. Similarly, such a map highlights the most variable segments within the structure, presumably those that are responsible for variations in capsid morphology and other more subtle kinds of subgroup divergence. Although the N-terminal domain is not well conserved (fig. 4B), the central core portion of the gp23 protein is. The Insertion domain, which corresponds to the major variable region previously alluded to in the MCP alignment (fig. 2 and supplementary fig. S1, Supplementary Material online), is the least conserved domain. When the gp23 structural model is assembled into a hexamer (the functional unit of the capsid lattice) (Leiman et al. 2003
2-3) is somewhat uncertain. The Near T4 group (primarily cultured phages) has good overall conservation of sequence, strikingly so in the "outer ring" (formed by multiple β-strands); the only exception is the poorly conserved β14 strand at the C-terminus. The Cyano T4 group has an outer ring less conserved than that of the Near T4s, yet has a well-conserved central core (primarily helices 7–8). Finally, the Far T4 group shows the least conserved outer ring and is also the least conserved overall (more blue throughout). Mutant analyses of the T4 MCP indicate that the outer ring residues are responsible, in part, for the morphology of the head, whereas the central core contains residues that allow gp23 to also function as the vertex protein (Black et al. 1994
Conclusion and Perspectives
Our analysis shows that the diversity of T4 superfamily capsid proteins from the GOS environmental metagenome follows the same general patterns as that of their homologs from cultured phage genomes. Interestingly, the novel environmental gp23 MCP sequences obtained in our targeted sequencing are quite divergent (deeper branching) from most other known sequences, including the overwhelming majority of the GOS metagenome, which is dominated by cyanophage-like T4 phages. These results suggest that the diversity of the T4 superfamily has not yet been fully delimited and that additional, more distant T4-like phages remain to be discovered in the environment. Directly related to this issue, we are using this comprehensive gp23 sequence analysis to iteratively design a new set of T4 superfamily PCR primers that should be even more inclusive in their capacity to amplify distant T4-like phage sequences from the environment.
A homology model of the MCP, combined with the mapping of protein diversity from ~1,400 environmental and cultured sequences, shows that T4 superfamily phylogenetic groups differ clearly in the conserved and variable structural areas of their capsid proteins. These differences can be correlated with the diversity of capsid morphology, capsid lattice arrangements, and accessory protein content in this phage family. The conclusions of this structure/diversity analysis would be made much stronger by the availability of MCP structures from several different branches of the T4 superfamily. Given the central role that the MCP plays in coordinating interactions between the different essential and accessory components in the arrangement of the capsid lattice, a panel of several representative structures could lead to a major advance in our understanding of how virion nanomachines are constructed and have evolved. From this point of view, X-ray crystallographic structures of the MCPs of various T4 phage morphotypes, such as the isometric forms of cyanophage S-PM2 and the deep-branching Exo T-even RM378, could be both fascinating and highly informative. Such data would put us in a better position to understand the structural and evolutionary features of the T4-like MCP that have allowed it to become one of the most abundant protein species in the biosphere.
Supplementary Material
Supplementary figure S1 and data set S1, as well as color versions of figures 1 and 3, are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgements
This work was supported by the CNRS. We thank our colleagues Curtis Suttle (University of British Columbia, Vancouver) for the marine samples and Christine Arbiol (CNRS IFR109) for DNA sequencing/figure consultation. We also thank Viknesh Sivanathan for his advice on the STRAP interface and Shibu Yooseph for his help with the CAMERA database and GOS data set. In agreement with the Convention on Biological Diversity (http://www.cbd.int), the genetic information from the CAMERA database analyzed in this publication may be considered the genetic patrimony of the countries from which the samples were procured. A.M.C. gratefully acknowledges the support of the Les Treilles Foundation and H.M.K. that of the Kribu Foundation.
Footnotes
Hervé Philippe, Associate Editor
References
Ackermann HW, Krisch HM. A catalogue of T4-type bacteriophages. Arch Virol (1997) 142:2329–2345.[CrossRef][Web of Science][Medline]
Arnold K, Bordoli L, Kopp J, Schwede T. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics (2006) 22:195–201.
Black LW, Showe MK, Steven AC. Morphogenesis of the T4 head. In: Molecular biology of bacteriophage T4—Karam JD, ed. (1994) Washington (DC): American Society for Microbiology Press. 182–185.
Blondal T, Hjorleifsdottir SH, Fridjonsson OF, Ævarsson A, Skirnisdottir S, Hermannsdottir AG, Hreggvidsson GO, Smith AV, Kristjansson JK. Discovery and characterization of a thermostable bacteriophage RNA ligase homologous to T4 RNA ligase 1. Nucleic Acids Res (2003) 31:7247–7254.
Bystroff C, Shao Y. Fully automated ab initio protein structure prediction using I-SITES, HMMSTR and ROSETTA. Bioinformatics (2002) 18:S54–S61.[Abstract]
Chen F, Lu JR. Genomic sequence and evolution of marine cyanophage P60: a new insight on lytic and lysogenic phages. Appl Environ Microbiol (2002) 68:2589–2594.
Chow MS, Rouf MA. Isolation and partial characterization of 2 Aeromonas hydrophila bacteriophages. Appl Environ Microbiol (1983) 45:1670–1676.
Comeau AM, Bertrand C, Letarov A, Tétart F, Krisch HM. Modular architecture of the T4 phage superfamily: a conserved core genome and a plastic periphery. Virology (2007) 362:384–396.[CrossRef][Web of Science][Medline]
Comeau AM, Chan AM, Suttle CA. Genetic richness of vibriophages isolated in a coastal environment. Environ Microbiol (2006) 8:1164–1176.[CrossRef][Medline]
Comeau AM, Krisch HM. War is peace—dispatches from the bacterial and phage killing fields. Curr Opin Microbiol (2005) 8:488–494.[CrossRef][Web of Science][Medline]
Dabrowska K, Switala-Jelen K, Opolski A, Gorski A. Possible association between phages, Hoc protein, and the immune system. Arch Virol (2006) 151:209–215.[CrossRef][Web of Science][Medline]
Depping R, Lohaus C, Meyer HE, Ruger W. The mono-ADP-ribosyltransferases Alt and ModB of bacteriophage T4: target proteins identified. Biochem Biophys Res Commun (2005) 335:1217–1223.[Web of Science][Medline]
Desplats C, Krisch HM. The diversity and evolution of the T4-type bacteriophages. Res Microbiol (2003) 154:259–267.[Medline]
Dorigo U, Jacquet S, Humbert JF. Cyanophage diversity, inferred from g20 gene analyses, in the largest natural lake in France, Lake Bourget. Appl Environ Microbiol (2004) 70:1017–1022.
Dunbrack RL. Sequence comparison and protein structure prediction. Curr Opin Struct Biol (2006) 16:374–384.[CrossRef][Web of Science][Medline]
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res (2004) 32:1792–1797.
Eisenberg D, Luthy R, Bowie JU. VERIFY3D: assessment of protein models with three-dimensional profiles. In: Methods in enzymology. Vol. 277—Carter CW Jr, Sweet RM, eds. (1997) New York: Academic Press. 396–404.
Filée J, Bapteste E, Susko E, Krisch HM. A selective barrier to horizontal gene transfer in the T4-type bacteriophages that has preserved a core genome with the viral replication and structural genes. Mol Biol Evol (2006) 23:1688–1696.
Filée J, Tétart F, Suttle CA, Krisch HM. Marine T4-type bacteriophages, a ubiquitous component of the dark matter of the biosphere. Proc Natl Acad Sci USA (2005) 102:12471–12476.
Fiser A, Sali A. ModLoop: automated modeling of loops in protein structures. Bioinformatics (2003) 19:2500–2501.
Fokine A, Battisti AJ, Kostyuchenko VA, Black LW, Rossmann MG. Cryo-EM structure of a bacteriophage T4 gp24 bypass mutant: the evolution of pentameric vertex proteins in icosahedral viruses. J Struct Biol (2006) 154:255–259.
Fokine A, Chipman PR, Leiman PG, Mesyanzhinov VV, Rao VB, Rossmann MG. Molecular architecture of the prolate head of bacteriophage T4. Proc Natl Acad Sci USA (2004) 101:6003–6008.
Fokine A, Leiman PG, Shneider MM, Ahvazi B, Boeshans KM, Steven AC, Black LW, Mesyanzhinov VV, Rossmann MG. Structural and functional similarities between the capsid proteins of bacteriophages T4 and HK97 point to a common ancestry. Proc Natl Acad Sci USA (2005) 102:7163–7168.
Gibb EA, Edgell DR. Multiple controls regulate the expression of mobE, an HNH homing endonuclease gene embedded within a ribonucleotide reductase gene of phage Aeh1. J Bacteriol (2007) 189:4648–4661.
Gille C, Frommel C. STRAP: editor for structural alignments of proteins. Bioinformatics (2001) 17:377–378.
Ginalski K. Comparative modeling for protein structure prediction. Curr Opin Struct Biol (2006) 16:172–177.
Hambly E, Tétart F, Desplats C, Wilson WH, Krisch HM, Mann NH. A conserved genetic module that encodes the major virion components in both the coliphage T4 and the marine cyanophage S-PM2. Proc Natl Acad Sci USA (2001) 98:11411–11416.
Hardies SC, Comeau AM, Serwer P, Suttle CA. The complete sequence of marine bacteriophage VpV262 infecting Vibrio parahaemolyticus indicates that an ancestral component of a T7 viral supergroup is widespread in the marine environment. Virology (2003) 310:359–371.
Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA (1992) 89:10915–10919.
Hjorleifsdottir S, Hreggvidsson GO, Fridjonsson OH, Ævarsson A, Kristjansson JK. Bacteriophage RM 378 of a thermophilic host organism (2002) US Patent 6,492,161.
Huber T, Faulkner G, Hugenholtz P. Bellerophon: a program to detect chimeric sequences in multiple sequence alignments. Bioinformatics (2004) 20:2317–2319.
Ishii T, Yamaguchi Y, Yanagida M. Binding of structural protein Soc to head shell of bacteriophage T4. J Mol Biol (1978) 120:533–544.
Iwasaki K, Trus BL, Wingfield PT, Cheng NQ, Campusano G, Rao VB, Steven AC. Molecular architecture of bacteriophage T4 capsid: vertex structure and bimodal binding of the stabilizing accessory protein, Soc. Virology (2000) 271:321–333.
Jia ZJ, Ishihara R, Nakajima Y, Asakawa S, Kimura M. Molecular characterization of T4-type bacteriophages in a rice field. Environ Microbiol (2007) 9:1091–1096.
Jiang J, Abushilbayeh L, Rao VB. Display of a PorA peptide from Neisseria meningitidis on the bacteriophage T4 capsid surface. Infect Immun (1997) 65:4770–4777.
Karam JD, Konigsberg WH. DNA polymerase of the T4-related bacteriophages. Prog Nucleic Acid Res Mol Biol (2000) 64:65–96.
Kneller DG, Cohen FE, Langridge R. Improvements in protein secondary structure prediction by an enhanced neural network. J Mol Biol (1990) 214:171–182.
Leiman PG, Kanamaru S, Mesyanzhinov VV, Arisaka F, Rossmann MG. Structure and morphogenesis of bacteriophage T4. Cell Mol Life Sci (2003) 60:2356–2370.
Letarov A, Manival X, Desplats C, Krisch HM. gpwac of the T4-type bacteriophages: structure, function, and evolution of a segmented coiled-coil protein that controls viral infectivity. J Bacteriol (2005) 187:1055–1066.
Mann NH. Phages of the marine cyanobacterial picophytoplankton. FEMS Microbiol Rev (2003) 27:17–34.
Mann NH, Clokie MRJ, Millard A, Cook A, Wilson WH, Wheatley PJ, Letarov A, Krisch HM. The genome of S-PM2, a "photosynthetic" T4-type bacteriophage that infects marine Synechococcus strains. J Bacteriol (2005) 187:3188–3200.
Matsuzaki S, Inoue T, Kuroda M, Kimura S, Tanaka S. Cloning and sequencing of major capsid protein (MCP) gene of a vibriophage, KVP20, possibly related to T-even coliphages. Gene (1998) 222:25–30.
Matsuzaki S, Inoue T, Tanaka S, Koga T, Kuroda M, Kimura S, Imai S. Characterization of a novel Vibrio parahaemolyticus phage, KVP241, and its relatives frequently isolated from seawater. Microbiol Immunol (2000) 44:953–956.
Matsuzaki S, Kuroda M, Kimura S, Tanaka S. Major capsid proteins of certain Vibrio and Aeromonas phages are homologous to the equivalent protein, gp23*, of coliphage T4. Arch Virol (1999) 144:1647–1651.
Miller ES, Heidelberg JF, Eisen JA, et al, (13 co-authors). Complete genome sequence of the broad-host-range vibriophage KVP40: comparative genomics of a T4-related bacteriophage. J Bacteriol (2003) 185:5220–5233.
Miller ES, Kutter E, Mosig G, Arisaka F, Kunisawa T, Ruger W. Bacteriophage T4 genome. Microbiol Mol Biol Rev (2003) 67:86–156.
Nolan JM, Petrov V, Bertrand C, Krisch HM, Karam JD. Genetic diversity among five T4-like bacteriophages. Virol J (2006) 3:30.
Petrov VM, Nolan JM, Bertrand C, Levy D, Desplats C, Krisch HM, Karam JD. Plasticity of the gene functions for DNA replication in the T4-like phages. J Mol Biol (2006) 361:46–68.
Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. UCSF Chimera: a visualization system for exploratory research and analysis. J Comput Chem (2004) 25:1605–1612.
Ren ZJ, Black LW. Phage T4 Soc and Hoc display of biologically active, full-length proteins on the viral capsid. Gene (1998) 215:439–444.
Ren ZJ, Lewis GK, Wingfield PT, Locke EG, Steven AC, Black LW. Phage display of intact domains at high copy number: a system based on SOC, the small outer capsid protein of bacteriophage T4. Protein Sci (1996) 5:1833–1843.
Rohwer F, Segall A, Steward G, Seguritan V, Breitbart M, Wolven F, Azam F. The complete genomic sequence of the marine phage roseophage SIO1 shares homology with nonmarine phages. Limnol Oceanogr (2000) 45:408–418.
Rossmann MG, Mesyanzhinov VV, Arisaka F, Leiman PG. The bacteriophage T4 DNA injection machine. Curr Opin Struct Biol (2004) 14:171–180.
Rusch DB, Halpern AL, Sutton G, et al, (40 co-authors). The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol (2007) 5:398–431.
Sathaliyawala T, Rao M, Maclean DM, Birx DL, Alving CR, Rao VB. Assembly of human immunodeficiency virus (HIV) antigens on bacteriophage T4: a novel in vitro approach to construct multicomponent HIV vaccines. J Virol (2006) 80:7688–7698.
Scholl D, Kieleczawa J, Kemp P, Rush J, Richardson CC, Merril C, Adhya S, Molineux IJ. Genomic analysis of bacteriophages SP6 and K1-5, an estranged subgroup of the T7 supergroup. J Mol Biol (2004) 335:1151–1171.
Schwede T, Kopp J, Guex N, Peitsch MC. SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Res (2003) 31:3381–3385.
Seshadri R, Kravitz SA, Smarr L, Gilna P, Frazier M. CAMERA: a community resource for metagenomics. PLoS Biol (2007) 5:394–397.
Shamoo Y, Friedman AM, Parsons MR, Konigsberg WH, Steitz TA. Crystal structure of a replication fork single-stranded DNA binding protein (T4 gp32) complexed to DNA. Nature (1995) 376:362–366.
Shivachandra SB, Rao M, Janosi L, Sathaliyawala T, Matyas GR, Alving CR, Leppla SH, Rao VB. In vitro binding of anthrax protective antigen on bacteriophage T4 capsid surface through Hoc-capsid interactions: a strategy for efficient display of large full-length proteins. Virology (2006) 345:190–198.
Short CM, Suttle CA. Nearly identical bacteriophage structural gene sequences are widely distributed in both marine and freshwater environments. Appl Environ Microbiol (2005) 71:480–486.
Steven AC, Greenstone HL, Booy FP, Black LW, Ross PD. Conformational changes of a viral capsid protein: thermodynamic rationale for proteolytic regulation of bacteriophage T4 capsid expansion, cooperativity, and super-stabilization by Soc binding. J Mol Biol (1992) 228:870–884.
Sullivan MB, Coleman ML, Weigele P, Rohwer F, Chisholm SW. Three Prochlorococcus cyanophage genomes: signature features and ecological interpretations. PLoS Biol (2005) 3:790–806.
Suttle CA, Chan AM, Cottrell MT. Use of ultrafiltration to isolate viruses from seawater which are pathogens of marine phytoplankton. Appl Environ Microbiol (1991) 57:721–726.
Thomassen E, Gielen G, Schutz M, Schoehn G, Abrahams JP, Miller S, Van Raaij MJ. The structure of the receptor-binding domain of the bacteriophage T4 short tail fibre reveals a knitted trimeric metal-binding fold. J Mol Biol (2003) 331:361–373.
Thompson FL, Iida T, Swings J. Biodiversity of vibrios. Microbiol Mol Biol Rev (2004) 68:403–431.
Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res (1994) 22:4673–4680.
Tétart F, Desplats C, Kutateladze M, Monod C, Ackermann HW, Krisch HM. Phylogeny of the major head and tail genes of the wide-ranging T4-type bacteriophages. J Bacteriol (2001) 183:358–366.
Weigele PR, Pope WH, Pedulla ML, Houtz JM, Smith AL, Conway JF, King J, Hatfull GF, Lawrence JG, Hendrix RW. Genomic and structural analysis of Syn9, a cyanophage infecting marine Prochlorococcus and Synechococcus. Environ Microbiol (2007) 9:1675–1695.
Willard L, Ranjan A, Zhang HY, Monzavi H, Boyko RF, Sykes BD, Wishart DS. VADAR: a web server for quantitative evaluation of protein structure quality. Nucleic Acids Res (2003) 31:3316–3319.
Yooseph S, Sutton G, Rusch DB, et al, (33 co-authors). The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol (2007) 5:432–466.
Zhong Y, Chen F, Wilhelm SW, Poorvin L, Hodson RE. Phylogenetic diversity of marine cyanophage isolates and natural virus communities as revealed by sequences of viral capsid assembly protein gene g20. Appl Environ Microbiol (2002) 68:1576–1584.
Zuber C, Ngom-Bru C, Barretto C, Bruttin A, Brüssow H, Denou E. Genome analysis of phage JS98 defines a fourth major subgroup of T4-like phages in Escherichia coli. J Bacteriol (2007) 189:8206–8214.