MBE Advance Access originally published online on November 20, 2007
Molecular Biology and Evolution 2008 25(1):155-167; doi:10.1093/molbev/msm243
Research Articles
Insight into Ligand Diversity and Novel Biological Roles for Family 32 Carbohydrate-Binding Modules
* Department of Biochemistry and Microbiology, University of Victoria, Victoria, Canada
Departamento de Biologia Celular y Molecular, Universidade da Coruña, A Coruña, Spain
E-mail: wabbott{at}uvic.ca.
| Abstract |
|---|
Family 32 carbohydrate-binding modules (CBM32s) are found in a diverse group of microorganisms, including archaea, eubacteria, and fungi. Significantly, many members of this family belong to plant and animal pathogens where they are likely to play a key role in enzyme toxin targeting and function. Indeed, ligand targets have been shown to range from insoluble plant cell wall polysaccharides to complex eukaryotic glycans. Besides a potential direct involvement in microbial pathogenesis, CBM32s also represent an important family for the study of CBM evolution due to the wide variety of complex protein architectures with which they are associated. This complexity ranges from independent lectin-like proteins through to large multimodular enzyme toxins where they can be present in multiple copies (multimodularity). Presented here is a rigorous analysis of the evolutionary relationships between available polypeptide sequences for family 32 CBMs within the carbohydrate active enzyme database. This approach is especially helpful for determining the roles of CBM32s that are present in multiple copies within an enzyme as each module tends to cluster into groups that are associated with distinct enzyme classes. For enzymes that contain multiple copies of CBM32s, however, there are differential clustering patterns as modules can cluster either together or in very distant sections of the tree. These data suggest that enzymes containing multiple copies possess complex mechanisms of ligand recognition. By applying this well-developed approach to the specific analysis of CBM relatedness, we have generated here a new platform for the prediction of CBM binding specificity and highlight significant new targets for biochemical and structural characterization.
Key Words: carbohydrate-binding module; family 32; evolution; multimodularity; structural characterization
| Introduction |
|---|
Carbohydrate-binding modules (CBMs) are critical components of microbial carbohydrate active enzymes (CAZymes). Primarily, CBMs function in nature by concentrating the parent enzyme at their dedicated substrates thereby enhancing its catalytic efficacy (Boraston et al. 2004). This process helps the enzyme to degrade refractory substrates such as insoluble structural polysaccharides and to target complex substrates resident within eukaryotic extracellular glycans. Originally, CBMs were classified as cellulose-binding domains because the first documented examples interacted preferentially with crystalline cellulose (Gilkes et al. 1988
Although the classical definition of a CBM includes the constraint that they "... are found within the primary structures of carbohydrate-active enzymes..." (Boraston et al. Forthcoming), several exceptions to this paradigm have now been reported (Charnock et al. 2002; Flint et al. 2004; Vaaje-Kolstad et al. 2005; Abbott et al. 2007). For example, Yersinia enterocolitica contains a family 32 carbohydrate-binding module (CBM32) (YeCBM32) that is an independent periplasmic protein that interacts with polygalacturonic acid (Abbott et al. 2007). This interaction appears to retain polygalacturonic acid within the periplasm where it becomes a substrate for resident depolymerases. The oligogalacturonide products of these reactions are the preferential ligands for TogB (Abbott and Boraston 2007), the specificity determinant of the intracellular TogMNAB transporter. Reconstruction of the pathway indicates, therefore, that ligand and substrate selectivities of these proteins support a funneling mechanism initiated by YeCBM32 to direct pectin breakdown products into the cell and prevent their escape back into the environment.
Although the most closely related proteins to YeCBM32 (CBM32s from Yersinia intermedia, Yersinia pestis, Yersinia pseudotuberculosis, Erwinia carotovora, and Vibrio vulnificus) retain a conserved role in pectin utilization, the majority of CBM32s are appended to a wide variety of enzymes and in many cases can be detected in multiple copies within the same enzyme. This observation underlines a diverse functional repertoire for this family, and many CBM32s are likely to be directly involved in pathogenesis. For example, CBM32s are found in Clostridium perfringens, Burkholderia cepacia, Bacteroides sp., and Streptococcus pneumoniae; however, in most cases direct roles for these proteins in virulence remain to be determined. To date, CBM32s have been shown to bind a variety of carbohydrate ligands displaying a signature galacto-configured moiety (i.e., axial C4). These include galactose, LacNAc (β-D-galactosyl-1,4-β-D-N-acetylglucosamine), type II blood group antigen H-trisaccharide, and polygalacturonic acid (Newstead et al. 2005; Ficko-Blean and Boraston 2006; Abbott et al. 2007). However, the full spectrum of biological ligands is likely to be much larger as these proteins are found in a milieu of biochemically uncharacterized modular enzymatic architectures with diverse predicted activities.
In order to characterize the relatedness of family 32 CBMs and provide a framework for predicting the diverse structure–function relationships of these proteins, we have performed a rigorous analysis of sequences available within the CAZy database (Coutinho and Henrissat 1999). These data provide a research platform for predicting the ligand specificities of CBM32s based upon a functional classification and also highlight CBM32s with distinctive biological roles. To the best of our knowledge, this approach has only been reported for starch-binding CBMs from families 20 and 21, which display a stringently conserved pattern of ligand binding (Machovic et al. 2005; Machovic and Janecek 2006a, 2006b). In addition to presenting the overall phylogeny of CBM family 32, we have focused on the multimodularity present within C. perfringens and Saccharophagus degradans proteins and the independent CBM32s from Myxococcus xanthus and Y. enterocolitica. When possible, we have included the structural information garnered by crystallographic studies and highlighted strategic avenues for future biochemical and structural studies.
| Materials and Methods |
|---|
Module Boundary Determination
A total of 180 polypeptide sequences for carbohydrate-binding modules belonging to different species have been used in our analyses (Supplementary Materials online), including 4 outgroup sequences from eukaryotes. Complete protein sequences were retrieved from the CAZy Database (http://www.cazy.org/). Domain architecture was determined using a complementary analysis of InterProScan (Zdobnov and Apweiler 2001
Evolutionary Analysis of CBM32 Modules
Nucleotide coding sequences for the modules analyzed in the focused examples were aligned on the basis of their translated amino acid sequences using the BioEdit (Hall 1999) and MEGA ver. 3.1 (Kumar et al. 2004) programs with default parameters. The alignment for the complete set of CBM sequences consisted of 245 amino acid sites, and a bar chart representation was used in order to represent the frequency of each residue at every position of the alignment, using the LogoBar program (Perez-Bercoff et al. 2006).
All molecular evolutionary analyses in the present work were carried out using the program MEGA ver. 3.1 (Kumar et al. 2004). The extent of nucleotide and amino acid variation between sequences was estimated by means of the uncorrected differences (p-distance) as this distance is known to give better results than more complicated methods when the number of sequences is large and the number of positions used is relatively small (Nei and Kumar 2000).
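As an illustration of the uncorrected p-distance described above, the following is a minimal sketch and not the MEGA implementation. It assumes sequences are already aligned and that gapped or missing sites (`-`, `?`) are skipped pair by pair; the function names `p_distance` and `distance_matrix` are hypothetical.

```python
from itertools import combinations

# Characters treated as "no data" at a site (an assumption of this sketch).
GAP_CHARS = {"-", "?"}


def p_distance(seq_a, seq_b):
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # skip sites missing in either sequence of this pair
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def distance_matrix(alignment):
    """Pairwise p-distances for a {name: aligned_sequence} mapping."""
    names = list(alignment)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        d = p_distance(alignment[x], alignment[y])
        dist[(x, y)] = dist[(y, x)] = d
    return dist
```

A p-distance of 0.34 between two modules therefore means that 34% of the compared sites differ.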
The numbers of synonymous (pS) and nonsynonymous (pN) nucleotide differences per site were computed using the modified Nei–Gojobori method (Zhang et al. 1998), providing in both cases the transition/transversion ratio (R). Although saturation in synonymous sites is possible, pS values have been used in the present work in order to ascertain the nature of the selective process operating on different modules, as well as to perform the Z-test of selection locally from comparisons between modules within species. In this context, saturation in synonymous sites could be discarded.
Distances were estimated using the pairwise-deletion option and standard errors were calculated by the bootstrap method with 1,000 replicates. The presence and nature of selection was tested in CBM modules by using the codon-based Z-test for selection, establishing the alternative hypotheses as either H1: pN < pS or H1: pN > pS and the null hypothesis as H0: pN = pS (Nei and Kumar 2000). The Z-statistic and the probability that the null hypothesis is rejected were obtained, indicating the significance level as **P (P < 0.001) and *P (P < 0.05).
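The one-tailed Z-test described above can be sketched as follows. This is an illustration under stated assumptions, not the MEGA code: the statistic is taken to be Z = (pN − pS) / sqrt(SE(pN)² + SE(pS)²) with a normal approximation, the standard errors are assumed to come from the bootstrap, and the function names and the `alternative` labels are hypothetical.

```python
import math


def z_test_selection(p_n, p_s, se_n, se_s, alternative):
    """Return (Z, one-tailed P) for H0: pN = pS.

    alternative: "pN<pS" (purifying selection) or "pN>pS" (positive selection).
    """
    variance = se_n ** 2 + se_s ** 2
    if variance <= 0:
        raise ValueError("standard errors must not both be zero")
    z = (p_n - p_s) / math.sqrt(variance)
    if alternative == "pN<pS":
        p_value = 0.5 * math.erfc(-z / math.sqrt(2))  # P(standard normal <= z)
    elif alternative == "pN>pS":
        p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(standard normal >= z)
    else:
        raise ValueError("alternative must be 'pN<pS' or 'pN>pS'")
    return z, p_value


def significance_marker(p_value):
    """Markers as used in the text: ** for P < 0.001, * for P < 0.05."""
    if p_value < 0.001:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""
```

For example, with made-up values pN = 0.10, pS = 0.40 and standard errors 0.03 and 0.04, the call `z_test_selection(0.10, 0.40, 0.03, 0.04, "pN<pS")` gives Z close to −6 and a probability well below 0.001, which would be reported with the `**` marker.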
The Neighbor-Joining tree-building method (Saitou and Nei 1987) was used to reconstruct the phylogenetic trees. To verify that our results do not depend on this choice, the phylogenetic inference was complemented by the reconstruction of a maximum parsimony tree (Rzhetsky and Nei 1992a) using the close-neighbor-interchange search method with search level 1 and with 10 replications for the random addition trees option. We combined the bootstrap (Felsenstein 1985; Efron et al. 1996) and the interior-branch test methods (Rzhetsky and Nei 1992a; Sitnikova 1995) in order to test the reliability of the obtained topologies, producing the bootstrap probability (BP) and the confidence probability (CP) values for each internal branch, assuming BP > 80% and CP ≥ 95% as statistically significant (Sitnikova et al. 1995; Rzhetsky and Nei 1992b). The CBM module associated with the galactose oxidase enzyme in 3 species of fungi was used as outgroup in the reconstruction.
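To make the Neighbor-Joining step more concrete, the sketch below shows only the pair-selection criterion of the method (Saitou and Nei 1987): at each iteration the pair (i, j) minimizing Q(i, j) = (n − 2)·d(i, j) − Σk d(i, k) − Σk d(j, k) is joined. It is not a full tree builder and not the MEGA implementation; the function name `nj_pick_pair` and the toy distances are hypothetical.

```python
def nj_pick_pair(dist):
    """Return ((i, j), Q) for the pair of taxa that Neighbor-Joining joins next.

    dist is a square, symmetric list of lists with zeros on the diagonal.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("at least 3 taxa are needed")
    row_sums = [sum(row) for row in dist]
    best_pair = None
    best_q = float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:  # ties keep the first pair encountered
                best_pair, best_q = (i, j), q
    return best_pair, best_q


# Toy distance matrix for 5 taxa (illustrative values only).
toy = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_pick_pair(toy))  # ((0, 1), -50)
```

Note that the pair with the smallest raw distance (taxa 3 and 4) is not the one selected, which is the property that distinguishes Neighbor-Joining from simple nearest-pair clustering.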
The analysis of the nucleotide variation across coding regions was performed using a sliding-window approach, by estimating the total (π) and the synonymous (πS) nucleotide diversity (average number of nucleotide differences per site between 2 sequences) with a window length of 20 bp and a step size of 5 bp (for π) and a window length of 5 bp and a step size of 1 bp (for πS). The codon usage bias in genes encoding CBM32-containing molecules was estimated as the effective number of codons (Wright 1990), where the highest value (61) indicates that all synonymous codons are used equally (no bias) and the lowest (20) that only a preferred codon is used in each synonymous class (extreme bias). Both analyses were conducted with the program DnaSP ver. 4.10 (Rozas et al. 2003).
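The sliding-window estimate of total nucleotide diversity can be sketched as below. This is a simplified illustration, not the DnaSP algorithm: it assumes a gap-free alignment, counts raw differences without any correction, and does not attempt the synonymous-site version, which needs codon-aware site counting. The function names are hypothetical; the defaults follow the 20 bp window and 5 bp step given in the text.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of nucleotide differences per site between 2 sequences."""
    pairs = list(combinations(seqs, 2))
    length = len(seqs[0])
    if not pairs or length == 0:
        raise ValueError("need at least 2 sequences with at least 1 site")
    differences = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in pairs
    )
    return differences / (len(pairs) * length)


def sliding_window_pi(seqs, window=20, step=5):
    """Return a list of (start, end, diversity) with 0-based, end-exclusive windows."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    length = len(seqs[0])
    windows = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        windows.append((start, start + window, nucleotide_diversity(chunk)))
    return windows
```

Plotting the third element of each tuple against the window midpoint gives a diversity profile along the coding region.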
| Results and Discussion |
|---|
Overall Evolutionary Relationships in CBM Family 32
The phylogenetic tree presented in figure 1 depicts the evolutionary relationships among family 32 CBMs deposited within the CAZy database (Coutinho and Henrissat 1999
Overall, CBM32 sequences display a clustering pattern that parallels the activities of the catalytic module to which the module is appended. These enzyme groups, which include family 8, 10, 16, 29, 31, 84, 85, and 89 glycoside hydrolases (GHs); pectate lyases (both pectate lyase and β-helix and β-propeller TolB folds); an unclassified metalloprotease; α-1,2 mannosidases; and the galactose oxidases, are indicated in the right margin. Although many of these enzymes remain to be characterized biochemically, their predicted activities span a wide range of substrates, including chitin, α- and β-glucans, mannose, fucose, and hexosamines. In addition to the catalytically active proteins, the galacturonic acid–binding modules (i.e., Yersinia sp. single modules) cluster into a grouping that is independent of a catalytic module. Such heterogeneity within a CBM family is in contrast to previously reported phylogenetic analyses of the starch-binding CBMs from families 20 and 21. These closely related families fall into a clan that also includes more distantly related starch-binding CBM families 25, 26, 34, 41, and 45 (Machovic et al. 2005
α-1,4 glucans and in most cases are associated with an enzymatic module involved in the biosynthesis or utilization of starch (Machovic and Janecek 2006a
As revealed by the branch lengths within the CBM family tree (fig. 1), there is significant amino acid variation among members of family 32 at all levels, with the peculiarity that the extent of this protein variation is roughly the same within and between species, as well as within and between enzyme groups. Surprisingly, a large variation is evident between different modules associated with a particular enzyme in a given species, resulting in an interspersed distribution of CBM sequences across the phylogeny. This property is evidenced in the modules from the C. perfringens isozymes from both family GH84 (GH84A-E) and GH85 (GH85A-B) enzymes, which contain some of the most divergent CBM32 sequences in the analysis.
In some cases, a noticeable difference in relatedness is also evident for modules associated with homologous enzymes from different species. For example, there are 2 divergent pectate lyase enzyme clusters, which appear to have been diverging at a fast rate over a long period rather than reflecting a polyphyletic origin. However, it is possible that the differentiation within these 2 groups has been influenced by unique selective constraints acting on the different modules associated with the catalytic module. For example, pectate lyases operate by a β-elimination mechanism to depolymerize pectic substrates found within primary plant cell walls (Marin-Rodriguez et al. 2002). Although pectin primarily comprises an α-1,4 linked galacturonic acid backbone, it is a heterogeneous polysaccharide that also contains rhamnose, arabinose, and galactose moieties, and there are various modifications that can be present, including C6 methoxyl esters and C2/C3 acetylations (Willats et al. 2001). Selection on these 2 clusters of CBM32s therefore may be based upon the degree of polymerization and asymmetry of the galacturonic acid residues not only within the pectic backbone but also toward regions of carbohydrate heterogeneity.
One of the greatest challenges in CBM research is determining ligand specificity. This is a critical first step toward characterizing the molecular determinants of protein–carbohydrate complex formation. In this respect, the phylogenetic analysis presented here is a novel approach and potentially invaluable tool for streamlining the experimental investigation of uncharacterized CBM32s. In several cases, CBM32s that are components of enzymes with unknown activities cluster with CBMs appended to predicted or characterized enzymes (fig. 1, highlighted with black arrows). For example, several lactic acid bacteria have 1 CBM32 member defined within the database. These CBMs cluster into 2 very different enzyme groupings. The GH31 (α-glucosidase) group consists of Lactobacillus delbrueckii, Lactobacillus acidophilus, Lactobacillus gasseri, and Lactobacillus casei, whereas the GH85 (mannosyl glycoprotein endo-β-N-acetylglucosaminidase) group contains Lactobacillus plantarum (Lp_0182) and the uncharacterized protein from Lactococcus lactis subsp. lactis (YpcC). Sequence comparisons between these 2 closely related proteins reveal that the L. lactis CBM32 is an independently acting protein, which aligns very well with the C-terminal portion of the L. plantarum enzyme (42% identity; not shown). This observation provides an informed avenue for investigating the mechanisms of carbohydrate recognition and utilization by these currently uncharacterized proteins.
In addition to predicting the specificity of CBM32s appended to unknown catalytic modules, this approach is also helpful for classifying enzyme architectures that contain more than 1 CBM. This concept can be clearly demonstrated for both "homogeneous" and "heterogeneous" clustering patterns. Homogeneous clustering occurs when 2 or more CBM32s from the same enzyme display the most similarity to each other. For example, the GH3 from S. degradans contains 4 tandemly arranged CBM32s that cluster very closely within the phylogeny, an observation that may indicate overlapping ligand specificities. Heterogeneous clustering, on the other hand, refers to the occurrence of a module from a multimodular CBM32 protein displaying the most similarity to a CBM32 from a different enzyme grouping. This is the case for the GH89 α-N-acetylglucosaminidase from C. perfringens, which has 6 appended CBM32s that cluster into 3 distinct groups. An additional complication comes from species-specific clustering patterns (i.e., modules GH20, GH31, GH33, and GH84C from C. perfringens) or from an enzyme-specific pattern (i.e., Fucosidase GH29 in Porphyromonas, Bacteroides fragilis and Bacteroides thetaiotaomicron). The high variation present between modules even within the same species and in the same enzyme class masks the phylogenetic relationships in some cases. In general terms, however, it is possible to define the different lineages across the phylogeny. The differences in amino acid composition of these modules will likely translate into differences in ligand-binding specificities and/or affinities. In the following section, we will explore the phenomenon of multimodularity in greater detail as it relates to homogeneous (GH3) and heterogeneous (GH89) dispersion of CBMs.
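The distinction between homogeneous and heterogeneous clustering can be expressed as a simple nearest-neighbor rule on pairwise distances. The sketch below is one possible reading of the definitions above, not the procedure used to build the tree: a protein is called heterogeneous as soon as any of its modules is closest to a module from another protein. All names, the helper `symmetric`, and the toy distances are hypothetical.

```python
def symmetric(pairs):
    """Build a nested dict dist[a][b] from a {(a, b): distance} mapping."""
    dist = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    return dist


def nearest_neighbor(module, dist):
    """Name of the module with the smallest distance to `module`."""
    others = {m: d for m, d in dist[module].items() if m != module}
    return min(others, key=others.get)  # ties resolve to the first seen


def classify_multimodularity(protein, dist, protein_of):
    """Return 'homogeneous' or 'heterogeneous' for a multimodular protein."""
    modules = [m for m, p in protein_of.items() if p == protein]
    if len(modules) < 2:
        raise ValueError("protein needs at least 2 modules")
    for module in modules:
        if protein_of[nearest_neighbor(module, dist)] != protein:
            return "heterogeneous"
    return "homogeneous"


# Toy example: protein A has 2 similar modules; module B2 resembles C1 instead.
protein_of = {"A1": "A", "A2": "A", "B1": "B", "B2": "B", "C1": "C"}
dist = symmetric({
    ("A1", "A2"): 0.30, ("A1", "B1"): 0.70, ("A1", "B2"): 0.80,
    ("A1", "C1"): 0.75, ("A2", "B1"): 0.70, ("A2", "B2"): 0.80,
    ("A2", "C1"): 0.70, ("B1", "B2"): 0.60, ("B1", "C1"): 0.65,
    ("B2", "C1"): 0.35,
})
print(classify_multimodularity("A", dist, protein_of))  # homogeneous
print(classify_multimodularity("B", dist, protein_of))  # heterogeneous
```

A rule of this kind ignores branch support, so it should be read as a first-pass screen rather than a substitute for inspecting the tree.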
Analysis of the GH89 Multimodular Enzyme from C. perfringens
CpGH89 is a 239.5-kDa protein that has a complicated modular architecture consisting of an α-N-glucosaminidase catalytic module (215–911), 6 CBM32 modules (1: 8–157, 2: 919–1060, 3: 1066–1203, 4: 1208–1345, 5: 1361–1496, 6: 1511–1623), and 3 C-terminal modules possibly involved in protein–protein complex formation (fig. 2A). Although the biological role of this enzyme in virulence has yet to be determined, the human homolog is involved in the lysosomal degradation of heparan sulfate and mutations within its gene have been causally linked to Sanfilippo B syndrome (Weber et al. 1996).
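The module boundaries listed above lend themselves to a small data structure for checking lengths and inter-module linkers. The sketch below uses only the coordinates stated in the text and assumes they are 1-based and inclusive; the 3 C-terminal modules are omitted because their boundaries are not given. The class and function names are hypothetical.

```python
from typing import NamedTuple


class Module(NamedTuple):
    name: str
    start: int  # first residue, assumed 1-based inclusive
    end: int    # last residue, assumed inclusive

    @property
    def length(self):
        return self.end - self.start + 1


# Coordinates as stated for CpGH89 (C-terminal modules not listed in the text).
CPGH89 = [
    Module("CBM32-1", 8, 157),
    Module("catalytic", 215, 911),
    Module("CBM32-2", 919, 1060),
    Module("CBM32-3", 1066, 1203),
    Module("CBM32-4", 1208, 1345),
    Module("CBM32-5", 1361, 1496),
    Module("CBM32-6", 1511, 1623),
]


def linkers(modules):
    """Residues between consecutive modules; a negative value means overlap."""
    ordered = sorted(modules, key=lambda m: m.start)
    return [
        (a.name, b.name, b.start - a.end - 1)
        for a, b in zip(ordered, ordered[1:])
    ]


for module in CPGH89:
    print(module.name, module.length)
print(linkers(CPGH89))
```

With these coordinates, none of the listed modules overlap, and the linkers between the tandem CBM32 modules 2 to 6 are all 15 residues or shorter.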
To determine the relatedness between the 6 modules of GH89, the sequences at both the amino acid and nucleotide levels were analyzed. The variation between protein sequences is very high although the numbers of synonymous and nonsynonymous substitutions per site are equivalent. Indeed, only 5 out of a total of 163 positions are conserved in every sequence (table 1; fig. 2B). This divergence in amino acid composition has 2 implications. Firstly, there is plasticity in amino acid selection involved in the formation of secondary structure elements, as variation is abundant throughout the β-structures. Secondly, there is poor conservation of functional residues implicated in ligand recognition, suggesting that different modules interact with different ligands, or perhaps with similar ligands through different mechanisms. For example, when GH89 module 5 is compared with the remaining GH89 modules, only a single arginine residue predicted to be involved in galactose binding (Ficko-Blean and Boraston 2006
Closer analysis of the CpGH89 CBM32 modules reveals the presence of 3 distinct modular clusters: 2, 3, and 4; 1 and 6; and module 5 (fig. 1). This scenario is an example of heterogeneous multimodularity, where CBM32s that cluster differently within the phylogeny are present within the same enzyme. The cluster containing modules 2, 3, and 4 has an average amino acid difference per site that ranges from 0.34 to 0.71 (meaning that 34–71% of the aligned amino acid positions differ), with modules 3 and 4 displaying the highest similarity (0.34 amino acid differences per amino acid site). In the overall phylogeny, these modules group with the GH85 CBMs and the GH29 fucosidase CBMs from Xanthomonas sp. The module 1 and 6 cluster has an amino acid difference of 0.68 per site and groups with the CBM32s from the B. fragilis GH2 and Enterococcus faecalis chondroitin lyase. Module 5 displays high similarity to several other modules from different C. perfringens enzymes including the single modules from GH33 and GH84C, and module 3 from both GH20 and GH31, which suggests that a mechanism based on the horizontal transfer of genetic material (very common in bacterial evolution) could contribute to a certain extent to the homologies detected between modules from different CBMs in this species. The structure of the GH84C CBM32 has been solved, enabling the comparison of the secondary structures between these modules (Ficko-Blean and Boraston 2006
Another key observation about these enzyme architectures is their modular positioning. The related sequences are found in a different order along the enzyme coordinate. Within CpGH84C, CpGH31, CpGH89, and CpGH20, the module is located C-terminal of the catalytic domain at various distances, and for CpGH33, the CBM32 is N-terminal (fig. 3C). Although this observation may simply underline the inherent flexibility of the connecting regions between modules, its complete functional significance awaits structural characterization that will define the modular positions in 3-dimensional space.
The Tri- and Tetramodule CBM32 Proteins from S. degradans
Saccharophagus degradans is a glycophilic marine bacterium. This organism displays a remarkable ability to degrade at least 10 stereochemically unique complex carbohydrates from algal, plant, and invertebrate sources (Taylor et al. 2006). Included in its impressive arsenal of CAZymes are 128 predicted GHs, 36 polysaccharide lyases, 15 carbohydrate esterases, and 133 CBMs, including 25 members from family 32.
Similar to C. perfringens, CBM32 multimodularity within S. degradans enzymes is commonly observed. Examples include a tetramodular CBM32 appended to a GH3 (SdGH3) catalytic module, an independent trimodular protein, and 6 other enzymes with 2 CBM32 copies (Supplementary Materials online). In order to characterize the relatedness of 2 distinctly different polypeptides containing family 32 CBMs, we analyzed the phylogenies of the independent trimodule and SdGH3.
The independent trimodule contains modular boundaries at amino acid positions 128–254 (module 1), 275–394 (module 2), and 407–556 (module 3). It is difficult to predict the function of this protein for 3 reasons: 1) the modules lack a catalytic appendage; 2) they cluster within a section of the tree that contains CBMs appended to catalytic modules displaying a wide variety of predicted activities (fig. 1); and 3) these sequences are very distantly related to the structurally characterized CBM32s, which compromises the reliability of binding site residue identification.
The SdGH3 also has 3 contiguously arranged CBM32s at its N-terminus: module 1 (35–170), 2 (171–305), and 3 (406–542). Module 4 (1405–1546) is separated from this cluster by the catalytic module housed within the protein core (fig. 4A). The functional implications for this modular architecture are difficult to determine without appropriate structural information and complementary biochemical analysis of the contributions of each CBM32 to binding; however, it seems possible that the C-terminal module may assist in anchoring and perhaps orientating the enzymatic module at the substrate surface.
Although all 4 of these modules cluster together within the overall family topology, there are clearly 2 subgroupings: modules 1 and 2 sharing 48% identity and modules 3 and 4 sharing 37% identity. The predicted substrate for the family 3 GH is cellulose (β-linked glucan), one of the main structural polysaccharides found within plant cell walls. Although cellulose adopts a relatively simple structure (β-1,4 linked glucose within a linear, fibrous superstructure), the presence of 4 modules within the enzyme with potentially 2 distinctively tailored binding specificities suggests that the enzyme may be targeted to different substructures within the polysaccharide. Although it is tempting to speculate about this, characterization of the CBM32s with SdGH3 is required before appropriate conclusions can be drawn.
The trimodule and GH3 tetramodule from S. degradans are excellent examples of homogeneous multimodularity, which implies that all the CBM32 components within a protein show higher similarity to each other than to those found within other proteins (table 2). This is in direct contrast to the heterogeneous multimodularity observed within the more complex CpGH89 enzyme. Whether amino acid homogeneity at the structural level translates into overlapping ligand-binding profiles at the functional level remains to be determined. The CBMs from the GH3 tetramodule would make excellent targets for future structural studies as they represent some of the most distantly related sequences within the tree to any of the current CBM32s with a known 3-dimensional structure.
The Independent CBM32s from Y. enterocolitica and M. xanthus
The galacturonic acid–binding proteins introduce another novel mechanism in CBM32 function. These proteins localize within the periplasm of several gram-negative enteric animal pathogens and operate within a pectin utilization pathway as an independent protein (Rodionov et al. 2004
The evolution of these proteins is of particular interest toward understanding the overall phylogeny of CBM32s as this group is nestled between 2 of the 3 modular clusters from CpGH89 (modules [2–4] and [1, 6]) (fig. 1). As described above, the cognate ligands for these clusters are currently unknown so direct comparisons of ligand-binding profiles are not possible. Potential similarities in ligand specificity between the CBM32s of CpGH89, an enzyme predicted to be active upon the α-1,4 linked gluco-configured monosaccharides of heparan sulphate (N-acetylglucosamine and glucuronic acid), and the Yersinia sp. modules, which bind α-1,4 linked galacto-configured (O4 axial) polysaccharides, are not readily clear. Possibly, these proteins may utilize a similar mechanism of binding directed toward the common equatorial C5 carboxylate groups. Experiments have been initiated to help illuminate this relationship.
In proximity to the galacturonic acid–binding cluster, there is another class of independent CBMs belonging to the bacterium M. xanthus (table 2; fig. 5A). Within this organism, there are 2 closely related CBM32-containing proteins (52% identity) that also operate independently of a catalytic module: one consists of a single CBM32 (MXAN_0542) and the other contains 2 tandem copies with 84% identity to each other (MXAN_4914). Fused to the C-terminus of these proteins is a large unknown region that displays similarity to a predicted lipoprotein also from M. xanthus, suggesting a possible role in extracellular localization (fig. 5B). When the CBM32s from Yersinia sp. and M. xanthus are compared, there are 24 fixed residues out of 130 alignment positions. As in previous examples of multimodularity, Myxococcus modules show a moderate degree of protein and nucleotide variation (table 2); however, it is considerably lower than in the C. perfringens GH89 (table 1) and S. degradans tri- and tetramodules (table 2). Comparison between silent and nonsilent variation within the Y. enterocolitica and M. xanthus proteins indicates the presence of purifying selection maintaining the β-sandwich scaffold (fig. 5A). By analyzing the nucleotide diversity across the modules, it seems that there is a trend toward higher nucleotide (total and silent) variation in regions corresponding to β-structures, which is actually coincident with many positions showing fixed amino acid residues in all sequences (fig. 5A and C). In this regard, the effect of negative selection would be evident in these regions, allowing for high numbers of synonymous substitutions without altering the amino acid residues composing the β-sandwich. In YeCBM32, the codon usage bias (effective number of codons) averages 47.4 ± 1.4, which contrasts with the extreme bias in the case of the modules of Myxococcus (32.5 ± 0.9).
Compared with previous estimations, and by taking into account the presence of different modules, a trend toward higher codon bias in multimodular CBMs compared with single-module CBMs is evident. The higher amount of codon bias within multimodular CAZymes could be representative of a higher level of polypeptide complexity. Thus, the different amino acids will be encoded by preferred codons in each module, increasing the efficiency and accuracy of translation (Akashi 1994). However, given that the likelihood of a species-specific codon bias is high considering the ecology of these organisms, further studies would be needed in order to assess whether such differences are due to the biased usage of preferred codons in multimodular CBMs (compared with single-module CBMs) or whether they result from differences in the species-specific codon bias levels between these organisms.
Structural alignments between YeCBM32 and the 3 closely related CBM32s from Myxococcus reveal that there is a high level of conservation of the basic amino acids predicted to be involved in ligand binding (Abbott et al. 2007) (fig. 5A and D). It is believed that these residues create a basic pocket designed to interact with the internal sugars of the negatively charged polygalacturonic acid (fig. 5E). In this regard, the noticeable absence of genes within the M. xanthus genome encoding classical pectin utilization enzymes, such as polysaccharide lyases (families 1–4, 9–11) and the polygalacturonases (family 28 GHs), is surprising. Biochemical characterization of the M. xanthus CBM32s, therefore, would make an interesting study with implications for the role of the conserved basic binding pocket in carbohydrate recognition and possibly extracellular adherence.
| Conclusion |
|---|
Traditionally, CBMs are viewed as performing relatively simple roles in nature. More recently, however, CBMs have been implicated in many inflammatory forms of microbial pathogenesis and the metabolism of higher organisms including plants and animals. In addition to this unraveling functional diversity, new complex architectures have emerged, including enzymes with multiple copies of CBM32s and independent CBM32s such as the polygalacturonic acid–binding lectin-like proteins.
The presence of multiple CBMs within 1 enzyme is a phenomenon that invites more intense phylogenetic and biochemical investigation. Toward this end, family 32 provides a repository of information as currently 31 proteins from different organisms have been determined to contain more than 1 CBM32. At first glance, it is tempting to speculate that CBM multimodularity results from a process of intragenic duplication, involving functional domains; however, the process is clearly not that simple. The data presented here suggest that there are at least 2 distinct strategies intended to enhance overall enzyme function: 1) heterogeneous and 2) homogeneous modular arrangement.
Heterogeneous multimodularity occurs when one or more modules within an enzyme are more closely related to a CBM found in a different enzyme than to the other CBM32s within the same protein. This pattern underlines the likelihood that different CBMs within the same enzyme possess distinct ligand-binding specificities. For example, module 5 from CpGH89 displays a high level of divergence from the other CBMs within this enzyme and is closely related to modules from GH84C, GH33, GH31, and GH20 (fig. 2). The biological importance of heterogeneous CBMs within these enzymes has not been determined; however, there are 2 tantalizing possibilities. Firstly, harnessing CBMs that possess distinct ligand specificities may fine-tune enzyme targeting to a heterogeneous multivalent substrate. Secondly, it may facilitate enzyme residency as nascent ligands that are recognized by ancillary CBMs are exposed during catalytic turnover.
Homogeneous multimodularity refers to CBMs that display the closest similarity to other CBMs within the same protein. It is predicted that such amino acid similarity will translate into overlapping ligand-binding profiles, which increases the propensity of the complete polypeptide to interact more tightly with a target ligand through an "avidity effect." Simply stated, this means that the overall affinity of the CBMs operating in tandem is greater than that of the individual modules acting independently of one another. This specialized form of multivalency has been previously characterized for a variety of tandem CBMs found in other families, including the xylan-binding CBM2bs from a Cellulomonas fimi xylanase (Bolam et al. 2001), the xylo- and cellulose-binding CBM6 triplicate modules from Clostridium stercorarium (Boraston et al. 2002), and the starch-binding CBM26s from Lactobacillus α-amylases (Guillén et al. 2007). Homogeneous multimodularity within family 32, however, may be a more complex and fine-tuned process. As described above, the presence of 2 subgroups (modules 1, 2 and modules 3, 4) within SdGH3 suggests that the protein may possess tailored CBM32 ligand-binding specificities that work in cooperation to recognize distinct substructures within the target substrate. In essence, this possibility may blur the apparent functional distinction between homogeneous and heterogeneous multimodularity. Nevertheless, both of these modular signatures appear to be powerful evolutionary means to tweak or enhance enzyme activity.
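The distinction between the two arrangements can be illustrated with a minimal Python sketch. It assumes that a pairwise distance is available for every pair of modules and labels a module as homogeneous when its nearest neighbor lies within the same protein, and as heterogeneous otherwise. The protein names (ProtA, ProtB), the distance values, and the function names are hypothetical and serve only as an illustration; they are not data from this study, and the nearest-neighbor rule is a simplification of reading relationships from a phylogenetic tree.

```python
from typing import Dict, Tuple

# A module is identified by (protein name, module number); names are hypothetical.
Module = Tuple[str, int]

# Toy pairwise distances between modules (illustrative values, not real data).
DIST: Dict[frozenset, float] = {
    frozenset({("ProtA", 1), ("ProtA", 2)}): 0.10,
    frozenset({("ProtA", 1), ("ProtA", 3)}): 0.80,
    frozenset({("ProtA", 1), ("ProtB", 1)}): 0.70,
    frozenset({("ProtA", 2), ("ProtA", 3)}): 0.90,
    frozenset({("ProtA", 2), ("ProtB", 1)}): 0.75,
    frozenset({("ProtA", 3), ("ProtB", 1)}): 0.20,
}


def nearest_neighbor(module: Module, dist: Dict[frozenset, float]) -> Module:
    """Return the module with the smallest pairwise distance to `module`."""
    candidates = {}
    for pair, d in dist.items():
        if module in pair:
            (other,) = pair - {module}  # the partner in this pair
            candidates[other] = d
    return min(candidates, key=candidates.get)


def classify(module: Module, dist: Dict[frozenset, float]) -> str:
    """Label a module by whether its closest relative is in the same protein."""
    neighbor = nearest_neighbor(module, dist)
    return "homogeneous" if neighbor[0] == module[0] else "heterogeneous"


if __name__ == "__main__":
    # Only modules of the multimodular protein are classified.
    for number in (1, 2, 3):
        module = ("ProtA", number)
        print(module, classify(module, DIST))
```

In this toy example, modules 1 and 2 of ProtA are each other's closest relatives, whereas module 3 groups with a module from a different protein. A single protein can therefore carry both signatures, which is the kind of mixed pattern discussed above.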
In addition to the contributions of multimodular carbohydrate active proteins, the functions of independent CBMs are now beginning to be understood. Beyond the galacturonic acid–binding CBM32s, which likely operate to retain oligogalacturonides within the periplasm, independent CBMs have also been documented in families 29 and 33, where they perform novel roles in the degradation of insoluble polysaccharides. Family 29 contains only 1 member: the noncatalytic protein NCP1 from the anaerobic fungus Piromyces equi. NCP1 consists of 2 tandem CBMs (CBM29-1 and -2) and is a component of the multisubunit extracellular plant cell wall degradation complex found in the gastrointestinal tract of herbivores (Wong et al. 1995). This protein displays a promiscuous pattern of ligand binding, including xylan, mannan, and cellulose (Charnock et al. 2002; Flint et al. 2005). The tandem positioning of 2 related CBMs enables the protein to interact with carbohydrates in a unique "lamination" mechanism that sandwiches the ligand between the domains (Flint et al. 2004). The CBM33 from Serratia marcescens (Cbp21) is an independently acting protein involved in the degradation of chitin. Cbp21 is secreted and interacts with crystalline chitin, perturbing its superstructure. This process facilitates destruction of the polysaccharide by chitinases A, B, and C (Vaaje-Kolstad et al. 2005). The abundance of independent proteins within family 32, including the dimodule from M. xanthus and the trimodule from S. degradans, provides an excellent opportunity to expand our understanding of the operations of independent CBMs and perhaps discover new functions for this interesting class of proteins.
CBM32s represent a complex family of proteins that span a wide range of ligand-binding profiles and functions. In many cases, the presence of multiple CBM32s within an enzyme or the lack of a catalytic module complicates their characterization even further. The phylogenetic analysis presented here helps to alleviate these difficulties by providing a novel approach for their classification. Because each CBM32 is analyzed independently, the module is positioned with CBM32s of related sequence, which may more accurately represent its biological function. In addition to streamlining several new routes for investigation within family 32, we anticipate that this approach will be helpful for the future classification of members within other CBM families that have diversity in ligand selectivity.
| Supplementary Material |
|---|
Supplementary materials are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
A.B.B. is Canada Research Chair in molecular interactions. This work was funded by the Natural Sciences and Engineering Research Council of Canada and by a Postdoctoral Marie Curie International Fellowship within the 6th European Community Framework Program (to J.M.E.-L.). We are grateful to L. Ficko-Blean and A. Lammerts van Bueren for their helpful comments and critical review of the manuscript.
| Footnotes |
|---|
1 These authors contributed equally to this work.
Claudia Kappen, Associate Editor
| References |
|---|
Abbott DW, Boraston AB. Specific recognition of saturated and 4,5-unsaturated hexuronate sugars by a periplasmic binding protein involved in pectin catabolism. J Mol Biol (2007).
Abbott DW, Hrynuik S, Boraston AB. Identification and characterization of a novel periplasmic polygalacturonic acid binding protein from Yersinia enterocolitica. J Mol Biol (2007).
Akashi H. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics (1994) 136:927–935.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 25:3389–3402.
Bolam DN, Xie H, White P, Simpson PJ, Hancock SM, Williamson MP, Gilbert HJ. Evidence for synergy between family 2b carbohydrate binding modules in Cellulomonas fimi xylanase 11A. Biochemistry (2001) 40:2468–2477.
Boraston A, Lammerts van Bueren A, Ficko-Blean E, Abbott DW, Forthcoming. Carbohydrate-protein interactions: carbohydrate-binding modules. Carbohydr Glycosci.
Boraston AB, Bolam DN, Gilbert HJ, Davies GJ. Carbohydrate-binding modules: fine-tuning polysaccharide recognition. Biochem J (2004) 382:769–781.
Boraston AB, McLean BW, Chen G, Li A, Warren RA, Kilburn DG. Co-operative binding of triplicate carbohydrate-binding modules from a thermophilic xylanase. Mol Microbiol (2002) 43:187–194.
Charnock SJ, Bolam DN, Nurizzo D, Szabo L, McKie VA, Gilbert HJ, Davies GJ. Promiscuity in ligand-binding: the three-dimensional structure of a Piromyces carbohydrate-binding module, CBM29-2, in complex with cello- and mannohexaose. Proc Natl Acad Sci USA (2002) 99:14077–14082.
Coutinho PM, Henrissat B. Carbohydrate-active enzymes: an integrated database approach. In: Recent advances in carbohydrate bioengineering—Gilbert GDHJ, Henrissat B, Svensson B, eds. (1999) Cambridge: The Royal Society of Chemistry. 3–12.
Cuff JA, Clamp ME, Siddiqui AS, Finlay M, Barton GJ. JPred: a consensus secondary structure prediction server. Bioinformatics (1998) 14:892–893.
Efron B, Halloran E, Holmes S. Bootstrap confidence levels for phylogenetic trees. Proc Natl Acad Sci USA (1996) 93:13429–13434.
Felsenstein J. Confidence limits on phylogenies: an approach using the bootstrap. Evolution Int J Org Evolution (1985) 39:783–791.
Ficko-Blean E, Boraston AB. The interaction of carbohydrate-binding module from a Clostridium perfringens N-acetyl-beta-hexosaminidase with its carbohydrate receptor. J Biol Chem (2006) 281:37748–37757.
Flint J, Bolam DN, Nurizzo D, Taylor EJ, Williamson MP, Walters C, Davies GJ, Gilbert HJ. Probing the mechanism of ligand recognition in family 29 carbohydrate-binding modules. J Biol Chem (2005) 280:23718–23726.
Flint J, Nurizzo D, Harding SE, Longman E, Davies GJ, Gilbert HJ, Bolam DN. Ligand-mediated dimerization of a carbohydrate-binding molecule reveals a novel mechanism for protein-carbohydrate recognition. J Mol Biol (2004) 337:417–426.
Gilkes NR, Warren RA, Miller RC Jr, Kilburn DG. Precise excision of the cellulose binding domains from two Cellulomonas fimi cellulases by a homologous protease and the effect on catalysis. J Biol Chem (1988) 263:10401–10407.
Gouet P, Robert X, Courcelle E. ESPript/ENDscript: extracting and rendering sequence and 3D information from atomic structures of proteins. Nucleic Acids Res (2003) 31:3320–3323.
Guillén D, Santiago M, Linares L, Pérez R, Morlon J, Ruiz B, Sánchez S, Rodríguez-Sanoja R. Alpha-amylase starch binding domains: cooperative effects of binding to starch granules of multiple tandemly arranged domains. Appl Environ Microbiol (2007) 73:3833–3837.
Hall TA. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser (1999) 41:95–98.
Hugouvieux-Cotte-Pattat N, Blot N, Reverchon S. Identification of TogMNAB, an ABC transporter which mediates the uptake of pectic oligomers in Erwinia chrysanthemi 3937. Mol Microbiol (2001) 41:1113–1123.
Hugouvieux-Cotte-Pattat N, Reverchon S. Two transporters, TogT and TogMNAB, are responsible for oligogalacturonide uptake in Erwinia chrysanthemi 3937. Mol Microbiol (2001) 41:1125–1132.
Kumar S, Tamura K, Nei M. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform (2004) 5:150–163.
Machovic M, Janecek S. Starch-binding domains in the post-genome era. Cell Mol Life Sci (2006a) 63:2710–2724.
Machovic M, Janecek S. The evolution of putative starch-binding domains. FEBS Lett (2006b) 580:6349–6356.
Machovic M, Svensson B, MacGregor EA, Janecek S. A new clan of CBM families based on bioinformatics of starch-binding domains from families CBM20 and CBM21. FEBS J (2005) 272:5497–5513.
Marin-Rodriguez MC, Orchard J, Seymour GB. Pectate lyases, cell wall degradation and fruit softening. J Exp Bot (2002) 53:2115–2119.
Nei M, Kumar S. Molecular evolution and phylogenetics (2000) New York: Oxford University Press.
Newstead SL, Watson JN, Bennet AJ, Taylor G. Galactose recognition by the carbohydrate-binding module of a bacterial sialidase. Acta Crystallogr Sect D Biol Crystallogr (2005) 61:1483–1491.
Perez-Bercoff A, Koch J, Burglin TR. LogoBar: bar graph visualization of protein logos with gaps. Bioinformatics (2006) 22:112–114.
Rodionov DA, Gelfand MS, Hugouvieux-Cotte-Pattat N. Comparative genomics of the KdgR regulon in Erwinia chrysanthemi 3937 and other gamma-proteobacteria. Microbiology (2004) 150:3571–3590.
Rozas J, Sanchez-DelBarrio JC, Messeguer X, Rozas R. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics (2003) 19:2496–2497.
Rzhetsky A, Nei M. A simple method for estimating and testing minimum-evolution trees. Mol Biol Evol (1992a) 9:945–967.
Rzhetsky A, Nei M. Statistical properties of the ordinary least-squares, generalized least-squares, and minimum-evolution methods of phylogenetic inference. J Mol Evol (1992b) 35:367–375.
Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol (1987) 4:406–425.
Sitnikova T. Bootstrap method of interior-branch test for phylogenetic trees. Mol Biol Evol (1996) 13:605–611.
Sitnikova T, Rzhetsky A, Nei M. Interior-branch and bootstrap tests of phylogenetic trees. Mol Biol Evol (1995) 12:319–333.
Taylor LE 2nd, Henrissat B, Coutinho PM, Ekborg NA, Hutcheson SW, Weiner RM. Complete cellulase system in the marine bacterium Saccharophagus degradans strain 2-40T. J Bacteriol (2006) 188:3849–3861.
Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res (1994) 22:4673–4680.
Tomme P, Van Tilbeurgh H, Pettersson G, Van Damme J, Vandekerckhove J, Knowles J, Teeri T, Claeyssens M. Studies of the cellulolytic system of Trichoderma reesei QM 9414. Analysis of domain function in two cellobiohydrolases by limited proteolysis. Eur J Biochem (1988) 170:575–581.
Vaaje-Kolstad G, Horn SJ, van Aalten DM, Synstad B, Eijsink VG. The non-catalytic chitin-binding protein CBP21 from Serratia marcescens is essential for chitin degradation. J Biol Chem (2005) 280:28492–28497.
Van Tilbeurgh H, Loontiens FG, Engelborgs Y, Claeyssens M. Studies of the cellulolytic system of Trichoderma reesei QM 9414. Binding of small ligands to the 1,4-beta-glucan cellobiohydrolase II and influence of glucose on their affinity. Eur J Biochem (1989) 184:553–559.
Weber B, Blanch L, Clements PR, Scott HS, Hopwood JJ. Cloning and expression of the gene involved in Sanfilippo B syndrome (mucopolysaccharidosis III B). Hum Mol Genet (1996) 5:771–777.
Willats WG, McCartney L, Mackie W, Knox JP. Pectin: cell biology and prospects for functional analysis. Plant Mol Biol (2001) 47:9–27.
Wong MV, Ho YW, Tan SG, Abdullah N, Jalaludin S. Isozyme and morphological characteristics of the anaerobic fungus Piromyces mae isolated from the duodenum, rumen and faeces of sheep. FEMS Microbiol Lett (1995) 134:9–14.
Wright F. The effective number of codons used in a gene. Gene (1990) 87:23–29.
Zdobnov EM, Apweiler R. InterProScan–an integration platform for the signature-recognition methods in InterPro. Bioinformatics (2001) 17:847–848.
Zhang J, Rosenberg HF, Nei M. Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc Natl Acad Sci USA (1998) 95:3708–3713.