MBE Advance Access originally published online on August 31, 2006
Molecular Biology and Evolution 2006 23(12):2288-2302; doi:10.1093/molbev/msl100
Research Articles
Insights into Early Extracellular Matrix Evolution: Spongin Short Chain Collagen-Related Proteins Are Homologous to Basement Membrane Type IV Collagens and Form a Novel Family Widely Distributed in Invertebrates





* Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Claude Bernard Lyon 1, Villeurbanne, France
Institut de Biologie et Chimie des Protéines (IBCP), UMR CNRS 5086, Université Claude Bernard Lyon 1, IFR128 BioSciences Lyon-Gerland, Lyon, France
E-mail: jy.exposito{at}ibcp.fr.
Abstract
Collagens are thought to represent one of the most important molecular innovations in the metazoan line. Basement membrane type IV collagen is present in all Eumetazoa and was found in Homoscleromorpha, a sponge group with a well-organized epithelium, which may represent the first stage of tissue differentiation during animal evolution. In contrast, spongin seems to be a demosponge-specific collagenous protein, which can totally substitute for an inorganic skeleton, such as in the well-known bath sponge. In the freshwater sponge Ephydatia mülleri, we previously characterized a family of short-chain collagens that are likely to be main components of spongins. Using a combination of sequence- and structure-based methods, we present evidence of remote homology between the carboxyl-terminal noncollagenous NC1 domain of spongin short-chain collagens and type IV collagen. Unexpectedly, spongin short-chain collagen-related proteins were retrieved in nonsponge animals, suggesting that a family related to spongin constitutes an evolutionary sister to the type IV collagen family. Formation of the ancestral NC1 domain and divergence of the spongin short-chain collagen-related and type IV collagen families may have occurred before the parazoan-eumetazoan split, the earliest divergence among extant animal phyla. Molecular phylogenetics based on NC1 domain sequences suggests distinct evolutionary histories for the spongin short-chain collagen-related and type IV collagen families that include spongin short-chain collagen-related gene loss in the ancestors of Ecdysozoa and of vertebrates. The fact that a majority of invertebrates encode spongin short-chain collagen-related proteins raises the important question of the possible function of its members. Considering the importance of collagens for animal structure and substratum attachment, both families may have played crucial roles in animal diversification.
Key Words: extracellular matrix, basement membrane, spongin, collagen, remote homology, metazoan evolution
Introduction
Basement membranes are sheet-like complexes of extracellular matrix structures underlying epithelial and endothelial tissues and surrounding muscle cells, peripheral nerves, and adipocytes. They play important functions as selective barriers for macromolecules and scaffold support for cells and in cell behavior (Erickson and Couchman 2000). [...] (α1–α6) have been identified, which are involved in the formation of heterotrimeric molecules with (α1)2α2 being the most abundant and ubiquitous isoform (Hudson et al. 1993). [...] chains and also for the initiation of triple helix formation (Boutaud et al. 2000).
Type IV is one of the vertebrate collagens that shows a wide distribution in invertebrates, from cnidarians to chordates. It has been described in a unique group of sponges, Homoscleromorpha, which presents a basement membrane-like structure (Boute et al. 1996). Homoscleromorpha has been included in the class Demospongiae for a long time. However, from recent phylogenetic analyses, Borchiellini et al. (2004) proposed that Homoscleromorpha may rather form one of the 4 main sponge taxa and should no longer be included in the taxon Demospongiae. Thus, a common morphological character of both Homoscleromorpha and Eumetazoa (nonsponge Metazoa), but not Demospongiae, is the presence of a basal membrane with type IV collagen. Other types of collagens were found in Demospongiae species. A family of collagens including a collagenous domain of approximately 120 Gly-Xaa-Yaa triplets and a carboxy (C)-terminal region sharing some similarities with nematode cuticular collagens and vertebrate fibril-associated collagens with interrupted triple helices has been reported in the sponge Microciona prolifera (Aho et al. 1993). In addition, a fibrillar collagen chain and a short-chain collagen family have been described in the freshwater sponge Ephydatia mülleri (Exposito and Garrone 1990; Exposito et al. 1991). Genes encoding these 2 collagen families are highly expressed during the early development of sponges from asexual buds (gemmules). In these developing animals, 2 collagen supramolecular structures have been defined, that is, the striated fibrils and the spongins. As in other animals, fibrillar collagens are involved in the formation of striated fibrils. For spongins, our previous data strongly suggested that they are made, at least in part, of the short-chain collagens (for the sake of simplicity, this sponge short-chain collagen family is termed "spongin short-chain collagens" in this article). Indeed, genes encoding the sponge short-chain collagens are highly expressed in cells located in the epithelial layer and around the inorganic skeleton, these cells being precisely those that secrete spongin (for an ultrastructural analysis, see fig. 7 in Exposito et al. 1991; http://www.jbc.org/cgi/reprint/266/32/21923). In freshwater sponges, these 2 cell types are similar and often join to form a continuous epithelium including the sponge basal surface and ramifying inside the animal body, around the skeleton (fig. 5, ibid). Interestingly, these sponge short-chain collagens also share similarities with nematode cuticular collagens (Exposito et al. 1990, 2002). Spongins, which have been defined as an exoskeleton (Garrone 1984), stick the animal to its substratum, link together the skeletal spicules, and are also present in the coat of gemmules. Although the spongin matrix has been defined as an exoskeleton (Garrone 1984), spongins exhibit different morphological aspects among demosponges and according to the tissues (the term "spongin" initially served to designate sponge structures made of microfibrils of about 10 nm in diameter). To date, it is not known whether all spongin assemblies are equivalent (Simpson 1984; Garrone 1985) or whether they are entirely made of sponge short-chain collagens. At the molecular level, the spongin short-chain collagens contain 2 collagenous domains encompassing 79 Gly-Xaa-Yaa triplets and 3 noncollagenous domains. Notably, the noncollagenous C-terminal domain has also been observed in 2 proteins of the sponge Suberites domuncula, with one of them including a short collagenous domain of 24 Gly-Xaa-Yaa triplets (Krasko et al. 2000; Schröder et al. 2000). At this point, it is important to indicate that, in the collagen nomenclature, noncollagenous domains have been named purely on the basis of their position from the C-terminus of the collagen chain, that is, the most C-terminal noncollagenous regions have been defined as NC1 domains although their sequences are often unrelated. In that respect, we previously noticed that the spongin short-chain collagen NC1 domain could be divided, like type IV NC1, into 2 similar subdomains sharing 26% identity (Exposito et al. 1990). However, except for 2 short regions, similarity between the NC1 domains of these 2 collagen families was not obvious. Now, with the availability of complete genomic sequences and improvements in bioinformatic tools, we examined this resemblance in detail.
Here, we show that a novel protein family related to spongin short-chain collagens is present in invertebrates (except Ecdysozoa), including nonsponge organisms, but is undetectable in vertebrates. Evidence from comparison of modular structure, careful examination of primary sequence features, and structural modeling of the NC1 domain of E. mülleri spongin short-chain collagen strongly suggests a common origin for spongin short-chain collagen and type IV collagen NC1 domains. Phylogenetic studies show that formation of the bipartite NC1 domain and divergence of the spongin short-chain collagen and type IV collagen families may have occurred early in the evolution of multicellular animals (most probably before the parazoan-eumetazoan split), possibly representing cases of ancient intra- and intergenic duplications in the evolutionary history of Metazoa. We propose that although type IV collagen and spongin short-chain collagen NC1 domains diverged appreciably (across more than 500 Myr of evolutionary time), they are components of modular proteins that most likely subserve related structural (stability of a macromolecular network) and biological (barriers and cellular attachment) functions in Metazoa.
Materials and Methods
Database Searching
Published sequences from sponge and type IV collagen chains were obtained using the Entrez Nucleotide database at the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/). The NC1 sequences of E. mülleri spongin short-chain collagen, spongin short-chain collagen-related proteins, and type IV collagen proteins were used to screen nucleotide databases located at NCBI using TBlastN (Altschul et al. 1997).
Molecular Modeling
The 3D model of the E. mülleri spongin short-chain collagen C-terminal domain based on the type IV collagen NC1 domain was built by using Geno3D, a comparative molecular modeling program for proteins (Combet et al. 2002). [...] α1 chain) was taken as template for molecular modeling. Sequence alignment of the spongin short-chain collagen NC1 domain based on type IV collagen proteins was validated by using phylogenetic and predicted secondary structure information (Geourjon et al. 2001). [...] carbons (Geourjon and Deleage 1995).
Alignment and Evolutionary Analysis
Sequences of type IV collagen and spongin short-chain collagen-related NC1 domains (either separately or in combination) were first aligned using ClustalW (Thompson et al. 1994) with BLOSUM alignment matrices and adjusted gap penalties (at the Pole BioInformatique Lyonnais). The resulting initial alignments were scanned using RASCAL (Thompson et al. 2003) and manually improved using the SeaView alignment editor (Galtier et al. 1996). When possible, structural information was incorporated in order to improve alignment accuracy. The alignments were constructed in a 2-stage manner: 1) alignments of complete NC1 domains were first produced [subdomain a plus subdomain b] and 2) the stretches corresponding to the different subdomains were separated, and the 2 resulting alignments were aligned together using information from the consensus sequences (subdomain a over subdomain b). Neighbor-joining (NJ or BIONJ) and maximum likelihood (ML) analyses were performed on the final alignments. For NJ, the trees were made using Phylo_win (Galtier et al. 1996) with pairwise gap removal, 1,000 bootstrap repetitions, and observed divergence or Poisson correction as distance methods. The PHYML v2.4.4 algorithm (Guindon and Gascuel 2003) was applied for the ML analyses, under the JTT or Dayhoff model of sequence evolution. Bootstrap support was based on 100 replicates using the programs SEQBOOT and CONSENSE (majority rule extended) of the PHYLIP package (Felsenstein 1996), to generate data replicates and consensus tree, respectively. Illustrations were drawn using the TreeView program (Page 1996) and then annotated using Adobe Illustrator. The numbers of synonymous (Ks) and nonsynonymous (Ka) nucleotide substitutions per site between homologous DNA sequences were estimated using an ML method as implemented in the codeml program (Goldman and Yang 1994). For each human-mouse and human-chicken orthologous gene pair, cDNA sequences were aligned in accordance with pairwise amino acid alignments.
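The two NJ distance measures named above, observed divergence and Poisson correction with pairwise gap removal, can be illustrated with a minimal Python sketch. This is not the Phylo_win implementation; the function names and the toy sequences are hypothetical, and the Poisson correction is taken in its standard textbook form d = -ln(1 - p), where p is the observed proportion of differing sites.

```python
import math


def observed_divergence(seq_a, seq_b, gap="-"):
    """Proportion of differing residues (p) between two aligned sequences.

    Sites where either sequence carries a gap are skipped, which is what
    "pairwise gap removal" means here (an assumption about the exact rule).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # drop this site for this pair only
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites shared by the two sequences")
    return differing / compared


def poisson_distance(p):
    """Poisson-corrected distance d = -ln(1 - p) for an observed divergence p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not taken from the NC1 alignments).
    p = observed_divergence("CPAGWKSLWT-G", "CPEGWRSL-TYG")
    print(f"observed divergence p = {p:.3f}")
    print(f"Poisson-corrected d   = {poisson_distance(p):.3f}")
```

The correction matters most for divergent pairs: as p grows, d increases faster than p because multiple substitutions at the same site become more likely, which is relevant for sequences as distant as the NC1 domains compared here.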
Results
Type IV Collagen and Spongin Short-Chain Collagen-Related NC1 Domains Display Distinct Phyletic Distribution but Share Similar Primary Structure Features
With the initial aim of searching for proteins related to sponge short-chain collagens in Porifera, we mined public databases with Blast using the short-chain collagen NC1 sequence from the freshwater sponge E. mülleri as seed. This search led to the discovery of cDNAs encoding putative spongin short-chain collagen-related proteins in S. domuncula and, quite unexpectedly, in a number of protostomes, with the notable exception of Ecdysozoa, and in invertebrate deuterostomes (table 1). The same analysis carried out with ecdysozoan (drosophila, mosquito, nematodes, and honeybee) or vertebrate (tetraodon, zebrafish, chicken, and human) genomes confirmed the absence of spongin short-chain collagen-related sequences in these animals, using this approach. Use of the NC1 sequences of the 2 spongin short-chain collagen-related proteins from S. domuncula or from the newly identified spongin short-chain collagen-related proteins gave the same result. In addition, HMM search against Swiss-Prot using profiles built with complete NC1 domains of spongin short-chain collagen-related proteins recovered only the E. mülleri spongin short-chain collagen sequence (P18503).
Our previous work revealed that, intriguingly, spongin short-chain collagen and type IV collagen NC1 domains exhibit the same bipartite architecture and regions with local similarities (Exposito et al. 1990). Indeed, it appears clearly from the schematic view presented in figure 1 that spongin short-chain collagen (related) and type IV collagen NC1 domains display similar lengths, have conserved cysteine residues, and are equally subdivided into 2 presumably homologous subdomains. Moreover, as in the sponge S. domuncula, other spongin short-chain collagen-related proteins can possess a collagenous region including several Gly-Xaa-Yaa triplets. The different NC1 domains have not been found in combination with other known protein domains. Thus, members of the spongin short-chain collagen-related and type IV collagen protein families could include a collagenous region in addition to an NC1 domain, indicating that they might have homeomorphic evolutionary relationships. We wondered whether spongin short-chain collagen-related proteins would be retrieved using type IV collagen NC1 sequences as seeds in Blast searches. This analysis confirmed the presence of type IV collagen in the sponge class Homoscleromorpha and in all eumetazoan lineages (table 1) but failed to recover any spongin short-chain collagen-related sequence. These data suggested that spongin short-chain collagen-related and type IV collagen NC1 domains were too distantly related to be detected by reciprocal Blast searches.
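A collagenous region of the kind mentioned above is recognizable as a run of Gly-Xaa-Yaa triplets, that is, a glycine at every third position. The following Python sketch shows one simple way to locate the longest such run in a protein sequence. It is an illustration only, not the procedure used in this study: the function name and the toy sequence are hypothetical, and the scan is strict, so a single imperfection in the repeat ends a run.

```python
def longest_gxy_run(sequence):
    """Find the longest uninterrupted run of Gly-Xaa-Yaa triplets.

    Returns (start_index, triplet_count) using 0-based indexing, or (-1, 0)
    when the sequence contains no complete triplet starting with glycine.
    """
    seq = sequence.upper()
    best_start, best_count = -1, 0
    for start in range(len(seq)):
        if seq[start] != "G":
            continue
        count = 0
        pos = start
        # Extend while a complete triplet beginning with Gly is available.
        while pos + 3 <= len(seq) and seq[pos] == "G":
            count += 1
            pos += 3
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count


if __name__ == "__main__":
    # Toy sequence (hypothetical): three triplets flanked by other residues.
    start, triplets = longest_gxy_run("MKTGPPGAKGEPCSW")
    print(f"longest Gly-Xaa-Yaa run: {triplets} triplets starting at index {start}")
```

Real collagenous domains often contain short interruptions, so a practical scan would need to tolerate them; the strict version is kept here for clarity.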
Type IV Collagen and Spongin Short Chain Collagen NC1 Domains Display Structural Similarities
Secondary Structure Predictions and Threading Experiments
Threading methods are 3D-structure prediction techniques that can reveal more distant relationships than conventional sequence-based methods such as Blast. We decided to take advantage of the solved structures of type IV collagen NC1 domain (Sundaramoorthy et al. 2002) [...] [(α1)2α2]2 hexamer structure (table 2). The best result was obtained for the hydra sequence Hma CN62 with the FUGUE analysis system at the 99% confidence level, indicating a remarkable compatibility of the secondary structures (Z-Score of 16.22; table 2). For E. mülleri spongin short-chain collagen NC1, FUGUE gave the best result with 1li1 at the 95% confidence level. We also used the tissue inhibitor of metalloproteinase (TIMP-1) sequence as input because a putative structural link between the type IV collagen NC1 domain and TIMP-1 was previously proposed (Netzer et al. 1998) [...] 16% identity between spongin short-chain collagen and human α1(IV) NC1 domains; table S1, Supplementary Material online), the NC1 domains of spongin short-chain collagen-related and type IV collagens may have similarities in their 3D structures.
Spongin Short-Chain Collagen NC1 Model Construction and Analysis
On the basis of the threading results and 2D predictions, we attempted to model the E. mülleri spongin short-chain collagen NC1 domain using 1li1 as template. A structural model of the spongin short-chain collagen NC1 monomer is presented in figure 2A, whereas the X-ray derived structure of a type IV collagen NC1 monomer is shown in figure 2B. The first observation that could be made is that the β-strands located near the triple-helical junction are clearly retrieved in the spongin short-chain collagen model. This suggests that this ordered region is likely to be rigid, a prerequisite for the initiation of a quaternary structure where NC1 trimers are expected to be attached to a rope-like triple helix. Sequence conservation information derived from the complete multiple alignments was mapped onto the spongin short-chain collagen NC1 model and the type IV collagen NC1 structure (fig. 2C and D). Apart from cysteine residues (addressed below and fig. 2A and B), 9 and 37 conserved residues were observed within the spongin short-chain collagen-related and type IV collagen sequence clusters, respectively, and 4 amino acids were perfectly conserved between spongin short-chain collagen-related and type IV collagen NC1 domains. Notably, in the structural context, the NC1 residues conserved in spongin short-chain collagen (fig. 2C, yellow), type IV collagen (fig. 2D, blue), and both sequences (fig. 2C and D, green) are mostly located in the proximity of the triple-helical junction region. This well-conserved region between spongin short-chain collagen and type IV collagen NC1 domains corresponds in type IV collagen to the β-sheet I, which is formed by the 3 noncontiguous strands (β1, β10, and β2) in both NC1 subdomains (Sundaramoorthy et al. 2002).
Next, the occurrence and relative positions of cysteine residues were investigated in the different NC1 domains. Similar cysteine residues within the type IV collagen NC1-a and NC1-b subdomains were named C1-C6 and C1'-C6', respectively (fig. 1). In type IV collagen NC1, each subdomain is stabilized by 3 intrachain disulfide bonds involving the following pairs: C1-C6, C2-C5, and C3-C4 (Siebold et al. 1988).
In type IV collagen NC1 hexamers, hydrophobic and hydrophilic interactions stabilize the protomer-protomer interface. Moreover, it has been shown that in the [(α1)2α2] mammalian type IV collagen network, the stability of the NC1 hexamer might be reinforced by a covalent cross-link involving the NC1 residues Met93 and Lys211 contributed by both protomers (Than et al. 2002). [...] However, it should be kept in mind that the structural model might not reflect the exact position of the cysteine residues within the actual spongin short-chain collagen (related) NC1 domain. More generally, great caution should be taken in interpreting these results obtained by comparative protein modeling, due to the low similarity between spongin short-chain and type IV collagen NC1 domains.
Phylogenetic Analysis
Comparison of modular organization, as well as conservation of critical residues and modeling data, provides strong evidence that spongin short-chain collagen and type IV collagen NC1 domains are structurally related and presumably share a common ancestor. Because spongin short-chain collagen-related and type IV collagen NC1 domains could reasonably be considered as homologous, multiple alignments were used as input for phylogenetic analyses using NJ and ML methods. Monophyly of the type IV collagens was extremely well supported in all analyses, as well as the grouping of E. mülleri spongin short-chain collagen and spongin short-chain collagen-related sequences. Sequences from sponges were usually retrieved at the base of the spongin short-chain collagen-related and type IV collagen groups (figs. 4, 5A, and 6). Hence, ancestral type IV collagen and spongin short-chain collagen-related NC1 domains must have arisen very early during metazoan evolution and diverged before separation of the poriferan and cnidarian lineages. As spongin short-chain collagen and type IV collagen may be ancient paralogues, we were interested in determining the evolutionary relationships within both protein families.
As previously shown (Mariyama et al. 1992) [...] α1-like (α1, α3, and α5 in vertebrates) and α2-like (α2, α4, and α6 in vertebrates) (figs. 4 and 5A). As one of the 2 type IV collagen sequences from Hydra (Hma DN13) could not be unambiguously placed in the different trees, it is unclear at this stage whether the α1-like/α2-like duplication already took place in this organism or if the emergence of the 2 type IV subfamilies occurred after the Cnidaria-Bilateria split. Hydra might also possess an as yet undiscovered α2-like chain or have lost the corresponding gene. In this regard, it is important to note that the type IV collagen chains of the sea anemone N. vectensis, which lies at the base of the Cnidaria, segregated with the sequences of Hydra magnipapillata, disfavoring the hypothesis of a third, α2-like gene, in Cnidaria (data not shown). Although supported by low bootstrap values, segregation of the ecdysozoan type IV collagen sequences inside the α1-like and α2-like groups was the most frequently retrieved tree topology (figs. 4 and 5A). Although the type IV α2-like NC1 sequence from Caenorhabditis elegans segregates with that of arthropods (figs. 4 and 5A), forming a clear ecdysozoan group, the nematode α1-like sequence (Cel P179) segregates with that of Ciona (Cin BW22). This may be due to the high divergence rate reported for nematode and ciona genes in general compared with other species (Mushegian et al. 1998) [...] α1-like sequences in arthropods (which produce longer branches compared with the α2-like cluster, see fig. 5A). Type IV collagen gene diversification occurred later, in the early evolution of vertebrates, most probably after their divergence from cephalochordates (6 genes were identified in Tetraodon nigroviridis, whereas only 2 genes were found in amphioxus). Previous studies have shown that, in mammals, the col4a1/col4a2, col4a3/col4a4, and col4a5/col4a6 gene pairs are located on 3 different chromosomes in a head-to-head fashion (Hudson et al. 1993) [...] α3 evolved before the duplication resulting in the α1/α5 pair in the α1-like cluster, and that duplication of an ancestral α4 gene predated the divergence of α2 and α6 in the α2-like clade. Inspection of the chromosomal location of type IV collagen genes in Gallus gallus revealed identical pairing. Our phylogenetic reconstruction using chicken and human orthologous chains unambiguously placed α3 at the base of the vertebrate α1-like cluster, but α6 sequences were often retrieved basal to the α2-like cluster, demonstrating phylogenetic incongruence (see figs. 4 and 5A for instance). An NJ analysis (fig. 5B) carried out with a reduced multiple alignment including vertebrate sequences from G. gallus, Mus musculus, and Homo sapiens produced a robust tree with a topology congruent with the proposed phylogenetic scheme. It is noteworthy that a significantly higher Ka/Ks ratio (table S2; Supplementary Material online) was found in a chicken-human NC1 comparison for col4a3 (0.11), compared with the median value for genes located in intermediate chromosomes (0.052) and, unexpectedly, compared with its neighboring gene col4a4 (0.045). Interestingly, this chicken α3 chain that shows evidence of relaxation from purifying selection already evolved autoimmune epitopes as it is recognized by Goodpasture autoantibodies (MacDonald et al. 2006) [...] α3 gene being the least constrained, although in this case the Ka/Ks ratio was not increased more than expected. Interestingly, "disease genes" have been reported to evolve with a higher Ka/Ks ratio (Smith and Eyre-Walker 2003) [...] α1/α2 pair, which corresponds to the ubiquitously expressed collagen IV chains, exhibited the lowest Ka/Ks ratio in both interspecific comparisons. This finding is consistent with previous data reporting stronger selective constraints for housekeeping and broadly expressed genes (Duret and Mouchiroud 2000) [...] α1 and α2 type IV collagen NC1 domains display more than 75% similarity in amino acids with their Pseudocorticium jarrei homologues, illustrating the substantial conservation of type IV collagen NC1. An NJ tree showing the possible interrelationships between the available spongin short-chain collagen-related sequences (fig. 6) suggests recent duplications of spongin short-chain collagen-related genes in several organisms, namely, hydra and sea urchin. Unfortunately, owing to the lack of sequence data, phylogeny of the spongin short-chain collagen-related family can hardly be resolved further.
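The Ka/Ks comparison reported above for the chicken-human NC1 sequences can be made explicit with a short Python sketch. It uses only the three ratios quoted in the text (0.11 for col4a3, 0.045 for col4a4, and the median of 0.052 for genes on intermediate chromosomes). The function names and the tolerance around 1 are assumptions made for illustration; they are not part of codeml or of the analysis in table S2.

```python
def selection_regime(omega, tolerance=0.05):
    """Conventional reading of omega = Ka/Ks.

    omega well below 1 indicates purifying selection, omega near 1 neutral
    evolution, and omega well above 1 positive selection. The tolerance
    around 1 is an arbitrary illustrative choice.
    """
    if omega < 0:
        raise ValueError("Ka/Ks cannot be negative")
    if omega < 1.0 - tolerance:
        return "purifying"
    if omega > 1.0 + tolerance:
        return "positive"
    return "neutral"


def fold_change(omega, reference):
    """How many times larger omega is than a reference Ka/Ks value."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return omega / reference


# Ka/Ks values quoted in the text for the chicken-human NC1 comparison.
NC1_CHICKEN_HUMAN = {"col4a3": 0.11, "col4a4": 0.045}
MEDIAN_INTERMEDIATE_CHROMOSOMES = 0.052

if __name__ == "__main__":
    for gene, omega in NC1_CHICKEN_HUMAN.items():
        fold = fold_change(omega, MEDIAN_INTERMEDIATE_CHROMOSOMES)
        print(f"{gene}: Ka/Ks = {omega}, {selection_regime(omega)}, "
              f"{fold:.2f}x the median")
```

Read this way, col4a3 remains under purifying selection (its ratio is far below 1) but at roughly twice the reference value, which is what "relaxation from purifying selection" describes.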
A novel series of multiple alignments was done using spongin short-chain collagen-related and collagen IV NC1 subdomains instead of complete domains, and NJ and ML phylogenetic trees were derived. For each protein family, sequences corresponding to the first subdomain clustered as one monophyletic group and sequences corresponding to the second subdomain formed a similar cluster (fig. 7). Trees built by using more accurate multiple alignments of either spongin short-chain collagen-related or collagen IV NC1 subdomain sequences also strongly supported the separate clustering of each subdomain. In other words, the subdomains of spongin short-chain collagen-related NC1 are more similar to one another than to the corresponding subdomain in collagen IV NC1. Likewise, there is significantly more similarity between the a and b subdomains of type IV collagen NC1 than there is between these subdomains and the corresponding subdomains of spongin short-chain collagen-related NC1. These observations could be interpreted as evidence that division into 2 homologous subdomains resulted from 2 independent tandem duplication events in the spongin short-chain collagen-related and type IV collagen clades. In favor of this hypothesis is the fact that contiguous subdomains are more distantly related in spongin short-chain collagen-related proteins than in type IV collagen subdomains (fig. 8). However, pairwise percent identity scores (tables S3 and S4, Supplementary Material online) and overall similarity values (see figs. 2C and D and 3) indicate that this may actually be due to faster divergence rates for spongin short-chain collagen-related NC1 sequences compared with type IV collagen NC1 sequences. Tree topologies demonstrating separate clustering of homologous spongin short-chain collagen-related and type IV collagen subdomains were likely in light of the great amino acid divergence between each family. As NC1 domains of spongin short-chain collagen-related and type IV collagen chains are both N-terminally flanked by a triple helix, the hypothesis of a single, initial duplication resulting in one complete NC1 sequence subsequently fused to a triple-helical motif therefore seems more parsimonious.
Discussion
To prospect for the presence of proteins including a specific module in a species, use of Blast programs is successful in most circumstances. However, as exemplified in this work, Blast analyses may sometimes be insufficient to trace the natural history of a protein module (Schmid and Tautz 1997).
Extracellular matrix proteins are mainly multimodular and are often defined as mosaic entities with each type of module present in multiple copies in one protein and/or in several protein families. Although domains used in the building of extracellular proteins are usually domains of great mobility (Tordai et al. 2005; Patthy 1999), the spongin short-chain collagen/type IV collagen NC1 does not appear to be a mobile domain, that is, it is retrieved from the available sequences with a unique domain partner (the collagen triple helix) and in a conserved architecture. This domain therefore contributed to an ancient multimodular protein, the collagen, but apparently no longer participated in novel domain combinations during metazoan evolution. Interestingly, the situation is analogous for the C-propeptide in fibrillar collagen which, like the type IV collagen NC1 domain, is involved in chain selection and in initiation of triple helix formation (Lees et al. 1997; Myllyharju and Kivirikko 2001).
A model for the evolution of the spongin short-chain collagen/type IV collagen NC1 domain is presented in figure 9. In this scenario, the sequence encoding the NC1 structural unit made up of 2 homologous subdomains was produced by ancient tandem duplication. This event, leading to the 2-fold repeated structural pattern observed in modern spongin short-chain collagen-related proteins and type IV collagen NC1 domains, probably occurred in the very early evolution of animals, before the parazoan-eumetazoan split. The nature (and possible function) of the ancestral sequence, which gave rise to the NC1 internal repeat, is not known. Moreover, our phylogenetic analysis did not allow us to infer which of the subdomains (a or b) was the primordial building block. The structure of the putative protodomain, rich in β-sheets, raises the possibility that it might have already been involved in protein-protein interactions and oligomerization at the extracellular level (Wang and Hecht 2002; Siepen et al. 2003). Relevant to this is the fact that the NC1 subdomains of spongin short-chain collagen-related proteins and collagen IV are disulfide-bonded β-rich polypeptides, these features being common in extracellular modules that face the oxidative environment of the extracytoplasmic space (Martin et al. 1998). Partition of modern NC1 domains into 2 subdomains seems to constitute an essential feature for both structure and function, as we were not able to retrieve any sequences encoding isolated subdomains. Crystallographic data indicate that the β-strands located near the triple-helix junction or close to the hexamer interface are contributed by different subdomains. Therefore, structural requirements driving trimeric association and oligomerization may be sufficient to explain why both subdomains are needed for the NC1 domain in order to achieve its function.
The initial tandem replication event was followed by gene duplication creating 2 copies that diverged to become the spongin short-chain collagen-related and type IV collagen ancestral genes. As domain combinations are usually formed only once (Vogel et al. 2005) [...]
The spongin short-chain collagen-related/collagen IV duplicated genes underwent different fates during eumetazoan evolution, including lineage-specific gene duplications (e.g., spongin [Exposito et al. 1991], spongin short-chain collagen-related genes in hydra and sea urchin, see figs. 4 and 7). This scenario of gene duplication from an ancestral half-domain sequence, followed by subsequent gene duplication and diversification, is reminiscent of the evolution of β/α barrels in the microbial world (Lang et al. 2000).
Importantly, our data mining analysis suggests that the spongin short-chain collagen-related gene has been lost in the common ancestor of the ecdysozoan lineage and in the common ancestor of the vertebrate lineage. Analysis of priapulids, which are placed basal to nematodes and arthropods in the Ecdysozoa, and of early vertebrates (e.g., hagfishes and lampreys) will help provide clues to these possible events of gene loss. It is intriguing that vertebrates, which produce mineralized tissues, and moulting invertebrates, which have an external nonmineralized skeleton (note that arthropods have chitin-made cuticles while the nematode exoskeleton is formed by non-type IV collagen proteins), may be devoid of spongin short-chain collagen-related proteins. If this apparent absence is not the mere result of missing data, it is tempting to speculate that spongin short-chain collagen-related ancestral gene loss in the ancestors of these organisms played a role in differentiating such specialized tissues. In any case, spongin short-chain collagen-related genes are likely to be less essential than collagen IV genes because deletions might have eliminated them from several eumetazoan genomes. Alternatively, organisms from these lineages may contain spongin short-chain collagen-related genes that are too divergent to be uncovered by sequence analysis using current tools, that is, these genes have evolved rapidly in vertebrates and Ecdysozoa and no longer have recognizable similarity. As a general rule, the primary structure of complete spongin short-chain collagen-related NC1 domains has been poorly conserved, even in closely related species, whereas the sequences of type IV collagen NC1 have been better preserved during metazoan evolution. Thus, sequence evolution rates and propensity for gene loss may be correlated in the system described here, with spongin short-chain collagen-related sequences evolving faster than collagen IV NC1 sequences. This observation is in line with recent works suggesting that weakly constrained proteins are lost during evolution significantly more often than highly constrained ones (Kamath et al. 2003; Krylov et al. 2003).
Assuming that spongin short-chain collagen-related proteins are involved in extracellular attachment, like spongin short-chain collagens, the marked sequence divergence between the different spongin short-chain collagen-related NC1 domains may reflect the diversity of substrata available for attachment in various invertebrate lineages. In that respect, it would be of great interest to determine the expression profile of spongin short-chain collagen-related genes (vs. type IV collagen) and the functions of their encoded products, both in sponges and, most importantly, in nonsponge organisms (e.g., Hydra, Ciona, and amphioxus). What function could proteins related to spongin short-chain collagens have in protostomes and invertebrate deuterostomes? To date, expression pattern data are available only for Ciona intestinalis and were generated by large-scale automated in situ hybridization (http://ghost.zool.kyoto-u.ac.jp/). These experiments reveal that a C. intestinalis spongin short-chain collagen-related gene (Cin BW46, expressed sequence tag cluster CLSTR03436r1) is expressed in juvenile animals, in epithelial cells, and in body wall muscle, but they do not indicate the tissue distribution of the corresponding protein. In the absence of experimental data, it is not obvious what role spongin short-chain collagen-related proteins could play in organisms that possess basement membranes. Although they could be suspected of involvement in cell-matrix adhesion, intercellular cohesion, and organismal organization, spongin short-chain collagen-related proteins may also serve more specialized functions. Another open question is whether spongin short-chain collagen-related NC1 domains are involved in protomer formation and assembly into hexamers. Determining the precise subcellular localization and interaction partners of spongin short-chain collagen-related proteins, together with their biochemical characterization, should shed light on these important issues.
| Conclusion |
|---|
Spongin short-chain collagens and type IV collagen are among the oldest modular proteins unique to Metazoa because they are already present in Porifera and Cnidaria. In modern multicellular animals, spongin gives a sponge its flexibility and support, whereas collagen gives both properties to a tissue. Spicules and extracellular matrix both integrate cells into 3D structures, emphasizing the functional analogy between substratum attachment and basement membrane attachment. In this work, we report the discovery, in a number of nonsponge invertebrates, of a novel family of proteins related to sponge short-chain collagens, whose NC1 domain may be homologous to that of type IV collagens. Remote homology detection was followed by phylogenetic analysis, which revealed that type IV collagens and spongin short-chain collagen-related proteins have had separate evolutionary histories. Because extracellular matrix attachment is thought to have played crucial roles in the evolution of multicellular animals, deciphering the phylogeny and function of these proteins is of considerable interest.
| Supplementary Material |
|---|
Supplementary Tables S1-S4 and Figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
A.A. is a recipient of a fellowship from the Centre National de la Recherche Scientifique. V.N. is supported by a grant from Institut National de la Recherche Agronomique.
| Footnotes |
|---|
David Irwin, Associate Editor
1 Present address: Apoptosis and Oncogenesis Laboratory, Institut de Biologie et Chimie des Protéines (IBCP), UMR CNRS 5086, Université Claude Bernard Lyon 1, IFR128 BioSciences Lyon-Gerland, 7, Passage du Vercors, Lyon, France
| References |
|---|







