MBE Advance Access originally published online on October 8, 2008
Molecular Biology and Evolution 2009 26(1):71-84; doi:10.1093/molbev/msn228
Research Articles
The Secretin GPCRs Descended from the Family of Adhesion GPCRs

* Department of Neuroscience, Functional Pharmacology, Uppsala University, Uppsala, Sweden
Department of Neuroscience, Developmental Genetics, Uppsala University, Uppsala, Sweden
E-mail: helgi.schioth{at}neuro.uu.se.
Abstract
The Adhesion G-protein–coupled receptors (GPCRs) are the most complex gene family among GPCRs with large genomic size, multiple introns, and a fascinating flora of functional domains, though the evolutionary origin of this family has been obscure. Here we studied the evolution of all class B (7tm2)–related genes, including the Adhesion, Secretin, and Methuselah families of GPCRs with a focus on nine genomes. We found that the cnidarian genome of Nematostella vectensis has a remarkably rich set of Adhesion GPCRs with a broad repertoire of N-terminal domains although this genome did not have any Secretin GPCRs. Moreover, the single-celled and colony-forming eukaryotes Monosiga brevicollis and Dictyostelium discoideum contain Adhesion-like GPCRs although these genomes do not have any Secretin GPCRs suggesting that the Adhesion types of GPCRs are the most ancient among class B GPCRs. Phylogenetic analysis found Adhesion group V (that contains GPR133 and GPR144) to be the closest relative to the Secretin family in the Adhesion family. Moreover, Adhesion group V sequences in N. vectensis share the same splice site setup as the Secretin GPCRs. Additionally, one of the most conserved motifs in the entire Secretin family is only found in group V of the Adhesion family. We suggest therefore that the Secretin family of GPCRs could have descended from group V Adhesion GPCRs. We found a set of unique Adhesion-like GPCRs in N. vectensis that have long N-termini containing one Somatomedin B domain each, which is a domain configuration similar to that of a set of Adhesion-like GPCRs found in Branchiostoma floridae. These sequences show slight similarities to Methuselah sequences found in insects. The extended class B GPCRs have a very complex evolutionary history with several species-specific expansions, and we identified at least 31 unique N-terminal domains originating from other protein classes. 
The overall N-terminal domain structure, however, concurs with the phylogenetic analysis of the transmembrane domains, thus enabling us to track the origin of most of the subgroups.
Key Words: evolution GPCR G-protein 7TM EGF GRAFS
Introduction
G-protein–coupled receptors (GPCRs) are one of the largest protein families in mammalian genomes with about 800 members in the human genome (Lagerstrom and Schioth 2008
GPCRs similar to those in mammals are not found in bacteria, but mammalian-like GPCRs can be found in almost any eukaryotic organism. This includes plants (Devoto et al. 1999; Josefsson 1999), insects (Hill et al. 2002), fungi (Versele et al. 2001), and the amoeba Dictyostelium discoideum (Prabhu and Eichinger 2006). The light-sensing 7TM protein found in bacteria, the bacterial rhodopsin, does not signal through G proteins and has very low sequence identity to GPCRs (Okada et al. 2001). It is thus presently unclear whether this protein has a common origin with GPCRs in eukaryotic organisms. The five main families (see above) are present in considerable numbers in most bilaterian species including Caenorhabditis elegans, Drosophila melanogaster, Anopheles gambiae, Strongylocentrotus purpuratus, Ciona intestinalis, and the vertebrate species, although the hierarchical relationship between these five main families has not been determined (Fredriksson and Schioth 2005; Whittaker et al. 2006; Schioth et al. 2007). We recently observed that the five main families are found in large numbers in Branchiostoma floridae (Nordstrom et al. 2008). There are several lineage-specific groups of GPCRs, for example, the nematode chemoreceptors, the gustatory receptors from insects, the odorant receptors from D. melanogaster, Mildew-resistance locus O (MLO) receptors in plants, fungal pheromone receptors (STE2 and STE3) from yeast, and the Methuselah in D. melanogaster, but none of these families are found in vertebrates. Some similarities have been identified between the Secretin, Adhesion, and Methuselah GPCRs (Harmar 2001), and many domain databases (7tm_2 in Pfam [Finn et al. 2006], GPCR_secretin in Interpro [Zdobnov and Apweiler 2001], and 7tm_2 in the National Center for Biotechnology Information [NCBI] Conserved domain database [CDD] [Marchler-Bauer and Bryant 2004]) use sequences from all three groups to form common fingerprints for search tools despite the fact that the functional characteristics of these families are highly divergent. Recently, Cardoso et al. (2005) studied evolutionary events that shaped the different branches of the Secretin GPCRs and clearly found the relationship between the Secretin receptors in C. elegans and the vertebrate Secretin family subbranches. The evolution of the Adhesion family has, however, not been studied in detail. The main reason for this is that the Adhesion family is, by far, the most complex group of GPCR sequences. These GPCRs are very large and have a large number of exons. Alternative splicing and complex processing steps, including the putative intracellular cleavage at the GPCR proteolytic site, are also contributing factors to their complexity (Bjarnadottir et al. 2007). The evolutionary relationship between the Secretin, Adhesion, and Methuselah GPCRs and other GPCRs that may be related to the 7tm2 domain (Finn et al. 2006) or the extended GPCR class B (also termed class 2) has not been resolved.
Here we investigated the evolution of the entire set of class B–related genes, including the Adhesion, Secretin, and Methuselah families of GPCRs in nine genomes. Special emphasis has been placed on the genomes of Tetraodon nigroviridis, D. melanogaster, and C. elegans together with the prevertebrate sea anemone Nematostella vectensis, whose genome was recently released (Putnam et al. 2007). We have also revisited the "secretin-like" GPCRs from the social amoeba D. discoideum and the choanoflagellate Monosiga brevicollis genomes (King et al. 2003; Prabhu and Eichinger 2006). These are both single-celled and colony-forming eukaryotes and are thereby interesting from an evolutionary perspective due to their positions in the evolutionary tree as predecessors to multicellular organisms. We have put in considerable effort to manually curate the sequences, which is a prerequisite for correct domain and splice site identification. Therefore, this study provides the most comprehensive overview of the evolutionary events that shaped the extended family of class B (family 2) GPCRs.
Materials and Methods
Sequence Retrieval and Editing
Full-length Adhesion sequences for Homo sapiens, Mus musculus, and Gallus gallus were downloaded according to information from previously published articles (Bjarnadottir et al. 2004
Domain Search
The N-termini of the sequences were assembled in the same manner as mentioned above and then searched for their domain extent in rps-Blast with a cutoff e value of 0.1. The Methuselah sequences did not show any domain-specific areas in rps-Blast. These sequences were therefore searched with InterProScan at http://www.ebi.ac.uk/InterProScan (Zdobnov and Apweiler 2001).
Confirmation and Classification
The 7TM regions were aligned to NCBI's nonredundant (nr) database with Blast, and each of the first five hits had to belong to the extended family of class B (Secretin, Adhesion, or Methuselah). All sequence hits in nr were extracted and aligned to the Pfam data set. They were considered as class B transcripts if they contained a 7tm_2 domain not overlapped by any higher scoring hit. The criteria for transcripts from D. discoideum, M. brevicollis, and the Methuselah family in D. melanogaster were lowered due to their divergence from other species. They were kept on the same basis as those used for categorization of the nr transcripts above. These sequences were then classified into the Adhesion, Secretin, or Methuselah family on the basis of a Blast search against our internal database consisting of all human GPCRs together with the nematode chemosensory receptors, the gustatory receptors from insects, the odorant receptors from D. melanogaster, MLO receptors from plants, fungal pheromone receptors (STE2 and STE3) from yeast, vomeronasal receptors (V1R and V3R), and cAMP receptors from D. discoideum. A sequence was assigned to a family when three or more of its top five hits belonged to that family.
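The voting rule in the last step can be sketched as follows. This is only a minimal illustration, assuming each Blast hit has already been labeled with a family name; the function name and the example hit lists are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter

def classify_by_top_hits(hit_families, top_n=5, min_votes=3):
    """Return the family supported by at least `min_votes` of the first
    `top_n` hits, or None if no family reaches that threshold."""
    votes = Counter(hit_families[:top_n])
    if not votes:
        return None
    family, count = votes.most_common(1)[0]
    return family if count >= min_votes else None

# Hypothetical hit lists, ordered from best to worst hit.
clear_case = ["Adhesion", "Adhesion", "Secretin", "Adhesion", "Methuselah"]
ambiguous_case = ["Adhesion", "Secretin", "Secretin", "Adhesion", "Methuselah"]

print(classify_by_top_hits(clear_case))
print(classify_by_top_hits(ambiguous_case))
```

With a threshold of three out of five, at most one family can qualify, so no tie-breaking rule is needed; a sequence without such a majority is left unclassified in this sketch.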
Sequence Identity
Global alignments between all truncated sequences were made using the needle program from the EMBOSS package (Rice et al. 2000). The result was compiled using an in-house parser written in Python and viewed and manipulated using the spreadsheet program in the OpenOffice.org suite.
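The in-house parser is not described further. As a rough sketch of the quantity being compiled, percent identity for one pairwise global alignment could be computed as below. The choice of denominator (the full alignment length, gaps included) is an assumption made for illustration, and the two aligned strings are invented.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the
    same residue. Columns containing a gap never count as a match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b)
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)

# Invented example: 5 identical columns out of 7.
identity = percent_identity("ACDE-FG", "ACDQWFG")
print(f"{identity:.1f}% identity")
```

Other conventions divide by the length of the shorter sequence or by the number of ungapped columns, so values computed this way are only comparable within one convention.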
Phylogenetic Analysis
The protein sequences of GPCRs chosen for phylogenetic analysis were aligned using Kalign (Lassmann and Sonnhammer 2005) with default parameter settings. The alignments were retranslated to nucleotides using the genomic sequence procured when assembling the GPCR and correcting its splice sites. We used Markov Chain Monte Carlo (MCMC) analysis with MrBayes (Ronquist and Huelsenbeck 2003), but to compensate for the liberal posterior probabilities of Bayesian analysis (Suzuki et al. 2002; Alfaro et al. 2003; Douady et al. 2003), we chose to implement the bootstrapped MCMC analysis suggested by Douady et al. Hence, the aligned nucleotide file was bootstrapped 200 times using SEQBOOT from the PHYLIP 3.67 package (Felsenstein 2004), and each alignment was analyzed with the general time reversible model with a proportion of invariable sites and a gamma-shaped distribution of rates across sites in MrBayes (nst = 6 and rates = invgamma). Each analysis ran 500,000 generations. Every hundredth tree from the last 100,000 generations was sampled, and a consensus tree was constructed using CONSENSE from the PHYLIP 3.67 package with the majority rule. Maximum likelihood branch lengths were then calculated with DNAML, also from the PHYLIP 3.67 package, and the tree was plotted in the Win32 version of TreeView 1.6.6 (Page 1996). The resulting tree was edited in Inkscape.
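To make the sampling scheme concrete, the sketch below first works out how many trees the stated settings would yield and then shows the majority-rule idea on toy data. It is not a reimplementation of CONSENSE: trees are represented simply as sets of clades, the taxon names are invented, and the exact number of sampled trees per replicate may differ by one depending on how the endpoints are counted.

```python
from collections import Counter

# Sampling arithmetic implied by the settings in the text.
replicates = 200                 # bootstrap replicates from SEQBOOT
sampled_generations = 100_000    # last 100,000 of 500,000 generations
sample_every = 100               # every hundredth tree
trees_per_replicate = sampled_generations // sample_every
total_trees = replicates * trees_per_replicate

def majority_rule_clades(trees, threshold=0.5):
    """Return {clade: frequency} for clades present in more than
    `threshold` of the input trees. Each tree is an iterable of clades,
    and each clade is a frozenset of taxon names."""
    counts = Counter()
    for clades in trees:
        counts.update(set(clades))
    n = len(trees)
    return {c: k / n for c, k in counts.items() if k / n > threshold}

# Toy trees over invented taxa A-D.
toy_trees = [
    {frozenset("AB"), frozenset("ABC")},   # ((A,B),C),D
    {frozenset("AB"), frozenset("ABD")},   # ((A,B),D),C
    {frozenset("AC"), frozenset("ABC")},   # ((A,C),B),D
]
support = majority_rule_clades(toy_trees)
print(trees_per_replicate, total_trees)
print({"".join(sorted(c)): round(f, 2) for c, f in support.items()})
```

Clades that fall at or below the 50% threshold are collapsed, which is why weakly supported relationships appear as unresolved nodes in a majority-rule tree.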
Splice Sites
The positions of splice sites in the 7TM region were extracted by aligning the edited and truncated sequences to their corresponding genome using BLAST. All sequences were aligned with Kalign. Conserved splice sites were identified by viewing the positions of the splice sites and the alignment together in Jalview. Data were manipulated with OpenOffice.
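Comparing splice sites across sequences requires a shared coordinate system. A minimal sketch of one way to do this is given below: residue positions in each ungapped sequence are converted to alignment columns, and columns shared by all sequences are reported. The study did this by visual inspection in Jalview; the function names and the two short aligned sequences here are invented for illustration.

```python
def residue_to_column(aligned_seq, gap="-"):
    """List whose i-th entry is the 0-based alignment column of the
    i-th (0-based) ungapped residue."""
    return [col for col, ch in enumerate(aligned_seq) if ch != gap]

def splice_columns(aligned_seq, splice_residues, gap="-"):
    """Convert splice-site residue indices to alignment columns."""
    mapping = residue_to_column(aligned_seq, gap)
    return sorted(mapping[r] for r in splice_residues)

def shared_columns(per_sequence_columns):
    """Alignment columns at which every sequence has a splice site."""
    sets = [set(cols) for cols in per_sequence_columns]
    return sorted(set.intersection(*sets)) if sets else []

# Invented example: the same splice site sits at residue 2 in seq1
# and residue 3 in seq2 because seq1 has a one-residue deletion.
seq1_cols = splice_columns("MK-LLV", [2])
seq2_cols = splice_columns("MKALLV", [3])
print(seq1_cols, seq2_cols, shared_columns([seq1_cols, seq2_cols]))
```

An exact column match is a strict criterion; sites described in the Results as shifted by a few residues would need a tolerance window instead.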
Results
Sequence Retrieval and Classification
We collected 250 sequences from the extended class B GPCRs in H. sapiens, M. musculus, G. gallus, T. nigroviridis, D. melanogaster, C. elegans, N. vectensis, M. brevicollis, and D. discoideum (see fig. 1 and supplementary fig. 1, Supplementary Material online). GPCRs from H. sapiens (33 Adhesion and 15 Secretin), M. musculus (29 Adhesion and 15 Secretin), and G. gallus (18 Adhesion and 11 Secretin) were retrieved from previously published work (Fredriksson et al. 2003
Phylogenetic Analysis
We constructed a phylogenetic tree of the Adhesion GPCRs (fig. 2 and supplementary fig. 2, Supplementary Material online) using a bootstrapped nucleotide method with MrBayes and estimated the branch lengths using DNAML from the PHYLIP 3.67 package. According to Cardoso et al. (2005)
Group I consists of three lectomedin receptors (LECs) and one EGF-TMVII-latrophilin-related receptor (ETL) in H. sapiens. Gallus gallus is missing LEC1 and T. nigroviridis LEC2 although there is one T. nigroviridis GPCR, TnLec3-3, basal to the other LECs. There is also one C. elegans sequence, A_ce4k, which clusters with LEC3. Group II holds EGF-like module–containing receptors (EMRs) and a CD97 antigen receptor (CD97). Homo sapiens has an expansion of four EMRs compared with T. nigroviridis, which has one ortholog in TnCD97-1e. These two species also have one ortholog of CD97 each. Gallus gallus and the prevertebrate species are not present in this group. Groups I and II cluster together in the tree, and basal to both groups we find three prevertebrate sequences. Closest to groups I and II is one N. vectensis sequence, Nv24490, and basal to this and the two groups is a clade consisting of one D. melanogaster GPCR, sDm2, and one C. elegans GPCR, A_ce3. Group III is made up of GPR123, GPR124, and GPR125 in H. sapiens. All the vertebrates we searched have the same three sequences found in H. sapiens, whereas T. nigroviridis has a duplication of one of the three, which is placed basal to the other vertebrate sequences. Basal to group III, there are five sequences: sdm_6 from D. melanogaster, Mb_37365 from M. brevicollis, and Nv238736, Nv199785, and Nv200012 from N. vectensis. Group IV is represented by three EGF LAG seven-pass G-type receptors (CELSRs) in H. sapiens, whereas G. gallus has two orthologs, one to CELSR1 and one to CELSR3. There are also two sequences in T. nigroviridis, one basal to CELSR1 and one that clusters with the human CELSR2. Basal to group IV, there are sdm1 from D. melanogaster and Nv84228 from N. vectensis. Group V consists of GPR133 and GPR144 in H. sapiens, and G. gallus has the same configuration. Tetraodon nigroviridis only contains a GPR144 ortholog. In group VI, H. sapiens has a repertoire of five GPCRs, GPR110, GPR111, GPR113, GPR115, and GPR116, whereas G. gallus has two: an ortholog to GPR116 and ggGPR115, which falls basal to GPR111 and GPR115 in mammals. Tetraodon nigroviridis also has two group VI sequences, TnGPR116-1 basal to GPR110, GPR111, and GPR115 and TnGPR116-2 basal to the whole group and thereby closest to GPR113. We could not find evidence for group VI outside of the vertebrate species. Group VII consists of the brain-specific angiogenesis-inhibitory receptors (BAIs), which are present in three copies in vertebrates. The G. gallus ortholog of BAI2 is not included in the tree as its 7tm-domain was incomplete. According to our tree, BAI1 falls basal to BAI2 and BAI3. The human members of group VIII are GPR56, GPR64 (or He6), GPR97, GPR112, GPR114, and GPR126. Only GPR112 and GPR126 are found in G. gallus. GPR56 and GPR114 form one subbranch together with GPR97. Basal to this clade are TnGPR97-1 and TnGPR56-1 from T. nigroviridis. Tetraodon nigroviridis also has one GPR126 ortholog in TnGPR126-1. Between GPR64 and GPR112 are TnGPR112-3 and TnGPR112-2 from T. nigroviridis, and basal to the entire group is TnGPR126-3 from T. nigroviridis.
Among the ungrouped Adhesion GPCRs, VLGR1 and GPR128 are both present in H. sapiens and G. gallus and form two stable clades. Mb_21626 from M. brevicollis is located basal to the VLGR1 node. The M. brevicollis GPCR Mb_22592 is placed in the collapsed node holding the N. vectensis expansion, groups III, VI, and VIII. The D. discoideum GPCR, dd_1, shares a node with sdm8 and sdm9 from D. melanogaster. There are six sequences, A_ce5k from C. elegans and Nv81203, Nv125574, Nv133486, Nv145494, and Nv217885 from N. vectensis, basal to the clade consisting of group V and the Secretin GPCRs but with bootstrap values below 50%. The N. vectensis GPCR Nv78835 is placed basal to group VII with a bootstrap value of 25%.
We constructed a phylogenetic tree of the Secretin GPCRs with the same method (see fig. 3 and supplementary fig. 3, Supplementary Material online). According to the Adhesion tree (see fig. 2 and supplementary fig. 2, Supplementary Material online), group V was the Adhesion group closest to the Secretin family, and therefore, we included it in the tree together with six N. vectensis GPCRs showing similarities to group V. The human Frizzled GPCRs were used as an outgroup. The tree places Ce_C18B12 and Ce_13B9.4 from C. elegans together with one D. melanogaster sequence (CG13758-PA) at the root of the Secretin clade. The pair of CG8422-PA and CG12370-PA from D. melanogaster is also placed at the root. The most basal vertebrate GPCRs are present in group A (CRHRs) in the same node, followed by group E (CALCRs), D (parathyroid hormone receptors [PTHRs]), and finally B (growth hormone–releasing hormone receptor [GHRHR], secretin receptor [SCTR], vasoactive intestinal peptide receptors [VIPRs], and pituitary adenylyl cyclase–activating protein [PACAP]) and C (glucagon-like peptide receptors [GLPRs], glucagon receptor [GCGR], and gastric inhibitory polypeptide receptor [GIPR]). Group A consists of CRHR1 and CRHR2, which are present in both mammals and G. gallus. Tetraodon nigroviridis has one ortholog to CRHR1 in Tn_22293. In group B, the mammals and G. gallus have the complete set of GHRHR, SCTR, VIPR1, VIPR2, and PACAP. The mammalian GHRHRs and the G. gallus homolog do not cluster in the same node. There is one T. nigroviridis sequence, Tn_12645, basal to the mammalian GHRHRs, whereas there are two T. nigroviridis sequences, Tn_11830 and Tn_34494, that group basally with the G. gallus GHRHR. Besides this, T. nigroviridis has a complete set with duplicate versions of PACAP, VIPR1, and VIPR2. In group C, all vertebrates have one ortholog of GLP1R, but G. gallus lacks GLP2R. Tn_19752 and Tn_23238 from T. nigroviridis are placed basal to a clade of GCGR and GIPR and the clade of GLP1R. The mammals have one ortholog each of GCGR and GIPR, whereas G. gallus lacks GCGR. For group D, the mammals and G. gallus both have one ortholog of PTHR1 and PTHR2. The T. nigroviridis GPCR Tn_14484 falls basal to the mammalian PTHR2s. There is also one node holding ggPTHR2, Tn_16129, a clade of ggPTHR1 and Tn_35038, and a clade of the mammalian PTHR1s. Group E is the only Secretin group that has prevertebrate GPCRs associated with it. It has two D. melanogaster GPCRs, CG4395-PA and CG32843-PA, basal to the other GPCRs. Both G. gallus and the mammals have one ortholog each of CALCR and CALCRL. There are also four T. nigroviridis GPCRs: Tn_32973 basal to CALCR, and Tn_10414, Tn_16864, and Tn_20073 basal to CALCRL.
We also calculated a tree containing the Methuselah GPCRs, the new expansion of N. vectensis GPCRs and the Adhesion GPCRs in groups III, VI, and VIII (see fig. 3 and supplementary fig. 3, Supplementary Material online). Because the domain structure (see below) of the N. vectensis expansion is similar to an expansion in B. floridae (Nordstrom et al. 2008
Domain Search
We searched all Adhesion GPCRs for conserved domains, and the domains of the N-terminus are presented together with abbreviations in figure 4. The domain structure of H. sapiens and M. musculus has been thoroughly described in our previous paper (Bjarnadottir et al. 2007).
Gallus gallus
In group I, ggETL has a GPS domain; ggLEC2 has a GPS, an HBD, an OLF, and a GBL domain; and ggLEC3 has the same setup as ggLEC2 excluding the HBD domain. No G. gallus GPCRs could be found in group II. In group III, ggGPR123 has no domains in the N-terminus. ggGPR124 has an Ig and an LRR domain, whereas ggGPR125 has a GPS domain. There are two group IV sequences: ggCELSR1 with one GPS, one HBD, one EGF_Lam, one LamG, and one EGF domain and ggCELSR3 with one GPS followed by one HBD, one EGF_Lam, one EGF, one LamG, another EGF, one LamG, two more EGF, and seven CA domains. There is one ortholog each of GPR133 and GPR144, but neither of them contains any domains in its N-terminus. In group VI, ggGPR115 contains one GPS domain and ggGPR116 has a GPS and an Ig domain. Group VII contains three GPCRs in G. gallus. ggBAI1 has one HBD domain and four TSP1 domains; ggBAI2 does not have any domains in its N-terminus; and ggBAI3 has one GPS, one HBD, and four TSP1 domains. In group VIII, ggGPR112, ggGPR126, and ggHE6 have one GPS domain each. The ortholog to GPR128 has no domains in its N-terminus, and the ortholog to VLGR1 has one GPS and three Calx_beta domains.
Tetraodon nigroviridis
The group structure of the Adhesion GPCRs is well conserved in T. nigroviridis with sequences from all groups although orthologs to GPR128, GPR133, and VLGR1 are missing. In group I, the ortholog to ETL has gained an HprK domain, and we could not find OLF or GBL domains in the end of the N-termini of TnLec3-1 although these are present in all the human LECs. The two sequences from T. nigroviridis in group II have three conserved EGF domains each. In group III, TnGpr124-1 does not have a GPS domain. Both TnGpr124-1 and TnGpr125-1 have lost the HBD domain compared with their human orthologs. TnCelsr1-1 only has six CA domains in the end of the N-terminus compared with seven in the human ortholog but has an insertion of a TNFR domain after the EGF_Lam domain. The domain structure of TnCelsr2-1 is the same as its human ortholog. The ortholog to GPR144 has a PTX and an Urease_beta domain but no GPS domain. In group VI, TnGpr116-1 and TnGpr116-2 have only GPS domains. In group VII, TnBai2-1 has one GPS, one OPF, one HBD, and three TSP_1 domains. TnBai3-1 has the same N-terminus setup except for the OPF domain. TnBai3-3 only has four TSP_1 domains. TnGpr56-1, TnGpr97-1, and TnGpr126-3 of group VIII only have the GPS domain. In the same group, TnGpr112-2 and TnGpr112-3 have one GPS and one PTX domain, whereas TnGpr126-1, in addition to this setup, sports a CUB domain.
Drosophila melanogaster
sDm2, which falls basal to groups I and II, has both a GPS and a GBL domain. In group III, the sDm6 has one HBD, one Ig domain, and two LRR domains. The sDm1 of group IV has one HBD, two EGF_Lam, and one Metallothio_PEC domain, followed by two pairs of alternating LamG and EGF domains. The N-terminus ends with seven CA domains. sDm8 and sDm9 could not be assigned to a group based on our phylogenetic examinations. sDm8 has a GPS domain.
Caenorhabditis elegans
Ce3 and Ce4 are associated with group I, and both have one GPS, one HBD, and one GBL domain. In addition to this, Ce3 also has a CLECT domain. Ce2, which only has a GPS domain, did not place in any group, whereas the domain setup for Ce5 is similar to group IV. This sequence has one GPS, followed by one HBD, three EGF, two LamG, another EGF, and six CA domains.
Nematostella vectensis
The phylogenetic trees (see figs. 2 and 3 and supplementary figs. 2 and 3, Supplementary Material online) show an expansion with 12 N. vectensis sequences of which three have a Somatomedin_B domain in the N-terminus. There is a fourth N. vectensis sequence, Nv212781, that contains this domain, and this sequence was grouped with the other 12 in the expansion. The remaining nine in the expansion lack the Somatomedin_B domain, but they also lack a stop codon at the end of the transcript. When we extracted the genome sequence downstream of these transcripts, translated it in the three forward reading frames, and searched for conserved domains, we found traces reminiscent of Somatomedin_B domains. The group II sequence Nv24490 contains a GPS domain in its N-terminus. Nv_242046 has an HBD and an Ig domain, which is similar to group III. Based on the phylogenetic analysis, Nv_200012, without domains in the N-terminus, Nv_238726, with a GPS, an Ig, a WSC, and a DUF885 domain, and Nv_199785, with a GPS, an fn3 domain, and three F5_F8_type_C domains, belong to group III. In group IV, Nv_84226 has a series of a GPS, an HBD, an EGF_Lam, two EGF, one LamG, one EGF, one LamG, one EGF, and eight CA domains. The group V sequence Nv20791 has only a GPS domain. Among the sequences associated with group V, Nv_204814 and Nv_201898 have a GPS and a CLECT domain. The CLECT is present in a group I sequence in C. elegans and in the human GPR144, but the phylogenetic analysis puts Nv_204814 closer to group V. Nv_242264 has one GPS domain followed by 16 Calx_beta, one LamG, and another two Calx_beta domains and is likely an ortholog to VLGR1 because of the Calx_beta domains. Fifteen N. vectensis sequences did not place into any of the Adhesion groups. Of these, seven contain a GPS domain.
Monosiga brevicollis
Mb_7962 has one GPS and seven Calx_beta domains, whereas Mb_10341 has a GPS and 12 Calx_beta domains suggesting that they are orthologs to VLGR1. Mb_12491 has one GPS, four EGF, and one ADH_Zinc_N domain and is a putative ortholog to group IV. We could not place Mb_21626, Mb_22592, or Mb_37365 in any group, but all these sequences hold one GPS domain each in the N-termini.
Dictyostelium discoideum
The dd1 GPCR in D. discoideum does not have any other domains except the 7tm_2 domain.
Splice Sites
We analyzed the number and position of splice sites in the 7TM region of the class B GPCRs. We can confirm the results of Cardoso et al. (2005) regarding the positions of the splice sites in the Secretin family GPCRs. There are seven well-conserved splice sites in the vertebrate groups A–D Secretin GPCRs and six in group E Secretin GPCRs (see fig. 5 and supplementary fig. 4, Supplementary Material online). The first conserved splice site (css1) can be defined by one amino acid upstream to a conserved arginine located directly after the end of transmembrane helix 1 (TM1). The second conserved splice site (css2) is located in the first extracellular loop, three amino acids upstream to a conserved cysteine followed by either an arginine or a lysine. The third conserved splice site (css3) is located in the middle of TM4 in a tryptophan between two conserved glycines. The position of the fourth conserved splice site (css4) is one amino acid upstream to a conserved cysteine in the second extracellular loop. The fifth conserved splice site (css5) is located in the middle of TM5, two amino acids upstream to a conserved asparagine. The sixth conserved splice site (css6) is located at the end of the third intracellular loop and can be defined as four amino acids upstream to a position with either an arginine or a lysine in the beginning of TM6, and it is this splice site that is missing in group E. The seventh conserved splice site (css7) is located in the middle of TM7, one amino acid upstream to a conserved glutamine. The invertebrate Secretin GPCRs in C. elegans have fewer splice sites but generally share them with the vertebrates, whereas the splice sites in D. melanogaster are more dispersed.
These conserved splice sites are also present in the Adhesion family, although there are distinct differences between the groups. Groups I and II have a similar setup, with the four splice sites css2, css4, css5, and css6 conserved. There are four prevertebrate GPCRs in group I. The N. vectensis gene Nv24490 has five splice sites, css3–css7, although the sites at css5 and css6 are shifted 6 amino acids and 11 amino acids downstream, respectively. The D. melanogaster sequence in group I and A_ce4k each have a splice site at css1. A_ce3k has a splice site between css1 and css2 and a second splice site at css4. Sdm_2 has one splice site between css2 and css3, one at css5, and one at css6.
The sequences belonging to group III have four splice sites. Of these, one is located at css1 and one at css4. The remaining two do not coincide with the conserved splice sites found among the Secretin GPCRs but are associated with the second intracellular loop, which in group III is extended, and more proline rich compared with other class B GPCRs. The first of these splice sites is located in the beginning of TM3, two amino acids downstream of the cysteine defining css2. The second splice site follows the second extracellular loop, one amino acid upstream to a conserved arginine. The two conserved splice sites css1 and css4 are also present in the four N. vectensis GPCRs in this group, although Nv200012 has css1 shifted upstream. The first intracellular loop, where css1 is located, is extended in Nv200012. The D. melanogaster GPCR (sdm_6) lacks splice sites in the 7TM region but is the only prevertebrate gene with the second intracellular loop that is extended in the vertebrates. The N. vectensis GPCRs also lack the group III–specific splice sites associated with this loop. However, three of them, Nv199785, Nv200012, and Nv242046, have a splice site at css3. Nv200012 also has splice site at the beginning of the second intracellular loop. Group IV has five splice sites. Four of these coincide with css2, css5, css6, and css7. The fifth splice site is located close to css3 but has been shifted seven amino acids downstream and is located one amino acid upstream of another conserved glycine. The N. vectensis GPCR Nv84228 has six splice sites at css1–css5 and css7. The D. melanogaster GPCR sdm1, phylogenetically located basal to this group, has only one splice site in the 7TM region, and it is located at css4. The C. elegans GPCR A_ce5k has two splice sites, one close to css3 and one in TM6 close to css6. The M. brevicollis GPCR Mb_12491 putatively associated to this group lacks splice sites in the 7TM region. 
Group V is split between two neighboring nodes in the phylogenetic tree, but both have similar patterns of splice sites. All three vertebrate orthologs to GPR144 have six of the conserved Secretin splice sites, with only css4 missing. The N. vectensis GPCRs Nv201898 and Nv204814 both have the complete setup of conserved Secretin splice sites. All the vertebrate orthologs to GPR133 have all seven conserved splice sites found in the Secretin family. Nv20971 has splice sites at css4–css7. The three N. vectensis GPCRs, Nv78835, Nv125574, and Nv217885, placed in the node holding groups I, II, IV, V, and VII, also have all seven splice sites css1–css7. The sequences in group VI only have one splice site per GPCR. This is located at css7. There are six to seven splice sites in group VII of which the first five splice sites are located at css1–css5. Like group III, group VII has an extended third extracellular loop, and the sixth splice site is located at the beginning of this loop between css5 and css6. Four vertebrate GPCRs in group VII are missing the third extracellular loop, and these are the ones with only six splice sites. The remaining seven vertebrate GPCRs have a seventh splice site at the end of the loop, which also coincides with css6. TnBAI2-1 also has an eighth splice site, which is located between css6 and css7. The sequences in group VIII, in general, have four splice sites at css1, css3, css4, and css7.
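The group-level patterns described in this section can be summarized by counting how many of the seven conserved Secretin splice sites each group retains. The sketch below encodes only the sites stated in the text to coincide with css1–css7; shifted or group-specific sites are omitted, and group VII is left out because its pattern varies between members. The dictionary layout and the function name are illustrative choices, not part of the original analysis.

```python
# The seven conserved splice sites of vertebrate Secretin groups A-D.
SECRETIN = {f"css{i}" for i in range(1, 8)}

# Conserved sites per group, transcribed from the text above.
observed = {
    "Adhesion groups I and II": {"css2", "css4", "css5", "css6"},
    "Adhesion group III": {"css1", "css4"},
    "Adhesion group IV": {"css2", "css5", "css6", "css7"},
    "Group V, GPR144 orthologs": SECRETIN - {"css4"},
    "Group V, GPR133 orthologs": set(SECRETIN),
    "Group V, Nv201898 and Nv204814": set(SECRETIN),
    "Adhesion group VI": {"css7"},
    "Adhesion group VIII": {"css1", "css3", "css4", "css7"},
}

def shared_with_secretin(sites):
    """Number of conserved Secretin splice sites present in `sites`."""
    return len(sites & SECRETIN)

ranking = sorted(
    observed,
    key=lambda name: shared_with_secretin(observed[name]),
    reverse=True,
)
for name in ranking:
    print(f"{name}: {shared_with_secretin(observed[name])} of 7")
```

In this tabulation the group V entries retain six or seven of the seven sites, whereas the other groups listed retain at most four.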
Discussion
We found that the N. vectensis genome has a remarkably rich set of Adhesion GPCRs with many of the domains found in mammalian GPCRs. Both the phylogeny (see fig. 2 and supplementary fig. 2, Supplementary Material online) and the domain composition (see fig. 4) show that N. vectensis has members that clearly belong to groups III, IV, and V of the mammalian Adhesion GPCRs. Intriguingly, the N. vectensis genome does not have any Secretin GPCRs. Nor did we find any Secretin GPCRs in M. brevicollis or D. discoideum, although both these genomes have Adhesion-like GPCRs. We can thus draw the conclusion that the Adhesion GPCRs are a more ancient GPCR family than the Secretin GPCRs (see fig. 1 and supplementary fig. 1, Supplementary Material online). This is the first time that an evolutionary hierarchy has been clearly delineated among the five main families of vertebrate GPCRs. This is also interesting because the biology of the Secretin receptors has been much more widely studied than that of Adhesion GPCRs.
We searched for evidence that the Secretin GPCRs, which are found in both D. melanogaster and C. elegans, might have originated from one of the ancient branches of Adhesion GPCRs. Our overall phylogenetic tree (see fig. 2 and supplementary fig. 2, Supplementary Material online) suggests that group V (that contains GPR133 and GPR144 in humans) is the closest relative to the Secretin family in the Adhesion family. Interestingly, group V sequences in N. vectensis (Nv_201898 and Nv_204814) both share the same splice site setup as the Secretin GPCRs, and this splice site setup is not shared by any of the other ancient groups. One of the most conserved motifs in the whole Secretin family is PL(L/F)G found in TM6. This motif is highly conserved in Secretin groups A–C and E and also present in group D but with lower degree of conservation (see supplementary fig. 5, Supplementary Material online). If we consider all other Class B groups present in N. vectensis, this specific Secretin family motif is only found in group V of the Adhesion family. Taken together, this provides strong evidence that the Secretin family of GPCRs could have originated from group V Adhesion GPCRs.
We also find it likely that Adhesion groups I, IV, and V share a common ancestor because they group together in the phylogenetic analysis (see fig. 2 and supplementary fig. 2, Supplementary Material online). This hypothesis is also strengthened by the fact that three of their common conserved splice sites (css2, css5, and css6) are missing in the other groups present in N. vectensis (see fig. 5 and supplementary fig. 4, Supplementary Material online). This scenario also suggests that group VII, which we only found in vertebrates, arose from this branch of the evolutionary tree, most likely from group I or IV according to the domain composition. It is also evident that group II originates from group I. These groups branch together and share 30–48% amino acid sequence identity. Group III is likely to have branched from the branch that contains groups I, IV, and V in a common ancestor of N. vectensis and H. sapiens and in turn to have given rise to groups VI and VIII. It is also interesting to look at the HBD, which is found in all the Secretin GPCRs and in the mammalian groups I, III, IV, VI, and VII of the Adhesion GPCRs. We did find the HBD in two N. vectensis sequences (in groups III and IV) as well as in groups I, III, and IV of the Adhesion GPCRs in the Ecdysozoa. We thus find it likely that the common ancestor of both the main branches (on the one hand, groups I, IV, and V shown left in fig. 2; and on the other hand, groups III, VI, and VIII shown right in fig. 2) had an HBD. VLGR1 has the longest N-terminus of all Adhesion GPCRs and is found as a single gene in many vertebrates but not in Ecdysozoa. Interestingly, we found sequences with clear similarity in the TM region, as well as a domain structure comparable to VLGR1, in both M. brevicollis and N. vectensis, suggesting that this is one of the most ancient class B genes. We also found another sequence in M. brevicollis that has multiple domains including GPS as well as both EGF and ADH_Zinc_N domains.
This sequence shows most similarity to the group IV GPCRs, although the similarity in the TM domain is as low as 24%. The most ancient class B sequence known is found in D. discoideum. This sequence does not have any domain in the N-terminus, but its first 10 hits in a Blast search against our database of Class B GPCRs are all Adhesion GPCRs, suggesting that this gene is more similar to Adhesion GPCRs than any other branch of GPCRs.
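The identity percentages quoted in this discussion (for example, 30–48% between groups I and II, or 24% in the TM domain) are pairwise amino acid identities. The sketch below shows one common way to compute such a value from two already-aligned sequences. The text does not state which denominator convention was used, so the choice here (columns where neither sequence has a gap) is an assumption, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: identity is counted over ungapped columns only; other
    conventions (e.g. dividing by the shorter sequence length) give
    different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy aligned fragments, invented for illustration only.
print(round(percent_identity("ACDE-FGH", "ACDQKFGA"), 1))
```

Because the result depends on the alignment and on the denominator, values computed this way are only comparable when both are held fixed across all pairs.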
Interestingly, we found a set of unique Adhesion-like GPCRs in N. vectensis, and these are shown in green in figure 4. These 13 genes do not have any GPS domain but have a TM domain that can readily be aligned with Adhesion GPCRs (amino acid identities range from 21% to 30%). These sequences have long N-termini containing one Somatomedin_B domain each. This Somatomedin_B domain is not found in any mammalian Adhesion GPCR but is, interestingly, present in a set of Adhesion-like GPCRs found in B. floridae (Nordstrom et al. 2008). The TM regions of these two groups do not group with each other or with any of the other branches of class B sequences (see fig. 3 and supplementary fig. 3, Supplementary Material online). However, this unique N-terminal composition with no GPS domains and their relatively high amino acid identities (21–38%) suggest that these two groups could be related. One of the mysteries of the class B GPCRs is the origin of the Methuselah GPCRs. These genes are only found in insects (i.e., D. melanogaster and A. gambiae; Finn et al. 2006). The Methuselah GPCRs have long N-termini but have only one type of domain in their N-termini, the Methuselah_N domain. The Methuselah genes do not cluster with other branches of class B in phylogenetic analyses. We found, however, that one of the Somatomedin_B domain-containing sequences (Nv112360) in N. vectensis had three of its five best hits as Methuselah in a Blast search among class B sequences. Moreover, an additional four of this type of N. vectensis sequences had Methuselah sequences among their top five best hits. The sequence similarity between the Methuselah and these Somatomedin_B-containing N. vectensis sequences is about 15–32%. There are no conserved motifs in the 7TM region between these groups, and there is no obvious link between the expansion in N. vectensis and B. floridae and Methuselah with any of the groups of the classical Adhesion GPCRs, but the phylogeny shows a closer relationship to the Adhesion branch shown on the right in figure 2 (groups III, VI, and VIII) than to the groups on the left.
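The statement that a query had a given family among its best Blast hits amounts to ranking hits by score and tallying the family labels of the top few. A minimal sketch of that tally follows. The function name, the input layout (pairs of subject identifier and bit score), and all identifiers and scores are assumptions made for illustration. They are not data from this study.

```python
from collections import Counter


def family_counts_in_top_hits(hits, family_of, top_n=5):
    """Count family labels among the top_n hits ranked by bit score.

    hits:      list of (subject_id, bit_score) pairs for one query
    family_of: dict mapping subject_id to a family label
    """
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)[:top_n]
    return Counter(family_of.get(subject_id, "unclassified") for subject_id, _ in ranked)


# Placeholder identifiers and scores, invented for illustration only.
toy_hits = [("s1", 120.0), ("s2", 110.0), ("s3", 95.0),
            ("s4", 90.0), ("s5", 88.0), ("s6", 60.0)]
toy_families = {"s1": "Methuselah", "s2": "Adhesion", "s3": "Methuselah",
                "s4": "Methuselah", "s5": "Adhesion", "s6": "Secretin"}

print(family_counts_in_top_hits(toy_hits, toy_families))
```

Such a tally is only a rough indicator of similarity. It depends on the composition of the search database and does not replace the phylogenetic analysis.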
The mammalian Adhesion GPCRs are known to have a very complex genomic structure, consisting of multiple introns, and a large genomic size, whereas most GPCRs and, in particular, the Rhodopsin GPCRs are much simpler, often coded by a single exon. The splice sites in Adhesion GPCRs may play a role in forming alternative splice variants with different sets of N-terminal domains (Bjarnadottir et al. 2004), and this could be important for interaction with other proteins. Interestingly, the overall complexity of the Adhesion GPCRs seems to be well conserved through evolution and is thus likely to be important for their overall functions. Several splice sites are highly conserved in the extended class B type of sequences within the 7TM region (see fig. 5 and supplementary fig. 4, Supplementary Material online), and we did not find any gene with good genomic sequence coverage that did not have at least several introns (Nordstrom KJV, unpublished data). There are two major contrasting theories about why mammalian Rhodopsin GPCRs have a relatively small number of introns compared with invertebrates (Brosius 1999; Gentles and Karlin 1999). One report has suggested that there was a major loss of introns within the GPCR family, whereas we have argued that formation of new genes through RNA-based mechanisms explains the lower intron density in mammalian GPCRs. It is notable that there is no major expansion of the class B GPCRs that seems to originate from RNA-based mechanisms, as is observed for Rhodopsin GPCRs. Based on the high degree of conservation of the splice sites, it is likely that the expansion of Class B took place mainly through DNA-based mechanisms such as duplications of the whole or parts of the genome. There are, however, examples of parts of the genes that are intron free in which introns seem to have been lost, such as the 7TM region of group VI, and there is no general pattern showing a higher number of introns in vertebrates compared with prevertebrates (Nordstrom KJV, unpublished data). Moreover, there are examples even in the TM regions where acquisition of a loop in the second intracellular region in group III GPCRs is associated with the gain of one splice site and the movement of another, comparing the N. vectensis sequences to the vertebrate sequences (see fig. 5 and supplementary fig. 4, Supplementary Material online). In group VII, some transcripts have an extra exon extending the third intracellular loop, in a manner similar to the way an extra exon has been gained in the N-termini of some mammalian Adhesion GPCRs (Bjarnadottir et al. 2007). In general, the Class B sequences seem not to follow the 2R pattern (tetraploidization events), although several of the human Adhesion genes are found in the specific paralogon groups (Fredriksson et al. 2003). However, we find four clear examples where each gene is available in two copies in T. nigroviridis, which is thought to have gone through a third whole-genome duplication (3R) together with the other teleosts (Jaillon et al. 2004). These genes are GPR123, PACAP, VIPR1, and VIPR2. In Secretin group C, there is also a node with two T. nigroviridis GPCRs basal to the clade containing GIPR and GCGR, which are also the result of putative 3R duplications. In that case, this is a duplication of a GPCR ancestral to both GIPR and GCGR. The fact that we do not see more duplications is consistent with theories of heavy gene loss reported to take place after a whole-genome duplication (Brunet et al. 2006). Finally, it is interesting to note that the repertoire of Adhesion GPCRs in N. vectensis is more closely related to that of the vertebrates than to that of the Ecdysozoan C. elegans and D. melanogaster, which is in agreement with the conclusion that a large number of gene families have been lost in the Ecdysozoa lineage (Putnam et al. 2007).
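The splice-site arguments in this discussion rest on asking which intron positions are shared by all members of some groups and absent from the others. A minimal sketch of that comparison is given below. It assumes intron positions have already been mapped to columns of a common 7TM alignment. The function names, group names, and positions are hypothetical. Exact column matching is a simplification, since a real comparison would also consider intron phase and alignment uncertainty.

```python
def conserved_sites(intron_columns_by_seq):
    """Alignment columns that carry an intron in every sequence of one group."""
    sets = [set(cols) for cols in intron_columns_by_seq.values()]
    return set.intersection(*sets) if sets else set()


def sites_specific_to(groups, target_names):
    """Sites conserved in all target groups and not seen in any other group.

    groups: dict mapping group name to {sequence_id: [intron alignment columns]}
    """
    if not target_names:
        raise ValueError("at least one target group is required")
    shared = set.intersection(*(conserved_sites(groups[name]) for name in target_names))
    seen_elsewhere = set()
    for name, members in groups.items():
        if name not in target_names:
            for cols in members.values():
                seen_elsewhere.update(cols)
    return shared - seen_elsewhere


# Toy intron positions, invented for illustration only.
toy_groups = {
    "group_a": {"a1": [40, 95, 130], "a2": [40, 95, 131]},
    "group_b": {"b1": [40, 95, 200]},
    "group_c": {"c1": [40, 210]},
}

print(sites_specific_to(toy_groups, ["group_a", "group_b"]))
```

In this toy input, column 40 is shared by every group and column 95 only by the two target groups, so only the latter would count as support for a common origin of those groups.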
In summary, we have performed detailed mining of the most multifaceted family of GPCRs in nine genomes. The repertoire of Adhesion GPCRs is uniquely complex with at least 31 unique N-terminal domains, lengths up to 7,042 amino acids, and intron numbers up to 100. At least 84% of the extended class B sequences have identifiable domains in their N-termini. The overall N-terminal domain structure agrees remarkably well with the phylogenetic analysis of the TM domains, enabling us to track the origin of most of the subgroups. We provide compelling evidence for the ancient origin of the Adhesion GPCR family. Moreover, it is likely that the biologically well-studied Secretin family of receptors, which mediate many key hormonal functions through binding hormones in long N-termini, originated from ancestors to the Adhesion GPCR family.
| Supplementary Material |
|---|
Supplementary files 1 and 2 and figures 1–4 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
We thank Chris Pickering for help with the language of the article and also Anders Hellström for initial data mining. The sequence data for N. vectensis and M. brevicollis were produced by the US Department of Energy Joint Genome Institute http://www.jgi.doe.gov/. The studies were supported by the Swedish Research Council, The Novo Nordisk Foundation, Swedish Royal Academy of Sciences, and Magnus Bergvall Foundation. M.C.L. was supported by the Swedish Brain Research Foundation. R.F. was supported by the Göran Gustafssons foundation.
| Footnotes |
|---|
Billie Swalla, Associate Editor
| References |
|---|
Alfaro ME, Zoller S, Lutzoni F. Bayes or bootstrap? A simulation study comparing the performance of Bayesian Markov chain Monte Carlo sampling and bootstrapping in assessing phylogenetic confidence. Mol Biol Evol (2003) 20:255–266.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol (1990) 215:403–410.
Bjarnadottir TK, Fredriksson R, Hoglund PJ, Gloriam DE, Lagerstrom MC, Schioth HB. The human and mouse repertoire of the adhesion family of G-protein-coupled receptors. Genomics (2004) 84:23–33.
Bjarnadottir TK, Fredriksson R, Schioth HB. The adhesion GPCRs: a unique family of G protein-coupled receptors with important roles in both central and peripheral tissues. Cell Mol Life Sci (2007) 64:2104–2119.
Bockaert J, Pin JP. Molecular tinkering of G protein-coupled receptors: an evolutionary success. EMBO J (1999) 18:1723–1729.
Brosius J. RNAs from all categories generate retrosequences that may be exapted as novel genes or regulatory elements. Gene (1999) 238:115–134.
Brunet FG, Crollius HR, Paris M, Aury JM, Gibert P, Jaillon O, Laudet V, Robinson-Rechavi M. Gene loss and evolutionary rates following whole-genome duplication in teleost fishes. Mol Biol Evol (2006) 23:1808–1816.
Cardoso JC, Clark MS, Viera FA, Bridge PD, Gilles A, Power DM. The secretin G-protein-coupled receptor family: teleost receptors. J Mol Endocrinol (2005) 34:753–765.
Clamp M, Cuff J, Searle SM, Barton GJ. The Jalview Java alignment editor. Bioinformatics (2004) 20:426–427.
Devoto A, Piffanelli P, Nilsson I, Wallin E, Panstruga R, von Heijne G, Schulze-Lefert P. Topology, subcellular localization, and sequence diversity of the Mlo family in plants. J Biol Chem (1999) 274:34993–35004.
Douady CJ, Delsuc F, Boucher Y, Doolittle WF, Douzery EJ. Comparison of Bayesian and maximum likelihood bootstrap measures of phylogenetic reliability. Mol Biol Evol (2003) 20:248–254.
Eddy SR. Profile hidden Markov models. Bioinformatics (1998) 14:755–763.
Felsenstein J. PHYLIP (phylogeny inference package). Distributed by the author. (2004) Seattle (WA): Department of Genome Sciences, University of Washington.
Finn RD, Mistry J, Schuster-Bockler B, et al, (13 co-authors). Pfam: clans, web tools and services. Nucleic Acids Res (2006) 34:D247–D251.
Fredriksson R, Lagerstrom MC, Lundin LG, Schioth HB. The G-protein-coupled receptors in the human genome form five main families. Phylogenetic analysis, paralogon groups, and fingerprints. Mol Pharmacol (2003) 63:1256–1272.
Fredriksson R, Schioth HB. The repertoire of G-protein-coupled receptors in fully sequenced genomes. Mol Pharmacol (2005) 67:1414–1425.
Gentles AJ, Karlin S. Why are human G-protein-coupled receptors predominantly intronless? Trends Genet (1999) 15:47–49.
Gloriam DE, Fredriksson R, Schioth HB. The G protein-coupled receptor subset of the rat genome. BMC Genomics (2007) 8:338.
Harmar AJ. Family-B G-protein-coupled receptors. Genome Biol (2001) 2:reviews3013.1–3013.10.
Hill CA, Fox AN, Pitts RJ, Kent LB, Tan PL, Chrystal MA, Cravchik A, Collins FH, Robertson HM, Zwiebel LJ. G protein-coupled receptors in Anopheles gambiae. Science (2002) 298:176–178.
Jaillon O, Aury JM, Brunet F, et al, (61 co-authors). Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature (2004) 431:946–957.
Josefsson LG. Evidence for kinship between diverse G-protein coupled receptors. Gene (1999) 239:333–340.
King N, Hittinger CT, Carroll SB. Evolution of key cell signaling and adhesion protein families predates animal origins. Science (2003) 301:361–363.
Kolakowski LF Jr. GCRDb: a G-protein-coupled receptor database. Receptors Channels (1994) 2:1–7.
Lagerstrom MC, Hellstrom AR, Gloriam DE, Larsson TP, Schioth HB, Fredriksson R. The G protein-coupled receptor subset of the chicken genome. PLoS Comput Biol (2006) 2:e54.
Lagerstrom MC, Schioth HB. Structural diversity of G protein-coupled receptors and significance for drug discovery. Nat Rev Drug Discov (2008) 7:339–357.
Lassmann T, Sonnhammer EL. Kalign—an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics (2005) 6:298.
Marchler-Bauer A, Bryant SH. CD-Search: protein domain annotations on the fly. Nucleic Acids Res (2004) 32:W327–331.
Metpally RP, Sowdhamini R. Genome wide survey of G protein-coupled receptors in Tetraodon nigroviridis. BMC Evol Biol (2005) 5:41.
Nordstrom KJ, Fredriksson R, Schioth HB. The amphioxus (Branchiostoma floridae) genome contains a highly diversified set of G protein-coupled receptors. BMC Evol Biol (2008) 8:9.
Okada T, Ernst OP, Palczewski K, Hofmann KP. Activation of rhodopsin: new insights from structural and biochemical studies. Trends Biochem Sci (2001) 26:318–324.
Page RD. TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci (1996) 12:357–358.
Prabhu Y, Eichinger L. The Dictyostelium repertoire of seven transmembrane domain receptors. Eur J Cell Biol (2006) 85:937–946.
Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res (2007) 35:D61–D65.
Putnam NH, Srivastava M, Hellsten U, et al, (19 co-authors). Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science (2007) 317:86–94.
Rice P, Longden I, Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet (2000) 16:276–277.
Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics (2003) 19:1572–1574.
Schioth HB, Nordstrom KJ, Fredriksson R. Mining the gene repertoire and ESTs for G protein-coupled receptors with evolutionary perspective. Acta Physiol (Oxf) (2007) 190:21–31.
Suzuki Y, Glazko GV, Nei M. Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. Proc Natl Acad Sci USA (2002) 99:16138–16143.
Versele M, Lemaire K, Thevelein JM. Sex and sugar in yeast: two distinct GPCR systems. EMBO Rep (2001) 2:574–579.
Whittaker CA, Bergeron KF, Whittle J, Brandhorst BP, Burke RD, Hynes RO. The echinoderm adhesome. Dev Biol (2006) 300:252–266.
Zdobnov EM, Apweiler R. InterProScan—an integration platform for the signature-recognition methods in InterPro. Bioinformatics (2001) 17:847–848.