

MBE Advance Access originally published online on August 31, 2005
Molecular Biology and Evolution 2006 23(1):10-22; doi:10.1093/molbev/msj002

© The Author 2005. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oupjournals.org

Research Article

Conserved Synteny Between the Ciona Genome and Human Paralogons Identifies Large Duplication Events in the Molecular Evolution of the Insulin-Relaxin Gene Family

Robert Piotr Olinski*, Lars-Gustav Lundin† and Finn Hallböök*

* Unit of Developmental Neuroscience, Department of Neuroscience, Uppsala University, Uppsala, Sweden; and † Unit of Pharmacology, Department of Neuroscience, Uppsala University, Uppsala, Sweden

E-mail: finn.hallbook@neuro.uu.se.


    Abstract
The aims of the study were to outline the sequence of events that gave rise to the vertebrate insulin-relaxin gene family and the chromosomal regions in which they reside. We analyzed the gene content surrounding the human insulin/relaxin genes with respect to which family each gene belonged to and whether the duplication history of the investigated families parallels the evolution of the insulin-relaxin family members. Markov Clustering and phylogenetic analysis were used to determine family identity. More than 15% of the genes belonged to families that have paralogs in the regions, defining two sets of quadruplicate paralogy regions. Thus, the localization of insulin/relaxin genes in humans is consistent with the regions on human chromosomes 1, 11, 12, 19q (insulin/insulin-like growth factors) and 1, 6p/15q, 9/5, 19p (insulin-like factors/relaxins) having been formed during two genome duplications. We compared the human genome with that of Ciona intestinalis, a species that split from the vertebrate lineage before the two suggested genome duplications. Two insulin-like orthologs were discovered in addition to the already described Ci-insulin gene. Conserved synteny between the Ciona regions hosting the insulin-like genes and the two sets of human paralogons implies their common origin. Linkage of the two human paralogons, as seen in human chromosome 1, as well as the two regions hosting the Ciona insulin-like genes suggests that a segmental duplication gave rise to the region prior to the genome doublings. Thus, preserved gene content provides support that genome duplication(s) in addition to segmental and single-gene duplications shaped the genomes of extant vertebrates.

Key Words: insulin • relaxin • gene duplication • paralogous region • Ciona intestinalis • conserved synteny


    Introduction
Paralogous genes are the products of gene duplications and are organized into gene families. There are 10 paralogs in the human insulin/insulin-like growth factor/insulin-like factor/relaxin gene family (insulin-RLN family): insulin, two insulin-like growth factor genes (IGF1, IGF2), four insulin-like factor genes (INSL3, INSL4, INSL5, INSL6), as well as three relaxin genes (RLN1, RLN2, RLN3). The insulin-RLN family proteins share conserved amino acid motifs mainly in the region of the cysteine knot (Conlon 2001; Conlon, Wang, and Potter 2001; Irwin 2004). Besides the well-known pancreatic function of insulin in maintenance of blood glucose homeostasis, the individual members play a variety of roles in muscle growth, trophoblast development, and testicular descent during fetal life together with functions during progeny delivery in higher vertebrates. Some of the members are also potent neurotrophic factors.

The primary aims of this study were to outline the sequence of events that gave rise to the vertebrate members in the insulin-RLN gene family and to gain knowledge of how these duplications occurred. The insulin-RLN gene family members were also used as anchor points to study the molecular history of the chromosomal regions in which they reside. The number of insulin family paralogs varies between vertebrate species, but the existence of the family is conserved across phyla with homologs in arthropods, nematodes, and mollusks (Smit et al. 1988, 1998; Lagueux et al. 1990; Nagata et al. 1995; Duret et al. 1998; Wang et al. 2003). Phylogenetic analysis infers that the insulin/IGF sequences form one group and the INSL/RLN sequences form another (Chan, Cao, and Steiner 1990) (see also fig. 1). The 10 human gene family members are distributed in the human genome on five chromosomes (Fig. S1, Supplementary Material online). IGF1 is on human chromosome (HSA) 12q23.2. The insulin gene and IGF2 are located in tandem on HSA 11p15.5. INSL5 is alone on 1p31.1. INSL6, INSL4, RLN2, and RLN1 form a cluster on HSA 9p24.1. RLN3 and INSL3 are located on 19p13.2 but are separated by several other genes (fig. 3).



FIG. 1.— Phylogenetic tree of chordate insulin-RLN family. One of 125 equally weighted trees of chordate insulin/IGF/INSL/RLN protein sequences, inferred by maximum parsimony using PAUP*. Tree was rooted with the outgroup of insulins, IGFs together with the insulin-like protein from lancelet. Numbers on the branches correspond to bootstrap values obtained from 500 resamplings of the matrix data. Only nodes that received more than 50% bootstrap support are given a value. Trees had consistency index 0.7104 and homoplasy index 0.2896.

 


FIG. 3.— Local segmental duplication on HSA 19p gave rise to RLN3 and INSL3. Diagram of duplicated region within HSA 19p that hosts RLN3 and INSL3. Paralogs of nine gene families (kruppel-like factors; endothelial differentiation GPCRs; egf-like module containing, mucin-like, hormone receptors-like; ras-related proteins Rab; myosins; janus kinases; juns; phosphodiesterases cAMP-specific; and calreticulins) could all be linked to RLN3 and INSL3 segments indicated by the arrows. The order of genes and scale are in accordance with UCSC Genome Browser assembly July 2003. Gene annotations are according to HUGO approved classification.

 
Two paralogs located in tandem in a chromosomal segment are likely the result of single-gene duplication. Duplication of a chromosomal segment also leads to pairs of paralogs. All the neighboring genes in such a segment are duplicated in the same event. The number of duplicate genes depends on the size of the segment. Duplication of large chromosomal segments is often called block or segmental duplication, and the two resulting duplicate segments with several linked paralogs are often denoted as a paralogon or paralogous region (Coulier et al. 2000). The insulin/IGF and the INSL/RLN subfamilies are located in segments on different chromosomes that also share paralogs with other gene families, insulin/IGF with families represented on HSA 1, 11, 12, and 19q (Lundin 1993; Popovici et al. 2001b) and INSL/RLN with families on HSA 1, 6p, 9, and 19p (Katsanis, Fitzgibbon, and Fisher 1996; Popovici et al. 2001b; Abi-Rached et al. 2002; Vienne et al. 2003b). Large-scale duplications would explain the observed patterns. Susumu Ohno suggested that vertebrate genomes evolved by genome-wide duplications (Ohno 1970), and this suggestion has lately found support by the identification of a large number of paralogons of variable sizes with linked paralogs in the human genome (McLysaght, Hokamp, and Wolfe 2002). These data are in agreement with at least one genome-wide duplication. Many of the gene duplications have been dated to early stages in chordate evolution (X. Gu, Wang, and J. Gu 2002), which is in agreement with timing of the genome duplications. Analysis of the genome structure and gene content of the bony fishes Tetraodon nigroviridis and Fugu rubripes gives support to the idea that whole-genome duplications indeed can occur during vertebrate evolution. The analysis suggests that a whole-genome duplication occurred in the bony fish lineage, subsequent to its divergence from the common vertebrate lineage (Christoffels et al. 2004; Jaillon et al. 2004; Vandepoele et al. 2004).

In order to outline the two paralogons that host the INSL/RLN and the insulin/IGF subfamilies, to test statistical significance of distribution of the paralogon gene content, and to be able to search for traces of synteny between human and Ciona intestinalis genomes, we compiled two tables with human gene families. Each table contains families with two or more members located in the regions hosting either the INSL/RLN or the insulin/IGF subfamilies. The compiled data were used to statistically test if the identified gene pattern differed from a random distribution of gene family members. The regions were then compared to the recently released Ciona genome draft (Dehal et al. 2002). Ciona represents a branch of chordates that diverged from the main chordate lineage before the postulated genome duplications and should therefore have a nonduplicated genome (Dehal et al. 2002; Yoshizaki et al. 2005). One insulin family ortholog is described in the Ciona genome (Satou et al. 2003). In this work, we present sequences from the genome draft that predict another two orthologs. The comparisons of genome maps identified synteny between Ciona and human chromosomal segments in the regions of the Ciona insulin-RLN family genes. The results of this study allowed not only the reconstruction of the sequence of duplication events that produced the human insulin-RLN gene family but also the construction of a model for the formation of the part of the extant vertebrate genomes that covers more than 25% of the human genome and includes the insulin/IGF and INSL/RLN paralogons.


    Methods
Phylogenetic Reconstruction
Based on a slightly revised BLOSUM62mt protein score matrix with the weight for Cys = 32, the alignment of full-length insulin/IGF/INSL/RLN proteins was constructed with ClustalW (AlignX in Vector NTI Suite v.9). The alignment was manually curated. Maximum parsimony and neighbor-joining methods were used for phylogenetic analysis. Aligned sequences were imported into PAUP* 4.0 (Swofford 2000). Trees were generated with a user-specified outgroup (fig. 1). Support for the individual clades was evaluated by bootstrapping, where 500 replicates were analyzed using a heuristic search algorithm. A node present in <50% of the constructed consensus trees was regarded as unsupported.
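The bootstrap criterion described above can be illustrated with a short sketch. The study itself used PAUP* for the analysis; the Python below is only a hypothetical illustration of how alignment columns are resampled with replacement and how clade frequencies across replicate trees translate into support values. The function names, the toy sequences, and the toy replicate trees are assumptions introduced for illustration, and tree inference itself is not reproduced.

```python
import random
from collections import Counter

def resample_columns(alignment, rng):
    """Return one bootstrap replicate: alignment columns drawn with replacement."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}

def clade_support(replicate_clades):
    """Percentage of replicate trees in which each clade appears."""
    counts = Counter()
    for clades in replicate_clades:
        counts.update(set(clades))
    total = len(replicate_clades)
    return {clade: 100.0 * n / total for clade, n in counts.items()}

def supported_clades(support, threshold=50.0):
    """Keep clades present in at least `threshold` percent of the replicates."""
    return {clade: pct for clade, pct in support.items() if pct >= threshold}

# Toy alignment (hypothetical sequences, not data from the study).
toy_alignment = {"INS": "FVNQHLCGSH", "IGF1": "GPETLCGAEL", "RLN2": "KWKDDVIKLC"}
replicate = resample_columns(toy_alignment, random.Random(0))

# Clades found in four imaginary replicate trees (hypothetical).
ins_igf = frozenset({"INS", "IGF1"})
igf_rln = frozenset({"IGF1", "RLN2"})
toy_trees = [[ins_igf], [ins_igf], [ins_igf], [igf_rln]]

support = clade_support(toy_trees)
kept = supported_clades(support)
```

In this toy case the INS/IGF1 clade appears in three of four replicates and is retained, while the IGF1/RLN2 clade falls below the 50% threshold and would be treated as unsupported.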

Generation of Gene Families and Identification of Genes in the Paralogons
INSL/RLN Paralogon
Using a step-wise, gene-by-gene reciprocal comparison of the regions hosting the INSL/RLN genes, we investigated in total 4,069 genes with respect to (1) whether they belong to a gene family and (2) where the family members are positioned in the human genome. All genes on HSA 19p were analyzed, and initially gene families with at least a second paralog on HSA 1 and 9 were highlighted. However, it became clear that many families had paralogs on HSA 1, 6p/15q and/or on 9/5, in addition to the one on 19p. In order not to miss families that do not have paralogs on 19p, the approach was repeated with all genes on HSA 6p and on 9 as seeds instead of 19p for the comparisons. Zinc finger protein family members were omitted from the analysis. The gene families were generated using Markov Clustering (MCL) (Enright, Van Dongen, and Ouzounis 2002). Parameters for the MCL family generation were: -I 3.0, -P 4000, -R 600, -pct 95. The MCL uses domain architecture and compensates for the presence of multidomain proteins, promiscuous domains, and fragmented proteins. The parameters were stringent settings for family recognition. Genes that were not grouped by MCL were first submitted to a Blat UCSC search that was run with the default settings (http://genome.ucsc.edu/cgi-bin/hgBlat) against the UCSC Human Genome Database (http://genome.ucsc.edu/; July 2003 assembly) before they were omitted from Table S1 (Supplementary Material online). The Pfam database (http://www.sanger.ac.uk/Software/Pfam/) was used to complement family information in a few cases. Many families are large with paralogs in several parts of the genome, and subfamilies were identified using neighbor-joining and maximum parsimony (PAUP* v.4). Paralogs that fell outside the identified regions but that belong to the families were also collected in the supplementary tables. By applying this approach, we could identify and include nonannotated paralogs (Table S1, Supplementary Material online).
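Markov Clustering alternates an expansion step (squaring a column-stochastic matrix) with an inflation step (element-wise powering followed by column renormalization). The sketch below is a minimal pure-Python illustration of that idea and is not the pipeline used in the study. The inflation value of 3.0 mirrors the -I 3.0 setting quoted above; the pruning and recovery options are not modeled, and the toy similarity graph, the fixed iteration count, and all function names are assumptions.

```python
def normalise_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inflate(m, power):
    """Inflation step: raise entries to `power`, then renormalise columns."""
    return normalise_columns([[x ** power for x in row] for row in m])

def mcl(adjacency, inflation=3.0, iterations=30, eps=1e-6):
    """Return clusters as a set of frozensets of node indices."""
    n = len(adjacency)
    # Add self-loops so that every column has a non-zero sum.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalise_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    clusters = set()
    for i in range(n):
        if m[i][i] > eps:  # row i belongs to an attractor
            clusters.add(frozenset(j for j in range(n) if m[i][j] > eps))
    return clusters

# Toy similarity graph (hypothetical): two groups of three mutually similar genes.
genes = ["A1", "A2", "A3", "B1", "B2", "B3"]
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
families = {frozenset(genes[j] for j in cluster) for cluster in mcl(adjacency)}
```

Higher inflation values generally yield finer-grained clusters, which fits the description of the chosen parameters as stringent settings for family recognition.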

Insulin/IGF Paralogon
A slightly different approach was applied for the identification of the paralogs adjacent to insulin-IGF subfamily members. The list of paralogs designated as belonging to paralogon 1, 11, 12, 19 published elsewhere (Lundin 1993; Popovici et al. 2001a, 2001b) was validated and expanded by MCL or Blat UCSC search. Furthermore, the surveyed gene families positioned in the insulin/IGF region were subjected to phylogenetic analysis, and only those that formed a bootstrap-supported clade (per 500 bootstrap resamplings) were retained. Because the search was limited to the data compiled from the literature, not every gene positioned in the chromosomal segments building the insulin/IGF paralogon was evaluated for having at least a second paralog in the region. The results were deposited in Table S2 (Supplementary Material online).

Statistical Test of Gene Distribution
This test uses the binomial distribution, Bin(n,p), and was made under the assumption that the probability for a gene or family to belong to a region of the genome is proportional to the number of genes in that region (Vienne et al. 2003a).

The binomial probability function:

$$P(X = k) = \binom{n}{k} p^{k} q^{n-k} \qquad (1)$$

Ω is defined as the entire genome (23,762 genes); k, the number of representations in the regions; n, the total number of representations of the 174 gene families including the ones outside the regions; p, probability of being in the paralogy region defined as the fraction of the total number of genes; and q, probability of being located elsewhere in the genome (q = 1 − p).

In a situation where n is large, the binomial distribution can be approximated with a normal distribution. For $X \sim \mathrm{Bin}(n,p)$ with a population mean E(X) = np, variance V(X) = npq, and standard deviation $\sqrt{npq}$, a t can be calculated for the observed k, i.e., $t = (k - np)/\sqrt{npq}$, which is approximately N(0,1), and thus a p value can then be obtained.

Ciona Database Searches and the Assembly of Novel Ciona INS-L Sequences
Ciona Ghost database released April 2002 (http://ghost.zool.kyoto-u.ac.jp/indexr1.html) was screened for putative Ciona insulin-like sequences (INS-L) using TBlastN search (Altschul et al. 1997) with the full-length protein sequences of human insulin, IGF2, and fruitfly INSL-2 in the query. Two different sets of expressed sequence tags (ESTs) were collected and were assembled using Vector NTI Suite Contig Assembly v.9. Putative Ciona INS-L sequences were translated in all reading frames, and the presence of conserved insulin family domains was evaluated. Scaffolds bearing Ciona INS-L genes were retrieved from JGI v.1.0 Ciona draft genome assembly (http://genome.jgi-psf.org/ciona4/ciona4.home.html). Translated Ciona Joint Genome Institute gene models in the vicinity of the three Ciona INS-Ls were used as a query in BlastP searches against the National Center for Biotechnology Information proteome (http://www.ncbi.nlm.nih.gov/blast/). If prokaryotic genes constituted the three best BlastP hits, the Ciona gene model was omitted from further analysis and was designated as not having a vertebrate homolog (fig. 5). If a vertebrate gene was among the three best hits, a BlastP search was run against the human RefSeq Protein database (http://www.ncbi.nlm.nih.gov/genome/seq/HsBlast.html). The three best human hits, treated as putative vertebrate homologs of Ciona genes, were checked for chromosomal location within or outside the identified paralogous regions, INSL/RLN (HSA: 1, 9/5, 6p/15q, and 19p) and insulin/IGF (1/2p/20p, 11, 12/14q/15q, and 19q). Phylogenetic analysis, using maximum parsimony and neighbor-joining methods, of Ciona gene model protein sequences, vertebrate and invertebrate homologs, was performed, and the homology between the human genes and Ciona gene models was evaluated. Gene models that formed a cluster with the vertebrate protein sequences were considered as ortholog(s) and were included in figure 5. Several equally similar human orthologs could be identified for many Ciona gene models, and the ones that were phylogenetically more related to the Ciona gene model are shown in bold (fig. 5).
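The filtering rules applied to the BlastP results can be summarized in a small sketch. This is a hypothetical restatement of the decision logic described above, not code from the study: the taxonomic labels, the function names, and the coarse matching of a cytogenetic location against whole chromosomes or chromosome arms are assumptions. In particular, the study delineated the paralogons by specific chromosomal bands, which this simplified check does not capture.

```python
import re

# Chromosomes or chromosome arms listed for each paralogon in the text.
INSL_RLN_REGIONS = {"1", "9", "5", "6p", "15q", "19p"}
INS_IGF_REGIONS = {"1", "2p", "20p", "11", "12", "14q", "15q", "19q"}

def classify_gene_model(top_hits):
    """Apply the three-best-hits rule to taxonomic labels, best hit first."""
    best_three = top_hits[:3]
    if best_three and all(label == "prokaryote" for label in best_three):
        return "omit: no vertebrate homolog"
    if any(label == "vertebrate" for label in best_three):
        return "search human RefSeq"
    # The text does not state what happens in the remaining cases.
    return "unspecified in the text"

def in_regions(location, regions):
    """Coarse check of a location such as '19p13.2' against a region set."""
    match = re.match(r"(\d+|X|Y)([pq])?", location)
    if not match:
        return False
    chromosome, arm = match.group(1), match.group(2) or ""
    return chromosome in regions or (chromosome + arm) in regions

# Hypothetical hit lists.
decision_a = classify_gene_model(["prokaryote", "prokaryote", "prokaryote"])
decision_b = classify_gene_model(["invertebrate", "vertebrate", "invertebrate"])

# Locations of insulin-RLN family genes as given in the Introduction.
rln3_in_insl_rln = in_regions("19p13.2", INSL_RLN_REGIONS)
rln3_in_ins_igf = in_regions("19p13.2", INS_IGF_REGIONS)
ins_in_ins_igf = in_regions("11p15.5", INS_IGF_REGIONS)
```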



FIG. 5.— Conserved synteny between the chromosomal regions hosting Ciona insulin-RLN homologs and the regions hosting human insulin-RLN gene family. The figure displays the organization of the Ciona intestinalis scaffolds with characterized Ciona INS-L genes. Each Ciona gene model (left bars) is shown together with its corresponding closest human ortholog(s). The location of the human genes is indicated in relation to the insulin/IGF and INSL/RLN paralogons. For many Ciona gene models, several equally similar (BlastP) human orthologs could be identified. The ones more phylogenetically related to the Ciona model are shown in bold. "x" denotes absence of a recognizable human (vertebrate) homolog. Scaffold organization is according to JGI database version 1.0 December 2002 release.

 

    Results
Phylogenetic Analysis
Phylogenetic analysis of 60 vertebrate insulin-RLN family sequences showed that the insulin and IGF sequences form one clade and that INSL with RLN sequences form another (fig. 1). Similar results were obtained using different phylogenetic analysis methods, and the inferred dual-clade tree topology was supported by high bootstrap values.

The topology of the inferred INSL/RLN branch was not resolved, and many branches were not given support by the bootstrap analysis. The relationship between orthologs, but not between the paralogs, was in general resolved according to established species phylogeny. Most branches linking paralogs had bootstrap values below 50. INSL4, RLN2, and RLN1 were grouped together. The inferred topology of the insulin/IGF branch was better supported. IGF1 and IGF2 grouped together within the insulin/IGF clade. When invertebrate sequences were included in the analysis, the inferred tree topologies were even less supported due to too few informative sites and a high degree of homoplasy of the sequences. These trees were not useful for resolving phylogeny between the distantly related invertebrate and vertebrate taxa (data not shown).

Single-Gene, Segmental, and Genome Duplications
INSL/RLN Subfamily
INSL6, INSL4, RLN2, and RLN1 are located in tandem on HSA 9 (fig. 2). All four genes are present in the chimpanzee genome (http://www.ensembl.org/Pan_troglodytes/index.html; release 24.1.1), while only one relaxin and INSL6 have been found in other mammals. The only well-characterized chicken INSL/RLN sequence so far, "chicken relaxin-like 3," is on chromosome Z (Hillier et al. 2004).



FIG. 2.— Gene family members in the INSL/RLN paralogon. Gene families present in four or three of the chromosomal regions of the INSL/RLN paralogon are displayed. The INSL/RLN paralogon is spanning regions on chromosomes 1, 6p/15q, 9/5, and 19p. For complete list of gene families within described region, see Table S1 (Supplementary Material online). Genes were chosen upon performed phylogenetic analysis of protein sequences grouped by MCL and downloaded as a FASTA from Ensembl Mart (http://www.ensembl.org/Multi/martview?species=Homo_sapiens). Members of gene protein family that formed bootstrap-supported clade were retained. Otherwise, Blat UCSC search was performed, and all resulting sequences were considered as putative paralogs. Panels A, C and B, D are in the scale with physical chromosomal maps according to UCSC Genome Browser assembly July 2003. Gene annotations are according to HUGO approved classification (http://www.gene.ucl.ac.uk/nomenclature/). Ensembl Novel genes are as follows: Novel1 ENSG00000182373; Novel2 ENSG00000167798; Novel3 ENSG00000185403; Novel4 ENSG00000179766; Novel5 ENSG00000183236; Novel6 ENSG00000185634; Novel7 ENSG00000182625; Novel8 ENSG00000182706; Novel9 ENSG00000164294; and Novel10 ENSG00000181703.

 
The human RLN3 and INSL3 are on HSA 19p. They are each neighbored by different paralogs from the same nine families (fig. 3) in two segments. The two segments on 19p were most likely formed by a duplication of a small segment of HSA 19p that contained some of the ten pro-paralogs including an ancestor of human JAK1 and TYK2. The situation is similar in the chimpanzee and mouse genomes; in the rat genome, however, RLN3 and INSL3 are on different chromosomes. The chicken genome assembly was uninformative in this case. The analysis of the genome assembly of Tetraodon (http://www.genoscope.cns.fr/externe/tetranew/) predicted orthologs of human RLN3 and JAK1 positioned together on Tetraodon chromosome 1. A second JAK family ortholog was found in the vicinity, but the gene was not accompanied by any INSL/RLN gene.

A constellation of gene family members similar to that in the vicinity of RLN3 and INSL3 on HSA 19p, but on a larger scale and more diluted, could be discerned in the chromosomal regions that host the INSL/RLN subfamily genes (Katsanis, Fitzgibbon, and Fisher 1996). The regions around each paralog of the INSL/RLN subfamily shared other gene family members and spanned segments of HSA 1, 9/5, and 19p. If transitive homology (Vandepoele, Simillion, and Van de Peer 2003; Van de Peer 2004) between chromosomal regions was considered, similarities were also found on HSA 6p and 15q (fig. 2; Fig. S1 and Table S1, Supplementary Material online). Such similarities are hallmarks of a paralogon (Katsanis, Fitzgibbon, and Fisher 1996; Coulier et al. 2000; Lundin, Larhammar, and Hallböök 2003) and are suggested to be the result of large segmental or total genome duplications that occurred before the vertebrate radiation and after the amphioxus-vertebrate split. By comparing whether the molecular history of genes in the four regions (HSA 1, 6p/15q, 9/5, and 19p) paralleled the evolution of INSL/RLN subfamily members, the extension of those paralogy regions was determined (see Methods).

Gene families were generated using MCL (Enright, Van Dongen, and Ouzounis 2002) and Blat UCSC searches (http://genome.ucsc.edu/cgi-bin/hgBlat). Phylogeny was then used to determine if larger families should be divided into subfamilies. One hundred seventy-four families or subfamilies that had at least two paralogs on HSA 1, 6p/15q, 9/5, or 19p were collected, and the result is summarized in table 1 and presented in detail in Table S1 (Supplementary Material online). Out of a total of 4,069 genes in the region, 625 genes belong to the 174 families. The paralogs were concentrated in certain subregions within HSA 1, 6p/15q, 9/5, and 19p. The subregions were defined and delineated by the borders of chromosomal bands and include HSA 1p36-31 (61), 1p22-p21 (15), 1p13-p11 (8), 1q21-q25 (70), 1q31 (5), 6p24-p21 (64), 6p12 (5), 15q13-q15 (9), 15q21-q26 (68), 9p24-p21 (26), 9p13 (12), 9q21-q22 (31), 9q31-q34 (57), 5p15-p12 (11), 5q11-q15 (33), 5q21-q23 (8), and 19p13 (142) with the number of paralogs for each region in parentheses. Table S1 (Supplementary Material online) contains links to alignments and phylogenies for each family. The identified 174 families also had 248 paralogs outside the regions in question (Table S1, Supplementary Material online). The defined regions span 545.5 Mb (16.9% of the human genome) and contain 4,069 constitutive genes that in turn encompass 17.1% of the predicted 23,762 genes in the human genome (http://www.ensembl.org/Homo_sapiens/index.html; release 24.34e.1). Gene families that were represented on at least three of the four chromosomal regions (HSA 1, 6p/15q, 9/5, or 19p) are depicted in scale in figure 2.
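The counts quoted in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the per-subregion paralog numbers given in the text and recomputes the two percentages; the variable names are assumptions introduced for illustration.

```python
# Paralogs per subregion of the INSL/RLN paralogon, as listed in the text.
paralogs_per_subregion = {
    "1p36-31": 61, "1p22-p21": 15, "1p13-p11": 8, "1q21-q25": 70, "1q31": 5,
    "6p24-p21": 64, "6p12": 5, "15q13-q15": 9, "15q21-q26": 68,
    "9p24-p21": 26, "9p13": 12, "9q21-q22": 31, "9q31-q34": 57,
    "5p15-p12": 11, "5q11-q15": 33, "5q21-q23": 8, "19p13": 142,
}

genes_in_regions = 4069    # genes investigated in the regions
genes_in_genome = 23762    # predicted genes in the human genome

total_paralogs = sum(paralogs_per_subregion.values())

# Share of the regional genes that belong to the 174 families.
family_share_percent = 100.0 * total_paralogs / genes_in_regions

# Share of all predicted human genes that lie in the regions.
region_share_percent = 100.0 * genes_in_regions / genes_in_genome
```

The per-subregion values add up to the 625 genes reported for the 174 families, which corresponds to the "more than 15%" quoted in the Abstract, and the regional gene share reproduces the 17.1% given above.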


Table 1 Description of the INSL/RLN Region

 
Insulin/IGF Subfamily
Insulin and IGF2 are positioned in tandem in all sequenced mammalian genomes as well as in the chicken genome (chromosome 5, position 11.3 Mb). IGF1 is located on HSA 12q23.3.

The regions hosting the insulin/IGF subfamily members share other gene family members and span stretches of HSA 11 and 12. If transitive homology between regions was considered, the similarities could be extended to include HSA 1/2p/20p, 11, 12/14q/15q, and 19q. As with the INSL/RLN paralogy region, these regions have been suggested to be the result of segmental or genome duplications (Lundin 1993; Popovici et al. 2001b). The extension of the regions was determined by using a similar approach as for the INSL/RLN paralogon (Table S2, Supplementary Material online). Paralogs in 134 families and subfamilies were concentrated in certain subregions that included HSA 1p22-p21, 1p13-p12, 1q21-25, 1q31-q32, 1q41-q44, 2p23-p21, 2p16-p12, 20p12-p11, 11p15-p11, 11q12-q13, 11q23-q24, 12p13-p11, 12q11-q15, 12q21-q24, 14q12-q13, 14q21-q24, 14q31-q32, 15q11-q15, 15q21, 15q24, 15q26, and 19q12-q14. This extended region covered 595 Mb (18.4% of the human genome). The total number of genes in this region was not counted.

Gene Families with Paralogs in Both INSL/RLN and Insulin/IGF Paralogons
Gene families belonging to INSL/RLN paralogon often shared a distantly related paralog residing in the insulin/IGF paralogon and vice versa. Forty-two of the 174 families in the INSL/RLN paralogous region had a more distantly related paralog confined within the insulin/IGF paralogon, whereas 36 of 134 families in the insulin/IGF paralogous region had at least one more distantly related paralog in the INSL/RLN paralogon (Table S3, Supplementary Material online). These results indicate that the two paralogons could have a common origin.

Validation of the Presence of Gene Family Members in the INSL/RLN Region
Using MCL and Blat UCSC, 625 of the 4,069 genes in the region were arranged into 174 families or subfamilies that adhered to the set criteria for the paralogon. Out of 625 genes, 178 are located together with a paralog of the same gene family in the same chromosomal region. Phylogenetic reconstructions were used to deduce if they were a result of large-scale duplications that had occurred at the root of vertebrates. The 178 genes could be related to single-gene duplication events late in vertebrate lineage and were not included in further analysis. This leaves 174 families or subfamilies represented 447 times in the region as a whole. The number of representations in 17 subregions is enclosed in parenthesis: HSA 1p36-31 (46), 1p22-p21 (14), 1p13-p11(7), 1q21-q25 (53), 1q31 (4), 6p24-p21 (25), 6p12 (5), 15q13-q15 (8), 15q21-q26 (54), 9p24-p21 (19), 9p13 (7), 9q21-q22 (27), 9q31-q34 (47), 5p15-p12 (7), 5q11-q15 (24), 5q21-q23 (7), and 19p13 (93). The 248 paralogs positioned outside the region constituted 159 representations of the 174 families (Table S1, Supplementary Material online) when corrected for later single-gene duplications.

A statistical model using the binomial distribution, Bin(n,p), with parameters n and p was used. The number of representations in the regions (k) is 447. The total number of representations in the families including the ones outside the regions (n) is 606 (447 + 159). The probability of a gene being in the region (p) is defined as the fraction of the total number of genes, which equals p = 0.171. The complementary probability q of being located elsewhere in the genome equals q = 1 – p = 0.829. The significance test described in Methods, testing H_0: p = 0.171 against H_1: p ≠ 0.171, gives t = 37 with a significance level of p < 0.001. Thus, H_0 is rejected. Tests where the 17 subregions were considered separately were also performed. In this case a Bonferroni correction was applied to adjust the significance level, i.e., α = 0.05/17. The null hypothesis of randomly distributed genes was rejected for the region (HSA: 1, 6p/15q, 9, or 19p) (p < 0.001).
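The test statistic can be recomputed from the numbers in this section using the normal approximation to the binomial distribution described in Methods. The following is a minimal sketch under that assumption; the function names are hypothetical, and the two-sided p value is obtained from the standard normal distribution via the complementary error function.

```python
import math

def binomial_z(k, n, p):
    """Normal-approximation statistic (k - np) / sqrt(npq) for X ~ Bin(n, p)."""
    q = 1.0 - p
    return (k - n * p) / math.sqrt(n * p * q)

def two_sided_p(z):
    """Two-sided tail probability of a standard normal variable."""
    return math.erfc(abs(z) / math.sqrt(2.0))

k = 447            # representations inside the regions
outside = 159      # representations outside the regions
n = k + outside    # total representations of the 174 families
p = 0.171          # fraction of all genes located in the regions

t = binomial_z(k, n, p)
p_value = two_sided_p(t)

# Bonferroni-adjusted significance level for the 17 subregion tests.
alpha_bonferroni = 0.05 / 17
```

With these inputs t rounds to 37, in agreement with the value reported above, and the resulting p value lies far below both 0.001 and the Bonferroni-adjusted level.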

Three Insulin-RLN Family Homologs in Ciona Genome Draft
A single insulin-RLN family homolog was previously identified in the Ciona genome draft (Satou et al. 2003). It was tentatively denoted Ci-insulin and is slightly similar to the IGF-1B in the human proteome; however, this relation is not supported by phylogenetic analysis (data not shown). TBlastN searches (Altschul et al. 1997) with the full-length protein sequences of human insulin, IGF2, and fruitfly INSL-2 revealed two additional loci. The Ciona proteins exhibited sequence similarities to both vertebrate and invertebrate insulin-RLN family homologs (fig. 4) that will be described in more detail (Olinski et al. [in preparation]). These two novel putative Ciona insulin-RLN family homologs were denoted Ciona insulin-like proteins 2 and 3 (Ciona INS-L2, Ciona INS-L3), considering the previously described Ci-insulin (Ci-insulin, Ciona gene model; ci0100145885) (Satou et al. 2003), which in this work is designated as Ciona INS-L1. EST sequences (Satou et al. 2002) were found that correspond to all three putative Ciona insulin-RLN family homologs, indicating that all three loci are transcribed.



FIG. 4.— Alignment of Ciona INS-L peptides and members of chordate family of insulin/IGF/RLN. The conserved segments of the sequences are boxed. Invariable amino acids among insulin species are indicated in bold.

 
Genomic Localization of the Three Ciona Insulin/IGF/RLN/INSL Homologs: Ciona INS-L1, Ciona INS-L2, and Ciona INS-L3
Ciona INS-L1 was located in scaffold 58 at position 203316 bp. Scaffold 5 hosted Ciona INS-L2 at position 366022 bp and Ciona INS-L3 at 360447 bp (http://genome.jgi-psf.org/ciona4/ciona4.home.html, C. intestinalis 1.0). INS-L2 and INS-L3 are adjacent on scaffold 5 (fig. 5).

Conserved Synteny Between Ciona and Human Genomes in Regions Hosting the Insulin-RLN Gene Family
Ciona INS-L2 and INS-L3 are positioned in tandem close to a Ciona ortholog (Ciona gene model ci0100145352) of the vertebrate monooxygenase (PAH; fig. 5). A close linkage of IGF genes and monooxygenase genes is conserved throughout the vertebrate subphylum (data not shown). An extended comparison of gene homologies between Ciona and Homo sapiens, with the Ciona INS-L genes as anchor points, revealed similarities between the two species in the regions hosting the insulin-RLN family members. If transitive homology was considered and a comparison was made between the predicted genes surrounding the three Ciona INS-Ls and the human paralogon regions hosting insulin/IGF as well as the INSL/RLN subfamilies identified in this study, the similarity was even greater. Figure 5 displays homologies between Ciona gene models and human genes. Ten or more gene predictions in each direction from the Ciona INS-L genes were analyzed. The two phylogenetically most related human genes for each Ciona gene model were included in figure 5. Those genes were considered as human orthologs. For several Ciona gene models, only one gene could be assigned as an ortholog. The most related ortholog is indicated in bold (fig. 5). Five of the 50 analyzed Ciona gene models could not be assigned a vertebrate homolog. Thirty-five of the remaining 45 Ciona gene models had a human ortholog that resided within the regions of the insulin/IGF and/or INSL/RLN human paralogons. Ten Ciona gene models had human orthologs that could not be found within the two paralogon regions. If only the phylogenetically most related orthologs were considered, 27 genes fall within the regions and 20 outside (fig. 5). In both cases, the number of orthologs in the paralogons is higher than what can be expected if the distribution would have been random (17 genes), considering the relation of the size of the regions to the total human genome.

Thirty Ciona gene models in each of three arbitrarily selected segments in scaffolds 1–3 (scaffold 1: 482508–800074, 2: 482508–79666, 3: 6690–386958) were analyzed with respect to the chromosomal location of the two most similar human orthologs. Among the 30 gene models in the segment on scaffold 1, 8 were within and 18 were outside the paralogon regions, and 4 could not be assigned an ortholog. For the segment on scaffold 2, the numbers were 6 within, 22 outside, and 2 without vertebrate orthologs, and for scaffold 3, 10 within, 17 outside, and 4 without vertebrate orthologs. Fewer orthologs than would be expected from a random distribution were found within the paralogon regions. This agrees with the prediction that other Ciona regions are most likely related to human chromosomal regions other than the INSL/RLN and insulin/IGF paralogons.
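
The control comparison can be sketched in the same spirit. The example below assumes the per-gene chance of 17/45 implied by the random expectation stated for the INS-L regions, and it considers only gene models that could be assigned an ortholog. Both choices are assumptions made for illustration and do not describe the statistics used in this study.

```python
# Counts reported for the three control segments: (within, outside) the paralogons.
control_segments = {
    "scaffold 1": (8, 18),
    "scaffold 2": (6, 22),
    "scaffold 3": (10, 17),
}

p_random = 17 / 45  # assumption: chance level implied by the expected 17 of 45

def expected_within(within, outside, p):
    """Expected number of hits among the gene models with an assigned ortholog."""
    return (within + outside) * p

for name, (within, outside) in control_segments.items():
    exp = expected_within(within, outside, p_random)
    print(f"{name}: observed {within}, expected by chance {exp:.1f}")
```

With these inputs the observed count is below the chance expectation for every control segment, in line with the statement above.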


    Discussion
 
The results presented in this study provide support for three major events that shaped the insulin/IGF/INSL/RLN gene family. In addition, we propose that these events have contributed to the generation of the overall structure of the extant vertebrate genomes. First, a duplication of a chromosomal segment that carried the vertebrate insulin/IGF/INSL/RLN ancestor occurred. The resulting two segment copies hosted not only the insulin/IGF and the INSL/RLN intermediate ancestors but also many ancestors of genes that can be found in the two paralogons described in this study. Two subsequent large duplication events, most likely total genome duplications occurring before the vertebrate radiation (Ohno 1970; Lundin 1993; Spring 1997; Popovici et al. 2001a; Abi-Rached et al. 2002; X. Gu, Wang, and J. Gu 2002; McLysaght, Hokamp, and Wolfe 2002; Vandepoele et al. 2004), gave rise to the quadruplicate paralogy regions, which today constitute the insulin/IGF as well as the INSL/RLN paralogons in the human genome (fig. 6). Several single-gene duplications, small segmental duplications, and gene losses, before and after the major duplication events, contributed to the formation of the extant gene families in the different vertebrate genomes.



FIG. 6.— Schematic diagram showing the proposed sequence of events during the formation of the human insulin-RLN gene family.

 
We propose that the first segmental duplication produced two "blocks" positioned on the same ancestral chromosome and that this chromosome was later duplicated in the genome duplication events. This is supported by the fact that the insulin/IGF and the INSL/RLN paralogons are linked in vertebrate genomes. The human insulin/IGF and INSL/RLN paralogons are linked in two of four cases: HSA 1 and 19. Reconstructions of putative ancestral mammalian chromosomes using the recently published chicken genome assembly (Hillier et al. 2004) place the remaining two insulin/IGF paralogon regions (HSA 11 and HSA 12) together with regions of the INSL/RLN paralogon. Regions of HSA 11 are placed together with HSA 9, HSA 5, and HSA 19p on putative mammalian ancestor chromosome 12. HSA 11 is also placed together with HSA 5 on putative mammalian ancestor chromosome 13 and with HSA 15 on putative mammalian ancestor chromosome 8. HSA 12 is placed together with HSA 6 on putative chromosome 11. Furthermore, reconstructions of putative ancestral bony fish chromosomes using the Tetraodon genome assembly (Jaillon et al. 2004) recognize linkage of HSA 11 with HSA 5 and HSA 9 (ancestral fish chromosomal regions H:3, 1 and A:3, 5). HSA 12 is linked with HSA 6 and HSA 15 (ancestral fish chromosomal regions K:10, 8 and J:2, 6). HSA 1 is suggested to represent an ancestral chromosomal configuration within the mammalian class (Murphy, Stanyon, and O'Brien 2001; Murphy et al. 2003; Hillier et al. 2004). This is in agreement with our suggestion that the pro-orthologous regions for the insulin/IGF and INSL/RLN paralogons were duplicated, at least once, as a continuous segment that carried both paralogy regions and that one of the copies has remained linked (HSA 1). The other(s) were fragmented and are part of smaller chromosomes. A similar interpretation would predict that HSA 19 represents an ancestral configuration. However, it has been proposed that the joining of the p- and q-arms of HSA 19 was a late event in vertebrate evolution (Fronicke et al. 2003; Yang et al. 2003).

Our conclusions are built on the identification and evaluation of the two human paralogons that carry the insulin-RLN family members and on the conserved synteny between regions of the Ciona genome and the paralogons. The results imply that the two Ciona regions share common origins with the two discussed paralogons. A rational approach to uncovering genomic homology such as paralogons is to employ gene-homology matrices (Wolfe and Shields 1997; Vision, Brown, and Tanksley 2000). Such analysis requires some conservation of gene order, but this requirement was shown to be too strict for analysis of the human genome (Van de Peer 2004). Relaxing the constraint of conserved gene order and considering only gene content was shown to be applicable (Abi-Rached et al. 2002). We used the MCL method in combination with phylogeny to generate and identify families and subfamilies that adhere to the paralogon criteria that were stipulated for the definition of regions of the human genome with conserved gene content. The MCL parameters were strict and ensured that genes were not overclustered into families. Several large gene families that were produced by ancient duplications could be divided into subfamilies inferred by phylogeny. Families were counted only once for the statistical validation even if they were represented by several paralogs in the same chromosomal segment. The rationale for this is that duplications that produced paralogs locally are often more recent than the postulated major duplication events. This agrees with the observation that many tandem and small segmental duplication events occurred late in the primate lineage (Eichler 2001; Bailey et al. 2002). The result of the statistical validation strongly argues against a model in which the pattern of genes in the INSL/RLN paralogon resulted from a random distribution of genes in the genome. This conclusion is in agreement with previous similar tests (Vienne et al. 2003a).
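
The rule that a family is counted only once per chromosomal segment can be sketched as follows. The gene, family, and segment labels are hypothetical placeholders, and the function illustrates the counting rule only; it is not the pipeline used in this study.

```python
from collections import defaultdict

def segments_per_family(assignments):
    """Map each family to the set of segments in which it occurs.

    Using a set means that several local paralogs in one segment
    contribute a single count for that segment.
    """
    segments = defaultdict(set)
    for gene, family, segment in assignments:
        segments[family].add(segment)
    return dict(segments)

# Hypothetical (gene, family, segment) records.
records = [
    ("geneA1", "famA", "seg1"),
    ("geneA2", "famA", "seg1"),  # local (tandem) paralog in the same segment
    ("geneA3", "famA", "seg2"),
    ("geneB1", "famB", "seg3"),
]

counts = {fam: len(segs) for fam, segs in segments_per_family(records).items()}
print(counts)  # {'famA': 2, 'famB': 1}
```

Here famA has three genes but is counted in only two segments, so a recent local duplication does not inflate the evidence for an ancient large-scale duplication.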

Insulin/IGF Genes
Three divergent Ciona insulin/IGF/INSL/RLN genes argue that two duplication events (one segmental duplication followed by a single-gene duplication) are both older than the split of Ciona from the common chordate lineage. This result agrees with a duplication that gave rise to two distinct insulin/IGF genes before the vertebrate radiation. At least two insulin/IGF-like genes exist in the tunicate Chelyosoma productum: an insulin-like and an IGF-like gene (McRory and Sherwood 1997). However, phylogenetic analysis does not provide support for grouping any Ciona or Chelyosoma genes with any particular vertebrate paralog but rather groups each species separately (data not shown). This reflects the low sequence similarity and the low number of informative characters in the alignments. Physical linkage between the two Chelyosoma genes, as is the case for Ciona INS-L2 and INS-L3 and for the insulin and IGF2 genes in mammalian and chicken genomes, would provide strong evidence for their insulin/IGF orthology. Interestingly, in the genome assemblies of Danio rerio, F. rubripes, and Tetraodon, the genes for insulins (Irwin 2004) and IGF2s are not physically linked. This separation likely represents a specific feature of the bony fish lineage. The closely linked positions of IGFs and monooxygenase genes are conserved in all of the assembled vertebrate genomes (data not shown, http://www.ensembl.org/). The close linkage of a Ciona monooxygenase homolog with Ciona INS-L2 and INS-L3 (fig. 5) supports the assumption that linkage between insulin/IGF and monooxygenase homologs is ancient (Patton, Luke, and Holland 1998). The similarity between the adjacency of Ciona INS-L2 and INS-L3 and that of the insulin and IGF2 genes in mammalian and chicken genomes indicates that Ciona INS-L2 and INS-L3 are putative orthologs of vertebrate insulin and IGF genes.

Two of the four paralogy regions in the human insulin/IGF paralogon carry insulin and IGF genes. A similar pattern can be seen in all of the analyzed vertebrate genomes. The pattern is distinguishable in bony fishes but disguised by the extra round of genome duplication (Christoffels et al. 2004; Jaillon et al. 2004). This implies that one of the ancestral insulin/IGF cluster copies could have been already lost after the first of two suggested genome duplications. The proposed outline of events that formed human insulin, IGF1, and IGF2 is displayed in figure 6.

INSL/RLN Genes
Given the data, we propose that Ciona INS-L1 is an ortholog of the human INSLs and RLNs. We suggest that the INSL/RLN ancestor was duplicated twice as a result of putative genome duplications at the dawn of vertebrates, producing four genes that resided in the ancestral regions of the INSL/RLN paralogon. The positional information suggests that one copy evolved into INSL5 (HSA 1), a second copy was duplicated together with a small segment and the resulting paralogs evolved into RLN3 and INSL3 (HSA 19p), and the third copy underwent further single-gene duplications to produce the cluster with INSL6, INSL4, RLN2, and RLN1 (HSA 9). The fourth copy (positioned in the ancestral region of HSA 6p/15q) must have been lost. The relations between RLN and INSL sequences inferred by phylogenetic analysis (fig. 1) were not supported statistically and could not be used for resolving the timing of the duplications. Long branches, a high homoplasy index, and few informative sites are in agreement with the low degree of conservation between the INSL/RLN paralogs (Bajaj, Blundell, and Wood 1984; Dores, Rubin, and Quinn 1996; Conlon 2000; Wilkinson et al. 2005). Our duplication scenario agrees with another scenario that takes function and receptor interactions into consideration (Hsu 2003). Position and phylogeny indicate that the duplication producing human INSL4 and RLN1 occurred in the primate lineage and that the duplication that produced human INSL6 and RLN2 occurred before the mammalian radiation. A chicken INSL/RLN sequence is positioned on a segment of chromosome Z (Hillier et al. 2004). This region is syntenic to the region of HSA 9 that carries INSL6, INSL4, RLN2, and RLN1, which suggests that this gene is an ortholog of the human genes positioned on HSA 9. However, the sequence assembly of chromosome Z is not complete; thus, there may be a second gene on chromosome Z, which would indicate that one of the duplications occurred earlier than the reptilian/avian divergence from the mammalian lineage. The time window in which the segmental duplication on HSA 19p that gave rise to RLN3 and INSL3 occurred is less clear. Orthologs of neither RLN3 nor INSL3 can be found on chicken chromosome 28, which is suggested to be syntenic to HSA 19p (Smith et al. 2002; Hillier et al. 2004), so the chicken genome is not useful for restricting the time window. Interestingly, two JAK homologs and one RLN homolog are present on Tetraodon chromosome 1, and the region is suggested to be syntenic with HSA 19p (Jaillon et al. 2004), which carries RLN3, INSL3, and two JAK homologs. The two Tetraodon JAK homologs are separated on chromosome 1, while the RLN homolog is close to one of them. The organization is similar to that of the region on human chromosome 19p (fig. 3). This implies that the segmental duplication in the ancestral HSA 19p region is old and occurred before the divergence of bony fishes from the common vertebrate lineage but after the second genome duplication.

It has been shown that the relaxin proteins bind and activate leucine-rich repeat–containing G-protein–coupled receptors (LGRs) (Hsu et al. 2000, 2002; Kumagai et al. 2002; Sudo et al. 2003) together with two members of relaxin 3 G-protein–coupled receptors (RLN3Rs) (Liu et al. 2003, 2005). This finding was surprising because the classical signaling unit of the insulin and IGF receptors is of the receptor tyrosine kinase type. The Ciona genome contains both potential LGR and insulin receptor homologs (Satou et al. 2003; Campbell, Satoh, and Degnan 2004). Ciona INS-L1, the putative ortholog of the vertebrate INSL/RLN genes, lacks an LGR-binding motif found in several RLNs (Hsu 2003; Wilkinson et al. 2005), and it remains an open question which receptor type Ciona INS-L1 interacts with. The best characterized vertebrate ligands of LGRs are the heteromeric glycoprotein hormones: follicle-stimulating, luteinizing, and thyroid-stimulating hormones. These hormones interact with type A LGRs and relaxins with type C LGRs, while the type B LGRs are still orphan receptors (Hsu et al. 2002). Members of type A and B LGRs are located in the insulin/IGF paralogon (Table S2, Supplementary Material online), whereas RLN3R1 (GPCR135) and RLN3R2 (GPR100) reside in the region that belongs to the INSL/RLN paralogon (Table S1, Supplementary Material online). The type C LGRs (the relaxin receptors LGR7 and LGR8) are on HSA 4 and HSA 13, which are associated with the insulin/IGF paralogon via HSA 11 in the predicted ancestral mammalian chromosome 8 (Hillier et al. 2004). Coincidental or not, the follicle-stimulating, luteinizing, and thyroid-stimulating hormones are located in the insulin/IGF paralogon (Table S2, Supplementary Material online). Furthermore, the insulin receptor family members are in the INSL/RLN paralogon; thus, the involved gene families are all present in the two paralogons. This implies that ancestral homologs of the ligands (shown in this study) as well as of the receptors were present in the suggested prechordate chromosomal segment that was subjected to major duplication events.

Simultaneous duplication of ligands and of cognate as well as noncognate receptors opens possibilities for increased complexity through coevolution of specific ligand-receptor pairs (Hallböök 1999) and also supplies the substrate for ligand interactions with new receptors. Ancient large-scale duplications, as supported by the results in this study and as proposed by Ohno and many others (Holland et al. 1994), have most likely played pivotal roles in the evolution of biological innovations. The similarities between the Ciona genome and the quadruplicate consecutive paralogy regions identified in this study support the growing notion that vertebrate genomes are the product of genome duplications of an ancestral prechordate genome. These data also contribute to the effort to reconstruct the Urbilaterian genome.


    Supplementary Material
 
Tables S1–S3 (provided as Excel files) and Figure S1 (provided as a JPEG file) are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Table S1. Gene families with at least two paralogs falling into the INSL/RLN paralogon (HSA: 1, 6p/15q, 9/5, 19p).
Table S2. Gene families with at least two paralogs falling into the insulin/IGF paralogon (HSA: 1/2p/20p, 11, 12/14q/15q, 19q).
Table S3. Gene families with the paralogs present in INSL/RLN and insulin/IGF paralogons.
Figure S1. Schematic representation of selected gene family paralogs within the (A) INSL/RLN paralogon spanning human chromosomes 1, 6p/15q, 9/5, and 19p and (B) insulin/IGF paralogon spanning human chromosomes 1/2p/20p, 11, 12/14q/15q, and 19q.


    Acknowledgements
 
We thank Flora de Pablo, Catalina Hernandez, and Maria Jönsson for taking part in the initial phase of this work; Dietrich von Rosen for help with the statistics; Dan Larhammar and his group for valuable discussions and input; Bob Campbell for help with the identification of putative Ciona LGRs and the analysis of Ciona INS-L1; and Sarah Myers for scrutinizing the manuscript. This work was supported by the European Community's Human Potential Program under contract HPRN-CT-2002-00263, Wallenberg Consortium North, and the Swedish Research Council.


    Footnotes
 
Kenneth Wolfe, Associate Editor


    References
 

    Abi-Rached, L., A. Gilles, T. Shiina, P. Pontarotti, and H. Inoko. 2002. Evidence of en bloc duplication in vertebrate genomes. Nat. Genet. 31:100–105.

    Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.

    Bailey, J. A., Z. Gu, R. A. Clark, K. Reinert, R. V. Samonte, S. Schwartz, M. D. Adams, E. W. Myers, P. W. Li, and E. E. Eichler. 2002. Recent segmental duplications in the human genome. Science 297:1003–1007.

    Bajaj, M., T. Blundell, and S. Wood. 1984. Evolution in the insulin family: molecular clocks that tell the wrong time. Biochem. Soc. Symp. 49:45–54.

    Campbell, R. K., N. Satoh, and B. M. Degnan. 2004. Piecing together evolution of the vertebrate endocrine system. Trends Genet. 20:359–366.

    Chan, S. J., Q. P. Cao, and D. F. Steiner. 1990. Evolution of the insulin superfamily: cloning of a hybrid insulin/insulin-like growth factor cDNA from amphioxus. Proc. Natl. Acad. Sci. USA 87:9319–9323.

    Christoffels, A., E. G. Koh, J. M. Chia, S. Brenner, S. Aparicio, and B. Venkatesh. 2004. Fugu genome analysis provides evidence for a whole-genome duplication early during the evolution of ray-finned fishes. Mol. Biol. Evol. 21:1146–1151.

    Conlon, J. M. 2000. Molecular evolution of insulin in non-mammalian vertebrates. Am. Zool. 40:200–212.

    Conlon, J. M. 2001. Evolution of the insulin molecule: insights into structure-activity and phylogenetic relationships. Peptides 22:1183–1193.

    Conlon, J. M., Y. Wang, and I. C. Potter. 2001. The structure of Mordacia mordax insulin supports the monophyly of the Petromyzontiformes and an ancient divergence of Mordaciidae and Geotriidae. Comp. Biochem. Physiol. B Biochem. Mol. Biol. 129:65–71.

    Coulier, F., C. Popovici, R. Villet, and D. Birnbaum. 2000. MetaHox gene clusters. J. Exp. Zool. 288:345–351.

    Dehal, P., Y. Satou, R. K. Campbell et al. (86 co-authors). 2002. The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science 298:2157–2167.

    Dores, R. M., D. A. Rubin, and T. W. Quinn. 1996. Is it possible to construct phylogenetic trees using polypeptide hormone sequences? Gen. Comp. Endocrinol. 103:1–12.

    Duret, L., N. Guex, M. C. Peitsch, and A. Bairoch. 1998. New insulin-like proteins with atypical disulfide bond pattern characterized in Caenorhabditis elegans by comparative sequence analysis and homology modeling. Genome Res. 8:348–353.

    Eichler, E. E. 2001. Recent duplication, domain accretion and the dynamic mutation of the human genome. Trends Genet. 17:661–669.

    Enright, A. J., S. Van Dongen, and C. A. Ouzounis. 2002. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30:1575–1584.

    Fronicke, L., J. Wienberg, G. Stone, L. Adams, and R. Stanyon. 2003. Towards the delineation of the ancestral eutherian genome organization: comparative genome maps of human and the African elephant (Loxodonta africana) generated by chromosome painting. Proc. R. Soc. Lond. B Biol. Sci. 270:1331–1340.

    Gu, X., Y. Wang, and J. Gu. 2002. Age distribution of human gene families shows significant roles of both large- and small-scale duplications in vertebrate evolution. Nat. Genet. 31:205–209.

    Hallböök, F. 1999. Evolution of the vertebrate neurotrophin and Trk receptor gene families. Curr. Opin. Neurobiol. 9:616–621.

    Hillier, L. W., W. Miller, E. Birney et al. (175 co-authors). 2004. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432:695–716.

    Holland, P. W., J. Garcia-Fernandez, N. A. Williams, and A. Sidow. 1994. Gene duplications and the origins of vertebrate development. Dev. Suppl. 1994:125–133.

    Hsu, S. Y. 2003. New insights into the evolution of the relaxin-LGR signaling system. Trends Endocrinol. Metab. 14:303–309.

    Hsu, S. Y., M. Kudo, T. Chen, K. Nakabayashi, A. Bhalla, P. J. van der Spek, M. van Duin, and A. J. W. Hsueh. 2000. The three subfamilies of leucine-rich repeat-containing G protein-coupled receptors (LGR): identification of LGR6 and LGR7 and the signaling mechanism for LGR7. Mol. Endocrinol. 14:1257–1271.

    Hsu, S. Y., K. Nakabayashi, S. Nishi, J. Kumagai, M. Kudo, O. D. Sherwood, and A. J. Hsueh. 2002. Activation of orphan receptors by the hormone relaxin. Science 295:671–674.

    Irwin, D. M. 2004. A second insulin gene in fish genomes. Gen. Comp. Endocrinol. 135:150–158.

    Jaillon, O., J. M. Aury, F. Brunet et al. (61 co-authors). 2004. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature 431:946–957.

    Katsanis, N., J. Fitzgibbon, and E. M. Fisher. 1996. Paralogy mapping: identification of a region in the human MHC triplicated onto human chromosomes 1 and 9 allows the prediction and isolation of novel PBX and NOTCH loci. Genomics 35:101–108.

    Kumagai, J., S. Y. Hsu, H. Matsumi, J. S. Roh, P. Fu, J. D. Wade, R. A. Bathgate, and A. J. Hsueh. 2002. INSL3/Leydig insulin-like peptide activates the LGR8 receptor important in testis descent. J. Biol. Chem. 277:31283–31286.

    Lagueux, M., L. Lwoff, M. Meister, F. Goltzene, and J. A. Hoffmann. 1990. cDNAs from neurosecretory cells of brains of Locusta migratoria (Insecta, Orthoptera) encoding a novel member of the superfamily of insulins. Eur. J. Biochem. 187:249–254.

    Liu, C., J. Chen, S. Sutton, B. Roland, C. Kuei, N. Farmer, R. Sillard, and T. W. Lovenberg. 2003. Identification of relaxin-3/INSL7 as a ligand for GPCR142. J. Biol. Chem. 278:50765–50770.

    Liu, C., C. Kuei, S. Sutton et al. (15 co-authors). 2005. INSL5 is a high affinity specific agonist for GPCR142 (GPR100). J. Biol. Chem. 280:292–300.

    Lundin, L. G. 1993. Evolution of the vertebrate genome as reflected in paralogous chromosomal regions in man and the house mouse. Genomics 16:1–19.

    Lundin, L. G., D. Larhammar, and F. Hallböök. 2003. Numerous groups of chromosomal regional paralogies strongly indicate two genome doublings at the root of the vertebrates. J. Struct. Funct. Genomics 3:53–63.

    McLysaght, A., K. Hokamp, and K. H. Wolfe. 2002. Extensive genomic duplication during early chordate evolution. Nat. Genet. 31:200–204.

    McRory, J. E., and N. M. Sherwood. 1997. Ancient divergence of insulin and insulin-like growth factor. DNA Cell Biol. 16:939–949.

    Murphy, W. J., L. Fronicke, S. J. O'Brien, and R. Stanyon. 2003. The origin of human chromosome 1 and its homologs in placental mammals. Genome Res. 13:1880–1888.

    Murphy, W. J., R. Stanyon, and S. J. O'Brien. 2001. Evolution of mammalian genome organization inferred from comparative gene mapping. Genome Biol. 2:REVIEWS0005.

    Nagata, K., H. Hatanaka, D. Kohda, H. Kataoka, H. Nagasawa, A. Isogai, H. Ishizaki, A. Suzuki, and F. Inagaki. 1995. Three-dimensional solution structure of Bombyxin-II an insulin-like peptide of the silkmoth Bombyx mori: structural comparison with insulin and relaxin. J. Mol. Biol. 253:749–758.

    Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, New York.

    Patton, S. J., G. N. Luke, and P. W. H. Holland. 1998. Complex history of a chromosomal paralogy region: insights from amphioxus aromatic amino acid hydroxylase genes and insulin-related genes. Mol. Biol. Evol. 15:1373–1380.

    Popovici, C., M. Leveugle, D. Birnbaum, and F. Coulier. 2001a. Homeobox gene clusters and the human paralogy map. FEBS Lett. 491:237–242.

    Popovici, C., M. Leveugle, D. Birnbaum, and F. Coulier. 2001b. Coparalogy: physical and functional clusterings in the human genome. Biochem. Biophys. Res. Commun. 288:362–370.

    Satou, Y., Y. Sasakura, L. Yamada, K. S. Imai, N. Satoh, and B. Degnan. 2003. A genomewide survey of developmentally relevant genes in Ciona intestinalis. V. Genes for receptor tyrosine kinase pathway and Notch signaling pathway. Dev. Genes Evol. 213:254–263.