MBE Advance Access originally published online on October 20, 2008
Molecular Biology and Evolution 2009 26(1):123-129; doi:10.1093/molbev/msn233
Research Articles
Parallel Evolution between Aromatase and Androgen Receptor in the Animal Kingdom

* Department of Ecology and Evolution, University of Chicago
Department of Life Science, Assam University, Silchar, India
E-mail: whli{at}uchicago.edu.
| Abstract |
|---|
There are now many known cases of orthologous or unrelated proteins in different species that have undergone parallel evolution to satisfy a similar function. However, there are no reported cases of parallel evolution for proteins that bind a common ligand but have different functions. We focused on two proteins that have different functions in steroid hormone biosynthesis and action but bind a common ligand, androgen. The first protein, androgen receptor (AR), is a nuclear hormone receptor and the second one, aromatase (cytochrome P450 19 [CYP19]), converts androgen to estrogen. We hypothesized that binding of the androgen ligand has exerted common selective pressure on both AR and CYP19, resulting in a signature of parallel evolution between these two proteins, though they perform different functions. Consistent with this hypothesis, we found that rates of amino acid change in AR and CYP19 are strongly correlated across the metazoan phylogeny, whereas no significant correlation was found in the control set of proteins. Moreover, we inferred that genomic toolkits required for steroid biosynthesis and action were present in a basal metazoan, cnidarians. The close similarities between vertebrate and sea anemone AR and CYP19 suggest a very ancient origin of their endocrine functions at the base of metazoan evolution. Finally, we found evidence supporting the hypothesis that the androgen-to-estrogen ratio determines the gonadal sex in all metazoans.
Key Words: androgen receptor, aromatase, steroid hormone, parallel evolution
| Introduction |
|---|
Parallel evolution is a common phenomenon in nature where two evolutionary lineages undergo similar genotypic and phenotypic modifications under similar environments. It has been observed among geographically isolated animals like Australian marsupials that resemble placental wolves, cats, mice, squirrels, or anteaters. Molecular parallel evolution is manifested by coordinated substitutions of amino acids in independent proteins to achieve optimal structural and functional integrity under similar selective constraints. So far, studies on molecular parallel evolution have been restricted to genes (proteins) involved in the same function in different species such as lysozymes in foregut fermenters (Stewart et al. 1987
Both proteins play a vital role in the vertebrate endocrine system. The first protein is an enzyme, cytochrome P450 19 (CYP19; EC 1.14.14.1 [EC] ), commonly known as aromatase, which converts androgens to estrogens in the steroid biosynthetic pathway. The second protein is a ligand-activated intracellular transcriptional regulator, the androgen receptor (AR), which belongs to the nuclear receptor superfamily. Both proteins bind to androgenic hormones such as testosterone and dihydrotestosterone to execute their function. Androgens are sex hormones responsible for expressing male characteristics in vertebrates.
Animal evolution is characterized by the development of complex nervous and endocrine systems that allow the organism to coordinate its reaction to the environment, to regulate its development, and to maintain homeostasis. The sea anemone Nematostella vectensis (phylum Cnidaria) is the first basal metazoan animal with available genomic sequences and endowed with a nervous system in the form of a nerve net of ectodermal origin (Baguna and Garcia-Fernandez 2003). However, very little is known about endocrine-like bioregulation in cnidarians due to the lack of well-organized organs or systems. Several vertebrate hormones including steroid hormones have been identified in cnidarians (Tarrant 2005). Moreover, aromatase activity has been demonstrated in coral tissue and found to be temperature dependent as in reptiles (Twan et al. 2006). Because cnidarians have an extensive complement of metazoan genetic tools, such as developmental genes (Baguna and Garcia-Fernandez 2003), it is interesting to identify the primitive genomic tools for biosynthesis of steroid hormones or for their receptors in cnidarians. Moreover, the presence of vertebrate-type steroid hormones such as testosterone, estradiol, and progesterone in insects, crustaceans, mollusks, echinoderms, annelids, and nematodes has raised speculations about their vertebrate-like biological function in these organisms (De Loof 2006; Motola et al. 2006; Barbaglio et al. 2007; Durou and Mouneyrac 2007; Kohler et al. 2007). To date, few orthologs of genes encoding enzymes and receptors involved in the vertebrate endocrine system have been identified in invertebrates. The ever increasing genomic databases of various invertebrates have given us an opportunity to investigate orthologs encoding vertebrate-like endocrine proteins in a basal organism like the sea anemone and in highly divergent groups like insects and nematodes. Moreover, protein components having a shared functional constraint in the form of a common ligand in a complex endocrine system are likely to exhibit parallel evolution in their phylogenetic history.
Thus, this study focuses on detection of parallel evolution between AR and CYP19 by examining the correlation of their evolutionary distances in 37 vertebrate and invertebrate species. We also correlate, between the two proteins, the interspecies sequence similarity to the human protein sequences. Finally, we compare the AR and CYP19 phylogenetic trees using topology and branch length information and correlations between the pairwise patristic distance matrices derived from the two trees.
| Materials and Methods |
|---|
Parallel evolution among proteins under a selective constraint is expected to exhibit similar phylogenetic trees in terms of topologies and branch lengths along with high correlation in their maximum likelihood (ML) distances. The topologies and branch lengths of phylogenetic trees were compared using K-tree score, Robinson–Foulds (RF) distance, and patristic distance. Similarly, the mirror tree approach was used to compute a linear correlation coefficient among two ML distance matrices having a common set of species after a correction for the background speciation to establish the degree of functional association between them (Kann et al. 2007
Database Searches and Alignment
Complete genomic sequences and core nucleotides available on Ensembl v. 49 and GenBank were searched for proteins homologous to AR and CYP19A, respectively. The accession numbers of all gene sequences from the 37 species included in this study are given in supplementary table 1 (Supplementary Material online). The coding sequences of both characterized and putative orthologs of AR and CYP19 were collected and translated into amino acid sequences using BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html). Amino acid sequences were aligned using the M-Coffee Web server (Moretti et al. 2007; http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi). The amino acid sequence alignments of AR and CYP19 are given in supplementary figures 5 and 6 (Supplementary Material online).
Phylogenetic Analysis
The orthologous relationship between all genes was tested by most available methods of phylogenetics. The best model for the AR and CYP19 protein sequences was selected using ProtTest (Abascal et al. 2005). The Jones–Taylor–Thornton (JTT) model (Jones et al. 1992) with the rate heterogeneity among-site model was found to be most suitable for both AR and CYP19 based on all available frameworks.
Phylogenetic trees were constructed by the Neighbor-Joining (NJ), maximum parsimony (MP), ML, and Bayesian inference (BI) methods for each set of alignments. Sea anemone was treated as the outgroup for rooting the tree in all phylogenies. NJ trees were constructed using the program MEGA4 (Tamura et al. 2007) with the JTT model of amino acid replacement. MP trees were constructed using PAUP* 4.0b10 (Swofford 2002). In all, 1,000 bootstrap replicates were conducted, each composed of 100 random additions of taxa and a heuristic search using Tree Bisection-Reconnection branch swapping. ML analyses were also performed on these data sets, using the PROML module implemented in the PHYLIP package (Felsenstein 2004). MrBayes v.3.0b4 (Huelsenbeck and Ronquist 2001) was used for Bayesian Markov chain Monte Carlo analysis over the JTT model, assuming a four-category gamma among-site rate variation distribution, with uniform priors over trees, branch lengths, and the among-site rate variation alpha parameter. Three independent analyses, each with four chains, were run for 10^6 generations and sampled every 1,000 generations. The first 250 samples from each run were discarded as burn-in and all analyses converged on the same consensus tree. The topologies of various trees were compared using the one-sided Kishino-Hasegawa (KH) test and the Shimodaira-Hasegawa (SH) test (Goldman et al. 2000) as implemented in Tree-Puzzle version 5.2 (Schmidt et al. 2002). Topologies and relative branch lengths of phylogenetic trees were compared using Ktreedist (Soria-Carrasco et al. 2007), which calculates the K-tree score and the Robinson–Foulds distance (Robinson and Foulds 1981). The distribution of sitewise log-likelihood values was computed using Tree-Puzzle. The nonparametric Kolmogorov–Smirnov test (Gibbons and Chakraborti 2003) was performed to test different distributions among different topologies using R (R Development Core Team 2008). The images of phylogenetic trees were created using TreeIllustrator (Trooskens et al. 2005) and TREEVIEW (Page 1996).
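As an illustration of the topology comparison described above, the following is a minimal Python sketch of the Robinson–Foulds distance for two trees on the same taxa. It is not the Ktreedist implementation used in this study. The nested-tuple tree encoding, the function names, and the five-taxon toy trees are assumptions made for the example only. The distance is counted as the number of nontrivial bipartitions present in one tree but not in the other.

```python
def collect_clades(tree):
    """Return (leaf set of this subtree, leaf sets of every internal node in it)."""
    if isinstance(tree, str):  # a leaf is just a taxon name
        return frozenset([tree]), []
    leaves, clades = frozenset(), []
    for child in tree:
        child_leaves, child_clades = collect_clades(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)
    return leaves, clades


def splits(tree):
    """Nontrivial bipartitions, each stored as the side lacking a reference leaf."""
    all_leaves, clades = collect_clades(tree)
    reference = min(all_leaves)
    result = set()
    for clade in clades:
        side = all_leaves - clade if reference in clade else clade
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
    return result


def rf_distance(tree_a, tree_b):
    """Number of bipartitions found in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


# Hypothetical toy trees on five taxa (not taken from the study).
tree_1 = (("A", "B"), "C", ("D", "E"))
tree_2 = (("A", "C"), "B", ("D", "E"))
print(rf_distance(tree_1, tree_1))  # identical topologies
print(rf_distance(tree_1, tree_2))  # A-B versus A-C grouping differs
```

Because bipartitions ignore the root position, a rooted and an unrooted encoding of the same topology give a distance of zero in this sketch. Branch lengths are not used here; the K-tree score is what brings them into the comparison.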
Correlation Analysis
For each alignment set, a distance matrix and a similarity table were calculated from pairwise comparisons of the proportion of different amino acids per site, as ML estimates, using the PROTDIST module implemented in the PHYLIP package (Felsenstein 2004). The mirror tree (Pazos and Valencia 2001; Kann et al. 2007) approach with a correction for the background speciation was used to estimate the Pearson correlation coefficient between two ML distance matrices. PATRISTICv1.0 (Fourment and Gibbs 2006) was used to calculate patristic distances from NJ, MP, ML, and BI trees and to calculate Pearson correlation coefficients from distance matrices. Here, the patristic distance is the sum of lengths of the branches that link two extant protein sequences in terms of their pathways of evolutionary divergence. The R language and environment was used to generate scatter plots from ordered pairs of ML and patristic distances (R Development Core Team 2008). All distance matrices are included as the supplementary data file (Supplementary Material online).
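The patristic distance defined above can be sketched in a few lines of Python. This is an illustration only, not the PATRISTIC program used in this study. The child-to-parent dictionary encoding, the function names, and the toy branch lengths are all hypothetical.

```python
def distances_to_ancestors(node, parent):
    """Map a node and each of its ancestors to the summed branch length from the node."""
    distances = {node: 0.0}
    total = 0.0
    while node in parent:
        node, length = parent[node]
        total += length
        distances[node] = total
    return distances


def patristic_distance(tip_a, tip_b, parent):
    """Sum of branch lengths on the path linking two tips of a rooted tree."""
    up_a = distances_to_ancestors(tip_a, parent)
    up_b = distances_to_ancestors(tip_b, parent)
    # With non-negative branch lengths, the shared ancestor giving the smallest
    # total is the most recent common ancestor, so this is the tip-to-tip path.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


# Hypothetical toy tree: each entry maps a node to (parent node, branch length).
toy_parent = {
    "human": ("n1", 0.1),
    "mouse": ("n1", 0.2),
    "n1": ("root", 0.3),
    "anemone": ("root", 0.9),
}
print(patristic_distance("human", "mouse", toy_parent))
print(patristic_distance("human", "anemone", toy_parent))
```

Computing this value for every pair of species in a tree yields the pairwise patristic distance matrix; one such matrix per protein is what the correlation analysis compares.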
| Results |
|---|
Comparison of Tree Topologies
The ML topology (supplementary fig. 3, Supplementary Material online) is the best topology for both AR and CYP19 proteins as supported by the one-sided KH test and the SH test, followed by the BI tree (fig. 1) and the MP tree (supplementary fig. 2, Supplementary Material online). However, the NJ trees (supplementary fig. 4, Supplementary Material online) are excluded from the confidence interval at the 5% significance level.
The phenomenon of parallel evolution between the AR and CYP19 proteins can be revealed using both topology comparison and distance-based approaches. The topology-based methods we used include the K-tree score and the RF distance. The K-tree score takes into account both topology and branch length information of a phylogenetic tree with global evolutionary rates (Soria-Carrasco et al. 2007
Among the AR trees, the BI tree is closely similar to the ML tree in terms of the K-tree score (0.49) and the RF distance (9) followed by similarity between the NJ–ML trees (1.25; 30) and the NJ–BI trees (1.45; 25). However, the MP tree has less similarity to the NJ tree (4.18; 36), the BI tree (4.60; 23), and the ML tree (5.86; 28). Among the CYP19 trees, the BI tree is very close to the ML tree (0.39; 13) followed by the NJ tree (1.44; 25) and the MP tree (2.57; 25). The MP tree exhibits more difference in terms of branch length and topology with the BI (2.57; 25), ML (3.51; 28), and NJ trees (5.01; 44) for CYP19.
In the AR topology, the BI tree and the ML tree are almost identical. It is well supported by the Kolmogorov–Smirnov test statistic (D = 0.0051; P = 1) on the distribution of sitewise log-likelihood values between the BI and ML trees. However, the MP tree has some minor differences such as the clustering of Ciona with Tribolium and the clustering of hedgehog and shrew. Moreover, the NJ tree shows an unusual topology for opossum, bush baby, and cow. For CYP19, the ML tree has few alterations with respect to the position of eel, medaka, guinea pig, and squirrel in the BI tree. However, the distribution of sitewise log-likelihood values between the ML and BI trees was closely similar as supported by the two-sided nonparametric Kolmogorov–Smirnov test statistic (D = 0.0108, P = 1). The MP tree shows some unusual positions such as separate branches for Tribolium, Aedes, and silkworm and the clustering of sea urchin with Drosophila. The NJ tree is more unusual with respect to the known phylogeny in the positions of cat, shrew, opossum, chicken, Xenopus, and fish.
Because the BI and ML trees have high statistical support and are close to the known phylogeny, we assume that the two trees approximately reflect the true evolutionary history of the AR and CYP19 proteins. The major topological conflicts observed between the two protein BI trees are the altered positions of sea urchin, insects, eel, and medaka in different clades. The position of sea urchin in the AR tree is more concordant with the known species tree than in the CYP19 tree where sea urchin diverged before the insect–chordate split. Aedes clusters with Tribolium in the CYP19 tree instead of its dipteran relative, Drosophila. The position of Caenorhabditis elegans, a nematode, in both the AR and CYP19 trees is very interesting from the parallel evolutionary point of view. Instead of clustering with its close ecdysozoan relative insects, C. elegans forms a parallel clade alone in AR and with Ciona in CYP19 in close association with the vertebrate clade. This unusual position of C. elegans may be due to convergent evolution of nematode and vertebrate AR and CYP19 proteins after the divergence of the nematode–insect clade. Medaka AR clustered with two species of puffer fish, Fugu, and Tetraodon in concordance with the known phylogeny. However, medaka CYP19 formed a common separate clade with eel, a species under a distant order Anguilliformes. Between the ML trees of AR and CYP19, the observed differences are similar to the BI trees of respective proteins.
Before concluding that the above observations support parallel evolution between AR and CYP19, we compare the variability in the topologies of the AR and CYP19 trees with respect to random background ML trees, using the nonparametric Kolmogorov–Smirnov test (Gibbons and Chakraborti 2003). The background random trees were generated using the TREEVIEW program (Page 1996) based on ML analysis in the PROML program of the PHYLIP package (Felsenstein 2004). Between the AR and CYP19 trees, the test statistic D is significantly different for BI and ML trees (table 1). However, the test statistics between AR and CYP19 (0.29) are much lower than the average value (D = 0.425) among the background random trees. This is well reflected in the topologies of the background random trees, which are not only incongruent with each other but also not concordant with the known species phylogeny (data not shown). Therefore, more similar topologies between the AR and CYP19 trees in comparison with the background random trees indicate parallel evolution of AR and CYP19.
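The test statistic D reported in this section is the largest gap between the empirical cumulative distributions of two samples, here of sitewise log-likelihood values. The tests in this study were run in R; the Python sketch below only illustrates how D itself is obtained and does not compute a P value. The function name and the toy log-likelihood values are assumptions made for the example.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D (maximum gap between empirical CDFs)."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical sitewise log-likelihood values for two topologies (not study data).
loglik_tree_1 = [-4.1, -3.7, -5.2, -2.9]
loglik_tree_2 = [-4.0, -3.8, -5.1, -3.0]
print(ks_statistic(loglik_tree_1, loglik_tree_2))
```

A value of D near 0 means the two topologies distribute likelihood across sites almost identically, which is how the BI and ML trees compare above; a value near 1 means the two distributions barely overlap.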
Comparison of Evolutionary Distances
We also study the associated evolution of the AR and CYP19 genes in various animal genomes using distance-based approaches. All pairs in terms of their ML genetic distances in the 37 species are compared between the AR and CYP19 proteins using the mirror tree approach. We find a highly significant association (fig. 2; r = 0.95; P < 10^-6) in the pairwise ML distances of the two protein sequences. As a background control, we also study six functionally unrelated proteins, namely erythropoietin, glucagon receptor, myoglobin, ferritin, glucokinase, and amylase and find no significant correlation among their ML distances (table 2). Second, we compare associations between patristic distances of the two proteins in all 37 species in the BI, ML, MP, and NJ trees. The BI tree shows the strongest correlation between patristic distances followed by the ML, MP, and NJ trees (fig. 3). The ML distance-based approach indicates a highly significant association between the evolution of the AR sequence and the evolution of the CYP19 sequence. In contrast, the control set of proteins showed (nonsignificant) low correlations between ML distances (table 2). In conclusion, rates of amino acid substitutions in AR and CYP19 are highly correlated across the metazoan phylogeny.
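The core of the mirror tree comparison is a Pearson correlation taken over matched species pairs from two distance matrices. The Python sketch below shows that step only. It omits the correction for background speciation applied in this study, and it is not the software used by the authors. The function names, the dictionary layout keyed by sorted species pairs, and the toy distances are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mirror_tree_r(dist_a, dist_b, species):
    """Correlate two pairwise distance matrices over the same species pairs."""
    pairs = list(combinations(sorted(species), 2))
    return pearson_r([dist_a[p] for p in pairs], [dist_b[p] for p in pairs])


# Hypothetical toy distances keyed by alphabetically sorted species pairs.
species = ["anemone", "fly", "human", "mouse"]
toy_ar = {
    ("anemone", "fly"): 1.9, ("anemone", "human"): 1.6, ("anemone", "mouse"): 1.7,
    ("fly", "human"): 1.4, ("fly", "mouse"): 1.5, ("human", "mouse"): 0.2,
}
# A second matrix that is an exact linear rescaling of the first.
toy_cyp19 = {pair: 0.5 * d + 0.1 for pair, d in toy_ar.items()}
print(mirror_tree_r(toy_ar, toy_cyp19, species))
```

Two proteins that both simply track the species tree will also correlate in this uncorrected form, which is why the background speciation correction and the control set of unrelated proteins matter for interpreting the reported r.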
When the similarities between the AR and CYP19 protein sequences are compared in terms of similarity index with their respective orthologous human proteins, there is a high correlation (r = 0.98; P < 2.2 × 10^-16) in interspecies variation of the AR and CYP19 protein sequences (fig. 4). This high correlation in similarity of protein sequences supports our hypothesis on parallel evolution of AR and CYP19.
| Discussion |
|---|
This work demonstrates for the first time a highly significant parallel evolution between two proteins having different structures and functions (supplementary figure 1, Supplementary Material online). The parallel evolution between AR and CYP19 is well reflected in terms of high correlation between their ML distances and interspecies similarity to the human sequences and similar topology of phylogenetic trees in both proteins. The common factor posing functional constraints on AR and CYP19 is their ligand, androgen. Therefore, this finding reinforces the notion that functional constraint of a protein is the principal evolutionary force in its evolution (Li 1997
This finding of vertebrate-like endocrine components in invertebrates has given a new impetus to Bogart's hypothesis that gonadal sex in all animals is determined by the local gonadal ratio of androgens to estrogens (Bogart 1987). This ratio in turn induces different gene transcription pathways for sex differentiation in animals. This ratio is controlled by the activity of aromatase (CYP19), which converts androgen testosterone to estrogen. The presence of a temperature-dependent activity (Twan et al. 2003) in the aromatase of sea anemone suggests the presence of environmental sex determination in this basal metazoan, such as that which exists in the reptiles. Therefore, it seems that endocrine regulation of gonadal sex determination is a ubiquitous phenomenon in the animal kingdom, contrary to the view that it is only valid in vertebrates.
The parallel evolution of AR and CYP19 from sea anemone to human is the manifestation of an intimate evolutionary connection between them. It also indicates that natural selection has resulted in an ordered acquisition of genes and the progressive buildup of molecular mechanisms that increase coordination among various components of the endocrine system in the animal kingdom. This study is the beginning of an analysis of parallel evolution among functionally related proteins in the endocrine system. It is expected that many novel parallel evolutionary trends among various proteins in different physiological systems will be unveiled in the near future.
| Supplementary Material |
|---|
Supplementary data file, figures 1–6, and table 1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
B.K.T. is grateful to the Department of Biotechnology, Government of India, New Delhi, for financial support in the form of an Overseas Associateship in niche areas of biotechnology. He also thanks Assam University, Silchar, India, for granting him study leave to carry out this work. We thank two anonymous reviewers, Ron Adkins, Jake Byrnes, Joshua Rest, and Raja Jothi for valuable suggestions. This study was supported in part by National Institutes of Health (NIH) grant GM30998 to W.H.L.
| Footnotes |
|---|
David Irwin, Associate Editor
| References |
|---|
Abascal F, Zardoya R, Posada D. ProtTest: selection of best-fit models of protein evolution. Bioinformatics (2005) 21:2104–2105.
Baguna J, Garcia-Fernandez J. Evo-Devo: the long and winding road. Int J Dev Biol (2003) 47:705–713.
Barbaglio A, Sugni M, Di Benedetto C, Bonasoro F, Schnell S, Lavado R, Porte C, Candia Carnevali DM. Gametogenesis correlated with steroid levels during the gonadal cycle of the sea urchin Paracentrotus lividus (Echinodermata: Echinoidea). Comp Biochem Physiol A Mol Integr Physiol (2007) 147:466–474.
Bogart MH. Sex determination: a hypothesis based on steroid ratios. J Theor Biol (1987) 128:349–357.
De Loof A. Ecdysteroids: the overlooked sex steroids of insects? Males: the black box. Insect Science (2006) 13:325–338.
Doolittle RF. Convergent evolution: the need to be explicit. Trends Biochem Sci (1994) 19:15–18.
Durou C, Mouneyrac C. Linking steroid hormone levels to sexual maturity index and energy reserves in Nereis diversicolor from clean and polluted estuaries. Gen Comp Endocrinol (2007) 150:106–113.
Felsenstein J. PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author (2004) Department of Genome Sciences, University of Washington, Seattle.
Fourment M, Gibbs MJ. PATRISTIC: a program for calculating patristic distances and graphically comparing the components of genetic change. BMC Evol Biol (2006) 6:1.
Gibbons JD, Chakraborti S. Nonparametric statistical Inference (2003) 4th ed. New York: Marcel Dekker.
Goldman N, Anderson JP, Rodrigo AG. Likelihood-based tests of topologies in phylogenetics. Syst Biol (2000) 49:652–670.
Huelsenbeck JP, Ronquist F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics (2001) 17:754–755.
Jones DT, Taylor WR, Thornton JM. The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci (1992) 8:275–282.
Kann MG, Jothi R, Cherukuri PF, Przytycka TM. Predicting protein domain interactions from coevolution of conserved regions. Proteins (2007) 67:811–820.
Kohler H-R, Kloas W, Schirling M, Lutz I, Reye AL, Langen J-S, Triebskorn R, Nagel R, Schonfelder G. Sex steroid receptor evolution and signalling in aquatic invertebrates. Ecotoxicology (2007) 16:131–143.
Li W-H. Molecular evolution (1997) Sunderland (MA): Sinauer Associates.
Moretti S, Armougom F, Wallace IM, Higgins DG, Jongeneel CV, Notredame C. The M-Coffee web server: a meta-method for computing multiple sequence alignments by combining alternative alignment methods. Nucleic Acids Res (2007) 35:645–648.
Motola DL, Cummins CL, Rottiers V, Sharma KK, Li T, Li Y, Suino-Powell K, Xu HE, Auchus RJ, Antebi A. Identification of ligands for DAF-12 that govern dauer formation and reproduction in C. elegans. Cell (2006) 124:1209–1223.
Page RDM. TREEVIEW: an application to display phylogenetic trees on personal computers. Comp Appl Biosci (1996) 12:357–358.
Pazos F, Valencia A. Similarity of phylogenetic trees as indicator of protein-protein interaction. Protein Eng (2001) 14:609–614.
R Development Core Team. R: a language and environment for statistical computing (2008) Vienna (Austria): R Foundation for Statistical Computing. ISBN 3-900051-07-0. Available from: http://www.R-project.org.
Robinson DF, Foulds LR. Comparison of phylogenetic trees. Math Biosci (1981) 53:131–147.
Schmidt HA, Strimmer K, Vingron M, von Haeseler A. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics (2002) 18:502–504.
Soria-Carrasco V, Talavera G, Igea J, Castresana J. The K tree score: quantification of differences in the relative branch length and topology of phylogenetic trees. Bioinformatics (2007) 23:2954–2956.
Stewart CB, Schilling JW, Wilson AC. Adaptive evolution in the stomach lysozymes of foregut fermenters. Nature (1987) 330:401–404.
Swofford DL. PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4.0b 10 (2002) Sunderland (MA): Sinauer Associates.
Tamura K, Dudley J, Nei M, Kumar S. MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0. Mol Biol Evol (2007) 24:1596–1599.
Tarrant AM. Endocrine-like signalling in cnidarians: current understanding and implications for ecophysiology. Integr Comp Biol (2005) 45:201–214.
Trooskens G, De Beule D, Decouttere F, Van Criekinge W. Phylogenetic trees: visualizing, customizing and detecting incongruence. Bioinformatics (2005) 21:3801–3802.
Twan WH, Hwang JS, Chang CF. Sex steroids in scleractinian coral, Euphyllia ancora: implication in mass spawning. Biol Reprod (2003) 68:2255–2260.
Twan WH, Hwang JS, Lee YH, Wu HF, Tung YS, Chang CF. Hormones and reproduction in scleractinian corals. Comp Biochem Physiol A Mol Integr Physiol (2006) 144:247–253.
Yokoyama R, Yokoyama S. Convergent evolution of the red- and green-like visual pigment genes in fish, Astyanax fasciatus, and human. Proc Natl Acad Sci USA (1990) 87:9315–9318.
Zhang J. Parallel adaptive origins of digestive RNases in Asian and African leaf monkeys. Nature Genetics (2006) 38:819–823.
Zhang J, Kumar S. Detection of convergent and parallel evolution at the amino acid sequence level. Mol Biol Evol (1997) 14:527–536.