MBE Advance Access originally published online on November 29, 2006
Molecular Biology and Evolution 2007 24(2):465-481; doi:10.1093/molbev/msl182
Research Articles
Functional Diversification of B MADS-Box Homeotic Regulators of Flower Development: Adaptive Evolution in Protein-Protein Interaction Domains after Major Gene Duplication Events
Laboratorio de Genética Molecular, Desarrollo y Evolución de Plantas, Instituto de Ecología, Universidad Nacional Autónoma de México, Ciudad Universitaria, Coyoacán, México D.F., México
E-mail: ealvarez{at}miranda.ecologia.unam.mx.
Abstract
B-class MADS-box genes have been shown to be the key regulators of petal and stamen specification in several eudicot model species such as Arabidopsis thaliana, Antirrhinum majus, and Petunia hybrida. Orthologs of these genes have been found across angiosperms and gymnosperms, and the basic regulatory function of B proteins is thought to be conserved in seed plant lineages. The evolution of B genes is characterized by numerous duplications that might have fostered the functional diversification of duplicates, with a deep impact on their role in the evolution of the floral developmental program. To evaluate this, we performed a rigorous statistical analysis of B gene sequences. Using maximum likelihood and Bayesian methods, we estimated molecular substitution rates and determined the selective regimes operating at each residue of B proteins. We implemented tests that rely on phylogenetic hypotheses and codon substitution models to detect significant differences in substitution rates (DSRs) and sites under positive adaptive selection (PS) in specific lineages before and after duplication events. With these methods, we identified several protein residues fixed by PS shortly after the origin of the PISTILLATA-like and APETALA3-like lineages in angiosperms and shortly after the origin of the euAP3-like lineage in core eudicots, which correspond to the 2 main B gene duplications. The residues inferred to have been fixed by PS lie mostly within the K domain of the protein, which is key to promoting heterodimerization. Additionally, we used a likelihood method that accommodates DSRs among lineages to estimate duplication dates for AP3/PI and euAP3/TM6, calibrating with data from the fossil record. The dates obtained are consistent with the origin of angiosperms and the diversification of core eudicots. Our results strongly suggest that novel multimer formation with other MADS proteins could have been crucial for the functional divergence of B MADS-box genes.
We thus propose a mechanism of functional diversification and persistence of gene duplicates by the appearance of novel multimerization capabilities after duplications. Multimer formation in different combinations of regulatory proteins can be a mechanistic basis for the origin of novel regulatory functions and a gene regulatory mechanism for the appearance of morphological innovations.
Key Words: B MADS-box genes; flower development; positive Darwinian selection; functional diversification; protein evolution; morphological novelties
Introduction
The neo-Darwinian synthesis has provided a solid framework to understand evolutionary processes in terms of changes in the genetic composition of populations (Dobzhansky 1951
In angiosperms, the relatively abrupt and extensive diversification that occurred shortly after they appeared in the fossil record (Magallón et al. 1999; Magallón and Sanderson 2001) has been considered a classic example of species radiation and has been linked to the appearance of novel structures that allowed a unique reproductive mode: flowers. The genetic bases of flower development have been widely studied since the 1990s in eudicot model species such as Arabidopsis thaliana and Antirrhinum majus. These studies yielded the ABC combinatorial model for floral organ determination (Coen and Meyerowitz 1991). Four of the 5 genes in the ABC model belong to the Type II plant MADS-box genes, which have been shown to be key regulators of several plant developmental traits (Alvarez-Buylla, Liljegren, et al. 2000; Alvarez-Buylla, Pelaz, et al. 2000; Ng and Yanofsky 2001). Further studies have shown that, to achieve their regulatory functions, ABC proteins interact with another set of floral MADS-box proteins encoded by the SEPALLATA gene subfamily (Egea-Cortines and Davies 2000; Pelaz et al. 2000; Honma and Goto 2001). Several studies have yielded evidence indicating that the basic function of floral ABC proteins is conserved across angiosperms and, to some extent, even in gymnosperms (Bowman 1997; Egea-Cortines et al. 1999; Kramer and Irish 1999, 2000; Sundström et al. 1999; Sundström and Engström 2002; Whipple et al. 2004; Kim et al. 2005). A recent dynamic network model that considers the concerted action of ABC and interacting non-ABC proteins has proposed an explanation for the robustness of the floral genetic developmental program among core eudicots (Espinosa-Soto et al. 2004).
In a previous study that focused on the complete family of MADS-box genes of A. thaliana, we reported that positive Darwinian or adaptive selection might have been important in fixing specific protein residues after some of the main duplication events leading to the paralogous groups of the gene family (Martínez-Castilla and Alvarez-Buylla 2003). However, we were not able to establish whether adaptive evolution played a major role in the functional diversification of orthologous genes in the major angiosperm lineages. In this work, by comparing sequences of orthologous genes from several angiosperm species in a phylogenetic context, we address this question in one of the main subfamilies of floral regulatory genes, the so-called B MADS-box genes.
There are 2 lineages of B-type genes in angiosperms, named APETALA3-like (AP3) and PISTILLATA-like (PI) after their characterization in A. thaliana. As described in the ABC model of flower development, when coexpressed with A function genes in the floral meristem, B genes specify petal formation, whereas when coexpressed with C function genes, they specify stamen formation (Bowman et al. 1991; Coen and Meyerowitz 1991; Meyerowitz et al. 1991). AP3 and PI proteins interact to form a heterodimer that indirectly regulates PI expression and directly binds to the AP3 promoter in a self-activating regulatory loop that maintains B function in the second and third whorls of the meristem during flower development (Krizek and Meyerowitz 1996; Hill et al. 1998; Honma and Goto 2000).
It has been postulated that the obligate heterodimerization of PI and AP3 proteins observed in core eudicots (Davies et al. 1996; Riechmann et al. 1996) evolved from the homodimerization found in gymnosperms, perhaps via a transitory state of facultative homo-heterodimerization (Winter et al. 2002). The evolution of heterodimerization in B-class proteins could have been fundamental for the establishment of the floral genetic developmental program in angiosperm lineages, promoting their floral morphological diversification. In this study, we address whether adaptive evolution was important during the origin of obligate B-protein heterodimerization, particularly following the critical AP3/PI duplication near the base of angiosperms.
In a recent study, the duplication leading to the AP3 and PI lineages in angiosperms was estimated to have occurred 260 million years ago (MYA) (Kim et al. 2004), shortly after the angiosperm-gymnosperm divergence (fig. 1B); after this duplication, B MADS-box genes underwent a complex pattern of independent duplications. A major duplication occurred in the AP3 lineage, leading to 2 AP3 sublineages distinguished by characteristic motifs in their C-terminal region (Kramer et al. 1998). The paleoAP3 lineage is found in basal angiosperms, monocots, magnoliids, and basal eudicots, whose proteins possess a paleoAP3 motif in their C-terminal region. Most core eudicots also possess a paleoAP3-like gene, named TM6-like after its description in tomato (Pnueli et al. 1991). In contrast, the euAP3 lineage is found exclusively among core eudicot B-class genes, which possess the characteristic euAP3 C-terminal motif (see figs. 1A, 1B, and 4A). Experimental evidence from Arabidopsis transformations with chimeric B proteins has shown that the C-terminal domains of AP3 and PI, as well as those of the 2 distinct AP3 types (paleoAP3 and euAP3), are functionally divergent (Lamb and Irish 2003). A further study in Petunia hybrida showed that PethDEF (euAP3-like) and PethTM6 (paleoAP3-like) are also functionally divergent, with PethTM6 having a more important role in stamen specification and weaker expression in petals (Vandenbussche et al. 2004).
Core eudicots are the most abundant angiosperm lineage, comprising almost 73% of extant angiosperm species (Drinnan et al. 1994
Using statistical comparisons in a likelihood framework, we tested for significant differences in substitution rates (DSRs) and for positive selection (PS) at individual protein sites along the main B gene lineages. This allowed us to test key hypotheses regarding the evolutionary roles of the 2 main duplications: 1) the one leading to the AP3 and PI lineages in angiosperms, to explore the evolutionary forces that could have driven the evolution of heterodimerization, and 2) the one leading to the euAP3 and TM6 lineages, because this duplication could have been key in the evolution of the floral developmental mechanisms characteristic of core eudicots. Additionally, we estimated divergence dates from these data and compared our results with the angiosperm fossil record.
Our results strongly suggest that, shortly after the duplication that led to the AP3 and PI genes, functional diversification driven by PS acting on different sites within the K domain, which is key for heterodimerization, occurred along both duplicated gene lineages. We estimate this duplication at 290 MYA, a date closer to the angiosperm-gymnosperm divergence registered in the fossil record than the previously reported date (Kim et al. 2004). Our results suggest an early origin for heterodimerization of B MADS-box proteins in angiosperms, shortly after they diverged from gymnosperms. Moreover, the estimated date for the euAP3/TM6 duplication coincides with the origin of the core eudicot lineage. This result, together with the strong signal of PS along the euAP3 branch, suggests a functional divergence of AP3 duplicates that might have been important for the evolution of the core eudicot floral developmental genetic program.
Materials and Methods
Sequences
Full-length DNA sequences for the coding regions of B MADS-box genes were obtained from GenBank at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/) using BLAST and keyword searches for each plant family sensu APG II (2003), aiming at a homogeneous taxonomic sampling across angiosperm lineages. Accession numbers, species names, and plant families for each gene are presented in table 1. We sorted the sequences into 3 data sets: one for all B MADS-box genes, one with a richer sampling exclusively of PI genes, and one with a richer sampling of AP3 genes, to obtain improved alignments and phylogenetic resolution within each lineage.
Alignment Strategy
Downloaded sequences were edited to identify correct open reading frames. We obtained preliminary amino acid alignments for each of our 3 data sets using MUSCLE (Edgar 2004
Phylogenetic Analysis
We performed the phylogenetic analysis using Bayesian methods as implemented in MrBayes 3.1.1 (Huelsenbeck and Ronquist 2001) with our edited and nonedited codon alignments. The DNA substitution model was selected with Modeltest using the Akaike Information Criterion, separately for first + second codon positions and for third codon positions (Posada and Crandall 1998). The selected models were implemented in MrBayes, and partitioned analyses were performed when different models were found for different codon positions. Searches were started from a random tree with a flat prior, running 4 Markov chains for 2,500,000 generations and saving every 100th tree. Convergence (reached around generation 20,000) was confirmed visually from plots of the log likelihood scores; we then discarded the first 3,000 trees as burn-in and constructed a consensus tree, using Bayesian posterior probabilities (PPs) to evaluate clade support.
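With the run settings above, each run yields roughly 2,500,000 / 100 = 25,000 sampled trees, of which about 22,000 remain after discarding 3,000; the discarded 300,000 generations lie well beyond the roughly 20,000 needed for convergence. The sketch below illustrates, under the simplifying assumption that each sampled tree has already been reduced to a set of clades (hypothetical toy input rather than parsed MrBayes output), how a clade's PP is estimated as the fraction of post-burn-in trees that contain it, and how a majority-rule consensus keeps clades with PP > 0.5. The function names and taxon labels are illustrative only.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]  # a clade is represented by the set of taxon labels it contains


def clade_posteriors(trees: List[Set[Clade]], burnin: int) -> Dict[Clade, float]:
    """Estimate each clade's posterior probability as the fraction of
    post-burn-in sampled trees in which it appears."""
    kept = trees[burnin:]
    if not kept:
        raise ValueError("burn-in discards every sampled tree")
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


def majority_rule(pps: Dict[Clade, float], threshold: float = 0.5) -> Dict[Clade, float]:
    """Keep clades whose PP exceeds the threshold (majority-rule consensus)."""
    return {clade: pp for clade, pp in pps.items() if pp > threshold}


if __name__ == "__main__":
    # Hypothetical toy sample: 2 burn-in trees followed by 4 retained trees.
    ap3 = frozenset({"AP3_a", "AP3_b"})
    pi = frozenset({"PI_a", "PI_b"})
    mixed = frozenset({"AP3_a", "PI_a"})
    trees = [{mixed}, {mixed}, {ap3, pi}, {ap3, pi}, {ap3, pi}, {ap3, mixed}]
    pps = clade_posteriors(trees, burnin=2)
    print(pps[ap3], pps[pi], pps[mixed])  # 1.0 0.75 0.25
```

Only clades supported by more than half of the retained trees (here the two AP3 and PI toy clades) would appear in the majority-rule consensus.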
Statistical Tests for DSR and PS Detection
To detect the selective regimes operating on each coding site (codon) of B proteins, we performed statistical tests based on codon models of DNA substitution, which rely on estimates of the ratio $\omega = d_N/d_S$ of the nonsynonymous substitution rate ($d_N$) to the synonymous substitution rate ($d_S$). Estimated $\omega$ values provide a criterion for evaluating the selective regimes operating at the molecular level: $\omega = 1$ is expected under neutrality, $\omega < 1$ under purifying selection, and $\omega > 1$ under positive adaptive selection.
We estimated likelihood scores using 2 different kinds of codon models. The first kind comprises the so-called site-specific models (Nielsen and Yang 1998; Yang et al. 2000), which were designed to detect heterogeneous selection pressures among sites (codons) and PS at sites where the nonsynonymous substitution rate, averaged over all lineages, is higher than the synonymous rate. The second kind, the so-called branch-site models, has the same objective but is restricted to particular lineages of a phylogeny: these models allow the $\omega$ ratio to vary both across sites and across lineages, detecting codon-specific PS or DSRs in preselected branches of a given tree regardless of the average $d_N/d_S$ ratio. Likelihood scores obtained under these codon models were compared against null models that imply homogeneous substitution rates and neutrality or purifying selection at all sites and branches; PS and DSR were accepted only if the improvement in likelihood under the PS/DSR models was statistically significant. Model comparisons were performed using likelihood ratio tests (LRTs), which require the models being compared to be nested. The degrees of freedom equal the number of extra parameters in the alternative model. The LRT statistic is twice the difference between the log likelihood scores of the models compared, $2\Delta\ell = 2(\ell_2 - \ell_1)$, where $\ell_1$ and $\ell_2$ are the log likelihoods of the null and alternative models, respectively; its distribution can be approximated by a $\chi^2$ distribution, against which statistical significance was evaluated (Yang and Nielsen 2002). The calculations were performed with the algorithms implemented in version 3.14 of the PAML package (http://abacus.gene.ucl.ac.uk/software/paml.html).
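As a minimal sketch of the LRT described above, the code below computes $2\Delta\ell$ and its $\chi^2$ upper-tail P value using only the Python standard library, via the closed-form series for integer degrees of freedom. The function names are illustrative, and the log likelihood values in the usage line are hypothetical, not values from this study; in practice the lnL scores would be taken from the PAML output for each fitted model.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-squared variable with
    a positive integer number of degrees of freedom."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{j=1}^{(df-1)/2} x^(j-1) / (1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2 * delta lnL, P value) for nested null and alternative models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical log likelihoods for models differing by 2 free parameters.
    stat, p = likelihood_ratio_test(lnl_null=-10000.0, lnl_alt=-9990.0, df=2)
    print(f"2*delta_lnL = {stat:.1f}, P = {p:.3g}")  # 2*delta_lnL = 20.0, P = 4.54e-05
```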
The site-specific models used were M0, M1a, M2a, M3, M7, and M8, following the nomenclature and methods described in Yang et al. (2000) and in the PAML documentation. These models do not account for molecular rate shifts or PS in particular lineages and thus average over all sequences in the phylogeny. PS is difficult to detect with these models because it often operates episodically on a few amino acids in a small number of taxa. In contrast, branch-site models have more power to detect DSR and PS in specific lineages because they allow branches prespecified in the phylogeny to have a class of sites with an $\omega$ different from that in the rest of the tree (model D) or a class of sites with $\omega > 1$ (model A) (Yang and Nielsen 2002; Bielawski and Yang 2004). Using branch-site models, we tested branches of the main lineages before and after important duplication events in the evolution of B genes; these represent potential points for the acquisition of new biochemical properties and roles in the floral developmental program. We repeated this procedure for different branches in the PI, AP3, and B MADS-box gene Bayesian phylogenies (indicated with arrows in figs. 2 and 3). Selected branches are designated "foreground" branches and, as such, have additional site classes that allow for PS, whereas the remaining branches in the phylogeny constitute the "background" branches. First, we used codon model D as an indicator of functional divergence, as described in Bielawski and Yang (2004); this model was designed to detect substitution rate shifts not necessarily due to PS. Then, for PS detection, we used codon model A (Yang and Nielsen 2002), which was designed to infer PS in preselected tree branches.
Simulation studies have shown that Yang's branch-site tests to detect PS in specific branches are sensitive to model assumptions and unable to distinguish between relaxation of selective constraints and PS (Zhang 2004
For the alternative models that allow for PS (models M2a, M8, and A), the PP that a site (codon) was drawn from a given $d_N/d_S$ class was calculated using a Bayes Empirical Bayes approach, which accounts for sampling errors in the maximum likelihood estimates of model parameters and has been shown to be conservative and to have desirable statistical properties (Yang et al. 2005). We report only sites with PP > 0.9 of belonging to the class under PS ($\omega > 1$). To evaluate the impact of alignment ambiguities and indels on our results, we repeated every test with our Gblocks-edited alignments.
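The Bayes Empirical Bayes procedure cited above additionally integrates over uncertainty in the maximum likelihood parameter estimates, which is beyond a short example. The simpler naive empirical Bayes (NEB) step that it builds on is sketched below, as an illustration rather than the PAML implementation: given site-class proportions $p_k$ and the likelihood $f_k$ of a site's data under each class's $\omega_k$, the posterior of class $k$ is $p_k f_k / \sum_j p_j f_j$, and a site is reported when the posterior of the $\omega > 1$ class exceeds 0.9. All proportions and likelihood values are hypothetical, and the function names are illustrative.

```python
from typing import List, Sequence


def class_posteriors(proportions: Sequence[float],
                     class_likelihoods: Sequence[float]) -> List[float]:
    """Naive empirical Bayes posterior of each site class for one codon site:
    P(k | data) = p_k * f(data | omega_k) / sum_j p_j * f(data | omega_j)."""
    joint = [p * f for p, f in zip(proportions, class_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


def sites_under_ps(proportions: Sequence[float],
                   per_site_likelihoods: Sequence[Sequence[float]],
                   ps_class: int,
                   cutoff: float = 0.9) -> List[int]:
    """Indices of sites whose posterior for the omega > 1 class exceeds cutoff."""
    return [i for i, liks in enumerate(per_site_likelihoods)
            if class_posteriors(proportions, liks)[ps_class] > cutoff]


if __name__ == "__main__":
    # Hypothetical 3-class model; class index 2 is the omega > 1 class.
    props = [0.80, 0.15, 0.05]
    site_liks = [
        [1e-5, 1e-5, 1e-5],  # uninformative site: posterior equals the prior proportions
        [1e-8, 1e-7, 1e-4],  # site whose data fit the omega > 1 class much better
    ]
    print(sites_under_ps(props, site_liks, ps_class=2))  # [1]
```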
Estimation of Divergence Dates
We estimated divergence dates for major duplication events in the phylogenies reconstructed for B genes and AP3 genes using the heuristic rate-smoothing algorithm (the AHRS algorithm) described in Yang (2004) and implemented in PAML version 3.14. The calibration date used for estimating the AP3/PI duplication date in the B MADS-box phylogeny (fig. 2) was 121 MYA, a minimum age for eudicot origins inferred from the tricolpate pollen grains described in Doyle and Hotton (1991); for the euAP3/TM6 duplication in the AP3 phylogeny (fig. 3A), we used a calibration date of 132 MYA, a minimum age for the origin of angiosperms inferred from the fossil pollen grains described in Brenner (1996). Following the AHRS algorithm, branch lengths for the topologies obtained in our Bayesian analysis were first estimated by maximum likelihood using the pruning algorithm of Felsenstein (1981), and preliminary substitution rates and divergence dates were obtained. A stochastic model of substitution-rate change was then fitted to the estimated branch lengths, and the resulting rates were used to classify branches into groups with similar rates, providing an objective way to accommodate differences in branch lengths and substitution rates among branches. In the final step of the AHRS algorithm, divergence times and the substitution rates of the branch groups were estimated by maximum likelihood (Yang 2004).
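AHRS itself involves rate smoothing and maximum likelihood estimation of branch rate groups, which is beyond a short example. As a deliberately simplified illustration of how a single fossil calibration converts branch lengths into ages, the sketch below assumes a strict molecular clock (one rate for the whole tree, an assumption that AHRS explicitly relaxes) and treats the calibration as a fixed point: the rate is the calibrated node's depth divided by its age, and any other node's age is its depth divided by that rate. Node depth is taken here as the mean path length to descendant tips; all numbers and function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def node_depth(tip_distances: Sequence[float]) -> float:
    """Mean path length (substitutions/site) from a node to its descendant tips."""
    return mean(tip_distances)


def clock_rate(calib_depth: float, calib_age_mya: float) -> float:
    """Strict-clock rate (substitutions/site/Myr) implied by one calibrated node."""
    return calib_depth / calib_age_mya


def node_age(depth: float, rate: float) -> float:
    """Age in MYA of a node under the same strict clock."""
    return depth / rate


if __name__ == "__main__":
    # Hypothetical calibrated node: mean depth 0.30 substitutions/site, assigned 120 MYA.
    rate = clock_rate(node_depth([0.29, 0.31]), 120.0)
    # Hypothetical deeper duplication node.
    print(round(node_age(node_depth([0.44, 0.46]), rate), 1))  # 180.0
```

Because a real data set rarely fits a strict clock, a method such as AHRS that lets groups of branches evolve at different rates is preferable to this single-rate calculation.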
Results
Phylogenetic Analyses of B MADS-Box Lineages
We obtained phylogenetic hypotheses for the entire set of B MADS-box sequences, as well as for AP3 and PI sequences separately, using Bayesian methods with edited and nonedited alignments, and evaluated clade support with the PP of each node. Figure 2 shows the consensus phylogeny obtained for B gene sequences with edited alignments. As expected, we recovered the AP3/PI clade, which comprises angiosperm sequences exclusively, with high support (0.99 PP). As sister to this AP3/PI clade, we recovered the clade of B-sister genes (1.00 PP), although this position was poorly supported (0.34 PP). At the base of angiosperm B genes and B-sister genes, we recovered a grade of gymnosperm B genes.
Figures 3A and 3B show Bayesian phylogenies for the AP3 and PI lineages, respectively, rooted with sequences of the basal angiosperm Nymphaea. In both trees, the main clades of angiosperms sensu APG II (2003) were recovered as monophyletic with reasonable PP values. The relationships among the main clades differed between the 2 paralogous lineages, suggesting different evolutionary trajectories that may reflect differences in their functional diversification.
The trees with higher PP values at nodes for AP3 genes were obtained with nonedited alignments in a partitioned analysis, with a substitution model including 2 substitution types for first and second codon positions and 6 substitution types for third positions. In the case of PI gene trees, higher support values were obtained with complete sequences using 6 substitution types in the model.
Heterogeneous Substitution Rates within B MADS-Domain Protein Sites
We used different site-specific codon substitution models to make inferences regarding among-site substitution rate heterogeneity in B MADS-box proteins and then addressed whether or not some sites had been fixed by PS. We performed analyses for the AP3, PI, and the entire set of B gene sequence alignments with their corresponding phylogenetic hypotheses separately, repeating every test with edited alignments that contained almost exclusively unambiguously aligned positions. Results were nearly identical with both data sets, and we focus on those derived from alignments that exclude ambiguous regions. In a first analysis, we did not consider differences in rates at particular branches within the phylogeny, thus assuming homogeneous selective pressures along all angiosperm and gene lineages. Results are summarized in supplementary table A (Supplementary Material online), whereas model comparisons performed with LRTs as well as the LRT statistic values and P values are given in table 2.
Anisimova et al. (2001)
We used model M0 to estimate a single $\omega$ value for each data set. The estimated $\omega$ was very similar for all data sets ($\omega = 0.151$ for all B MADS-box genes, $\omega = 0.163$ for PI genes, and $\omega = 0.170$ for AP3 genes; supplementary table A, Supplementary Material online), suggesting a similar selective regime in both lineages and strong purifying selection operating on the majority of sites. Among the models implemented, M2a and M8 allow for PS by adding a class of sites with $\omega > 1$. M8 is a more elaborate codon model that uses a beta distribution to model among-site heterogeneity and adds one extra class for sites under PS. Using this model and edited alignments, we did not detect any sites that could have been fixed by PS in AP3 and PI genes. However, when nonedited alignments were used, a few such sites were detected in AP3 and PI genes (see supplementary table A, Supplementary Material online). Although the LRTs for PS under site-specific models were not significant, LRTs evaluating among-site heterogeneity in $\omega$ without implying PS (M0 vs. M3 with K = 3, M0 vs. M3 with K = 2, and M3 with K = 2 vs. M3 with K = 3) rejected the null hypotheses in all cases, with both edited and nonedited alignments (table 2).
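The degrees of freedom for these comparisons follow from the rule that they equal the number of extra parameters in the alternative model. The sketch below counts only the parameters describing the $\omega$ distribution, since branch lengths and the transition/transversion ratio are shared by nested models and cancel out; the counts reflect the usual parameterization of these models and are written out here as an assumption, not taken from this study's tables. The dictionary and function names are illustrative.

```python
from typing import Dict

# Free parameters describing the omega distribution in each site model
# (branch lengths and kappa are shared between nested models and cancel out).
PARAMS: Dict[str, int] = {
    "M0": 1,   # a single omega
    "M1a": 2,  # p0, omega0 (omega1 fixed at 1)
    "M2a": 4,  # p0, p1, omega0, omega2
    "M7": 2,   # beta shape parameters p, q
    "M8": 4,   # p0, p, q, omega_s
}


def m3_params(k: int) -> int:
    """M3 with k discrete classes: k omegas plus k - 1 free proportions."""
    return 2 * k - 1


def lrt_df(null_params: int, alt_params: int) -> int:
    """Degrees of freedom = number of extra parameters in the alternative model."""
    if alt_params <= null_params:
        raise ValueError("alternative model must have more parameters than the null")
    return alt_params - null_params


if __name__ == "__main__":
    print(lrt_df(PARAMS["M0"], m3_params(2)))  # M0 vs. M3 (K = 2): 2
    print(lrt_df(PARAMS["M0"], m3_params(3)))  # M0 vs. M3 (K = 3): 4
    print(lrt_df(m3_params(2), m3_params(3)))  # M3 (K = 2) vs. M3 (K = 3): 2
    print(lrt_df(PARAMS["M7"], PARAMS["M8"]))  # M7 vs. M8: 2
```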
Shifts in Substitution Rates and Changes in Protein Residues Fixed by PS after the AP3/PI Duplication
It is difficult to detect PS using site-specific models if it has occurred only on particular branches or in certain lineages within the phylogeny, because averaging over all lineages might hide the signal of PS or DSR. Molecular rate shifts and positive adaptive selection are reasonably expected after gene duplications when functional diversification affects one or both duplicates. Therefore, we assessed whether signatures of molecular rate shifts were detectable and whether PS had fixed certain amino acids at particular moments of B MADS-box gene evolution, especially following the main duplication events. To this end, we used branch-site codon models that detect DSR and PS at specific coding sites and in particular branches of the phylogenetic tree. We focused on 2 key duplications during the evolution of B genes that coincide with important events in angiosperm evolution (see fig. 1B): 1) the AP3/PI duplication before angiosperm diversification, which is key for the origin and evolution of heterodimerization and after which PS might have been important in fixing some key protein residues, and 2) the euAP3/TM6 duplication within the AP3 genes of the clade recovered for the core eudicots, which correlates with the origin of differentiated petals in a bipartite perianth with distinct calyx and corolla in this important angiosperm group. We also selected branches before the duplications as contrasting hypotheses. Branches tested for DSR and PS are indicated in figure 2 for the B gene phylogeny (B-H1 to B-H3), in figure 3A for the AP3 phylogeny (AP3-H1 to AP3-H3), and in figure 3B for the PI phylogeny (PI-H1).
The LRTs performed to detect DSR in selected branches are shown in table 3, in which we compare model D with model M3 with 2 site classes. Model D is useful for detecting DSR in branches selected as foreground lineages without assuming PS (Bielawski and Yang 2004), thus giving insights into functional diversification processes after duplications. The ancestral branch leading to the angiosperm AP3/PI clade (B-H1) was found to have been under significant DSR with both alignments. This could imply slight changes in evolutionary rates early after the origin of angiosperms, before their crown group diversification. With this LRT, we also detected significant DSR after the AP3/PI duplication in the branches leading to both lineages.
To evaluate whether the detected molecular rate shifts could have been due to PS at particular sites along those branches, we used codon model A, which includes an extra class of sites whose $\omega$ values are estimated from the data as free parameters in the foreground (selected) branches, thus allowing them to take values of $\omega > 1$ (Yang and Nielsen 2002).
PS was significantly supported for the ancestral branch leading to the angiosperm AP3/PI clade (B-H1) only with edited alignments. Interestingly, with both data sets, we detected a clear signal of PS after the AP3/PI duplication, in both the PI lineage (B-H2) and the AP3 lineage (B-H3), where statistical support for PS was even stronger (table 4). We detected several sites under PS in both lineages with the Bayes Empirical Bayes approach, but we report here only those with PP values greater than 0.9. Remarkably, all sites inferred to have been fixed by PS are within the K domain. In figure 4B, we show the positions of the detected sites mapped onto the K domain of B-protein sequences from A. thaliana, following Yang and Jack (2004).
PS after the euAP3/TM6 Duplication in Core Eudicots
To gain insight into the functional diversification of AP3 and PI genes in core eudicots, we tested for DSR and PS in B genes of this clade. Branches selected as hypotheses (foreground branches) are indicated with arrows in figure 3A and B. We focused especially on the euAP3/TM6 duplication of AP3 genes in core eudicots, in a manner analogous to that used for the AP3/PI duplication in angiosperms. Table 3 shows the LRT comparisons of log likelihood scores and the results of DSR tests for foreground branches in the AP3 and PI phylogenies. According to our results, there are no significant DSRs in the branches leading to core eudicot AP3 or PI genes. Nor did we detect DSR in the branch leading to the euAP3 lineage in the AP3 phylogeny; interestingly, however, we observed a strong signal of significant DSR in the TM6 branch.
Two alternative scenarios could account for the contrasting DSR results in the TM6 and euAP3 branches. Where DSRs were detected, PS could have acted across many sites. Alternatively, where DSRs were not detected, PS could still have acted on a very small proportion of sites in foreground branches, yielding a signal hidden from DSR tests. To distinguish between these possibilities, we applied model A to the same foreground branches. Results are shown in table 4. We did not obtain a signal of PS for the ancestral core eudicot AP3 (AP3-H1) or PI (PI-H1) branches or for the branch leading to the core eudicot TM6 lineage, but PS was detected with a strong signal in the branch leading to the euAP3 genes (AP3-H3) (see fig. 4A). The contrasting results of the DSR and PS analyses in the euAP3 (DSR tests negative, PS positive) and TM6 (DSR tests positive, PS negative) branches could indicate that there were DSRs within the TM6 branch early in the origin of this lineage that were not due to PS. In contrast, substitution rates in the euAP3 lineage were overall homogeneous, probably because its functional diversification depended on a very small number of amino acid substitutions that were indeed fixed by PS.
Some of the protein residues inferred to have been fixed by PS in the euAP3 lineage were also found within the K domain, whereas others were found within the C-terminal domain (fig. 4). According to their correspondence to the APETALA3 protein sequence of A. thaliana, sites 148R (0.93 PP), 162C (0.94 PP), and 191K (0.95 PP) are within the K domain, whereas sites 305P (0.99 PP), 313S (0.98 PP), 316I (0.98 PP), 317T (0.97 PP), and 318F (0.97 PP) correspond to sites within the characteristic euAP3 motif in the C-terminal region of the protein. Notably, all sites inferred to have been fixed by PS were recovered with both the edited and the nonedited alignments.
Estimation of Divergence Dates
We used the Bayesian phylogenies to estimate divergence times for major duplication events in B MADS-box genes. For the AP3/PI duplication, we obtained an estimated date of 290 MYA, earlier than the 260 MYA (230-290 MYA) estimated previously (Kim et al. 2004) and closer to the angiosperm-gymnosperm divergence, which occurred shortly after the origin of seed plants 290-309.2 MYA (Mapes and Rothwell 1984, 1991). The euAP3/TM6 duplication was estimated at 92.3 MYA, somewhat later than Fagales fossils dated at 96.2 MYA, which are thought to coincide with the radiation of the core eudicots (Doyle and Hotton 1991; Magallón and Sanderson 2001).
Discussion
Several studies have addressed the evolutionary patterns of floral developmental genes at population levels, showing evidence of distinct forces driving the evolution of different loci and including the signal of adaptive evolution in some such as LFY or TLF1 (Purugganan and Suddith 1999
We focused on the 2 main duplications in the evolutionary history of B MADS-box genes, the AP3/PI and the euAP3/TM6 duplications. Phylogenetic analyses show that the main AP3/PI duplication occurred at the base of angiosperms and that the euAP3/TM6 duplication occurred at the base of core eudicots (Kramer et al. 1998, 2004). Both duplications occurred before the crown group diversification of the respective plant lineage and are associated with the appearance of the novel morphological structures that make up flowers in angiosperms and with the establishment of the floral pattern characteristic of core eudicots. Experimental studies have demonstrated that the AP3 and PI lineages, as well as the euAP3 and TM6 lineages, are functionally divergent, each having relatively distinct biochemical roles in the floral developmental genetic program (Lamb and Irish 2003; Vandenbussche et al. 2004). Our estimates suggest that the AP3/PI duplication occurred 290 MYA, 30 million years earlier than previously thought (Kim et al. 2004), placing this duplication closer to the origin and divergence of angiosperms and gymnosperms 290-309.2 MYA (Mapes and Rothwell 1984, 1991). We also estimated that the euAP3/TM6 duplication occurred 92.3 MYA, perhaps during the origin and rapid radiation of the core eudicot clade around 96 MYA (Brenner 1996). Taken together with our PS and DSR test results, these dates suggest that the main B gene duplications were relevant to the origin of the genetic mechanisms of the floral developmental program at the base of angiosperms and to its establishment in core eudicots.
In our statistical tests, we found a strong signal of PS after the AP3/PI duplication, compelling evidence that positive adaptive selection was instrumental in the rapid functional diversification of B genes during the radiation of angiosperm species. Moreover, the sites inferred to have been fixed by PS in the AP3 and PI lineages lie within the K domain (fig. 4B), which is crucial for the protein–protein interactions among MADS-domain proteins required for their regulatory activities (Fan et al. 1997; Yang et al. 2003). Yang and Jack (2004) observed a strong correlation between alterations in protein–protein interactions detected in yeast two-hybrid experiments and the floral phenotypes of rescue or overexpression lines expressing B proteins with mutations in the K domain, suggesting a link between the interaction strength of these regulatory proteins and the emergence of different phenotypes.
It has been postulated that the K domain folds into 3 amphipathic α-helices, referred to as K1, K2, and K3, separated by interhelical regions, each conferring different dimerization specificities (Yang et al. 2003). Remarkably, one of the sites inferred to be under PS, 100N in the PI lineage, falls within the K1 region, which has been shown specifically to control the strength of the PI/AP3 interaction (Yang and Jack 2004). In addition, site 142A, also inferred to have been fixed by PS in the PI lineage, lies in the K2 region, whereas site 130E in the AP3 lineage lies in the interhelical region between K1 and K2, which has been shown to be important in PI for both the PI/AP3 and the PI/SEP3 interactions (Yang and Jack 2004). Strikingly congruent with our results is the finding that a PI mutant at site 142A (A127P in Yang and Jack 2004) showed severe defects in the PI/SEP3 interaction (Yang and Jack 2004); other mutants with severe defects in protein–protein interactions reported by Yang and Jack (2004) also carry mutations at sites reported here as fixed by PS. These data provide partial experimental support for the functional relevance of the sites identified in this study as fixed by PS. The changes detected here in the protein–protein interaction domains of B sequences could have been fixed by PS for different multimeric combinations and downstream specificities affecting flower development and its evolution during angiosperm diversification. Our results point to directions for further experimental work.
It has been suggested that the obligate heterodimerization between the AP3 and PI proteins observed in core eudicots evolved from homodimerization, because a B-like protein from Gnetum gnemon was shown to bind DNA in a sequence-specific manner as a homodimer in vitro (Winter et al. 2002; see fig. 1). LMADS1, an AP3 protein from the monocot Lilium longiflorum, was also observed to form homodimers as well as heterodimers with the PISTILLATA protein of Arabidopsis in vitro (Tzeng and Yang 2001; Tzeng et al. 2004). These studies suggested that this "facultative" condition could be an intermediate evolutionary state between homodimerization and heterodimerization, which could have appeared after the monocot divergence 140–200 MYA (Winter et al. 2002). In contrast, our data suggest that heterodimerization appeared shortly after the split between angiosperms and gymnosperms. At this point in evolution, which coincides with the AP3/PI duplication, strategic changes in residues within the protein–protein interaction domain appear to have been fixed by PS. Consistent with this suggestion, experimental studies of B genes from the monocot Zea mays showed that B function is conserved between maize and Arabidopsis (Whipple et al. 2004). In maize, SL1 (AP3) and ZMM16 (PI) form obligate heterodimers to bind DNA, and each also heterodimerizes with the corresponding Arabidopsis partner (ZMM16/AP3, SL1/PI), forming functional complexes both in vitro and in vivo (Whipple et al. 2004). Obligate functional heterodimerization has also been reported in rice (Moon et al. 1999).
The hypothesis of an intermediate "facultative" state between functional homodimerization and obligate heterodimerization relies on the assumption that monocots are the intermediate lineage between gymnosperms and core eudicots. However, the phylogenetic position of monocots relative to eudicots and basal angiosperms is not yet resolved (APG II 2003). Monocots represent relatively derived lineages, and it is possible that the observed in vitro homodimerization of LMADS1 in Lilium is a reversion to a plesiomorphic state. Alternatively, the in vitro results for Lilium may not hold in vivo, and this taxon could also have a B function conserved with respect to Arabidopsis. Therefore, additional in vivo functional assays of B protein homo- and heterodimers in monocots and other basal angiosperms, as well as molecular evolutionary analyses including more monocot sequences, are needed.
Another important duplication during the evolution of AP3 genes, the euAP3/TM6 duplication, coincides with the origin of the core eudicots and could have had an impact on the evolution of flower development in this speciose group of flowering plants (Lamb and Irish 2003; Vandenbussche et al. 2004). The floral structure in core eudicots is characterized by a fixed number of floral organs in each whorl and a perianth with a distinct calyx and corolla, suggesting a canalization of the floral developmental program toward morphologically differentiated sepals and petals, in contrast to the more plastic and undifferentiated perianth structure found in noncore eudicot species (Drinnan et al. 1994; Magallón et al. 1999; Lamb and Irish 2003). Core eudicots share 2 paralogous B gene lineages that differ in their characteristic C-terminal motifs, the euAP3 genes and the TM6-like genes. The latter retain the ancestral paleoAP3 motif in their C-terminal domain, which is also found in monocots, basal angiosperms, and basal eudicots, whereas the euAP3 lineage is characteristic of core eudicots (fig. 4A). The appearance of the euAP3 lineage has been postulated to be important in the evolution of the floral developmental program characteristic of this clade (Lamb and Irish 2003).
We found strong evidence for PS after the euAP3/TM6 duplication (figs. 3A and 4A) only in the euAP3 lineage, consistent with available evidence for the functional divergence of these lineages (Lamb and Irish 2003; Vandenbussche et al. 2004). In our analysis, we detected sites 148R, 162C, and 191K as fixed by PS on the branch leading to the euAP3 clade. Site 148R is in the K1 subdomain, which is important for PI/AP3 dimerization (Yang and Jack 2004). Site 162C is in the interhelical region between K1 and K2; in PI, these regions are important for PI/AP3 and PI/SEP3 dimerization. Site 191K falls within the interhelical region between K2 and K3, which in the PI protein of Arabidopsis has been shown experimentally to be specifically important for PI/SEP3 (or SEP1) dimerization. PI proteins with mutations in K2 and K3 were shown to be defective in forming AP3/PI/SEP1 ternary complexes (Yang and Jack 2004). In the euAP3 lineage, we also detected several sites under PS that correspond to positions in the characteristic euAP3 motif in the C-terminal region of the protein. It has been observed that a frameshift mutation caused by a single-nucleotide deletion in the ancestral C-terminal paleoAP3 motif led to the appearance of the euAP3 motif (Kramer et al. 2006), and it has been shown experimentally that these motifs are responsible for the functional diversification of euAP3 and paleoAP3 genes (Lamb and Irish 2003).
Our results reveal that, in contrast to the AP3/PI duplication, which fostered the acquisition of new functions through the fixation of amino acid changes by PS in both duplicates in a symmetrical pattern, the euAP3/TM6 duplication was followed by an asymmetrical pattern of diversification, in which one gene copy retained the ancestral states (TM6 lineage), whereas in the other, new residues were fixed by adaptive evolution (euAP3 lineage). Experiments by Whipple et al. (2004) suggest that paleoAP3 sequences from Z. mays (SILKY1) function relatively well in Arabidopsis, which has only a euAP3 paralog. However, future experiments should use core eudicot species that possess both duplicates, and complementation analyses, not only overexpression experiments, are needed to better assess functional conservation between paleoAP3 proteins from monocots and AP3 proteins from core eudicots.
It has been suggested that new sequences in the C-terminal domain evolving in euAP3 genes in core eudicots could have been associated with the evolution of differentiated petals (Kramer et al. 1998; Lamb and Irish 2003). Since their discovery, the importance of the characteristic C-terminal motifs of B genes for the functional specificity of the proteins and for the evolution of B function has been emphasized, stimulating a search for similar motifs in other floral regulatory genes (Johansen et al. 2002; Kramer et al. 2004). However, a recent study in Pisum sativum (Fabales, eurosids I) showed that PISTILLATA activity is functionally conserved in its PsPI homolog, an atypical PI-type polypeptide that lacks the highly conserved C-terminal PI motif (Berbel et al. 2005), suggesting that other domains are important in the functional diversification of B function. Our study emphasizes the importance of the K domain, in addition to the C-terminal region, in the functional diversification of B MADS-box genes. The strong signal of PS detected at particular sites within both the K and C-terminal domains of euAP3 genes also suggests the importance of these genes in the evolution of the floral developmental mechanism in core eudicots, perhaps by strengthening the interactions of MADS-box proteins and altering the specificity of multimers.
Conclusions
Our results strongly suggest that shortly after the duplication leading to the AP3 and PI genes, both lineages experienced a dramatic functional diversification driven by PS acting at specific sites of the K domain, whose main role is to promote heterodimerization with other MADS proteins to achieve different regulatory functions. In addition, we estimated the date of this duplication at 290 MYA, closer to the split between angiosperms and gymnosperms than previously reported. This suggests an early origin of the heterodimerization capabilities of B MADS-box proteins in angiosperms, very close to their split from gymnosperms. Similarly, the euAP3/TM6 duplication in core eudicots seems to have been fundamental in the diversification of these genes, which was also driven by PS acting at certain residues of the K domain and C-terminal motifs shortly after the formation of the euAP3 lineage.
Our study emphasizes the importance of the K domain in the evolutionary diversification of B genes. The importance of this domain in B gene function and evolution had been overshadowed by the discovery of diagnostic C-terminal motifs and studies of their functional importance in B genes (Kramer et al. 1998, 2004; Lamb and Irish 2003). It has been speculated that the evolution of combinatorial multimer formation among MIKC-type MADS-domain proteins facilitated the diversification of this gene family (Kaufmann et al. 2005). Our results support the idea that the functional divergence of B MADS-box genes after duplication fostered different possible combinations for multimer formation, which could have allowed the appearance of novel regulatory functions as the basis for the origin of morphological novelties.
Supplementary Material
Supplementary material is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgements
We thank Susana Magallón for her guidance and helpful discussions, essential during the final stages of this study. We also thank Mark Olson for useful comments that helped to improve this manuscript. Part of this work was done during an academic stay of T.H.-H. at the Lorenzo Segovia Lab in the Biotechnology Institute, Universidad Nacional Autónoma de México (UNAM). We thank him for sponsoring that stay and providing computer time. This work was supported by grants to E.A.B. from Programa de Apoyo a Proyectos de Investigación e Innovación Tecnológica-UNAM (IN 230002 and IN212995), UC-MEXUS-ECO IE 271, and Consejo Nacional de Ciencia y Tecnología (CONACYT)-41848-Q, 31871-N, and is based on T.H.-H.'s Master's thesis; T.H.-H.'s graduate studies were supported by a fellowship from CONACYT and the Dirección General de Estudios de Posgrado at UNAM. We also acknowledge Rigoberto V Pérez-Ruiz and Aida Navarrete for support in various technical and logistical tasks, respectively.
Footnotes
Neelima Sinha: Associate Editor
References
Alberch P. (1980) Ontogenesis and morphological diversification. Am Zool 20:653–667.
Alvarez-Buylla ER, Liljegren SJ, Pelaz S, Gold SE, Burgeff C, Ditta GS, Vergara-Silva F, Yanofsky MF. (2000) MADS-box gene evolution beyond flowers: expression in pollen, endosperm, guard cells, roots and trichomes. Plant J 25:5593.
Alvarez-Buylla ER, Pelaz S, Liljegren SJ, Gold SE, Burgeff C, Ditta GS, Ribas DP, Martínez-Castilla L, Yanofsky MF. (2000) An ancestral MADS-box gene duplication occurred before the divergence of plants and animals. Proc Natl Acad Sci USA 97:5328–5333.
Anisimova M, Bielawski JP, Yang Z. (2001) Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol Biol Evol 18:1585–1592.
[APG II] Angiosperm Phylogeny Group II. (2003) An update of the angiosperm phylogeny group classification for the orders and families of flowering plants: APG II. Bot J Linn Soc 141:399–436.
Bailey TL and Elkan C. (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology (AAAI Press, Menlo Park (CA)) pp. 28–36.
Bailey TL and Elkan C. (1995) The value of prior knowledge in discovering motifs with MEME. Proceedings of the Third International Conference on Intelligent Systems for Molecular Biology (AAAI Press, Menlo Park (CA)) pp. 21–29.
Berbel A, Navarro C, Ferrándiz C, Cañas LA, Beltrán JP, Madueño F. (2005) Functional conservation of PISTILLATA activity in a pea homolog lacking the PI motif. Plant Physiol 139:174–185.
Bielawski JP and Yang Z. (2004) A maximum likelihood method for detecting functional divergence at individual codon sites, with application to gene family evolution. J Mol Evol 59:121–132.
Bowman JL. (1997) Evolutionary conservation of angiosperm flower development at the molecular and genetic levels. J



