MBE Advance Access originally published online on June 14, 2007
Molecular Biology and Evolution 2007 24(9):1934-1943; doi:10.1093/molbev/msm121
© 2007 The Authors.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Research Articles
Lineage-specific expansion of the Zinc Finger Associated Domain ZAD
Max-Planck-Institut für biophysikalische Chemie, Abteilung Molekulare Entwicklungsbiologie, Am Fassberg 11, 37077 Göttingen, Germany
E-mail: chung{at}molgen.mpg.de.
Abstract
The zinc finger associated domain (ZAD), present in almost 100 distinct proteins, characterizes the largest subgroup of C2H2 zinc finger proteins in Drosophila melanogaster and was initially found to be encoded by arthropod genomes only. Here, we report that the ZAD was also present in the last common ancestor of arthropods and vertebrates, and that vertebrate genomes contain a single conserved gene that codes for a ZAD-like peptide. Comparison of the ZAD proteomes of several arthropod species revealed an extensive and species-specific expansion of ZAD-coding genes in higher holometabolous insects, and showed that only a few ZAD-coding genes with essential functions in Drosophila melanogaster are conserved. Furthermore, at least 50% of the ZAD-coding genes of Drosophila melanogaster are expressed in the female germline, suggesting a function in oocyte development and/or a requirement during early embryogenesis. Since the majority of the essential ZAD-coding genes of Drosophila melanogaster were not conserved during arthropod or at least during insect evolution, we propose that the lineage-specific expansion (LSE) of ZAD-coding genes shown here may provide the raw material for the evolution of new functions that allow organisms to pursue novel evolutionary paths.
Key Words: Drosophila melanogaster; lineage-specific expansion; Zinc Finger Associated Domain
Introduction
C2H2 zinc finger proteins (ZFPs) represent the most abundant nucleic acid binding proteins in the eukaryotic kingdom (e.g. Böhm, Frishman, and Mewes 1997
(Payre et al. 1997
The zinc finger associated domains (ZAD, SCAN and KRAB) characterize protein families whose members independently proliferated in distinct lineages, a phenomenon referred to as lineage-specific expansion (LSE; Jordan et al. 2001; Lespinet et al. 2002). For example, the genomes of mouse and humans encode a comparable number of KRAB domains (Huntley et al. 2006), but many of these KRAB domain-coding genes were independently generated by gene duplication in the lineages leading to either mouse or humans (e.g. Urrutia 2003; Huntley et al. 2006). It appears that KRAB-coding genes are prone to duplicate, suggesting that there is positive selection for these duplication events. However, despite their large numbers in mammals, only a few KRAB-coding sequences have been found in other deuterostomes (Birtle and Ponting 2006 and references therein). Thus, the tendency to duplicate is not a unique feature of the genes per se but has to be viewed in the context of the lineage-specific organismal constraints and demands.
Many aspects of the ZAD are very similar to the KRAB domain. Like the KRAB domain in mammals, it characterizes the largest subgroup of ZFPs in Drosophila melanogaster and has previously only been identified in arthropods (Chung et al. 2002). Here, we show that the ZAD was present in the last common ancestor of arthropods and vertebrates. Although the domain is present in vertebrates, we were able to identify only a single gene encoding a ZAD-like peptide, suggesting that the ZAD-coding genes underwent LSE in the lineage leading to Drosophila melanogaster. Moreover, by comparing the ZAD sequences of six arthropod species, we show that ZAD-coding genes were subject to LSE especially in the higher holometabolous insects.
Only a few Drosophila melanogaster ZAD-coding genes with known and essential functions are conserved in evolution. This observation indicates that some or even the majority of the ZAD-coding genes exert lineage-specific or even species-dependent functions. We speculate that at least some of the lineage-specific ZAD-coding genes may be involved in processes that lead to developmental differences. This hypothesis is supported by our observation that many ZAD-coding genes of Drosophila melanogaster are expressed in the female germline, implying that they are involved in some aspects of oogenesis and/or contribute to the maternal effect on early embryonic development. Consistent with this view, we find that the maternal effect gene pita, which plays a critical role during Drosophila melanogaster oogenesis (Laundrie et al. 2003), is conserved in all holometabolous insects examined.
Results and Discussion
Based on initial searches in expressed sequence tag databases the ZAD was proposed to be restricted to arthropods (Chung et al. 2002
Figure 1 shows a multiple sequence alignment of the ZAD of ZFP276 orthologs and the ZAD of the Drosophila melanogaster transcription factor Grauzone (ZADgrau; Jauch et al. 2003). The sequences forming secondary structure elements in ZADgrau can be aligned without gaps or insertions. In loop regions, small sequence gaps and insertions are observed. The alignment also shows that most residues implicated in forming the hydrophobic core of ZADgrau (Jauch et al. 2003) maintain their hydrophobic character (Figure 1). Thus, the ZFP276 orthologs of vertebrates contain a bona fide ZAD motif and can be grouped as members of the ZAD protein family. This finding implies that the ZAD was present in the last common ancestor of vertebrates and arthropods. Alternatively, the single ZAD-like domain conserved in vertebrate genomes may have evolved convergently, since neither ZAD nor ZAD-like sequences are present in the genomes of basal deuterostomes such as ascidians (Ciona intestinalis and Ciona savignyi) and echinoderms (Strongylocentrotus purpuratus).
Lineage-specific expansion of ZAD-coding genes in higher insects
The high number of ZAD proteins in Drosophila melanogaster suggests that the ZAD proteome has been shaped by an evolutionary history that was dominated by expansion of ZAD-coding genes. As only a single ZAD-coding gene was found in vertebrates, the very high number of ZADs found in Drosophila melanogaster must have been generated soon after the divergence of deuterostomes and protostomes and/or only very recently in the evolutionary history of Drosophila melanogaster. We reasoned that we could distinguish between these possibilities by comparing the ZADs of Drosophila melanogaster to ZADs of other protostomes.
In order to get a broad overview of the set of ZADs in each species, we concentrated our efforts on species for which whole genome sequences are available. These species all belong to the arthropod phylum and include, in addition to Drosophila melanogaster, the dipteran mosquito Anopheles gambiae, the lepidopteran silkworm Bombyx mori, the coleopteran red flour beetle Tribolium castaneum, the hymenopteran honeybee Apis mellifera and the crustacean water flea Daphnia pulex (see also Figure 2). If ZAD-coding genes expanded very early after the split of the deuterostomes and protostomes, the majority of ZAD sequences should be conserved in these arthropod species and orthologs should be readily identifiable. Conversely, in the case of a recent expansion, the number of ZAD sequences conserved in all arthropod species should be rather limited, indicating that the majority of ZADs were generated by lineage-specific expansions (LSEs).
We were able to identify ZAD-coding sequences in each genome of the six above listed species. However, the numbers of ZADs were very different (Figure 2). In Daphnia pulex, we found four ZADs. This low number increases to 29 in the most basal holometabolous insect Apis mellifera (Savard et al. 2006
In order to infer the characteristics of the LSE in higher insects, we applied the logic described above. Thus, it was necessary to assign potential orthologous and species-specific paralogous groups. The classical approach for identifying orthologs as well as paralogs involves phylogenetic analysis and a procedure referred to as tree reconciliation (e.g. Page and Charleston 1997). This approach tries to relate the topology of a gene tree to a chosen species tree employing the parsimony principle, i.e. a minimal number of duplications and gene losses in the evolution of the gene tree. At first sight, this approach appears to be the method of choice for our task. However, we observed that gene trees built from a multiple sequence alignment of all ZADs are very unreliable in most portions of the tree, suggesting that they contain many uncertainties and artifacts. We concluded that using such an unreliable gene tree as input for tree reconciliation could lead to many artifacts, which in turn would render an interpretation of the results difficult. Therefore, we developed an alternative approach that enables us to focus on the reliable parts of the gene tree. Briefly, we searched for ZAD peptide sequences that in the case of species-specific paralogous groups were more similar to each other than to any other ZAD sequence of another species, or, in the case of orthologous groups, to any other ZAD sequence. In both cases we tested whether the similarity is significant (see Materials and Methods for details).
We obtained a total of 27 species-specific paralogous groups under these rigorous criteria (Table 1). Seven groups including 40 out of 98 ZAD sequences were found in Drosophila melanogaster, eight groups containing 84 of the 147 genes in Anopheles gambiae, five in Bombyx mori containing 33 of the 86 genes, seven groups in Tribolium castaneum with 25 of the 75 genes, and, finally, no group at all in Apis mellifera and Daphnia pulex. Each group contained between two and 51 members (Table 1). Thus, with the exception of Apis mellifera and Daphnia pulex, between one third and more than half of the ZADs can be assigned to species-specific paralogous groups. This finding indicates that many ZAD-coding genes of the holometabolous insects, possibly excluding Hymenoptera, have been generated recently or, to be more explicit, are the result of LSE. The result is consistent with the observation that many members of the paralogous groups in Drosophila melanogaster can be found in neighboring positions in the genome (data not shown).
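The proportions quoted above can be rechecked with a few lines of Python. This is only a bookkeeping sketch: the counts are copied from the paragraph, while the dictionary layout and function names are illustrative assumptions and not part of the original analysis.

```python
# Counts taken from the text: (ZADs in species-specific paralogous groups,
# total ZADs in the species, number of paralogous groups).
PARALOG_STATS = {
    "Drosophila melanogaster": (40, 98, 7),
    "Anopheles gambiae": (84, 147, 8),
    "Bombyx mori": (33, 86, 5),
    "Tribolium castaneum": (25, 75, 7),
}


def paralog_fractions(stats):
    """Fraction of each species' ZADs that fall into paralogous groups."""
    return {sp: in_groups / total for sp, (in_groups, total, _) in stats.items()}


def total_groups(stats):
    """Total number of species-specific paralogous groups."""
    return sum(n_groups for _, _, n_groups in stats.values())


if __name__ == "__main__":
    for species, fraction in paralog_fractions(PARALOG_STATS).items():
        print(f"{species}: {fraction:.1%}")
    print("groups:", total_groups(PARALOG_STATS))
```

The smallest fraction is that of Tribolium castaneum (25 of 75, one third) and the largest that of Anopheles gambiae (84 of 147, more than half), in line with the range stated in the text.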
In order to further test this conclusion, we focused on the orthologs. We identified a total of 15 putative orthologous groups. As shown in Figure 3, ZADs of Drosophila melanogaster (Figure 3 B – I) and Anopheles gambiae (Figure 3 A – C and E – I) were found in eight of these orthologous groups, Bombyx mori ZADs in seven groups (Figure 3 B, C, E, F and J – L), Tribolium castaneum ZADs in nine (Figure 3 A – D, J, K and M – O), Apis mellifera ZADs in ten (Figure 3 A – D and J – O) and finally a single Daphnia pulex ZAD in one group (Figure 3 A). None of the 15 orthologous groups contained sequences from all six species. However, two groups contained sequences from all five holometabolous insects (Figure 3 B and C), and one group also contained a sequence of Daphnia pulex but lacked sequences of Bombyx mori and Drosophila melanogaster (Figure 3 A).
We used the parsimony principle to infer the number of ZAD-coding genes present before the speciation events that led to the recent species. The results of this analysis are summarized in Figure 2. They indicate that at least one ZAD-coding gene was present in the last common ancestor of Daphnia pulex and the insect species. There were at least ten ZAD-coding genes present in the last common ancestor of Apis mellifera and the other holometabolous insects. This number stays the same in the last common ancestor of Tribolium castaneum and the remaining holometabolous insects; there is one instance where we find conservation between Apis mellifera and Bombyx mori, but fail to identify an ortholog in Tribolium castaneum (Figure 3 L). The last common ancestor of Bombyx mori and the dipteran insects had at least nine ZAD-coding genes, with three losses (Figure 3 M – O) and two gains (Figure 3 E and F). Finally, we find that at least nine ZAD-coding genes were present prior to the divergence of Anopheles gambiae and Drosophila melanogaster, with three losses (Figure 3 J – L) and three gains (Figure 3 G – I).
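The ancestral counts can be illustrated with a small Python sketch. It rests on assumptions that are ours and not necessarily the procedure actually used: the panel memberships are transcribed from the Figure 3 references in the text, the species tree follows the branching order described here (Daphnia as outgroup, then Apis, Tribolium, Bombyx and the two dipterans), and a gene is counted as present at an ancestral node when its orthologous group either has members on both sides of that node or has members both inside and outside the clade (a single-origin reading of the parsimony principle). All identifiers are hypothetical.

```python
# Figure 3 panels (A-O) in which each species is represented, as listed in the text.
PANELS = {
    "Dmel": "BCDEFGHI",
    "Agam": "ABCEFGHI",
    "Bmor": "BCEFJKL",
    "Tcas": "ABCDJKMNO",
    "Amel": "ABCDJKLMNO",
    "Dpul": "A",
}

# Assumed species tree as nested pairs, outgroup first.
TREE = ("Dpul", ("Amel", ("Tcas", ("Bmor", ("Agam", "Dmel")))))


def leaves(tree):
    """Set of species below a node of the nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])


def groups_by_panel(panels):
    """Invert the species -> panels map into panel -> set of species."""
    groups = {}
    for species, letters in panels.items():
        for letter in letters:
            groups.setdefault(letter, set()).add(species)
    return groups


def ancestral_counts(tree, groups):
    """Minimum number of orthologous groups inferred at each internal node."""
    counts = {}

    def visit(node):
        if isinstance(node, str):
            return
        left, right = leaves(node[0]), leaves(node[1])
        clade = left | right
        present = 0
        for members in groups.values():
            spans_node = bool(members & left) and bool(members & right)
            inherited = bool(members & clade) and bool(members - clade)
            if spans_node or inherited:
                present += 1
        counts[clade] = present
        visit(node[0])
        visit(node[1])

    visit(tree)
    return counts


if __name__ == "__main__":
    result = ancestral_counts(TREE, groups_by_panel(PANELS))
    for clade, n in sorted(result.items(), key=lambda kv: -len(kv[0])):
        print(sorted(clade), n)
```

Under these assumptions the sketch yields 1, 10, 10, 9 and 9 genes for the five ancestral nodes, from the deepest node to the dipteran ancestor, matching the numbers given above.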
Thus, we conclude that a first burst of expansion occurred after the split of the crustacean and insect lineages, before the divergence of Hymenoptera and the other holometabolous insects. Thereafter, only a few ZAD-coding genes were fixed in the early evolutionary history of the analyzed species. This in turn suggests that the majority of the holometabolous ZADs were generated recently, long after the speciation events that formed the major holometabolous insect orders (and families, in the case of the two dipteran species), which is consistent with our observation that many ZADs can be found in species-specific paralogous groups (see above). This interpretation does not appear to hold for Apis mellifera, in which we find that one third of the genes are conserved in evolution and fail to identify any species-specific paralogous group. Here, it is more parsimonious to assume that the majority of the Apis mellifera ZAD-coding genes were fixed very early after the split from the other holometabolous insects, such that we cannot find any significant sequence similarity between potential species-specific paralogous proteins. In support of this hypothesis, we find that 20 of the 29 Apis mellifera ZAD genes have orthologs in a second hymenopteran species, the parasitic wasp Nasonia vitripennis (data not shown).
In Drosophila melanogaster nine ZAD-coding genes possess essential functions, namely grauzone (grau; Schüpbach and Wieschaus 1989), Serendipity-δ (Sry-δ; Payre et al. 1990), deformed wings (dwg; Fahmy and Fahmy 1959), pita (pita; Laundrie et al. 2003), weckle (wek; Luschnig et al. 2004), hangover (hang; Scholz, Franz, and Heberlein 2005), phyllopod (phyl; Chang et al. 1995; Dickson et al. 1995), determiner of breaking down of Ci activator (debra; Dai, Akimaru, and Ishii 2003) and poils au dos (pad; Gibert et al. 2005). Notably, we find that only three of these genes are conserved in evolution: the gene pita in all five holometabolous insects examined (Figure 3 C); the gene hang in Apis mellifera and Tribolium castaneum (Figure 3 D); the gene pad in Anopheles gambiae (Figure 3 G). The remaining six genes with essential functions appear to be specific for Drosophila melanogaster. It is possible that we failed to identify orthologs for these genes for several reasons, which include the possibilities that the genome sequences may still contain gaps and uncertainties and/or that the orthologous sequences have diverged so much that we cannot identify them by means of sequence similarity. Genome sequence quality is certainly an issue. However, it appears to be rather unlikely that gaps and uncertainties in the genome sequences consistently involved loci containing orthologs of these genes in all five species. Fast sequence divergence clearly limits our ability to identify orthologs. The fast decline of sequence similarity implies that the genes in question underwent a period of relaxed selective pressure, which is not compatible with the assumption that they fulfilled essential functions in, for example, the last common ancestor of Drosophila melanogaster and Anopheles gambiae. It is much more likely that they acquired these functions in the lineage leading to Drosophila melanogaster after the split between the two dipteran flies by means of positive selection. We conclude that it is possible that the six non-conserved essential ZAD-coding genes of Drosophila melanogaster carry out functions that are specific for the lineage leading to Drosophila melanogaster. This in turn implies that some of the ZAD-coding genes are likely to carry lineage-specific or even species-specific functions.
In summary, the results show that many ZAD-coding genes have been recently generated by LSE in four out of five holometabolous insects. Only the most basal holometabolous insect order, Hymenoptera (Savard et al. 2006), showed no evidence for such a recent burst of expansions of ZAD-coding genes. In general, it appears that ZAD-coding genes are prone to duplicate, suggesting that there is positive selection for duplication of ZAD-coding genes. Furthermore, the failure to identify orthologs for the members of the paralogous groups suggests that they have diversified by means of positive selection, which in turn implies that they have acquired novel functions. It is possible that these functions contributed to the establishment of novel processes, leading to novel phenotypic traits, or substituted for the function of other proteins in conserved processes. We note, however, that more neutral mechanisms as outlined in the model of orphan genes in Drosophila (Domazet-Loso and Tautz 2003) or the common neutral mechanisms of subfunctionalization after gene duplication (Force et al. 1999) could result in an inability to detect the orthologous relationship.
ZFPs in general seem to be prone to LSE (see for example Lander et al. 2001; Chung et al. 2002; Englbrecht, Schoof, and Böhm 2004). LSEs of ZFPs are especially often found in proteins that also contain additional domains. In arthropods these additional domains include the ZAD (Chung et al. 2002) and the BTB domain (Lander et al. 2001), while in vertebrates these include in particular the SCAN and KRAB domains (Collins, Stone, and Williams 2001; Lander et al. 2001; Huntley et al. 2006). All four domains are thought to be protein-protein interaction domains, suggesting that a combination of C2H2 zinc fingers with an additional protein-protein interaction domain represents a versatile platform, which can be used to adopt novel functionalities. These may include the recruitment to the regulation of target genes or entirely different functions, as exemplified by wek, which functions as an adaptor, binding to the Toll receptor at the plasma membrane (Chen et al. 2006).
The KRAB domain defines the largest group of ZFPs in vertebrates. The LSE of KRAB-ZFPs appears to be restricted to the tetrapod vertebrates (Urrutia 2003), although the domain itself can be traced back to the base of the deuterostomes (Birtle and Ponting 2006). If we compare that to the ZAD, we find striking similarities, i.e. the LSE of the ZAD is restricted to the higher holometabolous insects, but the domain itself was probably present in the last common ancestor of deuterostomes and protostomes. Although present in vertebrates, we find evidence for only one instance of the ZAD in the protein encoded by ZFP276 (see above), suggesting that in vertebrates the ZAD-ZFPs are not prone to duplication as has been inferred for the higher holometabolous insects (Chung et al. 2002 and this study). Thus, the differential expansion of ZAD-ZFPs in higher holometabolous insects and KRAB-ZFPs in tetrapod vertebrates may reflect distinct evolutionary constraints and demands that are specific for the lineages.
Drosophila melanogaster ZADs are expressed in the female germline
It is possible that lineage-specific functions of ZAD-coding genes are required in processes that evolved in response to changing ecological conditions. Alternatively, ZAD-coding genes may have been involved in processes that led to developmental changes. Based on the observation that four of the nine known ZAD-coding genes of Drosophila melanogaster are expressed in the female germline, we reasoned that ZADs may be involved in oogenesis and/or be maternally required during early embryogenesis. The functions of the four aforementioned genes are consistent with this view. grau encodes a transcription factor that is required for the completion of meiosis (Chen et al. 2000; Harms et al. 2000) and pita is involved in the formation of egg chambers (Laundrie et al. 2003). The protein encoded by Sry-δ activates the expression of the anterior determinant bicoid (Payre, Crozatier, and Vincent 1994), while wek function is required for the establishment of the dorso-ventral axis (Luschnig et al. 2004; Chen et al. 2006).
In order to test whether female germline expression is common among ZAD-coding genes, we examined two independent microarray datasets (Manak et al. 2006; Hooper et al. 2007) that contain gene expression time series of developing Drosophila melanogaster embryos. We find that of the 98 ZAD-coding genes, 46 genes are maternally expressed. These 46 genes include five of the nine previously described Drosophila melanogaster ZAD-coding genes: wek, pita, hang, pad and dbr. Although grau and Sry-δ are not included in this dataset, their maternal expression was observed previously. This indicates that the 46 genes identified in the dataset represent a minimum estimate of maternally expressed ZAD-coding genes.
Furthermore, we examined the microarray datasets to determine whether the eight Drosophila melanogaster genes that have at least one ortholog in other species are expressed maternally. Notably, we find that this is the case for seven of these eight ZAD-coding genes. The single conserved ZAD-coding gene that appeared to have no maternal expression corresponds to CG31109. CG31109 encodes a protein with an isolated ZAD, lacking additional C2H2 zinc finger domains. This gene is conserved in all five holometabolous insects examined. Moreover, we were able to identify potential orthologs in the more ancestral, hemimetabolous orthopteran species Laupala kohalensis, suggesting that it was present in the last common ancestor of Orthoptera and Holometabola (Figure 4 A).
Given the high-throughput nature of the microarray technology, it was likely that some of the maternally expressed genes were missed. For this reason we examined whether CG31109 is maternally expressed by in situ hybridization of RNA probes to whole-mount Drosophila melanogaster ovaries (see Materials and Methods). Figure 4 B shows the expression pattern of CG31109. Maternal CG31109 expression starts during stage 8 of oogenesis in the nurse cells, and the transcript accumulates evenly in the oocyte. Thus, all eight conserved Drosophila melanogaster ZAD-coding genes are expressed in the female germline. If we add CG31109 to the list of 48 maternally expressed ZAD-coding Drosophila melanogaster genes, we conclude that at least 50% of the 98 ZAD-coding genes found in Drosophila melanogaster are maternally expressed. Thus, it appears that female germline expression of ZADs is a rather common phenomenon. Therefore, we speculate that many ZAD-coding genes are involved in some aspects of oogenesis and/or contribute to the maternal effect on early embryonic development. The observation that all conserved Drosophila melanogaster ZADs are expressed in the female germline indicates that maternal expression is an ancestral property of ZAD-coding genes. This hypothesis suggests that ancestral ZAD-coding genes might have been involved in oogenesis and/or early embryogenesis.
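The tally behind the 50% estimate can be written out explicitly. The following sketch simply adds the counts stated in the text (46 genes from the microarray datasets, grau and the Serendipity gene from earlier reports, and CG31109 from the in situ hybridization); the variable and function names are illustrative only.

```python
TOTAL_ZAD_GENES = 98  # ZAD-coding genes of Drosophila melanogaster

# Maternally expressed ZAD-coding genes by line of evidence, as stated in the text.
MATERNAL_BY_SOURCE = {
    "microarray datasets": 46,
    "earlier reports (grau and the Serendipity gene)": 2,
    "in situ hybridization (CG31109)": 1,
}


def maternal_fraction(by_source, total):
    """Fraction of all ZAD-coding genes with evidence of maternal expression."""
    return sum(by_source.values()) / total


if __name__ == "__main__":
    n = sum(MATERNAL_BY_SOURCE.values())
    print(f"{n} of {TOTAL_ZAD_GENES} genes: "
          f"{maternal_fraction(MATERNAL_BY_SOURCE, TOTAL_ZAD_GENES):.0%}")
```

The sum is 49 of 98 genes, which is the "at least 50%" lower bound reported here.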
An ancestral function of ZADs in holometabolous insects
We found that the eight conserved Drosophila melanogaster ZAD-coding genes are expressed in the female germline. Two of them, namely pita and CG31109, are present in all holometabolous insects examined. Currently, only pita has been studied, while the function of CG31109 remains unknown. Orthologs of pita can be traced back to Apis mellifera (see above). Given that our orthology assignment is based only on the ZAD peptide sequence, we tried to extend the protein sequence to the C2H2 zinc fingers. We identified the full-length sequence of pita in the NCBI database for four of the species examined (all except Bombyx mori). A multiple sequence alignment of these sequences showed that in addition to the ZAD sequences, a region including the C2H2 zinc fingers is also conserved (see Supplementary Figure S1), suggesting that these sequences are indeed orthologous. Thus, we conclude that pita was present in the last common ancestor of Hymenoptera and the other holometabolous insects.
pita is required during oogenesis, as mutations assayed in germline clones lead to defects in the development of egg chambers. The observed defects include degeneration of the egg chambers and abnormal or absent nurse cell nuclei (Laundrie et al. 2003). Apart from its role during oogenesis, it has been found that pita function is generally required in proliferating tissues as well as cells undergoing endoreplication during S-phase. There it acts as a sequence-specific transcription factor that activates the expression of target genes, including the Orc4 gene, which is essential for initiation of DNA replication (Page et al. 2005). Thus, the oogenesis defects seen in pita mutant germline clones can be attributed to a failure of the division cycles in the germarium that generate the germ cell cysts and/or a failure of the endoreplication cycles during nurse cell differentiation (Laundrie et al. 2003; Page et al. 2005).
Both formation of germ cell cysts and endoreplication of nurse cells are characteristics of the meroistic type of oogenesis. Meroism evolved independently in several groups of insects, but it appears to be of monophyletic origin in the holometabolous and the paraneopteran insects, as has been inferred from several common features such as formation of branched germ cell cysts (Büning 1996). In the meroistic ovary, only one cell of the cyst becomes the oocyte while the remaining cells differentiate to nurse cells, whereas the more basal panoistic ovaries lack nurse cells. The nurse cells produce most of the components that are deposited in the oocyte. Nurse cells become polyploid by means of endoreplication in order to cope with the high metabolic burden (Büning 1996).
Given the conservation of pita in all holometabolous insects we speculate that pita has acquired its function during oogenesis before the radiation of holometabolous insects. In this view, the results suggest that pita may have been involved in the establishment of a novel phenotypic trait, the meroistic ovary. It is also possible that a more general function of pita involving DNA replication was established first and its specific role in oogenesis followed after the divergence of the major orders of the holometabolous insects.
Collectively, we have provided evidence that the ZAD-coding genes are subject to an ongoing LSE, which is most pronounced in the higher holometabolous insects. The LSE of these genes can be explained by the versatile functions that are adopted by the ZAD-containing proteins. ZAD-coding genes in general appear to be involved in developmental processes, suggesting that the frequent duplications of ZAD-coding genes provide the raw material to evolve functionalities that are employed during development. This in turn implies that ZAD-coding genes may have been involved in the establishment of novel morphological characters, such as the meroistic ovary. The relationship between ZAD genes and meroistic ovary development could be tested experimentally once genomic data from paraneopteran insects, which also have meroistic ovaries, are available.
Materials and Methods
Genomic Sequence Data
We used the set of all predicted protein sequences of Drosophila melanogaster Release 3.2 (Celniker et al. 2002
In order to identify ZAD-coding sequences we employed tblastn of the BLAST package (Altschul et al. 1997) using the Drosophila melanogaster ZAD peptide sequences as query. All positive contigs were further analyzed with genewise of the Wise2 package (Birney, Thompson, and Gibson 1996) using the ZAD profile hidden Markov model. All ZAD peptide sequences were extracted and manually inspected, i.e. we discarded sequences with STOP codons, partial and nearly identical sequences and introduced introns if necessary. All identified ZAD peptide sequences are reported in Supplementary File S2.
Detection of Orthologs and Paralogs
In order to detect orthologous and paralogous groups of proteins we extended the Inparanoid algorithm (Remm, Storm, and Sonnhammer 2001). We constructed a distance matrix for all ZAD peptide sequences of all six examined species plus the ZAD found in the human ortholog of ZFP276. The construction of the distance matrix involved the following steps: (i) an all-against-all run of ssearch of the FASTA package (fasta34 series; Pearson and Lipman 1988) using the parameters "-a # -E 1e100 -m 9 -H -s BL80"; (ii) the reported bit scores were converted to raw scores by Sraw = (Sbit ln(2) + ln(K)) / λ, with K = 0.071 and λ = 0.299; (iii) the raw scores of a match between sequences i and j were converted to symmetrical relative similarity scores by Srel(i, j) = Srel(j, i) = 0.5 [(Sraw(i, j) – Srand) / (Sraw(i, i) – Srand) + (Sraw(j, i) – Srand) / (Sraw(j, j) – Srand)] with Srand = 1/λ (5.0 – ln(λ * L(i) * L(j))), L(i) being the length of sequence i; and (iv) these symmetrical relative similarity scores were converted to distances by D(i, j) = D(j, i) = – ln(Srel(i, j)) if Srel(i, j) was greater than zero, or D(i, j) = D(j, i) = – ln(1.0e-5) otherwise.
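Steps (ii) to (iv) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the original implementation: the constants K and lambda are the values quoted above, the function names are hypothetical, and the random-score term Srand is passed in by the caller instead of being computed, so that the sketch does not depend on any particular reading of its formula.

```python
import math

K = 0.071            # Karlin-Altschul K quoted in the text
LAMBDA = 0.299       # Karlin-Altschul lambda quoted in the text
SIMILARITY_FLOOR = 1.0e-5  # used when the relative similarity is not positive


def bit_to_raw(s_bit):
    """Step (ii): convert a reported bit score to a raw score."""
    return (s_bit * math.log(2) + math.log(K)) / LAMBDA


def relative_similarity(s_ij, s_ji, s_ii, s_jj, s_rand):
    """Step (iii): symmetrical relative similarity of sequences i and j.

    s_ij and s_ji are the raw scores of the two search directions,
    s_ii and s_jj the raw self-scores, and s_rand the random-score
    term for this pair (supplied by the caller in this sketch).
    """
    forward = (s_ij - s_rand) / (s_ii - s_rand)
    backward = (s_ji - s_rand) / (s_jj - s_rand)
    return 0.5 * (forward + backward)


def distance(s_rel):
    """Step (iv): convert a relative similarity to a distance."""
    if s_rel > 0:
        return -math.log(s_rel)
    return -math.log(SIMILARITY_FLOOR)


if __name__ == "__main__":
    s_rel = relative_similarity(s_ij=10.0, s_ji=10.0, s_ii=20.0, s_jj=40.0, s_rand=0.0)
    print(s_rel, distance(s_rel))
```

Identical sequences have a relative similarity of 1 and therefore a distance of 0, while pairs without a positive relative similarity all receive the same maximal distance of – ln(1.0e-5).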
Next, for every sequence i of species A, we searched for the most similar sequence k of any species X ≠ A. This sequence corresponded to a potential ortholog of sequence i. We then searched for sequences j of species B = A whose distance D(i, j) was smaller than the distance D(i, k) to the potential ortholog. For every sequence j that fulfilled this criterion we tested statistically whether the sequences i and j were more closely related to each other than to sequence k. We employed the statistical test proposed by Nei, Stephens, and Saitou 1985 for UPGMA trees, which tests whether a branch that separates sequences i and j from sequence k is significantly longer than zero. Every sequence j that passed this test was assigned to be a species-specific paralog. The groups assigned in this way were combined using single linkage clustering, i.e. groups that shared at least one member were merged. The members of the resulting clusters were then reported to be potential species-specific paralogs.
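The search for species-specific paralogs and the subsequent single linkage clustering can be sketched as below. The sketch is hedged in several ways: the data structures (a nested dictionary of distances and a sequence-to-species map) and all function names are our assumptions, and the branch-length test of Nei, Stephens, and Saitou is not implemented; it is represented by a caller-supplied predicate `is_significant`, which is a hypothetical placeholder.

```python
def paralog_pairs(dist, species, is_significant):
    """Candidate species-specific paralog pairs (i, j).

    dist[i][j] is the distance between sequences i and j, species[i] the
    species of sequence i, and is_significant(i, j, k) a stand-in for the
    statistical test that i and j are closer to each other than to k.
    """
    pairs = []
    for i in dist:
        foreign = [k for k in dist if species[k] != species[i]]
        if not foreign:
            continue
        # Closest sequence of any other species: the potential ortholog.
        k = min(foreign, key=lambda s: dist[i][s])
        for j in dist:
            same_species = j != i and species[j] == species[i]
            if same_species and dist[i][j] < dist[i][k] and is_significant(i, j, k):
                pairs.append((i, j))
    return pairs


def single_linkage(pairs):
    """Merge pairs that share at least one member (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    return sorted(sorted(c) for c in clusters.values())


def symmetric_distances(pair_values):
    """Build a nested distance dictionary from {(a, b): d} entries."""
    dist = {}
    for (a, b), d in pair_values.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    return dist


if __name__ == "__main__":
    toy = symmetric_distances({
        ("a1", "a2"): 0.1, ("a2", "a3"): 0.2, ("a1", "a3"): 0.9,
        ("a1", "b1"): 0.5, ("a2", "b1"): 0.6, ("a3", "b1"): 0.7,
    })
    toy_species = {"a1": "A", "a2": "A", "a3": "A", "b1": "B"}
    found = paralog_pairs(toy, toy_species, lambda i, j, k: True)
    print(single_linkage(found))
```

In the toy data, a1 and a3 are not closer to each other than to the sequence of the other species, yet both pair with a2, so single linkage places all three in one cluster. This chaining behaviour is what allows large paralogous groups to form from pairwise assignments.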
In order to assign orthologous groups we employed a similar scheme as outlined above. We merged all members of a paralogous group to form a single entry and we added all singleton sequences. Thus, every sequence i was represented only once in the new gene list, either as member of a paralogous group or as a singleton. For each entry i of species A we searched for the closest entry j of species X ≠ A. There were now four possible scenarios: (i) entry i is a singleton and entry j is a singleton (one-to-one); (ii) entry i is a singleton and entry j is a paralogous group (one-to-many); (iii) entry i is a paralogous group and entry j is a singleton (many-to-one); and (iv) entry i is a paralogous group and entry j is a paralogous group (many-to-many). As a distance measure between the entries i and j we used the average distance between all members of entry i and all members of entry j. In order to assign putative orthologs, we required that entries i and j form symmetrical best hits. For each pair i of species A and j of species B we searched for the closest sequence (of the original set) k of either species A or B whose distance was greater than the (average) distance between entries i and j (this was necessary in order to exclude putative species-specific paralogs that did not pass the test described above). We tested whether the sequences in entry i were more closely related to the sequences in entry j than to the sequence k using the test described above. Every entry j that passed this test was assigned to be a potential ortholog of entry i. The so-derived groups were combined if they contained at least one common member (single linkage clustering). The members of the resulting clusters were reported to be orthologous groups.
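The entry-level distance and the symmetrical best-hit criterion can be illustrated with the following sketch. It covers only these two steps; the subsequent significance test and clustering are omitted. The representation of entries as tuples of member sequences, the search for the closest entry across all other species at once, and every identifier are assumptions made for illustration.

```python
def average_distance(entry_i, entry_j, dist):
    """Mean distance between all members of two entries."""
    total = sum(dist[a][b] for a in entry_i for b in entry_j)
    return total / (len(entry_i) * len(entry_j))


def symmetric_best_hits(entries, species_of, dist):
    """Pairs of entries from different species that are each other's closest entry.

    entries maps an entry name to a tuple of member sequences (a singleton
    or a merged paralogous group); species_of maps entry names to species.
    """
    def closest(i):
        foreign = [j for j in entries if species_of[j] != species_of[i]]
        return min(
            foreign,
            key=lambda j: average_distance(entries[i], entries[j], dist),
            default=None,
        )

    best = {i: closest(i) for i in entries}
    return sorted(
        (i, j) for i, j in best.items()
        if j is not None and i < j and best[j] == i
    )


if __name__ == "__main__":
    pair_values = {
        ("a1", "a2"): 0.1, ("b1", "b2"): 0.8,
        ("a1", "b1"): 0.2, ("a2", "b1"): 0.4,
        ("a1", "b2"): 1.0, ("a2", "b2"): 1.2,
    }
    dist = {}
    for (a, b), d in pair_values.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    entries = {"A_group": ("a1", "a2"), "B1": ("b1",), "B2": ("b2",)}
    species_of = {"A_group": "A", "B1": "B", "B2": "B"}
    print(symmetric_best_hits(entries, species_of, dist))
```

In this toy case the paralogous group of species A and the singleton B1 select each other (a many-to-one relationship), whereas B2 selects the group without being selected in return and is therefore not assigned.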
Making of DIG labeled RNA in situ Probes
Drosophila genomic DNA was isolated from females using standard methods. The longest exons of ZAD-coding genes were isolated by PCR on genomic DNA using Taq polymerase (Fermentas) and standard methods. Primers (MWG) used to amplify the longest exons of CG31109 (forward: 5'- gcggaattcaagctcggttcacgag-3'; reverse: 5'- gctctagatagctccaccgaccagagac-3'; fragment size 300 bp) contained a 5' EcoRI and a 3' XbaI site. Addition of cloning sites into the PCR-fragments via the primers allowed directional cloning into the Bluescript SK vector. DIG-labeled in situ probes were made following standard protocols, using T3 (Fermentas) to make antisense probes and T7 (Fermentas) to make sense probes.
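As a quick sanity check, the primer sequences given above can be scanned for the cloning sites they are stated to carry. The sketch below assumes the standard recognition sequences GAATTC (EcoRI) and TCTAGA (XbaI); the helper name `find_site` is hypothetical.

```python
# Standard recognition sequences of the two enzymes, in lower case
# to match the notation of the primers in the text.
RESTRICTION_SITES = {"EcoRI": "gaattc", "XbaI": "tctaga"}


def find_site(primer, enzyme):
    """Return the 0-based position of the enzyme's site in the primer, or -1."""
    return primer.lower().find(RESTRICTION_SITES[enzyme])


forward = "gcggaattcaagctcggttcacgag"
reverse = "gctctagatagctccaccgaccagagac"

ecori_pos = find_site(forward, "EcoRI")  # site follows the 5' "gcg" clamp
xbai_pos = find_site(reverse, "XbaI")    # site follows the 5' "gc" clamp
```

Both recognition sequences are palindromic, so the same check applies regardless of which strand the primer is read from.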
in situ Hybridizations on Drosophila Ovaries
Ovaries were dissected from 2-day-old females in Grace's medium and placed on ice. Ovaries were fixed in 5% formaldehyde (Sigma) in PBSTX (PBS + 0.1% Triton-X 100 (Sigma)) for 20 min at RT. Ovaries were rinsed 3 times and washed 3 times for 5 min in PBSTX. Proteinase K digestion (final concentration: 5 µg/µl in PBSTX) was conducted at RT for 10 min. Ovaries were rinsed 3 times with PBSTX and post-fixed with 5% formaldehyde in PBSTX for 20 min at RT. For pre-hybridization, ovaries were first incubated in 1:1 PBSTX/HybBTX (5x SSC; 50% formamide (Sigma); 25 mg/ml torula yeast RNA (Sigma); 0.1% Triton-X 100) for 20 min at RT. The solution was then replaced with HybBTX and ovaries were placed at 57 °C for at least one hour, during which HybBTX was replaced at least three times. Probes were diluted 1:100 in HybBTX, placed at 80 °C for 2 min and cooled on ice. HybBTX was removed from the ovaries and replaced with the probe mix. Hybridization was conducted at 57 °C overnight. The probe was removed and ovaries were washed 3 times for 20 min with HybBTX at 55 °C and once with 1:1 PBSTX/HybBTX at RT. Ovaries were rinsed 3 times and washed 3 times for 20 min with PBSTX at RT. Ovaries were incubated in a 1:1000 dilution of an alkaline phosphatase coupled sheep anti-DIG antibody (Roche) for 1 h at RT. Ovaries were rinsed 3 times and washed 3 times for 20 min with PBSTX. Ovaries were rinsed three times in staining buffer (0.1 M TRIS pH 9.5; 50 mM MgCl2; 0.1 M NaCl; 0.1% Triton-X 100). 10 µl NBT/BCIP solution (Roche) was added to ovaries in 1 ml staining buffer. Staining progress was monitored under a bifocal microscope. Staining was stopped by rinsing three times and washing several times with PBSTX. Ovaries were completely dehydrated in an ethanol dilution series and mounted in Canada Balsam (Sigma). Photographs were taken using a Zeiss Axiophot microscope with Nomarski optics and a ProgRES 3012 camera using ProgRes 4.0 software (Kontron Elektronik).
Images were further processed using Photoshop 7.0 (Adobe).
| Supplementary Material |
|---|
|
|
|---|
Supplementary figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
| Acknowledgements |
|---|
|
|
|---|
We thank the anonymous reviewers for the helpful comments. We thank our colleagues, especially Ralf Pflanz, for help and critical discussions. H.-R.C. thanks the Boehringer Ingelheim Fonds for a predoctoral fellowship. Work was supported by the Max-Planck-Gesellschaft.
| Footnotes |
|---|
1 Present address: Max-Planck-Institut für molekulare Genetik, Department of Computational Molecular Biology, Ihnestr. 63-73, 14195 Berlin, Germany.
Diethard Tautz, Associate Editor
| References |
|---|
|
|
|---|
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997) 25:3389–3402.
Birney E, Thompson JD, Gibson TJ. PairWise and SearchWise: finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames. Nucleic Acids Res (1996) 24:2730–2739.
Birtle Z, Ponting CP. Meisetz and the birth of the KRAB motif. Bioinformatics (2006) 22:2841–2845.
Böhm S, Frishman D, Mewes HW. Variations of the C2H2 zinc finger motif in the yeast genome and classification of yeast zinc finger proteins. Nucleic Acids Res (1997) 25:2464–2469.
Büning J. Reproductive Biology: Oogenesis and Spermatogenesis. Verh Dtsch Zool Ges (1996) 89:123–137.
Celniker SE, Wheeler DA, Kronmiller B. (32 co-authors). Finishing a whole-genome shotgun: release 3 of the Drosophila melanogaster euchromatic genome sequence. Genome Biol (2002) 3:RESEARCH0079.
Chang HC, Solomon NM, Wassarman DA, Karim FD, Therrien M, Rubin GM, Wolff T. phyllopod functions in the fate determination of a subset of photoreceptors in Drosophila. Cell (1995) 80:463–472.
Chen B, Harms E, Chu T, Henrion G, Strickland S. Completion of meiosis in Drosophila oocytes requires transcriptional control by grauzone, a new zinc finger protein. Development (2000) 127:1243–1251.
Chen LY, Wang JC, Hyvert Y, Lin HP, Perrimon N, Imler JL, Hsu JC. Weckle is a zinc finger adaptor of the toll pathway in dorsoventral patterning of the Drosophila embryo. Curr Biol (2006) 16:1183–1193.
Chung HR, Schäfer U, Jäckle H, Böhm S. Genomic expansion and clustering of ZAD-containing C2H2 zinc-finger genes in Drosophila. EMBO Rep (2002) 3:1158–1162.
Collins T, Stone JR, Williams AJ. All in the family: the BTB/POZ, KRAB, and SCAN domains. Mol Cell Biol (2001) 21:3609–3615.
Dai P, Akimaru H, Ishii S. A hedgehog-responsive region in the Drosophila wing disc is defined by debra-mediated ubiquitination and lysosomal degradation of Ci. Dev Cell (2003) 4:917–928.
Dickson BJ, Dominguez M, van der Straten A, Hafen E. Control of Drosophila photoreceptor cell fates by phyllopod, a novel nuclear protein acting downstream of the Raf kinase. Cell (1995) 80:453–462.
Domazet-Loso T, Tautz D. An evolutionary analysis of orphan genes in Drosophila. Genome Res (2003) 13:2213–2219.
Eddy SR. Profile hidden Markov models. Bioinformatics (1998) 14:755–763.
Englbrecht CC, Schoof H, Böhm S. Conservation, diversification and expansion of C2H2 zinc finger proteins in the Arabidopsis thaliana genome. BMC Genomics (2004) 5:39.
Fahmy OG, Fahmy M. New mutants report. Dros Inf Serv (1959) 33:82–94.
Finn RD, Mistry J, Schuster-Böckler B. (13 co-authors). Pfam: clans, web tools and services. Nucleic Acids Res (2006) 34:D247–251.
Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J. Preservation of duplicate genes by complementary, degenerative mutations. Genetics (1999) 151:1531–1545.
Gibert JM, Marcellini S, David JR, Schlotterer C, Simpson P. A major bristle QTL from a selected population of Drosophila uncovers the zinc-finger transcription factor poils-au-dos, a repressor of achaete-scute. Dev Biol (2005) 288:194–205.
Harms E, Chu T, Henrion G, Strickland S. The only function of Grauzone required for Drosophila oocyte meiosis is transcriptional activation of the cortex gene. Genetics (2000) 155:1831–1839.
Holt RA, Subramanian GM, Halpern A. (123 co-authors). The genome sequence of the malaria mosquito Anopheles gambiae. Science (2002) 298:129–149.
Hooper SD, Boue S, Krause R, Jensen LJ, Mason CE, Ghanim M, White KP, Furlong EE, Bork P. Identification of tightly regulated groups of genes during Drosophila melanogaster embryogenesis. Mol Syst Biol (2007) 3:72.
Huntley S, Baggott DM, Hamilton AT, Tran-Gyamfi M, Yang S, Kim J, Gordon L, Branscomb E, Stubbs L. A comprehensive catalog of human KRAB-associated zinc finger genes: insights into the evolutionary history of a large family of transcriptional repressors. Genome Res (2006) 16:669–677.
Jauch R, Bourenkov GP, Chung HR, Urlaub H, Reidt U, Jäckle H, Wahl MC. The zinc finger-associated domain of the Drosophila transcription factor grauzone is a novel zinc-coordinating protein-protein interaction module. Structure (2003) 11:1393–1402.
Jordan IK, Makarova KS, Spouge JL, Wolf YI, Koonin EV. Lineage-specific gene expansions in bacterial and archaeal genomes. Genome Res (2001) 11:555–565.
Lander ES, Linton LM, Birren B. (255 co-authors). Initial sequencing and analysis of the human genome. Nature (2001) 409:860–921.
Laundrie B, Peterson JS, Baum JS, Chang JC, Fileppo D, Thompson SR, McCall K. Germline cell death is inhibited by P-element insertions disrupting the dcp-1/pita nested gene pair in Drosophila. Genetics (2003) 165:1881–1888.
Lespinet O, Wolf YI, Koonin EV, Aravind L. The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res (2002) 12:1048–1059.
Luschnig S, Moussian B, Krauss J, Desjeux I, Perkovic J, Nüsslein-Volhard C. An F1 genetic screen for maternal-effect mutations affecting embryonic pattern formation in Drosophila melanogaster. Genetics (2004) 167:325–342.
Manak JR, Dike S, Sementchenko V. (11 co-authors). Biological function of unannotated transcription during the early development of Drosophila melanogaster. Nat Genet (2006) 38:1151–1158.
Mita K, Kasahara M, Sasaki S. (21 co-authors). The genome sequence of silkworm, Bombyx mori. DNA Res (2004) 11:27–35.
Mongin E, Louis C, Holt RA, Birney E, Collins FH. The Anopheles gambiae genome: an update. Trends Parasitol (2004) 20:49–52.
Nei M, Stephens JC, Saitou N. Methods for computing the standard errors of branching points in an evolutionary tree and their application to molecular data from humans and apes. Mol Biol Evol (1985) 2:66–85.
Page AR, Kovacs A, Deak P. (12 co-authors). Spotted-dick, a zinc-finger protein of Drosophila required for expression of Orc4 and S phase. EMBO J (2005) 24:4304–4315.
Page RD, Charleston MA. From gene to organismal phylogeny: reconciled trees and the gene tree/species tree problem. Mol Phylogenet Evol (1997) 7:231–240.
Payre F, Buono P, Vanzo N, Vincent A. Two types of zinc fingers are required for dimerization of the serendipity delta transcriptional activator. Mol Cell Biol (1997) 17:3137–3145.
Payre F, Crozatier M, Vincent A. Direct control of transcription of the Drosophila morphogen bicoid by the serendipity delta zinc finger protein, as revealed by in vivo analysis of a finger swap. Genes Dev (1994) 8:2718–2728.
Payre F, Noselli S, Lefrere V, Vincent A. The closely related Drosophila sry beta and sry delta zinc finger proteins show differential embryonic expression and distinct patterns of binding sites on polytene chromosomes. Development (1990) 110:141–149.
Pearson WR, Lipman DJ. Improved tools for biological sequence comparison. Proc Natl Acad Sci USA (1988) 85:2444–2448.
Remm M, Storm CE, Sonnhammer EL. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol (2001) 314:1041–1052.
Ruez C, Payre F, Vincent A. Transcriptional control of Drosophila bicoid by Serendipity delta: cooperative binding sites, promoter context, and co-evolution. Mech Dev (1998) 78:125–134.
Savard J, Tautz D, Richards S, Weinstock GM, Gibbs RA, Werren JH, Tettelin H, Lercher MJ. Phylogenomic analysis reveals bees and wasps (Hymenoptera) at the base of the radiation of Holometabolous insects. Genome Res (2006) 16:1334–1338.
Scholz H, Franz M, Heberlein U. The hangover gene defines a stress pathway required for ethanol tolerance development. Nature (2005) 436:845–847.
Schüpbach T, Wieschaus E. Female sterile mutations on the second chromosome of Drosophila melanogaster. I. Maternal effect mutations. Genetics (1989) 121:101–117.
Urrutia R. KRAB-containing zinc-finger repressor proteins. Genome Biol (2003) 4:231.
Wong JC, Alon N, Norga K, Kruyt FA, Youssoufian H, Buchwald M. Cloning and analysis of the mouse Fanconi anemia group A cDNA and an overlapping penta zinc finger cDNA. Genomics (2000) 67:273–283.
Wong JC, Gokgoz N, Alon N, Andrulis IL, Buchwald M. Cloning and mutation analysis of ZFP276 as a candidate tumor suppressor in breast cancer. J Hum Genet (2003) 48:668–671.
Xia Q, Zhou Z, Lu C. (93 co-authors). A draft sequence for the genome of the domesticated silkworm (Bombyx mori). Science (2004) 306:1937–1940.
α-helices found in ZADgrau; green triangles indicate the residues implicated to form the hydrophobic core in ZADgrau. hsap: Homo sapiens; mmus: Mus musculus; ggal: Gallus gallus; xlae: Xenopus laevis; drer: Danio rerio.



