MBE Advance Access originally published online on December 5, 2006
Molecular Biology and Evolution 2007 24(2):611-619; doi:10.1093/molbev/msl190
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
© 2006 The Authors
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Research Articles |
Heterogeneity in Regional GC Content and Differential Usage of Codons and Amino Acids in GC-Poor and GC-Rich Regions of the Genome of Apis mellifera

* Bioinformatics Research Center, Department of Genetics and Ecology, University of Aarhus, Aarhus, Denmark
Department of Molecular Biology and Genetics, Cornell University
E-mail: frank{at}birc.au.dk.
| Abstract |
|---|
|
|
|---|
The honeybee (Apis mellifera) has a genome with a wide variation in GC content showing 2 clear modal GC values, in some ways reminiscent of an isochore-like structure. To gain insight into causes and consequences of this pattern, we used a comparative approach to study the genome-wide alignment of primarily coding sequence of A. mellifera with Drosophila melanogaster and Anopheles gambiae. The latter 2 species show a higher average GC content than A. mellifera and no indications of bimodality, suggesting that the GC-poor mode is a derived condition in honeybee. In A. mellifera, synonymous sites of genes generally adopt the GC content of the region in which they reside. A large proportion of genes in GC-poor regions have not been assigned to the honeybee assembly because of the low sequence complexity of their genome neighborhood. The synonymous substitution rate between A. mellifera and the other species is very close to saturation, but analyses of nonsynonymous substitutions as well as amino acid substitutions indicate that the GC-poor regions are not evolving faster than the GC-rich regions. We describe the codon usage and amino acid usage and show that they are remarkably heterogeneous within the honeybee genome between the 2 different GC regions. Specifically, the genes located in GC-poor regions show a much larger deviation in both codon usage bias and amino acid usage from the Dipterans than the genes located in the GC-rich regions.
Key Words: Apis mellifera GC content isochore codon bias amino acids
| Introduction |
|---|
|
|
|---|
Genome-wide sequencing at high coverage of several Drosophila species (Adams et al. 2000
The honeybee is a member of the Hymenoptera order, one of 11 orders of holometabolous insects. All other insect species sequenced thus far at relatively high coverage belong to one of the following 3 orders: Diptera (Drosophila sp. and Anopheles sp.), Lepidoptera (B. mori), or Coleoptera (Tribolium castaneum). The exact phylogenetic relationships between these 3 orders and Hymenoptera have been hard to resolve, but recently large genome-scale investigations have provided evidence for a basal position of Hymenoptera (Savard et al. 2006
). Combining the new evidence for a basal position of Hymenoptera among the holometabolous insects sequenced thus far with divergence times estimated using a molecular clock, it is believed that the most recent common ancestor of Diptera and Hymenoptera existed, approximately, 330350 MYA (Gaunt and Miles 2002
; Savard et al. 2006
).
Analysis of the complete draft sequence of the honeybee genome has revealed several unique features compared with the other insect genomes sequenced to date. Most relevant to our study are the following 3 observations: an extremely low overall GC content, a relatively slow rate of molecular evolution, and a high frequency of CpG dinucleotides (Honeybee Genome Sequencing Consortium 2006
).
One of the very basic differences between the genomes of amniotes and most other groups of multicellular organisms is the occurrence of relatively homogeneous GC-rich and GC-poor stretches of DNA, often referred to as isochores (Bernardi et al. 1985
). Even though the occurrence of isochores has been known for several decades, there are still several controversies regarding their origin, evolution, and extent (Eyre-Walker and Hurst 2001
; Clay and Bernardi 2005
; Cohen et al. 2005
; Duret et al. 2006
). Although the origin and evolution of isochores is still a controversial subject, the existence of clustering of high-GC and low-GC regions within the genomes of mammals and birds is generally accepted. In insects such as Drosophila melanogaster, the heterogeneity in GC content appears to be far less evident. Initial analyses of the honeybee genome revealed much greater AT richness compared with D. melanogaster. Furthermore, there were large regions of homogeneous GC content varying from low- to high-GC content along the genome, resembling isochore structures (Honeybee Genome Sequencing Consortium 2006
).
A comparative analysis of relatively distantly related species can be more reliable if we restrict the analysis to coding regions. In coding regions there is less ambiguity about alignment, and comparison of synonymous and nonsynonymous sites allows an a priori discrimination between functional classes. Here we examine patterns of GC content variation and the effects it has on codon usage bias and amino acid usage in greater detail by comparing the coding genome of the honeybee with the coding genomes of 2 dipteran genomes, D. melanogaster and An. gambiae.
| Materials and Methods |
|---|
|
|
|---|
Generation of Data
The multi-Z alignments (Blanchette et al. 2004
275 bp) that were used in the subsequent analyses.
Annotation of Data
Using the refgene file for the dm2 version of D. melanogaster annotations, alignments were annotated as 3' untranslated regions (UTR), 5' UTR, exons, introns, or intergenic. For exons, the reading frame of A. mellifera was inferred from the reading frame of D. melanogaster. All annotated reading frames from the refgene file in D. melanogaster were checked for start and stop codons. We found a few cases where the reading frame had shifted 1 or 2 bases to the left or right. These were all corrected before the annotation was done, so as to fulfill the criteria that all annotated genes have a start and a termination codon and no internal stop codons.
Estimation of GC Content and Test for Bimodality in Third Codon Position GC Content
The GC content of a given fragment was estimated as the proportion of Gs and Cs separately for each codon position of each alignment. To test whether the apparent bimodality in honeybee in the third codon position was significant, we fitted 2 different models to our data set: (M1) a beta distribution with 2 parameters (a and b) and (M2) a mixture of 2 beta distributions with 5 parameters (p, a1, b1, a2, and b2), where p is the percentage of sequences that belong to the first beta distribution with parameters (a1, and b1) and (1 p) is the percentage of sequences that belong to the second beta distribution with parameters (a2 and b2). Both models were fitted using the nlminb linear optimization method implemented in the statistical program package R (R Development Core Team 2005). A standard likelihood ratio test of twice the difference in log likelihood of the 2 models M1 and M2 with 3 degrees of freedom (df) was used to compare the 2 models. The 2 probability densities specified by the best-fitting model 2 were used to divide the data into 2 different classes based on the GC content of third codon position in the honeybee sequence.
Codon Usage at 4-Fold Degenerate Amino Acids in Low- and High-GC Areas
From the GC content of the third codon position, all alignments were divided into 2 classes representing the 2 modes in the bimodal distribution: low-GC regions and high-GC regions. All positions where one of the 4-fold degenerate amino acids (glycine, proline, valine, threonine, and alanine) or one of the 6-fold degenerate amino acids (leucine, arginine, and serine) have been conserved between D. melanogaster and A. mellifera were extracted for codon usage bias analyses.
Inference of Equilibrium GC Content under Inferred Substitution Models
The 5-codon substitution matrices that are estimated for the 4-fold degenerate amino acids were transformed into relative rate matrices QAA | AA = (glycine, proline, valine, threonine, and alanine) that satisfy the condition that each row sums to 1: Ri
A + Ri
C + Ri
G + Ri
T = 1, where i is one of the 4 nucleotides (A, C, G, and T). Each index in the matrix represents a relative rate of change from one nucleotide to another:
![]() |
These rate matrices were multiplied with the observed nucleotide frequencies represented by the vector NAA, giving the actual counts of the third codon position nucleotide in each of the five 4-fold degenerate amino acids:
|
|
The dot product N'AAQ'AA=N'AA, where N'AA represents the vector of codon frequencies after one time step of evolution following the substitution rates of QAA. The total number of codons in N'AA will be equal to the total current number of codons in NAA. This procedure was iterated using the new nucleotide count matrix N'AA until equilibrium was reached. The resulting codon distributions and GC content at equilibrium were determined and current deviations from equilibrium were evaluated separately for each amino acid in a
2 distribution with 3 df.
Differential Amino Acid Usage in High and LowGC Content Regions
All aligned amino acids where we have an unambiguous amino acid (no "?" or "-") for all 3 species were counted and divided into the 2 groups determined by the GC content of the third codon position in honeybee. The average GC content of all codons encoding each amino acid was estimated using an assumption of equal synonymous codon frequencies within each amino acid.
Estimation of Evolutionary Distances and Correlation with GC Content
For each 3-species alignment we estimated the 3 different pairwise p distances (DrosophilaApis, DrosophilaAnopheles, and AnophelesApis) for the 3 different codon positions as well as for the encoded amino acids.
Correlation between Third Codon Position GC Content and Regional GC Content
The regional GC content was estimated by dividing the genome into 20-kb windows with 10-kb overlap. The GC content for each region was estimated as a simple count. Unknown nucleotides (N) in the sequence were skipped so the GC content was based on fewer bases in those regions. If any region has less than 15 kb of unambiguous nucleotides, we define the GC content to be missing. The position of each sequence on the honeybee genome was determined by blasting the sequence against the genome using Megablast (Zhang et al. 2000
) with the following settings: word size (w) = 24, minimum e value (e) = 0.000001. The best hit with the highest score was used, but if a hit could not be found or if the regional GC content was missing the alignment was discarded in this analysis.
| Results |
|---|
|
|
|---|
The overall genomic average GC content of the different autosomes of the 3 species studied here are 3339% (range over 16 linkage groups) for A. mellifera, 4243% for D. melanogaster, and 4445% for An. gambiae, so the clear outlier is the honeybee with its exceptionally low average GC content.
Figure 1 shows the distributions of GC contents of each alignment of the 3 species for each codon position separately. The assembly of the honeybee genome contains sequences that did not map unambiguously to a known linkage group, and these are currently just labeled the "unknown group" (Honeybee Genome Sequencing Consortium 2006
). We have decided to split the data of the third codon position into 2 groups, 1 containing all the sequences that have already been assembled and placed onto one of the 16 honeybee linkage groups and the other containing the remaining sequences that have not so far been placed onto any linkage groups; please see the Discussion for a possible explanation of this interesting pattern. Unless otherwise specified, the remaining analyses in this paper deals with the entire alignment data set, but we have also conducted similar analyses on the 2 different subsets of the data and the results all appear to be robust across subsets of the data. Looking at the sequences that map to a known linkage group, we see that the GC content distribution of third codon position in honeybee has a clear bimodal shape, not observed in the 2 dipteran species. Adding the sequences from the unknown group does not change this observation but makes the data in the low-GC mode more abundant.
|
A likelihood ratio test with 3 df shows that a mixture model (M2) of 2 beta distributions fit the data significantly better than a single beta distribution (M1) (P < 10100). The best fit of the M2 mixture model has the following 5 parameters: p = 0.479, a1 = 5.528, b1 = 22.504, a2 = 4.041, and b2 = 3.063. A scaled histogram of our data and the density function of the fitted M2 model as well as the 2 individual beta distributions that make up the mixture model are shown in figure 2. The intersection of the 2 individual beta distributions is approximately 33% GC content. Excluding the unknown linkage groups from the data results in a similar mixture model with a slightly higher intersection around 37% GC. This slightly higher intersection point is mainly due to a wider variance on the low mode in the smaller data set. We conducted the following analysis using GC-cutoff values of 30%, 33%, 35%, 37%, and 40% and because the results are highly similar, we have chosen only to present results using a cutoff of 33% GC determined by the intersection of the 2 individual beta distributions in M2 based on the full data set. The results using the other GC values can be obtained from the authors on request.
|
We estimated the 3 pairwise p distances in the 3-species unrooted tree (D. melanogaster, An. gambiae, and A. mellifera) at the first, second, and third codon positions separately for each of the 2 GC content classes (table 1). Table 1 also includes the pairwise p distances estimated at the amino acid level. The third codon position is highly saturated in all comparisons and therefore not very informative. The first and second codon positions as well as the amino acids are not highly saturated, and from that we see that the low-GC region does not show a higher substitution rate in honeybee.
|
Using the estimated p distances for the individual alignments that have been summarized in table 1, we calculated the correlation between the evolutionary rate of the honeybee in an alignment and the GC content of the honeybee in the corresponding alignment. The mean GC-content of the alignments for each of the three species in low and high GC-regions respectively is summarized in table 2. There are no strongly significant correlations between the rate of evolution and the GC content of the honeybee sequence based on any of the 3 types of data: first codon position, second codon position, and amino acids (Pearson productmoment correlation: cor = 0.01, P = 0.1198; cor = 0.03, P = 1.5 x 106; and cor = 0.02, P = 0.0207), respectively.
|
To investigate the consequences of the bimodality seen in the GC distributions all 4-fold degenerate and 6-fold degenerate amino acids, where the amino acid is conserved among D. melanogaster and A. mellifera, were extracted and divided into the 2 logically exclusive categories based on the GC content of the honeybee sequence they came from (low GC < 33% and high GC
33%). For each 4-fold amino acid, we have a 4 x 4 matrix showing the minimum number of substitutions that are required to explain the data. Table 3 shows 10 matrices for the 5 different amino acids for low- and high-GC content, respectively. For the lowGC content sequences, it is clear that all five 4-fold degenerate amino acids show a similar pattern, a generally very strong change in usage from GC-rich codons in Drosophila to AT-rich codons in honeybee. In total, we observe 114,781 changes from GC-rich codons in Drosophila to AT-rich codons in honeybee compared with 9,048 changes in the other direction, a ratio of approximately 12.7. For the highGC content sequences the trend is in the same direction, but the pattern is much less clear with 41,653 and 28,541 changes, respectively, which results in a ratio of approximately 1.5. Even though the trend is not as strong as in the low-GC regions, it is still highly significant (chi-square = 2,449, P < 106).
|
Table 4 shows the 6 corresponding matrices for the three 6-fold degenerate amino acids. A pattern similar to that seen in the 4-fold degenerate sites is seen in the "4-fold" part of all three 6-fold degenerate amino acids. The amino acids located in a low-GC region generally use the codons with A or T in 3rd position. For leucine and arginine, only 1 mutation is necessary to go from the 4-fold codons to the "2-fold" codons (see table 4), whereas serine requires a mutation in both first and second codon position to change from the 4-fold codons to the 2-fold codons. Because it requires 2 mutations, we do not expect many changes from a 4-fold codon to a 2-fold codon in serine, which agrees well with what we observe. Interestingly, the most extreme ratio of changes between 2 synonymous codons is found in the low-GC region in leucine in a case where it requires 2 mutations (CTG
TTA). We observe a ratio of 16,971/84 an approximately 200-fold greater tendency for TTA in A. mellifera and CTG in D. melanogaster compared with the one having CTG in A. mellifera and TTA in D. melanogaster. In contrast, the ratio seen in the high-GC regions is "only" 2,902/373, approximately 8. Another interesting observation is that the first-position changes between synonymous codons in leucine and arginine on average show a much stronger bias toward the AT-rich first-position codons in honeybee in the low-GC regions compared with the high-GC regions; 15.7 versus 4.0 in leucine and 6.4 versus 3.6 in arginine, respectively.
|
The counts of substitutions presented in table 3 allow estimates of relative rates of substitutions, and from these we can model how the base composition will change under each of the 4 x 4 substitution matrices over time. We use these substitution matrices to find the equilibrium codon frequencies for each of the five 4-fold degenerate amino acids. The resulting third codon position GC content at equilibrium and the current deviation from that for each of the 5 amino acids at both high- and low-GC content is shown in table 5. We find that the high-GC regions seem to be evolving to a GC content fairly similar to what is observed, whereas the low-GC regions are evolving to a GC content only slightly lower. That is, high-GC regions appear to be closer to equilibrium (F-test [total]: 19.00, P = 0.019). In the regions of low-GC content, there remains a pronounced trend for the substitution matrices to drive the GC content to be lower still. Thus, it is the low-GC regions that appear to be in the process of the greatest change. We note that proline does not show this tendency: in fact, for proline the lowGC content region is actually a bit closer to equilibrium than the highGC content region. This could in part be due to sampling because of the relative rarity of proline.
|
The amino acid alignments were used to study the differential usage of amino acids in the low- and high-GC regions of A. mellifera compared with the average of the 2 dipterans. The result of this analysis is shown in figure 3, which shows a strong tendency in the low-GC regions to use more GC-poor amino acids and less GC-rich amino acids. In the high-GC regions in A. mellifera this trend is much less pronounced, resulting in a significantly different amino acid usage in the 2 different regions of the genome of A. mellifera.
|
Dividing the honeybee genome into windows of 20 kb with 10-kb overlap, we mapped each individual sequence onto its specific region and measured the correlation between the 20-kb regional GC content and the GC content of the third codon position of the given honeybee sequence; this plot is shown in figure 4. The correlation is highly significant and positive (Pearson productmoment correlation: cor = 0.68, P < 106). Using only 4-fold degenerate sites, where the encoded amino acid is conserved between D. melanogaster and A. mellifera, or using different regional window sizes yields very similar results (data not shown).
|
| Discussion |
|---|
|
|
|---|
One of the major problems when comparing very distantly related species is that only short fragments can be aligned with high confidence. The last common ancestor of A. mellifera and D. melanogaster probably existed at least 300 MYA, sufficiently long ago that the substitution process at all neutrally evolving sites is very close to saturation (Gaunt and Miles 2002
Random shotgun sequences of whole genomes generally produce a residual set of scaffolds that may have many sequence reads in the assembly, but they fail to connect to any known gene or other mappable feature of the genome. These unmapped scaffolds are often enriched for sequences embedded in heterochromatin (Adams et al. 2000
), and given the GC poorness of most heterochromatic repeats, the unmapped scaffolds tend to be GC poor. The third codon positions of coding sequence in the unmapped contigs in A. mellifera have a very low GC content. Part of the reason for this is most likely that a very high or low GC content will result in a relatively lower sequence complexity, which might make it hard to uniquely place such a sequence on a specific place in the genome. Because the third codon positions were found to be a good indicator of the regional GC content, these coding sequences should probably be placed on the honeybee genome in regions with relatively low sequence complexity. We have very few trustworthy alignments of intronic and intergenic regions because reasonable alignments could usually only be produced if they had a clear and unique overlap with a coding region, but the shape of the GC content distribution of these 2 noncoding regions resembles the distribution of the unknown third codon position fairly well.
This leaves us to explain why such a markedly bimodal distribution of GC content is seen only in the honeybee genome. One peak of the bimodal distribution is approximately equal in GC content to that found in the 2 dipterans, whereas the new peak has markedly lower GC content. One explanation could be that the honeybee generally experiences a very strong mutational bias toward A/T nucleotides. If selection is maintaining the GC content of third codon positions through a very strong codon usage bias in some genes or regions of the genome, we would expect the evolutionary rate in these regions to be significantly slower than other regions. We have no evidence that this is the case. On the contrary, table 1 actually indicates that the low-GC regions evolve a bit slower than the high-GC regions. The reason for this is not obvious, but we can speculate that it might be because stronger negative selection has acted on the regions that are changing in GC content and therefore these changes more often result in different amino acids. The observation that a significantly different distribution of amino acids is now being used in the low-GC regions compared with the high-GC regions supports this idea (fig. 3).
Another explanation of the bimodality in GC content we observe could be that for some currently unknown reason 2 markedly different mutational patterns exist in different regions of the genome. In some regions, a very strong bias toward A/T mutations makes this part of the genome extremely AT rich, whereas an unbiased mutation pattern exists in the other parts of the genome. If this were the case, we would expect to see the following 4 things: 1) a change in synonymous codon usage proportional to the change in GC content in that respective region, 2) a significant shift in amino acid usage to more GC-poor amino acids (depending on the magnitude of the mutation bias and the strength of the selective pressure on conserving the protein structure), 3) third codon position GC content resembling the GC content of the surrounding intergenic regions, and 4) a greater difference between flies and honeybee in GC content in the first codon position compared with the second codon position (because the first codon position changes are not all nonsynonymous). We also expect the AT bias to be significantly more pronounced in the low-GC regions compared with the high-GC regions.
We observe all 4 attributes in our data. Figure 1 shows a significantly lower GC content in the honeybee when looking at the first codon position compared with that of the second codon position. From table 4, we observe that the AT bias is much stronger in the low-GC regions as expected. Furthermore, figure 4 shows a very strong correlation between the regional GC content and third codon position GC content. This coupled with the very strong codon usage bias and amino acid bias we see in the low-GC regions indicates that the GC content of the third codon position could merely be a product of the mutational processes acting on the specific regions the genes reside in and that the very strong codon usage bias and amino acid bias we see in the low-GC regions are simply a consequence of an extremely high A/T bias in the mutation process. This mutation process might simply be so strong that it overwhelms the negative selection that acts to conserve the amino acid composition and the codon usage bias in the different proteins.
In sum, our analysis suggests that 2 very different mutational processes could be acting in different regions of the honeybee genome. One has a pattern of substitutions similar to that in D. melanogaster and An. gambiae. The other, which appears to be unique to honeybee, is driving a portion of the genome to decreased GC content resulting in a very different usage of synonymous codons and in the amino acids used in the encoded proteins in those regions. Both the low- and high-GC regions seem to evolve to be more AT rich, but the low-GC regions seem to be further from equilibrium. The reason for this is possibly the very rigid division into low- and high-GC regions used in this study. Looking at figure 2, it is easy to see that the 2 different beta distributions used to model the different modes in GC content have a rather large overlapping region. Therefore, reducing this complex bimodal distribution to a dichotomous partitioning may be an artificially simple characterization. We have conducted these analyses using different GC-cutoff values and even using a trinomial splitting where the middle region is not used in the analyses. The results are very similar to those reported here, so we do not expect these uncertainties to bias our conclusions.
The consequences of this apparent heterogeneity in mutation processes acting on the A. mellifera genome are still poorly understood, but our analysis indicates that it has had a significant effect on both codon and amino acid usage in the low-GC regions. Given the regional nature of the shift in the substitution process, and the fact that it appears so strong as to change the amino acid usage of a large fraction of the proteins caught in these regions, it seems likely that the change is systemic, changing a process of substitution rather than a novel pressure acting independently on many gene regions. Although the evidence is so far indirect, it appears as though there is a regional shift in the substitution process due to a shift in the underlying mutation process. At the moment we have no direct evidence to explain the underlying mechanism, but one can speculate that this may be due to regional recruitment of an altered DNA polymerase, a regionally maintained chromatin state, and/or occupancy of chromatin-binding proteins. More studies of possible mechanisms driving the GC content shift are certainly warranted, and further analysis will be possible when additional insect genome sequences become available.
| Acknowledgements |
|---|
|
|
|---|
We gratefully acknowledge the Honeybee Genome Sequencing Consortium for generating a valuable set of data and analysis resources and making it available to us. We would like to thank Bo Eskerod Madsen for many useful discussions throughout the duration of this study as well as 2 anonymous reviewers for several useful comments and suggestions. Chris Elsik, Eran Elhaik, and Dan Graur did much of the GC content analysis for the honeybee genome paper and have a manuscript of their own under review. Only the vagaries of publication timing allowed our paper to appear in print before theirs.
FGJ acknowledges funding from the Faculty of Science, University of Aarhus. MHS was supported by grant no. 2052-01-0032 from the Danish Agricultural Science Research Council. AGC was supported by grant no. NIH R01 GM64590 Funding for the Open Access publication was provided by these grants.
| Footnotes |
|---|
Takashi Gojobori, Associate Editor
| References |
|---|
|
|
|---|
Adams MD, Celniker SE, Holt RA, et al. (195 co-authors). (2000) The genome sequence of Drosophila melanogaster. Science 287:21852195.
Bernardi G, Olofsson B, Filipski J, Zerial M, Salinas J, Cuny G, Meunier-Rotival M, Rodier F. (1985) The mosaic genome of warm-blooded vertebrates. Science 228:953958.
Blanchette M, Kent WJ, Riemer C, et al. (12 co-authors). (2004) Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res 14:708715.
Clay O and Bernardi G. (2005) How not to search for isochores: a reply to Cohen et al. Mol Biol Evol 22:23152317.
Cohen N, Dagan T, Stone L, Graur D. (2005) GC composition of the human genome: in search of isochores. Mol Biol Evol 22:12601272.
Duret L, Eyre-Walker A, Galtier N. (2006) A new perspective on isochore evolution. Gene 385:7174.[CrossRef][ISI][Medline]
Eyre-Walker A and Hurst LD. (2001) The evolution of isochores. Nat Rev Genet 2:549555.[CrossRef][ISI][Medline]
Gaunt MW and Miles MA. (2002) An insect molecular clock dates the origin of the insects and accords with palaeontological and biogeographic landmarks. Mol Biol Evol 19:748761.
Holt RA, Subramanian GM, Halpern A, et al. (123 co-authors). (2002) The genome sequence of the malaria mosquito Anopheles gambiae. Science 298:129149.
Honeybee Genome Sequencing Consortium. (2006) Insights into social insects from the genome of the honeybee Apis mellifera. Nature 443:931949.[CrossRef][Medline]
Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. (2002) The human genome browser at UCSC. Genome Res 12:9961006.
Moriyama EN and Hartl DL. (1993) Codon usage bias and base composition of nuclear genes in Drosophila. Genetics 134:847858.[Abstract]
R Development Core Team. (2006) R: A Language and Environment for Statistical Computing [Internet] R Foundation for Statistical Computing, Vienna, Austria Available from: http://www.R-project.org.
Richards S, Liu Y, Bettencourt BR, et al. (52 co-authors). (2005) Comparative genome sequencing of Drosophila pseudoobscura: chromosomal, gene, and cis-element evolution. Genome Res 15:118.
Savard J, Tautz D, Richards S, Weinstock GM, Gibbs RA, Werren JH, Tettelin H, Lercher MJ. (2006) Phylogenomic analysis reveals bees and wasps (Hymenoptera) at the base of the radiation of Holometabolous insects. Genome Res 16:13341338.
Shields DC, Sharp PM, Higgins DG, Wright F. (1988) "Silent" sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol Biol Evol 5:704716.[Abstract]
Xia Q, Zhou Z, Lu C, et al. (83 co-authors). (2004) A draft sequence for the genome of the domesticated silkworm (Bombyx mori). Science 306:19371940.
Zhang Z, Schwartz S, Wagner L, Miller W. (2000) A greedy algorithm for aligning DNA sequences. J Comput Biol 7:203214.[CrossRef][ISI][Medline]
![]()
CiteULike
Connotea
Del.icio.us What's this?
This article has been cited by other articles:
![]() |
A. Zayed and C. W. Whitfield A genome-wide signature of positive selection in ancient and recent invasive expansions of the honey bee Apis mellifera PNAS, March 4, 2008; 105(9): 3421 - 3426. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||





