MBE Advance Access originally published online on March 22, 2007
Molecular Biology and Evolution 2007 24(6):1340-1346; doi:10.1093/molbev/msm055

© The Author 2007. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Articles

Exceptionally High Density of NUMTs in the Honeybee Genome

Pekka Pamilo, Lumi Viljakainen and Anu Vihavainen

Department of Biology, University of Oulu, Oulu, Finland

E-mail: pekka.pamilo{at}oulu.fi.


    Abstract
 
The available genome sequences of 4 insects (the fruit fly, the African malaria mosquito, the flour beetle, and the honeybee) are used to compare the amount of mitochondrial DNA transferred to the nuclear genome (NUMTs). The data from the beetle and the bee show frequent transfer of mitochondrial DNA, whereas NUMTs are rare in the 2 other insects. The density of NUMTs in the honeybee (>1.0 bp of transferred DNA per 1 kb of nuclear sequence) is the highest found in any animal studied, about ten times higher than in humans and comparable to the densities in plant genomes. The density of NUMTs in the beetle (0.056 bp/kb) is of the same order of magnitude as that in humans. The analysis of the honeybee genome indicates that NUMTs originate from all parts of the mitochondrial genome, that about two-thirds of the nuclear copies result from secondary transpositions within the nuclear genome, that the copies are significantly associated with "mariner" type transposons, and that the NUMTs consist mainly of short and fragmented copies.

Key Words: Apis • Tribolium • honeybee • flour beetle • mitochondrial DNA • mariner transposon • NUMT


    Introduction
 
It is well established that organelle sequences have moved to nuclear genomes in various eukaryotic organisms during their evolutionary history (Adams, Daley, et al. 2000; Bensasson et al. 2000; Henze and Martin 2001). Such transfer has occurred from both mitochondrial and plastid genomes and is likely to be an ongoing process (Woischnik and Moraes 2002; Ricchetti et al. 2004). A meta-analysis of the existing data suggests that the amount of recent DNA transfer from mitochondria to the chromosomes varies considerably among animal species (Richly and Leister 2004). Human chromosomes contain a large number of segments that most likely originated from the mitochondrial genome (NUMTs, or nuclear copies of mitochondrial origin). Transfers that can be inferred from high similarity in a Blast search sum to close to 280 kb in the human nuclear genome (Richly and Leister 2004). The amount transferred in other animals is clearly smaller, close to 50 kb in the mouse and less still in the others, whereas plant nuclear genomes have been shown to be rich in NUMT sequences (Richly and Leister 2004). Human NUMTs typically come from all parts of the mitochondrial genome and consist of short sequences with many structural changes, being both inverted and interrupted by additional insertions (Leister 2005).

Insect chromosomes have so far been shown to contain very few or no copies of mitochondrial genes. The nuclear genome of the malaria mosquito (Anopheles gambiae) has none, and that of Drosophila melanogaster has only a few short inserts of apparent mitochondrial origin (Richly and Leister 2004). In this respect, insects seem to resemble the chicken and fish, which are also reported to have few or no NUMTs (Pereira and Baker 2004; Venkatesh et al. 2006).

The insect data mentioned above come from 2 dipteran species whose complete genomes have been sequenced, and the paucity of NUMTs may be specific to the order Diptera rather than to insects in general. In fact, there are early reports of frequent NUMTs in grasshoppers (Bensasson et al. 2001) and aphids (Sunnucks and Hales 1996), whereas Pereira and Baker (2004) and Leister (2005) state that the nuclear genome of the honeybee lacks mitochondrial copies. Here we extend the analysis to other insect orders to test whether the low number of NUMTs is a general pattern in insects. The genome sequences of the honeybee (Apis mellifera) and the flour beetle (Tribolium castaneum) allow such an analysis and reveal an unexpectedly high density of NUMT sequences in the honeybee. This agrees with the preliminary result of Kaplan and Linial (2006).


    Materials and Methods
 
The existence of NUMTs was examined in 4 insect species: the fruit fly D. melanogaster, the African malaria mosquito A. gambiae, the honeybee A. mellifera, and the red flour beetle T. castaneum. The first 2 species have been analyzed earlier, but we repeated the analyses to check that the previous results are comparable with our approach. The mitochondrial sequences were taken from GenBank (accession numbers NC_001709 for Drosophila, NC_002084 for Anopheles, NC_001566 for Apis, and NC_003081 for Tribolium). These sequences were used to search the nuclear genomes with BlastN at the National Center for Biotechnology Information, using the following releases: Amel_4.0 for the bee (The Honeybee Genome Sequencing Consortium 2006), Tcas_2.0 for the beetle (available from the Baylor College of Medicine Human Genome Sequencing Center), build 4.3 for the fly (see Adams, Celniker, et al. 2000), and 2.2 for the mosquito (see Holt et al. 2002). The whole mtDNA sequence was used as the query, but because there were very many hits in the honeybee, the search in that species was redone with partial mtDNA sequences, searching separately for NUMTs of each of the 13 protein-coding genes of the mitochondrial genome. The low-complexity filter was not used in the Blast searches because it would have removed large parts of the query sequences, leaving them fragmented. Instead, we corrected for biased nucleotide frequencies by adjusting the critical E value in the searches of the honeybee genome.

NUMTs were inferred from Blast hits with an expected value E < 10⁻⁴, the criterion used earlier, for example, by Richly and Leister (2004). There is a risk of false positives, particularly when nucleotide frequencies are biased (Antunes and Ramos 2005; Venkatesh et al. 2006). Because the mitochondrial genome, particularly in the honeybee, has biased (AT-rich) nucleotide frequencies, the similarity criterion was adjusted accordingly to avoid classifying short AT-rich sequences as NUMTs (see Results). We also repeated the Blast searches with the low-complexity filter switched on. This reduced the total length of the query sequences by about 50%, but the results for the remaining 50% were practically the same as those reported here. Each hit aligned by Blast was counted as a NUMT. We separately tried to estimate whether closely located NUMTs could be taken as a single transfer that had become fragmented later. Because the Blast search was done for each coding gene separately, we particularly tried to identify nuclear transfers that covered neighboring genes. For this purpose, a small number of genome regions with putative long transfers were aligned with the mitochondrial sequences using the DNA Block Aligner (DBA; Jareborg et al. 1999) available at EMBL.
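The hit-filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the Blast results were saved in the standard 12-column tabular format of command-line BLAST+ (`-outfmt 6`), whereas the searches here were run through the NCBI service. The `Hit` record and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List

E_CUTOFF = 1e-4  # criterion used in the text: E < 10^-4

@dataclass
class Hit:
    query: str    # mitochondrial gene used as the query
    subject: str  # nuclear contig or linkage group
    length: int   # alignment length (bp)
    s_start: int  # start of the hit on the nuclear sequence
    s_end: int    # end of the hit on the nuclear sequence
    evalue: float

def parse_outfmt6(lines: Iterable[str]) -> Iterable[Hit]:
    """Yield Hit records from the default 12-column tabular output.

    Columns: query id, subject id, % identity, alignment length, mismatches,
    gap opens, q.start, q.end, s.start, s.end, evalue, bit score.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        f = line.rstrip("\n").split("\t")
        yield Hit(f[0], f[1], int(f[3]), int(f[8]), int(f[9]), float(f[10]))

def putative_numts(lines: Iterable[str], cutoff: float = E_CUTOFF) -> List[Hit]:
    """Keep hits below the E-value cutoff; each retained alignment counts as one NUMT."""
    return [h for h in parse_outfmt6(lines) if h.evalue < cutoff]

if __name__ == "__main__":
    # Two invented rows: only the first passes the cutoff.
    rows = [
        "\t".join(["COX1", "LG4", "85.2", "300", "40", "2", "1", "300", "1000", "1299", "1e-30", "250"]),
        "\t".join(["ND6", "LG1", "90.0", "25", "2", "0", "1", "25", "50", "74", "0.02", "30.1"]),
    ]
    for hit in putative_numts(rows):
        print(hit.query, hit.subject, hit.length, hit.evalue)
```

Merging closely spaced hits into single putative transfers would be a separate step applied to the retained hits.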

Phylogenetic analyses of the NUMT sequences were carried out with the Neighbor-Joining method (Saitou and Nei 1987) using MEGA3 (Kumar et al. 2004). Pairwise distances were estimated as Jukes–Cantor distances, and support for the clusters was estimated by bootstrapping with 250 replicates. As an outgroup, we used the mitochondrial sequence of the stingless bee Melipona bicolor (NC_004529).
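For reference, the Jukes–Cantor correction used for the pairwise distances is d = −(3/4) ln(1 − 4p/3), where p is the proportion of differing sites. The sketch below computes it for pairs of aligned sequences; it is not MEGA3's implementation, and dropping gapped sites pair by pair is our assumption. A Neighbor-Joining tree and its bootstrap replicates would then be built from such a distance matrix.

```python
import math
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are skipped (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def jukes_cantor(a: str, b: str) -> float:
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3)."""
    p = p_distance(a, b)
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:
        return math.inf  # saturated: p >= 0.75, correction undefined
    return -0.75 * math.log(arg)

def distance_matrix(seqs: dict) -> dict:
    """Pairwise JC distances for every unordered pair of named sequences."""
    return {(i, j): jukes_cantor(seqs[i], seqs[j]) for i, j in combinations(seqs, 2)}
```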

Genomic sequences related to the mariner transposon Ammar1 were searched for using the complete 1,409-bp sequence of the element (GenBank AY154751; Lampe et al. 2003) as the query.


    Results
 
NUMTs in Insects other than the Honeybee
In addition to the honeybee, the insect species included in our analyses were A. gambiae, D. melanogaster, and T. castaneum. Our results for the 2 dipteran species were practically identical to the earlier reports. No NUMTs were detected by a Blast search in the mosquito genome, whereas the fly had short inserts in 5 chromosomal locations, summing to 777 bp (table 1). These copies included sequences from the protein-coding regions and tRNA genes of the fly mitochondrial genome.


 
Table 1 The Amount of mtDNA Transferred to the Nuclear Genome in Insects and Selected Other Taxa

 
The number of NUMTs in the genome of the flour beetle clearly exceeds that in the dipterans. Counting only the Blast hits that were mapped to a defined linkage group (chromosome), there were a total of 91 separate Blast hits summing to 8,821 bp (table 1). Some of the hits were clustered close to each other, sometimes separated by only a few nucleotides, and most likely originated from a single transfer from the mitochondrial genome. Under this interpretation, the nuclear copies represent 57 insertions (table 1). A large fraction (about one half) of the NUMTs originated from the mitochondrial genes ND4 and ND5. NUMTs were detected on all 9 autosome pairs and on the X chromosome of T. castaneum. Chromosomes 3, 5, 8, and 9 each had over 1,000 bp of NUMTs, and these are also the largest of the beetle chromosomes.

Frequency and Size of NUMTs in the Honeybee
The Blast search of the honeybee genome with the complete mitochondrial genome as the query yielded >25,000 hits, so the search was redone with the 13 protein-coding mitochondrial genes as separate queries. These searches revealed over 2,000 nuclear copies when counting only hits mapped to one of the honeybee linkage groups. Using the limiting value E = 10⁻⁴ and taking the alignments produced by BlastN gave a total of 2,050 nuclear copies, summing to 275,022 bp (table 1). The mean length of the hits was 134 bp, with a maximum of 926 bp (table 2). Many of the nuclear copies were in small pieces, interrupted by other insertions, and closely located copies therefore most likely resulted from a single copying event.


 
Table 2 The Size Distribution of Honeybee NUMTs Detected by the Blast Search (initial) and after Correcting for Biased Nucleotide Frequencies (corrected)

 
The mitochondrial genome of the honeybee is known to be very AT rich (Crozier RH and Crozier YC 1993). The AT content of the coding sequences of individual genes ranges from 76% in COI to 89% in ATP8, and the overall AT frequency in the coding sequences is 83.3%. The probability of finding a highly similar sequence in the database by chance increases when nucleotide frequencies are strongly biased. Assuming that the nucleotide bias increases the probability of identity by 50% per nucleotide, the critical E value changes by many orders of magnitude for long sequences, the change over n nucleotides being a factor of (1.5)ⁿ. A 50% increase per nucleotide corresponds roughly to a shift from equal nucleotide frequencies to 84% AT. We used this factor to adjust the critical E value to 10⁻⁴/(1.5)ⁿ. This change mainly removed some short sequences as possible chance similarities due to biased nucleotide frequencies. The number of nuclear copies dropped to 1,619, with a mean length of 147 bp and a total length of 237,325 bp (table 1 and supplementary table S1, Supplementary Material online). There was no dramatic change in the overall distribution of NUMT lengths (table 2). The hits were found in 575 separate chromosomal locations, a location being defined as a cluster of copies with intercopy distances of less than 10 kb.
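The arithmetic behind the correction factor can be checked directly. If A and T are equally frequent (and likewise G and C), the chance that two random sites match is the sum of the squared base frequencies: 0.25 for equal frequencies and about 0.366 at 84% AT, a ratio of about 1.46, close to the 1.5 assumed above. The sketch below reproduces that check and the adjusted cutoff 10⁻⁴/(1.5)ⁿ. The symmetric A = T, G = C split is our simplifying assumption, and taking n as the alignment length is our reading of the text.

```python
def identity_prob(at: float) -> float:
    """Probability that two random sites are identical.

    Assumes A = T = at/2 and G = C = (1 - at)/2 (symmetric base composition).
    """
    a = at / 2.0
    g = (1.0 - at) / 2.0
    return 2 * a * a + 2 * g * g

def bias_factor(at: float) -> float:
    """Per-nucleotide increase in match probability relative to equal base frequencies."""
    return identity_prob(at) / identity_prob(0.5)

def adjusted_cutoff(n: int, base: float = 1e-4, factor: float = 1.5) -> float:
    """Critical E value for an alignment of n nucleotides: base / factor**n."""
    return base / factor ** n

if __name__ == "__main__":
    print(round(bias_factor(0.84), 3))  # 1.462
    for n in (50, 100, 150):
        print(n, f"{adjusted_cutoff(n):.2e}")
```

Because the cutoff shrinks geometrically with n, the correction mainly penalizes alignments whose E value is only marginally below 10⁻⁴, which fits the observation that it removed mostly short hits.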

Origin of NUMTs in the Honeybee
The honeybee NUMTs originated from all 10,404 bp of the protein-coding genes of the honeybee mitochondrial genome (supplementary table S1, Supplementary Material online). All parts of the source genes are represented by multiple copies in the nuclear genome, with copy numbers ranging from 4 to 54 for the 50-bp windows used in the analysis (fig. 1). The number of hits and the total amount of DNA transferred to the nuclear genome increased with the length of the coding sequence (fig. 2). Gene length was correlated with the number of hits derived from the gene (r = 0.93, degrees of freedom [df] = 11, P < 0.01), with the average length of the hits (r = 0.94, df = 11, P < 0.01), and with the total amount of DNA transferred (r = 0.94, df = 11, P < 0.01).
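The reported significance levels are consistent with the standard t test for a Pearson correlation, t = r√(df/(1 − r²)) with df = n − 2 (13 genes give df = 11). The sketch below uses only the r and df values quoted above and a tabulated critical value; it illustrates the calculation and is not necessarily how the authors obtained their P values.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_from_r(r: float, df: int) -> float:
    """t statistic for H0: rho = 0, with df = n - 2."""
    return r * math.sqrt(df / (1.0 - r * r))

# Two-tailed 1% critical value of Student's t with 11 df (from standard tables).
T_CRIT_1PCT_DF11 = 3.106

if __name__ == "__main__":
    t = t_from_r(0.93, 11)
    print(f"t = {t:.2f}, significant at 1%: {t > T_CRIT_1PCT_DF11}")
```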


Figure 1
 
FIG. 1.— The number of nuclear copies originating from separate sites of the honeybee mitochondrial genes. The panel shows the copy number distribution for all the protein-coding sequences, using 50-bp intervals (the total length being 10,404 bp). The boundaries of the genes are shown by gaps.

 

Figure 2
 
FIG. 2.— The total amount of mitochondrial DNA copied to the honeybee chromosomes, separately for each protein-coding gene plotted according to the length of the gene.

 
Many of the nuclear copies were short, which made reliable phylogenetic analysis difficult. We therefore restricted the phylogenetic analyses to 13 segments in which at least 6 NUMTs longer than 150 bp could be aligned (the alignments covered 164–311 bp, although short alignment gaps slightly reduced the length of the comparable sequences). The power to detect a clear phylogenetic signal was weak for many branching points (low bootstrap values), but the results suggested that many of the nuclear copies originated from a single initial transfer from the mitochondria to the nuclear genome, followed by secondary copying among the chromosomes. The trees were rooted with the sequence of the stingless bee Melipona. The results from the 13 Neighbor-Joining trees were summarized by counting how many times the NUMTs had been derived from the mitochondrial sequence. In total, 113 NUMT sequences were included in the phylogenetic analyses, and the trees suggested 41 independent transfers (36%) from the mitochondrial to the nuclear genome.

Location of NUMTs in the Honeybee Chromosomes
Nuclear inserts were found in all 16 linkage groups of the honeybee. The amount of nuclear copies (in base pairs) was correlated with chromosome size (r = 0.60, df = 14, P < 0.05; supplementary fig. S1, Supplementary Material online). When each chromosome was divided into 10 equally long segments and the number of inserts in each segment was summed over all chromosomes, NUMTs tended to be more frequent close to the chromosome ends (fig. 3; compared with a uniform distribution, χ² = 55.4, df = 9, P < 0.001).
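The positional test can be sketched as follows: each insert is assigned to one of 10 equal segments of its chromosome, counts are pooled over chromosomes, and the pooled counts are compared with a uniform expectation by a χ² goodness-of-fit statistic with 10 − 1 = 9 df. The input format (insert position and chromosome length in base pairs) and the function names are hypothetical, and the critical value is the standard tabulated 0.1% point for 9 df.

```python
from typing import Iterable, List, Tuple

# Upper 0.1% point of the chi-square distribution with 9 df (standard tables).
CHI2_CRIT_0_1PCT_DF9 = 27.877

def segment_counts(inserts: Iterable[Tuple[int, int]], n_bins: int = 10) -> List[int]:
    """Pool insert counts per chromosome segment.

    inserts: (position_bp, chromosome_length_bp) pairs, position in [0, length].
    """
    counts = [0] * n_bins
    for pos, length in inserts:
        k = min(int(n_bins * pos / length), n_bins - 1)  # pos == length -> last bin
        counts[k] += 1
    return counts

def chi2_uniform(counts: List[int]) -> float:
    """Chi-square statistic against equal expected counts in every bin."""
    expected = sum(counts) / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts)
```

With the statistic reported above, 55.4 is about twice the 0.1% critical value, in line with P < 0.001.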


Figure 3
 
FIG. 3.— Distribution of NUMTs along the honeybee chromosomes. The categories of location are obtained by dividing each chromosome into 10 equally long segments.

 
Human nuclear inserts are found preferentially in AT-rich regions of the genome (Leister 2005). In the honeybee, we restricted the analysis to 147 NUMTs with long mitochondrial inserts and measured the AT content within a 100-kb sequence around each insert (50 kb on both sides). The AT content of these chromosomal segments ranged from 0.54 to 0.77, with a mean of 0.648 and a standard deviation (SD) of 0.0465. For comparison, we selected five 100-kb stretches from each chromosome, located at least 50 kb from the nearest NUMT. The AT content of these 80 sequences had a mean of 0.647 and an SD of 0.0564. The mean AT content was thus the same, but the segments containing NUMTs had a significantly smaller variance of AT content (variance ratio F = 1.474, n1 = 147, n2 = 80, P < 0.05). The mean AT content remained the same when the window around the NUMTs was reduced from 100 kb to 20 kb.
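The variance ratio can be recomputed from the two reported SDs as F = s²(larger)/s²(smaller). With the rounded SDs quoted above this gives about 1.47, consistent with the reported 1.474 up to rounding, with 79 numerator df (comparison segments) and 146 denominator df (NUMT segments). A P value would then come from the upper tail of the F distribution, for example with `scipy.stats.f.sf`; that step is omitted here so that the sketch needs only the standard library.

```python
def variance_ratio(sd_1: float, n_1: int, sd_2: float, n_2: int):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    v1, v2 = sd_1 ** 2, sd_2 ** 2
    if v1 >= v2:
        return v1 / v2, n_1 - 1, n_2 - 1
    return v2 / v1, n_2 - 1, n_1 - 1

if __name__ == "__main__":
    # NUMT-containing segments vs. comparison segments, values from the text.
    F, df_num, df_den = variance_ratio(0.0465, 147, 0.0564, 80)
    print(f"F = {F:.3f} with ({df_num}, {df_den}) df")
```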

We checked whether the nuclear copies might be associated with transposon sequences known from the honeybee. A Blast search with the sequence of the mariner type transposon Ammar1 as the query yielded 179 hits when counting only those longer than 100 bp and mapped to a defined chromosome. Twenty-two of them were located within 4 kb of a NUMT, and they were associated with nuclear copies of 10 different genes. The 8-kb regions around the NUMTs (4 kb on both sides) cover 1.9% of the nuclear genome (575 × 8 kb = 4.6 Mb), so the random expectation is 3.5 transposon-related sequences in these regions. The observed number (22) greatly exceeded this expectation.
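The random expectation above is simple arithmetic: the fraction of the genome covered by the windows times the number of Ammar1 hits. The sketch below reproduces it, taking the 238-Mb genome size quoted in the Discussion as the denominator (our assumption about which size was used), and adds a Poisson upper-tail probability for the observed 22 hits. The Poisson model, which treats hits as independent and ignores overlaps between windows, is our illustration and not a test reported in the text.

```python
import math

def window_expectation(n_hits: int, n_windows: int, window_bp: int, genome_bp: int):
    """Fraction of the genome inside the windows and the expected number of hits there."""
    covered = n_windows * window_bp / genome_bp
    return covered, n_hits * covered

def poisson_upper_tail(k: int, mu: float, terms: int = 200) -> float:
    """P(X >= k) for X ~ Poisson(mu), summed directly to avoid 1 - cdf cancellation."""
    return sum(
        math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1))
        for i in range(k, k + terms)
    )

if __name__ == "__main__":
    covered, expected = window_expectation(179, 575, 8_000, 238_000_000)
    print(f"covered = {covered:.1%}, expected hits = {expected:.1f}")
    print(f"P(X >= 22) = {poisson_upper_tail(22, expected):.1e}")
```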

We also checked for a possible association between NUMTs and protein-coding genes, as such an association has been postulated in the human genome (Woischnik and Moraes 2002; Ricchetti et al. 2004). In the honeybee, NUMTs were found in 244 contigs with a total length of 110 Mb. We examined the predicted protein-coding genes in these same contigs: there were a total of 3,595 predicted genes with 26,595 exons. After removing the predicted exons, introns made up 36% of the remaining sequence and contained 35% of the NUMT sites. There was no significant association between NUMTs and exons: a random distribution of NUMTs would predict that 35 of them lie within 200 bp of the nearest exon, whereas the observed number was 33. Three of the detected NUMTs each completely covered a short predicted exon (27, 66, and 101 bp).
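Counting NUMTs that lie within 200 bp of the nearest exon is an interval-proximity query. A minimal sketch, assuming that exon coordinates on one contig are non-overlapping and that NUMTs and exons are given as closed (start, end) intervals; the function names are hypothetical. How the random expectation of 35 was derived is not detailed in the text, so only the observed count is sketched.

```python
from bisect import bisect_left
from typing import List, Tuple

Interval = Tuple[int, int]  # closed interval (start, end), start <= end

def gap(a: Interval, b: Interval) -> int:
    """Number of bases separating two closed intervals (0 if they overlap or abut)."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]) - 1)

def count_near_exons(numts: List[Interval], exons: List[Interval], max_gap: int = 200) -> int:
    """Count NUMTs lying closer than max_gap bp to the nearest exon on the same contig.

    Assumes exons do not overlap, so after sorting by start their ends are sorted too
    and the nearest exon is one of the two neighbours found by binary search.
    """
    exons = sorted(exons)
    starts = [s for s, _ in exons]
    near = 0
    for numt in numts:
        i = bisect_left(starts, numt[0])
        gaps = [gap(numt, exons[j]) for j in (i - 1, i) if 0 <= j < len(exons)]
        if gaps and min(gaps) < max_gap:
            near += 1
    return near
```

Binary search keeps each lookup at O(log m) for m exons, which matters with tens of thousands of predicted exons per contig set.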

Structural Changes
The nuclear inserts in the honeybee were mainly short and interrupted by other sequences, as seen from the overall length distribution of the NUMTs (table 2). When mapping the locations of the copies, we tried to determine whether long but interrupted inserts spanning several mitochondrial source genes could be detected. One of the longest inserts spanned a 3,335-bp region of the mitochondrial genome from COI to COIII (fig. 4). The DBA analysis showed that the sequence was scattered over a 25-kb region of linkage group 4. Several parts of the source sequence had been deleted or had changed so much that they could not be aligned, and many other sequences had been inserted. As mentioned above, NUMTs were detected in 575 chromosomal locations in the honeybee genome. In 111 cases, a location included fragments from at least 2 mitochondrial genes (2 genes in 93 cases, 3 genes in 15 cases, 4 genes twice, and 6 genes once). At least a quarter of these multigene inserts included genes that are not neighbors in the mitochondrial genome (supplementary table S2, Supplementary Material online).
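Grouping hits into chromosomal locations (a location being a cluster of copies separated by less than 10 kb, as defined in the Results) and then counting how many source genes each location contains can be sketched as a single pass over position-sorted hits. The tuple layout and gene labels are hypothetical, and the exact grouping procedure behind the counts above may differ in detail.

```python
from collections import Counter
from typing import Dict, List, Tuple

Hit = Tuple[str, int, int, str]  # (linkage group, start, end, source mitochondrial gene)

def cluster_locations(hits: List[Hit], max_gap: int = 10_000) -> List[Dict]:
    """Merge hits on the same linkage group whose spacing is below max_gap bp."""
    clusters: List[Dict] = []
    for chrom, start, end, gene in sorted(hits):  # sorts by linkage group, then start
        last = clusters[-1] if clusters else None
        if last is not None and last["chrom"] == chrom and start - last["end"] < max_gap:
            last["end"] = max(last["end"], end)  # extend the current location
            last["genes"].add(gene)
        else:
            clusters.append({"chrom": chrom, "start": start, "end": end, "genes": {gene}})
    return clusters

def genes_per_location(clusters: List[Dict]) -> Counter:
    """Tally locations by the number of distinct source genes they contain."""
    return Counter(len(c["genes"]) for c in clusters)
```

Comparing the distance to the running end of the cluster, rather than to the previous hit only, keeps a location intact when a long copy is followed by shorter ones nested inside it.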


Figure 4
 
FIG. 4.— An example of a putatively long honeybee NUMT in which a region of 3,335 bp of mitochondrial sequence is split into 17 regions (the lengths in base pairs shown in the middle row), totaling 1,249 bp that were recognized in the linkage group 4 by DBA. The other numbers show the lengths of the nonaligned intervening sequences in the mitochondrial genome (above) and in the linkage group 4 (below).

 
It was difficult to infer and align long transfers reliably because of the fragmentation that had occurred after the copies were transferred to the chromosomes. An example (fig. 5) shows how an incomplete copy of the Cytb gene has been fragmented and duplicated to form a cluster of 8 tandem repeats in either the same or the inverted orientation.


Figure 5
 
FIG. 5.— An example of the fragmentation and duplication of honeybee NUMTs. The figure shows a 10,788-bp long fragment from the linkage group 5, which includes 8 incomplete copies of the Cytb gene. The blocks show the NUMTs detected by Blast (drawn approximately in scale). The numbers show the lengths of the intervening sequences in the Cytb gene/in the chromosome. The numbers (in bold) at the beginning of each row show the length (in base pairs) of the intervening sequences in the chromosome. The nuclear sequences continue from the end of a row to the beginning of the next row, and a region where the mitochondrial sequences are in an inverted position is shown in gray.

 

    Discussion
 
Insect genomes have so far been shown to contain only a few NUMT sequences. The first thorough analysis came, with a strong taxonomic bias, from 2 dipteran species (Richly and Leister 2004), and the present results from the flour beetle Tribolium and the honeybee show larger amounts of transferred sequence. In particular, the honeybee has more NUMT sequence than reported for any animal species other than humans (Richly and Leister 2004). Plant genomes are known to contain many copies from both mitochondrial and chloroplast genomes, but the size and gene content of the plant mitochondrial genome are very different. Our findings contradict the view of Pereira and Baker (2004) and Leister (2005), who claimed, on the basis of an earlier version of the genome sequence, that the honeybee has no NUMTs. A recent preliminary finding by Kaplan and Linial (2006), however, agrees well with our results. If we take into account that the nuclear genome of the honeybee is less than 10% of the size of the human genome, and calculate NUMT density as base pairs of transferred sequence per kilobase pair of nuclear genome, the density of NUMTs in the honeybee is over ten times that in humans (table 1). The flour beetle has a NUMT density approaching that of humans, whereas the other insects resemble the chicken and fish (table 1). The total amount of NUMT sequence in the honeybee is larger than reported here, because we used only the protein-coding genes, that is, about 63% of the mitochondrial genome.
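The density comparison is a single division: base pairs of NUMT sequence per kilobase of nuclear genome. The sketch below uses the NUMT totals from the Results and the insect genome sizes quoted in the following paragraph. The human genome size of about 3,100 Mb is our assumption, and table 1 may use somewhat different denominators (for example, assembled rather than estimated genome size), so the values are approximate.

```python
def density_bp_per_kb(numt_bp: float, genome_mb: float) -> float:
    """NUMT density: transferred bp per kb of nuclear genome."""
    return numt_bp / (genome_mb * 1_000)

# (total NUMT bp, genome size in Mb); the human genome size is an assumed round value.
DATA = {
    "honeybee (corrected)": (237_325, 238),
    "flour beetle": (8_821, 158),
    "fruit fly": (777, 176),
    "human": (280_000, 3_100),
}

if __name__ == "__main__":
    d = {name: density_bp_per_kb(*v) for name, v in DATA.items()}
    for name, value in d.items():
        print(f"{name:22s} {value:.4f} bp/kb")
    print(f"honeybee / human = {d['honeybee (corrected)'] / d['human']:.1f}")
```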

There are reports from grasshoppers (Bensasson et al. 2000) and aphids (Sunnucks and Hales 1996) suggesting many nuclear copies of specific mitochondrial genes. The genomes of these species have not been sequenced, so it is not possible to examine the complete pattern of their NUMTs, but the copy numbers are as large as, or perhaps even larger than, those in the honeybee. Both grasshoppers and aphids are hemimetabolous insects, and at least in grasshoppers the genomes are known to be large. The reported genome sizes of Drosophila (176 Mb), the malaria mosquito (278 Mb), Tribolium (158 Mb), and the honeybee (238 Mb) are of the same order of magnitude, whereas those of grasshoppers are orders of magnitude larger, ranging from 5,950 to 20,600 Mb (Bensasson et al. 2001). Pons and Vogler (2005) studied the evolution of a pseudogene derived from a mitochondrial RNA gene in tiger beetles and suggested that NUMTs are rare in these beetles and possibly in holometabolous insects in general. It seems likely that genome size explains part of the taxonomic variation in the amount of NUMT sequence (Bensasson et al. 2001). Nevertheless, genome size does not explain the difference between the honeybee and the other holometabolous insects studied so far.

When comparing the amount and pattern of NUMTs among species, it must be remembered that the present study explored only nuclear copies of protein-coding mitochondrial genes. Although it was earlier thought that mitochondrial genes are transferred to the chromosomes via mRNA, there is now evidence that direct copying of double-stranded DNA is probably more important (Henze and Martin 2001). This is seen in the transfer of noncoding sequences and of long sequences spanning several genes, and in the absence of RNA-editing changes in the nuclear copies. We did not find many putatively long transfers in the honeybee, although some reliably spanned several coding genes (see fig. 4; supplementary table S2, Supplementary Material online). As the protein-coding genes (10,404 bp) cover only about 63% of the mitochondrial genome, the total amount of genetic material originating from the mitochondrial genome must be considerably larger than reported here. We have also made a preliminary check for NUMTs originating from other mitochondrial regions, and the overall patterns looked similar to those reported here for the protein-coding genes. However, highly AT-rich regions caused technical problems when we tried to distinguish real NUMTs from AT-rich nuclear sequences.

Several studies in other organisms have tried to determine whether the transfer of mitochondrial genes is an ongoing process and whether nuclear copies represent initial transfers or secondary duplications within the nuclear genome (e.g., Bensasson et al. 2000; Mourier et al. 2001; Woischnik and Moraes 2002; Hazkani-Covo et al. 2003). Phylogenetic analyses of nuclear pseudogenes have suggested many separate transfers from the mitochondrion (Bensasson et al. 2000), and some insertions may not have become fixed in a species but segregate as polymorphisms (Zischler 1995; Mourier et al. 2001). Estimates suggest that 32–85% of human NUMTs result from separate insertions (Hazkani-Covo et al. 2003; Leister 2005). It is evident that secondary duplications have been common in many genomes. The honeybee data clearly indicated both processes. The phylogenetic trees indicated several transfers, even though the fragmentation of the transferred segments made it difficult to align the long sequences required for well-supported trees. The trees suggested that the majority of the nuclear copies resulted from secondary duplications in the chromosomes. The proportion of initial transfers (41 out of 113) in the combined analysis of the 13 phylogenetic trees suggests that on average 36% of the insertions represent separate transfers from the mitochondrial genome, the rest being secondary copies. This fraction is very close to that estimated for human NUMTs by Hazkani-Covo et al. (2003). We do not know whether the same ratio applies to short NUMTs. The bootstrap support for the clusters in the phylogenetic trees was not always high, but clustering errors may equally well go in either direction (suggesting either a smaller or a larger number of independent transfers).

The analysis of a tiger beetle mitochondrial pseudogene indicates that deletions outnumber insertions in the evolution of NUMT sequences (Pons and Vogler 2005). Alignment of long blocks of honeybee NUMTs (figs. 4 and 5) likewise indicated that the segments that could not be aligned included many relatively short deletions and a few very long insertions.

The distribution and fragmentation of honeybee NUMTs resembled in many respects the pattern observed in humans (Leister 2005) and in plants, where nuclear copies of organelle DNA exist both as collinear sequences and as mosaics (Noutsos et al. 2005). The mean length of NUMTs has been reported to be 206 bp in humans, 104 bp in the rat, and 281 bp in the mouse (Richly and Leister 2004). Human NUMTs are often disrupted and inverted (Leister 2005), and this pattern was also clear in many honeybee copies (e.g., figs. 4 and 5). Such structural changes made it difficult to align long sequences, to identify old insertions, and to study the long-term evolution of inserted sequences. Human NUMTs are more common in AT-rich regions of the genome (Leister 2005). The chromosomal regions containing honeybee NUMTs, by contrast, had the same mean AT content as other parts of the genome, close to the mean of 67% AT for the whole genome (The Honeybee Genome Sequencing Consortium 2006). However, the AT content of regions with NUMTs had a smaller variance, suggesting that NUMTs are less common in regions with very low or very high AT content. Woischnik and Moraes (2002) found that human NUMTs avoid protein-coding genes, whereas Ricchetti et al. (2004) showed that newly inserted NUMTs in particular are commonly located in introns. Neither situation held in the honeybee: NUMTs seemed to be randomly located in introns and intergenic sequences and showed no association with exons.

The honeybee NUMTs were associated with transposons. The only transposons commonly found in the honeybee genome are of the mariner type, and many copies are incomplete and diverged (The Honeybee Genome Sequencing Consortium 2006). It is not clear whether the NUMTs have dispersed together with the transposon sequences, but they do occur in genomic regions that contain transposons or remnants of old transposon sequences. These associations strongly support the view that a large number of honeybee NUMTs have resulted from secondary duplications after the initial transfer from the mitochondrion.

The overall size of the honeybee genome does not differ dramatically from that of Drosophila, Anopheles, or Tribolium. Accordingly, the huge difference in the amount of NUMT sequence among these species cannot be explained by a general increase of genomic DNA in the honeybee. Another feature specific to the honeybee genome is an exceptionally high rate of recombination (Hunt and Page 1995; Solignac et al. 2004). The genomic recombination rate in the honeybee exceeds that in Drosophila, for example, by an order of magnitude (Beye et al. 2006; Sirviö et al. 2006). Could there be an association between the high recombination rate and the high frequency of NUMTs? Both phenomena require double-strand breaks (Ricchetti et al. 2004) and could indicate some type of genomic instability. Alternatively, both a high recombination rate and the duplication of mitochondrial inserts might be slightly harmful changes. They would then be selected against, but selection is inefficient against slightly harmful changes in small populations. The role of effective population size as an important determinant of genome evolution has recently been emphasized, for example, by Lynch and Conery (2003). Social insects such as the honeybee have a small ratio of effective population size to the number of living individuals (most of which are sterile workers), and slightly harmful features could therefore accumulate in their genomes at higher rates than in many other organisms (Sirviö et al. 2006). A hypothesis based on the same idea was recently presented to explain putatively high rates of nucleotide substitution in social insects (Bromham and Leys 2005), but a genome-wide comparison shows that the honeybee genome has evolved more slowly than those of the sequenced dipteran insects (The Honeybee Genome Sequencing Consortium 2006). Future comparative studies could shed light on these questions.


    Supplementary Material
 
Supplementary tables S1 and S2 and figure S1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Acknowledgements
 
We thank the reviewers for valuable comments and Matti Viljakainen for help in data mining. This work was supported by grants from the Academy of Finland (211489 and 214499).


    Footnotes
 
William Martin, Associate Editor


    References
 

    Adams KL, Daley DO, Qiu Y-L, Whelan J, Palmer JD. Repeated, recent and diverse transfers of a mitochondrial gene to the nucleus in flowering plants. Nature (2000) 408:354–357.

    Adams MD, Celniker SE, Holt RA, et al. (197 co-authors). The genome sequence of Drosophila melanogaster. Science (2000) 287:2185–2195.

    Antunes A, Ramos MJ. Discovery of a large number of previously unrecognized mitochondrial pseudogenes in fish genomes. Genomics (2005) 86:708–717.

    Bensasson D, Zhang D-X, Hartl DL, Hewitt GM. Mitochondrial pseudogenes: evolution's misplaced witnesses. Trends Ecol Evol (2001) 16:314–321.

    Bensasson D, Zhang DX, Hewitt GM. Frequent assimilation of mitochondrial DNA by grasshopper nuclear genomes. Mol Biol Evol (2000) 17:406–415.

    Beye M, Gattermeier I, Hasselmann M, et al. (15 co-authors). Exceptionally high levels of recombination across the honey bee genome. Genome Res (2006) 16:1339–1344.

    Bromham L, Leys R. Sociality and the rate of molecular evolution. Mol Biol Evol (2005) 22:1393–1402.

    Crozier RH, Crozier YC. The mitochondrial genome of the honeybee Apis mellifera: complete sequence and genome organization. Genetics (1993) 135:97–117.

    Hazkani-Covo E, Sorek R, Graur D. Evolutionary dynamics of large numts in the human genome: rarity of independent insertions and abundance of post-insertion duplications. J Mol Evol (2003) 56:169–174.

    Henze K, Martin W. How do mitochondrial genes get into the nucleus? Trends Genet (2001) 17:383–387.

    Holt RA, Subramanian GM, Halpern A, et al. (123 co-authors). The genome sequence of the malaria mosquito Anopheles gambiae. Science (2002) 298:129–149.

    Honeybee Genome Sequencing Consortium. Insights into social insects from the genome of the honeybee Apis mellifera. Nature (2006) 443:931–949.

    Hunt G, Page RE Jr. Linkage map of the honeybee, Apis mellifera, based on RAPD markers. Genetics (1995) 139:1371–1382.

    Jareborg N, Birney E, Durbin R. Comparative analysis of noncoding regions of 77 orthologous mouse and human gene pairs. Genome Res (1999) 9:815–824.

    Kaplan N, Linial M. ProtoBee: hierarchical classification and annotation of the honey bee proteome. Genome Res (2006) 16:1431–1438.

    Kumar S, Tamura K, Nei M. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform (2004) 5:150–163.

    Lampe DJ, Witherspoon DJ, Soto-Adames FN, Robertson HM. Recent horizontal transfer of mellifera subfamily mariner transposons into insect lineages representing four different orders shows that selection acts only during horizontal transfer. Mol Biol Evol (2003) 20:554–562.

    Leister D. Origin, evolution and genetic effects of nuclear insertions of organelle DNA. Trends Genet (2005) 21:655–663.

    Lynch M, Conery JS. The origins of genome complexity. Science (2003) 302:1401–1404.

    Mourier T, Hansen AJ, Willerslev E, Arctander P. The human genome project reveals a continuous transfer of large mitochondrial fragments to the nucleus. Mol Biol Evol (2001) 18:1833–1837.

    Noutsos C, Richly E, Leister D. Generation and evolutionary fate of insertions of organelle DNA in the nuclear genomes of flowering plants. Genome Res (2005) 15:616–628.

    Pereira SL, Baker AJ. Low number of mitochondrial pseudogenes in the chicken (Gallus gallus) nuclear genome: implications for molecular inference of population history and phylogenetics. BMC Evol Biol (2004) 4:17.

    Pons J, Vogler AP. Complex pattern of coalescence and fast evolution of a mitochondrial rRNA pseudogene in a recent radiation of tiger beetles. Mol Biol Evol (2005) 22:991–1000.

    Ricchetti M, Tekaia F, Dujon B. Continued colonization of the human genome by mitochondrial DNA. PLoS Biol (2004) 2:1313–1324.

    Richly E, Leister D. NUMTs in sequenced eukaryotic genomes. Mol Biol Evol (2004) 21:1081–1084.

    Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol (1987) 4:406–425.

    Sirviö A, Gadau J, Rueppell O, Lamatsch D, Boomsma JJ, Pamilo P, Page RE Jr. High recombination frequency creates genotypic diversity in colonies of the leaf-cutting ant Acromyrmex echinatior. J Evol Biol (2006) 19:1475–1485.

    Solignac M, Vautrin D, Baudry E, Mougel F, Loiseau A, Cornuet JM. A microsatellite-based linkage map of the honeybee, Apis mellifera L. Genetics (2004) 167:253–262.

    Sunnucks P, Hales DF. Numerous transposed sequences of mitochondrial cytochrome oxidase I–II in aphids of the genus Sitobion (Hemiptera: Aphididae). Mol Biol Evol (1996) 13:510–524.

    Venkatesh B, Dandona N, Brenner S. Fugu genome does not contain mitochondrial pseudogenes. Genomics (2006) 87:307–310.

    Woischnik M, Moraes CT. Pattern of organization of human mitochondrial pseudogenes in the nuclear genome. Genome Res (2002) 12:885–893.

    Zischler H, Geisert H, von Haeseler A, Pääbo S. A nuclear "fossil" of the mitochondrial D-loop and the origin of modern humans. Nature (1995) 378:489–492.

Accepted for publication March 15, 2007.

