MBE Advance Access originally published online on April 17, 2006
Molecular Biology and Evolution 2006 23(7):1370-1385; doi:10.1093/molbev/msk023

© The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Research Article

Analysis of Distribution Indicates Diverse Functions of Simple Sequence Repeats in Mycoplasma Genomes

Jan Mrázek

Department of Microbiology and Institute of Bioinformatics, University of Georgia

E-mail: mrazek{at}uga.edu.


    Abstract
Simple sequence repeats (SSRs) composed of extensive tandem iterations of a single nucleotide or a short oligonucleotide are rare in most bacterial genomes, but they are common among Mycoplasma. Some of these repeats act as contingency loci in association with families of surface antigens. By contraction or expansion during replication, these SSRs increase genetic variance of the population and facilitate avoidance of the immune response of the host. Occurrence and distribution of SSRs are analyzed in complete genomes of 11 Mycoplasma and 3 related Mollicutes in order to gain insights into functional and evolutionary diversity of the SSRs in Mycoplasma. The results revealed an unexpected variety of SSRs with respect to their distribution and composition and suggest that it is unlikely that all SSRs function as contingency loci or recombination hot spots. Various types of SSRs are most abundant in Mycoplasma hyopneumoniae, whereas Mycoplasma penetrans, Mycoplasma mobile, and Mycoplasma synoviae do not contain unusually long SSRs. Mycoplasma hyopneumoniae and Mycoplasma pulmonis feature abundant short adenine and thymine runs periodically spaced at 11 and 12 bp, respectively, which likely affect the supercoiling propensities of the DNA molecule. Physiological roles of long adenine and thymine runs in M. hyopneumoniae appear independent of location upstream or downstream of genes, unlike contingency loci that are typically located in protein-coding regions or upstream regulatory regions. Comparisons among 3 M. hyopneumoniae strains suggest that the adenine and thymine runs are rarely involved in genome rearrangements. The results indicate that the SSRs in the Mycoplasma genomes play diverse roles, including modulating gene expression as contingency loci, facilitating genome rearrangements via recombination, affecting protein structure and possibly protein–protein interactions, and contributing to the organization of the DNA molecule in the cell.

Key Words: tandem repeats • contingency loci • genome evolution • statistical analysis • DNA sequence periodicity • Mycoplasma hyopneumoniae


    Introduction
The genus Mycoplasma includes many pathogens of humans and other vertebrates and is interesting from an evolutionary viewpoint with respect to adaptation to a parasitic lifestyle and different modes of interaction with host cells. Mycoplasmas are characterized by vastly reduced genomes, among the smallest of any free-living organism (Fraser et al. 1995). Furthermore, mycoplasmas are diverse in terms of host environment, phenotypic features, and genomic characteristics. They possess a reduced number of DNA repair proteins and exhibit high mutation rates, which contributes to the accelerated evolution within the genus (Rocha and Blanchard 2002). Owing to wide interest from both medical and evolutionary perspectives, several sequencing projects have been directed toward Mycoplasma. Currently available complete genomes include Mycoplasma gallisepticum strain R (Papazisi et al. 2003), Mycoplasma genitalium strain G-37 (Fraser et al. 1995), Mycoplasma hyopneumoniae strains 232 (Minion et al. 2004), 7448, and J (Vasconcelos et al. 2005), Mycoplasma mobile strain 163K (Jaffe et al. 2004), Mycoplasma mycoides strain PG1 (Westberg et al. 2004), Mycoplasma penetrans strain HF-2 (Sasaki et al. 2002), Mycoplasma pneumoniae strain M129 (Himmelreich et al. 1996; Dandekar et al. 2000), Mycoplasma pulmonis strain UAB CTIP (Chambaud et al. 2001), and Mycoplasma synoviae strain 53 (Vasconcelos et al. 2005). Ureaplasma parvum (previously urealyticum) strain ATCC 700970 (Glass et al. 2000) is closely related to Mycoplasma and is considered a member of the M. pneumoniae group based on phylogenetic comparisons (Pettersson et al. 1996). Also completely sequenced are 2 less closely related Mollicutes, Mesoplasma florum strain L1 (Mesoplasma florum sequencing project, Broad Institute, www.broad.mit.edu) and onion yellows phytoplasma OY-M (Oshima et al. 2004).
This collection of phenotypically diverse yet phylogenetically related organisms represents an excellent opportunity for comparative analyses of genomic features and their correlation with phylogenetic, environmental, phenotypic, and cellular characteristics of the organisms.

Simple sequence repeats (SSRs) composed of tandem iterations of a single nucleotide or a short oligonucleotide are abundant in eukaryotic genomes (Mrázek and Kypr 1994; Behe 1998; Matula and Kypr 1999). By contrast, long SSRs are underrepresented in most prokaryotic genomes (Field and Wills 1998). Long SSRs are highly polymorphic in length and, when present, often function as contingency loci, affecting gene expression through changes in SSR length caused by DNA polymerase slippage and/or recombination (Moxon et al. 1994; Wassenaar et al. 2002). Contingency loci are generally located either in the protein-coding regions of genes, where a loss or addition of one or more repetitive units can result in a frameshift mutation, or upstream of genes, where they affect transcription initiation by modulating distances between protein-binding sites (Willems et al. 1990; Moxon et al. 1994; Hood et al. 1996; Karlin et al. 1996). Contingency loci can be particularly beneficial for obligate pathogens, in which they promote mutations that facilitate avoidance of the host immune response, and more generally for microbes living in rapidly changing environments, in which they promote adaptive mutations (Moxon et al. 1994). In pathogens, contingency loci often affect families of surface lipoproteins and other antigens that can be recognized by the host immune system. Perhaps the best-studied example among Mycoplasma is the pMGA (also referred to as VlhA) family of lipoproteins in M. gallisepticum. Different strains possess between 32 and 70 pMGA proteins, and most of the corresponding genes feature an SSR consisting of GAA iterations upstream (Baseggio et al. 1996; Papazisi et al. 2003). Expression of the pMGA genes is controlled by the length of the GAA repeats (Glew et al. 1998).
Some SSRs can also influence local DNA conformation, which can in turn play a role in regulation of physiological processes (Nordheim and Rich 1983; Htun and Dahlberg 1989; van Holde and Zlatanova 1994; Shafer and Smirnov 2000). In vivo formation and roles of noncanonical DNA conformations in prokaryotes have rarely been investigated, and some SSRs may participate in regulatory functions through local alterations in the structure or physical properties of the DNA molecule.

SSRs acting as contingency loci appear to be relatively common among Mycoplasma (Rocha and Blanchard 2002). The general distribution of SSRs has been investigated in some Mycoplasma genomes (Rocha and Blanchard 2002; Minion et al. 2004), and expansion of some specific SSRs has been linked to phenotypic variations (Liu et al. 2002; Simmons et al. 2004). A previous analysis of general repeats in 4 Mycoplasma genomes revealed a high potential for local sequence rearrangements by homologous recombination (Rocha and Blanchard 2002). The available evidence suggests that the Mycoplasma genomes are extraordinarily dynamic and that the SSRs significantly contribute to their genomic instability. The availability of a dozen complete Mycoplasma genomes makes it possible to systematically analyze and compare the distribution of SSRs among many different genomes for new insights into SSR function and evolution. Analyses of the distribution of sequence motifs or patterns with respect to genes and other sequence features, periodicity, or unusual spacings can often provide hints about the possible function of the motifs (Blaisdell et al. 1993; Cardon et al. 1993; Mrázek et al. 2002). Here I present a comprehensive comparative analysis of the distribution of SSRs among 11 Mycoplasma and 3 related Mollicutes genomes and discuss their evolutionary and functional implications.


    Methods
Sequences
Complete genomic sequences and annotations were downloaded from the National Center for Biotechnology Information ftp server at ftp://ftp.ncbi.nih.gov/genomes/Bacteria/.

Simple Sequence Repeats
Tandem repeats, microsatellites, or SSRs consist of tandem iterations of a short oligonucleotide in a DNA sequence. For the purpose of this work, the length of an SSR is measured in nucleotides (base pairs). Measuring the SSR length in nucleotides rather than in the number of repetitive units allows accounting for partial copies and facilitates comparisons among SSRs of different lengths as well as statistical evaluations of the SSR representations. Consider a sequence of nucleotides $x_i \in \{A, C, G, T\}$, where $1 \le i \le L$ and $L$ is the sequence length. An SSR of length $n$ composed of iterations of a $k$-mer starts at position $i$ if $x_j = x_{j+k}$ for all $j$ with $i \le j \le i + n - k - 1$, and simultaneously $x_{i-1} \neq x_{i-1+k}$ and $x_{i+n-k} \neq x_{i+n}$. This definition can be applied to all SSRs of length $n \ge k$. In addition, when counting SSRs in a sequence, repeats of a longer oligonucleotide that also qualify as repeats of a shorter oligonucleotide are counted only as the shorter oligonucleotide SSR. For example, the sequence ATATATATATAT qualifies as a dinucleotide SSR (k = 2) as well as a tetranucleotide SSR (k = 4), hexanucleotide SSR (k = 6), etc., but it is included only in the count of dinucleotide SSRs. Likewise, the sequence GCATCCATCCATCCT contains a tetranucleotide (k = 4) SSR of length 13 bp (n = 13; CATCCATCCATCC, positions 2 to 14), the sequence TACGACC contains a trinucleotide (k = 3) SSR of length 5 bp (n = 5; ACGAC), and the sequence CTA contains a mononucleotide (k = 1) SSR of length 1 bp (n = 1). The last example does not involve any repeat, but it is defined as an SSR of length 1 bp for consistency.
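The SSR definition above can be sketched in Python as follows. This is a minimal illustration, not the author's software: the function names `find_ssrs` and `smallest_period` are hypothetical, positions are 0-based rather than 1-based, the start and end conditions are assumed to be satisfied automatically at the sequence ends, and the rule that repeats are counted only under the shortest oligonucleotide is interpreted (as an assumption) as skipping any repeat whose sequence also has a period shorter than k.

```python
def smallest_period(s: str) -> int:
    """Smallest d >= 1 such that s[t] == s[t + d] for every valid t."""
    for d in range(1, len(s)):
        if all(s[t] == s[t + d] for t in range(len(s) - d)):
            return d
    return len(s)


def find_ssrs(seq: str, k: int) -> list[tuple[int, int]]:
    """Return (start, length) of period-k SSRs with length >= k (0-based starts).

    Start condition: x[i-1] != x[i-1+k] (assumed satisfied at the sequence start).
    Extension: the SSR grows while x[i+n-k] == x[i+n], so it ends where
    x[i+n-k] != x[i+n] or at the end of the sequence.
    """
    L = len(seq)
    ssrs = []
    for i in range(L - k + 1):  # an SSR of length >= k must fit in the sequence
        if i > 0 and seq[i - 1] == seq[i - 1 + k]:
            continue  # position i lies inside an SSR that started earlier
        n = k
        while i + n < L and seq[i + n - k] == seq[i + n]:
            n += 1
        # Shortest-unit rule (our interpretation): skip repeats with a shorter period.
        if smallest_period(seq[i:i + n]) == k:
            ssrs.append((i, n))
    return ssrs
```

With this sketch, `find_ssrs("GCATCCATCCATCCT", 4)` reports the 13-bp repeat starting at 0-based position 1, and `find_ssrs("ATATATATATAT", 4)` returns an empty list because that repeat is counted only as a dinucleotide SSR. A length histogram of the kind plotted in figure 1 could then be obtained with `collections.Counter(n for _, n in find_ssrs(genome, k))`.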

Assessments of Over- and Underrepresentations of SSR Counts
The observed SSR counts are compared with those in random sequences in order to evaluate whether SSRs of a given length are over- or underrepresented in genomic DNA sequences. However, random sequences can be generated by various models, and selecting an appropriate model is problematic. Real DNA sequences evolve in a complex way involving single point mutations, recombination, duplications or loss of genomic segments of various sizes, and acquisition of DNA segments from other organisms. Moreover, different segments of DNA are subject to selection at various levels, which increases the overall compositional heterogeneity of the genome. In particular, protein-coding regions exhibit biased codon usage, which is often especially strong in highly expressed genes (Sharp and Li 1986; Andersson and Kurland 1990; Sharp and Matassi 1994; Karlin et al. 1998; Karlin and Mrázek 2000) and gives rise to significant compositional differences both between protein-coding and noncoding regions and among different genes. In general, homogeneous random models do not properly reflect the compositional heterogeneity of native DNA sequences (Fickett et al. 1992). In the absence of a single appropriate stochastic model, multiple models are used that reproduce different characteristics of the original sequence (table 1). Homogeneous models (b, m1, m3, and m5) determine the Bernoulli nucleotide probabilities or Markov transition probabilities from the complete genomic sequence. These probabilities are then applied to generate a random sequence that reproduces the overall nucleotide or short-oligonucleotide composition of the original sequence.
Heterogeneous models (b-b, b-bp, m1-m1, m1-m1p, m1-c, m1-c1, and m3-m3p) break the original sequence into segments corresponding to individual protein-coding regions and intergenic regions, generate a random sequence for each segment using nucleotide or transition probabilities determined from the original segment, and finally reassemble the complete random DNA sequence from the randomized genes and intergenic segments. Thus, the heterogeneous models reproduce the compositional heterogeneity of the genomic sequence at the gene scale as well as compositional differences between protein-coding and noncoding DNA segments. Comparisons of SSR counts with different models allow for robust assessments of SSR over- and underrepresentations. The software to generate random sequences by the 11 stochastic models is available for download at www.cmbl.uga.edu.
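As a rough illustration of the difference between homogeneous and heterogeneous randomization, the sketch below implements a Bernoulli model (in the spirit of model b), a first-order Markov model (in the spirit of model m1), and a wrapper that applies either one to each gene or intergenic segment separately and concatenates the results, as described above for the heterogeneous models. The function names and the 0-based, half-open segment coordinates are assumptions; this is not the downloadable software mentioned above, and the exact definitions of the models in table 1 may differ.

```python
import random
from collections import Counter, defaultdict


def bernoulli_model(seq: str, length: int, rng: random.Random) -> str:
    """Draw nucleotides independently with the frequencies observed in seq."""
    comp = Counter(seq)
    nts = sorted(comp)
    return "".join(rng.choices(nts, weights=[comp[c] for c in nts], k=length))


def markov1_model(seq: str, length: int, rng: random.Random) -> str:
    """First-order Markov chain with transition frequencies observed in seq."""
    comp = Counter(seq)
    trans = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        trans[a][b] += 1
    nts = sorted(comp)
    out = rng.choices(nts, weights=[comp[c] for c in nts])  # first nucleotide
    while len(out) < length:
        # Fall back to the overall composition if the last nucleotide has no successor.
        nxt = trans[out[-1]] or comp
        keys = sorted(nxt)
        out.append(rng.choices(keys, weights=[nxt[c] for c in keys])[0])
    return "".join(out[:length])


def heterogeneous_model(seq: str, segments, model, rng: random.Random) -> str:
    """Randomize each (start, end) segment with its own statistics, then reassemble."""
    return "".join(model(seq[s:e], e - s, rng) for s, e in segments if e > s)
```

For example, `heterogeneous_model(genome, segments, bernoulli_model, random.Random(0))` would keep each gene's own base composition, whereas `bernoulli_model(genome, len(genome), rng)` would impose the genome-wide composition everywhere.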


Table 1 Stochastic Models Used to Generate Random Sequences

 
Estimates of the statistical significance of deviations of the observed SSR counts from expectations under various random models can be based on the Poisson distribution for SSRs of sufficient length, when expected counts are low, whereas the normal distribution is an appropriate approximation when the expected count is high (Karlin et al. 1996). Alternatively, assessments of over- or underrepresentation of SSRs can be based on comparisons among SSRs of different lengths. Under homogeneous models (models b, m1, m3, and m5; see table 1), SSR counts are expected to decrease exponentially with increasing length (fig. 1). Deviations from an exponential dependence can be interpreted as over- or underrepresentation of SSRs of a given type and length. Although the heterogeneous models can deviate from an exponential relationship, the deviations are generally not dramatic, and the differences between random models can be used as benchmarks in assessing the significance of deviations observed in the genomic DNA sequence. The expected counts were obtained by Monte Carlo simulations, generating 10 randomized versions of the genomic sequence at hand for each model and averaging the counts observed in the randomized sequences.
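A minimal sketch of the Monte Carlo and Poisson steps, restricted for brevity to mononucleotide runs and a single Bernoulli model (an assumed simplification; the study uses all the models in table 1). `expected_counts` averages run-length counts over 10 randomized sequences, and `poisson_upper_tail` gives the probability of observing at least a given count when the expected count is low. All names are hypothetical.

```python
import math
import random
from collections import Counter


def mono_run_lengths(seq: str) -> Counter:
    """Count maximal mononucleotide runs (k = 1 SSRs) by length."""
    counts = Counter()
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        counts[j - i] += 1
        i = j
    return counts


def expected_counts(seq: str, n_random: int = 10, seed: int = 0) -> dict:
    """Average run-length counts over n_random Bernoulli-randomized sequences."""
    rng = random.Random(seed)
    comp = Counter(seq)
    nts = sorted(comp)
    weights = [comp[c] for c in nts]
    total = Counter()
    for _ in range(n_random):
        rand_seq = "".join(rng.choices(nts, weights=weights, k=len(seq)))
        total.update(mono_run_lengths(rand_seq))
    return {n: c / n_random for n, c in total.items()}


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected), summed term by term."""
    if observed <= 0:
        return 1.0
    if expected <= 0:
        return 0.0
    log_lam = math.log(expected)
    upper = int(max(observed, expected) + 20 * math.sqrt(expected) + 50)
    tail = sum(math.exp(-expected + i * log_lam - math.lgamma(i + 1))
               for i in range(observed, upper))
    return min(1.0, tail)
```

Summing the upper tail directly, rather than computing one minus the cumulative probability, avoids losing precision for very small p-values. Note that with only 10 replicates, the estimated expected count for very long runs may be zero, so in practice such lengths would be pooled or compared against the exponential trend described above.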


Figure 1
FIG. 1.— Mono- (top left), di- (top right), tri- (bottom left), and tetranucleotide (bottom right) tandem repeats in the genome of Mycoplasma gallisepticum. The plots show counts of tandem repeats (logarithmic scale, solid circles) of the exact length given by the abscissa in comparison with counts expected in random sequences generated by different stochastic models: models "b" and "m5," dashed gray lines; models "b-b," "m1-m1p," "m1-c," and "m3-m3p," solid gray lines. See Methods for details. All repeats longer than 50 bp are counted at 50-bp length.

 
Discrete Fourier Transform
Periodic patterns in a DNA sequence can be assessed by analysis of the power spectrum generated by a discrete Fourier transform (DFT). In this work, a modified DFT is applied to the distribution of spacings between A and T runs (consecutive adenine or thymine residues). Let $N_d$ be the number of observations of a pair of adenine or thymine runs of length $k$ starting at positions $i$ and $i + d$ in a DNA sequence. The function

$$F(P) = \left| \sum_{d=1}^{R} N_d \, e^{2\pi j d / P} \right|$$

is suitable for detection of periodic signals in the distribution $N_d$ ($j$ refers to the imaginary unit). The parameter $R$ determines the range of distances over which the periodicity is evaluated. Peaks in the function $F(P)$ identify harmonic components within the underlying distribution $N_d$ corresponding to the period $P$. In the present study, the DFT was used to assess periodicity of spacings between short (4- to 7-bp) runs of adenine or thymine.
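The spacing distribution $N_d$ and the periodicity score $F(P)$ can be sketched as follows. The sketch assumes that runs are detected as occurrences (overlapping ones included) of the words A^k or T^k, consistent with the convention in the legend to figure 4, that pairs mixing an A run and a T run are also counted, and that P is scanned on an arbitrary grid from 8 to 15 bp; the grid, the default R = 50, and all function names are illustrative choices, not the study's settings.

```python
import cmath
import math


def run_starts(seq: str, k: int) -> list[int]:
    """0-based starts of every A^k or T^k word, overlapping occurrences included."""
    a_run, t_run = "A" * k, "T" * k
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] in (a_run, t_run)]


def spacing_counts(seq: str, k: int, R: int) -> dict[int, int]:
    """N_d for d = 1..R: number of run pairs whose starts are d bp apart."""
    starts = set(run_starts(seq, k))
    return {d: sum(1 for i in starts if i + d in starts) for d in range(1, R + 1)}


def fourier_score(N: dict[int, int], P: float) -> float:
    """F(P) = |sum_d N_d * exp(2*pi*j*d / P)|."""
    return abs(sum(n_d * cmath.exp(2j * math.pi * d / P) for d, n_d in N.items()))


def dominant_period(seq: str, k: int = 5, R: int = 50, periods=None) -> float:
    """Period P from a scanned grid that maximizes F(P)."""
    if periods is None:
        periods = [8 + 0.05 * i for i in range(141)]  # 8.00 .. 15.00 bp
    N = spacing_counts(seq, k, R)
    return max(periods, key=lambda P: fourier_score(N, P))
```

Restricting the scan to a window around the helical period is a deliberate choice: an exact period P also produces full phase alignment at P/2, P/3, and so on, so an unrestricted scan could report a harmonic instead. For a synthetic sequence with AAAAA placed every 11 bp, the score peaks at P = 11.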


    Results and Discussion
Comparisons of SSR Representations among Mycoplasma Genomes
Counts of SSRs defined as tandem repeats of short oligonucleotides (1-mers to 4-mers) are shown in figures 1 and 2 for M. gallisepticum and M. hyopneumoniae 232. The complete data set, including tandem repeats of 1-mers to 11-mers in all genomes, is available as Supplementary Material online and is qualitatively summarized in table 2. Note that the plots cover only lengths up to 50 bp; longer SSRs are reported at length 50 bp. The SSR counts are shown on a logarithmic scale, so the plots are expected to be roughly linear. This is approximately the case in M. gallisepticum for the tetranucleotide as well as the dinucleotide SSRs, with the exception of a single dinucleotide SSR of length 28 bp (fig. 1), which consists of an alternating A/T pattern and is located in the putative oriC region. Trinucleotide SSR counts display an extensive tail, indicating an overrepresentation of long trinucleotide tandem repeats. By contrast, the counts of mononucleotide SSRs decline sharply beyond a length of 7 bp, indicating an underrepresentation of long mononucleotide tandem repeats. This is common in prokaryotic genomes (Field and Wills 1998), whereas long mononucleotide SSRs are generally overrepresented in eukaryotes (Mrázek and Kypr 1994; Behe 1998). Mycoplasma hyopneumoniae 232 (fig. 2) exhibits a smaller excess of trinucleotide SSRs and a few improbably long di- and tetranucleotide SSRs. However, the most striking pattern is displayed by the mononucleotide SSRs. Counts of mononucleotide SSRs of lengths 1, 2, and 3 bp are roughly congruent with the random models, but the slope changes at length 3 bp, and mononucleotide SSRs of length 4–7 bp are overrepresented. The SSR counts drop sharply at lengths 8 and 9 bp, and there are no mononucleotide SSRs of length 11–13 bp in the genome. Strikingly, the plot reveals a strong secondary peak centered at length 20 bp, exceeding the counts expected by chance several-fold.
Underrepresentation of SSRs longer than 8 bp likely reflects a general tendency of SSRs to contract in the absence of selection (Metzgar et al. 2002), and the persistence of long mononucleotide SSRs (>13 bp) in the genome suggests that they are functionally significant. A similar secondary peak of mononucleotide SSR counts is observed in the other 2 strains of M. hyopneumoniae and, in a much reduced form, in M. mycoides and U. parvum (fig. 3), but not in the other Mollicutes genomes. All nonrandom representations of mononucleotide SSRs are due to tandem repeats of A and T, whereas G and C runs do not significantly differ from expectations based on random models (table 3 and Supplementary Material online).


Figure 2
FIG. 2.— Mono-, di-, tri-, and tetranucleotide tandem repeats in the genome of Mycoplasma hyopneumoniae 232. See legend to figure 1 for description.

 

Table 2 Summary of Long SSRs in Mollicutes Genomes

 

Figure 3
FIG. 3.— Mononucleotide SSRs (runs) in the genomes of Mycoplasma hyopneumoniae 7448 (top left), M. hyopneumoniae J (top right), Mycoplasma mycoides (bottom left), and Ureaplasma parvum (bottom right). See legend to figure 1 for description.

 

Table 3 Mononucleotide SSRs of Length >13 bp in the Genomes of 3 Strains of Mycoplasma hyopneumoniae

 
Long Mononucleotide SSRs in M. hyopneumoniae
The availability of 3 completely sequenced strains of M. hyopneumoniae presents an opportunity to investigate the evolutionary conservation of mononucleotide SSRs and, in turn, to speculate about their possible roles in recombination and genomic rearrangements or as contingency loci. The DNA sequences of the 3 strains are sufficiently similar to allow identification of counterparts of virtually all long SSRs by similarity of adjacent genes and flanking DNA sequences (table 3). A total of 61 mononucleotide SSR sites (potential contingency loci) were identified in M. hyopneumoniae. In 48 of the 61 cases, the corresponding SSRs or at least their remnants (not exceeding 13 bp in length) were identifiable in all 3 strains. Consistent with previous results concerning the hypermutability of contingency loci, the lengths of the SSRs vary among the 3 strains. Only 4 SSRs have exactly the same length in all 3 strains (nos. 15, 31, 55, and 61). By contrast, the flanking sequences are often identical or nearly identical.

What are the roles of the long A and T runs in M. hyopneumoniae? SSRs can function as contingency loci and affect gene or protein activity through changes in repeat length via recombination or DNA polymerase slippage during replication. Contingency loci are typically located within protein-coding regions or upstream regulatory regions (Moxon et al. 1994). However, it is unlikely that all the A/T runs in M. hyopneumoniae directly influence transcription or translation. First, only 2 long A/T runs are located in protein-coding regions (table 3): repeat no. 29, which overlaps with a p97 paralog gene, and repeat no. 56, which overlaps with a putative lipoprotein gene, mhp650, and its orthologs in the other 2 strains. P97 is a member of a family of surface proteins involved in attachment of the bacterium to the host epithelium (Zhang et al. 1995). In both cases, the repeat is located near the 5' end of the gene, where a frameshift mutation would most likely have a deleterious effect. Surface antigens and lipoproteins are among the proteins often associated with contingency loci in pathogenic bacteria (Moxon et al. 1994), and this may be the role of these 2 A/T runs. However, the other A/T runs are intergenic and are probably maintained for different reasons. Note that SSRs no. 9 and 10 (table 3) are located in the truB gene in strain 232, but this may be the result of an incorrectly assigned start codon. In strain 232, the truB gene overlaps with a gene encoding tRNA-Leu, whereas in strains 7448 and J, the 2 genes do not overlap, and the annotated truB gene is 141 bp shorter than the truB gene in strain 232.

Can the A/T runs in the M. hyopneumoniae genomes affect transcription rates? Whereas most A/T runs are located between co-oriented or divergent genes, 9 are located between convergent genes, that is, downstream of the stop codon with respect to both adjacent genes. These A/T runs are unlikely to be involved in regulation of transcription initiation, although they might play a role in transcription termination. In this respect, the rho-independent terminators of Escherichia coli involve a stretch of thymine nucleotides, but one generally not exceeding 12 bp in length (Cardon et al. 1993; Lesnik et al. 2001), whereas the M. hyopneumoniae A/T runs are 14 bp and longer. In addition, the A/T runs of M. hyopneumoniae are often associated with large intergenic regions and are distant from the translation start site (table 3). Thus, it appears that at least some A/T runs do not act as typical contingency loci.

General recurrent sequence motifs, including A/T runs, may function as recombination hot spots. Notably, 13 A/T runs appear to be associated with genomic rearrangements or gene insertions/deletions, as indicated by comparisons of genes in the vicinity of the A/T run (table 3). However, in most of these cases, the flanking sequences on both sides of the runs are conserved, suggesting that the A/T runs themselves are not the site of recombination. Exceptions include SSRs no. 4 and 5, where the 3' flank in M. hyopneumoniae strain J differs from that in the other 2 strains (table 3). SSRs no. 42 and 43 do not have conserved flanking sequences and may have been directly involved in a genome rearrangement event; these SSRs are adjacent to a transposase gene in strains 7448 and J. In addition, SSR no. 37 is found only in strains 232 and J. It is located between pairs of homologous genes, but both flanks are substantially different.

Another possible role of long A/T runs may relate to DNA structure. In this respect, long purine or pyrimidine runs, particularly those exhibiting mirror symmetry, can promote triplex formation (H-DNA) (Mirkin et al. 1987; Belotserkovskii et al. 1990). However, among strict A/T runs, apparently only those extending tens of nucleotides in length promote triplex formation, not runs of the lengths observed in M. hyopneumoniae (Fox 1990).

A survey of genes adjacent to the A/T runs in M. hyopneumoniae reveals mostly hypothetical protein genes but also a large number (25) of genes participating in protein synthesis, namely, rRNA and tRNA genes, ribosomal protein genes, genes encoding aminoacyl-tRNA synthetases, elongation factor P, and the chaperone DnaK, which binds nascent polypeptide chains and can be considered part of the protein synthesis pathway. Notably, both rRNA loci (the 5S rRNA gene is separated from the 16S and 23S genes) are flanked by A/T runs on both sides (table 3). Note that there might be an annotation error involving the 5S rRNA gene. This gene is annotated as exceeding 200 bp in length and includes the A/T runs at both ends, but 5S rRNA genes in other bacteria, including other Mycoplasma, are only about 107 bp long (Szymanski et al. 2002). It is possible that the 5S rRNA gene is actually shorter than annotated and is flanked by the long SSRs rather than containing them.

It is intriguing to speculate that the A/T runs of the M. hyopneumoniae genome might be involved in some regulatory mechanism affecting many essential genes of the protein synthesis pathway. Interestingly, the closely related M. pulmonis features only 6 long A/T runs, and 3 of them are also next to genes functioning in protein synthesis, including the rpsA, dnaK, and tRNA-Leu genes. However, these SSRs are apparently not evolutionarily conserved between the 2 species and may have arisen independently. Mycoplasma pulmonis has an SSR upstream of the dnaK gene, and it was suggested that this SSR can influence the initiation of transcription (Rocha and Blanchard 2002). In contrast, M. hyopneumoniae has an SSR located downstream of dnaK. Similarly, one of the tRNA-Leu genes has an SSR upstream in M. pulmonis, whereas 2 tRNA-Leu genes in M. hyopneumoniae each feature an SSR downstream. Finally, rpsA does not have a proximal SSR in M. hyopneumoniae, but several other ribosomal protein genes do. The remaining 3 long A/T runs of M. pulmonis are next to genes for putative lipoproteins; 2 of these 3 A/T runs are located upstream of the lipoprotein genes, and the third is downstream. This contrast between the locations of A/T runs in M. hyopneumoniae and M. pulmonis points to interesting evolutionary scenarios. First, the A/T runs could have arisen independently in the 2 species rather than being inherited from a common ancestor. Alternatively, the ancestral genome might have had even more A/T runs, and different ones were lost in M. pulmonis and M. hyopneumoniae. Both scenarios, and especially the first, underscore the concept of rapid evolution among mycoplasmas (Rocha and Blanchard 2002).
The placement of SSRs downstream and upstream of the same genes in different species and their occurrence between convergent as well as divergent and co-oriented genes (table 3) point to a complex relationship (or possibly a lack of relationship) between the SSRs and the adjacent genes. Specifically, if these SSRs play analogous roles in the 2 species, their function must be independent of their location downstream or upstream of a gene and is probably not limited to direct modulation of transcription rates; instead, their physiological role could involve long-range (hundreds of base pairs or more) effects, possibly via altering the structure or physical properties of the DNA molecule. Alternatively, it is possible that these SSRs have different functions in the 2 species or that their roles became redundant in one or both species.

Overrepresentation of Short Adenine and Thymine Runs
Short runs of A or T, typically between 4 and 7 bp, are overrepresented in several Mycoplasma genomes (table 4 and Supplementary Material online). The strongest overrepresentation of short A/T runs is exhibited by M. hyopneumoniae (all 3 strains) and by M. pulmonis, both taxonomically classified within the Mycoplasma hominis group (Johansson and Pettersson 2002). By contrast, short A/T runs are not overrepresented in M. pneumoniae, M. genitalium, M. gallisepticum, U. parvum, and M. florum and are only marginally overrepresented in M. penetrans, M. mobile, M. synoviae, M. mycoides, and onion yellows phytoplasma (table 4). The overrepresentation of short A/T runs in M. hyopneumoniae and M. pulmonis is most pronounced in the intergenic regions but is still significant in genes (table 5). The weaker overrepresentation of short A/T runs in genes may be due to protein-coding constraints, which may limit the capacity of the gene nucleotide sequence to accommodate A/T runs without interfering with the proper function of the encoded protein. Interestingly, the spacings between short A/T runs are strongly modulated with a period of about 11 bp in M. hyopneumoniae and close to 12 bp in M. pulmonis (fig. 4). The length of the period was determined by applying the modified DFT to the distribution of spacings shown in figure 4. This periodicity slightly exceeds the helical period of DNA in the canonical B-form, which is close to 10.5 bp (Wang 1979; Rhodes and Klug 1980). Short A/T runs properly phased with the helical period can induce intrinsic DNA bending (Trifonov 1985; Hagerman 1986). A periodicity close to 11 bp is common in Eubacteria and has been proposed to induce negative supercoiling in the DNA, whereas 10-bp periodicity is often found in Archaea (Herzel et al. 1998; Herzel et al. 1999).
Notably, Mycoplasma genomes that do not have a high abundance of short A/T runs also do not exhibit their periodic spacings (data not shown), suggesting that overrepresentation of the short A/T runs and their periodic spacings could be related. The apparent association of the overrepresentation and periodicity of short A/T runs in some Mycoplasma genomes suggests that their function relates to DNA curvature and/or bending and may indicate a role in general organization of the chromosome. In particular, some general aspects of replication, transcription, recombination, or folding of the DNA molecule in the cell may differ among different Mycoplasma species, requiring special accommodations in M. hyopneumoniae and M. pulmonis that are facilitated by abundance of properly distributed short runs of A and T.
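Comparisons of short A/T runs between genes and intergenic regions (as in table 5) require tallying runs separately for each region class. The sketch below assigns each maximal A or T run to the region that contains its first nucleotide; this assignment rule, the 0-based half-open gene intervals, and the function names are assumptions rather than the study's procedure. Observed/expected ratios would then follow by applying the same tally to randomized sequences.

```python
import re
from collections import Counter


def at_runs(seq: str, lengths=(5, 6)):
    """Yield (start, length) of maximal A or T runs whose length is in `lengths`."""
    for m in re.finditer(r"A+|T+", seq):
        if len(m.group()) in lengths:
            yield m.start(), len(m.group())


def runs_by_region(seq: str, genes, lengths=(5, 6)) -> Counter:
    """Tally runs as 'coding' or 'intergenic' by the position of their first nucleotide.

    `genes` holds 0-based, half-open (start, end) intervals (an assumed convention).
    """
    in_gene = [False] * len(seq)
    for start, end in genes:
        for p in range(start, end):
            in_gene[p] = True
    counts = Counter()
    for start, _ in at_runs(seq, lengths):
        counts["coding" if in_gene[start] else "intergenic"] += 1
    return counts
```

Because the regular expression `A+|T+` matches A and T blocks separately, a mixed stretch such as AAATTT is split into two mononucleotide runs, matching the mononucleotide SSR definition in Methods.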


Table 4 Short Mononucleotide SSR Overrepresentation in Mollicutes Genomes

 

Table 5 Overrepresentation of 5 and 6 bp Adenine and Thymine Runs in Protein-Coding and Noncoding Regions of Mycoplasma hyopneumoniae and Mycoplasma pulmonis

 

FIG. 4.— Distribution of spacings between pentanucleotides AAAAA or TTTTT in Mycoplasma hyopneumoniae (all 3 strains) and Mycoplasma pulmonis. The abscissa shows the distance between a pair of AAAAA or TTTTT, and the bars indicate the count of pairs that are separated by the given distance. The distance is measured from the first nucleotide of the first pentanucleotide to the first nucleotide of the second pentanucleotide, that is, distance 5 bp refers to a contiguous run of 10 A or T nucleotides.
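The spacing distribution in figure 4 and the period estimate described earlier can be sketched together. This sketch assumes that all occurrences of AAAAA and TTTTT are pooled (including overlapping occurrences and mixed AAAAA/TTTTT pairs), and it substitutes a plain discrete Fourier transform evaluated at non-integer periods for the modified DFT used in the paper, whose details are not given here. The function names and the 8 to 14 bp search window are illustrative choices.

```python
import math
from collections import Counter


def motif_positions(seq, motifs=("AAAAA", "TTTTT")):
    """0-based start positions of all (possibly overlapping) motif occurrences."""
    seq = seq.upper()
    k = len(motifs[0])  # assumes all motifs have the same length
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] in motifs]


def spacing_histogram(seq, max_dist=100, motifs=("AAAAA", "TTTTT")):
    """hist[d] = number of occurrence pairs whose start positions differ by d bp.

    Distance runs from first nucleotide to first nucleotide, as in the figure 4
    caption, so d = 5 for two AAAAA occurrences means a contiguous 10-nt run.
    """
    pos = motif_positions(seq, motifs)
    hist = Counter()
    for a, p in enumerate(pos):
        for q in pos[a + 1:]:
            d = q - p
            if d > max_dist:
                break  # positions are sorted, so later ones are farther away
            hist[d] += 1
    return [hist[d] for d in range(max_dist + 1)]


def dft_power(values, period):
    """Squared magnitude of the DFT of `values` at an arbitrary period (bp)."""
    w = 2 * math.pi / period
    re = sum(v * math.cos(w * d) for d, v in enumerate(values))
    im = sum(v * math.sin(w * d) for d, v in enumerate(values))
    return (re * re + im * im) / len(values)


def dominant_period(hist, t_min=8.0, t_max=14.0, step=0.05):
    """Period in [t_min, t_max] with the largest DFT power of the mean-centered histogram."""
    mean = sum(hist) / len(hist)
    centered = [h - mean for h in hist]
    n_steps = int(round((t_max - t_min) / step))
    periods = [t_min + k * step for k in range(n_steps + 1)]
    return max(periods, key=lambda t: dft_power(centered, t))
```

For a synthetic sequence such as "AAAAAGCGCGC" repeated 30 times, dominant_period(spacing_histogram(seq)) should return a value close to 11. Real spacing histograms decay with distance, so removing a smooth trend rather than only the mean may be needed before the transform.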

 
Distribution of SSRs between Protein-Coding and Intergenic Regions
Different selective constraints operate in protein-coding and intergenic regions of the genome, so comparing SSR abundance in protein-coding and noncoding regions may provide insights into the roles of SSRs in different Mycoplasma species. One may expect long SSRs composed of oligonucleotides whose length is not divisible by 3 to be avoided in protein-coding regions because they could promote frameshift mutations, except in contingency loci, where such mutations can be beneficial (Moxon et al. 1994). By contrast, length changes in 3-, 6-, and 9-meric SSRs add or remove whole codons and preserve the reading frame, so they would be expected to have only subtle effects, if any, on protein function. Although this expected pattern mostly holds and SSRs in genes generally involve iterations of 3-, 6-, and 9-mers, there are some notable exceptions and contrasts (table 2 and Supplementary Material online). Long trinucleotide SSRs are overrepresented in M. genitalium, M. gallisepticum, and M. hyopneumoniae (all 3 strains). However, whereas the long trinucleotide SSRs in M. hyopneumoniae occur almost exclusively in genes, those in M. genitalium and M. gallisepticum are mostly intergenic (table 2 and Supplementary Material online; Rocha and Blanchard 2002; Papazisi et al. 2003; Minion et al. 2004). Thus, although M. hyopneumoniae on the one hand and M. genitalium and M. gallisepticum on the other exhibit similar overall abundances of trinucleotide SSRs, these SSRs may play different roles in the 2 sets of genomes. Hexanucleotide SSRs occur mostly in genes. Interestingly, dinucleotide SSRs are mostly in genes in M. pulmonis, mostly intergenic in U. parvum and M. mycoides, and found in both intergenic regions and genes in M. hyopneumoniae (table 2 and Supplementary Material online).
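Comparing SSR abundance in coding and intergenic regions requires locating perfect tandem repeats and intersecting them with gene coordinates. The sketch below is a simple greedy scanner, not the detection procedure or length thresholds used in this study; the `min_len=12` cutoff, the function names, and the `(start, end)` gene coordinate format are hypothetical. It also flags whether a change in copy number would keep the reading frame, which holds when the unit length is divisible by 3.

```python
def find_ssrs(seq, max_unit=9, min_len=12):
    """Greedy scan for perfect tandem repeats.

    Returns (start, end, unit) tuples with 0-based, end-exclusive coordinates.
    At each position the shortest unit (1..max_unit) that occurs at least twice
    in tandem and spans at least `min_len` bp wins; scanning resumes after it.
    Incomplete trailing copies of the unit are not included.
    """
    seq = seq.upper()
    ssrs, i, n = [], 0, len(seq)
    while i < n:
        hit = None
        for k in range(1, max_unit + 1):
            if i + k > n:
                break
            unit = seq[i:i + k]
            j = i + k
            while seq[j:j + k] == unit:
                j += k
            if j - i >= max(min_len, 2 * k):
                hit = (i, j, unit)
                break
        if hit:
            ssrs.append(hit)
            i = hit[1]
        else:
            i += 1
    return ssrs


def annotate_ssrs(ssrs, genes):
    """Label each SSR genic/intergenic and note whether indels keep the frame.

    `genes` is a list of (start, end) coding-region coordinates, end-exclusive.
    """
    result = []
    for start, end, unit in ssrs:
        genic = any(s < end and start < e for s, e in genes)
        in_frame = len(unit) % 3 == 0  # copy-number change adds/removes whole codons
        result.append((start, end, unit, "genic" if genic else "intergenic", in_frame))
    return result
```

Tallying the annotated repeats by unit length and by the genic/intergenic label gives counts of the kind compared in table 2, under the stated assumptions.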

The di-, tri-, and hexanucleotide SSRs in protein-coding regions are listed in table 6. The protein-coding SSRs in M. hyopneumoniae are mostly composed of tri- and hexanucleotide repeats. These SSRs vary in length among orthologous proteins in the 3 strains, but the length variations do not cause frameshifts. They generally occur in genes encoding hypothetical proteins and in the families of adhesin paralogs; the latter also contain tandem repeats of long oligonucleotides (>11 bp) (Minion et al. 2004) that were not investigated in this study. Although contraction or expansion of the tri- and hexanucleotide SSRs preserves the reading frame, it can influence the structure and function of the encoded protein. Clusters of rare codons or codons translated to a rare amino acid can cause translational pausing, which may be relevant for proper protein folding (Krasheninnikov et al. 1991; Thanaraj and Argos 1996). However, this is unlikely to be the case in M. hyopneumoniae because the codons and amino acids involved in the trinucleotide SSRs (table 6) are among the most frequent in the M. hyopneumoniae genomes (data not shown). Protein segments of biased amino acid composition, such as those encoded by SSRs, may not adopt a rigid fold, and proteins containing such disordered regions are often promiscuous with respect to protein–protein interactions, functioning as hubs of protein interaction networks (Dunker et al. 2005). In addition, some of these SSRs code for amino acid patterns that may play specific roles in protein–protein interactions. For example, glutamine repeats, which are found in several M. hyopneumoniae proteins, can promote protein aggregation (Perutz et al. 2002). One may speculate that the protein-coding SSRs of M. hyopneumoniae could be important for the ability of the encoded proteins to interact with other proteins.
Notably, these SSRs are generally located in hypothetical open reading frames (ORFs) and putative surface antigens (table 6). Interactions of surface antigens with host cells and enzymes affect the survival of the bacterium in the host, and some of the hypothetical ORFs may also be involved in pathogen–host interactions. This contrasts with the trinucleotide SSRs in M. gallisepticum, which generally occur upstream of the start codon and appear to be involved in transcriptional regulation of the VlhA family lipoproteins (Liu et al. 2002; Papazisi et al. 2003).
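The argument against translational pausing rests on the SSR codons being among the most frequent in the genome. A minimal sketch of such a check might count in-frame codons over a set of coding sequences and report the frequency rank of a repeat codon. It assumes the coding sequences are already extracted and that ranking by raw counts is adequate; the ranking actually used for the unshown data is not specified, and the example sequences in the test are made up.

```python
from collections import Counter


def codon_counts(cds_list):
    """Count in-frame codons across coding sequences (trailing partial codons ignored)."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            counts[cds[i:i + 3]] += 1
    return counts


def codon_rank(counts, codon):
    """1-based frequency rank of `codon` (1 = most frequent); ties share the better rank."""
    c = counts.get(codon.upper(), 0)
    return 1 + sum(1 for v in counts.values() if v > c)
```

A rank near the top for the codon that makes up a trinucleotide SSR would be consistent with the claim that the repeat is built from common rather than rare codons.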


Table 6 Long Di-, Tri-, and Hexanucleotide SSRs in Protein-Coding Regions

 
Some members of the VlhA protein family of M. gallisepticum contain long hexanucleotide repeats near the 5' end of the coding region (table 6). Interestingly, these motifs vary not only in length but also in sequence: they include repeats of the hexanucleotides ACACCA and ACTCCA (both translated as alternating threonine–proline patterns) and GCACCA (translated as an alternating alanine–proline pattern). The sequence variability of these SSRs raises interesting evolutionary questions and suggests that at least some of them may have arisen after the amplification of the VlhA protein family. Like the tri- and hexanucleotide SSRs in M. hyopneumoniae, these repeats are not composed of rare codons and therefore are unlikely to affect translation rates. They are also unlikely to affect transcription initiation, which is modulated by GAA repeats located upstream of the start codon (Baseggio et al. 1996; Glew et al. 1998), but they may be significant for the proper structure and/or function of the encoded proteins. As with the SSRs in M. hyopneumoniae genes, the M. gallisepticum hexanucleotide SSRs are found in surface proteins, which may be involved in complex networks of protein–protein interactions.
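The amino acid patterns quoted above follow directly from the codon table. As a small check, the sketch below translates the three hexanucleotide repeats using NCBI translation table 4 (the Mycoplasma/Spiroplasma code, in which TGA encodes tryptophan rather than a stop). The repeat copy number of 4 is arbitrary, and the helper names are illustrative.

```python
BASES = "TCAG"
# NCBI translation table 4: identical to the standard code except TGA -> Trp (W).
AMINO_ACIDS_T4 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE_T4 = {
    a + b + c: AMINO_ACIDS_T4[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, table=CODE_T4):
    """Translate an in-frame DNA sequence; a trailing partial codon is dropped."""
    seq = seq.upper()
    return "".join(table[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


for motif in ("ACACCA", "ACTCCA", "GCACCA"):
    print(motif, translate(motif * 4))
# ACACCA TPTPTPTP
# ACTCCA TPTPTPTP
# GCACCA APAPAPAP
```

Because a hexanucleotide unit spans exactly 2 codons, any in-frame hexanucleotide repeat translates to an alternating dipeptide pattern, and copy-number changes lengthen or shorten that pattern without shifting the frame.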

In contrast to M. hyopneumoniae and M. gallisepticum, the protein-coding SSRs of M. pulmonis are all dinucleotide tandem repeats, which can cause frameshift mutations through expansion or contraction. All of these SSRs occur in the coding regions of DNA methylase components of restriction-modification systems (RMSs) and of CpG methylases, and they are located near the 3' ends of the genes. Consequently, a frameshift mutation in such an SSR would alter only the C-terminal domain, leaving the bulk of the amino acid sequence unaffected. Thus, the 2 families of DNA methylases may gain functional variability from interchangeable C-terminal domains while the bulk of the protein sequence remains invariant. Alternatively, a frameshift mutation could lead to a fusion with a downstream gene, as may be the case with the CpG methylases (Rocha and Blanchard 2002). The involvement of DNA methylases in pathogen–host interactions is probably indirect. The RMS of M. pulmonis may be involved in inversions within the vsa locus, which contains genes for surface antigens (Bhugra et al. 1995). The exact role of the CpG methylases in M. pulmonis is unknown, but it has been suggested that they could be involved in gene regulation (Rocha and Blanchard 2002).
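The effect of a dinucleotide SSR located near the 3' end can be illustrated with a toy gene. The sequence below is invented and is not an M. pulmonis methylase: a 2-bp expansion of a CA repeat close to the stop codon changes only the residues encoded downstream of the repeat, so the N-terminal portion of the protein is unchanged. Translation uses NCBI table 4 (TGA = Trp), and all names are illustrative.

```python
BASES = "TCAG"
# NCBI translation table 4 (Mycoplasma/Spiroplasma): standard code with TGA -> Trp.
AA_T4 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE_T4 = {a + b + c: AA_T4[16 * i + 4 * j + k]
           for i, a in enumerate(BASES)
           for j, b in enumerate(BASES)
           for k, c in enumerate(BASES)}


def translate_to_stop(cds):
    """Translate from the first codon up to (not including) the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODE_T4[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def expand_repeat(cds, repeat_start, unit, extra_copies=1):
    """Insert extra copies of `unit` at the start of the repeat tract."""
    return cds[:repeat_start] + unit * extra_copies + cds[repeat_start:]


def shared_prefix(a, b):
    """Length of the common N-terminal prefix of two protein sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical gene: ATG + 20 Ala codons + (CA)5 repeat + short 3' tail ending in TAA.
gene = "ATG" + "GCT" * 20 + "CA" * 5 + "GATGGTAAATAA"
repeat_start = 63
wild_type = translate_to_stop(gene)
mutant = translate_to_stop(expand_repeat(gene, repeat_start, "CA"))
kept = shared_prefix(wild_type, mutant)
print(f"{kept} of {len(wild_type)} wild-type residues unchanged")  # 24 of 25 wild-type residues unchanged
```

In this toy case the expansion moves the stop codon and replaces the last residue with a new C-terminal tail, mirroring the idea of a variable C-terminal domain on an invariant protein body.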

Functional Diversity of SSRs in Mycoplasma Genomes
Although the overrepresentation of long SSRs is common among the Mycoplasma genomes, the comparative analysis of their distribution reveals major differences among species and suggests diverse functions of different SSRs. SSRs can play various roles depending on their sequence, length, and location. SSRs located upstream of coding regions can function as contingency loci by promoting mutations that influence the rate of transcription initiation (Moxon et al. 1994). Some SSRs located in protein-coding regions, particularly near the translation start site, can also function as contingency loci by causing frameshift mutations (Moxon et al. 1994). Other SSRs, especially trinucleotide tandem repeats translated as amino acid runs, can affect the structure and function of the encoded protein, particularly with respect to protein–protein interactions (Karlin et al. 2002; Perutz et al. 2002; Dunker et al. 2005). Yet another class of SSRs may affect structural or physical properties of DNA and facilitate proper folding and organization of the chromosomal DNA in the cell. Perhaps the most surprising result of the SSR comparisons among Mycoplasma genomes is the apparent functional diversity of the SSRs, which span all the classes listed above. This functional diversity is further underscored by the absence of long SSRs in several genomes, specifically M. penetrans, M. mobile, and M. synoviae. Mycoplasma pneumoniae, which features only 2 marginally long hexanucleotide SSRs, can also be included in this group. These mycoplasmas apparently rely on survival strategies in the host that do not involve hypermutable contingency loci or long SSRs in general. The functional diversity of SSRs among Mycoplasma species may have arisen in conjunction with adaptation to different mechanisms of interaction with host cells (Vasconcelos et al. 2005).
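The functional classes listed above can be summarized as a rule-based triage of SSR candidates. The sketch below only illustrates that reasoning: the location labels, the 300-bp "near the start codon" cutoff, the class names, and the function names are hypothetical, and a real assignment would also need the comparative evidence discussed above.

```python
from dataclasses import dataclass


@dataclass
class SSR:
    unit: str                 # repeat unit, e.g. "GAA"
    location: str             # hypothetical labels: "upstream", "coding", or "intergenic"
    dist_from_start: int = 0  # bp downstream of the start codon (coding SSRs only)


def putative_role(ssr: SSR, near_start_bp: int = 300) -> str:
    """Map an SSR to one of the broad functional classes discussed in the text."""
    unit = ssr.unit.upper()
    k = len(unit)
    if ssr.location == "upstream":
        return "contingency locus: transcription initiation"
    if ssr.location == "coding":
        if k % 3 == 0:
            return "amino acid run: protein structure/interactions"
        if ssr.dist_from_start <= near_start_bp:
            return "contingency locus: frameshift near start"
        return "frameshift-prone: C-terminal variation"
    if k == 1 and unit in ("A", "T"):
        return "DNA structure: curvature/folding"
    return "unclassified"
```

Checking the reading-frame rule before the distance rule reflects the point that in-frame repeats change amino acid runs rather than causing frameshifts, wherever they occur in the gene.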


    Supplementary Material
A full set of figures for all 14 complete genomes is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).


    Acknowledgements
I wish to thank Drs Duncan Krause and Anne Summers for stimulating discussions and comments on the manuscript. This work was supported in part by a Faculty Research Grant from the University of Georgia Research Foundation.


    Footnotes
 
Takashi Gojobori, Associate Editor


    References

Andersson SG, Kurland CG. 1990. Codon preferences in free-living microorganisms. Microbiol Rev 54:198–210.

Baseggio N, Glew MD, Markham PF, Whithear KG, Browning GF. 1996. Size and genomic location of the pMGA multigene family of Mycoplasma gallisepticum. Microbiology 142(Pt 6):1429–35.

Behe MJ. 1998. Tracts of adenosine and cytidine residues in the genomes of prokaryotes and eukaryotes. DNA Seq 8:375–83.

Belotserkovskii BP, Veselkov AG, Filippov SA, Dobrynin VN, Mirkin SM, Frank-Kamenetskii MD. 1990. Formation of intramolecular triplex in homopurine-homopyrimidine mirror repeats with point substitutions. Nucleic Acids Res 18:6621–4.

Bhugra B, Voelker LL, Zou N, Yu H, Dybvig K. 1995. Mechanism of antigenic variation in Mycoplasma pulmonis: interwoven, site-specific DNA inversions. Mol Microbiol 18:703–14.

Blaisdell BE, Rudd KE, Matin A, Karlin S. 1993. Significant dispersed recurrent DNA sequences in the Escherichia coli genome. Several new groups. J Mol Biol 229:833–48.

Cardon LR, Burge C, Schachtel GA, Blaisdell BE, Karlin S. 1993. Comparative DNA sequence features in two long Escherichia coli contigs. Nucleic Acids Res 21:3875–84.

Chambaud I, Heilig R, Ferris S et al. (12 co-authors). 2001. The complete genome sequence of the murine respiratory pathogen Mycoplasma pulmonis. Nucleic Acids Res 29:2145–53.

Dandekar T, Huynen M, Regula JT et al. (13 co-authors). 2000. Re-annotating the Mycoplasma pneumoniae genome sequence: adding value, function and reading frames. Nucleic Acids Res 28:3278–88.

Dunker AK, Cortese MS, Romero P, Iakoucheva LM, Uversky VN. 2005. Flexible nets. The roles of intrinsic disorder in protein interaction networks. FEBS J 272:5129–48.

Fickett JW, Torney DC, Wolf DR. 1992. Base compositional structure of genomes. Genomics 13:1056–64.

Field D, Wills C. 1998. Abundant microsatellite polymorphism in Saccharomyces cerevisiae, and the different distributions of microsatellites in eight prokaryotes and S. cerevisiae, result from strong mutation pressures and a variety of selective forces. Proc Natl Acad Sci USA 95:1647–52.

Fox KR. 1990. Long (dA)n.(dT)n tracts can form intramolecular triplexes under superhelical stress. Nucleic Acids Res 18:5387–91.

Fraser CM, Gocayne JD, White O et al. (29 co-authors). 1995. The minimal gene complement of Mycoplasma genitalium. Science 270:397–403.

Glass JI, Lefkowitz EJ, Glass JS, Heiner CR, Chen EY, Cassell GH. 2000. The complete sequence of the mucosal pathogen Ureaplasma urealyticum. Nature 407:757–62.

Glew MD, Baseggio N, Markham PF, Browning GF, Walker ID. 1998. Expression of the pMGA genes of Mycoplasma gallisepticum is controlled by variation in the GAA trinucleotide repeat lengths within the 5' noncoding regions. Infect Immun 66:5833–41.

Hagerman PJ. 1986. Sequence-directed curvature of DNA. Nature 321:449–50.

Herzel H, Weiss O, Trifonov EN. 1998. Sequence periodicity in complete genomes of archaea suggests positive supercoiling. J Biomol Struct Dyn 16:341–5.

Herzel H, Weiss O, Trifonov EN. 1999. 10–11 bp periodicities in complete genomes reflect protein structure and DNA folding. Bioinformatics 15:187–93.

Himmelreich R, Hilbert H, Plagens H, Pirkl E, Li BC, Herrmann R. 1996. Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res 24:4420–49.

Hood DW, Deadman ME, Jennings MP, Bisercic M, Fleischmann RD, Venter JC, Moxon ER. 1996. DNA repeats identify novel virulence genes in Haemophilus influenzae. Proc Natl Acad Sci USA 93:11121–5.

Htun H, Dahlberg JE. 1989. Topology and formation of triple-stranded H-DNA. Science 243:1571–6.

Jaffe JD, Stange-Thomann N, Smith C et al. (19 co-authors). 2004. The complete genome and proteome of Mycoplasma mobile. Genome Res 14:1447–61.

Johansson KE, Pettersson B. 2002. Taxonomy of Mollicutes. In: Razin S, Herrmann R, editors. Molecular biology and pathogenicity of mycoplasmas. New York: Kluwer Academic/Plenum Publishers. p 1–29.

Karlin S, Brocchieri L, Bergman A, Mrázek J, Gentles AJ. 2002. Amino acid runs in eukaryotic proteomes and disease associations. Proc Natl Acad Sci USA 99:333–8.

Karlin S, Campbell AM, Mrázek J. 1998. Comparative DNA analysis across diverse genomes. Annu Rev Genet 32:185–225.

Karlin S, Mrázek J. 2000. Predicted highly expressed genes of diverse prokaryotic genomes. J Bacteriol 182:5238–50.

Karlin S, Mrázek J, Campbell AM. 1996. Frequent oligonucleotides and peptides of the Haemophilus influenzae genome. Nucleic Acids Res 24:4263–72.

Krasheninnikov IA, Komar AA, Adzhubei IA. 1991. Nonuniform size distribution of nascent globin peptides, evidence for pause localization sites, and a cotranslational protein-folding model. J Protein Chem 10:445–53.

Lesnik EA, Sampath R, Levene HB, Henderson TJ, McNeil JA, Ecker DJ. 2001. Prediction of rho-independent transcriptional terminators in Escherichia coli. Nucleic Acids Res 29:3583–94.

Liu L, Panangala VS, Dybvig K. 2002. Trinucleotide GAA repeats dictate pMGA gene expression in Mycoplasma gallisepticum by affecting spacing between flanking regions. J Bacteriol 184:1335–9.

Matula M, Kypr J. 1999. Nucleotide sequences flanking dinucleotide microsatellites in the human, mouse and Drosophila genomes. J Biomol Struct Dyn 17:275–80.

Metzgar D, Liu L, Hansen C, Dybvig K, Wills C. 2002. Domain-level differences in microsatellite distribution and content result from different relative rates of insertion and deletion mutations. Genome Res 12:408–13.

Minion FC, Lefkowitz EJ, Madsen ML, Cleary BJ, Swartzell SM, Mahairas GG. 2004. The genome sequence of Mycoplasma hyopneumoniae strain 232, the agent of swine mycoplasmosis. J Bacteriol 186:7123–33.

Mirkin SM, Lyamichev VI, Drushlyak KN, Dobrynin VN, Filippov SA, Frank-Kamenetskii MD. 1987. DNA H form requires a homopurine-homopyrimidine mirror repeat. Nature 330:495–7.

Moxon ER, Rainey PB, Nowak MA, Lenski RE. 1994. Adaptive evolution of highly mutable loci in pathogenic bacteria. Curr Biol 4:24–33.

Mrázek J, Gaynon LH, Karlin S. 2002. Frequent oligonucleotide motifs in genomes of three streptococci. Nucleic Acids Res 30:4216–21.

Mrázek J, Kypr J. 1994. Length expansion is a general property of simple sequence repeats in eukaryotic genomes. Miami bio/technology short reports. Advances in gene technology: molecular biology of human genetic disease. Volume 5. IRL Press, Oxford, UK. p 39.

Nordheim A, Rich A. 1983. The sequence (dC-dA)n x (dG-dT)n forms left-handed Z-DNA in negatively supercoiled plasmids. Proc Natl Acad Sci USA 80:1821–5.

Oshima K, Kakizawa S, Nishigawa H et al. (11 co-authors). 2004. Reductive evolution suggested from the complete genome sequence of a plant-pathogenic phytoplasma. Nat Genet 36:27–9.

Papazisi L, Gorton TS, Kutish G, Markham PF, Browning GF, Nguyen DK, Swartzell S, Madan A, Mahairas G, Geary SJ. 2003. The complete genome sequence of the avian pathogen Mycoplasma gallisepticum strain R(low). Microbiology 149:2307–16.

Perutz MF, Pope BJ, Owen D, Wanker EE, Scherzinger E. 2002. Aggregation of proteins with expanded glutamine and alanine repeats of the glutamine-rich and asparagine-rich domains of Sup35 and of the amyloid beta-peptide of amyloid plaques. Proc Natl Acad Sci USA 99:5596–600.

Pettersson B, Uhlen M, Johansson KE. 1996. Phylogeny of some mycoplasmas from ruminants based on 16S rRNA sequences and definition of a new cluster within the hominis group. Int J Syst Bacteriol 46:1093–8.

Rhodes D, Klug A. 1980. Helical periodicity of DNA determined by enzyme digestion. Nature 286:573–8.

Rocha EP, Blanchard A. 2002. Genomic repeats, genome plasticity and the dynamics of Mycoplasma evolution. Nucleic Acids Res 30:2031–42.

Sasaki Y, Ishikawa J, Yamashita A et al. (11 co-authors). 2002. The complete genomic sequence of Mycoplasma penetrans, an intracellular bacterial pathogen in humans. Nucleic Acids Res 30:5293–300.

Shafer RH, Smirnov I. 2000. Biological aspects of DNA/RNA quadruplexes. Biopolymers 56:209–27.

Sharp PM, Li WH. 1986. An evolutionary perspective on synonymous codon usage in unicellular organisms. J Mol Evol 24:28–38.

Sharp PM, Matassi G. 1994. Codon usage and genome evolution. Curr Opin Genet Dev 4:851–60.

Simmons WL, Denison AM, Dybvig K. 2004. Resistance of Mycoplasma pulmonis to complement lysis is dependent on the number of Vsa tandem repeats: shield hypothesis. Infect Immun 72:6846–51.

Szymanski M, Barciszewska MZ, Erdmann VA, Barciszewski J. 2002. 5S ribosomal RNA database. Nucleic Acids Res 30:176–8.

Thanaraj TA, Argos P. 1996. Ribosome-mediated translational pause and protein domain organization. Protein Sci 5:1594–612.

Trifonov EN. 1985. Curved DNA. CRC Crit Rev Biochem 19:89–106.

van Holde K, Zlatanova J. 1994. Unusual DNA structures, chromatin and transcription. Bioessays 16:59–68.

Vasconcelos AT, Ferreira HB, Bizarro CV et al. (86 co-authors). 2005. Swine and poultry pathogens: the complete genome sequences of two strains of Mycoplasma hyopneumoniae and a strain of Mycoplasma synoviae. J Bacteriol 187:5568–77.

Wang JC. 1979. Helical repeat of DNA in solution. Proc Natl Acad Sci USA 76:200–3.

Wassenaar TM, Wagenaar JA, Rigter A, Fearnley C, Newell DG, Duim B. 2002. Homonucleotide stretches in chromosomal DNA of Campylobacter jejuni display high frequency polymorphism as detected by direct PCR analysis. FEMS Microbiol Lett 212:77–85.

Westberg J, Persson A, Holmberg A, Goesmann A, Lundeberg J, Johansson KE, Pettersson B, Uhlen M. 2004. The genome sequence of Mycoplasma mycoides subsp. mycoides SC type strain PG1T, the causative agent of contagious bovine pleuropneumonia (CBPP). Genome Res 14:221–7.

Willems R, Paul A, van der Heide HG, ter Avest AR, Mooi FR. 1990. Fimbrial phase variation in Bordetella pertussis: a novel mechanism for transcriptional regulation. EMBO J 9:2803–9.

Zhang Q, Young TF, Ross RF. 1995. Identification and characterization of a Mycoplasma hyopneumoniae adhesin. Infect Immun 63:1013–9.

Accepted for publication April 12, 2006.

