Molecular Biology and Evolution 18:1014-1023 (2001)
© 2001 Society for Molecular Biology and Evolution
A Role for Selection in Regulating the Evolutionary Emergence of Disease-Causing and Other Coding CAG Repeats in Humans and Mice
MRC Clinical Sciences Centre, Imperial College School of Medicine, Hammersmith Hospital, London, England;
Department of Computer Science, Royal Holloway University of London, Egham, Surrey, England
Leishmania Genome Group, Seattle Biomedical Research Institute, Seattle, Washington
| Abstract |
|---|
The evolutionary expansion of CAG repeats in human triplet expansion disease genes is intriguing because of their deleterious phenotype. In the past, this expansion has been suggested to reflect a broad genomewide expansion of repeats, which would imply that mutational and evolutionary processes acting on repeats differ between species. Here, we tested this hypothesis by analyzing repeat- and flanking-sequence evolution in 28 repeat-containing genes that had been sequenced in humans and mice and by considering overall lengths and distributions of CAG repeats in the two species. We found no evidence that these repeats were longer in humans than in mice. We also found no evidence for preferential accumulation of CAG repeats in the human genome relative to mice from an analysis of the lengths of repeats identified in sequence databases. We then investigated whether sequence properties, such as base and amino acid composition and base substitution rates, showed any relationship to repeat evolution. We found that repeat-containing genes were enriched in certain amino acids, presumably as the result of selection, but that this did not reflect underlying biases in base composition. We also found that regions near repeats showed higher nonsynonymous substitution rates than the remainder of the gene and lower nonsynonymous rates in genes that contained a repeat in both the human and the mouse. Higher rates of nonsynonymous mutation in the neighborhood of repeats presumably reflect weaker purifying selection acting in these regions of the proteins, while the very low rate of nonsynonymous mutation in proteins containing a CAG repeat in both species presumably reflects a high level of purifying selection. Based on these observations, we propose that the mutational processes giving rise to polyglutamine repeats in human and murine proteins do not differ. 
Instead, we propose that the evolution of polyglutamine repeats in proteins results from an interplay between mutational processes and selection.
| Introduction |
|---|
Human triplet expansion diseases are predominantly neurological and are caused by instability and expansion of tandem repeats of triplet motifs within or near genes (reviewed in Rubinsztein 1999
Two explanations for these observations have been proposed. The first suggests that the evolutionary expansion of these repeats reflects their genomewide expansion along the primate lineage and especially in humans (Rubinsztein et al. 1995a). The reality of such lineage-specific, genomewide effects remains uncertain, despite a number of subsequent analyses (reviewed in Amos 1999; Rubinsztein, Amos, and Cooper 1999). This is primarily because of the confounding effect of ascertainment bias (Ellegren, Primmer, and Sheldon 1995), that is, the expectation that repeats isolated in one species will be longer than their homologs in other species as they have been isolated because of their polymorphic nature. Long repeats are more polymorphic than short repeats. Ascertainment bias confounds even the relatively well studied comparison between humans and chimpanzees, while evidence for such differences between humans and other primates is lacking, and indeed there is some evidence to the contrary (e.g., Morin et al. 1998). There is also evidence for very long CAG repeats in mice (King et al. 1998). A number of explanations have been suggested for the human-chimpanzee difference (Amos 1999; Rubinsztein, Amos, and Cooper 1999), but these rely on characteristics of human and chimpanzee evolutionary history and therefore cannot provide an explanation for changes in repeat length over long periods of evolution.
The second possible explanation for the evolutionary expansion of CAG repeats in these genes is that forces or processes that are specific to individual genes and/or genomic locations act on particular genes in particular evolutionary lineages to give rise to locus- and lineage-specific expansions. One prominent candidate for such an influence is local base (and nucleotide motif) composition. Different isochores in mammalian genomes have different GC compositions, and genes within these regions show correlated base compositions, notably at third codon positions (Mouchiroud, Gautier, and Bernardi 1995). Thus, genes within GC-rich isochores will tend to accumulate concentrations of codons with G and C at their third positions, which might act as seeds for replication slippage and predispose genes to accumulating codon repeats. In the extreme, such biases could even bias amino acid compositions of proteins, again predisposing genes to seeding of codon repeats (Nakachi et al. 1997; Nishizawa and Nishizawa 1998; Brock, Anderson, and Monckton 1999). Brock, Anderson, and Monckton (1999) have even suggested that local base composition affects the frequency of indel mutations at CAG repeats. Another possibility is an effect of the local mutation rate. Kruglyak et al. (1998) have suggested that the equilibrium length of microsatellites is a consequence of the balance between the rates of point and slippage mutation. Incorporation of point mutations into repeats reduces their rate of length change during evolution (Albà, Santibáñez-Koref, and Hancock 1999a). If either or both of these parameters varied across a genome, this could affect the accumulation of tandem repeats. Finally, Djian, Hancock, and Chana (1996) have suggested that codon repeats in disease genes are flanked by regions with a relatively high frequency of acceptance of point mutations. Mutational instability of regions immediately flanking CA microsatellites has also been suggested by Brohede and Ellegren (1999). High rates of sequence change could reflect a relatively low level of purifying selection in the vicinity of repeats. Selective forces could differ between genes and subregions of genes, depending on the phenotypic consequences of mutations in these different locations. These differences could affect the probability of tandem repeats arising, and, in particular, expanding, during evolution (Nishizawa, Nishizawa, and Kim 1999). The recent demonstration for Saccharomyces cerevisiae that transcription factors and protein kinases are significantly overrepresented among proteins that contain polyglutamine repeats (Albà, Santibáñez-Koref, and Hancock 1999b) also indicates a role for selective constraints in the evolution of these structures, although their functional significance remains unclear (Schmid and Tautz 1999).
Here, we addressed the question of the forces giving rise to the evolutionary expansion of CAG repeats in triplet expansion disease and other genes by comparing the lengths of CAG repeats in humans and mice and by considering the base and codon compositions and rates of synonymous and nonsynonymous substitution in CAG repeat-containing genes. We found no evidence of a preferential accumulation of CAG repeats in the human genome relative to the mouse genome or of differences in the nature of the selection acting on genic positioning of CAG repeats in the two species. When we considered pairs of proteins that contained a CAG repeat in one species but not the other, we found no differences in the properties of surrounding sequences. However, we did find an overrepresentation, relative to the average amino acid usage in humans and mice, of the amino acids proline, glutamine, histidine, and serine, which may have given rise to biases in the gene sequences and predisposed them to accumulating repeats. We also observed locally high levels of nonsynonymous base substitution in the neighborhood of repeats in genes containing a repeat in only one species, but low levels in genes in which repeats were conserved between humans and mice. We combine these observations to propose a hypothesis to explain the evolution of these repeats.
| Materials and Methods |
|---|
Database Screening and Analysis
Genes containing repeats of five or more CAG codons in humans (Homo sapiens), mice (Mus musculus), or both were identified from a data set described previously (Albà, Santibáñez-Koref, and Hancock 1999a
For comparative analysis of database sequences containing CAG repeats of length 7 or more, the GenBank and EMBL DNA databases, including EST and STS subgroups, were analyzed using routines from the GCG package, version 9.1 (Genetics Computer Group 1997), unless otherwise noted. The databases were searched using the pattern recognition routine FINDPATTERNS. Entries showing >95% identity to one another upon multiple sequence alignments using PILEUP (Genetics Computer Group 1997), CLUSTAL W (Thompson, Higgins, and Gibson 1994), version 1.7, and FASTA (Pearson and Lipman 1988) were considered to represent the same sequence and grouped together. This allowed for sequencing errors without grouping members of gene families together as single loci. The sequence with the longest array was again taken as the representative from each of these groups. Database entries were again obtained using ENTREZ. The genic locations of repeats were identified using sequence annotations where these were available.
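The pattern search described above can be illustrated with a minimal Python sketch. This is not the FINDPATTERNS routine used in the study; it is a regular-expression stand-in, and the function names `find_cag_runs` and `longest_run` as well as the toy sequence are hypothetical. It scans the forward strand only and reports uninterrupted runs of at least seven CAG units.

```python
import re

def find_cag_runs(seq, min_units=7):
    """Return (start, n_units) for each uninterrupted run of >= min_units CAG triplets.

    Illustrative stand-in for a pattern search; forward strand only.
    """
    pattern = re.compile(r"(?:CAG){%d,}" % min_units)
    runs = []
    for match in pattern.finditer(seq.upper()):
        runs.append((match.start(), (match.end() - match.start()) // 3))
    return runs

def longest_run(seq, min_units=7):
    """Length (in repeat units) of the longest qualifying run, or 0 if none."""
    return max((units for _, units in find_cag_runs(seq, min_units)), default=0)

# Toy sequence (made up): eight CAG units, a CAA interruption, then three more units.
toy = "ATG" + "CAG" * 8 + "CAA" + "CAG" * 3 + "TGA"
print(find_cag_runs(toy))  # [(3, 8)]
```

Under the grouping rule in the text, a value such as `longest_run` could be used to pick the representative entry from each group of near-identical sequences.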
Sequence Analysis Methods
Tandem codon arrays of length ≥5 were identified using ARRAYFINDER (Hancock et al. 1999). A modified version of ARRAYFINDER (PROTARRAY) allowed identification of all amino acid tandem repeats of this length. cDNA codon frequencies were calculated using the GCG program CODONFREQUENCY (Genetics Computer Group 1997). These frequencies were used to calculate overall and third-codon-position base compositions using a commercially available spreadsheet, which was also used to carry out most statistical tests. Other statistical tests were carried out using the SPSS package and the VassarStats web server (http://faculty.vassar.edu/lowry/VassarStats.html). Significance thresholds were subjected to Bonferroni adjustment to take into account multiple testing. Significance values quoted in the text are also Bonferroni-adjusted. Expected amino acid frequencies in cDNAs were calculated on the basis of overall codon frequency tables for mice and humans obtained from the CUTG database server (Nakamura, Gojobori, and Ikemura 2000) at http://www.kazusa.or.jp/codon/. To calculate synonymous and nonsynonymous DNA sequence divergences (Ks and Ka), sequence pairs were aligned using the LaserGene program MEGALIGN (DNASTAR, Madison, Wis.). Alignments were calculated by translating cDNAs into protein sequences and using the method of Hein (1990), which coped better with sequences of unequal length than the Clustal algorithm (Higgins and Sharp 1989) as implemented in MEGALIGN. Ks and Ka for sequence pairs were calculated using MEGA, version 1.01 (Kumar, Tamura, and Nei 1993) using the Jukes-Cantor correction for saturation (Jukes and Cantor 1969). We excluded all repetitive regions from the analysis. Regions to be excluded were initially identified by length difference between species (i.e., presence of an indel in the alignment). The limits of the repeat region were then defined by extending the repeat as far as the last codon adjacent to the repeat that was identical in two out of three positions to the tandemly repeated codon in either species. This excluded not only CAG repeats, but also all other length-varying codon repeats.
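The boundary rule for excluded repeat regions can be sketched as follows. This is one possible reading of the rule, applied to a single sequence rather than to an aligned human-mouse pair, so the "in either species" condition is not modeled. The function names and the toy codon list are hypothetical.

```python
def matches_two_of_three(codon, repeat_codon):
    """True if the codon agrees with the repeat codon at two or more positions."""
    return sum(a == b for a, b in zip(codon, repeat_codon)) >= 2

def extend_repeat(codons, start, end, repeat_codon="CAG"):
    """Widen the half-open interval [start, end) around a core repeat.

    Neighbouring codons are absorbed for as long as each one matches the
    repeat codon at two of three positions (assumed reading of the rule).
    """
    while start > 0 and matches_two_of_three(codons[start - 1], repeat_codon):
        start -= 1
    while end < len(codons) and matches_two_of_three(codons[end], repeat_codon):
        end += 1
    return start, end

# Toy in-frame codons (made up); the core CAG repeat occupies indices 3..7.
codons = ["ATG", "CCG", "CAA", "CAG", "CAG", "CAG", "CAG", "CAG", "CCG", "GTT"]
print(extend_repeat(codons, 3, 8))  # (1, 9)
```

In this toy case the flanking CAA and CCG codons are absorbed into the excluded region, while ATG and GTT, which match CAG at fewer than two positions, mark its limits.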
| Results |
|---|
Repeat Evolution
We identified 28 genes for which complete cDNA sequences were available for both mice and humans and which contained a (CAG)≥5 array in at least one species (table 1). Of these genes, 10 contained a CAG array in both species (B genes), 10 (of which 5 were human triplet expansion disease genes) contained a CAG array in the human sequence only (H genes), and 8 contained a CAG array in the mouse sequence only (M genes) (table 1). Thirty-one CAG arrays were identified in 20 human cDNAs, and 31 were identified in 18 mouse cDNAs. Mean CAG repeat lengths were 8.4 for humans and 8.0 for mice. The length distributions were not significantly different (P = 0.73, two-tailed Mann-Whitney U test). Group B genes might be expected to reveal any bias in CAG repeat length between humans and mice, as they contain repeats in both species, but no significant difference was detected in these genes (Wilcoxon signed-ranks test, P > 0.05, N = 14, two-tailed test). Thus, we found no evidence of a difference in CAG repeat length between humans and mice in this data set.
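For readers unfamiliar with the rank-based comparison used here, the following is a minimal pure-Python sketch of the Mann-Whitney U statistic with midranks for ties. It computes the statistic only, not a P value, and the two length lists are made-up toy values, not data from table 1. The function name `mann_whitney_u` is hypothetical.

```python
def mann_whitney_u(xs, ys):
    """Return (U_x, U_y) for two samples, assigning midranks to tied values."""
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n_x, n_y = len(xs), len(ys)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2.0
    return u_x, n_x * n_y - u_x

# Toy repeat lengths in codons (made up for illustration only).
human_lengths = [5, 6, 9]
mouse_lengths = [5, 7, 8]
print(mann_whitney_u(human_lengths, mouse_lengths))  # (4.5, 4.5)
```

Equal U values, as in this toy case, correspond to no tendency for either sample to rank higher; a P value would still need to come from tables or a statistics package.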
We also screened these sequences for amino acid repeats in the conceptual translation, as amino acid repeats are frequently encoded by mixtures of synonymous codons (Albà, Santibáñez-Koref, and Hancock 1999a, 1999b
5 in human proteins were of glutamine, compared with 47 of 82 repeats in mouse proteins. Mean lengths for these repeats were 12.7 for humans and 10.6 for mice (difference not significant, P = 0.50, two-tailed Mann-Whitney U test). Within group B, glutamine repeats were significantly longer in humans than in mice (Wilcoxon signed-ranks test, P < 0.05, N = 17, two-tailed test). The most common other classes of repeats were those of proline (12 in humans, 10 in mice), glycine (9 in humans, 7 in mice), and glutamic acid (7 in humans, 5 in mice). The higher proportion of glutamine repeats with respect to others in the mouse proteins was not significant (P > 0.05, chi-square, df = 1). We therefore found the relative tendencies for proteins to accumulate Gln versus other amino acid repeats to be similar in mice and humans. We also found that Gln repeats accumulating in human proteins tended to be longer than those in mouse proteins in group B. This tendency was not observed for the other gene groups.
To further investigate whether the lengths of human and mouse CAG repeats differed, we screened databases for tandem CAG repeats of length ≥7 in the two species. We identified all repeats, irrespective of their locations within genes, and did not restrict our search to pairs of homologous sequences. Mean lengths (in base pairs) for these repeats were 29.06 (median 27, N = 205) for humans and 36.05 (median 33, N = 63) for mice. The length distributions were significantly different (P < 0.001, Mann-Whitney U test), with mice tending to have longer CAG repeats than humans. We therefore found no bias toward longer CAG repeats in humans versus mice at the whole-genome level and, indeed, found evidence of the opposite bias.
There is no a priori reason to expect tandem repeats of CAG to lie in any particular reading frame of an exon unless selection has constrained the reading frames in which these repeats have been able to expand. Frame specificity of this kind has been reported previously (Stallings 1994). To test for any global difference in this pattern (and therefore in the selection causing it) between humans and mice, we investigated the locations of the identified repeats that lay within adequately annotated database sequences (table 2). CAG repeats were preferentially found in the reading frame encoding glutamine (reading frame 1 in table 2) in both humans and mice (P < 0.0001 for mice, humans, and overall; chi-square against an even distribution in all six reading frames, df = 5). There was no significant difference in repeat distribution between species (chi-square test for inhomogeneity in the 2 × 9 contingency table; P > 0.05; df = 8). Thus, there appear to be no strong differences in the selective forces acting on the locations of CAG repeats in the human and mouse genomes.
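The six reading frames of a CAG/CTG repeat, and the goodness-of-fit test against an even distribution, can be made concrete with a short sketch. The codon assignments follow the standard genetic code. The ordering of frames here is an assumption and need not match the numbering in table 2, and the counts passed to `chi_square_uniform` are made-up toy values, not the study's data.

```python
# Amino acids encoded by the six possible repeated codons (standard genetic code).
CODON_TO_AA = {"CAG": "Gln", "AGC": "Ser", "GCA": "Ala",
               "CTG": "Leu", "TGC": "Cys", "GCT": "Ala"}

def reverse_complement(seq):
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))

def frame_units(unit):
    """The three codons read from a tandem repeat of `unit`, one per frame."""
    doubled = unit * 2
    return [doubled[i:i + 3] for i in range(3)]

def six_frames(unit="CAG"):
    """(codon, amino acid) for the three forward and three reverse frames."""
    codons = frame_units(unit) + frame_units(reverse_complement(unit))
    return [(codon, CODON_TO_AA[codon]) for codon in codons]

def chi_square_uniform(counts):
    """Chi-square statistic and df for observed counts against an even split."""
    expected = sum(counts) / len(counts)
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, len(counts) - 1

def contingency_df(n_rows, n_cols):
    """Degrees of freedom of an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)

print(six_frames())
# Toy frame counts (made up): the first frame is over-represented.
print(chi_square_uniform([18, 6, 6, 6, 6, 6]))  # (15.0, 5)
print(contingency_df(2, 9))                      # 8
```

The last line reproduces the df = 8 quoted for the 2 × 9 table, and the six-category test has df = 5, as stated in the text.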
The results described in this section indicate no significantly greater length of CAG repeats in the human genome with respect to that of the mouse or in human proteins, and, indeed, the opposite appears to be the case. We did, however, observe a significant tendency for glutamine repeats to be longer in human group B proteins than in the homologous mouse proteins.
Base, Codon, and Amino Acid Composition
As base composition has been proposed to be an important factor in driving CAG repeat evolution (Brock, Anderson, and Monckton 1999), we attempted to identify common sequence properties of genes containing disease-causing CAG repeats and consistent changes in homologs containing repeats relative to homologs not containing repeats by analyzing the base compositions of the cDNA sequences for the 28 gene pairs. For both mouse and human homologs and for all gene groups, G+C compositions were on average higher than expected compositions calculated from the CUTG table of codon frequencies (table 3). The overall mean G+C composition (i.e., for groups B, M, and H pooled) deviated significantly from expectation in mice and humans (P < 0.05; two-tailed t-test). Third-codon-position base compositions were also higher than expected for all groups, but the pooled difference did not approach significance. Interspecies differences in base composition were not statistically significant. Thus, we found a generally high G+C content in the set of genes in both species, even when the gene did not contain a repeat.
High GC compositions could result from mutational bias at third codon positions, for example, due to the isochore location of the gene in question, or they could reflect the amino acid composition of the encoded proteins (Nakachi et al. 1997
As these analyses indicated significantly biased amino acid compositions, at least for groups B and H, we then calculated the relative representations of amino acids within the 28 proteins, again calculating expectations based on species codon frequencies (table 4 ). Significances of the observed/expected (O/E) values so calculated were estimated using the same set of sequences as above, calculating O/E values for the same numbers of random groups of 8 or 10 proteins. Confidence levels were estimated for each amino acid separately after adjusting for multiple tests. In both human and mouse data sets, four amino acids (Gln, Pro, His, and Ser) showed a significant overall excess (P < 0.05) and showed an excess in all three groups.
Finally, we investigated whether the observed base compositions of these genes could be explained solely on the basis of their amino acid compositions and average genomic codon usage or whether there was an excess of GC-richness that might be due to codon usage bias. This was done by calculating expected base compositions for proteins given their amino acid compositions and the CUTG synonymous codon usages (table 5 ). Amino acid composition and global genomic codon usage alone could account for the base compositions of these genes. We conclude that the biased base compositions of these genes are due to their unusual amino acid contents rather than any bias in base composition at synonymous codon sites.
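The expectation described here, base composition predicted from amino acid composition and genome-wide synonymous codon usage, can be sketched in a few lines. This is a minimal illustration rather than the spreadsheet calculation used in the study. The function name `expected_gc` is hypothetical, and the codon-usage fractions and amino acid counts are invented toy numbers, not CUTG values.

```python
def expected_gc(aa_counts, codon_usage):
    """Expected overall GC and third-position GC (GC3) fractions.

    aa_counts:   {amino_acid: number of residues in the protein}
    codon_usage: {amino_acid: {codon: fraction}}, fractions summing to 1
                 within each amino acid (genome-wide synonymous usage).
    """
    gc_total = 0.0
    gc3_total = 0.0
    n_codons = 0
    for aa, count in aa_counts.items():
        for codon, fraction in codon_usage[aa].items():
            weight = count * fraction  # expected number of this codon
            gc_total += weight * sum(base in "GC" for base in codon)
            gc3_total += weight * (codon[2] in "GC")
        n_codons += count
    return gc_total / (3 * n_codons), gc3_total / n_codons

# Toy inputs (made up): a protein of four Gln and four Lys residues.
toy_usage = {"Gln": {"CAG": 0.75, "CAA": 0.25},
             "Lys": {"AAG": 0.5, "AAA": 0.5}}
toy_counts = {"Gln": 4, "Lys": 4}
print(expected_gc(toy_counts, toy_usage))  # (0.375, 0.625)
```

If the observed GC and GC3 of a gene are close to these expected values, amino acid content alone accounts for its base composition, which is the comparison summarized in table 5.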
Substitution Rate
The accumulation of CAG repeats in genes might be related to the accumulation of base substitutions in the gene for two reasons. First, purifying selection could constrain the accumulation of repeats such that proteins or protein regions under higher levels of purifying selection would accumulate repeats more slowly than regions under weaker purifying selection, if at all. Second, Kruglyak et al. (1998)
|
To investigate whether repeats appear in regions of high mutation rate or low selection relative to the remainder of the protein in which they are located (Djian, Hancock, and Chana 1996
|
Substitution rates could be affected by the GC-richness of the sequences, as sequences under pressure to adopt an extreme base composition are unable to accept many substitutions. However, we observed no significant correlations between Ka or Ks and overall or third-codon-position base composition (see also Matassi, Sharp, and Gautier 1999
If a low Ka value is indicative of relatively strong selection acting on a protein, this might also influence the rate of change of the lengths of repeat regions. We therefore investigated the relationship between Ka and the difference in the length of the longest CAG repeat present in each gene, irrespective of the species in which it was found. Ka correlated positively and significantly with this difference (r = 0.420, P < 0.05).
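As a rough check on the reported correlation, the significance of a Pearson coefficient can be assessed with the statistic t = r√(n − 2)/√(1 − r²) on n − 2 degrees of freedom. The sketch below assumes one data point per gene pair (n = 28); the text does not state the sample size for this test, so that value is an assumption, as are the function names.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# r is taken from the text; n = 28 is an assumption (one point per gene pair).
print(round(t_from_r(0.420, 28), 2))  # 2.36
```

Under that assumption, t is about 2.36 on 26 degrees of freedom, which exceeds the two-tailed 5% critical value of roughly 2.06 and so agrees with the reported P < 0.05.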
In summary, these results indicate an association of new repeats with regions of high Ka (corresponding to regions of low purifying selection) and no association with regions of high Ks (corresponding to a high local mutation rate).
| Discussion |
|---|
We looked for evidence that would support the involvement of various forces in the evolutionary expansion of CAG repeats in human (and murine) genes. We first investigated the possibility of a general accumulation of CAG repeats in the human genome but not in other lineages. We found no evidence for preferential accumulation or expansion of CAG repeats in the human genome relative to that of the mouse by comparing either the numbers of genes in the public databases containing CAG repeats in either species, the lengths of the CAG repeats they contain, or the overall length distributions of anonymous CAG repeats in the databases. The latter analysis indicated longer CAG repeats in the mouse than in the human genome. We found no evidence of any difference in the distribution of CAG repeats within coding regions between the species. While these analyses were subject to biases because of numerous screens for long CAG repeats associated with disease (Riggins et al. 1992
Our data also do not support the suggestion that local base composition has driven the accumulation of repeats within the 28 pairs of homologous repeat-containing genes we considered (Jurka and Pethiyagoda 1995; Nakachi et al. 1997; Nishizawa and Nishizawa 1998; Brock, Anderson, and Monckton 1999). Although we found higher GC and GC3 contents than expected for all of the gene groups studied here, this reflected solely the biased amino acid compositions of the gene products and was not the result of any preferential use of synonymous codons with GC-rich third positions, as would be expected if mutation toward a biased base composition were the force driving the observed biases. We also did not find any difference in base composition between genes containing repeats and genes not containing repeats, which would be expected if changes in base composition drove repeat evolution.
Finally, we found no relationship between mutation rate, as indicated by the synonymous substitution rate, and the emergence of repeats during evolution. This is not consistent with a model whereby repeat evolution in a genomic locality reflects the balance between point and slippage mutation rates there (Kruglyak et al. 1998). However, there is evidence that substitution rates in regions flanking CA microsatellites correlate inversely with repeat length in a larger data set (unpublished data). It is therefore possible that effects of this kind also contribute to the evolution of CAG repeats in genes but that these effects are relatively weak in this data set and/or could not be detected here because of the data set's relatively small size and the correlation between Ka and Ks.
We found three strong patterns in our data set: overrepresentations of certain amino acids, differences in the nonsynonymous substitution rates observed in group B genes compared with group H and M genes, and elevated nonsynonymous substitution rates in the vicinity of repeats in group H and M genes. At the level of amino acid composition, we observed significant overrepresentation of four amino acids, Gln, Pro, Ser, and His, in all genes studied. Along with Gln repeats, we also observed numerous Pro repeats in these proteins. It is likely that the biased amino acid compositions of these genes reflect in some way functional selection on these genes. As these amino acid composition biases are similar in human and mouse proteins, this selection must have taken place before the divergence of the two lineages, one of the most ancient eutherian divergences. The shared overrepresentation of these amino acids between species also indicates that changes in amino acid bias have not driven repeat accumulation. However, the biased amino acid compositions of repeat-containing proteins indicate that such bias might provide a breeding ground for new repeats because the genes encoding these proteins contain an unusual concentration of Gln codons and related codons such as CCG (Pro). The preference for polyglutamine repeats to occur in proteins with these amino acid composition biases could therefore reflect either selection favoring polyglutamine repeats in these proteins as part of a selection for a high Gln content, preferential seeding of CAG repeats in genes with high concentrations of Gln and high GC-content, or both.
We also found a significant difference in overall Ka (but not Ks) between group B proteins and other proteins and a significant bias toward higher Ka (but not Ks) near the Gln repeat in group H+M but not group B proteins. The Ka values for regions flanking repeats in group H+M genes were more than twice the average for human-mouse sequence pairs calculated by Makalowski and Boguski (1998), 0.201 compared with 0.090, consistent with our suggestion of high rates of sequence change near disease-causing repeats (Djian, Hancock, and Chana 1996), although this difference was not significant (Mann-Whitney U test). These observations indicate that there have been considerably larger differences in strength of selection than in mutation rate in these proteins. If a high Ka value indicates a low level of purifying selection, polyglutamine repeats in proteins in groups H and M could have evolved as effectively neutral structures in a low-purifying-selection environment. Repeats in the group B genes, on the other hand, may have been conserved in a high-purifying-selection environment. The significant correlation between Ka and CAG length difference between species is consistent with this.
The stronger purifying selection acting on the polyglutamine repeats in group B proteins is also consistent with the observation of a significant difference in the lengths of polyglutamine repeats of humans and mice in these genes: there may be differences in the strength or type of selection acting on these repeats between the two species. This, in turn, may reflect in some way the functions of these structures in the two species. However, this difference in repeat length appears to be a special property of genes that have a repeat in both species, as lengths of CAG repeats did not show any evidence of significant difference between species overall. This difference would therefore not appear to be relevant to neutrally evolving repeats, such as those found in the human disease genes.
Whether or not polyglutamine repeats in proteins affect function remains unclear. Sequence analysis has not provided clear evidence for their functional importance (Treier, Pfeifle, and Tautz 1989; Green and Wang 1994; Karlin and Burge 1996; Michalakis and Veuille 1996; Tautz and Nigro 1998; Schmid and Tautz 1999), but biochemical studies have indicated effects on protein-protein interactions (Kazemi-Esfarjani, Trifiro, and Pinsky 1995; Lanz et al. 1995; Pinto and Lobe 1996; Schwechheimer, Smith, and Bevan 1998). Our data may explain this apparent discrepancy, as they suggest that polyglutamine repeats may be neutral in some proteins and not in others and that rapidly evolving repeats are more nearly neutral than conserved repeats. Searches for a functional role for polyglutamine repeats in proteins should therefore focus on proteins, such as those in our group B, that show conservation of Gln repeats over long periods of evolutionary time.
In conclusion, we suggest that the following interplay of forces influences the emergence of polyglutamine repeats. Glutamine repeats emerge preferentially in a sequence environment biased toward an overrepresentation of Gln codons (and possibly also related codons such as CCG). These concentrations occur in a class of proteins enriched in these codons by selection for a high content of Gln (as well as Pro, His, and Ser). Repeats emerge in regions of proteins that are subject to lower-than-average levels of purifying selection (Nishizawa, Nishizawa, and Kim 1999), as indicated by their nonsynonymous divergence rate, although the whole proteins are not subject to atypically low levels of purifying selection. We therefore propose that emerging repeats evolve as essentially neutral structures. As such, we would expect them to be gained or lost in a manner that reflects the underlying dynamics of the mutational process, thought to be predominantly replication slippage. Recent evidence suggests that slippage shows a bias toward expansion for short repeats coupled with shortening of longer repeats (Ellegren 2000; Xu et al. 2000), which would give rise to net expansion of new repeats. However, changes in the strength of purifying selection acting on the region of the protein containing the repeat may result in the repeat ceasing to be a neutral structure and becoming fixed in length, as appears to have happened in the proteins in our group B, which contain a repeat in both species. Fixation of repeats, or the susceptibility of proteins to incorporation of them, may reflect the general functional class of the protein concerned, as certain classes of proteins in Saccharomyces cerevisiae, notably transcription factors and protein kinases, are significantly enriched in Gln repeats (Albà, Santibáñez-Koref, and Hancock 1999b). If purifying selection plays an important role in regulating the emergence of CAG repeats in proteins, the recent suggestion that nonsynonymous substitution rates may vary systematically around mammalian genomes (Williams and Hurst 2000), perhaps reflecting variation in recombination frequency along chromosomes, may have implications for the chromosomal distribution of repeat-containing proteins.
| Acknowledgements |
|---|
We thank the U.K. Medical Research Council for support. E.A.W. received an MRC postgraduate studentship.
| Footnotes |
|---|
Diethard Tautz, Reviewing Editor
1 Keywords: CAG repeats, triplet expansion diseases, simple sequences, natural selection
2 Address for correspondence and reprints: John M. Hancock, Department of Computer Science, Royal Holloway University of London, Egham, Surrey TW20 0EX, United Kingdom. j.hancock@dcs.rhul.ac.uk
| Literature Cited |
|---|
Abbott, C., and D. Chambers. 1994. Analysis of CAG trinucleotide repeats from mouse cDNA sequences. Ann. Hum. Genet. 58:8794[ISI][Medline]
Albà, M. M., M. F. Santibáñez-Koref, and J. M. Hancock. 1999a. Conservation of polyglutamine tract size between mice and humans depends on codon interruption. Mol. Biol. Evol. 16:16411644
. 1999b. Amino acid reiterations in yeast are overrepresented in particular classes of proteins and show evidence of a slippage-like mutational process. J. Mol. Evol. 49:789797
Albanese, V., S. Holbert, C. Saada et al. (14 co-authors). 1998. CAG/CTG and CGG/GCC repeats in human brain reference cDNAs: outcome in searching for new dynamic mutations. Genomics 47:414-418
Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410
Amos, W. 1999. A comparative approach to the study of microsatellite evolution. Pp. 66-79 in D. B. Goldstein and C. Schlötterer, eds. Microsatellites: evolution and applications. Oxford University Press, Oxford, England
Aoki, M., L. Koranyi, A. C. Riggs et al. (11 co-authors). 1996. Identification of trinucleotide repeat-containing genes in human pancreatic islets. Diabetes 45:157-164
Brock, G. J. R., N. H. Anderson, and D. G. Monckton. 1999. Cis-acting modifiers of expanded CAG/CTG triplet repeat expandability: associations with flanking GC content and proximity to CpG islands. Hum. Mol. Genet. 8:1061-1067
Brohede, J., and H. Ellegren. 1999. Microsatellite evolution: polarity of substitutions within repeats and neutrality of flanking sequences. Proc. R. Soc. Lond. B Biol. Sci. 266:825-833
Bulle, F., N. Chiannilkulchai, A. Pawlak, J. Weissenbach, G. Gyapay, and G. Guellaen. 1997. Identification and chromosomal localization of human genes containing CAG/CTG repeats expressed in testis and brain. Genome Res. 7:705-715
Chambers, D. M., and C. M. Abbott. 1996. Isolation and mapping of novel mouse brain cDNA clones containing trinucleotide repeats, and demonstration of novel alleles in recombinant inbred strains. Genome Res. 6:715-723
Djian, P., J. M. Hancock, and H. S. Chana. 1996. Codon repeats in genes associated with human diseases: fewer repeats in the genes of nonhuman primates and nucleotide substitutions concentrated at the sites of reiteration. Proc. Natl. Acad. Sci. USA 93:417-421
Ellegren, H. 2000. Heterogeneous mutation processes in human microsatellite DNA sequences. Nat. Genet. 24:400-402
Ellegren, H., C. R. Primmer, and B. C. Sheldon. 1995. Microsatellite evolution: directionality or bias? Nat. Genet. 11:360-362
Genetics Computer Group. 1997. Wisconsin package. Version 9.1. GCG, Madison, Wis.
Graur, D. 1985. Amino acid composition and the evolutionary rates of protein-coding genes. J. Mol. Evol. 22:53-62
Green, H., and N. Wang. 1994. Codon reiteration and the evolution of proteins. Proc. Natl. Acad. Sci. USA 91:4298-4302
Hancock, J. M., P. J. Shaw, F. Bonneton, and G. A. Dover. 1999. High sequence turnover in the regulatory regions of the developmental gene hunchback in insects. Mol. Biol. Evol. 16:253-265
Hein, J. J. 1990. Unified approach to alignment and phylogenies. Methods Enzymol. 183:626-645
Higgins, D. G., and P. M. Sharp. 1989. Fast and sensitive multiple sequence alignments on a microcomputer. Comput. Appl. Biosci. 5:151-153
Jiang, J. X., R. H. Deprez, E. C. Zwarthoff, and P. H. Riegman. 1995. Characterization of four novel CAG repeat-containing cDNAs. Genomics 30:91-93
Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21-132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York
Jurka, J., and C. Pethiyagoda. 1995. Simple repetitive DNA sequences from primates: compilation and analysis. J. Mol. Evol. 40:120-126
Karlin, S., and C. Burge. 1996. Trinucleotide repeats and long homopeptides in genes and proteins associated with nervous system disease and development. Proc. Natl. Acad. Sci. USA 93:1560-1565
Kazemi-Esfarjani, P., M. A. Trifiro, and L. Pinsky. 1995. Evidence for a repressive function of the long polyglutamine tract in the human androgen receptor: possible pathogenetic relevance for the (CAG)n-expanded neuronopathies. Hum. Mol. Genet. 4:523-527
Kim, S. J., B. H. Shon, J. H. Kang, K. S. Hahm, O. J. Yoo, Y. S. Park, and K. K. Lee. 1997. Cloning of novel trinucleotide-repeat (CAG) containing genes in mouse brain. Biochem. Biophys. Res. Commun. 240:239-243
King, B. L., G. Sirugo, J. H. Nadeau, T. J. Hudson, K. K. Kidd, B. M. Kacinski, and M. Schalling. 1998. Long CAG/CTG repeats in mice. Mamm. Genome 9:392-393
Kruglyak, S., R. T. Durrett, M. D. Schug, and C. F. Aquadro. 1998. Equilibrium distributions of microsatellite repeat length resulting from a balance between slippage events and point mutations. Proc. Natl. Acad. Sci. USA 95:10774-10778
Kumar, S., T. Tamura, and M. Nei. 1993. MEGA: molecular evolutionary genetics analysis. Version 1.01. Pennsylvania State University, University Park
Lanz, R. B., S. Wielands, M. Hug, and S. Rusconi. 1995. A transcriptional repressor obtained by alternative translation of a trinucleotide repeat. Nucleic Acids Res. 23:138-145
Li, S. H., M. G. McInnis, R. L. Margolis, S. E. Antonarakis, and C. A. Ross. 1993. Novel triplet repeat containing genes in human brain: cloning, expression, and length polymorphisms. Genomics 16:572-579
Makalowski, W., and M. S. Boguski. 1998. Evolutionary parameters of the transcribed mammalian genome: an analysis of 2,820 orthologous rodent and human sequences. Proc. Natl. Acad. Sci. USA 95:9407-9412
Margolis, R. L., M. R. Abraham, S. B. Gatchell, S. H. Li, A. S. Kidwai, T. S. Breschel, O. C. Stine, C. Callahan, M. G. McInnis, and C. A. Ross. 1997. cDNAs with long CAG trinucleotide repeats from human brain. Hum. Genet. 100:114-122
Matassi, G., P. M. Sharp, and C. Gautier. 1999. Chromosomal location effects on gene sequence evolution in mammals. Curr. Biol. 9:786-791
Michalakis, Y., and M. Veuille. 1996. Length variation of CAG/CAA trinucleotide repeats in natural populations of Drosophila melanogaster and its relation to the recombination rate. Genetics 143:1713-1725
Morin, P. A., P. Mahboubi, S. Wedel, and J. Rogers. 1998. Rapid screening and comparison of human microsatellite markers in baboons: allele size is conserved, but allele number is not. Genomics 53:12-20
Mouchiroud, D., C. Gautier, and G. Bernardi. 1995. Frequencies of synonymous substitutions in mammals are gene-specific and correlated with frequencies of nonsynonymous substitutions. J. Mol. Evol. 40:107-113
Nakachi, Y., T. Hayakawa, H. Oota, K. Sumiyama, L. Wang, and S. Ueda. 1997. Nucleotide compositional constraints on genomes generate alanine-, glycine-, and proline-rich structures in transcription factors. Mol. Biol. Evol. 14:1042-1049
Nakamura, Y., T. Gojobori, and T. Ikemura. 2000. Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Res. 25:244-245
Neri, C., V. Albanese, A. S. Lebre et al. (23 co-authors). 1996. Survey of CAG/CTG repeats in human cDNAs representing new genes: candidates for inherited neurological disorders. Hum. Mol. Genet. 5:1001-1009
Nishizawa, M., and K. Nishizawa. 1998. Biased usages of arginines and lysines in proteins are correlated with local-scale fluctuations of the G + C content of DNA sequences. J. Mol. Evol. 47:385-393
Nishizawa, K., M. Nishizawa, and K. S. Kim. 1999. Tendency for local repetitiveness in amino acid usages in modern proteins. J. Mol. Biol. 294:937-953
Ohta, T., and Y. Ina. 1995. Variation in synonymous substitution rates among mammalian genes and the correlation between synonymous and nonsynonymous divergences. J. Mol. Evol. 41:717-720
Pawlak, A., N. Chiannikulchai, W. Ansorge, F. Bulle, J. Weissenbach, G. Gyapay, and G. Guellaen. 1998. Identification and mapping of 26 human testis mRNAs containing CAG/CTG repeats. Mamm. Genome 9:745-748
Pearson, W. R., and D. J. Lipman. 1988. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85:2444-2448
Pinto, M., and C. G. Lobe. 1996. Products of the grg (Groucho-related gene) family can dimerize through the amino-terminal Q domain. J. Biol. Chem. 271:33026-33031
Reddy, P. H., E. Stockburger, P. Gillevet, and D. A. Tagle. 1997. Mapping and characterization of novel (CAG)n repeat cDNAs from adult human brain derived by the oligo capture method. Genomics 46:174-182
Riggins, G. J., L. K. Lokey, J. L. Chastain, H. A. Leiner, S. L. Sherman, K. D. Wilkinson, and S. T. Warren. 1992. Human genes containing polymorphic trinucleotide repeats. Nat. Genet. 2:186-191
Rubinsztein, D. C. 1999. Trinucleotide expansion mutations cause diseases which do not conform to classical Mendelian expectations. Pp. 80-97 in D. B. Goldstein and C. Schlötterer, eds. Microsatellites: evolution and applications. Oxford University Press, Oxford, England
Rubinsztein, D. C., B. Amos, and G. Cooper. 1999. Microsatellite and trinucleotide-repeat evolution: evidence for mutational bias and different rates of evolution in different lineages. Philos. Trans. R. Soc. Lond. B Biol. Sci. 354:1095-1099
Rubinsztein, D. C., W. Amos, J. Leggo, S. Goodburn, S. Jain, S. H. Li, R. L. Margolis, C. A. Ross, and M. A. Ferguson-Smith. 1995a. Microsatellite evolution - evidence for directionality and variation in rate between species. Nat. Genet. 10:337-343
Rubinsztein, D. C., W. Amos, J. Leggo, S. Goodburn, R. S. Ramesar, J. Old, R. Bontrop, R. McMahon, D. E. Barton, and M. A. Ferguson-Smith. 1994. Mutational bias provides a model for the evolution of Huntington's disease and predicts a general increase in disease prevalence. Nat. Genet. 7:525-530
Rubinsztein, D. C., J. Leggo, G. A. Coetzee, R. A. Irvine, M. Buckley, and M. A. Ferguson-Smith. 1995b. Sequence variation and size ranges of CAG repeats in the Machado-Joseph disease, spinocerebellar ataxia type 1 and androgen receptor genes. Hum. Mol. Genet. 4:1585-1590
Schmid, K. J., and D. Tautz. 1999. A comparison of homologous developmental genes from Drosophila and Tribolium reveals major differences in length and trinucleotide repeat content. J. Mol. Evol. 49:558-566
Schwechheimer, C., C. Smith, and M. W. Bevan. 1998. The activities of acidic and glutamine-rich transcriptional activation domains in plant cells: design of modular transcription factors for high-level expression. Plant Mol. Biol. 36:195-204
Stallings, R. L. 1994. Distribution of trinucleotide microsatellites in different categories of mammalian genomic sequence: implications for human genetic diseases. Genomics 21:116-121
Tautz, D., and L. Nigro. 1998. Microevolutionary divergence pattern of the segmentation gene hunchback in Drosophila. Mol. Biol. Evol. 15:1403-1411
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680
Ticher, A., and D. Graur. 1989. Nucleic acid composition, codon usage, and the rate of synonymous substitution in protein-coding genes. J. Mol. Evol. 28:286-298
Treier, M., C. Pfeifle, and D. Tautz. 1989. Comparison of the gap segmentation gene hunchback between Drosophila melanogaster and Drosophila virilis reveals novel modes of evolutionary change. EMBO J. 8:1517-1525
Williams, E. J. B., and L. D. Hurst. 2000. The proteins of linked genes evolve at similar rates. Nature 407:900-903
Xu, X., M. Peng, Z. Fang, and X. Xu. 2000. The direction of microsatellite mutations is dependent upon allele length. Nat. Genet. 24:396-399
Zuhlke, C., R. Kiehl, A. Johannsmeyer, K. H. Grzeschik, and E. Schwinger. 1999. Isolation and characterization of novel CAG repeat containing genes expressed in human brain. DNA Seq. 10:1-6