MBE Advance Access originally published online on March 8, 2007
Molecular Biology and Evolution 2007 24(5):1097-1100; doi:10.1093/molbev/msm051
Letters
Is There Evidence for Convergent Evolution around Human Microsatellites?

* Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin, Ireland
Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
E-mail: matthew.webster{at}ebc.uu.se.
Abstract
A study by Vowles and Amos (2004)
Key Words: microsatellite, convergent evolution, simulation, genome evolution, mutation bias
Vowles and Amos (2004) suggested that microsatellites generate biases in the rate and spectrum of mutations in their flanking sequences. If true, this has important ramifications, because it implies that a large fraction (>30%) of the human genome is subject to a previously unrecognized mode of sequence evolution. Vowles and Amos searched the human genome for instances of (AC)5 dinucleotide repeats that were at least 100 bp from the nearest (AC)2 repeat and examined the 100 bp on either side of each repeat for common characteristics. We reproduced this analysis using all (AC)5 repeats in the human genome (NCBI build 35). A summary of the patterns observed in the flanking regions is presented in figure 1A (7,856 repeats in total). There is a pronounced periodicity in base frequencies, which decays with distance from the repeat. The major conclusion of Vowles and Amos was that these patterns are due to a mutagenic effect of the microsatellite sequence on its flanking regions, implying widespread biases in point mutation patterns in the human genome.
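The search and positional tabulation described above can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes a single forward-strand sequence string, treats "(AC)5" as exactly five AC units (the regex lookarounds reject matches inside longer arrays), and reads "at least 100 bp from the nearest (AC)2" as "neither 100-bp flank contains ACAC". The helper names `find_isolated_ac5` and `flank_base_frequencies` are hypothetical.

```python
import re
from collections import Counter

# Exactly five AC units: the lookarounds reject matches inside longer AC arrays.
AC5 = re.compile(r"(?<!AC)(?:AC){5}(?!AC)")


def find_isolated_ac5(seq, flank=100):
    """(start, end) of (AC)5 matches with no (AC)2, i.e. no 'ACAC',
    in the `flank` bp on either side."""
    hits = []
    for m in AC5.finditer(seq):
        start, end = m.start(), m.end()
        if start < flank or end + flank > len(seq):
            continue  # require a complete flanking window on both sides
        if "ACAC" in seq[start - flank:start] or "ACAC" in seq[end:end + flank]:
            continue  # an (AC)2 lies too close to the repeat
        hits.append((start, end))
    return hits


def flank_base_frequencies(seq, hits, flank=100):
    """Base frequencies at distance d = 1..flank from the repeat, separately
    for the 5' (upstream) and 3' (downstream) flanks."""
    if not hits:
        return [], []
    five = [Counter() for _ in range(flank)]
    three = [Counter() for _ in range(flank)]
    for start, end in hits:
        for d in range(1, flank + 1):
            five[d - 1][seq[start - d]] += 1    # d bases upstream of the repeat
            three[d - 1][seq[end + d - 1]] += 1  # d bases downstream of the repeat
    n = len(hits)

    def to_freq(counts):
        return [{b: c[b] / n for b in "ACGT"} for c in counts]

    return to_freq(five), to_freq(three)
```

On real chromosome sequence, soft-masked (lowercase) bases would need `seq.upper()` before searching; plotting the per-base frequency against distance then gives profiles of the kind summarized in figure 1.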
We wished to test an alternative possibility: that periodicity in base frequencies occurs around perfect repeats because their flanking regions are frequently also derived from tandem repeats. Sequence variation at human microsatellites is often complex. Many loci contain interruptions or consist of more than one repeat motif (Bull et al. 1999).
We generated an artificial ancestral sequence consisting of a large number of dinucleotide repeat arrays, each of 25 units, separated by 100 bp of random sequence. The motif of each array was chosen to match the known repeat composition of the human genome (reported in Katti et al. 2001). The values used were 26% AT/TA, 20% AG/GA/CT/TC, and 54% TG/GT/AC/CA, with a negligible contribution from CG/GC. The composition of the random intervening sequence was chosen to match the known dinucleotide composition of the human genome. This length of random sequence was included to ensure that the flanking regions of each microsatellite were completely free of tandem repeats at the start of each simulation. We estimated the average pattern of nucleotide substitution in the human genome by inferring, by parsimony, the relative frequency of each of the 12 possible single-base changes in nongenic, nonrepetitive sequence alignments of human and chimpanzee, with baboon as an outgroup (taken from Smith et al. 2002). These alignments revealed a transition bias of 3.6 and a bias in mutations from G:C to A:T of 1.3. The microsatellite slippage mutation rate was set at 1,000 times the average point mutation rate.
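The two simulation inputs described in this paragraph can be sketched as below. Only the motif-class weights (26/20/54%) and the two bias values come from the text; the rest is assumed for illustration. The spacer is drawn from independent bases with a hypothetical placeholder composition of about 41% GC (the letter matched the human dinucleotide composition instead), the motif within each class is chosen uniformly, and `substitution_weights` turns the reported biases into relative weights for the 12 single-base changes by multiplying transitions by 3.6 and G:C to A:T changes by 1.3. The authors used the parsimony-inferred frequencies directly, so this parameterization is illustrative only, and all names are hypothetical.

```python
import random

# Motif classes and weights as given in the text (CG/GC omitted as negligible).
MOTIF_CLASSES = [
    (("AT", "TA"), 0.26),
    (("AG", "GA", "CT", "TC"), 0.20),
    (("TG", "GT", "AC", "CA"), 0.54),
]
# Hypothetical placeholder: independent bases at roughly 41% GC, standing in
# for the human dinucleotide composition used in the letter.
BASE_WEIGHTS = {"A": 0.295, "C": 0.205, "G": 0.205, "T": 0.295}
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ancestral_sequence(n_arrays, units=25, spacer=100, seed=0):
    """Random spacers alternating with perfect dinucleotide arrays.
    Returns the sequence and the (start, end) of every array."""
    rng = random.Random(seed)
    bases, base_w = zip(*BASE_WEIGHTS.items())
    classes, class_w = zip(*MOTIF_CLASSES)
    parts, arrays, pos = [], [], 0
    for _ in range(n_arrays):
        parts.append("".join(rng.choices(bases, weights=base_w, k=spacer)))
        pos += spacer
        # Pick a motif class by weight, then a motif uniformly within it.
        motif = rng.choice(rng.choices(classes, weights=class_w)[0])
        parts.append(motif * units)
        arrays.append((pos, pos + 2 * units))
        pos += 2 * units
    parts.append("".join(rng.choices(bases, weights=base_w, k=spacer)))
    return "".join(parts), arrays


def substitution_weights(ts_bias=3.6, gc_to_at=1.3):
    """Relative weights of the 12 single-base changes under an assumed
    parameterization: transitions multiplied by ts_bias, and changes from
    G or C to A or T multiplied by gc_to_at."""
    weights = {}
    for src in "ACGT":
        for dst in "ACGT":
            if src != dst:
                w = ts_bias if (src, dst) in TRANSITIONS else 1.0
                if src in "GC" and dst in "AT":
                    w *= gc_to_at
                weights[(src, dst)] = w
    return weights
```

Normalizing the weights that start from a given base gives the probabilities of each replacement when that base mutates.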
In each cycle of the simulation, we applied a round of slippage mutations with an equal chance of expansions or contractions. The probability that a microsatellite expanded or contracted by n units was
which is similar to a geometric distribution. Expansions were performed by randomly choosing a dinucleotide within an array and repeating it the appropriate number of times. Contractions were performed by randomly deleting a tract of the appropriate number of bases within an array. Our assumption of a high slippage rate, coupled with a strong bias toward expansions or contractions of a single repeat unit, is compatible with commonly accepted models of the microsatellite mutation process (Ellegren 2004). We also applied a round of point mutations using the probabilities calculated from the primate alignments. We ran each simulation until the arrays had accumulated an average of 50 slippage mutations each. Assuming a human–chimpanzee split of 5 Myr, this corresponds to an accumulation of point mutations equivalent to approximately 43 Myr. We then searched the sequence for (AC)5 repeats with no (AC)2 repeats in their flanking sequence, using the same procedure as for the human genome, and retained an identical sample size (n = 7,856) for further analysis.
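The slippage step lends itself to a short sketch. Because the letter's formula for the step-size distribution is not reproduced here, the code substitutes a plain geometric distribution with a hypothetical parameter p = 0.8. It also evolves a single array in isolation, applies one slippage event per round, and mutates each site with probability 1/1000 per round (an assumed reading of the 1,000-fold rate ratio), drawing replacement bases uniformly rather than from the primate spectrum. All function names are illustrative.

```python
import random


def step_size(rng, p=0.8):
    """Units added or removed: geometric on n >= 1, P(n) = p * (1 - p)**(n - 1).
    Stands in for the letter's own distribution; p = 0.8 is hypothetical."""
    n = 1
    while rng.random() >= p:  # continue with probability 1 - p
        n += 1
    return n


def slip(array, rng, p=0.8):
    """One slippage event: expansion or contraction with equal probability."""
    if len(array) < 2:
        return array  # too short to hold a dinucleotide
    n = step_size(rng, p)
    if rng.random() < 0.5:
        # Expansion: pick a random dinucleotide in the array and repeat it so
        # that it occurs n + 1 times in a row (a net gain of 2n bp).
        i = rng.randrange(len(array) - 1)
        return array[:i] + array[i:i + 2] * (n + 1) + array[i + 2:]
    # Contraction: delete a randomly placed 2n-bp tract, capped at the array length.
    k = min(2 * n, len(array))
    i = rng.randrange(len(array) - k + 1)
    return array[:i] + array[i + k:]


def point_mutate(seq, rng, rate=1e-3):
    """Substitute each site with probability `rate`; the new base is drawn
    uniformly from the other three (a simplification)."""
    out = list(seq)
    for j, base in enumerate(out):
        if rng.random() < rate:
            out[j] = rng.choice([b for b in "ACGT" if b != base])
    return "".join(out)


def evolve_array(array, rounds=50, seed=0):
    """Alternate slippage and point-mutation rounds on one array."""
    rng = random.Random(seed)
    for _ in range(rounds):
        array = slip(array, rng)
        array = point_mutate(array, rng)
    return array


decayed = evolve_array("AC" * 25, rounds=50, seed=1)
```

Because a perfect array only gains or loses whole dinucleotides under `slip`, its periodicity is preserved by slippage alone; the decay that leaves period-2 remnants next to a surviving (AC)5 comes from the interleaved point mutations.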
The patterns of base frequencies found in the flanking regions of (AC)5 repeats generated by simulation (fig. 1B) are similar to those observed in the human genome, and both exhibit periodicities of the same phase. These results demonstrate that periodic patterns in flanking regions similar to those presented by Vowles and Amos can be generated by the accumulation of neutral mutations in microsatellites. To exclude the possibility that flanking sequences contain remnants of ancestral microsatellites, Vowles and Amos excluded those containing (AC)2 motifs from their analysis. Our results indicate that this filter is inadequate: the flanking regions of many microsatellites probably contain remnants of repetitive sequences that have decayed so far that they can no longer be detected by searching for particular motifs.
Vowles and Amos also reported that the strength and pattern of base periodicity depends on the 2 bases immediately flanking the (AC)5 repeat (the cassette) with some patterns exhibiting 5' to 3' asymmetry according to cassette type. Figure 2A shows the patterns observed in our reanalysis of the human genome divided by cassette. Nonrandom patterns of base frequencies are mainly restricted to cassettes with a 5' T or 3' A. Similar patterns can be observed in the simulated data (fig. 2B). However, in general, the periodicity is stronger in the simulations, and some of the cassettes (notably C/T) exhibit patterning that is not seen in the real data. In the simulations, cassette T/A has a strong and symmetrical periodicity in base frequencies. All other cassettes with a 5' T have stronger patterning in the 5' than 3' flanking sequence, whereas all other cassettes with a 3' A have stronger patterning in the 3' than 5' flanking sequence. These cassettes also show the strongest base periodicity and similar asymmetries in the real data, indicating that the decay of microsatellites by accumulation of single-base substitutions and slippage mutations could be an important process in generating these observed patterns. The cause of the asymmetric patterns in both the real and simulated data is unclear, but one possibility is that certain cassettes are more likely to be located at the 5' or 3' end of ancestral repeat tracts, which weakens the periodicity on one side of the repeat motif.
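Splitting repeats by cassette needs only the base on each side of the repeat. The sketch below assumes a cassette is written as "5' base/3' base" on the forward strand (so "T/A" is a 5' T and a 3' A) and that repeat coordinates come from a search like the one described earlier; `cassette` and `group_by_cassette` are hypothetical helper names.

```python
from collections import defaultdict


def cassette(seq, start, end):
    """Label "5' base/3' base" for a repeat occupying seq[start:end]."""
    return f"{seq[start - 1]}/{seq[end]}"


def group_by_cassette(seq, hits):
    """Group (start, end) repeat coordinates by cassette type, skipping
    repeats that touch either end of the sequence."""
    groups = defaultdict(list)
    for start, end in hits:
        if start >= 1 and end < len(seq):
            groups[cassette(seq, start, end)].append((start, end))
    return dict(groups)
```

Each group can then be tabulated separately by position, which makes 5' versus 3' asymmetries within a cassette type directly comparable.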
To obtain the desired sample size of 7,856, we needed to search 1,353,798 simulated repeat arrays, indicating that about 0.6% of the ancestral arrays generate (AC)5 repeats that fit the criteria under our simulation conditions. More than 99% of the (AC)5 repeats in the simulated sequence are derived from the ancestral arrays rather than from the flanking sequence, indicating that microsatellite mutation processes are the primary way of generating these motifs in our simulations. In the human genome, (AC)5 repeats must be produced by a variety of processes other than the decay we have simulated here. However, these numbers indicate that it is plausible for a subset to be derived from a process similar to the one we have simulated. Because the periodicities generated by our simulations are much stronger than those observed in the human genome, only a fraction of human (AC)5 repeats would need to have formed in this way to generate the observed periodicities.
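As a quick check of the 0.6% figure, and as an added illustration of why a fraction of repeats would suffice, assume (hypothetically) that the periodicity amplitude of a mixed sample scales linearly with the fraction f of repeats formed by simulated-style decay, with amplitude A_sim for those repeats and none for the rest:

```latex
\frac{7\,856}{1\,353\,798} \approx 5.80 \times 10^{-3} \approx 0.6\%,
\qquad
A_{\mathrm{obs}} \approx f\,A_{\mathrm{sim}}
\;\Longrightarrow\;
f \approx \frac{A_{\mathrm{obs}}}{A_{\mathrm{sim}}} < 1 .
```

Since the simulated periodicities are much stronger than the human ones, the implied f under this linear-mixing assumption is well below 1.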
We also examined the effect of varying the length of the search pattern over all lengths from (AC)2 to (AC)15. We observed the same pattern reported by Vowles and Amos: base periodicity in the flanks becomes stronger as the search length increases over short lengths and then declines at longer lengths. In our simulations, the periodicity peaks at around (AC)8 (see Supplementary Material online). This is because shorter AC arrays have a greater chance of appearing outside ancestral arrays, whereas longer AC tracts are likely to occupy a greater proportion of the full length of the microsatellite, so that the patterning extends for a shorter distance.
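The length scan can be sketched by generalizing the search to (AC)k and scoring each set of flanks for period-2 structure. The letter does not state here how periodicity strength was measured, so `period2_strength` uses a hypothetical score: for each flank and base b, the absolute mean of (-1)^d f_b(d) over distances d = 1..100, where f_b(d) is the frequency of b at distance d. The score is large when a base alternates between odd and even positions and near zero otherwise. Only the forward strand is searched, "(AC)k" means exactly k units, and all names are illustrative.

```python
import re


def isolated_ac_repeats(seq, k, flank=100):
    """(start, end) of exactly-k-unit AC arrays whose flank-bp windows
    contain no (AC)2 ('ACAC')."""
    pattern = re.compile(rf"(?<!AC)(?:AC){{{k}}}(?!AC)")
    hits = []
    for m in pattern.finditer(seq):
        s, e = m.start(), m.end()
        if s < flank or e + flank > len(seq):
            continue  # incomplete flanking window
        if "ACAC" in seq[s - flank:s] or "ACAC" in seq[e:e + flank]:
            continue
        hits.append((s, e))
    return hits


def period2_strength(seq, hits, flank=100):
    """Hypothetical score: sum over bases of |mean_d (-1)**d * f_b(d)|,
    averaged over the 5' and 3' flanks."""
    if not hits:
        return 0.0
    n = len(hits)
    score = 0.0
    for upstream in (True, False):
        for base in "ACGT":
            alternating_sum = 0.0
            for d in range(1, flank + 1):
                count = sum(
                    (seq[s - d] if upstream else seq[e + d - 1]) == base
                    for s, e in hits
                )
                alternating_sum += (-1) ** d * count / n
            score += abs(alternating_sum) / flank
    return score / 2  # average of the 5' and 3' flanks


def strength_by_length(seq, lengths=range(2, 16), flank=100):
    """Periodicity score for each search length k, e.g. (AC)2 to (AC)15."""
    return {k: period2_strength(seq, isolated_ac_repeats(seq, k, flank), flank)
            for k in lengths}
```

Applied to an evolved sequence, plotting `strength_by_length` against k would show whether the score rises and then falls with search length, as described above.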
Microsatellites evolve through a complex interaction between point mutations and tandem-repeat length mutations. Many models of microsatellite evolution have been proposed, and the interaction between these processes is poorly understood (Ellegren 2004). We performed simulations in which a sequence containing long perfect repeats was subjected to rounds of length mutations and single-base changes. In practice, long perfect repeats are very rare, and our model is only an approximation of the genesis and evolution of microsatellites. Nevertheless, our simulations generated base periodicities and biases that, although not identical, resemble many of those observed in the human genome. This suggests that a subset of (AC)5 motifs in the human genome is associated with compound microsatellites derived from a comparable process. Reconstructing the origin of all (AC)5 motifs and their flanking sequences in the human genome would be impossible. However, the simulations presented here demonstrate that periodicity in base frequencies around microsatellites, such as that reported by Vowles and Amos, can be generated without regional biases in the mutation spectrum, and that evidence for convergent evolution is currently lacking.
Supplementary Material
Supplementary material is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgements
This study was funded by Science Foundation Ireland and the Swedish Research Council. We thank Ken Wolfe, Hans Ellegren, Marie Sémon, Meg Woolfit, Devin Scannell, and Gavin Conant for useful comments.
Footnotes
1 Present address: Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
References
Bull LN, Pabon-Pena CR, Freimer NB. Compound microsatellite repeats: practical and theoretical features. Genome Res (1999) 9:830–838.
Ellegren H. Microsatellites: simple sequences with complex evolution. Nat Rev Genet (2004) 5:435–445.
Katti MV, Ranjekar PK, Gupta VS. Differential distribution of simple sequence repeats in eukaryotic genome sequences. Mol Biol Evol (2001) 18:1161–1167.
Smith NG, Webster MT, Ellegren H. Deterministic mutation rate variation in the human genome. Genome Res (2002) 12:1350–1356.
Vowles EJ, Amos W. Evidence for widespread convergent evolution around human microsatellites. PLoS Biol (2004) 2:E199.

