
**Figure 1** Frequency Distribution of Isolated, Pure (AC)n Microsatellite Lengths (in Repeat Units) in the Human Genome

Black shading indicates numbers of microsatellites used for analyses in this study.


**Figure 2** Frequency Distribution of the 16 Possible Microsatellite-Flanking 5′–3′ Base Combinations Relative to Random Expectation

*(A) Cassette frequencies around (AC)_{2} microsatellites: black bars, observed; white bars, expected. Error bars show 95% confidence intervals and asterisks indicate significant difference (χ^{2} tests, 1 d.f. p*

*(B) Deviation of cassette frequencies from random expectations around (AC)_{2}, (AC)_{5}, and (AC)_{10} microsatellites: black, white, and hatched bars, respectively.*

*(C) Sampled number (solid line) and proportion (dotted line) of microsatellites with cassette T/A as a function of microsatellite length.*
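The observed-versus-expected comparison in panel (A) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the random expectation for a cassette is the product of the background frequencies of its two bases, and that each cassette is tested against all others pooled (1 d.f.). The function names and the `base_freqs` argument are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import erfc, sqrt

def cassette_counts(flanks):
    """Count cassettes such as 'T/A' from (five_prime_base, three_prime_base) pairs."""
    return Counter(f"{five}/{three}" for five, three in flanks)

def chi2_1df(observed, expected, total):
    """Chi-square test (1 d.f.): one cassette versus all other cassettes pooled."""
    chi2 = ((observed - expected) ** 2 / expected
            + ((total - observed) - (total - expected)) ** 2 / (total - expected))
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value

def cassette_deviation_tests(flanks, base_freqs):
    """Return {cassette: (chi2, p)} for all 16 cassettes.

    Assumption: expected count = total * freq(5' base) * freq(3' base).
    """
    counts = cassette_counts(flanks)
    total = sum(counts.values())
    results = {}
    for five in "ACGT":
        for three in "ACGT":
            name = f"{five}/{three}"
            expected = total * base_freqs[five] * base_freqs[three]
            results[name] = chi2_1df(counts[name], expected, total)
    return results
```

With real data, `flanks` would hold one base pair per microsatellite locus and `base_freqs` the genome-wide base composition; a correction for testing 16 cassettes at once is not included here.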


**Figure 3** Flanking Sequence Frequency Distributions for Six Representative Nucleotide–Cassette Combinations for (AC)_{5} Microsatellites

*In each panel, the microsatellite is centrally placed, represented as a gap at position zero, and the cassette type, base, and number of sequences considered (n) are given. Frequency distributions are plotted with separate 95% confidence intervals for odd- and even-numbered positions (shading). Horizontal lines indicate mean frequencies for the 3′ and 5′ flanking regions, calculated separately. (A–F) illustrate the six main classes of patterning in which either dinucleotide periodicity or 5′–3′ asymmetry is present, summarised for all cassette–base combinations in Table 1.*
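The separation of odd- and even-numbered positions described above can be sketched as follows. This is an illustrative outline only: it assumes flanking sequences are stored as strings read outwards from the microsatellite, so that index 0 is position 1, and the function names are hypothetical.

```python
def position_frequencies(flanks, base):
    """Frequency of `base` at each flanking position (position 1 is adjacent to the repeat)."""
    length = min(len(f) for f in flanks)  # truncate to the shortest flank
    return [sum(f[i] == base for f in flanks) / len(flanks) for i in range(length)]

def odd_even_means(freqs):
    """Mean frequency over odd-numbered and even-numbered positions, in that order."""
    odd = freqs[0::2]   # positions 1, 3, 5, ...
    even = freqs[1::2]  # positions 2, 4, 6, ...
    return sum(odd) / len(odd), sum(even) / len(even)
```

A marked difference between the two means is one simple indicator of dinucleotide periodicity; the confidence intervals shown in the figure are not computed in this sketch.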


**Figure 4** Flanking Sequence Frequency Distributions for Three Representative Dinucleotide Motif–Cassette Combinations for (AC)_{5} Microsatellites

*(See Figure S4 for the four other patterns.) In each panel, the microsatellite is centrally placed, represented as a gap at position zero, and the cassette type, dinucleotide motif, and number of sequences considered (n) are given. Frequency distributions are plotted with separate 95% confidence intervals for odd- and even-numbered positions (shading). Horizontal lines indicate mean frequencies for the 3′ and 5′ flanking regions, calculated separately. A summary of how all seven patterns are distributed among all dinucleotide motif–cassette combinations is given in Table 2.*


**Figure 5** Dependence of Dinucleotide Flanking Sequence Patterning on AC Repeat Number

*Plots are as described in Figure 4. The progression for dinucleotide AT is illustrated for the commonest cassette type, (T/A). (A–F) depict AT dinucleotide frequencies, where patterning is most extreme, and show how periodicity and amplitude increase towards a maximum at around (AC)_{10} and decline thereafter.*


**Figure 6** Dependence of Dinucleotide Pattern Strength on the Presence of Repeat Clusters

*Beginning with the dataset from the scenario showing strong patterning and large sample size (cassette T/A, dinucleotide AT, (AC)_{5}; see Figure 5C), flanking sequences containing (AT)_{x} were excluded, where x equalled 2 or more (A), 3 or more (B), 4 or more (C), and 5 or more (D). Plotting conventions are the same as for Figure 4.*
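The exclusion step used for panels (A) to (D) can be sketched with a regular expression. This is a minimal illustration that assumes flanking sequences are plain strings; the helper names are hypothetical and the original filtering procedure may have differed in detail.

```python
import re

def lacks_repeat(flank, motif="AT", min_units=2):
    """True if `flank` contains no run of `motif` repeated `min_units` or more times."""
    # Builds a pattern such as (?:AT){2,} for motif="AT", min_units=2
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_units},}}")
    return pattern.search(flank.upper()) is None

def filter_flanks(flanks, min_units):
    """Keep only flanking sequences without (AT)_x, where x >= min_units."""
    return [f for f in flanks if lacks_repeat(f, "AT", min_units)]
```

Calling `filter_flanks(flanks, 2)` through `filter_flanks(flanks, 5)` would correspond to the thresholds of panels (A) to (D), from the most stringent exclusion to the least.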


**Figure 7** Location of Single AT Dinucleotide Motifs Relative to the Central AC Microsatellite in Flanking Sequences Lacking (AT)_{2+} Microsatellites

*Figure shows frequency of AT dinucleotides around all length classes of AC repeat microsatellites longer than (AC)_{2} (5′ number of sequences, n = 2,924; 3′ number of sequences, n = 3,309), with significantly greater numbers at odd positions 5′ and even positions 3′. Data are for cassette T/A only. Error bars show upper 95% confidence limit.*


**Figure 8** Cross-Locus Similarity among Sequences Flanking Microsatellites of Similar Length

*Length classes are as follows: class 1, randomly selected sequences not containing (AC)_{2+}; classes 2–20, (AC)_{2}–(AC)_{20}; and class 21, (AC)_{21–25}. Figure shows proportion of flanking sequences assigned on the basis of sequence similarity to their own AC repeat number class (dark grey), to the class above (grey), or to the class below (white). Expectation for assignment to self is shown by the horizontal line. Data are for cassette T/A only. Asterisks denote significant overassignment back to the same class or to an adjacent class, tested using χ^{2} tests (p*


**Figure 9** Dependence of Sequence Similarity among Flanking Sequences on AC Repeat Number

*The average number of matches shown (± standard error) quantifies similarity among three classes of sequence: (1) blocks of 50 bp lying immediately adjacent to a microsatellite; (2) blocks of 50 bp chosen randomly to lie between 500 and 600 bases downstream from a microsatellite; and (3) randomly selected blocks of 50 bp from around the genome. Average level of chance similarity in the genome is shown by a black line in each plot (comparison among class 3). 5′ and 3′ sequences are shown separately. Comparisons among sequence classes are shown for class 1 to class 1 (A), class 1 to class 2 for sequences at the same locus (B), class 1 to class 2 for sequences at different loci (C), and class 1 to class 3 (D).*
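One way to compute an "average number of matches" of this kind is sketched below. It assumes, as an illustration only, that similarity is scored as the number of identical bases at corresponding positions in two equal-length 50 bp blocks, without alignment gaps; the function names are hypothetical and the scoring used in the study may differ.

```python
from itertools import combinations, product
from math import sqrt
from statistics import mean, stdev

def n_matches(a, b):
    """Number of positions at which two equal-length blocks carry the same base."""
    return sum(x == y for x, y in zip(a, b))

def summarise(scores):
    """Mean and standard error of a list of pairwise match counts (needs >= 2 scores)."""
    return mean(scores), stdev(scores) / sqrt(len(scores))

def matches_within(blocks):
    """All pairwise comparisons within one class, e.g. class 1 to class 1."""
    return summarise([n_matches(a, b) for a, b in combinations(blocks, 2)])

def matches_between(blocks_a, blocks_b):
    """All pairwise comparisons between two classes, e.g. class 1 to class 3."""
    return summarise([n_matches(a, b) for a, b in product(blocks_a, blocks_b)])
```

Because pairwise comparisons that share a sequence are not independent, the standard error returned here is only a rough guide and should not be read as the estimate plotted in the figure.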


**Figure 10** Relationship between the Probability of Assigning (AC)_{2} Microsatellite Flanking Sequences to Self and Proximity to the AC Microsatellite

*Solid line shows the probability of assignment back to self. Analysis is restricted to (AC)_{2} flanking sequences and is based on an assignment window 25 nucleotides wide on each side of the microsatellite. Dotted line indicates assignment probability expected of random DNA sequences.*
