The analysis below examines three increasingly complex cases: case 1 analyzes an A/T DNA target with polyamides containing only two types of rings; case 2 analyzes a general sequence A/T/C/G DNA target and a polyamide with two types of rings; and case 3 analyzes a general-sequence target with polyamides composed of three types of rings. In each case, there are two design issues. First, the optimal choice of base-binding specificity for each ring is determined by exhaustive search of the unique combinations. Second, the optimal design for a polyamide composed of these rings to target a given DNA sequence is determined by deriving upper bounds on the binding fraction (yn) for specified target sequences and then determining relationships among the binding specificities (dN) of the chosen set of rings that permit these bounds on yn to be achieved. Where possible (as in case 1), the obvious design choice is made, associating each base along the target DNA sequence with the polyamide ring that it prefers. But when the number of ring choices is less than the number of unique bases in the target sequence (as in cases 2 and 3), then one or more bases will lack a preferentially binding ring. Strategies developed for placement of rings adjacent to these bases turn out to be nonobvious and even counter-intuitive.
Because of the assumption that each polyamide ring interacts with a single DNA base, the unique characteristic for each target sequence in this study is its base content (the number of AT bp and the number of GC bp) and not its specific base sequence. Reshuffling a base sequence would only require a concomitant reshuffling of the polyamide ring sequence. Thus, an optimal choice of rings to target a sequence composed of 1-GC and 4-AT bp will apply equally well to the target sequences AGAAA, TACAT, ATATG, and others, assuming the proper rearrangement of rings within each polyamide. For brevity, a sequence such as AAAAG will represent the 2^5 × 5 = 160 different target sequences with 1-GC and 4-AT bp.
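The count of 160 can be checked by direct enumeration. The following is a minimal sketch; the function name is illustrative and not taken from the source. It counts all 5-base sequences (read along one strand) that contain exactly one G or C and compares the result with the closed form 2^5 × 5.

```python
from itertools import product
from math import comb


def count_by_base_content(n_at, n_gc):
    """Count sequences (one strand) with n_at A/T bases and n_gc G/C bases."""
    n = n_at + n_gc
    return sum(
        1
        for seq in product("ATGC", repeat=n)
        if sum(base in "GC" for base in seq) == n_gc
    )


# Enumerate all 4^5 sequences and keep those with 4 A/T and 1 G/C.
count = count_by_base_content(n_at=4, n_gc=1)

# Closed form: 2 base choices per position, times the 5 positions for the GC pair.
closed_form = 2 ** 5 * comb(5, 1)

print(count, closed_form)
```

The same function gives the size of any other base-content class, for example `count_by_base_content(3, 2)` for targets with 2-GC and 3-AT bp.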
Case 1: A/T Target and Polyamides with Two Types of Rings. The simplest case limits the target to a sequence composed of only AT bp, within a mixed sequence genome, and limits the design of the polyamides to two types of ring. Ring specificity can be chosen in three different ways: (i) One ring binds preferentially to adenine and the other to thymine; (ii) One ring recognizes one of the bases in an AT bp, and the other ring prefers one of the bases in a GC bp; and (iii) One ring prefers cytosine and the other prefers guanine.
The third choice may be discarded outright given the A/T DNA target. The first choice, an (RA, RT) polyamide, obviously is preferable. A polyamide with RA next to each adenine and RT next to each thymine will bind more tightly to the target sequence than to any nontarget sequence. Provided that the base specificity of the two rings is high enough (i.e., both dA and dT are large), yn approaches one and near perfect base discrimination is possible. Fig. 2a shows isocontours of the binding fraction for different values of binding strengths dA and dT. Points along the dA = dT diagonal indicate an ideal choice of rings, providing a maximal fraction of binding to the target sequence given minimal base specificities for each polyamide ring.
Surprisingly, strong base discrimination also is possible with the second choice of rings, with one ring recognizing one of A or T and the other recognizing one of C or G, given particular values of the binding specificities. Consider an (RA, RG) polyamide, composed of rings specific for adenine and rings specific for guanine. Near perfect target recognition is possible if the specificity of RA for adenine is much stronger than the specificity of RG for guanine. The optimal polyamide design would place the ring pair RA/RG at each AT bp, with RA adjacent to the adenine. Target discrimination improves as the binding specificity of the RG ring is reduced to zero, effectively becoming an R0 placeholder ring that binds equally well to all four bases. One of the surprising findings of this work, also encountered in the two following cases, is that the optimal choice of rings will often include a placeholder in place of a base-specific ring. The performance of a polyamide composed of RA and a completely nonspecific placeholder R0 may be seen in Fig. 2a as points along the horizontal axis (that is, changing RT into R0 by setting dT to zero). It is apparent that, to achieve the same binding fraction, each ring in an (RA, RT) polyamide need only have one-half the binding specificity of the RA ring in an (RA, R0) polyamide.
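The factor of one-half can be illustrated with a small numerical sketch. The model here is an assumption, not a formula quoted from this section: each ring adds a free-energy bonus d (kcal/mol) only when it sits next to its preferred base, bonuses are additive, and the binding fraction is the Boltzmann weight of the target divided by the sum over all competing sequences. The temperature (RT ≈ 0.593 kcal/mol) and all function names are likewise illustrative.

```python
import math

RT = 0.593  # kcal/mol near 298 K (assumed temperature)


def pair_fraction(bonus_by_pair):
    """Share of binding that goes to the target pair "AT" among the four base pairs."""
    weights = {bp: math.exp(bonus / RT) for bp, bonus in bonus_by_pair.items()}
    return weights["AT"] / sum(weights.values())


def y_ra_rt(d, n):
    """(RA, RT) design on an n-bp A/T target, with dA = dT = d."""
    # Both rings meet their preferred base only in the target orientation.
    return pair_fraction({"AT": 2 * d, "TA": 0.0, "GC": 0.0, "CG": 0.0}) ** n


def y_ra_r0(d_a, n):
    """(RA, R0) design on the same target; the placeholder contributes nothing."""
    return pair_fraction({"AT": d_a, "TA": 0.0, "GC": 0.0, "CG": 0.0}) ** n


for d in (1.0, 2.0, 3.0):
    # Same binding fraction when RA alone carries twice the specificity.
    print(d, round(y_ra_rt(d, 5), 3), round(y_ra_r0(2 * d, 5), 3))
```

Under these assumptions the two columns agree for every d, because only the total bonus at the target base pair enters the weight.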
Case 2: General Sequence Target and Polyamides with Two Types of Rings. Now consider a target containing a mixed sequence of AT bp and GC bp and polyamides with two different types of rings. Again, there are three ways of matching rings to bases: (i) an (RA, RT) combination, (ii) the four mixed combinations (RA, RG), (RA, RC), (RT, RG), and (RT, RC), and (iii) an (RC, RG) combination.
When faced with a general sequence target containing some GC bp, (RA, RT) polyamides provide only limited specificity. The optimal polyamide design places the ring pair RA/RT at each AT bp in the target, but the choice of rings to bind to GC bp is not obvious. Any choice for pairing of these rings with GC will result in a polyamide that binds to many nontarget sequences at least as favorably as to the target sequence. The best compromise is to choose the ring with the minimum specificity, for instance, RT if dT < dA, and place two of these next to each GC bp. Indeed, the binding fraction is maximized when the specificity of this weaker ring is reduced to zero, becoming a placeholder unit. An effective design places RA/R0 with each AT bp and R0/R0 with each GC bp in the target sequence. Because R0/R0 binds to all base pairs equally, the upper bound on yn is 1/4^nG, where nG is the number of GC bp in the sequence. This discussion applies similarly to (RC, RG) polyamides, which will be compromised by the number of AT bp in the target sequence.
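The 1/4^nG ceiling can be seen in closed form under a simple assumed model (additive ring bonuses in kcal/mol, Boltzmann-weighted competition among all sequences of the same length, RT ≈ 0.593 kcal/mol). This is a sketch under those assumptions, and the function name is hypothetical.

```python
import math

RT = 0.593  # kcal/mol near 298 K (assumed temperature)


def y_ra_r0_mixed(d_a, n_at, n_gc):
    """Binding fraction of an (RA, R0) design on a target with n_at AT and n_gc GC bp."""
    w = math.exp(d_a / RT)
    at_term = w / (w + 3.0)  # RA/R0 at an AT pair: only AT earns the bonus
    gc_term = 0.25           # R0/R0 at a GC pair: all four base pairs tie
    return at_term ** n_at * gc_term ** n_gc


for d_a in (1.0, 3.0, 6.0, 20.0):
    # The value rises with d_a but cannot pass 1/4 for a single GC pair.
    print(d_a, y_ra_r0_mixed(d_a, n_at=4, n_gc=1))
```

However large dA becomes, each GC bp in the target costs a factor of four, which is why the placeholder-only treatment of GC bp is a compromise rather than a solution.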
Mixed polyamides, with one ring recognizing one of adenine or thymine and the other recognizing one of cytosine or guanine, fare better. For example, consider an (RA, RG) polyamide with dA > dG. The highest binding fractions are obtained with a strongly base-specific RA and a weaker RG, such that dA − dG is large, but dG is also large. The best design places the ring pair RA/RG with each AT bp, where the RG acts as a placeholder on the thymine side, because dA is significantly larger than dG. A pair of RG rings is placed next to each GC bp. Because dG is large, the RG/RG rings exclude AT bp, but the two identical rings cannot distinguish GC from CG bp. This redundancy leads to an upper limit of the binding fraction of 1/2^nG. The target sequence AAAAA would be recognized perfectly. One-half of the polyamides designed to target AAAAG would bind correctly, the other one-half binding erroneously to AAAAC. Targets with additional GC bp diminish the binding fraction still further, until the worst case of GGGGG, in which only ≈3% (1/2^5) of the polyamide binds to the target sequence in the best possible case.
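The 1/2^nG limit can be checked by brute force over every competing 5-bp sequence. The sketch below assumes an additive model in which a ring contributes its specificity d (kcal/mol) only when adjacent to its preferred base, with Boltzmann-weighted competition at RT ≈ 0.593 kcal/mol. The model, the ring placement table, and all names are illustrative assumptions rather than the authors' stated implementation.

```python
import math
from itertools import product

RT = 0.593  # kcal/mol near 298 K (assumed temperature)
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# (RA, RG) design: ring facing the top strand, ring facing the bottom strand.
RING_PAIRS = {"A": ("A", "G"), "T": ("G", "A"), "G": ("G", "G"), "C": ("G", "G")}


def bonus(rings, seq, d):
    """Total specificity bonus when the polyamide sits on sequence seq."""
    total = 0.0
    for (top_ring, bottom_ring), base in zip(rings, seq):
        if base == top_ring:
            total += d[top_ring]
        if COMPLEMENT[base] == bottom_ring:
            total += d[bottom_ring]
    return total


def binding_fraction(target, d):
    """Boltzmann weight of the target divided by the sum over all sequences."""
    rings = [RING_PAIRS[base] for base in target]
    weights = {
        "".join(seq): math.exp(bonus(rings, seq, d) / RT)
        for seq in product("ATGC", repeat=len(target))
    }
    return weights[target] / sum(weights.values())


strong = {"A": 20.0, "G": 8.0}  # dA - dG large, and dG itself large
for target in ("AAAAA", "AAAAG", "AAAGG", "GGGGG"):
    n_g = sum(base in "GC" for base in target)
    print(target, binding_fraction(target, strong), 0.5 ** n_g)
```

With strongly specific rings the computed fraction sits just below 1/2^nG for each target, because the RG/RG pair gives GC and CG the same bonus.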
Fig. 2b presents isocontours of the binding fraction for various specificities of RA and RG to a 5-bp sequence with 1-GC bp, AAAAG in our shorthand notation. Note that this discussion applies to many other mixed combinations: for example, a polyamide with a strong G discriminator and a weaker A discriminator would have symmetrically similar behavior, binding to G/C-rich sequences with greater specificity than A/T-rich sequences with an upper limit of the binding fraction of 1/2^nA.
Serendipitously, the imidazole and pyrrole rings currently used in polyamides are similar to the (RG, R0) combination. The imidazole ring acts as the RG ring, showing a binding energy ≈1.1 kcal/mol stronger to guanine than to adenine, cytosine, or thymine. Pyrrole acts like a placeholder, binding to adenine, cytosine, and thymine with similar affinity, but has a guanine-excluding ability, disfavoring binding to guanine by 1.9 kcal/mol (16). Pyrrole might be termed an RACT ring. Pyrrole/imidazole polyamides show improved discrimination relative to a true (RG, R0) combination, because of the GC-excluding ability of the pyrrole-pyrrole pairs, but still show poor discriminatory ability with A/T-rich sequences, caused by the ambiguity of AT bp recognition by pyrrole-pyrrole pairs.
For an (RG, RACT) polyamide with strongly specific rings, the upper bound of the binding fraction is 1/2^nA. Pyrrole and imidazole rings, however, do not have binding strengths high enough to reach this limit. Based on binding constants from Walker et al. (16), only ≈3% of the polyamide will bind specifically to a target sequence composed of 4-AT and 1-GC bp, at a concentration that saturates one-half of the target sites, as compared with the theoretical upper limit of 6.25% (1/2^4). This situation is plotted in Fig. 3a. Pyrrole/imidazole polyamides perform better with G/C-rich sequences: for a sequence with 1-AT and 4-GC bp, 12% of the polyamide will bind to the target sequence (upper limit: yn = 1/2^1 = 50%), as seen in Fig. 3b. For sequences composed entirely of GC bp, the percentage rises to 18% (upper limit: 100%).
Case 3: General Sequence Targets and Polyamides with Three Types of Rings. Polyamides composed of only two types of rings, as discussed above, cannot provide the specificity needed to bind selectively to a given mixed sequence DNA target. A third type of ring must be added to allow design of polyamides to bind to any arbitrary sequence with a maximal binding fraction approaching one. For analysis of polyamides composed of three types of rings, the specificity of the three rings is ordered such that d1 ≥ d2 > d3 > 0. There are two unique ways to choose the rings: (i) the two most specific rings recognize bases in the same base pair, such as the combination dA ≥ dT > dG > 0; and (ii) the two most specific rings recognize bases in different base pairs, such as dA ≥ dG > dT > 0.
For the first of these two choices, in which the two most specific rings recognize bases in the same base pair, consider the combination (RA, RT, RG) where dA ≥ dT > dG > 0. The ring pair RA/RT can be used to recognize AT bp in the target sequence with strong specificity, but the choice for GC bp is not as obvious. The best choice is to place an RG/RG pair at each GC bp, giving some specificity of GC over AT bp, but failing to discriminate GC vs. CG inversions. Using RG/RA or RG/RT to bind to GC bp is a poorer strategy because these ring choices will bind to AT bp with higher affinity than GC bp. The upper limit of yn with this design is 1/2^nG, in which nG is the number of GC bp in the target sequence. Thus, surprisingly enough, the addition of a third ring in this combination does not add specificity over an optimal two-ring combination.
For the second of the two choices of rings, in which the two most specific rings bind to bases in different base pairs, consider an (RG, RA, RT) polyamide where dG ≥ dA > dT > 0. The optimal design pairs RA/RT with AT and pairs RG/RT with GC. For values where dG > dA ≫ dT, the binding fraction approaches one, and near perfect discrimination is possible. Note that the RG/RT ring pair will bind strongly to GC and also weakly to AT bp; this nonspecific binding may be minimized by keeping dT low. This is a surprising result: the use of a placeholder and two base-specific rings substantially improves the specificity over a polyamide composed of three different base-specific rings. Fig. 4 plots yn = 0.5 contours for binding of an (RG, RA, R0) polyamide to five different sequences with different base content, from all A/T to all G/C. The curves cross at the dA = dG diagonal, indicating the design for an ideal multifunctional polyamide comprised of an effective adenine-discriminating ring, an equally effective guanine-discriminating ring, and a placeholder ring. This design will allow the creation of polyamides to target A/T-rich sequences as well as G/C-rich sequences. Note that this discussion also applies to the other choices possible in this second case, including three other polyamide designs (RG, RT, R0), (RC, RA, R0), and (RC, RT, R0).
The addition of a placeholder ring significantly improves the sequence specificity of a polyamide design. Comparing the (RG, RA, R0) polyamide in Fig. 4 with the (RG, RA) polyamide in Fig. 2b illustrates some of the advantages. To bind to the target sequence AAAAG with a binding fraction of ≈0.5, the rings in the (RG, RA) polyamide must have strong specificity: specificity for adenine must be >7 kcal/mol, and specificity for guanine must be >3 kcal/mol. Upon adding a placeholder ring, however, the specificity needed to achieve the same binding fraction drops to ≈2 kcal/mol for both rings. Moreover, the (RG, RA, R0) polyamide also may approach perfect target recognition, given strong enough base specificity, whereas the upper limit of specificity for the (RG, RA) polyamide is 0.5. But perhaps the most attractive feature of the (RG, RA, R0) polyamide is its generality: with these three rings, effective polyamides may be designed to target sequences with widely different base content.
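A rough feel for the specificity requirement of the three-ring design can be had from a closed-form sketch. It assumes an additive model (a ring contributes its specificity d, in kcal/mol, only next to its preferred base), Boltzmann-weighted competition among all sequences of equal length, and RT ≈ 0.593 kcal/mol. These assumptions and the function names are illustrative; the figures in the source may rest on a more detailed, concentration-dependent calculation.

```python
import math

RT = 0.593  # kcal/mol near 298 K (assumed temperature)


def y_three_ring(d_a, d_g, n_at, n_gc):
    """(RG, RA, R0) design: RA/R0 at each AT bp and RG/R0 at each GC bp."""
    w_a = math.exp(d_a / RT)
    w_g = math.exp(d_g / RT)
    # At every position only the target base pair earns a bonus.
    return (w_a / (w_a + 3.0)) ** n_at * (w_g / (w_g + 3.0)) ** n_gc


def d_for_fraction(y, n):
    """Common specificity (dA = dG) that gives binding fraction y on an n-bp target."""
    per_pair = y ** (1.0 / n)
    return RT * math.log(3.0 * per_pair / (1.0 - per_pair))


d_half = d_for_fraction(0.5, 5)
print(round(d_half, 2), round(y_three_ring(d_half, d_half, 4, 1), 3))
```

Under these assumptions the required common specificity for a 5-bp target comes out a little under 2 kcal/mol, and it is the same for any base content, since every position contributes the same per-pair factor when dA = dG.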
Polyamide Length and DNA Sequence Discrimination. An additional design issue involves the optimal polyamide length for targeting a given DNA sequence. Assume that there is a specific sequence to be targeted in a genome, such as AAAGAAAA. Two opposing considerations affect the choice of polyamide length. On one hand, a short sequence will have far fewer competing sequences of the same length: a 4-bp sequence has 4^4 − 1 = 255 competing sequences, whereas a 5-bp sequence has 4^5 − 1 = 1,023. A small affinity for nonspecific sites will have a greater harmful effect with longer sequences. On the other hand, longer sequences add additional specificity to the target: if the target is AGAAAA, a short polyamide targeted to AGAAA will bind to the target but also to AGAAAC, AGAAAT, and AGAAAG, whereas the longer polyamide will bind specifically only to the target. Taking this second consideration (that longer sequences are found less frequently in a given genome) into account, the shorter polyamide will be preferred if yn/y(n+1) > 4.
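The counting argument and the length criterion can be written out directly. This is a minimal sketch; the helper names are hypothetical, and the binding fractions passed to the criterion are placeholder inputs rather than values from the source.

```python
def competing_sequences(n):
    """Number of nontarget sequences of length n: all 4^n minus the target."""
    return 4 ** n - 1


def shorter_preferred(y_n, y_n_plus_1):
    """Length criterion from the text: prefer length n when yn / y(n+1) > 4."""
    return y_n / y_n_plus_1 > 4


print(competing_sequences(4), competing_sequences(5))

# Illustrative inputs only: a fivefold drop in binding fraction favors the
# shorter polyamide, while a modest drop does not.
print(shorter_preferred(0.5, 0.1), shorter_preferred(0.9, 0.8))
```

The threshold of four reflects that each n-bp site is shared by four possible (n+1)-bp extensions, so the longer polyamide starts with a fourfold advantage in site rarity.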
Surprisingly, when comparing equal concentrations of polyamides of two different lengths, a shorter polyamide will often perform better. Compare the case of two ideal polyamides, P3 and P4, that are composed of four different types of rings, RA, RT, RG, and RC, such that dA = dT = dG = dC = d. Each ring is placed next to its preferred base in the target sequence and the binding fractions are calculated at equal concentrations. Fig. 5a includes values for y3 and y4 as a function of d, and Fig. 5b plots y3/y4. For polyamides composed of strong base discriminators, at high values of d in the graphs, both y3 and y4 become arbitrarily close to one so the longer polyamide performs best, binding tightly to the target sequence and binding to fewer sites in the genome. At low d values, similar to the values observed for imidazole and pyrrole rings, the fraction y3/y4 is greater than four in nearly all cases, indicating that the shorter polyamide shows better specificity at the given concentration. As d increases from right to left in Fig. 5a, the value of y3 increases from zero to one sooner than y4 because of the smaller number of competing sequences with the shorter polyamide.
This comparison, however, is not entirely fair when approached from the therapeutic standpoint. At equal concentrations, the longer polyamide will occupy a greater fraction of its target sites than the shorter polyamide because it has the stronger binding constant. It is fairer to compare the performance of the polyamides at the same saturation of binding sites, choosing concentrations, for example, that will ensure occupancy of 90% of the target sites in a given genome. In this case, the longer polyamide is always the best choice. It gives a better binding fraction and requires lower concentrations to give the same site saturation as the shorter polyamide. Thus, for use as an antibiotic or in chemotherapy, the longer polyamide is the better choice. The equal-concentration comparison warns, however, that the longer polyamides are sensitive to increases in concentration: high concentrations of the longer polyamides will significantly compromise their specificity as more and more nonspecific sites are also targeted.
Implications for Rational Design of Linked Polyamides. A polyamide of the form (RG, RA, R0), which complements two rings recognizing components of different base pairs with a placeholder ring, is the best choice for design of a multifunctional lexitropsin, allowing the flexibility to target any given sequence using a single set of three rings. Two elements for this optimal design are available in current polyamides: imidazole acts like a moderately specific guanine-reader and pyrrole acts like a placeholder but adds G-excluding ability for extra gains in specificity. The missing element is an adenine-specific ring or a thymine-specific ring. Rational design of one of these rings is a difficult prospect because of the similar steric and hydrogen-bonding properties of the minor groove-accessible faces of adenine, thymine, and cytosine. Based on the crystallographic structure of an imidazole lexitropsin bound to DNA, Kopka et al. (15) have proposed that rings with a bulky group at the base edge contact, such as thiazole or methylpyrrole, might favor adenine over thymine because of the different placement of the adenine N3 and the thymine O2 atoms in the minor groove. Such differences apparently are used by the TATA-binding protein to differentiate TA from AT at the beginning of the TATA-box sequence (20). If thiazole, methylpyrrole, or a similar ring does indeed discriminate in this manner, the degeneracy of AT binding in the current imidazole-pyrrole polyamides would be broken, allowing synthesis of true lexitropsins.
ACKNOWLEDGEMENTS
The authors wish to thank Mary L. Kopka for helpful comments. This work was supported in part by GM-31299 from the National Institutes of Health, National Cancer Institute Grant CA-16042 and a fellowship from the Program in Mathematics and Molecular Biology at Florida State University, supported by National Science Foundation Grant DMS-9406348. This is publication 11237-MB from the Scripps Research Institute.
FOOTNOTES
To whom reprint requests should be addressed. e-mail: goodsell@scripps.edu.