The DNA primary base sequence is the simplest level of complexity governing the genetic code. However, its importance in regulating nuclear processes should not be underestimated, even though outcomes cannot always be predicted from observed patterns. For example, the presence of a consensus sequence for a given transcription factor in a promoter region does not necessarily mean that the protein is constitutively bound to that site. It is because of this unpredictability that the more complex nature of genomic DNA is considered in later sections of this review.
We have divided the discussion of DNA primary base sequence into four sections in order to introduce the types of anticancer drugs that interact with DNA. These include non-specific interactions and sequence-specific interactions with repetitive DNA, AT-rich DNA, or stringent consensus binding sites. The compounds vary in structure as well as in the way in which they interact with DNA. It is important to realize, however, that clear distinctions between these groups do not always exist.
Non-sequence-specific (global) DNA damage
One of the key observable differences between many types of cancer cells and normal tissues is that the former divide more rapidly. Treating cells with DNA damaging agents should perturb the cancer cell's ability to divide. The way in which this is achieved is determined by the type of DNA lesion, as well as the genetic makeup of the cell. From the point of view of cancer treatment the desired pathway is induction of cell death, as DNA repair can lead to the generation of mutations if the repair is not accurate. If a DNA lesion is not repaired prior to approach of a replication fork either by transcription coupled repair or global repair pathways, the replication fork will stall. A variety of signals are then sent out by the stalled fork so that the lesion is either repaired or bypassed, initiation of other replication origins is inhibited, and possibly apoptosis is induced [42-44].
Inducing apoptosis in anti-cancer therapy is not as straightforward as one would expect. The two major apoptotic pathways are the external death-receptor-induced pathway (which involves ligands and receptors, for example, FAS and TNF) and the mitochondria-apoptosome-mediated pathway, which is intrinsic and induced by insults such as chemotherapy and radiation. Intrinsic programmed cell death is dependent on the activation of cellular checkpoint proteins. In the case of DNA damage, sensing proteins, such as RAD9, RAD1, RAD17, and HUS1, relay the damage signal to signal transducers (Mec1, ChK1, Rad53 in yeast; ATM, ATR, ChK1, and ChK2 in mammals) and effectors (p53, BRCA1, repair proteins, etc.). In mammals the story is not as clear as it is in yeast, although many mammalian homologues of the yeast proteins have been identified. Other protein complexes may play roles in damage sensing, such as the BRCA1-associated genome surveillance complex, which includes BRCA1, ATM, the MRE11-RAD50-NBS1 (MRN) complex, MSH2/6 and MLH2 mismatch repair proteins, and Bloom's helicase. Activation of the damage response pathway can lead to arrest of cells at various stages in the cell cycle, induction of DNA repair, and activation of specific gene expression, as well as apoptosis. p53 is an important G1 checkpoint protein that prevents passage of the cells into S-phase via transactivation of the cyclin-dependent kinase (CDK) inhibitor p21waf1/cip1 [48,49], but it also plays a role in other cell cycle checkpoints. Other examples of proteins involved in the G1/S checkpoint include components of the Rb-E2F pathway, G1 cyclins, and the ARF proteins [46,48,49].
The S-phase checkpoint involves ATM, ATR, ChK1 and ChK2 proteins [50-54] and leads to inhibition of initiation of replication at origins and stalling of replication forks. How this checkpoint manifests in higher eukaryotes is not entirely clear. In yeast, when a lesion is encountered by a progressing replication fork, replication protein A (RPA) binds to the single-stranded DNA around the lesion, which in turn recruits the Mec1/Ddc2 sensing complex (Mec1 is a homolog of mammalian ATR and ATM). This sensing complex associates with Rad24/Rfc2-5, and Rad53 (the ChK2 homolog) is then recruited and activated by phosphorylation in a Mec1-dependent manner [42,55]. In mammals, the DNA damage signal is sensed by ATM and ATR proteins and propagated directly or via the ChK1 and ChK2 kinases to downstream effectors including p53, BRCA1, Mus81 and CDC25. It should be noted that the exact roles these damage response proteins play in yeast and mammalian cells are not necessarily the same. ATM has also been shown to phosphorylate the product of the Nijmegen breakage syndrome gene (NBS-1) in vitro and in vivo in mammalian cells in response to γ-irradiation. NBS-1 is a component of the MRN complex, which is involved in recombination and repair, and thus provides a direct link between the checkpoint proteins and DNA repair. BRCA1 has also been shown to be phosphorylated in an ATM-dependent manner following DNA damage and to bind to the MRN complex [57-59].
Induction of apoptosis involves activation of signalling pathways that often shift the balance from anti-apoptotic proteins to pro-apoptotic proteins, leading to cell cycle arrest and activation of caspase enzymes. The apoptotic cell is characterized by loss in cell volume, membrane blebbing, nuclear condensation, chromatin aggregation and endonucleolytic DNA fragmentation. A variety of techniques can be used to analyze apoptosis, including flow cytometry (annexin V labelling of externalized phosphatidylserine, caspase activation using fluorescently labelled caspase inhibitors, and PI-staining DNA content analysis), microscopy (morphological changes) and gel electrophoresis (DNA laddering). In most cases more than one method is employed to clearly define the apoptotic process. This is particularly important when studying induction of apoptosis in cancer cells compared to normal cells, because expression of growth factors, tumour suppressor proteins and other cell cycle inhibitors is deregulated, leading to unexpected outcomes in these classical apoptosis assays. In addition, different cell types may not activate all the pathways that result in the apoptotic cell phenotype.
The first drugs to be used to treat highly proliferative cancers were the relatively non-specific nitrogen mustard DNA alkylating agents such as chlorambucil, melphalan, and cyclophosphamide. These form monoadducts primarily at any G-N7 site in the major groove. However, the biologically important initial lesion formed by mustards in cells is the interstrand cross-link, formed primarily at 5'GPuC sequences. There is also evidence that they cause termination of transcription. Cyclophosphamide is the most widely used mustard clinically, and is a non-specific prodrug of the active metabolite phosphoramide mustard, requiring enzymatic activation by cellular mixed function oxidases. The (necessarily) high chemical reactivity of mustards leads to rapid loss of drug by interaction with other cellular nucleophiles, particularly proteins and low molecular weight thiols. Cells can therefore develop resistance by increasing their levels of low molecular weight thiols (particularly glutathione) [62,63]. Of equal importance for efficacy, much of the drug can reach the DNA with only one alkylating moiety intact, leading to mono-alkylation events which are considered to be genotoxic rather than cytotoxic. The fact that cross-linking is a two-step process adds to the proportion of (genotoxic) monoalkylation events, since the second step is very dependent on spatial availability of a second nucleophilic DNA site. Because of their genotoxicity, there is a risk of the development of second cancers from their mutagenic effects, with the most frequent alkylator-induced malignancy being acute leukemia, usually occurring a long period (3–7 years) after treatment.
Another class of even less selective alkylating agents comprises those that break down to very unstable intermediates that react indiscriminately. These include nitrosoureas such as streptozotocin, which has been used as a component of multi-drug protocols for Hodgkin's disease, and triazenes such as dacarbazine, widely used for malignant melanoma, and the more recent temozolomide, used increasingly for gliomas.
Mitomycin C is an example of a more complex and sequence-specific DNA cross-linking agent. It is widely used clinically, perhaps most effectively now in bladder cancer, but its use is limited by myelosuppressive side-effects. The mitomycin C-related FR family of antibiotics, including FR900482 and related compounds, are compelling potential replacements which may in some cases offer decreased toxicity [26,27]. The FR family of compounds undergo reductive activation to form reactive mitosene derivatives, which crosslink DNA preferentially at 5'-CpG-3' steps. Although mitomycin is generally considered a non-selective agent, there is some evidence that it and related compounds have selective effects in cells. Using a modified ChIP assay with Jurkat T cells, FR900482 was shown to crosslink regions in the IL-2 and IL-2Rα promoters to the HMG I/Y, HMG 1, and HMG 2 minor groove binding proteins, but not to the major groove binding proteins Elf-1 and NFκB, which have overlapping DNA target sequences.
Repetitive DNA sequences
We now know that the human genome contains considerable areas of repetitive DNA sequences [70,71]. These are generally organized in heterochromatin, mainly in centromeres. These satellites consist of repeat units of several thousand base pairs. Minisatellites (also called variable number tandem repeats) and microsatellites (also called short tandem repeats) are distinctly different from satellites in that the repeat units are shorter and less complex, and they are dispersed across the genome. Minisatellites and microsatellites differ in repeat length: microsatellite repeat units are between 1 and 13 bp long, whereas minisatellite repeat units are longer. In some cases mini- and microsatellites may serve important regulatory functions. For instance, the vast majority of CGG trinucleotide repeats are located in the 5' untranslated regions of genes and are oriented with respect to the transcribed strand such that the mRNA contains the repeat. In addition, repetitive DNA can often adopt non-B form DNA conformations, which might recruit certain regulatory proteins that participate in control of gene expression.
"AT-islands", containing between 85–100% AT, are distinct minisatellite regions [3-6,15]. These islands consist of between 200 and 1000 bp of repetitive DNA. A number of critical nuclear processes are organized around AT-rich DNA sequences in the genome. In some instances, these islands function as matrix attachment regions (MARs) that organise DNA loops on the nuclear matrix and coordinate nuclear activities such as DNA replication, transcription, and mitosis. Nuclear matrix binding ability of DNA sequences can be demonstrated in vitro by preparing nuclear matrices and incubating them with labelled DNA probes of the sequences of interest and unlabelled competitor DNAs (either non-specific or MAR-containing), followed by gel electrophoresis of the washed matrices. In vivo MAR binding activity can be assessed by digestion of nuclear DNA with a number of enzymes that do not cleave within the potential MAR, followed by nuclear matrix preparation to separate the matrix-associated DNA from loop DNA. Labelled probes are then used to screen dot blots of prepared matrices and associated DNA. Enrichment of DNA sequences within the nuclear-matrix fraction versus the loop DNA fraction is highly suggestive of MAR activity.
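The compositional definition of an AT-island given above (at least 85% AT over a stretch of a few hundred base pairs) lends itself to a simple sliding-window scan. The following is only a minimal sketch under stated assumptions: the function names, the 200-bp default window and the merging of overlapping windows are illustrative choices, not the procedure used in the cited studies, and the scan tests base composition only, not repetitiveness or MAR activity.

```python
def at_fraction(seq):
    """Fraction of A/T bases in seq (case-insensitive); 0.0 for empty input."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)


def find_at_rich_regions(seq, window=200, min_at=0.85):
    """Return merged half-open (start, end) intervals covered by
    `window`-bp windows whose AT fraction is >= min_at.

    `window` and `min_at` are illustrative defaults (assumptions).
    """
    seq = seq.upper()
    n = len(seq)
    if window <= 0 or n < window:
        return []
    is_at = [1 if base in "AT" else 0 for base in seq]
    count = sum(is_at[:window])  # AT count in the first window
    regions = []
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window by one base: add the new base, drop the old one.
            count += is_at[start + window - 1] - is_at[start - 1]
        if count / window >= min_at:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions
```

Applied to a genomic sequence string, `find_at_rich_regions(sequence)` returns candidate intervals that could then be filtered by length (for example 200 to 1000 bp) before any experimental follow-up.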
Repetitive sequences, like these "AT-islands", are notoriously unstable elements with changes occurring either through polymerase slippage or unequal recombination. The types of rearrangements that often occur at these sites include expansion and deletion of the repetitive elements. Not surprisingly, mini- and microsatellite instability features in a number of human diseases and cancers, such as human colorectal cancer and a variety of leukaemias and lymphomas. Using a variety of experiments including in vitro and in vivo MAR binding assays, Jackson et al. have recently demonstrated that the AT-islands within the FRA16B fragile site are expanded and preferentially associated with the nuclear matrix in the CEM leukaemia T cell line as compared to normal WI-38 fibroblasts. This alteration in the organization of DNA in the leukaemia cell line correlates with a hypersensitivity to drugs which specifically alkylate in repetitive AT-rich regions [4,6,15].
Origins of replication and various promoter sequences are other examples of AT-rich sequences. Unlike those of simpler eukaryotes, mammalian origins of replication are not clearly defined or localized on individual chromosomes. Some sites have been identified, however, such as the c-MYC origin that lies in the 5' region of the gene. This site is AT-rich, although the c-MYC MAR located in the 3' region of the gene has a significantly higher AT content. Destabilization of these regions would undoubtedly affect the cell's ability to initiate DNA synthesis, although the impact this would have on cellular proliferation is not necessarily predictable. Genetic changes associated with instability could also affect gene expression.
Many genes have been identified that contain mini- and microsatellites of all sorts, expansions or other alterations of which have been implicated in deregulation and association with disease[70,71]. Examples include the CAG repeat in the Huntington's gene, the G/C rich repeat 600 bp upstream of the insulin gene ATG (insulin-dependent diabetes mellitus), the GAA repeat in the X25 intron associated with Friedreich's ataxia, and the G/C rich repeat downstream of the HrasI polyA signal, certain alleles of which are associated with increased cancer risk.
AT-rich regions (ORIs, MARs)
The metabolic processes at AT-rich minisatellites may be directly affected by drug-induced DNA lesions and/or hindered by the induction of cellular checkpoints and DNA damage response pathways. For example, drugs that interact specifically at such sites could interfere with essential protein/DNA interactions, leading to delay in or inhibition of DNA synthesis or deregulation of gene expression. Drug interference could result from competition for factor binding to the sequence, deletion or alteration in the DNA sequence as a result of repair processes, and/or distortion in the local DNA conformation [14,15].
The cyclopropylindoline compounds, including the natural product CC-1065 [73,9] and related synthetic analogues like adozelesin and bizelesin, are extremely cytotoxic DNA alkylators that target the N-3 of adenine in the minor groove of AT-rich DNA sequences. Even simpler analogues such as the hydroxyl- and aminoCBI compounds show very similar patterns of DNA alkylation when compared on a section of the gpt gene, alkylating preferentially at 5'-A(A/T)AN sequences, although the amino analogue was the more efficient alkylator, and showed similar levels of potency in a variety of cell lines. A comparison of the monoalkylating derivative, adozelesin, and the related bifunctional analogue, bizelesin, showed that while both are highly AT-selective, the latter requires a target site with adenines spaced six base pairs apart, and most commonly alkylates by crosslinking adenines very preferentially at T(A/T)4A sites. Monoadducts have also been observed at A(A/T)4A sites, although to a lesser extent [3,4,76]. In silico drug/DNA binding analysis predicts that the bizelesin binding motif occurs approximately 2.8 times every 250 bp. AT-island hotspots are present once every 10^6 bp, and within these hotspots bizelesin sites occur 99 times every 250 bp. These long AT-islands are suspected to be the major targets for bizelesin binding and responsible for its high toxicity. Actual bizelesin binding sites were determined using a model AT-island DNA and confirmed the in silico predictions. Bizelesin was 100 times more reactive with the model AT-island DNA than with the non-AT-island model [6,15].
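The kind of in silico motif analysis described above can be illustrated with a short scan for the T(A/T)4A crosslinking motif and the A(A/T)4A monoadduct motif. This is a minimal sketch, not the published analysis: the function names are hypothetical, only the strand supplied is scanned, overlapping sites are counted separately, and the density is normalised to the 250-bp unit used in the text.

```python
import re

# Lookahead patterns so that overlapping sites are all reported.
CROSSLINK_SITE = re.compile(r"(?=T[AT]{4}A)")   # T(A/T)4A
MONOADDUCT_SITE = re.compile(r"(?=A[AT]{4}A)")  # A(A/T)4A


def site_positions(seq, pattern):
    """0-based start positions of every (possibly overlapping) motif match."""
    return [match.start() for match in pattern.finditer(seq.upper())]


def sites_per_250bp(seq, pattern):
    """Motif density, expressed as number of sites per 250 bp of sequence."""
    if not seq:
        return 0.0
    return len(site_positions(seq, pattern)) * 250 / len(seq)
```

For orientation, under a uniform random-base model each position matches T(A/T)4A with probability (1/4)(1/2)^4(1/4) = 1/256, so roughly one site per 250 bp would be expected by chance; comparing `sites_per_250bp` for an AT-island and for bulk sequence shows how strongly such motifs cluster.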
Woynarowski, in a recent review, has suggested that the potent cytotoxicity of the general indoline class of toxins is caused by disruption of critical nuclear processes that are organized around functional AT-rich DNA sequences in the genome. Matrix attachment regions, origins of replication, and candidate promoters are examples of AT-rich sequences that might be specifically targeted by these compounds as a result of their alkylation preference and sequence specificity. Furthermore, hits by region-specific AT-binding drugs such as bizelesin, or by other AT-specific drugs like tallimustine, to AT-rich sequences that are not organized in these regions would be much less deleterious to the cell because of a lack of functional consequence. The variation in cytotoxic potencies of different indoline analogues and other AT-specific drugs could result from differences in their ability to target these regions effectively.
The degree of susceptibility of a cancer cell to these agents may depend on a number of additional factors including deregulated gene expression and genomic instability, and in this way these compounds may be more specifically toxic to cancer cells. Woynarowski and co-workers have found that the AT-rich fragile sites Fra16B and Fra16D, and the c-MYC origin, a region commonly amplified in cancer cells, are targeted by bizelesin. In addition, we have found that 50% of the recovered mutations in the gpt gene of surviving AS52 Chinese hamster ovary cells treated with aminoCBIs are deletions in AT-rich regions (unpublished). Thus, specifically localized DNA damage can result from treatment with drugs of this class and may contribute to the potent cytotoxicity. Given the increased level of genomic instability in cancer cells, drug potency may be enhanced due to expansion of these satellite regions and/or because of deletions in these critical regions following drug treatment.
CC-1065, bizelesin and adozelesin have been shown to inhibit DNA replication in cell-free and cell-based systems (yeast and mammalian) [77-82]; however, the mechanism of inhibition is not clearly understood. Based on the DNA/adduct distortion studies[83,84], replication initiation may be inhibited by distortion of specifically targeted MARs resulting in a block in origin complex assembly necessary for proper origin firing. This could explain the very high lethality of bizelesin ( (>200 per cell) and conventional mustards (several thousands per cell) which do not demonstrate region-specific DNA binding.
Another reason why selective damage to AT-rich DNA might be important in the mechanism of drug action is that binding to these sequences affects specific gene expression. This may arise by preventing transcription factor binding, increasing the affinity of a transcription factor for its sequence, or creating unnatural binding sites. For example, CC-1065 has been shown by EMSA to inhibit TATA Box Binding protein (TBP) from binding to a DNA oligonucleotide containing the adenovirus major late promoter TATA box sequence. In this case, drug binding was thought to directly hinder minor groove binding of TBP to the TATA box. Binding of Specificity Protein 1 (SP1), a member of the SP/KLF family of transcription factors, to 6 GC boxes present in the simian virus 40 (SV40) early promoter is also inhibited by CC-1065 binding to AGTTA* between the SP1 sites, where * indicates the site of covalent modification. These authors propose that the inhibition in SP1 binding, particularly at the 3'-GC box, resulted from distortions in the DNA caused by adduct formation. High-field NMR studies of the adozelesin/DNA adduct have confirmed that drug binding distorts the DNA double-helix despite maintaining normal Watson-Crick base pairing. AT-rich sequences found in regulatory regions in other genes associated with cancer, such as c-MYC, have also been identified as sites specifically alkylated in cells treated with related compounds. Using a similar approach (real-time PCR stop assay) we have found that the aminoCBI compounds also target AT-rich sites located within the c-MYC gene in cell culture, and furthermore using real-time reverse transcriptase PCR analysis, we have found that c-MYC expression rapidly (within a few hours) decreases following treatment (unpublished). We are currently extending these studies to look at changes in protein expression levels.
Specific DNA sequences (oncogenes)
There are several drug classes that are able to span DNA and recognise a limited number of specific sequences. The most discriminatory sequence-selective DNA binding compounds to be developed are the pyrrole-imidazole (Py-Im) polyamides [10,86,87]. These minor groove binding compounds are synthetic ligands that were developed based on the binding properties of the AT base selective drug distamycin A [15,88]. The dimeric hairpin Py-Im polyamide derivatives have been shown to inhibit binding of transcription factors such as TBP, NFκB, and ETS-1 to their recognition sequences in vitro [89-91]. Another recent study demonstrated that Py-Im polyamides can derepress expression of the HIV long terminal repeat by inhibiting host factor LSF binding to the repressor complex sequence in the context of host cell chromatin. Despite successful inhibition of transcription factor binding to naked DNA, the hairpin polyamides have not proved to be effective at inhibiting gene expression in cells.
A new approach is to conjugate polyamides with DNA alkylating agents such as chlorambucil (Py-Im-Chl), in the expectation of increasing their biological potential and hence therapeutic use [91,93,94]. These derivatives were shown to inhibit in vitro replication of SV40, mammalian cell growth, and genomic DNA replication, and cells treated with Py-Im-Chl conjugate arrested in G2/M. More detailed analysis of the accessibility of nuclear chromatin and effects on gene expression has been performed using the Py-Im-Chl conjugate by Dudouet et al. Using ligation-mediated PCR to examine alkylation sites, these investigators found that the Py-Im-Chl conjugate was capable of accessing target binding sites in the HIV-1 enhancer and promoter in lymphoid cells. Microarray analysis of cellular expression profiles indicated that a limited number of genes (21 genes using 2 conjugates) were affected by polyamide-conjugate treatment, and that in each case match sites were located within the 5'-flanking region of the gene. While it is still not known how effective these agents will be clinically, it is very promising that the expression of so few genes can be altered considering the number of potential polyamide binding sites for these agents within the entire human genome (experimentally determined to be approximately 1 in 1900 bp, expected frequency calculated to be present 1 in 2048 bp).
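An expected site frequency such as the 1 in 2048 bp quoted above can be reasoned about with a simple random-sequence model. The sketch below is an illustration only: it assumes equal, independent base frequencies and a non-palindromic site that may occur on either strand, and the function and parameter names are hypothetical. A site with six fully specified positions is just one combination that reproduces the quoted figure; it is not necessarily the site definition used in the cited study.

```python
from fractions import Fraction


def expected_spacing_bp(specified, two_fold=0, both_strands=True):
    """Expected average distance (bp) between occurrences of a binding site.

    specified    -- positions that require one particular base (probability 1/4)
    two_fold     -- positions that accept either of two bases (probability 1/2)
    both_strands -- count occurrences on either strand (assumes the site is
                    not its own reverse complement)
    Assumes equal base frequencies and independence between positions.
    """
    per_position_hit = Fraction(1, 4) ** specified * Fraction(1, 2) ** two_fold
    if both_strands:
        per_position_hit *= 2
    return 1 / per_position_hit
```

Under these assumptions `expected_spacing_bp(6)` gives 2048, as does `expected_spacing_bp(5, two_fold=2)`, which shows that different site definitions can lead to the same expected frequency.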
Et-743 is a minor groove alkylating agent which was originally isolated from the sea squirt Ecteinascidia turbinata, and is currently in clinical development [95,96]. Alkylation by Et-743 of the N2 of the central guanine of the DNA binding triplet results in a conformational change in the DNA, with the minor groove widening and the double helix bending towards the major groove [97-101]. Et-743 does not have the same degree of DNA sequence specificity as the polyamide compounds. However, this compound does demonstrate a unique potential to alter gene expression of discrete loci based on the presence of GC boxes in the promoter regions. The potent cytotoxicity of Et-743 is thought to be due to inhibition of transcription factor binding, resulting in effects on transcription. For example, DNA binding of NF-Y to the CCAAT box and the transcriptional activation of MDR1 and HSP70 (genes regulated in part by NF-Y via the CCAAT motif) are affected by Et-743 [102-104]. Interestingly, constitutive expression of MDR1 and HSP70 is not affected; therefore, Et-743 may work by inhibiting activated transcription in response to certain stimuli. Microarray analysis of tumour cell lines treated with Et-743 and phthalascidin (Pt 650), a synthetic analog of Et-743, showed similar changes in gene expression, including a decrease in expression of genes encoding proteins that bind to CCAAT boxes, which might contribute to the repression of transcriptional activation of MDR1 and HSP70 [105,107].
Et-743 also inhibits the transcription of other genes, including c-FOS, c-JUN, E2F1, H2B, and H4. The mechanism is presumably by alkylation of the guanine bases in the GC boxes present in the promoter regions, resulting in a block in the binding of transcription factors such as NF-Y, SP1 and ERG1. In vitro inhibition of the transcription factors TBP, E2F, and SRF has also been observed. Modelling studies suggest that head-to-tail binding of three Et-743 molecules to DNA resembles an RNA-DNA hybrid complex, and that the distortions mentioned above mimic those induced by zinc finger transcription factor binding. These investigators have speculated that such changes in the DNA could not only inhibit factor recognition but also induce DNA/Et-743/protein interactions. Et-743 also seems to exert its cytotoxic effect on cells by inducing single-strand DNA breaks (ssDB) via an interaction with the transcription-coupled repair machinery, as cells resistant to Et-743 have defects in the xeroderma pigmentosum genes, and show reduced ssDBs following treatment [110,111]. Mutations in the DNA double-strand break repair pathway, however, sensitize cells to Et-743 cytotoxicity. Taken together, these studies suggest that while inhibition of transcription factor binding to certain promoters is an important part of the Et-743 mode of action, the anti-tumour activity is also dependent on endogenous features of the cancerous cell such as certain DNA repair pathways.