Experimental subjects
Study participants of self-reported European ancestry were recruited in the Raleigh-Durham metropolitan area by advertising and word of mouth and provided informed consent for studies of smoking cessation. They averaged 44 years of age and 45% were female. These participants reported an average of 25 years of smoking, displayed initial Fagerstrom Test for Nicotine Dependence (FTND) [5] scores that averaged 6.4 and provided screening carbon monoxide levels that averaged 34.7. Participants received oral mecamylamine (10 mg/day) and either active (21 mg/24 h) or placebo nicotine skin patches for two weeks before the target quit-smoking date. After the quit date, participants were randomly assigned to groups that received mecamylamine (10 mg/day) vs matching placebo and 21 mg/24 h vs 42 mg/24 h nicotine skin patch doses to test how mecamylamine might improve the effectiveness of nicotine replacement therapy. Behavioral support and self-help quitting manuals were also provided. Fifty-five study participants reported continuous abstinence from smoking when assessed 6 weeks after the quit date; seventy-nine participants were not abstinent at the 6-week time point. Data from these individuals were compared to data from 320 control study participants of self-reported European-American ancestry recruited in Baltimore by advertising and word of mouth, who also provided informed consent, averaged 31 years of age, were 36% female and reported no substantial lifetime histories of use of any addictive substance [21,53,54].
DNA preparation, pooling and analysis
Genomic DNA was prepared from blood [21,53,54], carefully quantitated and combined into pools representing 13 – 20 individuals of the same ethnicity and phenotype. Hybridization probes were prepared from the genomic DNA pools as described (Affymetrix Genechip Mapping Assay Manual) with precautions to avoid contamination that included use of dedicated preparation rooms and hoods. Each pooled genomic DNA sample (50 ng) was digested by StyI or by NspI, ligated to appropriate adaptors and amplified using a GeneAmp PCR System 9700 (Applied Biosystems, Foster City, CA) with a 3 min 94°C hot start, 30 cycles of 30 sec at 94°C, 45 sec at 60°C and 15 sec at 68°C, and a final 7 min 68°C extension. PCR products were purified (MinElute™ 96 UF kits, Qiagen, Valencia, CA) and quantitated, and 40 μg were digested for 35 min at 37°C with 0.04 unit/μl DNase I. The 30–100 bp fragments resulting from DNase I treatment were end-labeled using terminal deoxynucleotidyl transferase and biotinylated dideoxynucleotides and hybridized to the appropriate StyI or NspI early access Mendel® microarrays (Affymetrix, Santa Clara, CA). Arrays were stained, washed and scanned as described (Affymetrix Genechip Mapping Assay Manual) using immunopure streptavidin (Pierce, Milwaukee, WI), biotinylated antistreptavidin antibody (Vector Labs, Burlingame, CA) and R-phycoerythrin streptavidin (Molecular Probes, Eugene, OR). Fluorescence intensities were quantitated using an Affymetrix array scanner as described [21].
Identification of positive SNPs
Allele frequencies for each SNP in each DNA pool were assessed based on hybridization to the 12 "perfect match" cells on each of four arrays from replicate experiments, as described [31,55]. In brief, each cell's value was analyzed by subtracting background fluorescence intensities and normalizing background-subtracted values to the values for the highest intensities on each array. We averaged the data from the 12 perfect match cells for A and B alleles for each SNP. To facilitate comparison of data from multiple arrays, we derived the arctangent of the ratio between hybridization intensities for A and B alleles for each array. We then averaged these arctan A/B values for the four replicate arrays that assessed genotype frequencies for each pool. We calculated the mean arctan A/B ratios for nicotine dependent vs control individuals (and for quitters vs nonquitters). We divided the mean arctan A/B ratio for abusers (or quitters) by the mean arctan A/B ratio for controls (or nonquitters) to form abuser/control (or quitter/nonquitter) ratios. We generated a "t" statistic for the differences between abusers and controls or quitters and nonquitters using the formula described previously [22,31,55]. "Nominally significant" SNPs display t values with p vs control comparisons and p vs nonquitter comparisons, respectively. We thus set a relatively strict preplanned criterion for the first comparison that confirms genes with good confidence. We set a more modest criterion, with lower levels of confidence, for the second comparison that nominates genes that merit replication studies. We deleted data from SNPs on sex chromosomes and SNPs whose chromosomal positions could not be adequately determined using Mapviewer (NCBI, build 35.1) or NETAFFYX (Affymetrix, Santa Clara, CA).
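The intensity-to-score steps described above can be illustrated with a minimal Python sketch. It covers only background subtraction, normalization, averaging of perfect-match cells, the arctan A/B transform, averaging over replicate arrays and the case/control ratio; it does not reproduce the t statistic, whose formula is given in the cited references. The function names, argument layout and the use of a single background value per array are illustrative assumptions, not part of the published pipeline.

```python
import math

def allele_signal(cells, background, array_max):
    """Mean of background-subtracted cell intensities, each scaled by the
    array's highest intensity (array_max is a hypothetical input)."""
    return sum((c - background) / array_max for c in cells) / len(cells)

def arctan_ab(a_cells, b_cells, background, array_max):
    """arctan of the A/B intensity ratio for one SNP on one array.
    atan2(a, b) equals atan(a / b) when both signals are positive and
    avoids a division by zero when the B signal is 0."""
    a = allele_signal(a_cells, background, array_max)
    b = allele_signal(b_cells, background, array_max)
    return math.atan2(a, b)

def pool_score(replicate_values):
    """Average the arctan A/B values from the replicate arrays of one pool."""
    return sum(replicate_values) / len(replicate_values)

def group_ratio(case_pool_scores, control_pool_scores):
    """Mean arctan A/B of case pools divided by that of control pools."""
    case_mean = sum(case_pool_scores) / len(case_pool_scores)
    control_mean = sum(control_pool_scores) / len(control_pool_scores)
    return case_mean / control_mean
```

A ratio near 1 from `group_ratio` would indicate similar estimated allele frequencies in the two groups; the bounded arctan scale is what makes values from different arrays comparable.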
Nicotine dependence variants
In preplanned assessments of the allelic variants likely to influence vulnerability to dependence on nicotine and other addictive substances, we focused on autosomal SNPs that provided convergent data with four additional abuser vs control comparison datasets; i.e. SNPs that a) display t values with p vs nicotine dependent research participants; b) identify genes that also display reproducibly-positive associations with addiction vulnerabilities in data from four other samples: i) NIDA African-American and European-American polysubstance abuser vs control comparisons based on 639,401 SNP comparisons with the requirement that both samples provide nominally significant results (p [31] ii) JGIDA (Japanese genetic investigations of drug abuse) Japanese methamphetamine abuser vs control comparisons, based on a requirement for nominal significance (p [56] (manuscript in preparation) and iii) COGA (Collaborative study on the genetics of alcoholism) alcohol dependent vs control comparisons, based on a requirement for nominal significance (p [55] and c) produce an enhanced (i.e. lower) Monte Carlo p value for the overall association in comparisons of the current smoker/control data with these four other sample sets vs the Monte Carlo p values for the data from the four other sample sets alone. Each of the 100,000 Monte Carlo simulation trials began with sampling from a database that contains the results from the current study and results from a larger database that contains data from the prior association studies in the four additional samples noted above, to which we compare the current results. For each trial, a randomly-selected set of SNPs was chosen and the same procedure that had been followed for the actual data was run. The number of trials for which the results from the randomly-selected set of SNPs matched or exceeded the results actually observed from the SNPs identified in the current study was tabulated. 
Empirical p values were calculated by dividing the number of trials for which the observed results were matched or exceeded by the total number of Monte Carlo simulation trials performed. Since this method examines the properties of the SNPs in the current dataset, assuming independence of their allele frequencies, it should be relatively robust despite the uneven distribution of Affymetrix SNP markers across the genome.
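The empirical p value calculation can be sketched as follows. This is a simplified illustration, not the code used in the study: it assumes that the "result" of each trial is the count of randomly drawn SNPs that are also flagged as positive in another dataset, and the names `flags`, `n_selected` and `observed` are hypothetical.

```python
import random

def empirical_p(observed, n_selected, flags, trials=100_000, seed=0):
    """Fraction of trials in which a random SNP set matches or exceeds
    the observed result.

    flags      -- one boolean per SNP in the database (assumed statistic:
                  True if the SNP is also positive in a comparison dataset)
    n_selected -- size of the SNP set drawn in each trial
    observed   -- count of flagged SNPs in the actual selected set
    """
    rng = random.Random(seed)
    matched = 0
    for _ in range(trials):
        sample = rng.sample(flags, n_selected)  # draw without replacement
        if sum(sample) >= observed:
            matched += 1
    return matched / trials
```

With 100,000 trials the smallest nonzero p value this procedure can report is 0.00001, so a result of 0 is best read as p < 0.00001.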
Quit success variants
In comparing results related to successful abstinence, we use less stringent criteria. We focus on autosomal SNPs that display three features [see additional file 1]: 1) they display t values with p vs unsuccessful quitters; 2) they lie within clusters of at least three such nominally positive SNPs so that each positive SNP lies within 0.1 Mb of the nearest positive SNP; 3) they lie within genes whose functions can be inferred. We also compared these observed results to those expected by chance, based on independence of SNP allelic frequency estimates under the null hypothesis, using 10,000 – 100,000 Monte Carlo simulation trials on the database from the current study's results, as noted above [21].
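The clustering criterion (feature 2) can be illustrated with a short Python sketch. It adopts one reading of the criterion: nominally positive SNPs on the same chromosome are chained together while consecutive positions are no more than 0.1 Mb apart, and chains of at least three SNPs are kept. The function name and the `(chromosome, position)` input layout are assumptions made for illustration.

```python
from collections import defaultdict

def clustered_snps(positive, max_gap=100_000, min_size=3):
    """Group nominally positive SNPs into clusters.

    positive -- iterable of (chromosome, base-pair position) pairs
    Returns a list of clusters, each a list of (chromosome, position).
    """
    by_chrom = defaultdict(list)
    for chrom, pos in positive:
        by_chrom[chrom].append(pos)

    clusters = []
    for chrom in sorted(by_chrom):
        run = []  # current chain of neighboring positive SNPs
        for pos in sorted(by_chrom[chrom]):
            # A gap larger than max_gap closes the current chain.
            if run and pos - run[-1] > max_gap:
                if len(run) >= min_size:
                    clusters.append([(chrom, p) for p in run])
                run = []
            run.append(pos)
        if len(run) >= min_size:
            clusters.append([(chrom, p) for p in run])
    return clusters
```

The gene-annotation requirement (feature 3) is not covered here, since it depends on an external annotation source.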
Statistical power
To assess the power of our current approach, we used the observed standard deviations and mean abuser/control differences for the SNPs that provided the largest differences between control and abuser population means, the program PS v2.1.31 [57] and α = 0.05.
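For orientation, the kind of calculation involved can be approximated with a normal-theory formula for a two-sided, two-sample comparison of means with equal group sizes. This sketch is not the algorithm implemented in the PS program and is not expected to reproduce its output exactly; the parameter names and any values passed to the function are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(delta, sd, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-sided two-sample test.

    delta       -- difference between group means
    sd          -- common standard deviation of the measurements
    n_per_group -- number of independent observations in each group
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standardized effect: difference divided by its standard error.
    shift = abs(delta) / (sd * sqrt(2 / n_per_group))
    # Probability of exceeding the critical value in either tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)
```

When `delta` is 0 the function returns `alpha`, as expected for a test with no true effect.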
Control comparisons
To provide a control for the possibility that the abstainer/nonabstainer and user/control differences observed at some of the clustered, reproducibly-positive SNPs were due to occult ethnic/racial differences in the frequencies of alleles at these same SNPs between abstainers and non-abstainers or between abusers and controls, we compared the present results with those that we have previously obtained from comparisons of allele frequency data in self-reported African-American vs European-American control individuals, focusing on SNPs that display ethnicity difference scores that lie in the outlying +/- 2.5% of all differences (Table 1).
To provide a control for the possibility that the abuser-control differences observed at many of the clustered, reproducibly-positive SNPs were due to noisy assays for these SNPs, we examined the overlap between the clustered positive SNPs and the 2.5% of SNPs that display the largest variation between pools in data from this and other studies using the same arrays.
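The noisy-assay overlap check can be sketched in Python as below. The sketch assumes that each SNP has a single between-pool variation score (for example a variance) stored in a dictionary; the names `scores` and `positive_ids` are hypothetical, and how the variation score is computed is left open.

```python
def top_fraction(scores, fraction=0.025):
    """Return the ids of the SNPs with the largest variation scores.

    scores -- dict mapping SNP id to a between-pool variation score
    """
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def overlap_fraction(positive_ids, scores, fraction=0.025):
    """Share of clustered positive SNPs that fall in the noisiest fraction."""
    positive = set(positive_ids)
    if not positive:
        return 0.0
    noisy = top_fraction(scores, fraction)
    return len(positive & noisy) / len(positive)
```

If assay noise did not drive the positive results, the overlap would be expected to lie near the chance level of 2.5%.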