SNP allele frequency assessments display modest variability. The standard error for the variation among the four replicate studies of each DNA pool was ± 0.035. The standard error for the variation among the pools studied for each phenotype group was ± 0.028. Previous validating studies for these arrays have also revealed good fits between individual and pooled genotyping, with correlations of 0.95 between pooled and individually determined genotype frequencies [21-31]. The observed pool-to-pool standard deviations from these datasets thus indicate power of 0.94 and 1.0 to detect 5% and 10% allele frequency differences, respectively, with α = 0.05 in nicotine-dependent vs control comparisons. The corresponding power to detect 5% and 10% allele frequency differences in successful vs unsuccessful quitters is 0.45 and 0.95. Additional false negative results are likely to derive from the stringent further requirement that four other samples each provide supporting evidence for the nicotine-dependent vs control comparisons noted here.
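To illustrate how power estimates of this kind follow from a pool-to-pool standard deviation, a minimal sketch is given here. It assumes a two-sided, two-sample z-test (normal approximation) with equal standard deviations in both groups; the function name `two_group_power`, the pool counts and the standard deviation are hypothetical illustrations, not values or methods taken from this study, so the output is not expected to reproduce the power figures reported above.

```python
from statistics import NormalDist


def two_group_power(delta, sd, n1, n2, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    delta  : true allele frequency difference between the two groups
    sd     : pool-to-pool standard deviation (assumed equal in both groups)
    n1, n2 : number of pools in each group
    """
    z = NormalDist()
    se = sd * (1 / n1 + 1 / n2) ** 0.5   # standard error of the difference
    z_crit = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    shift = abs(delta) / se              # standardized effect size
    # Probability that the test statistic falls beyond either critical value.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Hypothetical inputs for illustration only (not the study's pool counts).
for delta in (0.05, 0.10):
    print(delta, round(two_group_power(delta, sd=0.05, n1=7, n2=16), 3))
```

Under these assumptions, power rises with the size of the allele frequency difference and with the number of pools, and falls as pool-to-pool variability grows.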
We first addressed the first of the two research questions: 1) smokers vs nonsmokers, with a special interest in genes that overlap with those identified for dependence on other substances. When we compare allele frequencies in 134 nicotine-dependent vs 320 control individuals, 88,937 of the 520,000 tested SNPs displayed t values that provide nominally-significant abuser vs control allele frequency differences at p 1). Of these nominally-significant SNPs, 4,701 lie within 100 kb of a cluster of nominally-positive SNPs from replicate African-American and European-American NIDA polysubstance abuser vs control comparisons. The Monte Carlo p value for this convergence was 0.0002. Thus, only 2 of 10,000 Monte Carlo simulation trials that each began by selecting 88,937 random SNPs displayed so many nominally-significant results near the clustered positive results from the two NIDA samples. Of the nominally-significant SNPs from the current nicotine-dependent vs control comparison, 2,133 meet several criteria. They 1) lie near clusters of positive SNPs from both NIDA samples, 2) lie within annotated genes, 3) lie within genes that are also supported by nominally-positive results from JGIDA methamphetamine abuser vs control comparisons and 4) lie within genes that are also supported by nominally-positive results from COGA alcohol-dependent vs control comparisons. The Monte Carlo p value for the observed degree of convergence between the current and prior data is 0.018.
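The logic of such a Monte Carlo convergence test can be sketched as follows. This is a simplified illustration, not the analysis code used here: it assumes a single chromosome, represents each prior cluster by one base-pair position, and uses hypothetical names (`count_near`, `monte_carlo_p`). The empirical p value is the fraction of random trials in which at least as many randomly selected SNPs fall near prior clusters as were observed, so that 2 such trials out of 10,000 would give 0.0002.

```python
import bisect
import random


def count_near(positions, cluster_sites, window=100_000):
    """Count positions lying within `window` bp of any cluster site.

    `cluster_sites` must be sorted in ascending order.
    """
    hits = 0
    for pos in positions:
        # First cluster site at or above the lower edge of the window.
        i = bisect.bisect_left(cluster_sites, pos - window)
        if i < len(cluster_sites) and cluster_sites[i] <= pos + window:
            hits += 1
    return hits


def monte_carlo_p(all_positions, n_selected, cluster_sites, observed,
                  trials=10_000, seed=0):
    """Fraction of random draws with at least `observed` SNPs near clusters."""
    rng = random.Random(seed)
    sites = sorted(cluster_sites)
    as_extreme = 0
    for _ in range(trials):
        sample = rng.sample(all_positions, n_selected)
        if count_near(sample, sites) >= observed:
            as_extreme += 1
    return as_extreme / trials


# Toy data (hypothetical): 5,000 SNPs spaced 10 kb apart, three prior clusters.
all_snps = list(range(0, 50_000_000, 10_000))
clusters = [5_000_000, 20_000_000, 42_000_000]
print(monte_carlo_p(all_snps, n_selected=200, cluster_sites=clusters,
                    observed=10, trials=1_000))
```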
The results of the nicotine-dependence vs control comparisons from the current study provide substantial confirmation for a number of genes in several gene classes that have been nominated and confirmed in prior addict vs control studies. Seven previously nominated genes related to cell adhesion processes, CNTN6, LRRN1, SEMA3C, CSMD1, PTPRD, LRRN6C and CDH13, each receive additional support from 100,000 Monte Carlo simulation trials. The convergence between current and previously-obtained data suggests that allelic variants in these genes are likely to contribute to individual differences in vulnerability to a variety of addictive substances (Table 1). Four genes related to enzymatic activity, SIPA1L2, PDE1C, PDE4D and PRKG1, each receive similar support. Genes involved in protein processing, a transcriptional regulator, and genes involved in channel, transporter, structural, disease and other processes receive similar support. Three G-protein coupled receptors, the GRM7 metabotropic glutamate receptor, the orphan GPR154 and the HRH4 histamine receptor, also receive such support. Each of these genes, taken individually, is thus supported by data from studies of individuals selected on the basis of their dependence on illegal substances (largely cannabis, stimulants and opiates), methamphetamine, alcohol and tobacco.
Controls for occult stratification among these subjects and for poor technical quality in the nominally-positive SNPs identified here fail to provide alternative explanations for the positive results of comparisons between smokers and controls. Only 837 of the nominally-positive SNPs from the smoker-control comparisons display large allele frequency differences between European- and African-American control individuals. This number is smaller than the 2,223 SNPs that would be expected to have such properties if they were selected by chance. Only 158 of the nominally-positive SNPs from the smoker-control comparisons in these data lie among the SNPs that display the largest variation between pools in data from this and other studies using the same arrays. This number is also smaller than the number expected by chance. These comparisons thus fail to support the alternative hypotheses that either occult ethnic stratification in these samples or technical problems with assays for these SNPs provided the basis for the overall results reported here.
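The chance expectation used in this kind of control can be written as a simple proportion: the number of selected SNPs multiplied by the fraction of all tested SNPs that carry the flagged property. The sketch below assumes that the flagged set comprises 2.5% of the 520,000 tested SNPs (13,000 SNPs). That fraction is not stated in this section; it is an assumption chosen because it reproduces the 2,223 expected SNPs quoted above. The function name `expected_overlap` is likewise hypothetical.

```python
def expected_overlap(n_selected, n_flagged, n_total):
    """Expected number of selected SNPs in the flagged set under random selection."""
    return n_selected * n_flagged / n_total


n_total = 520_000     # SNPs tested (from the text)
n_selected = 88_937   # nominally-positive SNPs (from the text)
n_flagged = 13_000    # ASSUMPTION: flagged set is the top 2.5% of tested SNPs
observed = 837        # nominally-positive SNPs in the flagged set (from the text)

expected = expected_overlap(n_selected, n_flagged, n_total)
print(round(expected))          # 2223
print(observed < expected)      # True: fewer flagged SNPs than chance predicts
```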
We next focused on the second research question: 2) successful vs unsuccessful quitters.
In comparing data from successful vs unsuccessful quitters, we identified 4,570 SNPs whose allele frequencies differ between these two groups with t values for these differences that yield nominal p values vs unsuccessful quitters cluster together to extents much greater than expected by chance if their allelic frequencies were independent of each other (Monte Carlo p vs unsuccessful quitters, but not if they represented chance independent observations. We defined clusters as chromosomal sites where 1) three or more reproducibly-positive SNPs were positioned within 0.1 Mb of each other and 2) reproducibly-positive SNPs assessed by two different array types were represented, so that all positive data did not come from just Nsp I or from Sty I arrays.
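The two-part cluster definition above can be sketched as follows. This is an illustrative reading, not the code used for the analysis: it assumes that "within 0.1 Mb of each other" means that each SNP lies no more than 0.1 Mb from the next positive SNP on the same chromosome, and the names `find_clusters` and `qualifies`, the array labels and the toy positions are all hypothetical.

```python
def qualifies(group, min_snps=3):
    """A cluster needs >= min_snps positive SNPs drawn from both array types."""
    array_types = {array for _, _, array in group}
    return len(group) >= min_snps and len(array_types) >= 2


def find_clusters(snps, max_gap=100_000, min_snps=3):
    """Group positive SNPs into clusters.

    snps: iterable of (chromosome, position_bp, array_type) tuples.
    Consecutive SNPs on one chromosome join a group while the gap between
    them is at most `max_gap` bp (assumed reading of "within 0.1 Mb").
    """
    clusters, current = [], []
    for snp in sorted(snps):
        new_chrom = current and snp[0] != current[-1][0]
        too_far = current and snp[1] - current[-1][1] > max_gap
        if new_chrom or too_far:
            if qualifies(current, min_snps):
                clusters.append(current)
            current = []
        current.append(snp)
    if qualifies(current, min_snps):
        clusters.append(current)
    return clusters


# Toy data (hypothetical positions and array labels).
positive_snps = [
    ("chr1", 1_000_000, "Nsp"), ("chr1", 1_050_000, "Sty"),
    ("chr1", 1_120_000, "Nsp"),                               # both arrays: kept
    ("chr1", 5_000_000, "Nsp"), ("chr1", 5_010_000, "Nsp"),
    ("chr1", 5_020_000, "Nsp"),                               # one array: dropped
    ("chr2", 1_000_000, "Sty"),                               # too few: dropped
]
clusters = find_clusters(positive_snps)
print(len(clusters))  # 1
```

Requiring both array types guards against a cluster that reflects an artifact of a single array rather than a reproducible signal.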
The nominally-positive SNPs from successful vs unsuccessful quitter comparisons that cluster together on small chromosomal regions also cluster together in regions that are annotated as genes to extents much greater than chance if they represented independent observations (Monte Carlo p
Neither controls for occult stratification nor controls for poor technical quality explain the nominally-positive SNPs from the successful vs unsuccessful quitter comparisons. The SNPs that display the largest allele frequency differences between European- and African-American controls and the SNPs that display the largest between-pool variances do not overlap with those that distinguish successful from unsuccessful quitters at levels significantly larger than those anticipated by chance (131 observed vs 114 expected and 143 observed vs 114 expected, respectively).
Haplotypes that were present at different frequencies in the successful vs unsuccessful quitters by chance, rather than because of ethnic stratification, could conceivably contribute to some of this clustering; we thus view the genes reported here [see additional file 1] as nominally positive. Nevertheless, the 221 genes identified by these clustered positive results represent a highly interesting set [see additional file 1]. Seventeen of these genes produce products related to cell adhesion, 39 genes' products relate to enzymatic activities, 37 encode receptors and/or G-protein mechanisms, 5 encode channels, 27 encode transcriptional regulators, 9 genes' products are involved in mechanisms for Mendelian disorders, 12 encode structural proteins, 4 encode proteins involved with vesicle function, 5 encode transporters, 32 encode products involved with DNA, RNA or protein processing and 34 are genes about which so little is known that we cannot confidently place them in a functional class.