Login

Join for Free!
66556 members
table of contents table of contents

Genetical genomics is a very powerful tool to elucidate the basis of …


');

Biology Articles » Genetics » Genomics » Genetical genomics: use all data » Results and Discussion

Results and Discussion
- Genetical genomics: use all data

Variable relevance

Throughout this work, we compared the results for eQTL scan with two models, a first model (model 1) where the only effects are a general mean and the marker, and a second model (model 2) where external cDNA levels were also included as covariate (see methods). Chesler et al. [12] listed a series of ~70 highly significant eQTL (genome wise P-values < 0.04) using a model 1 type strategy. One of the most significant QTL affected the expression of the gene peroxiredoxin 2 (Prdx2), which plays a major role in protecting against oxidative stress. Figure 2 (top) shows the results with model (1), i.e., the classical approach. As expected, we found a highly significant QTL (nominal P-value ~10-15), in agreement with published results. However, when we scanned for association not only the 779 markers genotyped but also the rest of cDNA levels, the most associated variable was actually the transcript level of gene Psma7 (P < 10-20). Interestingly, this gene is involved in regulating the hypoxia-inducible factor-1alpha, a transcription factor important for cellular responses to oxygen tension. Next, we included Psma7 level as covariate in the eQTL model for Prdx2 and we reanalysed the data with model (2), testing as before for the associated significance of each marker in turn. The results are also in Figure 2 (top). Although the profile is similar, the QTL is now far more significant (P-value = 10-23 now vs. 10-15 in the previous model). This observation strongly suggests that Psma7 plays an important role in modifying the expression of Prdx2, but also that its influence is purely environmental and probably under different genetic control; as a result, including Psma7 in the model removes a large part of the residual variation.

The opposite phenomenon was observed with the transcript level of Lin7c, one of the most highly connected gene in the brain [12]. In this case, the original eQTL had a nominal P-value ~10-5 with the usual model (model 1). This significance decreased when Sacm1l expression level was included using model (2) (P = 2 × 10-3). Sacm1l transcript level was the most significant factor associated with Lin7c expression (P < 10-53), i.e., much more significant than any marker. It should also be noted that the position of the maximum statistics was shifted, from marker D12Mit234 in chromosome 12 to D6Mit116 in chromosome 6. Interestingly, the expression level of Sacm1l had an eQTL also in the neighborhood of marker D12Mit234 with model 1. This likely occurs because both traits are highly correlated (ρ = 0.99). Schadt et al. [2] proposed to compare likelihoods P(x1|m) and P(x1|x2), where x1 and x2 are cDNA measures and m, marker genotypes, in order to disentangle whether m or x2 are causal to x1. Here, we compared P(xLin7c | xSacm1l) vs. P(xSacm1l | xLin7c) but they were almost identical and thus we cannot resolve whether one gene is causal to the other by using only statistical evidence, whereas P(xLin7c | D12Mit234) was slightly more significant (P = 10-5) than P(xSacm1l | D12Mit234), P = 10-3. All this hints that Lin7c and Sacm1l are mediated through the same causal effects but that it seems that their association with D12Mit234 could be actually a false positive because it is far less significant than other eQTL found in this same study.

Table 1 shows the associated P-levels and the goodness of fit statistics provided by SVM (AUC, see methods) of marker and expression levels for the ten most significant QTL reported in Chesler et al. [12]. The results corresponding to the most highly connected gene (Lin7c) are also presented. This Table illustrates well the relevance of external transcript levels as compared to markers. Note that the most associated variable was a transcript level rather than a marker in six out of ten genes, and this occurred for the most significant QTL, i.e., where the P-values for associated markers are most significant. Figure 3 displays the AUC obtained with the best 50 markers, the best 50 transcript levels, or the 50 best variables (including both markers and gene levels) in the 67 genes that exhibited the most significant QTL. Again, it can be clearly seen that transcript levels have a significantly and consistently better predictive ability than markers. For a random cDNA, we will expect that external transcript levels will be even more relevant than for those with highly significant QTL. In the case of the most highly connected gene, Lin7c, this is evident (Table 1). Thus, both classical statistics and SVM methods suggest that, on average, external transcript levels are better predictors of a given cDNA level than markers.

The above considerations should not imply that the most significant variable is always an external cDNA level for all genes, and thus that model (2) is to be preferred over model (1). We observed that usual model (1) was to be preferred in about 25% of the most significant QTL listed by Chesler et al[12]. In Table 1, the most significant variable was a marker rather than a transcript level in four out of ten genes (Mela, Myoc, Cd59a, and Krtl-12). We investigated whether modeling can nevertheless be improved in these cases. To do that, we searched the next most significant effect among all external cDNAs and the rest of markers, computing its P-value after fitting the QTL. We observed that the next most significant variable was a transcript which in turn was highly significant (P-values ranged 10-8 – 10-30). Note that this is surely an underestimation of the influence because we were considering those expression levels for which a marker (QTL) is the most significant effect. The conclusion that we can draw, again, is that external cDNAs are likely to be very important factors to be considered in genetical genomics studies.

Reanalyzing hotspots

The observation of eQTL hotspots, i.e., genome regions that seem to harbour a much higher number of QTL than expected by chance has been largely debated in the literature [1,13-15]. This is a remarkable observation and is tempting to look for a functional significance to these regions. It suggests the presence of key regulatory, polymorphic motifs in the genome that can have a profound influence on the genome transcription activity. In a previous simulation study we observed that QTL hotspots appeared even when genotypes and microarrays were shuffled, simply as a consequence of the high correlation that exists between many expression levels [16]. It is important to notice, though, that we can use the correlation between cDNA levels to our advantage in order to improve eQTL modelling dramatically.

To investigate the nature of eQTL hotspots and the influence of model choice, we chose nine genes that were reported by Chesler et al. [12] as influenced by the largest trans regulatory QTL hotspot, i.e., that close to marker D6Mit150 on murine chromosome 6. The nine genes chosen were Reln, Chrng, Slc6a1, Calm4, Mapk6, Adra2b, Mapk1, Gad1, and Htr4. As before, we fitted models (1) and (2). In all analyses with model (1), we found that the most significant marker was D6Mit254, the closest marker to D6Mit150 and also in chromosome 6, thus in agreement with published results. The P-values of the QTL with model (1) are in Table 2, they are in the order of 10-3 – 10-4. Next, we scanned all transcript levels for each of the nine genes, the most significant ones for each gene are also listed in the Table. Note that the P-values are much smaller than the QTL (P-value < 10-30 in all cases), which clearly shows that the nine expression levels studied are much more influenced by external transcript levels than by any of the polymorphisms genotyped. We also performed a QTL scan for these selected external transcripts and we found that they mapped to regions distinct from chromosome 6, and thus that they were not members of the QTL hotspot (results not presented). Finally, we included the relevant transcript as covariate for each of the nine genes and we repeated the QTL analysis (model 2). Results are also in Table 2 (last two columns).

Two aspects are important. First, the QTL were more significant now than with model (1); sometimes significance increased dramatically, e.g., P-value changed from 10-4 to 10-21 for gene Reln. This is a clear evidence that expression levels included in the model remove noise and thus may increase power for QTL detection, as we observed previously (Figure 2 top). We did not always observe this, in other instances we found that adjusting for transcript levels decreased QTL significance (Figure 2 bottom). In these latter cases we can conclude that the QTL found with the simple model is an artefact and that the QTL was truly affecting other gene expression level. The second noticeable aspect in Table 2 is that the QTL location was shifted for some transcripts (Reln, Calm4, Adra2b, and Htr4). Although we did not systematically search for all cDNA levels affected by this QTL, it is clear that the number of the genes in the hotspot has been reduced by ~50%. This is a challenging result and calls for revisiting the significance of eQTL hotspots, as they are highly dependent on the model used. From a purely statistical – rational – point of view, a model that includes a highly correlated transcript level is to be preferred over one that includes only the marker: P-values in the order of 10-30 to 10-45 vs. ~10-4.


rating: 0.00 from 0 votes | updated on: 30 Jun 2009 | views: 3330 |

Rate article:







excellent!bad…