



Discussion
- In silico identification and comparative analysis of differentially expressed genes in human and mouse tissues

Knowledge of the tissue in which a gene is specifically or preferentially expressed is often an important clue to its function. The very large database of ESTs has been a useful source for extracting such information by bioinformatics approaches. Several related bioinformatics tools, including the NCBI's Digital Differential Display (DDD) [39], are available, but they usually require the user to specify manually which libraries to include in each of the two tissue groups being compared. Others, such as TissueInfo [17] and ExQuest [18], resemble the present approach in using tissue hierarchies to extract ESTs from the tissue being searched. TissueInfo includes only normal tissues and does not provide quantified expression profiles. ExQuest distinguishes between tumor-related and normal tissues, but it likewise does not give quantitative gene expression results [40]. Although EST-based differential expression analyses are increasingly reported, they have so far mostly been confined to specific tissues (e.g. placenta [29], heart [41], and retina [42]). Thus, a convenient and integrated web database that allows users to conduct a large-scale analysis is needed. The present work addressed this need by carefully classifying the EST libraries and creating a database that gives the user access to the complete statistical test results through search options for both gene and tissue. The procedure used to group EST libraries into tissues (Fig. 6), a task difficult to automate because of different nomenclatures, spelling errors, and other deficiencies in the EST report files, can serve as a template for the cataloguing of new libraries.

The total numbers of genes we tested from normal tissues were 72865 for human and 30172 for mouse. The number of genes classified as "differentially expressed" was dictated by the p value threshold (Fig. 1), with more false positives expected at larger p values. The number of false positives, i.e. genes falsely classified as "differentially expressed", can be estimated based on Bonferroni correction [43] as the number of genes tested multiplied by the p value threshold: at a p value of 1E-6, for example, the predicted false positives were 0.07 for human (72865 × 1E-6) and 0.03 for mouse (30172 × 1E-6). This and the observation that most genes expressed in 3 tissues or less at p […] 1) suggested that 1E-6 was a reasonable threshold to use for detecting differentially expressed genes in our analysis. Note also that the p value was used here merely as an index to rank expression level and should not be taken as a bona fide probability measure [41].
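The false-positive estimate quoted above is simply the number of genes tested multiplied by the p value threshold. A minimal sketch of that arithmetic follows; the function name and the two looser thresholds (1E-2 and 1E-4) are illustrative assumptions and do not come from the article, while the gene counts and the 1E-6 threshold are taken from the text.

```python
def expected_false_positives(n_genes: int, p_threshold: float) -> float:
    """Expected number of genes falsely called 'differentially expressed'
    when n_genes are each tested at the given p value threshold."""
    return n_genes * p_threshold


# Gene counts reported in the text for normal tissues.
GENES_TESTED = {"human": 72865, "mouse": 30172}

# 1E-6 is the threshold discussed in the text; 1E-2 and 1E-4 are
# hypothetical looser thresholds, shown only for comparison.
THRESHOLDS = (1e-2, 1e-4, 1e-6)

for species, n_genes in GENES_TESTED.items():
    for p in THRESHOLDS:
        fp = expected_false_positives(n_genes, p)
        print(f"{species:5s}  p = {p:g}  expected false positives = {fp:.2f}")
```

At 1E-6 the estimate rounds to 0.07 for human and 0.03 for mouse, matching the figures in the text; at the looser illustrative thresholds it grows in direct proportion to p.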

Overall, our analysis showed that genes identified as differentially expressed by EST analysis generally did not correspond well to those detected by microarray; a similarly weak correlation between the two systems has been noted previously [24]. Nevertheless, as the p value threshold of the A-C test defining differential expression became more stringent, the correlation became more evident, although the degree to which this occurred varied with tissue type (Fig. 2). The factors responsible for the discrepancies between different experimental methods and between different tissues remain poorly understood and require further investigation.

As in the comparison with microarray data, the tissue-based p value correlation between human and mouse orthologs also became stronger as the threshold for defining tissue-specific orthologs was lowered, suggesting that tissue-specific orthologs tend to have more similar expression patterns than those lacking significant specificity (Table 2). At p […] r ≥ 0.6) or very strong (r ≥ 0.8) correlations in terms of their tissue distribution and specificity (Table 3).
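To make the correlation categories concrete, the sketch below computes a correlation coefficient between the tissue profiles of one ortholog pair and labels it with the r ≥ 0.6 and r ≥ 0.8 cut-offs quoted above. This section does not state how r was computed, so the use of Pearson's r on -log10-transformed p values is an assumption, and the tissue p values and function names are hypothetical.

```python
import math


def neg_log10(p_values):
    """Transform p values so that smaller p (higher specificity) scores higher."""
    return [-math.log10(p) for p in p_values]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def classify(r):
    """Label r using the cut-offs quoted in the text (0.6 and 0.8)."""
    if r >= 0.8:
        return "very strong"
    if r >= 0.6:
        return "strong"
    return "below 0.6"


# Hypothetical per-tissue p values for one ortholog pair, listed in the
# same tissue order for both species (not data from the article).
human_p = [1e-12, 1e-3, 0.5, 0.8, 1e-2]
mouse_p = [1e-9, 1e-2, 0.7, 0.6, 1e-1]

r = pearson_r(neg_log10(human_p), neg_log10(mouse_p))
print(f"r = {r:.3f} ({classify(r)})")
```

The log transform keeps a single extremely small p value from being flattened against the others near zero; a rank-based coefficient would be another reasonable choice under the same assumption.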

Orthologs with significant disparity were also observed. Some, such as KIAA0748, MS4A1, and SLC2A6, differed from their orthologous counterparts only in the level of specificity (p value). Others, such as HATH6 and its mouse ortholog, are preferentially expressed in entirely different tissue(s). Many factors, such as heterogeneity of the tissue samples used to construct EST libraries and insufficient ESTs for these genes, could contribute to these significant disparities. Inaccurate ortholog pairing is also a potential source of error. For example, with the identification of MUC13, it is now evident that mouse Ly64 had been mistaken for the ortholog of human LY64. This mistake has been corrected in a recent release of HomoloGene (on Mar 24, 2005), but is still present in MGI (Mouse Genome Informatics [44]), another widely used curated database of human and mouse orthologous genes. Of course, the observed disparities, especially those substantiated by other sources of data, may indeed represent real phenomena, suggesting that some orthologous genes, despite sharing similar genotypic features, could have disparate phenotypes.


updated on: 31 Oct 2006
